International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 52 - Number 1
Year of Publication: 2012
Authors: Saurabh Sharma, Vishal Gupta
DOI: 10.5120/8167-1407
Saurabh Sharma, Vishal Gupta. Hybrid Approach for Punjabi Text Clustering. International Journal of Computer Applications. 52, 1 (August 2012), 32-36. DOI=10.5120/8167-1407
Text clustering is a text mining technique that uses a similarity measure to group similar documents into the same cluster and place dissimilar documents in different clusters. Most popular clustering algorithms treat a document as a bag of words and ignore the syntactic and semantic relations between words. To overcome this drawback, several algorithms have been proposed that try to capture connections among the words of a sentence using concepts such as Frequent Itemsets, Frequent Word Sequences, Frequent Word Meaning Sequences, and ontology-based clustering. In this paper, we propose a hybrid algorithm for clustering Punjabi text documents that uses the semantic relations among the words of a sentence to extract phrases. The extracted phrases form the feature vector of each document, and these feature vectors are used to compute the similarity between documents. Experimental results indicate that the hybrid algorithm produces more reasonable clusters and performs better on real-time data sets.
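The pipeline the abstract outlines can be sketched in Python: extracted phrases become a document feature vector, and those vectors are compared pairwise. This is a minimal sketch under stated assumptions, not the authors' method. It assumes phrases have already been extracted (the paper's semantic-relation-based extraction is not reproduced). It also assumes raw phrase counts as the feature vector, cosine similarity as the measure, and a simple greedy threshold rule for grouping. The function names, the 0.5 threshold, and the romanized toy phrases standing in for Punjabi phrases are all hypothetical.

```python
import math
from collections import Counter


def phrase_vector(phrases):
    """Feature vector for one document: phrase -> occurrence count (assumed weighting)."""
    return Counter(phrases)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse phrase-count vectors."""
    dot = sum(count * b[p] for p, count in a.items() if p in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_by_threshold(vectors, threshold=0.5):
    """Hypothetical greedy grouping: each document joins the first cluster whose
    seed (first member) is at least `threshold` similar, else it starts a new cluster."""
    clusters = []  # each cluster is a list of document indices; index 0 is the seed
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine_similarity(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Toy input: phrase lists assumed to come from an upstream phrase extractor.
docs = [
    ["cricket match", "world cup", "cricket match"],
    ["world cup", "cricket match"],
    ["election result", "vote count"],
]
vectors = [phrase_vector(d) for d in docs]
print(cluster_by_threshold(vectors))  # [[0, 1], [2]]
```

The first two documents share both phrases (cosine similarity 3/√10, about 0.95), so they fall into one cluster, while the third shares none and starts its own. A real system would likely replace the greedy rule with a standard clustering algorithm and tune the weighting and threshold on data.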