International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 61 - Number 18
Year of Publication: 2013
Authors: Shailendra Kumar Shrivastava, J. L. Rana, R. C. Jain
DOI: 10.5120/10032-5077
Shailendra Kumar Shrivastava, J. L. Rana, R. C. Jain. Text Document Clustering based on Phrase Similarity using Affinity Propagation. International Journal of Computer Applications. 61, 18 (January 2013), 38-44. DOI=10.5120/10032-5077
Affinity propagation (AP) was recently introduced as an unsupervised learning algorithm for exemplar-based clustering. In this paper, a novel text document clustering algorithm is developed based on the vector space model, phrases, and the affinity propagation clustering algorithm. The proposed algorithm is called Phrase Affinity Clustering (PAC). PAC first finds the phrases using Ukkonen's suffix tree construction algorithm. Second, it builds the vector space model using the tf-idf weighting scheme over phrases. Third, it calculates the similarity matrix from the vector space model using cosine similarity. Finally, the affinity propagation algorithm generates the clusters. The F-measure, purity, and entropy of the proposed algorithm are better than those of GAHC, ST-GAHC, and ST-KNN on the OHSUMED, RCV1, and Newsgroup data sets.
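The four steps named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: word n-grams shared by at least two documents stand in for the phrases that the paper extracts with Ukkonen's suffix tree, and the preference value (median off-diagonal similarity), the damping factor, and the iteration count are assumptions made here, as are all function names. The affinity propagation loop follows the standard responsibility and availability updates and is written for readability on a handful of documents, not for speed.

```python
import math
import statistics
from collections import Counter

def phrases(text, n_max=3):
    """Word n-grams (n = 1..n_max); a simple stand-in for suffix-tree phrases."""
    words = text.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)]

def tfidf_vectors(docs):
    """Sparse tf-idf vectors over phrases that occur in at least two documents."""
    counts = [Counter(phrases(d)) for d in docs]
    df = Counter(p for c in counts for p in c)  # document frequency per phrase
    n = len(docs)
    return [{p: tf * math.log(n / df[p]) for p, tf in c.items() if df[p] > 1}
            for c in counts]

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all-zero."""
    dot = sum(w * v.get(p, 0.0) for p, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def affinity_propagation(S, damping=0.5, iterations=200):
    """Return, for each point, the index of its exemplar (needs len(S) >= 2)."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        for i in range(n):
            for k in range(n):
                best = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best)
        for k in range(n):
            pos = [max(0.0, R[j][k]) for j in range(n)]
            total = sum(pos)
            for i in range(n):
                if i == k:
                    new = total - pos[k]
                else:
                    new = min(0.0, R[k][k] + total - pos[i] - pos[k])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        # Assumption: fall back to the single strongest candidate.
        exemplars = [max(range(n), key=lambda k: R[k][k] + A[k][k])]
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]

def phrase_affinity_clusters(docs):
    """Phrases -> tf-idf vectors -> cosine similarity matrix -> AP labels."""
    vecs = tfidf_vectors(docs)
    n = len(docs)
    S = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]
    # Assumption: shared preference = median off-diagonal similarity.
    pref = statistics.median(S[i][j] for i in range(n) for j in range(n) if i != j)
    for k in range(n):
        S[k][k] = pref
    return affinity_propagation(S)

docs = [
    "heart attack risk in older patients",
    "risk of heart attack after surgery",
    "stock market prices fall sharply",
    "oil prices push stock market lower",
]
labels = phrase_affinity_clusters(docs)  # labels[i] is the exemplar of docs[i]
```

In this toy corpus the two medical documents share the phrase "heart attack" and the two financial documents share "stock market", so within-topic cosine similarity is high and cross-topic similarity is zero. The number of clusters is not fixed in advance; it depends on the preference value, and exact ties in the similarity matrix can slow or prevent convergence in a plain implementation like this one.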