National Conference on Future Computing 2013
Foundation of Computer Science USA
NCFC - Number 1
February 2013
Authors: R. Brintha, S. Bhuvaneswari
R. Brintha, S. Bhuvaneswari. Document Clustering in Distributed Environment. National Conference on Future Computing 2013. NCFC, 1 (February 2013), 30-33.
Document clustering has become a widely used technique as large numbers of documents accumulate daily in areas such as newsgroups, government organizations, the Internet, and digital libraries. Document clustering is the process of grouping similar documents into clusters. A good document clustering algorithm should produce high intra-cluster similarity and low inter-cluster similarity, i.e., the documents within a cluster should be more similar to one another than to the documents of other clusters. In this paper, the implementation of document clustering in a distributed environment based on a peer-to-peer network architecture is reviewed. The documents at each local site are clustered using the K-means algorithm. Hierarchical clustering is obtained when the clusters in each peer combine to form the next level of clusters. This process repeats until a global cluster is formed and made available to all the peers. These clustered documents find application in search engines.
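
The scheme outlined in the abstract can be illustrated with a minimal sketch. It is not the implementation reviewed in the paper. It makes several assumptions: documents are already represented as numeric feature vectors (for example term frequencies), K-means is seeded with the first k vectors, peers are merged pairwise at each level, and centroids are not weighted by cluster size. The names `kmeans` and `global_centroids` are hypothetical, and the final step of distributing the result to all peers is not modeled.

```python
from math import dist  # Euclidean distance, Python 3.8+


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(vectors, k, iters=20):
    """Plain K-means returning the centroids.

    Assumption: the first k vectors are used as initial centroids,
    which keeps the sketch deterministic.
    """
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for v in vectors:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(v, centroids[i]))
            groups[nearest].append(v)
        # An empty group keeps its previous centroid.
        updated = [mean(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def global_centroids(peers, k):
    """Cluster locally on each peer, then merge level by level.

    Assumption: neighbouring peers are combined two at a time, and the
    combined centroids are re-clustered to form the next level.
    """
    level = [kmeans(docs, k) for docs in peers]  # local clustering
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            merged = [c for cs in level[i:i + 2] for c in cs]
            next_level.append(kmeans(merged, k))
        level = next_level
    return level[0]


peers = [
    [[0, 0], [10, 10], [0, 1], [10, 11]],  # documents held by peer 1
    [[1, 0], [11, 10], [1, 1], [11, 11]],  # documents held by peer 2
]
print(global_centroids(peers, 2))
```

With these two toy peers, each peer first finds two local centroids, and the merge step reduces the four local centroids to two global ones, `[0.5, 0.5]` and `[10.5, 10.5]`. Only centroids travel between peers in this sketch, not the documents themselves. That is one plausible reading of the hierarchical step, not a detail confirmed by the abstract.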