International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 122 - Number 21
Year of Publication: 2015
Authors: Mrunal V. Upasani, Rucha C. Samant
10.5120/21848-5165
Mrunal V. Upasani, Rucha C. Samant. Clustering and Classification of Documents based on Meta Information using COATES and COLT Algorithms. International Journal of Computer Applications. 122, 21 (July 2015), 15-19. DOI=10.5120/21848-5165
Side information, that is, the meta-information that accompanies documents, can be used in data mining applications such as clustering and classification. In many text mining applications, a large amount of meta-information is available along with the text documents. This meta-information takes different forms, such as links in the document or user-access behavior recorded in web logs, and can be useful for data mining. These unstructured attributes can carry a great deal of information for clustering purposes. Therefore, this system uses an approach that carefully ascertains the coherence of the clustering characteristics of the meta-information with those of the text content. Both the text data and the meta-information help improve the quality of the clustering. In this system, an algorithm was designed that combines classical partitioning algorithms with probabilistic models to create an effective clustering approach using the meta-information present in the documents. The clustering approach is then extended to the classification problem. The COATES and COLT algorithms are used for clustering and classification, respectively, of text data along with its meta-information, and the work shows the advantages of using such an approach.
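The abstract describes combining a classical partitioning step over the text with a probabilistic model over the side attributes. The following Python code is a minimal sketch of that alternation, offered only as an illustration under stated assumptions. It is not the authors' COATES implementation. The function names (`cluster_with_side_info`, `content_step`, `side_step`), the farthest-first seeding, the Bernoulli naive Bayes model for side attributes, the Laplace smoothing parameter `alpha`, and the toy data are all hypothetical choices. Refinements used in the original algorithm are not reproduced here.

```python
import math
from collections import Counter, defaultdict

def cosine(a, b):
    """Cosine similarity of two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(docs):
    """Average term vector of a cluster; an empty cluster gets an empty centroid."""
    total = Counter()
    for d in docs:
        total.update(d)
    return {t: w / len(docs) for t, w in total.items()} if docs else {}

def content_step(docs, labels, k):
    """Partitioning step on the text: reassign each document to its most similar centroid."""
    cents = [centroid([d for d, l in zip(docs, labels) if l == c]) for c in range(k)]
    return [max(range(k), key=lambda c: cosine(d, cents[c])) for d in docs]

def side_step(side, labels, k, alpha=1.0):
    """Probabilistic step on side attributes (hypothetical choice: Bernoulli naive Bayes
    trained on the current labels), then reassign each document by maximum posterior."""
    attrs = sorted(set().union(*side))
    n, size = len(labels), Counter(labels)
    has = defaultdict(Counter)  # has[c][a] = number of docs in cluster c with attribute a
    for s, l in zip(side, labels):
        has[l].update(s)

    def log_post(s, c):
        score = math.log((size[c] + alpha) / (n + k * alpha))  # smoothed cluster prior
        for a in attrs:
            p = (has[c][a] + alpha) / (size[c] + 2 * alpha)     # smoothed P(a present | c)
            score += math.log(p if a in s else 1.0 - p)
        return score

    return [max(range(k), key=lambda c: log_post(s, c)) for s in side]

def cluster_with_side_info(docs, side, k, n_iter=10, alpha=1.0):
    # Farthest-first seeding (an assumption): start from document 0, then repeatedly
    # add the document least similar to every seed chosen so far.
    seeds = [0]
    while len(seeds) < k:
        seeds.append(min(range(len(docs)),
                         key=lambda i: max(cosine(docs[i], docs[s]) for s in seeds)))
    labels = [max(range(k), key=lambda c: cosine(d, docs[seeds[c]])) for d in docs]
    for _ in range(n_iter):
        # One round = a content-based pass followed by a side-information pass.
        new = side_step(side, content_step(docs, labels, k), k, alpha)
        if new == labels:  # stop once a full round changes nothing
            break
        labels = new
    return labels

# Hypothetical toy data: term counts plus side attributes such as links or user tags.
docs = [{"goal": 2, "match": 1}, {"goal": 1, "team": 2},
        {"vote": 2, "poll": 1}, {"vote": 1, "senate": 2}, {"result": 1}]
side = [{"link:espn"}, {"link:espn", "user:fan"},
        {"link:gov"}, {"link:gov", "user:voter"}, {"link:gov"}]
print(cluster_with_side_info(docs, side, k=2))  # prints [0, 0, 1, 1, 1]
```

In this toy run the fifth document shares no terms with the others, so the text-only pass keeps it in the first cluster where seeding placed it. Its `link:gov` attribute lets the side-information pass move it to the second cluster, which illustrates the kind of gain from meta-information that the abstract describes.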