International Journal of Computer Applications |
Foundation of Computer Science (FCS), NY, USA |
Volume 127 - Number 17 |
Year of Publication: 2015 |
Authors: Aishwarya Kappala, Sudhakar Godi |
10.5120/ijca2015906717 |
Aishwarya Kappala, Sudhakar Godi . A Naive Clustering Algorithm for Text Mining. International Journal of Computer Applications. 127, 17 ( October 2015), 20-24. DOI=10.5120/ijca2015906717
Predefined categories can be assigned to the natural language text using for text classification. It is a “bag-of-word” representation, previous documents have a word with values, it represents how frequently this word appears in the document or not. But large documents may face many problems because they have irrelevant or abundant information is there. This paper explores the effect of other types of values, which express the distribution of a word in the document. These values are called distributional features. All features are calculated by tfidf style equation and these features are combined with machine learning techniques. Term frequency is one of the major factor for distributional features it holds weighted item set. When the need is to minimize a certain score function, discovering rare data correlations is more interesting than mining frequent ones. This paper tackles the issue of discovering rare and weighted item sets, i.e., the infrequent weighted item set mining problem. The classifier which gives the more accurate result is selected for categorization. Experiments show that the distributional features are useful for text categorization.