International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 164 - Number 8
Year of Publication: 2017
Authors: B. S. Harish, M. B. Revanasiddappa
DOI: 10.5120/ijca2017913711
B. S. Harish, M. B. Revanasiddappa. A Comprehensive Survey on various Feature Selection Methods to Categorize Text Documents. International Journal of Computer Applications. 164, 8 (Apr 2017), 1-7. DOI=10.5120/ijca2017913711
Feature selection is a well-known solution to the high-dimensionality problem in text categorization, where the selection of good features (terms) plays a very important role. It is a strategy that can be used to improve categorization accuracy, effectiveness, and computational efficiency. This paper presents an empirical study of the most widely used feature selection methods, viz. Term Frequency-Inverse Document Frequency (tf-idf), Information Gain (IG), Mutual Information (MI), Chi-Square (χ²), Ambiguity Measure (AM), Term Strength (TS), Term Frequency-Relevance Frequency (tf-rf), and Symbolic Feature Selection (SFS), with five different classifiers (Naïve Bayes, K-Nearest Neighbor, Centroid-Based Classifier, Support Vector Machine, and Symbolic Classifier). Experiments are carried out on standard benchmark datasets: Reuters-21578, 20-Newsgroups, and the 4 University dataset.
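To make the idea of term-level feature selection concrete, the following is a minimal sketch of chi-square scoring, one of the methods listed above. It uses the common 2x2 contingency-table form of the statistic, which may differ in detail from the formulation used in the paper. The function names (chi_square_scores, select_top_k) and the toy corpus are hypothetical and serve only as illustration.

```python
from collections import Counter


def chi_square_scores(docs, labels, category):
    """Score each term by its chi-square statistic against one category.

    docs:   list of token lists (one per document)
    labels: list of category names, same length as docs
    Assumption: standard 2x2 contingency form over document counts.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == category)
    n_neg = n - n_pos

    # Document frequency of each term inside and outside the category.
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        target = df_pos if y == category else df_neg
        target.update(set(tokens))

    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # in category, contains term
        b = df_neg[term]   # outside category, contains term
        c = n_pos - a      # in category, lacks term
        d = n_neg - b      # outside category, lacks term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        # A zero denominator means the term carries no information.
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus: two "grain" and two "sport" documents.
docs = [
    ["wheat", "price", "rise"],
    ["wheat", "export", "price"],
    ["match", "goal", "team"],
    ["team", "win", "match"],
]
labels = ["grain", "grain", "sport", "sport"]
scores = chi_square_scores(docs, labels, "grain")
selected = select_top_k(scores, 4)
```

Note that chi-square is symmetric: a term that perfectly indicates the absence of a category scores as high as one that perfectly indicates its presence, so in a two-class setting the top terms come from both classes.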