International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 81 - Number 6
Year of Publication: 2013
Authors: Subhajit Dey Sarkar, Saptarsi Goswami
10.5120/14018-2173
Subhajit Dey Sarkar, Saptarsi Goswami. Empirical Study on Filter based Feature Selection Methods for Text Classification. International Journal of Computer Applications. 81, 6 (November 2013), 38-43. DOI=10.5120/14018-2173
Text classification has become much more relevant with the increasing volume of unstructured data from various sources, and several techniques have been developed for it. High dimensionality of the feature space is one of the established problems in text classification, and feature selection is one technique for reducing it. Feature selection helps to improve classifier performance, reduce overfitting, speed up model construction and testing, and make models more interpretable. This paper presents an empirical study comparing the performance of four filter-based feature selection techniques (Chi-squared, Information Gain, Mutual Information and Symmetrical Uncertainty) used with different classifiers: naive Bayes, SVM, decision tree and k-NN. The motivation of the paper is to present the results of these feature selection methods with various classifiers on text datasets. The study also allows the relative performance of the classifiers and of the feature selection methods to be compared.
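Filter-based methods score each term independently of any classifier and keep the highest-scoring terms. The sketch below illustrates that idea using textbook definitions of three of the scores named above, computed on binary term-presence features. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the paper's exact formulas, term weighting and tooling may differ, and Mutual Information is omitted because in text classification it is sometimes defined as pointwise MI, whereas the Information Gain computed here equals the expected mutual information between term presence and class.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def information_gain(term_present, labels):
    """IG = H(C) - H(C | T) for a binary term-presence feature T and class C."""
    n = len(labels)
    h_cond = 0.0
    for v in set(term_present):
        subset = [y for x, y in zip(term_present, labels) if x == v]
        h_cond += len(subset) / n * entropy(subset)
    return entropy(labels) - h_cond

def symmetrical_uncertainty(term_present, labels):
    """SU = 2 * IG / (H(T) + H(C)), which lies in [0, 1]."""
    denom = entropy(term_present) + entropy(labels)
    return 0.0 if denom == 0 else 2 * information_gain(term_present, labels) / denom

def chi_squared(term_present, labels):
    """Pearson chi-squared statistic of the term-by-class contingency table."""
    n = len(labels)
    observed = Counter(zip(term_present, labels))
    t_counts, c_counts = Counter(term_present), Counter(labels)
    stat = 0.0
    for t in t_counts:
        for c in c_counts:
            expected = t_counts[t] * c_counts[c] / n
            stat += (observed[(t, c)] - expected) ** 2 / expected
    return stat

def top_k_terms(doc_term_sets, labels, vocabulary, score, k):
    """Hypothetical helper: score every term on its own and keep the k best."""
    scores = {t: score([int(t in d) for d in doc_term_sets], labels) for t in vocabulary}
    return sorted(vocabulary, key=lambda t: scores[t], reverse=True)[:k]

# Toy usage (made-up data): "meeting" separates the two classes perfectly.
docs = [{"free", "win"}, {"free", "offer"}, {"meeting", "notes"}, {"meeting", "free"}]
labels = ["spam", "spam", "ham", "ham"]
vocab = ["free", "win", "offer", "meeting", "notes"]
print(top_k_terms(docs, labels, vocab, information_gain, k=1))  # ['meeting']
```

Because each term is scored in isolation, swapping `information_gain` for `chi_squared` or `symmetrical_uncertainty` changes only the ranking criterion, and the reduced vocabulary can then be passed to any classifier, such as the naive Bayes, SVM, decision tree and k-NN models compared in the study.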