International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 177 - Number 17
Year of Publication: 2019
Authors: Syed Basit Ali, Yan Qiang, Saad Abdul Rauf, Farhan Zaka
DOI: 10.5120/ijca2019919621
Syed Basit Ali, Yan Qiang, Saad Abdul Rauf, Farhan Zaka. Applying different Feature Selection and Classification Parameters for Categorization. International Journal of Computer Applications. 177, 17 (Nov 2019), 45-49. DOI=10.5120/ijca2019919621
In today’s data-intensive world, vast amounts of data are generated, processed and transferred. The main factors behind this growth are the increased use of social media and the accompanying rise of data mining methodologies. Text classification is one of the most important aspects of data mining. It involves fetching data, pre-processing it, and then applying classifiers to divide the data into categories so that it is easier to process and can be subjected to further experimentation. In this paper, data is passed through several feature selection techniques with tuned parameters, and multiple machine learning classifiers are then applied to it in order to study measures that include accuracy, precision and various averages. The impact on accuracy of increasing or decreasing the number of categories used for text classification is studied across several classifiers, namely Naive Bayes, Support Vector Machine and K-Nearest Neighbour, as well as a combination of the individual classifiers in an ensemble classifier. In this research the internal parameters of the feature selection techniques and the classifiers are also varied, which leads to a slight increase in the overall accuracy of the classifier. Reducing the number of categories increases accuracy to a greater extent, because it reduces the presence of multiple similar categories, which lower overall accuracy. The changes to the feature selection parameters also include running the algorithms on uni-gram, bi-gram and tri-gram models; of these, the bi-gram model gives the best overall accuracy with the Support Vector Machine classifier.
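As an illustration of the uni-gram, bi-gram and tri-gram models mentioned above, the following is a minimal sketch of word n-gram extraction in plain Python. It is not the authors' implementation: the function names `word_ngrams` and `ngram_counts`, the whitespace tokenization and the sample sentence are assumptions made only for this example.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams of a text as a list of tuples.

    Assumption: tokens are produced by lowercasing and splitting on
    whitespace; a real pipeline would likely use a proper tokenizer.
    """
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, n_values=(1, 2, 3)):
    """Count n-grams for each requested n; the result maps n -> Counter."""
    return {n: Counter(word_ngrams(text, n)) for n in n_values}


# Hypothetical sample document, used only to show the shape of the features.
sample = "text classification divides text into categories"
counts = ngram_counts(sample)

for n, counter in counts.items():
    print(n, counter.most_common(3))
```

Counts like these could then serve as the feature vectors handed to a classifier; which n performs best depends on the data set and the classifier used.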