Research Article

Applying different Feature Selection and Classification Parameters for Categorization

by Syed Basit Ali, Yan Qiang, Saad Abdul Rauf, Farhan Zaka
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 177 - Number 17
Year of Publication: 2019
10.5120/ijca2019919621

Syed Basit Ali, Yan Qiang, Saad Abdul Rauf, Farhan Zaka. Applying different Feature Selection and Classification Parameters for Categorization. International Journal of Computer Applications. 177, 17 (Nov 2019), 45-49. DOI=10.5120/ijca2019919621

@article{ 10.5120/ijca2019919621,
author = { Syed Basit Ali and Yan Qiang and Saad Abdul Rauf and Farhan Zaka },
title = { Applying different Feature Selection and Classification Parameters for Categorization },
journal = { International Journal of Computer Applications },
issue_date = { Nov 2019 },
volume = { 177 },
number = { 17 },
month = { Nov },
year = { 2019 },
issn = { 0975-8887 },
pages = { 45-49 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume177/number17/30994-2019919621/ },
doi = { 10.5120/ijca2019919621 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:46:11.475020+05:30
%A Syed Basit Ali
%A Yan Qiang
%A Saad Abdul Rauf
%A Farhan Zaka
%T Applying different Feature Selection and Classification Parameters for Categorization
%J International Journal of Computer Applications
%@ 0975-8887
%V 177
%N 17
%P 45-49
%D 2019
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In today’s data-intensive world, vast amounts of data are generated, processed and transferred. The main driver of this growth is the increasing use of social media, which has been accompanied by a growth in data mining methodologies. Text classification is one of the most important aspects of data mining; it involves fetching data, pre-processing it and then applying classifiers to divide the data into categories so that it is easier to process and to subject to further experimentation. In this paper, data is passed through several feature selection techniques with tuned parameters, and multiple machine learning classifiers are then applied to it in order to study evaluation measures including accuracy, precision and various averages. The impact of increasing or decreasing the number of categories on classification accuracy is studied for several classifiers, namely Naive Bayes, Support Vector Machine and K-Nearest Neighbour, as well as for a combination of the individual classifiers in an ensemble classifier. The internal parameters of the feature selection techniques and classifiers are also varied, which leads to a slight increase in the overall accuracy of the classifier. Reducing the number of categories increases accuracy to a greater extent, because it removes multiple similar categories whose presence lowers overall accuracy. The feature selection parameters are further varied by running the algorithms on uni-gram, bi-gram and tri-gram models, of which the bi-gram model gives the best overall accuracy with the Support Vector Machine classifier.
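
Two of the ideas named in the abstract, n-gram features and soft voting over several classifiers, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy sentence and the per-classifier probabilities are hypothetical, chosen only to show how uni-gram, bi-gram and tri-gram features differ and how a soft-voting ensemble averages class probabilities.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams of `text` as space-joined strings."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, n):
    """Bag-of-n-grams feature vector: n-gram -> number of occurrences."""
    return Counter(word_ngrams(text, n))


def soft_vote(probabilities):
    """Average the per-class probabilities of several classifiers.

    `probabilities` is a list of dicts (one per classifier) that all map
    the same class labels to probabilities. Returns the label with the
    highest averaged probability together with the averaged distribution.
    """
    labels = probabilities[0].keys()
    averaged = {
        label: sum(p[label] for p in probabilities) / len(probabilities)
        for label in labels
    }
    return max(averaged, key=averaged.get), averaged


# Toy sentence (an assumption, not taken from the paper's dataset).
sentence = "the match was won in extra time"
for n in (1, 2, 3):  # uni-gram, bi-gram, tri-gram
    print(n, word_ngrams(sentence, n))

# Hypothetical class probabilities for one document from three classifiers
# standing in for Naive Bayes, SVM and K-Nearest Neighbour.
nb = {"sports": 0.55, "politics": 0.45}
svm = {"sports": 0.10, "politics": 0.90}
knn = {"sports": 0.55, "politics": 0.45}

label, averaged = soft_vote([nb, svm, knn])
print(label, averaged)
```

In this toy case two of the three classifiers individually prefer "sports", so a majority (hard) vote would pick it, whereas soft voting picks "politics" because the one dissenting classifier is far more confident. Whether that behaviour helps on real data depends on how well calibrated each classifier's probabilities are.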

Index Terms

Computer Science
Information Sciences

Keywords

Text Classification, Machine Learning, Naive Bayes, Support Vector Machine, K-Nearest Neighbors, Ensemble, Soft-voting