Research Article

Feature Selection for Effective Text Classification using Semantic Information

by Rajul Jain, Nitin Pise
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 113 - Number 10
Year of Publication: 2015
Authors: Rajul Jain, Nitin Pise
10.5120/19861-1818

Rajul Jain, Nitin Pise. Feature Selection for Effective Text Classification using Semantic Information. International Journal of Computer Applications. 113, 10 (March 2015), 18-25. DOI=10.5120/19861-1818

@article{ 10.5120/19861-1818,
author = { Rajul Jain and Nitin Pise },
title = { Feature Selection for Effective Text Classification using Semantic Information },
journal = { International Journal of Computer Applications },
issue_date = { March 2015 },
volume = { 113 },
number = { 10 },
month = { March },
year = { 2015 },
issn = { 0975-8887 },
pages = { 18-25 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume113/number10/19861-1818/ },
doi = { 10.5120/19861-1818 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:50:34.867364+05:30
%A Rajul Jain
%A Nitin Pise
%T Feature Selection for Effective Text Classification using Semantic Information
%J International Journal of Computer Applications
%@ 0975-8887
%V 113
%N 10
%P 18-25
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Text categorization is the task of assigning text or documents to pre-specified classes or categories. To classify documents well, text-based learning needs to understand context: humans decide the relevance of a text through the context associated with it, so machine learning should likewise incorporate context information with the text to improve classification accuracy. This can be achieved by using semantic information, such as part-of-speech tags, associated with the text. The aim of this experimentation is therefore to use this semantic information to select features that may provide better classification results. Several datasets are constructed, each from a different collection of features, to understand which representation of text data works best for different types of classifiers.
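As a rough illustration of the idea in the abstract, the sketch below builds bag-of-words features restricted to chosen part-of-speech tags, so that different tag sets yield different feature collections for the same documents. It is a minimal sketch and not the authors' implementation: the lookup table TOY_POS, the helper names tag and select_features, and the two example sentences are all hypothetical, and a real experiment would use a trained tagger in place of the table.

```python
from collections import Counter

# Hypothetical toy lexicon mapping words to coarse POS tags; a real
# experiment would use a trained tagger instead of a lookup table.
TOY_POS = {
    "oil": "NOUN", "prices": "NOUN", "rose": "VERB", "sharply": "ADV",
    "the": "DET", "bank": "NOUN", "cut": "VERB", "rates": "NOUN",
    "again": "ADV", "crude": "ADJ", "central": "ADJ",
}


def tag(tokens):
    """Attach a POS tag to each token; unknown words get the tag 'X'."""
    return [(tok, TOY_POS.get(tok, "X")) for tok in tokens]


def select_features(text, keep_tags):
    """Count only the tokens whose POS tag is in keep_tags."""
    tokens = text.lower().split()
    return Counter(tok for tok, pos in tag(tokens) if pos in keep_tags)


# Illustrative documents (assumed, not taken from the paper's datasets).
docs = [
    "The crude oil prices rose sharply",
    "The central bank cut rates again",
]

# Two alternative feature collections built from the same documents.
noun_only = [select_features(d, {"NOUN"}) for d in docs]
noun_verb = [select_features(d, {"NOUN", "VERB"}) for d in docs]
```

Each resulting Counter can be turned into a feature vector and passed to a classifier, which makes it possible to compare how different tag sets affect accuracy.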

Index Terms

Computer Science
Information Sciences

Keywords

Context, POS tagging, semantic information, text categorization