Research Article

Text Classification by PNN-based Term Re-weighting

by Atilla Elci
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 29 - Number 12
Year of Publication: 2011
Authors: Atilla Elci
10.5120/3701-5188

Atilla Elci. Text Classification by PNN-based Term Re-weighting. International Journal of Computer Applications. 29, 12 (September 2011), 7-13. DOI=10.5120/3701-5188

@article{ 10.5120/3701-5188,
author = { Atilla Elci },
title = { Text Classification by PNN-based Term Re-weighting },
journal = { International Journal of Computer Applications },
issue_date = { September 2011 },
volume = { 29 },
number = { 12 },
month = { September },
year = { 2011 },
issn = { 0975-8887 },
pages = { 7-13 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume29/number12/3701-5188/ },
doi = { 10.5120/3701-5188 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:15:36.734764+05:30
%A Atilla Elci
%T Text Classification by PNN-based Term Re-weighting
%J International Journal of Computer Applications
%@ 0975-8887
%V 29
%N 12
%P 7-13
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Current approaches to feature selection for text classification aim to reduce the number of terms used to describe documents, so that documents can be classified and retrieved with greater ease and precision. A key shortcoming of these approaches is that, after ranking all terms with a feature selection measure (scoring function), they keep only the topmost terms to describe documents. Terms that rank high but fall below the topmost terms are discarded to reduce computational cost, yet in many cases they have considerable discriminative power that could improve classification precision. To address this issue, we propose a new feature weighting formalism that ties the topmost terms to these lower-ranked terms using probabilistic neural networks. In the proposed method, probabilistic neural networks are formed using a relative category distribution matrix, and the topmost terms are re-weighted and passed to a Rocchio classifier. This is achieved without increasing the dimensionality of the feature space. Through experiments on datasets from the Reuters news collection RCV1, we show that the proposed method is a significant supplement to statistical feature selection measures for better text classification at extreme term filtering ranges.
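
The baseline that the abstract criticizes (rank all terms, keep the topmost, classify with Rocchio) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the scoring function is a hypothetical stand-in (spread of per-category document frequency), the Rocchio classifier is reduced to its positive-centroid form with cosine similarity, and the toy documents are invented. The PNN-based re-weighting step is not reproduced because the abstract does not specify it.

```python
import math
from collections import Counter, defaultdict


def select_top_terms(docs, labels, k):
    """Rank terms with a stand-in scoring function and keep the top k.

    Assumption: the score is the spread of a term's per-category
    document-frequency rate; the paper's actual measure may differ.
    """
    cats = sorted(set(labels))
    n_docs = Counter(labels)
    df = defaultdict(Counter)  # term -> category -> document frequency
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term][label] += 1

    def score(term):
        rates = [df[term][c] / n_docs[c] for c in cats]
        return max(rates) - min(rates)

    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(df, key=lambda t: (-score(t), t))[:k]


def vectorize(tokens, vocab):
    """Raw term-frequency vector restricted to the selected terms."""
    counts = Counter(tokens)
    return [float(counts[t]) for t in vocab]


def train_rocchio(docs, labels, vocab):
    """Centroid-only Rocchio: one mean vector per category."""
    n_docs = Counter(labels)
    sums = {}
    for tokens, label in zip(docs, labels):
        acc = sums.setdefault(label, [0.0] * len(vocab))
        for i, v in enumerate(vectorize(tokens, vocab)):
            acc[i] += v
    return {c: [v / n_docs[c] for v in acc] for c, acc in sums.items()}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def classify(tokens, centroids, vocab):
    """Assign the category whose centroid is most similar to the document."""
    vec = vectorize(tokens, vocab)
    return max(sorted(centroids), key=lambda c: cosine(vec, centroids[c]))


# Invented toy corpus, for illustration only.
docs = [
    ["oil", "price", "rise"],
    ["oil", "export", "crude"],
    ["bank", "rate", "cut"],
    ["bank", "loan", "rate"],
]
labels = ["energy", "energy", "finance", "finance"]

vocab = select_top_terms(docs, labels, k=3)   # ['bank', 'oil', 'rate']
centroids = train_rocchio(docs, labels, vocab)
print(classify(["oil", "demand"], centroids, vocab))  # energy
```

With this toy data, a document made up only of discarded terms, such as `["crude", "export"]`, maps to the all-zero vector and has zero similarity to every centroid. That loss of information at aggressive cutoffs is the gap the proposed re-weighting is meant to close.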

Index Terms

Computer Science
Information Sciences

Keywords

Term re-weighting, boosting, probabilistic neural networks, text classification, feature selection, Rocchio classifier