Research Article

Comparative Study and Analysis of Supervised and Unsupervised Term Weighting Methods on Text Classification

by Mahak Motwani, Aruna Tiwari
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 68 - Number 10
Year of Publication: 2013
Authors: Mahak Motwani, Aruna Tiwari
10.5120/11616-7013

Mahak Motwani, Aruna Tiwari. Comparative Study and Analysis of Supervised and Unsupervised Term Weighting Methods on Text Classification. International Journal of Computer Applications. 68, 10 (April 2013), 24-27. DOI=10.5120/11616-7013

@article{ 10.5120/11616-7013,
author = { Mahak Motwani, Aruna Tiwari },
title = { Comparative Study and Analysis of Supervised and Unsupervised Term Weighting Methods on Text Classification },
journal = { International Journal of Computer Applications },
issue_date = { April 2013 },
volume = { 68 },
number = { 10 },
month = { April },
year = { 2013 },
issn = { 0975-8887 },
pages = { 24-27 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume68/number10/11616-7013/ },
doi = { 10.5120/11616-7013 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:27:28.665829+05:30
%A Mahak Motwani
%A Aruna Tiwari
%T Comparative Study and Analysis of Supervised and Unsupervised Term Weighting Methods on Text Classification
%J International Journal of Computer Applications
%@ 0975-8887
%V 68
%N 10
%P 24-27
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Text classification is a rapidly growing research area, driven by the availability of huge amounts of electronic data in the form of news articles, research articles, email messages, blogs, web pages, etc. Text representation is a vital step in text classification. In text representation, a term weighting method assigns appropriate weights to terms to improve classification performance. A term weighting method that uses known information about the category membership of training documents is a supervised term weighting method. The unsupervised term weighting method tf (term frequency) is compared with the supervised term weighting method tf.rf (term frequency, relevance factor) using a Back Propagation Neural Network. The experimental results demonstrate that tf.rf performs better than tf.
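
To make the difference between the two weighting schemes concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the commonly cited definition of the relevance factor from Lan et al., rf = log2(2 + a / max(1, c)), where a and c are the numbers of positive-category and negative-category training documents containing the term; the abstract itself does not state the formula. The function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tf_weights(doc_tokens):
    """Unsupervised weighting: raw term frequency within one document."""
    return dict(Counter(doc_tokens))


def relevance_factor(term, docs, labels, positive_label):
    """Supervised factor, assumed here to be rf = log2(2 + a / max(1, c)).

    a = number of positive-category training documents containing the term
    c = number of negative-category training documents containing the term
    """
    a = sum(1 for d, y in zip(docs, labels) if y == positive_label and term in d)
    c = sum(1 for d, y in zip(docs, labels) if y != positive_label and term in d)
    return math.log2(2 + a / max(1, c))


def tf_rf_weights(doc_tokens, docs, labels, positive_label):
    """Supervised weighting: term frequency scaled by the relevance factor."""
    tf = tf_weights(doc_tokens)
    return {
        term: freq * relevance_factor(term, docs, labels, positive_label)
        for term, freq in tf.items()
    }


# Hypothetical toy training corpus (already tokenized) with category labels.
docs = [
    ["stock", "market", "stock"],
    ["market", "trade"],
    ["match", "goal", "market"],
]
labels = ["finance", "finance", "sport"]

tf = tf_weights(docs[0])
tfrf = tf_rf_weights(docs[0], docs, labels, "finance")
print(tf)
print(tfrf)
```

In this toy corpus, "stock" occurs only in finance documents, so tf.rf boosts it relative to "market", which also appears in the sport document; plain tf cannot make that distinction because it ignores the training labels.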

Index Terms

Computer Science
Information Sciences

Keywords

Term Weighting Method, Relevance Factor, Term Frequency