Research Article

Automatic Text Classification: A Technical Review

by Mita K. Dalal, Mukesh A. Zaveri
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 28 - Number 2
Year of Publication: 2011
DOI: 10.5120/3358-4633

Mita K. Dalal, Mukesh A. Zaveri. Automatic Text Classification: A Technical Review. International Journal of Computer Applications. 28, 2 (August 2011), 37-40. DOI=10.5120/3358-4633

@article{ 10.5120/3358-4633,
author = { Mita K. Dalal and Mukesh A. Zaveri },
title = { Automatic Text Classification: A Technical Review },
journal = { International Journal of Computer Applications },
issue_date = { August 2011 },
volume = { 28 },
number = { 2 },
month = { August },
year = { 2011 },
issn = { 0975-8887 },
pages = { 37-40 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume28/number2/3358-4633/ },
doi = { 10.5120/3358-4633 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:13:44.930861+05:30
%A Mita K. Dalal
%A Mukesh A. Zaveri
%T Automatic Text Classification: A Technical Review
%J International Journal of Computer Applications
%@ 0975-8887
%V 28
%N 2
%P 37-40
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Automatic Text Classification is a semi-supervised machine learning task that assigns a given document to a set of pre-defined categories based on its textual content and extracted features. It has important applications in content management, contextual search, opinion mining, product review analysis, spam filtering, and text sentiment mining. This paper explains the generic strategy for automatic text classification and surveys existing solutions to major issues such as dealing with unstructured text, handling a large number of attributes, and selecting a machine learning technique appropriate to the text-classification application.
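
The generic strategy named in the abstract (pre-process unstructured text, extract features, then apply a learning technique) can be illustrated with a small sketch. This is not the authors' method: it assumes a multinomial naive Bayes classifier with Laplace smoothing, a spam-filtering task, and a tiny hypothetical stop-word list and training set, all chosen only for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; real systems use much larger ones.
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "and", "to", "of", "was", "i"}

def preprocess(text):
    """Lowercase, split into alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def train(labeled_docs):
    """Collect per-class document counts and word counts (bag-of-words features)."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        class_docs[label] += 1
        for tok in preprocess(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab

def classify(text, model):
    """Return the class with the highest log-posterior under Laplace smoothing."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in preprocess(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids zero probabilities for unseen class/word pairs.
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Illustrative training set (made up for this sketch).
training_data = [
    ("Win a free prize now", "spam"),
    ("Cheap offer, claim your free money", "spam"),
    ("Meeting agenda for the project review", "ham"),
    ("Please review the attached project report", "ham"),
]
model = train(training_data)
prediction = classify("Claim your free prize", model)
print(prediction)
```

The vocabulary grows with every new training document, which is the "large number of attributes" issue the abstract refers to; practical systems typically add stemming, feature selection, or term weighting before the learning step.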

Index Terms

Computer Science
Information Sciences

Keywords

automatic text classification, feature-extraction, pre-processing, text mining