Research Article

Algorithm for Punjabi Text Classification

by Nidhi, Vishal Gupta
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 37 - Number 11
Year of Publication: 2012
DOI: 10.5120/4731-6925

Nidhi, Vishal Gupta. Algorithm for Punjabi Text Classification. International Journal of Computer Applications. 37, 11 (January 2012), 30-35. DOI=10.5120/4731-6925

@article{ 10.5120/4731-6925,
author = { Nidhi and Vishal Gupta },
title = { Algorithm for Punjabi Text Classification },
journal = { International Journal of Computer Applications },
issue_date = { January 2012 },
volume = { 37 },
number = { 11 },
month = { January },
year = { 2012 },
issn = { 0975-8887 },
pages = { 30-35 },
numpages = {6},
url = { https://ijcaonline.org/archives/volume37/number11/4731-6925/ },
doi = { 10.5120/4731-6925 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Nidhi
%A Vishal Gupta
%T Algorithm for Punjabi Text Classification
%J International Journal of Computer Applications
%@ 0975-8887
%V 37
%N 11
%P 30-35
%D 2012
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Text mining is the field concerned with extracting previously undiscovered, useful information from text documents in response to a user's query. Text classification is one of the text mining tasks used to manage information efficiently: classification algorithms assign documents to predefined classes. Every text classification method characterizes each document by a set of features, and these features must be relevant to the task at hand. Little work has been done on Punjabi text classification, and adequate annotated corpora for Punjabi are not yet available. This paper introduces preprocessing techniques and feature selection methods for Punjabi, together with a classification algorithm for Punjabi text documents.
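The abstract describes a generic three-stage pipeline: preprocess the text, select features, then classify. The sketch below illustrates that pipeline in Python under stated assumptions; it is not the algorithm proposed in the paper. Everything in it is hypothetical and chosen for illustration: whitespace and danda (U+0964, `।`) tokenization, a tiny hand-picked stop-word list (`STOPWORDS`), document-frequency thresholding as the feature selection step (`select_features`, `min_df`), and multinomial Naive Bayes with add-one smoothing as the classifier (`NaiveBayes`). The demo strings are toy keyword lists, not real Punjabi sentences.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list: a few common Punjabi function words
# (de, di, da, nu, hai, vich, ate). A real system would use a curated list.
STOPWORDS = {"ਦੇ", "ਦੀ", "ਦਾ", "ਨੂੰ", "ਹੈ", "ਵਿੱਚ", "ਅਤੇ"}


def preprocess(text: str) -> list[str]:
    """Preprocessing (assumed): split on whitespace and simple punctuation,
    including the danda '।', then drop empty tokens and stop words."""
    tokens = re.split(r"[\s।,.?!]+", text)
    return [t for t in tokens if t and t not in STOPWORDS]


def select_features(docs: list[list[str]], min_df: int = 2) -> set[str]:
    """Feature selection (assumed): document-frequency thresholding keeps
    only terms that occur in at least `min_df` training documents."""
    df = Counter(term for doc in docs for term in set(doc))
    return {term for term, n in df.items() if n >= min_df}


class NaiveBayes:
    """Classification (swapped-in technique): multinomial Naive Bayes
    with add-one (Laplace) smoothing over the selected vocabulary."""

    def fit(self, docs: list[list[str]], labels: list[str], vocab: set[str]) -> "NaiveBayes":
        self.vocab = vocab
        self.class_counts = Counter(labels)  # documents per class, for priors
        self.term_counts: dict[str, Counter] = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            # Count only terms that survived feature selection.
            self.term_counts[label].update(t for t in doc if t in vocab)
        self.total_docs = len(labels)
        return self

    def predict(self, doc: list[str]) -> str:
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.class_counts.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / self.total_docs)  # log prior
            for t in doc:
                if t in self.vocab:  # unseen / unselected terms are ignored
                    score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Toy training "documents" (keyword strings, not real sentences).
    train = [
        ("ਕ੍ਰਿਕਟ ਮੈਚ ਵਿੱਚ ਖਿਡਾਰੀ", "sports"),
        ("ਹਾਕੀ ਮੈਚ ਦੇ ਖਿਡਾਰੀ", "sports"),
        ("ਚੋਣ ਵਿੱਚ ਸਰਕਾਰ ਅਤੇ ਮੰਤਰੀ", "politics"),
        ("ਸਰਕਾਰ ਦੇ ਮੰਤਰੀ ਦੀ ਚੋਣ", "politics"),
    ]
    docs = [preprocess(text) for text, _ in train]
    labels = [label for _, label in train]
    vocab = select_features(docs, min_df=2)
    model = NaiveBayes().fit(docs, labels, vocab)
    print(model.predict(preprocess("ਹਾਕੀ ਖਿਡਾਰੀ ਦਾ ਮੈਚ।")))  # sports
```

Document-frequency thresholding is used here only because it is the simplest feature selection method to show; a scoring method such as information gain or chi-square could replace `select_features` without changing the other two stages.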

Index Terms

Computer Science
Information Sciences

Keywords

NLP, Text Mining, Text Classification, Feature Extraction