Research Article

Article: Part of Speech Taggers for Morphologically Rich Indian Languages: A Survey

by Dinesh Kumar, Gurpreet Singh Josan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 6 - Number 5
Year of Publication: 2010
Authors: Dinesh Kumar, Gurpreet Singh Josan
DOI: 10.5120/1078-1409

Dinesh Kumar, Gurpreet Singh Josan. Article:Part of Speech Taggers for Morphologically Rich Indian Languages: A Survey. International Journal of Computer Applications. 6, 5 (September 2010), 1-9. DOI=10.5120/1078-1409

@article{ 10.5120/1078-1409,
author = { Dinesh Kumar and Gurpreet Singh Josan },
title = { Article:Part of Speech Taggers for Morphologically Rich Indian Languages: A Survey },
journal = { International Journal of Computer Applications },
issue_date = { September 2010 },
volume = { 6 },
number = { 5 },
month = { September },
year = { 2010 },
issn = { 0975-8887 },
pages = { 1-9 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume6/number5/1078-1409/ },
doi = { 10.5120/1078-1409 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T19:54:36.643158+05:30
%A Dinesh Kumar
%A Gurpreet Singh Josan
%T Article:Part of Speech Taggers for Morphologically Rich Indian Languages: A Survey
%J International Journal of Computer Applications
%@ 0975-8887
%V 6
%N 5
%P 1-9
%D 2010
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Part-of-speech (POS) tagging is the natural language processing task of assigning each word in a text its part of speech, e.g., noun, verb or proper noun. It is an important preprocessing step for many language processing tasks. This paper surveys the POS taggers proposed for Indian languages such as Hindi, Punjabi, Malayalam, Bengali and Telugu. These taggers use a range of approaches, including Hidden Markov Models (HMM), Support Vector Machines (SVM), rule-based methods, Maximum Entropy (ME) and Conditional Random Fields (CRF). Because accuracy is the primary criterion for evaluating a POS tagger, the reported accuracy of each tagger is also discussed.
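To make the HMM approach and the accuracy measure concrete, here is a minimal sketch. It is not taken from the paper and does not reproduce any surveyed tagger. It shows a first-order HMM tagger that uses the Viterbi algorithm to pick the tag sequence maximizing P(t1)·P(w1|t1)·∏ P(ti|ti-1)·P(wi|ti). It also computes token-level accuracy against a gold tagging. The tag names (`NOUN`, `VERB`, `AUX`), the toy romanized Hindi sentence ("ram ghar jata hai", roughly "Ram goes home") and all probabilities are hypothetical, hand-picked values, not estimates from any corpus.

```python
import math

# Hypothetical toy model: tags, words and probabilities are invented for illustration.
TAGS = ["NOUN", "VERB", "AUX"]

# P(first tag)
START = {"NOUN": 0.8, "VERB": 0.1, "AUX": 0.1}

# P(tag_i | tag_{i-1})
TRANS = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.6, "AUX": 0.1},
    "VERB": {"NOUN": 0.1, "VERB": 0.1, "AUX": 0.8},
    "AUX": {"NOUN": 0.4, "VERB": 0.3, "AUX": 0.3},
}

# P(word | tag); unseen (word, tag) pairs fall back to a small floor probability
EMIT = {
    "NOUN": {"ram": 0.5, "ghar": 0.5},
    "VERB": {"jata": 1.0},
    "AUX": {"hai": 1.0},
}
UNK = 1e-6


def viterbi(words):
    """Return the most probable tag sequence for `words` under the toy HMM."""
    if not words:
        return []
    # best[i][t]: log-probability of the best path that ends in tag t at position i
    best = [{t: math.log(START[t]) + math.log(EMIT[t].get(words[0], UNK)) for t in TAGS}]
    back = [{}]  # back[i][t]: previous tag on that best path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in TAGS:
            prev, score = max(
                ((p, best[i - 1][p] + math.log(TRANS[p][t])) for p in TAGS),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + math.log(EMIT[t].get(words[i], UNK))
            back[i][t] = prev
    # Start from the best final tag and follow the back-pointers
    path = [max(TAGS, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


def accuracy(predicted, gold):
    """Token-level accuracy: fraction of tokens whose predicted tag equals the gold tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    sentence = ["ram", "ghar", "jata", "hai"]
    gold = ["NOUN", "NOUN", "VERB", "AUX"]
    pred = viterbi(sentence)
    print(pred, accuracy(pred, gold))  # prints ['NOUN', 'NOUN', 'VERB', 'AUX'] 1.0
```

Log-probabilities are used so that long sentences do not underflow. A real tagger would estimate `START`, `TRANS` and `EMIT` from a tagged corpus. It would also need a better treatment of unknown words than a fixed floor, for example suffix information, which matters for morphologically rich languages.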

Index Terms

Computer Science
Information Sciences

Keywords

HMM, Tagging, Stochastic, Tagset, Finite State Automata, Suffix, Prefix, Support Vector Machines, Stemming, Maximum Entropy, Corpora, Tags, Morphology