Parts of Speech Tagging for Indian Languages: A Literature Survey

Antony P.J.; Soman K.P.

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 34 - Number 8

Year of Publication: 2011

DOI: 10.5120/4119-5993

Antony P.J., Soman K.P. Parts of Speech Tagging for Indian Languages: A Literature Survey. International Journal of Computer Applications. 34, 8 (November 2011), 22-29. DOI=10.5120/4119-5993

@article{ 10.5120/4119-5993,

author = { Antony P.J. and Soman K.P. },

title = { Parts of Speech Tagging for Indian Languages: A Literature Survey },

journal = { International Journal of Computer Applications },

issue_date = { November 2011 },

volume = { 34 },

number = { 8 },

month = { November },

year = { 2011 },

issn = { 0975-8887 },

pages = { 22-29 },

numpages = {8},

url = { https://ijcaonline.org/archives/volume34/number8/4119-5993/ },

doi = { 10.5120/4119-5993 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

Abstract

Part-of-speech (POS) tagging is the process of assigning a part-of-speech tag or other lexical class marker to every word in a sentence. POS tagging is considered one of the basic, necessary tools in many Natural Language Processing (NLP) applications, such as word sense disambiguation, information retrieval, information processing, parsing, question answering, and machine translation. Identifying the ambiguities of lexical items is the main challenge in developing an efficient and accurate POS tagger. The literature survey shows that, for Indian languages, POS taggers have been developed only for Hindi, Bengali, Punjabi, and the Dravidian languages. Some POS taggers have also been developed that are generic to Hindi, Bengali, and Telugu. All proposed POS taggers are based on different tagsets, developed by different organizations and individuals. This paper surveys the developments in POS taggers and POS tagsets for Indian languages, which are essential computational linguistic tools for many NLP applications.
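To make the tagging task and the ambiguity problem concrete, the following is a minimal sketch of a most-frequent-tag (unigram) baseline. It is not one of the surveyed taggers. The toy English training sentences, the tag names (DET, NOUN, VERB, ADJ, PRON), and the function names are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical toy training data: sentences of (word, tag) pairs.
# The tagset here is invented for the example, not taken from the paper.
TRAIN = [
    [("the", "DET"), ("book", "NOUN"), ("is", "VERB"), ("old", "ADJ")],
    [("they", "PRON"), ("book", "VERB"), ("a", "DET"), ("flight", "NOUN")],
    [("a", "DET"), ("book", "NOUN"), ("fell", "VERB")],
]


def train_unigram_tagger(sentences):
    """Count how often each word form occurs with each tag."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return counts


def tag(words, counts, default="NOUN"):
    """Assign each word its most frequent training tag.

    Unknown words receive the (assumed) default tag.
    """
    tagged = []
    for word in words:
        seen = counts.get(word.lower())
        best = seen.most_common(1)[0][0] if seen else default
        tagged.append((word, best))
    return tagged


def ambiguous_words(counts):
    """Return word forms that were observed with more than one tag."""
    return sorted(w for w, c in counts.items() if len(c) > 1)


model = train_unigram_tagger(TRAIN)
print(tag("they book a flight".split(), model))
print(ambiguous_words(model))
```

In this toy data "book" occurs as both a noun and a verb, so the baseline tags it NOUN even in "they book a flight", where it is a verb. A tagger that ignores context cannot resolve this kind of lexical ambiguity, which is why it is the central difficulty the abstract describes.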

Index Terms

Computer Science

Information Sciences

Keywords

Ambiguity, Tagset, Information Retrieval, Data Driven System, Foreign Languages