International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 34 - Number 8
Year of Publication: 2011
Authors: Antony P.J., Soman K.P.
DOI: 10.5120/4119-5993
Antony P.J., Soman K.P. Parts of Speech Tagging for Indian Languages: A Literature Survey. International Journal of Computer Applications. 34, 8 (November 2011), 22-29. DOI=10.5120/4119-5993
Part of speech (POS) tagging is the process of assigning a part of speech tag or other lexical class marker to each word in a sentence. In many natural language processing (NLP) applications, such as word sense disambiguation, information retrieval, information processing, parsing, question answering, and machine translation, POS tagging is considered one of the basic necessary tools. Resolving the ambiguities of lexical items is the main challenge in developing an efficient and accurate POS tagger. The literature survey shows that, for Indian languages, POS taggers have been developed only for Hindi, Bengali, Panjabi and the Dravidian languages. Some generic POS taggers have also been developed for Hindi, Bengali and Telugu. The proposed POS taggers are based on different tagsets, developed by different organizations and individuals. This paper surveys the various developments in POS taggers and POS tagsets for Indian languages, which are essential computational linguistic tools for many NLP applications.