Research Article

A Maximum Entropy Approach to Kannada Part Of Speech Tagging

by Shambhavi.b. R, Ramakanth Kumar P, Revanth G
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 41 - Number 13
Year of Publication: 2012
Authors: Shambhavi.b. R, Ramakanth Kumar P, Revanth G
DOI: 10.5120/5600-7852

Shambhavi.b. R, Ramakanth Kumar P, Revanth G. A Maximum Entropy Approach to Kannada Part Of Speech Tagging. International Journal of Computer Applications. 41, 13 (March 2012), 9-12. DOI=10.5120/5600-7852

@article{ 10.5120/5600-7852,
author = { Shambhavi.b. R and Ramakanth Kumar P and Revanth G },
title = { A Maximum Entropy Approach to Kannada Part Of Speech Tagging },
journal = { International Journal of Computer Applications },
issue_date = { March 2012 },
volume = { 41 },
number = { 13 },
month = { March },
year = { 2012 },
issn = { 0975-8887 },
pages = { 9-12 },
numpages = {4},
url = { https://ijcaonline.org/archives/volume41/number13/5600-7852/ },
doi = { 10.5120/5600-7852 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:29:30.079406+05:30
%A Shambhavi.b. R
%A Ramakanth Kumar P
%A Revanth G
%T A Maximum Entropy Approach to Kannada Part Of Speech Tagging
%J International Journal of Computer Applications
%@ 0975-8887
%V 41
%N 13
%P 9-12
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Part Of Speech (POS) tagging is the most important pre-processing step in almost all Natural Language Processing (NLP) applications. It is the process of labelling each word in a text with its appropriate part of speech. In this paper, the Maximum Entropy model, a probabilistic classification technique, is applied to the tagging of Kannada sentences. Kannada is agglutinative and morphologically very rich, but resource-poor. Hence, 51,267 words from the EMILLE corpus were manually tagged and used as training data. The tagset included 25 tags as defined for Indian languages. The best-suited feature set for the language was finalised after rigorous experiments. A test set of 2,892 word forms was downloaded from Kannada websites. An accuracy of 81.6% was obtained in the experiments, which shows that Maximum Entropy is well suited to the Kannada language.
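
To make the idea in the abstract concrete, the following is a minimal sketch of a Maximum Entropy tagger, in which the probability of a tag given its context is a normalised exponential of summed feature weights. It is not the authors' implementation: the feature set (word form, suffixes, previous word, previous tag), the training settings, the greedy left-to-right decoding, the three tag names, and the tiny romanised toy corpus are all assumptions chosen only for illustration.

```python
import math
from collections import defaultdict

def features(words, i, prev_tag):
    """Context features for word i (a hypothetical feature set, not the paper's)."""
    w = words[i]
    return [
        "bias",
        "word=" + w,
        "suffix2=" + w[-2:],   # suffixes are a common cue in agglutinative languages
        "suffix3=" + w[-3:],
        "prev_tag=" + prev_tag,
        "prev_word=" + (words[i - 1] if i > 0 else "<s>"),
    ]

def probs(weights, tags, feats):
    """Maximum Entropy form: p(tag | context) = exp(sum of active weights) / Z."""
    scores = {t: sum(weights[(f, t)] for f in feats) for t in tags}
    top = max(scores.values())                      # subtract max for stability
    exps = {t: math.exp(s - top) for t, s in scores.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

def train(sentences, epochs=50, lr=0.5, l2=0.01):
    """Fit weights by stochastic gradient ascent on the L2-regularised log-likelihood."""
    tags = sorted({t for sent in sentences for _, t in sent})
    weights = defaultdict(float)
    for _ in range(epochs):
        for sent in sentences:
            words = [w for w, _ in sent]
            prev = "<s>"
            for i, (_, gold) in enumerate(sent):
                feats = features(words, i, prev)
                p = probs(weights, tags, feats)
                for f in feats:
                    for t in tags:
                        # gradient = observed feature count - expected feature count
                        grad = (1.0 if t == gold else 0.0) - p[t]
                        weights[(f, t)] += lr * (grad - l2 * weights[(f, t)])
                prev = gold                          # gold previous tag during training
    return weights, tags

def tag(weights, tags, words):
    """Greedy decoding: pick the most probable tag at each position."""
    out, prev = [], "<s>"
    for i in range(len(words)):
        p = probs(weights, tags, features(words, i, prev))
        prev = max(tags, key=lambda t: p[t])
        out.append(prev)
    return out

# Toy corpus of romanised tokens with illustrative tags; NOT data from the paper.
TOY_CORPUS = [
    [("avanu", "PRP"), ("shaalege", "NN"), ("hoodanu", "VM")],
    [("avalu", "PRP"), ("manege", "NN"), ("bandalu", "VM")],
    [("naanu", "PRP"), ("pustaka", "NN"), ("oodidenu", "VM")],
]

weights, tags = train(TOY_CORPUS)
print(tag(weights, tags, ["avalu", "manege", "bandalu"]))
```

A realistic tagger would be trained on a manually tagged corpus of the size the abstract describes and would typically replace the greedy step with a beam or Viterbi-style search; the sketch keeps greedy decoding only for brevity.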

Index Terms

Computer Science
Information Sciences

Keywords

Natural Language Processing, Part Of Speech Tagging, Maximum Entropy