Research Article

Kannada Part-Of-Speech Tagging with Probabilistic Classifiers

by Shambhavi B R, Ramakanth Kumar P
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 48 - Number 17
Year of Publication: 2012
Authors: Shambhavi B R, Ramakanth Kumar P
10.5120/7442-0452

Shambhavi B R, Ramakanth Kumar P. Kannada Part-Of-Speech Tagging with Probabilistic Classifiers. International Journal of Computer Applications. 48, 17 (June 2012), 26-30. DOI=10.5120/7442-0452

@article{ 10.5120/7442-0452,
author = { Shambhavi B R and Ramakanth Kumar P },
title = { Kannada Part-Of-Speech Tagging with Probabilistic Classifiers },
journal = { International Journal of Computer Applications },
issue_date = { June 2012 },
volume = { 48 },
number = { 17 },
month = { June },
year = { 2012 },
issn = { 0975-8887 },
pages = { 26-30 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume48/number17/7442-0452/ },
doi = { 10.5120/7442-0452 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:44:21.090080+05:30
%A Shambhavi B R
%A Ramakanth Kumar P
%T Kannada Part-Of-Speech Tagging with Probabilistic Classifiers
%J International Journal of Computer Applications
%@ 0975-8887
%V 48
%N 17
%P 26-30
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Part-Of-Speech (POS) tagging is the Natural Language Processing (NLP) task of labeling each word in a sentence with a tag indicating its appropriate part of speech. Among the supervised machine learning classification algorithms, the second-order Hidden Markov Model (HMM) and Conditional Random Fields (CRF) are chosen in this work for POS tagging of the Kannada language. The training data comprise 51,269 words and the test data around 2,932 tokens; the two sets are disjoint and both are taken from the EMILLE corpus. Experiments show that the tools based on HMM and CRF achieve accuracies of 79.9% and 84.58%, respectively.
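
As an illustration of what a second-order HMM tagger computes, the following is a minimal sketch and not the tool evaluated in this work. It assumes add-one smoothing for both transition and emission probabilities, and it uses a tiny hypothetical English training set (TOY_TRAIN) in place of the EMILLE data; all function and variable names are illustrative. Each tag is conditioned on the two preceding tags, decoding uses the Viterbi algorithm over tag pairs, and accuracy is the fraction of tokens whose predicted tag matches the reference tag.

```python
import math
from collections import defaultdict

START, STOP = "<s>", "</s>"

# Hypothetical toy data (NOT from the EMILLE corpus): lists of (word, tag) pairs.
TOY_TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]

def train(tagged_sentences):
    """Count tag trigrams, their two-tag contexts, and word emissions."""
    trans, context = defaultdict(int), defaultdict(int)
    emit, tag_count, vocab = defaultdict(int), defaultdict(int), set()
    for sent in tagged_sentences:
        tags = [START, START] + [t for _, t in sent] + [STOP]
        for i in range(2, len(tags)):
            trans[(tags[i - 2], tags[i - 1], tags[i])] += 1
            context[(tags[i - 2], tags[i - 1])] += 1
        for word, tag in sent:
            emit[(tag, word)] += 1
            tag_count[tag] += 1
            vocab.add(word)
    return {"trans": trans, "context": context, "emit": emit,
            "tag_count": tag_count, "vocab": vocab, "tags": sorted(tag_count)}

def log_trans(m, u, v, t):
    """log P(t | u, v) with add-one smoothing (an assumption of this sketch)."""
    n = len(m["tags"]) + 1  # all tags plus STOP
    return math.log((m["trans"].get((u, v, t), 0) + 1)
                    / (m["context"].get((u, v), 0) + n))

def log_emit(m, t, word):
    """log P(word | t) with add-one smoothing; one extra slot for unknown words."""
    n = len(m["vocab"]) + 1
    return math.log((m["emit"].get((t, word), 0) + 1) / (m["tag_count"][t] + n))

def viterbi(m, words):
    """Return the most probable tag sequence; states are (previous tag, current tag)."""
    if not words:
        return []
    best = {(START, START): 0.0}
    back = []  # back[i][state after word i] = state before word i
    for word in words:
        new_best, ptr = {}, {}
        for (u, v), score in best.items():
            for t in m["tags"]:
                s = score + log_trans(m, u, v, t) + log_emit(m, t, word)
                if (v, t) not in new_best or s > new_best[(v, t)]:
                    new_best[(v, t)] = s
                    ptr[(v, t)] = (u, v)
        best = new_best
        back.append(ptr)
    # Include the transition into STOP when choosing the final state.
    state = max(best, key=lambda st: best[st] + log_trans(m, st[0], st[1], STOP))
    tags = []
    for ptr in reversed(back):
        tags.append(state[1])
        state = ptr[state]
    return tags[::-1]

def accuracy(m, tagged_sentences):
    """Token-level accuracy: correctly tagged tokens divided by all tokens."""
    correct = total = 0
    for sent in tagged_sentences:
        predicted = viterbi(m, [w for w, _ in sent])
        correct += sum(p == t for p, (_, t) in zip(predicted, sent))
        total += len(sent)
    return correct / total if total else 0.0

if __name__ == "__main__":
    model = train(TOY_TRAIN)
    print(viterbi(model, ["a", "cat", "barks"]))
```

With these toy counts, the unseen sentence "a cat barks" is tagged DET, NOUN, VERB. Add-one smoothing is chosen here only for brevity; practical taggers typically use more refined smoothing and dedicated handling of unknown words.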

Index Terms

Computer Science
Information Sciences

Keywords

Natural Language Processing, Part Of Speech Tagging, Hidden Markov Model, Conditional Random Fields