Research Article

Maximum Entropy Approach based Named Entity Recognition in Punjabi Language

by Arshdeep Singh, Jyoti Rani, Amandeep Kaur
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 84 - Number 3
Year of Publication: 2013
Authors: Arshdeep Singh, Jyoti Rani, Amandeep Kaur
10.5120/14553-2650

Arshdeep Singh, Jyoti Rani and Amandeep Kaur. Maximum Entropy Approach based Named Entity Recognition in Punjabi Language. International Journal of Computer Applications 84, 3 (December 2013), 1-5. DOI=10.5120/14553-2650

@article{ 10.5120/14553-2650,
author = { Arshdeep Singh and Jyoti Rani and Amandeep Kaur },
title = { Maximum Entropy Approach based Named Entity Recognition in Punjabi Language },
journal = { International Journal of Computer Applications },
issue_date = { December 2013 },
volume = { 84 },
number = { 3 },
month = { December },
year = { 2013 },
issn = { 0975-8887 },
pages = { 1-5 },
numpages = {5},
url = { https://ijcaonline.org/archives/volume84/number3/14553-2650/ },
doi = { 10.5120/14553-2650 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Arshdeep Singh
%A Jyoti Rani
%A Amandeep Kaur
%T Maximum Entropy Approach based Named Entity Recognition in Punjabi Language
%J International Journal of Computer Applications
%@ 0975-8887
%V 84
%N 3
%P 1-5
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Named Entity Recognition (NER) is the task of identifying named entities in text and classifying them into predefined categories such as person, location and organization. NER is used in many applications, including text summarization, text classification, question answering and machine translation. For English, a great deal of NER work has already been done, and capitalization is a key cue for rules; Indian languages lack this feature, which makes the task harder for them. This work reports the evaluation of an NER system for the Punjabi language based on the Maximum Entropy approach (MAXENT). The evaluation uses a manually tagged Punjabi news corpus developed from Punjabi newspapers available online, and the training set is annotated with an NE tagset of 12 tags. The MAXENT-based Punjabi NER system achieves an overall Precision, Recall and F-score of 90.92%, 72.30% and 80.55% respectively, using a feature set of context words, Part of Speech (POS) information, the NE tag of the previous word and a first-name gazetteer list.
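The abstract names the four feature types but not their exact templates. The sketch below shows, under stated assumptions, how a MaxEnt (log-linear) tagger of this kind might turn them into binary features and score each label as p(y|x) = exp(Σ_j λ_j f_j(x, y)) / Z(x). Everything concrete here is hypothetical: the feature-name templates, the ±1 context window, the three-label subset (PERSON, LOCATION, O) instead of the paper's 12-tag set, the romanized example tokens, the POS labels and the hand-set weights. Real weights would be estimated from the tagged corpus, for example with generalized iterative scaling (Darroch and Ratcliff, 1972). Greedy left-to-right decoding is also an assumption; it is simply one way to make the previous word's predicted NE tag available as a feature. The helper `f_score` checks that the reported F-score is the harmonic mean of the reported precision and recall.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical toy label subset; the paper's system uses a 12-tag NE tagset.
LABELS = ["PERSON", "LOCATION", "O"]


def extract_features(words: List[str], pos_tags: List[str], i: int,
                     prev_ne: str, first_names: Set[str]) -> List[str]:
    """Binary feature names for token i (hypothetical templates)."""
    feats = [
        f"w0={words[i]}",        # current word
        f"pos0={pos_tags[i]}",   # POS information
        f"prevNE={prev_ne}",     # NE tag predicted for the previous word
        f"w-1={words[i - 1]}" if i > 0 else "w-1=<S>",                # left context word
        f"w+1={words[i + 1]}" if i + 1 < len(words) else "w+1=</S>",  # right context word
    ]
    if words[i] in first_names:  # first-name gazetteer lookup
        feats.append("in_first_name_gazetteer")
    return feats


def maxent_probs(feats: List[str],
                 weights: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """p(y|x) = exp(sum_j lambda_j * f_j(x, y)) / Z(x) over LABELS."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in feats) for y in LABELS}
    top = max(scores.values())  # subtract the max score for numerical stability
    unnorm = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(unnorm.values())    # partition function Z(x)
    return {y: v / z for y, v in unnorm.items()}


def greedy_tag(words: List[str], pos_tags: List[str],
               weights: Dict[Tuple[str, str], float],
               first_names: Set[str]) -> List[str]:
    """Tag left to right so each token sees the previous predicted NE tag."""
    tags: List[str] = []
    prev = "<S>"
    for i in range(len(words)):
        feats = extract_features(words, pos_tags, i, prev, first_names)
        probs = maxent_probs(feats, weights)
        prev = max(probs, key=probs.get)  # most probable label for this token
        tags.append(prev)
    return tags


def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hand-set illustrative weights keyed by (feature, label); not trained values.
EXAMPLE_WEIGHTS = {
    ("in_first_name_gazetteer", "PERSON"): 2.0,
    ("pos0=NNP", "PERSON"): 1.0,
    ("pos0=NNP", "LOCATION"): 1.0,
    ("prevNE=PERSON", "LOCATION"): 0.5,
    ("w+1=gaya", "LOCATION"): 1.5,
    ("pos0=VM", "O"): 2.0,
}

words = ["arshdeep", "ludhiane", "gaya"]  # hypothetical romanized tokens
pos_tags = ["NNP", "NNP", "VM"]
print(greedy_tag(words, pos_tags, EXAMPLE_WEIGHTS, {"arshdeep"}))  # ['PERSON', 'LOCATION', 'O']
print(round(f_score(90.92, 72.30), 2))                             # 80.55
```

With these toy weights the first token is labelled PERSON mainly through the gazetteer feature, and the second becomes LOCATION partly because the previous predicted tag is PERSON, which is exactly the kind of dependency the previous-NE-tag feature is meant to capture. The F-score check reproduces the 80.55% reported above from 90.92% precision and 72.30% recall.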

References
  1. Bikel, D. M., Miller, S., Schwartz, R. and Weischedel, R. (1997). Nymble: a high-performance learning name-finder. In Proceedings of the Fifth Conference on Applied Natural Language Processing, pp. 194-201, San Francisco, CA, USA.
  2. Borthwick, A. (1999). Maximum Entropy Approach to Named Entity Recognition. Ph.D. thesis, New York University.
  3. Carreras, X., Màrquez, L. and Padró, L. (2002). Named Entity Extraction using AdaBoost. In Proceedings of the Conference on Computational Natural Language Learning (CoNLL), pp. 167-170, Taipei, Taiwan.
  4. Darroch, J. N. and Ratcliff, D. (1972). Generalized iterative scaling for log-linear models. Annals of Mathematical Statistics, pp. 1470-1480.
  5. Ekbal, A., Haque, R., Das, A., Poka, V. and Bandyopadhyay, S. (2008a). Language Independent Named Entity Recognition in Indian Languages. In Proceedings of the IJCNLP-08 Workshop on NER for South and South East Asian Languages, pp. 33-40, Hyderabad, India.
  6. Ekbal, A. and Bandyopadhyay, S. (2008b). Bengali Named Entity Recognition using Support Vector Machine. In Proceedings of the IJCNLP-08 Workshop on NER for South and South East Asian Languages, pp. 51-58, Hyderabad, India.
  7. Gali, K., Surana, H., Vaidya, A., Shishtla, P. and Sharma, D. M. (2008). Aggregating Machine Learning and Rule Based Heuristics for Named Entity Recognition. In Proceedings of the IJCNLP-08 Workshop on NER for South and South East Asian Languages, pp. 25-31, Hyderabad, India.
  8. Grishman, R. (1995). The NYU System for MUC-6 or Where's the Syntax? In Proceedings of the Sixth Message Understanding Conference (MUC-6), pp. 167-195, Fairfax, Virginia.
  9. Hasanuzzaman, M., Ekbal, A. and Bandyopadhyay, S. (2009). Maximum Entropy Approach for Named Entity Recognition in Bengali and Hindi. International Journal of Recent Trends in Engineering, vol. 1, no. 1, pp. 408-412.
  10. Kaur, A., Josan, G. S. and Kaur, J. (2009). Named Entity Recognition in Punjabi Language: A Conditional Random Field Approach. In Proceedings of the International Conference on Natural Language Processing, pp. 277-282.
  11. Krishnarao, A. A., Gahlot, H., Srinet, A. and Kushwaha, D. S. (2009). A Comparison of Performance of Sequential Learning Algorithms on the Task of Named Entity Recognition for Indian Languages. In Proceedings of the 9th International Conference on Computer Science, pp. 123-132, Baton Rouge, LA, USA.
  12. Kumar, N. and Bhattacharyya, P. (2006). Named Entity Recognition in Hindi using MEMM. Technical Report, IIT Bombay.
  13. Lafferty, J. D., McCallum, A. and Pereira, F. C. N. (2001). Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proceedings of the International Conference on Machine Learning, pp. 282-289, Williams College, Williamstown, MA, USA.
  14. Della Pietra, S., Della Pietra, V. and Lafferty, J. (1997). Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4), pp. 380-393.
  15. Raju, G. V. S., Srinivasu, B., Raju, S. V. and Kumar, K. S. M. V. (2008). Named Entity Recognition for Telugu using Maximum Entropy Model. Journal of Theoretical and Applied Information Technology, pp. 125-130.
  16. Ramshaw, L. and Marcus, M. (1995). Text chunking using transformation-based learning. In Proceedings of the Third Workshop on Very Large Corpora, pp. 82-94, Association for Computational Linguistics, Somerset, New Jersey.
  17. Saha, S. K., Ghosh, P. S., Sarkar, S. and Mitra, P. (2008). Named Entity Recognition in Hindi using Maximum Entropy and Transliteration. Polibits 38, pp. 33-42.
  18. Shishtla, P. M., Gali, K., Pingali, P. and Varma, V. (2008). Experiments in Telugu NER: A Conditional Random Field Approach. In Proceedings of the IJCNLP-08 Workshop on NER for South and South East Asian Languages, pp. 105-110, Hyderabad, India.
  19. Sekine, S. and Eriguchi, Y. (2000). Japanese named entity extraction evaluation: analysis of results. In Proceedings of the 18th International Conference on Computational Linguistics (COLING), pp. 1106-1110.
  20. Srihari, R., Niu, C. and Li, W. (2000). A Hybrid Approach for Named Entity and Sub-Type Tagging. In Proceedings of the Sixth Conference on Applied Natural Language Processing, pp. 247-254, Washington, USA.
  21. Yamada, H., Kudo, T. and Matsumoto, Y. (2002). Japanese named entity extraction using support vector machine. Transactions of Information Processing Society of Japan, pp. 44-53.
Index Terms

Computer Science
Information Sciences

Keywords

Named Entity Recognition, Named Entity, Maximum Entropy, NLP, Punjabi