Analyzing Probability Vectors for Named Entity Statistical Machine Transliteration

M. L. Dhore; S. K. Dixit; T. D. Sonwalkar

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 55 - Number 10

Year of Publication: 2012

Authors: M. L. Dhore, S. K. Dixit, T. D. Sonwalkar

10.5120/8791-2776

M. L. Dhore, S. K. Dixit, T. D. Sonwalkar. Analyzing Probability Vectors for Named Entity Statistical Machine Transliteration. International Journal of Computer Applications. 55, 10 (October 2012), 28-34. DOI=10.5120/8791-2776

@article{10.5120/8791-2776,
author = {M. L. Dhore and S. K. Dixit and T. D. Sonwalkar},
title = {Analyzing Probability Vectors for Named Entity Statistical Machine Transliteration},
journal = {International Journal of Computer Applications},
issue_date = {October 2012},
volume = {55},
number = {10},
month = {October},
year = {2012},
issn = {0975-8887},
pages = {28-34},
numpages = {7},
url = {https://ijcaonline.org/archives/volume55/number10/8791-2776/},
doi = {10.5120/8791-2776},
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T20:56:53.753958+05:30
%A M. L. Dhore
%A S. K. Dixit
%A T. D. Sonwalkar
%T Analyzing Probability Vectors for Named Entity Statistical Machine Transliteration
%J International Journal of Computer Applications
%@ 0975-8887
%V 55
%N 10
%P 28-34
%D 2012
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Machine transliteration systems are classified as either rule-based or statistical. A rule-based method transliterates names using a large set of hand-crafted rules; such systems are simple to implement but require a great deal of language expertise. Statistical methods instead recast transliteration as a classification problem and solve it with a statistical model. Although these methods do not require expert knowledge of the language, they need large amounts of bilingual data and a good training algorithm. The basic Markov Chain Model (MM), Extended Markov Chain (EMC), Hidden Markov Model (HMM), Conditional Random Fields (CRF), Decision Tree (DT), Maximum Entropy Markov Model (MEMM) and Support Vector Machine (SVM) are currently the statistical approaches most widely used by researchers around the world. This paper presents a mathematical analysis of these statistical approaches to named entity machine transliteration, intended to help new researchers understand the mathematics behind them.
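The abstract lists the Hidden Markov Model among the main statistical approaches. The sketch below is a hypothetical illustration, not the authors' formulation: it assumes the hidden states are Devanagari target characters, the observations are Roman source letters, and every probability is invented for the example. The names `viterbi`, `safe_log`, `STATES`, `START_P`, `TRANS_P` and `EMIT_P` are illustrative only. Under the first-order Markov assumption, Viterbi decoding returns the target sequence that maximizes P(target) · P(source | target).

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy HMM for illustration only; none of these numbers come from the paper.
# Hidden states: Devanagari target characters. Observations: Roman source letters.
STATES: List[str] = ["र", "ा", "म", "अ"]
START_P: Dict[str, float] = {"र": 0.4, "ा": 0.0, "म": 0.3, "अ": 0.3}
TRANS_P: Dict[str, Dict[str, float]] = {
    "र": {"र": 0.1, "ा": 0.6, "म": 0.2, "अ": 0.1},
    "ा": {"र": 0.4, "ा": 0.0, "म": 0.5, "अ": 0.1},
    "म": {"र": 0.2, "ा": 0.6, "म": 0.1, "अ": 0.1},
    "अ": {"र": 0.4, "ा": 0.0, "म": 0.4, "अ": 0.2},
}
EMIT_P: Dict[str, Dict[str, float]] = {
    "र": {"r": 0.9, "a": 0.05, "m": 0.05},
    "ा": {"r": 0.05, "a": 0.9, "m": 0.05},
    "म": {"r": 0.05, "a": 0.05, "m": 0.9},
    "अ": {"r": 0.1, "a": 0.8, "m": 0.1},
}


def safe_log(p: float) -> float:
    """log(p), mapping zero probability to -inf so impossible paths never win."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, states, start_p, trans_p, emit_p) -> Tuple[List[str], float]:
    """Return the most probable hidden-state sequence for obs and its log-probability."""
    # delta[t][s]: best log-probability of any state path that ends in s at step t
    delta = [{s: safe_log(start_p[s]) + safe_log(emit_p[s].get(obs[0], 0.0)) for s in states}]
    back: List[Dict[str, str]] = [{}]  # back[t][s]: best predecessor of s at step t
    for t in range(1, len(obs)):
        delta.append({})
        back.append({})
        for s in states:
            # Best way to reach s: maximize over every possible previous state r.
            prev, score = max(
                ((r, delta[t - 1][r] + safe_log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            delta[t][s] = score + safe_log(emit_p[s].get(obs[t], 0.0))
            back[t][s] = prev
    # Pick the best final state, then follow back-pointers to recover the path.
    last = max(states, key=lambda s: delta[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, delta[-1][last]


path, log_prob = viterbi(list("ram"), STATES, START_P, TRANS_P, EMIT_P)
print("".join(path), round(math.exp(log_prob), 5))
```

On the toy input "ram" this decodes to राम. Working in log space avoids numeric underflow on longer names, and mapping zero probabilities to negative infinity guarantees that impossible choices, such as a name starting with the vowel sign ा, are never selected.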

Index Terms

Computer Science

Information Sciences

Keywords

Conditional Random Fields, Decision Trees, Hidden Markov Model, Markov Chain, Statistical Machine Transliteration