International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 55 - Number 10
Year of Publication: 2012
Authors: M. L. Dhore, S. K. Dixit, T. D. Sonwalkar
DOI: 10.5120/8791-2776
M. L. Dhore, S. K. Dixit, T. D. Sonwalkar. Analyzing Probability Vectors for Named Entity Statistical Machine Transliteration. International Journal of Computer Applications. 55, 10 (October 2012), 28-34. DOI=10.5120/8791-2776
Machine transliteration systems are classified as either rule-based or statistical. A rule-based method transliterates names using a large set of hand-crafted rules. Such systems are simple to implement but require a great deal of language expertise. Statistical methods recast transliteration as a classification problem and employ a statistical model to solve it. Although these methods do not require expert knowledge of the language, they need large amounts of bilingual data and a good training algorithm. Currently, the basic Markov Chain Model (MM), Extended Markov Chain (EMC), Hidden Markov Model (HMM), Conditional Random Fields (CRF), Decision Tree (DT), Maximum Entropy Markov Model (MEMM), and Support Vector Machine (SVM) are the popular statistical approaches used by researchers. This paper presents a mathematical analysis of the statistical approaches used in machine transliteration of named entities, intended to help new researchers understand the mathematics behind these methods.