International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 48 - Number 23
Year of Publication: 2012
Authors: Manikrao L Dhore, Shantanu K Dixit, Tushar D Sonwalkar
DOI: 10.5120/7522-0624
Manikrao L Dhore, Shantanu K Dixit, Tushar D Sonwalkar. Hindi to English Machine Transliteration of Named Entities using Conditional Random Fields. International Journal of Computer Applications. 48, 23 (June 2012), 31-37. DOI=10.5120/7522-0624
Machine transliteration has received significant research attention in recent years. In most cases, the source language has been English and the target language has been an Asian language. This paper focuses on Hindi to English machine transliteration of Indian named entities, such as proper nouns, place names and organization names, using conditional random fields (CRF). Hindi is the national language of India and is spoken by more than 500 million Indians. Hindi is the world's fourth most commonly used language after Chinese, English and Spanish. The system takes an Indian place name written in Hindi in the Devanagari script as input and transliterates it into English. The input is supplied to the system as a sequence of syllabic units so that n-gram techniques can be applied. Because more than 50% of named entities are formed as a combination of two or three syllabic units, the n-gram approach uses unigrams, bigrams and trigrams of Hindi syllabic units to train the system on the corpus. The system gives satisfactory performance with trigrams as compared to unigrams and bigrams.
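As an illustration of the n-gram idea described in the abstract, the following is a minimal Python sketch of how unigrams, bigrams and trigrams might be built over a name that has already been split into syllabic units. It is not the authors' implementation: the function names `ngrams` and `position_features`, the `<s>` padding symbol, the feature keys and the hand-written segmentation of मुंबई are all assumptions made for this example, and the feature dictionaries are only one plausible form of input to a CRF toolkit.

```python
from typing import Dict, List, Tuple


def ngrams(units: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams over a list of syllabic units."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return [tuple(units[i:i + n]) for i in range(len(units) - n + 1)]


def position_features(units: List[str], i: int, max_n: int = 3) -> Dict[str, str]:
    """Features for unit i: the unit plus its left context, up to max_n units.

    Positions before the start of the name are padded with "<s>"
    (a hypothetical start-of-name symbol chosen for this sketch).
    """
    padded = ["<s>"] * (max_n - 1) + units
    j = i + max_n - 1  # index of unit i inside the padded list
    feats: Dict[str, str] = {}
    for n in range(1, max_n + 1):
        feats[f"{n}gram"] = "|".join(padded[j - n + 1:j + 1])
    return feats


if __name__ == "__main__":
    # Hand-written segmentation for illustration only; a real system
    # would obtain the units from a Devanagari syllabifier.
    name_units = ["मुं", "ब", "ई"]
    for n in (1, 2, 3):
        print(n, ngrams(name_units, n))
    for i in range(len(name_units)):
        print(position_features(name_units, i))
```

With features of this kind, each syllabic unit is paired with its English transliteration as the label, and the trigram setting simply gives the model a wider left context than the unigram or bigram settings.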