International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 27 - Number 8
Year of Publication: 2011
Authors: Khin Thandar Nwet, Khin Mar Soe, Ni Lar Thein |
DOI: 10.5120/3322-4566
Khin Thandar Nwet, Khin Mar Soe, Ni Lar Thein. Developing Word-aligned Myanmar-English Parallel Corpus based on the IBM Models. International Journal of Computer Applications. 27, 8 (August 2011), 12-18. DOI=10.5120/3322-4566
Word alignment in bilingual corpora has been an active research topic in the machine translation community. A corpus is a collection of texts that is useful for Natural Language Processing (NLP). Parallel text alignment is the identification of corresponding sentences in a parallel text. Large collections of parallel text are a prerequisite for many areas of linguistic research. A parallel corpus helps in building statistical bilingual dictionaries, in supporting statistical machine translation, and in providing training data for word sense disambiguation and translation disambiguation. Nowadays the world is a global network and more and more people learn more than one language, so the processing of multilingual corpora is becoming increasingly important. The main purpose of this system is therefore to construct a word-aligned parallel corpus for use in Myanmar-English machine translation. One useful concept is to identify correspondences between the words of one language and the words of the other. The proposed approach is based on the first three IBM models and the EM algorithm. The paper also shows that the approach can be improved by using a list of cognates and morphological analysis.
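To make the role of the EM algorithm concrete, the following is a minimal sketch of IBM Model 1 training, the simplest of the three models named above. It is not the authors' implementation: it omits the NULL word, the Model 2 and Model 3 parameters (alignment positions and fertility), and the cognate and morphological extensions. The function names `train_ibm_model1` and `align` are hypothetical, and the source tokens (`m_red`, `m_house`, `m_book`) are placeholders standing in for segmented Myanmar words, not real data.

```python
def train_ibm_model1(pairs, iterations=10):
    """Estimate lexical translation probabilities t[(f, e)] = P(f | e) with EM.

    pairs: list of (source_tokens, target_tokens) sentence pairs.
    Simplification (assumption): no NULL word on the target side.
    """
    src_vocab = {f for fs, _ in pairs for f in fs}
    uniform = 1.0 / len(src_vocab)

    # Start from a uniform table over all co-occurring word pairs.
    t = {(f, e): uniform for fs, es in pairs for f in fs for e in es}

    for _ in range(iterations):
        count = {}   # expected count of f being generated by e
        total = {}   # expected number of words generated by e

        # E-step: spread each source word over the target words of its sentence.
        for fs, es in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] = count.get((f, e), 0.0) + c
                    total[e] = total.get(e, 0.0) + c

        # M-step: renormalise the expected counts per target word.
        t = {(f, e): c / total[e] for (f, e), c in count.items()}

    return t


def align(fs, es, t):
    """For each source word, return the index of its most probable target word."""
    return [max(range(len(es)), key=lambda j: t.get((f, es[j]), 0.0)) for f in fs]


if __name__ == "__main__":
    # Placeholder corpus: "m_*" tokens are hypothetical source-side words.
    corpus = [
        (["m_red", "m_house"], ["red", "house"]),
        (["m_red", "m_book"], ["red", "book"]),
    ]
    table = train_ibm_model1(corpus, iterations=10)
    print(align(["m_red", "m_house"], ["red", "house"], table))
```

On this toy corpus the repeated word `m_red` co-occurs with `red` in both sentences, so EM gradually shifts probability mass towards that pairing and, by elimination, towards `m_house`/`house` and `m_book`/`book`. Models 2 and 3 keep the same E-step/M-step structure but add further parameters on top of this lexical table.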