Developing Word-aligned Myanmar-English Parallel Corpus based on the IBM Models

Khin Thandar Nwet; Khin Mar Soe; Ni Lar Thein

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 27 - Number 8

Year of Publication: 2011

Authors: Khin Thandar Nwet, Khin Mar Soe, Ni Lar Thein

10.5120/3322-4566

Khin Thandar Nwet, Khin Mar Soe, Ni Lar Thein. Developing Word-aligned Myanmar-English Parallel Corpus based on the IBM Models. International Journal of Computer Applications. 27, 8 (August 2011), 12-18. DOI=10.5120/3322-4566

@article{10.5120/3322-4566,
  author = {Khin Thandar Nwet and Khin Mar Soe and Ni Lar Thein},
  title = {Developing Word-aligned Myanmar-English Parallel Corpus based on the IBM Models},
  journal = {International Journal of Computer Applications},
  issue_date = {August 2011},
  volume = {27},
  number = {8},
  month = {August},
  year = {2011},
  issn = {0975-8887},
  pages = {12-18},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume27/number8/3322-4566/},
  doi = {10.5120/3322-4566},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T20:13:14.268447+05:30
%A Khin Thandar Nwet
%A Khin Mar Soe
%A Ni Lar Thein
%T Developing Word-aligned Myanmar-English Parallel Corpus based on the IBM Models
%J International Journal of Computer Applications
%@ 0975-8887
%V 27
%N 8
%P 12-18
%D 2011
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Word alignment in bilingual corpora has been an active research topic in the machine translation community. A corpus is a body of collected text that is useful for Natural Language Processing (NLP). Parallel text alignment is the identification of corresponding sentences in a parallel text. Large collections of parallel text are a prerequisite for many areas of linguistic research. A parallel corpus helps in building statistical bilingual dictionaries, supports statistical machine translation, and serves as training data for word sense disambiguation and translation disambiguation. Today the world is a global network and people increasingly learn more than one language, so multilingual corpora are increasingly being processed. The main purpose of this system is therefore to construct a word-aligned parallel corpus for use in Myanmar-English machine translation. A useful concept is the identification of correspondences between words in one language and words in the other. The proposed approach is based on the first three IBM models and the EM algorithm. The paper also shows that the approach can be improved by using a list of cognates and morphological analysis.
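To make the role of the EM algorithm concrete, the following is a minimal sketch of IBM Model 1 training, the simplest of the three models named in the abstract. It is an illustration under stated assumptions, not the authors' implementation: the function names `train_ibm_model1` and `align` are hypothetical, the source-side tokens `m1` to `m4` are placeholders standing in for Myanmar words, and the NULL word, the Model 2 position probabilities, and the Model 3 fertilities are all left out.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate lexical translation probabilities t(f | e) with EM.

    pairs: list of (source_tokens, english_tokens) sentence pairs.
    Returns a dict-like mapping (f, e) -> probability.
    """
    src_vocab = {f for fs, _ in pairs for f in fs}
    # Uniform initialisation: every source word is equally likely given any e.
    t = defaultdict(lambda: 1.0 / len(src_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected number of (f, e) links
        total = defaultdict(float)  # expected number of links involving e

        # E-step: distribute each source word over the English words
        # of its sentence pair, in proportion to the current t(f | e).
        for fs, es in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c

        # M-step: re-estimate t(f | e) from the expected counts.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def align(fs, es, t):
    """Link each source word to its most probable English word under t."""
    return [(f, max(es, key=lambda e: t[(f, e)])) for f in fs]


# Toy parallel corpus; m1..m4 are placeholder source tokens (assumption).
pairs = [
    (["m1", "m2"], ["the", "house"]),
    (["m1", "m3"], ["the", "book"]),
    (["m4", "m3"], ["a", "book"]),
]
t = train_ibm_model1(pairs)
print(align(["m1", "m2"], ["the", "house"], t))
```

Because `m1` co-occurs with "the" in two sentence pairs, EM gradually shifts probability mass toward that link, which in turn frees "house" to explain `m2`. A word-aligned corpus of the kind described in the abstract could then be produced by storing the output of `align` alongside each sentence pair.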

Index Terms

Computer Science

Information Sciences

Keywords

Word-aligned Parallel Corpus, IBM Models, EM Algorithm