Research Article

Developing Word-aligned Myanmar-English Parallel Corpus based on the IBM Models

by Khin Thandar Nwet, Khin Mar Soe, Ni Lar Thein
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 27 - Number 8
Year of Publication: 2011
Authors: Khin Thandar Nwet, Khin Mar Soe, Ni Lar Thein
10.5120/3322-4566

Khin Thandar Nwet, Khin Mar Soe, Ni Lar Thein. Developing Word-aligned Myanmar-English Parallel Corpus based on the IBM Models. International Journal of Computer Applications. 27, 8 (August 2011), 12-18. DOI=10.5120/3322-4566

@article{ 10.5120/3322-4566,
author = { Khin Thandar Nwet and Khin Mar Soe and Ni Lar Thein },
title = { Developing Word-aligned Myanmar-English Parallel Corpus based on the IBM Models },
journal = { International Journal of Computer Applications },
issue_date = { August 2011 },
volume = { 27 },
number = { 8 },
month = { August },
year = { 2011 },
issn = { 0975-8887 },
pages = { 12-18 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume27/number8/3322-4566/ },
doi = { 10.5120/3322-4566 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:13:14.268447+05:30
%A Khin Thandar Nwet
%A Khin Mar Soe
%A Ni Lar Thein
%T Developing Word-aligned Myanmar-English Parallel Corpus based on the IBM Models
%J International Journal of Computer Applications
%@ 0975-8887
%V 27
%N 8
%P 12-18
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Word alignment in bilingual corpora has been an active research topic in the machine translation community. A corpus is a body of text collections that is useful for Natural Language Processing (NLP). Parallel text alignment is the identification of corresponding sentences in a parallel text. Large collections of parallel text are a prerequisite for many areas of linguistic research. A parallel corpus helps in building statistical bilingual dictionaries, supports statistical machine translation, and serves as training data for word sense disambiguation and translation disambiguation. Nowadays the world is a global network and people increasingly learn more than one language, so multilingual corpora are increasingly being processed. The main purpose of this system is therefore to construct a word-aligned parallel corpus for use in Myanmar-English machine translation. A useful step is to identify correspondences between words in one language and words in the other. The proposed approach is based on the first three IBM models and the EM algorithm. The paper also shows that the approach can be improved by using a list of cognates and morphological analysis.
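To make the role of the EM algorithm concrete, the following is a minimal sketch of IBM Model 1 only, the simplest of the three models the abstract mentions. It is not the authors' implementation. The function names (`train_ibm_model1`, `align`) and the toy German-English sentence pairs are assumptions chosen for illustration, and the sketch leaves out the NULL word, the later IBM models, the cognate list, and morphological analysis.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate lexical translation probabilities t(f | e) with EM.

    pairs: list of (source_tokens, target_tokens) sentence pairs.
    Returns a dict-like mapping (f, e) -> probability.
    Simplification (assumption): no NULL word on the target side.
    """
    src_vocab = {f for src, _ in pairs for f in src}
    # Uniform initialisation over the source vocabulary.
    t = defaultdict(lambda: 1.0 / len(src_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of e being linked

        # E-step: spread each source word over the target words of its pair.
        for src, tgt in pairs:
            for f in src:
                norm = sum(t[(f, e)] for e in tgt)
                for e in tgt:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac

        # M-step: renormalise expected counts into probabilities.
        t = defaultdict(
            float, {(f, e): c / total[e] for (f, e), c in count.items()}
        )
    return t


def align(src, tgt, t):
    """For each source word, return the index of its most probable target word."""
    return [max(range(len(tgt)), key=lambda j: t[(f, tgt[j])]) for f in src]


# Hypothetical toy corpus, not data from the paper.
toy_pairs = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]

if __name__ == "__main__":
    table = train_ibm_model1(toy_pairs)
    for src, tgt in toy_pairs:
        links = align(src, tgt, table)
        print([(f, tgt[j]) for f, j in zip(src, links)])
```

Model 1 ignores word positions, so it only learns which words tend to co-occur across the two sides of the corpus. Models 2 and 3 add position and fertility parameters on top of the same EM loop.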

Index Terms

Computer Science
Information Sciences

Keywords

Word-aligned Parallel Corpus, IBM Models, EM Algorithm