Research Article

A Statistical Method for English to Arabic Machine Translation

by Marwan Akeel, R. B. Mishra
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 86 - Number 2
Year of Publication: 2014
DOI: 10.5120/14957-3124

Marwan Akeel, R. B. Mishra. A Statistical Method for English to Arabic Machine Translation. International Journal of Computer Applications. 86, 2 (January 2014), 13-19. DOI=10.5120/14957-3124

@article{ 10.5120/14957-3124,
author = { Marwan Akeel and R. B. Mishra },
title = { A Statistical Method for English to Arabic Machine Translation },
journal = { International Journal of Computer Applications },
issue_date = { January 2014 },
volume = { 86 },
number = { 2 },
month = { January },
year = { 2014 },
issn = { 0975-8887 },
pages = { 13-19 },
numpages = {7},
url = { https://ijcaonline.org/archives/volume86/number2/14957-3124/ },
doi = { 10.5120/14957-3124 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Marwan Akeel
%A R. B. Mishra
%T A Statistical Method for English to Arabic Machine Translation
%J International Journal of Computer Applications
%@ 0975-8887
%V 86
%N 2
%P 13-19
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Translating from English into a morphologically richer language such as Arabic is a challenge for statistical machine translation. Segmenting Arabic text has been proposed as a way to bridge the inflectional morphology gap between the two languages. In this work, we investigate the impact of supplementing a morphologically segmented Arabic training corpus with a one-to-one dictionary in a phrase-based statistical machine translation system, and we examine the effects on system performance. The results show that the dictionary improves translation quality, especially when the corpus is normalized and fully segmented except for the determiner. The dictionary also decreases the out-of-vocabulary rate. The effect of dictionary support on different baseline and factored models, using data ranging from full word forms to fully segmented forms, is also demonstrated.
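
To illustrate the out-of-vocabulary claim, the following minimal sketch shows one way an out-of-vocabulary rate could be measured with and without dictionary support. The abstract does not say how the rate was computed, so the token-level definition on the English source side, the function name `oov_rate`, and the toy sentences and dictionary entry are all assumptions made for illustration, not the authors' setup.

```python
def oov_rate(train_sentences, test_sentences, dictionary_entries=()):
    """Fraction of test tokens never seen on the source side of the training data.

    Assumption: whitespace tokenization and a token-level (not type-level) rate.
    dictionary_entries is an iterable of (source_word, target_word) pairs that
    are treated as extra one-to-one training material.
    """
    # Source-side vocabulary observed in the parallel training corpus.
    vocab = {tok for sent in train_sentences for tok in sent.split()}
    # Dictionary support: each entry makes its source word known to the system.
    vocab.update(src for src, _tgt in dictionary_entries)

    test_tokens = [tok for sent in test_sentences for tok in sent.split()]
    if not test_tokens:
        return 0.0
    unknown = sum(1 for tok in test_tokens if tok not in vocab)
    return unknown / len(test_tokens)


# Hypothetical toy data, invented for this example only.
train = ["the book is new", "the house is big"]
test = ["the new library is big"]
dictionary = [("library", "مكتبة")]

rate_without = oov_rate(train, test)
rate_with = oov_rate(train, test, dictionary)
print(f"OOV without dictionary: {rate_without:.2f}")
print(f"OOV with dictionary:    {rate_with:.2f}")
```

In this toy case the only unseen test token is covered by the dictionary entry, so the rate drops once the dictionary is added. In a real system, whether a dictionary entry removes an unknown word also depends on how the entry is segmented and normalized relative to the training corpus.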

Index Terms

Computer Science
Information Sciences

Keywords

Statistical machine translation, Factored phrase-based, Natural language processing