Research Article

An Exploratory Study of Stacked Multilingual SMT Systems for Low Resource Languages

by Ikechukwu Ignatius Ayogu
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 185 - Number 33
Year of Publication: 2023
Authors: Ikechukwu Ignatius Ayogu
10.5120/ijca2023922810

Ikechukwu Ignatius Ayogu. An Exploratory Study of Stacked Multilingual SMT Systems for Low Resource Languages. International Journal of Computer Applications. 185, 33 (Sep 2023), 1-5. DOI=10.5120/ijca2023922810

@article{ 10.5120/ijca2023922810,
author = { Ikechukwu Ignatius Ayogu },
title = { An Exploratory Study of Stacked Multilingual SMT Systems for Low Resource Languages },
journal = { International Journal of Computer Applications },
issue_date = { Sep 2023 },
volume = { 185 },
number = { 33 },
month = { Sep },
year = { 2023 },
issn = { 0975-8887 },
pages = { 1-5 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume185/number33/32899-2023922810/ },
doi = { 10.5120/ijca2023922810 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T01:27:38.491947+05:30
%A Ikechukwu Ignatius Ayogu
%T An Exploratory Study of Stacked Multilingual SMT Systems for Low Resource Languages
%J International Journal of Computer Applications
%@ 0975-8887
%V 185
%N 33
%P 1-5
%D 2023
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The indigenous capacity for developing computational linguistics tools for Nigerian languages is still low compared to what has been achieved in other multi-ethno-linguistic nations such as India. Effective communication among Nigerian citizens who speak different languages and are unable to use English has been continuously hampered. Thus, the need to inter-translate Nigerian languages has become increasingly urgent. Although machine translation (MT) research has achieved state-of-the-art results for English and a few privileged languages of the world, the lack of datasets for many Nigerian languages further increases the difficulty of developing MT systems for them. This paper proposes a model for rapidly developing an MT system for a new language in a multilingual setup. The overall aim of this research is to establish a scalable platform for the continuous development of MT systems for Nigerian languages using English as a pivot language. For ease of adaptation and inclusion of a new language, a purely data-driven approach that carefully avoids absolute dependence on the availability of linguistic expertise is adopted. This paper presents a multilingual translation system for the English, Igbo, and Yorùbá language mix. Using a research dataset, an overall best BLEU score of 35.62 was obtained for the English-Igbo system, 32.10 for the English-Yorùbá system, and 21.03 for the Igbo-Yorùbá system. These results are encouraging, given the size of the training corpora used.
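
The pivot idea in the abstract, translating between two Nigerian languages by going through English, can be illustrated with a minimal sketch. This is not the paper's implementation: the word-for-word lookup tables below are toy stand-ins for trained SMT systems, and the names `make_lookup_translator` and `stack` are hypothetical helpers introduced only for illustration.

```python
from typing import Callable, Dict

# A translator is modelled as any function from a sentence to a sentence.
Translator = Callable[[str], str]


def make_lookup_translator(table: Dict[str, str]) -> Translator:
    """Toy word-for-word translator (assumption: stands in for a trained SMT system)."""
    def translate(sentence: str) -> str:
        # Unknown tokens are passed through unchanged.
        return " ".join(table.get(token, token) for token in sentence.split())
    return translate


def stack(source_to_pivot: Translator, pivot_to_target: Translator) -> Translator:
    """Compose two systems so that the pivot-language output of the first feeds the second."""
    def translate(sentence: str) -> str:
        return pivot_to_target(source_to_pivot(sentence))
    return translate


# Illustrative toy entries only; a real system would learn these from parallel corpora.
igbo_to_english = make_lookup_translator({"mmiri": "water", "nri": "food"})
english_to_yoruba = make_lookup_translator({"water": "omi", "food": "oúnjẹ"})

# Igbo -> English -> Yorùbá, with English as the pivot language.
igbo_to_yoruba = stack(igbo_to_english, english_to_yoruba)

print(igbo_to_yoruba("mmiri"))
```

Under this arrangement, adding a new language only requires systems to and from the pivot rather than one system per language pair. The trade-off is that any error made in the first stage is carried into the second.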

Index Terms

Computer Science
Information Sciences

Keywords

Multilingual Machine Translation, SMT, Parallel corpora, Low resource languages, Nigerian languages