Research Article

English – Igala Parallel Corpora for Natural Language Processing Applications

by Sani Felix Ayegba, Abu Onoja, Musa Ugbedeojo
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 171 - Number 9
Year of Publication: 2017
Authors: Sani Felix Ayegba, Abu Onoja, Musa Ugbedeojo
10.5120/ijca2017913184

Sani Felix Ayegba, Abu Onoja, Musa Ugbedeojo. English – Igala Parallel Corpora for Natural Language Processing Applications. International Journal of Computer Applications. 171, 9 (Aug 2017), 1-6. DOI=10.5120/ijca2017913184

@article{ 10.5120/ijca2017913184,
author = { Sani Felix Ayegba and Abu Onoja and Musa Ugbedeojo },
title = { English – Igala Parallel Corpora for Natural Language Processing Applications },
journal = { International Journal of Computer Applications },
issue_date = { Aug 2017 },
volume = { 171 },
number = { 9 },
month = { Aug },
year = { 2017 },
issn = { 0975-8887 },
pages = { 1-6 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume171/number9/28206-2017913184/ },
doi = { 10.5120/ijca2017913184 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:19:06.772432+05:30
%A Sani Felix Ayegba
%A Abu Onoja
%A Musa Ugbedeojo
%T English – Igala Parallel Corpora for Natural Language Processing Applications
%J International Journal of Computer Applications
%@ 0975-8887
%V 171
%N 9
%P 1-6
%D 2017
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Parallel text is a fundamental requirement for developing corpus-based (data-driven) machine translation systems and other Natural Language Processing (NLP) applications. The unavailability of this linguistic resource has greatly hampered the development of NLP applications for the English – Igala language pair. This study aims to create English – Igala parallel text. In addition to providing a linguistic resource, the result of the study will enhance language learning. Various algorithms for the automatic construction of parallel text, such as STRAND, PTMiner, PTI, WPDE, and BITS, were studied to determine their suitability for creating English – Igala parallel text. Wikipedia and the Bible, which are excellent sources of parallel or comparable corpora, were also examined. The existing algorithms and these other sources were found to be unsuitable for constructing English – Igala parallel text because content rendered in the Igala language is unavailable on the web and in electronic form. A combination of manual and machine-assisted translation was therefore used to generate the parallel text. An English – Igala parallel corpus comprising 50,000 aligned sentences was obtained.
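
To make the notion of aligned sentences concrete, the following is a minimal Python sketch of how a sentence-aligned parallel corpus might be stored and validated. The tab-separated layout, the `SentencePair` type, and the `read_pairs` function are assumptions made for illustration; the abstract does not describe the storage format the authors used. The sample strings are placeholders, not entries from the corpus.

```python
import csv
import io
from typing import List, NamedTuple


class SentencePair(NamedTuple):
    """One aligned unit of a parallel corpus (hypothetical layout)."""
    english: str
    igala: str


def read_pairs(tsv_text: str) -> List[SentencePair]:
    """Parse lines of the assumed form: english<TAB>igala."""
    pairs = []
    reader = csv.reader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for line_no, row in enumerate(reader, start=1):
        # Every line must hold exactly one sentence per language;
        # otherwise the sentence alignment is broken.
        if len(row) != 2 or not row[0].strip() or not row[1].strip():
            raise ValueError(f"line {line_no}: expected two non-empty columns")
        pairs.append(SentencePair(row[0].strip(), row[1].strip()))
    return pairs


# Placeholder strings only; these are not real corpus entries.
sample = (
    "English sentence 1\tIgala sentence 1\n"
    "English sentence 2\tIgala sentence 2\n"
)
pairs = read_pairs(sample)
print(len(pairs))
```

Keeping one pair per line means alignment can be checked mechanically: a line with a missing or empty column is rejected rather than silently shifting every later pair.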

References
  1. Chen, J., Chau, R., Yeh, C.H. 2004. Discovering parallel text from the World Wide Web. In Proceedings of the 2nd Workshop on Australasian Information Security, Data Mining and Web Intelligence, and Software Internationalisation, Dunedin, New Zealand, Australian Computer Society. 157-161.
  2. Chen Yu, Martin Kay and Andreas Eisele. 2009. Intersecting multilingual data for faster and better statistical translations. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 128-136. Boulder, Colorado.
  3. Cohn Trevor and Mirella Lapata. 2007. Machine Translation by Triangulation: Making Effective Use of Multi-Parallel Corpora. Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pp. 728-735. Prague, Czech Republic.
  4. Christodouloupoulos C, Steedman M. 2015. A massively parallel corpus: the Bible in 100 languages. Lang Resources & Evaluation (2015) 49:375–395 DOI 10.1007/s10579-014-9287-y.
  5. Egbunu, F. E. 2013. Education and Re-orientation of Igala Cultural Values, African Journal of Culture, Religious, Educational and Environmental Sustainability (AJCREES), Vol. 1, No. 2. Pp. 66 – 82. Dec., 2013.
  6. Eisele A. & Yu C. 2010. MultiUN: A Multilingual Corpus from United Nation Documents. Proceedings of the International Conference on Language Resources and Evaluation (LREC 2010), pp. 2868-2872. Valletta, Malta.
  7. Koehn P. 2005. EuroParl: A Parallel Corpus for Statistical Machine Translation. Proceedings of the Machine Translation Summit, pp. 79-86, Phuket, Thailand.
  8. Koehn P., Birch A., & Steinberger R. 2009. 462 Machine Translation Systems for Europe. In Proceedings of the Twelfth Machine Translation Summit (MT-Summit XII), pp. 65-72. Ottawa, Canada (August 2009).
  9. Ma, X. and M. Liberman. 1999. BITS: A Method for Bilingual Text Search over the Web. In Proceedings of Machine Translation Summit VII.
  10. Muhammed M. S., Mohammed M. K., Ali M. N A. 2015. Automated Construction of Arabic-English Parallel Corpus.
  11. Nie, J. Y., Simard, M., Isabelle, P., and Durand, R. 1999. Cross-language Information Retrieval based on Parallel Texts and Automatic Mining of Parallel Texts from the Web. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
  12. Omachonu G.S. 2012. Igala Language Studies and Development: Progress, Issues and Challenges, Text of a paper presented at the 12th Igala Education Summit held at Kogi State University, Anyigba- Kogi State, Nigeria. (Dec. 2012).
  13. Pianta E., Bentivogli L. 2005. Exploiting parallel texts in the creation of multilingual semantically annotated resources: the MultiSemCor Corpus Natural Language Engineering 11 (3): 247–261. 2005 Cambridge University Press.
  14. Steinberger R. et al. 2014. An overview of the European Union’s highly multilingual parallel corpora. European Commission (EC) & European Parliament (EP) & European Centre for Disease Prevention and Control (ECDC).
  15. Resnik, P., Olsen, M., and Diab, M. 1999. The Bible as a parallel corpus: Annotating the ‘‘Book of 2000 Tongues’’. Computers and the Humanities, 33, 129–153.
  16. Resnik, P. and N. A. Smith. 2003. The Web as a Parallel Corpus. Computational Linguistics, 29(3)
  17. Sani F. A. 2016. English to Igala Machine Translation System. PhD Dissertation, Universidad Azteca, Mexico.
  18. Sani Rita I. 2013. Adaptation of the Staff of Office of Attah Igala into Textile Design Forms and Products. Master’s Thesis, University of Nigeria, Nsukka.
  19. Utiyama Masao. 2012. Efficient Technologies for Creating Parallel Corpora. Journal of the National Institute of Information and Communications Technology Vol. 59. Pp 41 – 47.
  20. Ying Zhang, et al. 2006. Automatic acquisition of Chinese–English parallel corpus from the web. ECIR'06 Proceedings of the 28th European Conference on Advances in Information Retrieval. London, UK, April 10-12, 2006, pp. 420-431.
  21. www.wikipedia.org
Index Terms

Computer Science
Information Sciences

Keywords

Parallel text
Natural Language Processing
Machine Translation
comparable corpora
corpus based or data driven machine translation systems
linguistic resource