English – Igala Parallel Corpora for Natural Language Processing Applications

Sani Felix Ayegba; Abu Onoja; Musa Ugbedeojo

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 171 - Number 9

Year of Publication: 2017

Authors: Sani Felix Ayegba, Abu Onoja, Musa Ugbedeojo

10.5120/ijca2017913184

Sani Felix Ayegba, Abu Onoja, Musa Ugbedeojo. English – Igala Parallel Corpora for Natural Language Processing Applications. International Journal of Computer Applications. 171, 9 (Aug 2017), 1-6. DOI=10.5120/ijca2017913184

@article{10.5120/ijca2017913184,
  author = {Sani Felix Ayegba and Abu Onoja and Musa Ugbedeojo},
  title = {English – Igala Parallel Corpora for Natural Language Processing Applications},
  journal = {International Journal of Computer Applications},
  issue_date = {Aug 2017},
  volume = {171},
  number = {9},
  month = {Aug},
  year = {2017},
  issn = {0975-8887},
  pages = {1-6},
  numpages = {6},
  url = {https://ijcaonline.org/archives/volume171/number9/28206-2017913184/},
  doi = {10.5120/ijca2017913184},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-07T00:19:06.772432+05:30
%A Sani Felix Ayegba
%A Abu Onoja
%A Musa Ugbedeojo
%T English – Igala Parallel Corpora for Natural Language Processing Applications
%J International Journal of Computer Applications
%@ 0975-8887
%V 171
%N 9
%P 1-6
%D 2017
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Parallel text is a fundamental requirement for developing corpus-based (data-driven) machine translation systems and other Natural Language Processing (NLP) applications. The unavailability of this linguistic resource has greatly hampered the development of NLP applications for English and the Igala language. This study aims to create English – Igala parallel text. In addition to providing a linguistic resource, the result of the study will support language learning. Various algorithms for the automatic construction of parallel text, such as STRAND, PTMiner, PTI, WPDE, and BITS, were studied to determine their suitability for creating English – Igala parallel text. Wikipedia and the Bible, which are excellent sources of parallel or comparable corpora, were also examined. The existing algorithms and these other sources were found to be unsuitable for constructing English – Igala parallel text because content in the Igala language is unavailable on the web and in electronic form. A combination of manual and machine-assisted translation was therefore used to generate the parallel text. An English – Igala parallel corpus comprising 50,000 aligned sentences was obtained.
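To make the notion of "aligned sentences" concrete, the following is a minimal sketch of one common way to represent a sentence-aligned parallel corpus: line i of the English side is paired with line i of the Igala side. The function names, the tab-separated output format, and the sample data are assumptions made for illustration; the abstract does not describe the authors' storage format or tooling, and the Igala entries are placeholders rather than real translations.

```python
from typing import List, Tuple


def pair_aligned_sentences(english: List[str], igala: List[str]) -> List[Tuple[str, str]]:
    """Pair sentence i of the English side with sentence i of the Igala side.

    Hypothetical helper: assumes both sides are already aligned one-to-one.
    """
    if len(english) != len(igala):
        raise ValueError(
            f"sides are not aligned: {len(english)} English vs {len(igala)} Igala sentences"
        )
    pairs = []
    for en, ig in zip(english, igala):
        en, ig = en.strip(), ig.strip()
        # Skip pairs where either side is empty, since they carry no translation.
        if en and ig:
            pairs.append((en, ig))
    return pairs


def to_tsv(pairs: List[Tuple[str, str]]) -> str:
    """Serialize pairs as one 'English<TAB>Igala' line per sentence pair."""
    # Tabs inside a sentence would break the format, so replace them with spaces.
    return "\n".join(
        f"{en.replace(chr(9), ' ')}\t{ig.replace(chr(9), ' ')}" for en, ig in pairs
    )


# Placeholder data: the Igala strings are stand-ins, not actual translations.
english_side = ["Good morning.", "Where is the market?"]
igala_side = ["<Igala translation 1>", "<Igala translation 2>"]

corpus = pair_aligned_sentences(english_side, igala_side)
print(to_tsv(corpus))
```

A one-pair-per-line layout of this kind is what most data-driven translation toolkits expect as training input, which is why the length check fails loudly instead of silently truncating a misaligned side.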

References

Chen, J., Chau, R., Yeh, C.H. 2004 Discovering parallel text from the World Wide Web. In Proceedings of the 2nd Workshop on Australasian Information Security, Data Mining and Web Intelligence, and Software Internationalisation, Dunedin, New Zealand, Australian Computer Society. 157-161
Chen Yu, Martin Kay and Andreas Eisele. 2009. Intersecting multilingual data for faster and better statistical translations. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 128-136. Boulder, Colorado.
Cohn Trevor and Mirella Lapata. 2007. Machine Translation by Triangulation: Making Effective Use of Multi-Parallel Corpora. Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pp. 728-735. Prague, Czech Republic.
Christodoulopoulos C., Steedman M. 2015. A massively parallel corpus: the Bible in 100 languages. Language Resources and Evaluation 49: 375–395. DOI 10.1007/s10579-014-9287-y.
Egbunu, F. E. 2013. Education and Re-orientation of Igala Cultural Values, African Journal of Culture, Religious, Educational and Environmental Sustainability (AJCREES), Vol. 1, No. 2. Pp. 66 – 82. Dec., 2013.
Eisele A. & Yu C. 2010. MultiUN: A Multilingual Corpus from United Nation Documents. Proceedings of the International Conference on Language Resources and Evaluation (LREC 2010), pp. 2868-2872. Valletta, Malta.
Koehn P. 2005. EuroParl: A Parallel Corpus for Statistical Machine Translation. Proceedings of the Machine Translation Summit, pp. 79-86, Phuket, Thailand.
Koehn P., Birch A., & Steinberger R. 2009. 462 Machine Translation Systems for Europe. In Proceedings of the Twelfth Machine Translation Summit (MT-Summit XII), pp. 65-72. Ottawa, Canada (August 2009).
Ma, X. and M. Liberman. 1999. BITS: A Method for Bilingual Text Search over the Web. In Proceedings of Machine Translation Summit VII.
Muhammed M. S., Mohammed M. K., Ali M. N A. 2015. Automated Construction of Arabic-English Parallel Corpus.
Nie, J. Y., Simard, M., Isabelle, P., and Durand, R. 1999. Cross-language Information Retrieval based on Parallel Texts and Automatic Mining of Parallel Texts from the Web. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development.
Omachonu G.S. 2012. Igala Language Studies and Development: Progress, Issues and Challenges, Text of a paper presented at the 12th Igala Education Summit held at Kogi State University, Anyigba- Kogi State, Nigeria. (Dec. 2012).
Pianta E., Bentivogli L. 2005. Exploiting parallel texts in the creation of multilingual semantically annotated resources: the MultiSemCor Corpus. Natural Language Engineering 11 (3): 247–261. Cambridge University Press.
Steinberger R. et al. 2014. An overview of the European Union’s highly multilingual parallel corpora. European Commission (EC), European Parliament (EP) and European Centre for Disease Prevention and Control (ECDC).
Resnik, P., Olsen, M., and Diab, M. 1999. The Bible as a parallel corpus: Annotating the ‘‘Book of 2000 Tongues’’. Computers and the Humanities, 33, 129–153.
Resnik, P. and N. A. Smith. 2003. The Web as a Parallel Corpus. Computational Linguistics, 29(3)
Sani F. A. 2016. English to Igala Machine Translation System. PhD Dissertation, Universidad Azteca, Mexico.
Sani Rita I. 2013. Adaptation of the Staff of Office of Attah Igala into Textile Design Forms and Products. Master’s Thesis, University of Nigeria, Nsukka.
Utiyama Masao. 2012. Efficient Technologies for Creating Parallel Corpora. Journal of the National Institute of Information and Communications Technology Vol. 59. Pp 41 – 47.
Ying Zhang et al. 2006. Automatic acquisition of Chinese–English parallel corpus from the web. ECIR'06 Proceedings of the 28th European Conference on Advances in Information Retrieval. London, UK, April 10-12, 2006, pp. 420-431.
www.wikipedia.org

Index Terms

Computer Science

Information Sciences

Keywords

Parallel text, Natural Language Processing, Machine Translation, comparable corpora, corpus-based or data-driven machine translation systems, linguistic resource