International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 171 - Number 9
Year of Publication: 2017
Authors: Sani Felix Ayegba, Abu Onoja, Musa Ugbedeojo |
DOI: 10.5120/ijca2017913184
Sani Felix Ayegba, Abu Onoja, Musa Ugbedeojo. English – Igala Parallel Corpora for Natural Language Processing Applications. International Journal of Computer Applications. 171, 9 (Aug 2017), 1-6. DOI=10.5120/ijca2017913184
Parallel text is a fundamental requirement for the development of corpus-based or data-driven machine translation systems and other Natural Language Processing (NLP) applications. The unavailability of this valuable linguistic resource has greatly hampered the development of NLP applications for English and Igala. This study aims to create an English – Igala parallel text. In addition to providing a linguistic resource, the result of the study will enhance language learning. Various algorithms for the automatic construction of parallel text, such as STRAND, PTMiner, PTI, WPDE, and BITS, were studied to determine their appropriateness for creating English – Igala parallel text. Wikipedia and the Bible, which are excellent sources of parallel or comparable corpora, were also examined. The existing algorithms and the other sources of parallel text were found to be unsuitable for constructing English – Igala parallel text because content rendered in the Igala language is unavailable on the web and in electronic form. A combination of manual and machine-assisted translation was therefore used to generate the parallel text. An English – Igala parallel corpus comprising 50,000 aligned sentences was obtained.