International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 180 - Number 8
Year of Publication: 2017
Authors: Shishpal Jindal, Vishal Goyal, Jaskarn Singh Bhullar |
DOI: 10.5120/ijca2017916036
Shishpal Jindal, Vishal Goyal, Jaskarn Singh Bhullar. Building English-Punjabi Parallel corpus for Machine Translation. International Journal of Computer Applications. 180, 8 (Dec 2017), 26-29. DOI=10.5120/ijca2017916036
Objective: A parallel corpus is the key resource for English-Punjabi machine translation, yet no English-Punjabi corpus is widely available. Training a statistical machine translation system requires such a corpus.
Methods/Analysis: In this paper, we focus on building an English-Punjabi corpus at large scale. Developing the corpus posed difficulties and required intensive labor. We elaborate on the collection of text and on the workflow for constructing the parallel corpus. Once the raw text has been obtained, the corpus must be refined so that every source-language sentence has a corresponding target-language sentence.
Findings: The paper explores existing tools and also builds new ones. One of the goals is the alignment of the bilingual corpus. Alignment algorithms are used to pair the sentences, and their accuracy depends on the type of corpus.
Novelty/Improvement: A careful effort has been made to capture different types of text.
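The refinement requirement stated in the abstract, that every source-language sentence should have a corresponding target-language sentence, can be illustrated with a minimal sketch. This is not the authors' tool or alignment algorithm: the function name `filter_aligned_pairs`, the `max_ratio` threshold, and the sample sentences are all hypothetical, and the token-count ratio is only a crude stand-in for a real alignment method. The sketch assumes the two sides have already been split into sentences and stored in the same order.

```python
def filter_aligned_pairs(src_sentences, tgt_sentences, max_ratio=3.0):
    """Keep sentence pairs that look like plausible one-to-one translations.

    Hypothetical helper for illustration only. Assumes both lists are in the
    same order, so that item i on one side should translate item i on the other.
    """
    if len(src_sentences) != len(tgt_sentences):
        raise ValueError("source and target must have the same number of sentences")

    kept = []
    for src, tgt in zip(src_sentences, tgt_sentences):
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # drop pairs where one side is empty
        src_len, tgt_len = len(src.split()), len(tgt.split())
        # Whitespace token counts; a large mismatch suggests a misalignment.
        ratio = max(src_len, tgt_len) / min(src_len, tgt_len)
        if ratio <= max_ratio:
            kept.append((src, tgt))
    return kept


# Made-up sample data, not taken from the paper's corpus.
english = [
    "This is a book.",
    "",
    "He went to the market yesterday to buy fresh vegetables and fruit.",
]
punjabi = [
    "ਇਹ ਇੱਕ ਕਿਤਾਬ ਹੈ।",
    "ਉਹ ਘਰ ਗਿਆ।",
    "ਹਾਂ।",
]

pairs = filter_aligned_pairs(english, punjabi)
for src, tgt in pairs:
    print(src, "|||", tgt)
```

With this sample data, only the first pair is kept: the second has an empty source side, and the third has a token-count mismatch above the assumed threshold. A production pipeline would typically replace the ratio check with a dedicated sentence aligner and manual review.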