International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 52 - Number 14
Year of Publication: 2012 |
Authors: Thoudam Doren Singh |
DOI: 10.5120/8274-1876
Thoudam Doren Singh. Building Parallel Corpora for SMT System: A Case Study of English-Manipuri. International Journal of Computer Applications. 52, 14 (August 2012), 47-51. DOI=10.5120/8274-1876
Statistical Machine Translation (SMT) systems are developed using sentence-aligned parallel corpora. The difficulty is that parallel corpora of the required size are not available for many language pairs. Preparing a large-scale parallel corpus takes time and demands linguistic skill. In the present work, we discuss the various issues involved in building a quality parallel corpus and develop a technique that extracts a parallel corpus from web-based comparable news corpora for English and Manipuri, a morphologically rich and resource-constrained Indian language. We explore the role of parallel corpora in improving translation quality through linguistic factors for this language pair.