International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 85 - Number 14
Year of Publication: 2014
Authors: Shubhamay Sen, Sriram Chaudhury
DOI: 10.5120/14913-3521
Shubhamay Sen, Sriram Chaudhury. Improvement of the Results of Statistical Machine Translation System using Anusaaraka. International Journal of Computer Applications. 85, 14 (January 2014), 41-47. DOI=10.5120/14913-3521
This paper describes an efficient experimental approach to improving the translation quality of a phrase-based statistical machine translation (SMT) system by drawing on insights from rule-based machine translation. As a first step, appending large, accurately designed linguistic resources, such as multiword bilingual dictionaries, to the existing training corpus is expected to contribute substantially to the phrase alignment quality and phrase coverage of the SMT system. Translation coverage can be improved further by extending the dictionary with morpho-syntactic word forms of the foreign-language words, rather than root forms alone, together with their corresponding translations in the native language. This matters because, in a real-time testing scenario, the test corpus may contain morphological extensions of a root word that standard dictionaries do not cover. Adding such dictionaries to the corpus enriches it and remedies the improper translations previously generated when a morpho-syntactic extension occurred in place of the root form. As a further improvement, the proposed approach integrates the linguistic intelligence of Anusaaraka with the large computational capacity of SMT to achieve better translations. Anusaaraka is a machine translation system based on the grammatical rules of Panini's Astadhyayi, and it is particularly strong at English-Hindi phrase alignment. It aligns phrases by comparing its output translation with an accurate manual translation and extracting the best available option. The bilingual phrase pairs obtained in this way are highly accurate and, when appended to the training corpus of the SMT system, yield a better phrase alignment structure and hence better translation quality.
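The corpus augmentation outlined in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`expand_dictionary`, `filter_by_reference`, `augment_corpus`), the transliterated toy entries, and the exact-span matching rule are all assumptions made for the example. In particular, the abstract does not specify how Anusaaraka output is compared with the manual translation, so a simple contiguous-token match stands in for that step here.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (English phrase, Hindi phrase); toy transliterated data


def expand_dictionary(root_dict: Dict[str, str],
                      inflections: Dict[str, List[Pair]]) -> List[Pair]:
    """Return each root entry followed by its inflected forms (hypothetical helper)."""
    pairs: List[Pair] = []
    for root, translation in root_dict.items():
        pairs.append((root, translation))
        pairs.extend(inflections.get(root, []))
    return pairs


def filter_by_reference(candidates: List[Pair], reference: str) -> List[Pair]:
    """Keep candidate pairs whose target side appears as a contiguous token
    span in the manual (reference) translation. Assumed matching rule."""
    ref_tokens = reference.split()
    kept: List[Pair] = []
    for src, tgt in candidates:
        tgt_tokens = tgt.split()
        n = len(tgt_tokens)
        if n == 0:
            continue
        for i in range(len(ref_tokens) - n + 1):
            if ref_tokens[i:i + n] == tgt_tokens:
                kept.append((src, tgt))
                break
    return kept


def augment_corpus(corpus: List[Pair], extra: List[Pair]) -> List[Pair]:
    """Append extra pairs to the training corpus, skipping exact duplicates."""
    seen = set(corpus)
    result = list(corpus)
    for pair in extra:
        if pair not in seen:
            seen.add(pair)
            result.append(pair)
    return result


# Toy data, invented for illustration only.
corpus = [("he went home", "vah ghar gayaa")]
root_dict = {"go": "jaa"}
inflections = {"go": [("went", "gayaa"), ("goes", "jaataa hai")]}

dictionary_pairs = expand_dictionary(root_dict, inflections)
candidate_pairs = [("went", "gayaa"), ("home", "makaan")]
verified_pairs = filter_by_reference(candidate_pairs, "vah ghar gayaa")
training_corpus = augment_corpus(corpus, dictionary_pairs + verified_pairs)
```

In this sketch the augmented list would then be written out as the parallel training files for the SMT toolkit. Skipping duplicates is a design choice of the example; repeating an entry would instead raise its weight in the learned phrase table.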