An Experiment to Create Parallel Corpora for Odia

Rakesh Balabantaray; Deepak Sahoo

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 67 - Number 19

Year of Publication: 2013

Authors: Rakesh Balabantaray, Deepak Sahoo

DOI: 10.5120/11503-7220

Rakesh Balabantaray, Deepak Sahoo. An Experiment to Create Parallel Corpora for Odia. International Journal of Computer Applications. 67, 19 (April 2013), 18-20. DOI=10.5120/11503-7220

@article{10.5120/11503-7220,
  author = {Rakesh Balabantaray and Deepak Sahoo},
  title = {An Experiment to Create Parallel Corpora for Odia},
  journal = {International Journal of Computer Applications},
  issue_date = {April 2013},
  volume = {67},
  number = {19},
  month = {April},
  year = {2013},
  issn = {0975-8887},
  pages = {18-20},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume67/number19/11503-7220/},
  doi = {10.5120/11503-7220},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T21:25:52.471653+05:30
%A Rakesh Balabantaray
%A Deepak Sahoo
%T An Experiment to Create Parallel Corpora for Odia
%J International Journal of Computer Applications
%@ 0975-8887
%V 67
%N 19
%P 18-20
%D 2013
%I Foundation of Computer Science (FCS), NY, USA

Abstract

The term parallel corpora is typically used in linguistic circles to refer to texts that are translations of each other, while the term comparable corpora refers to texts in two languages that are similar in content but are not exact translations. To exploit a parallel text, some kind of text alignment, which identifies equivalent text segments (approximately sentences), is a prerequisite for analysis. Parallel corpora are essential in cross-lingual and multilingual information retrieval. This paper presents an approach for the automatic creation of an English-Odia parallel corpus from a comparable corpus. Named entities, proper nouns, and common nouns generally play an important role in information retrieval, so we evaluate their effectiveness in aligning English-Odia comparable document pairs. For testing, we used the Odia parallel corpus from TDIL (152 English-Odia documents) together with comparable Wikipedia pages that we crawled; the results are encouraging. We used the Stanford CoreNLP tool and Google Translate in our work.
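
As a rough illustration of the alignment idea described in the abstract, the following Python sketch pairs documents by the overlap of their key terms (named entities, proper nouns, common nouns). It is not the authors' implementation: the Jaccard score, the 0.3 threshold, the toy lexicon (standing in for machine translation), the pre-extracted term sets (standing in for Stanford CoreNLP output), the romanized placeholder tokens used in place of Odia terms, and all function names are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two term sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def translate_terms(terms: Set[str], lexicon: Dict[str, str]) -> Set[str]:
    """Map source-language terms to lowercase English with a toy lexicon.

    The lexicon is a hypothetical stand-in for a translation service.
    Terms missing from the lexicon are kept unchanged (lowercased).
    """
    return {lexicon.get(t, t).lower() for t in terms}


def align_documents(
    english_docs: Dict[str, Set[str]],
    odia_docs: Dict[str, Set[str]],
    lexicon: Dict[str, str],
    threshold: float = 0.3,  # assumed cut-off, not taken from the paper
) -> List[Tuple[str, str, float]]:
    """Pair each English document with its best-scoring Odia document."""
    pairs = []
    for en_id, en_terms in english_docs.items():
        en_norm = {t.lower() for t in en_terms}
        best_id, best_score = None, 0.0
        for od_id, od_terms in odia_docs.items():
            score = jaccard(en_norm, translate_terms(od_terms, lexicon))
            if score > best_score:
                best_id, best_score = od_id, score
        # Keep the pair only if the overlap is strong enough.
        if best_id is not None and best_score >= threshold:
            pairs.append((en_id, best_id, best_score))
    return pairs


# Key terms assumed to be extracted beforehand (e.g. by an NER/POS tagger).
english_docs = {
    "en1": {"Bhubaneswar", "Odisha", "temple"},
    "en2": {"Mahanadi", "river", "Odisha"},
}
# Romanized placeholders stand in for real Odia tokens.
odia_docs = {
    "od1": {"mahanadi_or", "nadi_or", "odisha_or"},
    "od2": {"bhubaneswar_or", "odisha_or", "mandira_or"},
}
lexicon = {
    "mahanadi_or": "Mahanadi",
    "nadi_or": "river",
    "odisha_or": "Odisha",
    "bhubaneswar_or": "Bhubaneswar",
    "mandira_or": "temple",
}

pairs = align_documents(english_docs, odia_docs, lexicon)
for en_id, od_id, score in pairs:
    print(en_id, od_id, round(score, 2))
```

In this toy data each English document shares all three key terms with exactly one candidate and only one term ("Odisha") with the other, so the threshold separates the matching pair from the merely related one. A real pipeline would need to handle transliteration variants and partial translations, which this sketch ignores.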

Index Terms

Computer Science

Information Sciences

Keywords

Cross-lingual information retrieval, Named entity, Comparable document, Document similarity, Key terms