Research Article

Supporting Large English-Hindi Parallel Corpus using Word Alignment

by Shweta Dubey, Tarun Dhar Diwan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 49 - Number 6
Year of Publication: 2012
Authors: Shweta Dubey, Tarun Dhar Diwan
10.5120/7631-0710

Shweta Dubey, Tarun Dhar Diwan. Supporting Large English-Hindi Parallel Corpus using Word Alignment. International Journal of Computer Applications. 49, 6 (July 2012), 16-19. DOI=10.5120/7631-0710

@article{ 10.5120/7631-0710,
author = { Shweta Dubey and Tarun Dhar Diwan },
title = { Supporting Large English-Hindi Parallel Corpus using Word Alignment },
journal = { International Journal of Computer Applications },
issue_date = { July 2012 },
volume = { 49 },
number = { 6 },
month = { July },
year = { 2012 },
issn = { 0975-8887 },
pages = { 16-19 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume49/number6/7631-0710/ },
doi = { 10.5120/7631-0710 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:45:34.566751+05:30
%A Shweta Dubey
%A Tarun Dhar Diwan
%T Supporting Large English-Hindi Parallel Corpus using Word Alignment
%J International Journal of Computer Applications
%@ 0975-8887
%V 49
%N 6
%P 16-19
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper describes a methodology for understanding parallel English-Hindi sentences using word alignment. The methodology is the foundation for developing a parallel English-Hindi word dictionary after syntactic and semantic analysis of the English-Hindi source text. The proposed system is applied to English and Hindi sentences, but the methodology can also be used for other languages. Large parallel corpora for the English-Hindi language pair are not readily available. The development is based on two strategies to address this problem. The first is normalization of tagged English and Hindi sentences. The second is mapping English-Hindi sentences using the parallel English-Hindi word dictionary. Fortunately, word alignment is well understood, and a few alignment algorithms are freely available.
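
The two strategies named in the abstract, normalization followed by dictionary-based mapping, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy dictionary, the romanized Hindi tokens, the greedy first-match linking, and the definition of the mapping score (linked English words divided by total English words) are all assumptions made for illustration, and the function names `normalize`, `align`, and `mapping_score` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy dictionary: normalized English word -> romanized Hindi equivalents.
# A real system would load a much larger English-Hindi word dictionary.
TOY_DICTIONARY: Dict[str, Set[str]] = {
    "ram": {"ram"},
    "eats": {"khaata"},
    "mango": {"aam"},
}

PUNCTUATION = ".,!?"


def normalize(tokens: List[str]) -> List[str]:
    """Lowercase tokens and strip surrounding punctuation; drop empty results."""
    return [t.lower().strip(PUNCTUATION) for t in tokens if t.strip(PUNCTUATION)]


def align(
    english: List[str], hindi: List[str], dictionary: Dict[str, Set[str]]
) -> List[Tuple[int, int]]:
    """Greedily link each English word to the first unused Hindi word
    that the dictionary lists as one of its translations."""
    en = normalize(english)
    hi = normalize(hindi)
    used: Set[int] = set()  # Hindi positions that are already linked
    links: List[Tuple[int, int]] = []
    for i, e_word in enumerate(en):
        candidates = dictionary.get(e_word, set())
        for j, h_word in enumerate(hi):
            if j not in used and h_word in candidates:
                links.append((i, j))
                used.add(j)
                break
    return links


def mapping_score(links: List[Tuple[int, int]], english: List[str]) -> float:
    """Assumed score: fraction of normalized English words that received a link."""
    total = len(normalize(english))
    return len(links) / total if total else 0.0


if __name__ == "__main__":
    en_sentence = "Ram eats a mango.".split()
    hi_sentence = "Ram aam khaata hai.".split()
    links = align(en_sentence, hi_sentence, TOY_DICTIONARY)
    print(links, mapping_score(links, en_sentence))
```

In this sketch the article "a" has no dictionary entry and the Hindi auxiliary "hai" has no English counterpart, so both stay unlinked. Handling such cases is the kind of problem that local word grouping and multi-word expression handling, both listed in the keywords, would address in a fuller system.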

References
  1. Niraj Aswani, "Aligning words in English-Hindi parallel corpora", Proceedings of the ACL Workshop on Building and Using Parallel Texts, pages 115–118.
  2. Tong Xiao, Huizhen Wang, "The NiuTrans Machine Translation System for NTCIR-9 Patent", Proceedings of NTCIR-9, December 6-9, 2011, Tokyo, Japan, pages 593-599.
  3. Niraj Aswani, "A hybrid approach to align sentences and words in English-Hindi parallel corpora", Proceedings of the ACL Workshop on Building and Using Parallel Texts, pages 57–64.
  4. Antony P J, Nandini J. Warrier, Dr. Soman K P, "Penn Treebank-Based Syntactic Parsers for South Dravidian Languages using a Machine Learning Approach", International Journal of Computer Applications (0975-8887), Volume 7, No. 8, October 2010, pages 14-21.
  5. Yoshinobu Kano, Jun'ichi Tsujii, "Sharable Type System Design for Tool Inter-Operability and Combinatorial Comparison", The First International Conference on Global Interoperability for Language Resources, pages 121-129.
  6. Richard Beaufort, Sophie Roekhaut, Louise-Amélie Cougnon, Cédrick Fairon, "A hybrid rule/model-based finite-state framework for normalizing SMS messages", Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 770–779.
  7. Hassan Al-Haj, Shuly Wintner, "Identifying Multi-word Expressions by Leveraging Morphological and Syntactic Idiosyncrasy", Proceedings of the 23rd International Conference on Computational Linguistics, pages 10–18.
  8. Yulia Tsvetkov, Shuly Wintner, "Identification of Multi-word Expressions by Combining Multiple Linguistic Information Sources", Proceedings of the 2011 Conference on Empirical Methods in Natural Language.
Index Terms

Computer Science
Information Sciences

Keywords

Tagging, Local Word Grouping, Word Mapping, Normalization, Part of Speech Tagging (POST), Word Dictionary, Multi Word Expressions, Mapping Score