Research Article

Multilingual & Cross-Lingual Text Summarization of Marathi and English using Transformer Based Models and their Systematic Evaluation

by Shlok Deshpande, Vineet Shinde, Siddharth Chaudhari, Yashodhara V. Haribhakta
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 26
Year of Publication: 2024
10.5120/ijca2024923738

Shlok Deshpande, Vineet Shinde, Siddharth Chaudhari, Yashodhara V. Haribhakta. Multilingual & Cross-Lingual Text Summarization of Marathi and English using Transformer Based Models and their Systematic Evaluation. International Journal of Computer Applications 186(26):11-17, July 2024. DOI=10.5120/ijca2024923738

@article{ 10.5120/ijca2024923738,
author = { Shlok Deshpande, Vineet Shinde, Siddharth Chaudhari, Yashodhara V. Haribhakta },
title = { Multilingual & Cross-Lingual Text Summarization of Marathi and English using Transformer Based Models and their Systematic Evaluation },
journal = { International Journal of Computer Applications },
issue_date = { Jul 2024 },
volume = { 186 },
number = { 26 },
month = { Jul },
year = { 2024 },
issn = { 0975-8887 },
pages = { 11-17 },
numpages = { 7 },
url = { https://ijcaonline.org/archives/volume186/number26/multilingual-cross-lingual-text-summarization-of-marathi-and-english-using-transformer-based-models-and-their-systematic-evaluation/ },
doi = { 10.5120/ijca2024923738 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Shlok Deshpande
%A Vineet Shinde
%A Siddharth Chaudhari
%A Yashodhara V. Haribhakta
%T Multilingual & Cross-Lingual Text Summarization of Marathi and English using Transformer Based Models and their Systematic Evaluation
%J International Journal of Computer Applications
%@ 0975-8887
%V 186
%N 26
%P 11-17
%D 2024
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The proposed methodology pioneers an approach to multilingual and cross-lingual text summarization that bridges Marathi and English through the deployment and specialized optimization of advanced transformer-based models. The research introduces a novel framework designed to navigate and synthesize the linguistic nuances of these two languages, offering a unique contribution to the field of natural language processing. Pegasus, T5, and BART are used for English summarization and IndicBART, mT5, and mBART for Marathi, with M2M-100 handling translation, forming a synergistic framework that effectively addresses the challenges of summarizing across languages. The core objective is to perform cross-lingual summarization with these models, enhancing their ability to understand and summarize content from Marathi to English and vice versa. The methodology combines multiple large datasets for training with comprehensive evaluation using ROUGE, BLEU, and BERTScore to assess summarization quality. Additionally, a novel evaluation metric is introduced that combines concept coverage, semantic similarity, and relevance, tailored to assessing multilingual and cross-lingual summarization quality between English and Marathi. This work aims not only to advance the field of cross-lingual summarization but also to improve accessibility and foster better understanding across linguistic and cultural boundaries.
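The abstract does not give the exact formulation of the composite metric, so the sketch below is only one illustrative interpretation: it combines concept coverage, semantic similarity, and relevance as a weighted sum, using simple bag-of-words proxies. The function names, weights, and the choice of lexical overlap in place of embedding-based similarity are all assumptions, not the paper's actual definitions; a real implementation would likely use multilingual embeddings (e.g. BERTScore-style representations) for the similarity term.

```python
from collections import Counter
import math


def _tokens(text):
    """Lowercased whitespace tokens; a real pipeline would use a proper tokenizer."""
    return [t.lower() for t in text.split()]


def concept_coverage(candidate, reference):
    """Fraction of distinct reference tokens that also appear in the candidate."""
    cand, ref = set(_tokens(candidate)), set(_tokens(reference))
    return len(cand & ref) / len(ref) if ref else 0.0


def cosine_similarity(candidate, reference):
    """Bag-of-words cosine, a crude stand-in for embedding-based semantic similarity."""
    a, b = Counter(_tokens(candidate)), Counter(_tokens(reference))
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def composite_score(candidate, reference, source, weights=(0.4, 0.4, 0.2)):
    """Hypothetical composite metric: weighted coverage + similarity + relevance.

    `relevance` is modeled here as overlap between the candidate summary and
    the source document; the weights are illustrative, not from the paper.
    """
    w_cov, w_sim, w_rel = weights
    coverage = concept_coverage(candidate, reference)
    similarity = cosine_similarity(candidate, reference)
    relevance = concept_coverage(candidate, source)
    return w_cov * coverage + w_sim * similarity + w_rel * relevance
```

For a cross-lingual setting, the candidate and reference would be in the target language while the source stays in the original language, which is why the relevance term would in practice also need translation or multilingual embeddings rather than raw token overlap.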

References
  1. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. CoRR, abs/1706.03762, 2017.
  2. Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks, 2014.
  3. Tahmid Hasan, Abhik Bhattacharjee, Md. Saiful Islam, Kazi Mubasshir, Yuan-Fang Li, Yong-Bin Kang, M. Sohel Rahman, and Rifat Shahriyar. XL-sum: Large-scale multilingual abstractive summarization for 44 languages. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 4693–4703, Online, August 2021. Association for Computational Linguistics.
  4. Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, and Colin Raffel. mT5: A massively multilingual pre-trained text-to-text transformer. CoRR, abs/2010.11934, 2020.
  5. Abhik Bhattacharjee, Tahmid Hasan, Wasi Uddin Ahmad, Yuan-Fang Li, Yong-Bin Kang, and Rifat Shahriyar. CrossSum: Beyond English-centric cross-lingual summarization for 1,500+ language pairs. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2541–2564, Toronto, Canada, July 2023. Association for Computational Linguistics.
  6. Sotaro Takeshita, Tommaso Green, Niklas Friedrich, Kai Eckert, and Simone Paolo Ponzetto. X-SciTLDR: Cross-lingual extreme summarization of scholarly documents. In 2022 ACM/IEEE Joint Conference on Digital Libraries (JCDL), pages 1–12, 2022.
  7. S. Kulkarni, Prapti Deshmukh, Majharoddin Kazi, and Karbhari Kale. Linguistic divergence patterns in English to Marathi translation. International Journal of Computer Applications, 87, January 2014.
  8. Arjit Agarwal, Soham Naik, and Sheetal S. Sonawane. Abstractive text summarization for Hindi language using IndicBART. In FIRE, 2022.
  9. Vaishali P. Kadam, Samah Ali Alazani, and C. Namrata Mahender. A text summarization system for Marathi language.
  10. Ashok Urlana. Enhancing Text Summarization for Indian Languages: Mono, Multi and Cross-lingual Approaches. PhD thesis, July 2023.
  11. Ashok Urlana, Sahil Manoj Bhatt, Nirmal Surange, and Manish Shrivastava. Indian language summarization using pretrained sequence-to-sequence models, 2023.
  12. Ashok Urlana, Pinzhen Chen, Zheng Zhao, Shay B. Cohen, Manish Shrivastava, and Barry Haddow. PMIndiaSum: Multilingual and cross-lingual headline summarization for languages in India, 2023.
  13. Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. BLEU: a method for automatic evaluation of machine translation. 2002.
  14. Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. BERTScore: Evaluating text generation with BERT. CoRR, abs/1904.09675, 2019.
  15. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding, 2019.
  16. Parth Patil, Aparna Ranade, Maithili Sabane, Onkar Litake, and Raviraj Joshi. L3Cube-MahaNER: A Marathi named entity recognition dataset and BERT models, 2022.
  17. Simran Khanuja, Diksha Bansal, Sarvesh Mehtani, Savya Khosla, Atreyee Dey, Balaji Gopalan, Dilip Kumar Margam, Pooja Aggarwal, Rajiv Teja Nagipogu, Shachi Dave, Shruti Gupta, Subhash Chandra Bose Gali, Vish Subramanian, and Partha Talukdar. MuRIL: Multilingual representations for Indian languages, 2021.
Index Terms

Computer Science
Information Sciences
Natural Language Processing
Summarization
Transformers

Keywords

Natural Language Generation, Multi & Cross-Lingual Summarization, Indic Languages