International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 26
Year of Publication: 2024
Authors: Shlok Deshpande, Vineet Shinde, Siddharth Chaudhari, Yashodhara V. Haribhakta
DOI: 10.5120/ijca2024923738
Shlok Deshpande, Vineet Shinde, Siddharth Chaudhari, Yashodhara V. Haribhakta. Multilingual & Cross-Lingual Text Summarization of Marathi and English using Transformer Based Models and their Systematic Evaluation. International Journal of Computer Applications. 186, 26 (Jul 2024), 11-17. DOI=10.5120/ijca2024923738
The proposed methodology addresses multilingual and cross-lingual text summarization between Marathi and English through the deployment and specialized optimization of advanced transformer-based models. The research introduces a framework designed to navigate the linguistic nuances between the two languages. Pegasus, T5, and BART are used for English summarization; IndicBART, mT5, and mBART are used for Marathi summarization; and M2M-100 is used for translation. Together these models form a framework that handles the challenges of summarizing across languages. The core objective is cross-lingual summarization with these models, from Marathi to English and vice versa. The methodology combines multiple large datasets for training and evaluates summarization quality comprehensively using ROUGE, BLEU, and BERT-based metrics. In addition, a novel evaluation metric is introduced that combines concept coverage, semantic similarity, and relevance, tailored to assessing multilingual and cross-lingual summarization quality between English and Marathi. The project aims not only to advance cross-lingual summarization but also to improve accessibility and foster better understanding across linguistic and cultural boundaries.
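As a rough illustration of how per-language summarizers and a translation model might be composed, the sketch below wires them together behind one interface. It is not the authors' implementation. The abstract does not state whether summarization precedes translation, so the summarize-then-translate order used here is an assumption. The class name, the type aliases, and the two stand-in functions are hypothetical; in practice the stand-ins would be replaced by calls to models such as those named above.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical aliases: any callable with these shapes can be plugged in.
Summarizer = Callable[[str], str]            # text -> summary (same language)
Translator = Callable[[str, str, str], str]  # (text, src_lang, tgt_lang) -> text


@dataclass
class CrossLingualSummarizer:
    """Compose monolingual summarizers with a translator.

    Assumption: the summary is produced in the source language first and
    translated afterwards. The reverse order is equally possible.
    """
    summarizers: Dict[str, Summarizer]
    translate: Translator

    def summarize(self, text: str, src_lang: str, tgt_lang: str) -> str:
        if src_lang not in self.summarizers:
            raise KeyError(f"no summarizer registered for {src_lang!r}")
        summary = self.summarizers[src_lang](text)
        if src_lang == tgt_lang:
            # Multilingual (same-language) case: no translation step.
            return summary
        # Cross-lingual case: translate the finished summary.
        return self.translate(summary, src_lang, tgt_lang)


# Stand-ins so the sketch runs without downloading any model.
def first_sentence(text: str) -> str:
    """Toy 'summarizer': keep only the first sentence."""
    return text.split(".")[0].strip() + "."


def tagging_translator(text: str, src_lang: str, tgt_lang: str) -> str:
    """Toy 'translator': mark the direction instead of translating."""
    return f"[{src_lang}->{tgt_lang}] {text}"


pipeline = CrossLingualSummarizer(
    summarizers={"en": first_sentence, "mr": first_sentence},
    translate=tagging_translator,
)
same_language = pipeline.summarize("Rain fell. Roads closed.", "en", "en")
cross_language = pipeline.summarize("Rain fell. Roads closed.", "en", "mr")
```

Keeping the models behind plain callables lets each summarizer or the translator be swapped and evaluated independently, which fits a study that compares several models per language.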