International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Number 110
Year of Publication: 2026
Authors: Shreya M. Kapadia, Payal D. Joshi
DOI: 10.5120/ijca6a8da113498f
Shreya M. Kapadia, Payal D. Joshi. Comparative Performance Analysis of BM25 and Vector Space Model for Document Retrieval in Gujarati News Corpora. International Journal of Computer Applications. 187, 110 (May 2026), 38-44. DOI=10.5120/ijca6a8da113498f
Despite the rapid growth of digital news content, retrieving relevant information from Gujarati news articles remains challenging because computational resources and language-processing tools for Gujarati are limited. This study proposes an information retrieval framework for Gujarati news documents, built on the GSF-2009 corpus released under the FIRE evaluation initiative. Two classical retrieval models, BM25 and the Vector Space Model (VSM), are used to retrieve and rank documents relevant to user-defined, event-oriented queries. The models are evaluated on both a short and a descriptive Gujarati query. For the short query “ગુજરાતમાં ભારે વરસાદ”, VSM performs better, with Recall = 0.7 and F1-score = 0.8, whereas BM25 records Recall = 0.3 and F1-score = 0.5. In contrast, for the descriptive query “ગુજરાતમાં ભારે વરસાદના કારણે અનેક જિલ્લાઓમાં પૂર જેવી સ્થિતિ”, BM25 outperforms VSM with Precision = 1.0, Recall = 0.7, and F1-score = 0.8, whereas VSM achieves Precision = 0.8, Recall = 0.5, and F1-score = 0.6. These results indicate that VSM is more effective for short keyword-based queries, while BM25 is more effective for long, context-rich queries. The study does not perform explicit event detection; instead, it supports event-oriented retrieval by retrieving documents associated with real-world events. The proposed framework provides a baseline for Gujarati news retrieval and supports further research in event-oriented retrieval for low-resource Indic languages.
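As an illustration of the two ranking models and the F1 metric named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes whitespace tokenization, the common Okapi BM25 form with k1 = 1.5 and b = 0.75, and TF-IDF cosine similarity with a smoothed IDF for VSM; none of these settings are stated in the abstract. The three toy documents are hypothetical and are assembled only from words that appear in the two queries quoted above.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document (k1, b are assumed values)."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / norm
        scores.append(s)
    return scores


def vsm_scores(query, docs):
    """Cosine similarity between TF-IDF vectors (IDF smoothing is an assumption)."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}

    def vec(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * idf[t] for t in tf if t in idf}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    q = vec(query)
    return [cosine(q, vec(d)) for d in docs]


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy collection, tokenized on whitespace.
docs = [
    "ગુજરાતમાં ભારે વરસાદ પૂર".split(),
    "અનેક જિલ્લાઓમાં સ્થિતિ".split(),
    "ગુજરાતમાં અનેક જિલ્લાઓમાં પૂર જેવી સ્થિતિ".split(),
]
short_query = "ગુજરાતમાં ભારે વરસાદ".split()

for name, scorer in (("BM25", bm25_scores), ("VSM", vsm_scores)):
    scores = scorer(short_query, docs)
    ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    print(name, ranking)

# F1 recomputed from the precision/recall pairs reported for the descriptive query.
print(round(f1_score(1.0, 0.7), 1))  # BM25
print(round(f1_score(0.8, 0.5), 1))  # VSM
```

On this toy collection both models place the first document at the top, since it contains all three query terms. The last two lines recompute F1 from the precision and recall reported for the descriptive query; rounded to one decimal place they agree with the stated values of 0.8 and 0.6.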