International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 55 - Number 17
Year of Publication: 2012
Authors: Kumar Sourabh, Vibhakar Mansotra
DOI: 10.5120/8845-2987
Kumar Sourabh, Vibhakar Mansotra. Query Optimization: A Solution for Low Recall Problem in Hindi Language Information Retrieval. International Journal of Computer Applications. 55, 17 (October 2012), 6-17. DOI=10.5120/8845-2987
Information retrieval (IR) has been an active field of research for decades, but for much of its history it has been strongly biased towards English as the language of research and evaluation. Whatever the original motivations for this almost exclusive focus on English, many of them have lost their validity over the years. The Internet is no longer monolingual, and non-English content is growing rapidly. Hindi is the third most widely spoken language in the world (after English and Mandarin), with an estimated 500-600 million speakers. Information retrieval in Hindi is gaining popularity, but IR systems suffer from low recall when existing systems are used as-is. Certain characteristics of Indian languages prevent existing algorithms from matching relevant keywords in the documents to be retrieved. The major characteristics that affect Indian language IR are language morphology, compound word formation, spelling variations, ambiguity, synonymy, foreign language influence, and the lack of spelling standards. Taking these issues into consideration, we introduce a Hindi Query Optimization technique (design and development) that solves the low-recall problem to a great extent.