Research Article

Ad-hoc Retrieval on FIRE Data Set with TF-IDF and Probabilistic Models

by Chandra Shekhar Jangid, Santosh K Vishwakarma, Kamaljit I Lakhtaria
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 93 - Number 18
Year of Publication: 2014
DOI: 10.5120/16435-6136

Chandra Shekhar Jangid, Santosh K Vishwakarma, Kamaljit I Lakhtaria. Ad-hoc Retrieval on FIRE Data Set with TF-IDF and Probabilistic Models. International Journal of Computer Applications. 93, 18 (May 2014), 22-25. DOI=10.5120/16435-6136

@article{ 10.5120/16435-6136,
author = { Chandra Shekhar Jangid and Santosh K Vishwakarma and Kamaljit I Lakhtaria },
title = { Ad-hoc Retrieval on FIRE Data Set with TF-IDF and Probabilistic Models },
journal = { International Journal of Computer Applications },
issue_date = { May 2014 },
volume = { 93 },
number = { 18 },
month = { May },
year = { 2014 },
issn = { 0975-8887 },
pages = { 22-25 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume93/number18/16435-6136/ },
doi = { 10.5120/16435-6136 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:16:04.291659+05:30
%A Chandra Shekhar Jangid
%A Santosh K Vishwakarma
%A Kamaljit I Lakhtaria
%T Ad-hoc Retrieval on FIRE Data Set with TF-IDF and Probabilistic Models
%J International Journal of Computer Applications
%@ 0975-8887
%V 93
%N 18
%P 22-25
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Information retrieval is the task of finding unstructured documents that satisfy a user's information need. Various models exist for weighting the terms of corpus documents and queries. This work analyzes and evaluates the retrieval effectiveness of several IR models on the new FIRE 2011 data set. The experiments were performed with tf-idf and its variants along with probabilistic models. All experiments and evaluation used the open source search engine Terrier 3.5. Our results show that the tf-idf model gives the highest precision values on the news corpus data set.
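
To make the comparison concrete, the following is a minimal Python sketch of the kind of term weighting and evaluation the abstract describes: a plain tf-idf score, a BM25 score as one example of a probabilistic model, and precision at a cutoff. It is not Terrier's implementation, and Terrier's own weighting models may differ in detail. The function names, the whitespace tokenizer, the toy documents in the usage note, and the BM25 parameters `k1=1.2` and `b=0.75` are all assumptions made for illustration; DFR models are not covered here.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stopword removal.
    return text.lower().split()


def build_stats(docs):
    """Return tokenized documents, document frequencies, and average document length."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # count each term once per document
    avgdl = sum(len(toks) for toks in tokenized) / len(tokenized)
    return tokenized, df, avgdl


def tf_idf_score(query, toks, df, n_docs):
    """Sum of raw term frequency times log(N / df) over matching query terms."""
    tf = Counter(toks)
    return sum(tf[t] * math.log(n_docs / df[t]) for t in tokenize(query) if t in tf)


def bm25_score(query, toks, df, n_docs, avgdl, k1=1.2, b=0.75):
    """BM25 with a smoothed, non-negative idf; k1 and b are assumed defaults."""
    tf = Counter(toks)
    score = 0.0
    for t in tokenize(query):
        if t not in tf:
            continue
        idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
        length_norm = k1 * (1 - b + b * len(toks) / avgdl)
        score += idf * tf[t] * (k1 + 1) / (tf[t] + length_norm)
    return score


def rank(scores):
    """Document indices sorted by descending score (ties keep original order)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / k
```

A hypothetical usage: call `build_stats` on a list of document strings, score every document against a query with `tf_idf_score` or `bm25_score`, pass the scores to `rank`, and compare the two rankings with `precision_at_k` against a set of known relevant document indices. In a real experiment, the relevance judgments would come from the test collection rather than being assigned by hand.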

Index Terms

Computer Science
Information Sciences

Keywords

TF-IDF, BM25, DFR, Retrieval Effectiveness, Precision