Research Article

The Effect of Term Importance Degree on Text Retrieval

by Soheila Karbasi, Mehdi Yaghoubi
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 38 - Number 1
Year of Publication: 2012
10.5120/4653-6734

Soheila Karbasi, Mehdi Yaghoubi. The Effect of Term Importance Degree on Text Retrieval. International Journal of Computer Applications 38, 1 (January 2012), 27-31. DOI=10.5120/4653-6734

@article{ 10.5120/4653-6734,
author = { Soheila Karbasi and Mehdi Yaghoubi },
title = { The Effect of Term Importance Degree on Text Retrieval },
journal = { International Journal of Computer Applications },
issue_date = { January 2012 },
volume = { 38 },
number = { 1 },
month = { January },
year = { 2012 },
issn = { 0975-8887 },
pages = { 27-31 },
numpages = {5},
url = { https://ijcaonline.org/archives/volume38/number1/4653-6734/ },
doi = { 10.5120/4653-6734 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Soheila Karbasi
%A Mehdi Yaghoubi
%T The Effect of Term Importance Degree on Text Retrieval
%J International Journal of Computer Applications
%@ 0975-8887
%V 38
%N 1
%P 27-31
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Various approaches to index term-weighting have been investigated. Term-weighting is an indispensable step in document ranking for most retrieval systems. Moreover, practical information retrieval systems must cope with an explosive growth in documents of various sizes containing terms of various frequencies, so an appropriate term-weighting scheme has a crucial impact on overall system performance. This paper investigates the impact of the term-weighting parameters used in the best-known retrieval models, focusing in particular on the normalization of term frequency in weighting schemes. A novel factor, called "term importance degree", has been identified; it can be applied to term-weighting schemes through several parameters. The correlations calculated between the parameters of the weighting schemes confirmed that this factor helps increase the performance of text retrieval systems. Two term frequency normalization models are incorporated into a basic term-weighting scheme to show the importance of terms. The experiments were carried out on standard test collections, and the results were validated by multiple statistical tests.
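For readers unfamiliar with term frequency normalization, the sketch below illustrates the general technique using two standard forms from well-known retrieval models: the Okapi BM25 term-frequency component, which saturates raw frequency and normalizes by document length, and SMART-style logarithmic dampening. It is not the authors' "term importance degree" factor, which this page does not define. The function names, the IDF variant (with +1 inside the logarithm to keep it non-negative), and the defaults k1 = 1.2 and b = 0.75 are conventional choices assumed here for illustration.

```python
import math


def bm25_tf(tf: float, doc_len: float, avg_doc_len: float,
            k1: float = 1.2, b: float = 0.75) -> float:
    """Okapi BM25 term-frequency component (illustrative sketch).

    Raw frequency saturates towards k1 + 1, and the saturation point is
    scaled by document length relative to the collection average;
    b controls how strongly longer documents are penalized.
    """
    if tf <= 0:
        return 0.0
    norm = k1 * ((1.0 - b) + b * doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + norm)


def idf(n_docs: int, doc_freq: int) -> float:
    """Robertson/Sparck Jones-style IDF; the +1 inside the log is an
    assumed variant that keeps the value non-negative for common terms."""
    return math.log(1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def log_tf(tf: float) -> float:
    """SMART-style logarithmic dampening: 1 + ln(tf) for tf > 0, else 0."""
    return 1.0 + math.log(tf) if tf > 0 else 0.0


def bm25_weight(tf: float, doc_len: float, avg_doc_len: float,
                n_docs: int, doc_freq: int,
                k1: float = 1.2, b: float = 0.75) -> float:
    """Full BM25 term weight: normalized TF times IDF."""
    return bm25_tf(tf, doc_len, avg_doc_len, k1, b) * idf(n_docs, doc_freq)


if __name__ == "__main__":
    # Same raw tf in a short and a long document: length normalization
    # gives the term in the shorter document a higher weight.
    short = bm25_tf(3, doc_len=100, avg_doc_len=200)
    long_ = bm25_tf(3, doc_len=400, avg_doc_len=200)
    print(f"short doc: {short:.3f}, long doc: {long_:.3f}")
```

Setting b = 0 disables length normalization, while b = 1 applies it fully; varying such parameters and comparing the resulting rankings is one simple way to see how strongly term frequency normalization shapes retrieval results.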

Index Terms

Computer Science
Information Sciences

Keywords

Text retrieval, Term-weighting scheme, Term frequency normalization, Term importance degree