Research Article

A Statistical Approach of Keyword Extraction for Efficient Retrieval

by Shruti Luthra, Dinkar Arora, Kanika Mittal, Anusha Chhabra
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 168 - Number 7
Year of Publication: 2017
Authors: Shruti Luthra, Dinkar Arora, Kanika Mittal, Anusha Chhabra
10.5120/ijca2017914443

Shruti Luthra, Dinkar Arora, Kanika Mittal, Anusha Chhabra. A Statistical Approach of Keyword Extraction for Efficient Retrieval. International Journal of Computer Applications. 168, 7 (Jun 2017), 31-36. DOI=10.5120/ijca2017914443

@article{ 10.5120/ijca2017914443,
author = { Shruti Luthra and Dinkar Arora and Kanika Mittal and Anusha Chhabra },
title = { A Statistical Approach of Keyword Extraction for Efficient Retrieval },
journal = { International Journal of Computer Applications },
issue_date = { Jun 2017 },
volume = { 168 },
number = { 7 },
month = { Jun },
year = { 2017 },
issn = { 0975-8887 },
pages = { 31-36 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume168/number7/27889-2017914443/ },
doi = { 10.5120/ijca2017914443 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:16:06.747430+05:30
%A Shruti Luthra
%A Dinkar Arora
%A Kanika Mittal
%A Anusha Chhabra
%T A Statistical Approach of Keyword Extraction for Efficient Retrieval
%J International Journal of Computer Applications
%@ 0975-8887
%V 168
%N 7
%P 31-36
%D 2017
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Many keyword extraction techniques have been proposed to better match documents with a user's query, but most rely on tf-idf to weight query terms over the entire document. This can give poor results: a term with a low frequency in the document as a whole but a high frequency in one part of it may be ignored by the traditional tf-idf method. This paper improves keyword extraction with a hybrid technique in which the document is split into multiple domains using a master keyword, and the frequency of every unique word is computed in each domain. Words with high frequency are selected as candidate keywords, and the final selection is made on the basis of a graph constructed between the keywords using WordNet. Experiments conducted on various documents show that the proposed approach outperforms other keyword extraction methodologies by enhancing document retrieval.
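The domain-splitting and per-domain frequency steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the master keyword defines domain boundaries, so the rule used here (a new domain starts at each sentence that mentions the master keyword) is an assumption, as are the function names, the small stopword list, and the `min_count` threshold. The WordNet graph used for the final selection is omitted.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "are", "for", "on", "with"}


def tokenize(text):
    """Return lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def split_into_domains(text, master_keyword):
    """Split text into domains.

    Assumed rule: a new domain begins at every sentence that mentions
    the master keyword (the paper's actual rule may differ).
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    domains, current = [], []
    for sentence in sentences:
        if master_keyword.lower() in sentence.lower() and current:
            domains.append(" ".join(current))
            current = []
        current.append(sentence)
    if current:
        domains.append(" ".join(current))
    return domains


def candidate_keywords(text, master_keyword, min_count=2):
    """Return words whose count inside at least one domain reaches min_count.

    The value stored for each word is its highest per-domain count.
    """
    candidates = {}
    for domain in split_into_domains(text, master_keyword):
        for word, count in Counter(tokenize(domain)).items():
            if count >= min_count:
                candidates[word] = max(count, candidates.get(word, 0))
    return candidates


doc = (
    "Retrieval systems index documents. Ranking uses term weights. "
    "Retrieval quality depends on stemming. Stemming reduces words. "
    "Stemming helps recall."
)
print(candidate_keywords(doc, "retrieval"))  # {'stemming': 3}
```

In this toy document, "stemming" appears only in the second domain, where it is locally frequent. Counting per domain surfaces it as a candidate even though it is absent from the first part of the text, which is the situation the abstract says whole-document weighting can miss.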

Index Terms

Computer Science
Information Sciences

Keywords

Information Retrieval, Domain Splitting, Natural Language Processing, Inverse Document Frequency, WordNet