International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 168 - Number 7
Year of Publication: 2017
Authors: Shruti Luthra, Dinkar Arora, Kanika Mittal, Anusha Chhabra
DOI: 10.5120/ijca2017914443
Shruti Luthra, Dinkar Arora, Kanika Mittal, Anusha Chhabra. A Statistical Approach of Keyword Extraction for Efficient Retrieval. International Journal of Computer Applications. 168, 7 (Jun 2017), 31-36. DOI=10.5120/ijca2017914443
A large number of keyword extraction techniques have been proposed to better match documents with a user's query. Most of them rely on tf-idf to weight query terms over the entire document, which can give poor results: a term with a low term frequency in the document as a whole but a high frequency in one part of it may be ignored by the traditional tf-idf method. This paper improves keyword extraction with a hybrid technique in which the entire document is split into multiple domains using a master keyword, and the frequency of every unique word is computed within each domain. Words with high frequency are selected as candidate keywords, and the final selection is made on the basis of a graph constructed between the keywords using WordNet. Experiments conducted on various documents show that the proposed approach outperforms other keyword extraction methodologies by enhancing document retrieval.
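
The per-domain frequency step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the master keyword partitions the document, so the rule used here (a new domain starts at each occurrence of the master keyword) is an assumption, as are the function names `tokenize`, `split_into_domains`, and `candidate_keywords` and the `top_n` parameter. The WordNet graph used for the final selection is omitted.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def split_into_domains(tokens, master_keyword):
    """Split a token list into domains.

    Assumption (not stated in the abstract): a new domain begins at
    every occurrence of the master keyword.
    """
    domains, current = [], []
    for tok in tokens:
        if tok == master_keyword and current:
            domains.append(current)
            current = []
        current.append(tok)
    if current:
        domains.append(current)
    return domains


def candidate_keywords(text, master_keyword, top_n=2, stopwords=frozenset()):
    """Return the top_n most frequent words of each domain.

    Counting within each domain lets a word that is rare in the whole
    document, but frequent in one part of it, still become a candidate.
    """
    tokens = [t for t in tokenize(text) if t not in stopwords]
    result = []
    for domain in split_into_domains(tokens, master_keyword):
        # The master keyword itself is excluded from the candidates.
        counts = Counter(t for t in domain if t != master_keyword)
        result.append([word for word, _ in counts.most_common(top_n)])
    return result


if __name__ == "__main__":
    doc = "retrieval index index index query. retrieval graph graph node"
    print(candidate_keywords(doc, "retrieval", top_n=1))
```

In this toy document, "graph" accounts for only two of the nine tokens overall yet dominates the second domain, which is the situation the abstract says a whole-document tf-idf weighting can miss.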