Research Article

Keyword and Keyphrase Extraction Techniques: A Literature Review

by Sifatullah Siddiqi, Aditi Sharan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 109 - Number 2
Year of Publication: 2015
Authors: Sifatullah Siddiqi, Aditi Sharan
10.5120/19161-0607

Sifatullah Siddiqi, Aditi Sharan. Keyword and Keyphrase Extraction Techniques: A Literature Review. International Journal of Computer Applications. 109, 2 (January 2015), 18-23. DOI=10.5120/19161-0607

@article{ 10.5120/19161-0607,
author = { Sifatullah Siddiqi and Aditi Sharan },
title = { Keyword and Keyphrase Extraction Techniques: A Literature Review },
journal = { International Journal of Computer Applications },
issue_date = { January 2015 },
volume = { 109 },
number = { 2 },
month = { January },
year = { 2015 },
issn = { 0975-8887 },
pages = { 18-23 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume109/number2/19161-0607/ },
doi = { 10.5120/19161-0607 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:43:44.705033+05:30
%A Sifatullah Siddiqi
%A Aditi Sharan
%T Keyword and Keyphrase Extraction Techniques: A Literature Review
%J International Journal of Computer Applications
%@ 0975-8887
%V 109
%N 2
%P 18-23
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In this paper we present a survey of techniques available in text mining for keyword and keyphrase extraction. Keywords and keyphrases help readers analyze large amounts of textual material quickly and search the internet efficiently, and they serve many other purposes as well. They are sets of representative words of a document that give interested readers a high-level specification of its content. They are widely used in Computer Science, especially in Information Retrieval and Natural Language Processing, for index generation, query refinement, text summarization, author assistance, etc. We also discuss some important feature selection metrics that researchers commonly employ to rank candidate keywords and keyphrases by importance.
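As a rough illustration of ranking candidate keywords by a weighting measure, the sketch below scores the words of one document with TF-IDF, a commonly used term-weighting scheme. The abstract does not name a specific metric, so the choice of TF-IDF is an assumption made here for illustration, and the helper names `tokenize` and `rank_keywords` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rank_keywords(doc, corpus, top_k=3):
    """Rank the words of `doc` by TF-IDF; `corpus` is assumed to contain `doc`.

    Illustrative sketch only: no stop-word removal, stemming, or phrase
    detection is performed.
    """
    term_freq = Counter(tokenize(doc))
    vocab_per_doc = [set(tokenize(d)) for d in corpus]
    n_docs = len(corpus)

    scores = {}
    for word, count in term_freq.items():
        # Document frequency: number of corpus documents containing the word.
        doc_freq = sum(1 for vocab in vocab_per_doc if word in vocab)
        scores[word] = count * math.log(n_docs / doc_freq)

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "keyword extraction ranks keyword candidates",
]
print(rank_keywords(corpus[2], corpus, top_k=1))
```

Words that occur often in one document but rarely elsewhere score highest, which is why a frequent but widespread word such as "the" falls below rarer words in the first sample document.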

Index Terms

Computer Science
Information Sciences

Keywords

Keyword extraction, keyphrase extraction, survey, feature selection, weighting measures