Research Article

A Hybrid Approach to Extract Keyphrases from Medical Documents

by Kamal Sarkar
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 63 - Number 18
Year of Publication: 2013
Authors: Kamal Sarkar
10.5120/10565-5528

Kamal Sarkar. A Hybrid Approach to Extract Keyphrases from Medical Documents. International Journal of Computer Applications. 63, 18 (February 2013), 14-19. DOI=10.5120/10565-5528

@article{ 10.5120/10565-5528,
author = { Kamal Sarkar },
title = { A Hybrid Approach to Extract Keyphrases from Medical Documents },
journal = { International Journal of Computer Applications },
issue_date = { February 2013 },
volume = { 63 },
number = { 18 },
month = { February },
year = { 2013 },
issn = { 0975-8887 },
pages = { 14-19 },
numpages = {6},
url = { https://ijcaonline.org/archives/volume63/number18/10565-5528/ },
doi = { 10.5120/10565-5528 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:14:39.905874+05:30
%A Kamal Sarkar
%T A Hybrid Approach to Extract Keyphrases from Medical Documents
%J International Journal of Computer Applications
%@ 0975-8887
%V 63
%N 18
%P 14-19
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Keyphrases are phrases of one or more words that represent the important concepts in an article. They are useful for a variety of tasks such as text summarization, automatic indexing, clustering/classification, and text mining. This paper presents a hybrid approach to keyphrase extraction from medical documents. The approach combines two methods: the first assigns weights to candidate keyphrases based on a combination of features such as position, term frequency, and inverse document frequency; the second assigns weights to candidate keyphrases based on their similarity to the structure and characteristics of keyphrases held in memory (a stored list of keyphrases). The paper also introduces an efficient candidate keyphrase identification method as the first component of the proposed keyphrase extraction system. The experimental results show that the proposed hybrid approach performs better than some state-of-the-art keyphrase extraction approaches.
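
The two-part weighting described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's method: the abstract does not give the actual formulas, so the n-gram candidate filter, the multiplicative position/TF/IDF score, the Jaccard word overlap used as the "memory" similarity, the mixing weight alpha, and all function names here are assumptions chosen only for illustration.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "of", "in", "and", "or", "to", "for",
             "is", "are", "with", "on", "by"}


def extract_candidates(text, max_len=3):
    """Collect stopword-free n-grams (1..max_len words) with counts and first positions."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    first_pos = {}
    for i in range(len(tokens)):
        for n in range(1, max_len + 1):
            gram = tokens[i:i + n]
            # Stop extending once we run off the end or hit a stopword.
            if len(gram) < n or any(t in STOPWORDS for t in gram):
                break
            phrase = " ".join(gram)
            counts[phrase] += 1
            first_pos.setdefault(phrase, i)
    return counts, first_pos, len(tokens)


def feature_score(phrase, counts, first_pos, n_tokens, doc_freq, n_docs):
    """Assumed combination: term frequency * smoothed IDF * position weight."""
    tf = counts[phrase]
    idf = math.log((n_docs + 1) / (doc_freq.get(phrase, 0) + 1)) + 1.0
    position = 1.0 - first_pos[phrase] / n_tokens  # earlier phrases weigh more
    return tf * idf * position


def memory_score(phrase, stored_keyphrases):
    """Assumed similarity: best Jaccard word overlap with any stored keyphrase."""
    words = set(phrase.split())
    best = 0.0
    for kp in stored_keyphrases:
        kp_words = set(kp.lower().split())
        best = max(best, len(words & kp_words) / len(words | kp_words))
    return best


def rank_keyphrases(text, stored_keyphrases, doc_freq, n_docs, alpha=0.5, top_k=5):
    """Rank candidates by a weighted mix of the feature score and the memory score."""
    counts, first_pos, n_tokens = extract_candidates(text)
    if not counts:
        return []
    raw = {p: feature_score(p, counts, first_pos, n_tokens, doc_freq, n_docs)
           for p in counts}
    top = max(raw.values())  # normalise feature scores into (0, 1]
    combined = {p: alpha * raw[p] / top
                + (1 - alpha) * memory_score(p, stored_keyphrases)
                for p in raw}
    return sorted(combined, key=lambda p: (-combined[p], p))[:top_k]
```

Calling rank_keyphrases with a document, a stored keyphrase list, and background document frequencies returns the top-ranked candidate phrases. Raising alpha makes the ranking rely more on the position and frequency features, while lowering it favours candidates that resemble the stored keyphrases.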

Index Terms

Computer Science
Information Sciences

Keywords

keyphrase extraction, medical domain, automatic indexing, metadata, partial supervision