International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 185 - Number 35
Year of Publication: 2023
Authors: Ahmad Farhan AlShammari
DOI: 10.5120/ijca2023923137
Ahmad Farhan AlShammari. Implementation of Keyword Extraction using Term Frequency-Inverse Document Frequency (TF-IDF) in Python. International Journal of Computer Applications. 185, 35 (Sep 2023), 9-14. DOI=10.5120/ijca2023923137
The goal of this research is to develop a keyword extraction program in Python using Term Frequency-Inverse Document Frequency (TF-IDF). Keyword extraction identifies the set of words (keywords) that describe the content of a text, and the TF-IDF method measures the importance of each word in that text. The basic steps of keyword extraction are explained: preprocessing the text, creating a list of words, creating a bag of words, computing term frequency (TF), computing inverse document frequency (IDF), computing TF-IDF, creating keywords, and sorting keywords. The developed program was tested on an experimental text from Wikipedia. It performed each of these steps and produced the required results.
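The steps listed in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's program: the function names, the tiny `STOP_WORDS` set, the choice to treat each input string as one document, and the IDF formula log(N / df) are all assumptions made here, since the abstract does not specify them.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "on"}


def preprocess(text):
    """Step 1: lowercase the text and replace non-letter characters with spaces."""
    return re.sub(r"[^a-z\s]", " ", text.lower())


def list_of_words(text):
    """Step 2: split the cleaned text into words and drop stop words."""
    return [w for w in preprocess(text).split() if w not in STOP_WORDS]


def tf_idf_keywords(documents, top_n=5):
    """Return, for each document, its top_n (word, score) pairs sorted by TF-IDF."""
    docs_words = [list_of_words(d) for d in documents]
    # Step 3: bag of words (word -> count) for each document.
    bags = [Counter(words) for words in docs_words]

    # Step 5: IDF, assumed here to be log(N / df) with no smoothing.
    n_docs = len(documents)
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag.keys())  # each document counts a word at most once
    idf = {w: math.log(n_docs / df) for w, df in doc_freq.items()}

    results = []
    for words, bag in zip(docs_words, bags):
        total = len(words)
        # Steps 4 and 6: TF is count / document length; TF-IDF is TF * IDF.
        scores = {w: (count / total) * idf[w] for w, count in bag.items()}
        # Steps 7 and 8: keep the words as keywords, highest score first
        # (ties broken alphabetically so the order is deterministic).
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append(ranked[:top_n])
    return results


if __name__ == "__main__":
    docs = ["The cat sat on the mat.", "The dog sat on the log."]
    for keywords in tf_idf_keywords(docs):
        print(keywords)
```

With the unsmoothed IDF assumed above, a word that occurs in every document scores zero, so "sat" ranks last in both example documents. Production libraries often apply smoothing to avoid this.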