International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 185 - Number 36
Year of Publication: 2023
Authors: Ahmad Farhan AlShammari |
DOI: 10.5120/ijca2023923160
Ahmad Farhan AlShammari. Implementation of Text Similarity using Word Frequency and Cosine Similarity in Python. International Journal of Computer Applications. 185, 36 (Oct 2023), 54-59. DOI=10.5120/ijca2023923160
The goal of this research is to develop a text similarity program in Python using word frequency and cosine similarity. Word frequency is used to measure the importance of each word in a text, and cosine similarity is used to measure the similarity between texts. The basic steps of text similarity are explained: preprocessing the text, creating a list of words, creating a bag of words, creating the word frequency, calculating the cosine similarity, and printing the similarity score. The developed program was tested on an experimental text from Wikipedia. It successfully performed each of these steps and produced the required results.
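
The steps listed in the abstract can be illustrated with a minimal Python sketch. This is not the author's program: the function names `preprocess` and `cosine_similarity`, the regular-expression tokenizer, and the two sample sentences are assumptions made for illustration only, and the paper's own preprocessing may differ (for example, in how it handles punctuation or stop words).

```python
import math
import re
from collections import Counter


def preprocess(text):
    """Lowercase the text and return its alphabetic tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Return the cosine similarity of two texts based on raw word counts."""
    # Steps 1-2: preprocess each text into a list of words.
    words_a = preprocess(text_a)
    words_b = preprocess(text_b)

    # Step 3: the bag of words is the shared vocabulary of both texts.
    bag = sorted(set(words_a) | set(words_b))

    # Step 4: word frequency vectors over the bag of words.
    freq_a = Counter(words_a)
    freq_b = Counter(words_b)
    vec_a = [freq_a[w] for w in bag]
    vec_b = [freq_b[w] for w in bag]

    # Step 5: cosine similarity = dot(a, b) / (|a| * |b|).
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


# Step 6: print the similarity score for two hypothetical sample sentences.
score = cosine_similarity("The cat sat on the mat.", "The dog sat on the log.")
print(f"Similarity: {score:.2f}")  # prints "Similarity: 0.75"
```

In this sketch, identical texts score 1.0 and texts with no words in common score 0.0. Because the vectors hold raw counts, frequent words such as "the" contribute more to the score than rare ones.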