Research Article

Text Mining: Use of TF-IDF to Examine the Relevance of Words to Documents

by Shahzad Qaiser, Ramsha Ali
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 181 - Number 1
Year of Publication: 2018
DOI: 10.5120/ijca2018917395

Shahzad Qaiser, Ramsha Ali. Text Mining: Use of TF-IDF to Examine the Relevance of Words to Documents. International Journal of Computer Applications. 181, 1 (Jul 2018), 25-29. DOI=10.5120/ijca2018917395

@article{ 10.5120/ijca2018917395,
author = { Shahzad Qaiser and Ramsha Ali },
title = { Text Mining: Use of TF-IDF to Examine the Relevance of Words to Documents },
journal = { International Journal of Computer Applications },
issue_date = { Jul 2018 },
volume = { 181 },
number = { 1 },
month = { Jul },
year = { 2018 },
issn = { 0975-8887 },
pages = { 25-29 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume181/number1/29681-2018917395/ },
doi = { 10.5120/ijca2018917395 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T01:04:38.459200+05:30
%A Shahzad Qaiser
%A Ramsha Ali
%T Text Mining: Use of TF-IDF to Examine the Relevance of Words to Documents
%J International Journal of Computer Applications
%@ 0975-8887
%V 181
%N 1
%P 25-29
%D 2018
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper discusses the use of TF-IDF (term frequency-inverse document frequency) to examine the relevance of keywords to documents in a corpus. The study focuses on how the algorithm can be applied to a number of documents. First, the working principle of TF-IDF and the steps to follow when implementing it are elaborated. Second, results from executing the algorithm are presented to verify the findings, and the strengths and weaknesses of the TF-IDF algorithm are compared. The paper also discusses how these weaknesses can be tackled. Finally, the work is summarized and future research directions are discussed.
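
As a rough illustration of the kind of computation the abstract refers to, the following is a minimal Python sketch of TF-IDF weighting. It is not taken from the paper: the function name `tf_idf`, the whitespace tokenizer, the sample documents, and the specific variant (term count divided by document length, multiplied by the natural log of the number of documents over the document frequency, with no smoothing) are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    # Assumption: lowercase and split on whitespace; no stop-word removal or stemming.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # tf = count / document length; idf = ln(N / df), unsmoothed.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical two-document corpus, used only to exercise the function.
docs = [
    "the car is driven on the road",
    "the truck is driven on the highway",
]
scores = tf_idf(docs)
```

Under this variant, a term that occurs in every document (such as "the" in the sample corpus) receives a weight of zero, while a term confined to a single document (such as "car") receives a positive weight. Other formulations, for example ones that smooth the logarithm, would give different numbers.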

Index Terms

Computer Science
Information Sciences

Keywords

TF-IDF, Data Mining, Relevance of Words to Documents