Research Article

Summarization of Document using Feature Selection Method: TF-IDF

Published on January 2025 by Debisankar Jena, Jyotirmayee Rautaray, Pranati Mishra
International Conference on Artificial Intelligence and Data Science Applications - 2023
Control System labs
ICAIDSC2023 - Number 1
January 2025
Authors: Debisankar Jena, Jyotirmayee Rautaray, Pranati Mishra
10.5120/icaidsc202409

Debisankar Jena, Jyotirmayee Rautaray, Pranati Mishra. Summarization of Document using Feature Selection Method: TF-IDF. International Conference on Artificial Intelligence and Data Science Applications - 2023. ICAIDSC2023, 1 (January 2025), 27-32. DOI=10.5120/icaidsc202409

@article{ 10.5120/icaidsc202409,
author = { Debisankar Jena and Jyotirmayee Rautaray and Pranati Mishra },
title = { Summarization of Document using Feature Selection Method: TF-IDF },
journal = { International Conference on Artificial Intelligence and Data Science Applications - 2023 },
issue_date = { January 2025 },
volume = { ICAIDSC2023 },
number = { 1 },
month = { January },
year = { 2025 },
issn = { 0975-8887 },
pages = { 27-32 },
numpages = 6,
url = { /proceedings/icaidsc2023/number1/summarization-of-document-using-feature-selection-method-tf-idf/ },
doi = { 10.5120/icaidsc202409 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 International Conference on Artificial Intelligence and Data Science Applications - 2023
%A Debisankar Jena
%A Jyotirmayee Rautaray
%A Pranati Mishra
%T Summarization of Document using Feature Selection Method: TF-IDF
%J International Conference on Artificial Intelligence and Data Science Applications - 2023
%@ 0975-8887
%V ICAIDSC2023
%N 1
%P 27-32
%D 2025
%I International Journal of Computer Applications
Abstract

In NLP, text summarization is the technique of condensing information from large texts into shorter ones. The phases of the summarization process include reading the texts, normalizing the data, removing stop words, stemming, morphological analysis, and producing the summary. Summarization methods fall into extractive, abstractive, and hybrid categories. The proposed Indian-language text summarization uses the extractive approach. One method for extractive text summarization is PageRank, which treats each sentence in the document as a vertex of a graph. Each node's initial score is determined by the number of words in the sentence, and the edges between nodes are determined by the cosine similarity of the sentences. Preprocessing, feature extraction, and graph building are the three main steps of the PageRank technique. Feature extraction is one of the simplest steps that can be taken toward a better understanding of the context. Once the initial text has been cleaned and normalized, we assign weights to specific terms in the document before modeling them. TF-IDF (Term Frequency-Inverse Document Frequency) is applied to the CNN dataset and produces better summaries than bag of words. Precision, recall, and F-score are calculated for the generated summaries, and TF-IDF delivers the best results.
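The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the smoothed IDF formula, the damping factor of 0.85, the fixed iteration count, and the unigram-overlap scoring are all assumptions chosen for illustration, and stop-word removal, stemming, and morphological analysis are omitted for brevity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (no stemming or stop-word removal)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(sentences):
    """Return one sparse TF-IDF vector (term -> weight) per sentence."""
    counts = [Counter(tokenize(s)) for s in sentences]
    n = len(counts)
    df = Counter(term for c in counts for term in c)  # sentences containing each term
    vectors = []
    for c in counts:
        total = sum(c.values()) or 1
        # Smoothed IDF is an assumption; it keeps every weight positive.
        vectors.append({t: (k / total) * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, k in c.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences with PageRank over a cosine-similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    vectors = tfidf_vectors(sentences)
    weights = [[cosine(vectors[i], vectors[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    # Initial score from sentence length in words, as the abstract describes.
    lengths = [len(tokenize(s)) for s in sentences]
    total = sum(lengths) or 1
    scores = [length / total for length in lengths]
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(sentences, k=2):
    """Pick the k highest-ranked sentences, kept in document order."""
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def overlap_scores(summary, reference):
    """Unigram-overlap precision, recall, and F-score (an assumed metric)."""
    s, r = Counter(tokenize(summary)), Counter(tokenize(reference))
    overlap = sum((s & r).values())
    precision = overlap / max(sum(s.values()), 1)
    recall = overlap / max(sum(r.values()), 1)
    f_score = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f_score
```

Calling `summarize(sentences, k)` on a list of sentence strings returns the `k` top-ranked sentences, and `overlap_scores(generated, reference)` compares a generated summary against a reference text. A bag-of-words baseline could be obtained in the same sketch by replacing the TF-IDF weights with raw term counts.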

Index Terms

Computer Science
Information Sciences

Keywords

ATS, PageRank, bag of words, F-score, TF-IDF, feature extraction, extractive