International Conference on Artificial Intelligence and Data Science Applications - 2023 |
Control System labs |
ICAIDSC2023 - Number 1 |
January 2025 |
Authors: Debisankar Jena, Jyotirmayee Rautaray, Pranati Mishra |
10.5120/icaidsc202409 |
Debisankar Jena, Jyotirmayee Rautaray, Pranati Mishra. Summarization of Document using Feature Selection Method: TF-IDF. International Conference on Artificial Intelligence and Data Science Applications - 2023. ICAIDSC2023, 1 (January 2025), 27-32. DOI=10.5120/icaidsc202409
In NLP, text summarization is the technique of condensing information from large texts into shorter ones. The phases of the summarization process include reading the texts, normalizing the data, removing stop words, stemming, morphological analysis, and producing the summary. Summarization methods fall into extractive, abstractive, and hybrid categories. For the proposed Indian-language text summarization, extractive summarization is used. One method for extractive text summarization is PageRank. It treats each sentence in the document as a vertex of a graph. Each node's initial score is determined by the number of words in the sentence, and the edges between nodes are weighted by the cosine similarity of the sentences. Preprocessing, feature extraction, and graph building are the three main steps of the PageRank technique. Feature extraction is one of the simplest steps to take toward a better understanding of the context. Once the initial text has been cleaned and normalized, we apply weights to the terms in the document before modeling them. TF-IDF (Term Frequency-Inverse Document Frequency) is applied to the CNN dataset and produces better summaries than a bag-of-words representation. Precision, recall, and F-score are calculated for the generated summaries, and TF-IDF delivers the best results.
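The pipeline described in the abstract (TF-IDF weighting, cosine-similarity edges, PageRank over sentences, then precision, recall, and F-score) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the smoothed IDF formula, the damping factor of 0.85, the fixed iteration count, and the sentence-level overlap used for evaluation are all assumptions made for illustration; the paper may use different choices (for example, word-overlap metrics such as ROUGE).

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"\w+", sentence.lower())


def tfidf_vectors(sentences):
    """Build one sparse TF-IDF vector per sentence (each sentence is a 'document')."""
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    # Document frequency: number of sentences containing each term.
    df = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values()) or 1
        # Assumption: smoothed IDF, log((1 + n) / (1 + df)) + 1.
        vectors.append({
            t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in doc.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences with weighted PageRank on the similarity graph."""
    vecs = tfidf_vectors(sentences)
    n = len(sentences)
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    # Initial scores proportional to sentence length, as the abstract describes.
    lengths = [len(tokenize(s)) for s in sentences]
    total = sum(lengths) or 1
    scores = [length / total for length in lengths]
    out_weight = [sum(row) for row in sim]
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, k=2):
    """Return the k highest-scoring sentences in their original order."""
    scores = rank_sentences(sentences)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]


def precision_recall_f1(selected, reference):
    """Assumption: evaluate by overlap between selected and reference items."""
    selected, reference = set(selected), set(reference)
    overlap = len(selected & reference)
    p = overlap / len(selected) if selected else 0.0
    r = overlap / len(reference) if reference else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

Calling `summarize(sentences, k)` on a list of already cleaned sentences returns the `k` top-ranked sentences, and `precision_recall_f1` compares the selected sentence indices against a reference selection. Stop-word removal and stemming are omitted here for brevity and would normally run before `tfidf_vectors`.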