International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 185 - Number 33
Year of Publication: 2023
Authors: Moheb R. Girgis, Marina Esam, Mamdouh M. Gomaa
DOI: 10.5120/ijca2023923105
Moheb R. Girgis, Marina Esam, Mamdouh M. Gomaa. Automated Extractive Text Summarization using Genetic and Simulated Annealing Algorithms and their Hybridization. International Journal of Computer Applications. 185, 33 (Sep 2023), 34-43. DOI=10.5120/ijca2023923105
With the growth of information, the rise of online publishing, and widespread Internet access, a huge volume of electronic documents is now available online. Automatic text summarization (ATS) has attracted great interest as a way to help users and computer systems process vast amounts of text and extract relevant knowledge more efficiently. An ATS system generates a summary of a document, i.e., a short text that contains its main information. The aim of this work is to study the performance of ATS systems that use metaheuristic and heuristic algorithms for automated extractive text summarization. To this end, this paper proposes Genetic Algorithm (GA)-based, Simulated Annealing (SA)-based, and hybrid GA-SA-based methods for solving the single document summarization (SDS) problem. The objective of these methods is to generate a high-quality summary that contains the main information of a given document. To assess the quality of the candidate solutions (summaries) they generate, these methods maximize an objective function, expressed as a weighted sum of five features: sentence position, similarity with the title, sentence length, cohesion, and coverage. The paper presents the results of experiments conducted to evaluate the quality of the summaries generated by the proposed SDS algorithms on sample articles from the CNN corpus, using co-occurrence statistical metrics (ROUGE metrics) and three content-based metrics (Fitness, Readability, and Cohesion).
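To make the weighted-sum objective and the annealing search concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that each of the five features has already been scored in [0, 1] for a candidate summary, and it pairs the objective with a basic simulated annealing loop over fixed-size sentence selections. The names `weighted_objective` and `anneal`, the swap-based neighbourhood move, the geometric cooling schedule, and all parameter values are assumptions chosen for illustration; the abstract does not specify them.

```python
import math
import random

# The five features named in the abstract; the key names are illustrative.
FEATURES = ("position", "title_similarity", "length", "cohesion", "coverage")


def weighted_objective(scores, weights):
    """Weighted sum of the five feature scores (each assumed to be in [0, 1])."""
    return sum(weights[name] * scores[name] for name in FEATURES)


def anneal(n_sentences, k, score, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Search for k of n_sentences sentence indices that maximize score(selection).

    Assumes 0 < k < n_sentences. `score` is any callable that maps a set of
    sentence indices to a number, e.g. a wrapper around weighted_objective.
    """
    rng = random.Random(seed)
    current = set(rng.sample(range(n_sentences), k))
    current_score = score(current)
    best, best_score = set(current), current_score
    t = t0
    for _ in range(steps):
        # Neighbour move (assumed): swap one selected sentence for an unselected one.
        out_idx = rng.choice(sorted(current))
        in_idx = rng.choice(sorted(set(range(n_sentences)) - current))
        candidate = (current - {out_idx}) | {in_idx}
        cand_score = score(candidate)
        delta = cand_score - current_score
        # Always accept improvements; accept worse moves with probability exp(delta / t).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = set(current), current_score
        t *= cooling  # geometric cooling (assumed schedule)
    return sorted(best), best_score
```

In a hybrid scheme of the kind the abstract mentions, a loop like `anneal` could plausibly be used to refine individuals produced by a genetic algorithm, but how the paper actually combines the two is not stated here.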