International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 185 - Number 33
Year of Publication: 2023
Authors: Abubakar Ahmad Aliero, Bashir Sulaimon Adebayo, Hamzat Olanrewaju Aliyu, Amina Gogo Tafida, Bashar Umar Kangiwa, Nasiru Muhammad Dankolo
DOI: 10.5120/ijca2023923106
Abubakar Ahmad Aliero, Bashir Sulaimon Adebayo, Hamzat Olanrewaju Aliyu, Amina Gogo Tafida, Bashar Umar Kangiwa, Nasiru Muhammad Dankolo. Systematic Review on Text Normalization Techniques and its Approach to Non-Standard Words. International Journal of Computer Applications. 185, 33 (Sep 2023), 44-55. DOI=10.5120/ijca2023923106
Text normalization is the process of transforming text into a standardized, canonical form. It involves correcting spelling errors, expanding abbreviations, resolving contractions, and normalizing punctuation, capitalization, and other linguistic variations to ensure consistent and coherent representations of textual data. The goal of text normalization is to reduce lexical and orthographic variation in text, making it easier to process, analyze, and understand. It is a critical preprocessing step in many natural language processing (NLP) tasks, such as machine translation, text-to-speech synthesis, sentiment analysis, and information retrieval. Many techniques and approaches have been used to normalize different kinds of text, including User-Generated Content (UGC), and this normalization helps to improve the performance of downstream NLP tasks. This paper provides a broad picture of state-of-the-art research in the area of text normalization from 2018 to 2022. About 54 journal and conference papers were selected to identify and analyze the trends in text normalization techniques, approaches, and issues in the field. The use of datasets and evaluation metrics was excluded and is left for future research.
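As an illustration of the operations listed above, the following is a minimal rule-based sketch in Python. It is not a method taken from the reviewed papers: the function name `normalize` and the two small lookup tables are hypothetical, and a practical system would rely on much larger lexicons or a learned model.

```python
import re

# Hypothetical lookup tables, chosen only for illustration.
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "i'm": "i am", "it's": "it is"}
ABBREVIATIONS = {"u": "you", "gr8": "great", "pls": "please", "b4": "before"}


def normalize(text: str) -> str:
    """Apply a few simple normalization rules to a piece of text."""
    # Normalize capitalization and typographic apostrophes.
    text = text.lower().replace("\u2019", "'")
    # Normalize punctuation: collapse runs such as "!!!" into "!".
    text = re.sub(r"([!?.,])\1+", r"\1", text)
    # Collapse repeated whitespace.
    text = re.sub(r"\s+", " ", text).strip()
    # Split into word tokens (keeping apostrophes) and single punctuation marks.
    tokens = re.findall(r"[a-z0-9']+|[^\sa-z0-9']", text)
    # Resolve contractions and expand abbreviations by dictionary lookup.
    expanded = []
    for token in tokens:
        token = CONTRACTIONS.get(token, token)
        token = ABBREVIATIONS.get(token, token)
        expanded.append(token)
    # Rejoin tokens and attach punctuation to the preceding word.
    result = " ".join(expanded)
    return re.sub(r"\s+([!?.,;:])", r"\1", result)


print(normalize("Pls call me B4 u go!!!"))    # please call me before you go!
print(normalize("It's gr8,  I can't wait"))   # it is great, i cannot wait
```

Dictionary lookup of this kind only covers non-standard words that are already listed, and it does not address spelling correction, which is one reason the literature explores statistical and neural approaches.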