International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 135 - Number 2
Year of Publication: 2016
Authors: Rouhia M. Sallam, Hamdy M. Mousa, Mahmoud Hussein
DOI: 10.5120/ijca2016908328
Rouhia M. Sallam, Hamdy M. Mousa, Mahmoud Hussein. Improving Arabic Text Categorization using Normalization and Stemming Techniques. International Journal of Computer Applications. 135, 2 (February 2016), 38-43. DOI=10.5120/ijca2016908328
Text categorization is the task of assigning documents, based on their contents, to one or more predefined categories. Achieving high categorization accuracy remains one of the major challenges, and the process is also time-consuming. We propose an approach to tackle these challenges. The proposed approach uses the Frequency Ratio Accumulation Method (FRAM) as a classifier. Features are represented using the bag-of-words technique, and an improved Term Frequency (TF) technique is used for feature selection. The proposed approach is tested on known datasets. Experiments are conducted without normalization and stemming, with each of them alone, and with both. The results of the proposed approach are generally improved compared to existing techniques. The performance measures considered for the proposed Arabic text categorization approach are accuracy, recall, precision and F-measure (F1). Using normalization, the averages of the obtained results are 97.50%, 97.50%, 97.51%, and 97.49%, respectively.
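To make the pipeline concrete, the following is a minimal sketch, not the authors' implementation, of Arabic normalization, bag-of-words features, and a FRAM classifier. It assumes a commonly used normalization rule set (strip diacritics and tatweel, map أ/إ/آ to ا, ة to ه, ى to ي), which may differ from the rules used in the paper. It also assumes the frequently cited FRAM formulation, FR(t, c) = (f(t, c) / Σ_t' f(t', c)) / Σ_c' (f(t, c') / Σ_t' f(t', c')), where a document is assigned to the category with the largest sum of FR values over its terms. Stemming and the paper's improved TF feature selection are omitted. The names `normalize`, `bag_of_words`, and `FRAM` are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Assumed normalization rules (common in Arabic NLP, not necessarily the paper's exact set).
_DIACRITICS = re.compile(r"[\u064B-\u0652]")  # tanween, fatha, damma, kasra, shadda, sukun
_TATWEEL = "\u0640"

def normalize(text: str) -> str:
    text = _DIACRITICS.sub("", text)               # drop short-vowel diacritics
    text = text.replace(_TATWEEL, "")               # drop elongation character
    text = re.sub("[\u0623\u0625\u0622]", "\u0627", text)  # alef variants -> bare alef
    text = text.replace("\u0629", "\u0647")         # taa marbuta -> haa
    text = text.replace("\u0649", "\u064A")         # alef maqsura -> yaa
    return text

def bag_of_words(text: str) -> Counter:
    # Whitespace tokenization of the normalized text (a simplifying assumption).
    return Counter(normalize(text).split())

class FRAM:
    """Hypothetical sketch of a Frequency Ratio Accumulation Method classifier."""

    def fit(self, docs, labels):
        freq = defaultdict(Counter)  # freq[c][t]: occurrences of term t in category c
        for doc, label in zip(docs, labels):
            freq[label].update(bag_of_words(doc))
        self.categories = sorted(freq)
        # Relative frequency of each term within its category.
        rel = {}
        for c in self.categories:
            total = sum(freq[c].values())
            rel[c] = {t: n / total for t, n in freq[c].items()}
        vocab = set().union(*(rel[c] for c in self.categories))
        # Frequency ratio: the share of a term's relative frequency that belongs to c.
        self.fr = {c: {} for c in self.categories}
        for t in vocab:
            denom = sum(rel[c].get(t, 0.0) for c in self.categories)
            for c in self.categories:
                self.fr[c][t] = rel[c].get(t, 0.0) / denom
        return self

    def predict(self, doc):
        terms = bag_of_words(doc)  # distinct terms; unseen terms contribute 0
        scores = {c: sum(self.fr[c].get(t, 0.0) for t in terms) for c in self.categories}
        return max(scores, key=scores.get)
```

Because the frequency ratios of a term sum to 1 across categories, a term seen in only one category contributes its full weight to that category, while terms spread evenly across categories contribute little discriminative signal.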