International Journal of Computer Applications |
Foundation of Computer Science (FCS), NY, USA |
Volume 101 - Number 7 |
Year of Publication: 2014 |
Authors: Diab Abuaiadah, Jihad El Sana, Walid Abusalah |
DOI: 10.5120/17701-8680 |
Diab Abuaiadah, Jihad El Sana, Walid Abusalah. On the Impact of Dataset Characteristics on Arabic Document Classification. International Journal of Computer Applications. 101, 7 (September 2014), 31-38. DOI=10.5120/17701-8680
This paper examines how dataset characteristics affect the results of Arabic document classification algorithms that use TF-IDF representations. The experiments varied the stemmer, the set of categories, and the training set size, and found that these characteristics produced widely differing results, in one case attaining a remarkable 99% recall (accuracy). Adopting a standard dataset would eliminate this variability and allow researchers to compare published results directly.
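For readers unfamiliar with the two quantities named in the abstract, the following is a minimal sketch of a TF-IDF weighting and a per-category recall computation. It is illustrative only: the weighting shown (relative term frequency times log(N/df)) is one common variant and is not necessarily the one used in the paper, the function names `tf_idf` and `recall` are hypothetical, and documents are assumed to be already tokenized and stemmed.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def recall(true_labels, predicted_labels, category):
    """Fraction of documents truly in `category` that were predicted as such."""
    predictions_for_category = [
        predicted
        for actual, predicted in zip(true_labels, predicted_labels)
        if actual == category
    ]
    if not predictions_for_category:
        return 0.0
    hits = sum(predicted == category for predicted in predictions_for_category)
    return hits / len(predictions_for_category)
```

Under this variant a term that occurs in every document receives weight zero, which hints at why the choice of stemmer and categories can shift results: both change which terms are shared across documents.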