Research Article

Feature Selection and the Preservation of Infrequent and Highly Significant Attributes in the Context of Arabic Text Mining

by Saeed Raheel
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 106 - Number 3
Year of Publication: 2014
Authors: Saeed Raheel

Effective feature selection is a key component of building an efficient automatic document classifier. In Arabic literature, especially scientific literature, we regularly encounter infrequent non-Arabic words that are customarily eliminated during the pre-processing phase. Although infrequent, those words are highly pertinent to their documents and can therefore contribute to building a more efficient classification model and reinforce the subjectivity of the decision taken by the classifier. In this paper, we therefore propose four different feature selection solutions that both preserve a maximum number of those words and achieve satisfactory classification accuracy.
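
The abstract does not describe the four proposed solutions, so the following is only a minimal illustrative sketch of the general idea, not the paper's method. It assumes a simple document-frequency filter in which Latin-script tokens are exempt from the frequency cutoff. The names `tokenize`, `select_features`, and `min_df` are hypothetical and chosen for illustration.

```python
import re
from collections import Counter

# Assumption: a token is either a run of Latin letters or a run of Arabic letters.
TOKEN_RE = re.compile(r"[A-Za-z]+|[\u0621-\u064A]+")
LATIN_RE = re.compile(r"[A-Za-z]+")


def tokenize(text):
    """Split text into Latin-script and Arabic-script word tokens."""
    return TOKEN_RE.findall(text)


def select_features(docs, min_df=2):
    """Keep tokens that occur in at least min_df documents,
    plus every Latin-script token regardless of its frequency."""
    doc_freq = Counter()
    for doc in docs:
        # Count each token once per document (document frequency).
        doc_freq.update(set(tokenize(doc)))
    return {
        token
        for token, count in doc_freq.items()
        if count >= min_df or LATIN_RE.fullmatch(token)
    }


docs = [
    "تصنيف النصوص باستخدام SVM",
    "تصنيف الوثائق العربية",
    "تصنيف النصوص العربية",
]
features = select_features(docs, min_df=2)
```

In this toy corpus, the token `SVM` appears in only one document and would be dropped by a plain frequency cutoff, but the exemption keeps it, while rare Arabic tokens are still removed.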

Index Terms

Computer Science
Information Sciences


Arabic Text Mining, Machine Learning, Dimensionality Reduction, Automatic Classification