International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 106 - Number 3
Year of Publication: 2014
Authors: Saeed Raheel
DOI: 10.5120/18503-9572
Saeed Raheel. Feature Selection and the Preservation of Infrequent and Highly Significant Attributes in the Context of Arabic Text Mining. International Journal of Computer Applications. 106, 3 (November 2014), 31-36. DOI=10.5120/18503-9572
Effective feature selection is a key component in building an efficient automatic document classifier. In Arabic literature, especially scientific literature, we regularly encounter infrequent non-Arabic words that are customarily eliminated during the pre-processing phase. Although infrequent, those words are highly pertinent to their documents and can therefore contribute to building a more efficient classification model and reinforce the subjectivity of the decision taken by the classifier. We therefore propose in this paper four different feature selection solutions that preserve as many of those words as possible while still achieving satisfactory classification accuracy.
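To make the underlying idea concrete, the following is a minimal sketch of a frequency-based feature filter that exempts non-Arabic tokens from elimination. It is not one of the four solutions proposed in the paper, which are not described in this abstract. The function names, the `min_df` threshold, and the rule used to decide that a token is non-Arabic (at least one Latin letter and no character from the basic Arabic Unicode block) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: "non-Arabic" means the token has at least one Latin letter
# and no character from the basic Arabic Unicode block (U+0600 to U+06FF).
ARABIC_CHAR = re.compile(r"[\u0600-\u06FF]")
LATIN_CHAR = re.compile(r"[A-Za-z]")


def is_non_arabic(token):
    """Return True for tokens written in Latin script only (hypothetical rule)."""
    return bool(LATIN_CHAR.search(token)) and not ARABIC_CHAR.search(token)


def select_features(documents, min_df=2):
    """Keep tokens that appear in at least `min_df` documents,
    plus every non-Arabic token regardless of its frequency.

    `documents` is a list of token lists; the result is a set of tokens.
    """
    doc_freq = Counter()
    for tokens in documents:
        # Count each token once per document (document frequency).
        doc_freq.update(set(tokens))
    return {
        token
        for token, count in doc_freq.items()
        if count >= min_df or is_non_arabic(token)
    }


if __name__ == "__main__":
    docs = [
        ["خوارزمية", "تصنيف", "SVM"],
        ["خوارزمية", "نص"],
        ["تصنيف", "نص", "نادر"],
    ]
    print(sorted(select_features(docs, min_df=2)))
```

In this toy corpus the token `SVM` occurs in a single document and would be dropped by a plain document-frequency cutoff, yet it survives because it is non-Arabic, while the equally rare Arabic token is removed.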