International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 183 - Number 4
Year of Publication: 2021
Authors: Maha A. Hana, Elsayed Badr, Sally Gamal, Naglaa Shehata
DOI: 10.5120/ijca2021921324
Maha A. Hana, Elsayed Badr, Sally Gamal, Naglaa Shehata. Breast Cancer Microarray Dataset with the Decision Tree Classifier and Efficient Scaling Techniques. International Journal of Computer Applications. 183, 4 (May 2021), 13-17. DOI=10.5120/ijca2021921324
Badr et al. [1] proposed efficient scaling techniques (EST) with a support vector machine on the Wisconsin data set from the UCI Machine Learning Repository, which has a total of 569 rows and 33 columns. In this work, we evaluate whether the results reported by Badr et al. [1] remain valid when a different dataset, a different classifier, and a dimensionality reduction tool are used. To this end, the decision tree algorithm is applied to a breast cancer microarray dataset (BCMD) that contains 289 patients and 35981 attributes. We use principal component analysis (PCA) to reduce the number of attributes. We also propose new scaling techniques to improve the accuracy of the decision tree algorithm. Experimental results show that the decision tree algorithm with the new scaling techniques (equilibration, geometric mean and arithmetic mean) achieves 84.98%, 80.65% and 79.96% accuracy, respectively, compared with 75.44%, 76.85% and 78.93% for the traditional normalizations (normalization [0, 1], normalization [-1, 1] and standard normalization).
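The three traditional normalizations used as baselines can be illustrated with a minimal sketch. It assumes the textbook definitions (min-max rescaling into a target interval and z-score standardization with the population standard deviation), applied independently to each attribute; the function names `min_max`, `standardize` and `scale_columns` and the toy data are hypothetical and are not taken from the paper. The proposed techniques (equilibration, geometric mean, arithmetic mean) are not defined in the abstract, so they are not sketched here.

```python
from statistics import mean, pstdev


def min_max(column, lo=0.0, hi=1.0):
    """Linearly rescale one attribute (column) into the interval [lo, hi]."""
    c_min, c_max = min(column), max(column)
    span = c_max - c_min
    if span == 0:
        # Constant attribute: mapping to the midpoint is an assumption made here.
        return [(lo + hi) / 2.0 for _ in column]
    return [lo + (hi - lo) * (x - c_min) / span for x in column]


def standardize(column):
    """Standard normalization (z-score): subtract the mean, divide by the std."""
    mu = mean(column)
    sigma = pstdev(column)  # population standard deviation (assumed convention)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def scale_columns(rows, scaler):
    """Apply a per-column scaler to a table given as a list of rows."""
    columns = list(zip(*rows))
    scaled = [scaler(list(col)) for col in columns]
    return [list(r) for r in zip(*scaled)]


# Toy data: 3 samples (rows) x 2 attributes (columns), purely illustrative.
rows = [[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]]

unit = scale_columns(rows, min_max)                             # normalization [0, 1]
signed = scale_columns(rows, lambda c: min_max(c, -1.0, 1.0))   # normalization [-1, 1]
zscores = scale_columns(rows, standardize)                      # standard normalization
```

In a pipeline like the one described, a scaling step of this kind would be applied to the attributes before PCA and the decision tree; the exact order and settings used by the authors are not stated in the abstract.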