International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 16
Year of Publication: 2024
Authors: Rian Oktafiani, Enny Itje Sela |
10.5120/ijca2024923537 |
Rian Oktafiani, Enny Itje Sela. Breast Cancer Classification with Principal Component Analysis and Smote using Random Forest Method and Support Vector Machine. International Journal of Computer Applications. 186, 16 (Apr 2024), 1-8. DOI=10.5120/ijca2024923537
Inaccurate breast cancer classification can put patients' lives at risk. The high dimensionality and imbalanced class distribution of breast cancer medical data make it challenging to apply machine learning techniques. In addition, studies that examine the parameters of the algorithm model are still scarce, and inappropriate parameter selection may lead to low accuracy. This study compares the Random Forest (RF) and Support Vector Machine (SVM) algorithms for breast cancer classification. The parameters analyzed are the max depth in Random Forest and the linear, polynomial, and RBF kernels in SVM. Principal Component Analysis (PCA) is used for feature reduction, and the Synthetic Minority Oversampling Technique (SMOTE) is used to overcome class imbalance. The best accuracy obtained with SVM is 99.07%, with precision, recall, and F1 score of 99%, using the RBF kernel and PCA n components = 6. Random Forest achieves a best test accuracy of 98.32%, with precision, recall, and F1 score of 98%, using max depth = 8 and PCA n components = 6. It can therefore be concluded that using SMOTE and PCA can improve accuracy, and that SVM performs better than RF for breast cancer classification. Future studies can test various datasets to examine the impact of additional parameters and classification techniques.
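The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not state them: the dataset (scikit-learn's built-in Wisconsin breast cancer data), the 80/20 stratified split, feature standardization, and the order of steps (oversampling the training set before fitting PCA). The helper `smote_oversample` is a hypothetical, simplified stand-in for SMOTE that handles the two-class case only. Only the RBF kernel, max depth = 8, and 6 PCA components come from the abstract, and the accuracies this sketch produces should not be expected to match the reported figures.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def smote_oversample(X, y, k=5, seed=0):
    """Simplified two-class SMOTE (hypothetical helper, not a library API).

    Creates synthetic minority samples by interpolating between a minority
    sample and one of its k nearest minority neighbours until both classes
    have the same number of samples.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_new = counts.max() - counts.min()
    if n_new == 0:
        return X, y
    X_min = X[y == minority]
    # Ask for k + 1 neighbours: each point is its own nearest neighbour.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neighbours = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    synthetic = X_min[base] + gap * (X_min[picked] - X_min[base])
    X_bal = np.vstack([X, synthetic])
    y_bal = np.concatenate([y, np.full(n_new, minority)])
    return X_bal, y_bal


def run_experiment(seed=42):
    """Train SVM (RBF) and Random Forest (max_depth=8) on 6 PCA components."""
    X, y = load_breast_cancer(return_X_y=True)  # assumed dataset
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    # Fit every preprocessing step on the training data only.
    scaler = StandardScaler().fit(X_train)
    X_train_s = scaler.transform(X_train)
    X_test_s = scaler.transform(X_test)
    X_bal, y_bal = smote_oversample(X_train_s, y_train, seed=seed)
    pca = PCA(n_components=6).fit(X_bal)
    Z_train = pca.transform(X_bal)
    Z_test = pca.transform(X_test_s)
    models = {
        "svm_rbf": SVC(kernel="rbf"),
        "rf_depth8": RandomForestClassifier(max_depth=8, random_state=seed),
    }
    return {
        name: accuracy_score(y_test, model.fit(Z_train, y_bal).predict(Z_test))
        for name, model in models.items()
    }


if __name__ == "__main__":
    for name, acc in run_experiment().items():
        print(f"{name}: test accuracy = {acc:.4f}")
```

Oversampling is applied to the training split only, so no synthetic samples influence the test set. For real experiments, the `SMOTE` class from the imbalanced-learn package would be a more complete choice than the simplified helper above.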