International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 5
Year of Publication: 2024
Authors: Baiq Nurul Azmi, Arief Hermawan |
10.5120/ijca2024923385 |
Baiq Nurul Azmi, Arief Hermawan. Data Preprocessing to Improve Accuracy in Classification Methods (Case Study: Credit Risk Analysis Dataset Classification). International Journal of Computer Applications. 186, 5 (Jan 2024), 22-29. DOI=10.5120/ijca2024923385
This research analyzes the use of various data pre-processing methods for credit risk analysis with Support Vector Machine (SVM) classification models. The background details the challenges the banking industry faces in credit risk evaluation and how data pre-processing can improve model accuracy. The research method comprises four experimental scenarios, each applying a different combination of data pre-processing methods and each designed to evaluate the performance of SVM models on a credit risk dataset. The method steps are: data preparation; missing data handling, in which features with a missing data rate above 50% are removed and features with a missing data rate below 50% are imputed with MICE; feature selection based on the correlation matrix to address high-dimensional data; and data resampling with SMOTE to address class imbalance. The test results show that a combination of data pre-processing methods can significantly improve the accuracy of SVM models on credit risk datasets. The highest accuracy, 99.4%, is obtained in the scenario that handles missing data with feature removal and MICE imputation.
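The pre-processing steps named in the abstract can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: the column name `default`, the 0.9 correlation cutoff, the 70/30 split, the RBF kernel, and all helper names are hypothetical. scikit-learn's `IterativeImputer` stands in for MICE (it is a MICE-style chained-equations imputer), and `smote_like` is a simplified SMOTE-style interpolation for a binary target with at least two minority samples, not a full SMOTE implementation. Resampling only the training split is a choice made here, not something the abstract states.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def drop_sparse_features(X, max_missing=0.5):
    """Remove columns whose missing-value rate exceeds max_missing."""
    return X.loc[:, X.isna().mean() <= max_missing]


def drop_correlated_features(X, threshold=0.9):
    """Drop one feature from every pair with |Pearson r| above threshold (assumed rule)."""
    corr = X.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > threshold).any()]
    return X.drop(columns=to_drop)


def smote_like(X, y, k=5, seed=0):
    """Simplified SMOTE: interpolate between minority samples and their
    nearest minority neighbours until both classes have equal size."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    labels, counts = np.unique(y, return_counts=True)
    n_new = counts.max() - counts.min()
    if n_new == 0:
        return X, y
    minority = labels[np.argmin(counts)]
    X_min = X[y == minority]
    k = min(k, len(X_min) - 1)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neigh = nn.kneighbors(X_min, return_distance=False)[:, 1:]  # skip the point itself
    base = rng.integers(0, len(X_min), n_new)
    partner = neigh[base, rng.integers(0, k, n_new)]
    gap = rng.random((n_new, 1))
    synth = X_min[base] + gap * (X_min[partner] - X_min[base])
    return np.vstack([X, synth]), np.concatenate([y, np.full(n_new, minority)])


def run_pipeline(df, target="default"):
    """Run the sketched steps on a numeric DataFrame and return test accuracy."""
    X = drop_sparse_features(df.drop(columns=[target]))
    y = df[target].to_numpy()
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0
    )
    # MICE-style imputation, fitted on the training split only.
    imputer = IterativeImputer(random_state=0)
    X_train = pd.DataFrame(imputer.fit_transform(X_train), columns=X_train.columns)
    X_test = pd.DataFrame(imputer.transform(X_test), columns=X_test.columns)
    # Correlation-based selection, decided on training data and reused for test data.
    X_train = drop_correlated_features(X_train)
    X_test = X_test[X_train.columns]
    # SVMs are scale-sensitive, so standardize before resampling and fitting.
    scaler = StandardScaler().fit(X_train)
    X_bal, y_bal = smote_like(scaler.transform(X_train), y_train)
    model = SVC(kernel="rbf").fit(X_bal, y_bal)
    return model.score(scaler.transform(X_test), y_test)
```

Calling `run_pipeline(df)` on a DataFrame of numeric features plus a binary `default` column returns the held-out accuracy. The four scenarios described in the abstract would correspond to switching individual steps on or off and comparing the resulting scores.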