Research Article

Data Preprocessing to Improve Accuracy in Classification Methods (Case Study: Credit Risk Analysis Dataset Classification)

by Baiq Nurul Azmi, Arief Hermawan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 5
Year of Publication: 2024

This research analyzes how different data pre-processing methods affect credit risk classification with a Support Vector Machine (SVM) model. The background section describes the challenges the banking industry faces in evaluating credit risk and how data pre-processing can improve model accuracy. The research method comprises four experimental scenarios, each applying a different combination of pre-processing methods and each designed to evaluate SVM performance on a credit risk dataset. The pre-processing steps are: data preparation; missing-data handling, which removes features with a missing-data rate above 50% and applies MICE imputation to features with a missing-data rate below 50%; feature selection based on a correlation matrix to address high-dimensional data; and resampling with SMOTE to address class imbalance. The test results show that combining data pre-processing methods can significantly improve the accuracy of SVM models on credit risk datasets. The highest accuracy, 99.4%, is obtained in the scenario that handles missing data with feature removal and MICE imputation.
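As a rough illustration of two of the steps named in the abstract, the following minimal sketch shows the 50% missing-rate rule that decides whether a feature is removed or imputed, and the interpolation step at the core of SMOTE. It is not the authors' implementation. The feature names, the toy values, and the function names are hypothetical, and treating a feature with exactly 50% missing data as imputable is an assumption, since the abstract does not specify that boundary case. The MICE imputation itself and the SVM training are not shown.

```python
import random


def missing_rate(column):
    """Return the fraction of entries in a column that are missing (None)."""
    return sum(value is None for value in column) / len(column)


def triage_features(table, threshold=0.5):
    """Split feature names into (to_drop, to_impute, complete).

    Features whose missing rate exceeds the threshold are marked for removal;
    features with some missing data at or below the threshold are marked for
    imputation (assumption: exactly 50% counts as imputable).
    """
    to_drop, to_impute, complete = [], [], []
    for name, column in table.items():
        rate = missing_rate(column)
        if rate > threshold:
            to_drop.append(name)
        elif rate > 0:
            to_impute.append(name)
        else:
            complete.append(name)
    return to_drop, to_impute, complete


def smote_like_sample(point, neighbor, rng):
    """Create one synthetic sample on the segment between two minority points.

    Full SMOTE also selects the neighbor among the k nearest minority-class
    samples; only the interpolation step is sketched here.
    """
    gap = rng.random()  # uniform in [0, 1)
    return [p + gap * (n - p) for p, n in zip(point, neighbor)]


# Hypothetical toy table: column name -> values, None marks a missing entry.
table = {
    "income": [50.0, None, 42.0, 61.0],                # 25% missing
    "months_since_default": [None, None, None, 7.0],   # 75% missing
    "loan_amount": [10.0, 12.0, 9.0, 15.0],            # 0% missing
}

to_drop, to_impute, complete = triage_features(table)
print("drop:", to_drop)
print("impute:", to_impute)
print("complete:", complete)

synthetic = smote_like_sample([0.0, 0.0], [2.0, 4.0], random.Random(0))
print("synthetic minority sample:", synthetic)
```

In practice, library implementations such as scikit-learn's IterativeImputer (a MICE-style imputer) and imbalanced-learn's SMOTE would normally replace these hand-written helpers; the sketch only makes the decision rule and the interpolation idea concrete.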

Index Terms

Computer Science
Information Sciences


Credit Risk Analysis, MICE, SMOTE, Support Vector Machine, Correlation Matrix