International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 16
Year of Publication: 2024
Authors: Reski Noviana, Enny Itje Sela
DOI: 10.5120/ijca2024923548
Reski Noviana, Enny Itje Sela. Performance Comparison Random Forest and Logistic Regression in Predicting Time Deposit Customers with Feature Selection. International Journal of Computer Applications. 186, 16 (Apr 2024), 33-38. DOI=10.5120/ijca2024923548
Machine learning algorithms can be used to analyze data and predict customer behavior. One important aspect of developing machine learning models is feature selection. Proper feature selection can significantly affect model performance: irrelevant or redundant features can degrade the performance of the model and increase its complexity. Feature selection is therefore an important stage in building an effective prediction model. The main objective of this research is to compare the performance of Random Forest and Logistic Regression in predicting customers' decisions to subscribe to time deposits. In addition, this research applies feature selection using Forward Selection and Recursive Feature Elimination (RFE) to ensure that only relevant features are used in the model. The overall results show that Forward Selection and RFE feature selection also affect the accuracy value. In this study, the best accuracy was obtained by the first scenario, namely Random Forest and Logistic Regression classification without feature selection but with the target class balanced using the SMOTE method, yielding a best accuracy of 95.56% for Random Forest, with 96% precision, recall, and F1 score, and 87.21% for Logistic Regression, with 87% precision, recall, and F1 score. When feature selection was applied, Random Forest accuracy decreased by 3.39% with Forward Selection and 0.33% with RFE, while Logistic Regression accuracy decreased by 1.87% with Forward Selection and 0.22% with RFE. Further research could examine the influence of model parameters on the classification models in greater depth to provide further insight into improving model performance.
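The experimental setup described in the abstract (balance the target class, then compare Random Forest and Logistic Regression with and without feature selection) can be sketched with scikit-learn. This is a minimal illustrative sketch, not the authors' code: the synthetic dataset stands in for the bank time-deposit data, the class balancing here is plain minority oversampling rather than SMOTE's synthetic interpolation, and the feature counts and seeds are arbitrary assumptions.

```python
# Hypothetical sketch: Random Forest vs. Logistic Regression, with and
# without RFE feature selection, on class-balanced synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Imbalanced binary data standing in for the time-deposit dataset
# (~12% positive class, i.e. customers who subscribe).
X, y = make_classification(n_samples=2000, n_features=16, n_informative=6,
                           weights=[0.88, 0.12], random_state=42)

# Balance classes by duplicating minority samples; the paper used SMOTE,
# which instead interpolates new synthetic minority samples.
minority = np.flatnonzero(y == 1)
extra = np.random.default_rng(0).choice(minority, size=len(y) - 2 * len(minority))
X_bal = np.vstack([X, X[extra]])
y_bal = np.concatenate([y, y[extra]])

X_tr, X_te, y_tr, y_te = train_test_split(X_bal, y_bal, test_size=0.2,
                                          random_state=42)

results = {}
for name, model in [("RandomForest", RandomForestClassifier(random_state=42)),
                    ("LogisticRegression", LogisticRegression(max_iter=1000))]:
    # Scenario 1: all features, balanced classes (the paper's best setup).
    acc_all = accuracy_score(y_te, model.fit(X_tr, y_tr).predict(X_te))
    # Scenario 2: RFE keeps the 8 strongest features (count is arbitrary here).
    rfe = RFE(model, n_features_to_select=8).fit(X_tr, y_tr)
    acc_rfe = accuracy_score(y_te, rfe.predict(X_te))
    results[name] = {"all_features": acc_all, "rfe": acc_rfe}

print(results)
```

Forward Selection could be swapped in via `sklearn.feature_selection.SequentialFeatureSelector(model, direction="forward")`; exact accuracy figures will of course differ from the paper's, since both the data and the balancing method here are substitutes.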