Research Article

Performance Comparison Random Forest and Logistic Regression in Predicting Time Deposit Customers with Feature Selection

by Reski Noviana, Enny Itje Sela
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 16
Year of Publication: 2024
Authors: Reski Noviana, Enny Itje Sela
10.5120/ijca2024923548

Reski Noviana, Enny Itje Sela. Performance Comparison Random Forest and Logistic Regression in Predicting Time Deposit Customers with Feature Selection. International Journal of Computer Applications. 186, 16 (Apr 2024), 33-38. DOI=10.5120/ijca2024923548

@article{ 10.5120/ijca2024923548,
author = { Reski Noviana and Enny Itje Sela },
title = { Performance Comparison Random Forest and Logistic Regression in Predicting Time Deposit Customers with Feature Selection },
journal = { International Journal of Computer Applications },
issue_date = { Apr 2024 },
volume = { 186 },
number = { 16 },
month = { Apr },
year = { 2024 },
issn = { 0975-8887 },
pages = { 33-38 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume186/number16/performance-comparison-random-forest-and-logistic-regression-in-predicting-time-deposit-customers-with-feature-selection/ },
doi = { 10.5120/ijca2024923548 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-04-27T03:06:46+05:30
%A Reski Noviana
%A Enny Itje Sela
%T Performance Comparison Random Forest and Logistic Regression in Predicting Time Deposit Customers with Feature Selection
%J International Journal of Computer Applications
%@ 0975-8887
%V 186
%N 16
%P 33-38
%D 2024
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Machine learning algorithms can be used to analyze data and predict customer behavior. Feature selection is an important step in developing such models: irrelevant or redundant features can degrade a model's performance and increase its complexity, so choosing features well can significantly affect results. The main objective of this research is to compare the performance of Random Forest and Logistic Regression in predicting whether customers will subscribe to time deposits. The research also applies two feature selection methods, Forward Selection and Recursive Feature Elimination (RFE), so that only relevant features are used in the model. The results show that both feature selection methods affect accuracy. The best accuracy was obtained in the first scenario, in which Random Forest and Logistic Regression were trained without feature selection but with the target class balanced using the SMOTE method. In that scenario, Random Forest reached 95.56% accuracy with 96% precision, recall, and F1 score, while Logistic Regression reached 87.21% accuracy with 87% precision, recall, and F1 score. With feature selection, Random Forest accuracy decreased by 3.39% with Forward Selection and by 0.33% with RFE, and Logistic Regression accuracy decreased by 1.87% with Forward Selection and by 0.22% with RFE. Further research could examine how model parameters influence the classifiers, which may provide additional information for improving model performance.
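
The class-balancing step mentioned in the abstract can be illustrated with a minimal sketch of the SMOTE idea: each synthetic minority sample is placed on the line segment between an existing minority sample and one of its nearest minority neighbours. This is a simplified stand-in written with the Python standard library only, not the authors' implementation; the function name `smote_oversample`, the parameter `k`, and the toy data are assumptions made for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Simplified SMOTE: pick a minority sample, pick one of its k nearest
    minority neighbours, and place a new point on the segment between them.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Sort the other minority samples by Euclidean distance to `base`.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: 8 majority ("no") rows and 3 minority ("yes") rows.
majority = [(float(x), float(x % 3)) for x in range(8)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 12.0)]

# Generate enough synthetic rows to match the majority class size.
new_points = smote_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

In practice a library implementation would normally be used instead, and oversampling is usually applied only to the training split so that synthetic rows do not leak into the evaluation data.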

Index Terms

Computer Science
Information Sciences
Data Mining
Classification
Machine Learning

Keywords

Random Forest
Logistic Regression
Deposit Customers
Feature Selection
Forward Selection
Recursive Feature Elimination