Performance Comparison Random Forest and Logistic Regression in Predicting Time Deposit Customers with Feature Selection

Reski Noviana; Enny Itje Sela

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 186 - Number 16

Year of Publication: 2024

Authors: Reski Noviana, Enny Itje Sela

10.5120/ijca2024923548

Reski Noviana, Enny Itje Sela. Performance Comparison Random Forest and Logistic Regression in Predicting Time Deposit Customers with Feature Selection. International Journal of Computer Applications. 186, 16 (Apr 2024), 33-38. DOI=10.5120/ijca2024923548

@article{ 10.5120/ijca2024923548,

author = { Reski Noviana and Enny Itje Sela },

title = { Performance Comparison Random Forest and Logistic Regression in Predicting Time Deposit Customers with Feature Selection },

journal = { International Journal of Computer Applications },

issue_date = { Apr 2024 },

volume = { 186 },

number = { 16 },

month = { Apr },

year = { 2024 },

issn = { 0975-8887 },

pages = { 33-38 },

numpages = {9},

url = { https://ijcaonline.org/archives/volume186/number16/performance-comparison-random-forest-and-logistic-regression-in-predicting-time-deposit-customers-with-feature-selection/ },

doi = { 10.5120/ijca2024923548 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Journal Article

%1 2024-04-27T03:06:46+05:30

%A Reski Noviana

%A Enny Itje Sela

%T Performance Comparison Random Forest and Logistic Regression in Predicting Time Deposit Customers with Feature Selection

%J International Journal of Computer Applications

%@ 0975-8887

%V 186

%N 16

%P 33-38

%D 2024

%I Foundation of Computer Science (FCS), NY, USA

Abstract

Machine learning algorithms can be used to analyze data and predict customer behavior. Feature selection is an important stage in building an effective prediction model, because irrelevant or redundant features can impair model performance and increase model complexity. The main objective of this research is to compare the performance of Random Forest and Logistic Regression in predicting customers' decisions to subscribe to time deposits. The research also applies two feature selection methods, Forward Selection and Recursive Feature Elimination (RFE), so that only relevant features are used in the model. The results show that both feature selection methods affect accuracy. The best accuracy was obtained in the first scenario, in which Random Forest and Logistic Regression were trained without feature selection on a target class balanced with the SMOTE method. In this scenario Random Forest reached an accuracy of 95.56%, with 96% for precision, recall, and F1 score, while Logistic Regression reached an accuracy of 87.21%, with 87% for precision, recall, and F1 score. With feature selection, the accuracy of Random Forest decreased by 3.39% with Forward Selection and by 0.33% with RFE, and the accuracy of Logistic Regression decreased by 1.87% with Forward Selection and by 0.22% with RFE. Further research could examine how model parameters influence the classifiers, which may provide further information for improving model performance.
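The comparison described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic data from `make_classification`, not the data used in the paper. The hyperparameters, the number of selected features (`K_FEATURES`), and all variable names are illustrative assumptions, not the authors' settings. Class balancing with SMOTE is left out because the synthetic classes are already balanced.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 8 features, only some informative.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, n_redundant=2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

K_FEATURES = 4  # hypothetical number of features to keep

# The two classifiers being compared (hyperparameters are assumptions).
models = {
    "Random Forest": RandomForestClassifier(n_estimators=25, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}


def make_selectors(estimator):
    """Return the three scenarios: no selection, forward selection, RFE."""
    return {
        "no selection": "passthrough",
        "forward selection": SequentialFeatureSelector(
            clone(estimator),
            n_features_to_select=K_FEATURES,
            direction="forward",
            cv=3,
        ),
        "RFE": RFE(clone(estimator), n_features_to_select=K_FEATURES),
    }


results = {}
for model_name, model in models.items():
    for selector_name, selector in make_selectors(model).items():
        # Scale, optionally select features, then fit the classifier.
        pipeline = Pipeline(
            [
                ("scale", StandardScaler()),
                ("select", selector),
                ("model", clone(model)),
            ]
        )
        pipeline.fit(X_train, y_train)
        accuracy = accuracy_score(y_test, pipeline.predict(X_test))
        results[(model_name, selector_name)] = accuracy

for (model_name, selector_name), accuracy in results.items():
    print(f"{model_name:20s} {selector_name:18s} accuracy={accuracy:.4f}")
```

The accuracies printed by this sketch depend on the synthetic data and will not match the figures reported in the abstract. The sketch only shows how the six model and selector combinations can be evaluated side by side.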

Index Terms

Computer Science

Information Sciences

Data Mining

Classification

Machine Learning

Keywords

Random Forest, Logistic Regression, Deposit Customers, Feature Selection, Forward Selection, Recursive Feature Elimination