Research Article

Comparative Analysis of Classification Algorithms for Citizens Welfare Status using PCA as Feature Selection

by Erfin Nur Rohma Khakim, Erik Iman Heri Ujianto
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 5
Year of Publication: 2024
Authors: Erfin Nur Rohma Khakim, Erik Iman Heri Ujianto
10.5120/ijca2024923386

Erfin Nur Rohma Khakim, Erik Iman Heri Ujianto. Comparative Analysis of Classification Algorithms for Citizens Welfare Status using PCA as Feature Selection. International Journal of Computer Applications. 186, 5 (Jan 2024), 30-37. DOI=10.5120/ijca2024923386

@article{ 10.5120/ijca2024923386,
author = { Erfin Nur Rohma Khakim and Erik Iman Heri Ujianto },
title = { Comparative Analysis of Classification Algorithms for Citizens Welfare Status using PCA as Feature Selection },
journal = { International Journal of Computer Applications },
issue_date = { Jan 2024 },
volume = { 186 },
number = { 5 },
month = { Jan },
year = { 2024 },
issn = { 0975-8887 },
pages = { 30-37 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume186/number5/33070-2024923386/ },
doi = { 10.5120/ijca2024923386 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T01:29:50.464163+05:30
%A Erfin Nur Rohma Khakim
%A Erik Iman Heri Ujianto
%T Comparative Analysis of Classification Algorithms for Citizens Welfare Status using PCA as Feature Selection
%J International Journal of Computer Applications
%@ 0975-8887
%V 186
%N 5
%P 30-37
%D 2024
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The government has launched various programs to improve citizens' welfare and reduce poverty. A central obstacle to poverty alleviation lies in the underlying databases. Conventional classification of welfare levels relies on estimation, which produces invalid results, and many poor people who should be targeted by poverty alleviation programs have not yet been recorded. This study proposes a machine learning (data mining) approach to classify citizens' welfare so that the resulting welfare categories are more computable and valid. The algorithms compared are Naïve Bayes, Decision Tree, and K-Nearest Neighbor (K-NN), with Principal Component Analysis (PCA) as feature selection and normalization applied during preprocessing. The data used are the Data Indikator Kesejahteraan Sosial (IKS, Social Welfare Indicator Data), collected from residents of Bantul Regency in 2022. The IKS data currently comprise 95,347 rows and 27 attributes. The dataset has four classes (labels): very poor, poor, nearly poor, and not poor. The test results show that K-NN generally performs best, with accuracy, precision, and recall of 96.71%, 95.16%, and 88.79%, respectively. Using PCA and normalization also had a significant effect on improving the performance of the classification algorithms. Future research is expected to apply deep learning algorithms to this classification task because the data have large dimensions.
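
As a rough illustration of the evaluation the abstract describes, the following sketch scales numeric attributes, classifies with K-NN, and reports accuracy, precision, and recall over the four welfare classes. It is not the authors' implementation. Several choices are assumptions, since the abstract does not specify them: min-max scaling as the normalization method, Euclidean distance, k = 3, and macro averaging of precision and recall. The toy rows are made up and are not IKS data. The PCA projection is not shown; it would sit between scaling and classification.

```python
import math
from collections import Counter

# The four classes named in the abstract.
LABELS = ["very poor", "poor", "nearly poor", "not poor"]

def min_max_fit(rows):
    """Learn per-column (min, span) from training rows; constant columns get span 1."""
    cols = list(zip(*rows))
    return [(min(c), (max(c) - min(c)) or 1.0) for c in cols]

def min_max_apply(rows, params):
    """Scale rows with parameters learned on the training set (assumed normalization)."""
    return [[(v - lo) / span for v, (lo, span) in zip(r, params)] for r in rows]

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training rows closest in Euclidean distance."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def macro_scores(y_true, y_pred, labels=LABELS):
    """Return accuracy plus macro-averaged precision and recall (averaging is an assumption)."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precisions, recalls = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted = sum(p == c for p in y_pred)
        actual = sum(t == c for t in y_true)
        precisions.append(tp / predicted if predicted else 0.0)
        recalls.append(tp / actual if actual else 0.0)
    return accuracy, sum(precisions) / len(labels), sum(recalls) / len(labels)

# Hypothetical toy rows with two numeric attributes each.
train_x = [[1, 10], [2, 12], [8, 40], [9, 42],
           [15, 80], [16, 82], [25, 120], [26, 125]]
train_y = ["very poor", "very poor", "poor", "poor",
           "nearly poor", "nearly poor", "not poor", "not poor"]
test_x = [[1.5, 11], [8.5, 41], [15.5, 81], [25.5, 122]]
test_y = list(LABELS)

params = min_max_fit(train_x)
scaled_train = min_max_apply(train_x, params)
scaled_test = min_max_apply(test_x, params)
predictions = [knn_predict(scaled_train, train_y, q) for q in scaled_test]
accuracy, precision, recall = macro_scores(test_y, predictions)
print(accuracy, precision, recall)
```

Scaling parameters are fitted on the training rows only and then reused for the test rows, which keeps information from the test set out of preprocessing. Scaling matters for K-NN in particular because attributes with large numeric ranges would otherwise dominate the distance.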

Index Terms

Computer Science
Information Sciences

Keywords

Classification, feature selection, welfare, poverty