Research Article

A Combined Model based on Clustering and Regression to Predicting School Dropout in Higher Education Institution

by Marilia N. C. A. Lima, Wedson L. Soares, Iago R. R. Silva, Roberta A. De A. Fagundes
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 176 - Number 34
Year of Publication: 2020
Authors: Marilia N. C. A. Lima, Wedson L. Soares, Iago R. R. Silva, Roberta A. De A. Fagundes

Marilia N. C. A. Lima, Wedson L. Soares, Iago R. R. Silva, Roberta A. De A. Fagundes. A Combined Model based on Clustering and Regression to Predicting School Dropout in Higher Education Institution. International Journal of Computer Applications. 176, 34 (Jun 2020), 1-8. DOI=10.5120/ijca2020920396

@article{ 10.5120/ijca2020920396,
author = { Marilia N. C. A. Lima, Wedson L. Soares, Iago R. R. Silva, Roberta A. De A. Fagundes },
title = { A Combined Model based on Clustering and Regression to Predicting School Dropout in Higher Education Institution },
journal = { International Journal of Computer Applications },
issue_date = { Jun 2020 },
volume = { 176 },
number = { 34 },
month = { Jun },
year = { 2020 },
issn = { 0975-8887 },
pages = { 1-8 },
numpages = {9},
url = { },
doi = { 10.5120/ijca2020920396 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:44:11.197522+05:30
%A Marilia N. C. A. Lima
%A Wedson L. Soares
%A Iago R. R. Silva
%A Roberta A. De A. Fagundes
%T A Combined Model based on Clustering and Regression to Predicting School Dropout in Higher Education Institution
%J International Journal of Computer Applications
%@ 0975-8887
%V 176
%N 34
%P 1-8
%D 2020
%I Foundation of Computer Science (FCS), NY, USA

School dropout is a frequent problem in Brazil, with both professional and personal consequences, and governing authorities seek to reduce it. However, identifying the factors that cause dropout and predicting the dropout rate in higher education institutions are difficult tasks. This work therefore proposes three combined models that use clustering and regression to predict school dropout in Higher Education Institutions (HEIs) in Brazil. The proposed models combine K-means with Linear Regression (LR), K-means with Robust Regression (RR), and K-means with Support Vector Regression (SVR). Four classic algorithms (SVR, Bagging, LR, and RR) are selected as baselines for evaluating the combined models. The methodology follows the Cross-Industry Standard Process for Data Mining (CRISP-DM). A comparative analysis against the classic algorithms shows the efficiency and reliability of the proposed models for the school dropout problem.
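The abstract does not spell out how clustering and regression are combined, so the following is only a minimal sketch of one common reading: group the samples with K-means, fit a separate regressor inside each cluster, and predict with the regressor of the nearest centroid. It uses a single scalar feature and ordinary least squares to stay dependency-free. The function names (`kmeans_1d`, `fit_line`, `fit_combined`, `predict`) and the fallback for empty clusters are illustrative assumptions, not the authors' implementation.

```python
import random
from statistics import mean


def nearest(x, centroids):
    """Index of the centroid closest to the scalar x."""
    return min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))


def kmeans_1d(xs, k, iters=50, seed=0):
    """Plain Lloyd's k-means on scalar features; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(xs, k)  # initial centroids drawn from the data
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[nearest(x, centroids)].append(x)
        # Keep the previous centroid if a cluster ends up empty.
        new = [mean(g) if g else c for g, c in zip(groups, centroids)]
        if new == centroids:
            break
        centroids = new
    return centroids


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def fit_combined(xs, ys, k):
    """Cluster the inputs, then fit one linear model per cluster."""
    centroids = kmeans_1d(xs, k)
    models = []
    for j in range(k):
        members = [(x, y) for x, y in zip(xs, ys) if nearest(x, centroids) == j]
        if members:
            cx, cy = zip(*members)
            models.append(fit_line(cx, cy))
        else:
            # Assumption: an empty cluster falls back to the global fit.
            models.append(fit_line(xs, ys))
    return centroids, models


def predict(x, centroids, models):
    """Predict with the model of the cluster whose centroid is nearest."""
    a, b = models[nearest(x, centroids)]
    return a + b * x
```

Calling `fit_combined(xs, ys, k=2)` on data that follows two different linear trends yields one line per regime, which is the motivation for clustering before regression. Replacing `fit_line` with a robust or support vector regressor would give analogues of the K-means + RR and K-means + SVR variants named in the abstract.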

Index Terms

Computer Science
Information Sciences


Education Data Mining, Combined Model, Clustering, Regression, Predicting School Dropout