Abstract
Data mining is widely considered a powerful instrument for searching for and extracting essential relationships among the variables (attributes) in a database. Data mining applied in an educational setting is referred to as educational data mining (EDM). EDM makes it possible to gain insight into various higher education phenomena, such as students’ academic paths, learning behaviours, and the determinants of academic success or dropout. In this paper, we evaluate the usefulness of a particular latent class model, Bayesian profile regression, for identifying students who are more likely to drop out. By considering students’ performance, motivation and resilience, this technique makes it possible to profile the students at higher risk of academic failure. The working example is based on real data collected through an online questionnaire completed by undergraduate students at an Italian university.
Cite this article
Sarra, A., Fontanella, L. & Di Zio, S. Identifying Students at Risk of Academic Failure Within the Educational Data Mining Framework. Soc Indic Res 146, 41–60 (2019). https://doi.org/10.1007/s11205-018-1901-8
DOI: https://doi.org/10.1007/s11205-018-1901-8