Identifying Students at Risk of Academic Failure Within the Educational Data Mining Framework

Social Indicators Research

Abstract

Data mining is widely regarded as a powerful tool for discovering meaningful relationships among the variables (attributes) in a database. Applied to education, it is referred to as educational data mining (EDM). EDM offers insight into various higher education phenomena, such as students’ academic paths, learning behaviours and the determinants of academic success or dropout. In this paper, we evaluate the usefulness of a particular latent class model, Bayesian profile regression, for identifying students who are more likely to drop out. By considering students’ performance, motivation and resilience, this technique makes it possible to draw the profiles of students at higher risk of academic failure. The working example is based on real data collected through an online questionnaire completed by undergraduate students at an Italian university.
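For readers unfamiliar with the technique, the following is a generic sketch of a profile regression model in its standard Dirichlet process mixture form. It is not necessarily the exact specification used in this paper. It assumes, purely for illustration, a binary outcome \(y_i\) (for example, an at-risk indicator), \(J\) categorical covariates \(x_{ij}\), a latent cluster label \(z_i\), mixture weights \(\psi_c\) and a concentration parameter \(\alpha\). Only the cluster-specific covariate parameters \(\phi_c\) appear in the article's own figure captions.

```latex
% Generic profile regression sketch (assumed notation, not the paper's exact model)
\begin{align*}
  % Each student i is allocated to a latent cluster c
  z_i \mid \psi &\sim \text{Categorical}(\psi_1, \psi_2, \ldots), \\
  % Stick-breaking construction of the mixture weights
  \psi_c &= V_c \prod_{l<c} (1 - V_l), \qquad V_c \sim \text{Beta}(1, \alpha), \\
  % Covariate profile: independent categorical covariates within a cluster
  P(x_i \mid z_i = c, \phi_c) &= \prod_{j=1}^{J} \phi_{c,j}(x_{ij}), \\
  % Outcome model: cluster-specific risk on the logit scale
  \operatorname{logit} P(y_i = 1 \mid z_i = c, \theta_c) &= \theta_c .
\end{align*}
```

Under this reading, each cluster \(c\) pairs a covariate profile (through \(\phi_c\)) with a risk level (through \(\theta_c\)), which is what allows clusters to be interpreted as student profiles with higher or lower risk.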

Author information

Corresponding author

Correspondence to Annalina Sarra.

Appendix

See Table 5 and Figs. 1, 2, 3 and 4.

Table 5 Quartiles for the recoded covariates

Fig. 1 Summary plot of the posterior distribution of parameter \(\phi _{c}\), for \(c=1, \ldots , 9\): Academic career covariates and motivational items

Fig. 2 Summary plot of the posterior distribution of parameter \(\phi _{c}\), for \(c=1, \ldots , 9\): Difficulties and satisfaction items

Fig. 3 Summary plot of the posterior distribution of parameter \(\phi _{c}\), for \(c=1, \ldots , 9\): Academic Resilience Scale items

Fig. 4 Heat map

About this article

Cite this article

Sarra, A., Fontanella, L. & Di Zio, S. Identifying Students at Risk of Academic Failure Within the Educational Data Mining Framework. Soc Indic Res 146, 41–60 (2019). https://doi.org/10.1007/s11205-018-1901-8

  • DOI: https://doi.org/10.1007/s11205-018-1901-8
