Wednesday, May 20, 2026

626 Article

 

🎤 What to say: Slide 1 (Title)

Good morning everyone.

My name is Augusto Paolo Bernal Parraga, and today I will present our research titled:

“Explainable Machine Learning for Early Identification of At-Risk Students in Ecuadorian Higher Education: A Learning Analytics Approach.”

This work explores how explainable artificial intelligence and learning analytics can support early educational intervention by identifying students at academic risk using institutional academic and behavioral data.

Our study was developed using data from Ecuadorian higher education institutions and combines predictive performance with model interpretability to support transparent educational decision-making.

 

🎤 What to say: Slide 2 (The Problem)

One of the major challenges in higher education is the high rate of academic failure and student dropout.

In many institutions, academic risk is detected too late, when students are already disengaged or close to abandoning their studies.

Traditional monitoring systems are often reactive instead of preventive.

As a result, universities lose valuable opportunities for timely intervention and personalized educational support.

This situation motivated our research to explore whether machine learning and learning analytics could support earlier and more transparent identification of at-risk students.

 

🎤 What to say: Slide 3 (Objective)

The main objective of this research was to develop an explainable machine learning framework capable of identifying academically at-risk students early in the semester.

To achieve this, we combined institutional academic data from Student Information Systems and behavioral data from Learning Management Systems.

In addition, we incorporated explainable AI techniques to ensure that the predictive results were transparent and interpretable for educational decision-making.

Our intention was not only to improve predictive performance, but also to support responsible, trustworthy, and action-oriented learning analytics in real educational environments.

 

🎤 What to say: Slide 4 (Dataset)

Our dataset included information from 350 university students from Ecuadorian higher education institutions.

We integrated two main institutional data sources:

First, Student Information Systems, which provided academic and demographic information such as GPA, credits, and enrollment history.

Second, Learning Management Systems, which provided behavioral indicators including login frequency, assignment activity, submission punctuality, and resource access.

In total, we analyzed more than 25 predictive variables during one academic semester.

The final objective was to classify students into two groups: at-risk students and non-at-risk students.
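Presenter's note (not spoken): if someone asks how the two data sources were combined, the idea can be illustrated with a minimal sketch like the one below. It is only an assumption about one reasonable way to join the sources by student identifier; the field names (`prior_gpa`, `credits`, `logins`, `on_time_ratio`), the identifiers, and all values are hypothetical and are not taken from the study's dataset.

```python
# Hypothetical SIS records keyed by student id (all values are made up).
sis_records = {
    "S001": {"prior_gpa": 3.4, "credits": 18},
    "S002": {"prior_gpa": 2.1, "credits": 15},
    "S003": {"prior_gpa": 2.9, "credits": 12},
}

# Hypothetical LMS aggregates; S003 has no LMS activity logged.
lms_records = {
    "S001": {"logins": 42, "on_time_ratio": 0.95},
    "S002": {"logins": 9, "on_time_ratio": 0.40},
}

# Assumption: a student with no LMS rows is treated as having zero activity.
LMS_DEFAULTS = {"logins": 0, "on_time_ratio": 0.0}


def merge_sources(sis, lms):
    """Left-join LMS features onto SIS records by student id."""
    merged = {}
    for student_id, academic in sis.items():
        behavior = lms.get(student_id, LMS_DEFAULTS)
        # Build a new dict so the source records are not mutated.
        merged[student_id] = {**academic, **behavior}
    return merged


features = merge_sources(sis_records, lms_records)
```

Treating missing LMS activity as zero is a design choice made for this sketch; a real pipeline might instead flag such students for the missing-value handling step.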

 

🎤 What to say: Slide 5 (Methodology)

Our methodology followed a complete machine learning pipeline composed of five stages.

First, we integrated academic and behavioral data obtained from institutional systems.

Second, we performed preprocessing and feature engineering, including missing-value handling, normalization, and feature selection.

Third, we trained multiple supervised machine learning models, including Logistic Regression, Support Vector Machines, Random Forest, Gradient Boosting, and Neural Networks.

To ensure robustness, we used stratified cross-validation during model training and testing.

Fourth, we evaluated the models using several performance metrics, including ROC-AUC, F1-score, and Matthews Correlation Coefficient.

Finally, we incorporated explainable AI techniques using SHAP analysis to identify and interpret the variables most strongly associated with academic risk.
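Presenter's note (not spoken): the stratified cross-validation mentioned above keeps the proportion of at-risk students roughly equal in every fold. The sketch below shows the idea in plain Python; it is an illustration under assumed labels, not the code used in the study, and the function name `stratified_folds` is hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Return k lists of indices, each roughly preserving the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a near-equal share
        # of every class.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Hypothetical labels: 1 = at-risk, 0 = not at-risk (20 students, 25% at-risk).
labels = [1] * 5 + [0] * 15
folds = stratified_folds(labels, k=5)

for fold in folds:
    at_risk = sum(labels[i] for i in fold)
    print(f"fold size={len(fold)}, at-risk={at_risk}")
```

In each round, one fold would serve as the test set and the remaining folds as the training set, so every student is tested exactly once.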

 

🎤 What to say: Slide 6 (Explainable AI with SHAP)

One of the most important aspects of our study was the integration of explainable artificial intelligence using SHAP analysis.

In educational contexts, predictive accuracy alone is not sufficient.

Institutions also need to understand why a student is classified as being at academic risk.

SHAP allows us to identify the contribution of each variable to the model’s predictions in a transparent and interpretable way.

As shown in the results, the most influential factors were prior GPA, assignment performance, and submission punctuality.

This level of interpretability helps educators and administrators make more informed and trustworthy decisions while supporting targeted educational interventions.
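Presenter's note (not spoken): SHAP is based on Shapley values, which split the difference between a student's prediction and a baseline prediction among the input variables. The sketch below computes exact Shapley values for a toy two-variable model by averaging over all feature orderings. The model, its coefficients, the feature names, and the student values are all hypothetical; the study's Gradient Boosting model and its SHAP results are not reproduced here, and SHAP tooling generally relies on faster algorithms than this brute-force enumeration.

```python
from itertools import permutations


def toy_risk_model(x):
    """Hypothetical risk score: lower GPA and punctuality raise the risk."""
    return (1.0
            - 0.6 * x["prior_gpa"]
            - 0.3 * x["punctuality"]
            - 0.1 * x["prior_gpa"] * x["punctuality"])


def shapley_values(model, instance, baseline):
    """Exact Shapley values by enumerating every feature ordering."""
    names = list(instance)
    orderings = list(permutations(names))
    totals = {name: 0.0 for name in names}
    for ordering in orderings:
        current = dict(baseline)
        previous_output = model(current)
        for name in ordering:
            # Switch one feature from its baseline value to the student's
            # value and record the marginal change in the prediction.
            current[name] = instance[name]
            output = model(current)
            totals[name] += output - previous_output
            previous_output = output
    return {name: total / len(orderings) for name, total in totals.items()}


# Features are scaled to [0, 1]; the baseline is a hypothetical "ideal" student.
baseline = {"prior_gpa": 1.0, "punctuality": 1.0}
student = {"prior_gpa": 0.5, "punctuality": 0.0}

contributions = shapley_values(toy_risk_model, student, baseline)
```

The contributions always sum to the gap between the student's score and the baseline score, which is what makes the explanation additive: each variable gets a share of the predicted risk.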

 

🎤 What to say: Slide 7 (Results: Model Performance)

In this slide, we present the comparative performance of the evaluated machine learning models.

Overall, all models demonstrated acceptable predictive capability; however, Gradient Boosting achieved the best overall performance.

Specifically, Gradient Boosting obtained a ROC-AUC value of 0.86, an F1-score of 0.76, and a Matthews Correlation Coefficient of 0.64.

These results indicate a good balance between how well the model ranks students by risk (ROC-AUC) and how reliably it classifies both groups (F1-score and MCC).

As shown in the ROC curve comparison, Gradient Boosting consistently outperformed the other approaches across different classification thresholds.

This suggests that the proposed framework can effectively support the early identification of academically at-risk students in real educational settings.
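Presenter's note (not spoken): in case of questions about the metrics, the sketch below shows how the F1-score and the Matthews Correlation Coefficient are derived from a confusion matrix. The counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not the study's confusion matrix and do not reproduce the reported values.

```python
import math


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC uses all four cells, so it stays informative with imbalanced classes."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts (at-risk is the positive class).
tp, fn = 40, 10   # at-risk students correctly flagged / missed
tn, fp = 40, 10   # non-at-risk students correctly cleared / wrongly flagged

f1 = f1_score(tp, fp, fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
```

With these illustrative counts, precision and recall are both 0.8, so F1 is 0.8, and MCC is 1500 / 2500 = 0.6.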

 

🎤 What to say: Slide 8 (Key Risk Factors – SHAP Analysis)

Beyond predictive accuracy, our study aimed to understand the key factors associated with academic risk.

Using SHAP analysis, we identified the variables with the greatest contribution to the model’s predictions.

The results show that prior GPA was the most influential predictor, followed by assignment performance and submission punctuality.

We also observed that low LMS activity, reduced engagement, and limited resource access were associated with higher academic risk.

The SHAP summary plots helped us visualize both the magnitude and direction of each variable’s impact on the prediction process.

This level of interpretability is especially important in educational environments because it enables transparent, explainable, and actionable interventions for students who may require additional support.

 

🎤 What to say: Slide 9 (Early Identification Capability)

One of the most significant findings of this study is the model’s ability to identify at-risk students early in the semester.

As shown in the results, acceptable predictive performance was already achieved by Week 4, with a ROC-AUC value above 0.70.

This is particularly important because it allows institutions to intervene before academic difficulties become critical.

The confusion matrix also demonstrates that the model correctly identified a substantial proportion of at-risk students during the early stages of the semester.

From an educational perspective, this capability can support tutoring systems, personalized academic guidance, and evidence-based institutional decision-making.

Ultimately, early identification creates opportunities to improve student retention, academic success, and educational support strategies.
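Presenter's note (not spoken): ROC-AUC can be read as the probability that a randomly chosen at-risk student receives a higher risk score than a randomly chosen non-at-risk student. The sketch below computes it that way and compares it with the 0.70 level mentioned on this slide. The labels and scores are hypothetical and do not come from the study's Week 4 data.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical early-semester predictions: 1 = at-risk, 0 = not at-risk.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.5, 0.4, 0.2, 0.1]

auc = roc_auc(labels, scores)
meets_threshold = auc > 0.70
```

Here 10 of the 12 possible pairs are ranked correctly, so the illustrative AUC is about 0.83.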

 

🎤 What to say: Slide 10 (Conclusions)

In conclusion, this study demonstrates that explainable machine learning can effectively support the early identification of academically at-risk students in higher education.

Among the evaluated approaches, Gradient Boosting achieved the best predictive performance while SHAP analysis provided transparent and interpretable explanations of the main risk factors.

The integration of academic and behavioral data allowed us to identify meaningful patterns associated with student vulnerability and disengagement.

Our findings also highlight the importance of combining predictive accuracy with explainability in educational environments where transparency and trust are essential.

Finally, this framework has strong potential to support learning analytics systems, early intervention programs, and data-informed educational decision-making in real institutional contexts.

 

🎤 What to say: Slide 11 (Closing / Questions)

We believe that explainable machine learning and learning analytics can play an important role in supporting more transparent, data-informed, and student-centered educational systems.

Our work demonstrates that combining predictive analytics with explainable AI can help institutions identify at-risk students earlier and design more effective educational interventions.

Thank you very much for your attention.

 
