🎤 What to say: Slide 1 (Title)
Good morning, everyone.
My name is Augusto Paolo Bernal Parraga, and today I will
present our research titled:
“Explainable Machine Learning for Early Identification of
At-Risk Students in Ecuadorian Higher Education: A Learning Analytics
Approach.”
This work explores how explainable artificial intelligence
and learning analytics can support early educational intervention by
identifying students at academic risk using institutional academic and
behavioral data.
Our study was developed using data from Ecuadorian higher
education institutions and combines predictive performance with model
interpretability to support transparent educational decision-making.
🎤 What to say: Slide 2 (The Problem)
One of the major challenges in higher education is the high
rate of academic failure and student dropout.
In many institutions, academic risk is detected too late,
when students are already disengaged or close to abandoning their studies.
Traditional monitoring systems are often reactive instead of
preventive.
As a result, universities lose valuable opportunities for
timely intervention and personalized educational support.
This situation motivated our research to explore whether
machine learning and learning analytics could support earlier and more
transparent identification of at-risk students.
🎤 What to say: Slide 3 (Objective)
The main objective of this research was to develop an
explainable machine learning framework capable of identifying academically
at-risk students early in the semester.
To achieve this, we combined institutional academic data from
Student Information Systems and behavioral data from Learning Management
Systems.
In addition, we incorporated explainable AI techniques to
ensure that the predictive results were transparent and interpretable for
educational decision-making.
Our intention was not only to improve predictive performance,
but also to support responsible, trustworthy, and action-oriented learning
analytics in real educational environments.
🎤 What to say: Slide 4 (Dataset)
Our dataset included information from 350 university students
from Ecuadorian higher education institutions.
We integrated two main institutional data sources:
First, Student Information Systems, which provided academic
and demographic information such as GPA, credits, and enrollment history.
Second, Learning Management Systems, which provided
behavioral indicators including login frequency, assignment activity,
submission punctuality, and resource access.
In total, we analyzed more than 25 predictive variables
collected over one academic semester.
The prediction target was binary: each student was classified
as either at-risk or not at-risk.
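As a backup illustration of how the two sources can be combined, here is a minimal sketch in plain Python. It assumes both systems share a student identifier. The field names (`student_id`, `gpa`, `credits`, `logins`, `on_time_ratio`) and all values are hypothetical and are not taken from the study's dataset.

```python
# Toy records. Field names and values are illustrative assumptions only.
sis_records = [
    {"student_id": "S001", "gpa": 3.4, "credits": 18},
    {"student_id": "S002", "gpa": 2.1, "credits": 12},
    {"student_id": "S003", "gpa": 2.9, "credits": 15},
]
lms_records = [
    {"student_id": "S001", "logins": 42, "on_time_ratio": 0.95},
    {"student_id": "S002", "logins": 7, "on_time_ratio": 0.40},
]


def merge_sources(sis, lms):
    """Left-join LMS indicators onto SIS rows by student_id.

    Students without LMS records get None for the behavioral fields,
    so the gaps stay visible for a later missing-value step.
    """
    lms_by_id = {row["student_id"]: row for row in lms}
    behavioral_fields = sorted(
        {key for row in lms for key in row if key != "student_id"}
    )
    merged = []
    for row in sis:
        extra = lms_by_id.get(row["student_id"], {})
        combined = dict(row)
        for field in behavioral_fields:
            combined[field] = extra.get(field)
        merged.append(combined)
    return merged


students = merge_sources(sis_records, lms_records)
```

A left join from the SIS side keeps every enrolled student in the table, including those who never logged into the LMS, which is itself a useful risk signal.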
🎤 What to say: Slide 5 (Methodology)
Our methodology followed a complete machine learning pipeline
composed of five stages.
First, we integrated academic and behavioral data obtained
from institutional systems.
Second, we performed preprocessing and feature engineering,
including missing value handling, normalization, and feature selection.
Third, we trained multiple supervised machine learning
models, including Logistic Regression, Support Vector Machines, Random Forest,
Gradient Boosting, and Neural Networks.
To ensure robustness, we used stratified cross-validation
during model training and testing.
Next, we evaluated the models using several performance
metrics, including ROC-AUC, F1-score, and Matthews Correlation Coefficient.
Finally, we incorporated explainable AI techniques using SHAP
analysis to identify and interpret the variables most strongly associated with
academic risk.
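If someone asks what stratified cross-validation means in practice, the following sketch may help. It is a plain-Python illustration, not the study's code. The number of folds (5), the seed, and the 30/70 class split are assumptions chosen for the example.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign each sample index to one of k folds with similar class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a near-equal share per class.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Toy labels: 1 = at-risk, 0 = not at-risk (made-up 30/70 split).
labels = [1] * 30 + [0] * 70
folds = stratified_folds(labels, k=5)

shares = []
for test_indices in folds:
    train_indices = [i for fold in folds if fold is not test_indices for i in fold]
    # A model would be fitted on train_indices and scored on test_indices here.
    shares.append(sum(labels[i] for i in test_indices) / len(test_indices))
```

Each test fold keeps the same proportion of at-risk students as the full dataset, so no fold is evaluated on an unrepresentative mix of classes.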
🎤 What to say: Slide 6 (Explainable AI with SHAP)
One of the most important aspects of our study was the
integration of explainable artificial intelligence using SHAP analysis.
In educational contexts, predictive accuracy alone is not
sufficient.
Institutions also need to understand why a student is
classified as being at academic risk.
SHAP allows us to identify the contribution of each variable
to the model’s predictions in a transparent and interpretable way.
As shown in the results, the most influential factors were
prior GPA, assignment performance, and submission punctuality.
This level of interpretability helps educators and
administrators make more informed and trustworthy decisions while supporting
targeted educational interventions.
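To make the idea behind SHAP concrete, here is a small sketch that computes exact Shapley values for a toy risk model by averaging each variable's marginal effect over all orderings. It does not use the SHAP library. The model, its coefficients, the baseline student, and the feature scales are hypothetical and serve only to illustrate the principle.

```python
from itertools import permutations

FEATURES = ["prior_gpa", "assignment_score", "on_time_ratio"]


def risk_model(x):
    """Hypothetical risk score (higher = more at risk). Not the study's model."""
    return (
        1.0
        - 0.2 * x["prior_gpa"]
        - 0.3 * x["assignment_score"]
        - 0.1 * x["on_time_ratio"]
    )


def shapley_values(model, student, baseline):
    """Exact Shapley values by enumerating every feature ordering.

    A feature that is 'absent' takes its baseline value. Features are
    switched to the student's value one at a time, and the change in the
    model output is credited to the feature just switched.
    """
    totals = {name: 0.0 for name in FEATURES}
    orderings = list(permutations(FEATURES))
    for ordering in orderings:
        current = dict(baseline)
        previous_output = model(current)
        for name in ordering:
            current[name] = student[name]
            output = model(current)
            totals[name] += output - previous_output
            previous_output = output
    return {name: total / len(orderings) for name, total in totals.items()}


# Made-up reference student and made-up student under review.
baseline = {"prior_gpa": 3.0, "assignment_score": 0.8, "on_time_ratio": 0.9}
student = {"prior_gpa": 2.0, "assignment_score": 0.5, "on_time_ratio": 0.6}
contributions = shapley_values(risk_model, student, baseline)
```

The contributions add up to the difference between this student's score and the baseline score, which is the property that lets an advisor read each value as that variable's share of the predicted risk.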
🎤 What to say: Slide 7 (Results: Model Performance)
In this slide, we present the comparative performance of the
evaluated machine learning models.
Overall, all models demonstrated acceptable predictive
capability; however, Gradient Boosting achieved the best overall performance.
Specifically, Gradient Boosting obtained a ROC-AUC value of
0.86, an F1-score of 0.76, and a Matthews Correlation Coefficient of 0.64.
These results indicate good separation between the two groups
and balanced performance on both at-risk and not-at-risk students.
As shown in the ROC curve comparison, Gradient Boosting
consistently outperformed the other approaches across different classification
thresholds.
This suggests that the proposed framework can effectively
support the early identification of academically at-risk students in real
educational settings.
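For questions about how the F1-score and the Matthews Correlation Coefficient are obtained, this sketch computes both from the four cells of a confusion matrix. The counts are made up for illustration and are not the study's results.

```python
import math


def f1_and_mcc(tp, fp, fn, tn):
    """Return (F1, MCC) for the positive class from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC uses all four cells, so it stays informative under class imbalance.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return f1, mcc


# Made-up counts: tp = at-risk students correctly flagged, and so on.
f1, mcc = f1_and_mcc(tp=40, fp=10, fn=10, tn=40)
```

F1 looks only at the at-risk class, while MCC also rewards correct decisions on students who are not at risk, which is why reporting both gives a fuller picture.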
🎤 What to say: Slide 8 (Key Risk Factors – SHAP Analysis)
Beyond predictive accuracy, our study aimed to understand the
key factors associated with academic risk.
Using SHAP analysis, we identified the variables with the
greatest contribution to the model’s predictions.
The results show that prior GPA was the most influential
predictor, followed by assignment performance and submission punctuality.
We also observed that low LMS activity, reduced engagement,
and limited resource access were associated with higher academic risk.
The SHAP summary plots helped us visualize both the magnitude
and direction of each variable’s impact on the prediction process.
This level of interpretability is especially important in
educational environments because it enables transparent, explainable, and
actionable interventions for students who may require additional support.
🎤 What to say: Slide 9 (Early Identification Capability)
One of the most significant findings of this study is the
model’s ability to identify at-risk students early in the semester.
As shown in the results, acceptable predictive performance
was already achieved by Week 4, with a ROC-AUC value above 0.70.
This is particularly important because it allows institutions
to intervene before academic difficulties become critical.
The confusion matrix also demonstrates that the model
correctly identified a substantial proportion of at-risk students during the
early stages of the semester.
From an educational perspective, this capability can support
tutoring systems, personalized academic guidance, and evidence-based
institutional decision-making.
Ultimately, early identification creates opportunities to
improve student retention, academic success, and educational support
strategies.
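To explain what a ROC-AUC value such as 0.70 means, it can help to describe it as the probability that a randomly chosen at-risk student receives a higher risk score than a randomly chosen student who is not at risk. The sketch below computes it that way. The scores and labels are invented for the example.

```python
def roc_auc(scores, labels):
    """Share of (at-risk, not-at-risk) pairs ranked correctly; ties count as 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for positive_score in positives:
        for negative_score in negatives:
            if positive_score > negative_score:
                wins += 1.0
            elif positive_score == negative_score:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up early-semester risk scores for six students (1 = at-risk).
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a value above 0.70 means the model ranks the two groups correctly in more than 70% of such pairs.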
🎤 What to say: Slide 10 (Conclusions)
In conclusion, this study demonstrates that explainable
machine learning can effectively support the early identification of
academically at-risk students in higher education.
Among the evaluated approaches, Gradient Boosting achieved
the best predictive performance while SHAP analysis provided transparent and
interpretable explanations of the main risk factors.
The integration of academic and behavioral data allowed us to
identify meaningful patterns associated with student vulnerability and
disengagement.
Our findings also highlight the importance of combining
predictive accuracy with explainability in educational environments where
transparency and trust are essential.
Finally, this framework has strong potential to support
learning analytics systems, early intervention programs, and data-informed
educational decision-making in real institutional contexts.
🎤 What to say: Slide 11 (Closing / Questions)
We believe that explainable machine learning and learning
analytics can play an important role in supporting more transparent,
data-informed, and student-centered educational systems.
Our work demonstrates that combining predictive analytics
with explainable AI can help institutions identify at-risk students earlier and
design more effective educational interventions.
Thank you very much for your attention.