Track 5. Learning Analytics: The good, the bad and the ugly
Authors: Ángel Manuel Guerrero-Higueras, Noemí Decastro-García, Vicente Matellan and Miguel Ángel Conde
https://youtu.be/ut-fR9cfypU
Predictive models of academic success: a case study with version control systems
1. Predictive models of academic success
A case study with version control systems
Ángel Manuel Guerrero Higueras
Noemí de Castro García
Vicente Matellán
Miguel Ángel Conde
University of León
Distributed under: Creative Commons Attribution-ShareAlike 4.0 International
2. Aim
Goal
Predict academic success in order to identify students who are at
risk.
How? By monitoring their interaction with a Version Control System
(VCS).
Practical assignment → ASSOOFS.
VCS → GitHub.
Data analysis → MoEv tool.
Research questions:
1 Are there features that we can extract from students’ interactions
with this type of system that are related to academic success?
2 Can we build a model that predicts students’ success in a
practical assignment by monitoring their use of a VCS?
Predictive models of academic success: a case study with VCSs 2 de 8
3. Methodology: MoEv tool
1 Input data gathering.
features: raw data + synthetic data
target variable (class): AP or SS
2 Selection of most significant features.
Feature importance is computed as the
Gini coefficient (G).
3 Selection of models.
parametric and non-parametric models:
Adaptive Boosting (AB), Classification
And Regression Tree (CART),
K-Nearest Neighbors (KNN), Linear
Discriminant Analysis (LDA), Logistic
Regression (LR), Multi-Layer
Perceptron (MLP), Naive Bayes (NB),
and Random Forest (RF).
4 Decision.
Accuracy, Precision, Recall, and F1
score.
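The four scores used in the decision step can be sketched in plain Python as follows. This is only an illustration, not the MoEv implementation: the function name `classification_scores`, the choice of "AP" as the positive class, and the sample labels are all assumptions made for the example.

```python
def classification_scores(y_true, y_pred, positive="AP"):
    """Return accuracy, precision, recall and F1 for a binary problem.

    Treating "AP" as the positive class is an assumption of this sketch.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels, for illustration only (not data from the study).
y_true = ["AP", "AP", "SS", "SS", "AP"]
y_pred = ["AP", "SS", "SS", "AP", "AP"]
scores = classification_scores(y_true, y_pred)
print(scores)
```

Accuracy alone can hide which class is being misclassified, which is why precision, recall, and F1 are reported alongside it.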
4. Materials
Target: binary variable with two possible values: “AP”, for those
students who will finish a practical assignment
successfully; and “SS”, for those who will not.
Input data:
Anonymized student identifier (id).
Number of commit operations (commits).
Number of days where there is at least one commit operation (days).
Average number of commit operations per day (commits/day).
Number of lines of code added (additions).
Number of lines deleted (deletions).
Number of issues opened (issues).
Number of issues closed (closed).
Authorship proof.
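As a rough illustration of how the commit-based features above could be derived from a student's repository history, consider the sketch below. The `Commit` record, the `extract_features` helper, and the sample history are hypothetical. Dividing by the number of active days to obtain commits/day is one possible reading of the slide, not a confirmed definition.

```python
from collections import namedtuple
from datetime import date

# Hypothetical record for one commit: the day it was made and its line counts.
Commit = namedtuple("Commit", ["day", "additions", "deletions"])


def extract_features(commits):
    """Aggregate one student's commits into the features listed above."""
    n_commits = len(commits)
    active_days = len({c.day for c in commits})  # days with >= 1 commit
    return {
        "commits": n_commits,
        "days": active_days,
        # Assumption: the average is taken over active days only.
        "commits/day": n_commits / active_days if active_days else 0.0,
        "additions": sum(c.additions for c in commits),
        "deletions": sum(c.deletions for c in commits),
    }


# Invented history, for illustration only.
history = [
    Commit(date(2017, 3, 1), 40, 0),
    Commit(date(2017, 3, 1), 12, 5),
    Commit(date(2017, 3, 4), 30, 8),
]
features = extract_features(history)
print(features)
```

The issue counts and the authorship proof are left out because they do not come from the commit log itself.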
Datasets:
1 2016–2017 ASSOO course (46 students).
21 students labelled with “AP”, and 25 students with the label “SS”.
Used to train and test the prediction model.
2 2017–2018 ASSOO course (40 students).
21 students labelled with “AP”, and 19 students with the label “SS”.
Used to validate the prediction model.
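A quick way to put accuracy figures in context is the majority-class baseline: the accuracy of always predicting the most frequent label. The sketch below computes it from the class counts given above. The helper `majority_baseline` is hypothetical and is not part of the MoEv tool.

```python
def majority_baseline(class_counts):
    """Return the most frequent label and the accuracy of always predicting it."""
    total = sum(class_counts.values())
    label = max(class_counts, key=class_counts.get)
    return label, class_counts[label] / total


# Class counts as reported for the two datasets.
train_test = {"AP": 21, "SS": 25}   # 2016-2017 course, 46 students
validation = {"AP": 21, "SS": 19}   # 2017-2018 course, 40 students

for name, counts in [("train/test", train_test), ("validation", validation)]:
    label, acc = majority_baseline(counts)
    print(f"{name}: always predict {label!r} -> accuracy {acc:.3f}")
```

A classifier is only informative to the extent that its accuracy exceeds this baseline on the corresponding dataset.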
6. Results II
Classifier   Test score   Validation score
NB           0.8          0.825
RF           0.8          0.8
LDA          0.8          0.725
MLP          0.5          0.65
CART         0.4          0.6
AB           0.4          0.5
LR           0.7          0.475
KNN          0.6          0.475
SVM          0.4          0.525
Accuracy classification score.
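The scores above can be ranked programmatically to see which classifiers generalize best to the validation course. The sketch below is illustrative only: it copies the reported accuracies into a dictionary, sorts by validation score, and shows the change from test to validation. The variable names are assumptions of the example.

```python
# Reported accuracy per classifier: (test score, validation score).
scores = {
    "NB": (0.8, 0.825),
    "RF": (0.8, 0.8),
    "LDA": (0.8, 0.725),
    "MLP": (0.5, 0.65),
    "CART": (0.4, 0.6),
    "AB": (0.4, 0.5),
    "LR": (0.7, 0.475),
    "KNN": (0.6, 0.475),
    "SVM": (0.4, 0.525),
}

# Sort by validation accuracy, highest first.
ranked = sorted(scores.items(), key=lambda item: item[1][1], reverse=True)

for name, (test, validation) in ranked:
    change = validation - test  # negative means worse on the new cohort
    print(f"{name:<5} test={test:.3f} validation={validation:.3f} "
          f"change={change:+.3f}")

top_three = [name for name, _ in ranked[:3]]
```

Sorting by validation rather than test score favors models that hold up on a different cohort of students.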
7. Results III
Confusion matrix for the NB (left), RF (center), and LDA (right) classifiers
evaluated using the test dataset.
Confusion matrix for the NB (left), RF (center), and LDA (right) classifiers
evaluated using the validation dataset.
8. Conclusions and future steps
Regarding question 1
Results show that some features related to students’ interaction
with the VCS are discriminant.
However, including more features, such as an authorship proof,
increases model accuracy.
Regarding question 2
The MoEv tool provides a prediction model by evaluating several
classifiers.
Work remains to optimize the selected model by tuning its
hyper-parameters, but the results are sufficient to assert that we
can predict students’ results in a practical assignment with a high
percentage of success.