These slides present preliminary results from applying machine learning techniques to the analysis of Educational Robotics activities. An experiment was conducted with 197 secondary school students from Italy: the Lego Mindstorms EV3 programming blocks were updated to record log files containing the coding sequences designed by the students (working in teams) while solving an introductory robotics exercise. Four machine learning techniques (logistic regression, support vector machine, k-nearest neighbors and random forests) were used to predict the students’ performance, comparing a supervised approach (using twelve indicators extracted from the log files as input for the algorithms) with a mixed approach (applying a k-means algorithm to calculate the machine learning features). The results show that SVM with the mixed approach outperformed the other techniques, and that three learning styles emerged predominantly from the data mining analysis.
1. Analysis of Educational Robotics activities
using a machine learning approach
L. Cesaretti*,**, L. Screpanti*, D. Scaradozzi*,***, E. Mangina****
* Department of Information Engineering, Università Politecnica delle Marche, Ancona, Italy
** Talent srl, Osimo, Italy
*** LSIS- umr CNRS 6168, Laboratoire des Sciences de l'Information et des Systèmes Equipe I&M (ESIL)
**** School of Computer Science, University College Dublin, Dublin, Ireland
2. Why this research project?
In the Educational Robotics (ER) field,
researchers have identified a lack of
quantitative analysis on how robotics can
improve skills and increase learning
achievements in students (Benitti, 2012;
Alimisis, 2013).
Alimisis, D. (2013). Educational robotics: Open questions and new challenges. Themes in Science and Technology Education, 6(1), 63-71.
Benitti, F. B. V. (2012). Exploring the educational potential of robotics in schools: A systematic review. Computers & Education, 58(3), 978-988.
3. Machine learning and Data mining are everywhere!
Made for me
That's a prediction “of what Netflix thinks
you may enjoy watching, based on your
own unique tastes”.
4. Real time monitoring of students’ activity
(Registered patent no. 102018000009636, «APPARATO DI MONITORAGGIO DI OPERAZIONI DI ASSEMBLAGGIO E
PROGRAMMAZIONE», in English: “Apparatus for monitoring assembly and programming operations”)
5. Case Study: Research Questions
1. identification of different patterns in the students’ problem-solving trajectories;
2. accurate prediction of students’ team final performance; and
3. correlation of the discovered patterns of students’ problem-solving with the
evaluation given by the educators
Applying data mining and machine learning methods to data collected from
educational environments makes it possible to predict and classify students’ behaviours and
to discover latent structural regularities in large educational datasets.
Berland, M., Baker, R. S., & Blikstein, P. (2014). Educational data mining and learning analytics: Applications to constructionist
research. Technology, Knowledge and Learning, 19(1-2), 205-220.
6. Case Study: Participants and Procedure
Students from seven Italian lower and
higher secondary schools, located in
the Emilia Romagna and Marche
regions.
The total number of students involved
in this study is 197.
The experimentation was carried out
from March 2018 to March 2019.
8. Case Study: the Introductory Exercise
Program the robot so that it covers a given
distance (1 m), trying to be as precise as
possible.
Students’ teams involved in the research project
had to take into account some constraints:
• the time within which they had to design
and test their solution (15 - 20 minutes);
• the teams could test the programming
sequence as many times as they wanted;
• they were allowed to use measuring
instruments only to measure some of the robot’s
parameters (for example the radius of the
wheel).
EVALUATION
• if the error was < 4 cm, the educator considered the
challenge completed;
• if the error was >= 4 cm, the educator considered the
challenge not completed.
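The exercise and its evaluation rule can be sketched in a few lines. This is only an illustration: the function names and the 2.8 cm wheel radius are assumptions, and the "error" is assumed to be the absolute difference between the measured and the target distance.

```python
import math


def wheel_degrees_for_distance(distance_cm: float, wheel_radius_cm: float) -> float:
    """Degrees of wheel rotation needed to travel distance_cm in a straight line."""
    circumference_cm = 2 * math.pi * wheel_radius_cm
    return 360.0 * distance_cm / circumference_cm


def challenge_completed(measured_cm: float,
                        target_cm: float = 100.0,
                        tolerance_cm: float = 4.0) -> bool:
    """Educator's rule from the slide: completed only if the error is below 4 cm."""
    # Assumption: the error is the absolute deviation from the 1 m target.
    return abs(measured_cm - target_cm) < tolerance_cm


# Hypothetical wheel radius (not a value reported in the slides).
degrees_needed = wheel_degrees_for_distance(100.0, wheel_radius_cm=2.8)
print(round(degrees_needed), challenge_completed(97.5))
```

A team that measures the wheel radius and computes the rotation in this way follows what the later slides call a mathematical / planning strategy, as opposed to trial and error.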
9. Case Study: how to represent the participants' activities in the
robot programming activity
Students’ teams designed 1113 programming sequences to solve the introductory Exercise.
Each programming test carried out by a students’ team can be represented as a vector composed of these 12
elements:
● Motors: the n° of Motor blocks in the sequence;
● Loops: the n° of Loop blocks in the sequence;
● Conditionals: the n° of Conditional and Sensor blocks in the sequence;
● Others: the n° of blocks in the sequence belonging to categories other than Motors, Loops and Conditionals;
● Added: the n° of blocks added, compared to the previous sequence;
● Deleted: the n° of blocks deleted, compared to the previous sequence;
● Changed: the n° of blocks changed, compared to the previous sequence;
● Equal: the n° of blocks left unchanged, compared to the previous sequence;
● Delta Motors: amount of change in Motor block parameters (first, second or third parameter), compared to the
previous sequence (calculated only for blocks of the “Changed” category);
● Delta Loops: amount of change in Loop block parameters, compared to the previous sequence;
● Delta Conditionals: amount of change in Conditional block parameters, compared to the previous sequence;
● Delta Others: amount of change in Other block parameters, compared to the previous sequence.
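A minimal sketch of how such a 12-element vector might be computed from two consecutive sequences. The slides do not specify how blocks are matched between sequences, so the position-by-position alignment, the category names and the use of summed absolute parameter differences as "amount of change" are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical block representation: (category, parameters).
# Sensor blocks are assumed to be labelled "Conditional" by the caller.
Block = Tuple[str, Tuple[float, ...]]

DELTA_KEY = {"Motor": "Delta Motors", "Loop": "Delta Loops",
             "Conditional": "Delta Conditionals"}


def indicators(prev: List[Block], cur: List[Block]) -> Dict[str, float]:
    """Build the 12 indicators for `cur`, compared with the previous sequence `prev`."""
    counts = Counter(category for category, _ in cur)
    vec: Dict[str, float] = {
        "Motors": counts["Motor"],
        "Loops": counts["Loop"],
        "Conditionals": counts["Conditional"],
        "Others": sum(n for c, n in counts.items() if c not in DELTA_KEY),
        # Assumption: a difference in length counts as added or deleted blocks.
        "Added": max(0, len(cur) - len(prev)),
        "Deleted": max(0, len(prev) - len(cur)),
        "Changed": 0, "Equal": 0,
        "Delta Motors": 0.0, "Delta Loops": 0.0,
        "Delta Conditionals": 0.0, "Delta Others": 0.0,
    }
    # Assumption: blocks are compared position by position.
    for (p_cat, p_par), (c_cat, c_par) in zip(prev, cur):
        if p_cat == c_cat and p_par == c_par:
            vec["Equal"] += 1
            continue
        vec["Changed"] += 1
        if p_cat == c_cat:  # same block type, different parameters
            key = DELTA_KEY.get(c_cat, "Delta Others")
            vec[key] += sum(abs(a - b) for a, b in zip(p_par, c_par))
    return vec


previous = [("Motor", (50.0, 2.0, 1.0))]
current = [("Motor", (50.0, 3.0, 1.0)), ("Wait", (1.0,))]
print(indicators(previous, current))
```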
10. SUPERVISED APPROACH
The feature matrix was created by calculating the
mean value and the standard deviation for each
indicator presented in the previous section, taking into
account all the trials performed by a students’ team to
solve an exercise; then, the authors compared the
performances of four different machine learning
algorithms:
• Logistic Regression
• Support Vector Machine
• k-nearest neighbors
• Random Forest classifier
in the prediction of the students’ teams final result.
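The construction of one row of the feature matrix can be sketched as follows. The function name is hypothetical, and the population standard deviation is an assumption (the slides do not say which estimator was used); it has the practical advantage of being defined for a team with a single trial.

```python
from statistics import mean, pstdev
from typing import List


def team_features(trials: List[List[float]]) -> List[float]:
    """Mean and standard deviation of each indicator over all trials of one team.

    `trials` holds one indicator vector per programming test; with 12
    indicators the result has 24 values (12 means followed by 12 deviations).
    """
    columns = list(zip(*trials))  # one tuple of values per indicator
    # Assumption: population standard deviation (pstdev).
    return [mean(col) for col in columns] + [pstdev(col) for col in columns]


# Toy example with two indicators and two trials (made-up values).
row = team_features([[1.0, 0.0], [3.0, 0.0]])
print(row)
```

Stacking one such row per team gives the matrix that would be passed, together with the completed / not completed labels, to the four classifiers.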
MIXED APPROACH
Combines the benefits of both supervised
and unsupervised methods: a k-means algorithm
was applied to group the programming
sequences into clusters. Following
the clustering, the percentage of each cluster in the
programming activity of each students’ group was
calculated, and these new features were then used
to create a feature matrix as an input for the
four supervised algorithms cited previously.
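The second step of the mixed approach, turning cluster assignments into per-team features, can be sketched as below. The cluster labels are assumed to come from a k-means run performed beforehand (for instance with a library such as scikit-learn); the function name and the toy labels are hypothetical.

```python
from collections import Counter
from typing import List


def cluster_percentages(labels: List[int], n_clusters: int) -> List[float]:
    """Percentage of a team's programming sequences that fall in each cluster.

    `labels` contains the cluster index assigned to each of the team's
    sequences, in the range 0 .. n_clusters - 1.
    """
    counts = Counter(labels)
    total = len(labels)
    return [100.0 * counts[c] / total for c in range(n_clusters)]


# Toy example: four sequences of one team assigned to three clusters.
print(cluster_percentages([0, 0, 2, 1], n_clusters=3))
```

Each team is then described by as many features as there are clusters, regardless of how many trials it performed.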
11. Case Study: Results
− Accuracy
− Mean Precision (calculated considering the average
value between precision in the prediction of students’
positive performance and negative performance)
− Mean Recall (calculated considering the average value
between the recall in the prediction of students’ positive
performance and negative performance)
− Mean F1 – Score (calculated considering the average
value between the F1-score in the prediction of
students’ positive performance and negative
performance)
To obtain these metrics, a repeated 10-fold
cross-validation was performed: the 10-fold validation
was repeated multiple times, and the average value and
standard deviation of the four metrics above were
calculated.
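The four metrics and the repeated 10-fold splitting can be sketched in plain Python. This is an illustration only: the number of repetitions is not given in the slides (the `repeats` default is an assumption), the splits here are not stratified, and the class coding 1 = completed, 0 = not completed is hypothetical.

```python
import random
from typing import Iterator, List, Tuple


def mean_metrics(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float, float]:
    """Accuracy, then precision, recall and F1 averaged over the two classes."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for cls in (1, 0):  # positive and negative performance
        tp = sum(t == cls and p == cls for t, p in pairs)
        predicted = sum(p == cls for p in y_pred)
        actual = sum(t == cls for t in y_true)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(2 * precision * recall / denom if denom else 0.0)
    return accuracy, sum(precisions) / 2, sum(recalls) / 2, sum(f1s) / 2


def repeated_kfold(n_samples: int, k: int = 10, repeats: int = 5,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for `repeats` shuffled k-fold rounds."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a new random partition for every repetition
        for fold in range(k):
            test_idx = order[fold::k]
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield train_idx, test_idx


print(mean_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

Training a classifier on each `train_idx`, scoring it on the matching `test_idx` with `mean_metrics`, and then taking the mean and standard deviation of the scores over all splits would give figures of the kind reported on this slide.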
13. Case Study: Problem solving patterns
[Figure: bar chart "Different teams' behaviours". x-axis: Cluster 1 to Cluster 8 and Trials; y-axis: # of programming sequences (0 to 30). Series: Mathematical / Planning; Tinkering (with prevalent refining behaviour); Tinkering (with significantly high changes).]
Pearson correlation coefficient (PCC)
(between the features extracted with
the k-means algorithm and the final
results obtained by students’ teams)
Two features show a statistically
significant negative correlation:
• number of trials
(PCC = -0.48, p-value < 0.001);
• percentage of sequences of the
cluster named “HIGH MOTORS
PARAMETERS CHANGE”
(PCC = -0.39, p-value < 0.01).
High values of these features (typical
for the “Tinkering with significantly
high changes” teams) indicate a higher
probability of a negative
performance.
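For reference, the Pearson correlation coefficient between a feature and the final result can be computed as in this sketch. The toy data are invented and coding the final result as 1 (completed) or 0 (not completed) is an assumption; the p-value is not computed here.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Invented data: number of trials per team and final result (1 = completed).
n_trials = [3, 5, 12, 20, 7, 25]
completed = [1, 1, 0, 0, 1, 0]
print(pearson(n_trials, completed))  # negative: more trials, worse outcome
```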
16. Case Study: Tinkering behaviour (with large changes)
[Figure: Delta Motors (y-axis, 0 to 50) for each test number (x-axis, 1 to 29).]
FINAL ERROR = 50 CM
17. Conclusions
• Compare the performance of the machine learning techniques already applied (SVM, Logistic
Regression, KNN, Random Forest) with an MLP Neural Network → Larger Dataset
• Another improvement for future development of this study will be time tracking in the log files
generated by the system.
• The authors also intend to utilise recurrent neural networks, in particular long short-term memory
autoencoders (a structure specifically designed to support sequences of input data), in order to
translate the programming sequences created by students into fixed-length vectors (a compressed
representation of the input data) while maintaining a high level of information content.
• Another planned development is to extend the current system design with a personalised
e-learning system: an educational recommender system could give real-time feedback to teachers
and students involved in Educational Robotics activities, or propose personalised learning paths to
learners.