Learning Pulse
D. Di Mitri, M. Scheffel, H. Drachsler, D. Börner, S. Ternier, M. Specht
A machine learning approach for
predicting performance in self-regulated
learning using multimodal data
Paper presentation at LAK17
15th March 2017, Vancouver, Canada
Outline
1. Background, context, vision
2. Our approach
3. Data collection
4. Data analysis
5. Conclusions
Pagina 2
Data deluge in education
Pagina 3
Collecting learning experiences
Picture from tincanapi.com
Pagina 4
Pagina 5
Learning happening across spaces
Context: Self-Regulated Learning
Self-Regulated Learning → no guidance → no feedback → no support
Pagina 6
Vision: machine learning approach
y = f(X)
Learning
Performance
(output space)
Predictive
Model
Multimodal Data
(input space)
Pagina 7
Our approach
Pagina 8
Research questions
(RQ-MAIN) How can we store, model and analyse
multimodal data to predict performance in human learning?
(RQ1) Which architecture allows the collection and storage
of multimodal data in a scalable and efficient way?
(RQ2) What is the best way to model multimodal data to
apply supervised machine learning techniques?
(RQ3) Which machine learning model is able to produce
learner specific predictions on multimodal data?
Pagina 9
Participants
• 9 PhD students at the Welten Institute
• Different disciplines
• Different working setups:
– Time
– Tasks
– Operating systems
Pagina 10
Experimental timeline
Pagina 11
Phase 0
Pre-test
System architecture tested
Phase 1
Training
3 weeks of data collection
Phase 2
Validation
2 weeks of data collection and prediction
Input space – multimodal data
Pagina 12
Context
Body
Activities
Body: physiological (heart-rate)
and physical responses
(steps) - from Fitbit HR
Activities: applications used
during learning
from RescueTime
Context: weather data
from OpenWeatherMap
Output space – Flow (Csikszentmihalyi, 1972)
Pagina 13
Theoretical Empirical
Activity Rating Tool
Productivity
How productive was
last activity?
Stress
How stressful was
last activity?
Challenge
How challenging was
last activity?
Abilities
How prepared did you
feel for the activity?
FLOW
Participants rate hourly, from 7AM to 7PM
A scalable web app!
Client: Bootstrap + jQuery
Server: GoogleApp + Python
“Very easy to
use!”
Pagina 14
Data collection
Pagina 15
Data model
Pagina 16
Berg, A., Scheffel, M., Drachsler, H., Ternier, S. & Specht, M. (2016). The Dutch xAPI Experience. Proceedings of the 6th
International Conference on Learning Analytics and Knowledge (LAK’16), April 25-29, 2016, Edinburgh, UK.
Experience API
Data storage format for the Learning Record Store
The data journey
Pagina 18
Complex architecture
Pagina 19
Data collection
• PULL data from the 3rd party APIs
• Make the xAPI triples
• PUSH data in the LRS
• It’s scalable!
• No collisions
• It’s fast
• It’s Interoperable
Learning Pulse Server
+
Learning Record Store
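The "make the xAPI triples" step can be sketched as follows. This is a minimal illustration, not the Learning Pulse code: the helper name `make_statement`, the learner address and the sensor IRI are hypothetical, and carrying the sensor reading in `result.response` is an assumption about how a value could be attached to the actor-verb-object triple.

```python
import json
from datetime import datetime, timezone


def make_statement(learner_email, verb_id, verb_name, object_id, value, when):
    """Build one xAPI statement (actor, verb, object) as a plain dict."""
    return {
        "actor": {"objectType": "Agent", "mbox": f"mailto:{learner_email}"},
        "verb": {"id": verb_id, "display": {"en-US": verb_name}},
        "object": {"objectType": "Activity", "id": object_id},
        # Assumption: the sensor reading travels as the statement's result.
        "result": {"response": str(value)},
        "timestamp": when.isoformat(),
    }


statement = make_statement(
    learner_email="learner1@example.org",               # hypothetical learner
    verb_id="http://adlnet.gov/expapi/verbs/experienced",
    verb_name="experienced",
    object_id="http://example.org/sensors/heart-rate",  # hypothetical IRI
    value=72,
    when=datetime(2017, 3, 15, 9, 0, tzinfo=timezone.utc),
)
print(json.dumps(statement, indent=2))
```

A statement built this way would then be pushed to the Learning Record Store over its HTTP interface; that call is left out because the deck does not describe it.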
Pagina 20
Data Processing Application
Script in Python running on a VM which processes data in real time
Pagina 21
Data Analysis
Pagina 22
Transformed dataset
• Time Series: tabular representation
• 5-minute intervals
• Enough samples now!
• Easier view for Machine Learning
• Signal resampling needed
9410
observations
X
29 attributes
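One way the resampling into 5-minute intervals could look is sketched below. It assumes each sensor delivers (timestamp, value) pairs and that averaging within an interval is acceptable; the function names and sample readings are illustrative, not taken from the project.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

INTERVAL_SECONDS = 5 * 60


def bucket_start(ts):
    """Round a datetime down to the start of its 5-minute interval."""
    seconds = ts.minute * 60 + ts.second
    floored = seconds - seconds % INTERVAL_SECONDS
    return ts.replace(minute=floored // 60, second=0, microsecond=0)


def resample(readings):
    """Average (timestamp, value) readings per 5-minute interval."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[bucket_start(ts)].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Made-up heart-rate readings, only to show the shape of the data.
readings = [
    (datetime(2017, 3, 15, 9, 1, 0), 70),
    (datetime(2017, 3, 15, 9, 3, 30), 74),
    (datetime(2017, 3, 15, 9, 7, 10), 80),
]
rows = resample(readings)
for start, value in rows.items():
    print(start.time(), value)
```

Each key of `rows` corresponds to one row of the tabular time series; intervals without readings simply do not appear and would need separate handling.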
Pagina 23
Issue 1) Feature extraction from Time Series
Heart Rate Variability and
Heart Rate Entropy…
didn’t work
SOLUTION
• Mean of the signal
• Maximum
• Minimum
• Standard Deviation
• Average change
Heart-rate signal for 15 mins
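The five summary features listed under SOLUTION can be computed per interval roughly as in this sketch. Two points are assumptions: "average change" is read here as the mean absolute difference between consecutive samples, and the standard deviation is the population form; the slides do not specify either.

```python
from statistics import mean, pstdev


def extract_features(signal):
    """Summarise one interval of heart-rate samples with five statistics."""
    # Assumption: "average change" = mean absolute successive difference.
    changes = [abs(b - a) for a, b in zip(signal, signal[1:])]
    return {
        "mean": mean(signal),
        "max": max(signal),
        "min": min(signal),
        "std": pstdev(signal),  # population standard deviation (assumed)
        "avg_change": mean(changes) if changes else 0.0,
    }


features = extract_features([60, 64, 62, 70])  # made-up samples
print(features)
```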
Pagina 24
Issue 2) Activity data very sparse
Rule based grouping of applications
Learners’ activity can be compared!
Applications used are
too sparse
SOLUTION
Let’s create
application categories
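A rule-based grouping of this kind might be sketched as below. The category names and keywords are invented for illustration; the actual rules used in the study are not given in the slides.

```python
from collections import Counter

# Hypothetical rules: (category, keywords matched against the app name).
RULES = [
    ("Writing", ("word", "latex", "overleaf")),
    ("Programming", ("rstudio", "python", "terminal")),
    ("Communication", ("mail", "skype", "slack")),
]


def categorise(app_name):
    """Return the first category whose keyword occurs in the app name."""
    name = app_name.lower()
    for category, keywords in RULES:
        if any(keyword in name for keyword in keywords):
            return category
    return "Other"


apps = ["Microsoft Word", "RStudio", "Apple Mail", "Solitaire", "Overleaf"]
counts = Counter(categorise(app) for app in apps)
print(counts)
```

Replacing hundreds of rarely used application names with a handful of categories gives every learner the same, denser set of activity attributes.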
Pagina 25
Issue 3) Number of labels available
Trade-off:
number of labels
vs
seamlessness of the data
collection
NO SOLUTION
Pagina 26
Issue 5) Random vs continuous data
Independence constraint
Knowing the value of $e_t$ for
one observation does not
help us to guess the value of
$e_{t+1}$
$y_t = \alpha + \beta X_t + e_t$
$\mathrm{cov}(e_t, e_{t+1}) = 0$
FIXED Effect
RANDOM Effect
SOLUTION follows...
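Whether the independence constraint holds can be checked on the residuals of a fitted model. The sketch below estimates the lag-1 autocovariance, i.e. an empirical counterpart of cov(e_t, e_{t+1}); the residual values are made up and the function name is illustrative.

```python
from statistics import mean


def lag1_autocovariance(residuals):
    """Estimate cov(e_t, e_{t+1}) from a sequence of residuals."""
    centre = mean(residuals)
    pairs = zip(residuals, residuals[1:])
    total = sum((a - centre) * (b - centre) for a, b in pairs)
    return total / (len(residuals) - 1)


alternating = [1, -1, 1, -1]  # neighbours move in opposite directions
drifting = [1, 2, 3, 4]       # neighbours move together
print(lag1_autocovariance(alternating))
print(lag1_autocovariance(drifting))
```

A value far from zero suggests that consecutive intervals are not independent, which is the situation a mixed effect model is meant to handle.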
Pagina 27
Mixed Effect Linear Model
        x0    x1    x2    ...   xn-1  xn    g     y
t0      x     x     x     ...   x     x     1     y
t1      x     x     x     ...   x     x     1     y
t2      x     x     x     ...   x     x     2     y
t...    ...   ...   ...   ...   ...   ...   2     y
tp-1    x     x     x     x     x     x     3     y
tp      ?     ?     ---   ---   ---   ---   x     ?
Column groups: Random Effects, Fixed Effects, Group
R-squared used as the
goodness-of-fit measure
LIMITATIONS
● Convergence time
● Mono-output
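For orientation, a linear mixed effect model with a random intercept per group can be written as below. This is the textbook form, given as an assumption: the slides do not state which random-effect structure was used. Here $j$ indexes the group $g$, $i$ the observation within the group, $\alpha$ and $\beta$ are the fixed effects and $u_j$ is the group-specific random effect.

```latex
\begin{aligned}
y_{ij} &= \alpha + \beta^{\top} x_{ij} + u_j + e_{ij}, \\
u_j &\sim \mathcal{N}(0, \sigma_u^2), \qquad
e_{ij} \sim \mathcal{N}(0, \sigma_e^2)
\end{aligned}
```

Observations that share a group also share $u_j$, so they are allowed to be correlated, unlike in the plain regression with independent errors.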
Pagina 28
Issue 6) Inter-subject variability
i.e. Participants have
rated very differently
SOLUTION
Predictions are
normalised wrt each
learner
$x_{new} = (x_{max} - x_{min}) \cdot x_i / 100 + x_{min}$
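The normalisation formula maps a value on a 0 to 100 scale onto the range a given learner actually used. A direct transcription is sketched below; the function name and the example range are illustrative.

```python
def rescale(x_i, x_min, x_max):
    """Map a 0-100 value onto one learner's observed rating range."""
    return (x_max - x_min) * x_i / 100 + x_min


# Hypothetical learner whose ratings ranged from 20 to 80.
print(rescale(0, 20, 80))
print(rescale(50, 20, 80))
print(rescale(100, 20, 80))
```

A value of 0 lands on the learner's minimum and 100 on the maximum, so the same prediction yields different absolute ratings for learners who rate in different ranges.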
Pagina 29
Conclusions
Pagina 30
RQ1) Architecture
The architecture developed was capable of:
1. importing a large amount of sensor data in xAPI
format;
2. combining sensor data with self-reports;
3. programmatically transforming xAPI data;
4. training predictive models and reusing them;
5. saving the predictions to compare with actual values.
Pagina 31
RQ2) Represent multimodal data
• Multiple Instance Representation
• Each learning sample is a 5-minute interval
• It’s suitable for machine learning
Pagina 32
RQ3) Machine learning model
• Linear Mixed Effect Models allow
1. taking into account data specific to each
learner
2. distinguishing between fixed and random
effects
3. taking categorical data into account
Pagina 33
Limitations
• Low accuracy of predictions
R-Square tests Stress: 0.32, Challenge: 0.22, Flow score:
0.16, Abilities: 0.08, Productivity: 0.05.
• Real-time issues
Fitbit synchronisation, Virtual Machine performance
• 3rd party API constraints
• No great solution for grouping activity data (manual
grouping)
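The R-squared figures above compare predicted with actual ratings. The usual coefficient of determination is sketched below; whether the authors used exactly this definition or a variant specific to mixed models is not stated, and the sample values are made up.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


actual = [1, 2, 3, 4]
print(r_squared(actual, [1, 2, 3, 4]))          # perfect predictions
print(r_squared(actual, [2.5, 2.5, 2.5, 2.5]))  # always predicting the mean
```

On this definition, a score of 0.05 for Productivity means the model does barely better than always predicting the mean rating.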
Pagina 34
Opportunities
• Data driven
• Real Time feedback
• Visualisations can show feedback
• Seamless data collection
• Multimodal dataset for research
• Reusable architecture
Pagina 35
*Börner, Tabuenca, Storm,
Happe, and Specht. 2015
Example visualisation:
The Feedback Cube*
Q&A
Thanks for listening!
Daniele Di Mitri
ddm@ou.nl
@dimstudi0
Pagina 36
Check my poster!
