Data-Driven Education:
Using Big Educational Data
to Improve Teaching and Learning
Peter Brusilovsky
School of Information Sciences,
University of Pittsburgh
MOOC
Massive Open Online Course
Completion Rate
MOOC Completion Rate
The classic user modeling and adaptation loop in adaptive systems
http://www.katyjordan.com/MOOCproject.html
What Else Do These Students Need?
•  Top colleges
–  Stanford, CalTech, Princeton, GATech, Penn, Duke..
•  Great faculty – top guns in their fields
•  Great content
•  Top online platforms – Coursera, edX, Udacity
•  FREE!
The Needs and the Means
•  The challenges
–  Support self-regulated learning
–  Engage students in learning
–  Personalize the learning process
–  Find out how we can teach better
–  Learn more about learning
•  The opportunities
–  Large volume of data
–  Who did what, when, and what was the result
Data-Driven Education
•  Using data left by past learners to benefit future
learners
•  How can this data be used? Who is making
sense of the data?
•  Human-Centered Approach
–  Visual Learning Analytics
•  Machine-Centered Approach
–  Educational Data Mining
Visual Learning Analytics
•  The idea: Present data in visual form to students,
instructors, and administrators, helping them make
better decisions about the learning process
•  Support self-regulated learning
•  Provide navigation support for students
•  Show performance to instructors to make
decisions
•  Show data to administrators to redesign process
Educational Data Mining
•  The idea: Feed data to various data mining and
machine learning approaches to improve
existing automated learning and to discover
important insights for future improvements
•  Better domain modeling
•  Better student modeling
•  Better adaptation approaches
•  Finding what works best for different groups and
students
Research at PAWS Lab, U of Pittsburgh
•  http://adapt2.sis.pitt.edu/wiki/
•  Social Navigation in E-learning
•  Open Social Student modeling with Social
Comparison
•  Mining and using problem solving genome
•  Domain Modeling and Latent topic discovery
•  Data-driven student modeling
•  Open Corpus personalization
Navigation Support
•  Students need personalized guidance (navigation
support) to access the right content at the right time
–  Too late – easy but mostly useless
–  Too early – not yet ready to understand/apply
–  Students start with different knowledge and learn at
different speeds
•  Knowledge-based navigation support based on
student modeling works well to increase success and
motivation
•  Knowledge-based approaches require considerable
knowledge engineering – domain modeling, content
analysis, prerequisite elicitation, etc.
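As a minimal sketch (illustrative names and a simplified three-state scheme, not the actual system's implementation), knowledge-based navigation support reduces to comparing a page's prerequisite and outcome concepts against the overlay student model:

```python
# Hedged sketch of knowledge-based adaptive navigation support.
# The concept sets below must come from knowledge engineering
# (domain modeling, prerequisite elicitation), which is exactly
# the cost this approach carries.
def annotate_link(known_concepts, prerequisites, outcomes):
    """Annotate a content link given the student's known concepts."""
    if outcomes <= known_concepts:
        return "learned"     # too late: easy but mostly useless
    if prerequisites <= known_concepts:
        return "ready"       # the right content at the right time
    return "not-ready"       # too early: not yet ready to understand

# A student who knows loops but not yet arrays:
known = {"variables", "loops"}
print(annotate_link(known, prerequisites={"loops"}, outcomes={"arrays"}))
# → ready
```

The annotation can then drive the visual cues (icons, colors) shown next to each link.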
Knowledge-based navigation support
 
1. Concept role
2. Current concept state
3. Current section state
4. Linked sections state
Metadata-based mechanism
Knowledge organization for guidance
[Diagram: Examples (1…M) and Problems (1…K) linked to the Concepts (1…N) they cover]
Social Navigation
•  Wisdom from user data vs. wisdom from experts
•  Social navigation uses behavior of past users to
guide new users
•  Can we use “wisdom” extracted from the work of
a community of learners to replace knowledge-
based guidance?
•  Knowledge engineering vs. data analysis
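The data-analysis alternative can be sketched in a few lines, assuming a simple (user, page) visit log (a hypothetical format, not the actual system's): the visual cue for a page grows with the share of the learner community that has visited it.

```python
# Minimal sketch of traffic-based social navigation cues;
# the log format and cue scale are illustrative assumptions.
def social_cues(visit_log, n_levels=4):
    """Map each page to a cue level (0..n_levels-1) based on how
    many distinct past learners visited it."""
    visitors = {}
    users = set()
    for user, page in visit_log:
        visitors.setdefault(page, set()).add(user)
        users.add(user)
    # Share of the group that visited the page, quantized to levels
    return {page: round((n_levels - 1) * len(v) / len(users))
            for page, v in visitors.items()}

log = [("ann", "p1"), ("bob", "p1"), ("ann", "p2")]
cues = social_cues(log)   # p1 seen by the whole group, p2 by half
```

No domain model is needed: the guidance emerges entirely from past learners' behavior.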
Knowledge Sea II
Brusilovsky, P., Chavan, G., and Farzan, R. (2004) Social adaptive navigation support for open corpus electronic
textbooks. Third International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems
Knowledge Sea II (+ AnnotatEd)
Farzan, R. and Brusilovsky, P. (2008) AnnotatEd: A social navigation and annotation service for web-based
educational resources. New Review in Hypermedia and Multimedia 14 (1), 3-32.
Knowledge Sea
•  Focused on group-level guidance
– Students do follow visual cues
– Social visual cues help them discover important
pages without knowledge engineering
•  Produced a sizeable increase in motivation
– Students access the system more and do more
readings
– With annotation-based visual cues they make
more annotations
Open Social Student Modeling
•  Key ideas
–  Make traditional student models open to the users
–  Allow students to compare themselves with class and
peers
–  Social navigation based on performance data
•  Main challenge
–  How to design an interface that makes comparison
easy and provides social guidance and
motivation
–  We went through several attempts
QuizMap
Parallel Introspective Views
Class vs. Peers
•  Peer progress was important: students
frequently accessed content using peer models
•  The more the students compared themselves to their
peers, the higher post-quiz scores they received
(r=0.34, p=0.004)
•  Parallel IV didn't allow students to recognize good
peers before opening the model
•  Progressor added a clear view of peer progress
Progressor
Hsiao, I. H., Bakalov, F., Brusilovsky, P., and König-Ries, B. (2013) Progressor: social navigation support through
open social student modeling. New Review of Hypermedia and Multimedia 19 (2), 112-131.
The Value of Peers
Attempts: Progressor 205.73, QuizJET+IV 113.05,
QuizJET+Portal 80.81, JavaGuide 125.5
Success rate: Progressor 68.39%, QuizJET+IV 71.35%,
QuizJET+Portal 42.63%, JavaGuide 58.31%
The Secret
MasteryGrids
•  Adaptive Navigation Support
•  Topic-based Adaptation
•  Open Social Student Modeling
•  Social Educational Progress Visualization
•  Multiple Content Types
•  Open Source
•  Concept-Based Recommendation
•  Multiple Groups
Open Social Student Modeling - I
Interactive Demo YouTube Demo
Open Social Student Modeling - II
Open Social Student Modeling - III
The Study
•  A classroom study in a graduate Database Course
•  Two sections of the same class. Same teacher, same
lectures, etc.
•  The students were able to access non-mandatory
database practice content (exercises, examples) through
Mastery Grids
•  47 students worked with the OSM interface and 42
students worked with the OSSM interface
Brusilovsky, P., Somyurek, S., Guerra, J., Hosseini, R., Zadorozhny, V., and Durlach, P. (2016) The Value of
Social: Comparing Open Student Modeling and Open Social Student Modeling. IEEE Transactions on
Emerging Topics in Computing 4 (3), 450-461.
Impact on Learning
•  Student knowledge significantly increased in both
groups
•  The number of attempted problems significantly
predicts the final grade (SE=0.04, p=.017)
•  The coefficient for the number of problem attempts
was 0.09, meaning that attempting 100 problems
increases the final grade by about 9 points
•  The mean learning gain was higher for both weak and
strong students in the OSSM group
•  The difference was significant for weak students
(p=.033)
Does OSSM increase system usage?
Variable                                 OSM mean   OSSM mean   U
Sessions                                 3.93       6.26        685.500*
Topics coverage                          19.0%      56.4%       567.500**
Total attempts to problems               25.86      97.62       548.500**
Correct attempts to problems             14.62      60.28       548.000**
Distinct problems attempted              7.71       23.51       549.000**
Distinct problems attempted correctly    7.52       23.11       545.000**
Distinct examples viewed                 18.19      38.55       611.500**
Views of example lines                   91.60      209.40      609.000**
MG loads                                 5.05       9.83        618.500**
MG clicks on topic cells                 24.17      61.36       638.500**
MG clicks on content cells               46.17      119.19      577.500**
MG difficulty feedback answers           6.83       14.68       599.500**
Total time in the system                 5145.34    9276.58     667.000**
Time in problems                         911.86     2727.38     582.000**
Time in MG (navigation)                  2260.10    4085.31     625.000**
Does OSSM increase Efficiency?
•  Time per line, time per example, and time per activity
scores of students in the OSSM group are significantly
lower than in the OSM group
•  Students who used OSSM interface worked more
efficiently.
Variable            OSM mean   OSSM mean   U
Time per line       22.93      11.61       570.000**
Time per example    97.74      58.54       508.000*
Time per problem    37.96      29.72       242.000
Time per activity   47.92      34.33       277.000*
Does OSSM Increase Student Retention?
[Chart: % of students in the class reaching each level of
problem attempts (0+ through 50+), OSSM vs. OSM]
•  The OSSM group had much higher
student usage
•  The system looked much more interesting
to students at the start (compare the
number of students after the first login)
•  At the level of 30+ attempts (serious
engagement with the system), the
OSSM group still retained more
than 50% of its original users,
while OSM engagement was below
20%
Why Is Engagement Important?
•  Many systems demonstrated their educational
effectiveness in lab-like settings: once the students are
pushed to use a system, it benefits their learning
•  However, once released to real classes, these systems are
under-used, since most of them offer additional
non-mandatory learning opportunities
•  “Students are only interested in points and grades”
•  Convert all tools into credit-bearing activities?
•  Or use alternative approaches to increase motivation?
•  Critical to support students in self-organized, non-credit
learning contexts like MOOCs
Current State of OSSM
•  MasteryGrids is an open source system
•  Full support offered for three domains
–  Java, 3 types of smart content
–  Python, 4 types of smart content
–  SQL, 3 types of smart content
•  Several large-scale studies in progress
•  Exploring new concept-based OSSM Interface
•  Collaborators are welcome!
–  Can use your own content and course structure!
•  The cells in the first row show your
progress in the topics of the course
•  This bar chart shows your progress in
the concepts of the course
•  Each topic has several concepts
associated with it; mouse over a topic
to highlight its concepts
•  The upside-down bar chart shows the
average progress of the rest of the
class on the concepts
•  The middle row shows the difference
between your progress and the
progress of the group
•  The third row shows the progress of
the group in blue
Concept-Based OSSM
Problem-Solving Genome
•  Key ideas
–  Individual differences are important for understanding
students and adapting learning
–  The "old generation" of individual differences (e.g.,
learning styles) is not valuable in an e-learning context
–  Could we use "data-driven" science to extract
individual differences from behavior data?
•  Main challenge
–  How to process the data to find and use individual
differences
•  Our approach uses sequence mining and profiling
based on the use of micro-sequences
Context: Parameterized Java Exercises
Some numbers change each time
the exercise is loaded
Hard to game
Exercise from the QuizJET system
Dataset
Exercises
•  101 parameterized exercises
•  19 topics
•  Exercises labeled as easy (41), medium (41) or hard (19)
complexity
Students
•  3 terms, a total of 101 students
•  21,215 attempts, 14,726 correct and 6,489 incorrect
•  We formed sequences from a student's repeated
attempts at the same exercise in the same session
within the system
•  We collected the time of each attempt
•  Pretest and posttest scores (not for all the students)
Timing
•  Time on the first attempt is always longer (the
student has to understand the exercise)
First attempt
Next attempts
Labeling Steps (attempts)
Correctness: Success (S) or Failure (F)
Time: Short (lowercase) or Long (uppercase)
–  Using median of the distribution of time per exercise
–  Using different distributions for first attempt
label   correctness   time
s       success       short
S       success       long
f       failure       short
F       failure       long
Labeled Sequences
•  The first and last attempts are labeled differently;
here we mark them with an underscore ‘_’
•  Example sequences:
_fS_
_fFs_
_ss_
This labeled representation is for making sequences and patterns more
readable. The actual labeling used for running the pattern mining
algorithm uses only uppercase letters and different sets of letters for
first and last attempts within sequences.
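The labeling scheme above can be sketched as follows (a hedged reconstruction with illustrative names; the per-exercise median thresholds are assumed to be precomputed, with a separate distribution for first attempts):

```python
# Hedged sketch of the attempt-labeling scheme (readable variant
# with underscores, not the alphabet used by the mining algorithm).
def label_sequence(attempts, first_median, later_median):
    """attempts: list of (correct, seconds) for one student's work
    on one exercise within one session."""
    labels = []
    for i, (correct, seconds) in enumerate(attempts):
        # First attempts are compared against their own distribution
        threshold = first_median if i == 0 else later_median
        letter = "s" if correct else "f"       # success / failure
        if seconds > threshold:
            letter = letter.upper()            # long time -> uppercase
        labels.append(letter)
    return "_" + "".join(labels) + "_"         # mark sequence ends

print(label_sequence([(False, 40), (True, 90), (True, 10)],
                     first_median=60, later_median=30))
# → _fSs_
```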
Pattern mining
•  Using the PexSPAM algorithm with gap = 0
•  Each possible pattern of length 2 or higher is
explored
•  Support of a pattern: proportion of sequences
containing the pattern (at least once)
–  Does not count multiple occurrences of the pattern within a
sequence
•  Select all patterns with minimum support of 1%
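For intuition, the gap = 0 case reduces to counting contiguous substrings. This brute-force sketch (standing in for the actual PexSPAM implementation, and using the readable labeling) applies the support definition above:

```python
from collections import Counter

def frequent_patterns(sequences, min_support=0.01):
    """Return patterns of length >= 2 (contiguous, i.e. gap = 0)
    whose support -- the share of sequences containing them at
    least once -- reaches min_support."""
    counts = Counter()
    for seq in sequences:
        # A set, so repeats within one sequence are counted once
        subs = {seq[i:i + n]
                for n in range(2, len(seq) + 1)
                for i in range(len(seq) - n + 1)}
        counts.update(subs)
    total = len(sequences)
    return {p: c / total for p, c in counts.items()
            if c / total >= min_support}

pats = frequent_patterns(["fSs", "fS", "ss"], min_support=0.5)
# only "fS" appears in at least half of the sequences
```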
Pattern mining
•  There were 102 frequent micro patterns
Top 20 frequent micro patterns
The Problem Solving Genome
•  Constructed a frequency vector over the 102
patterns (vector of size 102) for each student
–  Each common pattern is a gene
•  The vector represents how frequently a student uses
each of the micro patterns
•  The vector is an individual genome built of genes
Problem Solving Genome
[Figure: five example sequences of one student (_fSs_, _fSss_,
_fSS_, _FFss_, _FSss_) and the frequencies of each of the 102
common patterns, e.g. 3/5, 0/5, 2/5, 1/5, 0/5, … for the
patterns ss_, ss, Ss, SS_, _FS_, …]
Guerra, J., Sahebi, S., Lin, Y.-R., and Brusilovsky, P. (2014) The
Problem Solving Genome: Analyzing Sequential Patterns of Student Work
with Parameterized Exercises. In: J. Stamper, Z. Pardos, M. Mavrikis and
B. M. McLaren (eds.) Proceedings of the 7th International Conference on
Educational Data Mining (EDM 2014) pp. 153-160.
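Under the same support definition, building a student's genome is one line per gene (a sketch with illustrative names; the real gene list is the 102 mined patterns):

```python
def build_genome(sequences, genes):
    """Frequency vector over the common patterns ('genes'): the
    share of this student's sequences containing each gene at
    least once."""
    n = len(sequences)
    return [sum(gene in seq for seq in sequences) / n for gene in genes]

student_seqs = ["_fSs_", "_fSss_", "_fSS_", "_FFss_", "_FSss_"]
print(build_genome(student_seqs, ["ss_", "SS"]))
# → [0.6, 0.2]
```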
Exploring the Genome
•  Stability
–  Are the patterns stable for a
student?
•  Effect of complexity
–  Are the patterns different across
complexity levels?
•  Patterns of success
–  Are successful students
following different patterns?
Genome Stability
•  Is the student more similar to him/herself
than to others?
–  Select students with at least 60 sequences (32 students)
–  For each student:
•  Split the student's sequences into two random sets (set 1, set 2)
•  Form the genome of each set
–  Compute the Jensen-Shannon (JS) divergence between:
•  The genomes of the two sets of the same student (self-distance)
•  The student's set 1 genome and the set 1 genomes of other
students, averaged (other-distance)
•  Are students changing their patterns over time?
–  Repeat the procedure, splitting each student's sequences
into early (first half) and late (second half) sets
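The divergence step can be sketched as follows, assuming genomes are frequency vectors over the same 102 patterns (normalized here to probability distributions before comparison):

```python
from math import log2

def _kl(p, q):
    # Kullback-Leibler divergence in bits; 0 * log(0/q) taken as 0
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence between two genomes
    (0 = identical, 1 = maximally different)."""
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)

print(js_divergence([3, 1, 0], [3, 1, 0]))   # identical genomes
# → 0.0
```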
Results (1)
                               Self-distance      Other-distance     Sig.     Cohen's d
                               M        SE        M        SE
Randomly split genome (a)      .2370    .0169     .4815    .0141     <.001    2.693
Early/Late genome (b)          .3211    .0214     .4997    .0164     <.001    1.205
(Paired-sample t-test)
•  Even when splitting into early and late sequences, a
student's self-distance is significantly smaller than the
distance to others
The genome is stable within individuals
Clustering by Genome
•  Cluster students by their genomes and analyze
different patterns
–  Between clusters
–  Between low- and high-performing students within each cluster
•  Spectral Clustering with k = 2
–  The eigen-gap is largest at k = 2
•  Cluster 1: confirmers (repeat short successes)
•  Cluster 2: non-confirmers (quitters)
Ordering patterns by difference magnitude
(cluster 2 – cluster 1)
Latent groups vs performance
Groups and guidance
•  Successful patterns in each cluster
are closer to the other cluster
–  Successful confirmers tend to stop
after a long success
–  Successful non-confirmers (cluster 2)
tend to continue after a hard success
•  Extremely different patterns between
clusters are “harmful”
•  How could this be used for
personalization?
–  Identify the student type
–  Offer a different interface or discourage
poor behavior with recommendations
Learning from examples vs. problems in Java
S / F – problem attempts
e – example
a – animated example
Preferred type of online learning content
E – Exercise
T – Text tutorial
X – Example
V – Video tutorial
Left for Next Time
•  Domain Modeling and Latent topic discovery
–  Sahebi, S., Lin, Y.-R., and Brusilovsky, P. (2016) Tensor Factorization
for Student Modeling and Performance Prediction in Unstructured Domain.
Proceedings of the 9th International Conference on Educational Data
Mining (EDM 2016), pp. 502-505.
•  Data-driven student modeling
–  González-Brenes, J. P., Huang, Y., and Brusilovsky, P. (2014)
General Features in Knowledge Tracing to Model Multiple Subskills,
Temporal Item Response Theory, and Expert Knowledge. Proceedings of the
7th International Conference on Educational Data Mining (EDM 2014),
London, UK, July 4-7, 2014, pp. 84-91.
•  Open Corpus modeling and personalization
–  Huang, Y., Yudelson, M., Han, S., He, D., and Brusilovsky, P.
(2016) A Framework for Dynamic Knowledge Modeling in Textbook-Based
Learning. In: Proceedings of 24th Conference on User Modeling,
Adaptation and Personalization (UMAP 2016), pp. 141-150.
Acknowledgements
•  Joint work with
–  Rosta Farzan, Sharon Hsiao, Tomek Loboda
–  Sherry Sahebi, Julio Guerra, Roya Hosseini
–  Yun Huang, Rafael Dias Araújo
•  U. of Pittsburgh “Innovation in Education” awards
•  NSF Grants
–  CAREER 0447083
–  EHR 0310576
–  IIS 0426021
•  ADL.net support for OSSM work
Visit us in Pittsburgh to Learn More!
… or Read our Papers
•  http://www.pitt.edu/~peterb/papers.html
•  https://www.researchgate.net/profile/
Peter_Brusilovsky
