Data-Driven Education:
Using Big Educational Data
to Improve Teaching and Learning
Peter Brusilovsky
School of Information Sciences,
University of Pittsburgh
MOOC
Massive Open Online Course
Completion Rate
MOOC Completion Rate
The classic user modeling and adaptation loop in adaptive systems
http://www.katyjordan.com/MOOCproject.html
What Else Do These Students Need?
•  Top colleges
–  Stanford, CalTech, Princeton, GATech, Penn, Duke..
•  Great faculty – top guns in their fields
•  Great content
•  Top online platforms – Coursera, edX, Udacity
•  FREE!
The Needs and the Means
•  The challenges
–  Support self-regulated learning
–  Engage students in learning
–  Personalize the learning process
–  Find out how we can teach better
–  Learn more about learning
•  The opportunities
–  Large volume of data
–  Who did what, when, and what was the result
Data-Driven Education
•  Using data left by past learners to benefit future
learners
•  How can this data be used? Who is making
sense of the data?
•  Human-Centered Approach
–  Visual Learning Analytics
•  Machine-Centered Approach
–  Educational Data Mining
Visual Learning Analytics
•  The idea: Present data in visual form to students,
instructors, and administrators, helping them make
better decisions about the learning process
•  Support self-regulated learning
•  Provide navigation support for students
•  Show performance to instructors to make
decisions
•  Show data to administrators to redesign process
Educational Data Mining
•  The idea: Feed data to various data mining and
machine learning approaches to improve
existing automated learning and to discover
important insights for future improvements
•  Better domain modeling
•  Better student modeling
•  Better adaptation approaches
•  Finding what works best for different groups and
students
Research at PAWS Lab, U of Pittsburgh
•  http://adapt2.sis.pitt.edu/wiki/
•  Social Navigation in E-learning
•  Open Social Student modeling with Social
Comparison
•  Mining and using problem solving genome
•  Domain Modeling and Latent topic discovery
•  Data-driven student modeling
•  Open Corpus personalization
Navigation Support
•  Students need personalized guidance (navigation
support) to access the right content at the right time
–  Too late – easy but mostly useless
–  Too early – not yet ready to understand/apply
–  Students start with different knowledge and learn at
different speeds
•  Knowledge-based navigation support based on
student modeling works well to increase success and
motivation
•  Knowledge-based approaches require considerable
knowledge engineering – domain modeling, content
analysis, prerequisite elicitation, etc.
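As a minimal sketch (illustrative names and a simplified three-state scheme, not the actual system's implementation), knowledge-based navigation support reduces to comparing a page's prerequisite and outcome concepts against the overlay student model:

```python
# Hedged sketch of knowledge-based adaptive navigation support.
# The concept sets below must come from knowledge engineering
# (domain modeling, prerequisite elicitation), which is exactly
# the cost this approach carries.
def annotate_link(known_concepts, prerequisites, outcomes):
    """Annotate a content link given the student's known concepts."""
    if outcomes <= known_concepts:
        return "learned"     # too late: easy but mostly useless
    if prerequisites <= known_concepts:
        return "ready"       # the right content at the right time
    return "not-ready"       # too early: not yet ready to understand

# A student who knows loops but not yet arrays:
known = {"variables", "loops"}
print(annotate_link(known, prerequisites={"loops"}, outcomes={"arrays"}))
# → ready
```

The annotation can then drive the visual cues (icons, colors) shown next to each link.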
Knowledge-based navigation support
 
1. Concept role
2. Current concept state
3. Current section state
4. Linked sections state
Metadata-based mechanism
Knowledge organization for guidance
[Diagram: Examples (1…M) and Problems (1…K) linked to the Concepts (1…N) they cover]
Social Navigation
•  Wisdom from user data vs. wisdom from experts
•  Social navigation uses behavior of past users to
guide new users
•  Can we use “wisdom” extracted from the work of
a community of learners to replace knowledge-
based guidance?
•  Knowledge engineering vs. data analysis
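The data-analysis alternative can be sketched in a few lines, assuming a simple (user, page) visit log (a hypothetical format, not the actual system's): the visual cue for a page grows with the share of the learner community that has visited it.

```python
# Minimal sketch of traffic-based social navigation cues;
# the log format and cue scale are illustrative assumptions.
def social_cues(visit_log, n_levels=4):
    """Map each page to a cue level (0..n_levels-1) based on how
    many distinct past learners visited it."""
    visitors = {}
    users = set()
    for user, page in visit_log:
        visitors.setdefault(page, set()).add(user)
        users.add(user)
    # Share of the group that visited the page, quantized to levels
    return {page: round((n_levels - 1) * len(v) / len(users))
            for page, v in visitors.items()}

log = [("ann", "p1"), ("bob", "p1"), ("ann", "p2")]
cues = social_cues(log)   # p1 seen by the whole group, p2 by half
```

No domain model is needed: the guidance emerges entirely from past learners' behavior.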
Knowledge Sea II
Brusilovsky, P., Chavan, G., and Farzan, R. (2004) Social adaptive navigation support for open corpus electronic
textbooks. Third International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems
Knowledge Sea II (+ AnnotatEd)
Farzan, R. and Brusilovsky, P. (2008) AnnotatEd: A social navigation and annotation service for web-based
educational resources. New Review in Hypermedia and Multimedia 14 (1), 3-32.
Knowledge Sea
•  Focused on group-level guidance
– Students do follow visual cues
– Social visual cues help them discover important
pages without knowledge engineering
•  Produced a sizeable increase in motivation
– Students access the system more and do more
readings
– With annotation-based visual cues they make
more annotations
Open Social Student Modeling
•  Key ideas
–  Make traditional student models open to the users
–  Allow students to compare themselves with class and
peers
–  Social navigation based on performance data
•  Main challenge
–  How to design an interface that makes comparison
easy and provides social guidance and
motivation
–  We went through several attempts
QuizMap
Parallel Introspective Views
Class vs. Peers
•  Peer progress was important: students
frequently accessed content using peer models
•  The more the students compared themselves to their
peers, the higher post-quiz scores they received
(r=0.34, p=0.004)
•  Parallel IV didn't allow students to recognize good
peers before opening the model
•  Progressor added a clear view of peer progress
Progressor
Hsiao, I. H., Bakalov, F., Brusilovsky, P., and König-Ries, B. (2013) Progressor: social navigation support through
open social student modeling. New Review of Hypermedia and Multimedia 19 (2), 112-131.
The Value of Peers
Attempts: Progressor 205.73, QuizJET+IV 113.05,
QuizJET+Portal 80.81, JavaGuide 125.5
Success rate: Progressor 68.39%, QuizJET+IV 71.35%,
QuizJET+Portal 42.63%, JavaGuide 58.31%
The Secret
MasteryGrids
•  Adaptive Navigation Support
•  Topic-based Adaptation
•  Open Social Student Modeling
•  Social Educational Progress Visualization
•  Multiple Content Types
•  Open Source
•  Concept-Based Recommendation
•  Multiple Groups
Open Social Student Modeling - I
Interactive Demo YouTube Demo
Open Social Student Modeling - II
Open Social Student Modeling - III
The Study
•  A classroom study in a graduate Database Course
•  Two sections of the same class. Same teacher, same
lectures, etc.
•  The students were able to access non-mandatory
database practice content (exercises, examples) through
Mastery Grids
•  47 students worked with the OSM interface and 42
students worked with the OSSM interface
Brusilovsky, P., Somyurek, S., Guerra, J., Hosseini, R., Zadorozhny, V., and Durlach, P. (2016) The Value of
Social: Comparing Open Student Modeling and Open Social Student Modeling. IEEE Transactions on
Emerging Topics in Computing 4 (3), 450-461.
Impact on Learning
•  Student knowledge significantly increased in both
groups
•  The number of attempted problems significantly
predicts the final grade (SE=0.04, p=.017)
•  The coefficient for the number of problem attempts
was 0.09, meaning that attempting 100 problems
increases the final grade by about 9 points
•  The mean learning gain was higher for both weak and
strong students in the OSSM group
•  The difference was significant for weak students
(p=.033)
Does OSSM increase system usage?
Variable                                 OSM mean   OSSM mean   U
Sessions                                 3.93       6.26        685.500*
Topics coverage                          19.0%      56.4%       567.500**
Total attempts to problems               25.86      97.62       548.500**
Correct attempts to problems             14.62      60.28       548.000**
Distinct problems attempted              7.71       23.51       549.000**
Distinct problems attempted correctly    7.52       23.11       545.000**
Distinct examples viewed                 18.19      38.55       611.500**
Views of example lines                   91.60      209.40      609.000**
MG loads                                 5.05       9.83        618.500**
MG clicks on topic cells                 24.17      61.36       638.500**
MG clicks on content cells               46.17      119.19      577.500**
MG difficulty feedback answers           6.83       14.68       599.500**
Total time in the system                 5145.34    9276.58     667.000**
Time in problems                         911.86     2727.38     582.000**
Time in MG (navigation)                  2260.10    4085.31     625.000**
Does OSSM increase Efficiency?
•  Time per line, time per example, and time per activity
scores of students in the OSSM group are significantly
lower than in the OSM group
•  Students who used OSSM interface worked more
efficiently.
Variable            OSM mean   OSSM mean   U
Time per line       22.93      11.61       570.000**
Time per example    97.74      58.54       508.000*
Time per problem    37.96      29.72       242.000
Time per activity   47.92      34.33       277.000*
Does OSSM Increase Student Retention?
[Chart: % of students in the class reaching each level of
problem attempts (0+ through 50+), OSSM vs. OSM]
•  The OSSM group had much higher
student usage
•  The system looked much more interesting
to students at the start (compare the
number of students after the first login)
•  At the level of 30+ attempts (serious
engagement with the system), the
OSSM group still retained more
than 50% of its original users,
while OSM engagement was below
20%
Why Is Engagement Important?
•  Many systems demonstrated their educational
effectiveness in lab-like settings: once the students are
pushed to use a system, it benefits their learning
•  However, once released to real classes, these systems are
under-used, since most of them offer additional
non-mandatory learning opportunities
•  “Students are only interested in points and grades”
•  Convert all tools into credit-bearing activities?
•  Or use alternative approaches to increase motivation?
•  Critical to support students in self-organized, non-credit
learning contexts like MOOCs
Current State of OSSM
•  MasteryGrids is an open source system
•  Full support offered for three domains
–  Java, 3 types of smart content
–  Python, 4 types of smart content
–  SQL, 3 types of smart content
•  Several large-scale studies in progress
•  Exploring new concept-based OSSM Interface
•  Collaborators are welcome!
–  Can use your own content and course structure!
•  The cells in the first row show your
progress in the topics of the course
•  This bar chart shows your progress in
the concepts of the course
•  Each topic has several concepts
associated with it; mouse over a topic
to highlight its concepts
•  The upside-down bar chart shows the
average progress of the rest of the
class on the concepts
•  The middle row shows the difference
between your progress and the
progress of the group
•  The third row shows the progress of
the group in blue
Concept-Based OSSM
Problem-Solving Genome
•  Key ideas
–  Individual differences are important for understanding
students and adapting learning
–  The "old generation" of individual differences (e.g.,
learning styles) is not valuable in an e-learning context
–  Could we use "data-driven" science to extract
individual differences from behavior data?
•  Main challenge
–  How to process the data to find and use individual
differences
•  Our approach uses sequence mining and profiling
based on the use of micro-sequences
Context: Parameterized Java Exercises
Some numbers change each time
the exercise is loaded
Hard to game
Exercise from the QuizJET system
Dataset
Exercises
•  101 parameterized exercises
•  19 topics
•  Exercises labeled as easy (41), medium (41) or hard (19)
complexity
Students
•  3 terms, a total of 101 students
•  21,215 attempts, 14,726 correct and 6,489 incorrect
•  We formed sequences from a student's repeated
attempts at the same exercise in the same session
within the system
•  We collected the time of each attempt
•  Pretest and posttest scores (not for all the students)
Timing
•  Time on the first attempt is always longer (the
student has to understand the exercise)
First attempt
Next attempts
Labeling Steps (attempts)
Correctness: Success (S) or Failure (F)
Time: Short (lowercase) or Long (uppercase)
–  Using median of the distribution of time per exercise
–  Using different distributions for first attempt
label   correctness   time
s       success       short
S       success       long
f       failure       short
F       failure       long
Labeled Sequences
•  The first and last attempts are labeled differently;
here we mark them with an underscore ‘_’
•  Example sequences:
_fS_
_fFs_
_ss_
This labeled representation is for making sequences and patterns more
readable. The actual labeling used for running the pattern mining
algorithm uses only uppercase letters and different sets of letters for
first and last attempts within sequences.
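The labeling scheme above can be sketched as follows (a hedged reconstruction with illustrative names; the per-exercise median thresholds are assumed to be precomputed, with a separate distribution for first attempts):

```python
# Hedged sketch of the attempt-labeling scheme (readable variant
# with underscores, not the alphabet used by the mining algorithm).
def label_sequence(attempts, first_median, later_median):
    """attempts: list of (correct, seconds) for one student's work
    on one exercise within one session."""
    labels = []
    for i, (correct, seconds) in enumerate(attempts):
        # First attempts are compared against their own distribution
        threshold = first_median if i == 0 else later_median
        letter = "s" if correct else "f"       # success / failure
        if seconds > threshold:
            letter = letter.upper()            # long time -> uppercase
        labels.append(letter)
    return "_" + "".join(labels) + "_"         # mark sequence ends

print(label_sequence([(False, 40), (True, 90), (True, 10)],
                     first_median=60, later_median=30))
# → _fSs_
```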
Pattern mining
•  Using the PexSPAM algorithm with gap = 0
•  Each possible pattern of length 2 or higher is
explored
•  Support of a pattern: proportion of sequences
containing the pattern (at least once)
–  Does not count multiple occurrences of the pattern within a
sequence
•  Select all patterns with minimum support of 1%
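For intuition, the gap = 0 case reduces to counting contiguous substrings. This brute-force sketch (standing in for the actual PexSPAM implementation, and using the readable labeling) applies the support definition above:

```python
from collections import Counter

def frequent_patterns(sequences, min_support=0.01):
    """Return patterns of length >= 2 (contiguous, i.e. gap = 0)
    whose support -- the share of sequences containing them at
    least once -- reaches min_support."""
    counts = Counter()
    for seq in sequences:
        # A set, so repeats within one sequence are counted once
        subs = {seq[i:i + n]
                for n in range(2, len(seq) + 1)
                for i in range(len(seq) - n + 1)}
        counts.update(subs)
    total = len(sequences)
    return {p: c / total for p, c in counts.items()
            if c / total >= min_support}

pats = frequent_patterns(["fSs", "fS", "ss"], min_support=0.5)
# only "fS" appears in at least half of the sequences
```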
Pattern mining
•  There were 102 frequent micro patterns
Top 20 frequent micro patterns
The Problem Solving Genome
•  Constructed a frequency vector over the 102
patterns (vector of size 102) for each student
–  Each common pattern is a gene
•  The vector represents how frequently a student uses
each of the micro patterns
•  The vector is an individual genome built of genes
Problem Solving Genome
[Figure: five example sequences of one student (_fSs_, _fSss_,
_fSS_, _FFss_, _FSss_) and the frequencies of each of the 102
common patterns, e.g. 3/5, 0/5, 2/5, 1/5, 0/5, … for the
patterns ss_, ss, Ss, SS_, _FS_, …]
Guerra, J., Sahebi, S., Lin, Y.-R., and Brusilovsky, P. (2014) The
Problem Solving Genome: Analyzing Sequential Patterns of Student Work
with Parameterized Exercises. In: J. Stamper, Z. Pardos, M. Mavrikis and
B. M. McLaren (eds.) Proceedings of the 7th International Conference on
Educational Data Mining (EDM 2014) pp. 153-160.
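Under the same support definition, building a student's genome is one line per gene (a sketch with illustrative names; the real gene list is the 102 mined patterns):

```python
def build_genome(sequences, genes):
    """Frequency vector over the common patterns ('genes'): the
    share of this student's sequences containing each gene at
    least once."""
    n = len(sequences)
    return [sum(gene in seq for seq in sequences) / n for gene in genes]

student_seqs = ["_fSs_", "_fSss_", "_fSS_", "_FFss_", "_FSss_"]
print(build_genome(student_seqs, ["ss_", "SS"]))
# → [0.6, 0.2]
```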
Exploring the Genome
•  Stability
–  Are the patterns stable for a
student?
•  Effect of complexity
–  Are the patterns different across
complexity levels?
•  Patterns of success
–  Are successful students
following different patterns?
Genome Stability
•  Is the student more similar to him/herself
than to others?
–  Select students with at least 60 sequences (32 students)
–  For each student:
•  Split the student's sequences into two random sets (set 1, set 2)
•  Form the genome of each set
–  Compute the Jensen-Shannon (JS) divergence between:
•  The genomes of the two sets of the same student (self-distance)
•  The student's set 1 genome and the set 1 genomes of other
students, averaged (other-distance)
•  Are students changing their patterns over time?
–  Repeat the procedure, splitting each student's sequences
into early (first half) and late (second half) sets
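The divergence step can be sketched as follows, assuming genomes are frequency vectors over the same 102 patterns (normalized here to probability distributions before comparison):

```python
from math import log2

def _kl(p, q):
    # Kullback-Leibler divergence in bits; 0 * log(0/q) taken as 0
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence between two genomes
    (0 = identical, 1 = maximally different)."""
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)

print(js_divergence([3, 1, 0], [3, 1, 0]))   # identical genomes
# → 0.0
```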
Results (1)
                               Self-distance      Other-distance     Sig.     Cohen's d
                               M        SE        M        SE
Randomly split genome (a)      .2370    .0169     .4815    .0141     <.001    2.693
Early/Late genome (b)          .3211    .0214     .4997    .0164     <.001    1.205
(Paired-sample t-test)
•  Even when splitting into early and late sequences, a
student's self-distance is significantly smaller than the
distance to others
The genome is stable within individuals
Clustering by Genome
•  Cluster students by their genomes and analyze
different patterns
–  Between clusters
–  Between low- and high-performing students within each cluster
•  Spectral Clustering with k = 2
–  The eigen-gap is largest at k = 2
•  Cluster 1: confirmers (repeat short successes)
•  Cluster 2: non-confirmers (quitters)
Ordering patterns by difference magnitude
(cluster 2 – cluster 1)
Latent groups vs performance
Groups and guidance
•  Successful patterns in each cluster
are closer to the other cluster
–  Successful confirmers tend to stop
after a long success
–  Successful non-confirmers (cluster 2)
tend to continue after a hard success
•  Extremely different patterns between
clusters are “harmful”
•  How could this be used for
personalization?
–  Identify the student type
–  Offer a different interface or discourage
poor behavior with recommendations
Learning from examples vs. problems in Java
S / F – problem attempts
e – example
a – animated example
Preferred type of online learning content
E – Exercise
T – Text tutorial
X – Example
V – Video tutorial
Left for Next Time
•  Domain Modeling and Latent topic discovery
–  Sahebi, S., Lin, Y.-R., and Brusilovsky, P. (2016) Tensor Factorization
for Student Modeling and Performance Prediction in Unstructured Domain.
Proceedings of the 9th International Conference on Educational Data
Mining (EDM 2016), pp. 502-505.
•  Data-driven student modeling
–  González-Brenes, J. P., Huang, Y., and Brusilovsky, P. (2014)
General Features in Knowledge Tracing to Model Multiple Subskills,
Temporal Item Response Theory, and Expert Knowledge. Proceedings of the
7th International Conference on Educational Data Mining (EDM 2014),
London, UK, July 4-7, 2014, pp. 84-91.
•  Open Corpus modeling and personalization
–  Huang, Y., Yudelson, M., Han, S., He, D., and Brusilovsky, P.
(2016) A Framework for Dynamic Knowledge Modeling in Textbook-Based
Learning. In: Proceedings of 24th Conference on User Modeling,
Adaptation and Personalization (UMAP 2016), pp. 141-150.
Acknowledgements
•  Joint work with
–  Rosta Farzan, Sharon Hsiao, Tomek Loboda
–  Sherry Sahebi, Julio Guerra, Roya Hosseini
–  Yun Huang, Rafael Dias Araújo
•  U. of Pittsburgh “Innovation in Education” awards
•  NSF Grants
–  CAREER 0447083
–  EHR 0310576
–  IIS 0426021
•  ADL.net support for OSSM work
Visit us in Pittsburgh to Learn More!
… or Read our Papers
•  http://www.pitt.edu/~peterb/papers.html
•  https://www.researchgate.net/profile/
Peter_Brusilovsky
