General Features in Knowledge Tracing:
Applications to Multiple Subskills,
Temporal IRT & Expert Knowledge
Yun Huang, University of Pittsburgh*
José P. González-Brenes, Pearson*
Peter Brusilovsky, University of Pittsburgh
* First authors
This talk…
•  What? Determine student mastery of a skill
•  How? Novel algorithm called FAST
–  Enables features in Knowledge Tracing
•  Why? Better and faster student modeling
–  25% better AUC, a classification metric
–  300 times faster than BNT-SM, a popular general-purpose
student modeling tool
Outline
•  Introduction
•  FAST – Feature-Aware Student Knowledge Tracing
•  Experimental Setup
•  Applications
1.  Multiple subskills
2.  Temporal Item Response Theory
3.  Paper exclusive: Expert knowledge
•  Execution time
•  Conclusion
Motivation
•  Personalize students' learning
– For example, teach students new material as
they learn, so we don't reteach material
they already know
•  How? Typically with Knowledge Tracing
[Figure: per-skill sequences of student responses over time, marked correct (✓) or incorrect (✗)]
•  Knowledge Tracing fits a two-state HMM per skill
•  A binary latent variable indicates whether the
student masters the skill or not
•  Four parameters:
1.  Initial Knowledge
2.  Learning (transition)
3.  Guess (emission)
4.  Slip (emission)
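To make the recursion concrete, here is a minimal sketch of the standard Knowledge Tracing update (not the authors' implementation; parameter values are illustrative, not fitted):

```python
# Minimal Knowledge Tracing sketch: a 2-state HMM per skill.
p_init, p_learn, p_guess, p_slip = 0.3, 0.1, 0.2, 0.1  # illustrative values

def kt_update(p_mastery, correct):
    """Posterior over mastery after one response, then the learning transition."""
    if correct:
        num = p_mastery * (1 - p_slip)         # mastered and didn't slip
        den = num + (1 - p_mastery) * p_guess  # ... or unmastered and guessed
    else:
        num = p_mastery * p_slip
        den = num + (1 - p_mastery) * (1 - p_guess)
    posterior = num / den
    # Transition: an unmastered student masters the skill with prob p_learn.
    return posterior + (1 - posterior) * p_learn

p = p_init
for correct in [True, False, True, True]:  # an example response sequence
    p = kt_update(p, correct)
    print(f"P(mastery) = {p:.3f}")
```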
What’s wrong?
•  Only uses performance data
(correct or incorrect)
•  We can now capture feature-rich data
–  MOOCs & intelligent tutoring systems log
fine-grained data
–  Used a hint, watched a video, practiced after hours…
•  … and these features can carry information
about, or intervene on, learning
What’s a researcher gotta do?
•  Modify Knowledge Tracing algorithm
•  For example, in just a small-scale
literature survey, we found at least nine
different flavors of Knowledge Tracing
So you want to publish in EDM?
1.  Think of a feature (e.g., from a MOOC)
2.  Modify Knowledge Tracing
3.  Write Paper
4.  Publish
5.  Loop!
Are all of those models sooooo
different?
•  No! We identify three main variants
•  We call them the “Knowledge Tracing
Family”
Knowledge Tracing Family
•  No features: classic Knowledge Tracing
•  Emission (guess/slip): item difficulty (Gowda et al. '11; Pardos et al. '11), student ability (Pardos et al. '10), subskills (Xu et al. '12), help (Sao Pedro et al. '13)
•  Transition (learning): student ability (Lee et al. '12; Yudelson et al. '13), item difficulty (Schultz et al. '13), help (Becker et al. '08)
•  Both (guess/slip and learning)
[Figure: graphical models for each variant — latent knowledge k, observed response y, features f attached to the emission, the transition, or both]
•  Each model is successful for
an ad hoc purpose only
– Hard to compare models
– Doesn't help build a theory of cognition
•  Learning scientists have to
worry about both features
and modeling
•  These models are not
scalable:
– Rely on Bayes Net’s
conditional probability tables
– Memory performance grows
exponentially with number of
features
– Runtime performance grows
exponentially with number of
features (with exact
inference)
Example: emission probabilities with no features:

Mastery    p(Correct)
False      0.10 (guess)
True       0.85 (1 - slip)

2^(0+1) = 2 parameters!
Example: emission probabilities with 1 binary feature:

Mastery   Hint    p(Correct)
False     False   0.06
True      False   0.75
False     True    0.25
True      True    0.99

2^(1+1) = 4 parameters!
Example: emission probabilities with 10 binary features:

Mastery   F1     …   F10     p(Correct)
False     False  …   False   0.06      (row 1)
…                            …
True      True   …   True    0.90      (row 2048)

2^(10+1) = 2048 parameters!
Outline
•  Introduction
•  FAST – Feature-Aware Student Knowledge Tracing
•  Experimental Setup
•  Applications
– Multiple subskills
– Temporal IRT
•  Execution time
•  Conclusion
Something old…
[Figure: graphical model with features f on both the transition and the emission]
•  Uses the most general model
in the Knowledge Tracing
Family
•  Parameterizes learning and
emission (guess+slip)
probabilities
Something new…
•  Instead of using inefficient
conditional probability tables,
we use logistic regression
[Berg-Kirkpatrick et al. '10]
•  Exponential complexity ->
linear complexity
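As a hedged illustration (feature names and weight values are made up, and this is not the paper's exact parameterization), the emission probability becomes a logistic function of a weight vector, so parameters grow linearly with the number of features:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight vector per mastery state over k features plus an intercept:
# O(k) parameters instead of 2^(k+1) conditional-probability-table entries.
w_mastered = np.array([0.5, -0.3, 2.0])    # e.g. [hint_used, after_hours, intercept]
w_unmastered = np.array([1.0, -0.1, -1.5])

def p_correct(features, mastered):
    x = np.append(features, 1.0)  # append the intercept term
    w = w_mastered if mastered else w_unmastered
    return sigmoid(w @ x)

print(p_correct(np.array([1.0, 0.0]), mastered=True))   # 1 - slip
print(p_correct(np.array([1.0, 0.0]), mastered=False))  # guess
```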
Example:

# of features   # of parameters in KTF   # of parameters in FAST
0               2                        2
1               4                        3
10              2,048                    12
25              67,108,864               27

25 features are not that many, and yet they
can become intractable with the Knowledge
Tracing Family
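A quick back-of-the-envelope check of the growth rates above (a sketch; the exact FAST count depends on how intercepts are shared):

```python
for k in (0, 1, 10, 25):
    ktf = 2 ** (k + 1)  # one CPT entry per mastery/feature combination
    fast = k + 2        # roughly one weight per feature plus intercepts
    print(f"{k:>2} features: KTF {ktf:>12,}   FAST {fast}")
```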
Something blue?
•  Prediction requires only minor changes
•  Training requires substantial changes
– We use a recent modification of the
Expectation-Maximization algorithm
proposed for Computational Linguistics
problems
[Berg-Kirkpatrick et al. '10]
(A parenthesis)
•  Jose's corollary: each equation in a
presentation puts half the audience
to sleep
•  Equations are in the paper!
"Each equation I include in the book would halve the sales"
KT uses Expectation-Maximization
•  E-step: Forward-Backward algorithm infers latent mastery
•  M-step: Maximum Likelihood, via conditional probability table lookup

FAST uses a recent E-M algorithm [Berg-Kirkpatrick et al. '10]
•  The M-step fits logistic regression weights, which fill in the
"conditional probability table" that the E-step consults
E-step
•  Slip/guess lookup: use the multiple parameters of the
logistic regression to fill the values of a "no-features"
conditional probability table [Berg-Kirkpatrick et al. '10]

Mastery    p(Correct)
False      (guess, filled from the logistic regression)
True       (1 - slip, filled from the logistic regression)
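A sketch of that trick (weight values and feature names are illustrative; because features vary per observation, the "table" is rebuilt at every time step):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def emission_table(features, w_mastered, w_unmastered):
    """Fill a per-observation 'no-features' CPT from the logistic weights."""
    x = np.append(features, 1.0)  # intercept term
    return {
        True:  sigmoid(w_mastered @ x),    # 1 - slip for this observation
        False: sigmoid(w_unmastered @ x),  # guess for this observation
    }

# The E-step's forward-backward pass then uses this like an ordinary CPT,
# rebuilt at every time step because the features change per observation.
table = emission_table(np.array([1.0, 0.0]),
                       np.array([0.5, -0.3, 2.0]),
                       np.array([1.0, -0.1, -1.5]))
print(table)
```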
Slip/Guess logistic regression
[Figure: design matrix — each observation's k features appear in three copies: one active when mastered, one active when not mastered, and one always active; instances are weighted by the E-step's probabilities of mastering / not mastering]
When FAST uses only intercept terms as features for the two
levels of mastery, it is equivalent to Knowledge Tracing!
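A sketch of that design matrix (layout and names assumed from the figure above); with intercept-only features the two logistic outputs collapse to constant guess and slip values, recovering standard Knowledge Tracing:

```python
import numpy as np

# Three blocks of columns, as the figure sketches:
# [features when mastered | features when not mastered | features always]
def design_row(features, mastered):
    zeros = np.zeros_like(features)
    if mastered:
        return np.concatenate([features, zeros, features])
    return np.concatenate([zeros, features, features])

# Intercept-only case: a single constant feature per mastery level.
# The two logistic outputs become two constants -- guess and 1 - slip --
# which is exactly standard Knowledge Tracing.
for mastered in (False, True):
    print(design_row(np.array([1.0]), mastered))
```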
Outline
•  Introduction
•  FAST – Feature-Aware Student Knowledge Tracing
•  Experimental Setup
•  Examples
– Multiple subskills
– Temporal IRT
– Expert knowledge
•  Conclusion
Tutoring System
•  Data collected from QuizJET, a tutor for learning Java programming
•  Each question is generated from a template,
and students can try multiple attempts
•  Students give values for a variable or for the
output of Java code
Data
•  Smaller dataset:
– ~21,000 observations
– First attempt: ~7,000 observations
– 110 students
•  Unbalanced: 70% correct
•  95 question templates
•  “Hierarchical” cognitive model:
19 skills, 99 subskills
Evaluation
•  Predict future performance given history
-  Will a student answer correctly at t = 0?
-  At t = 1, given performance at t = 0?
-  At t = 2, given performance at t = 0, 1? …
•  Area Under Curve (AUC) metric
-  1: perfect classifier
-  0.5: random classifier
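For reference, a minimal sketch of computing AUC with scikit-learn (illustrative arrays, not the study's data):

```python
from sklearn.metrics import roc_auc_score

y_true = [1, 0, 1, 1, 0]             # observed correctness
y_score = [0.8, 0.3, 0.6, 0.9, 0.4]  # the model's predicted P(correct)
print(roc_auc_score(y_true, y_score))  # 1.0 = perfect, 0.5 = chance
```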
Outline
•  Introduction
•  FAST – Feature-Aware Student Knowledge Tracing
•  Experimental Setup
•  Applications
–  Multiple subskills
–  Temporal IRT
–  Expert knowledge
•  Execution time
•  Conclusion
Multiple subskills
•  Experts annotated items (questions) with a
single skill and multiple subskills
Multiple subskills &
KnowledgeTracing
•  Original Knowledge Tracing cannot
model multiple subskills
•  Most Knowledge Tracing variants assume
equal importance of subskills during
training (and then adjust it during testing)
•  The state-of-the-art method, LR-DBN [Xu and
Mostow '11], assigns importance in both
training and testing
FAST can handle multiple subskills
•  Parameterize learning
•  Parameterize slip and guess
•  Features: binary variables that indicate
presence of subskills
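A sketch of how those subskill indicators might be encoded (the subskill names here are made up):

```python
import numpy as np

SUBSKILLS = ["for-loop", "ArrayList", "generics"]  # hypothetical subskill names

def subskill_features(item_subskills):
    """Binary indicators marking which subskills an item exercises."""
    return np.array([1.0 if s in item_subskills else 0.0 for s in SUBSKILLS])

print(subskill_features({"for-loop", "generics"}))  # -> [1. 0. 1.]
```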
FAST vs Knowledge Tracing:
Slip parameters of subskills
•  Conventional Knowledge Tracing assumes
that all subskills have the same difficulty
(the red line in the figure)
•  FAST can identify different difficulties
between subskills
•  Does it matter?
[Figure: estimated slip parameters for the subskills within a skill]
State of the art (Xu & Mostow '11)

Model           AUC
LR-DBN          .71
KT - Weakest    .69
KT - Multiply   .62

•  The 95% confidence intervals are within +/- .01 points
Benchmark

Model            AUC
LR-DBN           .71
Single-skill KT  .71
KT - Weakest     .69
KT - Multiply    .62

•  The 95% confidence intervals are within +/- .01 points
•  We test on non-overlapping students; LR-DBN was
designed and tested on overlapping students and was not
compared to single-skill KT
Benchmark

Model            AUC
FAST             .74
LR-DBN           .71
Single-skill KT  .71
KT - Weakest     .69
KT - Multiply    .62

•  The 95% confidence intervals are within +/- .01 points
Outline
•  Introduction
•  FAST – Feature-Aware Student Knowledge Tracing
•  Experimental Setup
•  Applications
– Multiple subskills
– Temporal IRT
•  Execution time
•  Conclusion
Two paradigms:
(50 years of research in 1 slide)
•  Knowledge Tracing
– Allows learning
– Every item = same difficulty
– Every student = same ability
•  Item Response Theory
– NO learning
– Models items difficulties
– Models student abilities
Can FAST help merge the two
paradigms?
Item Response Theory
•  In its simplest form, it is the Rasch
model
•  The Rasch model can be formulated in many
ways:
– Typically using latent variables
– As logistic regression:
•  a feature per student
•  a feature per item
•  We end up with a lot of features! – Good thing we
are using FAST ;-)
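A sketch of that Rasch-as-logistic-regression encoding (the one-hot layout and function name are illustrative):

```python
import numpy as np

def irt_features(student_id, item_id, n_students, n_items):
    """One indicator per student (ability) and one per item (difficulty)."""
    x = np.zeros(n_students + n_items)
    x[student_id] = 1.0            # this student's ability feature
    x[n_students + item_id] = 1.0  # this item's difficulty feature
    return x

# With 110 students and 95 templates that is already 205 binary features --
# easy for FAST's logistic regression, intractable for CPT-based variants.
print(irt_features(3, 7, n_students=110, n_items=95).sum())  # -> 2.0
```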
Results

Model               AUC
Knowledge Tracing   .65
FAST + student      .64
FAST + item         .73
FAST + IRT          .76

•  The 95% confidence intervals are within +/- .03 points
•  FAST + IRT is a 25% improvement over Knowledge Tracing
Disclaimer
•  In our dataset, most students answer
items in the same order
•  Item estimates are biased
•  Future work: define continuous IRT
difficulty features
– It’s easy in FAST ;-)
Outline
•  Introduction
•  FAST – Feature-Aware Student Knowledge Tracing
•  Experimental Setup
•  Applications
– Multiple subskills
– Temporal IRT
•  Execution time
•  Conclusion
[Figure: execution time (min.) vs. # of observations at 7,100 / 11,300 / 15,500 / 19,800 observations — BNT-SM (no feat.): 23, 28, 46, 54 min; FAST (no feat.): 0.08, 0.10, 0.12, 0.15 min]
FAST is 300x faster than BNT-SM!
LR-DBN vs FAST
•  We use the authors’ implementation of
LR-DBN
•  LR-DBN takes about 250 minutes
•  FAST only takes about 44 seconds
•  15,500 datapoints
•  This is on an old laptop, no parallelization,
nothing fancy
•  (details in the paper)
Outline
•  Introduction
•  FAST – Feature-Aware Student Knowledge Tracing
•  Experimental Setup
•  Examples
– Multiple subskills
– Temporal IRT
•  Conclusion
Comparison of existing techniques

                                            allows    slip/   recency/   learning
                                            features  guess   ordering
FAST                                        ✓         ✓       ✓          ✓
PFA (Pavlik et al. '09)                     ✓         ✗       ✗          ✓
Knowledge Tracing (Corbett & Anderson '95)  ✗         ✓       ✓          ✓
Rasch Model (Rasch '60)                     ✓         ✗       ✗          ✗
•  FAST lives up to its name
•  FAST provides high flexibility in utilizing
features and, as our studies show, even
simple features improve significantly
over Knowledge Tracing
•  The effect of features depends on how
smartly they are designed and on the
dataset
•  I am looking forward to more clever uses
of feature engineering for FAST in the
community
EDM2014 paper: General Features in Knowledge Tracing to Model Multiple Subskills, Temporal Item Response Theory, and Expert Knowledge

EDM2014 paper: General Features in Knowledge Tracing to Model Multiple Subskills, Temporal Item Response Theory, and Expert Knowledge

  • 1.
    General Features inKnowledge Tracing Applications to Multiple Subskills, Temporal IRT & Expert Knowledge * First authors Yun Huang, University of Pittsburgh* José P. González-Brenes, Pearson* Peter Brusilovsky, University of Pittsburgh
  • 2.
    This talk… •  What?Determine student mastery of a skill •  How? Novel algorithm called FAST –  Enables features in Knowledge Tracing •  Why? Better and faster student modeling –  25% better AUC, a classification metric –  300 times faster than popular general purpose student modeling techniques (BNT-SM)
  • 3.
    Outline •  Introduction •  FAST– Feature-Aware Student Knowledge Tracing •  Experimental Setup •  Applications 1.  Multiple subskills 2.  Temporal Item Response Theory 3.  Paper exclusive: Expert knowledge •  Execution time •  Conclusion
  • 4.
    Motivation •  Personalize learningof students – For example, teach students new material as they learn, so we don’t teach students material they know •  How? Typically with Knowledge Tracing
  • 5.
    :   û û      ü    ü  û û ü     ü                      ü   û û ü      ü        ü   û û ü        ü   :  
  • 6.
    :   :   ûû ü      ü        ü   û û ü    ü   Masters a skill or not •  Knowledge Tracing fits a two- state HMM per skill •  Binary latent variables indicate the knowledge of the student of the skill •  Four parameters: 1.  Initial Knowledge 2.  Learning 3.  Guess 4.  Slip Transition Emission
  • 7.
    What’s wrong? •  Onlyuses performance data (correct or incorrect) •  We are now able to capture feature rich data –  MOOCs & intelligent tutoring systems are able to log fine-grained data –  Used a hint, watched video, after hours practice… •  … these features can carry information or intervene on learning
  • 8.
    What’s a researchergotta do? •  Modify Knowledge Tracing algorithm •  For example, just on a small-scale literature survey, we find at least nine different flavors of Knowledge Tracing
  • 9.
    So you wantto publish in EDM? 1.  Think of a feature (e.g., from a MOOC) 2.  Modify Knowledge Tracing 3.  Write Paper 4.  Publish 5.  Loop!
  • 10.
    Are all ofthose models sooooo different? •  No! we identify three main variants •  We call them the “Knowledge Tracing Family”
  • 11.
    Knowledge Tracing Family Nofeatures Emission (guess/slip) Transition (learning) Both (guess/slip and learning) •  Item  difficulty   (Gowda  et  al  ’11;   Pardos  et  al  ’11)   •  Student  ability   (Pardos  et  al     ’10)   •  Subskills  (Xu  et   al  ’12)   •  Help  (Sao  Pedro   et  al  ’13)   •  Student  ability   (Lee  et  al  ’12;   Yudelson  et  al  ’13)   •  Item  difficulty   (Schultz  et  al  ’13)   •  Help  (Becker    et  al   ’08)   k   y   k   y   f   k   y   f   k   y   f  f  
  • 12.
    •  Each modelis successful for an ad hoc purpose only – Hard to compare models – Doesn’t help to build a cognition theory
  • 13.
    •  Learning scientistshave to worry about both features and modeling
  • 14.
    •  These modelsare not scalable: – Rely on Bayes Net’s conditional probability tables – Memory performance grows exponentially with number of features – Runtime performance grows exponentially with number of features (with exact inference)
  • 15.
    Example: Mastery p(Correct) False (1)0.10 (guess) True (2) 0.85 (1-slip) 20+1 parameters! Emission probabilities with no features:
  • 16.
    Example: Emission probabilities with1 binary feature: Mastery Hint p(Correct) False False (1) 0.06 True False (2) 0.75 False True (3) 0.25 True True (4) 0.99 21+1 parameters!
  • 17.
    Example: Emission probabilities with10 binary features: Mastery F1 … F10 p(Correct) False False False False (1) 0.06 … … True True True True (2048) 0.90 210+1 parameters!
  • 18.
    Outline •  Introduction •  FAST– Feature-Aware Student Knowledge Tracing •  Experimental Setup •  Applications – Multiple subskills – Temporal IRT •  Execution time •  Conclusion
  • 19.
    Something old… k   y   f  f   •  Uses the most general model in the Knowledge Tracing Family •  Parameterizes learning and emission (guess+slip) probabilities
  • 20.
    Something new… k   y   f  f   •  Instead of using inefficient conditional probability tables, we use logistic regression [Berg-Kirkpatrick et al’10 ] •  Exponential complexity -> linear complexity
  • 21.
    Example: # of features# of pararameters in KTF # of parameters in FAST 0 2 2 1 4 3 10 2048 12 25 67,108,864 27 25 features are not that many, and yet they can become intractable with Knowledge Tracing Family
  • 22.
    Something blue? k   y   f  f   •  Not a lot of changes to implement prediction •  Training requires quite a bit of changes – We use a recent modification of the Expectation-Maximization algorithm proposed for Computational Linguistics problems [Berg-Kirkpatrick et al’10 ]
  • 23.
    (A parenthesis) •  Jose’scorollary: Each equation in a presentation would send to sleep half the audience •  Equations are in the paper! “Each  equaMon  I   include  in  the  book   would  halve  the  sales”    
  • 24.
  • 25.
  • 26.
    Slip/guess lookup: Mastery p(Correct) False(1) True (2) Use the multiple parameters of logistic regression to fill the values of a “no- features”conditional probability table! [Berg-Kirkpatrick et al’10 ]
  • 27.
  • 28.
    observation 1 observation 2 observationn ... feature1feature2 featurekfeature1feature2 featurekfeature1feature2 featurek ... ... ... observation 1 observation 2 observation n ... { { { active when mastered active when not mastered always active Features:Instance weights: probabilityof notmastering probabilityof mastering Slip/Guess logistic regression
  • 29.
    observation 1 observation 2 observationn ... feature1feature2 featurekfeature1feature2 featurekfeature1feature2 featurek ... ... ... observation 1 observation 2 observation n ... { { { active when mastered active when not mastered always active Features:Instance weights: probabilityof notmastering probabilityof mastering Slip/Guess logistic regression When FAST uses only intercept terms as features for the two levels of mastery, it is equivalent to Knowledge Tracing!
  • 30.
    Outline •  Introduction •  FAST– Feature-Aware Student Knowledge Tracing •  Experimental Setup •  Examples – Multiple subskills – Temporal IRT – Expert knowledge •  Conclusion
  • 31.
    Collected from QuizJET,a tutor for learning Java programming. March 28, 2014 31 Each question is generated from a template, and students can try multiple attempts Students give values for a variable or the output Java code Tutoring System
  • 32.
    March 28, 201432 Data •  Smaller dataset: – ~21,000 observations – First attempt: ~7,000 observations – 110 students •  Unbalanced: 70% correct •  95 question templates •  “Hierarchical” cognitive model: 19 skills, 99 subskills
  • 33.
    •  Predict futureperformance given history -  Will a student get answer correctly at t=0 ? -  At t =1 given t = 0 performance ? -  At t = 2 given t = 0, 1 performance ? …. •  Area Under Curve metric -  1: perfect classifier -  0.5: random classifier March 28, 2014 33 Evaluation
  • 34.
    Outline •  Introduction •  FAST– Feature-Aware Student Knowledge Tracing •  Experimental Setup •  Applications –  Multiple subskills –  Temporal IRT –  Expert knowledge •  Execution time •  Conclusion
  • 35.
    Multiple subskills •  Expertsannotated items (question) with a single skill and multiple subskills
  • 36.
    Multiple subskills & KnowledgeTracing • Original Knowledge Tracing can not model multiple subskills •  Most Knowledge Tracing variants assume equal importance of subskills during training (and then adjust it during testing) •  State of the art method, LR-DBN [Xu and Mostow ’11] assigns importance in both training and testing
  • 37.
    FAST can handlemultiple subskills •  Parameterize learning •  Parameterize slip and guess •  Features: binary variables that indicate presence of subskills
  • 38.
    FAST vs KnowledgeTracing: Slip parameters of subskills •  Conventional Knowledge assumes that all subskills have the same difficulty (red line) •  FAST can identify different difficulty between subskills •  Does it matter? subskills within a skill:
  • 39.
    State of theart (Xu & Mostow’11) •  The 95% of confidence intervals are within +/- .01 points Model AUC LR-DBN .71 KT - Weakest .69 KT - Multiply .62
  • 40.
    Benchmark Model AUC LR-DBN .71 Single-skillKT .71 KT - Weakest .69 KT - Multiply .62 •  The 95% of confidence intervals are within +/- .01 points •  We are testing on non-overlapping students, LR-DBN was designed/tested in overlapping students and didn’t compare to single skill KT !  
  • 41.
    Benchmark Model AUC LR-DBN .71 Single-skillKT .71 KT - Weakest .69 KT - Multiply .62 •  The 95% of confidence intervals are within +/- .01 points •  We are testing on non-overlapping students, LR-DBN was designed/tested in overlapping students and didn’t compare to single skill KT !  
  • 42.
    Benchmark •  The 95%of confidence intervals are within +/- .01 points Model AUC FAST .74 LR-DBN .71 Single-skill KT .71 KT - Weakest .69 KT - Multiply .62
  • 43.
    Outline •  Introduction •  FAST– Feature-Aware Student Knowledge Tracing •  Experimental Setup •  Applications – Multiple subskills – Temporal IRT •  Execution time •  Conclusion
  • 44.
    Two paradigms: (50 yearsof research in 1 slide) •  Knowledge Tracing – Allows learning – Every item = same difficulty – Every student = same ability •  Item Response Theory – NO learning – Models items difficulties – Models student abilities
  • 45.
    Can FAST helpmerging the paradigms?
  • 46.
    Item Response Theory • The simplest of its forms, it’s the Rasch model •  The Rasch can be formulated in many ways: – Typically using latent variables – Logistic regression •  a feature per student •  a feature per item •  We end up with a lot of features! – Good thing we are using FAST ;-)
  • 47.
    Results AUC Knowledge Tracing .65 FAST+ student .64 FAST + item .73 FAST + IRT .76 •  The 95% of confidence intervals are within +/- .03 points 25% improvement
  • 48.
    Disclaimer •  In ourdataset, most students answer items in the same order •  Item estimates are biased •  Future work: define continuous IRT difficulty features – It’s easy in FAST ;-)
  • 49.
    Outline •  Introduction •  FAST– Feature-Aware Student Knowledge Tracing •  Experimental Setup •  Applications – Multiple subskills – Temporal IRT •  Execution time •  Conclusion
  • 50.
    March 28, 201450 7,100 11,300 15,500 19,800 0 10 20 30 40 50 60 23 28 46 54 0.08 0.10 0.12 0.15 # of observations executiontime(min.) BNT−SM (no feat.) FAST (no feat.) FAST is 300x faster than BNT-SM!
  • 51.
    LR-DBN vs FAST • We use the authors’ implementation of LR-DBN •  LR-DBN takes about 250 minutes •  FAST only takes about 44 seconds •  15,500 datapoints •  This is on an old laptop, no parallelization, nothing fancy •  (details on the paper)
  • 52.
    Outline •  Introduction •  FAST– Feature-Aware Student Knowledge Tracing •  Experimental Setup •  Examples – Multiple subskills – Temporal IRT •  Conclusion
  • 53.
    Comparison of existingtechniques March 28, 2014 53 allows features slip/ guess recency/ ordering learning FAST ✓   ✓   ✓   ✓   PFA Pavlik et al ’09 ✓   ✗   ✗   ✓   Knowledge Tracing Corbett & Anderson ’95 ✗   ✓   ✓   ✓   Rasch Model Rasch ’60 ✓   ✗   ✗   ✗  
  • 54.
    •  FAST livesby its name •  FAST provides high flexibility in utilizing features, and as our studies show, even with simple features improves significantly over Knowledge Tracing
  • 55.
    •  The effectof features depends on how smartly they are designed and on the dataset •  I am looking forward for more clever uses of feature engineering for FAST in the community