+
Integrating Knowledge Tracing and Item Response Theory: A Tale of Two Frameworks
Mohammad M. Khajah*, University of Colorado, Boulder
Yun Huang*, University of Pittsburgh
José P. González-Brenes*, Pearson
Michael C. Mozer, University of Colorado, Boulder
Peter Brusilovsky, University of Pittsburgh
* First authors
+ Introduction
n  Item Response Theory and Knowledge Tracing have
complimentary advantages and limitations
n  We present a unified view and evaluation of two recent
models that integrate IRT and Knowledge Tracing.
student modeling algorithm                          student ability   item difficulty   learning
Rasch Model                                                ✓                 ✓              ✗
Knowledge Tracing                                          ✗                 ✗              ✓
Combined models                                            ✓                 ✓              ✓
  FAST (González-Brenes, Huang & Brusilovsky ’14)
  LFKT (Khajah & Mozer ’14)
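For reference, the Rasch (one-parameter IRT) model scores every response with one ability parameter per student and one difficulty parameter per item, but has no notion of change over time. In standard notation (not specific to either paper):

P(\text{correct}_{si} = 1) = \sigma(\theta_s - b_i), \qquad \sigma(x) = \frac{1}{1 + e^{-x}}

where \theta_s is the ability of student s and b_i is the difficulty of item i. Knowledge Tracing instead tracks a latent knowledge state that can switch from unlearned to learned with practice, which is what lets it capture learning.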
+
From Competition to Cooperation
Both frameworks were nominated for the Best Paper Award at Educational Data Mining 2014.
[Photos of the author teams: José, Yun, and Peter (FAST); Mohammad, Rowan, Robert Lindsey, and Michael (LFKT); with the "Best Paper Award" and "Best Paper Nominated" labels for FAST and LFKT.]
+
Comparison of two frameworks

                     FAST                                  LFKT
Learning?            Yes! Both frameworks can infer student knowledge.
Student ability?     Yes! Both frameworks use 1 parameter per student.
Item difficulty?     Yes! Both frameworks use 1 parameter per item.
Model structure      Both use a logistic function to incorporate IRT features.
Training method      A variant of EM maximizing           Bayesian estimation
                     the likelihood (MLE)                 (MCMC)
Generalizability     Allows arbitrary features            Designed for IRT features
+
FAST
n  FAST allows inference of student knowledge
n  FAST improves Knowledge Tracing by allowing
features
n Why features? MOOCs, ITSs can collect a lot of data
from students. For example:
n Hints
n affect …
n We focus on student ability and problem difficulty
(IRT)
n  See our paper in EDM ’14 to see how we
improved 25% accuracy and 300x runtime
performance!
+
Model Structure
Use an IRT logistic function with item and student features to model the emission probabilities.
* Details are in the paper.
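As an illustration only (the parameter names and exact parameterization below are our assumptions; FAST and LFKT each differ in the details given in their papers), an emission probability of this kind might look like:

import numpy as np

def p_correct(ability, difficulty, knowledge, mastery_bonus=3.0):
    """Logistic (IRT-style) emission probability for a single response.
    ability       -- per-student parameter (higher = stronger student)
    difficulty    -- per-item parameter (higher = harder item)
    knowledge     -- latent Knowledge Tracing state (0 = unmastered, 1 = mastered)
    mastery_bonus -- illustrative boost to the log-odds once the skill is mastered
    """
    logit = ability - difficulty + mastery_bonus * knowledge
    return 1.0 / (1.0 + np.exp(-logit))

The shared idea is that Knowledge Tracing's fixed guess/slip probabilities are replaced by a logistic function of student and item features.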
+
Datasets
n  We conduct an extensive evaluation on four datasets of
different sizes and characteristics.
n  geometry: 5,104 observations, smallest
n  quizjet: 6,549 observations
n  physics: 110,041 observations
n  statics: 189,297 observations, largest
n  On each dataset, we compare Knowledge Tracing, IRT and
Combined Model.
n  Each model has two variants differing in training method:
n  MCMC (Bayesian)
n  EM (MLE)
+
Evaluation
n  Predict future performance given history
n  Will a student get answer correctly at t=0 ?
n  At t =1 given t = 0 performance ?
n  At t = 2 given t = 0, 1 performance ? ….
n  Area Under Curve metric
n  1: perfect classifier,
n  0.5: random classifier
n  We place the first 80% of trials from each student-skill
sequence in the training set, and the remaining in test set.
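A minimal sketch of this split and metric (the function and variable names are ours, not from the papers):

from sklearn.metrics import roc_auc_score

def split_sequence(trials, train_frac=0.8):
    """First 80% of one student-skill sequence for training, the rest for testing."""
    cut = int(len(trials) * train_frac)
    return trials[:cut], trials[cut:]

# After collecting the model's predicted probabilities p and the observed
# correctness labels y over all test trials:
#     auc = roc_auc_score(y, p)   # 1.0 = perfect classifier, 0.5 = random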
+ Results
n  Combined models beats Knowledge Tracing !
(except on the smallest dataset geometry)
n  Combined models perform similarly as IRT …
(except on the largest dataset statics, combined MCMC
model beats all other models)
+ When can IRT perform similarly to the combined models?
■ We hypothesize the reason is the order in which items are presented to students.
■ If students interact with items in a deterministic order, then even when students show learning on the later items, IRT can use underestimated difficulties for those items to account for the learning.
■ We examined item-order randomness and found that on the physics and statics datasets items are presented in more random orders, and there the combined (MCMC) model starts to show a predictive advantage over IRT (one illustrative way to quantify order randomness is sketched below).
■ We will investigate this issue further.
+ However, IRT doesn’t model learning …
■ We compute the estimated learning of a student as the probability of mastery at the last practice opportunity minus the probability of mastery at the first practice opportunity, for each student-skill sequence (sketched below).
■ Combined models and Knowledge Tracing can model learning in this sense, but IRT can’t.
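In code, this per-sequence quantity is simply (mastery_probs is assumed to hold the model's inferred probability of mastery at each practice opportunity):

def estimated_learning(mastery_probs):
    """P(mastery) at the last practice opportunity minus P(mastery) at the
    first, for one student-skill sequence."""
    return mastery_probs[-1] - mastery_probs[0]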
+
Conclusion
n The combined method persistently outperforms
Knowledge Tracing and offers the best of assessment
and learning sciences:
n  high prediction accuracy like the IRT model
n  the ability to model student learning like Knowledge
Tracing.
n  Both combined models have similar performance,
with a small advantage to the Bayesian method in the
largest dataset we used.
+
Try our models for your user modeling!
■ FAST code will be available online soon! You can also contact Yun, José, or Peter to get it now …
■ Contact Mohammad or Michael for LFKT!
+
Thank you for listening!

UMAP PALE 2014 paper: Integrating Knowledge Tracing and Item Response Theory: A Tale of Two Frameworks.
