SlideShare a Scribd company logo
1 of 55
Download to read offline
Your model is predictive but is it useful?−
Theoretical and Empirical Considerations of a
New Paradigm for Adaptive Tutoring Evaluation
José P. González-Brenes, Pearson
Yun Huang, University of Pittsburgh
1
Main point of the paper:
We are not evaluating student models
correctly
New paradigm, Leopard, may help
2
3
Why are we here?
4
5
Educational Data Mining
Data Mining
=	
  
?
6
Should we just publish at KDD*?
*or other data mining venue
7
Claim:
Educational Data Mining helps learners
8
9
Is our research helping learners?
Adaptive Intelligent Tutoring Systems:
Systems that teach and adapt content to
humans
teach	
  
collect	
  evidence	
  
10
Paper writing: Researchers quantify the
improvements of the systems compared
Not a purely academic pursuit:
Superintendents to choose between
alternative technology
Teachers choose between systems
11
Randomized Controlled Trials may measure
the time students spent on tutoring, and
their performance on post-tests
12
Difficulties of Randomized Controlled Trials:
•  IRB approval
•  experimental design by an expert
•  recruiting (and often payment!) of enough
participants to achieve statistical power
•  data analysis
13
How do other A.I. disciplines do it?
14
Bleu [Papineni et al ‘01]:
machine translation systems
Rouge [Lin et al ’02]:
automatic summarization systems
Paradise [Walker et al ’99]:
spoken dialogue systems
15
Using automatic metrics can be very
positive:
•  Cheaper experimentation
•  Faster comparisons
•  Competitions that accelerate progress
16
Automatic metrics do not replace RCTs
17
What does the Educational Data Mining
community do?
Evaluate the student model using
classification accuracy metrics like RMSE,
AUC of ROC, accuracy…
(Literature reviews by Pardos, Pelánek, … )
18
Other fields verify that automatic metrics
correlated with the target behavior
[Eg.: Callison-Burch et al ’06]
19
Ironically, we have a growing body of
evidence that classification evaluation
metrics are a BAD way to evaluate adaptive
tutors
Read Baker and Beck papers on limitations /
problems of these evaluation metrics
20
Surprisingly, in spite of all of the evidence
against using classification evaluation
metrics, their use is still very widespread in
the adaptive literature*
Can we do better?
* ExpOppNeed [Lee & Brunskill] is an exception
21
The rest of this talk:
•  Leopard Paradigm
– Teal
– White
•  Meta-evaluation
•  Discussion
22
Leopard
23
•  Leopard: Learner Effort-Outcomes
Paradigm
•  Leopard quantifies the effort and
outcomes of students in adaptive tutoring
24
•  Effort: Quantifies how much practice the
adaptive tutor gives to students. Eg.,
number of items assigned to students,
amount of time…
•  Outcome: Quantifies the performance of
students after adaptive tutoring
25
•  Measuring effort and outcomes is not
novel by itself (e.g, RCT)
•  Leopard’s contribution is measuring both
without a randomized control trial
•  White and Teal are metrics that
operationalize Leopard
26
White: Whole Intelligent Tutoring system
Evaluation
White performs a counterfactual simulation
(“What Would the Tutor Do?”) to estimate
how much practice students receive
27
28
Design desiderata:
Evaluation metric should be easy to use
Same, or similar input than conventional
metrics
29
Bob
Alice
skill q
1
student u
.5
Bob
s1
s1
Bob
Alice
s1
.7
Bob
s13
.7
s1
.8
6
1
yt
0
4
1
.6
1
1
Bob
2
3
4
Bob
.7
2
.9
1Bob
.5
t
1
Alice s1
1
0
s1
s1
s1
0
s1
s1
1
ct+1
.6
Alice 0
.4
.9
ˆ
actual performance
predicted performance
cu,q,t+1 yu,q,tˆ
30
Bob
Alice
skill q
1
student u
.5
Bob
s1
s1
Bob
Alice
s1
.7
Bob
s13
.7
s1
.8
6
1
yt
0
4
1
.6
1
1
Bob
2
3
4
Bob
.7
2
.9
1Bob
.5
t
1
Alice s1
1
0
s1
s1
s1
0
s1
s1
1
ct+1
.6
Alice 0
.4
.9
2/3
4/5
score
score
effort=
effort=
ˆ
actual performance
predicted performance
cu,q,t+1 yu,q,tˆ
Alternatively,
we can
model error
Future direction:
Present aggregate results
31
Q/ What if student does not achieve target
performance?
A: “Visible” imputation
32
Meta-Evaluation
33
Compare:
•  Conventional classification metrics
•  Leopard metrics (White)
34
Datasets:
•  Data from a middle-school Math commercial
tutor
–  1.2 million observations
–  25,000 students
–  Item to skill mapping:
•  Coarse: 27 skills
•  Fine: 90 skills
•  (Other item-to-skill model not reported)
•  Synthetic data
35
Assessing an evaluation metric with real
student data is difficult because we often do
not know the ground truth
Insight: Use data that we know a priori its
behavior in an adaptive tutor
36
For adaptive tutoring to be able to optimize
when to stop instruction, the student
performance should increase with repeated
practice (the learning curve should be
increasing)
Decreasing /flat learning curve = bad data
37
38
39
Procedure:
1.  Select skills with decreasing/flat learning
curve (aka bad data)
2.  Train a student model on those skills
3.  Compare classification metrics with
Leopard
40
F1 AUC Score Effort
Bad student model .79 .85
Majority class 0 .50
41
F1 AUC Score Effort
Bad student model .79 .85 .18 10.1
Majority class 0 .50 .18 11.2
42
What does this mean?
•  High accuracy models may not be useful
for adaptive tutoring
•  We need to change how we report results
in adaptive tutoring
43
Solutions
•  Report classification accuracy averaged
over skills (for models with 1 skill per item)
✖ Not useful for comparing or discovering
different skill models
•  Report as “difficulty” baseline
✖ Experiments suggest that models with
baseline performance can be useful
•  Use Leopard
44
Let’s use all data, and pick an item-to-skill
mapping:
AUC Score Effort
Coarse (27 skills) .69
Fine (90 skills) .74
45
Let’s use all data, and pick an item-to-skill
mapping:
AUC Score Effort
Coarse (27 skills) .69 .41	
   55.7	
  
Fine (90 skills) .74 .36	
   88.1	
  
46
With synthetic data we can use Teal as the
ground truth
We generate 500 synthetic datasets with
known Knowledge Tracing Parameters
Which metrics correlate best to the truth?
47
48
49
Discussion
50
51
In EDM 2014 we
proposed “FAST” toolkit
for Knowledge Tracing
with Features
“FAST model improves 25% AUC of ROC”
54
Input
Teal White
Knowledge Tracing
Family parameters
Student’s correct or
incorrect response
Sequence length Student models’
prediction of correct/
incorrect
Target Probability of
correct
Target Probability of
correct
55

More Related Content

What's hot

ICTCM 2013 Presentation -- Dan DuPort
ICTCM 2013 Presentation  --  Dan DuPortICTCM 2013 Presentation  --  Dan DuPort
ICTCM 2013 Presentation -- Dan DuPortDan DuPort
 
Integrating Data Analysis at Berea College
Integrating Data Analysis at Berea CollegeIntegrating Data Analysis at Berea College
Integrating Data Analysis at Berea CollegeSERC at Carleton College
 
Examview Features and benefits
Examview Features and benefitsExamview Features and benefits
Examview Features and benefitsWilliam McIntosh
 
Fuzzy logic based students’ learning assessment
Fuzzy logic based students’ learning assessmentFuzzy logic based students’ learning assessment
Fuzzy logic based students’ learning assessmentAung Thu Rha Hein
 
Froehlke plan revised
Froehlke plan revisedFroehlke plan revised
Froehlke plan revisedtfroehlke
 
SANDE NDT Training School - Online Blended Learning
SANDE NDT Training School - Online Blended LearningSANDE NDT Training School - Online Blended Learning
SANDE NDT Training School - Online Blended LearningNEIL HARRAP
 
Fine tuning assessment for teaching and learning
Fine tuning assessment for teaching and learningFine tuning assessment for teaching and learning
Fine tuning assessment for teaching and learningRoy Williams
 
Footholds and Foundations: Setting Freshmen on the Path to Lifelong Learning
Footholds and Foundations: Setting Freshmen on the Path to Lifelong LearningFootholds and Foundations: Setting Freshmen on the Path to Lifelong Learning
Footholds and Foundations: Setting Freshmen on the Path to Lifelong Learningannielibrarian
 
Lab_student_evaluation_2016
Lab_student_evaluation_2016Lab_student_evaluation_2016
Lab_student_evaluation_2016Jiemin Zhang
 
Ashford edu 645 week 4 dq 1 item analysis
Ashford edu 645 week 4 dq 1 item analysisAshford edu 645 week 4 dq 1 item analysis
Ashford edu 645 week 4 dq 1 item analysissdafasfasfasf
 
How to prove your edtech product works
How to prove your edtech product worksHow to prove your edtech product works
How to prove your edtech product worksJoseph Wilson
 
An accurate ability evaluation method for every student with small problem it...
An accurate ability evaluation method for every student with small problem it...An accurate ability evaluation method for every student with small problem it...
An accurate ability evaluation method for every student with small problem it...Hideo Hirose
 

What's hot (15)

IACBE Conference Presentation-2015
IACBE Conference Presentation-2015IACBE Conference Presentation-2015
IACBE Conference Presentation-2015
 
ICTCM 2013 Presentation -- Dan DuPort
ICTCM 2013 Presentation  --  Dan DuPortICTCM 2013 Presentation  --  Dan DuPort
ICTCM 2013 Presentation -- Dan DuPort
 
SBAC Performance Task Overview
SBAC Performance Task OverviewSBAC Performance Task Overview
SBAC Performance Task Overview
 
Integrating Data Analysis at Berea College
Integrating Data Analysis at Berea CollegeIntegrating Data Analysis at Berea College
Integrating Data Analysis at Berea College
 
Examview Features and benefits
Examview Features and benefitsExamview Features and benefits
Examview Features and benefits
 
Fuzzy logic based students’ learning assessment
Fuzzy logic based students’ learning assessmentFuzzy logic based students’ learning assessment
Fuzzy logic based students’ learning assessment
 
Mathadoption K5 Summary
Mathadoption K5 SummaryMathadoption K5 Summary
Mathadoption K5 Summary
 
Froehlke plan revised
Froehlke plan revisedFroehlke plan revised
Froehlke plan revised
 
SANDE NDT Training School - Online Blended Learning
SANDE NDT Training School - Online Blended LearningSANDE NDT Training School - Online Blended Learning
SANDE NDT Training School - Online Blended Learning
 
Fine tuning assessment for teaching and learning
Fine tuning assessment for teaching and learningFine tuning assessment for teaching and learning
Fine tuning assessment for teaching and learning
 
Footholds and Foundations: Setting Freshmen on the Path to Lifelong Learning
Footholds and Foundations: Setting Freshmen on the Path to Lifelong LearningFootholds and Foundations: Setting Freshmen on the Path to Lifelong Learning
Footholds and Foundations: Setting Freshmen on the Path to Lifelong Learning
 
Lab_student_evaluation_2016
Lab_student_evaluation_2016Lab_student_evaluation_2016
Lab_student_evaluation_2016
 
Ashford edu 645 week 4 dq 1 item analysis
Ashford edu 645 week 4 dq 1 item analysisAshford edu 645 week 4 dq 1 item analysis
Ashford edu 645 week 4 dq 1 item analysis
 
How to prove your edtech product works
How to prove your edtech product worksHow to prove your edtech product works
How to prove your edtech product works
 
An accurate ability evaluation method for every student with small problem it...
An accurate ability evaluation method for every student with small problem it...An accurate ability evaluation method for every student with small problem it...
An accurate ability evaluation method for every student with small problem it...
 

Similar to Is Our Evaluation of Student Models Accurately Measuring Educational Impact

eMOOCs2015 Does peer grading work?
eMOOCs2015 Does peer grading work?eMOOCs2015 Does peer grading work?
eMOOCs2015 Does peer grading work?Rémi Bachelet
 
AI_Unit-4_Learning.pptx
AI_Unit-4_Learning.pptxAI_Unit-4_Learning.pptx
AI_Unit-4_Learning.pptxMohammadAsim91
 
Seminario eMadrid sobre "Inteligencia natural y artificial en educación". Int...
Seminario eMadrid sobre "Inteligencia natural y artificial en educación". Int...Seminario eMadrid sobre "Inteligencia natural y artificial en educación". Int...
Seminario eMadrid sobre "Inteligencia natural y artificial en educación". Int...eMadrid network
 
Instructional software presentation ronen cohen
Instructional software presentation  ronen cohenInstructional software presentation  ronen cohen
Instructional software presentation ronen cohenroneninio
 
Umap17 learner modelingforintegrationskills_yunhuang
Umap17 learner modelingforintegrationskills_yunhuangUmap17 learner modelingforintegrationskills_yunhuang
Umap17 learner modelingforintegrationskills_yunhuangYun Huang
 
Real time embedded assessments 8 8-13
Real time embedded assessments 8 8-13Real time embedded assessments 8 8-13
Real time embedded assessments 8 8-13Robert Leneway
 
The Value of Peer Grading Using Rubrics
The Value of Peer Grading Using RubricsThe Value of Peer Grading Using Rubrics
The Value of Peer Grading Using RubricsExamSoft
 
Replicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender SystemsReplicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender SystemsAlejandro Bellogin
 
Investigating learning strategies in a dispositional learning analytics conte...
Investigating learning strategies in a dispositional learning analytics conte...Investigating learning strategies in a dispositional learning analytics conte...
Investigating learning strategies in a dispositional learning analytics conte...Bart Rienties
 
5 learning edited 2012.ppt
5 learning edited 2012.ppt5 learning edited 2012.ppt
5 learning edited 2012.pptHenokGetachew15
 
Revising Instructional Materials
Revising Instructional MaterialsRevising Instructional Materials
Revising Instructional MaterialsAngel Jones
 
The why and what of testa
The why and what of testaThe why and what of testa
The why and what of testaTansy Jessop
 
LETS PUBLISH WITH MORE RELIABLE & PRESENTABLE MODELLING.pptx
LETS PUBLISH WITH MORE RELIABLE & PRESENTABLE MODELLING.pptxLETS PUBLISH WITH MORE RELIABLE & PRESENTABLE MODELLING.pptx
LETS PUBLISH WITH MORE RELIABLE & PRESENTABLE MODELLING.pptxshamsul2010
 
PISD Assessment and PLC Presentation
PISD Assessment and PLC PresentationPISD Assessment and PLC Presentation
PISD Assessment and PLC PresentationMary Hooper
 
Rubric design workshop
Rubric design workshopRubric design workshop
Rubric design workshopLisa M. Snyder
 
Pragmatic software testing education - SIGCSE 2019
Pragmatic software testing education - SIGCSE 2019Pragmatic software testing education - SIGCSE 2019
Pragmatic software testing education - SIGCSE 2019Maurício Aniche
 

Similar to Is Our Evaluation of Student Models Accurately Measuring Educational Impact (20)

eMOOCs2015 Does peer grading work?
eMOOCs2015 Does peer grading work?eMOOCs2015 Does peer grading work?
eMOOCs2015 Does peer grading work?
 
AI_Unit-4_Learning.pptx
AI_Unit-4_Learning.pptxAI_Unit-4_Learning.pptx
AI_Unit-4_Learning.pptx
 
Seminario eMadrid sobre "Inteligencia natural y artificial en educación". Int...
Seminario eMadrid sobre "Inteligencia natural y artificial en educación". Int...Seminario eMadrid sobre "Inteligencia natural y artificial en educación". Int...
Seminario eMadrid sobre "Inteligencia natural y artificial en educación". Int...
 
Instructional software presentation ronen cohen
Instructional software presentation  ronen cohenInstructional software presentation  ronen cohen
Instructional software presentation ronen cohen
 
Umap17 learner modelingforintegrationskills_yunhuang
Umap17 learner modelingforintegrationskills_yunhuangUmap17 learner modelingforintegrationskills_yunhuang
Umap17 learner modelingforintegrationskills_yunhuang
 
Real time embedded assessments 8 8-13
Real time embedded assessments 8 8-13Real time embedded assessments 8 8-13
Real time embedded assessments 8 8-13
 
The Value of Peer Grading Using Rubrics
The Value of Peer Grading Using RubricsThe Value of Peer Grading Using Rubrics
The Value of Peer Grading Using Rubrics
 
Replicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender SystemsReplicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender Systems
 
Investigating learning strategies in a dispositional learning analytics conte...
Investigating learning strategies in a dispositional learning analytics conte...Investigating learning strategies in a dispositional learning analytics conte...
Investigating learning strategies in a dispositional learning analytics conte...
 
5 learning edited 2012.ppt
5 learning edited 2012.ppt5 learning edited 2012.ppt
5 learning edited 2012.ppt
 
Revising Instructional Materials
Revising Instructional MaterialsRevising Instructional Materials
Revising Instructional Materials
 
Orientation deliverydeck pucpr
Orientation deliverydeck pucprOrientation deliverydeck pucpr
Orientation deliverydeck pucpr
 
The why and what of testa
The why and what of testaThe why and what of testa
The why and what of testa
 
LETS PUBLISH WITH MORE RELIABLE & PRESENTABLE MODELLING.pptx
LETS PUBLISH WITH MORE RELIABLE & PRESENTABLE MODELLING.pptxLETS PUBLISH WITH MORE RELIABLE & PRESENTABLE MODELLING.pptx
LETS PUBLISH WITH MORE RELIABLE & PRESENTABLE MODELLING.pptx
 
Addictive links: Adaptive Navigation Support in College-Level Courses
Addictive links: Adaptive Navigation Support in College-Level CoursesAddictive links: Adaptive Navigation Support in College-Level Courses
Addictive links: Adaptive Navigation Support in College-Level Courses
 
L4: Systematic Approach
L4: Systematic ApproachL4: Systematic Approach
L4: Systematic Approach
 
L4: Systematic Approach
 L4: Systematic Approach L4: Systematic Approach
L4: Systematic Approach
 
PISD Assessment and PLC Presentation
PISD Assessment and PLC PresentationPISD Assessment and PLC Presentation
PISD Assessment and PLC Presentation
 
Rubric design workshop
Rubric design workshopRubric design workshop
Rubric design workshop
 
Pragmatic software testing education - SIGCSE 2019
Pragmatic software testing education - SIGCSE 2019Pragmatic software testing education - SIGCSE 2019
Pragmatic software testing education - SIGCSE 2019
 

More from Yun Huang

LAK21 Data Driven Redesign of Tutoring Systems (Yun Huang)
LAK21 Data Driven Redesign of Tutoring Systems (Yun Huang)LAK21 Data Driven Redesign of Tutoring Systems (Yun Huang)
LAK21 Data Driven Redesign of Tutoring Systems (Yun Huang)Yun Huang
 
UMAP16: A Framework for Dynamic Knowledge Modeling in Textbook-Based Learning
UMAP16: A Framework for Dynamic Knowledge Modeling in Textbook-Based LearningUMAP16: A Framework for Dynamic Knowledge Modeling in Textbook-Based Learning
UMAP16: A Framework for Dynamic Knowledge Modeling in Textbook-Based LearningYun Huang
 
2014UMAP Student Modeling with Reduced Content Models
2014UMAP Student Modeling with Reduced Content Models2014UMAP Student Modeling with Reduced Content Models
2014UMAP Student Modeling with Reduced Content ModelsYun Huang
 
2015EDM: Feature-Aware Student Knowledge Tracing Tutorial
2015EDM: Feature-Aware Student Knowledge Tracing Tutorial2015EDM: Feature-Aware Student Knowledge Tracing Tutorial
2015EDM: Feature-Aware Student Knowledge Tracing TutorialYun Huang
 
2015EDM: A Framework for Multifaceted Evaluation of Student Models (Polygon)
2015EDM: A Framework for Multifaceted Evaluation of Student Models (Polygon)2015EDM: A Framework for Multifaceted Evaluation of Student Models (Polygon)
2015EDM: A Framework for Multifaceted Evaluation of Student Models (Polygon)Yun Huang
 
2014 EDM paper: The Problem Solving Genome: Analyzing Sequential Patterns of ...
2014 EDM paper: The Problem Solving Genome: Analyzing Sequential Patterns of ...2014 EDM paper: The Problem Solving Genome: Analyzing Sequential Patterns of ...
2014 EDM paper: The Problem Solving Genome: Analyzing Sequential Patterns of ...Yun Huang
 
EDM2014 paper: General Features in Knowledge Tracing to Model Multiple Subski...
EDM2014 paper: General Features in Knowledge Tracing to Model Multiple Subski...EDM2014 paper: General Features in Knowledge Tracing to Model Multiple Subski...
EDM2014 paper: General Features in Knowledge Tracing to Model Multiple Subski...Yun Huang
 

More from Yun Huang (7)

LAK21 Data Driven Redesign of Tutoring Systems (Yun Huang)
LAK21 Data Driven Redesign of Tutoring Systems (Yun Huang)LAK21 Data Driven Redesign of Tutoring Systems (Yun Huang)
LAK21 Data Driven Redesign of Tutoring Systems (Yun Huang)
 
UMAP16: A Framework for Dynamic Knowledge Modeling in Textbook-Based Learning
UMAP16: A Framework for Dynamic Knowledge Modeling in Textbook-Based LearningUMAP16: A Framework for Dynamic Knowledge Modeling in Textbook-Based Learning
UMAP16: A Framework for Dynamic Knowledge Modeling in Textbook-Based Learning
 
2014UMAP Student Modeling with Reduced Content Models
2014UMAP Student Modeling with Reduced Content Models2014UMAP Student Modeling with Reduced Content Models
2014UMAP Student Modeling with Reduced Content Models
 
2015EDM: Feature-Aware Student Knowledge Tracing Tutorial
2015EDM: Feature-Aware Student Knowledge Tracing Tutorial2015EDM: Feature-Aware Student Knowledge Tracing Tutorial
2015EDM: Feature-Aware Student Knowledge Tracing Tutorial
 
2015EDM: A Framework for Multifaceted Evaluation of Student Models (Polygon)
2015EDM: A Framework for Multifaceted Evaluation of Student Models (Polygon)2015EDM: A Framework for Multifaceted Evaluation of Student Models (Polygon)
2015EDM: A Framework for Multifaceted Evaluation of Student Models (Polygon)
 
2014 EDM paper: The Problem Solving Genome: Analyzing Sequential Patterns of ...
2014 EDM paper: The Problem Solving Genome: Analyzing Sequential Patterns of ...2014 EDM paper: The Problem Solving Genome: Analyzing Sequential Patterns of ...
2014 EDM paper: The Problem Solving Genome: Analyzing Sequential Patterns of ...
 
EDM2014 paper: General Features in Knowledge Tracing to Model Multiple Subski...
EDM2014 paper: General Features in Knowledge Tracing to Model Multiple Subski...EDM2014 paper: General Features in Knowledge Tracing to Model Multiple Subski...
EDM2014 paper: General Features in Knowledge Tracing to Model Multiple Subski...
 

Recently uploaded

VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...Suhani Kapoor
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxTanveerAhmed817946
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 

Recently uploaded (20)

VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptx
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 

Is Our Evaluation of Student Models Accurately Measuring Educational Impact

  • 1. Your model is predictive but is it useful?− Theoretical and Empirical Considerations of a New Paradigm for Adaptive Tutoring Evaluation José P. González-Brenes, Pearson Yun Huang, University of Pittsburgh 1
  • 2. Main point of the paper: We are not evaluating student models correctly New paradigm, Leopard, may help 2
  • 3. 3
  • 4. Why are we here? 4
  • 5. 5
  • 7. Should we just publish at KDD*? *or other data mining venue 7
  • 9. 9 Is our research helping learners?
  • 10. Adaptive Intelligent Tutoring Systems: Systems that teach and adapt content to humans teach   collect  evidence   10
  • 11. Paper writing: Researchers quantify the improvements of the systems compared Not a purely academic pursuit: Superintendents to choose between alternative technology Teachers choose between systems 11
  • 12. Randomized Controlled Trials may measure the time students spent on tutoring, and their performance on post-tests 12
  • 13. Difficulties of Randomized Controlled Trials: •  IRB approval •  experimental design by an expert •  recruiting (and often payment!) of enough participants to achieve statistical power •  data analysis 13
  • 14. How do other A.I. disciplines do it? 14
  • 15. Bleu [Papineni et al ‘01]: machine translation systems Rouge [Lin et al ’02]: automatic summarization systems Paradise [Walker et al ’99]: spoken dialogue systems 15
  • 16. Using automatic metrics can be very positive: •  Cheaper experimentation •  Faster comparisons •  Competitions that accelerate progress 16
  • 17. Automatic metrics do not replace RCTs 17
  • 18. What does the Educational Data Mining community do? Evaluate the student model using classification accuracy metrics like RMSE, AUC of ROC, accuracy… (Literature reviews by Pardos, Pelánek, … ) 18
  • 19. Other fields verify that automatic metrics correlated with the target behavior [Eg.: Callison-Burch et al ’06] 19
  • 20. Ironically, we have a growing body of evidence that classification evaluation metrics are a BAD way to evaluate adaptive tutors Read Baker and Beck papers on limitations / problems of these evaluation metrics 20
  • 21. Surprisingly, in spite of all of the evidence against using classification evaluation metrics, their use is still very widespread in the adaptive literature* Can we do better? * ExpOppNeed [Lee & Brunskill] is an exception 21
  • 22. The rest of this talk: •  Leopard Paradigm – Teal – White •  Meta-evaluation •  Discussion 22
  • 24. •  Leopard: Learner Effort-Outcomes Paradigm •  Leopard quantifies the effort and outcomes of students in adaptive tutoring 24
  • 25. •  Effort: Quantifies how much practice the adaptive tutor gives to students. Eg., number of items assigned to students, amount of time… •  Outcome: Quantifies the performance of students after adaptive tutoring 25
  • 26. •  Measuring effort and outcomes is not novel by itself (e.g, RCT) •  Leopard’s contribution is measuring both without a randomized control trial •  White and Teal are metrics that operationalize Leopard 26
  • 27. White: Whole Intelligent Tutoring system Evaluation White performs a counterfactual simulation (“What Would the Tutor Do?”) to estimate how much practice students receive 27
  • 28. 28 Design desiderata: Evaluation metric should be easy to use Same, or similar input than conventional metrics
  • 29. 29 Bob Alice skill q 1 student u .5 Bob s1 s1 Bob Alice s1 .7 Bob s13 .7 s1 .8 6 1 yt 0 4 1 .6 1 1 Bob 2 3 4 Bob .7 2 .9 1Bob .5 t 1 Alice s1 1 0 s1 s1 s1 0 s1 s1 1 ct+1 .6 Alice 0 .4 .9 ˆ actual performance predicted performance cu,q,t+1 yu,q,tˆ
  • 30. 30 Bob Alice skill q 1 student u .5 Bob s1 s1 Bob Alice s1 .7 Bob s13 .7 s1 .8 6 1 yt 0 4 1 .6 1 1 Bob 2 3 4 Bob .7 2 .9 1Bob .5 t 1 Alice s1 1 0 s1 s1 s1 0 s1 s1 1 ct+1 .6 Alice 0 .4 .9 2/3 4/5 score score effort= effort= ˆ actual performance predicted performance cu,q,t+1 yu,q,tˆ Alternatively, we can model error
  • 32. Q/ What if student does not achieve target performance? A: “Visible” imputation 32
  • 34. Compare: •  Conventional classification metrics •  Leopard metrics (White) 34
  • 35. Datasets: •  Data from a middle-school Math commercial tutor –  1.2 million observations –  25,000 students –  Item to skill mapping: •  Coarse: 27 skills •  Fine: 90 skills •  (Other item-to-skill model not reported) •  Synthetic data 35
  • 36. Assessing an evaluation metric with real student data is difficult because we often do not know the ground truth Insight: Use data that we know a priori its behavior in an adaptive tutor 36
  • 37. For adaptive tutoring to be able to optimize when to stop instruction, the student performance should increase with repeated practice (the learning curve should be increasing) Decreasing /flat learning curve = bad data 37
  • 38. 38
  • 39. 39
  • 40. Procedure: 1.  Select skills with decreasing/flat learning curve (aka bad data) 2.  Train a student model on those skills 3.  Compare classification metrics with Leopard 40
  • 41. F1 AUC Score Effort Bad student model .79 .85 Majority class 0 .50 41
  • 42. F1 AUC Score Effort Bad student model .79 .85 .18 10.1 Majority class 0 .50 .18 11.2 42
  • 43. What does this mean? •  High accuracy models may not be useful for adaptive tutoring •  We need to change how we report results in adaptive tutoring 43
  • 44. Solutions •  Report classification accuracy averaged over skills (for models with 1 skill per item) ✖ Not useful for comparing or discovering different skill models •  Report as “difficulty” baseline ✖ Experiments suggest that models with baseline performance can be useful •  Use Leopard 44
  • 45. Let’s use all data, and pick an item-to-skill mapping: AUC Score Effort Coarse (27 skills) .69 Fine (90 skills) .74 45
  • 46. Let’s use all data, and pick an item-to-skill mapping: AUC Score Effort Coarse (27 skills) .69 .41   55.7   Fine (90 skills) .74 .36   88.1   46
  • 47. With synthetic data we can use Teal as the ground truth We generate 500 synthetic datasets with known Knowledge Tracing Parameters Which metrics correlate best to the truth? 47
  • 48. 48
  • 49. 49
  • 51. 51 In EDM 2014 we proposed “FAST” toolkit for Knowledge Tracing with Features
  • 52. “FAST model improves 25% AUC of ROC”
  • 53.
  • 54. 54
  • 55. Input Teal White Knowledge Tracing Family parameters Student’s correct or incorrect response Sequence length Student models’ prediction of correct/ incorrect Target Probability of correct Target Probability of correct 55