Complex skill mastery requires not only acquiring individual basic component skills, but also practicing the integration of such basic skills. However, traditional approaches to knowledge modeling, such as Bayesian Knowledge Tracing, only trace knowledge of each decomposed basic component skill. This risks asserting mastery too early, or remediation that fails to address skill integration. We propose a diagnostic Bayesian network based on a hierarchical integration graph for learner knowledge modeling. We assess the value of such a model from four aspects: performance prediction, parameter plausibility, expected instructional effectiveness, and real-world recommendation helpfulness. Our experiments with a Java programming dataset and a user study based on a Java programming tutor show that the proposed model significantly outperforms two popular multiple-skill knowledge tracing models on all four aspects. Our work is a first step towards building learner models that are sensitive to the context of skill application, for modeling and promoting students' robust learning.
1. Learner Modeling for Integration Skills
Yun Huang¹, Julio Guerra-Hollstein¹,², Jordan Barria-Pineda¹, Peter Brusilovsky¹
¹University of Pittsburgh, ²Universidad Austral de Chile
07/11/2017 @ UMAP
2. How do students develop mastery?
ACQUIRE Component Skills → PRACTICE Integrating Skills → KNOW WHEN TO APPLY Skills → MASTERY
(Ambrose, Susan A., et al. How Learning Works: Seven Research-Based Principles for Smart Teaching. 2010.)
3. Empirical evidence showing difficulty in integration?
• Algebra
  ◦ Composition effect (Heffernan & Koedinger '97; Koedinger & McLaughlin '16)
    ◦ Translating two matched one-step problems (800 - y and 40x) is easier than
    ◦ translating the two-step story problem into a single expression (800 - 40x)
  ◦ Intervention study (Koedinger & McLaughlin '10)
4. Empirical evidence showing difficulty in integration?
• Programming
  ◦ Patterns in programming expertise (Gilmore & Green '88; Soloway & Ehrlich '84)

# Pattern of Sentinel Input Processing: read a first value, loop until the
# sentinel (-300) appears, and read the next value at the end of each pass.
# Pattern of Summing a Sequence: initialize an accumulator, then add each value.
print("Enter temperatures, -300 to stop")
count = 0
sum = 0.0
temp = float(input("First: "))
while temp > -300.0:
    sum += temp
    count += 1
    temp = float(input("Next: "))

(The slide shows this program twice, once highlighting the Pattern of Sentinel Input Processing and once the Pattern of Summing a Sequence.)
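Interleaving the two plans in one loop is exactly the kind of integration at issue. A minimal Python sketch of the complete task, assuming the readings come from a list instead of input() so the logic can be tested, and that the goal is the average (a final step the snippet above stops short of). The function name average_until_sentinel is hypothetical.

```python
def average_until_sentinel(readings, sentinel=-300.0):
    """Return the mean of the values read before the sentinel, or None if there are none.

    Sentinel input processing: stop at the first value <= sentinel, mirroring
    the `while temp > -300.0` test above.
    Summing a sequence: keep a running total and a count of processed values.
    """
    total = 0.0
    count = 0
    for temp in readings:
        if temp <= sentinel:  # sentinel reached: stop processing input
            break
        total += temp         # accumulate the running sum
        count += 1            # count processed values for the average
    return total / count if count else None


print(average_until_sentinel([20.0, 22.0, 24.0, -300.0]))  # 22.0
```

A learner who can write each plan in isolation still has to decide where the sentinel test and the accumulation go relative to each other; that placement is what the integration skill adds.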
5. Empirical evidence showing difficulty in integration?
• Our recent studies demonstrate integration difficulty in program comprehension

public static void main(String[] args) {
    int y = 1;
    for (int j = 5; j < 8; j++) {
        y += j;
    }
    System.out.print(y);
}
What is the output of the program?

public static void main(String[] args) {
    int z = 8;
    int j = 7;
    z += j;
    System.out.print(z);
    j = 1;
    z += j;
    System.out.print(z);
    j = 3;
    z += j;
    System.out.print(z);
    for (int k = 1; k < 4; k++) {
        System.out.print(k);
    }
}
What is the output of the program?

Success rates on the two questions: 64% and 39%.
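The answers to the two questions can be checked by transliterating each Java main method line by line. In this Python sketch the function names are hypothetical; each function collects what System.out.print would emit, and because print adds no newline in the Java code, successive outputs run together.

```python
def program_one():
    # First program: a for loop integrated with +=
    y = 1
    for j in range(5, 8):  # j = 5, 6, 7, matching j < 8
        y += j
    return str(y)


def program_two():
    # Second program: += and for used separately, never combined
    out = []
    z = 8
    j = 7
    z += j
    out.append(str(z))
    j = 1
    z += j
    out.append(str(z))
    j = 3
    z += j
    out.append(str(z))
    for k in range(1, 4):  # k = 1, 2, 3, matching k < 4
        out.append(str(k))
    return "".join(out)


print(program_one())  # 19
print(program_two())  # 151619123
```

Both programs exercise the same two basic skills (for and +=); only the first requires combining them in a single construct.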
7. Limited evaluation by performance prediction
◦ Is it worthwhile to make such fine-grained refinements of learner models?
◦ Will traditional learner model evaluation metrics reveal the effect?
◦ Our recent work: performance prediction is not enough!
  ◦ Highly predictive models can be useless for adaptive tutoring [1, 2]
  ◦ Similarly predictive models can be very different for adaptive tutoring [1, 2]
8. Approach
We propose and demonstrate the effectiveness of:
• A new knowledge graph defining progressive integration skills
• A new learner model monitoring students' integration skills
• A multifaceted evaluation framework for complex latent variable models
10. Integration-level Learner Model
Conjunctive Knowledge Modeling with Hierarchical Integration Skills (CKM-HI)
• Based on an integration graph (pairwise)
• Basic skills and integration skills are represented separately
• Latent skills are organized hierarchically
[Figure: the CKM-HI network. Latent nodes: basic component skills (e.g., for, +=, a[]) and integration skills (e.g., for&+=, for&a[]); observed nodes: items.]
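The integration graph and a conjunctive item model can be sketched as plain data structures. The skill names follow the examples on this slide and in the speaker notes; the third-level node for&+=&a[] (summing an array with a for loop), the helpers components, level and p_correct, and the guess/slip values are hypothetical illustrations. The noisy-AND here is a simple deterministic-input gate, not CKM-HI's actual parameterization.

```python
# Hypothetical pairwise integration graph: each integration skill maps to the
# two skills (basic or integration) it combines.
INTEGRATION_GRAPH = {
    "for&+=": ("for", "+="),
    "for&a[]": ("for", "a[]"),
    "for&+=&a[]": ("for&+=", "for&a[]"),  # illustrative third level: summing an array
}


def components(skill, graph=INTEGRATION_GRAPH):
    """Return the set of basic component skills underlying `skill`."""
    if skill not in graph:  # basic skills have no parents in the graph
        return {skill}
    result = set()
    for parent in graph[skill]:
        result |= components(parent, graph)
    return result


def level(skill, graph=INTEGRATION_GRAPH):
    """Basic skills are level 0; an integration skill is one level above its highest parent."""
    if skill not in graph:
        return 0
    return 1 + max(level(parent, graph) for parent in graph[skill])


def p_correct(item_skills, mastered, guess=0.2, slip=0.1):
    """Conjunctive (noisy-AND) response model: answer correctly with
    probability 1 - slip only if every skill the item requires is mastered,
    otherwise with probability `guess`."""
    if all(skill in mastered for skill in item_skills):
        return 1 - slip
    return guess


# A student who has mastered only the basic skills, on an item that also
# requires the integration skill for&+=:
print(p_correct({"for", "+=", "for&+="}, mastered={"for", "+="}))  # 0.2
```

Because the integration skill is its own node, a student who knows for and += separately still answers such an item correctly only at the guess rate, a gap that a model tracing basic skills alone cannot represent.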
11. Multifaceted Evaluation Framework
• Performance prediction
  ◦ RMSE, AUC
• Parameter plausibility
  ◦ Parameters for capturing noise (guess, slip) should be small
• Expected instructional effectiveness
  ◦ How much effort does a student need to reach a specific score, assuming students practice under the guidance of a learner model?
• Real-world recommendation helpfulness (user study)
  ◦ How do students rank recommendations from different learner models?
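The two performance-prediction metrics can be computed as follows. This is a minimal, dependency-free sketch with hypothetical function names; AUC is computed through its rank interpretation (in practice a library routine such as scikit-learn's roc_auc_score would be used).

```python
import math


def rmse(y_true, y_prob):
    """Root mean squared error between 0/1 outcomes and predicted P(correct)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true))


def auc(y_true, y_prob):
    """Probability that a random correct attempt receives a higher predicted
    P(correct) than a random incorrect one; ties count as one half."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both correct and incorrect attempts")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


outcomes = [1, 0, 1, 0]
predictions = [0.9, 0.1, 0.8, 0.3]
print(rmse(outcomes, predictions), auc(outcomes, predictions))
```

A model that predicts the same probability for every attempt gets an AUC of 0.5, which is the usual chance baseline.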
12. Dataset and Experimental Setup
• QuizJET system
• 25,988 attempts, 347 students, 91 questions, 67% correct
• 72 basic individual skills, 43 integration skills
• 10-fold student-stratified cross-validation:
  • In each fold, train on 90% of students and test on the remaining 10% (new students).
• Sequential update by Bayesian rule
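Both parts of the setup can be sketched in a few lines, assuming attempts are keyed by student ID. The stratification criterion used in the study is not stated on the slide, so student_folds below is only a shuffled student-level split. bayes_update is the standard single-skill knowledge-tracing update (Corbett & Anderson [3]) with guess, slip and an optional learning step; CKM-HI applies Bayes' rule jointly over its whole network, so this per-skill version is a simplification. Both function names are hypothetical.

```python
import random


def student_folds(student_ids, k=10, seed=0):
    """Split *students* (not attempts) into k disjoint folds.

    All attempts of a student land in the same fold, so each test fold
    contains only students the model has never seen.
    """
    students = sorted(set(student_ids))
    random.Random(seed).shuffle(students)
    return [students[i::k] for i in range(k)]


def bayes_update(p_mastered, correct, guess, slip, learn=0.0):
    """Posterior P(mastered) after one attempt, followed by an optional learning step."""
    if correct:
        evidence_if_mastered = p_mastered * (1 - slip)
        evidence_if_not = (1 - p_mastered) * guess
    else:
        evidence_if_mastered = p_mastered * slip
        evidence_if_not = (1 - p_mastered) * (1 - guess)
    posterior = evidence_if_mastered / (evidence_if_mastered + evidence_if_not)
    return posterior + (1 - posterior) * learn


# Sequentially update one skill over an attempt sequence (1 = correct).
p = 0.3
for outcome in [1, 0, 1, 1]:
    p = bayes_update(p, outcome, guess=0.2, slip=0.1, learn=0.1)
```

With 347 students and k = 10, each fold holds 34 or 35 students, roughly the 90%/10% split described above.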
13. Performance Prediction and Parameter Plausibility
• CKM-HI significantly outperforms WKT and CKM in both aspects
* sig. at 0.05/3=0.017, ** sig. at 0.01/3=0.0033, *** sig. at 0.001/3=0.00033.
+ effect size ≥ 1 (large).
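The thresholds above are Bonferroni corrections: each nominal level is divided by 3, presumably the number of pairwise comparisons among the three models (CKM-HI vs. WKT, CKM-HI vs. CKM, CKM vs. WKT). A small sketch with a hypothetical helper that uses strict inequality:

```python
def significance_stars(p_value, n_comparisons=3):
    """Star a p-value using Bonferroni-corrected thresholds alpha / n_comparisons."""
    for alpha, stars in ((0.001, "***"), (0.01, "**"), (0.05, "*")):
        if p_value < alpha / n_comparisons:
            return stars
    return ""


for alpha in (0.05, 0.01, 0.001):
    print(f"{alpha} / 3 = {alpha / 3:.2g}")  # 0.017, 0.0033, 0.00033
```

For example, p = 0.01 earns only one star here, even though it would pass an uncorrected 0.01 test.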
14. Expected Instructional Effectiveness
• Computed from the collected data, focusing on the higher mastery-threshold region
• To reach the same score, students under CKM-HI need the least effort
• With the same effort, students under CKM-HI get the highest score
15. Expected Instructional Effectiveness
• Extends our prior evaluation framework LEOPARD (EDM '15) [1]
• Metrics:
  ◦ Score: the mean performance on real data after a learner model asserts mastery of the set of required skills.
  ◦ Effort: the number of practice attempts on real data needed to reach the mastery inferred by a learner model.
  ◦ Consider a range of mastery thresholds
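Under the definitions on this slide, Score and Effort for one mastery threshold, and the sweep over a range of thresholds, can be sketched as follows. This is a simplified reading, not the LEOPARD implementation: the function names are hypothetical, effort counts attempts up to and including the one after which mastery is first asserted, and students for whom mastery is never asserted contribute all their attempts to effort and nothing to score.

```python
def effort_and_score(correct, p_mastery, threshold):
    """Effort and score for one student's attempts on a set of required skills.

    correct[i] is 1/0 for attempt i and p_mastery[i] is the learner model's
    mastery estimate after attempt i. Effort is the number of attempts until
    the estimate first reaches `threshold`; score is the mean correctness of
    the attempts after that point (None if there are none). If mastery is
    never asserted, effort is all attempts and score is None.
    """
    for i, p in enumerate(p_mastery):
        if p >= threshold:
            rest = correct[i + 1:]
            return i + 1, (sum(rest) / len(rest) if rest else None)
    return len(correct), None


def sweep_thresholds(students, thresholds):
    """Mean effort and mean score across students for each mastery threshold."""
    results = {}
    for threshold in thresholds:
        pairs = [effort_and_score(c, p, threshold) for c, p in students]
        efforts = [effort for effort, _ in pairs]
        scores = [score for _, score in pairs if score is not None]
        mean_score = sum(scores) / len(scores) if scores else None
        results[threshold] = (sum(efforts) / len(efforts), mean_score)
    return results
```

Comparing learner models then amounts to running the same sweep with each model's mastery estimates on the same real attempt sequences: a better model reaches a given score with less effort.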
16. User Study Setup
• Solve 7 Java comprehension problems and rank recommended subproblems
• 20 participants pursuing undergraduate or master's degrees in information science at the University of Pittsburgh
• 1.5 h session on average
• Compare 3 learner models (CKM-HI, CKM, WKT) + 1 distractor; each recommends 2 subproblems, mixed together:
  ◦ Identify the weakest skill and pick a subproblem addressing this skill
  ◦ Identify the 2nd weakest skill and pick a subproblem addressing this skill
• Compare under two different recommendation strategies: MaxDiff, MinDiff
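The per-model recommendation step (weakest skill first, then the second weakest) can be sketched as below, assuming each learner model exposes a map from skill to estimated P(mastered) and each subproblem is tagged with the skills it addresses. How the MaxDiff and MinDiff strategies choose among candidate subproblems is not described on the slide, so this sketch simply takes the first candidate in sorted order; recommend and the data layout are hypothetical.

```python
def recommend(mastery, subproblems, n=2):
    """Pick one subproblem for each of the n weakest skills.

    mastery: skill -> estimated P(mastered) from one learner model.
    subproblems: subproblem id -> set of skills it addresses.
    """
    picks = []
    for skill in sorted(mastery, key=mastery.get)[:n]:  # weakest skills first
        candidates = sorted(
            sp for sp, skills in subproblems.items() if skill in skills and sp not in picks
        )
        if candidates:
            picks.append(candidates[0])  # placeholder choice; MaxDiff/MinDiff not modeled
    return picks


mastery = {"for": 0.9, "+=": 0.4, "for&+=": 0.2}
subproblems = {"s1": {"for"}, "s2": {"+=", "for"}, "s3": {"for&+=", "for", "+="}, "s4": {"+="}}
print(recommend(mastery, subproblems))  # ['s3', 's2']
```

Only a model that represents for&+= explicitly can name it as the weakest skill; a basic-skill model would start from += instead.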
18. Does CKM-HI receive the highest ranking?
• CKM-HI receives a significantly higher ranking than the others
• Two ways of analyzing the ranking: as continuous or as ordinal variables
• Two recommendation strategies
• No sig. diff. between WKT and CKM
• All models sig. outperform the distractor
19. Future Work
• Conduct a larger-scale, longer-term study to collect objective measurements.
• Explore skill integration beyond the single context
• Continue to contribute to best practices in evaluating adaptive educational systems
• Automated methods for extracting integration skills that advance our preliminary approach [15]
20. Conclusion
• New knowledge graph: Integration Graph
• New integration-level learner model
  • CKM-HI, which significantly outperforms two popular multiple-skill learner models, WKT and CKM, on the investigated dimensions
• New multifaceted evaluation framework
  • Performance prediction
  • Parameter plausibility
  • Expected instructional effectiveness
  • Real-world recommendation helpfulness (user study)
21. Details in the poster session :)
Thank you very much for listening!
22. References
[1] José P. González-Brenes and Yun Huang. 2015. Your model is predictive – but is it useful? Theoretical and empirical considerations of a new paradigm for adaptive tutoring evaluation. In Proc. 8th Intl. Conf. Educational Data Mining. 187–194.
[2] Yun Huang, José P. González-Brenes, Rohit Kumar, and Peter Brusilovsky. 2015. A framework for multifaceted evaluation of student models. In Proc. 8th Intl. Conf. Educational Data Mining. 203–210.
[3] Albert T. Corbett and John R. Anderson. 1995. Knowledge tracing: Modeling the acquisition of procedural knowledge. User Modeling and User-Adapted Interaction 4, 4 (1995), 253–278.
[4] Yue Gong, Joseph E. Beck, and Neil T. Heffernan. 2010. Comparing knowledge tracing and performance factor analysis by using multiple model fitting procedures. In Intelligent Tutoring Systems. Springer, 35–44.
[5] D. J. Gilmore and T. R. G. Green. 1988. Programming plans and programming expertise. The Quarterly Journal of Experimental Psychology Section A 40, 3 (1 Aug. 1988), 423–442.
[6] Elliot Soloway and Kate Ehrlich. 1984. Empirical Studies of Programming Knowledge. IEEE Trans. Software Engineering SE-10, 5 (1984), 595–609.
[7] Neil T. Heffernan and Kenneth R. Koedinger. 1997. The composition effect in symbolizing: The role of symbol production vs. text comprehension. In Proceedings of the Nineteenth Annual Conference of the Cognitive Science Society.
[8] J. R. Anderson and C. Lebiere. 1998. The Atomic Components of Thought. Erlbaum, Mahwah, NJ.
23. References (cont.)
[9] Cristina Conati, Abigail Gertner, and Kurt VanLehn. 2002. Using Bayesian Networks to Manage Uncertainty in Student Modeling. User Modeling and User-Adapted Interaction 12, 4 (2002), 371–417.
[10] Michael Mayo and Antonija Mitrovic. 2001. Optimising ITS behaviour with Bayesian networks and decision theory.
[11] Eva Millán and José Luis Pérez-De-La-Cruz. 2002. A Bayesian diagnostic algorithm for student modeling and its evaluation. User Modeling and User-Adapted Interaction 12, 2-3 (2002), 281–330.
[12] Cristina Carmona, Eva Millán, José-Luis Pérez-de-la-Cruz, Mónica Trella, and Ricardo Conejo. 2005. Introducing prerequisite relations in a multi-layered Bayesian student model. In International Conference on User Modeling. Springer, 347–356.
[13] José P. González-Brenes, Yun Huang, and Peter Brusilovsky. 2014. General features in knowledge tracing: Applications to multiple subskills, temporal item response theory, and expert knowledge. In Proc. 7th Intl. Conf. Educational Data Mining. 84–91.
[14] Yanbo Xu and Jack Mostow. 2012. Comparison of methods to trace multiple subskills: Is LR-DBN best? In Proc. 5th Intl. Conf. Educational Data Mining. Chania, Greece, 41–48.
[15] Yun Huang, Julio Guerra, and Peter Brusilovsky. 2016. Modeling skill combination patterns for deeper knowledge tracing. In Proceedings of the 6th Workshop on Personalization Approaches in Learning Environments (PALE 2016).
Editor's Notes
Please, prepare a 10 min. presentation of your results
-- According to the highly cited book by Susan Ambrose and colleagues, How Learning Works: Seven Research-Based Principles for Smart Teaching, a student goes through the following process to reach mastery: first, she acquires component skills; then she practices integrating skills; then she learns when to apply skills.
-- Accordingly, a learner model that truly monitors whether a student has reached mastery should be able to tell which of these three levels the student is at.
-- This work focuses on modeling integration skills, in addition to modeling students' component skills.
-- Is there empirical evidence that students really have difficulty with integration and need specific integration practice?
-- In the algebra domain, students were found to be significantly worse at translating two-step algebra story problems into expressions than at translating two closely matched one-step problems. An intervention study showed that giving students deliberate practice on such integration can improve learning.
//////////////////////////
-- These results show that a two-operator problem is harder than its two one-operator parts combined. We call this the composition effect.
-- Learning to symbolize story problems could be better enhanced through practice on dissimilar-looking substitution exercises than through practice on more similar-looking story problems.
-- In the programming domain, educators and researchers in the psychology of programming have long argued that programming plans or patterns form an important part of programming expertise. Here are two typical patterns: summing a sequence, and sentinel input processing.
-- In our recent studies of program comprehension in Java and Python across several topics, we consistently find that many students fail integration problems even though they can apply the individual skills separately without problems.
/////////////////////////////////////
Java (28 students): 0.3929 ***+++ (sig. at 0.001, large effect size)
Python (80 students): 0.111 *+ (sig. at 0.05, small effect size)
* sig. at 0.05, ** sig. at 0.01, *** sig. at 0.001; + small effect size (0.2), ++ medium effect size (0.5), +++ large effect size (0.8)
-- The above examples from psychology and education all show the importance of practicing integration skills; however, very little effort has been made to address this problem in the learner modeling domain.
-- Existing popular learner models for multiple-skill practice mostly model skills independently. For example, here we show the graphical models of two typical models from two different families. Neither addresses integration skills. The danger is that they could assert mastery before students can fluently integrate skills and apply them in different contexts, or fail to identify the specific integrations students have problems with.
/////////////////////
In this avenue of work, some models use a hierarchical structure among skills, but they focus either on prerequisite relations among intrinsically different skills [7, 9, 25] or on granularity relationships [17, 33, 39], where the parent nodes denote more abstract, general skills.
-- However, is it worthwhile to make such fine-grained refinements of learner models? Will traditional evaluation metrics reveal the effect? Our recent work has shown that performance prediction is not enough for evaluating learner models: first, highly predictive models can still be useless for adaptive tutoring; second, similarly predictive models can behave very differently in adaptive tutoring.
-- In this work, we proposed and demonstrated the effectiveness of:
A new knowledge graph defining progressive integration skills
A new learner model monitoring students’ integration skills
A multifaceted evaluation framework for complex latent variable models
-- First, let's look at the integration graph that we proposed. It aims to show the skill progression in a domain, from basic component skills to more complex integration skills. For example, students first need to know how to use simple addition assignment (+=), iterate with a for loop, and access an array element; then they learn to compute a sum with a for loop and addition assignment, and to iterate through an array with a for loop; finally, they should be able to compute the sum of an array with a for loop.
-- This graph can be constructed by experts or with the help of data mining methods.
-- Based on an integration graph, we construct a Bayesian network, which we call Conjunctive Knowledge Modeling with Hierarchical Integration skills (CKM-HI), for modeling basic and integration skills. We chose a Bayesian network because it gives teachers and the recommendation engine a clear picture of which skills students are struggling with.
-- In the CKM-HI model, basic skills and integration skills are represented separately, so that the target of remediation can be clearly identified.
-- Latent skills are organized hierarchically. This hierarchical structure allows efficiency and accuracy in inference: once a student has mastered an integration skill, they should already have mastered its component skills. This avoids tedious assessment and the over-practicing of basic component skills.
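The effect of this hierarchy on inference can be seen in a three-node toy network: two basic skills (for, +=) and the integration skill for&+=. This is an illustrative sketch with made-up parameters, not CKM-HI's actual conditional probability tables: it assumes the integration skill can only be mastered when both components are, and a noisy-AND observation with guess 0.2 and slip 0.1. Exact enumeration over the eight states gives the posterior after one correct answer on an item requiring for&+=; the function name is hypothetical.

```python
from itertools import product


def posterior_after_correct(p_for=0.6, p_plus=0.6, p_integrate=0.5, guess=0.2, slip=0.1):
    """Exact inference by enumeration in a toy network for -> for&+= <- +=.

    One correct answer is observed on an item that requires the integration
    skill for&+=. Returns posterior P(mastered) for each of the three skills.
    """
    joint = {}
    for f, a, i in product((0, 1), repeat=3):
        prior_f = p_for if f else 1 - p_for
        prior_a = p_plus if a else 1 - p_plus
        # Hierarchy: the integration skill can be mastered only if both components are.
        p_i_on = p_integrate if (f and a) else 0.0
        prior_i = p_i_on if i else 1 - p_i_on
        likelihood = (1 - slip) if i else guess  # P(correct answer | integration state)
        joint[(f, a, i)] = prior_f * prior_a * prior_i * likelihood
    total = sum(joint.values())

    def marginal(index):
        return sum(v for state, v in joint.items() if state[index] == 1) / total

    return {"for": marginal(0), "+=": marginal(1), "for&+=": marginal(2)}


print(posterior_after_correct())
```

With these numbers, one correct answer raises P(for&+=) from 0.18 to about 0.50 and P(for) from 0.6 to about 0.75. Because P(for | for&+=) = 1 in this table, evidence for the integration skill also counts as evidence for its components, which is what spares students redundant practice on basic skills.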
/////////////
Integration skills are directly connected to items instead
Each integration skill node has its own parent node for cognitive load
Binomial distributions are used for integration skill nodes
Noisy-AND is maintained. However, the core characteristics that allow CKM-HI to model integration skills lie less in its conjunctive modeling of the skill-to-item relationship and more in how it represents integration skills.
//////////////
-- This skill model maps 4 basic component skills to each item on average (ranging from 1 to 8), out of a total of 72.
-- The final integration skill mapping indexes 2 integration skills per item on average (ranging from 1 to 5), out of a total of 43.
CKM significantly outperforms WKT
conjunctive modeling is better
* sig. at 0.05/3=0.017, ** sig. at 0.01/3=0.0033, *** sig. at 0.001/3=0.00033. + effect size ≥ 1 (large).
This is based on Test_Obs, which follows the student order (slightly different from the version before 2017/02/06, when Test_Obs did not follow the student order).
"In the social sciences, you may see values around .2 as a small effect, .5 as a medium effect, and .8 as a large effect size."
Cohen classified effect sizes as small (d = 0.2), medium (d = 0.5), and large (d ≥ 0.8). According to Cohen, "a medium effect of .5 is visible to the naked eye of a careful observer. A small effect of .2 is noticeably smaller than medium but not so small as to be trivial. A large effect of .8 is the same distance above the medium as small is below it."
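Cohen's d for two independent samples, with the pooled standard deviation, and the labels quoted above can be sketched as follows (function names hypothetical). The .1/.3/.5 thresholds noted further down are the usual conventions for correlation-type effect sizes such as r, not for d.

```python
import math
import statistics


def cohens_d(x, y):
    """Cohen's d for two independent samples using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    vx, vy = statistics.variance(x), statistics.variance(y)  # sample variances (n - 1)
    pooled_sd = math.sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))
    return (statistics.mean(x) - statistics.mean(y)) / pooled_sd


def effect_label(d):
    """Cohen's conventional labels: 0.2 small, 0.5 medium, 0.8 large."""
    d = abs(d)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "negligible"


d = cohens_d([2, 4, 6], [1, 3, 5])
print(d, effect_label(d))  # 0.5 medium
```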
-- non-parametric: R test
-- meta-analysis (analysis for publications for consistency): 0.76-0.79 (biggest one so far)
Distractor: randomly picked from irrelevant subproblems
////////
Ranking data analysis (with imputation randommaxsub_fillbymax)
Imputation: 88/959 with missing ranking (-1), 34/959 can be filled with ranking
".1 as a small effect, .3 as a medium effect, and .5 as a large effect size."
ranking_mg_rankbymodel_20stu_corrected_randommaxsub_fillbymax
ranking_mg_rankbymodel_20stu_corrected_choosemaxsub_fillbymax: not sig. any more by all means