Computer Generated Items, Within-Template Variation, and the Impact on the Parameters of Response Models.
Master's thesis talk related to Lathrop, Q.N., Cheng, Y. Item Cloning Variation and the Impact on the Parameters of Response Models. Psychometrika 82, 245–263 (2017). https://doi.org/10.1007/s11336-016-9513-1
5. Item Response Theory
i = 1, 2, ..., I for Items
p = 1, 2, ..., N for Persons
Ypi = 0 or 1
ηpi = αi × (θp − βi)
$P(Y_{pi} = 1) = \frac{\exp(\eta_{pi})}{1 + \exp(\eta_{pi})}$
θp is Person Ability
βi is Item Difficulty
αi is Item Discrimination
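As a minimal sketch of the response model on this slide, the function below computes the probability of a correct response from θ, α, and β. The function name `p_correct` and the example values are illustrative assumptions, not part of the talk.

```python
import math

def p_correct(theta: float, alpha: float, beta: float) -> float:
    """Probability of a correct response: exp(eta) / (1 + exp(eta))."""
    eta = alpha * (theta - beta)          # linear predictor
    # 1 / (1 + exp(-eta)) is algebraically the same as exp(eta) / (1 + exp(eta))
    return 1.0 / (1.0 + math.exp(-eta))

# When ability equals difficulty, eta = 0 and the probability is one half.
print(p_correct(theta=0.0, alpha=1.2, beta=0.0))   # 0.5
# Higher ability on the same item gives a higher probability.
print(p_correct(theta=1.0, alpha=1.2, beta=0.0))
```

A larger α makes the probability change more steeply as θ moves away from β.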
6. New Technology Leads to New Psychometrics
Summer Internship with Pearson’s Center for Digital Data, Analytics & Adaptive Learning
Computer/tablet-based course: lots of data
Technology allows items to be generated algorithmically from templates
13. What are templates?
Templates generate items/tasks during computerized assessment.
• Create a secure and inexpensive item bank
• Create miniature randomized experiments
• Students can repeat and practice templates
Templates contain:
• a question form
• distributions for all variables in the question form
18. Examples of Templates
Question Form:
“What is X + Y?”
Distributions:
fX(x) = 1/5, x ∈ {1, 2, 3, 4, 5}
fY(y) = 1/6, y ∈ {3, 4, 5, 6, 7, 8}
Question Form:
“What is the average of x1, x2, x3, x4, x5?”
Distributions:
x1, ..., x5 ∼ Binom(40, 0.5)
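The two example templates can be sketched as small generator functions: each call fills the question form with fresh draws from the stated distributions. The function names, the dictionary layout, and the assumption that the five binomial values are drawn independently are illustrative choices, not details from the talk.

```python
import random

def addition_template(rng: random.Random) -> dict:
    """Draw one item from the 'What is X + Y?' template."""
    x = rng.choice([1, 2, 3, 4, 5])        # fX(x) = 1/5
    y = rng.choice([3, 4, 5, 6, 7, 8])     # fY(y) = 1/6
    return {"question": f"What is {x} + {y}?", "answer": x + y}

def average_template(rng: random.Random) -> dict:
    """Draw one item from the 'average of x1, ..., x5' template."""
    # Each value is Binom(40, 0.5), built here as 40 fair coin flips
    # (independence across the five values is an assumption).
    xs = [sum(rng.random() < 0.5 for _ in range(40)) for _ in range(5)]
    text = "What is the average of " + ", ".join(map(str, xs)) + "?"
    return {"question": text, "answer": sum(xs) / 5}

rng = random.Random(2014)                  # fixed seed for repeatability
for _ in range(3):
    print(addition_template(rng)["question"])
print(average_template(rng)["question"])
```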
24. Our Motivating Template
Question Form:
“What is the probability of rolling a X on a Y-sided die?”
Distributions:
fY(y) = 1/5, y ∈ {6, 8, 10, 12, 20}
fX(x) = 1/y, x ∈ {1, 2, ..., y}
Correct strategy: 1/y
An incorrect strategy: x/y
For a subset (x = 1), students can use the wrong strategy and still get the correct answer!
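To see how large that subset is, the sketch below enumerates every item this template can generate and finds those where the incorrect strategy x/y happens to equal the correct answer 1/y. The variable names are illustrative; the distributions are the ones stated on the slide.

```python
from fractions import Fraction

SIDES = [6, 8, 10, 12, 20]

# Every (x, y) item with its probability of being generated:
# P(Y = y) = 1/5 and P(X = x | Y = y) = 1/y.
items = [(x, y, Fraction(1, len(SIDES)) * Fraction(1, y))
         for y in SIDES for x in range(1, y + 1)]

def correct_answer(x: int, y: int) -> Fraction:
    return Fraction(1, y)

def wrong_strategy(x: int, y: int) -> Fraction:
    return Fraction(x, y)

# Items where the wrong strategy still produces the right answer.
lucky = [(x, y, p) for x, y, p in items
         if wrong_strategy(x, y) == correct_answer(x, y)]
p_lucky = sum(p for _, _, p in lucky)

print(len(items), len(lucky), p_lucky)   # 56 5 21/200
```

So 5 of the 56 distinct items fall in the subset, and a randomly generated item lands there with probability 21/200, or about 10.5%.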
31. Model within-template differences with multi-level IRT?
• Albers (1995)
• Glas and van der Linden (2003)
• Johnson and Sinharay (2005)
Model within-template differences with covariates?
• Fischer (1973)
• de Boeck and Wilson (2004)
Both?
Neither?
32. Publications/Presentations
Lathrop, Q. N. & Cheng, Y. (Under Review). Computer Generated Items, Within-Template Variation, and the Impact on the Parameters of Response Models.
Lathrop, Q. N. (2014). The Impact of Within-Template Systematic Variation. Presented at NCME and regional conferences.
Lathrop, Q. N. & Behrens, J. (2014). Psychometric, Computational, and Interactional Issues in Designing Integrated Assessment and Learning Systems. Presented at AERA.
Lathrop, Q. N. & Cheng, Y. (2013). Modeling Tests Using Templates and Effect of Ignoring Template Structure on Educational Outcomes. Presented at NCME.
37. Notation Changes
Each person responds to templates
• p = 1, 2, ..., N for persons and t = 1, 2, ..., T for templates
Each template has some number of items
• $t_i = 1, 2, \ldots, t_I$
The items within a template may be grouped by a design matrix
• A dummy variable $X_{t_i} = 1$ if $t_i$ is in the “subset” and 0 otherwise
When person p is assigned template t, item $t_i$ is randomly drawn from the available items
• The response is $Y_{pt_i} \sim \text{Bernoulli}\left(\frac{\exp(\eta_{pt_i})}{1 + \exp(\eta_{pt_i})}\right)$
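The assignment step on this slide can be sketched as follows: pick one item at random from the template, then draw a Bernoulli response. The slide leaves the form of η open at this point, so the sketch assumes an item-level predictor α × (θ − β), borrowed from the IRT slide; the function name and argument layout are also assumptions.

```python
import math
import random

def draw_response(theta, alpha, item_difficulties, rng):
    """Assign a random item from one template and simulate the response."""
    t_i = rng.randrange(len(item_difficulties))      # item drawn uniformly at random
    eta = alpha * (theta - item_difficulties[t_i])   # assumed form of the predictor
    p = 1.0 / (1.0 + math.exp(-eta))                 # exp(eta) / (1 + exp(eta))
    y = 1 if rng.random() < p else 0                 # Bernoulli(p) draw
    return t_i, y

rng = random.Random(7)
# One template with three items of slightly different difficulty.
print(draw_response(theta=0.5, alpha=1.0,
                    item_difficulties=[-0.2, 0.0, 0.3], rng=rng))
```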
42. Four Models
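2P-T
$\eta_{pt} = \alpha_t \times (\theta_p - \mu_t)$
• the “neither” option, just a template level IRT model
2P-TX
$\eta_{pt_i} = \alpha_t \times (\theta_p - \mu_t + \lambda_t X_{t_i})$
• adds a covariate, $\lambda_t$, to explain differences contained in $X_{t_i}$
2P-R
$\eta_{pt_i} = \alpha_t \times (\theta_p - \beta_{t_i})$
$\beta_{t_i} \sim N(\mu_t, \sigma_t)$
• multi-level model
2P-RX
$\eta_{pt_i} = \alpha_t \times (\theta_p - \beta_{t_i} + \lambda_t X_{t_i})$
$\beta_{t_i} \sim N(\mu_t, \sigma_t)$
• the “both” option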
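The four linear predictors can be written side by side to show how they nest. This is a sketch with hypothetical function names; the parameter values in the checks are arbitrary.

```python
def eta_2p_t(theta, alpha, mu):
    """2P-T: template-level predictor, no within-template terms."""
    return alpha * (theta - mu)

def eta_2p_tx(theta, alpha, mu, lam, x):
    """2P-TX: adds the covariate effect lam for items with x = 1."""
    return alpha * (theta - mu + lam * x)

def eta_2p_r(theta, alpha, beta):
    """2P-R: each item has its own difficulty beta, drawn around mu."""
    return alpha * (theta - beta)

def eta_2p_rx(theta, alpha, beta, lam, x):
    """2P-RX: item-level difficulty plus the covariate effect."""
    return alpha * (theta - beta + lam * x)

theta, alpha, mu = 0.4, 1.3, -0.2
# With lam = 0 the covariate models reduce to their simpler counterparts.
print(eta_2p_tx(theta, alpha, mu, lam=0.0, x=1) == eta_2p_t(theta, alpha, mu))
# With no item-level spread (beta equal to mu) 2P-R reduces to 2P-T.
print(eta_2p_r(theta, alpha, beta=mu) == eta_2p_t(theta, alpha, mu))
```

Both checks print `True`, which is the sense in which 2P-T is the “neither” model and 2P-RX the “both” model.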
48. “What is the probability of rolling a X on a Y -sided die?”
[Figure: four panels, 2P-T, 2P-TX, 2P-R, and 2P-RX. Each plots the probability of a correct response (vertical axis, 0.70 to 1.00) over the values 1 to 20 (horizontal axis).]
50. Model selection may be out of our hands
Person ID   Template ID   Item ID   Response   X
1           1             12        0          0
1           2             37        1          0
...         ...           ...       ...        ...
1           T             2         0          1
2           1             7         1          1
...         ...           ...       ...        ...
N           T             It        1          1
55. Simulation Study
Simulate data (so we know the true values of the parameters)
Fit all four models (2P-T, 2P-TX, 2P-R, and 2P-RX) with MCMC
Compare their results
What happens if we fit the simpler 2P-T (ignoring within-template variability and systematic variation)?
62. Quick summary of MCMC inference
Bayesian analysis combines the data, our model for the data (likelihood), and any prior information.
The results of MCMC are samples from the posterior distribution of all parameters of interest
parameters
iterations theta[40]
[1,] -1.590404
[2,] -1.625150
[3,] -1.676880
[4,] -1.986976
[5,] -1.808551
[6,] -1.562125
[7,] -1.837187
[8,] -1.518175
- Point estimate is the average across iterations.
- Standard error is the standard deviation across iterations.
- Hypothesis testing can be done with posterior intervals (like confidence intervals).
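The three summaries above can be sketched using the eight theta[40] draws shown on the slide. Eight draws are far too few for real inference, so this only illustrates the arithmetic; the helper `quantile` (simple linear interpolation) and the 95% level are assumptions.

```python
import statistics

# The eight draws of theta[40] printed on the slide.
draws = [-1.590404, -1.625150, -1.676880, -1.986976,
         -1.808551, -1.562125, -1.837187, -1.518175]

point_estimate = statistics.mean(draws)    # average across iterations
posterior_sd = statistics.stdev(draws)     # standard deviation across iterations

def quantile(sorted_draws, q):
    """Empirical quantile by linear interpolation between order statistics."""
    pos = q * (len(sorted_draws) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_draws) - 1)
    frac = pos - lo
    return sorted_draws[lo] * (1 - frac) + sorted_draws[hi] * frac

s = sorted(draws)
lower, upper = quantile(s, 0.025), quantile(s, 0.975)   # equal-tailed 95% interval
excludes_zero = lower > 0 or upper < 0                   # interval-based test against 0

print(round(point_estimate, 6))   # -1.700681
print(posterior_sd, (lower, upper), excludes_zero)
```

The same check applied to draws of λt is how an interval-based test of the covariate effect would be read: the effect is flagged when the interval excludes zero.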
67. Simulation Study
1000 persons answering 40 templates
Each template has 12 or 100 items
σt = 0, σt ∼ |N(0, 0.3)|, or σt ∼ |N(0, 0.6)|
For X, a random 25% of items within a template belong to the “subset”
λt is zero (to assess Type I error) or nonzero
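One cell of this design could be generated roughly as below. The slide does not give the distributions of θ, αt, or µt, so standard normal and uniform choices are assumed here; treating 0.3 as a standard deviation, treating σt as a standard deviation, and generating from the 2P-RX form are likewise assumptions.

```python
import math
import random

def simulate(n_persons=1000, n_templates=40, n_items=12,
             sigma_scale=0.3, lam=0.0, seed=1):
    """Return rows of (person, template, item, response, x) for one design cell."""
    rng = random.Random(seed)
    theta = [rng.gauss(0.0, 1.0) for _ in range(n_persons)]    # assumed N(0, 1)
    templates = []
    for _ in range(n_templates):
        alpha = rng.uniform(0.5, 2.0)                          # assumed range
        mu = rng.gauss(0.0, 1.0)                               # assumed N(0, 1)
        # sigma_t is 0 or a half-normal draw |N(0, sigma_scale)|
        sigma = abs(rng.gauss(0.0, sigma_scale)) if sigma_scale > 0 else 0.0
        beta = [rng.gauss(mu, sigma) for _ in range(n_items)]  # item difficulties
        subset = set(rng.sample(range(n_items), n_items // 4)) # random 25% of items
        x = [1 if i in subset else 0 for i in range(n_items)]
        templates.append((alpha, beta, x))
    rows = []
    for p in range(n_persons):
        for t, (alpha, beta, x) in enumerate(templates):
            i = rng.randrange(n_items)                         # item drawn at random
            eta = alpha * (theta[p] - beta[i] + lam * x[i])    # 2P-RX predictor
            prob = 1.0 / (1.0 + math.exp(-eta))
            y = 1 if rng.random() < prob else 0
            rows.append((p + 1, t + 1, i + 1, y, x[i]))
    return rows

rows = simulate(n_persons=20, n_templates=5)   # small run for illustration
print(len(rows))                               # 100
print(rows[0])
```

Setting `lam=0.0` gives the Type I error condition; a nonzero `lam` gives the condition where the covariate effect is real.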
76. What about λ̂t?
The covariate performs well in terms of bias.
But the 2P-TX has very high Type I error.
And the 2P-RX properly controls Type I error.
There is a clear benefit to the other parameters in the model.
Specification of $X_{t_i}$ is the limiting factor.
89. Implications - what is our inferential focus?
While templates are increasingly used, there is relatively little methodological work
• If within-template variation exists, the 2P-TX, 2P-R, and 2P-RX models can account for it
• Useful for item analysis, item selection, and other item-based inferences
• But accounting for it doesn’t meaningfully affect the inferences based on θ
The Simple 2P-T Model
• while it cannot uncover the within-template effects, can still measure θ very well
• and the 2P-T’s discrimination parameter can be used to screen for high within-template variation
Already used in large assessment and learning systems
94. Data Collection is Key
Many systems have thousands of templates, each with potentially thousands of items.
• Is the item index being recorded (needed for R models)?
• How do we organize the items by meaningful dimensions in X (needed for X models)?
If not, the 2P-T is generally the only option.
If we don’t collect the data, we can’t even begin to ask.
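The two questions above amount to a simple rule: which of the four models can be fit depends on which columns the system logged. A sketch of that rule follows; the column names (`person_id`, `template_id`, `item_id`, `response`, `x`) are hypothetical.

```python
def fittable_models(columns):
    """List the models that the recorded columns can support."""
    cols = set(columns)
    # Every model needs to know who answered which template, and the outcome.
    if not {"person_id", "template_id", "response"} <= cols:
        return []
    models = ["2P-T"]
    has_item = "item_id" in cols     # item index: needed for the R models
    has_x = "x" in cols              # item grouping X: needed for the X models
    if has_x:
        models.append("2P-TX")
    if has_item:
        models.append("2P-R")
    if has_item and has_x:
        models.append("2P-RX")
    return models

print(fittable_models(["person_id", "template_id", "response"]))
print(fittable_models(["person_id", "template_id", "item_id", "response", "x"]))
```

With only person, template, and response recorded, the list contains just the 2P-T.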