Thoughts on Item Response Theory
...an incomplete story
Quinn N Lathrop
University of Notre Dame
February 20, 2014
Overview
- 3PL IRT model
- Measurement in IRT
- Guessing parameter
- Nonparametric IRT
- The Digital Ocean
- A new method, simulation, results
The Default Model: 3PL
Prob(Ypi = 1) = ci + (1 − ci) × logit⁻¹[ai(θp − bi)]
i = 1, 2, ..., I for items and p = 1, 2, ..., P for persons.
Ypi ∼ Bernoulli(Prob(Ypi = 1))
“a three-parameter hammer”
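As a concrete sketch of the model above (the dimensions and parameter ranges here are illustrative, not from the talk), responses can be simulated in a few lines:

```python
import numpy as np

rng = np.random.default_rng(0)

def p3pl(theta, a, b, c):
    """3PL: Prob(Y = 1) = c + (1 - c) * logistic(a * (theta - b))."""
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

P, I = 5000, 40                           # persons, items
theta = rng.normal(0, 1, size=(P, 1))     # latent abilities
a = rng.uniform(0.5, 2.0, size=I)         # discriminations
b = rng.normal(0, 1, size=I)              # difficulties
c = rng.uniform(0.0, 0.3, size=I)         # guessing floors

prob = p3pl(theta, a, b, c)               # P x I matrix of probabilities
Y = rng.binomial(1, prob)                 # Y_pi ~ Bernoulli(prob_pi)
```

Note how c puts a floor under every probability, no matter how low theta goes.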
What about the 2PL?
Each item has its own discrimination parameter ai .
[Figure: 2P-IRT model, item characteristic curves of Prob(Y=1) against Ability]
Discrimination is usually estimated.
Similar to logistic regression, except the x's are latent.
Tangent #1: What is Measurement?
Usual definition of measurement: the assignment of numbers to objects or
events according to rules.
“In the physical sciences, the practical consequence of performing arithmetic computations on a numeric scale with ordinal properties relative to one with ratio or interval properties is significant. Is it less so in the social sciences?”
- Derek Briggs, 2013, JEM
What is measurement in psychology?
“Measurement as a metaphor” is not the only option.
Scientific definition of measurement: the estimation of the ratio of a
magnitude to a unit.
Psychology is a pathological science because we assume our latent traits are quantitative without empirically testing that assumption, or even acknowledging that we have made it (Michell, 1991).
“By avoiding tests of the assumption of a quantitative structure of
psychological attributes, psychologists have yet failed to make progress ... in
regard to their most fundamental assumptions” (Heene, 2013).
Additive Conjoint Measurement
Newton’s 2nd law
log(Force) = log(Mass) + log(Acceleration)
If the addition of X and Y satisfies certain axioms, then X, Y, and Z = X + Y all have quantitative structure.
If the data fit, the Rasch or 1PL IRT model measures a quantitative
(interval-level) latent trait.
logit[Prob(correct)] = Ability − Difficulty
2PL and 3PL IRT models (and Structural Equation Models) generally do not
support additive conjoint measurement.
The Unit in IRT
logit[Prob(Ypi = 1)] = ai (θp − bi )
The discrimination parameter ai is the ratio between the item’s unit and the
latent unit.
The model estimates the units while simultaneously making measurements in those units.
3PL Model
In addition to estimating ai , the 3PL model also estimates the lower bound of
Prob(Ypi = 1)
Prob(Ypi = 1) = ci + (1 − ci) × logit⁻¹[ai(θp − bi)]
where
ci - guessing (lower asymptote), Prob(Ypi = 1|θp = −∞) = ci
But...
...c cannot be estimated
- Only two parameters per item can be estimated consistently (Holland, 1990)
- Estimation requires a large sample size and medium to high difficulty (Lord, 1974; Wood, Wingersky, & Lord, 1976; Thissen & Wainer, 1982)
- Convergence rates can be below 20% (Han, 2012)
- The null hypothesis that c = 0 is on the boundary of the parameter space
- There are no examinees near θ = −∞
The solution?
- Estimate it anyway
- Use strong prior distributions
- Don't estimate it; fix it at 1/(number of options) (Han, 2012)
Reality rears its ugly head
The response form is often not included in the response data
Item:
“What is the probability of rolling a 4 on a 6-sided die?”
Response: /
- Free response, so 1/(number of options) doesn't apply
- The response form may still allow for some guessing
Demonstration Data
- 540 3PL item parameters from a retired test bank
- 5000 person parameters from N(0, 1)
- No missing data
- Fit 3PL with BILOG (MMLE, E-M)
- Default priors vs. no priors
[Figure: the BILOG-MG prior on c, a Beta(5, 17) density]
With this much data, things seem ok
[Figure: estimated c against true c, with and without the prior]
Standard Errors
[Figure: standard errors of the c estimates against estimated c, with and without the prior]
Summary of demonstration data c
With a large sample size of 5000 persons, using a strong prior on c
- pulls the estimates away from zero
- reduces the standard errors of the estimates
But do we really want all the items to have nonzero and different c’s?
We like parameters...
but the 3PL parameters cannot be used separately across items.
- Guessing: ci = Pi(−∞)
- Difficulty: Pi(bi) = (1 + ci)/2
- Discrimination: ai = 4 × P′i(bi) / (1 − ci)
The difficulty and discrimination are not comparable unless c’s are equal.
Can use graphical curves instead (called ICCs or IRFs).
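The three identities above can be checked numerically; a minimal sketch, assuming Pi is the 3PL curve defined earlier:

```python
import numpy as np

def p3pl(theta, a, b, c):
    """3PL item response function."""
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

a, b, c = 1.5, 0.3, 0.2

# Guessing: the lower asymptote is c (approximate -inf with a very low theta).
assert np.isclose(p3pl(-50.0, a, b, c), c)

# Difficulty: the curve passes through (1 + c)/2 at theta = b.
assert np.isclose(p3pl(b, a, b, c), (1 + c) / 2)

# Discrimination: the slope at theta = b is a * (1 - c) / 4,
# so a = 4 * P'(b) / (1 - c).
eps = 1e-6
slope = (p3pl(b + eps, a, b, c) - p3pl(b - eps, a, b, c)) / (2 * eps)
assert np.isclose(4 * slope / (1 - c), a, atol=1e-4)
```

This makes the slide's point concrete: every identity involves ci, so a and b cannot be compared across items unless the c's are equal.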
“The best way to prevent undesirable consequences of such misuse of the item
parameters with 3PL simply would be not to use or not to interpret the item
parameters. Graphical analyses on IRFs could be used instead. Preventing the
3PL item parameters from being interpreted, however, would substantially
limit the utility of 3PL” (Han, 2012).
If we don’t need parameters, let’s use nonparametric IRT
P̂i(t) = [ Σp K((θ̂p − t)/h) Ypi ] / [ Σp K((θ̂p − t)/h) ]
- θ̂p: person ability estimate
- Ypi: response
- t: evaluation point
- K: kernel
- h: bandwidth parameter
[Figure: kernel-smoothed item curve, Prob. Correct against Ability Estimate]
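A minimal sketch of this kernel estimator (a Gaussian kernel is assumed here; the talk leaves K unspecified):

```python
import numpy as np

def np_icc(theta_hat, y_i, t, h=0.3):
    """Kernel-smoothed item response curve at evaluation points t.

    theta_hat : ability estimates, one per person
    y_i       : 0/1 responses to item i, one per person
    t         : evaluation points
    h         : bandwidth
    """
    # Gaussian kernel weights: one row per evaluation point, one column per person.
    w = np.exp(-0.5 * ((theta_hat[None, :] - t[:, None]) / h) ** 2)
    # Weighted mean of responses = Nadaraya-Watson estimate of Prob(correct | t).
    return (w * y_i[None, :]).sum(axis=1) / w.sum(axis=1)
```

Because the estimate is a weighted average of 0's and 1's, it is automatically bounded between 0 and 1.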
Measurement in Nonparametric IRT
To get θ̂p,
- rank persons on total score*
- place the ranks onto N(0, 1)
Note: The latent trait estimates are explicitly ordinal but are placed onto N(0, 1) for convenience and familiarity.
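One way to place ranks onto N(0, 1) is through normal quantiles of the mid-ranks; the exact mapping is an assumption here, since the talk does not spell it out:

```python
import numpy as np
from scipy.stats import norm, rankdata

def ranks_to_normal(total_scores):
    """Map total scores to N(0, 1) quantiles by mid-rank."""
    r = rankdata(total_scores, method="average")  # ties get the average rank
    # (r - 0.5) / P lies strictly inside (0, 1), so ppf is finite.
    return norm.ppf((r - 0.5) / len(r))
```

Only the ordering of the scores matters; any monotone transformation of the total scores gives the same θ̂p, which is exactly the "explicitly ordinal" caveat above.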
Intermission
The Digital Ocean
Data are now a side effect of our interaction with the environment, a consequence of technology.
The data are never balanced, and we would never expect them to be.
Items have different sample sizes.
Persons have different sample sizes.
Nonparametric IRT fails with missing data
- Can't rank subjects on total score
- Can't rank subjects on proportion correct
How else can we rank individuals?
Matrix Decomposition
Factorization of a matrix into a product of matrices.
The P by I response matrix Y can be decomposed into a matrix corresponding to the rows of Y and a matrix corresponding to the columns of Y:
Y = RC′
Similar in concept to a 1PL IRT model, where
Ỹ(P×I) = R(P×1) C′(1×I),
or to a rank-1 Singular Value Decomposition, or one Principal Component.
Alternating Least Squares: SVD with missing data
Alternating Least Squares is simple and fast.
Start by randomly filling C, then alternate:
R = YC(C′C)⁻¹
C′ = (R′R)⁻¹R′Y
Missing entries of Y can simply be skipped over when calculating these equations.
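A rank-1 ALS sketch that skips NaN entries, written with explicit sums rather than the closed-form normal equations above (which assume complete data); the function name is illustrative:

```python
import numpy as np

def als_rank1(Y, n_iter=50, seed=0):
    """Rank-1 ALS on a response matrix Y, with NaN marking missing data."""
    rng = np.random.default_rng(seed)
    mask = ~np.isnan(Y)
    Y0 = np.where(mask, Y, 0.0)  # zeros contribute nothing; mask guards the denominators
    c = rng.normal(size=Y.shape[1])
    for _ in range(n_iter):
        # r_p = sum_i y_pi c_i / sum_i c_i^2, over observed items only
        r = (Y0 * c).sum(axis=1) / (mask * c**2).sum(axis=1)
        # c_i = sum_p y_pi r_p / sum_p r_p^2, over observed persons only
        c = (Y0.T * r).sum(axis=1) / (mask.T * r**2).sum(axis=1)
    return r, c
```

Each update is an ordinary least-squares solve restricted to observed cells, which is why missing data can simply be skipped. (The sketch assumes every person and item has at least one observed response.)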
How does ALS-SVD compare to total score?
[Figure: Alternating Least Squares ranking (values in R against true ability) and total-scores ranking (total scores against true ability)]
Three item test, no missing data.
Unbalanced data
[Figure: Alternating Least Squares ranking and total-scores ranking against true ability, unbalanced design]
Five item test: persons with θ < 0 answer items 1, 2, 3 and persons with θ > 0 answer items 3, 4, 5.
More unbalanced data
11 items (ordered in difficulty)
Persons interact with different numbers of items
Persons interact with different difficulties of items
[Figure: Alternating Least Squares ranking (colored by total score, 0-7) and proportion-correct ranking, each against true ability]
ALS-SVD leads to better kernel-smoothing
[Figure: kernel-smoothed ICCs for Items 1-11, comparing the ALS-SVD NP curve, the Prop-Cor NP curve, and the True ICC]
Summary of ALS-SVD
Now, methods based on total score can be applied to more challenging data
structures.
- Nonparametric smoothing for graphical purposes
- Popular item fit statistics (S-χ²)
- Nonparametric DIF
- Nonparametric classification accuracy/consistency estimation
But, still a ton of work to do
- Consequences of binary data? Constraints?
- Assumptions about missing data?
- Other assumptions?
- Connection to latent trait? 2PL model?
- Interpreting the values of R and C
If we don’t need parameters, let’s use nonparametric IRT
[Figure: kernel-smoothed item curve, Prob. Correct against Ability Estimate]
P̂i(t) = [ Σp K((θ̂p − t)/h) Ypi ] / [ Σp K((θ̂p − t)/h) ]
“But I want parameters” -Everybody
Nonparametric Parameters
- The nonparametric curve converges to the true ICC (Douglas, 1997)
- The best fitting parametric ICC can be found from the nonparametric curve (Wells & Bolt, 2008)
So we can fit the 3PL to the smoothed data (pairs of t and P̂i(t)) instead of to Ypi (just a vector of 0's and 1's).
[Figure: Item 8 from More Unbalanced Data, showing the ALS-SVD NP curve and the best fitting 3PL to NP]
Summary of proposed method
To get 3PL parameter estimates
1. Given the P by I matrix Y, find the P by 1 R by SVD
2. Rank persons based on R
3. Use kernel regression with the ranks to estimate ICC
4. Find the closest 3PL curve to the nonparametric ICC
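Step 4 can be sketched as nonlinear least squares on the (t, P̂i(t)) pairs; the use of scipy's curve_fit, the starting values, and the bounds are assumptions, not from the talk:

```python
import numpy as np
from scipy.optimize import curve_fit

def p3pl(t, a, b, c):
    """3PL curve evaluated at ability t."""
    return c + (1 - c) / (1 + np.exp(-a * (t - b)))

def fit_3pl_to_curve(t, p_hat):
    """Find the 3PL curve closest (in least squares) to a smoothed ICC."""
    (a, b, c), _ = curve_fit(p3pl, t, p_hat,
                             p0=[1.0, 0.0, 0.1],
                             bounds=([0.1, -4.0, 0.0], [5.0, 4.0, 0.5]))
    return a, b, c
```

Fitting to a few hundred (t, P̂i(t)) pairs instead of thousands of 0/1 responses is part of why the method is fast, and each item can be fit independently, hence the easy parallelism noted below.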
Benefits
- barely iterative
- fast
- works for large and odd data structures
- easy to implement in parallel
But how does it compare?
Baseline condition
With the demonstration data (5000x540, no missing data):
                            Bias    RMSE   Cor(True, Est)
BILOG - Prior    Disc (a)   -.045   .134   .974
                 Diff (b)   -.018   .094   .996
                 Guess (c)  -.009   .034   .908
BILOG - No Prior Disc (a)   -.149   .237   .946
                 Diff (b)   -.108   .220   .990
                 Guess (c)  -.066   .102   .627
SVD-NP           Disc (a)   -.071   .166   .965
                 Diff (b)    .012   .181   .985
                 Guess (c)   .002   .066   .761
More challenging data
11,000 persons and 600 3PL items
Persons respond to 10, 25, or 50 items.
Items are responded to by about 100, 500, or 1000 persons.
In total 314,500 person/item interactions, so the matrix is about 95% NA.
How does SVD-NP perform?
ALS-SVD took 20 seconds and 8 iterations.
Kernel-smoothing and 3PL parameter fitting took 11 seconds.
BILOG would not converge; its non-converged results were way off.
R package ltm took close to 20 hours.
                 SVD-NP                       MMLE (ltm)
         Bias    RMSE   Cor(T, E)     Bias    RMSE   Cor(T, E)
Disc a   -0.124  0.567  0.505          0.214  1.023  0.511
Diff b   -0.015  0.515  0.905         -0.049  0.727  0.869
Guess c  -0.042  0.143  0.315         -0.029  0.119  0.408
[Figure: recovery plots for each method: bias in a against true a, bias in b against true b, and estimated c against true c]
Summary
The SVD-NP:
- Extends NP IRT methods to complex data structures
- Recovers parametric parameters as well as traditional implementations do
- Is very fast (0.04% of the time of ltm)
Lots to do:
- Stronger foundation
ALS-SVD and Principal Components
[Figure: ALS-SVD scores against Principal Component scores]
