Making Psychometric Inferences with SVD
when Data are Missing Not at Random
Quinn N Lathrop
Pearson Advanced Computing and Data Science Lab
Quick Overview
1. What is Singular Value Decomposition?
2. Our Algorithm
3. Analytical Results
4. Simulation Results
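What is SVD?
\[ X = U \Sigma V' \]
\[
\begin{bmatrix}
x & x & x \\
x & x & x \\
x & x & x \\
x & x & x \\
x & x & x
\end{bmatrix}
=
\begin{bmatrix}
u_1 & u_2 & u_3 & u_4 & u_5 \\
u_1 & u_2 & u_3 & u_4 & u_5 \\
u_1 & u_2 & u_3 & u_4 & u_5 \\
u_1 & u_2 & u_3 & u_4 & u_5 \\
u_1 & u_2 & u_3 & u_4 & u_5
\end{bmatrix}
\times
\begin{bmatrix}
s_1 & & \\
& s_2 & \\
& & s_3 \\
& & \\
& &
\end{bmatrix}
\times
\begin{bmatrix}
v_1 & v_1 & v_1 \\
v_2 & v_2 & v_2 \\
v_3 & v_3 & v_3
\end{bmatrix}'
\]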
SVD shows up in many places
- Computational backbone of many implementations
- Image, NLP, Dimensionality reduction
- Recommenders (Netflix Challenge)
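As a small illustration of the decomposition pictured above, the following sketch runs a full SVD on a 5 x 3 matrix with NumPy and rebuilds the matrix from its factors. The random matrix and the variable names are assumptions chosen for the example; they do not come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))          # 5 rows, 3 columns, as in the diagram

# full_matrices=True returns U as 5x5 and Vt (V transposed) as 3x3
U, s, Vt = np.linalg.svd(X, full_matrices=True)

# Place the three singular values on the diagonal of a 5x3 matrix Sigma
Sigma = np.zeros((5, 3))
Sigma[:3, :3] = np.diag(s)

# The product of the three factors reproduces X up to rounding error
X_rebuilt = U @ Sigma @ Vt
max_error = np.max(np.abs(X - X_rebuilt))

# Rank-one approximation built from the leading singular triplet only
X_rank1 = s[0] * np.outer(U[:, 0], Vt[0])
```

The rank-one approximation in the last line is the piece that matters for the rest of the talk: one number per row times one number per column.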
Our Version of SVD
The response matrix is decomposed into one component
representing the rows/persons and one component representing
the columns/items.
For person p and item i,
\[ \tilde{y}_{pi} = r_p c_i \]
where:
$\tilde{y}_{pi}$ is the best least-squares approximation to $y_{pi}$
$r_p$ is the parameter for person p
$c_i$ is the parameter for item i
How to Estimate $r_p$ and $c_i$?
Define:
$t_p$ as the set of items that person p responded to
$s_i$ as the set of persons who responded to item i
Alternating Least Squares:
\[ r_p = \frac{\sum_{i \in t_p} c_i \, y_{pi}}{\sum_{i \in t_p} c_i^2} \]
\[ c_i = \frac{\sum_{p \in s_i} r_p \, y_{pi}}{\sum_{p \in s_i} r_p^2} \]
initialized by setting all $c_i = 1$
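The two update equations above can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's implementation: the function name `als_rank1`, the use of `np.nan` to mark unseen items, and the fixed iteration count are all assumptions (the slides do not state a stopping rule). It also assumes every person and every item has at least one observed response and that no denominator becomes zero.

```python
import numpy as np

def als_rank1(Y, n_iter=50):
    """Rank-one alternating least squares with missing entries.

    Y is a persons x items array; np.nan marks items a person did not see.
    n_iter is a hypothetical fixed iteration count, not taken from the slides.
    """
    observed = ~np.isnan(Y)
    W = observed.astype(float)         # 1 where y_pi is observed, 0 otherwise
    Y0 = np.where(observed, Y, 0.0)    # missing entries contribute nothing to sums
    c = np.ones(Y.shape[1])            # initialise all c_i = 1
    r = np.zeros(Y.shape[0])
    for _ in range(n_iter):
        # r_p = sum_{i in t_p} c_i y_pi / sum_{i in t_p} c_i^2
        r = (Y0 @ c) / (W @ c**2)
        # c_i = sum_{p in s_i} r_p y_pi / sum_{p in s_i} r_p^2
        c = (Y0.T @ r) / (W.T @ r**2)
    return r, c

# Small made-up binary response matrix: 4 persons, 4 items, some unseen
Y_demo = np.array([
    [1.0, 1.0, 1.0, np.nan],
    [1.0, 0.0, np.nan, 0.0],
    [np.nan, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, np.nan],
])
r_demo, c_demo = als_rank1(Y_demo)
```

Only the product of `r` and `c` is determined by the data, so the two vectors are identified up to a common scale factor; the rank order within each vector is unaffected by that factor.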
Remember we are dealing with Binary Data
IRT provides a great way to connect binary observed data with
latent properties of the items and the examinees.
\[ \Pr(y_{pi} = 1) = \operatorname{logit}^{-1}(\theta_p - \beta_i) \]
\[ \tilde{y}_{pi} = r_p c_i \]
SVD
- is a least squares procedure
- is not a latent model
- does not respect the 0-1 nature of the data
- does not represent educational theory
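To make the contrast concrete, the sketch below evaluates the inverse-logit model for a few made-up abilities and difficulties, and then shows that a product of person and item parameters is not confined to the interval from 0 to 1. All numeric values and variable names here are hypothetical and chosen only for illustration.

```python
import numpy as np

def inv_logit(x):
    """Inverse logit (logistic) function."""
    return 1.0 / (1.0 + np.exp(-x))

theta = np.array([-1.0, 0.0, 2.0])    # hypothetical person abilities
beta = np.array([-0.5, 0.0, 1.0])     # hypothetical item difficulties

# P[p, i] = Pr(y_pi = 1) = logit^{-1}(theta_p - beta_i); always inside (0, 1)
P = inv_logit(theta[:, None] - beta[None, :])

# A rank-one product r_p * c_i carries no such constraint
r = np.array([0.2, 0.9, 1.5])         # hypothetical person parameters
c = np.array([1.1, 0.8, 0.4])         # hypothetical item parameters
Y_tilde = np.outer(r, c)
exceeds_one = bool((Y_tilde > 1.0).any())
```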
To Recap
We are going to use a simplified version of SVD on a binary
response matrix with missing data. We will use the results of the
SVD to make psychometric inferences.
Analytic Results
A1 The latent ability θ is unidimensional.
A2 Local independence.
A3 The ICCs of all items are monotonic nondecreasing.
SVD has psychometrically desirable and meaningful properties
- r is a consistent ordinal estimator of student ability
- c is a consistent ordinal estimator of item easiness
What does it mean?
r approaches the true rank order of θ
- easy to understand
- widely used in psychometrics
c approaches the true rank order of
- $\int \Pr(Y = 1 \mid \theta) \, g(\theta) \, d\theta$
- $\Pr(Y = 1)$
Connect SVD to the familiar θ scale and P(θ).
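The integral above is the marginal probability of a correct response, averaged over the ability distribution g(θ). As a hedged numerical illustration, the sketch below evaluates it on a grid, assuming a one-parameter logistic ICC and a standard normal g(θ); both choices, and the difficulty values, are assumptions made for the example rather than requirements of the result.

```python
import numpy as np

def inv_logit(x):
    """Inverse logit (logistic) function."""
    return 1.0 / (1.0 + np.exp(-x))

# Grid over theta; the N(0, 1) density is negligible outside [-8, 8]
theta = np.linspace(-8.0, 8.0, 4001)
step = theta[1] - theta[0]
g = np.exp(-0.5 * theta**2) / np.sqrt(2.0 * np.pi)

beta = np.array([-1.0, 0.0, 1.5])     # hypothetical item difficulties

# marginal[i] approximates the integral of Pr(Y_i = 1 | theta) g(theta) d theta
icc = inv_logit(theta[None, :] - beta[:, None])
marginal = (icc * g[None, :]).sum(axis=1) * step
```

Under these assumptions the marginal probability falls as difficulty rises, so ranking items by it is the same as ranking them from easiest to hardest.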
Simulation Studies with Missing Data
Missing data are categorized as MCAR, MAR, and MNAR. IRT
models appropriately ignore the missingness in MCAR and MAR.
MNAR can be a problem.
When item selection is correlated with ability, the data are MNAR.
- Age-appropriate items
- Self-selection
- Previous placement tests
- Teacher/instructor judgement
Note: Generally, if item parameters are known and the current θ̂ is used for
item selection (as in a CAT), the missing data are MAR.
Block Design Simulation
Ranking Examinees
- Proportion correct
- IRT-2PL ability estimates (2-stage estimator)
  - Estimate 2PL item parameters with MMLE
  - Estimate person ability with MLE, using the 2PL item parameters
- SVD
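The simplest of the three rankings, proportion correct, and the Spearman rank correlation used to score a ranking can be sketched as follows. The helper names `average_ranks`, `spearman_rho`, and `proportion_correct` are hypothetical, `np.nan` is assumed to mark unseen items, and each person is assumed to have at least one observed response.

```python
import numpy as np

def average_ranks(x):
    """Ranks starting at 1, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    _, inverse = np.unique(x, return_inverse=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    counts = np.bincount(inverse)
    return (rank_sums / counts)[inverse]

def spearman_rho(a, b):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return np.corrcoef(average_ranks(a), average_ranks(b))[0, 1]

def proportion_correct(Y):
    """Share of correct answers among the items each person actually saw."""
    return np.nanmean(Y, axis=1)

# Made-up example: two persons, three items, one unseen item each
Y_small = np.array([[1.0, 0.0, np.nan],
                    [np.nan, 1.0, 1.0]])
prop_small = proportion_correct(Y_small)
```

Proportion correct ignores which items a person saw, which is why it can mislead when one group receives systematically easier items than another.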
Simulated Conditions
- N = 2000 examinees generated from θ ∼ N(0, 1)
- 1000 respond to “easy” items, 1000 respond to “hard” items
- The two item groups share 5% to 75% of their items
- Group membership is related to θ by
\[ \tau^{*} = \rho \times \theta + \sqrt{1 - \rho^{2}} \times \epsilon \]
where $\epsilon \sim N(0, 1)$
- ρ is generated randomly from 0 to 1
- Each person responds to 20 or 40 items
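The group-assignment step above can be sketched as follows. The formula for τ* is taken from the slide; the rule that turns τ* into two groups of 1000 is not stated there, so the median split used here (lower half to the "easy" items, upper half to the "hard" items) is an assumption, as are the seed and the variable names.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 2000

theta = rng.normal(size=N)            # theta ~ N(0, 1)
rho = rng.uniform(0.0, 1.0)           # MNAR correlation, drawn from 0 to 1
eps = rng.normal(size=N)              # epsilon ~ N(0, 1)

# tau* = rho * theta + sqrt(1 - rho^2) * epsilon, so corr(tau*, theta) = rho
tau = rho * theta + np.sqrt(1.0 - rho**2) * eps

# ASSUMPTION: median split on tau*; the slides do not give the exact rule
order = np.argsort(tau)
hard_group = np.zeros(N, dtype=bool)
hard_group[order[N // 2:]] = True     # upper half answers the "hard" items

sample_corr = np.corrcoef(tau, theta)[0, 1]
```

With ρ = 0 the assignment is unrelated to ability, and with ρ = 1 it is fully determined by ability, which gives the range of the x-axis in the results plots.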
[Figure: nine panels titled by model (1PL, 2PL, 3PL) and item overlap (5% to 25%, 25% to 50%, 50% to 75%). x-axis: MNAR Correlation - Two Groups (0.0 to 1.0); y-axis: Spearman Rho (0.65 to 0.90); lines: ALS-SVD, IRT-2PL, PropCor.]
[Figure: the same nine panels, with the Spearman Rho axis running from 0.75 to 0.95; lines: ALS-SVD, IRT-2PL, PropCor.]
Summary
- The more missing data there are in a response matrix, the
more attention we must pay to the missingness mechanism when
fitting a parametric IRT model or using proportion correct.
- This concern does appear for SVD.
- This work provides foundational analytical and empirical
evidence that supports using SVD as a psychometric tool.