Making Psychometric Inferences with SVD when Data are Missing Not at Random

Quinn N Lathrop
Pearson Advanced Computing and Data Science Lab

Quick Overview
1. What is Singular Value Decomposition?
2. Our Algorithm
3. Analytical Results
4. Simulation Results

What is SVD?
$$X = U \Sigma V'$$

$$
\begin{bmatrix}
x & x & x \\
x & x & x \\
x & x & x \\
x & x & x \\
x & x & x
\end{bmatrix}
=
\begin{bmatrix}
u_1 & u_2 & u_3 & u_4 & u_5 \\
u_1 & u_2 & u_3 & u_4 & u_5 \\
u_1 & u_2 & u_3 & u_4 & u_5 \\
u_1 & u_2 & u_3 & u_4 & u_5 \\
u_1 & u_2 & u_3 & u_4 & u_5
\end{bmatrix}
\times
\begin{bmatrix}
s_1 & 0 & 0 \\
0 & s_2 & 0 \\
0 & 0 & s_3 \\
0 & 0 & 0 \\
0 & 0 & 0
\end{bmatrix}
\times
\begin{bmatrix}
v_1 & v_1 & v_1 \\
v_2 & v_2 & v_2 \\
v_3 & v_3 & v_3
\end{bmatrix}'
$$

SVD shows up in many places
- Computational backbone of many implementations
- Image, NLP, dimensionality reduction
- Recommenders (Netflix Challenge)
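
As a small illustration of the decomposition above, the following sketch uses NumPy's `numpy.linalg.svd` on a random 5 x 3 matrix. The matrix, the seed, and the variable names are assumptions chosen for the example, not part of the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))  # a 5 x 3 matrix, matching the schematic

# full_matrices=True returns U as 5 x 5 and V' as 3 x 3
U, s, Vt = np.linalg.svd(X, full_matrices=True)

# Place the three singular values on the diagonal of a 5 x 3 Sigma
Sigma = np.zeros((5, 3))
Sigma[:3, :3] = np.diag(s)

# Multiplying the three factors recovers X
X_rebuilt = U @ Sigma @ Vt

# Keeping only the first singular triplet gives the best rank-1
# least squares approximation to X
X_rank1 = s[0] * np.outer(U[:, 0], Vt[0, :])
```

The rank-1 product of one row vector and one column vector is the form that the simplified version in the next slides builds on.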

Our Version of SVD
The response matrix is decomposed into one component
representing the rows/persons and one component representing
the columns/items.
For person p and item i,
$$\tilde{y}_{pi} = r_p c_i$$
Where:
$\tilde{y}_{pi}$ is the best least squares approximation to $y_{pi}$
$r_p$ is the parameter for person p
$c_i$ is the parameter for item i

How to Estimate rp and ci?
Define:
$t_p$ as the items that person p responded to
$s_i$ as the persons that responded to item i
Alternating Least Squares:
$$r_p = \frac{\sum_{i \in t_p} c_i \, y_{pi}}{\sum_{i \in t_p} c_i^2}
\qquad
c_i = \frac{\sum_{p \in s_i} r_p \, y_{pi}}{\sum_{p \in s_i} r_p^2}$$
initialized by setting all $c_i = 1$
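
The two update rules above can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation: the function name `als_rank1`, the use of NaN to mark missing responses, and the fixed number of iterations are assumptions (the slides do not state a stopping rule). It also assumes every person and every item has at least one observed response, so no denominator is zero.

```python
import numpy as np

def als_rank1(Y, n_iter=50):
    """Rank-1 alternating least squares; NaN marks a missing response."""
    observed = ~np.isnan(Y)            # True where person p answered item i
    Y0 = np.where(observed, Y, 0.0)    # zero-fill so missing cells drop out of sums
    n_persons, n_items = Y.shape
    c = np.ones(n_items)               # initialization from the slides: all c_i = 1
    r = np.zeros(n_persons)
    for _ in range(n_iter):            # fixed iteration count is an assumption
        # r_p = sum_{i in t_p} c_i y_pi / sum_{i in t_p} c_i^2
        r = (Y0 @ c) / (observed @ c**2)
        # c_i = sum_{p in s_i} r_p y_pi / sum_{p in s_i} r_p^2
        c = (Y0.T @ r) / (observed.T @ r**2)
    return r, c

# Hypothetical 4-person, 3-item binary response matrix with missing cells
Y = np.array([
    [1.0, 1.0, np.nan],
    [1.0, 0.0, 0.0],
    [np.nan, 1.0, 0.0],
    [1.0, 1.0, 1.0],
])
r, c = als_rank1(Y)
```

Because the sums run only over observed cells, no imputation of the missing responses is needed.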

Remember we are dealing with Binary Data
IRT provides a great way to connect binary observed data with
latent properties of the items and the examinees.
$$\Pr(y_{pi} = 1) = \operatorname{logit}^{-1}(\theta_p - \beta_i)$$
$$\tilde{y}_{pi} = r_p c_i$$
SVD
- is a least squares procedure
- is not a latent model
- does not respect 0-1 nature of data
- does not represent educational theory
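
To make the contrast concrete, the sketch below evaluates both expressions on made-up values. All numbers and names here are hypothetical; they only illustrate that the inverse logit stays strictly between 0 and 1, while the product of a person parameter and an item parameter is not bounded.

```python
import numpy as np

def inv_logit(x):
    """Inverse logit (logistic) function."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical IRT abilities and difficulties
theta = np.array([-2.0, 0.0, 2.0])
beta = np.array([-1.0, 0.0, 1.0])
p_irt = inv_logit(theta[:, None] - beta[None, :])  # persons x items, in (0, 1)

# Hypothetical SVD person and item parameters
r = np.array([0.2, 0.6, 1.1])
c = np.array([1.2, 1.0, 0.7])
y_tilde = np.outer(r, c)  # least squares fit, may fall outside [0, 1]
```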

To Recap
We are going to use a simplified version of SVD on a binary
response matrix with missing data. We will use the results of the
SVD to make psychometric inferences.

Analytic Results
A1 The latent ability θ is unidimensional.
A2 Local independence.
A3 The ICCs of all items are monotonic nondecreasing.

Analytic Results
SVD has psychometrically desirable and meaningful properties
- r is a consistent ordinal estimator of student ability
- c is a consistent ordinal estimator of item easiness

What does it mean?
r approaches the true rank order of $\theta$
- easy to understand
- widely used in psychometrics
c approaches the true rank order of
- $\int \Pr(Y = 1 \mid \theta) \, g(\theta) \, d\theta$
- $\Pr(Y = 1)$
Connect SVD to the familiar $\theta$ scale and $P(\theta)$.
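
The integral above is the marginal probability of a correct response. A minimal numerical sketch follows, assuming a Rasch (1PL) item characteristic curve and a standard normal ability density g; both are assumptions made for the example, as are the difficulty values. It uses Gauss-Hermite quadrature from `numpy.polynomial.hermite_e`.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Quadrature nodes and weights for the weight function exp(-x^2 / 2)
nodes, weights = hermegauss(41)

def marginal_p_correct(beta):
    """Integral of Pr(Y = 1 | theta) g(theta) dtheta for a Rasch item."""
    p = 1.0 / (1.0 + np.exp(-(nodes - beta)))
    # Dividing by sqrt(2 * pi) turns the weight function into the N(0, 1) density
    return np.sum(weights * p) / np.sqrt(2.0 * np.pi)

# Hypothetical item difficulties, from easy to hard
betas = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])
marginals = np.array([marginal_p_correct(b) for b in betas])
```

Under these assumptions the marginal probability falls as difficulty rises, so ranking items by it is the same as ranking them by easiness.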

Simulation Studies with Missing Data
Missing data are categorized as missing completely at random (MCAR),
missing at random (MAR), and missing not at random (MNAR). IRT
models can appropriately ignore the missingness under MCAR and MAR.
MNAR can be a problem.

When item selection is correlated with ability, the missingness is MNAR.

- Age appropriate items
- Self selection
- Previous placement tests
- Teacher/instructor judgement
Note: Generally, if item parameters are known and the current θ̂ is used for
item selection (as in a CAT), the missing data are MAR.

Block Design Simulation
Ranking Examinees
- Proportion correct
- IRT-2PL ability estimates (2-stage estimator)
  - Estimate 2PL item parameters with MMLE
  - Estimate person ability with MLE with 2PL item parameters
- SVD

Simulated Conditions
- N = 2000 examinees generated from $\theta \sim N(0, 1)$
- 1000 respond to “easy” items, 1000 respond to “hard” items
- The two item groups share 5% to 75% of their items
- Group membership is related to $\theta$ by
  $$\tau^* = \rho \times \theta + \sqrt{1 - \rho^2} \times \epsilon$$
  where $\epsilon \sim N(0, 1)$
- $\rho$ is generated randomly from 0 to 1
- Each person responds to 20 or 40 items
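
One way to generate data under the conditions listed above is sketched here. Several details are assumptions that the slides do not specify: examinees are assigned to the hard form by a median split on τ*, item difficulties are drawn from N(0, 1) and sorted so the easy form gets the lowest values, responses follow a 1PL model, and the overlap is fixed at 25% with 20 items per person. The seed and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2024)

n_persons = 2000
n_per_form = 20   # items per person (the slides use 20 or 40)
overlap = 0.25    # share of items common to both forms (slides: 5% to 75%)

# Ability and the group-membership variable tau* from the slides
theta = rng.normal(size=n_persons)
rho = rng.uniform(0.0, 1.0)
eps = rng.normal(size=n_persons)
tau = rho * theta + np.sqrt(1.0 - rho**2) * eps

# Assumption: the 1000 examinees with the highest tau* take the hard form
hard = np.zeros(n_persons, dtype=bool)
hard[np.argsort(tau)[n_persons // 2:]] = True

n_shared = int(round(overlap * n_per_form))
n_unique = n_per_form - n_shared
n_items = 2 * n_unique + n_shared

# Assumption: sorted N(0, 1) difficulties; shared items sit in the middle
beta = np.sort(rng.normal(size=n_items))
in_easy = np.zeros(n_items, dtype=bool)
in_easy[:n_per_form] = True
in_hard = np.zeros(n_items, dtype=bool)
in_hard[n_unique:] = True

# 1PL responses for every cell, then keep only the administered items
p = 1.0 / (1.0 + np.exp(-(theta[:, None] - beta[None, :])))
full = (rng.uniform(size=p.shape) < p).astype(float)
mask = np.where(hard[:, None], in_hard[None, :], in_easy[None, :])
Y = np.where(mask, full, np.nan)   # NaN marks items not administered

# Proportion correct over the items each person actually saw
prop_correct = np.nanmean(Y, axis=1)
```

When ρ is near 1 the form an examinee sees depends strongly on ability, which is the MNAR situation the simulation is designed to probe.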

1PL, Overlap 5% to 25%
MNAR Correlation - Two Groups
[Figure: Spearman rho (y-axis, 0.65 to 0.90) plotted over 0.0 to 1.0 (x-axis) for ALS-SVD, IRT-2PL, and PropCor.]

[Figure: Spearman rho (y-axis, 0.75 to 0.95) plotted over 0.0 to 1.0 (x-axis) for ALS-SVD, IRT-2PL, and PropCor.]

Summary
- The more missing data there is in a response matrix, the
more aware we must be about the missing mechanism when
fitting a parametric IRT model or using proportion correct.
- This concern does appear for SVD.
- This work provides foundational analytical and empirical
evidence that supports using SVD as a psychometric tool.
