Kathleen Preston discusses the nominal response model (NRM) for item response theory. The NRM is a flexible model that can address important psychometric questions and can be used for exploratory analysis of item response data. However, it is currently unclear how researchers should approach hypothesis testing of specific NRM parameters. Preston evaluates the NRM as a hypothesis-testing tool, examining its ability to detect violations of the assumptions of the generalized partial credit model (G-PCM). She simulates data under varying conditions and uses likelihood ratio (LR) difference tests to compare constrained and freed parameter estimates. The results indicate that the NRM can provide a valid test of this assumption and has good power to detect problems such as an item having too many response categories. High discriminations and skewed trait distributions may present challenges.
2. Most general divide-by-total item response theory model
NRM has received the least attention
Can be used to address important psychometric questions
Useful for exploratory analysis of item response data
Currently unclear how researchers should approach hypothesis testing of specific parameters
3. $P_{ix}(\theta) = \dfrac{\exp(\alpha_{ix}\theta + c_{ix})}{\sum_{k=0}^{m}\exp(\alpha_{ik}\theta + c_{ik})}$
$\sum_{x=0}^{m}\alpha_{ix} = \sum_{x=0}^{m} c_{ix} = 0$
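As a rough illustration of the slide 3 equation, the sketch below computes the probability of each category x = 0..m for one item as exp(α_ix θ + c_ix) divided by the sum of that quantity over all categories. The function name `nrm_probs` and the item parameters are hypothetical, chosen only so that the slopes and intercepts each sum to zero.

```python
import math


def nrm_probs(theta, alphas, intercepts):
    """Category probabilities for one item under the nominal response model.

    alphas[x] and intercepts[x] are the category slope and intercept for
    category x = 0..m. The caller is assumed to have applied the
    sum-to-zero identification constraints.
    """
    if len(alphas) != len(intercepts):
        raise ValueError("need one slope and one intercept per category")
    z = [a * theta + c for a, c in zip(alphas, intercepts)]
    zmax = max(z)  # subtract the max before exponentiating, for numerical stability
    expz = [math.exp(v - zmax) for v in z]
    total = sum(expz)
    return [e / total for e in expz]


# Hypothetical 4-category item with sum(alpha) = sum(c) = 0
alphas = [-1.5, -0.5, 0.5, 1.5]
intercepts = [-0.5, 0.5, 0.5, -0.5]
probs = nrm_probs(0.0, alphas, intercepts)
print([round(p, 3) for p in probs])
```

Subtracting the largest exponent leaves the ratio unchanged but avoids overflow for extreme θ values.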
5. α is a category slope
There are four for a 4-point item
α* is a category discrimination
There are three for a 4-point item
They represent the discrimination of three dichotomies:
α1* = α2 − α1 (1 vs. 2)
α2* = α3 − α2 (2 vs. 3)
α3* = α4 − α3 (3 vs. 4)
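Each α* acts as the discrimination of a dichotomy because, under the NRM, the log-odds of responding in category k+1 rather than category k is (α_{k+1} − α_k)θ + (c_{k+1} − c_k), so the slope of that comparison is the difference between adjacent category slopes. A minimal sketch (the function name is hypothetical) that turns four category slopes into three category discriminations:

```python
def category_discriminations(alphas):
    """Adjacent-category discriminations alpha*_k = alpha_{k+1} - alpha_k.

    A 4-point item has four category slopes and therefore three
    category discriminations (1 vs. 2, 2 vs. 3, 3 vs. 4).
    """
    return [hi - lo for lo, hi in zip(alphas, alphas[1:])]


alphas = [-1.5, -0.5, 0.5, 1.5]  # hypothetical category slopes
print(category_discriminations(alphas))  # prints [1.0, 1.0, 1.0]
```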
6. Rating scale model
◦ All c* parameters are constrained to be equal across items
7. Partial Credit model
◦ α* is constrained to be equal within and between items
8. Generalized Partial Credit model (G-PCM)
◦ α* parameters are constrained to be equal within an item, but not between items
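One way to see the nesting on slides 6 to 8 is to check the α* constraints directly. The sketch below assumes that a G-PCM item can be written in NRM form with equally spaced slopes α_ix = a_i·x (centred to sum to zero), so all of its α* equal a_i. The helper names are hypothetical, and only the discrimination constraints are checked; the rating scale model's constraint on the c* parameters is not examined.

```python
def alpha_star(alphas):
    """Adjacent-category discriminations for one item."""
    return [hi - lo for lo, hi in zip(alphas, alphas[1:])]


def equal_within(alphas, tol=1e-8):
    """True if all category discriminations of one item are equal (G-PCM constraint)."""
    d = alpha_star(alphas)
    return all(abs(v - d[0]) <= tol for v in d)


def implied_model(items, tol=1e-8):
    """Most restrictive model whose alpha* constraints hold for a set of items."""
    if not all(equal_within(a, tol) for a in items):
        return "NRM"
    slopes = [alpha_star(a)[0] for a in items]
    if all(abs(s - slopes[0]) <= tol for s in slopes):
        return "PCM"  # alpha* equal within and between items
    return "G-PCM"    # alpha* equal within items, free between items


def gpcm_alphas(a, n_cat):
    """Hypothetical G-PCM slopes alpha_x = a * x, centred so they sum to zero."""
    raw = [a * x for x in range(n_cat)]
    mean = sum(raw) / n_cat
    return [v - mean for v in raw]


print(implied_model([gpcm_alphas(1.2, 4), gpcm_alphas(1.2, 4)]))  # PCM
print(implied_model([gpcm_alphas(1.2, 4), gpcm_alphas(0.8, 4)]))  # G-PCM
print(implied_model([[-1.5, -1.0, 0.5, 2.0]]))                    # NRM
```

The tolerance guards against floating-point noise in the centred slopes.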
9. The NRM will be evaluated as a method of hypothesis testing
◦ Evaluate the G-PCM assumption of equal category discriminations within items
◦ Use PROMIS data as an example of testing the assumption
◦ Power to detect different category discrimination parameters within an item
10. Part 1: Evaluating the G-PCM assumption of equal category discriminations within items
◦ Manipulated variables
Category discrimination parameter
Intersection parameters
Number of items
Sample size
Distribution of θ
11. Part 2: Using PROMIS data to test the assumption
◦ PROMIS Depression Inventory
768 individuals
28 items
The G-PCM was fit to the data using PARSCALE
Data were simulated using the resulting parameter estimates
◦ Manipulated variables
Distribution of θ
Sample size
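The data-generation step can be sketched as follows: draw θ for each simulee, compute the category probabilities for each item, and sample a response. Because the G-PCM is a constrained NRM, one NRM sampler covers both. The function names, the item parameters, and especially the skewed θ distribution (a standardized gamma variate) are assumptions for illustration only; the study's actual skewed distribution is not reproduced here.

```python
import math
import random


def nrm_probs(theta, alphas, intercepts):
    """NRM category probabilities for one item (max-shifted for stability)."""
    z = [a * theta + c for a, c in zip(alphas, intercepts)]
    zmax = max(z)
    expz = [math.exp(v - zmax) for v in z]
    total = sum(expz)
    return [e / total for e in expz]


def draw_theta(rng, skewed=False, shape=2.0):
    """Draw one trait value with mean 0 and variance 1.

    The skewed option (a standardized gamma variate) is an assumed choice.
    """
    if not skewed:
        return rng.gauss(0.0, 1.0)
    g = rng.gammavariate(shape, 1.0)       # mean = shape, variance = shape
    return (g - shape) / math.sqrt(shape)  # standardize to mean 0, variance 1


def simulate(items, n_persons, skewed=False, seed=1):
    """Simulate an n_persons x n_items matrix of category responses (0..m)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_persons):
        theta = draw_theta(rng, skewed)
        row = []
        for alphas, intercepts in items:
            probs = nrm_probs(theta, alphas, intercepts)
            row.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
        data.append(row)
    return data


# Hypothetical generating parameters for two 4-category items
items = [
    ([-1.5, -0.5, 0.5, 1.5], [-0.5, 0.5, 0.5, -0.5]),
    ([-2.0, -0.5, 0.5, 2.0], [0.0, 0.5, 0.0, -0.5]),
]
responses = simulate(items, n_persons=500, skewed=True)
print(len(responses), len(responses[0]))  # prints 500 2
```

Seeding a private `random.Random` instance keeps each replication reproducible without touching global state.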
12. Part 3: Power to detect different category discrimination parameters within an item
◦ Manipulated variables
Average category discrimination
Category discrimination variance
Different forms of too many response options
One discrimination too many
Multi-point item should be a dichotomy
13. Estimate the G-PCM for all simulated data and obtain the log-likelihood
Free the category discriminations one item at a time and obtain the log-likelihood
Evaluate the change in log-likelihood
The difference in −2 log-likelihood should be chi-square distributed (M = df, σ² = 2df)
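A minimal sketch of the LR-difference test for one item: G² = −2(LL_constrained − LL_free) is referred to a chi-square distribution whose df equals the number of freed parameters. Assuming a 4-category item, the G-PCM has one discrimination and the freed model has three category discriminations, which would give df = 2. The log-likelihood values below are hypothetical, and the closed-form tail probability used here is exact only for even df (for odd df a general routine such as SciPy's would be needed).

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, exact for even df only.

    Uses P(X > x) = exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def lr_difference_test(loglik_constrained, loglik_free, df, alpha=0.05):
    """Compare the constrained (G-PCM) fit with the freed (NRM) fit for one item.

    Returns (G2, p_value, reject) with G2 = -2 * (LL_constrained - LL_free).
    """
    g2 = -2.0 * (loglik_constrained - loglik_free)
    p = chi2_sf_even_df(g2, df)
    return g2, p, p < alpha


# Hypothetical log-likelihoods; df = 2 assumes a 4-category item
g2, p, reject = lr_difference_test(-10234.7, -10230.1, df=2)
print(round(g2, 2), round(p, 4), reject)
```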
14. Averaged over all conditions with a normal θ distribution: rejection rate (Type I error) = .05, M = 2.00, σ² = 4.01
Averaged over all conditions with a skewed θ distribution: rejection rate (Type I error) = .31, M = 5.18, σ² = 16.56
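Slide 14 summarizes the replicated LR statistics by their mean, variance, and rejection rate. If the test is valid and df = 2 (an inference from slide 13's M = df and the reported mean of 2.00), these should sit near 2, 4, and .05. The sketch below computes those three summaries and sanity-checks them on exact chi-square(2) draws, using the fact that −2·ln(U) for uniform U is chi-square with 2 df. The function name and simulation settings are hypothetical.

```python
import math
import random
import statistics


def summarize_lr(stats, critical):
    """Mean, variance, and rejection rate of replicated LR statistics."""
    mean = statistics.mean(stats)
    var = statistics.variance(stats)  # sample variance (n - 1 denominator)
    rejection_rate = sum(s > critical for s in stats) / len(stats)
    return mean, var, rejection_rate


# Sanity check under the null: -2 * ln(U) follows a chi-square(2) distribution
rng = random.Random(2024)
null_stats = [-2.0 * math.log(1.0 - rng.random()) for _ in range(20000)]
crit = -2.0 * math.log(0.05)  # .05 critical value for df = 2 (about 5.99)
m, v, r = summarize_lr(null_stats, crit)
print(round(m, 2), round(v, 2), round(r, 3))
```

Using `1.0 - rng.random()` keeps the argument of the logarithm strictly positive.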
16. PROMIS data parameters: LR test results by sample size

                          N = 500         N = 1,000       N = 2,000
Normal θ   M (σ²)         2.28 (167.43)   2.28 (12.91)    2.32 (9.57)
           Type I error   .07             .07             .09
Skewed θ   M (σ²)         1.90 (247.09)   3.5 (14.21)     5.58 (21.83)
           Type I error   .14             .19             .37

Mean parameter estimates: 2.25, with subscripted parameters 1, 2, 3 = 0.36, 0.81, 1.67
17. Average category discrimination
◦ α* = 1.75
◦ α* = 1.25
◦ α* = 0.75
Category discrimination variance
◦ α* variance = 0.5
◦ α* variance = 2.0
Different forms of too many response options
◦ One discrimination too many (all conditions)
◦ Multi-point item should be a dichotomy (all conditions)
Mean rejection rates (power) reported: 1.00, .77, .63, .67, .76, .26, .63
18. For all conditions under a normal θ distribution, the LR-difference test appears to be valid
The LR-difference test appears to have adequate power to detect unequal discrimination parameters
The LR-difference test has excellent power to detect when an item has one too many discrimination parameters (α4 = 0)
High category discriminations and a skewed θ distribution appear to present some problems