This document provides an introduction to item response theory (IRT), including a comparison to classical test theory. IRT is a family of latent trait models that establish psychometric properties of items and scales. Unlike classical test theory, IRT can separate examinee characteristics from test characteristics and provide interval scale scores. The document outlines IRT assumptions, parameters such as difficulty and discrimination, models like the one-, two-, and three-parameter logistic models, and applications of IRT like differential item functioning analysis and computer adaptive testing.
2. Content
• Classical Test Theory (CTT)
• Item Response Theory (IRT)
- Item Response Function
- Item Information Function
- Invariance
• IRT assumptions
• Parameter estimations of IRT
• Scoring
• Applications
11/13/2019 2
3. Item analysis: In psychometrics, item analysis refers to the statistical methods
used for selecting items for inclusion in a psychological test,
e.g. Classical Test Theory (CTT) and Latent Trait Models (LTM). It addresses:
- How appropriate the measurements are for the respondent.
- How well they measure the respondent's ability/trait.
[Diagram: Item Analysis branches into Classical Test Theory and Latent Trait Theory; Latent Trait Theory leads to Item Response Theory]
4. Classical Test Theory
Developed by Guilford in 1936 to understand the reliability of tests.
X = T + E
Observed Score = True Score + Error
$\rho_{XX'} = \sigma_T^2 / \sigma_X^2$ = the reliability of the observed test score X
The error in CTT is assumed to be normally distributed, with mean 0, and uncorrelated with the true score.
CTT can calculate: Difficulty, Discrimination and Reliability
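As a rough illustration of the three CTT quantities named above, the following sketch computes them for a small, made-up 0/1 response matrix. The data and function names are hypothetical; difficulty is taken as the proportion correct, discrimination as the corrected item-total correlation, and reliability as Cronbach's alpha, which is one common choice among several.

```python
from statistics import mean, pvariance

# Hypothetical 0/1 response matrix: rows are examinees, columns are items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]

def item_difficulty(data, item):
    """CTT difficulty: proportion of examinees answering the item correctly."""
    return mean(row[item] for row in data)

def item_discrimination(data, item):
    """Corrected item-total correlation (item vs. total of the other items)."""
    x = [row[item] for row in data]
    y = [sum(row) - row[item] for row in data]
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (pvariance(x) * pvariance(y)) ** 0.5

def cronbach_alpha(data):
    """Cronbach's alpha as one estimate of test reliability."""
    k = len(data[0])
    item_vars = sum(pvariance([row[i] for row in data]) for i in range(k))
    total_var = pvariance([sum(row) for row in data])
    return k / (k - 1) * (1 - item_vars / total_var)

for i in range(4):
    print(i, round(item_difficulty(responses, i), 3),
          round(item_discrimination(responses, i), 3))
print("alpha:", round(cronbach_alpha(responses), 3))
```

Note that all three statistics depend on the particular sample of examinees, which is the limitation the next lines describe.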
Limitations of CTT
- CTT has the test (not the items) as its basis
- Examinee characteristics and test characteristics cannot be separated
5. Latent Trait Models
- LTMs measure the underlying ability (or trait) that produces test
performance rather than measuring performance itself; they are sample-free.
- LTMs have been around since the 1940s and were popularized in the 1960s.
Latent variable models (subset of SEM)

                     Manifest variables
Latent variables   | Continuous                                | Categorical
Continuous         | Factor analysis                           | Item Response Theory
Categorical        | Latent profile analysis (mixture models)  | Latent Class Analysis
6. Item Response Theory
- It refers to a family of latent trait models used to establish psychometric
properties of items and scales.
- It is sometimes called modern psychometrics because it has superseded CTT in many applications, although CTT remains in use.
- A person’s trait level is estimated from responses to the test items.
- It specifies how both trait levels and item properties are related to the person’s
item responses.
- IRT is model-based measurement.
Assumptions of Item Response Theory models
1. The relationship between item responses (observed) and person ability (unobserved)
is described by the Item Characteristic Curve (ICC).
2. Unidimensionality, i.e. a single ability underlies the responses.
3. Local independence: conditional on ability, an examinee’s performance on one item must not affect their response
to any other item of the test.
7. Classical Test Theory vs Item Response Theory

CTT: The standard error of measurement applies to all scores in a particular population.
IRT: The standard error of measurement differs across scores, but generalizes across populations.

CTT: A longer test can be more reliable than a shorter test.
IRT: A shorter test can be more reliable than a longer test.

CTT: Comparing test scores across multiple forms is optimal when test forms are parallel.
IRT: Comparing test scores across multiple forms is optimal when test difficulty levels vary between persons.

CTT: Unbiased assessment of item properties depends on having representative samples.
IRT: Unbiased estimates of item properties may be obtained from unrepresentative samples.

CTT: Test scores obtain meaning by comparing their position in a norm group.
IRT: Test scores obtain meaning by comparing their distance from items.
8. Classical Test Theory vs Item Response Theory (continued)

CTT: Interval scale properties are achieved by obtaining normal score distributions.
IRT: Interval scale properties are achieved by applying justifiable measurement models.

CTT: Mixed item formats lead to unbalanced impact on test total scores.
IRT: Mixed item formats can yield optimal test scores.

CTT: Change scores cannot be meaningfully compared when initial score levels differ.
IRT: Change scores can be meaningfully compared when initial score levels differ.

CTT: Factor analysis on binary items produces artifacts rather than factors.
IRT: Factor analysis on raw item data yields a full information factor analysis.

CTT: Item stimulus features are unimportant compared to psychometric properties.
IRT: Item stimulus features can be directly related to psychometric properties.
9. Components of Item Response Theory
Item Response Function (IRF): The probability of success on a
particular item as a function of the latent trait.
Item Characteristic Curve (ICC): Graphical representation of the IRF.
Item Information Function (IIF): Indicates item quality, i.e. the item’s ability to differentiate
among respondents. The IIF is determined from the
ICC: it depends on the candidate’s probability of responding correctly
multiplied by the probability of responding incorrectly.
Test Information Function (TIF): The sum of the IIFs of the items in a test, at a given trait level, gives the test
information function.
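The relationship between the ICC, the IIF and the TIF can be sketched numerically. The example below assumes a two-parameter logistic item, for which the item information is a² · P · (1 − P); the item parameters and function names are made up for illustration.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under a 2PL item (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """2PL item information: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def total_information(theta, items):
    """Test information: sum of the item informations at this trait level."""
    return sum(item_information(theta, a, b) for a, b in items)

# Hypothetical (a, b) pairs for a three-item test.
items = [(1.0, -1.0), (1.5, 0.0), (2.0, 1.0)]

for theta in (-2.0, 0.0, 2.0):
    tif = total_information(theta, items)
    # The standard error of the trait estimate is 1 / sqrt(information).
    print(theta, round(tif, 3), round(1.0 / math.sqrt(tif), 3))
```

Because the information changes with the trait level, so does the standard error, which is the IRT side of the first comparison with CTT.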
10. [Figure: (A) Item Characteristic Curve; (B) Item Information Function]
11. Parameters of Item Response Theory
Location or difficulty parameter (b)
- Defined as the amount of latent trait needed to have a 0.5 probability of
endorsing the item.
- The higher the “b” parameter, the higher on the trait level a respondent needs to
be in order to endorse the item.
- It typically ranges from -3 to +3.
Discrimination parameter (a)
- Indicates the steepness of the ICC at the item location.
- Indicates how strongly related the item is to the latent trait, like loadings in factor
analysis.
- Items with high discrimination are better at differentiating respondents around the
location point: small changes in the latent trait lead to large changes in the
probability. Conversely, with low discrimination, the same changes in the trait lead to only small changes in the probability.
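The roles of b and a described above can be checked with a small sketch. It assumes a two-parameter logistic ICC with no guessing; the function names and parameter values are illustrative only.

```python
import math

def icc(theta, a, b):
    """2PL item characteristic curve (assumed form, no guessing)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def prob_change(a, b, step=0.5):
    """Change in endorsement probability across a small interval around b."""
    return icc(b + step, a, b) - icc(b - step, a, b)

# At theta == b the probability of endorsing the item is 0.5.
print(icc(1.0, 2.0, 1.0))

# The same change in the trait moves the probability much more
# for a highly discriminating item than for a weakly discriminating one.
print(round(prob_change(a=2.0, b=0.0), 3))
print(round(prob_change(a=0.5, b=0.0), 3))
```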
13. Parameters of Item Response Theory
Guessing (c)
- Respondents with very low ability may still choose the correct answer;
respondents with very low trait levels may still have a small probability of
endorsing an item.
- This mostly happens in MCQs, and the value should not vary excessively from the reciprocal
of the number of choices.
Upper asymptote (d)
- Respondents very high on the latent trait are not guaranteed (i.e. have less than 1
probability) to endorse the item.
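A minimal sketch of how c and d act as lower and upper asymptotes, assuming the usual four-parameter logistic form c + (d − c) · logistic(a(θ − b)). The values below (a four-option MCQ with c = 0.25 and a hypothetical d = 0.95) are made up for illustration.

```python
import math

def p_4pl(theta, a, b, c=0.0, d=1.0):
    """Four-parameter logistic curve: c is the lower, d the upper asymptote."""
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))

a, b, c, d = 1.2, 0.0, 0.25, 0.95

print(round(p_4pl(-20.0, a, b, c, d), 4))  # approaches c for very low trait levels
print(round(p_4pl(20.0, a, b, c, d), 4))   # approaches d for very high trait levels
print(round(p_4pl(b, a, b, c, d), 4))      # midpoint (c + d) / 2 at theta == b
```

With c = 0 and d = 1 the function reduces to the two-parameter curve, so the 0.5 probability at θ = b holds only when there is no guessing and no upper asymptote below 1.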
15. Item Response Models
Models for dichotomous data:
1) Rasch Model/ One Parameter Logistic Model
2) Two Parameter Logistic Model
3) Three Parameter Logistic Model
4) Four Parameter Logistic Model
Models for polytomous data :
1) Partial Credit Model
2) Rating Scale Model
3) Graded Response Model
4) Generalised Partial Credit Model
5) Generalised Graded Response Model
6) Nominal response model
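16. Rasch Model/ One Parameter Logistic Model
Developed in 1960 by Rasch for a dichotomous dependent variable.
$$P(X_{is} = 1 \mid \theta_s, b_i) = \frac{\exp(\theta_s - b_i)}{1 + \exp(\theta_s - b_i)}$$
$X_{is}$ = response of person s on item i
$\theta_s$ = person’s trait score
$b_i$ = item’s difficulty level
Assumption: all scale items relate to the latent trait equally and items vary only
in difficulty (equivalent to having equal factor loadings in FA).
The one-parameter logistic model can also be written with a common discrimination:
$$P(X_{is} = 1 \mid \theta_s, b_i) = \frac{\exp(a(\theta_s - b_i))}{1 + \exp(a(\theta_s - b_i))}$$
where $a$ is a constant discrimination value.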
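One consequence of the Rasch form is that the log-odds of success equal the difference between the person's trait score and the item's difficulty, so the gap between two persons is the same on every item. The sketch below illustrates this with made-up values; the function names are hypothetical.

```python
import math

def rasch_prob(theta, b):
    """Rasch probability that a person with trait theta succeeds on an item of difficulty b."""
    return math.exp(theta - b) / (1.0 + math.exp(theta - b))

def log_odds(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

# The log-odds recover theta - b.
print(round(log_odds(rasch_prob(1.0, 0.3)), 6))

# Two persons one trait unit apart differ by one logit on every item,
# whatever the item's difficulty.
for b in (-1.0, 0.0, 2.0):
    gap = log_odds(rasch_prob(1.5, b)) - log_odds(rasch_prob(0.5, b))
    print(b, round(gap, 6))
```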
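17. Two Parameter & Three Parameter Logistic Model
Two-parameter logistic model ($\theta_s$: person’s trait score, $b_i$: item difficulty):
$$P(X_{is} = 1 \mid \theta_s, a_i, b_i) = \frac{\exp(a_i(\theta_s - b_i))}{1 + \exp(a_i(\theta_s - b_i))}$$
$a_i$ represents item differences in discrimination.
Three-parameter logistic model:
$$P(X_{is} = 1 \mid \theta_s, a_i, b_i, c_i) = c_i + (1 - c_i)\,\frac{\exp(a_i(\theta_s - b_i))}{1 + \exp(a_i(\theta_s - b_i))}$$
$c_i$ represents guessing on item i.
Which model to choose?
a) The weights for item scoring (equal or unequal)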
b) Desired scale properties for that measure
c) Fit to data
d) Purpose for estimating parameters
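18. Normal Ogive Models
These replace the logistic function with the standard normal cumulative distribution function $\Phi$.
One parameter normal ogive model
$$P(X_{is} = 1 \mid \theta_s, b_i) = \Phi(\theta_s - b_i)$$
Two parameter normal ogive model
$$P(X_{is} = 1 \mid \theta_s, a_i, b_i) = \Phi(a_i(\theta_s - b_i))$$
Three parameter normal ogive model
$$P(X_{is} = 1 \mid \theta_s, a_i, b_i, c_i) = c_i + (1 - c_i)\,\Phi(a_i(\theta_s - b_i))$$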
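The normal ogive and logistic models are close numerically once the logistic exponent is rescaled. The sketch below compares a two-parameter normal ogive curve with a logistic curve using the conventional scaling constant D = 1.702; the constant and the grid of trait values are choices made for this illustration, not something stated on the slide.

```python
import math

D = 1.702  # conventional scaling constant (assumption for this sketch)

def normal_ogive(theta, a, b):
    """Two-parameter normal ogive: standard normal CDF evaluated at a * (theta - b)."""
    z = a * (theta - b)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic(theta, a, b, scale=D):
    """Two-parameter logistic curve with the exponent multiplied by `scale`."""
    return 1.0 / (1.0 + math.exp(-scale * a * (theta - b)))

# Largest absolute gap between the two curves on a grid from -4 to 4.
max_gap = max(
    abs(normal_ogive(t / 100.0, 1.0, 0.0) - logistic(t / 100.0, 1.0, 0.0))
    for t in range(-400, 401)
)
print(max_gap < 0.01)
```

This closeness is one reason the logistic models, which are simpler to compute, are often used in place of the normal ogive models.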
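19. Item Response Models for polytomous data
Graded Response Model
Developed by Samejima (1969) as an extension of the 2PL model for ordinal or Likert-type data.
$$P^*_{ij}(\theta) = \frac{\exp(a_i(\theta - b_{ij}))}{1 + \exp(a_i(\theta - b_{ij}))}$$
X is the examinee’s raw item response.
Threshold j = 1, 2, …, $m_i$; $\theta$ is the trait level.
$P^*_{ij}(\theta)$ = operating characteristic curve (OCC)
$b_{ij}$ = trait level necessary to respond above
threshold j with 0.50 probability.
Category characteristic curve (CCC): The
probability of an examinee
responding in a particular category x,
conditional on the trait level:
$$P_{ix}(\theta) = P^*_{ix}(\theta) - P^*_{i(x+1)}(\theta), \quad P^*_{i0}(\theta) = 1, \quad P^*_{i(m_i+1)}(\theta) = 0$$
[Figures: OCCs for five response categories; CCCs for five response categories]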
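How category characteristic curves follow from operating characteristic curves can be sketched as follows: each category probability is the difference between adjacent OCCs, with the outermost boundaries fixed at 1 and 0. The item parameters (five categories, four thresholds) and function names are made up for illustration.

```python
import math

def occ(theta, a, b_j):
    """Operating characteristic curve: P(responding above threshold j)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b_j)))

def grm_category_probs(theta, a, thresholds):
    """Category probabilities for categories 0..m; thresholds must be increasing."""
    cum = [1.0] + [occ(theta, a, b_j) for b_j in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

# Hypothetical five-category item with four thresholds.
probs = grm_category_probs(0.0, 1.5, [-2.0, -0.5, 0.5, 2.0])
print([round(p, 3) for p in probs])
print(round(sum(probs), 6))
```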
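20. Item Response Models for polytomous data
Modified Graded Response Model
$$P^*_{ij}(\theta) = \frac{\exp(a_i(\theta - b_i + c_j))}{1 + \exp(a_i(\theta - b_i + c_j))}$$
$b_i$ = location parameter
$c_j$ = threshold parameter
Partial Credit Model: developed by Masters (1982); extension of the 1PL model
$$P_{ix}(\theta) = \frac{\exp\left[\sum_{j=0}^{x}(\theta - \delta_{ij})\right]}{\sum_{r=0}^{m_i}\exp\left[\sum_{j=0}^{r}(\theta - \delta_{ij})\right]}, \quad \sum_{j=0}^{0}(\theta - \delta_{ij}) \equiv 0$$
$\delta_{ij}$ = step difficulty parameter
Generalised Partial Credit Model
$$P_{ix}(\theta) = \frac{\exp\left[\sum_{j=0}^{x} a_i(\theta - \delta_{ij})\right]}{\sum_{r=0}^{m_i}\exp\left[\sum_{j=0}^{r} a_i(\theta - \delta_{ij})\right]}, \quad \sum_{j=0}^{0} a_i(\theta - \delta_{ij}) \equiv 0$$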
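A minimal sketch of the partial credit family, assuming the standard form in which the numerator for category x is the exponential of the cumulative sum of a · (θ − step difficulty) over the steps passed. Setting a = 1 gives the Partial Credit Model. Names and values are hypothetical.

```python
import math

def gpcm_category_probs(theta, a, steps):
    """Category probabilities for categories 0..m given m step difficulties."""
    # The exponent for category 0 is defined as 0; each later category adds one step term.
    exponents = [0.0]
    for delta in steps:
        exponents.append(exponents[-1] + a * (theta - delta))
    numerators = [math.exp(e) for e in exponents]
    total = sum(numerators)
    return [n / total for n in numerators]

# PCM (a = 1) with two steps: at theta equal to the first step difficulty,
# categories 0 and 1 are equally likely.
pcm = gpcm_category_probs(-1.0, 1.0, [-1.0, 1.0])
print([round(p, 3) for p in pcm])

# GPCM: a larger a sharpens the separation between categories.
print([round(p, 3) for p in gpcm_category_probs(0.0, 2.0, [-1.0, 1.0])])
```

The first print shows the interpretation of a step difficulty: it is the trait level at which two adjacent categories are equally probable.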
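21. Item Response Models for polytomous data
Rating Scale Model
Developed by Andrich (1978).
In the RSM the step difficulties of the PCM are
divided into two components: $\lambda_i$, the location of
the item on the latent scale, and $\delta_j$, the category
intersection parameter.
$$P_{ix}(\theta) = \frac{\exp\left[\sum_{j=0}^{x}(\theta - (\lambda_i + \delta_j))\right]}{\sum_{r=0}^{m}\exp\left[\sum_{j=0}^{r}(\theta - (\lambda_i + \delta_j))\right]}, \quad \sum_{j=0}^{0}(\theta - (\lambda_i + \delta_j)) \equiv 0$$
Nominal Response Model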
22. Applications of Item Response Theory
1. Differential Item Functioning
Age groups, genders, cultures, ethnic groups and socio-economic backgrounds
can be meaningfully compared.
“Don’t care about life”
23. Applications of Item Response Theory
2. Scaling individuals for further analysis
Data are usually collected on multiple items and collapsed into a raw score.
An IRT score can instead be used to represent an optimal scaling of individuals on the
latent trait.
3. Scale construction and modification
4. Computer Adaptive Testing (CAT)
An item is given to a participant that allows their trait level to be estimated,
so that the next item is chosen to target that trait level.
This approach is used by testing organizations such as ETS.
5. Test equating
Participants who have taken different tests measuring the same construct
(e.g. the Beck Depression Inventory vs the CES-D) can be placed on a common scale and compared using IRT.
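The scoring and adaptive-testing ideas in this list can be sketched together. The example assumes a 2PL item bank with known parameters, estimates the trait level by maximum likelihood over a coarse grid, and picks the unused item with the highest information at the current estimate. The item bank, grid and function names are all hypothetical, and operational CAT systems use more refined estimation and selection rules.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a correct response (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def information(theta, a, b):
    """2PL item information at trait level theta."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

GRID = [g / 10.0 for g in range(-40, 41)]  # candidate trait levels from -4.0 to 4.0

def estimate_theta(answered):
    """Grid-search maximum likelihood; `answered` is a list of ((a, b), x) with x in {0, 1}."""
    def log_lik(theta):
        total = 0.0
        for (a, b), x in answered:
            p = p_2pl(theta, a, b)
            # Local independence: the log-likelihood is a sum over items.
            total += math.log(p) if x == 1 else math.log(1.0 - p)
        return total
    return max(GRID, key=log_lik)

def next_item(theta, bank, used):
    """Index of the unused item that is most informative at the current estimate."""
    candidates = [i for i in range(len(bank)) if i not in used]
    return max(candidates, key=lambda i: information(theta, *bank[i]))

# Hypothetical item bank of (a, b) pairs.
bank = [(1.2, -1.5), (1.0, 0.0), (1.5, 1.0), (0.8, 2.0)]

# One easy item answered correctly and one hard item answered incorrectly.
theta_hat = estimate_theta([((1.0, -1.0), 1), ((1.0, 1.0), 0)])
print(theta_hat)
print(next_item(theta_hat, bank, used=set()))
```

In this made-up bank the selected item is not the one whose difficulty is closest to the estimate, because its higher discrimination gives it more information there.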
Editor's Notes
Latent trait models aim to look beyond test performance to the underlying traits that produce it. They are measured at item level and provide sample-free measurement.
OCCs must be estimated between category thresholds.