Item Response Theory
Advanced Psychometric Theory, CPS723P
Dr. Carlo Magno
Importance of Test Theories
• Estimate examinee ability and show how the contribution of error might be minimized
• Disattenuation of variables
• Reporting true scores or ability scores and associated confidence bands
Psychometric History
• Lord (1952, 1953) and other psychometricians were interested in psychometric models with which to assess examinees independently of the particular choice of items or assessment tasks used in the assessment.
• Measurement practices would be enhanced if item and test statistics could be made sample independent.
• Birnbaum (1957, 1958)
• George Rasch (1960)
• Wright (1968)
Limitations of Classical Test Theory (CTT)
• Item difficulty and item discrimination are group dependent.
• The p and r values depend on the examinee sample from which they are taken.
• Scores are entirely test dependent.
• No basis to predict the performance of examinees on an item.
Assumptions in IRT
• Unidimensionality – Examinee performance is accounted for by a single ability.
• Dichotomous responses – Each item response is scored as correct or incorrect.
• Monotonicity – The relationship between examinee performance on each item and the ability measured by the test is described as monotonically increasing.
• The monotonic relationship between item performance and ability is typified in an item characteristic curve (ICC).
• Examinees with more ability have higher probabilities of giving correct answers to items than examinees with less ability (Hambleton, 1989).
• A mathematical model links the observable, dichotomously scored data (item performance) to the unobservable data (ability).
• Pi(θ) gives the probability of a correct response to item i as a function of ability (θ).
• b = item difficulty: the point on the ability scale where the probability of a correct answer is (1 + c)/2.
• a = item discrimination.
• c = pseudo-guessing parameter.
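The slide names the parameters a, b, and c but does not write out the model itself. A minimal sketch, assuming the usual three-parameter logistic form in the plain logistic metric (no scaling constant); the function name `icc_3pl` and the parameter values are illustrative, not taken from the slides:

```python
import math

def icc_3pl(theta, a, b, c):
    """Probability of a correct response at ability theta.

    Assumed form: P(theta) = c + (1 - c) / (1 + exp(-a * (theta - b))).
    a = discrimination, b = difficulty, c = pseudo-guessing parameter.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item: at theta == b the probability should be (1 + c) / 2.
a, b, c = 1.2, 0.5, 0.2
p_at_b = icc_3pl(b, a, b, c)
print(p_at_b, (1 + c) / 2)

# The curve rises monotonically with ability.
probs = [icc_3pl(t, a, b, c) for t in (-3, -2, -1, 0, 1, 2, 3)]
print(probs)
```

Under this form the lower asymptote is c and the upper asymptote is 1, so b marks the midpoint between them, which is where the (1 + c)/2 value comes from.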
• In the IRT measurement framework, ability estimates for an examinee obtained from tests that vary in difficulty will be the same.
• Because ability is unchanging across tests, measurement errors are smaller.
• A true score is determined for each test.
• Item parameters are independent of the particular examinee sample used.
• Measurement error is estimated at each ability level.
Test Characteristic Curve (TCC)
• TCC: The sum of the ICCs that make up a test or assessment; it can be used to predict the scores of examinees at given ability levels.
  TCC(θ) = ∑ Pi(θ)
• Links the true score to the underlying ability measured by the test.
• A TCC shifted to the right on the ability scale indicates more difficult items.
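The sum TCC(θ) = ∑ Pi(θ) can be illustrated with a short sketch. It assumes a three-parameter logistic ICC without a scaling constant, and the two small tests below are made-up examples, not data from the slides:

```python
import math

def icc_3pl(theta, a, b, c):
    # Assumed 3PL item characteristic curve (logistic metric).
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def tcc(theta, items):
    """Expected number-correct (true) score at ability theta.

    items is a list of (a, b, c) tuples, one per item.
    """
    return sum(icc_3pl(theta, a, b, c) for a, b, c in items)

# Hypothetical tests: the hard test is the easy test with every b raised by 1.5.
easy_test = [(1.0, -1.0, 0.2), (1.2, -0.5, 0.2), (0.8, 0.0, 0.2)]
hard_test = [(a, b + 1.5, c) for a, b, c in easy_test]

for theta in (-2, 0, 2):
    print(theta, round(tcc(theta, easy_test), 3), round(tcc(theta, hard_test), 3))
```

At every ability level the harder test yields a lower expected score, which is the rightward shift of the curve described above.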
Item Information Function
• I(θ): the contribution of a particular item to the assessment of ability.
• Items with higher discriminating power contribute more to measurement precision than items with lower discriminating power.
• Items tend to make their best contribution to measurement precision around their b value.
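The slides do not give a formula for I(θ). A sketch under the assumption of a three-parameter logistic model in the plain logistic metric, for which the item information can be written as a² · (Q/P) · ((P − c)/(1 − c))², with Q = 1 − P; the function name and item values are illustrative:

```python
import math

def item_information(theta, a, b, c):
    """Item information for an assumed 3PL item (no scaling constant).

    Uses I = a^2 * (Q / P) * ((P - c) / (1 - c))^2, where
    (P - c) / (1 - c) is the plain logistic term.
    """
    logistic = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    p = c + (1.0 - c) * logistic
    return a ** 2 * ((1.0 - p) / p) * logistic ** 2

# Hypothetical items with the same difficulty but different discrimination.
low_a = (0.5, 0.0, 0.0)
high_a = (2.0, 0.0, 0.0)

for theta in (-2, -1, 0, 1, 2):
    print(theta,
          round(item_information(theta, *low_a), 3),
          round(item_information(theta, *high_a), 3))
```

With c = 0 the expression reduces to a²·P·Q, which peaks at θ = b; with c > 0 the peak sits slightly above b, consistent with "around their b value".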
Figure 6: Item characteristic curves and corresponding item information functions. Left panel: four item characteristic curves. Right panel: item information for four test items. Horizontal axes: ability (θ), from –3 to 3.
Test Information Function
• The sum of the item information functions in a test.
• Higher values of the a parameter increase the amount of information an item provides.
• The lower the c parameter, the more information an item provides.
• The more information provided by an assessment at a particular ability level, the smaller the errors associated with ability estimation.
Figure 7: Test information function for a four-item test. Horizontal axis: ability (θ).
Item Parameter Invariance
• Item/test characteristic functions and item/test information functions are integral features of IRT.
Benefits of Item Response Models
• Item statistics that are independent of the groups from which they were estimated.
• Scores describing examinee proficiency or ability that are not dependent on test difficulty.
• Test models that provide a basis for matching items or assessment tasks to ability levels.
• Models that do not require strictly parallel tests or assessments for assessing reliability.
Application of IRT to Test Development
• Item Analysis
– Determining sample-invariant item parameters.
– Utilizing goodness-of-fit criteria to detect items that do not fit the specified response model (χ², analysis of residuals).
Application of IRT to Test Development
• Item Selection
– Assess the contribution of each item to the test information function independently of other items.
– Using item information functions:
  • Describe the shape of the desired (target) test information function over the desired range of abilities.
  • Select items with information functions that will fill up the hard-to-fill areas under the target information function.
  • Calculate the test information function for the selected assessment material.
  • Continue selecting materials until the test information function approximates the target information function to a satisfactory degree.
• Item banking
– Test developers can build an assessment to fit any desired test information function, given items with sufficient properties.
– Comparisons of items can be made across dissimilar samples.
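The item selection steps above (set a target information function, pick items that fill the gaps, recompute, repeat) can be sketched as a simple greedy loop over an item bank. This is one possible reading of the procedure, not the author's algorithm; the 3PL information formula, the bank, the ability grid, and the target values are all assumptions made for illustration:

```python
import math

def item_information(theta, a, b, c):
    # Assumed 3PL item information (logistic metric, no scaling constant).
    logistic = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    p = c + (1.0 - c) * logistic
    return a ** 2 * ((1.0 - p) / p) * logistic ** 2

def shortfall(current, target):
    """Total amount by which current information falls below the target."""
    return sum(max(0.0, t - cur) for cur, t in zip(current, target))

def add_item(current, grid, item):
    # Test information on the grid after adding one item.
    return [cur + item_information(th, *item) for cur, th in zip(current, grid)]

def select_items(bank, grid, target, max_items):
    """Greedily pick the item that most reduces the remaining shortfall."""
    selected, remaining = [], list(bank)
    current = [0.0] * len(grid)
    while remaining and len(selected) < max_items and shortfall(current, target) > 0:
        best = min(remaining, key=lambda it: shortfall(add_item(current, grid, it), target))
        remaining.remove(best)
        selected.append(best)
        current = add_item(current, grid, best)
    return selected, current

# Hypothetical bank of (a, b, c) items and a hypothetical target on a 3-point grid.
bank = [(a, b, 0.2) for a in (0.8, 1.2, 1.6) for b in (-2.0, -1.0, 0.0, 1.0, 2.0)]
grid = [-1.0, 0.0, 1.0]
target = [1.0, 1.5, 1.0]

selected, achieved = select_items(bank, grid, target, max_items=10)
print(len(selected), [round(x, 3) for x in achieved])
```

The loop stops when the target is met at every grid point, when the bank is exhausted, or when `max_items` is reached, so the result should be checked against the target before the test is used.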