Day 2 AM: Advanced IRT topics
Linking and Equating
DIF
Polytomous IRT
IRT Software overview
Dimensionality
Part 1
Linking, equating, and scaling
Linking = setting different sets of item parameters onto the
same scale
Equating = setting different sets of students/scores on the
same scale
Linking and equating
 Why important? This is necessary for
a stable scale
 If we equate scores on this year’s exam forms to last year’s, we know that a score of X still means the exact same thing
 If we don’t, a score of X could mean
different things
Linking and equating
 Two approaches
◦ Base form: Completely map Form B on to
the scale of Form A (the base form)
 Appropriate for maintaining continuity across
time… Form A is base for a long time
◦ Merged scale: Combine data and scales
for Forms A and B (“super matrix”)
 OK for multiple forms at same time, but not
across time
Linking and equating
 In IRT, items and people are on the same scale, so linking and equating are theoretically equivalent (although they can be conducted separately)
 In CTT, this is not the case
 Linking doesn’t really exist in CTT –
but there is extensive research on
equating because it is so important
IRT equating
 Many issues in CTT equating are
reduced with IRT because of the
property of invariance
 Item parameters are invariant across
calibration groups, except for a
linear transformation
 All we have to do is find it
IRT equating
 Why? Each scale is defined by its calibration sample, standardized to N(0,1)
 Another sample might have a slightly different distribution of true ability, yet is also calibrated to its own N(0,1)
 So we find how to map the two scales to each other
Prerequisites
 To accomplish linking/equating:
 1. The two tests/forms must measure the same thing (otherwise the result is a concordance, not an equating)
 2. The two tests/forms must have
equal reliability (classically)
 3. The equating transformation must
be invertible
◦ (A to B and B to A)
Common items or people?
 To do an effective linking between
two data sets, you need something in
common
◦ People – some of the students are the same as last year (but this assumes they are unchanged, so it is probably not a good idea in education!)
◦ Items – Rule of thumb is 20% or 20 items
Common item linking

Common item linking
 Suppose there were 100 items on a
test in 2010
 …and the first 20 were anchors back
to 2009
 Then we need to pick 20 out of the
last 80 to be anchors in 2011
 80 additional items would be
selected as “new” (not necessarily
brand new)
Common item linking
 2009 average = 65
 2010 average = 67
 2009 anchor average = 11
 2010 anchor average = 12
Common item linking
 Items should specifically be selected
to be the anchors
 Difficulty: spread similar to the test
as a whole
 Discrimination: higher is better, but
not so much that it is
unrepresentative of the test
 Not previous anchors
IRT linking analyses
 There are two paradigms for linking:
◦ Concurrent calibration linking
 Full Group (merges scale!!!)
 Target Group (fix parameters)
◦ Conversion linking
 Parameter transformation (mean/mean, mean/sigma)
 TRF methods (Stocking & Lord, Haebara)
IRT linking analyses
 I recommend either target group
calibration or S&L conversion
 Xcalibre does concurrent calibration
methods
 Conversion methods are an additional
post-hoc analysis, so separate
software: IRTEQ
Linking software - IRTEQ
 User-friendly conversion linking
◦ Kyung (Chris) T. Han
 Now at GMAC
◦ Windows GUI
◦ Does all major conversion methods, and
compares them
◦ Interfaces with Parscale
Linking software - IRTEQ
 Purpose of conversion methods:
estimate the linear conversion
between two IRT scales (two
different forms)
 Kind of like regression
 Since the conversion is linear, “no change” corresponds to a slope (A) of 1 and an intercept (B) of 0
 Five different methods of estimating
these
Linking software – IRTEQ

Linking software - IRTEQ
 Compare bs
Linking software - IRTEQ
 Output from example data…
Linking software - IRTEQ
 Then what?
 θ* = Aθ + B
 b* = Ab + B
 a* = a / A
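The conversion can be sketched end to end: estimate A and B from anchor items calibrated on both forms (the mean/sigma method, the simplest of the conversion methods), then apply the formulas above. This is a minimal Python sketch with made-up anchor b values; `mean_sigma_link` and `convert` are hypothetical helper names, not IRTEQ functions.

```python
import statistics

def mean_sigma_link(b_base, b_new):
    """Mean/sigma estimate of the linear conversion theta* = A*theta + B
    that maps the new form's scale onto the base form's scale, using the
    b values of anchor items calibrated on both forms."""
    A = statistics.stdev(b_base) / statistics.stdev(b_new)
    B = statistics.mean(b_base) - A * statistics.mean(b_new)
    return A, B

def convert(A, B, thetas, bs, a_s):
    """Apply theta* = A*theta + B, b* = A*b + B, a* = a / A."""
    return ([A * t + B for t in thetas],
            [A * b + B for b in bs],
            [a / A for a in a_s])

# Anchor item b values on both forms (invented numbers):
b_base = [-1.2, -0.4, 0.1, 0.9, 1.5]
b_new  = [-1.0, -0.2, 0.3, 1.1, 1.7]
A, B = mean_sigma_link(b_base, b_new)
```

Here the new-form b values are uniformly 0.2 higher, so the estimate is A = 1 and B = -0.2: the new group was slightly more able, so its scale gets shifted down to match the base scale.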
Part 2
Differential item functioning (DIF)
What is DIF?
 Differential item functioning
 The item functions differently for
different groups – and is therefore unfair
 One group is more likely to get an item
correct when ability is held constant
What is DIF?
 Two ways to operationally define:
 Directly evaluate probability of response
for ability level slices (Mantel-Haenszel)
 Compare item parameters or statistics
for each group
◦ Basically, analyze the data for each group
separately, then compare
DIF Groups
 Reference group – the main (usually
majority) group
 Focal group – the group being
examined to see if different than
reference group (usually minority)
 DIF analyses assume that both are on
same scale
Types of DIF
 Non-Crossing DIF = the group difference is the same across all ability levels
◦ Females do better than males, regardless of ability
◦ “Bias”
◦ aka Uniform DIF
 Crossing DIF = the group difference is not the same across ability levels
◦ Females do better than males at above-average ability, but the same at low ability
◦ aka Non-Uniform DIF
Non-crossing DIF
 Example
Crossing DIF
 Example
Quantifying DIF
 Mantel-Haenszel (in Xcalibre)
 Make N ability level slices
 At each, 2 x 2 table of reference/
focal and correct/incorrect
 “Ability” can be classical or IRT
scores
 Show Xcalibre – SMKING with P/L
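The Mantel-Haenszel computation itself is simple enough to sketch: pool the 2x2 tables across ability slices into a common odds ratio, then convert it to the ETS delta scale. The slice counts below are made up for illustration.

```python
import math

def mantel_haenszel(tables):
    """tables: one (ref_correct, ref_incorrect, focal_correct, focal_incorrect)
    2x2 table per ability slice. Returns the MH common odds ratio and the
    ETS delta-scale value (negative delta means the item favors the
    reference group)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    alpha = num / den
    delta = -2.35 * math.log(alpha)
    return alpha, delta

# Invented slices: (ref correct, ref incorrect, focal correct, focal incorrect)
tables = [(30, 20, 20, 30), (40, 10, 30, 20), (45, 5, 40, 10)]
alpha, delta = mantel_haenszel(tables)
```

With these counts the reference group is more likely to answer correctly at every slice, so alpha comes out above 1 and delta is negative, flagging the item for review.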
Quantifying DIF
 There are two IRT-only approaches to
quantifying DIF
◦ Difference between item parameters
 bR = bF?
 Parscale uses this
◦ Difference between IRFs
 More advanced and recent (1995)
 Special program needed: DFIT
 ASC sells Windows version; DOS is free
DIF in Parscale
 Parscale gives several indices (bR = bF)
◦ 1. Absolute difference in parameter
◦ 2. Standardized difference (absolute/SE)
◦ 3. Chi-square = (StanDiff)²
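As a sketch of those three indices (assuming the standard error of the difference is the pooled SE of the two estimates; the parameter values are invented):

```python
import math

def dif_indices(b_ref, se_ref, b_focal, se_focal):
    """Raw b difference, standardized difference (difference over its
    pooled standard error), and the chi-square (standardized diff squared)."""
    raw = b_ref - b_focal
    stan = raw / math.sqrt(se_ref ** 2 + se_focal ** 2)
    chi2 = stan ** 2
    return raw, stan, chi2

raw, stan, chi2 = dif_indices(b_ref=0.50, se_ref=0.10, b_focal=0.15, se_focal=0.12)
```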
DIF in Parscale
 Contrast and standardized difference
DIF in Parscale
 Chi-square
 More conservative, so better to use
DIF in Parscale
 Straightforward interpretations
◦ Raw diff
◦ Standard diff
◦ p values
 But notice that they are different in the two
tables!
DFIT
 Tests the shaded area
Compensatory DIF
 Another thing to keep an eye out for
 DIF in one item can be offset by DIF
in another
 So a few items in one direction can be offset by items in the other
 The total test then shows no DTF (differential test functioning)
Compensatory DIF
 And you’re unlikely to have a test with no flagged items; some flags occur simply by chance (Type I error)
 DIF analysis can only flag items for
you
 You then need to closely evaluate
content and decide if there is an
issue, or to proceed
Part 3
Polytomous IRT
Polytomous IRT
 For data scored in 3+ categories (remember, multiple choice collapses to two)
 As mentioned previously, there are
two main families of polytomous
models
◦ Rating Scale
◦ Partial Credit
 Rasch and non-Rasch (“Generalized”)
Rating scale approach
 Designed for Likert-type questions…
◦ Rate on a scale of 1 to 5 whether the
adjective applies to you:
Adjective 1 2 3 4 5
Trustworthy
Outgoing
Diligent
Conscientious
Rating scale approach
 We assume that the process or
mental structure behind the 1-5 scale
is the same for every item
 But items might differ in “difficulty”
Partial credit approach
 We assume that the response process
is different for every item
 The difference between 2 and 4
points might be wider/narrower
Rasch/Non-Rasch
 Non-Rasch allows discrimination to
vary between items
 This means curves can have different
steepness/separation
Comparison table
Model   Item Disc.              Step Spacing   Step Ordering   Option Disc.
RSM     Fixed                   Fixed          Fixed           Fixed
PCM     Fixed                   Variable       Variable        Fixed
GRSM    Variable                Fixed          Fixed           Fixed
GRM     Variable                Variable       Fixed           Fixed
GPCM    Variable                Variable       Variable        Fixed
NRM     Variable (each option)  Variable       Variable        Variable

(Fixed/Variable refers to variation between items)
Polytomous IRT
 It used to be that you had to program
all that manually (PARSCALE)
 Let’s look at it in Xcalibre 4…
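As a quick illustration of the partial credit approach, here is a sketch of the generalized partial credit model (GPCM) category probabilities for one item; the item parameters are invented.

```python
import math

def gpcm_probs(theta, a, steps):
    """GPCM category probabilities for one item. `steps` holds the step
    difficulties b_1..b_m; returns P(score = 0), ..., P(score = m).
    Fixing a across items reduces this to the (Rasch) partial credit model."""
    # Cumulative sums of a*(theta - b_j); the score-0 term is defined as 0.
    z = [0.0]
    for b in steps:
        z.append(z[-1] + a * (theta - b))
    denom = sum(math.exp(v) for v in z)
    return [math.exp(v) / denom for v in z]

probs = gpcm_probs(theta=0.0, a=1.0, steps=[-1.0, 0.0, 1.0])
```

With symmetric steps and theta at the center, the middle categories are the most likely and the probabilities across all four categories sum to 1.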
Part 4
IRT Software
IRT Software
 There are a number of programs out
there, reflecting:
◦ Types of approaches (Rasch, 3PL, Poly)
◦ Cost (free up to 100s of dollars)
◦ Special topics like fit, linking, and form
assembly
◦ Usability vs. flexibility
Some IRT calibration programs
 Xcalibre 4 – easy to use, complete
reports
 Parscale – extremely flexible, does
most models, but difficult to use
 Bilog – most powerful dichotomous
program, difficult to use
 ConQuest – advanced features like facets models and multidimensional models
 Winsteps – most common Rasch
program
Some IRT calibration programs
 PARAM3PL – free; only 3PL
 ICL – free; lots of stuff, but difficult
to use and no support
 R – free; some routines there, but
slow, and inferior output
 OPLM – free; from Cito
Other IRT programs
 ASC’s Form Building Tool to build new
forms using calibrated items
 DIF Tool for DIF graphing
 DFIT8 – DFIT framework for DIF
 ScoreAll – scores examinees
 CATSim – CAT simulations
 IRTLRDIF2
 Most organizations build their own tools for specific purposes
What to do with the results?
 Often a good idea to import scores
and item parameters into Excel (ASC
does CSV directly)
 You can manipulate and further
analyze (frequency graphs, etc.)
 Also helpful for further importing –
scores into a database and item
parameters into the item banker
Part 5
Assumptions of IRT: Dimensionality
IRT assumptions
 Basic Assumptions
1. A stable unidimensional trait
 Item responses are independent of each
other (local independence), except for the
trait/ability that they measure
2. A specific form of the relationship
between trait level and probability of a
response (the response function, or IRT
model)
IRT assumptions
 Unidimensionality and local
independence are actually equivalent
◦ If items are interdependent, then the
probability of a response is due to two
things: your trait level, and whether you
saw the “tripping” item first
◦ This makes it two-dimensional
IRT assumptions
 Two other common violations:
◦ Speededness
◦ Actual multidimensional test (medical
knowledge vs. clinical ability, language
vs. math)
How to check
 So there are two important things to
check:
◦ Unidimensionality
 Factor Analysis
 Bejar’s method
 DIMTEST
◦ Whether our IRT model was a good
choice
 Model fit
Checking unidimensionality
 Factor analysis
 Used often in research investigating
dimensionality
 But “normal” factor analysis, which uses Pearson correlations, is not recommended
◦ That is what typical software packages like SPSS use
Checking unidimensionality
 Item-level data of tests are
dichotomous, unlike total scores,
which are continuous
 Special software does factor analysis
with tetrachoric calculations for this
 MicroFact (from ASC)
 TESTFACT (from SSI)
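To see what these programs are doing, here is a rough sketch of a tetrachoric correlation for one pair of dichotomous items, using the classical "cosine-pi" approximation rather than the exact maximum-likelihood value MicroFact or TESTFACT would compute; the table counts are made up.

```python
import math

def tetrachoric_approx(n00, n01, n10, n11):
    """Cosine-pi approximation to the tetrachoric correlation for a 2x2
    table of two dichotomous items (n11 = both correct, n00 = both
    incorrect). Illustrative only; real software solves for the exact value."""
    ratio = (n00 * n11) / (n01 * n10)  # concordant over discordant products
    return math.cos(math.pi / (1 + math.sqrt(ratio)))

r = tetrachoric_approx(n00=40, n01=10, n10=10, n11=40)
```

Sanity checks: equal cells (independence) give r near 0, and the strongly concordant table above gives r near 0.81.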
Checking unidimensionality
 Output is still similar to regular
factor analysis
 Eigenvalue plot to examine number
of factors
 Factor loading matrix to examine
“sorting” of items
Checking unidimensionality
Checking unidimensionality
 See output files…
 If unidimensional, factor loadings will
pattern similar to IRT a parameters
Item   a     Loading
1      .72   .42
2      .81   .44
3      .96   .54
4      .83   .25
5      .47   .11
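One way to see that connection: under the normal-ogive model a loading λ corresponds roughly to a = λ/√(1 − λ²), times 1.7 in the logistic metric. A sketch applying that to the loadings in the table; note how the implied values track the listed a parameters for items 1 to 3 but break the pattern for items 4 and 5.

```python
import math

def loading_to_a(loading, D=1.7):
    """Approximate IRT a parameter implied by a (tetrachoric) factor
    loading, via a = loading / sqrt(1 - loading^2), scaled by D = 1.7
    for the logistic metric. A rough illustration only."""
    return D * loading / math.sqrt(1 - loading ** 2)

# Loadings from the slide's table:
implied = [round(loading_to_a(lam), 2) for lam in (0.42, 0.44, 0.54, 0.25, 0.11)]
```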
Checking unidimensionality
 Bejar’s Method
 Useful in situations where you know
your test has different content areas
 Examples:
◦ Cognitive test with fluid and crystallized
◦ Math test with story problems and
number-only problems
◦ Language test with writing and reading
Checking unidimensionality
 It is possible that these tests are not
completely unidimensional, and we
have a good reason to check
Checking unidimensionality
 Bejar’s method:
◦ 1. Do an IRT calibration of the entire test
◦ 2. Do an IRT calibration of each area
separately
◦ 3. Compare the item parameters
Checking unidimensionality
 b parameters
Checking unidimensionality
 a parameters
Checking unidimensionality
 c parameters

Meghan Sutherland In Media Res Media Component
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 

Implementing Item Response Theory

  • 1. Day 2 AM: Advanced IRT topics Linking and Equating DIF Polytomous IRT IRT Software overview Dimensionality
  • 2. Part 1 Linking, equating, and scaling Linking = setting different sets of item parameters onto the same scale Equating = setting different sets of students/scores on the same scale
  • 3. Linking and equating  Why important? This is necessary for a stable scale  If we equate scores on this year’s exam forms to last year’s we know that a score of X still means the exact same thing  If we don’t, a score of X could mean different things
  • 4. Linking and equating  Two approaches ◦ Base form: Completely map Form B onto the scale of Form A (the base form)  Appropriate for maintaining continuity across time… Form A is base for a long time ◦ Merged scale: Combine data and scales for Forms A and B (“super matrix”)  OK for multiple forms at same time, but not across time
  • 5. Linking and equating  In IRT, items and people are on the same scale, so linking/equating are equivalent theoretically (although they can be conducted separately)  In CTT, this is not the case  Linking doesn’t really exist in CTT – but there is extensive research on equating because it is so important
  • 6. IRT equating  Many issues in CTT equating are reduced with IRT because of the property of invariance  Item parameters are invariant across calibration groups, except for a linear transformation  All we have to do is find it
  • 7. IRT equating  Why? The scales are defined by the sample scores, centered on N(0,1)  Another sample might have a slightly different distribution of true scores, and is calibrated with a theoretically different N(0,1)  So we find how to map the two scales to each other
  • 8. Prerequisites  To accomplish linking/equating:  1. The two tests/forms must measure the same thing (otherwise concordance)  2. The two tests/forms must have equal reliability (classically)  3. The equating transformation must be invertible ◦ (A to B and B to A)
  • 9. Common items or people?  To do an effective linking between two data sets, you need something in common ◦ People – some of the students are the same as last year (but unchanged, so this is probably not a good idea for education!) ◦ Items – Rule of thumb is 20% or 20 items
  • 11. Common item linking  Suppose there were 100 items on a test in 2010  …and the first 20 were anchors back to 2009  Then we need to pick 20 out of the last 80 to be anchors in 2011  80 additional items would be selected as “new” (not necessarily brand new)
  • 12. Common item linking  2009 average = 65  2010 average = 67  2009 anchor average = 11  2010 anchor average = 12
  • 13. Common item linking  Items should specifically be selected to be the anchors  Difficulty: spread similar to the test as a whole  Discrimination: higher is better, but not so much that it is unrepresentative of the test  Not previous anchors
  • 14. IRT linking analyses  There are two paradigms for linking: ◦ Concurrent calibration linking  Full Group (merges scale!!!)  Target Group (fix parameters) ◦ Conversion linking  Parameter transformation (mean/mean, mean/sigma)  TRF methods (Stocking & Lord, Haebara)
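The mean/sigma conversion method above can be sketched in a few lines. The anchor-item difficulties below are made-up toy values; the idea is simply to match the mean and SD of the new form's anchor b's to the base form's.

```python
import math

def mean_sigma_link(b_base, b_new):
    """Mean/sigma linking: find A and B so that A*b + B maps the new
    form's anchor-item difficulties onto the base form's scale."""
    def mean(xs):
        return sum(xs) / len(xs)
    def sd(xs):
        m = mean(xs)
        return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
    A = sd(b_base) / sd(b_new)
    B = mean(b_base) - A * mean(b_new)
    return A, B

# Hypothetical anchor difficulties from two separate calibrations
b_base = [-1.2, -0.4, 0.1, 0.8, 1.5]
b_new = [-1.0, -0.3, 0.2, 0.7, 1.3]
A, B = mean_sigma_link(b_base, b_new)
```

By construction, A*b + B gives the transformed anchors the same mean and SD as the base-form anchors; the TRF methods (Stocking & Lord, Haebara) instead choose A and B to match test response functions.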
  • 15. IRT linking analyses  I recommend either target group calibration or S&L conversion  Xcalibre does concurrent calibration methods  Conversion methods are an additional post-hoc analysis, so separate software: IRTEQ
  • 16. Linking software - IRTEQ  User-friendly conversion linking ◦ Kyung (Chris) T. Han  Now at GMAC ◦ Windows GUI ◦ Does all major conversion methods, and compares them ◦ Interfaces with Parscale
  • 17. Linking software - IRTEQ  Purpose of conversion methods: estimate the linear conversion between two IRT scales (two different forms)  Kind of like regression  Since the conversion is linear, “no change” corresponds to a slope (A) of 1 and an intercept (B) of 0  Five different methods of estimating these
  • 18. Linking software – IRTEQ
  • 19. Linking software - IRTEQ  Compare bs
  • 20. Linking software - IRTEQ  Output from example data…
  • 21. Linking software - IRTEQ  Then what?  θ* = A(θ)+B  b* = A(b)+B  a* = a/A
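The conversion formulas on this slide can be sketched directly; the key property is that the 2PL probability is unchanged when theta and the item parameters are rescaled together (parameter values below are illustrative).

```python
import math

def p2pl(theta, a, b):
    """2PL response probability."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def rescale(theta, a, b, A, B):
    """The slide's conversion: theta* = A(theta)+B, b* = A(b)+B, a* = a/A."""
    return A * theta + B, a / A, A * b + B

theta, a, b = 0.5, 1.2, -0.3
theta2, a2, b2 = rescale(theta, a, b, A=1.1, B=-0.2)
p_before = p2pl(theta, a, b)
p_after = p2pl(theta2, a2, b2)  # identical: a*(theta - b) is invariant
```

This invariance is exactly why equating reduces to finding A and B: the model itself does not care which of the two equivalent scales you report on.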
  • 22. Part 2 Differential item functioning (DIF)
  • 23. What is DIF?  Differential item functioning  The item functions differently for different groups – and is therefore unfair  One group is more likely to get an item correct when ability is held constant
  • 24. What is DIF?  Two ways to operationally define:  Directly evaluate probability of response for ability level slices (Mantel-Haenszel)  Compare item parameters or statistics for each group ◦ Basically, analyze the data for each group separately, then compare
  • 25. DIF Groups  Reference group – the main (usually majority) group  Focal group – the group being examined to see if it differs from the reference group (usually minority)  DIF analyses assume that both are on the same scale
  • 26. Types of DIF  Non-Crossing DIF = the group difference is the same across all ability levels ◦ Females do better than males, regardless of ability ◦ “Bias” ◦ aka Uniform DIF  Crossing DIF = the group difference is not the same across ability levels ◦ Females do better than males at above-average ability, but the same for low ability ◦ aka Non-Uniform DIF
  • 29. Quantifying DIF  Mantel-Haenszel (in Xcalibre)  Make N ability level slices  At each, 2 x 2 table of reference/ focal and correct/incorrect  “Ability” can be classical or IRT scores  Show Xcalibre – SMKING with P/L
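A minimal sketch of the Mantel-Haenszel computation described above, assuming made-up counts per ability slice (each slice is a 2x2 table of group by correct/incorrect):

```python
import math

def mh_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio across ability-level slices.
    Each stratum: (ref_correct, ref_wrong, focal_correct, focal_wrong)."""
    num = den = 0.0
    for rc, rw, fc, fw in strata:
        n = rc + rw + fc + fw
        num += rc * fw / n
        den += rw * fc / n
    return num / den

# Hypothetical counts for three ability slices
strata = [(40, 10, 35, 15), (30, 20, 25, 25), (15, 35, 10, 40)]
alpha = mh_odds_ratio(strata)
# ETS reports DIF on the delta scale: D = -2.35 * ln(alpha); alpha = 1 means no DIF
delta = -2.35 * math.log(alpha)
```

Here alpha > 1 means the reference group has higher odds of a correct answer at matched ability, which shows up as a negative delta (the item disadvantages the focal group).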
  • 30. Quantifying DIF  There are two IRT-only approaches to quantifying DIF ◦ Difference between item parameters  bR = bF?  Parscale uses this ◦ Difference between IRFs  More advanced and recent (1995)  Special program needed: DFIT  ASC sells Windows version; DOS is free
  • 31. DIF in Parscale  Parscale gives several indices (bR = bF) ◦ 1. Absolute difference in parameter ◦ 2. Standardized difference (absolute/SE) ◦ 3. Chi-square = (StanDiff)2
  • 32. DIF in Parscale  Contrast and standardized difference
  • 33. DIF in Parscale  Chi-square  More conservative, so better to use
  • 34. DIF in Parscale  Straightforward interpretations ◦ Raw diff ◦ Standard diff ◦ p values  But notice that they are different in the two tables!
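The three Parscale-style indices above can be sketched as follows. The b estimates and standard errors are made up, and combining the two group SEs in quadrature for the SE of the difference is an assumption about the computation, not a quote from the Parscale manual.

```python
import math

# Hypothetical difficulty estimates from separate reference/focal calibrations
b_ref, se_ref = 0.20, 0.08
b_foc, se_foc = 0.55, 0.10

raw_diff = b_ref - b_foc                                  # 1. raw difference
stan_diff = raw_diff / math.sqrt(se_ref**2 + se_foc**2)   # 2. standardized difference
chi_sq = stan_diff ** 2                                   # 3. chi-square with df = 1
flagged = chi_sq > 3.84  # exceeds the .05 critical value for 1 df
```

The chi-square is just the squared standardized difference, which is why the slide calls it the more conservative flag: it maps onto an explicit significance test.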
  • 35. DFIT  Tests the shaded area
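The "shaded area" idea can be sketched numerically: the unsigned area between the reference and focal IRFs over a theta grid. The 2PL curves and parameters below are illustrative, not DFIT's actual indices (DFIT weights the difference by the focal-group distribution).

```python
import math

def p2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def area_between_irfs(a_r, b_r, a_f, b_f, lo=-8.0, hi=8.0, n=8001):
    """Unsigned area between two IRFs, trapezoid rule on a theta grid."""
    step = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        theta = lo + i * step
        w = 0.5 if i in (0, n - 1) else 1.0
        total += w * abs(p2pl(theta, a_r, b_r) - p2pl(theta, a_f, b_f))
    return total * step

no_dif = area_between_irfs(1.0, 0.0, 1.0, 0.0)    # identical curves: area 0
some_dif = area_between_irfs(1.0, 0.0, 1.0, 0.5)  # focal item harder: area opens up
```

A known closed-form check: when the discriminations are equal, the exact unsigned area is |b_f - b_r|, so the second call should come out near 0.5.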
  • 36. Compensatory DIF  Another thing to keep an eye out for  DIF in one item can be offset by DIF in another  So a few items in one direction can be offset  The total test then shows no differential test functioning (DTF)
  • 37. Compensatory DIF  And you’re not likely to have a test without DIF items; it just happens for whatever reason (Type I error)  DIF analysis can only flag items for you  You then need to closely evaluate content and decide whether there is an issue or whether to proceed
  • 39. Polytomous IRT  For items scored in three or more categories (remember, multiple choice collapses to two)  As mentioned previously, there are two main families of polytomous models ◦ Rating Scale ◦ Partial Credit  Rasch and non-Rasch (“Generalized”)
  • 40. Rating scale approach  Designed for Likert-type questions… ◦ Rate on a scale of 1 to 5 whether the adjective applies to you: Adjective 1 2 3 4 5 Trustworthy Outgoing Diligent Conscientious
  • 41. Rating scale approach  We assume that the process or mental structure behind the 1-5 scale is the same for every item  But items might differ in “difficulty”
  • 42. Partial credit approach  We assume that the response process is different for every item  The difference between 2 and 4 points might be wider/narrower
  • 43. Rasch/Non-Rasch  Non-Rasch allows discrimination to vary between items  This means curves can have different steepness/separation
  • 44. Comparison table Model Item Disc. Step Spacing Step Ordering Option Disc. RSM Fixed Fixed Fixed Fixed PCM Fixed Variable Variable Fixed GRSM Variable Fixed Fixed Fixed GRM Variable Variable Fixed Fixed GPCM Variable Variable Variable Fixed NRM Variable (each option) Variable Variable Variable Fixed/Variable between items
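As a sketch of the partial credit family in the table above, here are GPCM category probabilities for one item. The step difficulties are made up; fixing a = 1 for every item reduces this to the Rasch PCM.

```python
import math

def gpcm_probs(theta, a, steps):
    """GPCM: probability of each score category 0..m, where `steps`
    holds the m step difficulties b_1..b_m."""
    cum = [0.0]  # cumulative logits; category 0 contributes 0
    for b in steps:
        cum.append(cum[-1] + a * (theta - b))
    expcum = [math.exp(c) for c in cum]
    z = sum(expcum)
    return [e / z for e in expcum]

# A 4-category (0-3) item with symmetric, ordered steps
probs = gpcm_probs(theta=0.0, a=1.0, steps=[-1.0, 0.0, 1.0])
```

The "Variable step spacing/ordering" rows in the table correspond to each item getting its own `steps` vector; the rating scale models instead share one step vector across items, shifted by an item location.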
  • 45. Polytomous IRT  It used to be that you had to program all that manually (PARSCALE)  Let’s look at it in Xcalibre 4…
  • 47. IRT Software  There are a number of programs out there, reflecting: ◦ Types of approaches (Rasch, 3PL, Poly) ◦ Cost (free up to 100s of dollars) ◦ Special topics like fit, linking, and form assembly ◦ Usability vs. flexibility
  • 48. Some IRT calibration programs  Xcalibre 4 – easy to use, complete reports  Parscale – extremely flexible, does most models, but difficult to use  Bilog – most powerful dichotomous program, difficult to use  ConQuest – advanced things like facets models and multidimensional  Winsteps – most common Rasch program
  • 49. Some IRT calibration programs  PARAM3PL – free; only 3PL  ICL – free; lots of stuff, but difficult to use and no support  R – free; some routines there, but slow, and inferior output  OPLM – free; from Cito
  • 50. Other IRT programs  ASC’s Form Building Tool to build new forms using calibrated items  DIF Tool for DIF graphing  DFIT8 – DFIT framework for DIF  ScoreAll – scores examinees  CATSim – CAT simulations  IRTLRDIF2  Most organizations build own tools for specific purposes
  • 51. What to do with the results?  Often a good idea to import scores and item parameters into Excel (ASC does CSV directly)  You can manipulate and further analyze (frequency graphs, etc.)  Also helpful for further importing – scores into a database and item parameters into the item banker
  • 52. Part 5 Assumptions of IRT: Dimensionality
  • 53. IRT assumptions  Basic Assumptions 1. A stable unidimensional trait  Item responses are independent of each other (local independence), except for the trait/ability that they measure 2. A specific form of the relationship between trait level and probability of a response (the response function, or IRT model)
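Assumption 2 pins down the response function. For example, the 3PL form used throughout these sessions is a one-liner (the parameter values below are illustrative):

```python
import math

def p3pl(theta, a, b, c):
    """3PL: P(correct) = c + (1 - c) / (1 + exp(-a*(theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# At theta = b the curve sits halfway between the guessing floor c and 1
p_mid = p3pl(theta=0.4, a=1.3, b=0.4, c=0.2)
```

Model-fit checking asks whether observed proportions correct at each trait level actually track this curve; if they do not, the chosen form (not just the data) is suspect.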
  • 54. IRT assumptions  Unidimensionality and local independence are actually equivalent ◦ If items are interdependent, then the probability of a response is due to two things: your trait level, and whether you saw the “tripping” item first ◦ This makes it two-dimensional
  • 55. IRT assumptions  Two other common violations: ◦ Speededness ◦ Actual multidimensional test (medical knowledge vs. clinical ability, language vs. math)
  • 56. How to check  So there are two important things to check: ◦ Unidimensionality  Factor Analysis  Bejar’s method  DIMTEST ◦ Whether our IRT model was a good choice  Model fit
  • 57. Checking unidimensionality  Factor analysis  Used often in research investigating dimensionality  But it is not recommended to use “normal” factor analysis, which uses Pearson correlations ◦ This is used in typical software packages like SPSS
  • 58. Checking unidimensionality  Item-level data of tests are dichotomous, unlike total scores, which are continuous  Special software does factor analysis with tetrachoric calculations for this  MicroFact (from ASC)  TESTFACT (from SSI)
  • 59. Checking unidimensionality  Output is still similar to regular factor analysis  Eigenvalue plot to examine number of factors  Factor loading matrix to examine “sorting” of items
  • 61. Checking unidimensionality  See output files…  If unidimensional, factor loadings will pattern similar to IRT a parameters
  Item   a    Loading
  1     .72   .42
  2     .81   .44
  3     .96   .54
  4     .83   .25
  5     .47   .11
  • 62. Checking unidimensionality  Bejar’s Method  Useful in situations where you know your test has different content areas  Examples: ◦ Cognitive test with fluid and crystallized ◦ Math test with story problems and number-only problems ◦ Language test with writing and reading
  • 63. Checking unidimensionality  It is possible that these tests are not completely unidimensional, and we have a good reason to check
  • 64. Checking unidimensionality  Bejar’s method: ◦ 1. Do an IRT calibration of the entire test ◦ 2. Do an IRT calibration of each area separately ◦ 3. Compare the item parameters
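Step 3 of Bejar's method can be sketched as a simple agreement check between the two sets of difficulties (the values below are made up); if the test is unidimensional, the content-area calibration should reproduce the full-test parameters up to a linear scale shift, so the correlation should be near 1.

```python
import math

def corr(xs, ys):
    """Pearson correlation, pure stdlib."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)

# Hypothetical b's for the same items: full-test vs. content-area calibration
b_full = [-1.1, -0.5, 0.0, 0.4, 1.2]
b_area = [-1.05, -0.38, 0.12, 0.47, 1.31]
r = corr(b_full, b_area)
```

In practice this comparison is usually done as a scatterplot per content area; items falling well off the line suggest that the area is pulling in a second dimension.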