Day 1 PM: Using IRT
Item and test information
Comparison of IRT to Classical Test Theory
How to do IRT analysis
Part 1
Item and test information
Information
 Information is the tool that IRT uses
to build tests
 It is a statistical term that quantifies
how much something “adds” to a
procedure
 Or, alternatively, how much
uncertainty (error) it decreases
 A good test has a lot of information!
Item information
 IRT calculates information for each
item and test at each level of θ
 It is therefore not a single number –
it is a function across ability
 Each item has an item information
function
 Each test has a test information
function
Item information
 Some items provide information for
high students, some for low
 Same is true for tests: a test can be
more accurate for certain score
ranges – and IRT will tell you which
Information
 Item information is additive, that
is, it can be added up to obtain the
test information function (TIF)
 Then we know where to add/subtract
items
 Bonus: The TIF can also be inverted
to obtain a predicted SEM curve
Item information
 With CTT, “information” can be
conceptualized by jointly considering
the P and rpbis
◦ Obviously, a higher rpbis is better
 Definitely don’t want negative!
◦ P represents which examinees it is most
appropriate for
 P = 0.95 is easy, good for low examinees
 P = 0.50 is hard, good for high examinees
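As a rough illustration of the two classical statistics mentioned here, the sketch below computes P (proportion correct) and rpbis (point-biserial correlation with the total score) for a small response matrix. The data are made up for illustration, the function name `classical_item_stats` is hypothetical, and the total score used here includes the item itself (an uncorrected point-biserial), which is an assumption rather than something these slides specify.

```python
import numpy as np

# Hypothetical scored responses: rows are examinees, columns are items (1 = correct).
responses = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
])

def classical_item_stats(x):
    """Return (P, rpbis) per item for a 0/1 response matrix."""
    total = x.sum(axis=1)   # number-correct score per examinee
    p = x.mean(axis=0)      # proportion correct per item
    # Point-biserial: Pearson correlation between item score and total score
    rpbis = np.array([np.corrcoef(x[:, j], total)[0, 1] for j in range(x.shape[1])])
    return p, rpbis

p_values, rpbis_values = classical_item_stats(responses)
for j, (p, r) in enumerate(zip(p_values, rpbis_values), start=1):
    print(f"Item {j}: P = {p:.2f}, rpbis = {r:.2f}")
```

Note that P and rpbis come out as two separate numbers per item; nothing in this calculation places the item on the same scale as the examinees.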
Item information
 But since items and examinees are
not on the same scale, there is no
direct connection
 With IRT, there is
 Item with b = 0.7 is good for person
with θ = 0.7
◦ This is the basis of adaptive testing –
doing this continually
Item information
 Item information takes this idea and
quantifies it across the spectrum
 It is therefore a function of θ as well as
the item parameters
$I_i(\theta) = D^2 a_i^2 \, \frac{Q_i(\theta)}{P_i(\theta)} \left[ \frac{P_i(\theta) - c_i}{1 - c_i} \right]^2$
 Where P(θ) is the probability of a
correct answer for a given value and
Q(θ) is 1-P
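A minimal sketch of the computational equation for item information, assuming the logistic 3PL response function and a scaling constant of D = 1.7 (neither is written out on these slides). The function names `p_3pl` and `item_information` are hypothetical.

```python
import math

D = 1.7  # scaling constant; 1.7 is a common choice, assumed here

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the (assumed) logistic 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def item_information(theta, a, b, c):
    """Item information: D^2 a^2 (Q/P) ((P - c) / (1 - c))^2."""
    p = p_3pl(theta, a, b, c)
    q = 1.0 - p
    return (D * a) ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2

# An item with a = 0.5, b = 1.0 evaluated at theta = b, without and with guessing
print(item_information(1.0, a=0.5, b=1.0, c=0.0))
print(item_information(1.0, a=0.5, b=1.0, c=0.25))
```

With c = 0 the expression reduces to D²a²PQ, which is about 0.18 for this item at θ = b; the second call shows the same item providing less information once a guessing parameter is introduced.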
Item information
 That is the computational equation
 Conceptual version that is seen in the
literature is
$I_i(\theta) = \frac{[P_i'(\theta)]^2}{P_i(\theta)\,[1 - P_i(\theta)]}$
 Or the slope squared over the
conditional variance
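The conceptual and computational forms agree for the 3PL. A short derivation, assuming the logistic 3PL response function $P_i(\theta) = c_i + (1 - c_i) / (1 + e^{-D a_i (\theta - b_i)})$, which is not written out on these slides:

```latex
\begin{aligned}
P_i'(\theta) &= D a_i \, \frac{[P_i(\theta) - c_i]\, Q_i(\theta)}{1 - c_i} \\
\frac{[P_i'(\theta)]^2}{P_i(\theta)\, Q_i(\theta)}
  &= D^2 a_i^2 \, \frac{[P_i(\theta) - c_i]^2\, Q_i(\theta)^2}{(1 - c_i)^2\, P_i(\theta)\, Q_i(\theta)}
   = D^2 a_i^2 \, \frac{Q_i(\theta)}{P_i(\theta)} \left[ \frac{P_i(\theta) - c_i}{1 - c_i} \right]^2
\end{aligned}
```

The first line is the slope of the IRF; squaring it and dividing by the conditional variance $P_i(\theta) Q_i(\theta)$ gives the computational equation term by term.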
Graphing info
 So what does this mean?
 We calculate with that equation, and
it will be higher wherever the slope
of the IRF is higher (for a given value
of θ)
 This is the item information function
(IIF)
Graphing info
 So the location of the item
determines the location of the IIF
 The discrimination of the item
determines the spread/peakedness of
the IIF
 Information decreases as the guessing
parameter increases
Some example items
Seq    a       b       c
1     1.00   -2.00    0.26
2     0.70   -1.00    0.21
3     0.40   -0.50    0.30
4     0.50    1.00    0.00
5     0.80    0.00    0.22
Example item IRFs
IIFs – example items
Graphing info functions
 Note that a lower slope is not ALL
bad
 Even though Item 3’s peak is lower, it
provides some info at a much wider
range
 So items like that are quite useful
when info is needed across a wide
range
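The point about peak height versus width can be checked numerically for the five example items. This is a sketch under the same assumptions as the information equation (logistic 3PL, D = 1.7); the θ grid and variable names are illustrative choices.

```python
import numpy as np

D = 1.7  # assumed scaling constant

# Example items from the table: (a, b, c)
items = [(1.00, -2.00, 0.26), (0.70, -1.00, 0.21), (0.40, -0.50, 0.30),
         (0.50, 1.00, 0.00), (0.80, 0.00, 0.22)]

def item_information(theta, a, b, c):
    """3PL item information evaluated on an array of theta values."""
    p = c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

theta = np.linspace(-4.0, 4.0, 801)  # grid in steps of 0.01
iifs = np.array([item_information(theta, *item) for item in items])

for seq, iif in enumerate(iifs, start=1):
    peak = iif.argmax()
    print(f"Item {seq}: peak info {iif[peak]:.3f} at theta = {theta[peak]:.2f}")

# Compare Items 1 and 3 far from Item 1's location
at_2 = np.argmin(np.abs(theta - 2.0))
print(f"At theta = 2: Item 1 = {iifs[0, at_2]:.3f}, Item 3 = {iifs[2, at_2]:.3f}")
```

Under these assumptions Item 3 has a lower peak than Item 1 but provides more information than Item 1 at θ = 2, which is the trade-off described above.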
Using item info
 Item information is inversely related
to error in measurement
 If the item provides more info, it
reduces error
 The equation:
$SEM(\theta) = 1 / \sqrt{I(\theta)}$
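A quick worked example of the inverse relationship, using illustrative information values rather than values from any real test:

```latex
SEM(\theta) = \frac{1}{\sqrt{I(\theta)}}, \qquad
I(\theta) = 4 \;\Rightarrow\; SEM(\theta) = 0.50, \qquad
I(\theta) = 16 \;\Rightarrow\; SEM(\theta) = 0.25
```

Because of the square root, information has to quadruple for the error to be cut in half.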
Using item info
Key point: an item has less
error where it has more
information
--> where it has more slope
A test has less error where
it has more information
(items)
Using item info
 IIFs are another way to examine
items individually
 They are also what adaptive testing
utilizes for item selection
 But the best use of item info: test
information and test assembly…
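Item selection in adaptive testing can be sketched with the five example items from earlier. The sketch assumes the maximum-information criterion (one common selection rule, not necessarily the one any particular CAT engine uses), the logistic 3PL, and D = 1.7; `select_next_item` and `bank` are hypothetical names.

```python
import math

D = 1.7  # assumed scaling constant

# Example items from the earlier table, keyed by Seq: (a, b, c)
bank = {1: (1.00, -2.00, 0.26), 2: (0.70, -1.00, 0.21), 3: (0.40, -0.50, 0.30),
        4: (0.50, 1.00, 0.00), 5: (0.80, 0.00, 0.22)}

def item_information(theta, a, b, c):
    """3PL item information at a single theta value."""
    p = c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def select_next_item(theta_hat, bank, administered=()):
    """Pick the unused item with the most information at the current theta estimate."""
    candidates = {seq: prm for seq, prm in bank.items() if seq not in administered}
    return max(candidates, key=lambda seq: item_information(theta_hat, *candidates[seq]))

print(select_next_item(-2.0, bank))                   # low examinee
print(select_next_item(0.7, bank))                    # higher examinee
print(select_next_item(0.7, bank, administered={5}))  # best remaining item
```

A full adaptive test would re-estimate θ after every response and repeat the selection; only the selection step is shown here.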
Test information
 As a result of the assumption of local
independence, IIFs can be summed to
obtain a test information function
(TIF)
 Same is true for IRFs – they can be
summed into a TRF
◦ This converts thetas to estimated raw
score
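Summing across items is all that is needed to go from item-level to test-level functions. A sketch for the five example items, again assuming the logistic 3PL and D = 1.7; the helper names `irf` and `iif` are hypothetical.

```python
import numpy as np

D = 1.7  # assumed scaling constant

# Example items from the earlier table: columns a, b, c
items = np.array([[1.00, -2.00, 0.26],
                  [0.70, -1.00, 0.21],
                  [0.40, -0.50, 0.30],
                  [0.50,  1.00, 0.00],
                  [0.80,  0.00, 0.22]])
a, b, c = items.T

def irf(theta):
    """IRFs for all items; result has shape (n_theta, n_items)."""
    theta = np.atleast_1d(theta)[:, None]
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))

def iif(theta):
    """IIFs for all items; result has shape (n_theta, n_items)."""
    p = irf(theta)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

theta = np.array([-2.0, 0.0, 2.0])
trf = irf(theta).sum(axis=1)  # expected raw score at each theta
tif = iif(theta).sum(axis=1)  # test information at each theta

for t, score, info in zip(theta, trf, tif):
    print(f"theta = {t:+.1f}: expected raw score = {score:.2f} of 5, TIF = {info:.3f}")
```

The TRF values are on the raw-score metric (0 to 5 here), which is how the sum of IRFs converts a theta into an estimated raw score.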
Test information
 Test information, like item
information, shows how well a test
measures at each value of θ
 Also inverts to CSEM
 This is extremely useful for test
assembly (aka construction, design,
or building)
Test information
 Consider the 5 IRFs…
Test information
 The TRF is…
Test information
 Consider the 5 IIFs…
Test information
 The TIF is…
Test information
 The CSEM curve is…
Test assembly
 Form building is more efficient and
better directed with IRT
 Reason: we can predict measurement
error (SEM) at each level of θ, not
just overall reliability
Test assembly
 This then allows you to build test
forms with specific TIFs or CSEMs in
mind
 Or multiple forms with the same TIF
 The following figures have the same
average a (0.9) but differ in where
they provide information
TRFs
TIFs
CSEMs
Test development
 You can build your test with a specific
TRF/TIF/SEM graph in mind
 Peak at cutscore?
 This can be done inside item bankers
(FastTEST & FT Web) or in separate
spreadsheets (my Form Building Tool)
Bank development
 You can also build the bank for a
testing program with the desired TIF
in mind
 If you know you want it to be peaked,
write items at the desired level of
difficulty to build an adequate bank
Bank development
 Otherwise you risk overexposure
 Don’t use all your best items at once
to make a peaked TIF – or any TIF for
that matter
 In the theoretical IRT world, we don’t
have to worry about that, but
exposure is a real issue
Bank development
 That is the reason linear-on-the-fly testing
(LOFT) was developed – to massively
reduce exposure and increase
security
◦ Every person gets a very similar TIF, but
a completely different test
◦ These tests are parallel, from an IRT
point of view
◦ Tests are conventional fixed-form
Part 2
A brief comparison of CTT and IRT
CTT and IRT Assumptions
 IRT:
◦ Unidimensionality and local independence
◦ Responses modeled by IRF
◦ Parameters, not statistics (sample
independence)
 CTT:
◦ X = T + E
◦ (1) true scores and error scores are
uncorrelated; (2) the average error score in
the sample is zero
◦ Statistics (not parameters) are sample-based
Comparing CTT and IRT
 CTT is said to have weaker assumptions
◦ Does not explicitly assume
unidimensionality
 But if it does not hold, the statistics become unreliable, and rpbis
and reliability suffer
 Sum scoring implicitly assumes items are
equivalent, which means unidimensional (all
items count equally on one total score)
Comparing CTT and IRT
 CTT is said to have weaker assumptions
◦ Does not explicitly assume IRF
 But if the idea of an IRF is not working, then the
item isn’t either
 And if you use rpbis, you assume a linear IRF,
which is actually impossible!
Comparing CTT and IRT
 CTT item statistics are at odds with
each other
◦ P says that there is one common
probability of a correct response
(binomial)
◦ But rpbis says that P increases with total
score (~ability)
Comparing CTT and IRT
 Classical SEM: same for everyone
 IRT SEM: different for everyone –
depends on the items you see and
your ability
 Which is more realistic?
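For reference, the classical SEM is commonly computed from the score standard deviation and the reliability coefficient, which is why it is a single value for the whole sample. The numbers below are illustrative only:

```latex
SEM_{CTT} = s_X \sqrt{1 - r_{XX'}}, \qquad
s_X = 5,\; r_{XX'} = 0.91 \;\Rightarrow\; SEM_{CTT} = 5\sqrt{0.09} = 1.5
```

Nothing in this expression depends on the examinee's ability or on which items were administered.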
Comparing CTT and IRT
 Direct comparison of item statistics
◦ We still use “difficulty” and
“discrimination”
◦ How different are they from CTT?
◦ Difficulty correlates highly (>0.90)
◦ Discrimination does not – because rpbis
is linear and IRT is not
Comparing CTT and IRT
 IRT and CTT scores also correlate
>0.95
 So why use IRT?
 There are distinct advantages…
Advantages of IRT
 IRT has parameters, not statistics
 Sample-independent… within a linear
transformation
 Huh? This means that if you have two
calibration groups of different levels,
we can convert parameters/scores
with a simple y = mx + b
 (Linking)
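The linear conversion can be sketched as follows. The sketch assumes the mean/sigma method for estimating the slope and intercept from common items, which is one well-known linking approach and not necessarily the one used in this workshop. The intercept is called `k` here to avoid a clash with the difficulty parameter b, and all data are constructed for illustration.

```python
import numpy as np

def mean_sigma_constants(b_source, b_target):
    """Slope m and intercept k that map the source scale onto the target scale."""
    b_source = np.asarray(b_source, dtype=float)
    b_target = np.asarray(b_target, dtype=float)
    m = b_target.std(ddof=1) / b_source.std(ddof=1)
    k = b_target.mean() - m * b_source.mean()
    return m, k

def transform_parameters(a, b, m, k):
    """Apply y = m*x + k to difficulties; discriminations are divided by m."""
    return np.asarray(a, dtype=float) / m, m * np.asarray(b, dtype=float) + k

# Hypothetical difficulties for the same anchor items from two calibrations
b_group1 = [-1.0, 0.0, 1.0, 2.0]
b_group2 = [-0.2, 1.0, 2.2, 3.4]  # constructed as 1.2 * b_group1 + 1.0

m, k = mean_sigma_constants(b_group1, b_group2)
a_new, b_new = transform_parameters([0.9, 1.1, 0.8, 1.0], b_group1, m, k)
print(f"m = {m:.3f}, k = {k:.3f}")
print("rescaled b:", np.round(b_new, 3))
print("rescaled a:", np.round(a_new, 3))
```

Thetas transform the same way as the difficulties (θ* = mθ + k), while the guessing parameter is left unchanged by a linear rescaling.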
Advantages of IRT
 Items and people are on the same
scale
 Easier to interpret, and allows
adaptive testing
Advantages of IRT
 Information provides an important
tool for test building and bank
development
 Better match the purposes of a test
 IRT CSEM allows far better
description of precision
Advantages of IRT
 More precise scores
 CTT number correct scoring is limited
to k + 1 scores
 3PL has 2^k scores
 Compare with 10 items:
◦ 11 vs 1024 possible scores
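The counts for 10 items can be verified by enumerating every response pattern. This is a small sketch; under the 3PL each pattern can in principle lead to its own θ estimate, so 2^k is best read as an upper bound on the number of distinct scores.

```python
from itertools import product

k = 10  # number of dichotomous items

patterns = list(product((0, 1), repeat=k))          # every possible response pattern
number_correct_scores = {sum(p) for p in patterns}  # distinct sum scores

print(len(number_correct_scores))  # k + 1 = 11
print(len(patterns))               # 2 ** k = 1024
```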
Advantages of IRT
 Scores take item difficulty into
account
 Allows direct comparison of
examinees that saw different sets of
items
 Scores also account for guessing
Advantages of IRT
 Nonlinear IRF – the linear IRF
assumed by CTT is impossible
 Allows for different SEM for every
examinee
 Not realistic to assume they are all
the same
Disadvantages of IRT
 Sample size
 CTT: 50 is OK, 100 is great
◦ It is much easier to fit a straight line
“model” than an IRF because it is an
oversimplification
 IRT: 100 is bare minimum for 1PL
◦ 3PL? ~500
◦ Puts it out of reach of small testing
programs
Disadvantages of IRT
 No “native” distractor analysis unless
polytomous models are used
 Can adapt the CTT idea of
quantile/distractor plot with IRT
◦ IRT programs will also give you option P
and rpbis
Disadvantages of IRT
 Complexity
◦ Not only do you have to understand it
yourself, but…
◦ You also have to explain it to
stakeholders!
Disadvantages of IRT
 However, note that these are not big
problems
◦ Many places have plenty of sample size
◦ You can still use CTT for distractor
analysis (always use both!!!!)
◦ The complexity is not too bad unless
using complex models
◦ Often, the biggest issue is the
stakeholders!
IRT Analysis
How do I go about doing this?
IRT Analysis
 Xcalibre 4 for IRT
 CTT analysis with Iteman 4 (not
necessary, but sometimes helps)
 Also:
◦ Scoring and graphing tool
◦ Form building tool
◦ Empirical IRFs in Excel
◦ Have we covered these sufficiently?
IRT Analysis
 I’m assuming here we are analyzing
just one sample of one test
 What would I look for? Basic…
◦ Items with good parameters (keep/clone)
◦ Items with bad parameters (retire)
 Evaluate their CTT option statistics
◦ TIF/CSEM – meet our needs? (not
good/bad in absolute sense)
IRT Analysis
 What would I look for? Advanced…
◦ Dimensionality assessment (reliability,
any items/sections “off on their own”)
◦ Item fit (also dimensionality, and possible
item issues)
◦ Test sections – any stand out for being
hard, easy, low discriminations, poor
precision, etc?
◦ CSEM/TIF for sections: anything under-
measured?
IRT Analysis
 What would I look for? Advanced…
◦ Finally: what do you want to see in the
data, and how will the test be used?
 Later, we’ll talk about more
advanced uses like:
◦ Linking and equating multiple forms
◦ Test assembly
◦ Adaptive testing
◦ Dimensionality evaluation
Iteman 4.1
 Performs comprehensive classical
analysis
 Quantile plots allow broad evaluation
of IRF shape
 Advantages:
◦ Easily understandable – can use with SMEs
◦ Includes distractors
Xcalibre 4.1
 Provides a comprehensive and user-
friendly IRT analysis
 Allows evaluation of individual items
and test as a whole
 All major graphs
 Many summary graphs (freqs etc.)
 Classical analysis too
Reasons for Xcalibre 4.1
 Currently available software (Parscale,
Bilog, Multilog, ConQuest, WinSteps,
ICL) still requires programming skills
 Some still run on DOS!
 If IRT is to be more widely used, it
needs a user-friendly system
◦ Input and output
Reasons for Xcalibre 4.1
 Better input
◦ Yes: Point and click buttons
◦ No: DOS programming quasi-language
 Better output
◦ Yes: Word docs (RTF), spreadsheets (CSV)
◦ No: DOS txt files with ugly tables
Reasons for Xcalibre 4.1
 Advanced users with programming
skills and need for customized analysis
can still utilize previous software
 Xcalibre 4.1 is designed for a wider
range of users
 The following description is of Xcalibre
4, but also applies to Iteman 4
Xcalibre 4.1 Interface
 Divided into tabs
 Move left to right…
Xcalibre 4.1 Interface
 All options are specified with buttons
or simple entry boxes
 No code based on keywords
◦ Best example: IRT models (you’ll see)
 Also: usable error messages
Specify files/input; choose options
 I’ll now show how to use X4, and do
some analysis of real data…

Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 

Using Item Response Theory to Improve Assessment

  • 1. Day 1 PM: Using IRT ▪ Item and test information ▪ Comparison of IRT to Classical Test Theory ▪ How to do IRT analysis
  • 2. Part 1: Item and test information
  • 3. Information ▪ Information is the tool that IRT uses to build tests ▪ It is a statistical term that quantifies how much something “adds” to a procedure ▪ Or, alternatively, how much uncertainty (error) it decreases ▪ A good test has a lot of information!
  • 4. Item information ▪ IRT calculates information for each item and test at each level of θ ▪ It is therefore not a single number; it is a function across ability ▪ Each item has an item information function ▪ Each test has a test information function
  • 5. Item information ▪ Some items provide information for high-ability students, some for low-ability students ▪ The same is true for tests: a test can be more accurate for certain score ranges, and IRT will tell you which
  • 6. Information ▪ Item information is additive, that is, it can be summed across items to obtain the test information function (TIF) ▪ Then we know where to add/subtract items ▪ Bonus: The TIF can also be inverted to obtain a predicted SEM curve
  • 7. Item information ▪ With CTT, “information” can be conceptualized by jointly considering the P and rpbis ◦ Obviously, a higher rpbis is better ▪ Definitely don’t want negative! ◦ P represents which examinees it is most appropriate for ▪ P = 0.95 is easy, good for low examinees ▪ P = 0.50 is hard, good for high examinees
  • 8. Item information ▪ But since items and examinees are not on the same scale, there is no direct connection ▪ With IRT, there is ▪ An item with b = 0.7 is good for a person with θ = 0.7 ◦ This is the basis of adaptive testing: doing this continually
  • 9. Item information ▪ Item information takes this idea and quantifies it across the spectrum ▪ It is therefore a function of θ as well as the item parameters: $I_i(\theta) = D^2 a_i^2 \, \frac{Q_i(\theta)}{P_i(\theta)} \left[ \frac{P_i(\theta) - c_i}{1 - c_i} \right]^2$ ▪ Where P_i(θ) is the probability of a correct answer for a given value of θ and Q_i(θ) is 1 − P_i(θ)
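As a rough illustration of the computational equation on slide 9, the Python sketch below evaluates the 3PL response probability and the item information at a given θ. The function names and the scaling constant D = 1.702 are assumptions made for this example; the slides do not state a value for D.

```python
import math

D = 1.702  # assumed scaling constant; not given in the slides


def p_3pl(theta, a, b, c, D=D):
    """Probability of a correct response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def item_info(theta, a, b, c, D=D):
    """3PL item information: D^2 a^2 (Q/P) ((P - c) / (1 - c))^2."""
    p = p_3pl(theta, a, b, c, D)
    q = 1.0 - p
    return (D * a) ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2


# With no guessing (c = 0), information at theta = b reduces to D^2 a^2 / 4.
print(item_info(theta=0.0, a=1.0, b=0.0, c=0.0))
```

With c = 0 the expression collapses to D²a²·P·Q, which is why the value at θ = b is simply D²a²/4.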
  • 10. Item information ▪ That is the computational equation ▪ The conceptual version seen in the literature is $I_i(\theta) = \frac{[P_i'(\theta)]^2}{P_i(\theta)\,[1 - P_i(\theta)]}$ ▪ That is, the slope squared over the conditional variance
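One way to convince yourself that the conceptual and computational forms agree is to compare them numerically. The sketch below approximates the slope of the IRF with a central difference and divides its square by P(1 − P). The item parameters, the step size h, and D = 1.702 are illustrative assumptions, not values from the slides.

```python
import math

D = 1.702  # assumed scaling constant


def p_3pl(theta, a, b, c):
    """3PL item response function."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def info_computational(theta, a, b, c):
    """Closed-form 3PL information."""
    p = p_3pl(theta, a, b, c)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def info_conceptual(theta, a, b, c, h=1e-5):
    """Slope squared over conditional variance, slope by central difference."""
    slope = (p_3pl(theta + h, a, b, c) - p_3pl(theta - h, a, b, c)) / (2.0 * h)
    p = p_3pl(theta, a, b, c)
    return slope ** 2 / (p * (1.0 - p))


# Hypothetical item: a = 0.8, b = 0.0, c = 0.22, evaluated at theta = 0.5
print(info_computational(0.5, 0.8, 0.0, 0.22))
print(info_conceptual(0.5, 0.8, 0.0, 0.22))
```

The two printed values should match to several decimal places; the small residual comes from the finite-difference approximation of the slope.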
  • 11. Graphing info ▪ So what does this mean? ▪ We calculate with that equation, and information will be higher wherever the slope of the IRF is higher (for a given value of θ) ▪ This is the item information function (IIF)
  • 12. Graphing info ▪ So the location of the item determines the location of the IIF ▪ The discrimination of the item determines the spread/peakedness of the IIF ▪ Information decreases as the guessing parameter increases
  • 13. Some example items
      Seq   a      b       c
      1     1.00   -2.00   0.26
      2     0.70   -1.00   0.21
      3     0.40   -0.50   0.30
      4     0.50    1.00   0.00
      5     0.80    0.00   0.22
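To see how the a, b, and c values of these five example items shape their information functions, the sketch below evaluates each IIF on a θ grid and reports where it peaks. It assumes the 3PL information formula with D = 1.702 (the slides do not give D) and an arbitrary grid from −4 to 4 in steps of 0.01.

```python
import math

D = 1.702  # assumed scaling constant

# Example items from slide 13: seq -> (a, b, c)
ITEMS = {
    1: (1.00, -2.00, 0.26),
    2: (0.70, -1.00, 0.21),
    3: (0.40, -0.50, 0.30),
    4: (0.50, 1.00, 0.00),
    5: (0.80, 0.00, 0.22),
}


def item_info(theta, a, b, c):
    """3PL item information at theta."""
    p = c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


GRID = [i / 100 for i in range(-400, 401)]  # theta from -4.00 to 4.00


def peak(a, b, c):
    """Return (theta, information) at the grid point where the IIF is highest."""
    return max(((t, item_info(t, a, b, c)) for t in GRID), key=lambda pair: pair[1])


peaks = {seq: peak(*params) for seq, params in ITEMS.items()}
for seq, (theta, info) in peaks.items():
    print(f"Item {seq}: peak information {info:.3f} at theta = {theta:+.2f}")
```

Item 4 (c = 0) peaks exactly at its b; items with c > 0 peak slightly above b, and the low-a Item 3 has the flattest curve.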
  • 16. Graphing info functions ▪ Note that a lower slope is not ALL bad ▪ Even though Item 3’s peak is lower, it provides some info across a much wider range ▪ So items like that are quite useful when info is needed across a wide range
  • 17. Using item info ▪ Item information is inversely related to error in measurement ▪ If the item provides more info, it reduces error ▪ The equation: $SEM(\theta) = \frac{1}{\sqrt{I(\theta)}}$
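The inverse relationship between information and error can be shown with a few numbers. This is a minimal sketch; the function name is an assumption for illustration.

```python
import math


def sem_from_info(information):
    """Standard error of measurement implied by an information value."""
    if information <= 0:
        raise ValueError("information must be positive")
    return 1.0 / math.sqrt(information)


# Quadrupling the information halves the standard error.
for info in (1.0, 4.0, 16.0):
    print(f"I = {info:5.1f}  ->  SEM = {sem_from_info(info):.3f}")
```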
  • 18. Using item info ▪ Key point: an item has less error where it has more information, that is, where it has more slope ▪ A test has less error where it has more information (items)
  • 19. Using item info ▪ IIFs are another way to examine items individually ▪ They are also what adaptive testing utilizes for item selection ▪ But the best use of item info: test information and test assembly…
  • 20. Test information ▪ As a result of the assumption of local independence, IIFs can be summed to obtain a test information function (TIF) ▪ The same is true for IRFs: they can be summed into a test response function (TRF) ◦ This converts thetas to estimated raw scores
  • 21. Test information ▪ Test information, like item information, shows how well a test measures at each value of θ ▪ It also inverts to the conditional SEM (CSEM) ▪ This is extremely useful for test assembly (aka construction, design, or building)
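The summation described on slide 20 can be sketched as follows, using the five example items as a short test. The names `tif` and `trf` and the constant D = 1.702 are assumptions for this illustration.

```python
import math

D = 1.702  # assumed scaling constant

# (a, b, c) for the five example items
ITEMS = [
    (1.00, -2.00, 0.26),
    (0.70, -1.00, 0.21),
    (0.40, -0.50, 0.30),
    (0.50, 1.00, 0.00),
    (0.80, 0.00, 0.22),
]


def p_3pl(theta, a, b, c):
    """3PL item response function (IRF)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def item_info(theta, a, b, c):
    """3PL item information function (IIF)."""
    p = p_3pl(theta, a, b, c)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def tif(theta, items=ITEMS):
    """Test information: the sum of the item information functions."""
    return sum(item_info(theta, *item) for item in items)


def trf(theta, items=ITEMS):
    """Test response function: expected raw score, the sum of the IRFs."""
    return sum(p_3pl(theta, *item) for item in items)


for theta in (-2.0, 0.0, 2.0):
    print(f"theta = {theta:+.1f}: TIF = {tif(theta):.3f}, expected raw score = {trf(theta):.2f}")
```

The expected raw score is bounded below by the sum of the c parameters (pure guessing) and above by the number of items.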
  • 26. Test information ▪ The CSEM curve is…
  • 27. Test assembly ▪ Form building is more efficient and better directed with IRT ▪ Reason: we can predict measurement error (SEM) at each level of θ, not just overall reliability
  • 28. Test assembly ▪ This then allows you to build test forms with specific TIFs or CSEMs in mind ▪ Or multiple forms with the same TIF ▪ The following figures have the same average a (0.9) but differ in where they provide information
  • 29. TRFs
  • 30. TIFs
  • 31. CSEMs
  • 32. Test development ▪ You can build your test with a specific TRF/TIF/SEM graph in mind ▪ Peak at cutscore? ▪ This can be done inside item bankers (FastTEST & FT Web) or in separate spreadsheets (my Form Building Tool)
  • 33. Bank development ▪ You can also build the bank for a testing program with the desired TIF in mind ▪ If you know you want it to be peaked, write items at the desired level of difficulty to build an adequate bank
  • 34. Bank development ▪ Otherwise you risk overexposure ▪ Don’t use all your best items at once to make a peaked TIF, or any TIF for that matter ▪ In the theoretical IRT world, we don’t have to worry about that, but exposure is a real issue
  • 35. Bank development ▪ That is the reason linear-on-the-fly testing (LOFT) was developed: to massively reduce exposure and increase security ◦ Every person gets a very similar TIF, but a completely different test ◦ These tests are parallel, from an IRT point of view ◦ Tests are conventional fixed-form
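To make the "peak at cutscore" idea concrete, here is a deliberately naive sketch that ranks a bank by information at the cutscore and takes the top items. It is not how FastTEST or the Form Building Tool work (the slides do not describe their algorithms); the bank reuses the five example items, and the cutscore of θ = 0, the form length, and D = 1.702 are all assumptions.

```python
import math

D = 1.702  # assumed scaling constant

# Hypothetical bank: seq -> (a, b, c), reusing the five example items
BANK = {
    1: (1.00, -2.00, 0.26),
    2: (0.70, -1.00, 0.21),
    3: (0.40, -0.50, 0.30),
    4: (0.50, 1.00, 0.00),
    5: (0.80, 0.00, 0.22),
}


def item_info(theta, a, b, c):
    """3PL item information at theta."""
    p = c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def select_items(bank, cutscore, n_items):
    """Pick the n_items with the most information at the cutscore."""
    ranked = sorted(bank, key=lambda seq: item_info(cutscore, *bank[seq]), reverse=True)
    return ranked[:n_items]


selected = select_items(BANK, cutscore=0.0, n_items=2)
form_info = sum(item_info(0.0, *BANK[seq]) for seq in selected)
print("Selected items:", selected)
print(f"Form information at the cutscore: {form_info:.3f}")
```

A purely greedy pick like this always reaches for the most informative items, so a real assembly procedure would add content and exposure constraints on top of it.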
  • 36. Part 2: A brief comparison of CTT and IRT
  • 37. CTT and IRT Assumptions ▪ IRT: ◦ Unidimensionality and local independence ◦ Responses modeled by IRF ◦ Parameters, not statistics (sample independence) ▪ CTT: ◦ X = T + E ◦ (1) true scores and error scores are uncorrelated; (2) the average error score in the sample is zero ◦ Statistics (not parameters) are sample-based
  • 38. Comparing CTT and IRT ▪ CTT is said to have weaker assumptions ◦ Does not explicitly assume unidimensionality ▪ But if it is not there, statistics will be iffy, and rpbis and reliability suffer ▪ Sum scoring implicitly assumes items are equivalent, which means unidimensional (all items count equally on one total score)
  • 39. Comparing CTT and IRT ▪ CTT is said to have weaker assumptions ◦ Does not explicitly assume an IRF ▪ But if the idea of an IRF is not working, then the item isn’t either ▪ And if you use rpbis, you assume a linear IRF, which is actually impossible!
  • 40. Comparing CTT and IRT ▪ CTT item statistics are at odds with each other ◦ P says that there is one common probability of a correct response (binomial) ◦ But rpbis says that P increases with total score (~ability)
  • 41. Comparing CTT and IRT ▪ Classical SEM: same for everyone ▪ IRT SEM: different for everyone; it depends on the items you see and your ability ▪ Which is more realistic?
  • 42. Comparing CTT and IRT ▪ Direct comparison of item statistics ◦ We still use “difficulty” and “discrimination” ◦ How different are they from CTT? ◦ Difficulty correlates highly (>0.90) ◦ Discrimination does not, because rpbis is linear and the IRF is not
  • 43. Comparing CTT and IRT ▪ IRT and CTT scores also correlate >0.95 ▪ So why use IRT? ▪ There are distinct advantages…
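For readers less familiar with the two classical statistics being contrasted here, the sketch below computes P (proportion correct) and rpbis (the correlation between the item score and the total score) from a tiny scored response matrix. The data are made up for illustration, and the total score here includes the item itself (an uncorrected point-biserial), which is a simplifying assumption.

```python
import math

# Hypothetical scored responses: rows are examinees, columns are items
RESPONSES = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
]


def p_value(item_scores):
    """Classical difficulty: proportion of examinees answering correctly."""
    return sum(item_scores) / len(item_scores)


def pearson(x, y):
    """Pearson correlation; with a 0/1 variable this is the point-biserial."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


totals = [sum(row) for row in RESPONSES]
stats = []
for j in range(len(RESPONSES[0])):
    item = [row[j] for row in RESPONSES]
    stats.append((p_value(item), pearson(item, totals)))

for j, (p, r) in enumerate(stats, start=1):
    print(f"Item {j}: P = {p:.2f}, rpbis = {r:.2f}")
```

P is a single number per item, while a positive rpbis says the chance of a correct answer rises with total score, which is the tension the slide points out.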
  • 44. Advantages of IRT ▪ IRT has parameters, not statistics ▪ Sample-independent… within a linear transformation ▪ Huh? This means that if you have two calibration groups of different levels, we can convert parameters/scores with a simple y = mx + b ▪ (Linking)
  • 45. Advantages of IRT ▪ Items and people are on the same scale ▪ Easier to interpret, and allows adaptive testing
  • 46. Advantages of IRT ▪ Information provides an important tool for test building and bank development ▪ Better match the purposes of a test ▪ IRT CSEM allows far better description of precision
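The "simple y = mx + b" conversion can be sketched in code. The slides do not say how the slope and intercept are estimated; the example below uses the mean/sigma method on the b parameters of common items as one well-known choice. The intercept is called k here to avoid clashing with the item difficulty b, and the common-item difficulties are made-up numbers.

```python
import statistics


def mean_sigma_constants(b_source, b_target):
    """Slope m and intercept k that map the source scale onto the target scale."""
    m = statistics.stdev(b_target) / statistics.stdev(b_source)
    k = statistics.mean(b_target) - m * statistics.mean(b_source)
    return m, k


def rescale_item(a, b, c, m, k):
    """Transform item parameters: a shrinks by m, b is mapped linearly, c is unchanged."""
    return a / m, m * b + k, c


def rescale_theta(theta, m, k):
    """Transform an ability estimate onto the target scale."""
    return m * theta + k


# Hypothetical difficulties of the same three items from two calibrations
b_group1 = [-1.0, 0.0, 1.0]
b_group2 = [-1.5, 0.5, 2.5]

m, k = mean_sigma_constants(b_group1, b_group2)
print(f"m = {m:.2f}, k = {k:.2f}")
print(rescale_item(1.0, 0.0, 0.2, m, k))
print(rescale_theta(1.0, m, k))
```

Because a is divided by m while θ and b are both mapped through mθ + k, the product a(θ − b) is unchanged, so the predicted probabilities are the same on either scale.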
  • 47. Advantages of IRT ▪ More precise scores ▪ CTT number-correct scoring is limited to k + 1 scores ▪ 3PL has 2^k scores (one per response pattern) ▪ Compare with 10 items: ◦ 11 vs 1024 possible scores
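The 11 versus 1024 comparison can be checked by enumerating every possible response pattern for a k-item test. The function name is an assumption for illustration; the count of 2^k is the number of distinct patterns, each of which can in principle receive its own θ estimate under the 3PL.

```python
from itertools import product


def count_scores(k):
    """Return (distinct number-correct scores, distinct response patterns) for k items."""
    patterns = list(product((0, 1), repeat=k))
    number_correct = {sum(pattern) for pattern in patterns}
    return len(number_correct), len(patterns)


n_sum_scores, n_patterns = count_scores(10)
print(f"10 items: {n_sum_scores} number-correct scores vs {n_patterns} response patterns")
```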
  • 48. Advantages of IRT ▪ Scores take item difficulty into account ▪ Allows direct comparison of examinees that saw different sets of items ▪ Scores also account for guessing
  • 49. Advantages of IRT ▪ Nonlinear IRF: the linear IRF assumed by CTT is impossible ▪ Allows for different SEM for every examinee ▪ Not realistic to assume they are all the same
  • 50. Disadvantages of IRT ▪ Sample size ▪ CTT: 50 is OK, 100 is great ◦ It is much easier to fit a straight line “model” than an IRF because it is an oversimplification ▪ IRT: 100 is bare minimum for 1PL ◦ 3PL? ~500 ◦ Puts it out of reach of small testing programs
  • 51. Disadvantages of IRT ▪ No “native” distractor analysis unless polytomous models are used ▪ Can adapt the CTT idea of a quantile/distractor plot with IRT ◦ IRT programs will also give you option P and rpbis
  • 52. Disadvantages of IRT ▪ Complexity ◦ Not only do you have to understand it yourself, but… ◦ You also have to explain it to stakeholders!
  • 53. Disadvantages of IRT ▪ However, note that these are not big problems ◦ Many places have plenty of sample size ◦ You can still use CTT for distractor analysis (always use both!!!!) ◦ The complexity is not too bad unless using complex models ◦ Often, the biggest issue is the stakeholders!
  • 54. IRT Analysis: How do I go about doing this?
  • 55. IRT Analysis ▪ Xcalibre 4 for IRT ▪ CTT analysis with Iteman 4 (not necessary, but sometimes helps) ▪ Also: ◦ Scoring and graphing tool ◦ Form building tool ◦ Empirical IRFs in Excel ◦ Have we covered these sufficiently?
  • 56. IRT Analysis ▪ I’m assuming here we are analyzing just one sample of one test ▪ What would I look for? Basic… ◦ Items with good parameters (keep/clone) ◦ Items with bad parameters (retire) ▪ Evaluate their CTT option statistics ◦ TIF/CSEM: do they meet our needs? (not good/bad in an absolute sense)
  • 57. IRT Analysis ▪ What would I look for? Advanced… ◦ Dimensionality assessment (reliability, any items/sections “off on their own”) ◦ Item fit (also dimensionality, and possible item issues) ◦ Test sections: do any stand out for being hard, easy, low in discrimination, poor in precision, etc.? ◦ CSEM/TIF for sections: anything under-measured?
  • 58. IRT Analysis ▪ What would I look for? Advanced… ◦ Finally: what do you want to see in the data, and how will the test be used? ▪ Later, we’ll talk about more advanced uses like: ◦ Linking and equating multiple forms ◦ Test assembly ◦ Adaptive testing ◦ Dimensionality evaluation
  • 59. Iteman 4.1 ▪ Performs comprehensive classical analysis ▪ Quantile plots allow broad evaluation of IRF shape ▪ Advantages: ◦ Easily understandable, so it can be used with SMEs ◦ Includes distractors
  • 60. Xcalibre 4.1 ▪ Provides a comprehensive and user-friendly IRT analysis ▪ Allows evaluation of individual items and the test as a whole ▪ All major graphs ▪ Many summary graphs (freqs etc.) ▪ Classical analysis too
  • 61. Reasons for Xcalibre 4.1 ▪ Currently available software (Parscale, Bilog, Multilog, ConQuest, WinSteps, ICL) still requires programming skills ▪ Some still run on DOS! ▪ If IRT is to be more widely used, it needs a user-friendly system ◦ Input and output
  • 62. Reasons for Xcalibre 4.1 ▪ Better input ◦ Yes: Point and click buttons ◦ No: DOS programming quasi-language ▪ Better output ◦ Yes: Word docs (RTF), spreadsheets (CSV) ◦ No: DOS txt files with ugly tables
  • 63. Reasons for Xcalibre 4.1 ▪ Advanced users with programming skills and a need for customized analysis can still utilize previous software ▪ Xcalibre 4.1 is designed for a wider range of users ▪ The following description is of Xcalibre 4, but also applies to Iteman 4
  • 64. Xcalibre 4.1 Interface ▪ Divided into tabs ▪ Move left to right…
  • 65. Xcalibre 4.1 Interface ▪ All options are specified with buttons or simple entry boxes ▪ No code based on keywords ◦ Best example: IRT models (you’ll see) ▪ Also: usable error messages
  • 66. Specify files/input; choose options ▪ I’ll now show how to use X4, and do some analysis of real data…