Introduction to Computerized Adaptive Testing (CAT), by Nathan Thompson
These slides are from a short workshop I taught at the 2015 Conference for the International Association for Computerized Adaptive Testing (IACAT, www.iacat.org). Interested in CAT? I'd love to hear from you on LinkedIn, or visit www.assess.com to learn more.
A computer adaptive test (CAT) is an online test that adapts to each student's ability level. It selects test questions tailored to what a student knows based on their responses. This makes the test individualized, accurate, and efficient. The computer selects progressively harder questions if a student answers correctly, and easier ones if they answer incorrectly, until it can precisely measure their proficiency level. This provides a more accurate assessment than a traditional paper test by ensuring questions are neither too hard nor too easy for each student.
- Traditional tests have fixed forms with all examinees answering the same items, which is inefficient and leads to differences in precision.
- Computer adaptive testing (CAT) tailors the difficulty and number of items to each examinee based on their responses to previous questions. CAT aims to maximize precision by selecting subsequent questions based on the examinee's estimated ability level.
- CAT requires fewer items than traditional tests to arrive at equally accurate scores while providing a more personalized experience for each examinee.
Computer adaptive testing (CAT) is a form of computer-based test that adapts to the examinee's ability level by selecting subsequent test items based on the correctness of previous responses. CATs require fewer items than traditional tests to estimate a test-taker's ability level accurately. Key components of CAT include an item pool, entry level, item selection rule, scoring method, and termination criteria. Major advantages of CAT include increased precision, shorter test length, and a more positive experience for examinees. Many standardized tests now use CAT formats.
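The components this summary lists (item pool, entry level, item selection rule, scoring method, termination criterion) fit together in a simple loop. The sketch below is a minimal illustration, assuming a Rasch model, maximum-information item selection, and a deliberately crude step-size score update; the item bank values and function names are invented, not taken from the slides:

```python
import math

# Illustrative item bank: item id -> Rasch difficulty parameter b
BANK = {"q1": -1.5, "q2": -0.5, "q3": 0.0, "q4": 0.7, "q5": 1.4}

def p_correct(theta, b):
    """Rasch model: probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def information(theta, b):
    """Fisher information of a Rasch item: p * (1 - p)."""
    p = p_correct(theta, b)
    return p * (1.0 - p)

def run_cat(answer, theta=0.0, max_items=3, step=0.5):
    """Entry level theta=0; pick the most informative unused item;
    terminate after max_items (real CATs also stop on a precision target)."""
    administered = []
    for _ in range(max_items):
        remaining = {k: b for k, b in BANK.items() if k not in administered}
        item = max(remaining, key=lambda k: information(theta, remaining[k]))
        administered.append(item)
        correct = answer(item)                # examinee's response to the item
        theta += step if correct else -step   # crude update (real CATs use MLE/EAP)
        step /= 1.5                           # shrink steps as precision grows
    return theta, administered
```

An examinee who answers every item correctly drifts toward a high theta and receives progressively harder items; one who answers everything incorrectly drifts the other way, which is the adaptive behavior the summary describes.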
Creating an in-house computerized adaptive testing (CAT) program with Concerto, by Mizumoto Atsushi
This document discusses creating a computerized adaptive test (CAT) program using the Concerto platform. It describes constructing an item bank, calibrating items, specifying the CAT, and evaluating the CAT against a paper test. The evaluation found the CAT measured the same ability as the paper test using fewer items and with greater precision. User feedback suggested improving ability to predict other scores and providing better feedback. In summary, the author created a functioning CAT program and found it performed better than a paper test while identifying opportunities to enhance the user experience.
A CAT (computer adaptive test) is an individually tailored test that adapts to each student's ability level. It draws from a large bank of test items on the subject and selects increasingly difficult or easy questions based on whether the student answers correctly or incorrectly. This allows each student to be tested at an appropriate level and receive a customized exam assessing their true knowledge and abilities. The computer portion enables engaging multimedia questions and efficient scoring. A CAT provides more accurate, individualized and secure assessments that deliver fast results.
So, you've heard about adaptive testing and are wondering what it takes to develop a valid one? This presentation is for you. It outlines a five-step process, starting with feasibility studies and business case evaluation. More info at www.assess.com and http://pareonline.net/getvn.asp?v=16&n=1.
The document discusses computer-based tests (CBT). It defines CBT as assessments taken on a computer that can be standalone or networked. There are two main types of CBT: linear tests that select random questions, and adaptive tests that select questions based on performance. Advantages of CBT include improved accessibility, richer data collection, streamlined administration and scoring processes, and maintained integrity. Disadvantages include the inability to write on screen and the risk of computer errors. The document also describes two CBT models: multi-stage tests and computerized adaptive tests.
Towards a pattern recognition approach for transferring knowledge in acm v4 f..., by Thanh Tran
This document discusses using a User-Trained Agent (UTA) to transfer knowledge between knowledge workers in an Adaptive Case Management (ACM) system. The UTA uses pattern recognition to observe knowledge workers' activities and learn from them. It stores what it learns in a central knowledge base and can then suggest the best next actions for knowledge workers based on similar past cases. Using business ontologies and negative learning examples helps the UTA learn more quickly and provide recommendations with higher confidence levels. The UTA aims to continuously acquire, share, and improve organizational knowledge without requiring specialized training.
Caveon webinar series - smart items - using innovative item design to make you..., by Caveon Test Security
SmartItems are innovative item designs that generate variable item versions on-the-fly to improve test security and fairness. Dr. David Foster of Caveon answered questions about SmartItems from a webinar audience. He explained that SmartItems can be used for any item type or content area to measure a range of skills. While item difficulty may vary between versions, total test scores remain valid and comparable when combining performances. Caveon's item authoring and testing platforms integrate SmartItems, and their API allows other systems to do so as well with minimal coding.
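The summary does not reveal how Caveon implements SmartItems, but the general idea of generating item variants on the fly can be sketched with a parameterized template; everything below (the template, the number ranges, and the function name) is an invented illustration, not Caveon's actual design:

```python
import random

def make_item(rng):
    """Generate one variant of a percentage-calculation item on the fly.
    Each examinee draws different numbers, so harvested live items
    are of little use to a cheater."""
    whole = rng.randrange(200, 1000, 10)   # e.g. 200, 210, ..., 990
    pct = rng.choice([5, 10, 20, 25, 50])
    stem = f"What is {pct}% of {whole}?"
    key = whole * pct / 100                # the scored correct answer
    return stem, key

# Two examinees (different seeds) see different variants of the same item
stem_a, key_a = make_item(random.Random(1))
stem_b, key_b = make_item(random.Random(2))
```

Because each variant is generated rather than stored, the item that leaks is the template, not the question, which is the security rationale the webinar describes; difficulty may vary a little between variants, as the answer above acknowledges.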
Exploratory testing has gained significant attention in industry and research in recent years. However, as with many "buzzword" technologies, introducing and applying exploratory testing is not straightforward. Exploratory testing is not only black or white (scripted or exploratory) but also all shades of grey in between. Within the EASE industrial excellence center, we ran an industrial workshop on exploratory testing that helps build an understanding of how to choose feasible levels of exploration. We present the concept of levels of exploration in exploratory testing and the outcomes of the workshop, along with relevant empirical research findings on exploratory testing.
This document summarizes a study on developing an expert system called W-CAT (Witty Cat) to analyze educational data and generate rules that provide feedback to instructors. It describes collecting student survey data on exam preparation activities and results. Association rule learning was used to generate rules from the data, such as a rule indicating that students who viewed review videos performed poorly on exams. The study found the rules provided useful insights for instructors. Further development of W-CAT is ongoing to automate rule generation and provide human-readable explanations of results.
The document presents research on active learning strategies for robots that interact with human teachers. It found that classic active learning, which aims for query efficiency, can increase task difficulty and lead to slower, less accurate responses from teachers compared to more teacher-aware strategies. A hybrid strategy achieved intermediate results. The researchers conclude that considering the human perspective is important for active learning, as efficiency alone can undermine the interaction and learning.
A/B testing from basic concepts to advanced techniques, by Anatoliy Vuets
This document outlines a presentation on A/B testing and statistical learning. It discusses A/B testing as a way to make inferences about populations based on experimental data. The key concepts covered include the null and alternative hypotheses (H0 and H1), significance levels, power, and common mistakes in A/B testing like early stopping and misinterpreting p-values. The presentation also discusses Bayesian approaches to A/B testing by setting prior distributions and updating beliefs based on experimental data and posteriors. It notes that while the frequentist framework is more mature, the Bayesian framework helps address practical issues that can occur with frequentist A/B testing.
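The frequentist machinery the presentation covers (H0 and H1, significance, p-values) can be made concrete with a two-proportion z-test, using only the standard library; the conversion counts below are made up for illustration:

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (H0: no difference)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)   # common proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal CDF (via erf)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Variant B converts at 13% vs A's 10%, with 2,000 users in each arm
z, p = two_proportion_ztest(200, 2000, 260, 2000)
```

Here p falls below 0.05, so H0 would be rejected at the usual significance level. Note that stopping early and re-checking the p-value repeatedly, one of the mistakes the presentation warns about, inflates the false-positive rate.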
Practical Language Testing, by Glenn Fulcher
Specifications for testing and teaching: a sample detailed specification for a reading test.
In this section we present an example of an architecture for a reading test. This includes the test framework, which states the test purpose, the target test takers, the criterion domain, and the rationale for the test content. The architecture is annotated with explanations in text boxes. This is a detailed test specification. The complexity of coding in test specifications of this kind is usually necessary in the design and assembly of high-stakes tests, where it is essential to achieve parallel forms. There are problems with this type of specification for use in classroom assessment, which we deal with in Section 4 below.
This document summarizes a study on developing an expert system called WittyCat to provide dynamic assessments of student exam quality. Survey data and course materials were collected and analyzed using association rule learning. Rules generated from a pilot study provided insights that helped instructors improve their teaching. The current state of WittyCat automates rule generation and seeks to explain conclusions. Contributions from additional course data and feedback are requested to evaluate WittyCat's assessments.
Measuring the impact of instant high quality feedback, by Stephen Nutbrown
Measuring the impact of instant high quality feedback presented at the 5th International Assessment in Higher Education Conference. Stephen Nutbrown, Su Beesley & Colin Higgins, 2015.
Good unit tests are concise, focused on behavior rather than mechanics, and tell a story of intended usage through descriptive names and scenarios. Poor tests are overly procedural and verbose, lacking clarity. Effective testing requires considering tests as specifications that drive development by clearly expressing required functionality, rather than just verifying code works. Tests should focus on scenarios over individual operations and cut across code to demonstrate intended use.
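As a concrete illustration of a test that reads as a specification rather than a mechanical check, the scenario below tells its story through the test name; the minimal `ShoppingCart` API is invented for the example, not taken from any real codebase:

```python
class ShoppingCart:
    """Minimal cart assumed for the example."""
    def __init__(self):
        self._quantities = {}

    def add(self, sku, qty=1):
        self._quantities[sku] = self._quantities.get(sku, 0) + qty

    def quantity_of(self, sku):
        return self._quantities.get(sku, 0)

def test_adding_the_same_item_twice_accumulates_quantity():
    # The name states the required behavior; the body is the scenario.
    cart = ShoppingCart()
    cart.add("book")
    cart.add("book", qty=2)
    assert cart.quantity_of("book") == 3
```

A procedural version of the same test would poke at the internal dictionary directly, verifying mechanics instead of behavior, and would break on any refactoring of the internals; the behavior-focused version survives as long as the intended usage it documents still holds.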
Slides presenting preliminary overview of thesis work presented at the International Conference on Electronic Learning in the Workplace at Columbia University on June 11, 2010.
Statistical hypothesis testing in e-commerce, by Anatoliy Vuets
Statistical hypothesis testing is used in e-commerce to help companies make the right decisions when analyzing data from A/B tests, ad-hoc analyses, and model building. A statistical test compares a null hypothesis (H0) to an alternative hypothesis (H1) using a sample of data. It estimates the probability of observing the sample if the null hypothesis is true; if this probability is low, the null hypothesis can be rejected in favor of the alternative. The key parameters of a statistical test are the significance level, the probability of falsely rejecting the null hypothesis, and power, the probability of correctly rejecting the null when the alternative is true. In e-commerce, increasing the sample size or the effect size can improve the power of a test.
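The closing point about sample size and power can be sketched with the standard normal-approximation formula for two proportions, n = (z_{1-a/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2; the code below uses only the standard library, inverting the normal CDF by bisection, and the conversion rates are made up:

```python
import math

def z_quantile(p):
    """Inverse standard normal CDF via bisection on erf (stdlib only)."""
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if 0.5 * (1 + math.erf(mid / math.sqrt(2))) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def sample_size_per_group(p1, p2, alpha=0.05, power=0.8):
    """Approximate users per arm to detect conversion p1 vs p2 (two-sided)."""
    za = z_quantile(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    zb = z_quantile(power)           # ~0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((za + zb) ** 2 * variance / (p1 - p2) ** 2)
```

Detecting a lift from 10% to 12% needs roughly 3,800 users per arm, while a lift to 15% needs only about 700, which is the sample-size versus effect-size trade-off the summary describes.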
Best Practices for the Academic User: Maximizing the Impact of Your Instituti..., by Qualtrics
To view the on-demand webinar for this presentation see the following link: https://success.qualtrics.com/academic-best-practices-watch.html
Qualtrics has changed the landscape for colleges and universities, introducing many features to help academic decision makers run more successful surveys.
Join Qualtrics and Jag Patel, Associate Director of Institutional Research at MIT, as we share best practices and tips for academic users.
Chaplin school of hospitality and tourism management inter..., by RAJU852744
The document describes an internship project to improve the speed, accuracy, reliability, cost effectiveness, and flow of processes at Jumbo Buffet, a 20-year-old Chinese-American buffet restaurant. The intern will analyze aspects of Jumbo Buffet over 10 weeks, including food/service quality, customer satisfaction, aging facilities, and make recommendations. Specifically, the intern aims to address noise from the aging air conditioning system, which negatively impacts customers and costs the restaurant potential business. Data will be collected on noise levels, customer complaints, occupancy, and other factors to inform solutions.
AUTOMATIC GENERATION AND OPTIMIZATION OF TEST DATA USING HARMONY SEARCH ALGOR..., by csandit
Software testing is a primary phase of software development, carried out by executing sequences of test inputs and comparing the results against expected outputs. The Harmony Search (HS) algorithm is inspired by the improvisation process in music, and it has gained popularity in the field of evolutionary computation relative to other algorithms. When musicians compose a harmony from different possible combinations of notes, the pitches are stored in harmony memory, and optimization proceeds by adjusting the input pitches to produce the perfect harmony. The test case generation process identifies test cases together with their resources, as well as critical domain requirements. This paper analyzes the role of the Harmony Search meta-heuristic in generating random test data and optimizing that test data. Test data are generated and optimized in a case study, a withdrawal task at a bank ATM, using Harmony Search. The algorithm is observed to generate suitable test cases and test data, and the paper gives brief details about the Harmony Search method as used for test data generation and optimization.
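The paper's own code is not shown here, but the core Harmony Search loop it describes (harmony memory, memory consideration, pitch adjustment, random improvisation) can be sketched for a single numeric test input; the fitness function, which targets an assumed ATM withdrawal limit of 500, and all parameter values are illustrative:

```python
import random

def fitness(amount):
    """Lower is better: distance from an assumed boundary value (the
    ATM withdrawal limit) that a good test input should exercise."""
    return abs(amount - 500)

def harmony_search(rng, hms=10, hmcr=0.9, par=0.3, iters=500, lo=0, hi=1000):
    """Minimal Harmony Search over one integer test input."""
    memory = [rng.randint(lo, hi) for _ in range(hms)]   # harmony memory
    for _ in range(iters):
        if rng.random() < hmcr:                  # memory consideration
            cand = rng.choice(memory)
            if rng.random() < par:               # pitch adjustment
                cand = min(hi, max(lo, cand + rng.randint(-10, 10)))
        else:                                    # random improvisation
            cand = rng.randint(lo, hi)
        worst = max(memory, key=fitness)
        if fitness(cand) < fitness(worst):       # keep only improving harmonies
            memory[memory.index(worst)] = cand
    return min(memory, key=fitness)

best_input = harmony_search(random.Random(0))
```

After a few hundred improvisations the memory clusters around the boundary, yielding a test input near the withdrawal limit, which mirrors how the paper uses HS to steer random test data toward critical domain values.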
- Contact Prometric or Pearson VUE to schedule an exam and find a testing location near you
- Provide two forms of ID on exam day, such as a driver's license and credit card
- Arrive at least 20 minutes early for check-in and allow the recommended time to complete the exam
- Bring only your ID; no notes or other materials are permitted in the testing center
The document describes a software called THE TESTPERFECTOR that allows users to create, scramble, and grade multiple choice exams easily. Some key features mentioned include scrambling questions and answers so no two students receive the same exam, statistical analysis of exam results, compatibility with Microsoft Word, and scanning of answer sheets without needing specialized equipment. The software aims to reduce cheating on exams while providing detailed feedback to help students improve.
This document discusses research methods for evaluating the effectiveness of training programs. It recommends:
1) Forming a hypothesis about a training method and conducting pre-tests and post-tests to evaluate it. This can be done with one group over time or by comparing a test group that receives the training to a control group.
2) Gathering feedback through evaluations to assess how the training impacted learning, behavior change, and business metrics. Four levels of evaluation are identified.
3) Analyzing the results using statistical tests like a t-test to determine if the training caused the observed changes rather than random variation. The results should then be communicated and used to improve future training programs.
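Step 3's t-test can be sketched for the one-group pre-test/post-test design using only the standard library; the trainee scores below are made up for illustration:

```python
import math

def paired_t(pre, post):
    """Paired t-statistic and degrees of freedom for pre/post scores."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)   # sample variance
    t = mean / math.sqrt(var / n)
    return t, n - 1

# Invented scores for eight trainees, before and after the training
pre  = [55, 62, 48, 70, 66, 59, 61, 52]
post = [61, 65, 55, 74, 70, 66, 63, 60]
t_stat, df = paired_t(pre, post)
```

With df = 7 the two-sided 5% critical value is about 2.365; a t-statistic well above that suggests the improvement is not just random variation, which is exactly the question step 3 asks the evaluator to answer.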
1) Evidence-centered design (ECD) is a methodology for test design that emphasizes the role of evidentiary reasoning in assessment. It involves six models: student, evidence, task, presentation, assembly, and delivery.
2) The task model describes how evidence is collected through test tasks. Effective tasks elicit evidence relevant to the constructs being tested.
3) Describing tasks involves identifying the constructs being tested and the relationship between constructs and behaviors. It also describes task features that provide evidence for inferences about constructs.
Many students failed an introductory Java programming course. A study developed dashboards displaying students' online activity and predicted performance to provide weekly feedback. Students receiving dashboards completed more online tasks but did not have significantly higher pass rates or exam scores compared to the control group. The aim was to increase course success through personalized feedback on e-learning progress.
An Adaptive Evaluation System to Test Student Caliber using Item Response Theory, by Editor IJMTER
Computational creativity research has produced many computational systems that are described as creative [1]. A comprehensive literature survey reveals that although such systems are labelled as creative, there is a distinct lack of evaluation of the creativity of creative systems [1]. Nowadays, a number of online testing websites exist, but their drawback is that every student who takes a particular test is always given the same set of questions, irrespective of their caliber. Thus, a student with a very high Intelligence Quotient (IQ) may be forced to answer basic-level questions, while weaker students may be asked very challenging questions that they cannot answer. This method of testing wastes time for high-IQ students, can be quite frustrating for weaker students, and does not help a teacher understand a particular student's caliber in the subject under consideration. Each learner has a different learning status, and therefore different test items should be used in their evaluation. This paper proposes an Adaptive Evaluation System based on Item Response Theory, built for mobile end users so that students have the flexibility to attempt the test from anywhere. The application not only dynamically customizes questions for each student based on the previous question answered, but, by adjusting the difficulty of test questions to student ability, it also lets a teacher acquire a valid and reliable measurement of a student's competency.
Many students failed an introductory Java programming course. A study developed dashboards displaying students' online activity and predicted performance to provide weekly feedback. Students receiving dashboards completed more online tasks but did not have significantly higher pass rates or exam scores compared to the control group. The aim was to increase course success through personalized feedback on e-learning progress.
An Adaptive Evaluation System to Test Student Caliber using Item Response TheoryEditor IJMTER
Computational creativity research has produced many computational systems that are
described as creative [1]. A comprehensive literature survey reveals that although such systems are
labelled as creative, there is a distinct lack of evaluation of the Creativity of creative systems [1].
Nowadays, a number of online testing websites exist but the drawback of these tests is that every
student who gives a particular test will always be given the same set of questions irrespective of their
caliber. Thus, a student with a very high Intelligence Quotient (IQ) may be forced to answer basic
level questions and in the same way weaker students may be asked very challenging questions which
they cannot response. This method of testing results into a wastage of time for the high IQ students
and can be quite frustrating for the weaker students. This would never benefit a teacher to understand
a particular student’s caliber for the subject under Consideration. Each learner has different learning
status and therefore different test items should be used in their evaluation. This paper proposes an
Adaptive Evaluation System developed based on an Item Response Theory and would be created for
mobile end user keeping in mind the flexibility of students to attempt the test from anywhere. This
application would not only dynamically customize questions for students based on the previous
question he/she has answered but also by adjusting the degree of difficulty for test questions
depending on student ability, a teacher can acquire a valid & reliable measurement of student’s
competency.
Quantitative techniques refer to scientific, mathematical, and statistical methods for solving complex business problems. These techniques include statistical methods like data collection, analysis, and forecasting as well as operations research techniques like linear programming. Quantitative techniques help organizations make data-driven decisions in areas like marketing, production, finance, personnel management, research and development, and economics. The document then provides details on specific quantitative techniques and the steps involved in marketing research.
Investigating learning strategies in a dispositional learning analytics conte...Bart Rienties
This document discusses a study analyzing how students use worked examples, tutored problem-solving, and untutored problem-solving in an online math tutorial system called SOWISO. The study examines how these learning modes relate to student performance and dispositions. Key findings include: (1) engagement with tutorials and mastery of content strongly predicts exam scores; (2) students frequently use worked examples and rarely use hints for tutored problem-solving; (3) adaptive dispositions correlate with timely preparation and less example use, while maladaptive dispositions correlate with less preparation and more example use.
This document provides information for the course "Introduction to Data Science" (ITEC-313) at Jazan University. The course is a required 3 credit hour course consisting of 2 hours of theory and 2 hours of lab per week. The course objectives are to describe data science and the needed skill sets, understand the data science process and how its components interact, carry out basic statistical modeling and analysis, and apply the data science process in a case study. Topics covered include data collection/integration, exploratory data analysis, predictive/descriptive modeling, and effective communication. The course aims to equip students with basic data science principles, concepts, techniques and tools. It will be assessed through assignments, exams, quizzes
This document discusses machine intelligence and machine learning. It covers topics such as behavior-based AI vs knowledge-based AI, supervised vs unsupervised learning, classification vs prediction, and decision tree induction for classification. Decision trees are built using an algorithm that selects the attribute that best splits the data at each step to create partitions. Pruning techniques are used to avoid overfitting.
Technology-based assessments-special education
New technologies remain competitive in driving efforts to make learning more efficient. Technology-based assessment in special education has made quite some advancement (Goldsmith & LeBlanc, 2004). First applications of computer technology assessment were for the scoring student's test forms. Currently, features incorporate self-administration, software control in presentation, response evaluation based on algorithms, prescription based on expert knowledge and direct links in assessment and change in instructions. The technology-based assessment uses electronic and software systems to evaluate individual children in an educational setting. Traditional assessments employ approaches of the computer.
Video-based computer assisted test enabled learning of language for the student automatically increasing the validity of measurements. Video segments incorporated movie elements of moral dilemma in problem-solving tests. Students viewing the video segments respond by simply touching the screen. Innovative approaches have created relevance in testing procedures. Misplaced students result into poor results and get prompted to drop out. Teachers not well trained contribute to the misplacement due to poor management of certain behaviors and learning differences. For effect, teachers must be able to analyze data produced by the assessment and develop a due course of action.
In addressing students with physical limitations use of voice recognition, handwriting interpreters, stylus tools, and touchscreen enables communication without the use of keys (Gierach, 2009). New software features allow students to perform comfortable pace of video segments on preferred language options. Computers are linked to videodisc enabling students to learn according to individual needs and skills. Latest technological features concern evaluation. Technological advancements assess social competence among students. The evaluator views students in a variety of context. Limitation in technology infrastructure, seen as the key barrier in this sort of assessment. Many district schools lack adequate high-speed broadband access necessary for this evaluation. Moreover, obsolesce in technology-based assessment erodes the capacity to provide quality services technology-based systems have a relatively short functional life.
Holistic assessments are the best in technology-based assessments. They incorporate software control in presentation, conceptual models or algorithms, decision-making based rules and expert knowledge (Redecker, & Johannessen, 2013). Proliferation technology helps students in the inclusion of speech recognition, electronic communication, personal computers, robotics and artificial intelligence. Trends in technology-based assessments have impacted lives of students with a disability. They achieve school improvement goals as well as tracking student growth and progress. Current assessment norms have embedded current stan ...
This document discusses explanations in data systems. It provides examples of explaining outliers in datasets and answers to database queries. It also covers representing explanations as attribute-value pairs or predicates, efficiently finding explanations using techniques like frequent itemsets or decision trees, and ranking explanations based on their influence. The document proposes research ideas around assisting data exploration by providing explanations for aggregate query results over ranges.
The Anatomy of a 21st Century Educator Simon Bates
The document discusses the potential of technology to transform education in the 21st century. It focuses on how student-generated content through tools like PeerWise, a web-based platform where students create and review multiple choice questions, can enhance learning through peer engagement and assessment. Analysis of PeerWise data found that students participated beyond minimum requirements, their question quality improved over time, and higher participation correlated with better learning as measured by standardized tests. The tool provides a model for leveraging student creativity to support learning at scale.
Learning analytics are more than measurementDragan Gasevic
Slides used for the keynote
Learning analytics are more than measurement
at
Policies for Educational Data Mining and Learning Analytics Briefing
organized by http://www.laceproject.eu/
This document provides an overview of a virtual in-service training on item analysis using CITAS (Classical Item and Test Analysis Spreadsheet) conducted by Mr. Fritz M. Ferran. The training covered understanding classical test theory, fundamentals of item analysis including item difficulty, discrimination, validity and reliability. It also demonstrated how to perform item analysis using the CITAS software by transferring test data and keys to interpret the results.
This document provides information about a biometry course offered to third year biotechnology students. The 3 credit course aims to improve students' statistical and inferential skills needed to design experiments, analyze and interpret data, and draw valid conclusions. It will cover topics like experimental design, analysis of variance, single and multifactor experiments, assumptions of ANOVA, regression and correlation analysis, and use of statistical software. Assessment will include tests, assignments, a practical exam, and a final exam. Students are expected to actively participate in lectures, group work and software demonstrations.
The power of learning analytics for UCL: lessons learned from the Open Univer...Bart Rienties
Across the globe many institutions and organisations have high hopes that learning analytics can play a major role in helping their organisations remain fit-for-purpose, flexible, and innovative. Learning analytics applications in education are expected to provide institutions with opportunities to support learner progression, but more importantly in the near future provide personalised, rich learning on a large scale. In this seminar, we will discuss lessons learned from various learning analytics applications at the OU.
This document provides an overview of the incremental build model that Project Pluto will adopt to develop their software system. The incremental build model involves iterative development where requirements are broken into prioritized builds. Each build adds new capabilities and allows for frequent testing, demonstration of progress, and verification of work completed so far. This approach provides benefits like continuous integration and validation of the evolving product, frequent delivery of working functionality, and ability to make changes based on feedback.
IRJET - Automatic Attendance Provision using Image ProcessingIRJET Journal
The document proposes an automatic attendance system using image processing and face recognition techniques to identify students faces from video frames in order to automatically record attendance. It discusses issues with current manual attendance systems and outlines a proposed solution using motion sensors and face recognition algorithms to identify a minimum of 3 faces at a time and allow for easy deployment of the system. The system would help save time over manual methods and reduce errors by automatically recognizing student faces and recording attendance data.
This document discusses pre-calibration models and frameworks for automatically generating assessment items. It proposes a conceptual frame that defines cognitive task models, item forms, form-level characteristics, item models, primary content, item families, and secondary content. It then proposes a pre-calibration model that represents the generative process at different levels. As an illustration, it analyzes data from a summer math program that administered automatically generated math items to students. The analysis found good correlation with a calibration model and provided estimates of properties at different generative levels. The discussion notes that variation among generated instances is different than residual unmodeled variation, and evaluating generative properties supports item banking and refinement.
BANK INFORMATION SYSTEM DESIGN PROBLEM IN CENTRAL ELECTRONIC EDUCATION AND H...Novita Ajeng Primantari
This document discusses the design of an electronic bank information system for the Central Electronic Education and Human Resources Development Training center at the Ministry of Finance in Indonesia. The current system for managing test items (questions) at the center is manual and lacks standardization. The proposed new system would automate processes like question writing, review, analysis, and selection using a standardized format. It would also conduct both qualitative (theoretical) and quantitative (empirical) analysis of questions, including measuring difficulty level and ability to distinguish candidates. The system would be developed using Rapid Application Development and the Unified Modeling Language, and use ASP.NET, C#, and SQL Server. The goal is to help effectively manage the question bank and produce valid assessment
Data Clustering in Education for StudentsIRJET Journal
This document discusses using k-means clustering to analyze student behavior and performance based on factors like exam scores, assignments, tests, and attendance. The goal is to evaluate students accurately and help professors reduce failure rates and improve performance. It provides background on data clustering and how it can be applied in education. A proposed model is described that uses students' previous grades, quiz scores, assignment completion, lab performance, class test scores and attendance to predict their final grades. The k-means clustering algorithm is explained and results are presented showing how students were clustered into groups based on GPA and whether they passed or failed. The clustering aims to identify weaker students before exams to help improve their performance.
Similar to What makes a good adaptive testing program (20)
THE SACRIFICE HOW PRO-PALESTINE PROTESTS STUDENTS ARE SACRIFICING TO CHANGE T...indexPub
The recent surge in pro-Palestine student activism has prompted significant responses from universities, ranging from negotiations and divestment commitments to increased transparency about investments in companies supporting the war on Gaza. This activism has led to the cessation of student encampments but also highlighted the substantial sacrifices made by students, including academic disruptions and personal risks. The primary drivers of these protests are poor university administration, lack of transparency, and inadequate communication between officials and students. This study examines the profound emotional, psychological, and professional impacts on students engaged in pro-Palestine protests, focusing on Generation Z's (Gen-Z) activism dynamics. This paper explores the significant sacrifices made by these students and even the professors supporting the pro-Palestine movement, with a focus on recent global movements. Through an in-depth analysis of printed and electronic media, the study examines the impacts of these sacrifices on the academic and personal lives of those involved. The paper highlights examples from various universities, demonstrating student activism's long-term and short-term effects, including disciplinary actions, social backlash, and career implications. The researchers also explore the broader implications of student sacrifices. The findings reveal that these sacrifices are driven by a profound commitment to justice and human rights, and are influenced by the increasing availability of information, peer interactions, and personal convictions. The study also discusses the broader implications of this activism, comparing it to historical precedents and assessing its potential to influence policy and public opinion. The emotional and psychological toll on student activists is significant, but their sense of purpose and community support mitigates some of these challenges. 
However, the researchers call for acknowledging the broader Impact of these sacrifices on the future global movement of FreePalestine.
A Visual Guide to 1 Samuel | A Tale of Two HeartsSteve Thomason
These slides walk through the story of 1 Samuel. Samuel is the last judge of Israel. The people reject God and want a king. Saul is anointed as the first king, but he is not a good king. David, the shepherd boy is anointed and Saul is envious of him. David shows honor while Saul continues to self destruct.
Gender and Mental Health - Counselling and Family Therapy Applications and In...PsychoTech Services
A proprietary approach developed by bringing together the best of learning theories from Psychology, design principles from the world of visualization, and pedagogical methods from over a decade of training experience, that enables you to: Learn better, faster!
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.pptHenry Hollis
The History of NZ 1870-1900.
Making of a Nation.
From the NZ Wars to Liberals,
Richard Seddon, George Grey,
Social Laboratory, New Zealand,
Confiscations, Kotahitanga, Kingitanga, Parliament, Suffrage, Repudiation, Economic Change, Agriculture, Gold Mining, Timber, Flax, Sheep, Dairying,
2. www.twosigmas.com twitter.com/twosigmas_ facebook/twosigmaspage
Advantage of IRT-based CAT
On a 40-question exam with dichotomous scoring (wrong or right), a test that pre-branches on every response needs a distinct item for each possible right/wrong path, so the total number of questions you might need to develop is 2^40 ≈ 1.1 × 10^12. On a well-designed IRT-based CAT, the total number of questions you might need to develop is ≈ 400.
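The arithmetic behind this contrast can be sketched in a few lines of Python (the ~400-item figure is the slide's rule-of-thumb pool size, not something the calculation produces):

```python
# A naive branching adaptive test pre-writes a distinct item for every
# right/wrong history: a binary tree of depth 40 has
# 2^0 + 2^1 + ... + 2^39 = 2^40 - 1 nodes, one item per node.
n_questions = 40
branching_items = 2**n_questions - 1
print(f"branching tree: {branching_items:,} items")  # 1,099,511,627,775

# An IRT-based CAT scores everyone on a common ability scale, so all
# response paths draw items from one calibrated pool (~400 per the slide).
irt_pool = 400
print(f"IRT pool: {irt_pool} items "
      f"({branching_items // irt_pool:,}x fewer)")
```

Because IRT places items and examinees on the same scale, one calibrated pool serves every possible path, which is the whole point of the comparison.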
3. Who is your market?
- Students – Computer-adaptive teaching/learning? Blended learning with MOOCs?
- Schools – Adaptive homework? Computer-based in-school exams? Formative assessment?
- Exam boards – Professional organizations? Corporations? Government organizations?
9. References:
• Baker, F. B., & Kim, S.-H. (2004). Item Response Theory: Parameter Estimation Techniques (2nd ed., revised and expanded). New York, NY: CRC Press, Taylor & Francis Group.
• Bao, H., Dayton, C. M., & Hendrickson, A. B. (2009). Differential item functioning amplification and cancellation in a reading test. Practical Assessment, Research & Evaluation, 14(19). http://pareonline.net/getvn.asp?v=14&n=19
• Bergstrom, B. A., Gershon, R. C., & Brown, W. L. (1993). Differential item functioning vs. differential test functioning. Paper presented at the Annual Meeting of the American Educational Research Association, Atlanta, GA, April 12-16. http://www.eric.ed.gov/PDFS/ED377227.pdf
• Birdsall, M. (2011). Implementing computer adaptive testing to improve achievement opportunities. Ofqual, Coventry. http://webarchive.nationalarchives.gov.uk/+/http://www.ofqual.gov.uk/files/2011-06-15-implementing-computer-adaptive-testing-to-improve-achievement-opportunities.pdf
• Bowles, R., & Pommerich, M. (2001). An examination of item review on a CAT using the specific information item selection algorithm. Paper presented at the Annual Meeting of the National Council on Measurement in Education, Seattle, WA.
• Childs, R. A., & Jaciw, A. P. (2003). Matrix sampling of items in large-scale assessments. Practical Assessment, Research & Evaluation, 8(16). http://PAREonline.net/getvn.asp?v=8&n=16
• de Ayala, R. J. (2009). The Theory and Practice of Item Response Theory. New York, NY: The Guilford Press.
• He, Q. (2010). Maintaining standards in on-demand testing using item response theory. Ofqual, Coventry. http://e-assessment.org.uk/images/uploads/s-docs/Ofqual-10-4724-Maintaining-standards.pdf
• Newton, P. E. (2007). Clarifying the purposes of educational assessment. Assessment in Education: Principles, Policy & Practice, 14(2), 149-170. http://dx.doi.org/10.1080/09695940701478321
• Pommerich, M., Segall, D. O., & Moreno, K. E. (2009). The nine lives of CAT-ASVAB: Innovations and revelations. In D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. www.psych.umn.edu/psylabs/CATCentral/
• Rudner, L. M. (2007). Implementing the Graduate Management Admission Test® computerized adaptive test. In D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. www.psych.umn.edu/psylabs/CATCentral/
• Segall, D. O., & Moreno, K. E. (1999). Development of the CAT-ASVAB. In F. Drasgow & J. B. Olson-Buchanan (Eds.), Innovations in Computerized Assessment (pp. 35-65). Hillsdale, NJ: Lawrence Erlbaum Associates. http://www.danielsegall.com/catasvab.pdf
• van der Linden, W. J., & Glas, C. A. W. (Eds.) (2010). Elements of Adaptive Testing (chapters 4, 10, 17, and p. 349). New York, NY: Springer Science+Business Media.
• Wise, L. L., Curran, L. T., & McBride, J. R. (1997). CAT-ASVAB cost and benefit analyses. In W. A. Sands, B. K. Waters, & J. R. McBride (Eds.), Computerized Adaptive Testing: From Inquiry to Operation (pp. 227-236). Washington, DC: American Psychological Association.
• Zumbo, B. D. (2007). Three generations of DIF analyses: Considering where it has been, where it is now, and where it is going. Language Assessment Quarterly, 4(2), 223-233. http://educ.ubc.ca/faculty/zumbo/papers/Zumbo_LAQ_reprint.pdf