Upcoming Caveon Events
• Caveon Webinar Series: Next session, October 16
The Good and Bad of Online Proctoring, Part 2
• EATP – September 25-27 in St. Julian’s, Malta.
– Caveon’s John Fremer and Steve Addicott presenting:
What are we Accountable For? Security Standards and Resources for High
Stakes Testing Programs
– Steve Addicott hosting an ignite session: Leveraging Social Media to Connect with
International Test Candidates
• The 2nd Annual Statistical Detection of Potential Test Fraud Conference
– October 17-19, 2013, Madison, Wisconsin
– Caveon’s Dennis Maynes and Cindy Butler will be presenting three sessions
• Handbook of Test Security – Now Available. We will share a discount code at the
end of this session.
Caveon Online
• Caveon Security Insights Blog
– http://www.caveon.com/blog/
• twitter
– Follow @Caveon
• LinkedIn
– Caveon Company Page
– “Caveon Test Security” Group
• Please contribute!
• Facebook
– Will you be our “friend”?
– “Like” us!
www.caveon.com
Improving Testing with Key Strength Analysis
Dennis Maynes, Chief Scientist, Caveon Test Security
Dan Allen, Psychometrician, Western Governors University
Marcus Scott, Data Forensics Scientist, Caveon Test Security
Barbara Foster, Psychometrician, American Board of Obstetrics and Gynecology
September 18, 2013
Caveon Webinar Series:
Agenda for Today
• Review classical item analysis
• Introduce Key Strength Analysis
• Derive Key Strength Analysis
• Observations by Dan Allen and Barbara Foster
• Conclusions and Q&A
Review Classical Item Analysis
• Statistics
– P-value
– Point-biserial correlation
• Typical rules
– Low p-values (hard items)
– High p-values (easy items)
– Low point-biserial correlations (low discriminations)
• Easy to understand and implement
• Good at flagging poor items
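The two classical statistics and the typical flagging rules can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' code: the function names and the three cutoffs (`low_p`, `high_p`, `low_r`) are assumptions chosen for the example, since the slides do not state thresholds.

```python
from math import sqrt

def p_value(item_scores):
    """Proportion of examinees who answered the item correctly (0/1 scores)."""
    return sum(item_scores) / len(item_scores)

def point_biserial(item_scores, total_scores):
    """Pearson correlation between a 0/1 item score and the total test score."""
    n = len(item_scores)
    mean_x = sum(item_scores) / n
    mean_y = sum(total_scores) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(item_scores, total_scores))
    sxx = sum((x - mean_x) ** 2 for x in item_scores)
    syy = sum((y - mean_y) ** 2 for y in total_scores)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined when either variable is constant; treated as 0 here
    return sxy / sqrt(sxx * syy)

def flag_item(p, r, low_p=0.20, high_p=0.95, low_r=0.10):
    """Apply the typical rules. The cutoffs are illustrative assumptions."""
    flags = []
    if p < low_p:
        flags.append("hard item")
    if p > high_p:
        flags.append("easy item")
    if r < low_r:
        flags.append("low discrimination")
    return flags

# Hypothetical data: five examinees, one item, and their total test scores.
item = [1, 1, 1, 0, 0]
totals = [5, 4, 4, 2, 1]
p = p_value(item)
r = point_biserial(item, totals)
print(p, round(r, 3), flag_item(p, r))
```

The rules only say that an item looks poor; they do not say which answer choice is causing the trouble.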
Introduce Key Strength Analysis
• Why Key Strength Analysis?
– Model uses information from all items
– Answer choices for same item are compared
– Provides possible reasons for poor performance
• High performing test takers (knowledgeable students)
– Typically report problems with the answer key
– Usually choose the correct answer
• Most frequently selected choice
– Is usually correct for easy items
– Is not necessarily correct for hard items
Capabilities of Key Strength Analysis
• Built upon classical item analysis
– Point-biserial correlations discriminate between high and low
performers
– P-values detect hard/easy items
• Typical problems with items
– Mis-keyed items
– Weakly keyed items
– Ambiguously keyed items
• Use probabilities to make inferences about item
performance
Modify Point-Biserial Correlation
1. Exclude the item score from the test score
• Places all answer choices on “the same playing field”
• Allows correct and incorrect answers to be compared using “what if”
2. Compute point-biserial correlations
• For correct answer and
• For distractors
3. Scale point-biserial appropriately
• We call this statistic z*
• Use z* to compute the probability of the choice (A, B, etc.) being a key; this is the “key strength”
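Steps 1 and 2 can be sketched as follows. This is a hedged illustration with hypothetical function names and made-up response data. It stops at the per-choice correlations, because the scaling that turns them into z* is not reproduced in this transcript.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def choice_rest_correlations(responses, answer_key, item, options):
    """Point-biserial of each answer choice against the rest score.

    The rest score is the number correct on all items except `item`
    (step 1), so the key and the distractors are judged on equal terms.
    """
    rest_scores = [
        sum(resp[j] == answer_key[j] for j in range(len(answer_key)) if j != item)
        for resp in responses
    ]
    return {
        choice: pearson([int(resp[item] == choice) for resp in responses], rest_scores)
        for choice in options
    }

# Hypothetical responses: each string holds one examinee's answers to 4 items.
responses = ["AAAA"] * 4 + ["AAAB"] * 3 + ["BBBB"] * 3
print(choice_rest_correlations(responses, "AAAA", item=3, options="ABCD"))
```

In this toy data set, choice A on the last item correlates positively with the rest score and choice B negatively, even though B is the more popular response.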
Derive Key Strength Analysis
After Some Algebra
Why z* Depends on all the Right Quantities
Z* for all Items and Responses
[Figure: distributions of z* for right and wrong responses; x-axis z* from -10 to 10; 154 examinees, 100 items]
Calculating p(choice is a key | data)
Approximation Theory
• Central Limit Theorem implies z* is approximately normal.
• Probability function should be monotonic
increasing, which requires equal variances
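The equal-variance requirement can be illustrated with a small Bayes calculation. The sketch below assumes z* follows one normal distribution for right answers and another for wrong answers. The means, standard deviations, and prior are hypothetical values, not estimates from the presenters' data. With equal variances the log-odds are linear in z*, so the probability rises steadily. With unequal variances the wider distribution dominates both tails and the curve turns back.

```python
import math

def normal_pdf(z, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def key_probability(z, mu_right, mu_wrong, sigma_right, sigma_wrong, prior_right):
    """Posterior probability that a choice is a key, given its z* (Bayes' rule)."""
    right = prior_right * normal_pdf(z, mu_right, sigma_right)
    wrong = (1 - prior_right) * normal_pdf(z, mu_wrong, sigma_wrong)
    return right / (right + wrong)

# Hypothetical parameters: one key among four choices, so the prior is 0.25.
grid = [x / 2 for x in range(-20, 21)]  # z* from -10 to 10 in steps of 0.5

equal = [key_probability(z, 3.0, -1.0, 1.5, 1.5, 0.25) for z in grid]
unequal = [key_probability(z, 3.0, -1.0, 2.0, 1.0, 0.25) for z in grid]

is_increasing = lambda values: all(a <= b for a, b in zip(values, values[1:]))
print("equal variances monotonic:  ", is_increasing(equal))
print("unequal variances monotonic:", is_increasing(unequal))
```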
[Figure: observed z* distributions for right and wrong responses with fitted normal curves (Right, Right Normal, Wrong, Wrong Normal); x-axis z* from -10 to 10]
P(choice is a key | z*)
[Figure: p(choice is a key | z*) plotted against z*; x-axis z* from -4 to 5, y-axis probability from 0 to 1]
Analysis of Distractors
• Compute key strength (KS) for all responses
• Low KS – probability less than 50%
• High KS – probability 50% or more
Answer \ Distractors   Low KS          High KS
Low KS                 Weakly keyed    Potential mis-key
High KS                Normal          Ambiguously keyed
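The decision table can be written as a small function. This is a sketch under one assumption not stated on the slide: the distractors count as “High KS” when the strongest distractor reaches the 50% threshold. The function name and the sample probabilities are illustrative.

```python
def classify_item(key_ks, distractor_ks, threshold=0.5):
    """Classify an item from the key strength (KS) of its key and distractors.

    key_ks        -- probability that the keyed answer is a key
    distractor_ks -- probabilities for each distractor
    threshold     -- KS at or above this value counts as "high" (50% on the slide)
    """
    key_high = key_ks >= threshold
    # Assumption: one strong distractor is enough to make the distractors "high".
    distractors_high = max(distractor_ks) >= threshold
    if key_high and not distractors_high:
        return "normal"
    if key_high and distractors_high:
        return "ambiguously keyed"
    if distractors_high:
        return "potential mis-key"
    return "weakly keyed"

# Illustrative probabilities for a four-choice item.
print(classify_item(0.99, [0.06, 0.0, 0.0]))  # strong key, weak distractors
print(classify_item(0.32, [0.06, 0.0, 0.0]))  # nothing looks like a key
print(classify_item(0.99, [0.90, 0.0, 0.0]))  # a distractor also looks like a key
print(classify_item(0.06, [0.99, 0.0, 0.0]))  # a distractor looks like the key
```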
Example I – Good Key
[Figure: p(choice is a key | z*) versus z* (x-axis -4 to 5, y-axis 0 to 1), with responses A, B, C, and D marked on the curve. The answer key arrow is colored gold.]
Response  z*     Probability
A         3.25   0.99
B         0.25   0.06
C         -2.75  0
D         -2.4   0
Example II – Potential Mis-key
[Figure: p(choice is a key | z*) versus z* (x-axis -4 to 5, y-axis 0 to 1), with responses A, B, C, and D marked on the curve. The answer key arrow is colored gold.]
Response  z*     Probability
A         3.25   0.99
B         0.25   0.06
C         -2.75  0
D         -2.4   0
Example III – Weak Key
[Figure: p(choice is a key | z*) versus z* (x-axis -4 to 5, y-axis 0 to 1), with responses A, B, C, and D marked on the curve. The answer key arrow is colored gold.]
Response  z*     Probability
A         1.0    0.32
B         0.25   0.06
C         -3     0
D         -2.5   0
Example IV – Ambiguous Key
[Figure: p(choice is a key | z*) versus z* (x-axis -4 to 5, y-axis 0 to 1), with responses A, B, C, and D marked on the curve. The answer key arrow is colored gold.]
Response  z*     Probability
A         3.75   0.99
B         2.25   0.9
C         -3     0
D         -2.5   0
Validation – Answer Key Estimation
• Assume the key is not known
• Check accuracy of estimated answer key
• Algorithm:
– Start with most frequent response as initial guess
– Revise key using probabilities until no more changes
• For 12 different exams
– Key estimation accuracy varied from 81% to 99%
– Cannot infer multiple keys
– Cannot guess key when there are no correct responses
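The estimation loop can be sketched as below. This is not the presenters' implementation. As a stand-in for the key strength probability, it picks the choice whose selection correlates most strongly with the rest score. That substitution, the function names, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def estimate_key(responses, options, max_iter=50):
    """Estimate an answer key from responses alone."""
    n_items = len(responses[0])
    # Initial guess: the most frequent response to each item.
    answer_key = [
        Counter(resp[j] for resp in responses).most_common(1)[0][0]
        for j in range(n_items)
    ]
    for _ in range(max_iter):
        scored = [[int(resp[j] == answer_key[j]) for j in range(n_items)]
                  for resp in responses]
        totals = [sum(row) for row in scored]
        revised = []
        for j in range(n_items):
            # Exclude the item itself so every choice is compared fairly.
            rest = [total - row[j] for total, row in zip(totals, scored)]
            strength = {
                c: pearson([int(resp[j] == c) for resp in responses], rest)
                for c in options
            }
            revised.append(max(options, key=strength.get))
        if revised == answer_key:  # no more changes
            break
        answer_key = revised
    return answer_key

# Hypothetical data: the last item is hard, so most examinees choose B,
# but the strongest examinees choose A.
responses = ["AAAA"] * 4 + ["AAAB"] * 3 + ["BBBB"] * 3
print(estimate_key(responses, "ABCD"))
```

In this toy example the most frequent response to the last item is B, and the revision step moves the estimated key to A. An item that nobody answers correctly gives the loop nothing to correlate, which matches the limitation listed above.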
Summary of Validation Study
• Accuracy improves with item quality
• Accuracy affected by sample size & test length
Exam Name  N       Forms  Form Length   Items  Non-scored Items  Accuracy  Observations
A          2,966   2      180           307    0                 99.2%
B          337     2      107           214    0                 85.5%
C          337     1      230           230    0                 90.9%
D          1,815   1      204           204    7                 92.1%     Some association with "deleted" items
E          1,408   1      199           199    1                 96.0%
F          46,356  2      240           480    0                 96.0%
G          44,104  2      120           240    0                 95.8%
H          25,448  2      60            120    0                 93.3%
I          121     3      165           417    43                81.0%     Strong association with "field test" items
J          1,071   8      52 & 61       391    0                 80.5%     85.2% (English-only)
K          2,033   8      68, 76 & 77   510    0                 85.9%
L          6,473   21     250           1,050  850               85.7%     All errors except one were on non-scored items.
Reason for Answer Key Estimation
• If a group of test takers has stolen the test and worked
out their own answer key, it is likely some answers will
be wrong.
• Answer key estimation can find the errors committed by
test thieves.
Dan Allen
Psychometrician
Western Governors University
Example Item: Ambiguous Key
Which is a property of all X?
A. They contain Y.
B. They have property Z.
C. * They do not contain Y.
D. They have property W.
Looking at the item text, we see that this is likely being
caused by rival options A and C. SME feedback
suggests the item is too text specific.
Example Item: Ambiguous Key
Which is a component of X?
A. * Real anticipated expense
B. Time spent
C. Liquid assets
D. Quality
In this case, students of high ability were often
selecting C instead of A. SME feedback suggests the
deleted word may have been turning students off to
that option.
Example Item: Weak Key
Select 3 possible causes of X
A. *Obesity
B. Contaminated drinking water
C. *Unhealthy diet
D. *Genetic factors
E. Lack of exercise
High performing students were picking C and D correctly, but
were as likely to pick E as they were to pick A. SME feedback
suggested that E may be a reasonable answer to the question.
The revision involved making A, C, and E all incorrect answers
so that D would remain the sole answer.
Example Item: Potential Mis-key
Which is a sound accounting principle?
A. X
B. Not X
C. *Y
D. Z
Nearly all students selected distractor B (Not X). This
item was not mis-keyed. It seems most likely that this
concept was not covered sufficiently in the text and/or
other learning resources—leaving students to use
guessing strategies rather than content knowledge.
Barbara Foster
Psychometrician
The American Board of Obstetrics
and Gynecology
The American Board of
Obstetrics and Gynecology
2013 Certifying Exam
• 180 scored items
• Five sets of 40 field test items
• Potential mis-keys from Caveon
– 8 identified among the scored items (4%)
– 22 identified among the field test items (11%)
The lower proportion in the scored items is not
surprising since those items have been field
tested and some may have been previously
used.
The American Board of Obstetrics and Gynecology
• Result of the SME review of the flagged scored
items:
– 4 of the 8 (50%) were found to have problems.
These problems were a combination of ambiguous
wording, new information published just prior to
the exam, recent changes in guidelines, or just a
very difficult item. These items were deleted from
the exam prior to scoring.
The American Board of Obstetrics and Gynecology
• Result of the SME review of the flagged field
test items:
– 15 of the 22 (68%) were found to have problems.
These problems were mostly a combination of
ambiguous wording, responses too closely related,
and changes in the field.
The American Board of Obstetrics and Gynecology
Field test items flagged:
• Our Standard Methods: 27 flagged (13.5%)
• The z* Method: 22 flagged (11.0%)
• Flagged by both: 8 (4%)
The American Board of Obstetrics and Gynecology
Flagged field test items found to have problems:
• Our Standard Methods: 27 flagged (13.5%), 13 had problems
• The z* Method: 22 flagged (11.0%), 15 had problems
• Flagged by both: 8 (4%), 5 had problems
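The overlap counts can be broken down with a little arithmetic. The sketch below assumes that each method's totals (27 and 22 flagged, 13 and 15 with problems) include the 8 items flagged by both methods. The variable names are illustrative.

```python
# Counts taken from the slide; each method's totals are assumed to include the overlap.
standard = {"flagged": 27, "problems": 13}
z_star = {"flagged": 22, "problems": 15}
both = {"flagged": 8, "problems": 5}

def only(method, overlap):
    """Counts for items flagged by one method but not the other."""
    return {name: method[name] - overlap[name] for name in method}

def hit_rate(group):
    """Share of flagged items that reviewers found to have problems."""
    return group["problems"] / group["flagged"]

z_only = only(z_star, both)
std_only = only(standard, both)
either_flagged = standard["flagged"] + z_star["flagged"] - both["flagged"]
either_problems = standard["problems"] + z_star["problems"] - both["problems"]

print("z* only:       ", z_only, f"{hit_rate(z_only):.0%}")
print("standard only: ", std_only, f"{hit_rate(std_only):.0%}")
print("either method: ", either_flagged, "flagged,", either_problems, "with problems")
```

Under that assumption, 10 problem items were flagged by the z* method alone and 8 by the standard methods alone.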
The American Board of Obstetrics and Gynecology
• Conclusion
This new method appears to detect differences that our
current methods do not. These differences do not
appear to be strictly keying errors; they involve
other important problem areas as well.
Conclusions
• Item analysis helps ensure
– Unidimensionality
– Desired item performance
• Key Strength Analysis enhances classical item analysis
– Uses information from all items
– Compares answer choices for same item
• Can detect structural flaws in items
• Can suggest the actual key when the item is mis-keyed
– Suggests possible reasons for poor performance
• Future research
– Investigate thresholds for Key Strength Analysis
– Simulate item problems to measure ability to detect
– Evaluate performance when assumptions fail
Questions?
Please type questions for our presenters in the
GoToWebinar control panel on your screen.
HANDBOOK OF TEST SECURITY
• Editors - James Wollack & John Fremer
• Published March 2013
• Preventing, Detecting, and Investigating Cheating
• Testing in Many Domains
– Certification/Licensure
– Clinical
– Educational
– Industrial/Organizational
• Don’t forget to order your copy at www.routledge.com
– http://bit.ly/HandbookTS (Case Sensitive)
– Save 20% - Enter discount code: HYJ82
THANK YOU!
- Follow Caveon on twitter @caveon
- Check out our blog…www.caveon.com/blog
- LinkedIn Group – “Caveon Test Security”

Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
Jheel Barad
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Special education needs
 
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdfESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
Fundacja Rozwoju Społeczeństwa Przedsiębiorczego
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
Tamralipta Mahavidyalaya
 
Fish and Chips - have they had their chips
Fish and Chips - have they had their chipsFish and Chips - have they had their chips
Fish and Chips - have they had their chips
GeoBlogs
 

Recently uploaded (20)

Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
 
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
 
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
 
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
 
Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)
 
How to Break the cycle of negative Thoughts
How to Break the cycle of negative ThoughtsHow to Break the cycle of negative Thoughts
How to Break the cycle of negative Thoughts
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
 
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdfESC Beyond Borders _From EU to You_ InfoPack general.pdf
ESC Beyond Borders _From EU to You_ InfoPack general.pdf
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
Fish and Chips - have they had their chips
Fish and Chips - have they had their chipsFish and Chips - have they had their chips
Fish and Chips - have they had their chips
 

Caveon Webinar Series: Improving Testing with Key Strength Analysis

  • 1. Upcoming Caveon Events • Caveon Webinar Series: Next session, October 16 The Good and Bad of Online Proctoring, Part 2 • EATP – September 25-27 in St. Julian’s, Malta. – Caveon’s John Fremer and Steve Addicott presenting: What are we Accountable For? Security Standards and Resources for High Stakes Testing Programs – Steve Addicott hosting an ignite session: Leveraging Social Media to Connect with International Test Candidates • The 2nd Annual Statistical Detection of Potential Test Fraud Conference – October 17-19, 2013, Madison, Wisconsin – Caveon’s Dennis Maynes and Cindy Butler will be presenting three sessions • Handbook of Test Security – Now Available. We will share a discount code at the end of this session.
  • 2. Caveon Online • Caveon Security Insights Blog – http://www.caveon.com/blog/ • Twitter – Follow @Caveon • LinkedIn – Caveon Company Page – "Caveon Test Security" Group • Please contribute! • Facebook – Will you be our "friend?" – "Like" us! www.caveon.com
  • 3. Caveon Webinar Series: Improving Testing with Key Strength Analysis • September 18, 2013 • Dennis Maynes, Chief Scientist, Caveon Test Security • Marcus Scott, Data Forensics Scientist, Caveon Test Security • Dan Allen, Psychometrician, Western Governors University • Barbara Foster, Psychometrician, American Board of Obstetrics and Gynecology
  • 4. Agenda for Today • Review classical item analysis • Introduce Key Strength Analysis • Derive Key Strength Analysis • Observations by Dan Allen and Barbara Foster • Conclusions and Q&A
  • 5. Review Classical Item Analysis • Statistics – P-value – Point-biserial correlation • Typical rules – Low p-values (hard items) – High p-values (easy items) – Low point-biserial correlations (low discriminations) • Easy to understand and implement • Good at flagging poor items
  • 6. Introduce Key Strength Analysis • Why Key Strength Analysis? – Model uses information from all items – Answer choices for same item are compared – Provides possible reasons for poor performance • High performing test takers (knowledgeable students) – Typically report problems with the answer key – Usually choose the correct answer • Most frequently selected choice – Is usually correct for easy items – Is not necessarily correct for hard items
  • 7. Capabilities of Key Strength Analysis • Built upon classical item analysis – Point-biserial correlations discriminate between high and low performers – P-values detect hard/easy items • Typical problems with items – Mis-keyed items – Weakly keyed items – Ambiguously keyed items • Use probabilities to make inferences about item performance
  • 8. Modify Point-Biserial Correlation 1. Exclude the item score from the test score • Places all answer choices on "the same playing field" • Allows correct and incorrect answers to be compared using "what if" 2. Compute point-biserial correlations • For the correct answer and • For the distractors 3. Scale the point-biserial appropriately • We call this statistic z* • Use z* to compute the probability of the choice (A, B, etc.) being a key; this is the "key strength"
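
The three steps on slide 8 can be sketched in Python as follows. This is a minimal illustration, not the presenters' implementation: the function name `choice_point_biserials` is hypothetical, and because the exact scaling that defines z* is not recoverable from this transcript, the sketch substitutes the standard t-statistic for a correlation, `r * sqrt(n - 2) / sqrt(1 - r^2)`, as a stand-in.

```python
import numpy as np

def choice_point_biserials(responses, key):
    """Correlate each answer choice with the rest score (test score minus the item).

    responses: (n_examinees, n_items) array of choice labels, e.g. "A".."D".
    key:       sequence of n_items keyed answers.
    Returns {(item_index, choice): (r, scaled)}.
    NOTE: `scaled` is an assumed stand-in for z*, not the presenters' formula.
    """
    responses = np.asarray(responses)
    scored = (responses == np.asarray(key)).astype(float)
    total = scored.sum(axis=1)
    n = responses.shape[0]
    out = {}
    for j in range(responses.shape[1]):
        # Step 1: exclude the item's own score from the test score.
        rest = total - scored[:, j]
        if rest.std() == 0:
            continue
        for choice in sorted(set(responses[:, j].tolist())):
            picked = (responses[:, j] == choice).astype(float)
            if picked.std() == 0:
                continue  # everyone (or no one) picked this choice
            # Step 2: point-biserial for the key and for every distractor.
            r = float(np.corrcoef(picked, rest)[0, 1])
            # Step 3: scale (assumed t-statistic scaling).
            scaled = r * np.sqrt(n - 2) / np.sqrt(max(1.0 - r * r, 1e-12))
            out[(j, choice)] = (r, float(scaled))
    return out
```

Because the item's own score is removed before correlating, a distractor and the key are evaluated against the same rest score, which is what allows the "what if this choice were the key" comparison.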
  • 11. Why z* Depends on all the Right Quantities
  • 12. Z* for all Items and Responses [Figure: distributions of z* for right and wrong responses, z* axis from -10 to 10; 154 examinees, 100 items]
  • 13. Calculating p(choice is a key | data)
  • 14. Approximation Theory • Central Limit Theorem ⇒ z* is normal. • Probability function should be monotonic increasing, which requires equal variances [Figure: z* distributions for right and wrong responses, each overlaid with a fitted normal curve; z* axis from -10 to 10]
  • 15. P(choice is a key | z*) [Figure: p(choice is a key | z*) plotted against z*, for z* from -4 to 5]
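
One way to read slides 13 to 15 is as an application of Bayes' rule to two normal distributions of z*, one for keyed choices and one for distractors. The sketch below assumes exactly that; the function name `key_probability` and every parameter value in the usage note are hypothetical, since the fitted means, variance, and prior are not given in this transcript.

```python
import math

def key_probability(z, mu_right, mu_wrong, sigma, prior_key):
    """Posterior probability that a choice is a key, given its z*.

    Assumes z* is normal with mean mu_right for keys and mu_wrong for
    distractors, sharing one standard deviation sigma (the equal-variance
    condition on slide 14). prior_key is the prior chance that a choice
    is a key, e.g. 0.25 for four-option items. All values are assumptions.
    """
    def pdf(x, mu):
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    num = prior_key * pdf(z, mu_right)
    den = num + (1.0 - prior_key) * pdf(z, mu_wrong)
    return num / den
```

With equal variances the quadratic terms in z* cancel in the likelihood ratio, so the log-odds are linear in z* and the curve is a monotonic logistic shape. For example, `key_probability(z, mu_right=3.0, mu_wrong=-1.5, sigma=1.5, prior_key=0.25)` increases steadily as `z` grows; with unequal variances the curve could turn back down at extreme values.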
  • 16. Analysis of Distractors • Compute key strength (KS) for all responses • Low KS – probability less than 50% • High KS – probability 50% or more
      Answer  | Distractors: Low KS | Distractors: High KS
      Low KS  | Weakly keyed        | Potential mis-key
      High KS | Normal              | Ambiguously keyed
  • 17. Example I – Good Key [Figure: p(choice is a key | z*) curve with arrows marking responses A, B, C, and D; the answer key arrow is colored gold] Response (z*, probability): A (3.25, 0.99); B (0.25, 0.06); C (-2.75, 0); D (-2.4, 0)
  • 18. Example II – Potential Mis-key [Figure: p(choice is a key | z*) curve with arrows marking responses A, B, C, and D; the answer key arrow is colored gold] Response (z*, probability): A (3.25, 0.99); B (0.25, 0.06); C (-2.75, 0); D (-2.4, 0)
  • 19. Example III – Weak Key [Figure: p(choice is a key | z*) curve with arrows marking responses A, B, C, and D; the answer key arrow is colored gold] Response (z*, probability): A (1.0, 0.32); B (0.25, 0.06); C (-3, 0); D (-2.5, 0)
  • 20. Example IV – Ambiguous Key [Figure: p(choice is a key | z*) curve with arrows marking responses A, B, C, and D; the answer key arrow is colored gold] Response (z*, probability): A (3.75, 0.99); B (2.25, 0.9); C (-3, 0); D (-2.5, 0)
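
The decision table on slide 16 and the four examples that follow it can be summarized in a few lines of Python. This is a sketch under one stated assumption: "High KS" for the distractors is taken to mean that at least one distractor reaches the 50% threshold. The function name `classify_item` is hypothetical.

```python
def classify_item(key_ks, distractor_ks, threshold=0.5):
    """Classify an item from the key strength (KS) of its answer choices.

    key_ks:        probability that the keyed answer is a key.
    distractor_ks: probabilities for the remaining choices.
    Assumption: distractors count as "high KS" if any one of them
    reaches the threshold (50% on slide 16).
    """
    key_high = key_ks >= threshold
    distractor_high = any(p >= threshold for p in distractor_ks)
    if key_high and not distractor_high:
        return "normal"
    if key_high and distractor_high:
        return "ambiguously keyed"
    if distractor_high:
        return "potential mis-key"
    return "weakly keyed"
```

Using the probabilities from the examples with A as the key, `classify_item(0.99, [0.06, 0, 0])` gives "normal" (Example I), `classify_item(0.32, [0.06, 0, 0])` gives "weakly keyed" (Example III), and `classify_item(0.99, [0.9, 0, 0])` gives "ambiguously keyed" (Example IV). If, hypothetically, B were the keyed answer in Example II, `classify_item(0.06, [0.99, 0, 0])` would give "potential mis-key".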
  • 21. Validation – Answer Key Estimation • Assume the key is not known • Check accuracy of estimated answer key • Algorithm: – Start with most frequent response as initial guess – Revise key using probabilities until no more changes • For 12 different exams – Key estimation accuracy varied from 81% to 99% – Cannot infer multiple keys – Cannot guess key when there are no correct responses
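
The iteration described on slide 21 might look roughly like the sketch below. It is not the presenters' algorithm: the slide revises the key "using probabilities" (key strength), whereas this illustration uses the raw correlation between each choice and the rest score as a simpler stand-in. The function name `estimate_key` and the `max_iter` guard are assumptions.

```python
from collections import Counter
import numpy as np

def estimate_key(responses, max_iter=50):
    """Estimate an answer key from responses alone.

    responses: (n_examinees, n_items) array of choice labels.
    Assumption: the choice whose selection correlates most strongly with the
    rest score is treated as the most plausible key (a stand-in for KS).
    """
    responses = np.asarray(responses)
    n_items = responses.shape[1]
    # Initial guess: the most frequent response to each item.
    key = [Counter(responses[:, j].tolist()).most_common(1)[0][0]
           for j in range(n_items)]
    for _ in range(max_iter):
        scored = (responses == np.array(key)).astype(float)
        total = scored.sum(axis=1)
        new_key = list(key)
        for j in range(n_items):
            rest = total - scored[:, j]  # score on all other items
            if rest.std() == 0:
                continue
            best, best_r = key[j], -np.inf
            for choice in sorted(set(responses[:, j].tolist())):
                picked = (responses[:, j] == choice).astype(float)
                if picked.std() == 0:
                    continue
                r = np.corrcoef(picked, rest)[0, 1]
                if r > best_r:
                    best, best_r = choice, r
            new_key[j] = best
        if new_key == key:  # no more changes: stop revising
            break
        key = new_key
    return key
```

Like the method on the slide, this sketch returns a single key per item and has nothing to work with when no examinee selects the correct response.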
  • 22. Summary of Validation Study • Accuracy improves with item quality • Accuracy affected by sample size & test length
      Exam | N      | Forms | Form Length  | Items | Non-scored Items | Accuracy | Observations
      A    | 2,966  | 2     | 180          | 307   | 0                | 99.2%    |
      B    | 337    | 2     | 107          | 214   | 0                | 85.5%    |
      C    | 337    | 1     | 230          | 230   | 0                | 90.9%    |
      D    | 1,815  | 1     | 204          | 204   | 7                | 92.1%    | Some association with "deleted" items
      E    | 1,408  | 1     | 199          | 199   | 1                | 96.0%    |
      F    | 46,356 | 2     | 240          | 480   | 0                | 96.0%    |
      G    | 44,104 | 2     | 120          | 240   | 0                | 95.8%    |
      H    | 25,448 | 2     | 60           | 120   | 0                | 93.3%    |
      I    | 121    | 3     | 165          | 417   | 43               | 81.0%    | Strong association with "field test" items
      J    | 1,071  | 8     | 52 & 61      | 391   | 0                | 80.5%    | 85.2% (English-only)
      K    | 2,033  | 8     | 68, 76 & 77  | 510   | 0                | 85.9%    |
      L    | 6,473  | 21    | 250          | 1,050 | 850              | 85.7%    | All errors except one were on non-scored items.
  • 23. Reason for Answer Key Estimation • If a group of test takers has stolen the test and worked out their own answer key, it is likely some answers will be wrong. • Answer key estimation can find the errors committed by test thieves.
  • 25. Example Item: Ambiguous Key Which is a property of all X? A. They contain Y. B. They have property Z. C. * They do not contain Y. D. They have property W. Looking at the item text, we see that the ambiguity is likely caused by rival options A and C. SME feedback suggests the item is too text-specific.
  • 26. Example Item: Ambiguous Key Which is a component of X? A. * Real anticipated expense B. Time spent C. Liquid assets D. Quality In this case, students of high ability were often selecting C instead of A. SME feedback suggests the deleted word may have been turning students off to that option.
  • 27. Example Item: Weak Key Select 3 possible causes of X A. *Obesity B. Contaminated drinking water C. *Unhealthy diet D. *Genetic factors E. Lack of exercise High performing students were picking C and D correctly, but were as likely to pick E as they were to pick A. SME feedback suggested that E may be a reasonable answer to the question. The revision involved making A, C, and E all incorrect answers so that D would remain the sole answer.
  • 28. Example Item: Potential Mis-key Which is a sound accounting principle? A. X B. Not X C. *Y D. Z Nearly all students selected distractor B (Not X). This item was not mis-keyed. It seems most likely that this concept was not covered sufficiently in the text and/or other learning resources—leaving students to use guessing strategies rather than content knowledge.
  • 29. Barbara Foster Psychometrician The American Board of Obstetrics and Gynecology
  • 30. The American Board of Obstetrics and Gynecology 2013 Certifying Exam • 180 scored items • Five sets of 40 field test items
  • 31. The American Board of Obstetrics and Gynecology • Potential mis-keys from Caveon – 8 identified among the scored items (4%) – 22 identified among the field test items (11%) The lower proportion in the scored items is not surprising since those items have been field tested and some may have been previously used.
  • 32. The American Board of Obstetrics and Gynecology • Result of the SME review of the flagged scored items: – 4 of the 8 (50%) were found to have problems. These problems were a combination of ambiguous wording, new information published just prior to the exam, recent changes in guidelines, or just a very difficult item. These items were deleted from the exam prior to scoring.
  • 33. The American Board of Obstetrics and Gynecology • Result of the SME review of the flagged field test items: – 15 of the 22 (68%) were found to have problems. These problems were mostly a combination of ambiguous wording, responses too closely related, and changes in the field.
  • 34. The American Board of Obstetrics and Gynecology • Our Standard Methods: 27 field test items flagged (13.5%) • The z* Method: 22 field test items flagged (11.0%) • 8 (4%) items flagged by both
  • 35. The American Board of Obstetrics and Gynecology • Our Standard Methods: 27 field test items flagged (13.5%), 13 had problems • The z* Method: 22 field test items flagged (11.0%), 15 had problems • Flagged by both: 8 (4%), 5 items had problems
  • 36. The American Board of Obstetrics and Gynecology • Conclusion: This new method appears to detect differences that are not being detected by our current methods. These differences do not appear to be strictly keying errors but involve other important problem areas as well.
  • 37. Conclusions • Item analysis helps ensure – Unidimensionality – Desired item performance • Key Strength Analysis enhances classical item analysis – Uses information from all items – Compares answer choices for same item • Can detect structural flaws in items • Can suggest the actual key when the item is mis-keyed – Suggests possible reasons for poor performance • Future research – Investigate thresholds for Key Strength Analysis – Simulate item problems to measure ability to detect – Evaluate performance when assumptions fail
  • 38. Questions? Please type questions for our presenters in the GoToWebinar control panel on your screen.
  • 39. HANDBOOK OF TEST SECURITY • Editors - James Wollack & John Fremer • Published March 2013 • Preventing, Detecting, and Investigating Cheating • Testing in Many Domains – Certification/Licensure – Clinical – Educational – Industrial/Organizational • Don’t forget to order your copy at www.routledge.com – http://bit.ly/HandbookTS (Case Sensitive) – Save 20% - Enter discount code: HYJ82
  • 40. THANK YOU! - Follow Caveon on Twitter @caveon - Check out our blog: www.caveon.com/blog - LinkedIn Group – "Caveon Test Security" • Dennis Maynes, Chief Scientist, Caveon Test Security • Marcus Scott, Data Forensics Scientist, Caveon Test Security • Dan Allen, Psychometrician, Western Governors University • Barbara Foster, Psychometrician, American Board of Obstetrics and Gynecology