Upcoming Caveon Events
• Caveon Webinar Series: Next session, October 16
The Good and Bad of Online Proctoring, Part 2
• EATP – September 25-27 in St. Julian’s, Malta.
– Caveon’s John Fremer and Steve Addicott presenting:
What are we Accountable For? Security Standards and Resources for High
Stakes Testing Programs
– Steve Addicott hosting an ignite session: Leveraging Social Media to Connect with
International Test Candidates
• The 2nd Annual Statistical Detection of Potential Test Fraud Conference
– October 17-19, 2013, Madison, Wisconsin
– Caveon’s Dennis Maynes and Cindy Butler will be presenting three sessions
• Handbook of Test Security – Now Available. We will share a discount code at the
end of this session.
Caveon Online
• Caveon Security Insights Blog
– http://www.caveon.com/blog/
• twitter
– Follow @Caveon
• LinkedIn
– Caveon Company Page
– "Caveon Test Security" Group
• Please contribute!
• Facebook
– Will you be our "friend?"
– "Like" us!
www.caveon.com
Improving Testing with Key Strength Analysis
Dennis Maynes Dan Allen
Chief Scientist Psychometrician
Caveon Test Security Western Governors University
Marcus Scott Barbara Foster
Data Forensics Scientist Psychometrician
Caveon Test Security American Board of Obstetrics
and Gynecology
September 18, 2013
Caveon Webinar Series:
Agenda for Today
• Review classical item analysis
• Introduce Key Strength Analysis
• Derive Key Strength Analysis
• Observations by Dan Allen and Barbara Foster
• Conclusions and Q&A
Review Classical Item Analysis
• Statistics
– P-value
– Point-biserial correlation
• Typical rules
– Low p-values (hard items)
– High p-values (easy items)
– Low point-biserial correlations (low discriminations)
• Easy to understand and implement
• Good at flagging poor items
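The classical statistics above can be computed in a few lines of NumPy. This is a generic sketch, not the presenters' code; the function name and the data layout (one row per examinee, one column per item) are my own assumptions:

```python
import numpy as np

def classical_item_stats(responses, key):
    """Classical item analysis: p-value and point-biserial per item.

    responses: (n_examinees, n_items) array of chosen options ('A'..'D')
    key: length n_items sequence of correct options
    """
    scores = (responses == np.asarray(key)).astype(float)  # 0/1 item scores
    totals = scores.sum(axis=1)                            # raw test scores
    p_values = scores.mean(axis=0)                         # proportion correct
    pbis = np.empty(scores.shape[1])
    for j in range(scores.shape[1]):
        # correlation between the item score and the total test score
        pbis[j] = np.corrcoef(scores[:, j], totals)[0, 1]
    return p_values, pbis
```

Applying the typical rules then amounts to flagging items whose p-value is very low or very high, or whose point-biserial is low.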
Introduce Key Strength Analysis
• Why Key Strength Analysis?
– Model uses information from all items
– Answer choices for same item are compared
– Provides possible reasons for poor performance
• High performing test takers (knowledgeable students)
– Typically report problems with the answer key
– Usually choose the correct answer
• Most frequently selected choice
– Is usually correct for easy items
– Is not necessarily correct for hard items
Capabilities of Key Strength Analysis
• Built upon classical item analysis
– Point-biserial correlations discriminate between high and low
performers
– P-values detect hard/easy items
• Typical problems with items
– Mis-keyed items
– Weakly keyed items
– Ambiguously keyed items
• Use probabilities to make inferences about item
performance
Modify Point-Biserial Correlation
1. Exclude the item score from the test score
• Places all answer choices on "the same playing field"
• Allows correct and incorrect answers to be compared in a "what if" fashion
2. Compute point-biserial correlations
• For correct answer and
• For distractors
3. Scale the point-biserial appropriately
• We call this statistic z*
• Use z* to compute the probability of each choice (A, B, etc.) being
the key; this is the "key strength"
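Steps 1 and 2 can be sketched as follows (the slides do not reproduce the step 3 scaling to z*, so this sketch stops at the raw correlations; the function name is illustrative, not the presenters' code):

```python
import numpy as np

def choice_point_biserials(responses, key, item):
    """Point-biserial for every answer choice of one item, computed against
    the total score with that item excluded ("what if this choice were the key")."""
    # Step 1: score the test, then remove the target item from each total
    # so every choice of that item is compared on the same playing field.
    scored = (responses == np.asarray(key)).astype(float)
    rest_total = scored.sum(axis=1) - scored[:, item]
    # Step 2: point-biserial for each choice, treating it as if it were the key.
    stats = {}
    for choice in np.unique(responses[:, item]):
        indicator = (responses[:, item] == choice).astype(float)
        if indicator.std() == 0 or rest_total.std() == 0:
            stats[choice] = float('nan')  # choice never/always chosen: undefined
            continue
        stats[choice] = float(np.corrcoef(indicator, rest_total)[0, 1])
    return stats
```

For a well-behaved item, the keyed choice should correlate positively with the rest-of-test score and the distractors negatively.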
Derive Key Strength Analysis
After Some Algebra
Why z* Depends on all the Right Quantities
Z* for all Items and Responses
[Figure: distribution of z* for Right and Wrong responses, z* ranging from -10 to 10; 154 examinees, 100 items]
Calculating p(choice is a key | data)
Approximation Theory
• Central Limit Theorem: z* is approximately normal
• The probability function should be monotonically increasing, which
requires the equal-variance assumption
[Figure: z* distributions for Right and Wrong responses with fitted normal curves, z* from -10 to 10]
P(choice is a key | z*)
[Figure: p(choice is a key | z*) plotted as an increasing curve from 0 to 1 over z* from -4 to 5]
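The slides state only the modeling assumptions: z* is approximately normal for keyed and for non-keyed choices, with a common variance so the probability function is monotonically increasing. Under those assumptions, one hedged way to compute p(choice is a key | z*) is Bayes' rule on the two equal-variance normal likelihoods; the means, sigma, and prior below are illustrative placeholders, not the presenters' fitted values:

```python
import math

def p_key_given_z(z, mu_right, mu_wrong, sigma, prior_right=0.25):
    """Posterior probability that a choice is the key, given its z*.

    Models z* as Normal(mu_right, sigma) for keys and Normal(mu_wrong, sigma)
    for non-keys. With equal variances the result is monotonically
    increasing in z*, as the slides require.
    """
    # log-likelihood ratio of the two equal-variance normals
    llr = ((z - mu_wrong) ** 2 - (z - mu_right) ** 2) / (2 * sigma ** 2)
    odds = (prior_right / (1 - prior_right)) * math.exp(llr)
    return odds / (1 + odds)
```

The default prior of 0.25 reflects one choice in four being the key on a four-option item; a program would estimate all of these parameters from its own data.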
Analysis of Distractors
• Compute key strength (KS) for all responses
• Low KS – probability less than 50%
• High KS – probability 50% or more
Answer \ Distractors   Low KS          High KS
Low KS                 Weakly keyed    Potential mis-key
High KS                Normal          Ambiguously keyed
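The 2x2 rule above is mechanical enough to sketch directly; the function name is illustrative, and the 50% threshold is the one given on the slide:

```python
def classify_item(answer_ks, distractor_ks, threshold=0.5):
    """Classify an item from key-strength probabilities.

    answer_ks: key strength of the keyed answer
    distractor_ks: iterable of key strengths for the distractors
    """
    answer_high = answer_ks >= threshold
    any_distractor_high = any(p >= threshold for p in distractor_ks)
    if answer_high:
        # keyed answer looks like a key; trouble only if a distractor does too
        return "ambiguously keyed" if any_distractor_high else "normal"
    # keyed answer does not look like a key
    return "potential mis-key" if any_distractor_high else "weakly keyed"
```

Running this on the four examples that follow reproduces their labels: a strong answer with weak distractors is normal; a weak answer next to a strong distractor is a potential mis-key.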
Example I – Good Key
[Figure: choices A-D plotted on the p(choice is a key | z*) curve; the answer key arrow is colored gold]

Response    z*      Probability
A            3.25   0.99
B            0.25   0.06
C           -2.75   0
D           -2.4    0
Example II – Potential Mis-key
[Figure: choices A-D plotted on the p(choice is a key | z*) curve; the answer key arrow is colored gold]

Response    z*      Probability
A            3.25   0.99
B            0.25   0.06
C           -2.75   0
D           -2.4    0
Example III – Weak Key
[Figure: choices A-D plotted on the p(choice is a key | z*) curve; the answer key arrow is colored gold]

Response    z*      Probability
A            1.0    0.32
B            0.25   0.06
C           -3      0
D           -2.5    0
Example IV – Ambiguous Key
[Figure: choices A-D plotted on the p(choice is a key | z*) curve; the answer key arrow is colored gold]

Response    z*      Probability
A            3.75   0.99
B            2.25   0.9
C           -3      0
D           -2.5    0
Validation – Answer Key Estimation
• Assume the key is not known
• Check accuracy of estimated answer key
• Algorithm:
– Start with most frequent response as initial guess
– Revise key using probabilities until no more changes
• For 12 different exams
– Key estimation accuracy varied from 81% to 99%
– Cannot infer multiple keys
– Cannot guess key when there are no correct responses
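The two-step algorithm above (modal response as the initial guess, then revise until no more changes) can be sketched as follows. Because the slides do not reproduce the key-strength formula, this sketch ranks choices by the excluded-item point-biserial as a stand-in for the probability; it illustrates the iteration, not the presenters' implementation:

```python
import numpy as np

def estimate_key(responses, max_iter=20):
    """Estimate an unknown answer key from response data.

    Start from the most frequent response per item, then repeatedly re-key
    each item to the choice most associated with the rest-of-test score,
    until the key stops changing.
    """
    n, m = responses.shape
    # initial guess: modal response per item
    key = np.array([max(set(responses[:, j]), key=list(responses[:, j]).count)
                    for j in range(m)])
    for _ in range(max_iter):
        scored = (responses == key).astype(float)
        totals = scored.sum(axis=1)
        new_key = key.copy()
        for j in range(m):
            rest = totals - scored[:, j]  # exclude the item itself
            best, best_r = key[j], -np.inf
            for choice in np.unique(responses[:, j]):
                ind = (responses[:, j] == choice).astype(float)
                if ind.std() == 0 or rest.std() == 0:
                    continue
                r = np.corrcoef(ind, rest)[0, 1]
                if r > best_r:
                    best, best_r = choice, r
            new_key[j] = best
        if np.array_equal(new_key, key):
            break
        key = new_key
    return key
```

As the slide notes, this cannot recover multiple keys for one item, and it has nothing to work with when no one answers an item correctly.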
Summary of Validation Study
• Accuracy improves with item quality
• Accuracy affected by sample size & test length
Exam   N        Forms   Form Length   Items   Non-scored   Accuracy   Observations
A      2,966    2       180           307     0            99.2%
B      337      2       107           214     0            85.5%
C      337      1       230           230     0            90.9%
D      1,815    1       204           204     7            92.1%      Some association with "deleted" items
E      1,408    1       199           199     1            96.0%
F      46,356   2       240           480     0            96.0%
G      44,104   2       120           240     0            95.8%
H      25,448   2       60            120     0            93.3%
I      121      3       165           417     43           81.0%      Strong association with "field test" items
J      1,071    8       52 & 61       391     0            80.5%      85.2% (English-only)
K      2,033    8       68, 76 & 77   510     0            85.9%
L      6,473    21      250           1050    850          85.7%      All errors except one were on non-scored items
Reason for Answer Key Estimation
• If a group of test takers has stolen the test and worked
out their own answer key, it is likely some answers will
be wrong.
• Answer key estimation can find the errors committed by
test thieves.
Dan Allen
Psychometrician
Western Governors University
Example Item: Ambiguous Key
Which is a property of all X?
A. They contain Y.
B. They have property Z.
C. * They do not contain Y.
D. They have property W.
Looking at the item text, the ambiguity is likely caused by rival
options A and C. SME feedback suggests the item is too text-specific.
Example Item: Ambiguous Key
Which is a component of X?
A. * Real anticipated expense
B. Time spent
C. Liquid assets
D. Quality
In this case, students of high ability were often
selecting C instead of A. SME feedback suggests the
deleted word may have been turning students off to
that option.
Example Item: Weak Key
Select 3 possible causes of X
A. *Obesity
B. Contaminated drinking water
C. *Unhealthy diet
D. *Genetic factors
E. Lack of exercise
High-performing students were picking C and D correctly, but were as
likely to pick E as to pick A. SME feedback
suggested that E may be a reasonable answer to the question.
The revision involved making A, C, and E all incorrect answers
so that D would remain the sole answer.
Example Item: Potential Mis-key
Which is a sound accounting principle?
A. X
B. Not X
C. *Y
D. Z
Nearly all students selected distractor B (Not X). This
item was not mis-keyed. It seems most likely that this
concept was not covered sufficiently in the text and/or
other learning resources—leaving students to use
guessing strategies rather than content knowledge.
Barbara Foster
Psychometrician
The American Board of Obstetrics
and Gynecology
The American Board of
Obstetrics and Gynecology
2013 Certifying Exam
• 180 scored items
• Five sets of 40 field test items
• Potential mis-keys from Caveon
– 8 identified among the scored items (4%)
– 22 identified among the field test items (11%)
The lower proportion in the scored items is not
surprising since those items have been field
tested and some may have been previously
used.
The American Board of Obstetrics and Gynecology
• Result of the SME review of the flagged scored
items:
– 4 of the 8 (50%) were found to have problems.
These problems were a combination of ambiguous
wording, new information published just prior to
the exam, recent changes in guidelines, or just a
very difficult item. These items were deleted from
the exam prior to scoring.
The American Board of Obstetrics and Gynecology
• Result of the SME review of the flagged field
test items:
– 15 of the 22 (68%) were found to have problems.
These problems were mostly a combination of
ambiguous wording, responses too closely related,
and changes in the field.
The American Board of Obstetrics and Gynecology
Our Standard Methods: 27 field test items flagged (13.5%)
The z* Method: 22 field test items flagged (11.0%)
Flagged by both methods: 8 items (4%)
The American Board of Obstetrics and Gynecology
Our Standard Methods: 27 field test items flagged (13.5%); 13 had problems
The z* Method: 22 field test items flagged (11.0%); 15 had problems
Flagged by both methods: 8 items (4%); 5 had problems
The American Board of Obstetrics and Gynecology
• Conclusion
This new method indicates that it is detecting
differences that are not being detected by our
current methods. These differences do not
appear to be strictly keying errors but involve
other important problem areas as well.
Conclusions
• Item analysis helps ensure
– Unidimensionality
– Desired item performance
• Key Strength Analysis enhances classical item analysis
– Uses information from all items
– Compares answer choices for same item
• Can detect structural flaws in items
• Can suggest the actual key when the item is mis-keyed
– Suggests possible reasons for poor performance
• Future research
– Investigate thresholds for Key Strength Analysis
– Simulate item problems to measure ability to detect
– Evaluate performance when assumptions fail
Questions?
Please type questions for our presenters in the
GoToWebinar control panel on your screen.
HANDBOOK OF TEST SECURITY
• Editors - James Wollack & John Fremer
• Published March 2013
• Preventing, Detecting, and Investigating Cheating
• Testing in Many Domains
– Certification/Licensure
– Clinical
– Educational
– Industrial/Organizational
• Don’t forget to order your copy at www.routledge.com
– http://bit.ly/HandbookTS (Case Sensitive)
– Save 20% - Enter discount code: HYJ82
THANK YOU!
- Follow Caveon on twitter @caveon
- Check out our blog…www.caveon.com/blog
- LinkedIn Group – "Caveon Test Security"
Dennis Maynes Dan Allen
Chief Scientist Psychometrician
Caveon Test Security Western Governors University
Marcus Scott Barbara Foster
Data Forensics Scientist Psychometrician
Caveon Test Security American Board of Obstetrics
and Gynecology

More Related Content

Similar to Caveon Webinar Series: Improving Testing with Key Strength Analysis

Lesson 21 designing the questionaire and establishing validity and reliabilty
Lesson 21 designing the questionaire and establishing validity and reliabiltyLesson 21 designing the questionaire and establishing validity and reliabilty
Lesson 21 designing the questionaire and establishing validity and reliabilty
mjlobetos
 
Fdu item analysis (1).ppt revised by dd
Fdu item analysis (1).ppt revised by ddFdu item analysis (1).ppt revised by dd
Fdu item analysis (1).ppt revised by dd
dettmore
 
Item and Distracter Analysis
Item and Distracter AnalysisItem and Distracter Analysis
Item and Distracter Analysis
Sue Quirante
 
Harmon, Uncertainty analysis: An evaluation metric for synthesis science
Harmon, Uncertainty analysis: An evaluation metric for synthesis scienceHarmon, Uncertainty analysis: An evaluation metric for synthesis science
Harmon, Uncertainty analysis: An evaluation metric for synthesis science
questRCN
 

Similar to Caveon Webinar Series: Improving Testing with Key Strength Analysis (20)

Surveys that work: training course for Rosenfeld Media, day 3
Surveys that work: training course for Rosenfeld Media, day 3 Surveys that work: training course for Rosenfeld Media, day 3
Surveys that work: training course for Rosenfeld Media, day 3
 
Psychometrics 101: Know What Your Assessment Data is Telling You
Psychometrics 101: Know What Your Assessment Data is Telling YouPsychometrics 101: Know What Your Assessment Data is Telling You
Psychometrics 101: Know What Your Assessment Data is Telling You
 
Lesson 21 designing the questionaire and establishing validity and reliabilty
Lesson 21 designing the questionaire and establishing validity and reliabiltyLesson 21 designing the questionaire and establishing validity and reliabilty
Lesson 21 designing the questionaire and establishing validity and reliabilty
 
ch.9 (1).ppt
ch.9 (1).pptch.9 (1).ppt
ch.9 (1).ppt
 
Item analysis with spss software
Item analysis with spss softwareItem analysis with spss software
Item analysis with spss software
 
I wish I could believe you: the frustrating unreliability of some assessment ...
I wish I could believe you: the frustrating unreliability of some assessment ...I wish I could believe you: the frustrating unreliability of some assessment ...
I wish I could believe you: the frustrating unreliability of some assessment ...
 
Questionnaire development
Questionnaire developmentQuestionnaire development
Questionnaire development
 
Teaching technology2
Teaching technology2Teaching technology2
Teaching technology2
 
Fdu item analysis (1).ppt revised by dd
Fdu item analysis (1).ppt revised by ddFdu item analysis (1).ppt revised by dd
Fdu item analysis (1).ppt revised by dd
 
Item and Distracter Analysis
Item and Distracter AnalysisItem and Distracter Analysis
Item and Distracter Analysis
 
Harmon, Uncertainty analysis: An evaluation metric for synthesis science
Harmon, Uncertainty analysis: An evaluation metric for synthesis scienceHarmon, Uncertainty analysis: An evaluation metric for synthesis science
Harmon, Uncertainty analysis: An evaluation metric for synthesis science
 
Unit 2 MARKETING RESEARCH
Unit 2 MARKETING RESEARCHUnit 2 MARKETING RESEARCH
Unit 2 MARKETING RESEARCH
 
Administering, analyzing, and improving the test or assessment
Administering, analyzing, and improving the test or assessmentAdministering, analyzing, and improving the test or assessment
Administering, analyzing, and improving the test or assessment
 
Cognitive, personality and behavioural predictors of academic success in a la...
Cognitive, personality and behavioural predictors of academic success in a la...Cognitive, personality and behavioural predictors of academic success in a la...
Cognitive, personality and behavioural predictors of academic success in a la...
 
Chapter 6: Writing Objective Test Items
Chapter 6: Writing Objective Test ItemsChapter 6: Writing Objective Test Items
Chapter 6: Writing Objective Test Items
 
Collection of data
Collection of dataCollection of data
Collection of data
 
AOL-CHAPTER-3.pptx
AOL-CHAPTER-3.pptxAOL-CHAPTER-3.pptx
AOL-CHAPTER-3.pptx
 
Test construction tony coloma
Test construction tony colomaTest construction tony coloma
Test construction tony coloma
 
Analysis of item test
Analysis of item testAnalysis of item test
Analysis of item test
 
Analysis of item test
Analysis of item testAnalysis of item test
Analysis of item test
 

More from Caveon Test Security

Caveon webinar series - smart items- using innovative item design to make you...
Caveon webinar series - smart items- using innovative item design to make you...Caveon webinar series - smart items- using innovative item design to make you...
Caveon webinar series - smart items- using innovative item design to make you...
Caveon Test Security
 
Caveon webinar series - smart items- using innovative item design to make you...
Caveon webinar series - smart items- using innovative item design to make you...Caveon webinar series - smart items- using innovative item design to make you...
Caveon webinar series - smart items- using innovative item design to make you...
Caveon Test Security
 

More from Caveon Test Security (20)

Unpublished study indicates high chance of fraud in thousands of tests of enem
Unpublished study indicates high chance of fraud in thousands of tests of enemUnpublished study indicates high chance of fraud in thousands of tests of enem
Unpublished study indicates high chance of fraud in thousands of tests of enem
 
Caveon webinar series - smart items- using innovative item design to make you...
Caveon webinar series - smart items- using innovative item design to make you...Caveon webinar series - smart items- using innovative item design to make you...
Caveon webinar series - smart items- using innovative item design to make you...
 
Caveon webinar series - smart items- using innovative item design to make you...
Caveon webinar series - smart items- using innovative item design to make you...Caveon webinar series - smart items- using innovative item design to make you...
Caveon webinar series - smart items- using innovative item design to make you...
 
Caveon Webinar Series - A Guide to Online Protection Strategies - March 28, ...
Caveon Webinar Series -  A Guide to Online Protection Strategies - March 28, ...Caveon Webinar Series -  A Guide to Online Protection Strategies - March 28, ...
Caveon Webinar Series - A Guide to Online Protection Strategies - March 28, ...
 
Caveon Webinar Series - Five Things You Can Do Now to Protect Your Assessment...
Caveon Webinar Series - Five Things You Can Do Now to Protect Your Assessment...Caveon Webinar Series - Five Things You Can Do Now to Protect Your Assessment...
Caveon Webinar Series - Five Things You Can Do Now to Protect Your Assessment...
 
The Do's and Dont's of Administering High Stakes Tests in Schools Final 121217
The Do's and Dont's of Administering High Stakes Tests in Schools Final 121217The Do's and Dont's of Administering High Stakes Tests in Schools Final 121217
The Do's and Dont's of Administering High Stakes Tests in Schools Final 121217
 
Caveon Webinar Series - The Art of Test Security - Know Thy Enemy - November ...
Caveon Webinar Series - The Art of Test Security - Know Thy Enemy - November ...Caveon Webinar Series - The Art of Test Security - Know Thy Enemy - November ...
Caveon Webinar Series - The Art of Test Security - Know Thy Enemy - November ...
 
Caveon Webinar Series - Four Steps to Effective Investigations in School Dis...
Caveon Webinar Series -  Four Steps to Effective Investigations in School Dis...Caveon Webinar Series -  Four Steps to Effective Investigations in School Dis...
Caveon Webinar Series - Four Steps to Effective Investigations in School Dis...
 
Caveon Webinar Series - On-site Monitoring in Districts 0317
Caveon Webinar Series - On-site Monitoring in Districts 0317Caveon Webinar Series - On-site Monitoring in Districts 0317
Caveon Webinar Series - On-site Monitoring in Districts 0317
 
CESP Study Session #1 October 2016
CESP Study Session #1 October 2016CESP Study Session #1 October 2016
CESP Study Session #1 October 2016
 
A Tale of Two Cities - School District Webinar #1 Jan 2017
A Tale of Two Cities - School District Webinar  #1 Jan 2017A Tale of Two Cities - School District Webinar  #1 Jan 2017
A Tale of Two Cities - School District Webinar #1 Jan 2017
 
Caveon Webinar Series - Discrete Option Multiple Choice: A Revolution in Te...
Caveon Webinar Series  - Discrete Option Multiple Choice:  A Revolution in Te...Caveon Webinar Series  - Discrete Option Multiple Choice:  A Revolution in Te...
Caveon Webinar Series - Discrete Option Multiple Choice: A Revolution in Te...
 
Caveon Webinar Series - Test Cheaters Say the Darnedest Things! - 072016
Caveon Webinar Series -  Test Cheaters Say the Darnedest Things! - 072016Caveon Webinar Series -  Test Cheaters Say the Darnedest Things! - 072016
Caveon Webinar Series - Test Cheaters Say the Darnedest Things! - 072016
 
Caveon Webinar Series - The Test Security Framework- Why Different Tests Nee...
Caveon Webinar Series -  The Test Security Framework- Why Different Tests Nee...Caveon Webinar Series -  The Test Security Framework- Why Different Tests Nee...
Caveon Webinar Series - The Test Security Framework- Why Different Tests Nee...
 
Caveon Webinar Series - Conducting Test Security Investigations in School Di...
Caveon Webinar Series -  Conducting Test Security Investigations in School Di...Caveon Webinar Series -  Conducting Test Security Investigations in School Di...
Caveon Webinar Series - Conducting Test Security Investigations in School Di...
 
Caveon Webinar Series - Creating Your Test Security Game Plan - March 2016
Caveon Webinar Series -  Creating Your Test Security Game Plan - March 2016Caveon Webinar Series -  Creating Your Test Security Game Plan - March 2016
Caveon Webinar Series - Creating Your Test Security Game Plan - March 2016
 
Caveon Webinar Series - Mastering the US DOE Test Security Requirements Janua...
Caveon Webinar Series - Mastering the US DOE Test Security Requirements Janua...Caveon Webinar Series - Mastering the US DOE Test Security Requirements Janua...
Caveon Webinar Series - Mastering the US DOE Test Security Requirements Janua...
 
Caveon Webinar Series - Will the Real Cloned Item Please Stand Up? final
Caveon Webinar Series -  Will the Real Cloned Item Please Stand Up? finalCaveon Webinar Series -  Will the Real Cloned Item Please Stand Up? final
Caveon Webinar Series - Will the Real Cloned Item Please Stand Up? final
 
Caveon Webinar Series - Lessons Learned at the 2015 National Conference on S...
Caveon Webinar Series -  Lessons Learned at the 2015 National Conference on S...Caveon Webinar Series -  Lessons Learned at the 2015 National Conference on S...
Caveon Webinar Series - Lessons Learned at the 2015 National Conference on S...
 
Caveon Webinar Series - Learning and Teaching Best Practices in Test Security...
Caveon Webinar Series - Learning and Teaching Best Practices in Test Security...Caveon Webinar Series - Learning and Teaching Best Practices in Test Security...
Caveon Webinar Series - Learning and Teaching Best Practices in Test Security...
 

Recently uploaded

Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
KarakKing
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
AnaAcapella
 

Recently uploaded (20)

Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 

Caveon Webinar Series: Improving Testing with Key Strength Analysis

  • 1. Upcoming Caveon Events • Caveon Webinar Series: Next session, October 16 The Good and Bad of Online Proctoring, Part 2 • EATP – September 25-27 in St. Julian’s, Malta. – Caveon’s John Fremer and Steve Addicott presenting: What are we Accountable For? Security Standards and Resources for High Stakes Testing Programs – Steve Addicott hosting an ignite session: Leveraging Social Media to Connect with International Test Candidates • The 2nd Annual Statistical Detection of Potential Test Fraud Conference – October 17-19, 2013, Madison, Wisconsin – Caveon’s Dennis Maynes and Cindy Butler will be presenting three sessions • Handbook of Test Security – Now Available. We will share a discount code at the end of this session.
  • 2. Caveon Online • Caveon Security Insights Blog – http://www.caveon.com/blog/ • twitter – Follow @Caveon • LinkedIn – Caveon Company Page – ―Caveon Test Security‖ Group • Please contribute! • Facebook – Will you be our ―friend?‖ – ―Like‖ us! www.caveon.com
  • 3. Improving Testing with Key Strength Analysis Dennis Maynes Dan Allen Chief Scientist Psychometrician Caveon Test Security Western Governors University Marcus Scott Barbara Foster Data Forensics Scientist Psychometrician Caveon Test Security American Board of Obstetrics and Gynecology September 18, 2013 Caveon Webinar Series:
  • 4. Agenda for Today • Review classical item analysis • Introduce Key Strength Analysis • Derive Key Strength Analysis • Observations by Dan Allen and Barbara Foster • Conclusions and Q&A
  • 5. Review Classical Item Analysis • Statistics – P-value – Point-biserial correlation • Typical rules – Low p-values (hard items) – High p-values (easy items) – Low point-biserial correlations (low discriminations) • Easy to understand and implement • Good at flagging poor items
  • 6. Introduce Key Strength Analysis • Why Key Strength Analysis? – Model uses information from all items – Answer choices for same item are compared – Provides possible reasons for poor performance • High performing test takers (knowledgeable students) – Typically report problems with the answer key – Usually choose the correct answer • Most frequently selected choice – Is usually correct for easy items – Is not necessarily correct for hard items
  • 7. Capabilities of Key Strength Analysis • Built upon classical item analysis – Point-biserial correlations discriminate between high and low performers – P-values detect hard/easy items • Typical problems with items – Mis-keyed items – Weakly keyed items – Ambiguously keyed items • Use probabilities to make inferences about item performance
  • 8. Modify Point-Biserial Correlation 1. Exclude the item score from the test score • Places all answer choices on ―the same playing field‖ • Allows correct and incorrect answers to be compared using ―what if‖ 2. Compute point-biserial correlations • For correct answer and • For distractors 3. Scale point-biserial appropriately • We call this statistic, z* • Use z* to compute the probability of the choice (A, B, etc.) being a key--this is the ―key strength‖
  • 11. Why z* Depends on all the Right Quantities
  • 12. Z* for all Items and Responses 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 z* Right Wrong 154 Examinees, 100 Items
  • 13. Calculating p(choice is a key | data)
  • 14. Approximation Theory • Central Limit Theorem  z* is normal. • Probability function should be monotonic increasing, which requires equal variances 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10 z* Right Right Normal Wrong Wrong Normal
  • 15. P(choice is a key | z*) 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 -4 -3.5 -3 -2.5 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 p(choiceisakey|z*) z*
  • 16. Analysis of Distractors • Compute key strength (KS) for all responses • Low KS – probability less than 50% • High KS – probability 50% or more AnswerDistractors Low KS High KS Low KS Weakly keyed Potential mis-key High KS Normal Ambiguously keyed
  • 17. Example I – Good Key 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 -4 -3.5 -3 -2.5 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 p(choiceisakey|z*) z* A C D B Response z* Probability A 3.25 0.99 B 0.25 0.06 C -2.75 0 D -2.4 0 Answer key arrow is colored gold
  • 18. Example II – Potential Mis-key 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 -4 -3.5 -3 -2.5 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 p(choiceisakey|z*) z* A B C D Response z* Probability A 3.25 0.99 B 0.25 0.06 C -2.75 0 D -2.4 0 Answer key arrow is colored gold
• 19. Example III – Weak Key [Figure: choices A–D marked on the p(choice is a key | z*) curve; the answer key arrow is colored gold]

  Response   z*     Probability
  A           1.0   0.32
  B           0.25  0.06
  C          -3     0
  D          -2.5   0
• 20. Example IV – Ambiguous Key [Figure: choices A–D marked on the p(choice is a key | z*) curve; the answer key arrow is colored gold]

  Response   z*     Probability
  A           3.75  0.99
  B           2.25  0.9
  C          -3     0
  D          -2.5   0
  • 21. Validation – Answer Key Estimation • Assume the key is not known • Check accuracy of estimated answer key • Algorithm: – Start with most frequent response as initial guess – Revise key using probabilities until no more changes • For 12 different exams – Key estimation accuracy varied from 81% to 99% – Cannot infer multiple keys – Cannot guess key when there are no correct responses
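The estimation loop described on this slide might look like the following in outline. The `key_strength_fn` callable, which recomputes key-strength probabilities for every item under a candidate key, is an assumed interface; the slides describe only the iteration itself (modal-response start, revise until stable).

```python
def estimate_answer_key(responses, key_strength_fn, max_iters=20):
    """Estimate an exam's answer key without knowing it.

    responses:       list of examinee response vectors (one answer per
                     item)
    key_strength_fn: callable(responses, candidate_key) -> list of
                     dicts, one per item, mapping each choice to its
                     key-strength probability under that candidate key.
                     This interface is assumed for illustration.
    """
    n_items = len(responses[0])
    # Initial guess: the modal (most frequent) response to each item
    key = []
    for i in range(n_items):
        counts = {}
        for resp in responses:
            counts[resp[i]] = counts.get(resp[i], 0) + 1
        key.append(max(counts, key=counts.get))
    # Revise: re-key each item to its most probable choice, repeat
    # until the key stops changing
    for _ in range(max_iters):
        strengths = key_strength_fn(responses, key)
        new_key = [max(ks, key=ks.get) for ks in strengths]
        if new_key == key:
            break
        key = new_key
    return key
```

As the slide notes, a scheme like this cannot recover multiple valid keys and cannot find a key that no examinee ever selected.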
• 22. Summary of Validation Study • Accuracy improves with item quality • Accuracy affected by sample size & test length

  Exam   N       Forms  Form Length   Items  Non-scored Items  Accuracy  Observations
  A       2,966   2     180             307      0             99.2%
  B         337   2     107             214      0             85.5%
  C         337   1     230             230      0             90.9%
  D       1,815   1     204             204      7             92.1%    Some association with "deleted" items
  E       1,408   1     199             199      1             96.0%
  F      46,356   2     240             480      0             96.0%
  G      44,104   2     120             240      0             95.8%
  H      25,448   2     60              120      0             93.3%
  I         121   3     165             417     43             81.0%    Strong association with "field test" items
  J       1,071   8     52 & 61         391      0             80.5%    85.2% (English-only)
  K       2,033   8     68, 76 & 77     510      0             85.9%
  L       6,473  21     250            1050    850             85.7%    All errors except one were on non-scored items
  • 23. Reason for Answer Key Estimation • If a group of test takers has stolen the test and worked out their own answer key, it is likely some answers will be wrong. • Answer key estimation can find the errors committed by test thieves.
  • 25. Example Item: Ambiguous Key Which is a property of all X? A. They contain Y. B. They have property Z. C. * They do not contain Y. D. They have property W. Looking at the item text, we see that this is likely being caused by rival options A and C. SME feedback suggests the item is too text specific.
• 26. Example Item: Ambiguous Key Which is a component of X? A. * Real anticipated expense B. Time spent C. Liquid assets D. Quality In this case, high-ability students often selected C instead of A. SME feedback suggests the deleted word may have been steering students away from that option.
  • 27. Example Item: Weak Key Select 3 possible causes of X A. *Obesity B. Contaminated drinking water C. *Unhealthy diet D. *Genetic factors E. Lack of exercise High performing students were picking C and D correctly, but were as likely to pick E as they were to pick A. SME feedback suggested that E may be a reasonable answer to the question. The revision involved making A, C, and E all incorrect answers so that D would remain the sole answer.
• 28. Example Item: Potential Mis-key Which is a sound accounting principle? A. X B. Not X C. *Y D. Z Nearly all students selected distractor B (Not X). This item was not mis-keyed. It seems most likely that this concept was not covered sufficiently in the text and/or other learning resources, leaving students to rely on guessing strategies rather than content knowledge.
  • 29. Barbara Foster Psychometrician The American Board of Obstetrics and Gynecology
  • 30. The American Board of Obstetrics and Gynecology 2013 Certifying Exam • 180 scored items • Five sets of 40 field test items
  • 31. • Potential mis-keys from Caveon – 8 identified among the scored items (4%) – 22 identified among the field test items (11%) The lower proportion in the scored items is not surprising since those items have been field tested and some may have been previously used. The American Board of Obstetrics and Gynecology
  • 32. • Result of the SME review of the flagged scored items: – 4 of the 8 (50%) were found to have problems. These problems were a combination of ambiguous wording, new information published just prior to the exam, recent changes in guidelines, or just a very difficult item. These items were deleted from the exam prior to scoring. The American Board of Obstetrics and Gynecology
  • 33. • Result of the SME review of the flagged field test items: – 15 of the 22 (68%) were found to have problems. These problems were mostly a combination of ambiguous wording, responses too closely related, and changes in the field. The American Board of Obstetrics and Gynecology
• 34. [Venn diagram comparing methods] • Our standard methods: 27 field test items flagged (13.5%) • The z* method: 22 field test items flagged (11.0%) • Flagged by both: 8 items (4%) The American Board of Obstetrics and Gynecology
• 35. [Venn diagram comparing methods, with problem counts] • Our standard methods: 27 field test items flagged (13.5%), 13 had problems • The z* method: 22 field test items flagged (11.0%), 15 had problems • Flagged by both: 8 items (4%), 5 had problems The American Board of Obstetrics and Gynecology
• 36. • Conclusion The new method is detecting differences that our current methods do not detect. These differences do not appear to be strictly keying errors; they involve other important problem areas as well. The American Board of Obstetrics and Gynecology
  • 37. Conclusions • Item analysis helps ensure – Unidimensionality – Desired item performance • Key Strength Analysis enhances classical item analysis – Uses information from all items – Compares answer choices for same item • Can detect structural flaws in items • Can suggest the actual key when the item is mis-keyed – Suggests possible reasons for poor performance • Future research – Investigate thresholds for Key Strength Analysis – Simulate item problems to measure ability to detect – Evaluate performance when assumptions fail
  • 38. Questions? Please type questions for our presenters in the GoToWebinar control panel on your screen.
  • 39. HANDBOOK OF TEST SECURITY • Editors - James Wollack & John Fremer • Published March 2013 • Preventing, Detecting, and Investigating Cheating • Testing in Many Domains – Certification/Licensure – Clinical – Educational – Industrial/Organizational • Don’t forget to order your copy at www.routledge.com – http://bit.ly/HandbookTS (Case Sensitive) – Save 20% - Enter discount code: HYJ82
• 40. THANK YOU! - Follow Caveon on twitter @caveon - Check out our blog…www.caveon.com/blog - LinkedIn Group – "Caveon Test Security" Dennis Maynes Dan Allen Chief Scientist Psychometrician Caveon Test Security Western Governors University Marcus Scott Barbara Foster Data Forensics Scientist Psychometrician Caveon Test Security American Board of Obstetrics and Gynecology