Presented by Eric Ermie, Executive Director of Sales, ExamSoft Worldwide, Inc.
Keep it? Throw it out? Content/teaching issue? Bad question? Too easy? Too hard? What the heck? More than likely you have asked some or all of these questions at one point or another when trying to understand the performance of questions on an assessment. With differing opinions on how to interpret the statistics provided, how do you know what all this data is trying to tell you? Join us for a webinar on the fundamentals of item analysis, how the data is derived, and the different ways it can be interpreted. This presentation will cover how to put data into a useful context that will allow you to draw your own conclusions about what it means and how you should apply it, and why you should ignore rules that others may use for their specific situation.
Psychometrics 101: Know What Your Assessment Data is Telling You
1. Psychometrics 101:
Know what your assessment
data is telling you
Eric Ermie – Director of Client Solutions, ExamSoft
(Formerly) Program Manager for Assessment and Evaluation,
The Ohio State University College of Medicine.
2. AGENDA
• Overview
• Types of stats
• Interpreting the item analysis report
• Examples
• General statistical guidelines
3. THE
OVERVIEW
Where do I start?
Is this a good or bad question? Can statistics even tell me that?
How can I reconcile what I know about my assessment’s past with what the data is telling me?
Item analysis is not a foolproof answer to these questions.
But…
YOU HAVE TO START SOMEWHERE.
4. TYPES
OF STATS
Common Stats:
• Item Difficulty/p Value – a decimal representation of difficulty: the proportion of students who got the item correct. The lower the decimal, the higher the difficulty.
• Upper 27% – the percentage of the top 27% of performers who got the question correct.
• Lower 27% – the percentage of the bottom 27% of performers who got the question correct.
Common Stats Cont’d:
• Discrimination index – the difference in performance between the Upper 27% and the Lower 27% (Upper 27% minus Lower 27%).
• Point-Biserial – a discrimination statistic that indicates whether doing well on that specific item correlated with doing well on the exam overall. In other words, was that item a good or bad predictor of overall performance on the exam?
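As a rough illustration of how the statistics defined above could be derived from raw results, here is a minimal Python sketch. It is not taken from any particular reporting tool: the function name `item_stats`, the rounding rule for the 27% group size, and the choice to leave the item itself in the total score are all assumptions made for the example.

```python
from math import sqrt

def item_stats(correct, totals, fraction=0.27):
    """Compute common item statistics for one question.

    correct: list of 0/1 flags, one per student (1 = answered correctly)
    totals:  list of total exam scores, in the same student order
    """
    n = len(correct)
    p_value = sum(correct) / n  # item difficulty: proportion correct

    # Rank students by total score, highest first (ties keep input order).
    order = sorted(range(n), key=lambda i: totals[i], reverse=True)
    k = max(1, round(n * fraction))  # group size; rounding rule is an assumption
    upper = sum(correct[i] for i in order[:k]) / k
    lower = sum(correct[i] for i in order[-k:]) / k

    # Point-biserial: Pearson correlation between the 0/1 item score
    # and the total score (the total here still includes this item).
    mean_c, mean_t = p_value, sum(totals) / n
    cov = sum((c - mean_c) * (t - mean_t) for c, t in zip(correct, totals))
    var_c = sum((c - mean_c) ** 2 for c in correct)
    var_t = sum((t - mean_t) ** 2 for t in totals)
    rpb = cov / sqrt(var_c * var_t) if var_c > 0 and var_t > 0 else 0.0

    return {"p_value": p_value, "upper": upper, "lower": lower,
            "disc_index": upper - lower, "point_biserial": rpb}

# Hypothetical data: ten students, six of whom got the item right.
totals = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
correct = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
stats = item_stats(correct, totals)
print(stats)
```

In this made-up example every top scorer answered correctly and every bottom scorer missed, so the discrimination index is at its maximum of 1.0. Real items rarely separate the groups this cleanly.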
6. But as with any statistic, it is important to remember that context matters!
7. ITEM ANALYSIS
EXAMPLES
Item #: 1    Correct Answer: E
Correct Responses: Diff(p) 0.98, Upper 100.00%, Lower 96.15%
Disc. Index: 0.04    Point Biserial: 0.10
Response Frequencies (*Indicates correct answer)
                           A       B       C       D       E
Count                      0       1       0       1    *178
% Selected              0.00    0.55    0.00    0.55   98.34
Point Biserial (rpb)    0.00    0.02    0.00   -0.10    0.10
Disc. Index             0.00    0.00    0.00   -0.02    0.02
Upper 27%               0.00    0.00    0.00    0.00    1.00
Lower 27%               0.00    0.00    0.00    0.02    0.98
8. ITEM ANALYSIS
EXAMPLES
Item #: 7    Correct Answer: D
Correct Responses: Diff(p) 0.66, Upper 82.00%, Lower 46.15%
Disc. Index: 0.36    Point Biserial: 0.28
Response Frequencies (*Indicates correct answer)
                           A       B       C       D       E
Count                      7      17      28    *120       9
% Selected              3.87    9.39   15.47   66.30    4.97
Point Biserial (rpb)   -0.11   -0.19   -0.12    0.28   -0.07
Disc. Index            -0.04   -0.19   -0.09    0.36   -0.04
Upper 27%               0.00    0.00    0.12    0.82    0.06
Lower 27%               0.04    0.19    0.21    0.46    0.10
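To make the derivation of the per-option rows concrete, the short sketch below recomputes two of them for Item 7 from the figures in the report: "% Selected" from the response counts, and the per-option discrimination index as Upper 27% minus Lower 27%. The variable names are illustrative, and it assumes the percentages are taken over the sum of the five option counts (181 responses), which matches the reported values for this item.

```python
# Figures read from the Item 7 report (correct answer: D).
counts = {"A": 7, "B": 17, "C": 28, "D": 120, "E": 9}
upper = {"A": 0.00, "B": 0.00, "C": 0.12, "D": 0.82, "E": 0.06}
lower = {"A": 0.04, "B": 0.19, "C": 0.21, "D": 0.46, "E": 0.10}

total = sum(counts.values())  # assumed denominator for "% Selected"

# Share of all responses that went to each option, as a percentage.
pct_selected = {opt: round(100 * n / total, 2) for opt, n in counts.items()}

# Per-option discrimination: upper-group share minus lower-group share.
disc_index = {opt: round(upper[opt] - lower[opt], 2) for opt in counts}

print(pct_selected)
print(disc_index)
```

A positive value for the keyed answer together with negative values for the distractors, as seen here, is the pattern one would hope for: stronger students are drawn to the correct option and weaker students to the others.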
9. ITEM ANALYSIS
EXAMPLES
Item #: 22    Correct Answer: D
Correct Responses: Diff(p) 0.36, Upper 52.00%, Lower 26.92%
Disc. Index: 0.25    Point Biserial: 0.22
Response Frequencies (*Indicates correct answer)
                           A       B       C       D       E
Count                     35      34      21     *66      25
% Selected             19.34   18.78   11.60   36.46   13.81
Point Biserial (rpb)   -0.09    0.04   -0.20    0.22   -0.06
Disc. Index            -0.15    0.07   -0.15    0.25   -0.02
Upper 27%               0.10    0.24    0.04    0.52    0.10
Lower 27%               0.25    0.17    0.19    0.27    0.12
10. ITEM ANALYSIS
EXAMPLES
Item #: 82    Correct Answer: D
Correct Responses: Diff(p) 0.55, Upper 25.00%, Lower 82.50%
Disc. Index: -0.57    Point Biserial: -0.43
Response Frequencies (*Indicates correct answer)
                           A       B       C       D       E
Count                      7      17      28    *120       9
% Selected              3.87    9.39   37.54   55.00    7.46
Point Biserial (rpb)   -0.11   -0.19   -0.12   -0.43    0.00
Disc. Index            -0.04   -0.19   -0.09   -0.57    0.00
Upper 27%               0.00    0.00    0.75    0.25    0.00
Lower 27%               0.00    0.00    0.17    0.83    0.00
11. ITEM ANALYSIS
EXAMPLES
Item #: 24    Correct Answer: C
Correct Responses: Diff(p) 0.52, Upper 64.00%, Lower 42.31%
Disc. Index: 0.22    Point Biserial: 0.18
Response Frequencies (*Indicates correct answer)
                           A       B       C       D       E
Count                     61      21     *94       5       0
% Selected             33.70   11.60   51.93    2.76    0.00
Point Biserial (rpb)   -0.10   -0.19    0.18    0.12    0.00
Disc. Index            -0.12   -0.13    0.22    0.04    0.00
Upper 27%               0.26    0.04    0.64    0.06    0.00
Lower 27%               0.38    0.17    0.42    0.02    0.00
12. ITEM ANALYSIS
EXAMPLES
Item #: 34    Correct Answer: B
Correct Responses: Diff(p) 0.71, Upper 90.00%, Lower 55.77%
Disc. Index: 0.34    Point Biserial: 0.31
Response Frequencies (*Indicates correct answer)
                           A       B       C       D       E
Count                      0    *129       1      30      21
% Selected              0.00   71.27    0.55   16.57   11.60
Point Biserial (rpb)    0.00    0.31   -0.16   -0.25   -0.11
Disc. Index             0.00    0.34   -0.02   -0.23   -0.09
Upper 27%               0.00    0.90    0.00    0.06    0.04
Lower 27%               0.00    0.56    0.02    0.29    0.13
13. GENERAL
GUIDELINES
Desired statistical ranges (opinions differ, but these are the most commonly used):
• Item Difficulty/p Value – Acceptable item difficulty is not a set number; it depends on the intention of the question. If you intended the item to be a mastery item, you want the difficulty as close to 1.00 as possible. If you wanted a discriminating question, significantly lower values are acceptable.
• Upper 27% – If less than 60% of your top performers get a question correct, further analysis is needed to see whether there are issues with the question. If fewer of your upper 27% than your lower 27% get a question correct, there is also an issue.
• Lower 27% – Generally you never want it to be higher than the upper 27%. As low as 0% can be acceptable, and as high as 100% can be acceptable if it is a mastery question.
14. GENERAL
GUIDELINES
Desired statistical ranges (opinions differ, but these are the most commonly used):
• Discrimination index – Some set specific numbers for acceptable and unacceptable values. I would argue the more accurate guide is that the lower the p value, the higher the discrimination index needs to be.
Generally, at 0.2 or above the item is considered to have discriminated; less than that is considered no discrimination. 0.3 or greater is considered highly discriminating.
• Point-Biserial – As with the discrimination index, some set specific numbers for acceptable and unacceptable values.
Generally, 0.2 and above is considered to show discrimination and a positive association with overall performance on the assessment. Lower levels are acceptable for mastery items, and 0.3+ would be desired for discriminating questions.
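The guideline values above can be turned into a simple screening check. The sketch below is only an illustration: the function name `review_flags` and the `mastery` switch are hypothetical, and the cutoffs (60% for the upper group, 0.2 for discrimination index and point-biserial) are the commonly used values quoted in these guidelines, not fixed rules. The inputs are the item-level figures from the examples earlier in the deck.

```python
def review_flags(upper, lower, rpb, mastery=False):
    """Return a list of reasons an item may deserve a closer look.

    upper, lower: proportion correct in the upper/lower 27% groups (0 to 1)
    rpb:          point-biserial for the keyed answer
    mastery:      True if the item was intended as a mastery item
    """
    flags = []
    disc_index = upper - lower
    if upper < lower:
        flags.append("upper 27% scored below lower 27%")
    if upper < 0.60:
        flags.append("fewer than 60% of top performers correct")
    # Discrimination cutoffs are relaxed for mastery items.
    if not mastery:
        if disc_index < 0.2:
            flags.append("discrimination index below 0.2")
        if rpb < 0.2:
            flags.append("point-biserial below 0.2")
    return flags

# Item-level values taken from the example reports.
item_7 = review_flags(upper=0.82, lower=0.4615, rpb=0.28)
item_82 = review_flags(upper=0.25, lower=0.825, rpb=-0.43)
item_1_mastery = review_flags(upper=1.00, lower=0.9615, rpb=0.10, mastery=True)

print(item_7)          # no flags
print(len(item_82))    # all four checks fire
print(item_1_mastery)  # no flags when treated as a mastery item
```

A flag here is a prompt for human review, in keeping with the point that context matters. Item 1 would be flagged for low discrimination if it were not marked as a mastery item.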
15. GENERAL
GUIDELINES
KR-20
Used as an overall measure of reliability for the assessment.
Typically reported on a scale from 0.0 to 1.0, with 0.0 being very poor and 1.0 being excellent.
Quick notes:
Heavily influenced by the number of questions in the assessment
Heavily influenced by the number of students taking the assessment
The combination can FREQUENTLY lead to false positive and false negative KR-20 values.
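For reference, the standard Kuder-Richardson 20 formula is KR-20 = (k / (k - 1)) × (1 - Σ p·q / variance of total scores), where k is the number of items, p is each item's proportion correct, and q = 1 - p. The sketch below is a minimal implementation under stated assumptions: the function name `kr20` is illustrative, items are scored 0/1, and the population variance is used for the total scores (some tools use the sample variance instead, which changes the result slightly).

```python
from statistics import pvariance

def kr20(responses):
    """KR-20 reliability for a matrix of 0/1 item scores.

    responses: list of rows, one per student, each a list of 0/1 values.
    Returns None when the statistic is undefined.
    """
    n_students = len(responses)
    k = len(responses[0])  # number of items
    totals = [sum(row) for row in responses]
    var_total = pvariance(totals)  # population variance (an assumption)
    if k < 2 or var_total == 0:
        return None

    # Sum of p * q across items, where p is the proportion correct.
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n_students
        pq_sum += p * (1 - p)

    return (k / (k - 1)) * (1 - pq_sum / var_total)

# Hypothetical data: four students, three items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
reliability = kr20(responses)
print(reliability)
```

Because k and the spread of total scores both appear directly in the formula, short exams and small or homogeneous cohorts can move the value substantially, which is why it should be read alongside the item-level statistics.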
16. EXTRANEOUS
FACTORS
Stats alone do not tell the whole story:
• Student behavior
– Cheating
– Return on investment
• Conflicting content/faculty
• “six degrees from Sunday”
Ways to increase the accuracy/usefulness of your stats:
• Item review process
– Format
– Level of difficulty
– Alternative correct options
• Historical item analysis
– Across assessments
– Across versions
• Reuse/Recycle
18. EXAMSOFT
FIT
THE DATA YOU NEED
• Simplified and detailed versions of item analysis reports
• Historical item analysis data by version, assessment, and in aggregate
• Ability to pull item analysis by discipline/question author/category
19. Thank you for attending!
• Check our resource library:
resources.examsoft.com to re-watch
the webinar, download a PDF of the
presentation or access a certificate of
completion.
• Be sure to check out our upcoming
webinars:
• Creating a Secure Testing Environment
for Distance Education Programs
• Learning about the Learners: Using
Analytical Tools to Drive Curricular
Decisions
21. For More Information:
Call: 1.866.429.8889
Email: info@examsoft.com
Visit: learn.examsoft.com