3. TEST DEVELOPMENT PROCESS
1. Review National and Professional Standards
2. Convene National Advisory Committee
3. Develop Domain, Knowledge and Skills Statements
4. Conduct Needs Analysis
5. Construct Table of Specifications
6. Develop Test Design
7. Develop New Test Questions
8. Review Test Questions
9. Assemble Operational Test Forms
10. Produce Printed Test Materials
11. Administer Tests
12. Conduct Item Analysis
13. Standard Setting Study
14. Set Passing Standard
4. WHAT IS ITEM ANALYSIS?
A process that examines student responses to individual
test items in order to assess the quality of the items
and of the test as a whole.
Item analysis is valuable for improving items that will be
used again in later tests and for eliminating
ambiguous or misleading items.
It is also valuable for increasing instructors' skill in
test construction, and for
identifying specific areas of course content
which need greater emphasis or clarity.
5. SEVERAL PURPOSES
1. More diagnostic information on students
• Classroom level:
• determine which questions most students found very
difficult or guessed on • reteach that concept
• determine which questions all students got right
• don't waste more class time on that area
• find which wrong answers students are choosing • identify common misconceptions
• Individual level:
• isolate the specific errors each student made
6. 2. Build future tests, revise test
items to make them better
• recognize how much work goes into writing good
questions
• SHOULD NOT REUSE WHOLE TESTS -->
diagnostic teaching means responding to the
needs of the students, so after a few years a test
bank is built up and you choose tests to suit the
class
• you can spread difficulty levels across your
blueprint (TOS)
7. 3. Part of continuing professional
development
• doing occasional item analysis will help you
become a better test writer
• it documents just how good your evaluation
is
• useful for dealing with parents or
administrators if there's ever a dispute
• once you can bring out these
statistics, parents and
administrators are more likely to accept
why some students failed.
9. TEST-LEVEL STATISTICS
Quality of the Test
• Reliability and Validity
• Reliability: consistency of measurement
• Validity: truthfulness of response
• Overall Test Quality
• Individual Item Quality
10. RELIABILITY
• refers to the extent to which the test is likely to
produce consistent scores.
Characteristics:
1. The intercorrelations among the items -
the greater the number and strength of
positive relationships among items, the greater
the reliability.
2. The length of the test -
a test with more items will have higher
reliability, all other things being equal.
11. 3. The content of the test -
generally, the more diverse the
subject matter tested and the testing
techniques used, the lower the
reliability.
4. Heterogeneous groups of test takers -
the more heterogeneous the group, the greater
the spread of scores and the higher the reliability.
13. • Stability
2. Inter-rater / Observer / Scorer
• applicable mostly to essay questions
• use Cohen's Kappa statistic
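As a rough illustration, Cohen's Kappa can be computed from two raters' marks on the same set of essays. A minimal Python sketch; the rating data below are entirely hypothetical:

# Cohen's Kappa sketch for two raters scoring the same essays.
# The ratings are hypothetical, for illustration only.
from collections import Counter

rater_a = ["pass", "pass", "fail", "pass", "fail", "pass"]
rater_b = ["pass", "fail", "fail", "pass", "fail", "pass"]

n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # observed agreement

# Expected chance agreement, from each rater's marginal proportions.
pa, pb = Counter(rater_a), Counter(rater_b)
expected = sum((pa[c] / n) * (pb[c] / n) for c in set(rater_a) | set(rater_b))

kappa = (observed - expected) / (1 - expected)
print(f"Cohen's Kappa = {kappa:.3f}")  # 1.0 = perfect agreement, 0 = chance level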
14. • Equivalence
3. Parallel Forms / Equivalent Forms
Used to assess the consistency of the results of
two tests constructed in the same way from the
same content domain.
15. • Internal Consistency
• Used to assess the consistency of results across items
within a test.
4. Split-Half
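A minimal Python sketch of the split-half approach, assuming a hypothetical 0/1 score matrix (rows = students, columns = items): the test is split into odd and even items, the two half-scores are correlated, and the Spearman-Brown formula projects the correlation to full test length.

# Split-half reliability sketch; the score matrix is hypothetical.
scores = [
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 1],
    [0, 1, 0, 1, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 1],
]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Score each half: odd-numbered items vs. even-numbered items.
odd_half = [sum(row[0::2]) for row in scores]
even_half = [sum(row[1::2]) for row in scores]

r_half = pearson(odd_half, even_half)
# Spearman-Brown correction projects the half-test r to full length.
r_full = 2 * r_half / (1 + r_half)
print(f"half-test r = {r_half:.3f}, split-half reliability = {r_full:.3f}")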
16. • 5. Kuder-Richardson Formula 20 / 21
The correlation is determined from a
single administration of a test
through a study of score variances.
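KR-20 follows directly from the item and total-score variances: KR-20 = (k / (k - 1)) * (1 - sum(p*q) / variance of total scores). A minimal Python sketch with a hypothetical 0/1 score matrix:

# KR-20 sketch from a single administration; the score matrix is hypothetical.
# Rows = students, columns = items (1 = correct, 0 = incorrect).
scores = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 1, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
]
n, k = len(scores), len(scores[0])

# p = proportion answering each item correctly, q = 1 - p.
p = [sum(row[j] for row in scores) / n for j in range(k)]
sum_pq = sum(pj * (1 - pj) for pj in p)

# Variance of the total scores (population variance used here).
totals = [sum(row) for row in scores]
mean = sum(totals) / n
var_total = sum((t - mean) ** 2 for t in totals) / n

kr20 = (k / (k - 1)) * (1 - sum_pq / var_total)
print(f"KR-20 = {kr20:.3f}")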
18. RELIABILITY INDICES AND INTERPRETATION
.91 and above: Excellent reliability; at the level of the best
standardized tests.
.81 - .90: Very good for a classroom test.
.71 - .80: Good for a classroom test; in the range of most
classroom tests. There are probably a few items which
could be improved.
.61 - .70: Somewhat low. This test needs to be supplemented by
other measures (e.g., more tests) to determine
grades. There are probably some items which could
be improved.
.51 - .60: Suggests a need for revision of the test, unless it is
quite short (ten or fewer items). The test definitely needs
to be supplemented by other measures (e.g., more tests)
for grading.
.50 or below: Questionable reliability. This test should not
contribute heavily to the course grade, and it needs revision.
19. TEST ITEM STATISTICS
Item Difficulty
• Percent answering correctly
Item Discrimination
• How well the item "functions"
• How "valid" the item is, based on
the total-test-score criterion
20. WHAT IS A WELL-FUNCTIONING
TEST ITEM?
• How many students got it correct?
(DIFFICULTY)
• Which students got it correct?
(DISCRIMINATION)
21. THREE IMPORTANT PIECES OF INFORMATION
ON THE QUALITY OF TEST ITEMS
• Item difficulty: measures whether an item was
too easy or too hard.
• Item discrimination: measures whether an item
discriminated between students who knew the
material well and students who did not.
• Effectiveness of alternatives: determines
whether distractors (incorrect but plausible
answers) tend to be marked by the less able
students and not by the more able students.
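A minimal sketch of a distractor analysis in Python (the responses, scores, and option labels are hypothetical): it tallies how often each alternative is chosen by the upper- and lower-scoring groups.

# Distractor analysis sketch; all data are hypothetical.
# Each tuple: (total test score, option chosen on the item under review).
responses = [
    (48, "B"), (45, "B"), (44, "B"), (41, "C"), (40, "B"),
    (25, "A"), (22, "C"), (20, "C"), (18, "D"), (15, "C"),
]
correct_option = "B"

# Sort by total score; take the top and bottom halves as the two groups.
responses.sort(key=lambda r: r[0], reverse=True)
half = len(responses) // 2
upper, lower = responses[:half], responses[half:]

for option in "ABCD":
    n_upper = sum(1 for _, c in upper if c == option)
    n_lower = sum(1 for _, c in lower if c == option)
    tag = "(key)" if option == correct_option else ""
    print(f"Option {option} {tag}: upper = {n_upper}, lower = {n_lower}")

# A plausible distractor is chosen mostly by the lower group;
# one chosen mainly by the upper group may signal a flawed item.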
22. ITEM DIFFICULTY
• Item difficulty is simply the percentage of
students who answer an item correctly. For an
item scored 0/1, it is also equal to the item
mean, expressed as a percentage.
Diff = (# of students choosing correctly / total # of students) x 100
• The item difficulty index ranges from 0 to 100;
the higher the value, the easier the question.
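The computation in Python, as a minimal sketch with a hypothetical list of 0/1 responses for one item:

# Item difficulty sketch; the responses are hypothetical.
# 1 = correct, 0 = incorrect, one entry per student.
item_responses = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

difficulty = 100 * sum(item_responses) / len(item_responses)
print(f"Difficulty index = {difficulty:.0f}%")  # 70%: the higher, the easier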
23. ITEM DIFFICULTY LEVEL: DEFINITION
The percentage of students who answered
the item correctly, on a scale from 0 to 100:
High (Difficult): <= 30%
Medium (Moderate): > 30% and < 80%
Low (Easy): >= 80%
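These cutoffs translate directly into a small helper function (a Python sketch; the function name is our own):

# Classify an item by its difficulty index, using the cutoffs above.
def difficulty_level(pct_correct: float) -> str:
    if pct_correct <= 30:
        return "High (Difficult)"
    if pct_correct >= 80:
        return "Low (Easy)"
    return "Medium (Moderate)"

for pct in (30, 50, 70, 90):
    print(pct, "->", difficulty_level(pct))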
24. ITEM DIFFICULTY LEVEL: SAMPLE
Number of students who answered each item = 50

Item No.   No. Correct Answers   % Correct   Difficulty Level
1          15                    30          High
2          25                    50          Medium
3          35                    70          Medium
4          45                    90          Low
25. ITEM DIFFICULTY LEVEL:
QUESTIONS/DISCUSSION
• Is a test that nobody failed too
easy?
• Is a test on which nobody got 100%
too difficult?
• Should items that are “too easy” or
“too difficult” be thrown out?
26. ITEM DISCRIMINATION
• Traditionally computed using high- and low-scoring
groups (upper 27% and lower 27%).
• Computerized analyses provide a more accurate
assessment of the discrimination power of items,
since they account for all responses rather than just
the high- and low-scoring groups.
• This is equivalent to the point-biserial correlation,
which estimates the degree to which an individual item
measures the same thing as the rest of the items.
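A minimal Python sketch of both approaches (scores and item responses are hypothetical): the upper/lower 27% index D = p_upper - p_lower, and the point-biserial correlation between the item and the total score.

# Item discrimination sketch; all data are hypothetical.
# Each tuple: (total test score, item response: 1 = correct, 0 = incorrect).
data = [(50, 1), (47, 1), (45, 1), (42, 1), (40, 0), (38, 1),
        (35, 0), (30, 1), (26, 0), (22, 0), (20, 0), (15, 0)]

# --- Upper/lower 27% method ---
data.sort(key=lambda d: d[0], reverse=True)
g = max(1, round(0.27 * len(data)))           # group size
upper, lower = data[:g], data[-g:]
d_index = (sum(r for _, r in upper) - sum(r for _, r in lower)) / g
print(f"D (upper-lower 27%) = {d_index:.2f}")

# --- Point-biserial correlation (item vs. total score) ---
scores = [s for s, _ in data]
items = [r for _, r in data]
n = len(data)
ms, mi = sum(scores) / n, sum(items) / n
cov = sum((s - ms) * (r - mi) for s, r in data) / n
sd_s = (sum((s - ms) ** 2 for s in scores) / n) ** 0.5
sd_i = (mi * (1 - mi)) ** 0.5                 # SD of a 0/1 variable
r_pb = cov / (sd_s * sd_i)
print(f"point-biserial r = {r_pb:.2f}")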
27. WHAT IS ITEM
DISCRIMINATION?
• Generally, students who did well on the
exam should select the correct answer to
any given item on the exam.
• The Discrimination Index distinguishes for
each item between the performance of
students who did well on the exam and
students who did poorly.
28. INDICES OF DIFFICULTY AND
DISCRIMINATION
(BY HOPKINS AND ANTES)

Index            Difficulty       Discrimination
0.86 and above   Very Easy        To be discarded
0.71 - 0.85      Easy             To be revised
0.30 - 0.70      Moderate         Very good items
0.15 - 0.29      Difficult        To be revised
0.14 and below   Very Difficult   To be discarded
29. ITEM DISCRIMINATION:
QUESTIONS / DISCUSSION
• What factors could contribute to
low item discrimination between
the two groups of students?
• What is a likely cause for a
negative discrimination index?
32. STEPS IN ITEM ANALYSIS
1. Code the test items:
- 1 for correct and 0 for incorrect
- columns (vertical) = item numbers
- rows (horizontal) = respondents/students
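As a sketch, step 1 amounts to building a 0/1 matrix from the answer key (the student answers and key below are hypothetical):

# Step 1 sketch: code raw answers into a 0/1 matrix.
# Rows = students, columns = items; the answers and key are hypothetical.
answer_key = ["B", "D", "A", "C"]
student_answers = {
    "student_01": ["B", "D", "A", "A"],
    "student_02": ["B", "C", "A", "C"],
    "student_03": ["A", "D", "B", "C"],
}

matrix = {
    student: [1 if ans == key else 0 for ans, key in zip(answers, answer_key)]
    for student, answers in student_answers.items()
}

for student, row in matrix.items():
    print(student, row)   # e.g. student_01 [1, 1, 1, 0]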
35. Sample SPSS reliability output:

****** Method 1 (space saver) will be used for this analysis ******
R E L I A B I L I T Y   A N A L Y S I S  -  S C A L E  (A L P H A)

Item-total Statistics

            Scale Mean   Scale Variance   Corrected     Alpha
            if Item      if Item          Item-Total    if Item
            Deleted      Deleted          Correlation   Deleted
VAR00001    14.4211      127.1053         .9401         .9502
VAR00002    14.6316      136.8440         .7332         .9542
VAR00022    14.4211      129.1410         .7311         .9513
VAR00023    14.4211      127.1053         .4401         .9502
VAR00024    14.6316      136.8440         -.0332        .9542
VAR00047    14.4737      128.6109         .8511         .9508
VAR00048    14.4737      128.8252         .8274         .9509
VAR00049    14.0526      130.6579         .5236         .9525
VAR00050    14.2105      127.8835         .7533         .9511

Reliability Coefficients
N of Cases = 57.0          N of Items = 50
Alpha = .9533
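For readers without SPSS, the two key columns of that output can be reproduced by hand. A minimal Python sketch (the 0/1 score matrix is hypothetical) computing Cronbach's Alpha and each item's corrected item-total correlation:

# Sketch reproducing Alpha and the corrected item-total correlation.
# Rows = students, columns = items; the data are hypothetical.
scores = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 1, 1],
    [0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
]
n, k = len(scores), len(scores[0])

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / ((sum((a - mx) ** 2 for a in x) *
                   sum((b - my) ** 2 for b in y)) ** 0.5)

# Cronbach's Alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
item_vars = [variance([row[j] for row in scores]) for j in range(k)]
totals = [sum(row) for row in scores]
alpha = (k / (k - 1)) * (1 - sum(item_vars) / variance(totals))
print(f"Alpha = {alpha:.4f}")

# Corrected item-total correlation: item vs. total of the REMAINING items.
for j in range(k):
    item = [row[j] for row in scores]
    rest = [sum(row) - row[j] for row in scores]
    print(f"item {j + 1}: corrected item-total r = {pearson(item, rest):.4f}")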
36. 3. In the output:
• Alpha is placed at the bottom.
• The corrected item-total correlation is the
point-biserial correlation, which serves as the
basis for the index of test reliability.
37. 4. Count the number
of items discarded
and fill in the summary
item analysis table.
38. TEST ITEM RELIABILITY ANALYSIS
SUMMARY (SAMPLE)
Test: Math (50 items)

Level of Difficulty   Number of Items   %    Item Numbers
Very Easy             1                 2    1
Easy                  2                 4    2, 5
Moderate              10                20   3, 4, 10, 15, ...
Difficult             30                60   6, 7, 8, 9, 11, ...
Very Difficult        7                 14   16, 24, 32, ...
39. 5. Count the number of
items retained based on
the cognitive domains
in the TOS. Compute the
percentage per level of
difficulty.
41. • Realistically: do item analysis on
your most important tests
• end-of-unit tests, final exams -->
summative evaluation
• common exams with other teachers
(departmentalized exams)
• common exams give a bigger
sample to work with, which is good
• and they make sure that questions other
teachers prepared are working for
your class
42. ITEM ANALYSIS is one area where
even a lot of otherwise very good
classroom teachers fall down:
• they think they're doing a good job;
• they think they're doing good
evaluation;
• but without doing item analysis,
• they don't really know.
43. ITEM ANALYSIS is not an
end in itself:
• there is no point unless you use it
to revise items, and
• to help students on the basis
of the information you get out
of it.