Techniques to Improve Test Items and Instruction
TEST DEVELOPMENT PROCESS
2. Convene National
3. Develop Domain, Knowledge and Skills
5. Construct Table of Specifications
6. Develop Test Design
7. Develop New Test Questions
8. Review Test Questions
9. Assemble Operational Test
10. Produce Printed Test Materials
11. Administer Tests
12. Conduct Item Analysis
13. Standard Setting Study
14. Set Passing Standard
WHAT IS ITEM ANALYSIS?
• A process that examines student responses to individual test items in order to assess the quality of those items and of the test as a whole.
• Valuable for improving items that will be used again in later tests and for eliminating ambiguous or misleading items.
• Valuable for increasing instructors' skill in test construction.
• Helps identify specific areas of course content that need greater emphasis or clarity.
1. More diagnostic information on students
• Classroom level:
• determine which questions most students found very difficult or guessed on, and reteach that concept
• find questions everyone got right, and don't waste more time on that area
• find which wrong answers students are choosing, to identify common misconceptions
• Individual level:
• isolate the specific errors each student made
2. Build future tests, revise test items to make them better
• know how much work goes into writing good items
• SHOULD NOT REUSE WHOLE TESTS --> diagnostic teaching means responding to the needs of current students; instead, over a few years a test bank is built up from which tests can be assembled
• can spread difficulty levels across your tests
3. Part of continuing professional development
• doing occasional item analysis will help you become a better test writer
• documents just how good your evaluation practices are
• useful for dealing with parents or administrators if there's ever a dispute
• once you start bringing out all these impressive-looking statistics, parents and administrators are more likely to accept your decisions
TEST LEVEL STATISTICS
Quality of the Test
• Reliability and Validity
• Reliability – consistency of measurement
• Validity – truthfulness of response
• Overall Test Quality
• Individual Item Quality
RELIABILITY
• Refers to the extent to which the test is likely to produce consistent scores.
Factors that influence reliability:
1. The intercorrelations among the items – the greater the relative number of positive relationships, and the stronger those relationships are, the greater the reliability.
2. The length of the test – a test with more items will have higher reliability, all other things being equal.
3. The content of the test – generally, the more diverse the subject matter tested and the testing techniques used, the lower the reliability.
4. Heterogeneous groups of test takers – more heterogeneous groups produce greater score variance, which raises the reliability coefficient.
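Factor 2 above (test length) can be quantified with the Spearman-Brown prophecy formula, which predicts the reliability of a test lengthened by a factor n. A minimal sketch (the function name is mine):

```python
# Spearman-Brown prophecy formula: predicted reliability when a test
# is lengthened by a factor n (n = new length / old length).
def spearman_brown(reliability: float, n: float) -> float:
    return (n * reliability) / (1 + (n - 1) * reliability)

# Doubling a 20-item test that has reliability .60:
print(round(spearman_brown(0.60, 2), 2))  # 0.75
```

Note that the gain shrinks as reliability approaches 1, so lengthening an already-reliable test buys little.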
TYPES OF RELIABILITY
1. Test–Retest
2. Inter-rater / Observer / Scorer
• applicable mostly to essay questions
• use Cohen's kappa statistic
3. Parallel-Forms / Equivalent
• used to assess the consistency of the results of two tests constructed in the same way from the same content domain
Internal Consistency
• used to assess the consistency of results across items within a test
4. Split-Half
5. Kuder-Richardson Formula 20 / 21
• the correlation is determined from a single administration of a test through a study of score variances
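The KR-20 coefficient described above can be computed directly from a single administration of dichotomously scored (0/1) items. A minimal sketch, with hypothetical data:

```python
# Kuder-Richardson Formula 20 for dichotomous (0/1) items,
# computed from a single test administration.
def kr20(scores):
    """scores: one list of 0/1 item responses per student."""
    k = len(scores[0])                        # number of items
    n = len(scores)                           # number of students
    totals = [sum(s) for s in scores]
    mean = sum(totals) / n
    var_total = sum((t - mean) ** 2 for t in totals) / n
    pq = 0.0
    for i in range(k):
        p = sum(s[i] for s in scores) / n     # proportion correct on item i
        pq += p * (1 - p)                     # item variance p*q
    return (k / (k - 1)) * (1 - pq / var_total)

# Four students, three items (hypothetical):
print(kr20([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]))  # 0.75
```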
.91 and above – Excellent reliability; at the level of the best standardized tests.
.81 – .90 – Very good for a classroom test.
.71 – .80 – Good for a classroom test; in the range of most. There are probably a few items which could be improved.
.61 – .70 – Somewhat low. This test needs to be supplemented by other measures (e.g., more tests) to determine grades. There are probably some items which could be improved.
.51 – .60 – Suggests need for revision of the test, unless it is quite short (ten or fewer items). The test definitely needs to be supplemented by other measures (e.g., more tests).
.50 or below – Questionable reliability. This test should not contribute heavily to the course grade, and it needs revision.
TEST ITEM STATISTICS
• Percent answering correctly
• How well the item "functions"
• How "valid" the item is, based on the total test score criterion
WHAT IS A WELL-FUNCTIONING ITEM?
• how many students got it correct?
• which students got it correct?
THREE IMPORTANT PIECES OF INFORMATION ON THE QUALITY OF TEST ITEMS
• Item difficulty: measures whether an item was too easy or too hard.
• Item discrimination: measures whether an item discriminated between students who knew the material well and students who did not.
• Effectiveness of alternatives: determines whether distracters (incorrect but plausible answers) tend to be marked by the less able students and not by the more able students.
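The effectiveness of alternatives can be examined with a simple tally of which choices the high- and low-scoring groups selected. A sketch with hypothetical response data (the function name and data are mine):

```python
from collections import Counter

# Tally answer choices separately for upper and lower scoring groups.
# A good distractor attracts low scorers more than high scorers;
# the keyed answer should attract mostly high scorers.
def distractor_table(upper, lower, choices="ABCD"):
    u, l = Counter(upper), Counter(lower)
    return {c: (u[c], l[c]) for c in choices}

upper = list("AAAABACDA")   # high-scoring group's responses; key = 'A'
lower = list("ABCCDDABC")   # low-scoring group's responses
print(distractor_table(upper, lower))
# {'A': (6, 2), 'B': (1, 2), 'C': (1, 3), 'D': (1, 2)}
```

Here 'A' (the key) draws the upper group, while each distractor draws at least as many low scorers as high scorers, which is the desired pattern.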
• Item difficulty is simply the percentage of students who answer an item correctly. In this case, it is also equal to the item mean.

Difficulty = (number of students answering correctly) / (total number of students)

• The item difficulty index ranges from 0 to 100; the higher the value, the easier the question.
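The difficulty index above is a one-line computation; the sketch below applies it to hypothetical correct-answer counts for a class of 50 and flags items outside the commonly desired 30–80% band:

```python
# Item difficulty index: percentage of students answering correctly.
def difficulty(num_correct: int, num_students: int) -> float:
    return 100 * num_correct / num_students

# 50 students; correct counts for five hypothetical items:
correct = {"Q1": 45, "Q2": 30, "Q3": 15, "Q4": 40, "Q5": 25}
for item, c in correct.items():
    d = difficulty(c, 50)
    flag = "OK" if 30 < d < 80 else "review"  # desirable: >30% and <80%
    print(item, d, flag)
```

Q2 (60%) and Q5 (50%) fall in the desirable range; Q1 (90%) is flagged as too easy and Q3 (30%) as too hard.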
ITEM DIFFICULTY LEVEL: DEFINITION
The percentage of students who answered the item correctly.
Desirable range: > 30% and < 80%.
ITEM DIFFICULTY LEVEL: SAMPLE
Number of students who answered each item = 50
ITEM DIFFICULTY LEVEL: DISCUSSION
• Is a test that nobody failed too easy?
• Is a test on which nobody got 100% too difficult?
• Should items that are "too easy" or "too difficult" be thrown out?
• Traditionally computed using high and low scoring groups (upper 27% and lower 27%).
• Computerized analyses provide a more accurate assessment of the discrimination power of items, since they account for all responses rather than just the high and low scoring groups.
• Equivalent to the point-biserial correlation. It provides an estimate of the degree to which an individual item measures the same thing as the rest of the items.
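Both approaches above can be sketched in a few lines: the traditional upper/lower-27% index, and the point-biserial correlation between an item and the total scores. The function names and sample numbers are mine:

```python
# Discrimination index from upper/lower 27% groups:
# D = (proportion correct in upper group) - (proportion correct in lower group)
def discrimination_index(upper_correct, lower_correct, group_size):
    return (upper_correct - lower_correct) / group_size

# Point-biserial correlation between one item and the total scores.
def point_biserial(item, totals):
    """item: 0/1 responses; totals: total test scores, same students."""
    n = len(item)
    mean_t = sum(totals) / n
    sd_t = (sum((t - mean_t) ** 2 for t in totals) / n) ** 0.5
    p = sum(item) / n                              # proportion correct
    mean_correct = sum(t for x, t in zip(item, totals) if x) / sum(item)
    return (mean_correct - mean_t) / sd_t * (p / (1 - p)) ** 0.5

# 13 students per 27% group; 12 of the upper and 4 of the lower answered correctly:
print(round(discrimination_index(12, 4, 13), 2))  # 0.62
```

A positive index means high scorers outperform low scorers on the item, which is what a well-functioning item should show.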
WHAT IS ITEM DISCRIMINATION?
• Generally, students who did well on the exam should select the correct answer to any given item on the exam.
• The Discrimination Index distinguishes for each item between the performance of students who did well on the exam and students who did poorly.
INDICES OF DIFFICULTY AND DISCRIMINATION (BY HOPKINS AND ANTES)
above 0.85 – To be discarded
0.71 – 0.85 – To be revised
0.30 – 0.70 – Very good items
0.15 – 0.29 – To be revised
below 0.15 – To be discarded
QUESTIONS / DISCUSSION
• What factors could contribute to
low item discrimination between
the two groups of students?
• What is a likely cause for a
negative discrimination index?
2. IN SPSS: Analyze – Scale – Reliability Analysis – (drag/place the item variables into the Items box) – Statistics – check "Scale if item deleted" – OK.
Sample output:
****** Method 1 (space saver) will be used for this analysis ******
R E L I A B I L I T Y   A N A L Y S I S  -  S C A L E  (A L P H A)
• Item-total Statistics
• Reliability Coefficients
• N of Cases =          N of Items = 50
• Alpha =
3. In the output:
• Alpha is placed at the bottom.
• The corrected item-total correlation is the point-biserial correlation, used as the basis for the index of discrimination.
4. Count the number of items discarded and fill in the summary item analysis table.
TEST ITEM RELIABILITY ANALYSIS
5. Count the number of items retained based on the cognitive domains in the TOS. Compute the percentage per level, e.g., 24/50 = 48%.
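The SPSS statistics described above (alpha and the corrected item-total correlations) can be reproduced by hand. A minimal sketch, not SPSS's actual code, with hypothetical 0/1 item columns:

```python
# Cronbach's alpha and corrected item-total correlations,
# mirroring the SPSS reliability output described above.
def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (len(xs) * (variance(xs) * variance(ys)) ** 0.5)

def cronbach_alpha(items):
    """items: one list of scores per item (columns), same students."""
    k = len(items)
    totals = [sum(col) for col in zip(*items)]
    return (k / (k - 1)) * (1 - sum(variance(c) for c in items) / variance(totals))

def corrected_item_total(items):
    """Correlation of each item with the total score excluding that item."""
    totals = [sum(col) for col in zip(*items)]
    return [pearson(col, [t - x for t, x in zip(totals, col)]) for col in items]

# Three hypothetical items, four students:
items = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 0]]
print(round(cronbach_alpha(items), 4))
print([round(r, 3) for r in corrected_item_total(items)])
```

Items with a low or negative corrected item-total correlation are the candidates for revision or discarding, exactly as in step 4.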
• Realistically: do item analysis on your most important tests
• end-of-unit tests, final exams
• common exams with other teachers
• common exams give a bigger sample to work with, which is good
• makes sure that questions other teachers prepared are working for your students as well
ITEM ANALYSIS is one area where even a lot of otherwise very good classroom teachers fall down:
• they think they're doing a good job;
• they think they're doing good evaluation;
• but without doing item analysis, they don't really know.
ITEM ANALYSIS is not an end in itself:
• there is no point unless you use it to revise items, and
• to help students on the basis of the information you get out of it.
END OF PRESENTATION…
THANK YOU FOR LISTENING…
HAVE A RELIABLE AND VALID DAY!