Item Analysis Evaluation Techniques

How do we evaluate
our test?
A glimpse of our practices
When do we evaluate
our test?

What is a test?
An objective measure of a sample of
behavior or psychological object
(Anastasi & Urbina, 1997).
A systematic procedure for measuring a
sample of behavior by posing a set of
questions in a uniform manner
(Gronlund & Linn, 2000).
Are only tools…

QUALITIES OF AQUALITIES OF A
GOOD TESTGOOD TEST
To be a “GOOD” test, a
test ought to have
validity, reliability, and
accuracy.

The Error Components ofThe Error Components of
a Testa Test
Random Error
◦ Sources: Fatigue, Cheating, Guessing,
etc.
Systematic Error
◦ Sources: Item bias, Technical errors,
Contextual clues, etc.
True Score = Observed Score ± Error

Important Notes from the Classical Test Theory
Systematic errors lead to poor
test reliability and validity.
Interpretations of test result
may be distorted by too many
errors.
Random errors are more
difficult to control than
systematic errors.
Systematic errors can be
controlled by systematic
assessment (Item Analysis).

How do we evaluate our tests?How do we evaluate our tests?
◦ QUALITATIVE EVALUATION
 Systematic inspection of test plan, tasks,
and format
◦ QUANTITATIVE
EVALUATION
 Psychometric techniques in item analysis

A Systematic Inspection of TestsA Systematic Inspection of Tests
Adequacy of Assessment or
Test Plan
Adequacy of Assessment
Task
Adequacy of Test Format
and Directions

QUANTITATIVE
EVALUATION
- determining the psychometric
characteristics of the test
using ITEM ANALYSIS

Psychometric Techniques forPsychometric Techniques for
Item AnalysisItem Analysis
Test / Item Statistics - These are
psychometric techniques generally
based on a norm-referenced
perspective (Method of extreme
groups).
◦ Item Difficulty
◦ Item Discrimination Power
◦ Effectiveness of Distracters

The Method of Extreme GroupsThe Method of Extreme Groups
Selecting criterion groups
(Upper & Lower Criterion
Groups)
◦ If samples are less than 50:
Use 50% groupings
◦ If samples used are more than
50, and for a more refined
analysis: Use upper & lower 27%
groupings
◦ Set aside the papers which will
not be used in the analysis.

The Method of Extreme GroupsThe Method of Extreme Groups
Determine item statistics
◦ Item Difficulty
◦ Item Discrimination
◦ Effectiveness of distracters
Determine test statistics
◦ Solve for the mean (average) of the
difficulty and discrimination indices
of all the items in the test.

What is item difficulty?What is item difficulty?
Item difficulty is simply the percentage of
students taking the test who answered the
item correctly. It is represented by p index.
Range: 0.0 to 1.0 or 0% to 100%
It can be computed using the formula below
N
LU
p RR +
=

◦ Some important notations
 UR = No. of students from the UPPER
criterion group that had gotten the
item correctly or had chosen a
particular option under analysis.
 LR = No. of students from the LOWER
criterion group that had gotten the
item correctly or had chosen a
particular option under analysis.
 N = No. of students that had tried to
answer the item

Rule of the Thumb:Rule of the Thumb: pp - index- index
The nearer the p value is to 0.0,
or 0.0% the more DIFFICULT the
item becomes.
The nearer the p value is to 1.0,
or 100 % the EASIER the item
becomes.

How difficult should ourHow difficult should our
test/item be?test/item be?
For an objective item test, the ideal
difficulty would be halfway between the
percentage of pure guess and 100%.
(Thompson & Levitov, 1985)
◦ Ex. p=0.63 for a multiple choice test with 4
options.
Eclectic distribution of difficult, average
and easy items; with extremely limited
use of items having p = 0.9 or more
(Frary, 1995)

How important is item difficulty?How important is item difficulty?
An item having p = 0.0 and 1.0 does not
in any way contribute to measuring
individual differences, and seriously
affects test validity.
Item difficulty has a profound effect on
the variability of test scores and the
precision to which the test
discriminates between achievement
groups. (Thorndike, et al.,1991)

What is item discrimination?What is item discrimination?
It is the ability of an item to
discriminate between students with
high or low achievement. It is
represented by Di.
Range: -1.0 to 1.0 or 0% to 100%
It can be computed using the
formula below
0.5N
LU
Di RR −
=

Rule of the Thumb:Rule of the Thumb: D iD i- index- index
The higher the discrimination index,
the better the item becomes. This is
so because such a value indicates
that the discrimination index is in
favor of the upper achievement
group, whom we expect to get more
of the items in the test correctly.

Why should our items discriminate?Why should our items discriminate?
Items that do not discriminate
can seriously affect the validity of
the test.
Negatively discriminating items
are useless and tend to decrease
the validity of the test. (Wood,
1960)

How do we evaluate theHow do we evaluate the
effectiveness of distracters?effectiveness of distracters?
Solve for Di of each wrong
option (applicable only for
multiple choice items)
Rule of the thumb: The
more negative the Di index
will be the more effective is
the distracter

Workshop 2Workshop 2
Systematic inspection of
test items, tasks and format
Item analysis of using
Microsoft Excel ®
and
STATISTICA 6.0 .0 or SPSS

Some points of caution inSome points of caution in
interpreting item/test statisticsinterpreting item/test statistics
A low index of discriminating
power does NOT necessarily
indicate a defective item.
◦ Non-technical factors which
contribute to item discriminating
power
 Emphasis given to a domain or
content covered by a test.
 Homogeneity and characteristics
of student groups
 Difficulty of an item

Some points of caution inSome points of caution in
interpreting item/test statisticsinterpreting item/test statistics
Item analysis data from small
samples are highly tentative and
tends to fluctuate due to its norm
reference perspective.
A test that had undergone
psychometric analysis is NOT
necessarily a STANDARDIZED test.

What should we do after itemWhat should we do after item
analysis?analysis?
Further analysis with
other psychometric
methods
◦ Ex. Item bias analysis,
Point-biserial and biserial
correlations
Item Calibration
◦ IRT and Rasch Scaling
Item banking

Practical Implications ofPractical Implications of
Evaluating TestsEvaluating Tests
It helps prevent wastage of
time and effort that went in
to test and assessment
preparation.
It provides a basis for the
general improvement of
classroom instruction.
It provides a venue for
teachers to develop their test
construction skills.

Item Analysis Evaluation Techniques

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (20)

Similar to Item Analysis Evaluation Techniques

Similar to Item Analysis Evaluation Techniques (20)

More from Melanio Florino

More from Melanio Florino (20)

Recently uploaded

Recently uploaded (20)

Item Analysis Evaluation Techniques

Editor's Notes