From Dr. David Foster, Caveon CEO
Have you ever felt the angst, doubt, and concern that comes from using current methods for setting cutscores? Well, I have, and that's why I am presenting this month's session of the Caveon Webinar Series.
This month's webinar presents a promising new method for helping to make pass/fail decisions. Borrowed from Cognitive Science, Information Integration Theory (IIT) is a quantitative method for comparing human rater judgments. It is a method that adds a scientific foundation to the way we determine who's qualified and at what level.
Standard setting using IIT is based on well-established, researched principles that explain and predict how we combine information in our brains in order to form consistent judgments. Since setting cutscores today is all about rater judgments, these methods should provide us with a quantitative basis for better establishing and evaluating the outcomes of our cutscore setting efforts.
By attending this informative session, you'll have the chance to:
• Participate in an actual "hands-on" (or more appropriately "brains-on") live pilot test of the methodology
• Learn the advantages of cut score setting using IIT
• Discover how the method may help in other routine psychometric analysis tasks that involve judgment (e.g., gender bias and content alignment reviews)
• Better understand the concepts behind using this new method for setting cutscores
• Use a software tool built on this methodology for calculating cut scores on your next test
Big shadow test
The Big-Shadow-Test method solves a large simultaneous test-assembly problem as a sequence of smaller simultaneous problems.
Shadow tests are not regular tests; their items are always returned to the pool. They are assembled only to balance the selection of items between current and future tests. Their presence neutralizes the greedy character inherent in sequential test-assembly methods: they prevent the best items from being assigned only to the earlier tests and keep the later test-assembly problems feasible.
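A toy sketch of the balancing idea follows. A real shadow-test method formulates these selections as mixed-integer programs; here the item pool, the information values, and the round-robin split are all invented for illustration.

```python
# Hypothetical pool: item identifier -> information value.
pool = {"i1": 2.0, "i2": 1.9, "i3": 1.8, "i4": 1.7, "i5": 1.0, "i6": 0.9}

def greedy(pool, n_tests=2, length=2):
    """Sequential greedy assembly: each test takes the best remaining items."""
    remaining = dict(pool)
    tests = []
    for _ in range(n_tests):
        picked = sorted(remaining, key=remaining.get, reverse=True)[:length]
        tests.append(picked)
        for item in picked:
            del remaining[item]
    return tests

def with_shadow(pool, n_tests=2, length=2):
    """Select items for ALL tests at once (current test plus shadow test),
    then distribute them so each test gets a comparable share."""
    best = sorted(pool, key=pool.get, reverse=True)[: n_tests * length]
    return [best[k::n_tests] for k in range(n_tests)]

info = lambda test: sum(pool[i] for i in test)

g1, g2 = greedy(pool)
s1, s2 = with_shadow(pool)
print(round(info(g1) - info(g2), 2))  # greedy gives the first test all the best items
print(round(info(s1) - info(s2), 2))  # shadow-balanced split narrows the gap
```

Under these assumed values, greedy assembly leaves a 0.4 information gap between the two tests, while the shadow-balanced split halves it.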
Using Mastery Manager to Inform Instruction | Erin Abruzzo
The presentation is an overview of how teachers at Harlem High School are using Mastery Manager (a data collection tool) to inform their classroom instruction.
Online Tests: Can we do them better? | Bopelo Boitshwarelo, Jyoti Vemuri, Han... | Blackboard APAC
The use of e-assessment methods to facilitate and evaluate learning is a growing trend in the higher education space. In particular, the use of online tests has increased rapidly concomitant with the expansion of digital technologies for teaching purposes. Online tests, in the context of this presentation, refer to computer-assisted assessment where deployment and marking are automated, typically involving objective question types such as multiple choice questions (MCQs), true/false questions, matching questions, and predetermined short answer questions. The growing sophistication of Learning Management Systems (LMSs) such as Blackboard provides an increasing capacity for different types of online tests to be deployed, administered and marked efficiently. Additionally, most major textbook publishers and authors in certain disciplines provide online question banks that can easily integrate with LMSs, meaning less time is spent on creating tests from scratch.
With these trends in mind, questions arise around the efficacy of online tests in higher education.
In this presentation we will share findings of a study investigating practices around online tests. First, we will explore what the literature reveals about the role of online tests in higher education and particularly how online tests are used to lead to student learning through formative assessment processes and feedback practices. Secondly, the presentation will review the practices around online tests at the Charles Darwin University Business School and discuss emerging issues. Thirdly, the presentation will distil some preliminary guiding principles around designing, developing, administering and reviewing online tests for effective learning and assessment. Finally, ongoing and further research by the team on the topic of online tests will be highlighted.
Individual Assignment Progistics-Solutions Inc. – The Critical Parts Network.docx | jaggernaoma
Individual Assignment: Progistics-Solutions Inc. – The Critical Parts Network
1. Give two reasons why you think the performance of the Toronto depot is better than that of the Scarborough depot. (2 marks each)
Reason #1
Reason #2
2. Improvements
a. Describe three initiatives that can be done at the Critical Parts Network to improve inventory turns without having a negative impact on customer service levels. (2 marks each)
b. Describe what the benefit would be of each initiative. (1 mark each)
c. Explain what role Xerox would have to play in each initiative. (1 mark each)
Initiative | Description | Benefit | Xerox's Role
1 | | |
2 | | |
3 | | |
3. What are the costs and benefits of reducing the cut-off point for filling from the technician’s trunks? For calculation purposes, assume inventory carrying costs are 25% the value of the inventory. (4 marks)
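For the carrying-cost side of question 3, the arithmetic can be sketched as follows. The trunk inventory value and the reduction achieved are placeholders; the case materials supply the real figures, and only the 25% carrying rate comes from the assignment.

```python
# Hypothetical figures for illustration only.
trunk_inventory_value = 400_000.0  # assumed value of parts held in technician trunks
carrying_rate = 0.25               # given: carrying cost is 25% of inventory value per year

# Suppose reducing the fill cut-off frees 30% of trunk stock (assumed).
reduction = 0.30
inventory_freed = trunk_inventory_value * reduction
annual_carrying_savings = inventory_freed * carrying_rate
print(annual_carrying_savings)  # 30000.0 per year under these assumptions
```

Any benefit estimate like this would then be weighed against the cost side: extra emergency shipments and any impact on service levels.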
4. If you were in the position of Gary Parkinson, what are the top three recommendations you would make to Jim Eckler, and why? (Recommendation: 1 mark; Justification: 1 mark)
Recommendation | Justification
1 |
2 |
3 |
NUR 520: Nursing Theory and Research II | 03/07/2016
Nursing Theory and Research II
Course Description
Three hours per week. Prerequisite: NUR 510. The focus of this course is on the research process and critical examination of research designs. Exploration of data analysis and interpretation will be emphasized. This course prepares students to use evidence-based practices in their specialized area of professional nursing.
Course Objectives
Following successful completion of course work, the student will be able to:
• Examine common ethical dilemmas related to research in nursing and other health care disciplines and the ways in which these dilemmas impact patient care
• Pose specific research question(s) and/or hypotheses relevant to a significant problem in nursing practice
• Compare and contrast qualitative and quantitative research methods and their appropriateness for researching a specific problem in nursing practice
• Determine appropriate sampling methods and the sample size needed for researching a specific nursing problem
• Compare measurement strategies used in nursing research and select the appropriate strategy for the data collection tool used in a specific research-based project
• Analyze various data collection methods and their relevancy for answering specific research questions in nursing
• Analyze quantitative and qualitative research data
• Develop a data analysis plan based on data collection tools and statistics relevant to answering the research question
• Employ leadership/management strategies in determining the feasibility of conducting research on a specific problem in nursing
• Develop strategies for incorporating evidence into nursing practice
• Develop a research proposal for a study in your area of nursing specialization
Topical Outline
• Nursing Research at the Graduate Level
• Research Objec.
Can you measure whether the content in your eLearning system provides an enriching and engaging experience for your learners? If you can't answer this important question, you're not alone. Organizations struggle with the complex task of analyzing data to identify opportunities to improve learner engagement with their content. It's worth the effort to find out: courses and related resources that are less valuable than intended can depress interest and attendance rates, leading to poor learning outcomes. There are many ways to measure and analyze course engagement data in your LMS. These insights enable managers to identify and prioritize changes to learning programs and step up their engagement game.
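As one minimal illustration of such a measurement, a per-course completion rate can be computed from an LMS event log. The event records, field names, and metric choice below are all invented for the sketch; a real LMS export would look different.

```python
from collections import defaultdict

# Hypothetical LMS event log: (learner, course, event) tuples.
events = [
    ("ana",   "onboarding", "enrolled"),
    ("ana",   "onboarding", "completed"),
    ("ben",   "onboarding", "enrolled"),
    ("carla", "security",   "enrolled"),
    ("carla", "security",   "completed"),
]

enrolled = defaultdict(set)
completed = defaultdict(set)
for learner, course, event in events:
    if event == "enrolled":
        enrolled[course].add(learner)
    elif event == "completed":
        completed[course].add(learner)

# Completion rate per course: one simple engagement signal managers can rank by.
completion_rate = {
    course: len(completed[course]) / len(enrolled[course])
    for course in enrolled
}
print(completion_rate)  # {'onboarding': 0.5, 'security': 1.0}
```

Low-completion courses like the hypothetical "onboarding" above are the ones to prioritize for review.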
There's more to learning evaluation than surveys and smile sheets. In this recent webinar, Andrew Downes laid down practical, straightforward advice on how to take your learning evaluation further and measure whether your learning programs are having the impact they were designed to achieve.
Here's the slides!
Caveon Webinar Series - A Guide to Online Protection Strategies - March 28, ... | Caveon Test Security
Join Executive Web Patrol Managers Cary Straw and Jen Baldwin as we explore the systems, methods, and steps you need to protect your high-stakes certification, licensure, and state assessment exams from online threats and extend their life.
Some of the questions we will answer include:
• Which processes should I implement to decrease the chance of my content appearing online?
• Where are the best places to use online security resources?
• Where do I look next if I found a threat, and where are the threats likely to spread?
• What are proactive steps I can take to protect my exams online?
• Who should be in my protection hierarchy?
• Am I "safe" after I've found a threat, and have had it removed?
Caveon Webinar Series - Five Things You Can Do Now to Protect Your Assessment... | Caveon Test Security
Test season is approaching quickly! Maintaining the security and validity of assessment results is critical to support federal accountability and peer review requirements.
Kick off testing season with this year's first Caveon Webinar, "Five Things You Can Do Right Now to Protect Your Assessment Programs."
This webinar will focus on:
• Test security threats & risk analysis
• Creating test security policies and procedures
• Planning and implementing on-site monitoring
• Reviewing anomalous test results
• Managing incident reports
Join the webinar to learn more, and you'll be off to a strong start in protecting your tests, your results, and your reputation.
If you missed the first three sessions, you can still view them. And, if you can't attend on January 17, go ahead and register anyway and we will send you the recording and slides after the session.
The Do's and Don'ts of Administering High Stakes Tests in Schools Final 121217 | Caveon Test Security
There is a great deal of advice available about giving high stakes tests securely in school settings. States run annual training sessions and provide test administration manuals. Major vendors serving schools provide training and guidelines of varying types. Sometimes the different sources disagree and the emphases vary by the nature of the helping agency. What is a test administrator to do?
This webinar focuses on administering tests in schools and identifies ten "best practices" that apply to all high stakes testing. The content is drawn from careful analyses of current testing practices by states, districts, and testing vendors.
To be an effective test administrator, you will need to read the background materials about each testing program and attend any training that is provided. If you also follow the guidelines presented in this webinar, you will be in a very good position to promote fairness and validity in each of the programs for which you share responsibility.
In this webinar, you will learn:
* Ten Best Practices that apply to all high stakes testing
* What is required to be an effective test administrator
* How to promote fairness and validity in your testing programs
Sponsored by the National Association of Assessment Directors and Caveon Consulting Services, Caveon Test Security
Caveon Webinar Series - The Art of Test Security - Know Thy Enemy - November ... | Caveon Test Security
As Sun Tzu famously said, "If you know your enemy as you know yourself, you need not fear 100 battles." On the battlefield of security, whether home security, airport security, or test security, the first step to success is knowing the threats.
Are you worried about tests being stolen and shared online? Or test takers cheating by being coached by an expert? If so, the steps to successfully protecting your test and triumphing over these fears include:
• conducting a risk assessment
• determining (and ranking) which threats pose the greatest risk
• strategizing how to render those threats impotent
• determining the right combination of prevention, detection and deterrence tactics for your program
This webinar will teach you to conquer the steps in this test security process. Join Caveon CEO David Foster to learn how to analyze and rank the threats that are specific to your program. You will also discover the three solutions necessary to counter any and all of these threats.
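The risk-assessment and ranking steps above are often operationalized with a simple likelihood-times-impact score. The threat names and the 1-5 scores below are hypothetical placeholders, not Caveon's actual ratings.

```python
# Hypothetical risk register: threat -> (likelihood 1-5, impact 1-5).
threats = {
    "item theft / online exposure": (4, 5),
    "proxy test taking":            (3, 4),
    "answer copying":               (4, 2),
}

# Rank threats by a simple likelihood x impact score (one common convention).
ranked = sorted(
    threats.items(),
    key=lambda kv: kv[1][0] * kv[1][1],
    reverse=True,
)
for name, (likelihood, impact) in ranked:
    print(f"{name}: risk score {likelihood * impact}")
```

The highest-scoring threats are the ones to address first with the prevention, detection, and deterrence tactics the webinar describes.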
Caveon Webinar Series - Four Steps to Effective Investigations in School Dis... | Caveon Test Security
Now that spring test administrations are almost over, K-12 districts and schools can breathe a sigh of relief. Weeks of vigilance have paid off with a smooth, incident-free test administration. Not your district? You’re not alone. No matter the extent of planning, training, and oversight, there are always unforeseen events that result in testing irregularities. Most will be straightforward and covered by standard policies and procedures. But some incidents may set off your internal alarms. By themselves, these reports are only single data points and need to be explored to determine the larger context and what really happened. This webinar will provide information on:
• How to develop a plan for responding to test irregularity reports, and
• How to carry out investigations if additional information is needed.
The session is free, and will only last 30 minutes. Space is limited, so register today! We look forward to seeing you on May 18th!
If you missed the first two sessions, you can still view them. And, if you can't attend on May 18, go ahead and register anyway and we will automatically send you the recording and slides after the session.
Caveon Webinar Series - On-site Monitoring in Districts 0317 | Caveon Test Security
Are you sure that school leaders and educators are following your state and local assessment policies and procedures during the administration of assessments?
On-site monitoring of assessment administrations at schools and in classrooms is an effective quality assurance measure that:
• ensures compliance with standardized policies and procedures
• helps identify the greatest areas of vulnerability in your assessment administration processes
• creates opportunities to improve training, and
• clarifies messaging about assessments for school leaders and educators.
Finally, LEA-sponsored monitoring demonstrates a strong commitment to the integrity of assessments and the important decisions made based upon assessment results.
By attending this webinar, you will gain exposure to:
1) the goals and purposes of monitoring,
2) best-practice monitoring activities during assessment administrations,
3) evaluating data from monitoring reports,
4) potential outcomes from monitoring and
5) first steps in implementing a monitoring program.
Caveon Test Security, the industry leader in providing security solutions for protecting high-stakes, K-12 assessments, is pleased to announce the first webinar in a series of 3, focused on test security challenges faced specifically by districts.
Session #1: Avoiding A School District Test Cheating Scandal:
A Tale of Two Cities
January 25, 2017, 12:00 p.m. ET
As a number of U.S. school districts have learned, mishandling of cheating incidents on tests, particularly state assessments, can have very negative and pervasive effects. This webinar reviews two examples of actual test cheating situations in school districts, contrasts how they were handled, and lays out practical and "battle-tested" strategies for avoiding and, if necessary, coping with test cheating events. Having a strong security plan and acting wisely and decisively when you see signs of trouble can be a very productive approach. This webinar will give you tools to manage a test cheating incident if you have a suspected or confirmed report of cheating.
Caveon Webinar Series - Discrete Option Multiple Choice: A Revolution in Te... | Caveon Test Security
High-stakes testing faces major changes due to the use of computers and other technology in test administration. Some such changes include new test designs (such as computerized adaptive testing), proctoring tests online, and even administering tests on tablets and smartphones to improve test taker convenience. One of the most important changes is innovative new item types that better measure important skills. The Discrete Option Multiple Choice item type, or DOMC, is one of these ground-breaking new item types.
The DOMC item has the potential to revolutionize testing. It brings significant benefits in security, quality of measurement, fairness, test development, and test administration.
Caveon Webinar Series - Test Cheaters Say the Darnedest Things! - 072016 | Caveon Test Security
You won't believe what's actually happened in the world of testing!
What goes on in the mind of a would-be test cheater? While cheating is a serious offense, some test takers go to great (and sometimes comical) lengths to try to gain an unfair advantage and achieve a successful testing outcome.
Join us as we look at some of the most memorable proctor/test taker cheating encounters. Our special guest, Jarret Dyer, of the College of DuPage Testing Center, has created a compilation of test proctor stories from testing centers around the United States and across the globe. Jarret will share his 'best of' stories, while Caveon's John Fremer will discuss the consequences of not following the right test security processes and procedures. You don't want to miss this fun, yet informative session! To listen to the recording that goes along with these slides, go to https://youtu.be/r-CCaDf7NEk
Caveon Webinar Series - The Test Security Framework: Why Different Tests Nee... | Caveon Test Security
The need for global workforce skills credentials continues to grow. At the same time, the global workforce is shrinking. It is imperative that skill recognition be accurate and the level of test security be appropriate for the skills being assessed. The Security subcommittee of the new Workforce Skills Credentialing division of ATP created a new test security framework that will provide guidance to testing organizations when selecting the level of security needed for their assessments.
Join our guest presenters, Rachel Schoenig and Jennifer Geraets of ACT, as they discuss the challenge of identifying global workforce skills and how this new test security framework will help to align the expectations of those involved with workforce credentialing (e.g., test publishers, examinees, and employers). Rachel and Jennifer will also provide a call to action, requesting your comments on this new framework.
Caveon Webinar Series - Conducting Test Security Investigations in School Di... | Caveon Test Security
In the coming weeks, schools all over the country will be administering standardized exams to millions of students. And inevitably, test security incidents will arise, many of which may directly impact test score validity. Is your team prepared to answer the following tough questions?
• What will you do if you find yourself in a position of having to respond to an incident or breach in your state or district?
• What process will you follow?
• What is your incident escalation plan?
• How will you communicate with internal and external stakeholders?
• Most importantly, how will you discover the truth of what did or did not occur, and its impact on test scores?
Join Caveon’s test security experts for an important, hour-long webinar to help you understand the steps to take when challenging situations arise. We will share:
• Recent experiences other districts have had with possible cheating, and what they have done to resolve their concerns
• Information and tools for you to arm yourself before an issue arises, and to help you be better equipped to deal effectively and efficiently
• Essential tips you need to know when invoking a Security Incident Response Plan, and further conducting a security investigation
Caveon Webinar Series - Creating Your Test Security Game Plan - March 2016 | Caveon Test Security
History has shown that as stakes rise for testing programs, so do threats to the validity of the program's test results. There are stories in the media almost daily about high-stakes programs suffering at the hands of those intent on obtaining test content for illicit purposes. Having a game plan in place before a threat or validity issue occurs is vital. This month's webinar will focus on key steps your organization can take to maximize your protection from test fraud and stay one step ahead of the game.
Caveon Webinar Series - Mastering the US DOE Test Security Requirements Janua... | Caveon Test Security
The U.S. Department of Education recently issued the Peer Review of State Assessment Systems, which includes a required "Critical Element" on Test Security. To fulfill this requirement, States must submit documentation of policies and procedures in four categories of test security: prevention, detection, remediation, and investigation.
It is up to each State to determine which steps to implement and what evidence to submit to prove they have met each of these requirements. Evidence could, and should, include a myriad of test security measures ranging from Security Handbooks and annual proctor training, to data forensics and web monitoring procedures (and everything in between).
Caveon can help guide you through this complicated process. In the upcoming session, our test security experts will unpack the requirements of this section of the Peer Review process. The goal is to help you form a road map moving forward, provide information on the best practices for protecting your assessments, and outline resources to streamline the process.
Caveon Webinar Series - Will the Real Cloned Item Please Stand Up? final | Caveon Test Security
Join us for this month's webinar on the ins and outs of developing item clones. While many of us are aware of the benefits cloning can provide, such as expanding an item bank, lengthening the shelf life of an exam, or deterring and detecting cheating, questions remain regarding the best practices for implementation. Secure exam development experts will address the question, "How do we know, during development, when an item has been sufficiently altered, making it a 'real clone' and not just an 'imitator' of a clone?" The answer isn't as clear-cut as it would seem.
Additional topics will include:
• General information on cloning
• Lessons learned from the field
• Creative ideas for streamlining cloning processes
This webinar will help assessment and program managers be better positioned to put on their cloning lab coats and reap the rewards of this best practice in test security.
Caveon Webinar Series - Lessons Learned at the 2015 National Conference on S... | Caveon Test Security
The National Conference on Student Assessment (NCSA) was held last month in San Diego, and Caveon was there. This month's webinar will focus on lessons learned at the conference regarding test security, and what's happening in the state assessment arena in terms of test security right now.
Caveon's Steve Addicott and Jamie Mulkey will be joined by special guest Walt Drane, State Assessment Director, Mississippi Department of Education. The panelists will summarize the test security trends and strategies that they drew from the conference, and also share key points from sessions they presented.
Caveon Webinar Series - Learning and Teaching Best Practices in Test Security... | Caveon Test Security
Test security has been emerging as a cohesive discipline for the past ten years. There are no college courses that teach test security. And, even if there were, many practitioners don't have time to take those classes. How do you stay abreast of current developments? How do you train your staff in latest best practices if you don't know about them? Are there resources out there, and how do you find them?
In this webinar, Caveon will host several special guest practitioners from various industries. These test security veterans have had to answer these very questions. They will address how continuing education will help you improve test security in your organization.
The French Revolution, which began in 1789, was a period of radical social and political upheaval in France. It marked the decline of absolute monarchies, the rise of secular and democratic republics, and the eventual rise of Napoleon Bonaparte. This revolutionary period is crucial in understanding the transition from feudalism to modernity in Europe.
For more information, visit www.vavaclasses.com
2024.06.01 Introducing a competency framework for language learning materials ... | Sandy Millin
http://sandymillin.wordpress.com/iateflwebinar2024
Published classroom materials form the basis of syllabuses, drive teacher professional development, and have a potentially huge influence on learners, teachers and education systems. All teachers also create their own materials, whether a few sentences on a blackboard, a highly-structured fully-realised online course, or anything in between. Despite this, the knowledge and skills needed to create effective language learning materials are rarely part of teacher training, and are mostly learnt by trial and error.
Knowledge and skills frameworks, generally called competency frameworks, for ELT teachers, trainers and managers have existed for a few years now. However, until I created one for my MA dissertation, there wasn’t one drawing together what we need to know and do to be able to effectively produce language learning materials.
This webinar will introduce you to my framework, highlighting the key competencies I identified from my research. It will also show how anybody involved in language teaching (any language, not just English!), teacher training, managing schools or developing language learning materials can benefit from using the framework.
This is a presentation by Dada Robert in a Your Skill Boost masterclass organised by the Excellence Foundation for South Sudan (EFSS) on Saturday, the 25th and Sunday, the 26th of May 2024.
He discussed the concept of quality improvement, emphasizing its applicability to various aspects of life, including personal, project, and program improvements. He defined quality as doing the right thing at the right time in the right way to achieve the best possible results and discussed the concept of the "gap" between what we know and what we do, and how this gap represents the areas we need to improve. He explained the scientific approach to quality improvement, which involves systematic performance analysis, testing and learning, and implementing change ideas. He also highlighted the importance of client focus and a team approach to quality improvement.
Students, digital devices and success - Andreas Schleicher - 27 May 2024.pptx | EduSkills OECD
Andreas Schleicher presents at the OECD webinar ‘Digital devices in schools: detrimental distraction or secret to success?’ on 27 May 2024. The presentation was based on findings from PISA 2022 results and the webinar helped launch the PISA in Focus ‘Managing screen time: How to protect and equip students against distraction’ https://www.oecd-ilibrary.org/education/managing-screen-time_7c225af4-en and the OECD Education Policy Perspective ‘Students, digital devices and success’ can be found here - https://oe.cd/il/5yV
Instructions for Submissions through G-Classroom.pptx | Jheel Barad
This presentation provides a briefing on how to upload submissions and documents in Google Classroom. It was prepared as part of an orientation for new Sainik School in-service teacher trainees. As a training officer, my goal is to ensure that you are comfortable and proficient with this essential tool for managing assignments and fostering student engagement.
Operation "Blue Star" is the only event in the history of independent India where the state went to war with its own people. Even after about 40 years, it is not clear whether it was the culmination of the state's anger toward the people of the region, a political game of power, or the start of a dictatorial chapter in the democratic setup.
The people of Punjab felt alienated from the mainstream due to the denial of their just demands during a long democratic struggle since independence. As has happened all over the world, this led to a militant struggle with great loss of life among military, police, and civilian personnel. The killing of Indira Gandhi and the massacre of innocent Sikhs in Delhi and other Indian cities were also associated with this movement.
Model Attribute Check Company Auto Property | Celine George
In Odoo, the multi-company feature allows you to manage multiple companies within a single Odoo database instance. Each company can have its own configurations while still sharing common resources such as products, customers, and suppliers.
Palestine last event orientationfvgnh.pptx | RaedMohamed3
An EFL lesson about the current events in Palestine. It is intended to be for intermediate students who wish to increase their listening skills through a short lesson in power point.
Caveon Webinar Series - Standard Setting for the 21st Century: Using Information Integration Theory to Produce Cut Scores - August 2014
1. Caveon Webinar Series: Applying Information Integration Theory to Setting Cutscores and Other Tasks
David Foster, CEO, Caveon
August 20, 2014
2. My Personal Issues with Current Cutscore Methods
1. There are too many methods/variations, perhaps hundreds. Why is that?
2. The cutscore point seems almost pre-determined.
3. The methods try to direct and conform judgments (e.g., adding item statistics).
4. There is no check on the consistency and quality of the judgments made.
5. The rating task is difficult to do.
6. There is a lack of confidence in the cutscore.
3. So, What Is the Point?
• Why propose another method of setting cutscores?
– To perhaps solve many of the issues above
– For added value: IIT can apply to other "judgment" tasks in testing
• Introducing Information Integration Theory, or IIT, borrowed from the Cognitive Sciences
– 50+ years of theoretical and scientific support
4. Reference Material
• Contributions to Information Integration Theory, Volume I: Cognition. Edited by Norman H. Anderson (2009).
• Foundations of Information Integration Theory by Norman H. Anderson (1981).
• Methods of Information Integration Theory by Norman H. Anderson (1982).
5. IIT: How Is Information Integrated?
[Slide image: 3 fruits and 2 dips]
6. Poll
Of the 6 combinations given, which is your most preferred pairing of a dip and a fruit?
• Chocolate and strawberry
• Chocolate and apple slice
• Chocolate and orange slice
• Caramel and strawberry
• Caramel and apple slice
• Caramel and orange slice
7. Poll Results
• From the poll data:
– There are differences in your top choice, which is normal for food preference ratings
– MORE IMPORTANTLY, you were able to combine or integrate the information quickly, imagine the taste of the combinations, rate the combinations, and make your top pick
8. Much of What We Do Is Integrating
Information and Making Judgments
• Choosing a vacation place
• Buying a car
• Leaving a job for a better one
• Choosing a mate
• Voting
• Picking foods to eat
• …and everything else we do
We are constantly integrating various pieces of
information, then judging, rating, and eventually
deciding and acting based on the integrated
value.
How we do the cognitive part of
these tasks is explained by IIT.
9. Schematic of IIT
[Schematic of the IIT process (source: Wikipedia); the internal valuation and integration stages are not directly observable.]
Basic Cognitive Algebra Models:
ADDITIVE AND MULTIPLICATIVE
10. Cognitive Algebra:
ADDITIVE MODEL Examples
• Individuals are adding the stimuli before
judging
• Produces parallelism when charted
Statesmanship rated after
reading two biographical
paragraphs
Cookie size evaluated by
5-year-olds given length
and width
11. Cognitive Algebra:
MULTIPLICATIVE MODEL Examples
• Individuals are multiplying the stimuli
before judging
• Produces linear fan when charted
Value of a lottery ticket
given odds of winning
and value of the ticket
Rating of likeableness
given adjective and
adverb
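Both integration rules can be illustrated with a quick sketch (all stimulus values here are invented for illustration, not taken from any study): under an additive rule the gap between any two rows is the same at every column, which charts as parallel lines, while under a multiplicative rule the gap grows with the column value, which charts as a linear fan.

```python
# Sketch of the two cognitive-algebra models (hypothetical stimulus values).
# Rows: one factor (e.g., adverb intensity); columns: the other (e.g., adjective value).

row_vals = [1, 2, 3]
col_vals = [2, 4, 6]

additive = [[r + c for c in col_vals] for r in row_vals]
multiplicative = [[r * c for c in col_vals] for r in row_vals]

# Additive rule: the gap between any two rows is constant across columns
# -> parallel lines when charted.
gaps_add = [additive[1][j] - additive[0][j] for j in range(3)]
print(gaps_add)  # [1, 1, 1] -- constant

# Multiplicative rule: the gap widens as the column value grows
# -> a "linear fan" when charted.
gaps_mul = [multiplicative[1][j] - multiplicative[0][j] for j in range(3)]
print(gaps_mul)  # [2, 4, 6] -- diverging
```

The constant-versus-diverging gaps are exactly the visual test used on the preceding slides: parallelism signals an additive model, a fan signals a multiplicative one.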
12. Not Just Humans
Research I conducted in 1976 using pigeons
Information integrated:
Type of food
Amount of work to obtain the food
14. Mid-Webinar Summary of IIT Benefits
for Judgment Tasks in Testing
• Easy visual evaluation of overall ratings
and individual raters
• Better understanding of the judgment
process
• Production of results (e.g., item difficulty
ratings) on interval-level scales
• Quantitative comparison of performance
levels
• Practical benefits: Quicker, easier, less
expensive
15. Item Judgment Exercise
You were asked to go to a Caveon site
and provide a rating of the difficulty of
3 math questions for students who had
completed the 2nd and 10th grades.
Information that was integrated:
A. Test item content (3 items)
B. Student performance level (2 grade
levels)
19. Evaluation of Individual Raters
Here are the results for Rater #21,
who either didn’t try, didn’t
understand the task, or simply
answered randomly.
His results were removed from the
analysis.
21. ANOVA Results for IIT Data
Factors                         F Score   Probability
Items                           208.48    6.70 × 10^-35
Proficiency Levels (Grades)     483.97    4.71 × 10^-26
Items × Proficiency Levels      26.93     6.21 × 10^-10
Confirms the multiplicative model
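It is the significant Items × Proficiency interaction that confirms the multiplicative model over the additive one. A minimal sketch of the interaction residual the ANOVA tests (the cell means below are toy numbers built as item value × grade value, not the webinar's data):

```python
# Toy cell means for 3 items x 2 proficiency levels (hypothetical, not the study data).
# Under a purely additive model, every interaction residual
# (cell - row mean - column mean + grand mean) is zero;
# a multiplicative model leaves nonzero residuals, which is what a
# significant Items x Proficiency interaction in the ANOVA detects.

cells = [[2.0, 6.0],    # item 1 rated at 2nd and 10th grade
         [3.0, 9.0],    # item 2
         [4.0, 12.0]]   # item 3 (each row = item value x grade value)

n_rows, n_cols = len(cells), len(cells[0])
grand = sum(sum(row) for row in cells) / (n_rows * n_cols)
row_means = [sum(row) / n_cols for row in cells]
col_means = [sum(cells[i][j] for i in range(n_rows)) / n_rows
             for j in range(n_cols)]

residuals = [[cells[i][j] - row_means[i] - col_means[j] + grand
              for j in range(n_cols)]
             for i in range(n_rows)]

# Nonzero residuals flag a multiplicative (interactive) integration rule.
print(residuals)  # [[1.0, -1.0], [0.0, 0.0], [-1.0, 1.0]]
```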
24. So, What Can We Do with
These Results?
Whether the model is ADDITIVE or
MULTIPLICATIVE, interpreting the results is the
same:
1. A model is confirmed.
2. Raters performed the task consistently and
properly.
3. Marginal means of item ratings can be used
as difficulty estimates on an interval scale.
4. Marginal means of performance level ratings
can be used for setting cutscores or other
purposes.
25. How to Set a Cutscore using IIT
At this point, the process is not very
different from what occurs with other
methods.
It is always a challenge to get from
ratings or judgment data to a
corresponding value on the score
scale.
26. Use Mean Ratings of Items for Each
Proficiency Level
• 2nd Grade = 4.95
– Average Difficulty Rating of 15.05
– Subtract from 20 to reverse the scale
• 10th Grade = 15.47
– Average Difficulty Rating of 4.53
– Subtract from 20 to reverse the scale
Remember that these are
cutscores based on the IIT
rating scale of 0 - 20
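The arithmetic above can be sketched directly, using the slide's own numbers; the 0-20 difficulty scale is reversed so that a higher cutscore corresponds to a higher proficiency level:

```python
# Reverse the 0-20 IIT difficulty scale to get an IIT-scale cutscore per level.
SCALE_MAX = 20

def iit_cutscore(mean_difficulty_rating):
    """Convert a proficiency level's mean item-difficulty rating
    to a cutscore on the reversed 0-20 IIT scale."""
    return SCALE_MAX - mean_difficulty_rating

print(round(iit_cutscore(15.05), 2))  # 2nd grade  -> 4.95
print(round(iit_cutscore(4.53), 2))   # 10th grade -> 15.47
```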
27. Graphical Display of IIT Cutscores
Cutscore for 2nd Grade = 4.95 (20 - avg rating of 15.05)
Cutscore for 10th Grade = 15.47 (20 - avg rating of 4.53)
28. One Conceptual Process for Converting
IIT Ratings to a Score Scale
For a particular IIT ratings-based cutscore,
how many items (or what % of items) have
IIT difficulty ratings below that IIT cutscore?
That number (or %) becomes an equivalent
cutscore on the score scale.
There will likely need to be some adjustments
for error.
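The conceptual conversion just described amounts to a simple count against the item pool. A sketch (the 100-item pool is invented for illustration, as it is on the slides that follow):

```python
import random

# Hypothetical pool of 100 mean item-difficulty ratings on the 0-20 IIT scale
# (invented for illustration; the webinar's pilot used only 3 real items).
random.seed(7)
item_ratings = [random.uniform(0, 20) for _ in range(100)]

def score_scale_cutscore(iit_cut, ratings):
    """Count items whose IIT difficulty rating falls below the IIT cutscore.

    On a 100-item test scored number-correct, that count (equivalently, the
    percent of items) serves as the cutscore on the score scale; some
    adjustment for error would still be needed.
    """
    return sum(1 for r in ratings if r < iit_cut)

print(score_scale_cutscore(4.95, item_ratings))   # 2nd-grade cut: items easier than 4.95
print(score_scale_cutscore(15.47, item_ratings))  # 10th-grade cut: items easier than 15.47
```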
29. Converting IIT Ratings to Score Scale:
Number of Items
[Chart: cumulative frequency distribution of mean ratings for a hypothetical pool of 100 items (instead of only 3). The 10th-grade IIT cutscore of 15.47 intersects the curve at about 80 items.]
30. Converting IIT Ratings to Score Scale:
Number of Items
[Chart: the cumulative frequency distribution of the 100 hypothetical items; the 2nd-grade IIT cutscore of 4.95 intersects the curve at about 7 items.]
31. Other Applications of IIT in Testing
• Besides determining cutscores, where
else do we require ratings or
judgments?
– Item accuracy reviews
– Essay scoring
– Bias reviews (gender, race, age, etc.)
– Item quality (e.g., alignment with
objectives)
– Others?
32. Thank you!
Dr. David Foster
CEO, Caveon Test Security
David.foster@caveon.com
Follow Caveon on Twitter @caveon
Check out our blog: www.caveon.com/blog
LinkedIn Group: “Caveon Test Security”
Editor's Notes
What about other title? Standard Setting for the 21st Century: Using Information Integration Theory to Produce Cut Scores
I will use the acronym IIT throughout the presentation.
These are my personal observations and concerns.
A lot of tinkering and customization goes on; attempts to fine tune.
We have joked about this often.
Sometimes we give statistics that take the place of having to actually review or judge the item.
To be fair, there are some rater reliability statistics, and I’m sure that some raters have been dismissed for various reasons.
Deciding how many individuals from a particular performance level will answer a question correctly is not easy to do.
These problems, and others, have given me a lingering concern about how close the actual cutscore is to some “true” point.
I had already complained about the number of methods.
IIT is new; it hasn’t been tried out in the kind of judgments used for cutscore determination.
IIT may have the foundation that other methods have lacked; what can it hurt to “borrow” a solid method from Cognitive Science?
If successful at helping us to set cutscores, perhaps it can be useful in other areas where we use human judgments.
To understand the basic principle behind IIT, let’s each consider this example.
It is common to combine fruits and dips. We’ve all likely tasted chocolate strawberries or caramel apples. This example illustrates how we combine the values of the fruits with the values we place on the dips to create a unique experience.
IF TIME: Imagine that one of the dips were “motor oil”, how would we rate it combined with these fruits or any fruit?
We are going to have a poll question that asks you to pick your most preferred combination. Because of a technology constraint I can only offer you 5 of the 6 combinations. I dropped the caramel/orange slice combination. Sorry about that.
I expect that had you ranked or rated them all, we would see even greater differences. This is normal for food preferences in humans.
Some combinations were not popular and were obviously rated lower.
There may be some agreement among raters as well.
So how much of what we do in life requires this kind of “integration”?
Perhaps ALL of what we do. Every decision occurs in a context.
So, how does IIT provide a route to understanding our judgments?
Here is a general schematic of the process.
Stimuli (S) in the real world are valued by us due to experience, training, etc., (s). Knowing how this part of the process works is not important for understanding IIT.
Example: How we value a strawberry is personal and comes from experience.
Next comes the Integration function: How do we combine the values of individual stimuli together to get an overall value or I? More on this in a minute.
The amount of I will lead to a response (or choice), which we then perform (R).
STEVE: click at this point to bring up the Cognitive Algebra graphic/text.
There are many ways we can integrate information, and two of the most common ones are:
CLICK: We can add them together, or
CLICK: We multiply them together
Let me give some examples of each.
LEFT: Adults evaluated presidents on statesmanship based on paragraphs of biographical information. (Positive + Negative)
RIGHT: Five-year-old children judged value of a cookie given height and width of the cookie. Should have been multiplicative rule, but was additive instead. (Height + Width, not Height x Width)
CLICK. If additive models are used, then the chart will show parallel straight lines. Here is one example where adults rated Presidential statesmanship.
CLICK. A second example uses 5-year-olds as subjects and had them rate the size of the cookie when given the height and width. Surprisingly, they used the additive model where the multiplicative model would have been more accurate.
LEFT: Adults judged value of lottery ticket given odds of winning and amount (odds X amount)
RIGHT: Adults judged likeableness of a person described by an adverb-adjective phrase (adjective X adverb)
Multiplicative: (Type of Food X Work Schedule)
All pigeons showed similar linear fan results (all were using the multiplicative model).
Each had different preferences for foods, similar to humans, AND demonstrated that preference consistently across work schedules.
POINT: Individual results are meaningful and can be evaluated.
Support for last point: no need to travel; no need for meetings; no discussion of items; no supplying of additional item data
Will likely double or triple the number of ratings to be provided, but still takes less time
IMPORTANT to REMEMBER: In IIT, integrated items are usually presented randomly.
47 as of yesterday had completed the rating of the items. Thank you and I hope you had very little difficulty completing the ratings.
XX as of today.
It’s amazing how many of you were able to do it properly without very good instructions and with new technology. My hat is off to you.
Here are a few individual ratings.
I removed several participants from the analysis for:
Incomplete data
Unusual data (one is shown)
This was an easy task to do and illustrates the need to provide proper instructions more than anything else.
You can see the multiplicative effect in the data.
I had expected an additive model, so either the effect is real, or I introduced some artificial effects:
Strange trio of items
Lousy/brief instructions
Use of non- or almost subject matter experts
We could be seeing a little influence of floor or ceiling effects based on how I set the study up.
Notice how small the probabilities are. Both the main effects and the interaction effects are significant. The significant interaction effect confirms the multiplicative model.
Before moving on to what we can do with these results, I want to show you two examples of programs that used the IIT method to rate items. I don’t have the statistical results, but I do have the graphs.
Certification Test
Data presented at ATP in 2013
Scale is reversed
Shows consistency of ratings using IIT. Shows additive model
Lower-level Nursing exam
Data presented at ATP in 2013
Scale is reversed
Shows the consistency of being able to rate the same items at different proficiency levels. Shows additive model
We completed a “fairly” successful IIT rating study. Now what can we do with the results?
Here is one possible way. There are surely others.
Some “art” and “logic” are applied.
These are not cutscores on the score scale, which could be number correct, a percent correct, or some other scale.
We need to transform these as well as we can.
Here is one way to do it.
Cutscore for 2nd grade is the means of the item ratings for that grade.
We can show this graphically.
I’ve expanded the number of items.
Impossible to illustrate with only 3 items
So, I invented 97 more items
We intend to use the same test for both 2nd and 10th graders.
Conceptual Method
Conceptually, how many items on the test have overall difficulty ratings lower than the ratings-based cutscore?
Create a cumulative frequency distribution (number of items with marginal means at each rating level)
Take Rating cut score vertically to intersect the distribution line
Draw a line horizontally until you have intersected the ordinate.
If you have a range in your rating cutscore you can apply that range as well.
Some of you likely can come up with a more exact method.