Workshop: Item Writing
Objectives for Today
Item writing and development:
Terminology
General Guidelines
Writing CR items
Next week: Examples and reviewing
Objectives of Item Development
1. Build exams that are high quality, legally
defensible, and produce reliable and valid
score interpretations
2. Create items that directly assess the
knowledge and skills in question
3. Minimize (ultimately eliminate) distractions and
undue influence
Terminology
Item: a test ‘question’ or prompt designed to assess
knowledge, skills or abilities
Stem: the part of the item that presents the content
Asset/Stimulus: the part of the item that presents or
exhibits information, might be a reading passage, image,
or audio
Options: the available answers
Key: the correct answer
Distractors: the incorrect options
[Figure: an annotated example item, labeling its stem, asset, options, key, and distractors]
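To make the terminology concrete, here is a minimal sketch (not from the original slides; all names are hypothetical) of how a selected-response item could be represented in code:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Item:
    """One selected-response item, in the vocabulary above (a hypothetical sketch)."""
    stem: str                    # the part that presents the content
    options: List[str]           # the available answers
    key: str                     # the correct answer
    asset: Optional[str] = None  # optional stimulus: passage, image, audio

    @property
    def distractors(self) -> List[str]:
        # Every option that is not the key is a distractor.
        return [o for o in self.options if o != self.key]

item = Item(
    stem="What is the capital of Norway?",
    options=["Oslo", "Bergen", "Stavanger", "Stockholm"],
    key="Oslo",
)
print(item.distractors)  # ['Bergen', 'Stavanger', 'Stockholm']
```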
Terminology
Response: the recorded input of an individual examinee
Rubric: a clearly defined set of criteria (rules) for scoring
a free response item
Convert open responses to a scale of numbers
Can have multiple rubrics on one item
Item Types
There are two major item type categories:
1. Selected Response
Multiple choice
Multiple Response
Drag and drop or matching
2. Constructed Response (aka free or open)
Essay or similar (e.g., math problem)
Short answer or Fill-In-The-Blank (can still be automatically
scored)
Performance tests
Example: Multiple Choice
What is the capital of Norway?
a. Oslo*
b. Bergen
c. Stavanger
d. Stockholm
Example: Multiple Response
Which of the following are cities in Norway?
a. Oslo*
b. Copenhagen
c. Stavanger*
d. Stockholm
(A drag-and-drop version would have students sort the options into two bins: "City in Norway" and "Not a City in Norway.")
Drag and Drop is often the same as Multiple Response!
Example: Scored short answer
James purchased a music album for $8. It was
discounted by 20%. What was the regular price?
(Student would type response in the box;
acceptable answers might be: $10, ten dollars, 10
dollars)
More difficult and higher-fidelity than MC
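(The key here: $8 is 80% of the regular price, so 8 / 0.8 = $10.) A minimal sketch of how such an item might be scored automatically, assuming a simple normalize-and-match rule that the slides do not specify:

```python
import re

ACCEPTABLE = {"10", "ten"}  # canonical forms of the correct answer

def normalize(response: str) -> str:
    """Lowercase, drop '$' and the word 'dollar(s)', collapse whitespace."""
    text = response.lower().replace("$", "")
    text = re.sub(r"\bdollars?\b", "", text)
    return re.sub(r"\s+", " ", text).strip()

def score(response: str) -> int:
    return 1 if normalize(response) in ACCEPTABLE else 0

for r in ["$10", "ten dollars", "10 dollars", "8 dollars"]:
    print(r, "->", score(r))  # 1, 1, 1, 0
```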
Example: Fill in the blank
_________ is the capital of Norway.
(Student would type response in the box)
Example: Essay
Write a detailed account of how Oslo became
the capital of Norway. Why was it a good
choice? Provide three reasons to support your
position.
This lends itself to rubrics:
Position? 0/1 points
Reasons? 0/1/2/3 points
Historical accuracy? 0/1/2 points
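As an illustration (not from the slides), the three rubrics above could be combined into one item score; the clamping rule here is an assumption:

```python
# Maximum points per rubric, matching the essay example above.
RUBRIC_MAX = {"position": 1, "reasons": 3, "historical_accuracy": 2}

def total_score(ratings):
    """Sum the ratings, clamping each to its rubric's allowed range."""
    return sum(
        min(max(ratings.get(name, 0), 0), max_pts)
        for name, max_pts in RUBRIC_MAX.items()
    )

# Clear position (1), two supporting reasons (2), fully accurate history (2).
print(total_score({"position": 1, "reasons": 2, "historical_accuracy": 2}))  # 5 of 6
```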
Construct-Irrelevant Variance
The enemy of all tests is construct-irrelevant
variance.
Scores should reflect an examinee's knowledge,
skills, or abilities as they relate to the construct of
interest (in this case, course competency)
Measurement of anything else is irrelevant and
unhelpful.
Remember that reliability is approximately unidimensionality
Construct-Irrelevant Variance
Our Goal:
Reduce construct-irrelevant variance
Construct-Irrelevant Variance
Testing is like a scientific experiment; we want to hold all
variables constant except the variable of
interest.
Guidelines for Item Writing
The following are some generally applicable
guidelines, regardless of test purpose or item
type
Validity: Remember original purpose
What is the goal of the test?
Show minimal subject mastery?
Show mastery at a range of levels?
Differentiate the top students?
Identify students that need remediation?
Clear Information
Be clear and concise in the item’s content:
Provide all information that is needed
Do not provide extraneous or superfluous
information unless a distractor
Make sure formatting is as clear as possible
Utilize Blueprints!
Make sure that the content of items maps to the blueprint as directly as possible
Record rationale/source/etc.
Essential link in the validity chain of evidence
Think like an examinee
While writing an item, it’s important to
think like an examinee.
Very important but often overlooked:
Quality distractors
What is Appropriate Difficulty?
Write items of appropriate difficulty…
While a very difficult item might be correct and actually quite “good,” it might not serve the purposes of a test of minimal competence
Some tests call for a narrow range of difficulty
Some situations call for a wide range, which enhances reliability because it produces more score variance
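As an aside not on the slides, the standard index of difficulty in classical test theory is the item p-value: the proportion of examinees who answer the item correctly. A trivial sketch:

```python
def p_value(responses):
    """Classical item difficulty: the proportion of examinees scored correct.

    `responses` is a list of 0/1 scores, one per examinee; higher p = easier item.
    """
    return sum(responses) / len(responses)

print(p_value([1, 1, 0, 1, 1, 0, 1, 1, 0, 1]))  # 0.7 -- 7 of 10 answered correctly
```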
Rationale and Source
Whenever possible, record the rationale
and source or reference for finding the
correct answer
For example:
“Answer B is correct because the stem says
_____; C and D would not have an effect and A
would actually counteract because _____.”
“Found on Page 125 of Jackson 2013 Text”
Maintain Grammar
If a question mark completes the stem,
options should be formatted as stand-alone
phrases. No end punctuation is needed.
What is the capital of Norway?
a. Oslo*
b. Bergen
c. Stavanger
d. Stockholm
Maintain Grammar
If the stem does not end with punctuation, the
options should complete the stem’s sentence.
The capital of Norway is
a. Oslo.*
b. Bergen.
c. Stavanger.
d. Stockholm.
Maintain Grammar
Capitalize appropriately – proper nouns
require capitalization, but otherwise it is
generally unnecessary.
Washington, D.C. is the _____ of the United
States.
a. capital*
b. largest city
c. primary port
d. southernmost city
How to Write Items
1. Identify a relevant situation or a piece of
necessary knowledge that you’d like to
evaluate. Consult your Blueprint.
2. Browse textbooks, references, and sources
that are relevant to the exam – generate ideas!
3. Determine how to structure the item
Correct answer
Distractors!
How to Write Items
Best Practices in quality control for Multiple
Choice questions:
Ensure that the key is truly correct
Check that distractors are fully incorrect, but
plausible
Review the stem to make sure all necessary
information is presented
Make sure that the “question” part of the stem is
clear and indicates the type of response necessary
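Some of this checklist can be automated; a minimal sketch (structural checks only, with hypothetical names) follows. Whether the key is truly correct and the distractors plausible but fully incorrect still requires human review.

```python
def qc_problems(item):
    """Return structural problems found in a multiple choice item (a sketch)."""
    problems = []
    if item["key"] not in item["options"]:
        problems.append("key is not among the options")
    if len(set(item["options"])) != len(item["options"]):
        problems.append("duplicate options")
    if not item["stem"].strip():
        problems.append("empty stem")
    return problems

item = {
    "stem": "What is the capital of Norway?",
    "options": ["Oslo", "Bergen", "Stavanger", "Stockholm"],
    "key": "Oslo",
}
print(qc_problems(item))  # [] -- no structural problems found
```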
How to Write Items
Examples and counterexamples of these
specific guidelines will be covered in the next
workshop (Item Review)
Constructed Response Items
Goal:
Connect complex responses and real-life
situations to reliable scores
How to Write Items: CR
Examples of constructed response items:
Solving a practical problem (high fidelity)
Proposing solutions with explanations
Creating a solution within certain parameters
Essays (argumentative or creative)
Synthesizing information
How to Write Items: CR
Guidelines for constructed response items:
Determine the topic for the item
Establish the scenario
Determine all necessary information
Reduce/eliminate unnecessary information
(unless a distractor!)
Think of the steps, write the item, answer it
yourself as a student
Scoring CR Items
Scoring of CR items can be difficult due to their
complexity
Remember that the most interesting item in the
world does no good if there is no way to score it
accurately!
If possible, link scoring to the algorithm of problem solving
Weight steps by difficulty or criticality:
e.g., forgetting to round as the last step or to provide units,
versus using incorrect information from the scenario
Scoring CR Items
Approaches to CR scoring (keep in mind
while writing)
Score on process: Did the student complete each step?
Score on results: Did they reach the correct answer?
Score on both
Scoring CR Items
Ways to convert a CR item to points
Rubrics
Points for errors/completions
Points for answers or multiple answers
These make your life easier and standardize the
scoring, making it more reliable
Rubrics
Rubrics are very helpful
A set of rules to convert open responses to
score points
Rubric/Criteria: What you are rating
Rating scale: Axis, with point levels
Descriptors: Examples of what each level means
Rubrics
Identify axes (often driven by curriculum)
Establish relevant point levels (can differ)
Establish descriptors
Revisit point levels; make sure each is
observable or isolatable
Rubrics
Some examples of rubrics, with dos and don’ts
Rubrics also have their own set of
statistics – inter-rater reliability, agreement,
read-behinds, etc.
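For instance, the simplest of those statistics, the exact-agreement rate between two raters, might be computed like this (a sketch; chance-corrected indices such as Cohen's kappa are also commonly reported):

```python
def exact_agreement(rater_a, rater_b):
    """Proportion of responses on which two raters assigned identical scores."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

# Two raters scoring the same six essays on a 0-3 rubric.
print(exact_agreement([3, 2, 2, 0, 1, 3], [3, 2, 1, 0, 1, 3]))  # 0.833...
```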
Using multiple answers
Provide a complex scenario and ask the student to
list every piece of information they would
need to solve it (e.g., there are 5)
3 points for each correct
-3 for each missing
-3 for each supplied that is not correct
Note: You could earn -15!
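A sketch of that scoring rule (the function and set names are hypothetical): +3 for each correct element listed, -3 for each required element missed, -3 for each incorrect element supplied.

```python
def score_listing(supplied, required):
    """Apply the rule above: +3 per correct, -3 per missing, -3 per incorrect extra."""
    correct = supplied & required   # listed and actually needed
    missing = required - supplied   # needed but not listed
    extra = supplied - required     # listed but not needed
    return 3 * len(correct) - 3 * len(missing) - 3 * len(extra)

required = {"a", "b", "c", "d", "e"}  # the 5 needed pieces of information
print(score_listing({"a", "b", "c", "d", "e"}, required))  # 15: all correct
print(score_listing(set(), required))                      # -15: all missing
```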
Performance Testing
Deductions or additions due to criticality
Example: 100 possible points
Cutscore = 80
-7 for minor error
-14 for moderate error
-21 for critical error (e.g., safety)
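Using the slide's numbers, such a deduction scheme might look like the following sketch (the zero floor is an assumption):

```python
# Deduction per error, by criticality (numbers from the slide's example).
DEDUCTIONS = {"minor": 7, "moderate": 14, "critical": 21}
MAX_POINTS = 100
CUTSCORE = 80

def performance_score(errors):
    """Start from the maximum and subtract a deduction per observed error."""
    return max(MAX_POINTS - sum(DEDUCTIONS[e] for e in errors), 0)

score = performance_score(["minor", "moderate"])
print(score, "PASS" if score >= CUTSCORE else "FAIL")  # 79 FAIL
```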
Performance Testing
Interestingly, Performance
Testing still lacks a true
psychometric theory
Readings
Haladyna, T. M., Rodriguez, M. C., &
Downing, S. M. (2013). Developing and
validating test items. New York: Routledge.
Downing, S. M., & Haladyna, T. M. (Eds.). (2006). The
handbook of test development.
Lots of free resources on the internet (ASC has
an item-writing guide…)
Question and Answer

Editor's Notes

  • #8 We will start with examples of fixed-response questions.
  • #9 This is a multiple choice question; only one answer is correct. The next example is a multiple response question.
  • #10 This is a multiple response item; two options are correct (but we follow the same question–answer format).
  • #11 This is a multiple choice question in format 2.
  • #12 This is a multiple choice question in format 2.
  • #13 This would be an essay question – asking examinees to put together a lengthy response.