Workshop: Item Writing
Objectives for Today
Item writing and development:
Terminology
General Guidelines
Writing CR items
Next week: Examples and reviewing
Objectives of Item Development
1. Build exams that are high quality, legally
defensible, and produce reliable and valid
score interpretations
2. Create items that directly assess the
knowledge and skills in question
3. Minimize (ultimately eliminate) distractions and
undue influence
Terminology
Item: a test ‘question’ or prompt designed to assess
knowledge, skills or abilities
Stem: the part of the item that presents the content
Asset/Stimulus: the part of the item that presents or
exhibits information, might be a reading passage, image,
or audio
Options: the available answers
Key: the correct answer
Distractors: the incorrect options
[Figure: an annotated example item, labeling its stem, asset, options, key, and distractors]
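To make the terminology concrete, here is a minimal sketch (not from the original slides; all names are hypothetical) of how a selected-response item could be represented in code:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Item:
    """One selected-response item, in the vocabulary above (a hypothetical sketch)."""
    stem: str                    # the part that presents the content
    options: List[str]           # the available answers
    key: str                     # the correct answer
    asset: Optional[str] = None  # optional stimulus: passage, image, audio

    @property
    def distractors(self) -> List[str]:
        # Every option that is not the key is a distractor.
        return [o for o in self.options if o != self.key]

item = Item(
    stem="What is the capital of Norway?",
    options=["Oslo", "Bergen", "Stavanger", "Stockholm"],
    key="Oslo",
)
print(item.distractors)  # ['Bergen', 'Stavanger', 'Stockholm']
```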
Terminology
Response: the recorded input of an individual examinee
Rubric: a clearly defined set of criteria (rules) for scoring
a free response item
Convert open responses to a scale of numbers
Can have multiple rubrics on one item
Item Types
There are two major item type categories:
1. Selected Response
Multiple choice
Multiple Response
Drag and drop or matching
2. Constructed Response (aka free or open)
Essay or similar (e.g., math problem)
Short answer or Fill-In-The-Blank (can still be automatically
scored)
Performance tests
Example: Multiple Choice
What is the capital of Norway?
a. Oslo*
b. Bergen
c. Stavanger
d. Stockholm
Example: Multiple Response
Which of the following are cities in Norway?
a. Oslo*
b. Copenhagen
c. Stavanger*
d. Stockholm
(A drag-and-drop version would have students sort the options into two bins: "City in Norway" and "Not a City in Norway.")
Drag and Drop is often the same as Multiple Response!
Example: Scored short answer
James purchased a music album for $8. It was
discounted by 20%. What was the regular price?
(Student would type response in the box;
acceptable answers might be: $10, ten dollars, 10
dollars)
More difficult and higher-fidelity than MC
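(The key here: $8 is 80% of the regular price, so 8 / 0.8 = $10.) A minimal sketch of how such an item might be scored automatically, assuming a simple normalize-and-match rule that the slides do not specify:

```python
import re

ACCEPTABLE = {"10", "ten"}  # canonical forms of the correct answer

def normalize(response: str) -> str:
    """Lowercase, drop '$' and the word 'dollar(s)', collapse whitespace."""
    text = response.lower().replace("$", "")
    text = re.sub(r"\bdollars?\b", "", text)
    return re.sub(r"\s+", " ", text).strip()

def score(response: str) -> int:
    return 1 if normalize(response) in ACCEPTABLE else 0

for r in ["$10", "ten dollars", "10 dollars", "8 dollars"]:
    print(r, "->", score(r))  # 1, 1, 1, 0
```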
Example: Fill in the blank
_________ is the capital of Norway.
(Student would type response in the box)
Example: Essay
Write a detailed account of how Oslo became
the capital of Norway. Why was it a good
choice? Provide three reasons to support your
position.
This lends itself to rubrics:
Position? 0/1 points
Reasons? 0/1/2/3 points
Historical accuracy? 0/1/2 points
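As an illustration (not from the slides), the three rubrics above could be combined into one item score; the clamping rule here is an assumption:

```python
# Maximum points per rubric, matching the essay example above.
RUBRIC_MAX = {"position": 1, "reasons": 3, "historical_accuracy": 2}

def total_score(ratings):
    """Sum the ratings, clamping each to its rubric's allowed range."""
    return sum(
        min(max(ratings.get(name, 0), 0), max_pts)
        for name, max_pts in RUBRIC_MAX.items()
    )

# Clear position (1), two supporting reasons (2), fully accurate history (2).
print(total_score({"position": 1, "reasons": 2, "historical_accuracy": 2}))  # 5 of 6
```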
Construct-Irrelevant Variance
The enemy of all tests is construct-irrelevant
variance.
Scores should reflect an examinee's knowledge,
skills, or abilities as they relate to the construct of
interest (in this case, course competency)
Measurement of anything else is irrelevant and
unhelpful.
Remember that reliability is approximately unidimensionality
Construct-Irrelevant Variance
Our Goal:
Reduce construct-irrelevant variance
Construct-Irrelevant Variance
Testing is like a scientific experiment; we want to hold all
variables constant except the variable of
interest.
Guidelines for Item Writing
The following are some generally applicable
guidelines, regardless of test purpose or item
type
Validity: Remember original purpose
What is the goal of the test?
Show minimal subject mastery?
Show mastery at a range of levels?
Differentiate the top students?
Identify students that need remediation?
Clear Information
Be clear and concise in the item’s content:
Provide all information that is needed
Do not provide extraneous or superfluous
information unless a distractor
Make sure formatting is as clear as possible
Utilize Blueprints!
Make sure that the content of items maps to the blueprint as directly as possible
Record rationale/source/etc.
Essential link in the validity chain of evidence
Think like an examinee
While writing an item, it’s important to
think like an examinee.
Very important but often overlooked:
Quality distractors
What is Appropriate Difficulty?
Write items of appropriate difficulty…
While a very difficult item might be correct and actually quite “good,” it might not serve the purposes of a test of minimal competence
Some tests call for a narrow range of difficulty
Some situations call for a wide range, which enhances reliability because it produces more score variance
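As an aside not on the slides, the standard index of difficulty in classical test theory is the item p-value: the proportion of examinees who answer the item correctly. A trivial sketch:

```python
def p_value(responses):
    """Classical item difficulty: the proportion of examinees scored correct.

    `responses` is a list of 0/1 scores, one per examinee; higher p = easier item.
    """
    return sum(responses) / len(responses)

print(p_value([1, 1, 0, 1, 1, 0, 1, 1, 0, 1]))  # 0.7 -- 7 of 10 answered correctly
```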
Rationale and Source
Whenever possible, record the rationale
and source or reference for finding the
correct answer
For example:
“Answer B is correct because the stem says
_____; C and D would not have an effect and A
would actually counteract because _____.”
“Found on Page 125 of Jackson 2013 Text”
Maintain Grammar
If a question mark completes the stem,
options should be formatted as stand-alone
phrases. No end punctuation is needed.
What is the capital of Norway?
a. Oslo*
b. Bergen
c. Stavanger
d. Stockholm
Maintain Grammar
If the stem does not end with punctuation, the
options should complete the stem’s sentence.
The capital of Norway is
a. Oslo.*
b. Bergen.
c. Stavanger.
d. Stockholm.
Maintain Grammar
Capitalize appropriately – proper nouns
require capitalization, but otherwise it is
generally unnecessary.
Washington, D.C. is the _____ of the United
States.
a. capital*
b. largest city
c. primary port
d. southernmost city
How to Write Items
1. Identify a relevant situation or a piece of
necessary knowledge that you’d like to
evaluate. Consult your Blueprint.
2. Browse textbooks, references, and sources
that are relevant to the exam – generate ideas!
3. Determine how to structure the item
Correct answer
Distractors!
How to Write Items
Best Practices in quality control for Multiple
Choice questions:
Ensure that the key is truly correct
Check that distractors are fully incorrect, but
plausible
Review the stem to make sure all necessary
information is presented
Make sure that the “question” part of the stem is
clear and indicates the type of response necessary
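Some of this checklist can be automated; a minimal sketch (structural checks only, with hypothetical names) follows. Whether the key is truly correct and the distractors plausible but fully incorrect still requires human review.

```python
def qc_problems(item):
    """Return structural problems found in a multiple choice item (a sketch)."""
    problems = []
    if item["key"] not in item["options"]:
        problems.append("key is not among the options")
    if len(set(item["options"])) != len(item["options"]):
        problems.append("duplicate options")
    if not item["stem"].strip():
        problems.append("empty stem")
    return problems

item = {
    "stem": "What is the capital of Norway?",
    "options": ["Oslo", "Bergen", "Stavanger", "Stockholm"],
    "key": "Oslo",
}
print(qc_problems(item))  # [] -- no structural problems found
```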
How to Write Items
Examples and counterexamples of these
specific guidelines will be covered in the next
workshop (Item Review)
Constructed Response Items
Goal:
Connect complex responses and real-life
situations to reliable scores
How to Write Items: CR
Examples of constructed response items:
Solving a practical problem (high fidelity)
Proposing solutions with explanations
Creating a solution within certain parameters
Essays (argumentative or creative)
Synthesizing information
How to Write Items: CR
Guidelines for constructed response items:
Determine the topic for the item
Establish the scenario
Determine all necessary information
Reduce/eliminate unnecessary information
(unless a distractor!)
Think of the steps, write the item, answer it
yourself as a student
Scoring CR Items
Scoring of CR items can be difficult due to their
complexity
Remember that the most interesting item in the
world does no good if there is no way to score it
accurately!
If possible, link scoring to the algorithm of problem solving
Weight steps by difficulty or criticality:
e.g., forgetting to round as the last step or to provide units,
versus using incorrect information from the scenario
Scoring CR Items
Approaches to CR scoring (keep in mind
while writing)
Score on process: Did the student complete each step?
Score on results: Did they reach the correct answer?
Score on both
Scoring CR Items
Ways to convert a CR item to points
Rubrics
Points for errors/completions
Points for answers or multiple answers
These make your life easier and standardize the
scoring, making it more reliable
Rubrics
Rubrics are very helpful
A set of rules to convert open responses to
score points
Rubric/Criteria: What you are rating
Rating scale: Axis, with point levels
Descriptors: Examples of what each level means
Rubrics
Identify axes (often driven by curriculum)
Establish relevant point levels (can differ)
Establish descriptors
Revisit point levels; make sure each is
observable or isolatable
Rubrics
Some examples of rubrics, with dos and don’ts
Rubrics also have their own set of
statistics – inter-rater reliability, agreement,
read-behinds, etc.
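For instance, the simplest of those statistics, the exact-agreement rate between two raters, might be computed like this (a sketch; chance-corrected indices such as Cohen's kappa are also commonly reported):

```python
def exact_agreement(rater_a, rater_b):
    """Proportion of responses on which two raters assigned identical scores."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

# Two raters scoring the same six essays on a 0-3 rubric.
print(exact_agreement([3, 2, 2, 0, 1, 3], [3, 2, 1, 0, 1, 3]))  # 0.833...
```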
Using multiple answers
Provide a complex scenario and ask the student to
list every piece of information they would
need to solve it (e.g., there are 5)
3 points for each correct
-3 for each missing
-3 for each supplied that is not correct
Note: You could earn -15!
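A sketch of that scoring rule (the function and set names are hypothetical): +3 for each correct element listed, -3 for each required element missed, -3 for each incorrect element supplied.

```python
def score_listing(supplied, required):
    """Apply the rule above: +3 per correct, -3 per missing, -3 per incorrect extra."""
    correct = supplied & required   # listed and actually needed
    missing = required - supplied   # needed but not listed
    extra = supplied - required     # listed but not needed
    return 3 * len(correct) - 3 * len(missing) - 3 * len(extra)

required = {"a", "b", "c", "d", "e"}  # the 5 needed pieces of information
print(score_listing({"a", "b", "c", "d", "e"}, required))  # 15: all correct
print(score_listing(set(), required))                      # -15: all missing
```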
Performance Testing
Deductions or additions due to criticality
Example: 100 possible points
Cutscore = 80
-7 for minor error
-14 for moderate error
-21 for critical error (e.g., safety)
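Using the slide's numbers, such a deduction scheme might look like the following sketch (the zero floor is an assumption):

```python
# Deduction per error, by criticality (numbers from the slide's example).
DEDUCTIONS = {"minor": 7, "moderate": 14, "critical": 21}
MAX_POINTS = 100
CUTSCORE = 80

def performance_score(errors):
    """Start from the maximum and subtract a deduction per observed error."""
    return max(MAX_POINTS - sum(DEDUCTIONS[e] for e in errors), 0)

score = performance_score(["minor", "moderate"])
print(score, "PASS" if score >= CUTSCORE else "FAIL")  # 79 FAIL
```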
Performance Testing
Interestingly, Performance
Testing still lacks a true
psychometric theory
Readings
Haladyna, T. M., Rodriguez, M. C., &
Downing, S. M. (2013). Developing and
validating test items. New York: Routledge.
Downing, S. M., & Haladyna, T. M. (Eds.). (2006). The
handbook of test development.
Lots of free resources on the internet (ASC has
an item-writing guide…)
Question and Answer

Editor's Notes

  • #8 We will start with examples of fixed-response questions.
  • #9 This is a multiple choice question; only one answer is correct. The next example is a multiple response question.
  • #10 This is a multiple response item; two options are correct (but we follow the same question–answer format).
  • #11 This is a multiple choice question in format 2.
  • #12 This is a multiple choice question in format 2.
  • #13 This would be an essay question – asking examinees to put together a lengthy response.