Tips for effective test design
Isabela Villas Boas
Terminology brush-up
- Evaluation, assessment, and testing
- Types of tests
- Cornerstones of testing: usefulness, validity, reliability, practicality, washback, authenticity, transparency
- Alignment
Evaluation, assessment, and test
Evaluation – concerned with the overall program;
considers all aspects of teaching and learning (Genesee,
2001, as cited in Coombe et al., 2007)
Assessment – a variety of ways of collecting
information on learners’ achievement or ability
Test – a type of assessment tool
Types of tests
- Placement
- Aptitude
- Diagnostic
- Achievement
- Proficiency
Usefulness
Any language test must be
developed with a specific
purpose, a particular group of
test-takers, and a specific
language use in mind (Bachman and
Palmer, 1996, as cited in Coombe et al., 2007).
Validity
The extent to which the test
measures what it purports to
measure – content, construct,
and face validity.
Reliability
Consistency of test scores:
formats and content of questions
and the time given to students to
take the exam must be
consistent. Other things being equal, the more
items on a test, the more reliable it is.
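The link between test length and reliability is usually quantified with the Spearman-Brown prediction formula from classical test theory. The slides do not name this formula, so the sketch below is only an illustration: the function name and the sample reliability of 0.60 are assumptions, and the formula presumes that any added items are comparable in quality to the existing ones.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predict the reliability of a test whose length is multiplied by length_factor.

    reliability   -- reliability of the current test, between 0 and 1
    length_factor -- new length divided by current length (2 means twice as many items)
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


if __name__ == "__main__":
    current = 0.60  # hypothetical reliability of an existing test
    for factor in (1, 2, 3):
        predicted = spearman_brown(current, factor)
        print(f"length x{factor}: predicted reliability {predicted:.2f}")
```

Under these assumptions, doubling a test with a reliability of 0.60 predicts a reliability of 0.75. Each further increase in length adds less, which is where reliability has to be weighed against practicality.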
Practicality
Are teachers able to develop,
administer, and mark the test
within the available time and
with the available resources?
Washback
The effect of testing on teaching
and learning. It can be positive
or negative.
Authenticity
Tests should reflect authentic,
real-life uses of the language and
use authentic or authentic-like
materials as much as possible.
Transparency
The availability of clear, accurate
information to students about
testing: outcomes to be
evaluated, formats used,
weighting of items, time allowed,
grading criteria.
Alignment
[Figure: learning objectives, assessments, and instructional activities in alignment. Source: Eberly Center for Teaching Excellence, Carnegie Mellon University]
Are tests always bad?
You have probably taken or given
bad tests. Why were they bad?
Before writing the test – make a blueprint / test specifications
- Write down all the learning outcomes you want to test.
- Decide how you are going to assess each outcome: what type of item will you use? Multiple-choice? Fill-in-the-blanks? Constructed response? Make sure the items are aligned with the instructional strategies. Balance the types of items.
- Decide how much each test item is going to be worth, based on how much value you want to give to each skill or sub-skill.
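A blueprint of this kind can be kept as a small table of outcomes, item types, and weights. The following is a minimal sketch of one way to turn relative weights into whole-number points. Everything in it is hypothetical and for illustration only: the names `BlueprintRow` and `allocate_points`, the sample outcomes, the weights, and the 100-point total.

```python
from dataclasses import dataclass


@dataclass
class BlueprintRow:
    outcome: str      # learning outcome to be tested
    item_type: str    # e.g. multiple-choice, fill-in-the-blanks, constructed response
    weight: float     # relative value given to this skill or sub-skill


def allocate_points(rows, total_points=100):
    """Split total_points across the rows in proportion to their weights.

    Points are whole numbers; leftover points go to the rows with the
    largest fractional remainders so the total is always exact.
    """
    total_weight = sum(row.weight for row in rows)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    raw = [row.weight / total_weight * total_points for row in rows]
    points = [int(value) for value in raw]  # round down first
    leftover = total_points - sum(points)
    by_remainder = sorted(range(len(rows)), key=lambda i: raw[i] - points[i], reverse=True)
    for i in by_remainder[:leftover]:
        points[i] += 1
    return {row.outcome: pts for row, pts in zip(rows, points)}


if __name__ == "__main__":
    # Hypothetical blueprint for a unit test
    blueprint = [
        BlueprintRow("Read for main idea and details", "information transfer", 3),
        BlueprintRow("Use the simple past in context", "complete the dialogue", 2),
        BlueprintRow("Recognize unit vocabulary", "word bank", 1),
    ]
    for outcome, pts in allocate_points(blueprint).items():
        print(f"{pts:>3} points  {outcome}")
```

Writing the weights down before drafting any items makes it easy to check that the points reflect what was most and least emphasized in class.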
Decisions about what to include on the test
If you teach the four skills, the four skills should
be part of your assessment system. If you have
an oral test, then include the three other skills
on the written test.
Writing as a skill is more effectively assessed
independently, in writing activities (writing for
writing vs. writing for learning).
Decisions about what to include on the test
Balance grammar, language functions, and
vocabulary (grammatical competence,
pragmatic competence, and discourse
competence).
Decisions about what to include on the test
Think about how the test will reflect the types
of activities you do in your classroom.
Think about how the test will affect the
teaching (washback).
Testing listening and reading
Choose texts that are neither too easy nor too
hard for the students; use the i+1 rationale.
Choose text genres that match the genres
presented in the instructional activities.
Testing listening and reading
Try to use texts that are as authentic as
possible.
Include items that test listening and reading for
the main idea and listening and reading for
details (top-down and bottom-up processing).
Testing listening and reading
If the listening or reading is a bit difficult, construct
easier items. If it is easier, construct more
challenging items.
Make sure there aren’t unknown words/structures
in the listening or reading comprehension items.
You are testing the reading of the text, not of the
reading comprehension items!
Testing listening and reading
Go beyond True and False!
Other types of items are:
- Listen and fill in the blanks
- Correct wrong information
- Read and complete a summary
- Listen or read and fill in the chart / fill out a form
(information transfer)
- Number events in the correct order
- Relate the text to a picture
- Listen and draw
- Insert sentences into the reading.
Be careful!
Make sure students can’t answer the questions
just by common sense, without listening or
reading.
Make sure the questions are not ambiguous or
based on subjective interpretation.
Be careful!
Avoid “not stated” options in listening items: students
have no time to process the text and also recall
what was not in it.
Make sure the listening is at a pace that
matches the listening materials used in class.
Be careful!
Arrange the listening items in the same order
that the information appears in the listening.
Only assess the listening and reading strategies
that were taught during the period, as this is an
achievement test, not a proficiency one.
Avoid tricky items!
Grammar, functions, and vocabulary
Whenever possible, contextualize the item: use
a dialogue or a paragraph rather than isolated
sentences (discourse competence).
When you contextualize, choose topics that are
relevant/familiar to students.
Grammar, functions, and vocabulary
Balance selected-response and constructed-
response items. The more advanced the test,
the more constructed-response items it should
have.
Types of vocabulary items
- Fill in the blanks with words from a word
bank; always include at least one extra word.
- Match the word with the sentence it can
complete: always include an extra item on the
right, not on the left, to prevent confusion when
correcting (more numbers than parentheses).
Don’t ask students to match only words and
definitions.
Types of vocabulary items
- Complete the sentences with words; provide
the first letter of each word (no word bank)
- Crossword puzzles
- Multiple-choice
- Odd man out
Types of grammar / language function items
- Multiple-choice
- Editing
- Fill in the blanks in sentences, dialogues, or
paragraphs
- Complete the dialogue
- Answer questions
- Ask questions for the answers provided
- Sentence transformation (from active to passive;
from direct to indirect speech)
- Write sentences based on a chart or graph
You can also integrate grammar and vocabulary
Beyond grammar: language functions
Be careful!
Avoid exceptions / rare cases / grammar
particularities.
Consider what was taught only for recognition and
what was taught for production when you decide
what type of test item to use.
Don’t make your paragraphs and dialogues wordier
than they need to be. You are not testing
reading!
Be careful!
Don’t make the items too mechanical, with
almost the same answer in all sentences.
Stick to your instructional strategies; e.g., if
students didn’t do an exercise mixing two tenses,
don’t include one on the test!
Provide a correction key for reliability purposes:
what should receive partial credit, and what
should not?
Be careful!
Don’t assign too many points to each item.
Distribute your points according to what was
most and least emphasized.
Don’t make your test a test of intelligence
and/or critical thinking rather than a test of
language skills.
Controversial issues
Should timed, one-shot writings (paragraphs and
essays) be part of a test?
Should we provide a glossary of some unknown
words for the reading?
How many times should the listening be played?
Controversial issues
Should we test grammatical terminology?
Should the distractors in multiple-choice
exercises contain grammatical errors?
Should we only use authentic texts on tests,
even at the basic level?