I come not to bury summative assessments but to
praise them
Kathleen Porter-Magee
February 10, 2012
The Northwest Evaluation Association recently surveyed parents and teachers (http://www.nwea.org/sites/www.nwea.org/files/PressReleaseAssessmentPerceptions.pdf) to gauge their support for various types of assessment. The results (http://www.edweek.org/ew/articles/2012/02/08/21tests.h31.html) indicated that just a quarter of teachers find summative assessments "'extremely' or 'very' valuable for determining whether students have a deep understanding of content." By contrast, 67 percent of teachers (and 85 percent of parents) found formative and interim assessments extremely or very valuable.
I can understand why teachers would find formative and interim assessments appealing. After all, teachers generally either create those assessments themselves, or are at least intimately involved with their creation. And they are, therefore, more flexible tools that can be tweaked depending on, for instance, the pace of classroom instruction.

But, while formative and interim assessments are critically important and should be used to guide instruction and planning, they cannot and should not be used to replace summative assessments, which play an equally critical role in a standards-driven system.
Summative assessments are designed to evaluate whether students have mastered knowledge and skills at a particular point in time. For instance, a teacher might give a summative assessment at the end of a unit to determine whether students have learned what they needed to in order to move forward.

Similarly, an end-of-course or end-of-year summative assessment can help determine whether students mastered the content and skills outlined in a state's standards for that grade.

If you believe that we need standards to ensure that all students—regardless of their zip code or socioeconomic status—learn the same essential content and are held to the same standards, then it's essential to have an independent gauge that helps teachers, parents, administrators, and leaders understand where students are not reaching the goals we've set out for them.
Unfortunately, the NWEA survey does not make this clear, opting instead to narrowly define summative assessments only as "state or district-wide standardized tests that measure grade-level proficiency, and end-of-year subject or course exams."

It's hard to imagine many teachers who are going to be enthusiastic about the current "state or district-wide standardized tests" in use, which often include low-quality questions and the results of which typically don't reach teachers until it's too late to do anything with them. And so, by defining summative assessments in the particular rather than the general, the NWEA findings tell us less about how teachers feel about the value of summative assessments writ large, and more about how they feel about the current crop of state tests, which pretty much everyone agrees need significant improvement.

What's more, everyone has a natural bias in favor of the things they create themselves. And so, it's unsurprising that teachers find the assessments that they create and score (in real time) more useful than tests that are created and scored centrally.

Yet, having a set of common standards—whether common to all schools within a state, or common across all states—requires some independent measure of student learning. There needs to be some gauge—for
teachers, administrators, and parents—that helps show whether classroom instruction, materials, and even formative and interim assessments are aligned to the state standards in terms of both content and rigor. And to help teachers and parents understand whether, in the end, students learned the essential content and skills they needed each year.

Of course, shifting the focus from teacher-created assessments to centrally developed state (or even district) assessments is difficult. And many teachers will resist being judged by something they had no hand in creating, and realigning instruction around standards that may look different from what they've taught in their classrooms for years.
In the end, if we want standards-driven reform to work, we need to get summative assessments right. Trading summative assessments for formative assessments isn't an option. They are different tools with very different roles in the system. That means policymakers and education leaders need to do a far better job of soliciting teacher feedback on these assessment tools and they need to focus much more time and attention on delivering high-quality professional development that helps teachers use the data effectively to guide planning, instruction, and formative assessment development. But it also means that teachers in standards-driven schools need to accept that student learning will be measured by something other than the observations and assessments created within the four walls of their schools.
Teacher-Made Assessments
Focus Questions
After reading this chapter, you should be able to answer the
following questions:
1. What are some important steps in planning for assessment?
2. What kinds of teacher-made assessment options are
available?
3. What are some guidelines for constructing good selected-
response assessments?
4. What are the advantages and limitations of selected-response
assessments?
5. What are the advantages and limitations of constructed-
response assessments?
A fool must now and then be right by chance.
—William Cowper
Even a blind squirrel sometimes finds a nut. And a fool is
sometimes right by chance.
But more often, the fool is wrong and the blind squirrel goes
hungry—or ends up feeding a
red-tailed hawk that is far from blind.
No one would ever have accused Leanne Crowder of being a
fool—not only because she
would have smacked you up the side of the head if you did, but
also because she was clearly
smarter than her more average classmates.
But she didn’t always have time to study for the many little
multiple-choice quizzes with
which Mrs. Moskal liked to keep her classes on their toes. Yet
she almost always did well on
these tests.
“How d’ ya do it?” asked Louis, who was trying hard to hang
out with her.
“I guess,” said Leanne. “I do well just by chance.”
Chapter Outline

8.1 Planning for Teacher-Made Tests
    Goals and Instructional Objectives
    Test Blueprints
    Rubrics
    Approaches to Classroom Assessment

8.2 Performance-Based Assessments
    Types of Performance-Based Assessments
    Improving Performance-Based Assessment

8.3 Constructed- and Selected-Response Assessments
    What Are Selected-Response Assessments?
    What Are Constructed-Response Assessments?
    Objective Versus Essay and Short-Answer Tests
    Which Approach to Assessment Is Best?

8.4 Developing Selected-Response Assessments
    Multiple-Choice Items
    Matching Items
    True-False Items
    Interpretive Items

8.5 Developing Constructed-Response Assessments
    Short-Answer Constructed-Response Items
    Essay Constructed-Response Items
    Planning for Assessment

Chapter 8 Themes and Questions
    Section Summaries
    Applied Questions
    Key Terms
“That’s a lie,” said Louis who was academically gifted but not
especially socially intelligent.
He went on to explain that by chance, Leanne might do well
some of the time—as might
any other student in the class. But if chance were the only factor
determining her results,
she should do very poorly most of the time. “If a multiple-
choice item has four options,” he
expounded like a little professor, “and each of them is equally
probable, if you have absolutely
no idea which is correct, on average you should answer
correctly 25% of the time. And you
should be dead wrong three-quarters of the time.”
“I’m dead right three-quarters of the time,” Leanne smirked,
“and I’m not going to any movie
with you.”
It turned out, as Louis eventually discovered, that Leanne had
quickly noticed that Mrs. Moskal’s
test items were so poorly constructed that the clever application
of a handful of guidelines
almost always assured a high degree of success, even if you
only knew a smattering of correct
answers to begin with. For example, Mrs. Moskal made
extensive use of terms like always,
never, everywhere, and entirely in her multiple-choice options;
Leanne knew that these are
almost always false. She also knew that the longest, most
inclusive options are more likely
to be correct than shorter, very specific options. And she was
clever enough to realize that
options that don’t match the question for grammatical or logical
reasons are likely incorrect
—as are silly or humorous options. And options like all of the
above are always correct if two
of the above are correct, and none of the above is more often
incorrect than not.
Mrs. Moskal should have read this chapter!
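Louis's back-of-the-envelope argument, and Leanne's test-wiseness, can be made concrete with a short calculation. The sketch below is purely illustrative: the 20-item quiz length, and the assumption that eliminating two implausible distracters leaves an even chance between the remaining options, are invented for the example rather than taken from the vignette.

```python
from math import comb

def expected_score(n_items: int, p_correct: float) -> float:
    """Expected number of correct answers when each item is answered
    independently with probability p_correct."""
    return n_items * p_correct

def prob_at_least(n_items: int, k: int, p_correct: float) -> float:
    """Binomial probability of answering at least k of n_items correctly."""
    return sum(comb(n_items, i) * p_correct**i * (1 - p_correct)**(n_items - i)
               for i in range(k, n_items + 1))

N_ITEMS = 20          # assumed quiz length
PURE_GUESS = 1 / 4    # four equally plausible options, as Louis describes
TEST_WISE = 1 / 2     # two implausible distracters eliminated (an assumption)

print(expected_score(N_ITEMS, PURE_GUESS))     # 5.0  -> about 25% of the quiz
print(expected_score(N_ITEMS, TEST_WISE))      # 10.0 -> about 50% of the quiz
print(prob_at_least(N_ITEMS, 15, PURE_GUESS))  # ~4e-06: being "dead right"
                                               # 75% of the time by pure chance
                                               # is essentially impossible
```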
8.1 Planning for Teacher-Made Tests
Reading the chapter might have improved Mrs. Moskal’s
construction of teacher-made tests
(as opposed to standardized tests that are commercially
prepared; these are discussed in
Chapter 10).
Reading this chapter might have suggested to Mrs. Moskal that
she should not rely solely on
her memory and intuition when constructing a test, but that she
should begin with a clear
notion of what she is trying to teach. She then needs to decide
on the best ways of deter-
mining the extent to which she has been successful. If her
assessments are to be useful for
determining how well her students have learned (summative
function of tests) and for improv-
ing their learning (formative function of tests), she needs a
clear notion of her instructional
objectives, some detailed test blueprints and perhaps some
rubrics and checklists to help her
evaluate student performances.
Goals and Instructional Objectives
Educational goals are the nation’s, the state’s, the school
district’s, or the teacher’s general
statements of the broad intended outcomes of the educational
process. Instructional objec-
tives are the more specific statements of intended learning
outcomes relative to a lesson, a
unit, or even a course. In most cases, instructional objectives
reflect the broader goals of the
curriculum. Whereas educational goals are often somewhat
vague and idealistic, the most
useful learning objectives for the classroom tend to be very
explicit. Most are phrased in terms
of behaviors that can be taught and learned, and that can be
assessed.
National Learning Goals
The nation’s educational goals, for example, are often detailed
in legislation and regulations.
As we saw in Chapter 1, in the United States, the No Child Left
Behind Act expresses some
very definite aims listed as five distinct goals. These are
summarized in Figure 8.1.
The law, Public Law 107–110, states as its purpose, “To close
the achievement gap with
accountability, flexibility, and choice, so that no child is left
behind.” Among its broad targets
are goals relating to
• Improving academic achievement of those with
disadvantages
• Preparing, training, and recruiting high-quality teachers
and school administrators
• Improving language instruction for those with limited
proficiency in English
• Promoting informed parental choice and expanding
available educational programs
• Increasing accountability and flexibility (NCLB, 2002)
Educational goals of this kind, laudable as they might be, are
not easily reached. In fact, cur-
rent statistics (and common sense) tell us that not a single one
of NCLB’s five goals as stated in
Figure 8.1 has been reached. Nor will any be reached in our
lifetime. It simply isn’t reasonable
to expect, for example, that all learners will become proficient
in reading and mathematics,
nor that all teachers will be highly qualified.
Still, these goals are worthwhile ideals. They tell us in what
general direction we should direct
our efforts so that most, even if not all, learners have a much
higher probability of reaching
the goals. National ideals such as these provide important
guides for state educational goals.
Figure 8.1: NCLB educational goals
The educational goals that are explicit in the No Child Left
Behind Act are lofty ideals that not all
learners can reach. But that the educational machinery is aimed
in their direction may herald some
enormous improvements.
Goal 1: By 2013–2014, ALL students will reach high standards and attain proficiency in reading and mathematics.
Goal 2: ALL limited English proficient students will become proficient in English.
Goal 3: By 2005–2006, ALL students will be taught by highly qualified teachers.
Goal 4: ALL students will be educated in learning environments that are safe and drug-free.
Goal 5: ALL students will graduate from high school.
Source: Based on the No Child Left Behind Act of 2001.
Retrieved April 10, 2013, from
http://www2.ed.gov/policy/elsec/leg/esea02/107-110.pdf.
Common Core State Standards
Virtually all states have published descriptions of criteria that
can be used to assess the extent
to which goals are being met. These are often referred to as
standards. Following a nation-
wide education initiative, many states have adopted identical
standards labeled Common Core
State Standards. These standards describe what students should
know at each grade level,
and for each subject. For example, based on these Common
Core State Standards, the state
of Washington provides explicit learning targets for science at
all levels from kindergarten to
12th grade (McClellan & Sneider, 2009). California, too, is one
of more than 45 states that
have adopted Common Core State Standards (California
Department of Education, 2012).
One intended result of adopting common core standards is to
bring about a realignment of
curricula in different states.
State standards serve as a guide for the broad goals and for the
specific instructional objec-
tives developed by local school jurisdictions and, ultimately, by
classroom teachers (Crawford,
2011). For example, the California core reading standards for
Literature at grade 1 level
specify that students should be able to do the following
(Sacramento County Office of
Education, 2012a):
• Ask and answer questions about key details in a text.
• Retell stories, including important particulars, and
demonstrate understanding of their
central message or lesson.
• Describe characters, settings, and major events in a story
using key details.
• Identify who is telling the story at various points in a text.
• Confirm predictions about what will happen next in a text.
• Compare and contrast the adventures and experiences of
characters in stories.
• Identify words and phrases that suggest feelings or appeal
to the senses.
These are seven of the 10 general objectives listed for grade 1 in
this area. Note that each of
these suggests certain instructional activities. For example, the
last objective—identifying
words and phrases that suggest feelings or appeal to the
senses—leads to a wide range of
instructional possibilities. Teachers might take steps to ensure
that students understand what
emotions are and that they recognize words relating to them;
perhaps direct teaching meth-
ods might be used to inform learners about the human senses;
group activities might encour-
age learners to generate affect-related words; learners might be
asked to search stories for
words and phrases associated with feelings.
An objective such as this even suggests instructional activities
related to other subject areas.
For example, in art classes, students might be asked to draw
facial expressions correspond-
ing to emotional states described in the stories they are reading
in language arts. And in
mathematics, they might be asked to count the number of affect-
linked words or phrases
in different paragraphs or on different pages. And, depending on
relevant mathematics
objectives, they might be encouraged to add these or to subtract
the smaller number from
the larger.
Not only do state standards suggest a variety of instructional
activities, but by the same token,
they serve as indispensable guidelines for the school’s and the
teacher’s instructional objec-
tives. And these are basic to sound educational assessment. In
the same way as the main
purpose of all forms of instruction is to improve learning, so
too, an overriding objective of
assessment is to help learners reach instructional objectives.
Test Blueprints
The best way of ensuring that assessments are directed toward
instructional objectives is
to use test blueprints. As we saw in Chapter 4, these are
basically tables of specifications
for developing assessment instruments. They are typically based
closely on the instructional
objectives for a course or a unit. They may also reflect a list or
a hierarchical arrangement
of relevant intellectual or motor activities such as those
provided by Bloom’s Taxonomy
(described in Chapter 4). Many states provide blueprints for
large-scale testing (Johnstone &
Thurlow, 2012).
Suppose, for example, you are teaching sixth-grade mathematics
in California. California core
standards list detailed objectives at that grade level for five
different areas: ratios and propor-
tional relationships, the number system, expressions and
equations, geometry, and statistics
and probability (SCOE, 2012b). The first of six core standards
for geometry reads as follows:
Find the area of right triangles, other triangles, special
quadrilaterals, and polygons by
composing into rectangles or decomposing into triangles and
other shapes; apply these
techniques in the context of solving real-world and
mathematical problems. (p. 27)
Part of a test blueprint reflecting related learning objectives,
based on Bloom’s Taxonomy,
might look something like that in Table 8.1. Numbers in the
grid indicate the number of test
items for each category. Questions in parentheses are examples
of the sorts of items that
might be used to assess a specific cognitive process with respect
to a given topic. Test blue-
prints of this kind might also include the value assigned to each
type of test item.
Table 8.1 Part of a sample test blueprint for a single geometry objective reflecting Bloom's Revised Taxonomy, cognitive domain

Right triangles
    Remembering: 4 items (e.g., What is the formula for finding the area of a right triangle?)
    Higher processes (applying, analyzing, evaluating, creating): 1 item (e.g., If you were building a house and could have a total of only 80 feet of perimeter wall, which of the following shapes would give you the largest area? Quadrilateral; polygon; square; right-angle triangle; other shape. Prove that your answer is correct.)

Quadrilaterals
    Remembering: 3 items

Other triangles
    Remembering: 3 items
    Understanding: 2 items (e.g., Illustrate how you would find the area of an isosceles triangle by sketching a solution.)
    Higher processes (applying, analyzing, evaluating, creating): 1 item
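A blueprint like the one in Table 8.1 can also be kept as a small data structure so that the planned distribution of items can be checked before any items are written. The following is a minimal, hypothetical sketch in Python: the dictionary layout and the checks are illustrative assumptions, while the topic names, level names, and item counts come from the table.

```python
# Hypothetical representation of the Table 8.1 blueprint: topic -> level -> item count.
blueprint = {
    "Right triangles": {"Remembering": 4, "Understanding": 0, "Higher processes": 1},
    "Quadrilaterals":  {"Remembering": 3, "Understanding": 0, "Higher processes": 0},
    "Other triangles": {"Remembering": 3, "Understanding": 2, "Higher processes": 1},
}

# Total items planned for the section of the test covered by this blueprint.
total_items = sum(sum(levels.values()) for levels in blueprint.values())

# Items per cognitive level: a quick check that the assessment is not
# loaded entirely onto "Remembering".
items_per_level = {}
for levels in blueprint.values():
    for level, count in levels.items():
        items_per_level[level] = items_per_level.get(level, 0) + count

print(total_items)       # 14
print(items_per_level)   # {'Remembering': 10, 'Understanding': 2, 'Higher processes': 2}
```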
There are several other approaches to devising test blueprints.
For example, the blueprint
might list what learners are expected to understand, remember,
or be able to do. In addition,
the most useful blueprints will include an indication of how
many items or questions there
might be for each entry in the list and the test value for each.
Figure 8.2 gives an example of
a checklist blueprint for a unit covering part of the content of
Chapter 2 in this text. (For other
examples of test blueprints, see Tables 4.3 and 4.4 in Chapter
4.)
A blueprint such as that shown in Figure 8.2 is useful for more
than simply organizing and
writing items for a test. It not only serves to guide the
instructor’s efforts, but, if given to
learners, it can also serve to direct their learning. And perhaps
most important, it directs the
attention of both teachers and learners toward the higher levels
of mental activity.
In this connection, it is worth noting that despite teachers’ best
intentions and their most
carefully prepared test blueprints, assessments don’t always
reflect instructional objectives.
For a variety of reasons, including that they are much easier to
assess, the lowest levels of
cognitive activity in Bloom’s Taxonomy (knowledge and
comprehension) are often far more
likely to be tapped by school assessments than are the higher
levels (Badgett & Christmann,
2009). For example, following an analysis of alignment between
instructional objectives and
assessments in food sciences classes, Jideani and Jideani (2012)
report that knowledge- and
comprehension-based assessments predominated. And this was
true even though instructors
intended that their students would go beyond remembering and
understanding—that they
would also learn to apply, to analyze, to evaluate, and to create.
Rubrics
As we saw in Chapter 7, another important tool for assessment
is the rubric. A rubric is a written
guide for assessment. Rubrics are used extensively in
performance assessments where, without
such guides, evaluations are often highly subjective and
unpredictable. Inconsistent assessments
are the hallmark of lack of test reliability. And measures that
are unreliable are also invalid.
Rubrics, like test blueprints, are a guide not only for assessment
but also for instruction. And,
also like blueprints, they are typically given to the learner
before instruction begins. They tell
the student what is important and expected far more clearly than
might be expressed verbally
by most teachers.
Figure 8.2: Checklist test blueprint
A test blueprint for a short-answer test on Chapter 2 of this text.
The instructor might also choose
to indicate the relative value of questions relating to each
objective listed.
Checklist Test Blueprint for a Unit on
Characteristics of Good Testing Instruments
(Unless otherwise stated, there is one item for each question)
Fairness
• know what test fairness means
• be able to give examples of unfair test items
• understand the requirements of NCLB regarding
accommodations for learners with
special needs
Validity
• be able to define validity
• be able to name and explain the difference between each of
the different kinds of validity
• understand how test validity can be improved
Reliability
• understand the importance of test reliability
• know how reliability is calculated
• be able to suggest how reliability can be improved
Table 8.2 is an example of a rubric that might be used for
evaluating an analysis paper at the
sixth-grade level. Developing detailed rubrics of this kind for
every instructional unit simplifies
the teacher’s task enormously. It makes lesson planning
straightforward and clear; it dramati-
cally shortens the amount of time that might otherwise be spent
in planning and developing
assessment instruments; and it is one of the surest ways of
increasing test reliability, validity,
and fairness.
Table 8.2 Rubric for evaluation of an analysis paper
Your analysis paper will be evaluated for each of the following:
Points
1. Purpose clearly stated in two or three sentences 10
2. Information provided to support and justify the purpose 10
3. Relevant information by way of facts, examples, and research
included 20
4. Absence of irrelevant information 5
5. Analysis presented in coherent, logical fashion evident in
paragraphing and sequencing 20
6. Few grammatical and spelling errors (up to 10 points may be
deducted) 0
7. Clear, well-supported conclusions 15
8. High interest level 20
TOTAL 100
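To make the arithmetic of a rubric such as Table 8.2 concrete, the sketch below scores one hypothetical paper against it. The criterion labels and maximum points are taken from the table; the example scores, the handling of the grammar and spelling deduction, and the function itself are illustrative assumptions.

```python
# Maximum points per criterion, taken from Table 8.2. Criterion 6 carries no
# positive points; up to 10 points may be deducted for grammar and spelling.
RUBRIC_MAX = {
    "Purpose clearly stated": 10,
    "Information supports and justifies the purpose": 10,
    "Relevant facts, examples, and research included": 20,
    "Absence of irrelevant information": 5,
    "Coherent, logical paragraphing and sequencing": 20,
    "Clear, well-supported conclusions": 15,
    "High interest level": 20,
}
MAX_GRAMMAR_DEDUCTION = 10

def score_paper(awarded: dict, grammar_deduction: int) -> int:
    """Sum the points awarded on each criterion (capped at its maximum),
    then subtract the grammar/spelling deduction."""
    total = sum(min(awarded.get(criterion, 0), maximum)
                for criterion, maximum in RUBRIC_MAX.items())
    return total - min(grammar_deduction, MAX_GRAMMAR_DEDUCTION)

# Hypothetical scores for one sixth-grade analysis paper.
example = {
    "Purpose clearly stated": 8,
    "Information supports and justifies the purpose": 9,
    "Relevant facts, examples, and research included": 16,
    "Absence of irrelevant information": 5,
    "Coherent, logical paragraphing and sequencing": 17,
    "Clear, well-supported conclusions": 12,
    "High interest level": 15,
}
print(sum(RUBRIC_MAX.values()))                   # 100: the rubric's stated total
print(score_paper(example, grammar_deduction=4))  # 78
```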
Approaches to Classroom Assessment
In Chapter 4, we saw that assessment can serve at least four
different functions in schools.
1. Assessment might be used for placement purposes before or
after instruction—or, some-
times, during instruction (placement assessment).
2. It might assume a helping role when feedback from ongoing
assessments is given to learn-
ers to help them improve their learning and where ongoing
assessments suggest to the
teacher how instructional strategies might be modified
(formative assessment).
3. School assessments often serve to provide a summary of the
learner’s performance and
achievements. These unit- or year-end assessments are usually
the basis for grades and for
decisions affecting future placement (summative assessment).
4. Assessments might also be used to identify problems, to
determine strengths and weak-
nesses, and to suggest possible approaches to remediation
(diagnostic assessment).
Teacher-made assessments, no matter to which of these uses
they are put, can take any
one of several forms. Among them are performance-based
assessments, selected-response
assessments, and constructed-response assessments.
APPLICATIONS:
New Assessment-Related CAEP Standards for Accreditation
of Teacher Preparation Programs
Until July 2013, two organizations were dedicated to ensuring
that teacher preparation programs
graduated highly qualified teachers for the nation’s PK-12
school systems: the National Council
for Accreditation of Teacher Education (NCATE) and the
Teacher Education Accreditation Council
(TEAC). Higher education institutions that had teacher
preparation programs could demonstrate
that they met either NCATE’s or TEAC’s standards for teacher
preparation to attain accreditation.
Accreditation was interpreted as proof of the quality of an
institution’s programs and enhanced its
credibility.
On July 1, 2013, these two organizations became a new entity:
the Council for the Accreditation of
Educator Preparation (CAEP). Their purpose was not just to
merge the two organizations to elimi-
nate duplication of efforts and reduce costs to higher education
institutions: In addition, they set as
their goal:
To create a model unified accreditation system…. CAEP’s goals
should be not only to
raise the performance of candidates as practitioners in the
nation’s PK-12 schools, but
also to raise the stature of the entire profession by raising the
standards for the evi-
dence the field relies on to support its claims of quality. (pp. 2
and 3)
In late August of 2013, the CAEP board of directors will meet
to ratify the standards that teacher
education programs will need to reach if they are to be
accredited. These standards were developed
by a committee whose membership reflected a broad spectrum
of interested parties, from public
school teachers to university deans to state school
superintendents. In addition, the draft standards
were made available for public comment so everyone had an
opportunity to react and contribute.
It is anticipated that teacher preparation programs will have
access to resources regarding the new
CAEP standards by January of 2014.
So what are the ramifications of these new standards for teacher
preparation programs? In terms
of assessment, the following table is a comparison of the old
and new standards associated with
assessment.
The new standards have a clear new emphasis: It is no longer
enough simply to have an assessment
system; now institutions must use assessment data to make
decisions and to evaluate how well they
are doing.
The new CAEP standards also recognize the importance of
having multiple tools for assessment
and of collecting data beyond the confines of the institution.
When the standards are approved by
the CAEP board of directors, teacher preparation programs will
need to offer proof that they solicit
information from schools and communities to inform their
practices. This will encourage close con-
tact between teacher preparation institutions and the systems
that hire their graduates and increase
responsiveness to the needs of the schools. Finally, the new
CAEP standards suggest that teacher
preparation programs should follow their graduates into the
schools to collect data on their perfor-
mance as teachers. Teacher preparation programs will be
charged with providing evidence that their
candidates can “walk the talk.”
It will be interesting to see how this new accreditation process
plays out. One initial purpose for
pursuing the consolidation of the two accrediting agencies was
to reduce the financial burden
teacher preparation programs incurred when seeking national
accreditation. Will CAEP with its
revised standards accomplish this goal? Or will the revisions
require teacher education programs to
expand the role of the assessment process, thereby increasing its
cost?
8.2 Performance-Based Assessments
Performance-based assessments are covered in detail in Chapter
7 and are summarized
briefly here.
Types of Performance-Based Assessments
Basically, a performance-based assessment is one that asks the
student to perform a task
or produce something, often in a situation that approximates a
real-life setting as closely as
possible. Among the most common performance assessments are
developmental assess-
ments, demonstrations, exhibitions, and portfolios.
Performance-based assessments are
often referred to as authentic assessments, although the
expressions are not synonymous.
A performance assessment is judged to be authentic to the
extent that it asks the student
to perform in ways that are closer to the requirements of actual
performances in day-to-day
settings.
NCATE 2008
Standard 2: Assessment System and Unit Evaluation
The unit has an assessment system that collects and analyzes data on applicant qualifications, candidate and graduate performance, and unit operations to evaluate and improve the performance of candidates, the unit, and its programs.

TEAC 2009
1.5 Evidence of valid assessment
The program must provide evidence regarding the trustworthiness, reliability, and validity of the evidence produced from the assessment method or methods that it has adopted.

CAEP 2013
2. Data drive decisions about candidates and programs
This standard addresses CAEP's expectations regarding data quality and use in program improvement. The education preparation provider (EPP) must provide evidence that it has a functioning quality control system that is effective in supporting program improvement. Its quality control system must draw on valid and reliable evidence from multiple sources.
2.1 Decisions are based on evidence from multiple measures of candidates' learning, completers' performance in the schools, and school and community conditions and needs.
2.2 The education preparation provider has a system for regular self-assessment based on a coherent logic that connects the program's aims, content, experiences and assessments.
2.3 The reliability and validity of each assessment measure are known and adequate, and the unit reviews and revises assessments and data sources regularly and systematically.
2.4 The education preparation provider uses data for program improvement and disaggregates the evidence for discrete program options or certification areas.
A performance assessment might ask a fine arts student to
prepare an exhibition of paintings for display in a school caf-
eteria as a basis for a final mark; a physical education student
might be graded on a demonstration of different sports-
related skills in competitive situations combined with scores
on written tests; and part of a language arts student’s final
grade might be based on a portfolio that contains samples of
written work spanning the school year.
It is true that many of the instructional objectives related to
these three situations can be assessed with non-performance-
based, teacher-made instruments. However, a teacher-made
test that is not performance-based is unlikely to reveal very
clearly how well Lenore can select, organize, and present an
art exhibition or how Robert is likely to perform during the
pressure of athletic competition. Nor is a single, year-end cre-
ative writing test likely to say nearly as much about Elizabeth’s
writing skills as does her yearlong collection of representative
compositions. Also, because many performance assessments
do not require high levels of verbal skills, they are exception-
ally well suited for use in early grades or during the preschool
period, as well as for some children with special needs.
Improving Performance-Based Assessment
Performance-based assessments have a number of limita-
tions and drawbacks. First, they can be very time-consuming,
especially when they involve individual performances, each of
which must be evaluated.
Second, performance-based assessments are not always very
practical, particularly when they
require special equipment or locations—both of which might be
the case for assessments in
areas that require performances such as public speaking or
competitive sports activities.
Third, despite the argument that they are more authentic,
performance-based assessments
tend to have much lower reliability. And, because of that fact,
they may often be less valid
and less fair.
However, there are ways of improving performance-based
assessments. One way to improve
their reliability is to use carefully designed rubrics and
checklists. Wesolowski (2012) suggests
that these need to make the assessment process as objective as
possible. A rubric should be
designed so that different evaluators who base their assessments
on the same rubric will
arrive at very similar scores.
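One practical way to check that a rubric is doing this job is to have two raters score the same set of performances and compare the results. The following is a minimal, hypothetical check rather than a procedure prescribed in this chapter: it computes the Pearson correlation and the average point gap between two raters' rubric totals, with the scores themselves invented for illustration.

```python
from statistics import mean

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two raters' scores for the same performances."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical rubric totals (out of 100) given by two raters to eight
# student performances scored with the same rubric.
rater_1 = [78, 85, 62, 90, 71, 55, 88, 67]
rater_2 = [75, 88, 60, 92, 70, 58, 85, 70]

agreement = pearson_r(rater_1, rater_2)
avg_gap = mean(abs(a - b) for a, b in zip(rater_1, rater_2))

print(round(agreement, 3))  # close to 1.0 suggests the rubric yields consistent scores
print(round(avg_gap, 1))    # average disagreement, in rubric points
```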
There is evidence that the usefulness of performance-based
assessments can be greatly
improved through additional teacher training and development.
This can be accomplished
by means of workshops that emphasize the use of rubrics,
checklists, and rating scales. Koh
(2011) looked at the results of teacher participation in such
workshops. He reports that these
teacher development activities resulted in significant
improvements in teachers’ understand-
ing of performance assessments and in the usefulness of the
assessments and the rubrics they
designed.
Wavebreak Media/Thinkstock
▲ Because they are closer to real-life situ-
ations, performance-based assessments
are often described as more authentic
assessments. Some of the most impor-
tant learning targets associated with
the music class to which this student
belongs cannot easily be assessed with a
selected-response test. The test is in the
performance.
Performance assessments can also be improved by using a
variety of creative and highly
motivating approaches. Schurr (2012) provides numerous
suggestions in a book that lists
more than 60 different ways of using performances to assess
student learning. For example,
students might be asked to write blog or journal entries as
though they were actually part
of Lewis and Clark’s company of explorers, or as though they
were soldiers in Napoleon’s
army or members of Queen Isabella’s court. The book also
includes suggestions for designing
rubrics. It includes examples as well as a list of online
resources for performance assessments.
Figure 8.3 summarizes some of the guidelines that might be
used to ensure that performance-
based assessments are as reliable, valid, and fair as possible.
8.3 Constructed- and Selected-Response Assessments
Test items are the basic units that make up an assessment. These
are often referred to as test
questions, although many assessment items are not questions at
all; instead, they are direc-
tions, instructions, or requests.
Some teacher-made assessments include several different kinds
of items. Often, however,
they are made up of a single sort of item. Test items can
generally be divided into two broad
categories: those that ask students to select a correct answer,
termed selected-response
assessments, and those that require examinees to produce
(construct) the correct response,
usually in writing but also sometimes orally. These are referred
to as constructed-response
assessments.
Figure 8.3: Improving performance-based assessment
Of these suggestions, probably the most important for
increasing the reliability, validity, and
fairness of performance-based assessments is the use of
carefully designed scoring rubrics and
checklists.
Suggestions for
Improving
Performance-Based
Assessments
• When possible, use a variety of different performance
assessments.
• Use carefully constructed rubrics and checklists.
• Assess performances that reflect clear learning
targets.
• Design performance tasks that closely approximate
real-life settings.
• Select tasks that are interesting, motivating, and
challenging.
• Assess behaviors that can be taught and learned and
where improvement can be demonstrated through
performance.
• Take steps to ensure that students understand what is
expected and the criteria upon which they will be
assessed.
• Develop performance assessments that are practical
within budget and time constraints.
• Direct assessments toward important rather than
trivial learning targets.
What Are Selected-Response Assessments?
Selected-response items are generally considered to be more
objective than constructed-
response items, simply because each item usually has a single
clearly correct answer. In most
cases, if more than one response is correct, that is taken into
account in scoring. As a result,
answer keys for assessments made up of selected-response items
tend to be simple and exact.
No matter which examiner scores a selected-response
assessment, results should be identical.
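That objectivity is easy to see if the answer key is treated as a simple lookup: given the same key and the same responses, any scorer, human or machine, arrives at the same total. The snippet below is a hypothetical illustration of such scoring, not an excerpt from any actual testing program; the key and the responses are invented.

```python
# A selected-response answer key: item number -> correct option letter.
ANSWER_KEY = {1: "d", 2: "c", 3: "c", 4: "b", 5: "a"}

def score(responses: dict) -> int:
    """Count responses that match the key exactly. Because there is nothing
    to interpret, every scorer applying the same key gets the same result."""
    return sum(1 for item, correct in ANSWER_KEY.items()
               if responses.get(item, "").strip().lower() == correct)

# Hypothetical student responses.
print(score({1: "d", 2: "c", 3: "a", 4: "b", 5: "e"}))  # 3
```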
There are four principal kinds of selected-response items:
1. Multiple-choice items ask students to select which of several
alternatives is the correct
response to a statement or question.
2. True-false items, also called binary-choice items, ask the
responder to make a choice
between two alternatives, such as true or false.
3. Matching-test items present two or more corresponding lists,
from which the examinee
must select those that match.
4. Interpretive items are often similar to multiple-choice items,
except that they provide
information that examinees need to interpret in order to select
the correct alternative.
Information may be in the form of a chart, a graph, a paragraph,
a video, or an audio
recording.
What Are Constructed-Response Assessments?
Constructed-response items are more subjective than selected-
response items, because they
ask learners to generate their own responses. As a result, they
often have more than one cor-
rect answer.
Test makers distinguish between two broad forms of
constructed-response items, based
largely on the length of the answer that is required. Thus there
are short-answer items
requiring brief responses—often no longer than a single
paragraph—and essay items that
ask the student to write a longer, essay-form response for the
item. Figure 8.4 summarizes
these distinctions.
Objective Versus Essay and Short-Answer Tests
The selected-response (objective) items and the more subjective essay and short-answer items shown in Figure 8.4 can both be used to measure almost
any significant aspect of stu-
dents’ behavior. It is true, however, that some instructional
objectives are more easily assessed
with one type of item than with the other. The most important
uses, strengths, and limita-
tions of these approaches are described here. While the
descriptions can serve as a guide in
deciding which to use in a given situation, most good
assessment programs use a variety of
approaches:
1. It is easier to tap higher level processes (analysis, synthesis,
and evaluation) with an essay
examination. These can more easily be constructed to allow
students to organize knowl-
edge, to make inferences from it, to illustrate it, to apply it, and
to extrapolate from it.
Still, good multiple-choice items can be designed to measure
much the same things as
constructed-response items. Consider, for example, the
following multiple-choice item:
Harvey is going on a solo fishing and camping trip in the far
north. What equipment
and supplies should he bring?
a. rainproof tent; rainproof gear; fishing equipment; food
b. an electric outboard motor; a dinner suit; a hunting rifle
c. some books; a smart phone; fishing equipment; money
*d. an ax; camping supplies; fishing equipment; warm,
waterproof clothing
Answering this item requires that the student analyze the
situation, imagine different sce-
narios, and apply previously acquired knowledge to a new
situation. In much the same
way, it is possible to design multiple-choice items that require
that students synthesize
ideas and perhaps even that they create new ones.
As we saw, however, the evidence indicates that most selected-
response assessments
tend to tap remembering—the lowest level in Bloom’s
Taxonomy. Most items simply ask
the student to name, recognize, relate, or recall. Few classroom
teachers can easily create
items that assess higher cognitive processes.
2. Because essay and short-answer exams usually consist of only a few items, the range of skills and of information sampled is often less than what can be sampled with the more objective tests. Selected-response assessments permit coverage of more content per unit of testing time.

Figure 8.4: Types of assessment items
As this chart indicates, some tests include more than one type of assessment item.
Types of Assessment Items
    Selected-Response (More Objective): multiple choice, true-false, matching, interpretive
    Constructed-Response (Less Objective): short-answer, essay
3. Essay examinations allow for more divergence. They make it
possible for students to pro-
duce unexpected and unscripted responses. Those who do not
like to be limited in their
answers often prefer essays over more objective assessments.
Conversely, those who
express themselves with difficulty when writing often prefer
selected-response assess-
ments. However, Bleske-Rechek, Zeug, and Webb (2007) found
that very few students
consistently do better on one type of assessment than another.
4. Constructing an essay examination is considerably easier and
less time-consuming than
making up an objective examination. In fact, an entire test with
an essay format can
often be written in the same time it would take to write no more
than two or three good
multiple-choice items.
5. Scoring essay examinations usually requires much more time
than scoring objective tests,
especially when classes are large. This is especially true when the objective tests can be scored electronically. When classes are very small, however, the time required
for making and scoring an
essay test might be less than that required for making and
scoring a selected-response
test. The hypothetical relationship between class size and total
time for constructing and
scoring constructed-response and selected-response tests is
shown in Figure 8.5.
6. As Brown (2010) reports, the reliability of essay
examinations is much lower than that of
objective tests, primarily because of the subjectivity involved in
scoring them. In addition,
suggests Brown, examiners often overemphasize the language
aspects of the essays they
are scoring. As a result, they pay less attention to the content,
and the validity of the
grades suffers.
Some researchers have begun to develop computer programs
designed to score con-
structed-response test items. Typically, however, use of these is
limited to questions where
acceptable responses are highly constrained and easily
recognizable (e.g., Johnson, Nadas,
& Bell, 2010; McCurry, 2010).
Figure 8.5: Construction and scoring time: Essays versus
objective assessments
A graph of the hypothetical relationship between class size and
total time required for construct-
ing and scoring selected-response tests (multiple-choice, for
example) and constructed-response
tests (essay tests). As shown, preparation and scoring time for
essay tests increases dramatically
with larger class size, but it does not change appreciably for
machine-scored objective tests.
Axes: number of students, from low to high (horizontal); total time for construction and scoring, from low to high (vertical). Plotted curves: Essay Assessment and Objective Assessment.
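The trade-off sketched in Figure 8.5 can be written as a simple time model. The minutes assumed below for constructing each kind of test and for scoring each student's paper are invented solely to reproduce the shape of the curves and to show where a break-even class size might fall.

```python
# Invented, illustrative time estimates (in minutes); they are not data from
# the chapter, only a way to reproduce the shape of the curves in Figure 8.5.
ESSAY_CONSTRUCTION = 30          # writing a few essay prompts is relatively quick
ESSAY_SCORING_PER_STUDENT = 15   # reading and grading each essay is slow

MC_CONSTRUCTION = 240            # writing many good multiple-choice items is slow
MC_SCORING_PER_STUDENT = 0.1     # machine or key-based scoring is nearly free

def total_time(construction: float, per_student: float, class_size: int) -> float:
    return construction + per_student * class_size

# Smallest class size at which the objective test becomes the cheaper option overall.
break_even = next(n for n in range(1, 1000)
                  if total_time(MC_CONSTRUCTION, MC_SCORING_PER_STUDENT, n)
                  <= total_time(ESSAY_CONSTRUCTION, ESSAY_SCORING_PER_STUDENT, n))

for n in (5, break_even, 60):
    essay = total_time(ESSAY_CONSTRUCTION, ESSAY_SCORING_PER_STUDENT, n)
    mc = total_time(MC_CONSTRUCTION, MC_SCORING_PER_STUDENT, n)
    print(f"class of {n:3d}: essay {essay:6.1f} min, objective {mc:6.1f} min")
# With these made-up numbers the break-even point is a class of about 15; beyond
# that, the essay curve keeps climbing while the objective curve barely moves.
```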
Which Approach to Assessment Is Best?
The simple answer is, it depends. Very few teachers will ever
find themselves in situations where
they must always use either one form of assessment or the
other. Some class situations, particu-
larly those in which size is a factor, may lend themselves more
readily to objective formats; in
other situations, essay formats may be better; sometimes a
combination of both may be desir-
able. The important point is that each form of assessment has
advantages and disadvantages.
A good teacher should endeavor to develop the skills necessary
for constructing the best items
possible in a variety of formats without becoming a passionate
advocate of one over the other.
The good teacher also needs to keep in mind that there are many
alternatives to assess-
ment other than the usual teacher-made or commercially
prepared standardized tests. Among
these, as we saw earlier, are the great variety of approaches to
performance assessment. In
the final analysis, the assessment procedure chosen should be
determined by the goals of the
instructional process and the purposes for which the assessment
will be used.
Nor are teachers always entirely alone when faced with the task
of constructing (or select-
ing) assessment instruments and approaches—even as teachers
are not entirely on their own
when they face important decisions about curriculum,
objectives, or instructional approaches.
Help, support, and advice are available from many sources,
including other teachers, adminis-
trators, parents, and sometimes even students. In many schools,
formal associations, termed
professional learning communities (PLCs), are an extremely
valuable resource (see In the
Classroom: Professional Learning Communities [PLCs] ).
Table 8.3 shows how different types of assessment might be
used to tap learning objectives
relating to Bloom’s revised taxonomy (discussed in Chapter 4).
Note that the most common
I N T H E C L A S S R O O M :
Professional Learning Communities (PLCs)
A professional learning community (PLC) is a grouping of
educators, both new and experi-
enced—and sometimes of parents as well—who come together
to talk about, reflect on, and
share ideas and resources in an effort to improve curriculum,
learning, instruction, and assessment
(Dufour, 2012).
Professional learning communities are formal organizations
within schools or school systems. They
are typically established by principals or other school leaders
and are geared toward establishing
collaboration as a basis for promoting student learning. PLCs
are characterized by
• Supportive and collaborative educational
leadership
• Sharing of goals and values
• Collaborative creativity and innovation
• Sharing of personal experiences
• Sharing of instructional approaches and
resources
• Sharing of assessment strategies and applications
• A high degree of mutual support
Evidence suggests that professional learning communities are a
powerful means of professional
development and support (Brookhart, 2009; Strickland, 2009).
They are also a compelling strategy for educational change and
improvement.
Constructed- and Selected-Response Assessments Chapter 8
assessments for higher mental processes
such as analyzing, evaluating, and creating
are either constructed-response or perfor-
mance-based assessments. However, as we
see in the next section, selected-response
assessments such as multiple-choice tests
can also be designed to tap these processes.
Table 8.3 What assessment approach to use

Remembering
    Verbs (students are asked to): copy, duplicate, list, learn, replicate, imitate, memorize, name, order, relate, reproduce, repeat, recognize, . . .
    Useful approaches: selected-response assessments including multiple-choice, true-false, matching, and interpretive

Understanding
    Verbs: indicate, know, identify, locate, recognize, report, explain, restate, review, describe, distinguish, . . .
    Useful approaches: selected-response assessments that require the learner to locate, identify, recognize, . . . ; constructed-response assessments including short-answer and longer essay items where students are asked to explain, describe, compare, . . .

Applying
    Verbs: demonstrate, plan, draw, outline, dramatize, choose, sketch, solve, interpret, operate, do, . . .
    Useful approaches: written constructed-response assessments where students are required to describe prototypes or simulations showing applications; performance assessments where learners demonstrate an application, perhaps by sketching or dramatizing it

Analyzing
    Verbs: calculate, check, categorize, balance, compare, contrast, test, differentiate, examine, try, . . .
    Useful approaches: written assessments requiring comparisons, detailed analyses, advanced calculations; performance assessments involving activities such as debating or designing concept maps

Evaluating
    Verbs: assess, choose, appraise, price, defend, judge, rate, calculate, support, criticize, predict, . . .
    Useful approaches: written assessments requiring judging, evaluating, critiquing; performance assessments using portfolio entries reflecting opinions, reflections, appraisals, reviews, etc.

Creating
    Verbs: arrange, write, produce, make, design, formulate, compose, construct, build, generate, craft, . . .
    Useful approaches: written assignments perhaps summarizing original research projects; performance assessments involving original output such as musical compositions, written material, designs, computer programs, etc.
▶ Professional learning communities (PLCs) are
organized groups of educators who meet regu-
larly to reflect and collaborate on improving
curriculum, learning, instruction, and assess-
ment. Such groups are a powerful strategy for
educational improvement.
iStockphoto/Thinkstock
8.4 Developing Selected-Response Assessments
As we noted, selected-response assessments tend to be more
objective than constructed-
response assessments. After all, most of them have only one
correct answer.
Multiple-Choice Items
Among the most common of the highly objective selected-
response assessments is that con-
sisting of multiple-choice items. These are items that have a
stem—often a question or an
incomplete statement—followed by a series of possible
responses referred to as alterna-
tives. There are usually four or five alternatives, only one of
which is normally correct; the
others are termed distracters.
On occasion, some multiple-choice tests may contain more than
one correct alternative.
These, as Kubinger and Gottschall (2007) found, are usually
more difficult than items with
a single correct answer, providing responders are required to
select all correct alternatives
for the item to be marked correct. The researchers created a
multiple-choice test where any
number of the five alternatives might be correct. These test
items were more difficult than
identical items that had only one correct answer, because
guessing was now much less likely
to lead to a correct response. If responders did not recognize the
correct answers and tried to
guess which they might be, they would not know how many
alternatives to select.
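A quick calculation shows why guessing becomes so much less rewarding under that format. The sketch below compares blind guessing on a conventional five-option item with blind guessing when any number of the five alternatives may be correct and all of them must be selected; the assumption that a guesser chooses one of the possible answer patterns at random is an illustrative simplification, not something reported by the researchers.

```python
# Five alternatives, exactly one of which is correct: a blind guess succeeds
# one time in five.
p_single = 1 / 5

# Five alternatives, any number of which may be correct, and the item counts
# only if the responder marks exactly the right set. Assuming the guesser
# picks one of the 2**5 - 1 = 31 non-empty selections at random, a blind
# guess succeeds one time in 31.
p_select_all = 1 / (2 ** 5 - 1)

print(p_single)                 # 0.2
print(round(p_select_all, 3))   # ~0.032: roughly six times less likely to succeed
```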
Multiple-choice stems and alternatives can take a variety of
forms. Stems might consist of
questions, statements requiring completion, or negative
statements. Alternatives might be
best answer, combined answers, or single answers. Examples of
each of these items are
shown in Figure 8.6.
Guidelines for Constructing Multiple-Choice Items
Writing good multiple-choice items requires attention to a
number of important guidelines.
Many of them involve common sense (which makes them no
less valid):
1. Both stems and alternatives should be clearly worded,
unambiguous, grammatically cor-
rect, specific, and at the appropriate level of difficulty. In
addition, stems should be clearly
meaningful by themselves. Compare, for example, the following
two items:
A. In the story The Red Rat, how did Sally feel toward Angela
after her accident?
a. sad
b. angry
c. jealous
d. confused
B. In the story The Red Rat, how did Sally feel toward Angela
after Angela’s accident?
a. sad
b. angry
c. jealous
d. confused
The problem with the first stem is that the pronoun her has an
ambiguous referent. Does
the question refer to Sally’s accident or Angela’s? The second
stem corrects that error.
Similarly, stems that use the word they without a specific
context or reference are some-
times vague and misleading. For example, the true-false
question “Is it true that they say
you should avoid double negatives?” might be true or false,
depending who they is. If they
refers to most authors of assessment textbooks, the correct
answer is true. But if they
refers to the Mowats, who lived back in my isolated neck of the
woods, the correct answer
would be false: They didn’t never say don’t use no double
negatives!
2. But seriously, don’t use no double negatives when writing
multiple-choice items. They
are highly confusing and should be avoided at all costs. Are
single negatives highly rec-
ommended? Not. Common, easily found examples of double and
even triple negatives
include combinations like these:
It is not unnecessary to pay attention—meaning, simply, “It is
necessary to pay
attention.”
It is not impossible to pay attention—meaning, “It is possible to
pay attention.”
Figure 8.6: Examples of multiple-choice items
Stems and alternatives can take a variety of forms. In these
examples, the alternatives are always
ordered alphabetically or numerically. This is a precaution
against establishing a pattern that might
provide a clue for guessing the correct response. (Correct
responses are checked.)
Incomplete Statement Stem
1. The extent to which a test appears to
measure what it is intended to measure
defines
___ a. construct validity
___ b. content validity
___ c. criterion-related validity
___ d. face validity
___ e. test reliability
Question Stem
2. Who is the theorist most closely
associated with the development of
operant conditioning?
___ a. Bandura
___ b. Pavlov
___ c. Skinner
___ d. Thorndike
___ e. Watson
Negative Statement Stem
3. Which of the following is NOT a
dinosaur?
___ a. allosaurus
___ b. brachiosaurus
___ c. stenogrosaurus
___ d. triceratop
___ e. velociraptor
Best Answer Alternative
4. What was the main motive for Britain
entering WWII?
___ a. economics
___ b. fear
___ c. greed
___ d. hatred
___ e. loyalty
Combined Answer Alternative
5. Order the following from largest to smallest in geographic area:
1. Brazil   2. Canada   3. China   4. Russia   5. United States
___ a. 1, 2, 3, 4, 5
___ b. 1, 3, 5, 4, 2
_X_ c. 4, 2, 5, 3, 1
___ d. 4, 5, 2, 1, 3
___ e. 2, 4, 5, 3, 1

Single Answer Alternative
6. What is the area of a 20 foot by 36 inch rectangle?
___ a. 16 square feet
___ b. 20 square feet
___ c. 56 square feet
_X_ d. 60 square feet
___ e. 720 square feet
The switch is not disabled—meaning, “The switch is
functioning.”
It is impossible to not do something illegal—meaning,
strangely, “It is not possible to do
something legal.”
For lack of no other option—meaning very little. If we lack no
other option, there must
be another option. No?
Test makers need to be especially wary of negative prefixes
such as un–, im–, dis–, in–,
and so on.
3. Unless the intention is clearly to test memorization, test
items should not be taken word
for word from the text or other study materials. This is
especially the case when instruc-
tional objectives involve application, analysis, or other higher
mental processes.
4. Create distracters that seem equally plausible to students who don’t know the correct
answer. Otherwise, answering correctly might simply be a matter of eliminating highly
implausible distracters. Consider the following example of a poor item:
A. 10 + 12 + 18 =
a. 2
b. 2,146
c. 40
d. 1
For students who don’t know how to calculate the correct answer, highly implausible
distracters that can easily be eliminated may dramatically increase the score-inflating
effects of guessing, as the sketch following Table 8.4 illustrates.
5. Unintentional cues should be avoided. For example, ending the stem with a or an often
provides a cue, as in the following example:
A. A pachyderm is an
a. cougar
b. dog
c. elephant
d. water buffalo
6. Avoid the use of strong qualifying words such as never,
always, none, impossible, and
absolutely in distracters. Distracters that contain them are most
often incorrect. On the
other hand, distracters that contain weaker qualifiers such as
sometimes, frequently, and
usually are often associated with correct alternatives. At other
times, they are simply vague
and confusing. Consider, for example:
A. Multiple-choice alternatives that contain strong qualifiers are
a. always incorrect
b. never incorrect
c. usually incorrect
d. always difficult
iStockphoto/Thinkstock
▲ This boy is completing an online, take-home, selected-response test. Perhaps using two
computers allows him to have one searching the web for answers while he completes the
timed test on the other. That is one of the factors that need to be considered in online courses.
As expected, alternatives with strong qualifiers (always and
never) are incorrect, and the
alternative with a weak qualifier (usually) is correct.
Weak qualifiers often present an additional problem: They can
be highly ambiguous. For
example, interpreting the alternative usually incorrect is
difficult because the term is impre-
cise. Does usually in this context mean most of the time? Does
it mean more than half the
time? More than three quarters of the time?
In stems, both kinds of qualifiers also tend to be ambiguous,
and the weaker ones are
more ambiguous than those that are strong. Never usually means
“not ever”—although
it can sometimes be interpreted to mean “hardly ever.” But
frequently is one of those
alarmingly vague terms for whose meaning we have no
absolutes—only relatives. Just how
often is frequently? We don’t really know—which is why the
word fits so well in many of
our lies and exaggerations.
7. Multiple-choice assessments, like all forms of educational
assessment, also need to be
relevant to instructional objectives. That is, they need to
include items that sample course
objectives in proportion to their importance. This is one of the
reasons teachers should use
test blueprints and should make sure they are closely aligned
with instructional objectives.
8. Finally, as we saw in Chapter 2, assessments need to be as
fair, valid, and reliable as pos-
sible. Recall that fair tests are those that:
• Assess material that all learners have had an opportunity
to learn
• Allow sufficient time for all students to complete the test
• Guard against cheating
• Assess material that has been covered
• Make accommodations for learners’ special needs
• Are free of the influence of biases and stereotypes
• Avoid misleading, trick questions
• Grade assessments consistently
Recall, too, that the most reliable and valid assessments tend to
be those based on longer
tests or on a variety of shorter tests where scoring criteria are
clear and consistent. These
guidelines are summarized in Table 8.4.
Table 8.4 Checklist for constructing good multiple-choice items
Yes No Are stems and alternatives clear and unambiguous?
Yes No Have I avoided negatives as much as possible?
Yes No Have I included items that measure more than simple
recall?
Yes No Are all distracters equally plausible?
Yes No Have I avoided unintentional cues that suggest correct
answers?
Yes No Have I avoided qualifiers such as never, always, and
usually?
Yes No Do my items assess my instructional objectives?
Yes No Are my assessments as fair, reliable, and valid as
possible?
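
To put guideline 4 in rough numbers: assuming a four-alternative item and a student who guesses
at random among whatever alternatives he or she cannot rule out, the chance of a lucky correct
answer climbs quickly as implausible distracters are discarded. A minimal sketch:

    # Chance of guessing a 4-alternative multiple-choice item correctly, assuming
    # the student first rules out some implausible distracters and then picks at
    # random among the alternatives that remain.
    def chance_of_correct_guess(alternatives=4, eliminated=0):
        remaining = alternatives - eliminated
        return 1 / remaining

    for eliminated in range(3):
        p = chance_of_correct_guess(4, eliminated)
        print(f"distracters ruled out: {eliminated} -> chance of a correct guess: {p:.0%}")
    # 0 ruled out: 25%; 1 ruled out: 33%; 2 ruled out: 50%

With two of the three distracters as implausible as those in the 10 + 12 + 18 example, a guesser’s
odds double from 25% to 50%.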
Matching Items
The simplest and most common matching-test item is one that
presents two columns of
information, arranged so that each item in one column matches
a single item in the other.
Columns are also organized so that matching terms are
randomly juxtaposed, as shown in
Figure 8.7.
Matching items can be especially useful for assessing
understanding, in addition to remember-
ing. In particular, they assess the student’s knowledge of
associations and relationships. They
can easily be constructed by generating corresponding matching
lists for a wide variety of
items. For example, Figure 8.7 matches people with concepts.
Other possible matches include
historical events with dates; words with definitions; words in
one language with translations
in another; geometric shapes with their names; titles of literary works with names of authors;
historical figures with historical positions or historical events;
names of different kinds of
implements with their uses; and on and on.
The most common matching items present what is termed the
premise column (or some-
times the stem column) on the left and possible matches in what
is called the response
column on the right.
A matching item might have more entries in the response column than in the premise column,
or an equal number in each. From a measurement point of view, one advantage of having more
entries in the response column is that this reduces the possibility of answering correctly when
the student does not know the answer. In the example shown in Figure 8.7, students who know
three of the four correct responses will also get the fourth correct. By the same token, those who
know two of the four will have a 50-50 chance of getting the remaining two correct.
Figure 8.7: Example of a matching-test item
Matching-test items should have very clear instructions. More complex matching items sometimes
allow some responses to be used more than once or not at all.

Instructions: Match the theorists in column A (described in Chapter 2) with the concept in
column B most closely associated with each theorist. Write the number in front of each entry
in column B in the appropriate space after each theorist named in column A. Each term should
be used only once.

A. Theorist              B. Associated Term
Thorndike   __3__        1. Behaviorism
Watson      __1__        2. Classical conditioning
Skinner     __4__        3. Law of Effect
Pavlov      __2__        4. Operant conditioning
Even if a student knew only one response, there would still be a pretty good chance of guessing
one or all of the others correctly. But the more items there are in the response column, the lower
the odds of selecting the correct unknown response by chance.
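
A minimal sketch of the arithmetic behind this claim, assuming a guessing student assigns the
unused responses to the unknown premises at random and uses each response only once:

    from math import perm

    # Probability that a student who already knows `known` of the matches gets
    # every remaining premise right by randomly assigning the unused responses.
    def p_all_remaining_correct(premises, responses, known):
        unknown = premises - known
        return 1 / perm(responses - known, unknown)

    # Equal columns of four, as in Figure 8.7: knowing three guarantees the
    # fourth, knowing two leaves a 50-50 chance, knowing one leaves 1 chance in 6.
    for known in (3, 2, 1, 0):
        print(f"{known} known -> P(all remaining correct) = "
              f"{p_all_remaining_correct(4, 4, known):.3f}")

    # Uneven columns, as in Figure 8.8 (7 premises, 10 responses).
    print(f"7 premises, 10 responses, none known -> "
          f"{p_all_remaining_correct(7, 10, 0):.2e}")

With equal columns, partial knowledge buys a great deal; with three extra responses, as in
Figure 8.8, even a perfect score by pure guessing becomes a roughly one-in-600,000 event.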
Figure 8.8 presents an example of a matching item with more items in the response column than
in the premise column.
Similarly, some matching tests are constructed in such a way that each item in the response
list might be used once, more than once, or not at all. Not only does this approach effectively
eliminate the possibility of narrowing down options for guessing, but it can also require students
to calculate, compare, differentiate, predict, appraise, and so on. All of these activities tap higher
level cognitive skills.
Figure 8.8: Example of a matching-test item with uneven columns
When matching-test items contain more items in the response list than in the premise list,
reliability of the measure increases because the probability of correctly guessing unknown
responses decreases.

Instructions: Match the 21st century world leaders in column A with the country each has led or
currently leads, listed in column B. Write the number in front of each entry in column B in the
appropriate space after each leader in column A. There is only one correct response for each
item in column A.

A. World Leader                     B. Country Led
Luiz Inácio Lula da Silva  __2__    1. Argentina
Mohamed Morsy              __3__    2. Brazil
Al-Assad                   __9__    3. Egypt
Ali Abdullah Saleh         __10__   4. Italy
Kim Jong-un                __5__    5. North Korea
Silvio Berlusconi          __4__    6. Portugal
Mariano Rajoy              __8__    7. Saudi Arabia
                                    8. Spain
                                    9. Syria
                                    10. Yemen
Not all matching-test items are equally good. Consider, for
example, the item shown in Figure
8.9. Note how the instructions are clear and precise: They state
exactly what the test taker
must do and how often each response can be used. But it really
is a very bad item. Entries in
each column are structured so differently that for those with
adequate reading skills, gram-
matical cues make the answers totally obvious. The person who
built this item should have
paid attention to the following guidelines:
1. Items in each column should be parallel. For example, in
Figure 8.7, all items in the premise
column are names of theorists, and all items in the response
column are terms related in
some important way to one of the theories. Similarly, in Figure
8.8, premise entries are all
names of world leaders, and response entries are all countries.
The following is an example
of nonparallel premise items that are to be matched to a
response list of different formulas
for calculating the surface area of different geometric figures:
triangle
square
rectangle
cardboard boxes
circle
The inclusion of cardboard boxes among these geometric
shapes is confusing and
unnecessary.
Test makers must also guard against items that are not
grammatically parallel, as is shown
in Figure 8.9.
Figure 8.9: Example of a poorly constructed matching-test item
To avoid many of the problems that are obvious in this example, simply use complete sentences or
parallel structures in the premise column.

Instructions: Match the statements in column A with the best answer from column B. Write the
number in front of each answer in column B in the appropriate space after each statement in
column A. No answer can be used more than once. One will not be used at all.

A. In the story Pablo’s Chicken                       B. Answers based on Pablo’s Chicken
At the beginning of the story, Pablo lives in __5__   1. angry
Pablo gets very upset when __2__                      2. his dog dies
When Pablo answers the door, his dog bites __3__      3. his mother
Pablo’s mother is very __1__                          4. kitchen scraps
                                                      5. Monterrey
2. All items in the response list should be plausible. This is
especially true if the response list
contains more entries than the premise column. The test is less
reliable when it contains
items that allow students to quickly discard implausible
responses.
3. To increase the reliability of the test, the response column
should contain more items than
the premise column.
4. Limit the number of items to between six and 10 in each
column. Longer columns place
too much strain on memory. Recall from Chapter 3 that our
adult short-term memory is
thought to be limited to seven plus or minus two items. It is
difficult to keep more items
than this in our conscious awareness at any one time.
5. We saw that grammatical structure can sometimes provide unwanted clues in multiple-
choice items. The same can happen in matching-test items, as in Figure 8.9, where
grammatical structure reveals almost all the correct responses. Moreover, the fourth item
in the response column is an implausible response.
6. Directions should be clear and specific. They should stipulate
how the match is to be made
and on what basis. For example, directions for online matching
tests might read: “Drag
each item in column B to the appropriate box in front of the
matching item in column
A.” Similar instructions for a written matching-item test might
specify: “Write the num-
ber in front of each answer in column B in the appropriate space
after each statement in
column A.”
7. Response items should be listed in logical order. Note, for
example, that response columns
in Figures 8.7 through 8.9 are alphabetical. Where response
items are numerical, they
should be listed in ascending or descending order. Doing so
eliminates the possibility of
developing some detectable pattern. It also discourages students
from wasting their time
looking for a pattern.
8. For paper-and-pencil matching items, ensure that lists are
entirely on one page. Having to
flip from one page to another can be time-consuming and
confusing.
Table 8.5 summarizes these guidelines in the form of a
checklist.
Table 8.5 Checklist for constructing good matching items
Yes No Are items in the premise column parallel?
Yes No Are items in the response column parallel?
Yes No Are all response-column items plausible?
Yes No Have I included more response- than premise-column
items?
Yes No Are my lists limited to no more than seven or so
items?
Yes No Have I avoided unintentional cues that suggest correct
matches?
Yes No Are my directions clear, specific, and complete?
Yes No Have I listed response-column items in logical order?
Yes No Are my columns entirely on one page?
Yes No Do my items assess my instructional objectives?
Yes No Are my assessments as fair, reliable, and valid as
possible?
True-False Items
A relatively common form of assessment, often used in the early
grades, is the true-false
item. True-false items typically take the form of a statement
that the examinee must judge as
true or false. But they can also consist of statements or
questions to which the correct answer
might involve choosing between responses such as yes and no or
right and wrong. As a result,
they are sometimes called binary-choice items rather than true-
false items.
True-false test items tend to be popular in early grades because
they are easy to construct, can
be used to sample a wide range of knowledge, and provide a
quick and easy way to look at
the extent to which instructional objectives are being met.
Most true-false items are simple propositions that can be answered true or false (or yes or no).
For example:
Reliability is a measure of the consistency of an assessment. T F
Face validity reflects how closely scores from repeated administrations of the same test
resemble each other. T F
Predictive validity is a kind of criterion-related validity. T F
True-False Assessments to Tap Higher Mental Processes
Answering these true-false questions requires little more than
simple recall. However, it is
possible to construct true-false items that assess other cognitive
skills. Consider the following:
Is the following statement true or false?
9 ÷ 3 × 12 + 20 = 56 T F
Responding correctly to this item requires recalling and
applying mathematical operations
rather than simply remembering a correct answer.
Binary-choice items can also be constructed so that the
responder is required to engage in a
variety of higher level cognitive activities such as comparing,
predicting, evaluating, and gen-
eralizing. Figure 8.10 shows examples of how this might be
accomplished in relation to the
Revised Bloom’s Taxonomy.
Note that Figure 8.10 does not include any examples that relate
to creating. Objectives that
have to do with designing, writing, producing, and related
activities are far better tapped by
means of performance-based assessments or constructed-
response items than with the more
objective, selected-response assessments.
Note, too, that although it is possible to design true-false items
that seem to tap higher cogni-
tive processes, whether they do so depends on what the learner
already knows. As we saw
earlier, what a test measures is not defined entirely by the test
items themselves. Rather, it
depends on the relationship between the test item and the
individual learner. Consider, for
example, the Figure 8.10 item that illustrates judging—an
activity described as having to do
with evaluating:
• It is usually better to use a constructed-response test
rather than a true-false test for
objectives having to do with evaluating. T F
One learner might respond, after considering the characteristics
of constructed-response and
true-false tests, by analyzing the requirements of evaluative
cognitive activities and judging
which characteristics of these assessments would be best suited
for the objective. That learn-
er’s cognitive activity would illustrate evaluating.
Another learner, however, might simply remember having read
or heard that constructed-
response assessments are better suited for objectives relating to
evaluating and would quickly
select the correct response. That learner’s cognitive activity
would represent the lowest level
in Bloom’s Taxonomy: remembering.
Figure 8.10: True-false items tapping higher cognitive skills
Although it is possible to design true-false items that do more than measure simple recall, other
forms of assessment are often more appropriate for objectives relating to activities such as
analyzing, evaluating, and creating.

Each level of Bloom’s Revised Taxonomy of Educational Objectives is shown with possible
activities representing that objective and an example of a true-false item reflecting one of those
activities.

Remembering (copy, duplicate, list, learn, replicate, imitate, memorize, name, order, relate,
reproduce, repeat, recognize . . .)
[Picture of a spider] This is a spider: T F

Understanding (indicate, know, identify, locate, recognize, report, explain, restate, review,
describe, distinguish . . .)
It is correct to say that affect has an affect on most of us. Yes No

Applying (demonstrate, plan, draw, outline, dramatize, choose, sketch, solve, interpret,
operate, do . . .)
For cutting through a trombone you could use either a hacksaw or a rip saw. T F

Analyzing (calculate, check, categorize, balance, compare, contrast, test, differentiate,
examine, try . . .)
The total area of a rectangular house that is 60 feet in one dimension and that contains nothing
other than 3 equal-sized rectangular rooms that measure 30 feet in one dimension is 1800
square feet. T F

Evaluating (assess, choose, appraise, price, defend, judge, rate, calculate, support,
criticize, predict . . .)
It is usually better to use a constructed-response test rather than a true-false test for objectives
having to do with evaluating. T F

Creating (arrange, write, produce, make, design, formulate, compose, construct, build,
generate, craft . . .)
(Creating cannot normally be tested by means of a true-false item.)
Limitations of True-False Assessments
True-false assessments are open to a number of serious
criticisms. First, unless they are care-
fully and deliberately constructed to go beyond black-and-white
facts, they tend to measure
little more than simple recall. And second, because there is a
50% chance of answering any
one question correctly—all other things being equal—they tend
to provide unreliable assess-
ments. If everyone in a class knew absolutely nothing about an
area being tested with true-
false items, and they all simply guessed each answer randomly,
the average performance of
the class would be around 50%.
Nevertheless, the chance of receiving a high mark as a result of
what is termed blind guess-
ing is very low. And the chance of receiving a very poor mark is
equal to that of receiving a
very high one.
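
A minimal sketch of why, assuming a 20-item true-false test (the length here is only for
illustration) and purely random guessing on every item:

    from math import comb

    # Probability of scoring at least `k` out of `items` on a true-false test
    # by blind guessing (each item an independent 50-50 guess).
    items = 20
    def p_at_least(k):
        return sum(comb(items, j) for j in range(k, items + 1)) / 2**items

    print(f"P(at least 50%): {p_at_least(10):.2f}")   # about 0.59
    print(f"P(at least 80%): {p_at_least(16):.4f}")   # about 0.006
    print(f"P(at least 90%): {p_at_least(18):.5f}")   # about 0.0002

And because the distribution is symmetrical, the chance of scoring 20% or lower by guessing is
exactly the same as the chance of scoring 80% or higher.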
Most guessing tends to be relatively educated rather than
completely random. Even if they
are uncertain about the correct answer, many students know
something about the item and
guess on the basis of the information they have and the logic
and good sense at their disposal.
Variations on Binary-Choice Items
In one study, Wakabayashi and Guskin (2010) used an
intriguing approach to reduce the
effect of guessing. Instead of simply giving respondents the
traditional choice of true or false,
they added a third option: unsure. When students were retested
on the same material later,
items initially marked unsure were more likely to have been
learned in the interim and to be
answered correctly on the second test than were incorrect
responses of which the responders
had been more certain.
Another interesting approach that uses true-false items to
understand more clearly the learn-
er’s thinking processes asks responders to explain their choices.
For example, Stein, Larrabee,
and Barman (2008) developed an online test designed to
uncover false beliefs that people
have about science. The test consists of 47 true-false items,
each of which asks responders to
explain the reasons for their choices. As an example, one of the
items reads as follows:
An astronaut is standing on the moon with a baseball in her/his
hand. When the base-
ball is released, it will fall to the moon’s surface. (p. 5)
The correct answer, true, was selected by only 32.8% of 305
respondents, all of whom were
students enrolled in teacher education programs at two different
universities. More reassur-
ingly, 94.4% chose the correct answer (true) for this next item:
A force is needed to change the motion of an object. (p. 5)
The usefulness of this approach lies in the explanations, which
often reveal serious misconcep-
tions. And strikingly, this is frequently the case even when
answers are correct. For example,
more than 40% of respondents who answered this last item
correctly did so for the wrong
reasons. They failed to identify the normal forces (called
reaction forces) that counter the
effects of gravity.
Asking students to explain their choices on multiple-choice
tests might reveal significant gaps
in knowledge or logic. This approach could contribute in
important ways to the use of these
tests for formative purposes. Table 8.6 presents guidelines for
writing true-false items.
Table 8.6 Checklist for constructing good true-false items
Yes No Is the item useful for assessing important learning
objectives?
Yes No Have I avoided negatives as much as possible?
Yes No Have I included items that measure more than simple
recall?
Yes No Is the item absolutely clear and unambiguous?
Yes No Have I avoided qualifiers such as never, always, and
usually?
Yes No Have I avoided a pattern of correct responses?
Yes No Have I balanced correct response choices?
Yes No Have I made my statements as brief as possible?
Yes No Have I avoided trick questions?
Yes No Are my true and false statements of approximately
equal length?
Interpretive Items
Interpretive items present information that the responder needs
to interpret when answer-
ing test items. Although the test items themselves might take
the form of any of the objec-
tive test formats—matching, multiple-choice, or true-false—in
most cases, the material to be
interpreted is followed by multiple-choice questions.
Interpretive material most often takes the form of one or two
written paragraphs. It may also
involve graphs, charts, maps, tables, and video or audio
recordings. Figure 8.11 is an example
of a true-false interpretive item based on a graph. Answering
the items correctly might require
analysis and inference in addition to basic skills in reading
graphs.
Figure 8.12 illustrates a more common form of interpretive test
item. It is based on written
material that is novel for the student. Responding correctly
requires a high level of reading skill
and might also reflect a number of higher mental processes such
as those involved in analyz-
ing, generalizing, applying, and evaluating.
Advantages of Interpretive Items
Interpretive items present several advantages over traditional
multiple-choice items. Most
important, they make it considerably easier to tap intellectual
processes other than simple
recall. Because the material to be interpreted is usually novel,
the student cannot rely on recall
to respond correctly.
A second advantage of interpretive items is that they can be
used to assess understanding
of material that is closer to real life. For example, they can
easily be adapted to assess how
clearly students understand the sorts of tables, charts, maps, and
graphs that are found in
newspapers, on television, and in online sources.
Finally, not only can interpretive test items be used to assess a
large range of intellectual skills,
but they can also be scored completely objectively. This is not
the case for performance-based
assessments or for most constructed-response assessments.
Figure 8.11: Interpretive true-false item based on a graph
Interpretive test items are most often based on written material but can also be based on a variety
of visual or auditory material, as shown here.

[Line graph showing the percentage of adults aged 25 to 34 who were currently married and the
percentage who had never married, for the years 2000 and 2006 through 2009. The vertical axis
(percentage of adults aged 25 to 34) runs from 25 to 60.]

Results of a 2011 U.S. Census Bureau survey are shown above. Based on this graph, indicate
whether each of the following statements is true or false by putting a checkmark in front of
the appropriate letter.
1. The vertical axis indicates the year in which the survey was conducted. T F
2. In 2008, there were more 25- to 34-year-olds who were married than who had
   never married. T F
3. Every year between 2000 and 2009 there were more married than never married
   adults between 25 and 34. T F
4. One hundred percent of people surveyed were either currently married or had
   never married. T F
5. The number of never-married 25- to 34-year-olds increased between the year
   2000 and 2009. T F

Source: U.S. Census Bureau (2011). Retrieved September 12, 2011, from
http://factfinder.census.gov/servlet/STTable?_bm=y&-qr_name=ACS_2009_5YR_G00_S1201&-ds_name=ACS_2009_5YR_G00_&-state=st&-_lang=en
Limitations of Interpretive Items
Interpretive test items do have a number of limitations and
disadvantages, however. Among
them is that interpretive items often rely heavily on reading
skills. As a result, incorrect
responses may reflect reading problems rather than problems in
the intellectual skills being
assessed.
Another disadvantage of interpretive items is the difficulty of
constructing good interpretive
material and related multiple-choice items. Developing
interpretive text or visual representa-
tions is a time-consuming and demanding task. Poorly designed
items tend to measure rec-
ognition and recall, both of which can be assessed by means of
multiple-choice or true-false
formats, which are easier to construct.
Finally, like other objective assessments, interpretive-test items
seldom tap the productive
skills involved in creating. As we noted, performance-based and
constructed-response tests
are more appropriate for tapping the highest levels of cognitive
skills.
Constructing Good Interpretive Items
As is true for all forms of assessment, the best interpretive test
items are those that are rel-
evant to instructional objectives, that sample widely, that tap a
range of mental processes,
and that are as fair, valid, and reliable as possible. Table 8.7 is
a brief checklist of guidelines
for constructing good interpretive items.
Figure 8.12: Interpretive multiple-choice item based on written material
The most common interpretive items are based on written material. One of their disadvantages is
that they are highly dependent on reading skills.

Beavers are hardworking, busy little creatures. Like all mammals, when they are young, the
babies get milk from their mothers. They are warm-blooded, so it’s important for them to keep
from freezing. They do this in the winter by living in houses they build of logs, sticks, and mud.
Beaver houses are called lodges. The entrance to the lodge is far enough under water that it
doesn’t freeze even in very cold weather. Because the walls of the lodge are very thick and the
area in which the family lives is very small, the heat of their bodies keeps it warm.

1. Which of the following best describes mammals?
   a. Mammals are warm blooded and live in lodges.
   b. Mammals need milk to survive.
   c. Some mammals hatch from eggs that the mother lays.
   d. Mammals produce milk for their newborns. *
   e. Mammals live in warm shelters or caves.

2. Beaver lodges stay warm because
   a. They have an underwater entrance that doesn’t freeze.
   b. The living area is small and the walls are thick. *
   c. Beavers are very hardworking and busy.
   d. Beavers are warm-blooded mammals.
   e. Beavers are mammals.

(* indicates the correct response.)
Table 8.7 Checklist for constructing good interpretive items
Yes No Is the item useful for assessing important learning
objectives?
Yes No Is the reading and difficulty level appropriate for my
students?
Yes No Does the item tap more than simple recall?
Yes No Have I avoided questions that are answered literally in
the interpretive material?
Yes No Have I avoided questions that can be answered without
the interpretive material?
Yes No Are multiple-choice items well constructed?
Yes No Is the item absolutely clear and unambiguous?
Yes No Have I included all the information required?
Yes No Are instructions clear?
Yes No Have I avoided unnecessary length?
Yes No Is the interpretive material novel for learners?
Yes No Have I avoided trick questions?
8.5 Developing Constructed-Response Assessments
As the name implies, constructed-response assessments require
test takers to generate their
own responses. Two main kinds of constructed-response
assessments are used in educational
measurement: short-answer items (also referred to as restricted-
response items) and essay
items (also called extended-response items).
The main advantage of constructed-response assessments is that
they lend themselves better
to evaluating higher thought processes and cognitive skills.
Also, they allow for more variation
and more creativity.
When compared with selected-response assessments,
constructed-response assessments
have two principal limitations: First, they usually consist of a
small number of items and there-
fore sample course content less widely; second, they tend to be
less objective than selected-
response assessments, simply because they can seldom be
scored completely objectively.
Short-Answer Constructed-Response Items
Short-answer items sometimes require a response consisting of
only a single word or short
phrase to fill in a blank left in a sentence. These are normally
referred to as completion items
(or fill-in-the-blanks items). At other times, they require a brief
written response—perhaps
only one or two words—that does not fill in a blank space in a
sentence. In contrast, essay
items typically ask for a longer, more detailed written response,
often consisting of a number
of paragraphs or even pages.
Completion items can easily be generated simply by taking any
clear, unambiguous, complete
sentence from a written source and reproducing it with a single
word or phrase left out.
One problem with this approach is that it encourages rote
memorization rather than a more
thoughtful approach to studying. In addition, out-of-context
sentences are often somewhat
ambiguous at best; at worst, they can be completely misleading.
Advantages of Short-Answer Items
Short-answer items have several important advantages:
1. Because they require the test taker to produce a response,
they effectively eliminate much
of the influence of guessing. This presents a distinct advantage
over selected-response
approaches such as multiple-choice and true-false assessments.
2. Because they ask for a single correct answer or a very brief
response, they are highly objec-
tive even though they ask for a constructed response. As a
result, it is a simple matter to
generate a marking key very much as is done for multiple-
choice, true-false, or matching
items.
3. They are easy to construct and can quickly sample a wide
range of knowledge.
Limitations of Short-Answer Items
Among their limitations is that because examiners need to read
every response, short-answer
items can take longer to score than selected-response measures.
Nor is constructing short-
answer items always easy: Sometimes it is difficult to phrase
the item so that only one answer
is correct.
Another limitation has to do with possible contamination of
scores due to bad spelling. If
marks are deducted for misspelled words, what is being
measured becomes a mixture of spell-
ing and content knowledge. But if spelling errors are ignored,
the marker may occasionally
have to guess at whether the misspelled word actually
represents the correct answer.
Finally, because correct responses are usually limited to a
single choice, they don’t allow for
creativity and are unlikely to tap processes such as synthesis
and evaluation.
Examples of Short-Answer Items
General guidelines for the construction of short-answer items
include many of those listed
earlier in Tables 8.4 through 8.7. In addition, test makers need
to ensure that only one answer
is correct and that what is required as a response is clear and
unambiguous. For this reason,
when preparing completion items it is often advisable to place
blanks at the end of the sen-
tence rather than in the middle. For example, consider these two
completion items:
1. In 1972, __________ produced a film entitled Une Belle Fille Comme Moi.
2. The name of the person who produced the 1972 film Une Belle Fille Comme Moi is __________.
Although the correct answer in both cases is François Truffaut,
the first item could also be
answered correctly with the word France. But the structure of
the second item makes the
nature of the required response clear. It’s even clearer to
rephrase the sentence as a question
so that it is no longer a fill-in-the-blanks short-answer item. For
example:
What is the name of the director who produced the 1972 film
Une Belle Fille Comme
Moi?
Although short-answer items generally require only one or two
words as a correct response,
some ask for slightly longer responses. For example:
Define face validity.
What states are contiguous with Nevada? __________, __________, __________, __________,
and __________.
As shown in the second example, five blanks are provided as a clue to the number of responses
required. Providing a single, longer blank would increase the item’s level of difficulty.
Essay Constructed-Response Items
Essay items also require test takers to generate their own
responses. But instead of being
asked to supply a single word or short phrase, they are asked to
write longer responses. The
instructions given to the test taker—sometimes referred to as
the stimulus—can vary in length
and form, but should always be worded so that the student has a
clear understanding of
what is required. Whenever possible, the stimulus should also
include an indication of scoring
guidelines. Without scoring guidelines, responses can vary
widely and increase the difficulty
of scoring an item.
Some essay assessments might ask a simple question that
requires a one- or two-sentence
response. For example:
What is the main effect of additional government spending on
unemployment?
Why is weather forecasting more accurate now than it was in
the middle of the
20th century?
Both of these questions can be answered correctly with a few
sentences. And both can be
keyed so that different examiners scoring them would arrive at
very similar results. Items of
this kind can have a high degree of reliability.
Many essay items ask for lengthier expositions that typically
require organizing information,
marshaling arguments, defending opinions, appealing to
authority, and so on. In responding
to these, students typically have wide latitude both in terms of
what they will say and how
they will say it. As a result, longer essay responses are
especially useful for tapping higher
mental processes such as applying, analyzing, evaluating, and
creating. This is their major
advantage over the more objective approaches.
The main limitation of longer essay assessments has to do with
their scoring. Not only is it
highly time-consuming, but it tends to be decidedly subjective.
As a result, both reliability and
validity of such assessments is lower than that of more
objective measures. For example, some
examiners consistently give higher marks than others (Brown,
2010). Also, because there is
sometimes a sequential effect in scoring essays, essay items that
follow each other are more
likely to receive similar marks than those that are farther apart
(Attali, 2011).
The reliability of essay tests can be increased significantly by
using detailed scoring guides
such as checklists and rubrics. These guides typically specify as
precisely as possible details of
the points, arguments, conclusions, and opinions that will be
considered in the scoring, and
the weightings assigned to each.
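
A minimal sketch of such a guide, with hypothetical criteria, weights, and point values, shows
how a rubric reduces an essay score to a weighted sum that different markers can apply
consistently:

    # A hypothetical weighted rubric for scoring an essay item. Each criterion
    # is rated on a 0-4 scale; the weights reflect its relative importance.
    rubric_weights = {
        "thesis and organization": 0.25,
        "quality of arguments and evidence": 0.40,
        "accuracy of content": 0.25,
        "mechanics (grammar, spelling)": 0.10,
    }

    def score_essay(ratings, max_points=20):
        """Convert per-criterion ratings (0-4) into a total score out of max_points."""
        weighted_fraction = sum(rubric_weights[c] * (r / 4) for c, r in ratings.items())
        return round(weighted_fraction * max_points, 1)

    print(score_essay({
        "thesis and organization": 3,
        "quality of arguments and evidence": 4,
        "accuracy of content": 3,
        "mechanics (grammar, spelling)": 2,
    }))  # 16.5 out of 20

Two markers who agree on the criterion ratings will necessarily agree on the total, which is the
sense in which such guides increase reliability.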
Essay questions can be developed to assess knowledge of
subject matter (remembering, in
Bloom’s revised taxonomy); or they can be designed to tap any
of the higher level intellectual
skills. Figure 8.13 gives examples of how this might be done.
Writing good essay questions is not as time-consuming as
writing objective items such as
multiple-choice questions, but it does require attention to
several guidelines. These are sum-
marized in Table 8.8.
Figure 8.13: Essay items tapping higher intellectual skills
What each item assesses depends on what test takers already know and the strategies they use to
craft their responses. Even responding to the first item, remembering, might require a great deal
of creating, analyzing, evaluating, and understanding if the learner has not already memorized a
correct response.

Each level of Bloom’s Revised Taxonomy of Educational Objectives is shown with possible
activities representing that objective and an example of an essay item reflecting one of those
activities.

Remembering (copy, duplicate, list, learn, replicate, imitate, memorize, name, order, relate,
reproduce, repeat, recognize . . .)
List the five most important events that led to the Second World War.

Understanding (indicate, know, identify, locate, recognize, report, explain, restate, review,
describe, distinguish . . .)
Explain in no more than half a page how an internal combustion engine works.

Applying (demonstrate, plan, draw, outline, dramatize, choose, sketch, solve, interpret,
operate, do . . .)
Using the formula for compound interest, calculate the monthly payment for a $150,000 loan
at 6.7% amortized over 22 years. Explain the implications of lengthening the amortization
period.

Analyzing (calculate, check, categorize, balance, compare, contrast, test, differentiate,
examine, try . . .)
Read the following two paragraphs. Compare the use of figures of speech in each. How are
they similar? How are they different?

Evaluating (assess, choose, appraise, price, defend, judge, rate, calculate, support, criticize,
predict . . .)
Write an essay appraising the American educational system. What are its strengths and
weaknesses? How can it be improved? (Recommended length: 500–1000 words.)

Creating (arrange, write, produce, make, design, devise, formulate, compose, construct, build,
generate, craft . . .)
Devise a procedure with accompanying formulas that can be used to calculate the volume of a
small rock of any shape. Describe each step in the procedure.
Table 8.8 Checklist for constructing good essay items
Yes No Do the essay questions assess intended instructional
objectives?
Yes No Have I worded the stimulus so that the requirements
are clear?
Yes No Am I assessing more than simple recall?
Yes No Have I indicated how much time should be spent on
each item (usually by saying how many
points each item is worth)?
Yes No Have I developed a scoring rubric or checklist?
Planning for Assessment
In Chapter 2, and again, earlier in this chapter, we described the
various steps that make up
an intelligent and effective assessment plan:
1. Know clearly what your instructional objectives are, and
communicate them to your
students.
2. Match instruction to objectives; match instruction to
assessment; match assessment to
objectives.
3. Use formative assessment as an integral part of instruction.
4. Use a variety of different assessments, especially when
important decisions depend on
their outcomes.
5. Use blueprints to construct tests and develop keys, checklists,
and rubrics to score them.
The importance of these five steps can hardly be
overemphasized.
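
As a minimal sketch of step 5, a test blueprint can be as simple as a table that allocates items to
objectives in proportion to their importance; the objectives, weights, and test length below are
hypothetical:

    # A hypothetical test blueprint: distribute 40 items across instructional
    # objectives in proportion to the weight (importance) assigned to each.
    objective_weights = {
        "remembering key terms": 0.20,
        "understanding core concepts": 0.30,
        "applying procedures": 0.30,
        "analyzing and evaluating cases": 0.20,
    }
    total_items = 40

    blueprint = {objective: round(weight * total_items)
                 for objective, weight in objective_weights.items()}

    for objective, items in blueprint.items():
        print(f"{objective}: {items} items")
    # remembering key terms: 8 items, understanding core concepts: 12 items,
    # applying procedures: 12 items, analyzing and evaluating cases: 8 items

The same table, with weights drawn from the course objectives, also makes it easy to check that a
finished test samples those objectives in proportion to their importance.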
Chapter 8 Themes and Questions
Section Summaries
8.1 Planning for Teacher-Made Tests Important steps in
planning for assessment include
clarifying instructional objectives, devising test blueprints,
matching instruction and assess-
ment to goals, developing rubrics and other scoring guides, and
using a variety of approaches
to assessment. Assessment should be used for placement
decisions, for improving teaching
and learning (diagnostic and formative functions), and for
evaluating and grading achieve-
ment and progress (summative function).
 
For this paper, discuss the similarities and differences of the .docx
For this paper, discuss the similarities and differences of the .docxFor this paper, discuss the similarities and differences of the .docx
For this paper, discuss the similarities and differences of the .docx
 
For this paper, discuss the similarities and differences of the impa.docx
For this paper, discuss the similarities and differences of the impa.docxFor this paper, discuss the similarities and differences of the impa.docx
For this paper, discuss the similarities and differences of the impa.docx
 
For this paper choose two mythological narratives that we have exami.docx
For this paper choose two mythological narratives that we have exami.docxFor this paper choose two mythological narratives that we have exami.docx
For this paper choose two mythological narratives that we have exami.docx
 
For this module, there is only one option.  You are to begin to deve.docx
For this module, there is only one option.  You are to begin to deve.docxFor this module, there is only one option.  You are to begin to deve.docx
For this module, there is only one option.  You are to begin to deve.docx
 
For this Major Assignment 2, you will finalize your analysis in .docx
For this Major Assignment 2, you will finalize your analysis in .docxFor this Major Assignment 2, you will finalize your analysis in .docx
For this Major Assignment 2, you will finalize your analysis in .docx
 
For this Final Visual Analysis Project, you will choose one website .docx
For this Final Visual Analysis Project, you will choose one website .docxFor this Final Visual Analysis Project, you will choose one website .docx
For this Final Visual Analysis Project, you will choose one website .docx
 
For this essay, you will select one of the sources you have found th.docx
For this essay, you will select one of the sources you have found th.docxFor this essay, you will select one of the sources you have found th.docx
For this essay, you will select one of the sources you have found th.docx
 
For this discussion, you will address the following prompts. Keep in.docx
For this discussion, you will address the following prompts. Keep in.docxFor this discussion, you will address the following prompts. Keep in.docx
For this discussion, you will address the following prompts. Keep in.docx
 
For this discussion, research a recent science news event that h.docx
For this discussion, research a recent science news event that h.docxFor this discussion, research a recent science news event that h.docx
For this discussion, research a recent science news event that h.docx
 
For this Discussion, review the case Learning Resources and the .docx
For this Discussion, review the case Learning Resources and the .docxFor this Discussion, review the case Learning Resources and the .docx
For this Discussion, review the case Learning Resources and the .docx
 
For this Discussion, give an example of how an event in one part.docx
For this Discussion, give an example of how an event in one part.docxFor this Discussion, give an example of how an event in one part.docx
For this Discussion, give an example of how an event in one part.docx
 
For this discussion, consider the role of the LPN and the RN in .docx
For this discussion, consider the role of the LPN and the RN in .docxFor this discussion, consider the role of the LPN and the RN in .docx
For this discussion, consider the role of the LPN and the RN in .docx
 
For this discussion, after you have viewed the videos on this topi.docx
For this discussion, after you have viewed the videos on this topi.docxFor this discussion, after you have viewed the videos on this topi.docx
For this discussion, after you have viewed the videos on this topi.docx
 
For this discussion choose  one of the case studies listed bel.docx
For this discussion choose  one of the case studies listed bel.docxFor this discussion choose  one of the case studies listed bel.docx
For this discussion choose  one of the case studies listed bel.docx
 
For this assignment, you will use what youve learned about symbolic.docx
For this assignment, you will use what youve learned about symbolic.docxFor this assignment, you will use what youve learned about symbolic.docx
For this assignment, you will use what youve learned about symbolic.docx
 
For this Assignment, you will research various perspectives of a mul.docx
For this Assignment, you will research various perspectives of a mul.docxFor this Assignment, you will research various perspectives of a mul.docx
For this Assignment, you will research various perspectives of a mul.docx
 
For this assignment, you will be studying a story from the Gospe.docx
For this assignment, you will be studying a story from the Gospe.docxFor this assignment, you will be studying a story from the Gospe.docx
For this assignment, you will be studying a story from the Gospe.docx
 
For this assignment, you will discuss how you see the Design Princip.docx
For this assignment, you will discuss how you see the Design Princip.docxFor this assignment, you will discuss how you see the Design Princip.docx
For this assignment, you will discuss how you see the Design Princip.docx
 

Recently uploaded

PSYCHIATRIC History collection FORMAT.pptx
PSYCHIATRIC   History collection FORMAT.pptxPSYCHIATRIC   History collection FORMAT.pptx
PSYCHIATRIC History collection FORMAT.pptxPoojaSen20
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfUmakantAnnand
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 

Recently uploaded (20)

PSYCHIATRIC History collection FORMAT.pptx
PSYCHIATRIC   History collection FORMAT.pptxPSYCHIATRIC   History collection FORMAT.pptx
PSYCHIATRIC History collection FORMAT.pptx
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.Compdf
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 

842014 I come not to bury summative assessments but to prais.docx

  • 5. proficiency, and end-of-year subject or course exams." It's hard to imagine many teachers who are going to be enthusiastic about the current "state or district-wide standardized tests" in use, which often include low-quality questions and the results of which typically don't reach teachers until it's too late to do anything with them. And so, by defining summative assessments in the particular rather than the general, the NWEA findings tell us less about how teachers feel about the value of summative assessments writ large, and more about how they feel about the current crop of state tests, which pretty much everyone agrees need significant improvement. What's more, everyone has a natural bias in favor of the things they create themselves. And so, it's unsurprising that teachers find the assessments that they create and score (in real time) more useful than tests that are created and scored centrally.
  • 6. Yet, having a set of common standards—whether common to all schools within a state, or common across all states—requires some independent measure of student learning. There needs to be some gauge—for teachers, administrators, and parents—that helps show whether classroom instruction, materials, and even formative and interim assessments are aligned to the state standards in terms of both content and rigor. And to help teachers
  • 7. and parents understand whether, in the end, students learned the essential content and skills they needed each year. Of course, shifting the focus from teacher-created assessments to centrally-developed state (or even district) assessments is difficult. And many teachers will resist being judged by something they had no hand in creating, and realigning instruction around standards that may look different from what they've taught in their classrooms for years. In the end, if we want standards-driven reform to work, we need to get summative assessments right. Trading summative assessments for formative assessments isn't an option. They are different tools with very different roles in the system. That means policymakers and education leaders need to do a far better job of soliciting teacher feedback on these assessment tools and they need to focus much more time and attention on
  • 8. delivering high-quality professional development that helps teachers use the data effectively to guide planning, instruction, and formative assessment development. But it also means that teachers in standards-driven schools need to accept that student learning will be measured by something other than the observations and assessments created within the four walls of their schools. Teacher-Made Assessments Focus Questions After reading this chapter, you should be able to answer the following questions: 1. What are some important steps in planning for assessment? 2. What kinds of teacher-made assessment options are available? 3. What are some guidelines for constructing good selected-response assessments? 4. What are the advantages and limitations of selected-response
  • 9. assessments? 5. What are the advantages and limitations of constructed-response assessments? A fool must now and then be right by chance. —William Cowper Even a blind squirrel sometimes finds a nut. And a fool is sometimes right by chance. But more often, the fool is wrong and the blind squirrel goes hungry—or ends up feeding a red-tailed hawk that is far from blind. No one would ever have accused Leanne Crowder of being a fool—not only because she would have smacked you up the side of the head if you did, but also because she was clearly smarter than her more average classmates. But she didn't always have time to study for the many little multiple-choice quizzes with which Mrs. Moskal liked to keep her classes on their toes. Yet she almost always did well on these tests. "How d'ya do it?" asked Louis, who was trying hard to hang out with her.
  • 10. “I guess,” said Leanne. “I do well just by chance.” Teacher-Made Assessments Chapter 8 Chapter Outline 8.1 Planning for Teacher-Made Tests Goals and Instructional Objectives Test Blueprints Rubrics Approaches to Classroom Assessment 8.2 Performance-Based Assessments Types of Performance-Based Assessments Improving Performance-Based Assessment 8.3 Constructed- and Selected- Response Assessments What Are Selected-Response Assessments? What Are Constructed-Response Assessments? Objective Versus Essay and Short- Answer Tests Which Approach to Assessment Is Best?
  • 11. 8.4 Developing Selected-Response Assessments Multiple-Choice Items Matching Items True-False Items Interpretive Items 8.5 Developing Constructed-Response Assessments Short-Answer Constructed-Response Items Essay Constructed-Response Items Planning for Assessment Chapter 8 Themes and Questions Section Summaries Applied Questions Key Terms Planning for Teacher-Made Tests Chapter 8 “That’s a lie,” said Louis who was academically gifted but not especially socially intelligent.
  • 12. He went on to explain that by chance, Leanne might do well some of the time—as might any other student in the class. But if chance were the only factor determining her results, she should do very poorly most of the time. “If a multiple- choice item has four options,” he expounded like a little professor, “and each of them is equally probable, if you have absolutely no idea which is correct, on average you should answer correctly 25% of the time. And you should be dead wrong three-quarters of the time.” “I’m dead right three-quarters of the time,” Leanne smirked, “and I’m not going to any movie with you.” It turned out, as Louis eventually discovered, that Leanne had quickly noticed that Mrs. Moskal’s test items were so poorly constructed that the clever application of a handful of guidelines almost always assured a high degree of success, even if you only knew a smattering of correct answers to begin with. For example, Mrs. Moskal made extensive use of terms like always, never, everywhere, and entirely in her multiple-choice options; Leanne knew that these are almost always false. She also knew that the longest, most inclusive options are more likely to be correct than shorter, very specific options. And she was clever enough to realize that options that don’t match the question for grammatical or logical reasons are likely incorrect —as are silly or humorous options. And options like all of the above are always correct if two of the above are correct, and none of the above is more often incorrect than not.
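Louis's arithmetic is easy to check. The short sketch below is purely illustrative and not part of the chapter; it assumes a hypothetical 20-item quiz with four equally likely options per item and computes how often a pure guesser would reach a given score.

```python
from math import comb

def prob_at_least(k_correct, n_items=20, p_guess=0.25):
    """Probability of guessing at least k_correct answers on n_items
    independent four-option items, each answered correctly with chance p_guess."""
    return sum(
        comb(n_items, k) * p_guess**k * (1 - p_guess)**(n_items - k)
        for k in range(k_correct, n_items + 1)
    )

# Chance alone yields about 25% of 20 items, i.e., roughly 5 correct answers.
print(round(prob_at_least(5), 2))   # about 0.59: reaching the chance-level score is common
print(round(prob_at_least(15), 7))  # about 3.8e-06: scoring 75% by luck alone is essentially impossible
```

In other words, guessing explains an occasional lucky quiz, not a pattern of being "dead right three-quarters of the time"; results like Leanne's point to flawed items rather than luck.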
  • 13. Mrs. Moskal should have read this chapter! 8.1 Planning for Teacher-Made Tests Reading the chapter might have improved Mrs. Moskal’s construction of teacher-made tests (as opposed to standardized tests that are commercially prepared; these are discussed in Chapter 10). Reading this chapter might have suggested to Mrs. Moskal that she should not rely solely on her memory and intuition when constructing a test, but that she should begin with a clear notion of what she is trying to teach. She then needs to decide on the best ways of deter- mining the extent to which she has been successful. If her assessments are to be useful for determining how well her students have learned (summative function of tests) and for improv- ing their learning (formative function of tests), she needs a clear notion of her instructional objectives, some detailed test blueprints and perhaps some rubrics and checklists to help her evaluate student performances. Goals and Instructional Objectives Educational goals are the nation’s, the state’s, the school district’s, or the teacher’s general statements of the broad intended outcomes of the educational process. Instructional objec- tives are the more specific statements of intended learning outcomes relative to a lesson, a unit, or even a course. In most cases, instructional objectives reflect the broader goals of the
  • 14. curriculum. Whereas educational goals are often somewhat vague and idealistic, the most useful learning objectives for the classroom tend to be very explicit. Most are phrased in terms of behaviors that can be taught and learned, and that can be assessed. Planning for Teacher-Made Tests Chapter 8 National Learning Goals The nation’s educational goals, for example, are often detailed in legislation and regulations. As we saw in Chapter 1, in the United States, the No Child Left Behind Act expresses some very definite aims listed as five distinct goals. These are summarized in Figure 8.1. The law, Public Law 107–110, states as its purpose, “To close the achievement gap with accountability, flexibility, and choice, so that no child is left behind.” Among its broad targets are goals relating to • Improving academic achievement of those with disadvantages • Preparing, training, and recruiting high-quality teachers and school administrators • Improving language instruction for those with limited proficiency in English • Promoting informed parental choice and expanding available educational programs
  • 15. • Increasing accountability and flexibility (NCLB, 2002) Educational goals of this kind, laudable as they might be, are not easily reached. In fact, current statistics (and common sense) tell us that not a single one of NCLB's five goals as stated in Figure 8.1 has been reached. Nor will any be reached in our lifetime. It simply isn't reasonable to expect, for example, that all learners will become proficient in reading and mathematics, nor that all teachers will be highly qualified. Still, these goals are worthwhile ideals. They tell us in what general direction we should direct our efforts so that most, even if not all, learners have a much higher probability of reaching the goals. National ideals such as these provide important guides for state educational goals. Figure 8.1: NCLB educational goals. The educational goals that are explicit in the No Child Left Behind Act are lofty ideals that not all learners can reach. But that the educational machinery is aimed in their direction may herald some enormous improvements.
  • 16. Goal 1: By 2013–2014, ALL students will reach high standards and attain proficiency in reading and mathematics. Goal 2: ALL limited English proficient students will become proficient in English. Goal 3: By 2005–2006, ALL students will be taught by highly qualified teachers. Goal 4: ALL students will be educated in learning environments that are safe and drug-free. Goal 5: ALL students will graduate from high school. Source: Based on the No Child Left Behind Act of 2001. Retrieved April 10, 2013, from http://www2.ed.gov/policy/elsec/leg/esea02/107-110.pdf
  • 17. and for each subject. For example, based on these Common Core State Standards, the state of Washington provides explicit learning targets for science at all levels from kindergarten to 12th grade (McClellan & Sneider, 2009). California, too, is one of more than 45 states that have adopted Common Core State Standards (California Department of Education, 2012). One intended result of adopting common core standards is to bring about a realignment of curricula in different states. State standards serve as a guide for the broad goals and for the specific instructional objec- tives developed by local school jurisdictions and, ultimately, by classroom teachers (Crawford, 2011). For example, the California core reading standards for Literature at grade 1 level specify that students should be able to do the following (Sacramento County Office of Education, 2012a): • Ask and answer questions about key details in a text. • Retell stories, including important particulars, and demonstrate understanding of their central message or lesson. • Describe characters, settings, and major events in a story using key details. • Identify who is telling the story at various points in a text. • Confirm predictions about what will happen next in a text. • Compare and contrast the adventures and experiences of
  • 18. characters in stories. • Identify words and phrases that suggest feelings or appeal to the senses. These are six of the 10 general objectives listed for grade 1 in this area. Note that each of these suggests certain instructional activities. For example, the last objective—identifying words and phrases that suggest feelings or appeal to the senses—leads to a wide range of instructional possibilities. Teachers might take steps to ensure that students understand what emotions are and that they recognize words relating to them; perhaps direct teaching meth- ods might be used to inform learners about the human senses; group activities might encour- age learners to generate affect-related words; learners might be asked to search stories for words and phrases associated with feelings. An objective such as this even suggests instructional activities related to other subject areas. For example, in art classes, students might be asked to draw facial expressions correspond- ing to emotional states described in the stories they are reading in language arts. And in mathematics, they might be asked to count the number of affect- linked words or phrases in different paragraphs or on different pages. And, depending on relevant mathematics objectives, they might be encouraged to add these or to subtract the smaller number from the larger. Not only do state standards suggest a variety of instructional
  • 19. activities, but by the same token, they serve as indispensable guidelines for the school’s and the teacher’s instructional objec- tives. And these are basic to sound educational assessment. In the same way as the main purpose of all forms of instruction is to improve learning, so too, an overriding objective of assessment is to help learners reach instructional objectives. Planning for Teacher-Made Tests Chapter 8 Test Blueprints The best way of ensuring that assessments are directed toward instructional objectives is to use test blueprints. As we saw in Chapter 4, these are basically tables of specifications for developing assessment instruments. They are typically based closely on the instructional objectives for a course or a unit. They may also reflect a list or a hierarchical arrangement of relevant intellectual or motor activities such as those provided by Bloom’s Taxonomy (described in Chapter 4). Many states provide blueprints for large-scale testing (Johnstone & Thurlow, 2012). Suppose, for example, you are teaching sixth-grade mathematics in California. California core standards list detailed objectives at that grade level for five different areas: ratios and propor- tional relationships, the number system, expressions and equations, geometry, and statistics and probability (SCOE, 2012b). The first of six core standards
  • 20. for geometry reads as follows: Find the area of right triangles, other triangles, special quadrilaterals, and polygons by composing into rectangles or decomposing into triangles and other shapes; apply these techniques in the context of solving real-world and mathematical problems. (p. 27) Part of a test blueprint reflecting related learning objectives, based on Bloom’s Taxonomy, might look something like that in Table 8.1. Numbers in the grid indicate the number of test items for each category. Questions in parentheses are examples of the sorts of items that might be used to assess a specific cognitive process with respect to a given topic. Test blue- prints of this kind might also include the value assigned to each type of test item. Table 8.1 Part of a sample test blueprint for a single geometry objective reflecting Bloom’s Revised Taxonomy, cognitive domain Topic Remembering Understanding Higher processes (applying, analyzing, evaluating, creating)
  • 21. Right triangles 4 items (e.g., What is the formula for finding the area of a right triangle?) 1 item (e.g., If you were building a house and could have a total of only 80 feet of perimeter wall, which of the following shapes would give you the largest area? Quadrilateral; polygon; square; right-angle triangle; other shape. Prove that your answer is correct.) Quadrilaterals 3 items Other Triangles 3 items 2 items (e.g., Illustrate how you would find the area of an isosceles triangle by sketching a solution.) 1 item There are several other approaches to devising test blueprints. For example, the blueprint might list what learners are expected to understand, remember, or be able to do. In addition, the most useful blueprints will include an indication of how many items or questions there might be for each entry in the list and the test value for each. Figure 8.2 gives an example of a checklist blueprint for a unit covering part of the content of Chapter 2 in this text. (For other examples of test blueprints, see Tables 4.3 and 4.4 in Chapter
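A blueprint like Table 8.1 is, in effect, a small table mapping each (topic, cognitive level) cell to a planned number of items, and a draft test can be tallied against it mechanically. The sketch below is only an illustration: the cell counts are hypothetical (loosely modeled on Table 8.1), and the function and variable names are invented for this example.

```python
# Hypothetical blueprint: (topic, cognitive level) -> planned number of items.
blueprint = {
    ("right triangles", "remembering"): 4,
    ("right triangles", "higher processes"): 1,
    ("quadrilaterals", "remembering"): 3,
    ("other triangles", "remembering"): 3,
    ("other triangles", "understanding"): 2,
    ("other triangles", "higher processes"): 1,
}

# Items written so far for a draft test, tagged with the same labels.
draft_items = [
    ("right triangles", "remembering"),
    ("right triangles", "remembering"),
    ("quadrilaterals", "remembering"),
    ("other triangles", "understanding"),
]

def coverage_report(blueprint, items):
    """For each blueprint cell, report how many items are still needed."""
    return {cell: planned - items.count(cell) for cell, planned in blueprint.items()}

for cell, missing in coverage_report(blueprint, draft_items).items():
    print(cell, "->", "on target" if missing <= 0 else f"{missing} more item(s) needed")
```

Tallying a draft against the blueprint this way also makes it obvious when, as the following paragraph notes, easy-to-write remembering items start to crowd out the higher-level cells.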
  • 22. 4.) Planning for Teacher-Made Tests Chapter 8 A blueprint such as that shown in Figure 8.2 is useful for more than simply organizing and writing items for a test. It not only serves to guide the instructor’s efforts, but, if given to learners, it can also serve to direct their learning. And perhaps most important, it directs the attention of both teachers and learners toward the higher levels of mental activity. In this connection, it is worth noting that despite teachers’ best intentions and their most carefully prepared test blueprints, assessments don’t always reflect instructional objectives. For a variety of reasons, including that they are much easier to assess, the lowest levels of cognitive activity in Bloom’s Taxonomy (knowledge and comprehension) are often far more likely to be tapped by school assessments than are the higher levels (Badgett & Christmann, 2009). For example, following an analysis of alignment between instructional objectives and assessments in food sciences classes, Jideani and Jideani (2012) report that knowledge- and comprehension-based assessments predominated. And this was true even though instructors intended that their students would go beyond remembering and understanding—that they would also learn to apply, to analyze, to evaluate, and to create. Rubrics
  • 23. As we saw in Chapter 7, another important tool for assessment is the rubric. A rubric is a written guide for assessment. Rubrics are used extensively in performance assessments where, without such guides, evaluations are often highly subjective and unpredictable. Inconsistent assessments are the hallmark of lack of test reliability. And measures that are unreliable are also invalid. Rubrics, like test blueprints, are a guide not only for assessment but also for instruction. And, also like blueprints, they are typically given to the learner before instruction begins. They tell the student what is important and expected far more clearly than might be expressed verbally by most teachers. Figure 8.2: Checklist test blueprint. A test blueprint for a short-answer test on Chapter 2 of this text. The instructor might also choose to indicate the relative value of questions relating to each objective listed. Checklist Test Blueprint for a Unit on Characteristics of Good Testing Instruments (unless otherwise stated, there is one item for each question): Fairness • know what test fairness means • be able to give examples of unfair test items • understand the requirements of NCLB regarding
  • 24. accommodations for learners with special needs. Validity • be able to define validity • be able to name and explain the difference between each of the different kinds of validity • understand how test validity can be improved. Reliability • understand the importance of test reliability • know how reliability is calculated • be able to suggest how reliability can be improved.
  • 25. 1. Purpose clearly stated in two or three sentences 10 2. Information provided to support and justify the purpose 10 3. Relevant information by way of facts, examples, and research included 20 4. Absence of irrelevant information 5 5. Analysis presented in coherent, logical fashion evident in paragraphing and sequencing 20 6. Few grammatical and spelling errors (up to 10 points may be deducted) 0 7. Clear, well-supported conclusions 15 8. High interest level 20 TOTAL 100 Approaches to Classroom Assessment In Chapter 4, we saw that assessment can serve at least four different functions in schools. 1. Assessment might be used for placement purposes before or after instruction—or, some- times, during instruction (placement assessment). 2. It might assume a helping role when feedback from ongoing assessments is given to learn- ers to help them improve their learning and where ongoing assessments suggest to the teacher how instructional strategies might be modified (formative assessment).
  • 26. 3. School assessments often serve to provide a summary of the learner’s performance and achievements. These unit- or year-end assessments are usually the basis for grades and for decisions affecting future placement (summative assessment). 4. Assessments might also be used to identify problems, to determine strengths and weak- nesses, and to suggest possible approaches to remediation (diagnostic assessment). Teacher-made assessments, no matter to which of these uses they are put, can take any one of several forms. Among them are performance-based assessments, selected-response assessments, and constructed-response assessments. Planning for Teacher-Made Tests Chapter 8 A P P L I C A T I O N S : New Assessment-Related CAEP Standards for Accreditation of Teacher Preparation Programs Until July 2013, two organizations were dedicated to ensuring that teacher preparation programs graduated highly qualified teachers for the nation’s PK-12 school systems: the National Council for Accreditation of Teacher Education (NCATE) and the Teacher Education Accreditation Council (TEAC). Higher education institutions that had teacher preparation programs could demonstrate that they met either NCATE’s or TEAC’s standards for teacher
  • 27. preparation to attain accreditation. Accreditation was interpreted as proof of the quality of an institution’s programs and enhanced its credibility. On July 1, 2013, these two organizations became a new entity: the Council for the Accreditation of Educator Preparation (CAEP). Their purpose was not just to merge the two organizations to elimi- nate duplication of efforts and reduce costs to higher education institutions: In addition, they set as their goal: To create a model unified accreditation system…. CAEP’s goals should be not only to raise the performance of candidates as practitioners in the nation’s PK-12 schools, but also to raise the stature of the entire profession by raising the standards for the evi- dence the field relies on to support its claims of quality. (pp. 2 and 3) In late August of 2013, the CAEP board of directors will meet to ratify the standards that teacher education programs will need to reach if they are to be accredited. These standards were developed by a committee whose membership reflected a broad spectrum of interested parties, from public school teachers to university deans to state school superintendents. In addition, the draft standards were made available for public comment so everyone had an opportunity to react and contribute. It is anticipated that teacher preparation programs will have access to resources regarding the new CAEP standards by January of 2014.
  • 28. So what are the ramifications of these new standards for teacher preparation programs? In terms of assessment, the following table is a comparison of the old and new standards associated with assessment. The new standards have a clear new emphasis: It is no longer enough simply to have an assessment system; now institutions must use assessment data to make decisions and to evaluate how well they are doing. The new CAEP standards also recognize the importance of having multiple tools for assessment and of collecting data beyond the confines of the institution. When the standards are approved by the CAEP board of directors, teacher preparation programs will need to offer proof that they solicit information from schools and communities to inform their practices. This will encourage close con- tact between teacher preparation institutions and the systems that hire their graduates and increase responsiveness to the needs of the schools. Finally, the new CAEP standards suggest that teacher preparation programs should follow their graduates into the schools to collect data on their perfor- mance as teachers. Teacher preparation programs will be charged with providing evidence that their candidates can “walk the talk.” It will be interesting to see how this new accreditation process plays out. One initial purpose for pursuing the consolidation of the two accrediting agencies was to reduce the financial burden teacher preparation programs incurred when seeking national accreditation. Will CAEP with its
  • 29. revised standards accomplish this goal? Or will the revisions require teacher education programs to expand the role of the assessment process, thereby increasing its cost? Performance-Based Assessments Chapter 8 8.2 Performance-Based Assessments Performance-based assessments are covered in detail in Chapter 7 and are summarized briefly here. Types of Performance-Based Assessments Basically, a performance-based assessment is one that asks the student to perform a task or produce something, often in a situation that approximates a real-life setting as closely as possible. Among the most common performance assessments are developmental assess- ments, demonstrations, exhibitions, and portfolios. Performance-based assessments are often referred to as authentic assessments, although the expressions are not synonymous. A performance assessment is judged to be authentic to the extent that it asks the student to perform in ways that are closer to the requirements of actual performances in day-to-day settings. NCATE 2008 TEAC 2009 CAEP 2013 Standard 2: Assessment System and Unit Evaluation
  • 30. The unit has an assessment system that collects and analyzes data on applicant qualifications, candidate and graduate performance, and unit operations to evaluate and improve the performance of candidates, the unit, and its programs. 1.5 Evidence of valid assessment The program must provide evidence regarding the trustwor- thiness, reliability, and validity of the evidence produced from the assessment method or methods that it has adopted. 2. Data drive decisions about candidates and programs This standard addresses CAEP’s expectations regarding data quality and use in program improvement. The education preparation provider (EPP) must provide evidence that it has a functioning quality control system that is effective in supporting program improvement. Its quality control system must draw on valid and reliable evidence from multiple sources.
  • 31. 2.1 Decisions are based on evidence from multiple measures of candidates’ learning, completers’ performance in the schools, and school and community conditions and needs. 2.2 The education preparation provider has a system for regular self- assessment based on a coherent logic that connects the program’s aims, content, experiences and assessments. 2.3 The reliability and validity of each assessment measure are known and adequate, and the unit reviews and revises assessments and data sources regularly and systematically. 2.4 The education preparation provider uses data for program improvement and disaggregates the evidence for discrete program options or certifica- tion areas. Performance-Based Assessments Chapter 8 A performance assessment might ask a fine arts student to prepare an exhibition of paintings for display in a school caf- eteria as a basis for a final mark; a physical education student might be graded on a demonstration of different sports- related skills in competitive situations combined with scores on written tests; and part of a language arts student’s final grade might be based on a portfolio that contains samples of written work spanning the school year.
  • 32. It is true that many of the instructional objectives related to these three situations can be assessed with non-performance- based, teacher-made instruments. However, a teacher-made test that is not performance-based is unlikely to reveal very clearly how well Lenore can select, organize, and present an art exhibition or how Robert is likely to perform during the pressure of athletic competition. Nor is a single, year-end cre- ative writing test likely to say nearly as much about Elizabeth’s writing skills as does her yearlong collection of representative compositions. Also, because many performance assessments do not require high levels of verbal skills, they are exception- ally well suited for use in early grades or during the preschool period, as well as for some children with special needs. Improving Performance-Based Assessment Performance-based assessments have a number of limita- tions and drawbacks. First, they can be very time-consuming, especially when they involve individual performances, each of which must be evaluated. Second, performance-based assessments are not always very practical, particularly when they require special equipment or locations—both of which might be the case for assessments in areas that require performances such as public speaking or competitive sports activities. Third, despite the argument that they are more authentic, performance-based assessments tend to have much lower reliability. And, because of that fact, they may often be less valid and less fair. However, there are ways of improving performance-based
  • 33. assessments. One way to improve their reliability is to use carefully designed rubrics and checklists. Wesolowski (2012) suggests that these need to make the assessment process as objective as possible. A rubric should be designed so that different evaluators who base their assessments on the same rubric will arrive at very similar scores. There is evidence that the usefulness of performance-based assessments can be greatly improved through additional teacher training and development. This can be accomplished by means of workshops that emphasize the use of rubrics, checklists, and rating scales. Koh (2011) looked at the results of teacher participation in such workshops. He reports that these teacher development activities resulted in significant improvements in teachers’ understand- ing of performance assessments and in the usefulness of the assessments and the rubrics they designed. Wavebreak Media/Thinkstock ▲ Because they are closer to real-life situ- ations, performance-based assessments are often described as more authentic assessments. Some of the most impor- tant learning targets associated with the music class to which this student belongs cannot easily be assessed with a selected-response test. The test is in the performance.
  • 34. Constructed- and Selected-Response Assessments Chapter 8 Performance assessments can also be improved by using a variety of creative and highly motivating approaches. Schurr (2012) provides numerous suggestions in a book that lists more than 60 different ways of using performances to assess student learning. For example, students might be asked to write blog or journal entries as though they were actually part of Lewis and Clark’s company of explorers, or as though they were soldiers in Napoleon’s army or members of Queen Isabella’s court. The book also includes suggestions for designing rubrics. It includes examples as well as a list of online resources for performance assessments. Figure 8.3 summarizes some of the guidelines that might be used to ensure that performance- based assessments are as reliable, valid, and fair as possible. 8.3 Constructed- and Selected-Response Assessments Test items are the basic units that make up an assessment. These are often referred to as test questions, although many assessment items are not questions at all; instead, they are direc- tions, instructions, or requests. Some teacher-made assessments include several different kinds of items. Often, however, they are made up of a single sort of item. Test items can generally be divided into two broad categories: those that ask students to select a correct answer, termed selected-response assessments, and those that require examinees to produce
  • 35. (construct) the correct response, usually in writing but also sometimes orally. These are referred to as constructed-response assessments. Figure 8.3: Improving performance-based assessment Of these suggestions, probably the most important for increasing the reliability, validity, and fairness of performance-based assessments is the use of carefully designed scoring rubrics and checklists. f08.03_EDU645.ai Suggestions for Improving Performance-Based Assessments • When possible, use a variety of different performance assessments. • Use carefully constructed rubrics and checklists. • Assess performances that reflect clear learning targets. • Design performance tasks that closely approximate real-life settings. • Select tasks that are interesting, motivating, and challenging. • Assess behaviors that can be taught and learned and
  • 36. where improvement can be demonstrated through performance. • Take steps to ensure that students understand what is expected and the criteria upon which they will be assessed. • Develop performance assessments that are practical within budget and time constraints. • Direct assessments toward important rather than trivial learning targets. Constructed- and Selected-Response Assessments Chapter 8 What Are Selected-Response Assessments? Selected-response items are generally considered to be more objective than constructed- response items, simply because each item usually has a single clearly correct answer. In most cases, if more than one response is correct, that is taken into account in scoring. As a result, answer keys for assessments made up of selected-response items tend to be simple and exact. No matter which examiner scores a selected-response assessment, results should be identical. There are four principal kinds of selected-response items: 1. Multiple-choice items ask students to select which of several alternatives is the correct response to a statement or question.
  • 37. 2. True-false items, also called binary-choice items, ask the responder to make a choice between two alternatives, such as true or false. 3. Matching-test items present two or more corresponding lists, from which the examinee must select those that match. 4. Interpretive items are often similar to multiple-choice items, except that they provide information that examinees need to interpret in order to select the correct alternative. Information may be in the form of a chart, a graph, a paragraph, a video, or an audio recording. What Are Constructed-Response Assessments? Constructed-response items are more subjective than selected- response items, because they ask learners to generate their own responses. As a result, they often have more than one cor- rect answer. Test makers distinguish between two broad forms of constructed-response items, based largely on the length of the answer that is required. Thus there are short-answer items requiring brief responses—often no longer than a single paragraph—and essay items that ask the student to write a longer, essay-form response for the item. Figure 8.4 summarizes these distinctions. Objective Versus Essay and Short-Answer Tests
  • 38. The constructed-response (objective) items and the more subjective essay and short-answer items shown in Figure 8.4 can both be used to measure almost any significant aspect of stu- dents’ behavior. It is true, however, that some instructional objectives are more easily assessed with one type of item than with the other. The most important uses, strengths, and limita- tions of these approaches are described here. While the descriptions can serve as a guide in deciding which to use in a given situation, most good assessment programs use a variety of approaches: 1. It is easier to tap higher level processes (analysis, synthesis, and evaluation) with an essay examination. These can more easily be constructed to allow students to organize knowl- edge, to make inferences from it, to illustrate it, to apply it, and to extrapolate from it. Still, good multiple-choice items can be designed to measure much the same things as constructed-response items. Consider, for example, the following multiple-choice item: Constructed- and Selected-Response Assessments Chapter 8 Harvey is going on a solo fishing and camping trip in the far north. What equipment and supplies should he bring? a. rainproof tent; rainproof gear; fishing equipment; food
b. an electric outboard motor; a dinner suit; a hunting rifle
c. some books; a smart phone; fishing equipment; money
*d. an ax; camping supplies; fishing equipment; warm, waterproof clothing
Answering this item requires that the student analyze the situation, imagine different scenarios, and apply previously acquired knowledge to a new situation. In much the same way, it is possible to design multiple-choice items that require that students synthesize ideas and perhaps even that they create new ones. As we saw, however, the evidence indicates that most selected-response assessments tend to tap remembering—the lowest level in Bloom's Taxonomy. Most items simply ask the student to name, recognize, relate, or recall. Few classroom teachers can easily create items that assess higher cognitive processes.
2. Because essay and short-answer exams usually consist of only a few items, the range of skills and of information sampled is often less than what can be sampled with the more objective tests. Selected-response assessments permit coverage of more content per unit of testing time.
Figure 8.4: Types of assessment items. As this chart indicates, some tests include more than one type of assessment item.
• Selected-Response (More Objective): multiple choice, true-false, matching, interpretive
• Constructed-Response (Less Objective): short-answer, essay
3. Essay examinations allow for more divergence. They make it
  • 41. possible for students to pro- duce unexpected and unscripted responses. Those who do not like to be limited in their answers often prefer essays over more objective assessments. Conversely, those who express themselves with difficulty when writing often prefer selected-response assess- ments. However, Bleske-Rechek, Zeug, and Webb (2007) found that very few students consistently do better on one type of assessment than another. 4. Constructing an essay examination is considerably easier and less time-consuming than making up an objective examination. In fact, an entire test with an essay format can often be written in the same time it would take to write no more than two or three good multiple-choice items. 5. Scoring essay examinations usually requires much more time than scoring objective tests, especially when classes are large. This is especially true when tests are scored electroni- cally. When classes are very small, however, the time required for making and scoring an essay test might be less than that required for making and scoring a selected-response test. The hypothetical relationship between class size and total time for constructing and scoring constructed-response and selected-response tests is shown in Figure 8.5. 6. As Brown (2010) reports, the reliability of essay examinations is much lower than that of objective tests, primarily because of the subjectivity involved in scoring them. In addition,
suggests Brown, examiners often overemphasize the language aspects of the essays they are scoring. As a result, they pay less attention to the content, and the validity of the grades suffers. Some researchers have begun to develop computer programs designed to score constructed-response test items. Typically, however, use of these is limited to questions where acceptable responses are highly constrained and easily recognizable (e.g., Johnson, Nadas, & Bell, 2010; McCurry, 2010).
Figure 8.5: Construction and scoring time: Essays versus objective assessments. A graph of the hypothetical relationship between class size and total time required for constructing and scoring selected-response tests (multiple-choice, for example) and constructed-response tests (essay tests). As shown, preparation and scoring time for essay tests increases dramatically with larger class size, but it does not change appreciably for machine-scored objective tests. [Axes: number of students (low to high) versus total time (low to high); curves: Essay Assessment, Objective Assessment.]
Which Approach to Assessment Is Best? The simple answer is, it depends. Very few teachers will ever find themselves in situations where they must always use either one form of assessment or the other. Some class situations, particularly those in which size is a factor, may lend themselves more readily to objective formats; in other situations, essay formats may be better; sometimes a combination of both may be desirable. The important point is that each form of assessment has advantages and disadvantages. A good teacher should endeavor to develop the skills necessary for constructing the best items possible in a variety of formats without becoming a passionate
advocate of one over the other. The good teacher also needs to keep in mind that there are many alternatives to the usual teacher-made or commercially prepared standardized tests. Among these, as we saw earlier, are the great variety of approaches to performance assessment. In the final analysis, the assessment procedure chosen should be determined by the goals of the instructional process and the purposes for which the assessment will be used.
Nor are teachers always entirely alone when faced with the task of constructing (or selecting) assessment instruments and approaches—even as teachers are not entirely on their own when they face important decisions about curriculum, objectives, or instructional approaches. Help, support, and advice are available from many sources, including other teachers, administrators, parents, and sometimes even students. In many schools, formal associations, termed professional learning communities (PLCs), are an extremely valuable resource (see In the Classroom: Professional Learning Communities [PLCs]).
Table 8.3 shows how different types of assessment might be used to tap learning objectives relating to Bloom's revised taxonomy (discussed in Chapter 4). Note that the most common assessments for higher mental processes such as analyzing, evaluating, and creating are either constructed-response or performance-based assessments. However, as we see in the next section, selected-response assessments such as multiple-choice tests can also be designed to tap these processes.
IN THE CLASSROOM: Professional Learning Communities (PLCs)
A professional learning community (PLC) is a grouping of educators, both new and experienced—and sometimes of parents as well—who come together to talk about, reflect on, and share ideas and resources in an effort to improve curriculum, learning, instruction, and assessment (Dufour, 2012). Professional learning communities are formal organizations within schools or school systems. They are typically established by principals or other school leaders and are geared toward establishing collaboration as a basis for promoting student learning. PLCs are characterized by
• Supportive and collaborative educational leadership
• Sharing of goals and values
• Collaborative creativity and innovation
• Sharing of personal experiences
• Sharing of instructional approaches and resources
• Sharing of assessment strategies and applications
• A high degree of mutual support
Evidence suggests that professional learning communities are a powerful means of professional development and support (Brookhart, 2009; Strickland, 2009). They are also a compelling strategy for educational change and improvement.
▶ Professional learning communities (PLCs) are organized groups of educators who meet regularly to reflect and collaborate on improving curriculum, learning, instruction, and assessment. Such groups are a powerful strategy for educational improvement.
Table 8.3 What assessment approach to use (Bloom's revised taxonomy objective, with the verbs related to each objective and some useful approaches to assessment)
• Remembering (copy, duplicate, list, learn, replicate, imitate, memorize, name, order, relate, reproduce, repeat, recognize, . . .): selected-response assessments including multiple-choice, true-false, matching, and interpretive
• Understanding (indicate, know, identify, locate, recognize, report, explain, restate, review, describe, distinguish, . . .): selected-response assessments that require the learner to locate, identify, recognize, . . .; constructed-response assessments including short-answer and longer essay items where students are asked to explain, describe, compare, . . .
• Applying (demonstrate, plan, draw, outline, dramatize, choose, sketch, solve, interpret, operate, do, . . .): written constructed-response assessments where students are required to describe prototypes or simulations showing applications; performance assessments where learners demonstrate an application, perhaps by sketching or dramatizing it
• Analyzing (calculate, check, categorize, balance, compare, contrast, test, differentiate, examine, try, . . .): written assessments requiring comparisons, detailed analyses, advanced calculations; performance assessments involving activities such as debating or designing concept maps
• Evaluating (assess, choose, appraise, price, defend, judge, rate, calculate, support, criticize, predict, . . .): written assessments requiring judging, evaluating, critiquing; performance assessments using portfolio entries reflecting opinions, reflections, appraisals, reviews, etc.
• Creating (arrange, write, produce, make, design, formulate, compose, construct, build, generate, craft, . . .): written assignments perhaps summarizing original research projects; performance assessments involving original output such as musical compositions, written material, designs, computer programs, etc.
8.4 Developing Selected-Response Assessments
As we noted, selected-response assessments tend to be more objective than constructed-response assessments. After all, most of them have only one correct answer.
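One way to see why selected-response scoring is so consistent is to notice that it amounts to comparing each response against a fixed key. The short Python sketch below is purely illustrative and is not part of the textbook; the item numbers, keyed alternatives, and student responses are hypothetical.

```python
# Illustrative sketch: scoring a selected-response quiz against a fixed answer key.
# The key and the student's responses are hypothetical examples.

ANSWER_KEY = {1: "d", 2: "c", 3: "c", 4: "d"}  # item number -> keyed alternative

def score_selected_response(responses, key=ANSWER_KEY):
    """Count the items whose response matches the keyed alternative.

    Because each item has exactly one keyed alternative, any scorer
    (human or machine) applying the same key arrives at the same total.
    """
    return sum(1 for item, keyed in key.items() if responses.get(item) == keyed)

student = {1: "d", 2: "b", 3: "c", 4: "d"}
print(score_selected_response(student))  # -> 3 of 4 correct
```

Whether applied by hand or by machine, the same key and the same responses always produce the same score, which is the sense in which these assessments are objective.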
  • 50. Multiple-Choice Items Among the most common of the highly objective selected- response assessments is that con- sisting of multiple-choice items. These are items that have a stem—often a question or an incomplete statement—followed by a series of possible responses referred to as alterna- tives. There are usually four or five alternatives, only one of which is normally correct; the others are termed distracters. On occasion, some multiple-choice tests may contain more than one correct alternative. These, as Kubinger and Gottschall (2007) found, are usually more difficult than items with a single correct answer, providing responders are required to select all correct alternatives for the item to be marked correct. The researchers created a multiple-choice test where any number of the five alternatives might be correct. These test items were more difficult than identical items that had only one correct answer, because guessing was now much less likely to lead to a correct response. If responders did not recognize the correct answers and tried to guess which they might be, they would not know how many alternatives to select. Multiple-choice stems and alternatives can take a variety of forms. Stems might consist of questions, statements requiring completion, or negative statements. Alternatives might be best answer, combined answers, or single answers. Examples of each of these items are
  • 51. shown in Figure 8.6. Guidelines for Constructing Multiple-Choice Items Writing good multiple-choice items requires attention to a number of important guidelines. Many of them involve common sense (which makes them no less valid): 1. Both stems and alternatives should be clearly worded, unambiguous, grammatically cor- rect, specific, and at the appropriate level of difficulty. In addition, stems should be clearly meaningful by themselves. Compare, for example, the following two items: A. In the story The Red Rat, how did Sally feel toward Angela after her accident? a. sad b. angry c. jealous d. confused B. In the story The Red Rat, how did Sally feel toward Angela after Angela’s accident? a. sad b. angry c. jealous d. confused The problem with the first stem is that the pronoun her has an ambiguous referent. Does the question refer to Sally’s accident or Angela’s? The second stem corrects that error. Similarly, stems that use the word they without a specific context or reference are some-
  • 52. times vague and misleading. For example, the true-false question “Is it true that they say Developing Selected-Response Assessments Chapter 8 you should avoid double negatives?” might be true or false, depending who they is. If they refers to most authors of assessment textbooks, the correct answer is true. But if they refers to the Mowats, who lived back in my isolated neck of the woods, the correct answer would be false: They didn’t never say don’t use no double negatives! 2. But seriously, don’t use no double negatives when writing multiple-choice items. They are highly confusing and should be avoided at all costs. Are single negatives highly rec- ommended? Not. Common, easily found examples of double and even triple negatives include combinations like these: It is not unnecessary to pay attention—meaning, simply, “It is necessary to pay attention.” It is not impossible to pay attention—meaning, “It is possible to pay attention.” Figure 8.6: Examples of multiple-choice items Stems and alternatives can take a variety of forms. In these examples, the alternatives are always ordered alphabetically or numerically. This is a precaution
against establishing a pattern that might provide a clue for guessing the correct response. (Correct responses are marked ✓.)
Incomplete Statement Stem
1. The extent to which a test appears to measure what it is intended to measure defines
___ a. construct validity  ___ b. content validity  ___ c. criterion-related validity  ✓ d. face validity  ___ e. test reliability
Question Stem
2. Who is the theorist most closely associated with the development of operant conditioning?
___ a. Bandura  ___ b. Pavlov  ✓ c. Skinner  ___ d. Thorndike  ___ e. Watson
Negative Statement Stem
3. Which of the following is NOT a dinosaur?
___ a. allosaurus  ___ b. brachiosaurus  ✓ c. stenogrosaurus  ___ d. triceratops  ___ e. velociraptor
Best Answer Alternative
4. What was the main motive for Britain entering WWII?
___ a. economics  ___ b. fear  ___ c. greed  ___ d. hatred  ___ e. loyalty
Combined Answer Alternative
5. Order the following from largest to smallest in geographic area: 1. Brazil  2. Canada  3. China  4. Russia  5. United States
___ a. 1, 2, 3, 4, 5  ___ b. 1, 3, 5, 4, 2  ✓ c. 4, 2, 5, 3, 1  ___ d. 4, 5, 2, 1, 3  ___ e. 2, 4, 5, 3, 1
Single Answer Alternative
6. What is the area of a 20 foot by 36 inch rectangle?
___ a. 16 square feet  ___ b. 20 square feet  ___ c. 56 square feet  ✓ d. 60 square feet  ___ e. 720 square feet
  • 56. tional objectives involve application, analysis, or other higher mental processes. 4. Create distracters that seem equally plausible to students who don’t know the correct answer. Otherwise, answer- ing correctly might simply be a matter of eliminating highly implausible dis- tracters. Consider the following exam- ple of a poor item: A. 10 + 12 + 18 = a. 2 b. 2,146 c. 40 d. 1 For students who don’t know how to calculate the correct answer, highly implausible distracters that can eas- ily be eliminated may dramatically increase the score-inflating effects of guessing. 5. Unintentional cues should be avoided. For example, ending the stem with a or an often provides a cue as in, for example: A. A pachyderm is an a. cougar b. dog c. elephant d. water buffalo 6. Avoid the use of strong qualifying words such as never, always, none, impossible, and
  • 57. absolutely in distracters. Distracters that contain them are most often incorrect. On the other hand, distracters that contain weaker qualifiers such as sometimes, frequently, and usually are often associated with correct alternatives. At other times, they are simply vague and confusing. Consider, for example: A. Multiple-choice alternatives that contain strong qualifiers are a. always incorrect b. never incorrect c. usually incorrect d. always difficult iStockphoto/Thinkstock ▲ This boy is completing an online, take-home, selected- response test. Perhaps using two computers allows him to have one send out various search engines looking for answers while he completes the timed test on the other. That is one of the factors that needs to be considered in online courses. Developing Selected-Response Assessments Chapter 8 As expected, alternatives with strong qualifiers (always and never) are incorrect, and the alternative with a weak qualifier (usually) is correct. Weak qualifiers often present an additional problem: They can be highly ambiguous. For example, interpreting the alternative usually incorrect is difficult because the term is impre- cise. Does usually in this context mean most of the time? Does it mean more than half the
  • 58. time? More than three quarters of the time? In stems, both kinds of qualifiers also tend to be ambiguous, and the weaker ones are more ambiguous than those that are strong. Never usually means “not ever”—although it can sometimes be interpreted to mean “hardly ever.” But frequently is one of those alarmingly vague terms for whose meaning we have no absolutes—only relatives. Just how often is frequently? We don’t really know—which is why the word fits so well in many of our lies and exaggerations. 7. Multiple-choice assessments, like all forms of educational assessment, also need to be relevant to instructional objectives. That is, they need to include items that sample course objectives in proportion to their importance. This is one of the reasons teachers should use test blueprints and should make sure they are closely aligned with instructional objectives. 8. Finally, as we saw in Chapter 2, assessments need to be as fair, valid, and reliable as pos- sible. Recall that fair tests are those that: • Assess material that all learners have had an opportunity to learn • Allow sufficient time for all students to complete the test • Guard against cheating • Assess material that has been covered
  • 59. • Make accommodation for learner’s special needs • Are free of the influence of biases and stereotypes • Avoid misleading, trick questions • Grade assessments consistently Recall, too, that the most reliable and valid assessments tend to be those based on longer tests or on a variety of shorter tests where scoring criteria are clear and consistent. These guidelines are summarized in Table 8.4. Table 8.4 Checklist for constructing good multiple-choice items Yes No Are stems and alternatives clear and unambiguous? Yes No Have I avoided negatives as much as possible? Yes No Have I included items that measure more than simple recall? Yes No Are all distracters equally plausible? Yes No Have I avoided unintentional cues that suggest correct answers? Yes No Have I avoided qualifiers such as never, always, and usually? Yes No Do my items assess my instructional objectives? Yes No Are my assessments as fair, reliable, and valid as possible?
  • 60. Developing Selected-Response Assessments Chapter 8 Matching Items The simplest and most common matching-test item is one that presents two columns of information, arranged so that each item in one column matches a single item in the other. Columns are also organized so that matching terms are randomly juxtaposed, as shown in Figure 8.7. Matching items can be especially useful for assessing understanding, in addition to remember- ing. In particular, they assess the student’s knowledge of associations and relationships. They can easily be constructed by generating corresponding matching lists for a wide variety of items. For example, Figure 8.7 matches people with concepts. Other possible matches include historical events with dates; words with definitions; words in one language with translations in another; geometric shapes with their names; literary works titles with names of authors; historical figures with historical positions or historical events; names of different kinds of implements with their uses; and on and on. The most common matching items present what is termed the premise column (or some- times the stem column) on the left and possible matches in what is called the response column on the right.
  • 61. A matching item might have more entries in the response column or an equal number in each. From a measurement point of view, one advantage of having different numbers of entries in each column is that this reduces the possibility of answering correctly when the student does not know the answer. In the example shown in Figure 8.7, for example, students who know three of four correct responses will also get the fourth correct. By the same token, those who know two of the four will have a 50-50 chance of getting the next two correct. Even if a Figure 8.7: Example of a matching-test item Matching-test items should have very clear instructions. More complex matching items sometimes allow some responses to be used more than once or not at all. f08.07_EDU645.ai Instructions: Match the theorists in column A (described in Chapter 2) with the concept in column B most closely associated with each theorist. Write the number in front of each entry in column B in the appropriate space after each theorist named in column A. Each term should be used only once. A. Theorist B. Associated Term Thorndike _____ Watson _____ Skinner _____
  • 62. Pavlov _____ 4. Operant conditioning 3. Law of Effect 2. Classical conditioning 1. Behaviorism2 4 1 3 Developing Selected-Response Assessments Chapter 8 student knew only one response, there would still be a pretty good chance of guessing one or all of the others correctly. But the more items there are in the response column, the lower the odds of selecting the correct unknown response by chance. Figure 8.8 presents an example of a matching item with more items in the response than the premise column. Similarly, some matching tests are constructed in such a way that each item in the response list might be used once, more than once, or not at all. Not only does this approach effec- tively eliminate the possibility of narrowing down options for
guessing, but it might be constructed to require that the student engage in behaviors that require calculating, comparing, differentiating, predicting, appraising, and so on. All of these activities tap higher level cognitive skills.
Figure 8.8: Example of a matching-test item with uneven columns. When matching-test items contain more items in the response list than in the premise list, reliability of the measure increases because the probability of correctly guessing unknown responses decreases.
Instructions: Match the 21st century world leaders in column A with the country each has led or currently leads, listed in column B. Write the number in front of each entry in column B in the appropriate space after each leader in column A. There is only one correct response for each item in column A.
A. World Leader: Luiz Inácio Lula da Silva __2__; Mohamed Morsy __3__; Al-Assad __9__; Ali Abdullah Saleh __10__; Kim Jong-un __5__; Silvio Berlusconi __4__; Mariano Rajoy __8__
B. Country Led: 1. Argentina; 2. Brazil; 3. Egypt; 4. Italy; 5. North Korea; 6. Portugal; 7. Saudi Arabia; 8. Spain; 9. Syria; 10. Yemen
Not all matching-test items are equally good. Consider, for example, the item shown in Figure 8.9. Note how the instructions are clear and precise: They state exactly what the test taker must do and how often each response can be used. But it really is a very bad item. Entries in each column are structured so differently that for those with adequate reading skills, grammatical cues make the answers totally obvious. The person who
built this item should have paid attention to the following guidelines:
1. Items in each column should be parallel. For example, in Figure 8.7, all items in the premise column are names of theorists, and all items in the response column are terms related in some important way to one of the theories. Similarly, in Figure 8.8, premise entries are all names of world leaders, and response entries are all countries. The following is an example of nonparallel premise items that are to be matched to a response list of different formulas for calculating the surface area of different geometric figures: triangle, square, rectangle, cardboard boxes, circle. The inclusion of cardboard boxes among these geometric shapes is confusing and unnecessary. Test makers must also guard against items that are not grammatically parallel, as is shown in Figure 8.9.
Figure 8.9: Example of a poorly constructed matching-test item. To avoid many of the problems that are obvious in this example, simply use complete sentences or parallel structures in the premise column.
Instructions: Match the statements in column A with the best answer from column B. Write the number in front of each answer in column B in the appropriate space after each statement in column A. No answer can be used more than once. One will not be used at all.
A. In the story Pablo's Chicken: At the beginning of the story, Pablo lives in __5__; Pablo gets very upset when __2__; When Pablo answers the door, his dog bites __3__; Pablo's mother is very __1__
B. Answers based on Pablo's Chicken: 1. angry; 2. his dog dies; 3. his mother; 4. kitchen scraps; 5. Monterrey
  • 67. especially true if the response list contains more entries than the premise column. The test is less reliable when it contains items that allow students to quickly discard implausible responses. 3. To increase the reliability of the test, the response column should contain more items than the premise column. 4. Limit the number of items to between six and 10 in each column. Longer columns place too much strain on memory. Recall from Chapter 3 that our adult short-term memory is thought to be limited to seven plus or minus two items. It is difficult to keep more items than this in our conscious awareness at any one time. 5. We saw that grammatical structure can sometimes provide unwanted clues in multiple- choice items. This can also be the case in matching-test items, as is the case for the item in Figure 8.9 where grammatical structure reveals almost all the correct responses. Moreover, the fourth item in the response column is an implausible response. 6. Directions should be clear and specific. They should stipulate how the match is to be made and on what basis. For example, directions for online matching tests might read: “Drag each item in column B to the appropriate box in front of the matching item in column A.” Similar instructions for a written matching-item test might specify: “Write the num- ber in front of each answer in column B in the appropriate space
  • 68. after each statement in column A.” 7. Response items should be listed in logical order. Note, for example, that response columns in Figures 8.7 through 8.9 are alphabetical. Where response items are numerical, they should be listed in ascending or descending order. Doing so eliminates the possibility of developing some detectable pattern. It also discourages students from wasting their time looking for a pattern. 8. For paper-and-pencil matching items, ensure that lists are entirely on one page. Having to flip from one page to another can be time-consuming and confusing. Table 8.5 summarizes these guidelines in the form of a checklist. Table 8.5 Checklist for constructing good matching items Yes No Are items in the premise column parallel? Yes No Are items in the response column parallel? Yes No Are all response-column items plausible? Yes No Have I included more response- than premise-column items? Yes No Are my lists limited to no more than seven or so items? Yes No Have I avoided unintentional cues that suggest correct
  • 69. matches? Yes No Are my directions clear, specific, and complete? Yes No Have I listed response-column items in logical order? Yes No Are my columns entirely on one page? Yes No Do my items assess my instructional objectives? Yes No Are my assessments as fair, reliable, and valid as possible? Developing Selected-Response Assessments Chapter 8 True-False Items A relatively common form of assessment, often used in the early grades, is the true-false item. True-false items typically take the form of a statement that the examinee must judge as true or false. But they can also consist of statements or questions to which the correct answer might involve choosing between responses such as yes and no or right and wrong. As a result, they are sometimes called binary-choice items rather than true- false items. True-false test items tend to be popular in early grades because they are easy to construct, can be used to sample a wide range of knowledge, and provide a quick and easy way to look at the extent to which instructional objectives are being met.
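Because a binary-choice item can be answered correctly by chance half the time, it is worth seeing what blind guessing implies for scores on a whole test (a point taken up under Limitations of True-False Assessments below). The simulation that follows is a rough sketch, not part of the textbook, and it assumes a hypothetical 20-item true-false quiz answered entirely at random.

```python
# Rough sketch: score distribution on a hypothetical 20-item true-false quiz
# when every answer is a blind (random) guess.
import random

ITEMS = 20
TRIALS = 10_000

scores = [sum(random.random() < 0.5 for _ in range(ITEMS)) for _ in range(TRIALS)]

print(sum(scores) / TRIALS)                   # average score: about 10 of 20 (50%)
print(sum(s >= 18 for s in scores) / TRIALS)  # scoring 90% or better: roughly 2 in 10,000
```

The average guesser lands near 50%, but a guesser who earns a very high mark is rare, which matches the distinction the chapter draws between unreliable average performance and the low probability of a lucky high score.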
  • 70. Most true-false items are simple propositions that can be answered true or false (or yes or no). For examples: Reliability is a measure of the consistency of an assessment. T F Face validity reflects how closely scores from repeated administrations of the same test resemble each other. T F Predictive validity is a kind of criterion-related validity. T F True-False Assessments to Tap Higher Mental Processes Answering these true-false questions requires little more than simple recall. However, it is possible to construct true-false items that assess other cognitive skills. Consider the following: Is the following statement true or false? 9 ÷ 3 × 12 + 20 = 56 T F Responding correctly to this item requires recalling and applying mathematical operations rather than simply remembering a correct answer. Binary-choice items can also be constructed so that the responder is required to engage in a variety of higher level cognitive activities such as comparing, predicting, evaluating, and gen- eralizing. Figure 8.10 shows examples of how this might be accomplished in relation to the Revised Bloom’s Taxonomy. Note that Figure 8.10 does not include any examples that relate
  • 71. to creating. Objectives that have to do with designing, writing, producing, and related activities are far better tapped by means of performance-based assessments or constructed- response items than with the more objective, selected-response assessments. Note, too, that although it is possible to design true-false items that seem to tap higher cogni- tive processes, whether they do so depends on what the learner already knows. As we saw earlier, what a test measures is not defined entirely by the test items themselves. Rather, it depends on the relationship between the test item and the individual learner. Consider, for example, the Figure 8.10 item that illustrates judging—an activity described as having to do with evaluating: • It is usually better to use a constructed-response test rather than a true-false test for objectives having to do with evaluating. T F One learner might respond, after considering the characteristics of constructed-response and true-false tests, by analyzing the requirements of evaluative cognitive activities and judging Developing Selected-Response Assessments Chapter 8 which characteristics of these assessments would be best suited for the objective. That learn- er’s cognitive activity would illustrate evaluating.
  • 72. Another learner, however, might simply remember having read or heard that constructed- response assessments are better suited for objectives relating to evaluating and would quickly select the correct response. That learner’s cognitive activity would represent the lowest level in Bloom’s Taxonomy: remembering. Figure 8.10: True-false items tapping higher cognitive skills Although it is possible to design true-false items that do more than measure simple recall, other forms of assessment are often more appropriate for objectives relating to activities such as analyz- ing, evaluating, and creating. f08.10_EDU645.ai Bloom’s Revised Taxonomy of Educational Objectives Remembering Understanding Applying Analyzing Evaluating Creating Possible activity
  • 73. representing each objective Example of true-false item that reflects the boldface verb in the second column copy, duplicate, list, learn, replicate, imitate, memorize, name, order, relate, reproduce, repeat, recognize . . . indicate, know, identify, locate, recognize, report, explain, restate, review, describe, distinguish . . . demonstrate, plan, draw, outline, dramatize, choose, sketch, solve, interpret, operate, do, . . . calculate, check, catego- rize, balance, compare, contrast, test, differenti- ate, examine, try . . . assess, choose, appraise, price, defend, judge, rate,
  • 74. calculate, support, criticize, predict . . . arrange, write, produce, make, design, formulate, compose, construct, build, generate, craft . . . (Creating cannot normally be tested by means of a true-false item) X X X X XThis is a spider: T F It is usually better to use a constructed response rather than a true-false test for objectives having to do with evaluating. T F The total area of a rectangular house that is 60 feet in one dimension and that contains nothing other than 3 equal-sized rectangular rooms that measure 30 feet in one dimension is 1800 square feet.
  • 75. T F For cutting through a trombone you could use either a hacksaw or a rip saw. T F It is correct to say that affect has an affect on most of us. Yes No Developing Selected-Response Assessments Chapter 8 Limitations of True-False Assessments True-false assessments are open to a number of serious criticisms. First, unless they are care- fully and deliberately constructed to go beyond black-and-white facts, they tend to measure little more than simple recall. And second, because there is a 50% chance of answering any one question correctly—all other things being equal—they tend to provide unreliable assess- ments. If everyone in a class knew absolutely nothing about an area being tested with true- false items, and they all simply guessed each answer randomly, the average performance of the class would be around 50%. Nevertheless, the chance of receiving a high mark as a result of what is termed blind guess- ing is very low. And the chance of receiving a very poor mark is equal to that of receiving a very high one. Most guessing tends to be relatively educated rather than
  • 76. completely random. Even if they are uncertain about the correct answer, many students know something about the item and guess on the basis of the information they have and the logic and good sense at their disposal. Variations on Binary-Choice Items In one study, Wakabayashi and Guskin (2010) used an intriguing approach to reduce the effect of guessing. Instead of simply giving respondents the traditional choice of true or false, they added a third option: unsure. When students were retested on the same material later, items initially marked unsure were more likely to have been learned in the interim and to be answered correctly on the second test than were incorrect responses of which the responders had been more certain. Another interesting approach that uses true-false items to understand more clearly the learn- er’s thinking processes asks responders to explain their choices. For example, Stein, Larrabee, and Barman (2008) developed an online test designed to uncover false beliefs that people have about science. The test consists of 47 true-false items, each of which asks responders to explain the reasons for their choices. As an example, one of the items reads as follows: An astronaut is standing on the moon with a baseball in her/his hand. When the base- ball is released, it will fall to the moon’s surface. (p. 5) The correct answer, true, was selected by only 32.8% of 305 respondents, all of whom were
  • 77. students enrolled in teacher education programs at two different universities. More reassur- ingly, 94.4% chose the correct answer (true) for this next item: A force is needed to change the motion of an object. (p. 5) The usefulness of this approach lies in the explanations, which often reveal serious misconcep- tions. And strikingly, this is frequently the case even when answers are correct. For example, more than 40% of respondents who answered this last item correctly did so for the wrong reasons. They failed to identify the normal forces (called reaction forces) that counter the effects of gravity. Asking students to explain their choices on multiple-choice tests might reveal significant gaps in knowledge or logic. This approach could contribute in important ways to the use of these tests for formative purposes. Table 8.6 presents guidelines for writing true-false items. Developing Selected-Response Assessments Chapter 8 Table 8.6 Checklist for constructing good true-false items Yes No Is the item useful for assessing important learning objectives? Yes No Have I avoided negatives as much as possible? Yes No Have I included items that measure more than simple recall?
  • 78. Yes No Is the item absolutely clear and unambiguous? Yes No Have I avoided qualifiers such as never, always, and usually? Yes No Have I avoided a pattern of correct responses? Yes No Have I balanced correct response choices? Yes No Have I made my statements as brief as possible? Yes No Have I avoided trick questions? Yes No Are my true and false statements of approximately equal length? Interpretive Items Interpretive items present information that the responder needs to interpret when answer- ing test items. Although the test items themselves might take the form of any of the objec- tive test formats—matching, multiple-choice, or true-false—in most cases, the material to be interpreted is followed by multiple-choice questions. Interpretive material most often takes the form of one or two written paragraphs. It may also involve graphs, charts, maps, tables, and video or audio recordings. Figure 8.11 is an example of a true-false interpretive item based on a graph. Answering the items correctly might require analysis and inference in addition to basic skills in reading graphs.
  • 79. Figure 8.12 illustrates a more common form of interpretive test item. It is based on written material that is novel for the student. Responding correctly requires a high level of reading skill and might also reflect a number of higher mental processes such as those involved in analyz- ing, generalizing, applying, and evaluating. Advantages of Interpretive Items Interpretive items present several advantages over traditional multiple-choice items. Most important, they make it considerably easier to tap intellectual processes other than simple recall. Because the material to be interpreted is usually novel, the student cannot rely on recall to respond correctly. A second advantage of interpretive items is that they can be used to assess understanding of material that is closer to real life. For example, they can easily be adapted to assess how clearly students understand the sorts of tables, charts, maps, and graphs that are found in newspapers, on television, and in online sources. Finally, not only can interpretive test items be used to assess a large range of intellectual skills, but they can also be scored completely objectively. This is not the case for performance-based assessments or for most constructed-response assessments. Developing Selected-Response Assessments Chapter 8 Figure 8.11: Interpretive true-false item based on a graph
  • 80. Interpretive test items are most often based on written material but can also be based on a variety of visual or auditory material, as shown here. f08.11_EDU645.ai 2000 2006 2007 Year 2008 2009 60 55 50 45 40 35 30 25 P e rc e n
  • 81. ta g e o f a d u lt s a g e d 2 5 t o 3 4 Currently married Never married Results of a 2011 U.S. Census Bureau survey are shown above. Based on this graph,
  • 82. indicate whether each of the following statements is true or false by putting a checkmark in front of the appropriate letter. 1. The vertical axis indicates the year in which the survey was conducted. T F 2. In 2008, there were more 25- to 34-year-olds who were married than who had never married. T F 3. Every year between 2000 and 2009 there were more married than never married adults between 25 and 34. T F 4. One hundred percent of people surveyed were either currently married or had never married. T F 5. The number of never-married 25- to 34-year-olds increased between the year 2000 and 2009. T F � � � � � Source: U.S. Census Bureau (2011). Retrieved September 12, 2011, from http://factfinder.census.gov/servlet/ STTable?_bm=y&-qr_name=ACS_2009_5YR_G00_S1201&-
  • 83. ds_name=ACS_2009_5YR_G00_&-state=st&-_lang=en. http://factfinder.census.gov/servlet/STTable?_bm=y&- qr_name=ACS_2009_5YR_G00_S1201&- ds_name=ACS_2009_5YR_G00_&-state=st&-_lang=en http://factfinder.census.gov/servlet/STTable?_bm=y&- qr_name=ACS_2009_5YR_G00_S1201&- ds_name=ACS_2009_5YR_G00_&-state=st&-_lang=en Developing Selected-Response Assessments Chapter 8 Limitations of Interpretive Items Interpretive test items do have a number of limitations and disadvantages, however. Among them is that interpretive items often rely heavily on reading skills. As a result, incorrect responses may reflect reading problems rather than problems in the intellectual skills being assessed. Another disadvantage of interpretive items is the difficulty of constructing good interpretive material and related multiple-choice items. Developing interpretive text or visual representa- tions is a time-consuming and demanding task. Poorly designed items tend to measure rec- ognition and recall, both of which can be assessed by means of multiple-choice or true-false formats, which are easier to construct. Finally, like other objective assessments, interpretive-test items seldom tap the productive skills involved in creating. As we noted, performance-based and constructed-response tests are more appropriate for tapping the highest levels of cognitive
skills.
Constructing Good Interpretive Items
As is true for all forms of assessment, the best interpretive test items are those that are relevant to instructional objectives, that sample widely, that tap a range of mental processes, and that are as fair, valid, and reliable as possible. Table 8.7 is a brief checklist of guidelines for constructing good interpretive items.
Figure 8.12: Interpretive multiple-choice item based on written material. The most common interpretive items are based on written material. One of their disadvantages is that they are highly dependent on reading skills.
Beavers are hardworking, busy little creatures. Like all mammals, when they are young, the babies get milk from their mothers. They are warm-blooded, so it's important for them to keep from freezing. They do this in the winter by living in houses they build of logs, sticks, and mud. Beaver houses are called lodges. The entrance to the lodge is far enough under water that it doesn't freeze even in very cold weather. Because the walls of the lodge are very thick and the area in which the family lives is very small, the heat of their bodies keeps it warm.
1. Which of the following best describes mammals?
a. Mammals are warm blooded and live in lodges.
b. Mammals need milk to survive.
c. Some mammals hatch from eggs that the mother lays.
*d. Mammals produce milk for their newborns.
e. Mammals live in warm shelters or caves.
2. Beaver lodges stay warm because
a. They have an underwater entrance that doesn't freeze.
*b. The living area is small and the walls are thick.
c. Beavers are very hard-working and busy.
d. Beavers are warm-blooded mammals.
e. Beavers are mammals.
  • 87. 8.5 Developing Constructed-Response Assessments As the name implies, constructed-response assessments require test takers to generate their own responses. Two main kinds of constructed-response assessments are used in educational measurement: short-answer items (also referred to as restricted- response items) and essay items (also called extended-response items). The main advantage of constructed-response assessments is that they lend themselves better to evaluating higher thought processes and cognitive skills. Also, they allow for more variation and more creativity. When compared with selected-response assessments, constructed-response assessments have two principal limitations: First, they usually consist of a small number of items and there- fore sample course content less widely; second, they tend to be less objective than selected- response assessments, simply because they can seldom be scored completely objectively. Short-Answer Constructed-Response Items Short-answer items sometimes require a response consisting of only a single word or short phrase to fill in a blank left in a sentence. These are normally referred to as completion items (or fill-in-the-blanks items). At other times, they require a brief written response—perhaps only one or two words—that does not fill in a blank space in a sentence. In contrast, essay items typically ask for a longer, more detailed written response,
  • 88. often consisting of a number of paragraphs or even pages. Completion items can easily be generated simply by taking any clear, unambiguous, complete sentence from a written source and reproducing it with a single word or phrase left out. One problem with this approach is that it encourages rote memorization rather than a more thoughtful approach to studying. In addition, out-of-context sentences are often somewhat ambiguous at best; at worst, they can be completely misleading. Developing Constructed-Response Assessments Chapter 8 Advantages of Short-Answer Items Short-answer items have several important advantages: 1. Because they require the test taker to produce a response, they effectively eliminate much of the influence of guessing. This presents a distinct advantage over selected-response approaches such as multiple-choice and true-false assessments. 2. Because they ask for a single correct answer or a very brief response, they are highly objec- tive even though they ask for a constructed response. As a result, it is a simple matter to generate a marking key very much as is done for multiple- choice, true-false, or matching items. 3. They are easy to construct and can quickly sample a wide range of knowledge.
  • 89. Limitations of Short-Answer Items Among their limitations is that because examiners need to read every response, short-answer items can take longer to score than selected-response measures. Nor is constructing short- answer items always easy: Sometimes it is difficult to phrase the item so that only one answer is correct. Another limitation has to do with possible contamination of scores due to bad spelling. If marks are deducted for misspelled words, what is being measured becomes a mixture of spell- ing and content knowledge. But if spelling errors are ignored, the marker may occasionally have to guess at whether the misspelled word actually represents the correct answer. Finally, because correct responses are usually limited to a single choice, they don’t allow for creativity and are unlikely to tap processes such as synthesis and evaluation. Examples of Short-Answer Items General guidelines for the construction of short-answer items include many of those listed earlier in Tables 8.4 through 8.7. In addition, test makers need to ensure that only one answer is correct and that what is required as a response is clear and unambiguous. For this reason, when preparing completion items it is often advisable to place blanks at the end of the sen- tence rather than in the middle. For example, consider these two completion items:
  • 90. 1. In 1972, produced a film entitled Une Belle Fille Comme Moi. 2. The name of the person who produced the 1972 film Une Belle Fille Comme Moi is . Although the correct answer in both cases is François Truffaut, the first item could also be answered correctly with the word France. But the structure of the second item makes the nature of the required response clear. It’s even clearer to rephrase the sentence as a question so that it is no longer a fill-in-the blanks short-answer item. For example: What is the name of the director who produced the 1972 film Une Belle Fille Comme Moi? Although short-answer items generally require only one or two words as a correct response, some ask for slightly longer responses. For example: Define face validity. Developing Constructed-Response Assessments Chapter 8 What states are contiguous with Nevada? , , , and . As shown in the second example, four blanks are provided as a clue to the number of responses required. Providing a single, longer blank would increase the
  • 91. item’s level of difficulty. Essay Constructed-Response Items Essay items also require test takers to generate their own responses. But instead of being asked to supply a single word or short phrase, they are asked to write longer responses. The instructions given to the test taker—sometimes referred to as the stimulus—can vary in length and form, but should always be worded so that the student has a clear understanding of what is required. Whenever possible, the stimulus should also include an indication of scoring guidelines. Without scoring guidelines, responses can vary widely and increase the difficulty of scoring an item. Some essay assessments might ask a simple question that requires a one- or two-sentence response. For example: What is the main effect of additional government spending on unemployment? Why is weather forecasting more accurate now than it was in the middle of the 20th century? Both of these questions can be answered correctly with a few sentences. And both can be keyed so that different examiners scoring them would arrive at very similar results. Items of this kind can have a high degree of reliability. Many essay items ask for lengthier expositions that typically
  • 92. require organizing information, marshaling arguments, defending opinions, appealing to authority, and so on. In responding to these, students typically have wide latitude both in terms of what they will say and how they will say it. As a result, longer essay responses are especially useful for tapping higher mental processes such as applying, analyzing, evaluating, and creating. This is their major advantage over the more objective approaches. The main limitation of longer essay assessments has to do with their scoring. Not only is it highly time-consuming, but it tends to be decidedly subjective. As a result, both reliability and validity of such assessments is lower than that of more objective measures. For example, some examiners consistently give higher marks than others (Brown, 2010). Also, because there is sometimes a sequential effect in scoring essays, essay items that follow each other are more likely to receive similar marks than those that are farther apart (Attali, 2011). The reliability of essay tests can be increased significantly by using detailed scoring guides such as checklists and rubrics. These guides typically specify as precisely as possible details of the points, arguments, conclusions, and opinions that will be considered in the scoring, and the weightings assigned to each. Essay questions can be developed to assess knowledge of subject matter (remembering, in Bloom’s revised taxonomy); or they can be designed to tap any of the higher level intellectual
  • 93. skills. Figure 8.13 gives examples of how this might be done. Developing Constructed-Response Assessments Chapter 8 Writing good essay questions is not as time-consuming as writing objective items such as multiple-choice questions, but it does require attention to several guidelines. These are sum- marized in Table 8.8. Figure 8.13: Essay items tapping higher intellectual skills What each item assesses depends on what test takers already know and the strategies they use to craft their responses. Even responding to the first item, remembering, might require a great deal of creating, analyzing, evaluating, and understanding if the learner has not already memorized a correct response. f08.13_EDU645.ai Bloom’s Revised Taxonomy of Educational Objectives Possible activity representing each objective Example of essay item that reflects the boldface verb
  • 94. in the second column copy, duplicate, list, learn, replicate, imitate, memorize, name, order, relate, reproduce, repeat, recognize . . . indicate, know, identify, locate, recognize, report, explain, restate, review, describe, distinguish . . . demonstrate, plan, draw, outline, dramatize, solve, choose, sketch, solve interpret, operate, do, . . . calculate, check, categorize, balance, compare, contrast, test, differentiate, examine , try . . . assess, choose, appraise, price, defend, judge, rate, calculate, support, criticize, predict . . . arrange, write, produce, make, design, devise,
  • 95. formulate, compose, construct, build, generate, craft . . . Devise a procedure with accompanying formulas that can be used to calculate the volume of a small rock of any shape. Describe each step in the procedure. List the five most important events that led to the Second World War. Write an essay appraising the American educational system. What are its strengths and weaknesses? How can it be improved? (Recommended length: 500–1000 words.) Read the following two paragraphs. Compare the use of figures of speech in each. How are they similar? How are they different? Using the formula for compound interest, calculate the monthly payment for a $150,000 loan at 6.7% amortized over 22 years. Explain the implications of lengthening the amortization period. Explain in no more than half a page how an internal combustion engine works.
  • 96. Remembering Understanding Applying Analyzing Evaluating Creating Section Summaries Chapter 8 Table 8.8 Checklist for constructing good essay items Yes No Do the essay questions assess intended instructional objectives? Yes No Have I worded the stimulus so that the requirements are clear? Yes No Am I assessing more than simple recall? Yes No Have I indicated how much time should be spent on each item (usually by saying how many points each item is worth)? Yes No Have I developed a scoring rubric or checklist? Planning for Assessment In Chapter 2, and again, earlier in this chapter, we described the various steps that make up
  • 97. an intelligent and effective assessment plan: 1. Know clearly what your instructional objectives are, and communicate them to your students. 2. Match instruction to objectives; match instruction to assessment; match assessment to objectives. 3. Use formative assessment as an integral part of instruction. 4. Use a variety of different assessments, especially when important decisions depend on their outcomes. 5. Use blueprints to construct tests and develop keys, checklists, and rubrics to score them. The importance of these five steps can hardly be overemphasized. Chapter 8 Themes and Questions Section Summaries 8.1 Planning for Teacher-Made Tests Important steps in planning for assessment include clarifying instructional objectives, devising test blueprints, matching instruction and assess- ment to goals, developing rubrics and other scoring guides, and using a variety of approaches to assessment. Assessment should be used for placement decisions, for improving teaching and learning (diagnostic and formative functions), and for evaluating and grading achieve- ment and progress (summative function).