DISCUSSION PAPER
Stuart R. Kahl, Ph.D., CEO | kahl.stuart@measuredprogress.org
Michael Nering, Ph.D., Assistant Vice President | nering.michael@measuredprogress.org
Michael Russell, Ph.D., Vice President | mike@nimbletools.com
Peter D. Hofman, Vice President | hofman.peter@measuredprogress.org
www.measuredprogress.org | 800.431.8901 | 100 Education Way, Dover, NH 03820
Adaptive Testing, Learning Progressions,
and Students with Disabilities
May 17, 2011
Introduction
This paper responds to the legitimate concern that a
student may not be afforded a sufficient opportunity
during adaptive testing for accountability purposes to
demonstrate what on-grade-level knowledge or skills he or
she actually has. We summarize below different means of
achieving the desired objective of providing all students a
valid opportunity to perform at grade level prior to being
presented with off-grade test content. We also discuss the
implications of learning progressions in creating adaptive
assessments.
For purposes of this discussion, we assume that sufficient
accommodations are embedded in adaptive testing
systems so that the maximum number of students
possible has access to the test content and the ability
to respond to items. In other words, the absence of
accommodations is not an obstacle to students’ correctly
responding to grade-level test content. We do recognize
that this assumption will likely require modifications to
most, if not all, adaptive platforms currently available.
With that assumption in mind, three options—which
might be combined—could address the key issue:
• Using a stage-adaptive rather than an item-adaptive test design
• Developing multiple adaptive tests at finer grains of detail (at the strand or even the learning progression level rather than at the content area level)
• Employing Monte Carlo simulations to anticipate a wide range of student performance scenarios, then using the results to adjust the item pool characteristics and the adaptive algorithm to achieve the desired objective
Adaptive Testing and Development
Options
The most commonly used adaptive tests are adaptive at
the total test level and use individual item difficulties to
zero in on an estimate of a student’s overall ability – the
total test score. Answering the first question correctly
leads to a harder second question. Answering that one
incorrectly would lead to the third question being an
easier one. As this process continues, an estimate within
a specified error tolerance eventually emerges. Note that
commonly with this type of adaptive test, item difficulty
and students’ answers (whether or not they are correct)
drive the adaptive algorithm.
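To make these mechanics concrete, here is a minimal sketch of an item-adaptive loop under a one-parameter (Rasch) IRT model. The item bank, the maximum-information selection rule, the one-step ability update, and the stopping tolerance are all illustrative assumptions, not a description of any particular operational system.

```python
import math
import random

def p_correct(theta, b):
    """Rasch (1PL) probability of a correct response for ability theta, item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_adaptive_test(respond, bank, tol=0.4, max_items=40):
    """Administer one item at a time, re-estimating ability after each response.

    respond(b) -> bool records the student's answer to an item of difficulty b.
    Stops when the standard error of the ability estimate falls below tol.
    """
    theta, seen = 0.0, []
    available = list(bank)
    while available and len(seen) < max_items:
        # Pick the unused item whose difficulty is closest to the current
        # estimate -- the maximum-information item under the 1PL model.
        b = min(available, key=lambda d: abs(d - theta))
        available.remove(b)
        seen.append((b, respond(b)))

        # One Newton step on the log-likelihood to update the estimate,
        # clamped to a plausible range to keep the sketch numerically stable.
        info = sum(p_correct(theta, d) * (1 - p_correct(theta, d)) for d, _ in seen)
        score = sum(r - p_correct(theta, d) for d, r in seen)
        theta = max(-4.0, min(4.0, theta + score / info))
        if 1.0 / math.sqrt(info) < tol:  # standard error within tolerance
            break
    return theta

# Simulated student with true ability 0.8 and a uniform difficulty bank.
bank = [i / 10.0 for i in range(-30, 31)]
estimate = item_adaptive_test(lambda b: random.random() < p_correct(0.8, b), bank)
print(f"estimated ability: {estimate:.2f}")
```

Note how, in a loop like this, a run of early incorrect answers drags the estimate down before later items can pull it back, which is precisely the concern raised above.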
However, poor performance on a few early items can
prematurely drive the next items and the ultimate ability
estimate too low. One way of reducing the chances of
this happening would be to use stage-adaptive testing,
whereby the student completes a full cluster of “on-grade” items
before the next cluster is selected for that
student. This approach still assumes the goal is a total
test score for a purpose such as statewide accountability.
And, again, item difficulty generally plays a key role in the
adaptive algorithm.
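A stage-adaptive design can be sketched in the same spirit: every student finishes an entire on-grade module before any routing decision is made, so a single early stumble cannot by itself push the student off grade. The module contents and routing cutoffs below are invented for illustration.

```python
import math
import random

def stage_adaptive_test(respond, stages, cuts=(0.34, 0.67)):
    """Route a student through fixed clusters ("modules") of items.

    stages: list of dicts mapping a route ("easy"/"medium"/"hard") to a list
    of item difficulties. Everyone starts in the on-grade "medium" module.
    """
    route, taken, correct = "medium", 0, 0
    for stage in stages:
        module = stage[route]
        got = sum(respond(b) for b in module)  # complete the whole cluster first
        taken += len(module)
        correct += got
        # Route on the cluster's proportion correct, never on a single item.
        frac = got / len(module)
        route = "easy" if frac < cuts[0] else ("hard" if frac > cuts[1] else "medium")
    return correct / taken

# Two stages of five items each; difficulties are illustrative logits.
stages = [
    {"easy": [-1.5] * 5, "medium": [0.0] * 5, "hard": [1.5] * 5},
    {"easy": [-1.0] * 5, "medium": [0.5] * 5, "hard": [2.0] * 5},
]
simulate = lambda b: random.random() < 1.0 / (1.0 + math.exp(-(0.8 - b)))
print(f"proportion correct: {stage_adaptive_test(simulate, stages):.2f}")
```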
The psychometric underpinning of adaptive testing as
described above is Item Response Theory, which assumes
that every item is an estimator of an underlying general
ability, mathematical ability for instance. However,
students’ knowledge and skill in mathematics are not
so neatly ordered. For a variety of reasons, including
the content and quality of instruction, a student might
be higher performing in one area of mathematics than
in another. Separate adaptive tests in each area (tests
that are adaptive at the math strand level) could reveal
this. In other words, a student who overall might be
pegged as below grade level could nonetheless perform at
grade level in one or more areas. Still, math strands (e.g.,
geometry and measurement, probability and statistics)
are quite broad. For this reason, consideration should be
given to tests that are adaptive at a still finer level, such as
learning progressions. In fact, for adaptive testing to yield
meaningful information that might be considered diagnostic,
this might be the only appropriate approach. The finer
the level at which a test is adaptive, the more diagnostic
the results would be. A student profile of results from a
series of adaptive tests at the learning progression level
could show quite variable results across progressions for
a student, some revealing grade level proficiency, others
perhaps not. Such an approach would ensure that students
would be able to demonstrate their on-grade knowledge
and skills before receiving off-grade test content. And the
results on the individual adaptive tests in the series could
be combined to generate a total test score.
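A profile built from several narrow adaptive tests might be assembled as in the sketch below. The strand names, the stubbed per-strand mini-test, and the equal-weight combination rule are all assumptions for illustration; an operational system would run a full adaptive engine, such as the loop sketched earlier, over each strand's item pool.

```python
import random

def run_strand_cat(strand, respond):
    """Stub for a short adaptive test scoped to a single strand.

    A real system would run a full adaptive loop over that strand's items;
    here we fake an estimate on a -2..2 scale to show how results combine.
    """
    return sum(respond(strand) for _ in range(8)) / 8.0 * 4.0 - 2.0

def profile_and_total(strands, respond, on_grade_cut=0.0, weights=None):
    """Run one adaptive mini-test per strand; report a profile and a total."""
    weights = weights or {s: 1.0 for s in strands}
    profile = {s: run_strand_cat(s, respond) for s in strands}
    flags = {s: ("on grade" if est >= on_grade_cut else "below")
             for s, est in profile.items()}
    total = sum(profile[s] * weights[s] for s in strands) / sum(weights.values())
    return profile, flags, total

strands = ["number sense", "geometry & measurement", "probability & statistics"]
# Simulated student who is strong in geometry but weaker elsewhere.
skill = {"number sense": 0.35, "geometry & measurement": 0.75,
         "probability & statistics": 0.45}
profile, flags, total = profile_and_total(strands, lambda s: random.random() < skill[s])
print(flags, f"combined score: {total:.2f}")
```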
A different approach would be to study, ahead of
time, the concern that students might receive off-grade
items before having an opportunity to demonstrate the
knowledge, skills, and/or abilities associated with their
actual grade level. This approach would make use of
Monte Carlo simulations. For example, if a concern existed
that a low-performing student's initial trouble
with the navigation controls in a computer adaptive test
(CAT) could result in that student being routed to off-grade
(i.e., one grade lower) items, we could simulate this
scenario to see what might happen. CATs have two key
components that will affect the results of this scenario:
the item pool characteristics, and the CAT algorithm.
Ideally, CAT designers/developers optimize the algorithm
relative to the item pool and to scenarios such as the one
above. By carefully constructing the components of the
algorithm, the designers/developers can test various real
world scenarios ahead of CAT administration so that they
can develop a fair and precise administration process for
students across the entire performance continuum. As
another example, we may find that students who struggle
early in a CAT and are routed to a lower grade level
have no chance of recovering and being administered
items at their actual grade level. To mitigate this, we may
find it necessary to increase the size of the item pool or
adjust the selection algorithm, so that we can more
precisely determine when a student should be routed
to a lower grade level, or give students better
opportunities to return to grade level once they have
been routed downward. Note that we could
use the two distinct adaptive options (stage-adaptive or
narrow-scope item-adaptive) as part of a Monte Carlo
analysis during the assessment development stage. The
process would include pilot testing, since the Monte Carlo
analyses are data driven.
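The kind of check described here might look like the following sketch: simulate many students whose true ability is on grade but who, by assumption, miss their first few items (navigation trouble, say), then measure how often the algorithm lets them climb back to on-grade content. The recovery criterion and every parameter below are illustrative; in practice the simulation would be run against the actual item pool and algorithm.

```python
import math
import random

def p_correct(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def simulate_examinee(theta, bank, botched=3, n_items=25):
    """Run one simulated CAT, forcing the first `botched` responses wrong."""
    est, seen = 0.0, []
    available = list(bank)
    for i in range(n_items):
        b = min(available, key=lambda d: abs(d - est))
        available.remove(b)
        r = False if i < botched else (random.random() < p_correct(theta, b))
        seen.append((b, r))
        # Same one-step ability update as an ordinary item-adaptive loop.
        info = sum(p_correct(est, d) * (1 - p_correct(est, d)) for d, _ in seen)
        score = sum(r - p_correct(est, d) for d, r in seen)
        est = max(-4.0, min(4.0, est + score / info))
    # Count as "recovered" if the final item was back at on-grade difficulty.
    return seen[-1][0] >= 0.0

def recovery_rate(n=2000, theta=0.5):
    bank = [i / 10.0 for i in range(-30, 31)]
    return sum(simulate_examinee(theta, list(bank)) for _ in range(n)) / n

print(f"recovery rate after 3 botched items: {recovery_rate():.1%}")
```

If the simulated rate came back unacceptably low, the remedies discussed above (a larger item pool, a modified selection rule) could be applied and the simulation rerun.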
The Implications of Learning
Progressions
Although the issue is far broader and more complex, when the
topic of adaptive testing arises, people might think about
the use of learning progressions. A learning progression
describes the developmental process of acquiring
knowledge and/or building skills one would need in order
to master an area. The individual kernels of knowledge
and/or skills are connected in some fashion that builds
capacity and expertise. The concept implies a linear path,
although the actual process may not be linear at all and, for
any particular area, might vary for different people.
If educators desire to use the results of adaptive
assessment to inform instructional interventions, then it
makes sense to use a learning progression in developing
the adaptive algorithm and selecting items. In theory,
learning progressions could be used in both item- and
stage-adaptive tests. However, the extent to which the
learning progressions are not well articulated (e.g., in
general or just for some learners) will limit the ability
of the adaptive tests to precisely determine where a given
student is located on the performance continuum. This is
not so much a limitation of adaptive tests per se as it is of
using something as complicated as learning progressions
as a method of adaptation. If a sufficient item pool exists,
and if we were to use different learning progressions in
developing the adaptive algorithms, we might end up
developing distinct tests.
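As one simple illustration of progression-driven adaptation, the sketch below walks a student up an ordered progression one step at a time and stops at the first step that is not mastered, locating the student's frontier. The fraction-related step labels and the two-of-three mastery rule are invented for the example.

```python
def locate_on_progression(respond, progression, per_step=3, mastery=2):
    """Walk a linear learning progression step by step.

    progression: ordered list of step labels, easiest first. At each step
    the student answers `per_step` items; `mastery` correct advances them.
    Returns the highest step mastered (or None if the first step is not).
    """
    frontier = None
    for step in progression:
        correct = sum(respond(step) for _ in range(per_step))
        if correct >= mastery:
            frontier = step   # mastered; probe the next step up
        else:
            break             # frontier found; stop climbing
    return frontier

fractions = ["unit fractions", "equivalence", "comparison",
             "addition/subtraction", "multiplication/division"]
# Deterministic stand-in: this student has mastered the first three steps.
mastered = set(fractions[:3])
print(locate_on_progression(lambda step: step in mastered, fractions))
```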
The complexity in using learning progressions arises in
large part from two key factors: (1) their relative newness
and lack of articulation in some content/sub-content
areas, and (2) unresolved questions surrounding the
most accurate/useful way to describe students’ learning
patterns—whether one size can really fit all. Put simply, we
need to be careful about what is not well articulated versus
for whom a learning progression is not well articulated.
Not unexpectedly, but unfortunately, the relative newness
of learning progressions has led to varied views of what
they are. The early Common Core State Standards work
regarded learning progressions as fairly broad. Others
have defined them more narrowly. Various experts
are exploring and developing learning progressions.
Some are quite generalized, while others are extremely
granular. Some experts have suggested teachers take
a practical approach and merely “chunk” content in a
logical progression, whereas others have taken a far more
rigorous, data-driven (i.e., expensive and time-consuming)
approach to develop specific, detailed progressions. And,
as noted, coverage is incomplete across—and in some
cases within—content areas. In general, the broader a sub-
domain is, the more paths there are for students to follow
– that is, the order in which they learn things can vary. If
progressions are very narrowly defined, then within one,
the order of learning might be fairly universal. The latter
situation could necessitate a large number of progressions
to cover a sub-domain and might only be applicable in
mathematics and possibly certain sub-domains of science.
Having multiple progressions in a particular area might
greatly increase instructional complexities and challenges,
although that approach might appear to better address the
needs of diverse student populations.
As noted, the varied views of learning progressions
stem from the fact that this is a relatively new topic.
Theoretically, they exist for all knowledge/skill areas and
apply to all students. Different theoretical perspectives
exist on how to establish and measure them.
An example of a distinct theoretical approach is to
conceive of a learning progression as a honeycomb, with
each cell representing a discrete body of knowledge or
skill. Many cells build on or support each other, and there
may be a linear progression to the development of those
associated skills, but other skills or knowledge may
develop in a non-linear manner. Progression
occurs as more cells are “filled” in, but in this model the
order in which cells are filled in is of less importance
and can vary among students. This intriguing concept
has the advantage of addressing differences in student
learning patterns while not greatly increasing instructional
complexity.
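Computationally, the honeycomb idea might be represented as a set of cells with soft supporting links rather than one ordered chain, as in this sketch; the cell names and links are invented. Progress is coverage of the comb, while the fill order stays flexible.

```python
# Each cell names the cells that support it; supports suggest, but do not
# strictly dictate, an order (the non-linear aspect of the honeycomb).
COMB = {
    "counting": [],
    "place value": ["counting"],
    "addition": ["counting"],
    "subtraction": ["addition"],
    "multiplication": ["addition", "place value"],
}

def progress(filled):
    """Progress is the share of cells filled, regardless of fill order."""
    return len(filled & COMB.keys()) / len(COMB)

def ready_next(filled):
    """Cells whose supports are all filled: sensible, not mandatory, next steps."""
    return [c for c, supports in COMB.items()
            if c not in filled and all(s in filled for s in supports)]

student = {"counting", "addition"}
print(f"progress: {progress(student):.0%}; suggested next: {ready_next(student)}")
```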
Bringing us back full circle, with a sufficient item pool,
even if we found the honeycomb concept of a learning
progression to be most accurate, we could use it to design an
appropriate adaptive assessment. The bottom line is that at
this point we don’t know whether students with disabilities
require different learning progressions, or whether a
honeycomb or another pattern is more accurate, but we think
this is an extremely important issue to resolve.