Findings on Early Reading Assessments from CIERA Research
Reading Assessments in Kindergarten through Third Grade: Findings from the Center for
the Improvement of Early Reading Achievement
Author(s): Scott G. Paris and James V. Hoffman
Source: The Elementary School Journal, Vol. 105, No. 2, Lessons from Research at the Center
for the Improvement of Early Reading Achievement; Joanne F. Carlisle, Steven
A. Stahl, and Deanna Birdyshaw, Guest Editors (November 2004), pp. 199-217
Published by: The University of Chicago Press
Stable URL: http://www.jstor.org/stable/10.1086/428865
Accessed: 19/06/2013 12:22
This content downloaded from 134.84.217.180 on Wed, 19 Jun 2013 12:22:20 PM
All use subject to JSTOR Terms and Conditions
The Elementary School Journal
Volume 105, Number 2
© 2004 by The University of Chicago. All rights reserved.
0013-5984/2004/10502-0005$05.00
Reading Assessments
in Kindergarten
through Third Grade:
Findings from the
Center for the
Improvement of Early
Reading Achievement
Scott G. Paris
University of Michigan
James V. Hoffman
University of Texas at Austin
Abstract
Assessment of early reading development is im-
portant for all stakeholders. It can identify chil-
dren who need special instruction and provide
useful information to parents as well as sum-
mative accounts of early achievement in schools.
Researchers at the Center for the Improvement of
Early Reading Achievement (CIERA) investi-
gated early reading assessment in a variety of
studies that employed diverse methods. One
group of studies used survey methods to deter-
mine the kinds of assessments available to teach-
ers and the teachers’ reactions to the assess-
ments. A second group of studies focused on
teachers’ use of informal reading inventories for
formative and summative purposes. In a third
group of studies, researchers designed innova-
tive assessments of children’s early reading, in-
cluding narrative comprehension, adult-child interactive reading, the classroom environment,
and instructional texts. The CIERA studies pro-
vide useful information about current reading
assessments and identify promising new direc-
tions.
Achievement testing in the United States
has increased dramatically in frequency and
importance during the past 25 years and is
now a cornerstone of educational practice
and policy making. The No Child Left Be-
hind (NCLB) (2001) legislation mandates
annual testing of reading in grades 3–8 and
increased assessment for students in grades
K–3 with clear intentions of increased ac-
countability and achievement. The ration-
ales for early assessment lie in (a) research
on reading development that indicates the
importance of basic skills for future success
and (b) classroom evidence that early diag-
nosis and remediation of reading difficul-
ties can improve children’s reading achieve-
ment (Snow, Burns, & Griffin, 1998). The
unprecedented federal resolve and resources at the beginning of the twenty-first
century that are focused on the improve-
ment of children’s reading achievement re-
quire researchers and educators to identify
useful assessment tools and procedures.
The Center for the Improvement of
Early Reading Achievement (CIERA), a
consortium of researchers from many uni-
versities, was funded and became opera-
tional in 1997. Assessment of reading
achievement and the corresponding prac-
tices and policies were major foci of the re-
search agenda. It is important to note that
the CIERA research was proposed, and of-
ten conducted, before the report of the Na-
tional Reading Panel (2000) and before the
NCLB legislation. These important events
did not frame CIERA research at the time,
but they certainly influence the interpreta-
tion of assessment tools today. For example,
both the NRP and NCLB emphasized five
essential skills for beginning reading suc-
cess: the alphabetic principle, phonemic
awareness, oral reading fluency, vocabu-
lary, and comprehension. Consequently,
many of the early reading assessments de-
veloped recently have focused on those
skills, especially the first three. CIERA re-
searchers acknowledge the importance of
assessing these skills, but they chose to in-
vestigate a broader array of assessment is-
sues and practices partly because there
were already many assessments of the al-
phabetic principle, phonemic awareness,
and oral reading fluency, such as the Dynamic Indicators of Basic Early Literacy Skills (DIBELS), a popular and quick battery of reading assessments (Good & Kaminski, 2002).
Moreover, CIERA researchers in 1997
wanted to survey teachers to find out what
assessments they used and why, as well as
to identify new kinds of assessments for
nonreaders and beginning readers.
CIERA research on early reading assess-
ment was proposed and conducted in an
era of increased testing and evidence-based
policy making. An initial cluster of studies
examined the kinds of reading assessments
available and used by teachers in order to
describe current classroom practices. The
studies were intended to be surveys of best
practices in schools. A second group of
studies examined the use of oral reading as-
sessments to determine students’ adequate
yearly progress (AYP) because many infor-
mal reading inventories were being trans-
formed into formal summative assessments
of reading achievement. Several other
CIERA studies examined innovative tools
and new directions for assessing early read-
ing achievement. The research was explor-
atory, eclectic, and conducted by multiple
investigators, but collectively, the studies
help to identify promising assessments of
reading along with some practical obstacles
to implementation. We present the findings
of these three groups of studies and con-
clude with a discussion of future directions
in K–3 reading assessment research.
Surveys of Early Reading
Assessments
Many teachers are overwhelmed by the nu-
merous reading assessments mandated by
policy makers, advocated by publishers, re-
quired by administrators, or simply rec-
ommended for classrooms. We begin with
an examination of two CIERA studies on
the variety of assessment instruments avail-
able for K–3 teachers. We then examine two
CIERA studies of teachers’ attitudes toward
and use of assessments. The first two stud-
ies differ in their focus on commercial and
noncommercial measures. Both studies fol-
lowed up on the pioneering research by
Stallman and Pearson (1990), who con-
ducted one of the first comprehensive sur-
veys of early reading measures.
The Commercial Marketplace
Ten years after Stallman and Pearson’s
(1990) study, Pearson, Sensale, Vyas, and
Kim (1999) conducted a similar study of
commercial reading tests. They identified
148 tests with 468 subtests in their CIERA
survey. More than half of the tests had been
developed in the 1990s, and more than half
were designed for individual administration, clearly a response to the preponder-
ance of group tests in the previous decade.
Multiple-choice responses and marking an-
swer sheets still predominated over read-
ing, writing, or open-ended responses.
Nearly all tests were administered with a
mixture of visual and auditory presenta-
tions. In contrast to the previous decade,
about 40% of tests required production as a
response mode. Recognition was required
in about 40% of the tests and identification
in only 10%. Scoring ease may have driven
the response mode because more than 60%
of the tests could be scored simply as right
or wrong, less than 20% contained multiple-
choice items, and only 10% of tests used ru-
brics to score answers.
Pearson et al. (1999) analyzed the skills
assessed in the 148 tests and found that
word knowledge, such as concept identifi-
cation, was assessed in 50%; sound and
symbol concepts were assessed in 65%; lit-
eracy and language concepts were assessed
in 90%; and comprehension was assessed in
only 24% of the tests. When they analyzed
only the K–1 tests to compare with the Stall-
man and Pearson (1990) findings, they
found that 52% compared to a previous 18%
of tests were administered to individual
children. Only 36% of the tests compared to
the previous 63% were multiple-choice
tests, and the heavy emphasis on sound-symbol correspondence was reduced by half
and replaced by a much stronger emphasis
on language and literacy concepts. These
changes may be due to the growing influ-
ence of whole language, Clay’s Observation
Survey (1993), and assessment methods
used in Reading Recovery throughout the
1990s. Although the type of processing re-
quired was still largely recognition, it had
decreased from 72% of tests in the first sur-
vey to 51%. Likewise, filling in bubbles de-
creased from 63% to 28% of the tests, and
oral responding increased from 12% to 39%.
The authors also noted the variety of
new reading assessments that emerged in
the 1990s. Kits, such as the DIAL-R, that included assessment batteries became more
prominent. Some elaborate systems were
developed for using classroom assessment
for formative and summative functions. For
example, the Work Sampling System (Mei-
sels, Liaw, Dorfman, & Nelson, 1995) in-
cludes developmental benchmarks from
ages 3 to 12 in behavioral, academic, and
affective domains that can be used with
teachers’ checklists and students’ portfo-
lios to monitor individual growth and
achievement. The kits and elaborate sys-
tems usually include teachers’ guides, cur-
riculum materials, and developmental ru-
brics. Leveled books became a popular tool
for determining children’s reading levels
for the assessment tasks, again reflecting
the influence of Reading Recovery, Guided
Reading, and similar derivations for in-
struction.
Pearson et al. (1999) concluded that
commercial reading tests in the late 1990s
were much more numerous and varied than
the tests available 10 years earlier. More
skills were tested, particularly language
and literacy concepts. More choices, judg-
ments, and interpretations were required
from the examiner, usually the teacher, to
use the new tests. However, there was still
a preponderance of recognition responses
and filling in bubbles on answer sheets. The
researchers suggested that the changes in
early reading assessments during the 1990s
reflected the influences of three thematic
changes to early literacy and language arts
in classrooms: emergent literacy, process
writing approaches, and performance as-
sessments throughout the curriculum.
Noncommercial Assessments
The second CIERA survey of early read-
ing assessments was conducted by Meisels
and Piker (2000). Their study had three objectives: “1) to gain an understanding of
classroom-based literacy measures that are
available to teachers; 2) to characterize the
instructional assessments teachers use in
their classrooms to evaluate their students’
literacy performance; and 3) to learn more
about how teachers assess reading and writing skills” (p. 5). In contrast to the previous
studies of commercial assessments, Meisels
and Piker (2000) examined noncommercial
literacy assessments that were nominated
by teachers and educators or used by school
districts. They excluded assessments used
for research or accountability and focused
on K–3 instruments. Some assessments of
motivation and attitudes were also included. The researchers collected information from educational listservs, personal
contacts, literature searches, published re-
views of the measures, Web sites, and news-
letter postings so their survey was directed
at assessments in use rather than on sale in
the marketplace.
Their search identified 89 measures, 60
of which were developed in the 1990s. The
coding categories Meisels and Piker (2000)
used were adapted from the Stallman and
Pearson (1990) survey and categorized mea-
sures on 13 literacy skills: print awareness,
phonics, reading, reading strategies, com-
prehension, writing process, writing con-
ventions, motivation, self-perception, meta-
cognition, attitude, oral language listening
and speaking, and oral language other. The
first six of these skills are most directly re-
lated to reading assessment. However, the
CIERA researchers identified 203 subskills
among these 13 categories. This is again an
indication of the conceptual and practical
decomposition of literacy skills in complex
assessment batteries.
Meisels and Piker (2000) found that 70%
of the measures were designed for individ-
ual administration and nearly half were in-
tended for all grades, K–3. Only five were
available in languages other than English
(four in Spanish, one in Danish). Of the 13
skills, phonics, comprehension, and reading
were assessed most frequently, and moti-
vation, self-perception, and attitudes were
measured least often. Of the 89 measures,
47 were based on observation or on-
demand methods for evaluating students’
literacy. Constructed responses were used
mostly with writing. Checklists were used
in 36% of the measures, and running re-
cords in 15%. The most frequent kind of re-
sponse was oral response on 64% of the
measures, followed by writing on 46% of
the measures. Recognition, identification,
and recall were used to assess about one-
third of the skills. Meisels and Piker (2000)
then examined the skills assessed in each
test and found that 70% were assessed with
observations and that the data were re-
corded in checklists (69%) or anecdotal ob-
servations (45%) most often. Both the lim-
ited response formats for students and the
informal records of teachers are worth not-
ing.
Meisels and Piker (2000) examined the
measures for evidence of psychometric re-
liability and validity and expressed disap-
pointment with the results. Only 14% of the
measures had evidence of good reliability
that ranged from high to moderate. Even
less information was available about valid-
ity. No consistent tests or benchmarks were
used to establish concurrent or predictive
validity. The researchers noted that the non-
commercial measures were less likely to in-
clude psychometric evidence than commer-
cial tests. In comparing their results to the
Stallman and Pearson (1990) study, Meisels
and Piker (2000) also noted that the non-
commercial measures were usually de-
signed for individuals, not groups, and had
more opportunities for students to identify
or produce answers rather than just recog-
nize correct choices. Noncommercial mea-
sures usually had fewer guidelines for ad-
ministering, recording, and interpreting the
assessment information.
How Teachers Use and Regard
Reading Assessments
The next set of studies went beyond a
consideration of the instruments to examine
how teachers use and evaluate them. Paris,
Paris, and Carpenter (2002) reported find-
ings from a survey of teachers’ perceptions
of assessment in early elementary grades.
They asked successful teachers what kinds
of reading assessments they used for what
purposes so that a collection of “best practices” might be available as models for
other teachers. The assessment survey was
a part of a large CIERA survey of elementary teachers who taught in “beat the odds”
schools to determine their practices and
views. These schools across the United
States had a majority of students who qual-
ified for Title I programs and had a mean
school test score on some standardized
measure of reading achievement that was
higher than the average score of other Title
I schools in the state. Most of the selected
schools also scored above the state average
for all schools. Candidate schools were se-
lected from a network of CIERA partner
schools as well as from annual reports of
outstanding schools in 1996, 1997, and 1998
as reported by the National Association of
Title I Directors.
The sample included 504 K–3 classroom
teachers in “beat the odds” schools, but the
anonymous and voluntary survey made it
impossible to determine if these were the
most effective teachers in the schools. In
the first part of the survey, teachers were
asked to record the types of reading as-
sessments used in their classrooms and
the frequency with which they used each
one. Most teachers reported that they used
all of the assessment types; 86% used per-
formance assessments, 82% used teacher-
designed assessments, 78% used word at-
tack/word meaning, 74% used measures
of fluency and understanding, 67% used
commercial assessments, and 59% used
standardized reading tests.
The survey showed that K–3 teachers
used a variety of assessments in their class-
rooms daily. Assessments designed by
teachers, including the instructional assess-
ments Meisels and Piker (2000) examined,
were used most frequently, and standard-
ized tests were used least often. This con-
trast was most evident for K–1 teachers
who rarely used standardized tests. The
survey showed that K–3 teachers used ob-
servations, anecdotal evidence, informal in-
ventories, and work samples as their main
sources of evidence about children’s read-
ing achievement and progress. The survey
also showed the variety of tools available to
teachers and the large variation among
teachers in what they used. The daunting
variety of assessments requires a highly
skilled teacher to select and use appropriate
tools.
Another part of the survey posed ques-
tions about the effects of assessments on
various stakeholders. In general, teachers
reported that teacher-designed, informal as-
sessments had more positive effects on stu-
dents, teachers, and parents. Conversely,
teachers believed standardized and com-
mercial assessments had a higher positive
effect on administrators. These patterns
suggest that teachers differentiate between
assessments over which they have control
and assessments generated externally in
terms of their effects on stakeholders. It is
ironic that teachers believed that the most
useful assessments for students, teachers,
and parents were valued less by adminis-
trators than standardized and commercial
assessments.
Responses to High-Stakes Assessment
A fourth survey conducted by CIERA
researchers gathered the views of teachers
regarding high-stakes testing (Hoffman,
Assaf, & Paris, 2001). This study, which sur-
veyed reading teachers in Texas, was de-
signed as a modified replication of earlier
investigations of teachers’ views of high-
stakes testing in Arizona (Haladyna, Nolen,
& Haas, 1991) and Michigan (Urdan &
Paris, 1994). Texas is recognized nationally
as one of the leaders in the testing and ac-
countability movement. The Texas Assess-
ment of Academic Skills (TAAS) was the
centerpiece of the state’s accountability sys-
tem throughout the 1990s. The TAAS was a
criterion-referenced assessment of reading
and mathematics given to all Texas students
in grades 3–8 near the end of the year. It has
recently been replaced by the Texas Assess-
ment of Knowledge and Skills (TAKS), but
the design and use are essentially the same.
The study, conducted in 1998–1999, included responses from 200 experienced
reading specialists who returned a mail sur-
vey. For the most part, respondents were
older (61% between the ages of 40 and 60)
and more experienced (63% with over 10
years experience and 45% with over 20
years experience) than Texas classroom
teachers in general. Most respondents were
working in elementary grades (78%) and in
minority school settings (81%) serving low-
income communities (72%) where the need
for reading specialists was greatest and
funds for them were most available.
To examine general attitudes, we created
a composite scale for the following four
items from this section:
• Better TAAS tests will make teachers
do a better job.
• TAAS tests motivate students to learn.
• TAAS scores are good measures of
teachers’ effectiveness.
• TAAS test scores provide good com-
parisons of the quality of schools from
different districts.
Each of these items represents some of
the political motivations and intentions
that underlie the TAAS. Respondents rated
each item on a scale ranging from 1
(strongly disagree) to 4 (strongly agree).
The average rating on this composite variable was 1.7 (SD = .58), suggesting that
reading specialists strongly disagreed with
some of the underlying assumptions of and
intentions for the TAAS.
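A composite of this kind is simply each respondent’s mean rating across the four items, with the group mean and standard deviation then computed over those per-respondent scores. A minimal sketch, using hypothetical ratings rather than the actual survey responses:

```python
# Composite attitude score: average each respondent's ratings on the
# four TAAS items (1 = strongly disagree ... 4 = strongly agree).
# The ratings below are hypothetical, for illustration only.
from statistics import mean, stdev

ratings = [
    [1, 2, 1, 2],  # one respondent's ratings on the four items
    [2, 3, 2, 2],
    [1, 1, 2, 1],
]

composites = [mean(r) for r in ratings]   # per-respondent composite score
group_mean = mean(composites)             # reported as M in the text
group_sd = stdev(composites)              # reported as SD in the text
print(round(group_mean, 2), round(group_sd, 2))  # -> 1.67 0.52
```

Averaging items into one composite assumes they measure a single underlying attitude; that assumption is usually checked with an internal-consistency statistic before the composite is interpreted.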
Another composite variable was created
with items related to the validity of the
TAAS as a measure of student learning. The
four items included in this analysis were:
• TAAS tests accurately measure achieve-
ment for minority students.
• TAAS tests accurately measure achieve-
ment for limited English-speaking stu-
dents.
• Students’ TAAS scores reflect what
students have learned in school during
the past year.
• Students’ TAAS scores reflect the cu-
mulative knowledge students have
learned during their years in school.
The average rating on this composite variable was also 1.7 (SD = .58), suggesting that
reading specialists challenge the validity of
the test, especially for minority students
and ESL speakers, who are the majority of
students in Texas public schools.
Contrast these general attitudes and be-
liefs regarding TAAS with the perception of
the respondents that administrators believe
TAAS performance is an accurate indicator
of student achievement (M = 3.1) and the
quality of teaching (M = 3.3). Also, contrast
this with the perception of the reading specialists that parents believe the TAAS reflects the quality of schooling (M = 2.8).
The gaping disparity between the percep-
tions of those responding and their views of
administrators’ and parents’ attitudes sug-
gests an uncomfortable dissonance. Other
parts of the TAAS survey revealed that
reading specialists reported more pressure
to cheat on the tests among low-performing
schools, inappropriate uses of the TAAS
data, adverse effects on the curriculum, too
much time spent on test preparation, and
negative effects on teachers’ morale and
motivation. In sum, the survey revealed un-
intended and negative consequences of
high-stakes testing that are similar to results
of other studies of the consequences of
high-stakes testing (e.g., Paris, 2000; Paris,
Lawton, Turner, & Roth, 1991; Urdan &
Paris, 1994).
Summary of the CIERA Surveys
The four CIERA surveys support several
conclusions. First, a vast assortment of com-
mercial and informal reading assessments is
available for K–3 classroom teachers. Stall-
man and Pearson (1990) identified 20 com-
mercial reading tests, yet 10 years later
Pearson et al. (1999) found 148, and the
number is certainly higher today. However,
commercial tests are not the only source of
reading assessments. Meisels and Piker
(2000) solicited information about noncommercial assessments from teachers and educators and identified 89 types of literacy
assessments measuring 203 skills. Teachers
face a formidable task of finding appropri-
ate tools, obtaining them, and then adapt-
ing the assessments to their own purposes
and students.
Second, reading assessments varied by
grade level. Teachers in K–1, compared to
teachers in grades 2–3, were more likely to
use assessments of print awareness, phon-
ics, and similar enabling skills than assess-
ments of reading, writing, or motivation.
Teachers in grades K–1 were also less likely
than teachers in grades 2–3 to use standard-
ized tests and commercial assessments. Ob-
servations were reported as the most com-
mon type of assessment and may be slightly
more frequent at grades K–1. Recognition
as a response option was also used most fre-
quently among younger children, whereas
identification and production were more
frequent at grades 2–3. Teachers in grades
2–3 use more sophisticated tests of reading
and writing and fewer measures of enabling
skills as their assessment methods match
the developing abilities of their students.
Third, teachers regarded informal mea-
sures that they design, select, and embed in
the curriculum as more useful for teachers,
students, and parents than commercial as-
sessments. Teachers regarded standardized
tests and commercial tests that allow little
teacher control and adaptation as less useful
and used them less often. Paradoxically, the
standardized tests were regarded as having
the most effect on administrators’ knowl-
edge and reporting practices. We think that
teachers’ frustration with assessments is
partly tied to this paradox.
Fourth, the most frequently used and
highly valued reading assessments are least
visible to parents and administrators be-
cause they are not reported publicly. Obser-
vations, anecdotes, and daily work samples
are certainly low-stakes evidence of
achievement for accountability purposes,
but they may be the most useful for teach-
ers, parents, and students. It is also ironic
that the assessments on which teachers feel
least trained and regard as least useful (i.e.,
standardized tests) are used most often for
evaluations and public reports. Together
these findings suggest that teachers need
support in establishing the value of instruc-
tional assessments in their classrooms for
administrators and parents while also de-
marcating the limits and interpretations of
externally mandated tests (see Hoffman,
Paris, Patterson, Salas, & Assaf, 2003). The
current slogan about the benefits of a bal-
anced approach to reading instruction
might also be applied to a balanced ap-
proach to reading assessment. The skills
that are assessed need to be balanced
among various components of reading, and
the purposes/benefits of assessment need
to be balanced among the stakeholders.
The critical question that many policy
makers ask is, Which reading assessments
provide the best evidence about children’s
accomplishments and progress? The an-
swer may not be one test or even one type
of assessment. A single assessment cannot
adequately represent the complexity of a
child’s reading development. Likewise, the
same assessments may not represent the
curriculum and instructional diversity
among teachers. A single assessment cannot
capture the variety of skills and develop-
mental levels of children in most K–3
classes. That is why teachers use multiple
assessments and choose those that fit their
purposes. These assessments are the ones
that can reveal the most information about
their students. We believe that the most ro-
bust evidence about children’s reading re-
veals developing skills that can be com-
pared to individual standards of progress as
well as to normative standards of achieve-
ment. A developmental approach balances
the types of assessments across a range of
reading factors and allows all stakeholders
to understand the strengths and weak-
nesses of the child’s reading profile. Many
teachers use this approach implicitly, and
we think it is a useful model for early read-
ing assessment rather than a one-test-fits-all
approach.
Assessment of Students’ Oral
Reading
Oral reading has been a focus for the as-
sessment of early reading development
throughout the twentieth century (Rasinski
& Hoffman, 2003). Teachers in the afore-
mentioned surveys reported using chil-
dren’s oral reading as an indicator of
growth and achievement. The informal
reading inventory (IRI) changed over time
to focus on the accuracy of oral reading with
less attention to reading rate until recently.
Now researchers have focused attention on
three facets of oral reading fluency—rate,
accuracy, and prosody—as indicators of au-
tomatic decoding and successful reading
(Kuhn & Stahl, 2003).
During the first year of CIERA, Scott
Paris and David Pearson were asked by the
Michigan Department of Education (MDE)
to help evaluate the new Michigan Literacy
Progress Profile (MLPP) while also evalu-
ating summer reading programs through-
out the state. These research projects dove-
tailed with CIERA research on assessment,
so we spent 5 years working with the In-
gham Intermediate School District and
MDE evaluating summer reading programs
and testing components of the MLPP. The
program evaluations led to several insights
about early reading assessments and eval-
uation research that are worth noting here
(Paris, Pearson, et al., in press).
One insight from the research was the
realization that informal reading invento-
ries (IRIs) were legitimate tools for assess-
ing student growth in reading and for pro-
gram evaluation. In the past 5–7 years,
several state assessment programs and
commercial reading assessments have used
leveled texts with running records or mis-
cue analyses as formative and summative
assessments of early reading. There has
been widespread enthusiasm for such IRI
assessments that serve both purposes be-
cause the assessments are authentic, aligned
with classroom instructional practices, and
integrated into the curriculum. In fact, IRIs
are similar to the daily performance assess-
ments and observations teachers reported
in the CIERA survey of classroom assess-
ments. However, the use of IRIs for sum-
mative assessment must be viewed with
caution until the reliability and validity of
IRI assessments administered by teachers
can be established. Extensive training and
professional development that integrate
reading assessment with instruction seem
necessary in our experience.
A second insight has involved the diffi-
culties in analyzing students’ growth when
students are reading different leveled texts.
The main problem in using IRIs for mea-
suring reading growth is that running re-
cords and miscue analyses are gathered on
variable levels of text that are appropriate
for each child. Thus, comparing a child’s
reading proficiency at two times (or com-
paring various children to each other over
time) usually involves comparisons of dif-
ferent passages and text levels, so changes
in children’s performance are confounded
by differences between passages and diffi-
culty levels. Paris (2002) identified several
methods for analyzing IRI data from lev-
eled texts and concluded that the most so-
phisticated statistical procedure was based
on Item Response Theory (IRT). In the eval-
uation of summer reading programs, Paris,
Pearson, et al. (2004) used IRT analyses to
scale all the reading data from more than
1,000 children on different passages and dif-
ferent levels of an IRI so the scores could be
compared on single scales of accuracy, com-
prehension, retelling, and so forth. Those
analyses revealed significant effects on chil-
dren who participated in summer reading
programs compared to control groups of
children who did not participate in the sum-
mer programs (see Paris, Pearson, et al.,
2004).
A brief description of IRT analyses will
reveal the benefits of this approach. IRT is
a psychometric method of analyzing data
that allows estimates of individual scores
that are independent of the actual test items.
This is important for reading assessment
that compares students’ growth over time
on different levels, items, and tests, which
is the usual problem when using IRI data. The IRT
scaling procedures in a two-parameter
Rasch model estimate the individual scores
and item difficulties simultaneously (Em-
bretson & Reise, 2000). The crux of an IRT
analysis is to find optimal estimates for the
item parameters that depend on the stu-
dents’ IRT scores that, in turn, depend on
the item parameters. This catch-22 is solved
statistically by an iterative procedure that
converges toward a final solution with op-
timal estimates for all parameters. How-
ever, the calculation differs from other statistical procedures, such as regression analysis, because the underlying concept is "likelihood" rather than regression weights.
The item difficulty is calculated accord-
ing to a logistic function that identifies the
point on an item parameter scale where the
probability of a correct response is exactly
.50. The distribution of correct answers
across items of varying difficulty from stu-
dents in the sample permits estimates of in-
dividual IRT scores that are based on the
actual as well as possible patterns of correct
responses. The numerical IRT scale is then
established with a zero point and a range of
scores, for example, 0–100 or 200–800. For-
tunately, there are software programs avail-
able to calculate IRT scores, but they have
rarely been used with children’s reading
data derived from IRIs and leveled texts.
We think IRT analyses are scientifically rig-
orous and potentially useful ways to ex-
amine children’s reading data and progress
over time.
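The logistic item-difficulty idea and the iterative estimation described above can be sketched in a few lines of Python. This is an illustrative toy, not the software used in the CIERA analyses, and all function names are ours: `p_correct` is the two-parameter logistic response function, and `fit_rasch` alternates small gradient steps on abilities and difficulties until the estimates stabilize.

```python
import math

def p_correct(theta, a, b):
    """Two-parameter logistic (2PL) item response function:
    probability that a student with ability theta answers an item
    with discrimination a and difficulty b correctly. With a = 1
    this reduces to the one-parameter (Rasch) case."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# The item difficulty b is the point on the ability scale where the
# probability of a correct response is exactly .50:
assert abs(p_correct(theta=0.7, a=1.0, b=0.7) - 0.5) < 1e-9

def fit_rasch(responses, n_iter=200, lr=0.1):
    """Toy joint estimation: abilities depend on difficulties and
    vice versa, so alternate gradient steps on each until the
    estimates converge (the iterative solution to the catch-22
    described in the text). responses[s][i] is 1 if student s
    answered item i correctly, else 0."""
    n_students, n_items = len(responses), len(responses[0])
    theta = [0.0] * n_students   # ability estimates
    b = [0.0] * n_items          # difficulty estimates
    for _ in range(n_iter):
        for s in range(n_students):
            grad = sum(responses[s][i] - p_correct(theta[s], 1.0, b[i])
                       for i in range(n_items))
            theta[s] += lr * grad
        for i in range(n_items):
            grad = sum(p_correct(theta[s], 1.0, b[i]) - responses[s][i]
                       for s in range(n_students))
            b[i] += lr * grad
        # Fix the scale's zero point (mean difficulty = 0); the model
        # depends only on theta - b, so shifting both is harmless.
        mean_b = sum(b) / n_items
        b = [x - mean_b for x in b]
        theta = [x - mean_b for x in theta]
    return theta, b
```

On a toy response matrix, students who answer more and harder items correctly receive higher scale scores even when they have seen different items, which is what makes IRT suitable for comparing performance across passages and difficulty levels.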
A third set of insights about reading as-
sessments involved practical decisions
about how to use IRIs effectively. Paris and
Carpenter (2003) found that teachers re-
quire sustained professional development
and schoolwide implementation of reading
assessments to use them uniformly, consis-
tently, and wisely. The real benefit of IRIs is
the knowledge teachers gain while assess-
ing individual children because the assess-
ment framework provides insights about
needed instruction. Teachers need guidance
in selecting IRIs, administering them, inter-
preting them, and using the results with
students and parents, and that guidance
needs to be shared knowledge among the
school staff so it creates a culture of under-
standing about reading assessment. Paris
and Carpenter (2003) found that imple-
menting a schoolwide system of recording
and reporting the data as part of the veri-
fication of students’ adequate yearly pro-
gress (AYP) made the assessments worth
the time and energy of all the participants.
Thus, teachers gained diagnostic informa-
tion about students and also provided ac-
countability through measures of AYP by
comparing fall and spring scores.
A fourth insight that researchers gained
is that IRIs can provide multiple indicators
of children’s oral reading, including rate,
accuracy, prosody, retelling, and compre-
hension and that teachers can choose which
measures to collect. CIERA research iden-
tified some problems with the various mea-
sures derived from IRIs (Paris, Carpenter,
Paris, & Hamilton, in press). For example,
there are restricted ranges and ceiling ef-
fects in some measures, such as prosody
and accuracy. It also appears that compre-
hension is more highly related to oral read-
ing accuracy and rate in beginning readers
and that the relation decreases by the time
children are reading texts at a third- or
fourth-grade level. This means that some
children become adept "word callers" with
little evidence of comprehension, so reading
rate and accuracy measures in IRIs may
yield incomplete information for older
readers.
IRI data on oral reading fluency and
comprehension are most informative about
children’s reading during initial skill devel-
opment, approximately grades K–3, and
when the information is used in combina-
tion with other assessments. Assessments of
prerequisite skills for fluent oral reading,
such as children’s vocabulary, letter-sound
knowledge, phonological awareness, begin-
ning writing, understanding of text conven-
tions, and book-handling skills, may aug-
ment IRIs with valuable information. Thus,
IRIs provide developmentally sensitive as-
sessments for beginning and struggling
readers when fluency and understanding
are growing quickly and when teaching fo-
cuses on specific reading skills. IRIs are ex-
cellent tools for combining diagnostic and
summative assessments in an authentic for-
mat for teachers and students.
New Directions in Early Reading
Assessment
In this part of our review, we summarize
four examples of innovative assessments by
CIERA researchers that chart new direc-
tions in literacy assessment with young
children.
Narrative Comprehension
During the past 10 years of renewed em-
phases on beginning reading, there has
been less attention given to children’s com-
prehension skills compared to decoding
skills (National Reading Panel, 2000). More
research on young children’s comprehen-
sion skills and strategies is needed to diag-
nose and address children’s early reading
difficulties that extend beyond decoding. A
major CIERA assessment project focused on
children’s comprehension of narrative sto-
ries, and more specifically, on narratives il-
lustrated in wordless picture books. Paris
and Paris (2003) created and tested compre-
hension assessment materials and proce-
dures that can be used with young children,
whether or not they can decode print. Such
early assessments of comprehension skills
can complement existing assessments of en-
abling skills, provide diagnostic measures
of comprehension problems, and link com-
prehension assessment with classroom in-
struction.
Narrative comprehension is a complex
meaning-making process that depends on
the simultaneous development of many
skills, including, for example, understand-
ing of story structure and relations among
elements and psychological understanding
about characters’ thoughts and feelings. It
is important to assess narrative comprehen-
sion for several reasons. First, narrative
competence is among the fundamental cog-
nitive skills that influence early reading de-
velopment. Whitehurst and Lonigan (1998)
refer to these skills as "outside-in" skills be-
cause children use the semantic, conceptual,
and narrative relations that they already
know to comprehend the text. In this view,
narrative competence is a fundamental as-
pect of children’s comprehension of expe-
riences before they begin to read, and it
helps children map their understanding
onto texts. The importance and early devel-
opment of narrative thinking may be one
reason that elementary classrooms are dom-
inated by texts in the narrative genre (Duke,
2000). Second, because of the extensive re-
search on narrative comprehension, there is
ample documentation of its importance
among older children and adults as well as
extensive research on its development (e.g.,
Berman & Slobin, 1994). Third, the clear
structure of narrative stories with specific
elements and relations provides a structure
for assessment of understanding. Fourth,
narrative is closely connected to many con-
current developmental accomplishments of
young children in areas such as language,
play, storytelling, and television viewing. It
is an authentic experience in young chil-
dren’s lives, and it reveals important cog-
nitive accomplishments.
In a procedure similar to one van Kraay-
enoord and Paris (1996) used, Paris and
Paris (2003) modified trade books with clear
narrative story lines—a strategy that can be
used easily for both assessment and instruc-
tional purposes—to create the narrative
comprehension (NC) assessment task. They
located commercially published wordless
picture books, adapted them by deleting
some irrelevant pages to shorten the task,
and assembled the pages of photocopied
black and white pictures into spiral bound
little books. It was important that the story
line revealed by the pictures was clear with
an obvious sequence of events and that the
pictures contained the main elements of sto-
ries (i.e., settings, characters, problems, res-
olutions). The first study in Paris and Paris
(2003) established the NC task procedures
for observing how K–2 children interacted
with wordless picture books under three
conditions: spontaneous examination dur-
ing a "picture walk," elicited retelling, and
prompted comprehension during question-
ing. The results were striking. The retelling
and prompted comprehension scores in-
creased in regular steps for each grade from
K to 2, and readers were significantly better
than nonreaders on both measures, thus
showing developmental sensitivity of the
assessment task. There were no develop-
mental differences on the picture walk be-
haviors, however.
In study 2, Paris and Paris (2003) ex-
tended the procedures to additional word-
less picture books and examined the reli-
ability of the assessment procedures. The
similarity of developmental trends across
books indicated that the NC task is sensitive
to progressive increases in children’s abili-
ties to make inferences and connections
among pictures and to construct coherent
narrative relations from picture books. Sim-
ilarity of performance across books showed
that examiners can administer the NC task
with different materials and score children’s
performance in a reliable manner. Thus, the
generalizability and robustness of the NC
task across picture books were supported.
In study 3, Paris and Paris (2003) ex-
amined the predictive and concurrent va-
lidity of the NC task with standardized and
informal measures of reading. The similar-
ity in correlations, overall means, and de-
velopmental progressions of NC measures
confirmed the patterns revealed by studies
1 and 2 with new materials and additional
children. In addition, the NC task was sen-
sitive to individual growth over 1 year that
was not due to practice effects. The NC re-
telling and comprehension measures were
correlated significantly with concurrent as-
sessments with an IRI and the Gates-
MacGinitie Reading Test, a standardized,
group-administered test. Furthermore, the
NC comprehension scores among first
graders significantly predicted their scores
on the Iowa Tests of Basic Skills (ITBS) a
year later in second grade (r = .52).
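The concurrent and predictive validity evidence reported for the NC task rests on Pearson correlations between NC scores and other test scores. For readers who want the computation concrete, a minimal sketch (illustrative data only, not the study's data):

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation, the statistic behind the
    concurrent and predictive validity figures reported for the NC
    task (e.g., first-grade NC scores predicting second-grade ITBS
    scores)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

A correlation of .52 between first-grade NC scores and second-grade ITBS scores means roughly 27% of the variance (r squared) in the later scores is predictable from the earlier assessment.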
The three studies provided consistent
and positive evidence about the NC task as
a developmentally appropriate measure of
5–8-year-old children’s narrative under-
standing of picture books. Retelling and
prompted comprehension scores improved
significantly with age, indicating that the
NC task differentiates children who can re-
call main narrative elements, identify criti-
cal explicit information, make inferences,
and connect information across pages from
children who have weaknesses with these
narrative comprehension skills. The NC
task requires brief training and can be given
to children in less than 15 minutes, which is
critical for individual assessment of young
children. The high percentage agreement
between raters among the three books
showed that the scoring rubrics are reliable
across story content and raters. The similar
patterns of cross-sectional and longitudinal
performance further confirmed the gener-
alizability of the task. The strong concurrent
and predictive relations provided encour-
aging evidence of the validity of the NC
task as a measure of comprehension for
emergent readers.
In a related series of CIERA studies, van
den Broek et al. (in press) examined pre-
school children’s comprehension of tele-
vised narratives. They showed 20-minute
episodes of children’s television programs
and presented 13-minute audiotaped sto-
ries to children to compare their viewing
and listening comprehension. Children re-
called causally related events in the narra-
tives better than other kinds of text rela-
tions, and their recall scores in viewing and
listening conditions were highly correlated.
Furthermore, preschoolers’ comprehension
of TV episodes predicted their standardized
reading comprehension test scores in sec-
ond grade. The predictive strength re-
mained even when vocabulary and word
identification skills were controlled in a re-
gression analysis. Thus, narrative compre-
hension skills of preschoolers can be as-
sessed with TV and picture books, and the
measures have significant predictive valid-
ity for later reading comprehension. We
think that narrative comprehension view-
ing and listening tasks can help teachers to
focus on comprehension skills of young
children even if the children have restricted
decoding skills, few experiences with
books, or limited skills in speaking English.
Parent-Child Interactive Reading
DeBruin-Parecki (1999) created an as-
sessment procedure for family literacy pro-
grams that records interactive book reading
behaviors. One purpose of the assessment
was to help parents with limited literacy
skills understand the kinds of social, cog-
nitive, and literate behaviors that facilitate
preschool children’s engagement with
books. A second purpose was to provide
family literacy programs with visible evi-
dence of the quantity and quality of parent-
child book interactions. The research was
based on the premise that children learn to
read early and with success when parents
provide stimulating, print-rich environ-
ments at home (e.g., Bus, van Ijzendoorn, &
Pellegrini, 1995). Moreover, parents must
provide appropriate support during joint
book reading. Morrow (1990) identified ef-
fective interactive reading behaviors, such
as questioning, scaffolding dialogue and re-
sponses, offering praise or positive rein-
forcement, giving or extending information,
clarifying information, restating informa-
tion, directing discussion, sharing personal
reactions, and relating concepts to life ex-
periences. Thus, DeBruin-Parecki (1999)
created the Adult/Child Interactive Read-
ing Inventory (ACIRI) to assess these kinds
of behaviors.
The ACIRI lists 12 literacy behaviors of
adults and the corresponding 12 behaviors
by children. For example, one adult behav-
ior is "poses and solicits questions about the book’s content," and the corresponding child behavior is "responds to questions about the book." There are four behaviors
in each of the following three categories: en-
hancing attention to text, promoting inter-
active reading and supporting comprehen-
sion, and using literate strategies. The
observer using the ACIRI records the frequencies of the 12 behaviors for both parent and child along with notes about the joint book reading. The assessment is designed
to be brief (15–30 minutes), flexible, non-
threatening, appropriate for any texts,
shared with parents, and informative about
effective instruction. Following the assess-
ment, the observer discusses the results
with the parent to emphasize the positive
features of the interaction and to provide
guidance for future book interactions. The
transparency of the assessment and the im-
mediate sharing of information minimize
the discomfort of being observed. After
leaving the home, the observer can record
additional notes and calculate quantitative
scores for the frequencies of observed be-
haviors.
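The ACIRI's recording scheme can be pictured as a simple tally over the three behavior categories. The sketch below is our own simplification for illustration, not the published instrument or its scoring rules; only the category names come from the text.

```python
from collections import defaultdict

class AciriTally:
    """Illustrative tally sheet for ACIRI-style observation:
    frequencies of paired adult and child behaviors in the three
    categories named by DeBruin-Parecki (1999). A simplification,
    not the published instrument."""

    CATEGORIES = (
        "enhancing attention to text",
        "promoting interactive reading and supporting comprehension",
        "using literate strategies",
    )

    def __init__(self):
        # counts[(role, category)] -> observed frequency
        self.counts = defaultdict(int)

    def record(self, role, category, times=1):
        if role not in ("adult", "child"):
            raise ValueError(role)
        if category not in self.CATEGORIES:
            raise ValueError(category)
        self.counts[(role, category)] += times

    def summary(self):
        """Per-category frequencies for adult and child: the raw
        quantitative scores calculated after the observation."""
        return {cat: {role: self.counts[(role, cat)]
                      for role in ("adult", "child")}
                for cat in self.CATEGORIES}
```

After the home visit, `summary()` yields the quantitative side of the record; the observer's qualitative notes sit outside this sketch.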
DeBruin-Parecki tested the ACIRI with
29 mother-child pairs enrolled in an Even
Start family literacy program in Michigan.
The children were 3 to 5 years old, and the
mothers were characterized as lower socio-
economic status. The regular staff collected
ACIRI assessments at the beginning and
end of the year as part of their program
evaluation and field testing of the assess-
ment. Familiarity also minimized anxiety
about being observed. The results sup-
ported the usefulness of the instrument.
The ACIRI was shown to be sensitive to
parent-child interactions because the four
behaviors in each of the categories showed
significant correlations between the fre-
quencies of adult and child behaviors. Re-
liability was evaluated by having observers
rate videotaped parent-child book interac-
tions. Interrater reliability was 97% among
eight raters. Consequential validity was
established through staff interviews that
showed favorable evaluations of the ACIRI.
The comparison of fall to spring scores
showed that parents and children increased
the frequencies of many of the 12 behaviors
during the year. Thus, the ACIRI provided
both formative and summative assessment
functions for Even Start staff.
Book Reading in Early Childhood
Classrooms
In addition to measuring reading skills
of children and adults, it is also important
to assess the literate environment. Longi-
tudinal and cross-sectional studies of chil-
dren’s literacy development reveal that
more frequent book reading at home, sup-
ported by interactive conversations and
scaffolded instruction, leads to growth in
language and literacy during early child-
hood (Scarborough & Dobrich, 1994; Ta-
bors, Snow, & Dickinson, 2001). Similar
studies in schools have shown that the qual-
ity of teacher-child interaction, the fre-
quency of book reading, and the availability
of books all enhance children’s early read-
ing and language development (Neuman,
1999; Whitehurst & Lonigan, 1998). Thus,
assessments of environments that support
reading can help improve conditions at
home and school.
Dickinson, McCabe, and Anastasopoulos
(2001) reported a framework for assessing
book reading in early childhood classrooms
along with data from their observations of
many classrooms. They derived the follow-
ing five important dimensions to evaluate in
classrooms.
• Book area. Issues to consider include
whether there is a book area, the qual-
ity of the area, and the quantity and
quality of books provided.
• Time for adult-child book reading.
Time is a critical ingredient, and con-
sideration should be given to the fre-
quency and duration of adult-mediated
reading experiences, including one-to-
one, small-group, and whole-class read-
ings, as well as the number of books
read during these sessions.
• Curricular integration. Integration re-
fers to the nature of the connection be-
tween the ongoing curriculum and the
use of books during whole-class times
and throughout the day.
• Nature of the book reading event.
When considering a book reading
event, one should examine the teacher’s
reading and discussion styles and chil-
dren’s engagement.
• Connections between the home and
classroom. The most effective teachers
and programs strive to support read-
ing at home through parent education,
lending libraries, circulation of books
made by the class, and encouragement
of better use of community libraries.
Dickinson et al. (2001) examined data
from four studies in order to evaluate the
importance of these dimensions. They
noted that many of the preschool class-
rooms they observed were rated high in
quality using historical definitions of devel-
opmentally appropriate practices, but that
the same classrooms were rated as having
low-quality literacy instruction. For exam-
ple, only half the classrooms they observed
had separate areas for children to read
books, and there were few informational
books and few books about varied racial
and cultural groups. They found no book
reading at all in 66 classrooms. In the other
classrooms, adults read to children less than
10 minutes per day, and only 35% of the
classes allowed time for children to look at
books on their own. Other observations led
the researchers to conclude that book read-
ing was not coordinated with the curricu-
lum or learning goals. Only 19% of class-
rooms had three or more books related to
a curricular theme, and only 35% of class-
rooms had listening centers. Dickinson et
al. (2001) noted that group reading is a
filler activity used in classroom transitions
rather than an instructional and curricular
priority.
The researchers also examined book
reading in classrooms by assessing teachers’
style, animation, and prosody as they read.
Most teachers read with little expressive-
ness. Many used an explicit management
style, such as asking questions of children
who raised their hands, as "crowd control"
rather than using thought-provoking ques-
tions about text. More than 70% of teachers’
talk during book reading made few cogni-
tive demands on children. The researchers
suggested that teachers must devote more
attention to engaging children in discus-
sions that link their experiences to text,
teach new vocabulary words, probe char-
acters’ motivations, and promote compre-
hension of text relations. Analyses of home-
school connections revealed that more
could be done to encourage families to re-
inforce school practices and to seek com-
munity literacy resources. Teachers rarely
connected language and cultural experi-
ences at home with literacy instruction at
school.
Dickinson et al. (2001) concluded that
the framework can be useful for assessing
early childhood classrooms and for study-
ing the effects of specific environmental fea-
tures on children’s literacy development.
They noted, for example, that their research
revealed little correlation between the
amount of book reading in classrooms and
the degree to which reading was integrated
into the curriculum. They interpreted this as
evidence that book reading in many early
childhood classrooms is an incidental activ-
ity rather than a planned instructional goal.
The framework is also useful for reading
educators to use with preservice and in-
service teachers who want to assess their
own classrooms and teaching styles because
it identifies critical elements of successful
classrooms.
Texts and the Text Environment in
Beginning Reading Instruction
The research by Dickinson et al. (2001)
provides a conceptual bridge to several
other CIERA investigations of texts and the
text environment. Two important strands of
research at CIERA have examined the as-
sessment of text characteristics for begin-
ning reading instruction. In the first strand
of research, Hiebert and her colleagues de-
veloped a text assessment framework that
can be used to analyze important features
of texts used for beginning reading instruc-
tion. This framework is grounded in Hie-
bert’s theoretical claims that certain text fea-
tures scaffold readers’ success in early
reading. The framework, called the Text Ele-
ments by Task (TExT) model, identifies two
critical factors in determining beginning
readers’ success with texts: linguistic con-
tent and cognitive load.
Linguistic content refers to the knowl-
edge about oral and written language that
is necessary for readers to recognize the
words in particular texts. Phoneme-
grapheme knowledge that is required to
read a text is described in terms of several
measures that provide different but comple-
mentary information on the phoneme-
grapheme knowledge related to vowels.
The first measure of phoneme-grapheme
knowledge summarizes the complexity of
the vowel patterns in a text. The second
measure is the degree to which highly com-
mon vowel and consonant patterns are re-
peated. To use this measure, the TExT
model examines the number of different on-
sets that appear with a particular rime. The
number of syllables in words is a third mea-
sure of linguistic content that influences be-
ginning readers’ recognition of words. The
model claims that texts with fewer multi-
syllabic words help children acquire fluent
word recognition.
The cognitive load factor within the
TExT model measures the amount of new
linguistic information to which beginning
readers can attend while continuing to un-
derstand the text’s message. Repetition of at
least some core linguistic content has tra-
ditionally been used to reduce the cognitive
load in text used for teaching children to
read. Within the TExT model, word repeti-
tion and the number of unique words rela-
tive to total words are used to inspect the
cognitive load a particular text places on be-
ginning readers. Two additional features of
texts that are commonly used in classrooms
are also considered in the model: the sup-
port provided through illustrations, and
patterns of sentence and text structure. In a
recent study applying these TExT princi-
ples, Menon and Hiebert (2003) compared
"little books" developed within the frame-
work to traditional beginning reading texts.
They found that students who practiced reading little books developed according to the TExT model outperformed those reading traditional basal texts. The assessment
tools developed in this research can be used
to evaluate the complexity of texts as well
as to guide the construction of new texts for
beginning readers.
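The word-repetition and unique-word measures of cognitive load described above can be computed directly from a passage. The function below is our illustrative simplification, not Hiebert's actual instrument; it ignores the model's other measures (vowel-pattern complexity, onset-rime repetition, syllable counts).

```python
from collections import Counter

def cognitive_load_measures(text):
    """Two of the cognitive-load indicators discussed for the TExT
    model: how many unique words a passage asks a beginning reader
    to attend to relative to total words, and how often each word
    repeats (an illustrative simplification, our own)."""
    words = text.lower().split()
    counts = Counter(words)
    return {
        "total_words": len(words),
        "unique_words": len(counts),
        # Lower ratio = more repetition = lighter cognitive load.
        "type_token_ratio": len(counts) / len(words),
        "mean_repetitions": len(words) / len(counts),
    }
```

Applied to two candidate texts of similar length, the text with the lower type-token ratio repeats more of its core linguistic content and, on the model's account, places a lighter load on a beginning reader.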
In the second strand of research focused
on texts, Hoffman and his colleagues inves-
tigated the qualities of texts used in begin-
ning reading instruction and the leveling
systems of these texts. The research con-
ducted through CIERA was grounded in
earlier studies of changes in basal reading
texts associated with the literature-based
movement (Hoffman et al., 1998) and later
with the decodable text movement (Hoff-
man, Sailors, & Patterson, 2002). Hoffman
(2002) has proposed a model for the assess-
ment of text quality for beginning reading
instruction that considers three factors: ac-
cessibility, instructional design, and engag-
ing qualities. Accessibility of a text for the
reader is a function of the decoding de-
mands of the text (a word-level focus) and
the support provided through the predict-
able features of the text (ranging from pic-
ture support, to rhyme, to repeated phrases,
to a cumulative structure). The instructional
design factor involves the ways that a text
fits into the larger scheme for texts read (a
leveling issue) as well as the instruction and
curriculum that surround the reading of the
text. Finally, the engaging qualities factor considers the
content, language, and design features of
the text.
In addition to evaluating the changes in
texts for beginning readers, Hoffman,
Roser, Salas, Patterson, and Pennington
(2001) used this framework to study the
validity of some of the popular text-
leveling schemes. In a study involving over
100 first-grade learners, these researchers
examined the ways in which estimates of
text difficulty using different text-leveling
systems predicted the performance of first-
grade readers. The research identified high
correlations between the factors in the theo-
retical model and the leveling from both the
Pinnell and Fountas and Reading Recovery
systems. The analysis also confirmed the
validity of these systems in predicting stu-
dent performance with first-grade texts. Fi-
nally, the study documented the effects of
varying levels of support provided to the
reader (shared reading, guided reading, no
support) and performance on such mea-
sures as oral reading accuracy, rate, and flu-
ency.
Hoffman and Sailors (2002) created a
method for assessing the classroom literacy
environment called the TEX-IN3 that in-
cludes three components: a text inventory,
a text in-use observation, and a series of text
interviews. The TEX-IN3 draws on several
research literatures, including research on
texts conducted through CIERA. In addi-
tion, the instrument was developed based
on the literature exploring the effects of the
text environment on teaching and learning.
The assessment yields a series of scores as
well as qualitative data on the classroom lit-
eracy environment.
The TEX-IN3 was validated in a study
of over 30 classrooms (Hoffman, Sailors,
Duffy, & Beretvas, 2003). In this study, stu-
dents were tested with pre- and posttests on
a standardized reading test. Observers were
trained in the use of TEX-IN3, and high lev-
els of reliability were established. Data were
collected in classrooms at three times (fall,
winter, and spring). Data analyses focused
on the relations between features of the text
environment from the TEX-IN3 and stu-
dents’ reading comprehension scores. The
analyses supported all three components of
the TEX-IN3. For example, the correlations
between students’ gain scores and the rat-
ings of the overall text environment were
significant. Correlations between students’
gain scores and the in-use scores derived
from observation of teaching, as well as the
correlations between the rating of teachers’
understanding and valuing of the text en-
vironment with students’ gain scores, were
significant. The findings from the research
with the TEX-IN3 suggest the importance of
expanding assessment from a narrow focus
on the texts in classrooms to consideration
of texts in the full literacy environment.
Summary and Future Research
From 1997 to 2002, CIERA researchers con-
ducted many studies of early reading as-
sessment that focused on readers and text,
home and school, and policy and profes-
sion. The CIERA surveys of early reading
assessments identified the expanding array
of assessment instruments, commercial and
noncommercial, available to K–3 teachers.
Researchers also identified how effective
teachers in schools that "beat the odds" use
assessments and how they view the utility
and effects of various types of assessments.
The instruments used most frequently con-
tributed to ongoing CIERA research on the
development of the MLPP battery and the
use of IRIs for formative and summative
purposes. Those studies remain in progress
as researchers collect longitudinal evidence
about the reliability and validity of early
reading assessments (e.g., Paris, Carpenter,
et al., in press).
The most important insight from this re-
search is that some skills, such as alphabet
knowledge, concepts of print, and phone-
mic awareness, are universally mastered in
relatively brief developmental periods. As a
consequence, the distributions of data from
these variables are skewed by floor and ceil-
ing effects that, in turn, influence the cor-
relations used to establish reliability and va-
lidity of the assessments. Assessments of
oral reading accuracy, and perhaps rate, are
also skewed, so that measures of some basic
reading skills are difficult to analyze with
parametric statistics in traditional ways.
The mastery of some reading skills poses
challenges to conventional theories of read-
ing development and traditional statistical
analyses.
CIERA researchers also developed in-
novative assessments of comprehension
with wordless picture books that offer
teachers new ways to instruct and assess
comprehension with children who cannot
yet decode print. These cross-sectional and
longitudinal studies substantiate the reli-
ability and validity of early assessments. In
addition, CIERA researchers designed and
tested new methods for assessing narrative
comprehension, interactive parent-child
reading, literate environments in early
childhood classrooms, text features, and the
text environment. All of these tools have
immediate practical applications and bene-
fits for educators. Indeed, the hallmark of
CIERA research on reading assessment is
the use of rigorous methods to identify, cre-
ate, test, and refine instruments and prac-
tices that can help parents and teachers pro-
mote the reading achievement of all
children.
This research, as well as studies outside
the immediate CIERA network, points to
the need for continuing study of assessment
in early literacy. We believe that at least four
areas deserve special attention. First, the
policy context for instructional programs,
teaching, and teacher education that places
a premium on "scientifically proven" ap-
proaches and methods has immediate im-
plications for assessment. Tools to be used
in reading assessment (e.g., for diagnosis,
program evaluation, or research) are subject
to high standards of validity and reliability.
We applaud this attention to rigor in as-
sessment, but we believe that decision mak-
ing about the use of instruments should be
professional, public, and comprehensive.
Such deliberations must extend beyond the
traditional psychometric constructs of reli-
ability and validity to include consideration
of the consequences of testing and the social
contexts of assessment.
Second, researchers must continue to in-
vestigate the ways in which assessment
tools can be broadened to focus on multiple
factors and the interaction of these factors
in ways that reflect authentic learning and
teaching environments. For example, infor-
mal reading inventories have become pop-
ular tools for assessment, partly because
reading rate and accuracy can be assessed
quickly and reliably, but educators need to
consider how text-leveling factors might in-
teract with students’ developmental levels
to influence evaluations of reading perfor-
mance. Good assessments should lead to an
understanding of the complexity of learn-
ing to read and not impose a false sense of
simplicity on early reading development.
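Part of what makes informal reading inventories quick to score is that rate and accuracy are simple ratios. A minimal sketch of those two computations (the accuracy cutoffs below are common IRI conventions, not figures taken from this article):

```python
# Hypothetical sketch of the two quickest IRI measures: oral reading
# accuracy and rate (words correct per minute, WCPM). The level bands
# are conventional independent/instructional/frustration cutoffs,
# which vary across published inventories.

def accuracy(words_read: int, errors: int) -> float:
    """Proportion of words read correctly."""
    return (words_read - errors) / words_read

def wcpm(words_read: int, errors: int, seconds: float) -> float:
    """Words correct per minute."""
    return (words_read - errors) * 60.0 / seconds

def reading_level(acc: float) -> str:
    """Common (but not universal) accuracy bands for IRI passages."""
    if acc >= 0.98:
        return "independent"
    if acc >= 0.90:
        return "instructional"
    return "frustration"

acc = accuracy(words_read=120, errors=6)           # 0.95
print(acc, reading_level(acc))                     # 0.95 instructional
print(wcpm(words_read=120, errors=6, seconds=90))  # 76.0
```

The simplicity of these ratios is exactly the point at issue: they say nothing about how the leveling of the passage itself interacted with the child's developmental level to produce the errors being counted.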
Third, the gulf between what teachers
value as informal assessments and what is
imposed on them in the form of standard-
ized testing appears to be widening. Al-
though performance assessments and port-
folios were popular in the 1980s and 1990s,
the trends today are to increase high-stakes
testing for young children, to remove
teacher judgment from assessment, and to
streamline assessments so they can be con-
ducted quickly and repeatedly. More re-
search is needed on how highly effective
teachers assess developing reading skills in
their classrooms. Before educators and pol-
icy makers abandon performance assess-
ment, careful consideration must be given
to the ways that ongoing assessment can
promote differentiated instruction.
Fourth, researchers cannot lose sight of
the fact that good assessment rests on good
theory, not just a theory of reading but of
effective teaching and development. Just
because motivation, self-concept, and criti-
cal thinking are difficult to measure using
large-scale standardized tests does not
mean they should be ignored. The scientific
method is not just about comparing one
program or one approach to another to
prove which is best. The scientific investi-
gation of assessment in early literacy should
contribute to theory building that ulti-
mately informs effective teaching and learn-
ing.
References
Berman, R. A., & Slobin, D. I. (1994). Relating
events in narrative: A crosslinguistic develop-
mental study. Hillsdale, NJ: Erlbaum.
Bus, A. G., van IJzendoorn, M. H., & Pellegrini, A.
D. (1995). Joint book-reading makes for suc-
cess in learning to read: A meta-analysis on
intergenerational transmission of literacy. Re-
view of Educational Research, 65(1), 1–21.
Clay, M. M. (1993). An observation survey of early
literacy achievement. Portsmouth, NH: Hei-
nemann.
DeBruin-Parecki, A. (1999). Assessing adult/child
storybook reading practices (Technical Rep. No.
2-004). Ann Arbor: University of Michigan,
Center for the Improvement of Early Read-
ing Achievement.
Dickinson, D. K., McCabe, A., & Anastasopou-
los, L. (2001). A framework for examining book
reading in early childhood classrooms (Tech.
Rep. No. 1-014). Ann Arbor: University of
Michigan, Center for the Improvement of
Early Reading Achievement.
Duke, N. K. (2000). 3.6 minutes per day: The
scarcity of information texts in first grade.
Reading Research Quarterly, 35(2), 202–224.
Embretson, S. E., & Reise, S. P. (2000). Item re-
sponse theory for psychologists. Mahwah, NJ:
Erlbaum.
Good, R. H., & Kaminski, R. A. (Eds.). (2002).
Dynamic indicators of basic early literacy skills (6th
ed.). Eugene, OR: Institute for the Develop-
ment of Educational Achievement.
Haladyna, T., Nolen, S. B., & Haas, N. S. (1991).
Raising standardized achievement test
scores and the origins of test score pollution.
Educational Researcher, 20(5), 2–7.
Hoffman, J. V. (2002). Words on words in leveled
texts for beginning readers. In D. Schallert,
C. Fairbanks, J. Worthy, B. Maloch, & J. V.
Hoffman (Eds.), Fifty-first yearbook of the Na-
tional Reading Conference (pp. 59–81). Oak
Creek, WI: National Reading Conference.
Hoffman, J. V., Assaf, L. C., & Paris, S. G. (2001).
High-stakes testing in reading: Today in
Texas, tomorrow? Reading Teacher, 54(5), 482–
492.
Hoffman, J. V., McCarthey, S. J., Abbott, J., Chris-
tian, C., Corman, L., Curry, C., Dressman, M.,
Elliot, B., Mathern, D., & Stahle, E. (1998).
The literature-based basals in first-grade
classrooms: Savior, satan, or same-old, same-
old? Reading Research Quarterly, 33, 168–197.
Hoffman, J. V., Paris, S. G., Patterson, E., Salas,
R., & Assaf, L. (2003). High-stakes assess-
ment in the language arts: The piper plays,
the players dance, but who pays the price?
In J. Flood & D. Lapp (Eds.), Handbook of re-
search on teaching the English language arts (2d
ed., pp. 619–630). Mahwah, NJ: Erlbaum.
Hoffman, J. V., Roser, N. L., Salas, R., Patterson,
E., & Pennington, J. (2001). Text leveling and
"little books" in first-grade reading. Journal
of Literacy Research, 33(3), 507–528.
Hoffman, J. V., & Sailors, M. (2002). The TEX-IN3:
Text inventory, text in-use and text interviews.
Bastrop, TX: Jeaser.
Hoffman, J. V., Sailors, M., Duffy, G., & Beretvas,
C. (2003, April). Assessing the literacy environ-
ment using the TEX-IN3: A validity study. Pa-
per presented at the annual meeting of the
American Educational Research Association,
Chicago.
Hoffman, J. V., Sailors, M., & Patterson, E. (2002).
Decodable texts for beginning reading in-
struction: The year 2000 basals. Journal of Lit-
eracy Research, 34(3), 269–298.
Kuhn, M. R., & Stahl, S. A. (2003). Fluency: A
review of developmental and remedial prac-
tices. Journal of Educational Psychology, 95(1),
3–21.
Meisels, S. J., Liaw, F. R., Dorfman, A. B., & Nel-
son, R. (1995). The Work Sampling System:
Reliability and validity of a performance as-
sessment for young children. Early Childhood
Research Quarterly, 10(3), 277–296.
Meisels, S. J., & Piker, R. A. (2000). An analysis of
early literacy assessments used for instruction
(Tech. Rep. No. 3-002). Ann Arbor: Univer-
sity of Michigan, Center for the Improve-
ment of Early Reading Achievement.
Menon, S., & Hiebert, E. H. (2003). A comparison
of first graders’ reading acquisition with little
books and literature anthologies (Tech. Rep. No.
1-009). Ann Arbor: University of Michigan,
Center for the Improvement of Early Read-
ing Achievement.
Morrow, L. M. (1990). Assessing children’s un-
derstanding of story through their construc-
tion and reconstruction of narrative. In L. M.
Morrow & J. K. Smith (Eds.), Assessment for
instruction in early literacy (pp. 110–133). En-
glewood Cliffs, NJ: Prentice-Hall.
National Reading Panel. (2000). Teaching children
to read: An evidence-based assessment of the sci-
entific research literature on reading and its im-
plications for reading instruction: Reports of the
subgroups. Bethesda, MD: National Institute
of Child Health and Human Development.
Neuman, S. B. (1999). Books make a difference:
A study of access to literacy. Reading Research
Quarterly, 34(3), 286–311.
No Child Left Behind Act of 2001. (2002). Pub. L.
No. 107–110, § 115, Stat. 1425.
Paris, A. H., & Paris, S. G. (2003). Assessing nar-
rative comprehension in young children.
Reading Research Quarterly, 38(1), 37–76.
Paris, S. G. (2000). Trojan horse in the schoolyard:
The hidden threats in high-stakes testing. Is-
sues in Education, 6(1,2), 1–16.
Paris, S. G. (2002). Measuring children’s reading
development using leveled texts. Reading
Teacher, 56(2), 168–170.
Paris, S. G., & Carpenter, R. D. (2003). FAQs
about IRIs. Reading Teacher, 56(6), 578–580.
Paris, S. G., Carpenter, R. D., Paris, A. H., &
Hamilton, E. E. (in press). Spurious and gen-
uine correlates of children’s reading compre-
hension. In S. G. Paris & S. A. Stahl (Eds.),
Children’s reading comprehension and assess-
ment. Mahwah, NJ: Erlbaum.
Paris, S. G., Lawton, T. A., Turner, J. C., & Roth,
J. L. (1991). A developmental perspective on
standardized achievement testing. Educa-
tional Researcher, 20, 12–20.
Paris, S. G., Paris, A. H., & Carpenter, R. D.
(2002). Effective practices for assessing
young readers. In B. Taylor & P. D. Pearson
(Eds.), Teaching reading: Effective schools, ac-
complished teachers (pp. 141–160). Mahwah,
NJ: Erlbaum.
Paris, S. G., Pearson, P. D., Cervetti, G., Carpen-
ter, R., Paris, A. H., DeGroot, J., Mercer, M.,
Schnabel, K., Martineau, J., Papanastasiou,
E., Flukes, J., Humphrey, K., & Bashore-Berg,
T. (2004). Assessing the effectiveness of sum-
mer reading programs. In G. Borman & M.
Boulay (Eds.), Summer learning: Research, pol-
icies, and programs (pp. 121–161). Mahwah,
NJ: Erlbaum.
Pearson, P. D., Sensale, L., Vyas, S., & Kim, Y.
(1999, June). Early literacy assessment: A mar-
ketplace analysis. Paper presented at the Na-
tional Conference on Large-Scale Assess-
ment, Snowbird, UT.
Rasinski, T. V., & Hoffman, J. V. (2003). Oral read-
ing in the school literacy curriculum. Reading
Research Quarterly, 38(4), 510–522.
Scarborough, H. S., & Dobrich, W. (1994). On the
efficacy of reading to preschoolers. Develop-
mental Review, 14, 245–302.
Snow, C. E., Burns, M. S., & Griffin, P. (1998).
Preventing reading difficulties in young children.
Washington, DC: National Academy Press.
Stallman, A. C., & Pearson, P. D. (1990). Formal
measures of early literacy. In L. M. Morrow
& J. K. Smith (Eds.), Assessment for instruction
in early literacy (pp. 7–44). Englewood Cliffs,
NJ: Prentice-Hall.
Tabors, P. O., Snow, C. E., & Dickinson, D. K.
(2001). Homes and schools together: Sup-
porting language and literacy development.
In D. K. Dickinson & P. O. Tabors (Eds.), Be-
ginning literacy with language: Young children
learning at home and in school (pp. 313–334).
Baltimore: Brookes.
Urdan, T. C., & Paris, S. G. (1994). Teachers’ per-
ceptions of standardized achievement tests.
Educational Policy, 8(2), 137–156.
van den Broek, P., Kendeou, P., Kremer, K.,
Lynch, J., Butler, J., White, M. J., & Lorch, E.
P. (in press). Assessment of comprehension
abilities in young children. In S. G. Paris &
S. A. Stahl (Eds.), Children’s reading compre-
hension and assessment. Mahwah, NJ: Erl-
baum.
van Kraayenoord, C. E., & Paris, S. G. (1996).
Story construction from a picture book: An
assessment activity for young learners. Early
Childhood Research Quarterly, 11, 41–61.
Whitehurst, G. J., & Lonigan, C. J. (1998). Child
development and emergent literacy. Child
Development, 69(3), 848–872.