Reading Assessments in Kindergarten through Third Grade: Findings from the Center for
the Improvement of Early Reading Achievement
Author(s): Scott G. Paris and James V. Hoffman
Source: The Elementary School Journal, Vol. 105, No. 2, Lessons from Research at the Center
for the Improvement of Early Reading Achievement; Joanne F. Carlisle, Steven
A. Stahl, and Deanna Birdyshaw, Guest Editors (November 2004), pp. 199-217
Published by: The University of Chicago Press
Stable URL: http://www.jstor.org/stable/10.1086/428865
Accessed: 19/06/2013 12:22
The Elementary School Journal
Volume 105, Number 2
© 2004 by The University of Chicago. All rights reserved.
0013-5984/2004/10502-0005$05.00
Reading Assessments
in Kindergarten
through Third Grade:
Findings from the
Center for the
Improvement of Early
Reading Achievement
Scott G. Paris
University of Michigan
James V. Hoffman
University of Texas at Austin
Abstract
Assessment of early reading development is im-
portant for all stakeholders. It can identify chil-
dren who need special instruction and provide
useful information to parents as well as sum-
mative accounts of early achievement in schools.
Researchers at the Center for the Improvement of
Early Reading Achievement (CIERA) investi-
gated early reading assessment in a variety of
studies that employed diverse methods. One
group of studies used survey methods to deter-
mine the kinds of assessments available to teach-
ers and the teachers’ reactions to the assess-
ments. A second group of studies focused on
teachers’ use of informal reading inventories for
formative and summative purposes. In a third
group of studies, researchers designed innova-
tive assessments of children’s early reading, in-
cluding narrative comprehension, adult-child
interactive reading, the classroom environment,
and instructional texts. The CIERA studies pro-
vide useful information about current reading
assessments and identify promising new direc-
tions.
Achievement testing in the United States
has increased dramatically in frequency and
importance during the past 25 years and is
now a cornerstone of educational practice
and policy making. The No Child Left Be-
hind (NCLB) (2001) legislation mandates
annual testing of reading in grades 3–8 and
increased assessment for students in grades
K–3 with clear intentions of increased ac-
countability and achievement. The ration-
ales for early assessment lie in (a) research
on reading development that indicates the
importance of basic skills for future success
and (b) classroom evidence that early diag-
nosis and remediation of reading difficul-
ties can improve children’s reading achieve-
ment (Snow, Burns, & Griffin, 1998). The
unprecedented federal resolve and re-
sources at the beginning of the twenty-first
century that are focused on the improve-
ment of children’s reading achievement re-
quire researchers and educators to identify
useful assessment tools and procedures.
The Center for the Improvement of
Early Reading Achievement (CIERA), a
consortium of researchers from many uni-
versities, was funded and became opera-
tional in 1997. Assessment of reading
achievement and the corresponding prac-
tices and policies were major foci of the re-
search agenda. It is important to note that
the CIERA research was proposed, and of-
ten conducted, before the report of the Na-
tional Reading Panel (NRP, 2000) and before the
NCLB legislation. These important events
did not frame CIERA research at the time,
but they certainly influence the interpreta-
tion of assessment tools today. For example,
both the NRP and NCLB emphasized five
essential skills for beginning reading suc-
cess: the alphabetic principle, phonemic
awareness, oral reading fluency, vocabu-
lary, and comprehension. Consequently,
many of the early reading assessments de-
veloped recently have focused on those
skills, especially the first three. CIERA re-
searchers acknowledge the importance of
assessing these skills, but they chose to in-
vestigate a broader array of assessment is-
sues and practices partly because there
were already many assessments of the al-
phabetic principle, phonemic awareness,
and oral reading fluency, such as the Dy-
namic Indicators of Basic Early Literacy Skills (DI-
BELS), a popular and quick battery of read-
ing assessments (Good & Kaminski, 2002).
Moreover, CIERA researchers in 1997
wanted to survey teachers to find out what
assessments they used and why, as well as
to identify new kinds of assessments for
nonreaders and beginning readers.
CIERA research on early reading assess-
ment was proposed and conducted in an
era of increased testing and evidence-based
policy making. An initial cluster of studies
examined the kinds of reading assessments
available and used by teachers in order to
describe current classroom practices. The
studies were intended to be surveys of best
practices in schools. A second group of
studies examined the use of oral reading as-
sessments to determine students’ adequate
yearly progress (AYP) because many infor-
mal reading inventories were being trans-
formed into formal summative assessments
of reading achievement. Several other
CIERA studies examined innovative tools
and new directions for assessing early read-
ing achievement. The research was explor-
atory, eclectic, and conducted by multiple
investigators, but collectively, the studies
help to identify promising assessments of
reading along with some practical obstacles
to implementation. We present the findings
of these three groups of studies and con-
clude with a discussion of future directions
in K–3 reading assessment research.
Surveys of Early Reading
Assessments
Many teachers are overwhelmed by the nu-
merous reading assessments mandated by
policy makers, advocated by publishers, re-
quired by administrators, or simply rec-
ommended for classrooms. We begin with
an examination of two CIERA studies on
the variety of assessment instruments avail-
able for K–3 teachers. We then examine two
CIERA studies of teachers’ attitudes toward
and use of assessments. The first two stud-
ies differ in their focus on commercial and
noncommercial measures. Both studies fol-
lowed up on the pioneering research by
Stallman and Pearson (1990), who con-
ducted one of the first comprehensive sur-
veys of early reading measures.
The Commercial Marketplace
Ten years after Stallman and Pearson’s
(1990) study, Pearson, Sensale, Vyas, and
Kim (1999) conducted a similar study of
commercial reading tests. They identified
148 tests with 468 subtests in their CIERA
survey. More than half of the tests had been
developed in the 1990s, and more than half
were designed for individual administra-
tion, clearly a response to the preponder-
ance of group tests in the previous decade.
Multiple-choice responses and marking an-
swer sheets still predominated over read-
ing, writing, or open-ended responses.
Nearly all tests were administered with a
mixture of visual and auditory presenta-
tions. In contrast to the previous decade,
about 40% of tests required production as a
response mode. Recognition was required
in about 40% of the tests and identification
in only 10%. Scoring ease may have driven
the response mode because more than 60%
of the tests could be scored simply as right
or wrong, less than 20% contained multiple-
choice items, and only 10% of tests used ru-
brics to score answers.
Pearson et al. (1999) analyzed the skills
assessed in the 148 tests and found that
word knowledge, such as concept identifi-
cation, was assessed in 50%; sound and
symbol concepts were assessed in 65%; lit-
eracy and language concepts were assessed
in 90%; and comprehension was assessed in
only 24% of the tests. When they analyzed
only the K–1 tests to compare with the Stall-
man and Pearson (1990) findings, they
found that 52% compared to a previous 18%
of tests were administered to individual
children. Only 36% of the tests compared to
the previous 63% were multiple-choice
tests, and the heavy emphasis on sound-
symbol correspondence was reduced by half
and replaced by a much stronger emphasis
on language and literacy concepts. These
changes may be due to the growing influ-
ence of whole language, Clay’s Observation
Survey (1993), and assessment methods
used in Reading Recovery throughout the
1990s. Although the type of processing re-
quired was still largely recognition, it had
decreased from 72% of tests in the first sur-
vey to 51%. Likewise, filling in bubbles de-
creased from 63% to 28% of the tests, and
oral responding increased from 12% to 39%.
The authors also noted the variety of
new reading assessments that emerged in
the 1990s. Kits, such as the DIAL-R, that in-
cluded assessment batteries became more
prominent. Some elaborate systems were
developed for using classroom assessment
for formative and summative functions. For
example, the Work Sampling System (Mei-
sels, Liaw, Dorfman, & Nelson, 1995) in-
cludes developmental benchmarks from
ages 3 to 12 in behavioral, academic, and
affective domains that can be used with
teachers’ checklists and students’ portfo-
lios to monitor individual growth and
achievement. The kits and elaborate sys-
tems usually include teachers’ guides, cur-
riculum materials, and developmental ru-
brics. Leveled books became a popular tool
for determining children’s reading levels
for the assessment tasks, again reflecting
the influence of Reading Recovery, Guided
Reading, and similar derivations for in-
struction.
Pearson et al. (1999) concluded that
commercial reading tests in the late 1990s
were much more numerous and varied than
the tests available 10 years earlier. More
skills were tested, particularly language
and literacy concepts. More choices, judg-
ments, and interpretations were required
from the examiner, usually the teacher, to
use the new tests. However, there was still
a preponderance of recognition responses
and filling in bubbles on answer sheets. The
researchers suggested that the changes in
early reading assessments during the 1990s
reflected the influences of three thematic
changes to early literacy and language arts
in classrooms: emergent literacy, process
writing approaches, and performance as-
sessments throughout the curriculum.
Noncommercial Assessments
The second CIERA survey of early read-
ing assessments was conducted by Meisels
and Piker (2000). Their study had three ob-
jectives: "1) to gain an understanding of
classroom-based literacy measures that are
available to teachers; 2) to characterize the
instructional assessments teachers use in
their classrooms to evaluate their students’
literacy performance; and 3) to learn more
about how teachers assess reading and writ-
ing skills" (p. 5). In contrast to the previous
studies of commercial assessments, Meisels
and Piker (2000) examined noncommercial
literacy assessments that were nominated
by teachers and educators or used by school
districts. They excluded assessments used
for research or accountability and focused
on K–3 instruments. Some assessments of
motivation and attitudes were also in-
cluded. The researchers collected informa-
tion from educational list serves, personal
contacts, literature searches, published re-
views of the measures, Web sites, and news-
letter postings so their survey was directed
at assessments in use rather than on sale in
the marketplace.
Their search identified 89 measures, 60
of which were developed in the 1990s. The
coding categories Meisels and Piker (2000)
used were adapted from the Stallman and
Pearson (1990) survey and categorized mea-
sures on 13 literacy skills: print awareness,
phonics, reading, reading strategies, com-
prehension, writing process, writing con-
ventions, motivation, self-perception, meta-
cognition, attitude, oral language listening
and speaking, and oral language other. The
first six of these skills are most directly re-
lated to reading assessment. However, the
CIERA researchers identified 203 subskills
among these 13 categories. This is again an
indication of the conceptual and practical
decomposition of literacy skills in complex
assessment batteries.
Meisels and Piker (2000) found that 70%
of the measures were designed for individ-
ual administration and nearly half were in-
tended for all grades, K–3. Only five were
available in languages other than English
(four in Spanish, one in Danish). Of the 13
skills, phonics, comprehension, and reading
were assessed most frequently, and moti-
vation, self-perception, and attitudes were
measured least often. Of the 89 measures,
47 were based on observation or on-
demand methods for evaluating students’
literacy. Constructed responses were used
mostly with writing. Checklists were used
in 36% of the measures, and running re-
cords in 15%. The most frequent kind of re-
sponse was oral response on 64% of the
measures, followed by writing on 46% of
the measures. Recognition, identification,
and recall were used to assess about one-
third of the skills. Meisels and Piker (2000)
then examined the skills assessed in each
test and found that 70% were assessed with
observations and that the data were re-
corded in checklists (69%) or anecdotal ob-
servations (45%) most often. Both the lim-
ited response formats for students and the
informal records of teachers are worth not-
ing.
Meisels and Piker (2000) examined the
measures for evidence of psychometric re-
liability and validity and expressed disap-
pointment with the results. Only 14% of the
measures had evidence of good reliability
that ranged from high to moderate. Even
less information was available about valid-
ity. No consistent tests or benchmarks were
used to establish concurrent or predictive
validity. The researchers noted that the non-
commercial measures were less likely to in-
clude psychometric evidence than commer-
cial tests. In comparing their results to the
Stallman and Pearson (1990) study, Meisels
and Piker (2000) also noted that the non-
commercial measures were usually de-
signed for individuals, not groups, and had
more opportunities for students to identify
or produce answers rather than just recog-
nize correct choices. Noncommercial mea-
sures usually had fewer guidelines for ad-
ministering, recording, and interpreting the
assessment information.
How Teachers Use and Regard
Reading Assessments
The next set of studies went beyond a
consideration of the instruments to examine
how teachers use and evaluate them. Paris,
Paris, and Carpenter (2002) reported find-
ings from a survey of teachers’ perceptions
of assessment in early elementary grades.
They asked successful teachers what kinds
of reading assessments they used for what
purposes so that a collection of "best prac-
tices" might be available as models for
other teachers. The assessment survey was
a part of a large CIERA survey of elemen-
tary teachers who taught in "beat the odds"
schools to determine their practices and
views. These schools across the United
States had a majority of students who qual-
ified for Title I programs and had a mean
school test score on some standardized
measure of reading achievement that was
higher than the average score of other Title
I schools in the state. Most of the selected
schools also scored above the state average
for all schools. Candidate schools were se-
lected from a network of CIERA partner
schools as well as from annual reports of
outstanding schools in 1996, 1997, and 1998
as reported by the National Association of
Title I Directors.
The sample included 504 K–3 classroom
teachers in "beat the odds" schools, but the
anonymous and voluntary survey made it
impossible to determine if these were the
most effective teachers in the schools. In
the first part of the survey, teachers were
asked to record the types of reading as-
sessments used in their classrooms and
the frequency with which they used each
one. Most teachers reported that they used
all of the assessment types; 86% used per-
formance assessments, 82% used teacher-
designed assessments, 78% used word at-
tack/word meaning, 74% used measures
of fluency and understanding, 67% used
commercial assessments, and 59% used
standardized reading tests.
The survey showed that K–3 teachers
used a variety of assessments in their class-
rooms daily. Assessments designed by
teachers, including the instructional assess-
ments Meisels and Piker (2000) examined,
were used most frequently, and standard-
ized tests were used least often. This con-
trast was most evident for K–1 teachers
who rarely used standardized tests. The
survey showed that K–3 teachers used ob-
servations, anecdotal evidence, informal in-
ventories, and work samples as their main
sources of evidence about children’s read-
ing achievement and progress. The survey
also showed the variety of tools available to
teachers and the large variation among
teachers in what they used. The daunting
variety of assessments requires a highly
skilled teacher to select and use appropriate
tools.
Another part of the survey posed ques-
tions about the effects of assessments on
various stakeholders. In general, teachers
reported that teacher-designed, informal as-
sessments had more positive effects on stu-
dents, teachers, and parents. Conversely,
teachers believed standardized and com-
mercial assessments had a more positive
effect on administrators. These patterns
suggest that teachers differentiate between
assessments over which they have control
and assessments generated externally in
terms of their effects on stakeholders. It is
ironic that teachers believed that the most
useful assessments for students, teachers,
and parents were valued less by adminis-
trators than standardized and commercial
assessments.
Responses to High-Stakes Assessment
A fourth survey conducted by CIERA
researchers gathered the views of teachers
regarding high-stakes testing (Hoffman,
Assaf, & Paris, 2001). This study, which sur-
veyed reading teachers in Texas, was de-
signed as a modified replication of earlier
investigations of teachers’ views of high-
stakes testing in Arizona (Haladyna, Nolen,
& Haas, 1991) and Michigan (Urdan &
Paris, 1994). Texas is recognized nationally
as one of the leaders in the testing and ac-
countability movement. The Texas Assess-
ment of Academic Skills (TAAS) was the
centerpiece of the state’s accountability sys-
tem throughout the 1990s. The TAAS was a
criterion-referenced assessment of reading
and mathematics given to all Texas students
in grades 3–8 near the end of the year. It has
recently been replaced by the Texas Assess-
ment of Knowledge and Skills (TAKS), but
the design and use are essentially the same.
The study, conducted in 1998–1999, in-
cluded responses from 200 experienced
reading specialists who returned a mail sur-
vey. For the most part, respondents were
older (61% between the ages of 40 and 60)
and more experienced (63% with over 10
years experience and 45% with over 20
years experience) than Texas classroom
teachers in general. Most respondents were
working in elementary grades (78%) and in
minority school settings (81%) serving low-
income communities (72%) where the need
for reading specialists was greatest and
funds for them were most available.
To examine general attitudes, we created
a composite scale from the following four
items in this section:
• Better TAAS tests will make teachers
do a better job.
• TAAS tests motivate students to learn.
• TAAS scores are good measures of
teachers’ effectiveness.
• TAAS test scores provide good com-
parisons of the quality of schools from
different districts.
Each of these items represents some of
the political motivations and intentions
that underlie the TAAS. Respondents rated
each item on a scale ranging from 1
(strongly disagree) to 4 (strongly agree).
The average rating on this composite vari-
able was 1.7 (SD = .58), suggesting that
reading specialists strongly disagreed with
some of the underlying assumptions of and
intentions for the TAAS.
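The survey report gives only the composite mean and standard deviation; assuming the composite was the conventional mean of the four item ratings, the computation is straightforward. The sketch below is illustrative only, with hypothetical ratings rather than data from the survey:

    # A minimal sketch of a Likert composite, assuming the scale is the
    # mean of the four item ratings (1 = strongly disagree to
    # 4 = strongly agree). The example ratings are hypothetical, not
    # survey data.
    def composite_score(ratings):
        return sum(ratings) / len(ratings)

    composite_score([2, 1, 2, 2])  # -> 1.75 on the 1-4 scale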
Another composite variable was created
with items related to the validity of the
TAAS as a measure of student learning. The
four items included in this analysis were:
• TAAS tests accurately measure achieve-
ment for minority students.
• TAAS tests accurately measure achieve-
ment for limited English-speaking stu-
dents.
• Students’ TAAS scores reflect what
students have learned in school during
the past year.
• Students’ TAAS scores reflect the cu-
mulative knowledge students have
learned during their years in school.
The average rating on this composite vari-
able was also 1.7 (SD = .58), suggesting that
reading specialists challenge the validity of
the test, especially for minority students
and ESL speakers, who are the majority of
students in Texas public schools.
Contrast these general attitudes and be-
liefs regarding TAAS with the perception of
the respondents that administrators believe
TAAS performance is an accurate indicator
of student achievement (M = 3.1) and the
quality of teaching (M = 3.3). Also, contrast
this with the perception of the reading spe-
cialists that parents believe the TAAS re-
flects the quality of schooling (M = 2.8).
The gaping disparity between the percep-
tions of those responding and their views of
administrators’ and parents’ attitudes sug-
gests an uncomfortable dissonance. Other
parts of the TAAS survey revealed that
reading specialists reported more pressure
to cheat on the tests among low-performing
schools, inappropriate uses of the TAAS
data, adverse effects on the curriculum, too
much time spent on test preparation, and
negative effects on teachers’ morale and
motivation. In sum, the survey revealed un-
intended and negative consequences of
high-stakes testing that are similar to results
of other studies of the consequences of
high-stakes testing (e.g., Paris, 2000; Paris,
Lawton, Turner, & Roth, 1991; Urdan &
Paris, 1994).
Summary of the CIERA Surveys
The four CIERA surveys support several
conclusions. First, a vast assortment of com-
mercial and informal reading assessments is
available for K–3 classroom teachers. Stall-
man and Pearson (1990) identified 20 com-
mercial reading tests, yet 10 years later
Pearson et al. (1999) found 148, and the
number is certainly higher today. However,
commercial tests are not the only source of
reading assessments. Meisels and Piker
(2000) solicited information about noncom-
mercial assessments from teachers and ed-
ucators and identified 89 types of literacy
assessments measuring 203 skills. Teachers
face a formidable task of finding appropri-
ate tools, obtaining them, and then adapt-
ing the assessments to their own purposes
and students.
Second, reading assessments varied by
grade level. Teachers in K–1, compared to
teachers in grades 2–3, were more likely to
use assessments of print awareness, phon-
ics, and similar enabling skills than assess-
ments of reading, writing, or motivation.
Teachers in grades K–1 were also less likely
than teachers in grades 2–3 to use standard-
ized tests and commercial assessments. Ob-
servations were reported as the most com-
mon type of assessment and may be slightly
more frequent at grades K–1. Recognition
as a response option was also used most fre-
quently among younger children, whereas
identification and production were more
frequent at grades 2–3. Teachers in grades
2–3 use more sophisticated tests of reading
and writing and fewer measures of enabling
skills as their assessment methods match
the developing abilities of their students.
Third, teachers regarded informal mea-
sures that they design, select, and embed in
the curriculum as more useful for teachers,
students, and parents than commercial as-
sessments. Teachers regarded standardized
tests and commercial tests that allow little
teacher control and adaptation as less useful
and used them less often. Paradoxically, the
standardized tests were regarded as having
the most effect on administrators’ knowl-
edge and reporting practices. We think that
teachers’ frustration with assessments is
partly tied to this paradox.
Fourth, the most frequently used and
highly valued reading assessments are least
visible to parents and administrators be-
cause they are not reported publicly. Obser-
vations, anecdotes, and daily work samples
are certainly low-stakes evidence of
achievement for accountability purposes,
but they may be the most useful for teach-
ers, parents, and students. It is also ironic
that the assessments on which teachers feel
least trained and regard as least useful (i.e.,
standardized tests) are used most often for
evaluations and public reports. Together
these findings suggest that teachers need
support in establishing the value of instruc-
tional assessments in their classrooms for
administrators and parents while also de-
marcating the limits and interpretations of
externally mandated tests (see Hoffman,
Paris, Patterson, Salas, & Assaf, 2003). The
current slogan about the benefits of a bal-
anced approach to reading instruction
might also be applied to a balanced ap-
proach to reading assessment. The skills
that are assessed need to be balanced
among various components of reading, and
the purposes/benefits of assessment need
to be balanced among the stakeholders.
The critical question that many policy
makers ask is, Which reading assessments
provide the best evidence about children’s
accomplishments and progress? The an-
swer may not be one test or even one type
of assessment. A single assessment cannot
adequately represent the complexity of a
child’s reading development. Likewise, the
same assessments may not represent the
curriculum and instructional diversity
among teachers. A single assessment cannot
capture the variety of skills and develop-
mental levels of children in most K–3
classes. That is why teachers use multiple
assessments and choose those that fit their
purposes. These assessments are the ones
that can reveal the most information about
their students. We believe that the most ro-
bust evidence about children’s reading re-
veals developing skills that can be com-
pared to individual standards of progress as
well as to normative standards of achieve-
ment. A developmental approach balances
the types of assessments across a range of
reading factors and allows all stakeholders
to understand the strengths and weak-
nesses of the child’s reading profile. Many
teachers use this approach implicitly, and
we think it is a useful model for early read-
ing assessment rather than a one-test-fits-all
approach.
Assessment of Students’ Oral
Reading
Oral reading has been a focus for the as-
sessment of early reading development
throughout the twentieth century (Rasinski
& Hoffman, 2003). Teachers in the afore-
mentioned surveys reported using chil-
dren’s oral reading as an indicator of
growth and achievement. The informal
reading inventory (IRI) changed over time
to focus on the accuracy of oral reading with
less attention to reading rate until recently.
Now researchers have focused attention on
three facets of oral reading fluency—rate,
accuracy, and prosody—as indicators of au-
tomatic decoding and successful reading
(Kuhn & Stahl, 2003).
During the first year of CIERA, Scott
Paris and David Pearson were asked by the
Michigan Department of Education (MDE)
to help evaluate the new Michigan Literacy
Progress Profile (MLPP) while also evalu-
ating summer reading programs through-
out the state. These research projects dove-
tailed with CIERA research on assessment,
so we spent 5 years working with the In-
gham Intermediate School District and
MDE evaluating summer reading programs
and testing components of the MLPP. The
program evaluations led to several insights
about early reading assessments and eval-
uation research that are worth noting here
(Paris, Pearson, et al., in press).
One insight from the research was the
realization that informal reading invento-
ries (IRIs) were legitimate tools for assess-
ing student growth in reading and for pro-
gram evaluation. In the past 5–7 years,
several state assessment programs and
commercial reading assessments have used
leveled texts with running records or mis-
cue analyses as formative and summative
assessments of early reading. There has
been widespread enthusiasm for such IRI
assessments that serve both purposes be-
cause the assessments are authentic, aligned
with classroom instructional practices, and
integrated into the curriculum. In fact, IRIs
are similar to the daily performance assess-
ments and observations teachers reported
in the CIERA survey of classroom assess-
ments. However, the use of IRIs for sum-
mative assessment must be viewed with
caution until the reliability and validity of
IRI assessments administered by teachers
can be established. Extensive training and
professional development that integrate
reading assessment with instruction seem
necessary in our experience.
A second insight has involved the diffi-
culties in analyzing students’ growth when
students are reading different leveled texts.
The main problem in using IRIs for mea-
suring reading growth is that running re-
cords and miscue analyses are gathered on
variable levels of text that are appropriate
for each child. Thus, comparing a child’s
reading proficiency at two times (or com-
paring various children to each other over
time) usually involves comparisons of dif-
ferent passages and text levels, so changes
in children’s performance are confounded
by differences between passages and diffi-
culty levels. Paris (2002) identified several
methods for analyzing IRI data from lev-
eled texts and concluded that the most so-
phisticated statistical procedure was based
on Item Response Theory (IRT). In the eval-
uation of summer reading programs, Paris,
Pearson, et al. (2004) used IRT analyses to
scale all the reading data from more than
1,000 children on different passages and dif-
ferent levels of an IRI so the scores could be
compared on single scales of accuracy, com-
prehension, retelling, and so forth. Those
analyses revealed significant effects on chil-
dren who participated in summer reading
programs compared to control groups of
children who did not participate in the sum-
mer programs (see Paris, Pearson, et al.,
2004).
A brief description of IRT analyses will
reveal the benefits of this approach. IRT is
a psychometric method of analyzing data
that allows estimates of individual scores
that are independent of the actual test items.
This is important for reading assessment
that compares students’ growth over time
on different levels, items, and tests, which
is the usual problem in using IRI data. The IRT
scaling procedures in a two-parameter
logistic model estimate the individual scores
and item difficulties simultaneously (Em-
bretson & Reise, 2000). The crux of an IRT
analysis is to find optimal estimates for the
item parameters that depend on the stu-
dents’ IRT scores that, in turn, depend on
the item parameters. This catch-22 is solved
statistically by an iterative procedure that
converges toward a final solution with op-
timal estimates for all parameters. How-
ever, the calculation is different from other
statistical procedures, such as regression
analysis, because "likelihood" is the under-
lying concept and not regression weights.
The item difficulty is calculated accord-
ing to a logistic function that identifies the
point on an item parameter scale where the
probability of a correct response is exactly
.50. The distribution of correct answers
across items of varying difficulty from stu-
dents in the sample permits estimates of in-
dividual IRT scores that are based on the
actual as well as possible patterns of correct
responses. The numerical IRT scale is then
established with a zero point and a range of
scores, for example, 0–100 or 200–800. For-
tunately, there are software programs avail-
able to calculate IRT scores, but they have
rarely been used with children’s reading
data derived from IRIs and leveled texts.
We think IRT analyses are scientifically rig-
orous and potentially useful ways to ex-
amine children’s reading data and progress
over time.
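As a worked illustration, the logistic function described above can be written in the standard two-parameter notation of Embretson and Reise (2000); the symbols are the conventional ones and are not taken from the CIERA reports:

    P_i(\theta) = \frac{1}{1 + \exp[-a_i(\theta - b_i)]}

Here \theta is a student's ability (the IRT score), b_i is the difficulty of item i, and a_i is the item's discrimination. When \theta = b_i, the probability of a correct response is exactly .50, which matches the definition of item difficulty given above.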
A third set of insights about reading as-
sessments involved practical decisions
about how to use IRIs effectively. Paris and
Carpenter (2003) found that teachers re-
quire sustained professional development
and schoolwide implementation of reading
assessments to use them uniformly, consis-
tently, and wisely. The real benefit of IRIs is
the knowledge teachers gain while assess-
ing individual children because the assess-
ment framework provides insights about
needed instruction. Teachers need guidance
in selecting IRIs, administering them, inter-
preting them, and using the results with
students and parents, and that guidance
needs to be shared knowledge among the
school staff so it creates a culture of under-
standing about reading assessment. Paris
and Carpenter (2003) found that imple-
menting a schoolwide system of recording
and reporting the data as part of the veri-
fication of students’ adequate yearly pro-
gress (AYP) made the assessments worth
the time and energy of all the participants.
Thus, teachers gained diagnostic informa-
tion about students and also provided ac-
countability through measures of AYP by
comparing fall and spring scores.
A fourth insight that researchers gained
is that IRIs can provide multiple indicators
of children’s oral reading, including rate,
accuracy, prosody, retelling, and compre-
hension and that teachers can choose which
measures to collect. CIERA research iden-
tified some problems with the various mea-
sures derived from IRIs (Paris, Carpenter,
Paris, & Hamilton, in press). For example,
there are restricted ranges and ceiling ef-
fects in some measures, such as prosody
and accuracy. It also appears that compre-
hension is more highly related to oral read-
ing accuracy and rate in beginning readers
and that the relation decreases by the time
children are reading texts at a third- or
fourth-grade level. This means that some
children become adept "word callers" with
little evidence of comprehension, so reading
rate and accuracy measures in IRIs may
yield incomplete information for older
readers.
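For concreteness, the rate and accuracy indicators mentioned above are conventionally computed from a timed passage reading as sketched below; this is a generic illustration, not the scoring procedure of any particular IRI:

    # A minimal sketch of two common oral reading measures from an IRI
    # passage: accuracy and words correct per minute (WCPM). The formulas
    # are the conventional ones; the example numbers are hypothetical.
    def oral_reading_measures(total_words, miscues, seconds):
        words_correct = total_words - miscues
        accuracy = words_correct / total_words     # proportion read correctly
        wcpm = words_correct / (seconds / 60.0)    # words correct per minute
        return accuracy, wcpm

    # Example: a 120-word passage with 6 miscues read in 90 seconds.
    acc, rate = oral_reading_measures(120, 6, 90)  # acc = 0.95, rate = 76.0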
IRI data on oral reading fluency and
comprehension are most informative about
children’s reading during initial skill devel-
opment, approximately grades K–3, and
when the information is used in combina-
tion with other assessments. Assessments of
prerequisite skills for fluent oral reading,
such as children’s vocabulary, letter-sound
knowledge, phonological awareness, begin-
ning writing, understanding of text conven-
tions, and book-handling skills, may aug-
ment IRIs with valuable information. Thus,
IRIs provide developmentally sensitive as-
sessments for beginning and struggling
readers when fluency and understanding
are growing quickly and when teaching fo-
cuses on specific reading skills. IRIs are ex-
cellent tools for combining diagnostic and
summative assessments in an authentic for-
mat for teachers and students.
New Directions in Early Reading
Assessment
In this part of our review, we summarize
four examples of innovative assessments by
CIERA researchers that chart new direc-
tions in literacy assessment with young
children.
Narrative Comprehension
During the past 10 years of renewed em-
phases on beginning reading, there has
been less attention given to children’s com-
prehension skills compared to decoding
skills (National Reading Panel, 2000). More
research on young children’s comprehen-
sion skills and strategies is needed to diag-
nose and address children’s early reading
difficulties that extend beyond decoding. A
major CIERA assessment project focused on
children’s comprehension of narrative sto-
ries, and more specifically, on narratives il-
lustrated in wordless picture books. Paris
and Paris (2003) created and tested compre-
hension assessment materials and proce-
dures that can be used with young children,
whether or not they can decode print. Such
early assessments of comprehension skills
can complement existing assessments of en-
abling skills, provide diagnostic measures
of comprehension problems, and link com-
prehension assessment with classroom in-
struction.
Narrative comprehension is a complex
meaning-making process that depends on
the simultaneous development of many
skills, including, for example, understand-
ing of story structure and relations among
elements and psychological understanding
about characters’ thoughts and feelings. It
is important to assess narrative comprehen-
sion for several reasons. First, narrative
competence is among the fundamental cog-
nitive skills that influence early reading de-
velopment. Whitehurst and Lonigan (1998)
refer to these skills as "outside-in" skills be-
cause children use the semantic, conceptual,
and narrative relations that they already
know to comprehend the text. In this view,
narrative competence is a fundamental as-
pect of children’s comprehension of expe-
riences before they begin to read, and it
helps children map their understanding
onto texts. The importance and early devel-
opment of narrative thinking may be one
reason that elementary classrooms are dom-
inated by texts in the narrative genre (Duke,
2000). Second, there is ample documentation
of the importance of narrative comprehension
among older children and adults, as well as
extensive research on its development (e.g.,
Berman & Slobin, 1994). Third, the clear
structure of narrative stories with specific
elements and relations provides a framework
for assessment of understanding. Fourth,
narrative is closely connected to many con-
current developmental accomplishments of
young children in areas such as language,
play, storytelling, and television viewing. It
is an authentic experience in young chil-
dren’s lives, and it reveals important cog-
nitive accomplishments.
In a procedure similar to one van Kraay-
enoord and Paris (1996) used, Paris and
Paris (2003) modified trade books with clear
narrative story lines—a strategy that can be
used easily for both assessment and instruc-
tional purposes—to create the narrative
comprehension (NC) assessment task. They
located commercially published wordless
picture books, adapted them by deleting
some irrelevant pages to shorten the task,
and assembled the pages of photocopied
black-and-white pictures into spiral-bound
little books. It was important that the story
line revealed by the pictures was clear with
an obvious sequence of events and that the
pictures contained the main elements of sto-
ries (i.e., settings, characters, problems, res-
olutions). The first study in Paris and Paris
(2003) established the NC task procedures
for observing how K–2 children interacted
with wordless picture books under three
conditions: spontaneous examination dur-
ing a "picture walk," elicited retelling, and
prompted comprehension during question-
ing. The results were striking. The retelling
and prompted comprehension scores in-
creased in regular steps for each grade from
K to 2, and readers were significantly better
than nonreaders on both measures, thus
showing developmental sensitivity of the
assessment task. There were no develop-
mental differences on the picture walk be-
haviors, however.
In study 2, Paris and Paris (2003) ex-
tended the procedures to additional word-
less picture books and examined the reli-
ability of the assessment procedures. The
similarity of developmental trends across
books indicated that the NC task is sensitive
to progressive increases in children’s abili-
ties to make inferences and connections
among pictures and to construct coherent
narrative relations from picture books. Sim-
ilarity of performance across books showed
that examiners can administer the NC task
with different materials and score children’s
performance in a reliable manner. Thus, the
generalizability and robustness of the NC
task across picture books were supported.
In study 3, Paris and Paris (2003) ex-
amined the predictive and concurrent va-
lidity of the NC task with standardized and
informal measures of reading. The similar-
ity in correlations, overall means, and de-
velopmental progressions of NC measures
confirmed the patterns revealed by studies
1 and 2 with new materials and additional
children. In addition, the NC task was sen-
sitive to individual growth over 1 year that
was not due to practice effects. The NC re-
telling and comprehension measures were
correlated significantly with concurrent as-
sessments with an IRI and the Gates-
MacGinitie Reading Test, a standardized,
group-administered test. Furthermore, the
NC comprehension scores among first
graders significantly predicted their scores
on the Iowa Tests of Basic Skills (ITBS) a
year later in second grade (r = .52).
The three studies provided consistent
and positive evidence about the NC task as
a developmentally appropriate measure of
5–8-year-old children’s narrative under-
standing of picture books. Retelling and
prompted comprehension scores improved
significantly with age, indicating that the
NC task differentiates children who can re-
call main narrative elements, identify criti-
cal explicit information, make inferences,
and connect information across pages from
children who have weaknesses with these
narrative comprehension skills. The NC
task requires brief training and can be given
to children in less than 15 minutes, which is
critical for individual assessment of young
children. The high percentage agreement
between raters across the three books
showed that the scoring rubrics are reliable
across story content and raters. The similar
patterns of cross-sectional and longitudinal
performance further confirmed the gener-
alizability of the task. The strong concurrent
and predictive relations provided encour-
aging evidence of the validity of the NC
task as a measure of comprehension for
emergent readers.
In a related series of CIERA studies, van
den Broek et al. (in press) examined pre-
school children’s comprehension of tele-
vised narratives. They showed 20-minute
episodes of children’s television programs
and presented 13-minute audiotaped sto-
ries to children to compare their viewing
and listening comprehension. Children re-
called causally related events in the narra-
tives better than other kinds of text rela-
tions, and their recall scores in viewing and
listening conditions were highly correlated.
Furthermore, preschoolers’ comprehension
of TV episodes predicted their standardized
reading comprehension test scores in sec-
ond grade. The predictive strength re-
mained even when vocabulary and word
identification skills were controlled in a re-
gression analysis. Thus, narrative compre-
hension skills of preschoolers can be as-
sessed with TV and picture books, and the
measures have significant predictive valid-
ity for later reading comprehension. We
think that narrative comprehension view-
ing and listening tasks can help teachers to
focus on comprehension skills of young
children even if the children have restricted
decoding skills, few experiences with
books, or limited skills in speaking English.
Parent-Child Interactive Reading
DeBruin-Parecki (1999) created an as-
sessment procedure for family literacy pro-
grams that records interactive book reading
behaviors. One purpose of the assessment
was to help parents with limited literacy
skills understand the kinds of social, cog-
nitive, and literate behaviors that facilitate
preschool children’s engagement with
books. A second purpose was to provide
family literacy programs with visible evi-
dence of the quantity and quality of parent-
child book interactions. The research was
based on the premise that children learn to
read early and with success when parents
provide stimulating, print-rich environ-
ments at home (e.g., Bus, van Ijzendoorn, &
Pellegrini, 1995). Moreover, parents must
provide appropriate support during joint
book reading. Morrow (1990) identified ef-
fective interactive reading behaviors, such
as questioning, scaffolding dialogue and re-
sponses, offering praise or positive rein-
forcement, giving or extending information,
clarifying information, restating informa-
tion, directing discussion, sharing personal
reactions, and relating concepts to life ex-
periences. Thus, DeBruin-Parecki (1999)
created the Adult/Child Interactive Read-
ing Inventory (ACIRI) to assess these kinds
of behaviors.
The ACIRI lists 12 literacy behaviors of
adults and the corresponding 12 behaviors
by children. For example, one adult behav-
ior is "poses and solicits questions about the
book's content," and the corresponding
child behavior is "responds to questions
about the book." There were four behaviors
in each of the following three categories: en-
hancing attention to text, promoting inter-
active reading and supporting comprehen-
sion, and using literate strategies. The
observer using the ACIRI recorded the fre-
quencies of the 12 behaviors for both parent
and child along with notes about the joint
book reading. The assessment was designed
to be brief (15–30 minutes), flexible, non-
threatening, appropriate for any texts,
shared with parents, and informative about
effective instruction. Following the assess-
ment, the observer discusses the results
with the parent to emphasize the positive
features of the interaction and to provide
guidance for future book interactions. The
transparency of the assessment and the im-
mediate sharing of information minimize
the discomfort of being observed. After
leaving the home, the observer can record
additional notes and calculate quantitative
scores for the frequencies of observed be-
haviors.
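A minimal sketch of how such paired-frequency records might be kept appears below; only the single behavior pair quoted above comes from the instrument itself, and the tally structure is our own illustration rather than DeBruin-Parecki's observation forms:

    from collections import defaultdict

    # The ACIRI pairs each of 12 adult behaviors with a corresponding
    # child behavior; only the pair quoted in the text is reproduced here.
    BEHAVIOR_PAIRS = [
        ("poses and solicits questions about the book's content",  # adult
         "responds to questions about the book"),                  # child
    ]

    tallies = defaultdict(int)  # (role, behavior) -> observed frequency

    def record(role, behavior):
        """Tally one observed behavior for 'adult' or 'child'."""
        tallies[(role, behavior)] += 1

    adult_behavior, child_behavior = BEHAVIOR_PAIRS[0]
    record("adult", adult_behavior)
    record("child", child_behavior)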
DeBruin-Parecki tested the ACIRI with
29 mother-child pairs enrolled in an Even
Start family literacy program in Michigan.
The children were 3 to 5 years old, and the
mothers were characterized as lower socio-
economic status. The regular staff collected
ACIRI assessments at the beginning and
end of the year as part of their program
evaluation and field testing of the assess-
ment. Familiarity also minimized anxiety
about being observed. The results sup-
ported the usefulness of the instrument.
The ACIRI was shown to be sensitive to
parent-child interactions because the four
behaviors in each of the categories showed
significant correlations between the fre-
quencies of adult and child behaviors. Re-
liability was evaluated by having observers
rate videotaped parent-child book interac-
tions. Interrater reliability was 97% among
eight raters. Consequential validity was
established through staff interviews that
showed favorable evaluations of the ACIRI.
The comparison of fall to spring scores
showed that parents and children increased
the frequencies of many of the 12 behaviors
during the year. Thus, the ACIRI provided
both formative and summative assessment
functions for Even Start staff.
Book Reading in Early Childhood
Classrooms
In addition to measuring reading skills
of children and adults, it is also important
to assess the literate environment. Longi-
tudinal and cross-sectional studies of chil-
dren’s literacy development reveal that
more frequent book reading at home, sup-
ported by interactive conversations and
scaffolded instruction, leads to growth in
language and literacy during early child-
hood (Scarborough & Dobrich, 1994; Ta-
bors, Snow, & Dickinson, 2001). Similar
studies in schools have shown that the qual-
ity of teacher-child interaction, the fre-
quency of book reading, and the availability
of books all enhance children’s early read-
ing and language development (Neuman,
1999; Whitehurst & Lonigan, 1998). Thus,
assessments of environments that support
reading can help improve conditions at
home and school.
Dickinson, McCabe, and Anastasopoulos
(2001) reported a framework for assessing
book reading in early childhood classrooms
along with data from their observations of
many classrooms. They derived the follow-
ing five important dimensions to evaluate in
classrooms.
• Book area. Issues to consider include
whether there is a book area, the qual-
ity of the area, and the quantity and
quality of books provided.
• Time for adult-child book reading.
Time is a critical ingredient, and con-
sideration should be given to the fre-
quency and duration of adult-mediated
reading experiences, including one-to-
one, small-group, and whole-class read-
ings, as well as the number of books
read during these sessions.
• Curricular integration. Integration re-
fers to the nature of the connection be-
tween the ongoing curriculum and the
use of books during whole-class times
and throughout the day.
• Nature of the book reading event.
When considering a book reading
event, one should examine the teacher’s
reading and discussion styles and chil-
dren’s engagement.
• Connections between the home and
classroom. The most effective teachers
and programs strive to support read-
ing at home through parent education,
lending libraries, circulation of books
made by the class, and encouragement
of better use of community libraries.
Dickinson et al. (2001) examined data
from four studies in order to evaluate the
importance of these dimensions. They
noted that many of the preschool class-
rooms they observed were rated high in
quality using historical definitions of devel-
opmentally appropriate practices, but that
the same classrooms were rated as having
low-quality literacy instruction. For exam-
ple, only half the classrooms they observed
had separate areas for children to read
books, and there were few informational
books and few books about varied racial
and cultural groups. They found no book
reading at all in 66 classrooms. In the other
classrooms, adults read to children less than
10 minutes per day, and only 35% of the
classes allowed time for children to look at
books on their own. Other observations led
the researchers to conclude that book read-
ing was not coordinated with the curricu-
lum or learning goals. Only 19% of class-
rooms had three or more books related to
a curricular theme, and only 35% of class-
rooms had listening centers. Dickinson et
al. (2001) noted that group reading is a
filler activity used in classroom transitions
rather than an instructional and curricular
priority.
The researchers also examined book
reading in classrooms by assessing teachers’
style, animation, and prosody as they read.
Most teachers read with little expressive-
ness. Many used an explicit management
style, such as asking questions of children
who raised their hands, as "crowd control"
rather than using thought-provoking ques-
tions about text. More than 70% of teachers’
talk during book reading made few cogni-
tive demands on children. The researchers
suggested that teachers must devote more
attention to engaging children in discus-
sions that link their experiences to text,
teach new vocabulary words, probe char-
acters’ motivations, and promote compre-
hension of text relations. Analyses of home-
school connections revealed that more
could be done to encourage families to re-
inforce school practices and to seek com-
munity literacy resources. Teachers rarely
connected language and cultural experi-
ences at home with literacy instruction at
school.
Dickinson et al. (2001) concluded that
the framework can be useful for assessing
early childhood classrooms and for study-
ing the effects of specific environmental fea-
tures on children’s literacy development.
They noted, for example, that their research
revealed little correlation between the
amount of book reading in classrooms and
the degree to which reading was integrated
into the curriculum. They interpreted this as
evidence that book reading in many early
childhood classrooms is an incidental activ-
ity rather than a planned instructional goal.
The framework is also useful for reading
educators to use with preservice and in-
service teachers who want to assess their
own classrooms and teaching styles because
it identifies critical elements of successful
classrooms.
Texts and the Text Environment in
Beginning Reading Instruction
The research by Dickinson et al. (2001)
provides a conceptual bridge to several
other CIERA investigations of texts and the
text environment. Two important strands of
research at CIERA have examined the as-
sessment of text characteristics for begin-
ning reading instruction. In the first strand
of research, Hiebert and her colleagues de-
veloped a text assessment framework that
can be used to analyze important features
of texts used for beginning reading instruc-
tion. This framework is grounded in Hie-
bert’s theoretical claims that certain text fea-
tures scaffold readers’ success in early
reading. The framework, called the Text Ele-
ments by Task (TExT) model, identifies two
critical factors in determining beginning
readers’ success with texts: linguistic con-
tent and cognitive load.
Linguistic content refers to the knowl-
edge about oral and written language that
is necessary for readers to recognize the
words in particular texts. Phoneme-
grapheme knowledge that is required to
read a text is described in terms of several
measures that provide different but comple-
mentary information on the phoneme-
grapheme knowledge related to vowels.
The first measure of phoneme-grapheme
knowledge summarizes the complexity of
the vowel patterns in a text. The second
measure is the degree to which highly com-
mon vowel and consonant patterns are re-
peated. To use this measure, the TExT
model examines the number of different on-
sets that appear with a particular rime. The
number of syllables in words is a third mea-
sure of linguistic content that influences be-
ginning readers’ recognition of words. The
model claims that texts with fewer multi-
syllabic words help children acquire fluent
word recognition.
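As an illustration of the second linguistic-content measure, counting the different onsets that appear with a particular rime can be sketched as follows; splitting each word at its first vowel letter is a simplification we assume for illustration, not Hiebert's operational definition:

    # A minimal sketch of an onset-rime tally: how many different onsets
    # appear with each rime in a word list. The first-vowel split is an
    # illustrative simplification of onset-rime segmentation.
    from collections import defaultdict

    VOWELS = set("aeiou")

    def onsets_by_rime(words):
        rimes = defaultdict(set)
        for word in words:
            word = word.lower()
            for i, ch in enumerate(word):
                if ch in VOWELS:            # rime begins at the first vowel
                    rimes[word[i:]].add(word[:i])
                    break
        return rimes

    # Example: three different onsets share the rime "at".
    print(dict(onsets_by_rime(["cat", "hat", "sat", "sun"])))
    # {'at': {'c', 'h', 's'}, 'un': {'s'}} (set order may vary)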
The cognitive load factor within the
TExT model measures the amount of new
linguistic information to which beginning
readers can attend while continuing to un-
derstand the text’s message. Repetition of at
least some core linguistic content has tra-
ditionally been used to reduce the cognitive
load in text used for teaching children to
read. Within the TExT model, word repeti-
tion and the number of unique words rela-
tive to total words are used to inspect the
cognitive load a particular text places on be-
ginning readers. Two additional features of
texts that are commonly used in classrooms
are also considered in the model: the sup-
port provided through illustrations, and
patterns of sentence and text structure. In a
recent study applying these TExT princi-
ples, Menon and Hiebert (2003) compared
‘‘little books’’ developed within the frame-
work to traditional beginning reading texts.
Students’ practice with the little books
proved more effective than practice with
traditional basal texts. The assessment
tools developed in this research can be used
to evaluate the complexity of texts as well
as to guide the construction of new texts for
beginning readers.
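The cognitive load indicators, word repetition and the ratio of unique to total words, lend themselves to an equally simple computation. Again, this is a hypothetical sketch in the spirit of the factor, not the TExT model’s actual metric.

import re
from collections import Counter

def cognitive_load_measures(text):
    # Illustrative repetition statistics in the spirit of the TExT
    # model's cognitive load factor; not the model's actual metric.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total, unique = len(words), len(counts)
    return {
        "total_words": total,
        "unique_words": unique,
        # lower values indicate more repetition, hence a lighter load
        "unique_to_total": unique / total if total else 0.0,
        "mean_repetitions": total / unique if unique else 0.0,
    }

print(cognitive_load_measures("I see a dog. I see a cat. I see a dog and a cat."))

A highly repetitive little book yields a low unique-to-total ratio, which the model reads as a lighter cognitive load on beginning readers.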
In the second strand of research focused
on texts, Hoffman and his colleagues inves-
tigated the qualities of texts used in begin-
ning reading instruction and the leveling
systems of these texts. The research con-
ducted through CIERA was grounded in
earlier studies of changes in basal reading
texts associated with the literature-based
movement (Hoffman et al., 1998) and later
with the decodable text movement (Hoff-
man, Sailors, & Patterson, 2002). Hoffman
(2002) has proposed a model for the assess-
ment of text quality for beginning reading
instruction that considers three factors: ac-
cessibility, instructional design, and engag-
ing qualities. Accessibility of a text for the
reader is a function of the decoding de-
mands of the text (a word-level focus) and
the support provided through the predict-
able features of the text (ranging from pic-
ture support, to rhyme, to repeated phrases,
to a cumulative structure). The instructional
design factor involves the ways that a text
fits into the larger scheme for texts read (a
leveling issue) as well as the instruction and
curriculum that surround the reading of the
text. Finally, the engaging qualities factor
considers the content, language, and design
features of the text.
In addition to evaluating the changes in
texts for beginning readers, Hoffman,
Roser, Salas, Patterson, and Pennington
(2001) used this framework to study the
validity of some of the popular text-
leveling schemes. In a study involving over
100 first-grade learners, these researchers
examined the ways in which estimates of
text difficulty using different text-leveling
systems predicted the performance of first-
grade readers. The research identified high
correlations between the factors in the theo-
retical model and the leveling from both the
Pinnell and Fountas and Reading Recovery
systems. The analysis also confirmed the
validity of these systems in predicting stu-
dent performance with first-grade texts. Fi-
nally, the study documented the effects of
varying levels of support provided to the
reader (shared reading, guided reading, no
support) on such measures as oral reading
accuracy, rate, and fluency.
Hoffman and Sailors (2002) created a
method for assessing the classroom literacy
environment called the TEX-IN3 that in-
cludes three components: a text inventory,
a text in-use observation, and a series of text
interviews. The TEX-IN3 draws on several
research literatures, including research on
texts conducted through CIERA. In addi-
tion, the instrument was developed based
on the literature exploring the effects of the
text environment on teaching and learning.
The assessment yields a series of scores as
well as qualitative data on the classroom lit-
eracy environment.
The TEX-IN3 was validated in a study
of over 30 classrooms (Hoffman, Sailors,
Duffy, & Beretvas, 2003). In this study, stu-
dents were tested with pre- and posttests on
a standardized reading test. Observers were
trained in the use of TEX-IN3, and high lev-
els of reliability were established. Data were
collected in classrooms at three times (fall,
winter, and spring). Data analyses focused
on the relations between features of the text
environment from the TEX-IN3 and stu-
dents’ reading comprehension scores. The
analyses supported all three components of
the TEX-IN3. For example, the correlations
between students’ gain scores and the rat-
ings of the overall text environment were
significant. Correlations between students’
gain scores and the in-use scores derived
from observations of teaching were also
significant, as were the correlations between
gain scores and ratings of teachers’ under-
standing and valuing of the text environ-
ment. The findings from the research
with the TEX-IN3 suggest the importance of
expanding assessment from a narrow focus
on the texts in classrooms to consideration
of texts in the full literacy environment.
Summary and Future Research
From 1997 to 2002, CIERA researchers con-
ducted many studies of early reading as-
sessment that focused on readers and text,
home and school, and policy and profes-
sion. The CIERA surveys of early reading
assessments identified the expanding array
of assessment instruments, commercial and
noncommercial, available to K–3 teachers.
Researchers also identified how effective
teachers in schools that ‘‘beat the odds’’ use
assessments and how they view the utility
and effects of various types of assessments.
The instruments used most frequently con-
tributed to ongoing CIERA research on the
development of the MLPP battery and the
use of IRIs for formative and summative
purposes. Those studies remain in progress
as researchers collect longitudinal evidence
about the reliability and validity of early
reading assessments (e.g., Paris, Carpenter,
et al., in press).
The most important insight from this re-
search is that some skills, such as alphabet
knowledge, concepts of print, and phone-
mic awareness, are universally mastered in
relatively brief developmental periods. As a
consequence, the distributions of data from
these variables are skewed by floor and ceil-
ing effects that, in turn, influence the cor-
relations used to establish reliability and va-
lidity of the assessments. Assessments of
oral reading accuracy, and perhaps rate, are
also skewed, so that measures of some basic
reading skills are difficult to analyze with
parametric statistics in traditional ways.
The mastery of some reading skills poses
challenges to conventional theories of read-
ing development and traditional statistical
analyses.
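The attenuation described here is easy to demonstrate with a toy simulation (the numbers are assumptions, not CIERA data): two measures of the same skill correlate at about .80 in the underlying population, and imposing progressively lower score ceilings, as mastery of a skill would, shrinks the observed correlation.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent scores on two measures of the same skill,
# correlated at about r = .80 in the underlying population.
n = 500
latent = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=n)

def censor(scores, ceiling):
    # Scores above the ceiling are all recorded at the maximum,
    # mimicking a ceiling effect on a nearly mastered skill.
    return np.minimum(scores, ceiling)

for ceiling in (3.0, 1.0, 0.0, -0.5):
    x = censor(latent[:, 0], ceiling)
    y = censor(latent[:, 1], ceiling)
    r = np.corrcoef(x, y)[0, 1]
    print(f"ceiling at z = {ceiling:+.1f}: observed r = {r:.2f}")

As the ceiling drops, more children share the maximum score and the observed correlation falls well below .80 even though the underlying relation is unchanged, which is why such skewed measures are difficult to analyze with conventional parametric statistics.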
CIERA researchers also developed in-
novative assessments of comprehension
with wordless picture books that offer
teachers new ways to instruct and assess
comprehension with children who cannot
yet decode print. These cross-sectional and
longitudinal studies substantiate the reli-
ability and validity of early assessments. In
addition, CIERA researchers designed and
tested new methods for assessing narrative
comprehension, interactive parent-child
reading, literate environments in early
childhood classrooms, text features, and the
text environment. All of these tools have
immediate practical applications and bene-
fits for educators. Indeed, the hallmark of
CIERA research on reading assessment is
the use of rigorous methods to identify, cre-
ate, test, and refine instruments and prac-
tices that can help parents and teachers pro-
mote the reading achievement of all
children.
This research, as well as studies outside
the immediate CIERA network, points to
the need for continuing study of assessment
in early literacy. We believe that at least four
areas deserve special attention. First, the
policy context for instructional programs,
teaching, and teacher education that places
a premium on ‘‘scientifically proven’’ ap-
proaches and methods has immediate im-
plications for assessment. Tools to be used
in reading assessment (e.g., for diagnosis,
program evaluation, or research) are subject
to high standards of validity and reliability.
We applaud this attention to rigor in as-
sessment, but we believe that decision mak-
ing about the use of instruments should be
professional, public, and comprehensive.
Such deliberations must extend beyond the
traditional psychometric constructs of reli-
ability and validity to include consideration
of the consequences of testing and the social
contexts of assessment.
Second, researchers must continue to in-
vestigate the ways in which assessment
tools can be broadened to focus on multiple
factors and the interaction of these factors
in ways that reflect authentic learning and
teaching environments. For example, infor-
mal reading inventories have become pop-
ular tools for assessment, partly because
reading rate and accuracy can be assessed
quickly and reliably, but educators need to
consider how text-leveling factors might in-
teract with students’ developmental levels
to influence evaluations of reading perfor-
mance. Good assessments should lead to an
understanding of the complexity of learn-
ing to read and not impose a false sense of
simplicity on early reading development.
Third, the gulf between what teachers
value as informal assessments and what is
imposed on them in the form of standard-
ized testing appears to be widening. Al-
though performance assessments and port-
folios were popular in the 1980s and 1990s,
the trends today are to increase high-stakes
testing for young children, to remove
teacher judgment from assessment, and to
streamline assessments so they can be con-
ducted quickly and repeatedly. More re-
search is needed on how highly effective
teachers assess developing reading skills in
their classrooms. Before educators and pol-
icy makers abandon performance assess-
ment, careful consideration must be given
to the ways that ongoing assessment can
promote differentiated instruction.
Fourth, researchers cannot lose sight of
the fact that good assessment rests on good
theory, not just a theory of reading but of
effective teaching and development. Just
because motivation, self-concept, and criti-
cal thinking are difficult to measure using
large-scale standardized tests does not
mean they should be ignored. The scientific
method is not just about comparing one
program or one approach to another to
prove which is best. The scientific investi-
gation of assessment in early literacy should
contribute to theory building that ulti-
mately informs effective teaching and learn-
ing.
References
Berman, R. A., & Slobin, D. I. (1994). Relating
events in narrative: A crosslinguistic develop-
mental study. Hillsdale, NJ: Erlbaum.
Bus, A. G., van IJzendoorn, M. H., & Pellegrini, A.
D. (1995). Joint book-reading makes for suc-
cess in learning to read: A meta-analysis on
intergenerational transmission of literacy. Re-
view of Educational Research, 65(1), 1–21.
Clay, M. M. (1993). An observation survey of early
literacy achievement. Portsmouth, NH: Hei-
nemann.
DeBruin-Parecki, A. (1999). Assessing adult/child
storybook reading practices (Technical Rep. No.
2-004). Ann Arbor: University of Michigan,
Center for the Improvement of Early Read-
ing Achievement.
Dickinson, D. K., McCabe, A., & Anastasopou-
los, L. (2001). A framework for examining book
reading in early childhood classrooms (Tech.
Rep. No. 1-014). Ann Arbor: University of
Michigan, Center for the Improvement of
Early Reading Achievement.
Duke, N. K. (2000). 3.6 minutes per day: The
scarcity of information texts in first grade.
Reading Research Quarterly, 35(2), 202–224.
Embretson, S. E., & Reise, S. P. (2000). Item re-
sponse theory for psychologists. Mahwah, NJ:
Erlbaum.
Good, R. H., & Kaminski, R. A. (Eds.). (2002).
Dynamic indicators of basic early literacy skills
(6th ed.). Eugene, OR: Institute for the Devel-
opment of Educational Achievement.
Haladyna, T., Nolen, S. B., & Haas, N. S. (1991).
Raising standardized achievement test
scores and the origins of test score pollution.
Educational Researcher, 20(5), 2–7.
Hoffman, J. V. (2002). Words on words in leveled
texts for beginning readers. In D. Schallert,
C. Fairbanks, J. Worthy, B. Maloch, & J. V.
Hoffman (Eds.), Fifty-first yearbook of the Na-
tional Reading Conference (pp. 59–81). Oak
Creek, WI: National Reading Conference.
Hoffman, J. V., Assaf, L. C., & Paris, S. G. (2001).
High-stakes testing in reading: Today in
Texas, tomorrow? Reading Teacher, 54(5), 482–
492.
Hoffman, J. V., McCarthey, S. J., Abbott, J., Chris-
tian, C., Corman, L., Curry, C., Dressman, M.,
Elliot, B., Mathern, D., & Stahle, E. (1998).
The literature-based basals in first-grade
classrooms: Savior, satan, or same-old, same-
old? Reading Research Quarterly, 33, 168–197.
Hoffman, J. V., Paris, S. G., Patterson, E., Salas,
R., & Assaf, L. (2003). High-stakes assess-
ment in the language arts: The piper plays,
the players dance, but who pays the price?
In J. Flood & D. Lapp (Eds.), Handbook of re-
search on teaching the English language arts (2nd
ed., pp. 619–630). Mahwah, NJ: Erlbaum.
Hoffman, J. V., Roser, N. L., Salas, R., Patterson,
E., & Pennington, J. (2001). Text leveling and
‘‘little books’’ in first-grade reading. Journal
of Literacy Research, 33(3), 507–528.
Hoffman, J. V., & Sailors, M. (2002). The TEX-IN3:
Text inventory, text in-use and text interviews.
Bastrop, TX: Jeaser.
Hoffman, J. V., Sailors, M., Duffy, G., & Beretvas,
C. (2003, April). Assessing the literacy environ-
ment using the TEX-IN3: A validity study. Pa-
per presented at the annual meeting of the
American Educational Research Association,
Chicago.
Hoffman, J. V., Sailors, M., & Patterson, E. (2002).
Decodable texts for beginning reading in-
struction: The year 2000 basals. Journal of Lit-
eracy Research, 34(3), 269–298.
Kuhn, M. R., & Stahl, S. A. (2003). Fluency: A
review of developmental and remedial prac-
tices. Journal of Educational Psychology, 95(1),
3–21.
Meisels, S. J., Liaw, F. R., Dorfman, A. B., & Nel-
son, R. (1995). The Work Sampling System:
Reliability and validity of a performance as-
sessment for young children. Early Childhood
Research Quarterly, 10(3), 277–296.
Meisels, S. J., & Piker, R. A. (2000). An analysis of
early literacy assessments used for instruction
(Tech. Rep. No. 3-002). Ann Arbor: Univer-
sity of Michigan, Center for the Improve-
ment of Early Reading Achievement.
Menon, S., & Hiebert, E. H. (2003). A comparison
of first graders’ reading acquisition with little
books and literature anthologies (Tech. Rep. No.
1-009). Ann Arbor: University of Michigan,
Center for the Improvement of Early Read-
ing Achievement.
Morrow, L. M. (1990). Assessing children’s un-
derstanding of story through their construc-
tion and reconstruction of narrative. In L. M.
Morrow & J. K. Smith (Eds.), Assessment for
instruction in early literacy (pp. 110–133). En-
glewood Cliffs, NJ: Prentice-Hall.
National Reading Panel. (2000). Teaching children
to read: An evidence-based assessment of the sci-
entific research literature on reading and its im-
plications for reading instruction: Reports of the
subgroups. Bethesda, MD: National Institute
of Child Health and Human Development.
Neuman, S. B. (1999). Books make a difference:
A study of access to literacy. Reading Research
Quarterly, 34(3), 286–311.
No Child Left Behind Act of 2001. (2002). Pub. L.
No. 107–110, 115 Stat. 1425.
Paris, A. H., & Paris, S. G. (2003). Assessing nar-
rative comprehension in young children.
Reading Research Quarterly, 38(1), 37–76.
Paris, S. G. (2000). Trojan horse in the schoolyard:
The hidden threats in high-stakes testing. Is-
sues in Education, 6(1–2), 1–16.
Paris, S. G. (2002). Measuring children’s reading
development using leveled texts. Reading
Teacher, 56(2), 168–170.
Paris, S. G., & Carpenter, R. D. (2003). FAQs
about IRIs. Reading Teacher, 56(6), 578–580.
Paris, S. G., Carpenter, R. D., Paris, A. H., &
Hamilton, E. E. (in press). Spurious and gen-
uine correlates of children’s reading compre-
hension. In S. G. Paris & S. A. Stahl (Eds.),
Children’s reading comprehension and assess-
ment. Mahwah, NJ: Erlbaum.
Paris, S. G., Lawton, T. A., Turner, J. C., & Roth,
J. L. (1991). A developmental perspective on
standardized achievement testing. Educa-
tional Researcher, 20, 12–20.
Paris, S. G., Paris, A. H., & Carpenter, R. D.
(2002). Effective practices for assessing
young readers. In B. Taylor & P. D. Pearson
(Eds.), Teaching reading: Effective schools, ac-
complished teachers (pp. 141–160). Mahwah,
NJ: Erlbaum.
Paris, S. G., Pearson, P. D., Cervetti, G., Carpen-
ter, R., Paris, A. H., DeGroot, J., Mercer, M.,
Schnabel, K., Martineau, J., Papanastasiou,
E., Flukes, J., Humphrey, K., & Bashore-Berg,
T. (2004). Assessing the effectiveness of sum-
mer reading programs. In G. Borman & M.
Boulay (Eds.), Summer learning: Research, pol-
icies, and programs (pp. 121–161). Mahwah,
NJ: Erlbaum.
Pearson, P. D., Sensale, L., Vyas, S., & Kim, Y.
(1999, June). Early literacy assessment: A mar-
ketplace analysis. Paper presented at the Na-
tional Conference on Large-Scale Assess-
ment, Snowbird, UT.
Rasinski, T. V., & Hoffman, J. V. (2003). Oral read-
ing in the school literacy curriculum. Reading
Research Quarterly, 38(4), 510–522.
Scarborough, H. S., & Dobrich, W. (1994). On the
efficacy of reading to preschoolers. Develop-
mental Review, 14, 245–302.
Snow, C. E., Burns, M. S., & Griffin, P. (1998).
Preventing reading difficulties in young children.
Washington, DC: National Academy Press.
Stallman, A. C., & Pearson, P. D. (1990). Formal
measures of early literacy. In L. M. Morrow
& J. K. Smith (Eds.), Assessment for instruction
in early literacy (pp. 7–44). Englewood Cliffs,
NJ: Prentice-Hall.
Tabors, P. O., Snow, C. E., & Dickinson, D. K.
(2001). Homes and schools together: Sup-
porting language and literacy development.
In D. K. Dickinson & P. O. Tabors (Eds.), Be-
ginning literacy with language: Young children
learning at home and in school (pp. 313–334).
Baltimore: Brookes.
Urdan, T. C., & Paris, S. G. (1994). Teachers’ per-
ceptions of standardized achievement tests.
Educational Policy, 8(2), 137–156.
van den Broek, P., Kendeou, P., Kremer, K.,
Lynch, J., Butler, J., White, M. J., & Lorch, E.
P. (in press). Assessment of comprehension
abilities in young children. In S. G. Paris &
S. A. Stahl (Eds.), Children’s reading compre-
hension and assessment. Mahwah, NJ: Erl-
baum.
van Kraayenoord, C. E., & Paris, S. G. (1996).
Story construction from a picture book: An
assessment activity for young learners. Early
Childhood Research Quarterly, 11, 41–61.
Whitehurst, G. J., & Lonigan, C. J. (1998). Child
development and emergent literacy. Child
Development, 69(3), 848–872.
This content downloaded from 134.84.217.180 on Wed, 19 Jun 2013 12:22:20 PM
All use subject to JSTOR Terms and Conditions

More Related Content

What's hot

Dissertation defense ppt
Dissertation defense ppt Dissertation defense ppt
Dissertation defense ppt Dr. James Lake
 
Dissertation Defense Presentation
Dissertation Defense PresentationDissertation Defense Presentation
Dissertation Defense PresentationAvril El-Amin
 
Factors Effecting Students Performance in University (RM)
Factors Effecting Students Performance in University (RM)Factors Effecting Students Performance in University (RM)
Factors Effecting Students Performance in University (RM)RazzamMalik
 
Research proposal upload
Research proposal uploadResearch proposal upload
Research proposal uploadNabin Bhattarai
 
Transactional instruction
Transactional instructionTransactional instruction
Transactional instructionlutfan adli
 
Powerpoint presentation M.A. Thesis Defence
Powerpoint presentation M.A. Thesis DefencePowerpoint presentation M.A. Thesis Defence
Powerpoint presentation M.A. Thesis DefenceCatie Chase
 
11.descriptive evaluation of the primary schools an overview
11.descriptive evaluation of the primary schools an overview11.descriptive evaluation of the primary schools an overview
11.descriptive evaluation of the primary schools an overviewAlexander Decker
 
Descriptive evaluation of the primary schools an overview
Descriptive evaluation of the primary schools an overviewDescriptive evaluation of the primary schools an overview
Descriptive evaluation of the primary schools an overviewAlexander Decker
 
COMPETENCY- BASED SCIENCE NAT - VI INTERVENTION PROGRAM: ACTION RESEARCH
COMPETENCY- BASED SCIENCE NAT - VI INTERVENTION PROGRAM: ACTION RESEARCHCOMPETENCY- BASED SCIENCE NAT - VI INTERVENTION PROGRAM: ACTION RESEARCH
COMPETENCY- BASED SCIENCE NAT - VI INTERVENTION PROGRAM: ACTION RESEARCHDeped Tagum City
 
Rebecca Duong, PhD Proposal Defense, Dr. William Allan Kritsonis, Dissertatio...
Rebecca Duong, PhD Proposal Defense, Dr. William Allan Kritsonis, Dissertatio...Rebecca Duong, PhD Proposal Defense, Dr. William Allan Kritsonis, Dissertatio...
Rebecca Duong, PhD Proposal Defense, Dr. William Allan Kritsonis, Dissertatio...William Kritsonis
 
Curriculum trends, school reform, standards, and assesment
Curriculum trends, school reform, standards, and assesmentCurriculum trends, school reform, standards, and assesment
Curriculum trends, school reform, standards, and assesmentdyta maykasari
 
Dr. Arthur L. Petter, PhD Dissertation Defense, Dr. William Allan Kritsonis, ...
Dr. Arthur L. Petter, PhD Dissertation Defense, Dr. William Allan Kritsonis, ...Dr. Arthur L. Petter, PhD Dissertation Defense, Dr. William Allan Kritsonis, ...
Dr. Arthur L. Petter, PhD Dissertation Defense, Dr. William Allan Kritsonis, ...William Kritsonis
 
Research PresentatioThe Effects of Student Assessment Choices on 11th Grade E...
Research PresentatioThe Effects of Student Assessment Choices on 11th Grade E...Research PresentatioThe Effects of Student Assessment Choices on 11th Grade E...
Research PresentatioThe Effects of Student Assessment Choices on 11th Grade E...Matthew Prost
 
Dissertation proposal defense slideshow; phenomenology, qualitative
Dissertation proposal defense slideshow; phenomenology, qualitativeDissertation proposal defense slideshow; phenomenology, qualitative
Dissertation proposal defense slideshow; phenomenology, qualitativeCorey Caugherty
 
Effect of teaching method, choice of discipline and student lecturer relation...
Effect of teaching method, choice of discipline and student lecturer relation...Effect of teaching method, choice of discipline and student lecturer relation...
Effect of teaching method, choice of discipline and student lecturer relation...Alexander Decker
 

What's hot (19)

Dissertation defense ppt
Dissertation defense ppt Dissertation defense ppt
Dissertation defense ppt
 
Dissertation Defense Presentation
Dissertation Defense PresentationDissertation Defense Presentation
Dissertation Defense Presentation
 
Factors Effecting Students Performance in University (RM)
Factors Effecting Students Performance in University (RM)Factors Effecting Students Performance in University (RM)
Factors Effecting Students Performance in University (RM)
 
Research proposal upload
Research proposal uploadResearch proposal upload
Research proposal upload
 
Transactional instruction
Transactional instructionTransactional instruction
Transactional instruction
 
Powerpoint presentation M.A. Thesis Defence
Powerpoint presentation M.A. Thesis DefencePowerpoint presentation M.A. Thesis Defence
Powerpoint presentation M.A. Thesis Defence
 
11.descriptive evaluation of the primary schools an overview
11.descriptive evaluation of the primary schools an overview11.descriptive evaluation of the primary schools an overview
11.descriptive evaluation of the primary schools an overview
 
Descriptive evaluation of the primary schools an overview
Descriptive evaluation of the primary schools an overviewDescriptive evaluation of the primary schools an overview
Descriptive evaluation of the primary schools an overview
 
2 caldero done
2 caldero done2 caldero done
2 caldero done
 
COMPETENCY- BASED SCIENCE NAT - VI INTERVENTION PROGRAM: ACTION RESEARCH
COMPETENCY- BASED SCIENCE NAT - VI INTERVENTION PROGRAM: ACTION RESEARCHCOMPETENCY- BASED SCIENCE NAT - VI INTERVENTION PROGRAM: ACTION RESEARCH
COMPETENCY- BASED SCIENCE NAT - VI INTERVENTION PROGRAM: ACTION RESEARCH
 
2013 mansor et al
2013 mansor et al2013 mansor et al
2013 mansor et al
 
Action research
Action researchAction research
Action research
 
Rebecca Duong, PhD Proposal Defense, Dr. William Allan Kritsonis, Dissertatio...
Rebecca Duong, PhD Proposal Defense, Dr. William Allan Kritsonis, Dissertatio...Rebecca Duong, PhD Proposal Defense, Dr. William Allan Kritsonis, Dissertatio...
Rebecca Duong, PhD Proposal Defense, Dr. William Allan Kritsonis, Dissertatio...
 
Curriculum trends, school reform, standards, and assesment
Curriculum trends, school reform, standards, and assesmentCurriculum trends, school reform, standards, and assesment
Curriculum trends, school reform, standards, and assesment
 
Dr. Arthur L. Petter, PhD Dissertation Defense, Dr. William Allan Kritsonis, ...
Dr. Arthur L. Petter, PhD Dissertation Defense, Dr. William Allan Kritsonis, ...Dr. Arthur L. Petter, PhD Dissertation Defense, Dr. William Allan Kritsonis, ...
Dr. Arthur L. Petter, PhD Dissertation Defense, Dr. William Allan Kritsonis, ...
 
Research PresentatioThe Effects of Student Assessment Choices on 11th Grade E...
Research PresentatioThe Effects of Student Assessment Choices on 11th Grade E...Research PresentatioThe Effects of Student Assessment Choices on 11th Grade E...
Research PresentatioThe Effects of Student Assessment Choices on 11th Grade E...
 
A0350104
A0350104A0350104
A0350104
 
Dissertation proposal defense slideshow; phenomenology, qualitative
Dissertation proposal defense slideshow; phenomenology, qualitativeDissertation proposal defense slideshow; phenomenology, qualitative
Dissertation proposal defense slideshow; phenomenology, qualitative
 
Effect of teaching method, choice of discipline and student lecturer relation...
Effect of teaching method, choice of discipline and student lecturer relation...Effect of teaching method, choice of discipline and student lecturer relation...
Effect of teaching method, choice of discipline and student lecturer relation...
 

Viewers also liked

Diagnosis: The Missing Ingredient from RTI
Diagnosis: The Missing Ingredient from RTIDiagnosis: The Missing Ingredient from RTI
Diagnosis: The Missing Ingredient from RTIrathx039
 
Paris, hoffman reading assessments k 3
Paris, hoffman reading assessments k 3Paris, hoffman reading assessments k 3
Paris, hoffman reading assessments k 3rathx039
 
Assessmentand identification 5431
Assessmentand identification 5431Assessmentand identification 5431
Assessmentand identification 5431rathx039
 
Práctica 12 terminada hcd dámaris anastacio p16
Práctica 12 terminada hcd dámaris anastacio p16Práctica 12 terminada hcd dámaris anastacio p16
Práctica 12 terminada hcd dámaris anastacio p16Damaris Anastacio Zambrano
 
Organizing and Evaluating Results from Multiple Reading Assessments
Organizing and Evaluating Results from Multiple Reading AssessmentsOrganizing and Evaluating Results from Multiple Reading Assessments
Organizing and Evaluating Results from Multiple Reading Assessmentsrathx039
 
The Prevention of Reading Difficulties
The Prevention of Reading Difficulties The Prevention of Reading Difficulties
The Prevention of Reading Difficulties rathx039
 
Getting to scale: How we can achieve the reach required of prevention service...
Getting to scale: How we can achieve the reach required of prevention service...Getting to scale: How we can achieve the reach required of prevention service...
Getting to scale: How we can achieve the reach required of prevention service...HopkinsCFAR
 

Viewers also liked (7)

Diagnosis: The Missing Ingredient from RTI
Diagnosis: The Missing Ingredient from RTIDiagnosis: The Missing Ingredient from RTI
Diagnosis: The Missing Ingredient from RTI
 
Paris, hoffman reading assessments k 3
Paris, hoffman reading assessments k 3Paris, hoffman reading assessments k 3
Paris, hoffman reading assessments k 3
 
Assessmentand identification 5431
Assessmentand identification 5431Assessmentand identification 5431
Assessmentand identification 5431
 
Práctica 12 terminada hcd dámaris anastacio p16
Práctica 12 terminada hcd dámaris anastacio p16Práctica 12 terminada hcd dámaris anastacio p16
Práctica 12 terminada hcd dámaris anastacio p16
 
Organizing and Evaluating Results from Multiple Reading Assessments
Organizing and Evaluating Results from Multiple Reading AssessmentsOrganizing and Evaluating Results from Multiple Reading Assessments
Organizing and Evaluating Results from Multiple Reading Assessments
 
The Prevention of Reading Difficulties
The Prevention of Reading Difficulties The Prevention of Reading Difficulties
The Prevention of Reading Difficulties
 
Getting to scale: How we can achieve the reach required of prevention service...
Getting to scale: How we can achieve the reach required of prevention service...Getting to scale: How we can achieve the reach required of prevention service...
Getting to scale: How we can achieve the reach required of prevention service...
 

Similar to Findings on Early Reading Assessments from CIERA Research

A Critical Analysis Of Research On Reading Teacher Education
A Critical Analysis Of Research On Reading Teacher EducationA Critical Analysis Of Research On Reading Teacher Education
A Critical Analysis Of Research On Reading Teacher EducationSarah Adams
 
Assessment Practices And Students Approaches To Learning A Systematic Review
Assessment Practices And Students  Approaches To Learning  A Systematic ReviewAssessment Practices And Students  Approaches To Learning  A Systematic Review
Assessment Practices And Students Approaches To Learning A Systematic ReviewSheila Sinclair
 
Developing a comprehensive empirically based research framework for classroom...
Developing a comprehensive empirically based research framework for classroom...Developing a comprehensive empirically based research framework for classroom...
Developing a comprehensive empirically based research framework for classroom...ahfameri
 
Developing a comprehensive, empirically based research framework for classroo...
Developing a comprehensive, empirically based research framework for classroo...Developing a comprehensive, empirically based research framework for classroo...
Developing a comprehensive, empirically based research framework for classroo...Amir Hamid Forough Ameri
 
Survey question bases
Survey question basesSurvey question bases
Survey question basescelparcon1
 
Alternative assessment
 Alternative assessment Alternative assessment
Alternative assessmentGodfred Abledu
 
Authentic And Conventional Assessment In Singapore Schools An Empirical Stud...
Authentic And Conventional Assessment In Singapore Schools  An Empirical Stud...Authentic And Conventional Assessment In Singapore Schools  An Empirical Stud...
Authentic And Conventional Assessment In Singapore Schools An Empirical Stud...Gina Rizzo
 
Authentic assessment_ An instructional tool to enhance students l.pdf
Authentic assessment_ An instructional tool to enhance students l.pdfAuthentic assessment_ An instructional tool to enhance students l.pdf
Authentic assessment_ An instructional tool to enhance students l.pdfFelizaGalleo1
 
2013, year 4 leaders eval report
2013, year 4 leaders eval report2013, year 4 leaders eval report
2013, year 4 leaders eval reportLouise Smyth
 
Summary of efficacy studies May 2015 - OpenCon Community Webcasts
Summary of efficacy studies May 2015 - OpenCon Community Webcasts Summary of efficacy studies May 2015 - OpenCon Community Webcasts
Summary of efficacy studies May 2015 - OpenCon Community Webcasts Right to Research
 
Pro questdocuments 2015-03-16(2)
Pro questdocuments 2015-03-16(2)Pro questdocuments 2015-03-16(2)
Pro questdocuments 2015-03-16(2)Rose Jedin
 
Gk 12 eval report 2011
Gk 12 eval report 2011Gk 12 eval report 2011
Gk 12 eval report 2011Louise Smyth
 
Mini-Research on Single Methodology & Study: The Case Study
Mini-Research on Single Methodology & Study: The Case StudyMini-Research on Single Methodology & Study: The Case Study
Mini-Research on Single Methodology & Study: The Case StudyFernanda Vasconcelos Dias
 
Fuzzy Measurement of University Students Importance Indexes by Using Analytic...
Fuzzy Measurement of University Students Importance Indexes by Using Analytic...Fuzzy Measurement of University Students Importance Indexes by Using Analytic...
Fuzzy Measurement of University Students Importance Indexes by Using Analytic...IRJESJOURNAL
 
A Decade of Research Literature in Physical Education Pedagogy.pdf
A Decade of Research Literature in Physical Education Pedagogy.pdfA Decade of Research Literature in Physical Education Pedagogy.pdf
A Decade of Research Literature in Physical Education Pedagogy.pdfSarah Morrow
 
Tsl 3123 Language Assessment Module ppg
Tsl 3123 Language Assessment Module ppgTsl 3123 Language Assessment Module ppg
Tsl 3123 Language Assessment Module ppgRavi Nair
 
Dr. B.C. DeSpain, National Forum Journals, www.nationalforum.com
Dr. B.C. DeSpain, National Forum Journals, www.nationalforum.comDr. B.C. DeSpain, National Forum Journals, www.nationalforum.com
Dr. B.C. DeSpain, National Forum Journals, www.nationalforum.comWilliam Kritsonis
 

Similar to Findings on Early Reading Assessments from CIERA Research (20)

A Critical Analysis Of Research On Reading Teacher Education
A Critical Analysis Of Research On Reading Teacher EducationA Critical Analysis Of Research On Reading Teacher Education
A Critical Analysis Of Research On Reading Teacher Education
 
Assessment Practices And Students Approaches To Learning A Systematic Review
Assessment Practices And Students  Approaches To Learning  A Systematic ReviewAssessment Practices And Students  Approaches To Learning  A Systematic Review
Assessment Practices And Students Approaches To Learning A Systematic Review
 
Developing a comprehensive empirically based research framework for classroom...
Developing a comprehensive empirically based research framework for classroom...Developing a comprehensive empirically based research framework for classroom...
Developing a comprehensive empirically based research framework for classroom...
 
Developing a comprehensive, empirically based research framework for classroo...
Developing a comprehensive, empirically based research framework for classroo...Developing a comprehensive, empirically based research framework for classroo...
Developing a comprehensive, empirically based research framework for classroo...
 
Survey question bases
Survey question basesSurvey question bases
Survey question bases
 
Alternative assessment
 Alternative assessment Alternative assessment
Alternative assessment
 
Action research
Action researchAction research
Action research
 
Action Research
Action ResearchAction Research
Action Research
 
Authentic And Conventional Assessment In Singapore Schools An Empirical Stud...
Authentic And Conventional Assessment In Singapore Schools  An Empirical Stud...Authentic And Conventional Assessment In Singapore Schools  An Empirical Stud...
Authentic And Conventional Assessment In Singapore Schools An Empirical Stud...
 
Authentic assessment_ An instructional tool to enhance students l.pdf
Authentic assessment_ An instructional tool to enhance students l.pdfAuthentic assessment_ An instructional tool to enhance students l.pdf
Authentic assessment_ An instructional tool to enhance students l.pdf
 
2013, year 4 leaders eval report
2013, year 4 leaders eval report2013, year 4 leaders eval report
2013, year 4 leaders eval report
 
Summary of efficacy studies May 2015 - OpenCon Community Webcasts
Summary of efficacy studies May 2015 - OpenCon Community Webcasts Summary of efficacy studies May 2015 - OpenCon Community Webcasts
Summary of efficacy studies May 2015 - OpenCon Community Webcasts
 
Pro questdocuments 2015-03-16(2)
Pro questdocuments 2015-03-16(2)Pro questdocuments 2015-03-16(2)
Pro questdocuments 2015-03-16(2)
 
Gk 12 eval report 2011
Gk 12 eval report 2011Gk 12 eval report 2011
Gk 12 eval report 2011
 
Mini-Research on Single Methodology & Study: The Case Study
Mini-Research on Single Methodology & Study: The Case StudyMini-Research on Single Methodology & Study: The Case Study
Mini-Research on Single Methodology & Study: The Case Study
 
Fuzzy Measurement of University Students Importance Indexes by Using Analytic...
Fuzzy Measurement of University Students Importance Indexes by Using Analytic...Fuzzy Measurement of University Students Importance Indexes by Using Analytic...
Fuzzy Measurement of University Students Importance Indexes by Using Analytic...
 
A Decade of Research Literature in Physical Education Pedagogy.pdf
A Decade of Research Literature in Physical Education Pedagogy.pdfA Decade of Research Literature in Physical Education Pedagogy.pdf
A Decade of Research Literature in Physical Education Pedagogy.pdf
 
Tsl 3123 Language Assessment Module ppg
Tsl 3123 Language Assessment Module ppgTsl 3123 Language Assessment Module ppg
Tsl 3123 Language Assessment Module ppg
 
Dr. B.C. DeSpain, National Forum Journals, www.nationalforum.com
Dr. B.C. DeSpain, National Forum Journals, www.nationalforum.comDr. B.C. DeSpain, National Forum Journals, www.nationalforum.com
Dr. B.C. DeSpain, National Forum Journals, www.nationalforum.com
 
Power point blog
Power point blogPower point blog
Power point blog
 

Recently uploaded

Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxChelloAnnAsuncion2
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYKayeClaireEstoconing
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 

Recently uploaded (20)

Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 

Findings on Early Reading Assessments from CIERA Research

  • 1. Reading Assessments in Kindergarten through Third Grade: Findings from the Center for the Improvement of Early Reading Achievement Author(s): Scott G. Paris and James V. Hoffman Source: The Elementary School Journal, Vol. 105, No. 2, Lessons from Research at the Center for the Improvement of Early Reading Achievement<break></break>Joanne F. Carlisle, Steven A. Stahl, and Deanna Birdyshaw, Guest Editors (November 2004), pp. 199-217 Published by: The University of Chicago Press Stable URL: http://www.jstor.org/stable/10.1086/428865 . Accessed: 19/06/2013 12:22 Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at . http://www.jstor.org/page/info/about/policies/terms.jsp . JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact support@jstor.org. . The University of Chicago Press is collaborating with JSTOR to digitize, preserve and extend access to The Elementary School Journal. http://www.jstor.org This content downloaded from 134.84.217.180 on Wed, 19 Jun 2013 12:22:20 PM All use subject to JSTOR Terms and Conditions
  • 2. The Elementary School Journal Volume 105, Number 2 ᭧ 2004 by The University of Chicago. All rights reserved. 0013-5984/2004/10502-0005$05.00 Reading Assessments in Kindergarten through Third Grade: Findings from the Center for the Improvement of Early Reading Achievement Scott G. Paris University of Michigan James V. Hoffman University of Texas at Austin Abstract Assessment of early reading development is im- portant for all stakeholders. It can identify chil- dren who need special instruction and provide useful information to parents as well as sum- mative accounts of early achievement in schools. Researchers at the Center for Improvement of Early Reading Achievement (CIERA) investi- gated early reading assessment in a variety of studies that employed diverse methods. One group of studies used survey methods to deter- mine the kinds of assessments available to teach- ers and the teachers’ reactions to the assess- ments. A second group of studies focused on teachers’ use of informal reading inventories for formative and summative purposes. In a third group of studies, researchers designed innova- tive assessments of children’s early reading, in- cluding narrative comprehension, adult-childin- teractive reading, the classroom environment, and instructional texts. The CIERA studies pro- vide useful information about current reading assessments and identify promising new direc- tions. Achievement testing in the United States has increased dramatically in frequency and importance during the past 25 years and is now a cornerstone of educational practice and policy making. The No Child Left Be- hind (NCLB) (2001) legislation mandates annual testing of reading in grades 3–8 and increased assessment for students in grades K–3 with clear intentions of increased ac- countability and achievement. The ration- ales for early assessment lie in (a) research on reading development that indicates the importance of basic skills for future success and (b) classroom evidence that early diag- nosis and remediation of reading difficul- ties can improve children’s reading achieve- ment (Snow, Burns, & Griffin, 1998). The unprecedented federal resolve and re- This content downloaded from 134.84.217.180 on Wed, 19 Jun 2013 12:22:20 PM All use subject to JSTOR Terms and Conditions
  • 3. 200 THE ELEMENTARY SCHOOL JOURNAL NOVEMBER 2004 sources at the beginning of the twenty-first century that are focused on the improve- ment of children’s reading achievement re- quire researchers and educators to identify useful assessment tools and procedures. The Center for the Improvement of Early Reading Achievement (CIERA), a consortium of researchers from many uni- versities, was funded and became opera- tional in 1997. Assessment of reading achievement and the corresponding prac- tices and policies were major foci of the re- search agenda. It is important to note that the CIERA research was proposed, and of- ten conducted, before the report of the Na- tional Reading Panel (2000) and before the NCLB legislation. These important events did not frame CIERA research at the time, but they certainly influence the interpreta- tion of assessment tools today. For example, both the NRP and NCLB emphasized five essential skills for beginning reading suc- cess: the alphabetic principle, phonemic awareness, oral reading fluency, vocabu- lary, and comprehension. Consequently, many of the early reading assessments de- veloped recently have focused on those skills, especially the first three. CIERA re- searchers acknowledge the importance of assessing these skills, but they chose to in- vestigate a broader array of assessment is- sues and practices partly because there were already many assessments of the al- phabetic principle, phonemic awareness, and oral reading fluency, such as the Dy- namic Indicators of Basic Literacy Skills (DI- BELS), a popular and quick battery of read- ing assessments (Good & Kaminski, 2002). Moreover, CIERA researchers in 1997 wanted to survey teachers to find out what assessments they used and why, as well as to identify new kinds of assessments for nonreaders and beginning readers. CIERA research on early reading assess- ment was proposed and conducted in an era of increased testing and evidence-based policy making. An initial cluster of studies examined the kinds of reading assessments available and used by teachers in order to describe current classroom practices. The studies were intended to be surveys of best practices in schools. A second group of studies examined the use of oral reading as- sessments to determine students’ adequate yearly progress (AYP) because many infor- mal reading inventories were being trans- formed into formal summative assessments of reading achievement. Several other CIERA studies examined innovative tools and new directions for assessing early read- ing achievement. The research was explor- atory, eclectic, and conducted by multiple investigators, but collectively, the studies help to identify promising assessments of reading along with some practical obstacles to implementation. We present the findings of these three groups of studies and con- clude with a discussion of future directions in K–3 reading assessment research. Surveys of Early Reading Assessments Many teachers are overwhelmed by the nu- merous reading assessments mandated by policy makers, advocated by publishers, re- quired by administrators, or simply rec- ommended for classrooms. We begin with an examination of two CIERA studies on the variety of assessment instruments avail- able for K–3 teachers. We then examine two CIERA studies of teachers’ attitudes toward and use of assessments. The first two stud- ies differ in their focus on commercial and noncommercial measures. 
Both studies fol- lowed up on the pioneering research by Stallman and Pearson (1990), who con- ducted one of the first comprehensive sur- veys of early reading measures. The Commercial Marketplace Ten years after Stallman and Pearson’s (1990) study, Pearson, Sensale, Vyas, and Kim (1999) conducted a similar study of commercial reading tests. They identified 148 tests with 468 subtests in their CIERA survey. More than half of the tests had been developed in the 1990s, and more than half were designed for individual administra- This content downloaded from 134.84.217.180 on Wed, 19 Jun 2013 12:22:20 PM All use subject to JSTOR Terms and Conditions
  • 4. ASSESSMENTS 201 tion, clearly a response to the preponder- ance of group tests in the previous decade. Multiple-choice responses and marking an- swer sheets still predominated over read- ing, writing, or open-ended responses. Nearly all tests were administered with a mixture of visual and auditory presenta- tions. In contrast to the previous decade, about 40% of tests required production as a response mode. Recognition was required in about 40% of the tests and identification in only 10%. Scoring ease may have driven the response mode because more than 60% of the tests could be scored simply as right or wrong, less than 20% contained multiple- choice items, and only 10% of tests used ru- brics to score answers. Pearson et al. (1999) analyzed the skills assessed in the 148 tests and found that word knowledge, such as concept identifi- cation, was assessed in 50%; sound and symbol concepts were assessed in 65%; lit- eracy and language concepts were assessed in 90%; and comprehension was assessed in only 24% of the tests. When they analyzed only the K–1 tests to compare with the Stall- man and Pearson (1990) findings, they found that 52% compared to a previous 18% of tests were administered to individual children. Only 36% of the tests compared to the previous 63% were multiple-choice tests, and the heavy emphasis on sound- symbol correspondence was reduced in half and replaced by a much stronger emphasis on language and literacy concepts. These changes may be due to the growing influ- ence of whole language, Clay’s Observation Survey (1993), and assessment methods used in Reading Recovery throughout the 1990s. Although the type of processing re- quired was still largely recognition, it had decreased from 72% of tests in the first sur- vey to 51%. Likewise, filling in bubbles de- creased from 63% to 28% of the tests, and oral responding increased from 12% to 39%. The authors also noted the variety of new reading assessments that emerged in the 1990s. Kits, such as the Dial-R, that in- cluded assessment batteries became more prominent. Some elaborate systems were developed for using classroom assessment for formative and summative functions. For example, the Work Sampling System (Mei- sels, Liaw, Dorfman, & Nelson, 1995) in- cludes developmental benchmarks from ages 3 to 12 in behavioral, academic, and affective domains that can be used with teachers’ checklists and students’ portfo- lios to monitor individual growth and achievement. The kits and elaborate sys- tems usually include teachers’ guides, cur- riculum materials, and developmental ru- brics. Leveled books became a popular tool for determining children’s reading levels for the assessment tasks, again reflecting the influence of Reading Recovery, Guided Reading, and similar derivations for in- struction. Pearson et al. (1999) concluded that commercial reading tests in the late 1990s were much more numerous and varied than the tests available 10 years earlier. More skills were tested, particularly language and literacy concepts. More choices, judg- ments, and interpretations were required from the examiner, usually the teacher, to use the new tests. However, there was still a preponderance of recognition responses and filling in bubbles on answer sheets. 
The researchers suggested that the changes in early reading assessments during the 1990s reflected the influences of three thematic changes in early literacy and language arts classrooms: emergent literacy, process writing approaches, and performance assessments throughout the curriculum.

Noncommercial Assessments

The second CIERA survey of early reading assessments was conducted by Meisels and Piker (2000). Their study had three objectives: "1) to gain an understanding of classroom-based literacy measures that are available to teachers; 2) to characterize the instructional assessments teachers use in their classrooms to evaluate their students' literacy performance; and 3) to learn more about how teachers assess reading and
writing skills" (p. 5). In contrast to the previous studies of commercial assessments, Meisels and Piker (2000) examined noncommercial literacy assessments that were nominated by teachers and educators or used by school districts. They excluded assessments used for research or accountability and focused on K–3 instruments. Some assessments of motivation and attitudes were also included. The researchers collected information from educational listservs, personal contacts, literature searches, published reviews of the measures, Web sites, and newsletter postings, so their survey was directed at assessments in use rather than on sale in the marketplace. Their search identified 89 measures, 60 of which were developed in the 1990s.

The coding categories Meisels and Piker (2000) used were adapted from the Stallman and Pearson (1990) survey and categorized measures on 13 literacy skills: print awareness, phonics, reading, reading strategies, comprehension, writing process, writing conventions, motivation, self-perception, metacognition, attitude, oral language listening and speaking, and oral language other. The first six of these skills are most directly related to reading assessment. However, the CIERA researchers identified 203 subskills among these 13 categories. This is again an indication of the conceptual and practical decomposition of literacy skills in complex assessment batteries.

Meisels and Piker (2000) found that 70% of the measures were designed for individual administration and nearly half were intended for all grades, K–3. Only five were available in languages other than English (four in Spanish, one in Danish). Of the 13 skills, phonics, comprehension, and reading were assessed most frequently, and motivation, self-perception, and attitudes were measured least often. Of the 89 measures, 47 were based on observation or on-demand methods for evaluating students' literacy. Constructed responses were used mostly with writing. Checklists were used in 36% of the measures, and running records in 15%. The most frequent kind of response was oral response, on 64% of the measures, followed by writing, on 46% of the measures. Recognition, identification, and recall were used to assess about one-third of the skills. Meisels and Piker (2000) then examined the skills assessed in each test and found that 70% were assessed with observations and that the data were recorded most often in checklists (69%) or anecdotal observations (45%). Both the limited response formats for students and the informal records of teachers are worth noting.

Meisels and Piker (2000) examined the measures for evidence of psychometric reliability and validity and expressed disappointment with the results. Only 14% of the measures had evidence of good reliability that ranged from high to moderate. Even less information was available about validity. No consistent tests or benchmarks were used to establish concurrent or predictive validity. The researchers noted that the noncommercial measures were less likely to include psychometric evidence than commercial tests. In comparing their results to the Stallman and Pearson (1990) study, Meisels and Piker (2000) also noted that the noncommercial measures were usually designed for individuals, not groups, and had more opportunities for students to identify or produce answers rather than just recognize correct choices.
Noncommercial measures usually had fewer guidelines for administering, recording, and interpreting the assessment information.

How Teachers Use and Regard Reading Assessments

The next set of studies went beyond a consideration of the instruments to examine how teachers use and evaluate them. Paris, Paris, and Carpenter (2002) reported findings from a survey of teachers' perceptions of assessment in early elementary grades. They asked successful teachers what kinds of reading assessments they used for what purposes so that a collection of
"best practices" might be available as models for other teachers. The assessment survey was part of a large CIERA survey of elementary teachers who taught in "beat the odds" schools to determine their practices and views. These schools across the United States had a majority of students who qualified for Title I programs and had a mean school test score on some standardized measure of reading achievement that was higher than the average score of other Title I schools in the state. Most of the selected schools also scored above the state average for all schools. Candidate schools were selected from a network of CIERA partner schools as well as from annual reports of outstanding schools in 1996, 1997, and 1998 as reported by the National Association of Title I Directors. The sample included 504 K–3 classroom teachers in "beat the odds" schools, but the anonymous and voluntary survey made it impossible to determine if these were the most effective teachers in the schools.

In the first part of the survey, teachers were asked to record the types of reading assessments used in their classrooms and the frequency with which they used each one. Most teachers reported that they used all of the assessment types: 86% used performance assessments, 82% used teacher-designed assessments, 78% used word attack/word meaning, 74% used measures of fluency and understanding, 67% used commercial assessments, and 59% used standardized reading tests.

The survey showed that K–3 teachers used a variety of assessments in their classrooms daily. Assessments designed by teachers, including the instructional assessments Meisels and Piker (2000) examined, were used most frequently, and standardized tests were used least often. This contrast was most evident for K–1 teachers, who rarely used standardized tests. The survey showed that K–3 teachers used observations, anecdotal evidence, informal inventories, and work samples as their main sources of evidence about children's reading achievement and progress. The survey also showed the variety of tools available to teachers and the large variation among teachers in what they used. The daunting variety of assessments requires a highly skilled teacher to select and use appropriate tools.

Another part of the survey posed questions about the effects of assessments on various stakeholders. In general, teachers reported that teacher-designed, informal assessments had more positive effects on students, teachers, and parents. Conversely, teachers believed standardized and commercial assessments had a higher positive effect on administrators. These patterns suggest that teachers differentiate between assessments over which they have control and assessments generated externally in terms of their effects on stakeholders. It is ironic that teachers believed that the most useful assessments for students, teachers, and parents were valued less by administrators than standardized and commercial assessments.

Responses to High-Stakes Assessment

A fourth survey conducted by CIERA researchers gathered the views of teachers regarding high-stakes testing (Hoffman, Assaf, & Paris, 2001). This study, which surveyed reading teachers in Texas, was designed as a modified replication of earlier investigations of teachers' views of high-stakes testing in Arizona (Haladyna, Nolen, & Haas, 1991) and Michigan (Urdan & Paris, 1994). Texas is recognized nationally as one of the leaders in the testing and accountability movement.
The Texas Assessment of Academic Skills (TAAS) was the centerpiece of the state's accountability system throughout the 1990s. The TAAS was a criterion-referenced assessment of reading and mathematics given to all Texas students in grades 3–8 near the end of the year. It has recently been replaced by the Texas Assessment of Knowledge and Skills (TAKS), but the design and use are essentially the same. The study, conducted in 1998–1999,
included responses from 200 experienced reading specialists who returned a mail survey. For the most part, respondents were older (61% between the ages of 40 and 60) and more experienced (63% with over 10 years' experience and 45% with over 20 years' experience) than Texas classroom teachers in general. Most respondents were working in elementary grades (78%) and in minority school settings (81%) serving low-income communities (72%), where the need for reading specialists was greatest and funds for them were most available.

To examine general attitudes, we created a composite scale for the following four items from this section:

• Better TAAS tests will make teachers do a better job.
• TAAS tests motivate students to learn.
• TAAS scores are good measures of teachers' effectiveness.
• TAAS test scores provide good comparisons of the quality of schools from different districts.

Each of these items represents some of the political motivations and intentions that underlie the TAAS. Respondents rated each item on a scale ranging from 1 (strongly disagree) to 4 (strongly agree). The average rating on this composite variable was 1.7 (SD = .58), suggesting that reading specialists strongly disagreed with some of the underlying assumptions of and intentions for the TAAS.

Another composite variable was created with items related to the validity of the TAAS as a measure of student learning. The four items included in this analysis were:

• TAAS tests accurately measure achievement for minority students.
• TAAS tests accurately measure achievement for limited English-speaking students.
• Students' TAAS scores reflect what students have learned in school during the past year.
• Students' TAAS scores reflect the cumulative knowledge students have learned during their years in school.

The average rating on this composite variable was also 1.7 (SD = .58), suggesting that reading specialists challenged the validity of the test, especially for minority students and ESL speakers, who are the majority of students in Texas public schools.
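Composite scoring of this kind is simple to reproduce: average each respondent's ratings across the items in the scale, then summarize across respondents. A minimal sketch in Python, using invented ratings rather than the actual survey data:

```python
import statistics

# Hypothetical 1-4 Likert ratings (rows = respondents, columns = the four
# attitude items); the actual survey data are not reproduced here.
ratings = [
    [1, 2, 2, 1],
    [2, 2, 1, 2],
    [1, 1, 2, 2],
]

# Composite score per respondent = mean of that respondent's four items.
composites = [statistics.mean(r) for r in ratings]
print(f"M = {statistics.mean(composites):.2f}, "
      f"SD = {statistics.stdev(composites):.2f}")
```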
Contrast these general attitudes and beliefs regarding the TAAS with the respondents' perception that administrators believe TAAS performance is an accurate indicator of student achievement (M = 3.1) and the quality of teaching (M = 3.3). Also, contrast this with the reading specialists' perception that parents believe the TAAS reflects the quality of schooling (M = 2.8). The gaping disparity between the perceptions of those responding and their views of administrators' and parents' attitudes suggests an uncomfortable dissonance. Other parts of the TAAS survey revealed that reading specialists reported more pressure to cheat on the tests among low-performing schools, inappropriate uses of the TAAS data, adverse effects on the curriculum, too much time spent on test preparation, and negative effects on teachers' morale and motivation. In sum, the survey revealed unintended and negative consequences of high-stakes testing that are similar to results of other studies of the consequences of high-stakes testing (e.g., Paris, 2000; Paris, Lawton, Turner, & Roth, 1991; Urdan & Paris, 1994).

Summary of the CIERA Surveys

The four CIERA surveys support several conclusions. First, a vast assortment of commercial and informal reading assessments is available for K–3 classroom teachers. Stallman and Pearson (1990) identified 20 commercial reading tests, yet 10 years later Pearson et al. (1999) found 148, and the number is certainly higher today. However, commercial tests are not the only source of reading assessments. Meisels and Piker (2000) solicited information about noncommercial assessments from teachers and
educators and identified 89 types of literacy assessments measuring 203 skills. Teachers face a formidable task of finding appropriate tools, obtaining them, and then adapting the assessments to their own purposes and students.

Second, reading assessments varied by grade level. Teachers in K–1, compared to teachers in grades 2–3, were more likely to use assessments of print awareness, phonics, and similar enabling skills than assessments of reading, writing, or motivation. Teachers in grades K–1 were also less likely than teachers in grades 2–3 to use standardized tests and commercial assessments. Observations were reported as the most common type of assessment and may be slightly more frequent at grades K–1. Recognition as a response option was also used most frequently among younger children, whereas identification and production were more frequent at grades 2–3. Teachers in grades 2–3 used more sophisticated tests of reading and writing and fewer measures of enabling skills as their assessment methods matched the developing abilities of their students.

Third, teachers regarded informal measures that they design, select, and embed in the curriculum as more useful for teachers, students, and parents than commercial assessments. Teachers regarded standardized tests and commercial tests that allow little teacher control and adaptation as less useful and used them less often. Paradoxically, the standardized tests were regarded as having the most effect on administrators' knowledge and reporting practices. We think that teachers' frustration with assessments is partly tied to this paradox.

Fourth, the most frequently used and highly valued reading assessments are least visible to parents and administrators because they are not reported publicly. Observations, anecdotes, and daily work samples are certainly low-stakes evidence of achievement for accountability purposes, but they may be the most useful for teachers, parents, and students. It is also ironic that the assessments on which teachers feel least trained and regard as least useful (i.e., standardized tests) are used most often for evaluations and public reports. Together these findings suggest that teachers need support in establishing the value of instructional assessments in their classrooms for administrators and parents while also demarcating the limits and interpretations of externally mandated tests (see Hoffman, Paris, Patterson, Salas, & Assaf, 2003). The current slogan about the benefits of a balanced approach to reading instruction might also be applied to a balanced approach to reading assessment. The skills that are assessed need to be balanced among various components of reading, and the purposes and benefits of assessment need to be balanced among the stakeholders.

The critical question that many policy makers ask is, Which reading assessments provide the best evidence about children's accomplishments and progress? The answer may not be one test or even one type of assessment. A single assessment cannot adequately represent the complexity of a child's reading development. Likewise, the same assessments may not represent the curriculum and instructional diversity among teachers. A single assessment cannot capture the variety of skills and developmental levels of children in most K–3 classes. That is why teachers use multiple assessments and choose those that fit their purposes. These assessments are the ones that can reveal the most information about their students.
We believe that the most robust evidence about children's reading reveals developing skills that can be compared to individual standards of progress as well as to normative standards of achievement. A developmental approach balances the types of assessments across a range of reading factors and allows all stakeholders to understand the strengths and weaknesses of the child's reading profile. Many teachers use this approach implicitly, and we think it is a useful model for early reading assessment rather than a one-test-fits-all approach.
Assessment of Students' Oral Reading

Oral reading has been a focus for the assessment of early reading development throughout the twentieth century (Rasinski & Hoffman, 2003). Teachers in the aforementioned surveys reported using children's oral reading as an indicator of growth and achievement. The informal reading inventory (IRI) changed over time to focus on the accuracy of oral reading, with less attention to reading rate until recently. Now researchers have focused attention on three facets of oral reading fluency—rate, accuracy, and prosody—as indicators of automatic decoding and successful reading (Kuhn & Stahl, 2003).
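Rate and accuracy, the first two facets, reduce to simple arithmetic on a timed oral reading sample. The sketch below shows the conventional computations of percent accuracy and words correct per minute (WCPM); it illustrates the general formulas rather than any particular instrument's scoring rules:

```python
def oral_reading_indices(total_words, errors, seconds):
    """Conventional fluency indices from a timed oral reading sample."""
    correct = total_words - errors
    return {
        "accuracy_pct": 100.0 * correct / total_words,
        "wcpm": correct / (seconds / 60.0),  # words correct per minute
    }

# A child reads a 120-word passage with 6 errors in 90 seconds.
print(oral_reading_indices(120, 6, 90))
# {'accuracy_pct': 95.0, 'wcpm': 76.0}
```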
During the first year of CIERA, Scott Paris and David Pearson were asked by the Michigan Department of Education (MDE) to help evaluate the new Michigan Literacy Progress Profile (MLPP) while also evaluating summer reading programs throughout the state. These research projects dovetailed with CIERA research on assessment, so we spent 5 years working with the Ingham Intermediate School District and MDE evaluating summer reading programs and testing components of the MLPP. The program evaluations led to several insights about early reading assessments and evaluation research that are worth noting here (Paris, Pearson, et al., in press).

One insight from the research was the realization that informal reading inventories (IRIs) were legitimate tools for assessing student growth in reading and for program evaluation. In the past 5–7 years, several state assessment programs and commercial reading assessments have used leveled texts with running records or miscue analyses as formative and summative assessments of early reading. There has been widespread enthusiasm for such IRI assessments that serve both purposes because the assessments are authentic, aligned with classroom instructional practices, and integrated into the curriculum. In fact, IRIs are similar to the daily performance assessments and observations teachers reported in the CIERA survey of classroom assessments. However, the use of IRIs for summative assessment must be viewed with caution until the reliability and validity of IRI assessments administered by teachers can be established. Extensive training and professional development that integrate reading assessment with instruction seem necessary in our experience.

A second insight involved the difficulties in analyzing students' growth when students are reading different leveled texts. The main problem in using IRIs for measuring reading growth is that running records and miscue analyses are gathered on variable levels of text that are appropriate for each child. Thus, comparing a child's reading proficiency at two times (or comparing various children to each other over time) usually involves comparisons of different passages and text levels, so changes in children's performance are confounded by differences between passages and difficulty levels. Paris (2002) identified several methods for analyzing IRI data from leveled texts and concluded that the most sophisticated statistical procedure was based on Item Response Theory (IRT). In the evaluation of summer reading programs, Paris, Pearson, et al. (2004) used IRT analyses to scale all the reading data from more than 1,000 children on different passages and different levels of an IRI so the scores could be compared on single scales of accuracy, comprehension, retelling, and so forth. Those analyses revealed significant effects on children who participated in summer reading programs compared to control groups of children who did not participate in the summer programs (see Paris, Pearson, et al., 2004).

A brief description of IRT analyses will reveal the benefits of this approach. IRT is a psychometric method of analyzing data that allows estimates of individual scores that are independent of the actual test items. This is important for reading assessment that compares students' growth over time
on different levels, items, and tests, which is the usual problem using IRI data. The IRT scaling procedures in a two-parameter Rasch model estimate the individual scores and item difficulties simultaneously (Embretson & Reise, 2000). The crux of an IRT analysis is to find optimal estimates for the item parameters, which depend on the students' IRT scores that, in turn, depend on the item parameters. This catch-22 is solved statistically by an iterative procedure that converges toward a final solution with optimal estimates for all parameters. However, the calculation is different from other statistical procedures, such as regression analysis, because "likelihood" is the underlying concept and not regression weights.

The item difficulty is calculated according to a logistic function that identifies the point on an item parameter scale where the probability of a correct response is exactly .50. The distribution of correct answers across items of varying difficulty from students in the sample permits estimates of individual IRT scores that are based on the actual as well as possible patterns of correct responses. The numerical IRT scale is then established with a zero point and a range of scores, for example, 0–100 or 200–800. Fortunately, there are software programs available to calculate IRT scores, but they have rarely been used with children's reading data derived from IRIs and leveled texts. We think IRT analyses are scientifically rigorous and potentially useful ways to examine children's reading data and progress over time.
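To make the iterative estimation concrete, the sketch below fits the one-parameter (Rasch) form of the model to simulated 0/1 response data by alternating gradient steps on the joint log-likelihood. It is a bare-bones illustration; operational analyses would use dedicated IRT software with more sophisticated estimation:

```python
import numpy as np

def rasch_prob(theta, b):
    """P(correct) for ability theta on an item with difficulty b."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def fit_rasch(x, n_iter=500, lr=0.02):
    """Joint maximum-likelihood fit; x is a students-by-items 0/1 matrix."""
    theta = np.zeros(x.shape[0])          # person abilities
    b = np.zeros(x.shape[1])              # item difficulties
    for _ in range(n_iter):
        resid = x - rasch_prob(theta[:, None], b[None, :])
        theta += lr * resid.sum(axis=1)   # gradient ascent on log-likelihood
        b -= lr * resid.sum(axis=0)
        b -= b.mean()                     # anchor the scale (identifiability)
    return theta, b

# Simulate 200 students answering 20 items of increasing difficulty.
rng = np.random.default_rng(1)
true_theta = rng.normal(size=200)
true_b = np.linspace(-2, 2, 20)
x = rng.random((200, 20)) < rasch_prob(true_theta[:, None], true_b[None, :])

theta, b = fit_rasch(x.astype(float))
# Re-express abilities on a 200-800 reporting scale, as described above.
scaled = 500 + 100 * (theta - theta.mean()) / theta.std()
print(scaled[:5].round())
```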
A third set of insights about reading assessments involved practical decisions about how to use IRIs effectively. Paris and Carpenter (2003) found that teachers require sustained professional development and schoolwide implementation of reading assessments to use them uniformly, consistently, and wisely. The real benefit of IRIs is the knowledge teachers gain while assessing individual children because the assessment framework provides insights about needed instruction. Teachers need guidance in selecting IRIs, administering them, interpreting them, and using the results with students and parents, and that guidance needs to be shared knowledge among the school staff so it creates a culture of understanding about reading assessment. Paris and Carpenter (2003) found that implementing a schoolwide system of recording and reporting the data as part of the verification of students' adequate yearly progress (AYP) made the assessments worth the time and energy of all the participants. Thus, teachers gained diagnostic information about students and also provided accountability through measures of AYP by comparing fall and spring scores.

A fourth insight that researchers gained is that IRIs can provide multiple indicators of children's oral reading, including rate, accuracy, prosody, retelling, and comprehension, and that teachers can choose which measures to collect. CIERA research identified some problems with the various measures derived from IRIs (Paris, Carpenter, Paris, & Hamilton, in press). For example, there are restricted ranges and ceiling effects in some measures, such as prosody and accuracy. It also appears that comprehension is more highly related to oral reading accuracy and rate in beginning readers and that the relation decreases by the time children are reading texts at a third- or fourth-grade level. This means that some children become adept "word callers" with little evidence of comprehension, so reading rate and accuracy measures in IRIs may yield incomplete information for older readers. IRI data on oral reading fluency and comprehension are most informative about children's reading during initial skill development, approximately grades K–3, and when the information is used in combination with other assessments. Assessments of prerequisite skills for fluent oral reading, such as children's vocabulary, letter-sound knowledge, phonological awareness, beginning writing, understanding of text conventions, and book-handling skills, may
augment IRIs with valuable information. Thus, IRIs provide developmentally sensitive assessments for beginning and struggling readers when fluency and understanding are growing quickly and when teaching focuses on specific reading skills. IRIs are excellent tools for combining diagnostic and summative assessments in an authentic format for teachers and students.

New Directions in Early Reading Assessment

In this part of our review, we summarize four examples of innovative assessments by CIERA researchers that chart new directions in literacy assessment with young children.

Narrative Comprehension

During the past 10 years of renewed emphases on beginning reading, there has been less attention given to children's comprehension skills compared to decoding skills (National Reading Panel, 2000). More research on young children's comprehension skills and strategies is needed to diagnose and address children's early reading difficulties that extend beyond decoding. A major CIERA assessment project focused on children's comprehension of narrative stories, and more specifically, on narratives illustrated in wordless picture books. Paris and Paris (2003) created and tested comprehension assessment materials and procedures that can be used with young children, whether or not they can decode print. Such early assessments of comprehension skills can complement existing assessments of enabling skills, provide diagnostic measures of comprehension problems, and link comprehension assessment with classroom instruction.

Narrative comprehension is a complex meaning-making process that depends on the simultaneous development of many skills, including, for example, understanding of story structure and relations among elements and psychological understanding about characters' thoughts and feelings. It is important to assess narrative comprehension for several reasons. First, narrative competence is among the fundamental cognitive skills that influence early reading development. Whitehurst and Lonigan (1998) refer to these skills as "outside-in" skills because children use the semantic, conceptual, and narrative relations that they already know to comprehend the text. In this view, narrative competence is a fundamental aspect of children's comprehension of experiences before they begin to read, and it helps children map their understanding onto texts. The importance and early development of narrative thinking may be one reason that elementary classrooms are dominated by texts in the narrative genre (Duke, 2000). Second, because of the extensive research on narrative comprehension, there is ample documentation of its importance among older children and adults as well as extensive research on its development (e.g., Berman & Slobin, 1994). Third, the clear structure of narrative stories with specific elements and relations provides a structure for assessment of understanding. Fourth, narrative is closely connected to many concurrent developmental accomplishments of young children in areas such as language, play, storytelling, and television viewing. It is an authentic experience in young children's lives, and it reveals important cognitive accomplishments.
In a procedure similar to one van Kraayenoord and Paris (1996) used, Paris and Paris (2003) modified trade books with clear narrative story lines—a strategy that can be used easily for both assessment and instructional purposes—to create the narrative comprehension (NC) assessment task. They located commercially published wordless picture books, adapted them by deleting some irrelevant pages to shorten the task, and assembled the pages of photocopied black-and-white pictures into spiral-bound little books. It was important that the story line revealed by the pictures was clear, with an obvious sequence of events, and that the pictures contained the main elements of
stories (i.e., settings, characters, problems, resolutions). The first study in Paris and Paris (2003) established the NC task procedures for observing how K–2 children interacted with wordless picture books under three conditions: spontaneous examination during a "picture walk," elicited retelling, and prompted comprehension during questioning. The results were striking. The retelling and prompted comprehension scores increased in regular steps for each grade from K to 2, and readers were significantly better than nonreaders on both measures, thus showing the developmental sensitivity of the assessment task. There were no developmental differences on the picture walk behaviors, however.

In study 2, Paris and Paris (2003) extended the procedures to additional wordless picture books and examined the reliability of the assessment procedures. The similarity of developmental trends across books indicated that the NC task is sensitive to progressive increases in children's abilities to make inferences and connections among pictures and to construct coherent narrative relations from picture books. Similarity of performance across books showed that examiners can administer the NC task with different materials and score children's performance in a reliable manner. Thus, the generalizability and robustness of the NC task across picture books were supported.

In study 3, Paris and Paris (2003) examined the predictive and concurrent validity of the NC task with standardized and informal measures of reading. The similarity in correlations, overall means, and developmental progressions of NC measures confirmed the patterns revealed by studies 1 and 2 with new materials and additional children. In addition, the NC task was sensitive to individual growth over 1 year that was not due to practice effects. The NC retelling and comprehension measures were correlated significantly with concurrent assessments with an IRI and the Gates-MacGinitie Reading Test, a standardized, group-administered test. Furthermore, the NC comprehension scores among first graders significantly predicted their scores on the Iowa Tests of Basic Skills (ITBS) a year later in second grade (r = .52).

The three studies provided consistent and positive evidence about the NC task as a developmentally appropriate measure of 5–8-year-old children's narrative understanding of picture books. Retelling and prompted comprehension scores improved significantly with age, indicating that the NC task differentiates children who can recall main narrative elements, identify critical explicit information, make inferences, and connect information across pages from children who have weaknesses with these narrative comprehension skills. The NC task requires brief training and can be given to children in less than 15 minutes, which is critical for individual assessment of young children. The high percentage agreement between raters across the three books showed that the scoring rubrics are reliable across story content and raters. The similar patterns of cross-sectional and longitudinal performance further confirmed the generalizability of the task. The strong concurrent and predictive relations provided encouraging evidence of the validity of the NC task as a measure of comprehension for emergent readers.
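A retelling rubric of this kind can be expressed compactly as a checklist of story-grammar elements, each credited when the retelling touches it. The element names and keywords below are invented for illustration; they are not the Paris and Paris (2003) rubric:

```python
# Hypothetical story-grammar rubric: one point per element mentioned.
STORY_ELEMENTS = {
    "setting":    {"park", "morning"},
    "characters": {"girl", "dog"},
    "problem":    {"lost", "leash"},
    "resolution": {"found", "home"},
}

def score_retelling(retelling):
    words = set(retelling.lower().split())
    return sum(bool(keywords & words) for keywords in STORY_ELEMENTS.values())

retelling = "a girl walked her dog in the park and it got lost then she found it"
print(score_retelling(retelling), "of", len(STORY_ELEMENTS), "elements")
# -> 4 of 4 elements credited
```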
In a related series of CIERA studies, van den Broek et al. (in press) examined preschool children's comprehension of televised narratives. They showed 20-minute episodes of children's television programs and presented 13-minute audiotaped stories to children to compare their viewing and listening comprehension. Children recalled causally related events in the narratives better than other kinds of text relations, and their recall scores in viewing and listening conditions were highly correlated. Furthermore, preschoolers' comprehension of TV episodes predicted their standardized reading comprehension test scores in second grade. The predictive strength remained even when vocabulary and word identification skills were controlled in a
regression analysis. Thus, narrative comprehension skills of preschoolers can be assessed with TV and picture books, and the measures have significant predictive validity for later reading comprehension. We think that narrative comprehension viewing and listening tasks can help teachers to focus on comprehension skills of young children even if the children have restricted decoding skills, few experiences with books, or limited skills in speaking English.

Parent-Child Interactive Reading

DeBruin-Parecki (1999) created an assessment procedure for family literacy programs that records interactive book reading behaviors. One purpose of the assessment was to help parents with limited literacy skills understand the kinds of social, cognitive, and literate behaviors that facilitate preschool children's engagement with books. A second purpose was to provide family literacy programs with visible evidence of the quantity and quality of parent-child book interactions. The research was based on the premise that children learn to read early and with success when parents provide stimulating, print-rich environments at home (e.g., Bus, van Ijzendoorn, & Pellegrini, 1995). Moreover, parents must provide appropriate support during joint book reading. Morrow (1990) identified effective interactive reading behaviors, such as questioning, scaffolding dialogue and responses, offering praise or positive reinforcement, giving or extending information, clarifying information, restating information, directing discussion, sharing personal reactions, and relating concepts to life experiences. Thus, DeBruin-Parecki (1999) created the Adult/Child Interactive Reading Inventory (ACIRI) to assess these kinds of behaviors.

The ACIRI lists 12 literacy behaviors of adults and the corresponding 12 behaviors by children. For example, one adult behavior is "poses and solicits questions about the book's content," and the corresponding child behavior is "responds to questions about the book." There were four behaviors in each of the following three categories: enhancing attention to text, promoting interactive reading and supporting comprehension, and using literate strategies. The observer using the ACIRI recorded the frequencies of the 12 behaviors for both parent and child, along with notes about the joint book reading. The assessment was designed to be brief (15–30 minutes), flexible, nonthreatening, appropriate for any texts, shared with parents, and informative about effective instruction. Following the assessment, the observer discusses the results with the parent to emphasize the positive features of the interaction and to provide guidance for future book interactions. The transparency of the assessment and the immediate sharing of information minimize the discomfort of being observed. After leaving the home, the observer can record additional notes and calculate quantitative scores for the frequencies of observed behaviors.

DeBruin-Parecki tested the ACIRI with 29 mother-child pairs enrolled in an Even Start family literacy program in Michigan. The children were 3 to 5 years old, and the mothers were characterized as lower socioeconomic status. The regular staff collected ACIRI assessments at the beginning and end of the year as part of their program evaluation and field testing of the assessment. Familiarity also minimized anxiety about being observed. The results supported the usefulness of the instrument.
The ACIRI was shown to be sensitive to parent-child interactions because the four behaviors in each of the categories showed significant correlations between the frequencies of adult and child behaviors. Reliability was evaluated by having observers rate videotaped parent-child book interactions. Interrater reliability was 97% among eight raters. Consequential validity was established through staff interviews that showed favorable evaluations of the ACIRI. The comparison of fall to spring scores showed that parents and children increased the frequencies of many of the 12 behaviors during the year. Thus, the ACIRI provided both formative and summative assessment functions for Even Start staff.
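Percentage agreement is easy to compute from the raters' records. A minimal sketch, assuming each rater scored the same sequence of observed behaviors (the scores are invented):

```python
from itertools import combinations

def percent_agreement(ratings):
    """Mean pairwise exact agreement across raters.
    ratings: one list of scores per rater, all for the same items."""
    agree = total = 0
    for a, b in combinations(ratings, 2):
        agree += sum(x == y for x, y in zip(a, b))
        total += len(a)
    return 100.0 * agree / total

raters = [
    [2, 3, 1, 3, 2],   # rater A's scores on five observed behaviors
    [2, 3, 1, 3, 2],   # rater B
    [2, 3, 1, 2, 2],   # rater C disagrees on one behavior
]
print(f"{percent_agreement(raters):.0f}% agreement")  # ~87%
```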
Book Reading in Early Childhood Classrooms

In addition to measuring reading skills of children and adults, it is also important to assess the literate environment. Longitudinal and cross-sectional studies of children's literacy development reveal that more frequent book reading at home, supported by interactive conversations and scaffolded instruction, leads to growth in language and literacy during early childhood (Scarborough & Dobrich, 1994; Tabors, Snow, & Dickinson, 2001). Similar studies in schools have shown that the quality of teacher-child interaction, the frequency of book reading, and the availability of books all enhance children's early reading and language development (Neuman, 1999; Whitehurst & Lonigan, 1998). Thus, assessments of environments that support reading can help improve conditions at home and school.

Dickinson, McCabe, and Anastasopoulos (2001) reported a framework for assessing book reading in early childhood classrooms along with data from their observations of many classrooms. They derived the following five important dimensions to evaluate in classrooms.

• Book area. Issues to consider include whether there is a book area, the quality of the area, and the quantity and quality of books provided.
• Time for adult-child book reading. Time is a critical ingredient, and consideration should be given to the frequency and duration of adult-mediated reading experiences, including one-to-one, small-group, and whole-class readings, as well as the number of books read during these sessions.
• Curricular integration. Integration refers to the nature of the connection between the ongoing curriculum and the use of books during whole-class times and throughout the day.
• Nature of the book reading event. When considering a book reading event, one should examine the teacher's reading and discussion styles and children's engagement.
• Connections between the home and classroom. The most effective teachers and programs strive to support reading at home through parent education, lending libraries, circulation of books made by the class, and encouragement of better use of community libraries.

Dickinson et al. (2001) examined data from four studies in order to evaluate the importance of these dimensions. They noted that many of the preschool classrooms they observed were rated high in quality using historical definitions of developmentally appropriate practices, but that the same classrooms were rated as having low-quality literacy instruction. For example, only half the classrooms they observed had separate areas for children to read books, and there were few informational books and few books about varied racial and cultural groups. They found no book reading at all in 66 classrooms. In the other classrooms, adults read to children less than 10 minutes per day, and only 35% of the classes allowed time for children to look at books on their own. Other observations led the researchers to conclude that book reading was not coordinated with the curriculum or learning goals. Only 19% of classrooms had three or more books related to a curricular theme, and only 35% of classrooms had listening centers. Dickinson et al. (2001) noted that group reading is a filler activity used in classroom transitions rather than an instructional and curricular priority.
The researchers also examined book reading in classrooms by assessing teachers' style, animation, and prosody as they read. Most teachers read with little expressiveness. Many used an explicit management style, such as asking questions of children who raised their hands, as "crowd control"
rather than using thought-provoking questions about text. More than 70% of teachers' talk during book reading made few cognitive demands on children. The researchers suggested that teachers must devote more attention to engaging children in discussions that link their experiences to text, teach new vocabulary words, probe characters' motivations, and promote comprehension of text relations. Analyses of home-school connections revealed that more could be done to encourage families to reinforce school practices and to seek community literacy resources. Teachers rarely connected language and cultural experiences at home with literacy instruction at school.

Dickinson et al. (2001) concluded that the framework can be useful for assessing early childhood classrooms and for studying the effects of specific environmental features on children's literacy development. They noted, for example, that their research revealed little correlation between the amount of book reading in classrooms and the degree to which reading was integrated into the curriculum. They interpreted this as evidence that book reading in many early childhood classrooms is an incidental activity rather than a planned instructional goal. The framework is also useful for reading educators to use with preservice and in-service teachers who want to assess their own classrooms and teaching styles because it identifies critical elements of successful classrooms.

Texts and the Text Environment in Beginning Reading Instruction

The research by Dickinson et al. (2001) provides a conceptual bridge to several other CIERA investigations of texts and the text environment. Two important strands of research at CIERA have examined the assessment of text characteristics for beginning reading instruction. In the first strand of research, Hiebert and her colleagues developed a text assessment framework that can be used to analyze important features of texts used for beginning reading instruction. This framework is grounded in Hiebert's theoretical claims that certain text features scaffold readers' success in early reading. The framework, called the Text Elements by Task (TExT) model, identifies two critical factors in determining beginning readers' success with texts: linguistic content and cognitive load.

Linguistic content refers to the knowledge about oral and written language that is necessary for readers to recognize the words in particular texts. The phoneme-grapheme knowledge required to read a text is described in terms of several measures that provide different but complementary information, particularly about vowels. The first measure of phoneme-grapheme knowledge summarizes the complexity of the vowel patterns in a text. The second measure is the degree to which highly common vowel and consonant patterns are repeated. To use this measure, the TExT model examines the number of different onsets that appear with a particular rime. The number of syllables in words is a third measure of linguistic content that influences beginning readers' recognition of words. The model claims that texts with fewer multisyllabic words help children acquire fluent word recognition.

The cognitive load factor within the TExT model measures the amount of new linguistic information to which beginning readers can attend while continuing to understand the text's message.
Repetition of at least some core linguistic content has traditionally been used to reduce the cognitive load in text used for teaching children to read. Within the TExT model, word repetition and the number of unique words relative to total words are used to inspect the cognitive load a particular text places on beginning readers. Two additional features of texts that are commonly used in classrooms are also considered in the model: the support provided through illustrations, and patterns of sentence and text structure.
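Because word repetition and the unique-to-total word ratio are mechanical counts, cognitive load on these dimensions can be estimated directly from a text's words. The sketch below computes those counts plus a crude multisyllabic-word share; it is an illustration in the spirit of the model, not the published TExT procedure:

```python
import re

def load_metrics(text):
    """Rough cognitive-load counts for a beginning-reading text."""
    words = re.findall(r"[a-z']+", text.lower())
    unique = set(words)
    # Crude syllable estimate: count vowel groups in each word.
    syllables = lambda w: max(1, len(re.findall(r"[aeiouy]+", w)))
    return {
        "total_words": len(words),
        "unique_words": len(unique),
        "mean_repetition": len(words) / len(unique),
        "pct_multisyllabic": 100 * sum(syllables(w) > 1 for w in unique) / len(unique),
    }

print(load_metrics("The cat sat. The cat ran. The fat cat ran to me."))
```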
In a recent study applying these TExT principles, Menon and Hiebert (2003) compared "little books" developed within the framework to traditional beginning reading texts. They demonstrated that students' practice reading little books developed in consideration of the TExT model was more effective than practice with traditional basal texts. The assessment tools developed in this research can be used to evaluate the complexity of texts as well as to guide the construction of new texts for beginning readers.

In the second strand of research focused on texts, Hoffman and his colleagues investigated the qualities of texts used in beginning reading instruction and the leveling systems for these texts. The research conducted through CIERA was grounded in earlier studies of changes in basal reading texts associated with the literature-based movement (Hoffman et al., 1998) and later with the decodable text movement (Hoffman, Sailors, & Patterson, 2002). Hoffman (2002) has proposed a model for the assessment of text quality for beginning reading instruction that considers three factors: accessibility, instructional design, and engaging qualities. Accessibility of a text for the reader is a function of the decoding demands of the text (a word-level focus) and the support provided through the predictable features of the text (ranging from picture support, to rhyme, to repeated phrases, to a cumulative structure). The instructional design factor involves the ways that a text fits into the larger scheme for texts read (a leveling issue) as well as the instruction and curriculum that surround the reading of the text. Finally, the engaging qualities factor considers the content, language, and design features of the text.

In addition to evaluating the changes in texts for beginning readers, Hoffman, Roser, Salas, Patterson, and Pennington (2001) used this framework to study the validity of some of the popular text-leveling schemes. In a study involving over 100 first-grade learners, these researchers examined the ways in which estimates of text difficulty using different text-leveling systems predicted the performance of first-grade readers. The research identified high correlations between the factors in the theoretical model and the leveling from both the Pinnell and Fountas and Reading Recovery systems. The analysis also confirmed the validity of these systems in predicting student performance with first-grade texts. Finally, the study documented the effects of varying levels of support provided to the reader (shared reading, guided reading, no support) on performance on such measures as oral reading accuracy, rate, and fluency.

Hoffman and Sailors (2002) created a method for assessing the classroom literacy environment called the TEX-IN3 that includes three components: a text inventory, a text in-use observation, and a series of text interviews. The TEX-IN3 draws on several research literatures, including research on texts conducted through CIERA. In addition, the instrument was developed based on the literature exploring the effects of the text environment on teaching and learning. The assessment yields a series of scores as well as qualitative data on the classroom literacy environment. The TEX-IN3 was validated in a study of over 30 classrooms (Hoffman, Sailors, Duffy, & Beretvas, 2003).
In this study, students were tested with pre- and posttests on a standardized reading test. Observers were trained in the use of the TEX-IN3, and high levels of reliability were established. Data were collected in classrooms at three times (fall, winter, and spring). Data analyses focused on the relations between features of the text environment from the TEX-IN3 and students' reading comprehension scores. The analyses supported all three components of the TEX-IN3. For example, the correlations between students' gain scores and the ratings of the overall text environment were significant. Correlations between students' gain scores and the in-use scores derived from observation of teaching, as well as the correlations between the rating of teachers' understanding and valuing of the text environment and students' gain scores, were significant.
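The underlying analysis pairs each classroom's environment rating with its students' gain scores (posttest minus pretest) and correlates the two. A minimal sketch with invented classroom-level values:

```python
import numpy as np

# Hypothetical classroom-level data: overall text-environment rating and
# mean student gain (posttest minus pretest) on the standardized test.
env_rating = np.array([2.1, 3.4, 2.8, 4.0, 3.1, 2.5, 3.8, 2.2])
mean_gain  = np.array([4.0, 9.5, 6.2, 11.1, 7.8, 5.0, 10.2, 4.4])

r = np.corrcoef(env_rating, mean_gain)[0, 1]
print(f"r = {r:.2f}")  # strongly positive for these invented values
```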
The findings from the research with the TEX-IN3 suggest the importance of expanding assessment from a narrow focus on the texts in classrooms to consideration of texts in the full literacy environment.

Summary and Future Research

From 1997 to 2002, CIERA researchers conducted many studies of early reading assessment that focused on readers and text, home and school, and policy and profession. The CIERA surveys of early reading assessments identified the expanding array of assessment instruments, commercial and noncommercial, available to K–3 teachers. Researchers also identified how effective teachers in schools that "beat the odds" use assessments and how they view the utility and effects of various types of assessments. The instruments used most frequently contributed to ongoing CIERA research on the development of the MLPP battery and the use of IRIs for formative and summative purposes. Those studies remain in progress as researchers collect longitudinal evidence about the reliability and validity of early reading assessments (e.g., Paris, Carpenter, et al., in press).

The most important insight from this research is that some skills, such as alphabet knowledge, concepts of print, and phonemic awareness, are universally mastered in relatively brief developmental periods. As a consequence, the distributions of data from these variables are skewed by floor and ceiling effects that, in turn, influence the correlations used to establish reliability and validity of the assessments. Assessments of oral reading accuracy, and perhaps rate, are also skewed, so that measures of some basic reading skills are difficult to analyze with parametric statistics in traditional ways. The mastery of some reading skills poses challenges to conventional theories of reading development and traditional statistical analyses.
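The attenuating effect of a mastery ceiling is easy to demonstrate by simulation: capping a skill at its ceiling weakens its observed correlation with any other measure. A sketch with simulated data (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
ability = rng.normal(size=5000)                            # latent reading ability
other = 0.7 * ability + rng.normal(scale=0.7, size=5000)   # a second measure

letters_raw = 25 + 3 * ability           # letter knowledge if it were unbounded
letters_obs = np.clip(letters_raw, 0, 26)  # observed score: 26-letter ceiling

print(np.corrcoef(other, letters_raw)[0, 1].round(2))  # ~0.70
print(np.corrcoef(other, letters_obs)[0, 1].round(2))  # noticeably attenuated
```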
CIERA researchers also developed innovative assessments of comprehension with wordless picture books that offer teachers new ways to instruct and assess comprehension with children who cannot yet decode print. These cross-sectional and longitudinal studies substantiate the reliability and validity of early assessments. In addition, CIERA researchers designed and tested new methods for assessing narrative comprehension, interactive parent-child reading, literate environments in early childhood classrooms, text features, and the text environment. All of these tools have immediate practical applications and benefits for educators. Indeed, the hallmark of CIERA research on reading assessment is the use of rigorous methods to identify, create, test, and refine instruments and practices that can help parents and teachers promote the reading achievement of all children.

This research, as well as studies outside the immediate CIERA network, points to the need for continuing study of assessment in early literacy. We believe that at least four areas deserve special attention. First, the policy context for instructional programs, teaching, and teacher education that places a premium on "scientifically proven" approaches and methods has immediate implications for assessment. Tools to be used in reading assessment (e.g., for diagnosis, program evaluation, or research) are subject to high standards of validity and reliability. We applaud this attention to rigor in assessment, but we believe that decision making about the use of instruments should be professional, public, and comprehensive. Such deliberations must extend beyond the traditional psychometric constructs of reliability and validity to include consideration of the consequences of testing and the social contexts of assessment.

Second, researchers must continue to investigate the ways in which assessment
tools can be broadened to focus on multiple factors and the interaction of these factors in ways that reflect authentic learning and teaching environments. For example, informal reading inventories have become popular tools for assessment, partly because reading rate and accuracy can be assessed quickly and reliably, but educators need to consider how text-leveling factors might interact with students' developmental levels to influence evaluations of reading performance. Good assessments should lead to an understanding of the complexity of learning to read and not impose a false sense of simplicity on early reading development.

Third, the gulf between what teachers value as informal assessments and what is imposed on them in the form of standardized testing appears to be broadening. Although performance assessments and portfolios were popular in the 1980s and 1990s, the trends today are to increase high-stakes testing for young children, to remove teacher judgment from assessment, and to streamline assessments so they can be conducted quickly and repeatedly. More research is needed on how highly effective teachers assess developing reading skills in their classrooms. Before educators and policy makers abandon performance assessment, careful consideration must be given to the ways that ongoing assessment can promote differentiated instruction.

Fourth, researchers cannot lose sight of the fact that good assessment rests on good theory, not just a theory of reading but of effective teaching and development. Just because motivation, self-concept, and critical thinking are difficult to measure using large-scale standardized tests does not mean they should be ignored. The scientific method is not just about comparing one program or one approach to another to prove which is best. The scientific investigation of assessment in early literacy should contribute to theory building that ultimately informs effective teaching and learning.

References

Berman, R. A., & Slobin, D. I. (1994). Relating events in narrative: A crosslinguistic developmental study. Hillsdale, NJ: Erlbaum.
Bus, A. G., van Ijzendoorn, M. H., & Pellegrini, A. D. (1995). Joint book-reading makes for success in learning to read: A meta-analysis on intergenerational transmission of literacy. Review of Educational Research, 65(1), 1–21.
Clay, M. M. (1993). An observation survey of early literacy achievement. Portsmouth, NH: Heinemann.
DeBruin-Parecki, A. (1999). Assessing adult/child storybook reading practices (Tech. Rep. No. 2-004). Ann Arbor: University of Michigan, Center for the Improvement of Early Reading Achievement.
Dickinson, D. K., McCabe, A., & Anastasopoulos, L. (2001). A framework for examining book reading in early childhood classrooms (Tech. Rep. No. 1-014). Ann Arbor: University of Michigan, Center for the Improvement of Early Reading Achievement.
Duke, N. K. (2000). 3.6 minutes per day: The scarcity of information texts in first grade. Reading Research Quarterly, 35(2), 202–224.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum.
Good, R. H., & Kaminski, R. A. (Eds.). (2002). Dynamic indicators of basic early literacy skills (6th ed.). Eugene, OR: Institute for the Development of Educational Achievement.
Haladyna, T., Nolen, S. B., & Haas, N. S. (1991). Raising standardized achievement test scores and the origins of test score pollution. Educational Researcher, 20(5), 2–7.