Using Common Assessment Data to Predict High Stakes Performance:
An Efficient Teacher-Referent Process
Please address all correspondence to:
Bethany Silver
Capitol Region Education Council
Division of Teaching and Learning
111 Charter Oak Ave
Hartford, CT 06106
860-268-2189
bethanysilver@yahoo.com
Paper presented at the
Annual Meeting of the American Educational Research Association
New York, New York,
March 26, 2008
Using Common Assessment Data to Predict High Stakes Performance:
An Efficient Teacher-Referent Process
Bethany Silver
Colleen Palmer
Frances DiFiore
Capitol Region Education Council
Hartford, Connecticut
School districts across the nation routinely commit resources and instructional time to
the creation, implementation, scoring and data entry of locally constructed assessments. The
data from these instructionally useful tools can be used to not only monitor learning growth, but
also to estimate high stakes performance. This paper describes the process employed by a small
Northeastern school district to examine the relationship between student performance on
common district benchmark assessments and high stakes achievement.
The development of district-wide common assessments is a practice in which many
districts engage, with Reeves (2000, 2002a, 2002b) and Ainsworth and Viegut (2006) leading the
literature in support of this process. Typically, grade-level teacher leaders converge on the state
standards in small groups during the summer months and as time permits throughout the school
year to create tools for monitoring and tracking learner achievement during the school year
(Ainsworth & Viegut, 2006). Much of the time, each test item is carefully crafted to align with
high stakes assessment questions in format, difficulty, and skill, then mapped to
specific standards, and reviewed by the group as a whole. The final instrument is pulled together,
sometimes literally using cut and paste technology, and prepared for distribution to district
schools, targeted for each grade level and classroom.
At the prescribed time, the test is administered and scored; the data are then collected and compiled for the classroom, school, and district. If the district has the
vision and resources, there are technology tools available to electronically acquire student
responses, and report them back to classroom teachers in an instructionally useful manner, while
passing them forward to administrators for broader decision making responsibilities.
Thomas (2005) describes six traditional functions of test results. These include diagnosing individual strengths and weaknesses, differentiating instruction, understanding whole group strengths and weaknesses, assigning grades, and predicting performance. Thomas specifically notes, “present day high-stakes programs are also often used for predicting pupils’ future academic success” (p. 81). Given the summative nature of high stakes assessment, and the
regular practice of local benchmark measures of learning, the noble approach would be to
support the learner with formative assessments in a manner that empowers the teacher to better
prepare the student to both successfully participate in the assessment and achieve the targeted
performance standard well in advance of the summative assessment.
Based on Thomas’ (2005) suggestion, it is reasonable to infer that scores earned by a
typical child on a district-wide common assessment would relay expectations, to some degree,
regarding performance on high stakes standardized assessments. This reasoning, however,
lends itself to anecdote, and calls for greater empirical rigor. It is appropriate to investigate the
relationship between high stakes and locally created common assessments using student data.
This study begins to explore that relationship with a sample of approximately 1200 grade 3
through 8 students in a small magnet school district in the Northeast. Three lenses are employed
for this inquiry. First, correlations between high stakes and local assessments were examined.
Second, with evidence of criterion-related validity established, a six-step process was employed to predict high stakes performance using stepwise multiple linear regression. These
estimated scores were subjected to teacher review and refinement. The paper concludes with a
discussion of the relationship between algorithm-predicted scores, teacher-revised predictions,
and actual scores, as well as directions for future research and application.
For the purposes of this paper, the term ‘district-wide common assessment’ refers to an
assessment designed for and administered to all children enrolled in a particular grade within a
particular school district at pre-determined intervals during the course of a school year.
Purpose
During the course of the school year, the classroom teacher is likely to infer, from a child’s performance in general coursework, an approximate achievement level for each learner on the high stakes assessments. The work by DuPaul, Volpe, Jitendra, Lutz, Lorah, and Gruber
(2004) found teacher perceptions of academic skills to be the strongest predictor of achievement
test scores. Admittedly, a single standardized assessment score is a very narrow snapshot into the
academic life of a student. However, the No Child Left Behind (NCLB) Act has riveted the
nation’s attention, in the form of public criticism or celebration related to the information
contained on a single measure of learning. As constricting and inarticulate as that value may be,
in terms of describing the capacity of the learner, it is a cultural reality in the nation’s public
schools. The work of DuPaul et al. encourages examination of the predictive utility that common
assessment data might offer to teachers and building administrators. The ability to confidently
estimate, with relative accuracy, content mastery and learning needs with a local benchmark
assessment empowers the classroom teacher and school district to proactively address learning
needs in advance of the summative state assessment. This is in alignment with Stiggins’ (2005)
call to clarify the intent of assessments as ‘for learning’ (classroom assessment) or ‘of
learning’ (high stakes tests).
Schools invest a great deal of time tracking student performance. Common assessment
data are one internally controlled resource that schools can employ to formally gauge student attainment relative to state standards. Boudett, City, and Murnane (2007) offer four cautions
related to the use of internally constructed assessments: validity and reliability challenges;
disparate difficulty levels between parallel forms; standardized administration protocols; and
consistent scoring procedures. When each of these concerns is addressed, the resulting system
of assessment promises a resilient and reasonably credible measurement. It would be useful to
classroom teachers and educational leaders to be aware of the relationship between performance
recorded on a district-wide common assessment and expectations for student performance on
high stakes assessments.
This inquiry sought to address the following questions:
Research Question 1: How well are scores on locally constructed district-wide common assessments related to high stakes assessment performance?
Research Question 2: What processes can we use to make more efficient use of district-
wide common assessment data in understanding the relationship between common
assessment scores and high stakes assessment scores?
Methodology
Participants
Located in the Northeastern United States, the school district enrolls approximately 3400
urban and suburban learners in eight magnet schools and five special programs. Approximately
61% of the students are non-white, and 27% qualify for free or reduced-price lunch.
Existing data from approximately 1200 grade three through eight students enrolled in five
magnet schools were used for this research.
Assessment Tools
Two data sets exist for the students: local common assessment data and high stakes data. The local data are collected at the item level via hand entry for district-wide common
assessments. The district assessments have been constructed by teachers employed by the school
district, with the support of curriculum experts and consultants. These tests are grade-level
specific, ensuring that each child has experience with the format and difficulty level appropriate
to the corresponding grade level, prior to the high stakes assessment. Each item on each
assessment is aligned with a standard and skill, ensuring that the results of the assessments will
be instructionally useful to classroom teachers. The assessments are administered twice yearly,
during the fall and in January. All state standards are represented by between 4 and 12 items within at least one of the assessments for each grade level.
Instructional staff are provided with the district-wide assessment schedule prior to the
start of the school year. Assessment administration windows range from a week to 25 days,
dependent on the nature of the assessment and scoring process. Most assessments contain open-ended responses that require rubric-based scoring, making the average administration and data
entry time frame close to 20 days. The students in this study also participate in yearly high
stakes assessments, managed by the state department of education. This represents the second
set of existing data for the subjects in this study.
Annual criterion referenced state assessments are administered to students in grades three
through eight, as well as grade 10, during the month of March. For this research, only student
data from grades 3-8 were included. Results are provided by grade level and content area, with
specific mastery performance information for each learner relative to the 25 math strands, four
reading strands, the TASA Degrees of Reading Power (DRP), and three writing strands.
Students also receive summary score information for each content area in the form of a scale
score. Scale scores are calculated to range from 100 to 400.
Scale scores are divided into five levels of performance, with ‘5’ representing the most
accomplished score and ‘1’ representing the least skilled performance. Score ranges for each level vary by grade, with a level of 3 or higher reflecting the NCLB AYP criterion for the content area.
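To make the score-to-level relationship concrete, here is a minimal sketch in Python of mapping a scale score onto the five performance levels; the cut points used are hypothetical placeholders, since the actual cut scores are set by the state and vary by grade and content area.

```python
# Minimal sketch: map a 100-400 scale score to a 1-5 performance level.
# The cut points below are hypothetical placeholders, not the state's actual
# grade- and content-specific cut scores.
HYPOTHETICAL_CUTS = [(300, 5), (270, 4), (240, 3), (200, 2)]  # (minimum scale score, level)

def performance_level(scale_score):
    for cut, level in HYPOTHETICAL_CUTS:
        if scale_score >= cut:
            return level
    return 1  # below the lowest hypothetical cut

print(performance_level(255))  # -> 3, i.e., at or above the AYP criterion under these cuts
```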
Procedure
A single data file containing high stakes assessment data from the March 2006 high
stakes assessment and scores for Fall 2005 and January 2006 district-wide common assessments
was used to explore the correlations between the two test formats. Appendix A, Table 1,
provides the Pearson correlation coefficients for each content area, for each grade level, for each
common assessment administration. Note that not all schools participate in every administration
of every test.
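As a minimal sketch of the correlation analysis described above, the following Python fragment computes grade-level Pearson coefficients from a merged student-level file; the small data frame and its column names are hypothetical stand-ins for the district's actual merged data.

```python
# Sketch, assuming one row per student with a common assessment percent score
# and the corresponding high stakes scale score. Values and column names are
# hypothetical.
import pandas as pd

df = pd.DataFrame({
    "grade":            [3, 3, 3, 4, 4, 4],
    "math_common_fall": [62, 48, 81, 55, 70, 90],        # common assessment percent correct
    "math_scale_2006":  [231, 204, 268, 222, 251, 287],  # high stakes scale score
})

for grade, grp in df.groupby("grade"):
    pair = grp[["math_common_fall", "math_scale_2006"]].dropna()  # keep students with both scores
    r = pair["math_common_fall"].corr(pair["math_scale_2006"])    # Pearson r
    print(f"Grade {grade}: r = {r:.2f} (N = {len(pair)})")
```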
Results: Research Question One
The first research question sought to clarify the quality of the relationship between the
high stakes and local assessment performance. To this end, correlation coefficients were used to
describe the degree of relationship between the two different types of assessments in both
reading and mathematics. All coefficients were statistically significant.
The mathematics content area offered statistically significant correlation coefficients
between locally constructed benchmark assessment performance and high stakes achievement.
Coefficients ranged from .74 to .91 (p<.01) when the local data were matched to high stakes
performance.
In the area of reading there are two data elements used to define grade level proficiency.
The high stakes scale score is a composite score consisting of scores from both the Reading
Comprehension section and the Degrees of Reading Power unit score (DRP) (TASA, 2007).
These numbers are combined and then transformed into a Reading Scale Score (CSDE, 2007).
To emulate this approach, the district employed a common assessment score that simulates the
Reading Comprehension segment of the high stakes assessment, and a separate DRP test. Significant correlation coefficients between the common assessment scores and high stakes
scores ranged from .44 to .76 (p<.01). With the DRP administrations, the correlations between
local and state administered tests ranged from .41 to .88 (p<.01), dependent on the grade level
and administration time.
Methodology: Research Question Two
A six step process was employed to predict high stakes performance using district-wide
common assessment data. Appendix B offers a visual model of the entire six step process.
Step One: The high stakes assessment data were stratified by grade level. A stepwise
multiple linear regression was employed to identify the variables with the greatest explanatory
power relative to performance on the content area scale score for the areas of Mathematics and
Reading. For each grade level and content area, the proportion of variance explained by each of
the variables was documented. The goal was to identify the variables that explained the greatest
amount of variance in scale score for the content areas of Mathematics and Reading.
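Statistical packages handle stepwise selection in different ways; as a rough sketch of the idea, under the assumption of a simple forward procedure capped at three predictors, the fragment below adds, one strand at a time, the predictor that most increases R-squared. The data frame and column names are hypothetical, and a production stepwise routine would also apply entry and removal criteria.

```python
# Forward stepwise sketch using ordinary least squares from statsmodels.
# `grade_df` is assumed to hold one grade level's data with an outcome column
# (the content area scale score) and candidate strand-score columns.
import statsmodels.api as sm

def forward_stepwise(grade_df, outcome, candidates, max_predictors=3):
    selected = []
    for _ in range(max_predictors):
        best_var, best_r2 = None, -1.0
        for var in candidates:
            if var in selected:
                continue
            X = sm.add_constant(grade_df[selected + [var]])
            r2 = sm.OLS(grade_df[outcome], X).fit().rsquared
            if r2 > best_r2:
                best_var, best_r2 = var, r2
        selected.append(best_var)  # keep the strand that adds the most explained variance
    return selected

# selected = forward_stepwise(grade_df, "math_scale", ["M1", "M2", "M15"])  # hypothetical call
```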
Respondent data were monitored carefully for outlying cases, which occurred for every grade level and content area, for performances more than three standard deviations above or below the mean. These excluded cases did not reveal any systematic patterns and represented less
than 5% of the overall sample for the grade level. At the close of step one, a maximum of three
predictor variables were identified for each grade level and content area.
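A minimal sketch of that screening step, assuming a pandas Series of scale scores for one grade level and content area, is shown below; the values are hypothetical.

```python
# Flag cases within three standard deviations of the grade-level mean; cases
# outside that band were excluded from the regressions, as described above.
import pandas as pd

def within_three_sd(scores: pd.Series) -> pd.Series:
    z = (scores - scores.mean()) / scores.std()
    return z.abs() <= 3  # boolean mask of retained cases

scores = pd.Series([231, 204, 268, 222, 251, 287])  # hypothetical scale scores
retained = scores[within_three_sd(scores)]
print(retained)
```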
Once the final regressions were run, strand scores were converted to proportions for each
test taker. This was performed because the district-wide common assessments did not perfectly
match the test item quantity or mastery criterion on the high stakes assessment. By converting
student scores to percents for both internal assessments and high stakes assessments, the multiple regression output could be applied directly to the common assessment data as a prediction tool.
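A minimal sketch of that conversion, with hypothetical strand point totals standing in for the actual item counts, might look like the following.

```python
# Divide a raw strand score by the points available on that strand so local
# and state strand results share a 0-1 scale. The maxima here are hypothetical.
STRAND_MAX_POINTS = {"M15": 4, "M6": 6, "RC_A": 10}

def to_proportion(raw_score, strand):
    return raw_score / STRAND_MAX_POINTS[strand]

print(to_proportion(3, "M15"))  # 3 of 4 points -> 0.75
```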
Step Two: Using output to generate algorithms
The output generated regression coefficients for each of the analyses. These were used to
create algorithms, based on the multiple regression equation, Ŷ = a + b1X1 + b2X2 + b3X3 + e
(Tabachnick & Fidell, 2007). This equation describes the predicted value as Ŷ, a as the slope of
the line, bi as the regression coefficient and Xi as the corresponding indicator value for the
individual participant. The error term is represented as ‘e’, and was not used in the actual
calculation of estimated scale scores for each participant. However, the initial information
presented to building administrators did reveal the error term values. For example, a school-
specific equation for Mathematics scale score for grade 3 students was shared as follows:
Mathematics Scale = 34.91 + .475(Reading Scale) + 11.98(Math Strand 15) + 11.41(Math Strand 6) + 18.39
This regression explained 87.2% of the variance in Mathematics Scale Score for grade 3 students
at a specific school. Appendix C offers a sample of the materials that school principals were
provided relative to the multiple regression process.
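As an illustration of how such an equation is applied, the sketch below plugs hypothetical student values into the grade 3 mathematics equation reported above; the coefficients come from that school-specific output, the strand inputs are assumed to be the proportion scores produced in Step One, and the error term is omitted from the point estimate, as described in the text.

```python
# Apply the reported grade 3 equation:
# MScale = 34.91 + .475(RScale) + 11.98(M15) + 11.41(M6)   (error term omitted)
def predict_math_scale_gr3(reading_scale, m15, m6):
    return 34.91 + 0.475 * reading_scale + 11.98 * m15 + 11.41 * m6

# Hypothetical inputs: a reading scale score of 250 and strand proportions.
estimate = predict_math_scale_gr3(reading_scale=250, m15=0.75, m6=0.80)
print(round(estimate, 1))
```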
Step Three: Integrate district-wide common assessment scores into regression algorithms
The third step in the prediction process was to locate the student data corresponding to
each of the strands identified as predictive of a scale score, from within the district-wide
common assessment data, and then integrate those values into the equations. Appendix D
presents the indicators that were employed across the school district data by grade level and
content area.
Step Four: Generate, distribute and collect teacher validation reports
Estimates for each student in each content area were generated based on the multiple
regression output. Once these predicted values were derived, teacher feedback was sought
regarding the accuracy of the estimates. Teacher rosters were printed that provided student names and corresponding estimates, and each teacher was asked to critically evaluate the estimate for each learner in each content area by rating it as ‘Too Low’, ‘On Target’ or ‘Too High’.
The estimation process was described to teachers as performed by a machine using a
mathematical equation applied to common district-wide assessment data. These estimates were
referred to as ‘Machine Estimates’ in the literature provided to teachers. Classroom teachers
were asked to review the estimates carefully and apply their professional expertise and
knowledge of the learner to revise the preliminary information that the equation had generated.
Specifically, teachers were asked to rate the ‘Machine Estimate’ as ‘Too Low’, ‘On Target’ or
‘Too High’. When a teacher identified an estimate that was not ‘On Target’ a revised level was
requested. These are termed ‘teacher guided estimates’ in this work. A sample of the teacher
roster is included in Appendix E.
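A minimal sketch of the roster generation, using hypothetical names and columns, is shown below: one file per teacher, with blank columns for the ‘Too Low / On Target / Too High’ rating and a revised level.

```python
# Build per-teacher validation rosters from the machine estimates.
import pandas as pd

roster = pd.DataFrame({
    "teacher": ["Smith", "Smith", "Jones"],
    "student": ["Student 1", "Student 2", "Student 3"],
    "math_machine_estimate": [3, 4, 2],
    "reading_machine_estimate": [4, 4, 3],
})
for subject in ["math", "reading"]:
    roster[f"{subject}_rating"] = ""         # Too Low / On Target / Too High
    roster[f"{subject}_revised_level"] = ""  # requested only when not 'On Target'

for teacher, rows in roster.groupby("teacher"):
    rows.to_csv(f"roster_{teacher}.csv", index=False)  # one roster file per teacher
```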
Once all of the rosters were returned and the data entered, estimates and teacher guided
estimates were compared. Machine estimates for both math and reading were highly correlated with the teacher guided estimates for the same content areas (Math Level, r=.79, p<.01, n=1233; Reading Level, r=.85, p<.01, n=1245). However, when examined with a paired comparison, guided estimates were significantly higher in reading than the machine estimates (t=18.524, df=1244, p<.01), by approximately .396 levels on a 5-point scale.
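The comparison above can be reproduced with standard routines; a minimal sketch with hypothetical level data follows, using a Pearson correlation for agreement and a paired t-test for the mean difference between teacher guided and machine estimates.

```python
# Compare machine estimates and teacher guided estimates on the 1-5 level scale.
import numpy as np
from scipy import stats

machine = np.array([3, 2, 4, 3, 5, 2, 3, 4])  # hypothetical machine estimates
guided = np.array([3, 3, 4, 4, 5, 3, 3, 4])   # hypothetical teacher guided estimates

r, p_r = stats.pearsonr(machine, guided)      # agreement between the two sets of levels
t, p_t = stats.ttest_rel(guided, machine)     # paired test of the mean difference
print(f"r = {r:.2f} (p = {p_r:.3f}); t = {t:.2f} (p = {p_t:.3f}); "
      f"mean revision = {np.mean(guided - machine):.2f} levels")
```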
Step Five: Use Teacher Guided Estimates to predict Adequate Yearly Progress (AYP) for the
schools and school district
Teacher guided estimates were employed in March 2007 to apprise central office staff
and building administrators of potential AYP issues. A formal report was made to district
leadership, and proactive planning began regarding areas of challenge for specific schools and
subgroups.
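A minimal sketch of that projection, with hypothetical schools, subgroups, and estimated levels, might summarize the share of students estimated at level 3 or above as follows.

```python
# Project the proportion of students at or above level 3 (the AYP criterion)
# from teacher guided estimates, by school and subgroup. Data are hypothetical.
import pandas as pd

est = pd.DataFrame({
    "school":   ["A", "A", "A", "B", "B", "B"],
    "subgroup": ["FRL", "non-FRL", "FRL", "non-FRL", "FRL", "FRL"],
    "math_guided_estimate": [3, 2, 4, 5, 2, 3],
})
est["at_or_above_goal"] = est["math_guided_estimate"] >= 3
projected = est.groupby(["school", "subgroup"])["at_or_above_goal"].mean()
print(projected)  # projected proportion meeting the level 3+ criterion
```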
Step Six: Report back to teachers, administrators, and staff regarding expected AYP performance
Testing concluded at the close of March, and high stakes assessment results were
obtained on June 28, 2007. The attained data have been compared to teacher guided estimates
and machine estimates.
Results: Research Question Two
Correlations were used to examine the degree of relationship between Teacher Guided
Estimates, Machine Estimates, and High Stakes Attained Levels. The data are presented in
Table 2.
Table 2: Correlations between Estimated and Attained Performance

                           Mathematics Attained Score   Reading Attained Score
Teacher Guided Estimate    .72** (N=1242)               .63** (N=1238)
Machine Estimate           .62** (N=1234)               .66** (N=1243)

**p<.01
Discussion
This line of inquiry has just begun. There are many assumptions that accompany this
process that need to be identified and refined. As data stores increase and student information is
captured systematically and methodically, the accuracy of the process can increase, as well as the
sophistication of the various models used to estimate performance.
With regard to research question one (How well are scores on locally constructed district-wide common assessments related to high stakes assessment performance?), we observed that the overall score for common assessments did, in most cases, align well with student high stakes performance. Work needs to continue on the refinement of these instruments so that calibration improves. However, in that all correlation coefficients were significant, and some
were very strong, it is reasonable to celebrate the current state of the local assessment tools for
this school district.
Future work needs to occur with those instruments and in those settings where there is a
lack of alignment to standards. To facilitate this work, it would be appropriate to determine the
correlation between each of the sub-scales on the district-wide common assessments and the
obtained corresponding high stakes strand scores. Those assessments that do not correlate well
with high stakes data may indicate a need for deeper reflection and examination. A lack of
criterion related validity might be a factor to consider. However, with locally constructed
assessments there are many more assumptions that would benefit from careful scrutiny. These
relate to the timing of the assessment, the quality of the selected reading passages, the similarity
of question format, and the quality of item writing. Most notably, the gap of time between the
administrations of the common assessments occurring during the fall session, and the
administration of the high stakes assessment in March, reflects a long span of academic
opportunity. A weaker correlation between the fall common assessment and the March state assessment invites the speculation that learning has occurred in the interim. This is in line with the intent of the educational environment, in terms of using the fall assessment for instructional purposes. In
fact, a high positive correlation between the fall and spring measures could be interpreted as a
learning environment that failed to serve the student adequately.
While a student may not score well on one or more areas of the common assessments in
the fall, that same student may have mastered those respective concepts and skills by the March
administration of the high-stakes assessments. In a similar manner, the time differential of the
January assessment window may impact correlation outcomes, perhaps to a lesser degree. Going
forward, further analysis may reveal benchmarks of student learning throughout the year on the
compendium of common assessments which indicate that the student is “on track” for readiness
in March. This benchmark approach would enable educators at key points during the year to
accurately gauge the progress towards content mastery.
A point of curiosity was the statistically significant upward revision that teachers made to the machine-estimated reading scores, and the resulting correlations between Teacher Guided Estimates, Machine Estimates, and attained scores. Recall that teachers made significantly more upward revisions to reading scores estimated by the machine process. Yet
when the attained scores became available the machine estimates showed a more consistent
relationship (r=.66, p<.01) than the teacher guided estimates (r=.63, p<.01).
Conclusions
Translating the usefulness and validity of the predictive modeling process to the level of
classroom teachers required thoughtful implementation. Most classroom teachers have little
interest in comprehending the generation of multiple regression predictive models, but are
interested in reliable models that inform them of what students know and when. Integrating this
highly theoretical approach into the world of practitioners necessitated a two-phase strategic plan
of implementation.
The first phase was to provide an overview to the administrative team, whose members
would be responsible for introducing this modeling framework to the frontline teachers. These
administrators were charged with designing a customized implementation plan for their
respective sites, requiring the same core of basic information to be communicated, but allowing
for flexibility in how to support each faculty’s use of this model. Every administrator was empowered to create a plan for his/her site that would provide enough technical information
from the viewpoint of a practitioner, but in a manner that also highlighted the professional
judgment of each teacher.
The second phase of rollout occurred at the building level, with each administrator
implementing the plan to ensure: 1) a general understanding of the overall process, 2) the
purpose of the predictive modeling, 3) an elimination [or reduction] of cultural resistance, and 4)
clear articulation of the value of the individual professional assessment of each teacher.
The outcome of the implementation was markedly successful in several ways: 1) teachers completed the task with the utmost seriousness and purpose, 2) there was a general acceptance of the merit of predictive modeling by teachers, 3) once the correlation studies were completed, teachers received usable feedback and validation regarding their own individual predictive skill relative to that of the mathematical model, and 4) the use of common district-wide assessment data to predict high stakes performance introduced an awareness of the validity of the common assessments in use in the district.
Future applications of this process include an expansion on the use of historical high
stakes data to inform the predictive models and the adoption of an assessment management
system that will acquire item level data electronically, to increase data integrity and release
teachers from the clerical task of data entry.
REFERENCE LIST

Ainsworth, L., & Viegut, D. J. (2006). Common formative assessments: How to connect standards-based instruction and assessment. Thousand Oaks, CA: Corwin Press.

Bureau of Student Assessment, Connecticut State Department of Education. (2007). CMT-4 technical bulletin: Calculation of scale scores for the 2007 CMT-4 administration (Form P’). Retrieved January 3, 2007, from http://www.csde.state.ct.us/public/cedar/assessment/cmt/resources/misc_cmt/cmt_technical_bulletin_2007.pdf

DuPaul, G. J., Volpe, R. J., Jitendra, A. K., Lutz, J. G., Lorah, K. S., & Gruber, R. (2004). Elementary school students with AD/HD: Predictors of academic achievement. Journal of School Psychology, 42, 285-301.

Reeves, D. B. (2000). Accountability in action: A blueprint for learning organizations. Denver, CO: Advanced Learning Press.

Reeves, D. B. (2002a). Holistic accountability: Serving students, schools, and community. Thousand Oaks, CA: Corwin Press.

Reeves, D. B. (2002b). The leader's guide to standards. San Francisco, CA: Jossey-Bass.

Schmoker, M. (1999). Results: The key to continuous school improvement (2nd ed.). Alexandria, VA: Association for Supervision and Curriculum Development.

Stevens, J. (1996). Applied multivariate statistics for the social sciences (3rd ed.). Mahwah, NJ: Lawrence Erlbaum.

Stiggins, R. J. (2005). Student-involved assessment for learning (4th ed.). Upper Saddle River, NJ: Merrill/Prentice Hall.

Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Boston, MA: Allyn and Bacon.

Thomas, R. M. (2005). High stakes assessment: Coping with collateral damage. Mahwah, NJ: Lawrence Erlbaum.
Appendix A: Table 1

Table 1: Correlation Coefficients

Mathematics: Correlation Coefficients
                   Fall 2005 Common Assessment         Mid-Year 2006 Common Assessment
Grade Level        r with 2006 Math Scale Score (N)    r with 2006 Math Scale Score (N)
Gr 3               .74** (144)                         .84** (144)
Gr 4               .83** (122)                         .89** (122)
Gr 5               .85** (111)                         .82** (111)
Gr 6               .82** (275)                         .76** (292)
Gr 7               .84** (296)                         .83** (293)
Gr 8               .83** (245)                         .91** (271)
District Overall*  .73** (1193)                        .77** (1233)

Reading: Correlation Coefficients – Common Assessments
                   Fall 2005 Common Assessment            January 2006 Common Assessments
Grade Level        r with 2006 Reading Scale Score (N)    r with 2006 Reading Scale Score (N)
Gr 3               .72** (139)                            .75** (141)
Gr 4               .71** (121)                            .76** (121)
Gr 5               .61** (111)                            .69** (110)
Gr 6               .64** (254)                            .61** (308)
Gr 7               .59** (269)                            .68** (304)
Gr 8               .72** (241)                            .44** (273)
District Overall*  .64** (1135)                           .57** (1257)

Reading: Correlation Coefficients – DRP
                   Fall 2005 DRP                              January 2006 DRP
Grade Level        r with 2006 High Stakes DRP Unit Score (N) r with 2006 High Stakes DRP Unit Score (N)
Gr 3               .79** (143)                                .85** (43)
Gr 4               .83** (121)                                (0)
Gr 5               .88* (111)                                 .87** (43)
Gr 6               .41** (263)                                (0)
Gr 7               .79** (173)                                (0)
Gr 8               .87** (153)                                (0)
District Overall   .63** (964)                                .88** (85)

**p<.01; District Overall* employed percentage scores, instead of raw values.
Appendix B: Visual Model of the Prediction Process
Appendix C: Sample Building Administrator Initial Regression Materials

Mathematics Scale Score

Grade 3
Variables in order of influence (proportion of variance explained):
  a. Reading Scale Score (69.3%)
  b. M15: Geometry and Measurement - Approximating Measures (12.2%)
  c. M6: Numerical and Proportional Reasoning - Basic Facts (5.7%)
Total: 87.2%
Ŷ = a + b1X1 + b2X2 + b3X3 + e
MScale = 34.91 + .475(RScale) + 11.98(M15) + 11.41(M6) + 18.39

Grade 4
Variables in order of influence (proportion of variance explained):
  a. M4: Numerical and Proportional Reasoning - Order, Magnitude and Rounding of Numbers (66.1%)
  b. M23: Algebraic Reasoning: Patterns and Functions - Algebraic Concepts (15.2%)
  c. M11: Numerical and Proportional Reasoning - Estimating Solutions to Problems (6.3%)
Total: 87.6%
Ŷ = a + b1X1 + b2X2 + b3X3 + e
MScale = 110.72 + 15.48(M4) + 14.49(M23) + 8.66(M11) + 9.33

Grade 5
Variables in order of influence (proportion of variance explained):
  a. ER - Holistic Writing Score (69.9%)
  b. M3: Numerical and Proportional Reasoning - Equivalent Fractions, Decimals and Percents (13.4%)
  c. M24: Working with Data: Probability and Statistics - Classification and Logical Reasoning (7.9%)
Total: 89.2%
Ŷ = a + b1X1 + b2X2 + b3X3 + e
MScale = 100.43 + 2.75(ER Holistic Writing) + 12.36(M3) + 10.73(M24) + 9.98

Reading Scale Score

Grade 3
Variables in order of influence (proportion of variance explained):
  a. DRP: Degrees of Reading Power (91.8%)
  b. RC_A: Reading Comprehension - Forming a General Understanding (4.1%)
  c. RC_B: Reading Comprehension - Developing Interpretation (1.5%)
Total: 97.4%
Ŷ = a + b1X1 + b2X2 + b3X3 + e
RScale = 65.73 + 2.08(DRP) + 4.64(RC_A) + 3.14(RC_B) + 5.96

Grade 4
Variables in order of influence (proportion of variance explained):
  a. DRP: Degrees of Reading Power (90.2%)
  b. RC_D: Reading Comprehension - Examining the Content and Structure (6.3%)
  c. RC_A: Reading Comprehension - Forming a General Understanding (1.4%)
Total: 97.9%
Ŷ = a + b1X1 + b2X2 + b3X3 + e
RScale = 68.93 + 1.86(DRP) + 6.06(RC_D) + 3.63(RC_A) + 4.53

Grade 5
Variables in order of influence (proportion of variance explained):
  a. DRP: Degrees of Reading Power (90.8%)
  b. RC_A: Reading Comprehension - Forming a General Understanding (4.3%)
  c. RC_D: Reading Comprehension - Examining the Content and Structure (1.8%)
Total: 96.9%
Ŷ = a + b1X1 + b2X2 + b3X3 + e
RScale = 63.13 + 1.85(DRP) + 4.26(RC_A) + 3.66(RC_D) + 5.99
Appendix D: District Indicators

Mathematics Scale Score

Grade 3 (Total proportion of variance explained: 81%)
  a. M15: Geometry and Measurement - Approximating Measures
  b. M25: Integrated Understandings - Mathematical Applications
  c. M5: Numerical and Proportional Reasoning - Models of Operations

Grade 4 (Total proportion of variance explained: 81.7%)
  a. M3: Numerical & Proportional Reasoning - Equivalent Fractions, Decimals & Percents
  b. M5: Numerical & Proportional Reasoning - Models of Operations
  c. M25: Integrated Understandings - Mathematical Applications

Grade 5 (Total proportion of variance explained: 82.6%)
  a. M4: Numerical & Proportional Reasoning - Order, Magnitude & Rounding of Numbers
  b. ER_CR: Composing/Revising
  c. M24: Working with Data: Probability & Statistics - Classification & Logical Reasoning

Grade 6 (Total proportion of variance explained: 82.5%)
  a. M11: Numerical & Proportional Reasoning - Estimating Solutions to Problems
  b. M2: Numerical & Proportional Reasoning - Pictorial Representations of Numbers
  c. M9: Numerical & Proportional Reasoning - Solve Word Problems

Grade 7 (Total proportion of variance explained: 86.4%)
  a. M9: Numerical & Proportional Reasoning - Solve Word Problems
  b. M11: Numerical & Proportional Reasoning - Estimating Solutions to Problems
  c. M23: Algebraic Reasoning: Patterns & Functions - Algebraic Concepts

Grade 8 (Total proportion of variance explained: 90.0%)
  a. M11: Numerical & Proportional Reasoning - Estimating Solutions to Problems
  b. M16: Geometry and Measurement - Customary and Metric Measures
  c. M7: Numerical & Proportional Reasoning - Computation with Whole Numbers & Decimals

Reading Scale Score

Grade 3 (Total proportion of variance explained: 98.2%)
  a. DRP: Degrees of Reading Power
  b. RC_A: Reading Comprehension - Forming a General Understanding
  c. RC_D: Reading Comprehension - Examining the Content and Structure

Grade 4 (Total proportion of variance explained: 96.7%)
  a. DRP: Degrees of Reading Power
  b. RC_A: Reading Comprehension - Forming a General Understanding
  c. RC_C: Reading Comprehension - Making Reader/Text Connections

Grade 5 (Total proportion of variance explained: 95.8%)
  a. DRP: Degrees of Reading Power
  b. RC_D: Reading Comprehension - Examining the Content and Structure
  c. RC_A: Reading Comprehension - Forming a General Understanding

Grade 6 (Total proportion of variance explained: 97.2%)
  a. DRP: Degrees of Reading Power
  b. RC_D: Reading Comprehension - Examining the Content and Structure
  c. RC_B: Reading Comprehension - Developing Interpretation

Grade 7 (Total proportion of variance explained: 97.1%)
  a. DRP: Degrees of Reading Power
  b. RC_D: Reading Comprehension - Examining the Content and Structure
  c. RC_C: Reading Comprehension - Making Reader/Text Connections

Grade 8 (Total proportion of variance explained: 96.8%)
  a. DRP: Degrees of Reading Power
  b. RC_D: Reading Comprehension - Examining the Content and Structure
  c. RC_A: Reading Comprehension - Forming a General Understanding
Appendix E: Sample Teacher Revision Rosters
22

More Related Content

What's hot

Using e instruction’s® cps™ to support effective instruction
Using e instruction’s® cps™ to support effective instructionUsing e instruction’s® cps™ to support effective instruction
Using e instruction’s® cps™ to support effective instructionCCS Presentation Systems Inc.
 
Factors of Quality Education Enhancement: Review on Higher Education Practic...
 Factors of Quality Education Enhancement: Review on Higher Education Practic... Factors of Quality Education Enhancement: Review on Higher Education Practic...
Factors of Quality Education Enhancement: Review on Higher Education Practic...Research Journal of Education
 
Teacher opinions about the use of Value-Added models
Teacher opinions about the use of Value-Added models Teacher opinions about the use of Value-Added models
Teacher opinions about the use of Value-Added models llee18
 
Motivational characteristics of e-learning students
Motivational characteristics of e-learning studentsMotivational characteristics of e-learning students
Motivational characteristics of e-learning studentsKatarina Karalic
 
CERA 17: District Program Evaluation to Improve RTI/MTSS
CERA 17: District Program Evaluation to Improve RTI/MTSSCERA 17: District Program Evaluation to Improve RTI/MTSS
CERA 17: District Program Evaluation to Improve RTI/MTSSChristopher Kolar
 
Improving adolescent literacy
Improving adolescent literacyImproving adolescent literacy
Improving adolescent literacygnonewleaders
 
The Empirical Analysis of Curriculum Quality Evaluation Based on Students Eva...
The Empirical Analysis of Curriculum Quality Evaluation Based on Students Eva...The Empirical Analysis of Curriculum Quality Evaluation Based on Students Eva...
The Empirical Analysis of Curriculum Quality Evaluation Based on Students Eva...ijtsrd
 
Week1 Assessment Overview
Week1 Assessment OverviewWeek1 Assessment Overview
Week1 Assessment OverviewIPT652
 
Assessing the Benefits of Extended Learning Programs_Vincent_Hamm
Assessing the Benefits of Extended Learning Programs_Vincent_HammAssessing the Benefits of Extended Learning Programs_Vincent_Hamm
Assessing the Benefits of Extended Learning Programs_Vincent_HammVince Hamm
 
Connecticut mesuring and modeling growth
Connecticut   mesuring and modeling growthConnecticut   mesuring and modeling growth
Connecticut mesuring and modeling growthJohn Cronin
 
Connecticut mesuring and modeling growth
Connecticut   mesuring and modeling growthConnecticut   mesuring and modeling growth
Connecticut mesuring and modeling growthJohn Cronin
 
Connecticut mesuring and modeling growth
Connecticut   mesuring and modeling growthConnecticut   mesuring and modeling growth
Connecticut mesuring and modeling growthJohn Cronin
 
Course Evaluation Poster
Course Evaluation PosterCourse Evaluation Poster
Course Evaluation PosterBridget Hanley
 
AN INTEGRATED SYSTEM FRAMEWORK FOR PREDICTING STUDENTS’ ACADEMIC PERFORMANCE ...
AN INTEGRATED SYSTEM FRAMEWORK FOR PREDICTING STUDENTS’ ACADEMIC PERFORMANCE ...AN INTEGRATED SYSTEM FRAMEWORK FOR PREDICTING STUDENTS’ ACADEMIC PERFORMANCE ...
AN INTEGRATED SYSTEM FRAMEWORK FOR PREDICTING STUDENTS’ ACADEMIC PERFORMANCE ...ijcsit
 
Performance assessment
Performance assessmentPerformance assessment
Performance assessmentXINYOUWANZ
 

What's hot (19)

Using e instruction’s® cps™ to support effective instruction
Using e instruction’s® cps™ to support effective instructionUsing e instruction’s® cps™ to support effective instruction
Using e instruction’s® cps™ to support effective instruction
 
Factors of Quality Education Enhancement: Review on Higher Education Practic...
 Factors of Quality Education Enhancement: Review on Higher Education Practic... Factors of Quality Education Enhancement: Review on Higher Education Practic...
Factors of Quality Education Enhancement: Review on Higher Education Practic...
 
Teacher opinions about the use of Value-Added models
Teacher opinions about the use of Value-Added models Teacher opinions about the use of Value-Added models
Teacher opinions about the use of Value-Added models
 
Motivational characteristics of e-learning students
Motivational characteristics of e-learning studentsMotivational characteristics of e-learning students
Motivational characteristics of e-learning students
 
CERA 17: District Program Evaluation to Improve RTI/MTSS
CERA 17: District Program Evaluation to Improve RTI/MTSSCERA 17: District Program Evaluation to Improve RTI/MTSS
CERA 17: District Program Evaluation to Improve RTI/MTSS
 
Improving adolescent literacy
Improving adolescent literacyImproving adolescent literacy
Improving adolescent literacy
 
Data and Assessment in Academic Libraries: Linking Freshmen Student Success a...
Data and Assessment in Academic Libraries: Linking Freshmen Student Success a...Data and Assessment in Academic Libraries: Linking Freshmen Student Success a...
Data and Assessment in Academic Libraries: Linking Freshmen Student Success a...
 
The Empirical Analysis of Curriculum Quality Evaluation Based on Students Eva...
The Empirical Analysis of Curriculum Quality Evaluation Based on Students Eva...The Empirical Analysis of Curriculum Quality Evaluation Based on Students Eva...
The Empirical Analysis of Curriculum Quality Evaluation Based on Students Eva...
 
2013 farkas
2013 farkas2013 farkas
2013 farkas
 
Week1 Assessment Overview
Week1 Assessment OverviewWeek1 Assessment Overview
Week1 Assessment Overview
 
Assessing the Benefits of Extended Learning Programs_Vincent_Hamm
Assessing the Benefits of Extended Learning Programs_Vincent_HammAssessing the Benefits of Extended Learning Programs_Vincent_Hamm
Assessing the Benefits of Extended Learning Programs_Vincent_Hamm
 
Connecticut mesuring and modeling growth
Connecticut   mesuring and modeling growthConnecticut   mesuring and modeling growth
Connecticut mesuring and modeling growth
 
Connecticut mesuring and modeling growth
Connecticut   mesuring and modeling growthConnecticut   mesuring and modeling growth
Connecticut mesuring and modeling growth
 
Connecticut mesuring and modeling growth
Connecticut   mesuring and modeling growthConnecticut   mesuring and modeling growth
Connecticut mesuring and modeling growth
 
Aste2019
Aste2019Aste2019
Aste2019
 
2013 mansor et al
2013 mansor et al2013 mansor et al
2013 mansor et al
 
Course Evaluation Poster
Course Evaluation PosterCourse Evaluation Poster
Course Evaluation Poster
 
AN INTEGRATED SYSTEM FRAMEWORK FOR PREDICTING STUDENTS’ ACADEMIC PERFORMANCE ...
AN INTEGRATED SYSTEM FRAMEWORK FOR PREDICTING STUDENTS’ ACADEMIC PERFORMANCE ...AN INTEGRATED SYSTEM FRAMEWORK FOR PREDICTING STUDENTS’ ACADEMIC PERFORMANCE ...
AN INTEGRATED SYSTEM FRAMEWORK FOR PREDICTING STUDENTS’ ACADEMIC PERFORMANCE ...
 
Performance assessment
Performance assessmentPerformance assessment
Performance assessment
 

Similar to Using Common Assessment Data to Predict High Stakes Performance- An Efficient Teacher-Referent Process (1)

Chapter Three Procedures and MethodologyIntroductionThe goal o
Chapter Three Procedures and MethodologyIntroductionThe goal oChapter Three Procedures and MethodologyIntroductionThe goal o
Chapter Three Procedures and MethodologyIntroductionThe goal oJinElias52
 
Assessment Standards Are To Guide The Design Of Exemplary Plans And Practices
Assessment Standards Are To Guide The Design Of Exemplary Plans And PracticesAssessment Standards Are To Guide The Design Of Exemplary Plans And Practices
Assessment Standards Are To Guide The Design Of Exemplary Plans And Practicesnoblex1
 
Teacher evaluation presentation3 mass
Teacher evaluation presentation3  massTeacher evaluation presentation3  mass
Teacher evaluation presentation3 massJohn Cronin
 
Teacher evaluations-and-local-flexibility
Teacher evaluations-and-local-flexibilityTeacher evaluations-and-local-flexibility
Teacher evaluations-and-local-flexibilityDavid Black
 
Using tests for teacher evaluation texas
Using tests for teacher evaluation texasUsing tests for teacher evaluation texas
Using tests for teacher evaluation texasNWEA
 
www.nationalforum.com - Dr. Jeff Goldhorn, Dr. W. Sean Kearney, Dr. Michael W...
www.nationalforum.com - Dr. Jeff Goldhorn, Dr. W. Sean Kearney, Dr. Michael W...www.nationalforum.com - Dr. Jeff Goldhorn, Dr. W. Sean Kearney, Dr. Michael W...
www.nationalforum.com - Dr. Jeff Goldhorn, Dr. W. Sean Kearney, Dr. Michael W...William Kritsonis
 
Jeff Goldhorn, W. Sean Kearney, Michael Webb, National Refereed Article Publi...
Jeff Goldhorn, W. Sean Kearney, Michael Webb, National Refereed Article Publi...Jeff Goldhorn, W. Sean Kearney, Michael Webb, National Refereed Article Publi...
Jeff Goldhorn, W. Sean Kearney, Michael Webb, National Refereed Article Publi...William Kritsonis
 
Dr. Jeff Goldhorn, Dr. W. Sean Kearney, Dr. Michael Webb, NATIONAL FORUM OF E...
Dr. Jeff Goldhorn, Dr. W. Sean Kearney, Dr. Michael Webb, NATIONAL FORUM OF E...Dr. Jeff Goldhorn, Dr. W. Sean Kearney, Dr. Michael Webb, NATIONAL FORUM OF E...
Dr. Jeff Goldhorn, Dr. W. Sean Kearney, Dr. Michael Webb, NATIONAL FORUM OF E...William Kritsonis
 
Advanced Foundations and Methods in EL (Lecture 1)
Advanced Foundations and Methods in EL (Lecture 1)Advanced Foundations and Methods in EL (Lecture 1)
Advanced Foundations and Methods in EL (Lecture 1)Sandra Halajian, M.A.
 
Pro questdocuments 2015-03-16(2)
Pro questdocuments 2015-03-16(2)Pro questdocuments 2015-03-16(2)
Pro questdocuments 2015-03-16(2)Rose Jedin
 
Pro questdocuments 2015-03-16(2)
Pro questdocuments 2015-03-16(2)Pro questdocuments 2015-03-16(2)
Pro questdocuments 2015-03-16(2)Rose Jedin
 
Teaching To The Test
Teaching To The TestTeaching To The Test
Teaching To The Testnoblex1
 
The standard of teachers’ assessment practices in three domains of learning i...
The standard of teachers’ assessment practices in three domains of learning i...The standard of teachers’ assessment practices in three domains of learning i...
The standard of teachers’ assessment practices in three domains of learning i...Alexander Decker
 
test construction in mathematics
test construction in mathematicstest construction in mathematics
test construction in mathematicsAlokBhutia
 
Dr. Fred C. Lunenburg - measurement and assessment in schools schooling v1 n1...
Dr. Fred C. Lunenburg - measurement and assessment in schools schooling v1 n1...Dr. Fred C. Lunenburg - measurement and assessment in schools schooling v1 n1...
Dr. Fred C. Lunenburg - measurement and assessment in schools schooling v1 n1...William Kritsonis
 
Connecting evidence based instructional practices to rti
Connecting evidence based instructional practices to rtiConnecting evidence based instructional practices to rti
Connecting evidence based instructional practices to rtiEast Central ISD
 
Dr. Teresa Ann Hughes, PhD Dissertation Defense, Dr. William Allan Kritsonis,...
Dr. Teresa Ann Hughes, PhD Dissertation Defense, Dr. William Allan Kritsonis,...Dr. Teresa Ann Hughes, PhD Dissertation Defense, Dr. William Allan Kritsonis,...
Dr. Teresa Ann Hughes, PhD Dissertation Defense, Dr. William Allan Kritsonis,...William Kritsonis
 

Similar to Using Common Assessment Data to Predict High Stakes Performance- An Efficient Teacher-Referent Process (1) (20)

Chapter Three Procedures and MethodologyIntroductionThe goal o
Chapter Three Procedures and MethodologyIntroductionThe goal oChapter Three Procedures and MethodologyIntroductionThe goal o
Chapter Three Procedures and MethodologyIntroductionThe goal o
 
Assessment Standards Are To Guide The Design Of Exemplary Plans And Practices
Assessment Standards Are To Guide The Design Of Exemplary Plans And PracticesAssessment Standards Are To Guide The Design Of Exemplary Plans And Practices
Assessment Standards Are To Guide The Design Of Exemplary Plans And Practices
 
Teacher evaluation presentation3 mass
Teacher evaluation presentation3  massTeacher evaluation presentation3  mass
Teacher evaluation presentation3 mass
 
Teacher evaluations-and-local-flexibility
Teacher evaluations-and-local-flexibilityTeacher evaluations-and-local-flexibility
Teacher evaluations-and-local-flexibility
 
Using tests for teacher evaluation texas
Using tests for teacher evaluation texasUsing tests for teacher evaluation texas
Using tests for teacher evaluation texas
 
www.nationalforum.com - Dr. Jeff Goldhorn, Dr. W. Sean Kearney, Dr. Michael W...
www.nationalforum.com - Dr. Jeff Goldhorn, Dr. W. Sean Kearney, Dr. Michael W...www.nationalforum.com - Dr. Jeff Goldhorn, Dr. W. Sean Kearney, Dr. Michael W...
www.nationalforum.com - Dr. Jeff Goldhorn, Dr. W. Sean Kearney, Dr. Michael W...
 
Jeff Goldhorn, W. Sean Kearney, Michael Webb, National Refereed Article Publi...
Jeff Goldhorn, W. Sean Kearney, Michael Webb, National Refereed Article Publi...Jeff Goldhorn, W. Sean Kearney, Michael Webb, National Refereed Article Publi...
Jeff Goldhorn, W. Sean Kearney, Michael Webb, National Refereed Article Publi...
 
Dr. Jeff Goldhorn, Dr. W. Sean Kearney, Dr. Michael Webb, NATIONAL FORUM OF E...
Dr. Jeff Goldhorn, Dr. W. Sean Kearney, Dr. Michael Webb, NATIONAL FORUM OF E...Dr. Jeff Goldhorn, Dr. W. Sean Kearney, Dr. Michael Webb, NATIONAL FORUM OF E...
Dr. Jeff Goldhorn, Dr. W. Sean Kearney, Dr. Michael Webb, NATIONAL FORUM OF E...
 
Assessment 101 Parts 1 & 2
Assessment 101 Parts 1 & 2Assessment 101 Parts 1 & 2
Assessment 101 Parts 1 & 2
 
Sip manual
Sip manualSip manual
Sip manual
 
Advanced Foundations and Methods in EL (Lecture 1)
Advanced Foundations and Methods in EL (Lecture 1)Advanced Foundations and Methods in EL (Lecture 1)
Advanced Foundations and Methods in EL (Lecture 1)
 
Pro questdocuments 2015-03-16(2)
Pro questdocuments 2015-03-16(2)Pro questdocuments 2015-03-16(2)
Pro questdocuments 2015-03-16(2)
 
Pro questdocuments 2015-03-16(2)
Pro questdocuments 2015-03-16(2)Pro questdocuments 2015-03-16(2)
Pro questdocuments 2015-03-16(2)
 
Va 101 ppt
Va 101 pptVa 101 ppt
Va 101 ppt
 
Teaching To The Test
Teaching To The TestTeaching To The Test
Teaching To The Test
 
The standard of teachers’ assessment practices in three domains of learning i...
The standard of teachers’ assessment practices in three domains of learning i...The standard of teachers’ assessment practices in three domains of learning i...
The standard of teachers’ assessment practices in three domains of learning i...
 
test construction in mathematics
test construction in mathematicstest construction in mathematics
test construction in mathematics
 
Dr. Fred C. Lunenburg - measurement and assessment in schools schooling v1 n1...
Dr. Fred C. Lunenburg - measurement and assessment in schools schooling v1 n1...Dr. Fred C. Lunenburg - measurement and assessment in schools schooling v1 n1...
Dr. Fred C. Lunenburg - measurement and assessment in schools schooling v1 n1...
 
Connecting evidence based instructional practices to rti
Connecting evidence based instructional practices to rtiConnecting evidence based instructional practices to rti
Connecting evidence based instructional practices to rti
 
Dr. Teresa Ann Hughes, PhD Dissertation Defense, Dr. William Allan Kritsonis,...
Dr. Teresa Ann Hughes, PhD Dissertation Defense, Dr. William Allan Kritsonis,...Dr. Teresa Ann Hughes, PhD Dissertation Defense, Dr. William Allan Kritsonis,...
Dr. Teresa Ann Hughes, PhD Dissertation Defense, Dr. William Allan Kritsonis,...
 

Using Common Assessment Data to Predict High Stakes Performance- An Efficient Teacher-Referent Process (1)

  • 1. Using Common Assessment Data to Predict High Stakes Performance: An Efficient Teacher-Referent Process Please address all correspondence to: Bethany Silver Capitol Region Education Council Division of Teaching and Learning 111 Charter Oak Ave Hartford, CT 06106 860-268-2189 bethanysilver@yahoo.com Paper presented at the Annual Meeting of the American Educational Research Association New York, New York, March 26, 2008
  • 2. Using Common Assessment Data to Predict High Stakes Performance: An Efficient Teacher-Referent Process Bethany Silver Colleen Palmer Frances DiFiore Capitol Region Education Council Hartford, Connecticut 2
  • 3. School districts across the nation routinely commit resources and instructional time to the creation, implementation, scoring and data entry of locally constructed assessments. The data from these instructionally useful tools can be used to not only monitor learning growth, but also to estimate high stakes performance. This paper describes the process employed by a small Northeastern school district to examine the relationship between student performance on common district benchmark assessments and high stakes achievement. 3
  • 4. The development of district-wide common assessments is a practice in which many districts engage, with Reeves (2000, 2002a, 2002b) and Ainsworth and Viegut (2006) leading the literature in support of this process. Typically, grade-level teacher leaders converge on the state standards in small groups during the summer months and as times permits throughout the school year to create tools for monitoring and tracking learner achievement during the school year (Ainsworth & Viegut, 2006). Much of the time, each test item is carefully crafted to align with high stakes assessment questions specific to question format, difficulty, and skill, then mapped to specific standards, and reviewed by the group as a whole. The final instrument is pulled together, sometimes literally using cut and paste technology, and prepared for distribution to district schools, targeted for each grade level and classroom. At the prescribed time, the test is administered; subjected to a process for scoring, the data are collected, and compiled for the classroom, school and district. If the district has the vision and resources, there are technology tools available to electronically acquire student responses, and report them back to classroom teachers in an instructionally useful manner, while passing them forward to administrators for broader decision making responsibilities. Thomas (2005) describes six traditional functions of test results. This includes diagnosing individual strengths and weaknesses, differentiating instruction, understanding whole group strengths and weaknesses, assigning grades, and predicting performance. Thomas specifically notes, “present day high-stakes programs are also often used for predicting pupils’ future academic success” (p.81). Given the summative nature of high stakes assessment, and the regular practice of local benchmark measures of learning, the noble approach would be to support the learner with formative assessments in a manner that empowers the teacher to better prepare the student to both successfully participate in the assessment and achieve the targeted performance standard well in advance of the summative assessment. Based on Thomas’ (2005) suggestion, it is reasonable to infer that scores earned by a typical child on a district-wide common assessment would relay expectations, to some degree, regarding performance on a high stakes standardized assessments. This reasoning, however, lends itself to anecdote, and calls for greater empirical rigor. It is appropriate to investigate the relationship between high stakes and locally created common assessments using student data. This study begins to explore that relationship with a sample of approximately 1200 grade 3 4
  • 5. through 8 students in a small magnet school district in the Northeast. Three lenses are employed for this inquiry. First, correlations between high stakes and local assessments were examined. With evidence of criterion-related validity, the second approach required a six-step process employed to predict high stakes performance with stepwise multiple linear regression. These estimated scores were subjected to teacher review and refinement. The paper concludes with a discussion of the relationship between algorithm-predicted scores, teacher-revised predictions, and actual scores, as well as directions for future research and application. For the intent of this paper, the term ‘district-wide common assessment’ refers to an assessment designed for and administered to all children enrolled in a particular grade within a particular school district at pre-determined intervals during the course of a school year. Purpose During the course of the school year, the classroom teacher is likely to infer, from the child’s performance in general coursework, an approximate achievement level of their learners on the high stakes assessments. The work by DuPaul, Volpe, Jitendra, Lutz, Lorah & Gruber (2004) found teacher perceptions of academic skills to be the strongest predictor of achievement test scores. Admittedly, a single standardized assessment score is a very narrow snapshot into the academic life of a student. However, the No Child Left Behind (NCLB) Act has riveted the nation’s attention, in the form of public criticism or celebration related to the information contained on a single measure of learning. As constricting and inarticulate as that value may be, in terms of describing the capacity of the learner, it is a cultural reality in the nation’s public schools. The work of DuPaul et al. encourages the examination of the predictive utility common assessment data might offer to teachers and building administrators. The ability to confidently estimate, with relative accuracy, content mastery and learning needs with a local benchmark assessment empowers the classroom teacher and school district to proactively address learning needs in advance of the summative state assessment. This is in alignment with Stiggin’s (2005) call to clarify the intent of assessments as ‘for learning’ (classroom assessment) or ‘of learning’ (high stakes tests). Schools invest a great deal of time tracking student performance. Common assessment data is one internally controlled tool that schools can employ to formally process student attainment relative to state standards. Boudet, City, and Murnane (2007) offer four cautions 5
  • 6. related to the use of internally constructed assessments: validity and reliability challenges; disparate difficulty levels between parallel forms; standardized administration protocols; and consistent scoring procedures. When each of these concerns are addressed, the resulting system of assessment promises a resilient and reasonably credible measurement. It would be useful to classroom teachers and educational leaders to be aware of the relationship between performance recorded on a district-wide common assessment and expectations for student performance on high stakes assessments. This inquiry sought to address the following questions: Research Question 1: How well are the locally constructed district-wide common assessments scores related to high stakes assessment performances? Research Question 2: What processes can we use to make more efficient use of district- wide common assessment data in understanding the relationship between common assessment scores and high stakes assessment scores? Methodology Participants Located in the Northeastern United States, the school district enrolls approximately 3400 urban and suburban learners in eight magnet schools and five special programs. Approximately 61% of the students are minority, non-white, and 27% qualify for free or reduced price lunches. Existing data from approximately 1200 grade three through eight students enrolled in five magnet schools were used for this research. Assessment Tools There are two data sets existing for the students: local common assessment data and high stakes data. The local data is collected at the item level via hand entry for district-wide common assessments. The district assessments have been constructed by teachers employed by the school district, with the support of curriculum experts and consultants. These tests are grade-level specific, ensuring that each child has experience with the format and difficulty level appropriate to the corresponding grade level, prior to the high stakes assessment. Each item on each assessment is aligned with a standard and skill, ensuring that the results of the assessments will be instructionally useful to classroom teachers. The assessments are administered twice yearly, 6
during the fall and in January. All state standards are represented, by between 4 and 12 items, within at least one of the assessments for each grade level. Instructional staff are provided with the district-wide assessment schedule prior to the start of the school year. Assessment administration windows range from one week to 25 days, depending on the nature of the assessment and the scoring process. Most assessments contain open-ended responses that require rubric-based scoring, making the average administration and data entry time frame close to 20 days.

The students in this study also participate in yearly high stakes assessments, managed by the state department of education. This represents the second set of existing data for the subjects in this study. Annual criterion-referenced state assessments are administered to students in grades three through eight, as well as grade 10, during the month of March. For this research, only student data from grades 3-8 were included. Results are provided by grade level and content area, with specific mastery performance information for each learner relative to the 25 math strands, four reading strands, the TASA Degrees of Reading Power (DRP), and three writing strands. Students also receive summary score information for each content area in the form of a scale score. Scale scores range from 100 to 400 and are divided into five levels of performance, with '5' representing the most accomplished performance and '1' the least skilled. The score ranges for each level vary by grade, with a level of 3 or higher reflecting the NCLB Adequate Yearly Progress (AYP) criterion for the content area.

Procedure

A single data file containing scores from the March 2006 high stakes assessment and from the Fall 2005 and January 2006 district-wide common assessments was used to explore the correlations between the two test formats. Appendix A (Table 1) provides the Pearson correlation coefficients for each content area, for each grade level, and for each common assessment administration. Note that not all schools participate in every administration of every test.
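The paper does not describe the district's file layout or software, so the following is only a minimal sketch of how the correlation step described above might be reproduced; the file name and column names (e.g., fall_math_pct, math_scale_2006) are hypothetical, and Python with pandas and SciPy is used purely for illustration.

```python
# Hypothetical sketch of the correlation analysis described in the Procedure.
# File and column names are illustrative; the district's actual data layout
# and tools are not specified in the paper.
import pandas as pd
from scipy.stats import pearsonr

df = pd.read_csv("merged_assessment_data.csv")  # one row per student (hypothetical file)

for grade, subset in df.groupby("grade"):
    # Keep students with both the Fall 2005 common assessment score
    # (as a percentage) and the March 2006 Math scale score.
    pair = subset.dropna(subset=["fall_math_pct", "math_scale_2006"])
    if len(pair) < 2:
        continue
    r, p = pearsonr(pair["fall_math_pct"], pair["math_scale_2006"])
    print(f"Grade {grade}: r = {r:.2f}, p = {p:.4f}, n = {len(pair)}")
```

The same loop could be repeated for the January administration and for the reading and DRP scores to populate a table like the one in Appendix A.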
Results: Research Question One

The first research question sought to clarify the strength of the relationship between high stakes and local assessment performance. To this end, correlation coefficients were used to describe the degree of relationship between the two types of assessment in both reading and mathematics. All coefficients were statistically significant.

In mathematics, the correlation coefficients between locally constructed benchmark assessment performance and high stakes achievement ranged from .74 to .91 (p < .01) when the local data were matched to high stakes performance.

In reading, two data elements are used to define grade-level proficiency. The high stakes scale score is a composite consisting of scores from both the Reading Comprehension section and the Degrees of Reading Power (DRP) unit score (TASA, 2007); these are combined and then transformed into a Reading Scale Score (CSDE, 2007). To emulate this approach, the district employed a common assessment score that simulates the Reading Comprehension segment of the high stakes assessment, along with a separate DRP test. Correlation coefficients between the common assessment scores and high stakes scores ranged from .44 to .76 (p < .01). For the DRP administrations, the correlations between locally administered and state-administered tests ranged from .41 to .88 (p < .01), depending on the grade level and administration time.

Methodology: Research Question Two

A six-step process was employed to predict high stakes performance using district-wide common assessment data. Appendix B offers a visual model of the entire six-step process.

Step One: Identify predictor variables

The high stakes assessment data were stratified by grade level, and a stepwise multiple linear regression was employed to identify, for each grade level, the variables that explained the greatest amount of variance in the content area scale scores for Mathematics and Reading. For each grade level and content area, the proportion of variance explained by each variable was documented.
Respondent data were monitored carefully for outlying cases, which occurred for every grade level and content area, for performances both above and below three standard deviations from the mean. These cases were excluded from the analyses; they did not reveal any systematic patterns and represented less than 5% of the overall sample for each grade level. At the close of Step One, a maximum of three predictor variables had been identified for each grade level and content area.

Once the final regressions were run, strand scores were converted to proportions for each test taker. This was performed because the district-wide common assessments did not perfectly match the item quantity or mastery criterion of the high stakes assessment. By converting student scores to percentages, for both internal assessments and high stakes assessments, the multiple regression output became a tool that could be applied directly to the common assessment data.

Step Two: Use the regression output to generate algorithms

The output provided regression coefficients for each of the analyses. These were used to create algorithms based on the multiple regression equation Ŷ = a + b1X1 + b2X2 + b3X3 + e (Tabachnick & Fidell, 2007), where Ŷ is the predicted value, a is the intercept, bi is the regression coefficient, and Xi is the corresponding indicator value for the individual participant. The error term, represented as 'e', was not used in the actual calculation of estimated scale scores for each participant; however, the initial information presented to building administrators did reveal the error term values. For example, a school-specific equation for the Mathematics scale score of grade 3 students was shared as follows:

Mathematics Scale = 34.91 + .475(Reading Scale) + 11.98(Math Strand 15) + 11.41(Math Strand 6) + 18.39

where the final value, 18.39, is the error term. This regression explained 87.2% of the variance in the Mathematics Scale Score for grade 3 students at a specific school. Appendix C offers a sample of the materials that school principals were provided relative to the multiple regression process.

Step Three: Integrate district-wide common assessment scores into the regression algorithms

The third step in the prediction process was to locate, within the district-wide common assessment data, the student data corresponding to each of the strands identified as predictive of a scale score, and then to integrate those values into the equations. Appendix D presents the indicators that were employed across the school district data by grade level and content area.
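To illustrate Steps Two and Three, the sketch below applies the grade 3 Mathematics equation quoted above to hypothetical student records. As described in the text, the error term (18.39) was shared with administrators but omitted from each student's point estimate. The column names, the example values, and the assumption that strand scores enter the equation as proportions between 0 and 1 are illustrative only.

```python
# Hypothetical sketch of generating 'machine estimates' from the grade 3
# Mathematics equation quoted above. Input values and column names are
# illustrative; whether strand scores enter as 0-1 proportions or under
# another scaling is an assumption made here for the example.
import pandas as pd

students = pd.DataFrame({
    "student": ["A", "B"],
    "reading_scale": [245.0, 198.0],  # estimated Reading scale score
    "m15": [0.75, 0.50],              # Math Strand 15, proportion correct
    "m6":  [1.00, 0.67],              # Math Strand 6, proportion correct
})

# Y-hat = a + b1*X1 + b2*X2 + b3*X3; the error term is not added to the estimate.
students["math_scale_estimate"] = (
    34.91
    + 0.475 * students["reading_scale"]
    + 11.98 * students["m15"]
    + 11.41 * students["m6"]
)

print(students[["student", "math_scale_estimate"]])
```

In practice, the district would substitute each student's actual common assessment values for the strands identified in Appendix D before applying the grade-level equations.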
Step Four: Generate, distribute, and collect teacher validation reports

Estimates for each student in each content area were generated from the multiple regression output. Once these predicted values were derived, teacher feedback was sought regarding the accuracy of the estimates. Teacher rosters were printed that provided student names and the corresponding estimates, and teachers were asked to critically evaluate the estimate for each learner in each content area. The estimation process was described to teachers as performed by a machine using a mathematical equation applied to district-wide common assessment data, and these estimates were referred to as 'Machine Estimates' in the literature provided to teachers. Classroom teachers were asked to review the estimates carefully and apply their professional expertise and knowledge of the learner to revise the preliminary information that the equation had generated. Specifically, teachers rated each 'Machine Estimate' as 'Too Low', 'On Target', or 'Too High'; when a teacher identified an estimate that was not 'On Target', a revised level was requested. These revised values are termed 'teacher guided estimates' in this work. A sample of the teacher roster is included in Appendix E.

Once all of the rosters were returned and the data entered, machine estimates and teacher guided estimates were compared. Machine estimates for both math and reading were highly correlated with the corresponding teacher guided estimates (Math Level, r = .79, p < .01, n = 1233; Reading Level, r = .85, p < .01, n = 1245). However, when examined with a paired comparison, teacher guided estimates in reading were significantly higher than the machine estimates (t = 18.524, df = 1244, p < .01), by approximately .396 levels on a 5-point scale (a sketch of this comparison appears following Step Five, below).

Step Five: Use teacher guided estimates to predict Adequate Yearly Progress (AYP) for the schools and school district

Teacher guided estimates were employed in March 2007 to apprise central office staff and building administrators of potential AYP issues. A formal report was made to district leadership, and proactive planning began regarding areas of challenge for specific schools and subgroups.
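The comparison described in Step Four amounts to a correlation and a paired t-test between the two sets of estimates. A minimal sketch is shown below; the file and column names are hypothetical, and the paper does not specify the software actually used.

```python
# Hypothetical sketch of the Step Four comparison: correlation and a paired
# t-test between machine estimates and teacher guided estimates, recorded as
# 1-5 performance levels. File and column names are illustrative only.
import pandas as pd
from scipy.stats import pearsonr, ttest_rel

rosters = pd.read_csv("teacher_validation_rosters.csv")  # hypothetical file
paired = rosters.dropna(subset=["machine_reading_level", "teacher_reading_level"])

r, p_r = pearsonr(paired["machine_reading_level"], paired["teacher_reading_level"])
t, p_t = ttest_rel(paired["teacher_reading_level"], paired["machine_reading_level"])
mean_revision = (paired["teacher_reading_level"] - paired["machine_reading_level"]).mean()

print(f"Reading: r = {r:.2f} (p = {p_r:.4f}), "
      f"paired t = {t:.2f} (p = {p_t:.4f}), "
      f"mean teacher revision = {mean_revision:+.3f} levels (n = {len(paired)})")
```

The same comparison can be repeated for the mathematics levels, and later against the attained high stakes levels once they are available.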
Step Six: Report back to teachers, administrators, and staff regarding expected AYP performance

Testing concluded at the close of March, and high stakes assessment results were obtained on June 28, 2007. The attained data were then compared to the teacher guided estimates and machine estimates.

Results: Research Question Two

Correlations were used to examine the degree of relationship between teacher guided estimates, machine estimates, and attained high stakes levels. The data are presented in Table 2.

Table 2: Correlations between Estimated and Attained Performance

                          Mathematics Attained Score    Reading Attained Score
Teacher Guided Estimate   .72** (N = 1242)              .63** (N = 1238)
Machine Estimate          .62** (N = 1234)              .66** (N = 1243)
**p < .01

Discussion

This line of inquiry has just begun. Many assumptions accompany this process and need to be identified and refined. As data stores increase and student information is captured systematically and methodically, the accuracy of the process can increase, as can the sophistication of the various models used to estimate performance.

With regard to question one (How well are locally constructed district-wide common assessment scores related to high stakes assessment performance?), we observed that the overall scores for the common assessments did, in most cases, align well with student high stakes performance. Work needs to continue on the refinement of these instruments so that the calibration improves. However, given that all correlation coefficients were significant, and some were very strong, it is reasonable to celebrate the current state of the local assessment tools for this school district. Future work needs to occur with those instruments and in those settings where there is a lack of alignment to standards. To facilitate this work, it would be appropriate to determine the correlation between each of the sub-scales on the district-wide common assessments and the corresponding obtained high stakes strand scores. Those assessments that do not correlate well with high stakes data may indicate a need for deeper reflection and examination. A lack of
criterion-related validity might be a factor to consider. However, with locally constructed assessments there are many more assumptions that would benefit from careful scrutiny. These relate to the timing of the assessment, the quality of the selected reading passages, the similarity of question format, and the quality of item writing. Most notably, the gap between the fall administration of the common assessments and the administration of the high stakes assessment in March reflects a long span of academic opportunity. A weaker correlation between the fall and spring observations invites the speculation that learning has occurred, which is in line with the intent of the educational environment and with using the fall assessment for instructional purposes. In fact, a high positive correlation between the fall and spring measures could be interpreted as evidence of a learning environment that failed to serve the student adequately: while a student may not score well on one or more areas of the common assessments in the fall, that same student may have mastered those concepts and skills by the March administration of the high stakes assessments. In a similar manner, the shorter interval from the January assessment window may affect correlation outcomes, perhaps to a lesser degree. Going forward, further analysis may reveal benchmarks of student learning throughout the year, across the compendium of common assessments, which indicate that the student is "on track" for readiness in March. This benchmark approach would enable educators, at key points during the year, to accurately gauge progress toward content mastery.

A point of curiosity was the statistically significant upward revision that teachers made to the reading estimates, and the resulting correlations between teacher guided estimates, machine estimates, and attained scores. Recall that teachers made significantly more upward revisions to the reading scores estimated by the machine process. Yet when the attained scores became available, the machine estimates showed a stronger relationship with attained reading performance (r = .66, p < .01) than the teacher guided estimates did (r = .63, p < .01).

Conclusions

Translating the usefulness and validity of the predictive modeling process to the level of classroom teachers required thoughtful implementation. Most classroom teachers have little interest in comprehending the generation of multiple regression predictive models, but they are interested in reliable models that inform them of what students know and when. Integrating this
highly theoretical approach into the world of practitioners necessitated a two-phase strategic implementation plan. The first phase was to provide an overview to the administrative team, whose members would be responsible for introducing the modeling framework to frontline teachers. These administrators were charged with designing a customized implementation plan for their respective sites, requiring the same core of basic information to be communicated but allowing for flexibility in how to support each faculty's use of the model. Each administrator was empowered to create a plan for his or her site that provided enough technical information from the viewpoint of a practitioner, but in a manner that also highlighted the professional judgment of each teacher. The second phase of the rollout occurred at the building level, with each administrator implementing the plan to ensure: 1) a general understanding of the overall process, 2) an understanding of the purpose of the predictive modeling, 3) an elimination, or at least a reduction, of cultural resistance, and 4) a clear articulation of the value of each teacher's individual professional assessment.

The outcome of the implementation was markedly successful in several ways: 1) teachers completed the task with the utmost seriousness and purpose, 2) there was general acceptance by teachers of the merit of predictive modeling, 3) once the correlation studies were completed, teachers received usable feedback and validation regarding their own predictive skills alongside those of the mathematical model, and 4) the use of district-wide common assessment data to predict high stakes performance introduced an awareness of the validity of the common assessments in use in the district.

Future applications of this process include expanding the use of historical high stakes data to inform the predictive models and adopting an assessment management system that will acquire item-level data electronically, to increase data integrity and release teachers from the clerical task of data entry.
REFERENCE LIST

Ainsworth, L., & Viegut, D. J. (2006). Common formative assessments: How to connect standards-based instruction and assessment. Thousand Oaks, CA: Corwin Press.

Bureau of Student Assessment, Connecticut State Department of Education. (2007). CMT-4 technical bulletin: Calculation of scale scores for the 2007 CMT-4 administration (Form P’). Retrieved January 3, 2007, from http://www.csde.state.ct.us/public/cedar/assessment/cmt/resources/misc_cmt/cmt_technical_bulletin_2007.pdf

DuPaul, G. J., Volpe, R. J., Jitendra, A. K., Lutz, J. G., Lorah, K. S., & Gruber, R. (2004). Elementary school students with AD/HD: Predictors of academic achievement. Journal of School Psychology, 42, 285-301.

Reeves, D. B. (2000). Accountability in action: A blueprint for learning organizations. Denver, CO: Advanced Learning Press.

Reeves, D. B. (2002a). Holistic accountability: Serving students, schools, and community. Thousand Oaks, CA: Corwin Press.

Reeves, D. B. (2002b). The leader's guide to standards. San Francisco: Jossey-Bass.

Schmoker, M. (1999). Results: The key to continuous school improvement (2nd ed.). Alexandria, VA: Association for Supervision and Curriculum Development.

Stevens, J. (1996). Applied multivariate statistics for the social sciences (3rd ed.). Mahwah, NJ: Lawrence Erlbaum.

Stiggins, R. J. (2005). Student-involved assessment for learning (4th ed.). Upper Saddle River, NJ: Merrill/Prentice Hall.

Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Boston: Allyn and Bacon.

Thomas, R. M. (2005). High stakes assessment: Coping with collateral damage. Mahwah, NJ: Lawrence Erlbaum.
Appendix A: Table 1

Table 1: Correlation Coefficients

Mathematics: Correlation Coefficients
Grade Level         Fall 2005 Common Assessment           Mid-Year 2006 Common Assessment
                    r with 2006 Math Scale Score (N)      r with 2006 Math Scale Score (N)
Gr 3                .74** (144)                           .84** (144)
Gr 4                .83** (122)                           .89** (122)
Gr 5                .85** (111)                           .82** (111)
Gr 6                .82** (275)                           .76** (292)
Gr 7                .84** (296)                           .83** (293)
Gr 8                .83** (245)                           .91** (271)
District Overall*   .73** (1193)                          .77** (1233)

Reading: Correlation Coefficients – Common Assessments
Grade Level         Fall 2005 Common Assessment              January 2006 Common Assessments
                    r with 2006 Reading Scale Score (N)      r with 2006 Reading Scale Score (N)
Gr 3                .72** (139)                              .75** (141)
Gr 4                .71** (121)                              .76** (121)
Gr 5                .61** (111)                              .69** (110)
Gr 6                .64** (254)                              .61** (308)
Gr 7                .59** (269)                              .68** (304)
Gr 8                .72** (241)                              .44** (273)
District Overall*   .64** (1135)                             .57** (1257)

Reading: Correlation Coefficients – DRP
Grade Level         Fall 2005 DRP                                    January 2006 DRP
                    r with 2006 High Stakes DRP Unit Score (N)       r with 2006 High Stakes DRP Unit Score (N)
Gr 3                .79** (143)                                      .85** (43)
Gr 4                .83** (121)                                      (0)
Gr 5                .88* (111)                                       .87** (43)
Gr 6                .41** (263)                                      (0)
Gr 7                .79** (173)                                      (0)
Gr 8                .87** (153)                                      (0)
District Overall    .63** (964)                                      .88** (85)

**p < .01. *District Overall employed percentage scores instead of raw values.
Appendix B: Visual Model of the Prediction Process
Appendix C: Sample Building Administrator Initial Regression Materials

Mathematics Scale Score (equation form: Ŷ = a + b1X1 + b2X2 + b3X3 + e)

Grade 3 — variables in order of influence (proportion of variance explained):
a. Reading Scale Score (69.3%)
b. M15: Geometry and Measurement - Approximating Measures (12.2%)
c. M6: Numerical and Proportional Reasoning - Basic Facts (5.7%)
Total: 87.2%
MScale = 34.91 + .475(RScale) + 11.98(M15) + 11.41(M6) + 18.39

Grade 4 — variables in order of influence (proportion of variance explained):
a. M4: Numerical and Proportional Reasoning - Order, Magnitude and Rounding of Numbers (66.1%)
b. M23: Algebraic Reasoning: Patterns and Functions - Algebraic Concepts (15.2%)
c. M11: Numerical and Proportional Reasoning - Estimating Solutions to Problems (6.3%)
Total: 87.6%
MScale = 110.72 + 15.48(M4) + 14.49(M23) + 8.66(M11) + 9.33

Grade 5 — variables in order of influence (proportion of variance explained):
a. ER: Holistic Writing Score (69.9%)
b. M3: Numerical and Proportional Reasoning - Equivalent Fractions, Decimals and Percents (13.4%)
c. M24: Working with Data: Probability and Statistics - Classification and Logical Reasoning (7.9%)
Total: 89.2%
MScale = 100.43 + 2.75(ER Holistic Writing) + 12.36(M3) + 10.73(M24) + 9.98
Reading Scale Score (equation form: Ŷ = a + b1X1 + b2X2 + b3X3 + e)

Grade 3 — variables in order of influence (proportion of variance explained):
a. DRP: Degrees of Reading Power (91.8%)
b. RC_A: Reading Comprehension - Forming a General Understanding (4.1%)
c. RC_B: Reading Comprehension - Developing Interpretation (1.5%)
Total: 97.4%
RScale = 65.73 + 2.08(DRP) + 4.64(RC_A) + 3.14(RC_B) + 5.96

Grade 4 — variables in order of influence (proportion of variance explained):
a. DRP: Degrees of Reading Power (90.2%)
b. RC_D: Reading Comprehension - Examining the Content and Structure (6.3%)
c. RC_A: Reading Comprehension - Forming a General Understanding (1.4%)
Total: 97.9%
RScale = 68.93 + 1.86(DRP) + 6.06(RC_D) + 3.63(RC_A) + 4.53

Grade 5 — variables in order of influence (proportion of variance explained):
a. DRP: Degrees of Reading Power (90.8%)
b. RC_A: Reading Comprehension - Forming a General Understanding (4.3%)
c. RC_D: Reading Comprehension - Examining the Content and Structure (1.8%)
Total: 96.9%
RScale = 63.13 + 1.85(DRP) + 4.26(RC_A) + 3.66(RC_D) + 5.99
Appendix D: District Indicators

Mathematics Scale Score

Grade 3 — variables in order of influence (total variance explained: 81%):
a. M15: Geometry and Measurement - Approximating Measures
b. M25: Integrated Understandings - Mathematical Applications
c. M5: Numerical and Proportional Reasoning - Models of Operations

Grade 4 — variables in order of influence (total variance explained: 81.7%):
a. M3: Numerical & Proportional Reasoning - Equivalent Fractions, Decimals & Percents
b. M5: Numerical & Proportional Reasoning - Models of Operations
c. M25: Integrated Understandings - Mathematical Applications

Grade 5 — variables in order of influence (total variance explained: 82.6%):
a. M4: Numerical & Proportional Reasoning - Order, Magnitude & Rounding of Numbers
b. ER_CR: Composing/Revising
c. M24: Working with Data: Probability & Statistics - Classification & Logical Reasoning

Grade 6 — variables in order of influence (total variance explained: 82.5%):
a. M11: Numerical & Proportional Reasoning - Estimating Solutions to Problems
b. M2: Numerical & Proportional Reasoning - Pictorial Representations of Numbers
c. M9: Numerical & Proportional Reasoning - Solve Word Problems

Grade 7 — variables in order of influence (total variance explained: 86.4%):
a. M9: Numerical & Proportional Reasoning - Solve Word Problems
b. M11: Numerical & Proportional Reasoning - Estimating Solutions to Problems
c. M23: Algebraic Reasoning: Patterns & Functions - Algebraic Concepts

Grade 8 — variables in order of influence (total variance explained: 90.0%):
a. M11: Numerical & Proportional Reasoning - Estimating Solutions to Problems
b. M16: Geometry and Measurement - Customary and Metric Measures
c. M7: Numerical & Proportional Reasoning - Computation with Whole Numbers & Decimals
Reading Scale Score

Grade 3 — variables in order of influence (total variance explained: 98.2%):
a. DRP: Degrees of Reading Power
b. RC_A: Reading Comprehension - Forming a General Understanding
c. RC_D: Reading Comprehension - Examining the Content and Structure

Grade 4 — variables in order of influence (total variance explained: 96.7%):
a. DRP: Degrees of Reading Power
b. RC_A: Reading Comprehension - Forming a General Understanding
c. RC_C: Reading Comprehension - Making Reader/Text Connections

Grade 5 — variables in order of influence (total variance explained: 95.8%):
a. DRP: Degrees of Reading Power
b. RC_D: Reading Comprehension - Examining the Content and Structure
c. RC_A: Reading Comprehension - Forming a General Understanding

Grade 6 — variables in order of influence (total variance explained: 97.2%):
a. DRP: Degrees of Reading Power
b. RC_D: Reading Comprehension - Examining the Content and Structure
c. RC_B: Reading Comprehension - Developing Interpretation

Grade 7 — variables in order of influence (total variance explained: 97.1%):
a. DRP: Degrees of Reading Power
b. RC_D: Reading Comprehension - Examining the Content and Structure
c. RC_C: Reading Comprehension - Making Reader/Text Connections

Grade 8 — variables in order of influence (total variance explained: 96.8%):
a. DRP: Degrees of Reading Power
b. RC_D: Reading Comprehension - Examining the Content and Structure
c. RC_A: Reading Comprehension - Forming a General Understanding
Appendix E: Sample Teacher Revision Rosters