Measuring the Impact of Principals on Student Achievement

cepa.stanford.edu
Center for Education Policy Analysis at Stanford University
Using Student Test Scores to
Measure Principal Performance
Susanna Loeb
Jason A. Grissom
Demetra Kalogrides
Research supported by Institute of Education Sciences Grant R305A100286
Instituto Nacional de Evaluación Educativa (INEE), March 21 2013

Motivation
• Substantial recent policy interest in using student test
score data to evaluate the performance of school
personnel
– New York: Educational Law 3012-c (2010), 20-40%
– Louisiana: House Bill 1033 (2010), 50%
– Florida: Senate Bill 736 (2011), 50%
• A relatively large literature has focused on the issues
surrounding the use of student growth models to measure
teacher effects
• In contrast, very little research on using test scores to do
the same for principals

Why Principals?
• Principals linked to teacher satisfaction and career choices
• Principals central actors in most recent school reforms
– Accountability
– School-based budgeting
– Mutual-consent agreements (teacher hiring)
– Charter schools
• Some empirical evidence of effects on students
– Students learn more in schools where
• Principals spend more time on organizational management
• Human resource practices are more conducive to hiring, retaining,
assigning and developing good teachers.
• Increased policy attention on attracting and preparing
effective school leaders

Aren’t the principal issues like the
teacher issues?
• In some ways, yes
– Which test you use matters
– Student sorting across schools can create bias if not well
addressed
– Test measurement error, sampling error, and other shocks
introduce error in effect estimates
• But in some important ways, no
– Principal effects dispersed over entire school, so the principal can
affect a given student in more than one year
– Indirect effects on students mediated by resources only partially
under principal’s control
– For teachers, we use school fixed effects to combat sorting and
control for school contextual factors—but only 1 principal per
school

Goals of this study
1. Identify a range of possible value added-style models for
capturing principal effects using student achievement
data
2. Consider the conceptual issues associated with those
models
3. Use longitudinal test score data to estimate the different
models and examine their empirical properties
4. Compare the models’ outcomes with non-test
performance measures
– Doesn’t show which is right but gives insights into validity of
both test-score-based measures and other measures

How should we think about how
principals affect school outcomes?
• Student achievement is a function of his/her characteristics (X) and the
effectiveness of the school for him/her (S), which is in turn a function of
the performance of the school’s principal (P), and other aspects of the
school (O) that aren’t under the principal’s control
• How you measure principal effects depends on your beliefs about S
• Two main issues
1. Timing: Do you expect that principal performance is reflected in student
outcomes immediately (e.g., assigning teachers where they can be most
effective, pushing everyone to work hard), or could good performance
take time to show up (e.g., through teacher recruitment, changing the
school culture)?
2. School effects: How important is distinguishing P from O?
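One compact way to write the relationship described on this slide, as a sketch only; the function names f and g are notation assumed here for illustration:

```latex
A = f\bigl(X,\; S\bigr), \qquad S = g\bigl(P,\; O\bigr)
```

In this notation, the timing question asks whether S responds only to current P or also to P in earlier years, and the school-effects question asks how much of S runs through O rather than P.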

Approach 1: If principals’ effects are
primarily immediate and most relevant
school factors are under their control
• Then principal effects are quite like teacher effects
• We would measure how much students learn while the
principal is in the school, adjusting for student backgrounds
• Estimate a VA model with a principal-by-school effect
• Assumes that all differential growth in student learning from
similar students in similar contexts is due to principal
$$A_{ispt} = \beta_1 A_{is(t-1)} + \beta_2 X_{ispt} + \beta_3 S_{spt} + \beta_4 C_{spt} + \tau_y + \gamma_g + \delta_{sp} + \varepsilon_{ispt}$$
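As a rough illustration of Approach 1, the sketch below fits a stripped-down version of the model with a single control (the prior score) and a principal-by-school fixed effect. The function name, the tuple layout, and the toy data are hypothetical, and the student, class, and school controls and the grade and year effects are left out for brevity.

```python
from collections import defaultdict

def principal_by_school_effects(rows):
    """Fit score = beta * prior + delta[school, principal] by within-group OLS.

    rows: iterable of (school, principal, prior_score, score) tuples.
    Returns (beta, {(school, principal): delta}).
    """
    groups = defaultdict(list)
    for school, principal, prior, score in rows:
        groups[(school, principal)].append((prior, score))

    # Group means of the prior score and the current score.
    means = {}
    for key, obs in groups.items():
        n = len(obs)
        means[key] = (sum(p for p, _ in obs) / n, sum(s for _, s in obs) / n)

    # Slope on the prior score, using deviations from group means.
    num = den = 0.0
    for key, obs in groups.items():
        mean_prior, mean_score = means[key]
        for prior, score in obs:
            num += (prior - mean_prior) * (score - mean_score)
            den += (prior - mean_prior) ** 2
    beta = num / den

    # Each fixed effect is the group's mean score net of its mean prior score.
    effects = {k: ms - beta * mp for k, (mp, ms) in means.items()}
    return beta, effects

# Toy data (hypothetical): true beta = 0.5, no noise.
TRUE_EFFECTS = {("S1", "P1"): 0.2, ("S1", "P2"): -0.1, ("S2", "P3"): 0.0}
toy_rows = [
    (s, p, prior, 0.5 * prior + d)
    for (s, p), d in TRUE_EFFECTS.items()
    for prior in (-1.0, 0.0, 1.0)
]
beta_hat, effects_hat = principal_by_school_effects(toy_rows)
```

With noise-free toy data the function recovers the slope and the three effects; with real data each effect would also carry sampling error.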

Many alternative models with the
same general idea
• Variation in how to separate principal/school effects
from differences in student population
– Could use additional prior scores
– Could instrument prior scores, adjusting for measurement
error
– Could include student fixed effects
– Could include non-linear school controls
– Could include neighborhood characteristics
– Could estimate in two stages
• Big debate currently in US about teacher value-added.
• In the first stage, control for school characteristics and save the residuals;
in the second stage, estimate principal-by-school effects from those residuals
• Also, could omit the control for the immediate prior score (if an earlier
score is available)

Drawback of Approach 1
• Attributes the entire school to the principal
– But this might not be fair—100% of school quality
probably not attributable to current principal
• A principal who steps into a new school inherits teachers
hired by someone else (78% in our data)
• Administrative team also (67%)
• Alternative: If principals’ effects are primarily
immediate but principals start with very different
schools and these differences are not under their
control

2. Relative School Effectiveness
• Potential solution: Compare the relative
effectiveness of the school during the principal’s
tenure to the effectiveness of the same school at
other times by adding a school fixed effect:
– Principal and School fixed effects
– Identify school effects by differences in principals who
move across schools
– Identify principal effects by comparing principals serving
in the same school
$$A_{ispt} = \beta_1 A_{is(t-1)} + \beta_2 X_{ispt} + \beta_3 S_{spt} + \beta_4 C_{spt} + \tau_y + \gamma_g + \phi_s + \delta_p + \varepsilon_{ispt}$$
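To illustrate the within-school comparison, the sketch below takes principal-by-school effectiveness estimates as given and re-expresses each one as a deviation from the school's average across its principals. This is a simplification of the fixed-effects model, not the estimator itself: it weights principals equally and ignores the information that movers provide for linking schools. The function name and input layout are hypothetical.

```python
from collections import defaultdict

def relative_effects(effects):
    """Express each principal's effect relative to the school's other principals.

    effects: {(school, principal): effectiveness estimate}.
    Schools with a single principal are dropped, since no within-school
    comparison is available for them.
    """
    by_school = defaultdict(dict)
    for (school, principal), value in effects.items():
        by_school[school][principal] = value

    relative = {}
    for school, per_principal in by_school.items():
        if len(per_principal) < 2:
            continue  # nothing to compare against in this school
        school_mean = sum(per_principal.values()) / len(per_principal)
        for principal, value in per_principal.items():
            relative[(school, principal)] = value - school_mean
    return relative

# Toy input (hypothetical numbers).
toy_effects = {("S1", "P1"): 0.30, ("S1", "P2"): 0.10, ("S2", "P3"): 0.25}
toy_relative = relative_effects(toy_effects)
```

In the toy input, the principal in S2 receives no estimate at all, which mirrors the coverage problem of this approach.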

Similarly, many alternative
specifications but the idea is the same
• Could make similar adjustments as in Approach 1
– More prior test scores…
• Could run in two stages
– First estimate school effects
– Then estimate principals (or principal by school) effects
from the residuals
– Then principal effects would sum to one in a school
– Allows for a principal by school effect
• Could control for prior school effectiveness instead of
including a school fixed effect
– Measurement bias but more variation (fewer small group
comparisons).
– We do this (Model 2B), but the control is weak, so results are similar to the first approach.

Drawbacks of Approach 2
• Need a long data stream so schools can have many
principals
– Not estimable when only 1 principal (in our data = 38% of
schools) and no principal movement across schools
– If only 1 additional principal, comparison is implicitly just to
that person, which is difficult to justify (34%)
• Schools change over time, including changes in school
composition and shocks to the culture.
– Approach 2 may then still have omitted variables problems
– Most used by researchers but seems to be easily dismissed by
practitioners
• Alternative to Approaches 1 and 2: If principals’ effects
accumulate over time
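A quick way to check how binding the data requirement is in a given district is to tabulate how many distinct principals each school had over the panel. The sketch below is a hypothetical helper with made-up toy spells; it does not check for principal movement across schools, which can also make effects estimable.

```python
from collections import Counter, defaultdict

def principals_per_school_shares(spells):
    """Share of schools by number of distinct principals observed.

    spells: iterable of (school, principal) pairs, one per observed spell.
    Returns {number_of_principals: share_of_schools}.
    """
    principals = defaultdict(set)
    for school, principal in spells:
        principals[school].add(principal)

    counts = Counter(len(p) for p in principals.values())
    total = len(principals)
    return {n: c / total for n, c in sorted(counts.items())}

# Toy spells (hypothetical): four schools, seven principals.
toy_spells = [
    ("S1", "P1"), ("S1", "P2"),
    ("S2", "P3"),
    ("S3", "P4"), ("S3", "P5"), ("S3", "P6"),
    ("S4", "P7"),
]
shares = principals_per_school_shares(toy_spells)
```

Here half of the toy schools have only one principal, so a within-school comparison would be unavailable for them.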

Approach 3: School Improvement
• Prior models do not explicitly incorporate long-run
improvement
– Building a productive work environment over time might be an
important part of what a principal does
• Can incorporate improvement by estimating a principal-
specific time trend in the school and taking its coefficient as
improvement:
• Drawbacks
– Can’t estimate for principals who spend only one year in a school
– Good face validity properties but substantial data requirements
– Measurement error more of an issue—may be difficult to get
reliable measures of improvement over time
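$$A_{ispt} = \beta_1 A_{is(t-1)} + \beta_2 X_{ispt} + \beta_3 S_{spt} + \beta_4 C_{spt} + \tau_y + \gamma_g + \delta_{sp} + \lambda_{sp} T_{spt} + \varepsilon_{ispt}$$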
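The improvement idea can be sketched as a slope: regress a school's yearly adjusted achievement on calendar year within one principal's tenure and read the slope as improvement. The function below is a hypothetical simplification that works on yearly means rather than student-level data; the three-year minimum mirrors the restriction the deck states for Model 3A, and it assumes the years are distinct.

```python
def tenure_trend(yearly_means, min_years=3):
    """OLS slope of adjusted achievement on year for one principal-school spell.

    yearly_means: list of (year, mean adjusted score) with distinct years.
    Returns None when fewer than min_years years are observed.
    """
    n = len(yearly_means)
    if n < min_years:
        return None
    mean_year = sum(year for year, _ in yearly_means) / n
    mean_score = sum(score for _, score in yearly_means) / n
    num = sum((year - mean_year) * (score - mean_score) for year, score in yearly_means)
    den = sum((year - mean_year) ** 2 for year, _ in yearly_means)
    return num / den

# Toy spells (hypothetical): one three-year tenure and one two-year tenure.
slope = tenure_trend([(2005, 0.00), (2006, 0.05), (2007, 0.10)])
too_short = tenure_trend([(2005, 0.00), (2006, 0.05)])
```

With only a handful of yearly points per spell, small shocks move the slope a lot, which is the measurement-error concern raised on this slide.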

The 3 Conceptual Approaches
• Approach 1: School Effectiveness during a principal’s
tenure
– Assumes the effect is immediate and constant
– Assumes principal controls all school effects
• Approach 2: Relative School Effectiveness to other
principals serving in the same school
– Assumes the effect is immediate and constant
– Can only be estimated for some principals
– Comparison to other principals may be idiosyncratic
– Assumes no unobserved changes in schools that are outside
the principal’s control
• Approach 3: School Improvement
– Measurement error
– Need multiple years in a school

Prior Research Literature
• Studies have used student test score data to examine the impact of school
leadership on schools
– Often cross-section without proper controls
– Some longitudinal studies – estimating effects of characteristics and processes
• Only 4 value-added studies
– Coelli & Green (2012), the only published paper: high school graduation and 12th
grade final exam scores in British Columbia. No prior controls. Model like 2A
finds little effect. Model like 3A, which allows growth, finds some effect of
principals
– Branch, Hanushek, & Rivkin (2012): Model 1A (and a bit on 2A). Estimates
the magnitude of the effect. About 0.05 standard deviations in math.
– Dhuey & Smith (2012): elementary and middle schools in British Columbia.
Model 2A. Standard deviation about 0.16 in math, 0.10 in reading
– Chiang, Lipscomb, & Gill (2012): alternative approach. Tries to separate the
school effect from the principal effect using principal transitions.
• Small literature
– 2A, model with principal and school fixed effects, most popular
– Focus on separating principal from school effect
– Little systematic discussion of the merits of different approaches
– No comparison to other measures

Data for Value-Added Measures
• Longitudinal data on students and personnel in
Miami-Dade County Public Schools (M-DCPS) for
2003-04 to 2010-11 (8 years)
– 4th largest school district in U.S. (350,000 students)
– 90% black or Hispanic, 60% FRL
– Rich administrative files
• Test measure: Florida Comprehensive Assessment Test
(FCAT) in math and reading, grades 3-10
– Criterion-referenced, based on Sunshine State Standards
– Standardized within grade and school year
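Standardizing within grade and school year can be sketched as below. The record layout and function name are hypothetical, and the use of the population standard deviation is an assumption; the deck does not say which variant was used. Cells with a single score or no variation would need separate handling.

```python
from collections import defaultdict
from statistics import mean, pstdev

def standardize_within_grade_year(records):
    """Add a z-score computed within each (grade, year) cell.

    records: list of dicts with keys 'grade', 'year', 'score'.
    Returns a new list of dicts with an added 'z' key.
    """
    cells = defaultdict(list)
    for r in records:
        cells[(r["grade"], r["year"])].append(r["score"])

    # Mean and (population) standard deviation of each cell.
    stats = {cell: (mean(v), pstdev(v)) for cell, v in cells.items()}

    out = []
    for r in records:
        m, sd = stats[(r["grade"], r["year"])]
        out.append({**r, "z": (r["score"] - m) / sd})
    return out

# Toy scale scores (hypothetical values).
toy_records = [
    {"grade": 4, "year": 2010, "score": 300},
    {"grade": 4, "year": 2010, "score": 320},
    {"grade": 4, "year": 2010, "score": 340},
    {"grade": 5, "year": 2010, "score": 280},
    {"grade": 5, "year": 2010, "score": 300},
]
standardized = standardize_within_grade_year(toy_records)
```

After this step, effect sizes can be read in student-level standard deviation units, which is how the magnitudes later in the deck are expressed.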

Method
• Estimate models approximating each of the approaches
• Approach 1
– 1A: Basic Model with principal-by-school fixed effects
– Many others as specification check
• Approach 2
– 2A: Basic model with school and principal fixed effects
– 2B: Basic model with controls for school effectiveness with
other principals and principal fixed effects
– Others as specification checks
• Approach 3
– 3A: Basic model with principal-by-school fixed effects and
principal-by-school time trends. Use principal-by-school time
trend coefficients
• Only estimated if principals are observed for 3 years in the same school
– Others as specification checks

Basic Model
• X: Student Characteristics
– whether the student qualifies for free or reduced-price lunch
– whether currently classified as limited English proficient
– whether repeating the grade in which they are currently enrolled
– the number of days they missed school in a given year due to absence
or suspension (lagged)
– race
– gender
• C: Class characteristics
– Averages of all of the above
– Achievement
– Standard deviation of achievement
• S: School characteristics
– Like C but at the school level
$$A_{ispt} = \beta_1 A_{is(t-1)} + \beta_2 X_{ispt} + \beta_3 S_{spt} + \beta_4 C_{spt} + \tau_y + \gamma_g + \delta_{sp} + \varepsilon_{ispt}$$
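The class-level controls are aggregates of student records. A minimal sketch is below, using one student characteristic (free or reduced-price lunch) and the prior score; the field names are hypothetical, and treating "achievement" as the prior score is an assumption made here. School-level controls would follow the same pattern with a school identifier as the grouping key.

```python
from collections import defaultdict
from statistics import mean, pstdev

def class_controls(students):
    """Aggregate student records to class-level control variables.

    students: list of dicts with keys 'class_id', 'frl' (0 or 1), 'prior_score'.
    Returns {class_id: {'share_frl', 'mean_prior', 'sd_prior'}}.
    """
    by_class = defaultdict(list)
    for s in students:
        by_class[s["class_id"]].append(s)

    controls = {}
    for class_id, members in by_class.items():
        prior = [m["prior_score"] for m in members]
        controls[class_id] = {
            "share_frl": mean(m["frl"] for m in members),
            "mean_prior": mean(prior),
            "sd_prior": pstdev(prior),  # population SD, an assumption
        }
    return controls

# Toy class (hypothetical values).
toy_students = [
    {"class_id": "c1", "frl": 1, "prior_score": -0.5},
    {"class_id": "c1", "frl": 0, "prior_score": 0.0},
    {"class_id": "c1", "frl": 1, "prior_score": 0.5},
    {"class_id": "c1", "frl": 1, "prior_score": 1.0},
]
controls = class_controls(toy_students)
```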

Steps
1. Estimate principal effects
– Create original (unshrunk) estimates
– Shrink using Empirical Bayes approach (for some
applications)
2. Coverage
3. Magnitude of principal effects
– How important do principals look for student
achievement using each method
4. Similarities among measures. Correlations
5. Consistency across subjects and schools.
Correlations
6. Comparison to other measures
– description to come
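The shrinkage in step 1 can be sketched with a common textbook form of Empirical Bayes: each raw estimate is pulled toward the overall mean in proportion to its noise. This is an assumed variant for illustration, not necessarily the authors' exact procedure; the function name and toy numbers are hypothetical.

```python
def shrink(estimates, std_errors):
    """Empirical Bayes shrinkage of raw effects toward their mean.

    estimates: raw fixed-effect estimates, one per principal.
    std_errors: their standard errors, in the same order.
    Returns (shrunk estimates, estimated variance of the true effects).
    """
    n = len(estimates)
    grand_mean = sum(estimates) / n
    total_var = sum((e - grand_mean) ** 2 for e in estimates) / n
    noise_var = sum(se ** 2 for se in std_errors) / n
    # Variance of true effects: total variance minus average sampling variance.
    signal_var = max(total_var - noise_var, 0.0)

    shrunk = []
    for e, se in zip(estimates, std_errors):
        denom = signal_var + se ** 2
        reliability = signal_var / denom if denom > 0 else 0.0
        shrunk.append(grand_mean + reliability * (e - grand_mean))
    return shrunk, signal_var

# Toy estimates (hypothetical): the third is the noisiest.
raw = [-0.2, 0.0, 0.2]
ses = [0.1, 0.1, 0.2]
shrunk, signal_var = shrink(raw, ses)
```

In the toy example the first and third estimates are equally far from the mean, but the third has the larger standard error and is pulled further toward zero.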

Results: Coverage
• Approach 2 smaller because principal effects instead
of principal by school effects
– B smaller than A because have to adjust for prior
effectiveness
• Approach 3 smaller because need 3 years of data
Table 2. Distribution of Principal Value-Added Estimates: Sample Size (Math)
Model                                   Sample Size
1A (No Student FE)                      781
2A (Principal & School FE)              484
2B (Principal FE, Control for Prior)    353
3A (Principal by School Time Trend)     263

Network Sizes
• Need to compare principals to another group of principals and then
set the groups equal to each other
– For most models, choose to group elementary schools, middle
schools, k-8 schools, high schools, and combination schools – 5
networks
         Math                    Reading
Model    # Networks   Av. Size   # Networks   Av. Size
1A       5            156        5            159
2A       145          3          144          3
2B       5            71         5            73
3A       5            53         5            45
Note: Model 2A is setting a lot of small groups of principals equal to each other
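One plausible reading of how the many small Model 2A groups arise is that, with both school and principal fixed effects, principals are only comparable within sets linked by shared schools. Under that assumption, the groups can be found as connected components of the principal-school graph. The sketch below uses a small union-find; the function name and toy spells are hypothetical.

```python
from collections import defaultdict

def principal_networks(spells):
    """Group principals into networks linked through shared schools.

    spells: iterable of (school, principal) pairs.
    Returns a list of sets of principals, one set per connected network.
    """
    parent = {}

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for school, principal in spells:
        s_node, p_node = ("school", school), ("principal", principal)
        parent.setdefault(s_node, s_node)
        parent.setdefault(p_node, p_node)
        union(s_node, p_node)

    networks = defaultdict(set)
    for node in parent:
        if node[0] == "principal":
            networks[find(node)].add(node[1])
    return list(networks.values())

# Toy spells (hypothetical): P2 moves from S1 to S2, linking those schools.
toy_spells = [
    ("S1", "P1"), ("S1", "P2"), ("S2", "P2"), ("S2", "P3"),
    ("S3", "P4"), ("S3", "P5"),
]
networks = principal_networks(toy_spells)
```

In the toy data, the mover P2 merges schools S1 and S2 into one network of three principals, while S3 stays a separate network of two.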

Results: Magnitude
Table 2. Distribution of Principal Value-Added Estimates
Model                                     FE      EB      TRUE
Math
1A (No Student FE)                        0.109   0.095   0.105
2A (Principal & School FE)                0.064   0.058   0.059
2B (Principal FE, Control for Prior)      0.084   0.070   0.080
3A (Principal by School Time Trend)       0.058   0.032   0.050
Reading
1A (No Student FE)                        0.083   0.069   0.077
2A (Principal & School FE)                0.038   0.039   0.034
2B (Principal FE, Control for Prior)      0.065   0.055   0.061
3A (Principal by School Time Trend)       0.042   0.023   0.032
Note: School Effectiveness almost 2X as big

Results: Similarities
Math, EB Estimates
      1A      2A      2B
2A    0.45    1.00
2B    0.58    0.60    1.00
3A   -0.05    0.16    0.09
Reading, EB Estimates
      1A      2A      2B
2A    0.39    1.00
2B    0.63    0.44    1.00
3A   -0.01   -0.04    0.14
Notes:
• Improvement not at all correlated with other measures
• Controls for prior higher than effects (measurement error)
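Correlations like these are computed over principals who have an estimate under both models, since coverage differs by model. A minimal sketch, with a hypothetical function name and made-up toy estimates:

```python
from math import sqrt

def correlation_on_common(est_a, est_b):
    """Pearson correlation over principals present in both sets of estimates.

    est_a, est_b: {principal: estimate} from two different models.
    """
    common = sorted(set(est_a) & set(est_b))
    xs = [est_a[k] for k in common]
    ys = [est_b[k] for k in common]
    n = len(common)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Toy estimates (hypothetical): P4 has no estimate under the second model.
model_a = {"P1": 0.1, "P2": 0.2, "P3": 0.3, "P4": 0.5}
model_b = {"P1": 0.2, "P2": 0.4, "P3": 0.6}
model_c = {"P1": 0.6, "P2": 0.4, "P3": 0.2}
r_same = correlation_on_common(model_a, model_b)
r_opposite = correlation_on_common(model_a, model_c)
```

Because the overlap can be small for the low-coverage models, each correlation describes only the principals the two models share.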

Results: Consistency
• Across subject
– Subject to same shocks and omitted variables
• Only for 1A across schools
– Could still be due to principals moving to similar schools
Model   Across subjects   Across schools (Math)   Across schools (Reading)
1A      0.51              0.25                    0.16
2A      0.61              NA                      NA
2B      0.53              NA                      NA
3A      0.44              -0.39                   -0.11

Comparing VA Estimates to Other
Performance Measures
• Florida accountability grades (A-F)
• District’s evaluation rating of principal (4-category from
distinguished to below expectations)
• Student, staff, and parent climate survey scores (“Assign
an overall grade to school…”). Administered by M-DCPS.
– 2004-2009 school year
– 4 questions (first 3 Likert scale) collapsed to the school level
• students are safe at this school
• students are getting a good education at this school
• the overall climate at this school is positive and helps students learn
at this school.
• Assign a letter grade (A–F) to their school that captures its overall
performance.
– Factor for each group

Comparing VA Estimates to Other
Performance Measures
• Measures from our spring 2008 web-based surveys of
principals and assistant principals
– Both asked about principal performance on a list of 42 areas of
job tasks common to most principal positions, factored into 6
areas
– Principals’ overall effectiveness, plus effectiveness in
organizational management, given evidence of importance
• Performance measures contain measurement error and have no
built-in adjustments, so we model them with the following regression.
– Controls: principal race and gender, average school test scores
(from first year first grade), percent white, percent black,
percent suspended, and percent chronically absent
$$O_{ps} = \beta_1 VA_{ps} + \beta_2 X_{ps} + \beta_3 S_{ps} + \varepsilon_{ps}$$

Measure                                                     Mean    Std Dev   N     Period
Average of Ratings Received From District (1-4)             3.54    0.51      659   Average While at School
Proportion of Years Received Highest Rating From District   0.59    0.41      659   Average While at School
School Accountability Grade, 0-4 Point Scale                2.79    1.19      755   Average While at School
School Climate Scale - Student Report                       -0.16   0.99      775   Average While at School
School Climate Scale - Staff Report                         -0.17   1.05      788   Average While at School
School Climate Scale - Parent Report                        -0.21   1.05      781   Average While at School
AP Rating of Principal (Overall)                            0.01    0.86      188   2008 Survey
AP Rating of Principal (Management)                         0.02    0.91      236   2008 Survey
Principal Rating of Own Effectiveness (Overall)             -0.02   1.00      213   2008 Survey
Principal Rating of Own Effectiveness (Management)          -0.02   1.00      247   2008 Survey

Results: Comparison Summary
Columns (left to right): District Eval, FL Acct’y Grade, Student Climate, Parent Climate, Staff Climate, AP Mgmt Rating, Principal Mgmt Self-Rating
1A ++ ++ ++ ++ ++ ++ +
2A + ++ +
2B ++ ++ ++ ++ ++ +
3A

Summary
• Choice of model matters
• Need to be very thoughtful about how we model
principal effects
– School effect only: Attributes too much of the school effect
to the current principal
– Relative school effectiveness: May be biased by limited and
unequal comparison groups. Assumes all groups are equal.
Low coverage.
– School improvement: Measurement error may produce very
unreliable estimates. Low coverage.
• Simplest conceptual model has clearest relationships
with non-test performance measures
– But it isn’t clear whether that is because both are biased

Implications
• Think carefully about the use of the measure
– Is it for evaluation or research?
• Research
– Cause is the most important factor
• Worry about internal validity.
• Not clear what the best approach is. Simulations might help
• Evaluation
– Ehlert, Koedel, Parsons and Podgursky (2012) identify 3 key objectives
for an evaluation measure:
• elicit optimal effort from personnel
• improve system-wide instruction by providing useful performance signals
• avoid exacerbating pre-existing inequities in the labor markets between
advantaged and disadvantaged schools
– For this, basic school effectiveness measures (perhaps with more
complete school characteristics adjustments) may be preferable if
comparisons are made to similar schools.
• Just a start in understanding measures
– Many possible measures and comparisons

Transition (perhaps a bit forced)
• My interest in principals stemmed from their
importance for teachers.
• In particular, relatively strong evidence
suggests that principals are key to teacher
retention, and maybe to the differential
retention of highly effective teachers
• But, surprisingly little literature on whether
teacher retention matters for students…
