Why it’s time we stopped managing
schools like baseball teams
…for the most part
John Cronin, Ph.D.
Senior Director, the Kingsbury
Center at Northwest Evaluation
Association

You can view this presentation at slideshare:
http://www.slideshare.net/NWEA/schools-cant
How does it work in baseball?

In baseball, the contribution of
players to the success of the team
can be measured (value-added).
In baseball, general managers
have complete control over the
acquisition and deployment of
players.
How does it work in baseball?

Sabermetricians estimate the
number of wins a player
contributes to his team.
It’s calculated by estimating the
number of runs contributed by a
player and adding the number of
runs denied by that player’s
defensive contributions.
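As a rough illustration of that arithmetic (the player numbers and the ~10 runs-per-win conversion are common sabermetric conventions, not figures from this presentation):

```python
# Hypothetical sketch of a wins-contributed estimate: runs created on
# offense plus runs saved on defense, converted to wins using the
# common ~10 runs-per-win rule of thumb.

RUNS_PER_WIN = 10.0  # sabermetric rule of thumb; varies by era

def wins_contributed(runs_created, runs_saved_on_defense):
    """Estimate the number of wins a player contributes to his team."""
    total_runs = runs_created + runs_saved_on_defense
    return total_runs / RUNS_PER_WIN

# A hypothetical player who creates 95 runs and saves 12 on defense
print(round(wins_contributed(95, 12), 1))  # -> 10.7
```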
So what are the issues?

• We’ve confused players with
managers.
• The metrics are problematic.
• We’ve chosen the wrong focus for
policy.
Baseball hasn’t found a methodology to effectively apply sabermetrics to managers.

We assume the statistics applied to players (teachers) can be applied to their managers (principals).
How does it work in classrooms?

Brian’s students took the exam last spring.

A gain is estimated for his students. This projection may take into account his students’ past performance, their state poverty rate, and a variety of other factors.

Brian’s students’ gains on this spring’s tests are compared to this projection. If the gains exceed the projection, we say Brian produced “value-added”.

Value-added methodologies attempt
to isolate a teacher’s contribution to
learning by measuring student
growth while controlling or
eliminating factors that influence
growth but are outside the teacher’s
control, such as student poverty.
(Timeline: last spring → this spring)
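The projection-versus-gain comparison described above can be sketched in a few lines; the student gains and projections here are hypothetical:

```python
# Minimal sketch of the value-added logic: compare each student's
# actual gain to a projected gain, then average the differences.
# All scores and projections are hypothetical.

def value_added(actual_gains, projected_gains):
    """Average amount by which students beat their projections."""
    diffs = [a - p for a, p in zip(actual_gains, projected_gains)]
    return sum(diffs) / len(diffs)

# Hypothetical class; projections reflect prior performance, poverty, etc.
actual = [5.0, 3.5, 6.0, 4.5]
projected = [4.0, 4.0, 5.0, 4.0]
print(value_added(actual, projected))  # -> 0.5
```

A positive result is what the slide calls “value-added”: the class, on average, exceeded its projection.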
How does it work in classrooms?

+ .25

Brian’s students’ gains on this spring’s tests are compared to this projection. If the gains exceed the projection, we say Brian produced “value-added”.

Brian’s gain is compared to that of other teachers, and he is typically assigned a “z score”, a metric that shows where he stands relative to other teachers in the state.

(Timeline: last spring → this spring)
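The z-score step can be sketched the same way (the statewide gains below are hypothetical):

```python
# Sketch of assigning a teacher a z-score: how far the teacher's mean
# gain sits from the statewide mean, in standard deviations.
# All gains are hypothetical.
import statistics

def z_score(teacher_gain, all_teacher_gains):
    """Teacher's standing relative to other teachers in the state."""
    mean = statistics.mean(all_teacher_gains)
    sd = statistics.pstdev(all_teacher_gains)
    return (teacher_gain - mean) / sd

state_gains = [0.0, 0.1, 0.2, 0.3, 0.4]  # hypothetical statewide gains
print(z_score(0.25, state_gains))  # slightly above the state mean
```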
How are principals different?

• They don’t directly deliver instruction to students.
• Their impact cannot easily be measured within a school year.

Source: Lipscomb, S.; Teh, B.; Gill, B.; Chiang, H.; Owens, A. (2010, Sept.). Teacher and Principal Value-Added: Research Findings and Implementation Practices. Cambridge, MA: Mathematica Policy Research.
Three schools value-added math and reading
results – who is the better principal?
Many state assessment systems use a single year of data for principal evaluation.

(Chart: one year of math and reading value-added results for Langston Hughes Elem, Scott Joplin Elem, and Lewis Latimer Elem)
Langston Hughes Elementary
(Chart: math and reading value-added, 2009-10 through 2012-13)

High growth but not improving
Scott Joplin Elementary
(Chart: math and reading value-added, 2009-10 through 2012-13)

Below average growth, improving but decelerating
Lewis Latimer Elementary
(Chart: math and reading value-added, 2009-10 through 2012-13)

Below average growth, improving and accelerating
So what are the issues?

• We’ve confused players with managers.
• The metrics are problematic.
• We’ve chosen the wrong focus for policy.
How does it work in baseball?

In baseball, each player creates
his own metrics by getting on
base, stealing bases, or making
catches.
The metrics directly reflect their
performance.
Issues in the use of growth and value-added measures

Differences among value-added
models
Los Angeles Times Study
Los Angeles Times Study #2
Issues in the use of value-added measures

Control for statistical error
All models attempt to address this issue. Nevertheless, many teachers’ value-added scores will fall within the range of statistical error.
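To see why many scores are indistinguishable from average, consider a minimal sketch with hypothetical estimates and standard errors:

```python
# Sketch: a value-added estimate is statistically distinguishable from
# zero (i.e., from the average teacher) only if it exceeds roughly
# 1.96 standard errors. The estimates and standard errors are hypothetical.

def distinguishable_from_average(estimate, std_error, z_crit=1.96):
    """True if the estimate lies outside the ~95% error band around zero."""
    return abs(estimate) > z_crit * std_error

# Many teachers' estimates fall inside the error band:
print(distinguishable_from_average(0.15, 0.12))  # -> False
print(distinguishable_from_average(0.40, 0.12))  # -> True
```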
Issues in the use of growth and value-added measures

Control for statistical error
New York City

New York City #2
What Makes Schools Work Study - Mathematics
Value-added index within group

(Scatter plot: each teacher’s Year 1 value-added index, roughly -10 to 12, plotted against the same teacher’s Year 2 index, roughly -10 to 15)
Data used represents a portion of the teachers who participated in Vanderbilt University’s What
Makes Schools Work Project, funded by the federal Institute of Education Sciences
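The year-to-year stability visible in such a scatter is usually summarized as a correlation between the two years’ indices; here is a sketch with hypothetical teacher scores:

```python
# Sketch: year-to-year stability of value-added scores summarized as a
# Pearson correlation between Year 1 and Year 2 estimates.
# The teacher scores below are hypothetical.
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

year1 = [-4.0, -1.0, 0.0, 2.0, 5.0]  # hypothetical Year 1 indices
year2 = [-1.0, 2.0, -3.0, 4.0, 1.0]  # hypothetical Year 2 indices
print(round(pearson_r(year1, year2), 2))  # -> 0.38
```

A weak correlation like this is the concern: the same teacher can look very different from one year to the next.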
Metrics matter

NCLB metrics
influenced
educator behavior
for a decade.
Metrics drive behavior

The term
“bubble kid” had a
different meaning
prior to 2000.
One district’s change in 5th grade math performance
relative to Kentucky cut scores

Mathematics

(Chart: number of students whose performance moved up, moved down, or showed no change relative to the cut scores, by fall RIT score)
Metrics drive behavior
Race to the Top
changes the focus.
Number of students who achieved the normal
mathematics growth in that district

Mathematics

(Chart: number of students who met or failed the growth target, by the student’s score in fall)
Gaming distorts results

Testing conditions
may be gamed to
inflate results
Test duration and math growth between
two terms in one school’s fifth grade
(Chart: the number of minutes for each student’s first and second tests, with the scale score growth attained by each child. The white line marks the average duration of the second test; the yellow line marks the average growth for fifth graders in this district. Series: Test 1 Duration, Test 2 Duration, Scale Score Gain.)
Test duration and math growth between
two terms in all fifth grades in a district
(Chart: Test 1 duration, Test 2 duration, and scale score growth across all fifth grades in the district)
Test duration and math growth between
two terms in all fifth grades in a district
(Chart: Test 1 duration, Test 2 duration, and scale score growth across all fifth grades in the district)
The problem with spring-spring testing

Student’s spring-to-spring growth trajectory

(Timeline, 3/12 through 3/13: Teacher 1’s instruction runs through spring 2012, followed by summer, then Teacher 2’s instruction from fall 2012 to spring 2013)
Metrics do not provide a complete
picture of the classroom

They don’t capture important non-cognitive factors that impact learning.
The intangibles

In baseball, the employment of sabermetrics has reduced the impact that a player’s intangibles have on personnel decisions. These intangibles
may include leadership
qualities, locker room
presence, and other
personality traits that may
contribute to team success.
Non-cognitive factors

In baseball, the employment of sabermetrics has focused general managers on the player’s contribution to the measures that ultimately matter: runs and wins. These are not the only measures that matter in the sport, however.

In education, value-added measurement has focused policy-makers on the teacher’s contribution to academic success, as reflected in test scores.

Jackson (2012) argues that teachers may have more impact on non-cognitive factors that are essential to student success, like attendance, grades, and suspensions.
Non-cognitive factors

Employing value-added methodologies, Jackson found that teachers had a substantive effect on non-cognitive outcomes that was independent of their effect on test scores:
• Lowered average student absenteeism by 7.4 days.
• Improved the probability that students would enroll in the next grade by 5 percentage points.
• Reduced the likelihood of suspension by 2.8%.
• Improved the average GPA by .09 (Algebra) or .05 (English).

Source: Jackson, K. (2013). Non-Cognitive Ability, Test Scores and Teacher Quality: Evidence from 9th
Grade Teachers in North Carolina. Northwestern University and NBER
So what are the issues?

• We’ve confused players with managers.
• The metrics are problematic.
• We’ve chosen the wrong focus for policy.
Policy has focused on dismissal rather than
retention.

In baseball, exceptional players are much rarer than average ones. Thus it is vital for a team to keep its best players.
Employment of Elementary Teachers, 2007-2012

The elementary school teacher workforce shrank by 178,000 teachers (11%) between May 2007 and May 2012.

(Chart, number of teachers by year: 2007: 1,538,000; 2008: 1,544,300; 2009: 1,544,270; 2010: 1,485,600; 2011: 1,415,000; 2012: 1,360,380)
Source: (2012, May) Bureau of Labor Statistics – Occupational Employment Statistics
Numbers exclude special education and kindergarten teachers
The impact of seniority-based layoffs on
school quality
In a simulation of a layoff of 5% of teachers using New York City data, reliance on seniority-based layoffs would:
• Result in 25% more teachers laid off.
• Remove teachers who were .31 standard deviations more effective (using a value-added criterion) than those lost under an effectiveness criterion.
• Retain 84% of teachers with unsatisfactory ratings.
Source: Boyd, L., Lankford, H., Loeb, S., and Wycoff, J. (2011). Center for Education
Policy. Stanford University.
If evaluators do not differentiate their ratings, then all differentiation comes from the test.

We must identify and protect the most effective teachers to improve the profession.

We must also identify the least effective teachers to gain credibility with the public.
Results of Tennessee Teacher Evaluation
Pilot
(Chart: percentage of teachers receiving each rating, 1 through 5, by value-added result and by observation result; values range from 0% to 53%)
Results of Georgia Teacher Evaluation Pilot
Evaluator Rating
(Pie chart: Ineffective 1%, Minimally Effective 2%, Effective 75%, Highly Effective 23%)
Ratings under new Florida teacher
evaluation regulations
Florida Teacher Evaluation Rating
(Chart, percentage of teachers by rating, 2011-12 vs. 2012-13: Highly Effective 22.6% → 36.9%; Effective 74.6% → 61.9%; Needs Improvement 0.9% → 2.1%; 3-Year Developing 0.2% → 0.5%; Ineffective 0.1% → 0.2%)
It’s good to learn from
past failures.
What’s the analogy to schools?

Policy makers believe value-added metrics provide a statistical means to measure the effectiveness of teachers and principals.
What’s the assumed parallel to schools?

Policy-makers assume that reading and mathematics constitute adequate measures of effectiveness.
Policy-makers assume that the
principal controls the acquisition and
deployment of talent.
The Cincinnati Approach - Method

• Evaluators were trained and calibrated
to the Danielson model
• Both peer and administrator evaluators
were used.
• Each teacher was observed three
times by a peer and once by an
administrator.
• Stakes were higher for beginning
teachers than veterans.
Source: Taylor, E. and Tyler, J. (2012, Fall). Can Teacher Evaluation Improve Teaching?
The Cincinnati Approach - Findings

• In the first year, the average teacher improved student math scores by .05 SD; in subsequent years this improved to .11 SD.
• Improvement was sufficient to move a
25th percentile teacher to near average.
• Reading scores did not improve.
• The evaluations retained a “leniency”
bias typical of other evaluation
programs.
• The pilot cost was high, $7,500 per
teacher.
The Cincinnati Approach - Context

• In the first year, the average teacher improved student math scores by .05 SD; in subsequent years this improved to .11 SD.
• Gains in the first two years of teaching are typically .10 SD in mathematics (Rockoff, 2004).
• Gains from being placed with highly effective peers are .04 SD in mathematics (Jackson and Bruegmann, 2009).
• The pilot cost was high, $7,500 per teacher.

Rockoff, J. E. (2004) “The Impact of Individual Teachers on Student Achievement: Evidence from
Panel Data.” American Economic Review. 94(2): 247-252.
Jackson, C. K. and Bruegmann, E. (2009, July). Teaching Students and Teaching Each Other: The Importance of Peer Learning for Teachers. NBER Working Paper No. 15202.
Reliability of a variety of teacher observation
implementations
Observation by | Reliability coefficient (relative to state test value-added gain) | Proportion of test variance explained
Principal – 1 | .51 | 26.0%
Principal – 2 | .58 | 33.6%
Principal and other administrator | .67 | 44.9%
Principal and three short observations by peer observers | .67 | 44.9%
Two principal observations and two peer observations | .66 | 43.6%
Two principal observations and two different peer observers | .69 | 47.6%
Two principal observations, one peer observation, and three short observations by peers | .72 | 51.8%

Bill and Melinda Gates Foundation (2013, January). Ensuring Fair and Reliable Measures of Effective Teaching: Culminating Findings from the MET Project’s Three-Year Study.
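The table’s two numeric columns are consistent with a simple relationship: the proportion of variance explained is the reliability coefficient squared. A quick check against the table’s values:

```python
# The "proportion of test variance explained" column in the table above
# is the square of the reliability coefficient (r squared).
# Reliability values are copied from the table.
reliability = [0.51, 0.58, 0.67, 0.66, 0.69, 0.72]
for r in reliability:
    print(f"r = {r:.2f} -> {r * r * 100:.1f}% of variance explained")
# Reproduces the table: 26.0%, 33.6%, 44.9%, 43.6%, 47.6%, 51.8%
```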
Assessment Literacy in a Teacher Evaluation
Framework

Presenter - John Cronin, Ph.D.
Contacting us:
Rebecca Moore: 503-548-5129
E-mail: rebecca.moore@nwea.org

This PowerPoint presentation and recommended resources are
available at our SlideShare website:
Why it’s time we stopped pretending schools should
be managed like baseball teams
Suggested reading
Baker B., Oluwole, J., Green, P. (2013). The legal
consequences of mandating high stakes
decisions based on low quality information:
Teacher evaluation in the Race to the Top Era.
Education Policy Analysis Archives, Vol. 21, No. 5.
Thank you for attending

Presenter - John Cronin, Ph.D.
Contacting us:
NWEA Main Number: 503-624-1951
E-mail: rebecca.moore@nwea.org
The presentation and recommended resources are
available at our SlideShare site:
http://www.slideshare.net/NWEA/tag/kingsbury-center
What about principals?

The issue is the same with principals: it is difficult to separate the contribution of the principal to learning from the contribution of teachers.
Source: Lipscomb, S.; Teh, B.; Gill, B.; Chiang, H.; Owens, A. (2010, Sept.). Teacher and Principal Value-Added: Research Findings and Implementation Practices. Cambridge, MA: Mathematica Policy Research.
How does it work in classrooms?

+ .25

Two very important assumptions:
• The teacher directly delivers instruction that causes learning!
• The teacher’s impact can be measured within a school year!

(Timeline: last spring → this spring)
Four issues

• How do you measure a principal?
• How accurate and reliable are
these measures?
• What anticipated and
unanticipated impacts do your
measures have on behavior?
• Where should our energy really be
focused?
It’s good to learn from
past failures.
So what are the issues?

• We’ve confused players with managers.
• The metrics are problematic.
• We’ve chosen the wrong focus for policy.
How does it work in education?

Teacher or School Value-Added
How much academic growth does a
teacher or school produce relative to
the median teacher or school?
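A minimal sketch of this median-relative definition, with hypothetical gains:

```python
# Sketch of the definition above: value-added expressed as a teacher's
# (or school's) growth relative to the median teacher or school.
# All gains are hypothetical.
import statistics

def relative_to_median(own_gain, all_gains):
    """How far this teacher's or school's growth sits above the median."""
    return own_gain - statistics.median(all_gains)

gains = [0.1, 0.2, 0.3, 0.4, 0.5]  # hypothetical gains across teachers
print(round(relative_to_median(0.45, gains), 2))  # -> 0.15
```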
So what are the issues?

More Related Content

What's hot

Presentations morning session 22 January 2018 HEFCE open event “Using data to...
Presentations morning session 22 January 2018 HEFCE open event “Using data to...Presentations morning session 22 January 2018 HEFCE open event “Using data to...
Presentations morning session 22 January 2018 HEFCE open event “Using data to...
Bart Rienties
 
Valid data for school improvement final
Valid data for school improvement finalValid data for school improvement final
Valid data for school improvement final
John Cronin
 
Retaining High Performers: Insights from DC Public Schools’ Teacher Exit Survey
Retaining High Performers: Insights from DC Public Schools’ Teacher Exit SurveyRetaining High Performers: Insights from DC Public Schools’ Teacher Exit Survey
Retaining High Performers: Insights from DC Public Schools’ Teacher Exit Survey
Jeremy Knight
 
Action Research Project
Action Research ProjectAction Research Project
Action Research Project
Jessica Price
 
Improving classroom instruction with co taught instruction
Improving classroom instruction with co taught instructionImproving classroom instruction with co taught instruction
Improving classroom instruction with co taught instruction
purteeda
 
What do Student Evaluations of Teaching Really Measure?
What do Student Evaluations of Teaching Really Measure?What do Student Evaluations of Teaching Really Measure?
What do Student Evaluations of Teaching Really Measure?
Denise Wilson
 
"Unlocking the black box: what's happening in 'more effective' classrooms in ...
"Unlocking the black box: what's happening in 'more effective' classrooms in ..."Unlocking the black box: what's happening in 'more effective' classrooms in ...
"Unlocking the black box: what's happening in 'more effective' classrooms in ...
Young Lives Oxford
 
Data Summer
Data SummerData Summer
Data Summer
mark.richardson
 
Assessment information evening 2 3-16
Assessment information evening 2 3-16Assessment information evening 2 3-16
Assessment information evening 2 3-16
Kate Meakin
 
Achieving exam success
Achieving exam successAchieving exam success
Achieving exam success
Twynham School, Dorset, UK
 
Session 1 Tom Abbott Biddulph High School
Session 1    Tom  Abbott    Biddulph  High  SchoolSession 1    Tom  Abbott    Biddulph  High  School
Session 1 Tom Abbott Biddulph High SchoolMike Blamires
 
Action research on grading and assessment practices of grade 7 mathematics
Action research on grading and assessment practices of grade 7 mathematicsAction research on grading and assessment practices of grade 7 mathematics
Action research on grading and assessment practices of grade 7 mathematicsGary Johnston
 
Karim value added
Karim value addedKarim value added
Karim value addedAnilKarim
 
Action research plan
Action research planAction research plan
Action research planturnekan
 
Using Assessment data
Using Assessment dataUsing Assessment data
Using Assessment data
Dublin City University
 
How to conduct a Teacher Appraisal
How to conduct a Teacher Appraisal How to conduct a Teacher Appraisal
How to conduct a Teacher Appraisal
Mark S. Steed
 
ILC Super drop out challenge (2)
ILC Super drop out challenge (2)ILC Super drop out challenge (2)
ILC Super drop out challenge (2)Jamal Hayat
 
Aspects of Appraisal - Getting it Right
Aspects of Appraisal - Getting it RightAspects of Appraisal - Getting it Right
Aspects of Appraisal - Getting it RightJames de Bass
 
The impact of Aston Replay on student performance - Chris Jones
The impact of Aston Replay on student performance - Chris JonesThe impact of Aston Replay on student performance - Chris Jones
The impact of Aston Replay on student performance - Chris Jones
The Higher Education Academy
 

What's hot (20)

Presentations morning session 22 January 2018 HEFCE open event “Using data to...
Presentations morning session 22 January 2018 HEFCE open event “Using data to...Presentations morning session 22 January 2018 HEFCE open event “Using data to...
Presentations morning session 22 January 2018 HEFCE open event “Using data to...
 
Detailed Assessment Brochure - 2014
Detailed Assessment Brochure - 2014Detailed Assessment Brochure - 2014
Detailed Assessment Brochure - 2014
 
Valid data for school improvement final
Valid data for school improvement finalValid data for school improvement final
Valid data for school improvement final
 
Retaining High Performers: Insights from DC Public Schools’ Teacher Exit Survey
Retaining High Performers: Insights from DC Public Schools’ Teacher Exit SurveyRetaining High Performers: Insights from DC Public Schools’ Teacher Exit Survey
Retaining High Performers: Insights from DC Public Schools’ Teacher Exit Survey
 
Action Research Project
Action Research ProjectAction Research Project
Action Research Project
 
Improving classroom instruction with co taught instruction
Improving classroom instruction with co taught instructionImproving classroom instruction with co taught instruction
Improving classroom instruction with co taught instruction
 
What do Student Evaluations of Teaching Really Measure?
What do Student Evaluations of Teaching Really Measure?What do Student Evaluations of Teaching Really Measure?
What do Student Evaluations of Teaching Really Measure?
 
"Unlocking the black box: what's happening in 'more effective' classrooms in ...
"Unlocking the black box: what's happening in 'more effective' classrooms in ..."Unlocking the black box: what's happening in 'more effective' classrooms in ...
"Unlocking the black box: what's happening in 'more effective' classrooms in ...
 
Data Summer
Data SummerData Summer
Data Summer
 
Assessment information evening 2 3-16
Assessment information evening 2 3-16Assessment information evening 2 3-16
Assessment information evening 2 3-16
 
Achieving exam success
Achieving exam successAchieving exam success
Achieving exam success
 
Session 1 Tom Abbott Biddulph High School
Session 1    Tom  Abbott    Biddulph  High  SchoolSession 1    Tom  Abbott    Biddulph  High  School
Session 1 Tom Abbott Biddulph High School
 
Action research on grading and assessment practices of grade 7 mathematics
Action research on grading and assessment practices of grade 7 mathematicsAction research on grading and assessment practices of grade 7 mathematics
Action research on grading and assessment practices of grade 7 mathematics
 
Karim value added
Karim value addedKarim value added
Karim value added
 
Action research plan
Action research planAction research plan
Action research plan
 
Using Assessment data
Using Assessment dataUsing Assessment data
Using Assessment data
 
How to conduct a Teacher Appraisal
How to conduct a Teacher Appraisal How to conduct a Teacher Appraisal
How to conduct a Teacher Appraisal
 
ILC Super drop out challenge (2)
ILC Super drop out challenge (2)ILC Super drop out challenge (2)
ILC Super drop out challenge (2)
 
Aspects of Appraisal - Getting it Right
Aspects of Appraisal - Getting it RightAspects of Appraisal - Getting it Right
Aspects of Appraisal - Getting it Right
 
The impact of Aston Replay on student performance - Chris Jones
The impact of Aston Replay on student performance - Chris JonesThe impact of Aston Replay on student performance - Chris Jones
The impact of Aston Replay on student performance - Chris Jones
 

Similar to Naesp keynote3

Teacher evaluation presentation3 mass
Teacher evaluation presentation3  massTeacher evaluation presentation3  mass
Teacher evaluation presentation3 mass
John Cronin
 
NWEA Growth and Teacher evaluation VA 9-13
NWEA Growth and Teacher evaluation VA 9-13NWEA Growth and Teacher evaluation VA 9-13
NWEA Growth and Teacher evaluation VA 9-13
NWEA
 
Va 101 ppt
Va 101 pptVa 101 ppt
Chief accountability officers presentation
Chief accountability officers presentationChief accountability officers presentation
Chief accountability officers presentation
John Cronin
 
Using Assessment Data for Educator and Student Growth
Using Assessment Data for Educator and Student GrowthUsing Assessment Data for Educator and Student Growth
Using Assessment Data for Educator and Student Growth
NWEA
 
Ma sampletest-hs 2010-13
Ma sampletest-hs 2010-13Ma sampletest-hs 2010-13
Ma sampletest-hs 2010-13Erlinda Rey
 
PTA Meeting - Smith Middle School (Scott's Assignment)
PTA Meeting - Smith Middle School (Scott's Assignment)PTA Meeting - Smith Middle School (Scott's Assignment)
PTA Meeting - Smith Middle School (Scott's Assignment)
reshondascott
 
Teacher evaluation and goal setting connecticut
Teacher evaluation and goal setting   connecticutTeacher evaluation and goal setting   connecticut
Teacher evaluation and goal setting connecticut
John Cronin
 
Teacher Rating 2012-2013
Teacher Rating 2012-2013Teacher Rating 2012-2013
Teacher Rating 2012-2013Justin Rook
 
Niez - RMDA Final Exam
Niez - RMDA Final ExamNiez - RMDA Final Exam
Niez - RMDA Final ExamDaniel Niez
 
Teacher Assessment Fact Sheet
Teacher Assessment Fact SheetTeacher Assessment Fact Sheet
Teacher Assessment Fact Sheet
Abdul-Hakim Shabazz
 
California administrator symposium nwea
California administrator symposium nweaCalifornia administrator symposium nwea
California administrator symposium nwea
John Cronin
 
Dylan Willam- Leadership for Teacher Learning- Times Festival of Education
Dylan Willam- Leadership for Teacher Learning- Times Festival of EducationDylan Willam- Leadership for Teacher Learning- Times Festival of Education
Dylan Willam- Leadership for Teacher Learning- Times Festival of Educationhannahtyreman
 
Educator evaluation policy overview-final
Educator evaluation policy overview-finalEducator evaluation policy overview-final
Educator evaluation policy overview-final
Research in Action, Inc.
 
Principal and-teacher-evaluation-key-ideas-your-role-and-your-school's-leadin...
Principal and-teacher-evaluation-key-ideas-your-role-and-your-school's-leadin...Principal and-teacher-evaluation-key-ideas-your-role-and-your-school's-leadin...
Principal and-teacher-evaluation-key-ideas-your-role-and-your-school's-leadin...Winnie de Leon
 
PERA & SB 7 Principal Work
PERA & SB 7 Principal WorkPERA & SB 7 Principal Work
PERA & SB 7 Principal Work
Richard Voltz
 
The Value Of Value Added Methodology
The Value Of Value Added MethodologyThe Value Of Value Added Methodology
The Value Of Value Added Methodology
sidonye
 
Aeiou of k 3 literary checkpoints-3 3
Aeiou of k 3 literary checkpoints-3 3Aeiou of k 3 literary checkpoints-3 3
Aeiou of k 3 literary checkpoints-3 3
Keith Eades
 
Accountability: What's It Really All About?
Accountability: What's It Really All About?Accountability: What's It Really All About?
Accountability: What's It Really All About?
seprogram
 
Karthik Muralidharan on research on achieving universal quality primary educa...
Karthik Muralidharan on research on achieving universal quality primary educa...Karthik Muralidharan on research on achieving universal quality primary educa...
Karthik Muralidharan on research on achieving universal quality primary educa...
Twaweza
 

Similar to Naesp keynote3 (20)

Teacher evaluation presentation3 mass
Teacher evaluation presentation3  massTeacher evaluation presentation3  mass
Teacher evaluation presentation3 mass
 
NWEA Growth and Teacher evaluation VA 9-13
NWEA Growth and Teacher evaluation VA 9-13NWEA Growth and Teacher evaluation VA 9-13
NWEA Growth and Teacher evaluation VA 9-13
 
Va 101 ppt
Va 101 pptVa 101 ppt
Va 101 ppt
 
Chief accountability officers presentation
Chief accountability officers presentationChief accountability officers presentation
Chief accountability officers presentation
 
Using Assessment Data for Educator and Student Growth
Using Assessment Data for Educator and Student GrowthUsing Assessment Data for Educator and Student Growth
Using Assessment Data for Educator and Student Growth
 
Ma sampletest-hs 2010-13
Ma sampletest-hs 2010-13Ma sampletest-hs 2010-13
Ma sampletest-hs 2010-13
 
PTA Meeting - Smith Middle School (Scott's Assignment)
PTA Meeting - Smith Middle School (Scott's Assignment)PTA Meeting - Smith Middle School (Scott's Assignment)
PTA Meeting - Smith Middle School (Scott's Assignment)
 
Teacher evaluation and goal setting connecticut
Teacher evaluation and goal setting   connecticutTeacher evaluation and goal setting   connecticut
Teacher evaluation and goal setting connecticut
 
Teacher Rating 2012-2013
Teacher Rating 2012-2013Teacher Rating 2012-2013
Teacher Rating 2012-2013
 
Niez - RMDA Final Exam
Niez - RMDA Final ExamNiez - RMDA Final Exam
Niez - RMDA Final Exam
 
Teacher Assessment Fact Sheet
Teacher Assessment Fact SheetTeacher Assessment Fact Sheet
Teacher Assessment Fact Sheet
 
California administrator symposium nwea
California administrator symposium nweaCalifornia administrator symposium nwea
California administrator symposium nwea
 
Dylan Willam- Leadership for Teacher Learning- Times Festival of Education
Dylan Willam- Leadership for Teacher Learning- Times Festival of EducationDylan Willam- Leadership for Teacher Learning- Times Festival of Education
Dylan Willam- Leadership for Teacher Learning- Times Festival of Education
 
Educator evaluation policy overview-final
Educator evaluation policy overview-finalEducator evaluation policy overview-final
Educator evaluation policy overview-final
 
Principal and-teacher-evaluation-key-ideas-your-role-and-your-school's-leadin...
Principal and-teacher-evaluation-key-ideas-your-role-and-your-school's-leadin...Principal and-teacher-evaluation-key-ideas-your-role-and-your-school's-leadin...
Principal and-teacher-evaluation-key-ideas-your-role-and-your-school's-leadin...
 
PERA & SB 7 Principal Work
PERA & SB 7 Principal WorkPERA & SB 7 Principal Work
PERA & SB 7 Principal Work
 
The Value Of Value Added Methodology
The Value Of Value Added MethodologyThe Value Of Value Added Methodology
The Value Of Value Added Methodology
 
Aeiou of k 3 literary checkpoints-3 3
Aeiou of k 3 literary checkpoints-3 3Aeiou of k 3 literary checkpoints-3 3
Aeiou of k 3 literary checkpoints-3 3
 
Accountability: What's It Really All About?
Accountability: What's It Really All About?Accountability: What's It Really All About?
Accountability: What's It Really All About?
 
Karthik Muralidharan on research on achieving universal quality primary educa...
Karthik Muralidharan on research on achieving universal quality primary educa...Karthik Muralidharan on research on achieving universal quality primary educa...
Karthik Muralidharan on research on achieving universal quality primary educa...
 

Naesp keynote3

  • 1. Why it’s time we stopped managing schools like baseball teams part …for the most John Cronin, Ph.D. Senior Director, the Kingsbury Center at Northwest Evaluation Association You can view this presentation at slideshare: http://www.slideshare.net/NWEA/schools-cant
  • 2. How does it work in baseball? In baseball, the contribution of players to the success of the team can be measured (value-added). In baseball, general managers have complete control over the acquisition and deployment of players.
  • 3. How does it work in baseball? Sabermetricians estimate the number of wins a player contributes to his team. It’s calculated by estimating the number of runs contributed by a player and adding the number of runs denied by that player’s defensive contributions.
  • 4. So what are the issues? • We’ve confused players with managers. • The metrics are problematic. • We’ve chosen the wrong focus for policy.
  • 5. Baseball hasn’t found a We assume the statistics methodology to effectively applied to players (teachers) apply sabermetrics to can be applied to their managers. (principals). managers
  • 6. How does it work in classrooms? Brian’s projection gains on this for A gain students’ is estimated spring’s tests are compared to this his students. This projection may projection. If the gains exceed the take into account his student’s projection, we say Brian poverty past performance, their state Brian’s students took theproduced “value-added”. rate, and a variety of other factors. exam last spring Value-added methodologies attempt to isolate a teacher’s contribution to learning by measuring student growth while controlling or eliminating factors that influence growth but are outside the teacher’s control, such as student poverty. Last spring This spring
  • 7. How does it work in classrooms? + .25 Brian’s students’ gains on this Brian’s gain is compared to that of spring’s tests and he is typically other teachersare compared to this projection. If score”, a exceed the assigned a “z the gains metric that projection, we say Brian produced shows where he stands relative to “value-added”. other teachers in the state. Last spring This spring
  • 8. How are principals different? • They don’t directly deliver instruction to students. • Their impact cannot easily be measured within a school year Source: Lipscomb, S.; Teh, B.; Gill, B.; Chiang, H.; Owens, A (2010, Sept.). Teacher and Principal ValueAdded: Research Findings and Implementation Practices. Cambridge, MA. Mathematica Policy Research.
  • 9. Three schools value-added math and reading results – who is the better principal? Math Reading 2 1.5 1 Many state assessment systems use a single year of data for principal evaluation. 0.5 0 -0.5 -1 -1.5 Langston Hughes Elem Scott Joplin Elem Lewis Latimer Elem
  • 11. Scott Joplin Elementary [Line chart: math and reading value-added, 2009-10 through 2012-13.] Below average growth, improving but decelerating.
  • 12. Lewis Latimer Elementary [Line chart: math and reading value-added, 2009-10 through 2012-13.] Below average growth, improving and accelerating.
  • 13. So what are the issues? • We’ve confused players with managers. • The metrics are problematic. • We’ve chosen the wrong focus for policy.
  • 14. How does it work in baseball? In baseball, each player creates his own metrics by getting on base, stealing bases, or making catches. The metrics directly reflect their performance.
  • 15. Issues in the use of growth and value-added measures • Differences among value-added models • Los Angeles Times Study • Los Angeles Times Study #2
  • 16. Issues in the use of value-added measures • Control for statistical error. All models attempt to address this issue. Nevertheless, many teachers’ value-added scores will fall within the range of statistical error.
  • 17. Issues in the use of growth and value-added measures • Control for statistical error • New York City • New York City #2
  • 18. What Makes Schools Work Study – Mathematics [Scatter plot: value-added index within group, Year 1 vs. Year 2.] Data used represents a portion of the teachers who participated in Vanderbilt University’s What Makes Schools Work Project, funded by the federal Institute of Education Sciences.
  • 20. Metrics drive behavior The term “bubble kid” had a different meaning prior to 2000.
  • 21. One district’s change in 5th grade math performance relative to Kentucky cut scores [Histogram: number of students by fall RIT score, coded Up, Down, or No Change, mathematics.]
  • 22. Metrics drive behavior Race to the Top changes the focus.
  • 23. Number of students who achieved the normal mathematics growth in that district [Histogram: number of students by fall score, coded met vs. failed growth target, mathematics.]
  • 24. Gaming distorts results Testing conditions may be gamed to inflate results
  • 25. Test duration and math growth between two terms in one school’s fifth grade [Chart: number of minutes for each student’s first and second test, plotted with the scale score growth attained by each child. The white line represents the average duration of the second test; the yellow line represents the average growth for fifth graders in this district.]
  • 26. Test duration and math growth between two terms in all fifth grades in a district [Chart: test 1 duration, test 2 duration, and scale score growth.]
  • 27. Test duration and math growth between two terms in all fifth grades in a district [Chart: test 1 duration, test 2 duration, and scale score growth.]
  • 28. The problem with spring-spring testing [Timeline: a student’s spring-to-spring growth trajectory runs from March 2012 through March 2013, spanning Teacher 1 (March–June), the summer, and Teacher 2 (September–March).]
  • 29. Metrics do not provide a complete picture of the classroom They don’t capture important noncognitive factors that impact learning.
  • 30. The intangibles In baseball, the employment of sabermetrics has reduced the impact that a player’s intangibles have on personnel decisions. These intangibles may include leadership qualities, locker room presence, and other personality traits that may contribute to team success.
  • 31. Non-cognitive factors In baseball, the employment of sabermetrics has focused general managers on the player’s contribution to the measures that ultimately matter in the sport: runs and wins. These are not the only measures that matter, however. In education, value-added measurement has focused policy-makers on the teacher’s contribution to academic success, as reflected in test scores. Jackson (2012) argues that teachers may have more impact on non-cognitive factors that are essential to student success, like attendance, grades, and suspensions.
  • 32. Non-cognitive factors Employing value-added methodologies, Jackson found that teachers had a substantive effect on non-cognitive outcomes that was independent of their effect on test scores: • Lowered the average student absenteeism by 7.4 days. • Improved the probability that students would enroll in the next grade by 5 percentage points. • Reduced the likelihood of suspension by 2.8%. • Improved the average GPA by .09 (Algebra) or .05 (English). Source: Jackson, K. (2013). Non-Cognitive Ability, Test Scores and Teacher Quality: Evidence from 9th Grade Teachers in North Carolina. Northwestern University and NBER.
  • 33. So what are the issues? • We’ve confused players with managers. • The metrics are problematic. • We’ve chosen the wrong focus for policy.
  • 34. Policy has focused on dismissal rather than retention. In baseball, exceptional players are much rarer than average ones. Thus it is vital for a team to keep its best players.
  • 35. Employment of Elementary Teachers 2007-2012 [Line chart, number of teachers: 1,538,000 (2007); 1,544,300 (2008); 1,544,270 (2009); 1,485,600 (2010); 1,415,000 (2011); 1,360,380 (2012).] The elementary school teacher workforce shrunk by 178,000 teachers (11%) between May 2007 and May 2012. Source: (2012, May) Bureau of Labor Statistics – Occupational Employment Statistics. Numbers exclude special education and kindergarten teachers.
  • 36. The impact of seniority-based layoffs on school quality In a simulation study of a layoff of 5% of teachers using New York City data, reliance on seniority-based layoffs would: • Result in 25% more teachers laid off. • Lay off teachers who would be .31 standard deviations more effective (using a value-added criterion) than those lost under an effectiveness criterion. • Retain 84% of teachers with unsatisfactory ratings. Source: Boyd, L., Lankford, H., Loeb, S., and Wycoff, J. (2011). Center for Education Policy. Stanford University.
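The seniority-versus-effectiveness contrast can be illustrated with a toy simulation. This is not the Boyd et al. model: the teacher pool is randomly generated, layoffs are by headcount rather than salary budget, and every number is invented; it only shows why the two criteria remove different teachers.

```python
# Toy layoff comparison: cutting the least senior teachers vs. cutting the
# lowest value-added teachers from the same (invented) pool.
import random

random.seed(0)
# Each hypothetical teacher: (years of seniority, value-added score).
pool = [(random.randint(1, 30), random.gauss(0, 1)) for _ in range(100)]
n_layoffs = 10

by_seniority = sorted(pool, key=lambda t: t[0])[:n_layoffs]      # last in, first out
by_effectiveness = sorted(pool, key=lambda t: t[1])[:n_layoffs]  # lowest value-added

mean_va_seniority = sum(t[1] for t in by_seniority) / n_layoffs
mean_va_effectiveness = sum(t[1] for t in by_effectiveness) / n_layoffs
# Seniority selects teachers essentially at random with respect to
# effectiveness, so those laid off are, on average, stronger than the
# teachers removed under the effectiveness criterion.
```

Because junior teachers also earn less, a budget-based cut by seniority must remove more of them, which is where a "25% more teachers laid off" result can come from.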
  • 37. We must identify and protect the most effective teachers to improve the profession. We must also identify the least effective teachers to gain credibility with the public. If evaluators do not differentiate their ratings, then all the differentiation comes from the test.
  • 38. Results of Tennessee Teacher Evaluation Pilot [Bar chart: percentage of teachers receiving each rating (1–5), comparing value-added results with observation results.]
  • 39. Results of Georgia Teacher Evaluation Pilot [Pie chart of evaluator ratings: Ineffective 1%, Minimally Effective 2%, Effective 23%, Highly Effective 75%.]
  • 40. Ratings under new Florida teacher evaluation regulations [Bar chart: percentage of teachers rated Highly Effective, Effective, Needs Improvement, Developing (3 Year), and Ineffective in 2011-12 and 2012-13; nearly all teachers were rated Effective or Highly Effective, and fewer than 1% Ineffective.]
  • 41. It’s good to learn from past failures.
  • 42. What’s the analogy to schools? Policy makers believe value-added metrics provide a statistical means to measure the effectiveness of teachers and principals.
  • 43. What’s the assumed parallel to schools? Policy-makers assume that reading and mathematics constitute adequate measures of effectiveness. Policy-makers assume that the principal controls the acquisition and deployment of talent.
  • 45. The Cincinnati Approach - Method • Evaluators were trained and calibrated to the Danielson model • Both peer and administrator evaluators were used. • Each teacher was observed three times by a peer and once by an administrator. • Stakes were higher for beginning teachers than veterans. Source: Taylor, E. and Tyler, J. (2012, Fall). Can Teacher Evaluation Improve Teaching?
  • 46. The Cincinnati Approach - Findings • In the first year, the average teacher improved student math scores by .05 SD; in subsequent years this improved to .11 SD. • Improvement was sufficient to move a 25th percentile teacher to near average. • Reading scores did not improve. • The evaluations retained a “leniency” bias typical of other evaluation programs. • The pilot cost was high, $7,500 per teacher.
  • 47. The Cincinnati Approach - Context • In the first year, the average teacher improved student math scores by .05 SD; in subsequent years this improved to .11 SD. • Gains in the first two years of teaching are typically .10 SD in mathematics (Rockoff, 2004). • Gains from being placed with highly effective peers are .04 SD in mathematics (Jackson and Bruegmann, 2009). • The pilot cost was high, $7,500 per teacher. Rockoff, J. E. (2004). “The Impact of Individual Teachers on Student Achievement: Evidence from Panel Data.” American Economic Review. 94(2): 247-252. Jackson, C. K. and Bruegmann, E. (2009, July). Teaching Students and Teaching Each Other: The Importance of Peer Learning for Teachers. NBER Working Paper No. 15202, JEL No. I2, J24.
  • 48. Reliability of a variety of teacher observation implementations
  Observation by | Reliability coefficient (relative to state test value-added gain) | Proportion of test variance explained
  Principal – 1 | .51 | 26.0%
  Principal – 2 | .58 | 33.6%
  Principal and other administrator | .67 | 44.9%
  Principal and three short observations by peer observers | .67 | 44.9%
  Two principal observations and two peer observations | .66 | 43.6%
  Two principal observations and two different peer observers | .69 | 47.6%
  Two principal observations, one peer observation, and three short observations by peers | .72 | 51.8%
  Source: Bill and Melinda Gates Foundation (2013, January). Ensuring Fair and Reliable Measures of Effective Teaching: Culminating Findings from the MET Project’s Three-Year Study.
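As a sanity check on this slide’s numbers, the “proportion of test variance explained” column appears to be simply the square of the reliability coefficient (e.g. .51² ≈ 26.0%), which a few lines can verify:

```python
# Variance explained = r**2: squaring each reliability coefficient from the
# slide reproduces its "proportion of test variance explained" column.
coefficients = [0.51, 0.58, 0.67, 0.67, 0.66, 0.69, 0.72]
variance_explained = [round(r * r * 100, 1) for r in coefficients]
# variance_explained == [26.0, 33.6, 44.9, 44.9, 43.6, 47.6, 51.8]
```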
  • 49. Assessment Literacy in a Teacher Evaluation Framework Presenter - John Cronin, Ph.D. Contacting us: Rebecca Moore: 503-548-5129 E-mail: rebecca.moore@nwea.org This PowerPoint presentation and recommended resources are available at our SlideShare website:
  • 50. Why it’s time we stopped pretending schools should be managed like baseball teams
  • 51. Suggested reading Baker B., Oluwole, J., Green, P. (2013). The legal consequences of mandating high stakes decisions based on low quality information: Teacher evaluation in the Race to the Top Era. Education Policy Analysis Archives. Vol 21. No 5.
  • 52. Thank you for attending Presenter - John Cronin, Ph.D. Contacting us: NWEA Main Number: 503-624-1951 E-mail: rebecca.moore@nwea.org The presentation and recommended resources are available at our SlideShare site: http://www.slideshare.net/NWEA/tag/kingsbury-center
  • 53. What about principals? The issue is the same with principals: it is difficult to separate the contribution of the principal to learning from the contribution of teachers. Source: Lipscomb, S.; Teh, B.; Gill, B.; Chiang, H.; Owens, A. (2010, Sept.). Teacher and Principal Value-Added: Research Findings and Implementation Practices. Cambridge, MA: Mathematica Policy Research.
  • 54. How does it work in classrooms? Two very important assumptions: • The teacher directly delivers instruction that causes learning! • The teacher’s impact can be measured within a school year! [Diagram: a gain of +.25 from last spring to this spring.]
  • 55. Four issues • How do you measure a principal? • How accurate and reliable are these measures? • What anticipated and unanticipated impacts do your measures have on behavior? • Where should our energy really be focused?
  • 56. It’s good to learn from past failures.
  • 57. So what are the issues? • We’ve confused players with managers. • The metrics are problematic. • We’ve chosen the wrong focus for policy.
  • 58. How does it work in education? Teacher or School Value-Added: how much academic growth does a teacher or school produce relative to the median teacher or school?
  • 59. So what are the issues?