2. Policy Context
Focusing on teacher effectiveness is seen as
a promising path for education policy
New teacher evaluation systems, especially
“Value-Added Models” (VAMs), are
promoted as tools to accomplish this goal
Policy makers can benefit from research
about what various models actually can and
cannot do
3. Professional Consensus
The research base is currently insufficient to
support the use of VAM for high-stakes
decisions about individual teachers or
schools.
–RAND Corporation, 2005
4. Professional Consensus
VAM estimates of teacher effectiveness …
should not be used to make operational
decisions because such estimates are far
too unstable to be considered fair or
reliable.
– 2009 Letter Report from the
Board on Testing and Assessment,
National Research Council
5. Concerns Raised about
Value-Added Measures
Studies find that teachers’ value-added
“effectiveness” is highly variable & influenced by:
The measure of achievement used
The effectiveness of peers
Class size, curriculum, instructional supports,
and time spent with students
Tutoring and out-of-school learning
Student characteristics and attendance
6. Many Factors
Influence Student Achievement
Teacher knowledge, skills, dispositions, and behaviors that
support the learning process.
Hanushek et al. estimate the individual teacher effects component
of measured student achievement is about 7%-10% of the total.
Student availability for learning – Prior learning opportunities, health,
supportive home context, attendance, developed abilities
Resources for learning – Curriculum quality, materials, class sizes,
specialist supports, etc.
Coherence and continuity – The extent to which content & skills are
well organized and reinforced across grades and classes
7. Value-Added Measures of Teacher
Effects are Not Highly Stable
                              By at least   By at least   By at least
                              1 decile      2 deciles     3 deciles
Across statistical models*    56-80%        12-33%        0-14%
Across courses*               85-100%       54-92%        39-54%
Across years*                 74-93%        45-63%        19-41%
*Depending on the model
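The table appears to report the share of teachers whose decile rank shifts by at least one, two, or three deciles when the model, course, or year changes. A minimal sketch of how such a share could be computed is below. The function name `share_shifting` and the rank lists are hypothetical and invented for illustration; they are not data from the studies behind the table.

```python
def share_shifting(ranks_a, ranks_b, min_deciles):
    """Return the fraction of teachers whose decile rank differs by at least
    min_deciles between two sets of rankings (models, courses, or years)."""
    if len(ranks_a) != len(ranks_b) or not ranks_a:
        raise ValueError("need two non-empty rank lists of equal length")
    moved = sum(1 for a, b in zip(ranks_a, ranks_b) if abs(a - b) >= min_deciles)
    return moved / len(ranks_a)


# Hypothetical decile ranks (1 = lowest, 10 = highest) for five teachers.
# These numbers are invented for illustration and are not from any study.
ranks_year1 = [1, 4, 7, 10, 5]
ranks_year2 = [10, 5, 7, 6, 2]

for k in (1, 2, 3):
    share = share_shifting(ranks_year1, ranks_year2, k)
    print(f"shifted by at least {k} decile(s): {share:.0%}")
```

Each teacher counts once per threshold, so the shares can only fall or stay level as the threshold rises, which matches the pattern in each row of the table.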
8. A Teacher’s Measured “Effectiveness”
Can Vary Widely
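[Figure: two bar charts for one teacher in Year 1 (Y1) and Year 2 (Y2). Top panel: decile rank in Y1 vs. Y2 on a 0-10 scale, with bar labels 1 and 10. Bottom panel: % ELL, % low-income, and % Hispanic students in Y1 vs. Y2 on a 0-80 scale.]
Same high school
Same course (English I)
Not a beginning teacher
Model controls for:
Prior achievement
Demographics
School fixed effects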
9. The Unintended Effects of VA Teacher
Evaluation in Houston: Three Cases in Point
In spring of 2011, a number of HISD
teachers’ contracts were not renewed, largely
due to:
“a significant lack of student progress attributable
to the educator,” and
“insufficient student academic growth reflected by
[EVAAS] value-added scores.”
These teachers filed wrongful termination
appeals.
Let’s take a look at the EVAAS data for 3 of
them.
10. Teacher A’s EVAAS Scores (2007-2010)
- Teacher A had been a teacher for more than 10 years, teaching elementary
school in HISD since 2000.
- Teacher A showed positive VA scores 50% of the time (8/16 of EVAAS
observations). During Teacher A’s most recent years of activity, her VA
scores were positive 2/3 of the time.
- Until 2010-11, she “exceeded expectations” across every domain in her
supervisor evaluations. She was given a “Teacher of the Month” award in
2010 and a “Teacher of the Year” award in 2008.
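As a rough check on the 8/16 figure, the sketch below computes how often a fair coin would give at least 8 "positive" results in 16 flips. It assumes the 16 EVAAS observations are independent and each equally likely to be positive or negative, which is an illustrative simplification and not a claim about how EVAAS scores behave. The function name `prob_at_least` is hypothetical.

```python
from math import comb


def prob_at_least(successes, trials, p=0.5):
    """Binomial probability of at least `successes` positives in `trials`
    independent observations, each positive with probability p."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# 8 positive scores out of 16 observations, under a fair-coin assumption.
print(round(prob_at_least(8, 16), 3))  # 0.598
```

Under these assumptions, a record of 8 positives in 16 is what chance alone would produce more often than not.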
11. Teacher B’s EVAAS Scores (2008-2010)
- Teacher B, a career-changer with a bachelor’s and master’s degree in
mathematics, was certified as a math teacher via HISD’s Alternative
Teaching Certificate (ATC) program. She taught middle- and high-school
math in HISD since 2007.
- Teacher B’s relative value-added scores were negative for math for two
years, and positive for the most recent year for which she had EVAAS data.
- Note that another math teacher taught nearly half of her students math
for an equal amount of time per week throughout the year.
- Until 2010-11, she scored “proficient” across every domain in her
supervisor evaluations.
12. Teacher C’s EVAAS Scores (2007-2010)
- Teacher C graduated with a bachelor’s degree in 2005, and in 2007 was
certified as a teacher for grades 4-8 via HISD’s Alternative Teaching
Certificate (ATC) program. She took a full-time position in HISD in 2006.
- Teacher C flip-flopped across subject areas, with positive VA scores 50% of
the time (3/6 EVAAS observations) and negative scores 50% of the time (3/6
EVAAS observations) up until 2009-2010.
- In 2009-2010 Teacher C was assigned to teach a large number of English
Language Learners who were transitioned into her classroom.
- Until 2010-11, she was rated as “exceeded expectations” or “proficient”
across every domain in terms of her supervisor evaluations.
13. The Unintended Effects of the EVAAS:
Student Characteristics Affect Teachers’ VA
Teachers teaching in grades in which English
Language Learners (ELLs) are transitioned into
mainstreamed classrooms are the least likely to show
“added value.”
Teachers teaching larger numbers of special education
students in mainstreamed classrooms are also found
to have lower “value-added” scores.
Teachers teaching students in consecutive years
report receiving bonuses for the first year and nothing
the next, as they “max out” on growth.
Teachers teaching gifted students have small gains
because their students are near the top.
14. UNINTENDED EFFECTS
“The most pernicious effect of these [test-
based accountability] systems is to cause
teachers to resent the children who don’t
score well.”
—Anonymous teacher,
in a workshop many years ago
15. How Can We Evaluate and
Develop Effective Teaching
for Every Child?
17. Effective Teachers…
Engage students in active learning
Create intellectually ambitious tasks
Use a variety of teaching strategies
Assess student learning
continuously and adapt
teaching to student needs
Create effective scaffolds and supports
Provide clear standards, constant feedback, and
opportunities for revising work
Develop and effectively manage a collaborative
classroom in which all students have membership.
18. These Qualities are Embedded in
Standards for Teaching
National Board for Professional Teaching
Standards (1987)
-- Portfolio used to certify accomplished teaching
Interstate New Teacher Assessment and
Support Consortium (INTASC) (1990)
-- Adopted in > 40 states including California (CSTP)
-- Basis of new licensing assessments
-- Recently revised to reflect Common Core Standards
Standards-based Teacher Evaluation
Instruments used in many districts
19. Standards-Based Evaluations
Use structured observations of teaching,
based on professional standards, along with
other evidence of practice (e.g. lesson
plans, student work)
Offer stable evidence over time
Are related to student learning gains
Help teachers become more effective when
they are the source of continuous feedback
(Milanowski, Kimball, & White, 2004).
20. Examples
Evaluation systems in San Mateo, Poway,
San Francisco, Cincinnati, Denver,
Rochester, New Mexico, as well as
Singapore, Netherlands, and elsewhere
A number of systems incorporate evidence
of student learning drawn from classroom
work and classroom / school / district
assessments in an integrated fashion
21. 1) Start with Standards and
Build a Unified System
Build on CA Standards for the Teaching
Profession
Create Standards-Based Approaches to state
licensure assessment and advanced certification
Use the same standards to shape teacher
evaluation tool(s) for local evaluation
Infuse into principal preparation, licensure, and
evaluation the ability to evaluate and support
teachers based on standards
22. 2) Use Performance Assessments to
Guide Teacher Preparation & Licensing
Teacher Performance Assessments examine
-- Planning for a unit of instruction
-- Instruction and rationale
-- Assessment and student learning
-- Reflection on teaching
-- Development of academic language
Trained scorers use analytic rubrics
Calibration and auditing of scores
Assessments reliably predict effectiveness
23.
24. Predictive Validity of Performance
Assessments
Mentor evaluations (Rockoff & Speroni)
National Board Certification
-- Effect sizes of .04-.20 (pass/fail)
Connecticut BEST portfolio
-- Effect size of .46 (4 point scale)
California PACT assessment
-- Effect size of .15 (44 point scale)
20 percentile point difference in adjusted
student achievement for highest and
lowest-scoring teacher
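To relate effect sizes to percentile points, a common convention treats an effect size as a shift in standard deviation units and asks where a student who started at the 50th percentile would land on a normal curve. The sketch below applies that convention. It assumes normally distributed scores, and the function name `percentile_gain` is hypothetical. The slide does not say how each effect size is scaled (for example, per rubric point), so this illustrates the conversion only and does not restate the cited findings.

```python
from statistics import NormalDist


def percentile_gain(effect_size):
    """Percentile points gained by a student starting at the 50th percentile,
    assuming normally distributed scores and an effect in SD units."""
    return 100 * (NormalDist().cdf(effect_size) - 0.5)


# Effect sizes quoted on the slide, used here only as inputs to the conversion.
for d in (0.04, 0.15, 0.20, 0.46):
    print(f"effect size {d:.2f} -> about {percentile_gain(d):.1f} percentile points")
```

The mapping is not linear: equal increases in effect size yield smaller percentile gains the further the starting point moves from the middle of the distribution.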
25. What Performance
Assessments May Offer
• A means to better evaluate teacher effectiveness
• Stable evidence that is more valid than student
achievement data which are
-- unavailable for most teachers
-- volatile across years, courses, models
-- sometimes based on narrow tests
• A lever for improving teacher learning and
program quality (preparation, induction, and PD)
26. Teacher Candidates Learn
I think for me the most valuable thing was the
sequencing of the lessons, teaching the
lesson, and evaluating what the kids were
getting, what the kids weren’t getting, and
having that be reflected in my next
lesson...the ‘teach-assess-teach-assess-
teach-assess’ process. And so you’re
constantly changing – you may have a plan
or a framework that you have together, but
knowing that that’s flexible and that it has to
be flexible, based on what the children learn
that day.
27. Teacher Educators Learn
This [scoring] experience…has forced me to
revisit the question of what really matters in
the assessment of teachers, which – in turn –
means revisiting the question of what really
matters in the preparation of teachers.
28. Cooperating Teachers
Reflect on Practice
[The scoring process] forces you to be clear
about “good teaching;” what it looks like,
sounds like. It enables you to look at your
own practice critically/with new eyes.
29. Induction Programs Learn
As an induction program coordinator, I have a
much clearer picture of what credential holders
will bring to us and of what they’ll be required
to do. We can build on this.
30. 3) Build Annual Evaluation Tools Based on
the Same Standards
Combine Evidence of Practice, Performance, and
Outcomes in an Integrated Evaluation System that
looks at
Teaching practice in relation to standards,
curriculum goals, and student needs
Contributions to colleagues and the school, and
Student learning in relation to teaching practices,
curriculum goals, and student needs.
Accomplishment of individual and group goals
31. 4) Use Multiple Data Sources to
Reflect Practice and Learning
Standards-based observation (in person or video) by
experts trained in evaluation and, ideally, the content area
Examination of curriculum plans, assignments, and
student work samples
Evidence of practices that support student learning both in
and outside of the classroom (including work with parents
& colleagues)
Evidence of student learning measured in a variety of
ways (e.g. work samples, learning progressions, pre- and
post-measures tied to curriculum, exhibitions of mastery,
as well as annual tests)
32. 6) Develop Evaluation
Expertise and Systems
Train evaluators
Release and fund expert mentors to offer
assistance
-- to beginners
-- to teachers who need additional coaching
Create evaluation panels and processes for
making decisions about tenure and
continuation in cases of intervention (e.g.
Peer Assistance and Review systems)
33. 7) Integrate Systems
Link the implementation of common core
standards to educator support and evaluation
Train and assess prospective and current
principals for teacher evaluation and support
Introduce performance-based licensure for
leaders based on understanding teaching
Use professional development policies and
compensation to support assessment
Involve senior teachers, mentors, principals,
and teacher educators as assessors
35. 8) Create policies that support the
development of expertise
Research finds that student learning gains are related to:
Strong academic background
Quality preparation prior to entry
Certification in the field taught
Experience (> 3 years)
National Board Certification
In combination, these predict more of the difference in
student learning gains than race & parent education
combined (Clotfelter, Ladd, & Vigdor, 2008).
Policies should strengthen & equalize these features.
36. Expand High-Quality
Pathways to Teaching
Evaluate all preparation and induction
programs based on results of
-- teacher performance assessments (TPA)
-- graduates’ contributions to student learning
-- retention rates in teaching
Use results in program approval / accreditation
decisions
Study features of successful programs & create
incentives for other programs to adopt these features
Expand successful programs and eliminate those
that don’t improve
37. 9) Deepen Professional Learning
Create a strong infrastructure for professional
learning that is:
Responsive to teacher and principal needs
Sustained and readily available
Grounded in curriculum content
Supportive of diverse learners
Supported by coaching
Connected to collaborative work in
professional learning communities
Integrated into school and classroom planning around
curriculum, instruction, and assessment
38. What Research Tells Us
Well-designed professional development can
improve practice and increase student
achievement.
A review of high-quality experimental studies
found that among programs offering extended
PD (49 hours on average over 6 to 12 months),
student achievement increased by 21
percentile points. (Yoon et al., 2007)
One-shot workshops do not have positive
effects.
39. The Status of Professional
Development in the United States
Effective professional development is better
understood but still relatively rare in the U.S.
Most teachers (>90%) participate in 1 to 2 day
workshops and conferences.
Well under half get sustained PD, get mentoring
or coaching, or observe other classrooms.
Only 17% of U.S. teachers reported a great deal
of cooperative effort among staff members in
2004. This percentage shrank to 15% in 2008.
40. Professional Learning Opportunities
in High-Achieving Nations Abroad
High-achieving nations in Europe and Asia:
Ensure extensive (3-4 year) initial preparation
that includes clinical training in model schools
Provide beginners with intensive mentoring.
Offer extensive, sustained learning opportunities
embedded in practice:
Teachers have 15-25 hours a week for collaboration
plus 100 hours a year for professional learning
Most engage regularly in Lesson Study, Action
Research, and Peer Observation and Coaching to
evaluate and improve practice.
41. 10) Address other Influences on
Teaching Effectiveness
Mentoring and professional development
Curriculum and assessments that support
meaningful instruction
Collaborative planning that builds knowledge
& creates coherence
Personalization
Availability of high-quality materials
Administrative supports for instruction
42. A Smart System Would…
1. Adopt teaching standards that are coherent across the career
2. Use Performance Assessments for initial licensure, professional
licensure, & advanced certification
-- scored by practicing educators / teacher educators
-- used to evaluate and accredit programs
3. Develop local teacher evaluation based on the same
professional standards
4. Combine multi-faceted evidence about practice, professional
contributions, and student learning in an integrated model
5. Build expertise and professional support for evaluation by
focusing on principal knowledge and skills as well
6. Leverage changes in preparation, induction, and professional
development based on what features produce results.
7. Equalize access to teachers who are prepared and certified
based on these stronger measures.
Editor's Notes
Value-added models are designed to quantify the amount of achievement “value” teachers add to their students over the course of a school year. There are some old ideas here, but with some new vocabulary and some new statistical twists.
In a 2003 RAND research report, McCaffrey, Koretz, Lockwood, and Hamilton had this to say.
In 2009, the NRC’s Board on Testing and Assessment issued a letter report directed to Education Secretary Arne Duncan, commenting on the Department’s proposal on the Race to the Top Fund. That letter included strong cautions concerning value-added models, and strongly urged further research and pilot studies before mandating any operational use of these models. Since then, the evidence has continued to accumulate that these models have serious problems.
Introduction: The score of interest here was the gain score index comparing each individual teacher to other similar teachers across the district. This is the score that is used by HISD for determining ASPIRE awards.
- Because comparisons are made based on one standard error, teachers with a score above 1.0 are deemed as adding value, and teachers scoring between 1.0 and -1.0 are deemed as Not Detectably Different (NDD) from other like teachers across the district. These numbers are noted with asterisks.
Notes:
- Teacher A’s EVAAS performance is really no different than the flip of a coin.
- Whether she demonstrated “a significant lack of EVAAS growth” is debatable.
- “Exceeding Expectations” is the best score one can receive.
Outcome:
- Teacher A decided to quit teaching in HISD and did not pursue a wrongful termination hearing.
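The one-standard-error rule described in the introduction to Teacher A's notes can be sketched as a simple classifier. The function name `classify_gain_index` is hypothetical. The notes state only the "adds value" and NDD categories, so the label for scores below -1.0 and the treatment of scores exactly at 1.0 or -1.0 are assumptions made here for illustration.

```python
def classify_gain_index(score, threshold=1.0):
    """Classify a gain score index using the one-standard-error rule."""
    if score > threshold:
        return "adds value"
    if score < -threshold:
        # Assumed label: the notes do not name the category below -1.0.
        return "below comparison group (assumed label)"
    # Scores from -1.0 to 1.0; including the endpoints is an assumption.
    return "NDD"


for score in (1.4, 0.6, -0.9, -1.3):
    print(score, "->", classify_gain_index(score))
```

Under this rule, any score inside the one-standard-error band is treated the same, whether it is slightly positive or slightly negative.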
Notes:
- Teacher B’s most recent year was purportedly her best.
- Whether Teacher B can be held responsible for 100% of her students’ math value-added scores across all three of these years, given her purported losses and gains alike, is debatable.
Outcome:
- Teacher B decided to quit teaching in HISD and did not pursue a wrongful termination hearing.
Notes:
- Like Teacher A, no different than the flip of a coin.
- Teacher C said: “I went to a transition classroom, and now there’s a red flag next to my name. I guess now I’m an ineffective teacher? I keep getting letters from the district, saying ‘You’ve been recognized as an outstanding teacher’…this, this, and that. But now because I teach English Language Learners who ‘transition in,’ my scores drop? And I get a flag next to my name for not teaching them well?”
- Teachers in Houston generally note that teachers who teach ELLs in transition are the least likely to show growth across grade levels.
Outcome:
- Teacher C decided to quit teaching in HISD and did not pursue a wrongful termination hearing.
Bullet 1: One teacher noted: “I’m scared to teach in the 4th grade. I’m scared I might lose my job if I teach in an [ELL] transition grade level, because I’m scared my scores are going to drop, and I’m going to get fired because there’s probably going to be no growth.” Another teacher noted: “When they say nobody wants to do 4th grade – nobody wants to do 4th grade! Nobody.”
Bullet 3: A teacher noted: “I found out that I [have been] competing with myself.”
Bullet 4: A gifted teacher noted: “Every year I have the highest test scores, I have fellow teachers that come up to me when they get their bonuses…One recently came up to me [and] literally cried - ‘I’m so sorry.’… I’m like, don’t be sorry…It’s not your fault. Here I am…with the highest test scores and I’m getting $0 in bonuses. It makes no sense year to year how this works…. How do I, how do I… you know… I don’t know what to do. I don’t know how to get higher than a 100%.” Another gifted teacher noted, “I have students [in a 5th grade gifted reading class] who score at the 6th-, 7th-, and 8th-grade levels in reading. But I’m like please babies, score at the 9th grade level, ’cause if you don’t score at the 9th or 10th grade or higher in 5th grade with me, I’m going to show negative growth. Even though you, you’re gifted and you’re talented, and you’re high! I can only push you so much higher when you are already so high. I’m scared.”
I can still hear a teacher I met over 20 years ago saying these words. I have a great fear that thoughtless implementation of score-based teacher evaluation models may undermine the education of our most vulnerable children.