Value-added models are designed to quantify the amount of achievement “value” teachers add to their students over the course of a school year. There are some old ideas here, but with some new vocabulary and some new statistical twists.
In a 2003 RAND research report, McCaffrey, Koretz, Lockwood, and Hamilton had this to say.
In 2009, the NRC’s Board on Testing and Assessment issued a letter report directed to Education Secretary Arne Duncan, commenting on the Department’s proposal on the Race to the Top Fund. That letter included strong cautions concerning value-added models, and strongly urged further research and pilot studies before mandating any operational use of these models. Since then, the evidence has continued to accumulate that these models have serious problems.
Introduction: The score of interest here was the gain score index comparing each individual teacher to other similar teachers across the district. This is the score that is used by HISD for determining ASPIRE awards. - Because comparisons are made based on one standard error, teachers with a score above 1.0 are deemed as adding value, and teachers scoring between 1.0 and -1.0 are deemed as Not Detectably Different (NDD) from other like teachers across the district. These numbers are noted with asterisks. Notes: - Teacher A’s EVAAS performance is really no different than the flip of a coin. - Whether she demonstrated “a significant lack of EVAAS growth” is debatable. -“Exceeding Expectations” is the best score one can receive. Outcome: -- Teacher A decided to quit teaching in HISD and did not pursue a wrongful termination hearing.
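The one-standard-error rule described in these notes can be sketched as a small function. This is only an illustration of the thresholds as stated above, not HISD's or EVAAS's actual procedure. The function name is hypothetical, the treatment of scores exactly at 1.0 or -1.0 is an assumption, and the notes give no label for scores below -1.0, so the sketch leaves that band unnamed.

```python
def classify_gain_index(index: float) -> str:
    """Label a gain score index using the thresholds described in the notes."""
    if index > 1.0:
        # Above one standard error: deemed as adding value.
        return "adding value"
    if index >= -1.0:
        # Between 1.0 and -1.0 (boundaries included here by assumption).
        return "Not Detectably Different (NDD)"
    # Assumption: the notes do not name this band, so no official label is used.
    return "below -1.0 (label not given in the notes)"


# Hypothetical index values, for illustration only.
for value in (1.5, 0.3, -1.0, -2.0):
    print(value, "->", classify_gain_index(value))
```

Under this rule a teacher whose index moves from 0.9 to 1.1 changes category even though the two scores differ by far less than one standard error.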
Notes: - Teacher B’s most recent year was purportedly her best. -- Whether Teacher B can be held responsible for 100% of her students’ math value-added scores across all three of these years, given her purported losses and gains alike, is debatable. Outcome: -- Teacher B decided to quit teaching in HISD and did not pursue a wrongful termination hearing.
Notes: -- Like Teacher A, no different than the flip of a coin. -- Teacher C said: “I went to a transition classroom, and now there’s a red flag next to my name. I guess now I’m an ineffective teacher? I keep getting letters from the district, saying ‘You’ve been recognized as an outstanding teacher’…this, this, and that. But now because I teach English Language Learners who ‘transition in,’ my scores drop? And I get a flag next to my name for not teaching them well?” -- Teachers in Houston generally note that teachers who teach ELLs in transition are the least likely to show growth across grade levels. Outcome: -- Teacher C decided to quit teaching in HISD and did not pursue a wrongful termination hearing.
Bullet 1: One teacher noted: “I’m scared to teach in the 4th grade. I’m scared I might lose my job if I teach in an [ELL] transition grade level, because I’m scared my scores are going to drop, and I’m going to get fired because there’s probably going to be no growth.” Another teacher noted: “When they say nobody wants to do 4th grade – nobody wants to do 4th grade! Nobody.” Bullet 3: A teacher noted: “I found out that I [have been] competing with myself.” Bullet 4: A gifted teacher noted: “Every year I have the highest test scores, I have fellow teachers that come up to me when they get their bonuses…One recently came up to me [and] literally cried - ‘I’m so sorry.’… I’m like, don’t be sorry…It’s not your fault. Here I am…with the highest test scores and I’m getting $0 in bonuses. It makes no sense year to year how this works…. How do I, how do I… you know… I don’t know what to do. I don’t know how to get higher than a 100%.” Another gifted teacher noted, “I have students [in a 5th grade gifted reading class] who score at the 6th-, 7th-, and 8th-grade levels in reading. But I’m like please babies, score at the 9th grade level, cause if you don’t score at the 9th or 10th grade or higher in 5th grade with me, I’m going to show negative growth. Even though you, you’re gifted and you’re talented, and you’re high! I can only push you so much higher when you are already so high. I’m scared.”
I can still hear a teacher I met over 20 years ago saying these words. I have a great fear that thoughtless implementation of score-based teacher evaluation models may undermine the education of our most vulnerable children.
Developing and Assessing Teacher Effectiveness
Getting Teacher Evaluation Right
Policy Context Focusing on teacher effectiveness is seen as a promising path for education policy New teacher evaluation systems, and especially, “Value Added Models” (VAMs) are promoted as tools to accomplish this goal Policy makers can benefit from research about what various models actually can and cannot do
Professional Consensus The research base is currently insufficient to support the use of VAM for high-stakes decisions about individual teachers or schools. –RAND Corporation, 2005
Professional Consensus VAM estimates of teacher effectiveness … should not be used to make operational decisions because such estimates are far too unstable to be considered fair or reliable. – 2009 Letter Report from the Board on Testing and Assessment, National Research Council
Concerns Raised about Value-Added Measures Studies find that teachers’ value-added “effectiveness” is highly variable & influenced by: The measure of achievement used The effectiveness of peers Class size, curriculum, instructional supports, and time spent with students Tutoring and out-of-school learning Student characteristics and attendance
Many Factors Influence Student Achievement Teacher knowledge, skills, dispositions, and behaviors that support the learning process. Hanushek et al. estimate the individual teacher effects component of measured student achievement is about 7%-10% of the total. Student availability for learning – Prior learning opportunities, health, supportive home context, attendance, developed abilities Resources for learning – Curriculum quality, materials, class sizes, specialist supports, etc. Coherence and continuity – The extent to which content & skills are well organized and reinforced across grades and classes
Value-Added Measures of Teacher Effects are Not Highly Stable
                              By at least 1 decile   By at least 2 deciles   By at least 3 deciles
Across statistical models*    56-80%                 12-33%                  0-14%
Across courses*               85-100%                54-92%                  39-54%
Across years*                 74-93%                 45-63%                  19-41%
*Depending on the model
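Reading the percentages on this slide as the share of teachers whose decile rank shifts by at least 1, 2, or 3 deciles between two settings, the calculation behind such a figure might look like the sketch below. The function name and the ranks are hypothetical and are not data from the studies cited.

```python
def share_shifting(ranks_a, ranks_b, k):
    """Fraction of teachers whose decile rank differs by at least k between two settings."""
    if len(ranks_a) != len(ranks_b) or not ranks_a:
        raise ValueError("rank lists must be non-empty and of equal length")
    moved = sum(1 for a, b in zip(ranks_a, ranks_b) if abs(a - b) >= k)
    return moved / len(ranks_a)


# Hypothetical decile ranks (1-10) for five teachers in two years.
year1 = [10, 4, 7, 2, 5]
year2 = [1, 5, 7, 4, 8]

for k in (1, 2, 3):
    print(f"shift of at least {k} decile(s): {share_shifting(year1, year2, k):.0%}")
```

The same function applies whether the two lists come from different years, different courses, or different statistical models for the same teachers.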
A Teacher’s Measured “Effectiveness” Can Vary Widely [Figure: one teacher’s decile rank in Year 1 and Year 2 (same high school, same course (English I), not a beginning teacher), shown with class composition in each year (% ELL, % low-income, % Hispanic). Model controls for prior achievement, demographics, and school fixed effects.]
The Unintended Effects of VA Teacher Evaluation in Houston: Three Cases in Point In spring of 2011, a number of HISD teachers’ contracts were not renewed, largely due to: “a significant lack of student progress attributable to the educator,” and “insufficient student academic growth reflected by [EVAAS] value-added scores.” These teachers filed wrongful termination appeals. Let’s take a look at the EVAAS data for 3 of them.
Teacher A’s EVAAS Scores (2007-2010) - Teacher A had been a teacher for more than 10 years, teaching elementary school in HISD since 2000. - Teacher A showed positive VA scores 50% of the time (8/16 of EVAAS observations). During Teacher A’s most recent years of activity, her VA scores were positive 2/3 of the time. - Until 2010-11, she “exceeded expectations” across every domain in her supervisor evaluations. She was given a “Teacher of the Month” award in 2010 and a “Teacher of the Year” award in 2008.
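One rough way to see why 8 positive scores out of 16 looks like chance is to ask how often a fair coin would do at least as well. The sketch below assumes each observation is an independent 50/50 event. That is a simplification made only for illustration: real EVAAS scores across subjects and years are not independent, and the function name is hypothetical.

```python
from math import comb


def prob_at_least(successes: int, trials: int, p: float = 0.5) -> float:
    """Binomial probability of getting at least `successes` in `trials` independent tries."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Teacher A: 8 positive scores in 16 observations, compared with a fair coin.
chance = prob_at_least(8, 16)
print(f"P(at least 8 positives in 16 fair flips) = {chance:.2f}")  # about 0.60
```

A fair coin matches or beats this record roughly 60% of the time, so under these assumptions the pattern does not distinguish the teacher from chance in either direction.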
Teacher B’s EVAAS Scores (2008-2010) - Teacher B, a career-changer with a bachelor’s and master’s degree in mathematics, was certified as a math teacher via HISD’s Alternative Teaching Certificate (ATC) program. She taught middle- and high-school math in HISD since 2007. - Teacher B’s relative value-added scores were negative for math for two years, and positive for the most recent year for which she had EVAAS data. - Note that she taught alongside another math teacher who taught nearly half of her students math an equal amount of time per week all year long. - Until 2010-11, she scored a “proficient” across every domain in terms of her supervisor evaluations.
Teacher C’s EVAAS Scores (2007-2010) - Teacher C graduated with a bachelor’s degree in 2005, and in 2007 was certified as a teacher for grades 4-8 via HISD’s Alternative Teaching Certificate (ATC) program. She took a full-time position in HISD in 2006. - Teacher C flip-flopped across subject areas, with positive VA scores 50% of the time (3/6 EVAAS observations) and negative scores 50% of the time (3/6 EVAAS observations) up until 2009-2010. - In 2009-2010 Teacher C was assigned to teach a large number of English Language Learners who were transitioned into her classroom. - Until 2010-11, she was rated as “exceeded expectations” or “proficient” across every domain in terms of her supervisor evaluations.
The Unintended Effects of the EVAAS: Student Characteristics Affect Teachers’ VA Teachers teaching in grades in which English Language Learners (ELLs) are transitioned into mainstreamed classrooms are the least likely to show “added value.” Teachers teaching larger numbers of special education students in mainstreamed classrooms are also found to have lower “value-added” scores. Teachers teaching students in consecutive years report receiving bonuses for the first year and nothing the next, as they “max out” on growth. Teachers teaching gifted students have small gains because their students are near the top.
UNINTENDED EFFECTS “The most pernicious effect of these [test-based accountability] systems is to cause teachers to resent the children who don’t score well.” – Anonymous teacher, in a workshop many years ago
How Can We Evaluate and Develop Effective Teaching for Every Child?
What Do Effective and Equitable Teachers Know and Do?
Effective Teachers… Engage students in active learning Create intellectually ambitious tasks Use a variety of teaching strategies Assess student learning continuously and adapt teaching to student needs Create effective scaffolds and supports Provide clear standards, constant feedback, and opportunities for revising work Develop and effectively manage a collaborative classroom in which all students have membership.
These Qualities are Embedded in Standards for Teaching National Board for Professional Teaching Standards (1987) -- Portfolio used to certify accomplished teaching Interstate New Teacher Assessment and Support Consortium (INTASC) (1990) -- Adopted in > 40 states including California (CSTP) -- Basis of new licensing assessments -- Recently revised to reflect Common Core Standards Standards-based Teacher Evaluation Instruments used in many districts
Standards-Based Evaluations Use structured observations of teaching, based on professional standards, along with other evidence of practice (e.g. lesson plans, student work) Offer stable evidence over time Are related to student learning gains Help teachers become more effective when they are the source of continuous feedback (Milanowski, Kimball, & White, 2004).
Examples Evaluation systems in San Mateo, Poway, San Francisco, Cincinnati, Denver, Rochester, New Mexico, as well as Singapore, Netherlands, and elsewhere A number of systems incorporate evidence of student learning drawn from classroom work and classroom / school / district assessments in an integrated fashion
1) Start with Standards and Build a Unified System Build on CA Standards for the Teaching Profession Create Standards-Based Approaches to state licensure assessment and advanced certification Use the same standards to shape teacher evaluation tool(s) for local evaluation Infuse into principal preparation, licensure, and evaluation the ability to evaluate and support teachers based on standards
2) Use Performance Assessments to Guide Teacher Preparation & Licensing Teacher Performance Assessments examine -- Planning for a unit of instruction -- Instruction and rationale -- Assessment and student learning -- Reflection on teaching -- Development of academic language Trained scorers use analytic rubrics Calibration and auditing of scores Assessments reliably predict effectiveness
Predictive Validity of Performance Assessments Mentor evaluations (Rockoff & Speroni) National Board Certification -- Effect sizes of .04 -.20 (pass/fail) Connecticut BEST portfolio -- Effect size of .46 (4 point scale) California PACT assessment -- Effect size of .15 (44 point scale) 20 percentile point difference in adjusted student achievement for highest and lowest-scoring teacher
What Performance Assessments May Offer • A means to better evaluate teacher effectiveness • Stable evidence that is more valid than student achievement data which are -- unavailable for most teachers -- volatile across years, courses, models -- sometimes based on narrow tests • A lever for improving teacher learning and program quality (preparation, induction, and PD)
Teacher Candidates Learn “I think for me the most valuable thing was the sequencing of the lessons, teaching the lesson, and evaluating what the kids were getting, what the kids weren’t getting, and having that be reflected in my next lesson...the ‘teach-assess-teach-assess-teach-assess’ process. And so you’re constantly changing – you may have a plan or a framework that you have together, but knowing that that’s flexible and that it has to be flexible, based on what the children learn that day.”
Teacher Educators Learn “This [scoring] experience…has forced me to revisit the question of what really matters in the assessment of teachers, which – in turn – means revisiting the question of what really matters in the preparation of teachers.”
Cooperating Teachers Reflect on Practice “[The scoring process] forces you to be clear about ‘good teaching’: what it looks like, sounds like. It enables you to look at your own practice critically/with new eyes.”
Induction Programs Learn “As an induction program coordinator, I have a much clearer picture of what credential holders will bring to us and of what they’ll be required to do. We can build on this.”
3) Build Annual Evaluation Tools Based on the Same Standards Combine Evidence of Practice, Performance, and Outcomes in an Integrated Evaluation System that looks at Teaching practice in relation to standards, curriculum goals, and student needs Contributions to colleagues and the school, and Student learning in relation to teaching practices, curriculum goals, and student needs. Accomplishment of individual and group goals
4) Use Multiple Data Sources to Reflect Practice and Learning Standards-based observation (in person or video) by experts trained in evaluation and, ideally, the content area Examination of curriculum plans, assignments, and student work samples Evidence of practices that support student learning both in and outside of the classroom (including work with parents & colleagues) Evidence of student learning measured in a variety of ways (e.g. work samples, learning progressions, pre- and post-measures tied to curriculum, exhibitions of mastery, as well as annual tests)
6) Develop Evaluation Expertise and Systems Train evaluators Release and fund expert mentors to offer assistance -- to beginners -- to teachers who need additional coaching Create evaluation panels and processes for making decisions about tenure and continuation in cases of intervention (e.g. Peer Assistance and Review systems)
7) Integrate Systems Link the implementation of common core standards to educator support and evaluation Train and assess prospective and current principals for teacher evaluation and support Introduce performance-based licensure for leaders based on understanding teaching Use professional development policies and compensation to support assessment Involve senior teachers, mentors, principals, and teacher educators as assessors
After Evaluation, Then What? How Do We Develop Effective Teaching?
8) Create policies that support the development of expertise Research finds that student learning gains are related to: Strong academic background Quality preparation prior to entry Certification in the field taught Experience (> 3 years) National Board Certification In combination, these predict more of the difference in student learning gains than race & parent education combined (Clotfelter, Ladd, & Vigdor, 2008). Policies should strengthen & equalize these features.
Expand High-Quality Pathways to Teaching Evaluate all preparation and induction programs based on results of -- teacher performance assessments (TPA) -- graduates’ contributions to student learning -- retention rates in teaching Use results in program approval / accreditation decisions Study features of successful programs & create incentives for other programs to adopt these features Expand successful programs and eliminate those that don’t improve
9) Deepen Professional Learning Create a strong infrastructure for professional learning that is: Responsive to teacher and principal needs Sustained and readily available Grounded in curriculum content Supportive of diverse learners Supported by coaching Connected to collaborative work in professional learning communities Integrated into school and classroom planning around curriculum, instruction, and assessment
What Research Tells Us Well-designed professional development can improve practice and increase student achievement. A review of high-quality experimental studies found that among programs offering extended PD (49 hours on average over 6 to 12 months), student achievement increased by 21 percentile points. (Yoon et al., 2007) One-shot workshops do not have positive effects.
The Status of Professional Development in the United States Effective professional development is better understood but still relatively rare in the U.S. Most teachers (>90%) participate in 1 to 2 day workshops and conferences. Well under half get sustained PD, get mentoring or coaching, or observe other classrooms. Only 17% of U.S. teachers reported a great deal of cooperative effort among staff members in 2004. This percentage shrank to 15% in 2008.
Professional Learning Opportunities in High-Achieving Nations Abroad High-achieving nations in Europe and Asia: Ensure extensive (3-4 year) initial preparation that includes clinical training in model schools Provide beginners with intensive mentoring. Offer extensive, sustained learning opportunities embedded in practice: Teachers have 15-25 hours a week for collaboration plus 100 hours a year for professional learning Most engage regularly in Lesson Study, Action Research, and Peer Observation and Coaching to evaluate and improve practice.
10) Address Other Influences on Teaching Effectiveness Mentoring and professional development Curriculum and assessments that support meaningful instruction Collaborative planning that builds knowledge & creates coherence Personalization Availability of high-quality materials Administrative supports for instruction
A Smart System Would… 1. Adopt teaching standards that are coherent across the career. 2. Use Performance Assessments for initial licensure, professional licensure, & advanced certification -- scored by practicing educators / teacher educators -- used to evaluate and accredit programs. 3. Develop local teacher evaluation based on the same professional standards. 4. Combine multi-faceted evidence about practice, professional contributions, and student learning in an integrated model. 5. Build expertise and professional support for evaluation by focusing on principal knowledge and skills as well. 6. Leverage changes in preparation, induction, and professional development based on what features produce results. 7. Equalize access to teachers who are prepared and certified based on these stronger measures.