Using Assessment Data for Educator and Student Growth
This presentation reviews major topics to be considered when using assessment data in implementing a school's program of educator and student growth and evaluation. By attending this workshop, participants will improve their assessment literacy, learn how to improve student achievement and instructional effectiveness through thoughtful data use, and discuss common issues shared by educators when using data for evaluative purposes.

  • Teacher evaluations, and the use of data in them, can take many forms. You can use them to support teachers and their improvement, to compensate teachers or groups of teachers differently, or, in their highest-stakes form, to terminate teachers.
    The higher the stakes placed on the evaluation, the more risk there is to you and your organization from a political, legal, and equity perspective. Most people naturally respond by increasing the rigor put into designing the process as a way to reduce the risk. The risk, however, can’t be eliminated.
    Our goal: make sure you are prepared. Understand the risk, learn proper ways to implement (including legal issues), and clarify some of the implications. This is very complex; the aim is to prepare you to take a prudent course.
  • Contrast this with what value added communicates.
    Plot Marcus’ growth against his anticipated (normal) growth; that comparison is what value added measures. If you ask whether his teachers added value, the answer is yes. The other line is the growth needed for college readiness.
    The blue line is what is used to evaluate the teacher. Is Marcus on the line his parents want him to be on? Probably not.
    Don’t focus on one at the expense of the other.
    Under NCLB, the goal was AYP, not what parents really want when setting goals. We can become so focused on measuring teachers that we lose sight of what parents value; we are better off moving toward the kids’ aspirations.
    As a parent, I didn’t care whether the school made AYP. I cared whether my kids got the courses that helped them go where they want to go.
  • These steps are quite important, and people tend to skip some of them.
    First, kids take a test. It is important that the test is aligned to the instruction being given.
    Second, the metric: compare growth with a growth norm and calculate a growth index. This has two benefits: it is very transparent and simple. People tend to use our growth norms; if 60% of students in a grade level within a school meet their norm, you are doing well. Norms compare the growth of a kid or group of kids with a nationally representative sample of students.
    Why isn’t this value added? Not all teachers can be compared to a nationally representative sample, because they don’t teach kids who are just like the national sample.
    The third step controls for variables unique to the teacher’s classroom or environment.
    The fourth step is the rating: how far below average before the district takes action, or how far above before someone gets performance pay. This is a particular challenge in New York State right now, because the law requires it.
  • The Common Core sets out very ambitious things to measure: students are expected to tackle the kind of tasks found on an AP test, and to write and show their work.
    Using a Common Core assessment to evaluate teachers can be a problem.
    Raise your hand if you know the capital of Chile. It’s Santiago. Repeat after me; we will review in a couple of minutes. Facts can be acquired relatively easily and are instructionally sensitive: if you expose kids to facts in meaningful and engaging ways, the results are sensitive to instruction.
  • State assessments are designed to measure proficiency, so many items sit in the middle of the range, not at the ends.
    You must use multiple data points over time to measure this.
    We also believe that the principal, rather than the test, should be more in control of the evaluation; principals and teacher leaders are what change schools.
  • 5th grade NY reading cut scores shown
  • Problem: these items are insensitive to instruction.
    They depend on prerequisite skills, such as writing skills.
    Example prompt: “Given events in North Africa today, …”
    The question requires a lot of prerequisite knowledge. You need to know the story, put it into writing, use reasoning skills to connect it with today’s events, and know what is going on today as well. No one develops this entire set of skills in nine months of instruction.
    The Common Core is what we want, just not for teacher evaluation.
    These questions are not very sensitive to instruction, which is problematic when we hold teachers accountable for instruction or growth.
  • How you talk with students in advance
    How students see their data being used: does it make a difference in their lives?
    Test scheduling and pre- or post-test activities
    When during the day testing is scheduled
  • NCLB required everyone to get to proficient, so the message was to focus on kids at or near proficient, and school systems responded.
    Middle school standards are harder than elementary standards, which creates a middle school problem. There was no effort to calibrate them, and no effort to project elementary standards onto middle school standards.
    Standards start easy and ramp up, so a student can be proficient in elementary school and not in middle school while making normal growth.
    When you control for the difficulty of the standards, elementary and middle school performance are the same.
  • Not only are standards different across grades, they are different across states.
    It’s data like this that helped inspire the Common Core and consistent standards, so that we compare apples to apples.
  • There are dramatic differences between a standards-based view and a growth view. KY 5th-grade mathematics, sample of students from a large school system.
    X-axis: fall score; y-axis: number of kids.
    Blue: kids who did not change status between fall and spring on the state test.
    Red: kids who dropped below the cut by spring (descenders).
    Green: kids who moved above the cut by spring (ascenders, the bubble kids).
    Together these are about 10% of the total number of kids, yet accountability plans are typically built around these red and green kids.
  • Same district as before.
    Yellow: did not meet target growth; these kids are spread over the entire range.
    Green: met growth targets.
    60% vs. 40% is doing well; this is a high-performing district with high growth.
    We must attend to all kids, and that is a good thing: the ones in the middle and at both extremes. The old approach was discriminatory, focusing on some kids in lieu of others. Teachers who have taught really hard to the standard for years need to be able to reach them all.
    This does a lot to move the accountability system toward what parents, and we, want.
  • There are wonderful teachers who teach in very challenging, dysfunctional settings, and the setting can affect growth. HLM (hierarchical linear modeling) nests the student within a classroom and the classroom within a school, and controls for school-level parameters. Is it perfect? No. Is it better? Yes.
    The opposite is also true: learning can be magnified as well.
    What if the kids are a challenge, for instance ESL students or students with attendance problems? That can deflate scores, especially when only a small number of kids are in the sample being analyzed. You also need a large enough n to make this possible, which is especially true in small districts.
    Our position is that a test can inform the decision, but the principal or administrator should collect the bulk of the data used in the performance evaluation process.
  • Measurement error from test 1 and test 2 is compounded when the two are combined into a growth score.
  • The green line is each teacher’s value-added estimate and the bar is the measurement error.
    At both the top and the bottom, a teacher could actually belong in another quintile.
    Teachers in the middle can cross quintiles based on the SEM alone.
    Think of a cross-country race: at the end, the winners are spread out, but in the middle you get a pack, so moving up a little in the middle makes a big difference in overall place.
    Because of this instability and the narrowness of the ranges, for teachers near the middle of the distribution a slight change in performance can mean a large change in ranking.
  • There is no solid research on setting learning and performance goals at the same time. For complex situations where learning is required, learning goals work best, then “do your best” goals, then performance goals. The focus should be on mastering skills rather than reaching a desired level of performance; that will come later. Performance goals distract from the learning that is needed.
    Learning goals also help moderate cheating, as opposed to performance goals.
  • Use the NY point system as the example.

Using Assessment Data for Educator and Student Growth Presentation Transcript

  • 1. Using Assessment Data for Educator and Student Growth. Andy Hegedus, Ed.D., Kingsbury Center at NWEA, June 2014
  • 2. My Purpose • Increase your understanding of various urgent assessment-related topics: ask better questions; useful for making all types of decisions with data
  • 3. Three primary conditions: 1. Alignment between the content assessed and the content to be taught. 2. Selection of an appropriate assessment: used for the purpose for which it was designed (proficiency vs. growth); can accurately measure the knowledge of all students; adequate sensitivity to growth. 3. Adjust for context, controlling for factors outside a teacher’s direct control (value-added).
  • 4. Two approaches we like: 1. Assessment results used wisely as part of a dialogue to help teachers set and meet challenging goals. 2. Use of tests as a “yellow light” to identify teachers who may need additional support or are ready for more.
  • 5. Go forth thoughtfully, with care • What we’ve known to be true is now being shown to be true: using data thoughtfully improves student achievement and growth rates (12% mathematics, 13% reading). • There are dangers present, however: unintended consequences. Source: Slotnik, W. J., & Smith, M. D. (February 2013). It’s more than money. Retrieved from http://www.ctacusa.com/PDFs/MoreThanMoney-report.pdf
  • 6. Remember the old adage? “What gets measured (and attended to), gets done.”
  • 7. An infamous example • NCLB: cast light on inequities; improved performance of “bubble kids”; narrowed the taught curriculum. The same dynamic happens inside your schools.
  • 8. It’s what we do that counts. A patient’s health doesn’t change because we know their blood pressure; it’s our response that makes all the difference.
  • 9. Be considerate of the continuum of stakes involved: Support, Compensate, Terminate, with increasing levels of required rigor and increasing risk.
  • 10. Chart: Marcus’ growth compared with normal growth and needed growth, with the college readiness standard marked.
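Slide 10’s point, that a student can show normal growth and still fall short of where he needs to be, comes down to simple arithmetic. The Python sketch below uses made-up numbers: a current score of 200, a hypothetical college readiness target of 230 three years out, and typical growth of 7 points a year. None of these values come from the presentation, and the sketch assumes an equal-interval scale.

```python
def needed_annual_growth(current: float, target: float, years: int) -> float:
    """Points per year required to reach the target on schedule."""
    return (target - current) / years


def projected_score(current: float, annual_growth: float, years: int) -> float:
    """Score reached if the student keeps growing at the same rate each year."""
    return current + annual_growth * years


# Hypothetical values for a student like Marcus
current, target, years, normal = 200, 230, 3, 7

needed = needed_annual_growth(current, target, years)
on_track = projected_score(current, normal, years) >= target
print(f"needed per year: {needed}, normal per year: {normal}, on track: {on_track}")
```

With these numbers, normal growth looks fine from a value-added standpoint, yet it leaves the student 9 points short of the target after three years. That shortfall is the gap between the two lines on the slide.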
  • 11. Top-Down Model: there are four key steps required to answer this question: The Test, The Growth Metric, The Evaluation, The Rating.
  • 12. Bottom-Up Model (Student Learning Objectives): how does the other popular process work? Assessment 1, Goal Setting, Assessment(s), Results and Analysis, Evaluation (Rating). Understanding all four of the top-down elements is needed here too.
  • 13. Let’s begin at the beginning: The Test, The Growth Metric, The Evaluation, The Rating
  • 14. What is measured should be aligned to what is to be taught. 3rd Grade ELA Standards: 1. Answer questions to demonstrate understanding of text…. 2. Determine the main idea of a text…. 3. Determine the meaning of general academic and domain specific words… Would you use a general reading assessment in the evaluation of a… 3rd grade ELA teacher? 3rd grade social studies teacher? Elementary art teacher? ~30% of teachers teach in tested subjects and grades. Source: The Other 69 Percent: Fairly Rewarding the Performance of Teachers of Nontested Subjects and Grades, http://www.cecr.ed.gov/guides/other69Percent.pdf
  • 15. What is measured should be aligned to what is to be taught • Assessments should align with the teacher’s instructional responsibility: specific advanced content (HS teachers teaching discipline-specific content, especially in 11th and 12th grade; MS teachers teaching HS content to advanced students); non-tested subjects (school-wide results more likely reflect “professional responsibility” than competence); HS teachers providing remedial services
  • 16. The purpose and design of the instrument is significant • Many assessments are not designed to measure growth • Others do not measure growth equally well for all students
  • 17. Let’s ensure we have similar meaning. Chart: a student’s status (x) at Time 1 and Time 2 on a scale running from Beginning Literacy to Adult Reading, with the 5th-grade level marked. Two assumptions: 1. measurement accuracy, and 2. a vertical interval scale.
  • 18. Accurately measuring growth depends on accurately measuring achievement
  • 19. What does it take to accurately measure achievement? Questions surrounding the student’s achievement level: the more questions, the merrier.
  • 20. Teachers encounter a distribution of student performance. Chart: students (x) spread across the scale from Beginning Literacy to Adult Reading, around 5th-grade-level performance.
  • 21. Adaptive testing works differently: the item bank can span the full range of achievement.
  • 22. How about accurately measuring height? What if the yardstick stopped in the middle of his back?
  • 23. Items available need to match student ability. Chart: California STAR vs. NWEA MAP.
  • 24. How about accurately measuring height? What if we could only mark within a predefined six-inch range?
  • 25. These differences impact measurement error. Chart: test information (0 to .12) by scale score (160 to 240) for a fully adaptive test vs. a constrained adaptive or paper/pencil test built from 5th-grade-level items; the two show significantly different error.
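Why a test built from grade-level items measures off-level students less precisely can be illustrated with item response theory. The slide does not say which measurement model underlies its curves, so the Python sketch below assumes the Rasch (one-parameter logistic) model. In this model, each item contributes p(1 - p) of information, and the standard error is 1 / sqrt(total information). The student ability and the two 40-item forms are hypothetical.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """Chance of a correct answer under the Rasch (one-parameter logistic) model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def standard_error(theta: float, difficulties: list[float]) -> float:
    """Measurement error at ability theta: 1 / sqrt(total test information)."""
    # Each item contributes p * (1 - p) of information, which peaks
    # when the item's difficulty equals the student's ability.
    information = 0.0
    for b in difficulties:
        p = rasch_probability(theta, b)
        information += p * (1.0 - p)
    return 1.0 / math.sqrt(information)


theta = 2.0                      # hypothetical student, 2 logits above grade level
grade_level_form = [0.0] * 40    # 40 items pinned at grade-level difficulty
adaptive_form = [theta] * 40     # idealized adaptive test: every item matched to the student

se_fixed = standard_error(theta, grade_level_form)
se_adaptive = standard_error(theta, adaptive_form)
print(f"grade-level form SE: {se_fixed:.3f} logits")
print(f"adaptive form SE:    {se_adaptive:.3f} logits")
```

Under these assumptions, the grade-level form's error is about 0.49 logits, compared with about 0.32 for the matched form, even though both forms have the same number of items. Real adaptive tests cannot match every item exactly, so the adaptive figure is a best case.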
  • 26. To determine growth, achievement measurements must be related through a scale
  • 27. Let’s measure height again. If I was measured as 5’ 9” and a year later I was 1.82 m, did I grow? Yes, by about 2.65”. How do you know?
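The height example works only because both measurements can be placed on a common scale. The short Python sketch below relies on a single fact: an inch is defined as exactly 0.0254 m. It converts both measurements and shows that the gain is about 2.65 in. Test scores need the same kind of shared, vertically linked scale before a difference between two scores can be read as growth.

```python
METERS_PER_INCH = 0.0254  # exact by definition


def feet_inches_to_inches(feet: int, inches: float) -> float:
    return feet * 12 + inches


def meters_to_inches(meters: float) -> float:
    return meters / METERS_PER_INCH


year_one = feet_inches_to_inches(5, 9)  # 69 in.
year_two = meters_to_inches(1.82)       # about 71.65 in.
growth = year_two - year_one
print(f"growth: {growth:.2f} in.")
```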
  • 28. Traditional assessment uses items reflecting the grade level standards. Chart: the scale from Beginning Literacy to Adult Reading, with 4th-, 5th-, and 6th-grade marks, the grade level standards, and the traditional assessment item bank.
  • 29. Traditional assessment uses items reflecting the grade level standards. Chart: grade level standards for 4th, 5th, and 6th grade on the same scale; their overlap allows linking and scale construction.
  • 30. The instrument must be able to detect instruction. “…when science is defined in terms of knowledge of facts that are taught in school…(then) those students who have been taught the facts will know them, and those who have not will…not. A test that assesses these skills is likely to be highly sensitive to instruction.” Black, P. and Wiliam, D. (2007). Large-scale assessment systems: Design principles drawn from international comparisons. Measurement: Interdisciplinary Research & Perspectives, 5(1), 1-53.
  • 31. The more complex, the harder to detect and attribute to one teacher. “When ability in science is defined in terms of scientific reasoning…achievement will be less closely tied to age and exposure, and more closely related to general intelligence. In other words, science reasoning tasks are relatively insensitive to instruction.” Black, P. and Wiliam, D. (2007). Large-scale assessment systems: Design principles drawn from international comparisons. Measurement: Interdisciplinary Research & Perspectives, 5(1), 1-53.
  • 32. Using tests in high-stakes ways creates a new dynamic • Tests were specifically designed to inform classroom instruction and school improvement in formative ways, where there is no incentive in the system for inaccurate data
  • 33. A new phenomenon when tests are used as part of a compensation program. Chart: mean value-added growth by school (y-axis -6 to 8), comparing students who took 10+ minutes longer in spring than in fall with all other students.
  • 34. Cheating: Atlanta Public Schools, Crescendo Charter Schools, Philadelphia Public Schools, Washington DC Public Schools, Houston Independent School District, Michigan Public Schools
  • 35. Other consequence: when teachers are evaluated on growth using a once-per-year assessment, one teacher who cheats disadvantages the next teacher.
  • 36. Proctoring • Both a proctor and the teacher should be present during testing: the teacher can best guide students and ensure effort; the proctor protects the integrity of results and can support the teacher’s defense if results are challenged • Have all students test each term: you need two terms to determine growth, and the more students aggregated, the more you know
  • 37. Consistent Testing Conditions • Important for reliable test data, particularly when determining growth • Use testing condition indicators as KPIs: accuracy, duration, changes in duration; hold formative conversations to improve over time • Short test durations are worth considering for follow-up: apply the criteria at each test event • Be more concerned with consistency in test duration than with duration itself
  • 38. Early Intervention • Pause or terminate before completion: the preferred option; address problems when they are identified; not subject to the challenge that the student was retested simply because the score wasn’t good enough • Monitor students while testing is going on: ensure effort; support students as they struggle; G&T • Show that accurate data is important
  • 39. Retesting • Define a “significant” decline between test events, and apply the significant-decline criteria each test term • Simply missing a cut score is not an acceptable reason to retest
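Slides 37 to 39 call for testing condition indicators and a predefined “significant decline” rule, applied the same way at every test event. The Python sketch below shows one way such criteria could be encoded. All three thresholds and the record layout are hypothetical placeholders, not values from the presentation. The function deliberately has no “missed the cut score” check.

```python
from dataclasses import dataclass

# Hypothetical local criteria: a district would define and publish its own
# before testing, then apply them the same way at every test event.
MIN_MINUTES = 15            # very short tests may signal low effort
MAX_DURATION_CHANGE = 10    # minutes of change between fall and spring
SIGNIFICANT_DECLINE = 8     # scale points of decline that trigger review


@dataclass
class StudentRecord:
    student: str
    fall_score: float
    spring_score: float
    fall_minutes: float
    spring_minutes: float


def review_flags(r: StudentRecord) -> list[str]:
    """Return reasons to follow up; missing a cut score is deliberately not one."""
    flags = []
    if r.spring_minutes < MIN_MINUTES:
        flags.append("short duration")
    if abs(r.spring_minutes - r.fall_minutes) > MAX_DURATION_CHANGE:
        flags.append("duration change")
    if r.fall_score - r.spring_score > SIGNIFICANT_DECLINE:
        flags.append("significant decline")
    return flags


records = [
    StudentRecord("A", 210, 214, 45, 48),
    StudentRecord("B", 205, 195, 50, 12),
    StudentRecord("C", 220, 224, 30, 55),
]
for r in records:
    print(r.student, review_flags(r))
```

Student A raises no flags. Student B triggers all three. Student C is flagged only for the jump in duration, which matches the slide’s advice to watch consistency more than duration itself.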
  • 40. Testing is complete... What is useful to answer our question? The Test, The Growth Metric, The Evaluation, The Rating
  • 41. The metric matters: let’s go underneath “Proficiency”. Chart: difficulty of the New York cut score between Level 2 and Level 3, as a national percentile (0 to 100), for reading and math in grades 2 through 8, with the college readiness level marked. Source: A study of the alignment of the NWEA RIT scale with the New York State (NYS) Testing Program, November 2013.
  • 42. Difficulty of ACT college readiness standards
  • 43. The metric matters: let’s go underneath “Proficiency”. Source: Dahlin, M. and Durant, S., The State of Proficiency, Kingsbury Center at NWEA, July 2011.
  • 44. What gets measured and attended to really does matter. Chart: one district’s change in 5th-grade mathematics performance relative to the KY proficiency cut scores; number of students by fall RIT score, split into No Change, Down, and Up, with the proficiency and college readiness levels marked.
  • 45. Changing from proficiency to growth means all kids matter. Chart: number of 5th-grade students meeting projected mathematics growth in the same district, by fall score, split into below projected growth and met or above projected growth.
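The contrast between slides 44 and 45 is a contrast between two classification rules applied to the same scores. The Python sketch below uses a hypothetical cut score and four hypothetical students to show what each rule notices. Neither the numbers nor the projected growth values come from the KY data.

```python
CUT_SCORE = 220  # hypothetical proficiency cut score

# (fall score, spring score, projected growth) for four hypothetical students
students = [
    (180, 192, 10),  # far below the cut
    (215, 223, 8),   # "bubble" student just below the cut
    (224, 218, 7),   # just above the cut, then slips
    (250, 252, 5),   # far above the cut
]


def status_change(fall: float, spring: float, cut: float = CUT_SCORE) -> str:
    """Proficiency view: did the student cross the cut score in either direction?"""
    before, after = fall >= cut, spring >= cut
    if before == after:
        return "no change"
    return "up" if after else "down"


def met_projected_growth(fall: float, spring: float, projected: float) -> bool:
    """Growth view: did the student grow at least as much as projected?"""
    return spring - fall >= projected


for fall, spring, projected in students:
    print(fall, status_change(fall, spring), met_projected_growth(fall, spring, projected))
```

Under the proficiency view, only the two students near the cut register any movement. The growth view gives a result for every student, including the first, who grew 12 points but stayed below the cut, and the last, who stayed proficient while growing less than projected.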
  • 46. • What did you just learn? • How will you change what you typically do? Guiding Questions
  • 47. How can we make it fair? The Test, The Growth Metric, The Evaluation, The Rating
  • 48. Without context, what is “good”? Chart: a scale from Beginning Reading to Adult Literacy compared against national percentile norms (norms study), ACT college readiness benchmarks, state test “Meets” proficiency performance levels, and Common Core “Proficient” performance levels.
  • 49. Normative data for growth is a bit different. Chart: typical growth in 4th-grade reading by fall score (for example, 7 points). Basic factors: starting achievement and instructional weeks. Outside of a teacher’s direct control: FRL vs. non-FRL? IEP vs. non-IEP? ESL vs. non-ESL?
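The growth metric described in the speaker notes compares each student’s growth with a norm for students who started at a similar level. The difference is a growth index, and the summary is the share of students meeting their norm. The Python sketch below uses an invented three-band norm table. Real norms come from a national norming study and also account for instructional weeks. The scores are hypothetical as well.

```python
# Hypothetical norm table: typical fall-to-spring growth by fall score band.
# (low inclusive, high exclusive, typical growth in scale points)
NORM_BANDS = [(0, 190, 10), (190, 210, 7), (210, float("inf"), 4)]


def typical_growth(fall_score: float) -> float:
    for low, high, growth in NORM_BANDS:
        if low <= fall_score < high:
            return growth
    raise ValueError("fall score outside the norm table")


def growth_index(fall: float, spring: float) -> float:
    """Observed growth minus the growth typical for this starting score."""
    return (spring - fall) - typical_growth(fall)


def percent_meeting_norm(pairs: list[tuple[float, float]]) -> float:
    met = sum(1 for fall, spring in pairs if growth_index(fall, spring) >= 0)
    return 100 * met / len(pairs)


scores = [(185, 197), (200, 205), (205, 213), (215, 217), (220, 226)]
print([growth_index(f, s) for f, s in scores])
print(f"{percent_meeting_norm(scores):.0f}% met their growth norm")
```

Three of the five hypothetical students meet their norm, which is the 60% level the speaker notes describe as doing well for a grade within a school. This metric still makes no adjustment for the context factors on the slide (FRL, IEP, ESL). Adjusting for those is the job of the value-added step.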
  • 50. How did we address requirements in New York? We partnered with Education Analytics (EA) on VAM. State-tested grades/subjects (4-8 math and reading): APPR of 60% observations, 20% state test growth, 20% EA value-added (local measure 2). Other grades/subjects for which there is an available non-state test: 60% observations, with the remaining 40% drawn from EA value-added and a local measure (SLO).
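For state-tested grades, the 60/20/20 split on this slide is a weighted composite. The Python sketch below assumes that each component has already been converted to a 0-100 score. The rating cutoffs and band labels are hypothetical placeholders, not New York’s actual rules. The function and key names are illustrative only.

```python
# Weights from the slide for state-tested grades and subjects
WEIGHTS = {"observations": 0.60, "state_growth": 0.20, "value_added": 0.20}

# Hypothetical cutoffs on the 0-100 composite, highest first; placeholders only
CUTOFFS = [(90, "band 4"), (75, "band 3"), (65, "band 2"), (0, "band 1")]


def composite(scores: dict[str, float]) -> float:
    """Weighted sum of component scores, each assumed to be on a 0-100 scale."""
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


def rating(points: float) -> str:
    for floor, label in CUTOFFS:
        if points >= floor:
            return label
    raise ValueError("composite must be non-negative")


teacher = {"observations": 85, "state_growth": 60, "value_added": 70}
points = composite(teacher)
print(f"{points:.1f} points -> {rating(points)}")
```

Because observations carry 60% of the weight, the principal’s judgment dominates the composite. This is consistent with the position in the speaker notes that a test should inform the decision rather than drive it. The rating step is where the stakes enter, so the placement of the cutoffs matters as much as the weights.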
  • 51. An Introduction to Value-Added: The Oak Tree Analogy*, a conceptual introduction to the metric. *Developed at the Value-Added Research Center
  • 52. The Oak Tree Analogy
  • 53. Explaining Value-Added by Evaluating Gardener Performance (Gardener A and Gardener B) • For the past year, these gardeners have been tending to their oak trees, trying to maximize the height of the trees.
  • 54. Method 1: Measure the Height of the Trees Today (One Year After the Gardeners Began) • Gardener A’s tree: 61 in.; Gardener B’s tree: 72 in. • Using this method, Gardener B is the more effective gardener. This method is analogous to using an Achievement Model.
  • 55. This Achievement Result is not the Whole Story • Oak A: 47 in. at age 3 (1 year ago), 61 in. at age 4 (today). Oak B: 52 in. at age 3 (1 year ago), 72 in. at age 4 (today). • We need to find the starting height for each tree in order to more fairly evaluate each gardener’s performance during the past year.
  • 56. Method 2: Compare Starting Height to Ending Height • Oak A grew from 47 in. to 61 in. (+14 in.); Oak B grew from 52 in. to 72 in. (+20 in.). • Oak B had more growth this year, so Gardener B is the more effective gardener. This is analogous to a Simple Growth Model, also called Gain.
  • 57. What About Factors Outside the Gardener’s Influence? • This is an “apples to oranges” comparison. • For our oak tree example, the three environmental factors we will examine are rainfall, soil richness, and temperature.
  • 58. External conditions:
    External condition | Oak Tree A (Gardener A) | Oak Tree B (Gardener B)
    Rainfall amount | High | Low
    Soil richness | Low | High
    Temperature | High | Low
  • 59. How Much Did These External Factors Affect Growth? • We need to analyze real data from the region to predict growth for these trees. • We compare the actual height of the trees to their predicted heights to determine whether the gardener’s effect was above or below average.
  • 60. In order to find the impact of rainfall, soil richness, and temperature, we will plot the growth of each individual oak in the region compared to its environmental conditions.
  • 61. Calculating Our Prediction Adjustments Based on Real Data (growth in inches relative to the average):
    Condition | Low | Medium | High
    Rainfall | -5 | -2 | +3
    Soil richness | -3 | -1 | +2
    Temperature | +5 | -3 | -8
  • 62. Make Initial Prediction for the Trees Based on Starting Height • Oak A: 47 in. (age 3) + 20 in. (average growth) = 67 in. predicted. Oak B: 52 in. (age 3) + 20 in. (average growth) = 72 in. predicted. • Next, we will refine our prediction based on the growing conditions for each tree. When we are done, we will have an “apples to apples” comparison of the gardeners’ effect.
  • 63. Based on Real Data, Customize Predictions Based on Rainfall • Oak A: 67 + 3 = 70 in.; Oak B: 72 - 5 = 67 in. • For having high rainfall, Oak A’s prediction is adjusted by +3 to compensate. • Similarly, for having low rainfall, Oak B’s prediction is adjusted by -5 to compensate.
  • 64. Adjusting for Soil Richness • Oak A: 70 - 3 = 67 in.; Oak B: 67 + 2 = 69 in. • For having poor soil, Oak A’s prediction is adjusted by -3. • For having rich soil, Oak B’s prediction is adjusted by +2.
  • 65. Adjusting for Temperature • Oak A: 67 - 8 = 59 in.; Oak B: 69 + 5 = 74 in. • For having high temperature, Oak A’s prediction is adjusted by -8. • For having low temperature, Oak B’s prediction is adjusted by +5.
  • 66. Our Gardeners are Now on a Level Playing Field • The predicted height for trees in Oak A’s conditions is 59 inches: +20 (average) + 3 (rainfall) - 3 (soil) - 8 (temperature) = +12 inches during the year, added to 47 in. • The predicted height for trees in Oak B’s conditions is 74 inches: +20 (average) - 5 (rainfall) + 2 (soil) + 5 (temperature) = +22 inches during the year, added to 52 in.
  • 67. Compare the Predicted Height to the Actual Height • Oak A’s actual height (61 in.) is 2 inches more than predicted (59 in.). We attribute this to the effect of Gardener A. • Oak B’s actual height (72 in.) is 2 inches less than predicted (74 in.). We attribute this to the effect of Gardener B.
  • 68. Method 3: Compare the Predicted Height to the Actual Height • Gardener A: predicted 59 in., actual 61 in., +2 (above-average value-added). Gardener B: predicted 74 in., actual 72 in., -2 (below-average value-added). • By accounting for last year’s height and the environmental conditions of the trees during this year, we found the “value” each gardener “added” to the growth of the trees. This is analogous to a Value-Added measure.
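The oak tree arithmetic on slides 61 to 68 can be written out directly in Python. This is a transcription of the analogy, not a real value-added model. Real models estimate the adjustments statistically from many students (the speaker notes mention HLM) rather than reading them from a fixed table. The dictionary and function names are illustrative.

```python
# Adjustment table from slide 61: inches of growth relative to the average
ADJUSTMENTS = {
    "rainfall": {"low": -5, "medium": -2, "high": 3},
    "soil": {"low": -3, "medium": -1, "high": 2},
    "temperature": {"low": 5, "medium": -3, "high": -8},
}
AVERAGE_GROWTH = 20  # inches per year, from slide 62


def predicted_height(start: int, conditions: dict[str, str]) -> int:
    """Starting height + average growth + one adjustment per external condition."""
    adjustment = sum(ADJUSTMENTS[factor][level] for factor, level in conditions.items())
    return start + AVERAGE_GROWTH + adjustment


def value_added(start: int, actual: int, conditions: dict[str, str]) -> int:
    """Actual minus predicted: the part of growth attributed to the gardener."""
    return actual - predicted_height(start, conditions)


oak_a = {"rainfall": "high", "soil": "low", "temperature": "high"}
oak_b = {"rainfall": "low", "soil": "high", "temperature": "low"}

print("Gardener A:", predicted_height(47, oak_a), value_added(47, 61, oak_a))
print("Gardener B:", predicted_height(52, oak_b), value_added(52, 72, oak_b))
```

The output reproduces the slides: predicted heights of 59 and 74 inches, and value-added of +2 and -2 inches. Note that the ranking reverses compared with the gain model in Method 2.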
  • 69. Value-Added is a Group Measure • To statistically isolate a gardener’s effect, we need data from many trees under that gardener’s care.
  • 70. How does this analogy relate to value added in the education context?
    Question | Oak Tree Analogy | Value-Added in Education
    What are we evaluating? | Gardeners | Districts, schools, grades, classrooms, programs and interventions
    What are we using to measure success? | Relative height improvement in inches | Relative improvement on standardized test scores
    Sample | Single oak tree | Groups of students
    Control factors | Tree’s prior height; other factors beyond the gardener’s control: rainfall, soil richness, temperature | Students’ prior test performance (usually the most significant predictor); other demographic characteristics such as grade level, gender, race/ethnicity, low-income status, ELL status, disability status, Section 504 status
  • 71. Consider . . . • What if I skip this step? The comparison is likely against normative data, so it is a comparison to “typical kids in typical settings.” • How fair is it to disregard context? Good teacher, bad school; good teacher, challenging kids.
  • 72. A variety of errors means more stability only at the extremes
      • Control for measurement error
        – All models attempt to address this issue
      • Population size
      • Multiple data points
        – Error is compounded when two test events are combined
        – Many teachers' value-added scores will fall within the range of statistical error
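Combining two test events compounds error because a growth score (post minus pre) carries the measurement error of both tests. If the two errors are independent, the standard error of the difference is √(SEM_pre² + SEM_post²). The minimal sketch below assumes a hypothetical 3-point standard error of measurement per test. The function name is illustrative only.

```python
import math

def growth_standard_error(sem_pre: float, sem_post: float) -> float:
    """Standard error of a growth score (post minus pre), assuming the two
    test events have independent measurement errors."""
    return math.sqrt(sem_pre ** 2 + sem_post ** 2)

SEM = 3.0  # hypothetical standard error of measurement for one test event, in scale points
print(f"Single test event:   +/- {SEM:.2f}")
print(f"Growth (two events): +/- {growth_standard_error(SEM, SEM):.2f}")
```

Under these assumptions, a 3-point error on each test becomes roughly a 4.24-point error on the growth score.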
  • 73. Range of teacher value-added estimates
      • [Figure: Mathematics Growth Index Distribution by Teacher, Validity Filtered. Axis: average growth index score and range, from -12.00 to 12.00; teachers grouped by quintile, Q1 to Q5.]
      • Each line in this display represents a single teacher. The graphic shows the average growth index score for each teacher (green line), plus or minus the standard error of the growth index estimate (black line). We removed students who had tests of questionable validity and teachers with fewer than 20 students.
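The display on slide 73 shows each teacher's mean growth index plus or minus one standard error, after dropping teachers with fewer than 20 students. The sketch below reproduces that kind of display. The student scores are hypothetical and assumed to be already validity-filtered. The standard error is the plain standard error of the mean, which is not necessarily how the slide's estimates were produced.

```python
import math
from statistics import mean, stdev

MIN_STUDENTS = 20  # the slide's rule: drop teachers with fewer than 20 students

def summarize(growth_by_teacher, min_students=MIN_STUDENTS):
    """Return {teacher: (mean growth index, standard error)} for teachers
    with at least min_students valid student scores."""
    summary = {}
    for teacher, scores in growth_by_teacher.items():
        if len(scores) < min_students:
            continue  # too few students for a stable estimate
        summary[teacher] = (mean(scores), stdev(scores) / math.sqrt(len(scores)))
    return summary

# Hypothetical student growth index scores (already validity-filtered).
growth = {
    "Teacher 1": [5.0, 7.0] * 15,   # 30 students, tightly clustered
    "Teacher 2": [-6.0, 8.0] * 15,  # 30 students, widely spread
    "Teacher 3": [3.0, 5.0] * 5,    # 10 students: excluded
}

for teacher, (avg, se) in summarize(growth).items():
    low, high = avg - se, avg + se
    note = "range includes 0" if low <= 0 <= high else "range excludes 0"
    print(f"{teacher}: {avg:+.2f} (range {low:+.2f} to {high:+.2f}; {note})")
```

Teacher 2's average is positive, but the error band crosses zero. This is the sense in which, with one teacher, error means a lot.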
  • 74. With one teacher, error means a lot
  • 75. Why should we care about goal setting in education?
      • Because we want students to learn more!
      • Research view
        – Setting goals improves performance
  • 76. What does research say on goal setting?
      • [Figure: Essential Elements of Goal-Setting Theory and the High-Performance Cycle: goals, moderators, mechanisms, performance, satisfaction with performance and rewards, willingness to commit.]
      • Locke, E. A., & Latham, G. P. (2002). Building a practically useful theory of goal setting and task motivation: A 35-year odyssey. American Psychologist.
  • 78. Goals
      • Specificity: specific goals are typically stronger than "do your best" goals.
      • Difficulty: moderately challenging goals are better than goals that are too easy or too hard.
        – Performance and learning goals: if the task is complex and requires new knowledge or skills, set learning goals (e.g., "Master five new ways to assess each student's learning in the moment").
        – Proximal goals: if the task is complex, set short-term goals to gauge progress and feel rewarded.
  • 79. Challenges with goal setting
      • Lack of historical context
        – What have this teacher and these students done in the past?
      • Lack of comparison groups
        – What have other teachers done in the past?
      • What is the objective?
        – Is the objective to meet a standard of performance or to demonstrate improvement?
      • Do you set safe goals or challenging goals?
  • 80. Suggestions
      • Goals and targets themselves
        – Appropriately balance moderately challenging goals with consequences
      • Use "stretch" goals only at the organizational level, to stimulate creativity and unconventional solutions
      • Locke, E. A., & Latham, G. P. (2013). New developments in goal setting and task performance.
  • 81. Suggestions (cont.)
      • Goals and targets themselves (cont.)
        – Set additional learning goals if the work is complex and new
        – Set interim benchmarks for progress monitoring
        – Carefully consider what will be neglected in order to attain the goal
          • Can you live with the consequences?
          • How will you look for other, unintended ones?
      • Locke, E. A., & Latham, G. P. (2013). New developments in goal setting and task performance.
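Interim benchmarks for progress monitoring can be as simple as evenly spaced targets between a baseline and an end-of-year goal. The sketch below assumes growth is roughly linear across the year, which is a simplifying assumption, since real growth often is not. The baseline, goal, and function name are hypothetical.

```python
def interim_benchmarks(baseline: float, goal: float, checkpoints: int) -> list[float]:
    """Evenly spaced targets from baseline to goal, assuming roughly
    linear growth over the year (a simplifying assumption)."""
    step = (goal - baseline) / checkpoints
    return [baseline + step * k for k in range(1, checkpoints + 1)]

# Hypothetical: fall baseline of 210, spring goal of 222, checked in winter and spring.
print(interim_benchmarks(210, 222, 2))
```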
  • 82. How tests are used to evaluate teachers: The Test → The Growth Metric → The Evaluation → The Rating
  • 83. Translation into ratings can be difficult to inform with data
      • How would you translate a rank order into a rating?
      • Data can be provided
      • A value judgment is ultimately the basis for setting cut scores for points or ratings
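Translating a rank order into a rating comes down to choosing cut scores. The sketch below maps a percentile rank to a rating label. The cut points and rating names are hypothetical. Choosing them is exactly the value judgment the slide describes, and the code only applies whatever was chosen.

```python
from bisect import bisect_right

# Hypothetical cut scores on a percentile-rank scale: each number is where the
# next rating begins. The data supply the ordering; people choose these numbers.
CUTS = [10, 40, 80]
RATINGS = ["Ineffective", "Developing", "Effective", "Highly Effective"]

def rating_for(percentile: float) -> str:
    """Map a percentile rank (0-100) to a rating using the cut scores above."""
    return RATINGS[bisect_right(CUTS, percentile)]

for p in (9, 10, 55, 95):
    print(p, rating_for(p))
```

With these cuts, a teacher at the 9th percentile is rated Ineffective while one at the 10th is rated Developing, even though the two estimates are almost certainly within statistical error of each other.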
  • 84. Decisions are value based, not empirical
      • What counts as "far below" a district's expectation is subjective
      • What about:
        – The obligation to help teachers improve?
        – The quality of replacement teachers?
  • 85. Even multiple measures need to be used well
      • The system for combining elements and producing a rating is also a value-based decision
        – Multiple measures and principal judgment must be included
        – Evaluate the extremes to make sure the results make sense
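One common way to combine measures is a weighted average. "Evaluating the extremes" then means feeding in deliberately lopsided profiles and checking whether the resulting composites make sense. In the sketch below, the weights, the measure names, and the shared 1-4 scale are all hypothetical policy choices, not a recommended formula.

```python
# Hypothetical weights; how much each measure counts is a value-based policy choice.
WEIGHTS = {"observation": 0.5, "growth": 0.3, "principal_judgment": 0.2}

def composite(scores: dict) -> float:
    """Weighted average of measures that are all on the same 1-4 scale."""
    if abs(sum(WEIGHTS.values()) - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1")
    return sum(WEIGHTS[m] * scores[m] for m in WEIGHTS)

# Evaluate the extremes: profiles at the ends of each scale.
extremes = {
    "strong everywhere": {"observation": 4, "growth": 4, "principal_judgment": 4},
    "strong growth only": {"observation": 1, "growth": 4, "principal_judgment": 1},
    "weak growth only": {"observation": 4, "growth": 1, "principal_judgment": 4},
}
for case, scores in extremes.items():
    print(f"{case}: {composite(scores):.2f}")
```

If a profile such as "weak growth only" lands in a rating band that leaders could not defend, that is a signal to revisit the weights or cut scores before anyone is rated.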
  • 86. Leadership Courage Is a Key
      • [Figure: Ratings can be driven by the assessment. Observation and assessment ratings, on a 0 to 5 scale, for Teacher 1, Teacher 2, and Teacher 3, annotated "Real or noise?"]
  • 87. Big Message
      • If evaluators do not differentiate their ratings, then all differentiation comes from the test.
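The "Big Message" can be checked numerically. If every teacher receives the same observation rating, the observation component adds no spread to the composite. The composite's spread and its ranking then come entirely from the assessment. The ratings and the equal weights in the sketch below are hypothetical.

```python
from statistics import pstdev

# Hypothetical ratings on a 0-5 scale for three teachers.
observation = {"Teacher 1": 3, "Teacher 2": 3, "Teacher 3": 3}  # evaluators rate everyone alike
assessment = {"Teacher 1": 1, "Teacher 2": 3, "Teacher 3": 5}

W_OBS, W_TEST = 0.5, 0.5  # hypothetical equal weights

composite = {t: W_OBS * observation[t] + W_TEST * assessment[t] for t in observation}

print("Spread of observation ratings: ", pstdev(list(observation.values())))
print("Spread of composite ratings:   ", round(pstdev(list(composite.values())), 3))
print("Spread of assessment component:", round(pstdev([W_TEST * a for a in assessment.values()]), 3))
print("Composite order matches assessment order:",
      sorted(composite, key=composite.get) == sorted(assessment, key=assessment.get))
```

Here the observation spread is zero. The composite spread (about 0.82) is identical to that of the weighted assessment component, so the test alone decides who is rated higher.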
  • 88. Please be thoughtful about . . .
      1. Alignment between the content assessed and the content to be taught
      2. Selection of an appropriate assessment
        • Used for the purpose for which it was designed (proficiency vs. growth)
        • Can accurately measure the knowledge of all students
        • Adequate sensitivity to growth
      3. Adjust for context / control for factors outside a teacher's direct control (value-added)
  • 89. More information
      • Presentations and other recommended resources are available at:
        – www.nwea.org
        – www.kingsburycenter.org
        – www.slideshare.net
      • Contacting us:
        – NWEA main number: 503-624-1951
        – E-mail: andy.hegedus@nwea.org