Overview of assessments, growth, and value added in a teacher evaluation context
Published in: Education, Technology
NWEA Growth and Teacher evaluation VA 9-13
Slide 1: Measuring student growth accurately – It makes a difference in your world!
Andy Hegedus, Ed.D., Kingsbury Center at NWEA, September 2013

Slide 2: Overview/Setting the stage
• Goal is to improve student achievement through improving workforce performance over time
– Just like any profession, there is variability in performance
• Belief system driving policy
– A rigorous performance evaluation process, and the rewards, support, or removal of teachers that comes with it, is a major lever

Slide 3: What is happening just can’t be right!
Evaluator ratings: Ineffective, Developing, Effective, Highly Effective
5,800 teachers evaluated between January and May 2012 (The Atlanta Journal-Constitution, January 7, 2013)
“Statistically, this flies in the face of our academic achievement levels. These numbers just doesn’t jibe with reality,” Millar said. “If the Georgia evaluation system is going to be based on these type of statistics, I wouldn’t see us going forward with it because, just statistically, it can’t be valid.”

Slide 4: Focus should likely be elsewhere (on the 99%)
2011 turnover percentages: Remaining 85%, Voluntary 9%, Involuntary 6%
• Remaining workforce: effectiveness of the surrounding system; powerful professional development; a performance management system explicitly designed to improve performance
• Voluntary turnover: working conditions; induction and support
• Involuntary turnover: financial stability; keep the best
Executive Brief: Tracking Trends in Employee Turnover. Retrieved March 11, 2013, from http://www.shrm.org/research/benchmarks/documents/trends%20in%20turnover_final.pdf

Slide 5: My Purpose
• Increase your understanding of various urgent assessment-related topics
– Ask better questions
– Useful for making all types of decisions with data
• Follow along and ask questions at any time – Slideshare.net
• Will pause during transitions for you to discuss “Ah-Ha’s” with a neighbor

Slide 6: Three primary conditions for using tests for teacher evaluation
1. Selection of an appropriate test:
• Used for the purpose for which it was designed (proficiency vs. growth)
• Can accurately measure the test performance of all students
2. Alignment between the content assessed and the content to be taught
3. Adjustment for context/control for factors outside a teacher’s direct control (value-added)

Slide 7: What NWEA supports
1. An evaluation process that focuses on helping teachers improve
2. The principal or designated evaluator should control the evaluation
3. Tests should inform the process, not dictate or decide it
4. Multiple measures should be used over time

Slide 8: Two approaches we like
1. Use of tests as part of a dialogue to help teachers set improvement goals
2. Use of tests as a “yellow light” to identify teachers who may be in need of additional support or assistance

Slide 9: Go forth thoughtfully, with care
• What we’ve known to be true is now being shown to be true
– Using data thoughtfully improves student achievement
– 12% in mathematics, 13% in reading
• There are dangers present, however
– Unintended consequences
Slotnik, W. J., & Smith, M. D. (2013, February). It’s more than money. Retrieved from http://www.ctacusa.com/PDFs/MoreThanMoney-report.pdf
Slide 10: Remember the old adage?
“What gets measured (and attended to), gets done”

Slide 11: An infamous example
• NCLB
– Cast light on inequities
– Improved performance of “bubble kids”
– Narrowed the taught curriculum

Slide 12: It’s what we do that counts
A patient’s health doesn’t change because we know their blood pressure. It’s our response that makes all the difference.

Slide 13: Policy shifts make today’s conversation inevitable
1. Shifting toward tighter state-level control – a shift of decision-making away from local control
2. Our nation moved from a model of education reform focused on fixing schools to one focused on fixing the teaching profession

Slide 14: Be considerate of the continuum of stakes involved
Support, Compensate, Terminate – increasing levels of required rigor and increasing risk

Slide 15: Potential litigation issues
The use of value-added data for high-stakes personnel decisions does not yet have a strong, coherent body of case law. Until one is established, expect litigation if value-added results are the lynchpin evidence in a teacher-dismissal case.
• Due process
• Disparate impact doctrine

Slide 16: Suggested reading
Baker, B., Oluwole, J., & Green, P. (2013). The legal consequences of mandating high stakes decisions based on low quality information: Teacher evaluation in the Race to the Top era. Education Policy Analysis Archives, 21(5).

Slide 17: What question is being answered in support of using data in evaluating teachers?
Is the progress produced by this teacher dramatically different from that of teaching peers who deliver instruction to comparable students in comparable situations?
Slide 18: [Chart: Marcus’ growth compared with normal growth and the growth needed to reach the college readiness standard]

Slide 19: There are four key steps required to answer this question (Top-Down Model)
1. The Test
2. The Growth Metric
3. The Evaluation
4. The Rating

Slide 20: How does the other popular process work? Bottom-Up Model (Student Learning Objectives)
Assessment 1, Goal Setting, Assessment(s), Results and Analysis, Evaluation (Rating)
Understanding all four of the top-down elements is needed here

Slide 21: Let’s begin at the beginning – The Test

Slide 22: What is measured should be aligned to what is to be taught
3rd grade ELA standards, for example:
1. Answer questions to demonstrate understanding of text…
2. Determine the main idea of a text…
3. Determine the meaning of general academic and domain-specific words…
Would you use MAP in the evaluation of a 3rd grade ELA teacher? A 3rd grade social studies teacher? An elementary art teacher?
~30% of teachers teach in tested subjects and grades (The Other 69 Percent: Fairly Rewarding the Performance of Teachers of Nontested Subjects and Grades, http://www.cecr.ed.gov/guides/other69Percent.pdf)

Slide 23: What is measured should be aligned to what is to be taught
• Assessments should align with the teacher’s instructional responsibility
– Specific advanced content
• HS teachers teaching discipline-specific content, especially 11th and 12th grade
• MS teachers teaching HS content to advanced students
– Non-tested subjects
• School-wide results are more likely “professional responsibility” than a reflection of competence
– HS teachers providing remedial services

Slide 24: The purpose and design of the instrument is significant
• Many assessments are not designed to measure growth
• Others do not measure growth equally well for all students

Slide 25: Both status and growth are important, but growth leads
[Chart: achievement scale from Beginning Literacy to Adult Reading, with a 5th grade student’s status marked at Time 1 and Time 2]
Two assumptions: 1. measurement accuracy, and 2. a vertical scale

Slide 26: Accurately measuring growth depends on accurately measuring achievement

Slide 27: How about measuring height?
What if the pencil isn’t very level? What if we marked with sidewalk chalk?

Slide 28: Measurement Accuracy – A test for you
[Chart: a 5th grade student’s achievement measured at Time 1 and Time 2]
Pop quiz: What’s bigger?
1. Time 1 error or Time 2 error alone
2. Time 2 minus Time 1 error (growth)
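The pop-quiz answer can be sketched numerically. Under the standard assumption that the errors at the two test events are independent, subtracting the scores combines both errors in quadrature, so the growth estimate is noisier than either single measurement. The standard-error values below are illustrative only, not NWEA figures:

```python
import math

# Illustrative standard errors for the two test events (not from the deck)
se_time1 = 3.0
se_time2 = 3.0

# For independent measurements, errors add in quadrature when scores are
# subtracted, so growth carries more error than either score alone.
se_growth = math.sqrt(se_time1**2 + se_time2**2)

print(round(se_growth, 2))  # 4.24 -> larger than either 3.0 alone
```

This is why the slide's answer is option 2: the error of the growth score always exceeds the error of either test event by itself.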
Slide 29: What does it take to accurately measure achievement?
Questions surrounding the student’s achievement level – the more questions the merrier

Slide 30: Teachers encounter a distribution of student performance
[Chart: students spread across the achievement scale, from well below to well above 5th grade level performance]

Slide 31: Adaptive testing works differently
The item bank can span the full range of achievement

Slide 32: Items available need to match student ability
[Chart: item coverage of the California STAR vs. NWEA MAP]

Slide 33: These differences impact measurement error
[Chart: test information (.00–.12) across scale scores 160–240 for a constrained adaptive or paper/pencil test built from 5th grade level items vs. a fully adaptive test; the error differs significantly at the Fail/Basic (26th percentile), Pass/Proficient, and Pass/Advanced (77th percentile) cuts]

Slide 34: To determine growth, achievement measurements must be related through a scale

Slide 35: Let’s measure height again
If I was measured at 5’ 9” and a year later at 1.82 m, did I grow? Yes, ~2.5”. How do you know?
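The height example works only once both measurements sit on one common scale before subtracting, which is exactly what a vertical test scale does for achievement scores. A minimal sketch of the arithmetic (unit constants only, no test data involved):

```python
# Put both height measurements on a single scale (inches), then subtract --
# the same move a vertical scale makes possible for achievement scores.
METERS_PER_INCH = 0.0254

height_then_in = 5 * 12 + 9             # 5' 9" expressed in inches
height_now_in = 1.82 / METERS_PER_INCH  # 1.82 m converted to inches

growth_in = height_now_in - height_then_in
print(round(growth_in, 2))  # 2.65 -> roughly the ~2.5" the slide cites
```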
Slide 36: Traditional assessment uses items reflecting the grade level standards
[Chart: a traditional assessment item bank covering the 4th, 5th, and 6th grade standards along the achievement scale]

Slide 37: Traditional assessment uses items reflecting the grade level standards
[Chart: overlap between adjacent grade-level item banks allows linking and scale construction]

Slide 38: Error can change your life!!!
• Study on the impact of assessment selection on VAM results¹
– Defined a misidentified teacher as one who appeared to have growth that was incorrect by more than one-half a year (less than .5 years or more than 1.5 years)
¹ Woodworth, J. L., Does Assessment Selection Matter When Computing Teacher Value-Added Measures?, http://www.kingsburycenter.org/sites/default/files/James%20Woodworth%20Data%20Award%20Research%20Brief.pdf

Slide 39: Error can change your life!!!
• “. . . in the 25 student (single class) simulations. At the 25 student level, the VAM based on the TAKS misidentifies 35% of all teachers, whereas, the VAM based on the MAP misidentifies only 1% of teachers.”
Initial measurement error is a significant issue in AYP and teacher evaluation work

Slide 40: The instrument must be able to detect instruction
• “…when science is defined in terms of knowledge of facts that are taught in school…(then) those students who have been taught the facts will know them, and those who have not will…not. A test that assesses these skills is likely to be highly sensitive to instruction.”
Black, P., & Wiliam, D. (2007). Large-scale assessment systems: Design principles drawn from international comparisons. Measurement: Interdisciplinary Research & Perspectives, 5(1), 1–53.

Slide 41: The more complex the skill, the harder to detect and attribute to one teacher
• “When ability in science is defined in terms of scientific reasoning…achievement will be less closely tied to age and exposure, and more closely related to general intelligence. In other words, science reasoning tasks are relatively insensitive to instruction.”
Black, P., & Wiliam, D. (2007). Large-scale assessment systems: Design principles drawn from international comparisons. Measurement: Interdisciplinary Research & Perspectives, 5(1), 1–53.

Slide 42: Using tests in high-stakes ways creates a new dynamic
• Tests specifically designed to inform classroom instruction and school improvement in formative ways
• No incentive in the system for inaccurate data

Slide 43: New phenomenon when used as part of a compensation program
[Chart: mean value-added growth by school, contrasting students taking 10+ minutes longer in spring than in fall with all other students]

Slide 44: Cheating
Atlanta Public Schools, Crescendo Charter Schools, Philadelphia Public Schools, Washington DC Public Schools, Houston Independent School District, Michigan Public Schools

Slide 45: Other consequence
When teachers are evaluated on growth using a once-per-year assessment, one teacher who cheats disadvantages the next teacher

Slide 46: Other issues – Proctoring
• Proctoring both with and without the classroom teacher raises possible problems
• Documentation that test administration procedures were properly followed is important
• Monitoring testing conditions assists with reliability
Slide 47: Testing is complete . . . What is useful to answer our question? – The Growth Metric

Slide 48: The problem with spring-spring testing
[Timeline, 3/11 through 3/12: Teacher 1’s instruction, the summer, and then Teacher 2’s instruction all fall between the two spring tests]

Slide 49: A better approach
• When possible, use a spring – fall – spring approach
• Measure summer loss and incentivize schools and teachers to minimize it
• Measure teacher performance fall to spring, giving as much instructional time as possible between assessments
• Monitor testing conditions to minimize gaming of fall–spring results

Slide 50: Without context, what is “Good”?
[Chart: the NWEA RIT scale, from Beginning Reading to Adult Literacy, aligned with national percentiles (NWEA norms study), ACT college readiness benchmarks, state test “meets” proficiency levels, and Common Core proficient performance levels]

Slide 51: The metric matters – Let’s go underneath “Proficiency”
[Chart: difficulty of the Virginia SOL pass/proficient cut score, grades 2–8, reading and math, shown as national percentiles against college readiness]
A study of the alignment of the NWEA RIT scale with the Virginia Standards of Learning (SOL), December 2012

Slide 52: Difficulty of ACT college readiness standards

Slide 53: The metric matters – Let’s go underneath “Proficiency”
Dahlin, M., & Durant, S. (2011, July). The State of Proficiency. Kingsbury Center at NWEA.

Slide 54: What gets measured and attended to really does matter
[Chart: one district’s change in 5th grade mathematics performance relative to the KY proficiency cut scores – number of students up, down, or unchanged by fall RIT, against the proficiency and college readiness cuts]

Slide 55: Changing from Proficiency to Growth means all kids matter
[Chart: number of 5th grade students in the same district meeting projected mathematics growth, by fall score – below vs. met or above projected growth]

Slide 56: How can we make it fair? – The Evaluation

Slide 57: Outside of a teacher’s direct control
Context – 2011 NWEA Student Norms. Subject: Reading; Grade: 4th; starting fall RIT score: 200; growth: 7 RIT. FRL vs. non-FRL? IEP vs. non-IEP? ESL vs. non-ESL?

Slide 58: A Visual Representation of Value Added
Student A: fall 4th grade MAP RIT score 200; spring RIT score 209. Average spring score for similar students: 207. Value added: +2 RIT.
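The picture on this slide reduces to simple arithmetic: value added is the observed spring score minus the average spring score of similar students. The expected-score lookup table below is hypothetical, made up for illustration; real value-added models estimate that expectation from many control variables:

```python
# Hypothetical expected spring RIT scores, keyed by fall score -- a stand-in
# for what a real value-added model would estimate from many controls.
expected_spring_by_fall = {195: 203, 200: 207, 205: 211}

def value_added(fall_score: int, spring_score: int) -> int:
    """Observed spring score minus the expected score for similar students."""
    return spring_score - expected_spring_by_fall[fall_score]

print(value_added(200, 209))  # Student A from the slide: +2 RIT
```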
Slide 59: Consider . . .
• What if I skip this step?
– The comparison is likely against normative data, so it is to “typical kids in typical settings”
• How fair is it to disregard context?
– Good teacher, bad school
– Good teacher, challenging kids
• Does your personal goal setting consider context?

Slide 60: Challenges with goal setting
• Lack of a historical context – what have this teacher and these students done in the past?
• Lack of comparison groups – what have other teachers done in the past?
• What is the objective – to meet a standard of performance or to demonstrate improvement?
• Do you set safe goals or stretch goals?

Slide 61: Value-added is science
• Value-added models control for a variety of classroom, school-level, and other conditions
– Proven statistical methods
– All attempt to minimize error
– Variables outside the controls are assumed to be random

Slide 62: A variety of errors means more stability only at the extremes
• Control for measurement error
– All models attempt to address this issue
• Population size
• Multiple data points
– Error is compounded when combining two test events
– Nevertheless, many teachers’ value-added scores will fall within the range of statistical error

Slide 63: Range of teacher value-added estimates
[Chart: mathematics growth index distribution by teacher, validity filtered, grouped into quintiles Q1–Q5. Each line represents a single teacher: the average growth index score (green line) plus or minus the standard error of the growth index estimate (black line). Students with tests of questionable validity and teachers with fewer than 20 students were removed.]

Slide 64: With one teacher, error means a lot

Slide 65: Assumption of randomness can have risk implications
• Value-added models assume that variation is caused by randomness if not controlled for explicitly
– Young teachers are assigned disproportionate numbers of students with poor discipline records
– Parent requests for the “best” teachers are honored
• Sound educational reasons for placement are likely to be defensible

Slide 66: Instability at the tails of the distribution
“The findings indicate that these modeling choices can significantly influence outcomes for individual teachers, particularly those in the tails of the performance distribution who are most likely to be targeted by high-stakes policies.”
Ballou, D., Mokher, C., & Cavalluzzo, L. (2012). Using Value-Added Assessment for Personnel Decisions: How Omitted Variables and Model Specification Influence Teachers’ Outcomes.
[Chart: LA Times Teacher #1 and LA Times Teacher #2]
Slide 67: How tests are used to evaluate teachers – The Rating

Slide 68: Translation into ratings can be difficult to inform with data
• How would you translate a rank order into a rating?
• Data can be provided
• A value judgment is ultimately the basis for setting cut scores for points or a rating

Slide 69: Decisions are value based, not empirical
• What counts as “far below a district’s expectation” is subjective
• What about the obligation to help teachers improve? The quality of replacement teachers?

Slide 70: Even multiple measures need to be used well
• The system for combining elements and producing a rating is also a value-based decision
– Multiple measures and principal judgment must be included
– Evaluate the extremes to make sure the system makes sense

Slide 71: Remember this?
Evaluator ratings: Ineffective, Developing, Effective, Highly Effective. 5,800 teachers evaluated between January and May 2012 (The Atlanta Journal-Constitution, January 7, 2013)

Slide 72: Leadership courage is a key
[Chart: observation vs. assessment ratings (0–5) for Teachers 1–3; ratings can be driven by the assessment. Real or noise?]

Slide 73: Big Message
If evaluators do not differentiate their ratings, then all differentiation comes from the test

Slide 74: Please be thoughtful about . . .
1. Selection of an appropriate test:
• Used for the purpose for which it was designed (proficiency vs. growth)
• Can accurately measure the test performance of all students
2. Alignment between the content assessed and the content to be taught
3. The need for context for growth/control for factors outside a teacher’s direct control (value-added)

Slide 75: More information
• Presentations and other recommended resources are available at:
– www.nwea.org
– www.kingsburycenter.org
– Slideshare.net
• Contacting us: NWEA Main Number 503-624-1951; E-mail: andy.hegedus@nwea.org
