Connecticut: Measuring and Modeling Growth
Presentation by NWEA for Connecticut school administrators.


    1. Using tests for high-stakes evaluation: what educators need to know in Connecticut. Presenter: John Cronin, Ph.D. Contacting us: Rebecca Moore, 503-548-5129. E-mail: rebecca.moore@nwea.org. Visit our website: www.kingsburycenter.org
    2. Connecticut requirements. Components of the evaluation:
       – Student growth (45%), including the state test, one non-standardized indicator, and (optionally) one other standardized indicator. Requires a beginning-of-year, mid-year, and end-of-year conference.
       – Teacher practice and performance (40%):
         • First- and second-year teachers: 3 in-class observations.
         • Developing or below standard: 3 in-class observations.
         • Proficient or exemplary: 3 observations of practice, one in-class.
       – Whole-school learning indicator or student feedback (5%).
       – Parent or peer feedback (10%).
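The 45/40/5/10 split above amounts to a simple weighted composite. The sketch below is illustrative only: the weights come from the slide, but the component names and the sample scores (on an assumed common 1–4 scale) are hypothetical.

```python
# Component weights from the slide; everything else here is an assumption.
WEIGHTS = {
    "student_growth": 0.45,
    "teacher_practice": 0.40,
    "whole_school_or_student_feedback": 0.05,
    "parent_or_peer_feedback": 0.10,
}

def summative_score(scores):
    """Combine component scores (all on the same scale) using the weights."""
    assert set(scores) == set(WEIGHTS)
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

# Hypothetical teacher scored on a 1-4 scale for each component.
example = {
    "student_growth": 3.0,
    "teacher_practice": 3.5,
    "whole_school_or_student_feedback": 2.0,
    "parent_or_peer_feedback": 3.0,
}
print(round(summative_score(example), 2))  # weighted average of the four parts
```

Because the weights sum to 1, the composite stays on the same scale as the component scores.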
    3. Connecticut requirements. Requirements for goal setting: the process has each teacher set one to four goals with their principal. Goals should:
       • Take into account the academic track record and overall needs and strengths of the students the teacher is teaching that year/semester;
       • Address the most important purposes of a teacher's assignment through self-reflection;
       • Be aligned with school, district, and state student achievement objectives;
       • Take into account the students' starting learning needs vis-à-vis relevant baseline data, when available;
       • Consider control factors tracked by the state-wide public school information system that may influence teacher performance ratings, including, but not limited to, student characteristics, student attendance, and student mobility.
    4. What changes for educators? 1. The proficiency standards get higher. 2. Teachers become accountable for all students.
    5. Difficulty of ACT college readiness standards
    6. Moving from proficiency to growth: all students count when accountability is measured through growth.
    7. One district's change in 5th grade math performance relative to Kentucky cut scores (proficiency; college readiness)
    8. Number of 5th grade students meeting math growth target in the same district
    9. How does the process work?
    10. How does the process work?
    11. Connecticut requirements. Criteria for the student growth indicator:
       – Fair to students: the indicator of academic growth and development is used in such a way as to provide students an opportunity to show that they have met, or are making progress in meeting, the learning objective. The use of the indicator is as free as possible from bias and stereotype.
       – Fair to teachers: the use of an indicator of academic growth and development is fair when a teacher has the professional resources and opportunity to show that his/her students have made growth, and when the indicator is appropriate to the teacher's content, assignment, and class composition.
       – Reliable.
       – Valid.
       – Useful: the indicator may be used to provide the teacher with meaningful feedback about student knowledge, skills, perspective, and classroom experience that may be used to enhance student learning and provide opportunities for teacher professional growth and development.
    12. Issues in the use of growth and value-added measures. Measurement design of the instrument: many assessments are not designed to measure growth. Others do not measure growth equally well for all students.
    13. Tests are not equally accurate for all students (California STAR; NWEA MAP)
    14. Tests are not equally accurate for all students (Grade 6 New York Mathematics)
    15. Issues in the use of growth and value-added measures. Measurement sensitivity: assessments must align with the curriculum and should be instructionally sensitive.
    16. College and career readiness assessments will not necessarily be instructionally sensitive. "…when science is defined in terms of knowledge of facts that are taught in school…(then) students who have been taught the facts will know them, and those who have not will…not. Here it may well be that the assessment is highly sensitive to instruction. …scientific reasoning…achievement will be less closely tied to age and exposure and more to general intelligence. In other words, a test that assesses these skills is likely to be particularly insensitive to instruction. A third case might arise when science is defined in terms of the ethical and moral dimensions of science, where maturity, rather than intelligence and exposure, might be the most important factor. Here the assessment is not sensitive to instruction." Black, P. and Wiliam, D. (2007). Large-scale assessment systems: Design principles drawn from international comparisons. Measurement: Interdisciplinary Research & Perspective, 5(1), 1–53.
    17. Issues in the use of growth and value-added measures. Measurement sensitivity: classroom tests, which are designed to measure mastery, may not measure improvement well.
    18. Issues in the use of growth and value-added measures. Instructional alignment: tests should align to the teacher's instructional responsibilities.
    19. Issues in the use of growth and value-added measures. Uncovered subjects and teachers: high-quality tests may not be administered, or available, for many teachers and grades. Subjects like social studies may be particularly problematic.
    20. Considerations for developing your own assessment and student learning objectives:
       • Developing valid instruments is very time consuming and resource intensive.
       • The assessments developed must discriminate between effective and ineffective teachers.
       • The assessments must be valid in other respects: aligned to curriculum, with unbiased items.
       • The assessments can't be open to security violations or cheating.
    21. How does the process work?
    22. Issues in the use of growth and value-added measures. Control for statistical error: all models attempt to address this issue. Nevertheless, many teachers' value-added scores will fall within the range of statistical error.
    23. Sources of error in assessment: the students; the testing conditions; the assessments. Measurement error in the assessments can be dwarfed by error introduced by the testing conditions and the students.
    24. New York City: margins of error can be very large; increasing n doesn't always decrease the margin of error; the margin of error in math is typically less than in reading.
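The sampling component of a margin of error can be illustrated with the standard formula for the error of a class mean. This is a sketch, not New York City's actual methodology: the 10-point standard deviation of growth scores is an assumed value, and, as slide 23 notes, error introduced by students and testing conditions does not shrink with n the way sampling error does.

```python
import math

def margin_of_error(sd, n, z=1.96):
    """Approximate 95% margin of error for a class-mean growth score,
    considering sampling error only."""
    return z * sd / math.sqrt(n)

# Assumed SD of 10 growth points; class/caseload sizes of 10, 25, and 60.
for n in (10, 25, 60):
    print(n, round(margin_of_error(10, n), 1))  # 6.2, 3.9, 2.5 points
```

Even at 60 students, a ±2.5-point band can span several rating categories, which is why many teachers' scores are statistically indistinguishable from one another.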
    25. Range of teacher value-added estimates
    26. Issues in the use of growth and value-added measures. "Among those who ranked in the top category on the TAKS reading test, more than 17% ranked among the lowest two categories on the Stanford. Similarly, more than 15% of the lowest value-added teachers on the TAKS were in the highest two categories on the Stanford." Corcoran, S., Jennings, J., & Beveridge, A., Teacher Effectiveness on High and Low Stakes Tests, paper presented at the Institute for Research on Poverty summer workshop, Madison, WI (2010).
    27. Issues in the use of growth and value-added measures. Instability of results: a variety of factors can cause value-added results to lack stability. Results are more likely to be stable at the extremes. The use of multiple years of data is highly recommended.
    28. Los Angeles Unified: teachers can easily rate in multiple categories; the choice of model can have a large impact; models affect English more than math; teachers do better in some subjects than others; more complex models don't necessarily favor the teacher.
    29. Possible racial bias in models. "Significant evidence of bias plagued the value-added model estimated for the Los Angeles Times in 2010, including significant patterns of racial disparities in teacher ratings both by the race of the student served and by the race of the teachers (see Green, Baker and Oluwole, 2012). These model biases raise the possibility that Title VII disparate impact claims might also be filed by teachers dismissed on the basis of their value-added estimates. Additional analyses of the data, including richer models using additional variables, mitigated substantial portions of the bias in the LA Times models (Briggs & Domingue, 2010)." Baker, B. (2012, April 28). If it's not valid, reliability doesn't matter so much! More on VAM-ing
    30. Instability at the tails of the distribution. "The findings indicate that these modeling choices can significantly influence outcomes for individual teachers, particularly those in the tails of the performance distribution who are most likely to be targeted by high-stakes policies." Ballou, D., Mokher, C. and Cavalluzzo, L. (2012). Using Value-Added Assessment for Personnel Decisions: How Omitted Variables and Model Specif… (Charts: LA Times Teacher #1; LA Times Teacher #2)
    31. Reliability of teacher value-added estimates. Teachers with growth scores in the lowest and highest quintile over two years, using NWEA's Measures of Academic Progress:
       Bottom quintile in both Y1 and Y2: 59/493 (12%)
       Top quintile in both Y1 and Y2: 63/493 (13%)
       r = .64, r² = .41
       Typical r values for measures of teaching effectiveness range between .30 and .60 (Brown Center on Education Policy, 2010).
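The year-to-year instability in the table above can be reproduced in simulation: even when two years of estimates correlate at roughly the slide's r ≈ .6, most teachers who land in an extreme quintile one year do not stay there the next. The data below are simulated, not NWEA's estimates; the noise level is an assumption chosen to yield a correlation near .6.

```python
import random

random.seed(0)
N = 500
# Each teacher has a stable "true" effect plus independent yearly noise.
true_effect = [random.gauss(0, 1) for _ in range(N)]
y1 = [t + random.gauss(0, 0.85) for t in true_effect]  # year-1 estimate
y2 = [t + random.gauss(0, 0.85) for t in true_effect]  # year-2 estimate

def quintile(scores, x):
    """0 = bottom quintile, 4 = top quintile, relative to `scores`."""
    rank = sum(s <= x for s in scores)
    return min(4, 5 * rank // len(scores))

# Of the ~100 teachers in the bottom quintile in year 1,
# how many are still in the bottom quintile in year 2?
bottom_y1 = [i for i in range(N) if quintile(y1, y1[i]) == 0]
stayed = sum(quintile(y2, y2[i]) == 0 for i in bottom_y1)
print(f"{stayed} of {len(bottom_y1)} bottom-quintile teachers stayed there")
```

Typically well under half of the bottom-quintile teachers remain there the following year, matching the slide's observed 12-13% stability against a 20% base rate when measured over the whole group.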
    32. How does the process work?
    33. Challenges with goal setting:
       • Lack of a "racing form": what have this teacher and these students done in the past?
       • Lack of comparison groups: what have other teachers done in the past?
       • What is the objective? Is it to meet a standard of performance or to demonstrate improvement?
       • Do you set safety goals or stretch goals?
    34. Issues in the use of growth and value-added measures. Model wars: there are a variety of models in the marketplace. These models may come to different conclusions about the effectiveness of a teacher or school. Differences in findings are more likely to happen at the extremes.
    35. Issues in the use of growth and value-added measures. Lack of random assignment: the use of a value-added model assumes that the school doesn't add a source of variation that isn't controlled for in the model (e.g., young teachers are assigned disproportionate numbers of students with poor discipline records).
    36. How does the process work?
    37. New York rating system: 60 points assigned from classroom observation; 20 points assigned from the state assessment; 20 points assigned from a local assessment. A score of 64 or less is rated ineffective.
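The point arithmetic on this slide can be sketched directly. Only the 60/20/20 split and the 64-point cutoff come from the slide; the component scores in the example are hypothetical.

```python
def ny_rating(observation, state, local):
    """Combine the three New York components and apply the 64-point cutoff."""
    assert 0 <= observation <= 60 and 0 <= state <= 20 and 0 <= local <= 20
    total = observation + state + local
    return total, ("ineffective" if total <= 64 else "effective or better")

# A single point on either assessment can move a teacher across the cutoff.
print(ny_rating(50, 8, 6))  # 64 total -> ineffective
print(ny_rating(50, 8, 7))  # 65 total -> not ineffective
```

Note how a teacher with a solid observation score (50 of 60) can still be rated ineffective if both assessment components come in low, which is one reason the choice of assessment matters so much.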
    38. Connecticut requirements
    39. Other issues. Security and cheating: when measuring growth, one teacher who cheats disadvantages the next teacher.
    40. Other issues. (1) Each district shall define effectiveness and ineffectiveness utilizing a pattern of summative ratings derived from the new evaluation system. (2) At the request of a district or employee, the State Department of Education, or a third-party entity approved by the SDE, will audit the evaluation components that are combined to determine an individual's summative rating, in the event that such components are significantly dissimilar (i.e., include both exemplary and below-standard ratings), to determine a final summative rating. (3) The State Department of Education, or a third party designated by the SDE, will audit evaluation ratings of exemplary and below standard to validate such ratings, by selecting ten districts at random annually.
    41. Other issues. Security and cheating: when measuring growth, one teacher who cheats disadvantages the next teacher.
    42. Cheating: Atlanta Public Schools; Crescendo Charter Schools; Philadelphia Public Schools; Washington DC Public Schools; Houston Independent School District; Michigan Public Schools.
    43. Case Study #1: mean value-added performance in mathematics by school, fall to spring
    44. Case Study #1: mean spring and fall test duration in minutes by school
    45. Case Study #1: mean value-added growth by school and test duration
    46. Case Study #2: differences in fall-spring test durations; differences in growth index score based on fall-spring test durations
    47. Case Study #2: how much of summer loss is really summer loss? Differences in spring-fall test durations; differences in raw growth by spring-fall test duration
    48. Case Study #2: differences in fall-spring test duration (yellow-black) and differences in growth index scores (green) by school
    49. Security considerations:
       • Teachers should not be allowed to view the contents of the item bank or record items.
       • Districts should have policies for accommodation that are based on student IEPs.
       • Districts should consider having both the teacher and a proctor in the test room.
       • Districts should consider whether other security measures are needed for the protection of both teachers and administrators.
    50. Other issues. Proctoring: proctoring both with and without the classroom teacher raises possible problems. Documentation that test administration procedures were properly followed is important.
    51. Potential litigation issues: the use of value-added data for high-stakes personnel decisions does not yet have a strong, coherent body of case law. Expect litigation if value-added results are the lynchpin evidence for a teacher-dismissal case, until a body of case law is established.
    52. Possible legal issues:
       • Title VII of the Civil Rights Act of 1964: disparate impact of sanctions on a protected group.
       • State statutes that provide tenure and other related protections to teachers.
       • Challenges to a finding of "incompetence" stemming from the growth or value-added data.
    53. Recommendations:
       • Embrace the formative advantages of growth measurement as well as the summative.
       • Create comprehensive evaluation systems with multiple measures of teacher effectiveness (Rand, 2010).
       • Select measures as carefully as value-added models.
       • Use multiple years of student achievement data.
       • Understand the issues and the tradeoffs.
    54. Thank you for attending. Presenter: John Cronin, Ph.D. Contacting us: NWEA main number: 503-624-1951. E-mail: rebecca.moore@nwea.org. The presentation and recommended resources are available at our website: www.kingsburycenter.org
