Considerations when using tests for
        teacher evaluation

Presenter - John Cronin, Ph.D.

Contacting us:
NWEA Main Number: 503-624-1951
E-mail: rebecca.moore@nwea.org

This PowerPoint presentation and recommended resources are
available at our website: www.kingsburycenter.org
Label each player as effective, partially
effective, or ineffective

     Avg.      HR         RBI        SB
     .309      5          54         7
     .303      13         53         20
     .271      4          30         7
     .270      28         71         4
     .260      16         58         3
     .238      7          37         1
     .217      5          28         0
Label each player as effective, partially
           effective, or ineffective

                Avg.      HR         RBI        SB
Rosario         .309      5          54         7
Gonzales        .303      13         53         20
Scuturo         .271      4          30         7
Cudger          .270      28         71         4
Helton          .260      16         58         3
Hernandez       .238      7          37         1
Rosario         .217      5          28         0
Facts about baseball players

• If effective baseball players hit .300, then 90%
  of baseball players are ineffective.
• If effective baseball players are better-than-average
  hitters, then 50% are ineffective.
• A baseball player retains his job if he performs
  better than the available replacement.
• Most of the pool of available replacements are
  lousy baseball players.
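
The first two bullets can be checked against the seven batting averages in the table above. This is a minimal sketch, assuming "better than average" means above the mean of these seven players (the slide does not define it); the function name is illustrative.

```python
# Batting averages taken from the slide's table (seven players).
averages = [0.309, 0.303, 0.271, 0.270, 0.260, 0.238, 0.217]


def share_ineffective(values, cutoff):
    """Fraction of players whose average falls below the cutoff."""
    below = [v for v in values if v < cutoff]
    return len(below) / len(values)


# Rule 1: "effective" means hitting .300 or better.
fixed_rule = share_ineffective(averages, 0.300)

# Rule 2 (assumption): "effective" means hitting above the group mean.
mean_avg = sum(averages) / len(averages)
relative_rule = share_ineffective(averages, mean_avg)

print(f"Labeled ineffective under the .300 rule: {fixed_rule:.0%}")
print(f"Labeled ineffective under the group-mean rule: {relative_rule:.0%}")
```

The shares in this small sample differ from the 90% and 50% quoted for baseball players generally, but the point is the same: the label depends on where the cutoff is placed, not only on the player.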
Application to teaching

Don’t dismiss teachers for incompetence unless
you know you can replace them with someone
better.

Don’t identify more teachers for dismissal than
you can support through remediation.

Don’t identify more teachers for dismissal than
you can manage through the dismissal process.
Key requirements related to testing

• Assessment constitutes 50% of the evaluation.
• Statewide summative assessments are used for subjects in which they are
  available. Districts will be on their own for other subjects.
• Use of the Colorado Growth Model with statewide assessment.
• A measure of individually attributed or collectively attributed student
  growth.
• Local measure must be credible, valid (aligned), reliable, and inferences
  from the measure must be supportable by evidence and logic.
• The law requires that the measures should support consistent inferences.
• Rating of ineffective or partially effective can lead to loss of non-
  probationary status.
• If a value-added model is used, the model must be transparent enough to
  permit external evaluation.
Unique characteristics of the
    Colorado approach
• Student progress counts for 50% of the
  evaluation.
• Teachers are evaluated on both a “catch up”
  and “keep up” metric (at least on TCAP)
• The Colorado Growth Model will likely be used
  to evaluate progress (at least on TCAP)
Obvious possible issues

• The requirement that the assessment support
  inferences of teacher effectiveness opens a
  legal question.
• The credibility requirement is unique and has not
  yet been interpreted.
How tests are used to evaluate teachers and
 principals


Testing


   Metric (Growth or Gain Score)


      Analysis (Value Added Effect
      Size and/or ranking)

          Evaluation (Performance
          Rating)
Expect consistent inconsistency!
Inconsistency occurs because

• Of differences in test design.
• Of differences in testing conditions.
• Of differences in the models being applied to
  evaluate growth.
Inconsistency between tests


     California STAR   NWEA MAP
The reliability problem –
         Inconsistency in testing conditions

[Diagram labeled Test and Retest, showing four administrations: Test 1 and Test 2 at Time 1, and Test 1 and Test 2 at Time 2.]
The problem with spring-spring testing




       Teacher 1             Summer                           Teacher 2

3/11    4/11   5/11   6/11    7/11   8/11   9/11   10/11   11/11   12/11   1/12   2/12   3/12
Characteristics of value-added metrics



• Value-added metrics are inherently NORMATIVE.
• If below average = partially effective, then about half of a
  typical staff will be rated partially effective.
• Value-added metrics can’t measure progress of the
  larger group over time.
• Extreme performance is more likely to have alternate
  explanations.
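
The "normative" point can be illustrated with a minimal sketch. It assumes a simple percentile-rank metric (not any specific value-added model) and uses hypothetical growth numbers: when every teacher improves by the same amount, the ranks do not move.

```python
def percentile_ranks(scores):
    """Percent of the group scoring strictly below each score."""
    n = len(scores)
    return [100 * sum(other < s for other in scores) / n for s in scores]


# Hypothetical mean student growth per teacher (illustrative, not real data).
year1 = [2.0, 3.5, 5.0, 6.5, 8.0]

# In year 2 every teacher's students grow 1.5 points more than in year 1.
year2 = [g + 1.5 for g in year1]

ranks1 = percentile_ranks(year1)
ranks2 = percentile_ranks(year2)

print("Year 1 ranks:", ranks1)
print("Year 2 ranks:", ranks2)
print("Ranks unchanged despite group-wide improvement:", ranks1 == ranks2)
```

Because the metric only compares teachers with each other, group-wide progress is invisible to it, and some teachers always sit at the bottom.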
Issues in the use of growth and value-
    added measures



                          “Among those who ranked in the top
                          category on the TAKS reading test, more
                          than 17% ranked among the lowest two
                          categories on the Stanford. Similarly
                          more than 15% of the lowest value-added
                          teachers on the TAKS were in the highest
                          two categories on the Stanford.”



Corcoran, S., Jennings, J., & Beveridge, A., Teacher Effectiveness on High and Low Stakes
Tests, Paper presented at the Institute for Research on Poverty summer workshop, Madison, WI
(2010).
Reliability of teacher value-added
                        estimates
 Teachers with growth scores in lowest and
 highest quintile over two years using NWEA’s
 Measures of Academic Progress

               Bottom quintile   Top quintile
               Y1 & Y2           Y1 & Y2
 Number        59/493            63/493
 Percent       12%               13%

 r = .64       r² = .41


Typical r values for measures of teaching effectiveness range
between .30 and .60 (Brown Center on Education Policy, 2010)
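
To put the table in context, the arithmetic can be sketched as follows. Two assumptions are not stated on the slide: that 493 is the total number of teachers in the sample, and that a useful "no stability" baseline treats the two years as independent.

```python
total_teachers = 493     # assumption: the slide's denominator is the full sample
bottom_both_years = 59   # from the slide
top_both_years = 63      # from the slide
r = 0.64                 # year-to-year correlation from the slide

quintile_share = 0.20

# Baseline (assumption): if the two years were unrelated, the chance of
# landing in the same quintile twice is 20% x 20% = 4% of all teachers.
chance_share = quintile_share * quintile_share

# Upper bound: with perfectly stable scores, 20% would repeat.
perfect_share = quintile_share

observed_bottom = bottom_both_years / total_teachers
observed_top = top_both_years / total_teachers

# r squared is the share of variance in one year's scores
# that is associated with the other year's scores.
shared_variance = r ** 2

print(f"Bottom quintile both years: {observed_bottom:.0%}")
print(f"Top quintile both years: {observed_top:.0%}")
print(f"Chance baseline: {chance_share:.0%}, perfect stability: {perfect_share:.0%}")
print(f"r squared: {shared_variance:.2f}")
```

Under these assumptions the observed shares fall between the chance baseline and perfect stability, which is consistent with a moderate year-to-year correlation.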
Range of teacher value-added estimates

[Chart: Mathematics Growth Index Distribution by Teacher - Validity Filtered. Y-axis: Average Growth Index Score and Range, from -12.00 to 12.00. Teachers are grouped into quintiles Q1 through Q5.]

Each line in this display represents a single teacher. The graphic
shows the average growth index score for each teacher (green
line), plus or minus the standard error of the growth index estimate
(black line). We removed students who had tests of questionable
validity and teachers with fewer than 20 students.
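
The "average plus or minus the standard error" range for a teacher can be sketched as below. This assumes the standard error is the sample standard deviation of student growth index scores divided by the square root of the class size; the presentation does not say how its standard errors were computed, and the scores here are hypothetical.

```python
import math
import statistics


def mean_and_standard_error(growth_scores):
    """Return (mean, standard error of the mean) for one teacher's students."""
    mean = statistics.mean(growth_scores)
    se = statistics.stdev(growth_scores) / math.sqrt(len(growth_scores))
    return mean, se


def ranges_overlap(a, b):
    """True if the mean +/- SE ranges of two teachers overlap."""
    (mean_a, se_a), (mean_b, se_b) = a, b
    return (mean_a - se_a) <= (mean_b + se_b) and (mean_b - se_b) <= (mean_a + se_a)


# Hypothetical growth index scores for 20 students per teacher.
teacher_a_scores = [1, 3, -2, 4, 0, 2, 5, -1, 3, 1] * 2
teacher_b_scores = [s + 0.5 for s in teacher_a_scores]

teacher_a = mean_and_standard_error(teacher_a_scores)
teacher_b = mean_and_standard_error(teacher_b_scores)

print(f"Teacher A: {teacher_a[0]:.2f} +/- {teacher_a[1]:.2f}")
print(f"Teacher B: {teacher_b[0]:.2f} +/- {teacher_b[1]:.2f}")
print("Ranges overlap:", ranges_overlap(teacher_a, teacher_b))
```

When two teachers' ranges overlap, a difference in their average scores alone is weak grounds for placing them in different categories.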
New York City

• Margins of error can be very large
• Increasing n doesn't always decrease the
  margin of error
• The margin of error in math is typically smaller
  than in reading
Inconsistency among the Colorado
Growth Model and other value-added
approaches.
Los Angeles Unified

• Teachers can easily rate in multiple categories
• The choice of model can have a large impact
• Models affect English more than math
• Teachers do better in some subjects than
  others
• More complex models don't necessarily favor
  the teacher
Issues with the Colorado Growth
     Model

• When applied to MAP it discards the
  advantages of a cross-grade scale and robust
  growth norms.
• It is a descriptive and not a causal model.
• As currently applied it does not control for
  factors outside the teacher’s influence that
  may affect student growth.
A brief commentary on the Colorado Growth
Model



          Its limitations

          • It does not support inference.
          • It does not take advantage of the
            useful characteristics of a vertical
            scale.
          • It uses only prior scores and past
            testing history to evaluate growth.
A brief commentary on the Colorado Growth
Model



          Other limitations

          • The model can’t be used for cross-state
            comparisons.
          • The model is problematic for
            assessing long-term trends.
A finding of effectiveness or ineffectiveness is
more defensible when it is arrived at by:

1. Two or more assessments of different designs.
2. Two or more models of different designs.
3. As many cases as possible.

It is not good to choose tests or models for local
    assessment in hopes that they will mimic the
    state assessment.
Potential Litigation Issues


The use of value-added data for high-stakes
personnel decisions does not yet have a
strong, coherent body of case law.

Until a body of case law is established, expect
litigation if value-added results are the linchpin
evidence for a teacher-dismissal case.
Instability at the tails of the
         distribution

       “The findings indicate that these modeling
       choices can significantly influence outcomes
       for individual teachers, particularly those in
       the tails of the performance distribution who
       are most likely to be targeted by high-stakes
       policies.”

Ballou, D., Mokher, C. and Cavalluzzo, L. (2012) Using Value-Added Assessment for Personnel
Decisions: How Omitted Variables and Model Specification Influence Teachers’ Outcomes.




[Chart legend: LA Times Teacher #1, LA Times Teacher #2]
Possible racial bias in models

“Significant evidence of bias plagued the value-added model
estimated for the Los Angeles Times in 2010, including significant
patterns of racial disparities in teacher ratings both by the race of
the student served and by the race of the teachers (see
Green, Baker and Oluwole, 2012). These model biases raise the
possibility that Title VII disparate impact claims might also be filed
by teachers dismissed on the basis of their value-added estimates.

Additional analyses of the data, including richer models using
additional variables mitigated substantial portions of the bias in the
LA Times models (Briggs & Domingue, 2010).”


                 Baker, B. (2012, April 28). If it’s not valid, reliability doesn’t
                 matter so much! More on VAM-ing & SGP-ing
                 Teacher Dismissal.
Issues in the use of growth and value-
added measures

           Lack of random assignment

           The use of a value-added model
           assumes that the school doesn’t
           add a source of variation that isn’t
           controlled for in the model.

           e.g. Young teachers are assigned
           disproportionate numbers of
           students with poor discipline
           records.
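
The effect of non-random assignment can be sketched with a small simulation. Everything here is hypothetical: two teachers are given the same true effect, students with poor discipline records are assumed to grow less, and that factor is left out of the "model" (a plain class average).

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

TRUE_TEACHER_EFFECT = 2.0  # hypothetical: both teachers add the same growth
DISCIPLINE_PENALTY = 1.5   # hypothetical growth lost by students with poor discipline records


def simulate_class(n_students, share_poor_discipline):
    """Observed growth for one class; discipline is not controlled for."""
    n_poor = round(n_students * share_poor_discipline)
    growth = []
    for i in range(n_students):
        penalty = DISCIPLINE_PENALTY if i < n_poor else 0.0
        noise = random.gauss(0, 1)  # student-level variation
        growth.append(TRUE_TEACHER_EFFECT - penalty + noise)
    return growth


# The veteran is assigned few students with poor discipline records,
# the young teacher is assigned many.
veteran = simulate_class(500, share_poor_discipline=0.1)
novice = simulate_class(500, share_poor_discipline=0.5)

gap = statistics.mean(veteran) - statistics.mean(novice)
print(f"Apparent gap in average growth: {gap:.2f}")
```

The two simulated teachers are equally effective by construction, yet the uncontrolled assignment pattern produces an apparent gap that a naive comparison would attribute to the teacher.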
Measurement Issues




            Moving from the model to
                    the teacher rating
Translating ranked data to ratings -
     principles


• There is no “science” per se around translating a
  ranking to a rating. If you call a bottom 40% teacher
  ineffective, that is a judgment.
• The rating process can be politicized.
• The process is easy to over-engineer.
New York Rating System



•   60 points assigned from classroom observation
•   20 points assigned from state assessment
•   20 points assigned from local assessment
•   A score of 64 or less is rated ineffective.
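
The point allocation above can be sketched as a simple calculation. This assumes the three components are added to give a score out of 100; only the "ineffective" cutoff is given on the slide, so no other rating bands are modeled, and the function names are illustrative.

```python
def composite_score(observation, state_assessment, local_assessment):
    """Sum the three components (assumption: a simple sum out of 100 points)."""
    if not 0 <= observation <= 60:
        raise ValueError("observation points must be between 0 and 60")
    if not 0 <= state_assessment <= 20:
        raise ValueError("state assessment points must be between 0 and 20")
    if not 0 <= local_assessment <= 20:
        raise ValueError("local assessment points must be between 0 and 20")
    return observation + state_assessment + local_assessment


def is_ineffective(score):
    """A score of 64 or less is rated ineffective (cutoff from the slide)."""
    return score <= 64


# Hypothetical teacher: strong observation score, weak assessment results.
score = composite_score(observation=50, state_assessment=6, local_assessment=8)
print(score, "ineffective" if is_ineffective(score) else "not ineffective")
```

In this hypothetical case a teacher with 50 of 60 observation points is still rated ineffective, which shows how much weight the 40 assessment points can carry near the cutoff.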
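Columns: growth-measure points (0 to 40). Rows: observational points (0 to 60).
Column bands (left to right): Ineffective (Growth Measures), Developing (Growth Measures), Effective (Growth Measures), Highly Effective (Growth Measures)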
                                        0   1     2    3    4    5    6    7    8    9    10   11    12   13   14   15   16   17   18   19   20   21   22   23   24   25    26   27    28   29   30   31   32   33   34    35   36    37    38   39     40
                                   0    1   2     2    2    2    2    2    2    2    2     2    2     2    2    2    2    2    2    2    2    2    2    2    2    2    2     2    2     2    2    2    2    2    2    2     2    2     2     2    2      2
                                   1    2   3     4    4    4    4    5    5    5    5     5    5     5    5    5    5    5    5    6    6    6    6    6    6    6    6     6    6     6    6    6    6    6    6    6     6    6     6     6    6      6
                                   2    2   4     5    6    6    6    7    7    7    7     7    8     8    8    8    8    8    8    8    8    8    9    9    9    9    9     9    9     9    9    9    9    9    9    9     9    9     9     9    9      9
[Row band label from the original figure: Ineffective (Observational)]
                                   3    2   5     6    7    7    8    8    9    9    9    10   10    10   10   10   10   11   11   11   11   11   11   11   11   11   11    12   12    12   12   12   12   12   12   12    12   12    12    12    12    12
                                   4    3   5     7    8    9    9    10   10   11   11   11   12    12   12   12   13   13   13   13   13   13   14   14   14   14   14    14   14    14   14   15   15   15   15   15    15   15    15    15    15    15
                                   5    3   6     8    9    10   11   11   12   12   13   13   14    14   14   14   15   15   15   15   16   16   16   16   16   16   16    17   17    17   17   17   17   17   17   17    18   18    18    18    18    18
                                   6    3   6     8    10   11   12   13   13   14   14   15   15    16   16   16   17   17   17   17   18   18   18   18   18   19   19    19   19    19   19   19   20   20   20   20    20   20    20    20    20    21
                                   7    3   7     9    11   12   13   14   15   15   16   16   17    17   18   18   18   19   19   19   20   20   20   20   20   21   21    21   21    21   22   22   22   22   22   22    22   23    23    23    23    23
                                   8    3   7     10   11   13   14   15   16   17   17   18   18    19   19   20   20   20   21   21   21   22   22   22   23   23   23    23   23    24   24   24   24   24   24   25    25   25    25    25    25    25
                                   9    3   8     10   12   14   15   16   17   18   18   19   20    20   21   21   22   22   23   23   23   24   24   24   24   25   25    25   25    26   26   26   26   26   27   27    27   27    27    27    28    28
                                   10   3   8     11   13   14   16   17   18   19   20   20   21    22   22   23   23   24   24   25   25   25   26   26   26   27   27    27   27    28   28   28   28   29   29   29    29   29    29    30    30    30
                                   11   3   8     11   13   15   17   18   19   20   21   22   22    23   24   24   25   25   26   26   27   27   27   28   28   28   29    29   29    30   30   30   30   31   31   31    31   31    32    32    32    32
                                   12   4   8     12   14   16   17   19   20   21   22   23   24    24   25   26   26   27   27   28   28   29   29   29   30   30   30    31   31    31   32   32   32   33   33   33    33   33    34    34    34    34
                                   13   4   9     12   14   16   18   20   21   22   23   24   25    26   26   27   28   28   29   29   30   30   31   31   31   32   32    33   33    33   34   34   34   34   35   35    35   35    36    36    36    36
                                   14   4   9     12   15   17   19   20   22   23   24   25   26    27   27   28   29   30   30   31   31   32   32   33   33   33   34    34   35    35   35   36   36   36   37   37    37   37    38    38    38    38
                                   15   4   9     13   15   18   19   21   23   24   25   26   27    28   29   29   30   31   31   32   33   33   34   34   35   35   35    36   36    37   37   37   38   38   38   39    39   39    40    40    40    40
                                   16   4   9     13   16   18   20   22   23   25   26   27   28    29   30   31   31   32   33   33   34   35   35   36   36   37   37    37   38    38   39   39   39   40   40   40    41   41    41    42    42    42
                                   17   4   9     13   16   19   21   23   24   25   27   28   29    30   31   32   33   33   34   35   35   36   37   37   38   38   39    39   39    40   40   41   41   42   42   42    43   43    43    44    44    44
[Row band label from the original figure: Developing (Observational)]
                                   18   4   10    14   17   19   21   23   25   26   28   29   30    31   32   33   34   35   35   36   37   37   38   38   39   40   40    41   41    41   42   42   43   43   44   44    44   45    45    45    46    46
                                   19   4   10    14   17   20   22   24   26   27   28   30   31    32   33   34   35   36   36   37   38   39   39   40   40   41   42    42   43    43   43   44   44   45   45   46    46   46    47    47    47    48
                                   20   4   10    14   17   20   22   24   26   28   29   31   32    33   34   35   36   37   38   38   39   40   41   41   42   42   43    43   44    45   45   45   46   46   47   47    48   48    48    49    49    49
                                   21   4   10    14   18   21   23   25   27   29   30   31   33    34   35   36   37   38   39   40   40   41   42   42   43   44   44    45   45    46   46   47   47   48   48   49    49   50    50    50    51    51
                                   22   4   10    15   18   21   23   26   27   29   31   32   34    35   36   37   38   39   40   41   42   42   43   44   44   45   46    46   47    47   48   48   49   49   50   50    51   51    52    52    52    53
                                   23   4   10    15   18   21   24   26   28   30   31   33   34    36   37   38   39   40   41   42   43   43   44   45   46   46   47    48   48    49   49   50   50   51   51   52    52   53    53    54    54    54
                                   24   4   11    15   19   22   24   27   29   31   32   34   35    36   38   39   40   41   42   43   44   45   45   46   47   48   48    49   50    50   51   51   52   52   53   53    54   54    55    55    56    56
                                   25   4   11    15   19   22   25   27   29   31   33   34   36    37   39   40   41   42   43   44   45   46   47   47   48   49   50    50   51    52   52   53   53   54   54   55    55   56    56    57    57    58
                                   26   4   11    16   19   23   25   28   30   32   34   35   37    38   39   41   42   43   44   45   46   47   48   49   49   50   51    51   52    53   53   54   55   55   56   56    57   57    58    58    59    59
                                   27   4   11    16   20   23   26   28   30   32   34   36   37    39   40   42   43   44   45   46   47   48   49   50   50   51   52    53   53    54   55   55   56   57   57   58    58   59    59    60    60    61
                                   28   4   11    16   20   23   26   29   31   33   35   37   38    40   41   42   44   45   46   47   48   49   50   51   52   52   53    54   55    55   56   57   57   58   59   59    60   60    61    61    62    62
                                   29   4   11    16   20   24   26   29   31   34   35   37   39    40   42   43   45   46   47   48   49   50   51   52   53   54   54    55   56    57   57   58   59   59   60   61    61   62    62    63    63    64
                                   30   4   11    16   20   24   27   30   32   34   36   38   40    41   43   44   45   47   48   49   50   51   52   53   54   55   56    56   57    58   59   59   60   61   61   62    62   63    64    64    65    65
                                   31   4   11    17   21   24   27   30   32   35   37   39   40    42   43   45   46   47   49   50   51   52   53   54   55   56   57    57   58    59   60   61   61   62   63   63    64   64    65    66    66    67
                                   32   4   11    17   21   25   28   30   33   35   37   39   41    43   44   46   47   48   50   51   52   53   54   55   56   57   58    59   59    60   61   62   62   63   64   64    65   66    66    67    68    68
                                   33   4   12    17   21   25   28   31   33   36   38   40   42    43   45   46   48   49   50   52   53   54   55   56   57   58   59    60   61    61   62   63   64   64   65   66    66   67    68    68    69    69
[Row band label from the original figure: Effective (Observational)]
                                   34   4   12    17   21   25   28   31   34   36   38   40   42    44   46   47   49   50   51   53   54   55   56   57   58   59   60    61   62    63   63   64   65   66   66   67    68   68    69    70    70    71
                                   35   4   12    17   22   25   29   32   34   37   39   41   43    45   46   48   49   51   52   53   55   56   57   58   59   60   61    62   63    64   64   65   66   67   68   68    69   70    70    71    72    72
                                   36   4   12    17   22   26   29   32   35   37   39   41   43    45   47   49   50   52   53   54   55   57   58   59   60   61   62    63   64    65   66   66   67   68   69   69    70   71    72    72    73    74
                                   37   4   12    17   22   26   29   32   35   38   40   42   44    46   48   49   51   52   54   55   56   58   59   60   61   62   63    64   65    66   67   68   68   69   70   71    71   72    73    74    74    75
                                   38   4   12    18   22   26   30   33   36   38   40   43   45    46   48   50   52   53   55   56   57   58   60   61   62   63   64    65   66    67   68   69   69   70   71   72    73   73    74    75    75    76
                                   39   4   12    18   22   26   30   33   36   39   41   43   45    47   49   51   52   54   55   57   58   59   61   62   63   64   65    66   67    68   69   70   71   71   72   73    74   75    75    76    77    77
                                   40   4   12    18   23   27   30   33   36   39   41   44   46    48   50   51   53   55   56   57   59   60   61   63   64   65   66    67   68    69   70   71   72   73   73   74    75   76    77    77    78    79
                                   41   4   12    18   23   27   31   34   37   39   42   44   46    48   50   52   54   55   57   58   60   61   62   63   65   66   67    68   69    70   71   72   73   74   75   75    76   77    78    78    79    80
                                   42   5   12    18   23   27   31   34   37   40   42   45   47    49   51   53   54   56   58   59   60   62   63   64   66   67   68    69   70    71   72   73   74   75   76   76    77   78    79    80    80    81
                                   43   5   12    18   23   27   31   34   37   40   43   45   47    49   51   53   55   57   58   60   61   63   64   65   66   68   69    70   71    72   73   74   75   76   77   78    78   79    80    81    82    82
                                   44   5   12    18   23   28   31   35   38   41   43   46   48    50   52   54   56   57   59   60   62   63   65   66   67   69   70    71   72    73   74   75   76   77   78   79    80   80    81    82    83    84
                                   45   5   13    19   24   28   32   35   38   41   44   46   48    51   53   54   56   58   60   61   63   64   66   67   68   69   71    72   73    74   75   76   77   78   79   80    81   82    82    83    84    85
                                   46   5   13    19   24   28   32   35   39   41   44   47   49    51   53   55   57   59   60   62   63   65   66   68   69   70   71    73   74    75   76   77   78   79   80   81    82   83    83    84    85    86
[Row band label from the original figure: Highly Effective (Observational)]
                                   47   5   13    19   24   28   32   36   39   42   45   47   49    52   54   56   58   59   61   63   64   66   67   69   70   71   72    74   75    76   77   78   79   80   81   82    83   84    85    85    86    87
                                   48   5   13    19   24   29   32   36   39   42   45   47   50    52   54   56   58   60   62   63   65   66   68   69   71   72   73    74   76    77   78   79   80   81   82   83    84   85    86    87    87    88
                                   49   5   13    19   24   29   33   36   40   43   45   48   50    53   55   57   59   61   62   64   66   67   69   70   71   73   74    75   77    78   79   80   81   82   83   84    85   86    87    88    89    89
                                   50   5   13    19   24   29   33   37   40   43   46   48   51    53   55   57   59   61   63   65   66   68   69   71   72   74   75    76   77    79   80   81   82   83   84   85    86   87    88    89    90    90
                                   51   5   13    19   25   29   33   37   40   43   46   49   51    54   56   58   60   62   64   65   67   69   70   72   73   74   76    77   78    79   81   82   83   84   85   86    87   88    89    90    91    92
                                   52   5   13    19   25   29   33   37   41   44   47   49   52    54   56   58   61   62   64   66   68   69   71   72   74   75   77    78   79    80   82   83   84   85   86   87    88   89    90    91    92    93
                                   53   5   13    19   25   30   34   37   41   44   47   50   52    55   57   59   61   63   65   67   68   70   72   73   75   76   77    79   80    81   82   84   85   86   87   88    89   90    91    92    93    94
                                   54   5   13    20   25   30   34   38   41   44   47   50   53    55   57   60   62   64   66   67   69   71   72   74   75   77   78    80   81    82   83   85   86   87   88   89    90   91    92    93    94    95
                                   55   5   13    20   25   30   34   38   41   45   48   50   53    56   58   60   62   64   66   68   70   71   73   75   76   78   79    80   82    83   84   85   87   88   89   90    91   92    93    94    95    96
                                   56   5   13    20   25   30   34   38   42   45   48   51   54    56   58   61   63   65   67   69   70   72   74   75   77   78   80    81   82    84   85   86   87   89   90   91    92   93    94    95    96    97
                                   57   5   13    20   25   30   35   38   42   45   48   51   54    56   59   61   63   65   67   69   71   73   74   76   78   79   81    82   83    85   86   87   88   90   91   92    93   94    95    96    97    98
                                   58   5   13    20   26   30   35   39   42   46   49   52   54    57   59   62   64   66   68   70   72   73   75   77   78   80   81    83   84    85   87   88   89   90   92   93    94   95    96    97    98    99
                                   59   5   13    20   26   31   35   39   43   46   49   52   55    57   60   62   64   66   68   70   72   74   76   77   79   81   82    83   85    86   88   89   90   91   92   94    95   96    97    98    99    100
                                   60   5   13    20   26   31   35   39   43   46   49   52   55    58   60   63   65   67   69   71   73   75   76   78   80   81   83    84   86    87   88   90   91   92   93   95    96   97    98    99   100    101
Cheating

      Atlanta Public Schools
      Crescendo Charter Schools
      Philadelphia Public Schools
      Washington DC Public Schools
      Houston Independent School District
      Michigan Public Schools
Unintended Consequences?


• Many principals and teachers (including good ones)
  will seek schools or teaching assignments that they
  think will improve their results.
• Principals and teachers may game the system,
  inadvertently or intentionally.
• Many teachers will seek opportunities to avoid
  grades with standardized tests.
• Ranking metrics can discourage cooperation among
  principals and teachers, so finding ways to reward
  teamwork and cooperation is important.
Case Study #1 - Mean value-added performance in mathematics by
    school – fall to spring

[Chart: one value per school. Y-axis: mean value-added performance, from -8.00 to 6.00.]
Case Study #1 - Mean spring and fall test duration in minutes by
        school

[Chart: two series per school, Spring term and Fall term. Y-axis: test duration in minutes, from 0.00 to 90.00.]
Case Study #1 - Mean value-added growth by school and test
         duration

[Chart: two series per school, "Students taking 10+ minutes longer spring than fall" and "All other students". Y-axis: mean value-added growth, from -10.00 to 8.00.]
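
The comparison behind this chart, between students taking 10+ minutes longer in spring than in fall and all other students, can be sketched as follows. The records and field layout are hypothetical; they only illustrate the split, not the case study's actual data or results.

```python
import statistics

# Hypothetical student records: (fall_minutes, spring_minutes, growth_index).
students = [
    (40, 62, 4.0),
    (35, 50, 3.0),
    (45, 44, -1.0),
    (50, 52, 0.5),
    (38, 30, -2.5),
    (42, 55, 2.0),
]


def split_by_duration_change(records, threshold_minutes=10):
    """Separate growth scores by whether spring testing ran much longer than fall."""
    longer, others = [], []
    for fall, spring, growth in records:
        if spring - fall >= threshold_minutes:
            longer.append(growth)
        else:
            others.append(growth)
    return longer, others


longer, others = split_by_duration_change(students)
print(f"10+ minutes longer in spring: mean growth {statistics.mean(longer):.2f}")
print(f"All other students: mean growth {statistics.mean(others):.2f}")
```

A large gap between the two groups would suggest that part of the measured growth reflects a change in testing conditions rather than a change in learning.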
Case Study # 2

[Left panel, Mathematics: Differences in fall-spring test durations. Three categories (Spring < Fall, Spring = Fall, Spring > Fall) with labeled shares of 60%, 25%, and 15%.]
[Right panel, Mathematics: Differences in growth index score based on fall-spring test durations. Y-axis: Growth Index, from 0.0 to 6.0, for the same three categories.]
Case Study #2

                 How much of summer loss is really summer loss?

[Left chart: Differences in spring-fall test durations; categories: Fall < Spring, Fall = Spring, Fall > Spring; slices labeled 25%, 42%, 33%]

[Right chart: Differences in raw growth based on spring-fall test duration; vertical axis from -5.0 to 0.0; categories: Fall < Spring, Fall = Spring, Fall > Spring]
Case Study #2

Differences in fall-spring test duration (yellow-black) and
differences in growth index scores (green) by school

[Chart: horizontal axis: School; left vertical axis: Minutes, 0 to 200; right vertical axis: Growth Index, 0.0 to 10.0; series: Growth Index, Fall test duration, Spring test duration]
Negotiated goals – Student Learning
     Objectives

• Negotiated goals (SLOs) are likely to be
  necessary in some subjects.
• It is difficult to set fair and reasonable goals
  for improvement absent norms or context.
• It is likely that some goals will be absurdly high
  and others way too low.
An alternate approach

• Give primacy to evaluator observation for judging teachers.
• Focus mandatory observations on low performers.
• Use assessments and value-added measurement to validate
  observations.
• Require reassessment when observations and assessment
  data are in significant misalignment.
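One way to picture the reassessment rule in the last bullet is a simple check on how far apart the two ratings are. The sketch below assumes both the observation and the assessment data have already been translated to the same three labels used earlier in this deck. The one-step tolerance (`max_gap`) and the function name are assumptions for illustration, not part of any state's rules.

```python
# Ordered rating labels; the ordering is the only thing the check relies on.
LEVELS = ["ineffective", "partially effective", "effective"]

def needs_reassessment(observation_rating, assessment_rating, max_gap=1):
    """Return True when the two ratings differ by more than `max_gap` steps.
    What counts as 'significant misalignment' is a policy judgment; the
    default of one step is a hypothetical choice for this sketch."""
    gap = abs(LEVELS.index(observation_rating) - LEVELS.index(assessment_rating))
    return gap > max_gap

print(needs_reassessment("effective", "ineffective"))
print(needs_reassessment("effective", "partially effective"))
```

Under this assumed tolerance, only a teacher rated effective by one source and ineffective by the other would be flagged for a second look.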
Possible legal issues

• Title VII of the Civil Rights Act of 1964 –
  Disparate impact of sanctions on a protected
  group.
• State statutes that provide tenure and other
  related protections to teachers.
• Challenges to a finding of “incompetence”
  stemming from the growth or value-added
  data.
Recommendations

• Embrace the formative advantages of growth
  measurement as well as the summative.
• Create comprehensive evaluation systems with
  multiple measures of teacher effectiveness (RAND,
  2010).
• Select measures as carefully as value-added models.
• Use multiple years of student achievement data.
• Understand the issues and the tradeoffs.
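The recommendation to use multiple years of data can be illustrated with the standard error of a mean, which is the sample standard deviation divided by the square root of the number of scores. The sketch below pools two years of made-up student growth scores for one hypothetical teacher. The numbers are invented, and pooling more scores is not guaranteed to shrink the standard error in every data set.

```python
from math import sqrt
from statistics import mean, stdev

def mean_and_standard_error(scores):
    """Return the mean of the scores and the standard error of that mean
    (sample standard deviation divided by sqrt(n))."""
    return mean(scores), stdev(scores) / sqrt(len(scores))

# Made-up student growth index scores for one hypothetical teacher.
year_1 = [2.0, -1.0, 4.0, 1.0]
year_2 = [3.0, 0.0, 2.0, 1.0]

one_year_mean, one_year_se = mean_and_standard_error(year_1)
pooled_mean, pooled_se = mean_and_standard_error(year_1 + year_2)

print(f"one year:  mean={one_year_mean:.2f}, se={one_year_se:.2f}")
print(f"two years: mean={pooled_mean:.2f}, se={pooled_se:.2f}")
```

In this example the estimate stays the same while its uncertainty drops, which is the practical argument for not rating a teacher on a single year.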
Thank you for attending this event


Presenter - John Cronin, Ph.D.

Contacting us:
NWEA Main Number: 503-624-1951
E-mail: rebecca.moore@nwea.org

The presentation and recommended resources are
available at our website: www.kingsburycenter.org

74 75 76 77 78 79 80 81 82 83 83 84 85 86 Highly Effective (Observational) 47 5 13 19 24 28 32 36 39 42 45 47 49 52 54 56 58 59 61 63 64 66 67 69 70 71 72 74 75 76 77 78 79 80 81 82 83 84 85 85 86 87 48 5 13 19 24 29 32 36 39 42 45 47 50 52 54 56 58 60 62 63 65 66 68 69 71 72 73 74 76 77 78 79 80 81 82 83 84 85 86 87 87 88 49 5 13 19 24 29 33 36 40 43 45 48 50 53 55 57 59 61 62 64 66 67 69 70 71 73 74 75 77 78 79 80 81 82 83 84 85 86 87 88 89 89 50 5 13 19 24 29 33 37 40 43 46 48 51 53 55 57 59 61 63 65 66 68 69 71 72 74 75 76 77 79 80 81 82 83 84 85 86 87 88 89 90 90 51 5 13 19 25 29 33 37 40 43 46 49 51 54 56 58 60 62 64 65 67 69 70 72 73 74 76 77 78 79 81 82 83 84 85 86 87 88 89 90 91 92 52 5 13 19 25 29 33 37 41 44 47 49 52 54 56 58 61 62 64 66 68 69 71 72 74 75 77 78 79 80 82 83 84 85 86 87 88 89 90 91 92 93 53 5 13 19 25 30 34 37 41 44 47 50 52 55 57 59 61 63 65 67 68 70 72 73 75 76 77 79 80 81 82 84 85 86 87 88 89 90 91 92 93 94 54 5 13 20 25 30 34 38 41 44 47 50 53 55 57 60 62 64 66 67 69 71 72 74 75 77 78 80 81 82 83 85 86 87 88 89 90 91 92 93 94 95 55 5 13 20 25 30 34 38 41 45 48 50 53 56 58 60 62 64 66 68 70 71 73 75 76 78 79 80 82 83 84 85 87 88 89 90 91 92 93 94 95 96 56 5 13 20 25 30 34 38 42 45 48 51 54 56 58 61 63 65 67 69 70 72 74 75 77 78 80 81 82 84 85 86 87 89 90 91 92 93 94 95 96 97 57 5 13 20 25 30 35 38 42 45 48 51 54 56 59 61 63 65 67 69 71 73 74 76 78 79 81 82 83 85 86 87 88 90 91 92 93 94 95 96 97 98 58 5 13 20 26 30 35 39 42 46 49 52 54 57 59 62 64 66 68 70 72 73 75 77 78 80 81 83 84 85 87 88 89 90 92 93 94 95 96 97 98 99 59 5 13 20 26 31 35 39 43 46 49 52 55 57 60 62 64 66 68 70 72 74 76 77 79 81 82 83 85 86 88 89 90 91 92 94 95 96 97 98 99 100 60 5 13 20 26 31 35 39 43 46 49 52 55 58 60 63 65 67 69 71 73 75 76 78 80 81 83 84 86 87 88 90 91 92 93 95 96 97 98 99 100 101
  • 38. Cheating • Atlanta Public Schools • Crescendo Charter Schools • Philadelphia Public Schools • Washington DC Public Schools • Houston Independent School District • Michigan Public Schools
  • 39. Unintended Consequences? • Many principals and teachers (including good ones) will seek schools or teaching assignments that they think will improve their results. • Principals and teachers may game the system, inadvertently or intentionally. • Many teachers will seek opportunities to avoid grades with standardized tests. • Ranking metrics can discourage cooperation among principals and teachers, so finding ways to reward teamwork and cooperation is important.
  • 40. Case Study #1 - Mean value-added performance in mathematics by school, fall to spring [chart by school; vertical axis from -8.00 to 6.00]
  • 41. Case Study #1 - Mean spring and fall test duration in minutes by school [chart by school; series: Spring term, Fall term; vertical axis from 0.00 to 90.00]
  • 42. Case Study #1 - Mean value-added growth by school and test duration [chart by school; series: Students taking 10+ minutes longer in spring than in fall, All other students; vertical axis from -10.00 to 8.00]
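The comparison behind this slide can be sketched in a few lines. This is a minimal illustration only, assuming hypothetical per-student records that hold fall and spring test durations in minutes plus a growth score. The sample values and the `split_by_duration_change` helper are assumptions made for the example, not the presenter's data or method; only the 10-minute threshold comes from the slide.

```python
from statistics import mean

# Hypothetical student records: (fall_minutes, spring_minutes, growth).
# All values are invented for illustration only.
students = [
    (40, 75, 6.0),
    (45, 80, 5.0),
    (50, 52, 1.0),
    (55, 50, -1.0),
    (60, 61, 2.0),
]

def split_by_duration_change(records, threshold=10):
    """Return (mean growth of students whose spring test ran at least
    `threshold` minutes longer than their fall test, mean growth of all
    other students). A group with no students yields None."""
    longer = [g for fall, spring, g in records if spring - fall >= threshold]
    others = [g for fall, spring, g in records if spring - fall < threshold]
    return (mean(longer) if longer else None,
            mean(others) if others else None)

longer_mean, others_mean = split_by_duration_change(students)
print(f"10+ minutes longer in spring: {longer_mean}")
print(f"All other students: {others_mean}")
```

A large gap between the two means would suggest that some of the measured growth reflects a change in testing conditions rather than a change in instruction.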
  • 43. Case Study #2 • Differences in fall-spring test durations, Mathematics [chart; categories: Spring < Fall, Spring = Fall, Spring > Fall; shares shown: 15%, 25%, 60%] • Differences in growth index score based on fall-spring test durations, Mathematics [chart; categories: Spring < Fall, Spring = Fall, Spring > Fall; Growth Index axis from 0.0 to 6.0]
  • 44. Case Study #2 - How much of summer loss is really summer loss? • Differences in spring-fall test durations [chart; categories: Fall < Spring, Fall = Spring, Fall > Spring; shares shown: 25%, 42%, 33%] • Differences in raw growth by spring-fall test duration [chart; categories: Fall < Spring, Fall = Spring, Fall > Spring; vertical axis from -5.0 to 0.0]
  • 45. Case Study #2 - Differences in fall-spring test duration (yellow-black) and differences in growth index scores (green) by school [chart by school; series: Growth Index, Fall test duration, Spring test duration; axes: Minutes from 0 to 200, Growth Index from 0.0 to 10.0]
  • 46. Negotiated goals – Student Learning Objectives • Negotiated goals (SLOs) are likely to be necessary in some subjects. • It is difficult to set fair and reasonable goals for improvement absent norms or context. • It is likely that some goals will be absurdly high and others way too low.
  • 47. An alternate approach • Give primacy to evaluator observation for judging teachers. • Focus mandatory observations on low performers. • Use assessments and value-added measurement to validate observations. • Require reassessment when observations and assessment data are in significant misalignment.
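The reassessment rule in the last bullet could be sketched as follows. This is only an illustration: the slide does not define "significant misalignment", so the four-level scale, the `needs_reassessment` helper, and the one-level tolerance are all assumptions chosen for the example.

```python
# Hypothetical ordered rating scale, lowest to highest (an assumption for
# this sketch; the slide does not specify the scale).
LEVELS = ["Ineffective", "Developing", "Effective", "Highly Effective"]

def needs_reassessment(observation, assessment, max_gap=1):
    """Flag a teacher when the observation rating and the assessment-based
    rating differ by more than `max_gap` levels on the scale above."""
    gap = abs(LEVELS.index(observation) - LEVELS.index(assessment))
    return gap > max_gap

# Hypothetical teachers: (name, observation rating, assessment rating).
teachers = [
    ("Teacher A", "Highly Effective", "Ineffective"),
    ("Teacher B", "Effective", "Developing"),
    ("Teacher C", "Developing", "Developing"),
]

flagged = [name for name, obs, test in teachers if needs_reassessment(obs, test)]
print(flagged)
```

Under these assumptions a one-level disagreement is tolerated as noise, and only a larger gap triggers a second look at both sources of evidence.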
  • 48. Possible legal issues • Title VII of the Civil Rights Act of 1964 – Disparate impact of sanctions on a protected group. • State statutes that provide tenure and other related protections to teachers. • Challenges to a finding of “incompetence” stemming from the growth or value-added data.
  • 49. Recommendations • Embrace the formative advantages of growth measurement as well as the summative. • Create comprehensive evaluation systems with multiple measures of teacher effectiveness (Rand, 2010) • Select measures as carefully as value-added models. • Use multiple years of student achievement data. • Understand the issues and the tradeoffs.
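One reason to use multiple years of data is that averaging reduces the noise in a single-year estimate. The sketch below is a simplified illustration, assuming each year's value-added estimate carries an independent error with the same standard error; under that assumption the standard error of the average shrinks with the square root of the number of years. The `pooled_estimate` helper and all numbers are hypothetical, and real value-added models are more involved than a plain average.

```python
import math
from statistics import mean

def pooled_estimate(yearly_estimates, yearly_se):
    """Average several yearly value-added estimates.

    Assumes independent yearly errors with a common standard error
    `yearly_se`, so the standard error of the mean is yearly_se / sqrt(n).
    Returns (pooled estimate, pooled standard error).
    """
    n = len(yearly_estimates)
    return mean(yearly_estimates), yearly_se / math.sqrt(n)

# Hypothetical estimates for one teacher across four years.
estimate, se = pooled_estimate([1.5, -0.5, 2.0, 1.0], yearly_se=2.0)
print(f"pooled estimate: {estimate}, standard error: {se}")
```

In this invented example a single year could show the teacher as below zero, while the four-year average is positive and has half the uncertainty of any one year.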
  • 50. Thank you for attending this event