The effect of testing on student achievement:
                   1910-2010




                    Richard P. PHELPS

© 2012, Richard P. PHELPS
International Test Commission, 8th Conference, Amsterdam
Meta-analysis


  • A method for
    summarizing a large
    research literature,
    with a single,
    comparable measure.



The effect of testing on student
                             achievement

                    • 12-year-long study

                    • analyzed close to 700 separate studies,
                      and more than 1,600 separate effects

                    • 2,000 other studies were reviewed and
                      found incomplete or inappropriate

                    • lacking sufficient time and money,
                      hundreds of other studies will not be
                      reviewed

Looking for studies to include in the
                meta-analyses




1. Included only those studies that found an effect from
   testing on student achievement or on teacher
   instruction…

Studies included in the meta-analyses




2. …when:
  • a test is newly introduced, or newly removed
  • quantity of testing is increased or reduced
  • test stakes are introduced or increased, or removed or
    reduced
Studies included in the meta-analyses

   3. …plus previous research summaries (e.g.)

                           • Kulik, Kulik, Bangert-Drowns, &
                             Schwalb (1983-1991) on:
                                – mastery testing,
                                – frequency of testing, and
                                – programs for high-risk university
                                  students
                           • Basol & Johanson (2009) on testing
                             frequency
                           • Jaekyung Lee (2007) on cross-state
                             studies
                           • W.J. Haynie (2007) in career-tech ed

Number of studies of effects,
               by methodology type

                                   Number of           Number of
 Methodology type                   studies             effects
 Quantitative                          177                  640
 Surveys and public opinion
 polls (US & Canada)                   247                  813
 Qualitative                           245                  245
 TOTAL                                 669                 1698


Effect size: Cohen’s d


          d = (YE - YC) / Spooled


      YE = mean, experimental group
      YC = mean, control group
      Spooled = pooled standard deviation
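
A minimal sketch of the calculation above. The slide does not say how the standard deviation is pooled; the degrees-of-freedom-weighted form used here is an assumption, and the scores are hypothetical.

```python
import math
import statistics


def cohens_d(experimental, control):
    """Standardized mean difference between two groups of scores."""
    n_e, n_c = len(experimental), len(control)
    mean_e = statistics.mean(experimental)
    mean_c = statistics.mean(control)
    # Sample variances (n - 1 in the denominator).
    var_e = statistics.variance(experimental)
    var_c = statistics.variance(control)
    # Assumed pooling: weight each variance by its degrees of freedom.
    s_pooled = math.sqrt(
        ((n_e - 1) * var_e + (n_c - 1) * var_c) / (n_e + n_c - 2)
    )
    return (mean_e - mean_c) / s_pooled


# Hypothetical scores, for illustration only.
tested = [78, 85, 90, 72, 88]
untested = [70, 75, 80, 68, 77]
d = cohens_d(tested, untested)
```

A positive d means the experimental group scored higher on average than the control group.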

Effect size: Other formulae


                         d = t*((n1+n2/n1*n2)^0.5
                         d = 2r/(1-r²)^0.5
                         d = ((YE post - YE pre) -
                              (YC post - YC pre))/Spooled post
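
The first two conversions can be sketched as below. This reads the first formula as t times the square root of (n1 + n2)/(n1 * n2), the standard conversion for an independent-samples t statistic. The function names are illustrative, not from the presentation.

```python
import math


def d_from_t(t, n1, n2):
    """d from an independent-samples t statistic and the two group sizes."""
    return t * math.sqrt((n1 + n2) / (n1 * n2))


def d_from_r(r):
    """d from a correlation coefficient r (requires |r| < 1)."""
    return 2 * r / math.sqrt(1 - r ** 2)
```

For example, t = 2.0 with 50 people per group corresponds to d = 0.4, and r = 0.6 corresponds to d = 1.5.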



Effect size: Interpretation




     • d between 0.25 and 0.50: weak effect
     • d between 0.50 and 0.75: medium effect
     • d more than 0.75: strong effect
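
These bands can be written as a small lookup. The slide leaves the boundaries ambiguous and gives no label below 0.25, so assigning 0.50 and 0.75 to "medium" and returning a placeholder for smaller values are assumptions.

```python
def interpret_d(d):
    """Label an effect size using the bands listed on the slide."""
    if d > 0.75:
        return "strong"
    if d >= 0.50:
        # Assumption: both 0.50 and 0.75 fall in the medium band.
        return "medium"
    if d >= 0.25:
        return "weak"
    # The slide gives no label below 0.25; this string is a placeholder.
    return "below the weakest band"
```

Under these assumptions, interpret_d(0.55) returns "medium" and interpret_d(0.88) returns "strong".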


Quantitative studies
  (population coverage ≈ 7 million persons)




Quantitative studies: Effect size

•   “Bare bones” calculation:

                 d ≈ +0.55      …a medium effect

•   Bare bones effect size adjusted for measurement error

                 d ≈ +0.71      …a stronger effect

•   Using same-study-author aggregation

                 d ≈ +0.88      …a strong effect
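
The slide does not spell out these calculations. The sketch below makes two assumptions: that "bare bones" is meant in the Hunter and Schmidt sense (a sample-size-weighted mean with no artifact corrections), and that the measurement-error adjustment is the classical correction for attenuation. The study values and the reliability of 0.80 are hypothetical.

```python
import math


def bare_bones_mean_d(effects):
    """Sample-size-weighted mean of (d, n) pairs."""
    total_n = sum(n for _, n in effects)
    return sum(d * n for d, n in effects) / total_n


def correct_for_unreliability(d, reliability):
    """Disattenuate d for measurement error in the outcome measure."""
    return d / math.sqrt(reliability)


# Hypothetical (d, sample size) pairs; not taken from the study.
studies = [(0.40, 120), (0.65, 60), (0.55, 300)]
mean_d = bare_bones_mean_d(studies)
adjusted = correct_for_unreliability(mean_d, reliability=0.80)
```

Because reliability is at most 1, the adjusted value is never smaller in magnitude than the bare-bones value, which matches the direction of the change reported on the slide.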




Which predictors matter?


                                                              Mean Effect
 Treatment Group…                                                Size
 …is made aware of performance, and control group is not         +0.98

 …receives targeted instruction (e.g., remediation)              +0.96

 …is tested with higher stakes than control group                +0.87

 …is tested more frequently than control group                   +0.85



More Moderators – Source of Test


                                     Number of         Mean
                                      Studies        Effect Size
     Researcher or Teacher              87              0.93
     National                           24              0.87
     Commercial                         38              0.82
     State or District                  11              0.72
     Total                             160




More Moderators – Sponsor of Test


                                  Number of      Mean
                                   Studies     Effect Size
          International                5          1.02
          Local                       99          0.93
          National                    45          0.81
          State                       11          0.64
          Total                      160



More Moderators - Study Design

                                       Number of          Mean
                                        Studies         Effect Size
  Pre-post                                12               0.97
  Experiment, Quasi-experiment           107               0.94
  Multivariate                            26               0.80
  Experiment, posttest only                7               0.60
  Pre-post (with shadow test)              8               0.58
  Total                                  160


More Moderators – Scale of Analysis


                                       Number of      Mean
                                        Studies     Effect Size
        Aggregated                          9          1.60
        Small-scale                       118          0.91
        Large-scale                        33          0.57
        Total                             160




More Moderators – Scale of Administration


                                       Number of      Mean
                                        Studies     Effect Size
        Classroom                         115          0.95
        Mid-scale                           6          0.72
        Large-scale                        39          0.71
        Total                             160
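
As a consistency check on the five moderator tables, the sketch below computes the mean effect size weighted by number of studies for each one. The rows are copied from the slides. Weighting by study count is an assumption; the slides do not say how the category means were combined.

```python
# (number of studies, mean effect size) rows copied from the moderator slides.
moderators = {
    "Source of test": [(87, 0.93), (24, 0.87), (38, 0.82), (11, 0.72)],
    "Sponsor of test": [(5, 1.02), (99, 0.93), (45, 0.81), (11, 0.64)],
    "Study design": [(12, 0.97), (107, 0.94), (26, 0.80), (7, 0.60), (8, 0.58)],
    "Scale of analysis": [(9, 1.60), (118, 0.91), (33, 0.57)],
    "Scale of administration": [(115, 0.95), (6, 0.72), (39, 0.71)],
}


def weighted_mean(rows):
    """Mean effect size weighted by number of studies."""
    total = sum(n for n, _ in rows)
    return sum(n * d for n, d in rows) / total


overall = {name: weighted_mean(rows) for name, rows in moderators.items()}
for name, value in overall.items():
    print(f"{name}: {value:.2f}")
```

Each table covers 160 studies, and each weighted mean comes out between 0.88 and 0.89. That is consistent with the d ≈ +0.88 reported for the same-study-author aggregation, although the slides do not state that the figure was derived this way.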




Surveys and opinion polls




Percentage of survey items,
    by respondent group and type of survey

[Bar chart: percent of survey items (vertical axis, 0 to 50) for public opinion polls and program evaluation surveys*, with bars for education providers and education consumers]




Number and percent of survey items,
        by test stakes and target group


       Test stakes   Number   %            Target group   Number   %

       High             507   62           Students          393   46

       Medium           184   23           Schools           281   33

       Low               33    4           Teachers          116   14

       Unknown           89   11           No stakes          64   7

       TOTAL            813                TOTAL             854




Opinion polls, by year

              • 244 between 1958 and 2008, in the U.S. & Canada
              • 813 unique question-response combinations
              • close to 700,000 individual respondents
[Chart by year: vertical axis 0 to 120; horizontal axis Year, 1960 to 2005]



Surveys and opinion polls:
     Regular standardized tests, performance tests


                                             Regular tests   Performance tests
                                               (N ≈125)           (N ≈ 50)

    Respondent opinion                            d                  d
    Achievement is increased                      1.2               1.0

    …weighted by size of study population         1.9               0.5

    Instruction is improved                       1.0               1.4

    …weighted by size of study population         0.9               0.9

    Tests help align instruction                  1.0               1.0

    …weighted by size of study population         0.5               0.9




Qualitative studies: Summary

             (One cannot calculate an effect size.)




Qualitative studies, by methodology type


                                                      Number of
     Methodology                                       studies    %

     Case study                                          120      43

     Experiment or pre-post study                        21        7

     Interviews (individual or group)                    75       27

     Journal                                              2        1

     Review of official records, documents, reports      33       12

     Research review                                      8        3

     Survey                                              22        8

     TOTAL                                               281      100



Qualitative studies:
                          Effect on student achievement

        244 studies conducted in the past century in over 30 countries

                               Number of    Percent of    Percent excluding
    Direction of effect         studies      studies      inferred studies
    Positive                      204           84               93
    Positive inferred              24           10
    Mixed                           5            2                2
    No change                       8            3                4
    Negative                        3            1                1
    TOTAL                         244          100              100
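
The two percentage columns can be reproduced from the study counts. This is a sketch using the counts on the slide; it assumes the last column simply drops the 24 "Positive inferred" studies from the denominator.

```python
# Study counts copied from the slide.
counts = {
    "Positive": 204,
    "Positive inferred": 24,
    "Mixed": 5,
    "No change": 8,
    "Negative": 3,
}


def percentages(table):
    """Each count as a whole-number percent of the table total."""
    total = sum(table.values())
    return {label: round(100 * n / total) for label, n in table.items()}


all_studies = percentages(counts)
without_inferred = percentages(
    {label: n for label, n in counts.items() if label != "Positive inferred"}
)
```

With the inferred studies removed, the denominator falls from 244 to 220, which is why the positive share rises from 84 to 93 percent.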


Qualitative studies: Testing improves student achievement
                  and teacher instruction

                                      Number of
      Achievement is improved          studies           %

      Yes                                200             95

      Mixed results                       1              <1

      No                                 10              5

      TOTAL                              211            100



                                      Number of
      Instruction is improved          studies           %
      Yes                                158             96
      No                                  7              4
      TOTAL                              165            100



Qualitative studies:
                       Variation by rigor and test stakes
                                   Level of rigor
Direction of effect      high     medium     low      Total
Positive                  95        67        42       204
Positive inferred         10         8         6        24
Mixed                      3         1         1         5
No change                  4         3         1         8
Negative                   1         1         1         3
TOTAL                    113        80        51       244

                                       Stakes
Direction of effect      high     medium     low     unknown     Total
Positive                 133        27        38        6         204
Positive inferred         12         5         7                   24
Mixed                      4                   1                    5
No change                  2         1         5                    8
Negative                   3                                        3
TOTAL                    154        33        51        6         244
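
To compare across columns, it helps to convert the "Positive" row into a share of each column total. The sketch below uses the counts from the two tables and leaves "Positive inferred" out of the numerator, which is a choice made here, not one stated on the slide.

```python
# "Positive" row and "TOTAL" row from the rigor and stakes tables.
positive = {
    "rigor": {"high": 95, "medium": 67, "low": 42},
    "stakes": {"high": 133, "medium": 27, "low": 38, "unknown": 6},
}
totals = {
    "rigor": {"high": 113, "medium": 80, "low": 51},
    "stakes": {"high": 154, "medium": 33, "low": 51, "unknown": 6},
}


def share_positive(dimension):
    """Percent of studies at each level that reported a positive effect."""
    return {
        level: 100 * positive[dimension][level] / totals[dimension][level]
        for level in totals[dimension]
    }


by_rigor = share_positive("rigor")
by_stakes = share_positive("stakes")
```

By this measure the positive share is roughly 82 to 84 percent at every level of rigor, and ranges from about 75 percent for low-stakes tests to about 86 percent for high-stakes tests.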
Qualitative studies:
     Regular standardized tests and performance tests


                                               Regular tests   Performance tests
                                                 (N =176)           (N = 69)

    Study results                                   %                 %
    Generally positive                              93                95

    High-stakes tests                               71                42

    High level of study rigor                       46                48

    Student attitudes toward test positive          60                71

    Teacher attitudes toward test positive          55                80

    Student achievement improved                    95                95

    Instruction improved                            92                100

    Large-scale testing                             86                68



An enormous research literature


• But, assertions that it does not
  exist at all are common

    – Some claims are made by
      those who oppose standardized
      testing, and may be wishful
      thinking

    – Others are “firstness” claims


Dismissive research reviews



                        •    With a dismissive research
                             literature review, a
                             researcher assures all that
                             no other researcher has
                             studied the same topic



Firstness claims



  • With a firstness
    claim, a researcher
    insists that he or
    she is the first to
    ever study a topic




Social costs are
   enormous

  • Research conducted by those
    without power or celebrity is
    dismissed -- ignored and lost
  • Public policies are skewed, based
    exclusively on the research results of
    those with power or celebrity
  • Society pays again and again for
    research that has already been done


More Related Content

Similar to The effect of testing on student achievement: 1910-2010

2. Tools to calculate samplesize
2. Tools to calculate samplesize2. Tools to calculate samplesize
2. Tools to calculate samplesizeAzmi Mohd Tamil
 
BRENDER-Economic considerations in risk management-ID1485-IDRC2014_b
BRENDER-Economic considerations in risk management-ID1485-IDRC2014_bBRENDER-Economic considerations in risk management-ID1485-IDRC2014_b
BRENDER-Economic considerations in risk management-ID1485-IDRC2014_bGlobal Risk Forum GRFDavos
 
Ovretveit implementation science research course 1day sept 11
Ovretveit implementation science research course 1day sept 11Ovretveit implementation science research course 1day sept 11
Ovretveit implementation science research course 1day sept 11john
 
Interpretation of Human Abuse Potential Studies and Clinically Important Resp...
Interpretation of Human Abuse Potential Studies and Clinically Important Resp...Interpretation of Human Abuse Potential Studies and Clinically Important Resp...
Interpretation of Human Abuse Potential Studies and Clinically Important Resp...nlevy-cooperman
 
Epidemiology study design
Epidemiology study designEpidemiology study design
Epidemiology study designrobayade
 
Learning Organization in Department of Skills Development Malaysia
Learning Organization in Department of Skills Development MalaysiaLearning Organization in Department of Skills Development Malaysia
Learning Organization in Department of Skills Development MalaysiaGhalip Spahat
 
Koonal's Slides from the 2017 PROMs Conference
Koonal's Slides from the 2017 PROMs ConferenceKoonal's Slides from the 2017 PROMs Conference
Koonal's Slides from the 2017 PROMs ConferenceOffice of Health Economics
 
Innovative Sample Size Methods For Clinical Trials
Innovative Sample Size Methods For Clinical Trials Innovative Sample Size Methods For Clinical Trials
Innovative Sample Size Methods For Clinical Trials nQuery
 
Delphi in community assessment na
Delphi in community assessment naDelphi in community assessment na
Delphi in community assessment naHibsah Ridwan
 
Navigation Support for Learners in Informal Learning Environments, Recommende...
Navigation Support for Learners in Informal Learning Environments, Recommende...Navigation Support for Learners in Informal Learning Environments, Recommende...
Navigation Support for Learners in Informal Learning Environments, Recommende...Hendrik Drachsler
 
Desarrollo por GRADE de la Gúia de práctica de Encefalopatía hipóxico-isquemi...
Desarrollo por GRADE de la Gúia de práctica de Encefalopatía hipóxico-isquemi...Desarrollo por GRADE de la Gúia de práctica de Encefalopatía hipóxico-isquemi...
Desarrollo por GRADE de la Gúia de práctica de Encefalopatía hipóxico-isquemi...Javier González de Dios
 
B.S 4- Class 1-Introduction to analytical chemistry
B.S 4- Class 1-Introduction to analytical chemistryB.S 4- Class 1-Introduction to analytical chemistry
B.S 4- Class 1-Introduction to analytical chemistrySajjad Ullah
 
VINCE'S Project planning forms_0210-1
  VINCE'S Project planning forms_0210-1  VINCE'S Project planning forms_0210-1
VINCE'S Project planning forms_0210-1radvin
 
Paper review on micropollutants in European river
Paper review on micropollutants in European riverPaper review on micropollutants in European river
Paper review on micropollutants in European riverhicky1225
 
Adaptation of evidence-based clinical practice guidelines: the 'Adapted ADAPT...
Adaptation of evidence-based clinical practice guidelines: the 'Adapted ADAPT...Adaptation of evidence-based clinical practice guidelines: the 'Adapted ADAPT...
Adaptation of evidence-based clinical practice guidelines: the 'Adapted ADAPT...Yasser Sami Abdel Dayem Amer
 
Case study: Methodology Reviews
Case study: Methodology ReviewsCase study: Methodology Reviews
Case study: Methodology ReviewsCTSI at UCSF
 
Use of case pairs can potentially improve the efficiency and effectiveness of...
Use of case pairs can potentially improve the efficiency and effectiveness of...Use of case pairs can potentially improve the efficiency and effectiveness of...
Use of case pairs can potentially improve the efficiency and effectiveness of...Poh-Sun Goh
 

Similar to The effect of testing on student achievement: 1910-2010 (20)

2. Tools to calculate samplesize
2. Tools to calculate samplesize2. Tools to calculate samplesize
2. Tools to calculate samplesize
 
BRENDER-Economic considerations in risk management-ID1485-IDRC2014_b
BRENDER-Economic considerations in risk management-ID1485-IDRC2014_bBRENDER-Economic considerations in risk management-ID1485-IDRC2014_b
BRENDER-Economic considerations in risk management-ID1485-IDRC2014_b
 
05 Programme evaluation
05 Programme evaluation05 Programme evaluation
05 Programme evaluation
 
Ovretveit implementation science research course 1day sept 11
Ovretveit implementation science research course 1day sept 11Ovretveit implementation science research course 1day sept 11
Ovretveit implementation science research course 1day sept 11
 
Interpretation of Human Abuse Potential Studies and Clinically Important Resp...
Interpretation of Human Abuse Potential Studies and Clinically Important Resp...Interpretation of Human Abuse Potential Studies and Clinically Important Resp...
Interpretation of Human Abuse Potential Studies and Clinically Important Resp...
 
EDR8205-5
EDR8205-5EDR8205-5
EDR8205-5
 
Epidemiology study design
Epidemiology study designEpidemiology study design
Epidemiology study design
 
Learning Organization in Department of Skills Development Malaysia
Learning Organization in Department of Skills Development MalaysiaLearning Organization in Department of Skills Development Malaysia
Learning Organization in Department of Skills Development Malaysia
 
Koonal's Slides from the 2017 PROMs Conference
Koonal's Slides from the 2017 PROMs ConferenceKoonal's Slides from the 2017 PROMs Conference
Koonal's Slides from the 2017 PROMs Conference
 
Innovative Sample Size Methods For Clinical Trials
Innovative Sample Size Methods For Clinical Trials Innovative Sample Size Methods For Clinical Trials
Innovative Sample Size Methods For Clinical Trials
 
Rationalize research
Rationalize researchRationalize research
Rationalize research
 
Delphi in community assessment na
Delphi in community assessment naDelphi in community assessment na
Delphi in community assessment na
 
Navigation Support for Learners in Informal Learning Environments, Recommende...
Navigation Support for Learners in Informal Learning Environments, Recommende...Navigation Support for Learners in Informal Learning Environments, Recommende...
Navigation Support for Learners in Informal Learning Environments, Recommende...
 
Desarrollo por GRADE de la Gúia de práctica de Encefalopatía hipóxico-isquemi...
Desarrollo por GRADE de la Gúia de práctica de Encefalopatía hipóxico-isquemi...Desarrollo por GRADE de la Gúia de práctica de Encefalopatía hipóxico-isquemi...
Desarrollo por GRADE de la Gúia de práctica de Encefalopatía hipóxico-isquemi...
 
B.S 4- Class 1-Introduction to analytical chemistry
B.S 4- Class 1-Introduction to analytical chemistryB.S 4- Class 1-Introduction to analytical chemistry
B.S 4- Class 1-Introduction to analytical chemistry
 
VINCE'S Project planning forms_0210-1
  VINCE'S Project planning forms_0210-1  VINCE'S Project planning forms_0210-1
VINCE'S Project planning forms_0210-1
 
Paper review on micropollutants in European river
Paper review on micropollutants in European riverPaper review on micropollutants in European river
Paper review on micropollutants in European river
 
Adaptation of evidence-based clinical practice guidelines: the 'Adapted ADAPT...
Adaptation of evidence-based clinical practice guidelines: the 'Adapted ADAPT...Adaptation of evidence-based clinical practice guidelines: the 'Adapted ADAPT...
Adaptation of evidence-based clinical practice guidelines: the 'Adapted ADAPT...
 
Case study: Methodology Reviews
Case study: Methodology ReviewsCase study: Methodology Reviews
Case study: Methodology Reviews
 
Use of case pairs can potentially improve the efficiency and effectiveness of...
Use of case pairs can potentially improve the efficiency and effectiveness of...Use of case pairs can potentially improve the efficiency and effectiveness of...
Use of case pairs can potentially improve the efficiency and effectiveness of...
 

More from Richard P Phelps

Dismissive Reviews, Citation Cartels, and the Replication Crisis.pptx
Dismissive Reviews, Citation Cartels, and the Replication Crisis.pptxDismissive Reviews, Citation Cartels, and the Replication Crisis.pptx
Dismissive Reviews, Citation Cartels, and the Replication Crisis.pptxRichard P Phelps
 
The Successful Degradation of Evidence on Educational Testing in the United S...
The Successful Degradation of Evidence on Educational Testing in the United S...The Successful Degradation of Evidence on Educational Testing in the United S...
The Successful Degradation of Evidence on Educational Testing in the United S...Richard P Phelps
 
Comparing achievement and aptitude tests for university admission
Comparing achievement and aptitude tests for university admissionComparing achievement and aptitude tests for university admission
Comparing achievement and aptitude tests for university admissionRichard P Phelps
 
Boarding School: Benefits and Drawbacks
Boarding School: Benefits and DrawbacksBoarding School: Benefits and Drawbacks
Boarding School: Benefits and DrawbacksRichard P Phelps
 
It's a myth: High stakes cause test score inflation
It's a myth: High stakes cause test score inflation It's a myth: High stakes cause test score inflation
It's a myth: High stakes cause test score inflation Richard P Phelps
 
It's a myth: High stakes cause test score inflation
It's a myth: High stakes cause test score inflationIt's a myth: High stakes cause test score inflation
It's a myth: High stakes cause test score inflationRichard P Phelps
 
Designing an Assessment System
Designing an Assessment SystemDesigning an Assessment System
Designing an Assessment SystemRichard P Phelps
 
Innovaciones en la evaluación en el aula: El uso de pruebas para promover el ...
Innovaciones en la evaluación en el aula: El uso de pruebas para promover el ...Innovaciones en la evaluación en el aula: El uso de pruebas para promover el ...
Innovaciones en la evaluación en el aula: El uso de pruebas para promover el ...Richard P Phelps
 
Fortalezas y debilidades de las pruebas estandarizadas como mecanismos inclus...
Fortalezas y debilidades de las pruebas estandarizadas como mecanismos inclus...Fortalezas y debilidades de las pruebas estandarizadas como mecanismos inclus...
Fortalezas y debilidades de las pruebas estandarizadas como mecanismos inclus...Richard P Phelps
 
Arkansas common core presentation
Arkansas common core presentationArkansas common core presentation
Arkansas common core presentationRichard P Phelps
 
University Admission Testing in Chile: The PSU
University Admission Testing in Chile: The PSUUniversity Admission Testing in Chile: The PSU
University Admission Testing in Chile: The PSURichard P Phelps
 
Forty years of polls on standardized tests in education
Forty years of polls on standardized tests in educationForty years of polls on standardized tests in education
Forty years of polls on standardized tests in educationRichard P Phelps
 
Economic perspectives on testing
Economic perspectives on testingEconomic perspectives on testing
Economic perspectives on testingRichard P Phelps
 
L'effet de tests standardisés sur les résultats scolaires des élèves : 1910-...
L'effet de tests standardisés sur les résultats scolaires des élèves :  1910-...L'effet de tests standardisés sur les résultats scolaires des élèves :  1910-...
L'effet de tests standardisés sur les résultats scolaires des élèves : 1910-...Richard P Phelps
 
L'effet de tests standardisés sur les résultats scolaires des élèves : Méta-a...
L'effet de tests standardisés sur les résultats scolaires des élèves : Méta-a...L'effet de tests standardisés sur les résultats scolaires des élèves : Méta-a...
L'effet de tests standardisés sur les résultats scolaires des élèves : Méta-a...Richard P Phelps
 
Worse Than Plagiarism: Dismissive Reviews
Worse Than Plagiarism: Dismissive ReviewsWorse Than Plagiarism: Dismissive Reviews
Worse Than Plagiarism: Dismissive ReviewsRichard P Phelps
 

The effect of testing on student achievement: 1910-2010

  • 1. The effect of testing on student achievement: 1910-2010. Richard P. PHELPS. International Test Commission, 8th Conference, Amsterdam. © 2012, Richard P. PHELPS
  • 2. Meta-analysis
      • A method for summarizing a large research literature, with a single, comparable measure.
  • 3. The effect of testing on student achievement
      • 12-year-long study
      • analyzed close to 700 separate studies, and more than 1,600 separate effects
      • 2,000 other studies were reviewed and found incomplete or inappropriate
      • lacking sufficient time and money, hundreds of other studies will not be reviewed
  • 4. Looking for studies to include in the meta-analyses
      1. Included only those studies that found an effect from testing on student achievement or on teacher instruction…
  • 5. Studies included in the meta-analyses
      2. …when:
         • a test is newly introduced, or newly removed
         • quantity of testing is increased or reduced
         • test stakes are introduced or increased, or removed or reduced
  • 6. Studies included in the meta-analyses
      3. …plus previous research summaries (e.g.)
         • Kulik, Kulik, Bangert-Drowns, & Schwalb (1983-1991) on:
            – mastery testing,
            – frequency of testing, and
            – programs for high-risk university students
         • Basol & Johanson (2009) on testing frequency
         • Jaekyung Lee (2007) on cross-state studies
         • W.J. Haynie (2007) in career-tech ed
  • 7. Number of studies of effects, by methodology type

      Methodology type                                 Number of studies   Number of effects
      Quantitative                                                   177                 640
      Surveys and public opinion polls (US & Canada)                 247                 813
      Qualitative                                                    245                 245
      TOTAL                                                          669                1698
  • 8. Effect size: Cohen’s d
      $d = (Y_E - Y_C) / S_{pooled}$
      $Y_E$ = mean, experimental group
      $Y_C$ = mean, control group
      $S_{pooled}$ = pooled standard deviation
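The formula on slide 8 can be sketched in a few lines of Python. This is a minimal illustration, not the author's procedure: the slide does not say how the standard deviation is pooled, so the degrees-of-freedom weighting below is an assumption, and the function name `cohens_d` and the score lists are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(experimental, control):
    """Cohen's d: difference in group means divided by the pooled SD."""
    n_e, n_c = len(experimental), len(control)
    # Assumed pooling rule (not stated on the slide): sample variances
    # weighted by their degrees of freedom.
    pooled_var = (
        (n_e - 1) * variance(experimental) + (n_c - 1) * variance(control)
    ) / (n_e + n_c - 2)
    return (mean(experimental) - mean(control)) / math.sqrt(pooled_var)


# Hypothetical scores, for illustration only.
tested = [78, 82, 85, 90, 75]
untested = [70, 74, 80, 72, 69]
d = cohens_d(tested, untested)
print(f"d = {d:.2f}")
```

A positive value means the experimental group scored higher than the control group, in units of the pooled standard deviation.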
  • 9. Effect size: Other formulae
      $d = t \, ((n_1 + n_2) / (n_1 n_2))^{0.5}$
      $d = 2r / (1 - r^2)^{0.5}$
      $d = ((Y_{E,post} - Y_{E,pre}) - (Y_{C,post} - Y_{C,pre})) / S_{pooled,post}$
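The first two conversions on slide 9 can be sketched as below. This assumes the first formula is read as t multiplied by the square root of (n1 + n2) / (n1 · n2), the usual conversion from an independent-samples t statistic; the function names are hypothetical.

```python
import math


def d_from_t(t, n1, n2):
    """Convert an independent-samples t statistic to d, given group sizes."""
    return t * math.sqrt((n1 + n2) / (n1 * n2))


def d_from_r(r):
    """Convert a correlation coefficient r (with |r| < 1) to d."""
    return 2 * r / math.sqrt(1 - r ** 2)


# Hypothetical inputs, for illustration only.
print(round(d_from_t(2.0, 50, 50), 3))
print(round(d_from_r(0.6), 3))
```

Conversions like these let studies that report only a t statistic or a correlation be placed on the same scale as studies that report group means.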
  • 10. Effect size: Interpretation
      • d between 0.25 and 0.50 → weak effect
      • d between 0.50 and 0.75 → medium effect
      • d more than 0.75 → strong effect
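The thresholds on slide 10 can be expressed as a small lookup. This is a sketch under stated assumptions: the slide does not say which label applies at exactly 0.50, so the boundaries below (0.50 counted as medium, 0.75 counted as medium) are a guess, and values under 0.25 are left unlabeled because the slide gives them no name. The function name `interpret_d` is hypothetical.

```python
def interpret_d(d):
    """Label an effect size using the thresholds from slide 10."""
    size = abs(d)
    if size > 0.75:
        return "strong"
    if size >= 0.50:  # assumed: the shared 0.50 boundary counts as medium
        return "medium"
    if size >= 0.25:
        return "weak"
    return None  # the slide assigns no label below 0.25


# The three quantitative estimates reported on slide 12.
for estimate in (0.55, 0.71, 0.88):
    print(estimate, interpret_d(estimate))
```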
  • 11. Quantitative studies (population coverage ≈ 7 million persons)
  • 12. Quantitative studies: Effect size
      • “Bare bones” calculation: d ≈ +0.55 …a medium effect
      • Bare bones effect size adjusted for measurement error: d ≈ +0.71 …a stronger effect
      • Using same-study-author aggregation: d ≈ +0.88 …a strong effect
  • 13. Which predictors matter?

      Treatment group…                                            Mean effect size
      …is made aware of performance, and control group is not                +0.98
      …receives targeted instruction (e.g., remediation)                     +0.96
      …is tested with higher stakes than control group                       +0.87
      …is tested more frequently than control group                          +0.85
  • 14. More Moderators – Source of Test

      Source of test          Number of studies   Mean effect size
      Researcher or Teacher                  87               0.93
      National                               24               0.87
      Commercial                             38               0.82
      State or District                      11               0.72
      Total                                 160
  • 15. More Moderators – Sponsor of Test

      Sponsor of test   Number of studies   Mean effect size
      International                     5               1.02
      Local                            99               0.93
      National                         45               0.81
      State                            11               0.64
      Total                           160
  • 16. More Moderators – Study Design

      Study design                   Number of studies   Mean effect size
      Pre-post                                      12               0.97
      Experiment, Quasi-experiment                 107               0.94
      Multivariate                                  26               0.80
      Experiment, posttest only                      7               0.60
      Pre-post (with shadow test)                    8               0.58
      Total                                        160
  • 17. More Moderators – Scale of Analysis

      Scale of analysis   Number of studies   Mean effect size
      Aggregated                          9               1.60
      Small-scale                       118               0.91
      Large-scale                        33               0.57
      Total                             160
  • 18. More Moderators – Scale of Administration

      Scale of administration   Number of studies   Mean effect size
      Classroom                               115               0.95
      Mid-scale                                 6               0.72
      Large-scale                              39               0.71
      Total                                   160
  • 19. Surveys and opinion polls
  • 20. Percentage of survey items, by respondent group and type of survey
      [Bar chart: percent of survey items (y-axis, 0 to 50) for education providers and education consumers, by type of survey (public opinion polls, program evaluation surveys*)]
  • 21. Number and percent of survey items, by test stakes and target group

      Test stakes    Number    %
      High              507   62
      Medium            184   23
      Low                33    4
      Unknown            89   11
      TOTAL             813

      Target group   Number    %
      Students          393   46
      Schools           281   33
      Teachers          116   14
      No stakes          64    7
      TOTAL             854
  • 22. Opinion polls, by year
      • 244 between 1958 and 2008, in the U.S. & Canada
      • 813 unique question-response combinations
      • close to 700,000 individual respondents
      [Chart: y-axis 0 to 120; x-axis Year, 1960 to 2005]
  • 23. Surveys and opinion polls: Regular standardized tests, performance tests

      Respondent opinion                        Regular tests (N ≈ 125)   Performance tests (N ≈ 50)
                                                                      d                            d
      Achievement is increased                                      1.2                          1.0
      …weighted by size of study population                         1.9                          0.5
      Instruction is improved                                       1.0                          1.4
      …weighted by size of study population                         0.9                          0.9
      Tests help align instruction                                  1.0                          1.0
      …weighted by size of study population                         0.5                          0.9
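Slide 23 reports each effect size twice, once unweighted and once weighted by the size of the study population. The slides do not give the weighting formula, so the sketch below assumes a plain population-weighted mean; the function name and the input numbers are hypothetical and are not taken from the study.

```python
def weighted_mean_effect(effects, populations):
    """Mean of effect sizes, each weighted by its study population."""
    if len(effects) != len(populations):
        raise ValueError("effects and populations must have the same length")
    total = sum(populations)
    if total <= 0:
        raise ValueError("total population must be positive")
    return sum(d * n for d, n in zip(effects, populations)) / total


# Hypothetical surveys: (effect size, number of respondents).
effects = [1.5, 0.8, 1.2]
populations = [200, 5000, 800]

unweighted = sum(effects) / len(effects)
weighted = weighted_mean_effect(effects, populations)
print(f"unweighted = {unweighted:.2f}, weighted = {weighted:.2f}")
```

Under this assumption, a single large survey pulls the weighted figure toward its own result, which is one way the weighted and unweighted rows of a table like this can differ.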
  • 24. Qualitative studies: Summary (One cannot calculate an effect size.)
  • 25. Qualitative studies, by methodology type

      Methodology                                       Number of studies     %
      Case study                                                      120    43
      Experiment or pre-post study                                     21     7
      Interviews (individual or group)                                 75    27
      Journal                                                           2     1
      Review of official records, documents, reports                   33    12
      Research review                                                   8     3
      Survey                                                           22     8
      TOTAL                                                           281   100
  • 26. Qualitative studies: Effect on student achievement
      244 studies conducted in the past century in over 30 countries

      Direction of effect   Number of studies   Percent of studies   Percent without the inferred
      Positive                            204                   84                             93
      Positive inferred                    24                   10
      Mixed                                 5                    2                              2
      No change                             8                    3                              4
      Negative                              3                    1                              1
      TOTAL                               244                  100                            100
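The two percentage columns on slide 26 can be recomputed from the study counts. The sketch below assumes that "percent without the inferred" means dropping the 24 "Positive inferred" studies from both the numerator and the denominator, and that percentages are rounded to whole numbers; the helper name `percent_shares` is hypothetical.

```python
# Study counts as reported on slide 26.
counts = {
    "Positive": 204,
    "Positive inferred": 24,
    "Mixed": 5,
    "No change": 8,
    "Negative": 3,
}


def percent_shares(counts):
    """Each category's share of the total, rounded to a whole percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


all_studies = percent_shares(counts)
# Assumed reading: exclude the inferred studies entirely, then recompute.
without_inferred = percent_shares(
    {name: n for name, n in counts.items() if name != "Positive inferred"}
)

print(all_studies)
print(without_inferred)
```

With the inferred studies excluded, the denominator falls from 244 to 220, which is why the positive share rises even though the positive count is unchanged.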
  • 27. Qualitative studies: Testing improves student achievement and teacher instruction

      Achievement is improved   Number of studies     %
      Yes                                     200    95
      Mixed results                             1    <1
      No                                       10     5
      TOTAL                                   211   100

      Instruction is improved   Number of studies     %
      Yes                                     158    96
      No                                        7     4
      TOTAL                                   165   100
  • 28. Qualitative studies: Variation by rigor and test stakes

      Level of rigor
      Direction of effect   high   medium   low   Total
      Positive                95       67    42     204
      Positive inferred       10        8     6      24
      Mixed                    3        1     1       5
      No change                4        3     1       8
      Negative                 1        1     1       3
      TOTAL                  113       80    51     244

      Stakes
      Direction of effect   high   medium   low   unknown   Total
      Positive               133       27    38         6     204
      Positive inferred       12        5     7                24
      Mixed                    4              1                 5
      No change                2        1     5                 8
      Negative                 3                                3
      TOTAL                  154       33    51         6     244
  • 29. Qualitative studies: Regular standardized tests and performance tests

      Study results                            Regular tests (N = 176)   Performance tests (N = 69)
                                                                     %                            %
      Generally positive                                            93                           95
      High-stakes tests                                             71                           42
      High level of study rigor                                     46                           48
      Student attitudes toward test positive                        60                           71
      Teacher attitudes toward test positive                        55                           80
      Student achievement improved                                  95                           95
      Instruction improved                                          92                          100
      Large-scale testing                                           86                           68
  • 30. An enormous research literature
      • But, assertions that it does not exist at all are common
         – Some claims are made by those who oppose standardized testing, and may be wishful thinking
         – Others are “firstness” claims
  • 31. Dismissive research reviews
      • With a dismissive research literature review, a researcher assures all that no other researcher has studied the same topic
  • 32. Firstness claims
      • With a firstness claim, a researcher insists that he or she is the first to ever study a topic
  • 33. Social costs are enormous
      • Research conducted by those without power or celebrity is dismissed, ignored, and lost
      • Public policies are skewed, based exclusively on the research results of those with power or celebrity
      • Society pays again and again for research that has already been done
  • 34. The effect of testing on student achievement: 1910-2010. Richard P. PHELPS