An 8-Year Review and Lessons Learned from Federal Education Evaluations: 2002-2009 by M Cahalan and S Stout



This AEA presentation gives results from a systematic review of all published reports from 2002 to 2009





  • An 8-Year Review and Lessons Learned from Federal Education Evaluations: 2002-2009. Prepared by: Margaret Cahalan and Sharon Stout. Prepared for: AEA Meeting, November 13, 2009. Draft, not for quotation or citation. All views and data presented are solely those of the authors and do not reflect any review or authorization from the US Department of Education.
  • A Review of IES/NCEE & OPEPD/PPSS Research Reports
    • Part 1: Overview. All reports listed on web sites and published between 2002 and 2009; looked at level, methods, topics, findings
    • Part 2: Case study example of difficulties in using evaluation research for PART, budget, and program reform: college access and persistence
    • Part 3: Lessons learned: new directions and models for evaluation
  • Part 1: Overview of Studies of Two ED Offices
    • 2002-2009: period of transition between offices
      • OPEPD/PPSS: Office of Planning, Evaluation and Policy Development / Policy and Program Studies Service (known as PPSS in this presentation)
        • Earlier office responsible for most evaluations prior to 2002
        • Reports to the OPEPD Assistant Secretary
        • Informal agreement that it would do primarily implementation evaluation studies after 2002
      • IES/NCEE: Institute of Education Sciences / National Center for Education Evaluation (known as IES in this presentation)
        • Created in a reorganization in 2002; formerly OERI
        • Independent, with Commissioner confirmed by Congress
        • Legislative mandate for evaluations: scientific approach, medical model, experimental design, random assignment
  • Published Reports from IES/NCEE and OPEPD/PPSS: 2002-2009 (117 evaluation reports)
    • IES/NCEE: primarily experimental design; 27 reports (2003 to 2009); 15 studies (study with multiple reports counted once)
    • OPEPD/PPSS: primarily implementation studies; 90 reports (2002-2009); 50 studies (study with multiple reports counted once)
  • IES/NCEE Studies: 2003-2009. 27 reports on 15 studies (largely practice based)
    • Teacher Induction
    • Supplemental Reading Comprehension Interventions
    • DC Opportunity Scholarship Program
    • Student Mentoring Program
    • Reading and Mathematics Software Products
    • Teachers Trained Through Different Routes to Certification
    • Professional Development Strategies to Improve Early Reading
    • Reading First Impact Study Comprehension
    • Classroom Literacy Interventions and Outcomes in Even Start (Started PPSS)
    • Enhanced Reading Opportunities Study
    • Four Early Elementary School Math Curricula
    • Enhanced Academic Instruction in After-School Programs (48 more hours)
    • Striving Readers Study Closing the Reading Gap
    • Early Reading First
    • National Evaluation of 21st Century Community Learning Centers (started by PPSS)
  • PPSS: 50 studies (largely program based)
    • Career and Technical Education (CTE)-3
    • Federal college access and retention programs TRIO and GEAR UP—12
    • Title 1 (general, SES, Choice, funding, accountability, AYP & improvement)—9
    • Single sex schools –1
    • Private schools under Title 1-1
    • Literacy through Libraries Program-1
    • Reading First-2
    • Teacher Quality-4
    • Comprehensive School Reform—2
    • Class Size Reduction-1
    • Voluntary school choice-1
    • Small Learning Communities-1
    • Faith Based-1
    • Technology & Data System Use—5
    • Charter Schools-2
    • Sex Misconduct-1
    • Magnet Program-1
    • Migrant Program—Title 1
    • Even Start Program-1
  • Distribution of Studies By School Level
    • IES/NCEE 15 Studies
    • OPEPD/PPSS 50 Studies
  • Topic Summary: IES/NCEE
  • Topics for OPEPD/PPSS Elementary-Secondary Studies
  • Topics/Programs of Studies with Reports Published in Secondary-Postsecondary by PPSS: 2002-2009
  • Major Method IES/NCEE: 15 Studies
  • OPEPD/PPSS Major Methods: 50 Studies
  • Summary of Effects IES/NCEE: 15 Studies
  • Effects Summary 23 PPSS Studies that report comparative effects or outcomes
  • 15 IES/NCEE Studies: Topics and Effects Summary
    • Reading, 6 studies: one 9th grade study positive; 3 small positive on decoding, not comprehension; 2 not detected; findings that practices were different but not outcomes; the one study with positive effects (the 9th grade study) involved more time on reading
    • DC Voucher, 1 study: first two years no effects; third year began to see some effects for reading, not for math; 26 percent did not take the opportunity
    • Mentoring, 1 study: no overall positive effects; some positive effects for some subgroups; 60 percent of controls also had mentoring
    • Reading and Math Software, 1 study: positive effects not detected overall
    • Teacher (Reading PD, Induction, Alternative Certification), 3 studies: effects not detected overall on student outcomes; some on teacher practice for reading
    • 4 Math Curricula, 1 study: 2 curricula had positive effects (curricula compared with each other)
    • Afterschool, 2 studies: first study found no effects on academics or behavior; second study found positive effects on math, not reading, with 48 hours more instruction
  • Observations— 15 IES/NCEE Studies
    • Topics
      • Lack of direct link to a federal program, but sought more understanding of the intervention; most funded by program funds (Title 1 and Title 2 of ESEA/NCLB)
    • Effects
      • Few overall effects, especially on students when the counterfactual was existing practice
      • Observed some effects on teacher practices associated with the intervention, which did not translate into changes in students' test scores
      • Some effects seen when time spent on subject increased
      • Effects sometimes emerge over time when the intervention is sustained
    • Issues
      • Counterfactual: why would we not assume that controls also use science-based practices?
      • Contamination of control group; spillover issues
      • Intent to treat; dropouts
      • Issues implementing the intervention
      • Lack of external validity across time and space: do the same circumstances ever happen again? Moving targets for curricula and technology
  • K-12 PPSS Studies: Topics and Effects Summary
    • Title 1 (general and funding): Proficiency increased; however, most states will not meet goals by 2013-14; over 1100 schools in need of improvement by 2006, and most were in advanced stages of corrective action; did not show changes in time on subjects; poorest schools get less per capita; federal funds more targeted than state and local
    • Teacher: Over 90 percent of teachers met highly qualified requirements; however, those in high-poverty schools have less experience, are less likely to have a degree in the subject they teach, and have less professional development than those in lower-poverty schools
    • Supplemental Educational Services (SES): Increase in number of schools eligible; however, participation rate is stable, with about 17 percent taking advantage; those who participated were more likely to show growth than eligible non-participants in a sub-study of districts (selection effects issues); waiver studies had mixed results: a higher proportion of eligible students participated when the district provided the service; students did better in two large districts and did not do better in two small districts
    • Choice/Voluntary School Choice: About 1 percent of students participating; mixed results: some studies show no difference in outcomes when compared with those not participating; one found that students did better. Most districts kept choice within the district, and most students did not go to high-performing schools when they transferred out of low-performing schools
    • Reading First: More time on reading and adopted practices; limited evidence that 3rd and 4th graders increased reading more than those not in programs; regression discontinuity
  • K-12 PPSS Studies (continued): Topics and Effects Summary
    • Charter: Facilities study found loans were given to schools that served a needier population than other charter schools in their areas; most would not have qualified for private loans (too high risk). Descriptive study on characteristics focused on accountability
    • Comprehensive School Reform: No effects for those who got grants; evidence that those who implemented practices, and did so over a longer term, had an increase in growth, especially in math
    • Small Learning Communities and class size: Primarily implementation study; performance reports before and after for schools found increase in and decrease in violence incidence; 9th grade indicators and dropout rate; plans for college; no change in test scores
    • Even Start: Random assignment; no difference detected on major child or parent outcomes; counterfactual issues
    • Technology: Meta-analysis found significant effects for online courses; small number of studies; extensive reports on status of technology use and access in schools and classrooms. Documents growth in use of computers and the internet and also less use in poorer schools. Use generally not related to test score differences
    • Teaching History and Libraries: History professional development: most who took it were experienced teachers; increased knowledge. Library grants increased materials and were associated with higher than expected growth in test scores
  • Observations— 50 PPSS Studies
    • Topics:
      • Studies had a more direct link to a program
      • Attempts to get at the overall impact of a program became black box studies
      • Studies initiated after the reorganization were implementation studies that often combined descriptive work with comparative observation of outcomes of policy or services linked to NCLB requirements or services; projections
      • Intervention studied was more likely to be a supplemental service rather than a replacement of curriculum
    • Effects
      • More positive effects observed; studies used propensity matching and matched comparison, larger n's, and aggregated data or administrative student records
      • Larger effects seen when time spent on subject or in the program increased
    • Issues
      • Selection effects (voluntary participation; interaction of service and motivation)
      • Difficulty in finding adequate comparison groups
      • Contamination of control group; spillover issues
      • Time scale: study usually starts when the program starts; issues implementing the intervention
      • More external validity; less rigorous designs for internal validity
  • Part 2: Example of Issues Faced in Using Evaluations for PART and Budget. PPSS Postsecondary College Access and Retention Studies: Case History
  • Rough Estimates of Percent of Eligible Population Served
  • PPSS College Access and Postsecondary Studies: Topics, Effects Summary, and Policy History
    • Student Support Services (SSS): Propensity matching found positive impact on college retention and completion for participation in SSS supported services, whether funded by the SSS project or some other source. Policy history: Moderately effective, 2003
    • Talent Search: Propensity matching found positive impacts in 3 states, using longitudinal databases, on high school completion, financial aid, and postsecondary entrance. Policy history: Moderately effective, 2005
    • GEAR UP: Matched comparison of 7th graders in the original cohort, re-surveyed in 8th grade, found increases in parent knowledge and aspirations and in student knowledge. Did not find impact on academic grades or course taking. Found impact on course taking for some sub-groups. Policy history: Adequate, 2003
    • Upward Bound Math/Science: Propensity matching found positive impacts on college entrance, 4-year degree completion, planning to major in math or science, and majoring in social science; majoring in math or natural science not statistically significant, although in a positive direction (small n's)
  • Upward Bound: Ineffective (PART, 2002)
    • Gold standard random assignment study, with inverse-of-probability weights for national estimates, begun in early 1990s and followed until 2004
    • Several Mathematica reports over 10 years (1999-2009) reported no overall positive findings but large effects for students more academically at risk and with lower expectations at baseline.
    • On the basis of the study, given an “ineffective” rating in PART in 2002
    • Zero funding in Bush Budget FY2005 & FY2006
    • OMB efforts to reform based on study results: UB Initiative and Absolute Priority (1/3 at risk)
    • New IES study to evaluate started in 2006; large opposition by UB grantees; Congress cancelled in 2007
    • HEOA 2008 contains prohibition of certain RA (random assignment) studies, where projects are required to recruit more than they normally do for purposes of the study; HEOA also cancelled Absolute Priority
    • Obama budgets level-funded programs in FY10 despite establishing a large college access and completion innovation fund
    • Recent re-analyses by the ED staff person who was COR for the study found these estimates were confounded by bias in favor of the control group, caused by one project with extreme weight (26 percent of the total) that had a large imbalance between treatment and control groups. Control group members were more likely to have high expectations, less likely to be academically at risk, and more likely to be older at start of study.
    • A re-analysis found statistically significant and substantial impacts for UB on the key outcomes: postsecondary entrance, award of financial aid, and postsecondary degree completion including
    • Re-analysis found a Treatment on the Treated (TOT) 50 percent increase in BA attainment for the average of 66 of the 67 projects in the sample.
  • National Evaluation of Upward Bound: 1992-2003-04
    • Controversial study: random assignment combined with a national probability sample (very rare)
    • Mathematica reported findings of lack of overall impact but seemingly large impact for higher academic risk students and those with lower expectations
    • Ineffective PART rating; OMB attempt to reform program with UB Initiative and Absolute Priority to serve more at-risk students; fit with priority of Bush administration
    • Bush Budget called for zero funding of all pre-college programs in FY05 and FY06
  • UB Study Strong Interaction with Policy History
    • Published reports: reported few average overall effects, but large effects for students at risk academically and with lower educational expectations
    • Newly devised OMB PART rating of “ineffective”; UB Initiative to serve at-risk students
    • Budget: zero funding of federal pre-college programs in FY05 and FY06; dropped in FY07 and FY08
  • Policy History (cont)
    • UB 2006 Absolute Priority to serve 1/3 at-risk and 9th grade
    • New IES random assignment study to evaluate begun 2006
    • Following strong objections from UB projects and the Council for Opportunity in Education (COE), Congress prohibited funding in 2007, and the study was cancelled by ED in 2008
  • HEOA:2008 & TRIO FY10 Budget
      • HEOA:2008
        • Affirms goal of evaluation to identify: What are best practices to achieve goals of program?
        • Contains prohibition on certain types of random assignment studies that would require projects to recruit more students than they would normally serve, only for the purposes of the studies; denial of service issues; work with IRBs
        • Cancelled At-Risk Absolute Priority
        • HEOA specifies that a new rigorous UB study should begin by 2010
      • Obama Budget for FY10: level-funded TRIO and GEAR UP while putting large amounts of money into new college access and completion innovation funds to attain priorities (5 million new degrees)
  • ED PPSS Staff QA Re-examination of Sample Design and Re-Analyses
    • Found that several serious bias issues combined to confound the results published by Mathematica in 2004 and 2009 concerning the UB program
    • Re-examination found study impact estimates suffer from: 1) uncontrolled bias in favor of the control group; 2) unequal weighting, with one project carrying 26 percent of the weight; and 3) serious representational issues with regard to 4-year grantees
    • When these issues are addressed using standard statistical methods, the study shows strong positive impacts for the UB program
  • Basic Assumptions of Random Assignment Studies
    • Sample is representative of the population to which we wish to generalize
    • Treatment and control group are equivalent
    • Treatment and control group treated equally except for the treatment
    • Treatment and control group are mutually exclusive with regard to the treatment
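The equivalence assumption above is usually checked by comparing baseline characteristics across the two groups. A minimal sketch of one common check, the standardized mean difference, follows. The data are hypothetical (a made-up indicator for "expects a BA or higher at baseline") and are not taken from the UB study; the function name is an illustration only.

```python
from statistics import mean, variance


def standardized_mean_difference(treatment, control):
    """Difference in group means divided by the pooled standard deviation.

    Values near zero suggest the groups are balanced on this covariate.
    A common rule of thumb treats absolute values above roughly 0.1 to
    0.25 as a sign of imbalance (the cutoff is a convention, not a law).
    """
    pooled_var = (variance(treatment) + variance(control)) / 2
    if pooled_var == 0:
        raise ValueError("covariate has no variation in either group")
    return (mean(treatment) - mean(control)) / pooled_var ** 0.5


# Hypothetical baseline indicator: 1 = student expects a BA or higher.
treatment_group = [1, 0, 1, 1, 0, 1, 0, 1]
control_group = [1, 1, 1, 1, 0, 1, 1, 1]

smd = standardized_mean_difference(treatment_group, control_group)
print(f"standardized mean difference: {smd:.3f}")
```

A negative value here means the control group starts out with higher expectations than the treatment group, which is the direction of imbalance the re-analysis describes.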
  • Extreme unequal weighting and serious representation issues
    • Project with 26 percent of weight (known as 69) was the sole representative of the 4-year public stratum, but was a former 2-year school with largely less-than-2-year programs
    • Project partnered with job training program
    • Inadequate representation of 4-year grantees
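To see why one project with 26 percent of the weight matters, it can help to compute how much information the weighted sample effectively carries. The sketch below uses Kish's approximation for effective sample size. The weights are hypothetical: it assumes the other 66 projects carry equal weight, which is a simplification and not the study's actual weighting.

```python
def weight_share(weights, index):
    """Fraction of the total weight carried by one unit."""
    return weights[index] / sum(weights)


def kish_effective_n(weights):
    """Kish's approximate effective sample size: (sum w)^2 / sum(w^2)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


# Hypothetical weights: 66 projects with weight 1.0 each, plus one project
# whose weight is chosen so that it carries about 26 percent of the total.
weights = [1.0] * 66 + [23.19]

share = weight_share(weights, -1)
n_eff = kish_effective_n(weights)
print(f"share of heavy project: {share:.3f}")
print(f"effective number of projects: {n_eff:.1f} of {len(weights)}")
```

Under these assumed weights, 67 sampled projects behave roughly like 13 equally weighted ones, so any imbalance inside the heavy project can dominate the overall estimate.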
  • Bias in Favor of Control Group: Imbalance in Project 69
  • Near Balance in 66 Other Projects
  • Uncontrolled Bias in Favor of Control Group in Overall Sample with Project 69 Included
  • Impact Estimates from Two-Stage Instrumental Variables Regression for Percent Obtaining a BA in +6 Years, Based on UB Random Assignment Evaluation. Note: All estimates significant at the .01 level or better. Estimates based on 66 of 67 projects in sample, representing 74 percent of UB at the time of the study. One project removed due to introducing bias into estimates and representational issues.
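As a rough illustration of how a treatment-on-the-treated (TOT) estimate relates to random assignment: with a single binary instrument (assignment) and no covariates, two-stage instrumental variables regression reduces to the Wald ratio, the intent-to-treat difference in outcomes divided by the difference in participation rates. The sketch below assumes that simplified case. The re-analysis on the slide presumably also used weights and covariates, and all numbers here are hypothetical.

```python
from statistics import mean


def wald_tot(y_assigned, y_control, d_assigned, d_control):
    """Wald (instrumental variables) estimate of treatment on the treated.

    y_*: outcomes (1 = earned a BA), d_*: participation (1 = took part),
    each listed by random assignment group.
    """
    itt = mean(y_assigned) - mean(y_control)      # intent-to-treat effect
    take_up = mean(d_assigned) - mean(d_control)  # first-stage difference
    if take_up == 0:
        raise ValueError("assignment did not change participation")
    return itt / take_up


# Hypothetical data: 10 students per group.
y_assigned = [1, 1, 0, 0, 1, 0, 0, 1, 0, 0]  # 40 percent earn a BA
y_control = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]   # 20 percent earn a BA
d_assigned = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]  # 80 percent participate
d_control = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]   # no control participation

tot = wald_tot(y_assigned, y_control, d_assigned, d_control)
print(f"TOT estimate: {tot:.2f}")
```

The ratio scales the assignment effect up to account for assigned students who never participated and for any controls who did.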
  • Estimated tax impact of UB compared to estimated cost based on average length of participation. Increased taxes collected are 4.9 times higher than the per-participant cost of the program
  • Part 3: Lessons Learned
    • Pay attention to sampling and non-sampling errors: evaluate the evaluations
    • Must ask the question: is the state of the art of evaluation as we practice it really capable of being the source of differentiating budget allocations?
    • Zero sum issues: how to get around the “zero sum” game in which projects compete with each other for scarce resources, with efforts to game the system
    • Stakeholder involvement is critical for program improvement: partnership; focus on how programs can work together and contribute to shared goals; formative assessment
    • Is non-punitive accountability possible?
    • Strength Finder approach as opposed to focusing on deficits: possible to work from strengths of program rather than weaknesses
  • Current Models in Evaluation Profession
    • Partnership: engage practitioners; formative assessment; evaluation as tool for improvement; using data to improve
    • Utilization focused: user and client focus; continuous improvement; evaluation
    • Standards based (feasible, accurate, ethical, transparent, useful)
    • Ex ante, theory of change: what impact can we reasonably expect given the intervention and given the system; multiple methods
    • Systems theory: role in contributing to the whole; interactions
    • Complexity theory: changing conditions; adaptation needed; away from summative; never repeat the same situation exactly; rapid feedback for adaptation and innovation
  • GEAR UP Next Generation Studies
    • Partnership design work: systematic review, focus groups with grantees, expert papers on topics, APR analyses
    • Concept of using data as feedback for program improvement: engagement of practitioners in defining interventions to study in a rigorous manner; leverage institutional research capabilities; cross-site analyses
      • Issued call for grantee partnerships to submit statements of interest in planning awards for rigorous studies (July 2009; $25,000)
      • About to award 44 small planning/proposal awards to grantees, who submit developed proposals in Spring
      • Implementation awards 2010-2012
  • Process of Evaluations for Improvement
  • UB Descriptive Follow up Findings
    • Enrollment
      • 68 percent in +18 months from EHSGY
      • 75-78 percent by end of study period, 7-9 years out
    • Postsecondary, any degree or certificate: 35-47 percent
    • BA degree: just over 20 percent by end of study, 7 to 9 years out
  • UB Degree or Credential Attainment Impact
    • Any Degree or Credential
      • Both Mathematica and the PPSS re-analysis found substantial and positive results on attainment of any degree or certificate by end of study
      • Academic at-risk: large impacts
    • BA Degree
      • PPSS re-analysis also found substantial impact on attainment of BA degrees with the outlier removed
      • Academic at-risk: too few attained a BA (3 percent) to make treatment/control group comparisons (25 unweighted cases)
    • Comparison to National
      • According to the National Center for Higher Education Management Systems (NCHEMS) Information Center, the share of the 9th grade cohort graduating within 150 percent of program time was 17 percent in 1998 and 18.4 percent in 2004
      • NELS estimate was 7 percent with a BA in +8 years for low-income students and 11 percent for first-generation students; 26 percent overall in NELS in +8 years
    • Comparison to Expectations
      • Much lower than expectations, which were that about 72 percent expected a BA (38 percent expected an advanced degree); 25 percent some other postsecondary degree; only 3 percent no degree or credential