Making the Case for Using Longitudinal Data for Evaluating Effectiveness of Teacher Professional Development Programs Nina de las Alas, Council of Chief State School Officers Chris Thorn, University of Wisconsin-Madison, WCER NCES 2008 Summer Data Conference Hyatt Regency Bethesda Hotel Bethesda, MD - July 31, 2008
Overview of Presentation Discuss recently completed Cross-Study Highlight results & recommendations Discuss Teacher Incentive Fund projects Group discussion on using longitudinal data for evaluating PD
Cross-State Review of 25 PD Initiatives Began in 2005 To assist ed leaders in all states by providing a cross-state analysis of quality of professional development (PD) using a common rubric developed from recent research on program effectiveness Drew from prior research and evaluation in Local Systemic Initiative, Eisenhower PD Program Conducted in phases (I & II)
Logic Model (diagram)
High Quality PD? Content-focused; active; coherence; efficient duration; colleagues; follow-up
Teacher Knowledge and Skills? Questions: Valid, adequate tools? Measure change?
Instructional Practices? Questions: Methods? Appropriate measures?
Effects on Students? Measures of achievement; cohorts over time; student unit records linked to teachers; treatment vs. control? Data systems?
What's Implemented? How to Improve with Data? Formative evaluation; collect, report, use; feedback methods; how to build skills in use of data?
Sample of States, Programs 14 states responded -- CO, DE, FL, ID, IN, KY, MA, ME, NJ, NC, OH, OR, SC, WI Directors nominated two high quality PD programs in Math or Science Voluntary sample of 27 programs from 14 states
Phase I Research Questions What is the quality of professional development across the nominated sample of programs, and what is the extent of variation in quality? What are the main program characteristics contributing to high ratings for quality that can be identified and replicated in future program design and development?
Phase I Approach Developed a rubric Had expert teams of 3-4 reviewers analyze program quality and evaluation design for 25 programs using rubric Findings of each program review aggregated by criterion and indicator
Phase II Research Questions What evaluations were completed and what findings were reported? What were the types of major findings from the evaluations? How were they measured? What measures of outcomes were used in the PD evaluations? What conclusions can be drawn about the adequacy and usefulness of the evaluations and reports? What are the cross-report recommendations that are useful to state leaders and evaluators?
Phase II Approach Collected 41 evaluation reports from 25 programs (Spring 2005-Spring 2007) Devised a findings & results review sheet that collected data according to four categories of outcomes: (1) quality of implementation of PD; (2) gain in teacher knowledge & skills; (3) change in instructional practices; (4) improvement in student achievement
Analysis of Quality of PD Programs in M/S: Cross-State Study Initial Findings (as of Dec. ’06) Programs are more content focused (than in ‘90s) Active learning in most programs Coherence with standards (but school curriculum?) Most not school-based, but do use follow-up activities Evaluation designs broad – use of Tools (see list) More focus on Formative Program evaluation, less on measurable effects of PD but progress since 90s Problem of little feedback to decision-makers
Phase I Findings Analysis of the Quality of Professional Development Programs for Mathematics and Science Teachers: Findings from a Cross-State Study (Blank, de las Alas, & Smith, 2007)
Programs’ Ratings on Content Focus (N=25 Programs) [Bar chart: number of programs with High Scores (4s or 5s), Medium Scores (3s), and Low Scores (1s or 2s) on three indicators: Provides study of content area; Provides study of pedagogical content area; Addresses identified teacher needs]
Active Learning in PD across Programs (N=25 Programs) [Bar chart: number of programs rated Yes or No on six activities: Model or lead instruction, present or lead discussion; Plan lessons; Coaching or mentoring in classroom; Develop, review or score assessments; Observe other teachers; Engage in learning network]
Programs’ Ratings for Coherence (N=25 Programs) [Bar chart: number of programs rated Yes or No on four indicators: Consistent with school curriculum or learning goals; Aligned with state or district standards; Meets teacher certification or licensure; Consistent with state rules for HQ teachers]
Program Ratings for Collective Participation (N=25 Programs) [Bar chart: number of programs rated Yes or No on three indicators: Participating with teachers from the same school; Participating with teachers in same dept. or content area; Participating with teachers from same grade span]
Examples of Duration of Programs: Planned Time for Professional Development
Northeast Front Range: 120 hours
Rocky Mountain Middle School: 120 hours
DMT: 135 hours
M4: 100 hours
MS*TEAMS: 119 hours
MATHS: 100 hours
Lesley/MassInsight: 270 hours
Nash-Rocky Mount: 209 hours
High Desert: at least 100 hours
South Carolina Coaching Initiative: 170 hours
NW Wisconsin: 100-120 hours
R&R: 154 hours for mentor teachers
Summary of Program Evaluation Design (N=23 Programs, as of Dec. 2006) [Bar chart: number of programs by evaluation component: Quality of PD Activities; Teacher Content Knowledge; Instructional Practices or Curriculum; Student Achievement; Other Outcomes; Evaluation Design Total]
Other Outcomes (N=23 Programs) [Bar chart: number of programs by outcome: No. of HQ Teachers; Prof. Learning Community; Effects on IHE Partners; Teacher Placement & Retention; Hi-Level Student Course Enrollment; Dev. of Leadership Cadre]
Programs’ Ratings on Reporting of Evaluation Results (N=23 Programs) [Bar chart: number of programs rated Yes or No on reporting results to each audience: Participant Teachers; District/State Officials; General Public; School Administrators; Grant Donors]
Phase II Findings Cross-State Analysis of Evaluations of Professional Development for Mathematics and Science Teachers (Blank, de las Alas, & Smith, 2008)
Results from Review of Evaluation Study Reports (N=41 Reports, as of Feb. 2008) Data on quality of implementation of PD activities – 14 reports Data on gains in teacher knowledge – 24 reports Data on change in instructional practices of the teachers participating in PD – 13 reports Data on student achievement for teachers, schools, and districts in which teachers were involved with the PD – 20 reports 16 evaluation reports included student achievement trends for at least two years—from one year to the next (e.g., 2005 to 2006). Small number tracked achievement gains for more than two years.
Evaluation Results – Student Outcomes with Measurable Results Three Primary Criteria: 1. Finding of effect on student outcome is supported by statistical significance of change linked to the treatment teachers; 2. Finding is substantively important, i.e., an educationally significant change; and 3. Measure of student outcomes is reliable and valid for the evaluation purpose. 7 study reports met criteria (see Table 2)
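The first two criteria can be illustrated with a small calculation. The sketch below is only an illustration under stated assumptions, not the method used in the reviewed evaluations: it takes hypothetical gain scores for students of treatment and comparison teachers and computes a Welch t statistic (a basis for judging statistical significance) and Cohen's d (one common way to express substantive size). The function names and the data are invented for the example; the third criterion, reliability and validity of the measure, cannot be checked this way.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treatment, comparison):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(treatment), len(comparison)
    s1, s2 = stdev(treatment), stdev(comparison)
    pooled = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(comparison)) / pooled


def welch_t(treatment, comparison):
    """Welch's t statistic for a difference in means with unequal variances."""
    n1, n2 = len(treatment), len(comparison)
    se = sqrt(stdev(treatment) ** 2 / n1 + stdev(comparison) ** 2 / n2)
    return (mean(treatment) - mean(comparison)) / se


# Hypothetical student gain scores (invented data, for illustration only).
treatment_gains = [6, 8, 7, 9, 10]
comparison_gains = [4, 5, 6, 5, 5]

print(round(cohens_d(treatment_gains, comparison_gains), 2))  # 2.45
print(round(welch_t(treatment_gains, comparison_gains), 2))   # 3.87
```

A real evaluation would also need a p-value from the appropriate t distribution and a design that accounts for students being clustered within teachers, which this sketch ignores.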
Evaluation Results – Student Outcomes but not Measurable Effects 6 evaluation reports had student outcomes but did not meet criteria Shortcomings: Could not determine the effects of PD vs. other variables Indirect evidence (student attitude, teacher survey on student achievement) See Table 3
Evaluation Results – Teacher Knowledge Outcomes with Measurable Results Two criteria: (1) Assessment of knowledge gain based on a validated instrument; (2) Assessment of knowledge mandated by the state for licensure/certification. 10 study reports met criteria (See Table 4)
Evaluation Results – Instructional Practice Outcomes with Measurable Results Common Characteristics: Practice linked to PD experienced; Measure of Practice is close to the classroom (e.g., not close = teacher attitudes about practice); Change in practice measure for same teacher. 4 evaluation reports (See Table 6)
Evaluation Results – What we have learned about how these programs operate 8 program designs Relatively high amount of time for each teacher in PD Targeted teachers in elem. grades or elem. & middle grades math or science Significant activities during school year Emphasized knowledge of how to teach content to students Schools were a strong partner in building & implementing PD See Table 8
Conclusions & Recommendations 1/3 of evaluation studies reported measurable effects of teacher PD Significant effects in programs designed with content focused PD + sufficient time + in- school component Important to plan purposeful evaluations Build valid, tested instruments in evaluation design Weigh carefully teacher-based vs. school-based design
Conclusions & Recommendations – cont’d Include outcome measures in allocation of evaluation resources Plan for use of data systems and experimental designs Instruments generating longitudinal data for students: CSAP, Indiana Curriculum Framework Assessment, Student Discourse Protocol, Terra Nova Instruments generating longitudinal data for teachers: RTOP, SEC, CKT-M, DTAMS, Project-specific surveys Link teacher knowledge gains to change in classroom practices Consider timely use of findings in program decisions by key decision-makers Program/states should appraise the value of partnerships for evaluation
Website: Evaluation Tools Evaluation Tools for Professional Development (Excel based) Professional Development Activities Teacher Knowledge and Skills Change in Instruction Analyzing Effects on Student Achievement Evaluation Design Assistance
Lessons from the Teacher Incentive Fund Project Making Longitudinal Data Systems Work Christopher Thorn University of Wisconsin-Madison
Core Information System Gaps Many HR systems do not handle multiple roles well (or at all) SIS focused on managing student-teacher ratios and scheduling rather than learning IT and Research offices understaffed and often disconnected from curricular departments Finance and HR not tied to research agenda Market for administrative systems lags educational needs – models often don’t fit
Data Quality Problems Poor links between students and teachers – often nonexistent links to subjects taught Little systematic data on system inputs or on fidelity of implementation Limited support for performance evaluation systems – framework, rubrics, and process Quality problems are tied to use. NCLB compliance quality is unacceptable for adult accountability
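The student-teacher link problem can be measured before any analysis is attempted. The sketch below is a minimal illustration, assuming a hypothetical record layout (dictionaries with `student_id`, `teacher_id`, and `subject` keys) that does not come from any particular SIS: it reports the share of course records that carry both a known teacher and a subject.

```python
def link_coverage(enrollments, known_teacher_ids):
    """Fraction of student course records with a usable teacher and subject link.

    `enrollments` is a list of dicts; the field names are hypothetical.
    """
    if not enrollments:
        return 0.0
    linked = sum(
        1
        for rec in enrollments
        if rec.get("teacher_id") in known_teacher_ids and rec.get("subject")
    )
    return linked / len(enrollments)


# Invented records, for illustration only.
records = [
    {"student_id": 1, "teacher_id": "T1", "subject": "math"},
    {"student_id": 2, "teacher_id": "T9", "subject": "math"},   # teacher not in HR file
    {"student_id": 3, "teacher_id": "T2", "subject": None},     # subject missing
    {"student_id": 4, "teacher_id": "T2", "subject": "science"},
]
print(link_coverage(records, {"T1", "T2"}))  # 0.5
```

Tracking a figure like this by school and by year gives one concrete indicator of whether the data are good enough for teacher-level use.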
Analytical Gaps in Accountability Systems VA Models are haaaaaaaaard Tests given in mid-year complicate assignment of productivity – pre and post in year Lack of vertical scales complicates measurement of productivity – claims of linearity also an issue Lack of links to licensure systems complicates assessment of changes in teacher/leader characteristics and efficacy of teacher preparation institutions Limits of annual data Lack of quality in performance evaluation systems
What have been the big surprises? Districts drop number of initiatives to focus on program quality and evaluation Unions demanding VA results for equity National union bringing technical expertise to the table to solve local problems Talent pool in national vendors thin New demands for diagnostic assessment New demands for PD that works
Value-added Modeling Essential Elements of a System to Measure the Performance of Schools and Classrooms/Teachers with Respect to Student Achievement 1. Criterion validity/alignment: Are the indicators measured in terms of student outcomes valued by students and society? 2. Statistical: Are the indicators accurate in the sense of measuring true school or classroom productivity, as opposed to other non-school factors that contribute to student achievement? 3. Behavioral: Are the indicators non-corruptible?
VA System Recommendations Use all available data to estimate a “T3+” model that exploits repeated observations to control for student selectivity. Include explicit measures of student characteristics in the model if: (a) the number of longitudinal observations per student is limited or (b) you want to maximize control for student differences across schools. If using prior test scores as regressors, control for test measurement error. Extend the model to allow for local conditions, as required. Example: mid-year testing.
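To make the idea of using prior test scores as regressors concrete, here is a deliberately simplified sketch. It is not the "T3+" model named above and it does not correct for test measurement error; it only fits one pooled regression of post-test on prior score and averages each teacher's student residuals. The function name, the record layout, and the data are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def residual_value_added(records):
    """Average residual from a pooled regression of post score on prior score.

    `records` is a list of (teacher_id, prior_score, post_score) tuples.
    Returns {teacher_id: mean residual}; positive means students scored
    above what their prior scores alone would predict.
    """
    priors = [prior for _, prior, _ in records]
    posts = [post for _, _, post in records]
    x_bar, y_bar = mean(priors), mean(posts)

    # Ordinary least squares for a single regressor.
    sxx = sum((x - x_bar) ** 2 for x in priors)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(priors, posts))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    residuals = defaultdict(list)
    for teacher, prior, post in records:
        residuals[teacher].append(post - (intercept + slope * prior))
    return {teacher: mean(values) for teacher, values in residuals.items()}


# Invented student records, for illustration only.
data = [
    ("A", 40, 50), ("A", 60, 70),
    ("B", 40, 40), ("B", 60, 60),
]
print(residual_value_added(data))  # {'A': 5.0, 'B': -5.0}
```

With only one prior score per student, an estimate like this leaves student selectivity largely uncontrolled, which is why the recommendations call for repeated observations, explicit student characteristics, and a measurement-error correction.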
Action Items & Recommendations Balanced assessments that combine measures of practice and productivity Address issues of data quality and implementation fidelity School as the unit of change is the model in almost all TIF sites Proven efficacy of PD critical in systems that are high stakes for adults
For more information: Nina de las Alas, Research Associate, Council of Chief State School Officers, email@example.com, 202-312-6863. Chris Thorn, Assistant Research Scientist, Wisconsin Center for Education Research, firstname.lastname@example.org, 608-263-2709