1. Making the Case for Using
Longitudinal Data for Evaluating
Effectiveness of Teacher
Professional Development
Programs
Nina de las Alas, Council of Chief State School Officers
Chris Thorn, University of Wisconsin-Madison, WCER
NCES 2008 Summer Data Conference
Hyatt Regency Bethesda Hotel
Bethesda, MD - July 31, 2008
2. Overview of Presentation
Discuss the recently completed Cross-State Study
Highlight results & recommendations
Discuss Teacher Incentive Fund projects
Group discussion on using longitudinal data
for evaluating PD
3. Cross-State Review of 25 PD
Initiatives
Began in 2005
To assist ed leaders in all states by providing a
cross-state analysis of quality of professional
development (PD) using a common rubric
developed from recent research on program
effectiveness
Drew from prior research and evaluation in Local
Systemic Initiative, Eisenhower PD Program
Conducted in phases (I & II)
4. Logic Model
High Quality PD? → Teacher Knowledge and Skills? → Instructional Practices? → Effects on Students?
High Quality PD features: content-focused; active; coherence; duration; colleagues; follow-up
Evaluation questions: valid, adequate tools? appropriate measures? efficient methods? measure change? what's implemented?
How to improve with data? Formative evaluation: collect, report, use; feedback methods; how to build skills in use of data?
Effects on students? Measures of achievement: cohorts over time; student unit records linked to teachers; treatment vs. control? data systems?
5. Sample of States, Programs
14 states responded -- CO, DE, FL, ID,
IN, KY, MA, ME, NJ, NC, OH, OR,
SC, WI
Directors nominated two high quality
PD programs in Math or Science
Voluntary sample of 27 programs from
14 states
6. Phase I Research Questions
What is the quality of professional
development across the nominated
sample of programs, and what is the
extent of variation in quality?
What are the main program
characteristics contributing to high
ratings for quality that can be identified
and replicated in future program design
and development?
7. Phase I Approach
Developed a rubric
Expert teams of 3-4 reviewers analyzed program
quality and evaluation design for 25 programs
using the rubric
Findings of each program review were aggregated
by criterion and indicator
8. Phase II Research Questions
What evaluations were completed and what
findings were reported?
What were the types of major findings from the
evaluations? How were they measured? What
measures of outcomes were used in the PD
evaluations?
What conclusions can be drawn about the
adequacy and usefulness of the evaluations and
reports? What are the cross-report
recommendations that are useful to state leaders
and evaluators?
9. Phase II Approach
Collected 41 evaluation
reports from 25 programs
(Spring 2005-Spring 2007)
Devised a findings &
results review sheet that
collected data according to
four categories of
outcomes:
quality of
implementation of PD
gain in teacher
knowledge & skills
change in instructional
practices
improvement in student
achievement
10. Analysis of Quality of PD Programs in M/S:
Cross-State Study Initial Findings (as of Dec. '06)
Programs are more content focused (than in ‘90s)
Active learning in most programs
Coherence with standards (but school
curriculum?)
Most not school-based, but do use follow-up
activities
Evaluation designs broad – use of Tools (see list)
More focus on formative program evaluation, less
on measurable effects of PD, but progress since the
'90s
Problem of little feedback to decision-makers
11. Phase I Findings
Analysis of the Quality of Professional
Development Programs for Mathematics
and Science Teachers:
Findings from a Cross-State Study
(Blank, de las Alas, & Smith, 2007)
12. Programs’ Ratings on Content Focus (N=25 Programs)
[Bar chart: number of programs with high scores (4s or 5s), medium scores (3s), and low scores (1s or 2s) on three indicators: provides study of content area; provides study of pedagogical content area; addresses identified teacher needs]
13. Active Learning in PD across Programs (N=25 Programs)
[Bar chart: number of programs (yes/no) offering each activity: model or lead instruction, present or lead discussion; coaching or mentoring in classroom; plan lessons; develop, review or score assessments; observe other teachers; engage in learning network]
14. Programs' Ratings for Coherence (N=25 Programs)
[Bar chart: number of programs (yes/no) on each indicator: consistent with school curriculum or learning goals; aligned with state or district standards; meets teacher certification or licensure; consistent with state rules for HQ teachers]
15. Program Ratings for Collective Participation (N=25 Programs)
[Bar chart: number of programs (yes/no) on each indicator: participating with teachers from the same school; participating with teachers in same dept. or content area; participating with teachers from same grade span]
16. Examples of Duration of Programs
Planned Time for Professional Development
Northeast Front Range 120 hours
Rocky Mountain Middle School 120 hours
DMT 135 hours
M4 100 hours
MS*TEAMS 119 hours
MATHS 100 hours
Lesley/MassInsight 270 hours
Nash-Rocky Mount 209 hours
High Desert at least 100 hours
South Carolina Coaching Initiative 170 hours
NW Wisconsin 100-120 hours
R&R 154 hours for mentor teachers
17. Summary of Program Evaluation Design (N=23 Programs, as of Dec. 2006)
[Bar chart: number of programs evaluating each outcome: quality of PD activities; teacher content knowledge; instructional practices or curriculum; student achievement; other outcomes; evaluation design total]
18. Other Outcomes (N=23 Programs)
[Bar chart: number of programs measuring each outcome: no. of HQ teachers; prof. learning community; effects on IHE partners; teacher placement & retention; hi-level student course enrollment; dev. of leadership cadre]
19. Programs’ Ratings on Reporting of Evaluation Results (N=23 Programs)
[Bar chart: number of programs (yes/no) reporting results to each audience: participant teachers; school administrators; district/state officials; grant donors; general public]
20. Phase II Findings
Cross-State Analysis of Evaluations of
Professional Development for Mathematics
and Science Teachers
(Blank, de las Alas, & Smith, 2008)
21. Results from Review of Evaluation
Study Reports
(N=41 Reports, as of Feb. 2008)
Data on quality of implementation of PD activities – 14
reports
Data on gains in teacher knowledge – 24 reports
Data on change in instructional practices of the teachers
participating in PD – 13 reports
Data on student achievement for teachers, schools, and districts
in which teachers were involved with the PD – 20 reports
16 evaluation reports included student achievement trends
for at least two years—from one year to the next (e.g., 2005 to
2006).
Small number tracked achievement gains for more than two
years.
22. Evaluation Results – Student
Outcomes with Measurable Results
Three Primary Criteria:
1. Finding of effect on student outcome is
supported by statistical significance of
change linked to the treatment teachers;
2. Finding is substantively important, i.e., an
educationally significant change; and
3. Measure of student outcomes is reliable
and valid for the evaluation purpose
7 study reports met criteria (see Table 2)
23. Evaluation Results – Student
Outcomes but not Measurable Effects
6 evaluation reports had student outcomes
but did not meet criteria
Shortcomings:
Could not determine the effects of PD vs. other
variables
Indirect evidence (student attitude, teacher
survey on student achievement)
See Table 3
24. Evaluation Results – Teacher Knowledge
Outcomes with Measurable Results
Two criteria:
Assessment of knowledge gain based on a
validated instrument
Assessment of knowledge mandated by the
state for licensure/certification
10 study reports met criteria (See Table 4)
25. Evaluation Results – Instructional Practice
Outcomes with Measurable Results
Common Characteristics
Practice linked to the PD experienced
Measure of practice is close to the classroom
(e.g., teacher attitudes about practice would
not qualify)
Change in practice measured for the same teacher
4 evaluation reports (See Table 6)
26. Evaluation Results – What we have
learned about how these programs
operate
8 program designs
Relatively high amount of time for each teacher
in PD
Targeted teachers in elem. grades or elem. &
middle grades math or science
Significant activities during school year
Emphasized knowledge of how to teach content to
students
Schools were a strong partner in building &
implementing PD
See Table 8
27. Conclusions & Recommendations
1/3 of evaluation studies reported measurable
effects of teacher PD
Significant effects in programs designed with
content focused PD + sufficient time + in-
school component
Important to plan purposeful evaluations
Build valid, tested instruments in evaluation
design
Weigh carefully teacher-based vs. school-based
design
28. Conclusions &
Recommendations – cont’d
Include outcome measures in allocation of evaluation
resources
Plan for use of data systems and experimental designs
Instruments generating longitudinal data for students:
CSAP, Indiana Curriculum Framework Assessment,
Student Discourse Protocol, Terra Nova
Instruments generating longitudinal data for teachers:
RTOP, SEC, CKT-M, DTAMS, Project-specific surveys
Link teacher knowledge gains to change in classroom
practices
Consider timely use of findings in program decisions by
key decision-makers
Program/states should appraise the value of partnerships
for evaluation
29. Website: Evaluation Tools
Evaluation Tools for Professional
Development (Excel based)
Professional Development Activities
Teacher Knowledge and Skills
Change in Instruction
Analyzing Effects on Student Achievement
Evaluation Design Assistance
31. Lessons from the Teacher Incentive Fund Project
Making Longitudinal Data Systems Work
Christopher Thorn
University of Wisconsin-Madison
32. Core Information System Gaps
Many HR systems do not handle multiple roles
well (or at all)
SIS focused on managing student-teacher ratios
and scheduling rather than learning
IT and Research offices understaffed and often
disconnected from curricular departments
Finance and HR not tied to research agenda
Market for administrative systems lags behind
educational needs – models often don't fit
33. Data Quality Problems
Poor links between students and teachers –
often nonexistent links to subjects taught
Little systematic data on system inputs or on
fidelity of implementation
Limited support for performance evaluation
systems – framework, rubrics, and process
Quality problems are tied to use: data quality
sufficient for NCLB compliance is unacceptable
for adult accountability
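The student–teacher linkage audit implied above can be sketched in a few lines. This is a minimal illustration, not a real SIS integration: the record layouts and field names (`student_id`, `section_id`, `teacher_id`) are invented for this sketch.

```python
# Hypothetical rosters illustrating the linkage problem; field names are
# invented for this sketch, not taken from any particular SIS or HR system.
enrollments = [
    {"student_id": 1, "section_id": "M101"},
    {"student_id": 2, "section_id": "M101"},
    {"student_id": 3, "section_id": "S201"},
    {"student_id": 4, "section_id": "X999"},  # orphan section: no teacher record
]
sections = {
    "M101": {"teacher_id": "T01", "subject": "math"},
    "S201": {"teacher_id": "T02", "subject": "science"},
}

# Audit: every enrollment should resolve to a teacher and a subject taught.
broken = [e for e in enrollments if e["section_id"] not in sections]
print(f"{len(broken)} of {len(enrollments)} enrollments lack a teacher link")
```

Running this kind of audit before any teacher-level analysis surfaces the "nonexistent links to subjects taught" problem directly, rather than discovering it as missing rows in a value-added model.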
34. Analytical Gaps in
Accountability Systems
Value-added (VA) models are hard
Tests given mid-year complicate attribution of
productivity – pre- and post-tests fall within a year
Lack of vertical scales complicates measurement of
productivity – claims of linearity also an issue
Lack of links to licensure systems complicates
assessment of changes in teacher/leader characteristics
and efficacy of teacher preparation institutions
Limits of annual data
Lack of quality in performance evaluation systems
35. What have been the big surprises?
Districts drop number of initiatives to focus on
program quality and evaluation
Unions demanding VA results for equity
National union bringing technical expertise to the table
to solve local problems
Talent pool in national vendors thin
New demands for diagnostic assessment
New demands for PD that works
36. Value-added Modeling
Essential Elements of a System to Measure the
Performance of Schools and Classrooms/Teachers
with Respect to Student Achievement
1. Criterion validity/alignment: Are the indicators
measured in terms of student outcomes valued by
students and society?
2. Statistical: Are the indicators accurate in the sense of
measuring true school or classroom productivity, as
opposed to other non-school factors that contribute to
student achievement?
3. Behavioral: Are the indicators non-corruptible?
37. VA System Recommendations
Use all available data to estimate a “T3+” model that
exploits repeated observations to control for student
selectivity.
Include explicit measures of student characteristics in
the model if: (a) the number of longitudinal
observations per student is limited or (b) you want to
maximize control for student differences across
schools.
If using prior test scores as regressors, control for test
measurement error.
Extend the model to allow for local conditions, as
required. Example: mid-year testing.
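The core idea behind these recommendations – use repeated observations on students to separate classroom productivity from student differences – can be sketched as a toy covariate-adjustment estimate: regress current scores on prior scores, then average the residuals by teacher. The data and the single-regressor OLS here are invented for illustration; a production "T3+" layered model with measurement-error corrections is far more elaborate.

```python
# Toy value-added sketch: teacher effect = mean residual growth after
# adjusting for each student's prior score. Scores are made up.
from statistics import mean

records = [  # (teacher, prior_score, current_score)
    ("T01", 50, 58), ("T01", 60, 67), ("T01", 70, 76),
    ("T02", 50, 54), ("T02", 60, 63), ("T02", 70, 72),
]

# Closed-form OLS of current score on prior score (one regressor).
xs = [r[1] for r in records]
ys = [r[2] for r in records]
mx, my = mean(xs), mean(ys)
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx

# Teacher effect = mean residual (actual minus predicted score).
residuals = {}
for t, x, y in records:
    residuals.setdefault(t, []).append(y - (a + b * x))
va = {t: mean(res) for t, res in residuals.items()}
print(va)  # T01 ≈ +2.0, T02 ≈ -2.0 under these made-up scores
```

Note that this sketch ignores the two complications the recommendations call out: it does not correct for measurement error in the prior-score regressor, and it assumes a single annual pre/post pair rather than mid-year testing.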
38. Action Items &
Recommendations
Balanced assessments that combine measures of
practice and productivity
Address issues of data quality and
implementation fidelity
School as the unit of change is the model in almost
all TIF sites
Proven efficacy of PD critical in systems that are
high stakes for adults
39. For more information:
Nina de las Alas, Research Associate
Council of Chief State School Officers
ninaa@ccsso.org
202-312-6863
Chris Thorn, Assistant Research Scientist
Wisconsin Center for Education Research
cathorn@wisc.edu
608-263-2709