1. Making the Case for Using
Longitudinal Data for Evaluating
Effectiveness of Teacher
Professional Development
Programs
Nina de las Alas, Council of Chief State School Officers
Chris Thorn, University of Wisconsin-Madison, WCER
NCES 2008 Summer Data Conference
Hyatt Regency Bethesda Hotel
Bethesda, MD - July 31, 2008
2. Overview of Presentation
Discuss the recently completed Cross-State Study
Highlight results & recommendations
Discuss Teacher Incentive Fund projects
Group discussion on using longitudinal data
for evaluating PD
3. Cross-State Review of 25 PD
Initiatives
Began in 2005
To assist ed leaders in all states by providing a
cross-state analysis of quality of professional
development (PD) using a common rubric
developed from recent research on program
effectiveness
Drew from prior research and evaluation in Local
Systemic Initiative, Eisenhower PD Program
Conducted in phases (I & II)
4. Logic Model
High Quality PD? → Teacher Knowledge and Skills? → Instructional Practices? → Effects on Students?
High Quality PD features: content-focused; active; coherence; duration; colleagues; follow-up
Evaluation questions: valid, adequate tools? appropriate measures? efficient methods? measure change? what's implemented?
How to improve with data? Formative evaluation: collect, report, use; feedback methods; how to build skills in use of data?
Effects on students? Measures of achievement: cohorts over time; student unit records linked to teachers; treatment vs. control? data systems?
5. Sample of States, Programs
14 states responded -- CO, DE, FL, ID,
IN, KY, MA, ME, NJ, NC, OH, OR,
SC, WI
Directors nominated two high quality
PD programs in Math or Science
Voluntary sample of 27 programs from
14 states
6. Phase I Research Questions
What is the quality of professional
development across the nominated
sample of programs, and what is the
extent of variation in quality?
What are the main program
characteristics contributing to high
ratings for quality that can be identified
and replicated in future program design
and development?
7. Phase I Approach
Developed a rubric
Expert teams of 3-4 reviewers analyzed program
quality and evaluation design for 25 programs
using the rubric
Findings of each program review were aggregated
by criterion and indicator
8. Phase II Research Questions
What evaluations were completed and what
findings were reported?
What were the types of major findings from the
evaluations? How were they measured? What
measures of outcomes were used in the PD
evaluations?
What conclusions can be drawn about the
adequacy and usefulness of the evaluations and
reports? What are the cross-report
recommendations that are useful to state leaders
and evaluators?
9. Phase II Approach
Collected 41 evaluation
reports from 25 programs
(Spring 2005-Spring 2007)
Devised a findings &
results review sheet that
collected data according to
four categories of
outcomes:
quality of
implementation of PD
gain in teacher
knowledge & skills
change in instructional
practices
improvement in student
achievement
10. Analysis of Quality of PD Programs in M/S:
Cross-State Study Initial Findings (as of Dec. '06)
Programs are more content focused (than in ‘90s)
Active learning in most programs
Coherence with standards (but school
curriculum?)
Most not school-based, but do use follow-up
activities
Evaluation designs broad – use of Tools (see list)
More focus on formative program evaluation, less
on measurable effects of PD, but progress since the
'90s
Problem of little feedback to decision-makers
11. Phase I Findings
Analysis of the Quality of Professional
Development Programs for Mathematics
and Science Teachers:
Findings from a Cross-State Study
(Blank, de las Alas, & Smith, 2007)
12. Programs’ Ratings on Content Focus (N=25 Programs)
[Bar chart: number of programs with high scores (4s or 5s), medium scores (3s), and low scores (1s or 2s) on three indicators: provides study of content area; provides study of pedagogical content area; addresses identified teacher needs]
13. Active Learning in PD across Programs (N=25 Programs)
[Bar chart: number of programs (yes/no) offering each activity: model or lead instruction, present or lead discussion; coaching or mentoring in classroom; plan lessons; develop, review or score assessments; observe other teachers; engage in learning network]
14. Programs' Ratings for Coherence (N=25 Programs)
[Bar chart: number of programs (yes/no) on each indicator: consistent with school curriculum or learning goals; aligned with state or district standards; meets teacher certification or licensure; consistent with state rules for HQ teachers]
15. Program Ratings for Collective Participation (N=25 Programs)
[Bar chart: number of programs (yes/no) on each indicator: participating with teachers from the same school; participating with teachers in same dept. or content area; participating with teachers from same grade span]
16. Examples of Duration of Programs
Planned Time for Professional Development
Northeast Front Range 120 hours
Rocky Mountain Middle School 120 hours
DMT 135 hours
M4 100 hours
MS*TEAMS 119 hours
MATHS 100 hours
Lesley/MassInsight 270 hours
Nash-Rocky Mount 209 hours
High Desert at least 100 hours
South Carolina Coaching Initiative 170 hours
NW Wisconsin 100-120 hours
R&R 154 hours for mentor teachers
17. Summary of Program Evaluation Design (N=23 Programs, as of Dec. 2006)
[Bar chart: number of programs evaluating each outcome: quality of PD activities; teacher content knowledge; instructional practices or curriculum; student achievement; other outcomes; evaluation design total]
18. Other Outcomes (N=23 Programs)
[Bar chart: number of programs measuring each outcome: no. of HQ teachers; prof. learning community; effects on IHE partners; teacher placement & retention; hi-level student course enrollment; dev. of leadership cadre]
19. Programs’ Ratings on Reporting of Evaluation Results (N=23 Programs)
[Bar chart: number of programs (yes/no) reporting results to each audience: participant teachers; school administrators; district/state officials; grant donors; general public]
20. Phase II Findings
Cross-State Analysis of Evaluations of
Professional Development for Mathematics
and Science Teachers
(Blank, de las Alas, & Smith, 2008)
21. Results from Review of Evaluation
Study Reports
(N=41 Reports, as of Feb. 2008)
Data on quality of implementation of PD activities – 14
reports
Data on gains in teacher knowledge – 24 reports
Data on change in instructional practices of the teachers
participating in PD – 13 reports
Data on student achievement for teachers, schools, and districts
in which teachers were involved with the PD – 20 reports
16 evaluation reports included student achievement trends
for at least two years—from one year to the next (e.g., 2005 to
2006).
Small number tracked achievement gains for more than two
years.
22. Evaluation Results – Student
Outcomes with Measurable Results
Three Primary Criteria:
1. Finding of effect on student outcome is
supported by statistical significance of
change linked to the treatment teachers;
2. Finding is substantively important, i.e., an
educationally significant change; and
3. Measure of student outcomes is reliable
and valid for the evaluation purpose
7 study reports met criteria (see Table 2)
23. Evaluation Results – Student
Outcomes but not Measurable Effects
6 evaluation reports had student outcomes
but did not meet criteria
Shortcomings:
Could not determine the effects of PD vs. other
variables
Indirect evidence (student attitude, teacher
survey on student achievement)
See Table 3
24. Evaluation Results – Teacher Knowledge
Outcomes with Measurable Results
Two criteria:
Assessment of knowledge gain based on a
validated instrument
Assessment of knowledge mandated by the
state for licensure/certification
10 study reports met criteria (See Table 4)
25. Evaluation Results – Instructional Practice
Outcomes with Measurable Results
Common Characteristics
Practice linked to the PD experienced
Measure of practice is close to the classroom
(e.g., teacher attitudes about practice would
not qualify)
Change in practice measured for the same teacher
4 evaluation reports (See Table 6)
26. Evaluation Results – What we have
learned about how these programs
operate
8 program designs
Relatively high amount of time for each teacher
in PD
Targeted teachers in elem. grades or elem. &
middle grades math or science
Significant activities during school year
Emphasized knowledge of how to teach content to
students
Schools were a strong partner in building &
implementing PD
See Table 8
27. Conclusions & Recommendations
1/3 of evaluation studies reported measurable
effects of teacher PD
Significant effects in programs designed with
content focused PD + sufficient time + in-
school component
Important to plan purposeful evaluations
Build valid, tested instruments in evaluation
design
Weigh carefully teacher-based vs. school-based
design
28. Conclusions &
Recommendations – cont’d
Include outcome measures in allocation of evaluation
resources
Plan for use of data systems and experimental designs
Instruments generating longitudinal data for students:
CSAP, Indiana Curriculum Framework Assessment,
Student Discourse Protocol, Terra Nova
Instruments generating longitudinal data for teachers:
RTOP, SEC, CKT-M, DTAMS, Project-specific surveys
Link teacher knowledge gains to change in classroom
practices
Consider timely use of findings in program decisions by
key decision-makers
Program/states should appraise the value of partnerships
for evaluation
29. Website: Evaluation Tools
Evaluation Tools for Professional
Development (Excel based)
Professional Development Activities
Teacher Knowledge and Skills
Change in Instruction
Analyzing Effects on Student Achievement
Evaluation Design Assistance
31. Lessons from the Teacher Incentive Fund Project
Making Longitudinal Data Systems Work
Christopher Thorn
University of Wisconsin-Madison
32. Core Information System Gaps
Many HR systems do not handle multiple roles
well (or at all)
SIS focused on managing student-teacher ratios
and scheduling rather than learning
IT and Research offices understaffed and often
disconnected from curricular departments
Finance and HR not tied to research agenda
Market for administrative systems lags behind
educational needs – models often don't fit
33. Data Quality Problems
Poor links between students and teachers –
often nonexistent links to subjects taught
Little systematic data on system inputs or on
fidelity of implementation
Limited support for performance evaluation
systems – framework, rubrics, and process
Quality problems are tied to use: data quality
sufficient for NCLB compliance is unacceptable
for adult accountability
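The student–teacher linkage audit implied above can be sketched in a few lines. This is a minimal illustration, not a real SIS integration: the record layouts and field names (`student_id`, `section_id`, `teacher_id`) are invented for this sketch.

```python
# Hypothetical rosters illustrating the linkage problem; field names are
# invented for this sketch, not taken from any particular SIS or HR system.
enrollments = [
    {"student_id": 1, "section_id": "M101"},
    {"student_id": 2, "section_id": "M101"},
    {"student_id": 3, "section_id": "S201"},
    {"student_id": 4, "section_id": "X999"},  # orphan section: no teacher record
]
sections = {
    "M101": {"teacher_id": "T01", "subject": "math"},
    "S201": {"teacher_id": "T02", "subject": "science"},
}

# Audit: every enrollment should resolve to a teacher and a subject taught.
broken = [e for e in enrollments if e["section_id"] not in sections]
print(f"{len(broken)} of {len(enrollments)} enrollments lack a teacher link")
```

Running this kind of audit before any teacher-level analysis surfaces the "nonexistent links to subjects taught" problem directly, rather than discovering it as missing rows in a value-added model.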
34. Analytical Gaps in
Accountability Systems
Value-added (VA) models are hard
Tests given mid-year complicate attribution of
productivity – pre- and post-tests fall within a year
Lack of vertical scales complicates measurement of
productivity – claims of linearity also an issue
Lack of links to licensure systems complicates
assessment of changes in teacher/leader characteristics
and efficacy of teacher preparation institutions
Limits of annual data
Lack of quality in performance evaluation systems
35. What have been the big surprises?
Districts drop number of initiatives to focus on
program quality and evaluation
Unions demanding VA results for equity
National union bringing technical expertise to the table
to solve local problems
Talent pool in national vendors thin
New demands for diagnostic assessment
New demands for PD that works
36. Value-added Modeling
Essential Elements of a System to Measure the
Performance of Schools and Classrooms/Teachers
with Respect to Student Achievement
1. Criterion validity/alignment: Are the indicators
measured in terms of student outcomes valued by
students and society?
2. Statistical: Are the indicators accurate in the sense of
measuring true school or classroom productivity, as
opposed to other non-school factors that contribute to
student achievement?
3. Behavioral: Are the indicators non-corruptible?
37. VA System Recommendations
Use all available data to estimate a “T3+” model that
exploits repeated observations to control for student
selectivity.
Include explicit measures of student characteristics in
the model if: (a) the number of longitudinal
observations per student is limited or (b) you want to
maximize control for student differences across
schools.
If using prior test scores as regressors, control for test
measurement error.
Extend the model to allow for local conditions, as
required. Example: mid-year testing.
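The core idea behind these recommendations – use repeated observations on students to separate classroom productivity from student differences – can be sketched as a toy covariate-adjustment estimate: regress current scores on prior scores, then average the residuals by teacher. The data and the single-regressor OLS here are invented for illustration; a production "T3+" layered model with measurement-error corrections is far more elaborate.

```python
# Toy value-added sketch: teacher effect = mean residual growth after
# adjusting for each student's prior score. Scores are made up.
from statistics import mean

records = [  # (teacher, prior_score, current_score)
    ("T01", 50, 58), ("T01", 60, 67), ("T01", 70, 76),
    ("T02", 50, 54), ("T02", 60, 63), ("T02", 70, 72),
]

# Closed-form OLS of current score on prior score (one regressor).
xs = [r[1] for r in records]
ys = [r[2] for r in records]
mx, my = mean(xs), mean(ys)
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx

# Teacher effect = mean residual (actual minus predicted score).
residuals = {}
for t, x, y in records:
    residuals.setdefault(t, []).append(y - (a + b * x))
va = {t: mean(res) for t, res in residuals.items()}
print(va)  # T01 ≈ +2.0, T02 ≈ -2.0 under these made-up scores
```

Note that this sketch ignores the two complications the recommendations call out: it does not correct for measurement error in the prior-score regressor, and it assumes a single annual pre/post pair rather than mid-year testing.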
38. Action Items &
Recommendations
Balanced assessments that combine measures of
practice and productivity
Address issues of data quality and
implementation fidelity
School as the unit of change is the model in almost
all TIF sites
Proven efficacy of PD critical in systems that are
high stakes for adults
39. For more information:
Nina de las Alas, Research Associate
Council of Chief State School Officers
ninaa@ccsso.org
202-312-6863
Chris Thorn, Assistant Research Scientist
Wisconsin Center for Education Research
cathorn@wisc.edu
608-263-2709