(1) Test Date
Do you test students only near the end (early May) or beginning (late September) of the school year? If so, do you want to take account of the fact that the annual growth period (say, March to March) cuts across two school years and, typically, two different teachers or sets of teachers?
(2) Test Properties
Are test scores vertically scaled (across grades)? How can we account for test measurement error? Can we refine our predictions using statistical shrinkage?
(3) Demographic controls
Control for differences across schools in student demographic characteristics?
 – Income status (free lunch)
 – Race/ethnicity
 – Gender
 – Special education
 – English language learner, bilingual
(4) Retention
Include retained-in-grade and promoted students in the estimation of school effects? (Almost certainly yes.)
(5) Student mobility
Include students who changed schools over the summer (that is, within the annual testing interval if tests are not administered near the end or beginning of the school year)? (Probably yes.)
(6) School-year mobility
Include students who changed schools during the school year, and take account of within-school-year mobility by defining school enrollments in the model as the fraction of the school year enrolled in a given school (dose model)?
(7) Classroom/teacher indicators
What does a classroom teacher indicator represent? Answer: The productivity of teacher, classroom, principal, and school inputs.
(8) Aggregation over units
Schools, schools by grade, teacher teams, individual classroom/teacher/school?
 – Statistical precision is highest at the highest level of aggregation, since precision increases with the number of students.
 – Where should incentives be directed: at individuals or teams?
(9) Aggregation over time
“Smooth” data over time to improve precision?
(10) Multiple components
Separately estimate the productivity of regular school (and teachers), summer school, after school, NCLB Supplemental Education Services (SES)?
(11) Special education detail
Control for many different types of special education status (type and severity of handicap)?
(12) Multi-year data
Exploit multiple years of longitudinal student data to implicitly control for heterogeneity in student achievement growth profiles?
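Two of the decision points above, statistical shrinkage (2) and the dose model (6), can be illustrated with a minimal Python sketch. This is not the model used by any particular district or research center. The function names, the variance inputs, and the 180-day school year are all assumptions chosen for illustration.

```python
def shrink_estimate(raw_effect, n_students, student_var, between_var, grand_mean=0.0):
    """Pull a raw school (or teacher) effect toward the grand mean.

    Assumed inputs (hypothetical, for illustration only):
      student_var - variance of student-level growth around the unit mean
      between_var - variance of true effects across units
    """
    # Sampling noise in the raw effect falls as the number of students grows.
    sampling_var = student_var / n_students
    # Reliability is the share of observed variation that is true signal.
    reliability = between_var / (between_var + sampling_var)
    return grand_mean + reliability * (raw_effect - grand_mean)


def dose_fractions(days_enrolled_by_school, days_in_year=180):
    """Fraction of the school year a student spent in each school (dose model)."""
    return {
        school: days / days_in_year
        for school, days in days_enrolled_by_school.items()
    }


# Same raw effect, different school sizes: the small school is shrunk more.
small_school = shrink_estimate(raw_effect=10.0, n_students=25,
                               student_var=100.0, between_var=4.0)
large_school = shrink_estimate(raw_effect=10.0, n_students=100,
                               student_var=100.0, between_var=4.0)
print(small_school, large_school)  # 5.0 8.0

# A student who moved from school A to school B a quarter of the way through the year.
print(dose_fractions({"A": 45, "B": 135}))  # {'A': 0.25, 'B': 0.75}
```

With the same raw effect, the smaller school is pulled further toward the mean, which reflects the point in (8) that precision increases with the number of students. In a dose model, the enrollment fractions would stand in for the usual 0/1 school indicators.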
2010 Ohio TIF Meeting: Creating a Comprehensive Teacher Effectiveness System
Creating a Comprehensive Teacher Effectiveness System
Christopher A. Thorn
University of Wisconsin-Madison
Center for Data Quality and Systems Innovation
Value-Added Research Center
Propositions and Claims to the Validity Evaluation
Proposition 1: The standards clearly define learning expectations for the subject area and grade level
Design Claims:
• Clarity
• Feasibility
• Explicit progressions
Evidence:
• Expert reviews
• Research studies validating progressions
Proposition 2a: The assessment instruments have been designed to yield scores that can accurately and fairly reflect student achievement of standards.
Design Claims:
• Alignment with standards (specs and items)
• Fair and accessible
• Reliable/Replicable procedures
Evidence:
• Expert reviews of alignment
• Small scale studies
• Sensitivity reviews
• Measurement reviews of design, administration, and scoring procedures
Proposition 2b: The assessment instruments have been designed to yield scores that accurately and fairly reflect student growth over the course of the year
Design Claims:
• Sample the range of where students may start and end the school year
• Designed to be sensitive to instruction
Evidence:
• Expert review
• Teacher review
Proposition 3: Assessment scores accurately and fairly reflect the status of students’ knowledge and skills relative to learning expectations
Design Claims:
• Psychometric analyses confirm the assessment’s blueprint
• Scores are sufficiently precise and reliable
• Scores are fair/unbiased
Evidence:
• Psychometric analysis
• Bias analysis
Proposition 4: Student growth scores accurately and fairly measure student progress over the course of the year
Design Claims:
• Score scale reflects the full distribution of where students may start and end the year
• Growth scores are sufficiently precise and reliable for all students
• Growth scores are fair/relatively free of bias
Evidence:
• Psychometric modeling and fit statistics
• Sensitivity/bias analysis
Proposition 5: Value-added scores represent teachers’ contribution to student growth
Design Claims:
• Scores are instructionally sensitive
• Scores representing teacher contribution are sufficiently precise and reliable
• Scores representing teachers’ contributions are relatively free of bias
Evidence:
• Assumption checking
• Advanced statistical modeling
• Research on instructional sensitivity
What do we see across the US?
• Student Learning Objectives
 – Austin, Texas; Charlotte-Mecklenburg, North Carolina; and Washington, DC
• Subject- and grade-alike measures
 – Delaware and Tennessee
• Universal pre-/post-tests
 – Hillsborough County, Florida, and Washington, DC
• Value-added composite (school/cohort)
 – Charlotte-Mecklenburg, North Carolina; Delaware; and Tennessee
SLOs
Teacher (overseen by administrator) selects appropriate measure(s); assesses students at beginning of year; sets specific objectives for performance; and assesses students at end of year. Administrator ultimately determines the teacher’s success in achieving SLOs (Goe 2011).
Pro/Con for SLOs
• Advantages: Adaptable; permits specialization; and tends to be credible among educators.
• Disadvantages: Requires significant attention from administrators; difficult to create comparability and rigor across classrooms; and does not account for differences across teachers in the students served (RttT Assistance Network 2011).
Subject- and grade-alike measures
Teachers meet in grade- and/or subject-specific teams to consider growth measures (existing, adapted, or new assessments; portfolios of work; performances, and the like), then agree on measure(s) all will use to determine individual contributions to student growth. District/state then reviews and approves the list for each grade and subject. (May include SLOs, pre-/post-tests, or other measures.)
Subject- and grade-alike measures
• Advantages: Designed to yield comparable measures across classrooms/districts.
• Disadvantages: Requires significant attention to ensuring comparability and much dedicated time for teachers to work together to develop consistent scoring patterns (RttT Assistance Network 2011).
Universal pre-/post-tests
• Written pre-/post-tests developed for every grade and every subject.
• Advantages: Enables annual growth calculations for all students.
• Disadvantages: Requires extensive test development and analysis capacity; and may be difficult to link previous years of test data, given course variety at secondary level.
Value-added composite (school/cohort)
Nontested teachers are assigned a composite value-added score based on average test performance of either the school as a whole or the particular cohort of students “claimed” by the teacher (Goe 2011).
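As a rough illustration of the two options described above, the sketch below averages student-level scores either over the whole school or over a claimed cohort. The function name, the student IDs, and the score values are hypothetical. Real systems would use model-based value-added estimates rather than a simple mean.

```python
def composite_value_added(student_scores, claimed_ids=None):
    """Average student-level scores for a nontested teacher.

    student_scores: dict mapping student ID -> growth or value-added score
    claimed_ids:    optional list of student IDs "claimed" by the teacher;
                    if omitted, the whole school is used.
    """
    if claimed_ids is None:
        selected = list(student_scores.values())
    else:
        # Ignore claimed students who have no score on record.
        selected = [student_scores[s] for s in claimed_ids if s in student_scores]
    if not selected:
        raise ValueError("no scored students to average")
    return sum(selected) / len(selected)


# Hypothetical school of four scored students.
scores = {"s1": 2.0, "s2": -1.0, "s3": 0.5, "s4": 4.5}

print(composite_value_added(scores))                # school-wide composite: 1.5
print(composite_value_added(scores, ["s1", "s4"]))  # claimed-cohort composite: 3.25
```

The two calls give different results for the same teacher, so the choice between the school and the cohort is itself a design decision.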
Value-added composite (school/cohort)
• Advantages: Can contribute to collective efforts around student achievement.
• Disadvantages: May mask high and low performers and hold individual teachers accountable for measures over which they have limited influence.
A Tale of Two Cities – K-2
• Value-Added Measures of Math and Reading
• Assessments administered 2-4 times a year
• Administered 1-1 by retired teachers
• Minneapolis
 – Off the shelf public domain assessments
 – Used to study Beat the Odds Kindergarten Teachers
• Edina
 – Purchased NWEA early grade assessments
 – Used whole grade and school to study overall effectiveness
Build Decision Points
• Test date, timing, and frequency
• Test properties
• Demographic controls
• Grade retention
• Student mobility
• School-year mobility
• Classroom/teacher indicators
• Aggregation units (grades, teams, schools)
• Aggregation over time
• Multiple components
• Special education detail
• Multi-year data
References
Webinar on Evaluating Teacher Quality
http://www.aacompcenter.org/cs/aacc/view/rs/26579
A Practical Guide to Designing Comprehensive Teacher Evaluation Systems
http://tqsource.org/publications/practicalGuideEvalSystems.pdf
Measuring Teachers’ Contributions to Student Learning Growth for Nontested Grades and Subjects
http://www.lauragoe.com/LauraGoe/MeasuringTeachersContributions.pdf