PVAAS Growth Standard Methodology: Presentation Transcript
Pennsylvania Value-Added Assessment System (PVAAS)
PVAAS Growth Standard Methodology: Statewide Implementation
Questions about these materials and the Growth Standard Methodology can be directed to the PVAAS Statewide Core Team. See contact info on Slide 31.
Important Questions for Value-added Calculations
What is a Growth Standard and how is it set?
How can we compare scores across different years?
How do we estimate a student’s true level of achievement?
The Growth Standard: Key Metric in PVAAS
The Growth Standard specifies the minimal designated academic gain from grade to grade for a cohort of students.
The use of a Growth Standard creates the possibility that ALL schools can demonstrate appropriate growth.
Doctors plot a child’s length/height over time.
Each child may have a unique growth curve.
When is growth “acceptable”?
The length/height measurement is increasing over time.
The length/height measurement maintains approximately the same position in the length/height distribution as the child grows.
The child’s length/height continues to increase in a consistent manner.
A significant deviation of the growth pattern or a change outside the “typical” range of values is an indication that further investigation is required.
What is the Growth Standard for a child’s length/height?
The standard is that the child maintains approximately the same position in each of the increasing distributions of length/height as the child grows.
A significant deviation from that pattern indicates a need for further investigation.
Growth Standard Charts for Academic Achievement
Let us build an Academic Achievement Growth Chart.
Collect the average performances of a large sample of students using a uniform assessment during each year of their career through school.
Plot curves to represent appropriate percentile patterns.
An example: Suppose the following table represents the means and SDs of a group of students on the PSSA beginning in 3rd grade and continuing through 8th grade and ultimately 11th grade.
In an ideal world…
We would have a large body of longitudinal data from many cohorts to construct our growth charts.
Since we do not, we will use the distributions from a base year for the creation of the growth curves.
The base year distributions are approximations of the achievement distributions of a cohort from grade 3 to grade 8 and grade 11.
Using the Base Year 2006
Suppose the distributions from 2006 are given by:
Grade   3      4      5      6
Mean    1270   1290   1300   1285
SD      250    310    255    276
Conversion to NCE scores will use the Base Year distributions in their calculations.
Suppose the means of a cohort in two consecutive years are 1390 in 2007 (3rd grade) and 1450 in 2008 (4th grade).
NCE scores are calculated for both using the 2006 means and SDs:
Grade   3      4
Mean    1270   1290
SD      250    310
All future PSSA scaled scores will be converted to NCE scores using the 2006 Base Year parameters, so that the mean gain of a cohort of students can be calculated on a common scale.
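As a rough illustration of the base-year conversion, the following Python sketch converts the two cohort means from this example to NCE scores using the 2006 parameters. It assumes the conversion is the usual z-score rescaling onto the NCE reference scale (mean 50, SD 21.06) covered later in this presentation. The names BASE_YEAR_2006 and to_nce are illustrative only and are not part of PVAAS.

```python
# Example base-year (2006) parameters from this slide: grade -> (mean, SD).
# These are the presentation's hypothetical values, not real PSSA statistics.
BASE_YEAR_2006 = {3: (1270, 250), 4: (1290, 310), 5: (1300, 255), 6: (1285, 276)}

NCE_MEAN = 50.0   # reference-scale mean
NCE_SD = 21.06    # reference-scale standard deviation


def to_nce(scaled_score, grade, base=BASE_YEAR_2006):
    """Place a scaled score on the NCE scale using base-year parameters."""
    mean, sd = base[grade]
    z = (scaled_score - mean) / sd   # distance from the base-year mean, in SDs
    return NCE_MEAN + NCE_SD * z


nce_2007 = to_nce(1390, grade=3)   # cohort mean in 3rd grade (2007)
nce_2008 = to_nce(1450, grade=4)   # same cohort in 4th grade (2008)
gain = nce_2008 - nce_2007

print(f"2007 (grade 3) NCE: {nce_2007:.2f}")
print(f"2008 (grade 4) NCE: {nce_2008:.2f}")
print(f"NCE gain:           {gain:+.2f}")
```

Because the conversion is linear, converting a cohort's mean scaled score gives the same result as converting each student's score and then averaging.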
The NCE Growth Curves
This Growth Standard concept demonstrates the need for longitudinal data when considering academic growth since each student has his/her own academic growth curve.
The example also exhibits the remaining two issues for PVAAS value-added methods:
Comparing scores from year to year
Estimating the “true” level of achievement for input into the growth curve.
Calculation of Gain from year to year
Student growth is measured by the difference in performance in consecutive years.
Grade   3      4      5      6
Score   1290   1310   1330   1365
Gain           20     20     35
But there is a problem with this! These scores are not comparable!
Comparing scaled scores on the PSSA from different years
PSSA tests have different means and standard deviations at each grade and for different years. For example, in 8th grade:
        Math            Reading
Year    Mean    SD      Mean    SD
2004    1350    208.1   1370    239.7
2005    1370    222.2   1360    274.3
A Solution: Conversion to NCE Scores
NCE scores indicate the position of a scaled score on a reference scale (mean = 50, sd = 21.06) so that the scaled scores from different distributions with different scales can be compared.
The use of NCE scores does not impose a normal distribution on the data, nor does the use of NCE scores have any relationship to norm-referenced tests.
NCEs are excellent for looking at scores over time. (Source: Using Data to Improve Student Learning in High Schools, Victoria L. Bernhardt)
NCE Scores Are About Position
To calculate an NCE score:
Calculate the z-score of the data value of interest, that is, the number of standard deviations the data value is from the mean of its distribution: z = (x − mean) / SD
The NCE score is calculated using the following formula: NCE = 50 + 21.06 × z
George scores a 655 on the SAT mathematics exam.
George also scores a 28 on the ACT mathematics exam.
Which score should he report to his colleges if he wants to provide the “better” score?
A Matter of Comparison
How do we compare George’s scores?
The nature of each distribution is irrelevant to the question of interest:
        Mean    SD     George
SAT     520     110    655
ACT     20.7    5.0    28
Conversion of both scores to NCE scores allows for the identification of the position of each score on the same scale.
This identification of position provides the capability of comparison since the converted scores will be based on the same distribution parameters.
Which Score Should George Choose to Report?
Using an NCE scale with mean 50 and standard deviation 21.06…
SAT score of 655: NCE score 75.85
ACT score of 28: NCE score 80.74
Clearly, he should report his ACT score!
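George's comparison can be checked with a few lines of Python. This is a minimal sketch that assumes the z-score rescaling onto a scale with mean 50 and SD 21.06; the function name nce is illustrative.

```python
NCE_MEAN, NCE_SD = 50.0, 21.06


def nce(score, mean, sd):
    """Convert a raw score to an NCE score given its distribution's mean and SD."""
    z = (score - mean) / sd          # position of the score in its own distribution
    return NCE_MEAN + NCE_SD * z     # same position on the common NCE scale


sat_nce = nce(655, mean=520, sd=110)
act_nce = nce(28, mean=20.7, sd=5.0)
better = "ACT" if act_nce > sat_nce else "SAT"

print(f"SAT 655 -> NCE {sat_nce:.2f}")
print(f"ACT 28  -> NCE {act_nce:.2f}")
print(f"George should report his {better} score.")
```

The SAT value rounds to 75.85. The ACT value comes out near 80.75 with these inputs; the 80.74 shown on the slide may reflect truncation rather than rounding.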
Consider Another Hypothetical Scenario…
In 2006, Wilma was in 4th grade and scored as follows on the 4th grade PSSA:
Mean for 4th Grade – 2006 = 1303.24
Standard Deviation for 4th Grade – 2006 = 164.20
Wilma’s scaled score = 1425
In 2005, Wilma was in 3rd grade and scored as follows on the 3rd grade PSSA:
Mean for 3rd Grade – 2005 = 1356.75
Standard Deviation for 3rd Grade – 2005 = 126.20
Wilma’s scaled score = 1425
Do these scores indicate that Wilma progressed during 4th grade?
Let’s Look at it Graphically…
Even though Wilma’s scaled scores were the same (both 1425), since the distributions were different, we really can’t compare the two scores…
A Tentative Solution: Conversion to Percentiles
In our example, Wilma’s score of 1425 was in the 66th percentile for 2005 but was in the 76th percentile for 2006. These percentiles focus on Wilma’s position in each distribution.
We cannot calculate Wilma’s gain – the difference of percentiles does not make sense…
Percentiles are not meaningful for calculating means for different years, gains, etc., since they are calculated from different distributions.
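One additional, commonly cited reason that percentile arithmetic is unreliable is that percentile ranks are not equally spaced along the score scale. The sketch below illustrates this under an explicit assumption that is not part of the presentation: a normal reference distribution. It uses only Python's standard library, and the function name z_distance is illustrative.

```python
from statistics import NormalDist

# Assumption for illustration only: a standard normal reference distribution.
standard_normal = NormalDist()


def z_distance(p_low, p_high):
    """Distance, in standard deviations, between two percentile ranks."""
    return (standard_normal.inv_cdf(p_high / 100)
            - standard_normal.inv_cdf(p_low / 100))


middle = z_distance(50, 60)   # a 10-point move near the center
tail = z_distance(89, 99)     # a 10-point move in the upper tail

print(f"50th -> 60th percentile: {middle:.2f} SD")
print(f"89th -> 99th percentile: {tail:.2f} SD")
```

Under this assumption, the same 10-point percentile difference corresponds to very different distances on the underlying score scale, so subtracting or averaging percentiles mixes unequal units.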
The Complete Solution: Conversion to NCE Scores
To establish a basis of comparison for different distributions from different schools in different years, we convert the scaled scores to units in the SAME scale.
The scale we will use is from the NCE distribution with mean 50 and standard deviation approximately equal to 21.06.
The NCE Distribution and Wilma
Wilma’s NCE score for 2005 (3rd grade) is 61, while her score for 2006 (4th grade) is 66.
Wilma’s gain = 2006 NCE score (4th grade) – 2005 NCE score (3rd grade)
= 66 – 61
= +5
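Wilma's NCE scores and gain can be reproduced from the means and standard deviations given in the scenario. This is a minimal sketch assuming the z-score rescaling onto a scale with mean 50 and SD 21.06, with each NCE score rounded to a whole number as on the slide; the function name nce is illustrative.

```python
NCE_MEAN, NCE_SD = 50.0, 21.06


def nce(score, mean, sd):
    """Convert a scaled score to the NCE scale using that year's mean and SD."""
    return NCE_MEAN + NCE_SD * (score - mean) / sd


# Same scaled score (1425) in both years, but different distributions.
nce_2005 = round(nce(1425, mean=1356.75, sd=126.20))   # 3rd grade
nce_2006 = round(nce(1425, mean=1303.24, sd=164.20))   # 4th grade
gain = nce_2006 - nce_2005

print(f"2005 NCE: {nce_2005}")
print(f"2006 NCE: {nce_2006}")
print(f"Gain:     {gain:+d}")
```

Rounding before subtracting matches the slide's arithmetic. Without rounding, the gain from these inputs is about 4.2 NCE points.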
The mean gain of all of the students in Wilma’s cohort can now be compared to the Growth Standard for that cohort.
PVAAS Statewide Methodology
[Diagram: Student A Test Score (2009); Student A Base Year NCE Score (2006); 2009 Observed School Mean NCE Scores]
The Problem with the Mean of the Observed Scores
The mean of the observed NCE scores at best represents a single snapshot in time of student achievement of the PSSA Anchors…
Is it the most comprehensive assessment of the school’s TRUE level of achievement?
How about the Bad Day syndrome?
Observed vs. Composite Estimate… Which is better?
What if we combined the new, observed data with all of the prior PSSA assessment information that we have for this cohort of students?
Would not a longitudinal view of the cohort’s performance yield a more precise and reliable estimate of the true level of achievement?
This is the essence and power of the
Consider an Example…
Determine the percent of candies that are blue…
If you were to open only one bag and find that 13% of the candies are blue, how much confidence would you have in your estimate of the true percentage of blue candies for all candies?
Only One Sample? A Bit Risky…
Let’s open 50 bags and look at the distribution of the percents of blue candies…
Looking at these 50 bags, what would you estimate the “true” percent of blue candies for all candies?
Let’s open 50 more bags and add them to the 50 selected earlier…
[Figure: distribution with n = 50; distribution with n = 100]
With this additional data, we can make a better estimate of the true percent of blue candies!
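The candy example can be imitated with a small simulation. The sketch below is illustrative only: the true share of blue candies (20%) and the bag size (50 candies) are made-up assumptions, and the function names are hypothetical. It opens 50 simulated bags, then 50 more, and reports the estimate and its standard error each time.

```python
import random
import statistics


def open_bag(rng, candies_per_bag=50, true_blue=0.20):
    """Return the percent of blue candies in one simulated bag.

    candies_per_bag and true_blue are assumed values for illustration.
    """
    blue = sum(rng.random() < true_blue for _ in range(candies_per_bag))
    return 100.0 * blue / candies_per_bag


def standard_error(percents):
    """Standard error of the mean of the bag percentages."""
    return statistics.stdev(percents) / len(percents) ** 0.5


rng = random.Random(2006)   # fixed seed so the run is repeatable
first_50 = [open_bag(rng) for _ in range(50)]
all_100 = first_50 + [open_bag(rng) for _ in range(50)]

for label, bags in (("n = 50", first_50), ("n = 100", all_100)):
    print(f"{label}: estimate {statistics.mean(bags):.1f}% blue, "
          f"standard error {standard_error(bags):.2f}")
```

For a similar spread of bag percentages, the standard error shrinks as the number of bags grows, which is the sense in which the additional data supports a better estimate.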
PVAAS Statewide Methodology
Inputs: 2009 Observed School Mean NCE Scores; 2008, 2007, and 2006 Estimated School Mean NCE Scores
Output: 2009 Estimated School Mean NCE Scores
Gain = 2009 Estimate – 2008 Estimate
Compare to Growth Standard
School Rating
How to Measure Growth of a School?
Using a Growth Standard
Student scaled scores are converted to NCE scores (2006 parameters).
The mean NCE score for each school is calculated.
PVAAS revises all earlier estimates based on the addition of the current data.
PVAAS calculates an estimated NCE mean score.
Estimated Mean NCE Gain = Current Estimated NCE mean – Previous Estimated NCE mean
Gain is compared to Growth Standard for School Effect Rating.
Here is the Fall 2006 PVAAS District/School Report
Gain Ratings
Mean NCE Gain for a cohort in a given year represents the progress of students in that cohort relative to the Growth Standard of 0.
Color ratings:
Green: mean gain is greater than or equal to the Growth Standard (favorable indicator)
Yellow: mean gain is less than one SE below the Growth Standard (warning sign)
Light Red: mean gain is between one and two SEs below the Growth Standard (stronger caution)
Red: mean gain is more than two SEs below the Growth Standard (most serious warning)
Level of Evidence – The Role of Standard Error
The color-coded ratings on the mean gain of cohorts are based on the level of confidence we have that the gain of the cohort is truly below the Growth Standard…
At or above the Growth Standard: THE GOAL
Less than 1 SE below the Growth Standard: Slight Evidence of Lack of Progress
Between 1 and 2 SEs below the Growth Standard: Greater Evidence of Lack of Progress
More than 2 SEs below the Growth Standard: Significant Evidence of Lack of Progress
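The rating rules can be sketched as a small Python function. This is an illustration of the thresholds described in this presentation, not PVAAS code: the function name gain_rating is hypothetical, and the handling of gains that fall exactly one or exactly two standard errors below the Growth Standard is an assumption, since the slides do not specify it.

```python
def gain_rating(mean_gain, standard_error, growth_standard=0.0):
    """Return (color, level of evidence) for a cohort's mean NCE gain.

    Assumption: a gain exactly 1 SE or exactly 2 SEs below the standard
    is treated as Light Red; the source does not define these boundaries.
    """
    if standard_error <= 0:
        raise ValueError("standard_error must be positive")
    if mean_gain >= growth_standard:
        return "Green", "At or above the Growth Standard"

    # How many standard errors the gain falls below the Growth Standard.
    se_below = (growth_standard - mean_gain) / standard_error
    if se_below < 1:
        return "Yellow", "Slight evidence of lack of progress"
    if se_below <= 2:
        return "Light Red", "Greater evidence of lack of progress"
    return "Red", "Significant evidence of lack of progress"


for gain in (0.5, -0.5, -1.5, -2.5):
    color, evidence = gain_rating(gain, standard_error=1.0)
    print(f"gain {gain:+.1f}: {color} ({evidence})")
```

Expressing the shortfall in standard errors rather than raw NCE points is what ties the color to the level of confidence that the cohort's gain is truly below the Growth Standard.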
The Power of PVAAS
The power of this methodology is that it produces:
Accurate estimates of the true level of achievement of the students in this school.
Updated estimates of all prior mean performance estimates simultaneously as new data is input into the longitudinal data structure.
Over time, more accurate and reliable estimates of the true level of understanding of the students in this grade or school.
For more information, contact:
Gerald L. Zahorchak, D.Ed.
Secretary of Education
Commonwealth of Pennsylvania
333 Market Street
Harrisburg, PA 17126
www.pde.state.pa.us