Measuring Positive Development of Youth in Context: The Design and Validation of an Embedded
Assessment System for Out-of-School Time Programs
Objective
Current research in positive youth development (PYD) is framed within developmental systems
models that recognize the bidirectional relationship between individuals and context (Jelicic, Theokas,
Phelps & Lerner, 2007; Lerner & Steinberg, 2009; Overton, 2010). However, the creation of ecologically
validated measures that assess the positive behavior and well-being of individuals within developmental
contexts has lagged (Ramey & Rose-Krasnor, 2012). Such measures are essential for supporting the
growth of individual children, for evaluating and improving context-specific curricula, and for testing
hypothesized links between program inputs and desired outcomes over time.
This paper describes the creation and initial validation study of the Desired Results
Developmental Profile-School Age (2011) Complete Version (DRDP-SA; CDE, 2011a), a strengths-based
measure created to assess the positive cognitive, socio-emotional, language and physical development
of youth who participate in before- and after-school programs funded by the California Department of
Education (CDE). Eligibility for such programs, which serve youth ages 6 through 12, is based on
families' financial need or other at-risk criteria. The DRDP-SA is designed to allow for flexibility in the
structure and objectives of individual youth development programs, while remaining sensitive to the
economic, linguistic, and cultural diversity of youth and families that such programs serve (CDE, 2011b).
At a time when governmental testing systems are widely criticized for implementing narrow summative
assessment systems, the DRDP-SA represents a unique approach to valuing and measuring outcomes
that contribute to the overall well-being and positive development of youth.
Perspectives and Theoretical Frameworks
The DRDP-SA was constructed based on the principles of the BEAR (Berkeley Evaluation and
Assessment Research) Assessment System (BAS; Wilson & Sloane, 2000; Wilson, 2005). First, the DRDP-
SA assessment is built around four Desired Results (DR1-4, listed below), defined by CDE as conditions
of well-being for children, which draw upon primary components of PYD theories (e.g., Benson, Scales,
Hamilton & Sesma, 2006; Damon, 2004; Lerner, 2009).
DR1: Children are personally and socially competent.
DR2: Children are effective learners.
DR3: Children show physical and motor competence.
DR4: Children are safe and healthy.
Each DR is associated with one or more domains, which represent crucial areas of learning and
development for young children. A domain defines a DR more specifically so that it can be measured.
The six domains are Self and Social Development; Health; Language and Literacy Development;
Cognitive Development; Mathematical Development; and Physical Development. There are multiple
measures for each domain. Each measure focuses on a specific competency, conceptualized as a
developmental learning progression (Wilson, 2005). An example of a measure from the Self and Social
Development Domain of the DRDP-SA is shown in Figure 1.
The learning progressions, or developmental levels, that are found within each measure are
grounded in the relevant child development literature and elaborated with examples of observable
behaviors that a youth might exhibit in the context of a program setting. For instance, the first
development level of Measure 3: Empathy, from the Self and Social Development domain, is
“Demonstrates awareness of own feelings.” An example of this developmental level that a practitioner
might document is a child who draws a picture to show how she feels, or who says, “I feel really happy.”
(see Figure 1). Initial drafts of measures were created by teams of youth development researchers,
psychometric experts, and seasoned practitioners.
Second, as an embedded assessment system, the DRDP-SA integrates assessment into the
curriculum and regular program context (Wilson & Sloane, 2000). Thus, the DRDP-SA differs from most
other PYD measures in that it does not index the presence or absence of an outcome by asking youth to
self-report on behaviors, nor does it require providers to create artificial testing events. Instead,
providers are trained to observe and collect evidence of positive youth behaviors and learning (e.g.,
using quotes, drawings, writings, and notations of actions) that occur during ongoing program activities.
Providers are further trained to reflect on their observations and to make nuanced evaluations, based
on sequential developmental landmarks identified in the DRDP-SA, of what individual youths know and
what they can do. Because assessments are made during integrated activities, results of the assessment
can be used to plan curriculum for individual children or groups of children, as well as to support
continuous program improvement. In this way, a strong and meaningful connection is made between
assessment for purposes of both instruction and accountability (Wilson, 2010), framed by the
expectations of the Desired Results for PYD.
Finally, the technical quality of the instrument is addressed through a state-wide calibration
study, the application of generalized forms of item response models (IRMs), and consideration of
sources of evidence for validity, reliability and fairness as established by the Standards for Educational
and Psychological Testing (AERA, APA, NCME, 1999). The measurement model that is used for quality
assurance purposes defines how inferences are drawn from the scored observations of practitioners.
The output of IRMs provides useful information that is not available through numerical score averages
and other traditional summary techniques.
Methods
Data for the calibration study were collected by staff from state-funded before- and after-school
programs sampled from throughout California. Programs included in the sample were selected from a
roster of all school-age programs served by the Child Development Division of CDE, by region within
California and by socioeconomic status within region. Staff were trained in observing children and
completing the DRDP-SA in day-long seminars held throughout the state. Following the training
seminars, staff returned to their programs to observe and evaluate a sample of the children under their
care (each staff member rated between one and six children; the median number of rated children per
teacher was three).
The DRDP-SAs were completed only for children who attended the program consistently for at
least 10 hours per week in the previous month. In addition, only children who had been under the rater's
direct care for at least 30 days were rated. Following the training sessions, the teachers had about 2-4
months to observe the children, complete the DRDP-SAs, and return them to the study center. The
teachers were compensated for their time and effort.
We conducted six unidimensional partial credit analyses, as well as multidimensional partial
credit analyses, with the data set described below. All item response models were estimated using the
ConQuest (Wu, Adams & Wilson, 1998) software program.
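For readers unfamiliar with the partial credit model, the probabilities it assigns to each developmental level can be sketched as follows. This is a minimal illustration: the function name and step-difficulty values are our own, not parameter estimates from this study.

```python
import math

def pcm_probs(theta, deltas):
    """Partial credit model: probabilities of the score categories
    0..m for a person with ability theta (in logits), given the m
    step difficulties in deltas. Category 0 carries no step term."""
    # The exponent for category k is the cumulative sum of
    # (theta - delta_j) over the first k steps.
    exponents = [0.0]
    for delta in deltas:
        exponents.append(exponents[-1] + (theta - delta))
    denom = sum(math.exp(e) for e in exponents)
    return [math.exp(e) / denom for e in exponents]

# A four-level measure (three steps); the values are illustrative only.
probs = pcm_probs(theta=0.5, deltas=[-1.0, 0.0, 1.2])
```

Raising theta shifts probability mass toward the higher developmental levels, which is the pattern the Wright maps in the Results section display graphically.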
Data
The data comprised a total of 705 completed and acceptable DRDP-SAs. A primary goal of
the initial validity study was to collect evidence about a representative sample of youth across age
groups, gender and ethnicity, so that the DRDP-SA could be used to make valid inferences about the
population, and to evaluate the sensitivity of the instrument to ethnicity and gender. Children's
ethnicity was reasonably representative of the California population at this age: 57% Hispanic-American,
18% European-American, 10% African-American, 4% Asian-American, and 11% of other ethnicities,
compared to the 2008-2009 California school-age enrollment statistics available online from the
California Department of Education's data and statistics site. Table 1 lists the demographic distribution
of the sample.
Qualitative data were also collected from teachers and site administrators through interviews and
surveys, and analyzed to inform the development team about the strengths and weaknesses of the
instrument.
Results
The main conclusion is that the DRDP-SA was reasonably well calibrated using the study sample
and exhibited technical properties that offer strong validity evidence. One domain, Physical
Development, exhibited some problems at the item and domain level, and should be examined in
further iterations of the instrument. Results are presented in detail below for Wright Map item-person
distributions; item fit; reliability; and internal structure.
Wright Maps: Wright maps illustrate how individual persons and DRDP-SA measures are
distributed along a shared ability continuum on a logit scale for each domain. (Technically, the logit is
the log of the odds ratio. The logit scale is commonly used in psychometric research and can be easily
rescaled to any other score range without loss of generality.) Analysis of Wright maps for the six
domains showed good correspondence between persons and items, with the exception of the Physical
Development domain, which suggests that some of the measures may be too easy or that they do not
differentiate well among children with high levels of physical development.
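The logit transformation and rescaling mentioned above can be sketched in a few lines. The reporting-scale constants (a mean of 500 and SD of 50) are arbitrary illustrative choices, not the DRDP-SA's reporting scale.

```python
import math

def logit(p):
    """Log-odds of a probability p; the unit of the Wright map scale."""
    return math.log(p / (1.0 - p))

def rescale(x_logits, new_mean=500.0, new_sd=50.0):
    """Linear rescaling of a logit-scale score to an arbitrary
    reporting scale. A linear map preserves all order and distance
    comparisons, hence 'without loss of generality'."""
    return new_mean + new_sd * x_logits
```

For example, a probability of .5 corresponds to 0 logits, the point on a Wright map where a person's ability exactly matches an item's difficulty.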
Item fit: We inspected how each measure fits the model. For item fit statistics, we used the
weighted mean square (WMS) and the corresponding t-statistic (Wright & Masters, 1982). For each
measure, the fit statistic compares the variability in the observers' ratings with that expected by the
model, given the distribution of ability scores. Only one measure, Exercise and Fitness, showed more
variance than expected (implying inconsistency). However, a single measure with higher-than-expected
variance is no more than would be expected by chance in an item set of this size; hence, the total set
can be considered within the acceptable range.
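The weighted (infit) mean square compares observed squared residuals with the rating variance the model expects; values near 1 indicate adequate fit. A minimal sketch, assuming the model-expected scores and variances for each rating have already been computed (the function name is ours):

```python
def weighted_mean_square(observed, expected, variances):
    """Information-weighted (infit) mean square for one measure:
    the sum of squared residuals across persons, divided by the sum
    of the model-implied rating variances. Values well above 1 mean
    more rating variability than the model expects."""
    residual_ss = sum((x - e) ** 2 for x, e in zip(observed, expected))
    return residual_ss / sum(variances)
```

When observed ratings scatter around their expectations exactly as much as the model predicts, the ratio is 1; the Exercise and Fitness measure exceeded this benchmark.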
Reliability: Overall, the DRDP-SA showed very high internal consistency, with reliabilities
ranging from .85 to .99. The reliability of the DRDP-SA was also evaluated through an examination of
the standard errors of individual children's ability estimates. Generally, standard errors were smaller in
the middle range than at the tails of the ability distributions. Because student abilities in the middle of
the ability distribution matched the distribution of the items, more information was collected there
and, hence, abilities were estimated more precisely.
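The link between item-ability match and precision can be illustrated with the test information function. For simplicity, the sketch below uses dichotomous Rasch items rather than the polytomous DRDP-SA measures; the function names and difficulty values are illustrative.

```python
import math

def rasch_p(theta, b):
    """Probability of success on a Rasch item of difficulty b
    for a person of ability theta (both in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def standard_error(theta, difficulties):
    """Standard error of an ability estimate: the inverse square
    root of the test information, which for Rasch items is the sum
    of p * (1 - p) across items. Information peaks where the items
    are targeted, so standard errors are smallest there."""
    info = 0.0
    for b in difficulties:
        p = rasch_p(theta, b)
        info += p * (1.0 - p)
    return 1.0 / math.sqrt(info)
```

For items centered near zero logits, the standard error at theta = 0 is smaller than at theta = 3, matching the pattern of larger errors at the tails reported above.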
Internal Structure: We examined differential item functioning (DIF) by gender (male vs. female)
and by ethnicity (African-American vs. European-American; Hispanic-American vs. European-American) for
the six domains. An item (i.e., a DRDP-SA measure) is flagged as exhibiting DIF if it performs differently
across the groups of interest (i.e., gender or ethnicity) for children who possess the same ability in a
specific area (e.g., a DRDP-SA domain). In other words, DIF is identified when an item obtains
different difficulty estimates for the two groups of interest, conditional on the same ability in that domain. For
instance, an item is flagged for gender DIF if, on average, it is more difficult for females of a
certain ability than for males of the same ability. Only one item, Exercise and Fitness, exhibited
statistically significant gender DIF, with a large effect size favoring males. Since this item was flagged in
both the item fit and the DIF analyses, it should be reviewed in future iterations of the instrument. There
was no evidence of ethnicity DIF for any measure, either for Hispanic-American vs. European-American
or for African-American vs. European-American.
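One simple way to operationalize such a comparison is a Wald-style test on an item's difficulty estimated separately in the two groups after the calibrations have been linked to a common scale. This is a generic sketch under that assumption, not the study's actual DIF procedure, and the numeric values are illustrative.

```python
import math

def dif_flag(b_focal, se_focal, b_ref, se_ref, z_crit=1.96):
    """Compare one item's difficulty estimated separately in a focal
    and a reference group (e.g., females vs. males) on a common
    scale. Returns the difficulty difference in logits (an effect
    size) and whether it is statistically significant."""
    diff = b_focal - b_ref
    z = diff / math.sqrt(se_focal ** 2 + se_ref ** 2)
    return diff, abs(z) > z_crit

# Illustrative: the item is 0.6 logits harder for the focal group.
effect, flagged = dif_flag(0.8, 0.1, 0.2, 0.1)
```

In practice both the statistical test and an effect-size criterion are consulted, since with large samples trivially small differences can reach significance.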
Scientific and Scholarly Significance
The BEAR Assessment System (BAS; Wilson, 2005; Wilson & Sloane, 2000) yielded the DRDP-SA,
a tool for assessing positive behavior and learning, tracking progress over time, and providing feedback
on individual and group progress. The DRDP-SA accommodates the bidirectional relationship of an
individual’s development in context, lending support to claims for its ecological validity. Application of
the BAS principles, as exemplified by the DRDP-SA, provides evidence of how assessments that are both
developmentally appropriate and of high technical quality can be developed to support instructional and
accountability purposes in youth development interventions.
Figure 1. Measure 3: Empathy, from the Self and Social Development Domain.
Table 1. Demographic Distribution of the Calibration Sample (N = 705)

                                                Percent
Gender                 Female                     50%
Ethnicity              African-American            7%
                       Asian-American              3%
                       European-American          22%
                       Hispanic-American          59%
                       Other                       8%
                       Missing                     1%
Language spoken        English                    54%
at home                Spanish                    30%
                       Other                       2%
                       Bilingual(a)                4%
Number of weekly       10 or fewer(b)              1%
hours with child       11-20                      69%
                       21-30                      24%
                       31-40                       5%
                       40+                         1%

(a) Child speaks English and Spanish, or English and another language. (b) Most DRDP-Rs that
indicated 10 or fewer hours were discarded from the data set; a few were kept because they were
part of a pair for another condition in the study.
References
American Educational Research Association, American Psychological Association, & National
Council on Measurement in Education. (1999). Standards for educational and
psychological testing. Washington, DC: American Educational Research Association.
Benson, Scales, Hamilton, & Sesma (2006). Positive youth development: Theory, research, and
applications.
California Department of Education (2011a). Desired Results Developmental Profile-School Age
(2011) Complete Version. Retrieved from:
http://www.cde.ca.gov/sp/cd/ci/drdpforms.asp
California Department of Education (2011b). Introduction to Desired Results. Retrieved from:
http://www.cde.ca.gov/sp/cd/ci/desiredresults.asp.
Damon, W. (2004). What is positive youth development? Annals of the American Academy of
Political and Social Science, 591, 13-24.
Jelicic, H., Theokas, C., Phelps, E., & Lerner, R. M. (2007). Conceptualizing and measuring the context
within person ↔ context models of human development: Implications for theory,
research and application. In T. D. Little, J. A. Bovaird, & N. A. Card (Eds.), Modeling
contextual effects in longitudinal studies of human development. Mahwah, NJ:
Erlbaum.
Lerner, R. M., & Steinberg, L. (Eds.). (2009). Handbook of adolescent psychology (3rd ed.).
Hoboken, NJ: Wiley.
Overton, W. (2010). Life-span development: Concepts and issues.
Ramey, H., & Rose-Krasnor, L. (2012). Contexts of structured youth activities and positive youth
development. Child Development Perspectives, 6(1), 85-91.
Wilson, M. (2005). Constructing measures: An item response modeling approach. Mahwah, NJ:
Lawrence Erlbaum.
Wilson, M. (2010). Assessment for learning and for accountability. Retrieved from
http://www.k12center.org/rsc/pdf/WilsonPolicyBrief.pdf.
Wilson, M., & Sloane, K. (2000). From principles to practice: An embedded assessment system.
Applied Measurement in Education, 13(2), 181-208.
Wright, B., & Masters, G. (1982). Rating scale analysis: Rasch measurement. Chicago: MESA
Press.
Wu, M. L., Adams, R. J., & Wilson, M. R. (1998). ACER ConQuest [Computer program and manual].
Hawthorn, Australia: ACER Press.
Sammet, Moore & Wilson (2013). Measuring Positive Development of Youth in Context: The Design and Validation of an Embedded Assessment System for Out-of-School Time Programs.

  • 1. 1 Measuring Positive Development of Youth in Context: The Design and Validation of an Embedded Assessment System for Out-of-School Time Programs Objective Current research in positive youth development (PYD) is framed within developmental systems models that recognize the bidirectional relationship between individuals and context (Jelicic, Theokas, Phelps & Lerner, 2007; Lerner & Steinberg, 2009; Overton, 2010). However, the creation of ecologically validated measures that assess the positive behavior and well-being of individuals within developmental contexts has lagged (Ramey & Rose-Krasnor, 2012). Such measures are essential for supporting the growth of individual children; evaluating and improving context-specific curriculum; as well as for testing hypothesized links between program inputs and desired outcomes over time. This paper describes the creation and initial validation study of the Desired Results Developmental Profile-School Age (2011) Complete Version (DRDP-SA; CDE, 2011a), a strengths-based measure created to assess the positive cognitive, socio-emotional, language and physical development of youth who participate in before- and after-school programs funded by the California Department of Education (CDE). Enrollment eligibility in such programs is based on financial need of families or other at-risk criteria, for youth ages 6 through 12. The DRDP-SA is designed to allow for flexibility in the structure and objectives of individual youth development programs, while remaining sensitive to the economic, linguistic, and cultural diversity of youth and families that such programs serve (CDE, 2011b). At a time when governmental testing systems are widely criticized for implementing narrow summative assessment systems, the DRDP-SA represents a unique approach to valuing and measuring outcomes that contribute to the overall well-being and positive development of youth.
  • 2. 2 Perspectives and Theoretical Frameworks The DRDP-SA was constructed based on the principles of the BEAR (Berkeley Evaluation and Assessment Research) Assessment System (BAS; Wilson & Sloane, 2000; Wilson, 2005). First, the DRDP- SA assessment is built around four Desired Results (DR1-4, listed below)--defined by CDE as conditions of well-being for children--which draw upon primary components of PYD theories (e.g., Benson, Scales, Hamilton & Sesma, 2006; Damon, 2004; Lerner, 2009). DR1: Children are personally and socially competent. DR2: Children are effective learners. DR3: Children show physical and motor competence. DR4: Children are safe and healthy. Each DR is associated with one or more domains, which represent crucial areas of learning and development for young children. A domain defines a DR more specifically so that it can be measured. The six domains include Self and Social Development; Health; Language and Literacy Development; Cognitive Development; Mathematical Development; and Physical Development. There are multiple measures for each domain. Each measure of focuses on a specific competency, conceptualized as a developmental learning progression (Wilson, 2005). An example of a measure from the Self and Social Development Domain of the DRDP-SA is shown in Figure 1. The learning progressions, or developmental levels, that are found within each measure are grounded in the relevant child development literature and elaborated with examples of observable behaviors that a youth might exhibit in the context of a program setting. For instance, the first development level of Measure 3: Empathy, from the Self and social development domain, is “Demonstrates awareness of own feelings.” An example of this developmental level that a practitioner might document is a child who draws a picture to show how she feels, or who says, “I feel really happy.”
  • 3. 3 (see Figure 1). Initial drafts of measures were created by teams of youth development researchers, psychometric experts, and seasoned practitioners. Second, as an embedded assessment system, the DRDP-SA integrates assessment into the curriculum and regular program context (Wilson & Sloane, 2000). Thus, the DRDP-SA differs from most other PYD measures in that it does not index the presence or absence of an outcome by asking youth to self-report on behaviors, nor does it require providers to create artificial testing events. Instead, providers are trained to observe and collect evidence of positive youth behaviors and learning (e.g., using quotes, drawings, writings, notations of actions) that occur during on-going program activities. Providers are further trained to reflect on their observations and to make nuanced evaluations, based on sequential developmental landmarks identified in the DRDP-SA, of what individual youths know and what they can do. Because assessments are made during integrated activities, results of the assessment can be used to plan curriculum for individual children or groups of children, as well as to support continuous program improvement. In this way, a strong and meaningful connection is made between assessment for purposes of both instruction and accountability (Wilson, 2010), framed by the expectations of the Desired Results for PYD. Finally, the technical quality of the instrument is addressed through a state-wide calibration study, the application of generalized forms of item response models (IRMs), and consideration of sources of evidence for validity, reliability and fairness as established by the Standards for Educational and Psychological Testing (AERA, APA, NCME, 1999). The measurement model that is used for quality assurance purposes defines how inferences are drawn from the scored observations of practitioners. 
The output of IRMs provides useful information that is not available through numerical scores averages and other traditional summary techniques.
Methods

Data for the calibration study were collected by staff from state-funded before- and after-school programs sampled from throughout California. Programs included in the sample were drawn from a roster of all school-age programs served by the Child Development Division of the CDE, stratified by region within California and by socioeconomic status within region. Staff were trained to observe children and complete the DRDP-SA in day-long seminars held throughout the state. Following the training seminars, staff returned to their programs to observe and evaluate a sample of the children under their care (each staff member rated between one and six children; the median number of rated children per teacher was three). DRDP-SAs were completed only for children who had attended the program consistently for at least 10 hours per week in the previous month; in addition, only children who had been under the rater's direct care for at least 30 days were rated. Following the training sessions, the teachers had about 2-4 months to observe the children, complete the DRDP-SAs, and return them to the study center. The teachers were compensated for their time and effort.

We conducted six unidimensional partial credit analyses, as well as multidimensional partial credit analyses, with the data set described below. All item response models were estimated using the ConQuest software program (Wu, Adams & Wilson, 1998).

Data

The data comprised a total of 705 completed and acceptable DRDP-SAs. A primary goal of the initial validity study was to collect evidence from a representative sample of youth across age groups, gender, and ethnicity, so that the DRDP-SA could be used to make valid inferences about the population, and to evaluate the sensitivity of the instrument to ethnicity and gender. Children's ethnicity was reasonably representative of the California population at this age: 57% Hispanic-American, 18% European-American, 10% African-American, 4% Asian-American, and 11% of other ethnicities, compared to the 2008-2009 California school-age enrollment statistics available online from the California Department of Education's data and statistics site. Table 1 lists the demographic distribution of the sample. Qualitative data were also collected from teachers and site administrators using interviews and surveys, and analyzed to inform the development team about the strengths and weaknesses of the instrument.

Results

The main conclusion is that the DRDP-SA was reasonably well calibrated using the study sample and exhibited technical properties that offer strong validity evidence. One domain, Physical Development, exhibited some problems at the item and domain level and should be examined in further iterations of the instrument. Results are presented in detail below for Wright map item-person distributions, item fit, reliability, and internal structure.

Wright maps: Wright maps illustrate how individual persons and DRDP-SA measures are distributed along a shared ability continuum on a logit scale for each domain. (Technically, the logit is the log of the odds ratio. The logit scale is commonly used in psychometric research and can easily be rescaled to any other score range without loss of generality.) Analysis of the Wright maps for the six domains showed good correspondence between persons and items, with the exception of the Physical Development domain, which suggests that some of its measures may be too easy or may not differentiate well among children with high levels of physical development.

Item fit: We inspected how well each measure fit the model. For item fit statistics, we used the weighted mean square (WMS) and the corresponding t-statistic (Wright & Masters, 1982). For each measure, the fit statistic compares the variability in the observers' ratings with that expected by the model, given the distribution of ability scores. Only one measure, Exercise and Fitness, showed more variance than expected (implying inconsistent ratings). However, a single measure with higher-than-expected WMS variance is no more than would be expected by chance in an item set of this size; hence, the total set can be considered within the acceptable range.

Reliability: Overall, the DRDP-SA showed very high internal consistency, with coefficients ranging from .85 to .99 across domains. The reliability of the DRDP-SA was also evaluated through an examination of individual children's standard errors of measurement. Generally, standard errors were smaller in the middle range of the ability distributions than at the tails. Because the items were concentrated where most children's abilities fell, more information was available in the middle range and, hence, abilities there were estimated more precisely.

Internal structure: We examined differential item functioning (DIF) by gender (male vs. female) and ethnicity (African-American vs. European-American; Hispanic-American vs. European-American) for the six domains. An item (i.e., a DRDP-SA measure) is flagged for DIF if it performs differently across the groups of interest for children who possess the same level of ability in a specific domain; in other words, DIF is identified if the item yields different difficulty estimates for the two groups after conditioning on domain ability. For instance, an item is flagged for gender DIF if, on average, it is more difficult for females of a certain ability than for males of the same ability. Only one item, Exercise and Fitness, exhibited statistically significant gender DIF, with a large effect size favoring males. Since this item was flagged by both the item fit and the DIF analyses, it should be reviewed in future iterations of the instrument. There was no evidence of DIF on ethnicity for any measure, either for Hispanic-American vs. European-American or for African-American vs. European-American.

Scientific or Scholarly Significance

The DRDP-SA is the product of applying the BEAR Assessment System (BAS; Wilson, 2005; Wilson & Sloane, 2000): a tool for assessing positive behavior and learning, tracking progress over time, and providing feedback on individual and group progress. The DRDP-SA accommodates the bidirectional relationship of an individual's development in context, lending support to claims for its ecological validity. Application of the BAS principles, as exemplified by the DRDP-SA, provides evidence of how assessments that are both developmentally appropriate and of high technical quality can be developed to support instructional and accountability purposes in youth development interventions.
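The Rasch-family quantities discussed above (the logit metric, the weighted mean square fit statistic, and the DIF comparison) can be sketched in a few lines of code. The sketch below is illustrative only: it uses the dichotomous Rasch model rather than the partial credit model for simplicity, all numeric values are invented rather than taken from the study, and it is not the ConQuest implementation.

```python
import numpy as np

def rasch_prob(theta, b):
    """P(positive rating) under the dichotomous Rasch model:
    the logit of the probability is ability minus difficulty."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def rescale(logits, mean=500.0, sd=100.0):
    """Linearly rescale logit values to an arbitrary reporting metric
    (here a mean-500, SD-100 scale), illustrating that the logit scale
    can be rescaled without loss of generality."""
    z = (logits - np.mean(logits)) / np.std(logits)
    return mean + sd * z

def weighted_mean_square(responses, theta, b):
    """Infit (weighted mean square) for one item: squared residuals
    summed over persons, weighted by the model variance p*(1-p).
    Values near 1.0 indicate fit; values well above 1.0 indicate more
    variance than the model expects (inconsistent ratings)."""
    p = rasch_prob(theta, b)
    resid_sq = (responses - p) ** 2
    return np.sum(resid_sq) / np.sum(p * (1.0 - p))

def dif_flagged(b_group1, b_group2, threshold=0.43):
    """Flag DIF when an item's difficulty, estimated separately for two
    groups on a common ability metric, differs by more than an
    effect-size threshold (0.43 logits is one conventional cut for
    'moderate' DIF; the value here is illustrative)."""
    return abs(b_group1 - b_group2) > threshold

# Illustrative use: two children rated on one item of difficulty 0.
theta = np.array([2.0, -2.0])   # abilities (logits)
responses = np.array([1, 0])    # ratings consistent with the model
print(weighted_mean_square(responses, theta, b=0.0))  # well below 1.0
print(dif_flagged(b_group1=0.9, b_group2=0.1))        # True: flag for review
```

In operational software the item difficulties and person abilities are estimated jointly (e.g., by marginal maximum likelihood, as in ConQuest); the functions above only evaluate the fit and DIF criteria once those estimates are in hand.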
Figure 1. Measure 3: Empathy, from the Self and Social Development Domain.
Table 1. Demographic Distribution of the Calibration Sample (N = 705)

                                          Percent
Gender
    Female                                50%
Ethnicity
    African-American                       7%
    Asian-American                         3%
    European-American                     22%
    Hispanic-American                     59%
    Other                                  8%
    Missing                                1%
Language Spoken at Home
    English                               54%
    Spanish                               30%
    Other                                  2%
    Bilingual^a                            4%
Number of Weekly Hours with Child
    10 or less^b                           1%
    11-20                                 69%
    21-30                                 24%
    31-40                                  5%
    40+                                    1%

^a Child speaks English and Spanish, or English and another language.
^b Most DRDP-Rs that indicated 10 or fewer hours were discarded from the data set; a few were kept because they were part of a pair for another condition in the study.
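The inclusion rules noted in the Methods section and in footnote b of Table 1 (at least 10 hours per week of attendance, and at least 30 days under the rater's direct care) amount to a simple record filter. A minimal sketch, with hypothetical field names and invented records:

```python
# Minimal sketch of the study's inclusion criteria.
# Field names are hypothetical; the study's actual data format is not shown.
records = [
    {"child_id": "A", "weekly_hours": 12, "days_with_rater": 45},
    {"child_id": "B", "weekly_hours": 8,  "days_with_rater": 60},  # too few hours
    {"child_id": "C", "weekly_hours": 25, "days_with_rater": 20},  # too few days
]

eligible = [
    r for r in records
    if r["weekly_hours"] >= 10 and r["days_with_rater"] >= 30
]
print([r["child_id"] for r in eligible])  # prints ['A']
```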
References

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Benson, P. L., Scales, P. C., Hamilton, S. F., & Sesma, A. (2006). Positive youth development: Theory, research, and applications.
California Department of Education. (2011a). Desired Results Developmental Profile-School Age (2011) Complete Version. Retrieved from http://www.cde.ca.gov/sp/cd/ci/drdpforms.asp
California Department of Education. (2011b). Introduction to Desired Results. Retrieved from http://www.cde.ca.gov/sp/cd/ci/desiredresults.asp
Damon, W. (2004). What is positive youth development? Annals of the American Academy of Political and Social Science, 591, 13-24.
Jelicic, H., Theokas, C., Phelps, E., & Lerner, R. M. (2007). Conceptualizing and measuring the context within person-context models of human development: Implications for theory, research and application. In T. D. Little, J. A. Bovaird, & N. A. Card (Eds.), Modeling contextual effects in longitudinal studies of human development. Mahwah, NJ: Erlbaum.
Lerner, R. M., & Steinberg, L. (Eds.). (2009). Handbook of adolescent psychology (3rd ed.). Hoboken, NJ: Wiley.
Overton, W. (2010). Life-span development: Concepts and issues.
Ramey, H., & Rose-Krasnor, L. (2012). Contexts of structured youth activities and positive youth development. Child Development Perspectives, 6(1), 85-91.
Wilson, M. (2005). Constructing measures: An item response modeling approach. Mahwah, NJ: Lawrence Erlbaum.
Wilson, M. (2010). Assessment for learning and for accountability. Retrieved from http://www.k12center.org/rsc/pdf/WilsonPolicyBrief.pdf
Wilson, M., & Sloane, K. (2000). From principles to practice: An embedded assessment system. Applied Measurement in Education, 13(2), 181-208.
Wright, B., & Masters, G. (1982). Rating scale analysis: Rasch measurement. Chicago: MESA Press.
Wu, M. L., Adams, R. J., & Wilson, M. R. (1998). ACER ConQuest [Computer program and manual]. Hawthorn, Australia: ACER Press.