Ch-13 & 14
Cizek & Bunch
Brikena Haxhiraj
1
Ch-13
Scheduling Standard Setting Activities
 Chapter Goals:
 The authors suggest ways to schedule standard setting in two types of assessments, drawing primarily on their experiences in large-scale credentialing programs and educational assessments, and provide examples of each standard-setting activity.
1. Scheduling standard setting for educational assessments
2. Scheduling standard setting for credentialing programs
2
Scheduling standard setting for educational
assessment
 Table 13-1 (pp. 219-221) provides an overview of the main activities to be completed, along with a timetable for their completion.
 A generic version of the table can also be found at
www.sagepub.com/cizek/schedule
 This table shows the planning for standard setting beginning two years
before the actual standard setting session.
3
1. Overall Plan
 Establish performance level labels (PLLs) and performance level descriptions (PLDs)
 Drafting a standard-setting plan before item writing begins is one way to make sure the test supports the standard-setting activity that is eventually carried out.
 Table 13-1 shows a field test exactly one year prior to the first operational administration
of the test. During the first year, a regular testing window would be reserved for field
testing.
 The planning should specify: a) a method, b) an agenda, c) training procedures and d)
analysis procedures.
 Technical advisory committee (TAC).
 Stakeholder review
4
2. Participants
 Identify and recruit the individuals who will participate in the standard setting activity (i.e., the panelists).
 For statewide assessments, it is preferable that the panelists be as representative of the state as possible.
 Table 13-1 shows the process of identifying these individuals about nine months before standard setting
begins.
 Creation of the standard-setting panels is a three-step process:
1. Local superintendents or their designees identify potential panelists in accordance with specifications provided by the state education agency.
2. Candidates are notified before their names are submitted, via an initial letter sent to all candidates.
3. State agency staff sort the nominations to create the required number of panels with the approved numbers of panelists.
5
3. Materials
 Training materials, forms and data analysis programs
 The timing of preparing these materials is crucial
 Some can be prepared in advance and some cannot (refer to Tables 13-2 and 13-3).
 Final Preparations: Everyone involved needs to be thoroughly prepared; all presentations should be scripted and rehearsed, all rating forms should be double-checked, and all participant materials should be produced, duplicated, collated, and assembled into easy-to-use sets.
 As a final part of the preparation, the entire standard-setting staff should conduct a dress rehearsal, making sure that the timing of presentations is consistent with the agenda, that all forms are correct and usable, and that the flow of events is logical.
6
4. At the standard setting site and following up
 The lead facilitator attends to matters related to conduct of the sessions
 Logistics coordinator attends to everything else
 Once panelists complete their tasks and turn in their materials, data entry staff take over; the next morning, the data analysis staff continues the process.
 All data entry should be verified by a second person before data analysis begins (a minimal double-entry verification sketch follows this slide).
 The state education agency responsible for the standard setting should have arranged
time on the agenda of the state board of education as soon as possible after standard
setting in order to have cut scores approved.
 Once cut scores are adopted by the board, it is possible to include them in the score
reporting programs and produce score reports.
7
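As an illustration of the second-person verification step, the following is a minimal double-entry comparison sketch in Python. The file names, column names (panelist_id, item_id, rating), and CSV layout are assumptions made for illustration; this is not a procedure from the book, only one way to flag records that must be checked against the paper forms before analysis.

    import csv

    def load_ratings(path):
        # Read one keying of the rating forms into a dict keyed by (panelist, item).
        with open(path, newline="") as f:
            return {(row["panelist_id"], row["item_id"]): row["rating"]
                    for row in csv.DictReader(f)}

    # Two independent keyings of the same stack of rating forms (hypothetical file names).
    entry_a = load_ratings("ratings_entry_a.csv")
    entry_b = load_ratings("ratings_entry_b.csv")

    # Any record present in only one file, or any rating that disagrees, must be
    # resolved against the paper forms before data analysis begins.
    mismatches = [key for key in entry_a.keys() | entry_b.keys()
                  if entry_a.get(key) != entry_b.get(key)]

    for panelist_id, item_id in sorted(mismatches):
        print(f"Resolve panelist {panelist_id}, item {item_id}: "
              f"entry A={entry_a.get((panelist_id, item_id))!r}, "
              f"entry B={entry_b.get((panelist_id, item_id))!r}")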
Scheduling standard setting for
credentialing programs
 Scheduling standard setting for credentialing programs differs from scheduling for educational assessment programs. Educational assessment programs are bound to specific times of the academic year, and tests are typically given in the spring or fall.
 Credentialing programs are not bound by these constraints and have more flexibility; for example, computer adaptive testing (CAT) or computer-based testing (CBT) may permit test administration on any day of the year.
 Table 13-4 provides an overview of the major tasks for a credentialing testing program.
 Table 13-4 provides an overview of the major tasks for a credentialing
testing program.
8
Small group activity
 In groups of three, review pages 237-245 and post the key components of scheduling standard setting for credentialing programs, focusing on how it differs from scheduling standard setting for educational assessments.
 Use this website to post your thoughts
 http://padlet.com/wall/4qxyguqgnd
9
Recommendations
 Planning for standard setting needs to be made an integral part of planning for
test development.
 Plans of the standard setting facilitators should be reviewed by test
development staff, and vice versa.
 One person with authority over both item developers and standard setters
should have informed oversight over both activities.
 Pay particular attention to scoring, especially for open-ended or constructed-response items.
 Finally, test planning, test development, and standard setting are interlinked
parts of a single enterprise.
10
Ch-14
Vertically-Moderated Standard Setting
Chapter Goals:
Describe:
(1) the general concept of VMSS
(2) specific approaches to conduct VMSS
(3) a specific application of VMSS
Provide:
(1) suggestions for current assessment systems and areas where additional research is needed
11
Linking Test Scores across grades within the
Norm Referenced Testing (NRT) context
Review from Ch-6 (Ryan & Shepard)
 Construct of Linking- refers to several types of statistical methods that
establish a relationship between the score scales from two tests, so the
results can be comparable between the tests.
 Test Score Equating- Used to measure year to year changes over time for
different students in the same grade
 Vertical Equating- linking test scores vertically across grade levels and
schooling levels. The tests that are to be linked need to measure the same
construct.
12
Interrelated Challenges within the Standards-
Referenced Testing (SRT) context
 NCLB requirements for tracking cohort growth & achievement gaps
 These newer assessments apply standards-referenced testing (SRT)
 Linking test performance standards from two or more grade levels (adjacent and
not adjacent)
 The construct measured may be different
 Sheer number of performance levels that NCLB requires
 The wide test span and developmental range
 The panels of educators who participate in standard setting
13
A New Method that Links Standards Across Tests
 To address these challenges, there is a need to develop and implement standard-setting methods that set performance levels across all affected grade levels, with some mechanism for smoothing out differences between grades.
 Suggested approach—VMSS—Vertically Moderated Standard Setting
14
History of VMSS
 Introduced by Lissitz & Huynh (2003b)
 AYP is based on the percentage of students who reach Proficient, and the expected percentage increases over time.
 The purpose of VMSS: arriving at a set of cross-grade standards that realistically tracks student growth over time and provides a reasonable expectation of growth from one grade to the next.
 The critical issue is defining reasonable expectations; vertical scaling would generally not produce a satisfactory set of expectations for grade-to-grade growth.
 As an alternative to vertical scaling or equating, Lissitz and Huynh (2003b) suggested VMSS.
15
What is VMSS?
 A process of vertical articulation of standards: aligning scores, scales or
proficiency levels.
 VMSS is a procedure, or set of procedures, typically carried out after individual standards have been set, that seeks to smooth out the bumps that inevitably occur across grades.
 Reasonable expectations are stated in terms of percentages of students at
or above a consequential performance level, such as Proficient.
 Let's discuss the hypothetical scenario using the table on the next slide (p. 255 in your book).
16
What is VMSS?
Grade   % of Students At or Above Proficient   Difference
3       37                                     —
4       41                                     +4%
5       34                                     -7%
6       43                                     +9%
7       29                                     -14%
8       42                                     +13%
17
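To make the grade-to-grade comparison concrete, here is a minimal Python sketch that computes the differences shown in the table above. The percentages are the hypothetical values from this slide; the code itself is illustrative rather than anything prescribed by the chapter.

    # Hypothetical percentages of students at or above Proficient, by grade,
    # taken from the example table on this slide.
    pct_proficient = {3: 37, 4: 41, 5: 34, 6: 43, 7: 29, 8: 42}

    grades = sorted(pct_proficient)
    for prev, curr in zip(grades, grades[1:]):
        diff = pct_proficient[curr] - pct_proficient[prev]
        print(f"Grade {prev} -> Grade {curr}: {diff:+d} percentage points")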
Approaches to VMSS
 Focuses on percentages of students at various proficiency levels
 Is based on assumptions about growth in achievement over time
 Problem: Different percentages of students reach a given performance level, such as Proficient, at different grades.
 Solution:
 1. Set all standards by fiat at score points such that equal percentages of students would be classified as Proficient at each grade level.
 2. Set standards only for the lowest and highest grades and then align the percentages of Proficient students in the intermediate grades accordingly (a minimal sketch of this interpolation follows this slide).
18
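As a rough illustration of the second solution, the following minimal Python sketch places the intermediate grades on a straight line between the lowest- and highest-grade endpoints. The endpoint values (37% at grade 3, 42% at grade 8) come from the earlier hypothetical example, and the straight-line rule is one assumed smoothing choice, not the only option.

    # Hypothetical endpoints adopted for the lowest and highest grades.
    low_grade, high_grade = 3, 8
    low_pct, high_pct = 37.0, 42.0

    # Place the intermediate grades on a straight line between the endpoints.
    step = (high_pct - low_pct) / (high_grade - low_grade)
    for grade in range(low_grade, high_grade + 1):
        target = low_pct + step * (grade - low_grade)
        print(f"Grade {grade}: target {target:.0f}% at or above Proficient")

The resulting targets (37, 38, 39, 40, 41, 42 percent) match the smoothed percentages shown on the next slide.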
Approaches to VMSS
Grade   % of Students At or Above Proficient
3       37
4       38
5       39
6       40
7       41
8       42
[Chart: percentage of students at or above Proficient by grade, showing the smoothed linear trend from 37% to 42%; only axis ticks and series labels survive in the original slide.]
19
Assumptions re: growth over time
 Lewis & Haug (2005)
 The percentage of students
classified as at or above Proficient
would be expected to be:
1. Equal across grades or subjects
2. Approximately equal
3. Smoothly decreasing
4. Smoothly increasing
 Ferrara, Johnson & Chen (2005)
 Assumptions for standard setting are
based on the intersection of three
growth models:
1. Linear Growth
2. Remediation
3. Acceleration
(A minimal sketch of these three models follows this slide.)
20
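To illustrate how the three growth models attributed to Ferrara, Johnson & Chen (2005) differ qualitatively, here is a minimal Python sketch. The scale scores and growth amounts are invented for illustration; only the direction of each adjustment (uniform, larger at the low end, larger at the high end) reflects the chapter's descriptions.

    import numpy as np

    # Hypothetical scale scores for a small group of examinees (illustration only).
    scores = np.array([320.0, 350.0, 380.0, 410.0, 440.0, 470.0])
    # Relative position in the distribution: 0 = lowest scorer, 1 = highest scorer.
    ranks = scores.argsort().argsort() / (len(scores) - 1)

    # Linear growth: every examinee gains the same fixed amount.
    linear = scores + 20.0

    # Remediation: examinees at the lower end gain more than those at the upper end.
    remediation = scores + 10.0 + 30.0 * (1.0 - ranks)

    # Acceleration: examinees at the upper end gain more than those at the lower end.
    acceleration = scores + 10.0 + 30.0 * ranks

    print("linear:      ", linear)
    print("remediation: ", remediation)
    print("acceleration:", acceleration)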
Alternative procedures
 Due to VMSS being a relatively new procedure, it is difficult to pinpoint
limitations and alternative procedures
 There have been few thoroughly documented applications of VMSS
 Each application has been slightly different from the others
 Authors have suggested a common core of elements to VMSS
 However, no fixed set of steps has emerged in applications of VMSS so far
 Every aspect of any application might be thought of as an alternative procedure
21
Core components of VMSS future applications
1. Grounding in historical data (Lewis & Haug, 2005; Buckendahl et al., 2005)
2. Establishment of performance models
3. Consideration of historical data
4. Cross-grade examination of test content and student performance
5. Polling of participants
6. Follow up review and adjustment
22
Limitations
 If the focus of VMSS is the percentages of students at or above a particular proficiency level, a lack of historical perspective or context would be not only limiting but debilitating.
 Any application of VMSS is hampered if it is not supported by a
theoretically or empirically sound model of achievement growth.
 Maintaining the meaning of cut scores and fidelity to PLDs is one of the most fundamental issues for future research.
 Research and development is a growth industry
23
Editor's Notes

  1. The version may be easily adapted. This schedule assumes a new testing program. Part of the planning process is establishing the number and nature of the performance levels to be set. It is necessary to bring some precision to the performance level labels (PLLs) and performance level descriptions (PLDs). If these are established by state law or board action, then some of the work has already been done.
 2. PLLs and PLDs—establishing these at the beginning makes it possible to ensure that there are test items that will support these levels. TAC—many assessment programs employ a panel of nationally recognized assessment experts to advise them on technical issues related to those programs. Stakeholder review—stakeholders are individuals or groups with a particular interest in the testing program, such as community members and elected or appointed officials. It is a good idea to know early in the process who these stakeholders are and to obtain their input as early as possible. One very special stakeholder group is the policy board that will actually make the decision to adopt, modify, or reject the cut scores. For licensure and certification testing programs, the policy entity is usually the professional association or credentialing board. For statewide assessments, the policy board is usually the state board of education.
 3. As the overall standard-setting plan is being reviewed by various advisory committees and stakeholder groups, the next phase is identifying potential panelists. For a statewide assessment program, this usually involves working with local officials, usually the local superintendent. The notification letter should include a form on which the candidate can indicate interest in and availability for a standard-setting meeting on specific dates in a specific city. After the sorting is done, the agency sends a follow-up letter to all candidates informing them of their selection or non-selection. The invitation letter is sent out about five months prior to standard setting in order to allow panelists time to schedule it in their calendars. Six weeks prior to the event, a final letter to all panelists should confirm their participation and provide the location and driving directions, a reminder of the purpose of the meeting, and contact phone numbers in case of emergency. Once rooms are confirmed, the sponsoring agency may send a housing confirmation to each panelist. One person should be designated as the lead facilitator, who is responsible for training and other matters. A different person should be designated as the logistics coordinator, who is responsible for anything related to hotel guest rooms, meeting rooms, catering, copying, etc.
  4. Generic-printed materials, visuals, scripts for training etc…
  5. Ability to see potential problems, conflicts or disconnects.
 6. Ch-6 by Ryan & Shepard discusses applications of test linking, the diversity in state testing programs, and challenges to linking tests in accountability systems. Construct of linking: refers to several types of statistical methods used to establish a relationship between the score scales from two tests, so that results from one test can be compared to results on another test. To ensure comparability of tests from one test administration to the next, large-scale assessment programs use a process of test score equating; without this process it would be impossible to measure changes in achievement over time. Test score equating: statistical procedures by which scores on two different tests are related. It is possible when test forms are built to the same specifications and when test content, difficulty, reliability, format, purpose, administration, and population are equivalent. It answers questions such as "Have this year's 6th graders performed better in reading than last year's 6th graders?" Vertical equating and linking of test scores is successful when test design and item selection within and across grade levels are managed carefully, so that sufficient overlap of items in adjacent test levels enables stable links, as in norm-referenced tests and individual intelligence and achievement tests.
 7. Under NCLB, SRTs (standards-referenced tests) are built to content specifications that are narrower and tightly matched to specific within-grade content standards, which often do not have considerable across-grade overlap. Therefore, the content standards upon which SRTs are based can militate against the construction of traditional cross-grade scales; vertically linking SRTs requires strong assumptions about the equivalence of constructs being assessed at different levels. Interrelated challenges: (a) the constructs measured may be different—linking relies on an empirically determined or theoretically assumed continuous developmental construct across grade levels; (b) the sheer number of performance levels that NCLB requires—two levels representing higher achievement (Proficient and Advanced) plus a lower (Basic) level, compounded by the requirement of performance standards in reading and math in grades 3-8 plus one secondary grade, and in science at three grade levels; (c) the tests span a wide grade and developmental range.
 8. Introduced by Lissitz and Huynh (2003b) in a background paper prepared for the Arkansas Department of Education, where they spelled out the problem of determining AYP and proposed a solution: VMSS. First, Lissitz and Huynh (2003b) tried to define reasonable expectations using a vertical scaling/equating method. They concluded that vertical scaling would generally not produce a satisfactory set of expectations for grade-to-grade growth. They recommended that new cut scores for each test be set for all grades such that: (a) each achievement level has the same generic meaning across all grades, and (b) the proportion of students in each achievement level follows a growth-curve trend across these grades.
  9. VMSS may be used when there is a need to establish meaningful progressions of standards across levels or to enable reasonable predictions of student classifications over time when traditional vertical equating is not possible.
 10. Let's look at 4th or 6th graders. If we believed that the groups of students on whom these results were based were typical (we would expect similar results next year with a new group of students), we would need to point out that these currently Proficient students would have only about a 75% chance of scoring at the Proficient level the next year. The standards have been set so that 5th and 7th graders have a lower probability of scoring at the Proficient level than do 4th and 6th graders. This means that many Proficient 4th and 6th graders are going to lose ground in the subsequent grades (17% and 33%). To remedy this situation, VMSS requires a reexamination of the cut scores and percentages in light of historical or other corollary information available at the time of standard setting, making adjustments to the cut scores so that we have a reasonable expectation for what should happen next year.
 11. In our last example, we would take the 37% figure for Grade 3 and the 42% figure for Grade 8 and set cut scores for Grades 4-7 so that the resulting percentages of students at or above Proficient would fall on a straight line between 37% and 42%. A linear trend has been imposed on the intervening grade levels to obtain cut scores for those grades. In all cases, VMSS is based on assumptions about growth in achievement over time.
 12. Linear growth assumes that the proficiency of all examinees increases by a fixed amount and that examinees retain their positions relative to one another. Remediation assumes that the proficiency of examinees at the lower end of the score distribution increases more than that of examinees at the upper end. Acceleration assumes that the proficiency of examinees in the upper portion of the score distribution increases at a greater rate than that of examinees at the lower end of the score distribution.
 13. From the example illustrating a VMSS process implemented for the English Language Development Assessment (ELDA), suggestions for a common core of elements of VMSS include the following. Grounding in historical data: collect and use historical performance data to prepare for and interpret the results of standard setting; collection of these data and planning for their use may include discussions with stakeholders and content experts in advance of standard setting. Establishment of performance models: these should be based on the historical evidence; if such evidence is not available, models should rely on theories of cognitive development, discussions with content experts and stakeholders, or generalization from other tests. Consideration of historical data: when available, these data should be presented to those involved in setting standards, including the participants who work through the multiple rounds of a standard-setting procedure, to support cross-grade or cross-subject articulation. Cross-grade examination: include some degree of cross-grade review by standard-setting participants; where possible, all-grade review should be included in a full-scale VMSS for at least one round, either the final round or at some point just prior to it. Polling of participants: two studies of VMSS included data collection from participants at the end of the standard-setting activity; this is important not only for the validity evidence for the standard-setting activity but also for future standard-setting activities. Follow-up review and adjustment: these follow-ups are important for two reasons: (1) elected or appointed state officials are responsible for the successful implementation of the performance standards, and (2) even with the best intentions and earnest application of standard-setting techniques, participants may still hold fairly disparate notions with regard to where cut scores should be set.