Aspects of Validity
An Argument-Based, Systematic Framework
to Study Validity and Reliability of Unit and
Program Assessments
Nancy Wellenzohn, EdD
Associate Dean & Director of Accreditation
CAEP Coordinator
WHY ANALYZE VALIDITY?
• Assessments are instruments that
demonstrate that goals and objectives are
being met.
• Goals and objectives are established using
standards, current research in best practice,
conversations with field partners, and other
relevant sources.
• A validity study provides legitimacy to the
program assessments
WHY ANALYZE VALIDITY?
• Connecting assessments and curriculum
to standards, best practices, and needs of
the field is something that has always
been done.
• The new CAEP processes now require us to
demonstrate it.
• Validity is more than just statistical validity.
CAEP REQUIREMENTS
• CAEP Evidence Guide provides a broad
discussion of what makes an assessment
valid and reliable.
• CAEP White Paper “Principles for measures
used in CAEP Accreditation Process” (Ewell,
2013) provides relevant insights.
• Supporting literature provides additional
guidance.
• Informal perceptions of validity are no longer
enough.
CAEP REQUIREMENTS
CAEP 5.2 says “provider’s quality assurance
system relies on relevant, verifiable,
representative, cumulative, and actionable
measures, and produces empirical evidence
that interpretations of data are valid and
consistent.”
VALIDITY LITERATURE
• Messick (1995) defined validity as “nothing
less than an evaluative summary of both
the evidence for and the actual as well as
potential consequences of score
interpretation and use.”
• Need to look at the validity of the
instrument and the validity of the data.
ASPECTS OF VALIDITY
• Need a clear and practical way to
systematically study validity.
• Messick separated the concept of validity
into six separate aspects.
• These aspects provide a good place to
start.
ASPECTS OF VALIDITY
Instrument Aspects
• Content
• Structural
• Consequential
ASPECTS OF VALIDITY
Results Aspects
• Generalizability
• External
• Substantive
CONTENT ASPECT OF VALIDITY
• Evidence of “content relevance,
representativeness, and technical quality”
(Messick, 1995, p. 6)
• Can be supported by content and
performance standards
• Topic of assessment can be found in
professional domain
STRUCTURAL ASPECT OF
VALIDITY
• Instrument “appraised the extent to which
the internal structure of the assessment is
consistent with the construct domain.”
(Messick, 1995, p. 6)
• Are we asking the right question?
CONSEQUENTIAL ASPECT OF
VALIDITY
• “Appraises the value implications of score
interpretation as a basis for action as well
as the actual and potential consequences
of test use…” (Messick, 1995, p. 6)
• Does the instrument lead to results,
positive or negative, that are meaningful?
GENERALIZABILITY ASPECT OF
VALIDITY
• “Extent to which score properties and
interpretations generalize to and across
population groups, settings, and tasks”
(Messick, 1995, p. 6)
• Are the data consistent across groups and
over time, and in line with best
practice in the field?
• Are the data predictive?
EXTERNAL ASPECT OF VALIDITY
• Includes “convergent and discriminant
evidence from multi-trait and multi-method
comparisons as well as evidence of criterion
relevance and applied utility” (Messick, 1995,
p. 6)
• Do the data correlate with other variables?
Are the results consistent with other
assessments? Are conclusions drawn from
the results of multiple assessments?
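One way to examine the first question is a Pearson correlation between scores on a program assessment and scores on an external measure. The sketch below is illustrative only: the function name `pearson_r` and the sample scores are hypothetical and do not come from CAEP or Messick.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cross / (spread_x * spread_y)


# Hypothetical data: rubric scores on a program assessment and the same
# candidates' scores on an external exam.
program_scores = [2, 3, 1, 3, 2]
external_scores = [71, 88, 60, 90, 75]
print(round(pearson_r(program_scores, external_scores), 2))
```

A value near 1 or -1 suggests a strong linear relationship; a value near 0 suggests little linear relationship. A correlation would be one piece of evidence for the external aspect, not a verdict on validity by itself.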
SUBSTANTIVE ASPECT OF
VALIDITY
• “Theoretical rationales for the observed
consistencies in test responses” (Messick,
1995, p. 6)
• Are the candidates taking the right actions,
that is, actions similar to those taken in the field?
CAEP WHITE PAPER
PETER EWELL
“PRINCIPLES FOR MEASURES USED IN THE CAEP ACCREDITATION PROCESS”
• Validity and Reliability
• Relevance
• Verifiability
• Representativeness
• Cumulativeness
• Fairness
• Stakeholder Interest
• Benchmarks
• Vulnerability to Manipulation
• Actionability
OVERLAP BETWEEN THE EWELL AND MESSICK
CONCEPTS IS APPARENT
CONCEPT OF UNITARY VALIDITY
(MESSICK 1989)
• The standard for studying validity for years
had been to consider content, construct, and
criterion validity.
• Messick said “an ideal validation includes
several different types of evidence that spans
all three of the categories.” (Messick, 1989)
• This allows for the consideration of different
types of evidence rather than separately
studying different types of validity.
AN ARGUMENT-BASED
APPROACH TO VALIDATION
(KANE, 2013)
• “Under the argument-based approach to
validity, test-score interpretations and uses
that are clearly stated and are supported by
appropriate evidence are considered to be
valid.” (Kane, 2013)
• This means that programs can validate their
assessments by providing multiple pieces of
evidence that lend support to the notion that
an assessment is valid
TAKEAWAYS FROM THE
LITERATURE
• Messick’s aspects of validity provide a useful
framework for analysis
• Ewell’s principles for measures used in CAEP
accreditation are supportive of and related to
the aspects of validity
• Kane suggests that programs can make
arguments that assessments are valid
• Messick’s unitary theory suggests that
multiple factors can be considered
ASPECTS OF VALIDITY REVIEW - INSTRUMENT
Rating scale: Unacceptable (1), Acceptable (2), Target (3)
Content Aspect of Construct Validity: Evidence of content relevance
Criteria:
• Aligned with national or state standards
• Developed with input from external partners
• Measure is relevant and demonstrably related to an issue of importance
Unacceptable (1): Assessment Content does not meet at least 2 of the criteria
Acceptable (2): Assessment Content meets at least 2 of the criteria
Target (3): Assessment Content meets all of the criteria
Structural Aspect of Construct Validity: Observed consistency in responses
Criteria:
• Data are verifiable; can be replicated by third parties
• Measure is typical of underlying situation, not an isolated case
• Data are compared to benchmarks such as peers or best practices
• Program considers sources of potential bias
Unacceptable (1): Assessment Structure does not meet at least 2 of the criteria
Acceptable (2): Assessment Structure meets at least 2 of the criteria
Target (3): Assessment Structure meets at least 3 of the criteria
Consequential Aspect of Construct Validity: Positive and negative consequences, either intended or unintended, are observed and discussed
Criteria:
• Measure is free of bias
• Measure is justly applied
• Data are reinforced by reviewing related measures to decrease vulnerability to manipulation
Unacceptable (1): Assessment Consequences are reviewed but do not ensure at least 2 of the criteria
Acceptable (2): Assessment Consequences are reviewed to ensure at least 2 of the criteria
Target (3): Assessment Consequences are reviewed to ensure all of the criteria
ASPECTS OF VALIDITY REVIEW - RESULTS
Rating scale: Unacceptable (1), Acceptable (2), Target (3)
Substantive Aspect of Construct Validity: Observed consistency in the test responses/scores
Criteria:
• Measure has been subject to independent verification
• Measure is typical of situation, not an isolated case, and representative of entire population
• Data are compared to benchmarks such as peers or best practices
• The program considers whether data are vulnerable to manipulation
Unacceptable (1): Assessment Substance does not meet at least 2 of the criteria
Acceptable (2): Assessment Substance meets at least 2 of the criteria
Target (3): Assessment Substance meets at least 3 of the criteria
Generalizability Aspect of Construct Validity: Results generalize to and across population groups
Criteria:
• The results can be independently verified and, if the measure were repeated by other observers, it would yield similar results
• Measure is free of bias and able to be justly applied by any potential user or observer
• Measure provides specific guidance for action and improvement
Unacceptable (1): Assessment Generalizability does not include at least 2 of the criteria
Acceptable (2): Assessment Generalizability includes at least 2 of the criteria
Target (3): Assessment Generalizability includes all of the criteria
External Aspect of Construct Validity: Correlations with external variables exist
Criteria:
• Data are correlated with external data
• If repeated by third parties, results would replicate
• Measure is combined with other measures to increase cumulative weight of results
• Student work is created with some instructor support (Unacceptable and Acceptable levels); student work is created without instructor support (Target level)
Unacceptable (1): External Aspect of the assessment does not include at least 2 of the criteria
Acceptable (2): External Aspect of the assessment includes at least 2 of the criteria
Target (3): External Aspect of the assessment includes 3 of the criteria
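The counting rule behind the two rubrics can be sketched in a few lines. In every row, fewer than 2 criteria met is Unacceptable (1), 2 met is Acceptable (2), and 3 met is Target (3). The sketch below assumes that "3 of the following" and "all of the following" (for a three-item list) can both be read as "at least 3". The function name `rubric_level` and the sample evidence are hypothetical.

```python
def rubric_level(criteria_met: int) -> int:
    """Map the number of criteria met for one aspect to a rubric level.

    Assumption: reaching 3 criteria earns Target (3), exactly 2 earns
    Acceptable (2), and anything less is Unacceptable (1).
    """
    if criteria_met < 0:
        raise ValueError("criteria_met cannot be negative")
    if criteria_met >= 3:
        return 3
    if criteria_met == 2:
        return 2
    return 1


# Hypothetical evidence for the content aspect of one assessment.
content_evidence = {
    "aligned with national or state standards": True,
    "developed with input from external partners": False,
    "relevant and related to an issue of importance": True,
}
met = sum(content_evidence.values())  # True counts as 1, False as 0
print(rubric_level(met))
```

Keeping the evidence as named true/false entries makes it easy for a review panel to see which criterion an argument does or does not satisfy.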
RESULTS CAN BE GRAPHED
• The instrument aspect scores (content, structural,
and consequential aspects) can be averaged.
• The results aspects scores (substantive,
generalizability, and external aspects) can be
averaged.
• Resulting scores can be plotted.
• This can be done for a collection of assessments
in a program to provide a visual representation of
how the program is doing overall.
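The averaging step can be sketched as follows. The slide does not say which average goes on which axis, so placing the instrument average on the x-axis and the results average on the y-axis is an assumption. The assessment names and scores are hypothetical.

```python
from statistics import mean
from typing import Mapping, Tuple

INSTRUMENT_ASPECTS = ("content", "structural", "consequential")
RESULTS_ASPECTS = ("substantive", "generalizability", "external")


def plot_point(scores: Mapping[str, int]) -> Tuple[float, float]:
    """Return (instrument average, results average) for one assessment.

    `scores` maps each of the six aspect names to a rubric level of 1, 2, or 3.
    """
    instrument_avg = mean(scores[a] for a in INSTRUMENT_ASPECTS)
    results_avg = mean(scores[a] for a in RESULTS_ASPECTS)
    return (instrument_avg, results_avg)


# Hypothetical rubric levels for two assessments in one program.
program = {
    "Assessment A": {"content": 3, "structural": 3, "consequential": 3,
                     "substantive": 1, "generalizability": 2, "external": 3},
    "Assessment B": {"content": 2, "structural": 2, "consequential": 2,
                     "substantive": 3, "generalizability": 3, "external": 3},
}
points = {name: plot_point(levels) for name, levels in program.items()}
for name, (x, y) in points.items():
    print(f"{name}: instrument={x:.2f}, results={y:.2f}")
```

The resulting pairs can be passed to any charting tool as a scatter plot, with one point per assessment.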
IMPLEMENTATION
• This is a peer-review process.
• Directors/Chairs make arguments that their
assessments are valid.
• They submit evidence for each argument.
The Director/Chair self-scores the evidence
in a software system.
• A panel of two or three people reviews the arguments
and applies the rubric. They meet to reach a
consensus on scoring.
EXAMPLE EDUCATIONAL LEADERSHIP
Content Aspect of Construct Validity: Evidence of content relevance
Criteria:
• Aligned with national or state standards
• Developed with input from external partners
• Measure is relevant and demonstrably related to an issue of importance
Unacceptable (1): Assessment Content does not meet at least 2 of the criteria
Acceptable (2): Assessment Content meets at least 2 of the criteria
Target (3): Assessment Content meets all of the criteria
EDUCATIONAL LEADERSHIP EXAMPLE
Substantive Aspect of Construct Validity: Observed consistency in the test responses/scores
Criteria:
• Measure has been subject to independent verification
• Measure is typical of situation, not an isolated case, and representative of entire population
• Data are compared to benchmarks such as peers or best practices
• The program considers whether data are vulnerable to manipulation
Unacceptable (1): Assessment Substance does not meet at least 2 of the criteria
Acceptable (2): Assessment Substance meets at least 2 of the criteria
Target (3): Assessment Substance meets at least 3 of the criteria
RESULT OF VALIDITY REVIEW
REFERENCES
Council for the Accreditation of Educator Preparation. (2014, February). CAEP Evidence Guide.
Ewell, Peter. (2013). Principles for measures used in CAEP accreditation process. White Paper for Council for the Accreditation of Educator Preparation.
Kane, Michael. (2013). The argument-based approach to validation. School Psychology Review, 42(4), 448-457.
Messick, Samuel. (1995). Standards of validity and the validity of standards in performance assessment. Educational Measurement: Issues and Practice, 14, 5-8.
Messick, Samuel. (1989). Chapter in Linn, R. L. Educational Measurement, 3rd Edition. New York: American Council on Education/Macmillan, 13-103.
QUESTIONS?
wellenzn@Canisius.edu

18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of Powders
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 

Aspects of Validity

  • 1. Aspects of Validity An Argument-Based, Systematic Framework to Study Validity and Reliability of Unit and Program Assessments Nancy Wellenzohn, EdD Associate Dean & Director of Accreditation CAEP Coordinator
  • 2. WHY ANALYZE VALIDITY? • Assessments are instruments that demonstrate that goals and objectives are being met. • Goals and objectives are established using standards, current research in best practice, conversations with field partners, and other relevant sources. • A validity study provides legitimacy to the program assessments
  • 3. WHY ANALYZE VALIDITY? • Connecting assessments and curriculum to standards, best practices, and needs of the field is something that has always been done. • The new CAEP processes now want us to prove it. • Validity is more than just statistical validity.
  • 4. CAEP REQUIREMENTS • CAEP Evidence Guide provides a broad discussion of what makes an assessment valid and reliable. • CAEP White Paper “Principles for measures used in CAEP Accreditation Process” (Ewell, 2013) provides relevant insights. • Supporting literature provides additional guidance. • Informal perceptions of validity are no longer enough.
  • 5. CAEP REQUIREMENTS CAEP 5.2 says “provider’s quality assurance system relies on relevant, verifiable, representative, cumulative, and actionable measures, and produces empirical evidence that interpretations of data are valid and consistent.”
  • 6. VALIDITY LITERATURE • Messick (1995) defined validity as “nothing less than an evaluative summary of both the evidence for and the actual as well as potential consequences of score interpretation and use.” • Need to look at the validity of the instrument and the validity of the data.
  • 7. ASPECTS OF VALIDITY • Need a clear and practical way to systematically study validity. • Messick separated the concept of validity into six separate aspects. • These aspects provide a good place to start.
  • 8. ASPECTS OF VALIDITY Instrument Aspects • Content • Structural • Consequential
  • 9. ASPECTS OF VALIDITY Results Aspects • Generalizability • External • Substantive
  • 10. CONTENT ASPECT OF VALIDITY • Evidence of “content relevance, representativeness, and technical quality” (Messick 1995, p.6) • Can be supported by content and performance standards • Topic of assessment can be found in professional domain
  • 11. STRUCTURAL ASPECT OF VALIDITY • Instrument “appraised the extent to which the internal structure of the assessment is consistent with the construct domain.” (Messick, 1995, p. 6) • Are we asking the right question?
  • 12. CONSEQUENTIAL ASPECT OF VALIDITY • “Appraises the value implications of score interpretation as a basis for action as well as the actual and potential consequences of test use…” (Messick, 1995, p.6) • Does the instrument lead to results, positive or negative, that are meaningful?
  • 13. GENERALIZABILITY ASPECT OF VALIDITY • “Extent to which score properties and interpretations generalize to and across population groups, setting, and tasks” (Messick, 1995, p.6) • Are the data consistent between groups, over time, and consistent with best practice in the field? • Are the data predictive?
  • 14. EXTERNAL ASPECT OF VALIDITY • Includes “convergent and discriminant evidence from multi-trait and multi-method comparisons as well as evidence of criterion relevance and applied utility” (Messick 1995, p. 6) • Do the data correlate with other variables? Are the results consistent with other assessments? Are conclusions made considering results of multiple assessments?
  • 15. SUBSTANTIVE ASPECT OF VALIDITY • “Theoretical rationales for the observed consistencies in test responses” (Messick, 1995, p.6) • Are the candidates taking the right actions, meaning similar to those in the field?
  • 16. CAEP WHITE PAPER, PETER EWELL: “PRINCIPLES FOR MEASURES USED IN THE CAEP ACCREDITATION PROCESS” • Validity and Reliability • Relevance • Verifiability • Representativeness • Cumulativeness • Fairness • Stakeholder Interest • Benchmarks • Vulnerability to Manipulation • Actionability • OVERLAP BETWEEN THE EWELL AND MESSICK CONCEPTS IS APPARENT
  • 17. CONCEPT OF UNITARY VALIDITY (MESSICK 1989) • The standard for studying validity for years had been to consider content, construct, and criterion validity. • Messick said “an ideal validation includes several different types of evidence that spans all three of the categories.” (Messick 1989) • This allows for the consideration of different types of evidence rather than separately studying different types of validity.
  • 18. AN ARGUMENT-BASED APPROACH TO VALIDATION (KANE, 2013) • “Under the argument-based approach to validity, test-score interpretations and uses that are clearly stated and are supported by appropriate evidence are considered to be valid.” (Kane, 2013) • This means that programs can validate their assessments by providing multiple pieces of evidence that lend support to the notion that an assessment is valid
  • 19. TAKEAWAYS FROM THE LITERATURE • Messick’s aspects of validity provide a useful framework for analysis • Ewell’s principles for measures used in CAEP accreditation are supportive of and related to the aspects of validity • Kane suggests that programs can make arguments that assessments are valid • Messick’s unitary theory suggests that multiple factors can be considered
  • 20.
  • 21. ASPECTS OF VALIDITY REVIEW - INSTRUMENT
    Rating scale: Unacceptable (1), Acceptable (2), Target (3). Each rating is judged against the same list of criteria for that aspect.
    Content Aspect of Construct Validity: Evidence of content relevance
      Criteria:
        • Aligned with national or state standards
        • Developed with input from external partners
        • Measure is relevant and demonstrably related to an issue of importance
      Unacceptable (1): Assessment Content does not meet at least two of the criteria.
      Acceptable (2): Assessment Content meets at least two of the criteria.
      Target (3): Assessment Content meets all of the criteria.
    Structural Aspect of Construct Validity: Observed consistency in responses
      Criteria:
        • Data are verifiable; can be replicated by third parties
        • Measure is typical of underlying situation, not an isolated case
        • Data are compared to benchmarks such as peers or best practices
        • Program considers sources of potential bias
      Unacceptable (1): Assessment Structure does not meet at least 2 of the criteria.
      Acceptable (2): Assessment Structure meets at least 2 of the criteria.
      Target (3): Assessment Structure meets at least 3 of the criteria.
    Consequential Aspect of Construct Validity: Positive and negative consequences, either intended or unintended, are observed and discussed
      Criteria:
        • Measure is free of bias
        • Measure is justly applied
        • Data are reinforced by reviewing related measures to decrease vulnerability to manipulation
      Unacceptable (1): Assessment Consequences are reviewed but do not ensure at least 2 of the criteria.
      Acceptable (2): Assessment Consequences are reviewed to ensure at least 2 of the criteria.
      Target (3): Assessment Consequences are reviewed to ensure all of the criteria.
  • 22. ASPECTS OF VALIDITY REVIEW - RESULTS
    Rating scale: Unacceptable (1), Acceptable (2), Target (3). Each rating is judged against the same list of criteria for that aspect, except where noted.
    Substantive Aspect of Construct Validity: Observed consistency in the test responses/scores
      Criteria:
        • Measure has been subject to independent verification
        • Measure is typical of situation, not an isolated case, and representative of entire population
        • Data are compared to benchmarks such as peers or best practices
        • The program considers whether data are vulnerable to manipulation
      Unacceptable (1): Assessment Substance does not meet at least 2 of the criteria.
      Acceptable (2): Assessment Substance meets at least 2 of the criteria.
      Target (3): Assessment Substance meets at least 3 of the criteria.
    Generalizability Aspect of Construct Validity: Results generalize to and across population groups
      Criteria:
        • The results can be subject to independent verification and, if repeated by observers, would yield similar results
        • Measure is free of bias and able to be justly applied by any potential user or observer
        • Measure provides specific guidance for action and improvement
      Unacceptable (1): Assessment Generalizability does not include at least 2 of the criteria.
      Acceptable (2): Assessment Generalizability includes at least 2 of the criteria.
      Target (3): Assessment Generalizability includes all of the criteria.
    External Aspect of Construct Validity: Correlations with external variables exist
      Criteria:
        • Data are correlated with external data
        • If repeated by third parties, results would replicate
        • Measure is combined with other measures to increase cumulative weight of results
        • Student work is created with some instructor support (as worded under Unacceptable and Acceptable) or without instructor support (as worded under Target)
      Unacceptable (1): External Aspect of the assessment does not include at least 2 of the criteria.
      Acceptable (2): External Aspect of the assessment includes at least 2 of the criteria.
      Target (3): External Aspect of the assessment includes 3 of the criteria.
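The two rubric slides assign a rating from the number of criteria an assessment meets. A minimal Python sketch of that counting logic follows. The names `AspectRule`, `RULES`, and `rubric_score` are hypothetical and do not come from the presentation. The thresholds are read off the rubric text, with two assumptions: "all of the following" is treated as the full criterion count, and when a count satisfies both the Acceptable and Target wording, the higher rating is assigned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AspectRule:
    """Thresholds for one aspect, as read from the rubric (hypothetical structure)."""
    total_criteria: int   # number of criteria listed for the aspect
    acceptable_min: int   # criteria needed for Acceptable (2)
    target_min: int       # criteria needed for Target (3)


# Assumption: "all of the following" equals the full criterion count,
# and "3 of the following" is read as "at least 3".
RULES = {
    "content": AspectRule(3, 2, 3),
    "structural": AspectRule(4, 2, 3),
    "consequential": AspectRule(3, 2, 3),
    "substantive": AspectRule(4, 2, 3),
    "generalizability": AspectRule(3, 2, 3),
    "external": AspectRule(4, 2, 3),
}


def rubric_score(aspect: str, criteria_met: int) -> int:
    """Return 1 (Unacceptable), 2 (Acceptable), or 3 (Target)."""
    rule = RULES[aspect]
    if not 0 <= criteria_met <= rule.total_criteria:
        raise ValueError(f"{aspect} has {rule.total_criteria} criteria")
    if criteria_met >= rule.target_min:
        return 3
    if criteria_met >= rule.acceptable_min:
        return 2
    return 1
```

In the process the slides describe, the count of criteria met would come from the reviewers' judgment of the submitted evidence; the function only encodes the final lookup.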
  • 23. RESULTS CAN BE GRAPHED • The instrument aspect scores (content, structural, and consequential aspects) can be averaged. • The results aspects scores (substantive, generalizability, and external aspects) can be averaged. • Resulting scores can be plotted. • This can be done for a collection of assessments in a program to provide a visual representation of how the program is doing overall.
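The averaging step on this slide can be sketched as follows. This is an illustration only: the function name `plot_point`, the assessment names, and all scores are made up, and the slide does not say which average goes on which axis, so the (instrument, results) ordering is an assumption.

```python
INSTRUMENT_ASPECTS = ("content", "structural", "consequential")
RESULTS_ASPECTS = ("substantive", "generalizability", "external")


def plot_point(scores: dict[str, int]) -> tuple[float, float]:
    """Average the three instrument scores and the three results scores.

    Returns (instrument_average, results_average); the axis order is an assumption.
    """
    instrument = sum(scores[a] for a in INSTRUMENT_ASPECTS) / len(INSTRUMENT_ASPECTS)
    results = sum(scores[a] for a in RESULTS_ASPECTS) / len(RESULTS_ASPECTS)
    return instrument, results


# Made-up rubric scores (1-3) for two hypothetical assessments in one program.
program = {
    "Assessment A": {
        "content": 3, "structural": 2, "consequential": 2,
        "substantive": 2, "generalizability": 3, "external": 1,
    },
    "Assessment B": {
        "content": 3, "structural": 3, "consequential": 3,
        "substantive": 3, "generalizability": 3, "external": 3,
    },
}

# One coordinate pair per assessment, ready to hand to any plotting tool.
points = {name: plot_point(s) for name, s in program.items()}
for name, (x, y) in points.items():
    print(f"{name}: instrument={x:.2f}, results={y:.2f}")
```

Plotting every assessment in a program this way gives the overall picture the slide mentions: points near (3, 3) are at Target on both dimensions, while points near (1, 1) need attention.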
  • 24.
  • 25. IMPLEMENTATION • This is a peer-review process. • Directors/Chairs make arguments that their assessments are valid. • They submit evidence for each argument. The Director/Chair self-scores the evidence in a software system. • A 2-3 person panel reviews the arguments and applies the rubric. They meet to form a consensus for scoring.
  • 26. EXAMPLE EDUCATIONAL LEADERSHIP
    Content Aspect of Construct Validity: Evidence of content relevance
      Criteria:
        • Aligned with national or state standards
        • Developed with input from external partners
        • Measure is relevant and demonstrably related to an issue of importance
      Unacceptable (1): Assessment Content does not meet at least two of the criteria.
      Acceptable (2): Assessment Content meets at least two of the criteria.
      Target (3): Assessment Content meets all of the criteria.
  • 27. EDUCATIONAL LEADERSHIP EXAMPLE
    Substantive Aspect of Construct Validity: Observed consistency in the test responses/scores
      Criteria:
        • Measure has been subject to independent verification
        • Measure is typical of situation, not an isolated case, and representative of entire population
        • Data are compared to benchmarks such as peers or best practices
        • The program considers whether data are vulnerable to manipulation
      Unacceptable (1): Assessment Substance does not meet at least 2 of the criteria.
      Acceptable (2): Assessment Substance meets at least 2 of the criteria.
      Target (3): Assessment Substance meets at least 3 of the criteria.
  • 29. REFERENCES
    Council for the Accreditation of Educator Preparation. (2014, February). CAEP Evidence Guide.
    Ewell, Peter. (2013). Principles for measures used in CAEP accreditation process. White Paper for Council for the Accreditation of Educator Preparation.
    Kane, Michael. (2013). The argument-based approach to validation. School Psychology Review, 42(4), 448-457.
    Messick, Samuel. (1989). Chapter in Linn, R. L. Educational Measurement 3rd Edition. New York: American Council on Education/Macmillan, 13-103.
    Messick, Samuel. (1995). Standards of validity and the validity of standards in performance assessment. Educational Measurement: Issues and Practice, 14, 5-8.
    QUESTIONS? wellenzn@Canisius.edu