VALIDITY IN EDUCATIONAL
ASSESSMENT CHAPTER ONE
(NEWTON & SHAW, 2014)
Maryam Bolouri
Topics to be discussed:
 Validity definition
 Types of Validity
 The history of validity
Validity: 100 years of debate
Agreement (bland consensus) among scholars:
 The hallmark of quality in testing
 The most important criterion for evaluating a test
Discrepancies:
Different perspectives on the meaning of validity:
1) Measurement concept (original scientific definition in
the 1920s)
2) Measurement and decision-making concept (between
these 2 extremes, scientific and ethical)
3) Measurement, decision-making and impacts concept
(social and ethical definition, concerns test use)
Why validity is so confusing:
1) Very large and disparate literature on V within edu
and psych measurement
2) Very large and disparate literature on V within
other disciplines
3) The official meaning of V has evolved over time
4) V has gained stature and expanded over time
5) Related accounts are hard to read
6) V is employed in different ways and contexts, so it
is unclear what the intended meaning is. If it is
used in a technical sense, which discipline's sense
is meant?
Validity (definition)
 Validity is the property or quality of being valid,
true, cogent or legally acceptable (everyday
definition)
 Validity has been associated with different
meanings across disciplines.
 Validity theory: conceptual framework
 Validation practice: investigation into V, a
process of investigation guided by V theory
 It derives from the Latin word “validus”, meaning strong,
healthy, worthy
Validity across disciplines:
 Philosophy and logic: a deductive argument is valid if and
only if it is not possible for all of its premises to be true
while its conclusion is false.
 This sense is not about validation but about the strength of
the argument itself
 Law and economics
 Genetic testing
 Management
 Edu and psych measurement
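The philosophy and logic definition above can be checked mechanically for small propositional arguments. The following is only a minimal sketch: the helper name `is_valid` and the two example arguments are illustrative assumptions, not part of the chapter.

```python
from itertools import product

def is_valid(premises, conclusion, n_vars):
    """Deductive validity: no truth assignment makes every premise true
    while the conclusion is false. (Hypothetical helper for illustration.)"""
    for values in product([True, False], repeat=n_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False  # counterexample: true premises, false conclusion
    return True

# Modus ponens: (P -> Q), P, therefore Q
modus_ponens = is_valid(
    premises=[lambda p, q: (not p) or q, lambda p, q: p],
    conclusion=lambda p, q: q,
    n_vars=2,
)

# Affirming the consequent: (P -> Q), Q, therefore P
affirming_consequent = is_valid(
    premises=[lambda p, q: (not p) or q, lambda p, q: q],
    conclusion=lambda p, q: p,
    n_vars=2,
)

print(modus_ponens, affirming_consequent)  # True False
```

The second argument fails because P false, Q true makes both premises true and the conclusion false.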
Validity for research or
measurement?
 V for research (Campbell, 1957): relevant to
conclusions based on evidence, of 2 kinds:
 Internal V: degree of confidence that the
conclusion or observed effect is genuine for the
experimental group and that the design rules out other
irrelevant explanations
 External V: the degree of confidence that
conclusions can be generalized from the
experimental group to the intended population. It is
decreased if the sampling is biased or the process
itself causes the effect.
Validity for research
 External V is subdivided into:
1. Population V: confidence in the generalization
of conclusions across populations
2. Ecological V: confidence in the generalization
of conclusions across conditions
a) Outcome V: across dependent variables
b) Temporal V: across time
c) Treatment V: across treatment variations
Cook and Campbell 4-way classification
or 4 major decisions (1979)
 Internal V split into: statistical conclusion V +
internal V
 External V split into: construct V + external V
 Validity is confidence in the credibility of
description and interpretation or the
legitimacy of the produced knowledge
 It is the social consequence of qualitative
research
Lather: 3 main conceptions of
V (1986):
1. Face V: member checking
2. Construct V: systematized reflexivity of the
researcher’s theory in response to the data
3. Catalytic V: facilitating the
transformation of reality by reorienting,
focusing, and energizing the participants
These two Vs are confusing:
1. Same key contributors in both theories
2. V for research ideas are borrowed from V for
measurement
3. A few similar terms appear in both literatures with
different meanings
4. V for research is involved in all V for
measurement, yet the reverse is not true.
Validity for measurement:
measuring an attribute by means of a test
 Individuals are measured to make decisions
 The more accurate the measurement, the better the
decision
 Professions are characterized in terms of the
different attributes they need to measure
e.g.: manic depression → decide treatment
math achievement → place a student in a class
Ultimate purpose: improve the ratio of
correct/incorrect decisions
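The "ratio of correct/incorrect decisions" can be made concrete with a small sketch. All data below are hypothetical placement decisions invented for illustration; the function name `decision_ratio` is likewise an assumption.

```python
def decision_ratio(decisions, truths):
    """Ratio of correct to incorrect decisions (inf if none are incorrect)."""
    if len(decisions) != len(truths):
        raise ValueError("decisions and truths must have the same length")
    correct = sum(d == t for d, t in zip(decisions, truths))
    incorrect = len(decisions) - correct
    return float("inf") if incorrect == 0 else correct / incorrect

# Hypothetical data: True = "student belongs in the advanced class"
truly_ready = [True, True, False, False, True, False, True, False]
rough_test = [True, False, False, True, True, False, False, False]
better_test = [True, True, False, False, True, False, False, False]

print(decision_ratio(rough_test, truly_ready))   # 5 correct vs 3 incorrect
print(decision_ratio(better_test, truly_ready))  # 7 correct vs 1 incorrect
```

A more accurate measurement shifts the ratio from 5:3 to 7:1 in this made-up case, which is the sense in which better measurement supports better decisions.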
Peculiarity of V for measurement
 Results of a single test are used for multiple
purposes or interpreted in terms of a variety
of attributes. Is this possible?
 Can each attribute be measured with
sufficient accuracy?
 How can the use of results be defended, or
a claim to validity supported?
Kinds of V for measurement
Cronbach (1949):
1. Logical V: based on logical analysis of test
content
2. Empirical V: based on empirical evidence
such as the correlation between test scores and a criterion
Approaches to investigating
V from 1950s to 1970s
1. Content V: derivative of logical V
2. Criterion V: derivative of empirical V
(subdivided into concurrent and predictive V)
3. Construct V: scientific and last-resort V (when neither
could be relied on)
New kinds of V continue to be proposed even to the
present day, which is bizarre because since the mid
1980s it has been officially recognized that V is a
unitary concept (Messick, 1980): there is only one
kind of V, and that is construct V
Measurement in edu and psych
 Assessment
 Performance evaluation
 Diagnosis
 It carries both technical and emotional
baggage. It is used in its loosest sense and
embraces anyone with a professional remit for
measurement, assessment, evaluation and
diagnosis
Test or score means:
 Any structured assessment of behavior
 Any measurement procedure
 A set of procedures to elicit, evaluate or interpret a
behavior
 The outcome of a test will be summarized as a score,
report, or profile to characterize the individual in
terms of the attribute being measured
 V is a quality of this procedure, so calling it valid
is tantamount to a thumbs-up, green light, or stamp
of approval.
Review:
 A test or measurement procedure is valid:
an evidence-based claim that the test can be used
to measure a certain kind of attribute → making a
certain type of decision in future
1) What to measure?
 A number of contenders are:
 Characteristic to be measured (human)
 Trait (human)
 Disposition (human)
 Construct: most frequent one
 Attribute
 Achievement, attainment, aptitude, attitude,
proficiency, competence, etc.
Construct V: a terminological
conundrum:
 If all of V is now construct V, then “construct” is
redundant.
 It is also misleading because it implies a
tradition that is no longer credible
in this century
 A large amount of the literature on constructs in edu
and psych is specific to measurement and
construct V
 There is no straightforward solution to this
problem.
Attribute:
 Why:
1. Minimize the confusion
2. Reserve “construct” for talk about
construct V
Certain attributes are of significant importance in our
field (no universal agreement on their
connotations):
Achievement (evaluative overtone)
Aptitude (innateness connotation)
Intelligence
Problems with attribute names
 They fall in and out of fashion
 Their everyday names can change over
time.
 The scientific understanding of them may
change
 New names may be proposed for their particular
implications
For instance: SAT
 1926: to assess academic readiness for
university (Scholastic Aptitude Test)
 Connotation of innate and fixed ability akin
to intelligence
 1990: no longer claimed to measure something innate (Scholastic
Assessment Test)
 Later: SAT retained as a name in its own right
For instance: end-of-course
test
 Achievement: evaluative overtone,
accomplishment after following a course of
instruction, may be mastering for one learner
 Attainment: assumed neutral, an attempt to
master particular learning outcomes, again is
specific to one individual
 Competence or proficiency: capacity to do X,
Y, or Z regardless of following a particular
course of learning or instruction
V and R (validity and reliability)
 Early definition of V: the degree to which a test
measures what it is supposed to measure
 Definition of R: consistency of outcome
 V vs. R: accuracy vs. consistency
 In the absence of consistency, any claim to be able
to measure accurately would be indefensible.
 We might be consistently wrong, so R is not
sufficient, but it is a necessary condition for
high measurement quality. It is just one facet of V.
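The "consistently wrong" point can be illustrated with a small numerical sketch. Everything here is hypothetical: the known true value, the two sets of repeated readings, and the choice of mean error as a stand-in for accuracy and standard deviation as a stand-in for consistency.

```python
from statistics import mean, pstdev

TRUE_VALUE = 50.0  # hypothetical known value of the attribute

def bias_and_spread(readings, true_value=TRUE_VALUE):
    """Return (bias, spread): mean error (accuracy side) and
    population standard deviation (consistency side)."""
    return mean(readings) - true_value, pstdev(readings)

# Two hypothetical procedures, each applied five times to the same person
consistent_but_wrong = [57.9, 58.1, 58.0, 57.8, 58.2]
consistent_and_right = [49.9, 50.1, 50.0, 49.8, 50.2]

bias_a, spread_a = bias_and_spread(consistent_but_wrong)
bias_b, spread_b = bias_and_spread(consistent_and_right)

print(f"A: bias={bias_a:.2f}, spread={spread_a:.2f}")
print(f"B: bias={bias_b:.2f}, spread={spread_b:.2f}")
```

Both procedures are equally consistent (same spread), but only the second is accurate. Consistency alone therefore cannot establish a claim to V.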
The history of V in the edu and
psych field:
 Answers to two fundamental questions;
different answers over the years to improve
validation practice
 1) What does it mean to claim V? (V theory)
 2) How can a validity claim be substantiated?
(validation practice)
V theory:
 No comprehensive, coherent, clear account of it
 Mid 1950s: documents prepared by committees
of measurement professionals from North
America (AKA technical recommendations or
standards):
 encapsulated version of all official statements
Succinct guidance on validation, meaning of V
Not only well-developed but ambiguous
Product of North American committees became the lingua
franca for the world
Product of compromise rather than universal
satisfaction with poor validation practice
Main challenge for V theorists:
move beyond the disparate heuristic principles
of the standards towards a comprehensive account
of V
 Got closer to it from the 1970s to the 1990s through the scholarship
of Messick: comprehensive yet unclear, long,
dense, and philosophically challenging and
viscous
Validity accounts: Cureton 1951, Messick 1989
Test validation: Cronbach 1971
Validation: Kane 2006
Three phases of history
(first classification):
1. pre-Trinitarian:
2. Trinitarian (content V, criterion V,
construct V), 1950s-1970s: the “holy
trinity”
3. Unitarian
Newton and Shaw classification
(5 key phases):
1. Mid 1800s-1920: gestational period
2. 1921-1951: period of crystallization
3. 1952-1974: period of fragmentation
4. 1975-1999: period of reunification
5. 2000-2012: period of deconstruction
No sharp line between them, just a crude attempt to
structure the course
Many of the transitions correspond to the publication of
a new version of the standards;
they captured the zeitgeist between eras
Newton and Shaw classification
(5 key phases):
 They focus on
1. Conception of V
2. How to employ logical analyses and
empirical evidence to substantiate a claim
to V?
3. When to employ logical analyses and
empirical evidence to substantiate a claim
to V?
1) Gestational period
(mid 1800s-1920)
1. Structured assessment: better decision making,
facilitating outcomes, fairer for individuals and useful
for society: introduction of written and local
examinations in the USA and England
2. More structure and more objectivity by the end of the 19th
century: introduction of T/F, MC, completion, and
standardized tests
3. Advances in statistical procedures, invention of the
correlation coefficient, tests of mental capacities
4. Early years of the 20th century: measurement movement, all sorts
of tests of all sorts of attributes in all formats; success
of tests for placement and selection
2) A period of
crystallization (1921-1951)
Development of tests for many uses:
1. Tests of edu achievement: judge students and
schools
2. Tests of intelligence: diagnose backwardness and
excellence
3. Tests of specific aptitude: vocational guidance
Importance of quality and control; seeking consensus
on the meaning of terms and procedures such as
R and V
2) A period of
crystallization (1921-1951)
How to establish a claim to V, by 2 approaches:
1. Logical analysis of test content: a group of expert
practitioners scrutinize the content of the test
and judge whether it matches the content of the
curriculum.
2. Empirical evidence of correlation between the test
and what was supposed to be measured
Key questions: what should the test results be
correlated against? What criterion should be used to
judge the results? Are expert judgments valid?
2) A period of
crystallization (1921-1951)
 Validate a short standardized test of
achievement against a long comprehensive
assessment of achievement that covers the full
range of learning outcomes. A high correlation
with the long one validates the test as a measurement
of the full domain
Different communities molded the definition based on
their interests:
1. Psychologists with an interest in aptitude
prioritized empirical evidence of correlation
2. Educators with an interest in achievement
prioritized logical analysis of content
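The empirical approach of this period (correlating a short test against a long comprehensive assessment used as the criterion) can be sketched as follows. The scores are invented for illustration, `pearson_r` is a hand-written helper rather than any particular package's API, and the choice of Pearson's coefficient is an assumption about which correlation is meant.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)

# Hypothetical scores for the same eight students
short_test = [12, 15, 9, 18, 14, 11, 17, 10]       # short standardized test
long_assessment = [61, 72, 50, 88, 70, 58, 80, 49]  # full-domain criterion

r = pearson_r(short_test, long_assessment)
print(f"criterion correlation r = {r:.2f}")
```

A coefficient close to 1 would be read, on this approach, as evidence that the short test can stand in for the full-domain assessment. It says nothing on its own about whether the criterion itself is sound, which is the "what criterion?" question raised earlier.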
3) The fragmentation of V
(1952-1974)
 Publication of the first standards in 1952, to govern the
information provided by test producers, by a committee of the APA chaired by
Cronbach
 Previous classifications of V into types such as
1) Logical V and empirical V (1949)
2) Curricular V, statistical V, psychological and logical V
(neither of the previous 2; arm-chair dissection of the
total process) (1943)
3) 4 types of V: content, predictive, status, congruent
V (1952)
Final publication 1954: content, predictive, concurrent,
construct
3) The fragmentation of V
(1952-1974)
 Intention of construct V: for cases where neither logical
analysis nor empirical evidence was
regarded as sufficient.
 Certain types of tests are evaluated in relation to
a universe of content (content V)
 Aptitude tests are evaluated in relation to a
criterion measure
 For other tests, such as personality tests, there was no
yardstick, and a different procedure was needed
3) The fragmentation of V
(1952-1974)
 Construct V determines what psychological
construct accounts for test performance
 Construct: a postulated attribute that is
manifest in test performance
 It subsumed logical analysis, empirical
evidence and any other form of
evidence that could be brought to bear on the psychological
meaning of scores, so it is quintessentially
scientific and relies on theory
3) The fragmentation of V
(1952-1974)
 Second edition of the standards in 1966: 4 types of V
collapsed into 3
1. Content V
2. Criterion-related V
3. Construct V
Third edition of the standards in 1974
They are not mutually exclusive, but V theory and validation
fragmented along these lines: “validity types” were treated as
alternative approaches to validation. None is preferable to
the others.
Problem: the criterion V definition couldn’t be reconciled with the
classic definition. Predictor tests were black boxes, and
irrelevant if they did not predict the criterion accurately
4) The reunification of V:
Messick years (1975-1999)
 4th edition of the standards in 1985 and 5th in 1999
 All V ought to be understood as construct V.
 Demolished the distinction between V for measurement
and V for prediction.
 3 fundamental imperatives for validation:
1. Establish that the criterion measure measures what
it is supposed to measure
2. Establish that the aptitude test measures what it
is supposed to measure
3. Establish a theoretical rationale and present
evidence for the aptitude test
4) The reunification of V:
Messick years (1975-1999)
 Before the 1970s: blind empiricism, and claims to V
based on logical analysis
 Messick upped the ante:
1. Test performance must be representative
of learning outcomes.
2. Variance in test scores must be attributable to
construct-relevant factors
3. Twin threats: construct underrepresentation and
construct-irrelevant variance
Messick's triumph:
 Validation: integration of logical analysis and empirical
evidence to substantiate the claim
 Validation: scientific, laborious inquiry
 Encouraged evaluators to accumulate as much evidence and
analysis as they can, and not to stake a claim to V
based on a single study in isolation
 Located ethics at the heart of V theory
(but overemphasized the scientific evaluation of values
and downplayed ethical evaluation)
Scientific investigation of the consequences of testing
Failed to provide a persuasive synthesis of science
and ethics, and left a rift between measurement
professionals
5) The deconstruction of V
(2000-2012)
1. Validation practice by argumentation: construct and
defend V claims
 Where to begin? (interpretation and use of scores)
 How to proceed? (make claims and their
assumptions explicit)
 When to stop? (coherent, complete argument with
plausible inferences and assumptions)
 Messick: emphasized sources of evidence and the claim to
V as an overall evaluative judgment
 Kane: emphasized the integration of sources within an
overall V argument, how to construct and defend
claims, a methodology for subdividing V into
chunks
5) The deconstruction of V
(2000-2012)
2. Development of new V theory
 Strong rejection of Cronbach and Messick’s view of
validation as a truly epic, never-
ending, laborious and interminable quest or
undertaking, in which validation was the development of
theory relating one theoretical construct to
others within a large network of theoretical
constructs
5) The deconstruction of V
(2000-2012)
 To Kane, validation is dependent on the particular
interpretation and use of results that the test user
has in mind.
 If it is simple, only a small amount of evidence is needed
3. Drew a distinction between observable and theoretical
attributes: interpretations in terms of theoretical constructs
require scientific inquiry (traditional construct
validation), while interpretations in terms of observable
attributes such as proficiency, vocab knowledge, …
are far easier to validate.
So deconstruction means downplaying the significance of
theoretical constructs.
5) The deconstruction of V
(2000-2012)
 Cizek: no integration of scientific and ethical
analysis is possible since they are mutually
incompatible arguments. That’s why there is a
disjunction between the theory of V and the practice of
validation. It is simply not feasible.
 Denny Borsboom: V is not a property of the
interpretation of test scores, but a property of the
test
 Mitchel, Moss, Embretson
 New framework to evaluate test policy by Newton
and Shaw

More Related Content

What's hot

Louzel Report - Reliability & validity
Louzel Report - Reliability & validity Louzel Report - Reliability & validity
Louzel Report - Reliability & validity Louzel Linejan
 
Presentation validity
Presentation validityPresentation validity
Presentation validityAshMusavi
 
Validity & reliability seminar
Validity & reliability seminarValidity & reliability seminar
Validity & reliability seminarmrikara185
 
Reliability and validity1
Reliability and validity1Reliability and validity1
Reliability and validity1MMIHS
 
15th batch NPTI Validity & Reliablity Business Research Methods
15th batch NPTI Validity & Reliablity Business Research Methods 15th batch NPTI Validity & Reliablity Business Research Methods
15th batch NPTI Validity & Reliablity Business Research Methods Ravi Pohani
 
Tools in Qualitative Research: Validity and Reliability
Tools in Qualitative Research: Validity and ReliabilityTools in Qualitative Research: Validity and Reliability
Tools in Qualitative Research: Validity and ReliabilityDr. Sarita Anand
 
Validity in psychological testing
Validity in psychological testingValidity in psychological testing
Validity in psychological testingMilen Ramos
 
Questionnaire and Instrument validity
Questionnaire and Instrument validityQuestionnaire and Instrument validity
Questionnaire and Instrument validitymdanaee
 
Content &statistical validity
Content &statistical validityContent &statistical validity
Content &statistical validityAMU
 
Reliability and validity w3
Reliability and validity w3Reliability and validity w3
Reliability and validity w3Muhammad Ali
 
Validity, reliability and feasibility
Validity, reliability and feasibilityValidity, reliability and feasibility
Validity, reliability and feasibilitysilpa $H!lu
 
Week 9 validity and reliability
Week 9 validity and reliabilityWeek 9 validity and reliability
Week 9 validity and reliabilitywawaaa789
 

What's hot (20)

Aspects of Validity
Aspects of ValidityAspects of Validity
Aspects of Validity
 
Louzel Report - Reliability & validity
Louzel Report - Reliability & validity Louzel Report - Reliability & validity
Louzel Report - Reliability & validity
 
Presentation validity
Presentation validityPresentation validity
Presentation validity
 
Validation
ValidationValidation
Validation
 
Rep
RepRep
Rep
 
Validity & reliability seminar
Validity & reliability seminarValidity & reliability seminar
Validity & reliability seminar
 
Validity & Reliability
Validity & ReliabilityValidity & Reliability
Validity & Reliability
 
Reliability and validity1
Reliability and validity1Reliability and validity1
Reliability and validity1
 
15th batch NPTI Validity & Reliablity Business Research Methods
15th batch NPTI Validity & Reliablity Business Research Methods 15th batch NPTI Validity & Reliablity Business Research Methods
15th batch NPTI Validity & Reliablity Business Research Methods
 
Tools in Qualitative Research: Validity and Reliability
Tools in Qualitative Research: Validity and ReliabilityTools in Qualitative Research: Validity and Reliability
Tools in Qualitative Research: Validity and Reliability
 
Validity in psychological testing
Validity in psychological testingValidity in psychological testing
Validity in psychological testing
 
Questionnaire and Instrument validity
Questionnaire and Instrument validityQuestionnaire and Instrument validity
Questionnaire and Instrument validity
 
Validity and reliability
Validity and reliabilityValidity and reliability
Validity and reliability
 
Reliability and validity
Reliability and validityReliability and validity
Reliability and validity
 
Content &statistical validity
Content &statistical validityContent &statistical validity
Content &statistical validity
 
Reliability and validity w3
Reliability and validity w3Reliability and validity w3
Reliability and validity w3
 
Validity in Assessment
Validity in AssessmentValidity in Assessment
Validity in Assessment
 
Chapter 6: Validity
Chapter 6: ValidityChapter 6: Validity
Chapter 6: Validity
 
Validity, reliability and feasibility
Validity, reliability and feasibilityValidity, reliability and feasibility
Validity, reliability and feasibility
 
Week 9 validity and reliability
Week 9 validity and reliabilityWeek 9 validity and reliability
Week 9 validity and reliability
 

Viewers also liked

The role of corrective feedback in second language learning
The role of corrective feedback in second language learningThe role of corrective feedback in second language learning
The role of corrective feedback in second language learningAmir Hamid Forough Ameri
 
Thesis summary by amir hamid forough ameri
Thesis summary by amir hamid forough ameriThesis summary by amir hamid forough ameri
Thesis summary by amir hamid forough ameriAmir Hamid Forough Ameri
 
Critical pedagogy in l2 learning and teaching suresh canagarajah
Critical pedagogy in l2 learning and teaching  suresh canagarajahCritical pedagogy in l2 learning and teaching  suresh canagarajah
Critical pedagogy in l2 learning and teaching suresh canagarajahAmir Hamid Forough Ameri
 
Critical literacy and second language learning luke and dooley
Critical literacy and second language learning  luke and dooleyCritical literacy and second language learning  luke and dooley
Critical literacy and second language learning luke and dooleyAmir Hamid Forough Ameri
 
3.4 types of validity
3.4 types of validity3.4 types of validity
3.4 types of validityA M
 
The task based approach some questions and suggestions littlewood
The task based approach some questions and suggestions littlewoodThe task based approach some questions and suggestions littlewood
The task based approach some questions and suggestions littlewoodAmir Hamid Forough Ameri
 

Viewers also liked (20)

Reliability bachman 1990 chapter 6
Reliability bachman 1990 chapter 6Reliability bachman 1990 chapter 6
Reliability bachman 1990 chapter 6
 
The role of corrective feedback in second language learning
The role of corrective feedback in second language learningThe role of corrective feedback in second language learning
The role of corrective feedback in second language learning
 
Sifakis 2007
Sifakis 2007Sifakis 2007
Sifakis 2007
 
Extroversion introversion
Extroversion introversionExtroversion introversion
Extroversion introversion
 
Exploring culture by ah forough ameri
Exploring culture by ah forough ameriExploring culture by ah forough ameri
Exploring culture by ah forough ameri
 
Thesis summary by amir hamid forough ameri
Thesis summary by amir hamid forough ameriThesis summary by amir hamid forough ameri
Thesis summary by amir hamid forough ameri
 
Behavioral view of motivation
Behavioral view of motivationBehavioral view of motivation
Behavioral view of motivation
 
Language testing the social dimension
Language testing  the social dimensionLanguage testing  the social dimension
Language testing the social dimension
 
Critical pedagogy in l2 learning and teaching suresh canagarajah
Critical pedagogy in l2 learning and teaching  suresh canagarajahCritical pedagogy in l2 learning and teaching  suresh canagarajah
Critical pedagogy in l2 learning and teaching suresh canagarajah
 
Context culture .... m. wendt
Context culture .... m. wendtContext culture .... m. wendt
Context culture .... m. wendt
 
Integrated syllabus
Integrated syllabusIntegrated syllabus
Integrated syllabus
 
ANxiety bolouri
ANxiety bolouriANxiety bolouri
ANxiety bolouri
 
Swan.bolouri
Swan.bolouriSwan.bolouri
Swan.bolouri
 
attitide anxiety bolouri
 attitide anxiety bolouri attitide anxiety bolouri
attitide anxiety bolouri
 
Critical literacy and second language learning luke and dooley
Critical literacy and second language learning  luke and dooleyCritical literacy and second language learning  luke and dooley
Critical literacy and second language learning luke and dooley
 
Classroom assessment, glenn fulcher
Classroom assessment, glenn fulcherClassroom assessment, glenn fulcher
Classroom assessment, glenn fulcher
 
3.4 types of validity
3.4 types of validity3.4 types of validity
3.4 types of validity
 
Tsui 2011
Tsui 2011Tsui 2011
Tsui 2011
 
The task based approach some questions and suggestions littlewood
The task based approach some questions and suggestions littlewoodThe task based approach some questions and suggestions littlewood
The task based approach some questions and suggestions littlewood
 
Notional functional syllabus
Notional functional syllabusNotional functional syllabus
Notional functional syllabus
 

Similar to Maryam Bolouri

Applying A Validity Argument To The Viva
Applying A Validity Argument To The VivaApplying A Validity Argument To The Viva
Applying A Validity Argument To The VivaAudrey Britton
 
Research.Design.Sociology.ritchey2022.pptx
Research.Design.Sociology.ritchey2022.pptxResearch.Design.Sociology.ritchey2022.pptx
Research.Design.Sociology.ritchey2022.pptxLynn Ritchey
 
Psychometric properties
Psychometric propertiesPsychometric properties
Psychometric propertiesYoussef2000
 
11.vol 0005www.iiste.org call for paper_no 1-2_ pp. 25-64
11.vol 0005www.iiste.org call for paper_no 1-2_ pp. 25-6411.vol 0005www.iiste.org call for paper_no 1-2_ pp. 25-64
11.vol 0005www.iiste.org call for paper_no 1-2_ pp. 25-64Alexander Decker
 
MYP Science Year 4-5 Criterion D Rubric
MYP Science Year 4-5 Criterion D RubricMYP Science Year 4-5 Criterion D Rubric
MYP Science Year 4-5 Criterion D RubricBrad Kremer
 
Outline & Research Design RoadmapThis exercise will help you bui.docx
Outline & Research Design RoadmapThis exercise will help you bui.docxOutline & Research Design RoadmapThis exercise will help you bui.docx
Outline & Research Design RoadmapThis exercise will help you bui.docxalfred4lewis58146
 
IDS Impact, Innovation and Learning Workshop March 2013: Day 2, Paper Session...
IDS Impact, Innovation and Learning Workshop March 2013: Day 2, Paper Session...IDS Impact, Innovation and Learning Workshop March 2013: Day 2, Paper Session...
IDS Impact, Innovation and Learning Workshop March 2013: Day 2, Paper Session...Institute of Development Studies
 
internal Assign no 206 ( JAIPUR NATIONAL UNI)
internal Assign no   206 ( JAIPUR NATIONAL UNI)internal Assign no   206 ( JAIPUR NATIONAL UNI)
internal Assign no 206 ( JAIPUR NATIONAL UNI)Partha_bappa
 
3-Quantitative and Qualitative Research and principles
3-Quantitative and Qualitative  Research and principles3-Quantitative and Qualitative  Research and principles
3-Quantitative and Qualitative Research and principlesBeaOrbita1
 
Naf 2010 presentation teachers rev 7.10 copy
Naf 2010 presentation  teachers rev 7.10 copyNaf 2010 presentation  teachers rev 7.10 copy
Naf 2010 presentation teachers rev 7.10 copyNAFCareerAcads
 
Writing the Theoretical and Conceptual Framework of a Quantitative Research
Writing the Theoretical and Conceptual Framework of a Quantitative ResearchWriting the Theoretical and Conceptual Framework of a Quantitative Research
Writing the Theoretical and Conceptual Framework of a Quantitative Researchschool
 
The final examination of the UK PhD: fit for purpose?
The final examination of the  UK PhD: fit for purpose?The final examination of the  UK PhD: fit for purpose?
The final examination of the UK PhD: fit for purpose?UKCGE
 
Developing an in-house speaking assessment: Rasch analysis for action research
Developing an in-house speaking assessment: Rasch analysis for action researchDeveloping an in-house speaking assessment: Rasch analysis for action research
Developing an in-house speaking assessment: Rasch analysis for action researchAndy Vajirasarn
 

Similar to Maryam Bolouri (20)

Newton ch2
Newton ch2Newton ch2
Newton ch2
 
2010 01 psyf 588 master syllabus
2010 01 psyf 588 master syllabus2010 01 psyf 588 master syllabus
2010 01 psyf 588 master syllabus
 
Validity & reliability
Validity & reliabilityValidity & reliability
Validity & reliability
 
Prof. dr. Rolf Fasting
Prof. dr. Rolf Fasting Prof. dr. Rolf Fasting
Prof. dr. Rolf Fasting
 
Applying A Validity Argument To The Viva
Applying A Validity Argument To The VivaApplying A Validity Argument To The Viva
Applying A Validity Argument To The Viva
 
Research.Design.Sociology.ritchey2022.pptx
Research.Design.Sociology.ritchey2022.pptxResearch.Design.Sociology.ritchey2022.pptx
Research.Design.Sociology.ritchey2022.pptx
 
Criterion Essay
Criterion EssayCriterion Essay
Criterion Essay
 
Psychometric properties
Psychometric propertiesPsychometric properties
Psychometric properties
 
11.vol 0005www.iiste.org call for paper_no 1-2_ pp. 25-64
11.vol 0005www.iiste.org call for paper_no 1-2_ pp. 25-6411.vol 0005www.iiste.org call for paper_no 1-2_ pp. 25-64
11.vol 0005www.iiste.org call for paper_no 1-2_ pp. 25-64
 
MYP Science Year 4-5 Criterion D Rubric
MYP Science Year 4-5 Criterion D RubricMYP Science Year 4-5 Criterion D Rubric
MYP Science Year 4-5 Criterion D Rubric
 
Methodo Delphi
Methodo DelphiMethodo Delphi
Methodo Delphi
 
Outline & Research Design RoadmapThis exercise will help you bui.docx
Maryam Bolouri

  • 1. VALIDITY IN EDUCATIONAL ASSESSMENT, CHAPTER ONE (NEWTON & SHAW, 2014). Maryam Bolouri
  • 2. Topics to be discussed: validity definition; types of validity; the history of validity.
  • 3. Validity: 100 years of debate. Agreement (bland consensus) among scholars: validity is the hallmark of quality in testing and the most important criterion for evaluating a test. Discrepancies: different perspectives on the meaning of validity: 1) Measurement concept (original scientific definition in the 1920s) 2) Measurement and decision-making concept (between these 2 extremes, scientific and ethical) 3) Measurement, decision-making and impacts concept (social and ethical definition, concerns test use)
  • 4. Why validity is so confusing: 1) Very large and disparate literature on V within edu and psych measurement 2) Very large and disparate literature on V within other disciplines 3) The official meaning of V has evolved over time 4) V has gained stature and expanded over time 5) Related accounts are hard to read 6) V is employed in different ways and contexts, so it is unclear what the intended meaning is. If it is meant in a technical sense, which discipline does it refer to?
  • 5. Validity (definition): Validity is the property or quality of being valid, true, cogent or legally acceptable (everyday definition). Validity has been associated with different meanings across disciplines. Validity theory: conceptual framework (FW). Validation practice: investigation into V, or a process of investigation guided by V theory. The word derives from the Latin “validus”, meaning strong, healthy, worthy.
  • 6. Validity across disciplines: Philosophy and logic: a deductive argument is valid if and only if it is not possible for all the premises to be true when its conclusion is false. It is not related to validation but to the strength of the validity argument. Law and economics. Genetic testing. Management. Edu and psych measurement.
  • 7. Validity for research or measurement? V for research (Campbell, 1957): relevant to conclusions based on evidence, of 2 kinds. Internal V: the degree of confidence that the conclusions or observed effect are genuine for the experimental group and that the design rules out other irrelevant explanations. External V: the degree of confidence that conclusions can be generalized from the experimental group to the intended population. It is decreased if the sampling is biased or the process itself causes the effect.
  • 8. Validity for research: External V is subdivided into: 1. Population V: confidence in the generalization of conclusions across populations 2. Ecological V: confidence in the generalization of conclusions across conditions a) Outcome V: across dependent variables b) Temporal V: across time c) Treatment V: across treatment variations
  • 9. Cook and Campbell 4-way classification or 4 major decisions (1979): Internal V into: statistical conclusion V + internal V. External V into: construct V + external V. Validity is confidence in the credibility of description and interpretation, or the legitimacy of the produced knowledge. It is the social consequence of qualitative research.
  • 10. Lather's 3 main conceptions of V (1986): 1. Face V: member checking 2. Construct V: systematized reflexivity of the researcher’s theory in response to the data 3. Catalytic V: facilitating the transformation of reality; reorienting, focusing, and energizing the participants
  • 11. These two Vs are confusing: 1. The same key contributors appear in both theories 2. Ideas in V for research are borrowed from V for measurement 3. A few similar terms appear in both literatures with different meanings 4. V for research is involved in all V for measurement, yet the reverse is not true.
  • 12. Validity for measurement: measuring an attribute by a test. Individuals are measured to make decisions. The more accurate the measurement, the better the decision. Professions are characterized in terms of the different attributes they need to measure, e.g.: manic depression, to decide treatment; math achievement, to place a student in a class. Ultimate purpose: improve the ratio of correct to incorrect decisions.
  • 13. Peculiarity of V for measurement: Results of a single test are used for multiple purposes or interpreted in terms of a variety of attributes. Is that possible? Can each attribute be measured with sufficient accuracy? How can one defend the use of results or support a claim to validity?
  • 14. Kinds of V for measurement, Cronbach (1949): 1. Logical V: based on logical analysis and content 2. Empirical V: based on empirical evidence such as correlation of scores and test
  • 15. Approaches to investigating V from the 1950s to the 1970s: 1. Content V: derivative of logical V 2. Criterion V: derivative of empirical V (subdivided into concurrent and predictive V) 3. Construct V: scientific and last-resort V (when neither of the others could be relied on). New kinds of V continue to be proposed even to the present day, which is bizarre, because since the mid 1980s it has been officially recognized that V is a unitary concept (Messick, 1980): there is only one kind of V, and that is construct V.
  • 16. Measurement in edu and psych: assessment; performance evaluation; diagnosis. It carries both technical and emotional baggage. It is used in its loosest sense and embraces anyone with a professional remit for measurement, assessment, evaluation and diagnosis.
  • 17. Test or score means: any structured assessment of behavior; any measurement procedure; a set of procedures to elicit, evaluate or interpret a behavior. The outcome of a test will be summarized as a score, report, or profile to characterize an individual in terms of the attribute being measured. V is a quality of this procedure, so calling it valid is tantamount to a thumbs-up, green light, or stamp of approval.
  • 18. Review: saying that a test or measurement procedure is valid is a claim, based on evidence, that the test can be used to measure a certain kind of attribute and to make a certain type of decision in future.
  • 19. 1) What to measure? A number of contenders are: characteristic to be measured (human); trait (human); disposition (human); construct (the most frequent one); attribute; achievement, attainment, aptitude, attitude, proficiency, competence, etc.
  • 20. Construct V: a terminological conundrum: If all of V is now construct V, then “construct” is redundant. It is also misleading because it implies a tradition that is no longer credible in this century. A large amount of the literature on constructs in edu and psych is specific to measurement and construct V. There is no straightforward solution to this problem.
  • 21. Attribute: Why: 1. Minimize the confusion 2. Reserve “construct” for talk about construct V. Certain attributes are of significant importance in our field (with no universal agreement on their connotations): achievement (evaluative overtone), aptitude (innateness connotation), intelligence.
  • 22. Problem with attribute names: They fall in and out of fashion. Their everyday names can change over time. The scientific understanding of them may change. New names may be proposed for particular implications.
  • 23. For instance: SAT. 1926: to assess academic readiness for university (Scholastic Aptitude Test), with a connotation of innate and fixed ability akin to intelligence. 1990: does not measure something innate (Scholastic Assessment Test). Later: SAT retained as a name in its own right.
  • 24. For instance: end-of-course test. Achievement: evaluative overtone; accomplishment after following a course of instruction; may be mastering for one learner. Attainment: assumed neutral; an attempt to master particular learning outcomes; again specific to one individual. Competence or proficiency: capacity to do X, Y, or Z regardless of following a particular course of learning or instruction.
  • 25. V and R: Early definition of V: the degree to which a test measures what it is supposed to measure. Definition of R: consistency of outcome. V vs. R: accuracy vs. consistency. In the absence of consistency, any claim to be able to measure accurately would be indefensible. We might be consistently wrong, so consistency is not sufficient, but it is a necessary condition for high measurement quality. It is just one facet of V.
  • 26. The history of V in the edu and psych field: different answers over the years to two fundamental questions, aimed at improving validation practice: 1) What does it mean to claim V? (V theory) 2) How can a validity claim be substantiated? (validation practice)
  • 27. V theory: No comprehensive, coherent, clear account of it. Mid 1950s: documents prepared by committees of measurement professionals from North America (AKA technical recommendations or standards): an encapsulated version of all official statements; succinct guidance on validation and the meaning of V; not only well developed but also ambiguous; the product of NA committees became the lingua franca for the world; a product of compromise rather than universal satisfaction with poor validation practice.
  • 28. Main challenge for V theorists: move beyond the disparate heuristic principles of the standards towards a comprehensive account of V. They got closer to it from the 1970s to the 1990s through the scholarship of Messick: comprehensive yet unclear, long, dense, philosophically challenging and viscous. Validity accounts: Cureton 1951, Messick 1989. Test validation: Cronbach 1971. Validation: Kane 2006.
  • 29. Three phases of history (first classification): 1. Pre-Trinitarian 2. Trinitarian (content V, criterion V, construct V), 1950s to 1970s: the holy trinity 3. Unitarian
  • 30. Newton and Shaw classification (5 key phases): 1. Mid 1800s-1920: gestational period 2. 1921-1951: period of crystallization 3. 1952-1974: period of fragmentation 4. 1975-1999: period of reunification 5. 2000-2012: period of deconstruction. No sharp line between them, just a crude attempt to structure the course. Many of the transitions correspond to the publication of a new version of the standards, which captured the zeitgeist between eras.
  • 31. Newton and Shaw classification (5 key phases): They focus on: 1. The conception of V 2. How to employ logical analyses and empirical evidence to substantiate a claim to V 3. When to employ logical analyses and empirical evidence to substantiate a claim to V
  • 32. 1) Gestational period (mid 1800s-1920): 1. Structured assessment: better decision making, facilitating outcomes, fairer for individuals and useful for society; introduction of written and local examinations in the USA and England 2. More structure and less objectivity by the end of the 19th century: introduction of true/false (T/F), multiple-choice (MC), completion, and standardized tests 3. Advances in statistical procedures, invention of the correlation coefficient, tests of mental capacities 4. Early years of the 20th century: measurement movement; all sorts of tests of all sorts of attributes in all formats; success of tests for placement and selection
  • 33. 2) A period of crystallization (1921-1951): Development of tests for many uses: 1. Tests of edu achievement: judge students and schools 2. Tests of intelligence: diagnose backwardness and excellence 3. Tests of specific aptitude: vocational guidance. Importance of quality and control; seeking consensus on the meaning of terms and procedures such as R and V.
  • 34. 2) A period of crystallization (1921-1951): How to establish a claim to V, by 2 approaches: 1. Logical analysis of test content: a group of expert practitioners scrutinizes the content of the test and judges whether it matches the content of the curriculum. 2. Empirical evidence of correlation between the test and what was supposed to be measured. Key questions: What ought the test results to be correlated against? What criterion should be used to judge the results? Are expert judgments valid?
  • 35. 2) A period of crystallization (1921-1951): Validate a short standardized test of achievement against a long, comprehensive assessment of achievement that covers the full range of learning outcomes. A high correlation with the long one validates the test as a measurement of the full domain. Different communities molded the definition based on their interests: 1. Psychologists with an interest in aptitude prioritized empirical evidence of correlation 2. Educators with an interest in achievement prioritized logical analysis of content
  • 36. 3) The fragmentation of V (1952-1974): Publication of the first standards in 1952, to govern info of test producers, by a committee of the APA chaired by Cronbach. Previous classifications of V into types, such as: 1) Logical V and empirical V (1949) 2) Curricular V, statistical V, psychological and logical V (neither of the previous 2; arm-chair dissection of the total process) (1943) 3) 4 types of V: content, predictive, status, congruent V (1952). Final publication 1954: content, predictive, concurrent, construct.
  • 37. 3) The fragmentation of V (1952-1974): Intention of construct V: for cases when neither logical analysis nor empirical evidence was regarded as sufficient. Certain types of tests are evaluated in relation to a universe of content (content V). Aptitude tests are evaluated in relation to a criterion measure. For other tests, such as personality tests, there was no yardstick and a different procedure was needed.
  • 38. 3) The fragmentation of V (1952-1974): Construct V determines what psychological construct accounts for test performance. Construct: a postulated attribute that is manifest in test performance. It subsumed logical analysis, empirical evidence, and any other forms of evidence that could be brought to bear on the psychological meaning of a score, so it is quintessentially scientific and relies on a theory.
  • 39. 3) The fragmentation of V (1952-1974): Second revision of standards in 1966: 4 types of V collapsed into 3: 1. Content V 2. Criterion-related V 3. Construct V. Third revision of standards in 1974. They are not mutually exclusive, but V theory and validation fragmented along these lines: “validity types” as alternatives to validation. None is preferable over the others. Problem: the definition of criterion V couldn’t be reconciled with the classic definition. Predictor tests were black boxes, and irrelevant if they did not predict the criterion with accuracy.
  • 40. 4) The reunification of V: the Messick years (1974-1999): 4th edition of standards in 1985 and 5th in 1999. All V ought to be understood as construct V. Demolished the distinction between V for measurement and V for prediction. 3 fundamental imperatives for validation: 1. Establish that the criterion measure measured what it was supposed to measure 2. Establish that the aptitude test measured what it was supposed to measure 3. Establish a theoretical rationale and present evidence for the aptitude test
  • 41. 4) The reunification of V: the Messick years (1974-1999): Before the 1970s: blind empiricism and claims to V based on logical analysis. Messick upped the ante: 1. Test performance must be representative of learning outcomes. 2. Variance of test scores must be attributable to construct-relevant factors. 3. Twin threats of construct underrepresentation and construct-irrelevant variance.
  • 42. Messick triumph: Validation: integration of logical analysis and empirical evidence to substantiate the claim. Validation: scientific, laborious inquiry. Encouraged evaluators to accumulate as much evidence and analysis as they can, and not stake a claim to V based on a single study in isolation. Located ethics at the heart of V theory (overemphasized the scientific evaluation of values and downplayed ethical evaluation). Scientific investigation of consequences from testing. Failed to provide a persuasive synthesis of science and ethics and left a rift between measurement professionals.
  • 43. 5) The deconstruction of V (2000-2012): 1. Validation practice by argumentation: construct and defend V claims. Where to begin? (interpretation and use of score) How to proceed? (make explicit the claims and their assumptions) When to stop? (coherent, complete argument with plausible inferences and assumptions). Messick: emphasized sources of evidence and the claim to V as an overall evaluative judgment. Kane: emphasized the integration of sources within an overall V argument, how to construct and defend claims, a methodology for subdividing the V into chunks.
  • 44. 5) The deconstruction of V (2000-2012): 2. Development of new V theory: Strong rejection of Cronbach and Messick’s validation, which was a truly epic, never-ending, laborious and interminable quest or undertaking. Validation was the development of theory relating one theoretical construct to others within a large network of theoretical constructs.
  • 45. 5) The deconstruction of V (2000-2012): To him, validation is dependent on the particular interpretation and use of results that the test user has in mind. If it is simple, a small amount of evidence is needed. 3. Drew a distinction between observable and theoretical attributes: interpretations of theoretical constructs are scientific inquiry (traditional construct validation), while interpretations of observable attributes such as proficiency, vocab knowledge, … are far easier. So deconstruction means downplaying the significance of theoretical constructs.
  • 46. 5) The deconstruction of V (2000-2012): Cizek: no integration of scientific and ethical analysis is possible since they are mutually incompatible arguments. That’s why there is a disjunction between the theory of V and the practice of validation. It is simply not feasible. Denny Borsboom: V is not a property of the interpretation of a test score, but a property of the test. Mitchel, Moss, Embretson. New FW to evaluate test policy by Newton and Shaw.
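
Slides 25 and 35 rest on two quantitative ideas: a measure can be consistent yet wrong, and a short test can be checked by correlating it with a longer criterion assessment. The following is a minimal sketch of both ideas, not a procedure taken from Newton and Shaw. All scores, the `TRUE_SCORE` value, and the helper names (`pearson_r`, `spread`, `bias`) are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 scores)")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / (var_x * var_y) ** 0.5


def spread(scores):
    """Consistency proxy: standard deviation of repeated measurements."""
    return pstdev(scores)


def bias(scores, true_score):
    """Accuracy proxy: how far the average measurement is from the true level."""
    return mean(scores) - true_score


# Hypothetical repeated measurements of one learner whose true level is 50.
TRUE_SCORE = 50
consistent_but_wrong = [60, 61, 59, 60, 60]      # same spread, large bias
consistent_and_accurate = [50, 51, 49, 50, 50]   # same spread, no bias

# Hypothetical scores of six learners on a short test and on a long,
# comprehensive criterion assessment of the same domain.
short_test = [12, 15, 9, 20, 17, 11]
long_assessment = [48, 61, 40, 79, 66, 45]
criterion_r = pearson_r(short_test, long_assessment)

if __name__ == "__main__":
    print("spread (wrong, accurate):",
          spread(consistent_but_wrong), spread(consistent_and_accurate))
    print("bias   (wrong, accurate):",
          bias(consistent_but_wrong, TRUE_SCORE),
          bias(consistent_and_accurate, TRUE_SCORE))
    print("short vs long correlation:", round(criterion_r, 3))
```

Both sets of repeated measurements have the same spread, so they are equally consistent, but only one is centered on the true level. This is why consistency is necessary but not sufficient. A high correlation between the short and long assessments is the kind of empirical evidence the crystallization period relied on. It still leaves open the slide's own question of whether the criterion itself is valid.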