Evaluating validity of criterion-referenced test score interpretations and uses
Takaaki Kumazawa
Kanto Gakuin University
(ktakaaki@kanto-gakuin.ac.jp)
Kintai Bridge, Japan (wiki
Purpose
• The purpose of my talk is to evaluate the validity of criterion-referenced placement test score interpretations and uses, using Kane’s (2006) argument-based validity framework
• This presentation is based on a paper I published in the JALT Journal
  (http://jalt-publications.org/jj/issues/2013-05_35.1)
Classical view of validity
• Validity: the extent to which a test measures what it is supposed to measure
• Three types of validity
  ◦ Criterion-related validity
    Correlation between an established, valid measure and the test being developed
  ◦ Content validity
    Experts’ judgment on whether the items measure what they are supposed to measure
  ◦ Construct validity
    Statistical examination of whether the items measure what they are supposed to measure
Current view of validity
• Validity is “the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests” (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education [AERA, APA, & NCME], 1999, p. 9).
Argument-based validity framework
Interpretive argument: laying out the inferences to be made from test performance and arguing that they are theoretically valid
Validity argument: evaluating the interpretive argument by providing backing for each inference

Observation → (scoring) → Observed score → (generalization) → Universe score → (extrapolation) → Target score → (decision) → Use
Interpretive argument
• Scoring inference
  ◦ To what extent do examinees answer the placement items correctly, and do high-scoring examinees answer more of them correctly?
• Generalization inference
  ◦ To what extent are the placement items consistently sampled from a domain and sufficient in number to reduce measurement error?
• Extrapolation inference
  ◦ To what extent does the difficulty of the placement items match the objectives of a reading course?
• Decision inference
  ◦ To what extent do the placement decisions, made to place examinees in their proper level of the course, have an impact on washback in the course?
Participants
  ◦ 428 Japanese first-year university students majoring in law
  ◦ TOEIC scores of about 250-450
  ◦ Three courses in the English program
    Reading
    Listening
    TOEIC skills
• Proficiency-based program
  ◦ Three levels
    Level 1: 60 high-scoring students
      Major objective of the reading course: improve reading skills such as fast reading
    Level 2: about 300 students
    Level 3: 50 low-scoring students
      Major objective of the reading class: re-learn junior high and high school grammar
Criterion-referenced placement test
• Grammar (k = 40)
  ◦ Items are taken from textbooks used in junior high and high schools
  ◦ Grammar points: present, past, and future tenses, continuous, relative pronouns, gerunds, participles, etc.
  ◦ Sample: Hi, I ( ) Ken.
    1. am 2. are 3. is 4. be
• Vocabulary (k = 40)
  ◦ Items are taken from high-frequency words at the 1000-3000 levels, based on the JACET 8000 corpus
  ◦ Sample: Bring
    1. 送る (send) 2. 持ってくる (bring) 3. 鳴る (ring) 4. 購入する (buy)
• Reading (k = 10)
  ◦ Two passages are taken from the two textbooks used in the Level 1 and Level 3 reading classes
  ◦ Sample: How do they travel?
    1. by plane 2. by bus 3. by car 4. by train
Procedures
• On the first day of the semester, the placement test was administered in 45 minutes
• A grammar pretest (k = 55, α = .85) was given on the first day of class in Level 2 classes (n = 51) and Level 3 classes (n = 49)
• Thirty 90-minute lessons were taught over two semesters
• The same grammar test was given as a posttest (α = .92) on the last day of class to the same students (n = 51, 49)
• A course evaluation survey was given to the same students (n = 51, 49)
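
The α values reported for the pretest and posttest are Cronbach's alpha reliability estimates. As a minimal sketch of how such a coefficient is computed from dichotomously scored items, the function below applies the standard formula to a small response matrix. The function name and the toy data are hypothetical and are not taken from the study.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a persons-by-items matrix of item scores.

    responses: list of rows, one row per student, one 0/1 score per item.
    """
    k = len(responses[0])  # number of items
    # Variance of each item across students
    item_vars = [pvariance([row[i] for row in responses]) for i in range(k)]
    # Variance of the students' total scores
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: 5 students x 4 items (not the study's data)
toy_responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(toy_responses)
print(f"alpha = {alpha:.2f}")
```

Population variances are used for both the items and the totals; using sample variances throughout would give the same ratio.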
Backing for scoring inference
• Item facility
  ◦ 7 items below .29
  ◦ 62 items between .30 and .70
  ◦ 21 items above .71
• Item discrimination
  ◦ 4 items below .19
  ◦ 86 items above .20
• Rasch item difficulty estimates
  ◦ -3.79 to 2.33
• Infit MS
  ◦ 0.80 to 1.30
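
The item facility and item discrimination counts above come from classical item analysis. The sketch below shows one common way to compute the two indices. It assumes an upper-group minus lower-group discrimination index based on the top and bottom thirds of total scores; the slides do not say which discrimination index was used, so that choice, the function names, and the toy data are all illustrative assumptions.

```python
def item_facility(responses, item):
    """Proportion of students who answered the given item correctly."""
    return sum(row[item] for row in responses) / len(responses)


def item_discrimination(responses, item, fraction=1 / 3):
    """Facility in the top-scoring group minus facility in the bottom group.

    Assumption: groups are the top and bottom `fraction` of students
    ranked by total score.
    """
    ranked = sorted(responses, key=sum, reverse=True)
    n = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:n], ranked[-n:]
    return item_facility(upper, item) - item_facility(lower, item)


# Hypothetical toy data: 6 students x 3 items (not the study's data)
toy_responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 0],
    [0, 0, 0],
]

for item in range(3):
    facility = item_facility(toy_responses, item)
    discrimination = item_discrimination(toy_responses, item)
    print(f"item {item}: IF = {facility:.2f}, ID = {discrimination:.2f}")
```

With indices like these, each item can then be sorted into bands such as the .30 to .70 facility range reported above.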
Backing for generalization inference
• Multivariate generalizability theory (decision study of a persons × items design)
  ◦ Grammar (k = 40, ρ = .85, Φ = .83)
  ◦ Vocabulary (k = 40, ρ = .86, Φ = .84)
  ◦ Reading (k = 10, ρ = .58, Φ = .55)
  ◦ Total (k = 90, ρ = .92, Φ = .91)
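
To make the two coefficients above concrete, here is a simplified sketch of a univariate persons × items generalizability analysis: variance components are estimated from the mean squares, and a decision study then gives a generalizability coefficient (for relative decisions) and a dependability coefficient Φ (for absolute decisions) for k items. The study itself used a multivariate analysis, so this is an illustration of the idea under simplifying assumptions, not the author's procedure. Function names and toy data are hypothetical.

```python
def variance_components(scores):
    """Estimate person, item, and residual variance for a p x i design."""
    n_p, n_i = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_i)
    p_means = [sum(row) / n_i for row in scores]
    i_means = [sum(row[j] for row in scores) / n_p for j in range(n_i)]

    ss_p = n_i * sum((m - grand) ** 2 for m in p_means)
    ss_i = n_p * sum((m - grand) ** 2 for m in i_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_total - ss_p - ss_i

    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))

    var_res = ms_res
    var_p = max(0.0, (ms_p - ms_res) / n_i)  # negative estimates set to 0
    var_i = max(0.0, (ms_i - ms_res) / n_p)
    return var_p, var_i, var_res


def d_study(var_p, var_i, var_res, k):
    """Generalizability and dependability coefficients for a k-item test."""
    generalizability = var_p / (var_p + var_res / k)
    dependability = var_p / (var_p + (var_i + var_res) / k)
    return generalizability, dependability


# Hypothetical toy data: 5 students x 4 items (not the study's data)
toy_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

components = variance_components(toy_scores)
for k in (4, 10, 40):
    g, phi = d_study(*components, k)
    print(f"k = {k}: generalizability = {g:.2f}, phi = {phi:.2f}")
```

Φ is never larger than the generalizability coefficient because its error term also includes item variance, and both rise as k grows, which is consistent with the 10-item reading section showing the lowest values above.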
Cut point for Level 1: Level 1 reading
Cut point for Level 3: junior high grammar and 1000-word level
Backing for extrapolation inference
Difficulty level estimates

Level                      Difficulty   SE     Infit MS
Junior high grammar          -0.65      0.03   1.00
High school grammar           0.29      0.02   1.00
1000-word-level vocab        -0.94      0.03   1.00
2000-word-level vocab         0.15      0.03   1.00
3000-word-level vocab         0.12      0.05   1.00
Level 3 reading               0.30      0.05   1.00
Level 1 reading               0.73      0.05   1.10

FACETS map
[Figure: FACETS variable map. Columns: measure (logits, +3 to -4), students (* = 4), items (* = 1), and levels. Item groups placed on the map: Level 1 Reading, Basic HS Grammar, JACET 2000, JACET 3000, Jr H Grammar, JACET 1000. Cut points for Levels 1, 2, 3: Level 1a (1.49), Level 1b (.77), Level 2 (.77 to -.70), Level 3a (-.70), Level 3b (-.99).]
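
One way to read the difficulty estimates against the cut points is through the dichotomous Rasch model, in which the probability of a correct answer depends only on the difference between a student's ability and an item's difficulty, both in logits. The sketch below is an illustration under assumptions: it treats each level's difficulty estimate as if it were a single item's difficulty, and it uses the cut points of .77 and -.70 read from the map. The function and variable names are hypothetical.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Difficulty estimates (logits) from the table of level difficulties
difficulties = {
    "Junior high grammar": -0.65,
    "High school grammar": 0.29,
    "1000-word-level vocab": -0.94,
    "2000-word-level vocab": 0.15,
    "3000-word-level vocab": 0.12,
    "Level 3 reading": 0.30,
    "Level 1 reading": 0.73,
}

# Cut points (logits) read from the map; treated here as student abilities
cut_points = {"Level 1 cut (.77)": 0.77, "Level 3 cut (-.70)": -0.70}

for cut_name, ability in cut_points.items():
    print(cut_name)
    for level, difficulty in difficulties.items():
        p = rasch_probability(ability, difficulty)
        print(f"  {level}: P(correct) = {p:.2f}")
```

Under these assumptions, a student exactly at the .77 cut point has roughly even odds on Level 1 reading content, and a student at the -.70 cut point has roughly even odds on junior high grammar, which matches the content used to define the two cut points.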
Backing for decision inference
Level 2 and Level 3 students’ (n = 51, 49) grammar pretest and posttest scores (k = 55)

               Grammar pretest (α = .85)    Grammar posttest (α = .92)
Class level     n      M       SD            n      M       SD
Level 2a       26    30.38    6.34          21    12.14    2.50
Level 2b       25    32.36    8.47          24    28.63    7.93
Level 2        51    31.35    7.45          45    20.93   10.24
Level 3c       25    20.80    5.09          22    26.82    5.21
Level 3d       24    19.88    4.29          23    26.78    5.95
Level 3        49    20.35    4.69          45    26.80    5.53

Callouts on the slide:
Pretest: Level 2 students scored higher
Posttest: Level 3 students scored higher
Level 2: 11 points down
Level 3: 6 points up
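
The level means and the size of the pretest-to-posttest change can be checked directly from the class-level rows of the table. The short sketch below recomputes each level's mean as an n-weighted average of its two classes and then takes the difference. All numbers are copied from the table; the variable and function names are hypothetical.

```python
# (n, mean) per class, copied from the pretest and posttest columns
pretest = {
    "Level 2": [(26, 30.38), (25, 32.36)],
    "Level 3": [(25, 20.80), (24, 19.88)],
}
posttest = {
    "Level 2": [(21, 12.14), (24, 28.63)],
    "Level 3": [(22, 26.82), (23, 26.78)],
}


def weighted_mean(groups):
    """Mean of the combined group, weighting each class mean by its n."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n


gains = {}
for level in pretest:
    pre = weighted_mean(pretest[level])
    post = weighted_mean(posttest[level])
    gains[level] = post - pre
    print(f"{level}: pretest {pre:.2f}, posttest {post:.2f}, "
          f"change {gains[level]:+.2f}")
```

The weighted means reproduce the Level 2 and Level 3 rows of the table, with a drop of a little over 10 points for Level 2 and a gain of about 6 points for Level 3. Note that the posttest groups are smaller than the pretest groups, so the comparison is not based on exactly the same students.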
Validity argument
• Scoring inference
  ◦ Because most items were functioning well, the inference from the observation to the observed score was valid
• Generalization inference
  ◦ Because dependability was high and measurement error was small, the inference from the observed score to the universe score was valid
• Extrapolation inference
  ◦ Because the difficulty of the items was appropriate for the objectives of the program, the inference from the universe score to the target score was valid
• Decision inference
  ◦ Because Level 3 students were placed in the right level and were able to improve their grammar test scores, the inference from the target score to test use was valid
Conclusion
• “Validation is simple in principle, but difficult in practice. The argument-based framework provides a relatively pragmatic approach to validation” (Kane, 2012, p. 15).
William Jolly Bridge, Brisbane
References
• Kane, M. (2006). Validation. In R. Brennan (Ed.), Educational measurement (4th ed., pp. 17-64). Westport, CT: Greenwood Publishing.
• Kane, M. (2012). Validating score interpretations and uses. Language Testing, 29, 3-17. doi: 10.1177/0265532211417210
• Kumazawa, T. (2013). Evaluating validity for in-house placement test score interpretations and uses. JALT Journal, 35, 73-100.

  • 1. Evaluating validity of criterion- referenced test score interpretations and uses Takaaki Kumazawa Kanto Gakuin University (ktakaaki@kanto-gakuin.ac.jp) Kintai Bridge, Japan (wiki
  • 2. Purpose ß The purpose of my talk is to evaluate validity of criterion-referenced placement test score interpretations and uses using Kane’s (2006) argument-based validity framework ß This presentation is based on a paper I published in the JALT Journal (http://jalt-publications.org/jj/issues/2013-05_35.1)
  • 3. Classical view of validity ß Validity: the extent to which a test is supposed to measure ß Three types of validity Þ Criterion-related validity Correlation between a valid measure and a test developing Þ Content validity Experts’ judgment on whether items are measuring what is supposed to measure Þ Construct validity Statistical examination on whether items are measuring what is supposed to measure
  • 4. Current view of Validity ß Validity is “the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests” (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education [AERA, APA, & NCME], 1999, p. 9).
  • 5. Argument-based validity framework
      ◦ Interpretive argument: laying out the argument that the inferences to be made are theoretically valid
      ◦ Validity argument: evaluating the interpretive argument by providing warrants
      ◦ Chain of inferences: Observation → (scoring) → Observed score → (generalization) → Universe score → (extrapolation) → Target score → (decision) → Use
  • 6. Interpretive argument
      ◦ Scoring inference: to what extent do examinees get placement items correct, and do high-scoring examinees get more placement items correct?
      ◦ Generalization inference: to what extent are placement items consistently sampled from a domain and sufficient in number to reduce measurement error?
      ◦ Extrapolation inference: to what extent does the difficulty of the placement items match the objectives of a reading course?
      ◦ Decision inference: to what extent do decisions that place examinees in their proper level of the course have a washback effect in the course?
  • 7. Participants
      ◦ 428 Japanese first-year university students majoring in law
      ◦ TOEIC scores of about 250-450
      ◦ Three courses in the English program: Reading, Listening, TOEIC skills
      ◦ Proficiency-based program with three levels
          - Level 1: 60 high-scoring students. Major objective of the reading course: improve reading skills such as fast reading
          - Level 2: about 300 students
          - Level 3: 50 low-scoring students. Major objective of the reading class: re-learn junior high and high school grammar
  • 8. Criterion-referenced placement test
      ◦ Grammar (k = 40)
          - Items are taken from textbooks used in junior high and high schools
          - Grammar points: present, past, and future tenses, continuous, relative pronouns, gerunds, participles, etc.
          - Sample: Hi, I ( ) Ken. 1. am 2. are 3. is 4. be
      ◦ Vocabulary (k = 40)
          - Items are taken from the high-frequency 1000 to 3000 word levels based on the JACET 8000 corpus
          - Sample: Bring 1. 送る (send) 2. 持ってくる (bring) 3. 鳴る (ring) 4. 購入する (buy)
      ◦ Reading (k = 10)
          - Two passages are taken from two textbooks used in Level 1 and Level 3 reading classes
          - Sample: How do they travel? 1. by plane 2. by bus 3. by car 4. by train
  • 9. Procedures
      ◦ On the first day of the semester, the placement test was given in 45 minutes
      ◦ A grammar pretest (k = 55, α = .85) was given on the first day of class in Level 2 classes (n = 51) and Level 3 classes (n = 49)
      ◦ Thirty 90-minute lessons were held over two semesters
      ◦ The same grammar test was given as a posttest (α = .92) on the last day of class to the same students (n = 51, 49)
      ◦ A course evaluation survey was given to the same students (n = 51, 49)
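The α values reported for the grammar pretest and posttest are Cronbach's alpha coefficients. As a minimal sketch of how such a coefficient is computed, the block below applies the standard formula to a small made-up response matrix. The function name `cronbach_alpha` and the data in `toy_scores` are illustrative assumptions, not the study's data or software.

```python
from statistics import variance  # sample variance (n - 1 denominator)


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of examinee rows of item scores."""
    k = len(scores[0])  # number of items
    # Variance of each item across examinees
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Variance of the total scores across examinees
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 5 examinees x 4 dichotomous items (1 = correct)
toy_scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(toy_scores)
print(round(alpha, 2))
```

With real data, each row would hold one student's 55 item scores on the grammar test.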
  • 10. Backing for scoring inference
      ◦ Item facility
          - 7 items below .29
          - 62 items between .30 and .70
          - 21 items above .71
      ◦ Item discrimination
          - 4 items below .19
          - 86 items above .20
      ◦ Rasch item difficulty estimates: -3.79 to 2.33
      ◦ Infit MS: 0.80 to 1.30
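The item statistics above can be illustrated with a short sketch. Item facility is the proportion of examinees who answer an item correctly. The slide does not say which discrimination index was used, so the block assumes the upper-lower index (facility in the top-scoring third minus facility in the bottom-scoring third). The function names and the `responses` matrix are hypothetical.

```python
def item_facility(responses, item):
    """Proportion of examinees answering the given item correctly."""
    return sum(row[item] for row in responses) / len(responses)


def item_discrimination(responses, item, fraction=1 / 3):
    """Upper-lower discrimination index (assumed variant, see lead-in)."""
    # Rank examinees by total score, highest first
    ranked = sorted(responses, key=sum, reverse=True)
    n = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:n], ranked[-n:]
    return item_facility(upper, item) - item_facility(lower, item)


# Hypothetical data: 6 examinees x 3 dichotomous items (1 = correct)
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
for i in range(3):
    print(i, round(item_facility(responses, i), 2),
          round(item_discrimination(responses, i), 2))
```

Items can then be counted into bands like those on the slide, for example facility below .30, between .30 and .70, and above .70.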
  • 11. Backing for generalization inference
      ◦ Multivariate generalizability theory (decision study of a persons × items design)
          - Grammar (k = 40, ρ = .85, Φ = .83)
          - Vocabulary (k = 40, ρ = .86, Φ = .84)
          - Reading (k = 10, ρ = .58, Φ = .55)
          - Total (k = 90, ρ = .92, Φ = .91)
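To make the two coefficients concrete, here is a simplified sketch for a univariate persons × items design. The study itself used a multivariate analysis, so this is an assumption-laden illustration only. The generalizability coefficient ρ uses relative error, and the dependability index Φ also counts item difficulty variance as error. The variance components passed to `d_study` are invented. Only the reading subtest's reported ρ = .58 with k = 10 comes from the slide, and `project_rho` projects it to a longer test with the Spearman-Brown relation, which matches the decision-study formula for ρ in this simple design.

```python
def d_study(var_p, var_i, var_pi_e, n_items):
    """Return (rho, phi) for a persons x items decision study."""
    rel_error = var_pi_e / n_items            # relative error variance
    abs_error = (var_i + var_pi_e) / n_items  # absolute error variance
    rho = var_p / (var_p + rel_error)         # generalizability coefficient
    phi = var_p / (var_p + abs_error)         # dependability index
    return rho, phi


def project_rho(rho, k, new_k):
    """Project rho from k items to new_k items (Spearman-Brown)."""
    m = new_k / k
    return m * rho / (1 + (m - 1) * rho)


# Hypothetical variance components, not estimates from the study
rho_40, phi_40 = d_study(var_p=0.02, var_i=0.03, var_pi_e=0.18, n_items=40)
print(round(rho_40, 2), round(phi_40, 2))

# Reported reading result (k = 10, rho = .58) projected to 40 items
reading_at_40 = project_rho(0.58, 10, 40)
print(round(reading_at_40, 2))
```

Under these assumptions, the reading subtest's lower coefficient is largely what one would expect from a subtest with only 10 items.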
  • 12. Backing for extrapolation inference
      ◦ Cut point for Level 1: Level 1 reading
      ◦ Cut point for Level 3: junior high grammar and 1000-word level
      ◦ Difficulty level estimates

          Level                    Difficulty   SE     Infit MS
          Junior high grammar        -0.65      0.03   1.00
          High school grammar         0.29      0.02   1.00
          1000-word-level vocab      -0.94      0.03   1.00
          2000-word-level vocab       0.15      0.03   1.00
          3000-word-level vocab       0.12      0.05   1.00
          Level 3 reading             0.30      0.05   1.00
          Level 1 reading             0.73      0.05   1.10

      ◦ [FACETS map: students, items, and levels on a common logit scale (Measr, -4 to +3; * = 4 students, * = 1 item), with the cut points for Levels 1, 2, and 3 marked: Level 1a (1.49); Level 1b (.77), beside Level 1 reading; Level 2 (.77 to -.70), spanning basic high school grammar, JACET 2000, and JACET 3000; Level 3a (-.70), beside junior high grammar; Level 3b (-.99), beside JACET 1000]
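One way to read the alignment between cut points and level difficulties is through the dichotomous Rasch model. Under it, a student whose ability equals an item's difficulty has a 50% chance of answering correctly. The sketch below assumes that model and treats each level's difficulty estimate as if it were a single item difficulty, which is a simplification of the many-facet analysis. The pairing of cut points with levels follows the map, and `rasch_p` is an illustrative name.

```python
import math


def rasch_p(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1 / (1 + math.exp(-(theta - b)))


# (cut point in logits, level difficulty in logits), both taken from slide 12
pairs = {
    "Level 1b cut vs Level 1 reading": (0.77, 0.73),
    "Level 3a cut vs junior high grammar": (-0.70, -0.65),
    "Level 3b cut vs 1000-word vocab": (-0.99, -0.94),
}
for label, (cut, difficulty) in pairs.items():
    print(label, round(rasch_p(cut, difficulty), 2))
```

Each probability comes out close to .50. That is consistent with the idea that a student sitting exactly at a cut point is about as likely as not to succeed on material at the matching difficulty level.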
  • 13. Backing for decision inference
      ◦ Level 2 and Level 3 students’ (n = 51, 49) grammar pretest and posttest scores (k = 55)

                          Grammar pretest (α = .85)    Grammar posttest (α = .92)
          Class level     n     M       SD             n     M       SD
          Level 2a        26    30.38   6.34           21    12.14   2.50
          Level 2b        25    32.36   8.47           24    28.63   7.93
          Level 2         51    31.35   7.45           45    20.93   10.24
          Level 3c        25    20.80   5.09           22    26.82   5.21
          Level 3d        24    19.88   4.29           23    26.78   5.95
          Level 3         49    20.35   4.69           45    26.80   5.53

      ◦ Pretest: Level 2 students scored higher
      ◦ Posttest: Level 3 students scored higher
      ◦ Level 2: 11 points down
      ◦ Level 3: 6 points up
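The level rows in the table can be reproduced from the class rows as n-weighted means. The sketch below does that arithmetic and takes the posttest minus pretest difference for each level. The posttest groups are smaller than the pretest groups, so these are differences between group means, not paired gains. The names `pooled_mean`, `pre`, `post`, and `change` are illustrative.

```python
def pooled_mean(groups):
    """n-weighted mean of (n, mean) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n


# (n, M) per class, copied from the slide 13 table
pre = {
    "Level 2": [(26, 30.38), (25, 32.36)],
    "Level 3": [(25, 20.80), (24, 19.88)],
}
post = {
    "Level 2": [(21, 12.14), (24, 28.63)],
    "Level 3": [(22, 26.82), (23, 26.78)],
}

# Posttest mean minus pretest mean for each level
change = {lvl: pooled_mean(post[lvl]) - pooled_mean(pre[lvl]) for lvl in pre}
for lvl, diff in change.items():
    print(lvl, round(pooled_mean(pre[lvl]), 2),
          round(pooled_mean(post[lvl]), 2), round(diff, 2))
# Level 2 comes out near -10.4 and Level 3 near +6.5
```

These differences correspond to the drop for Level 2 and the rise for Level 3 summarized on the slide.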
  • 14. Validity argument
      ◦ Interpretive argument
          - Scoring inference: to what extent do examinees get placement items correct, and do high-scoring examinees get more placement items correct?
          - Generalization inference: to what extent are placement items consistently sampled from a domain and sufficient in number to reduce measurement error?
          - Extrapolation inference: to what extent does the difficulty of the placement items match the objectives of a reading course?
          - Decision inference: to what extent do decisions that place examinees in their proper level of the course have a washback effect in the course?
      ◦ Validity argument
          - Scoring inference: because most items were working well, the inference from observation to observed score was valid
          - Generalization inference: because dependability was high and measurement error was small, the inference from observed score to universe score was valid
          - Extrapolation inference: because the difficulty of the items was appropriate for the objectives of the program, the inference from universe score to target score was valid
          - Decision inference: because Level 3 students were placed in the right level and were able to improve their grammar test scores, the inference from target score to test use was valid
  • 15. Conclusion
      ◦ “Validation is simple in principle, but difficult in practice. The argument-based framework provides a relatively pragmatic approach to validation” (Kane, 2012, p. 15).
      ◦ William Jolly Bridge, Brisbane
  • 16. References
      ◦ Kane, M. (2006). Validation. In R. Brennan (Ed.), Educational measurement (4th ed., pp. 17-64). Westport, CT: Greenwood Publishing.
      ◦ Kane, M. (2012). Validating score interpretations and uses. Language Testing, 29, 3-17. doi: 10.1177/0265532211417210
      ◦ Kumazawa, T. (2013). Evaluating validity for in-house placement test score interpretations and uses. JALT Journal, 35, 73-100.