Assessment of teaching is fundamentally concerned with students' cognitive actions. In this paper a Trapezoidal Fuzzy Assessment Model (TFAM) is developed for teaching assessment. The TFAM is a new variation of the centre of gravity (COG) defuzzification technique used in fuzzy mathematics. Its new idea is to replace the rectangles appearing in the graph of the COG method with isosceles trapezoids sharing common parts, thus covering the ambiguous cases of teachers' scores lying at the boundary between two successive grades (e.g. between A and B). A classroom application is also presented in which the outcomes of the COG and TFAM methods are compared with those of traditional assessment methods (calculation of means and the GPA index), and the differences among these outcomes are explained.
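The rectangular COG baseline that TFAM modifies can be sketched as follows. This is an illustrative reconstruction, not the paper's code: grades F through A are drawn as unit-width rectangles over [0,1) through [4,5), with heights equal to the fraction of scores falling in each grade, and the defuzzified score is the x-coordinate of the centre of gravity of the resulting step region.

```python
def cog_score(fractions):
    """x-coordinate of the centre of gravity of the COG method's step region.

    fractions: proportion of scores in each of the five grades F, D, C, B, A,
    drawn as unit-width rectangles of height y_i over [i, i + 1).
    """
    total = sum(fractions)
    # each rectangle contributes its area (width 1 * height y) times its
    # horizontal centre (i + 1/2)
    return sum(y * (i + 0.5) for i, y in enumerate(fractions)) / total

# a cohort concentrated entirely in grade A sits at the far right of [0, 5)
print(cog_score([0.0, 0.0, 0.0, 0.0, 1.0]))  # 4.5
```

TFAM replaces these rectangles with overlapping isosceles trapezoids, so that a score near a grade boundary contributes to both neighbouring grades before the centre of gravity is computed.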
Universidad Técnica Particular de Loja
Academic Term: April-August 2011
Program: English
Instructor: Mgs. Orlando Lizaldes E.
Cycle: Sixth
Bimester: Second
Decision-Making Model for Student Assessment by Unifying Numerical and Lingui... - IJECEIAES
Learning assessment deals with the process of making a decision about the quality or performance of student achievement against a number of competency standards. In the process, teachers' preferences are gathered through both test and non-test techniques, generally as numeric values, and the final results are then converted into letters or linguistic values. In the proposed model, linguistic variables are used to express teachers' preferences in non-test techniques. Consequently, the assessment data set consists of both numerical and linguistic information, so a method is required to unify them into a final value. A model that uses the 2-tuple linguistic approach and is based on matrix operations is proposed to solve the problem. The study proposes a new procedure consisting of four stages: preprocessing, transformation, aggregation, and exploitation. The final result is presented in 2-tuple linguistic representation together with its numerical equivalent, accompanied by a description of the achievement of each competency. The α value of the 2-tuple linguistic result, both overall and per competency, carries meaningful information: it can be interpreted as a student's ability relative to other students and shows how close the student is to reaching a higher rank. The proposed model enriches learning assessment techniques, since using linguistic variables to represent preferences gives teachers flexibility in their assessments. Moreover, using the results for students' levels of each competency, students' mastery of each attribute can be diagnosed and their learning progress estimated.
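The 2-tuple representation the model relies on can be illustrated with a small sketch of the standard Herrera-Martínez Δ mapping (an illustration, not the paper's own procedure): a numeric value β in [0, g] is converted to the closest label s_i plus a symbolic translation α in [-0.5, 0.5).

```python
import math

def to_two_tuple(beta, labels):
    # Delta(beta): closest label index plus symbolic translation alpha.
    # beta is assumed to lie in [0, len(labels) - 1].
    i = int(math.floor(beta + 0.5))          # round half up
    i = min(max(i, 0), len(labels) - 1)
    alpha = round(beta - i, 2)               # alpha in [-0.5, 0.5)
    return labels[i], alpha

# hypothetical five-label scale
scale = ["VeryLow", "Low", "Medium", "High", "VeryHigh"]
print(to_two_tuple(2.8, scale))  # ('High', -0.2)
```

The α of -0.2 here is what the abstract interprets as "how much potential is achieved to reach higher ranks": the value sits 0.2 below the label it rounds to.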
Finding and Quantifying Temporal-Aware Contradiction in Reviews - Ismail BADACHE
Opinions (reviews) on web resources (e.g., courses, movies) generated by users are increasingly exploited in text analysis tasks, the detection of contradictory opinions being one of them. This paper focuses on quantifying sentiment-based contradictions around specific aspects in reviews, studied with respect to the temporal dimension of reviews (their sessions). For web resources such as online courses (e.g. Coursera or edX), reviews are typically generated during course sessions; between sessions, users stop reviewing courses, and the courses themselves may be updated. So, to avoid conflating contradictory reviews coming from two or more different sessions, the reviews related to a given resource are first grouped by their corresponding session. Secondly, aspects are identified from the distribution of emotional terms in the vicinity of the most frequent nouns in the review collection. Thirdly, the polarity of each review segment containing an aspect is estimated. Then, only resources containing these aspects with opposite polarities are considered. Finally, the contradiction intensity is estimated based on the joint dispersion of polarities and ratings of the reviews containing the aspects. The experiments are conducted on a Massive Open Online Courses data set containing 2,244 courses and their 73,873 reviews, collected from coursera.org. The results confirm the effectiveness of our approach to find and quantify contradiction intensity.
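As a rough illustration of the final step, a contradiction score for one aspect can be built from the dispersion of review polarities. This proxy (variance weighted by how balanced the positive and negative camps are) is our own simplification and ignores the ratings dimension the paper also uses.

```python
from statistics import pvariance

def contradiction_intensity(polarities):
    """Toy contradiction score for one aspect of one resource (session).

    polarities: sentiment of each review segment mentioning the aspect,
    in [-1, 1]. Only resources with both signs count as contradictory.
    """
    pos = [p for p in polarities if p > 0]
    neg = [p for p in polarities if p < 0]
    if not pos or not neg:
        return 0.0
    # dispersion of opinions, scaled by how balanced the two camps are
    balance = min(len(pos), len(neg)) / max(len(pos), len(neg))
    return pvariance(polarities) * balance

print(contradiction_intensity([1.0, -1.0, 1.0, -1.0]))  # 1.0
```

Grouping by session first matters because mixing sessions would inflate this dispersion even when opinions within each session agree.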
CLASSIFICATION OF QUESTIONS AND LEARNING OUTCOME STATEMENTS (LOS) INTO BLOOM’... - IJMIT JOURNAL
Bloom’s Taxonomy (BT) has been used to classify the objectives of learning outcomes by dividing learning into three domains: the cognitive domain, the affective domain, and the psychomotor domain. In this paper, we introduce a new approach to classify questions and learning outcome statements (LOS) into Bloom’s Taxonomy and to verify the BT verb lists that academicians cite and use to write questions and LOS. An experiment was designed to investigate the semantic relationship between the action verbs used in questions and in LOS in order to obtain a more accurate classification of the levels of BT. A sample of 775 distinct action verbs collected from different universities allowed us to assign an accurate, clear-cut cognitive level to each action verb. Natural language processing techniques were used to develop rules that break the questions into chunks in order to extract the action verbs. Our proposed solution was able to classify each action verb into a precise level of the cognitive domain. We tested and evaluated the solution using a confusion matrix; the evaluation yielded a macro-averaged precision of 97% and an F1 of 90%. The outcome of the research suggests that it is crucial to analyse and verify the action verbs academicians use when writing LOS, and to classify their questions by Bloom’s Taxonomy, in order to obtain a definite and more accurate classification.
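The evaluation metrics cited above (macro-averaged precision and F1) can be computed from a confusion matrix as in this generic sketch; the matrix values here are hypothetical, not the paper's data.

```python
def macro_metrics(cm):
    """Macro-averaged precision and F1 from a confusion matrix.

    cm[i][j] counts items of true class i predicted as class j.
    """
    n = len(cm)
    precisions, f1s = [], []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp   # column sum minus tp
        fn = sum(cm[k]) - tp                        # row sum minus tp
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(p)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return sum(precisions) / n, sum(f1s) / n

# a perfect 2-class classifier scores 1.0 on both macro metrics
print(macro_metrics([[2, 0], [0, 2]]))  # (1.0, 1.0)
```

Macro averaging gives each Bloom level equal weight regardless of how many verbs fall into it, which is why it is the natural choice when verb counts per level are unbalanced.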
The Comparative Study between Grade Level and Spelling Proficiency of Selected... - Mariz Pascua
This is an informal research exercise that applies statistical treatment to comparative data. The study requires further research and appropriate treatment to yield reliable information.
Harnessing Ratings and Aspect-Sentiment to Estimate Contradiction Intensity i... - Ismail BADACHE
Analysis of user-generated opinions (reviews) is increasingly exploited by a variety of applications. It makes it possible to follow the evolution of opinions or to carry out investigations of products. Detecting contradictory opinions about a web resource (e.g., courses, movies, products) is an important task in evaluating that resource. This paper focuses on detecting contradictions in reviews based on sentiment analysis around specific aspects of a resource (document). For web resources such as online courses (e.g. on Coursera or edX), reviews are typically generated during course sessions; between sessions, users stop reviewing the course, and the course may be updated. So, to avoid conflating contradictory reviews coming from two or more different sessions, the reviews related to a given resource are first grouped by session. Secondly, aspects are extracted from the distribution of emotional terms in the vicinity of the most frequent nouns in the review collection. Thirdly, the polarity of each review segment containing an aspect is identified, and only resources containing these aspects with opposite polarities (positive, negative) are retained. Finally, we propose a measure of contradiction intensity based on the joint dispersion of the polarity and the rating of the reviews containing the aspects within each resource. Our approach is evaluated on a Massive Open Online Courses (MOOC) collection containing 2,244 courses and their 73,873 reviews, collected from Coursera. The experimental results show the effectiveness of the proposed approach in capturing and quantifying contradiction intensity.
Using Class Frequency for Improving Centroid-based Text Classification - IDES Editor
Most previous work on text classification represented the importance of terms by term occurrence frequency (tf) and inverse document frequency (idf). This paper presents ways to apply class frequency in centroid-based text categorization. Three approaches are considered. The first explores the effectiveness of inverse class frequency in the popular TFIDF term weighting scheme, both as a replacement for idf and as an addition to TFIDF. The second evaluates functions used to adjust the power of inverse class frequency. The third applies terms found in only one class or a few classes to improve classification performance using two-step classification. The results show that class frequency is useful for text classification, especially in two-step classification.
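The first approach, substituting inverse class frequency (icf) for idf, can be sketched generically; the paper's exact weighting formula may differ, so treat this as the plain tf·icf form.

```python
import math

def tf_icf(tf, cf, n_classes):
    """Generic term weight with inverse class frequency in place of idf.

    tf: term frequency in the document
    cf: number of classes whose training documents contain the term
    Terms confined to one or a few classes get higher, more
    discriminative weights than terms spread across many classes.
    """
    return tf * math.log(n_classes / cf)

# a term appearing in 1 of 10 classes outweighs one spread over 5 classes
print(tf_icf(3, 1, 10) > tf_icf(3, 5, 10))  # True
```

The third approach in the abstract is the limiting case of this idea: a term with cf = 1 is so class-specific that it can drive the first step of a two-step classifier on its own.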
How Anchoring Concepts Influence Essay Conceptual Structure And Test Performance - Roy Clariana
Presented October 21 at CELDA 2023 in Madeira, Portugal, https://www.celda-conf.org/
Abstract: This quasi-experimental study seeks to improve the conceptual quality of summary essays by comparing two conditions: essay prompts with or without a list of 13 broad concepts, selected across a continuum of the 100 most frequent words in the lesson materials. It is anticipated that only the most central concepts will be used as “anchors” when writing. Participants (n = 90) in an undergraduate Architectural Engineering course read the assigned textbook chapter and attended lectures and labs, then in a final lab session were asked to write a 300-word summary of the lesson content. The data consist of the essays converted to networks and the end-of-unit multiple-choice test. Compared to the expert network benchmark, the essay networks of those receiving the broad concepts in the writing prompt were not significantly different from those of participants who did not receive the concepts. However, those receiving the broad concepts were significantly more similar to peer essay networks (mental model convergence) and to the networks of the two PowerPoint lectures, though neither group resembled the textbook chapter. Further, those receiving the broad concepts performed significantly better on the end-of-unit test than those not receiving them. Term frequency analysis of the essays indicates, as expected, that the most network-central concepts occurred more frequently in essays; the frequencies of the other terms were remarkably similar for both the terms and no-terms groups, suggesting a similar underlying conceptual mental model of the lesson content. To further explore the influence of anchoring concepts in summary writing prompts, essays were generated with the same two summary writing prompts using OpenAI (ChatGPT) and Google Bard, plus a new prompt that used the 13 most central concepts from the expert’s network.
The quality of the essay networks for both AI systems was equivalent to that of the students' essay networks for the broad-concepts and no-concepts treatments. However, the AI essays derived from the 13 most central concepts were significantly better (more like the expert network) than the student and AI essays derived from the broad-concepts or no-concepts treatments. In addition, Bard and OpenAI used several of the same concepts at a higher frequency than the students, suggesting that the two AI systems have similar knowledge graphs of this content. In sum, adding 13 broad conceptual terms to a summary writing prompt improved both structural and declarative knowledge outcomes, but adding the 13 most central concepts may be even better. More research is needed to understand how including concepts and other terms in a writing prompt influences students’ essay conceptual structure and subsequent test performance.
A CRITICAL REVIEW ON THE OPTIMIZATION METHODS IN SOLVING EXAM TIMETABLING AND... - IAEME Publication
The examination timetabling problem concerns scheduling the exams of a set of university courses while avoiding overlaps between exams that have students in common, spreading the exams fairly for the students, and satisfying room capacity constraints. This paper reviews different optimization techniques used to solve general timetabling problems. The basic approach can readily handle a wide variety of exam timetabling constraints and is therefore likely to be of great practical use. The approach relies for its success on specially designed mutation operators, which greatly improve performance.
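A minimal sketch of the kind of repair mutation such approaches build on, under a hypothetical representation (a timetable maps exams to slots; a clash list holds exam pairs that share students); the operators surveyed in the paper are more elaborate.

```python
import random

def conflicts(timetable, clashes):
    """Pairs of clashing exams scheduled in the same slot."""
    return [(a, b) for (a, b) in clashes if timetable[a] == timetable[b]]

def repair_mutation(timetable, clashes, n_slots, rng=random):
    """Move one exam from a violated clash pair to a random slot."""
    bad = conflicts(timetable, clashes)
    if bad:
        exam = rng.choice(bad)[0]
        timetable[exam] = rng.randrange(n_slots)
    return timetable

# math and physics share students but sit in the same slot: one violation
tt = {"math": 0, "physics": 0, "art": 1}
print(conflicts(tt, [("math", "physics"), ("math", "art")]))
# [('math', 'physics')]
```

Targeting only violated pairs, rather than mutating arbitrary genes, is what makes such operators effective inside an evolutionary timetabling loop.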
Opendatabay - Open Data Marketplace.pptx - Opendatabay
Opendatabay.com unlocks the power of data for everyone. The Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
It is the first open hub for data enthusiasts to collaborate and innovate: a platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, Opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. It leverages cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies data acquisition with an intuitive interface and robust search tools: effortlessly explore, discover, and access the data you need, so you can focus on extracting valuable insights. Opendatabay also breaks new ground with dedicated, AI-generated synthetic datasets.
These privacy-preserving datasets can be used for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By combining distributed ledger technology with rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay: the marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT... - Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation that decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method, where all vertices are processed in every iteration. It comes, however, with the precondition that the input graph contain no dead ends. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph whose vertices were split by component. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but noticeably slower on the GPU. The slowdown on the GPU is likely caused by the submission of many small workloads and is expected to be a non-issue when the computation is performed on massive graphs.
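The dead-end precondition mentioned above can be met by the loop-based strategy the title refers to: give every sink vertex a self-loop before ranking. A small monolithic (not levelwise, not distributed) sketch under that assumption:

```python
def pagerank(adj, d=0.85, iters=50):
    """Power-iteration PageRank with loop-based dead-end handling.

    adj: vertex -> list of out-neighbours. Sinks get a self-loop so
    no rank mass is lost (the precondition Levelwise PageRank needs).
    """
    out = {u: (vs if vs else [u]) for u, vs in adj.items()}
    n = len(out)
    rank = {u: 1.0 / n for u in out}
    for _ in range(iters):
        # teleport share, then distribute each vertex's rank to neighbours
        nxt = {u: (1 - d) / n for u in out}
        for u, vs in out.items():
            share = d * rank[u] / len(vs)
            for v in vs:
                nxt[v] += share
        rank = nxt
    return rank

# 'c' is a dead end; with its self-loop, total rank mass stays at 1.0
r = pagerank({"a": ["b"], "b": ["a"], "c": []})
print(round(sum(r.values()), 6))  # 1.0
```

Because self-loops keep all mass local, each strongly connected component can later be ranked independently in topological order, which is what makes the levelwise scheme communication-free.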
Adjusting primitives for graph: SHORT REPORT / NOTES - Subhajit Sahu
Graph algorithms such as PageRank commonly operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation. The notes below benchmark the basic vector primitives such algorithms rely on.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...pchutichetpong
M Capital Group (“MCG”) expects to see demand and the changing evolution of supply, facilitated through institutional investment rotation out of offices and into work from home (“WFH”), while the ever-expanding need for data storage as global internet usage expands, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
1. Evaluating validity of criterion-referenced test score interpretations and uses
Takaaki Kumazawa
Kanto Gakuin University
(ktakaaki@kanto-gakuin.ac.jp)
[Photo: Kintai Bridge, Japan]
2. Purpose
• The purpose of my talk is to evaluate the validity of criterion-referenced placement test score interpretations and uses, using Kane's (2006) argument-based validity framework
• This presentation is based on a paper I published in the JALT Journal (http://jalt-publications.org/jj/issues/2013-05_35.1)
3. Classical view of validity
• Validity: the extent to which a test measures what it is supposed to measure
• Three types of validity
  ◦ Criterion-related validity: correlation between a valid external measure and the test under development
  ◦ Content validity: experts' judgment on whether the items measure what they are supposed to measure
  ◦ Construct validity: statistical examination of whether the items measure what they are supposed to measure
4. Current view of validity
• Validity is "the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests" (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education [AERA, APA, & NCME], 1999, p. 9)
5. Argument-based validity framework
• Interpretive argument: proposes that the inferences one is going to make from test scores are theoretically valid
• Validity argument: evaluates the interpretive argument by providing warrants
[Diagram] Observation --scoring--> Observed score --generalization--> Universe score --extrapolation--> Target score --decision--> Use
6. Interpretive argument
• Scoring inference
  ◦ To what extent do examinees get placement items correct, and do high-scoring examinees get more placement items correct?
• Generalization inference
  ◦ To what extent are placement items consistently sampled from a domain and sufficient in number so as to reduce measurement error?
• Extrapolation inference
  ◦ To what extent does the difficulty of the placement items match the objectives of the reading course?
• Decision inference
  ◦ To what extent do placement decisions, made to place examinees in the proper level of the course, have an impact on washback in the course?
7. Participants
• 428 Japanese first-year university students majoring in law
• TOEIC scores of about 250-450
• Three courses in the English program: Reading, Listening, TOEIC skills
• Proficiency-based program with three levels
  ◦ Level 1: 60 high-scoring students
    Major objective of the reading course: improve reading skills such as fast reading
  ◦ Level 2: about 300 students
  ◦ Level 3: 50 low-scoring students
    Major objective of the reading class: re-learn junior high and high school grammar
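The placement logic above can be sketched as a simple cut-score function. Note that the deck does not state the actual cut scores, so the thresholds below are hypothetical placeholders for illustration only.

```python
# Illustrative sketch of score-based placement into three proficiency
# levels. The cut scores (40 and 65 out of 90 items) are invented;
# the deck reports only the group sizes, not the actual thresholds.
def place_student(score: int, cut_low: int = 40, cut_high: int = 65) -> str:
    """Assign a proficiency level from a placement-test score (0-90)."""
    if score >= cut_high:
        return "Level 1"   # high-scoring group (about 60 students)
    if score >= cut_low:
        return "Level 2"   # middle group (about 300 students)
    return "Level 3"       # low-scoring group (about 50 students)

print(place_student(72))  # Level 1
print(place_student(50))  # Level 2
print(place_student(25))  # Level 3
```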
8. Criterion-referenced placement test
• Grammar (k = 40)
  ◦ Items taken from textbooks used in junior and senior high schools
  ◦ Grammar points: present, past, and future tenses, continuous, relative pronouns, gerunds, participles, etc.
  ◦ Sample: Hi, I ( ) Ken.  1. am  2. are  3. is  4. be
• Vocabulary (k = 40)
  ◦ Items taken from the most frequent 1,000-3,000 words based on the JACET 8000 word list
  ◦ Sample: Bring  1. 送る (send)  2. 持ってくる (bring)  3. 鳴る (ring)  4. 購入する (buy)
• Reading (k = 10)
  ◦ Two passages taken from the textbooks used in the Level 1 and Level 3 reading classes
  ◦ Sample: How do they travel?  1. by plane  2. by bus  3. by car  4. by train
9. Procedures
• On the first day of the semester, the placement test was given in 45 minutes
• A grammar pretest (k = 55, α = .85) was given on the first day of class in the Level 2 classes (n = 51) and Level 3 classes (n = 49)
• 30 90-minute lessons over two semesters
• The same grammar test was given as a posttest (α = .92) on the last day of class to the same students (n = 51, 49)
• A course evaluation survey was given to the same students (n = 51, 49)
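The α values reported here are Cronbach's alpha, the standard internal-consistency index for a set of test items. A minimal sketch of the computation, using a small invented response matrix (the actual item-level data are not given in the deck):

```python
# Cronbach's alpha: alpha = k/(k-1) * (1 - sum(item variances) / total-score variance).
# Toy data: rows = examinees, columns = dichotomously scored items (1 = correct).

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(scores):
    k = len(scores[0])                                            # number of items
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])            # variance of total scores
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

data = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(data), 3))  # → 0.8
```

With 55 items and real response patterns, the same computation yields the .85 and .92 figures the slide reports.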
13. Backing for decision inference
Level 2 and Level 3 students' (n = 51, 49) grammar pretest and posttest scores (k = 55):

                Grammar pretest (α = .85)    Grammar posttest (α = .92)
Class level     n     M       SD             n     M       SD
Level 2a        26    30.38   6.34           21    12.14   2.50
Level 2b        25    32.36   8.47           24    28.63   7.93
Level 2         51    31.35   7.45           45    20.93   10.24
Level 3c        25    20.80   5.09           22    26.82   5.21
Level 3d        24    19.88   4.29           23    26.78   5.95
Level 3         49    20.35   4.69           45    26.80   5.53

Level 2 students scored higher on the pretest; Level 3 students scored higher on the posttest (Level 2: 11 points down; Level 3: 6 points up).
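The "points down / points up" annotations follow directly from the group means in the table. A quick sketch of that arithmetic:

```python
# Pretest and posttest group means copied from the table on this slide
# (grammar test, k = 55).
pre  = {"Level 2": 31.35, "Level 3": 20.35}
post = {"Level 2": 20.93, "Level 3": 26.80}

for level in pre:
    change = post[level] - pre[level]
    print(f"{level}: {change:+.2f} points")  # Level 2: -10.42, Level 3: +6.45
```

Level 2 drops by about 10.4 points (the slide rounds this to "11 points down") while Level 3 gains about 6.5, which is the backing offered for the decision inference.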
14. Validity argument
Interpretive argument (restated from slide 6): the scoring, generalization, extrapolation, and decision inferences.
Validity argument:
• Scoring inference
  ◦ Because most items were working well, the inference from observation to observed score was valid
• Generalization inference
  ◦ Because of high dependability with a small amount of measurement error, the inference from observed score to universe score was valid
• Extrapolation inference
  ◦ Because the difficulty of the items was adequate for the objectives of the program, the inference from universe score to target score was valid
• Decision inference
  ◦ Because Level 3 students were placed in the right level and were able to improve their grammar test scores, the inference from target score to test use was valid
15. Conclusion
• "Validation is simple in principle, but difficult in practice. The argument-based framework provides a relatively pragmatic approach to validation" (Kane, 2012, p. 15)
[Photo: William Jolly Bridge, Brisbane]
16. References
• Kane, M. (2006). Validation. In R. Brennan (Ed.), Educational measurement (4th ed., pp. 17-64). Westport, CT: Greenwood Publishing.
• Kane, M. (2012). Validating score interpretations and uses. Language Testing, 29, 3-17. doi:10.1177/0265532211417210
• Kumazawa, T. (2013). Evaluating validity for in-house placement test score interpretations and uses. JALT Journal, 35, 73-100.