PSYCHOMETRIKA—VOL. 74, NO. 1, 1–19
MARCH 2009
DOI: 10.1007/S11336-008-9076-X
A NEW CONCURRENT CALIBRATION METHOD FOR NONEQUIVALENT GROUP
DESIGN UNDER NONRANDOM ASSIGNMENT
KEI MIYAZAKI
DEPARTMENT OF COGNITIVE AND BEHAVIORAL SCIENCE, THE UNIVERSITY OF TOKYO
TAKAHIRO HOSHINO
GRADUATE SCHOOL OF ECONOMICS, NAGOYA UNIVERSITY
SHIN-ICHI MAYEKAWA
GRADUATE SCHOOL OF DECISION SCIENCE AND TECHNOLOGY, TOKYO INSTITUTE
OF TECHNOLOGY
KAZUO SHIGEMASU
DEPARTMENT OF COGNITIVE AND BEHAVIORAL SCIENCE, THE UNIVERSITY OF TOKYO
This study proposes a new item parameter linking method for the common-item nonequivalent
groups design in item response theory (IRT). Previous studies assumed that examinees are randomly
assigned to either test form. However, examinees can frequently select their own test forms and tests often
differ according to examinees’ abilities. In such cases, concurrent calibration or multiple group IRT
modeling without modeling test form selection behavior can yield severely biased results. We proposed a
model wherein test form selection behavior depends on test scores and used a Monte Carlo expectation
maximization (MCEM) algorithm. This method provided adequate estimates of testing parameters.
Key words: common-item design, concurrent calibration, IRT linking, item response theory, Monte Carlo
expectation maximization (MCEM) algorithm, multinomial logistic regression model, nonignorable
missingness.
1. Introduction
Item response theory (IRT) methods are used in many testing applications. Advantageous
properties of IRT arise from the fact that the theory explicitly models examinee responses at the
item level, whereas, for example, the focus of classical test models is on responses at the level of
test scores.
In this study, we examined the common-item nonequivalent groups design, one of several
test designs used for item parameter linking, and considered situations in which item
parameter linking is performed between two test forms. In this design, the two test forms
had some items in common, and each form also included items that the examinees taking
the other form did not answer. Figure 1 illustrates this design. The groups of examinees, which
differed with regard to testing times, are shown as rows in the figure, and the tests are indicated
by columns. In the common-item design, the scores on the tests the examinees did not select are
not observed. The shaded areas indicate the missing data, and u indicates the item response vectors.
The letter and number subscripts indicate the tests and the groups of examinees, respectively.
Requests for reprints should be sent to Takahiro Hoshino, Graduate School of Economics, Nagoya University,
Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan. E-mail: bayesian@jasmine.ocn.ne.jp
© 2008 The Psychometric Society
FIGURE 1.
Concurrent calibration in a common item design.
Two groups of examinees from different populations were each assigned to different test forms.
The test design in which the groups of examinees are not equivalent is called the nonequivalent
groups design.
Two main IRT linking methods are used in a common-item nonequivalent groups design:
separate estimation and concurrent calibration (Wingersky & Lord, 1984). In separate estimation,
the two sets of item parameter estimates for the common items are used to estimate a scale
transformation that will put the item parameter estimates of one form on the scale of the item
parameter estimates for the other form (Haebara, 1980; Stocking & Lord, 1983; Kolen & Brennan,
2004). In concurrent calibration, item parameters for all the items on both forms are estimated
simultaneously in one run of the estimation software. Estimating parameters for all items
simultaneously ensures that all parameter estimates are on the same scale. Bock and Zimowski (1997)
suggested a multiple group IRT that can deal with multiple examinee groups differing in ability in
the concurrent calibration. Von Davier and von Davier (2004) presented all the linking methods
in a new framework (in which the calibration is performed in one step) in terms of restrictions
on the likelihood function.
Numerous studies have researched the accuracy of the estimation by using each method
(Baker & Al-Karni, 1991; Kim & Cohen, 1992). Hanson and Béguin (2002) found that concurrent
calibration procedures produced more accurate results than did separate estimation. Consequently,
to date, concurrent calibration is thought to be the most appropriate estimation method
for the common-item nonequivalent groups design.
When concurrent calibration is applied in the common-item nonequivalent groups design,
the scores on the tests that some examinees did not take are regarded as missing data, whereas
the preceding methods related to the nonequivalent groups design ignored these data. In other
words, these methods implicitly assumed the missing data to be “missing at random”
(MAR; see Little & Rubin, 2002). The missing data mechanism is ignorable if (a) missingness
is MAR and (b) the parameters of the grouping variable and the parameters of the item response
variables are distinct (Little & Rubin, 2002). When these two conditions are satisfied, the
likelihood function is separated into the likelihood of the grouping variable and that of the item
response variables. If missingness is not MAR, or the likelihood function is not separated into the
likelihood of the grouping variable and that of the item response variables, the missing data
mechanism is nonignorable.
Meanwhile, in many cases, examinees can select their test forms; consequently, they are not
randomly assigned to a test form. We explain this by using the example of real data analysis. In
Sect. 5, we analyze a JLRT (Japanese Listening and Reading Comprehension Test) data set. The
JLRT, which is a part of the Business Japanese Proficiency Test (BJT; Kato & JETRO, 2006)
is a 100 multiple choice item test that measures the ability to communicate with other persons
in the Japanese language. The target population of the BJT is people whose first language is
FIGURE 2.
The situation of the test design considered in this study.
not Japanese; the test has been administered in 32 cities across 13 countries such as Japan, the
United States, Canada, EU countries, Brazil, China, and other Asian countries. The version of the
JLRT that we analyzed has two equivalent test forms, and the one that the examinees are given
depends on the country in which they take the test; that is, we can determine the country in which
the examinees took the test by asking them which form they took. Because we can assume that
the examinees living in Japan are probably highly motivated to study Japanese and have higher
abilities than the examinees in other countries, just by asking an examinee which form he/she
took, we can form an expectation of whether he/she has higher or lower ability. This expectation
implies that missing data have some information about examinees’ achievement levels. Thus,
missing data are regarded as being nonignorable.
A second situation in which our model is advantageous arises when examinees select test
forms themselves and are consequently not randomly assigned to a test form. In
an AP (Advanced Placement) exam, for example, at a single administration a student might
choose between two essays that are supposed to measure the same construct while the common
items are fixed (for more details about linking in AP exams, refer to Yang, 2004). In such cases,
since examinees are not randomly assigned to one of the two test forms, a mere application of
existing item parameter linking methods can yield biased results (see Sect. 4, simulation study).
In addition, when the test form selection behavior depends on the examinees’ abilities, the
existing item parameter linking methods, including multiple group IRT, can yield biased estimates
because the likelihood function is not separated into the likelihood of the grouping variable and
that of the item response variables (a detailed explanation with mathematical expressions is
provided in Model Assumptions, Sect. 2).
To solve this problem, we constructed a model in which test form selection behavior is
dependent on the scores of the tests (Fig. 2). In this model, missing test scores are regarded as
nonignorable (Little & Rubin, 2002). Furthermore, we proposed an estimation method for the
parameters, using the MCEM algorithm (Wei & Tanner, 1990). Consequently, we proposed a
new concurrent calibration method for a common-item nonequivalent groups design.
In Sect. 2, we present the model assumptions and describe the form of likelihood function of
our model. We provide maximum likelihood estimations using the MCEM algorithm and address
certain topics related to parameter estimation, such as the calculation of the asymptotic variance
covariance matrix from the EM algorithm in Sect. 3. In Sect. 4, we present a simulation study to
show that the traditional method provides severely biased estimates and verify that the proposed
model can yield adequate estimates. In Sect. 5, we apply the proposed method to JLRT data and
describe the meaningful results. Finally, in Sect. 6, we provide concluding remarks.
2. Model Assumptions
We considered the situations in which each of the two test forms was used separately. Item
parameter linking was performed between these two test forms (Fig. 1). For the examinees in
group 1, Tests A and B were administered, and for the examinees in group 2, Tests B and C were
administered. Test B was common between the two test groups. The examinees had to choose one
of the two groups. Let $K_A$, $K_B$, and $K_C$ be the number of items in Tests A, B, and C, respectively.
Let $r_i$ be a test form selection indicator ($r_i = j$ ($j = 1,2$) implies that the $i$th examinee selected
the $j$th test form). The item response vector $u_{ij}$ is observed when the $i$th examinee selected the
$j$th test form, and $u_{ij'}$ ($j' \neq j$) is missing. Let $u_i = ((u_i^{\mathrm{obs}})', (u_i^{\mathrm{mis}})')'$, where $u_i^{\mathrm{obs}}$ represents
the observed components of $u_i$, while $u_i^{\mathrm{mis}}$ represents the missing entries. Hence, the missing
patterns can be expressed as follows:
$$
\left( (u_i^{\mathrm{obs}})', (u_i^{\mathrm{mis}})' \right) =
\begin{cases}
(u_{i1}', u_{i2}') & (r_i = 1), \\
(u_{i2}', u_{i1}') & (r_i = 2),
\end{cases}
\tag{1}
$$
and $u_{i1} = (u_{iA}', u_{iB1}')'$, $u_{i2} = (u_{iB2}', u_{iC}')'$. For example, $u_{iA} = (u_{iA1}, \ldots, u_{iAK_A})'$.
$u_{iB1}$ represents the item response vector when the $i$th examinee takes Test B at the first point in
time or at the first place. $u_{iB2}$ represents the item response vector when the $i$th examinee takes
Test B at the second point in time or at the second place. The constraint that $u_{iB1} = u_{iB2}$ should be
considered. However, as described in the Introduction, here we assume that $u_{iB1}$ is not equal to
$u_{iB2}$ because of the difference in examinees’ abilities with regard to each point in time or each
place.
We let $\theta_{ij}$ be a random latent variable that represents the ability of the $j$th group. $\theta_{ij}$ is
distributed as $N(\mu_j, \sigma_j^2)$ (in this paper, we do not assume the multidimensionality of abilities.
For the problem of multidimensionality of abilities, see van der Linden & Luecht, 1998). Under
the three-parameter logistic model, the probability that the $i$th examinee of ability $\theta_{ij}$ correctly
answered item $k$ of test $X$ ($X = A, B, C$) is defined as
$$
p(u_{iXk_X} \mid \theta_{ij}, \psi_{jk}) = c_{Xk_X} + (1 - c_{Xk_X}) \frac{1}{1 + \exp\{-1.7\, a_{Xk_X} (\theta_{ij} - b_{Xk_X})\}},
\tag{2}
$$
where $k_X = 1, \ldots, K_X$ and $\psi_{jk}$ is the vector that contains all the item parameters of item $k$ of
the $j$th group.
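As a minimal sketch of Eq. (2), the three-parameter logistic curve can be evaluated as follows. The function name `p_correct_3pl` and the item parameter values are illustrative assumptions and do not come from the paper.

```python
import math

def p_correct_3pl(theta, a, b, c, scale=1.7):
    """Probability of a correct response under the 3PL model of Eq. (2).

    theta: ability, a: discrimination, b: difficulty, c: guessing parameter.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-scale * a * (theta - b)))

# Hypothetical item: a = 1.2, b = 0.0, c = 0.2.
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(p_correct_3pl(theta, a=1.2, b=0.0, c=0.2), 3))
```

At $\theta = b$ the logistic term equals 1/2, so the probability is $c + (1 - c)/2$, which gives a quick sanity check on any implementation.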
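The probability that the $i$th item response vector $u_{ij}$ is obtained is expressed as follows:
$$
p(u_{i1} \mid \theta_{i1}, \psi_1) =
\prod_{k_A=1}^{K_A} \left[ c_{Ak_A} + (1 - c_{Ak_A}) \frac{1}{1 + \exp\{-1.7\, a_{Ak_A} (\theta_{i1} - b_{Ak_A})\}} \right]
\times \prod_{k_B=1}^{K_B} \left[ c_{Bk_B} + (1 - c_{Bk_B}) \frac{1}{1 + \exp\{-1.7\, a_{Bk_B} (\theta_{i1} - b_{Bk_B})\}} \right],
\tag{3}
$$
$$
p(u_{i2} \mid \theta_{i2}, \psi_2) =
\prod_{k_B=1}^{K_B} \left[ c_{Bk_B} + (1 - c_{Bk_B}) \frac{1}{1 + \exp\{-1.7\, a_{Bk_B} (\theta_{i2} - b_{Bk_B})\}} \right]
\times \prod_{k_C=1}^{K_C} \left[ c_{Ck_C} + (1 - c_{Ck_C}) \frac{1}{1 + \exp\{-1.7\, a_{Ck_C} (\theta_{i2} - b_{Ck_C})\}} \right].
\tag{4}
$$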
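The products in Eqs. (3) and (4) can be sketched in code as a sum of log probabilities over the items of a form. This sketch assumes the usual Bernoulli treatment, in which an incorrect response contributes $1 - p$; the equations as printed show only the correct-response factor, so that part is our assumption. The helper names are hypothetical.

```python
import math

def p_correct_3pl(theta, a, b, c, scale=1.7):
    """Correct-response probability of a single item (Eq. (2))."""
    return c + (1.0 - c) / (1.0 + math.exp(-scale * a * (theta - b)))

def log_lik_form(responses, items, theta):
    """Log-probability of one response vector given ability theta.

    responses: list of 0/1 item scores.
    items: list of (a, b, c) tuples, one per item, in the same order.
    Assumption: an incorrect response (0) contributes log(1 - p).
    """
    total = 0.0
    for u, (a, b, c) in zip(responses, items):
        p = p_correct_3pl(theta, a, b, c)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total

# Hypothetical two-item form with no guessing, evaluated at theta = 0.
print(log_lik_form([1, 0], [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)], 0.0))
```

Working on the log scale avoids numerical underflow when a form contains many items.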
In the preceding models, the assignment mechanism is explained by the observed portion of
the complete item responses (Lord, 1974; Bernaards & Sijtsma, 1999). However, if the indicator
variable of test form selection behavior depends on both observed and missing portions, the
assignment mechanism is not random, thereby leading to the conclusion that the existing methods
yield biased estimates for ability and item parameters (see the simulation study in Sect. 4). To
solve this problem, we modeled the relation between grouping variables and item response vari-
ables of all the tests containing observed and missing item response variables, using the logistic
regression model.
Our model also seeks to estimate the differences in ability parameters between the two points
in time. In previous models, because the likelihood function was separated into the likelihood of
grouping variable and item response variables in item response models, it turns out that the ex-
isting IRT model provides consistent estimates for parameters. In our model, however, grouping
variables depend on the item response variables of all the test forms and, therefore, the likelihood
function cannot be separated. Therefore, the traditional method cannot be applied.
In this paper, the test form selection mechanism is modeled using the following nominal
logistic regression model. To express the model in a more general manner, we let the explanatory
variables include item response variables $u_i = (u_{i1}', u_{i2}')'$ and ability variables $\theta_i = (\theta_{i1}, \theta_{i2})$.
The equation is as follows:
$$
p(r_i = j \mid u_i, \theta_i, \rho)
= \frac{\exp(\rho_{uj}' u_i + \rho_{\theta j}' \theta_i)}{1 + \exp(\rho_{uj}' u_i + \rho_{\theta j}' \theta_i)}
= \frac{\exp(\rho_j' z_i)}{1 + \exp(\rho_j' z_i)},
\tag{5}
$$
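where $z_i = (u_i', \theta_i')'$ and $\rho_j = (\rho_{uj}', \rho_{\theta j}')'$. $\rho_j$ are coefficients multiplied by item response vector
$u_i$ and latent ability variable $\theta_i$. The higher the values of the elements of $\rho_j$, the higher is the
probability that the examinees are assigned to the $j$th group, and
$$
\rho_{uj} = (\rho_{Aj}', \rho_{B1j}', \rho_{B2j}', \rho_{Cj}')', \tag{6}
$$
$$
\rho_{Aj} = (\rho_{A1j}, \ldots, \rho_{AK_Aj})', \tag{7}
$$
$$
\rho_{B1j} = (\rho_{B11j}, \ldots, \rho_{B1K_Bj})', \tag{8}
$$
$$
\rho_{B2j} = (\rho_{B21j}, \ldots, \rho_{B2K_Bj})', \tag{9}
$$
$$
\rho_{Cj} = (\rho_{C1j}, \ldots, \rho_{CK_Cj})', \tag{10}
$$
$$
\rho_{\theta j} = (\rho_{\theta 1j}, \rho_{\theta 2j})'. \tag{11}
$$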
Furthermore, to ensure notational simplicity, let $\phi_j$ be the vector that contains $\mu_j$ and $\sigma_j^2$.
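The selection probability in Eq. (5) is an ordinary logistic function of the linear predictor $\rho_j' z_i$. The following sketch evaluates it for plain Python lists; the function name and the example coefficients are hypothetical.

```python
import math

def selection_prob(rho_j, z_i):
    """Eq. (5): probability that examinee i selects form j.

    rho_j: coefficients, z_i: stacked item responses and abilities.
    """
    eta = sum(r * z for r, z in zip(rho_j, z_i))
    # Evaluate exp(eta) / (1 + exp(eta)) in a numerically stable way.
    if eta >= 0.0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

# Hypothetical example: two item scores and one ability value.
print(selection_prob([0.3, 0.3, 0.5], [1, 0, 0.8]))
```

When all elements of $\rho_j$ are zero, the probability is 0.5 for every examinee, which corresponds to selection that carries no information about responses or ability.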
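In this paper, we provide the maximum likelihood estimation of the parameters. Under the
assumption that $\theta_1$ and $\theta_2$ are independent, $\theta_1 \perp\!\!\!\perp \theta_2$, the complete-data log-likelihood function of
the sample of the $(u_i, r_i)$ observations is a function of $\psi$, $\phi$, and $\rho$, and is written as follows:
$$
L(\psi, \phi, \rho \mid r, u, \theta)
= \sum_{i=1}^{N} \log p(r_i = j, u_i, \theta_i \mid \psi, \phi, \rho)
= \sum_{i=1}^{N} \left[ \log p(r_i = j \mid u_i, \theta_i, \rho) + \sum_{j=1}^{2} \log p(u_{ij} \mid \theta_{ij}, \psi_j) + \sum_{j=1}^{2} \log p(\theta_{ij} \mid \phi_j) \right]
= L_R + L_U + L_\Theta,
\tag{12}
$$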
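where
$$
L_R = \sum_{i=1}^{N} \log p(r_i = j \mid u_i, \theta_i, \rho), \tag{13}
$$
$$
L_U = \sum_{i=1}^{N} \sum_{j=1}^{2} \log p(u_{ij} \mid \theta_{ij}, \psi_j), \tag{14}
$$
$$
L_\Theta = \sum_{i=1}^{N} \sum_{j=1}^{2} \log p(\theta_{ij} \mid \phi_j). \tag{15}
$$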
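In practice, neither $u^{\mathrm{mis}}$ nor $\theta$ can be observed. Let $L_{\mathrm{obs}}(\psi, \rho \mid r, u^{\mathrm{obs}})$ be the observed
log-likelihood value. $L_{\mathrm{obs}}(\psi, \rho \mid r, u^{\mathrm{obs}})$ is as follows:
$$
L_{\mathrm{obs}}(\psi, \rho \mid r, u^{\mathrm{obs}})
= \sum_{i=1}^{N} \log p(r_i = j \mid u_i^{\mathrm{obs}}, \rho) + \sum_{i=1}^{N} \log p(u_i^{\mathrm{obs}} \mid \psi).
\tag{16}
$$
The functional form is so complex that we cannot maximize it directly. To solve this problem,
we used the EM algorithm (Dempster, Laird, & Rubin, 1977), which is useful for analyzing data
containing missing values.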
The Relationship between Our Method and Existing Linking Methods for the Nonequivalent
Groups Design
Nonignorable missing data models are classified broadly into two categories: pattern mixture
models and selection models (Little & Rubin, 2002). The multiple group IRT model is
categorized as a kind of pattern mixture model. Pattern mixture models are expressed as a joint
distribution where the observed variables depend on missing indicators. Thus, the likelihood
function of the multiple group IRT model is expressed as follows:
$$
l_{\mathrm{obs}}(\psi, \phi, \rho \mid r, u, \theta)
= \prod_{i=1}^{N} \iint p(r_i = j, u_i^{\mathrm{obs}}, u_i^{\mathrm{mis}}, \theta_i \mid \psi, \phi, \rho)\, du_i^{\mathrm{mis}}\, d\theta_i
= \prod_{i=1}^{N} \iint p(u_i^{\mathrm{obs}}, u_i^{\mathrm{mis}}, \theta_i \mid r_i = j, \psi, \phi) \times p(r_i = j \mid \rho)\, du_i^{\mathrm{mis}}\, d\theta_i.
\tag{17}
$$
As described in Sect. 3, we can test whether the missing data $u_i^{\mathrm{mis}}$ have an effect on test form
selection behavior using the Wald statistic. In this way, the proposed method is more practical
and useful than the existing methods for the nonequivalent groups design.
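We also considered the assumption that an examinee’s ability can influence test form
assignment. The observed likelihood of this assumption can be expressed as follows:
$$
l_{\mathrm{obs}}(\psi, \phi, \rho \mid r, u^{\mathrm{obs}})
= \prod_{i=1}^{N} \iint p(r_i = j \mid \theta_i, \rho) \times p(u_i^{\mathrm{obs}}, u_i^{\mathrm{mis}}, \theta_i \mid \psi, \phi)\, du_i^{\mathrm{mis}}\, d\theta_i.
\tag{18}
$$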
Equation (18) indicates that the likelihood function is not separated into the likelihood of the
grouping variable and that of the item response variables. The likelihood function under the MAR
assumption, that is, the likelihood function of the existing multiple group IRT, is given by (17) and is
separated into the likelihood of the grouping variable and that of the item response variables; this
form is obviously different from (18). Under (18), the data are not MAR and, therefore, existing IRT
linking methods can yield biased ML estimates. Consequently, as long as we assume that test form
selection behavior depends on examinees’ abilities, all the existing methods related to concurrent
calibration inevitably yield biased estimates. The proposed method can also adjust for these biases.
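The consequence of ability-dependent selection can be illustrated with a small simulation. This is a deliberately simplified sketch, not the paper's simulation design: it assumes a single standard normal ability and a hypothetical selection coefficient `rho_theta`, and it only compares the mean ability of examinees who end up on form 1 with the population mean of zero.

```python
import math
import random

def simulate_selected_mean(rho_theta, n=20000, seed=0):
    """Mean ability of simulated examinees who select form 1.

    Assumption: theta ~ N(0, 1) and P(select form 1 | theta) is logistic
    in rho_theta * theta; both choices are illustrative only.
    """
    rng = random.Random(seed)
    chosen = []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)
        p_form1 = 1.0 / (1.0 + math.exp(-rho_theta * theta))
        if rng.random() < p_form1:
            chosen.append(theta)
    return sum(chosen) / len(chosen)

print("random assignment  :", round(simulate_selected_mean(0.0), 3))
print("ability-dependent  :", round(simulate_selected_mean(1.0), 3))
```

With `rho_theta = 0` the selected group is a random half of the population and its mean stays near zero. With a positive coefficient the selected group is systematically more able, so a calibration that treats the groups as ordinary samples from fixed populations no longer describes how the data arose.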
3. The Estimation Method
Obtaining the ML estimates by directly maximizing $p(r, u^{\mathrm{obs}} \mid \psi, \rho)$ is very difficult because
the likelihood function $p(r, u^{\mathrm{obs}} \mid \psi, \rho)$ is very complicated due to the presence of missing data
and latent variables. Thus, instead of working with $p(r, u^{\mathrm{obs}} \mid \psi, \rho)$ directly, we augment $u^{\mathrm{obs}}$
with $(u^{\mathrm{mis}}, \theta)$ using the EM algorithm in the ML estimation. Consequently, the ML estimation
based on the complete data set is made easier when the following EM algorithm is used:
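[E-step]: Evaluate the expected value of the complete-data log-likelihood with respect to
$u^{\mathrm{mis}}$ and $\theta$
$$
Q(\psi, \phi, \rho \mid \psi^{(t)}, \phi^{(t)}, \rho^{(t)})
= \sum_{i=1}^{N} \iint \left[ \log p(r_i \mid u_i, \theta_i, \rho) + \sum_{j=1}^{2} \log p(u_{ij}, \theta_{ij} \mid \psi_j, \phi_j) \right]
\times p(u_i^{\mathrm{mis}}, \theta_i \mid u_i^{\mathrm{obs}}, r_i, \psi^{(t)}, \phi^{(t)}, \rho^{(t)})\, du_i^{\mathrm{mis}}\, d\theta_i
\tag{19}
$$
at the $t$th iteration with current values $\psi^{(t)}$, $\phi^{(t)}$, $\rho^{(t)}$.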
Since the E-step cannot be calculated analytically, we use the MCEM algorithm (Wei
& Tanner, 1990), where this E-step is approximated by the Monte Carlo estimate of the
expectation using a sufficiently large number of observations simulated from the conditional
distribution $p(u^{\mathrm{mis}}, \theta \mid u^{\mathrm{obs}}, \psi^{(t)}, \phi^{(t)})$. This is accomplished by using the Metropolis–Hastings
algorithm, which enables us to draw samples from $p(u_i^{\mathrm{mis}} \mid u_i^{\mathrm{obs}}, r_i, \theta_i, \psi^{(t)}, \rho^{(t)})$ and
$p(\theta_i \mid u_i^{\mathrm{obs}}, u_i^{\mathrm{mis}}, r_i, \phi^{(t)})$. (See Ibrahim, Chen, & Lipsitz, 2001 for the ML estimation in
generalized linear models when the missing data mechanism is nonignorable.)
The following Metropolis–Hastings algorithms are used to sample $u^{\mathrm{mis}}$ and $\theta$. Let $u^{\mathrm{mis}}_{(m)}$
and $\theta_{(m)}$ be the current values at the $m$th iteration and $u^{\mathrm{mis}}_{i(m)}$ and $\theta_{i(m)}$ be the values of the $i$th
observation of $u^{\mathrm{mis}}_{(m)}$ and $\theta_{(m)}$.
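(i) Generate $u^{\mathrm{mis}}_{i(m+1)}$ from $p(u_i^{\mathrm{mis}} \mid u_i^{\mathrm{obs}}, r_i, \theta_{i(m)}, \psi^{(t)}, \rho^{(t)})$ (Metropolis–Hastings algorithm)
The target and the proposal distribution are as follows:
$$
\text{target distribution: } p(u_i^{\mathrm{mis}} \mid u_i^{\mathrm{obs}}, r_i, \theta_{i(m)}, \psi^{(t)}, \rho^{(t)}) \tag{20}
$$
$$
\text{proposal distribution: } p(u_i^{\mathrm{mis}} \mid u_i^{\mathrm{obs}}, \theta_{i(m)}, \psi^{(t)}) \tag{21}
$$
(i-1) Draw $u_i^* \sim p(u_i^{\mathrm{mis}} \mid u_i^{\mathrm{obs}}, \theta_{i(m)}, \psi^{(t)})$
(i-2) Accept $u^{\mathrm{mis}}_{i(m+1)} = u_i^*$ with probability
$$
\alpha(u_i^* \mid u^{\mathrm{mis}}_{i(m)}, u_i^{\mathrm{obs}}, \theta_{i(m)}, r_i, \rho^{(t)})
= \min\left[ \frac{p(r_i \mid u_i^{\mathrm{obs}}, u_i^*, \theta_{i(m)}, \rho^{(t)})}{p(r_i \mid u_i^{\mathrm{obs}}, u^{\mathrm{mis}}_{i(m)}, \theta_{i(m)}, \rho^{(t)})}, 1 \right]
\tag{22}
$$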
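Step (i) can be sketched as an independence Metropolis–Hastings update in which the proposal is drawn from the item response model and only the ratio of selection probabilities enters the acceptance rule of Eq. (22). The toy setup is an assumption made for illustration: a two-parameter item model, a selection probability that depends on the difference between a fixed observed score and the sum of the missing scores, and invented parameter values.

```python
import math
import random

def mh_step_umis(u_mis_cur, propose, sel_prob, rng):
    """One Metropolis-Hastings update of the missing responses.

    propose():       draws u* from p(u_mis | u_obs, theta, psi)   (step i-1)
    sel_prob(u_mis): p(r_i | u_obs, u_mis, theta, rho) for the observed r_i
    """
    u_star = propose()
    alpha = min(sel_prob(u_star) / sel_prob(u_mis_cur), 1.0)  # Eq. (22)
    return u_star if rng.random() < alpha else u_mis_cur

# Toy setup; every value here is hypothetical.
rng = random.Random(1)
theta = 0.3
items = [(1.0, -0.5), (0.8, 0.2), (1.2, 0.6)]  # (a, b) of the unobserved items

def propose():
    # Responses are conditionally independent given theta.
    return [1 if rng.random() < 1.0 / (1.0 + math.exp(-1.7 * a * (theta - b))) else 0
            for a, b in items]

def sel_prob(u_mis, pi=0.4, obs_score=2):
    # Probability of the observed choice, increasing in (observed - missing) score.
    return 1.0 / (1.0 + math.exp(-pi * (obs_score - sum(u_mis))))

u = propose()
for _ in range(100):
    u = mh_step_umis(u, propose, sel_prob, rng)
print(u)
```

Because the proposal equals the target without the selection term, the item response probabilities cancel from the acceptance ratio, which is why only `sel_prob` appears in it.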
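(ii) Generate $\theta_{i(m+1)}$ from $p(\theta_i \mid u_i^{\mathrm{obs}}, u^{\mathrm{mis}}_{i(m+1)}, r_i, \phi^{(t)}, \rho^{(t)})$
Here, too, we use a Metropolis–Hastings algorithm. The target and the proposal
distribution are as follows:
$$
\text{target distribution: } p(\theta_i \mid r_i, u_i^{\mathrm{obs}}, u^{\mathrm{mis}}_{i(m+1)}, \rho^{(t)}, \phi^{(t)}) \tag{23}
$$
$$
\text{proposal distribution: } p(\theta_i \mid \phi^{(t)}) \tag{24}
$$
(ii-1) Draw $\theta_i^* \sim p(\theta_i \mid \phi^{(t)})$
(ii-2) Accept $\theta_{i(m+1)} = \theta_i^*$ with probability
$$
\alpha(\theta_i^* \mid \theta_{i(m)}, u_i^{\mathrm{obs}}, u^{\mathrm{mis}}_{i(m+1)}, \psi^{(t)})
= \min\left[ \frac{p(r_i, u_i^{\mathrm{obs}}, u^{\mathrm{mis}}_{i(m+1)} \mid \theta_i^*, \rho^{(t)}, \psi^{(t)})}{p(r_i, u_i^{\mathrm{obs}}, u^{\mathrm{mis}}_{i(m+1)} \mid \theta_{i(m)}, \rho^{(t)}, \psi^{(t)})}, 1 \right]
= \min\left[ \frac{p(r_i \mid u_i^{\mathrm{obs}}, u^{\mathrm{mis}}_{i(m+1)}, \theta_i^*, \rho^{(t)})\, p(u_i^{\mathrm{obs}}, u^{\mathrm{mis}}_{i(m+1)} \mid \theta_i^*, \psi^{(t)})}{p(r_i \mid u_i^{\mathrm{obs}}, u^{\mathrm{mis}}_{i(m+1)}, \theta_{i(m)}, \rho^{(t)})\, p(u_i^{\mathrm{obs}}, u^{\mathrm{mis}}_{i(m+1)} \mid \theta_{i(m)}, \psi^{(t)})}, 1 \right].
\tag{25}
$$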
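Step (ii) has the same structure: the proposal is the ability distribution $p(\theta_i \mid \phi^{(t)})$, so the acceptance ratio in Eq. (25) reduces to a ratio of likelihood terms. The sketch below assumes a scalar ability for brevity (the paper's $\theta_i$ has two components) and takes the log of the numerator of Eq. (25) as a user-supplied function `log_lik`; both names and the toy likelihood are hypothetical.

```python
import math
import random

def mh_step_theta(theta_cur, mu, sigma, log_lik, rng):
    """One Metropolis-Hastings update of a scalar ability.

    The proposal is N(mu, sigma^2), i.e. p(theta | phi); log_lik(theta)
    returns log[p(r | u, theta, rho) * p(u | theta, psi)].
    """
    theta_star = rng.gauss(mu, sigma)                      # step (ii-1)
    log_ratio = log_lik(theta_star) - log_lik(theta_cur)   # log of Eq. (25) ratio
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return theta_star
    return theta_cur

def run_chain(n_draws, mu, sigma, log_lik, seed=0):
    """Run the sampler from the prior mean and return all draws."""
    rng = random.Random(seed)
    theta, draws = mu, []
    for _ in range(n_draws):
        theta = mh_step_theta(theta, mu, sigma, log_lik, rng)
        draws.append(theta)
    return draws

# Toy check: one Gaussian observation y = 1 with unit variance and a N(0, 1)
# prior, for which the posterior is N(0.5, 0.5).
draws = run_chain(20000, 0.0, 1.0, lambda th: -0.5 * (1.0 - th) ** 2)
print(round(sum(draws) / len(draws), 2))
```

Comparing log values rather than raw probabilities keeps the ratio stable when the likelihood is a product over many items.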
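After drawing samples of $u^{\mathrm{mis}}_{(m)}$ and $\theta_{(m)}$, $Q(\psi, \phi, \rho \mid \psi^{(t)}, \phi^{(t)}, \rho^{(t)})$ is approximated by the
Monte Carlo integration:
$$
Q(\psi, \phi, \rho \mid \psi^{(t)}, \phi^{(t)}, \rho^{(t)})
\approx \sum_{i=1}^{N} \frac{1}{M} \sum_{m=1}^{M} \log p(r_i, u_i^{\mathrm{obs}}, u^{\mathrm{mis}}_{i(m)}, \theta_{i(m)} \mid \psi, \phi, \rho),
\tag{26}
$$
where $M$ is the number of draws.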
[M-step]: Maximize $Q(\psi, \phi, \rho \mid \psi^{(t)}, \phi^{(t)}, \rho^{(t)})$ and update $\psi^{(t)}$, $\phi^{(t)}$, $\rho^{(t)}$ to $\psi^{(t+1)}$,
$\phi^{(t+1)}$, $\rho^{(t+1)}$.
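At the maximization (M)-step, we need to maximize $Q(\psi, \phi, \rho \mid \psi^{(t)}, \phi^{(t)}, \rho^{(t)})$ with respect
to $\psi$, $\phi$, and $\rho$. Using (12) and (26), $Q(\psi, \phi, \rho \mid \psi^{(t)}, \phi^{(t)}, \rho^{(t)})$ can be written as follows:
$$
Q(\psi, \phi, \rho \mid \psi^{(t)}, \phi^{(t)}, \rho^{(t)}) = E_R + E_U + E_\Theta, \tag{27}
$$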
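where
$$
E_R = \sum_{i=1}^{N} \iint \log p(r_i \mid u_i, \theta_i, \rho) \times p(u_i^{\mathrm{mis}}, \theta_i \mid u_i^{\mathrm{obs}}, r_i, \psi^{(t)}, \phi^{(t)}, \rho^{(t)})\, du_i^{\mathrm{mis}}\, d\theta_i, \tag{28}
$$
$$
E_U = \sum_{i=1}^{N} \sum_{j=1}^{2} \iint \log p(u_{ij} \mid \theta_{ij}, \psi) \times p(u_i^{\mathrm{mis}}, \theta_i \mid u_i^{\mathrm{obs}}, r_i, \psi^{(t)}, \phi^{(t)}, \rho^{(t)})\, du_i^{\mathrm{mis}}\, d\theta_i, \tag{29}
$$
$$
E_\Theta = \sum_{i=1}^{N} \sum_{j=1}^{2} \iint \log p(\theta_{ij} \mid \phi) \times p(u_i^{\mathrm{mis}}, \theta_i \mid u_i^{\mathrm{obs}}, r_i, \psi^{(t)}, \phi^{(t)}, \rho^{(t)})\, du_i^{\mathrm{mis}}\, d\theta_i. \tag{30}
$$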
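Thus, maximizing $Q(\psi, \phi, \rho \mid \psi^{(t)}, \phi^{(t)}, \rho^{(t)})$ is equivalent to solving the following equations:
$$
\frac{\partial Q(\psi, \phi, \rho \mid \psi^{(t)}, \phi^{(t)}, \rho^{(t)})}{\partial \rho} = \frac{\partial E_R}{\partial \rho} = 0, \tag{31}
$$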
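$$
\frac{\partial Q(\psi, \phi, \rho \mid \psi^{(t)}, \phi^{(t)}, \rho^{(t)})}{\partial \psi} = \frac{\partial E_U}{\partial \psi} = 0, \tag{32}
$$
$$
\frac{\partial Q(\psi, \phi, \rho \mid \psi^{(t)}, \phi^{(t)}, \rho^{(t)})}{\partial \phi} = \frac{\partial E_\Theta}{\partial \phi} = 0. \tag{33}
$$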
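The complete-data likelihood equation for $\rho$ cannot be solved in closed form; therefore, the
Newton–Raphson method is used to obtain the updated parameter values. The first and
second partial derivatives of $L_R$ with respect to $\rho_j$ ($j = 1, 2$) are
$$
\frac{\partial L_R}{\partial \rho_j} = \sum_{i=1}^{N} \left[ R_{ij} - \frac{\exp(\rho_j' z_i)}{1 + \exp(\rho_j' z_i)} \right] z_i, \tag{34}
$$
$$
\frac{\partial^2 L_R}{\partial \rho_j\, \partial \rho_j'} = - \sum_{i=1}^{N} \frac{\exp(\rho_j' z_i)}{1 + \exp(\rho_j' z_i)} \left[ 1 - \frac{\exp(\rho_j' z_i)}{1 + \exp(\rho_j' z_i)} \right] z_i z_i', \tag{35}
$$
where $R$ is an $N \times 2$ indicator matrix in which the $(i,j)$th element is defined as:
$$
R_{ij} =
\begin{cases}
1 & \text{if } r_i = j, \\
0 & \text{if } r_i \neq j.
\end{cases}
\tag{36}
$$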
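Let $\rho^{(t)}_{j(s)}$ be the value of the $s$th Newton–Raphson step in the $t$th M-step. The following equation
is used for updating $\rho^{(t)}_j$:
$$
\rho^{(t)}_{j(s+1)} = \rho^{(t)}_{j(s)} - \left[ \frac{\partial^2 E_R}{\partial \rho_j\, \partial \rho_j'} \right]^{-1}_{\rho_j = \rho^{(t)}_{j(s)}} \times \left[ \frac{\partial E_R}{\partial \rho_j} \right]_{\rho_j = \rho^{(t)}_{j(s)}}, \tag{37}
$$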
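where
$$
\frac{\partial E_R}{\partial \rho_j} = \frac{1}{M} \sum_{m=1}^{M} \frac{\partial L_R}{\partial \rho_j},
\qquad
\frac{\partial^2 E_R}{\partial \rho_j\, \partial \rho_j'} = \frac{1}{M} \sum_{m=1}^{M} \frac{\partial^2 L_R}{\partial \rho_j\, \partial \rho_j'}. \tag{38}
$$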
Updating is repeated through the above equation until the convergence criterion is satisfied.
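The update in Eqs. (34) to (38) can be sketched for the simplest case of a single scalar coefficient, where the Hessian is a number and no matrix inversion is needed. The scalar restriction, the function names, and the simulated data are assumptions made for illustration; the paper's $\rho_j$ is a vector.

```python
import math
import random

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def newton_raphson_rho(R, Z, rho0=0.0, tol=1e-3, max_iter=50):
    """Newton-Raphson for a scalar rho in the M-step.

    R[i]: 1 if examinee i selected form j, else 0 (Eq. (36)).
    Z[i]: list of M Monte Carlo draws of the scalar predictor z_i.
    """
    rho = rho0
    for _ in range(max_iter):
        grad, hess = 0.0, 0.0
        for r_i, draws in zip(R, Z):
            m = len(draws)
            for z in draws:
                p = sigmoid(rho * z)
                grad += (r_i - p) * z / m          # Eqs. (34) and (38)
                hess -= p * (1.0 - p) * z * z / m  # Eqs. (35) and (38)
        step = grad / hess
        rho -= step                                # Eq. (37)
        if abs(step) < tol:
            break
    return rho

# Hypothetical data generated with rho = 1 and one draw per examinee (M = 1).
rng = random.Random(0)
zs = [rng.gauss(0.0, 1.0) for _ in range(5000)]
R = [1 if rng.random() < sigmoid(1.0 * z) else 0 for z in zs]
print(round(newton_raphson_rho(R, [[z] for z in zs]), 2))
```

The stopping rule on the step size mirrors a convergence criterion such as the value 0.001 used in the simulation study; other criteria would work equally well in this sketch.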
Moreover, the likelihood equation for the item parameters cannot be obtained as a closed
form; therefore, again, we use the Newton–Raphson method.
In operational practice, the following two types of constraints can be imposed for test form
selection behavior, and some parts of the Metropolis–Hastings algorithm in the E-step are altered
as described below due to these constraints.
When the Test Form Selection Behavior Depends Only on the Ability Parameters
(i) is altered as follows:
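(i’) Generate $u^{\mathrm{mis}}_{i(m+1)}$ from $p(u_i^{\mathrm{mis}} \mid u_i^{\mathrm{obs}}, r_i, \theta_{i(m)}, \psi^{(t)}, \rho^{(t)})$
Because $u_i^{\mathrm{mis}}$ is independent of $r_i$, the conditional distribution of $u_i^{\mathrm{mis}}$ can be obtained from
$p(u_i^{\mathrm{mis}} \mid u_i^{\mathrm{obs}}, \theta_{i(m)}, \psi^{(t)}, \rho^{(t)})$.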
When the Test Form Selection Behavior Depends Only on the Test Scores
(ii) is altered as follows:
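(ii’) Generate $\theta_{i(m+1)}$ from $p(\theta_i \mid u_i^{\mathrm{obs}}, u^{\mathrm{mis}}_{i(m+1)}, r_i, \phi^{(t)}, \rho^{(t)})$
We again use a Metropolis–Hastings algorithm. The target and the proposal
distribution are as follows:
$$
\text{target distribution: } p(\theta_i \mid r_i, u_i^{\mathrm{obs}}, u^{\mathrm{mis}}_{i(m+1)}, \rho^{(t)}, \phi^{(t)}) \tag{39}
$$
$$
\text{proposal distribution: } p(\theta_i \mid \phi^{(t)}) \tag{40}
$$
(ii’-1) Draw $\theta_i^* \sim p(\theta_i \mid \phi^{(t)})$
(ii’-2) Accept $\theta_{i(m+1)} = \theta_i^*$ with probability
$$
\alpha(\theta_i^* \mid \theta_{i(m)}, u_i^{\mathrm{obs}}, u^{\mathrm{mis}}_{i(m+1)}, \psi^{(t)})
= \min\left[ \frac{p(u_i^{\mathrm{obs}}, u^{\mathrm{mis}}_{i(m+1)} \mid \theta_i^*, \psi^{(t)})}{p(u_i^{\mathrm{obs}}, u^{\mathrm{mis}}_{i(m+1)} \mid \theta_{i(m)}, \psi^{(t)})}, 1 \right].
\tag{41}
$$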
Assessing the Covariance Matrix of the Estimates Based on the Observed Information Matrix
Let $\xi = (\psi, \phi, \rho)$; that is, $\xi$ contains all the parameters of our model. As a by-product, the
standard error of the parameter vector $\xi$ can be calculated for the MCEM algorithm. Louis (1982)
showed that the observed information matrix of $\hat{\xi}$ from the EM algorithm can be expressed as
$$
I(\hat{\xi} \mid y_{\mathrm{obs}}) = E_\xi\left[ I_c(\xi \mid y) \mid y_{\mathrm{obs}} \right]_{\xi = \hat{\xi}} - E_\xi\left[ S_c(y \mid \xi)\, S_c^T(y \mid \xi) \mid y_{\mathrm{obs}} \right]_{\xi = \hat{\xi}}, \tag{42}
$$
where $I_c(\xi \mid y)$ is the matrix of the negative of the second-order partial derivatives of the
complete-data log likelihood function with respect to the elements of $\xi$, and $S_c(y \mid \xi)$ is the gradient
vector of the complete-data log likelihood function, that is,
$$
I_c(\xi \mid y) = - \frac{\partial^2 \log L_c(\xi \mid y)}{\partial \xi\, \partial \xi'}, \tag{43}
$$
$$
S_c(y \mid \xi) = \frac{\partial \log L_c(\xi \mid y)}{\partial \xi}, \tag{44}
$$
where $L_c(\xi \mid y)$ is a complete-data log likelihood function in which the missing part is
filled in at each MCEM step. In practice, the calculation of expectation values in (42) is
approximated through Monte Carlo integration. Therefore, (42) is translated as follows:
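$$
I(\hat{\xi} \mid y_{\mathrm{obs}}) = \frac{1}{M} \sum_{m=1}^{M} I_c(\hat{\xi} \mid y_{(m)}) - \frac{1}{M} \sum_{m=1}^{M} S_c(y_{(m)} \mid \hat{\xi})\, S_c^T(y_{(m)} \mid \hat{\xi}). \tag{45}
$$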
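Equation (45) is an average over the $M$ completed data sets of the complete-data information minus the outer product of the complete-data score. A minimal sketch with plain nested lists follows; the function name and the numbers in the example are hypothetical, and in practice the inputs would be evaluated at $\hat{\xi}$ for each Monte Carlo draw.

```python
def louis_information(info_draws, score_draws):
    """Monte Carlo version of Louis's formula, Eq. (45).

    info_draws:  list of M complete-data information matrices (K x K lists).
    score_draws: list of M complete-data score vectors (length K lists).
    Returns the K x K observed information matrix as nested lists.
    """
    m = len(info_draws)
    k = len(score_draws[0])
    out = [[0.0] * k for _ in range(k)]
    for info, score in zip(info_draws, score_draws):
        for a in range(k):
            for b in range(k):
                out[a][b] += (info[a][b] - score[a] * score[b]) / m
    return out

# Hypothetical two-parameter example with M = 2 draws.
info_draws = [[[2.0, 0.0], [0.0, 2.0]], [[2.0, 0.0], [0.0, 2.0]]]
score_draws = [[1.0, 0.0], [-1.0, 0.0]]
print(louis_information(info_draws, score_draws))
```

Inverting the resulting matrix gives the asymptotic covariance matrix of the estimates, whose diagonal yields the standard errors.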
Let $\xi_p$ be a part of the parameter vector $\xi$. Using the observed information matrix $I(\hat{\xi} \mid y_{\mathrm{obs}})$,
we can test the null hypothesis “$H_0 : \xi_p = 0$” using the Wald statistic. The Wald statistic of
the hypothesis $H_0$ can be expressed as follows:
$$
W = \xi_p' \left[ I_{\xi_p}(\hat{\xi} \mid y_{\mathrm{obs}}) \right]^{-1} \xi_p, \tag{46}
$$
where $I_{\xi_p}(\hat{\xi} \mid y_{\mathrm{obs}})$ is the submatrix of the Fisher information $I(\hat{\xi} \mid y_{\mathrm{obs}})$ relevant to $\xi_p$.
4. Simulation Study
To show the reliability of the proposed method, we carried out a simulation study. We
generated data for which the assignment of test forms was not random, and for which the test form
selection behavior depended on both the scores on the tests the examinees selected and the scores
on the tests they did not select. We used SAS/IML to evaluate the multiple group item response
theory as well as the estimates from the proposed method. For the simulation study, we generated
100 data sets and for each data set, we obtained the usual ML estimates for multiple group
IRT (concurrent calibration) and the estimates using the proposed method. We were interested in
assessing the accuracy of the parameter estimation for our model.
A two-parameter logistic model was used as the functional form of the item response. Each
test had 10 items. The item parameters of Test B $(a_B, b_B)$ were common across test forms, and
$\mu_1$ and $\sigma_1^2$ were fixed as $\mu_1 = 0$ and $\sigma_1^2 = 1$, respectively. To ensure the identifiability of the
model and for the sake of simplicity, we assumed that the sums of the observed and missing
test scores would determine test form selection behavior; that is, we considered the following
constraint:
$$
\begin{aligned}
\rho_{A1j} = \cdots = \rho_{AK_Aj} = \rho_{B11j} = \cdots = \rho_{B1K_Bj} &= \pi_j, \\
\rho_{B21j} = \cdots = \rho_{B2K_Bj} = \rho_{C1j} = \cdots = \rho_{CK_Cj} &= -\pi_j.
\end{aligned}
\tag{47}
$$
Even when this constraint is assumed, the assumption of a nonrandom assignment can be upheld.
With these constraints, the test selection probability can be expressed as follows:
$$
p(r_i = j \mid u_i, \rho) = \frac{\exp(\pi_j v_i)}{1 + \exp(\pi_1 v_i)}, \tag{48}
$$
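where $v_i$ is the difference between the scores of the $i$th examinee on the two test forms:
$$
v_i = \left( \sum_{k_A=1}^{K_A} u_{iAk_A} + \sum_{k_B=1}^{K_B} u_{iB1k_B} \right) - \left( \sum_{k_B=1}^{K_B} u_{iB2k_B} + \sum_{k_C=1}^{K_C} u_{iCk_C} \right). \tag{49}
$$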
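As a sketch of how the constrained selection model operates, the score difference $v_i$ and the probability of selecting form 1 can be computed as follows. Two points are our assumptions rather than statements from the text: $v_i$ is taken as the form 1 total (Tests A and B) minus the form 2 total (Tests B and C), in line with the sign pattern of the constraint (47), and form 2 is treated as the reference category with $\pi_2 = 0$, which is how we read the denominator of Eq. (48). The function names are hypothetical.

```python
import math

def score_difference(u_a, u_b1, u_b2, u_c):
    """v_i: total score on form 1 (A, B1) minus total score on form 2 (B2, C)."""
    return (sum(u_a) + sum(u_b1)) - (sum(u_b2) + sum(u_c))

def prob_select_form1(v, pi1):
    """Eq. (48) for j = 1, assuming pi_2 = 0 for the reference form."""
    return 1.0 / (1.0 + math.exp(-pi1 * v))

# Hypothetical examinee: 3 items per test, pi_1 = 0.2 as in the simulation.
v = score_difference([1, 1, 0], [1, 1, 1], [1, 0, 0], [0, 1, 0])
print(v, round(prob_select_form1(v, 0.2), 3))
```

An examinee who would score equally on both forms has $v_i = 0$ and selects either form with probability 0.5, so the nonrandom assignment is driven entirely by score differences.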
The true values are provided in Tables 1 and 2. In the current model, the total number of
parameters was 63. In this simulation study, the correlation of θ1 and θ2 was set to 0.5.
We generated 100 replications, all of which followed the same population parameters (or
true values). For each replication, we generated 3,000 observations and set M = 30.
The convergence criterion for the Newton–Raphson algorithm in each M-step was 0.001.
The means of the ML estimates were computed based on 100 replications. The root mean squares
(RMSs) between the estimates and true values as well as the total value of mean squared errors
(MSEs) were computed to compare the accuracy of the results of this simulation with those under
a random assignment condition. The results are listed in Tables 1 and 2. We obtained the biases
by subtracting the estimated values from the true values. The proposed method can also produce
ability scores: we calculated Monte Carlo estimates for the ability parameter of each examinee and
created histograms (see Fig. 3).
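The two accuracy summaries reported in Tables 1 and 2 can be sketched as follows. The sign convention for the bias (true value minus mean estimate) follows the description in the text; the function name and the example numbers are hypothetical.

```python
import math

def bias_and_rms(estimates, true_value):
    """Bias and root mean square error over replications.

    estimates: one parameter estimate per simulated data set.
    Bias is the true value minus the mean estimate, as described in the text.
    """
    n = len(estimates)
    bias = true_value - sum(estimates) / n
    rms = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n)
    return bias, rms

# Hypothetical estimates of a parameter whose true value is 0.5.
print(bias_and_rms([0.48, 0.55, 0.51, 0.47], 0.5))
```

The RMS combines bias and sampling variability, so a method can show a small RMS only if both components are small.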
The sum of the MSEs was calculated, and the value was 0.298 for the proposed model and
3.18 for the traditional model. Moreover, the sum of the absolute values of the biases was
calculated for each assumed model. The resultant value of the sum was 0.686 for the proposed
model and 12.5 for the traditional model. The sum of the MSEs under concurrent calibration was
thus approximately 11 times that under the proposed model, and the sum of the absolute biases
was about 18 times larger. These results
indicate that the parameters can be estimated accurately under the proposed model, whereas the
traditional model essentially yields biased estimates.
TABLE 1.
The results of simulation study for the IRT model (ρ, φ, μ, aA, aB). An asterisk marks a parameter not estimated under the existing model.

Parameter       Proposed model             Existing model
(true value)    Bias             RMS       Bias       RMS
π1 = 0.2        −3.00 × 10^−4    0.0238    *          *
μ2 = 0.5        −3.33 × 10^−3    0.0575    0.420      0.423
σ2 = 1.5        −2.47 × 10^−2    0.111     −0.181     0.204
aA1 = 0.6       4.68 × 10^−3     0.0477    −0.0460    0.0641
aA2 = 0.8       1.42 × 10^−4     0.0669    −0.0623    0.0885
aA3 = 1.0       −4.76 × 10^−3    0.0825    −0.0808    0.113
aA4 = 1.2       −1.97 × 10^−3    0.0788    −0.0936    0.120
aA5 = 1.5       2.56 × 10^−2     0.107     −0.0944    0.139
aA6 = 0.8       6.35 × 10^−3     0.0630    −0.0571    0.0824
aA7 = 0.6       5.91 × 10^−3     0.0521    −0.0456    0.0674
aA8 = 1.0       3.01 × 10^−3     0.0663    −0.0752    0.0979
aA9 = 1.5       1.32 × 10^−2     0.105     −0.103     0.143
aA10 = 1.2      1.03 × 10^−2     0.0959    −0.0839    0.122
aB1 = 1.8       1.32 × 10^−2     0.0903    −0.116     0.145
aB2 = 1.3       3.82 × 10^−3     0.0548    −0.0859    0.102
aB3 = 1.2       7.17 × 10^−3     0.0570    −0.0779    0.0968
aB4 = 0.7       3.91 × 10^−3     0.0333    −0.0488    0.0589
aB5 = 0.9       8.89 × 10^−3     0.0508    −0.0543    0.0709
aB6 = 0.8       1.95 × 10^−3     0.0366    −0.0563    0.0666
aB7 = 0.7       −1.67 × 10^−3    0.0366    −0.0525    0.0651
aB8 = 1.0       −7.10 × 10^−3    0.0481    −0.0774    0.0929
aB9 = 1.1       9.35 × 10^−3     0.0531    −0.0669    0.0849
aB10 = 1.0      1.63 × 10^−2     0.0521    −0.0551    0.0716
5. Real Data Analysis
We illustrated the applicability of our method, using a small part of the JLRT data set. The
JLRT is a test that measures the ability to communicate with other persons in Japanese in the
business setting, and is administered mainly to students or business people whose first language
is not Japanese. The JLRT has two equivalent test forms. Test form 1 includes Tests A and B,
and Test form 2 includes Tests B and C; Test B is common to the two test forms. The test form
the examinees take depends on the country in which they take the test. For illustrative purposes,
we used a subsample that recently took the JLRT. One thousand one hundred and ninety-nine
examinees took Test form 1 in Japan and 863 examinees took Test form 2 in other countries.
Considering that the examinees living in Japan were probably highly motivated to study Japanese
and had greater abilities than the examinees living in other countries, we can assume that the
examinees were not randomly assigned to the test forms.
For the purpose of simplification, a two-parameter logistic model was used as the functional
form of the item responses. The number of items in Test B was 33 and the number of items in
each of Tests A and C was 67. If we had analyzed all these items, there would have been too
much data in the tables and figures, which would have been inappropriate for the purpose of
illustration. Thus, after an exploratory analysis of 100 items included in the two test forms using
BILOG-MG, we chose 10 items from each of the Tests A, B, and C. $\mu_1$ and $\sigma_1^2$ were fixed as
$\mu_1 = 0$ and $\sigma_1^2 = 1$, respectively. As in the simulation study, we assumed that the sums of the test
scores determined the test form selection behavior. The convergence criterion for the EM step
TABLE 2.
The results of simulation study for the IRT model (aC, bA, bB, bC).

Parameter       Proposed model             Existing model
(true value)    Bias             RMS       Bias       RMS
aC1 = 1.3       1.02 × 10^−2     0.0986    −0.0529    0.111
aC2 = 0.7       2.97 × 10^−3     0.0500    −0.0386    0.0615
aC3 = 0.6       8.71 × 10^−3     0.0498    −0.0281    0.0546
aC4 = 0.8       1.41 × 10^−2     0.0629    −0.0321    0.0671
aC5 = 1.5       3.34 × 10^−2     0.120     −0.0172    0.101
aC6 = 0.9       7.85 × 10^−3     0.0594    −0.0446    0.0725
aC7 = 0.8       1.01 × 10^−2     0.0609    −0.0316    0.0659
aC8 = 0.6       3.48 × 10^−3     0.0514    −0.0346    0.0593
aC9 = 1.0       4.59 × 10^−3     0.0660    −0.0405    0.0727
aC10 = 1.1      1.11 × 10^−2     0.0787    −0.0464    0.0831
bA1 = −0.5      −8.21 × 10^−3    0.0785    −0.323     0.342
bA2 = −1.3      −1.89 × 10^−2    0.122     −0.381     0.408
bA3 = −1.0      −1.68 × 10^−2    0.0796    −0.343     0.360
bA4 = −0.4      −3.15 × 10^−3    0.0408    −0.272     0.283
bA5 = −0.6      −9.54 × 10^−3    0.0405    −0.289     0.302
bA6 = −0.7      −1.08 × 10^−2    0.0767    −0.323     0.342
bA7 = −0.3      −3.60 × 10^−3    0.0705    −0.302     0.317
bA8 = −0.2      −1.09 × 10^−2    0.0451    −0.272     0.283
bA9 = −0.5      −5.11 × 10^−3    0.0441    −0.276     0.285
bA10 = −1.0     −3.91 × 10^−3    0.0684    −0.321     0.339
bB1 = 0         −8.73 × 10^−3    0.0245    −0.223     0.231
bB2 = 0.1       −5.57 × 10^−3    0.0285    −0.220     0.227
bB3 = −0.2      −6.35 × 10^−3    0.0333    −0.247     0.255
bB4 = 0.3       −6.19 × 10^−3    0.0404    −0.232     0.242
bB5 = −0.7      −6.89 × 10^−3    0.0572    −0.295     0.306
bB6 = 0.6       −8.92 × 10^−3    0.0357    −0.203     0.212
bB7 = −0.2      −9.09 × 10^−3    0.0511    −0.275     0.287
bB8 = 1.0       −9.23 × 10^−3    0.0410    −0.148     0.175
bB9 = −0.6      −2.51 × 10^−3    0.0383    −0.275     0.284
bB10 = 0.5      −5.22 × 10^−3    0.0282    −0.197     0.206
bC1 = 0.5       −1.57 × 10^−2    0.0418    −0.182     0.192
bC2 = 0.7       6.47 × 10^−3     0.0581    −0.174     0.190
bC3 = 1.5       −1.81 × 10^−2    0.0723    −0.154     0.175
bC4 = 0.5       −6.53 × 10^−3    0.0619    −0.194     0.210
bC5 = 1.0       −1.04 × 10^−2    0.0346    −0.139     0.150
bC6 = 0.6       −1.40 × 10^−2    0.0475    −0.189     0.201
bC7 = 1.2       −1.39 × 10^−2    0.0529    −0.154     0.168
bC8 = 0.8       −1.27 × 10^−2    0.0634    −0.201     0.217
bC9 = 1.3       −9.13 × 10^−3    0.0478    −0.134     0.151
bC10 = 0.8      −6.42 × 10^−3    0.0458    −0.158     0.172
and for the Newton–Raphson algorithm in each M-step was 0.01. We obtained the parameter
estimates and the asymptotic covariance matrix using Louis’s (1982) method. In addition, we
used BILOG-MG and obtained the estimates under the MAR assumption. The results are listed
in Tables 3 and 4. The estimates of the difficulty parameters varied greatly between the proposed
14 PSYCHOMETRIKA
FIGURE 3.
Histograms of scores for each group in simulation.
and existing methods. This result is similar to the data-generating situation in the simulation
study.
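To make the two-parameter logistic form concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the plain logistic parameterization without a scaling constant (the exact parameterization is not shown in this section), and the function name `p_correct_2pl` is hypothetical. The item values are the estimates for item A1 reported in Tables 3 and 4, so the sketch illustrates how the difference in difficulty estimates between the two methods translates into different response probabilities.

```python
import math


def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under a two-parameter logistic model.

    Assumption: plain logistic form 1 / (1 + exp(-a * (theta - b))) with no
    scaling constant such as D = 1.7.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Discrimination (a) and difficulty (b) estimates for item A1,
# taken from Tables 3 and 4.
ITEM_A1 = {
    "proposed": {"a": 1.369, "b": -1.776},
    "existing (MAR)": {"a": 0.954, "b": -2.970},
}

for method, est in ITEM_A1.items():
    p = p_correct_2pl(theta=0.0, a=est["a"], b=est["b"])
    print(f"{method}: P(correct | theta = 0) = {p:.3f}")
```

Because μ_1 = 0 and σ_1^2 = 1 fix the ability scale for group 1, θ = 0 corresponds to an average examinee in that group.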
For model comparison, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) were calculated for the proposed and existing methods. The results are listed in Table 5. The proposed method had lower AIC and BIC values and is therefore judged more appropriate for the current data. Following (46), we also calculated the Wald statistic for the hypothesis concerning MAR, H0: π1 = 0. Under H0, the Wald statistic asymptotically follows a chi-square distribution with 1 degree of freedom. The resulting value was χ2(1) = 191.3, p < 0.001. This indicates that the MAR assumption cannot be upheld and that the missing data are nonignorable. As another statistical test of π1, the Z-value was calculated as 11.57 (p < 0.001), which is statistically significant. The AIC, BIC, Wald statistic, and Z-value all suggest that assignment to the test forms was not random. The examinees could determine which test form they would take by choosing the country in which they lived, and those living in Japan were expected to have acquired superior Japanese language skills. Therefore, in our real data analysis, we can assume that test form selection behavior exists and that assignment to a test form is not random.
TABLE 3.
Results of the real data analysis (ρ, φ, µ, aA, aB, aC).
Parameter  Proposed model: Estimate  SE  Existing model (MAR): Estimate  SE
π1  0.443  0.0383  *  *
μ2  −0.445  0.0320  −0.639  0.0327
σ2  0.684  0.0316  0.701  0.0222
aA1  1.369  0.238  0.954  0.189
aA2  1.786  0.266  1.388  0.152
aA3  1.521  0.393  1.234  0.280
aA4  1.654  0.382  1.456  0.269
aA5  1.515  0.388  1.165  0.235
aA6  1.298  0.137  0.905  0.081
aA7  0.956  0.141  0.640  0.090
aA8  0.337  0.074  0.255  0.041
aA9  1.712  0.191  1.243  0.111
aA10  1.032  0.099  0.734  0.057
aB1  1.093  0.088  0.931  0.057
aB2  0.540  0.052  0.408  0.033
aB3  0.661  0.051  0.536  0.035
aB4  0.508  0.058  0.383  0.037
aB5  0.932  0.064  0.741  0.039
aB6  1.162  0.107  1.037  0.070
aB7  0.804  0.070  0.649  0.043
aB8  0.861  0.072  0.737  0.047
aB9  1.079  0.077  0.891  0.047
aB10  0.480  0.044  0.363  0.028
aC1  0.724  0.132  0.834  0.101
aC2  0.804  0.359  1.085  0.195
aC3  1.026  0.280  1.487  0.199
aC4  1.114  0.280  1.655  0.217
aC5  0.885  0.150  1.057  0.114
aC6  1.041  0.195  1.346  0.164
aC7  0.448  0.088  0.521  0.065
aC8  1.347  0.159  1.522  0.111
aC9  0.605  0.098  0.623  0.067
aC10  0.439  0.084  0.480  0.060
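The Z-value reported for π1 can be checked against the estimate and standard error in Table 3. The sketch below is an illustration under stated assumptions, not the authors' code: it takes Z as the ratio of the estimate to its standard error and converts test statistics to p-values with the complementary error function from the Python standard library. The helper names are hypothetical. The Wald statistic itself follows the paper's equation (46), which is not reproduced in this section, so the sketch only converts the reported value of 191.3 into a p-value and does not attempt to recompute it.

```python
import math


def two_sided_normal_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def chi2_1df_p(x: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


PI1_HAT = 0.443   # estimate of pi_1 from Table 3 (proposed model)
PI1_SE = 0.0383   # standard error of pi_1 from Table 3

z = PI1_HAT / PI1_SE
print(f"Z = {z:.2f}, two-sided p = {two_sided_normal_p(z):.3g}")

WALD_REPORTED = 191.3  # chi-square(1) value reported in the text
print(f"p for reported Wald statistic = {chi2_1df_p(WALD_REPORTED):.3g}")
```

Both p-values fall far below 0.001, in line with the conclusion that the MAR assumption is not supported for these data.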
For reference, we calculated Monte Carlo estimates for the ability parameters of each examinee and created histograms using our proposed method (see Fig. 4). Figure 4 shows that the examinees living in countries other than Japan (group 2) had lower abilities than those living in Japan, which is consistent with the expected results.
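One way to picture a Monte Carlo ability estimate is the following sketch. It is an assumption-laden illustration and not the authors' procedure: it draws abilities from a normal prior and weights each draw by the two-parameter logistic likelihood of the observed responses (self-normalized importance sampling), and it ignores the test form selection mechanism that the proposed model includes. The function names, the number of draws, and the seed are hypothetical; the item values are the proposed-model estimates for items B1 to B3 from Tables 3 and 4.

```python
import math
import random


def p_correct_2pl(theta, a, b):
    """2PL response probability (plain logistic form, assumed)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def mc_ability_estimate(responses, items, mu=0.0, sigma=1.0,
                        n_draws=5000, seed=0):
    """Monte Carlo posterior mean of theta for one examinee.

    responses: list of 0/1 item scores; items: list of (a, b) pairs.
    Draws theta from the N(mu, sigma^2) prior and weights each draw by the
    likelihood of the observed responses.
    """
    rng = random.Random(seed)
    weighted_sum = 0.0
    weight_total = 0.0
    for _ in range(n_draws):
        theta = rng.gauss(mu, sigma)
        likelihood = 1.0
        for u, (a, b) in zip(responses, items):
            p = p_correct_2pl(theta, a, b)
            likelihood *= p if u == 1 else 1.0 - p
        weighted_sum += likelihood * theta
        weight_total += likelihood
    return weighted_sum / weight_total


# (a, b) for common items B1-B3, proposed-model estimates (Tables 3 and 4).
ITEMS_B = [(1.093, -0.965), (0.540, -0.855), (0.661, 1.493)]

high = mc_ability_estimate([1, 1, 1], ITEMS_B)
low = mc_ability_estimate([0, 0, 0], ITEMS_B)
print(f"all correct: {high:.3f}, all incorrect: {low:.3f}")
```

Collecting such estimates for every examinee in a group and binning them would give histograms of the kind shown in Fig. 4.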
TABLE 4.
Results of the real data analysis (bA, bB, bC).
Parameter  Proposed model: Estimate  SE  Existing model (MAR): Estimate  SE
bA1 −1.776 0.267 −2.970 0.473
bA2 −1.054 0.105 −1.672 0.119
bA3 −1.827 0.317 −2.815 0.442
bA4 −1.603 0.223 −2.354 0.257
bA5 −1.744 0.316 −2.729 0.393
bA6 −0.783 0.089 −1.407 0.119
bA7 −1.547 0.237 −2.697 0.362
bA8 −1.776 0.486 −2.922 0.542
bA9 −0.792 0.076 −1.345 0.093
bA10 −0.279 0.066 −0.723 0.085
bB1 −0.965 0.067 −1.499 0.064
bB2 −0.855 0.101 −1.551 0.126
bB3 1.493 0.094 1.475 0.094
bB4 −1.550 0.182 −2.490 0.223
bB5 −0.249 0.042 −0.679 0.046
bB6 −1.230 0.083 −1.767 0.075
bB7 −0.974 0.083 −1.577 0.089
bB8 −0.989 0.080 −1.542 0.079
bB9 −0.526 0.048 −0.988 0.046
bB10 0.658 0.072 0.412 0.080
bC1 −1.916 0.260 −2.257 0.163
bC2 −2.981 0.961 −2.934 0.304
bC3 −2.265 0.366 −2.335 0.137
bC4 −2.168 0.307 −2.261 0.118
bC5 −1.709 0.189 −2.044 0.115
bC6 −1.984 0.233 −2.241 0.126
bC7 −1.119 0.183 −1.540 0.127
bC8 −0.657 0.053 −1.098 0.038
bC9 0.640 0.153 0.120 0.098
bC10 −0.212 0.101 −0.766 0.088
TABLE 5.
Values of AIC and BIC for the real data analysis.
Criterion  Proposed method  Existing method
AIC  2.878 × 10^4  3.132 × 10^4
BIC  2.914 × 10^4  3.169 × 10^4
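As a brief illustration of how the values in Table 5 are read, the sketch below states the standard definitions of AIC and BIC and compares the reported values. The log-likelihoods and parameter counts behind Table 5 are not given in this section, so the two definition functions are shown only for reference and the comparison uses the reported criteria directly. The function and variable names are hypothetical.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: -2 log L + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: -2 log L + k log n."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


# Values reported in Table 5.
REPORTED = {
    "proposed": {"AIC": 2.878e4, "BIC": 2.914e4},
    "existing": {"AIC": 3.132e4, "BIC": 3.169e4},
}


def preferred(criterion: str) -> str:
    """Return the method with the smaller value of the given criterion."""
    return min(REPORTED, key=lambda method: REPORTED[method][criterion])


for criterion in ("AIC", "BIC"):
    gap = REPORTED["existing"][criterion] - REPORTED["proposed"][criterion]
    print(f"{criterion}: preferred = {preferred(criterion)}, gap = {gap:.0f}")
```

Smaller values indicate the preferred model under both criteria, which is why the proposed method is favored here.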
6. Conclusion
In this paper, we proposed a new method of item parameter linking in IRT. Through the simulation study, we showed that ignoring test form selection behavior results in considerable bias in the estimates of the item parameters when the assignment to the test forms is not random. Furthermore, we showed that this bias can be reduced using the models presented above. However, although the results of the simulation study appear sufficient to demonstrate the accuracy of the proposed estimation method, models and methods for nonignorable missing data are notoriously sensitive to misspecification. In some cases of misspecification, the bias of these non-MAR models can be even more serious than the bias incurred by assuming MAR. Because the true missing data mechanism is not known in practice, the results of simulation studies can seldom be widely generalized to real data. The robustness of the proposed method to its model assumptions is left for future empirical studies.
FIGURE 4.
Histograms of scores for each group in real data.
The proposed model includes, as a special case, the model in which test form selection behavior is determined at random without dependence on the scores on all the tests. We can therefore apply the proposed model first and then use a test statistic such as the Wald statistic to test whether missing test scores affect test form selection behavior. In this regard, the model is advantageous.
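The nesting described above can be pictured with a small sketch. It is a hypothetical simplification and not the model as specified in the paper: with two test forms, selection is written as a binary logistic function of the sum score, with an intercept `pi0` and a slope `pi1`. The functional form, the intercept, and all numeric values below are assumptions made only for illustration. Setting `pi1 = 0` removes the dependence on the score, which corresponds to random assignment.

```python
import math


def prob_select_form1(sum_score: float, pi0: float, pi1: float) -> float:
    """Hypothetical selection model: P(take form 1 | sum score)."""
    return 1.0 / (1.0 + math.exp(-(pi0 + pi1 * sum_score)))


scores = [0, 10, 20, 30]

# pi1 = 0: selection probability is the same for every score (random assignment).
random_assignment = [prob_select_form1(s, pi0=0.3, pi1=0.0) for s in scores]

# pi1 > 0: selection probability increases with the sum score.
score_dependent = [prob_select_form1(s, pi0=-3.0, pi1=0.2) for s in scores]

print("pi1 = 0.0:", [round(p, 3) for p in random_assignment])
print("pi1 = 0.2:", [round(p, 3) for p in score_dependent])
```

Testing H0: π1 = 0 therefore amounts to asking whether the simpler, score-independent special case is adequate.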
A variety of methods for dealing with ignorable and nonignorable missing data in practical situations have been proposed (Schafer, 1997). Models with nonignorable missing-data mechanisms in IRT were also proposed by Holman and Glas (2005). However, because they were interested in modeling the nonignorable missing data mechanism with the item response model, their model does not consider item parameter linking in the common-item nonequivalent groups design. While their model considers the missing mechanisms per item, our model deals with the missing mechanism per test form. Moreover, their model is based on the idea of pattern mixture models, in which the test form selection indicators are set as the explanatory variables and the item responses are set as the dependent variables. In contrast, our model can be regarded as a kind of selection model, in which the roles of the explanatory and dependent variables are reversed relative to pattern mixture models.
While the proposed model is a fully parametric model, one can conduct an analysis using a semiparametric model with propensity scores (Hoshino, Kurata, & Shigemasu, 2006; Hoshino, 2007, 2008) under the MAR assumption. We plan to conduct simulations to determine which of the two approaches to use: the proposed method, which requires parametric model assumptions but not the MAR assumption, or the above methods, which allow semiparametric analysis but require the MAR assumption.
Models for tests whose items differ in level of measurement (such as dichotomous and polytomous data) are also topics that can be addressed in future research.
Acknowledgements
The authors are grateful to Ms. Naoko Hojo of the Japan External Trade Organization (JETRO) for helping us access the JLRT data. This study was partially supported by the Ministry of Education, Science, Sports, and Culture, Grant-in-Aid for Scientific Research, 19-8879, Scientific Research (B), 193-30145 and the Inamori Foundation grant (to Takahiro Hoshino). Finally, we would like to express our sincere thanks to the associate editor and two reviewers for their valuable advice and comments.
References
Baker, F.B., & Al-Karni, A. (1991). A comparison of two procedures for computing IRT equating coefficients. Journal of Educational Measurement, 28, 147–162.
Bernaards, C.A., & Sijtsma, K. (1999). Factor analysis of multidimensional polytomous item response data suffering from ignorable item nonresponse. Multivariate Behavioral Research, 34, 277–313.
Bock, R.D., & Zimowski, M.F. (1997). Multiple group IRT. In W.M. van der Linden & R.K. Hambleton (Eds.), Handbook of modern item response theory (pp. 433–448). Berlin: Springer.
Dempster, A.P., Laird, N.M., & Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1–38.
Haebara, T. (1980). Equating logistic ability scales by a weighted least squares method. Japanese Psychological Research, 22, 144–149.
Hanson, B.A., & Béguin, A.A. (2002). Obtaining a common scale for item response theory item parameters using separate versus concurrent calibration in the common-item equating design. Applied Psychological Measurement, 26, 3–24.
Holman, R., & Glas, C.A.W. (2005). Modelling non-ignorable missing-data mechanisms with item response theory models. British Journal of Mathematical and Statistical Psychology, 58, 1–17.
Hoshino, T. (2007). Doubly robust type estimation for covariate adjustment in latent variable modeling. Psychometrika, 72, 535–549.
Hoshino, T. (2008). A Bayesian propensity score adjustment for latent variable modeling and MCMC algorithm. Computational Statistics & Data Analysis, 52, 1413–1429.
Hoshino, T., Kurata, H., & Shigemasu, K. (2006). A propensity score adjustment for multiple group structural equation modeling. Psychometrika, 71, 691–712.
Ibrahim, J.G., Chen, M.H., & Lipsitz, S.R. (2001). Missing responses in generalised linear mixed models when the missing data mechanism is nonignorable. Biometrika, 88, 551–564.
Kato, K. Japan External Trade Organization (JETRO) (2006). BJT business Japanese proficiency test official guide. Japan External Trade Organization (JETRO), Tokyo, Japan.
Kim, S.H., & Cohen, A.S. (1992). Effects of linking methods on detection of DIF. Journal of Educational Measurement, 29, 51–66.
Kolen, M.J., & Brennan, R.L. (2004). Test equating, scaling, and linking: methods and practices (2nd ed.). New York: Springer.
Little, R.J.A., & Rubin, D.B. (2002). Statistical analysis with missing data (2nd ed.). New York: Wiley.
Lord, F.M. (1974). Estimation of latent ability and item parameters when there are omitted responses. Psychometrika, 39, 247–264.
Louis, T.A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society, Series B, 44, 226–233.
Schafer, J.L. (1997). Analysis of incomplete multivariate data. New York: Chapman & Hall.
Stocking, M.L., & Lord, F.M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201–210.
van der Linden, W.J., & Luecht, R.M. (1998). Observed-score equating as a test assembly problem. Psychometrika, 63, 401–418.
von Davier, M., & von Davier, A.A. (2004). A unified approach to IRT scale linkage and scale transformations (Research Report RR-04-09). ETS: Princeton, NJ.
Wei, G.C.G., & Tanner, M.A. (1990). A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithm. Journal of the American Statistical Association, 85, 699–704.
Wingersky, M.S., & Lord, F.M. (1984). An investigation of methods for reducing sampling error in certain IRT procedures. Applied Psychological Measurement, 8, 347–364.
Yang, W.L. (2004). Sensitivity of linkings between AP multiple-choice scores and composite scores to geographical region: An illustration of checking for population invariance. Journal of Educational Measurement, 41, 33–41.
Manuscript Received: 31 MAR 2007
Final Version Received: 14 JUL 2008
Published Online Date: 9 SEP 2008

CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfakmcokerachita
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 

Recently uploaded (20)

KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.Compdf
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdf
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 

A New Concurrent Calibration Method For Nonequivalent Group Design Under Nonrandom Assignment

  • 1. PSYCHOMETRIKA, VOL. 74, NO. 1, 1–19, MARCH 2009, DOI: 10.1007/S11336-008-9076-X

A NEW CONCURRENT CALIBRATION METHOD FOR NONEQUIVALENT GROUP DESIGN UNDER NONRANDOM ASSIGNMENT

KEI MIYAZAKI, DEPARTMENT OF COGNITIVE AND BEHAVIORAL SCIENCE, THE UNIVERSITY OF TOKYO
TAKAHIRO HOSHINO, GRADUATE SCHOOL OF ECONOMICS, NAGOYA UNIVERSITY
SHIN-ICHI MAYEKAWA, GRADUATE SCHOOL OF DECISION SCIENCE AND TECHNOLOGY, TOKYO INSTITUTE OF TECHNOLOGY
KAZUO SHIGEMASU, DEPARTMENT OF COGNITIVE AND BEHAVIORAL SCIENCE, THE UNIVERSITY OF TOKYO

This study proposes a new item parameter linking method for the common-item nonequivalent groups design in item response theory (IRT). Previous studies assumed that examinees are randomly assigned to either test form. However, examinees can frequently select their own test forms and tests often differ according to examinees’ abilities. In such cases, concurrent calibration or multiple group IRT modeling without modeling test form selection behavior can yield severely biased results. We proposed a model wherein test form selection behavior depends on test scores and used a Monte Carlo expectation maximization (MCEM) algorithm. This method provided adequate estimates of testing parameters.

Key words: common-item design, concurrent calibration, IRT linking, item response theory, Monte Carlo expectation maximization (MCEM) algorithm, multinomial logistic regression model, nonignorable missingness.

1. Introduction

Item response theory (IRT) methods are used in many testing applications. Advantageous properties of IRT arise from the fact that the theory explicitly models examinee responses at the item level, whereas, for example, the focus of classical test models is on responses at the level of test scores.

In this study, we examined a common-item nonequivalent groups design from among various kinds of test designs for the purpose of item parameter linking and to consider the situations under which item parameter linking is performed between two test forms.
In this study design, the two test forms had some items in common but also included items that some examinees did not answer. Figure 1 illustrates this design. The groups of examinees, which differed with regard to testing times, are shown as rows in the figure, and the tests are indicated by columns. In the common-item design, the scores on the tests the examinees did not select are not observed. The shaded areas indicate the missing data, and $u$ denotes the item response vectors. The letter and number subscripts indicate the tests and the groups of examinees, respectively.

Requests for reprints should be sent to Takahiro Hoshino, Graduate School of Economics, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan. E-mail: bayesian@jasmine.ocn.ne.jp

© 2008 The Psychometric Society
  • 2. FIGURE 1. Concurrent calibration in a common item design.

Two groups of examinees from different populations were each assigned to different test forms. The test design in which the groups of examinees are not equivalent is called the nonequivalent groups design.

Two main IRT linking methods are used in a common-item nonequivalent groups design: separate estimation and concurrent calibration (Wingersky & Lord, 1984). In separate estimation, the two sets of item parameter estimates for the common items are used to estimate a scale transformation that will put the item parameter estimates of one form on the scale of the item parameter estimates for the other form (Haebara, 1980; Stocking & Lord, 1983; Kolen & Brennan, 2004). In concurrent calibration, item parameters for all the items on both forms are estimated simultaneously in one run of the estimation software. Estimating parameters for all items simultaneously ensures that all parameter estimates are on the same scale. Bock and Zimowski (1997) suggested a multiple group IRT that can deal with multiple examinee groups differing in ability in the concurrent calibration. Von Davier and von Davier (2004) presented all the linking methods in a new framework (in which the calibration is performed in one step) in terms of restrictions on the likelihood function.

Numerous studies have researched the accuracy of the estimation by using each method (Baker & Al-Karni, 1991; Kim & Cohen, 1992). Hanson and Béguin (2002) found that concurrent calibration procedures produced more accurate results than did separate estimation. Consequently, to date, concurrent calibration is thought to be the most appropriate estimation method for the common-item nonequivalent groups design.
When concurrent calibration is applied in the common-item nonequivalent groups design, the scores on the tests that some examinees did not take are regarded as missing data, whereas the preceding methods related to the nonequivalent groups design ignored these data. In other words, these methods implicitly assumed the missing data to be “missing at random” (MAR; see Little & Rubin, 2002). The missing data mechanism is ignorable if (a) missingness is MAR and (b) the parameters of the grouping variable and the parameters of the item response variables are distinct (Little & Rubin, 2002). When these two conditions are satisfied, the likelihood function is separated into the likelihood of the grouping variable and that of the item response variables. If missingness is not MAR, or the likelihood function is not separated in this way, the missing data mechanism is nonignorable.

Meanwhile, in many cases, examinees can select their test forms; consequently, they are not randomly assigned to a test form. We explain this by using the example of real data analysis. In Sect. 5, we analyze a JLRT (Japanese Listening and Reading Comprehension Test) data set. The JLRT, which is a part of the Business Japanese Proficiency Test (BJT; Kato & JETRO, 2006), is a 100-item multiple-choice test that measures the ability to communicate with other persons in the Japanese language. The target population of the BJT is people whose first language is
  • 3. not Japanese; the test has been administered in 32 cities across 13 countries such as Japan, the United States, Canada, EU countries, Brazil, China, and other Asian countries. The version of the JLRT that we analyzed has two equivalent test forms, and the one that the examinees are given depends on the country in which they take the test; that is, we can determine the country in which the examinees took the test by asking them which form they took. Because we can assume that the examinees living in Japan are probably highly motivated to study Japanese and have higher abilities than the examinees in other countries, just by asking an examinee which form he/she took, we can form an expectation of whether he/she has higher or lower ability. This expectation implies that missing data carry some information about examinees’ achievement levels. Thus, missing data are regarded as being nonignorable.

FIGURE 2. The situation of the test design considered in this study.

Another situation in which our model is advantageous arises when examinees can select test forms by themselves and are consequently not randomly assigned to a test form. In an AP (Advanced Placement) exam, in one measure at one administration, one student might choose between two essays that are supposed to measure the same construct while the common items are fixed (for more details about linking in AP exams, refer to Yang, 2004). In such cases, since examinees are not randomly assigned to one of the two test forms, a mere application of existing item parameter linking methods can yield biased results (see Sect. 4, simulation study).
In addition, when the test form selection behavior depends on the examinees’ abilities, the existing item parameter linking methods, including multiple group IRT, can yield biased estimates because the likelihood function is not separated into the likelihood of grouping variable and that of item response variables (a detailed explanation with mathematical expressions is provided in Model Assumptions, Sect. 2). To solve this problem, we constructed a model in which test form selection behavior is dependent on the scores of the tests (Fig. 2). In this model, missing test scores are regarded as nonignorable (Little & Rubin, 2002). Furthermore, we proposed an estimation method for the parameters, using the MCEM algorithm (Wei & Tanner, 1990). Consequently, we proposed a new concurrent calibration method for a common-item nonequivalent groups design.

In Sect. 2, we present the model assumptions and describe the form of likelihood function of our model. We provide maximum likelihood estimations using the MCEM algorithm and address certain topics related to parameter estimation, such as the calculation of the asymptotic variance covariance matrix from the EM algorithm in Sect. 3. In Sect. 4, we present a simulation study to show that the traditional method provides severely biased estimates and verify that the proposed model can yield adequate estimates. In Sect. 5, we apply the proposed method to JLRT data and describe the meaningful results. Finally, in Sect. 6, we provide concluding remarks.
  • 4.

2. Model Assumptions

We considered the situations in which each of the two test forms were used separately. Item parameter linking was performed between these two test forms (Fig. 1). For the examinees in group 1, Tests A and B were administered, and for the examinees in group 2, Tests B and C were administered. Test B was common between the two test groups. The examinees had to choose one of the two groups. Let $K_A$, $K_B$, and $K_C$ be the number of items in Tests A, B, and C, respectively. Let $r_i$ be a test form selection indicator ($r_i = j$, $j = 1, 2$, implies that the $i$th examinee selected the $j$th test form). The item response vector $u_{ij}$ is observed when the $i$th examinee selected the $j$th test form, and $u_{ij'}$ ($j' \neq j$) is missing. Let $u_i = ((u_i^{\mathrm{obs}})', (u_i^{\mathrm{mis}})')'$, where $u_i^{\mathrm{obs}}$ represents the observed components of $u_i$, while $u_i^{\mathrm{mis}}$ represents the missing entries. Hence, the missing patterns can be expressed as follows:

$$\left((u_i^{\mathrm{obs}})', (u_i^{\mathrm{mis}})'\right) = \begin{cases} (u_{i1}', u_{i2}') & (r_i = 1), \\ (u_{i2}', u_{i1}') & (r_i = 2), \end{cases} \tag{1}$$

and $u_{i1} = (u_{iA}', u_{iB1}')'$, $u_{i2} = (u_{iB2}', u_{iC}')'$. For example, $u_{iA} = (u_{iA1}, \ldots, u_{iAK_A})'$. $u_{iB1}$ represents the item response vector when the $i$th examinee takes Test B at the first point in time or at the first place. $u_{iB2}$ represents the item response vector when the $i$th examinee takes Test B at the second point in time or at the second place. The constraint that $u_{iB1} = u_{iB2}$ should be considered. However, as described in the Introduction, here we assume that $u_{iB1}$ is not equal to $u_{iB2}$ because of the difference in examinees’ abilities with regard to each point in time or each place.

We let $\theta_{ij}$ be a random latent variable that represents the ability of the $j$th group. $\theta_{ij}$ is distributed as $N(\mu_j, \sigma_j^2)$. (In this paper, we do not assume the multidimensionality of abilities. For the problem of multidimensionality of abilities, see van der Linden & Luecht, 1998.)
Under the three-parameter logistic model, the probability that the $i$th examinee of ability $\theta_{ij}$ correctly answered item $k$ of test $X$ ($X = A, B, C$) is defined as

$$p(u_{iXk_X} \mid \theta_{ij}, \psi_{jk}) = c_{Xk_X} + (1 - c_{Xk_X}) \frac{1}{1 + \exp\{-1.7 a_{Xk_X}(\theta_{ij} - b_{Xk_X})\}}, \tag{2}$$

where $k_X = 1, \ldots, K_X$ and $\psi_{jk}$ is the vector that contains all the item parameters of item $k$ of the $j$th group. The probability that the $i$th item response vector $u_{ij}$ is obtained is expressed as follows:

$$p(u_{i1} \mid \theta_{i1}, \psi_1) = \prod_{k_A=1}^{K_A} \left[ c_{Ak_A} + (1 - c_{Ak_A}) \frac{1}{1 + \exp\{-1.7 a_{Ak_A}(\theta_{i1} - b_{Ak_A})\}} \right] \times \prod_{k_B=1}^{K_B} \left[ c_{Bk_B} + (1 - c_{Bk_B}) \frac{1}{1 + \exp\{-1.7 a_{Bk_B}(\theta_{i1} - b_{Bk_B})\}} \right], \tag{3}$$

$$p(u_{i2} \mid \theta_{i2}, \psi_2) = \prod_{k_B=1}^{K_B} \left[ c_{Bk_B} + (1 - c_{Bk_B}) \frac{1}{1 + \exp\{-1.7 a_{Bk_B}(\theta_{i2} - b_{Bk_B})\}} \right] \times \prod_{k_C=1}^{K_C} \left[ c_{Ck_C} + (1 - c_{Ck_C}) \frac{1}{1 + \exp\{-1.7 a_{Ck_C}(\theta_{i2} - b_{Ck_C})\}} \right]. \tag{4}$$

In the preceding models, the assignment mechanism is explained by the observed portion of the complete item responses (Lord, 1974; Bernaards & Sijtsma, 1999). However, if the indicator
  • 5. variable of test form selection behavior depends on both observed and missing portions, the assignment mechanism is not random, thereby leading to the conclusion that the existing methods yield biased estimates for ability and item parameters (see the simulation study in Sect. 4). To solve this problem, we modeled the relation between grouping variables and item response variables of all the tests containing observed and missing item response variables, using the logistic regression model. Our model also seeks to estimate the differences in ability parameters between the two points in time.

In previous models, because the likelihood function was separated into the likelihood of grouping variable and item response variables in item response models, it turns out that the existing IRT model provides consistent estimates for parameters. In our model, however, grouping variables depend on the item response variables of all the test forms and, therefore, the likelihood function cannot be separated. Therefore, the traditional method cannot be applied.

In this paper, the test form selection mechanism is modeled using the following nominal logistic regression model. To express the model in a more general manner, we let the explanatory variables include item response variables $u_i = (u_{i1}', u_{i2}')'$ and ability variables $\theta_i = (\theta_{i1}, \theta_{i2})$. The equation is as follows:

$$p(r_i = j \mid u_i, \theta_i, \rho) = \frac{\exp(\rho_{uj}' u_i + \rho_{fj}' \theta_i)}{1 + \exp(\rho_{uj}' u_i + \rho_{fj}' \theta_i)} = \frac{\exp(\rho_j' z_i)}{1 + \exp(\rho_j' z_i)}, \tag{5}$$

where $z_i = (u_i', \theta_i')'$ and $\rho_j = (\rho_{uj}', \rho_{fj}')'$. $\rho_j$ are coefficients multiplied by item response vector $u_i$ and latent ability variable $\theta_i$. The higher the values of the elements of $\rho_j$, the higher is the probability that the examinees are assigned to the $j$th group, and

$$\rho_{uj} = (\rho_{Aj}', \rho_{B1j}', \rho_{B2j}', \rho_{Cj}')', \tag{6}$$
$$\rho_{Aj} = (\rho_{A1j}, \ldots, \rho_{AK_Aj})', \tag{7}$$
$$\rho_{B1j} = (\rho_{B11j}, \ldots, \rho_{B1K_Bj})', \tag{8}$$
$$\rho_{B2j} = (\rho_{B21j}, \ldots, \rho_{B2K_Bj})', \tag{9}$$
$$\rho_{Cj} = (\rho_{C1j}, \ldots, \rho_{CK_Cj})', \tag{10}$$
$$\rho_{\theta j} = (\rho_{\theta 1j}, \rho_{\theta 2j})'. \tag{11}$$

Furthermore, to ensure notational simplicity, let $\phi_j$ be the vector that contains $\mu_j$ and $\sigma_j^2$.

In this paper, we provide the maximum likelihood estimation of the parameters. Under the assumption that $\theta_1$ and $\theta_2$ are independent, $\theta_1 \perp\!\!\!\perp \theta_2$, the complete-data log-likelihood function of the sample of the $u_i, r_i$ observations is a function of $\psi$, $\phi$, and $\rho$, and is written as follows:

$$L(\psi, \phi, \rho \mid r, u, \theta) = \sum_{i=1}^{N} \log p(r_i = j, u_i, \theta_i \mid \psi, \phi, \rho) = \sum_{i=1}^{N} \left[ \log p(r_i = j \mid u_i, \theta_i, \rho) + \sum_{j=1}^{2} \log p(u_{ij} \mid \theta_{ij}, \psi_j) + \sum_{j=1}^{2} \log p(\theta_{ij} \mid \phi_j) \right] = L_R + L_U + L_\Theta, \tag{12}$$
  • 6. where

$$L_R = \sum_{i=1}^{N} \log p(r_i = j \mid u_i, \theta_i, \rho), \tag{13}$$
$$L_U = \sum_{i=1}^{N} \sum_{j=1}^{2} \log p(u_{ij} \mid \theta_{ij}, \psi_j), \tag{14}$$
$$L_\Theta = \sum_{i=1}^{N} \sum_{j=1}^{2} \log p(\theta_{ij} \mid \phi_j). \tag{15}$$

In practice, neither $u^{\mathrm{mis}}$ nor $\theta$ can be observed. Let $L_{\mathrm{obs}}(\psi, \rho \mid r, u^{\mathrm{obs}})$ be the observed log-likelihood value, which is as follows:

$$L_{\mathrm{obs}}(\psi, \rho \mid r, u^{\mathrm{obs}}) = \sum_{i=1}^{N} \log p(r_i = j \mid u_i^{\mathrm{obs}}, \rho) + \sum_{i=1}^{N} \log p(u_i^{\mathrm{obs}} \mid \psi). \tag{16}$$

The functional form is so complex that we cannot maximize it directly. To solve this problem, we used the EM algorithm (Dempster, Laird, & Rubin, 1977), which is useful for analyzing data containing missing values.

The Relationship between Our Method and Existing Linking Methods for the Nonequivalent Groups Design

Nonignorable missing data models are classified broadly into two categories: pattern mixture models and selection models (Little & Rubin, 2002). The multiple group IRT model is categorized as a kind of pattern mixture model. Pattern mixture models are expressed as a joint distribution where the observed variables depend on missing indicators. Thus, the likelihood function of the multiple group IRT model is expressed as follows:

$$l_{\mathrm{obs}}(\psi, \phi, \rho \mid r, u, \theta) = \prod_{i=1}^{N} \iint p(r_i = j, u_i^{\mathrm{obs}}, u_i^{\mathrm{mis}}, \theta_i \mid \psi, \phi, \rho)\, du_i^{\mathrm{mis}}\, d\theta_i = \prod_{i=1}^{N} \iint p(u_i^{\mathrm{obs}}, u_i^{\mathrm{mis}}, \theta_i \mid r_i = j, \psi, \phi) \times p(r_i = j \mid \rho)\, du_i^{\mathrm{mis}}\, d\theta_i. \tag{17}$$

As described in Sect. 3, we can test whether the missing data $u_i^{\mathrm{mis}}$ have an effect on test form selection behavior using the Wald statistic. In this way, the proposed method is more practical and useful than the existing methods for the nonequivalent groups design.

We also considered the assumption that an examinee’s ability can influence test form assignment. The observed likelihood of this assumption can be expressed as follows:

$$l_{\mathrm{obs}}(\psi, \phi, \rho \mid r, u^{\mathrm{obs}}) = \prod_{i=1}^{N} \iint p(r_i = j \mid \theta_i, \rho) \times p(u_i^{\mathrm{obs}}, u_i^{\mathrm{mis}}, \theta_i \mid \psi, \phi)\, du_i^{\mathrm{mis}}\, d\theta_i. \tag{18}$$

Equation (18) indicates that the likelihood function is not separated into the likelihood of grouping variable and that of item response variables. The likelihood function under the MAR assumption, that is, the likelihood function of the existing multiple group IRT, is given by (17) and is
  • 7. separated into the likelihood of grouping variable and item response variables; this form is obviously different from (18). The data are not MAR and, therefore, existing IRT linking methods can yield biased ML estimates. Consequently, as long as we assume that test form selection behavior depends on examinees’ abilities, all the existing methods related to concurrent calibration inevitably yield biased estimates. The proposed method can also adjust for these biases.

3. The Estimation Method

Obtaining the ML estimates by directly maximizing $p(r, u^{\mathrm{obs}} \mid \psi, \rho)$ is very difficult because the likelihood function $p(r, u^{\mathrm{obs}} \mid \psi, \rho)$ is very complicated due to the presence of missing data and latent variables. Thus, instead of working with $p(r, u^{\mathrm{obs}} \mid \psi, \rho)$ directly, we augment $u^{\mathrm{obs}}$ with $(u^{\mathrm{mis}}, f)$ using the EM algorithm in the ML estimation. Consequently, the ML estimation based on the complete data set is made easier when the following EM algorithm is used:

[E-step]: Evaluate the expected value of the complete-data log-likelihood with respect to $u^{\mathrm{mis}}$ and $\theta$,

$$Q(\psi, \phi, \rho \mid \psi^{(t)}, \phi^{(t)}, \rho^{(t)}) = \sum_{i=1}^{N} \iint \left[ \log p(r_i \mid u_i, \theta_i, \rho) + \sum_{j=1}^{2} \log p(u_{ij}, \theta_{ij} \mid \psi_j, \phi_j) \right] \times p(u_i^{\mathrm{mis}}, \theta_i \mid u_i^{\mathrm{obs}}, r_i, \psi^{(t)}, \phi^{(t)}, \rho^{(t)})\, du_i^{\mathrm{mis}}\, d\theta_i \tag{19}$$

at the $t$th iteration with a current value $\psi^{(t)}, \phi^{(t)}$.

Since the E-step cannot be calculated analytically, we use the MCEM algorithm (Wei & Tanner, 1990), where this E-step is approximated by the Monte Carlo estimate of the expectation using a sufficiently large number of observations simulated from the conditional distribution $p(u^{\mathrm{mis}}, \theta \mid u^{\mathrm{obs}}, \psi^{(t)}, \phi^{(t)})$. This is accomplished by using the Metropolis–Hastings algorithm, which enables us to draw samples from $p(u_i^{\mathrm{mis}} \mid u_i^{\mathrm{obs}}, r_i, \theta_i, \psi^{(t)}, \rho^{(t)})$ and $p(\theta_i \mid u_i^{\mathrm{obs}}, u_i^{\mathrm{mis}}, r_i, \phi^{(t)})$. (See Ibrahim, Chen, & Lipsitz, 2001 for the ML estimation in generalized linear models when the missing data mechanism is nonignorable.)

The following Metropolis–Hastings algorithms are used to sample $u^{\mathrm{mis}}$ and $\theta$.
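Let $u^{\mathrm{mis}}_{(m)}$ and $\theta_{(m)}$ be the current values at the $m$th iteration and $u^{\mathrm{mis}}_{i(m)}$ and $\theta_{i(m)}$ be the values of the $i$th observation of $u^{\mathrm{mis}}_{(m)}$ and $\theta_{(m)}$.

(i) Generate $u^{\mathrm{mis}}_{i(m+1)}$ from $p(u_i^{\mathrm{mis}} \mid u_i^{\mathrm{obs}}, r_i, \theta_{i(m)}, \psi^{(t)}, \rho^{(t)})$ (Metropolis–Hastings algorithm)

The target and the proposal distribution are as follows:

$$\text{target distribution: } p(u_i^{\mathrm{mis}} \mid u_i^{\mathrm{obs}}, r_i, \theta_{i(m)}, \psi^{(t)}, \rho^{(t)}) \tag{20}$$
$$\text{proposal distribution: } p(u_i^{\mathrm{mis}} \mid u_i^{\mathrm{obs}}, \theta_{i(m)}, \psi^{(t)}) \tag{21}$$

(i-1) Draw $u_i^{*} \sim p(u_i^{\mathrm{mis}} \mid u_i^{\mathrm{obs}}, \theta_{i(m)}, \psi^{(t)})$

(i-2) Accept $u^{\mathrm{mis}}_{i(m+1)} = u_i^{*}$ with probability

$$\alpha(u_i^{*} \mid u^{\mathrm{mis}}_{i(m)}, u_i^{\mathrm{obs}}, \theta_{i(m)}, r_i, \rho^{(t)}) = \min\left\{ \frac{p(r_i \mid u_i^{\mathrm{obs}}, u_i^{*}, \theta_{i(m)}, \rho^{(t)})}{p(r_i \mid u_i^{\mathrm{obs}}, u^{\mathrm{mis}}_{i(m)}, \theta_{i(m)}, \rho^{(t)})}, 1 \right\} \tag{22}$$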
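  • 8. (ii) Generate $\theta_{i(m+1)}$ from $p(\theta_i \mid u_i^{\mathrm{obs}}, u^{\mathrm{mis}}_{i(m+1)}, r_i, \phi^{(t)}, \rho^{(t)})$

Further, here, we use the following Metropolis–Hastings algorithm. The target and the proposal distribution are as follows:

$$\text{target distribution: } p(\theta_i \mid r_i, u_i^{\mathrm{obs}}, u^{\mathrm{mis}}_{i(m+1)}, \rho^{(t)}, \phi^{(t)}) \tag{23}$$
$$\text{proposal distribution: } p(\theta_i \mid \phi^{(t)}) \tag{24}$$

(ii-1) Draw $\theta_i^{*} \sim p(\theta_i \mid \phi^{(t)})$

(ii-2) Accept $\theta_{i(m+1)} = \theta_i^{*}$ with probability

$$\alpha(\theta_i^{*} \mid \theta_{i(m)}, u_i^{\mathrm{obs}}, u^{\mathrm{mis}}_{i(m+1)}, \psi^{(t)}) = \min\left\{ \frac{p(r_i, u_i^{\mathrm{obs}}, u^{\mathrm{mis}}_{i(m+1)} \mid \theta_i^{*}, \rho^{(t)}, \psi^{(t)})}{p(r_i, u_i^{\mathrm{obs}}, u^{\mathrm{mis}}_{i(m+1)} \mid \theta_{i(m)}, \rho^{(t)}, \psi^{(t)})}, 1 \right\} = \min\left\{ \frac{p(r_i \mid u_i^{\mathrm{obs}}, u^{\mathrm{mis}}_{i(m+1)}, \theta_i^{*}, \rho^{(t)})\, p(u_i^{\mathrm{obs}}, u^{\mathrm{mis}}_{i(m+1)} \mid \theta_i^{*}, \psi^{(t)})}{p(r_i \mid u_i^{\mathrm{obs}}, u^{\mathrm{mis}}_{i(m+1)}, \theta_{i(m)}, \rho^{(t)})\, p(u_i^{\mathrm{obs}}, u^{\mathrm{mis}}_{i(m+1)} \mid \theta_{i(m)}, \psi^{(t)})}, 1 \right\}. \tag{25}$$

After drawing samples of $u^{\mathrm{mis}}_{(m)}$ and $\theta_{(m)}$, $Q(\psi, \phi, \rho \mid \psi^{(t)}, \phi^{(t)}, \rho^{(t)})$ is approximated by the Monte Carlo integration:

$$Q(\psi, \phi, \rho \mid \psi^{(t)}, \phi^{(t)}, \rho^{(t)}) \approx \sum_{i=1}^{N} \frac{1}{M} \sum_{m=1}^{M} \log p(r_i, u_i^{\mathrm{obs}}, u^{\mathrm{mis}}_{i(m)}, \theta_{i(m)} \mid \psi, \phi, \rho), \tag{26}$$

where $M$ is the number of draws.

[M-step]: Maximize $Q(\psi, \phi, \rho \mid \psi^{(t)}, \phi^{(t)}, \rho^{(t)})$ and update $\psi^{(t)}, \phi^{(t)}, \rho^{(t)}$ to $\psi^{(t+1)}, \phi^{(t+1)}, \rho^{(t+1)}$.

At the maximization (M)-step, we need to maximize $Q(\psi, \phi, \rho \mid \psi^{(t)}, \phi^{(t)}, \rho^{(t)})$ with respect to $\psi$, $\phi$, and $\rho$. Using (12) and (26), $Q(\psi, \phi, \rho \mid \psi^{(t)}, \phi^{(t)}, \rho^{(t)})$ can be written as follows:

$$Q(\psi, \phi, \rho \mid \psi^{(t)}, \phi^{(t)}, \rho^{(t)}) = E_R + E_U + E_\Theta, \tag{27}$$

where

$$E_R = \sum_{i=1}^{N} \iint \log p(r_i \mid u_i, \theta_i, \rho) \times p(u_i^{\mathrm{mis}}, \theta_i \mid u_i^{\mathrm{obs}}, r_i, \psi^{(t)}, \phi^{(t)}, \rho^{(t)})\, du_i^{\mathrm{mis}}\, d\theta_i, \tag{28}$$
$$E_U = \sum_{i=1}^{N} \sum_{j=1}^{2} \iint \log p(u_{ij} \mid \theta_{ij}, \psi) \times p(u_i^{\mathrm{mis}}, \theta_i \mid u_i^{\mathrm{obs}}, r_i, \psi^{(t)}, \phi^{(t)}, \rho^{(t)})\, du_i^{\mathrm{mis}}\, d\theta_i, \tag{29}$$
$$E_\Theta = \sum_{i=1}^{N} \sum_{j=1}^{2} \iint \log p(\theta_{ij} \mid \phi) \times p(u_i^{\mathrm{mis}}, \theta_i \mid u_i^{\mathrm{obs}}, r_i, \psi^{(t)}, \phi^{(t)}, \rho^{(t)})\, du_i^{\mathrm{mis}}\, d\theta_i. \tag{30}$$

Thus, maximizing $Q(\psi, \phi, \rho \mid \psi^{(t)}, \phi^{(t)}, \rho^{(t)})$ is equivalent to solving the following equations:

$$\frac{\partial Q(\psi, \phi, \rho \mid \psi^{(t)}, \phi^{(t)}, \rho^{(t)})}{\partial \rho} = \frac{\partial E_R}{\partial \rho} = 0, \tag{31}$$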
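  • 9.

$$\frac{\partial Q(\psi, \phi, \rho \mid \psi^{(t)}, \phi^{(t)}, \rho^{(t)})}{\partial \psi} = \frac{\partial E_U}{\partial \psi} = 0, \tag{32}$$
$$\frac{\partial Q(\psi, \phi, \rho \mid \psi^{(t)}, \phi^{(t)}, \rho^{(t)})}{\partial \phi} = \frac{\partial E_\Theta}{\partial \phi} = 0. \tag{33}$$

The complete-data likelihood equation for $\rho$ cannot be solved in closed form; therefore, the Newton–Raphson method is used to update the parameters toward the maximum. The first and second partial derivatives of $L_R$ with respect to $\rho_j$ ($j = 1, 2$) are

$$\frac{\partial L_R}{\partial \rho_j} = \sum_{i=1}^{N} \left( R_{ij} - \frac{\exp(\rho_j' z_i)}{1 + \exp(\rho_j' z_i)} \right) z_i, \tag{34}$$
$$\frac{\partial^2 L_R}{\partial \rho_j \partial \rho_j'} = -\sum_{i=1}^{N} \frac{\exp(\rho_j' z_i)}{1 + \exp(\rho_j' z_i)} \left( 1 - \frac{\exp(\rho_j' z_i)}{1 + \exp(\rho_j' z_i)} \right) z_i z_i', \tag{35}$$

where $R$ is an $N \times 2$ indicator matrix in which the $(i, j)$th element is defined as

$$R_{ij} = \begin{cases} 1 & \text{if } r_i = j, \\ 0 & \text{if } r_i \neq j. \end{cases} \tag{36}$$

Let $\rho^{(t)}_{j(s)}$ be the value at the $s$th Newton–Raphson step in the $t$th M-step. The following equation is used for updating $\rho^{(t)}_j$:

$$\rho^{(t)}_{j(s+1)} = \rho^{(t)}_{j(s)} - \left[ \frac{\partial^2 E_R}{\partial \rho_j \partial \rho_j'} \right]^{-1}_{\rho_j = \rho^{(t)}_{j(s)}} \times \left. \frac{\partial E_R}{\partial \rho_j} \right|_{\rho_j = \rho^{(t)}_{j(s)}}, \tag{37}$$

where

$$\frac{\partial E_R}{\partial \rho_j} = \frac{1}{M} \sum_{m=1}^{M} \frac{\partial L_R}{\partial \rho_j}, \qquad \frac{\partial^2 E_R}{\partial \rho_j \partial \rho_j'} = \frac{1}{M} \sum_{m=1}^{M} \frac{\partial^2 L_R}{\partial \rho_j \partial \rho_j'}. \tag{38}$$

Updating is repeated through the above equation until the convergence criterion is satisfied. Moreover, the likelihood equation for the item parameters cannot be solved in closed form; therefore, again, we use the Newton–Raphson method.

In operational practice, the following two types of constraints can be imposed for test form selection behavior, and some parts of the Metropolis–Hastings algorithm in the E-step are altered as described below due to these constraints.

When the Test Form Selection Behavior Depends Only on the Ability Parameters

(i) is altered as follows:

(i’) Generate $u^{\mathrm{mis}}_{i(m+1)}$ from $p(u_i^{\mathrm{mis}} \mid u_i^{\mathrm{obs}}, r_i, \theta_{i(m)}, \psi^{(t)}, \rho^{(t)})$

Because $u_i^{\mathrm{mis}}$ is independent of $r_i$, the conditional distribution of $u_i^{\mathrm{mis}}$ can be obtained from $p(u_i^{\mathrm{mis}} \mid u_i^{\mathrm{obs}}, \theta_{i(m)}, \psi^{(t)}, \rho^{(t)})$.

When the Test Form Selection Behavior Depends Only on the Test Scores

(ii) is altered as follows: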
  • 10. (ii’) Generate $\theta_{i(m+1)}$ from $p(\theta_i \mid u_i^{\mathrm{obs}}, u^{\mathrm{mis}}_{i(m+1)}, r_i, \phi^{(t)}, \rho^{(t)})$

We again use the following Metropolis–Hastings algorithm. Further, the target and the proposal distribution are as follows:

$$\text{target distribution: } p(\theta_i \mid r_i, u_i^{\mathrm{obs}}, u^{\mathrm{mis}}_{i(m+1)}, \rho^{(t)}, \phi^{(t)}) \tag{39}$$
$$\text{proposal distribution: } p(\theta_i \mid \phi^{(t)}) \tag{40}$$

(ii’-1) Draw $\theta_i^{*} \sim p(\theta_i \mid \phi^{(t)})$

(ii’-2) Accept $\theta_{i(m+1)} = \theta_i^{*}$ with probability

$$\alpha(\theta_i^{*} \mid \theta_{i(m)}, u_i^{\mathrm{obs}}, u^{\mathrm{mis}}_{i(m+1)}, \psi^{(t)}) = \min\left\{ \frac{p(u_i^{\mathrm{obs}}, u^{\mathrm{mis}}_{i(m+1)} \mid \theta_i^{*}, \psi^{(t)})}{p(u_i^{\mathrm{obs}}, u^{\mathrm{mis}}_{i(m+1)} \mid \theta_{i(m)}, \psi^{(t)})}, 1 \right\}. \tag{41}$$

Assessing the Covariance Matrix of the Estimates Based on the Observed Information Matrix

Let $\xi = (\psi, \phi, \rho)$; that is, $\xi$ contains all the parameters of our model. As a by-product of the MCEM algorithm, the standard error of the parameter vector $\xi$ can be calculated. Louis (1982) showed that the observed information matrix of $\hat{\xi}$ from the EM algorithm can be expressed as

$$I(\hat{\xi} \mid y_{\mathrm{obs}}) = E_\xi\left[ I_c(\xi \mid y) \mid y_{\mathrm{obs}} \right]_{\xi = \hat{\xi}} - E_\xi\left[ S_c(y \mid \xi)\, S_c^{T}(y \mid \xi) \mid y_{\mathrm{obs}} \right]_{\xi = \hat{\xi}}, \tag{42}$$

where $I_c(\xi \mid y)$ is the matrix of the negative of the second-order partial derivatives of the complete-data log likelihood function with respect to the elements of $\xi$, and $S_c(y \mid \xi)$ is the gradient vector of the complete-data log likelihood function, that is,

$$I_c(\xi \mid y) = -\frac{\partial^2 \log L_c(\xi \mid y)}{\partial \xi \partial \xi'}, \tag{43}$$
$$S_c(y \mid \xi) = \frac{\partial \log L_c(\xi \mid y)}{\partial \xi}, \tag{44}$$

where $L_c(\xi \mid y)$ is the complete-data likelihood function in which the missing part is filled in at each Monte Carlo E-step. In practice, the expectations in (42) are approximated through Monte Carlo integration. Therefore, (42) is translated as follows:

$$I(\hat{\xi} \mid y_{\mathrm{obs}}) = \frac{1}{M} \sum_{m=1}^{M} I_c(\hat{\xi} \mid y_{(m)}) - \frac{1}{M} \sum_{m=1}^{M} S_c(y_{(m)} \mid \hat{\xi})\, S_c^{T}(y_{(m)} \mid \hat{\xi}). \tag{45}$$

Let $\xi_p$ be a part of the parameter vector $\xi$. Using the observed information matrix $I(\hat{\xi} \mid y_{\mathrm{obs}})$, we can test the null hypothesis “$H_0: \xi_p = 0$” using the Wald statistic.
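As an illustration of (45), the following minimal sketch averages per-draw quantities over the M Monte Carlo draws. It assumes that the complete-data information matrices and score vectors have already been evaluated at the estimate for each draw; the function name `louis_information` and the list-of-lists input layout are hypothetical and are not taken from the paper.

```python
def louis_information(info_draws, score_draws):
    """Monte Carlo approximation of the observed information, following (45).

    info_draws:  list of M square matrices I_c(xi_hat | y_(m)), i.e. the negative
                 second derivatives of the complete-data log likelihood.
    score_draws: list of M gradient vectors S_c(y_(m) | xi_hat).
    Both are assumed to be precomputed (hypothetical inputs).
    """
    m = len(info_draws)
    p = len(score_draws[0])
    observed = [[0.0] * p for _ in range(p)]
    for info, score in zip(info_draws, score_draws):
        for a in range(p):
            for b in range(p):
                # average of I_c minus average of the outer product S_c S_c^T
                observed[a][b] += (info[a][b] - score[a] * score[b]) / m
    return observed


# Toy input with M = 2 draws and two parameters (illustrative numbers only).
toy_info = [[[2.0, 0.0], [0.0, 2.0]], [[2.0, 0.0], [0.0, 2.0]]]
toy_scores = [[1.0, 0.0], [-1.0, 0.0]]
toy_result = louis_information(toy_info, toy_scores)
```

In this toy input, the scores vary across draws only in the first coordinate, so only the first diagonal entry of the observed information is reduced relative to the complete-data information.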
The Wald statistic for the hypothesis $H_0$ can be expressed as follows:

$$W = \xi_p' \left[ I_{\xi_p}(\hat{\xi} \mid y_{obs}) \right]^{-1} \xi_p, \tag{46}$$

where $I_{\xi_p}(\hat{\xi} \mid y_{obs})$ is the submatrix of the Fisher information $I(\hat{\xi} \mid y_{obs})$ relevant to $\xi_p$.

4. Simulation Study

To show the reliability of the proposed method, we carried out a simulation study. We generated data for which the assignment of test forms was not random, and for which the test form selection behavior depended on both the scores on the tests the examinees selected and the scores on the tests they did not select. We used SAS/IML to compute the estimates under multiple group item response theory as well as those from the proposed method. For the simulation study, we generated 100 data sets, and for each data set we obtained the usual ML estimates for multiple group IRT (concurrent calibration) and the estimates using the proposed method. We were interested in assessing the accuracy of the parameter estimation for our model.

A two-parameter logistic model was used as the functional form of the item response. Each test had 10 items. The item parameters of Test B $(a_B, b_B)$ were common across test forms, and $\mu_1$ and $\sigma_1^2$ were fixed as $\mu_1 = 0$ and $\sigma_1^2 = 1$, respectively. To ensure the identifiability of the model and for the sake of simplicity, we assumed that the sums of the observed and missing test scores would determine test form selection behavior; that is, we considered the following constraint:

$$\begin{aligned} \rho_{A1j} = \cdots = \rho_{AK_Aj} = \rho_{B_11j} = \cdots = \rho_{B_1K_Bj} &= \pi_j, \\ \rho_{B_21j} = \cdots = \rho_{B_2K_Bj} = \rho_{C1j} = \cdots = \rho_{CK_Cj} &= -\pi_j. \end{aligned} \tag{47}$$

Even under this constraint, the model still allows for nonrandom assignment. With these constraints, the test selection probability can be expressed as follows:

$$p(r_i = j \mid u_i, \rho) = \frac{\exp(\pi_j v_i)}{1 + \exp(\pi_1 v_i)}, \tag{48}$$

where $v_i$ is the difference between the scores of the $i$th examinee on the two test forms:

$$v_i = \left( \sum_{k_A=1}^{K_A} u_{iAk_A} + \sum_{k_B=1}^{K_B} u_{iB_1k_B} \right) - \left( \sum_{k_B=1}^{K_B} u_{iB_2k_B} + \sum_{k_C=1}^{K_C} u_{iCk_C} \right). \tag{49}$$

The true values are provided in Tables 1 and 2. In the current model, the total number of parameters was 63. In this simulation study, the correlation of $\theta_1$ and $\theta_2$ was set to 0.5. We generated 100 replications, all of which followed the same population parameters (or true values). For each replication, we generated 3,000 observations and set $M = 30$. The convergence criterion for the Newton–Raphson algorithm in each M-step was 0.001.
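The selection model in (48) and (49) can be illustrated with a short sketch. This is not the authors' SAS/IML code. It assumes that form 2 acts as the reference category (that is, π2 = 0, which is what the denominator of (48) suggests), and the function names and the response vectors in the usage lines are hypothetical.

```python
import math


def score_difference(u_a, u_b1, u_b2, u_c):
    """v_i as in (49): form-1 total (Tests A and B) minus form-2 total (Tests B and C)."""
    return (sum(u_a) + sum(u_b1)) - (sum(u_b2) + sum(u_c))


def selection_probabilities(v, pi1):
    """Return (p(r_i = 1), p(r_i = 2)) following (48).

    Assumption: pi_2 = 0, so p(r_i = 2) = 1 / (1 + exp(pi1 * v)).
    """
    p1 = 1.0 / (1.0 + math.exp(-pi1 * v))  # same as exp(pi1*v) / (1 + exp(pi1*v))
    return p1, 1.0 - p1


# Hypothetical 0/1 responses of one examinee (10 items per test).
u_a = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]   # Test A, form 1
u_b1 = [1, 0, 1, 1, 1, 0, 1, 1, 0, 1]  # Test B as taken in form 1
u_b2 = [1, 0, 1, 1, 0, 0, 1, 1, 0, 1]  # Test B as taken in form 2
u_c = [0, 1, 0, 1, 0, 1, 0, 0, 1, 0]   # Test C, form 2

v = score_difference(u_a, u_b1, u_b2, u_c)
p_form1, p_form2 = selection_probabilities(v, pi1=0.2)  # 0.2 is the true pi_1 in Table 1
```

With a positive π1, an examinee whose form-1 total exceeds the form-2 total is more likely to select form 1, and the two probabilities are equal when the totals match.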
The means of the ML estimates were computed over the 100 replications. The root mean squared errors (RMSs) between the estimates and the true values, as well as the sum of the mean squared errors (MSEs), were computed to compare the accuracy of the proposed method with that of the estimates obtained under the assumption of random assignment. The results are listed in Tables 1 and 2. We obtained the biases by subtracting the estimated values from the true values. The proposed method can also produce ability scores: we calculated Monte Carlo estimates of the ability parameter of each examinee and created histograms (see Fig. 3).

The sum of the MSEs was 0.298 for the proposed model and 3.18 for the traditional model. Moreover, the sum of the absolute values of the biases was calculated for each model; it was 0.686 for the proposed model and 12.5 for the traditional model. The sum of the MSEs therefore differed by a factor of approximately 10, whereas the sum of the absolute biases under concurrent calibration was about 18 times larger than that under the proposed model. These results indicate that the parameters can be estimated accurately under the proposed model, whereas the traditional model yields biased estimates.
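The summary measures described above can be sketched as follows. The sketch uses the sign convention stated in the text (true value minus estimate). The function name and the four replicate estimates in the usage lines are hypothetical; they are not values from the study.

```python
import math


def bias_rms_mse(estimates, true_value):
    """Summarize replicate estimates of one parameter.

    bias: true value minus the mean estimate (sign convention stated in the text)
    rms:  square root of the mean squared error about the true value
    mse:  mean squared error about the true value
    """
    n = len(estimates)
    mean_estimate = sum(estimates) / n
    bias = true_value - mean_estimate
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return bias, math.sqrt(mse), mse


# Hypothetical estimates of mu_2 (true value 0.5) from four replications.
bias, rms, mse = bias_rms_mse([0.48, 0.52, 0.50, 0.53], true_value=0.5)

# Ratios of the sums reported in the text (existing model / proposed model).
mse_ratio = 3.18 / 0.298
abs_bias_ratio = 12.5 / 0.686
```

Summing `mse` and `abs(bias)` over all 63 parameters would give totals of the kind compared in the text.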
TABLE 1.
The results of simulation study for the IRT model (ρ, φ, μ, aA, aB).

Para          Proposed model            Existing model
              Bias            RMS       Bias      RMS
π1 = 0.2      −3.00 × 10^−4   0.0238    *         *
μ2 = 0.5      −3.33 × 10^−3   0.0575    0.420     0.423
σ2 = 1.5      −2.47 × 10^−2   0.111     −0.181    0.204
aA1 = 0.6     4.68 × 10^−3    0.0477    −0.0460   0.0641
aA2 = 0.8     1.42 × 10^−4    0.0669    −0.0623   0.0885
aA3 = 1.0     −4.76 × 10^−3   0.0825    −0.0808   0.113
aA4 = 1.2     −1.97 × 10^−3   0.0788    −0.0936   0.120
aA5 = 1.5     2.56 × 10^−2    0.107     −0.0944   0.139
aA6 = 0.8     6.35 × 10^−3    0.0630    −0.0571   0.0824
aA7 = 0.6     5.91 × 10^−3    0.0521    −0.0456   0.0674
aA8 = 1.0     3.01 × 10^−3    0.0663    −0.0752   0.0979
aA9 = 1.5     1.32 × 10^−2    0.105     −0.103    0.143
aA10 = 1.2    1.03 × 10^−2    0.0959    −0.0839   0.122
aB1 = 1.8     1.32 × 10^−2    0.0903    −0.116    0.145
aB2 = 1.3     3.82 × 10^−3    0.0548    −0.0859   0.102
aB3 = 1.2     7.17 × 10^−3    0.0570    −0.0779   0.0968
aB4 = 0.7     3.91 × 10^−3    0.0333    −0.0488   0.0589
aB5 = 0.9     8.89 × 10^−3    0.0508    −0.0543   0.0709
aB6 = 0.8     1.95 × 10^−3    0.0366    −0.0563   0.0666
aB7 = 0.7     −1.67 × 10^−3   0.0366    −0.0525   0.0651
aB8 = 1.0     −7.10 × 10^−3   0.0481    −0.0774   0.0929
aB9 = 1.1     9.35 × 10^−3    0.0531    −0.0669   0.0849
aB10 = 1.0    1.63 × 10^−2    0.0521    −0.0551   0.0716

5. Real Data Analysis

We illustrate the applicability of our method using a small part of the JLRT data set. The JLRT is a test that measures the ability to communicate with other persons in Japanese in a business setting, and it is administered mainly to students or business people whose first language is not Japanese. The JLRT has two equivalent test forms. Test form 1 includes Tests A and B, and Test form 2 includes Tests B and C; Test B is common to the two test forms. The test form that examinees take depends on the country in which they take the test. For illustrative purposes, we used a subsample of examinees who had recently taken the JLRT: 1,199 examinees took Test form 1 in Japan and 863 examinees took Test form 2 in other countries.
Considering that the examinees living in Japan were probably highly motivated to study Japanese and had greater abilities than the examinees living in other countries, we can assume that the examinees were not randomly assigned to the test forms. For simplicity, a two-parameter logistic model was used as the functional form of the item responses. Test B had 33 items, and Tests A and C each had 67 items. Analyzing all of these items would have produced tables and figures too large for the purpose of illustration. Thus, after an exploratory analysis of 100 items included in the two test forms using BILOG-MG, we chose 10 items from each of Tests A, B, and C. $\mu_1$ and $\sigma_1^2$ were fixed as $\mu_1 = 0$ and $\sigma_1^2 = 1$, respectively. As in the simulation study, we assumed that the sums of the test scores determined the test form selection behavior. The convergence criterion for the EM step and for the Newton–Raphson algorithm in each M-step was 0.01. We obtained the parameter estimates and the asymptotic covariance matrix using Louis's (1982) method. In addition, we used BILOG-MG to obtain the estimates under the MAR assumption. The results are listed in Tables 3 and 4. The estimates of the difficulty parameters differed greatly between the proposed and existing methods. This pattern is similar to that observed in the simulation study. For model comparison, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) were calculated for the proposed method and the existing method. The results are listed in Table 5. The method with the lower AIC and BIC values is the more appropriate one for the current data. Hence, we conclude that the proposed method is superior to the existing one. Following (46), we also calculated the Wald statistic for the hypothesis concerning MAR, $H_0: \pi_1 = 0$. The Wald statistic follows a chi-square distribution with 1 degree of freedom. The resulting value was $\chi^2(1) = 191.3$, $p < 0.001$. This indicates that the MAR assumption cannot be upheld and that the missing data are nonignorable. As another statistical test of $\pi_1$, the Z-value was also calculated; it was 11.57 ($p < 0.001$), which is statistically significant. The AIC, BIC, Wald statistic, and Z-value all suggest that assignment to the test forms was not random. The examinees effectively determined which test form they would take through the country in which they lived, and those living in Japan were also expected to have acquired superior Japanese language skills. Therefore, in our real data analysis, we can assume that test form selection behavior exists and that assignment to a test form is not random.

TABLE 2.
The results of simulation study for the IRT model (aC, bA, bB, bC).

Para          Proposed model            Existing model
              Bias            RMS       Bias      RMS
aC1 = 1.3     1.02 × 10^−2    0.0986    −0.0529   0.111
aC2 = 0.7     2.97 × 10^−3    0.0500    −0.0386   0.0615
aC3 = 0.6     8.71 × 10^−3    0.0498    −0.0281   0.0546
aC4 = 0.8     1.41 × 10^−2    0.0629    −0.0321   0.0671
aC5 = 1.5     3.34 × 10^−2    0.120     −0.0172   0.101
aC6 = 0.9     7.85 × 10^−3    0.0594    −0.0446   0.0725
aC7 = 0.8     1.01 × 10^−2    0.0609    −0.0316   0.0659
aC8 = 0.6     3.48 × 10^−3    0.0514    −0.0346   0.0593
aC9 = 1.0     4.59 × 10^−3    0.0660    −0.0405   0.0727
aC10 = 1.1    1.11 × 10^−2    0.0787    −0.0464   0.0831
bA1 = −0.5    −8.21 × 10^−3   0.0785    −0.323    0.342
bA2 = −1.3    −1.89 × 10^−2   0.122     −0.381    0.408
bA3 = −1.0    −1.68 × 10^−2   0.0796    −0.343    0.360
bA4 = −0.4    −3.15 × 10^−3   0.0408    −0.272    0.283
bA5 = −0.6    −9.54 × 10^−3   0.0405    −0.289    0.302
bA6 = −0.7    −1.08 × 10^−2   0.0767    −0.323    0.342
bA7 = −0.3    −3.60 × 10^−3   0.0705    −0.302    0.317
bA8 = −0.2    −1.09 × 10^−2   0.0451    −0.272    0.283
bA9 = −0.5    −5.11 × 10^−3   0.0441    −0.276    0.285
bA10 = −1.0   −3.91 × 10^−3   0.0684    −0.321    0.339
bB1 = 0       −8.73 × 10^−3   0.0245    −0.223    0.231
bB2 = 0.1     −5.57 × 10^−3   0.0285    −0.220    0.227
bB3 = −0.2    −6.35 × 10^−3   0.0333    −0.247    0.255
bB4 = 0.3     −6.19 × 10^−3   0.0404    −0.232    0.242
bB5 = −0.7    −6.89 × 10^−3   0.0572    −0.295    0.306
bB6 = 0.6     −8.92 × 10^−3   0.0357    −0.203    0.212
bB7 = −0.2    −9.09 × 10^−3   0.0511    −0.275    0.287
bB8 = 1.0     −9.23 × 10^−3   0.0410    −0.148    0.175
bB9 = −0.6    −2.51 × 10^−3   0.0383    −0.275    0.284
bB10 = 0.5    −5.22 × 10^−3   0.0282    −0.197    0.206
bC1 = 0.5     −1.57 × 10^−2   0.0418    −0.182    0.192
bC2 = 0.7     6.47 × 10^−3    0.0581    −0.174    0.190
bC3 = 1.5     −1.81 × 10^−2   0.0723    −0.154    0.175
bC4 = 0.5     −6.53 × 10^−3   0.0619    −0.194    0.210
bC5 = 1.0     −1.04 × 10^−2   0.0346    −0.139    0.150
bC6 = 0.6     −1.40 × 10^−2   0.0475    −0.189    0.201
bC7 = 1.2     −1.39 × 10^−2   0.0529    −0.154    0.168
bC8 = 0.8     −1.27 × 10^−2   0.0634    −0.201    0.217
bC9 = 1.3     −9.13 × 10^−3   0.0478    −0.134    0.151
bC10 = 0.8    −6.42 × 10^−3   0.0458    −0.158    0.172

FIGURE 3.
Histograms of scores for each group in simulation.

TABLE 3.
The results of real data analysis (ρ, φ, μ, aA, aB, aC).

Para    Proposed model      Existing model (MAR)
        Estimate  SE        Estimate  SE
π1      0.443     0.0383    *         *
μ2      −0.445    0.0320    −0.639    0.0327
σ2      0.684     0.0316    0.701     0.0222
aA1     1.369     0.238     0.954     0.189
aA2     1.786     0.266     1.388     0.152
aA3     1.521     0.393     1.234     0.280
aA4     1.654     0.382     1.456     0.269
aA5     1.515     0.388     1.165     0.235
aA6     1.298     0.137     0.905     0.081
aA7     0.956     0.141     0.640     0.090
aA8     0.337     0.074     0.255     0.041
aA9     1.712     0.191     1.243     0.111
aA10    1.032     0.099     0.734     0.057
aB1     1.093     0.088     0.931     0.057
aB2     0.540     0.052     0.408     0.033
aB3     0.661     0.051     0.536     0.035
aB4     0.508     0.058     0.383     0.037
aB5     0.932     0.064     0.741     0.039
aB6     1.162     0.107     1.037     0.070
aB7     0.804     0.070     0.649     0.043
aB8     0.861     0.072     0.737     0.047
aB9     1.079     0.077     0.891     0.047
aB10    0.480     0.044     0.363     0.028
aC1     0.724     0.132     0.834     0.101
aC2     0.804     0.359     1.085     0.195
aC3     1.026     0.280     1.487     0.199
aC4     1.114     0.280     1.655     0.217
aC5     0.885     0.150     1.057     0.114
aC6     1.041     0.195     1.346     0.164
aC7     0.448     0.088     0.521     0.065
aC8     1.347     0.159     1.522     0.111
aC9     0.605     0.098     0.623     0.067
aC10    0.439     0.084     0.480     0.060
For reference, we calculated Monte Carlo estimates of the ability parameter of each examinee using the proposed method and created histograms (see Fig. 4). Figure 4 shows that the examinees living in countries other than Japan (group 2) had lower abilities than those living in Japan, which is consistent with expectations.
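The Z-value reported for π1 can be reproduced from the estimate and standard error in Table 3. The sketch below assumes a two-sided test against the standard normal distribution; the function name is hypothetical, and the multi-parameter Wald statistic in (46) is not implemented here.

```python
import math


def wald_z_test(estimate, standard_error):
    """Z-value and two-sided p-value for H0: parameter = 0 (standard normal reference)."""
    z = estimate / standard_error
    # P(|Z| > |z|) for a standard normal variable equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# pi_1 estimate and standard error for the proposed model, taken from Table 3.
z, p_value = wald_z_test(0.443, 0.0383)
print(f"Z = {z:.2f}, p < 0.001: {p_value < 0.001}")
```

The ratio 0.443 / 0.0383 rounds to 11.57, which agrees with the Z-value given in the text.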
FIGURE 4.
Histograms of scores for each group in real data.

TABLE 4.
The results of real data analysis (bA, bB, bC).

Para    Proposed model      Existing model (MAR)
        Estimate  SE        Estimate  SE
bA1     −1.776    0.267     −2.970    0.473
bA2     −1.054    0.105     −1.672    0.119
bA3     −1.827    0.317     −2.815    0.442
bA4     −1.603    0.223     −2.354    0.257
bA5     −1.744    0.316     −2.729    0.393
bA6     −0.783    0.089     −1.407    0.119
bA7     −1.547    0.237     −2.697    0.362
bA8     −1.776    0.486     −2.922    0.542
bA9     −0.792    0.076     −1.345    0.093
bA10    −0.279    0.066     −0.723    0.085
bB1     −0.965    0.067     −1.499    0.064
bB2     −0.855    0.101     −1.551    0.126
bB3     1.493     0.094     1.475     0.094
bB4     −1.550    0.182     −2.490    0.223
bB5     −0.249    0.042     −0.679    0.046
bB6     −1.230    0.083     −1.767    0.075
bB7     −0.974    0.083     −1.577    0.089
bB8     −0.989    0.080     −1.542    0.079
bB9     −0.526    0.048     −0.988    0.046
bB10    0.658     0.072     0.412     0.080
bC1     −1.916    0.260     −2.257    0.163
bC2     −2.981    0.961     −2.934    0.304
bC3     −2.265    0.366     −2.335    0.137
bC4     −2.168    0.307     −2.261    0.118
bC5     −1.709    0.189     −2.044    0.115
bC6     −1.984    0.233     −2.241    0.126
bC7     −1.119    0.183     −1.540    0.127
bC8     −0.657    0.053     −1.098    0.038
bC9     0.640     0.153     0.120     0.098
bC10    −0.212    0.101     −0.766    0.088

TABLE 5.
The value of AIC and BIC for the real data analysis.

       Proposed method   Existing method
AIC    2.878 × 10^4      3.132 × 10^4
BIC    2.914 × 10^4      3.169 × 10^4

6. Conclusion

In this paper, we proposed a new method of item parameter linking in IRT. Through the simulation study, we showed that ignoring test form selection behavior results in considerable bias in the estimates of the item parameters when the assignment to the test forms is not random. Furthermore, we showed that this bias can be reduced using the models presented above. However, although the results of the simulation study appear sufficient to demonstrate the accuracy of the proposed estimation method, models and methods for nonignorable missing data are notoriously sensitive to misspecification. In some cases of misspecification, the bias of these non-MAR models can be even more serious than the bias incurred by assuming MAR. Because the true missing data mechanism is not known in practice, the results of simulation studies can seldom be generalized widely to real data. The issue of robustness of the proposed method to model assumptions is left for future empirical studies.

The proposed model includes, as a special case, the model in which test form selection is random and does not depend on the scores on any of the tests. We can therefore fit the proposed model first and then use a test statistic such as the Wald statistic to test whether missing test scores affect test form selection behavior. In this regard, the model is advantageous.

A variety of methods for dealing with ignorable and nonignorable missing data in practical situations have been proposed (Schafer, 1997). Models with nonignorable missing-data mechanisms in IRT were also proposed by Holman and Glas (2005). However, since they were interested in modeling the nonignorable missing data mechanism with the item response model, their model does not consider item parameter linking in the common-item nonequivalent groups design. While their model considers the missing mechanism per item, our model deals with the missing mechanism per test form. Moreover, their model is based on the idea of pattern mixture models, in which the test form selection indicators are the explanatory variables and the item responses are the dependent variables. In contrast, our model can be regarded as a kind of selection model, in which the roles of the explanatory and dependent variables are the reverse of those in pattern mixture models.

While the proposed model is a fully parametric model, one can conduct analysis using a semiparametric model with propensity scores (Hoshino, Kurata, & Shigemasu, 2006; Hoshino, 2007, 2008) under the MAR assumption. We plan to conduct simulations to determine which of the two approaches to use: the proposed method, which requires parametric model assumptions but not the MAR assumption, or the above methods, which allow semiparametric analysis but require the MAR assumption.

Models for tests having items that differ in terms of levels of measurement (such as dichotomous data and polytomous data) are also topics that can be addressed in future research.

Acknowledgements

The authors are grateful to Ms. Naoko Hojo of the Japan External Trade Organization (JETRO) for helping us access the JLRT data.
This study was partially supported by the Ministry of Education, Science, Sports, and Culture, Grant-in-Aid for Scientific Research, 19-8879, Scientific Research (B), 193-30145 and the Inamori Foundation grant (to Takahiro Hoshino). Finally, we would like to express our sincere thanks to the associate editor and two reviewers for their valuable advice and comments.

References

Baker, F.B., & Al-Karni, A. (1991). A comparison of two procedures for computing IRT equating coefficients. Journal of Educational Measurement, 28, 147–162.
Bernaards, C.A., & Sijtsma, K. (1999). Factor analysis of multidimensional polytomous item response data suffering from ignorable item nonresponse. Multivariate Behavioral Research, 34, 277–313.
Bock, R.D., & Zimowski, M.F. (1997). Multiple group IRT. In W.M. van der Linden & R.K. Hambleton (Eds.), Handbook of modern item response theory (pp. 433–448). Berlin: Springer.
Dempster, A.P., Laird, N.M., & Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1–38.
Haebara, T. (1980). Equating logistic ability scales by a weighted least squares method. Japanese Psychological Research, 22, 144–149.
Hanson, B.A., & Béguin, A.A. (2002). Obtaining a common scale for item response theory item parameters using separate versus concurrent calibration in the common-item equating design. Applied Psychological Measurement, 26, 3–24.
Holman, R., & Glas, C.A.W. (2005). Modelling non-ignorable missing-data mechanisms with item response theory models. British Journal of Mathematical and Statistical Psychology, 58, 1–17.
Hoshino, T. (2007). Doubly robust type estimation for covariate adjustment in latent variable modeling. Psychometrika, 72, 535–549.
Hoshino, T. (2008). A Bayesian propensity score adjustment for latent variable modeling and MCMC algorithm. Computational Statistics & Data Analysis, 52, 1413–1429.
Hoshino, T., Kurata, H., & Shigemasu, K. (2006). A propensity score adjustment for multiple group structural equation modeling. Psychometrika, 71, 691–712.
Ibrahim, J.G., Chen, M.H., & Lipsitz, S.R. (2001). Missing responses in generalised linear mixed models when the missing data mechanism is nonignorable. Biometrika, 88, 551–564.
Kato, K., & Japan External Trade Organization (JETRO) (2006). BJT business Japanese proficiency test official guide. Tokyo, Japan: Japan External Trade Organization (JETRO).
Kim, S.H., & Cohen, A.S. (1992). Effects of linking methods on detection of DIF. Journal of Educational Measurement, 29, 51–66.
Kolen, M.J., & Brennan, R.L. (2004). Test equating, scaling, and linking: methods and practices (2nd ed.). New York: Springer.
Little, R.J.A., & Rubin, D.B. (2002). Statistical analysis with missing data (2nd ed.). New York: Wiley.
Lord, F.M. (1974). Estimation of latent ability and item parameters when there are omitted responses. Psychometrika, 39, 247–264.
Louis, T.A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society, Series B, 44, 226–233.
Schafer, J.L. (1997). Analysis of incomplete multivariate data. New York: Chapman & Hall.
Stocking, M.L., & Lord, F.M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201–210.
van der Linden, W.J., & Luecht, R.M. (1998). Observed-score equating as a test assembly problem. Psychometrika, 63, 401–418.
von Davier, M., & von Davier, A.A. (2004). A unified approach to IRT scale linkage and scale transformations (Research Report RR-04-09). Princeton, NJ: ETS.
Wei, G.C.G., & Tanner, M.A. (1990). A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithm. Journal of the American Statistical Association, 85, 699–704.
Wingersky, M.S., & Lord, F.M. (1984). An investigation of methods for reducing sampling error in certain IRT procedures. Applied Psychological Measurement, 8, 347–364.
Yang, W.L. (2004). Sensitivity of linkings between AP multiple-choice scores and composite scores to geographical region: An illustration of checking for population invariance. Journal of Educational Measurement, 41, 33–41.
Manuscript Received: 31 MAR 2007
Final Version Received: 14 JUL 2008
Published Online Date: 9 SEP 2008