1. Fractional hot deck imputation for multivariate missing data in survey sampling

Jae Kwang Kim
Iowa State University
March 18, 2015

Joint work with Jongho Im and Wayne Fuller
2. Introduction
Basic Setup

Assume simple random sampling, for simplicity.

Under complete response, suppose that
$$\hat{\eta}_{n,g} = n^{-1} \sum_{i=1}^{n} g(y_i)$$
is an unbiased estimator of $\eta_g = E\{g(Y)\}$, for known $g(\cdot)$.

Let $\delta_i = 1$ if $y_i$ is observed and $\delta_i = 0$ otherwise.

$y_i^*$: imputed value for $y_i$ for unit $i$ with $\delta_i = 0$.

Imputed estimator of $\eta_g$:
$$\hat{\eta}_{I,g} = n^{-1} \sum_{i=1}^{n} \left\{ \delta_i\, g(y_i) + (1 - \delta_i)\, g(y_i^*) \right\}$$

We need $E\{g(y_i^*) \mid \delta_i = 0\} = E\{g(y_i) \mid \delta_i = 0\}$.
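In code, the imputed estimator above might look like the following minimal sketch; the toy values (and the 0.0 placeholder in the slot of the missing $y_i$) are hypothetical:

```python
import numpy as np

def eta_hat_I(y, delta, y_star, g=lambda v: v):
    """Imputed estimator: use g(y_i) for respondents (delta_i = 1)
    and g(y*_i) for nonrespondents (delta_i = 0)."""
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta)
    y_star = np.asarray(y_star, dtype=float)
    vals = np.where(delta == 1, g(y), g(y_star))
    return vals.mean()

# Unit 3 is missing (delta = 0); its slot in y is just a placeholder.
est = eta_hat_I(y=[1.0, 2.0, 0.0], delta=[1, 1, 0], y_star=[0.0, 0.0, 3.0])
```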
Kim (ISU) Fractional Imputation March 18, 2015 2 / 35
3. Introduction
ML estimation under missing data setup

Often, we can find $x$ (always observed) such that missing at random (MAR) holds: $f(y \mid x, \delta = 0) = f(y \mid x)$.

Imputed values are created from $f(y \mid x)$.

Computing the conditional expectation can be a challenging problem:
1. We do not know the true parameter $\theta$ in $f(y \mid x) = f(y \mid x; \theta)$:
$$E\{g(y_i) \mid x_i\} = E\{g(y_i) \mid x_i; \theta\}.$$
2. Even if we know $\theta$, computing the conditional expectation can be numerically difficult.
4. Introduction
Imputation

Imputation: a Monte Carlo approximation of the conditional expectation (given the observed data),
$$E\{g(y_i) \mid x_i\} \cong \frac{1}{m} \sum_{j=1}^{m} g\!\left(y_i^{*(j)}\right).$$

1. Bayesian approach: generate $y_i^*$ from
$$f(y_i \mid x_i, y_{obs}) = \int f(y_i \mid x_i, \theta)\, p(\theta \mid x_i, y_{obs})\, d\theta.$$
2. Frequentist approach: generate $y_i^*$ from $f(y_i \mid x_i; \hat{\theta})$, where $\hat{\theta}$ is a consistent estimator.
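A minimal sketch of the frequentist approach, assuming a hypothetical normal working model $f(y \mid x; \theta) = N(\beta_0 + \beta_1 x, \sigma^2)$, with the parameter values below playing the role of an already-computed $\hat{\theta}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fitted parameters (stand-ins for theta-hat).
beta0, beta1, sigma = 1.0, 2.0, 0.5

def g(y):
    return y ** 2  # any known g(.)

def mc_conditional_expectation(x, m=10000):
    """Approximate E{g(y) | x; theta_hat} by averaging g over m draws
    from the fitted conditional distribution f(y | x; theta_hat)."""
    y_star = rng.normal(beta0 + beta1 * x, sigma, size=m)
    return g(y_star).mean()

x = 1.5
approx = mc_conditional_expectation(x)
exact = (beta0 + beta1 * x) ** 2 + sigma ** 2  # closed form for g(y) = y^2
```

For this choice of $g$, the Monte Carlo average can be checked against the closed-form conditional second moment.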
5. Introduction
Basic Setup (Cont'd)

Thus, imputation is a computational tool for computing the conditional expectation $E\{g(y_i) \mid x_i\}$ for missing unit $i$.

To compute the conditional expectation, we need to specify a model $f(y \mid x; \theta)$ evaluated at $\theta = \hat{\theta}$.

Thus, we can write $\hat{\eta}_{I,g} = \hat{\eta}_{I,g}(\hat{\theta})$.

To estimate the variance of $\hat{\eta}_{I,g}$, we need to account for the sampling variability of $\hat{\theta}$ in $\hat{\eta}_{I,g} = \hat{\eta}_{I,g}(\hat{\theta})$.
6. Introduction
Basic Setup (Cont'd)

Three approaches:

Bayesian approach: multiple imputation by Rubin (1978, 1987), Rubin and Schenker (1986), etc.

Resampling approach: Rao and Shao (1992), Efron (1994), Rao and Sitter (1995), Shao and Sitter (1996), Kim and Fuller (2004), Fuller and Kim (2005).

Linearization approach: Clayton et al. (1998), Shao and Steel (1999), Robins and Wang (2000), Kim and Rao (2009).
7. Comparison

                      Bayesian                     Frequentist
Model                 Posterior distribution       Prediction model
                      f(latent, θ | data)          f(latent | data, θ)
Computation           Data augmentation            EM algorithm
Prediction            I-step                       E-step
Parameter update      P-step                       M-step
Parameter est'n       Posterior mode               ML estimation
Imputation            Multiple imputation          Fractional imputation
Variance estimation   Rubin's formula              Linearization or bootstrap
8. Multiple imputation

The multiple imputation estimator of $\eta$, denoted by $\hat{\eta}_{MI}$, is
$$\hat{\eta}_{MI} = \frac{1}{m} \sum_{j=1}^{m} \hat{\eta}_I^{(j)}.$$

Rubin's variance estimator is
$$\hat{V}_{MI}(\hat{\eta}_{MI}) = W_m + \left(1 + \frac{1}{m}\right) B_m,$$
where $W_m = m^{-1} \sum_{j=1}^{m} \hat{V}^{(j)}$ and $B_m = (m-1)^{-1} \sum_{j=1}^{m} (\hat{\eta}_I^{(j)} - \hat{\eta}_{MI})^2$.
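Rubin's combining rules can be computed directly from the $m$ completed-data estimates and their variance estimates; the numbers below are hypothetical:

```python
import numpy as np

def rubin_combine(estimates, variances):
    """Rubin's combining rules: point estimate and variance V_MI."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    eta_mi = estimates.mean()              # multiple imputation estimate
    w_m = variances.mean()                 # within-imputation variance W_m
    b_m = estimates.var(ddof=1)            # between-imputation variance B_m
    v_mi = w_m + (1 + 1 / m) * b_m         # Rubin's variance estimator
    return eta_mi, v_mi

est, var_mi = rubin_combine([1.0, 1.2, 0.9, 1.1], [0.04, 0.05, 0.04, 0.05])
```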
9. Multiple imputation

Rubin's variance estimator is based on the following decomposition:
$$\mathrm{var}(\hat{\eta}_{MI}) = \mathrm{var}(\hat{\eta}_n) + \mathrm{var}(\hat{\eta}_{MI,\infty} - \hat{\eta}_n) + \mathrm{var}(\hat{\eta}_{MI} - \hat{\eta}_{MI,\infty}), \quad (1)$$
where $\hat{\eta}_n$ is the complete-sample estimator of $\eta$ and $\hat{\eta}_{MI,\infty}$ is the probability limit of $\hat{\eta}_{MI}$ as $m \to \infty$.

Under some regularity conditions, $W_m$ estimates the first term, $B_m$ estimates the second term, and $m^{-1} B_m$ estimates the last term of (1).
10. Multiple imputation

In particular, Kim et al. (2006, JRSSB) proved that the bias of Rubin's variance estimator is
$$\mathrm{Bias}(\hat{V}_{MI}) \cong -2\,\mathrm{cov}(\hat{\eta}_{MI} - \hat{\eta}_n, \hat{\eta}_n). \quad (2)$$

The decomposition (1) is equivalent to assuming that $\mathrm{cov}(\hat{\eta}_{MI} - \hat{\eta}_n, \hat{\eta}_n) \cong 0$, which is called the congeniality condition by Meng (1994).

The congeniality condition holds when $\hat{\eta}_n$ is the MLE of $\eta$. In such cases, Rubin's variance estimator is asymptotically unbiased.
11. Multiple imputation

Theorem (Yang and Kim, 2015)
Let $\hat{\eta}_n = n^{-1} \sum_{i=1}^{n} g(y_i)$ be used to estimate $\eta = E\{g(Y)\}$ under complete response. Then, under some regularity conditions, the bias of Rubin's variance estimator is
$$\mathrm{Bias}(\hat{V}_{MI}) \cong 2n^{-1}(1-p)\, E_0\left[\mathrm{var}\left\{ g(Y) - B_g(X)^T S(\theta) \mid X \right\}\right] \ge 0,$$
with equality if and only if $g(Y)$ is a linear function of $S(\theta)$, where $p = E(\delta)$, $S(\theta)$ is the score function of $\theta$ in $f(y \mid x; \theta)$,
$$B_g(X) = \left[\mathrm{var}\{S(\theta) \mid X\}\right]^{-1} \mathrm{cov}\{S(\theta), g(Y) \mid X\},$$
and $E_0(\cdot) = E(\cdot \mid \delta = 0)$.
12. Multiple imputation
Example

Suppose that you are interested in estimating $\eta = P(Y \le 3)$.

Assume a normal model for $f(y \mid x; \theta)$ for multiple imputation.

Two choices for $\hat{\eta}_n$:
1. Method-of-moments estimator: $\hat{\eta}_{n1} = n^{-1} \sum_{i=1}^{n} I(y_i \le 3)$.
2. Maximum-likelihood estimator:
$$\hat{\eta}_{n2} = n^{-1} \sum_{i=1}^{n} P(Y \le 3 \mid x_i; \hat{\theta}),$$
where $P(Y \le 3 \mid x_i; \hat{\theta}) = \int_{-\infty}^{3} f(y \mid x_i; \hat{\theta})\, dy$.

Rubin's variance estimator is nearly unbiased for $\hat{\eta}_{n2}$, but provides conservative variance estimation for $\hat{\eta}_{n1}$ (30-50% overestimation of the variance in most cases).
13. Fractional Imputation
Idea (parametric model approach)

Approximate $E\{g(y_i) \mid x_i\}$ by
$$E\{g(y_i) \mid x_i\} \cong \sum_{j=1}^{M_i} w_{ij}^*\, g\!\left(y_i^{*(j)}\right),$$
where $w_{ij}^*$ is the fractional weight assigned to the $j$-th imputed value of $y_i$.

If $y_i$ is a categorical variable, we can use
$$y_i^{*(j)} = \text{the } j\text{-th possible value of } y_i, \qquad w_{ij}^* = P\!\left(y_i = y_i^{*(j)} \mid x_i; \hat{\theta}\right),$$
where $\hat{\theta}$ is the (pseudo) MLE of $\theta$.
14. Fractional imputation
Features

Split each record with a missing item into $m\,(>1)$ imputed values.
Assign fractional weights.
The final product is a single data file with size $\le nm$.
For variance estimation, the fractional weights are replicated.
16. Fractional imputation (categorical case)
Fractional Imputation Idea

If both $y_1$ and $y_2$ are categorical, then fractional imputation is easy to apply:
There are only a finite number of possible values.
Imputed values = possible values.
The fractional weights are the conditional probabilities of the possible values given the observations.
The "EM by weighting" method of Ibrahim (1990) can be used to compute the fractional weights.
19. Fractional imputation (categorical case)
Example (Cont'd)

E-step: Fractional weights are the conditional probabilities of the imputed values given the observations,
$$w_{ij}^* = \hat{P}\!\left(y_{i,mis}^{*(j)} \mid y_{i,obs}\right) = \frac{\hat{\pi}\!\left(y_{i,obs}, y_{i,mis}^{*(j)}\right)}{\sum_{l=1}^{M_i} \hat{\pi}\!\left(y_{i,obs}, y_{i,mis}^{*(l)}\right)},$$
where $(y_{i,obs}, y_{i,mis})$ is the (observed, missing) part of $y_i = (y_{i1}, \cdots, y_{i,p})$.

M-step: Update the joint probability using the fractional weights,
$$\hat{\pi}_{ab} = \frac{1}{\hat{N}} \sum_{i=1}^{n} \sum_{j=1}^{M_i} w_i\, w_{ij}^*\, I\!\left(y_{i,1}^{*(j)} = a,\, y_{i,2}^{*(j)} = b\right),$$
with $\hat{N} = \sum_{i=1}^{n} w_i$.
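A small sketch of the E-step/M-step iteration above for two binary items, assuming equal sampling weights $w_i = 1$; the toy data are hypothetical and `None` marks a missing item:

```python
import numpy as np

# Toy bivariate binary data; None = missing item.
data = [(0, 0), (0, 1), (1, 1), (1, None), (None, 0), (1, 0), (0, 0), (None, None)]

pi = np.full((2, 2), 0.25)  # initial joint cell probabilities pi_ab
for _ in range(50):
    counts = np.zeros((2, 2))
    for y1, y2 in data:
        # E-step: possible completions and their fractional weights w*_ij
        cells = [(a, b) for a in (0, 1) for b in (0, 1)
                 if (y1 is None or a == y1) and (y2 is None or b == y2)]
        total = sum(pi[a, b] for a, b in cells)
        for a, b in cells:
            counts[a, b] += pi[a, b] / total
    pi = counts / counts.sum()  # M-step: updated pi_ab
```

Each record contributes total weight one, split fractionally over its possible completions, so the cell probabilities stay a proper distribution at every iteration.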
20. Fractional imputation (categorical case)
Example (Cont'd): Variance estimation

Recompute the fractional weights for each replication: apply the same EM algorithm using the replicated weights.

E-step: Fractional weights are the conditional probabilities of the imputed values given the observations,
$$w_{ij}^{*(k)} = \frac{\hat{\pi}^{(k)}\!\left(y_{i,obs}, y_{i,mis}^{*(j)}\right)}{\sum_{l=1}^{M_i} \hat{\pi}^{(k)}\!\left(y_{i,obs}, y_{i,mis}^{*(l)}\right)}.$$

M-step: Update the joint probability using the fractional weights,
$$\hat{\pi}_{ab}^{(k)} = \frac{1}{\hat{N}^{(k)}} \sum_{i=1}^{n} \sum_{j=1}^{M_i} w_i^{(k)}\, w_{ij}^{*(k)}\, I\!\left(y_{i,1}^{*(j)} = a,\, y_{i,2}^{*(j)} = b\right),$$
where $\hat{N}^{(k)} = \sum_{i=1}^{n} w_i^{(k)}$.
21. Fractional imputation (categorical case)
Example (Cont'd): Final Product

                                           Replication Weights
Weight      x    y1     y2     Rep 1             Rep 2             ...  Rep L
w1          x1   y1,1   y1,2   w1^(1)            w1^(2)            ...  w1^(L)
w2 w*2,1    x2   y2,1   0      w2^(1) w*2,1^(1)  w2^(2) w*2,1^(2)  ...  w2^(L) w*2,1^(L)
w2 w*2,2    x2   y2,1   1      w2^(1) w*2,2^(1)  w2^(2) w*2,2^(2)  ...  w2^(L) w*2,2^(L)
w3 w*3,1    x3   0      y3,2   w3^(1) w*3,1^(1)  w3^(2) w*3,1^(2)  ...  w3^(L) w*3,1^(L)
w3 w*3,2    x3   1      y3,2   w3^(1) w*3,2^(1)  w3^(2) w*3,2^(2)  ...  w3^(L) w*3,2^(L)
w4          x4   y4,1   y4,2   w4^(1)            w4^(2)            ...  w4^(L)
w5          x5   y5,1   y5,2   w5^(1)            w5^(2)            ...  w5^(L)
w6          x6   y6,1   y6,2   w6^(1)            w6^(2)            ...  w6^(L)
22. Fractional imputation (categorical case)
Example (Cont'd): Final Product

                                           Replication Weights
Weight      x    y1     y2     Rep 1             Rep 2             ...  Rep L
w7 w*7,1    x7   0      y7,2   w7^(1) w*7,1^(1)  w7^(2) w*7,1^(2)  ...  w7^(L) w*7,1^(L)
w7 w*7,2    x7   1      y7,2   w7^(1) w*7,2^(1)  w7^(2) w*7,2^(2)  ...  w7^(L) w*7,2^(L)
w8 w*8,1    x8   0      0      w8^(1) w*8,1^(1)  w8^(2) w*8,1^(2)  ...  w8^(L) w*8,1^(L)
w8 w*8,2    x8   0      1      w8^(1) w*8,2^(1)  w8^(2) w*8,2^(2)  ...  w8^(L) w*8,2^(L)
w8 w*8,3    x8   1      0      w8^(1) w*8,3^(1)  w8^(2) w*8,3^(2)  ...  w8^(L) w*8,3^(L)
w8 w*8,4    x8   1      1      w8^(1) w*8,4^(1)  w8^(2) w*8,4^(2)  ...  w8^(L) w*8,4^(L)
w9          x9   y9,1   y9,2   w9^(1)            w9^(2)            ...  w9^(L)
w10         x10  y10,1  y10,2  w10^(1)           w10^(2)           ...  w10^(L)
23. Fractional hot deck imputation (general case)
Goals

Fractional hot deck imputation of size $m$: the final product is a single data file with size $\le n \cdot m$.
Preserves the correlation structure.
Variance estimation is relatively easy.
Can handle domain estimation (even though we do not know in advance which domains will be used).
24. Fractional hot deck imputation
Idea

In hot deck imputation, we can make a nonparametric approximation of $f(\cdot)$ using a finite mixture model,
$$f(y_{i,mis} \mid y_{i,obs}) = \sum_{g=1}^{G} \pi_g(y_{i,obs})\, f_g(y_{i,mis}), \quad (3)$$
where $\pi_g(y_{i,obs}) = P(z_i = g \mid y_{i,obs})$, $f_g(y_{i,mis}) = f(y_{i,mis} \mid z = g)$, and $z$ is the latent variable associated with the imputation cell.

To satisfy the above approximation, we need to find $z$ such that
$$f(y_{i,mis} \mid z_i, y_{i,obs}) = f(y_{i,mis} \mid z_i).$$
25. Fractional hot deck imputation
Imputation cell

Assume $p$-dimensional survey items: $Y = (Y_1, \cdots, Y_p)$.

For each item $k$, transform $Y_k$ into $Z_k$, a discrete version of $Y_k$ based on the sample quantiles among respondents.

If $y_{i,k}$ is missing, then $z_{i,k}$ is also missing.

Imputation cells are created based on the observed values of $z_i = (z_{i,1}, \cdots, z_{i,p})$.

Expression (3) can be written as
$$f(y_{i,mis} \mid y_{i,obs}) = \sum_{z_{mis}} P(z_{i,mis} = z_{mis} \mid z_{i,obs})\, f(y_{i,mis} \mid z_{mis}), \quad (4)$$
where $z_i = (z_{i,obs}, z_{i,mis})$ is partitioned similarly to $y_i = (y_{i,obs}, y_{i,mis})$.
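The quantile discretization of $Y_k$ into $Z_k$ might look like the following sketch with tertiles (the function name and NaN coding for missing items are my own):

```python
import numpy as np

def discretize(y, n_cat=3):
    """Discretize y into n_cat categories using quantiles computed
    from the respondents only; missing stays missing (NaN)."""
    y = np.asarray(y, dtype=float)          # NaN marks a missing item
    obs = y[~np.isnan(y)]
    cuts = np.quantile(obs, np.linspace(0, 1, n_cat + 1)[1:-1])
    z = np.full(y.shape, np.nan)
    z[~np.isnan(y)] = np.digitize(obs, cuts)
    return z

rng = np.random.default_rng(1)
y1 = rng.uniform(0, 2, size=100)
y1[rng.random(100) < 0.3] = np.nan          # roughly 30% missing
z1 = discretize(y1)
```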
26. Fractional hot deck imputation
Estimation of cell probability

Let $\{z_1, \cdots, z_G\}$ be the support of $z$, which is the same as the sample support of $z$ among the full respondents.

Cell probability: $\pi_g = P(z = z_g)$.

For each unit $i$, we only observe $z_{i,obs}$.

Use the EM algorithm for categorical missing data to estimate $\pi_g$.
27. Fractional hot deck imputation
Two-step method for fractional hot deck imputation

1. Given $z_{i,obs}$, identify possible values of $z_{i,mis}$ from the estimated conditional probability $\hat{P}(z_{i,mis} \mid z_{i,obs})$.
2. For each cell $z_i^* = (z_{i,obs}, z_{i,mis}^*)$, select $m_g \ge 2$ imputed values for $y_{i,mis}$ at random from the full respondents in the same cell (joint hot deck within the imputation cell).

This is essentially two-phase stratified sampling: in phase one, stratification (= cell identification) is performed; in phase two, random sampling (= hot deck imputation) is performed within each imputation cell.
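Step 2 above can be sketched as a random draw of $m_g$ donors from the recipient's cell; the cell labels and donor pools below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)

# cell -> observed y values of the full respondents in that cell (toy values)
donors = {
    (0, 1): [0.2, 0.5, 0.7, 0.9],
    (1, 1): [1.1, 1.3, 1.8],
}

def hot_deck(cell, m_g=2):
    """Draw m_g distinct donors at random from the cell's respondent pool."""
    pool = donors[cell]
    idx = rng.choice(len(pool), size=m_g, replace=False)
    return [pool[i] for i in idx]

imputed = hot_deck((0, 1))
```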
28. Fractional hot deck imputation
Example

For example, consider trivariate $Y = (Y_1, Y_2, Y_3)$; $2^3 = 8$ missing patterns are possible.

Imputation cells are created using $Z = (Z_1, Z_2, Z_3)$, where $Z_k$ is a categorization of $Y_k$. If 3 categories are used for each $Z_k$, then $3^3 = 27$ possible cells are created.

Two-phase imputation:
1. For example, if unit $i$ has missingness in $Y_1$ and $Y_2$, we use the observed value of $Z_{3i}$ to identify possible cells (at most 9 possible cells).
2. For each cell with $Z_i = (Z_{1i}^*, Z_{2i}^*, Z_{3i})$, $m_g = 2$ imputed values are chosen at random from the set of full respondents in the same cell.
3. The final fractional weight assigned to $y_i^{*(j)} = (y_{1i}^{*(j)}, y_{2i}^{*(j)}, y_{3i})$ is
$$w_{ij}^* = \hat{P}\!\left(z_{1i}^{*(j)}, z_{2i}^{*(j)} \mid z_{3i}\right) \cdot (1/m_g).$$
29. Variance estimation

The fractional hot deck imputation estimator can be written as
$$\hat{\eta}_{FHDI} = \hat{\eta}_{FEFI} + (\hat{\eta}_{FHDI} - \hat{\eta}_{FEFI}), \quad (5)$$
where $\hat{\eta}_{FHDI}$ is the proposed fractional hot deck imputation estimator and $\hat{\eta}_{FEFI}$ is also a fractional imputation estimator, but one that uses all possible donors in each imputation cell (fully efficient fractional imputation; Kim and Fuller, 2004).

From the decomposition (5), the variance of the fractional hot deck imputation estimator can be estimated via two components:
$$\mathrm{var}(\hat{\eta}_{FHDI}) = \mathrm{var}(\hat{\eta}_{FEFI}) + \mathrm{var}(\hat{\eta}_{FHDI} - \hat{\eta}_{FEFI}). \quad (6)$$
30. Variance estimation (Cont'd)

Two sets of replication weights are needed to account for the two components in (6).

First replication fractional weights $w_{1,ij}^{*(k)}$, $k = 1, \ldots, L$:
$$w_{1,ij}^{*(k)} = w_{ij}^* + R_{ij}^{(k)},$$
where $R_{ij}^{(k)}$ is a regression-adjusted term that depends on the imputation cells of recipient $i$ and guarantees
$$\hat{\eta}_{FHDI,1}^{(k)} - \hat{\eta}_{FHDI} = \hat{\eta}_{FEFI}^{(k)} - \hat{\eta}_{FEFI},$$
that is,
$$\sum_{k=1}^{L} c_k \left(\hat{\eta}_{FHDI,1}^{(k)} - \hat{\eta}_{FHDI}\right)^2 = \sum_{k=1}^{L} c_k \left(\hat{\eta}_{FEFI}^{(k)} - \hat{\eta}_{FEFI}\right)^2,$$
where $\hat{\eta}_{FHDI,1}^{(k)}$ is the first replication estimator and $c_k$ is the factor associated with the $k$-th replicate.
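The quadratic form above is the usual replication variance estimator $\sum_k c_k (\hat{\eta}^{(k)} - \hat{\eta})^2$. As an illustration of that general form (not of the FHDI-specific weights), with delete-one jackknife replicates of a sample mean and $c_k = (L-1)/L$ it reproduces the textbook variance $s^2/n$:

```python
import numpy as np

def replication_variance(eta_reps, eta_full, c):
    """Generic replication variance: sum_k c_k * (eta^(k) - eta)^2."""
    eta_reps = np.asarray(eta_reps, dtype=float)
    return float(np.sum(c * (eta_reps - eta_full) ** 2))

# Delete-one jackknife replicates of the sample mean (toy data).
y = np.array([1.0, 2.0, 3.0, 4.0])
L = len(y)
reps = np.array([np.delete(y, k).mean() for k in range(L)])
v_jk = replication_variance(reps, y.mean(), (L - 1) / L)
```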
31. Fractional hot deck imputation
Variance estimation (Cont'd)

Second replication fractional weights $w_{2,ij}^{*(s)}$, $s = 1, \ldots, G$:

$\mathrm{var}(\hat{\eta}_{FHDI} - \hat{\eta}_{FEFI})$ is essentially the sampling variance due to imputation. It is represented by the sampling variance within each imputation cell ($G$ replicates).

$$w_{2,ij}^{*(s)} = w_{ij}^* + A_{ij}^{(s)},$$
where $A_{ij}^{(s)}$ is an adjustment term that depends on the imputation cells of recipient $i$ and guarantees
$$E\left\{\sum_{s=1}^{G} \left(\hat{\eta}_{FHDI,2}^{(s)} - \hat{\eta}_{FHDI}\right)^2\right\} = \mathrm{var}(\hat{\eta}_{FHDI} - \hat{\eta}_{FEFI}).$$

Note that both sets of replication fractional weights are non-negative, and they sum to one over the imputed values within each recipient.
32. Fractional hot deck imputation
Simulation

Trivariate data:
$$Y_1 \sim U(0, 2),$$
$$Y_2 = 1 + Y_1 + e_2, \quad e_2 \sim N(0, 1/2),$$
$$Y_3 = 2 + Y_1 + 0.5 Y_2 + e_3, \quad e_3 \sim N(0, 1).$$

Response indicators are generated as Bernoulli with $p = (0.5, 0.7, 0.9)$ for $(Y_1, Y_2, Y_3)$, respectively.

Multivariate fractional hot deck imputation was used with $m_g = 2$ imputed values within each cell.

A categorical transformation (with 3 categories) was applied to each of $Y_1$, $Y_2$, and $Y_3$.

Within each imputation cell, joint hot deck imputation was used.

$B = 2{,}000$ simulation samples of size $n = 500$.
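The data-generating model above can be reproduced for one sample as follows (the RNG seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500

# Trivariate data-generating model from the simulation study.
y1 = rng.uniform(0, 2, n)
y2 = 1 + y1 + rng.normal(0, np.sqrt(0.5), n)   # e2 ~ N(0, 1/2)
y3 = 2 + y1 + 0.5 * y2 + rng.normal(0, 1, n)   # e3 ~ N(0, 1)

# Independent Bernoulli response indicators, p = (0.5, 0.7, 0.9).
d1 = rng.random(n) < 0.5
d2 = rng.random(n) < 0.7
d3 = rng.random(n) < 0.9
```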
33. Fractional hot deck imputation
Simulation results: point estimation

Table 1: Point estimation (variance standardized to Complete Data = 100)

Parameter            Method          Mean   Std. Var.
E(Y1)                Complete Data   1.00   100
                     FHDI            1.00   167
E(Y2)                Complete Data   2.00   100
                     FHDI            2.00   134
E(Y3)                Complete Data   4.00   100
                     FHDI            4.00   107
P(Y1 < 1, Y2 < 2)    Complete Data   0.40   100
                     FHDI            0.40   154
34. Fractional hot deck imputation
Simulation results: variance estimation

Table 2: Variance estimation for FHDI

Parameter   Relative Bias (%)
V(θ1)       -1.7
V(θ2)        0.5
V(θ3)        3.4
V(θ4)       -3.4

where θ1 = E(Y1), θ2 = E(Y2), θ3 = E(Y3), and θ4 = P(Y1 < 1, Y2 < 2).
35. Conclusion

Fractional hot deck imputation is considered.
A categorical data transformation was used for the approximation; the method does not rely on parametric model assumptions.
Two-step imputation:
1. Step 1: Allocation of imputation cells.
2. Step 2: Joint hot deck imputation within each imputation cell.
A replication-based approach is used for imputation variance estimation.
To be implemented in SAS: Proc SurveyImpute.