Fractional hot deck imputation for multivariate
missing data in survey sampling
Jae Kwang Kim¹
Iowa State University
March 18, 2015
¹ Joint work with Jongho Im and Wayne Fuller
Introduction
Basic Setup
Assume simple random sampling, for simplicity.
Under complete response, suppose that
$$\hat{\eta}_{n,g} = n^{-1}\sum_{i=1}^{n} g(y_i)$$
is an unbiased estimator of $\eta_g = E\{g(Y)\}$, for known $g(\cdot)$.
$\delta_i = 1$ if $y_i$ is observed and $\delta_i = 0$ otherwise.
$y_i^{*}$: imputed value for $y_i$ for unit $i$ with $\delta_i = 0$.
Imputed estimator of $\eta_g$:
$$\hat{\eta}_{I,g} = n^{-1}\sum_{i=1}^{n}\left\{\delta_i g(y_i) + (1 - \delta_i) g(y_i^{*})\right\}$$
Need $E\{g(y_i^{*}) \mid \delta_i = 0\} = E\{g(y_i) \mid \delta_i = 0\}$.
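As a concrete illustration (not part of the original slides), a minimal numpy sketch of the imputed estimator $\hat{\eta}_{I,g}$, assuming the imputed values $y_i^{*}$ have already been generated; the function name is hypothetical.

```python
import numpy as np

def imputed_estimator(y, delta, y_star, g=lambda v: v):
    """eta_hat_{I,g} = n^{-1} sum_i {delta_i g(y_i) + (1 - delta_i) g(y*_i)}.
    y may contain NaN wherever delta_i = 0."""
    y, y_star = np.asarray(y, float), np.asarray(y_star, float)
    delta = np.asarray(delta).astype(bool)
    # Observed value for respondents, imputed value for nonrespondents.
    return float(np.mean(np.where(delta, g(y), g(y_star))))
```

For instance, `g=lambda v: v <= 3` would target $\eta = P(Y \le 3)$.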
Introduction
ML estimation under missing data setup
Often, we find $x$ (always observed) such that missing at random (MAR) holds: $f(y \mid x, \delta = 0) = f(y \mid x)$.
Imputed values are created from $f(y \mid x)$.
Computing the conditional expectation can be a challenging problem:
1. We do not know the true parameter $\theta$ in $f(y \mid x) = f(y \mid x; \theta)$:
$$E\{g(y_i) \mid x_i\} = E\{g(y_i) \mid x_i; \theta\}.$$
2. Even if we know $\theta$, computing the conditional expectation can be numerically difficult.
Introduction
Imputation
Imputation: a Monte Carlo approximation of the conditional expectation (given the observed data),
$$E\{g(y_i) \mid x_i\} \cong \frac{1}{m}\sum_{j=1}^{m} g\!\left(y_i^{*(j)}\right).$$
1. Bayesian approach: generate $y_i^{*}$ from
$$f(y_i \mid x_i, y_{\mathrm{obs}}) = \int f(y_i \mid x_i, \theta)\, p(\theta \mid x_i, y_{\mathrm{obs}})\, d\theta.$$
2. Frequentist approach: generate $y_i^{*}$ from $f(y_i \mid x_i; \hat{\theta})$, where $\hat{\theta}$ is a consistent estimator.
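To make the frequentist recipe concrete, here is a hedged sketch that draws the $m$ Monte Carlo values $y_i^{*(j)} \sim f(y \mid x_i; \hat{\theta})$ under an assumed normal linear model for $f(y \mid x; \theta)$; the model choice and function name are illustrative, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(2015)

def frequentist_imputation(x, y, delta, m=50):
    """Draw m imputed values per nonrespondent from f(y | x_i; theta_hat),
    assuming y | x ~ N(beta0 + beta1 * x, sigma^2)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    resp = np.asarray(delta).astype(bool)
    # theta_hat = (beta0, beta1, sigma), fitted on respondents only.
    X = np.column_stack([np.ones(resp.sum()), x[resp]])
    beta, *_ = np.linalg.lstsq(X, y[resp], rcond=None)
    sigma = np.sqrt(np.mean((y[resp] - X @ beta) ** 2))
    # m draws per missing unit from N(beta0 + beta1 * x_i, sigma^2).
    mu = beta[0] + beta[1] * x[~resp]
    return mu[:, None] + sigma * rng.standard_normal((mu.size, m))
```

Averaging $g$ over the $m$ draws for unit $i$ then approximates $E\{g(y_i) \mid x_i\}$.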
Introduction
Basic Setup (Cont’d)
Thus, imputation is a computational tool for computing the conditional expectation $E\{g(y_i) \mid x_i\}$ for each missing unit $i$.
To compute the conditional expectation, we need to specify a model $f(y \mid x; \theta)$ evaluated at $\theta = \hat{\theta}$.
Thus, we can write $\hat{\eta}_{I,g} = \hat{\eta}_{I,g}(\hat{\theta})$.
To estimate the variance of $\hat{\eta}_{I,g}$, we need to take into account the sampling variability of $\hat{\theta}$ in $\hat{\eta}_{I,g} = \hat{\eta}_{I,g}(\hat{\theta})$.
Introduction
Basic Setup (Cont’d)
Three approaches
Bayesian approach: multiple imputation by Rubin (1978, 1987),
Rubin and Schenker (1986), etc.
Resampling approach: Rao and Shao (1992), Efron (1994), Rao and
Sitter (1995), Shao and Sitter (1996), Kim and Fuller (2004), Fuller
and Kim (2005).
Linearization approach: Clayton et al. (1998), Shao and Steel (1999), Robins and Wang (2000), Kim and Rao (2009).
Comparison
                      Bayesian                   Frequentist
Model                 Posterior distribution     Prediction model
                      f(latent, θ | data)        f(latent | data, θ)
Computation           Data augmentation          EM algorithm
Prediction            I-step                     E-step
Parameter update      P-step                     M-step
Parameter est'n       Posterior mode             ML estimation
Imputation            Multiple imputation        Fractional imputation
Variance estimation   Rubin's formula            Linearization or Bootstrap
Multiple imputation
The multiple imputation estimator of $\eta$, denoted by $\hat{\eta}_{MI}$, is
$$\hat{\eta}_{MI} = \frac{1}{m}\sum_{j=1}^{m} \hat{\eta}_I^{(j)}.$$
Rubin's variance estimator is
$$\hat{V}_{MI}(\hat{\eta}_{MI}) = W_m + \left(1 + \frac{1}{m}\right) B_m,$$
where $W_m = m^{-1}\sum_{j=1}^{m} \hat{V}^{(j)}$ and $B_m = (m-1)^{-1}\sum_{j=1}^{m} \left(\hat{\eta}_I^{(j)} - \hat{\eta}_{MI}\right)^2$.
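A minimal sketch of Rubin's combining rules shown above, assuming the $m$ completed-data point estimates $\hat{\eta}_I^{(j)}$ and variance estimates $\hat{V}^{(j)}$ are already available:

```python
import numpy as np

def rubin_combine(point_estimates, within_variances):
    """Rubin's rules: eta_hat_MI = mean of the m point estimates,
    V_hat_MI = W_m + (1 + 1/m) * B_m."""
    eta = np.asarray(point_estimates, float)  # eta_hat_I^{(j)}, j = 1..m
    v = np.asarray(within_variances, float)   # V_hat^{(j)}, j = 1..m
    m = eta.size
    w_m = v.mean()                            # within-imputation variance
    b_m = eta.var(ddof=1)                     # between-imputation variance
    return eta.mean(), w_m + (1 + 1 / m) * b_m
```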
Multiple imputation
Rubin's variance estimator is based on the following decomposition,
$$\mathrm{var}(\hat{\eta}_{MI}) = \mathrm{var}(\hat{\eta}_n) + \mathrm{var}(\hat{\eta}_{MI,\infty} - \hat{\eta}_n) + \mathrm{var}(\hat{\eta}_{MI} - \hat{\eta}_{MI,\infty}), \qquad (1)$$
where $\hat{\eta}_n$ is the complete-sample estimator of $\eta$ and $\hat{\eta}_{MI,\infty}$ is the probability limit of $\hat{\eta}_{MI}$ as $m \to \infty$.
Under some regularity conditions, the $W_m$ term estimates the first term, the $B_m$ term estimates the second term, and the $m^{-1}B_m$ term estimates the last term of (1).
Multiple imputation
In particular, Kim et al. (2006, JRSSB) proved that the bias of Rubin's variance estimator is
$$\mathrm{Bias}(\hat{V}_{MI}) \cong -2\,\mathrm{cov}(\hat{\eta}_{MI} - \hat{\eta}_n, \hat{\eta}_n). \qquad (2)$$
The decomposition (1) is equivalent to assuming that $\mathrm{cov}(\hat{\eta}_{MI} - \hat{\eta}_n, \hat{\eta}_n) \cong 0$, which Meng (1994) called the congeniality condition.
The congeniality condition holds when $\hat{\eta}_n$ is the MLE of $\eta$. In such cases, Rubin's variance estimator is asymptotically unbiased.
Multiple imputation
Theorem (Yang and Kim, 2015)
Let $\hat{\eta}_n = n^{-1}\sum_{i=1}^{n} g(y_i)$ be used to estimate $\eta = E\{g(Y)\}$ under complete response. Then, under some regularity conditions, the bias of Rubin's variance estimator is
$$\mathrm{Bias}(\hat{V}_{MI}) \cong 2n^{-1}(1 - p)\, E_0\left[\mathrm{var}\left\{g(Y) - B_g(X)^{\mathrm{T}} S(\theta) \mid X\right\}\right] \ge 0,$$
with equality if and only if $g(Y)$ is a linear function of $S(\theta)$, where $p = E(\delta)$, $S(\theta)$ is the score function of $\theta$ in $f(y \mid x; \theta)$,
$$B_g(X) = \left[\mathrm{var}\{S(\theta) \mid X\}\right]^{-1}\mathrm{cov}\{S(\theta), g(Y) \mid X\},$$
and $E_0(\cdot) = E(\cdot \mid \delta = 0)$.
Multiple imputation
Example
Suppose that you are interested in estimating $\eta = P(Y \le 3)$.
Assume a normal model for $f(y \mid x; \theta)$ for multiple imputation.
Two choices for $\hat{\eta}_n$:
1. Method-of-moments estimator: $\hat{\eta}_{n1} = n^{-1}\sum_{i=1}^{n} I(y_i \le 3)$.
2. Maximum-likelihood estimator:
$$\hat{\eta}_{n2} = n^{-1}\sum_{i=1}^{n} P(Y \le 3 \mid x_i; \hat{\theta}),$$
where $P(Y \le 3 \mid x_i; \hat{\theta}) = \int_{-\infty}^{3} f(y \mid x_i; \hat{\theta})\, dy$.
Rubin's variance estimator is nearly unbiased for $\hat{\eta}_{n2}$, but provides conservative variance estimation for $\hat{\eta}_{n1}$ (30–50% overestimation of the variance in most cases).
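A small sketch contrasting the two complete-sample estimators in this example, assuming a fitted normal linear model $y \mid x \sim N(\beta_0 + \beta_1 x, \sigma^2)$ whose parameters are supplied by the caller (function names are illustrative):

```python
import numpy as np
from scipy.stats import norm

def eta_n1(y):
    """Method-of-moments estimator: sample proportion of I(y_i <= 3)."""
    return float(np.mean(np.asarray(y, float) <= 3.0))

def eta_n2(x, beta0, beta1, sigma):
    """MLE-type estimator: average of P(Y <= 3 | x_i; theta_hat), which is
    Phi((3 - beta0 - beta1 * x_i) / sigma) under the normal model."""
    x = np.asarray(x, float)
    return float(np.mean(norm.cdf((3.0 - beta0 - beta1 * x) / sigma)))
```

This matches the slide's conclusion: Rubin's estimator is nearly unbiased for the model-based $\hat{\eta}_{n2}$ but conservative for $\hat{\eta}_{n1}$.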
Fractional Imputation
Idea (parametric model approach)
Approximate $E\{g(y_i) \mid x_i\}$ by
$$E\{g(y_i) \mid x_i\} \cong \sum_{j=1}^{M_i} w_{ij}^{*}\, g\!\left(y_i^{*(j)}\right),$$
where $w_{ij}^{*}$ is the fractional weight assigned to the $j$-th imputed value of $y_i$.
If $y_i$ is a categorical variable, we can use
$$y_i^{*(j)} = \text{the } j\text{-th possible value of } y_i, \qquad w_{ij}^{*} = P\!\left(y_i = y_i^{*(j)} \mid x_i; \hat{\theta}\right),$$
where $\hat{\theta}$ is the (pseudo) MLE of $\theta$.
Fractional imputation
Features
Split each record with a missing item into $m\,(>1)$ imputed values.
Assign fractional weights.
The final product is a single data file of size $\le nm$.
For variance estimation, the fractional weights are replicated.
Fractional imputation
Example (n = 10)
ID   Weight   y1      y2
1    w1       y1,1    y1,2
2    w2       y2,1    ?
3    w3       ?       y3,2
4    w4       y4,1    y4,2
5    w5       y5,1    y5,2
6    w6       y6,1    y6,2
7    w7       ?       y7,2
8    w8       ?       ?
9    w9       y9,1    y9,2
10   w10      y10,1   y10,2
?: Missing
Fractional imputation (categorical case)
Fractional Imputation Idea
If both y1 and y2 are categorical, then fractional imputation is easy to apply.
There is only a finite number of possible values.
Imputed values = possible values.
The fractional weights are the conditional probabilities of the possible values given the observations.
Can use the "EM by weighting" method of Ibrahim (1990) to compute the fractional weights.
Fractional imputation (categorical case)
Example (y1, y2: dichotomous, taking 0 or 1)
ID   Weight        y1     y2
1    w1            y1,1   y1,2
2    w2·w*2,1      y2,1   0
     w2·w*2,2      y2,1   1
3    w3·w*3,1      0      y3,2
     w3·w*3,2      1      y3,2
4    w4            y4,1   y4,2
5    w5            y5,1   y5,2
Fractional imputation (categorical case)
Example (y1, y2: dichotomous, taking 0 or 1)
ID   Weight        y1     y2
6    w6            y6,1   y6,2
7    w7·w*7,1      0      y7,2
     w7·w*7,2      1      y7,2
8    w8·w*8,1      0      0
     w8·w*8,2      0      1
     w8·w*8,3      1      0
     w8·w*8,4      1      1
9    w9            y9,1   y9,2
10   w10           y10,1  y10,2
Fractional imputation (categorical case)
Example (Cont’d)
E-step: the fractional weights are the conditional probabilities of the imputed values given the observations,
$$w_{ij}^{*} = \hat{P}\!\left(y_{i,\mathrm{mis}}^{*(j)} \mid y_{i,\mathrm{obs}}\right) = \frac{\hat{\pi}\!\left(y_{i,\mathrm{obs}}, y_{i,\mathrm{mis}}^{*(j)}\right)}{\sum_{l=1}^{M_i} \hat{\pi}\!\left(y_{i,\mathrm{obs}}, y_{i,\mathrm{mis}}^{*(l)}\right)},$$
where $(y_{i,\mathrm{obs}}, y_{i,\mathrm{mis}})$ is the (observed, missing) part of $y_i = (y_{i1}, \cdots, y_{i,p})$.
M-step: update the joint probability using the fractional weights,
$$\hat{\pi}_{ab} = \frac{1}{\hat{N}}\sum_{i=1}^{n}\sum_{j=1}^{M_i} w_i w_{ij}^{*}\, I\!\left(y_{i,1}^{*(j)} = a,\; y_{i,2}^{*(j)} = b\right),$$
with $\hat{N} = \sum_{i=1}^{n} w_i$.
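Below is a hedged, self-contained sketch of the "EM by weighting" iteration above for two dichotomous items, with missing values coded as NaN; the function name and coding conventions are illustrative.

```python
import numpy as np

def em_by_weighting(y1, y2, w=None, n_iter=100):
    """EM by weighting (Ibrahim, 1990) for two 0/1 items with missing
    values coded as np.nan. Returns pi_hat[a, b] = P(y1 = a, y2 = b)."""
    y1, y2 = np.asarray(y1, float), np.asarray(y2, float)
    n = y1.size
    w = np.ones(n) if w is None else np.asarray(w, float)
    pi = np.full((2, 2), 0.25)              # uniform starting values
    cells = [(a, b) for a in (0, 1) for b in (0, 1)]
    for _ in range(n_iter):
        tot = np.zeros((2, 2))
        for i in range(n):
            # Imputed values = the possible cells consistent with unit i's
            # observed items (all four cells if both items are missing).
            ok = [(a, b) for a, b in cells
                  if (np.isnan(y1[i]) or a == y1[i])
                  and (np.isnan(y2[i]) or b == y2[i])]
            # E-step: fractional weights w*_{ij} = conditional cell
            # probabilities given the observed part.
            p = np.array([pi[a, b] for a, b in ok])
            p /= p.sum()
            for (a, b), frac in zip(ok, p):
                tot[a, b] += w[i] * frac    # accumulate w_i * w*_{ij}
        pi = tot / w.sum()                  # M-step: update pi_hat_{ab}
    return pi
```

The fractional weights written to the final imputed data file are the conditional probabilities from the last E-step.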
Fractional imputation (categorical case)
Example (Cont'd): Variance estimation
Recompute the fractional weights for each replication.
Apply the same EM algorithm using the replicated weights.
E-step: the fractional weights are the conditional probabilities of the imputed values given the observations,
$$w_{ij}^{*(k)} = \frac{\hat{\pi}^{(k)}\!\left(y_{i,\mathrm{obs}}, y_{i,\mathrm{mis}}^{*(j)}\right)}{\sum_{l=1}^{M_i} \hat{\pi}^{(k)}\!\left(y_{i,\mathrm{obs}}, y_{i,\mathrm{mis}}^{*(l)}\right)}.$$
M-step: update the joint probability using the fractional weights,
$$\hat{\pi}_{ab}^{(k)} = \frac{1}{\hat{N}^{(k)}}\sum_{i=1}^{n}\sum_{j=1}^{M_i} w_i^{(k)} w_{ij}^{*(k)}\, I\!\left(y_{i,1}^{*(j)} = a,\; y_{i,2}^{*(j)} = b\right),$$
where $\hat{N}^{(k)} = \sum_{i=1}^{n} w_i^{(k)}$.
Fractional imputation (categorical case)
Example (Cont'd): Final Product

Weight      x    y1    y2     Rep 1             Rep 2             · · ·  Rep L
w1          x1   y1,1  y1,2   w(1)1             w(2)1             · · ·  w(L)1
w2·w*2,1    x2   y2,1  0      w(1)2·w*(1)2,1    w(2)2·w*(2)2,1    · · ·  w(L)2·w*(L)2,1
w2·w*2,2    x2   y2,1  1      w(1)2·w*(1)2,2    w(2)2·w*(2)2,2    · · ·  w(L)2·w*(L)2,2
w3·w*3,1    x3   0     y3,2   w(1)3·w*(1)3,1    w(2)3·w*(2)3,1    · · ·  w(L)3·w*(L)3,1
w3·w*3,2    x3   1     y3,2   w(1)3·w*(1)3,2    w(2)3·w*(2)3,2    · · ·  w(L)3·w*(L)3,2
w4          x4   y4,1  y4,2   w(1)4             w(2)4             · · ·  w(L)4
w5          x5   y5,1  y5,2   w(1)5             w(2)5             · · ·  w(L)5
w6          x6   y6,1  y6,2   w(1)6             w(2)6             · · ·  w(L)6

Here w(k)_i is the k-th replicate of the base weight and w*(k)_{i,j} the k-th replicate of the fractional weight.
Fractional imputation (categorical case)
Example (Cont'd): Final Product

Weight      x    y1     y2     Rep 1             Rep 2             · · ·  Rep L
w7·w*7,1    x7   0      y7,2   w(1)7·w*(1)7,1    w(2)7·w*(2)7,1    · · ·  w(L)7·w*(L)7,1
w7·w*7,2    x7   1      y7,2   w(1)7·w*(1)7,2    w(2)7·w*(2)7,2    · · ·  w(L)7·w*(L)7,2
w8·w*8,1    x8   0      0      w(1)8·w*(1)8,1    w(2)8·w*(2)8,1    · · ·  w(L)8·w*(L)8,1
w8·w*8,2    x8   0      1      w(1)8·w*(1)8,2    w(2)8·w*(2)8,2    · · ·  w(L)8·w*(L)8,2
w8·w*8,3    x8   1      0      w(1)8·w*(1)8,3    w(2)8·w*(2)8,3    · · ·  w(L)8·w*(L)8,3
w8·w*8,4    x8   1      1      w(1)8·w*(1)8,4    w(2)8·w*(2)8,4    · · ·  w(L)8·w*(L)8,4
w9          x9   y9,1   y9,2   w(1)9             w(2)9             · · ·  w(L)9
w10         x10  y10,1  y10,2  w(1)10            w(2)10            · · ·  w(L)10
Fractional hot deck imputation (general case)
Goals
Fractional hot deck imputation of size m: the final product is a single data file of size ≤ n · m.
Preserves the correlation structure.
Variance estimation is relatively easy.
Can handle domain estimation (even though we do not know in advance which domains will be used).
Fractional hot deck imputation
Idea
In hot deck imputation, we can make a nonparametric approximation of $f(\cdot)$ using a finite mixture model
$$f(y_{i,\mathrm{mis}} \mid y_{i,\mathrm{obs}}) = \sum_{g=1}^{G} \pi_g(y_{i,\mathrm{obs}})\, f_g(y_{i,\mathrm{mis}}), \qquad (3)$$
where $\pi_g(y_{i,\mathrm{obs}}) = P(z_i = g \mid y_{i,\mathrm{obs}})$, $f_g(y_{i,\mathrm{mis}}) = f(y_{i,\mathrm{mis}} \mid z = g)$, and $z$ is the latent variable associated with the imputation cell.
To satisfy the above approximation, we need to find $z$ such that
$$f(y_{i,\mathrm{mis}} \mid z_i, y_{i,\mathrm{obs}}) = f(y_{i,\mathrm{mis}} \mid z_i).$$
Fractional hot deck imputation
Imputation cell
Assume $p$-dimensional survey items: $Y = (Y_1, \cdots, Y_p)$.
For each item $k$, create a transformation of $Y_k$ into $Z_k$, a discrete version of $Y_k$ based on the sample quantiles among the respondents (sketched below).
If $y_{i,k}$ is missing, then $z_{i,k}$ is also missing.
Imputation cells are created based on the observed part of $z_i = (z_{i,1}, \cdots, z_{i,p})$.
Expression (3) can be written as
$$f(y_{i,\mathrm{mis}} \mid y_{i,\mathrm{obs}}) = \sum_{z_{\mathrm{mis}}} P(z_{i,\mathrm{mis}} = z_{\mathrm{mis}} \mid z_{i,\mathrm{obs}})\, f(y_{i,\mathrm{mis}} \mid z_{\mathrm{mis}}), \qquad (4)$$
where $z_i = (z_{i,\mathrm{obs}}, z_{i,\mathrm{mis}})$ is partitioned similarly to $y_i = (y_{i,\mathrm{obs}}, y_{i,\mathrm{mis}})$.
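A minimal sketch of the quantile-based discretization of one item $Y_k$ into $Z_k$ described above; coding missing cells as -1 is an illustrative convention, not from the slides.

```python
import numpy as np

def discretize(y, n_cat=3):
    """Transform Y_k into Z_k: categories 0..n_cat-1 cut at the sample
    quantiles of the respondents; missing y stays missing in z (coded -1)."""
    y = np.asarray(y, float)
    resp = ~np.isnan(y)
    cuts = np.quantile(y[resp], np.arange(1, n_cat) / n_cat)
    z = np.full(y.size, -1, dtype=int)
    z[resp] = np.searchsorted(cuts, y[resp], side="right")
    return z
```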
Fractional hot deck imputation
Estimation of cell probability
Let $\{z_1, \cdots, z_G\}$ be the support of $z$, which is the same as the sample support of $z$ among the full respondents.
Cell probability: $\pi_g = P(z = z_g)$.
For each unit $i$, we only observe $z_{i,\mathrm{obs}}$.
Use the EM algorithm for categorical missing data to estimate $\pi_g$.
Fractional hot deck imputation
Two-step method for fractional hot deck imputation
1. Given $z_{i,\mathrm{obs}}$, identify the possible values of $z_{i,\mathrm{mis}}$ from the estimated conditional probability $\hat{P}(z_{i,\mathrm{mis}} \mid z_{i,\mathrm{obs}})$.
2. For each cell $z_i^{*} = (z_{i,\mathrm{obs}}, z_{i,\mathrm{mis}}^{*})$, select $m_g \ge 2$ imputed values for $y_{i,\mathrm{mis}}$ at random from the full respondents in the same cell (joint hot deck within the imputation cell; see the sketch below).
This is essentially two-phase stratified sampling: in phase one, stratification (= cell identification) is performed; in phase two, random sampling (= hot deck imputation) is performed within each imputation cell.
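A hedged sketch of step 2, assuming every unit already carries a completed cell vector: draw $m_g$ donors at random from the full respondents in the recipient's cell and take their values jointly for the missing items. Names and conventions (NaN for missing) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

def joint_hot_deck(Y, z, recipient, cell, m_g=2):
    """Draw m_g donors from the full respondents whose cell vector equals
    `cell`, and fill the recipient's missing items jointly from each donor
    (joint hot deck within the imputation cell)."""
    Y, z = np.asarray(Y, float), np.asarray(z)
    full_resp = ~np.isnan(Y).any(axis=1)          # fully observed rows
    donors = np.where(full_resp & (z == cell).all(axis=1))[0]
    picks = rng.choice(donors, size=m_g, replace=False)
    mis = np.isnan(Y[recipient])
    # Filling all missing items from one donor preserves the within-donor
    # correlation structure.
    return [np.where(mis, Y[d], Y[recipient]) for d in picks]
```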
Fractional hot deck imputation
Example
For example, consider trivariate $Y = (Y_1, Y_2, Y_3)$.
$2^3 = 8$ missing patterns are possible.
Imputation cells are created using $Z = (Z_1, Z_2, Z_3)$, where $Z_k$ is a categorization of $Y_k$.
If 3 categories are used for each $Z_k$, then $3^3 = 27$ possible cells are created.
Two-phase imputation:
1. For example, if unit $i$ has missingness in $Y_1$ and $Y_2$, we use the observed value of $Z_{3i}$ to identify the possible cells (number of possible cells $\le 9$).
2. For each cell with $Z_i = (Z_{1i}^{*}, Z_{2i}^{*}, Z_{3i})$, $m_g = 2$ imputed values are chosen at random from the set of full respondents in the same cell.
3. The final fractional weight assigned to $y_i^{*(j)} = (y_{1i}^{*(j)}, y_{2i}^{*(j)}, y_{3i})$ is
$$w_{ij}^{*} = \hat{P}\!\left(z_{1i}^{*(j)}, z_{2i}^{*(j)} \mid z_{3i}\right) \cdot (1/m_g).$$
Variance estimation
The final fractional hot deck imputation estimator can be written as
$$\hat{\eta}_{FHDI} = \hat{\eta}_{FEFI} + (\hat{\eta}_{FHDI} - \hat{\eta}_{FEFI}), \qquad (5)$$
where $\hat{\eta}_{FHDI}$ is the proposed fractional hot deck imputation estimator and $\hat{\eta}_{FEFI}$ is the fully efficient fractional imputation estimator, which uses all possible donors in each imputation cell (Kim and Fuller, 2004).
From the decomposition (5), the variance of the fractional hot deck imputation estimator can be estimated in two components,
$$\mathrm{var}(\hat{\eta}_{FHDI}) = \mathrm{var}(\hat{\eta}_{FEFI}) + \mathrm{var}(\hat{\eta}_{FHDI} - \hat{\eta}_{FEFI}). \qquad (6)$$
Variance estimation (Cont’d)
We need two sets of replication weights to account for the two components in (6).
First replication fractional weights $w_{1,ij}^{*(k)}$, $k = 1, \ldots, L$:
$$w_{1,ij}^{*(k)} = w_{ij}^{*} + R_{ij}^{(k)},$$
where $R_{ij}^{(k)}$ is a regression-adjusted term that depends on the imputation cells of recipient $i$ and guarantees
$$\hat{\eta}_{FHDI,1}^{(k)} - \hat{\eta}_{FHDI} = \hat{\eta}_{FEFI}^{(k)} - \hat{\eta}_{FEFI},$$
that is,
$$\sum_{k=1}^{L} c_k\left(\hat{\eta}_{FHDI,1}^{(k)} - \hat{\eta}_{FHDI}\right)^2 = \sum_{k=1}^{L} c_k\left(\hat{\eta}_{FEFI}^{(k)} - \hat{\eta}_{FEFI}\right)^2,$$
where $\hat{\eta}_{FHDI,1}^{(k)}$ is the first replication estimator and $c_k$ is the factor associated with the $k$-th replicate.
Fractional hot deck imputation
Variance estimation (Cont’d)
Second replication fractional weights $w_{2,ij}^{*(s)}$, $s = 1, \ldots, G$:
$\mathrm{var}(\hat{\eta}_{FHDI} - \hat{\eta}_{FEFI})$ is essentially the sampling variance due to imputation.
Represent the sampling variance within each imputation cell ($G$ replicates).
$$w_{2,ij}^{*(s)} = w_{ij}^{*} + A_{ij}^{(s)},$$
where $A_{ij}^{(s)}$ is an adjustment term that depends on the imputation cells of recipient $i$ and guarantees
$$E\left\{\sum_{s=1}^{G}\left(\hat{\eta}_{FHDI,2}^{(s)} - \hat{\eta}_{FHDI}\right)^2\right\} = \mathrm{var}(\hat{\eta}_{FHDI} - \hat{\eta}_{FEFI}).$$
Note that both sets of replication weights are non-negative and sum to one over the imputed values within each recipient.
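Both sets of replicates feed the standard replicate-variance formula $\sum_k c_k (\hat{\eta}^{(k)} - \hat{\eta})^2$; a generic sketch, where the default jackknife factors $c_k = (L-1)/L$ are an illustrative assumption:

```python
import numpy as np

def replication_variance(eta_hat, eta_reps, c=None):
    """Combine replicate estimates: V_hat = sum_k c_k (eta^{(k)} - eta)^2.
    With a delete-one jackknife, c_k = (L - 1) / L for every replicate."""
    eta_reps = np.asarray(eta_reps, float)
    L = eta_reps.size
    c = np.full(L, (L - 1) / L) if c is None else np.asarray(c, float)
    return float(np.sum(c * (eta_reps - eta_hat) ** 2))
```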
Fractional hot deck imputation
Simulation
Trivariate data:
$Y_1 \sim U(0, 2)$
$Y_2 = 1 + Y_1 + e_2$, with $e_2 \sim N(0, 1/2)$
$Y_3 = 2 + Y_1 + 0.5\,Y_2 + e_3$, with $e_3 \sim N(0, 1)$
The response indicators are generated as independent Bernoulli with $p = (0.5, 0.7, 0.9)$ for $(Y_1, Y_2, Y_3)$, respectively.
Multivariate fractional hot deck imputation was used with $m_g = 2$ imputed values within each cell.
A categorical transformation (with 3 categories) was applied to each of $Y_1$, $Y_2$, and $Y_3$.
Within each imputation cell, joint hot deck imputation was used.
$B = 2{,}000$ simulation samples of size $n = 500$ (data-generation sketch below).
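A sketch of the data-generating process for one simulation sample, following the model above; the seed and function name are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(500)

def generate_sample(n=500):
    """One sample from the trivariate model, with independent Bernoulli
    response indicators p = (0.5, 0.7, 0.9) for (Y1, Y2, Y3)."""
    y1 = rng.uniform(0, 2, n)
    y2 = 1 + y1 + rng.normal(0, np.sqrt(0.5), n)      # e2 ~ N(0, 1/2)
    y3 = 2 + y1 + 0.5 * y2 + rng.normal(0, 1, n)      # e3 ~ N(0, 1)
    Y = np.column_stack([y1, y2, y3])
    delta = rng.random((n, 3)) < np.array([0.5, 0.7, 0.9])
    return np.where(delta, Y, np.nan), delta          # NaN marks missing
```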
Fractional hot deck imputation
Simulation results: point estimation
Table 1: Point estimation

Parameter            Method          Mean   Std Var. (Complete Data = 100)
E(Y1)                Complete Data   1.00   100
                     FHDI            1.00   167
E(Y2)                Complete Data   2.00   100
                     FHDI            2.00   134
E(Y3)                Complete Data   4.00   100
                     FHDI            4.00   107
E(Y1 < 1, Y2 < 2)    Complete Data   0.40   100
                     FHDI            0.40   154
Fractional hot deck imputation
Simulation results: variance estimation
Table 2: Variance estimation for FHDI

Parameter   Relative Bias (%)
V(θ̂1)            -1.7
V(θ̂2)             0.5
V(θ̂3)             3.4
V(θ̂4)            -3.4

where θ1 = E(Y1), θ2 = E(Y2), θ3 = E(Y3), and θ4 = E(Y1 < 1, Y2 < 2).
Conclusion
Fractional hot deck imputation is considered.
A categorical data transformation was used for the approximation.
It does not rely on parametric model assumptions.
Two-step imputation:
1. Step 1: Allocation to imputation cells.
2. Step 2: Joint hot deck imputation within each imputation cell.
A replication-based approach is used for imputation variance estimation.
To be implemented in SAS: Proc SurveyImpute.