Double Robustness: 
Theory and Applications with Missing Data 
Lu Mao 
Department of Biostatistics 
The University of North Carolina at Chapel Hill 
Email: lmao@unc.edu 
April 17, 2013 
Table of Contents 
Part I: A Semiparametric Perspective 
A motivating example 
Semiparametric approaches to coarsened data 
Constructing the estimating equation 
Part II: Applications in Missing Data Problems 
Data with two levels of missingness 
Monotone coarsened data 
Part I: A Semiparametric Perspective 
A Motivating Example 
• Given an iid sample $Y_1, \dots, Y_n$ from an arbitrary distribution, consider the estimation of the population mean $\mu = E(Y)$ by $\bar{Y}$, which solves
$$\mathbb{P}_n(Y - \mu) = 0,$$
where $\mathbb{P}_n Z \equiv n^{-1} \sum_{i=1}^n Z_i$.
• Suppose some of the $Y_i$'s are missing. Let $R_i = 1$ if $Y_i$ is observed and $R_i = 0$ otherwise. Let $\pi(Y) = P(R = 1 \mid Y)$. Now consider estimating $\mu$ by solving
$$\mathbb{P}_n R(Y - \mu) = 0,$$
resulting in
$$\hat{\mu}_{CC} = \frac{\sum_i R_i Y_i}{\sum_i R_i} \;\to_p\; \frac{E[\pi(Y) Y]}{E[\pi(Y)]} \neq \mu.$$
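To see the complete-case bias concretely, here is a minimal simulation sketch (the normal outcome and the logistic form of $\pi(Y)$ are assumptions of this sketch, chosen so that larger $Y$'s are observed more often):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
y = rng.normal(loc=1.0, scale=1.0, size=n)       # true mean mu = 1
pi_y = 1 / (1 + np.exp(-(0.5 + y)))              # pi(Y) = P(R=1|Y), increasing in Y
r = rng.binomial(1, pi_y)

mu_cc = y[r == 1].mean()                         # complete-case estimator
print(f"complete-case mean: {mu_cc:.3f} (truth: 1.000)")
# Prints a value clearly above 1, illustrating E[pi(Y)Y]/E[pi(Y)] != mu.
```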
• Suppose that in addition to $Y_i$, an auxiliary variable $X_i$ is also collected, and $R \perp Y \mid X$. Assume $P(R = 1 \mid Y, X) = \pi(X; \psi)$. To correct the bias, we apply the estimating equation ($\hat{\psi}_n$ is a consistent estimator of $\psi_0$)
$$\Psi_n^{IPW} = \mathbb{P}_n\left\{ \frac{R}{\pi(X; \hat{\psi}_n)} (Y - \mu) \right\},$$
resulting in
$$\hat{\mu}_{IPW} = \frac{\mathbb{P}_n[R Y / \pi(X; \hat{\psi}_n)]}{\mathbb{P}_n[R / \pi(X; \hat{\psi}_n)]}
= \frac{\mathbb{P}_n[R Y / \pi(X; \psi_0)]}{\mathbb{P}_n[R / \pi(X; \psi_0)]} + o_p(1)
\;\to_p\; \frac{E[R Y / \pi(X; \psi_0)]}{E[R / \pi(X; \psi_0)]} = \mu. \tag{1}$$
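A minimal sketch of the IPW correction under MAR, with the propensity $\pi(X;\psi)$ fit by logistic regression (the data-generating values and the use of statsmodels are assumptions of this sketch):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 50_000
x = rng.normal(size=n)
y = 1.0 + x + rng.normal(size=n)                         # mu = E[Y] = 1
r = rng.binomial(1, 1 / (1 + np.exp(-(0.3 + 1.2 * x))))  # missingness depends on X only

X = sm.add_constant(x)
pi_hat = sm.Logit(r, X).fit(disp=0).predict(X)           # pi(X; psi_hat)

mu_ipw = np.sum(r * y / pi_hat) / np.sum(r / pi_hat)     # solves Pn{R(Y - mu)/pi} = 0
print(f"IPW mean: {mu_ipw:.3f} (truth: 1.000)")
```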
• Assume $m(X; \xi) = E(Y \mid X; \xi)$, and consider a new estimating equation as a modification of $\Psi_n^{IPW}$:
$$\Psi_n^{DR} = \mathbb{P}_n\left\{ \frac{R}{\pi(X; \hat{\psi}_n)}(Y - \mu) - \frac{R - \pi(X; \hat{\psi}_n)}{\pi(X; \hat{\psi}_n)}\big(m(X; \hat{\xi}_n) - \mu\big) \right\}, \tag{2}$$
resulting in
$$\hat{\mu}_{DR} = \mathbb{P}_n\left\{ \frac{R}{\pi(X; \hat{\psi}_n)} Y - \frac{R - \pi(X; \hat{\psi}_n)}{\pi(X; \hat{\psi}_n)}\, m(X; \hat{\xi}_n) \right\}. \tag{3}$$
Now let's study the consistency of $\hat{\mu}_{DR}$ under different assumptions.
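Continuing the sketch above, (3) is a single line of code once $\hat{\pi}$ and $\hat{m}$ are in hand; here $m(X;\xi)$ is fit by least squares on the complete cases (both working models happen to be correct in this simulated example):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 50_000
x = rng.normal(size=n)
y = 1.0 + x + rng.normal(size=n)                         # mu = 1
r = rng.binomial(1, 1 / (1 + np.exp(-(0.3 + 1.2 * x))))

X = sm.add_constant(x)
pi_hat = sm.Logit(r, X).fit(disp=0).predict(X)           # pi(X; psi_hat)
m_hat = sm.OLS(y[r == 1], X[r == 1]).fit().predict(X)    # m(X; xi_hat), complete cases

mu_dr = np.mean(r * y / pi_hat - (r - pi_hat) / pi_hat * m_hat)  # equation (3)
print(f"DR mean: {mu_dr:.3f} (truth: 1.000)")
```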
• Scenario 1. $\pi(X; \psi)$ correct; $m(X; \xi)$ incorrect. So $\hat{\psi}_n \to_p \psi_0$, but $\hat{\xi}_n \to \xi^*$, with $m(X; \xi^*) \neq E(Y \mid X)$:
$$\begin{aligned}
\hat{\mu}_{DR} &= \mathbb{P}_n\left\{ \frac{R}{\pi(X; \psi_0)} Y - \frac{R - \pi(X; \psi_0)}{\pi(X; \psi_0)}\, m(X; \xi^*) \right\} + o_p(1) \\
&\to_p E\left[ \frac{R}{\pi(X; \psi_0)} Y \right] - E\left[ \frac{R - \pi(X; \psi_0)}{\pi(X; \psi_0)}\, m(X; \xi^*) \right] \\
&= E\left[ Y\, E\left\{ \frac{R}{\pi(X; \psi_0)} \,\Big|\, Y, X \right\} \right] - E\left[ m(X; \xi^*)\, E\left\{ \frac{R - \pi(X; \psi_0)}{\pi(X; \psi_0)} \,\Big|\, Y, X \right\} \right] \\
&= \mu - 0 = \mu, \tag{4}
\end{aligned}$$
since $E(R \mid Y, X) = \pi(X; \psi_0)$ when the $\pi$ model is correct.
• Scenario 2. $m(X; \xi)$ correct; $\pi(X; \psi)$ incorrect. So $\hat{\xi}_n \to_p \xi_0$, but $\hat{\psi}_n \to \psi^*$, with $\pi(X; \psi^*) \neq P(R = 1 \mid Y, X)$:
$$\begin{aligned}
\hat{\mu}_{DR} &= \mathbb{P}_n\left\{ \frac{R}{\pi(X; \psi^*)} Y - \frac{R - \pi(X; \psi^*)}{\pi(X; \psi^*)}\, m(X; \xi_0) \right\} + o_p(1) \\
&\to_p E\left[ \frac{R}{\pi(X; \psi^*)} Y \right] - E\left[ \frac{R - \pi(X; \psi^*)}{\pi(X; \psi^*)}\, m(X; \xi_0) \right] \\
&= E\left[ \frac{R}{\pi(X; \psi^*)}\, E(Y \mid R, X) \right] - E\left[ \frac{R - \pi(X; \psi^*)}{\pi(X; \psi^*)}\, m(X; \xi_0) \right] \\
&= E\left[ \frac{R}{\pi(X; \psi^*)}\, m(X; \xi_0) \right] - E\left[ \frac{R - \pi(X; \psi^*)}{\pi(X; \psi^*)}\, m(X; \xi_0) \right] \\
&= E[m(X; \xi_0)] = \mu, \tag{5}
\end{aligned}$$
since $E(Y \mid R, X) = E(Y \mid X) = m(X; \xi_0)$ under MAR when the $m$ model is correct.
Result 1 (Double robustness)
$\hat{\mu}_{DR}$ is consistent if either the $\pi$ model or the $m$ model is correct, that is, under $\mathcal{M}_1 \cup \mathcal{M}_2$, where $\mathcal{M}_1 = \{p(r \mid y, x; \psi) : \psi \in \Psi\}$ and $\mathcal{M}_2 = \{p(y \mid x; \xi) : \xi \in \Xi\}$. In other words, $\hat{\mu}_{DR}$ is doubly robust.
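A quick numerical check of Result 1; in this sketch the "wrong" model is deliberately misspecified by dropping the covariate (intercept-only design), while the "right" one uses the true design:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200_000
x = rng.normal(size=n)
y = 1.0 + x + rng.normal(size=n)                         # mu = 1
r = rng.binomial(1, 1 / (1 + np.exp(-(0.3 + 1.2 * x))))

X1 = sm.add_constant(x)                                  # correct design
X0 = np.ones((n, 1))                                     # misspecified: intercept only

def mu_dr(pi_design, m_design):
    pi_hat = sm.Logit(r, pi_design).fit(disp=0).predict(pi_design)
    m_hat = sm.OLS(y[r == 1], m_design[r == 1]).fit().predict(m_design)
    return np.mean(r * y / pi_hat - (r - pi_hat) / pi_hat * m_hat)

print(f"pi right, m wrong: {mu_dr(X1, X0):.3f}")         # Scenario 1, still near 1
print(f"pi wrong, m right: {mu_dr(X0, X1):.3f}")         # Scenario 2, still near 1
```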
• Now, let's consider a somewhat different question: efficiency under $\mathcal{M}_1 \cap \mathcal{M}_2$. For simplicity, we assume we know the true values $(\psi_0, \xi_0)$.
• Denote $\mathbb{G}_n g(Z) = n^{-1/2} \sum_{i=1}^n [g(Z_i) - E g(Z)]$. Algebraic manipulations yield:
$$\sqrt{n}(\hat{\mu}_{IPW} - \mu) = \frac{1}{\mathbb{P}_n[R / \pi(X; \psi_0)]}\, \mathbb{G}_n\left\{ \frac{R}{\pi(X; \psi_0)}(Y - \mu) \right\} \rightsquigarrow N(0, \sigma^2_{IPW}), \tag{6}$$
• where
$$\sigma^2_{IPW} = E\left[\left\{\frac{R}{\pi(X; \psi_0)}(Y - \mu)\right\}^2\right] = E\left[\frac{(Y - \mu)^2}{\pi(X; \psi_0)}\right].$$
• Similarly,
$$\sqrt{n}(\hat{\mu}_{DR} - \mu) = \mathbb{G}_n\left\{ \frac{R}{\pi(X; \psi_0)}(Y - \mu) - \frac{R - \pi(X; \psi_0)}{\pi(X; \psi_0)}\big(m(X; \xi_0) - \mu\big) \right\} \rightsquigarrow N(0, \sigma^2_{DR}), \tag{7}$$
where
$$\begin{aligned}
\sigma^2_{DR} ={}& E\left[\left\{\frac{R}{\pi(X; \psi_0)}(Y - \mu)\right\}^2\right]
- 2 E\left[\frac{R}{\pi(X; \psi_0)}(Y - \mu) \cdot \frac{R - \pi(X; \psi_0)}{\pi(X; \psi_0)}\big(m(X; \xi_0) - \mu\big)\right] \\
&+ E\left[\left\{\frac{R - \pi(X; \psi_0)}{\pi(X; \psi_0)}\big(m(X; \xi_0) - \mu\big)\right\}^2\right].
\end{aligned}$$
•
$$\begin{aligned}
\sigma^2_{DR} &= E\left[\frac{(Y - \mu)^2}{\pi(X; \psi_0)}\right]
- 2 E\left[\frac{1 - \pi(X; \psi_0)}{\pi(X; \psi_0)}\big(m(X; \xi_0) - \mu\big)^2\right]
+ E\left[\frac{1 - \pi(X; \psi_0)}{\pi(X; \psi_0)}\big(m(X; \xi_0) - \mu\big)^2\right] \\
&= \sigma^2_{IPW} - E\left[\frac{1 - \pi(X; \psi_0)}{\pi(X; \psi_0)}\big(m(X; \xi_0) - \mu\big)^2\right].
\end{aligned}$$
• Denote $\varphi_{IPW} = \frac{R}{\pi(X; \psi_0)}(Y - \mu)$, $\varphi_A = -\frac{R - \pi(X; \psi_0)}{\pi(X; \psi_0)}\big(m(X; \xi_0) - \mu\big)$, and $\varphi_{DR} = \varphi_{IPW} + \varphi_A$. Consider the Hilbert space $L_2(P)$. Since $\hat{\mu}_{IPW}$ and $\hat{\mu}_{DR}$ have influence functions $\varphi_{IPW}$ and $\varphi_{DR}$ respectively, their squared lengths ($\|\cdot\|^2 \equiv E(\cdot)^2$) are the asymptotic variances of $\hat{\mu}_{IPW}$ and $\hat{\mu}_{DR}$.
• The following figure provides a geometric illustration:
[Figure: A geometric interpretation of the efficiency improvement by the DR estimator.]

Result 2 (Efficiency of DR)
$\hat{\mu}_{DR}$ is more efficient than $\hat{\mu}_{IPW}$ under $\mathcal{M}_1 \cap \mathcal{M}_2$.
Remark 1.1
The above example suggests that:
• For a full data problem, there is a natural extension, via the IPW (inverse probability weighting) method, to a corresponding missing data problem;
• By positing a working model $p(z_{mis} \mid z_{obs}; \xi)$, the IPW estimating equation can be modified by adding a suitable augmentation term, resulting in an estimator that is still consistent even if the working model $p(z_{mis} \mid z_{obs}; \xi)$ is not correct;
• If instead $p(z_{mis} \mid z_{obs}; \xi)$ is correct, the new estimator is consistent even if the missingness mechanism is incorrectly modeled. In this sense, the new estimator is doubly robust;
• The doubly robust estimator has improved efficiency if both models are correct.
Semiparametric approaches to coarsened data
• First we introduce the terminology of coarsening, which contains missing data as a special case:

Definition 1.2 (Coarsening)
Suppose the full data consist of iid observations of an $l$-dimensional random vector $Z$. Define a coarsening variable $C$ such that when $C = r$, we only observe $G_r(Z)$, where $G_r(\cdot)$ is a many-to-one function. Further denote $C = \infty$ if $Z$ is completely observed (no coarsening), that is, $G_\infty(Z) = Z$. Thus, the observed data consist of iid copies of $(C, G_C(Z))$.

Definition 1.3 (Coarsening at random)
The data are said to be coarsened at random (CAR) if $C \perp Z \mid G_C(Z)$.

Remark 1.4 (Assumption)
All problems considered here are under the assumption of CAR.
Terminology
• $Z$: full data;
• $G_C(Z)$: observed data;
• $(C, G_C(Z))$: coarsened data.
• Semiparametric models arise naturally in coarsened data problems.
• Consider a full data regression model, $z = (y, x)'$:
$$p(z \mid \beta, \eta) = p(y \mid x; \beta)\, p(x; \eta),$$
where $\beta$ is the regression parameter, and $\eta$ is infinite dimensional (e.g., an arbitrary cdf $F$ for $x$).
• Now suppose some components of $x$ are missing (at random); then the likelihood becomes
$$q(y, x_{obs}, r \mid \beta, \eta, \psi) = p(r \mid y, x_{obs}; \psi) \int p(y \mid x; \beta)\, p(x; \eta)\, dx_{mis}.$$
• Now the infinite dimensional nuisance $\eta$ cannot be ignored. Hence we have arrived at a semiparametric model.
• Let's review some basic theory of semiparametric inference. We assume as previously that $\beta$, the parameter of interest, is $p$-dimensional, and $\eta$ is a possibly infinite-dimensional nuisance parameter.

Definition 1.5 (RAL and influence function)
The estimator $\hat{\beta}_n$ is regular asymptotically linear (RAL) if
$$\sqrt{n}(\hat{\beta}_n - \beta_0) = \mathbb{G}_n \tilde{\varphi}_{\beta_0, \eta_0} + o_p(1). \tag{8}$$
The mean-zero function $\tilde{\varphi}_{\beta_0, \eta_0}$ is said to be the influence function of $\hat{\beta}_n$.

Remark 1.6 (RAL estimator)
If (8) holds, by the CLT we easily have
$$\sqrt{n}(\hat{\beta}_n - \beta_0) \rightsquigarrow N(0, E\,\tilde{\varphi}^{\otimes 2}).$$
Definition 1.7 (Tangent spaces)
Let $\mathcal{H}$ denote the Hilbert space of all mean-zero functions in $L_2(P)$. $\Lambda_\beta$, the tangent space for $\beta$, is defined as the linear span, in $\mathcal{H}$, of the score function $S_\beta = \partial \log p(z \mid \beta_0, \eta_0)/\partial\beta$:
$$\Lambda_\beta = \{B S_\beta : B \in \mathbb{R}^{p \times p}\}.$$
Similarly, the nuisance tangent space $\Lambda_\eta$ is defined as the linear span of the union, over all one-dimensional parametric submodels $\{\eta_\gamma\}$, of the score functions
$$S_\gamma = \frac{\partial}{\partial\gamma} \log p(z \mid \beta_0, \eta_\gamma)\Big|_{\gamma=0}.$$
• The following important theorem provides a characterization of all influence functions of semiparametric RAL estimators $\hat{\beta}_n$.

Theorem 1.8 (The space of influence functions for $\beta$)
The space of influence functions of RAL estimators for $\beta$ consists of all $\tilde{\varphi}$ satisfying:
• $\tilde{\varphi}$ is orthogonal to $\Lambda_\eta$, i.e. $\tilde{\varphi} \in \Lambda_\eta^\perp$;
• $E[\tilde{\varphi}\, S_\beta^T] = I_{p \times p}$.

Remark 1.9 (Z-estimation)
Consider estimating $\beta$ by solving $\mathbb{P}_n \varphi_\beta = 0$, where $E\varphi_{\beta_0} = 0$. Then by standard Z-estimation theory,
$$\sqrt{n}(\hat{\beta}_n - \beta_0) = -\{E\dot{\varphi}_{\beta_0}\}^{-1}\, \mathbb{G}_n \varphi_{\beta_0} + o_p(1), \qquad \dot{\varphi}_\beta = \partial\varphi_\beta/\partial\beta^T.$$
Remark 1.10 (Z-estimation with estimated nuisance)
In the presence of a nuisance parameter $\eta$, the estimating equation generally involves $\eta$. A natural strategy is to insert a consistent estimator $\hat{\eta}_n$ and solve
$$\mathbb{P}_n \varphi_{\beta, \hat{\eta}_n} = 0, \quad \text{where } E\varphi_{\beta_0, \eta_0} = 0.$$
Now,
$$\begin{aligned}
\sqrt{n}(\hat{\beta}_n - \beta_0)
&= -\{E\dot{\varphi}_{\beta_0}\}^{-1}\left\{ \mathbb{G}_n \varphi_{\beta_0, \hat{\eta}_n} + \sqrt{n}\, E\varphi_{\beta_0, \hat{\eta}_n} \right\} + o_p(1) \\
&= -\{E\dot{\varphi}_{\beta_0}\}^{-1}\left\{ \mathbb{G}_n \varphi_{\beta_0, \eta_0} + E[\varphi_{\beta_0, \eta_0} S_\eta^T]\, \sqrt{n}(\hat{\eta}_n - \eta_0) \right\} + o_p(1).
\end{aligned}$$
If $\varphi$ is constructed such that $\varphi_{\beta_0, \eta_0} \in \Lambda_\eta^\perp$, we have $E[\varphi_{\beta_0, \eta_0} S_\eta] = 0$, and so
$$\sqrt{n}(\hat{\beta}_n - \beta_0) = -\{E\dot{\varphi}_{\beta_0}\}^{-1}\, \mathbb{G}_n \varphi_{\beta_0, \eta_0} + o_p(1),$$
which is asymptotically equivalent to the estimator solving $\mathbb{P}_n \varphi_{\beta, \eta_0} = 0$.
In the following development of methods for coarsened data, we start with the assumption that the full data problem $p(z \mid \beta, \eta)$ is well studied. This includes that
• the full data tangent spaces $\Lambda^F_\beta$ and $\Lambda^F_\eta$ are completely characterized;
• we have a full data estimating function $\varphi(Z) \in \Lambda^{F\perp}_\eta$.
The likelihood for the coarsened data, consisting of $(C_i, G_{C_i}(Z_i))$, is
$$q(r, g_r \mid \beta, \eta, \psi) = \pi(r, g_r; \psi) \int_{z:\, G_r(z) = g_r} p(z \mid \beta, \eta)\, d\nu(z). \tag{9}$$
Now the nuisance parameter consists of $(\eta, \psi)$.
• We start by investigating the relationships between the coarsened data tangent spaces and their full data counterparts.
• Consider
$$\begin{aligned}
S_\beta(r, g_r; \beta, \eta, \psi) &= \frac{\partial}{\partial\beta} \log q(r, g_r \mid \beta, \eta, \psi)
= \frac{\int_{z:G_r(z)=g_r} \{\partial p(z \mid \beta, \eta)/\partial\beta\}\, d\nu(z)}{\int_{z:G_r(z)=g_r} p(z \mid \beta, \eta)\, d\nu(z)} \\
&= \frac{\int_{z:G_r(z)=g_r} \{\partial \log p(z \mid \beta, \eta)/\partial\beta\}\, p(z \mid \beta, \eta)\, d\nu(z)}{\int_{z:G_r(z)=g_r} p(z \mid \beta, \eta)\, d\nu(z)}
= E[S^F_\beta(Z) \mid C = r, G_r(Z) = g_r].
\end{aligned}$$
• Similarly, we have the following theorem about $\Lambda_\eta$:

Theorem 1.11 (Characterization of $\Lambda_\eta$)
The coarsened data nuisance tangent space for $\eta$ is characterized by
$$\Lambda_\eta = \{E[\alpha^F(Z) \mid C, G_C(Z)] : \alpha^F \in \Lambda^F_\eta\}. \tag{10}$$
• Remember that the important task is to characterize $\Lambda_\eta^\perp$, which will aid us in constructing coarsened data estimating equations for $\beta$.

Theorem 1.12 (Characterization of $\Lambda_\eta^\perp$)
The space $\Lambda_\eta^\perp$ consists of all elements $h(C, G_C(Z)) \in \mathcal{H}$ such that
$$E[h(C, G_C(Z)) \mid Z] \in \Lambda^{F\perp}_\eta. \tag{11}$$
Proof.
By Theorem 1.11, the space $\Lambda_\eta^\perp$ consists of all elements $h(C, G_C(Z)) \in \mathcal{H}$ such that
$$E\{h(C, G_C(Z))\, E[\alpha^F(Z) \mid C, G_C(Z)]\} = 0, \quad \forall\, \alpha^F(Z) \in \Lambda^F_\eta.$$
This is equivalent to $E\{h(C, G_C(Z))\, \alpha^F(Z)\} = 0$, which is equivalent to
$$E\{\alpha^F(Z)\, E[h(C, G_C(Z)) \mid Z]\} = 0. \qquad \square$$

Remark 1.13 (A linear operator perspective)
Define the linear operator $\mathcal{K}: \mathcal{H} \to \mathcal{H}^F$ by $\mathcal{K}(\cdot) = E[\cdot \mid Z]$. Then
$$\Lambda_\eta^\perp = \mathcal{K}^{-1}(\Lambda^{F\perp}_\eta). \tag{12}$$
Given $\varphi(Z) \in \Lambda^{F\perp}_\eta$, the inverse operation $\mathcal{K}^{-1}(\varphi(Z))$ will provide us a usable collection of estimating functions.
Constructing the estimating equation

Theorem 1.14 (The space $\mathcal{K}^{-1}(\varphi(Z))$)
If $u(C, G_C(Z)) \in \mathcal{H}$ is such that $E[u(C, G_C(Z)) \mid Z] = \varphi(Z)$, then
$$\mathcal{K}^{-1}(\varphi(Z)) = u(C, G_C(Z)) + \mathcal{K}^{-1}(0).$$

Definition 1.15 (Augmentation space)
We denote $\mathcal{A} = \mathcal{K}^{-1}(0)$, and call it the augmentation space.

Corollary 1.16
Assume $\pi(\infty, Z; \psi_0) = P(C = \infty \mid Z; \psi_0) > 0$ a.s. Then
$$\mathcal{K}^{-1}(\varphi(Z)) = \left\{ \frac{I(C = \infty)\, \varphi(Z)}{\pi(\infty, Z; \psi_0)} + h(C, G_C(Z)) : h \in \mathcal{A} \right\}. \tag{13}$$
• Suppose $\hat{\psi}_n$ is an efficient estimator of $\psi_0$. Take $h \equiv 0$, and we obtain the inverse probability weighted (IPW) estimating equation
$$\Psi_n^{IPW} = \mathbb{P}_n \frac{I(C = \infty)\, \varphi_\beta(Z)}{\pi(\infty, Z; \hat{\psi}_n)}.$$
• In practice, the choice of $h \in \mathcal{A}$ will be based on efficiency considerations. We have the following theorem regarding the influence function resulting from the estimating function $\mathbb{P}_n \varphi^h_{\beta, \hat{\psi}_n}$:

Theorem 1.17
The influence function for the $\hat{\beta}_n$ solving $\mathbb{P}_n \varphi^h_{\beta, \hat{\psi}_n} = 0$ is
$$\tilde{\varphi}^h = (E\dot{\varphi}_{\beta_0})^{-1}\left\{ U_h - \Pi[U_h \mid \Lambda_\psi] \right\}, \qquad U_h = \frac{I(C = \infty)\, \varphi_{\beta_0}(Z)}{\pi(\infty, Z; \psi_0)} + h(C, G_C(Z)). \tag{14}$$

Remark 1.18 ($\Lambda_\psi \subset \mathcal{A}$)
By calculus we easily obtain $E[S_\psi \mid Z] = 0$, and therefore $\Lambda_\psi \subset \mathcal{A}$.
From a geometric point of view, we easily get the following result:

Theorem 1.19 (Efficiency among the $\tilde{\varphi}^h$)
$$\operatorname*{arg\,min}_{h \in \mathcal{A}} \|\tilde{\varphi}^h\|^2 \quad \text{is attained at} \quad h = -\Pi\left[ \frac{I(C = \infty)\varphi(Z)}{\pi(\infty, Z; \psi_0)} \,\Big|\, \mathcal{A} \right],$$
resulting in the estimating equation
$$\Psi_n^{DR} = \mathbb{P}_n\left\{ \frac{I(C = \infty)\varphi(Z)}{\pi(\infty, Z; \psi_0)} - \Pi\left[ \frac{I(C = \infty)\varphi(Z)}{\pi(\infty, Z; \psi_0)} \,\Big|\, \mathcal{A} \right] \right\}. \tag{15}$$
• Typically, calculating the projection $\Pi[\cdot \mid \mathcal{A}]$ requires us to posit working parametric models $p(z \mid \xi)$. But the DR estimating equation will still be valid even if $p(z \mid \xi)$ does not contain the truth.
We conclude this section with a theorem characterizing the augmentation space $\mathcal{A}$.

Theorem 1.20 (Characterization of $\mathcal{A}$)
The space $\mathcal{A}$ consists of all elements that can be written as
$$\sum_{r \neq \infty} \left\{ \frac{I(C = \infty)}{\pi(\infty, Z)}\, \pi(r, G_r(Z)) - I(C = r) \right\} h_r(G_r(Z)), \tag{16}$$
where $h_r(G_r(Z))$ is an arbitrary function of $G_r(Z)$.

Proof. See Theorem 7.2 of Tsiatis (2006).
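A numerical sanity check of (16) is straightforward (a sketch with an assumed three-level coarsening of $Z = (Z_1, Z_2)$: level 1 observes $Z_1$, level 2 observes $Z_2$, and $C = \infty$ observes everything; by construction each element of (16) has conditional mean zero given $Z$):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
z1, z2 = rng.normal(size=n), rng.normal(size=n)

# Assumed CAR coarsening: pi(r, G_r(Z)) depends on Z only through G_r(Z).
p1 = 0.2 / (1 + np.exp(-z1))          # P(C=1 | Z): observe G_1(Z) = Z1 only
p2 = 0.2 / (1 + np.exp(-z2))          # P(C=2 | Z): observe G_2(Z) = Z2 only
p_inf = 1 - p1 - p2                   # P(C = infinity | Z): observe Z fully
u = rng.uniform(size=n)
c = np.where(u < p1, 1, np.where(u < p1 + p2, 2, 0))   # 0 codes C = infinity

h1, h2 = np.sin(z1), z2**2            # arbitrary h_r(G_r(Z))
elem = (((c == 0) / p_inf) * p1 - (c == 1)) * h1 \
     + (((c == 0) / p_inf) * p2 - (c == 2)) * h2
print(f"mean of augmentation-space element: {elem.mean():+.4f}  (should be ~ 0)")
```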
Part II: Applications in Missing Data Problems
Data with two levels of missingness
Suppose $Z = (Z_1, Z_2)$ and $Z_2$ is missing on some observations. Denote $R = 1$ if $Z_2$ is observed and $R = 0$ otherwise. Let $\pi(Z_1; \psi_0) = P(R = 1 \mid Z; \psi_0)$. The following theorem states explicitly how to calculate $\Pi[\frac{R\varphi(Z)}{\pi(Z_1; \psi_0)} \mid \mathcal{A}]$.

Theorem 2.1
$$\Pi\left[ \frac{R\,\varphi(Z)}{\pi(Z_1; \psi_0)} \,\Big|\, \mathcal{A} \right] = \frac{R - \pi(Z_1; \psi_0)}{\pi(Z_1; \psi_0)}\, E[\varphi(Z) \mid Z_1].$$

A sketch of the proof.
We first find that a typical element of $\mathcal{A}$ is $\left[\frac{R - \pi(Z_1; \psi_0)}{\pi(Z_1; \psi_0)}\right] h(Z_1)$. Then we find that the unique function $h_0(Z_1)$ such that
$$\left\{ \frac{R\,\varphi(Z)}{\pi(Z_1; \psi_0)} - \left[\frac{R - \pi(Z_1; \psi_0)}{\pi(Z_1; \psi_0)}\right] h_0(Z_1) \right\} \perp \left[\frac{R - \pi(Z_1; \psi_0)}{\pi(Z_1; \psi_0)}\right] h(Z_1), \quad \text{for all } h(Z_1),$$
is $h_0(Z_1) = E[\varphi(Z) \mid Z_1]$.
Remark 2.2 (DR estimating equation)
From Theorem 2.1 we have that
$$\varphi^{DR}_{\beta, \psi_0} = \frac{R\,\varphi(Z)}{\pi(Z_1; \psi_0)} - \frac{R - \pi(Z_1; \psi_0)}{\pi(Z_1; \psi_0)}\, E[\varphi(Z) \mid Z_1]. \tag{17}$$
To compute $E[\varphi(Z) \mid Z_1]$, we need to posit a parametric model $p(z \mid \xi)$, or at least $p(z_2 \mid z_1; \xi)$, and find a consistent estimator $\hat{\xi}_n$ for $\xi_0$. Then the projection can be computed as $E[\varphi(Z) \mid Z_1; \hat{\xi}_n]$. We should note that the parametric working models need to be consistent with the original semiparametric model. Similar to the motivating example, we can show that the resulting estimating equation is doubly robust to $p(r \mid z; \psi)$ and $p(z \mid \xi)$:
$$\varphi^{DR}_{\beta, \hat{\psi}_n, \hat{\xi}_n} = \frac{R\,\varphi(Z)}{\pi(Z_1; \hat{\psi}_n)} - \frac{R - \pi(Z_1; \hat{\psi}_n)}{\pi(Z_1; \hat{\psi}_n)}\, E[\varphi(Z) \mid Z_1; \hat{\xi}_n].$$
• From the theoretical development in Part I, we know that the estimating equation $\mathbb{P}_n \varphi^{DR}_{\beta, \hat{\psi}_n, \xi_0} = 0$ is the most efficient among the augmented IPW estimating equations if the working model $p(z \mid \xi)$ is true. It can be shown that $\varphi^{DR}_{\beta, \hat{\psi}_n, \xi_0}$ and $\varphi^{DR}_{\beta, \hat{\psi}_n, \hat{\xi}_n}$ give asymptotically equivalent estimators under the working model.
• If we are to conduct robust inference based on $\varphi^{DR}_{\beta, \hat{\psi}_n, \hat{\xi}_n}$, we need to derive the variance without relying on the correctness of the working model $p(z \mid \xi)$. Let $h(Z_1; \xi) = E[\varphi(Z) \mid Z_1; \xi]$ and assume $\hat{\xi}_n \to \xi^*$. Then
$$\begin{aligned}
\sqrt{n}(\hat{\beta}_n - \beta_0)
={}& -\{E\dot{\varphi}_{\beta_0, \psi_0, \xi^*}\}^{-1}\Big\{ \mathbb{G}_n \varphi^{DR}_{\beta_0, \psi_0, \xi^*}
+ \sqrt{n}\big[E\varphi^{DR}_{\beta_0, \hat{\psi}_n, \xi^*} - E\varphi^{DR}_{\beta_0, \psi_0, \xi^*}\big] \\
&\qquad + \sqrt{n}\big[E\varphi^{DR}_{\beta_0, \hat{\psi}_n, \hat{\xi}_n} - E\varphi^{DR}_{\beta_0, \hat{\psi}_n, \xi^*}\big] \Big\} + o_p(1) \\
={}& -\{E\dot{\varphi}_{\beta_0}\}^{-1}\Big\{ \mathbb{G}_n \varphi^{DR}_{\beta_0, \psi_0, \xi^*} - E[\varphi^{DR}_{\beta_0, \psi_0, \xi^*} S_\psi^T]\, \sqrt{n}(\hat{\psi}_n - \psi_0) \Big\} + o_p(1) \\
={}& -\{E\dot{\varphi}_{\beta_0}\}^{-1}\, \mathbb{G}_n\Big\{ \varphi^{DR}_{\beta_0, \psi_0, \xi^*} - E[\varphi^{DR} S_\psi^T]\,[E S_\psi S_\psi^T]^{-1} S_\psi \Big\} + o_p(1),
\end{aligned}$$
where the $\hat{\xi}_n$ term drops out because $E\varphi^{DR}_{\beta_0, \psi_0, \xi} = 0$ for every $\xi$ when the missingness model is correct.
• Denote
$$\tilde{\varphi} = \varphi^{DR} - E[\varphi^{DR} S_\psi^T]\,[E S_\psi S_\psi^T]^{-1} S_\psi.$$
Then we have
$$\sqrt{n}(\hat{\beta}_n - \beta_0) = -\{E\dot{\varphi}_{\beta_0}\}^{-1}\, \mathbb{G}_n \tilde{\varphi}_{\beta_0, \psi_0, \xi^*} + o_p(1) \rightsquigarrow N(0, \Sigma), \tag{18}$$
where $\Sigma$ can be consistently estimated by the sandwich formula
$$\left[\mathbb{P}_n \frac{R\,\dot{\varphi}_{\hat{\beta}_n}}{\pi(Z_1; \hat{\psi}_n)}\right]^{-1}
\mathbb{P}_n\big[\tilde{\varphi}_{\hat{\beta}_n, \hat{\psi}_n, \hat{\xi}_n}\big]^{\otimes 2}
\left[\mathbb{P}_n \frac{R\,\dot{\varphi}_{\hat{\beta}_n}}{\pi(Z_1; \hat{\psi}_n)}\right]^{-T}. \tag{19}$$
Example: Logistic regression with a missing covariate
• Consider a logistic regression
$$P(Y = 1 \mid X; \beta) = \frac{e^{\beta_0 + \beta_1^T X_1 + \beta_2 X_2}}{1 + e^{\beta_0 + \beta_1^T X_1 + \beta_2 X_2}},$$
where $X_2$ is a real-valued continuous covariate and is missing on some subjects; $(Y, X_1^T)^T$ is always observed.
• The full data model is $p(y \mid x; \beta)\, \eta(x)$. Let $X = (1, X_1^T, X_2)^T$. The full data estimating equation is
$$\mathbb{P}_n \varphi_\beta = \mathbb{P}_n\, X\left( Y - \frac{e^{\beta^T X}}{1 + e^{\beta^T X}} \right).$$
• To use the IPW, we use a logistic regression for the missingness mechanism,
$$P(R = 1 \mid Y, X; \psi) = \frac{e^{\psi_0 + \psi_1 Y + \psi_2^T X_1}}{1 + e^{\psi_0 + \psi_1 Y + \psi_2^T X_1}}.$$
The MLE $\hat{\psi}_n$ can be computed by solving $\mathbb{P}_n S_\psi = 0$, where
$$S_\psi = \begin{pmatrix} 1 \\ Y \\ X_1 \end{pmatrix}\left( R - \frac{e^{\psi_0 + \psi_1 Y + \psi_2^T X_1}}{1 + e^{\psi_0 + \psi_1 Y + \psi_2^T X_1}} \right).$$
• To construct the DR estimating equation, we need to compute the conditional expectation
$$E\left[ X\left( Y - \frac{e^{\beta^T X}}{1 + e^{\beta^T X}} \right) \,\Big|\, Y, X_1 \right].$$
• Therefore we need to posit a working model for $p(x \mid \xi)$, or at least for $p(x_2 \mid y, x_1; \xi)$. If we do the latter, we should be aware that $p(x_2 \mid y, x_1; \xi)$ must be compatible with the regression model $p(y \mid x; \beta)$. In fact, if the covariate distribution is MVN, we can show that $x \mid y$ is multivariate normal. This motivates the following working model:
$$X_2 \mid Y, X_1 \sim N(\xi_0 + \xi_1 Y + \xi_2^T X_1,\; \xi_3).$$
The MLE $\hat{\xi}_n$ is easily computed by least squares in a complete-case analysis.
• Finally we need to compute
$$E\left[ X\left( Y - \frac{e^{\beta^T X}}{1 + e^{\beta^T X}} \right) \,\Big|\, Y, X_1; \hat{\xi}_n \right].$$
This can be done using numerical or Monte Carlo integration.
• Hence the DR estimating equation is
$$\Psi_n^{DR}(\beta) = \mathbb{P}_n\left\{ \frac{R}{\pi(Y, X_1; \hat{\psi}_n)}\, X\left( Y - \frac{e^{\beta^T X}}{1 + e^{\beta^T X}} \right)
- \frac{R - \pi(Y, X_1; \hat{\psi}_n)}{\pi(Y, X_1; \hat{\psi}_n)}\, E\left[ X\left( Y - \frac{e^{\beta^T X}}{1 + e^{\beta^T X}} \right) \,\Big|\, Y, X_1; \hat{\xi}_n \right] \right\}. \tag{20}$$
• $\hat{\beta}_n$ can be obtained using the Newton-Raphson algorithm, and its variance estimated using (19).
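A sketch of (20) in code, with a scalar $X_1$; the conditional expectation is computed by Monte Carlo integration over 200 draws from the fitted normal working model, and the root is found with scipy's quasi-Newton solver in place of a hand-rolled Newton-Raphson (all data-generating values and names are assumptions of this sketch):

```python
import numpy as np
import statsmodels.api as sm
from scipy.optimize import root

rng = np.random.default_rng(7)
n = 5_000
expit = lambda t: 1 / (1 + np.exp(-t))
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = rng.binomial(1, expit(-0.5 + x1 + x2))              # true beta = (-0.5, 1, 1)
r = rng.binomial(1, expit(1.0 + 0.5 * y - 0.5 * x1))    # X2 missing at random
x2_obs = np.where(r == 1, x2, 0.0)                      # placeholder; only enters with weight 0

W = np.column_stack([np.ones(n), y, x1])                # design for the missingness model
pi_hat = sm.Logit(r, W).fit(disp=0).predict(W)          # pi(Y, X1; psi_hat)

ols = sm.OLS(x2[r == 1], W[r == 1]).fit()               # X2 | Y,X1 ~ N(W xi, xi_3), complete cases
draws = ols.predict(W)[:, None] + np.sqrt(ols.scale) * rng.normal(size=(n, 200))

def psi_dr(beta):
    lin = beta[0] + beta[1] * x1 + beta[2] * x2_obs
    phi = np.column_stack([np.ones(n), x1, x2_obs]) * (y - expit(lin))[:, None]
    res = y[:, None] - expit(beta[0] + beta[1] * x1[:, None] + beta[2] * draws)
    Ephi = np.column_stack([res.mean(1), x1 * res.mean(1), (draws * res).mean(1)])
    return ((r / pi_hat)[:, None] * phi
            - ((r - pi_hat) / pi_hat)[:, None] * Ephi).mean(0)   # equation (20)

beta_hat = root(psi_dr, x0=np.zeros(3)).x
print("beta_hat:", np.round(beta_hat, 2))               # approximately (-0.5, 1, 1)
```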
Monotone coarsened data

Definition 2.3 (Monotone coarsening)
If we can order the levels of coarsening in such a way that $G_r(Z)$ is a coarsened version of $G_{r+1}(Z)$, $r = 1, 2, \dots$, that is,
$$G_r(Z) = f_r(G_{r+1}(Z)),$$
where $f_r$ is a many-to-one function, then the coarsening is said to be monotone.

Example 2.4 (Monotone missingness in longitudinal data)
When a subject is followed over time, we observe $(Y_1, \dots, Y_k)$, where $Y_j$ is the measurement at the $j$th time point. Incomplete data arise if a subject is lost to follow-up at some point. In this case, if the measurement is missing at the $r$th time point, then all measurements after it will be missing as well.

  $C = r$      $G_r(Z)$
  $1$          $Y_1$
  $2$          $Y_1, Y_2$
  $\vdots$     $\vdots$
  $k-1$        $Y_1, \dots, Y_{k-1}$
  $\infty$     $Y_1, \dots, Y_k$

• For monotone coarsened data, it is natural and convenient to model the missingness via the discrete hazard function
$$\lambda_r(G_r) = \begin{cases} P(C = r \mid C \geq r, Z), & r \neq \infty, \\ 1, & r = \infty. \end{cases}$$
• Define $K_r(G_r) = P(C > r \mid Z) = \prod_{j=1}^{r} [1 - \lambda_j(G_j)]$. Then the coarsening probability can be expressed as
$$\pi(r, G_r(Z)) = \lambda_r(G_r)\, K_{r-1}(G_{r-1}).$$
• As in the case with two levels of missingness, we first need to characterize the augmentation space $\mathcal{A}$ using Theorem 1.20. Then we use the characterization to derive $\Pi(\varphi^{IPW} \mid \mathcal{A})$. We provide the end result in the following theorem.

Theorem 2.5 ($\Pi(\varphi^{IPW} \mid \mathcal{A})$ in monotone coarsened data)
The projection of $I(C = \infty)\varphi(Z)/\pi(\infty, Z)$ onto $\mathcal{A}$ is
$$-\sum_{r \neq \infty} \frac{I(C = r) - \lambda_r(G_r)\, I(C \geq r)}{K_r(G_r)}\, E[\varphi(Z) \mid G_r(Z)]. \tag{21}$$
Again, to compute the conditional expectations $E[\varphi(Z) \mid G_r(Z)]$ we need to posit a parametric working model $p(z \mid \xi)$, or at least a series of conditional models $p(g_{r+1} \mid g_r; \xi_r)$.

Remark 2.6 (Modeling the coarsening hazard)
Instead of modeling the coarsening probability directly, we model the discrete hazard
$$P(C = r \mid C \geq r, Z; \psi_r) = \lambda_r(G_r; \psi_r).$$
With monotone missing longitudinal data, for example, we may apply the logistic model
$$\lambda_r(G_r; \psi_r) = \frac{e^{\psi_{0r} + \psi_{1r} Y_1 + \cdots + \psi_{rr} Y_r}}{1 + e^{\psi_{0r} + \psi_{1r} Y_1 + \cdots + \psi_{rr} Y_r}}.$$
The likelihood of $C$ now has the form
$$\prod_{r \neq \infty}\; \prod_{i:\, C_i \geq r} \lambda_r(G_r(Z_i); \psi_r)^{I(C_i = r)} \left[1 - \lambda_r(G_r(Z_i); \psi_r)\right]^{I(C_i > r)}.$$
Note that the likelihood factorizes over the $\psi_r$, so maximization can be done separately: each $\psi_r$ can be estimated by a logistic regression of the event $\{C_i = r\}$ on the data $\{i : C_i \geq r\}$, and $S_\psi = (S_{\psi_1}^T, \dots, S_{\psi_{k-1}}^T)^T$.
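A sketch of this factorized fitting for monotone longitudinal dropout (the four-visit setup, the hazard model, and the coding of $C = \infty$ as 0 are assumptions of this sketch):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n, k = 10_000, 4
Y = rng.multivariate_normal(np.zeros(k), 0.5 * np.eye(k) + 0.5, size=n)
expit = lambda t: 1 / (1 + np.exp(-t))

# Monotone dropout: hazard at stage r depends on the observed history Y_1..Y_r.
C = np.zeros(n, dtype=int)                    # 0 codes C = infinity (fully observed)
for r in range(1, k):
    lam = expit(-2.0 + 0.8 * Y[:, :r].mean(axis=1))
    drop = (C == 0) & (rng.uniform(size=n) < lam)
    C[drop] = r

# The likelihood factorizes over r: fit each lambda_r by a separate logistic
# regression of the event {C_i = r} among the still-observed {i : C_i >= r}.
fits = []
for r in range(1, k):
    at_risk = (C == 0) | (C >= r)
    fits.append(sm.Logit((C[at_risk] == r).astype(float),
                         sm.add_constant(Y[at_risk, :r])).fit(disp=0))

# K_r(G_r) = P(C > r | Z) = prod_{j<=r} [1 - lambda_j(G_j)], as needed in (21).
Kr = np.ones(n)
for r, fit in enumerate(fits, start=1):
    Kr *= 1 - fit.predict(sm.add_constant(Y[:, :r]))
print("mean K_{k-1}:", round(Kr.mean(), 3),
      "  fully observed fraction:", round((C == 0).mean(), 3))  # these should agree
```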
• Now we look at the problem of double robustness. Let $\hat{\psi}_n \to \psi^*$ and $\hat{\xi}_n \to \xi^*$.

Theorem 2.7 (Double robustness of DR)
$$E\left\{ \frac{I(C = \infty)\, \varphi_{\beta_0}(Z)}{\pi(\infty, Z; \psi^*)} + \sum_{r \neq \infty} \frac{I(C = r) - \lambda_r(G_r; \psi^*)\, I(C \geq r)}{K_r(G_r; \psi^*)}\, E[\varphi_{\beta_0}(Z) \mid G_r(Z); \xi^*] \right\} = 0,$$
if either the model for $\lambda_r(g_r; \psi)$ or the working model $p(z \mid \xi)$ is correctly specified.

A sketch of the proof: define the filtration
$$\mathcal{F}_r = \sigma\{I(C = 1), \dots, I(C = r - 1), Z\}$$
and use martingale arguments.
Remark 2.8 (Inference with DR)
Denote
$$\tilde{\varphi} = \varphi^{DR} - E[\varphi^{DR} S_\psi^T]\,[E S_\psi S_\psi^T]^{-1} S_\psi.$$
Similar to the case with two levels of missingness, we can show that
$$\sqrt{n}(\hat{\beta}_n - \beta_0) = -\{E\dot{\varphi}_{\beta_0}\}^{-1}\, \mathbb{G}_n \tilde{\varphi}_{\beta_0, \psi_0, \xi^*} + o_p(1) \rightsquigarrow N(0, \Sigma), \tag{22}$$
where $\Sigma$ can be consistently estimated by
$$\left[\mathbb{P}_n \frac{I(C = \infty)\, \dot{\varphi}_{\hat{\beta}_n}}{\pi(\infty, Z; \hat{\psi}_n)}\right]^{-1}
\mathbb{P}_n\big[\tilde{\varphi}_{\hat{\beta}_n, \hat{\psi}_n, \hat{\xi}_n}\big]^{\otimes 2}
\left[\mathbb{P}_n \frac{I(C = \infty)\, \dot{\varphi}_{\hat{\beta}_n}}{\pi(\infty, Z; \hat{\psi}_n)}\right]^{-T}. \tag{23}$$
Example: A longitudinal RCT with dropout
• Tsiatis (2006) describes a randomized clinical trial of a new drug for HIV/AIDS. The primary outcome is CD4 count, denoted $Y$. We also denote by $X$ the indicator variable for treatment. Measurements of $Y$ are taken at baseline $t_1 = 0$ and at $l - 1$ subsequent time points, denoted $t_2, \dots, t_l$. We want to model the mean CD4 count as a function of treatment and time through
$$E[Y_{ij} \mid X_i] = \beta_0 + \beta_1 t_j + \beta_2 X_i t_j, \quad j = 1, \dots, l.$$
• Let the design matrix be $D(X)$, that is, $E[Y_i \mid X_i] = D(X_i)\beta$. If there is no dropout, we may use the GEE with independent working correlation, resulting in the estimating equation
$$\mathbb{P}_n \varphi_\beta(Y, X) = \mathbb{P}_n\, D^T(X)(Y - D(X)\beta).$$
• Now suppose there is random dropout, and the mechanism is MAR.
• First we use a logistic regression for the dropout hazard,
$$\lambda_r(G_r; \psi_r) = \frac{e^{\psi_{0r} + \psi_{1r} Y_1 + \cdots + \psi_{rr} Y_r + \psi_{r+1,r} X}}{1 + e^{\psi_{0r} + \psi_{1r} Y_1 + \cdots + \psi_{rr} Y_r + \psi_{r+1,r} X}},$$
and obtain the MLE $\hat{\psi}_n$.
• Denote $\bar{Y}_r = (Y_1, \dots, Y_r)^T$ and $\underline{Y}_r = (Y_{r+1}, \dots, Y_l)^T$. From Theorem 2.5, we need to compute the conditional expectation
$$E[D^T(X)(Y - D(X)\beta) \mid \bar{Y}_r, X] = D^T(X)\, E[(Y - D(X)\beta) \mid \bar{Y}_r, X].$$
• If we posit the working model $Y \mid (X = k) \sim N(\mu_k, \Sigma)$, with $\Sigma_{rr}$ the variance of $\bar{Y}_r$ and $\Sigma_{\underline{r}r}$ the covariance between $\underline{Y}_r$ and $\bar{Y}_r$, the conditional expectation follows from the usual multivariate normal formulas: the first $r$ components of $E[(Y - D(X)\beta) \mid \bar{Y}_r, X = k; \xi]$ are the observed residuals $\bar{Y}_r - \bar{D}_r(X)\beta$, and the remaining components are
$$\bar{\mu}_{\underline{r},k} + \Sigma_{\underline{r}r}\, \Sigma_{rr}^{-1} (\bar{Y}_r - \bar{\mu}_{r,k}) - \underline{D}_r(X)\beta.$$
• The MLE $\hat{\xi}_n$ can be computed using standard statistical packages (e.g., PROC MIXED in SAS).
• $\beta$ can be estimated from the DR estimating equation using the Newton-Raphson algorithm:
$$\mathbb{P}_n \varphi^{DR}_{\beta, \hat{\psi}_n, \hat{\xi}_n} = \mathbb{P}_n\left\{ \frac{I(C = \infty)}{\pi(\infty, Y, X; \hat{\psi}_n)}\, D^T(X)(Y - D(X)\beta)
+ \sum_{r=1}^{l-1} \frac{I(C = r) - \lambda_r(\bar{Y}_r, X; \hat{\psi}_n)\, I(C \geq r)}{K_r(\bar{Y}_r, X; \hat{\psi}_n)}\, D^T(X)\, E[(Y - D(X)\beta) \mid \bar{Y}_r, X; \hat{\xi}_n] \right\} = 0. \tag{24}$$
• The asymptotic variance of $\hat{\beta}_n$ can be estimated using the sandwich-type estimator described in (23).
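A sketch of the key computational ingredient of (24): the conditional residual mean $E[(Y - D(X)\beta) \mid \bar{Y}_r, X; \xi]$ under the multivariate normal working model. Per-arm sample moments from the completers stand in for a PROC MIXED fit (a crude stand-in: under MAR these moments are not the MLE, but the DR construction tolerates a misspecified working model); all names, the three-visit design, and the dropout mechanism are assumptions of this sketch:

```python
import numpy as np

rng = np.random.default_rng(9)
n, l = 8_000, 3
t = np.array([0.0, 1.0, 2.0])
x = rng.binomial(1, 0.5, size=n)
D = np.stack([np.column_stack([np.ones(l), t, xi * t]) for xi in x])   # n x l x 3
beta_true = np.array([5.0, -0.5, 0.4])
Sig = 0.5 * np.eye(l) + 0.5
Y = np.einsum("nij,j->ni", D, beta_true) + rng.multivariate_normal(np.zeros(l), Sig, size=n)

expit = lambda s: 1 / (1 + np.exp(-s))
C = np.zeros(n, dtype=int)                    # 0 codes C = infinity (completers)
for r in range(1, l):
    lam = expit(-2.5 + 0.3 * Y[:, r - 1] - 0.3 * x)
    drop = (C == 0) & (rng.uniform(size=n) < lam)
    C[drop] = r

# Working model Y | X = k ~ N(mu_k, Sigma): per-arm completer moments.
mu = {k: Y[(x == k) & (C == 0)].mean(0) for k in (0, 1)}
S = {k: np.cov(Y[(x == k) & (C == 0)].T) for k in (0, 1)}

def cond_resid(i, r, beta):
    """E[Y - D(X)beta | Ybar_r, X] for subject i under the MVN working model."""
    k = x[i]
    res = Y[i] - D[i] @ beta                  # observed residuals (first r entries used)
    pred = mu[k][r:] + S[k][r:, :r] @ np.linalg.solve(S[k][:r, :r], Y[i, :r] - mu[k][:r])
    return np.concatenate([res[:r], pred - D[i, r:] @ beta])

print(np.round(cond_resid(0, 1, beta_true), 2))  # Y_2, Y_3 replaced by MVN predictions
```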
References

Bang H, Robins JM (2005). Doubly robust estimation in missing data and causal inference models. Biometrics 61, 962-973.
Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA (1993). Efficient and Adaptive Estimation for Semiparametric Models. Springer.
Kosorok MR (2008). Introduction to Empirical Processes and Semiparametric Inference. Springer.
Lipsitz SR, Ibrahim JG, Zhao LP (1999). A weighted estimating equation for missing covariate data with properties similar to maximum likelihood. Journal of the American Statistical Association 94, 1147-1160.
Robins JM, Rotnitzky A (2001). Comment on the Bickel and Kwon article, "Inference for semiparametric models: Some questions and an answer". Statistica Sinica 11, 920-936.
Scharfstein DO, Rotnitzky A, Robins JM (1999). Adjusting for nonignorable drop-out using semiparametric nonresponse models. Journal of the American Statistical Association 94, 1135-1146.
Tsiatis AA (2006). Semiparametric Theory and Missing Data. Springer.