Double Robustness: 
Theory and Applications with Missing Data 
Lu Mao 
Department of Biostatistics 
The University of North Carolina at Chapel Hill 
Email: lmao@unc.edu 
April 17, 2013 
Table of Contents 
Part I: A Semiparametric Perspective 
A motivating example 
Semiparametric approaches to coarsened data 
Constructing the estimating equation 
Part II: Applications in Missing Data Problems 
Data with two levels of missingness 
Monotone coarsened data 
Part I: A Semiparametric Perspective 
A Motivating Example 
• Given an iid sample $Y_1, \dots, Y_n$ from an arbitrary distribution, consider the estimation of the population mean $\mu = E(Y)$ by $\bar{Y}$, which solves
$$\mathbb{P}_n(Y - \mu) = 0,$$
where $\mathbb{P}_n Z \equiv n^{-1} \sum_{i=1}^n Z_i$.
• Suppose some of the $Y_i$'s are missing. Let $R_i = 1$ if $Y_i$ is observed and $R_i = 0$ otherwise. Let $\pi(Y) = P(R = 1 \mid Y)$. Now consider estimating $\mu$ by solving
$$\mathbb{P}_n R(Y - \mu) = 0,$$
resulting in
$$\hat{\mu}_{CC} = \frac{\sum_i R_i Y_i}{\sum_i R_i} \;\to_p\; \frac{E[\pi(Y) Y]}{E[\pi(Y)]} \neq \mu.$$
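To see the complete-case bias concretely, here is a minimal simulation sketch (the normal outcome and the logistic form of $\pi(Y)$ are assumptions of this sketch, chosen so that larger $Y$'s are observed more often):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
y = rng.normal(loc=1.0, scale=1.0, size=n)       # true mean mu = 1
pi_y = 1 / (1 + np.exp(-(0.5 + y)))              # pi(Y) = P(R=1|Y), increasing in Y
r = rng.binomial(1, pi_y)

mu_cc = y[r == 1].mean()                         # complete-case estimator
print(f"complete-case mean: {mu_cc:.3f} (truth: 1.000)")
# Prints a value clearly above 1, illustrating E[pi(Y)Y]/E[pi(Y)] != mu.
```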
• Suppose that in addition to $Y_i$, an auxiliary variable $X_i$ is also collected, and $R \perp Y \mid X$. Assume $P(R = 1 \mid Y, X) = \pi(X; \psi)$. To correct the bias, we apply the estimating equation ($\hat{\psi}_n$ is a consistent estimator of $\psi_0$)
$$\Psi_n^{IPW} = \mathbb{P}_n\left\{ \frac{R}{\pi(X; \hat{\psi}_n)} (Y - \mu) \right\},$$
resulting in
$$\hat{\mu}_{IPW} = \frac{\mathbb{P}_n[R Y / \pi(X; \hat{\psi}_n)]}{\mathbb{P}_n[R / \pi(X; \hat{\psi}_n)]}
= \frac{\mathbb{P}_n[R Y / \pi(X; \psi_0)]}{\mathbb{P}_n[R / \pi(X; \psi_0)]} + o_p(1)
\;\to_p\; \frac{E[R Y / \pi(X; \psi_0)]}{E[R / \pi(X; \psi_0)]} = \mu. \tag{1}$$
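A minimal sketch of the IPW correction under MAR, with the propensity $\pi(X;\psi)$ fit by logistic regression (the data-generating values and the use of statsmodels are assumptions of this sketch):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 50_000
x = rng.normal(size=n)
y = 1.0 + x + rng.normal(size=n)                         # mu = E[Y] = 1
r = rng.binomial(1, 1 / (1 + np.exp(-(0.3 + 1.2 * x))))  # missingness depends on X only

X = sm.add_constant(x)
pi_hat = sm.Logit(r, X).fit(disp=0).predict(X)           # pi(X; psi_hat)

mu_ipw = np.sum(r * y / pi_hat) / np.sum(r / pi_hat)     # solves Pn{R(Y - mu)/pi} = 0
print(f"IPW mean: {mu_ipw:.3f} (truth: 1.000)")
```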
• Assume $m(X; \xi) = E(Y \mid X; \xi)$, and consider a new estimating equation as a modification of $\Psi_n^{IPW}$:
$$\Psi_n^{DR} = \mathbb{P}_n\left\{ \frac{R}{\pi(X; \hat{\psi}_n)}(Y - \mu) - \frac{R - \pi(X; \hat{\psi}_n)}{\pi(X; \hat{\psi}_n)}\big(m(X; \hat{\xi}_n) - \mu\big) \right\}, \tag{2}$$
resulting in
$$\hat{\mu}_{DR} = \mathbb{P}_n\left\{ \frac{R}{\pi(X; \hat{\psi}_n)} Y - \frac{R - \pi(X; \hat{\psi}_n)}{\pi(X; \hat{\psi}_n)}\, m(X; \hat{\xi}_n) \right\}. \tag{3}$$
Now let's study the consistency of $\hat{\mu}_{DR}$ under different assumptions.
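Continuing the sketch above, (3) is a single line of code once $\hat{\pi}$ and $\hat{m}$ are in hand; here $m(X;\xi)$ is fit by least squares on the complete cases (both working models happen to be correct in this simulated example):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 50_000
x = rng.normal(size=n)
y = 1.0 + x + rng.normal(size=n)                         # mu = 1
r = rng.binomial(1, 1 / (1 + np.exp(-(0.3 + 1.2 * x))))

X = sm.add_constant(x)
pi_hat = sm.Logit(r, X).fit(disp=0).predict(X)           # pi(X; psi_hat)
m_hat = sm.OLS(y[r == 1], X[r == 1]).fit().predict(X)    # m(X; xi_hat), complete cases

mu_dr = np.mean(r * y / pi_hat - (r - pi_hat) / pi_hat * m_hat)  # equation (3)
print(f"DR mean: {mu_dr:.3f} (truth: 1.000)")
```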
• Scenario 1. $\pi(X; \psi)$ correct; $m(X; \xi)$ incorrect. So $\hat{\psi}_n \to_p \psi_0$, but $\hat{\xi}_n \to \xi^*$, with $m(X; \xi^*) \neq E(Y \mid X)$:
$$\begin{aligned}
\hat{\mu}_{DR} &= \mathbb{P}_n\left\{ \frac{R}{\pi(X; \psi_0)} Y - \frac{R - \pi(X; \psi_0)}{\pi(X; \psi_0)}\, m(X; \xi^*) \right\} + o_p(1) \\
&\to_p E\left[ \frac{R}{\pi(X; \psi_0)} Y \right] - E\left[ \frac{R - \pi(X; \psi_0)}{\pi(X; \psi_0)}\, m(X; \xi^*) \right] \\
&= E\left[ Y\, E\left\{ \frac{R}{\pi(X; \psi_0)} \,\Big|\, Y, X \right\} \right] - E\left[ m(X; \xi^*)\, E\left\{ \frac{R - \pi(X; \psi_0)}{\pi(X; \psi_0)} \,\Big|\, Y, X \right\} \right] \\
&= \mu - 0 = \mu, \tag{4}
\end{aligned}$$
since $E(R \mid Y, X) = \pi(X; \psi_0)$ when the $\pi$ model is correct.
• Scenario 2. $m(X; \xi)$ correct; $\pi(X; \psi)$ incorrect. So $\hat{\xi}_n \to_p \xi_0$, but $\hat{\psi}_n \to \psi^*$, with $\pi(X; \psi^*) \neq P(R = 1 \mid Y, X)$:
$$\begin{aligned}
\hat{\mu}_{DR} &= \mathbb{P}_n\left\{ \frac{R}{\pi(X; \psi^*)} Y - \frac{R - \pi(X; \psi^*)}{\pi(X; \psi^*)}\, m(X; \xi_0) \right\} + o_p(1) \\
&\to_p E\left[ \frac{R}{\pi(X; \psi^*)} Y \right] - E\left[ \frac{R - \pi(X; \psi^*)}{\pi(X; \psi^*)}\, m(X; \xi_0) \right] \\
&= E\left[ \frac{R}{\pi(X; \psi^*)}\, E(Y \mid R, X) \right] - E\left[ \frac{R - \pi(X; \psi^*)}{\pi(X; \psi^*)}\, m(X; \xi_0) \right] \\
&= E\left[ \frac{R}{\pi(X; \psi^*)}\, m(X; \xi_0) \right] - E\left[ \frac{R - \pi(X; \psi^*)}{\pi(X; \psi^*)}\, m(X; \xi_0) \right] \\
&= E[m(X; \xi_0)] = \mu, \tag{5}
\end{aligned}$$
since $E(Y \mid R, X) = E(Y \mid X) = m(X; \xi_0)$ under MAR when the $m$ model is correct.
Result 1 (Double robustness)
$\hat{\mu}_{DR}$ is consistent if either the $\pi$ model or the $m$ model is correct, that is, under $\mathcal{M}_1 \cup \mathcal{M}_2$, where $\mathcal{M}_1 = \{p(r \mid y, x; \psi) : \psi \in \Psi\}$ and $\mathcal{M}_2 = \{p(y \mid x; \xi) : \xi \in \Xi\}$. In other words, $\hat{\mu}_{DR}$ is doubly robust.
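A quick numerical check of Result 1; in this sketch the "wrong" model is deliberately misspecified by dropping the covariate (intercept-only design), while the "right" one uses the true design:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200_000
x = rng.normal(size=n)
y = 1.0 + x + rng.normal(size=n)                         # mu = 1
r = rng.binomial(1, 1 / (1 + np.exp(-(0.3 + 1.2 * x))))

X1 = sm.add_constant(x)                                  # correct design
X0 = np.ones((n, 1))                                     # misspecified: intercept only

def mu_dr(pi_design, m_design):
    pi_hat = sm.Logit(r, pi_design).fit(disp=0).predict(pi_design)
    m_hat = sm.OLS(y[r == 1], m_design[r == 1]).fit().predict(m_design)
    return np.mean(r * y / pi_hat - (r - pi_hat) / pi_hat * m_hat)

print(f"pi right, m wrong: {mu_dr(X1, X0):.3f}")         # Scenario 1, still near 1
print(f"pi wrong, m right: {mu_dr(X0, X1):.3f}")         # Scenario 2, still near 1
```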
• Now, let's consider a somewhat different question: efficiency under $\mathcal{M}_1 \cap \mathcal{M}_2$. For simplicity, we assume we know the true values $(\psi_0, \xi_0)$.
• Denote $\mathbb{G}_n g(Z) = n^{-1/2} \sum_{i=1}^n [g(Z_i) - E g(Z)]$. Algebraic manipulations yield:
$$\sqrt{n}(\hat{\mu}_{IPW} - \mu) = \frac{1}{\mathbb{P}_n[R / \pi(X; \psi_0)]}\, \mathbb{G}_n\left\{ \frac{R}{\pi(X; \psi_0)}(Y - \mu) \right\} \rightsquigarrow N(0, \sigma^2_{IPW}), \tag{6}$$
• where
$$\sigma^2_{IPW} = E\left[\left\{\frac{R}{\pi(X; \psi_0)}(Y - \mu)\right\}^2\right] = E\left[\frac{(Y - \mu)^2}{\pi(X; \psi_0)}\right].$$
• Similarly,
$$\sqrt{n}(\hat{\mu}_{DR} - \mu) = \mathbb{G}_n\left\{ \frac{R}{\pi(X; \psi_0)}(Y - \mu) - \frac{R - \pi(X; \psi_0)}{\pi(X; \psi_0)}\big(m(X; \xi_0) - \mu\big) \right\} \rightsquigarrow N(0, \sigma^2_{DR}), \tag{7}$$
where
$$\begin{aligned}
\sigma^2_{DR} ={}& E\left[\left\{\frac{R}{\pi(X; \psi_0)}(Y - \mu)\right\}^2\right]
- 2 E\left[\frac{R}{\pi(X; \psi_0)}(Y - \mu) \cdot \frac{R - \pi(X; \psi_0)}{\pi(X; \psi_0)}\big(m(X; \xi_0) - \mu\big)\right] \\
&+ E\left[\left\{\frac{R - \pi(X; \psi_0)}{\pi(X; \psi_0)}\big(m(X; \xi_0) - \mu\big)\right\}^2\right].
\end{aligned}$$
•
$$\begin{aligned}
\sigma^2_{DR} &= E\left[\frac{(Y - \mu)^2}{\pi(X; \psi_0)}\right]
- 2 E\left[\frac{1 - \pi(X; \psi_0)}{\pi(X; \psi_0)}\big(m(X; \xi_0) - \mu\big)^2\right]
+ E\left[\frac{1 - \pi(X; \psi_0)}{\pi(X; \psi_0)}\big(m(X; \xi_0) - \mu\big)^2\right] \\
&= \sigma^2_{IPW} - E\left[\frac{1 - \pi(X; \psi_0)}{\pi(X; \psi_0)}\big(m(X; \xi_0) - \mu\big)^2\right].
\end{aligned}$$
• Denote $\varphi_{IPW} = \frac{R}{\pi(X; \psi_0)}(Y - \mu)$, $\varphi_A = -\frac{R - \pi(X; \psi_0)}{\pi(X; \psi_0)}\big(m(X; \xi_0) - \mu\big)$, and $\varphi_{DR} = \varphi_{IPW} + \varphi_A$. Consider the Hilbert space $L_2(P)$. Since $\hat{\mu}_{IPW}$ and $\hat{\mu}_{DR}$ have influence functions $\varphi_{IPW}$ and $\varphi_{DR}$ respectively, their squared lengths ($\|\cdot\|^2 \equiv E(\cdot)^2$) are the asymptotic variances of $\hat{\mu}_{IPW}$ and $\hat{\mu}_{DR}$.
• The following figure provides a geometric illustration:
[Figure: A geometric interpretation of the efficiency improvement by the DR estimator.]

Result 2 (Efficiency of DR)
$\hat{\mu}_{DR}$ is more efficient than $\hat{\mu}_{IPW}$ under $\mathcal{M}_1 \cap \mathcal{M}_2$.
Remark 1.1
The above example suggests that:
• For a full data problem, there is a natural extension, via the IPW (inverse probability weighting) method, to a corresponding missing data problem;
• By positing a working model $p(z_{mis} \mid z_{obs}; \xi)$, the IPW estimating equation can be modified by adding a suitable augmentation term, resulting in an estimator that is still consistent even if the working model $p(z_{mis} \mid z_{obs}; \xi)$ is not correct;
• If instead $p(z_{mis} \mid z_{obs}; \xi)$ is correct, the new estimator is consistent even if the missingness mechanism is incorrectly modeled. In this sense, the new estimator is doubly robust;
• The doubly robust estimator has improved efficiency if both models are correct.
Semiparametric approaches to coarsened data
• First we introduce the terminology of coarsening, which contains missing data as a special case:

Definition 1.2 (Coarsening)
Suppose the full data consist of iid observations of an $l$-dimensional random vector $Z$. Define a coarsening variable $C$ such that when $C = r$, we only observe $G_r(Z)$, where $G_r(\cdot)$ is a many-to-one function. Further denote $C = \infty$ if $Z$ is completely observed (no coarsening), that is, $G_\infty(Z) = Z$. Thus, the observed data consist of iid copies of $(C, G_C(Z))$.

Definition 1.3 (Coarsening at random)
The data are said to be coarsened at random (CAR) if $C \perp Z \mid G_C(Z)$.

Remark 1.4 (Assumption)
All problems considered here are under the assumption of CAR.
Terminology
• $Z$: full data;
• $G_C(Z)$: observed data;
• $(C, G_C(Z))$: coarsened data.
• Semiparametric models arise naturally in coarsened data problems.
• Consider a full data regression model, $z = (y, x)'$:
$$p(z \mid \beta, \eta) = p(y \mid x; \beta)\, p(x; \eta),$$
where $\beta$ is the regression parameter, and $\eta$ is infinite dimensional (e.g., an arbitrary cdf $F$ for $x$).
• Now suppose some components of $x$ are missing (at random); then the likelihood becomes
$$q(y, x_{obs}, r \mid \beta, \eta, \psi) = p(r \mid y, x_{obs}; \psi) \int p(y \mid x; \beta)\, p(x; \eta)\, dx_{mis}.$$
• Now the infinite dimensional nuisance $\eta$ cannot be ignored. Hence we have arrived at a semiparametric model.
• Let's review some basic theory of semiparametric inference. We assume as previously that $\beta$, the parameter of interest, is $p$-dimensional, and $\eta$ is a possibly infinite-dimensional nuisance parameter.

Definition 1.5 (RAL and influence function)
The estimator $\hat{\beta}_n$ is regular asymptotically linear (RAL) if
$$\sqrt{n}(\hat{\beta}_n - \beta_0) = \mathbb{G}_n \tilde{\varphi}_{\beta_0, \eta_0} + o_p(1). \tag{8}$$
The mean-zero function $\tilde{\varphi}_{\beta_0, \eta_0}$ is said to be the influence function of $\hat{\beta}_n$.

Remark 1.6 (RAL estimator)
If (8) holds, by the CLT we easily have
$$\sqrt{n}(\hat{\beta}_n - \beta_0) \rightsquigarrow N(0, E\,\tilde{\varphi}^{\otimes 2}).$$
Definition 1.7 (Tangent spaces)
Let $\mathcal{H}$ denote the Hilbert space of all mean-zero functions in $L_2(P)$. $\Lambda_\beta$, the tangent space for $\beta$, is defined as the linear span, in $\mathcal{H}$, of the score function $S_\beta = \partial \log p(z \mid \beta_0, \eta_0)/\partial\beta$:
$$\Lambda_\beta = \{B S_\beta : B \in \mathbb{R}^{p \times p}\}.$$
Similarly, the nuisance tangent space $\Lambda_\eta$ is defined as the linear span of the union, over all one-dimensional parametric submodels $\{\eta_\gamma\}$, of the score functions
$$S_\gamma = \frac{\partial}{\partial\gamma} \log p(z \mid \beta_0, \eta_\gamma)\Big|_{\gamma=0}.$$
• The following important theorem provides a characterization of all influence functions of semiparametric RAL estimators $\hat{\beta}_n$.

Theorem 1.8 (The space of influence functions for $\beta$)
The space of influence functions of RAL estimators for $\beta$ consists of all $\tilde{\varphi}$ satisfying:
• $\tilde{\varphi}$ is orthogonal to $\Lambda_\eta$, i.e. $\tilde{\varphi} \in \Lambda_\eta^\perp$;
• $E[\tilde{\varphi}\, S_\beta^T] = I_{p \times p}$.

Remark 1.9 (Z-estimation)
Consider estimating $\beta$ by solving $\mathbb{P}_n \varphi_\beta = 0$, where $E\varphi_{\beta_0} = 0$. Then by standard Z-estimation theory,
$$\sqrt{n}(\hat{\beta}_n - \beta_0) = -\{E\dot{\varphi}_{\beta_0}\}^{-1}\, \mathbb{G}_n \varphi_{\beta_0} + o_p(1), \qquad \dot{\varphi}_\beta = \partial\varphi_\beta/\partial\beta^T.$$
Remark 1.10 (Z-estimation with estimated nuisance)
In the presence of a nuisance parameter $\eta$, the estimating equation generally involves $\eta$. A natural strategy is to insert a consistent estimator $\hat{\eta}_n$ and solve
$$\mathbb{P}_n \varphi_{\beta, \hat{\eta}_n} = 0, \quad \text{where } E\varphi_{\beta_0, \eta_0} = 0.$$
Now,
$$\begin{aligned}
\sqrt{n}(\hat{\beta}_n - \beta_0)
&= -\{E\dot{\varphi}_{\beta_0}\}^{-1}\left\{ \mathbb{G}_n \varphi_{\beta_0, \hat{\eta}_n} + \sqrt{n}\, E\varphi_{\beta_0, \hat{\eta}_n} \right\} + o_p(1) \\
&= -\{E\dot{\varphi}_{\beta_0}\}^{-1}\left\{ \mathbb{G}_n \varphi_{\beta_0, \eta_0} + E[\varphi_{\beta_0, \eta_0} S_\eta^T]\, \sqrt{n}(\hat{\eta}_n - \eta_0) \right\} + o_p(1).
\end{aligned}$$
If $\varphi$ is constructed such that $\varphi_{\beta_0, \eta_0} \in \Lambda_\eta^\perp$, we have $E[\varphi_{\beta_0, \eta_0} S_\eta] = 0$, and so
$$\sqrt{n}(\hat{\beta}_n - \beta_0) = -\{E\dot{\varphi}_{\beta_0}\}^{-1}\, \mathbb{G}_n \varphi_{\beta_0, \eta_0} + o_p(1),$$
which is asymptotically equivalent to the estimator solving $\mathbb{P}_n \varphi_{\beta, \eta_0} = 0$.
In the following development of methods for coarsened data, we start with the assumption that the full data problem $p(z \mid \beta, \eta)$ is well studied. This includes that
• the full data tangent spaces $\Lambda^F_\beta$ and $\Lambda^F_\eta$ are completely characterized;
• we have a full data estimating function $\varphi(Z) \in \Lambda^{F\perp}_\eta$.
The likelihood for the coarsened data, consisting of $(C_i, G_{C_i}(Z_i))$, is
$$q(r, g_r \mid \beta, \eta, \psi) = \pi(r, g_r; \psi) \int_{z:\, G_r(z) = g_r} p(z \mid \beta, \eta)\, d\nu(z). \tag{9}$$
Now the nuisance parameter consists of $(\eta, \psi)$.
• We start by investigating the relationships between the coarsened data tangent spaces and their full data counterparts.
• Consider
$$\begin{aligned}
S_\beta(r, g_r; \beta, \eta, \psi) &= \frac{\partial}{\partial\beta} \log q(r, g_r \mid \beta, \eta, \psi)
= \frac{\int_{z:G_r(z)=g_r} \{\partial p(z \mid \beta, \eta)/\partial\beta\}\, d\nu(z)}{\int_{z:G_r(z)=g_r} p(z \mid \beta, \eta)\, d\nu(z)} \\
&= \frac{\int_{z:G_r(z)=g_r} \{\partial \log p(z \mid \beta, \eta)/\partial\beta\}\, p(z \mid \beta, \eta)\, d\nu(z)}{\int_{z:G_r(z)=g_r} p(z \mid \beta, \eta)\, d\nu(z)}
= E[S^F_\beta(Z) \mid C = r, G_r(Z) = g_r].
\end{aligned}$$
• Similarly, we have the following theorem about $\Lambda_\eta$:

Theorem 1.11 (Characterization of $\Lambda_\eta$)
The coarsened data nuisance tangent space for $\eta$ is characterized by
$$\Lambda_\eta = \{E[\alpha^F(Z) \mid C, G_C(Z)] : \alpha^F \in \Lambda^F_\eta\}. \tag{10}$$
• Remember that the important task is to characterize $\Lambda_\eta^\perp$, which will aid us in constructing coarsened data estimating equations for $\beta$.

Theorem 1.12 (Characterization of $\Lambda_\eta^\perp$)
The space $\Lambda_\eta^\perp$ consists of all elements $h(C, G_C(Z)) \in \mathcal{H}$ such that
$$E[h(C, G_C(Z)) \mid Z] \in \Lambda^{F\perp}_\eta. \tag{11}$$
Proof.
By Theorem 1.11, the space $\Lambda_\eta^\perp$ consists of all elements $h(C, G_C(Z)) \in \mathcal{H}$ such that
$$E\{h(C, G_C(Z))\, E[\alpha^F(Z) \mid C, G_C(Z)]\} = 0, \quad \forall\, \alpha^F(Z) \in \Lambda^F_\eta.$$
This is equivalent to $E\{h(C, G_C(Z))\, \alpha^F(Z)\} = 0$, which is equivalent to
$$E\{\alpha^F(Z)\, E[h(C, G_C(Z)) \mid Z]\} = 0. \qquad \square$$

Remark 1.13 (A linear operator perspective)
Define the linear operator $\mathcal{K}: \mathcal{H} \to \mathcal{H}^F$ by $\mathcal{K}(\cdot) = E[\cdot \mid Z]$. Then
$$\Lambda_\eta^\perp = \mathcal{K}^{-1}(\Lambda^{F\perp}_\eta). \tag{12}$$
Given $\varphi(Z) \in \Lambda^{F\perp}_\eta$, the inverse operation $\mathcal{K}^{-1}(\varphi(Z))$ will provide us a usable collection of estimating functions.
Constructing the estimating equation

Theorem 1.14 (The space $\mathcal{K}^{-1}(\varphi(Z))$)
If $u(C, G_C(Z)) \in \mathcal{H}$ is such that $E[u(C, G_C(Z)) \mid Z] = \varphi(Z)$, then
$$\mathcal{K}^{-1}(\varphi(Z)) = u(C, G_C(Z)) + \mathcal{K}^{-1}(0).$$

Definition 1.15 (Augmentation space)
We denote $\mathcal{A} = \mathcal{K}^{-1}(0)$, and call it the augmentation space.

Corollary 1.16
Assume $\pi(\infty, Z; \psi_0) = P(C = \infty \mid Z; \psi_0) > 0$ a.s. Then
$$\mathcal{K}^{-1}(\varphi(Z)) = \left\{ \frac{I(C = \infty)\, \varphi(Z)}{\pi(\infty, Z; \psi_0)} + h(C, G_C(Z)) : h \in \mathcal{A} \right\}. \tag{13}$$
• Suppose $\hat{\psi}_n$ is an efficient estimator of $\psi_0$. Take $h \equiv 0$, and we obtain the inverse probability weighted (IPW) estimating equation
$$\Psi_n^{IPW} = \mathbb{P}_n \frac{I(C = \infty)\, \varphi_\beta(Z)}{\pi(\infty, Z; \hat{\psi}_n)}.$$
• In practice, the choice of $h \in \mathcal{A}$ will be based on efficiency considerations. We have the following theorem regarding the influence function resulting from the estimating function $\mathbb{P}_n \varphi^h_{\beta, \hat{\psi}_n}$:

Theorem 1.17
The influence function for the $\hat{\beta}_n$ solving $\mathbb{P}_n \varphi^h_{\beta, \hat{\psi}_n} = 0$ is
$$\tilde{\varphi}^h = (E\dot{\varphi}_{\beta_0})^{-1}\left\{ U_h - \Pi[U_h \mid \Lambda_\psi] \right\}, \qquad U_h = \frac{I(C = \infty)\, \varphi_{\beta_0}(Z)}{\pi(\infty, Z; \psi_0)} + h(C, G_C(Z)). \tag{14}$$

Remark 1.18 ($\Lambda_\psi \subset \mathcal{A}$)
By calculus we easily obtain $E[S_\psi \mid Z] = 0$, and therefore $\Lambda_\psi \subset \mathcal{A}$.
From a geometric point of view, we easily get the following result:

Theorem 1.19 (Efficiency among the $\tilde{\varphi}^h$)
$$\operatorname*{arg\,min}_{h \in \mathcal{A}} \|\tilde{\varphi}^h\|^2 \quad \text{is attained at} \quad h = -\Pi\left[ \frac{I(C = \infty)\varphi(Z)}{\pi(\infty, Z; \psi_0)} \,\Big|\, \mathcal{A} \right],$$
resulting in the estimating equation
$$\Psi_n^{DR} = \mathbb{P}_n\left\{ \frac{I(C = \infty)\varphi(Z)}{\pi(\infty, Z; \psi_0)} - \Pi\left[ \frac{I(C = \infty)\varphi(Z)}{\pi(\infty, Z; \psi_0)} \,\Big|\, \mathcal{A} \right] \right\}. \tag{15}$$
• Typically, calculating the projection $\Pi[\cdot \mid \mathcal{A}]$ requires us to posit working parametric models $p(z \mid \xi)$. But the DR estimating equation will still be valid even if $p(z \mid \xi)$ does not contain the truth.
We conclude this section with a theorem characterizing the augmentation space $\mathcal{A}$.

Theorem 1.20 (Characterization of $\mathcal{A}$)
The space $\mathcal{A}$ consists of all elements that can be written as
$$\sum_{r \neq \infty} \left\{ \frac{I(C = \infty)}{\pi(\infty, Z)}\, \pi(r, G_r(Z)) - I(C = r) \right\} h_r(G_r(Z)), \tag{16}$$
where $h_r(G_r(Z))$ is an arbitrary function of $G_r(Z)$.

Proof. See Theorem 7.2 of Tsiatis (2006).
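A numerical sanity check of (16) is straightforward (a sketch with an assumed three-level coarsening of $Z = (Z_1, Z_2)$: level 1 observes $Z_1$, level 2 observes $Z_2$, and $C = \infty$ observes everything; by construction each element of (16) has conditional mean zero given $Z$):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
z1, z2 = rng.normal(size=n), rng.normal(size=n)

# Assumed CAR coarsening: pi(r, G_r(Z)) depends on Z only through G_r(Z).
p1 = 0.2 / (1 + np.exp(-z1))          # P(C=1 | Z): observe G_1(Z) = Z1 only
p2 = 0.2 / (1 + np.exp(-z2))          # P(C=2 | Z): observe G_2(Z) = Z2 only
p_inf = 1 - p1 - p2                   # P(C = infinity | Z): observe Z fully
u = rng.uniform(size=n)
c = np.where(u < p1, 1, np.where(u < p1 + p2, 2, 0))   # 0 codes C = infinity

h1, h2 = np.sin(z1), z2**2            # arbitrary h_r(G_r(Z))
elem = (((c == 0) / p_inf) * p1 - (c == 1)) * h1 \
     + (((c == 0) / p_inf) * p2 - (c == 2)) * h2
print(f"mean of augmentation-space element: {elem.mean():+.4f}  (should be ~ 0)")
```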
Part II: Applications in Missing Data Problems
Data with two levels of missingness
Suppose $Z = (Z_1, Z_2)$ and $Z_2$ is missing on some observations. Denote $R = 1$ if $Z_2$ is observed and $R = 0$ otherwise. Let $\pi(Z_1; \psi_0) = P(R = 1 \mid Z; \psi_0)$. The following theorem states explicitly how to calculate $\Pi[\frac{R\varphi(Z)}{\pi(Z_1; \psi_0)} \mid \mathcal{A}]$.

Theorem 2.1
$$\Pi\left[ \frac{R\,\varphi(Z)}{\pi(Z_1; \psi_0)} \,\Big|\, \mathcal{A} \right] = \frac{R - \pi(Z_1; \psi_0)}{\pi(Z_1; \psi_0)}\, E[\varphi(Z) \mid Z_1].$$

A sketch of the proof.
We first find that a typical element of $\mathcal{A}$ is $\left[\frac{R - \pi(Z_1; \psi_0)}{\pi(Z_1; \psi_0)}\right] h(Z_1)$. Then we find that the unique function $h_0(Z_1)$ such that
$$\left\{ \frac{R\,\varphi(Z)}{\pi(Z_1; \psi_0)} - \left[\frac{R - \pi(Z_1; \psi_0)}{\pi(Z_1; \psi_0)}\right] h_0(Z_1) \right\} \perp \left[\frac{R - \pi(Z_1; \psi_0)}{\pi(Z_1; \psi_0)}\right] h(Z_1), \quad \text{for all } h(Z_1),$$
is $h_0(Z_1) = E[\varphi(Z) \mid Z_1]$.
Remark 2.2 (DR estimating equation)
From Theorem 2.1 we have that
$$\varphi^{DR}_{\beta, \psi_0} = \frac{R\,\varphi(Z)}{\pi(Z_1; \psi_0)} - \frac{R - \pi(Z_1; \psi_0)}{\pi(Z_1; \psi_0)}\, E[\varphi(Z) \mid Z_1]. \tag{17}$$
To compute $E[\varphi(Z) \mid Z_1]$, we need to posit a parametric model $p(z \mid \xi)$, or at least $p(z_2 \mid z_1; \xi)$, and find a consistent estimator $\hat{\xi}_n$ for $\xi_0$. Then the projection can be computed as $E[\varphi(Z) \mid Z_1; \hat{\xi}_n]$. We should note that the parametric working models need to be consistent with the original semiparametric model. Similar to the motivating example, we can show that the resulting estimating equation is doubly robust to $p(r \mid z; \psi)$ and $p(z \mid \xi)$:
$$\varphi^{DR}_{\beta, \hat{\psi}_n, \hat{\xi}_n} = \frac{R\,\varphi(Z)}{\pi(Z_1; \hat{\psi}_n)} - \frac{R - \pi(Z_1; \hat{\psi}_n)}{\pi(Z_1; \hat{\psi}_n)}\, E[\varphi(Z) \mid Z_1; \hat{\xi}_n].$$
• From the theoretical development in Part I, we know that the estimating equation $\mathbb{P}_n \varphi^{DR}_{\beta, \hat{\psi}_n, \xi_0} = 0$ is the most efficient among the augmented IPW estimating equations if the working model $p(z \mid \xi)$ is true. It can be shown that $\varphi^{DR}_{\beta, \hat{\psi}_n, \xi_0}$ and $\varphi^{DR}_{\beta, \hat{\psi}_n, \hat{\xi}_n}$ give asymptotically equivalent estimators under the working model.
• If we are to conduct robust inference based on $\varphi^{DR}_{\beta, \hat{\psi}_n, \hat{\xi}_n}$, we need to derive the variance without relying on the correctness of the working model $p(z \mid \xi)$. Let $h(Z_1; \xi) = E[\varphi(Z) \mid Z_1; \xi]$ and assume $\hat{\xi}_n \to \xi^*$. Then
$$\begin{aligned}
\sqrt{n}(\hat{\beta}_n - \beta_0)
={}& -\{E\dot{\varphi}_{\beta_0, \psi_0, \xi^*}\}^{-1}\Big\{ \mathbb{G}_n \varphi^{DR}_{\beta_0, \psi_0, \xi^*}
+ \sqrt{n}\big[E\varphi^{DR}_{\beta_0, \hat{\psi}_n, \xi^*} - E\varphi^{DR}_{\beta_0, \psi_0, \xi^*}\big] \\
&\qquad + \sqrt{n}\big[E\varphi^{DR}_{\beta_0, \hat{\psi}_n, \hat{\xi}_n} - E\varphi^{DR}_{\beta_0, \hat{\psi}_n, \xi^*}\big] \Big\} + o_p(1) \\
={}& -\{E\dot{\varphi}_{\beta_0}\}^{-1}\Big\{ \mathbb{G}_n \varphi^{DR}_{\beta_0, \psi_0, \xi^*} - E[\varphi^{DR}_{\beta_0, \psi_0, \xi^*} S_\psi^T]\, \sqrt{n}(\hat{\psi}_n - \psi_0) \Big\} + o_p(1) \\
={}& -\{E\dot{\varphi}_{\beta_0}\}^{-1}\, \mathbb{G}_n\Big\{ \varphi^{DR}_{\beta_0, \psi_0, \xi^*} - E[\varphi^{DR} S_\psi^T]\,[E S_\psi S_\psi^T]^{-1} S_\psi \Big\} + o_p(1),
\end{aligned}$$
where the $\hat{\xi}_n$ term drops out because $E\varphi^{DR}_{\beta_0, \psi_0, \xi} = 0$ for every $\xi$ when the missingness model is correct.
• Denote
$$\tilde{\varphi} = \varphi^{DR} - E[\varphi^{DR} S_\psi^T]\,[E S_\psi S_\psi^T]^{-1} S_\psi.$$
Then we have
$$\sqrt{n}(\hat{\beta}_n - \beta_0) = -\{E\dot{\varphi}_{\beta_0}\}^{-1}\, \mathbb{G}_n \tilde{\varphi}_{\beta_0, \psi_0, \xi^*} + o_p(1) \rightsquigarrow N(0, \Sigma), \tag{18}$$
where $\Sigma$ can be consistently estimated by the sandwich formula
$$\left[\mathbb{P}_n \frac{R\,\dot{\varphi}_{\hat{\beta}_n}}{\pi(Z_1; \hat{\psi}_n)}\right]^{-1}
\mathbb{P}_n\big[\tilde{\varphi}_{\hat{\beta}_n, \hat{\psi}_n, \hat{\xi}_n}\big]^{\otimes 2}
\left[\mathbb{P}_n \frac{R\,\dot{\varphi}_{\hat{\beta}_n}}{\pi(Z_1; \hat{\psi}_n)}\right]^{-T}. \tag{19}$$
Example: Logistic regression with a missing covariate
• Consider a logistic regression
$$P(Y = 1 \mid X; \beta) = \frac{e^{\beta_0 + \beta_1^T X_1 + \beta_2 X_2}}{1 + e^{\beta_0 + \beta_1^T X_1 + \beta_2 X_2}},$$
where $X_2$ is a real-valued continuous covariate and is missing on some subjects; $(Y, X_1^T)^T$ is always observed.
• The full data model is $p(y \mid x; \beta)\, \eta(x)$. Let $X = (1, X_1^T, X_2)^T$. The full data estimating equation is
$$\mathbb{P}_n \varphi_\beta = \mathbb{P}_n\, X\left( Y - \frac{e^{\beta^T X}}{1 + e^{\beta^T X}} \right).$$
• To use the IPW, we use a logistic regression for the missingness mechanism,
$$P(R = 1 \mid Y, X; \psi) = \frac{e^{\psi_0 + \psi_1 Y + \psi_2^T X_1}}{1 + e^{\psi_0 + \psi_1 Y + \psi_2^T X_1}}.$$
The MLE $\hat{\psi}_n$ can be computed by solving $\mathbb{P}_n S_\psi = 0$, where
$$S_\psi = \begin{pmatrix} 1 \\ Y \\ X_1 \end{pmatrix}\left( R - \frac{e^{\psi_0 + \psi_1 Y + \psi_2^T X_1}}{1 + e^{\psi_0 + \psi_1 Y + \psi_2^T X_1}} \right).$$
• To construct the DR estimating equation, we need to compute the conditional expectation
$$E\left[ X\left( Y - \frac{e^{\beta^T X}}{1 + e^{\beta^T X}} \right) \,\Big|\, Y, X_1 \right].$$
• Therefore we need to posit a working model for $p(x \mid \xi)$, or at least for $p(x_2 \mid y, x_1; \xi)$. If we do the latter, we should be aware that $p(x_2 \mid y, x_1; \xi)$ must be compatible with the regression model $p(y \mid x; \beta)$. In fact, if the covariate distribution is MVN, we can show that $x \mid y$ is multivariate normal. This motivates the following working model:
$$X_2 \mid Y, X_1 \sim N(\xi_0 + \xi_1 Y + \xi_2^T X_1,\; \xi_3).$$
The MLE $\hat{\xi}_n$ is easily computed by least squares in a complete-case analysis.
• Finally we need to compute
$$E\left[ X\left( Y - \frac{e^{\beta^T X}}{1 + e^{\beta^T X}} \right) \,\Big|\, Y, X_1; \hat{\xi}_n \right].$$
This can be done using numerical or Monte Carlo integration.
• Hence the DR estimating equation is
$$\Psi_n^{DR}(\beta) = \mathbb{P}_n\left\{ \frac{R}{\pi(Y, X_1; \hat{\psi}_n)}\, X\left( Y - \frac{e^{\beta^T X}}{1 + e^{\beta^T X}} \right)
- \frac{R - \pi(Y, X_1; \hat{\psi}_n)}{\pi(Y, X_1; \hat{\psi}_n)}\, E\left[ X\left( Y - \frac{e^{\beta^T X}}{1 + e^{\beta^T X}} \right) \,\Big|\, Y, X_1; \hat{\xi}_n \right] \right\}. \tag{20}$$
• $\hat{\beta}_n$ can be obtained using the Newton-Raphson algorithm, and its variance estimated using (19).
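A sketch of (20) in code, with a scalar $X_1$; the conditional expectation is computed by Monte Carlo integration over 200 draws from the fitted normal working model, and the root is found with scipy's quasi-Newton solver in place of a hand-rolled Newton-Raphson (all data-generating values and names are assumptions of this sketch):

```python
import numpy as np
import statsmodels.api as sm
from scipy.optimize import root

rng = np.random.default_rng(7)
n = 5_000
expit = lambda t: 1 / (1 + np.exp(-t))
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = rng.binomial(1, expit(-0.5 + x1 + x2))              # true beta = (-0.5, 1, 1)
r = rng.binomial(1, expit(1.0 + 0.5 * y - 0.5 * x1))    # X2 missing at random
x2_obs = np.where(r == 1, x2, 0.0)                      # placeholder; only enters with weight 0

W = np.column_stack([np.ones(n), y, x1])                # design for the missingness model
pi_hat = sm.Logit(r, W).fit(disp=0).predict(W)          # pi(Y, X1; psi_hat)

ols = sm.OLS(x2[r == 1], W[r == 1]).fit()               # X2 | Y,X1 ~ N(W xi, xi_3), complete cases
draws = ols.predict(W)[:, None] + np.sqrt(ols.scale) * rng.normal(size=(n, 200))

def psi_dr(beta):
    lin = beta[0] + beta[1] * x1 + beta[2] * x2_obs
    phi = np.column_stack([np.ones(n), x1, x2_obs]) * (y - expit(lin))[:, None]
    res = y[:, None] - expit(beta[0] + beta[1] * x1[:, None] + beta[2] * draws)
    Ephi = np.column_stack([res.mean(1), x1 * res.mean(1), (draws * res).mean(1)])
    return ((r / pi_hat)[:, None] * phi
            - ((r - pi_hat) / pi_hat)[:, None] * Ephi).mean(0)   # equation (20)

beta_hat = root(psi_dr, x0=np.zeros(3)).x
print("beta_hat:", np.round(beta_hat, 2))               # approximately (-0.5, 1, 1)
```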
Monotone coarsened data

Definition 2.3 (Monotone coarsening)
If we can order the levels of coarsening in such a way that $G_r(Z)$ is a coarsened version of $G_{r+1}(Z)$, $r = 1, 2, \dots$, that is,
$$G_r(Z) = f_r(G_{r+1}(Z)),$$
where $f_r$ is a many-to-one function, then the coarsening is said to be monotone.

Example 2.4 (Monotone missingness in longitudinal data)
When a subject is followed over time, we observe $(Y_1, \dots, Y_k)$, where $Y_j$ is the measurement at the $j$th time point. Incomplete data arise if a subject is lost to follow-up at some point. In this case, if the measurement is missing at the $r$th time point, then all measurements after it will be missing as well.

  $C = r$      $G_r(Z)$
  $1$          $Y_1$
  $2$          $Y_1, Y_2$
  $\vdots$     $\vdots$
  $k-1$        $Y_1, \dots, Y_{k-1}$
  $\infty$     $Y_1, \dots, Y_k$

• For monotone coarsened data, it is natural and convenient to model the missingness via the discrete hazard function
$$\lambda_r(G_r) = \begin{cases} P(C = r \mid C \geq r, Z), & r \neq \infty, \\ 1, & r = \infty. \end{cases}$$
• Define $K_r(G_r) = P(C > r \mid Z) = \prod_{j=1}^{r} [1 - \lambda_j(G_j)]$. Then the coarsening probability can be expressed as
$$\pi(r, G_r(Z)) = \lambda_r(G_r)\, K_{r-1}(G_{r-1}).$$
• As in the case with two levels of missingness, we first need to characterize the augmentation space $\mathcal{A}$ using Theorem 1.20. Then we use the characterization to derive $\Pi(\varphi^{IPW} \mid \mathcal{A})$. We provide the end result in the following theorem.

Theorem 2.5 ($\Pi(\varphi^{IPW} \mid \mathcal{A})$ in monotone coarsened data)
The projection of $I(C = \infty)\varphi(Z)/\pi(\infty, Z)$ onto $\mathcal{A}$ is
$$-\sum_{r \neq \infty} \frac{I(C = r) - \lambda_r(G_r)\, I(C \geq r)}{K_r(G_r)}\, E[\varphi(Z) \mid G_r(Z)]. \tag{21}$$
Again, to compute the conditional expectations $E[\varphi(Z) \mid G_r(Z)]$ we need to posit a parametric working model $p(z \mid \xi)$, or at least a series of conditional models $p(g_{r+1} \mid g_r; \xi_r)$.

Remark 2.6 (Modeling the coarsening hazard)
Instead of modeling the coarsening probability directly, we model the discrete hazard
$$P(C = r \mid C \geq r, Z; \psi_r) = \lambda_r(G_r; \psi_r).$$
With monotone missing longitudinal data, for example, we may apply the logistic model
$$\lambda_r(G_r; \psi_r) = \frac{e^{\psi_{0r} + \psi_{1r} Y_1 + \cdots + \psi_{rr} Y_r}}{1 + e^{\psi_{0r} + \psi_{1r} Y_1 + \cdots + \psi_{rr} Y_r}}.$$
The likelihood of $C$ now has the form
$$\prod_{r \neq \infty}\; \prod_{i:\, C_i \geq r} \lambda_r(G_r(Z_i); \psi_r)^{I(C_i = r)} \left[1 - \lambda_r(G_r(Z_i); \psi_r)\right]^{I(C_i > r)}.$$
Note that the likelihood factorizes over the $\psi_r$, so maximization can be done separately: each $\psi_r$ can be estimated by a logistic regression of the event $\{C_i = r\}$ on the data $\{i : C_i \geq r\}$, and $S_\psi = (S_{\psi_1}^T, \dots, S_{\psi_{k-1}}^T)^T$.
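A sketch of this factorized fitting for monotone longitudinal dropout (the four-visit setup, the hazard model, and the coding of $C = \infty$ as 0 are assumptions of this sketch):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n, k = 10_000, 4
Y = rng.multivariate_normal(np.zeros(k), 0.5 * np.eye(k) + 0.5, size=n)
expit = lambda t: 1 / (1 + np.exp(-t))

# Monotone dropout: hazard at stage r depends on the observed history Y_1..Y_r.
C = np.zeros(n, dtype=int)                    # 0 codes C = infinity (fully observed)
for r in range(1, k):
    lam = expit(-2.0 + 0.8 * Y[:, :r].mean(axis=1))
    drop = (C == 0) & (rng.uniform(size=n) < lam)
    C[drop] = r

# The likelihood factorizes over r: fit each lambda_r by a separate logistic
# regression of the event {C_i = r} among the still-observed {i : C_i >= r}.
fits = []
for r in range(1, k):
    at_risk = (C == 0) | (C >= r)
    fits.append(sm.Logit((C[at_risk] == r).astype(float),
                         sm.add_constant(Y[at_risk, :r])).fit(disp=0))

# K_r(G_r) = P(C > r | Z) = prod_{j<=r} [1 - lambda_j(G_j)], as needed in (21).
Kr = np.ones(n)
for r, fit in enumerate(fits, start=1):
    Kr *= 1 - fit.predict(sm.add_constant(Y[:, :r]))
print("mean K_{k-1}:", round(Kr.mean(), 3),
      "  fully observed fraction:", round((C == 0).mean(), 3))  # these should agree
```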
• Now we look at the problem of double robustness. Let $\hat{\psi}_n \to \psi^*$ and $\hat{\xi}_n \to \xi^*$.

Theorem 2.7 (Double robustness of DR)
$$E\left\{ \frac{I(C = \infty)\, \varphi_{\beta_0}(Z)}{\pi(\infty, Z; \psi^*)} + \sum_{r \neq \infty} \frac{I(C = r) - \lambda_r(G_r; \psi^*)\, I(C \geq r)}{K_r(G_r; \psi^*)}\, E[\varphi_{\beta_0}(Z) \mid G_r(Z); \xi^*] \right\} = 0,$$
if either the model for $\lambda_r(g_r; \psi)$ or the working model $p(z \mid \xi)$ is correctly specified.

A sketch of the proof: define the filtration
$$\mathcal{F}_r = \sigma\{I(C = 1), \dots, I(C = r - 1), Z\}$$
and use martingale arguments.
Remark 2.8 (Inference with DR)
Denote
$$\tilde{\varphi} = \varphi^{DR} - E[\varphi^{DR} S_\psi^T]\,[E S_\psi S_\psi^T]^{-1} S_\psi.$$
Similar to the case with two levels of missingness, we can show that
$$\sqrt{n}(\hat{\beta}_n - \beta_0) = -\{E\dot{\varphi}_{\beta_0}\}^{-1}\, \mathbb{G}_n \tilde{\varphi}_{\beta_0, \psi_0, \xi^*} + o_p(1) \rightsquigarrow N(0, \Sigma), \tag{22}$$
where $\Sigma$ can be consistently estimated by
$$\left[\mathbb{P}_n \frac{I(C = \infty)\, \dot{\varphi}_{\hat{\beta}_n}}{\pi(\infty, Z; \hat{\psi}_n)}\right]^{-1}
\mathbb{P}_n\big[\tilde{\varphi}_{\hat{\beta}_n, \hat{\psi}_n, \hat{\xi}_n}\big]^{\otimes 2}
\left[\mathbb{P}_n \frac{I(C = \infty)\, \dot{\varphi}_{\hat{\beta}_n}}{\pi(\infty, Z; \hat{\psi}_n)}\right]^{-T}. \tag{23}$$
Example: A longitudinal RCT with dropout
• Tsiatis (2006) describes a randomized clinical trial of a new drug for HIV/AIDS. The primary outcome is CD4 count, denoted $Y$. We also denote by $X$ the indicator variable for treatment. Measurements of $Y$ are taken at baseline $t_1 = 0$ and at $l - 1$ subsequent time points, denoted $t_2, \dots, t_l$. We want to model the mean CD4 count as a function of treatment and time through
$$E[Y_{ij} \mid X_i] = \beta_0 + \beta_1 t_j + \beta_2 X_i t_j, \quad j = 1, \dots, l.$$
• Let the design matrix be $D(X)$, that is, $E[Y_i \mid X_i] = D(X_i)\beta$. If there is no dropout, we may use the GEE with independent working correlation, resulting in the estimating equation
$$\mathbb{P}_n \varphi_\beta(Y, X) = \mathbb{P}_n\, D^T(X)(Y - D(X)\beta).$$
• Now suppose there is random dropout, and the mechanism is MAR.
• First we use a logistic regression for the dropout hazard,
$$\lambda_r(G_r; \psi_r) = \frac{e^{\psi_{0r} + \psi_{1r} Y_1 + \cdots + \psi_{rr} Y_r + \psi_{r+1,r} X}}{1 + e^{\psi_{0r} + \psi_{1r} Y_1 + \cdots + \psi_{rr} Y_r + \psi_{r+1,r} X}},$$
and obtain the MLE $\hat{\psi}_n$.
• Denote $\bar{Y}_r = (Y_1, \dots, Y_r)^T$ and $\underline{Y}_r = (Y_{r+1}, \dots, Y_l)^T$. From Theorem 2.5, we need to compute the conditional expectation
$$E[D^T(X)(Y - D(X)\beta) \mid \bar{Y}_r, X] = D^T(X)\, E[(Y - D(X)\beta) \mid \bar{Y}_r, X].$$
• If we posit the working model $Y \mid (X = k) \sim N(\mu_k, \Sigma)$, with $\Sigma_{rr}$ the variance of $\bar{Y}_r$ and $\Sigma_{\underline{r}r}$ the covariance between $\underline{Y}_r$ and $\bar{Y}_r$, the conditional expectation follows from the usual multivariate normal formulas: the first $r$ components of $E[(Y - D(X)\beta) \mid \bar{Y}_r, X = k; \xi]$ are the observed residuals $\bar{Y}_r - \bar{D}_r(X)\beta$, and the remaining components are
$$\bar{\mu}_{\underline{r},k} + \Sigma_{\underline{r}r}\, \Sigma_{rr}^{-1} (\bar{Y}_r - \bar{\mu}_{r,k}) - \underline{D}_r(X)\beta.$$
• The MLE $\hat{\xi}_n$ can be computed using standard statistical packages (e.g., PROC MIXED in SAS).
• $\beta$ can be estimated from the DR estimating equation using the Newton-Raphson algorithm:
$$\mathbb{P}_n \varphi^{DR}_{\beta, \hat{\psi}_n, \hat{\xi}_n} = \mathbb{P}_n\left\{ \frac{I(C = \infty)}{\pi(\infty, Y, X; \hat{\psi}_n)}\, D^T(X)(Y - D(X)\beta)
+ \sum_{r=1}^{l-1} \frac{I(C = r) - \lambda_r(\bar{Y}_r, X; \hat{\psi}_n)\, I(C \geq r)}{K_r(\bar{Y}_r, X; \hat{\psi}_n)}\, D^T(X)\, E[(Y - D(X)\beta) \mid \bar{Y}_r, X; \hat{\xi}_n] \right\} = 0. \tag{24}$$
• The asymptotic variance of $\hat{\beta}_n$ can be estimated using the sandwich-type estimator described in (23).
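A sketch of the key computational ingredient of (24): the conditional residual mean $E[(Y - D(X)\beta) \mid \bar{Y}_r, X; \xi]$ under the multivariate normal working model. Per-arm sample moments from the completers stand in for a PROC MIXED fit (a crude stand-in: under MAR these moments are not the MLE, but the DR construction tolerates a misspecified working model); all names, the three-visit design, and the dropout mechanism are assumptions of this sketch:

```python
import numpy as np

rng = np.random.default_rng(9)
n, l = 8_000, 3
t = np.array([0.0, 1.0, 2.0])
x = rng.binomial(1, 0.5, size=n)
D = np.stack([np.column_stack([np.ones(l), t, xi * t]) for xi in x])   # n x l x 3
beta_true = np.array([5.0, -0.5, 0.4])
Sig = 0.5 * np.eye(l) + 0.5
Y = np.einsum("nij,j->ni", D, beta_true) + rng.multivariate_normal(np.zeros(l), Sig, size=n)

expit = lambda s: 1 / (1 + np.exp(-s))
C = np.zeros(n, dtype=int)                    # 0 codes C = infinity (completers)
for r in range(1, l):
    lam = expit(-2.5 + 0.3 * Y[:, r - 1] - 0.3 * x)
    drop = (C == 0) & (rng.uniform(size=n) < lam)
    C[drop] = r

# Working model Y | X = k ~ N(mu_k, Sigma): per-arm completer moments.
mu = {k: Y[(x == k) & (C == 0)].mean(0) for k in (0, 1)}
S = {k: np.cov(Y[(x == k) & (C == 0)].T) for k in (0, 1)}

def cond_resid(i, r, beta):
    """E[Y - D(X)beta | Ybar_r, X] for subject i under the MVN working model."""
    k = x[i]
    res = Y[i] - D[i] @ beta                  # observed residuals (first r entries used)
    pred = mu[k][r:] + S[k][r:, :r] @ np.linalg.solve(S[k][:r, :r], Y[i, :r] - mu[k][:r])
    return np.concatenate([res[:r], pred - D[i, r:] @ beta])

print(np.round(cond_resid(0, 1, beta_true), 2))  # Y_2, Y_3 replaced by MVN predictions
```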
References

Bang H, Robins JM (2005). Doubly robust estimation in missing data and causal inference models. Biometrics 61, 962-973.
Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA (1993). Efficient and Adaptive Estimation for Semiparametric Models. Springer.
Kosorok MR (2008). Introduction to Empirical Processes and Semiparametric Inference. Springer.
Lipsitz SR, Ibrahim JG, Zhao LP (1999). A weighted estimating equation for missing covariate data with properties similar to maximum likelihood. Journal of the American Statistical Association 94, 1147-1160.
Robins JM, Rotnitzky A (2001). Comment on the Bickel and Kwon article, "Inference for semiparametric models: Some questions and an answer". Statistica Sinica 11, 920-936.
Scharfstein DO, Rotnitzky A, Robins JM (1999). Adjusting for nonignorable drop-out using semiparametric nonresponse models. Journal of the American Statistical Association 94, 1135-1146.
Tsiatis AA (2006). Semiparametric Theory and Missing Data. Springer.