# Lesage


1. On the problem of bias amplification of the instrumental calibration estimator with missing survey data. Éric LESAGE, Laboratoire de statistique d'enquête, CREST-ENSAI. Joint work with David HAZIZA (Université de Montréal and CREST-ENSAI). 31 January 2013, Séminaire de Statistique ENSAE-ENSAI, CREST, Paris.
2. Outline
    1. Introduction
    2. Underlying models
    3. Bias amplification of the instrumental calibration estimators
    4. Simulation study
3. Introduction
    - Context: Nonresponse is a major problem in surveys. In the presence of nonresponse, the usual complete data estimators may be biased when respondents and nonrespondents differ with respect to the survey variables. A weighting approach that has received a lot of attention recently is the so-called single-step approach, which uses calibration. See Deville (1998, 2002), Sautory (2003), Särndal and Lundström (2005), Kott (2006, 2009, 2012), among others.
    - Issue: We examine the properties of instrument vector calibration estimators, where the instrumental variables (related to the response propensity) are available for the responding units only. More specifically, the problem of bias amplification is illustrated.
4. Introduction
    - Consider a finite population $U$ of size $N$.
    - The objective is to estimate the population total $t_y = \sum_{k \in U} y_k$ of a variable of interest $y$ (e.g., income).
    - A sample $s$ of size $n$ is selected from $U$ according to a given sampling design $p(s)$.
    - A complete data estimator of $t_y$ is the expansion estimator $\hat{t}_\pi = \sum_{k \in s} d_k y_k$, where $d_k = 1/\pi_k$ denotes the design weight attached to unit $k$ and $\pi_k = P(k \in s)$ denotes its first-order probability of inclusion in the sample.
    - In the presence of unit nonresponse, only a subset $s_r$ of $s$ is observed, which makes it impossible to compute $\hat{t}_\pi$.
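As a minimal illustration of the expansion estimator defined above, the sketch below weights each sampled value by the inverse of its inclusion probability. The function name and the three-unit sample are hypothetical and serve only as an example.

```python
def expansion_estimator(y, pi):
    """Expansion estimator: sum over the sample of d_k * y_k with d_k = 1 / pi_k."""
    return sum(yk / pk for yk, pk in zip(y, pi))


# Hypothetical sample of n = 3 units with their first-order inclusion probabilities.
y_sample = [12.0, 8.0, 10.0]
pi_sample = [0.1, 0.2, 0.25]

# Design weights are 10, 5 and 4, so the estimate is 120 + 40 + 40.
t_hat = expansion_estimator(y_sample, pi_sample)
print(t_hat)
```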
5. Introduction
    - To define a nonresponse-adjusted estimator of $t_y$, we assume that a vector of auxiliary variables $x$ is available for $k \in s_r$ and that the vector of population totals $t_x = \sum_{k \in U} x_k$ is known.
    - In practice, the $x$-vector is often defined by survey managers, who wish to ensure consistency between survey weighted estimates and known population totals for some important variables (e.g., age and sex).
    - In addition, we assume that a vector of instrumental variables $z$, of the same dimension as $x$, is available for $k \in s_r$. The $z$-vector needs only to be available for the respondents.
    - The instrumental variables are believed to be associated with the propensity of units to respond to the survey.
    - Let $R_k$ be a response indicator attached to unit $k$, such that $R_k = 1$ if unit $k$ is a respondent and $R_k = 0$ otherwise.
6. Introduction: instrumental calibration estimator $\hat{t}_C$
    - We consider an instrumental calibration estimator (Deville (1998, 2002)) of the form $\hat{t}_C = \sum_{k \in s} w_k R_k y_k$, where $w_k = d_k F(\lambda_r^\top z_k)$.
    - $F(\cdot)$ is a monotonic and twice differentiable function.
    - $F(\lambda_r^\top z_k)$ is a weighting adjustment factor, which is essentially an estimate of the inverse of the response probability for unit $k$.
    - The weights $w_k$ are constructed so that the calibration constraints $\sum_{k \in s_r} w_k x_k = \sum_{k \in U} x_k$ are satisfied.
7. Introduction: remarks
    - Linear weighting is a special case for which the weights are given by $w_k = d_k (1 + \lambda_r^\top z_k)$.
    - When $x$ is used in the calibration instead of $z$, we have a usual calibration estimator and $w_k = d_k F(\lambda_r^\top x_k)$.
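For linear weighting, substituting $w_k = d_k(1 + \lambda_r^\top z_k)$ into the calibration constraints gives the linear system $\big(\sum_{k \in s_r} d_k x_k z_k^\top\big)\lambda_r = t_x - \sum_{k \in s_r} d_k x_k$. The sketch below solves it in the two-dimensional case $x_k = (1, x_k)^\top$, $z_k = (1, z_k)^\top$ that the talk uses later. The function names and the toy data are assumptions made for illustration only.

```python
def solve_2x2(a, b):
    """Solve the 2x2 linear system a @ sol = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [
        (b[0] * a[1][1] - a[0][1] * b[1]) / det,
        (a[0][0] * b[1] - a[1][0] * b[0]) / det,
    ]


def linear_calibration_weights(d, x, z, t_x):
    """Weights w_k = d_k * (1 + lam' z_k) with z_k = (1, z_k), calibrated on x_k = (1, x_k).

    d, x, z are given for respondents only; t_x = (N, population total of x).
    """
    m = [[0.0, 0.0], [0.0, 0.0]]  # sum over respondents of d_k x_k z_k'
    c = [t_x[0], t_x[1]]          # becomes t_x - sum over respondents of d_k x_k
    for dk, xk, zk in zip(d, x, z):
        xv, zv = (1.0, xk), (1.0, zk)
        for i in range(2):
            c[i] -= dk * xv[i]
            for j in range(2):
                m[i][j] += dk * xv[i] * zv[j]
    lam = solve_2x2(m, c)
    return [dk * (1.0 + lam[0] + lam[1] * zk) for dk, zk in zip(d, z)]


# Hypothetical census-type example (d_k = 1): four respondents out of N = 6 units.
d = [1.0, 1.0, 1.0, 1.0]
x = [1.0, 2.0, 4.0, 5.0]
z = [0.5, 1.5, 2.0, 3.5]
t_x = (6.0, 20.0)  # assumed known totals: N = 6 and total of x = 20

w = linear_calibration_weights(d, x, z, t_x)
# The weighted respondent totals reproduce the known totals (up to rounding error).
print(sum(w), sum(wk * xk for wk, xk in zip(w, x)))
```

Note that the instrument $z$ enters only through the weights, while the constraints are expressed in terms of $x$; this is what distinguishes instrument vector calibration from the usual calibration.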
8. Introduction: error decomposition
    - The total error of $\hat{t}_C$ can be expressed as

      $$\hat{t}_C - t_y = \underbrace{(\hat{t}_\pi - t_y)}_{\text{sampling error}} + \underbrace{(\hat{t}_C - \hat{t}_\pi)}_{\text{nonresponse error}}.$$

    - Since the sampling error does not depend on nonresponse, we focus on the nonresponse error in the sequel.
    - Without loss of generality, we consider the case of a census, $s = U$, so that the sampling error $\hat{t}_\pi - t_y$ is equal to zero.
9. Introduction: first approach, good specification of the $y$ model
    - Regardless of the choice of $F(\cdot)$, the instrument vector calibration estimator $\hat{t}_C$ perfectly estimates $t_y$ if the variable of interest $y$ is perfectly explained by the $x$-vector, i.e., $y_k = x_k^\top \beta$ for some vector $\beta$.
    - Hence, we expect $\hat{t}_C$ to exhibit a small bias if the $y$-variable and the $x$-vector are linearly related and the relationship is strong.
    - However, in multipurpose surveys, the number of variables of interest is typically large (possibly a few hundred), and it is therefore unrealistic to presume that the $x$-vector is linearly related to all $y$-variables, in which case some estimates could suffer from bias.
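The exactness claim follows from the calibration constraints alone. A short derivation, using only the definitions of $\hat{t}_C$ and $w_k$ given earlier and assuming $y_k = x_k^\top \beta$ holds for every unit:

$$
\hat{t}_C
= \sum_{k \in s_r} w_k y_k
= \sum_{k \in s_r} w_k x_k^\top \beta
= \Big(\sum_{k \in s_r} w_k x_k\Big)^{\top} \beta
= \Big(\sum_{k \in U} x_k\Big)^{\top} \beta
= \sum_{k \in U} y_k
= t_y .
$$

Neither $F(\cdot)$ nor the instruments $z_k$ appear in the argument, which is why the result holds regardless of how the weights are parameterized.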
10. Introduction: second approach, estimation of the propensity of response
    - For linear weighting, Särndal and Lundström (2005, Chapter 9) showed that $\hat{t}_C$ is asymptotically unbiased for $t_y$ for every $y$-variable provided that the response probability of unit $k$, $p_k$, is such that

      $$p_k^{-1} = 1 + \lambda^\top z_k \quad \text{for all } k \in U, \qquad (1)$$

      for a vector of unknown constants $\lambda$; see also Kott and Liao (2012) for a discussion of nonlinear weighting.
    - However, in practice, it is not clear how to validate the form of the relationship in (1), since the $z$-vector is available for the respondents only.
11. Introduction
    - The purpose of this presentation is to examine the so-called problem of bias amplification in the context of instrument vector calibration.
    - In the context of epidemiological studies, it has been found that including instrumental variables in the set of conditioning variables can increase unmeasured confounding bias; see Bhattacharya and Vogt (2007), Wooldridge (2009), Pearl (2010) and Myers et al. (2011).
    - We argue that the same is true in the context of instrument vector calibration. Some preliminary studies in this direction can be found in Lesage (2012) and Osier (2012).
12. Underlying models: superpopulation model
    - Let $(y_k, z_k)^\top$ be a realisation of the vector of random variables $(Y_k, Z_k)^\top$, $k \in U$.
    - Without loss of generality, we assume that $E(Z_k) = 0$ and $V(Z_k) = 1$.
    - Further, we assume that the relationship between $Y$ and $Z$ can be modeled using $Y_k = \beta_0 + \beta_1 Z_k + \varepsilon_k^y$, with $E(\varepsilon_k^y \mid Z_k) = 0$.
    - This model is often called a prediction model or outcome regression model.
13. Underlying models: nonresponse model
    - We also assume the following nonresponse model: $R_k = \gamma_0 + \gamma_1 Z_k + \varepsilon_k^R$, with $E(\varepsilon_k^R \mid Z_k) = 0$.
    - We assume that $y$ is not a direct explanatory variable of nonresponse: $\operatorname{cov}(Y_k, R_k \mid Z_k) = 0$.
    - Remarks: the nonresponse model states that the response indicators $R_k$ are linearly related to $Z$. Although this relationship may seem awkward, it will be useful for studying the problem of bias amplification. A more realistic nonresponse model, namely the logistic model, is considered in the empirical study.
14. Underlying models. [Figure: graph of the variables y, z and R, with an edge between Z and Y labelled β1 and an edge between Z and R labelled γ1.]
15. Bias amplification of the instrumental calibration estimators: naive estimator
    - We consider the naive estimator

      $$\hat{t}_{naive} = N \times \frac{\sum_{k \in U} y_k R_k}{\sum_{k \in U} R_k}.$$

    - We have:

      $$\frac{\hat{t}_{naive}}{N} - \frac{t_y}{N} = \frac{\operatorname{cov}(Y_k, R_k)}{\gamma_0} + o_P(1) = \frac{\beta_1 \gamma_1}{\gamma_0} + o_P(1).$$

    - Example: with $\gamma_0 = 0.5$, $\gamma_1 = \sqrt{3}/10$, $\beta_0 = 10$ and $\beta_1 = 2$, the relative bias is $\frac{\gamma_1}{\gamma_0} \times \frac{\beta_1}{\beta_0} = 6.9\%$.
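The 6.9% figure can be checked both arithmetically and by a quick simulation. The sketch below assumes, for illustration only, that $Z$ is uniform on $[-\sqrt{3}, \sqrt{3}]$ (as in the simulation study later in the talk), that the outcome errors are standard normal, and that $R_k$ is Bernoulli with probability $\gamma_0 + \gamma_1 Z_k$; the function name, population size and seed are arbitrary choices.

```python
import math
import random

gamma0, gamma1 = 0.5, math.sqrt(3) / 10
beta0, beta1 = 10.0, 2.0

# Asymptotic bias of t_naive / N, then expressed relative to E(Y) = beta0.
asymptotic_bias = beta1 * gamma1 / gamma0
relative_bias_pct = 100 * asymptotic_bias / beta0


def simulate_naive_relative_bias(n=100_000, seed=2013):
    """Relative error (in %) of the naive estimator on one simulated population."""
    rng = random.Random(seed)
    root3 = math.sqrt(3)
    total_y = resp_y = resp_n = 0.0
    for _ in range(n):
        z = rng.uniform(-root3, root3)                 # E(Z) = 0, V(Z) = 1
        y = beta0 + beta1 * z + rng.gauss(0.0, 1.0)    # outcome model
        responds = rng.random() < gamma0 + gamma1 * z  # P(R = 1 | Z) is linear in Z
        total_y += y
        if responds:
            resp_y += y
            resp_n += 1
    naive_mean = resp_y / resp_n  # t_naive / N
    true_mean = total_y / n       # t_y / N
    return 100 * (naive_mean - true_mean) / true_mean


print(round(relative_bias_pct, 1))  # 6.9
print(simulate_naive_relative_bias())
```

With these parameter values the response probability stays between 0.2 and 0.8, so the linear response model is a valid probability model on the whole support of $Z$.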
16. Bias amplification of the instrumental calibration estimators: instrument vector calibration estimators
    - We suppose that a proxy variable of $z$, denoted $x$, is available.
    - Definition: a proxy variable of $z$, in the nonresponse context, is a variable $x$ such that:
        1. $x$ is an auxiliary variable whose population total $t_x$ is known;
        2. $\operatorname{cor}(X_k, Z_k) \neq 0$;
        3. $\operatorname{cov}(X_k, Z_k \mid R_k = 1) \neq 0$.
    - We assume that the relationship between $X$ and $Z$ can be modeled using $X_k = \alpha_0 + \alpha_1 Z_k + \varepsilon_k^x$, with $E(\varepsilon_k^x \mid Z_k) = 0$ and $V(\varepsilon_k^x) = \sigma_x^2 = 1 - \alpha_1^2$.
    - Remarks: $V(X_k) = 1$ and $\operatorname{cor}(X_k, Z_k) = \alpha_1$.
17. Bias amplification of the instrumental calibration estimators: instrument vector calibration estimators
    - We also assume that $x$ is not a direct explanatory variable of the nonresponse: $\operatorname{cov}(X_k, R_k \mid Z_k) = 0$.
18. Bias amplification of the instrumental calibration estimators. [Figure: graph of the variables y, z, x and R; this build shows an edge between Z and Y labelled β1 and an edge between Z and R labelled γ1.]
19. Bias amplification of the instrumental calibration estimators. [Figure: graph of the variables y, z, x and R, with edges between Z and Y (β1), Z and X (α1), and Z and R (γ1).]
20. Bias amplification of the instrumental calibration estimators: instrument vector calibration estimators
    - We consider the instrument vector calibration estimator with linear weighting

      $$\hat{t}_C = \mathbf{t}_x^\top \Big(\sum_{k \in s_r} \mathbf{z}_k \mathbf{x}_k^\top\Big)^{-1} \sum_{k \in s_r} \mathbf{z}_k y_k,$$

      where $\mathbf{x}_k = (1, x_k)^\top$, $\mathbf{z}_k = (1, z_k)^\top$ and $\mathbf{t}_x = (N, t_x)^\top$.
    - Since $\operatorname{cov}(X_k, R_k \mid Z_k) = 0$, we have:

      $$\frac{\hat{t}_C}{N} - \frac{t_y}{N} = o_P(1).$$
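The closed form above only needs a 2x2 inverse, so it can be sketched without any linear algebra library. The function name and the toy respondent data below are hypothetical. The example is chosen so that $y_k = 3 + 2x_k$ exactly, in which case the estimator should return $\mathbf{t}_x^\top \beta = 3N + 2t_x$ whatever the instrument values are.

```python
def instrument_calibration_estimator(y, x, z, t_x):
    """t_C = t_x' (sum z_k x_k')^{-1} (sum z_k y_k), sums over respondents.

    Uses z_k = (1, z_k), x_k = (1, x_k) and t_x = (N, population total of x).
    """
    a = [[0.0, 0.0], [0.0, 0.0]]  # sum of z_k x_k'
    b = [0.0, 0.0]                # sum of z_k y_k
    for yk, xk, zk in zip(y, x, z):
        zv, xv = (1.0, zk), (1.0, xk)
        for i in range(2):
            b[i] += zv[i] * yk
            for j in range(2):
                a[i][j] += zv[i] * xv[j]
    # Solve a @ sol = b by Cramer's rule.
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    sol0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    sol1 = (a[0][0] * b[1] - a[1][0] * b[0]) / det
    return t_x[0] * sol0 + t_x[1] * sol1


# Hypothetical respondents; y is an exact linear function of x.
x_resp = [1.0, 2.0, 4.0, 5.0]
z_resp = [0.5, 1.5, 2.0, 3.5]
y_resp = [3.0 + 2.0 * xk for xk in x_resp]
t_x = (10.0, 30.0)  # assumed known totals: N = 10 and total of x = 30

# Expected value: 3 * 10 + 2 * 30 = 90 (up to rounding error).
print(instrument_calibration_estimator(y_resp, x_resp, z_resp, t_x))
```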
21. Bias amplification of the instrumental calibration estimators: what if...
    - It is not trivial to verify whether $\operatorname{cov}(X_k, R_k \mid Z_k) = 0$, since the variable $z$ is available only for the respondents.
    - What if $\operatorname{cov}(X_k, R_k \mid Z_k) \neq 0$?
22. Bias amplification of the instrumental calibration estimators. [Figure: graph of the variables y, z, x, u and R; the unobserved variable U is added to the graph, with edge labels β1, α1 and γ1.]
23. Bias amplification of the instrumental calibration estimators. [Figure: graph of the variables y, z, x, u and R, with edges between Z and Y (β1), Z and X (α1), U and X (α2), Z and R (γ1), and U and R (γ2).] In this graph, $\operatorname{cov}(R_k, X_k \mid Z_k) = \alpha_2 \gamma_2$.
24. Bias amplification of the instrumental calibration estimators
    - We now assume that there exists an unobserved variable $u$, independent of $z$ and $y$, that is an explanatory variable in the nonresponse model.
    - Without loss of generality, we assume that $E(U_k \mid Z_k, Y_k) = 0$ and $V(U_k \mid Z_k, Y_k) = 1$.
    - The nonresponse model is rewritten as $R_k = \gamma_0 + \gamma_1 Z_k + \gamma_2 U_k + \varepsilon_k^R$, with $E(\varepsilon_k^R \mid Z_k, U_k) = 0$.
    - We still assume that $\operatorname{cov}(Y_k, R_k \mid Z_k) = 0$.
25. Bias amplification of the instrumental calibration estimators
    - Moreover, we assume that the variable $x$ is linked to the variable $u$: $X_k = \alpha_0 + \alpha_1 Z_k + \alpha_2 U_k + \varepsilon_k^x$, with $E(\varepsilon_k^x \mid Z_k, U_k) = 0$, $V(\varepsilon_k^x) = \sigma_x^2 = 1 - \alpha_1^2 - \alpha_2^2$ and $E(\varepsilon_k^R \varepsilon_k^x \mid Z_k, U_k) = 0$.
    - Then we have $\operatorname{cov}(R_k, X_k \mid Z_k) = \alpha_2 \gamma_2$.
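The conditional covariance can be derived in a few lines from the two models. The sketch below uses $V(U_k \mid Z_k) = 1$, which follows from the independence of $u$ and $z$ together with the normalization of $U_k$; given $Z_k$, the terms in $Z_k$ are constants and drop out:

$$
\begin{aligned}
\operatorname{cov}(R_k, X_k \mid Z_k)
&= \operatorname{cov}\big(\gamma_2 U_k + \varepsilon_k^R,\; \alpha_2 U_k + \varepsilon_k^x \,\big|\, Z_k\big) \\
&= \alpha_2 \gamma_2\, V(U_k \mid Z_k)
 + \gamma_2 \operatorname{cov}(U_k, \varepsilon_k^x \mid Z_k)
 + \alpha_2 \operatorname{cov}(\varepsilon_k^R, U_k \mid Z_k)
 + \operatorname{cov}(\varepsilon_k^R, \varepsilon_k^x \mid Z_k) \\
&= \alpha_2 \gamma_2 .
\end{aligned}
$$

The two middle terms vanish because $E(\varepsilon_k^x \mid Z_k, U_k) = 0$ and $E(\varepsilon_k^R \mid Z_k, U_k) = 0$, and the last term vanishes because $E(\varepsilon_k^R \varepsilon_k^x \mid Z_k, U_k) = 0$.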
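26. Bias amplification of the instrumental calibration estimators
    - We have:

      $$\frac{\hat{t}_C}{N} - \frac{t_y}{N} = -\frac{\beta_1}{\alpha_1}\,\frac{1}{E(R_k)}\,\frac{E(Z_k X_k \mid R_k = 1)}{\operatorname{cov}(Z_k, X_k \mid R_k = 1)}\,\operatorname{cov}(R_k, X_k \mid Z_k) + o_P(1)$$

      $$= -\frac{\beta_1 \alpha_2 \gamma_2}{\gamma_0 \alpha_1}\;\frac{\alpha_1 + \alpha_0 \frac{\gamma_1}{\gamma_0}}{\alpha_1 - \frac{\gamma_1}{\gamma_0}\left(\alpha_1 \frac{\gamma_1}{\gamma_0} + \alpha_2 \frac{\gamma_2}{\gamma_0}\right)} + o_P(1).$$

    - If $\alpha_2 \gamma_2 \neq 0$, then the instrument vector calibration is "biased".
    - The "bias" is amplified if $\alpha_1$ is small (i.e., weak proxy).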
27. Bias amplification of the instrumental calibration estimators: bias amplification for weak proxy with the instrument vector calibration estimator

    | α1  | α2 = 0 | α2 = 0.1 | α2 = 0.3 | α2 = 0.7 |
    |-----|--------|----------|----------|----------|
    | 0.7 | 0      | -0.8     | -2.3     | -5.8     |
    | 0.3 | 0      | -1.8     | -5.8     | -15.5    |
    | 0.1 | 0      | -5.8     | -21.7    | -101     |

    [Figure: graph of the variables y, z, x, u and R, with edge labels β1, α1, α2, γ1 and γ2.]
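The tabulated values can be explored numerically with the closed-form expression for the asymptotic bias of the instrument vector calibration estimator. The sketch below reuses $\gamma_0$, $\gamma_1$, $\beta_0$ and $\beta_1$ from the naive-estimator example and takes $\alpha_0 = 0$. The value of $\gamma_2$ is not stated in the slides: $\gamma_2 = \sqrt{3}/15$ is an assumption, chosen because it appears to reproduce the table when the bias is expressed relative to $\beta_0$ and in percent.

```python
import math

# Values taken from the naive-estimator example in the slides.
GAMMA0, GAMMA1 = 0.5, math.sqrt(3) / 10
BETA0, BETA1 = 10.0, 2.0
# ASSUMPTION: gamma2 is not given in the slides; sqrt(3)/15 is a guess that
# appears to match the tabulated values.
GAMMA2 = math.sqrt(3) / 15


def instrument_relative_bias_pct(alpha1, alpha2, alpha0=0.0):
    """Asymptotic bias of t_C / N for instrument vector calibration, in % of beta0."""
    r1 = GAMMA1 / GAMMA0
    r2 = GAMMA2 / GAMMA0
    numerator = alpha1 + alpha0 * r1
    denominator = alpha1 - r1 * (alpha1 * r1 + alpha2 * r2)
    bias = -(BETA1 * alpha2 * GAMMA2) / (GAMMA0 * alpha1) * numerator / denominator
    return 100 * bias / BETA0


# One row per alpha1, one column per alpha2, as in the table.
for a1 in (0.7, 0.3, 0.1):
    row = [round(instrument_relative_bias_pct(a1, a2), 1) for a2 in (0.0, 0.1, 0.3, 0.7)]
    print(a1, row)
```

The denominator shrinks with $\alpha_1$, which is the amplification mechanism: for a fixed $\alpha_2 \gamma_2$, a weaker proxy divides the same numerator by a smaller quantity.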
28. Bias amplification of the instrumental calibration estimators: usual calibration estimators
    - We have seen that instrument vector calibration could lead to estimators with large biases.
    - Would a simple calibration protect against such a risk of bias amplification?

      $$\hat{t}_C = \mathbf{t}_x^\top \Big(\sum_{k \in s_r} \mathbf{x}_k \mathbf{x}_k^\top\Big)^{-1} \sum_{k \in s_r} \mathbf{x}_k y_k, \qquad (2)$$

      with $\mathbf{x}_k = (1, x_k)^\top$ and $\mathbf{t}_x = (N, t_x)^\top$.
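29. Bias amplification of the instrumental calibration estimators
    - The simple calibration estimator is asymptotically biased:

      $$\frac{\hat{t}_C}{N} - \frac{t_y}{N} = \frac{\beta_1}{\gamma_0}\;\frac{\gamma_1 \sigma_x^2 - \alpha_2(\alpha_1 \gamma_2 - \alpha_2 \gamma_1) + B}{1 - \left(\alpha_1 \frac{\gamma_1}{\gamma_0} + \alpha_2 \frac{\gamma_2}{\gamma_0}\right)^2} + o_P(1),$$

      where $B = \alpha_0\left[\alpha_2 \gamma_2(\alpha_1 \gamma_1 + \alpha_2 \gamma_2) - \gamma_1\left(1 - (\alpha_1^2 + \alpha_2^2)\right)\right]$ is a null term when $\alpha_0 = 0$. Dividing by $\beta_0$, the limit of $t_y/N$, gives the asymptotic relative bias.
    - However, the simple calibration estimator offers protection against bias amplification.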
30. Bias amplification of the instrumental calibration estimators: no bias amplification for weak proxy with the simple calibration estimator
    - The usual calibration has a bias similar to the bias of the naive estimator.
    - This bias is not amplified when the correlation $\alpha_1$ between $x$ and $z$ decreases.

    | α1  | α2 = 0 | α2 = 0.1 | α2 = 0.3 | α2 = 0.7 |
    |-----|--------|----------|----------|----------|
    | 0.7 | 3.8    | 3.5      | 2.8      | 1.5      |
    | 0.3 | 6.4    | 6.3      | 6.1      | 5.7      |
    | 0.1 | 6.9    | 6.8      | 6.8      | 6.8      |

    Table: Asymptotic relative bias (in %) of the simple calibration for different values of the parameters α1 and α2
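The table can be checked against the closed-form asymptotic bias of the simple calibration estimator, divided by $\beta_0$ to obtain a relative bias. The sketch below takes $\alpha_0 = 0$ (so that $B = 0$) and reuses the parameter values of the naive-estimator example. As before, $\gamma_2 = \sqrt{3}/15$ is an assumption that is not stated in the slides but appears consistent with the tabulated values.

```python
import math

# Values taken from the naive-estimator example in the slides.
GAMMA0, GAMMA1 = 0.5, math.sqrt(3) / 10
BETA0, BETA1 = 10.0, 2.0
# ASSUMPTION: gamma2 is not given in the slides; this value is a guess.
GAMMA2 = math.sqrt(3) / 15


def simple_calibration_relative_bias_pct(alpha1, alpha2):
    """Asymptotic bias of t_C / N for simple calibration (alpha0 = 0), in % of beta0."""
    sigma_x2 = 1.0 - alpha1 ** 2 - alpha2 ** 2
    numerator = GAMMA1 * sigma_x2 - alpha2 * (alpha1 * GAMMA2 - alpha2 * GAMMA1)
    m = alpha1 * GAMMA1 / GAMMA0 + alpha2 * GAMMA2 / GAMMA0
    bias = (BETA1 / GAMMA0) * numerator / (1.0 - m ** 2)
    return 100 * bias / BETA0


# One row per alpha1, one column per alpha2, as in the table.
for a1 in (0.7, 0.3, 0.1):
    row = [round(simple_calibration_relative_bias_pct(a1, a2), 1) for a2 in (0.0, 0.1, 0.3, 0.7)]
    print(a1, row)
```

As $\alpha_1$ and $\alpha_2$ both tend to zero, the expression tends to $\beta_1 \gamma_1 / (\gamma_0 \beta_0)$, the relative bias of the naive estimator, which is the sense in which the two biases are similar.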
31. Simulation study
    - We generated a population $U$ of size $N = 1\,000$ consisting of a variable of interest $Y$, several proxy variables denoted $X^{(\alpha_1, \alpha_2)}$, where $\alpha_1 \in \{0.2, 0.3, 0.5, 0.7\}$ and $\alpha_2 \in \{0, 0.1, 0.3, 0.5\}$, an instrumental variable $Z$ and an unobserved variable $U$.
    - First, the variables $Z$ and $U$ were generated from a uniform distribution on $(-\sqrt{3}, \sqrt{3})$, which led to a mean equal to zero and a variance equal to 1.
    - Then, given the $z$-values, the $y$-values were generated according to the linear regression model $Y_k = 10 + 2 z_k + \varepsilon_k^y$, where $\varepsilon_k^y$ is normally distributed with mean 0 and variance 1.
    - The resulting coefficient of determination was equal to 79.2%.
32. Simulation study
    - Finally, the values of the proxy variables $x^{(\alpha_1, \alpha_2)}$ were generated according to the linear regression models

      $$X_k^{(\alpha_1, \alpha_2)} = \alpha_1 z_k + \alpha_2 u_k + \sigma_{(\alpha_1, \alpha_2)}\, \varepsilon_k^{(\alpha_1, \alpha_2)},$$

      where $\sigma^2_{(\alpha_1, \alpha_2)} = 1 - \alpha_1^2 - \alpha_2^2$ and $\varepsilon^{(\alpha_1, \alpha_2)}$ was normally distributed with mean 0 and variance 1.
    - In order to focus on the nonresponse error, we considered the census case, i.e., $n = N = 1\,000$.
    - Each unit was assigned a response probability by $\operatorname{logit}(p_k) = 1.5 z_k + u_k$.
    - Then, the response indicators $R_k$ for $k \in U$ were generated independently from a Bernoulli distribution with parameter $p_k$.
    - This whole process was repeated $K = 10\,000$ times, leading to $K = 10\,000$ sets of respondents.
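The data-generating steps described above can be sketched as follows. This is an illustrative reading of the description and not the authors' code: the function names, the dictionary layout and the seed are assumptions.

```python
import math
import random


def generate_population(n, alpha1, alpha2, rng):
    """Generate (y, x, z, u, p) for n units following the described models."""
    root3 = math.sqrt(3.0)
    sigma = math.sqrt(1.0 - alpha1 ** 2 - alpha2 ** 2)
    population = []
    for _ in range(n):
        z = rng.uniform(-root3, root3)  # mean 0, variance 1
        u = rng.uniform(-root3, root3)  # mean 0, variance 1
        y = 10.0 + 2.0 * z + rng.gauss(0.0, 1.0)
        x = alpha1 * z + alpha2 * u + sigma * rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(1.5 * z + u)))  # logit(p) = 1.5 z + u
        population.append({"y": y, "x": x, "z": z, "u": u, "p": p})
    return population


def draw_respondents(population, rng):
    """One set of respondents: independent Bernoulli(p_k) response indicators."""
    return [unit for unit in population if rng.random() < unit["p"]]


rng = random.Random(2013)  # arbitrary seed
population = generate_population(1000, alpha1=0.7, alpha2=0.3, rng=rng)
respondents = draw_respondents(population, rng)
print(len(population), len(respondents))
```

Calling `draw_respondents` repeatedly on the same population gives the repeated sets of respondents; the population itself is generated once.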
33. Simulation study
    - For each simulation, we computed the instrument vector calibration estimators denoted $\hat{t}_C(\alpha_1, \alpha_2)$, where $\alpha_1 \in \{0.2, 0.3, 0.5, 0.7\}$ and $\alpha_2 \in \{0, 0.1, 0.3, 0.5\}$:

      $$\hat{t}_C(\alpha_1, \alpha_2) = \begin{pmatrix} N \\ t_{x^{(\alpha_1, \alpha_2)}} \end{pmatrix}^{\top} \Big(\sum_{k \in s_r} \mathbf{z}_k \mathbf{x}_k^{(\alpha_1, \alpha_2)\top}\Big)^{-1} \sum_{k \in s_r} \mathbf{z}_k y_k.$$

    - We computed:
        - the Monte Carlo percent relative bias, $RB_{MC}\big(\hat{t}_C(\alpha_1, \alpha_2)\big)$;
        - the Monte Carlo percent coefficient of variation (CV), $CV_{MC}\big(\hat{t}_C(\alpha_1, \alpha_2)\big)$.
34. Simulation study
    - Monte Carlo relative bias:

      $$RB_{MC}\big(\hat{t}_C(\alpha_1, \alpha_2)\big) = E_{MC}\left(\frac{\hat{t}_C(\alpha_1, \alpha_2) - t_y}{t_y}\right) \times 100.$$

    - Monte Carlo CV:

      $$CV_{MC}\big(\hat{t}_C(\alpha_1, \alpha_2)\big) = \frac{\sqrt{V_{MC}\big(\hat{t}_C(\alpha_1, \alpha_2) - t_y\big)}}{E_{MC}(t_y)} \times 100.$$
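Given the list of estimates obtained over the replications, both measures take a few lines. The sketch below assumes a fixed population total `t_y` (so that $E_{MC}(t_y) = t_y$ and the variance of $\hat{t}_C - t_y$ is the variance of $\hat{t}_C$) and uses the divisor $K$ rather than $K - 1$ in the variance; both the function names and that divisor are assumptions.

```python
import math


def mc_relative_bias(estimates, t_y):
    """Monte Carlo percent relative bias of a list of estimates of t_y."""
    k = len(estimates)
    return 100 * sum((t - t_y) / t_y for t in estimates) / k


def mc_cv(estimates, t_y):
    """Monte Carlo percent coefficient of variation, relative to the fixed total t_y."""
    k = len(estimates)
    mean = sum(estimates) / k
    variance = sum((t - mean) ** 2 for t in estimates) / k  # divisor K (assumption)
    return 100 * math.sqrt(variance) / t_y


# Hypothetical estimates of a total whose true value is 100.
print(mc_relative_bias([98.0, 96.0], 100.0))
print(mc_cv([90.0, 100.0, 110.0], 100.0))
```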
35. Simulation study: $RB_{MC}(\hat{t}_C)$ (in %), with $CV_{MC}(\hat{t}_C)$ (in %) in parentheses

    | α1  | α2 = 0      | α2 = 0.1     | α2 = 0.3      | α2 = 0.5      |
    |-----|-------------|--------------|---------------|---------------|
    | 0.7 | 0.02 (0.9)  | −0.9 (0.9)   | −2.8 (1.0)    | −4.9 (1.1)    |
    | 0.5 | −0.1 (1.4)  | −1.3 (1.5)   | −4.1 (1.7)    | −7.2 (2.1)    |
    | 0.3 | −0.2 (2.6)  | −2.4 (3.0)   | −7.5 (4.1)    | −14.0 (5.9)   |
    | 0.2 | −0.6 (4.6)  | −4.5 (15.6)  | −13.8 (61.9)  | −27.4 (65.6)  |
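A scaled-down version of the experiment can be run in plain Python to see the sign and rough size of the bias. The sketch below follows the described design for a single pair $(\alpha_1, \alpha_2)$ but uses far fewer replications than the study, an arbitrary seed and hypothetical names, so its output is not expected to match the table digit for digit.

```python
import math
import random


def run_mini_simulation(alpha1, alpha2, n=1000, reps=200, seed=1):
    """Monte Carlo relative bias and CV (in %) of the instrument calibration estimator."""
    rng = random.Random(seed)
    root3 = math.sqrt(3.0)
    sigma = math.sqrt(1.0 - alpha1 ** 2 - alpha2 ** 2)

    # Fixed finite population; census case, so all design weights equal 1.
    pop = []
    for _ in range(n):
        z = rng.uniform(-root3, root3)
        u = rng.uniform(-root3, root3)
        y = 10.0 + 2.0 * z + rng.gauss(0.0, 1.0)
        x = alpha1 * z + alpha2 * u + sigma * rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(1.5 * z + u)))
        pop.append((y, x, z, p))
    t_y = sum(unit[0] for unit in pop)
    t_x = sum(unit[1] for unit in pop)

    estimates = []
    for _ in range(reps):
        # Respondent sums of z_k x_k' (entries a00..a11) and z_k y_k (b0, b1).
        a00 = a01 = a10 = a11 = b0 = b1 = 0.0
        for y, x, z, p in pop:
            if rng.random() < p:
                a00 += 1.0
                a01 += x
                a10 += z
                a11 += z * x
                b0 += y
                b1 += z * y
        det = a00 * a11 - a01 * a10
        sol0 = (b0 * a11 - a01 * b1) / det
        sol1 = (a00 * b1 - a10 * b0) / det
        estimates.append(n * sol0 + t_x * sol1)  # (N, t_x)' (sum z x')^{-1} sum z y

    mean = sum(estimates) / reps
    rb = 100 * (mean - t_y) / t_y
    cv = 100 * math.sqrt(sum((t - mean) ** 2 for t in estimates) / reps) / t_y
    return rb, cv


rb, cv = run_mini_simulation(alpha1=0.7, alpha2=0.3)
print(round(rb, 1), round(cv, 1))
```

Rerunning with a smaller `alpha1` and the same `alpha2` is a simple way to observe the amplification pattern of the table, at the cost of noisier estimates when the 2x2 matrix becomes nearly singular.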
36. Simulation study: conclusion
    - Instrument vector calibration is a good technique to adjust for nonresponse under certain conditions, such as $\operatorname{cov}(X_k, R_k \mid Z_k) = 0$ or at least $\alpha_1$ large.
    - Otherwise, one can get bias and variance amplification.

    [Figure: graph of the variables y, z, x and R, with edges between Z and Y (β1), Z and X (α1 large), and Z and R (γ1).]
37. Thank you for your attention.