  1. On the problem of bias amplification of the instrumental calibration estimator with missing survey data
     Éric LESAGE, Laboratoire de statistique d’enquête, CREST-ENSAI
     Joint work with David HAZIZA (Université de Montréal and CREST-ENSAI)
     31 January 2013, Séminaire de Statistique ENSAE-ENSAI, CREST, Paris
  2. Outline
     1 Introduction
     2 Underlying models
     3 Bias amplification of the instrumental calibration estimators
     4 Simulation study
  3. Introduction
     Context
     Nonresponse is a major problem in surveys.
     In the presence of nonresponse, the usual complete data estimators may be biased when respondents and nonrespondents differ with respect to the survey variables.
     A weighting approach that has received a lot of attention recently is the so-called single-step approach, which uses calibration. See Deville (1998, 2002), Sautory (2003), Särndal and Lundström (2005), Kott (2006, 2009, 2012), among others.
     Issue
     We examine the properties of instrument vector calibration estimators, where the instrumental variables (related to the response propensity) are available for the responding units only. More specifically, we illustrate the problem of bias amplification.
  4. Introduction
     Consider a finite population $U$ of size $N$.
     The objective is to estimate the population total $t_y = \sum_{k \in U} y_k$ of a variable of interest $y$ (e.g., income).
     A sample $s$ of size $n$ is selected from $U$ according to a given sampling design $p(s)$.
     A complete data estimator of $t_y$ is the expansion estimator
     $$\hat{t}_\pi = \sum_{k \in s} d_k y_k,$$
     where $d_k = 1/\pi_k$ denotes the design weight attached to unit $k$ and $\pi_k = P(k \in s)$ denotes its first-order probability of inclusion in the sample.
     In the presence of unit nonresponse, only a subset $s_r$ of $s$ is observed, which makes it impossible to compute $\hat{t}_\pi$.
  5. Introduction
     To define a nonresponse-adjusted estimator of $t_y$, we assume that
       - a vector of auxiliary variables $\mathbf{x}$ is available for $k \in s_r$;
       - the vector of population totals $\mathbf{t}_x = \sum_{k \in U} \mathbf{x}_k$ is known.
     In practice, the x-vector is often defined by survey managers, who wish to ensure consistency between survey weighted estimates and known population totals for some important variables (e.g., age and sex).
     In addition, we assume that
       - a vector of instrumental variables $\mathbf{z}$, of the same dimension as $\mathbf{x}$, is available for $k \in s_r$.
     The z-vector needs only to be available for the respondents. The instrumental variables are believed to be associated with the propensity of units to respond to the survey.
     Let $R_k$ be a response indicator attached to unit $k$ such that $R_k = 1$ if unit $k$ is a respondent and $R_k = 0$ otherwise.
  6. Introduction
     Instrumental calibration estimator $\hat{t}_C$
     We consider an instrumental calibration estimator (Deville, 1998, 2002) of the form
     $$\hat{t}_C = \sum_{k \in s} w_k R_k y_k,$$
     where $w_k = d_k F(\boldsymbol{\lambda}_r^\top \mathbf{z}_k)$ and $F(\cdot)$ is a monotonic, twice differentiable function.
     $F(\boldsymbol{\lambda}_r^\top \mathbf{z}_k)$ is a weighting adjustment factor, which is essentially an estimate of the inverse of the response probability for unit $k$.
     The weights $w_k$ are constructed so that the calibration constraints
     $$\sum_{k \in s_r} w_k \mathbf{x}_k = \sum_{k \in U} \mathbf{x}_k$$
     are satisfied.
  7. Introduction
     Remarks:
     Linear weighting is a special case for which the weights $w_k$ are given by $w_k = d_k (1 + \boldsymbol{\lambda}_r^\top \mathbf{z}_k)$.
     When $\mathbf{x}$ is used in the calibration instead of $\mathbf{z}$, we have a usual calibration estimator and $w_k = d_k F(\boldsymbol{\lambda}_r^\top \mathbf{x}_k)$.
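The linear weighting case can be sketched numerically. The snippet below is a minimal illustration, assuming NumPy; the function name `linear_instrument_weights` and the toy population are hypothetical and not part of the talk. It solves the calibration constraints for $\boldsymbol{\lambda}_r$ and checks that the resulting weights reproduce the known totals.

```python
import numpy as np

def linear_instrument_weights(d, X_r, Z_r, t_x):
    """Weights w_k = d_k (1 + lambda' z_k) such that sum_{s_r} w_k x_k = t_x.

    d   : design weights of the respondents, shape (n_r,)
    X_r : auxiliary vectors x_k of the respondents, shape (n_r, p)
    Z_r : instrument vectors z_k of the respondents, shape (n_r, p)
    t_x : known population totals of x, shape (p,)
    """
    d = np.asarray(d, dtype=float)
    # The constraints are linear in lambda:
    # (sum d_k x_k z_k') lambda = t_x - sum d_k x_k
    A = (X_r * d[:, None]).T @ Z_r
    rhs = t_x - X_r.T @ d
    lam = np.linalg.solve(A, rhs)
    return d * (1.0 + Z_r @ lam)

# Toy census (d_k = 1) with an illustrative response mechanism.
rng = np.random.default_rng(0)
N = 500
z = rng.uniform(-1.0, 1.0, N)
x = 0.6 * z + rng.normal(0.0, 0.5, N)
X_pop = np.column_stack([np.ones(N), x])
t_x = X_pop.sum(axis=0)
resp = rng.random(N) < 0.5 + 0.2 * z      # response depends on z only
Z_r = np.column_stack([np.ones(resp.sum()), z[resp]])
w = linear_instrument_weights(np.ones(resp.sum()), X_pop[resp], Z_r, t_x)
calibrated_totals = X_pop[resp].T @ w     # should equal t_x
```

Note that only respondent values of `z` enter the computation, which is the practical appeal of the instrument vector approach.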
  8. Introduction
     Error decomposition
     The total error of $\hat{t}_C$ can be expressed as
     $$\hat{t}_C - t_y = \underbrace{(\hat{t}_\pi - t_y)}_{\text{sampling error}} + \underbrace{(\hat{t}_C - \hat{t}_\pi)}_{\text{nonresponse error}}.$$
     Since the sampling error does not depend on nonresponse, we focus on the nonresponse error in the sequel.
     Without loss of generality, we consider the case of a census, $s = U$, so that the sampling error $\hat{t}_\pi - t_y$ is equal to zero.
  9. Introduction
     First approach: good specification of the y model
     Regardless of the choice of $F(\cdot)$, the instrument vector calibration estimator $\hat{t}_C$ perfectly estimates $t_y$ if the variable of interest $y$ is perfectly explained by the x-vector, i.e., $y_k = \mathbf{x}_k^\top \boldsymbol{\beta}$ for some vector $\boldsymbol{\beta}$.
     Hence, we expect $\hat{t}_C$ to exhibit a small bias if the y-variable and the x-vector are linearly related and the relationship is strong.
     However, in multipurpose surveys, the number of variables of interest is typically large (possibly a few hundred), so it is unrealistic to presume that the x-vector is linearly related to all y-variables; some estimates could therefore suffer from bias.
  10. Introduction
     Second approach: estimation of the propensity of response
     For linear weighting, Särndal and Lundström (2005, Chapter 9) showed that $\hat{t}_C$ is asymptotically unbiased for $t_y$ for every y-variable provided that the response probability of unit $k$, $p_k$, is such that
     $$p_k^{-1} = 1 + \boldsymbol{\lambda}^\top \mathbf{z}_k \quad \text{for all } k \in U, \qquad (1)$$
     for a vector of unknown constants $\boldsymbol{\lambda}$; see also Kott and Liao (2012) for a discussion of nonlinear weighting.
     However, in practice, it is not clear how to validate the form of the relationship in (1), since the z-vector is available for the respondents only.
  11. Introduction
     The purpose of this presentation is to examine the so-called problem of bias amplification in the context of instrument vector calibration.
     In epidemiological studies, it has been found that including instrumental variables in the set of conditioning variables can increase unmeasured confounding bias; see Bhattacharya and Vogt (2007), Wooldridge (2009), Pearl (2010) and Myers et al. (2011).
     We argue that the same is true in the context of instrument vector calibration. Some preliminary studies in this direction can be found in Lesage (2012) and Osier (2012).
  12. Underlying models
     Superpopulation model
     Let $(y_k, z_k)^\top$ be a realisation of the vector of random variables $(Y_k, Z_k)^\top$, $k \in U$.
     Without loss of generality, we assume that $E(Z_k) = 0$ and $V(Z_k) = 1$.
     Further, we assume that the relationship between $Y$ and $Z$ can be modeled using
     $$Y_k = \beta_0 + \beta_1 Z_k + \varepsilon_k^y, \quad \text{such that } E(\varepsilon_k^y \mid Z_k) = 0.$$
     This model is often called a prediction model or outcome regression model.
  13. Underlying models
     Nonresponse model
     We also assume the following nonresponse model:
     $$R_k = \gamma_0 + \gamma_1 Z_k + \varepsilon_k^R, \quad E(\varepsilon_k^R \mid Z_k) = 0.$$
     We assume that $y$ is not a direct explanatory variable of nonresponse: $\operatorname{cov}(Y_k, R_k \mid Z_k) = 0$.
     Remarks
     The nonresponse model states that the response indicators $R_k$ are linearly related to $Z$. Although this relationship may seem awkward, it will be useful for studying the problem of bias amplification.
     A more realistic nonresponse model, namely the logistic model, is considered in the empirical study.
  14. Underlying models
     Figure: Graph of the variables y, z and R (edge between Z and Y labelled $\beta_1$; edge between Z and R labelled $\gamma_1$).
  15. Bias amplification of the instrumental calibration estimators
     Naive estimator
     We consider the naive estimator
     $$\hat{t}_{naive} = N \times \frac{\sum_{k \in U} y_k R_k}{\sum_{k \in U} R_k}.$$
     We have:
     $$\frac{\hat{t}_{naive}}{N} - \frac{t_y}{N} = \frac{\operatorname{cov}(Y_k, R_k)}{\gamma_0} + o_P(1) = \frac{\beta_1 \gamma_1}{\gamma_0} + o_P(1).$$
     Example: with $\gamma_0 = 0.5$, $\gamma_1 = \sqrt{3}/10$, $\beta_0 = 10$ and $\beta_1 = 2$,
     $$\frac{\gamma_1}{\gamma_0} \times \frac{\beta_1}{\beta_0} = 6.9\%.$$
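The 6.9% figure of the naive estimator example can be checked with a few lines of arithmetic. This is only a numerical restatement of the slide's formula; the variable names are illustrative.

```python
import math

# Parameter values from the example on the slide.
gamma0, gamma1 = 0.5, math.sqrt(3) / 10
beta0, beta1 = 10.0, 2.0

# Asymptotic bias of the naive estimator of the mean: beta1 * gamma1 / gamma0.
abs_bias = beta1 * gamma1 / gamma0

# Relative bias in percent, i.e. (gamma1 / gamma0) * (beta1 / beta0) * 100.
rel_bias_pct = 100 * abs_bias / beta0
print(round(rel_bias_pct, 1))  # 6.9
```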
  16. Bias amplification of the instrumental calibration estimators
     Instrument vector calibration estimators
     We suppose that a proxy variable of z, denoted x, is available.
     Definition
     A proxy variable of z, in the nonresponse context, is a variable x such that:
       1. x is an auxiliary variable whose population total $t_x$ is known;
       2. $\operatorname{cor}(X_k, Z_k) \neq 0$;
       3. $\operatorname{cov}(X_k, Z_k \mid R_k = 1) \neq 0$.
     We assume that the relationship between $X$ and $Z$ can be modeled using
     $$X_k = \alpha_0 + \alpha_1 Z_k + \varepsilon_k^x, \quad E(\varepsilon_k^x \mid Z_k) = 0, \quad V(\varepsilon_k^x) = \sigma_x^2 = 1 - \alpha_1^2.$$
     Remarks: $V(X_k) = 1$ and $\operatorname{cor}(X_k, Z_k) = \alpha_1$.
  17. Bias amplification of the instrumental calibration estimators
     Instrument vector calibration estimators
     We also assume that x is not a direct explanatory variable of the nonresponse: $\operatorname{cov}(X_k, R_k \mid Z_k) = 0$.
  18. Bias amplification of the instrumental calibration estimators
     Figure: Graph of the variables y, z, x and R (edge between Z and Y labelled $\beta_1$; edge between Z and R labelled $\gamma_1$).
  19. Bias amplification of the instrumental calibration estimators
     Figure: Graph of the variables y, z, x and R (edge between Z and Y labelled $\beta_1$; edge between Z and X labelled $\alpha_1$; edge between Z and R labelled $\gamma_1$).
  20. Bias amplification of the instrumental calibration estimators
     Instrument vector calibration estimators
     We consider the instrument vector calibration estimator with linear weighting
     $$\hat{t}_C = \mathbf{t}_x^\top \left( \sum_{k \in s_r} \mathbf{z}_k \mathbf{x}_k^\top \right)^{-1} \sum_{k \in s_r} \mathbf{z}_k y_k,$$
     where $\mathbf{x}_k = (1, x_k)^\top$, $\mathbf{z}_k = (1, z_k)^\top$ and $\mathbf{t}_x = (N, t_x)^\top$.
     Since $\operatorname{cov}(X_k, R_k \mid Z_k) = 0$, we have:
     $$\frac{\hat{t}_C}{N} - \frac{t_y}{N} = o_P(1).$$
  21. Bias amplification of the instrumental calibration estimators
     What if...
     It is not trivial to verify whether $\operatorname{cov}(X_k, R_k \mid Z_k) = 0$, since the variable z is available only for the respondents.
     What if $\operatorname{cov}(X_k, R_k \mid Z_k) \neq 0$?
  22. Bias amplification of the instrumental calibration estimators
     Figure: Graph of the variables y, z, x, u and R (edge between Z and Y labelled $\beta_1$; edge between Z and X labelled $\alpha_1$; edge between Z and R labelled $\gamma_1$; U added to the graph).
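  23. Bias amplification of the instrumental calibration estimators
     Figure: Graph of the variables y, z, x, u and R (edge labels $\beta_1$ for Z and Y, $\alpha_1$ for Z and X, $\alpha_2$ for U and X, $\gamma_1$ for Z and R, $\gamma_2$ for U and R).
     $$\operatorname{cov}(R_k, X_k \mid Z_k) = \alpha_2 \gamma_2$$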
  24. Bias amplification of the instrumental calibration estimators
     We now assume that there exists an unobserved variable u, independent of z and y, that is an explanatory variable in the nonresponse model.
     Without loss of generality, we assume that $E(U_k \mid Z_k, Y_k) = 0$ and $V(U_k \mid Z_k, Y_k) = 1$.
     The nonresponse model is rewritten as
     $$R_k = \gamma_0 + \gamma_1 Z_k + \gamma_2 U_k + \varepsilon_k^R, \quad E(\varepsilon_k^R \mid Z_k, U_k) = 0.$$
     We still assume that $\operatorname{cov}(Y_k, R_k \mid Z_k) = 0$.
  25. Bias amplification of the instrumental calibration estimators
     Moreover, we assume that the variable x is linked to the variable u:
     $$X_k = \alpha_0 + \alpha_1 Z_k + \alpha_2 U_k + \varepsilon_k^x,$$
     $$E(\varepsilon_k^x \mid Z_k, U_k) = 0, \quad V(\varepsilon_k^x) = \sigma_x^2 = 1 - \alpha_1^2 - \alpha_2^2, \quad E(\varepsilon_k^R \varepsilon_k^x \mid Z_k, U_k) = 0.$$
     Then we have $\operatorname{cov}(R_k, X_k \mid Z_k) = \alpha_2 \gamma_2$.
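The identity $\operatorname{cov}(R_k, X_k \mid Z_k) = \alpha_2 \gamma_2$ follows directly from the two linear models. The short derivation below is a sketch that uses only the stated assumptions, together with $V(U_k \mid Z_k) = 1$, which follows from $E(U_k \mid Z_k, Y_k) = 0$ and $V(U_k \mid Z_k, Y_k) = 1$.

```latex
\begin{align*}
\operatorname{cov}(R_k, X_k \mid Z_k)
  &= \operatorname{cov}\!\left(\gamma_2 U_k + \varepsilon_k^R,\;
       \alpha_2 U_k + \varepsilon_k^x \,\middle|\, Z_k\right)
     && \text{(terms in } Z_k \text{ are constant given } Z_k\text{)} \\
  &= \alpha_2 \gamma_2\, V(U_k \mid Z_k)
     + \gamma_2\, E(U_k \varepsilon_k^x \mid Z_k)
     + \alpha_2\, E(U_k \varepsilon_k^R \mid Z_k)
     + E(\varepsilon_k^R \varepsilon_k^x \mid Z_k) \\
  &= \alpha_2 \gamma_2 \cdot 1 + 0 + 0 + 0
     && \text{(condition on } (Z_k, U_k) \text{ first)} \\
  &= \alpha_2 \gamma_2 .
\end{align*}
```

The three cross terms vanish by iterated expectations, because both error terms have zero mean, and zero cross moment, conditionally on $(Z_k, U_k)$.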
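  26. Bias amplification of the instrumental calibration estimators
     We have:
     $$\frac{\hat{t}_C}{N} - \frac{t_y}{N} = -\frac{\beta_1}{\alpha_1} \, \frac{1}{E(R_k)} \, \operatorname{cov}(R_k, X_k \mid Z_k) \, \frac{E(Z_k X_k \mid R_k = 1)}{\operatorname{cov}(Z_k, X_k \mid R_k = 1)} + o_P(1)$$
     $$= -\frac{\beta_1 \alpha_2 \gamma_2}{\gamma_0 \alpha_1} \, \frac{\alpha_1 + \alpha_0 \frac{\gamma_1}{\gamma_0}}{\alpha_1 - \frac{\gamma_1}{\gamma_0} \left( \alpha_1 \frac{\gamma_1}{\gamma_0} + \alpha_2 \frac{\gamma_2}{\gamma_0} \right)} + o_P(1).$$
     If $\alpha_2 \gamma_2 \neq 0$, then the instrument vector calibration estimator is “biased”.
     The “bias” is amplified if $\alpha_1$ is small (i.e., weak proxy).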
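To see the amplification numerically, the asymptotic bias expression can be evaluated for decreasing values of $\alpha_1$. The sketch below assumes $\alpha_0 = 0$, in which case the expression reduces to $-\beta_1 \alpha_2 \gamma_2 / (\gamma_0 D)$ with $D$ the denominator of the second line. The value of $\gamma_2$ is hypothetical (the slides do not state it), so the printed numbers are illustrative and are not meant to reproduce any table from the talk.

```python
import math

def iv_calibration_bias(alpha1, alpha2, beta1, gamma0, gamma1, gamma2):
    """Asymptotic bias of t_C / N under alpha0 = 0 (a simplifying assumption)."""
    g1, g2 = gamma1 / gamma0, gamma2 / gamma0
    denom = alpha1 - g1 * (alpha1 * g1 + alpha2 * g2)
    return -beta1 * alpha2 * g2 / denom

# gamma0, gamma1, beta1 come from the naive-estimator example;
# gamma2 = 0.1 is a made-up value used only for illustration.
params = dict(beta1=2.0, gamma0=0.5, gamma1=math.sqrt(3) / 10, gamma2=0.1)

for alpha1 in (0.7, 0.3, 0.1):
    bias = iv_calibration_bias(alpha1, alpha2=0.3, **params)
    print(f"alpha1 = {alpha1}: asymptotic bias = {bias:.3f}")
```

With $\alpha_2 = 0$ the function returns zero for any $\alpha_1$, and for fixed $\alpha_2 \gamma_2 \neq 0$ the magnitude of the bias grows as $\alpha_1$ shrinks, which is the amplification effect described on the slide.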
  27. Bias amplification of the instrumental calibration estimators
     Bias amplification for weak proxy with the instrument vector calibration estimator
     α1      α2 = 0    α2 = 0.1    α2 = 0.3    α2 = 0.7
     0.7     0         -0.8        -2.3        -5.8
     0.3     0         -1.8        -5.8        -15.5
     0.1     0         -5.8        -21.7       -101
     Table: Asymptotic relative bias (in %) of the instrument vector calibration estimator for different values of the parameters α1 and α2
     Figure: Graph of the variables y, z, x, u and R (edge labels $\beta_1$, $\alpha_1$, $\alpha_2$, $\gamma_1$, $\gamma_2$).
  28. Bias amplification of the instrumental calibration estimators
     Usual calibration estimators
     We have seen that instrument vector calibration can lead to estimators with large biases.
     Would a simple calibration protect against this risk of bias amplification?
     $$\hat{t}_C = \mathbf{t}_x^\top \left( \sum_{k \in s_r} \mathbf{x}_k \mathbf{x}_k^\top \right)^{-1} \sum_{k \in s_r} \mathbf{x}_k y_k. \qquad (2)$$
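  29. Bias amplification of the instrumental calibration estimators
     The simple calibration estimator is asymptotically biased,
     $$\frac{\hat{t}_C}{N} - \frac{t_y}{N} = \frac{\beta_1}{\gamma_0} \, \frac{\gamma_1 \sigma_x^2 - \alpha_2 (\alpha_1 \gamma_2 - \alpha_2 \gamma_1) + B}{1 - \left( \alpha_1 \frac{\gamma_1}{\gamma_0} + \alpha_2 \frac{\gamma_2}{\gamma_0} \right)^2} + o_P(1),$$
     where
     $$B = \alpha_0 \left\{ \alpha_2 \gamma_2 (\alpha_1 \gamma_1 + \alpha_2 \gamma_2) - \gamma_1 \left[ 1 - (\alpha_1^2 + \alpha_2^2) \right] \right\}$$
     is a null term when $\alpha_0 = 0$, but it offers protection against bias amplification.
     Dividing by $E(Y_k) = \beta_0$ gives the asymptotic relative bias.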
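The simple calibration bias can be evaluated in the same way. The sketch below assumes $\alpha_0 = 0$ (so that $B = 0$), takes the bias of the mean as $\frac{\beta_1}{\gamma_0}$ times the ratio in the expression above, and divides by $\beta_0$ to obtain a relative bias. Only the $\alpha_2 = 0$ case is evaluated, because it does not depend on $\gamma_2$, whose value is not given in the slides; the function name is illustrative.

```python
import math

def simple_calibration_bias(alpha1, alpha2, beta1, gamma0, gamma1, gamma2):
    """Asymptotic bias of t_C / N for simple calibration, assuming alpha0 = 0."""
    sigma2_x = 1 - alpha1**2 - alpha2**2
    m = (alpha1 * gamma1 + alpha2 * gamma2) / gamma0
    num = gamma1 * sigma2_x - alpha2 * (alpha1 * gamma2 - alpha2 * gamma1)
    return beta1 / gamma0 * num / (1 - m**2)

beta0, beta1 = 10.0, 2.0
gamma0, gamma1 = 0.5, math.sqrt(3) / 10

# With alpha2 = 0 the value of gamma2 is irrelevant, so 0.0 is passed.
rel_bias_pct = {
    alpha1: round(100 * simple_calibration_bias(alpha1, 0.0, beta1, gamma0, gamma1, 0.0) / beta0, 1)
    for alpha1 in (0.7, 0.3, 0.1)
}
print(rel_bias_pct)  # {0.7: 3.8, 0.3: 6.4, 0.1: 6.9}
```

As $\alpha_1$ decreases the relative bias approaches the 6.9% of the naive estimator and stays bounded, in contrast with the instrument vector case.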
  30. Bias amplification of the instrumental calibration estimators
     No bias amplification for weak proxy with the simple calibration estimator
     The usual calibration has a bias similar to the bias of the naive estimator.
     This bias is not amplified when the correlation $\alpha_1$ between x and z decreases.
     α1      α2 = 0    α2 = 0.1    α2 = 0.3    α2 = 0.7
     0.7     3.8       3.5         2.8         1.5
     0.3     6.4       6.3         6.1         5.7
     0.1     6.9       6.8         6.8         6.8
     Table: Asymptotic relative bias (in %) of the simple calibration for different values of the parameters α1 and α2
  31. Simulation study
     We generated a population $U$ of size $N = 1\,000$ consisting of a variable of interest $Y$, several proxy variables denoted $X^{(\alpha_1, \alpha_2)}$, where $\alpha_1 \in \{0.2, 0.3, 0.5, 0.7\}$ and $\alpha_2 \in \{0, 0.1, 0.3, 0.5\}$, an instrumental variable $Z$ and an unobserved variable $U$.
     First, the variables $Z$ and $U$ were generated from a uniform distribution on $(-\sqrt{3}, \sqrt{3})$, which led to a mean equal to zero and a variance equal to 1.
     Then, given the z-values, the y-values were generated according to the linear regression model
     $$Y_k = 10 + 2 z_k + \varepsilon_k^y,$$
     where $\varepsilon_k^y$ is normally distributed with mean 0 and variance 1.
     The resulting coefficient of determination was equal to 79.2%.
  32. Simulation study
     Finally, the values of the proxy variables $x^{(\alpha_1, \alpha_2)}$ were generated according to the linear regression models
     $$X_k^{(\alpha_1, \alpha_2)} = \alpha_1 z_k + \alpha_2 u_k + \sigma_{(\alpha_1, \alpha_2)} \, \varepsilon_k^{(\alpha_1, \alpha_2)},$$
     where $\sigma_{(\alpha_1, \alpha_2)}^2 = 1 - \alpha_1^2 - \alpha_2^2$ and $\varepsilon^{(\alpha_1, \alpha_2)}$ was normally distributed with mean 0 and variance 1.
     In order to focus on the nonresponse error, we considered the census case; i.e., $n = N = 1\,000$.
     Each unit was assigned a response probability by
     $$\operatorname{logit}(p_k) = 1.5 z_k + u_k.$$
     Then, the response indicators $R_k$ for $k \in U$ were generated independently from a Bernoulli distribution with parameter $p_k$.
     This whole process was repeated $K = 10\,000$ times, leading to $K = 10\,000$ sets of respondents.
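The data-generating process described above can be sketched as follows. This is a minimal reading of the description, assuming NumPy; the function names `generate_population` and `draw_respondents` are hypothetical, and the seed is arbitrary.

```python
import numpy as np

def generate_population(alpha1, alpha2, N, rng):
    """One population (y, x, z, u) following the models described on the slides."""
    s3 = np.sqrt(3.0)
    z = rng.uniform(-s3, s3, N)          # mean 0, variance 1
    u = rng.uniform(-s3, s3, N)          # unobserved, mean 0, variance 1
    y = 10 + 2 * z + rng.normal(0.0, 1.0, N)
    sigma = np.sqrt(1 - alpha1**2 - alpha2**2)
    x = alpha1 * z + alpha2 * u + sigma * rng.normal(0.0, 1.0, N)
    return y, x, z, u

def draw_respondents(z, u, rng):
    """Bernoulli response indicators with logit(p_k) = 1.5 z_k + u_k."""
    p = 1.0 / (1.0 + np.exp(-(1.5 * z + u)))
    return rng.random(z.size) < p

rng = np.random.default_rng(2013)
y, x, z, u = generate_population(alpha1=0.5, alpha2=0.3, N=1000, rng=rng)
r = draw_respondents(z, u, rng)
print(f"response rate: {r.mean():.2f}, cor(x, z): {np.corrcoef(x, z)[0, 1]:.2f}")
```

By construction $V(X_k) = 1$ and $\operatorname{cor}(X_k, Z_k) = \alpha_1$, so the printed correlation should be close to 0.5 up to sampling noise, and the response rate should be close to one half because the linear predictor is symmetric around zero.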
  33. Simulation study
     For each simulation, we computed the instrument vector calibration estimators, denoted $\hat{t}_C(\alpha_1, \alpha_2)$, where $\alpha_1 \in \{0.2, 0.3, 0.5, 0.7\}$ and $\alpha_2 \in \{0, 0.1, 0.3, 0.5\}$:
     $$\hat{t}_C(\alpha_1, \alpha_2) = \left( N, \; t_{x^{(\alpha_1, \alpha_2)}} \right) \left( \sum_{k \in s_r} \mathbf{z}_k \mathbf{x}_k^{(\alpha_1, \alpha_2)\top} \right)^{-1} \sum_{k \in s_r} \mathbf{z}_k y_k.$$
     We computed:
       - the Monte Carlo percent relative bias, $RB_{MC}\{\hat{t}_C(\alpha_1, \alpha_2)\}$;
       - the Monte Carlo percent coefficient of variation (CV), $CV_{MC}\{\hat{t}_C(\alpha_1, \alpha_2)\}$.
  34. Simulation study
     Monte Carlo relative bias
     $$RB_{MC}\{\hat{t}_C(\alpha_1, \alpha_2)\} = E_{MC}\left\{ \frac{\hat{t}_C(\alpha_1, \alpha_2) - t_y}{t_y} \right\} \times 100.$$
     Monte Carlo CV
     $$CV_{MC}\{\hat{t}_C(\alpha_1, \alpha_2)\} = \frac{\sqrt{V_{MC}\{\hat{t}_C(\alpha_1, \alpha_2) - t_y\}}}{E_{MC}(t_y)} \times 100.$$
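A reduced version of the Monte Carlo experiment illustrates how the two measures are computed. The sketch below assumes NumPy, regenerates the population at each replicate (one reading of "this whole process was repeated"), uses only K = 200 replicates to keep the run short, and considers the favourable case $\alpha_2 = 0$. All function names are hypothetical, and the CV includes a square root, which is an assumption about the intended definition.

```python
import numpy as np

def iv_calibration_total(y_r, x_r, z_r, t_x, N):
    """t_C = (N, t_x) (sum z_k x_k')^{-1} sum z_k y_k, sums over respondents."""
    X = np.column_stack([np.ones_like(x_r), x_r])
    Z = np.column_stack([np.ones_like(z_r), z_r])
    coef = np.linalg.solve(Z.T @ X, Z.T @ y_r)
    return np.array([N, t_x]) @ coef

def mc_relative_bias(estimates, totals):
    """Monte Carlo percent relative bias."""
    return 100 * np.mean((estimates - totals) / totals)

def mc_cv(estimates, totals):
    """Monte Carlo percent CV (standard deviation of the error over mean total)."""
    return 100 * np.sqrt(np.var(estimates - totals)) / np.mean(totals)

rng = np.random.default_rng(2013)
K, N, alpha1, alpha2 = 200, 1000, 0.7, 0.0
s3 = np.sqrt(3.0)
est, tot = np.empty(K), np.empty(K)
for i in range(K):
    z = rng.uniform(-s3, s3, N)
    u = rng.uniform(-s3, s3, N)
    y = 10 + 2 * z + rng.normal(size=N)
    x = alpha1 * z + alpha2 * u + np.sqrt(1 - alpha1**2 - alpha2**2) * rng.normal(size=N)
    r = rng.random(N) < 1 / (1 + np.exp(-(1.5 * z + u)))   # respondents
    est[i] = iv_calibration_total(y[r], x[r], z[r], x.sum(), N)
    tot[i] = y.sum()

rb, cv = mc_relative_bias(est, tot), mc_cv(est, tot)
print(f"RB = {rb:.2f}%, CV = {cv:.2f}%")
```

With $\alpha_2 = 0$ the proxy is unrelated to the unobserved driver of nonresponse, so the relative bias should be close to zero; rerunning with $\alpha_2 > 0$ and a smaller $\alpha_1$ is a simple way to observe the amplification.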
  35. Simulation study
     $RB_{MC}(\hat{t}_C)$ (in %), with $CV_{MC}(\hat{t}_C)$ (in %) in parentheses
     α1      α2 = 0        α2 = 0.1       α2 = 0.3       α2 = 0.5
     0.7     0.02 (0.9)    -0.9 (0.9)     -2.8 (1.0)     -4.9 (1.1)
     0.5     -0.1 (1.4)    -1.3 (1.5)     -4.1 (1.7)     -7.2 (2.1)
     0.3     -0.2 (2.6)    -2.4 (3.0)     -7.5 (4.1)     -14.0 (5.9)
     0.2     -0.6 (4.6)    -4.5 (15.6)    -13.8 (61.9)   -27.4 (65.6)
  36. Simulation study
     Conclusion
     Instrument vector calibration is a good technique to adjust for nonresponse under certain conditions, such as $\operatorname{cov}(X_k, R_k \mid Z_k) = 0$ or at least a large $\alpha_1$.
     Otherwise, one can get bias and variance amplification.
     Figure: Graph of the variables y, z, x and R (edge between Z and Y labelled $\beta_1$; edge between Z and X labelled "$\alpha_1$ large"; edge between Z and R labelled $\gamma_1$).
  37. Thank you for your attention.
