Favre

Introduction
How to measure the inﬂuence ?
Unit nonresponse
Simulation study
Conclusion

Robust inference in two phases sampling with an
application to unit nonresponse

Jean-François Beaumont, Statistics Canada
Cyril Favre Martinoz, Crest-Ensai
David Haziza, Université de Montréal, Crest-Ensai

30 janvier 2013

1/26
Ensai-Ensae 2013 Robust inference in two-phases sampling

Introduction
Unit nonresponse
Simulation study
Conclusion

Sommaire

1 Introduction

2 How to measure the inﬂuence ?

3 Unit nonresponse

4 Simulation study

5 Conclusion

2/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Sommaire

1 Introduction


3 Unit nonresponse

4 Simulation study

5 Conclusion

3/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Inﬂuential units

Unusual observations with possibly large design weights or large
value

4/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Inﬂuential units

value
Many survey statistics are sensitive to the presence of inﬂuential
units

4/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Inﬂuential units

value
units
Including or excluding an inﬂuential unit in the calculation of these
statistics can have a dramatic impact on their magnitude.

4/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Inﬂuential units

value
units
The occurrence of outliers is common in business surveys because
the distributions of variables (e.g., revenue, sales, etc.) are highly
skewed (heavy right tail)

4/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Inﬂuential units

value
units
Inﬂuential units are legitimate observations

4/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Influential units

value
units
Influential units are legitimate observations
The impact of influential units can be minimized by using a good
sampling design : for example, stratified sampling with a take-all
stratum

4/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Inﬂuential units

Even with a good sampling design, inﬂuential units may still be
selected in the sample (e.g., stratum jumpers)

5/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Inﬂuential units

In the presence of inﬂuential units, survey statistics are
(approximately) unbiased but they can have a very large variance.

5/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Inﬂuential units

Reducing the inﬂuence of large values produces stable but biased
estimators

5/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Influential units

Reducing the influence of large values produces stable but biased
estimators
Treatment of influential units : trade-off between bias and variance

5/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Two-phase designs

U : finite population of size N
s1 : first-phase sample, of size n1
s2 : second-phase sample, of size n2 , selected from s1
I1i : first-phase sample selection indicator for unit i
I2i : second-phase sample selection indicator for unit i
Vectors of indicators : I1 = (I11 , · · · , I1N ) and I2 = (I21 , · · · , I2N )
First-phase inclusion probability for unit i : π1i = P (I1i = 1)
Second-phase inclusion probability for unit i :
π2i (I1 ) = P (I2i = 1|I1 ; I1i = 1)

6/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Two-phase designs
U(N)

I1i  0, I 2i  0

I1i  1, I 2i  1
s2 (n2 )

I1i  1, I 2i  0 s1 (n1 )

6/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Point estimation

Goal : estimate a population total of a variable of interest y,

Y = yi
i∈U

7/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Point estimation


Y = yi
i∈U

y-values : available only for i ∈ s2

7/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Point estimation


Y = yi
i∈U

Complete data estimator : Double expansion estimator

ˆ yi yi
YDE = = ∗
i∈s2
π1i π2i i∈s
πi
2

7/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Point estimation


Y = yi
i∈U

Complete data estimator : Double expansion estimator

ˆ yi yi
YDE = = ∗
i∈s2
π1i π2i i∈s
πi
2

ˆ
YDE is design-unbiased for Y ; that is,
ˆ
E1 E2 (YDE |I1 ) = Y

7/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Sommaire

1 Introduction


3 Unit nonresponse

4 Simulation study

5 Conclusion

8/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Total error

ˆ
The total error of YDE :

ˆ
YDE − Y
How to measure the inﬂuence (or impact) of a unit on both errors ?

9/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Total error

ˆ

ˆ
YDE − Y
Single phase sampling : the conditional bias ; Moreno-Rebollo,
Munoz-Reyez and Munoz-Pichardo (1999), Beaumont, Haziza and
Ruiz-Gazen (2013).

9/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Total error

ˆ

ˆ
YDE − Y
Single phase sampling : the conditional bias ; Moreno-Rebollo,
Munoz-Reyez and Munoz-Pichardo (1999), Beaumont, Haziza and
Ruiz-Gazen (2013).
How to construct a robust estimator to the presence of inﬂuential
units ? Single phase designs : Beaumont, Haziza and Ruiz-Gazen
(2013).

9/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Measuring the inﬂuence : the conditional bias

We distinguish between three cases :
i ∈ s2 : sampled unit
i ∈ s1 − s2 : sampled in ﬁrst phase but not in the second phase
i ∈ U − s1 : nonsampled unit

10/26

Introduction
Unit nonresponse
Simulation study
Conclusion


We can only reduce the inﬂuence of the sampled units (i.e., the
units belonging to s2 )

10/26

Introduction
Unit nonresponse
Simulation study
Conclusion


Nothing can be done for the other units at the estimation stage

10/26

Introduction
Unit nonresponse
Simulation study
Conclusion


Nothing can be done for the other units at the estimation stage
Inﬂuence of sampled unit i ∈ s2 :
DE
Bi (I1i = 1, I2i = 1) ˆ
= E1 E2 (YDE − Y |I1 , I1i = 1, I2i = 1)

10/26

Introduction
Unit nonresponse
Simulation study
Conclusion

The conditional bias in particular designs
Arbitrary two-phase design :
∗
DE
πij
Bi (I1i = 1, I2i = 1) = ∗ ∗ − 1 yj
πi πj
j∈U

∗ n1 n2 n2
SRSWOR/SRSWOR with size n1 and n2 : πi = N × n1 = N

DE N N ¯
Bi (I1i = 1, I2i = 1) = ( − 1)(yi − Y )
(N − 1) n2

11/26

Introduction
Unit nonresponse
Simulation study
Conclusion

The conditional bias in particular designs
Arbitrary two-phase design :
∗
DE
πij
Bi (I1i = 1, I2i = 1) = ∗ ∗ − 1 yj
πi πj
j∈U

∗ n1 n2 n2
SRSWOR/SRSWOR with size n1 and n2 : πi = N × n1 = N

DE N N ¯
Bi (I1i = 1, I2i = 1) = ( − 1)(yi − Y )
(N − 1) n2

Arbitrary design/Poisson sampling :

DE π1ij −1 −1
Bi (I1i = 1, I2i = 1) = − 1 yj + π1i π2i − 1 yi
π1i π1j
j∈U

11/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Properties of conditional bias

can be interpreted as a contribution of each unit (sampled or
nonsampled) to the total error

12/26

Introduction
Unit nonresponse
Simulation study
Conclusion


take fully account of the sampling design : an unit may be highly
inﬂuential under a given sampling design but may have little or no
inﬂuence under another sampling design

12/26

Introduction
Unit nonresponse
Simulation study
Conclusion


∗ DE
If πi = 1, then Bi (I1i = 1, I2i = 1) = 0

12/26

Introduction
Unit nonresponse
Simulation study
Conclusion


∗ DE
If πi = 1, then Bi (I1i = 1, I2i = 1) = 0
unknown ⇒ must be estimated

12/26

Introduction
Unit nonresponse
Simulation study
Conclusion

A robust version of the double expansion estimator

Following Beaumont, Haziza and Ruiz-Gazen (2011), we obtain

ˆR ˆ
YDE = YDE − ˆ DE
Bi (I1i = 1, I2i = 1) + ˆ DE
ψ Bi (I1i = 1, I2i = 1)
i∈s2 i∈s2

Example of ψ-function :

 c if t > c
ψ (t) = t if |t| ≤ c
−c if t < −c


c : tuning constant

13/26

Introduction
Unit nonresponse
Simulation study
Conclusion

A robust version of the double expansion estimator

13/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Sommaire

1 Introduction


3 Unit nonresponse

4 Simulation study

5 Conclusion

14/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Notation
s2 : set of respondents
n2 : number of responding units (random)
I2i : response indicator for unit i
π2i : unknown response probability for unit i.
We assume sampled units respond independently of one another
(similar to Poisson sampling in the second phase)
Propensity score adjusted estimator, assuming the π2i ’s are known :
˜ yi
YP SA =
i∈s
π1i π2i
2

Influence of a responding unit :
π1ij −1 −1
Bi SA (I1i = 1, I2i = 1) =
P
− 1 yj + π1i π2i − 1 yi
π1i π1j
j∈U
Influence of unit i on
the nonresponse error
Influence of unit i on
the sampling error

15/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Nonresponse model

In practice, the response probability π2i is unknown

16/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Nonresponse model

In practice, the response probability π2i is unknown
Estimated response probability for unit i with a parametric
nonresponse model : π2i = m (xi , α)
ˆ ˆ

16/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Method of homogeneous response groups (RHG)

Step 1 :
we rank every element in G groups by a growing estimated response
probabilities
Step 2 :
The estimated probability of reponse in the g − th group is given by :
Si i ∈ g
−1
i∈srg π1i
π2i =
ˆ −1
i∈sg π1i
Propensity score adjusted estimator :

ˆ yi
YP SA =
i∈s2
π1i π2i
ˆ

17/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Asymptotic conditional bias

One can show that
ˆ ˆ
YP SA − YL = Op (n−1 ),

ˆ ˆ
where YL is the linearized version of YP SA .

18/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Asymptotic conditional bias

One can show that
ˆ ˆ
YP SA − YL = Op (n−1 ),

ˆ ˆ
where YL is the linearized version of YP SA .
Asymptotic conditional bias of a responding unit :
ˆ
Bi SA (I1i = 1, I2i = 1) ≈ E1 E2 (YL − Y |I1 , I1i = 1, I2i = 1)
P

π1ij
Bi SA (I1i = 1, I2i = 1) ≈
P
− 1 (yj − yU )
¯
π1i π1j
j∈U
−1 −1
+ π1i π2i − 1 (yi − yg )
¯

18/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Robust estimator in case of nonresponse

ˆ
Robust version of YP SA

ˆR
YP SA ˆ
= YP SA − ˆP
Bi SA (I1i = 1, I2i = 1)
i∈s2

+ ˆP
ψ Bi SA (I1i = 1, I2i = 1)
i∈s2

Choice of the tuning constant
We use a min-max criterion and we obtain :
ˆP ˆP
mini∈S (Bi SA ) + maxi∈S (Bi SA )
ˆR ˆ
YP SA = YP SA −
2

19/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Sommaire

1 Introduction


3 Unit nonresponse

4 Simulation study

5 Conclusion

20/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Sampling and generating the nonresponse

We generated three populations of size N = 5000 with two
variables : y and x.
Select P = 5000 samples, of size n1 = 300 or 500, according to
simple random sampling without replacement in ﬁrst phase
Generate nonresponse : Bernoulli trials with probability π2i , where

exp (xi α)
π2i =
1 + exp (xi α)

Global response rate : 70%
We computed the min-max method to calibrate the tuning constant

21/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Results for a total estimation

22/26

Introduction
Unit nonresponse
Simulation study
Conclusion


2
1 P ˆR ˆR
θP SA − EM C θP SA
P p=1
RVM C ˆR ˆ
θP SA , θP SA = 2,
1 P ˆ ˆ
θP SA − EM C θP SA
P p=1

et
2
1 P ˆR
θP SA − θ
P p=1
REM C ˆR ˆ
θP SA , θP SA = 2.
1 P ˆ
θP SA − θ
P p=1

22/26

Introduction
Unit nonresponse
Simulation study
Conclusion


Pop Taille n1 ˆRP
RBM C θp SA (%) ˆRP ˆP
RVM C θp SA , θp SA REM C
300 0.02 1.0 0.98
1
500 0.04 1.0 0.98
300 −0.98 0.52 0.58
2
500 −0.60 0.61 0.64
300 −1.33 0.50 0.52
3
500 −1.16 0.59 0.63

22/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Sommaire

1 Introduction


3 Unit nonresponse

4 Simulation study

5 Conclusion

23/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Concluding remarks

Conditional bias : measure of inﬂuence that takes account of the
sampling design, the parameter to be estimated and the estimator
Results can be extended to the case of calibration estimators ⇒
important in the unit nonresponse context since weight adjustment
procedures by the inverse of the estimated response probabilities are
generally followed by some form of calibration
Proof of the design consistency and the normality of the robust
estimator

24/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Bibliographie I

J.F. Beaumont, D. Haziza, and A. Ruiz-Gazen.
A unified approach to robust estimation in finite population
sampling.
En révision, 2011.
R.L. Chambers.
Outlier robust finite population estimation.
Journal of the American Statistical Association, pages 1063–1069,
1986.
P.N. Kokic and P.A. Bell.
Optimal winsorizing cutoffs for a stratified finite population
estimator.
Journal of official statistics-Stockholm, 10 :419–419, 1994.

25/26

Introduction
Unit nonresponse
Simulation study
Conclusion

Bibliographie II

J.L. Moreno-Rebollo, A. Munoz-Reyes, M.D. Jimenez-Gamero, and
J. Munoz-Pichardo.
Influence diagnostic in survey sampling : Estimating the conditional
bias.
Metrika, 55(3) :209–214, 2002.
J.L. Moreno-Rebollo, A. Muñoz-Reyes, and J. Muñoz-Pichardo.
Miscellanea. influence diagnostic in survey sampling : conditional
bias.
Biometrika, 86(4) :923–928, 1999.
J. Munoz-Pichardo, J. Munoz-Garcia, J.L. Moreno-Rebollo, and
R. Pino-Mejias.
A new approach to influence analysis in linear models.
Sankhy¯ : The Indian Journal of Statistics, Series A, pages 393–409,
a
1995.
26/26

Favre

Recommended

Recommended

More Related Content

More from eric_gautier

More from eric_gautier (8)

Favre