SlideShare a Scribd company logo
1 of 59
Introduction
       How to measure the influence ?
                   Unit nonresponse
                   Simulation study
                          Conclusion




Robust inference in two phases sampling with an
        application to unit nonresponse

       Jean-François Beaumont, Statistics Canada
            Cyril Favre Martinoz, Crest-Ensai
     David Haziza, Université de Montréal, Crest-Ensai



                           30 janvier 2013




                                                                                 1/26
                  Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
                   How to measure the influence ?
                               Unit nonresponse
                               Simulation study
                                      Conclusion


Sommaire


  1   Introduction

  2   How to measure the influence ?

  3   Unit nonresponse

  4   Simulation study

  5   Conclusion




                                                                                             2/26
                              Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
                     How to measure the influence ?
                                 Unit nonresponse
                                 Simulation study
                                        Conclusion


Sommaire


  1   Introduction

  2   How to measure the influence ?

  3   Unit nonresponse

  4   Simulation study

  5   Conclusion




                                                                                               3/26
                                Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
               How to measure the influence ?
                           Unit nonresponse
                           Simulation study
                                  Conclusion


Influential units

      Unusual observations with possibly large design weights or large
      value




                                                                                         4/26
                          Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
                How to measure the influence ?
                            Unit nonresponse
                            Simulation study
                                   Conclusion


Influential units

      Unusual observations with possibly large design weights or large
      value
      Many survey statistics are sensitive to the presence of influential
      units




                                                                                          4/26
                           Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
                How to measure the influence ?
                            Unit nonresponse
                            Simulation study
                                   Conclusion


Influential units

      Unusual observations with possibly large design weights or large
      value
      Many survey statistics are sensitive to the presence of influential
      units
      Including or excluding an influential unit in the calculation of these
      statistics can have a dramatic impact on their magnitude.




                                                                                          4/26
                           Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
                How to measure the influence ?
                            Unit nonresponse
                            Simulation study
                                   Conclusion


Influential units

      Unusual observations with possibly large design weights or large
      value
      Many survey statistics are sensitive to the presence of influential
      units
      Including or excluding an influential unit in the calculation of these
      statistics can have a dramatic impact on their magnitude.
      The occurrence of outliers is common in business surveys because
      the distributions of variables (e.g., revenue, sales, etc.) are highly
      skewed (heavy right tail)




                                                                                          4/26
                           Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
                How to measure the influence ?
                            Unit nonresponse
                            Simulation study
                                   Conclusion


Influential units

      Unusual observations with possibly large design weights or large
      value
      Many survey statistics are sensitive to the presence of influential
      units
      Including or excluding an influential unit in the calculation of these
      statistics can have a dramatic impact on their magnitude.
      The occurrence of outliers is common in business surveys because
      the distributions of variables (e.g., revenue, sales, etc.) are highly
      skewed (heavy right tail)
      Influential units are legitimate observations




                                                                                          4/26
                           Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
                How to measure the influence ?
                            Unit nonresponse
                            Simulation study
                                   Conclusion


Influential units

      Unusual observations with possibly large design weights or large
      value
      Many survey statistics are sensitive to the presence of influential
      units
      Including or excluding an influential unit in the calculation of these
      statistics can have a dramatic impact on their magnitude.
      The occurrence of outliers is common in business surveys because
      the distributions of variables (e.g., revenue, sales, etc.) are highly
      skewed (heavy right tail)
      Influential units are legitimate observations
      The impact of influential units can be minimized by using a good
      sampling design : for example, stratified sampling with a take-all
      stratum

                                                                                          4/26
                           Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
                How to measure the influence ?
                            Unit nonresponse
                            Simulation study
                                   Conclusion


Influential units

      Unusual observations with possibly large design weights or large
      value
      Many survey statistics are sensitive to the presence of influential
      units
      Including or excluding an influential unit in the calculation of these
      statistics can have a dramatic impact on their magnitude.
      The occurrence of outliers is common in business surveys because
      the distributions of variables (e.g., revenue, sales, etc.) are highly
      skewed (heavy right tail)
      Influential units are legitimate observations
      The impact of influential units can be minimized by using a good
      sampling design : for example, stratified sampling with a take-all
      stratum

                                                                                          4/26
                           Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
               How to measure the influence ?
                           Unit nonresponse
                           Simulation study
                                  Conclusion


Influential units



      Even with a good sampling design, influential units may still be
      selected in the sample (e.g., stratum jumpers)




                                                                                         5/26
                          Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
               How to measure the influence ?
                           Unit nonresponse
                           Simulation study
                                  Conclusion


Influential units



      Even with a good sampling design, influential units may still be
      selected in the sample (e.g., stratum jumpers)
      In the presence of influential units, survey statistics are
      (approximately) unbiased but they can have a very large variance.




                                                                                         5/26
                          Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
               How to measure the influence ?
                           Unit nonresponse
                           Simulation study
                                  Conclusion


Influential units



      Even with a good sampling design, influential units may still be
      selected in the sample (e.g., stratum jumpers)
      In the presence of influential units, survey statistics are
      (approximately) unbiased but they can have a very large variance.
      Reducing the influence of large values produces stable but biased
      estimators




                                                                                         5/26
                          Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
               How to measure the influence ?
                           Unit nonresponse
                           Simulation study
                                  Conclusion


Influential units



      Even with a good sampling design, influential units may still be
      selected in the sample (e.g., stratum jumpers)
      In the presence of influential units, survey statistics are
      (approximately) unbiased but they can have a very large variance.
      Reducing the influence of large values produces stable but biased
      estimators
      Treatment of influential units : trade-off between bias and variance




                                                                                         5/26
                          Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
               How to measure the influence ?
                           Unit nonresponse
                           Simulation study
                                  Conclusion


Influential units



      Even with a good sampling design, influential units may still be
      selected in the sample (e.g., stratum jumpers)
      In the presence of influential units, survey statistics are
      (approximately) unbiased but they can have a very large variance.
      Reducing the influence of large values produces stable but biased
      estimators
      Treatment of influential units : trade-off between bias and variance




                                                                                         5/26
                          Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
                How to measure the influence ?
                            Unit nonresponse
                            Simulation study
                                   Conclusion


Two-phase designs


     U : finite population of size N
     s1 : first-phase sample, of size n1
     s2 : second-phase sample, of size n2 , selected from s1
     I1i : first-phase sample selection indicator for unit i
     I2i : second-phase sample selection indicator for unit i
     Vectors of indicators : I1 = (I11 , · · · , I1N ) and I2 = (I21 , · · · , I2N )
     First-phase inclusion probability for unit i : π1i = P (I1i = 1)
     Second-phase inclusion probability for unit i :
     π2i (I1 ) = P (I2i = 1|I1 ; I1i = 1)




                                                                                          6/26
                           Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
            How to measure the influence ?
                        Unit nonresponse
                        Simulation study
                               Conclusion


Two-phase designs
                                                    U(N)

             I1i  0, I 2i  0




                                            I1i  1, I 2i  1
                                                                    s2 (n2 )


                                      I1i  1, I 2i  0                        s1 (n1 )


                                                                                           6/26
                       Ensai-Ensae 2013          Robust inference in two-phases sampling
Introduction
               How to measure the influence ?
                           Unit nonresponse
                           Simulation study
                                  Conclusion


Point estimation

     Goal : estimate a population total of a variable of interest y,

                                          Y =         yi
                                                i∈U




                                                                                          7/26
                          Ensai-Ensae 2013      Robust inference in two-phases sampling
Introduction
               How to measure the influence ?
                           Unit nonresponse
                           Simulation study
                                  Conclusion


Point estimation

     Goal : estimate a population total of a variable of interest y,

                                          Y =         yi
                                                i∈U


     y-values : available only for i ∈ s2




                                                                                          7/26
                          Ensai-Ensae 2013      Robust inference in two-phases sampling
Introduction
               How to measure the influence ?
                           Unit nonresponse
                           Simulation study
                                  Conclusion


Point estimation

     Goal : estimate a population total of a variable of interest y,

                                          Y =         yi
                                                i∈U


     y-values : available only for i ∈ s2
     Complete data estimator : Double expansion estimator

                             ˆ                   yi          yi
                             YDE =                     =      ∗
                                        i∈s2
                                               π1i π2i   i∈s
                                                             πi
                                                               2




                                                                                          7/26
                          Ensai-Ensae 2013      Robust inference in two-phases sampling
Introduction
               How to measure the influence ?
                           Unit nonresponse
                           Simulation study
                                  Conclusion


Point estimation

     Goal : estimate a population total of a variable of interest y,

                                          Y =         yi
                                                i∈U


     y-values : available only for i ∈ s2
     Complete data estimator : Double expansion estimator

                             ˆ                   yi          yi
                             YDE =                     =      ∗
                                        i∈s2
                                               π1i π2i   i∈s
                                                             πi
                                                               2



     ˆ
     YDE is design-unbiased for Y ; that is,
                                          ˆ
                                   E1 E2 (YDE |I1 ) = Y

                                                                                          7/26
                          Ensai-Ensae 2013      Robust inference in two-phases sampling
Introduction
                     How to measure the influence ?
                                 Unit nonresponse
                                 Simulation study
                                        Conclusion


Sommaire


  1   Introduction

  2   How to measure the influence ?

  3   Unit nonresponse

  4   Simulation study

  5   Conclusion




                                                                                               8/26
                                Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
               How to measure the influence ?
                           Unit nonresponse
                           Simulation study
                                  Conclusion


Total error


                         ˆ
      The total error of YDE :

                                             ˆ
                                             YDE − Y
      How to measure the influence (or impact) of a unit on both errors ?




                                                                                          9/26
                          Ensai-Ensae 2013      Robust inference in two-phases sampling
Introduction
               How to measure the influence ?
                           Unit nonresponse
                           Simulation study
                                  Conclusion


Total error


                         ˆ
      The total error of YDE :

                                             ˆ
                                             YDE − Y
      How to measure the influence (or impact) of a unit on both errors ?
      Single phase sampling : the conditional bias ; Moreno-Rebollo,
      Munoz-Reyez and Munoz-Pichardo (1999), Beaumont, Haziza and
      Ruiz-Gazen (2013).




                                                                                          9/26
                          Ensai-Ensae 2013      Robust inference in two-phases sampling
Introduction
               How to measure the influence ?
                           Unit nonresponse
                           Simulation study
                                  Conclusion


Total error


                         ˆ
      The total error of YDE :

                                             ˆ
                                             YDE − Y
      How to measure the influence (or impact) of a unit on both errors ?
      Single phase sampling : the conditional bias ; Moreno-Rebollo,
      Munoz-Reyez and Munoz-Pichardo (1999), Beaumont, Haziza and
      Ruiz-Gazen (2013).
      How to construct a robust estimator to the presence of influential
      units ? Single phase designs : Beaumont, Haziza and Ruiz-Gazen
      (2013).




                                                                                          9/26
                          Ensai-Ensae 2013      Robust inference in two-phases sampling
Introduction
               How to measure the influence ?
                           Unit nonresponse
                           Simulation study
                                  Conclusion


Measuring the influence : the conditional bias

      We distinguish between three cases :
          i ∈ s2 : sampled unit
          i ∈ s1 − s2 : sampled in first phase but not in the second phase
          i ∈ U − s1 : nonsampled unit




                                                                                         10/26
                          Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
               How to measure the influence ?
                           Unit nonresponse
                           Simulation study
                                  Conclusion


Measuring the influence : the conditional bias

      We distinguish between three cases :
          i ∈ s2 : sampled unit
          i ∈ s1 − s2 : sampled in first phase but not in the second phase
          i ∈ U − s1 : nonsampled unit
      We can only reduce the influence of the sampled units (i.e., the
      units belonging to s2 )




                                                                                         10/26
                          Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
               How to measure the influence ?
                           Unit nonresponse
                           Simulation study
                                  Conclusion


Measuring the influence : the conditional bias

      We distinguish between three cases :
          i ∈ s2 : sampled unit
          i ∈ s1 − s2 : sampled in first phase but not in the second phase
          i ∈ U − s1 : nonsampled unit
      We can only reduce the influence of the sampled units (i.e., the
      units belonging to s2 )
      Nothing can be done for the other units at the estimation stage




                                                                                         10/26
                          Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
               How to measure the influence ?
                           Unit nonresponse
                           Simulation study
                                  Conclusion


Measuring the influence : the conditional bias

      We distinguish between three cases :
          i ∈ s2 : sampled unit
          i ∈ s1 − s2 : sampled in first phase but not in the second phase
          i ∈ U − s1 : nonsampled unit
      We can only reduce the influence of the sampled units (i.e., the
      units belonging to s2 )
      Nothing can be done for the other units at the estimation stage
      Influence of sampled unit i ∈ s2 :
        DE
       Bi (I1i = 1, I2i = 1)                    ˆ
                                       = E1 E2 (YDE − Y |I1 , I1i = 1, I2i = 1)




                                                                                         10/26
                          Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
               How to measure the influence ?
                           Unit nonresponse
                           Simulation study
                                  Conclusion


Measuring the influence : the conditional bias

      We distinguish between three cases :
          i ∈ s2 : sampled unit
          i ∈ s1 − s2 : sampled in first phase but not in the second phase
          i ∈ U − s1 : nonsampled unit
      We can only reduce the influence of the sampled units (i.e., the
      units belonging to s2 )
      Nothing can be done for the other units at the estimation stage
      Influence of sampled unit i ∈ s2 :
        DE
       Bi (I1i = 1, I2i = 1)                    ˆ
                                       = E1 E2 (YDE − Y |I1 , I1i = 1, I2i = 1)




                                                                                         10/26
                          Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
               How to measure the influence ?
                           Unit nonresponse
                           Simulation study
                                  Conclusion


The conditional bias in particular designs
      Arbitrary two-phase design :
                                                                      ∗
                DE
                                                                    πij
               Bi (I1i = 1, I2i = 1)               =                ∗ ∗ − 1 yj
                                                                   πi πj
                                                          j∈U

                                           ∗                              n1       n2        n2
      SRSWOR/SRSWOR with size n1 and n2 : πi =                            N    ×   n1   =    N


             DE                                        N      N           ¯
            Bi (I1i = 1, I2i = 1)              =            (   − 1)(yi − Y )
                                                     (N − 1) n2




                                                                                                  11/26
                          Ensai-Ensae 2013         Robust inference in two-phases sampling
Introduction
               How to measure the influence ?
                           Unit nonresponse
                           Simulation study
                                  Conclusion


The conditional bias in particular designs
      Arbitrary two-phase design :
                                                                      ∗
                DE
                                                                    πij
               Bi (I1i = 1, I2i = 1)               =                ∗ ∗ − 1 yj
                                                                   πi πj
                                                          j∈U

                                           ∗                              n1       n2        n2
      SRSWOR/SRSWOR with size n1 and n2 : πi =                            N    ×   n1   =    N


             DE                                        N      N           ¯
            Bi (I1i = 1, I2i = 1)              =            (   − 1)(yi − Y )
                                                     (N − 1) n2

      Arbitrary design/Poisson sampling :

       DE                                       π1ij             −1  −1
      Bi (I1i = 1, I2i = 1) =                          − 1 yj + π1i π2i − 1 yi
                                               π1i π1j
                                     j∈U


                                                                                                  11/26
                          Ensai-Ensae 2013         Robust inference in two-phases sampling
Introduction
               How to measure the influence ?
                           Unit nonresponse
                           Simulation study
                                  Conclusion


The conditional bias in particular designs
      Arbitrary two-phase design :
                                                                      ∗
                DE
                                                                    πij
               Bi (I1i = 1, I2i = 1)               =                ∗ ∗ − 1 yj
                                                                   πi πj
                                                          j∈U

                                           ∗                              n1       n2        n2
      SRSWOR/SRSWOR with size n1 and n2 : πi =                            N    ×   n1   =    N


             DE                                        N      N           ¯
            Bi (I1i = 1, I2i = 1)              =            (   − 1)(yi − Y )
                                                     (N − 1) n2

      Arbitrary design/Poisson sampling :

       DE                                       π1ij             −1  −1
      Bi (I1i = 1, I2i = 1) =                          − 1 yj + π1i π2i − 1 yi
                                               π1i π1j
                                     j∈U


                                                                                                  11/26
                          Ensai-Ensae 2013         Robust inference in two-phases sampling
Introduction
               How to measure the influence ?
                           Unit nonresponse
                           Simulation study
                                  Conclusion


Properties of conditional bias



      can be interpreted as a contribution of each unit (sampled or
      nonsampled) to the total error




                                                                                         12/26
                          Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
               How to measure the influence ?
                           Unit nonresponse
                           Simulation study
                                  Conclusion


Properties of conditional bias



      can be interpreted as a contribution of each unit (sampled or
      nonsampled) to the total error
      take fully account of the sampling design : an unit may be highly
      influential under a given sampling design but may have little or no
      influence under another sampling design




                                                                                         12/26
                          Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
               How to measure the influence ?
                           Unit nonresponse
                           Simulation study
                                  Conclusion


Properties of conditional bias



      can be interpreted as a contribution of each unit (sampled or
      nonsampled) to the total error
      take fully account of the sampling design : an unit may be highly
      influential under a given sampling design but may have little or no
      influence under another sampling design
          ∗            DE
      If πi = 1, then Bi (I1i = 1, I2i = 1) = 0




                                                                                         12/26
                          Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
               How to measure the influence ?
                           Unit nonresponse
                           Simulation study
                                  Conclusion


Properties of conditional bias



      can be interpreted as a contribution of each unit (sampled or
      nonsampled) to the total error
      take fully account of the sampling design : an unit may be highly
      influential under a given sampling design but may have little or no
      influence under another sampling design
          ∗            DE
      If πi = 1, then Bi (I1i = 1, I2i = 1) = 0
      unknown ⇒ must be estimated




                                                                                         12/26
                          Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
               How to measure the influence ?
                           Unit nonresponse
                           Simulation study
                                  Conclusion


Properties of conditional bias



      can be interpreted as a contribution of each unit (sampled or
      nonsampled) to the total error
      take fully account of the sampling design : an unit may be highly
      influential under a given sampling design but may have little or no
      influence under another sampling design
          ∗            DE
      If πi = 1, then Bi (I1i = 1, I2i = 1) = 0
      unknown ⇒ must be estimated




                                                                                         12/26
                          Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
               How to measure the influence ?
                           Unit nonresponse
                           Simulation study
                                  Conclusion


A robust version of the double expansion estimator


      Following Beaumont, Haziza and Ruiz-Gazen (2011), we obtain

   ˆR    ˆ
   YDE = YDE −           ˆ DE
                         Bi (I1i = 1, I2i = 1) +                   ˆ DE
                                                                 ψ Bi (I1i = 1, I2i = 1)
                  i∈s2                                    i∈s2

      Example of ψ-function :
                                      
                                       c if t > c
                              ψ (t) =    t if |t| ≤ c
                                        −c if t < −c
                                      

      c : tuning constant



                                                                                         13/26
                          Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
             How to measure the influence ?
                         Unit nonresponse
                         Simulation study
                                Conclusion


A robust version of the double expansion estimator




                                                                                       13/26
                        Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
                     How to measure the influence ?
                                 Unit nonresponse
                                 Simulation study
                                        Conclusion


Sommaire


  1   Introduction

  2   How to measure the influence ?

  3   Unit nonresponse

  4   Simulation study

  5   Conclusion




                                                                                               14/26
                                Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
              How to measure the influence ?
                          Unit nonresponse
                          Simulation study
                                 Conclusion


Notation
     s2 : set of respondents
     n2 : number of responding units (random)
     I2i : response indicator for unit i
     π2i : unknown response probability for unit i.
     We assume sampled units respond independently of one another
     (similar to Poisson sampling in the second phase)
     Propensity score adjusted estimator, assuming the π2i ’s are known :
                               ˜               yi
                               YP SA =
                                         i∈s
                                             π1i π2i
                                                  2


     Influence of a responding unit :
                                               π1ij             −1  −1
     Bi SA (I1i = 1, I2i = 1) =
      P
                                                      − 1 yj + π1i π2i − 1 yi
                                              π1i π1j
                                      j∈U
                                                                        Influence of unit i on
                                                                        the nonresponse error
                                         Influence of unit i on
                                           the sampling error

                                                                                                15/26
                         Ensai-Ensae 2013      Robust inference in two-phases sampling
Introduction
              How to measure the influence ?
                          Unit nonresponse
                          Simulation study
                                 Conclusion


Nonresponse model




     In practice, the response probability π2i is unknown




                                                                                        16/26
                         Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
              How to measure the influence ?
                          Unit nonresponse
                          Simulation study
                                 Conclusion


Nonresponse model




     In practice, the response probability π2i is unknown
     Estimated response probability for unit i with a parametric
     nonresponse model : π2i = m (xi , α)
                         ˆ              ˆ




                                                                                        16/26
                         Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
              How to measure the influence ?
                          Unit nonresponse
                          Simulation study
                                 Conclusion


Nonresponse model




     In practice, the response probability π2i is unknown
     Estimated response probability for unit i with a parametric
     nonresponse model : π2i = m (xi , α)
                         ˆ              ˆ




                                                                                        16/26
                         Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
                How to measure the influence ?
                            Unit nonresponse
                            Simulation study
                                   Conclusion


Method of homogeneous response groups (RHG)

     Step 1 :
     we rank every element in G groups by a growing estimated response
     probabilities
     Step 2 :
     The estimated probability of reponse in the g − th group is given by :
     Si i ∈ g
                                                         −1
                                                i∈srg   π1i
                                     π2i =
                                     ˆ                   −1
                                                 i∈sg   π1i
     Propensity score adjusted estimator :

                                     ˆ                   yi
                                     YP SA =
                                                i∈s2
                                                       π1i π2i
                                                           ˆ



                                                                                          17/26
                           Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
                How to measure the influence ?
                            Unit nonresponse
                            Simulation study
                                   Conclusion


Method of homogeneous response groups (RHG)

     Step 1 :
     we rank every element in G groups by a growing estimated response
     probabilities
     Step 2 :
     The estimated probability of reponse in the g − th group is given by :
     Si i ∈ g
                                                         −1
                                                i∈srg   π1i
                                     π2i =
                                     ˆ                   −1
                                                 i∈sg   π1i
     Propensity score adjusted estimator :

                                     ˆ                   yi
                                     YP SA =
                                                i∈s2
                                                       π1i π2i
                                                           ˆ



                                                                                          17/26
                           Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
                How to measure the influence ?
                            Unit nonresponse
                            Simulation study
                                   Conclusion


Asymptotic conditional bias

      One can show that
                                 ˆ       ˆ
                                 YP SA − YL = Op (n−1 ),

            ˆ                               ˆ
      where YL is the linearized version of YP SA .




                                                                                          18/26
                           Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
               How to measure the influence ?
                           Unit nonresponse
                           Simulation study
                                  Conclusion


Asymptotic conditional bias

      One can show that
                                ˆ       ˆ
                                YP SA − YL = Op (n−1 ),

            ˆ                               ˆ
      where YL is the linearized version of YP SA .
      Asymptotic conditional bias of a responding unit :
                                           ˆ
         Bi SA (I1i = 1, I2i = 1) ≈ E1 E2 (YL − Y |I1 , I1i = 1, I2i = 1)
          P




                                                            π1ij
          Bi SA (I1i = 1, I2i = 1) ≈
           P
                                                                   − 1 (yj − yU )
                                                                             ¯
                                                           π1i π1j
                                                  j∈U
                                                  −1  −1
                                               + π1i π2i − 1 (yi − yg )
                                                                   ¯

                                                                                           18/26
                          Ensai-Ensae 2013       Robust inference in two-phases sampling
Introduction
              How to measure the influence ?
                          Unit nonresponse
                          Simulation study
                                 Conclusion


Robust estimator in case of nonresponse

                       ˆ
     Robust version of YP SA

              ˆR
              YP SA        ˆ
                         = YP SA −                   ˆP
                                                     Bi SA (I1i = 1, I2i = 1)
                                              i∈s2

                         +               ˆP
                                       ψ Bi SA (I1i = 1, I2i = 1)
                                i∈s2

     Choice of the tuning constant
     We use a min-max criterion and we obtain :
                                    ˆP                ˆP
                            mini∈S (Bi SA ) + maxi∈S (Bi SA )
            ˆR      ˆ
            YP SA = YP SA −
                                            2


                                                                                           19/26
                         Ensai-Ensae 2013        Robust inference in two-phases sampling
Introduction
                     How to measure the influence ?
                                 Unit nonresponse
                                 Simulation study
                                        Conclusion


Sommaire


  1   Introduction

  2   How to measure the influence ?

  3   Unit nonresponse

  4   Simulation study

  5   Conclusion




                                                                                               20/26
                                Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
               How to measure the influence ?
                           Unit nonresponse
                           Simulation study
                                  Conclusion


Sampling and generating the nonresponse


     We generated three populations of size N = 5000 with two
     variables : y and x.
     Select P = 5000 samples, of size n1 = 300 or 500, according to
     simple random sampling without replacement in first phase
     Generate nonresponse : Bernoulli trials with probability π2i , where

                                                 exp (xi α)
                                   π2i =
                                               1 + exp (xi α)

     Global response rate : 70%
     We computed the min-max method to calibrate the tuning constant



                                                                                            21/26
                          Ensai-Ensae 2013        Robust inference in two-phases sampling
Introduction
             How to measure the influence ?
                         Unit nonresponse
                         Simulation study
                                Conclusion


Results for a total estimation




                                                                                       22/26
                        Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
                How to measure the influence ?
                            Unit nonresponse
                            Simulation study
                                   Conclusion


Results for a total estimation



                                                                                        2
                                         1      P       ˆR           ˆR
                                                        θP SA − EM C θP SA
                                         P      p=1
        RVM C     ˆR      ˆ
                  θP SA , θP SA =                                                       2,
                                         1      P       ˆ            ˆ
                                                        θP SA − EM C θP SA
                                         P      p=1

  et
                                                                                2
                                                    1    P       ˆR
                                                                 θP SA − θ
                                                    P    p=1
             REM C         ˆR      ˆ
                           θP SA , θP SA        =                               2.
                                                    1    P       ˆ
                                                                 θP SA − θ
                                                    P    p=1




                                                                                              22/26
                           Ensai-Ensae 2013         Robust inference in two-phases sampling
Introduction
                 How to measure the influence ?
                             Unit nonresponse
                             Simulation study
                                    Conclusion


Results for a total estimation



   Pop   Taille n1          ˆRP
                      RBM C θp SA (%)                      ˆRP     ˆP
                                                     RVM C θp SA , θp SA                   REM C
           300                   0.02                              1.0                      0.98
    1
           500                   0.04                              1.0                      0.98
           300                   −0.98                            0.52                      0.58
    2
           500                   −0.60                            0.61                      0.64
           300                   −1.33                            0.50                      0.52
    3
           500                   −1.16                            0.59                      0.63




                                                                                                   22/26
                            Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
                     How to measure the influence ?
                                 Unit nonresponse
                                 Simulation study
                                        Conclusion


Sommaire


  1   Introduction

  2   How to measure the influence ?

  3   Unit nonresponse

  4   Simulation study

  5   Conclusion




                                                                                               23/26
                                Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
              How to measure the influence ?
                          Unit nonresponse
                          Simulation study
                                 Conclusion


Concluding remarks



     Conditional bias : measure of influence that takes account of the
     sampling design, the parameter to be estimated and the estimator
     Results can be extended to the case of calibration estimators ⇒
     important in the unit nonresponse context since weight adjustment
     procedures by the inverse of the estimated response probabilities are
     generally followed by some form of calibration
     Proof of the design consistency and the normality of the robust
     estimator




                                                                                        24/26
                         Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
                How to measure the influence ?
                            Unit nonresponse
                            Simulation study
                                   Conclusion


Bibliographie I

      J.F. Beaumont, D. Haziza, and A. Ruiz-Gazen.
      A unified approach to robust estimation in finite population
      sampling.
      En révision, 2011.
      R.L. Chambers.
      Outlier robust finite population estimation.
      Journal of the American Statistical Association, pages 1063–1069,
      1986.
      P.N. Kokic and P.A. Bell.
      Optimal winsorizing cutoffs for a stratified finite population
      estimator.
      Journal of official statistics-Stockholm, 10 :419–419, 1994.


                                                                                          25/26
                           Ensai-Ensae 2013     Robust inference in two-phases sampling
Introduction
                How to measure the influence ?
                            Unit nonresponse
                            Simulation study
                                   Conclusion


Bibliographie II

      J.L. Moreno-Rebollo, A. Munoz-Reyes, M.D. Jimenez-Gamero, and
      J. Munoz-Pichardo.
      Influence diagnostic in survey sampling : Estimating the conditional
      bias.
      Metrika, 55(3) :209–214, 2002.
      J.L. Moreno-Rebollo, A. Muñoz-Reyes, and J. Muñoz-Pichardo.
      Miscellanea. influence diagnostic in survey sampling : conditional
      bias.
      Biometrika, 86(4) :923–928, 1999.
      J. Munoz-Pichardo, J. Munoz-Garcia, J.L. Moreno-Rebollo, and
      R. Pino-Mejias.
      A new approach to influence analysis in linear models.
      Sankhy¯ : The Indian Journal of Statistics, Series A, pages 393–409,
             a
      1995.
                                                                                          26/26
                           Ensai-Ensae 2013     Robust inference in two-phases sampling

More Related Content

More from eric_gautier (8)

Rousseau
RousseauRousseau
Rousseau
 
Rouviere
RouviereRouviere
Rouviere
 
Grelaud
GrelaudGrelaud
Grelaud
 
Ryder
RyderRyder
Ryder
 
Bertail
BertailBertail
Bertail
 
Lesage
LesageLesage
Lesage
 
Davezies
DaveziesDavezies
Davezies
 
Collier
CollierCollier
Collier
 

Favre

  • 1. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Robust inference in two phases sampling with an application to unit nonresponse Jean-François Beaumont, Statistics Canada Cyril Favre Martinoz, Crest-Ensai David Haziza, Université de Montréal, Crest-Ensai 30 janvier 2013 1/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 2. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Sommaire 1 Introduction 2 How to measure the influence ? 3 Unit nonresponse 4 Simulation study 5 Conclusion 2/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 3. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Sommaire 1 Introduction 2 How to measure the influence ? 3 Unit nonresponse 4 Simulation study 5 Conclusion 3/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 4. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Influential units Unusual observations with possibly large design weights or large value 4/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 5. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Influential units Unusual observations with possibly large design weights or large value Many survey statistics are sensitive to the presence of influential units 4/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 6. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Influential units Unusual observations with possibly large design weights or large value Many survey statistics are sensitive to the presence of influential units Including or excluding an influential unit in the calculation of these statistics can have a dramatic impact on their magnitude. 4/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 7. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Influential units Unusual observations with possibly large design weights or large value Many survey statistics are sensitive to the presence of influential units Including or excluding an influential unit in the calculation of these statistics can have a dramatic impact on their magnitude. The occurrence of outliers is common in business surveys because the distributions of variables (e.g., revenue, sales, etc.) are highly skewed (heavy right tail) 4/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 8. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Influential units Unusual observations with possibly large design weights or large value Many survey statistics are sensitive to the presence of influential units Including or excluding an influential unit in the calculation of these statistics can have a dramatic impact on their magnitude. The occurrence of outliers is common in business surveys because the distributions of variables (e.g., revenue, sales, etc.) are highly skewed (heavy right tail) Influential units are legitimate observations 4/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 9. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Influential units Unusual observations with possibly large design weights or large value Many survey statistics are sensitive to the presence of influential units Including or excluding an influential unit in the calculation of these statistics can have a dramatic impact on their magnitude. The occurrence of outliers is common in business surveys because the distributions of variables (e.g., revenue, sales, etc.) are highly skewed (heavy right tail) Influential units are legitimate observations The impact of influential units can be minimized by using a good sampling design : for example, stratified sampling with a take-all stratum 4/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 10. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Influential units Unusual observations with possibly large design weights or large value Many survey statistics are sensitive to the presence of influential units Including or excluding an influential unit in the calculation of these statistics can have a dramatic impact on their magnitude. The occurrence of outliers is common in business surveys because the distributions of variables (e.g., revenue, sales, etc.) are highly skewed (heavy right tail) Influential units are legitimate observations The impact of influential units can be minimized by using a good sampling design : for example, stratified sampling with a take-all stratum 4/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 11. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Influential units Even with a good sampling design, influential units may still be selected in the sample (e.g., stratum jumpers) 5/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 12. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Influential units Even with a good sampling design, influential units may still be selected in the sample (e.g., stratum jumpers) In the presence of influential units, survey statistics are (approximately) unbiased but they can have a very large variance. 5/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 13. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Influential units Even with a good sampling design, influential units may still be selected in the sample (e.g., stratum jumpers) In the presence of influential units, survey statistics are (approximately) unbiased but they can have a very large variance. Reducing the influence of large values produces stable but biased estimators 5/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 14. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Influential units Even with a good sampling design, influential units may still be selected in the sample (e.g., stratum jumpers) In the presence of influential units, survey statistics are (approximately) unbiased but they can have a very large variance. Reducing the influence of large values produces stable but biased estimators Treatment of influential units : trade-off between bias and variance 5/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 15. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Influential units Even with a good sampling design, influential units may still be selected in the sample (e.g., stratum jumpers) In the presence of influential units, survey statistics are (approximately) unbiased but they can have a very large variance. Reducing the influence of large values produces stable but biased estimators Treatment of influential units : trade-off between bias and variance 5/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 16. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Two-phase designs U : finite population of size N s1 : first-phase sample, of size n1 s2 : second-phase sample, of size n2 , selected from s1 I1i : first-phase sample selection indicator for unit i I2i : second-phase sample selection indicator for unit i Vectors of indicators : I1 = (I11 , · · · , I1N ) and I2 = (I21 , · · · , I2N ) First-phase inclusion probability for unit i : π1i = P (I1i = 1) Second-phase inclusion probability for unit i : π2i (I1 ) = P (I2i = 1|I1 ; I1i = 1) 6/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 17. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Two-phase designs U(N) I1i  0, I 2i  0 I1i  1, I 2i  1 s2 (n2 ) I1i  1, I 2i  0 s1 (n1 ) 6/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 18. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Point estimation Goal : estimate a population total of a variable of interest y, Y = yi i∈U 7/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 19. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Point estimation Goal : estimate a population total of a variable of interest y, Y = yi i∈U y-values : available only for i ∈ s2 7/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 20. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Point estimation Goal : estimate a population total of a variable of interest y, Y = yi i∈U y-values : available only for i ∈ s2 Complete data estimator : Double expansion estimator ˆ yi yi YDE = = ∗ i∈s2 π1i π2i i∈s πi 2 7/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 21. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Point estimation Goal : estimate a population total of a variable of interest y, Y = yi i∈U y-values : available only for i ∈ s2 Complete data estimator : Double expansion estimator ˆ yi yi YDE = = ∗ i∈s2 π1i π2i i∈s πi 2 ˆ YDE is design-unbiased for Y ; that is, ˆ E1 E2 (YDE |I1 ) = Y 7/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 22. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Sommaire 1 Introduction 2 How to measure the influence ? 3 Unit nonresponse 4 Simulation study 5 Conclusion 8/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 23. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Total error ˆ The total error of YDE : ˆ YDE − Y How to measure the influence (or impact) of a unit on both errors ? 9/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 24. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Total error ˆ The total error of YDE : ˆ YDE − Y How to measure the influence (or impact) of a unit on both errors ? Single phase sampling : the conditional bias ; Moreno-Rebollo, Munoz-Reyez and Munoz-Pichardo (1999), Beaumont, Haziza and Ruiz-Gazen (2013). 9/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 25. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Total error ˆ The total error of YDE : ˆ YDE − Y How to measure the influence (or impact) of a unit on both errors ? Single phase sampling : the conditional bias ; Moreno-Rebollo, Munoz-Reyez and Munoz-Pichardo (1999), Beaumont, Haziza and Ruiz-Gazen (2013). How to construct a robust estimator to the presence of influential units ? Single phase designs : Beaumont, Haziza and Ruiz-Gazen (2013). 9/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 26. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Measuring the influence : the conditional bias We distinguish between three cases : i ∈ s2 : sampled unit i ∈ s1 − s2 : sampled in first phase but not in the second phase i ∈ U − s1 : nonsampled unit 10/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 27. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Measuring the influence : the conditional bias We distinguish between three cases : i ∈ s2 : sampled unit i ∈ s1 − s2 : sampled in first phase but not in the second phase i ∈ U − s1 : nonsampled unit We can only reduce the influence of the sampled units (i.e., the units belonging to s2 ) 10/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 28. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Measuring the influence : the conditional bias We distinguish between three cases : i ∈ s2 : sampled unit i ∈ s1 − s2 : sampled in first phase but not in the second phase i ∈ U − s1 : nonsampled unit We can only reduce the influence of the sampled units (i.e., the units belonging to s2 ) Nothing can be done for the other units at the estimation stage 10/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 29. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Measuring the influence : the conditional bias We distinguish between three cases : i ∈ s2 : sampled unit i ∈ s1 − s2 : sampled in first phase but not in the second phase i ∈ U − s1 : nonsampled unit We can only reduce the influence of the sampled units (i.e., the units belonging to s2 ) Nothing can be done for the other units at the estimation stage Influence of sampled unit i ∈ s2 : DE Bi (I1i = 1, I2i = 1) ˆ = E1 E2 (YDE − Y |I1 , I1i = 1, I2i = 1) 10/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 30. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Measuring the influence : the conditional bias We distinguish between three cases : i ∈ s2 : sampled unit i ∈ s1 − s2 : sampled in first phase but not in the second phase i ∈ U − s1 : nonsampled unit We can only reduce the influence of the sampled units (i.e., the units belonging to s2 ) Nothing can be done for the other units at the estimation stage Influence of sampled unit i ∈ s2 : DE Bi (I1i = 1, I2i = 1) ˆ = E1 E2 (YDE − Y |I1 , I1i = 1, I2i = 1) 10/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 31. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion The conditional bias in particular designs Arbitrary two-phase design : ∗ DE πij Bi (I1i = 1, I2i = 1) = ∗ ∗ − 1 yj πi πj j∈U ∗ n1 n2 n2 SRSWOR/SRSWOR with size n1 and n2 : πi = N × n1 = N DE N N ¯ Bi (I1i = 1, I2i = 1) = ( − 1)(yi − Y ) (N − 1) n2 11/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 32. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion The conditional bias in particular designs Arbitrary two-phase design : ∗ DE πij Bi (I1i = 1, I2i = 1) = ∗ ∗ − 1 yj πi πj j∈U ∗ n1 n2 n2 SRSWOR/SRSWOR with size n1 and n2 : πi = N × n1 = N DE N N ¯ Bi (I1i = 1, I2i = 1) = ( − 1)(yi − Y ) (N − 1) n2 Arbitrary design/Poisson sampling : DE π1ij −1 −1 Bi (I1i = 1, I2i = 1) = − 1 yj + π1i π2i − 1 yi π1i π1j j∈U 11/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 33. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion The conditional bias in particular designs Arbitrary two-phase design : ∗ DE πij Bi (I1i = 1, I2i = 1) = ∗ ∗ − 1 yj πi πj j∈U ∗ n1 n2 n2 SRSWOR/SRSWOR with size n1 and n2 : πi = N × n1 = N DE N N ¯ Bi (I1i = 1, I2i = 1) = ( − 1)(yi − Y ) (N − 1) n2 Arbitrary design/Poisson sampling : DE π1ij −1 −1 Bi (I1i = 1, I2i = 1) = − 1 yj + π1i π2i − 1 yi π1i π1j j∈U 11/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 34. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Properties of conditional bias can be interpreted as a contribution of each unit (sampled or nonsampled) to the total error 12/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 35. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Properties of conditional bias can be interpreted as a contribution of each unit (sampled or nonsampled) to the total error take fully account of the sampling design : an unit may be highly influential under a given sampling design but may have little or no influence under another sampling design 12/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 36. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Properties of conditional bias can be interpreted as a contribution of each unit (sampled or nonsampled) to the total error take fully account of the sampling design : an unit may be highly influential under a given sampling design but may have little or no influence under another sampling design ∗ DE If πi = 1, then Bi (I1i = 1, I2i = 1) = 0 12/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 37. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Properties of conditional bias can be interpreted as a contribution of each unit (sampled or nonsampled) to the total error take fully account of the sampling design : an unit may be highly influential under a given sampling design but may have little or no influence under another sampling design ∗ DE If πi = 1, then Bi (I1i = 1, I2i = 1) = 0 unknown ⇒ must be estimated 12/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 38. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Properties of conditional bias can be interpreted as a contribution of each unit (sampled or nonsampled) to the total error take fully account of the sampling design : an unit may be highly influential under a given sampling design but may have little or no influence under another sampling design ∗ DE If πi = 1, then Bi (I1i = 1, I2i = 1) = 0 unknown ⇒ must be estimated 12/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 39. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion A robust version of the double expansion estimator Following Beaumont, Haziza and Ruiz-Gazen (2011), we obtain ˆR ˆ YDE = YDE − ˆ DE Bi (I1i = 1, I2i = 1) + ˆ DE ψ Bi (I1i = 1, I2i = 1) i∈s2 i∈s2 Example of ψ-function :   c if t > c ψ (t) = t if |t| ≤ c −c if t < −c  c : tuning constant 13/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 40. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion A robust version of the double expansion estimator 13/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 41. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Sommaire 1 Introduction 2 How to measure the influence ? 3 Unit nonresponse 4 Simulation study 5 Conclusion 14/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 42. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Notation s2 : set of respondents n2 : number of responding units (random) I2i : response indicator for unit i π2i : unknown response probability for unit i. We assume sampled units respond independently of one another (similar to Poisson sampling in the second phase) Propensity score adjusted estimator, assuming the π2i ’s are known : ˜ yi YP SA = i∈s π1i π2i 2 Influence of a responding unit : π1ij −1 −1 Bi SA (I1i = 1, I2i = 1) = P − 1 yj + π1i π2i − 1 yi π1i π1j j∈U Influence of unit i on the nonresponse error Influence of unit i on the sampling error 15/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 43. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Nonresponse model In practice, the response probability π2i is unknown 16/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 44. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Nonresponse model In practice, the response probability π2i is unknown Estimated response probability for unit i with a parametric nonresponse model : π2i = m (xi , α) ˆ ˆ 16/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 45. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Nonresponse model In practice, the response probability π2i is unknown Estimated response probability for unit i with a parametric nonresponse model : π2i = m (xi , α) ˆ ˆ 16/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 46. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Method of homogeneous response groups (RHG) Step 1 : we rank every element in G groups by a growing estimated response probabilities Step 2 : The estimated probability of reponse in the g − th group is given by : Si i ∈ g −1 i∈srg π1i π2i = ˆ −1 i∈sg π1i Propensity score adjusted estimator : ˆ yi YP SA = i∈s2 π1i π2i ˆ 17/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 47. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Method of homogeneous response groups (RHG) Step 1 : we rank every element in G groups by a growing estimated response probabilities Step 2 : The estimated probability of reponse in the g − th group is given by : Si i ∈ g −1 i∈srg π1i π2i = ˆ −1 i∈sg π1i Propensity score adjusted estimator : ˆ yi YP SA = i∈s2 π1i π2i ˆ 17/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 48. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Asymptotic conditional bias One can show that ˆ ˆ YP SA − YL = Op (n−1 ), ˆ ˆ where YL is the linearized version of YP SA . 18/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 49. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Asymptotic conditional bias One can show that ˆ ˆ YP SA − YL = Op (n−1 ), ˆ ˆ where YL is the linearized version of YP SA . Asymptotic conditional bias of a responding unit : ˆ Bi SA (I1i = 1, I2i = 1) ≈ E1 E2 (YL − Y |I1 , I1i = 1, I2i = 1) P π1ij Bi SA (I1i = 1, I2i = 1) ≈ P − 1 (yj − yU ) ¯ π1i π1j j∈U −1 −1 + π1i π2i − 1 (yi − yg ) ¯ 18/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 50. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Robust estimator in case of nonresponse ˆ Robust version of YP SA ˆR YP SA ˆ = YP SA − ˆP Bi SA (I1i = 1, I2i = 1) i∈s2 + ˆP ψ Bi SA (I1i = 1, I2i = 1) i∈s2 Choice of the tuning constant We use a min-max criterion and we obtain : ˆP ˆP mini∈S (Bi SA ) + maxi∈S (Bi SA ) ˆR ˆ YP SA = YP SA − 2 19/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 51. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Sommaire 1 Introduction 2 How to measure the influence ? 3 Unit nonresponse 4 Simulation study 5 Conclusion 20/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 52. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Sampling and generating the nonresponse We generated three populations of size N = 5000 with two variables : y and x. Select P = 5000 samples, of size n1 = 300 or 500, according to simple random sampling without replacement in first phase Generate nonresponse : Bernoulli trials with probability π2i , where exp (xi α) π2i = 1 + exp (xi α) Global response rate : 70% We computed the min-max method to calibrate the tuning constant 21/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 53. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Results for a total estimation 22/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 54. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Results for a total estimation 2 1 P ˆR ˆR θP SA − EM C θP SA P p=1 RVM C ˆR ˆ θP SA , θP SA = 2, 1 P ˆ ˆ θP SA − EM C θP SA P p=1 et 2 1 P ˆR θP SA − θ P p=1 REM C ˆR ˆ θP SA , θP SA = 2. 1 P ˆ θP SA − θ P p=1 22/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 55. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Results for a total estimation Pop Taille n1 ˆRP RBM C θp SA (%) ˆRP ˆP RVM C θp SA , θp SA REM C 300 0.02 1.0 0.98 1 500 0.04 1.0 0.98 300 −0.98 0.52 0.58 2 500 −0.60 0.61 0.64 300 −1.33 0.50 0.52 3 500 −1.16 0.59 0.63 22/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 56. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Sommaire 1 Introduction 2 How to measure the influence ? 3 Unit nonresponse 4 Simulation study 5 Conclusion 23/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 57. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Concluding remarks Conditional bias : measure of influence that takes account of the sampling design, the parameter to be estimated and the estimator Results can be extended to the case of calibration estimators ⇒ important in the unit nonresponse context since weight adjustment procedures by the inverse of the estimated response probabilities are generally followed by some form of calibration Proof of the design consistency and the normality of the robust estimator 24/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 58. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Bibliographie I J.F. Beaumont, D. Haziza, and A. Ruiz-Gazen. A unified approach to robust estimation in finite population sampling. En révision, 2011. R.L. Chambers. Outlier robust finite population estimation. Journal of the American Statistical Association, pages 1063–1069, 1986. P.N. Kokic and P.A. Bell. Optimal winsorizing cutoffs for a stratified finite population estimator. Journal of official statistics-Stockholm, 10 :419–419, 1994. 25/26 Ensai-Ensae 2013 Robust inference in two-phases sampling
  • 59. Introduction How to measure the influence ? Unit nonresponse Simulation study Conclusion Bibliographie II J.L. Moreno-Rebollo, A. Munoz-Reyes, M.D. Jimenez-Gamero, and J. Munoz-Pichardo. Influence diagnostic in survey sampling : Estimating the conditional bias. Metrika, 55(3) :209–214, 2002. J.L. Moreno-Rebollo, A. Muñoz-Reyes, and J. Muñoz-Pichardo. Miscellanea. influence diagnostic in survey sampling : conditional bias. Biometrika, 86(4) :923–928, 1999. J. Munoz-Pichardo, J. Munoz-Garcia, J.L. Moreno-Rebollo, and R. Pino-Mejias. A new approach to influence analysis in linear models. Sankhy¯ : The Indian Journal of Statistics, Series A, pages 393–409, a 1995. 26/26 Ensai-Ensae 2013 Robust inference in two-phases sampling