
# Calibration of weights in surveys with nonresponse and frame imperfections


1. Calibration of Weights in Surveys with Nonresponse and Frame Imperfections. A course presented at Eustat, Bilbao, Basque Country, January 26-27, 2009, by Sixten Lundström and Carl-Erik Särndal, Statistics Sweden (http://www.scb.se/statistik/_publikationer/OV9999_2000I02_BR_X97%c3%96P0103.pdf). Session 1_1: Introduction.
2. Welcome to this course, titled *Calibration of Weights in Surveys with Nonresponse and Frame Imperfections*. The title suggests two objectives: • To study calibration as a general method for estimation in surveys; this approach has attracted considerable attention in recent years. • To focus on problems caused by nonresponse: bias in the estimates, and how to reduce it.
3. Key concepts. Finite population U: N objects (elements): persons, or farms, or business firms, or … Sample s: a subset of the elements in U, $s \subset U$. Sampling design: how to select a sample s from U or, more precisely, from the list of the elements in U (the frame population). Probability sampling: every element in the population has a non-zero probability of being selected for the sample. In this course we assume that probability sampling is used.
4. Key concepts. There is a well-defined survey objective. For example, information is needed about employment: how many unemployed persons are there in the population? Study variable: y, with value $y_k = 1$ if k is unemployed and $y_k = 0$ otherwise. 'Unemployed' is a well-defined concept (ILO). Number of unemployed to be estimated: $\sum_{k=1}^{N} y_k = \sum_{k \in U} y_k = \sum_U y_k$. A survey often has many study variables (y-variables). • Categorical study variables: frequent in surveys of individuals and households (number of persons by category). • Continuous study variables: frequent in business surveys (monetary amounts).
5. Key concepts. There may exist other variables whose values are known and can be used to improve the estimation. They are called auxiliary variables. Calibration is a systematic approach to the use of auxiliary information. Auxiliary variables play an important role • in the sampling design (e.g., stratification) • in the estimation (by calibration). In this course we discuss only how auxiliary information is used in the estimation.
6. Key concepts. Ideal survey conditions: • The only error is sampling error. • All units selected for the sample provide the desired information (no nonresponse). • They respond correctly and truthfully (no measurement error). • The frame population agrees with the target population (no frame imperfections). This course: ideal conditions do not exist in the real world, but they are a starting point for theory. Session 1_4 of this course discusses uses of auxiliary information under ideal conditions. Objective: unbiased estimation; small variance.
7. This course. Nonresponse (abbreviated NR): not all of those selected for the sample respond, or some respond to only part of the questionnaire. A troubling feature of surveys today: NR rates are very high. 'Classical survey theory' did not need to pay much attention to NR. Most of this course, Sessions 1_5 to 2_6, is devoted to the situation with both sampling error and NR error. Objective: describe approaches to estimation; reduce as much as possible both the bias (due to NR) and the variance.
8. This course. In the concluding Session 2_7 we add another complication: frame imperfections, where the target population is not identical to the frame population. Not discussed in the course: measurement error (some of the answers provided are wrong). Research on NR in recent years has taken two directions: • Preventing NR from occurring (methods from the behavioural sciences); we do not discuss this. • Dealing with ('adjusting for') NR once it has occurred (mathematical and statistical sciences); the subject of this course.
9. Categories of NR • Item NR: the selected element responds to some but not all questions on the questionnaire. • Unit NR: the selected element does not respond at all; among the reasons are refusal, not-at-home, and others. Basic considerations for this course • NR is a normal, but undesirable, feature of essentially all sample surveys today. • NR causes bias in the estimates. • We must still make the best possible estimates. • Bias is never completely eliminated, but we strive to reduce it as far as possible. • Small variance is no consolation, because (bias)² can be the dominating part of the MSE.
10. Why is NR such a serious problem? The intuitive understanding: those who happen to respond are often not 'representative' of the population for which we wish to make inferences (estimates). The result is bias: data on the study variable(s) are available only for those who respond. The estimates computed on these data are often systematically wrong (biased), and we cannot completely eliminate that bias. Consequences of NR • (bias)² can be the larger part of the MSE. • NR increases survey cost; follow-up is expensive. • NR increases the variance, because fewer than desired will respond. But this can be compensated for by anticipating the NR rate and allowing 'extra sample size'. • Increased variance is often a minor problem compared with the bias.
11. Treatment of NR • NR may be treated by imputation, primarily for item NR; not discussed in this course. • NR may be treated by (adjustment) weighting, primarily for unit NR; this is the main topic of this course. Neither type of treatment resolves the real problem, which is bias. Starting points • Adjustment methods never completely eliminate the NR bias for a given study variable. This holds for the methods in this course, and for any other method. • NR bias may be small for some of the (usually many) study variables but large for others; unfortunately, we have no way of knowing which.
12. Comments, questions • The course is theoretical, but has a very practical background. • Different countries have very different conditions for sampling design and estimation. The Scandinavian countries have access to many kinds of registers, providing extensive sources of auxiliary data. • We are curious: what are the survey conditions in your country? • What do you consider to be 'high NR' in your country? Literature on nonresponse • Little was said in early books on survey sampling (Cochran and other books from the 1950s). • In recent years there has been a large body of literature and many conferences. • Several statistical agencies have paid considerable attention to the problem.
13. Our background and experience for work on NR methodology • S. Lundström, Ph.D. thesis, Stockholm Univ. (1997). • Lundström & Särndal: Current Best Methods manual, Statistics Sweden (2002): http://www.scb.se/statistik/_publikationer/OV9999_2000I02_BR_X97%c3%96P0103.pdf • Särndal & Lundström: Estimation in Surveys with Nonresponse. New York: Wiley (2005). The course is structured on this book. • Särndal & Lundström (2008): Assessing auxiliary vectors for control of nonresponse bias in the calibration estimator. Journal of Official Statistics, 24, 251-260. • Särndal & Lundström (2009): Design for estimation: Identifying auxiliary vectors to reduce nonresponse bias. Submitted for publication.
14. Important earlier works • Olkin, Madow and Rubin (editors): Incomplete Data in Sample Surveys. New York: Academic Press (1983) (3 volumes). • Groves, Dillman, Eltinge and Little (editors): Survey Nonresponse. New York: Wiley (2001). These books examine NR from many different perspectives. A comment: the nature of NR is sometimes described by terms such as ignorable, MAR, MCAR and non-ignorable. These distinctions are not needed in this course.
15. Session 1_2: Introductory aspects of the course material. Planning a survey: the process usually starts with a general, sometimes rather vague, description of a problem (a need for information). The statistician must determine the survey objective as clearly as possible: • What exactly is the problem? • Exactly what information is wanted?
16. Types of fact finding. Options: • An experiment? • A survey? • Other? The statistician's formulation must specify: • the finite population and the subpopulations (domains) for which information is required • the variables to be measured and the parameters to be estimated.
17. The target population (U); a domain ($U_q$). Parameters: $Y = \sum_U y_k$; $Y_q = \sum_{U_q} y_k$, where $q = 1, \ldots, Q$; $\psi = f(Y_1, \ldots, Y_m, \ldots, Y_M)$. Aspects of the survey design that need to be considered: • Data collection method • Questionnaire design and pretesting • Procedures for minimizing response errors • Selection and training of interviewers • Techniques for handling nonresponse • Procedures for tabulation and analysis.
18. No survey is perfect in all regards! • Sampling errors (examined) • Nonsampling errors: errors due to non-observation, namely undercoverage (examined) and nonresponse (examined); errors in observations, namely measurement and data processing. [Figure: sampling error, showing the target population (U) and the sample set (s).]
19. [Figure: sampling error and nonresponse error, showing the target population (U), the sample set (s) and the response set (r).] A simple experiment to illustrate sampling error and nonresponse error. Parameter to estimate: the proportion, in %, of elements with a given property, $P = \frac{100}{N}\sum_U y_k$, where $y_k = 1$ if element k has the property and $y_k = 0$ otherwise. Let us assume P = 50.
20. Sampling design: SI, n from N. Assume no auxiliary information is available. Estimator of P under full response: $\hat{P} = \frac{100}{n}\sum_s y_k$. Estimator of P if m out of n respond: $\hat{P}_{NR} = \frac{100}{m}\sum_r y_k$. Let us study what happens if the response distribution is as follows, where $\theta_k = \Pr(k \text{ responds})$: $\theta_k = 0.5$ if element k has the property, and $\theta_k = 0.9$ otherwise. Note: the response is directly related to the property under estimation. 100 repeated realizations (s, r).
21. [Figure: estimates (percent) of P plotted against sample number for the 100 realizations, in four panels: full response and nonresponse, for n = 30 and for n = 300; the level 50 is marked.]
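
The experiment of slides 19-21 can be re-run as a small Monte Carlo sketch. Only P = 50, SI sampling, the response probabilities 0.5/0.9 and the 100 realizations come from the slides; the population size N = 10,000, the seed and the function name `simulate` are assumptions made for illustration.

```python
import random
import statistics


def simulate(n, reps=100, N=10_000, theta_yes=0.5, theta_no=0.9, seed=1):
    """Return lists of full-response and nonresponse estimates of P (in %)."""
    rng = random.Random(seed)
    # Hypothetical population with P = 50: half the elements have y_k = 1.
    y = [1] * (N // 2) + [0] * (N - N // 2)
    full, nr = [], []
    for _ in range(reps):
        s = rng.sample(range(N), n)  # SI sample: n drawn without replacement
        full.append(100 * sum(y[k] for k in s) / n)
        # Each sampled element responds independently with probability theta_k.
        r = [k for k in s if rng.random() < (theta_yes if y[k] else theta_no)]
        if r:  # guard against an (unlikely) empty response set
            nr.append(100 * sum(y[k] for k in r) / len(r))
    return full, nr


for n in (30, 300):
    full, nr = simulate(n)
    print(n, round(statistics.mean(full), 1), round(statistics.mean(nr), 1))
```

The full-response averages should lie near 50 and the nonresponse averages near 35.7 (= 100 × 0.25/0.70, the share with the property among expected respondents); the gap does not close when n grows from 30 to 300.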
22. Comments • In practice, we never know the response probabilities. To be able to study the effect of nonresponse, assumptions about response probabilities are necessary. • Increasing the sample size will not reduce the nonresponse bias. In fact, the proportion of the MSE due to the bias increases with increasing sample size, as we shall now show. We consider response distributions of the type $\theta_k = \theta^*$ if element k has the property, and $\theta_k = 0.9$ otherwise. Consider four such response distributions: (1) θ* = 0.5; (2) θ* = 0.85; (3) θ* = 0.88; (4) θ* = 0.89.
23. 100 repeated realizations (s, r); for each of these we compute $\hat{P}_{NR} = \frac{100}{m}\sum_r y_k$ and then the proportion of the MSE due to squared bias, $\text{RelB}^2 = 100 \times \frac{\text{Bias}^2}{\text{MSE}}$, where $\text{MSE} = \text{Var} + \text{Bias}^2$. RelB² for different sample sizes and response distributions:

| θ* | n = 30 | n = 300 | n = 1000 | n = 2000 |
|---|---|---|---|---|
| 0.50 | 65.1 | 94.9 | 98.4 | 99.2 |
| 0.85 | 2.6 | 17.2 | 42.2 | 59.1 |
| 0.88 | 0.4 | 3.2 | 10.1 | 19.4 |
| 0.89 | 0.1 | 0.8 | 2.6 | 5.9 |

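
The RelB² table can also be approximated without simulation. The sketch below is our own large-sample approximation, not the authors' computation: it ignores the finite population correction, treats respondents as independent draws, and sets the expected number of respondents to n times the mean response probability. For θ* = 0.50 it reproduces the table to one decimal; the other rows agree to within about one percentage point.

```python
def relb2(theta_star, n, P=0.5, theta_other=0.9):
    """Approximate 100 * Bias^2 / MSE for P_NR (large N, no fpc)."""
    resp_rate = P * theta_star + (1 - P) * theta_other  # expected m / n
    P_r = P * theta_star / resp_rate  # expected share with the property among respondents
    bias = 100 * (P_r - P)
    # Binomial variance of a proportion based on m = n * resp_rate respondents.
    var = 100**2 * P_r * (1 - P_r) / (n * resp_rate)
    return 100 * bias**2 / (var + bias**2)


for t in (0.50, 0.85, 0.88, 0.89):
    print(t, [round(relb2(t, n), 1) for n in (30, 300, 1000, 2000)])
```

The bias term does not depend on n while the variance falls like 1/n, so RelB² must rise towards 100 as n grows.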
24. The proportion of the MSE due to squared bias (i) increases with increasing sample size; (ii) is rather high for large sample sizes even when the difference between the response probabilities for elements with the property and elements without the property is small. The high proportion causes the confidence interval to be invalid, as we now show. The usual 95% confidence interval would be computed as $\hat{P}_{NR} \pm 1.96\sqrt{\frac{\hat{P}_{NR}(100 - \hat{P}_{NR})}{m}}$. Problem: the coverage rate does not reach 95% when there is NR.
25. Coverage rate (%) for different sample sizes, for the response distribution with $\theta_k = 0.85$ if element k has the property and $\theta_k = 0.9$ otherwise:

| Sample size (n) | 30 | 300 | 1000 | 2000 |
|---|---|---|---|---|
| Coverage rate (%) | 93.2 | 92.6 | 87.1 | 77.9 |

Sampling, nonresponse and undercoverage error. [Figure: frame population and target population, with the regions labelled 'persisters', overcoverage and undercoverage.]
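
Under the same large-sample approximation (our assumption, not the authors' method), $\hat{P}_{NR}$ is roughly normal with its mean shifted by the bias, so the nominal 95% interval covers P with probability $\Phi(1.96 - b) - \Phi(-1.96 - b)$, where b is the bias in standard-error units. The sketch uses only `math.erf`; for θ* = 0.85 it gives about 92.5, 86.5 and 77.7 for n = 300, 1000 and 2000, close to the table, while at n = 30 the normal approximation is rougher.

```python
import math


def norm_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def coverage(theta_star, n, P=0.5, theta_other=0.9):
    """Approximate coverage (%) of P_NR +/- 1.96 * SE under nonresponse."""
    resp_rate = P * theta_star + (1 - P) * theta_other
    P_r = P * theta_star / resp_rate
    bias = 100 * (P_r - P)
    se = 100 * math.sqrt(P_r * (1 - P_r) / (n * resp_rate))
    b = bias / se  # bias measured in standard-error units
    return 100 * (norm_cdf(1.96 - b) - norm_cdf(-1.96 - b))


print([round(coverage(0.85, n), 1) for n in (30, 300, 1000, 2000)])
```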
26. Different sets. R: target population elements with complete or partial response. NR: target population elements with no or inadequate response. O: elements in the sample for which we do not know whether they belong to the target population or to the overcoverage. Φ: elements in the sample that belong to the overcoverage. Different sets (contin.): C: target population elements with complete response. NC: target population elements with partial response.
27. Breakdown of the sample size n after data collection: $n = n_R + n_{NR} + n_O + n_\Phi$, where $n_R = n_C + n_{NC}$. Swedish standard for calculation of response rates: unweighted response rate $= \frac{n_R}{n_R + n_{NR} + u \times n_O}$, where u is the proportion of O that belongs to the nonresponse.
28. Weighted response rate $= \frac{\sum_R d_k}{\sum_R d_k + \sum_{NR} d_k + u\sum_O d_k}$. NR is an increasingly serious problem. It must always be taken into account in the estimation. We illustrate this with some evidence.
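
The two response-rate formulas can be written as small functions. The function names and the example counts and weights are hypothetical, and u must be supplied by the user (the slides do not say how it is obtained). The overcoverage count $n_\Phi$ enters neither rate.

```python
def unweighted_response_rate(n_R, n_NR, n_O, u):
    """n_R / (n_R + n_NR + u * n_O); u = share of O assumed to be nonresponse."""
    return n_R / (n_R + n_NR + u * n_O)


def weighted_response_rate(d_R, d_NR, d_O, u):
    """Same ratio with design weights d_k summed over the sets R, NR and O."""
    return sum(d_R) / (sum(d_R) + sum(d_NR) + u * sum(d_O))


# Hypothetical example: 700 respondents, 200 nonrespondents, 50 of unknown status.
print(round(unweighted_response_rate(700, 200, 50, u=0.6), 4))  # 700 / 930
# With equal design weights the two rates coincide.
print(round(weighted_response_rate([2.0] * 700, [2.0] * 200, [2.0] * 50, u=0.6), 4))
```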
29. [Figure: The Swedish Labour Force Survey, time series of the nonresponse rate (%), 1970-2006, showing total unit nonresponse, noncontact and refusal.]

Nonresponse analysis in the Survey on Life and Health:

| Age group | 18-34 | 35-49 | 50-64 | 65-79 |
|---|---|---|---|---|
| Response rate (%) | 54.9 | 61.0 | 72.5 | 78.2 |

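30. Nonresponse analysis in the Survey on Life and Health (continued):

| Country of birth | Nordic | Other countries |
|---|---|---|
| Response rate (%) | 66.7 | 50.8 |

| Income class (thousands of SEK) | 0-149 | 150-299 | 300- |
|---|---|---|---|
| Response rate (%) | 60.8 | 70.0 | 70.2 |
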
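31. Nonresponse analysis in the Survey on Life and Health (continued):

| Marital status | Married | Other |
|---|---|---|
| Response rate (%) | 72.7 | 58.7 |

| Education level | Level 1 | Level 2 | Level 3 |
|---|---|---|---|
| Response rate (%) | 63.7 | 65.4 | 75.6 |
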
32. International experience: lower response rates for • metropolitan residents • single people • members of childless households • young people • divorced/widowed people • people with lower educational attainment • self-employed people • persons of foreign origin. This course will show that use of (the best possible) auxiliary information will reduce • the nonresponse bias • the variance • the coverage errors.
33. Session 1_3: Discussion. Survey response in your organization: trends in survey response rates? Increasing? What are some typical response rates, in the Labour Force Survey for example? Reason for concern? Have measures been introduced to increase survey response? Have measures been introduced to improve estimation, by more efficient use of auxiliary information or by other means?
34. Some response rates: the Swedish Household Budget Survey, 86% in 1958 and 52% in 2005; the Swedish Labour Force Survey, 97% in 1970 and 81% in 2005. [Figure: The Swedish Labour Force Survey, time series of the nonresponse rate (%), 1970-2006, showing total unit nonresponse, noncontact and refusal.]
35. Session 1_4: The use of auxiliary information under ideal survey conditions. Review: basic theory for complete response. Important concepts in design-based estimation for finite populations: • Horvitz-Thompson (HT) estimator • Generalized Regression (GREG) estimator • Calibration estimator.
36. The progression of ideas • Unbiased estimators for common designs (1930s and on). Cochran (1953) and other important books of the 1950s: stratified simple random sampling (STSI); cluster and two-stage sampling. • Horvitz-Thompson (HT) estimator (1952): arbitrary sampling design; the idea of individual inclusion probabilities. • GREG estimator (1970s): arbitrary auxiliary vector for model-assisted estimation. • Calibration estimator (1990s): identify powerful information; use it to compute weights for estimation (with or without NR). Concurrently, computerized tools were developed: CLAN97, Bascula, Calmar, and others.
37. Basic theory for complete response. Population U of elements k = 1, 2, ..., N. Sample s (subset of U). Non-sampled (non-observed): $U - s$. Complete response: all those sampled are also observed (their y-values recorded). Notation: finite population $U = \{1, 2, \ldots, k, \ldots, N\}$; sample from U: s; sampling design: p(s); inclusion probability of k: $\pi_k$; design weight of k: $d_k = 1/\pi_k$; joint inclusion probability of k and l: $\pi_{kl}$.
38. Notation: study variable y; its value for element k: $y_k$. We want to estimate $\sum_U y_k$. Usually, a survey has many y-variables; they can be categorical or continuous. Notation: domain = subpopulation. A typical domain: $U_q$, a subset of U, $U_q \subseteq U$. Domain total to estimate: $\sum_{U_q} y_k$.
39. Notation: domain-specific y-variable $y_q$; its value for element k is $y_{qk}$, with $y_{qk} = y_k$ in the domain and $y_{qk} = 0$ outside. Domain total to estimate: $\sum_{U_q} y_k = \sum_U y_{qk}$; for example, total disposable income (the variable) in single-member households (the domain). The approach to estimation must handle a variety of practical circumstances. A typical survey has many y-variables: one for every socio-economic concept, and one for every domain of interest (every new domain adds a new y-variable). A y-variable is often both categorical ('zero-one') and domain-specific (= 0 outside the domain). For example: unemployed (variable) among persons living alone (domain).
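
The domain device amounts to one extra y-variable per domain, to which the ordinary estimator is then applied. A minimal sketch, assuming SI sampling so that $d_k = N/n$; the helper names and the data are hypothetical.

```python
def domain_variable(y, in_domain):
    """y_qk = y_k inside the domain, 0 outside."""
    return [yk if member else 0 for yk, member in zip(y, in_domain)]


def ht_total(y, d):
    """Horvitz-Thompson estimator: sum over the sample of d_k * y_k."""
    return sum(dk * yk for dk, yk in zip(d, y))


# Hypothetical SI sample of n = 5 from N = 100: income and a 'lives alone' flag.
income = [210, 180, 320, 150, 260]
alone = [True, False, True, False, False]
d = [100 / 5] * 5
y_q = domain_variable(income, alone)
print(y_q, ht_total(y_q, d))
```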
40. Even though the survey has many y-variables, we can focus on one of them and on the estimation of its unknown population total $Y = \sum_U y_k$. HT estimator for complete response: $\hat{Y}_{HT} = \sum_s d_k y_k$, with the design weight of k given by $d_k = 1/\pi_k$. Auxiliary information is not used at the estimation stage.
41. HT estimator for complete response: variance $V(\hat{Y}_{HT}) = \sum\sum_U F_{kl}\, y_k y_l$, where $F_{kl} = \frac{d_k d_l}{d_{kl}} - 1$ for $l \neq k$, with $d_{kl} = 1/\pi_{kl}$, and $F_{kk} = d_k - 1$. For example, for SI sampling we have $\hat{Y}_{HT} = N\bar{y}_s$ and $V(N\bar{y}_s) = N^2\left(\frac{1}{n} - \frac{1}{N}\right)S^2_{yU}$. HT estimation for complete response: the variance estimator is $\hat{V}(\hat{Y}_{HT}) = \sum\sum_s d_{kl} F_{kl}\, y_k y_l$. It has familiar expressions for 'the usual designs'. For STSI, with $n_h$ from $N_h$ in stratum h, $\hat{Y}_{HT} = \sum_{h=1}^{H} N_h \bar{y}_{s_h}$, with estimated variance $\sum_{h=1}^{H} N_h^2\left(\frac{1}{n_h} - \frac{1}{N_h}\right)S^2_{ys_h}$.
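
For SI and STSI designs the HT estimator and its variance estimator reduce to the familiar formulas above. The sketch codes those closed forms directly rather than the general double sum; the function names and example numbers are illustrative assumptions. `statistics.variance` uses the divisor n - 1, matching $S^2_{ys}$.

```python
import statistics


def si_estimate(y_s, N):
    """SI: Y_HT = N * ybar_s and estimated variance N^2 (1/n - 1/N) S^2_ys."""
    n = len(y_s)
    y_ht = N * statistics.mean(y_s)
    v_hat = N**2 * (1 / n - 1 / N) * statistics.variance(y_s)
    return y_ht, v_hat


def stsi_estimate(strata):
    """STSI: strata is a list of (N_h, sample values in stratum h)."""
    y_ht = v_hat = 0.0
    for N_h, y_sh in strata:
        y_h, v_h = si_estimate(y_sh, N_h)  # SI within each stratum
        y_ht += y_h
        v_hat += v_h
    return y_ht, v_hat


print(si_estimate([3, 5, 7, 9], N=100))
print(stsi_estimate([(100, [3, 5, 7, 9]), (50, [1, 2, 3])]))
```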
42. Auxiliary vector, denoted x; its dimension may be large. Its value for element k: $\mathbf{x}_k$. To qualify as an auxiliary vector, we must know more than just $\mathbf{x}_k$ for $k \in s$: for example, know $\mathbf{x}_k$ for $k \in U$, or know the total $\sum_U \mathbf{x}_k$. GREG estimator of $Y = \sum_U y_k$ (1980s): $\hat{Y}_{GREG} = \sum_s d_k y_k + \left(\sum_U \mathbf{x}_k - \sum_s d_k \mathbf{x}_k\right)'\mathbf{B}_{s;d}$, that is, the HT estimator of Y plus a regression adjustment (an estimator of 0). $\mathbf{B}_{s;d}$ is a regression vector computed on the sample data.
43. GREG estimator, alternative expression: $\hat{Y}_{GREG} = \sum_U \hat{y}_k + \sum_s d_k (y_k - \hat{y}_k)$, the population sum of predicted values plus the sample sum of weighted residuals, where $\hat{y}_k = \mathbf{x}_k'\mathbf{B}_{s;d}$ is computable for $k \in U$. The auxiliary information for GREG is $\sum_U \mathbf{x}_k$, the population total of the auxiliary vector. Examples: • A continuous x-variable: $\mathbf{x}_k = (1, x_k)' \Rightarrow \sum_U \mathbf{x}_k = (N, \sum_U x_k)'$. • A classification of the elements: $\mathbf{x}_k = (0, \ldots, 1, \ldots, 0)' \Rightarrow \sum_U \mathbf{x}_k = (N_1, \ldots, N_j, \ldots, N_J)'$.
44. $\hat{Y}_{GREG}$ contains the estimated regression vector $\mathbf{B}_{s;d} = \left(\sum_s d_k \mathbf{x}_k \mathbf{x}_k'\right)^{-1}\left(\sum_s d_k \mathbf{x}_k y_k\right)$ (a matrix to invert times a column vector). It is a (nearly unbiased) estimator of its population counterpart $\mathbf{B}_U = \left(\sum_U \mathbf{x}_k \mathbf{x}_k'\right)^{-1}\left(\sum_U \mathbf{x}_k y_k\right)$. System of notation for means, regression coefficients, etc.: the first index is the set of elements that defines the quantity ('the computation set'); then comes a semicolon; then the second index, the weighting used in the quantity. Examples: $\bar{y}_{s;d} = \frac{\sum_s d_k y_k}{\sum_s d_k}$ (weighted sample mean); $\mathbf{B}_{s;d} = \left(\sum_s d_k \mathbf{x}_k \mathbf{x}_k'\right)^{-1}\left(\sum_s d_k \mathbf{x}_k y_k\right)$.
45. If the need arises to be even more explicit: $\mathbf{B}_{(y:\mathbf{x})s;d} = \left(\sum_s d_k \mathbf{x}_k \mathbf{x}_k'\right)^{-1}\left(\sum_s d_k \mathbf{x}_k y_k\right)$, the regression of y on x, computed over the sample s with the weighting $d_k = 1/\pi_k$. System of notation: absence of the second index means that the weighting is uniform ('unweighted'). Examples: $\bar{y}_U = \frac{1}{N}\sum_U y_k$ (unweighted population mean); $\mathbf{B}_U = \left(\sum_U \mathbf{x}_k \mathbf{x}_k'\right)^{-1}\left(\sum_U \mathbf{x}_k y_k\right)$ (unweighted regression vector).
46. Estimators as weighted sums. HT estimator: $\hat{Y}_{HT} = \sum_s d_k y_k$; the weight of k is $d_k = 1/\pi_k$. GREG estimator as a weighted sum: $\hat{Y}_{GREG} = \sum_s d_k g_k y_k$; the weight of element k is $d_k g_k$ = design weight × adjustment factor based on the auxiliary information.
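47. The GREG estimator gives element k the weight $d_k g_k$, where $d_k = 1/\pi_k$, $g_k = 1 + \boldsymbol{\lambda}_s'\mathbf{x}_k$ and $\boldsymbol{\lambda}_s' = \left(\sum_U \mathbf{x}_k - \sum_s d_k \mathbf{x}_k\right)'\left(\sum_s d_k \mathbf{x}_k \mathbf{x}_k'\right)^{-1}$. Computation of the GREG estimator $\hat{Y}_{GREG} = \sum_s d_k g_k y_k$: 1. Matrix inversion: $\left(\sum_s d_k \mathbf{x}_k \mathbf{x}_k'\right)^{-1}$. 2. Compute $\boldsymbol{\lambda}_s' = \left(\sum_U \mathbf{x}_k - \sum_s d_k \mathbf{x}_k\right)'\left(\sum_s d_k \mathbf{x}_k \mathbf{x}_k'\right)^{-1}$. 3. Compute $g_k = 1 + \boldsymbol{\lambda}_s'\mathbf{x}_k$. 4. Finally compute $d_k g_k$. Several software packages exist for this.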
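48. Comment: matrix inversion is part of the weight computation. In $\boldsymbol{\lambda}_s' = \left(\sum_U \mathbf{x}_k - \sum_s d_k \mathbf{x}_k\right)'\left(\sum_s d_k \mathbf{x}_k \mathbf{x}_k'\right)^{-1}$ the first factor is a row vector and the second requires a matrix inversion. GREG estimator: $\hat{Y}_{GREG} = \sum_s d_k g_k y_k$. Property of the weights: $\sum_s d_k g_k \mathbf{x}_k = \sum_U \mathbf{x}_k$ (the known total). They are calibrated to the known information.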
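
Steps 1-4 translate directly into NumPy. This is a sketch under stated assumptions: the function name `greg_weights`, the data and the known totals are hypothetical, and the explicit inverse of step 1 is replaced by `np.linalg.solve`, which is numerically preferable to forming the inverse. The final lines check the calibration property $\sum_s d_k g_k \mathbf{x}_k = \sum_U \mathbf{x}_k$.

```python
import numpy as np


def greg_weights(d, X_s, t_x):
    """GREG weights d_k * g_k with g_k = 1 + lambda_s' x_k.

    d   : (n,) design weights 1/pi_k
    X_s : (n, p) auxiliary vectors x_k of the sample
    t_x : (p,) known population total sum_U x_k
    """
    T = (X_s * d[:, None]).T @ X_s  # step 1: sum_s d_k x_k x_k'
    # Step 2: T is symmetric, so lambda_s solves T lambda = t_x - sum_s d_k x_k.
    lam = np.linalg.solve(T, t_x - d @ X_s)
    g = 1 + X_s @ lam  # step 3
    return d * g  # step 4


# Hypothetical SI sample of n = 5 from N = 100, with x_k = (1, x_k)'.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
X_s = np.column_stack([np.ones_like(x), x])
d = np.full(5, 100 / 5)
w = greg_weights(d, X_s, t_x=np.array([100.0, 320.0]))
print(w @ X_s)  # calibration check: reproduces (N, sum_U x_k)
print(w @ y)  # Y_GREG
```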
49. Bias of GREG: it is very small, already for modest sample sizes. The ratio bias/standard deviation is of order $n^{-1/2}$: the bias decreases faster than the standard deviation. For practical purposes we can forget the bias (assuming full response). Variance estimation for GREG: well known since the 1980s. Comment: weights of the form $d_k(1 + \boldsymbol{\lambda}'\mathbf{x}_k)$ will be seen often in the following: the design weight multiplied by an adjustment factor of the form $1 + \boldsymbol{\lambda}'\mathbf{x}_k$.
50. Note: when we examine estimation for NR (Sessions 1_5 and following), the weights will again have the form design weight × adjustment factor, but then the estimators will be biased, more or less, depending on the strength of the auxiliary vector. Auxiliary information: an example. For every k in U, suppose the following is known: • membership in one of 2 × 3 = 6 possible groups, e.g., sex by age group • the value $x_k$ of a continuous variable x, e.g., $x_k$ = income of k. Many auxiliary vectors can be formulated to transmit some or all of this total information. Let us consider five of these vectors.
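51. Five auxiliary vectors for the example (vector values shown for an element in sex group 1 and age group 2):

| Vector $\mathbf{x}_k$ | Information $\sum_U \mathbf{x}_k$ | Description |
|---|---|---|
| $x_k$ | $\sum_U x_k$ | total population income |
| $(1, x_k)'$ | $(N, \sum_U x_k)'$ | population size and total population income |
| $(0, x_k, 0, 0, 0, 0)'$ | $(\sum_{U_{11}} x_k, \ldots, \sum_{U_{23}} x_k)'$ | population income by age/sex group |
| $(0, 1, 0, 0, 0, 0, 0, x_k, 0, 0, 0, 0)'$ | $(N_{11}, \ldots, N_{23}, \sum_{U_{11}} x_k, \ldots, \sum_{U_{23}} x_k)'$ | size of age/sex groups, and population income by group |
| $(1, 0, 0, x_k, 0)'$ | $(N_{1\cdot}, N_{2\cdot}, \sum_{U_{\cdot 1}} x_k, \sum_{U_{\cdot 2}} x_k, \sum_{U_{\cdot 3}} x_k)'$ | size of sex groups, and income by age group |
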
52. For each of the five formulated vectors, $\hat{Y}_{GREG} = \sum_s d_k g_k y_k$ will have a certain mathematical form: five different expressions, but all of them are special cases of the general formula for $g_k$. (There is no need to give them individual names; they are just special cases of one estimator, namely GREG.) For example, with the auxiliary vector $\mathbf{x}_k = (1, x_k)'$, $\hat{Y}_{GREG}$ takes the form that 'the old literature' calls the (simple) regression estimator, $\hat{Y}_{GREG} = N\left\{\bar{y}_{s;d} + \left(\bar{x}_U - \bar{x}_{s;d}\right)B_{s;d}\right\}$. In modern language, it is the GREG estimator for the auxiliary vector $\mathbf{x}_k = (1, x_k)'$.
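
A quick numerical check, on simulated data of our own choosing, that GREG with $\mathbf{x}_k = (1, x_k)'$ coincides with the classical simple regression estimator. Here $\bar{y}_{s;d}$ and $\bar{x}_{s;d}$ are $d_k$-weighted sample means and $B_{s;d}$ is the $d_k$-weighted slope; the population model, sizes and seed are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 1000, 50
x_U = rng.uniform(1, 10, N)  # hypothetical population values of x
y_U = 3 + 2 * x_U + rng.normal(0, 1, N)
s = rng.choice(N, size=n, replace=False)  # SI sample
d = np.full(n, N / n)
x, y = x_U[s], y_U[s]

# GREG weights with x_k = (1, x_k)' and known total (N, sum_U x_k).
X_s = np.column_stack([np.ones(n), x])
t_x = np.array([N, x_U.sum()])
T = (X_s * d[:, None]).T @ X_s
lam = np.linalg.solve(T, t_x - d @ X_s)
y_greg = (d * (1 + X_s @ lam)) @ y

# Classical form: N { ybar_sd + (xbar_U - xbar_sd) B_sd }.
ybar, xbar = (d @ y) / d.sum(), (d @ x) / d.sum()
B = (d @ ((x - xbar) * (y - ybar))) / (d @ (x - xbar) ** 2)
y_reg = N * (ybar + (x_U.mean() - xbar) * B)
print(y_greg, y_reg, np.isclose(y_greg, y_reg))
```

The agreement is algebraic rather than numerical luck: the intercept in $\mathbf{x}_k$ makes the fitted regression pass through the weighted means, which collapses the GREG adjustment to the classical form.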
53. Session 1_5: Introduction to estimation in surveys with nonresponse. [Figure: target population (U), sample set (s) and response set (r).]
54. Notation. Our objective: to estimate $Y = \sum_U y_k$ with an estimator denoted $\hat{Y}_{NR}$, representing either $\hat{Y}_W = \sum_r w_k y_k$ (weighting only) or $\hat{Y}_{IW} = \sum_r w_k y_{\bullet k}$ (imputation followed by weighting). Imputation followed by weighting: a typical survey has many y-variables, indexed i = 1, …, I. Response set for variable i: $r_i$. Response set for the survey: r, the set of elements that have responded to at least one item.
55. Imputation followed by weighting: $\hat{Y}_{IW} = \sum_r w_k y_{\bullet k}$, where $y_{\bullet k} = y_k$ for $k \in r_i$ and $y_{\bullet k} = \hat{y}_k$ for $k \in r - r_i$. Imputation for item NR: the imputed value $\hat{y}_k$ takes the place of the missing value $y_k$. Components of error: $\hat{Y}_{NR} - Y = (\hat{Y} - Y) + (\hat{Y}_{NR} - \hat{Y})$, that is, total error = sampling error + NR error. Here $\hat{Y}$ is the estimator of Y that would be used under complete response (r = s), and $\hat{Y}_{NR}$ is the 'NR estimator' for Y.
56. Two phases of selection: 1. s is selected from U; 2. given s, r is realised as a subset of s. The two probability distributions are p(s) (known) and q(r | s) (unknown). Both are taken into account in evaluating bias and variance. We use the conditional argument: for the expected value, $E_{pq}(\cdot) = E_p[E_q(\cdot \mid s)]$; for the variance, $V_{pq}(\cdot) = V_p[E_q(\cdot \mid s)] + E_p[V_q(\cdot \mid s)]$.
57. The basic statistical properties of $\hat{Y}_{NR}$. The bias: $B_{pq}(\hat{Y}_{NR}) = E_{pq}(\hat{Y}_{NR}) - Y$. The accuracy, measured by the MSE: $MSE_{pq}(\hat{Y}_{NR}) = V_{pq}(\hat{Y}_{NR}) + \left(B_{pq}(\hat{Y}_{NR})\right)^2$. The bias will be carefully studied in this course. It has two components: $B_{pq}(\hat{Y}_{NR}) = E_{pq}(\hat{Y}_{NR}) - Y = [E_p(\hat{Y}) - Y] + [E_{pq}(\hat{Y}_{NR} - \hat{Y})] = B_{SAM} + B_{NR}$, the sampling bias plus the NR bias. $B_{SAM}$ is zero (for HT) or negligible (for GREG).
58. The variance. By definition $V_{pq}(\hat{Y}_{NR}) = E_{pq}\left[\left(\hat{Y}_{NR} - E_{pq}(\hat{Y}_{NR})\right)^2\right]$. It can be decomposed into two components, $V_{pq}(\hat{Y}_{NR}) = V_{SAM} + V_{NR}$: sampling variance plus NR variance. The sampling variance component $V_{SAM} = V_p(\hat{Y}) = E_p\left[\left(\hat{Y} - E_p(\hat{Y})\right)^2\right]$ depends only on the sampling design p(s). For example, under SRS, if the full-response estimator is $\hat{Y} = N\bar{y}_s$, we have the well-known expression $V_{SAM} = N^2\left(\frac{1}{n} - \frac{1}{N}\right)S^2_{yU}$.
59. The NR variance component is more complex: $V_{NR} = E_p V_q(\hat{Y}_{NR} \mid s) + V_p(B_{NR|s}) + 2\,Cov_p(\hat{Y}, B_{NR|s})$, where $B_{NR|s} = E_q(\hat{Y}_{NR} - \hat{Y} \mid s)$ is the conditional NR bias. Add the squared bias to arrive at the measure of accuracy: $MSE_{pq}(\hat{Y}_{NR}) = V_p(\hat{Y}) + E_p V_q(\hat{Y}_{NR} \mid s) + E_p(B^2_{NR|s}) + 2\,Cov_p(\hat{Y}, B_{NR|s}) + 2B_{SAM}B_{NR} + (B_{SAM})^2$. If $B_{SAM}$ is negligible and the Cov term is small, then $MSE_{pq}(\hat{Y}_{NR}) \approx V_p(\hat{Y}) + E_p V_q(\hat{Y}_{NR} \mid s) + E_p(B^2_{NR|s})$.
60. The accuracy has two parts: $MSE_{pq}(\hat{Y}_{NR}) \approx \underbrace{V_p(\hat{Y})}_{\text{due to sampling}} + \underbrace{E_p V_q(\hat{Y}_{NR} \mid s) + E_p(B^2_{NR|s})}_{\text{due to NR}}$. The main problem with NR: the term involving the bias, $E_p(B^2_{NR|s})$, can be a very large component of the MSE.
61. Session 1_6: Weighting of data. Types of auxiliary information. The calibration approach. [Figure: structure, showing target population U, sample s and response set r.]
62. Notation and terminology: population U of elements k = 1, 2, ..., N; sample s (subset of U); non-sampled: $U - s$; response set r (subset of s); sampled but non-responding: $s - r$; $U \supseteq s \supseteq r$. The objective remains to estimate the total $Y = \sum_U y_k$. In practice there are many y-totals and functions of y-totals, but we can focus here on one total. There is no need at this point to distinguish item NR and unit NR. Perfect coverage is assumed.
63. The response set r is the set for which we observe $y_k$. Available y-data: $y_k$ for $k \in r$. Missing values: $y_k$ for $k \in s - r$, where $r \subseteq s \subseteq U$. Nonresponse means that $r \subset s$; full response means that r = s with probability one. Two phases of selection: phase one, sample selection with a known sampling design; phase two, response selection with an unknown response mechanism.
64. Phase one: sample selection. Known sampling design: p(s). Known inclusion probability of k: $\pi_k$. Known design weight of k: $d_k = 1/\pi_k$. Phase two: response selection. Unknown response mechanism: q(r | s). Unknown response probability of k: $\theta_k$. Unknown response influence of k: $\phi_k = 1/\theta_k$.
65. A note on terminology: $d_k = 1/\pi_k$ is a computable weight; $\phi_k = 1/\theta_k$ is unknown, is not a weight, and is called an influence. Sample weighting combined with response weighting. Desired (but impossible) combined weighting: $d_k \times \phi_k = \frac{1}{\pi_k} \times \frac{1}{\theta_k}$ (known × unknown).
66. Desirable nonresponse weighting: $\hat{Y} = \sum_r \frac{d_k}{\theta_k} y_k = \sum_r d_k \phi_k y_k$. It cannot be computed, because the influences $\phi_k = 1/\theta_k$ are unknown. We present the calibration approach, but first we look at a more traditional approach. Most estimators in the traditional approach are special cases of the calibration approach.
67. Traditional approach: the principal idea is to derive estimates $\hat{\theta}_k$ of the unknown response probabilities $\theta_k$, and then use these estimates in constructing the estimator of the total Y. An often used form of this approach: starting from $\hat{Y} = \sum_r d_k \frac{1}{\theta_k} y_k$, replace $1/\theta_k$ by $1/\hat{\theta}_k$. We get $\hat{Y} = \sum_r d_k \frac{1}{\hat{\theta}_k} y_k$, with $d_k$ the sampling weight and $1/\hat{\theta}_k$ the NR adjustment weight.
68. A large literature exists about this type of estimator, $\hat{Y} = \sum_r d_k \frac{1}{\hat{\theta}_k} y_k$. Estimation of $\theta_k$ is done with the aid of a response model: • response homogeneity groups (RHGs) • logistic. The term response propensity is sometimes used. The idea behind response homogeneity groups: the elements in the sample (and in the response set) can be divided into groups. Everyone in the same group responds with the same probability, but these probabilities can vary considerably between the groups.
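69. Example: STSI sampling, with RHGs coinciding with strata (each stratum assumed to be an RHG), where $m_h$ of the $n_h$ sampled elements in stratum h respond: $d_k \frac{1}{\hat{\theta}_k} = \frac{N_h}{n_h} \cdot \frac{n_h}{m_h} = \frac{N_h}{m_h}$, so $\hat{Y} = \sum_{h=1}^{H} \frac{N_h}{n_h}\frac{n_h}{m_h}\sum_{r_h} y_k = \sum_{h=1}^{H} N_h \bar{y}_{r_h}$. The procedure is convenient but oversimplifies the problem. It is a special case of the calibration approach. A variation of the traditional approach: start with the two-phase GREG estimator $\hat{Y} = \sum_r d_k \frac{1}{\theta_k} g_{\theta k} y_k$. After estimation of the response probabilities, we get $\hat{Y} = \sum_r d_k \frac{1}{\hat{\theta}_k} g_{\hat{\theta} k} y_k$.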
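
The stratum-as-RHG estimator can be sketched as follows, with $\hat{\theta}_k = m_h/n_h$ estimated by the stratum response rate. The function name and the data are hypothetical; the code keeps the sampling weight $N_h/n_h$ and the NR adjustment $n_h/m_h$ as separate factors before multiplying, to mirror the formula.

```python
def rhg_stratum_estimate(strata):
    """strata: list of (N_h, n_h, y-values of the m_h respondents in stratum h)."""
    total = 0.0
    for N_h, n_h, y_rh in strata:
        m_h = len(y_rh)
        d_k = N_h / n_h  # design weight under STSI
        nr_adjust = n_h / m_h  # 1 / theta_hat_k, with theta_hat_k = m_h / n_h
        total += d_k * nr_adjust * sum(y_rh)
    return total


# Hypothetical data: 4 of 10 respond in stratum 1, 2 of 5 in stratum 2.
strata = [(1000, 10, [2, 4, 6, 8]), (500, 5, [1, 3])]
print(rhg_stratum_estimate(strata))  # equals sum_h N_h * (respondent mean in h)
```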
70. A general method for estimation in the presence of NR should • be easy to understand • cover many survey situations • offer a systematic way to incorporate auxiliary information • be computationally easy • be suitable for statistics production (in NSIs). One can maintain that the calibration approach satisfies these requirements. There has been an extensive literature since 1990.
71. Steps in the calibration approach: • State the information you wish to use. • Formulate the corresponding auxiliary vector. • State the calibration equation. • Specify the starting weights (usually the sampling weights). • Compute new weights (the calibrated weights) that respect the calibration equation. • Use the weights to compute calibration estimates. Pedagogical note: calibration estimation is a highly general approach. It covers many situations arising in practice. Generality comes at the price of a certain level of abstraction. The formulation uses linear algebra; knowledge of regression theory is helpful.
72. Why can we not use the design weights $d_k = 1/\pi_k$ without any further adjustment? Answer: they are not large enough when there is NR. Using $\hat{Y} = \sum_r d_k y_k$ leads to underestimation (of a total of positive values). We must expand the design weights. Information may exist • at the population level • at the sample level.
73. [Figure: structure, showing target population U, sample s and response set r.] Levels of information. Distinguish: • Information at the population level. Such information, taken from population registers, is particularly prevalent and important in Scandinavia, the Netherlands, and increasingly elsewhere in Europe. • Information at the sample level. Such information may be present in any sample survey.
74. Levels of information. Notation: two types of auxiliary vector: $\mathbf{x}^*_k$ transmits information at the population level; $\mathbf{x}^\circ_k$ transmits information at the sample level. Auxiliary vector, population level. Two common situations: • $\mathbf{x}^*_k$ is a known value for every k in U (given in the frame, or coming from an administrative register). • The total $\mathbf{X}^* = \sum_U \mathbf{x}^*_k$ is imported from an accurate outside source; $\mathbf{x}^*_k$ need not be known for every k.
75. Sources of variables for the star vector $\mathbf{x}^*_k$: • the existing frame • matching with other registers. Examples of variables for the star vector: for persons, age, sex, address and income. For related persons: for example, in a survey of school children, get (by matching) variables for the parents. Auxiliary vector, sample level: $\mathbf{x}^\circ_k$ is a known value for every k in s (observed for the sample units). Hence we can compute and use $\hat{\mathbf{X}}^\circ = \sum_s d_k \mathbf{x}^\circ_k$. It is unbiased information, not damaged by NR.
76. Examples of variables for the moon vector $\mathbf{x}^\circ_k$: • identity of the interviewer • ease of establishing contact with the selected sample element • other survey process characteristics • basic question method ('easily observed features' of sampled elements) • register information transmitted only to the sample data file, for convenience. The information statement • specifies the information at hand: totals or estimated totals • may refer to either level, population or sample • is not a model statement.
77. Information is something we know; it provides input for the calibration approach. (By contrast, a model is something you do not know, but venture to assume.) Statement of auxiliary information (sampling, then nonresponse):

| Set of units | Information |
|---|---|
| Population U | $\sum_U \mathbf{x}^*_k$ known |
| Sample s | $\mathbf{x}^\circ_k$ known, $k \in s$ |
| Response set r | $\mathbf{x}^*_k$ and $\mathbf{x}^\circ_k$ known, $k \in r$ |

78. • The auxiliary vector, general notation: $\mathbf{x}_k$ • The information available about that vector, general notation: $\mathbf{X}$. Three special cases: • population information only • sample information only • both types of information.
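79. • Population information only: $\mathbf{x}_k = \mathbf{x}_k^{*}$; $\mathbf{X} = \sum_U \mathbf{x}_k^{*}$ (known total) • Sample information only: $\mathbf{x}_k = \mathbf{x}_k^{\circ}$; $\mathbf{X} = \sum_s d_k \mathbf{x}_k^{\circ}$ (unbiasedly estimated total) • Both types of information: $\mathbf{x}_k = \begin{pmatrix}\mathbf{x}_k^{*}\\ \mathbf{x}_k^{\circ}\end{pmatrix}$; $\mathbf{X} = \begin{pmatrix}\sum_U \mathbf{x}_k^{*}\\ \sum_s d_k \mathbf{x}_k^{\circ}\end{pmatrix}$. Example: $\mathbf{x}_k = (0,\dots,1,\dots,0 \;\; 0,\dots,1,\dots,0)'$, where the first part identifies the age/sex group (for $k \in U$) and the second part identifies the interviewer (for $k \in s$).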
80. For the study variable y we know (we have observed) $y_k$ for $k \in r$, where $r \subset s \subset U$. Missing values: $y_k$ for $k \in s - r$. The calibration estimator is of the form $\hat{Y}_W = \sum_r w_k y_k$ with $w_k = d_k v_k$, where $d_k = 1/\pi_k$, and the factor $v_k$ serves to • expand the design weight $d_k$ for unit $k$ • incorporate the auxiliary information • reduce as far as possible the bias due to NR • reduce the variance.
81. Note: we want $v_k > 1$ for all (or nearly all) $k \in r$, in order to compensate for the elements lost by NR. Primary interest: examine the (remaining) bias in $\hat{Y}_W = \sum_r w_k y_k$ and attempt to reduce it further. Recipe: seek better and better auxiliary vectors for the calibration! (Sessions 2_3, 2_4, 2_5.) Secondary interest (but also important): examine the variance of $\hat{Y}_W$ and find methods to estimate it.
82. Mathematically, the adjustment factor $v_k$ can be determined by different criteria, for example • $v_k = 1 + \boldsymbol{\lambda}'\mathbf{x}_k$, linear in the auxiliary vector • $v_k = \exp(\boldsymbol{\lambda}'\mathbf{x}_k)$, exponential. Determine first $\boldsymbol{\lambda}$ (explicitly or by numerical methods). Linear adjustment factor: $v_k$ is determined to satisfy (i) $v_k = 1 + \boldsymbol{\lambda}'\mathbf{x}_k$ (linearity) and (ii) $\sum_r d_k v_k \mathbf{x}_k = \mathbf{X}$ (calibration to the given information $\mathbf{X}$). Now determine $\boldsymbol{\lambda}$.
83. From (i) and (ii) it follows that $\boldsymbol{\lambda}' = \boldsymbol{\lambda}_r' = (\mathbf{X} - \sum_r d_k \mathbf{x}_k)' (\sum_r d_k \mathbf{x}_k \mathbf{x}_k')^{-1}$, assuming the matrix is non-singular. The desired calibrated weights are then $w_k = d_k v_k = d_k(1 + \boldsymbol{\lambda}_r'\mathbf{x}_k)$. Computational note: possibility of negative weights. The weight $d_k v_k = d_k(1 + \boldsymbol{\lambda}_r'\mathbf{x}_k)$, with $\boldsymbol{\lambda}_r'$ as above, can be negative. It does happen, but rarely.
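A minimal sketch of the linear calibration weights of slide 83, written in pure Python so that it runs without extra packages. The helper names `solve` and `linear_calibration_weights` are hypothetical, and the data (five respondents with $d_k = 10$, $\mathbf{x}_k = (1, x_k)'$, $N = 100$ and $\sum_U x_k = 520$) are invented for illustration; this is not the course's software.

```python
# Sketch: linear calibration weights w_k = d_k (1 + lambda_r' x_k), with
# lambda_r = T^{-1} (X - sum_r d_k x_k) and T = sum_r d_k x_k x_k'.
# Hypothetical helpers; pure Python Gaussian elimination instead of a library.


def solve(a, b):
    """Solve the square system a @ sol = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("sum_r d_k x_k x_k' is singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (m[i][n] - s) / m[i][i]
    return sol


def linear_calibration_weights(d, x, X):
    """d: respondent design weights; x: their auxiliary vectors; X: information input."""
    p = len(X)
    # T = sum_r d_k x_k x_k' (a symmetric p x p matrix)
    t = [[sum(dk * xk[i] * xk[j] for dk, xk in zip(d, x)) for j in range(p)]
         for i in range(p)]
    # gap = X - sum_r d_k x_k
    gap = [X[i] - sum(dk * xk[i] for dk, xk in zip(d, x)) for i in range(p)]
    lam = solve(t, gap)  # T is symmetric, so T^{-1} gap is the transpose of lambda_r'
    return [dk * (1.0 + sum(li * xi for li, xi in zip(lam, xk)))
            for dk, xk in zip(d, x)]


# Hypothetical data: 5 respondents, d_k = 10, x_k = (1, x_k)'; N = 100, sum_U x_k = 520.
xs = [3.0, 5.0, 4.0, 6.0, 7.0]
w = linear_calibration_weights([10.0] * 5, [(1.0, v) for v in xs], [100.0, 520.0])
print([round(wk, 6) for wk in w])  # -> [16.0, 20.0, 18.0, 22.0, 24.0]
```

The weights reproduce both $N = 100$ and $\sum_U x_k = 520$, and all of them exceed $d_k = 10$, in line with the expansion property on the next slide.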
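84. Computational note: the vector $\boldsymbol{\lambda}_r' = (\mathbf{X} - \sum_r d_k \mathbf{x}_k)' (\sum_r d_k \mathbf{x}_k \mathbf{x}_k')^{-1}$ is not near zero, unlike the corresponding vector for the GREG estimator in the absence of NR: with nonresponse, $\sum_r d_k \mathbf{x}_k$ covers only the respondents and falls well short of $\mathbf{X}$. Properties of the calibrated weights $w_k = d_k(1 + \boldsymbol{\lambda}_r'\mathbf{x}_k)$: 1. They expand: $w_k > d_k$ for all $k$, or almost all. 2. $\sum_r w_k = N$ (the population size) under a simple condition.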
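A short derivation of property 2 under one condition that suffices, assumed here: there is a constant vector $\boldsymbol{\mu}$ with $\boldsymbol{\mu}'\mathbf{x}_k = 1$ for all $k$ (the form introduced on slide 86), and the information is at the population level, $\mathbf{X} = \sum_U \mathbf{x}_k$. Only the calibration equation is used.

```latex
\sum_r w_k = \sum_r w_k\,\boldsymbol{\mu}'\mathbf{x}_k
           = \boldsymbol{\mu}'\sum_r w_k\mathbf{x}_k
           = \boldsymbol{\mu}'\mathbf{X}
           = \sum_U \boldsymbol{\mu}'\mathbf{x}_k
           = N .
```

With sample-level information only, $\mathbf{X} = \sum_s d_k \mathbf{x}_k$, the same steps give $\sum_r w_k = \sum_s d_k = \hat{N}$ instead.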
85. Note: if both types of information are present, then $\mathbf{x}_k = \begin{pmatrix}\mathbf{x}_k^{*}\\ \mathbf{x}_k^{\circ}\end{pmatrix}$ and the information input is $\mathbf{X} = \begin{pmatrix}\sum_U \mathbf{x}_k^{*}\\ \sum_s d_k \mathbf{x}_k^{\circ}\end{pmatrix}$. When both types of information are present, it is also possible to calibrate in two steps: first on the sample information, which gives intermediate weights; then, in step two, the intermediate weights are calibrated, using also the population information, to obtain the final weights $w_k$.
86. Consistency is also an important motivation for calibration (in addition to bias reduction and variance reduction). If $\mathbf{x}_k$ is known for $k \in s$, the statistical agency can sum over $s$ and publish the unbiased estimate $\hat{\mathbf{X}} = \sum_s d_k \mathbf{x}_k$. Users often require that this estimate coincide with the estimate obtained by summing over $r$ using the calibrated weights, $\hat{\mathbf{X}}_W = \sum_r w_k \mathbf{x}_k$. Calibration makes this consistency possible. Almost all of our auxiliary vectors are of the following form: there exists a constant vector $\boldsymbol{\mu}$ such that $\boldsymbol{\mu}'\mathbf{x}_k = 1$ for all $k$. For example, if $\mathbf{x}_k = (1, x_k)'$, then $\boldsymbol{\mu} = (1, 0)'$.
87. When $\mathbf{x}_k$ is such that $\boldsymbol{\mu}'\mathbf{x}_k = 1$ for all $k$, the weights simplify: $w_k = d_k v_k = d_k \{\mathbf{X}'(\sum_r d_k \mathbf{x}_k \mathbf{x}_k')^{-1}\mathbf{x}_k\}$, where $\mathbf{X} = \begin{pmatrix}\sum_U \mathbf{x}_k^{*}\\ \sum_s d_k \mathbf{x}_k^{\circ}\end{pmatrix}$ is the information input. A summary of this session. We have • discussed two types of auxiliary information • introduced the idea of a weighting (of responding elements) that is calibrated to the given information • hinted that calibrated weighting gives consistency, and that it often leads to both reduced NR bias and reduced variance. More about this later.
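A small numerical check of the simplification on slide 87, assuming $\mathbf{x}_k = (1, x_k)'$ so that $\boldsymbol{\mu} = (1, 0)'$. The 2x2 inverse is written out explicitly to keep the sketch short; the function names and data are hypothetical.

```python
# Sketch: for x_k = (1, x_k)' check that d_k X' T^{-1} x_k equals
# d_k (1 + lambda_r' x_k), with T = sum_r d_k x_k x_k' (hypothetical data).


def inverse_2x2_times(t11, t12, t22, b0, b1):
    """Return T^{-1} b for the symmetric matrix T = [[t11, t12], [t12, t22]]."""
    det = t11 * t22 - t12 * t12
    return ((t22 * b0 - t12 * b1) / det, (t11 * b1 - t12 * b0) / det)


def moments(d, xs):
    """Entries of T = sum_r d_k (1, x_k)'(1, x_k)."""
    t11 = sum(d)
    t12 = sum(dk * xk for dk, xk in zip(d, xs))
    t22 = sum(dk * xk * xk for dk, xk in zip(d, xs))
    return t11, t12, t22


def simplified_weights(d, xs, N, tx):
    t11, t12, t22 = moments(d, xs)
    a0, a1 = inverse_2x2_times(t11, t12, t22, N, tx)  # T^{-1} X
    return [dk * (a0 + a1 * xk) for dk, xk in zip(d, xs)]


def lambda_weights(d, xs, N, tx):
    t11, t12, t22 = moments(d, xs)
    # T^{-1} (X - sum_r d_k x_k), with sum_r d_k x_k = (t11, t12)'
    l0, l1 = inverse_2x2_times(t11, t12, t22, N - t11, tx - t12)
    return [dk * (1.0 + l0 + l1 * xk) for dk, xk in zip(d, xs)]


d = [10.0] * 5
xs = [3.0, 5.0, 4.0, 6.0, 7.0]
w_simple = simplified_weights(d, xs, 100.0, 520.0)
w_lambda = lambda_weights(d, xs, 100.0, 520.0)
print(max(abs(a - b) for a, b in zip(w_simple, w_lambda)))  # rounding error only
```

Both routes give the same weights; the simplified form simply avoids forming the difference $\mathbf{X} - \sum_r d_k \mathbf{x}_k$.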
88. 1_7 Comments on the calibration approach. The calibration approach, some features: • generality (any sampling design and auxiliary vector) • "conventional techniques" are special cases • computational feasibility (software exists).
89. The calibration approach brings generality. Earlier: specific estimators were used for surveys with NR. They had names, such as the Ratio estimator, the Weighting Class estimator and so on. Now: most of these 'conventional techniques' are simple special cases of the calibration approach. Specific names are no longer needed; all are calibration estimators. Another feature of the calibration estimator: perfect estimates under a certain condition. Consider the case where $\mathbf{x}_k = \mathbf{x}_k^{*}$ and $\mathbf{X} = \mathbf{X}^{*} = \sum_U \mathbf{x}_k^{*}$. Assume that $y_k = (\mathbf{x}_k^{*})'\boldsymbol{\beta}$ holds for every $k \in U$ (perfect linear regression); then $\hat{Y}_W = \sum_U y_k = Y$. No sampling error, no NR bias!
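Why the estimate is perfect: a one-line derivation that uses only the calibration equation $\sum_r w_k \mathbf{x}_k^{*} = \sum_U \mathbf{x}_k^{*}$ and the assumed exact relation $y_k = (\mathbf{x}_k^{*})'\boldsymbol{\beta}$.

```latex
\hat{Y}_W = \sum_r w_k y_k
          = \Big(\sum_r w_k \mathbf{x}_k^{*}\Big)'\boldsymbol{\beta}
          = \Big(\sum_U \mathbf{x}_k^{*}\Big)'\boldsymbol{\beta}
          = \sum_U y_k = Y .
```

The argument holds for every sample and every response set $r$ for which the weights can be computed, which is why neither sampling error nor NR bias remains.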
90. Recall: we have specified the weights as $w_k = d_k v_k$, $v_k = 1 + \boldsymbol{\lambda}_r'\mathbf{x}_k$, where $\boldsymbol{\lambda}_r' = (\mathbf{X} - \sum_r d_k \mathbf{x}_k)' (\sum_r d_k \mathbf{x}_k \mathbf{x}_k')^{-1}$. They satisfy the calibration equation $\sum_r w_k \mathbf{x}_k = \mathbf{X}$. But they are not unique: they are not the only weights that satisfy the calibration equation. In fact, for a given $\mathbf{x}_k$-vector with given information input $\mathbf{X}$, there exist many sets of weights that satisfy $\sum_r w_k \mathbf{x}_k = \mathbf{X}$. In other words, "calibrated weights" is not a unique concept. Let us examine this.
91. The calibration procedure takes certain initial weights and transforms them into (final) calibrated weights. The initial weights can be specified in more than one way. Consider the weights $w_k = d_{\alpha k} v_k$, where $v_k = 1 + \boldsymbol{\lambda}_r'\mathbf{z}_k$ and $\boldsymbol{\lambda}_r' = (\mathbf{X} - \sum_r d_{\alpha k}\mathbf{x}_k)' (\sum_r d_{\alpha k} \mathbf{z}_k \mathbf{x}_k')^{-1}$; here $d_{\alpha k}$ is an initial weight and $\mathbf{z}_k$ is an instrument vector (which may differ from $\mathbf{x}_k$). These $w_k$ satisfy the calibration equation $\sum_r w_k \mathbf{x}_k = \mathbf{X}$ for any choice of $d_{\alpha k}$ and $\mathbf{z}_k$ (as long as the matrix can be inverted).
92. The "natural choices" $d_{\alpha k} = d_k = 1/\pi_k$ and $\mathbf{z}_k = \mathbf{x}_k$ are used most of the time and will be called the standard specifications. An important type of z-vector: there exists a constant vector $\boldsymbol{\mu}$, not dependent on $k$, such that $\boldsymbol{\mu}'\mathbf{z}_k = 1$ for all $k \in U$. When $\mathbf{z}_k = \mathbf{x}_k$, this condition reads $\boldsymbol{\mu}'\mathbf{x}_k = 1$ for all $k \in U$. Almost all of our x-vectors are of this type.
93. Different initial weights may produce the same calibrated weights. When the z-vector satisfies $\boldsymbol{\mu}'\mathbf{z}_k = 1$ for all $k$, the initial weights $d_{\alpha k} = d_k$ and $d_{\alpha k} = C d_k$ ($C$ a constant) give the same calibrated weights. Example: • SI sampling, n from N • $\mathbf{z}_k = \mathbf{x}_k = \mathbf{x}_k^{*} = 1$. Then the initial weights $d_{\alpha k} = d_k = N/n$ and $d_{\alpha k} = d_k \times (n/m) = N/m$ give the same calibrated weights, namely $w_k = N/m$.
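The SI example can be checked directly. With $\mathbf{x}_k = \mathbf{z}_k = 1$, calibration to $N$ just rescales the initial weights, $w_k = d_{\alpha k} N / \sum_r d_{\alpha k}$, so any constant multiple of $d_k$ leads to the same result. The design numbers below are hypothetical.

```python
# Sketch: with x_k = z_k = 1, w_k = d_alpha_k * N / sum_r d_alpha_k, so the
# initial weights N/n and N/m give the same calibrated weights (hypothetical design).


def calibrate_to_population_size(initial, N):
    """Rescale respondent initial weights so that they sum to N."""
    total = sum(initial)
    return [a * N / total for a in initial]


N, n, m = 1000, 200, 150                                      # m respondents out of n sampled
w_design = calibrate_to_population_size([N / n] * m, N)      # d_alpha_k = N/n
w_adjusted = calibrate_to_population_size([N / m] * m, N)    # d_alpha_k = N/m
print(w_design[0], w_adjusted[0], N / m)  # all three equal N/m up to rounding
```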
94. Invariant calibrated weights are also obtained in the following situation: • STSI with strata $U_p$; $n_p$ from $N_p$; $p = 1, \dots, P$ ($m_p$ respondents in stratum $p$) • $\mathbf{z}_k = \mathbf{x}_k = \mathbf{x}_k^{*}$ = stratum identifier. Then the initial weights $d_{\alpha k} = d_k = N_p/n_p$ and $d_{\alpha k} = d_k \times (n_p/m_p) = N_p/m_p$ give the same calibrated weights, namely $w_k = N_p/m_p$. Usually the components of $\mathbf{z}_k$ are functions of the x-variables. For example, if $\mathbf{x}_k = (x_{1k}, x_{2k})'$, we get calibrated weights by taking $\mathbf{z}_k = (x_{1k}, x_{2k})'$.
95. The well-known Ratio (RA) estimator is obtained by the specifications $\mathbf{x}_k = \mathbf{x}_k^{*} = x_k$ and $\mathbf{z}_k = 1$. Note: non-standard specifications! They give $\hat{Y}_W = \sum_U x_k \times \frac{\sum_r d_k y_k}{\sum_r d_k x_k}$. A perspective on the weights: we can write the calibrated weight as the sum of two components, $w_k = w_{Mk} + w_{Rk}$ = "main term" + "remainder", with $w_{Mk} = d_{\alpha k}\{\mathbf{X}'(\sum_r d_{\alpha k}\mathbf{z}_k\mathbf{x}_k')^{-1}\mathbf{z}_k\}$ and $w_{Rk} = d_{\alpha k}\{1 - (\sum_r d_{\alpha k}\mathbf{x}_k)'(\sum_r d_{\alpha k}\mathbf{z}_k\mathbf{x}_k')^{-1}\mathbf{z}_k\}$.
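A scalar sketch of the instrument-vector weights of slide 91 with the RA specifications $x_k$ and $z_k = 1$. Here $\lambda = (X - \sum_r d_k x_k)/\sum_r d_k z_k x_k$, and the resulting estimate is compared with the ratio formula above. The function name and all numbers are hypothetical.

```python
# Sketch: scalar instrument calibration, w_k = d_k (1 + lam * z_k) with
# lam = (X - sum_r d_k x_k) / sum_r d_k z_k x_k; z_k = 1 gives the Ratio estimator.


def instrument_weights(d, x, z, X):
    lam = (X - sum(dk * xk for dk, xk in zip(d, x))) / sum(
        dk * zk * xk for dk, zk, xk in zip(d, z, x))
    return [dk * (1.0 + lam * zk) for dk, zk in zip(d, z)]


d = [5.0] * 4                 # hypothetical respondent design weights
x = [2.0, 4.0, 6.0, 8.0]
y = [3.0, 5.0, 9.0, 11.0]
total_x_U = 130.0             # assumed known population total of x

w = instrument_weights(d, x, [1.0] * 4, total_x_U)
y_w = sum(wk * yk for wk, yk in zip(w, y))
y_ra = total_x_U * sum(dk * yk for dk, yk in zip(d, y)) / sum(dk * xk for dk, xk in zip(d, x))
print(w, y_w, y_ra)  # each w_k is 6.5 and both estimates are 182 (up to rounding)
```

Because $\mu = 1$ satisfies $\mu z_k = 1$, the remainder term $w_{Rk}$ is zero here and every weight equals the main term $d_k \sum_U x_k / \sum_r d_k x_k$.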
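96. $w_{Rk}$ is often small compared to the main term. In particular, $w_{Rk} = 0$ for all $k$ when $\mathbf{z}_k$ has the following property: we can specify a constant vector $\boldsymbol{\mu}$, not dependent on $k$, such that $\boldsymbol{\mu}'\mathbf{z}_k = 1$ for all $k$. Then $w_{Rk} = 0$ and $w_k = w_{Mk}$. (An example: $\mathbf{z}_k = \mathbf{x}_k = (1, x_k)'$ and $\boldsymbol{\mu} = (1, 0)'$.) When $w_{Rk} = 0$, the calibrated weights have the simplified form $w_k = w_{Mk} = d_{\alpha k}\{\mathbf{X}'(\sum_r d_{\alpha k}\mathbf{z}_k\mathbf{x}_k')^{-1}\mathbf{z}_k\}$. Under the standard specifications: $w_k = w_{Mk} = d_k\{\mathbf{X}'(\sum_r d_k\mathbf{x}_k\mathbf{x}_k')^{-1}\mathbf{x}_k\}$.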
97. Agreement with the GREG estimator. If $r = s$ (complete response), and if $\mathbf{x}_k = \mathbf{x}_k^{*}$ and $\mathbf{z}_k = c\,\mathbf{x}_k^{*}$ for any positive constant $c$, then the calibration estimator and the GREG estimator can be shown to be identical.
98. 1_8 Traditional estimators as special cases of the calibration approach. The family of calibration estimators includes many 'traditional estimator formulas'. Let us look at some examples. The standard specification $d_{\alpha k} = d_k$ and $\mathbf{z}_k = \mathbf{x}_k$ is used (unless otherwise stated).
99. An advantage of the calibration approach: we no longer need to think in terms of 'traditional estimators' with specific names. All of the following examples are special cases of the calibration approach, corresponding to simple formulations of the auxiliary vector $\mathbf{x}_k$. The simplest auxiliary vector: $\mathbf{x}_k = \mathbf{x}_k^{*} = 1$ for all $k$. The corresponding information is weak: $\sum_U \mathbf{x}_k = \sum_U 1 = N$. Calibrated weights (by the general formula): $w_k = d_k \times \frac{N}{\sum_r d_k}$, so that $\hat{Y}_W = N\bar{y}_{r;d} = N\frac{\sum_r d_k y_k}{\sum_r d_k} = \hat{Y}_{EXP}$, known as the Expansion estimator.
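100. The simplest auxiliary vector $\mathbf{x}_k = \mathbf{x}_k^{*} = 1$. In particular, for SI (n sampled from N; m respondents): $w_k = \frac{N}{n} \times \frac{n}{m} = \frac{N}{m}$, that is, the sampling weight $N/n$ times the NR adjustment $n/m$. The simplest auxiliary vector $\mathbf{x}_k = \mathbf{x}_k^{*} = 1$ for all $k$ ⇒ $\hat{Y}_W = \hat{Y}_{EXP} = N\bar{y}_{r;d}$. • Weakness: the auxiliary vector $\mathbf{x}_k = 1$ recognizes no differences among elements • the bias is usually large.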
101. One can show, for any sampling design, that $\mathrm{bias}(\hat{Y}_{EXP})/N \approx \bar{y}_{U;\theta} - \bar{y}_U$. Note the difference between two means: the theta-weighted mean $\bar{y}_{U;\theta} = \sum_U \theta_k y_k / \sum_U \theta_k$ and the unweighted mean $\bar{y}_U = \sum_U y_k / N$. When y and θ are highly correlated, that difference can be very large (more about this later). Comment on the Expansion estimator: despite an often large nonresponse bias, the expansion estimator is (surprisingly enough) often used by practitioners and researchers in social science. This practice, which has developed in some disciplines, cannot be recommended.
102. The classification vector ("gamma vector"). Elements are classified into P dummy-coded groups: $\boldsymbol{\gamma}_k = (\gamma_{1k}, \dots, \gamma_{pk}, \dots, \gamma_{Pk})' = (0, \dots, 1, \dots, 0)'$. The only entry '1' identifies the group (out of P possible ones) to which element k belongs. The classification vector, typical examples: • age groups • age groups by sex (complete crossing) • complete crossing of more than two groupings • groups formed by intervals of a continuous x-variable.
103. The classification vector as a star vector: $\mathbf{x}_k = \mathbf{x}_k^{*} = \boldsymbol{\gamma}_k = (0, \dots, 1, \dots, 0)'$. The associated information is the vector of population class frequencies, $\sum_U \mathbf{x}_k^{*} = (N_1, \dots, N_p, \dots, N_P)'$. Calibrated weights (by the general formula): $w_k = d_k \times \frac{N_p}{\sum_{r_p} d_k}$ for all k in group p. With the classification vector as a star vector, $\mathbf{x}_k = \mathbf{x}_k^{*} = \boldsymbol{\gamma}_k$, the calibration estimator takes the form $\hat{Y}_W = \sum_{p=1}^{P} N_p \bar{y}_{r_p;d} = \hat{Y}_{PWA}$, known as the Population Weighting Adjustment estimator.
104. Population Weighting Adjustment estimator, a closer look: $\hat{Y}_{PWA} = \sum_{p=1}^{P} N_p \bar{y}_{r_p;d}$, with $\bar{y}_{r_p;d} = \sum_{r_p} d_k y_k / \sum_{r_p} d_k$ = weighted group y-mean for respondents, and $N_p$ = known group count in the population. The classification vector as a moon vector: $\mathbf{x}_k = \mathbf{x}_k^{\circ} = \boldsymbol{\gamma}_k = (0, \dots, 1, \dots, 0)'$. Information for calibration: the unbiasedly estimated class counts $\hat{N}_p = \sum_{s_p} d_k$, $p = 1, 2, \dots, P$. The general formula gives the weights $w_k = d_k \times \frac{\sum_{s_p} d_k}{\sum_{r_p} d_k}$ for all k in group p.
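105. The classification vector as a moon vector, $\mathbf{x}_k = \mathbf{x}_k^{\circ} = \boldsymbol{\gamma}_k$. In particular, for SI sampling: $w_k = \frac{N}{n} \times \frac{n_p}{m_p}$ for all k in group p, that is, the sampling weight times an NR adjustment that weights by the inverse of the group response rate. With the classification vector as a moon vector, $\mathbf{x}_k = \mathbf{x}_k^{\circ} = \boldsymbol{\gamma}_k = (0, \dots, 1, \dots, 0)'$, we get $\hat{Y}_W = \sum_{p=1}^{P} \hat{N}_p \bar{y}_{r_p;d} = \hat{Y}_{WC}$, known as the Weighting Class estimator.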
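A sketch contrasting the two gamma-vector weightings: PWA (star vector, known $N_p$) and WC (moon vector, $\hat{N}_p = \sum_{s_p} d_k$). The function names, the two-group SI sample and the population counts $N_1 = 55$, $N_2 = 45$ are all hypothetical.

```python
# Sketch: PWA weights d_k N_p / sum_{r_p} d_k and WC weights
# d_k sum_{s_p} d_k / sum_{r_p} d_k for the respondents (hypothetical data).
from collections import defaultdict


def group_sums(d, group, mask):
    """Sum of d_k by group over the units flagged in mask."""
    sums = defaultdict(float)
    for dk, g, keep in zip(d, group, mask):
        if keep:
            sums[g] += dk
    return sums


def pwa_weights(d, group, responded, N_p):
    r_sum = group_sums(d, group, responded)
    return [dk * N_p[g] / r_sum[g] for dk, g, r in zip(d, group, responded) if r]


def wc_weights(d, group, responded):
    s_sum = group_sums(d, group, [True] * len(d))
    r_sum = group_sums(d, group, responded)
    return [dk * s_sum[g] / r_sum[g] for dk, g, r in zip(d, group, responded) if r]


# Hypothetical SI sample: n = 10 from N = 100 (d_k = 10), two groups.
d = [10.0] * 10
group = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2]
responded = [True, True, True, False, False, False, True, True, True, True]
print(pwa_weights(d, group, responded, {1: 55, 2: 45}))  # 550/30 in group 1, 11.25 in group 2
print(wc_weights(d, group, responded))  # (N/n)(n_p/m_p): 20.0 in group 1, 10.0 in group 2
```

The WC weights match the SI formula $(N/n)(n_p/m_p)$; the PWA weights instead reproduce the known counts $N_p$.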
106. Weighting Class estimator: $\hat{Y}_{WC} = \sum_{p=1}^{P} \hat{N}_p \bar{y}_{r_p;d}$. The class sizes are not known but estimated: $\hat{N}_p = \sum_{s_p} d_k$; $\bar{y}_{r_p;d} = \sum_{r_p} d_k y_k / \sum_{r_p} d_k$ = weighted group y-mean for respondents. A continuous x-variable, for example $x_k$ = income, $y_k$ = expenditure. Two vector formulations are of interest: • $\mathbf{x}_k = \mathbf{x}_k^{*} = x_k$, information: $\sum_U \mathbf{x}_k = \sum_U x_k$ • $\mathbf{x}_k = \mathbf{x}_k^{*} = (1, x_k)'$, information: $\sum_U \mathbf{x}_k = (N, \sum_U x_k)'$.
107. The Ratio estimator is obtained by formulating $\mathbf{x}_k = \mathbf{x}_k^{*} = x_k$ and $\mathbf{z}_k = 1$ (non-standard!). Weights: $w_k = d_k \times \frac{\sum_U x_k}{\sum_r d_k x_k}$; calibration estimator: $\hat{Y}_W = (\sum_U x_k)\frac{\sum_r d_k y_k}{\sum_r d_k x_k} = \hat{Y}_{RA}$. Not very efficient for controlling bias. A better use of the x-variable: create size groups or "include an intercept". The (simple) Regression estimator. A better use of the x-variable: $\mathbf{x}_k = \mathbf{x}_k^{*} = (1, x_k)' = \mathbf{z}_k$. The calibrated weights are given by $d_k v_k = d_k \times N\left(\frac{1}{\sum_r d_k} + \frac{(\bar{x}_U - \bar{x}_{r;d})(x_k - \bar{x}_{r;d})}{\sum_r d_k (x_k - \bar{x}_{r;d})^2}\right)$. The calibration estimator takes the form $\hat{Y}_W = N\{\bar{y}_{r;d} + (\bar{x}_U - \bar{x}_{r;d})B_{r;d}\} = \hat{Y}_{REG}$, where $B_{r;d}$ is a regression coefficient.
108. The (simple) Regression estimator, a closer look: $\hat{Y}_{REG} = N\{\bar{y}_{r;d} + (\bar{x}_U - \bar{x}_{r;d})B_{r;d}\}$ with $\bar{x}_{r;d} = \sum_r d_k x_k / \sum_r d_k$, $\bar{y}_{r;d}$ the analogous y-mean, and $B_{r;d} = \frac{\sum_r d_k (x_k - \bar{x}_{r;d})(y_k - \bar{y}_{r;d})}{\sum_r d_k (x_k - \bar{x}_{r;d})^2}$, the regression of y on x. Combining a classification and a continuous x-variable: information about both (i) the classification vector $\boldsymbol{\gamma}_k = (\gamma_{1k}, \dots, \gamma_{pk}, \dots, \gamma_{Pk})' = (0, \dots, 1, \dots, 0)'$ and (ii) a continuous variable with value $x_k$.
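A sketch that computes $\hat{Y}_{REG}$ from the closed form on slide 108 and, separately, as $\sum_r w_k y_k$ with the calibrated weights of slide 107; the two numbers should agree. The data ($d_k = 10$, $N = 100$, $\bar{x}_U = 5.2$) and the function names are hypothetical.

```python
# Sketch: simple regression estimator N{ybar_r + (xbar_U - xbar_r) B_r}
# versus sum_r w_k y_k with the calibrated REG weights (hypothetical data).


def wmean(d, v):
    """Design-weighted mean sum d_k v_k / sum d_k."""
    return sum(dk * vk for dk, vk in zip(d, v)) / sum(d)


def reg_estimate(d, x, y, N, xbar_U):
    xr, yr = wmean(d, x), wmean(d, y)
    sxx = sum(dk * (xk - xr) ** 2 for dk, xk in zip(d, x))
    sxy = sum(dk * (xk - xr) * (yk - yr) for dk, xk, yk in zip(d, x, y))
    return N * (yr + (xbar_U - xr) * sxy / sxx)  # B_{r;d} = sxy / sxx


def reg_weights(d, x, N, xbar_U):
    xr = wmean(d, x)
    sxx = sum(dk * (xk - xr) ** 2 for dk, xk in zip(d, x))
    return [dk * N * (1.0 / sum(d) + (xbar_U - xr) * (xk - xr) / sxx)
            for dk, xk in zip(d, x)]


d = [10.0] * 5                   # hypothetical respondents
x = [3.0, 5.0, 4.0, 6.0, 7.0]
y = [2.0, 4.0, 3.0, 6.0, 5.0]
N, xbar_U = 100.0, 5.2           # assumed known population information
w = reg_weights(d, x, N, xbar_U)
print(reg_estimate(d, x, y, N, xbar_U), sum(wk * yk for wk, yk in zip(w, y)))  # both about 418
```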
109. Known group totals for a continuous variable. The vector formulation: $\mathbf{x}_k = \mathbf{x}_k^{*} = (\gamma_{1k}x_k, \dots, \gamma_{pk}x_k, \dots, \gamma_{Pk}x_k)' = x_k\boldsymbol{\gamma}_k$. Information: $\sum_{U_p} x_k$ for $p = 1, \dots, P$. Together with $\mathbf{z}_k = \boldsymbol{\gamma}_k = (0, \dots, 1, \dots, 0)'$ (not standard), this gives the SEPRA (separate ratio) estimator. Known group counts and group totals for a continuous variable. The vector formulation: $\mathbf{x}_k = \mathbf{x}_k^{*} = (\boldsymbol{\gamma}_k', x_k\boldsymbol{\gamma}_k')' = (\gamma_{1k}, \dots, \gamma_{pk}, \dots, \gamma_{Pk}, x_k\gamma_{1k}, \dots, x_k\gamma_{pk}, \dots, x_k\gamma_{Pk})' = \mathbf{z}_k$. Information: $N_p$ and $\sum_{U_p} x_k$ for $p = 1, \dots, P$. This gives the SEPREG (separate regression) estimator.
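110. The Separate Regression estimator: $\hat{Y}_W = \sum_{p=1}^{P} N_p\{\bar{y}_{r_p;d} + (\bar{x}_{U_p} - \bar{x}_{r_p;d})B_{r_p;d}\} = \hat{Y}_{SEPREG}$. Marginal counts for a two-way classification: P groups for classification 1 (say, age by sex) and H groups for classification 2 (say, profession). $\mathbf{x}_k = \mathbf{x}_k^{*} = (\gamma_{1k}, \dots, \gamma_{pk}, \dots, \gamma_{Pk}, \delta_{1k}, \dots, \delta_{hk}, \dots, \delta_{H-1,k})' = (0, \dots, 1, \dots, 0 \;\; 0, \dots, 1, \dots, 0)'$. Calibration on the P + H − 1 marginal counts. Note the H − 1: one category of classification 2 is omitted because both sets of dummies sum to 1, so keeping all P + H of them would make the matrix $\sum_r d_k\mathbf{x}_k\mathbf{x}_k'$ singular. This gives the two-way classification estimator.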
111. List of 'traditional estimators' (we shall refer to them later): Expansion (EXP), Weighting Class (WC), Population Weighting Adjustment (PWA), Ratio (RA), Regression (REG), Separate Ratio (SEPRA), Separate Regression (SEPREG), Two-Way Classification (TWOWAY). Comment: there is no need to give individual names to the traditional estimators; all are calibration estimators. For example, the estimator known earlier as the 'regression estimator', $\hat{Y}_{REG} = N\{\bar{y}_{r;d} + (\bar{x}_U - \bar{x}_{r;d})B_{r;d}\}$, is now completely described as the calibration estimator for the vector $\mathbf{x}_k = \mathbf{x}_k^{*} = (1, x_k)'$.
112. 1_9 Exercises. Your set of course materials contains an appendix with a number of exercises. Some of these ask you to formulate (verbally) your response to a given practical situation; others require an algebraic derivation.
113. You are encouraged to consider these exercises, now during the course or after the course. Exercises 1 and 2 reflect practical situations that survey statisticians are likely to encounter in their work. Think about the (verbal) answers you would give.
114. 2_1 Calibration with combined use of sample information and population information. Different levels of auxiliary information: population level, $\mathbf{x}_k^{*}$; sample level, $\mathbf{x}_k^{\circ}$.
115. Recall the traditional approach: find estimates $\hat{\theta}_k$ of the unknown response probabilities $\theta_k$, then form $\hat{Y} = \sum_r d_k \frac{1}{\hat{\theta}_k} y_k$. If population totals are available, there may be a second step: use $d_k/\hat{\theta}_k$ as starting weights, and get the final weights by calibrating to the known population totals. An alternative traditional approach: start from the two-phase GREG estimator $\hat{Y} = \sum_r d_k \frac{1}{\theta_k} g_{\theta k} y_k$. After estimation of the response probabilities, we get $\hat{Y} = \sum_r d_k \frac{1}{\hat{\theta}_k} \hat{g}_{\theta k} y_k$.
116. The first step in the traditional approaches. The idea: adjust for nonresponse by model fitting. An explicit model is formulated, with the $\theta_k$ as unknown parameters. The model is fitted, $\hat{\theta}_k$ is obtained as an estimate of $\theta_k$, and $1/\hat{\theta}_k$ is used as a weight adjustment to $d_k$. Example: logistic regression fitting. Frequently used: subgrouping. The sample s is split into a number of subgroups (response homogeneity groups). The inverse of the response fraction within a group is used as a weight adjustment to $d_k$.
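A sketch of the subgrouping adjustment, assuming the response fraction in a group is the design-weighted one, $\hat{\theta}_g = \sum_{r_g} d_k / \sum_{s_g} d_k$ (under SI it equals the plain fraction $m_g/n_g$). With this choice the adjusted weights $d_k/\hat{\theta}_g$ coincide with the weighting class weights of slide 104, one instance of the traditional approach giving the same result as calibration. Function name and data are hypothetical.

```python
# Sketch: response homogeneity group (RHG) adjustment d_k / theta_hat_g, where
# theta_hat_g is the design-weighted response fraction in group g (an assumption).
from collections import defaultdict


def rhg_weights(d, group, responded):
    s_sum, r_sum = defaultdict(float), defaultdict(float)
    for dk, g, r in zip(d, group, responded):
        s_sum[g] += dk
        if r:
            r_sum[g] += dk
    theta_hat = {g: r_sum[g] / s_sum[g] for g in s_sum}
    # only respondents get a weight, so a group without respondents is never divided by
    return [dk / theta_hat[g] for dk, g, r in zip(d, group, responded) if r]


# Hypothetical SI sample: n = 10 from N = 100 (d_k = 10), two groups.
d = [10.0] * 10
group = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2]
responded = [True, True, True, False, False, False, True, True, True, True]
print(rhg_weights(d, group, responded))  # 20.0 in group 1, 10.0 in group 2, as for WC
```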
117. The traditional approach often gives the same result as the calibration approach. We return to the calibration estimator $\hat{Y}_W = \sum_r w_k y_k$. Let us consider alternatives for computing the $w_k$. Single-step or two-step may be used. We recommend single-step, as follows: initial weights: $d_{\alpha k} = d_k$; auxiliary vector: $\mathbf{x}_k = \begin{pmatrix}\mathbf{x}_k^{*}\\ \mathbf{x}_k^{\circ}\end{pmatrix}$; calibration equation: $\sum_r w_k\mathbf{x}_k = \begin{pmatrix}\sum_U \mathbf{x}_k^{*}\\ \sum_s d_k\mathbf{x}_k^{\circ}\end{pmatrix}$. Then compute the $w_k$.
118. Two variations of two-step: Two-step A and Two-step B. Two-step A, step 1: initial weights: $d_k$; auxiliary vector: $\mathbf{x}_k = \mathbf{x}_k^{\circ}$; calibration equation: $\sum_r w_k^{\circ}\mathbf{x}_k^{\circ} = \sum_s d_k\mathbf{x}_k^{\circ}$.
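119. Two-step A (cont.), step 2: initial weights: $w_k^{\circ}$; auxiliary vector: $\mathbf{x}_k = \begin{pmatrix}\mathbf{x}_k^{*}\\ \mathbf{x}_k^{\circ}\end{pmatrix}$; calibration equation: $\sum_r w_{2Ak}\mathbf{x}_k = \begin{pmatrix}\sum_U \mathbf{x}_k^{*}\\ \sum_s d_k\mathbf{x}_k^{\circ}\end{pmatrix}$. Two-step B, step 1: initial weights: $d_k$; auxiliary vector: $\mathbf{x}_k = \mathbf{x}_k^{\circ}$; calibration equation: $\sum_r w_k^{\circ}\mathbf{x}_k^{\circ} = \sum_s d_k\mathbf{x}_k^{\circ}$.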
120. Two-step B (cont.), step 2: initial weights: $w_k^{\circ}$; auxiliary vector: $\mathbf{x}_k = \mathbf{x}_k^{*}$; calibration equation: $\sum_r w_{2Bk}\mathbf{x}_k^{*} = \sum_U \mathbf{x}_k^{*}$. Here there is no calibration to the sample information $\sum_s d_k\mathbf{x}_k^{\circ}$. An example of calibration with information at both levels. Sample level: $\mathbf{x}_k^{\circ} = \boldsymbol{\gamma}_k = (\gamma_{1k}, \dots, \gamma_{pk}, \dots, \gamma_{Pk})'$ (classification for $k \in s$). Population level: $\mathbf{x}_k^{*} = (1, x_k)'$ ($x_k$ a continuous variable with known population total).
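121. Single-step: initial weights: $d_{\alpha k} = d_k$; auxiliary vector: $\mathbf{x}_k = \begin{pmatrix}x_k\\ \boldsymbol{\gamma}_k\end{pmatrix}$ (the constant 1 of $\mathbf{x}_k^{*}$ is dropped, since the components of $\boldsymbol{\gamma}_k$ already sum to 1); calibration equation: $\sum_r w_k\mathbf{x}_k = \begin{pmatrix}\sum_U x_k\\ \sum_s d_k\boldsymbol{\gamma}_k\end{pmatrix}$. Two-step A, step 1: initial weights: $d_k$; auxiliary vector: $\mathbf{x}_k^{\circ} = \boldsymbol{\gamma}_k$; calibration equation: $\sum_r w_k^{\circ}\mathbf{x}_k^{\circ} = \sum_s d_k\boldsymbol{\gamma}_k$.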
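122. Two-step A (cont.), step 2: initial weights: $w_k^{\circ}$; auxiliary vector: $\mathbf{x}_k = \begin{pmatrix}x_k\\ \boldsymbol{\gamma}_k\end{pmatrix}$; calibration equation: $\sum_r w_{2Ak}\mathbf{x}_k = \begin{pmatrix}\sum_U x_k\\ \sum_s d_k\boldsymbol{\gamma}_k\end{pmatrix}$. Two-step B, step 1: initial weights: $d_k$; auxiliary vector: $\mathbf{x}_k^{\circ} = \boldsymbol{\gamma}_k$; calibration equation: $\sum_r w_k^{\circ}\mathbf{x}_k^{\circ} = \sum_s d_k\boldsymbol{\gamma}_k$.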
123. Two-step B (cont.), step 2: initial weights: $w_k^{\circ}$; auxiliary vector: $\mathbf{x}_k^{*} = (1, x_k)'$; calibration equation: $\sum_r w_{2Bk}\mathbf{x}_k^{*} = \begin{pmatrix}N\\ \sum_U x_k\end{pmatrix}$. Comments: in general, Single-step, Two-step A and Two-step B give different weight systems, but we expect the resulting estimators to differ only slightly. There is no disadvantage in mixing the population information with the sample information. It is important that both sources are allowed to contribute.
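A sketch of Two-step B for the example above: step 1 calibrates $d_k$ to the sample-level group totals $\sum_s d_k\boldsymbol{\gamma}_k$, and step 2 calibrates those weights linearly to $(N, \sum_U x_k)'$. The function names, the sample and the totals $N = 100$, $\sum_U x_k = 500$ are hypothetical. The final weights reproduce the population totals but, as the slides note, no longer reproduce the sample-level group totals.

```python
# Sketch of Two-step B (hypothetical data): step 1 = weighting class weights,
# step 2 = linear calibration of those weights to (N, sum_U x_k)'.
from collections import defaultdict


def step1_group_weights(d, group, responded):
    s_sum, r_sum = defaultdict(float), defaultdict(float)
    for dk, g, r in zip(d, group, responded):
        s_sum[g] += dk
        if r:
            r_sum[g] += dk
    return [dk * s_sum[g] / r_sum[g] for dk, g, r in zip(d, group, responded) if r]


def step2_linear(w0, x, N, tx):
    """w_2Bk = w0_k (1 + l0 + l1 x_k) with sum w = N and sum w x = tx."""
    t11 = sum(w0)
    t12 = sum(a * xk for a, xk in zip(w0, x))
    t22 = sum(a * xk * xk for a, xk in zip(w0, x))
    g0, g1 = N - t11, tx - t12
    det = t11 * t22 - t12 * t12
    l0 = (t22 * g0 - t12 * g1) / det
    l1 = (t11 * g1 - t12 * g0) / det
    return [a * (1.0 + l0 + l1 * xk) for a, xk in zip(w0, x)]


# Hypothetical data: n = 10 from N = 100 (d_k = 10), two groups, 7 respondents.
d = [10.0] * 10
group = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2]
responded = [True, True, True, False, False, False, True, True, True, True]
x_r = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]    # x_k for the respondents, in order
g_r = [g for g, r in zip(group, responded) if r]

w0 = step1_group_weights(d, group, responded)
w2b = step2_linear(w0, x_r, 100.0, 500.0)
group1_total = sum(w for w, g in zip(w2b, g_r) if g == 1)
print(sum(w2b), sum(w * xk for w, xk in zip(w2b, x_r)), group1_total)
# about 100 and 500, while the group 1 total is 46.875 rather than sum_s d_k = 60
```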
124. The Two-step B procedure resembles the traditional approach and has been much used in practice. Step 1: adjust for nonresponse. Step 2: achieve consistency of the weight system and reduce the variance somewhat. But we recommend the Single-step procedure. Monte Carlo simulation: 10,000 SI samples, each of size n = 300, drawn from an experimental population of size N = 832, constructed from actual survey data (Statistics Sweden's KYBOK survey). Elements are classified into four administrative groups, of sizes 348, 234, 161 and 89.
125. Monte Carlo simulation. Information: for every $k \in U$, we know • membership in one of 4 administrative groups • the value $x_k$ of a continuous variable, x = square root of revenues. We can use all or some of this information. Study variable: y = expenditures. Monte Carlo simulation, measures computed: $\mathrm{RelBias} = 100\,[\mathrm{Ave}(\hat{Y}_W) - Y]/Y$, where $\mathrm{Ave}(\hat{Y}_W) = \sum_{j=1}^{10{,}000}\hat{Y}_W(j)/10{,}000$, and $\mathrm{Variance} = \frac{1}{9{,}999}\sum_{j=1}^{10{,}000}[\hat{Y}_W(j) - \mathrm{Ave}(\hat{Y}_W)]^2 \times 10^{-8}$.
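A sketch of the two simulation measures, given the estimates $\hat{Y}_W(j)$ from the repeated samples and the true total $Y$. It uses the standard-library `statistics` module, whose `variance` has the divisor (number of replicates − 1) used above; the function names and the four replicate values are hypothetical.

```python
# Sketch: RelBias (in percent) and scaled variance over simulation replicates.
import statistics


def rel_bias(estimates, Y):
    """RelBias = 100 [Ave(Y_W) - Y] / Y."""
    return 100.0 * (statistics.mean(estimates) - Y) / Y


def scaled_variance(estimates, scale=1e-8):
    """Replicate variance with divisor (replicates - 1), times a display scale (10^-8 above)."""
    return statistics.variance(estimates) * scale


est = [105.0, 107.0, 103.0, 105.0]  # hypothetical estimates from 4 replicates
print(rel_bias(est, 100.0), scaled_variance(est, scale=1.0))  # 5.0 and about 2.667
```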
126. Monte Carlo simulation; logit response:
Estimator      RelBias (%)   Variance (×10⁻⁸)
EXP               5.0            69.6
Single-step      -0.6             9.7
Two-step A       -0.6             9.8
Two-step B       -0.8             9.5
Monte Carlo simulation; increasing exp response:
Estimator      RelBias (%)   Variance (×10⁻⁸)
EXP               9.3            70.1
Single-step      -2.4             8.2
Two-step A       -2.3             8.3
Two-step B       -3.0             8.0
127. Our conclusion: in practice there are no rational grounds for selecting a method other than the Single-step procedure.
128. 2_2 Analysing the bias remaining in the calibration estimator. Is it important to try to reduce the bias? Most of us would say YES, OF COURSE. A (pessimistic) argument for NO: there is no satisfactory theoretical solution; the bias cannot be estimated. It is always unknown (because the response probabilities are unknown). The approach that we present is not pessimistic.
129. Is it important to try to reduce the bias? Yes. It is true that the bias due to NR cannot be known or estimated, but we must strive to reduce it. We describe methods for this. Calibration is not a panacea. No matter how we choose the auxiliary vector, the calibration estimator (or any other estimator) will always have a remaining bias. The question becomes: how do we reduce the remaining bias? Answer: seek ever better $\mathbf{x}_k$. We need procedures for this search (Sessions 2_3, 2_4, 2_5).
130. An improved auxiliary vector will (usually) lead to reduced bias and reduced variance. Interesting quantities are (a) the mean squared error, $\mathrm{MSE} = (\mathrm{Bias})^2 + \mathrm{Variance}$, and (b) the proportion of the MSE due to squared bias, $(\mathrm{Bias})^2/\{(\mathrm{Bias})^2 + \mathrm{Variance}\}$.
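131. A bad situation: bias > standard deviation. [Figure: distribution of $\hat{Y}$, marking the standard deviation of $\hat{Y}$, the bias, the true value Y and the mean of $\hat{Y}$.] Bad situation: the squared bias represents a large portion of the MSE ⇒ the interval $\hat{Y} \pm 1.96\sqrt{\hat{V}(\hat{Y})}$, where $\sqrt{\hat{V}(\hat{Y})}$ is the estimated standard deviation, will often fail to contain the unknown value Y for which we want to state valid 95% confidence limits; its actual coverage falls far below 95%.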
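A sketch of how coverage collapses as the bias grows, assuming (a textbook normal approximation, not part of the slides) that $\hat{Y}$ is normal with mean $Y + \text{bias}$ and a standard deviation that is known exactly. Coverage then depends only on the ratio bias/sd; the function names are hypothetical.

```python
# Sketch: actual coverage of Y-hat +/- 1.96 sd when Y-hat ~ Normal(Y + bias, sd^2),
# ignoring the error in estimating sd (an assumed simplification).
import math


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def coverage(bias_over_sd, z=1.96):
    """P(|Y-hat - Y| <= z sd) for a normal Y-hat with mean Y + bias."""
    b = bias_over_sd
    return normal_cdf(z - b) - normal_cdf(-z - b)


for ratio in (0.0, 1.0, 1.8, 3.0):
    print(ratio, round(coverage(ratio), 3))
```

Coverage is about 0.95 without bias and drops to roughly 0.15 when the bias equals three standard deviations; a ratio of about 1.8 corresponds to the numbers on slide 138 (0.06 against 0.033).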
132. We know: the variance is often small (and tends to 0 as the sample grows) compared to the squared bias (which does not tend to 0). Both bias and variance are theoretical quantities (expectations), stated in terms of values for the whole finite population. The variance can be estimated, but not the bias. The bias of the calibration estimator: • The calibration estimator is not without bias. (The same holds for any other type of estimator.) • The bias comes (almost entirely) from the NR, not from the probability sampling. • With 100% response, the calibration estimator becomes the (almost) unbiased GREG estimator. • Both the bias and the variance of the calibration estimator depend on the strength of the auxiliary vector. Important: seek a powerful auxiliary vector.
133. The bias of the calibration estimator. Recall the general definition: bias = expected value of the estimator minus the value of the parameter under estimation. What is the 'expected value' in our case? We assess the expected value, bias and variance jointly under the known sample selection design $p(s)$ and the unknown response mechanism $q(r \mid s)$: $\mathrm{bias}(\hat{Y}_W) = E_{pq}(\hat{Y}_W) - Y$. Our assumptions on the unknown $q(r \mid s)$ are 'almost none at all'.
134. The bias of the calibration estimator. Derivation of the bias is an evaluation in two phases: $\mathrm{bias}(\hat{Y}_W) = E_p(E_q(\hat{Y}_W \mid s)) - Y$. Let us evaluate it! An approximate expression is obtainable for any auxiliary vector and any sampling design. Before evaluating the bias in a general way (arbitrary sampling design, arbitrary auxiliary vector), let us consider a simple example.
135. Example: the simplest auxiliary vector, $\mathbf{x}_k = \mathbf{x}_k^{*} = 1$ for all k: $\hat{Y}_{EXP} = N\bar{y}_{r;d} = N\frac{\sum_r d_k y_k}{\sum_r d_k}$, the weighted respondent mean expanded by N. Recommended exercise: use first principles to derive its bias! We find $\mathrm{bias}(\hat{Y}_{EXP}/N) \approx \bar{y}_{U;\theta} - \bar{y}_U$, where $\bar{y}_{U;\theta} = \sum_U \theta_k y_k / \sum_U \theta_k$ is the theta-weighted mean and $\bar{y}_U = \frac{1}{N}\sum_U y_k$ is the simple unweighted mean. Why an approximation? Answer: an exact expression is hard to obtain. Is it a close approximation? Yes.
136. The bias of the expansion estimator. The theta-weighted population mean can differ considerably from the unweighted population mean (both of them unknown), so the bias can be very large. These means differ considerably when y and θ are highly correlated. Suppose the correlation between y and θ is 0.6. Then simple analysis shows that $\mathrm{bias}(\hat{Y}_{EXP}/N) \approx 0.6 \times cv(\theta) \times S_{yU}$, where $cv(\theta) = S_{\theta U}/\bar{\theta}_U$ is the coefficient of variation of θ and $S_{yU}$ is the standard deviation of y in U.
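A sketch of the "simple analysis" on a tiny hypothetical population, using population (divisor-N) moments, which is an assumption made here: with that divisor, $\bar{y}_{U;\theta} - \bar{y}_U$ equals $\mathrm{corr}(y, \theta) \times cv(\theta) \times S_{yU}$ exactly, while with divisor $N - 1$ the two agree only approximately. The unemployment indicator echoes the course's introductory example; the response probabilities are invented.

```python
# Sketch: compare ybar_{U;theta} - ybar_U with corr(y, theta) * cv(theta) * S_yU
# on a hypothetical population, using divisor-N moments throughout.
import math


def pop_sd(v):
    mu = sum(v) / len(v)
    return math.sqrt(sum((a - mu) ** 2 for a in v) / len(v))


def bias_terms(y, theta):
    N = len(y)
    ybar, tbar = sum(y) / N, sum(theta) / N
    ybar_theta = sum(t * yk for t, yk in zip(theta, y)) / sum(theta)
    cov = sum((t - tbar) * (yk - ybar) for t, yk in zip(theta, y)) / N
    corr = cov / (pop_sd(y) * pop_sd(theta))
    cv = pop_sd(theta) / tbar
    return ybar_theta - ybar, corr * cv * pop_sd(y)


# y = 1 if unemployed; here unemployed persons are assumed less likely to respond.
y = [1.0, 1.0, 0.0, 0.0]
theta = [0.5, 0.5, 0.9, 0.9]
print(bias_terms(y, theta))  # both about -0.143: the expansion estimator underestimates
```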
137. If the response probabilities θ do not vary at all, then $cv(\theta) = S_{\theta U}/\bar{\theta}_U = 0$ and $\mathrm{bias}(\hat{Y}_{EXP}/N) \approx 0$. As long as all elements have the same response probability (perhaps considerably < 1), there is no bias. But suppose $cv(\theta) = S_{\theta U}/\bar{\theta}_U = 0.1$. Then $\mathrm{bias}(\hat{Y}_{EXP}/N) \approx 0.6 \times 0.1 \times S_{yU} = 0.06\,S_{yU}$. This bias may not seem large, but the crucial question is: how serious is it compared with $\mathrm{stand.dev}(\hat{Y}_{EXP}/N)$?
138. $\mathrm{Var}(\hat{Y}_{EXP}/N) \approx \frac{1}{m}S_{yU}^2$ (a crude approximation; SI sampling assumed). Suppose m = 900 responding elements. Then $\mathrm{stand.dev}(\hat{Y}_{EXP}/N) \approx 0.033\,S_{yU}$, compared with $\mathrm{bias}(\hat{Y}_{EXP}/N) \approx 0.06\,S_{yU}$. Then $(\mathrm{Bias})^2/[(\mathrm{Bias})^2 + \mathrm{Variance}] = (0.06)^2/[(0.06)^2 + 1/900] = 0.0036/(0.0036 + 0.00111) \approx 0.76$, that is, about 76%. It is then impossible to make valid inference by confidence interval!
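A few lines reproducing the arithmetic of slides 137 and 138, with bias and standard deviation both expressed in units of $S_{yU}$ and the crude SI variance approximation $S_{yU}^2/m$.

```python
# Sketch: share of the MSE due to squared bias, in units of S_yU.
import math

m = 900
bias = 0.6 * 0.1           # corr(y, theta) * cv(theta)
variance = 1.0 / m         # Var(Y_EXP / N) / S_yU^2, crude SI approximation
share = bias ** 2 / (bias ** 2 + variance)
print(round(math.sqrt(variance), 4), round(share, 3))  # 0.0333 0.764
```

The exact share is $3.24/4.24 \approx 0.764$, so roughly three quarters of the MSE is squared bias.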
139. We return to the general calibration estimator. For a specified auxiliary vector $\mathbf{x}_k$ with corresponding information $\mathbf{X}$, let us evaluate its bias. The calibration estimator and its bias: $\hat{Y}_W = \sum_r w_k y_k$ with $w_k = d_k v_k = d_k(1 + \boldsymbol{\lambda}_r'\mathbf{x}_k)$, where $\boldsymbol{\lambda}_r' = (\mathbf{X} - \sum_r d_k\mathbf{x}_k)'(\sum_r d_k\mathbf{x}_k\mathbf{x}_k')^{-1}$ (a matrix inversion).
140. Deriving the bias of the calibration estimator requires an evaluation of $\mathrm{bias}(\hat{Y}_W) = E_p(E_q(\hat{Y}_W \mid s)) - Y$. This exact bias expression does not tell us much, but it is closely approximated by a much more informative quantity called $\mathrm{nearbias}(\hat{Y}_W)$. Comments on the approximation: all 'modern advanced estimators', GREG and others, are complex (non-linear). We cannot assess the exact variance of GREG, but there is an excellent approximation. Likewise, for the calibration estimator, we work not with the exact expressions for bias and variance, but with close approximations.
141. Derivation of the bias. Technique: Taylor linearization. Keep the leading term of the expansion; for this term, we can evaluate the expected values in question. Calibration estimator, a close approximation to its bias: $\mathrm{bias}(\hat{Y}_W) \approx \mathrm{nearbias}(\hat{Y}_W)$, where $\mathrm{nearbias}(\hat{Y}_W) = -\sum_U (1 - \theta_k)e_{\theta k}$ with $e_{\theta k} = y_k - \mathbf{x}_k'\mathbf{B}_{U;\theta}$ and $\mathbf{B}_{U;\theta} = (\sum_U \theta_k\mathbf{x}_k\mathbf{x}_k')^{-1}\sum_U \theta_k\mathbf{x}_k y_k$.
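A sketch that evaluates $\mathrm{nearbias}(\hat{Y}_W)$ on a hypothetical population with assumed known $\theta_k$, for $\mathbf{x}_k = 1$ and for $\mathbf{x}_k = (1, x_k)'$ (theta-weighted least squares written out in 2x2 form). For $\mathbf{x}_k = 1$ it reproduces $N(\bar{y}_{U;\theta} - \bar{y}_U)$ from slide 135; when y is exactly linear in x it vanishes, as on slide 89. Function names and data are hypothetical.

```python
# Sketch: nearbias(Y_W) = -sum_U (1 - theta_k) e_theta_k on a hypothetical population.


def nearbias_constant(theta, y):
    """x_k = 1: B_{U;theta} is the theta-weighted mean of y."""
    B = sum(t * yk for t, yk in zip(theta, y)) / sum(theta)
    return -sum((1.0 - t) * (yk - B) for t, yk in zip(theta, y))


def nearbias_intercept_slope(theta, x, y):
    """x_k = (1, x_k)': B_{U;theta} from theta-weighted least squares of y on (1, x)."""
    s0 = sum(theta)
    s1 = sum(t * xk for t, xk in zip(theta, x))
    s2 = sum(t * xk * xk for t, xk in zip(theta, x))
    sy = sum(t * yk for t, yk in zip(theta, y))
    sxy = sum(t * xk * yk for t, xk, yk in zip(theta, x, y))
    det = s0 * s2 - s1 * s1
    b0 = (s2 * sy - s1 * sxy) / det
    b1 = (s0 * sxy - s1 * sy) / det
    return -sum((1.0 - t) * (yk - b0 - b1 * xk) for t, xk, yk in zip(theta, x, y))


theta = [0.5, 0.5, 0.9, 0.9]
y = [1.0, 1.0, 0.0, 0.0]
print(nearbias_constant(theta, y))  # N (ybar_{U;theta} - ybar_U), about -0.571
x = [1.0, 2.0, 3.0, 4.0]
print(nearbias_intercept_slope(theta, x, [2.0 + 3.0 * xk for xk in x]))  # about 0
```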
142. $\mathrm{nearbias}(\hat{Y}_W) = -\sum_U (1 - \theta_k)e_{\theta k}$ is important in what follows. It is a general formula, valid for • any sampling design • any auxiliary vector, and • it is a close approximation (verified in simulations). Comments: • For the detailed derivation of nearbias, see the book. • For a given auxiliary vector, nearbias is the same for any sampling design, but depends on the (unknown) response probabilities. • nearbias is a function of certain regression residuals (not the usual regression residuals). • The variance, by contrast, does depend on the sampling design.
143. Comments: • The nearbias formula makes no distinction between "star variables" and "moon variables". • In other words, for bias reduction, an x-variable is equally important when it carries information to the population level (included in $\mathbf{x}_k^{*}$) as when it carries information only to the sample level (included in $\mathbf{x}_k^{\circ}$). A surprising conclusion, perhaps. But for the variance, the distinction can be important. Example: let $x_k$ be a continuous auxiliary variable. • Information at the population level: $\mathbf{x}_k = \mathbf{x}_k^{*} = (1, x_k)'$ ⇒ N and $\sum_U x_k$ known ⇒ $\hat{Y}_W = \hat{Y}_{REG} = N\{\bar{y}_{r;d} + (\bar{x}_U - \bar{x}_{r;d})B_{r;d}\}$. • Information at the sample level only: $\mathbf{x}_k = \mathbf{x}_k^{\circ} = (1, x_k)'$ ⇒ $\hat{N} = \sum_s d_k$ and $\sum_s d_k x_k$ computable ⇒ $\hat{Y}_W = \hat{N}\{\bar{y}_{r;d} + (\bar{x}_{s;d} - \bar{x}_{r;d})B_{r;d}\}$, where $\bar{x}_{s;d} = \sum_s d_k x_k/\hat{N}$. The two estimators differ, but they have the same nearbias.
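A sketch computing both estimators of this example for one hypothetical SI sample (n = 8 from N = 100, so $d_k = 12.5$; the first five units respond). The population information $\bar{x}_U = 5.2$ is assumed known for the star version, while the moon version uses only $\hat{N}$ and $\bar{x}_{s;d}$. The numbers show that the two estimates differ for a given sample; equality of their nearbias is a statement about expectations and is not checked here.

```python
# Sketch: REG with population-level x (star) versus sample-level x (moon).


def wmean(d, v):
    return sum(dk * vk for dk, vk in zip(d, v)) / sum(d)


def slope(d, x, y):
    """B_{r;d}: design-weighted regression slope of y on x."""
    xr, yr = wmean(d, x), wmean(d, y)
    return (sum(dk * (xk - xr) * (yk - yr) for dk, xk, yk in zip(d, x, y))
            / sum(dk * (xk - xr) ** 2 for dk, xk in zip(d, x)))


d_s = [12.5] * 8
x_s = [3.0, 5.0, 4.0, 6.0, 7.0, 8.0, 2.0, 6.0]   # x known for the whole sample
y_r = [2.0, 4.0, 3.0, 6.0, 5.0]                  # y observed for respondents only
d_r, x_r = d_s[:5], x_s[:5]

B = slope(d_r, x_r, y_r)
ybar_r, xbar_r = wmean(d_r, y_r), wmean(d_r, x_r)

N, xbar_U = 100.0, 5.2                           # population information (assumed)
y_star = N * (ybar_r + (xbar_U - xbar_r) * B)

N_hat, xbar_s = sum(d_s), wmean(d_s, x_s)        # sample information only
y_moon = N_hat * (ybar_r + (xbar_s - xbar_r) * B)
print(y_star, y_moon)  # about 418 and 411.25
```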