
Improving inferences from poor quality samples


A solutions-based approach, illustrated by case studies, showing how inferences can be improved from surveys administered to biased, low-response-rate and non-probability samples.

It addresses how to improve the accuracy of the survey estimates we generate from poorer-quality and non-probability samples.



  1. Improving inferences from poor quality samples. Social Research Centre / Centre for Social Research and Methods Workshop, 16 August 2017.
  2. Presenters, Social Research Centre Pty Ltd: Darren W. Pennay; Dina Neiger, PhD; Paul J. Lavrakas, PhD. Darren Pennay is an Adjunct Senior Research Fellow with the ANU Centre for Social Research and Methods and an Adjunct Professor with the Institute for Social Science Research at the University of Queensland. Dina Neiger is a Centre Visitor with the ANU Centre for Social Research and Methods. Paul Lavrakas is a Senior Fellow at NORC at the University of Chicago; a Senior Research Fellow with the Office of Survey Research, Michigan State University; and Senior Methodological Advisor for the ANU's Social Research Centre.
  3. The Social Research Centre ➢ Based in Melbourne, Australia o owned by the Australian National University in Canberra o operates as a for-profit research services company o profits returned to ANU o co-founder of the ANU Centre for Social Research and Methods ➢ Conducts social and public policy research for government, not-for-profit and academic organisations ➢ Primary data collection involving survey research, qualitative research, statistical methods, data processing and analytics, and survey methodology ➢ 65 staff, plus a 120-seat CATI call centre
  4. What is Total Survey Quality? ➢ Many national statistical offices, including those of Australia, Canada, New Zealand and the USA, operate within a Total Survey Quality (TSQ) framework to determine whether or not a survey is "fit for purpose". ➢ Accuracy is not enough. Dimensions of the TSQ framework:
     o Accuracy: total survey error is minimised
     o Credibility: data are considered trustworthy by the survey user communities
     o Comparability: consistent with past studies in terms of demographic, spatial and temporal comparisons
     o Usability / Interpretability: documentation is clear and metadata are well organised
     o Relevance: data satisfy user needs
     o Accessibility: access to the data is user friendly
     o Timeliness / Punctuality: data deliverables adhere to schedules
     o Completeness: data are rich enough to satisfy the analysis objectives without undue burden on respondents
     o Coherence: estimates from different sources can be reliably combined
     Source: Biemer, P. (2010) Public Opinion Quarterly 74 (5): p. 819.
  5. Survey Cycle from a Design Perspective. [Figure: the survey cycle, with a representation path (target population → sampling frame → designated sample → final sample → final dataset) and a measurement path (specification → measurement → response), both feeding the final results and conclusions.]
  6. Total Survey Error Framework. [Figure: the same survey cycle annotated with error sources. Errors of representation: coverage error, sampling error, nonresponse error, adjustment error. Errors of measurement: specification error, measurement error, processing error. Together these contribute to inferential error in the final results and conclusions.]
  7. What is a probability sample? Textbook definition: a probability sample is one in which every unit in the population of interest has a known, non-zero probability of being selected for the sample. Key elements that differentiate the process from a non-probability sampling process: o Selection into the sample is via a random process. o Every unit in the population of interest has a chance of being sampled for the research. o The probability of selection is known. These design features enable us to: o calculate standard errors o calculate confidence intervals o generalise to the target population of interest (see the sketch below).
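To make the last three points concrete, here is a minimal sketch in R (the deck later references R packages) of what a known selection probability buys you: a standard error and normal-approximation 95% confidence interval for an estimated proportion under simple random sampling. The sample size and proportion are hypothetical, not from the deck.

```r
# Hypothetical illustration: what a known selection probability buys you.
n     <- 1000   # completed interviews (hypothetical)
p_hat <- 0.32   # estimated proportion (hypothetical)

se <- sqrt(p_hat * (1 - p_hat) / n)   # standard error of p_hat under SRS
ci <- p_hat + c(-1.96, 1.96) * se     # 95% confidence interval
round(c(estimate = p_hat, se = se, lower = ci[1], upper = ci[2]), 3)
```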
  8. Trends in response rates
  9. Non-response and non-response bias
  10. "The general decline in response rates is evident across nearly all types of surveys, in the United States and abroad. At the same time, greater effort and expense are required to achieve even the diminished response rates of today. These challenges have led many to question whether surveys are still providing accurate and unbiased information [and whether or not probabilistic surveys are better than the alternatives?]" Pew Research Center, 2012
  11. Trends in response rates (U.S. data). There is no sign of an increase in nonresponse bias since 2012. On 13 demographic, lifestyle, and health questions that were compared with benchmarks from high response rate federal surveys, estimates from phone polls are just as accurate, on average, in 2016 as they were in 2012. The average (absolute) difference between the Center telephone estimates and the benchmark survey estimates was 2.7 percentage points in 2016, compared with 2.8 points in 2012.
  12. Trends in response rates (U.S. data). [Chart]
  13. Impact of declining response rates on telephone survey estimates in the U.S. [Chart]
  14. Impact of low response rates on survey estimates. Pew conclusions, 2012: ➢ Despite the growing difficulties in obtaining a high level of participation in most surveys, well-designed telephone polls that include landlines and cell phones reach a cross-section of adults that mirrors the American public, both demographically and in many social behaviours. Pew conclusions, 2016: ➢ Telephone polls still provide accurate data on a wide range of social, demographic and political variables, but some persistent weaknesses remain.
  15. What is the situation in Australia? A literature review and industry consultation commissioned by RICA and undertaken by Bednall et al. (2013) concluded the following: ➢ Telephone response rates: o Telephone response rates have been in gradual decline over the last decade. o Among cold-calling general community surveys, telephone response rates are typically below 10%. o Co-operation rates (the ratio of obtained interviews to refusals) are typically below 0.2 (that is, fewer than one interview per five refusals). o Telephone interviews from client lists have a higher response rate, typically above 20%, with co-operation rates above 1.0. o It would appear that some topics, such as financial services, may induce a lower level of co-operation. o Government-sponsored surveys have higher response rates, at times over 50%, but even here a sharp decline in response rates over time was observed for one long-running monitor. Co-operation rates were also higher in government-sponsored surveys.
  16. What is the situation in Australia? SRC landline surveys. [Chart: non-contacts as a percentage of usable sample, by survey year 2011-2015, for five landline samples (three national, two Victorian).]
  17. What is the situation in Australia? SRC landline surveys. [Chart: refusals as a percentage of contacts, by survey year 2011-2015, for the same five landline samples.]
  18. What is the situation in Australia? SRC landline surveys. [Chart: interviews as a percentage of interviews plus refusals, by survey year 2011-2015, for the same five landline samples.]
  19. What is the situation in Australia (cont.)? ! We are not saying that low response rates are inconsequential! ! Poorly designed and poorly executed surveys, especially those with very limited call routines (e.g. polls/trackers enumerated over one or two evenings only), those with non-coverage errors (e.g. landline telephone surveys), and those that use non-probability sampling methods (e.g. online panels), are much more likely to produce biased results. ! HOWEVER, well-designed and well-executed probability-based surveys still produce estimates that can be relied on in many situations (and with known standard errors and confidence intervals!)
  20. This appears to be the main situation regarding non-response
  21. The sky is not falling in
  22. A Word of Warning on Probability Sampling. Rivers again: "[Probability surveys work] if the nonresponse rate is small. Cochran (1977, p. 363), in his classic text, concluded that the upper limit is approximately 10 percent nonresponse, which is difficult to achieve today in even the best funded surveys. Lohr (2010, p. 355) warns that 'many examples exist of surveys with a 70% response rate whose results are flawed.'" (Douglas Rivers, comment on the 2013 AAPOR Task Force Report on Non-probability Sampling. J Surv Stat Methodol 2013; 1(2): 111-117. doi: 10.1093/jssam/smt009)
  23. The rise of non-probability online panels
  24. The take-up of non-probability online panels. The situation in Australia: ➢ In 2014-15, 86% of Australian households had access to the internet at home (ABS). ➢ Online research is continuing to grow in popularity domestically and internationally. [Chart: online vs CATI share of research industry turnover, 2009-2015; source: ESOMAR / Research Industry Council of Australia.] Online research globally: 28% of turnover in 2014. ➢ 50+ commercial 'online research panels'. ➢ The first of these was established in the late 1990s. ➢ All, until now, use non-probability sampling methods.
  25. The rise of non-probability panels globally. [Chart: online research revenues in $US millions, 1999-2016 (estimated), for the US, Europe and the rest of the world.]
  26. Pros and cons of non-probability panels. Pros: ➢ Reduced cost ➢ Improved timeliness ➢ Respondent convenience ➢ Reduced social desirability bias ➢ Can target 'hard to reach' populations ➢ Multimedia functionality ➢ Computerised questionnaire scripts. Cons: ➢ Non-coverage ➢ Self-selection ➢ Reliance on computer-literate respondents ➢ Non-probability sampling, which undermines: o calculating standard errors o calculating confidence intervals o generalising to the target population of interest
  27. How do they compare? The SRC Online Panels Benchmarking Survey. ➢ Three surveys based on probability samples of the Australian population aged 18 years and over, and five surveys of persons aged 18 years and over administered to members of non-probability online panels. ➢ The survey questionnaire included a range of demographic questions and questions about health, wellbeing and use of technology for which high-quality population benchmarks were available. ➢ The same questions were used across the eight surveys. ➢ 9 minutes average interview length for online and telephone. o 12-page booklet for the hard-copy version. ➢ Fieldwork Oct-Dec 2015. ➢ Data and documentation available from the Australian Data Archive. https://www.ada.edu.au/ada/01329
  28. Results: substantive health characteristics. Percentage point distance from the benchmark, for the probability surveys (ABS, ANU Poll, RDD) and the non-probability panels (P1-P5):
     o Life satisfaction (8 out of 10), benchmark 32.6: ABS -2.0, ANU Poll -2.0, RDD 1.9; P1 -11.9, P2 -11.6, P3 -4.5, P4 -9.2, P5 -7.9
     o Psychological distress, Kessler 6 (Low), benchmark 82.2: ABS -10.6, ANU Poll -11.6, RDD -8.1; P1 -25.9, P2 -23.5, P3 -22.2, P4 -25.0, P5 -23.2
     o General health status, SF1 (Very good), benchmark 36.2: ABS 0.4, ANU Poll -2.0, RDD -2.6; P1 -4.1, P2 -5.8, P3 -5.3, P4 -5.0, P5 1.5
     o Private health insurance, benchmark 57.1: ABS 3.4, ANU Poll 1.9, RDD 3.3; P1 -8.9, P2 -12.5, P3 -3.7, P4 -0.6, P5 -2.6
     o Daily smoker, benchmark 13.5: ABS -4.1, ANU Poll 3.5, RDD 1.6; P1 9.8, P2 6.7, P3 3.9, P4 2.7, P5 4.3
     o Consumed alcohol in the last 12 months, benchmark 81.9: ABS 3.6, ANU Poll -2.8, RDD -4.0; P1 2.4, P2 5.3, P3 3.9, P4 4.2, P5 1.5
  29. Results: average error across the six substantive health measures:
     o Average error: ABS 4.02, ANU Poll 3.98, RDD 3.58; P1 10.5, P2 10.9, P3 7.24, P4 7.78, P5 6.83
     o Largest absolute error: ABS 10.59, ANU Poll 11.57, RDD 8.08; P1 25.86, P2 23.52, P3 22.20, P4 24.96, P5 23.20
     o Number of significant differences from benchmarks (out of 6): ABS 2, ANU Poll 3, RDD 2; P1 4, P2 6, P3 3, P4 3, P5 3
  30. Two questions … 1. How do we reduce non-response bias in low response rate probability surveys? 2. How do we reduce bias in survey estimates based on non-probability online panels?
  31. Instructors: Dina Neiger, PhD ➢ Dina is a professional statistician with over 20 years' experience and a track record of achievement in leadership and technical roles. ➢ Social Research Centre, Monash University, Australian Bureau of Statistics, Peter MacCallum Cancer Centre. ➢ First Class Honours degree in Statistics and a PhD in Business Systems from Monash University, with an emphasis on applied operations research and process engineering. ➢ Accredited Statistician (AStat) and member of the Statistical Society of Australia (SSA). ➢ Full member of the Australian Society for Operations Research (ASOR). ➢ Recent work includes calibration and blending methods to improve the accuracy of non-probability samples, establishment and maintenance of the first Australian online probability panel, and complex business survey design and weighting.
  32. Instructors: Paul J. Lavrakas, PhD ➢ Research psychologist, research methodologist, and prolific author. ➢ Since 2007, an independent consultant with clients in the USA, Australia, Belgium, Japan and Canada; Senior Fellow at NORC (University of Chicago) and the Office of Survey Research (Michigan State University). ➢ From 2000 to 2007, Chief Research Methodologist for Nielsen Media Research. ➢ From 1978 to 2000, Professor at Northwestern University and Ohio State University, and founding faculty director of a survey research center at each university. ➢ Australian roles: Senior Methodological Adviser, the Social Research Centre; member of the Scientific Advisory Board for the Centre for Social Research and Methods, ANU. ➢ President of the American Association for Public Opinion Research (2011-2014); continues to serve on a volunteer basis on many AAPOR task forces and committees.
  33. Investigating Nonresponse Bias by Level of Effort. Paul J. Lavrakas
  34. Investigating Nonresponse Bias ➢ Methods have evolved in the past decade regarding how nonresponse bias can be investigated (Groves and Brick, 2007; Olson and Montaquila, 2012). ➢ One approach is to conduct analyses comparing early and middle responders vs. late responders on key measures in a survey, the hypothesis/premise being that late responders are more like nonrespondents than are early and middle responders. ➢ However, if no differences are found between late responders vs. early and middle responders, that is NOT taken as evidence that there are no differences between respondents and nonrespondents.
  35. Case Study 1: University Faculty Health Benefits Survey. Study background ➢ The purpose of this survey was to provide reliable and valid information about the opinions of faculty towards the prospect of the university creating a new health care service for faculty working on the main campus. ➢ Approximately 800 of 5,000 members of the faculty were randomly sampled, and a telephone survey was conducted by the university's survey research center, which yielded a 47% completion rate. ➢ Nonresponse bias was investigated using multiple methods, one being a Level of Effort analysis.
  36. Level of Effort Analyses ➢ Level of Effort was operationalized as the number of calls made to a given sampled respondent who completed the interview. ➢ Analyses were conducted to investigate whether the effort it took to complete an interview was related to any of the substantive questions asked in the questionnaire (a sketch of this style of analysis follows below). ➢ From these analyses it was learned that none of the key questionnaire items had a statistically significant (p < .05) association with this effort measure. ➢ For one of the questionnaire items – asking whether the respondent had heard about the possible new health service prior to being interviewed – there was a marginally significant correlation (p < .07) with effort, but the size of this correlation was extremely small (r = 0.09). ➢ Thus, it was judged that there was no reliable evidence that any nonignorable nonresponse bias was present in the findings reported in the main body of the report.
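A minimal sketch in R of this style of level-of-effort analysis, using simulated data; `n_calls` and `heard_item` are hypothetical stand-ins for the call-count and questionnaire variables, not the study's actual data.

```r
# Simulated level-of-effort analysis: does the number of call attempts
# needed to complete an interview correlate with a substantive item?
set.seed(1)
dat <- data.frame(
  n_calls    = rpois(500, lambda = 3) + 1,  # calls to completion (hypothetical)
  heard_item = rbinom(500, 1, 0.4)          # 1 = had heard of the service (hypothetical)
)
# Correlation (and its p-value) between effort and the item, as in the case study
cor.test(dat$n_calls, dat$heard_item)
```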
  37. Case Study 2: Voter Identification Survey. Study background ➢ In 2012, there were several states in the USA whose state governments were controlled by conservative party members who passed legislation changing the requirements for the identification that a voter must show to prove that he or she was eligible to vote on the day of the Presidential Election. ➢ The purpose of the survey, which was funded by large labor unions that were known to support liberal candidates/policies, was to identify how many and which types of residents in the state were most likely to be disenfranchised by the new Voter ID legislation.
  38. Level of Effort Analysis 1 ➢ One approach was to compare key data provided by those who initially refused, but later agreed to complete the questionnaire after being recontacted with a "refusal conversion" protocol, with data provided by the cohort that never refused. ➢ In this analysis it was found that for neither of the two key statistics measured by the survey (awareness of the new Voter ID legislation, and whether someone had a valid photo ID to vote at their local polling place on November 6, 2012) were there statistically significant differences (p < .05) between those who never refused and those who initially refused but later were converted. ➢ Thus, there was no evidence found to suggest that refusing nonrespondents who were not converted would have given materially different answers to these key questions than did the respondents.
  39. Level of Effort Analysis 2 ➢ A second approach was to take into account the number of call attempts it took to complete an interview, and investigate if answers to the same two key variables correlated with the level of effort expended. ➢ In this analysis it was found that as more effort was expended to reach a respondent, the likelihood the respondent had a valid photo ID for voting purposes increased at a statistically significant level (p < .007). o Thus, the findings suggested that the unweighted survey data were somewhat biased in the direction of underestimating the proportion of people with a valid photo ID.
  40. Level of Effort Analysis 2 (cont.) ➢ However, since the level of effort to achieve a completion correlated with various demographic characteristics of respondents (e.g., it took less effort to reach elderly respondents and more effort to reach young adults), and given that key demographic characteristics were taken into account in the weighting process, it was judged to be unlikely that the findings reported about the proportion of registered voters without a valid photo ID were subject to nonignorable nonresponse bias. ➢ Concerning awareness of the legislation, there was no statistically significant difference associated with the level of effort expended to gain a completion. o Thus, there was no evidence found to suggest that uncontacted nonrespondents would have given materially different answers to this key question than did the respondents.
  41. Non-Response Follow-up Studies. Paul J. Lavrakas
  42. Case Study: IMLS Nonresponse Follow-Up (NRFU) Study ➢ Another preferred method to investigate nonresponse bias is to conduct a follow-up survey of a sample of the original survey's nonresponders. ➢ Little has been reported regarding how to use the data from such follow-up studies. ➢ Please note that my slides today are extracted from a paper presentation that I made in 2015 at the 70th annual conference of the American Association for Public Opinion Research (AAPOR), "Studying Nonresponse Bias with a Follow-up Survey of Initial Nonresponders in a Dual Frame RDD Survey".
  43. 2013 Public Needs for Library and Museum Services (PNLMS) Survey ➢ Sponsored by the Institute of Museum and Library Services (IMLS), a federal agency in the USA ➢ Conducted by M. Davis and Company, Inc., Philadelphia PA ➢ National dual-frame RDD survey of the general population of the USA o Data gathered September-November 2013 o 3,537 interviews completed; 2,506 from the landline (LL) frame; 1,031 from the cell phone (CP) frame • Landline sample AAPOR RR3 was 25% • Cell phone sample AAPOR RR3 was 10%
  44. 2014 IMLS Nonresponse Follow-up (NRFU) Survey ➢ Created a shortened questionnaire with key variables ➢ Used a noncontingent and a contingent incentive protocol ➢ Used only the best-calibre interviewers ➢ 201 interviews were completed in January-February 2014 with a random sample of nonresponders to the main survey; 100 from the LL frame, 101 from the CP frame o Landline follow-up sample AAPOR RR3 was 32% o Cell phone follow-up sample AAPOR RR3 was 16% ➢ The original sample and the follow-up samples had remarkably similar demographic characteristics, with the exception that the follow-up sample had proportionally more young adults aged 18-24 years and fewer older adults aged 65+ years (p < .03) than the original survey.
  45. Analytic Approach: Phase 1 ➢ Combining the two surveys via weighting, in two phases o Phase 1: Begin by assigning weights, separately, to each of four groups • Landline RDD sample (original study) • Landline NR follow-up sample • Cell phone RDD sample (original study) • Cell phone NR follow-up sample
  46. Analytic Approach: Phase 1 ➢ For each of the four groups, Phase 1 included the following seven steps, conducted for each US Census region within each sample type: 1. Probability of Selection and Design Weight 2. Nonresponse Follow-up Adjustment 3. Unknown Eligibility Adjustment 4. Removal of Known Ineligibles 5. Unit Nonresponse Adjustment 6. Adult Subsampling Adjustment 7. Multiple Phone Line Adjustment ➢ Steps 1 and 2 created the weights to start the weighting process ➢ Steps 3-7 were sequential adjustments to the starting weights
  47. 1. Probability of Selection and Design Weight. For each of the four samples: ➢ Using information from the sampling frame, the probability of selection was calculated as the number of released telephone numbers in each Census region divided by the total number of telephone numbers in the Census region ➢ The design weight was the inverse of the probability of selection for the released telephone numbers, and zero for the non-released telephone numbers
  48. 2. Nonresponse Follow-up Adjustment. For each of the two NRFU samples: ➢ For the telephone numbers that were eligible for the nonresponse follow-up study, a further adjustment to the design weight was required to account for the subsampling of telephone numbers that were called from among all the eligible nonresponse follow-up numbers ➢ This adjustment to the design weight was calculated as the number of follow-up eligible telephone numbers divided by the number of follow-up eligible called telephone numbers
  49. 3. Unknown Eligibility Adjustment. As in any dual-frame RDD survey, a considerable portion of the sampled numbers ended the original field period with their eligibility unresolved. For each of the four samples: ➢ Using standard AAPOR final disposition codes, two groups of telephone numbers were created in each Census region: (1) unknown-eligibility telephone numbers and (2) known-eligibility telephone numbers ➢ The unknown eligibility adjustment factor was the sum of the follow-up adjusted design weights for all telephone numbers divided by the sum of the follow-up adjusted design weights for the known-eligibility telephone numbers ➢ The unknown eligibility adjusted weight is the product of the unknown eligibility adjustment factor and the follow-up adjusted design weight for the known-eligibility telephone numbers, and zero (0.0) for the unknown-eligibility telephone numbers
  50. 4. Removal of Known Ineligibles. As with all telephone surveys of the general public, many of the telephone numbers that were released were found to be ineligible for various reasons, including various non-working numbers (e.g., disconnected or temporarily out of service, technical problems, etc.), and numbers that were not part of the target population, e.g., fax, business, or other nonresidential numbers. For each of the four samples: ➢ Using AAPOR final disposition codes, these known ineligible telephone numbers were identified and removed from the weighting process
  51. 5. Unit Nonresponse Adjustment. As with all telephone surveys of the general public, for many of the telephone numbers at which contact was made, it was determined that there was at least one eligible adult but no data were collected. For each of the four samples: ➢ Using the final disposition codes for the eligible telephone numbers, two groups of eligible telephone numbers were produced in each Census region: (1) eligible responding telephone numbers and (2) eligible non-responding telephone numbers ➢ The unit nonresponse adjustment shifts the weights from the eligible nonrespondents to the eligible respondents ➢ The nonresponse adjustment factor was the sum of the unknown eligibility adjusted weights for all eligible telephone numbers divided by the sum of the unknown eligibility adjusted weights for the responding telephone numbers ➢ The nonresponse adjusted weight was the product of the nonresponse adjustment factor and the unknown eligibility adjusted weight for respondents, and zero (0.0) for eligible nonrespondents
  52. 6. Landline Adult Subsampling Adjustment. Within eligible landline households there may have been more than one eligible adult in residence; this was determined in the questionnaire. For each of the four samples: ➢ When there were two or more eligible adults associated with a landline telephone number, the adult subsampling adjustment factor was capped at two (2.0) ➢ The adult subsampling adjusted weight was the product of the adult subsampling adjustment factor and the nonresponse adjusted weight.
  53. 7. Multiple Phone Line Adjustment. Most adults can be sampled on more than one phone line. For each of the four samples: ➢ A multiplicity adjustment was implemented when there were two or more telephone numbers, either landline or cell phone, at which the responding adult could have been contacted; this multiplicity adjustment factor was capped at two (2.0); otherwise, the multiplicity adjustment factor was one (1.0). This completed Phase 1 of the separate weighting of each of the four groups (a toy numerical sketch of how these adjustments compound follows below).
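A toy numerical sketch in R of how three of these Phase 1 steps compound for one Census region; every count below is invented for illustration and does not come from the IMLS study.

```r
# Toy illustration (all counts invented) of steps 1, 3 and 5 for one region.
released   <- 5000     # telephone numbers released to the field
frame_size <- 100000   # telephone numbers on the frame in the region

# Step 1: design weight = inverse of the selection probability
design_wt <- frame_size / released

# Step 3: unknown eligibility adjustment
# (sum of weights over all numbers / sum over known-eligibility numbers)
unk_elig_adj <- (5000 * design_wt) / (3000 * design_wt)

# Step 5: unit nonresponse adjustment
# (sum of adjusted weights over eligible numbers / sum over responders)
nr_adj <- (1500 * design_wt * unk_elig_adj) / (900 * design_wt * unk_elig_adj)

# Weight carried by each responder after these three steps
design_wt * unk_elig_adj * nr_adj
```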
  54. Phase 2 Combining Samples: Composite Weights. For each of the four samples: ➢ The composite weighting adjusted the multiplicity adjusted weight from the landline respondents and cell phone respondents to account for overlap in the two samples ➢ If the person could have been reached by both frames the composite factor was 0.5; if the person could only be reached by one frame it was 1.0
  55. Phase 2 Combining Samples: Calibration. For each of the four samples: ➢ The final adjustment forced the weight totals from the survey data using the composite weight to match external population control totals; the external control totals were based on the following characteristics: o An extrapolation of Blumberg and Luke's 2013 NHIS findings for telephone service ownership, so that 50% were both landline and cell phone, 41% cell phone only, and 9% landline only o Socio-demographic characteristics from the most recent American Community Survey • Gender (Male, Female) • Age group (18-44, 45-64, 65+) • Marital status (Never been married, Married, Separated/Divorced/Widowed) • Hispanicity/Race (Hispanic, Non-Hispanic African American, Non-Hispanic White, Non-Hispanic Other) • Education (Less than High School/High School, Some College, Associate or Bachelor Degree, Advanced Degree) • Presence or absence of children (No, Yes)
  56. Phase 2 Combining Samples: Calibration (cont.) ➢ The calibration methodology used was Iterative Proportional Fitting, i.e., raking or sample balancing ➢ This method forces all of the different characteristics to simultaneously match the control totals o Each of the socio-demographic characteristics was used as a separate dimension in the raking process (see the raking sketch below)
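A minimal raking sketch using the R survey package; the deck does not name a package, so this is one common implementation, and the data and population margins are hypothetical.

```r
# Raking (iterative proportional fitting) with the 'survey' package.
library(survey)

set.seed(2)
dat <- data.frame(
  gender = sample(c("Male", "Female"), 1000, replace = TRUE),
  agegrp = sample(c("18-44", "45-64", "65+"), 1000, replace = TRUE),
  wt     = 1   # starting weights, e.g. the composite weights from Phase 2
)
des <- svydesign(ids = ~1, weights = ~wt, data = dat)

# Hypothetical population control totals, one per raking dimension
pop_gender <- data.frame(gender = c("Male", "Female"), Freq = c(490, 510))
pop_age    <- data.frame(agegrp = c("18-44", "45-64", "65+"), Freq = c(480, 330, 190))

raked <- rake(des, sample.margins = list(~gender, ~agegrp),
              population.margins = list(pop_gender, pop_age))
summary(weights(raked))   # all dimensions now match the control totals
```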
  57. Using the Final Combined Data from the Four Samples ➢ After all this was done we had a weighted dataset containing all the interviews from the original study and from the follow-up survey of nonresponders (n = 3,738) ➢ This allowed us to generate two types of population estimates for key behavioral variables in the study: o estimates based on the original survey only o estimates based on the combined surveys ➢ We then compared the differences between the two estimates to determine whether or not a statistic was materially different when the data from the nonresponse follow-up survey were taken into account
  58. An Example of a Material Difference in a Key Statistic ➢ Percent of parents/guardians reporting having a child who visited a zoo/aquarium in the past 30 days. Original survey: yes, did visit, 19.2%; estimated number in the USA, 14.45M
  59. An Example of a Material Difference in a Key Statistic ➢ Percent of parents/guardians reporting having a child who visited a zoo/aquarium in the past 30 days ➢ Original survey: 19.2% (est. 14.45M in the USA); combined surveys: 27.3% (est. 20.55M) ➢ An 8.1 percentage point difference ➢ An estimated 6.1M difference
  60. The Extent of Differences Identified by the NR Follow-up Study ➢ For 10 of the 28 behavioral measures, the percentage found in the original survey differed by more than two percentage points (2 pp) from the percentage found in the combined survey dataset ➢ Of these, six of the behavioral measures differed by more than five percentage points (5 pp) ➢ All of these measures were for estimates associated with the reported behavior of children (where the adult respondent served as a proxy for her/his child) and none of them were for estimates of the reported behavior of the adult respondent. Nonresponse bias exists at the level of the individual measure, and here we found evidence of several measures that likely were highly biased
  61. Response Propensity Modelling – Non-response Adjustments. Dina Neiger
  62. Acknowledgement. Analysis by Andrew Ward, Principal Statistician, Social Research Centre. Thanks to Dr Siu-Ming Tam and Mr Paul Schubert, ABS, for their collaboration on this work and their kind permission to use the survey data for this presentation. Slides based on a presentation by Andrew Ward at the Australian Statistical Conference in December 2016.
  63. Community Trust in Statistics Survey. Determined public awareness of, and trust in, official statistics. Dual-frame phone survey. Also available for respondents and non-respondents: part of state (based on the landline prefix) or mobile. Collected info from ~727 refusals: age, sex, awareness, trust.
  64. Challenge ➢ Incorporation of refusal information in the non-response adjustment. Probability of response derived from a propensity model. Base weight = design weight / probability of response. Limited auxiliary information available for respondents and non-respondents.
  65. Non-response adjustment. Awareness / trust of official statistics, non-respondents (%) vs respondents (%):
     o Have heard of and trust a great deal: 12.4 vs 20.5
     o Have heard of and tend to trust: 28.5 vs 54.6
     o Have heard of and tend to distrust: 6.2 vs 10.5
     o Have heard of and distrust a great deal: 4.8 vs 2.2
     o Have heard of but DK / refused trust: 26.3 vs 2.9
     o Total awareness: 78.2 vs 90.7
     o Have not heard of: 17.1 vs 9.0
     o Don't know / refused: 4.8 vs 0.4
  66. Non-response adjustment (cont'd). A logistic regression model predicts the response probability from information available for both respondents and non-respondents: location, age, sex, awareness of official statistics, trust in official statistics. This gives a boost to the respondents who are most like the non-respondents (a minimal sketch follows below).
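A minimal sketch in R of this propensity-based non-response adjustment; the data and variables are simulated stand-ins for the age, sex, awareness and trust information collected from both respondents and refusals.

```r
# Simulated response-propensity adjustment: model the probability of
# responding from variables observed for respondents and refusals alike,
# then divide the design weight by the predicted propensity.
set.seed(3)
dat <- data.frame(
  responded = rbinom(1500, 1, 0.6),                          # 1 = respondent, 0 = refusal
  age_grp   = sample(c("18-34", "35-54", "55+"), 1500, TRUE),
  sex       = sample(c("M", "F"), 1500, TRUE),
  aware     = rbinom(1500, 1, 0.8),                          # aware of official statistics
  design_wt = runif(1500, 1, 3)
)
fit <- glm(responded ~ age_grp + sex + aware, data = dat, family = binomial)
dat$p_resp  <- predict(fit, type = "response")  # estimated response propensity
dat$base_wt <- dat$design_wt / dat$p_resp       # base weight = design wt / Pr(response)
head(subset(dat, responded == 1))               # only respondents carry the weight
```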
  67. Non-response adjustment (cont'd). Awareness / trust: unweighted % / without propensity weight % / with propensity weight % (and the with-minus-without difference):
     o Have heard of and trust a great deal: 20.5 / 15.3 / 14.3 (-1.0)
     o Have heard of and tend to trust: 54.6 / 52.6 / 47.5 (-5.1)
     o Have heard of and tend to distrust: 10.5 / 11.1 / 9.9 (-1.2)
     o Have heard of and distrust a great deal: 2.2 / 2.4 / 2.9 (+0.5)
     o Have heard of but DK / refused trust: 2.9 / 3.4 / 7.5 (+4.1)
     o Total awareness: 90.7 / 84.8 / 82.1 (-2.7)
     o Have not heard of: 9.0 / 14.5 / 17.1 (+2.6)
     o Don't know / refused: 0.4 / 0.7 / 0.9 (+0.2)
  68. Non-response adjustment (cont'd). Assumptions: o Refusals who provided basic information are representative of all non-contacts o Refusals and respondents answer the non-response items in the same way
  69. Non-response adjustment (cont'd). Has face validity, though caveats remain: for example, persons with university education are typically over-represented in RDD surveys, and therefore awareness estimates may have been inflated.
  70. Techniques to reduce bias in non-probability samples. Dina Neiger
  71. Acknowledgments. Data: The Social Research Centre; Cancer Council Victoria. Slides and ideas: Darren Pennay, Andrew C. Ward, Paul J. Lavrakas, Tina Petroulias, Sebastian Misson, Charles DiSogra, and the Inferences from Non-Probability Samples workshop participants, Paris 2017.
  72. Key background references ➢ Yeager, D. S., Krosnick, J. A., Chang, L., Javitz, H. S., Levendusky, M. S., Simpser, A. (2011). Comparing the Accuracy of RDD Telephone Surveys and Internet Surveys Conducted with Probability and Non-Probability Samples. Public Opinion Quarterly. ➢ DiSogra, C., Cobb, C., Chan, E., Dennis, J. M. (2011). Calibrating Non-Probability Internet Samples with Probability Samples Using Early Adopter Characteristics. Section on Survey Research Methods, JSM 2011. ➢ Valliant, R., Dever, J. A. (2011). Estimating propensity adjustments for volunteer web surveys. Sociological Methods & Research, 40(1), pp. 105-137. ➢ Terhanian, G., Bremer, J., Haney, C. (2014). A model-based approach for achieving a representative sample. ➢ Fahimi, M., Barlas, F. M., Thomas, R. K., Buttermore, N. (2015). Scientific Surveys Based on Incomplete Sampling Frames and High Rates of Nonresponse. Survey Practice, 8(5).
  73. Usual starting point ➢ Design weight: the inverse of the probability of selection o Probability sample: number of people in the household, number of landlines, etc. o Non-probability sample: unknown; usually given design weight = 1 ➢ Calibration (post-stratification, raking, RIM) o Uses external measures as "benchmarks" to adjust/weight (calibrate) the data to improve accuracy and force estimates to be consistent with other data sources o Applied to both probability and non-probability samples for key population demographics, e.g. gender, age, education, to reflect the population distribution o If a non-probability sample uses quotas for the same demographics then post-stratification will have minimal impact
  74. Selecting benchmarks. Benchmarks should be a known source of non-response bias and likely to have an effect on survey estimates. You must have information for each sample member individually and for the population as a whole ➢ Identical or as close as possible (e.g. the same survey question and the census). Stratified sample: ➢ Unequal probability of selection within population subgroups (markets / strata) ➢ Population totals within each subgroup/stratum must be included as part of the weights. Common variables in general population surveys: ➢ Telephone status (for dual-frame surveys) ➢ Gender ➢ Age by education ➢ Country of birth (Australia vs other English-speaking countries vs non-English-speaking countries) ➢ Geography (state, metro vs regional)
  75. Is your adjustment working? Key considerations: ➢ Variance (probability samples) and bias. Variance: ➢ Yours is one sample of many possible samples; how different would your result be if you happened to select a different random sample? ➢ If you use too many benchmarks (or the data are severely biased), weights will become extreme, increasing the variance of your estimate and decreasing your confidence in the accuracy of your result. Bias: ➢ The difference between your survey result and the true value ➢ If you ignore a known source of non-response bias then survey results will be further away from the truth ➢ You need to know the truth in order to measure bias ➢ The bias measurement dilemma: if you already know the truth, why bother with the survey? What about weighting efficiency? ➢ A measure of how much work a weight is doing (one common formulation is sketched below) ➢ What is an acceptable minimum? There is no standard ➢ It can be misleading, especially in non-probability or highly biased samples (e.g. weight = 1)
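The deck does not pin down a formula for weighting efficiency; Kish's effective-sample-size approximation is one common choice, sketched here in R with hypothetical weights.

```r
# Kish's approximation: one common (not the only) weighting efficiency measure.
weighting_efficiency <- function(w) {
  n_eff <- sum(w)^2 / sum(w^2)  # Kish effective sample size
  n_eff / length(w)             # 1 = equal weights; lower = more variable weights
}
set.seed(4)
w <- rlnorm(1000, sdlog = 0.5)  # hypothetical calibrated weights
weighting_efficiency(w)
```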
  76. Case study 1: The Online Panels Benchmarking Study ➢ Three surveys based on probability samples of the Australian population aged 18 years and over ➢ Five surveys of persons aged 18 years and over administered to members of non-probability online panels ➢ The survey questionnaire included a range of demographic questions and questions about health, wellbeing and use of technology ➢ The same questions were used across the eight surveys (a unified approach to questionnaire design to try to minimise mode effects) ➢ 9 minutes average interview length for online and telephone (12-page booklet) ➢ Fieldwork: Oct-Dec 2015 ➢ Data and documentation available from the Australian Data Archive https://www.ada.edu.au/ada/01329
  77. Results: substantive health characteristics (modal response). Percentage point distance from the benchmark, for the probability surveys (ABS, ANU Poll, DF RDD) and the non-probability panels (P1-P5):
     o Life satisfaction (8 out of 10), benchmark 32.6: ABS -2.0, ANU Poll -2.0, DF RDD 1.9; P1 -11.9, P2 -11.6, P3 -4.5, P4 -9.2, P5 -7.9
     o Psychological distress, Kessler 6 (Low), benchmark 82.2: ABS -10.6, ANU Poll -11.6, DF RDD -8.1; P1 -25.9, P2 -23.5, P3 -22.2, P4 -25.0, P5 -23.2
     o General health status, SF1 (Very good), benchmark 36.2: ABS 0.4, ANU Poll -2.0, DF RDD -2.6; P1 -4.1, P2 -5.8, P3 -5.3, P4 -5.0, P5 1.5
     o Private health insurance, benchmark 57.1: ABS 3.4, ANU Poll 1.9, DF RDD 3.3; P1 -8.9, P2 -12.5, P3 -3.7, P4 -0.6, P5 -2.6
     o Daily smoker, benchmark 13.5: ABS -4.1, ANU Poll 3.5, DF RDD 1.6; P1 9.8, P2 6.7, P3 3.9, P4 2.7, P5 4.3
     o Consumed alcohol in the last 12 months, benchmark 81.9: ABS 3.6, ANU Poll -2.8, DF RDD -4.0; P1 2.4, P2 5.3, P3 3.9, P4 4.2, P5 1.5
  78. Impact of weighting on average absolute error, i.e. the average difference (percentage points) across all benchmarks between the official statistics and the survey estimates:
     o Unweighted: ABS 4.28, ANU Poll 3.68, RDD 4.63; P1 9.34, P2 10.35, P3 7.28, P4 7.20, P5 6.41
     o Weighted: ABS 4.02, ANU Poll 3.98, RDD 3.58; P1 10.5, P2 10.9, P3 7.24, P4 7.78, P5 6.83
     o Impact: weighting improved ABS and RDD, slightly worsened ANU Poll, left P3 essentially unchanged, and worsened P1, P2, P4 and P5
  79. Model-based design weights
  80. Design weight: Non-probability sample ➢ Current approaches: o Best case scenario • Model-based selection to address known biases (e.g. quotas) o Worst case scenario • "Participation is a combination of idiosyncratic contacts, low response rates, and non-coverage. This is not the kind of design one would choose if there were an affordable alternative." (Douglas Rivers, comment on the 2013 AAPOR Task Force Report on Non-probability Sampling. J Surv Stat Methodol 2013; 1(2): 111-117. doi: 10.1093/jssam/smt009) ➢ Possible alternative: o Adapt the response propensity model to model the likelihood of being part of the non-probability sample o Needs a probability reference sample
  81. Probability reference sample ➢ A gold standard that we'd like the non-probability sample to resemble as closely as possible o Known to produce accurate estimates for key outcomes o Ideally includes data that can be compared to independent benchmarks ➢ Needs: o Common data items with the non-probability sample • As a minimum: demographics, attitudes, and the outcomes of interest o Comparable data • Mode (depending on the questionnaire) • Question wording • Reference timeframe
  82. Case study 2: LinA as a reference sample for the OPBS ➢ Life in Australia (LinA) o http://www.srcentre.com.au/our-research#life-in-aus o Australia's first and only probability-based online panel o Launched in December 2016 o 3,300 adults from across Australia, aged 18 years and over, covering the online and offline population, randomly recruited via a dedicated dual-frame telephone survey o Replicated the online panel benchmarking study in February 2016 ➢ OPBS non-probability panels reweighted: o Design weight = 1 o Post-stratification weight matched to the LinA weighting o Enrolment to vote added to the substantive characteristics
  83. LinA as a reference sample. Distance from benchmarks (percentage point difference from the benchmark value, %):
     o Life satisfaction (8 out of 10), benchmark 32.6: LinA -1.4; P1 -11.1, P2 -13.6, P3 -5.9, P4 -10.3, P5 -8.0
     o Psychological distress, Kessler 6 (Low), benchmark 82.2: LinA -21.3; P1 -26.5, P2 -25.5, P3 -22.7, P4 -25.1, P5 -23.4
     o General health status, SF1 (Very good), benchmark 36.2: LinA -3.8; P1 -4.2, P2 -4.2, P3 -4.0, P4 -5.3, P5 0.1
     o Private health insurance, benchmark 57.1: LinA 2.6; P1 -8.0, P2 -9.8, P3 -3.9, P4 0.1, P5 -2.0
     o Daily smoker, benchmark 13.5: LinA -1.0; P1 9.0, P2 6.0, P3 3.5, P4 2.2, P5 3.5
     o Consumed alcohol in the last 12 months, benchmark 81.9: LinA 2.7; P1 -1.1, P2 -6.4, P3 -3.6, P4 -4.9, P5 -1.7
     o Enrolled to vote, benchmark 78.5: LinA 9.0; P1 8.4, P2 7.6, P3 10.7, P4 8.8, P5 13.1
     o Average absolute difference: LinA 5.23; P1 8.54, P2 9.14, P3 6.79, P4 7.09, P5 6.48
  84. Back to model-based design weights ➢ Use the reference sample to calculate: o Propensity scores: the conditional probability that a respondent is in the non-probability sample rather than in the probability sample, given observed characteristics of the respondent • Logistic model (R PracTools package; Valliant et al. 2015) • Non-probability cases = 1, reference cases = 0 • Design weight for the non-probability sample: the inverse of the estimated probability of inclusion in the non-probability sample, given the weighted reference sample • Beware: extreme weights, and the size and quality of the reference sample ➢ Result: o Probability-based design weights for the non-probability sample o Now ready to look at calibration/weighting adjustment methods (a simplified sketch follows below)
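A simplified sketch of the stacked-sample propensity step in R. The slides cite the PracTools package (Valliant et al. 2015); plain glm() is used here as a stand-in, and all data, covariates and weights below are invented for illustration.

```r
# Stack the non-probability sample (z = 1) on the weighted reference sample
# (z = 0), model membership, and invert the fitted propensity.
set.seed(5)
stacked <- data.frame(
  z      = rep(c(1, 0), times = c(2000, 1000)),
  age    = c(rnorm(2000, 42, 15), rnorm(1000, 47, 17)),    # hypothetical covariate
  ea_tot = c(rbinom(2000, 5, 0.5), rbinom(1000, 5, 0.3)),  # early adopter score, 0-5
  wt     = c(rep(1, 2000), runif(1000, 0.5, 2))            # reference sample weights
)
# quasibinomial avoids warnings about non-integer weighted responses
fit <- glm(z ~ age + ea_tot, data = stacked, family = quasibinomial, weights = wt)

# Pseudo design weight: inverse of the estimated inclusion propensity.
# Beware extreme weights; trimming or propensity classes are common remedies.
p_incl <- predict(fit, type = "response")[stacked$z == 1]
pseudo_design_wt <- 1 / p_incl
summary(pseudo_design_wt)
```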
  85. Calibration
  86. No agreed approach – work in progress ➢ Starting point: known biases in the non-probability sample ➢ For example, for online panels: o Heavier internet users o Heavier media users o More interested in technology o Early adopter skew o More health care card holders ➢ Common theme: o Standard demographics are not enough o High-quality "official" benchmarks are not available ➢ Probability reference sample to the rescue! o As long as items that are known/suspected biases in the non-probability sample are included
  87. Case study 2 ➢ Significant differentiators: o Early adopter (EA) variables o Internet usage variables o Income o Employment o Remoteness o Home ownership o Media consumption variables were not included in the survey ➢ Incorporate one additional variable at a time and compare with benchmarks to evaluate the impact on bias
  88. Options for EA inclusion in calibration ➢ Each EA variable is included as a 0/1 variable where o 1 means agreed or strongly agreed with that statement and o 0 all else (disagreed or strongly disagreed, did not respond) ➢ Scale derived using a Rasch model o Bond, T. G. and C. M. Fox (2007). Applying the Rasch model: Fundamental measurement in the human sciences (2nd ed.). Mahwah, N.J.: Erlbaum. ➢ Categorical scale o None: did not agree or strongly agree with any of the 5 statements o Some: agreed or strongly agreed with at least one and no more than 2 of the 5 statements o High: agreed or strongly agreed with 3 or more of the 5 statements ➢ Agreement total calculated as the number of statements (min 0, max 5) that the respondent agreed or strongly agreed with (a construction sketch follows below)
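A construction sketch in R of the categorical scale and agreement total from five hypothetical 0/1 EA indicators; the Rasch scale would need an IRT package, so it is only noted in a comment.

```r
# Five hypothetical agree/strongly-agree indicators, ea1-ea5 (0/1).
set.seed(6)
ea <- as.data.frame(matrix(rbinom(5000, 1, 0.4), ncol = 5,
                           dimnames = list(NULL, paste0("ea", 1:5))))

ea$total <- rowSums(ea[, 1:5])                      # agreement total, 0-5
ea$categ <- cut(ea$total,                           # categorical scale from the slide
                breaks = c(-1, 0, 2, 5),            # None = 0, Some = 1-2, High = 3-5
                labels = c("None", "Some", "High"))
table(ea$categ)
# A Rasch-model scale (Bond & Fox 2007) could be fitted with an IRT package
# such as eRm or ltm; omitted here to keep the sketch dependency-free.
```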
  89. Options for EA inclusion in calibration: the composite measures outperform the individual EA variables. [Chart: impact of EA variables on bias (% change) for the individual variables and for the Rasch scale, categorical scale and agreement total composites, showing average absolute error compared to unweighted, average absolute error compared to the standard adjustment, and average RMSE compared to the standard adjustment.]
  90. Internet usage measures: not useful when calibrating online to online. [Chart: impact of internet usage measures on bias (% change) for several combinations (look for information; access at home plus look for information; look for information, post to blogs, financial transactions and social media; access at home plus frequency of use), on the same three error comparisons.]
  91. Socio-economic variables in calibration: income is the single most influential variable for reducing bias and RMSE, and there is benefit from including both income and employment. [Chart: impact on bias (% change) of adding income; income and employment; income, employment and home ownership; and income, employment, home ownership and remoteness, on the same three error comparisons.]
  92. Problem solved
  93. Impact of calibration (LinA: income, employment, EA total agreement score). The benefit of adjustment is sample dependent, and standard adjustments do not help with bias reduction. Average absolute error (unweighted / std adjustments / design weights and key differentiators), then % change in bias (from unweighted / from std adjustment):
     o Panel 1: 9.2 / 9.8 / 7.8; -15.1 / -19.8
     o Panel 2: 10.0 / 10.4 / 9.3; -7.2 / -10.9
     o Panel 3: 7.7 / 7.7 / 5.3; -31.0 / -31.5
     o Panel 4: 7.4 / 8.1 / 7.1; -3.3 / -11.9
     o Panel 5: 7.4 / 7.4 / 7.8; 5.4 / 5.7
     o All panels: 8.3 / 8.7 / 7.5; -10.4 / -13.9
  94. Comparison with independent benchmarks vs reference sample benchmarks: mostly the change in bias is consistent, but there can be large differences for some samples.
     o % change in bias from unweighted (independent benchmarks vs reference sample): Panel 1: -15.1 vs -26.9; Panel 2: -7.2 vs 5.7; Panel 3: -31.0 vs -31.7; Panel 4: -3.3 vs 28.7; Panel 5: 5.4 vs 1.0
     o % change in bias from std adjustment (independent benchmarks vs reference sample): Panel 1: -19.8 vs -30.6; Panel 2: -10.9 vs 0.6; Panel 3: -31.5 vs -31.5; Panel 4: -11.9 vs -12.4; Panel 5: 5.7 vs -8.3
  95. What about blending? ➢ Adding a probability sample to a non-probability sample will further decrease the bias ➢ Confirmed that: o Combining probability and non-probability data greatly reduces variability across panels and has a much larger impact than calibration alone o Use the best probability sample available to combine with the non-probability sample, regardless of mode and response rate ➢ But… ➢ Costs and feasibility of running a parallel probability sample: o Push-to-web: costs (incentives, follow-up), timeliness o Listed vs RDD mobile: an option for blending
