How NOT to Aggregate Polling Data 3rd Socio-Cultural Data Summit National Defense University Nov. 27, 2012 Patrick Moynihan, Ph.D. Survey Methodologist Office of Opinion Research Bureau of Intelligence and Research U.S. Department of State firstname.lastname@example.org
Presentation vs. Invitation Presentations: Not always the optimal format for learning But offers an opportunity to connect across groups, meet individuals from different social networks, different backgrounds and different challenges in their work It’s been said surveys benefit from a collaborative environment – and this summit allows us to bridge into networks we might not otherwise have reason to broach
Don’t Reinvent the Wheel: Survey Resources on the Web Professional/academic associations American Association for Public Opinion Research (AAPOR) National Council on Public Polls (NCPP) American Statistical Association (ASA) Section on Survey Research Methods Materials on professional standards, best practices, guidelines on survey administration, elements required for full disclosure, webinars
Don’t Reinvent the Wheel: Survey Resources on the Web Question searches and indices Roper Center’s iPoll Pew Research Center Gallup Organization General Social Survey Often useful to see how others are asking questions about satisfaction, awareness, confidence, knowledge and so on
Don’t Reinvent the Wheel: Survey Resources on the Web Vast survey research literature to inform our survey projects – from sampling to measurement to nonresponse AAPOR’s “Public Opinion Quarterly” (POQ) AAPOR’s online “Survey Practice” “International Journal of Public Opinion Research” “Journal of Official Statistics” Fowler’s concise manuals: “Survey Research
Don’t Reinvent the Wheel (unless the wheel is broken!) Just because an individual question or entire survey is in the public domain DOES NOT mean it’s high quality! Check methodological details before use Even if high quality – which is a BIG IF – ask: Will it work now, as opposed to when it was originally fielded? Will it work with the population I’m interested in, as opposed to the population in which the item was originally fielded? Will it be applicable to the specific issues I’m interested in, as opposed to those concerning the original researchers?
Numbers ≠ High-Quality Data: Selection Matters More Than Size
Headlines: Poll Aggregation “This relatively accurate polling data provided the raw material for the second group of election pioneers: poll analysts like Nate Silver, who writes the FiveThirtyEight blog for The New York Times, as well as Simon Jackman at Stanford, Sam Wang at Princeton and Drew Linzer at Emory University. “What do poll analysts do? They are like the meteorologists who forecast hurricanes. Data for meteorologists comes from satellites and other tracking stations; data for the poll analysts comes from polling companies. The analysts’ job is to take the often conflicting data from the polls and explain what it all means.”
Challenge: Poll Aggravation Quality assessments of data Empirical basis to claim biases across polls negate each other Limited number of variables often aggregated (e.g., horserace numbers); restricts what can be said about what the public thinks, feels, values Good polling more than forecasting a number
Challenge: Poll Aggravation Aggregation steamrolls nuance, which can provide understanding of how publics make distinctions on issues, policies, candidates Question wording matters! Aggregation suggests there is a single number that best represents public opinion at any one time and that number is extremely precise We know social science isn’t so precise!
International Polling Coverage error exists across countries – but at different rates using different methodologies Must always check for coverage in all polls – international or not, telephone or not Consider the ‘09 Pew Global Attitudes Project, including 25 countries from a highly regarded polling organization
International Polls (con’t) Note that of the 25 nations in the Pew 2009 poll, four nation samples are described as “disproportionately urban”: Brazil, China, India and Pakistan But how much noncoverage does that amount to?
International Polls (con’t) Percent noncoverage: China: 58 percent; Brazil: 56 percent; India: 39 percent; Pakistan: 10 percent. We wouldn’t accept a disproportionately urban sample to represent the United States, so we shouldn’t for other countries! But wouldn’t President Kerry have loved it?
International Polls (con’t) Practical thinking on coverage: A key part of evaluating any sampling scheme is determining the percentage of the population one wants to describe that has a chance of being selected and the extent to which those excluded are distinctive. That is, percent noncoverage and degree of difference between those excluded from the frame and those included. Very often a researcher must make a choice between an easier or less expensive way of sampling a population that leaves out some people and a more expensive strategy that is also more comprehensive. The issues of schedule and budget again creep into our design considerations!
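The two ingredients named here, percent noncoverage and the difference between excluded and included groups, combine in the standard coverage-bias identity: the bias of an estimate based only on the covered population equals the noncoverage rate times the gap between the covered and excluded means. A minimal Python sketch follows. The noncoverage rates are the ones cited for the Pew 2009 samples; the 10-point gap between covered and excluded respondents is a purely hypothetical assumption for illustration, not a figure from any poll.

```python
# Noncoverage rates as cited for the "disproportionately urban" Pew 2009 samples.
NONCOVERAGE = {"China": 0.58, "Brazil": 0.56, "India": 0.39, "Pakistan": 0.10}

def coverage_bias(noncoverage_rate, covered_mean, excluded_mean):
    """Bias of the covered-population mean relative to the full-population mean."""
    return noncoverage_rate * (covered_mean - excluded_mean)

# HYPOTHETICAL values: 60% of covered respondents hold some opinion,
# versus 50% of those left out of the sampling frame.
HYPOTHETICAL_COVERED = 0.60
HYPOTHETICAL_EXCLUDED = 0.50

biases = {
    country: coverage_bias(rate, HYPOTHETICAL_COVERED, HYPOTHETICAL_EXCLUDED)
    for country, rate in NONCOVERAGE.items()
}

for country, bias in biases.items():
    print(f"{country}: estimate overstated by {bias * 100:.1f} points")
```

With the same assumed gap, the bias scales directly with the noncoverage rate, which is why a frame that leaves out more than half the population deserves far more scrutiny than one that leaves out a tenth.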
The Sum Is Less Than Its Parts Aggregation to drive up one’s sample size (smaller MOE, seemingly more scientific and precise) and concisely characterize “world opinion” would be wrongheaded in this case – and Pew smartly avoids such pitfalls (though not all polling groups do) VERY careful analysis might be able to piece these varied polls together – but it’d require far more than simply averaging the numbers!
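The arithmetic behind this caution can be sketched in Python. The margin-of-error formula is the usual 95 percent one for a proportion under simple random sampling; the number of polls, the sample sizes and the shared 3-point bias are all hypothetical, chosen only to show that pooling shrinks the MOE while leaving a bias common to every poll untouched.

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error for a proportion under simple random sampling."""
    return z * math.sqrt(p * (1 - p) / n)

# HYPOTHETICAL setup: 25 polls of 1,000 interviews each,
# all sharing the same +3-point coverage bias.
n_per_poll, n_polls = 1000, 25
true_p, shared_bias = 0.50, 0.03
observed_p = true_p + shared_bias  # what every poll tends to report

single_moe = margin_of_error(observed_p, n_per_poll)
pooled_moe = margin_of_error(observed_p, n_per_poll * n_polls)

print(f"single poll MOE: +/-{single_moe * 100:.1f} points")
print(f"pooled MOE:      +/-{pooled_moe * 100:.1f} points")
print(f"error of pooled estimate: {abs(observed_p - true_p) * 100:.1f} points")
```

In this sketch the pooled MOE falls well below the shared bias, so the aggregate looks more precise than any single poll while being exactly as wrong.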
Afterthoughts Poll aggregation is innovative and some of what we might encounter in the future isn’t necessarily difficulty (though there is some) but rather density of numbers One problem with poll aggregation (and polling more broadly) isn’t that there’s too much going on but that the abundance is often clumsily handled, so it feels crowded and confused rather than illuminating and textured
Afterthoughts II An essential feature of polling is representativeness, a feature of high-quality survey research typically using probability sampling Falling short of this goal, we should be wary of results from single polls or polling aggregated using nonprobability methods This requires us to be educated consumers of survey methodology!
Survey Research Essentials High-quality methodology requires the application of “best practices” concerning: Coverage error Sampling error Non-response error Good, fair questions with reasonable response options – that is, minimize measurement error Stay within your data when presenting results Are results significant statistically? Are results practically significant?
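The distinction between the last two questions can be illustrated with a two-proportion z-test. This is a minimal Python sketch that assumes two independent simple random samples and ignores design effects and weighting; the percentages and sample sizes are hypothetical.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """z statistic for the difference between two independent sample proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# HYPOTHETICAL results: 52% vs. 50% approval in two independent samples.
small = two_proportion_z(0.52, 1000, 0.50, 1000)
large = two_proportion_z(0.52, 20000, 0.50, 20000)

for label, z in (("n=1,000 each", small), ("n=20,000 each", large)):
    verdict = "significant" if abs(z) > 1.96 else "not significant"
    print(f"{label}: z = {z:.2f} ({verdict} at the 95% level)")
```

The same 2-point gap fails the test with the smaller samples and passes it with the larger ones. Whether a 2-point gap matters for the decision at hand is a separate, substantive judgment that no test statistic answers.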
Transparency/Full Disclosure To include in your own survey research project, or to ask for when evaluating another’s survey: Detailed description of the methodology Coverage, sampling, field protocols, non-response, weighting Full questionnaire To evaluate wordings, response options and question order Overall results to each question So you can evaluate the response distributions for yourself The final report or analysis of data To evaluate how the results are characterized Sponsorship
Total Survey Error Approach Considering potential sources of error and determining how to minimize them, within the context of budget and scheduling constraints, is a challenge Knowing the potential pitfalls in advance and having some sense of how to overcome them should significantly improve the quality of
THANK YOU! Patrick Moynihan, Ph.D. Office of Opinion Research U.S. Department of State email@example.com 202-736-4380