Probability Sampling and Alternative Methodologies


Gary Langer's Oct. 2012 presentation to the National Science Foundation on the future of survey research. Discusses the limitations of emerging approaches to public opinion research (such as opt-in online panels and social media analysis).

Speaker Notes

  • After 1948, government, academics and pollsters essentially all came to a consensus to adopt probability sampling…
  • Note that neither availability nor quota sampling meets these criteria.
  • We can probably consolidate the response-rate (RR) discussion a bit, since Krosnick says someone else is dealing with it. Will do.
  • Average absolute error (unweighted): 3.3 and 3.5 points in probability samples vs. 4.9 to 9.9 points in convenience samples. Highest single error (weighted): 9 points in probability samples, 18 points in convenience samples.
  • Primary demographics = sex, age, race/ethnicity, education, and region. Averaging across them, the error is 2.6% for probability samples vs. 6.3% for internet samples.
  • Secondary demographics = marital status, total number of people living in the household, employment status, number of bedrooms in the home, number of vehicles owned, home ownership and household income. Non-demographic questions = frequency of smoking cigarettes, whether they have ever had 12 drinks of alcohol during their lifetimes, the average number of drinks of alcohol they have on days when they drink, ratings of the quality of their health, and possession of a U.S. passport and a driver's license.
  • These are betas from a multiple logistic regression. Bold indicates that the betas are significantly different across samples. So depending on which sample you use, you get a different list of top predictors. Even where the predictors are not significantly different, they appear in a different order and so tell a different story.
  • This shows it a different way. Here are the predictors that significantly differ by sample type. Again, these are betas from a logistic regression.
  • Instances where the two samples told very different stories about change over time. The top three graphs show significant negative correlations between the two over-time lines. There were cases where the data did correlate positively, so it's important to mention that this discordance wasn't always the case. Even so, the divergence is striking!
  • We can discuss Google's attempt to impute demographic information on the last point. I can make a full slide about it if you'd like.
  • Topic: Salvia. This shows that the computer programs were pretty good at matching neutral tweets (well, at least the first one was) but miserable at matching positive and negative tweets, which, presumably, are the more interesting ones to know about.
  • Google Trends computes how many searches on a particular topic (i.e., for selected search terms) have been done relative to the total number of Google searches during that time period (see the sketch after these notes).
  • Potentially promising for trend-spotting, modeling or qualitative analysis.
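The relative-volume idea in the Google Trends note above can be made concrete with a minimal Python sketch. The function and inputs are hypothetical: Google publishes only the resulting 0-100 index (peak period pinned at 100), not raw counts.

```python
def trends_index(topic_counts, total_counts):
    """Scale each period's topic share of all searches to a 0-100 index,
    pinning the busiest period at 100 (a reconstruction of the relative-
    volume idea described above; function and inputs are hypothetical)."""
    shares = [topic / total for topic, total in zip(topic_counts, total_counts)]
    peak = max(shares)
    return [round(100 * share / peak) for share in shares]

# Toy weekly counts: searches on the topic vs. all searches that week.
print(trends_index([120, 300, 180], [1_000_000, 1_200_000, 900_000]))
# -> [48, 100, 80]
```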

Presentation Transcript

  • Probability Sampling and Alternative Methodologies
    "The Future of Survey Research"
    National Science Foundation, Oct. 4, 2012
    Gary Langer, Langer Research Associates
    glanger@langerresearch.com | @LangerResearch
  • Good Data
    • Are powerful and compelling
    • Rise above anecdote
    • Sustain precision
    • Expand our knowledge, enrich our understanding and inform our judgment
  • Other Data
    • May be manufactured to promote a product, client or point of view
    • Are easily manipulated
    • Are increasingly prevalent (cheaply produced via Internet, e-mail)
    • Leave the house of inferential statistics
    • Lack the validity and reliability we seek
    • Often mis-ask and misanalyze
    • Misinform, even disinform, our judgment
  • The Challenge
    • Subject established methods to rigorous and ongoing evaluation
    • Subject new methods to the same standards
    • Value theoreticism as much as empiricism
    • Consider fitness for purpose
    • Go with the facts, not with the fashion
  • Survey Sampling: a brief history…
  • Sample Techniques pre-1936
    • Full population (census survey)
      • Pros: High level of accuracy
      • Cons: Prohibitively expensive and highly challenging (unless the population is small)
    • "Availability" sampling (e.g., straw polls)
      • Pros: Quick and inexpensive
      • Cons: No scientific basis, no theoretical justification for generalizing beyond the sample
      • Justification: "Good enough" – Literary Digest correctly predicted presidential election winners from 1916 to 1932
  • The 1936 Election: Roosevelt (D) vs. Landon (R)
    • Literary Digest poll
      • Technique: An availability sample
      • Sampling frame: Combination of magazine subscribers, phone books and car registration lists
      • Contacted 10 million; 2.4 million responded
    • Found an easy win for Landon. Roosevelt won in a landslide.
    • What happened?
      • Coverage bias: The sampling frame differed in important ways from the population of interest. Republicans were more likely to afford magazine subscriptions, telephones and cars during the Great Depression.
      • Systematic nonresponse: Republicans were disproportionately likely to return their ballots (perhaps to express discontent with the New Deal).
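The coverage-bias mechanism described on this slide is easy to demonstrate with a toy simulation: a massive straw poll drawn from a frame that under-covers one candidate's voters stays badly biased, while a far smaller probability sample from the full population lands near the truth. The 62% Roosevelt share roughly matches the actual 1936 two-party result; the frame-inclusion rates are invented for illustration.

```python
import random

random.seed(1936)

# Toy electorate: 62% Roosevelt (roughly the actual 1936 two-party split).
N = 1_000_000
population = [1] * int(N * 0.62) + [0] * int(N * 0.38)  # 1 = Roosevelt voter

# Coverage bias (invented rates): suppose Roosevelt voters are only half as
# likely as Landon voters to own the phones/cars/subscriptions in the frame.
frame = [v for v in population if random.random() < (0.15 if v else 0.30)]

straw = random.sample(frame, 100_000)   # a huge poll from the biased frame
srs = random.sample(population, 1_000)  # a small true probability sample

print(f"True Roosevelt share:     {sum(population) / N:.1%}")
print(f"Straw poll (n = 100,000): {sum(straw) / len(straw):.1%}")
print(f"SRS (n = 1,000):          {sum(srs) / len(srs):.1%}")
```

Sample size is no cure for a biased frame: the straw poll's error here is driven entirely by who could be reached, not by how many responded.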
  • On to quota sampling
    • After 1936, availability sampling largely was replaced with quota sampling (which had correctly predicted FDR's win).
    • Quota sampling essentially attempts to build a "miniature" version of the target population.
      • It identifies supposed key demographic characteristics in the population and attempts to mirror them in the sample.
      • Matching the sample to the population on variables such as gender, race, age, income and others was said to produce a "representative" sample.
  • How'd that work out?
  • What went wrong?
    1. Timing: Pollsters stopped collecting data in mid-October, assuming that there would be no late-breaking changes.
    2. Faulty assumptions: Pollsters erroneously assumed that undecided voters would vote in proportion with those who had made up their minds. Likely voters also may have been misidentified.
    3. Sampling problems: Quota sampling may have allowed interviewers to select more-educated and better-off respondents within their assigned quotas – pro-Dewey groups. This is a key potential flaw in purposive sampling.
  • Quota Sampling Issues
    • Even if quota sampling were not entirely to blame, the Dewey-Truman fiasco highlighted its inherent limitations:
      • It's impossible to anticipate, let alone match, every potentially important demographic. E.g., matching on sex (2), ethnicity (2), race (5), education (4), age (5), region (4) and income (6) yields 9,600 cells – and that's if you want to interview only one person per cell…
      • Respondent selection is left up to the interviewers, allowing human bias to sneak in (e.g., picking easy-to-reach respondents).
      • As with availability sampling, there is no theoretical justification for drawing inferences about the greater population.
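The cell arithmetic in the slide above is just a product of category counts; a one-liner confirms it:

```python
from math import prod  # Python 3.8+

# Category counts from the slide above.
categories = {"sex": 2, "ethnicity": 2, "race": 5, "education": 4,
              "age": 5, "region": 4, "income": 6}
print(prod(categories.values()))  # 9600 quota cells, before a second interview per cell
```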
  • Probability Sampling
  • Not a new concept
    A "criterion that a sample design should meet (at least if one is to make important decisions on the basis of the sample results) is that the reliability of the sample results should be susceptible of measurement. An essential feature of such sampling methods is that each element of the population being sampled … has a chance of being included in the sample and, moreover, that that chance or probability is known."
    Hansen and Hauser, POQ, 1945
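A minimal sketch of what a "known probability" of inclusion buys. Under simple random sampling of n elements from a frame of N, every element's inclusion probability is n/N by design, which is what licenses design-based estimation (here, a Horvitz-Thompson estimate of a population total) and measurable sampling error. The frame and measured variable are toy stand-ins.

```python
import random

population = list(range(10_000))  # a frame of N identifiable elements
N, n = len(population), 500

sample = random.sample(population, n)  # SRS without replacement

# The design fixes every element's inclusion probability in advance:
pi = n / N  # 0.05 for all elements under SRS

# That known probability is what licenses design-based estimation, e.g. a
# Horvitz-Thompson estimate of a population total for some measured variable:
y = {unit: 2 * unit for unit in population}  # toy measurement
ht_total = sum(y[unit] / pi for unit in sample)
print(round(ht_total), sum(y.values()))  # estimate vs. true total
```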
  • This is really not a new concept
    "Diagoras, surnamed the Atheist, once paid a visit to Samothrace, and a friend of his addressed him thus: 'You believe that the gods have no interest in human welfare. Please observe these countless painted tablets; they show how many persons have withstood the rage of the tempest and safely reached the haven because they made vows to the gods.'
    "'Quite so,' Diagoras answered, 'but where are the tablets of those who suffered shipwreck and perished in the deep?'"
    "On the Nature of the Gods," Marcus Tullius Cicero, 45 B.C. (cited by Kruskal and Mosteller, International Statistical Review, 1979)
  • A shared view
    • "…the stratified random sample is recognized by mathematical statisticians as the only practical device now available for securing a representative sample from human populations…" Snedecor, Journal of Farm Economics, 1939
    • "It is obvious that the sample can be representative of the population only if all parts of the population have a chance of being sampled." Tippett, The Methods of Statistics, 1952
    • "If the sample is to be representative of the population from which it is drawn, the elements of the sample must have been chosen at random." Johnson and Jackson, Modern Statistical Methods, 1959
    • "Probability sampling is important for three reasons: (1) Its measurability leads to objective statistical inference, in contrast to the subjective inference from judgment sampling. (2) Like any scientific method, it permits cumulative improvement through the separation and objective appraisal of its sources of errors. (3) When simple methods fail, researchers turn to probability sampling…" Kish, Survey Sampling, 1965
  • Today: Internet Opt-Ins
  • Copyright 2003 Times Newspapers Limited. Sunday Times (London), January 26, 2003, Sunday. SECTION: Features; News; 12. LENGTH: 312 words.
    HEADLINE: Blair fails to make a case for action that convinces public
    BODY: Tony Blair has failed to convince people in Britain of the need for war with Iraq, a poll for The Sunday Times shows. Even among Labour supporters, the prime minister has not yet made the case. The poll of nearly 2,000 people by YouGov shows that only 26% say Blair has convinced them that Saddam Hussein is sufficiently dangerous to justify military action, against 68% who say he has not done so.
  • -----Original Message-----
    From: Students SM Team [mailto:alumteam@teams.com]
    Sent: Wednesday, October 04, 2006 11:27 AM
    Subject: New Job opening
    Hi, Going to school requires a serious commitment, but most students still need extra money for rent, food, gas, books, tuition, clothes, pleasure and a whole list of other things. So what do you do? "Find some sort of work", but the problem is that many jobs are boring, have low pay and rigid/inflexible schedules. So you are in the middle of mid-terms and you need to study but you have to be at work, so your grades and education suffer at the expense of your "College Job". Now you can do flexible work that fits your schedule! Our company and several nationwide companies want your help. We are looking to expand, by using independent workers we can do so without buying additional buildings and equipment. You can START IMMEDIATELY! This type of work is Great for College and University Students who are seriously looking for extra income! We have compiled and researched hundreds of research companies that are willing to pay you between $5 and $75 per hour simply to answer an online survey in the peacefulness of your own home. Thats all there is to it, and theres no catch or gimmicks! Weve put together the most reliable and reputable companies in the industry. Our list of research companies will allow you to earn $5 to $75 filling out surveys on the internet from home. One hour focus groups will earn you $50 to $150. Its as simple as that. Our companies just want you to give them your opinion so that they can empower their own market research. Since your time is valuable, they are willing to pay you for it. If you want to apply for the job position, please email at: job2@alum.com
    Students SM Team
  • -----Original Message-----
    From: Ipsos News Alerts [mailto:newsalerts@ipsos-na.com]
    Sent: Friday, March 27, 2009 5:12 PM
    To: Langer, Gary
    Subject: McLeansville Mother Wins a Car By Taking Surveys
    Toronto, ON – McLeansville, NC native Jennifer Gattis beats the odds and wins a car by answering online surveys. Gattis was one of over 105,300 North Americans eligible to win. Representatives from Ipsos i-Say, a leading online market research panel, will be in Greensboro on Tuesday, March 31, 2009 to present Gattis with a 2009 Toyota Prius.
    Access the full press release at: http://www.ipsos-na.com/news/pressrelease.cfm?id=4331
  • Opt-in online panelist: 32-year-old Spanish-speaking female African-American physician residing in Billings, MT
  • Professional Respondents?
    Among the 10 largest opt-in panels:
    • 10% of panel participants account for 81% of survey responses;
    • 1% of participants account for 34% of responses.
    Gian Fulgoni, chairman, comScore, Council of American Survey Research Organizations annual conference, Los Angeles, October 2006
  • Questions Who joins the club, how and why? What verification and validation of respondent identities are undertaken? What logical and QC checks (duration, patterning, data quality) are applied? What weights are applied, and how? On what theoretical basis and with what effect? What level of disclosure is provided? What claims are made about the data, and how are they justified?
  • One Claim: Google Consumer Surveys
    • "Produces results that are as accurate as probability-based panels." http://www.google.com/insights/consumersurveys/how
    • 1- or 2-question pop-ups delivered to search-engine users of "premium content."
    • Demographic data are imputed through analysis of users' IP addresses and previous page views.
    • Among Langer Research staff, one woman was identified as a 55-year-old male with an interest in beauty and fitness; another as a 65+ woman (double her actual age) with an interest in motorcycles; and one man as a senior citizen in the Pacific Northwest with an interest in space technology. https://www.google.com/settings/ads/onweb/
  • Another claim: "Bayesian Credibility Interval"
    • Ipsos opt-in online panel – presidential tracking poll
    • The computation of the "credibility interval" appears to match how you'd compute a simple-random-sample margin of error.
    • Its basis is "the opinion of Ipsos experts."
    • (Headdesk)
  • Further claims: Convenience-Sample MoE
    • Zogby Interactive: "The margin of error is +/- 0.6 percentage points."
    • Ipsos/Reuters: "The margin of error is plus or minus 3.1 percentage points."
    • Kelton Research: "The survey results indicate a margin of error of +/- 3.1 percent at a 95 percent confidence level."
    • Economist/YouGov/Polimetrix: "Margin of error: +/- 4%."
    • PNC/HNW/Harris Interactive: "Findings are significant at the 95 percent confidence level with a margin of error of +/- 2.5 percent."
    • Radio One/Yankelovich: "Margin of error: +/- 2 percentage points."
    • Citi Credit-ED/Synovate: "The margin of error is +/- 3.0 percentage points."
    • Spectrem: "The data have a margin of error of plus or minus 6.2 percentage points."
    • Luntz: "+3.5% margin of error"
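Each figure quoted above is, or appears to be, the textbook margin of error for a simple random sample, a formula whose validity rests on known selection probabilities. A sketch of that computation (the 26,000-case run is an illustrative large-panel n, not any vendor's actual sample size):

```python
from math import sqrt

def srs_margin_of_error(n, p=0.5, z=1.96):
    """Textbook 95% margin of error for a simple random sample of size n.
    It presumes known selection probabilities -- which opt-in samples lack."""
    return z * sqrt(p * (1 - p) / n)

print(f"{srs_margin_of_error(1_000):.1%}")   # ~3.1%, the figure quoted above
print(f"{srs_margin_of_error(26_000):.1%}")  # ~0.6% at an (illustrative) huge n
```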
  • Online Survey
    Traditional phone-based survey techniques suffer from deteriorating response rates and escalating costs. YouGovPolimetrix combines technology infrastructure for data collection, integration with large-scale databases, and novel instrumentation to deliver new capabilities for polling and survey research. (read more)
  • Response Rates
    • Surveys based on probability sampling cannot achieve pure probability (e.g., a 100% response rate)
    • Pew (5/15/12) reports a decline in its response rates from 36% in 1997 to 9% now. (ABC/Post: 19%)
    • Does this trend poison the well?
  • A Look at the Lit
    2006: "…little to suggest that unit nonresponse within the range of response rates obtained seriously threatens the quality of survey estimates." Keeter et al., POQ, 2006 – comparing response rates of 25% vs. 50%; see also Keeter et al., POQ, 2000, comparing 36% vs. 61%
    2008: "In general population RDD telephone surveys, lower response rates do not notably reduce the quality of survey demographic estimates. … This evidence challenges the assumption that response rates are a key indicator of survey data quality…" Holbrook et al.: demographic comparisons across 81 RDD surveys, 1996-2005; AAPOR RR3 rates from 5% to 54%
    2012: "Overall, there are only modest differences in responses between the standard and high-effort surveys. Similar to 1997 and 2003, the additional time and effort to encourage cooperation in the high-effort survey does not lead to significantly different estimates on most questions." Pew, 5/15/12 – comparing response rates of 9% vs. 22%
  • See also O'Neil, 1979; Smith, 1984; Merkle et al., 1993; Curtin et al., 2000; Groves et al., 2004; Curtin et al., 2005. And Groves, 2006: "Hence, there is little empirical support for the notion that low response rate surveys de facto produce estimates with high nonresponse bias."
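For reference, the "AAPOR RR3" rates cited in the literature slide above follow AAPOR's Response Rate 3 definition, sketched below from AAPOR's Standard Definitions; the disposition counts here are invented, and e is the researcher's own estimate of how many unknown-eligibility cases are eligible.

```python
def aapor_rr3(I, P, R, NC, O, UH, UO, e):
    """AAPOR Response Rate 3: completes over estimated eligible cases.
    I: complete interviews; P: partials; R: refusals/break-offs;
    NC: non-contacts; O: other eligible non-interviews;
    UH/UO: unknown-eligibility cases; e: estimated share of unknowns eligible."""
    return I / ((I + P) + (R + NC + O) + e * (UH + UO))

# Invented disposition counts, not from any survey cited above.
rr3 = aapor_rr3(I=900, P=50, R=2_000, NC=4_000, O=150, UH=3_000, UO=500, e=0.4)
print(f"{rr3:.1%}")  # 10.6%
```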
  • Next step: empirical testing
  • Yeager, Krosnick, et al., 2011 (see also Malhotra and Krosnick, 2006)
    • The paper compares seven opt-in online convenience-sample surveys with two probability-sample surveys
    • The probability-sample surveys were "consistently highly accurate"
    • The opt-in online surveys were "always less accurate… and less consistent in their level of accuracy"
    • Weighted estimates differed significantly from benchmarks 31 or 46 percent of the time for the two probability samples vs. 62 to 77 percent of the time for the convenience samples
    • The opt-in data's highest single error vs. benchmarks was 19 points; its highest average error, 9.9 points
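The accuracy metric behind this study and the charts that follow is straightforward: for each benchmark measure, take the absolute gap between the survey estimate and the benchmark, then average (or take the maximum) across measures. A sketch with invented figures:

```python
def absolute_errors(estimates, benchmarks):
    """Average and largest |survey estimate - benchmark| across shared
    measures, in percentage points (the accuracy metric used below)."""
    gaps = [abs(estimates[k] - benchmarks[k]) for k in benchmarks]
    return sum(gaps) / len(gaps), max(gaps)

# Invented figures, not the study's data.
benchmarks = {"smokes": 20.0, "owns home": 67.0, "has passport": 30.0}
survey     = {"smokes": 24.5, "owns home": 60.0, "has passport": 28.0}
average, largest = absolute_errors(survey, benchmarks)
print(average, largest)  # 4.5 7.0
```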
  • [Chart] Average absolute error on primary (weighting) demographics, unweighted, for each of the nine surveys: 2.0% and 3.3% for the two probability samples vs. 4.1%, 4.7%, 5.0%, 5.3%, 6.4%, 6.4% and 12.0% for the seven opt-in samples. Yeager, Krosnick, et al., 2011
  • [Chart] Average absolute error on secondary and non-demographic results, unweighted and weighted: 3.2%-3.8% for the probability samples (combined) vs. 5.2%-5.9% for the Internet samples (combined). Yeager, Krosnick, et al., 2011
  • [Chart] Largest absolute error across all benchmarks, unweighted, by survey; values range from 9.6% to 35.5% across the nine surveys. Yeager, Krosnick, et al., 2011
  • [Chart] Largest absolute error on secondary and non-demographic results, unweighted and weighted: 8.7%-10.7% for the probability samples (combined) vs. 13.5%-14.9% for the Internet samples (combined). Yeager, Krosnick, et al., 2011
  • [Chart] Percent of estimates significantly different from benchmarks, secondary and non-demographic results: 38.5% weighted and 57.5% unweighted for the probability samples (combined) vs. 70.6% and 71.4% for the Internet samples (combined). Yeager, Krosnick, et al., 2011
  • Additional conclusions (Yeager, Krosnick, et al., 2011)
    • Little support for the claim that some non-probability internet surveys are consistently more accurate than others
      • Weighting did not always improve the accuracy of the opt-in samples
    • No support for the idea that higher completion = greater accuracy
      • There was a significant negative correlation between a survey's average absolute error and its response rate
    • Probability samples were not only more accurate overall, but also more consistently accurate
      • True both across surveys (using pre-existing probability samples as comparison points) and within surveys
      • Meaning it's virtually impossible to anticipate whether an opt-in survey will be just somewhat less accurate than a probability sample or substantially less accurate; and knowing an opt-in is accurate on one benchmark does not predict its accuracy on other measures
  • Additional conclusions (Yeager, Krosnick, et al., 2011)
    • The fact that internet samples are less accurate, and less consistent, on average, than probability samples should "come as no surprise, because no theory provides a rationale whereby samples generated by non-probability methods would yield accurate results"
    • It is "possible to cherry-pick such results to claim that non-probability sampling can yield veridical measurements. But a systematic look at a wide array of benchmarks documented that such results are the exception rather than the rule."
  • ARF Foundations of Quality (FoQ), via Reg Baker, 2009
    • Reported data on estimates of smoking prevalence: similar across three probability methods, but with as many as 14 points of variation across 17 opt-in online panels
    • "In the end, the results we get for any given study are highly dependent (and mostly unpredictable) on the panel we use. This is not good news."
  • Callegaro et al., 2012
    • Reviewed 45 studies comparing the quality of data collected via opt-in panels vs. another mode and/or benchmarks. In summary:
      • Panel estimates substantially deviated from benchmarks, and to a far greater degree than probability samples
      • High variability in data from different opt-in panels
      • High levels of multiple-panel membership (19% to 45% of respondents belong to 5+ panels)
      • Substantial differences between low- and higher-membership respondents
      • Weighting did not correct variations in the data
  • Traugott, 2012
    • Tested the accuracy of four low-cost data collection (LCDC) methods: Mechanical Turk, Pulse Opinion Research, Qualtrics and Zoomerang
      • LCDC unweighted demographics deviated from ACS benchmarks to a far greater degree than RDD polls or the ANES
      • MTurk was substantially younger (45% under 30), more female (60%) and more educated (47% with a college degree) than benchmarks
      • Pulse was disproportionately older (43% over 65), female (61%), white (90%) and educated (46% with a college degree)
      • Qualtrics and Zoomerang had better unweighted age, sex and race distributions, but still had far too few respondents without a high school degree and too many with some college or more
  • Traugott, 2012 (cont'd): [Chart] Unweighted party ID (Republican, Democrat, Independent) across ANES, MTurk, Zoomerang, Pulse and Qualtrics; values range from 1% to 57%.
  • Traugott, 2012 (cont'd): [Chart] Weighted party ID (Republican, Democrat, Independent) across the same five sources; values range from 2% to 58%.
  • AAPOR's "Report on Online Panels," April 2010
    • "Researchers should avoid nonprobability online panels when one of the research objectives is to accurately estimate population values."
    • "The nonprobability character of volunteer online panels … violates the underlying principles of probability theory."
    • "Empirical evaluations of online panels abroad and in the U.S. leave no doubt that those who choose to join online panels differ in important and nonignorable ways from those who do not."
    • "In sum, the existing body of evidence shows that online surveys with nonprobability panels elicit systematically different results than probability sample surveys in a wide variety of attitudes and behaviors."
    • "The reporting of a margin of sampling error associated with an opt-in sample is misleading."
  • AAPOR's conclusion
    "There currently is no generally accepted theoretical basis from which to claim that survey results using samples from nonprobability online panels are projectable to the general population. Thus, claims of 'representativeness' should be avoided when using these sample sources."
    AAPOR Report on Online Panels, April 2010
  • OK, but apart from population values, we can still use convenience samples to evaluate relationships among variables and trends over time. Right?
  • Pasek & Krosnick, 2010
    • Comparison of opt-in online and RDD surveys sponsored by the U.S. Census Bureau assessing intent to fill out the Census
    • "The telephone samples were more demographically representative of the nation's population than were the Internet samples, even after post-stratification."
    • "The distributions of opinions and behaviors were often significantly and substantially different across the two data streams. Thus, research conclusions would often be different."
    • Instances "where the two data streams told very different stories about change over time … over-time trends in one line did not meaningfully covary with over-time trends in the other line."
  • Top Predictors of Intent to Complete the Census (betas; Pasek & Krosnick, 2010)
    In the probability RDD sample:
      Age: 18-24                                 -1.76
      Education: <H.S.                           -1.16
      Education: H.S. graduate                    -.97
      Spanish speaking household                   .97
      Counting everyone is important               .92
      Age: 25-44                                  -.76
      Don't have time (disagree)                   .70
      Education: Some college                     -.67
      Participation doesn't matter (disagree)      .67
      Counting everyone is important (disagree)   -.62
    In the internet opt-in sample:
      Age: 18-24                                 -1.81
      Age: 25-44                                 -1.29
      Think Census can help them                  1.19
      Think Census can harm them                  -.98
      Participation doesn't matter (disagree)      .95
      Counting everyone is important               .71
      Age: 45-64                                  -.62
      Don't have time (disagree)                   .61
      Race: Hispanic                               .56
      Education: <H.S.                            -.47
  • Biggest Differences in Predictors of Intent to Complete the Census (Pasek & Krosnick, 2010)
    Predictor                                    RDD       Opt-In    Diff.
    Spanish speaking household                   .97***    -.38      -1.34***
    Importance of counting everyone (disagree)   -.62***   .14       .76**
    Education: <H.S.                             -1.16***  -.47*     .69**
    Education: H.S. graduate                     -.97***   -.38      .59**
    Think Census can help them                   .61***    1.19***   .57***
    Age: 25-44                                   -.76***   -1.29***  -.54**
    Age: 45-64                                   -.21*     -.62***   -.42*
    Region: Northeast                            -.06      .27*      .33*
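The Diff. column asks whether the same predictor carries a significantly different coefficient in the two samples. One standard way to test that, sketched below under the assumption of independently fitted models, is a z-test on the difference of the betas; the standard errors here are invented, since the slide reports only betas and significance stars.

```python
from math import sqrt, erfc

def beta_difference_test(b1, se1, b2, se2):
    """Two-sample z-test for equality of logistic-regression coefficients
    fit separately on independent samples (e.g., RDD vs. opt-in)."""
    z = (b1 - b2) / sqrt(se1 ** 2 + se2 ** 2)
    p = erfc(abs(z) / sqrt(2))  # two-sided p-value under the standard normal
    return z, p

# Betas for 'Spanish speaking household' from the table; SEs are invented.
z, p = beta_difference_test(b1=0.97, se1=0.20, b2=-0.38, se2=0.25)
print(f"z = {z:.2f}, two-sided p = {p:.1e}")
```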
  • [Chart] Differences in Time Trend. Pasek & Krosnick, 2010
  • Pasek/Krosnick conclusion
    "This investigation revealed systematic and often sizable differences between probability sample telephone data and non-probability Internet data in terms of demographic representativeness of the samples, the proportion of respondents reporting various opinions and behaviors, the predictors of intent to complete the Census form and actual completion of the form, changes over time in responses, and relations between variables."
    Pasek and Krosnick, 2010
  • Can non-probability samples be "fixed"?
    • Bayesian analysis?
      • What variables? How derived? Are we weighting to the DV or its correlates?
    • Sample balancing? (i.e., quota sampling revisited)
      • "The microcosm idea will rarely work in a complicated social problem because we always have additional variables that may have important consequences for the outcome." Gilbert, Light and Mosteller, Statistics and Public Policy, 1977
  • Some say you can
    "The survey was administered by YouGovPolimetrix during July 16-July 26, 2008. YouGovPolimetrix employs sample matching techniques to build representative web samples through its pool of opt-in respondents (see Rivers 2008). Studies that use representative samples yielded this way find that their quality meets, and sometimes exceeds, the quality of samples yielded through more traditional survey techniques."
    Perez, Political Behavior, 2010
  • AAPOR's task force disagrees
    "There currently is no generally accepted theoretical basis from which to claim that survey results using samples from nonprobability online panels are projectable to the general population. Thus, claims of 'representativeness' should be avoided when using these sample sources."
    AAPOR Report on Online Panels, 2010
  • It has company
    • "Unfortunately, convenience samples are often used inappropriately as the basis for inference to some larger population."
    • "…unlike random samples, purposive samples contain no information on how close the sample estimate is to the true value of the population parameter."
    • "Quota sampling suffers from essentially the same limitations as convenience, judgment, and purposive sampling (i.e., it has no probabilistic basis for statistical inference)."
    • "Some variations of quota sampling contain elements of random sampling but are still not statistically valid methods."
    Biemer and Lyberg, Introduction to Survey Quality, 2003
  • Social Media
  • Sampling Challenges
    • Users are not limited to one tweet/post per day; some may post incessantly, others rarely
    • Users can have multiple Twitter accounts
    • Accounts often do not belong to individuals, but rather to companies, organizations or associations
    • Posts can be driven by a public relations campaign (i.e., through volunteers, paid agents or bots) rather than as an expression of individual attitudes
  • Sampling Challenges (cont.)
    • Users' locations often are not accurately captured (if at all)
    • Users are self-selected; there is no theoretical basis on which to assume that their views are representative of the views of non-users
      • 15 percent of online adults use Twitter, 8 percent daily (Pew)
      • Fewer than 30 percent of Twitter users are Americans (Semiocast)
      • In 2011, 45% of U.S. Facebook users were 25 or younger (Smith, 2012)
  • Sentiment Analysis
    • Determining the meaning of posts or tweets requires content analysis:
      • The traditional approach (independent coders and a codebook) typically is unrealistic given the volume of data
      • Computerized sentiment analysis programs, while lower-cost and faster, are fraught with problems:
        • Parsing slang, irony, sarcasm, abbreviations, acronyms, emoticons, contextual meaning and hashtags is highly complex
        • Ambiguity in determining target words (e.g., Obama vs. prez)
    • Regardless, contextual data (demographics, other attitudes) are sparse or absent, severely limiting the research questions that can be answered
  • Manual vs. Automated Coding (Kim et al., 2012)
    Percent of tweets in each manually coded category, as coded by each program:

                     Manual coding
                 Positive  Neutral  Negative  Uncodable
                 (n=100)   (n=285)  (n=81)    (n=34)
    radian6
      Positive   8%        5%       2%        0%
      Neutral    86%       90%      83%       94%
      Negative   0%        5%       15%       6%
      (55% overall agreement)
    STAS
      Positive   42%       25%      6%        21%
      Neutral    45%       57%      71%       68%
      Negative   13%       18%      23%       12%
      (45% overall agreement)
    clarabridge
      Positive   30%       20%      2%        9%
      Neutral    60%       61%      85%       79%
      Negative   10%       19%      12%       12%
      (43% overall agreement)
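The 55%, 45% and 43% figures in the table are consistent with each program's overall agreement rate, i.e., the n-weighted share of the 500 manually coded tweets on which the program's code matched the manual code (uncodable tweets can never match). A sketch of that arithmetic:

```python
# Manually coded tweet counts per category (from the table above).
n = {"Positive": 100, "Neutral": 285, "Negative": 81, "Uncodable": 34}
total = sum(n.values())  # 500 tweets

# Diagonal of each program's confusion matrix: the share of each manually
# coded category that the program coded the same way.
diagonals = {
    "radian6":     {"Positive": 0.08, "Neutral": 0.90, "Negative": 0.15},
    "STAS":        {"Positive": 0.42, "Neutral": 0.57, "Negative": 0.23},
    "clarabridge": {"Positive": 0.30, "Neutral": 0.61, "Negative": 0.12},
}

for program, diag in diagonals.items():
    matched = sum(diag[cat] * n[cat] for cat in diag)
    print(f"{program}: {matched / total:.0%} overall agreement")
# radian6: 55%, STAS: 45%, clarabridge: 43% -- matching the figures above.
```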
  • Data Mining
    • Rather than attempting to parse the content of online posts, some researchers instead focus on the sheer quantity of posts about a certain topic
      • Tumasjan et al. (2010) analyzed the German national election and found that "the mere number of tweets mentioning a political party" was quite good at reflecting parties' vote shares, with predictive power similar to that of traditional polls
      • At the same time, they found that only 4% of users accounted for more than 40% of the messages
      • When such a small number of people account for so large a portion of the results, the door is open for politically motivated actors to distort results intentionally – especially as the use of such methods becomes more popular
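The Tumasjan-style predictor is simply each party's share of all party mentions, scored against actual vote shares; its mean absolute error is the statistic Gayo-Avello et al. report on the next slide. A sketch with invented counts:

```python
# Invented tweet counts and vote shares, not Tumasjan et al.'s data.
tweets = {"A": 120_000, "B": 90_000, "C": 40_000, "D": 30_000}
actual = {"A": 38.0, "B": 34.0, "C": 16.0, "D": 12.0}  # vote shares, %

total = sum(tweets.values())
predicted = {party: 100 * count / total for party, count in tweets.items()}

mae = sum(abs(predicted[p] - actual[p]) for p in actual) / len(actual)
for party in actual:
    print(f"{party}: predicted {predicted[party]:.1f}%, actual {actual[party]:.1f}%")
print(f"Mean absolute error: {mae:.1f} points")  # ~2.4 points here
```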
  • Data Mining (cont.)
    • Similar analyses in the U.S. have fared differently:
      • Lui et al. (2011) showed that Google Trends was worse at predicting 2008 and 2010 election outcomes than the New York Times (probability-sample) polls – and even than chance
      • Gayo-Avello et al. (2011) looked at the 2010 election and found a mean average error for Twitter volume of 17.1% (7.6% for sentiment analysis), far higher than the 2-3% typical of RDD polls
      • Olson & Bunnett (2011) found that Facebook "likes" in 2010 explained 13% of the variance in Senate races, but were very weak (or slightly negative) predictors of gubernatorial and House races
  • Twitter Index? The Twitter Political Index
    • Aug. 6 – Obama's index number is more than two and a half times Romney's ("61" vs. "24")
    • Aug. 7 – "35" vs. "19"
  • Facebook sampling?
    • Bhutta, 2012: Snowball sampling via Facebook to reach Catholics – faster and cheaper than traditional methods, but:
    • [Chart] Facebook sample vs. GSS benchmarks on % female, % Latino, % college grad. and % attend mass weekly+; the two sources diverge sharply on every measure (values shown: 70%, 67%, 66%, 53%, 33%, 27%, 24%, 6%)
  • "…the Facebook respondents cannot possibly serve as a representative sample of the general Catholic population. (Pollsters should view Facebook findings with extreme caution.)" Bhutta, 2012
  • Twitter bots
    • A recent estimate holds that as many as 30% of Obama's Twitter followers and 22% of Romney's were fabricated (Calzolari, 2012)
    • News report: Individuals can purchase 25,000 fabricated Twitter followers for $247; these "bots" will automatically tweet snippets of text (Feifer, 2012)
  • And the future?
    In probability sampling:
    • Continued study of response-rate effects
    • Concerted efforts to maintain response rates
    • Renewed focus on other data-quality issues in probability samples, e.g., coverage
    • Development of probability-based alternatives, e.g., mixed-mode and address-based sampling (ABS)
  • The future, cont.
    In convenience sampling:
    • Further study of appropriate uses (as well as inappropriate misuses) of convenience-sample data
    • Further evaluation of well-disclosed, emerging techniques in convenience sampling
    • Enhanced disclosure!
    • The quest for an online sampling frame
  • The future, cont.
    In social media:
    • A vast and growing source of data
    • Further evaluation of its appropriate use is needed
      • It's important to study how social media influence the formation and change of public opinion. But that does not mean they can supplant probability-sample polling.
      • Information gleaned from social media may be useful as a complement to public opinion polls (akin to focus groups) – for trend-spotting, qualitative analysis, perhaps modeling
    • The key: establish the research question; evaluate the limitations of the tools available to answer it
  • Thank you!
    Gary Langer, Langer Research Associates
    glanger@langerresearch.com