AAPOR - comparing found data from social media and made data from surveys

This presentation, given at the 2014 AAPOR conference, examines the specific ways in which "big data" from social media differs from data acquired through surveys.

  • How participants understand the activity of responding or posting
    Different motivations and communicative dynamics

    Nature of the data
    Different structure, users, and data properties

    Practical, ethical, and analytic considerations

    1. "When Are Big Data Methods Trustworthy for Social Measurement?" Cliff Lampe (@clifflampe), Josh Pasek, Lauren Guggenheim, Fred Conrad (University of Michigan); Michael Schober (The New School for Social Research)
    2. Presenting on "Big Data": Cliff Lampe, University of Michigan School of Information. A social scientist who uses some Big Data techniques (NOT A REAL DATA SCIENTIST), with a background in survey research.
    3. Mostly publishes in computer science conferences.
    4. CHI (Computer-Human Interaction), KDD (Knowledge Discovery and Data Mining), WSDM (Web Search and Data Mining)
    5. An ironically data-free presentation: today we discuss methodological issues of Big Social Data and surveys, not new data. First we describe Big Data and Big Social Data as terms; then we describe methodological considerations at the intersection of surveys and Big Social Data.
    6. There have been many hyperbolic claims about Big Data. Is it going to replace other forms of social measurement, or is it too flawed to survive? (Hint: neither.)
    7. What is Big Data?
    8. Big Data started in the physical sciences.
    9. Big Data is increasingly being applied to social science questions.
    10. What counts as "big"? LHC: 0.001% of sensor data leads to 25 petabytes annually. Wikipedia: 17 terabytes. Twitter: ~10 GB/day. How many observations are needed to count as "big"? Note: 100 million records is not all that big.
    11. Almost nobody who uses these techniques would use the term "Big Data" (similar to "surveys" vs. "polls"). Big Data is shorthand for a variety of techniques, including data capture, data storage, data analytics, and search and retrieval.
    12. Challenges in "Big Data": capture, curation, storage, search, sharing, transfer, analysis, visualization. Related terms: computational social science, data science, information access and retrieval, Web-scale data, data mining, machine learning, non-reactive data.
    13. Big Social Data: large data sets about humans that are collected from social interactions captured online, primarily on social media sites.
    14. What are the characteristics of surveys and Big Social Data that define when they are complementary, supplementary, or orthogonal?
    15. Bob Groves, "Three Eras of Survey Research"; Mick Couper, "Is the Sky Falling? New Technology, Changing Media, and the Future of Surveys"
    16. Survey research: 80+ years of research and practice; sampling procedures; question design; estimating the precision of statistics; practices for reducing survey error; the attempt to represent the population of interest with a sample.
    17. Research questions: Do big social data and survey data tell us the same things about society, and when and why might this happen? How do survey data and big social data compare on important dimensions? In what ways are the two fundamentally different? How are their uses different from one another?
    18. Highlighting three areas of concern: how participants understand the activity of responding or posting (different motivations and communicative dynamics); the nature of the data (different structure, users, and data properties); and practical, ethical, and analytic considerations.
    19. Participants' Understanding
    20. Participants' understanding: posting initiative or motivation; informed consent; ability to opt out; prior considerations; user identity; perceived audience and social desirability; time pressure/synchrony; respondent burden.
    21. Nature of the perceived audience (survey: interviewer, organization, others in the household; BSD: groups of friends, acquaintances, the public). Social desirability (survey: avoiding negative evaluations from the researcher; BSD: managing impressions for one's audience). Scale of data; face-threatening topics.
    22. Identity of the user (survey: kept anonymous; BSD: a user-created persona, with multiple users on a single account, multiple accounts for one user, corporate users, etc.). Prior considerations (survey: respondents may not have thought about the issue; BSD: posters have thought about it, though maybe not deeply). Being asked vs. caring to post.
    23. Nature of the Data
    24. Nature of the data: population coverage; sampled units; sampling; sample size; temporal properties; relevance to the research topic; granularity of possible analyses; data structure; auxiliary information.
    25. Sampling (surveys: representative of the population of interest via probability sampling; BSD: users and messages are not the full population, user accounts are not always users, and posting frequency varies across users). Sample size (surveys: a balance between large enough for inference and low cost; BSD: more users and posts than surveys, limited by access and storage). Can size help overcome sampling and representativeness problems? The aggregation of social media does not necessarily map onto the collection of individual users in survey research.
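The survey-side tradeoff on slide 25 (large enough for inference, small enough to afford) can be illustrated with the textbook margin-of-error formula for a proportion under simple random sampling. This is a standard-statistics sketch, not material from the slides:

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """95% margin of error for a proportion estimated from a
    simple random sample of size n (worst case at p = 0.5)."""
    return z * math.sqrt(p * (1 - p) / n)

# Precision improves only with the square root of n, so cost grows much
# faster than accuracy: quadrupling the sample merely halves the error.
for n in (100, 400, 1600, 6400):
    print(n, round(margin_of_error(n), 3))
```

This square-root law is why surveys settle for samples in the low thousands, and why the sheer size of social media data does not by itself fix representativeness: the error above assumes a probability sample, which big social data is not.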
    26. Temporal properties (surveys: memory retrieval, measurement at discrete moments; BSD: posting on recent events, continuously). Auxiliary data (surveys: paradata such as number of calls and behavior during the interview; BSD: geolocation, system activity, profile information).
    27. Practical, Ethical, and Analytic Considerations
    28. Practical, ethical, and analytic considerations: established research communities; consent to research/IRB; public perception of research; costs to researchers; data ownership; adjustments for non-representativeness; stability of the data source and its adjustments; updating models in a changing environment; users and impact.
    29. Adjustments for non-representativeness (surveys: well developed, via weighting; BSD: no standard practice; depends on the style of analysis, and may not be done at all with certain techniques). Ethical issues (surveys: explicit consent, regulated by government and IRBs; BSD: users unaware of terms in the user agreement, inconsistently regulated by IRBs).
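The weighting adjustment that slide 29 calls "well developed" for surveys can be sketched as simple post-stratification: scale each respondent so that their group's weighted share of the sample matches its known share of the population. The groups and shares below are illustrative, not from the slides:

```python
from collections import Counter

def poststratify(sample_groups, population_shares):
    """Return one weight per respondent so that the weighted share of
    each group matches its known population share."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    # weight = population share / sample share for the respondent's group
    return [population_shares[g] / (counts[g] / n) for g in sample_groups]

# Hypothetical sample: "young" respondents over-represented (60% of the
# sample) relative to a population that is 40% young.
sample = ["young"] * 6 + ["old"] * 4
population = {"young": 0.4, "old": 0.6}
weights = poststratify(sample, population)
# After weighting, the weighted share of "young" is 0.4, matching the
# population, and the weights still sum to the sample size.
```

The same arithmetic is hard to apply to big social data because the population shares of its "groups" (accounts, posters, bots) are usually unknown, which is the gap the slide points to.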
    30. Perception of research and legitimacy (surveys: fatigue, falling response rates, confusion about legitimacy; BSD: not considered while posting, but concerns over surveillance).
    31. YOU'RE SLOW AND EXPENSIVE! YOU AREN'T REPRESENTATIVE!
    32. Conclusion: we need to stop arguing about the wrong things, and we need a systematic agenda of research looking at the intersection of these methods. socialmediasurveys@umich.edu | cacl@umich.edu | Twitter: @clifflampe
