O3S 2018 - Data at Pew Research Center

  1. 1. Data at Pew Research Center. O3S 2018. Patrick van Kessel, Senior Data Scientist, @pvankessel
  2. 2. December 7, 2019. About Pew Research Center: Pew Research Center is a nonpartisan fact tank that informs the public about the issues, attitudes, and trends shaping the world. It does not take policy positions. The Center conducts public opinion polling, demographic research, content analysis and other data-driven social science research. It studies U.S. politics and policy; journalism and media; internet, science and technology; religion and public life; Hispanic trends; global attitudes and trends; and U.S. social and demographic trends. All of the Center’s reports are available at www.pewresearch.org. Pew Research Center is a subsidiary of The Pew Charitable Trusts, its primary funder.
  3. 3. Our Relationship with Data • Are we data creators or data consumers? We’re both! • [Diagram of exchanges between "The World" and "Pew Research Center", labeled: organic data; open-source tools and methods; survey data; augmented organic data; analysis and findings; new methods and tools]
  4. 4. What is Data Labs? • Created to collect, repurpose, and enrich organic data to supplement our surveys • Data scientists, engineers, and computational social scientists • Conduct original research and collaborate with other teams • Promote emerging computational methods and new data sources
  5. 5. Leveraging Open Data • Social media data (APIs): Facebook, Twitter, YouTube, Google • Administrative datasets: FEC, FCC, Census / ACS • Other organic data: online sermons, Mechanical Turk listings, Google search results, Google Maps, Congressional press releases, news articles
  6. 6. Leveraging Open Data • [Figures: examples using FEC data, Twitter data, and FCC data]
  7. 7. Leveraging Open Data • [Figures: two examples using Facebook data and one using Reddit data]
  8. 8. Contributing Open Data • Traditionally, Pew Research Center has been a data producer • 15+ years of survey research • We strive to share as much data as we can
  9. 9. Contributing Open Data • Most of our datasets eventually become available for download • Free and available to the public • http://www.pewresearch.org/download-datasets/ • http://www.pewresearch.org/fact-tank/2018/03/09/how-to-access-pew-research-center-survey-data/
  10. 10. Contributing Open Data • Survey data released as .sav files!? • A proprietary format, but one that preserves question text and labels • Can be used in open-source statistical analysis programs like R (using packages like foreign and haven) • We even have an online guide on how to use these files: https://medium.com/pew-research-center-decoded/how-to-analyze-pew-research-center-survey-data-in-r-f326df360713
  11. 11. Leveraging Open Data • Some organic online data is becoming more difficult to collect for research: social media API restrictions (Facebook, Twitter); GDPR • But we’re working towards finding a balance between the benefits of privacy and social research • A number of companies are now forging public-private research partnerships and have put out calls for proposals (e.g. https://socialscience.one)
  12. 12. When You Can’t Share Data • Even when available, organic data can be difficult to share: terms of service / API restrictions; size and complexity • Survey data can’t always be shared, either: privacy concerns / disclosure risk; especially with panel data, where we can’t release detailed geographic information
  13. 13. When You Can’t Share Data • Some emerging solutions show promise: differential privacy; synthetic data • But these are currently difficult to implement • So, if you can’t make your data open, how do you still support open scholarship?
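To make the "differential privacy" bullet on slide 13 concrete, here is a minimal sketch of the Laplace mechanism applied to a single count. It is an illustration only, not a description of anything Pew Research Center has deployed. The names `laplace_noise` and `private_count`, the toy `respondents` data, and the choice of `epsilon` are all assumptions made for this example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution centered at zero.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon, rng=None):
    """Return a noisy count of the values that satisfy `predicate`.

    A counting query has sensitivity 1 (adding or removing one respondent
    changes the count by at most 1), so the noise scale is 1 / epsilon.
    Smaller epsilon means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical toy data: 1 = respondent uses a given platform, 0 = does not.
respondents = [1] * 40 + [0] * 60
noisy = private_count(respondents, lambda v: v == 1, epsilon=0.5)
print(f"noisy count of users: {noisy:.1f}")
```

Each released statistic spends part of a privacy budget, which is one reason such schemes are harder to implement in practice than this sketch suggests.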
  14. 14. Share What Data You Can • Share some of the data, even if you can’t share it all • Summary stats and aggregations • We try to make what we can available, even if we can’t release the raw data
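One way to read "summary stats and aggregations" on slide 14 is to publish group-level counts and shares instead of respondent-level rows. The sketch below does that and also suppresses small cells, a common disclosure-control convention that the slides do not mention. The function name `aggregate_with_suppression`, the `min_cell_size` threshold of 10, and the sample records are hypothetical.

```python
from collections import Counter


def aggregate_with_suppression(records, group_key, min_cell_size=10):
    """Summarize records by group, hiding groups that are too small.

    Returns a dict mapping each group to {"n": count, "share": proportion},
    or to None when the group has fewer than `min_cell_size` records.
    """
    counts = Counter(record[group_key] for record in records)
    total = sum(counts.values())
    table = {}
    for group, n in counts.items():
        if n < min_cell_size:
            table[group] = None  # suppressed: too few respondents to release
        else:
            table[group] = {"n": n, "share": n / total}
    return table


# Hypothetical respondent-level data that could not be released as-is.
records = [{"region": "Northeast"}] * 12 + [{"region": "Territories"}] * 3
for region, summary in aggregate_with_suppression(records, "region").items():
    print(region, summary if summary is not None else "suppressed")
```

Shares are computed over all records, including suppressed groups, so a reader could still bound the suppressed total. A real release would need to decide whether that is acceptable.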
  15. 15. Share the Process • There’s still opportunity for methodological transparency • Explain in detail how the data were made: how the sampling frame was defined; how the data changed at every step (preprocessing, etc.); how algorithms were trained; how data were weighted • Conduct and describe extensive validation • Provide everything necessary for successful replication if not reproduction
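As an illustration of what documenting "how data were weighted" (slide 15) can involve, here is a minimal sketch of simple post-stratification, where each respondent's weight is the group's population share divided by its sample share. This is a swapped-in textbook technique chosen for brevity, not a statement of the Center's weighting procedure. The function names, group labels, and population shares are all assumptions.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Return one weight per respondent: population share / sample share."""
    n = len(sample_groups)
    sample_counts = Counter(sample_groups)
    missing = set(sample_counts) - set(population_shares)
    if missing:
        raise ValueError(f"no population share for groups: {sorted(missing)}")
    return [
        population_shares[group] / (sample_counts[group] / n)
        for group in sample_groups
    ]


def weighted_mean(values, weights):
    """Weighted average of `values` using the given respondent weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical sample that under-represents younger respondents.
groups = ["young"] * 2 + ["old"] * 8
answers = [1, 1] + [0] * 8            # 1 = "yes", 0 = "no"
targets = {"young": 0.5, "old": 0.5}  # assumed population shares

weights = poststratification_weights(groups, targets)
print("unweighted:", sum(answers) / len(answers))
print("weighted:  ", weighted_mean(answers, weights))
```

Publishing the target shares and the weighting code alongside a report lets others see how much the weights move each estimate.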
  16. 16. Share the Process • Our team’s methodology appendices tend to be nearly as long as our reports • We also have a dedicated Methods team that produces reports entirely focused on methodological transparency and innovation • Report length vs. methodology length: Sharing the News in a Polarized Congress (11 pages vs. 9 pages); Taking Sides on Facebook (21 pages vs. 20 pages); Bots in the Twittersphere (11 pages vs. 18 pages); Partisan Conflict and Congressional Outreach (38 pages vs. 25 pages)
  17. 17. Share the Process • Also: share the assumptions and limitations! • Be transparent about: 1) all methodological decisions you make; 2) what the data can - and can’t - say • How? Robustness checks and human validation; control for confounds with regressions where possible; show how the results of an analysis change under different assumptions
  18. 18. Share the Process
  19. 19. Share the Process • New blog to make our methods more transparent and accessible
  20. 20. Share the Tools • We’re starting to release code publicly on GitHub: http://github.com/pewresearch
  21. 21. Looking Forward • More survey data releases, including panel data • More open-sourcing, including tools that we use to analyze survey/organic data • Rigorous devotion to adhering to (and defining) best practices
  22. 22. Engage with the Community • Be responsive to questions from other researchers and interested observers • Present in-progress work at academic and technical conferences • Engage in a conversation about transparency • And always try to do more
  23. 23. To that end: • What can we share with you? • What data do you want to see more of? • What methodologies can we be more transparent about? • What tools or software would be useful to release? • Let us know! info@pewresearch.org
  24. 24. Thank You! Patrick van Kessel, Senior Data Scientist, pvankessel@pewresearch.org
