Hacking the access, storage and preservation of social media data

IConference 2016 Proceedings
https://www.ideals.illinois.edu/handle/2142/89454

Published in: Social Media

  1. Hacking the Access, Storage and Preservation of Social Media Data. Anatoliy Gruzd, Ryerson University; Jenna Jacobson, University of Toronto; Elizabeth Dubois, University of Ottawa. iConference, Philadelphia, USA (Mar 21, 2016)
  2. Research Team: Anatoliy Gruzd, Ryerson University (@gruzd); Jenna Jacobson, University of Toronto (@jacobsonjenna); Elizabeth Dubois, University of Ottawa (@lizdubois). #smdata16
  3. Research at the Social Media Lab
  4. Outline: 1. Defining ‘Social Media Data Stewardship’ (SMDS); 2. Outlining Social Media Terms of Service and APIs; 3. Reviewing SMDS Issues in a Sample Study; 4. Introducing Group Discussions and Leads. Stages: Collection, Storage, Analysis, Publishing, Reuse, Preservation
  5. Growth of Social Media Data: 1.5B users; 400M users; 300M users
  6. How to Access | Collect | Analyze | Preserve the Full Stream of Social Media Data?
  7. Social Media Data Stewardship Defined: Social Media Data + Data Stewardship = processes related to all aspects of managing social media data, including Collection, Storage, Analysis, Publishing, Reuse and Preservation
  8. Social Media Data Stewardship: Collection. Sources: self-collected/reported data; public APIs; data resellers. SocialMediaData.org
  9. Social Media Data Stewardship: Storage and Analysis. Credit: Nathan Lapierre
  10. Social Media Data Stewardship: Analysis Challenges. The Rise of Social Bots: Who are we studying? Humans or bots?
  11. Social Media Data Stewardship: Analysis Challenges. The Rise of Algorithmic Filtering: What are we studying? Human behaviour or algorithms?
  12. Social Media Data Stewardship: Publishing & Preservation. Last update in 2013; cost prohibitive, constraints on data sharing outside the team; limited scope / "Spritzer" version of Twitter grabs
  13. Social Media Data Stewardship: Ethics. Ethical Considerations
  14. Outline: 1. Defining ‘Social Media Data Stewardship’ (SMDS); 2. Outlining Social Media Terms of Service and APIs; 3. Reviewing SMDS Issues in a Sample Study; 4. Introducing Group Discussions and Leads
  15. Outlining ToS and APIs: Terms of Service (ToS); Application Programming Interfaces (APIs)
  16. http://SocialMediaData.org/api-tos
  17. ToS Challenges: 1. Data Request Rate Limit (5,000/hr; 60/min; streaming vs. firehose; 100M/day)
  18. ToS Challenges: 1. Data Request Rate Limit; 2. Permissibility to Store Data (“reasonable”; no geodata)
  19. ToS Challenges: 1. Data Request Rate Limit; 2. Permissibility to Store Data; 3. User Deleted/Modified Content
  20. ToS Challenges: 1. Data Request Rate Limit; 2. Permissibility to Store Data; 3. User Deleted/Modified Content; 4. Permissibility to Share Data (tweet ID + user ID)
  21. Outline: 1. Defining ‘Social Media Data Stewardship’ (SMDS); 2. Outlining Social Media Terms of Service and APIs; 3. Reviewing SMDS Issues in a Sample Study; 4. Introducing Group Discussions and Leads
  22. Reviewing SMDS Issues in a Sample Study: Understanding Political Opinion Leadership. Tweets = 411,138; Users = 45,986; Collection = June 14-May 14, 2013
  23. Access: Sampling not Spamming
  24. Access: Sampling not Spamming
  25. Sharing: The Right Code
  26. Preservation: When Availability Changes
  27. Outline: 1. Defining ‘Social Media Data Stewardship’ (SMDS); 2. Outlining Social Media Terms of Service and APIs; 3. Reviewing SMDS Issues in a Sample Study; 4. Introducing Group Discussions and Leads
  28. Lead Discussants: Katie Shilton, University of Maryland (@KatieShilton); Ayoung Yoon, Indiana University-Purdue University Indianapolis; Bryan Semaan, Syracuse University (@bsemaan); Jessica Vitak, University of Maryland (@jvitak)
  29. Group Discussions: 1. Collection – Ethics (Katie Shilton): Is informed consent necessary/recommended when collecting social media data? 2. Collection – Research Design (Jessica Vitak): Where and how is data being collected? 3. Publishing – For Collaboration (Bryan Semaan): What are the issues associated with publishing or sharing social media datasets that you collected for your research? 4. Preservation – Metadata Schemas (Ayoung Yoon): What metadata schemas and what formats should be used to preserve social media data?
  30. Hacking the Access, Storage and Preservation of Social Media Data. Anatoliy Gruzd, Ryerson University; Jenna Jacobson, University of Toronto; Elizabeth Dubois, University of Ottawa
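
Slide 20 pairs "Permissibility to Share Data" with "tweet ID + user ID". One way to read that note is that a collected dataset is reduced to identifiers before it is shared, so recipients retrieve the content themselves. The sketch below illustrates that idea only; it is not the presenters' code. The record layout (`id`, `user.id`, `text`), the function names `dehydrate` and `to_csv`, and the sample values are all assumptions made up for illustration.

```python
import csv
import io


def dehydrate(tweets):
    """Reduce full tweet records to tweet ID and user ID only.

    Assumes each record is a dict with an "id" key and a nested
    "user" dict that also has an "id" key (a hypothetical layout).
    """
    rows = []
    for tweet in tweets:
        rows.append({
            "tweet_id": str(tweet["id"]),
            "user_id": str(tweet["user"]["id"]),
        })
    return rows


def to_csv(rows):
    """Serialize the ID-only rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["tweet_id", "user_id"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample records; no real tweets or accounts.
sample = [
    {"id": 1001, "text": "made-up text", "user": {"id": 42, "screen_name": "example"}},
    {"id": 1002, "text": "more made-up text", "user": {"id": 43, "screen_name": "another"}},
]

shareable = dehydrate(sample)
print(to_csv(shareable))
```

Because the text and profile fields are dropped, content that a user later deletes or modifies (item 3 on the same slide) is not carried along in the shared file.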
