Social media data is a rich source of behavioural data that can reveal how we connect and interact with each other online, both in real time and over time, and what that might mean for our society as we speed towards an increasingly computer-mediated future. However, much of the data being collected is used in ways that are not always transparent to users. Once collected, the data can also be combined with other types of data and analyzed by algorithms to reveal even more sensitive information about users. As a result, questions about why and how data consumers use social media data are becoming pertinent, especially in the aftermath of Facebook’s Cambridge Analytica scandal. This talk will discuss the privacy and ethical implications of working with social media data.
The State of Social Media Research After Cambridge Analytica
1. The State of Social Media Research After Cambridge Analytica
Anatoliy Gruzd
@gruzd
Symposium on Rethinking Digital Humanities
CRIHN, Université de Montréal
October 26, 2018
4. Social Media Data Stewardship Defined
Social Media Data + Data Stewardship =
processes related to all aspects of managing social media data, including:
Collection, Storage, Analysis, Publishing, Reuse, Preservation
@Gruzd SocialMediaData.org
6. Social Media Data Stewardship:
Collection Storage Analysis Publishing Reuse Preservation
Cloud & Distributed Computing
Data & Information Organization
Analytics Visualization
7. Social Media Data Stewardship:
Social Media Research Toolkit
http://socialmediadata.org/social-media-research-toolkit/
8. Social Media Data Stewardship:
The Rise of Social Bots (and Cyborgs)
• Who are we studying?
• Humans or bots?
9. Social Media Data Stewardship:
Example: Internet Research Agency (IRA) – Twitter Dataset
2011 - 2018
3,836 IRA-associated accounts
9M tweets & RTs
https://about.twitter.com/en_us/values/elections-integrity.html#data
10. Social Media Data Stewardship:
Example: Internet Research Agency (IRA) – Twitter Dataset
Oct 8 – Nov 8, 2016 (200k tweets)
38k nodes | 190k edges
red nodes = IRA-associated accounts
edges = RTs/replies
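A network of this kind can be sketched from interaction records as follows. This is only an illustration under stated assumptions, not the method used to produce the slide's graph: the record fields (`author`, `target`), the sample handles, and the `build_network` helper are all hypothetical. Accounts become nodes, each retweet or reply becomes a directed edge, and nodes are flagged when they appear in a supplied list of IRA-associated accounts.

```python
from collections import Counter


def build_network(interactions, flagged_accounts):
    """Build a directed retweet/reply network from interaction records.

    interactions: iterable of dicts with the assumed keys 'author'
        (account that posted) and 'target' (account retweeted or replied to).
    flagged_accounts: handles to highlight, e.g. IRA-associated accounts.
    """
    nodes = set()
    edges = Counter()  # (author, target) -> number of RTs/replies
    for item in interactions:
        author, target = item["author"], item["target"]
        if author == target:
            continue  # self-loops are dropped; a choice made for this sketch
        nodes.update((author, target))
        edges[(author, target)] += 1
    flagged = nodes & set(flagged_accounts)  # the "red nodes"
    return nodes, edges, flagged


# Made-up sample data for illustration only.
sample = [
    {"author": "user_a", "target": "acct_x"},
    {"author": "user_b", "target": "acct_x"},
    {"author": "user_a", "target": "acct_x"},
    {"author": "acct_x", "target": "user_c"},
]
nodes, edges, flagged = build_network(sample, {"acct_x"})
print(len(nodes), "nodes |", len(edges), "edges |", len(flagged), "flagged")
```

Repeated interactions between the same pair are kept as an edge weight rather than as separate edges, so the edge count reflects distinct account pairs.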
11. Social Media Data Stewardship:
Bot Detection Tool
https://botometer.iuni.iu.edu/#!/
12. Social Media Data Stewardship:
Terms of Service
14. Social Media Data Stewardship:
Example: Twitter Data Sharing
“Hydrating” Tweet ID Datasets (Recollecting)
https://github.com/DocNow/hydrator
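The sharing workflow implied here, publishing only tweet IDs and letting others re-collect the full records, can be sketched as below. This is a hedged illustration, not Hydrator's implementation: the `id` field name, the helper names, and the batch size of 100 are assumptions, and the sketch stops short of calling any API.

```python
def dehydrate(tweets):
    """Reduce full tweet records to a de-duplicated list of tweet IDs for sharing."""
    seen = set()
    ids = []
    for tweet in tweets:
        tweet_id = str(tweet["id"])  # 'id' is an assumed field name
        if tweet_id not in seen:
            seen.add(tweet_id)
            ids.append(tweet_id)
    return ids


def batches(ids, size=100):
    """Split IDs into request-sized batches; the default size is an assumption."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


# Made-up records for illustration only.
collected = [
    {"id": 1001, "text": "first"},
    {"id": 1002, "text": "second"},
    {"id": 1001, "text": "first"},  # duplicate record
]
shared_ids = dehydrate(collected)
print(shared_ids)
print(batches(shared_ids, size=2))
```

Tweets that have been deleted or made private since the original collection cannot be re-collected, so a hydrated dataset may be smaller than the one the IDs came from.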
17. Social Media Data Stewardship:
Public Archives
• no longer archives all tweets
• very limited scope
• main focus on politically driven manipulation campaigns
19. Ethical Considerations for Researchers
Research more ‘acceptable’ if:
• “it’s going to a good cause”
• “morally right”
• “general public good”
• non-profit or academic
Social media users care about data quality, accuracy, and representation.
Special consideration when studying sensitive topics and vulnerable groups (e.g., minors, the deceased, mental health). (Golder et al. 2017)
20. Ethical Considerations for Third Parties
Recent work:
• Privacy Concerns and Self-Disclosure in Private and Public Uses of Social Media (Gruzd & Hernández-García, 2018)
• Journalists’ Use of Social Media to Infer Public Opinion: The Citizens’ Perspective (Dubois, Gruzd, & Jacobson, 2018)
• Employers’ Use of Young People’s Social Media: Extending Stakeholder Theory to Social Media Data (Jacobson & Gruzd, 2018)
21. Survey of Online Adults in Canada
Who
• Academic researcher
• Marketer
• Financial institution
• Employer
• Journalist
• Government
• Legal professional
• …
Why
• Credit check
• Insurance claim
• Public health monitoring
• Law enforcement
• Political polling
• Suicide prevention
• Tenant application review
• …
What data
• Posting frequency
• Location
• Photos
• Posts
• Topics
• Sentiment
• Communication network
• Friends’ list
• …
(Gruzd, Jacobson, Mai, & Dubois, 2018)
22. Examples of Social Media Data Use
Who: Banks
For what purpose: Determine credit score
What information: Aggregated data
http://uk.businessinsider.com/yasaman-hadjibashi-at-barclays-africa-banks-using-big-data-and-social-media-2016-9
23. Examples of Social Media Data Use
Who: Security agency
For what purpose: Identify real or potential threats
What information: Location-based information
https://www.forbes.com/sites/kalevleetaru/2016/10/12/geofeedia-is-just-the-tip-of-the-iceberg-the-era-of-social-surveillence/#10b6941a40e2
24. Examples of Social Media Data Use
Who: Car insurance company
For what purpose: Price car insurance
What information: Facebook posts and likes
https://www.theguardian.com/technology/2016/nov/02/admiral-to-price-car-insurance-based-on-facebook-posts
25. Examples of Social Media Data Use
Who: Cambridge Analytica
For what purpose: Political targeted ad campaigns
What information: Facebook user data
https://www.theguardian.com/news/series/cambridge-analytica-files
26. Comfort by Third Party
How comfortable would you be if one of the following entities accessed information about you or posted by you publicly on social media?
Chart: % uncomfortable, by third party
27. Comfort by Data Type
How comfortable would you be if a third party accessed the following information about you or posted by you publicly on social media?
Chart: % uncomfortable, by data type
28. Key Takeaways
• Social Media Research Ethics → not a one-size-fits-all framework
• Privacy varies across uses, users, and data types
• Academic researchers > Employer > Financial Companies, Government bodies, Marketers
• Aggregated > Raw > Social Network Data
• No difference in privacy concerns when asked about private vs. public social media data
• People are adopting different privacy protection strategies
29. References
• Gruzd & Hernández-García. (2018). Privacy Concerns and Self-Disclosure in Private and Public Uses of Social Media. Cyberpsychology, Behavior, and Social Networking, 21(7), 418–428.
• Dubois, Gruzd, & Jacobson. (2018). Journalists’ Use of Social Media to Infer Public Opinion: The Citizens’ Perspective. Social Science Computer Review.
• Jacobson & Gruzd. (2018). Employers’ Use of Young People’s Social Media: Extending Stakeholder Theory to Social Media Data. In Academy of Management Proceedings (Vol. 2018, p. 18217).
• Gruzd, A., Jacobson, J., Mai, P., & Dubois, E. (2018). The State of Social Media in Canada 2017. Public Report.
• Gruzd, A., Jacobson, J., Mai, P., & Dubois, E. (2018). Social Media Privacy in Canada. Public Report.
• Image credit: https://pixabay.com
30. #SMSociety
400+ Authors
250+ Attendees
28 Countries
IMPORTANT DATES
Full & WIP Papers
Due: Jan. 28, 2019
Panels, Workshops, Posters
Due: Mar. 18, 2019
Conference Dates
July 19-21, 2019