This document discusses ethical considerations around collecting and archiving social media data from Twitter. It notes that collecting tweets allows researchers to study important events and movements but may involve vulnerable individuals. While some topics like natural disasters are collected, others involving minors or sensitive issues are not. Embargoing data releases allows time for implications to settle and users to delete tweets. However, social media archiving poses challenges as future uses are unknown and obtaining consent does not scale. The ethics are complex with disconnects between those who collect data and those engaged with communities.
2. This is a tweet from Twitter’s API ...
{
  "created_at": "Wed Feb 28 11:48:24 +0000 2018",
  "id_str": "968815141558112256",
  "full_text": "Does embargoing the release of a Twitter dataset of a social movement / activism mitigate possible harm? Users have chance to delete tweets. Less likely to be of use to law enforcement / state security. Thinking as I ready our #Charlottesville tweet ids for release.",
  "user": {
    "id_str": "481186914",
    "screen_name": "justin_littman"
  },
  "place": {
    "full_name": "Washington, DC"
  },
  "favorite_count": 14,
  "lang": "en"
}
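A minimal sketch of reading the fields of a tweet record like the one above, using only Python's standard library. The record here is an abbreviated copy with a shortened full_text, and `summarize_tweet` is a hypothetical helper name for illustration, not part of any Twitter library.

```python
import json
from datetime import datetime

# Abbreviated tweet record using the same keys as the example tweet.
RAW_TWEET = """
{
  "created_at": "Wed Feb 28 11:48:24 +0000 2018",
  "id_str": "968815141558112256",
  "full_text": "Thinking as I ready our #Charlottesville tweet ids for release.",
  "user": {"id_str": "481186914", "screen_name": "justin_littman"},
  "place": {"full_name": "Washington, DC"},
  "favorite_count": 14,
  "lang": "en"
}
"""


def summarize_tweet(raw):
    """Parse one tweet (JSON text) and pull out a few fields of interest."""
    tweet = json.loads(raw)
    # created_at uses a fixed English format, e.g. "Wed Feb 28 11:48:24 +0000 2018".
    created = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
    # Naive hashtag extraction from the text: whitespace split, trim punctuation.
    hashtags = [
        word.strip(".,!?:;")
        for word in tweet["full_text"].split()
        if word.startswith("#")
    ]
    return {
        "tweet_id": tweet["id_str"],
        "screen_name": tweet["user"]["screen_name"],
        "created": created.isoformat(),
        # "place" may be absent or null when the user shared no location.
        "place": (tweet.get("place") or {}).get("full_name"),
        "hashtags": hashtags,
    }


summary = summarize_tweet(RAW_TWEET)
print(summary)
```

Even this short record ties a screen name, a timestamp, and a location to the text, which is part of why the ethical questions in the later slides matter.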
3. Key affordances of Twitter API
● Types of collecting
○ By account
■ Can be private citizen, public figure, politician, bot, organization.
■ Actual identity is often not known.
■ Examples: Members of Congress, Beltway journalists, news
organizations
○ By keyword / hashtag
■ Cuts across many accounts.
■ Examples: #election2016, #MAGA, #WinterOlympics,
#WomensMarch
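The two types of collecting can be sketched as two filters over tweet records. This is an offline illustration, not a call to the Twitter API: the function names, the sample tweets, and the screen names are all made up for the example.

```python
def matches_account(tweet, screen_names):
    """Collecting by account: keep tweets posted by one of the listed accounts."""
    wanted = {name.lower() for name in screen_names}
    return tweet["user"]["screen_name"].lower() in wanted


def matches_hashtag(tweet, hashtags):
    """Collecting by hashtag: keep tweets whose text uses one of the hashtags."""
    wanted = {tag.lstrip("#").lower() for tag in hashtags}
    found = {
        word.strip(".,!?:;").lstrip("#").lower()
        for word in tweet["full_text"].split()
        if word.startswith("#")
    }
    return bool(wanted & found)


# Made-up sample tweets; the accounts are hypothetical.
tweets = [
    {"id_str": "1", "full_text": "Vote today! #election2016",
     "user": {"screen_name": "example_senator"}},
    {"id_str": "2", "full_text": "Great race at the #WinterOlympics.",
     "user": {"screen_name": "example_citizen"}},
    {"id_str": "3", "full_text": "Committee hearing at noon.",
     "user": {"screen_name": "example_senator"}},
]

by_account = [t["id_str"] for t in tweets
              if matches_account(t, ["Example_Senator"])]
by_hashtag = [t["id_str"] for t in tweets
              if matches_hashtag(t, ["#WinterOlympics", "#election2016"])]
print(by_account, by_hashtag)
```

The account filter returns everything one account posted, while the hashtag filter pulls in tweets from any account, including private citizens who may not expect to be collected.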
4. Key affordances of Twitter API
● Only current tweets can be collected, not historical tweets.
○ If tweets aren’t collected now, they can’t be collected.
● Sharing of Twitter datasets is limited by Twitter policies.
○ Share tweet ids only, not the complete tweet.
○ From tweet ids, complete tweets can be retrieved (if not deleted
or account deleted / protected / suspended).
○ Provides a “right to be forgotten.”
○ However, this is a barrier to holding politicians accountable, researching
bots, and reproducing research.
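A minimal sketch of the tweet id sharing model, sometimes called dehydrating and hydrating a dataset. The `lookup` argument is a hypothetical stand-in for a request to Twitter; here it is simulated with a dictionary so the example runs offline, and the tweets are made up.

```python
def dehydrate(tweets):
    """Reduce complete tweets to the tweet ids that may be shared."""
    return [tweet["id_str"] for tweet in tweets]


def hydrate(tweet_ids, lookup):
    """Retrieve complete tweets from ids.

    lookup(tweet_id) is assumed to return the tweet, or None when it is no
    longer available (deleted, or account deleted / protected / suspended).
    """
    recovered, missing = [], []
    for tweet_id in tweet_ids:
        tweet = lookup(tweet_id)
        if tweet is None:
            missing.append(tweet_id)
        else:
            recovered.append(tweet)
    return recovered, missing


# Made-up collection, as originally gathered.
collected = [
    {"id_str": "101", "full_text": "first"},
    {"id_str": "102", "full_text": "second"},
    {"id_str": "103", "full_text": "third"},
]
shared_ids = dehydrate(collected)

# Simulate the platform later: the author of tweet 102 has deleted it.
still_public = {t["id_str"]: t for t in collected if t["id_str"] != "102"}

recovered, missing = hydrate(shared_ids, still_public.get)
print(len(recovered), "recovered;", missing, "no longer available")
```

The second researcher ends up with a smaller dataset than the first, which shows both sides of the trade-off: deletions are honored, but the original analysis cannot be reproduced exactly.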
5. Twitter collecting at GWU in a nutshell
● GW Libraries proactively collect Twitter datasets.
○ Topics primarily align with GWU’s research interests.
● Support researchers in reusing existing datasets or collecting their own
datasets.
○ Researcher = faculty, student
● Publicly release tweet ids datasets.
● Tools
○ Social Feed Manager: http://go.gwu.edu/sfm
○ TweetSets: https://tweetsets.library.gwu.edu/
● Scale: 650 million tweets in 2017
6. Encouraging ethical social media research at GWU
● Discuss ethical and privacy considerations with researchers
○ Provide a 2-page summary: http://go.gwu.edu/sfmethics
● Focus on “yellow flags.”
○ For example, collecting the tweets of minors or students.
● Topics include:
○ Data collecting
○ Data sharing
○ Publishing
○ Other resources
7. Ethical conundrums
● Ethical murkiness: What social media to proactively collect or
not collect?
○ #MeToo: Important social movement, but women are sharing
personal and traumatic experiences.
○ #Parkland / #NeverAgain: Important event and social movement,
but involves minors.
● Each of these areas is important, but involves vulnerable
individuals / communities or sensitive topics.
● Every decision not to collect results in a gap in historical record.
8. Ethical conundrums: examples
Collected
● Natural disasters
○ #HurricaneHarvey
○ #HurricaneIrma
● Social protests
○ #WomensMarch
○ #Charlottesville
○ #IranProtests
● Political activity
○ #election2016: By hashtag
○ #election2018: By hashtag and top
accounts
Not collected
● Mass killings
○ #PulseNightclub
○ #LasVegas
● Social movements
○ #BlackLivesMatter
○ #MeToo
○ #Parkland / #NeverAgain
9. Thoughts on these ethical conundrums
● Social media collecting is not a good fit for the ethical tools of
archives or social science research.
○ Ethical tools = informed consent or donor agreements.
○ Don’t scale.
○ Identities of users often unknown.
○ Difficult to communicate with users.
● Future uses of social media archives are not known.
○ Makes considering benefits / harms more challenging.
○ When collecting for current research, research questions /
methodologies are known.
10. Thoughts on these ethical conundrums
● Embargoing may be a useful strategy for some collections.
○ Allow social implications to settle out.
○ Gives users the opportunity to delete tweets.
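As an illustration only, an embargo can be expressed as a simple date rule: tweet ids are released no earlier than a fixed number of days after collecting ends. The 180-day period, the dates, and the function names below are assumptions made for the sketch, not GW Libraries policy.

```python
from datetime import date, timedelta


def release_date(collection_end, embargo_days):
    """Earliest date on which the tweet ids may be released."""
    return collection_end + timedelta(days=embargo_days)


def is_released(collection_end, embargo_days, today):
    """True once the embargo period has fully elapsed."""
    return today >= release_date(collection_end, embargo_days)


# Hypothetical collection that ended on 2017-08-31 with a 180-day embargo.
collection_end = date(2017, 8, 31)
embargo_days = 180

print(release_date(collection_end, embargo_days))
print(is_released(collection_end, embargo_days, today=date(2017, 12, 1)))
print(is_released(collection_end, embargo_days, today=date(2018, 3, 1)))
```

During the embargo, users can still delete tweets, and any deleted tweet can no longer be retrieved from its id once the dataset is released.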
11. Thoughts on these ethical conundrums
● When I say the “ethics are murky,” I mean:
○ I don’t adequately understand the possible benefits / harms of the
collection or the intent of the content creator.
○ Ethics would be less murky for someone who is a member of /
engaged with communities.
● Currently a disconnect between:
○ Those with the ability to collect social media.
○ Those engaged with communities and better equipped to grapple
with the ethics.