Slides from a conference presentation on the ethical challenges of using Twitter as a data source presented at the Ethics and Social Media Research Conference.
Organised by the Research Ethics Group of the Academy of Social Sciences and the NSMNSS network, the one-day conference aimed to further develop and explore the ethics of social science research using social media.
Using Twitter as a data source: An overview of ethical challenges
1. Using Twitter as a data source: An overview of ethical challenges
Wasim Ahmed (wahmed1@sheffield.ac.uk)
Prof Peter Bath (p.a.bath@sheffield.ac.uk)
Dr Gianluca Demartini (g.demartini@sheffield.ac.uk)
Ethics and Social Media Research Conference
Monday 21st of March, 2016
2. About me
• Second Year PhD student in the Health Informatics
Research Group, Information School, University of
Sheffield
• PhD examines content that is shared on Twitter during
infectious disease outbreaks
• Run a social media research blog (over 9,500 hits)
• Twitter Manager for NatCen Social Research’s New
Social Media, New Social Media (#NSMNSS) network.
3. Overview of presentation
• Part 1: Background to ethical issues
encountered (as a blogger and within PhD)
• Part 2: Completing an ethics application for the
purposes of the PhD
• Part 3: Ethical issues and challenges that are
discussed within the NSMNSS network
4. Part 1: Ethical Issues Encountered
• PhD topic uses Twitter as a primary source of data –
no interviews/surveys
• Non-traditional data for a social science PhD: concerns
over informed consent, and data confidentiality
• Due to the volume of tweets it is impossible to obtain
informed consent from all Twitter users
5. Part 1: Ethical Issues Encountered
• Synthesized and blogged about software for
retrieving Twitter data in less than 5 minutes
• Main critique of the post: what are the ethical
implications of using Twitter tools?
• Software developers argue the data is in the public
domain, but is this the case?
6. Part 2: Ethics Application
• Do I need ethics approval when Twitter data is
publicly available for everyone to see?
• Ethics approval is required because, when the data
is analysed, things may emerge from the data that
could draw attention to groups, individuals,
trends etc.
• Beyond what would normally be expected from
engagement on these platforms
7. Part 2: Ethics Application
• Ethics application covered:
• Potential participants: who are the Twitter users being
analysed: general public, organisations, public
figures, specific geographical locations, or all?
• For Twitter, participants are those who use specific
keywords & hashtags.
• Data confidentiality and data storage measures: data
is stored on secure laptops.
8. Part 2: Ethics Application
• Methods used may have ethical implications
e.g., crowdsourcing (used in some articles)
• My ethics application listed: semantic analysis,
sentiment analysis, and thematic analysis
• Amendments can be requested if a new
method of analysis is to be used to analyse
Twitter data
9. Part 2: Ethics Application
• Consent:
• Ethics application made it possible to gain consent to
use tweets / user handles within publications
• Also possible to gain consent via a tweet if it was not
possible for participants to view a participant
information sheet and complete a consent form
• Electronic versions of participant information and
consent sheets, for example, via a Google form.
10. Part 2: Ethics Application
• Majority of analysis is aggregate – identifying
themes / clusters of tweets
• Individual tweets and/or user handles in
original form will never be published
• Data is analysed confidentially and never in
public
11. Part 3: Ethical issues discussed within the NSMNSS network
• Public vs. private/ Facebook vs. Twitter – is the
space being researched seen as private by its
users?
• Twitter datasets may include data generated by
minors – sometimes overlooked.
• Very difficult to filter Twitter data for under 18s
12. Part 3: Ethical issues discussed within the NSMNSS network
• Although not directly related to my PhD, there
are cases of ‘suspect’ social media research in
the media
• Important to understand these in order to avoid
potential pitfalls
13. Case studies discussed in the NSMNSS network
• The Samaritans Radar app was designed to detect when people on Twitter
appeared to be suicidal – it used an algorithm to identify key words
and phrases which indicated distress.
• Users who had signed up for the scheme would receive an email
alert if someone they followed tweeted such statements.
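The detection mechanism described above can be sketched as a simple keyword match over tweet text. This is a minimal illustration only: the phrase list and function name below are hypothetical assumptions, not the app's actual algorithm or vocabulary.

```python
# Illustrative sketch of keyword-based distress detection, in the style
# described for the Samaritans Radar app. The watched phrases here are
# invented examples, not the app's real keyword list.

DISTRESS_PHRASES = [
    "hate myself",
    "need someone to talk to",
    "tired of being alone",
]

def flags_distress(tweet_text: str) -> bool:
    """Return True if the tweet contains any watched phrase (case-insensitive)."""
    text = tweet_text.lower()
    return any(phrase in text for phrase in DISTRESS_PHRASES)
```

Even this toy version shows why the approach raised ethical concerns: matching is purely lexical, so it can misfire on sarcasm or song lyrics, and it runs on tweets whose authors never opted in.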
14. Case studies discussed in the NSMNSS network
• Timeline of events
• 30th October 2014 – reassure users and mention Whitelist
• 31st October – mention testing and research input
• 2nd of November – mention subscribers (3 thousand), 20 thousand Twitter
mentions, and trending for 2 days
• 4th November – offer reassurance and state that they have listened to feedback
• 7th November – apologise for any distress caused to the public due to the range of
information and opinion on the app; suspend the app
• 14th November – further apologise for any stress; offer a phone number, email, and website
• 10th of March 2015 – confirm that the app has been permanently deleted
Launched 29 October 2014 and suspended on 7th November 2014
15. Case studies discussed within the NSMNSS network
• Facebook emotion study (January 2012): positive posts from
155,000 Facebook users were removed
• Issues of using scraped data sets, e.g., a Reddit dataset
containing every publicly available Reddit comment
• Ted Cruz’s campaign using a firm that harvested data on millions of
unwitting Facebook users
16. What next?
• No fast answers but some good work already:
report by NatCen – Research using Social
Media; Users’ Views
• Findings: participants’ views about research
using social media fell into three categories:
scepticism, acceptance and ambiguity.
17. What next?
• Research using Social Media; Users’ Views
Suggestions for improving research practices:
•Sampling and recruitment – be transparent in
purpose and aims in order to ethically recruit
participants to online and social media research
18. What next?
• Research using Social Media; Users’ Views
Suggestions for improving research practices:
•Collecting or generating data – to improve the
representativeness of findings and to understand
the privacy risks of the platform used in a study in
order to uphold protection
19. What next?
• Research using Social Media; Users’ Views
Suggestions for improving research practices:
•Reporting results – to protect the identity and
reputation of participants, maintain their trust in
the value of the research and contribute to the
progression of the field by being open and honest
in reporting.
20. What next?
• Wisdom of the Crowd ‘#SocialEthics’ report
(Ipsos MORI and Demos/CASM)
• Findings: public awareness that information on
social media can be mined for research is low
compared to other uses of social media data
21. What next?
• Recommendations for researchers:
•Researchers to work with developers – only
collect data that is required by the project, and
remove it when no longer required
•Move to a culture of questioning whether the
data being collected is really necessary for a
research project
22. What next?
• Examples of data minimization for a project may
include:
• Removing the author’s name and
@tag/user handle from researchers’ sight
• Stripping out other data that is downloaded,
e.g., named persons or place names
23. What next?
• Examples of data minimization for a project may
include:
• Removing metadata that is not relevant for
the purposes of a research project, such as
GPS data that might be attached to the social
media post
• Creating generalized groupings of data rather
than analysing specific data, e.g., by cities
instead of exact street locations
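The minimization steps listed on these slides can be sketched as a small cleaning pass over a downloaded tweet. The field names below follow the general shape of a Twitter API tweet object (`user`, `coordinates`, `geo`, `place`), but the helper name, sample tweet, and exact grouping rule are illustrative assumptions, not a prescribed implementation.

```python
# Hedged sketch of data minimization for Twitter research: strip the
# author's identity and GPS metadata, and generalize location to city
# level before any analysis. Field names approximate the Twitter API
# tweet JSON; minimise_tweet is a hypothetical helper.
import copy

def minimise_tweet(tweet: dict) -> dict:
    """Return a copy of a tweet dict with identifying data removed."""
    cleaned = copy.deepcopy(tweet)
    # Remove the author's name and @handle from the researcher's sight.
    cleaned.pop("user", None)
    # Remove GPS metadata that might be attached to the post.
    cleaned.pop("coordinates", None)
    cleaned.pop("geo", None)
    # Generalize place data to a city-level grouping rather than an
    # exact street location.
    place = cleaned.get("place")
    if place:
        cleaned["place"] = {"city": place.get("name")}
    return cleaned

tweet = {
    "text": "Feeling unwell today",
    "user": {"screen_name": "example_user"},
    "coordinates": {"type": "Point", "coordinates": [-1.47, 53.38]},
    "place": {"name": "Sheffield", "full_name": "Sheffield, England"},
}
```

Working on a copy means the raw download can be deleted once the minimized version exists, which matches the recommendation above to remove data that is no longer required.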