Working with Social Media Data:
Ethics & Good Practice around
Collecting, Using and Storing Data
Nicola Osborne
Digital Education Manager, EDINA
Nicola.osborne@ed.ac.uk
@suchprettyeyes
Introductions: my social media work
• Digital Education Manager at EDINA, University of Edinburgh.
• Work on EDINA’s educational technology, innovation, digital and data projects
for audiences across Scotland, UK and further afield.
• Co-I on: PTAS-funded Managing Your Digital Footprints research strand (2014-
2015); Ongoing (2015-) Managing Your Digital Footprint research team; PTAS-
funded “A Live Pulse”: Yik Yak for understanding teaching, learning and
assessment at Edinburgh project.
• Co-tutor on ongoing Digital Footprint MOOC (2017-)
• Previously EDINA Social Media Officer (2009-2015), providing expertise and
advice on social media to colleagues across UoE for over 8 years.
http://edina.ac.uk/
Introduction: you and your work
1. Who are you?
2. What social media related research are you
working on or hoping to work on?
3. What do you hope to get out of today’s
session?
Overview
• Introduction & Design Considerations
– Approach
– Data accuracy
• Ethical Considerations
– Recommended ethical guidance
– Terms & Conditions – and impact on Data
– Consent and trust
• Practical Considerations
– Existing data sets
– Available data tools
– APIs
– Options for analysis and visualisation
• Storing and handling Data
– Compliance with legal requirements
– Sources of support
• Recommended researchers, groups, and resources.
• Q&A/Discussion – but questions welcome throughout!
Where to start…
• What is your research question(s)?
• Are social media or social media communities the
subject, or core to the subject?
• Or, is it the space for recruitment or reaching an
audience?
• Or, is it just a convenient space for data collection?
The Elephant (Blue Bird) in the Room
Image ©Twitter.com 2012
Research Design Considerations
• Research approach to be taken
• Appropriate data types to support your research
– Streaming/live data OR
– Archived / capture of data over time with asynchronous analysis
• Ethical considerations
• Consent process of subjects and their network
• Etiquette considerations
• Platform(s) to be used
– Fit with target subjects
– Terms & Conditions
• Practical access limitations e.g.
– Do tools for data capture exist?
– Does an API exist?
– What are the API limitations?
– Costs of access
• Your (researcher) or RAs’ expertise.
• Long term research vision – do you have rights to use
and reuse data in the ways you hope to?
Possible Methods &
Questions to Think About
• Computational (See also Batrinca and Treleaven 2015):
– Data access through APIs, screen scraping, established methods (e.g. DMI tools)?
– Text and data mining and/or Natural Language Processing (NLP)?
– Social network analysis and/or Actor Network Theory (ANT) analysis using nodes and edges in the network?
– Sentiment analysis based on text mining/NLP or based on presence/absence of emojis and/or visual content?
– Visual analysis and/or video or audio analysis for multimedia content?
• Quantitative (See also OII 2013a, b & c):
– Medium or large scale data?
– Automated or survey/volunteered data collection?
– Data cleansing process – how will you ensure that you have a good quality data set?
– What kind of statistical analysis do you want to undertake? Tools might include SPSS, NVivo, Gephi, Tableau, etc.
– Will you be comparing to existing data sets and/or undertaking trend analysis over time?
– What standard tools in your field – for digital or non digital data – can you use to collect or interpret your data?
• Qualitative:
– Manual collection?
– Ethnographic approaches and/or participant observation
– Focus groups or similar?
– Critical/reflexive reading and coding of texts/content
Batrinca, B. and Treleaven, P.C., 2015. Social Media Analytics: a survey of techniques, tools and platforms. In AI & Society, 30 (1). Pp. 89-116. https://doi.org/10.1007/s00146-014-0549-4
Oxford Internet Institute, 2013a. Quantitative Methods in Social Media Research: Big Data. In OII YouTube Channel, 15 March 2013. See: https://www.youtube.com/watch?v=6hmjj7_1sSY
Oxford Internet Institute, 2013b. Quantitative Methods in Social Media Research: Populations and Sampling. In OII YouTube Channel, 15 March 2013. See: https://www.youtube.com/watch?v=6hmjj7_1sSY
Oxford Internet Institute, 2013c. Space-Time as a Sampling Condition for New Media Research. In OII YouTube Channel, 15 March 2013. See: https://www.youtube.com/watch?v=HNxn0PqOc8k
Is Social Media Data Representative?
• Not all people use social media (and some of the least privileged groups in society are not
online at all).
• Most social media data collection methods favour English language data in mainstream
US/Global sites. It is unusual to see multilingual research or research that acknowledges use
of content including non-English text by primarily English speakers.
• Privacy settings and publicness tend to reflect status and privilege. Accessing at-risk,
vulnerable, heavily trolled, and/or niche interest groups is more difficult than obtaining public
posts from middle class white male social media users. BAME communities, women’s groups,
LGBTQ+ communities, etc. tend to make higher use of private groups, group moderation, and
protective measures that require more qualitative and overt consent-based approaches.
• Not all social media users are active. There is an “activity and agency bias” (Lutz and Hoffmann
2017) in much of the current research. Obtaining data on passive reads and engagement with
content is extremely difficult through quantitative methods. It may be easier with participant
observation.
Lutz, C. and Hoffmann, C. P. 2017. The dark side of online participation: exploring non-, passive and negative participation. In Information,
Communication & Society: AoIR Special Issue, 20 (6), pp. 876-897. http://dx.doi.org/10.1080/1369118X.2017.1293129
Question/Discussion
Which platform(s) are you intending to/are you
working with?
How did you select these social media spaces?
Ethical Considerations
• Visibility vs expectations of privacy:
– Being “in public” is not consent to being researched; users’ imagined audience may be quite different.
(see AoIR guidance, Marwick and boyd 2011)
– Are you engaging with private or “public” figures – expectations over visibility will vary significantly.
• How possible is it to obtain informed consent for work undertaken with your chosen social
media platform? How can consent be withdrawn?
• How will your data be collected and used? (Attributed vs Pseudonyms vs Anonymous).
• What personal data is being used? Does it put anyone at risk?
• What is the risk of accidental exposure or re-identification? Text snippets, quotes and images
may all be easily searchable.
• Public – or previously public – data can change in sensitivities over time.
• How will you handle/remove/retain subsequently deleted content?
Marwick, A. and boyd, d., 2011. I tweet honestly, I tweet passionately: Twitter users, context
collapse, and the imagined audience. In New Media & Society, 13 (1), pp. 114-133. DOI:
10.1177/1461444810365313.
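For the “Attributed vs Pseudonyms vs Anonymous” question above, one possible approach is to replace account names with keyed pseudonyms before analysis. The following is a minimal sketch only: the function name and the key are hypothetical, and it assumes the key is stored separately from the data. Pseudonymised records can still be re-identified through quoted text or images, so this does not remove the other risks listed.

```python
import hashlib
import hmac

def pseudonymise(handle: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for a social media handle.

    Uses HMAC-SHA256 so the mapping cannot be rebuilt without the key.
    """
    # Treat "@ExampleUser" and "exampleuser" as the same account.
    normalised = handle.strip().lstrip("@").lower()
    digest = hmac.new(secret_key, normalised.encode("utf-8"), hashlib.sha256)
    return "user_" + digest.hexdigest()[:12]

# Hypothetical key for illustration only; keep a real key away from the data.
key = b"replace-with-a-long-random-secret"
print(pseudonymise("@ExampleUser", key))
print(pseudonymise("exampleuser", key))  # same pseudonym as the line above
```

A keyed hash is used rather than a plain hash because handles are easy to guess; without the key, someone could hash a list of known handles and match them.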
Recommended: AoIR Ethics Guidance
• AoIR Ethics Guidance (2012):
https://aoir.org/reports/ethics2.pdf
• AoIR Ethics Chart – a quick guide to
key issues:
https://aoir.org/aoir_ethics_graphic_2016/
• AoIR Ethics Guidance (2002):
https://aoir.org/reports/ethics.pdf
• Annette Markham (co-author of AoIR
guidance) on Impact Models for
ethical decision making in data
research and design:
https://annettemarkham.com/2017/07/impact-model-ethics/
Recommended: Social Media
Research: A Guide to Ethics
• Excellent concise research ethics
guidance from the ESRC-
funded “Social Media, Privacy and
Risk: Towards More Ethical Research
Methodologies” project at
University of Aberdeen.
• Includes pointers to further social
media ethics resources.
• Townsend, L. and Wallace, C. 2016.
Social Media Research: A Guide to
Ethics. Aberdeen: University of
Aberdeen/ESRC Social Media
Enhancement project. Available
from: http://www.dotrural.ac.uk/social-media-research-ethics/
“But the data is already public”
In 2008 researchers released profile data (The T3 Data Set) from Facebook accounts of students at a US University,
inadvertently making identifiable data public, as reported in Zimmer (2010).
In this case the researchers:
• Had employed RAs who were part of the Network being examined and had (various levels of) access to more
information than a non-logged-in user of Facebook/user beyond the Network.
• Had funding that mandated open publishing and sharing of results.
• Had the University’s, but not individuals’, consent for data collection.
• Combined Facebook with university housing data in their data sets.
• Obscured the identity of the university where students were based, but described key characteristics.
• Attempted to make all data anonymous by removing identifying information (name, student id, etc.) but left
network and behavioural information intact.
• Asked other researchers using the data not to attempt to reidentify subjects.
• Stated that “hackers” and “extreme effort” would be the only way to “crack” the data.
The university was identified swiftly based purely on the codebook and other writings about the data, without
requiring direct access to the data. Once the university was identified, other specific identifying data (nationality, race,
home state, etc.), sometimes with only 1 individual in these groups, made re-identification of (some) students simple.
After public scrutiny and identification of the university, the data set was swiftly withdrawn by the researchers.
Zimmer, M. 2010. “But the data is already public”: on the ethics of research in Facebook. In Ethics and Information
Technology, 12 (4), December 2010, pp. 313-325. https://link.springer.com/article/10.1007%2Fs10676-010-9227-5
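The re-identification route described above (attribute groups containing only one individual) can be checked for before any release. The sketch below is a simple k-anonymity-style count, a technique not named in the slides; the function name, field names and records are made up for illustration.

```python
from collections import Counter

def small_groups(records, quasi_identifiers, k=5):
    """Return attribute combinations shared by fewer than k records."""
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}

# Made-up records for illustration; not drawn from any real data set.
records = [
    {"nationality": "US", "home_state": "MA"},
    {"nationality": "US", "home_state": "MA"},
    {"nationality": "US", "home_state": "NY"},
    {"nationality": "FR", "home_state": None},
]
risky = small_groups(records, ["nationality", "home_state"], k=2)
print(risky)  # combinations held by a single record
```

Any combination returned here identifies a very small group, so those fields would need coarsening or removal before sharing. Passing this check does not by itself make a data set safe, since network and behavioural data can also identify people.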
Terms & Conditions
• Before undertaking any social media research understand the T&Cs and
Developer T&Cs for the platform(s) you are looking at.
• Understand how your research aligns with the T&Cs, and any possible
issues of privacy, etiquette, or practical access.
• If your work is in conflict with T&Cs either re-design your research
(strongly recommended) or look carefully at risks and impacts.
• You should not ignore any T&Cs for technical reasons. If there is a valid
reason to ignore T&Cs for specific research reasons (such as research on
deleted tweets), be prepared to justify that to ethics boards and peer
reviewers. And understand that you may risk losing access to the platform
and your research data if you are found to be in breach of T&Cs.
Twitter Developer T&Cs of note (1)
Section VII (Other Important Terms), A: User Protection:
"Twitter Content, and information derived from Twitter Content, may not be
used by, or knowingly displayed, distributed, or otherwise made available
to:"…
"any entity for the purposes of conducting or providing surveillance, analyses
or research that isolates a group of individuals or any single individual for any
unlawful or discriminatory purpose or in a manner that would be inconsistent
with our users' reasonable expectations of privacy;"
https://developer.twitter.com/en/developer-terms/agreement-and-policy
Twitter Developer T&Cs of note (2)
Section VII (Other Important Terms), C: Respect Users' Control and Privacy:
"3. If Content is deleted, gains protected status, or is otherwise suspended,
withheld, modified, or removed from the Twitter Service (including
removal of location information), you will make all reasonable efforts to
delete or modify such Content (as applicable) as soon as reasonably
possible, and in any case within 24 hours after a request to do so by
Twitter or by a Twitter user with regard to their Content."
https://developer.twitter.com/en/developer-terms/agreement-and-policy
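In practice, honouring the deletion clause above means your stored copy has to be updated when content is removed. A minimal sketch, assuming you hold tweets in a dictionary keyed by tweet ID and have already obtained a list of deleted or protected IDs by some means (how you obtain that list is outside this sketch); all names and data are hypothetical.

```python
def apply_deletions(stored_tweets, deleted_ids):
    """Return the stored tweets with any deleted IDs removed."""
    deleted = set(deleted_ids)
    return {
        tweet_id: tweet
        for tweet_id, tweet in stored_tweets.items()
        if tweet_id not in deleted
    }

# Made-up local store keyed by tweet ID.
store = {
    "101": {"text": "first"},
    "102": {"text": "second"},
    "103": {"text": "third"},
}
# Hypothetical list of IDs reported as deleted; "999" was never stored.
store = apply_deletions(store, ["102", "999"])
print(sorted(store))
```

The same step would also need to be applied to any derived files, backups or shared copies that contain the content.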
Facebook Statement of Rights &
Responsibilities
Section 5: Protecting Other People's Rights
"We respect other people's rights, and expect you to do the same.
1. You will not post content or take any action on Facebook that infringes or violates someone else's rights or
otherwise violates the law.
2. We can remove any content or information you post on Facebook if we believe that it violates this Statement or
our policies.
3. We provide you with tools to help you protect your intellectual property rights. To learn more, visit our How to
Report Claims of Intellectual Property Infringement page.
4. If we remove your content for infringing someone else's copyright, and you believe we removed it by mistake, we
will provide you with an opportunity to appeal.
5. If you repeatedly infringe other people's intellectual property rights, we will disable your account when
appropriate.
6. You will not use our copyrights or Trademarks or any confusingly similar marks, except as expressly permitted by
our Brand Usage Guidelines or with our prior written permission.
7. If you collect information from users, you will: obtain their consent, make it clear you (and not
Facebook) are the one collecting their information, and post a privacy policy explaining what
information you collect and how you will use it.
8. You will not post anyone's identification documents or sensitive financial information on Facebook.
9. You will not tag users or send email invitations to non-users without their consent. Facebook offers social
reporting tools to enable users to provide feedback about tagging."
https://www.facebook.com/terms.php
Trust in Social Networks
vs Trust in Research
Research Ethics – Randall Munroe/xkcd (https://xkcd.com/1390/) Licensed under CC-BY-NC 2.5
Trust in social media networks is mixed, with
users increasingly savvy about data use…
However…
• Social Media users can find observation by
academic researchers more disconcerting
than by the companies who own the
platforms.
• Research, depending on the topic, can feel
like a judgement on behaviours, making
consent hugely important.
• The burden on researchers to be clear about
motives, funders, process, etc. is higher than
on commercial companies.
• There are parallels here to how individuals
feel about e.g. Tesco Clubcard or Credit Card
data capture vs. surveys and censuses.
Question/Discussion
What are the ethical concerns and
considerations for your current (or previous)
social media research?
Obtaining Consent
• Consent may be implicitly included for API data access in some terms and
conditions BUT, when did you last read the terms and conditions? What about
your research participants?
So:
• Obtain explicit consent wherever possible.
• Be transparent if you are engaging in research in a space – with a pinned post, link
to your participant information sheet, etc.
• Consent can be tricky in anonymous and less traditional social media spaces (see
e.g. Osborne 2017 for approaches used with Yik Yak).
• Apply particular caution to gaining consent for screen shots, attributed posts,
reproducing exact images or text of posts etc.
Osborne, N. 2017. Addressing ethics of research in anonymous online spaces. In “A Live Pulse”: Yik Yak for understanding
teaching, learning and assessment at Edinburgh [blog], 13th July 2017.
http://yikyakresearch.blogs.edina.ac.uk/2017/07/13/addressing-ethics-of-research-in-anonymous-online-spaces/
Some Common Ethics Pitfalls
• Researcher assumes public data can be used in any way desired, without
considering the intent of the subject(s) when they originally shared their profile/post etc.
• Researcher explores conveniently available “public” data without realising that
privacy settings may make more information available to them than is truly
“public”.
• Researcher is using “big” data under belief that individuals will not be identifiable
(as in the “But the data is already public” case).
• Research subject(s) has shared data on a public site but is not aware of their own
settings, or has not checked them lately, making implicit consent and the public
nature of the data problematic. Discovering that they have been included in
published research may be upsetting and problematic.
• Research Ethics Committees and/or Journal Editorial Boards are unaware or do not
properly consider that social media data includes real names, pseudonyms,
locations, highly disclosive data and do not ask the right questions around the
consent process, collection, aggregation, storage and retention of data.
• Researcher uses full text of a post as an “anonymous” example but this is then
Googled which identifies the original post/tweet/content and individual.
Data Considerations
• What kind of research approach are you taking?
• Who or what is the subject of your research – what is the right social media space to capture
appropriate data?
• What scale of data are you looking to collect/harvest? (If working with big data see boyd &
Crawford 2012)
• Will you be sampling or looking to collect all data over a specific time period?
• How sensitive is the topic?
• What level and type of consent can you obtain from participants?
• What kind of content?
– Profiles – for network analysis, image analysis, qualitative review of content through profile components/data?
– Posts – through API/data feed/harvesting or observation? Textual, visual, multimedia? Manual coding or text/data
mining?
– Comments/discussion – contents or threads of discussion?
– Metadata – tags, likes, engagements?
• Time bounds – how long do you expect to collect data for?
• What use will you make of the data after capture?
boyd, d. and Crawford, K., 2012. Critical questions for big data. In Information, Communication & Society special issue: A decade in
internet time: the dynamics of the internet and society, 15 (5). http://dx.doi.org/10.1080/1369118X.2012.678878
Sources of baseline data on usage,
access, trends, literacies etc.
• Oxford Internet Surveys: biennial data on UK public use and attitudes to the
internet, including social media: http://oxis.oii.ox.ac.uk/research/dataset-request/
• Ofcom research and data: Regular reporting on UK public use and attitudes to
media, including internet and social media:
https://www.ofcom.org.uk/research-and-data/search. Includes:
– Annual adult media use and attitudes, and children’s media literacy reporting:
https://www.ofcom.org.uk/research-and-data/media-literacy-research;
– Communications Market Report: annual overview of consumer use of communications of all types:
https://www.ofcom.org.uk/research-and-data/multi-sector-research/cmr
– Further regular and one-off data via the statistical release calendar:
https://www.ofcom.org.uk/research-and-data/data/statistics
• Pew Internet & American Life datasets: data on US public use, knowledge and
understanding of the web, digital literacy, social media, etc:
http://www.pewinternet.org/datasets/. For example:
– Social Media Update 2016: http://www.pewinternet.org/2016/11/11/social-media-update-2016/
Sources of Official Social Media
Usage Data, Trends, Financials, etc.
Best sources are quarterly earnings reports and presentations, typically including: monthly active
users, usage trends, earnings, monetization strategies, financials, future plans:
• Facebook & Instagram & WhatsApp: https://investor.fb.com/home/default.aspx
• Twitter: https://investor.twitterinc.com/results.cfm
• SnapChat: https://investor.snap.com/events-and-presentations/events
• YouTube/Google via Alphabet: https://abc.xyz/investor/
• Flickr:
– Currently owned by Oath, should be via Verizon once deal closes:
http://www.verizon.com/about/investors
– Historical up to 2017, via Yahoo captures in the Internet Archive:
https://web.archive.org/web/*/https://investor.yahoo.net/index.cfm
• LinkedIn:
– Current via Microsoft: https://www.microsoft.com/en-us/investor/
– Historical up to 2016: https://news.linkedin.com/topic/earnings
• Weibo: http://ir.weibo.com/phoenix.zhtml?c=253076&p=irol-irhome
Privately Held Social Media
• Crunchbase (https://www.crunchbase.com/) is a good source of
information on shareholders/owners, acquisitions, finances, etc.
• Alexa web rankings (owned by Amazon) give an overview of usage levels
and trends based on ranking relative to other sites in the US, and globally.
• Social Media sites’ “business” and “press” sites, official blogs and news
releases are best for user data.
• Some social media provide advertising APIs – which may be usable for
research depending on T&Cs and data content - but not developer or open
APIs, e.g. Snapchat:
https://www.snap.com/en-GB/news/post/third-party-applications-and-the-snapchat-api/
e.g.:
– Pinterest:
• data on usage from Pinterest: https://business.pinterest.com/en
• Alexa data on usage: https://www.alexa.com/siteinfo/pinterest.com
• investor data:
https://www.crunchbase.com/organization/pinterest/investors/investors_list
Data Quality & Reliability
• Data sources and APIs can change regularly, and what is available may change over time (e.g.
Twitter moved from all to “Top” tweets some years ago for its API; Facebook have changed data
structures multiple times).
• Errors in automated data collection can be hard to spot until analysis is undertaken – sampling, trial
data collection, and review of code by colleagues can all be useful.
• Gaps in data may occur because there are genuine gaps in data creation/posting etc; because there
are technical issues with the social media service; because of an error in your code; or because you
are over your API rate limit for the minute/hour/day.
• Data may change over time – Facebook and Instagram allow posts to be edited so a request will
capture one moment in time not necessarily the original or final versions.
• Data may disappear over time. Notable example: the Twitter deletion terms and conditions mean
that deleted tweets will not appear in a later API call.
– Research tools obeying the T&Cs will also update and remove deleted tweets.
– Research tools retaining deleted tweets are technically in breach of the T&Cs.
• Acquisitions, Mergers, and shut downs of social media sites can lead to changed terms and
conditions, changes to data availability and use, changes or removals of APIs and data access
routes, changes to user presence in a space, acceptable norms within a space (important for
qualitative work particularly).
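Gaps in collected data, whatever their cause, are easier to investigate if they are flagged early. The sketch below scans capture timestamps for unusually long silences; the one-hour threshold and the sample times are arbitrary assumptions, and a flagged gap only tells you where to look, not why it happened.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, max_gap=timedelta(hours=1)):
    """Return (start, end) pairs where consecutive posts are further apart than max_gap."""
    ordered = sorted(timestamps)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > max_gap
    ]

# Made-up capture times for illustration.
captured = [
    datetime(2017, 11, 1, 9, 0),
    datetime(2017, 11, 1, 9, 20),
    datetime(2017, 11, 1, 13, 5),
    datetime(2017, 11, 1, 13, 30),
]
for start, end in find_gaps(captured):
    print(f"No posts captured between {start} and {end}")
```

Comparing flagged periods against your collection logs (errors, rate-limit responses) can help separate genuine quiet periods from technical failures.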
Hidden pre-filtering and sampling
• Not all social media posts are equally likely to be included in standard API
endpoints
– e.g. a Twitter user with few posts and few followers is unlikely to appear on a
popular hashtag.
– The standard "Streaming" and "Search" APIs include 1% of Tweets and vary
in accuracy depending on activity/time etc. (See Morstatter et al 2013).
• Privacy settings will reduce the accuracy of any data sampled from
Facebook or other more complex privacy networks but it is hard to see
what is being excluded.
Morstatter, F., Pfeffer, J., Liu, H. and Carley, K.M., 2013. Is the Sample good enough?
Comparing data from Twitter's streaming API with Twitter's Firehose. In ICWSM 2013
and eprint arXiv:1306.5204. Available from: https://arxiv.org/abs/1306.5204
Question/Discussion
Have you already tried obtaining data for the
social media space you are using in your
research?
Have you faced any challenges or obstacles?
Existing Data Sets
• “The Zuckerberg files”: digital archive of all public comments by Mark Zuckerberg
including social media and mainstream media content for research use:
https://www.zuckerbergfiles.org/
• FiveThirtyEight Data: archive of data associated with FiveThirtyEight articles,
including social media data sets: https://github.com/fivethirtyeight/data
• Lumen database – tracking legal notices and complaints for removal of online
materials (including social media content): https://www.lumendatabase.org/
• CSIRO (Australia’s national science agency) We Feel – emotions in Tweets – API:
http://wefeel.csiro.au/#/api (see:
http://datadrivenjournalism.net/resources/we_feel)
• Stanford Large Network Dataset Collection - includes social network data
sets: https://snap.stanford.edu/data/
• Network Repository – network datasets, including social media, Facebook and
Twitter networks: http://networkrepository.com/
• DocNow – social justice social network archives: http://www.docnow.io/
Cross-site data tools
• North Carolina Social Media Archive
Toolkit: https://www.lib.ncsu.edu/social-media-archives-toolkit; see
also: https://github.com/NCSU-Libraries/Social-Media-Combine
• Social Mention (search engine for social media) API:
http://www.socialmention.com/api/
• Scrapebox (premium tool) YouTube Downloader:
http://www.scrapebox.com/youtube-downloader and Social Account
Scraper: http://www.scrapebox.com/social-account-scraper
• ESRC COSMOS Open Data Tools (available, but not updated since
2014): http://socialdatalab.net/software
• Overview of Twitter data tools (Ahmed 2015):
http://blogs.lse.ac.uk/impactofsocialsciences/2015/07/10/social-media-research-tools-overview/
Recommended: DMI Tools
The Digital Methods Initiative add new (documented) tools all the time, including:
• Censorship Explorer – determine censorship in various regions through URLs & proxies.
• Disqus Comment Scraper – obtain data from the Disqus comment plugin.
• Expand Tiny URLs – automatically expand large collections of Tiny URLs (e.g. from tweets).
• Geo IP – translate URLs or IP addresses into geographic locations (e.g. for a blog).
• Instagram Hashtag Explorer – retrieve Instagram media via specific hashtags.
• Issue Crawler – uses URLs to analyse relationships and connections through links between
URLs.
• Netvizz (Facebook) – extracts data from Facebook around groups, pages, search.
• Pinterest Scraper – scrapes Pinterest URLs and captures metadata of pins.
• Tumblr – data capture based on Tumblr tags, retrieving metadata and co-occurring tags.
• Twitter Capture and Analysis Toolset (DMI-TCAT) – robust and reproducible tool for data
capture and analysis of Twitter data. Source code available for local use.
• YouTube Data Tools – extract data on YouTube channels and videos, e.g. channel networks.
Access documentation and DMI tools at: https://wiki.digitalmethods.net/Dmi/ToolDatabase
See also, DMI Protocols: https://wiki.digitalmethods.net/Dmi/DmiProtocols
https://github.com/digitalmethodsinitiative/dmi-tcat/wiki
Internet Archive & Wayback Machine
• Global archive capturing websites (to various levels of detail/depth) based on IA
targets and user-submitted requests (since 2001).
• You can request a site for archiving, or a group of sites.
• Searchable resource OR can use exact URL to retrieve previous archived pages
(Wayback Machine).
• Collections exist for various social media topics, e.g.:
– 2016 US Presidential Election Social Media: https://archive.org/details/2016electiontwitter
– Arab America on Social Media: https://archive.org/details/ArchiveIt-Collection-2797
– Gif Cities (Gifs from GeoCities): https://gifcities.org/
• Great for social media website changes, blogs, terms and conditions versions, etc.
• Sites available in a range of archive formats (IA), or as viewable pages
(Wayback Machine).
• See:
– https://archive.org/
– https://archive.org/web/
https://gifcities.org/?q=star+wars
UK Web Archive
• Run by the British Library (since 2004).
• Indexes (UK/related) sites to a greater depth than the Internet Archive.
• Smaller archive.
• You can request a site for archiving.
• Special Collections include:
– UK Blogs:
https://www.webarchive.org.uk/ukwa/collection/100698/page/1/source/collection
– London Terror Attacks, 2005 (mainstream and social media commentary):
https://www.webarchive.org.uk/ukwa/collection/100757/page/1/source/collection
– Olympic & Paralympic Games 2012 (mainstream and social media):
https://www.webarchive.org.uk/ukwa/collection/4325386/page/1/source/collection
• See: https://www.webarchive.org.uk/ukwa/.
Other Web Archive Resources
• Rhizome: archiving for internet art, including interactive works
engaging with/critiquing social media: http://rhizome.org/
• Note: EDINA are currently working on an archiving tool for
researchers, ask me for more info on Site2Cite.
Using APIs to obtain Data
• APIs (Application Programming Interfaces) exist for most
social media sites and allow direct requests for data.
• Some unofficial APIs exist for sites without official/open
APIs. Use only with caution as these frequently have
privacy, security or legal issues.
• Consider working with text and data mining colleagues, or
developers, to seek additional ways to capture data such
as:
– Screen scraping (automated capture of pages from a user
perspective).
– Mobile data collection or data capture approaches to social
media.
– Internet archiving approaches using standard tools or code
libraries
Glossary: Data Request Terms
• API: Application Programming Interface – a way to request data from a web service.
• REST or RESTful API: REST stands for “Representational State Transfer” and means an API that uses
HTTP (the protocol for accessing websites) requests (or “calls”) to:
– GET – read access to content such as posts, users, etc. This is the main request you would use to retrieve
data.
– PUT – update or replace data.
– POST – create new data (such as a post to a blog, a wiki page, etc.).
– DELETE – Delete content.
• An API Endpoint – is essentially the way to address and structure what kind of request you are
making. E.g. home_timeline vs user_timeline. Each endpoint provides a different entry to the data
behind a web service.
• In a REST GET request you may have:
– Fields – the various fields of data you want to retrieve, e.g. link, message, post, etc. These are usually shown
in the Developer Documentation.
– Modifiers or Parameters - these act like filters, limiting the request in a specific way, e.g. only retrieving
posts with a location attached.
– Operators – are the various standard terms/labels for content and content types that you can use in your
GET request to shape and customise it, for instance this might include “retweets_of” or “bio” or “has:links”
etc.
• Other types of APIs and M2M (Machine-to-Machine) interfaces exist including “SOAP” and “RPC”.
• SDK is Software Development Kit and is used increasingly often as a way to package various requests
for developers to use in web or mobile apps (SDK has been used as a term for the coding tools for
smartphone platforms iOS and Android for years).
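To make the glossary terms concrete, the sketch below assembles a GET request URL from a base address, an endpoint and parameters. The helper name is hypothetical; the endpoint is the user_timeline example mentioned above, with screen_name and count as its parameters in Twitter's v1.1 API. Other services will use different base URLs and parameter names, so check the developer documentation.

```python
from urllib.parse import urlencode

def build_get_url(base_url, endpoint, **parameters):
    """Combine a base URL, an endpoint and query parameters into one GET URL."""
    return f"{base_url.rstrip('/')}/{endpoint}?{urlencode(parameters)}"

url = build_get_url(
    "https://api.twitter.com/1.1",   # base address of the web service
    "statuses/user_timeline.json",   # endpoint: which kind of data to ask for
    screen_name="twitterapi",        # parameter: whose timeline
    count=2,                         # parameter: how many posts to return
)
print(url)
```

Using urlencode rather than joining strings by hand ensures that spaces, hashtags and other special characters in parameter values are escaped correctly.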
Locating or Requesting
Social Media Data
ProgrammableWeb (https://www.programmableweb.com/) is a great source for API
information for social media sites:
• Instagram Developer: https://www.instagram.com/developer/
– API Endpoints: https://www.instagram.com/developer/endpoints/
• Twitter Developer: https://developer.twitter.com/
– APIs: https://developer.twitter.com/en/docs
– GNIP: http://support.gnip.com/apis/ - premium "Firehose" access. See also Twitter
Enterprise: https://developer.twitter.com/en/enterprise
– Free APIs cover 7 days tweets; Premium APIs exist for 30-day search and full archive search.
• Facebook for Developers: https://developers.facebook.com/
– API (Graph API): https://developers.facebook.com/docs/graph-api/
• YouTube Developers: https://developers.google.com/youtube/
– APIs (Comments and Comment Threads particularly useful):
https://developers.google.com/youtube/v3/docs/
• Weibo API: http://open.weibo.com/wiki/API%E6%96%87%E6%A1%A3/en
How do you make an API call?
• For open RESTful APIs you can enter an HTTP request in any browser
window, e.g. http://services.groupkt.com/state/get/USA/all
• Most social media APIs now require you to register your app, request a key
and include the access tokens in your request.
• In general API calls are made from within a small programme – this might
be running on your machine or from a browser based coding tool.
• Lots of existing tools based on social media APIs exist – see later slide for a
sample of these.
• Try it out:
– Codecademy Twitter API tutorial:
https://www.codecademy.com/en/tracks/twitter
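As a rough illustration of “a small programme” that includes an access token, the sketch below prepares a GET request using only the Python standard library. It assumes a service that accepts a bearer token in the Authorization header; each platform documents its own authentication scheme, and the URL and token here are hypothetical placeholders.

```python
import urllib.request

def authorised_request(url, bearer_token):
    """Prepare (but do not send) a GET request carrying an access token."""
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {bearer_token}"},
        method="GET",
    )

# Hypothetical URL and token; a real token comes from registering your app.
request = authorised_request(
    "https://api.example.org/posts?tag=research", "YOUR-ACCESS-TOKEN"
)
print(request.get_method(), request.full_url)
print(request.get_header("Authorization"))

# Sending the request would look like this (commented out: it needs a live service):
# with urllib.request.urlopen(request) as response:
#     body = response.read().decode("utf-8")
```

Keep real tokens out of shared code and published notebooks, for example by reading them from an environment variable.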
An API Endpoint is a bit
like a vending machine…
Vending machine priced by grams of fat, Google, San
Jose, California.jpg by Flickr user Cory Doctorow.
You have to use the right machine to get hold of the item
you want, then you have to enter the right code and the
right price to get your candy.
• Each item has a name, and a standard way to access it
(in a vending machine this is the item code).
• Each item has a value (in a vending machine this is the
delicious edible contents of each item).
• Each item requires some sort of trust exchange before
you can access it (in a vending machine this is cash).
• In an API that “E12” item code is actually going to look
more like:
https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name=twitterapi&count=2
• In an API the price is usually a key/access
token that is unique to you and your app – that
indicates a legitimate request and who it’s from.
Bonus: In APIs there is usually a huge range of data
(research candy) to ask for, and lots of filtering options.
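To make the vending machine analogy concrete, this small Python sketch takes the example request URL from this slide and splits it into the "machine" (the endpoint) and the "item code" (the filtering parameters). It only inspects the URL text and makes no network call; the variable names are illustrative.

```python
from urllib.parse import parse_qs, urlsplit

# The example request URL from the slide.
url = ("https://api.twitter.com/1.1/statuses/user_timeline.json"
       "?screen_name=twitterapi&count=2")

parts = urlsplit(url)

# The endpoint is the part before the "?": which machine you are using.
endpoint = parts.scheme + "://" + parts.netloc + parts.path

# The query string holds the filters: whose timeline, and how many items.
filters = parse_qs(parts.query)

print(endpoint)
print(filters)
```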
What will you get back from
an API GET request?
Assuming it has worked correctly, something like this…
See the full example at: https://developer.twitter.com/en/docs/tweets/timelines/api-
reference/get-statuses-user_timeline.html
Each of these is a new field for a single
tweet and its value.
[] is an empty field (e.g. no hashtag on this
tweet).
This data can then be processed by your app, or simply
retrieved and stored in a database or spreadsheet…
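As a rough sketch of the "process and store" step, the Python below parses a JSON response and writes selected fields as CSV, ready for a spreadsheet. The sample response is invented for illustration and only loosely follows the shape of a tweet object; the exact fields you receive depend on the API and version you call.

```python
import csv
import io
import json

# Invented sample data for illustration only: not a real API response.
sample_response = """[
  {"created_at": "Mon Oct 23 10:00:00 +0000 2017", "id_str": "1",
   "text": "First example post", "entities": {"hashtags": []}},
  {"created_at": "Mon Oct 23 10:05:00 +0000 2017", "id_str": "2",
   "text": "Second example #research",
   "entities": {"hashtags": [{"text": "research"}]}}
]"""


def to_rows(response_text):
    """Flatten each post into a simple row; an empty hashtag list becomes ''."""
    rows = []
    for post in json.loads(response_text):
        tags = [tag["text"] for tag in post["entities"]["hashtags"]]
        rows.append({
            "id": post["id_str"],
            "created_at": post["created_at"],
            "text": post["text"],
            "hashtags": " ".join(tags),
        })
    return rows


rows = to_rows(sample_response)

# Write to an in-memory buffer; swap in open("posts.csv", "w", newline="") for a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "created_at", "text", "hashtags"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```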
Recommended Tool:
Martin Hawksey’s TAGS
• Uses Google Docs to capture tweets based on a hashtag, search term, user, etc.
• Can be automated to allow rolling capture.
• Useful for capturing a sample of long term community dialogues or public
discourse where Top Tweets/7 day limits will be acceptable.
• Includes spreadsheet; visualisation; searchable archive - latter two options are
only available if you make data (semi) public.
• Uses Twitter API – takes “Top” rather than “Latest” tweets so accuracy depends on
popularity of content/hashtags.
• Well documented and supported by Martin.
• A great way to dip your toe in the API water – you have to obtain a key the first
time you run TAGS, and can access and look at the code it runs. You can also make
more advanced use of the tool and automation connecting it to other
visualisations and analysis tools.
• See: https://tags.hawksey.info/
• Support: https://tags.hawksey.info/forums/
Question/Discussion
Do you have any experience or
recommendations for social media data
collection tools or approaches?
Have you attended one of the Digital Scholarship
sessions where CAHSS researchers can meet
with developers and data specialists?
[Recommended!]
Analysis & Visualisation
Further information, tutorials etc. online and/or running through Digital Scholarship and Schools Research Methods training.
• NVivo (http://www.qsrinternational.com/nvivo/) – Premium qualitative data analysis software with social media and multimedia
support, collaborative working also supported. Feature rich. Training available. Available through UoE/CAHSS license:
https://www.ed.ac.uk/information-services/computing/desktop-personal/software/main-software-deals/nvivo.
• IBM SPSS (https://www.ibm.com/analytics/us/en/technology/spss/) – Premium data analysis tool for surveys and particularly for
quantitative data, widely used in social sciences. Available through UoE license: https://www.ed.ac.uk/information-
services/computing/desktop-personal/software/main-software-deals/spss.
• Dedoose (http://www.dedoose.com/) – Premium qualitative data analysis software with simple interface, tagging, annotation and
exploration options.
• Chorus (http://chorusanalytics.co.uk/) – Free software for data harvesting and analytics for social science research using Twitter data
• Gephi (https://gephi.org/) – Visualisation and exploration of multiple data types, particularly good for network analysis. Feature rich so
a bit of a learning curve. Free download.
• D3 visualisation libraries (see: https://github.com/d3/d3/wiki/gallery) – Free collection of JavaScript libraries for use in data
visualisation and exploration of multiple data types.
• NodeXL (https://nodexl.codeplex.com/) – Network visualisation tool for Excel. Free.
• TAGS Explorer (https://tags.hawksey.info/) – Twitter only visualisations of networks (using NodeXL) and searchable timeline archive
explorations. Free.
• Textal (http://www.textal.org/) – Text analysis tools for mobile use with Twitter streams, websites (inc. blogs), and documents. Free.
• Tableau (https://www.tableau.com/) – Visualisation of multiple data sources and types. Free trial, otherwise monthly subscription.
A large quantity of open source tools and software are available. Search for these or look at the Journal of Open Research Software
(https://openresearchsoftware.metajnl.com/) or the Journal of Open Source Software (http://joss.theoj.org/) for well documented research-
driven examples. See also Tony Hirst’s OU Useful Blog (https://blog.ouseful.info/) for visualisation approaches. There are also many marketing
packages for social media analysis which could be used/adapted for research where their processes are well documented.
Appropriate Handling & Storage
• Data is usually returned with unique identifiers that can be easily
traced back to the original poster/subject.
• The unique identifiers connect conversations and posts so are hard
to strip away entirely – although you could try a one-way hash of
the data to mask the identifiable information but retain
connections.
• Short posts and tweets are highly identifiable. Try Googling or
searching Twitter for a recent tweet to see that in action.
• Images and videos can also be relatively easily compared/reverse
image searched and therefore identifiable.
• Think about which fields you actually need to retain for your
research question(s).
• Plan how long you will keep your data, and how you will keep it
secure - where and how you store your data really matters.
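One possible way to try the one-way hash idea is sketched below in Python, using a keyed hash (HMAC-SHA256) so that the same identifier always maps to the same pseudonym and reply connections survive. The field names and secret key are hypothetical. A keyed hash is used here as an assumption on my part: plain hashes of short numeric IDs can be guessed by brute force, so the key should be stored separately from the data. This masks identifiers only; post text remains searchable and identifiable.

```python
import hashlib
import hmac

# Hypothetical secret: generate your own and keep it apart from the data set.
SECRET_KEY = b"replace-with-a-long-random-secret"


def pseudonymise(identifier, key=SECRET_KEY):
    """Return a stable one-way pseudonym for an identifier."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Illustrative records with hypothetical field names.
posts = [
    {"user_id": "1001", "in_reply_to_user_id": None},
    {"user_id": "1002", "in_reply_to_user_id": "1001"},
]

masked = [
    {field: pseudonymise(value) if value is not None else None
     for field, value in post.items()}
    for post in posts
]

# The reply in the second post still points at the author of the first post.
print(masked[1]["in_reply_to_user_id"] == masked[0]["user_id"])
```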
Data Protection & GDPR
• Be aware of current Data Protection (Data Protection Act 1998) guidance
on the use, storage and retention of personal data.
• From 25th May 2018 the General Data Protection Regulation (GDPR)
comes into effect with:
– Increased rights for individuals to understand how their data is used, plus rights of access, rectification, erasure,
restriction of processing, portability, and objection to the use of their data.
– Increased legal measures for organisations breaching GDPR guidance.
• Ensure your Consent process, your Research Data Management plans, and
your use, access and disposal of data are compliant.
• By default social media APIs provide a lot of data:
– What is the minimum data you require?
– Removing unneeded data at the point of collection and/or data cleaning will
help reduce any risks of exposure or non-compliance with data protection
legislation.
See:
• Data Protection Act 1998: https://www.legislation.gov.uk/ukpga/1998/29/contents
• ICO guidance: https://ico.org.uk/for-organisations/data-protection-reform/overview-of-the-gdpr/
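Removing unneeded data at the point of collection can be as simple as keeping an explicit list of the fields your research question requires and discarding everything else before anything is saved. The Python sketch below shows that idea; the field names and the sample record are hypothetical.

```python
# Hypothetical list: keep only the fields your research question needs.
KEEP_FIELDS = ("created_at", "text")


def minimise(record, keep=KEEP_FIELDS):
    """Return a copy of the record containing only the fields listed in keep."""
    return {field: record[field] for field in keep if field in record}


# Invented record for illustration: APIs often return far more than you need.
raw = {
    "created_at": "Mon Oct 23 10:00:00 +0000 2017",
    "text": "Example post",
    "user": {"screen_name": "example_user", "location": "Edinburgh"},
    "geo": None,
}

clean = minimise(raw)
print(clean)
```

An allow-list like this is usually safer than deleting named fields, because new fields added by the platform are dropped by default.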
Local Support
• Research Data Mantra – self-led course on Research Data Management,
including appropriate handling, storage and planning for onward
preservation, sharing or destruction: http://mantra.edina.ac.uk/
• Data Store – secure storage for active research data, available to all staff
and PGR students: https://www.ed.ac.uk/information-services/research-
support/research-data-service/working-with-data/data-storage
• Working with Sensitive Data – guidance and further resources on working
with sensitive and personal data: https://www.ed.ac.uk/information-
services/research-support/research-data-service/working-with-
data/sensitive-data
• Information Security Team – guidance on legal and technical approaches
to keeping data secure and appropriately encrypted and disposed of:
https://www.ed.ac.uk/infosec
Making Research Data Open
• If you have a consent process in place, ensure you request consent for any onward use you
expect to make of your data. And ensure there is a process to withdraw consent for onward use.
• Beware verbatim quoting in publications – it can be easy to search back to the original text.
– Public figures who would consider their social media content a publication and part of their profile
(e.g. politicians) are more appropriate to quote, where needed.
– Even if anonymous/not attributed it is safer to paraphrase short comments where possible to make
reverse searching more challenging.
• Screenshots of posts often reveal the subject name, image, location, and their contacts. Only
use these where appropriate, properly consented to, and where you are not placing your
subjects at risk.
• Consider the time lag between data collection and any publication. Is your consent from
participants still valid if a year has passed? What about 2 years? Or 5 years? A teen
participant may feel differently about data being exposed when they are, for instance, a
newly qualified lawyer or medic with very different reputational considerations.
See also: University of North Carolina at Chapel Hill and UoE Research Data Management and Sharing
(Coursera): https://www.coursera.org/learn/data-management
Courses and Information
• DMI Digital Methods online course:
https://wiki.digitalmethods.net/Digitalmethods/WebHome
• UCL Why We Post: the Anthropology of Social Media course (FutureLearn):
https://www.futurelearn.com/courses/anthropology-social-media
• QUT Social Media Analytics: Using Data to Understand Public
Conversations (FutureLearn):
https://www.futurelearn.com/courses/social-media-analytics
• Rutgers University Social Media Data Analytics (Coursera):
https://www.coursera.org/learn/social-media-data-analytics
• Doing Journalism with Data: First steps, skills and tools:
http://learno.net/courses/doing-journalism-with-data-first-steps-skills-
and-tools
• UoE Digital Footprint MOOC – understand some of the challenging
identity, privacy and ethical concerns around social media for you and
your research subjects: https://www.coursera.org/learn/digital-footprint/
Useful Niche Resources
• Utrecht Data School Data Ethics Decision Aid (DEDA):
https://dataschool.nl/research/deda/?lang=en
• Programming Historian Data Mining the Internet Archive lesson:
https://programminghistorian.org/lessons/data-mining-the-internet-archive
• Insight News Lab Social Network Analysis and Visualisation for #RDAPlenary 3
(using ScraperWiki and OpenRefine): http://hujo.deri.ie/rdaplenarysn/
• Tony Hirst First Baby Steps to Anonymising Data with Open Refine:
https://blog.ouseful.info/2015/01/23/anonymising-data-with-open-refine/
• Tony Hirst Social Interest Positioning – Visualising Facebook Friends’ Likes with
Data Grabbed Using Google Refine: https://blog.ouseful.info/2012/01/04/social-
interest-positioning-visualising-facebook-friends-likes/
• Tony Hirst Grabbing Twitter Search Results into Google Refine and Exporting
Conversations into Gephi (needs updating for new Twitter API):
https://blog.ouseful.info/2012/10/02/grabbing-twitter-search-results-into-google-
refine-and-exporting-conversations-into-gephi/
Local research and expertise
(a small sampling thereof!)
• Social media, Digital Ethnography, Sociological research methods – Kate Orton-Johnson (Sociology).
• Social Media, Digital Labour – Karen Gregory (Sociology).
• Communities on the Darknet, illicit markets and cultures – Angus Bancroft (Sociology).
• Social media in education; bots; anonymity in social media – Sian Bayne (Research in Digital
Education Centre, Moray House).
• Digital cultural heritage learning and engagement – Jen Ross (Research in Digital Education Centre,
Moray House); Claire Sowton (CAHSS); Melissa Terras (UCL/CAHSS).
• Text and data mining of social media content – Claire Grover (Informatics); Richard Tobin
(Informatics); Clare Llewellyn (Informatics; Neuropolitics Research, SPS).
• Sharing of photography, autobiographical memory and distributed cognition (inc. social media) –
Tim Fawns (Clinical Education, Centre for Medical Education, MVM).
• Big data (inc. social media) in healthcare – Mhairi Aitken (Usher Institute, MVM).
• Social media, Digital Footprint, blogging and Buddhism – Louise Connelly (Vet School).
• Mobility, mobile technology, formal and informal education communities around the world –
Michael Sean Gallagher (Research in Digital Education Centre, Moray House).
• Playful learning in informal digital environments – Clara O’Shea (Research in Digital Education
Centre, Moray House).
• Social media and politics – Neuropolitics Research group: Laura Cram (Politics, SPS); Robin Hill
(Informatics; SPS); Sujin Hong (SPS); Adam Moore (PPL).
• Visualisation of big data, including network analysis – Benjamin Bach (Design Informatics).
• Social Media and scholarly communities – Sara Shinton (IAD); James Stewart (SPS).
Recommended work
& groups researching in this area
• UoE Beyond Text Network – interdisciplinary network for social media and multimedia researchers:
https://www.wiki.ed.ac.uk/display/DIG/Beyond+Text
• UoE Informatics Language Technology Group – text mining expertise working on projects including topic modelling and
social media analysis: https://www.ltg.ed.ac.uk/
• Digital Methods Initiative (DMI) (European multi-organisation research group):
https://wiki.digitalmethods.net/Dmi/DmiAbout
• Microsoft Research Social Media Collective (US) – particularly danah boyd, Nancy Baym and Kate Crawford’s work:
https://www.microsoft.com/en-us/research/group/social-media-collective/
• #NSMNSS: New social media, new social science? - great blog reflecting on social science methods around social
media http://nsmnss.blogspot.co.uk/
• Oxford Internet Institute – particularly strong on relationships to mainstream media environment: https://www.oii.ox.ac.uk/
• Visual Social Media Lab (Sheffield) – led by Farida Vis: http://visualsocialmedialab.org/
• DocNow – social justice social media archiving: http://www.docnow.io/
• Data Driven Journalism (European Journalism Centre and Netherlands): http://datadrivenjournalism.net/
• Analysing Social Media Collaboration (UK cross-institution group, site now dormant) – responsible for the high profile
“Reading the Riots” Twitter analysis work in 2011: http://www.analysingsocialmedia.org/home
• Michael Zimmer – influential work on privacy, leading projects on privacy and Facebook: http://www.michaelzimmer.org/
• Electronic Frontier Foundation – advocates with expertise on privacy and tracking in social media: https://www.eff.org/
• Centre for Social Media Research (University of Westminster): https://www.westminster.ac.uk/social-media-research
• Digital Media and Society Research Group (Cardiff): https://www.cardiff.ac.uk/research/explore/research-units/digital-
media-and-society
• COSMOS (legacy page for Cardiff research group): http://www.cs.cf.ac.uk/cosmos/
Recommended Journals
• First Monday (University of Illinois at Chicago): http://firstmonday.org/index
• New Media & Society (Sage): http://journals.sagepub.com/home/nms
• Information, Communication & Society (Taylor & Francis):
http://tandfonline.com/toc/rics20/current
• Social Media + Society (Sage): http://journals.sagepub.com/home/sms
• Big Data & Society (Sage): http://journals.sagepub.com/home/bds
• Policy & Internet (Wiley):
http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1944-2866
• Journal of Computer-Mediated Communication (Wiley):
http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)1083-6101
• Cyberpsychology, Behavior, and Social Networking (Mary Ann Liebert Inc.):
http://online.liebertpub.com/loi/CYBER
• Journal of Broadcasting & Electronic Media (Taylor & Francis):
http://www.tandfonline.com/toc/hbem20/current
Relevant Upcoming
Digital Scholarship Sessions
• Digital Research Clinics and Resources (26th October 2017)
• Cleaning Data with Open Refine (1st November 2017)
• Regex: Regular Expressions (23rd November 2017)
• Introduction to Sentiment Analysis: What it is and how to do it simply
(14th December 2017)
Look out for further sessions and/or contact the team with any specific
requests: http://www.digital.cahss.ed.ac.uk/
Questions & Discussion
Or follow up after today: nicola.osborne@ed.ac.uk

Working with Social Media Data: Ethics & good practice around collecting, using and storing data

  • 1. Working with Social Media Data: Ethics & Good Practice around Collecting, Using and Storing Data Nicola Osborne Digital Education Manager, EDINA Nicola.osborne@ed.ac.uk @suchprettyeyes
  • 2. Introductions: my social media work • Digital Education Manager at EDINA, University of Edinburgh. • Work on EDINA’s educational technology, innovation, digital and data projects for audiences across Scotland, UK and further afield. • Co-I on: PTAS-funded Managing Your Digital Footprints research strand (2014- 2015); Ongoing (2015-) Managing Your Digital Footprint research team; PTAS- funded “A Live Pulse”: Yik Yak for understanding teaching, learning and assessment at Edinburgh project. • Co-tutor on ongoing Digital Footprint MOOC (2017-) • Previously EDINA Social Media Officer (2009-2015), providing expertise and advice on social media to colleagues across UoE for over 8 years. http://edina.ac.uk/
  • 3. Introduction: you and your work 1. Who are you? 2. What social media related research are you working on or hoping to work on? 3. What do you hope to get out of today’s session?
  • 4. Overview • Introduction & Design Considerations – Approach – Data accuracy • Ethical Considerations – Recommended ethical guidance – Terms & Conditions – and impact on Data – Consent and trust • Practical Considerations – Existing data sets – Available data tools – APIS – Options for analysis and visualisation • Storing and handling Data – Compliance with legal requirements – Sources of support • Recommended researchers, groups, and resources. • Q&A/Discussion – but questions welcome throughout!
  • 5. Where to start… • What is your research question(s)? • Are social media or social media communities the subject, or core to the subject? • Or, is it the space for recruitment or reaching an audience? • Or, is it just a convenient space for data collection?
  • 6. The Elephant (Blue Bird) in the Room Image ©Twitter.com 2012
  • 7. Research Design Considerations • Research approach to be taken • Appropriate data types to support your research – Streaming/live data OR – Archived / capture of data over time with asynchronous analysis • Ethical considerations • Consent process of subjects and their network • Etiquette considerations • Platform(s) to be used – Fit with target subjects – Terms & Conditions • Practical access limitations e.g. – Do tools for data capture exist? – Does an API exist? – What are the API limitations? – Costs of access • Your (researcher) or RAs expertise. • Long term research vision – do you have rights to use and reuse data in the ways you hope to?
  • 8. Possible Methods & Questions to Think About • Computational (See also Batrinca and Treleaven 2015): – Data access through APIs, screen scraping, established methods (e.g. DMI tools)? – Text and data mining and/or Natural Language Processing (NLP)? – Social network analysis and/or Actor Network Theory (ANT) analysis using nodes and edges in the network? – Sentiment analysis based on text mining/NLP or based on presence/absence of emojis and/or visual content? – Visual analysis and/or video or audio analysis for multimedia content? • Quantitative (See also OII 2013a, b & c): – Medium or large scale data? – Automated or survey/volunteered data collection? – Data cleansing process – how will you ensure that you have a good quality data set? – What kind of statistical analysis do you want to undertake? Tools might include SPSS, NVivo, Gephi, Tableau, etc. – Will you be comparing to existing data sets and/or undertaking trend analysis over time? – What standard tools in your field – for digital or non-digital data – can you use to collect or interpret your data? • Qualitative: – Manual collection? – Ethnographic approaches and/or participant observation? – Focus groups or similar? – Critical/reflexive reading and coding of texts/content? Batrinca, B. and Treleaven, P.C., 2015. Social Media Analytics: a survey of techniques, tools and platforms. In AI & Society, 30 (1), pp. 89-116. https://doi.org/10.1007/s00146-014-0549-4 Oxford Internet Institute, 2013a. Quantitative Methods in Social Media Research: Big Data. In OII YouTube Channel, 15 March 2013. See: https://www.youtube.com/watch?v=6hmjj7_1sSY Oxford Internet Institute, 2013b. Quantitative Methods in Social Media Research: Populations and Sampling. In OII YouTube Channel, 15 March 2013. See: https://www.youtube.com/watch?v=6hmjj7_1sSY Oxford Internet Institute, 2013c. Space-Time as a Sampling Condition for New Media Research. In OII YouTube Channel, 15 March 2013. See: https://www.youtube.com/watch?v=HNxn0PqOc8k
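The "nodes and edges" idea behind social network analysis can be sketched in a few lines. This is only an illustration under stated assumptions: the posts, author names and the `mention_edges` helper are invented for the example, and a real project would more likely export such an edge list to a tool such as Gephi.

```python
import re
from collections import Counter

# Matches @username style mentions in post text.
MENTION = re.compile(r"@(\w+)")

# Invented sample posts; real data would come from an API or an export.
posts = [
    {"author": "alice", "text": "Agree with @bob and @carol on this"},
    {"author": "bob", "text": "Thanks @alice!"},
    {"author": "alice", "text": "@bob see the report"},
]

def mention_edges(items):
    """Count directed edges (author -> mentioned user) across all posts."""
    edges = Counter()
    for post in items:
        for target in MENTION.findall(post["text"]):
            edges[(post["author"], target.lower())] += 1
    return edges

edges = mention_edges(posts)
for (source, target), weight in sorted(edges.items()):
    print(source, "->", target, weight)
```

Each author and mentioned account is a node, and the count on each pair is the edge weight that a network tool would visualise.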
  • 9. Is Social Media Data Representative? • Not all people use social media (and some of the least privileged groups in society are not online at all). • Most social media data collection methods favour English language data in mainstream US/Global sites. It is unusual to see multilingual research or research that acknowledges use of content including non-English text by primarily English speakers. • Privacy settings and publicness tend to reflect status and privilege. Accessing at-risk, vulnerable, heavily trolled, and/or niche interest groups is more difficult than obtaining public posts from middle class white male social media users. BAME communities, women’s groups, LGBTQ+ communities, etc. tend to make higher use of private groups, group moderation, and protective measures that require more qualitative and overt consent-based approaches. • Not all social media users are active. There is an “activity and agency bias” (Lutz and Hoffmann 2017) in much of the current research. Obtaining data on passive reads and engagement with content is extremely difficult through quantitative methods. It may be easier with participant observation. Lutz, C. and Hoffmann, C. P. 2017. The dark side of online participation: exploring non-, passive and negative participation. In Information, Communication & Society: AoIR Special Issue, 20 (6), pp. 876-897. http://dx.doi.org/10.1080/1369118X.2017.1293129
  • 10. Question/Discussion Which platform(s) are you intending to/are you working with? How did you select these social media spaces?
  • 11. Ethical Considerations • Visibility vs expectations of privacy: – Being “in public” is not consent to being researched; a user’s imagined audience may be quite different. (see AoIR guidance, Marwick and boyd 2011) – Are you engaging with private or “public” figures? Expectations over visibility will vary significantly. • How possible is it to obtain informed consent for work undertaken with your chosen social media platform? How can consent be withdrawn? • How will your data be collected and used? (Attributed vs Pseudonyms vs Anonymous). • What personal data is being used? Does it put anyone at risk? • What is the risk of accidental exposure or re-identification? Text snippets, quotes and images may all be easily searchable. • Public – or previously public – data can change in sensitivity over time. • How will you handle/remove/retain subsequently deleted content? Marwick, A. and boyd, d., 2011. I tweet honestly, I tweet passionately: Twitter users, context collapse, and the imagined audience. In New Media & Society, 13 (1), pp. 114-133. DOI: 10.1177/1461444810365313.
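One way to read the "Attributed vs Pseudonyms vs Anonymous" choice in practice is to replace usernames with stable pseudonyms before analysis. The sketch below is an assumption-laden illustration, not a recommended standard: `SECRET_KEY` and `pseudonym` are hypothetical names, and a keyed hash only hides the handle. Quoted text and images remain searchable, so this is pseudonymisation rather than anonymisation.

```python
import hashlib
import hmac

# Hypothetical project secret; in practice keep it out of shared code and data.
SECRET_KEY = b"replace-with-a-project-secret"

def pseudonym(username, key=SECRET_KEY):
    """Map a username to a stable pseudonym using a keyed SHA-256 hash."""
    normalised = username.lower().encode("utf-8")
    digest = hmac.new(key, normalised, hashlib.sha256).hexdigest()
    return "user_" + digest[:10]

a = pseudonym("@ExampleUser")
b = pseudonym("@exampleuser")   # same account, different capitalisation
c = pseudonym("@another_user")
print(a, b, c)
```

The same account always maps to the same pseudonym, so network and frequency analysis still work, while the key holder alone can regenerate the mapping.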
  • 12. Recommended: AoIR Ethics Guidance • AoIR Ethics Guidance (2012): https://aoir.org/reports/ethics2.pdf • AoIR Ethics Chart – a quick guide to key issues: https://aoir.org/aoir_ethics_graphic_2016/ • AoIR Ethics Guidance (2002): https://aoir.org/reports/ethics.pdf • Annette Markham (co-author of AoIR guidance) on Impact Models for ethical decision making in data research and design: https://annettemarkham.com/2017/07/impact-model-ethics/
  • 13. Recommended: Social Media Research: A Guide to Ethics • Excellent concise research ethics guidance from the ESRC-funded “Social Media, Privacy and Risk: Towards More Ethical Research Methodologies” project at University of Aberdeen. • Includes pointers to further social media ethics resources. • Townsend, L. and Wallace, C. 2016. Social Media Research: A Guide to Ethics. Aberdeen: University of Aberdeen/ESRC Social Media Enhancement project. Available from: http://www.dotrural.ac.uk/social-media-research-ethics/
  • 14. “But the data is already public” In 2008 researchers released profile data (The T3 Data Set) from Facebook accounts of students at a US University, inadvertently making identifiable data public, as reported in Zimmer (2010). In this case the researchers: • Had employed RAs who were part of the Network being examined and had (various levels of) access to more information than a non-logged-in user of Facebook/user beyond the Network. • Had funding that mandated open publishing and sharing of results. • Had the University’s consent, but not individuals’ consent, for data collection. • Combined Facebook with university housing data in their data sets. • Obscured the identity of the university where students were based, but described key characteristics. • Attempted to make all data anonymous by removing identifying information (name, student id, etc.) but left network and behavioural information intact. • Asked other researchers using the data not to attempt to reidentify subjects. • Stated that “hackers” and “extreme effort” would be the only way to “crack” the data. The university was identified swiftly based purely on the codebook and other writings about the data, without requiring direct access to the data. Once the university was identified, other specific identifying data (nationality, race, home state, etc.), sometimes with only 1 individual in these groups, made re-identification of (some) students simple. After public scrutiny and identification of the university, the data set was swiftly withdrawn by the researchers. Zimmer, M. 2010. “But the data is already public”: on the ethics of research in Facebook. In Ethics and Information Technology, 12 (4), December 2010, pp. 313-325. https://link.springer.com/article/10.1007%2Fs10676-010-9227-5
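The "only 1 individual in these groups" problem can be checked before release by counting how many records share each combination of indirect identifiers. The sketch below uses invented records and a hypothetical `small_groups` helper; it is a simple small-cell count, not a full disclosure-risk assessment.

```python
from collections import Counter

# Invented records: names are already removed, but indirect identifiers remain.
records = [
    {"nationality": "US", "home_state": "MA"},
    {"nationality": "US", "home_state": "MA"},
    {"nationality": "US", "home_state": "NY"},
    {"nationality": "US", "home_state": "NY"},
    {"nationality": "IS", "home_state": None},
]

def small_groups(rows, keys, k=2):
    """Return attribute combinations shared by fewer than k records."""
    counts = Counter(tuple(row[key] for key in keys) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}

risky = small_groups(records, ["nationality", "home_state"])
print(risky)
```

Any combination returned here describes a group small enough that a person could be singled out once the wider context (such as the institution) is known.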
  • 15. Terms & Conditions • Before undertaking any social media research understand the T&Cs and Developer T&Cs for the platform(s) you are looking at. • Understand how your research aligns with the T&Cs, and any possible issues of privacy, etiquette, or practical access. • If your work is in conflict with T&Cs either re-design your research (strongly recommended) or look carefully at risks and impacts. • You should not ignore any T&Cs for technical reasons. If there is a valid reason to ignore T&Cs for specific research reasons (such as research on deleted tweets), be prepared to justify that to ethics boards and peer reviewers. And understand that you may risk losing access to the platform and your research data if you are found to be in breach of T&Cs.
  • 16. Twitter Developer T&Cs of note (1) Section VII (Other Important Terms), A: User Protection: "Twitter Content, and information derived from Twitter Content, may not be used by, or knowingly displayed, distributed, or otherwise made available to:"… "any entity for the purposes of conducting or providing surveillance, analyses or research that isolates a group of individuals or any single individual for any unlawful or discriminatory purpose or in a manner that would be inconsistent with our users' reasonable expectations of privacy;" https://developer.twitter.com/en/developer-terms/agreement-and-policy
  • 17. Twitter Developer T&Cs of note (2) Section VII (Other Important Terms), C: Respect Users' Control and Privacy: "3. If Content is deleted, gains protected status, or is otherwise suspended, withheld, modified, or removed from the Twitter Service (including removal of location information), you will make all reasonable efforts to delete or modify such Content (as applicable) as soon as reasonably possible, and in any case within 24 hours after a request to do so by Twitter or by a Twitter user with regard to their Content." https://developer.twitter.com/en/developer-terms/agreement-and-policy
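In a locally stored data set, honouring the deletion clause means removing content once you learn it has been deleted. The sketch below assumes tweets are held in a dictionary keyed by ID and that a list of deleted IDs has already been obtained; how that list is obtained is outside this example, and `apply_deletions` is a hypothetical helper name.

```python
def apply_deletions(stored_tweets, deleted_ids):
    """Return a copy of the local store without tweets known to be deleted."""
    deleted = set(deleted_ids)
    return {tid: tweet for tid, tweet in stored_tweets.items()
            if tid not in deleted}

# Invented local store keyed by tweet ID.
store = {
    "101": {"text": "first"},
    "102": {"text": "second"},
    "103": {"text": "third"},
}

store = apply_deletions(store, ["102"])
print(sorted(store))
```

Any derived files (exports, quotes in drafts, backups) would need the same treatment for the removal to be meaningful.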
  • 18. Facebook Statement of Rights & Responsibilities Section 5: Protecting Other People's Rights "We respect other people's rights, and expect you to do the same. 1. You will not post content or take any action on Facebook that infringes or violates someone else's rights or otherwise violates the law. 2. We can remove any content or information you post on Facebook if we believe that it violates this Statement or our policies. 3. We provide you with tools to help you protect your intellectual property rights. To learn more, visit our How to Report Claims of Intellectual Property Infringement page. 4. If we remove your content for infringing someone else's copyright, and you believe we removed it by mistake, we will provide you with an opportunity to appeal. 5. If you repeatedly infringe other people's intellectual property rights, we will disable your account when appropriate. 6. You will not use our copyrights or Trademarks or any confusingly similar marks, except as expressly permitted by our Brand Usage Guidelines or with our prior written permission. 7. If you collect information from users, you will: obtain their consent, make it clear you (and not Facebook) are the one collecting their information, and post a privacy policy explaining what information you collect and how you will use it. 8. You will not post anyone's identification documents or sensitive financial information on Facebook. 9. You will not tag users or send email invitations to non-users without their consent. Facebook offers social reporting tools to enable users to provide feedback about tagging." https://www.facebook.com/terms.php
  • 19. Trust in Social Networks vs Trust in Research Research Ethics – Randall Munroe/xkcd (https://xkcd.com/1390/) Licensed under CC-BY-NC 2.5 Trust in social media networks is mixed, with users increasingly savvy about data use… However… • Social Media users can find observation by academic researchers more disconcerting than by the companies who own the platforms. • Research, depending on the topic, can feel like a judgement on behaviours making consent hugely important. • The burden on researchers to be clear about motives, funders, process, etc. is higher than on commercial companies. • There are parallels here to how individuals feel about e.g. Tesco Clubcard or Credit Card data capture vs. surveys and censuses.
  • 20. Question/Discussion What are the ethical concerns and considerations for your current (or previous) social media research?
  • 21. Obtaining Consent • Consent may be implicitly included for API data access in some terms and conditions BUT, when did you last read the terms and conditions? What about your research participants? So: • Obtain explicit consent wherever possible. • Be transparent if you are engaging in research in a space – with a pinned post, link to your participant information sheet, etc. • Consent can be tricky in anonymous and less traditional social media spaces (see e.g. Osborne 2017 for approaches used with Yik Yak). • Apply particular caution to gaining consent for screen shots, attributed posts, reproducing exact images or text of posts etc. Osborne, N. 2017. Addressing ethics of research in anonymous online spaces. In “A Live Pulse”: Yik Yak for understanding teaching, learning and assessment at Edinburgh [blog], 13th July 2017. http://yikyakresearch.blogs.edina.ac.uk/2017/07/13/addressing-ethics-of-research-in-anonymous-online-spaces/
  • 23. Some Common Ethics Pitfalls • Researcher assumes public data can be used in any way desired, without considering the subject(s) intent when originally sharing their profile/post etc. • Researcher explores conveniently available “public” data without realising that privacy settings may make more information available to them, than is truly “public”. • Researcher is using “big” data under belief that individuals will not be identifiable (as in the “But the data is already public” case). • Research subject(s) has shared data on a public site but is not aware of their own settings, or has not checked them lately, making implicit consent and the public nature of the data problematic. Discovering that they have been included in published research may be upsetting and problematic. • Research Ethics Committees and/or Journal Editorial Boards are unaware or do not properly consider that social media data includes real names, pseudonyms, locations, highly disclosive data and do not ask the right questions around the consent process, collection, aggregation, storage and retention of data. • Researcher uses full text of a post as an “anonymous” example but this is then Googled which identifies the original post/tweet/content and individual.
  • 24. Data Considerations • What kind of research approach are you taking? • Who or what is the subject of your research – what is the right social media space to capture appropriate data? • What scale of data are you looking to collect/harvest? (If working with big data see boyd & Crawford 2012) • Will you be sampling or looking to collect all data over a specific time period? • How sensitive is the topic? • What level and type of consent can you obtain from participants? • What kind of content? – Profiles – for network analysis, image analysis, qualitative review of content through profile components/data? – Posts – through API/data feed/harvesting or observation? Textual, visual, multimedia? Manual coding or text/data mining? – Comments/discussion – contents or threads of discussion? – Metadata – tags, likes, engagements? • Time bounds – how long do you expect to collect data for? • What use will you make of the data after capture? boyd, d. and Crawford, K., 2012. Critical questions for big data. In Information, Communication & Society special issue: A decade in internet time: the dynamics of the internet and society, 15 (5). http://dx.doi.org/10.1080/1369118X.2012.678878
  • 25. Sources of baseline data on usage, access, trends, literacies etc. • Oxford Internet Surveys: biennial data on UK public use and attitudes to the internet, including social media: http://oxis.oii.ox.ac.uk/research/dataset-request/ • Ofcom research and data: Regular reporting on UK public use and attitudes to media, including internet and social media: https://www.ofcom.org.uk/research-and-data/search. Includes: – Annual adult media use and attitudes, and children’s media literacy reporting: https://www.ofcom.org.uk/research-and-data/media-literacy-research; – Communications Market Report: annual overview of consumer use of communications of all types: https://www.ofcom.org.uk/research-and-data/multi-sector-research/cmr – Further regular and one-off data via the statistical release calendar: https://www.ofcom.org.uk/research-and-data/data/statistics • Pew Internet & American Life datasets: data on US public use, knowledge and understanding of the web, digital literacy, social media, etc: http://www.pewinternet.org/datasets/. For example: – Social Media Update 2016: http://www.pewinternet.org/2016/11/11/social-media-update-2016/
  • 26. Sources of Official Social Media Usage Data, Trends, Financials, etc. Best sources are quarterly earnings reports and presentations, typically including: monthly active users, usage trends, earnings, monetization strategies, financials, future plans: • Facebook & Instagram & WhatsApp: https://investor.fb.com/home/default.aspx • Twitter: https://investor.twitterinc.com/results.cfm • SnapChat: https://investor.snap.com/events-and-presentations/events • YouTube/Google via Alphabet: https://abc.xyz/investor/ • Flickr: – Currently owned by Oath, should be via Verizon once deal closes: http://www.verizon.com/about/investors – Historical up to 2017, via Yahoo captures in the Internet Archive: https://web.archive.org/web/*/https://investor.yahoo.net/index.cfm • LinkedIn: – Current via Microsoft: https://www.microsoft.com/en-us/investor/ – Historical up to 2016: https://news.linkedin.com/topic/earnings • Weibo: http://ir.weibo.com/phoenix.zhtml?c=253076&p=irol-irhome
  • 27. Privately Held Social Media • Crunchbase (https://www.crunchbase.com/) is a good source of information on shareholders/owners, acquisitions, finances, etc. • Alexa web rankings (owned by Amazon) give an overview of usage levels and trends based on ranking relative to other sites in the US, and globally. • Social Media sites’ “business” and “press” sites, official blogs and news releases are best for user data. • Some social media provide advertising APIs – which may be usable for research depending on T&Cs and data content - but not developer or open APIs, e.g. Snapchat: https://www.snap.com/en-GB/news/post/third-party-applications-and-the-snapchat-api/ e.g: – Pinterest: • data on usage from Pinterest: https://business.pinterest.com/en • Alexa data on usage: https://www.alexa.com/siteinfo/pinterest.com • investor data: https://www.crunchbase.com/organization/pinterest/investors/investors_list
  • 28. Data Quality & Reliability • Data sources and APIs can change regularly, and what is available may change over time (e.g. Twitter moved from all to “Top” tweets some years ago for its API; Facebook have changed data structures multiple times). • Errors in automated data collection can be hard to spot until analysis is undertaken – sampling, trial data collection, and review of code by colleagues can all be useful. • Gaps in data may occur because there are genuine gaps in data creation/posting etc; because there are technical issues with the social media service; because of an error in your code; or because you are over your API rate limit for the minute/hour/day. • Data may change over time – Facebook and Instagram allow posts to be edited so a request will capture one moment in time not necessarily the original or final versions. • Data may disappear over time. Notable example: the Twitter deletions terms and conditions means that deleted tweets will not appear in a later API call. – Research tools obeying the T&Cs will also update and remove deleted tweets. – Research tools retaining deleted tweets are technically in breach of the T&Cs. • Acquisitions, Mergers, and shut downs of social media sites can lead to changed terms and conditions, changes to data availability and use, changes or removals of APIs and data access routes, changes to user presence in a space, acceptable norms within a space (important for qualitative work particularly).
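Gaps in collected data are easier to investigate if they are flagged early. The sketch below scans post timestamps for unusually long silences; the timestamps, the one-hour threshold and the `find_gaps` helper are all assumptions for illustration, and a flagged gap still needs a human judgement about whether it reflects genuine quiet, a service problem, a code error or a rate limit.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, max_gap=timedelta(hours=1)):
    """Return (start, end) pairs where consecutive items are further apart than max_gap."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]

# Invented collection times for a small sample of posts.
collected = [
    datetime(2017, 11, 1, 9, 0),
    datetime(2017, 11, 1, 9, 20),
    datetime(2017, 11, 1, 13, 5),
    datetime(2017, 11, 1, 13, 30),
]

gaps = find_gaps(collected)
for start, end in gaps:
    print("No data between", start, "and", end)
```

Running a check like this during trial data collection, rather than at analysis time, makes it more likely that the cause of a gap can still be traced.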
  • 29. Hidden pre-filtering and sampling • Not all social media posts are equally likely to be included in standard API endpoints – e.g. a Twitter user with few posts and few followers is unlikely to appear on a popular hashtag. – The standard "Streaming" and "Search" APIs include around 1% of Tweets and vary in accuracy depending on activity/time etc. (See Morstatter et al 2013). • Privacy settings will reduce the accuracy of any data sampled from Facebook or other networks with more complex privacy settings, and it is hard to see what is being excluded. Morstatter, F., Pfeffer, J., Liu, H. and Carley, K.M., 2013. Is the Sample good enough? Comparing data from Twitter's streaming API with Twitter's Firehose. In ICWSM 2013 and eprint arXiv:1306.5204. Available from: https://arxiv.org/abs/1306.5204
  • 30. Question/Discussion Have you already tried obtaining data for the social media space you are using in your research? Have you faced any challenges or obstacles?
  • 31. Existing Data Sets • “The Zuckerberg files”: digital archive of all public comments by Mark Zuckerberg including social media and mainstream media content for research use: https://www.zuckerbergfiles.org/ • FiveThirtyEight Data: archive of data associated with FiveThirtyEight articles, including social media data sets: https://github.com/fivethirtyeight/data • Lumen database – tracking legal notices and complaints for removal of online materials (including social media content): https://www.lumendatabase.org/ • CSIRO (Australia’s national science agency) We Feel – emotions in Tweets – API: http://wefeel.csiro.au/#/api (see: http://datadrivenjournalism.net/resources/we_feel) • Stanford Large Network Dataset Collection - includes social network data sets: https://snap.stanford.edu/data/ • Network Repository – network datasets, including social media, Facebook and Twitter networks: http://networkrepository.com/ • DocNow – social justice social network archives: http://www.docnow.io/
  • 32. Cross-site data tools • North Carolina State University Social Media Archives Toolkit: https://www.lib.ncsu.edu/social-media-archives-toolkit; see also: https://github.com/NCSU-Libraries/Social-Media-Combine • Social Mention (search engine for social media) API: http://www.socialmention.com/api/ • Scrapebox (premium tool) YouTube Downloader: http://www.scrapebox.com/youtube-downloader and Social Account Scraper: http://www.scrapebox.com/social-account-scraper • ESRC COSMOS Open Data Tools (available but not updated since 2014): http://socialdatalab.net/software • Overview of Twitter data tools (Ahmed 2015): http://blogs.lse.ac.uk/impactofsocialsciences/2015/07/10/social-media-research-tools-overview/
  • 33. Recommended: DMI Tools The Digital Methods Initiative add new (documented) tools all the time, including: • Censorship Explorer – determine censorship in various regions through URLs & proxies. • Disqus Comment Scraper – obtain data from the Disqus comment plugin. • Expand Tiny URLs – automatically expand large collections of Tiny URLs (e.g. from tweets). • Geo IP – translate URLs or IP addresses into geographic locations (e.g. for a blog). • Instagram Hashtag Explorer – retrieve Instagram media via specific hashtags. • Issue Crawler – uses URLs to analyse relationships and connections through links between URLs. • Netvizz (Facebook) – extracts data from Facebook around groups, pages, search. • Pinterest Scraper – scrapes Pinterest URLs and captures metadata of pins. • Tumblr – data capture based on Tumblr tags, which retrieves metadata and co-incident tags. • Twitter Capture and Analysis Toolset (DMI-TCAT) – robust and reproducible tool for data capture and analysis of Twitter data. Source code available for local use. • YouTube Data Tools – extract data on YouTube channels and videos, e.g. channel networks. Access documentation and DMI tools at: https://wiki.digitalmethods.net/Dmi/ToolDatabase See also, DMI Protocols: https://wiki.digitalmethods.net/Dmi/DmiProtocols
  • 35. Internet Archive & WayBackMachine • Global archive capturing websites (to various levels of detail/depth) based on IA targets and user-submitted requests (since 2001). • You can request a site for archiving, or a group of sites. • Searchable resource OR can use exact URL to retrieve previous archived pages (WayBackMachine). • Collections exist for various social media collections, e.g: – 2016 US Presidential Election Social Media: https://archive.org/details/2016electiontwitter – Arab America on Social Media: https://archive.org/details/ArchiveIt-Collection-2797 – Gif Cities (Gifs from GeoCities): https://gifcities.org/ • Great for social media website changes, blogs, terms and conditions versions, etc. • Sites available in a range of archive formats (IA), or as viewable pages (WayBackMachine). • See: – https://archive.org/ – https://archive.org/web/
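Retrieving an archived page by exact URL can also be scripted. The sketch below builds a query for the Wayback Machine availability endpoint and parses a response; the sample response is hand-written for illustration (the real service returns live JSON of broadly this shape), and `availability_url` and `closest_snapshot` are hypothetical helper names. No network request is made here.

```python
import json
from urllib.parse import urlencode

def availability_url(page, timestamp):
    """Build a Wayback Machine availability query for a page near a date (YYYYMMDD)."""
    query = urlencode({"url": page, "timestamp": timestamp})
    return "https://archive.org/wayback/available?" + query

def closest_snapshot(raw):
    """Return the URL of the closest archived snapshot, or None if there is none."""
    closest = json.loads(raw).get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None

# Hand-written sample response, not fetched from the live service.
sample = json.dumps({"archived_snapshots": {"closest": {
    "available": True,
    "url": "http://web.archive.org/web/20130101000000/http://example.com/",
    "timestamp": "20130101000000",
    "status": "200",
}}})

query = availability_url("example.com/terms", "20130101")
snapshot = closest_snapshot(sample)
print(query)
print(snapshot)
```

This kind of lookup is useful for checking which version of a platform's terms and conditions was in force when data was collected.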
  • 37. UK Web Archive • Run by the British Library (since 2004). • Indexes (UK/related) sites to a greater depth than the Internet Archive. • Smaller archive. • You can request a site for archiving. • Special Collections include: – UK Blogs: https://www.webarchive.org.uk/ukwa/collection/100698/page/1/source/collection – London Terror Attacks, 2005 (mainstream and social media commentary): https://www.webarchive.org.uk/ukwa/collection/100757/page/1/source/collection – Olympic & Paralympic Games 2012 (mainstream and social media): https://www.webarchive.org.uk/ukwa/collection/4325386/page/1/source/collection • See: https://www.webarchive.org.uk/ukwa/.
  • 38. Other Web Archive Resources • Rhizome: archiving for internet art, including interactive works engaging with/critiquing social media: http://rhizome.org/ • Note: EDINA are currently working on an archiving tool for researchers, ask me for more info on Site2Cite.
  • 39. Using APIs to obtain Data • APIs (Application Programming Interfaces) exist for most social media sites and allow direct requests for data. • Some unofficial APIs exist for sites without official/open APIs. Use only with caution as these frequently have privacy, security or legal issues. • Consider working with text and data mining colleagues, or developers, to seek additional ways to capture data such as: – Screen scraping (automated capture of pages from a user perspective). – Mobile data collection or data capture approaches to social media. – Internet archiving approaches using standard tools or code libraries
  • 40. Glossary: Data Request Terms • API: Application Programming Interface – a way to request data from a web service. • REST or RESTful API: REST stands for “Representational State Transfer” and means an API that uses HTTP (the protocol for accessing websites) requests (or “calls”) to: – GET – read access to content such as posts, users, etc. This is the main request you would use to retrieve data. – PUT – update or replace data. – POST – create new data (such as a post to a blog, a wiki page, etc.). – DELETE – Delete content. • An API Endpoint – is essentially the way to address and structure what kind of request you are making. E.g. home_timeline vs user_timeline. Each endpoint provides a different entry to the data behind a web service. • In a REST GET request you may have: – Fields – the various fields of data you want to retrieve, e.g. link, message, post, etc. These are usually shown in the Developer Documentation. – Modifiers or Parameters - these act like filters, limiting the request in a specific way, e.g. only retrieving posts with a location attached. – Operators – are the various standard terms/labels for content and content types that you can use in your GET request to shape and customise it, for instance this might include “retweets_of” or “bio” or “has:links” etc. • Other types of APIs and M2M (Machine-to-Machine) interfaces exist including “SOAP” and “RPC”. • SDK is Software Development Kit and is used increasingly often as a way to package various requests for developers to use in web or mobile apps (SDK has been used as a term for the coding tools for smartphone platforms iOS and Android for years).
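To make the glossary concrete, the sketch below assembles a GET request URL from an endpoint name and a set of parameters. The base address `api.example.org` and the `build_get_url` helper are hypothetical; each real platform documents its own endpoints, fields and parameters.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical service address; substitute the one in the platform's documentation.
BASE = "https://api.example.org/1.1"

def build_get_url(endpoint, **params):
    """Combine an endpoint name and filtering parameters into a GET request URL."""
    return f"{BASE}/{endpoint}.json?{urlencode(params)}"

url = build_get_url("user_timeline", screen_name="example_user", count=2)
print(url)

# Reading the URL back shows the endpoint path and the parameters separately.
parts = urlsplit(url)
query = parse_qs(parts.query)
print(parts.path, query)
```

The endpoint decides what kind of data is being asked for, while the parameters narrow the request, which mirrors the endpoint and modifier terms defined above.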
  • 41. Locating or Requesting Social Media Data ProgrammableWeb (https://www.programmableweb.com/) is a great source for API information for social media sites: • Instagram Developer: https://www.instagram.com/developer/ – API Endpoints: https://www.instagram.com/developer/endpoints/ • Twitter Developer: https://developer.twitter.com/ – APIs: https://developer.twitter.com/en/docs – GNIP: http://support.gnip.com/apis/ - premium "Firehose" access. See also Twitter Enterprise: https://developer.twitter.com/en/enterprise – Free APIs cover 7 days of tweets; Premium APIs exist for 30-day search and full archive search. • Facebook for Developers: https://developers.facebook.com/ – API (Graph API): https://developers.facebook.com/docs/graph-api/ • YouTube Developers: https://developers.google.com/youtube/ – APIs (Comments and Comment Threads particularly useful): https://developers.google.com/youtube/v3/docs/ • Weibo API: http://open.weibo.com/wiki/API%E6%96%87%E6%A1%A3/en
  • 42. How do you make an API call? • For open RESTful APIs you can enter an HTTP request in any browser window, e.g. http://services.groupkt.com/state/get/USA/all • Most social media APIs now require you to register your app, request a key from them and for you to include the access tokens in your request. • In general API calls are made from within a small programme – this might be running on your machine or from a browser based coding tool. • Lots of existing tools based on social media APIs exist – see later slide for a sample of these. • Try it out: – Codecademy Twitter API tutorial: https://www.codecademy.com/en/tracks/twitter
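Including an access token in a request from "a small programme" might look like the sketch below. It only prepares the request and does not send it. The address, the token value and the use of a Bearer header are assumptions for illustration; each platform specifies its own authentication scheme in its developer documentation.

```python
from urllib.request import Request

def authorised_request(url, token):
    """Prepare (but do not send) a GET request carrying an access token."""
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")

# Hypothetical address and placeholder token.
req = authorised_request(
    "https://api.example.org/1.1/user_timeline.json?count=2",
    "YOUR-TOKEN-HERE",
)

print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

Passing the prepared request to `urllib.request.urlopen` would send it; keeping real tokens out of shared code and notebooks is part of handling credentials responsibly.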
  • 43. An API Endpoint is a bit like a vending machine… Vending machine priced by grams of fat, Google, San Jose, California.jpg by Flickr user Cory Doctorow. You have to use the right machine to get hold of the item you want, then you have to enter the right code and the right price to get your candy. • Each item has a name, and a standard way to access it (in a vending machine this is the item code). • Each item has a value (in a vending machine this is the delicious edible contents of each item). • Each item requires some sort of trust exchange before you can access it (in a vending machine this is cash). • In an API that “E12” item code is actually going to look more like: https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name=twitterapi&count=2 • In an API the price is usually a key/access token that is unique to you and your app – that indicates a legitimate request and who it’s from. Bonus: In APIs there is usually a huge range of data (research candy) to ask for, and lots of filtering options.
  • 44. What will you get back from an API GET request? Assuming it has worked correctly, something like this… See the full example at: https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline.html Each of these is a new field for a single tweet and its value. [] is an empty field (e.g. no hashtag on this tweet).
  • 45. This data can then be processed by your app, or simply retrieved and stored in a database or spreadsheet…
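Turning an API response into spreadsheet rows can be sketched as follows. The sample response is hand-written and heavily trimmed, using a few field names in the style of a Twitter timeline response; the `to_rows` helper and the chosen columns are assumptions, and a real response carries many more fields.

```python
import csv
import io
import json

# Hand-written, trimmed sample in the style of a timeline response.
sample_response = json.dumps([
    {"id_str": "1", "created_at": "Mon Nov 06 10:00:00 +0000 2017",
     "text": "Hello #ethics", "entities": {"hashtags": [{"text": "ethics"}]}},
    {"id_str": "2", "created_at": "Mon Nov 06 10:05:00 +0000 2017",
     "text": "No tags here", "entities": {"hashtags": []}},
])

def to_rows(raw):
    """Flatten each tweet in a JSON response into one spreadsheet-style row."""
    rows = []
    for tweet in json.loads(raw):
        tags = [tag["text"] for tag in tweet["entities"]["hashtags"]]
        rows.append({"id": tweet["id_str"], "created_at": tweet["created_at"],
                     "text": tweet["text"], "hashtags": " ".join(tags)})
    return rows

rows = to_rows(sample_response)

# Write to an in-memory buffer; swap in open("tweets.csv", "w", newline="") for a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "created_at", "text", "hashtags"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

The empty hashtag list on the second tweet becomes an empty cell, matching the "[] is an empty field" behaviour of the raw response.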
  • 46. Recommended Tool: Martin Hawksey’s TAGS • Uses Google Docs to capture tweets based on a hashtag, search term, user, etc. • Can be automated to allow rolling capture. • Useful for capturing a sample of long term community dialogues or public discourse where Top Tweets/7 day limits will be acceptable. • Includes spreadsheet; visualisation; searchable archive - latter two options are only available if you make data (semi) public. • Uses Twitter API – takes “Top” rather than “Latest” tweets so accuracy depends on popularity of content/hashtags. • Well documented and supported by Martin. • A great way to dip your toe in the API water – you have to obtain a key the first time you run TAGS, and can access and look at the code it runs. You can also make more advanced use of the tool and automation, connecting it to other visualisation and analysis tools. • See: https://tags.hawksey.info/ • Support: https://tags.hawksey.info/forums/
  • 47.
  • 48. Question/Discussion Do you have any experience or recommendations for social media data collection tools or approaches? Have you attended one of the Digital Scholarship sessions where CAHSS researchers can meet with developers and data specialists? [Recommended!]
  • 49. Analysis & Visualisation Further information, tutorials etc. online and/or running through Digital Scholarship and Schools Research Methods training. • NVivo (http://www.qsrinternational.com/nvivo/) – Premium qualitative data analysis software with social media and multimedia support, collaborative working also supported. Feature rich. Training available. Available through UoE/CAHSS license: https://www.ed.ac.uk/information-services/computing/desktop-personal/software/main-software-deals/nvivo. • IBM SPSS (https://www.ibm.com/analytics/us/en/technology/spss/) – Premium data analysis tool for surveys and particularly for quantitative data, widely used in social sciences. Available through UoE license: https://www.ed.ac.uk/information-services/computing/desktop-personal/software/main-software-deals/spss. • Dedoose (http://www.dedoose.com/) – Premium qualitative data analysis software with simple interface, tagging, annotation and exploration options. • Chorus (http://chorusanalytics.co.uk/) – Free software for data harvesting and analytics for social science research using Twitter data. • Gephi (https://gephi.org/) – Visualisation and exploration of multiple data types, particularly good for network analysis. Feature rich so a bit of a learning curve. Free download. • D3 visualisation libraries (see: https://github.com/d3/d3/wiki/gallery) – Free collection of JavaScript libraries for use in data visualisation and exploration of multiple data types. • NodeXL (https://nodexl.codeplex.com/) – Network visualisation tool for Excel. Free. • TAGS Explorer (https://tags.hawksey.info/) – Twitter only visualisations of networks (using NodeXL) and searchable timeline archive explorations. Free. • Textal (http://www.textal.org/) – Text analysis tools for mobile use with Twitter streams, websites (inc. blogs), and documents. Free. • Tableau (https://www.tableau.com/) – Visualisation of multiple data sources and types. Free trial, otherwise monthly subscription.
Many open source tools and software packages are also available. Search for these or look at the Journal of Open Research Software (https://openresearchsoftware.metajnl.com/) or the Journal of Open Source Software (http://joss.theoj.org/) for well documented research-driven examples. See also Tony Hirst’s OU Useful Blog (https://blog.ouseful.info/) for visualisation approaches. There are also many marketing packages for social media analysis which could be used/adapted for research where their processes are well documented.
  • 50. Appropriate Handling & Storage • Data is usually returned with unique identifiers that can be easily traced back to the original poster/subject. • The unique identifiers connect conversations and posts so are hard to strip away entirely – although you could try a one-way hash of the data to mask the identifiable information but retain connections. • Short posts and tweets are highly identifiable. Try Googling or searching Twitter for a recent tweet to see that in action. • Images and videos can also be relatively easily compared/reverse image searched and therefore identifiable. • Think about which fields you actually need to retain for your research question(s). • Plan how long you will keep your data, and how you will keep it secure - where and how you store your data really matters.
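The one-way hash idea can be sketched in a few lines of Python. This is a minimal illustration, assuming a keyed hash (HMAC-SHA256) is an acceptable choice for your project; the field names, the made-up records, the secret key and the pseudonymise helper are all hypothetical. Because the same identifier always maps to the same masked value, reply connections between accounts survive the masking.

```python
import hashlib
import hmac

# Hypothetical secret: store it separately from the dataset, or destroy it
# once masking is complete if you never need to repeat the process.
SECRET_KEY = b"replace-with-a-long-random-secret"


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed one-way hash (HMAC-SHA256) of an identifier."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Made-up records: two accounts replying to each other.
posts = [
    {"user_id": "1001", "in_reply_to_user_id": "2002"},
    {"user_id": "2002", "in_reply_to_user_id": "1001"},
]

# Mask every identifier field; equal inputs give equal outputs,
# so the reply structure is preserved.
masked = [
    {field: pseudonymise(value) for field, value in post.items()}
    for post in posts
]
print(masked[0]["user_id"] == masked[1]["in_reply_to_user_id"])
```

A keyed hash is used here rather than a plain one because account identifiers are easy to enumerate and re-hash. Note that this masks identifiers only: the text of a post remains searchable, so it does not by itself make the data anonymous.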
  • 51. Data Protection & GDPR • Be aware of current Data Protection (Data Protection Act 1998) guidance on the use, storage and retention of personal data. • From 25th May 2018 the General Data Protection Regulation (GDPR) comes into effect with: – Increased rights for individuals: to understand how their data is used, to access, rectify and erase it, to restrict its processing, to port it elsewhere, and to object to its use. – Increased legal measures for organisations breaching GDPR guidance. • Ensure your Consent process, your Research Data Management plans, and your use, access and disposal of data are compliant. • By default social media APIs provide a lot of data: – What is the minimum data you require? – Removing unneeded data at the point of collection and/or data cleaning will help reduce any risks of exposure or non-compliance with data protection legislation. See: • Data Protection Act 1998: https://www.legislation.gov.uk/ukpga/1998/29/contents • ICO guidance: https://ico.org.uk/for-organisations/data-protection-reform/overview-of-the-gdpr/
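Removing unneeded data at the point of collection can be as simple as keeping an explicit list of the fields you need and discarding everything else before anything is saved. The Python below is a minimal sketch under that assumption; the field list, the made-up record and the minimise helper are hypothetical, and which fields you keep should follow from your own research question and ethics approval.

```python
# Hypothetical choice of fields; yours depends on your research question.
FIELDS_TO_KEEP = {"created_at", "text", "lang"}


def minimise(record: dict, keep: set = FIELDS_TO_KEEP) -> dict:
    """Return a copy of the record containing only the fields in `keep`."""
    return {field: value for field, value in record.items() if field in keep}


# Made-up record with more detail than this imaginary project needs.
raw_record = {
    "created_at": "Mon Oct 02 10:00:00 +0000 2017",
    "text": "Hello world",
    "lang": "en",
    "user_name": "example_person",
    "user_location": "Edinburgh",
}

stored_record = minimise(raw_record)
print(sorted(stored_record))
```

Listing the fields to keep, rather than the fields to drop, means that anything new an API starts returning is excluded by default.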
  • 52. Local Support • Research Data Mantra – self-led course on Research Data Management, including appropriate handling, storage and planning for onward preservation, sharing or destruction: http://mantra.edina.ac.uk/ • Data Store – secure storage for active research data, available to all staff and PGR students: https://www.ed.ac.uk/information-services/research-support/research-data-service/working-with-data/data-storage • Working with Sensitive Data – guidance and further resources on working with sensitive and personal data: https://www.ed.ac.uk/information-services/research-support/research-data-service/working-with-data/sensitive-data • Information Security Team – guidance on legal and technical approaches to keeping data secure and appropriately encrypted and disposed of: https://www.ed.ac.uk/infosec
  • 53. Making Research Data Open • If you have a consent process in place, ensure you request consent for any onward use you expect to make of your data. And ensure there is a process to withdraw consent for onward use. • Beware verbatim quoting in publications – it can be easy to search back to the original text. – Public figures who would consider their social media content a publication and part of their profile (e.g. politicians) are more appropriate to quote, where needed. – Even if anonymous/not attributed it is safer to paraphrase short comments where possible to make reverse searching more challenging. • Screenshots of posts often reveal the subject name, image, location, and their contacts. Only use these where appropriate, properly consented to, and where you are not placing your subjects at risk. • Consider the time lag between data collection and any publication. Is your consent from participants still valid if a year has passed? What about 2 years? Or 5 years? A teen participant may feel differently about data being exposed when they are, for instance, a newly qualified lawyer or medic with very different reputational considerations. See also: University of North Carolina at Chapel Hill and UoE Research Data Management and Sharing (Coursera): https://www.coursera.org/learn/data-management
  • 54. Courses and Information • DMI Digital Methods online course: https://wiki.digitalmethods.net/Digitalmethods/WebHome • UCL Why We Post: the Anthropology of Social Media course (FutureLearn): https://www.futurelearn.com/courses/anthropology-social-media • QUT Social Media Analytics: Using Data to Understand Public Conversations (FutureLearn): https://www.futurelearn.com/courses/social-media-analytics • Rutgers University Social Media Data Analytics (Coursera): https://www.coursera.org/learn/social-media-data-analytics • Doing Journalism with Data: First steps, skills and tools: http://learno.net/courses/doing-journalism-with-data-first-steps-skills-and-tools • UoE Digital Footprint MOOC – understand some of the challenging identity, privacy and ethical concerns around social media for you and your research subjects: https://www.coursera.org/learn/digital-footprint/
  • 55. Useful Niche Resources • Utrecht Data School Data Ethics Decision Aid (DEDA): https://dataschool.nl/research/deda/?lang=en • Programming Historian Data Mining the Internet Archive lesson: https://programminghistorian.org/lessons/data-mining-the-internet-archive • Insight News Lab Social Network Analysis and Visualisation for #RDAPlenary 3 (using ScraperWiki and OpenRefine): http://hujo.deri.ie/rdaplenarysn/ • Tony Hirst First Baby Steps to Anonymising Data with Open Refine: https://blog.ouseful.info/2015/01/23/anonymising-data-with-open-refine/ • Tony Hirst Social Interest Positioning – Visualising Facebook Friends’ Likes with Data Grabbed Using Google Refine: https://blog.ouseful.info/2012/01/04/social-interest-positioning-visualising-facebook-friends-likes/ • Tony Hirst Grabbing Twitter Search Results into Google Refine and Exporting Conversations into Gephi (needs updating for new Twitter API): https://blog.ouseful.info/2012/10/02/grabbing-twitter-search-results-into-google-refine-and-exporting-conversations-into-gephi/
  • 56. Local research and expertise (a small sampling thereof!) • Social media, Digital Ethnography, Sociological research methods – Kate Orton-Johnson (Sociology). • Social Media, Digital Labour – Karen Gregory (Sociology). • Communities on the Darknet, illicit markets and cultures – Angus Bancroft (Sociology). • Social media in education; bots; anonymity in social media – Sian Bayne (Research in Digital Education Centre, Moray House). • Digital cultural heritage learning and engagement – Jen Ross (Research in Digital Education Centre, Moray House); Claire Sowton (CAHSS); Melissa Terras (UCL/CAHSS). • Text and data mining of social media content – Claire Grover (Informatics); Richard Tobin (Informatics); Clare Llewellyn (Informatics; Neuropolitics Research, SPS). • Sharing of photography, autobiographical memory and distributed cognition (inc. social media) – Tim Fawns (Clinical Education, Centre for Medical Education, MVM). • Big data (inc. social media) in healthcare – Mhairi Aitken (Usher Institute, MVM). • Social media, Digital Footprint, blogging and Buddhism – Louise Connelly (Vet School). • Mobility, mobile technology, formal and informal education communities around the world – Michael Sean Gallagher (Research in Digital Education Centre, Moray House). • Playful learning in informal digital environments – Clara O’Shea (Research in Digital Education Centre, Moray House). • Social media and politics – Neuropolitics Research group: Laura Cram (Politics, SPS); Robin Hill (Informatics; SPS); Sujin Hong (SPS); Adam Moore (PPL). • Visualisation of big data, including network analysis – Benjamin Bach (Design Informatics). • Social Media and scholarly communities – Sara Shinton (IAD); James Stewart (SPS).
  • 57. Recommended work & groups researching in this area • UoE Beyond Text Network – interdisciplinary network for social media and multimedia researchers: https://www.wiki.ed.ac.uk/display/DIG/Beyond+Text • UoE Informatics Language Technology Group – text mining expertise working on projects including topic modelling and social media analysis: https://www.ltg.ed.ac.uk/ • Digital Methods Initiative (DMI) (European multi-organisation research group): https://wiki.digitalmethods.net/Dmi/DmiAbout • Microsoft Research Social Media Collective (US) – particularly danah boyd, Nancy Baym and Kate Crawford’s work: https://www.microsoft.com/en-us/research/group/social-media-collective/ • #NSMNSS: New social media, new social science? – great blog reflecting on social science methods around social media: http://nsmnss.blogspot.co.uk/ • Oxford Internet Institute – particularly strong on relationships to mainstream media environment: https://www.oii.ox.ac.uk/ • Visual Social Media Lab (Sheffield) – led by Farida Vis: http://visualsocialmedialab.org/ • DocNow – social justice social media archiving: http://www.docnow.io/ • Data Driven Journalism (European Journalism Centre and Netherlands): http://datadrivenjournalism.net/ • Analysing Social Media Collaboration (UK cross-institution group, site now dormant) – responsible for the high profile “Reading the Riots” Twitter analysis work in 2011: http://www.analysingsocialmedia.org/home • Michael Zimmer – influential work on privacy, leading projects on privacy and Facebook: http://www.michaelzimmer.org/ • Electronic Frontier Foundation – advocates with expertise on privacy and tracking in social media: https://www.eff.org/ • Centre for Social Media Research (University of Westminster): https://www.westminster.ac.uk/social-media-research • Digital Media and Society Research Group (Cardiff): https://www.cardiff.ac.uk/research/explore/research-units/digital-media-and-society • COSMOS (legacy page for Cardiff research group): http://www.cs.cf.ac.uk/cosmos/
  • 58. Recommended Journals • First Monday (University of Illinois at Chicago): http://firstmonday.org/index • New Media & Society (Sage): http://journals.sagepub.com/home/nms • Information, Communication & Society (Taylor & Francis): http://tandfonline.com/toc/rics20/current • Social Media + Society (Sage): http://journals.sagepub.com/home/sms • Big Data & Society (Sage): http://journals.sagepub.com/home/bds • Policy & Internet (Wiley): http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1944-2866 • Journal of Computer-Mediated Communication (Wiley): http://onlinelibrary.wiley.com/journal/10.1111/(ISSN)1083-6101 • Cyberpsychology, Behavior, and Social Networking (Mary Ann Liebert Inc.): http://online.liebertpub.com/loi/CYBER • Journal of Broadcasting & Electronic Media (Taylor & Francis): http://www.tandfonline.com/toc/hbem20/current
  • 59. Relevant Upcoming Digital Scholarship Sessions • Digital Research Clinics and Resources (26th October 2017) • Cleaning Data with Open Refine (1st November 2017) • Regex: Regular Expressions (23rd November 2017) • Introduction to Sentiment Analysis: What it is and how to do it simply (14th December 2017) Look out for further sessions and/or contact the team with any specific requests: http://www.digital.cahss.ed.ac.uk/
  • 60. Questions & Discussion Or follow up after today: nicola.osborne@ed.ac.uk

Editor's Notes

  1. An awful lot of social media research is on Twitter. Why do researchers use Twitter? It’s really easy to get data from. It feels influential and visible. Usage has gone up: around 45% of UK online adults use Twitter, and 37% have an account and log in daily [http://www.rosemcgrory.co.uk/2017/01/03/uk-social-media-statistics-for-2017/]. But a lot of Twitter users NEVER post – 1% of accounts post 20% of all tweets. And most Twitter users have modest followings, impact and visibility, and their tweets won’t make search results on busy hashtags unless they have a connection. Twitter is no longer a serendipitous network; it is filtered and tailored so that it has some of the characteristics of Facebook in terms of visibility and “filter bubbles”.