The evolution of research on social media

Keynote at the first European Conference on Social Media, 10 July 2014, Brighton, United Kingdom.


  1. The evolution of research on social media. Farida Vis, University of Sheffield, @flygirltwo. European Conference on Social Media, 10 July 2014, Brighton, United Kingdom
  2. ACADEMIA / INDUSTRY / GOVERNMENT
  3. SOCIAL MEDIA = BIG DATA
  4. REAL-TIME ANALYTICS
  5. SOCIAL MEDIA = TWITTER
  6. WHERE ARE THE RQs?
  7. WHERE’S THE THEORY?
  8. ETHICS / METHODS / SAMPLING / DATA SHARING
  9. WHERE’S THE FUNDING?
  10. WHAT’S THE FUTURE?
  11. Aftermath of Hurricane Katrina. 2005: Flickr
  12. 2008: YouTube. Fitna: The Video Battle
  13. 2011: Twitter. Reading the Riots on Twitter
  14. data data unstructured
  15. Aftermath of Hurricane Katrina (2005, Flickr): 235 posts from 106 individuals. Manual collection possible.
  16. Fitna: The Video Battle (2008, YouTube): 1,413 videos from 700 individuals. + Computer Science
  17. Reading the Riots on Twitter (2011, Twitter): 2.6 million tweets from 700K individuals. + Lots of Computer Science
  18. READING THE RIOTS ON TWITTER. Rob Procter (University of Manchester), Farida Vis (University of Leicester), Alexander Voss (University of St Andrews). [Funded by JISC] #readingtheriots
  19. BORDER RUNNER
  20. BIG DATA
  21. ‘“Big data” is high-volume, -velocity and -variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making’ (Gartner in Sicular, 2013). A huge industry is now built around ‘social data’ and ‘listening platforms’ feeding on this data (many of these tools are not suitable for academic use: they are black boxes).
  22. • Technology: maximizing computation power and algorithmic accuracy to gather, analyze, link, and compare large data sets. • Analysis: drawing on large data sets to identify patterns in order to make economic, social, technical, and legal claims. • Mythology: the widespread belief that large data sets offer a higher form of intelligence and knowledge that can generate insights that were previously impossible, with the aura of truth, objectivity, and accuracy. (boyd and Crawford, p. 663).
  23. Critiques of Big Data • Important to make visible inherent claims about objectivity • Problematic focus on quantitative methods • How can data answer questions it was not designed to answer? • How can the right questions be asked? • Inherent biases in large, linked, error-prone datasets • Focus on text and numbers that can be mined algorithmically • Data fundamentalism
  24. Data fundamentalism: the notion that correlation always indicates causation, and that massive data sets and predictive analytics always reflect ‘objective truth’. The idea of, and belief in, an objective ‘truth’ (that something can be fully understood from a single perspective) again brings to light tensions about how the social world can be made known.
  25. CRITICAL BIG DATA STUDIES?
  26. How do we ground online data? In the offline: assessing findings against what we know about an offline population (census data) in order to better understand online data. Problems with over- or under-representation in online data? In the online: premised on the idea that data derived from social media should be grounded in other online data in order to understand it; for example, comparing Facebook use to what we know about Facebook use, rather than connecting it to offline measurements about citizens (Richard Rogers).
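
Grounding in the offline, as described on this slide, amounts to comparing the make-up of an online sample with a known offline population. A minimal Python sketch, assuming we already have user counts per group for the online sample and population shares from census data; the function name, group labels and numbers are hypothetical placeholders. A ratio above 1 marks over-representation online, below 1 under-representation.

```python
from typing import Dict


def representation_ratios(sample_counts: Dict[str, int],
                          population_shares: Dict[str, float]) -> Dict[str, float]:
    """Compare each group's share of an online sample with its share of an
    offline population (e.g. census data).

    Returns share_in_sample / share_in_population per group:
    > 1 means over-represented online, < 1 means under-represented.
    """
    total = sum(sample_counts.values())
    if total == 0:
        raise ValueError("sample is empty")
    ratios = {}
    for group, pop_share in population_shares.items():
        if pop_share <= 0:
            raise ValueError(f"population share for {group!r} must be positive")
        sample_share = sample_counts.get(group, 0) / total
        ratios[group] = sample_share / pop_share
    return ratios


# Placeholder numbers, invented purely to show the calculation.
sample = {"18-34": 600, "35-54": 300, "55+": 100}
census = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}
for group, ratio in representation_ratios(sample, census).items():
    print(f"{group}: {ratio:.2f}")
```

This only measures representation on attributes that can be observed or inferred for online users, which is itself a limitation worth reporting.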
  27. Important considerations: 1. Asking the right question: research should be question driven rather than data driven. 2. Accept poor data quality and users gaming metrics: once online metrics have value, users will try to game them. 3. Limitations of tools (often built in a disconnected way). 4. Transparency: researchers should be upfront about the limitations of their research and research design. Can the data answer the questions?
  28. A critical reflection on Big Data: considering APIs, researchers and tools as data makers
  29. Rather than assuming data already exists ‘out there’, waiting to simply be recovered and turned into findings, the article examines how data is co-produced through dynamic research intersections. A particular focus is the intersections between the Application Programming Interface (API), the researcher collecting the data as well as the tools used to process it. In light of this, the article offers three new ways to define and think about Big Data and proposes a series of practical suggestions for making data. (First Monday, October 2013, http://firstmonday.org/)
  30. Twitter data ecosystem
  31. Standard API sampling problems • Sampling from the FIREHOSE • 1% random sample of the firehose • If not rate limited – all data collected?
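
The point about the 1% sample can be illustrated with a small simulation. This is a sketch, assuming the sample behaves like independent uniform random sampling of each tweet with probability 0.01; the talk does not describe Twitter's actual sampling mechanism, which may differ. Scaling a sampled count by 100 estimates the firehose total, but for rare keywords the estimate is very noisy. The firehose here is a hypothetical list of strings.

```python
import random
from typing import List, Sequence


def sample_stream(tweets: Sequence[str], rate: float = 0.01,
                  seed: int = 0) -> List[str]:
    """Keep each tweet independently with probability `rate`
    (an assumed model of a '1% random sample' of the firehose)."""
    rng = random.Random(seed)
    return [t for t in tweets if rng.random() < rate]


def estimate_total(sampled_count: int, rate: float = 0.01) -> float:
    """Scale a count observed in the sample back up to the full stream."""
    return sampled_count / rate


# Hypothetical firehose: 1,000,000 tweets, of which 500 mention a rare hashtag.
firehose = ["#rare"] * 500 + ["other"] * 999_500
sample = sample_stream(firehose)
rare_in_sample = sum(1 for t in sample if t == "#rare")
print("sampled tweets:", len(sample))
print("estimated #rare tweets:", estimate_total(rare_in_sample), "(true: 500)")
```

Rerunning with different seeds shows how widely the estimate for the rare hashtag swings, even though the overall sample size stays close to 10,000.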
  32. New API sampling problems • New business models: enriched metadata • Social media vs social data • Datasift, GNIP and Topsy
  33. Social media VS social data • Social Media: User-generated content where one user communicates and expresses themselves and that content is delivered to other users. Examples of this are platforms such as Twitter, Facebook, YouTube, Tumblr and Disqus. Social media is delivered in a great user experience, and is focused on sharing and content discovery. Social media also offers both public and private experiences with the ability to share messages privately. • Social Data: Expresses social media in a computer-readable format (e.g. JSON) and shares metadata about the content to help provide not only content, but context. Metadata often includes information about location, engagement and links shared. Unlike social media, social data is focused strictly on publicly shared experiences. (Cairns, 2013)
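
To make the distinction concrete: social data in Cairns's sense is the post plus machine-readable metadata. A minimal sketch that splits a JSON record into content and context; the field names (`text`, `user`, `geo`, `retweet_count`, `urls`) are hypothetical and are not meant to reproduce any particular vendor's schema.

```python
import json
from typing import Any, Dict


def split_content_and_context(raw: str) -> Dict[str, Any]:
    """Separate what a user said (content) from the metadata describing it
    (context). Field names are illustrative assumptions, not a real schema."""
    record = json.loads(raw)
    content = record.get("text", "")
    context = {
        "location": record.get("geo"),                 # where
        "engagement": record.get("retweet_count", 0),  # how much attention
        "links": record.get("urls", []),               # what was shared
        "author": record.get("user", {}).get("screen_name"),
    }
    return {"content": content, "context": context}


raw_post = json.dumps({
    "text": "Clean-up under way on our street",
    "user": {"screen_name": "example_user"},
    "geo": None,
    "retweet_count": 12,
    "urls": ["http://example.com/photo"],
})
print(split_content_and_context(raw_post))
```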
  34. Social media / Social data
  35. Enriched metadata: location and influence. Where are users? Are they influential?
  36. Twitter expanding/enriching metadata (Hawksey, 2013)
  37. New Profile Geo Enrichment
  38. Geo-locating tweets. Exact location: lat/long coordinates, gold standard geo data. Problem: only 1% of users -> only 2% of firehose tweets; early adopters, highly skewed. Where in the world are you? (profile location): no lat/long coordinates; free-text field, enter anything. Advantage: more than half of all tweets contain a profile location, much more evenly distributed.
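
Taking the slide's two figures at face value, a back-of-envelope calculation shows why the geotagged subset is skewed. If a share u = 0.01 of users produces a share t = 0.02 of firehose tweets, the average tweet rate of geotagging users relative to everyone else is:

```latex
\[
\frac{t/u}{(1-t)/(1-u)} = \frac{0.02/0.01}{0.98/0.99} = \frac{2 \times 0.99}{0.98} \approx 2.02
\]
```

Under these assumptions, geotagging users post roughly twice as often as other users, so the "gold standard" geo data over-weights a small, unusually active group.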
  39. Profile Geo Enrichment: ‘our customers can now hear from the whole world of Twitter users and not just 1%’ (Cairns, 2013, on the Gnip company blog) • Activity Location: the 1% that provide lat/long • Profile Location: place provided in the user’s profile; they may or may not be posting from there • Mentioned Location: places a user talks about. ‘Both the tweet text and Profile fields contain geographic information, but not in substantial quantities and have poor accuracy’ (Leetaru et al., First Monday, May 2013)
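
The three kinds of location on this slide come from different fields and carry different certainty. A minimal sketch of recording which kind of evidence a dataset actually holds for each tweet; the `Tweet` layout and the tiny place list used for mentioned locations are hypothetical, and naive substring matching like this is far cruder than real geoparsing.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Tweet:
    text: str
    coordinates: Optional[Tuple[float, float]] = None  # (lat, long), if shared
    profile_location: str = ""                          # free-text profile field


def location_evidence(tweet: Tweet, known_places: Iterable[str]) -> List[str]:
    """Label which kinds of location evidence a tweet carries:
    - 'activity':  exact lat/long attached to the tweet (rare, but precise)
    - 'profile':   whatever the user typed in their profile (may not be
                   where they are posting from)
    - 'mentioned': a known place name appearing in the tweet text
    """
    kinds = []
    if tweet.coordinates is not None:
        kinds.append("activity")
    if tweet.profile_location.strip():
        kinds.append("profile")
    lowered = tweet.text.lower()
    if any(place.lower() in lowered for place in known_places):
        kinds.append("mentioned")
    return kinds


places = ["Tottenham", "Manchester"]  # hypothetical gazetteer
print(location_evidence(Tweet("Sirens all over Tottenham tonight",
                              profile_location="somewhere in the world"), places))
```

Keeping the three labels separate, rather than collapsing them into one "location" column, makes it possible to report later how much of a dataset's geography rests on each kind of evidence.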
  40. Problem with deleted tweets: ‘A deleted tweet effectively disappears from the results of searching Twitter, although a short delay sometimes occurs between deletion and disappearance. A status deletion notice is distributed via the Twitter streaming API to relevant users’ clients so that they, in turn, remove deleted tweets from their records.’ ‘Twitter does not provide a bulk-deletion of user’s tweets. It provides, however, a one-click bulk-deletion of all location data that were attached to user’s tweets, without deleting the tweets. By clicking on the “Delete all location information” button on user’s account settings page, all locations attached to all previous tweets are deleted.’ (Almuhimedi et al., 2013)
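
The deletion behaviour quoted above implies that anyone storing tweets has to keep applying deletion notices after collection. A minimal sketch of doing so over a locally stored dataset. The message shapes, `{"delete": {"status": {"id": ...}}}` for a deleted tweet and `{"scrub_geo": {"user_id": ..., "up_to_status_id": ...}}` for bulk location removal, are modelled on the historical streaming API but are assumptions here and should be checked against the documentation of whatever API is in use.

```python
from typing import Any, Dict


def apply_compliance_message(store: Dict[int, Dict[str, Any]],
                             message: Dict[str, Any]) -> None:
    """Update a local tweet store (tweet id -> tweet dict) in place.

    Assumed message shapes (check against the API you actually use):
      {"delete": {"status": {"id": ..., "user_id": ...}}}
      {"scrub_geo": {"user_id": ..., "up_to_status_id": ...}}
    """
    if "delete" in message:
        # The user deleted a tweet: drop it from our records too.
        store.pop(message["delete"]["status"]["id"], None)
    elif "scrub_geo" in message:
        # The user removed location data from past tweets, but the tweets
        # themselves remain: strip coordinates, keep the text.
        info = message["scrub_geo"]
        for tweet_id, tweet in store.items():
            if (tweet.get("user_id") == info["user_id"]
                    and tweet_id <= info["up_to_status_id"]):
                tweet["coordinates"] = None


store = {
    1: {"user_id": 7, "text": "first", "coordinates": (53.4, -2.2)},
    2: {"user_id": 7, "text": "second", "coordinates": (53.5, -2.3)},
    3: {"user_id": 9, "text": "other user", "coordinates": (51.5, -0.1)},
}
apply_compliance_message(store, {"delete": {"status": {"id": 2, "user_id": 7}}})
apply_compliance_message(store, {"scrub_geo": {"user_id": 7, "up_to_status_id": 2}})
print(store)
```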
  41. Profile Geo Enrichment: linking data. ‘Profile location data can be used to unlock demographic data and other information that is not otherwise possible with activity location. For instance, US Census Bureau statistics are aggregated at the locality level and can provide basic stats like household income. Profile location is also a strong indicator of activity location when one isn’t provided.’ (Cairns, 2013)
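
The linking step Cairns describes is essentially a join between a cleaned-up profile location and a table of area-level statistics. A minimal sketch, assuming the free-text profile field can be reduced to an area name by a crude normalisation rule; the rule, the area names and the income figures are hypothetical placeholders. Every unmatched or mis-matched profile silently shapes the linked dataset, which is the kind of problem that becomes hard to untangle once data is linked.

```python
from typing import Dict, List, Optional, Tuple


def normalise(profile_location: str) -> str:
    """Very crude clean-up of a free-text profile location (an assumption:
    real geocoding is much harder than taking the first comma-separated part)."""
    return profile_location.split(",")[0].strip().lower()


def link_to_area_stats(profiles: Dict[str, str],
                       area_stats: Dict[str, float]
                       ) -> Tuple[Dict[str, Optional[float]], List[str]]:
    """Attach an area-level statistic (e.g. median household income) to each
    user via their profile location. Returns the linked values and the list
    of users whose location could not be matched."""
    linked: Dict[str, Optional[float]] = {}
    unmatched: List[str] = []
    for user, location in profiles.items():
        value = area_stats.get(normalise(location))
        linked[user] = value
        if value is None:
            unmatched.append(user)
    return linked, unmatched


# Placeholder values, not real census figures.
stats = {"northtown": 52000.0, "southtown": 47000.0}
users = {"@a": "Northtown, USA", "@b": "  southtown", "@c": "the moon"}
linked, unmatched = link_to_area_stats(users, stats)
print(linked)
print("unmatched:", unmatched)
```

Returning the unmatched users explicitly, rather than dropping them, keeps the coverage of the linkage visible so it can be reported alongside any findings.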
  42. Social influence: Klout scores
  43. Klout used to make profiles without consent
  44. This is a DM (direct message). What is it doing here?
  45. Klout rewards users for giving data. More data = more influential? Online/offline?
  46. Scores are easily gamed
  47. Fake followers: Mitt Romney’s 100,000 extra followers in one day. As many as 20 million fake follower accounts (out of 200 million active users). This does not take into account the issue of spoof accounts (clearly in evidence in riot tweets) (Perlroth, 2013)
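
Episodes like the sudden jump in followers on this slide are one reason raw follower counts, and scores built on them, are hard to trust. A minimal sketch of flagging suspicious jumps in a daily follower series. The threshold rule (a day's gain exceeding some multiple of the median daily gain) is an illustrative heuristic chosen here, not the method used by Klout, Twitter or the sources cited, and a flagged day is only a prompt for closer inspection, not proof of fake followers.

```python
from statistics import median
from typing import List


def flag_follower_spikes(daily_counts: List[int], factor: float = 10.0) -> List[int]:
    """Return indices of days whose follower gain exceeds `factor` times the
    median daily gain over the whole series."""
    gains = [b - a for a, b in zip(daily_counts, daily_counts[1:])]
    if not gains:
        return []
    typical = max(median(gains), 1)  # avoid a zero baseline
    # gains[i] is the change from day i to day i + 1, so report day i + 1.
    return [i + 1 for i, g in enumerate(gains) if g > factor * typical]


# Hypothetical account: steady growth, then one day with a huge jump.
counts = [10_000, 10_200, 10_350, 10_500, 110_500, 110_700, 110_900]
print(flag_follower_spikes(counts))
```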
  48. Klout scores: an industry standard measure…
  49. Ability to describe the limitations of our data: - APIs as data makers. Once data is linked, it is very hard to untangle how metadata is constructed and where problems might lie, including with respect to deleted content. - Researchers and tools as data makers
  50. - When creating a dataset, it is important to describe how it was made and what its limitations are. What are the sampling limitations (both in terms of the API and in relation to the offline ‘population’)? What other limitations regarding enriched metadata need to be described? - When creating a dataset, how complete is it? - Limitations need to be known in order to describe them. This is a real problem.
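
One practical way to act on these points is to ship every dataset with a structured description of how it was made. A minimal sketch of such a record; the field names and the completeness check are hypothetical suggestions, not a standard, and "unknown" is deliberately allowed because, as the slide notes, limitations must first be known before they can be described.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetDescription:
    """How a social media dataset was made (all fields are illustrative)."""
    platform: str
    api_endpoint: str                   # which API the data came through
    sampling: str                       # e.g. "keyword filter", "1% sample"
    collection_period: str
    rate_limited: str = "unknown"       # "yes", "no" or "unknown"
    deletions_applied: str = "unknown"  # were deletion notices honoured?
    enriched_fields: List[str] = field(default_factory=list)  # vendor-added metadata
    offline_population_notes: str = ""
    known_gaps: List[str] = field(default_factory=list)

    def undocumented(self) -> List[str]:
        """List the aspects that are still unknown or left blank."""
        missing = [name for name in ("rate_limited", "deletions_applied")
                   if getattr(self, name) == "unknown"]
        if not self.offline_population_notes:
            missing.append("offline_population_notes")
        return missing


desc = DatasetDescription(
    platform="Twitter",
    api_endpoint="(hypothetical) streaming endpoint",
    sampling="(hypothetical) keyword filter",
    collection_period="YYYY-MM-DD to YYYY-MM-DD",
    enriched_fields=["profile geo"],
)
print(desc.undocumented())
```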
  51. Tools as data makers. In answering complex questions about social media data: 1. We need to know the questions! And know how they might be answered. 2. Problem with tools: they are not question driven. They are often developed around available (poor quality) data, and often not by social media experts but by those with data-processing expertise. 3. Tools therefore become data makers, in that they limit the scope of possibility in the questions researchers imagine. This is a huge problem!
  52. Need a better understanding of the complex, ever-changing dynamics between APIs, researchers and tools
  53. Organic data / data in the wild: SOCIAL MEDIA SIMPLY AS (BIG) DATA VS SOCIAL MEDIA AS A RESEARCH AREA
  54. DOMAIN EXPERTISE
  55. TO UNDERSTAND TWITTER DATA YOU NEED TO UNDERSTAND TWITTER
  56. + BE ON TWITTER
  57. WHAT’S THE FUTURE?
  58. WHAT GETS LEFT OUT?
  59. 750 MILLION IMAGES SHARED DAILY
  60. Images possess the ability to grab our attention. Social media companies know this. Images are key to engagement.
  61. Camera: used to be for special occasions. Smartphone: always with us.
  62. Everyday snaps / Witnessing events
  63. US: 65% smartphone penetration. Smartphones have overtaken desktops for accessing the internet. Mobile internet accounts for the majority of internet use in the US (57%). Users typically access the internet via apps on mobile devices. All figures from comScore, US Digital Future in Focus, 2014
  64. UK: the over-55s will experience the fastest year-on-year rises in smartphone penetration. Their smartphone ownership should increase to about 50% by year-end, a 25% increase from 2013, but trailing the 70% penetration among 18-54s. The difference in smartphone penetration by age will disappear, but differences in how smartphones are used will remain substantial: many over-55s use smartphones like feature phones. All figures from Deloitte, predictions for 2014
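
The phrase "about 50% by year-end, a 25% increase from 2013" can be read in two ways, which imply different 2013 baselines for over-55s' smartphone penetration:

```latex
\[
\text{relative increase: } p_{2013} = \frac{50\%}{1.25} = 40\%
\qquad
\text{percentage points: } p_{2013} = 50\% - 25\% = 25\%
\]
```

Which reading is intended is not stated on the slide.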
  65. Rise of platforms and apps focused on visual content: Pinterest, Tumblr, Instagram, Vine, Snapchat. ‘Mobile first… and only’ | simple, easy, user-friendly design
  66. Facebook daily image uploads: 350 million (November 2013) • Instagram daily image uploads: 60 million (March 2014) • Twitter: 500 million tweets daily (March 2014) • Snapchat daily snaps: 400 million (November 2013)
  67. Images largely ignored in social media research: not easy to ‘mine’, hard to figure out meaning. Huge interest in industry.
  68. WHAT DOES THE FUTURE OF SOCIAL MEDIA RESEARCH LOOK LIKE?
  69. QUESTION DRIVEN (TOOL AWARE) + CRITICAL / BETTER METHODS / MORE THEORY / TRANSPARENT / SUSTAINABLE / ETHICAL / CROSS PLATFORM / INTERDISCIPLINARY / MORE CROSS SECTOR? / MORE FUNDING!
  70. visualsocialmedialab.org @VisSocMedLab f.vis@sheffield.ac.uk @flygirltwo
  71. References
  • Hazim Almuhimedi, Shomir Wilson, Bin Liu, Norman Sadeh and Alessandro Acquisti, 2013. ‘Tweets Are Forever: A Large-Scale Quantitative Analysis of Deleted Tweets’, CSCW’13, February 23–27, 2013, San Antonio, Texas, USA, http://www.cs.cmu.edu/~shomir/cscw2013_tweets_are_forever.pdf, accessed 18 September 2013.
  • Ian Cairns, 2013. ‘Get More Geodata From Gnip With Our New Profile Geo Enrichment’, Gnip Company Blog, 22 August, http://blog.gnip.com/tag/geolocation/, accessed 13 September 2013.
  • Grcommunication, 2012. ‘I will help raise your Klout score by sending you 10Ks and will tweet it out to my 50K+ followers from my 80+ Klout score for $5’, http://fiverr.com/grcommunication/help-raise-your-klout-score-by-sending-you-10ks-and-will-tweet-it-out-to-my-17k-followers-from-my-70-klout-score, accessed 19 September 2013.
  • Anthony Ha, 2013. ‘Gnip Expands Its Partnership With Klout, Becoming The Exclusive Provider Of Klout Topics’, TechCrunch, 8 August, http://techcrunch.com/2013/08/08/gnip-klout/, accessed 19 September 2013.
  • Martin Hawksey, 2013. ‘Twitter throws a bone: Increased hits and metadata in Twitter Search API 1.1’, 28 March, http://mashe.hawksey.info/2013/03/twitterthrows-a-bone-increased-hits-and-metadata-in-twitter-search-api-1-1/, accessed 10 September 2013.
  • Kalev H. Leetaru, Shaowen Wang, Guofeng Cao, Anand Padmanabhan and Eric Shook, 2013. ‘Mapping the global Twitter heartbeat: The geography of Twitter’, First Monday, Volume 18, Number 5-6, May, http://firstmonday.org/article/view/4366/3654
  • Nicole Perlroth, 2013. ‘Fake Twitter Followers Become Multimillion-Dollar Business’, New York Times, Bits blog, 5 April, http://bits.blogs.nytimes.com/2013/04/05/fake-twitter-followers-becomes-multimillion-dollar-business/, accessed 19 September 2013.
  • Farida Vis, 2013. ‘A critical reflection on Big Data: considering APIs, researchers and tools as data makers’, First Monday, 7 October, http://firstmonday.org
