An overview of Twitter analytics
Wasim Ahmed (wahmed1@sheffield.ac.uk)
(Twitter: @was3210)
Acknowledgements to Sergej Lugovic (@sergejlugovic)
Contemporary Issues in Economy & Technology (CIET).
15th June 2016. Split, Croatia.
19/06/2016 © The University of Sheffield
About me
• Second Year PhD student from the Information
School, University of Sheffield (UK).
• PhD examines content that is shared on Twitter
during infectious disease outbreaks.
• Run a social media research blog (over 11
thousand hits)
About me
• Currently working on a PhD project
examining infectious disease outbreaks on
Twitter
• Alongside the PhD, I have assisted security research
teams, government, media, and
educational organisations globally
About me…continued
• Also work part time as a Research
Associate: Social Media specialist
Overview of workshop
• Part 1 – Overview of Twitter, and case
study examples
• Part 2 – Overview of Twitter analytics
software / interactive sessions
• Part 3 – Q&A on tools – make sure to jot
down some questions!
Aims
• Better understand Twitter as a platform
• Provide examples of case studies using
social media analytics
• Gain knowledge and awareness of Twitter
analytics
Twitter
• Twitter allows brief text updates of up to 140
characters, known as ‘tweets’, to be shared
with other users
• Tweets can contain thoughts, feelings,
activities, and opinions (Chew and
Eysenbach, 2010).
Twitter
• Twitter reports 316 million monthly
active users
• 500 million tweets are sent per day
• 80% of active Twitter users use a mobile
device (About Twitter, n.d.).
Why Twitter (data)?
• See my LSE Impact blog post for a baseline comparison with Facebook
• Twitter is a popular platform in terms of the media attention it receives and it
therefore attracts more research due to its cultural status
• Twitter makes it easier to find and follow conversations (i.e., by both its search
feature and by tweets appearing in Google search results)
• Twitter has hashtag norms which make it easier to gather, sort, and expand
searches when collecting data
• Twitter data is easy to retrieve as major incidents, news stories and events on
Twitter tend to be centred around a hashtag
• The Twitter API is more open and accessible compared to other social media
platforms, which makes Twitter more favourable to developers creating tools to
access data. This consequently increases the availability of tools to researchers.
• Many researchers use Twitter themselves and, because of their favourable
personal experiences, feel more comfortable researching a familiar
platform.
Different types of Twitter API
• API stands for Application Programming Interface
• Twitter’s Search API – focused on relevance and not
completeness, so some tweets and users may be missing
from results (7 days back in time up to 3200 queries)
• Twitter Streaming API – gives developers low-latency
access to Twitter’s global stream of tweet data
(live stream)
• Firehose API – in theory, 100% of Twitter data (most
software allows up to 30 days’ worth of historical
tweets)
What if I want data going back
more than 30 days?
• In most instances you will have to pay for it
• I use Texifter (@texifter) with DiscoverText
(@discovertext)
• Can range from not that expensive to
very expensive depending on query and
time
Legal issues
• Sharing of Twitter datasets is prohibited;
see https://dev.twitter.com/terms/api-terms
• However, sharing Tweet IDs (to look up
the tweets used) is permissible. This is
useful for reproducibility.
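A minimal sketch of preparing a shareable list of Tweet IDs, assuming the collected tweets are held as dictionaries that carry the `id_str` field used in Twitter's tweet JSON. The function name and the sample records are hypothetical and only illustrate the idea of publishing IDs instead of full tweet content.

```python
def extract_tweet_ids(tweets):
    """Return the unique tweet IDs from a list of tweet dicts, in numeric order."""
    ids = {t["id_str"] for t in tweets if "id_str" in t}
    return sorted(ids, key=int)


# Hypothetical collected tweets (made-up IDs and text, for illustration only).
sample_tweets = [
    {"id_str": "205", "text": "Example tweet about an outbreak"},
    {"id_str": "101", "text": "Another example tweet"},
    {"id_str": "205", "text": "Example tweet about an outbreak"},  # collected twice
]

tweet_ids = extract_tweet_ids(sample_tweets)

# One ID per line is a common plain-text layout for sharing.
shareable = "\n".join(tweet_ids)
print(shareable)
```

Only the IDs leave the researcher's machine; anyone reproducing the study would look the tweets up again themselves.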
Business Expenditure
• Businesses spend millions of dollars every
year tailoring their brands and protecting
them
• Historically, traditional media and a one-to-many
approach gave control to brands via
advertisers
Shift of Power
• With the emergence of social media, the
traditional brand communication process
has reached something of a crisis
• Traditional communication lines are rapidly
breaking down
• When it became clear that Twitter was becoming
an important social networking site and public
communication platform, a number of businesses
and social media marketing professionals
attempted to exploit the platform for
commercial purposes
Toyota
• Toyota had to recall a number of its cars in
2009 and 2010 due to a serious safety
fault which resulted in the deaths of over
50 people
• Unlike Sony, Toyota immediately went into
damage control
• As soon as the recall crisis started getting
media attention, Toyota quickly put
together an ‘Online Newsroom’ and a
‘Social Media Strategy Team’ to
coordinate all the media releases
Sony PlayStation Network
• In mid-April 2011 the PlayStation Network was
suddenly shut down without explanation
• Frustrations quickly spread through social media
sites such as Twitter as gamers around the
world voiced their annoyance at not being able
to access their online games
• The lack of regular updates and
information from Sony served to incense
users
• Users struggled to determine what was
fact and what was rumour on Twitter
• The lapse in communication was
incomprehensible to consumers
• Lack of regular updates and information
only served to incense users further
• “I think It is pretty disgusting that Sony have waiting 7
days to tell users that their Credit Card details may have
been compromised”.
• “I bet the hacker will get emails out quicker than Sony!”
Toyota
• While anger and negative viewpoints were
still shared through social media services,
the company was able to minimise their
impact by eliminating confusion and
keeping the consumer base regularly
informed of developments
Brand Management
• The two cases highlight that brands
need to know how they are being
mentioned across social media profiles
• Social Media Analytics is now a huge
market
Types of analysis possible
• Sentiment analysis has the potential to
work well with Twitter data, as tweets are
consistent in length (i.e., <= 140 characters)
• However, sarcasm is difficult to detect
within tweets.
• One option is the SentiStrength algorithm
(http://sentistrength.wlv.ac.uk/)
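To illustrate the idea of sentiment scoring, and why sarcasm is a problem, here is a minimal lexicon-based sketch. It is not SentiStrength: the word lists, function names, and example tweets are hypothetical and far smaller than any real lexicon.

```python
import re

# Tiny hypothetical word lists; a real lexicon would be far larger.
POSITIVE = {"good", "great", "love", "happy", "win"}
NEGATIVE = {"bad", "disgusting", "angry", "hate", "annoyed"}


def score_tweet(text):
    """Count positive words minus negative words in a tweet."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives


def label_tweet(text):
    """Map the score to a coarse sentiment label."""
    score = score_tweet(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(label_tweet("I love this, great win!"))
print(label_tweet("Pretty disgusting that Sony waited 7 days"))
# A sarcastic tweet: the word 'great' makes a word-count approach call it positive.
print(label_tweet("Oh great, the network is down again"))
```

The last example shows the sarcasm problem: a method that only counts words has no way to know that "great" is meant ironically.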
• Time series analysis is normally used
when examining tweets over time to see
when a peak of tweets may occur. An
example I made today:
Last 30 days time series graph of Croatia
Context behind the peak on June 12th 2016: Euro championship, Croatia win their opening game
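A minimal sketch of the time series idea: count tweets per day and find the day with the peak. The timestamps below are made up for illustration (they are not the data behind the graph), and the timestamp format is an assumption.

```python
from collections import Counter
from datetime import datetime


def tweets_per_day(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Count how many tweets fall on each calendar day."""
    days = [datetime.strptime(ts, fmt).date() for ts in timestamps]
    return Counter(days)


def peak_day(counts):
    """Return (day, number_of_tweets) for the busiest day."""
    return max(counts.items(), key=lambda item: item[1])


# Hypothetical tweet timestamps, for illustration only.
timestamps = [
    "2016-06-11 09:15:00",
    "2016-06-12 15:01:10",
    "2016-06-12 15:40:32",
    "2016-06-12 16:05:47",
    "2016-06-13 08:30:00",
]

counts = tweets_per_day(timestamps)
day, volume = peak_day(counts)
print(day.isoformat(), volume)
```

Once the peak day is known, reading the tweets from that day gives the context behind it.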
• Network analysis is used to visualize the
connections between people (who is
connected to whom?)
• Who is the most influential Twitter user?
Various algorithms can be used; a popular
one is the Betweenness Centrality
Algorithm
Betweenness Centrality Algorithm
Image from / read more here: http://med.bioinf.mpi-inf.mpg.de/netanalyzer/help/2.7/
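A minimal sketch of betweenness centrality for a small undirected, unweighted network, written with Brandes' algorithm. The score of a user is the number of shortest paths between other pairs of users that pass through them (shared proportionally when several shortest paths exist). The user names and edges are hypothetical, and tools such as NodeXL may normalise the scores differently.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency mapping from (user, user) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def betweenness_centrality(graph):
    """Unnormalised betweenness centrality (Brandes' algorithm)."""
    centrality = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search from the source, counting shortest paths.
        order = []
        predecessors = {v: [] for v in graph}
        path_count = {v: 0 for v in graph}
        distance = {v: -1 for v in graph}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = {v: 0.0 for v in graph}
        while order:
            w = order.pop()
            for v in predecessors[w]:
                share = path_count[v] / path_count[w]
                dependency[v] += share * (1 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Each pair is visited from both ends in an undirected graph.
    return {v: score / 2 for v, score in centrality.items()}


# Hypothetical mention network, for illustration only.
edges = [("alice", "bob"), ("bob", "carol"), ("bob", "dave"), ("dave", "erin")]
scores = betweenness_centrality(build_graph(edges))
for user, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(user, score)
```

In this toy network "bob" scores highest because most shortest paths between other users run through him, which is the sense in which the measure identifies bridging or influential accounts.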
• Machine learning, e.g. using a text
classifier such as the naive Bayes
algorithm
• Involves training data: manually code
a subset of the data (e.g. 100 tweets in a
dataset of 1,000 tweets) and the
algorithm will automatically classify the
remaining data
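A minimal sketch of a naive Bayes text classifier (multinomial, with add-one smoothing) trained on a handful of manually coded tweets. The class name, labels, and example tweets are hypothetical, and a real project would need far more training data than this.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case a tweet and split it into simple word tokens."""
    return re.findall(r"[a-z#@']+", text.lower())


class NaiveBayesClassifier:
    def fit(self, texts, labels):
        """Learn word counts per label from manually coded tweets."""
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        """Return the label with the highest log-probability."""
        total_tweets = sum(self.label_counts.values())
        words = [w for w in tokenize(text) if w in self.vocabulary]
        best_label, best_score = None, -math.inf
        for label, n_tweets in self.label_counts.items():
            score = math.log(n_tweets / total_tweets)
            total_words = sum(self.word_counts[label].values())
            for w in words:
                # Add-one smoothing so unseen words do not give zero probability.
                count = self.word_counts[label][w] + 1
                score += math.log(count / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical manually coded tweets, for illustration only.
coded_tweets = [
    ("psn is down again and sony says nothing", "complaint"),
    ("still cannot log in to psn, no update from sony", "complaint"),
    ("great update from toyota, thanks for the clear info", "praise"),
    ("toyota newsroom keeps us informed, thanks", "praise"),
]
texts, labels = zip(*coded_tweets)
classifier = NaiveBayesClassifier().fit(texts, labels)

print(classifier.predict("no update and psn still down"))
print(classifier.predict("thanks toyota for the info"))
```

The manually coded subset plays the role of `coded_tweets`; `predict` is then applied to every remaining tweet in the dataset.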
Part 2 of the workshop
• Part 2 of the workshop will provide an
overview of some of the cutting-edge
analytics platforms available
• Pause here and create a Twitter account
(if you don’t have one)
Visibrain Focus (commercial)
• Unfortunately, it was not possible to get access for
delegates
• However, Visibrain offer a free 30-day trial
• I can provide an overview on this machine
Echosec (free version available)
• Location-based social media search: search by
location rather than keywords
• Allows you to examine a specific
geographical area by drawing on
Facebook, Twitter, Instagram, Sina Weibo,
YouTube, Foursquare, Flickr, and VK APIs
Examples of case studies using
Echosec
• Echosec was used following the April 2015 Nepal
Earthquake
• Apps such as Foursquare have the potential to give first
responders the ability to check where things are
• Geographically searching social media data in an area
can show you what you are looking for in an emergency
• Can examine locations of affected areas and see where
people have stopped posting from
Real Examples of case studies
using Echosec
Echosec
• Navigate to https://app.echosec.net/
• Near the bottom left there will be an option to
enter a location to search for
• See what intelligence you can gain using
location-based search (5-10 minutes)
Follow the Hashtag
• Free version available to access
• Navigate to
http://www.followthehashtag.com/
Twitonomy
• Free version available to access
• Navigate to https://www.twitonomy.com/
NodeXL
• Social network analysis looks at the
structure of the networks formed when
people use social media
• One particular tool is called NodeXL.
Unfortunately there is not enough time to
download and install it, but I can
demonstrate it on this machine
• To examine network graphs currently
being created and uploaded, navigate to
the NodeXL Graph Gallery:
http://www.nodexlgraphgallery.org/
NodeXL – Graph Gallery
• Example graphs on the Gallery
• For interpretation see Smith, Rainie,
Shneiderman, & Himelboim (2014)
• Also see this example of 6 types of
network graph
University of Sheffield Project
• Produced a report for the Head of Digital
at the University of Sheffield, Stephen
Thompson, examining mentions of the
University over the previous 12 months
• Step 1 – Obtain historical data using a
provider such as Sifter and place the data
into DiscoverText
• Step 2 – Using DiscoverText, de-duplicate
the data by removing exact duplicates and
near-duplicate clusters
• Step 3 – From the reduced dataset, take a 10%
sample and manually code it and/or train a
machine classifier to code the entire
dataset.
• I used DiscoverText, a cloud-based,
collaborative text analytics solution
that allows all of the above.
DiscoverText
• By removing duplicates and near
duplicates, the sample of N=43,521 tweets
was reduced to a total of N=13,078 tweets.
• This avoids categorizing only popular
mentions.
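A minimal sketch of removing exact and near duplicates, assuming near duplicates are detected by word overlap (Jaccard similarity) after stripping links and retweet prefixes. This is an illustration only: it is not how DiscoverText works internally, and the threshold, function names, and sample tweets are hypothetical.

```python
import re


def normalise(text):
    """Lower-case a tweet and strip links, a leading retweet prefix, and punctuation."""
    text = text.lower()
    text = re.sub(r"https?://\S+", "", text)
    text = re.sub(r"^rt\s+@\w+:\s*", "", text)
    return " ".join(re.findall(r"[a-z0-9#@']+", text))


def jaccard(a, b):
    """Share of distinct words that two normalised tweets have in common."""
    words_a, words_b = set(a.split()), set(b.split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def deduplicate(tweets, threshold=0.8):
    """Keep the first tweet of each group of exact or near duplicates.

    Compares every tweet with every kept tweet, so it only suits small datasets.
    """
    kept, kept_normalised = [], []
    for tweet in tweets:
        candidate = normalise(tweet)
        if any(jaccard(candidate, k) >= threshold for k in kept_normalised):
            continue
        kept.append(tweet)
        kept_normalised.append(candidate)
    return kept


# Hypothetical tweets, for illustration only.
sample = [
    "Open day at the University of Sheffield this Saturday http://example.org/a",
    "RT @someone: Open day at the University of Sheffield this Saturday http://example.org/a",
    "Open day at the University of Sheffield this Saturday!!",
    "Open day at the University of Sheffield this coming Saturday",
    "New research on flu outbreaks published today",
]

unique_tweets = deduplicate(sample)
print(len(sample), "->", len(unique_tweets))
```

Keeping one representative per cluster is what stops a heavily retweeted message from dominating the coded sample.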
• A 10% random sample of tweets was
extracted from the filtered dataset (i.e.,
10% of 13,078), giving a total of n=1,198
tweets (total coding time 19 hours 29
minutes and 20 seconds).
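A minimal sketch of drawing a reproducible 10% random sample from a de-duplicated dataset. The function name, the seed, and the stand-in list of tweet indices are hypothetical. A plain 10% draw from 13,078 items gives 1,308, so the n=1,198 reported above presumably reflects a procedure that differed in some detail from this sketch.

```python
import random


def random_fraction(items, fraction=0.10, seed=42):
    """Draw a simple random sample (without replacement) of a given fraction."""
    rng = random.Random(seed)  # fixed seed so the same sample can be drawn again
    sample_size = round(len(items) * fraction)
    return rng.sample(items, sample_size)


# Stand-in for the filtered dataset: one index per tweet.
filtered_tweets = list(range(13078))

coding_sample = random_fraction(filtered_tweets)
print(len(coding_sample))
```

Fixing the seed means the same tweets can be re-drawn later, which helps when the coding has to be checked or repeated.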
• Conclusions and key findings:
• A university that is very well engaged with its
students, the public, and the mainstream
media
• Ranked highly amongst other Russell Group
universities for followers and mentions
Conclusion
• There is no ‘best’ social media analytics
tool as they all offer something different
and I use them in combination
Questions?
• Happy to answer any specific questions
