Super Bowl 50 & the Twitterverse
What can data tell us about an event we didn’t see?
Image source: https://en.wikipedia.org/wiki/Super_Bowl_50
The Event
• The 50th Super Bowl: champions of the American Football Conference playing champions of the National Football Conference
• Audience of 111.9 million Americans (the 3rd largest TV audience in US history)
• Advertising cost of $5 million for 30 seconds
• Cam Newton (Carolina Panthers) versus Peyton Manning (Denver Broncos)
• And that’s all I knew up-front…
Image source: http://www.nfl.com/
The Project
• Use Natural Language Processing (NLP) on a large number of tweets
• Look at tweets using one of the 4 main hashtags (#superbowl, #superbowl50, #nfl, #sb50)
• Could the data tell us the key stories that happened?
NLP – My Approach
• Extract relevant tweets from Twitter, pulling a large sample for each hashtag, and store them
• The Twitter data has 2 parts: the tweet text itself, and the Tweeter (who the person was, their location, name, and any other salient information)
• Use tm, R's most popular text mining package
• Convert the tweet content to a corpus (a large, structured set of texts)
• Apply standard NLP transformations: convert text to lowercase; remove retweets, numbers, URLs and extra whitespace; remove stopwords (words of no real help, such as a, the, and, or); and stem words where needed, so that different forms of the same word are treated the same
• Build a Document-Term Matrix from the corpus (one row per tweet, one column per remaining term), which is what the analysis runs on
• Look at frequency (how often key terms are mentioned in tweets), clustering (do these terms fit into logical families? Can patterns be observed?), etc.
• Perform sentiment analysis: for each tweet, count the positive and negative words used and compute a sentiment score (the more negative the score, the more negative the tweet, and vice versa)
• Include additional elements that make sense (e.g. a word cloud, a geographical analysis of where people were tweeting from, etc.)
Getting the data – playing nice with Twitter
• Set up a Twitter app @ https://dev.twitter.com/
• Using the twitteR package:
• Create the Twitter OAuth handshake
• Extract data using searchTwitter()
• searchTwitter() returns the text of the tweet; the screen name of the Tweeter; when the tweet was created; whether the status was favourited (and how many times); the longitude/latitude of the user; and more…
• Limitation: Twitter's search only returns a subset of tweets from the last week, biased towards recency (so all tweets are from 1-2 days after the event)
library(twitteR)
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)
# Super Bowl 50 was played on 7 February 2016
superbowl <- searchTwitter("#superbowl", n = 10000, lang = "en", since = "2016-02-06")
superbowl50 <- searchTwitter("#superbowl50", n = 10000, lang = "en", since = "2016-02-06")
nfl <- searchTwitter("#nfl", n = 10000, lang = "en", since = "2016-02-06")
sb50 <- searchTwitter("#sb50", n = 10000, lang = "en", since = "2016-02-06")
“I need to make a corpse?” – The Corpus
• Strip out all retweets using strip_retweets()
• Store data in data frames
• Create a corpus: a large collection of documents
• Perform a number of standard transformations: removing punctuation, whitespace, URLs and stopwords (“and”, “the”, etc.), and making the corpus lowercase
• Create a Document-Term Matrix (a matrix that describes the frequency of terms that occur in a collection of documents) and a sparse DTM (dropping terms that appear in fewer than a given share of documents, here about 1%, making our remaining terms more relevant)
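library(twitteR)
library(tm)
superbowl_no_rt <- strip_retweets(superbowl)
superbowl_df <- twListToDF(superbowl_no_rt)
# Repeat for superbowl50, nfl and sb50, then combine (these *_df names are assumed)
combined_df <- rbind(superbowl_df, superbowl50_df, nfl_df, sb50_df)
combined_corpus <- Corpus(VectorSource(combined_df$text))
combined_corpus <- tm_map(combined_corpus, content_transformer(tolower))
combined_corpus <- tm_map(combined_corpus, removePunctuation)
combined_corpus <- tm_map(combined_corpus, removeWords, stopwords("english"))
combined_corpus <- tm_map(combined_corpus, stripWhitespace)
combined_DTM <- DocumentTermMatrix(combined_corpus)
# Drop terms that are absent from more than 99% of tweets
combined_DTMs <- removeSparseTerms(combined_DTM, 0.99)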
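The cleaning and DTM steps above can be sketched in base R without tm, which makes each transformation visible. This is a minimal sketch, not the deck's code: the example tweets, the tiny stop_words list and the helpers clean_tweet() and remove_sparse() are hypothetical, and remove_sparse() only approximates tm's removeSparseTerms(). Numbers are deliberately kept so hashtags such as sb50 survive.

```r
# Hypothetical example tweets (not data from this analysis)
tweets <- c(
  "Cam Newton fumbles! #SB50 https://t.co/abc123",
  "Beyonce and Coldplay at the halftime show #SuperBowl50",
  "Free ride promo code from Uber tonight #SB50"
)

# A deliberately tiny stopword list; tm's stopwords("english") is much longer
stop_words <- c("a", "the", "and", "or", "at", "from")

# Lowercase, drop URLs (before punctuation, so they are removed whole),
# strip punctuation (including '#'), split on whitespace, drop stopwords
clean_tweet <- function(text) {
  text <- tolower(text)
  text <- gsub("https?://\\S+", "", text)
  text <- gsub("[[:punct:]]", "", text)
  words <- strsplit(trimws(text), "\\s+")[[1]]
  words[nzchar(words) & !(words %in% stop_words)]
}

tokens <- lapply(tweets, clean_tweet)
vocab <- sort(unique(unlist(tokens)))

# Document-term matrix: one row per tweet, one column per term, cells are counts
dtm <- t(vapply(tokens,
                function(tk) as.numeric(table(factor(tk, levels = vocab))),
                numeric(length(vocab))))
colnames(dtm) <- vocab

# Approximates removeSparseTerms(dtm, sparse): keep only terms whose
# sparsity (share of documents NOT containing the term) is below `sparse`
remove_sparse <- function(dtm, sparse) {
  sparsity <- colMeans(dtm == 0)
  dtm[, sparsity < sparse, drop = FALSE]
}

# Only "sb50" appears in more than half of the tweets
dtm_sparse <- remove_sparse(dtm, 0.5)
```

With the deck's threshold of 0.99, the same rule keeps any term that appears in at least roughly 1% of tweets.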
Story 1 – What does frequency tell us?
• Looking at the most frequent terms, and graphing these
findFreqTerms(combined_DTM, lowfreq = 100)
Potential stories:
• Cam, Newton, Peyton, Manning – the two main players
• Avosinspace, avosfrommexico – some sort of reference to avocados?
• Beyonce – the national anthem or the halftime show?
• Promo, code, sale, freeride – some sort of promotion?
• Uber – it offers rides, so could it be running a promotion? How big was this, to feature so heavily?
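findFreqTerms() keeps every term whose total count across all documents is at least lowfreq. A minimal base R sketch of that idea, assuming a small hypothetical count matrix (the counts are invented for illustration, and find_freq_terms() is a hypothetical helper, not tm's implementation):

```r
# Hypothetical DTM: rows are tweets, columns are terms, cells are made-up counts
dtm <- matrix(
  c(2, 0, 1, 0,
    1, 1, 0, 0,
    3, 0, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("cam", "beyonce", "uber", "avosinspace"))
)

# Terms whose total count across all tweets is at least `lowfreq`
find_freq_terms <- function(dtm, lowfreq) {
  totals <- colSums(dtm)
  names(totals)[totals >= lowfreq]
}

frequent <- find_freq_terms(dtm, lowfreq = 2)

# Totals sorted from most to least frequent, ready for a bar chart
term_totals <- sort(colSums(dtm), decreasing = TRUE)
# barplot(term_totals)
```

On the full tweet set the deck used lowfreq = 100; the threshold simply trades off how many terms survive against how noisy they are.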
Frequency reimagined
• A word cloud allows us to see term frequency in a more visual way
• The hashtags dominate proceedings, but some other terms really jump out: “camnewton”, “avosinspace”, “beyonce”, “uber”. We’ll look at these in more detail next
• There are a number of other recognisable terms: teams ("seahawk", "raider", "dallascowboy"), sportswear ("reebok", "apparel") and television channels/networks ("espn")
• As with anything that exists on the Internet, “sex” is in there. I have no idea how, but this is the Internet, I guess
library(wordcloud)
wordcloud(combined_text_corpus, min.freq = min_freq, scale = c(8, 0.5), rot.per = .15, colors = brewer.pal(8, "Paired"), random.color = TRUE, random.order = FALSE, max.words = Inf)
Story 2: A level deeper – word associations
• For some of the most frequent terms from our analysis and word cloud, let’s look at which words they are most strongly associated with (i.e. correlated with across tweets)
• The accompanying report has a more extensive list; we’ll look at the most interesting findings
# assoc is the minimum correlation to report (findAssocs' corlimit argument)
uber_assoc <- findAssocs(combined_DTM, "uber", assoc)
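findAssocs() reports, for a given term, the other terms whose counts correlate with it across documents at or above a correlation limit. The sketch below reproduces that idea in base R with Pearson correlation; the five tweets' counts and the find_assocs() helper are hypothetical, and tm's own function returns a list with one element per queried term rather than a single vector.

```r
# Hypothetical counts for five tweets
dtm <- matrix(
  c(1, 1, 1, 0,
    1, 0, 1, 0,
    0, 0, 0, 1,
    1, 1, 1, 0,
    0, 0, 0, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(NULL, c("uber", "honda", "freeride", "beyonce"))
)

# Correlate the target term's column with every other term's column and keep
# correlations >= corlimit, strongest first, rounded to 2 decimals
find_assocs <- function(dtm, term, corlimit) {
  target <- dtm[, term]
  others <- dtm[, colnames(dtm) != term, drop = FALSE]
  r <- suppressWarnings(apply(others, 2, cor, y = target))  # NA if a column is constant
  r <- r[!is.na(r) & r >= corlimit]
  round(sort(r, decreasing = TRUE), 2)
}

uber_assoc <- find_assocs(dtm, "uber", corlimit = 0.5)
```

Here "freeride" always co-occurs with "uber", so it correlates perfectly, while "honda" co-occurs only some of the time.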
Cam Newton – what happened?
“Carolina Panthers quarterback Cam Newton has been harshly criticised for appearing to hesitate instead of jumping on the loose football he had fumbled late in the fourth quarter. The Denver Broncos recovered the ball, and on the ensuing drive C.J. Anderson found the back of the end zone to seal Denver’s Super Bowl victory”
Uber (& Honda) – frenemies?
Honda offered free Uber rides for the two hours after the game, although it seems Uber was the winner in the buzz stakes!
Beyonce – a political stance?
It seems Beyonce’s halftime show was also a political statement. Her female backup dancers wore Black Panther-style uniforms, which explains the earlier associations with “revolutionary” and “stanleynelson” (Stanley Nelson, an Emmy Award-winning filmmaker who made “The Black Panthers: Vanguard of the Revolution”), and her performance supported the Black Lives Matter movement.
Cluster’s Last Stand
• Clustering terms allows us to see what is (or could be) related
• Rather than a regular dendrogram, we’ll use a package (ape) to create a more visually appealing dendrogram
• We can see a number of distinct relationships:
• The largest is the game itself: teams, the halftime show and performers, key players, etc.
• An Uber cluster pulls together all the terms related to the free ride promo. With no mention of Honda, it seems Uber won the buzz battle
• An avocado cluster, which seems to be some sort of spot asking people to vote for their favourite...something? Looking into this, a company called Avocados From Mexico ran a commercial pushing the "avosinspace" hashtag (hence the frequency of the term), and a number of companies are now asking people to vote for their favourite Super Bowl commercial (and it seems this was it).
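Term clustering of this kind is typically done with hierarchical clustering on the term columns of the (sparse) DTM. A minimal base R sketch, assuming hypothetical counts, Euclidean distance and Ward linkage (the deck does not say which distance or linkage it used); ape's as.phylo() can then convert the hclust result for the fancier plot.

```r
# Hypothetical counts: five tweets, six terms
dtm <- matrix(
  c(1, 1, 1, 0, 0, 0,
    1, 1, 0, 0, 0, 0,
    0, 0, 0, 1, 1, 1,
    0, 0, 0, 1, 1, 0,
    1, 0, 1, 0, 0, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(NULL, c("uber", "freeride", "promo", "cam", "newton", "fumble"))
)

# Cluster terms (not tweets): transpose so each row is a term's count profile
term_dist <- dist(t(dtm), method = "euclidean")
fit <- hclust(term_dist, method = "ward.D2")

# Cut the tree into two groups of terms
groups <- cutree(fit, k = 2)

# plot(fit)                               # base R dendrogram
# plot(ape::as.phylo(fit), type = "fan")  # ape's fan-style tree, if installed
```

Terms that tend to appear in the same tweets end up close together, so the promo terms and the game terms fall into separate branches.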
Story 3: Tell me how you feel…
• Let’s look at the sentiment of the tweets: were people positive? Negative? Neutral? Angry? Sad? Happy?
• We’ll use a basic sentiment analysis model for this:
• Take in a piece of text
• Review it for positive and negative keywords
• Score +1 for each positive word found and -1 for each negative word found
• Tally the scores, giving a sentiment score for each piece of text (in our case, each tweet)
• Positive and negative word lists are from Hu and Liu's Opinion Lexicon (https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html#lexicon)
• We can see that the vast majority of tweets are sentiment-neutral (0), followed by 1 (positive), -1 (negative) and so forth. Overall, sentiment is overwhelmingly neutral, with the slightest bias towards positive; this is borne out by the mean and median values
mean(combined_df$sentiment)
[1] 0.1347915
median(combined_df$sentiment)
[1] 0
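The scoring model described above (+1 per positive word, -1 per negative word, summed per tweet) fits in a few lines of base R. This is a sketch under stated assumptions: pos_words and neg_words are tiny hypothetical stand-ins for Hu and Liu's lexicon, which contains thousands of words, and the example tweets are made up.

```r
# Tiny stand-ins for Hu and Liu's Opinion Lexicon (hypothetical, for illustration)
pos_words <- c("great", "win", "amazing", "love")
neg_words <- c("fumble", "bad", "lose", "hate")

# For each text: lowercase, strip punctuation, split into words, then
# score +1 per positive word and -1 per negative word
score_sentiment <- function(texts, pos, neg) {
  vapply(texts, function(text) {
    words <- strsplit(gsub("[[:punct:]]", "", tolower(text)), "\\s+")[[1]]
    sum(words %in% pos) - sum(words %in% neg)
  }, integer(1), USE.NAMES = FALSE)
}

tweets <- c("What a great win for Denver!",
            "Bad fumble, I hate that",
            "Watching the Super Bowl")
scores <- score_sentiment(tweets, pos_words, neg_words)

mean(scores)    # overall tendency
median(scores)  # robust to a few strongly worded tweets
```

Because most tweets contain no lexicon words at all, a median of 0 alongside a slightly positive mean is exactly the pattern this kind of model tends to produce.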
Story 4: Feel the heat(map)
• Let’s look at the sentiment of the most popular tweets (those favourited more than 50 times), and who tweeted them
• We bucket the favoriteCount variable: since it is a count with a very wide range, we want to rein it in, so we'll look at buckets of width 50 (0-50 favourites, 51-100, etc.)
• Limitation: we should expect the heatmap to have a number of blank spaces, as it doesn't have data points for every bucket
• We can see that the most-favourited tweets were primarily neutral to negative; that those from sports broadcasters (BBC, FOX Sports) were more neutral; that a couple of celebrities got in on the act with positive tweets (Maroon 5, Nick Jonas); and that Ron Celements (an NFL reporter) had the most-favourited negative tweet.
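The bucketing step can be done with cut(), and the resulting bucket-by-sentiment counts are what a heatmap colours in. A minimal sketch with hypothetical tweets; it assumes the favourite count sits in a column named favoriteCount (as in twitteR's status data) and that a sentiment score has already been computed per tweet.

```r
# Hypothetical tweets with favourite counts and sentiment scores
tweets <- data.frame(
  screenName    = c("userA", "userB", "userC", "userD"),
  favoriteCount = c(12, 75, 140, 51),
  sentiment     = c(1, 0, -2, 0)
)

# Buckets of width 50: [0,50], (50,100], (100,150], ...
upper <- ceiling(max(tweets$favoriteCount, 1) / 50) * 50
tweets$fav_bucket <- cut(tweets$favoriteCount,
                         breaks = seq(0, upper, by = 50),
                         include.lowest = TRUE)

# Keep the most popular tweets (favourited more than 50 times)
popular <- tweets[tweets$favoriteCount > 50, ]
popular$fav_bucket <- droplevels(popular$fav_bucket)

# Counts per bucket and sentiment score; zero cells are the heatmap's blank spaces
heat <- table(popular$fav_bucket, popular$sentiment)
# heatmap(unclass(heat), Rowv = NA, Colv = NA)
```

Right-closed intervals match the slide's "0-50, 51-100" convention for whole-number counts.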
Story 5: We know where you live
• Very few tweets had an associated longitude and latitude, so they can’t really be seen as anything more than a novelty to be mapped, rather than a true indication of tweet volume/sentiment
• We see the West coast of the US is a little more positive than the East coast
• As we travel from West to East, sentiment appears to get more negative
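A sketch of the two steps behind this slide: keep only tweets that carry coordinates, then compare average sentiment across longitude bands. The coordinates, sentiment scores and band boundaries below are hypothetical and purely illustrative; with real twitteR output the longitude/latitude columns may arrive as text, hence the as.numeric() conversion.

```r
# Hypothetical tweets; NA means the tweet carried no coordinates
tweets <- data.frame(
  longitude = c("-122.4", NA, "-87.6", "-74.0", NA, "-118.2"),
  latitude  = c("37.8", NA, "41.9", "40.7", NA, "34.1"),
  sentiment = c(1, 0, 0, -1, 2, 1)
)

# Coordinates may be stored as text, so convert before filtering
tweets$longitude <- as.numeric(tweets$longitude)
tweets$latitude  <- as.numeric(tweets$latitude)

geo <- tweets[!is.na(tweets$longitude) & !is.na(tweets$latitude), ]
share_geotagged <- nrow(geo) / nrow(tweets)  # how representative the map can be

# Illustrative longitude bands across the contiguous US (boundaries are arbitrary)
geo$band <- cut(geo$longitude, breaks = c(-125, -100, -85, -65),
                labels = c("West", "Central", "East"))
band_means <- tapply(geo$sentiment, geo$band, mean)
```

Reporting share_geotagged alongside any map makes the "novelty" caveat concrete: with only a handful of geotagged tweets per band, a West-to-East trend can easily be noise.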
Summary
• From Beyonce's political stance, to Cam Newton's fumble, to Avocados in Space, to the Honda and Uber mega-deal that everyone was talking about, it seems that a set of tweets can help us see the stories in the data
• For more information:
• Full R code and R Markdown report @ https://github.com/ivanheneghan/
• Blog post @ http://www.prettypicturestellstories.com/

More Related Content

Viewers also liked (11)

Quit smokingbb2
Quit smokingbb2Quit smokingbb2
Quit smokingbb2
 
violets-lookbook-overview-final (1)
violets-lookbook-overview-final (1)violets-lookbook-overview-final (1)
violets-lookbook-overview-final (1)
 
Apresentao2
Apresentao2Apresentao2
Apresentao2
 
Certificat de travail SG
Certificat de travail SGCertificat de travail SG
Certificat de travail SG
 
gtFace: Agile Scrum
gtFace: Agile ScrumgtFace: Agile Scrum
gtFace: Agile Scrum
 
2guerramundial
2guerramundial2guerramundial
2guerramundial
 
КЕНГУРУ 2016
КЕНГУРУ 2016КЕНГУРУ 2016
КЕНГУРУ 2016
 
End of module 4 review
End of module 4 reviewEnd of module 4 review
End of module 4 review
 
General sings & symptom of disease fish
General sings & symptom of disease fishGeneral sings & symptom of disease fish
General sings & symptom of disease fish
 
Bacterial Growth Factors
Bacterial Growth FactorsBacterial Growth Factors
Bacterial Growth Factors
 
CardiacMaintenance
CardiacMaintenanceCardiacMaintenance
CardiacMaintenance
 

Similar to Super Bowl 50 & the Twitterverse

Language of Politics on Twitter - 03 Analysis
Language of Politics on Twitter - 03 AnalysisLanguage of Politics on Twitter - 03 Analysis
Language of Politics on Twitter - 03 AnalysisYelena Mejova
 
Making Sense of Millions of Thoughts: Finding Patterns in the Tweets
Making Sense of Millions of Thoughts: Finding Patterns in the TweetsMaking Sense of Millions of Thoughts: Finding Patterns in the Tweets
Making Sense of Millions of Thoughts: Finding Patterns in the TweetsKrist Wongsuphasawat
 
Essay Structure Essay structure refers to organization; it.docx
Essay Structure Essay structure refers to organization; it.docxEssay Structure Essay structure refers to organization; it.docx
Essay Structure Essay structure refers to organization; it.docxSALU18
 
Sentiment Analysis (GDSCTU).pdf
Sentiment Analysis (GDSCTU).pdfSentiment Analysis (GDSCTU).pdf
Sentiment Analysis (GDSCTU).pdfYasminAzou
 
Twitter data analysis using R
Twitter data analysis using RTwitter data analysis using R
Twitter data analysis using Rsantoshi mangalgi
 
Predicting The Future With Social Media
Predicting The Future With Social MediaPredicting The Future With Social Media
Predicting The Future With Social MediaMaurizio Napolitano
 
(Keynote) Mike Thelwall - “Sentiment Strength Detection for Social Media Text...
(Keynote) Mike Thelwall - “Sentiment Strength Detection for Social Media Text...(Keynote) Mike Thelwall - “Sentiment Strength Detection for Social Media Text...
(Keynote) Mike Thelwall - “Sentiment Strength Detection for Social Media Text...icwe2015
 
Explaining Controversy on Social Media via Stance Summarization
Explaining Controversy on Social Media via Stance SummarizationExplaining Controversy on Social Media via Stance Summarization
Explaining Controversy on Social Media via Stance Summarizationmiajang
 
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...Subhabrata Mukherjee
 
TextMiningTwitters
TextMiningTwittersTextMiningTwitters
TextMiningTwittersLiu Chang
 
Advanced twitter for journos
Advanced twitter for journosAdvanced twitter for journos
Advanced twitter for journosSteve Buttry
 
Insights into the Twitterverse: Benchmarking and analysis twitter content
Insights into the Twitterverse: Benchmarking and analysis twitter contentInsights into the Twitterverse: Benchmarking and analysis twitter content
Insights into the Twitterverse: Benchmarking and analysis twitter contentStephen Dann
 
ENG/IMS 224, January 29, 2013
ENG/IMS 224, January 29, 2013ENG/IMS 224, January 29, 2013
ENG/IMS 224, January 29, 2013Miami University
 
EventSense: Capturing the Pulse of Large-scale Events by Mining Social Media ...
EventSense: Capturing the Pulse of Large-scale Events by Mining Social Media ...EventSense: Capturing the Pulse of Large-scale Events by Mining Social Media ...
EventSense: Capturing the Pulse of Large-scale Events by Mining Social Media ...Symeon Papadopoulos
 
#ICCSS2015 - Computational Human Security Analytics using "Big Data"
#ICCSS2015 - Computational Human Security Analytics using "Big Data"#ICCSS2015 - Computational Human Security Analytics using "Big Data"
#ICCSS2015 - Computational Human Security Analytics using "Big Data"Pete Burnap
 
Twitris - Web Information System 2011 Course
Twitris - Web Information System 2011 Course Twitris - Web Information System 2011 Course
Twitris - Web Information System 2011 Course Ashutosh Jadhav
 
Big Data LDN 2018: PROMISE AND PITFALLS OF TEXT ANALYTICS
Big Data LDN 2018: PROMISE AND PITFALLS OF TEXT ANALYTICSBig Data LDN 2018: PROMISE AND PITFALLS OF TEXT ANALYTICS
Big Data LDN 2018: PROMISE AND PITFALLS OF TEXT ANALYTICSMatt Stubbs
 
Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...RajkiranVeluri
 

Similar to Super Bowl 50 & the Twitterverse (20)

Language of Politics on Twitter - 03 Analysis
Language of Politics on Twitter - 03 AnalysisLanguage of Politics on Twitter - 03 Analysis
Language of Politics on Twitter - 03 Analysis
 
Making Sense of Millions of Thoughts: Finding Patterns in the Tweets
Making Sense of Millions of Thoughts: Finding Patterns in the TweetsMaking Sense of Millions of Thoughts: Finding Patterns in the Tweets
Making Sense of Millions of Thoughts: Finding Patterns in the Tweets
 
Essay Structure Essay structure refers to organization; it.docx
Essay Structure Essay structure refers to organization; it.docxEssay Structure Essay structure refers to organization; it.docx
Essay Structure Essay structure refers to organization; it.docx
 
Sentiment Analysis (GDSCTU).pdf
Sentiment Analysis (GDSCTU).pdfSentiment Analysis (GDSCTU).pdf
Sentiment Analysis (GDSCTU).pdf
 
Twitter data analysis using R
Twitter data analysis using RTwitter data analysis using R
Twitter data analysis using R
 
Predicting The Future With Social Media
Predicting The Future With Social MediaPredicting The Future With Social Media
Predicting The Future With Social Media
 
(Keynote) Mike Thelwall - “Sentiment Strength Detection for Social Media Text...
(Keynote) Mike Thelwall - “Sentiment Strength Detection for Social Media Text...(Keynote) Mike Thelwall - “Sentiment Strength Detection for Social Media Text...
(Keynote) Mike Thelwall - “Sentiment Strength Detection for Social Media Text...
 
Explaining Controversy on Social Media via Stance Summarization
Explaining Controversy on Social Media via Stance SummarizationExplaining Controversy on Social Media via Stance Summarization
Explaining Controversy on Social Media via Stance Summarization
 
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
 
twitter_golden_globes
twitter_golden_globestwitter_golden_globes
twitter_golden_globes
 
TextMiningTwitters
TextMiningTwittersTextMiningTwitters
TextMiningTwitters
 
Advanced twitter for journos
Advanced twitter for journosAdvanced twitter for journos
Advanced twitter for journos
 
Insights into the Twitterverse: Benchmarking and analysis twitter content
Insights into the Twitterverse: Benchmarking and analysis twitter contentInsights into the Twitterverse: Benchmarking and analysis twitter content
Insights into the Twitterverse: Benchmarking and analysis twitter content
 
ENG/IMS 224, January 29, 2013
ENG/IMS 224, January 29, 2013ENG/IMS 224, January 29, 2013
ENG/IMS 224, January 29, 2013
 
EventSense: Capturing the Pulse of Large-scale Events by Mining Social Media ...
EventSense: Capturing the Pulse of Large-scale Events by Mining Social Media ...EventSense: Capturing the Pulse of Large-scale Events by Mining Social Media ...
EventSense: Capturing the Pulse of Large-scale Events by Mining Social Media ...
 
#ICCSS2015 - Computational Human Security Analytics using "Big Data"
#ICCSS2015 - Computational Human Security Analytics using "Big Data"#ICCSS2015 - Computational Human Security Analytics using "Big Data"
#ICCSS2015 - Computational Human Security Analytics using "Big Data"
 
Twitris - Web Information System 2011 Course
Twitris - Web Information System 2011 Course Twitris - Web Information System 2011 Course
Twitris - Web Information System 2011 Course
 
Trend Analysis
Trend AnalysisTrend Analysis
Trend Analysis
 
Big Data LDN 2018: PROMISE AND PITFALLS OF TEXT ANALYTICS
Big Data LDN 2018: PROMISE AND PITFALLS OF TEXT ANALYTICSBig Data LDN 2018: PROMISE AND PITFALLS OF TEXT ANALYTICS
Big Data LDN 2018: PROMISE AND PITFALLS OF TEXT ANALYTICS
 
Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...
 

Recently uploaded

PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxFurkanTasci3
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAbdelrhman abooda
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 

Recently uploaded (20)

PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 

Super Bowl 50 & the Twitterverse

  • 1. Super Bowl 50 & the Twitterverse What can data tell us about an event we didn’t see? Image source: https://en.wikipedia.org/wiki/Super_Bowl_50
  • 2. The Event • The 50th Super Bowl - champions of the American Football Conference playing champions of the National Football Conference • Audience of 111.9 million Americans (3rd largest in US history) • Advertising cost of $5 million for 30 seconds • Cam Newton (Carolina Panthers) versus Peyton Manning (the Denver Broncos) • And that’s all I knew up-front… Image source: http://www.nfl.com/
  • 3. The Project • Use Natural Language Processing on a large number of Tweets • Look at tweets using one of the main 4 hashtags (#superbowl, #superbowl50, #nfl, #sb50) • Can the data could tell us the key stories that happened?
  • 4. NLP – My Approach • Extract relevant tweets from Twitter, pulling a large sample for each hashtag, & store • The Twitter data would be composed of 2 sections: Tweet text itself & the Tweeter (who the person was, location, name, any other salient information) • Utilise tm text mining package (R's most popular text mining package) • Convert tweet content to a corpus (a large and structured set of texts) • Apply standard NLP transformations (convert text to lowercase; remove retweets, numbers, links, spaces, URLs; remove stopwords (words of no real help - a, the, and, or, and more); stem words where needed (so that words which referenced the same thing would be treated the same)) • Build a Document-Term Matrix from the corpus (a matrix of the words left, to allow for analysis) • Look at frequency (how often key terms are appearing/mentioned in tweets), clustering (do these terms fit into logical families? Can patterns be observed?), etc. • Perform sentiment analysis (for each tweet, look at the positive and negative words used, and determine a sentiment score - the more negative the score, the more negative the tweet, and vice versa) • Include additional elements that make sense (e.g. a word cloud, a geographical analysis of where people were tweeting from, etc.)
  • 5. Getting the data – playing nice with Twitter • Set up a Twitter app @ https://dev.twitter.com/ • Using twitteR package: • Create Twitter handshake • Extract data using searchTwitter() • searchTwitter() – text of the tweet; screenname of Tweeter; when tweet was created; was the status favourited (and how many times); longitude/latitude of user; and more… • Limitation: Twitter only returns subset of tweets from the last week, and biased towards recency (so all tweets are from 1-2 days after the event) setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret) superbowl <- searchTwitter("#superbowl", n = 10000, lang = "en", since = "2015-02-06") superbowl50 <- searchTwitter("#superbowl50", n = 10000, lang = "en", since = "2015-02-06") nfl <- searchTwitter("#nfl", n = 10000, lang = "en", since = "2015-02-06") sb50 <- searchTwitter("#sb50", n = 10000, lang = "en", since = "2015-02-06")
  • 6. “I need to make a corpse?” – The Corpus • Strip out all retweets using strip_retweets() • Store data in data frames • Create a corpus – a large collection of documents • Perform a number of standard transformations – removing punctuation, whitespace, URLs, stopwords (“and”, “the”, etc.); make the corpus lowercase • Create a Document Term Matrix (matrix that describes the frequency of terms that occur in a collection of documents) and a sparse DTM (ignore terms that with frequency lower than a given threshold, making our remaining terms more relevant) superbowl_no_rt <- strip_retweets(superbowl) superbowl_df <- twListToDF(superbowl_no_rt) combined_corpus <- Corpus(VectorSource(combined_df$text)) combined_corpus <- tm_map(combined_corpus, removePunctuation) combined_DTM <- DocumentTermMatrix(combined_corpus) combined_DTMs <- removeSparseTerms(combined_DTM, 0.99)
  • 7. Story 1 – What does frequency tell us?
    • List the most frequent terms (those appearing at least 100 times) and graph them
      findFreqTerms(combined_DTM, lowfreq = 100)
    • Potential stories:
      • Cam, Newton, Peyton, Manning – the two main players
      • Avosinspace, avosfrommexico – some sort of reference to avocados?
      • Beyonce – the national anthem, or the halftime show?
      • Promo, code, sale, freeride – some sort of promotion?
      • Uber – it offers rides, so could it be running a promotion? And how big was it, to feature so heavily?
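findFreqTerms() keeps the terms whose total count across all tweets falls between lowfreq and highfreq. A minimal base-R sketch of that idea on a hypothetical toy matrix (the counts are made up, not taken from the Super Bowl data; find_freq_terms() is an illustrative helper, not tm's code):

```r
# Hypothetical toy document-term matrix: rows are tweets, columns are terms
dtm <- matrix(c(3, 0, 1,
                2, 2, 0,
                0, 4, 2),
              nrow = 3, byrow = TRUE,
              dimnames = list(NULL, c("cam", "beyonce", "uber")))

# Terms whose total count across all tweets lies in [lowfreq, highfreq]
find_freq_terms <- function(dtm, lowfreq = 0, highfreq = Inf) {
  freq <- colSums(dtm)
  names(freq)[freq >= lowfreq & freq <= highfreq]
}

find_freq_terms(dtm, lowfreq = 5)  # returns c("cam", "beyonce")

# Sorted totals, ready to graph, e.g. barplot(term_freq, las = 2)
term_freq <- sort(colSums(dtm), decreasing = TRUE)
```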
  • 8. Frequency reimagined
    • A word cloud shows term frequency in a more visual way
    • The hashtags dominate proceedings, but some other terms really jump out – “camnewton”, “avosinspace”, “beyonce”, “uber”. We’ll look at these in more detail next
    • Other groups of terms show up too – teams ("seahawk", "raider", "dallascowboy"), sportswear ("reebok", "apparel"), and television channels/networks ("espn")
    • As with anything on the Internet, “sex” is in there. I have no idea how, but this is the Internet, I guess
      library(wordcloud); library(RColorBrewer)  # wordcloud(), brewer.pal()
      # min_freq: minimum frequency for a term to appear (its value isn't given on the slide)
      wordcloud(combined_text_corpus, min.freq = min_freq, scale = c(8, 0.5),
                rot.per = .15, colors = brewer.pal(8, "Paired"), random.color = TRUE,
                random.order = FALSE, max.words = Inf)
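  • 9. Story 2: A level deeper – word associations
    • For some of the most frequent terms from our analysis and word cloud, let’s look at which words they are most strongly associated with (findAssocs() returns the terms whose correlation with a given term, across tweets, is at or above a chosen limit)
    • The accompanying report has a more extensive list – we’ll look at the most interesting findings
      # assoc: the lower correlation limit (its value isn't given on the slide)
      uber_assoc <- findAssocs(combined_DTM, "uber", assoc)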
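findAssocs() measures how strongly two terms move together across documents using correlation. The base-R sketch below illustrates that idea on made-up counts; find_assocs() is a hypothetical helper, not tm's implementation, and rounding to two decimals is a presentation choice.

```r
# Hypothetical toy document-term matrix: five tweets, four terms
dtm <- cbind(uber     = c(1, 1, 1, 0, 0),
             freeride = c(1, 1, 0, 0, 0),
             promo    = c(1, 1, 1, 0, 0),
             beyonce  = c(0, 0, 0, 1, 1))

# Correlate one term's column with every other term's column across tweets,
# keep correlations at or above corlimit, strongest first
find_assocs <- function(dtm, term, corlimit) {
  others <- setdiff(colnames(dtm), term)
  r <- vapply(others, function(other) cor(dtm[, term], dtm[, other]), numeric(1))
  r <- r[!is.na(r) & r >= corlimit]
  round(sort(r, decreasing = TRUE), 2)
}

find_assocs(dtm, "uber", 0.5)  # promo 1.00, freeride 0.67; beyonce (-1) is dropped
```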
  • 10. Cam Newton – what happened?
    “Carolina Panthers quarterback Cam Newton has been harshly criticised for appearing to hesitate instead of jumping on the loose football he had fumbled late in the fourth quarter. The Denver Broncos recovered the ball, and on the ensuing drive C.J. Anderson found the back of the end zone to seal Denver’s Super Bowl victory”
  • 11. Uber (& Honda) – frenemies?
    Honda offered free Uber rides for the two hours after the game, although it seems Uber was the winner in the buzz stakes!
  • 12. Beyonce – a political stance?
    It seems Beyonce’s halftime performance doubled as a political statement: her female backup dancers wore Black Panther uniforms, and her show supported the Black Lives Matter movement. This explains some of the earlier associations with “revolutionary” and “stanleynelson” (Stanley Nelson, the Emmy Award-winning filmmaker who made “The Black Panthers: Vanguard of the Revolution”).
  • 13. Cluster’s Last Stand
    • Clustering terms lets us see what is (or could be) related
    • Rather than a regular dendrogram, we’ll use the ape package to create a more visually appealing one
    • We can see a number of distinct relationships:
      • The largest is the game itself: teams, the halftime show and its performers, key players, etc.
      • An Uber cluster pulls together all the terms related to the free ride promo. With no mention of Honda, it seems Uber won the buzz battle
      • An Avos cluster, which seems to be some sort of spot asking people to vote for their favourite... something? Looking into this, a company called Avocados From Mexico ran a commercial pushing the "avosinspace" hashtag (hence the frequency of the term), and a number of companies are now asking people to vote for their favourite Super Bowl commercial (which seems to be what these tweets were about)
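The deck doesn't show its clustering code, so the following is only a sketch of hierarchical term clustering in base R. The toy matrix, Euclidean distance and Ward linkage are assumptions; the ape call for a fan-shaped tree is left as a comment.

```r
# Hypothetical toy term-document matrix: rows are terms, columns are tweets
tdm <- rbind(uber     = c(2, 3, 0, 0, 0, 0),
             freeride = c(2, 2, 0, 0, 0, 0),
             avos     = c(0, 0, 3, 2, 0, 0),
             mexico   = c(0, 0, 2, 3, 0, 0),
             broncos  = c(0, 0, 0, 0, 3, 2),
             panthers = c(0, 0, 0, 0, 2, 3))

# Euclidean distance between terms' usage patterns, then agglomerative
# clustering (Ward linkage is an assumption; the slide doesn't say)
fit <- hclust(dist(tdm), method = "ward.D2")

# Cut the tree into three families of terms
groups <- cutree(fit, k = 3)
split(names(groups), groups)

# Regular dendrogram: plot(fit). A fan-style tree via ape would be
# plot(ape::as.phylo(fit), type = "fan")
```

Terms that tend to appear in the same tweets end up close together, so the cut recovers the Uber, avocado and game families.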
  • 14. Story 3: Tell me how you feel…
    • Let’s look at the sentiment of the tweets – were people positive? Negative? Neutral? Angry? Sad? Happy?
    • We’ll use a basic sentiment analysis model for this:
      • Take in a piece of text
      • Review it for positive and negative keywords
      • Score +1 for each positive word found and -1 for each negative word found
      • Tally the scores, giving a sentiment score for each piece of text (in our case, each tweet)
    • The positive and negative word lists are from Hu and Liu's Opinion Lexicon (https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html#lexicon)
    • The vast majority of tweets are sentiment-neutral (0), followed by 1 (positive), -1 (negative) and so forth. Overall, sentiment is overwhelmingly neutral, with the slightest bias towards positive; the mean and median bear this out:
      mean(combined_df$sentiment)
      [1] 0.1347915
      median(combined_df$sentiment)
      [1] 0
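The +1/-1 scoring model described above can be sketched in a few lines of base R. The tiny word lists are hypothetical stand-ins for the Hu and Liu lexicon, and score_sentiment() is an illustrative helper, not the author's function.

```r
# Hypothetical mini word lists; the real ones come from Hu and Liu's Opinion Lexicon
pos_words <- c("great", "love", "win")
neg_words <- c("bad", "fumble", "sad")

# +1 per positive word, -1 per negative word, summed per tweet
score_sentiment <- function(texts, pos_words, neg_words) {
  vapply(texts, function(txt) {
    words <- strsplit(tolower(gsub("[[:punct:]]", "", txt)), "[[:space:]]+")[[1]]
    sum(words %in% pos_words) - sum(words %in% neg_words)
  }, integer(1), USE.NAMES = FALSE)
}

tweets <- c("What a great win!",
            "Bad fumble by Cam",
            "Halftime show starts now",
            "Love the show, sad about the fumble")
scores <- score_sentiment(tweets, pos_words, neg_words)
scores          # 2 -2 0 -1
mean(scores)    # -0.25
median(scores)  # -0.5
```

Most real tweets contain no lexicon words at all, which is why a score of 0 dominates on the slide.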
  • 15. Story 4: Feel the heat(map)
    • Let’s look at the sentiment of the most popular tweets (those favourited more than 50 times), and who tweeted them
    • We bucket the favoriteCount variable: since it spans a wide range of values, we rein it in with buckets 50 wide (0-50 favourites, 51-100, etc.)
    • Limitation: we should expect the heatmap to have a number of blank cells, as there isn't a data point for every bucket
    • The most-favourited tweets were primarily neutral to negative; those from broadcasters (BBC, FOX Sports) were more neutral; a couple of celebrities got in on the act with positive tweets (Maroon 5, Nick Jonas); and Ron Celements (an NFL reporter) had the most-favourited negative tweet
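A base-R sketch of the bucketing step, assuming the heatmap plots each tweeter against favourite buckets with cells coloured by sentiment (the slide doesn't spell out the layout). Screen names and counts are made up; favoriteCount is the column name twListToDF() produces.

```r
# Hypothetical tweets; sentiment is the per-tweet score from the previous step
tweets <- data.frame(screenName    = c("user_a", "user_b", "user_c", "user_d", "user_e", "user_f"),
                     favoriteCount = c(60, 75, 140, 260, 90, 12),
                     sentiment     = c(0, -1, 0, 1, -1, 2),
                     stringsAsFactors = FALSE)

# Keep only the popular tweets (favourited more than 50 times)
popular <- tweets[tweets$favoriteCount > 50, ]

# Buckets 50 wide: (50,100] covers 51-100 favourites, (100,150] covers 101-150, ...
top <- 50 * ceiling(max(popular$favoriteCount) / 50)
popular$bucket <- cut(popular$favoriteCount, breaks = seq(50, top, by = 50))

# Heatmap grid: tweeter x bucket, cell = sentiment; NA cells are the blanks
grid <- tapply(popular$sentiment, list(popular$screenName, popular$bucket), mean)
grid
```

The empty (150,200] and (200,250] columns show exactly the blank cells the limitation above warns about.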
  • 16. Story 5: We know where you live
    • Only a very small number of tweets had an associated longitude and latitude, so they are more of a novelty to map than a true indication of tweet volume or sentiment
    • With that caveat, the West Coast of the US looks a little more positive than the East Coast
    • As we travel from west to east, sentiment gets more negative
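The geographical view boils down to keeping only geotagged tweets and averaging sentiment by area. A base-R sketch with made-up coordinates; the three longitude bands and their cut-offs are hypothetical, chosen only to illustrate a west-to-east comparison.

```r
# Hypothetical geotagged tweets; twitteR may return coordinates as text,
# so convert them, and drop tweets with no location at all
geo <- data.frame(longitude = c("-122.4", NA, "-118.2", "-87.6", NA, "-74.0"),
                  sentiment = c(1, 0, 1, 0, -1, -1),
                  stringsAsFactors = FALSE)
geo$longitude <- as.numeric(geo$longitude)
geo <- geo[!is.na(geo$longitude), ]

# Hypothetical longitude bands for the contiguous US (cut-offs are illustrative)
geo$band <- cut(geo$longitude, breaks = c(-125, -100, -85, -65),
                labels = c("West", "Central", "East"))

# Mean sentiment per band, west to east
by_band <- aggregate(sentiment ~ band, data = geo, FUN = mean)
by_band
```

With only four located tweets here, as in the real data, any trend is a curiosity rather than evidence.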
  • 17. Summary
    • From Beyonce's political stance, to Cam Newton's fumble, to Avocados in Space, to the Honda and Uber free-ride deal that everyone was talking about, it seems a set of tweets can help us find the stories in an event we didn't see
    • For more information:
      • Full R code and R Markdown report: https://github.com/ivanheneghan/
      • Blog post: http://www.prettypicturestellstories.com/