The Super Bowl is the annual championship game of the National Football League (NFL), the highest level of professional American football in the world. Because of the game's prominence, the commercials broadcast during it are created specially for the occasion and are widely discussed. Airtime is expensive: about $5 million for 30 seconds. The objective of this research is to find out the impact of these commercials on people.
3. The Super Bowl
Super Bowl 2017 Statistics
• Viewers: 111.3 Million
• 30 sec of air time: $5 Million
• Brands: 56
• Ads: 66
• Famous for entertaining and catchy ads
• 5-6 hours long
4. Introduction - Objectives
• Gather and analyze data from Twitter on Super Bowl commercials
• Use a sentiment analyzer to determine the overall sentiment toward each commercial
• Extract brand/product/commercial data and statistics
• Quickly provide results as infographics, word clouds, and a white paper
• Create infrastructure using new technologies
6. Project Processes
ETL Processing and Cleaning
1. Preparation
a. Prepare servers
b. Create Twitter apps
c. Write Python code
2. Processing
a. Stream data from Twitter to VMs
b. Extract, transform, and load (ETL) data on a single server
3. Analyses
a. Sentiment Analysis
b. Data Statistics
c. Word Cloud Analysis
d. Tableau Analysis
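A word cloud is driven by word frequencies, so the word cloud step above can be approximated with a simple counter. The following is a minimal sketch, not the project's actual code: the stop-word list, the cleaning rules, and the sample tweets are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "in", "is", "that", "for", "on", "rt"}

def word_frequencies(tweets, top_n=10):
    """Count the most common words across tweets, ignoring URLs and stop words."""
    counts = Counter()
    for text in tweets:
        text = re.sub(r"https?://\S+", " ", text.lower())  # drop links
        words = re.findall(r"[a-z0-9']+", text)  # also strips '#' and '@'
        counts.update(w for w in words if w not in STOP_WORDS and len(w) > 1)
    return counts.most_common(top_n)

# Made-up sample tweets, for illustration only.
sample = ["Loved the BrandX ad http://t.co/x", "That #BrandX ad was great"]
for word, count in word_frequencies(sample):
    print(word, count)
```

The resulting (word, count) pairs are the kind of input a word cloud renderer typically takes, with font size scaled by count.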
13. Commercial Keyword Process
• Monitor commercial for keywords
– Hashtags, brand names, etc.
• Enter keywords in Python script
• Record keyword and associated server
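The keyword process above can be sketched as a small lookup table. This is an illustrative sketch only: the keywords, the server names, and both helper functions are hypothetical and are not taken from the project.

```python
# Hypothetical keyword-to-server assignments; all names are placeholders.
KEYWORD_SERVERS = {
    "#brandx": "vm-01",
    "brandx": "vm-01",
    "#brandyad": "vm-02",
}

def keywords_by_server(keyword_servers):
    """Group keywords by the server that is assigned to stream them."""
    grouped = {}
    for keyword, server in keyword_servers.items():
        grouped.setdefault(server, []).append(keyword)
    return grouped

def matching_keywords(tweet_text, keywords):
    """Return the tracked keywords that appear in a tweet (case-insensitive)."""
    text = tweet_text.lower()
    return [k for k in keywords if k.lower() in text]

print(keywords_by_server(KEYWORD_SERVERS))
print(matching_keywords("Best ad tonight #BrandX", KEYWORD_SERVERS))
```

Recording the server next to each keyword makes it possible to trace a collected tweet back to the stream that captured it.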
17. ETL
• Raw Data
• Desired Data
• Tweet Samples
• MongoDB terminology
• Operators used from pymongo
• Adding desired fields to processed data
• Sentiment Classifier
• Polarity
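The step from raw data to desired data can be illustrated with a small transform. This is a sketch under stated assumptions: the input keys follow the standard tweet JSON layout (`id_str`, `created_at`, `text`, `user`, `entities`), while the choice of output fields and the function name `to_desired` are hypothetical.

```python
from datetime import datetime

def to_desired(raw):
    """Reduce a raw tweet dict to a flat record with a hypothetical set of fields."""
    entities = raw.get("entities", {})
    return {
        "tweet_id": raw["id_str"],
        # Tweet timestamps look like "Sun Feb 05 23:30:00 +0000 2017".
        "created_at": datetime.strptime(raw["created_at"], "%a %b %d %H:%M:%S %z %Y"),
        "text": raw["text"],
        "user": raw["user"]["screen_name"],
        "hashtags": [h["text"].lower() for h in entities.get("hashtags", [])],
    }

# Made-up raw tweet, for illustration only.
raw_tweet = {
    "id_str": "1",
    "created_at": "Sun Feb 05 23:30:00 +0000 2017",
    "text": "Great ad #BrandX",
    "user": {"screen_name": "example_user"},
    "entities": {"hashtags": [{"text": "BrandX"}]},
}
print(to_desired(raw_tweet))
```

Parsing `created_at` into a timezone-aware datetime at this stage makes later filtering by time window straightforward.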
23. Note: Operator $push
Adding desired fields to processed data
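MongoDB's `$push` operator appends a value to an array field, creating the array if the field does not yet exist. The sketch below shows one way it might be used to add a field to processed tweets. The field names `tweet_id` and `keywords` and both helper functions are hypothetical; `collection` is assumed to behave like a pymongo `Collection`.

```python
def push_update(field, value):
    """Build a MongoDB update document that appends value to an array field."""
    return {"$push": {field: value}}

def add_keyword(collection, tweet_id, keyword):
    """Append a matched keyword to one tweet document.

    `collection` is expected to behave like a pymongo Collection;
    the field names "tweet_id" and "keywords" are hypothetical.
    """
    return collection.update_one({"tweet_id": tweet_id}, push_update("keywords", keyword))

print(push_update("keywords", "#brandx"))
```

Because `$push` appends even when the value is already present, `$addToSet` may be the better fit if duplicates should be avoided.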
24. Sentiment Classifier: TextBlob
● Noun phrase extraction
● Part-of-speech tagging
● Sentiment analysis
● Classification (Naive Bayes, Decision Tree)
● Language translation and detection powered by Google Translate
● Tokenization (splitting text into words and sentences)
● Word and phrase frequencies
● Parsing
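Of the TextBlob features listed above, sentiment analysis is the one most relevant to scoring tweets. The sketch below reads TextBlob's polarity score, which ranges from -1.0 (negative) to 1.0 (positive), and maps it to a coarse label. The labelling threshold of 0.1 and both function names are assumptions for illustration, not the project's settings.

```python
def tweet_polarity(text):
    """Return TextBlob's polarity score for text, a float in [-1.0, 1.0]."""
    from textblob import TextBlob  # imported lazily; requires `pip install textblob`
    return TextBlob(text).sentiment.polarity

def label_polarity(polarity, threshold=0.1):
    """Map a polarity score to a coarse label; the threshold is an assumption."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"

if __name__ == "__main__":
    # Made-up sample text, for illustration only.
    print(label_polarity(tweet_polarity("That commercial was wonderful")))
```

Keeping the threshold as a parameter makes it easy to widen or narrow the neutral band when the labels are reviewed against real tweets.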
27. Data Statistics
Total Ads: 66
Total Brands: 56
Total tweets collected: 1,063,236
Tweets collected during the game: 714,447
Tweets collected related to ads: 259,279
[Chart: time axis from 4PM to 10PM]
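The counts above can be turned into shares with a little arithmetic. This sketch assumes that both the in-game tweets and the ad-related tweets are subsets of the total collected; the slides do not say whether the ad-related tweets also fall within the in-game window.

```python
total_tweets = 1_063_236
game_tweets = 714_447
ad_tweets = 259_279
total_ads = 66

def share(part, whole):
    """Return part as a fraction of whole."""
    return part / whole

game_share = share(game_tweets, total_tweets)
ad_share = share(ad_tweets, total_tweets)
tweets_per_ad = ad_tweets / total_ads  # simple average, not a per-ad count

print(f"Collected during the game: {game_share:.1%}")
print(f"Related to ads: {ad_share:.1%}")
print(f"Average ad-related tweets per ad: {tweets_per_ad:,.0f}")
```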
38. Deliverables
• Infographics
• Learnings from the Infrastructure
– MongoDB (unstructured data)
– Python
– Tableau
– Twitter API
– Text Mining
• White Paper
• Research Paper