SlideShare a Scribd company logo
1 of 42
General Assembly
Data Science
Airline Tweets
Sentiment Analysis
Natural language data pre-processing steps
02
PRE-PROCESSING
What were the data collection process.
When, where, and how?
01
DATA SETS
AGENDA
Sentiment Analysis
Using machine learning algorithms, we attempt
to classify positive and negative sentiments
04
MACHINE LEARNING
BINARY CLASSIFICATIONS
Using machine learning to predict positive,
neutral and negative sentiments and determine
the best model for data with unknown sentiments
05
MACHINE LEARNING
MULTI-CLASS CLASSIFICATIONS
Without using machine learning, can we use
some basic rules to classify sentiments?
03
RULE-BASED CLASSIFICATIONS
07
CLOSING THOUGHTS
FEEDBACK | Q&A
The summary and statistics of the collected airline
tweets
06
REASULTS
DATA SETS
Where, when and how?
#unitedairlines
#unitedair
@united
UNITED
#americanairlines
#americanair
@AmericanAir
AA
#deltaairlines
#deltaair
@delta
DELTA
#southwestairlines
#southwestair
@southwestair
SOUTHWEST
FOUR MAJOR AIRLINES
Sentiment Analysis
Reformatted/cleaned tweets with graded
sentiment of Major Airlines from Feb 2015
14,640 Tweets
KAGGLE
Commercial datasets provided by Newsroom
with machine graded tweets
4,000 Tweets
Newsroom
Using Python and twython to retrieve tweets
through Twitter’s API during 7 days period.
80,121 Tweets
TWITTER API
k
SOURCES
Sentiment Analysis
itsvatime.com
600 + 600 + 600 + 600
DATA PRE-
PROCESSING
Natural language processes to prepare and transform the tweet
content for various classification models and sentiment analysis
Major processing steps
NATURAL LANGUGE PROCESSING
Sentiment Analysis
For machine learning
Vector  Matrix… get it?
4 Vectorizer
Perform regular
expression
2 RegEx
Tokenize all tweet
contents
1 Tokenization
Remove all the English
stop words
3 Stop Words
Replacing all &amp symbols with the
actual words ‘and’ with a space
re.sub(r'&', r' and', tweet)
Replace ‘&amp’ Signs
Remove all annoying white spaces
re.sub('[s]+', ' ', tweet)
Remove Additional Spaces
Convert all #hashtagwords to just
‘hashtagwords’
re.sub(r'#([^s]+)', r'1', tweet)
Replace #hastags
Turn all https://... and www… format to
display just the word ‘URL’
re.sub('((www.[^s]+)|(https?://[^s]+))','URL',tweet)
Eliminate URLs
Convert all @username to display just
‘username’ without the ‘@’ sign
re.sub(r'@([^s]+)', r'1', tweet)
Replace @username
Turn all letters to lower cases. One
option is to retain original upper cases
Lower Cases
REGULAR EXPRESSION
Sentiment Analysis
tweet.lower()
&
RULE BASED
CLASSIFICATION
First we will use two simple rule-based techniques to conduct our sentiment analysis to see what happens
• (pos) x (pos) = (pos)
This movie is very good
• (pos) x (neg) = (neg)
This movie is not good at all
• (neg) x (neg) = (pos)
This movie was not bad
Require a Pre-defined List of Positive & Negative Words
‘COMMON SENSE’
CLASSIFICATION RULES
Accuracy
NEWSROOM (4,000)
KAGGLE (14,600)
MANUAL LABEL (2,400)
49.50 %
52.45 %
47.53 %
VADER is a lexicon and rule-
based sentiment analysis tool
that is especially attuned to
sentiments expressed in social
media.
Valence Aware Dictionary and sEntiment Reasoner
V.A.D.E.R.
SENTIMENT SCORES
Examples
{‘neg’: 0.0, ‘neu’: 0.293, ‘pos’: 0.707, ‘compound’: 0.8646}
“VADER is very smart, handsome, and funny!”
{‘neg’: 0.0, ‘neu’: 0.254, ‘pos’: 0.746, ‘compound’: 0.8316}
“VADER is smart, handsome and funny.”
{‘neg’: 0.0, ‘neu’: 0.233, ‘pos’: 0.767, ‘compound’: 0.9342}
“VADER is VERY SMART, handsome, and FUNNY!!!”
{‘neg’: 0.779, ‘neu’: 0.221, ‘pos’: 0.0, ‘compound’: -0.5461}
“Today SUX!”
{‘neg’: 0.0, ‘neu’: 0.637, ‘pos’: 0.363, ‘compound’: 0.431}
“At least it isn’t a horrible book.”
{‘neg’: 0.154, ‘neu’: 0.42, ‘pos’: 0.426, ‘compound’: 0.5974}
“Today kinda sux! But I’ll get by, lol :)”
NEWSROOM (4,000)
KAGGLE (14,600)
MANUAL LABEL (2,400)
54.75 %
38.00 %
54.02 %
Accuracy
MACHINE
LEARNING
Part I – Binary Classifications
Random Forest
Perform Random Forest Model2
Hyperplane
Linear SVM
Perform Support Vector Classifier3
Data Prep
Remove All Neutral Sentiment1
MACHINE LEARNING I
Sentiment Analysis
Training Accuracy: 99.88%
Testing Accuracy: 86.49%
TRAINING: 11,541 TESTING: 1,843
Training Accuracy: 99.91%
Testing Accuracy: 84.48%
TRAINING: 1,124 TESTING: 290
Training Accuracy: 99.87%
Testing Accuracy: 89.58%
TRAINING: 9,266 TESTING: 2,275
RANDOM FOREST
Sentiment Analysis
k
k
k
Training Accuracy: 99.12%
Testing Accuracy: 86.38%
Training Accuracy: 99.56%
Testing Accuracy: 84.14%
Training Accuracy: 99.18%
Testing Accuracy: 90.95%
SVC
Sentiment Analysis
TRAINING: 11,541 TESTING: 1,843
k
TRAINING: 9,266 TESTING: 2,275
kk
TRAINING: 1,124 TESTING: 290
MACHINE LEARING
Part II – Multi-Class Classifications
Apply Model on Testing
model.fit ( )
AdaBoost
MACHINE LEARNING II
Sentiment Analysis
Grid Search Cross-Validation
GridSearchCV ( )
5-Fold Cross Validation
StratifiedKFold ( )
TRAINING: 2,040 TESTING: 360 TRAINING: 14,640 TESTING: 2,400
k
SVC
ADABOOST
MODEL PERFORMANCE
While the test accuracy is slightly lower, using Kaggle data as
training set proved to be a stronger predictor based on the cross-
validation accuracy results. In addition, it is superior in positive and
neutral precision, not to mention much higher training volume.
96.89%
1.43%
33.85%
66.94%
64.86%
65.25%
50.00%
60.34%
60.88%
67.80%
0% 20% 40% 60% 80% 100% 120%
[ -1] Precision
[ne] Precision
[+1] Precision
Test Accuracy
CV Accuracy
Kaggle->Tweet Tweet->Tweet
SVC
MODEL PERFORMANCE
Again, while we see the precision of the twitter data is superior, the
Kaggle data yields a more balanced results for all three sentiments.
Because the overall test accuracy is about the same, we chose to
use Kaggle data as training due to its superior CV accuracy as well
as its large sum of data points (~8X).
93.33%
14.29%
47.69%
69.72%
66.39%
79.34%
49.10%
60.34%
69.08%
71.91%
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[ -1] Precision
[ne] Precision
[+1] Precision
Test Accuracy
CV Accuracy
Kaggle->Tweet Tweet->Tweet
CONCLUSION
Support Vector Classifier
Model Parameters: decision_fun=‘ovo’, kernel=‘linear’, cost=0.4, gamma=0
Use Entire Kaggle Dataset as Training Set (14,640 data points)
Apply Model to Tweets of Unknown Sentiment
STATISTICS
Now we have the model, let’s take a look at the
results of collected tweets!
5,705
5,478
5,728
10,516
12,567
9,647
11,718
18,762
0 2,000 4,000 6,000 8,000 10,000 12,000 14,000 16,000 18,000 20,000
United
Southwest
Delta
American
Original Tweets Re-Tweets
NUMBER OF TWEETS
Sentiment Analysis
OVERALL SENTIMENT
Sentiment Analysis
SENTIMENT
59.5 %26.0%
14.5%
52,694
Original Tweets
Positive
Neutral
Negative
0
2,000
4,000
6,000
8,000
10,000
12,000
14,000
16,000
18,000
20,000
American Delta Southwest United
Negative Netual Positive
Percentage of Original TweetsNumber of Original Tweets
SENTIMENT PER AIRLINE
Sentiment Analysis
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
American Delta Southwest United
Negative Netual Positive
-
Positive Sentiment
“Cenk Uygur”
Negative Sentiment
AMERICAN AIRLINE
Sentiment Analysis
-
Positive Sentiment
“Delta Assist”
Negative Sentiment
DELTA AIRLINE
Sentiment Analysis
“Dalton Rapattoni”
Positive Sentiment
“Dalton Rapattoni”
Negative Sentiment
SOUTHWEST AIRLINE
Sentiment Analysis
“Happy Birthday”, “90th"
Positive Sentiment
“Muslim”, “Shame”
Negative Sentiment
UNITED AIRLINE
Sentiment Analysis
RETWEET PER SENTIMENT
Sentiment Analysis
NEUTRAL POSITIVE
26.0%
NEGATIVE
31.7%
14.5%
19.1%
59.5%
49.2%
RetweetsOriginal Tweets
If the sentiment is classify accurately, people
are more likely to retweet a neutral or positive
tweets than a negative tweet.
RT @JackAndJackReal: Thanks @Delta for adding our
movie to all your flights! If any of you are traveling and
bored make sure u peep it
3,535 Retweets…
RT @ggreenwald: Arab-American family of 5 going on
vacation - 3 young kids in tow - removed from @United
plane for "safety reasons" https:/…
352 Retweets…
RT @UMG: The perfect trio @theweeknd @ddlovato and
Lucian Grainge #MusicIsUniversal #UMGShowcase w/
@AmericanAir & @Citi https://t.co/J
3,795 Retweets…
RT @DaltonRapattoni: Been home in Dallas for hours now
but still in the airport. @SouthwestAir PLEASE find my
bag. Please RT. #SWAFindDalto
1,952 Retweets…
RT @Mets: We’re partnering with @Delta to send a lucky
fan to games 1 & 2! RT for your chance to win
#DeltaMetsSweeps https://t.co/CDELSe84
17,093 Retweets…
RT @antoniodelotero: this skank bitch was SO RUDE to
me on my flight!!!! @united I asked for her name and she
goes, "it's Britney bitch.”
3,329 Retweets…
RT @JensenAckles: @AmericanAir I know it's crazy
#DFWsnow I'm just excited 2see my baby girl! Man
freaking out next 2 me isn't helping
10,260 Retweets…
RT @SouthwestAir: Yo #Kimye, we're really happy for you,
we'll let you finish, but South is one of the best names of all
time.
3,362 Retweets…
MOST RETWEETED TWEETS
Sentiment Analysis
TWEET
SOURCES
The majority of the tweets are from iPhone. There is also
almost 20% of tweets from Android devices. The rest
from the Web client and other applications. There is little
difference between the source of negative and positive
tweets; but there seems to be more alternative sources
for the neutral sentiment.
0%
10%
20%
30%
40%
50%
60%
iPhone
Android
Web
Others
Negative Neutral Positive
Photo Link Video LinkPhoto
16,787
SENTIMENT PER MEDIA TYPE
Sentiment Analysis
1,299 540
SENTIMENT
PositiveNeutralNegative
ENDING
THOUGHTS
Discussing final thoughts and some future steps
https://github.com/michaelucsb/Python_Twitter-Sentiment_GA
Each data point being
validated by multiple people
MULTIPLE VALIDATION
Naïve Bayes
Lemmatization
OTHER MODELS
Current coordinates data
isn’t great for Geo-location
analysis
COORDINATES
Unsupervised model to
discover words association
CLUSTERING
THOUGHTS & FUTURE STEPS
Sentiment Analysis
THANK
YOU
Q & A
Project Description
Prior to machine learning, the project conducted two rule-based senti-
ment classifications. One of which simply count the number of positive
and negative words, and the other utilizes VADER lexicon, a pre-built
sentiment analysis tool for Python. Both rule-based sentiment
classifications achieved an accuracy of 47% to 55%.
Next, the project used machine learning algorithms – Random Forest,
and Support Vector Classifier – to conduct binary classification on
positive and negative sentiments by removing all known neutral tweets.
85% to 90% accuracy were achieved on the test dataset.
Finally, the project used Ada Boost and Support Vector Classifier to
conduct multi-class classifications by including back all neutral tweets.
Through grid-search and cross-validation, the project achieved a 70%
overall accuracy on prediction and an 80% precision on negative
sentiment prediction on the test dataset.
Using the best model and parameters found, the model is applied to
tweets with unknown sentiments. The statistics summary were included
in the later half of the presentation. The presented word clouds also
captured interesting stories during the time the tweets were collected.
The project aims to use rule-based classification and machine learning
algorithms to predict content sentiments of Twitter feeds that mentioned
four major airlines: American Airline, United, Delta, and Southwest.
The project collected 80,121 tweets during a 7-day period via Twitter’s
API using Twython, a Python package; among these tweets, 2,400
randomly selected tweets were given a sentiment rating manually
(test). Through Kaggle, the project also obtained a cleaned dataset
contains 14,640 airline tweets with already rated sentiments (training).
To start off the project, the tweets were processed using tokenization,
regular expression, stop words removals, and vectorization. In regular
expression, the project eliminated all URLs, “@” sign, “#” sign, and
replaced “&” sign with the word “and”.

More Related Content

What's hot

Project on disease prediction
Project on disease predictionProject on disease prediction
Project on disease predictionKOYELMAJUMDAR1
 
Sentiment Analysis using Twitter Data
Sentiment Analysis using Twitter DataSentiment Analysis using Twitter Data
Sentiment Analysis using Twitter DataHari Prasad
 
detect emotion from text
detect emotion from textdetect emotion from text
detect emotion from textSafayet Hossain
 
New sentiment analysis of tweets using python by Ravi kumar
New sentiment analysis of tweets using python by Ravi kumarNew sentiment analysis of tweets using python by Ravi kumar
New sentiment analysis of tweets using python by Ravi kumarRavi Kumar
 
Data analytics with python introductory
Data analytics with python introductoryData analytics with python introductory
Data analytics with python introductoryAbhimanyu Dwivedi
 
Twitter sentiment-analysis Jiit2013-14
Twitter sentiment-analysis Jiit2013-14Twitter sentiment-analysis Jiit2013-14
Twitter sentiment-analysis Jiit2013-14Rachit Goel
 
project sentiment analysis
project sentiment analysisproject sentiment analysis
project sentiment analysissneha penmetsa
 
Character Recognition using Machine Learning
Character Recognition using Machine LearningCharacter Recognition using Machine Learning
Character Recognition using Machine LearningRitwikSaurabh1
 
Sentiment Analysis on Twitter
Sentiment Analysis on TwitterSentiment Analysis on Twitter
Sentiment Analysis on TwitterSmritiAgarwal26
 
Weak Slot and Filler Structures
Weak Slot and Filler StructuresWeak Slot and Filler Structures
Weak Slot and Filler StructuresKushalParikh14
 
Project prSentiment Analysis of Twitter Data Using Machine Learning Approach...
Project prSentiment Analysis  of Twitter Data Using Machine Learning Approach...Project prSentiment Analysis  of Twitter Data Using Machine Learning Approach...
Project prSentiment Analysis of Twitter Data Using Machine Learning Approach...Geetika Gautam
 
Sentiment Analaysis on Twitter
Sentiment Analaysis on TwitterSentiment Analaysis on Twitter
Sentiment Analaysis on TwitterNitish J Prabhu
 
HEALTH PREDICTION ANALYSIS USING DATA MINING
HEALTH PREDICTION ANALYSIS USING DATA  MININGHEALTH PREDICTION ANALYSIS USING DATA  MINING
HEALTH PREDICTION ANALYSIS USING DATA MININGAshish Salve
 
Sentiment analysis - Our approach and use cases
Sentiment analysis - Our approach and use casesSentiment analysis - Our approach and use cases
Sentiment analysis - Our approach and use casesKarol Chlasta
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSumit Raj
 
sentiment analysis text extraction from social media
sentiment  analysis text extraction from social media sentiment  analysis text extraction from social media
sentiment analysis text extraction from social media Ravindra Chaudhary
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDevashish Shanker
 
Introduction to Sentiment Analysis
Introduction to Sentiment AnalysisIntroduction to Sentiment Analysis
Introduction to Sentiment AnalysisJaganadh Gopinadhan
 

What's hot (20)

Project on disease prediction
Project on disease predictionProject on disease prediction
Project on disease prediction
 
Loan prediction
Loan predictionLoan prediction
Loan prediction
 
Sentiment Analysis using Twitter Data
Sentiment Analysis using Twitter DataSentiment Analysis using Twitter Data
Sentiment Analysis using Twitter Data
 
detect emotion from text
detect emotion from textdetect emotion from text
detect emotion from text
 
New sentiment analysis of tweets using python by Ravi kumar
New sentiment analysis of tweets using python by Ravi kumarNew sentiment analysis of tweets using python by Ravi kumar
New sentiment analysis of tweets using python by Ravi kumar
 
Data analytics with python introductory
Data analytics with python introductoryData analytics with python introductory
Data analytics with python introductory
 
Twitter sentiment-analysis Jiit2013-14
Twitter sentiment-analysis Jiit2013-14Twitter sentiment-analysis Jiit2013-14
Twitter sentiment-analysis Jiit2013-14
 
project sentiment analysis
project sentiment analysisproject sentiment analysis
project sentiment analysis
 
Character Recognition using Machine Learning
Character Recognition using Machine LearningCharacter Recognition using Machine Learning
Character Recognition using Machine Learning
 
Sentiment Analysis on Twitter
Sentiment Analysis on TwitterSentiment Analysis on Twitter
Sentiment Analysis on Twitter
 
Weak Slot and Filler Structures
Weak Slot and Filler StructuresWeak Slot and Filler Structures
Weak Slot and Filler Structures
 
Project prSentiment Analysis of Twitter Data Using Machine Learning Approach...
Project prSentiment Analysis  of Twitter Data Using Machine Learning Approach...Project prSentiment Analysis  of Twitter Data Using Machine Learning Approach...
Project prSentiment Analysis of Twitter Data Using Machine Learning Approach...
 
Sentiment Analaysis on Twitter
Sentiment Analaysis on TwitterSentiment Analaysis on Twitter
Sentiment Analaysis on Twitter
 
HEALTH PREDICTION ANALYSIS USING DATA MINING
HEALTH PREDICTION ANALYSIS USING DATA  MININGHEALTH PREDICTION ANALYSIS USING DATA  MINING
HEALTH PREDICTION ANALYSIS USING DATA MINING
 
Sentiment analysis - Our approach and use cases
Sentiment analysis - Our approach and use casesSentiment analysis - Our approach and use cases
Sentiment analysis - Our approach and use cases
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter Data
 
sentiment analysis text extraction from social media
sentiment  analysis text extraction from social media sentiment  analysis text extraction from social media
sentiment analysis text extraction from social media
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language Processing
 
Introduction to Sentiment Analysis
Introduction to Sentiment AnalysisIntroduction to Sentiment Analysis
Introduction to Sentiment Analysis
 
Cryptography
CryptographyCryptography
Cryptography
 

Viewers also liked

R by example: mining Twitter for consumer attitudes towards airlines
R by example: mining Twitter for consumer attitudes towards airlinesR by example: mining Twitter for consumer attitudes towards airlines
R by example: mining Twitter for consumer attitudes towards airlinesJeffrey Breen
 
Sentiment Analysis - Analyzing Twitter Trend for Airline Travellers
Sentiment Analysis - Analyzing Twitter Trend for Airline TravellersSentiment Analysis - Analyzing Twitter Trend for Airline Travellers
Sentiment Analysis - Analyzing Twitter Trend for Airline TravellersAnimesh Dalakoti
 
Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier Dev Sahu
 
Who do Customers Like More? A Sentiment Analysis of SouthWest Air & Delta Air...
Who do Customers Like More? A Sentiment Analysis of SouthWest Air & Delta Air...Who do Customers Like More? A Sentiment Analysis of SouthWest Air & Delta Air...
Who do Customers Like More? A Sentiment Analysis of SouthWest Air & Delta Air...MMADave
 
Tweets Classification using Naive Bayes and SVM
Tweets Classification using Naive Bayes and SVMTweets Classification using Naive Bayes and SVM
Tweets Classification using Naive Bayes and SVMTrilok Sharma
 
Tutorial of Sentiment Analysis
Tutorial of Sentiment AnalysisTutorial of Sentiment Analysis
Tutorial of Sentiment AnalysisFabio Benedetti
 
Sentiment Analysis in Dynamics CRM using Azure Text Analytics
Sentiment Analysis in Dynamics CRM using Azure Text AnalyticsSentiment Analysis in Dynamics CRM using Azure Text Analytics
Sentiment Analysis in Dynamics CRM using Azure Text AnalyticsLucas Alexander
 
Telecomin in indian industrial analysis
Telecomin in indian  industrial analysisTelecomin in indian  industrial analysis
Telecomin in indian industrial analysisNaveenkumar Sn
 
Industrial Analysis on power Industry
Industrial Analysis on power IndustryIndustrial Analysis on power Industry
Industrial Analysis on power IndustrySiva Konduri
 
Analysis of Industrial Wind Turbines_Final Presentation
Analysis of Industrial Wind Turbines_Final PresentationAnalysis of Industrial Wind Turbines_Final Presentation
Analysis of Industrial Wind Turbines_Final PresentationKofi Amoako-Gyan
 
industrial analysis
industrial analysisindustrial analysis
industrial analysistaimur2708
 
Airline Revenue Management Conference Presentations
Airline Revenue Management Conference PresentationsAirline Revenue Management Conference Presentations
Airline Revenue Management Conference PresentationsMillennium Aviation, Inc.
 
Peste Analysis Airline
Peste Analysis AirlinePeste Analysis Airline
Peste Analysis Airlinezeeshanvali
 
factors including economic and industrial analysis
factors including economic and industrial analysisfactors including economic and industrial analysis
factors including economic and industrial analysisvishnu1204
 
Toyota motors company-Industrial Analysis
Toyota motors company-Industrial AnalysisToyota motors company-Industrial Analysis
Toyota motors company-Industrial AnalysisMhara Dalangin
 
Web data from R
Web data from RWeb data from R
Web data from Rschamber
 
Emirates consumer behaviour anaysis
Emirates consumer  behaviour anaysisEmirates consumer  behaviour anaysis
Emirates consumer behaviour anaysisOlya Dyachuk
 
Industrial Analysis Presentation Oil & Gas
Industrial Analysis Presentation Oil & GasIndustrial Analysis Presentation Oil & Gas
Industrial Analysis Presentation Oil & GasSaravanan A
 
Twitter sentiment analysis project report
Twitter sentiment analysis project reportTwitter sentiment analysis project report
Twitter sentiment analysis project reportBharat Khanna
 

Viewers also liked (20)

R by example: mining Twitter for consumer attitudes towards airlines
R by example: mining Twitter for consumer attitudes towards airlinesR by example: mining Twitter for consumer attitudes towards airlines
R by example: mining Twitter for consumer attitudes towards airlines
 
Sentiment Analysis - Analyzing Twitter Trend for Airline Travellers
Sentiment Analysis - Analyzing Twitter Trend for Airline TravellersSentiment Analysis - Analyzing Twitter Trend for Airline Travellers
Sentiment Analysis - Analyzing Twitter Trend for Airline Travellers
 
Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier
 
Who do Customers Like More? A Sentiment Analysis of SouthWest Air & Delta Air...
Who do Customers Like More? A Sentiment Analysis of SouthWest Air & Delta Air...Who do Customers Like More? A Sentiment Analysis of SouthWest Air & Delta Air...
Who do Customers Like More? A Sentiment Analysis of SouthWest Air & Delta Air...
 
Tweets Classification using Naive Bayes and SVM
Tweets Classification using Naive Bayes and SVMTweets Classification using Naive Bayes and SVM
Tweets Classification using Naive Bayes and SVM
 
Tutorial of Sentiment Analysis
Tutorial of Sentiment AnalysisTutorial of Sentiment Analysis
Tutorial of Sentiment Analysis
 
Sentiment Analysis in Dynamics CRM using Azure Text Analytics
Sentiment Analysis in Dynamics CRM using Azure Text AnalyticsSentiment Analysis in Dynamics CRM using Azure Text Analytics
Sentiment Analysis in Dynamics CRM using Azure Text Analytics
 
Telecomin in indian industrial analysis
Telecomin in indian  industrial analysisTelecomin in indian  industrial analysis
Telecomin in indian industrial analysis
 
Industrial analysis on Civil Aviation sector
Industrial analysis on Civil Aviation sectorIndustrial analysis on Civil Aviation sector
Industrial analysis on Civil Aviation sector
 
Industrial Analysis on power Industry
Industrial Analysis on power IndustryIndustrial Analysis on power Industry
Industrial Analysis on power Industry
 
Analysis of Industrial Wind Turbines_Final Presentation
Analysis of Industrial Wind Turbines_Final PresentationAnalysis of Industrial Wind Turbines_Final Presentation
Analysis of Industrial Wind Turbines_Final Presentation
 
industrial analysis
industrial analysisindustrial analysis
industrial analysis
 
Airline Revenue Management Conference Presentations
Airline Revenue Management Conference PresentationsAirline Revenue Management Conference Presentations
Airline Revenue Management Conference Presentations
 
Peste Analysis Airline
Peste Analysis AirlinePeste Analysis Airline
Peste Analysis Airline
 
factors including economic and industrial analysis
factors including economic and industrial analysisfactors including economic and industrial analysis
factors including economic and industrial analysis
 
Toyota motors company-Industrial Analysis
Toyota motors company-Industrial AnalysisToyota motors company-Industrial Analysis
Toyota motors company-Industrial Analysis
 
Web data from R
Web data from RWeb data from R
Web data from R
 
Emirates consumer behaviour anaysis
Emirates consumer  behaviour anaysisEmirates consumer  behaviour anaysis
Emirates consumer behaviour anaysis
 
Industrial Analysis Presentation Oil & Gas
Industrial Analysis Presentation Oil & GasIndustrial Analysis Presentation Oil & Gas
Industrial Analysis Presentation Oil & Gas
 
Twitter sentiment analysis project report
Twitter sentiment analysis project reportTwitter sentiment analysis project report
Twitter sentiment analysis project report
 

Similar to Sentiment Analysis of Airline Tweets

Course Project for Coursera Practical Machine Learning
Course Project for Coursera Practical Machine LearningCourse Project for Coursera Practical Machine Learning
Course Project for Coursera Practical Machine LearningJohn Edward Slough II
 
Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...
Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...
Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...Edureka!
 
Findings, Conclusions, & RecommendationsReport Writing
Findings, Conclusions, & RecommendationsReport WritingFindings, Conclusions, & RecommendationsReport Writing
Findings, Conclusions, & RecommendationsReport WritingShainaBoling829
 
Textual & Sentiment Analysis of Movie Reviews
Textual & Sentiment Analysis of Movie ReviewsTextual & Sentiment Analysis of Movie Reviews
Textual & Sentiment Analysis of Movie ReviewsYousef Fadila
 
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...Sagar Deogirkar
 
Barga Data Science lecture 9
Barga Data Science lecture 9Barga Data Science lecture 9
Barga Data Science lecture 9Roger Barga
 
Peterson_-_Machine_Learning_Project
Peterson_-_Machine_Learning_ProjectPeterson_-_Machine_Learning_Project
Peterson_-_Machine_Learning_Projectjpeterson2058
 
Younger: Predicting Age with Deep Learning
Younger: Predicting Age with Deep LearningYounger: Predicting Age with Deep Learning
Younger: Predicting Age with Deep LearningPrem Ananda
 
Simple rules for building robust machine learning models
Simple rules for building robust machine learning modelsSimple rules for building robust machine learning models
Simple rules for building robust machine learning modelsKyriakos Chatzidimitriou
 
Classification decision tree
Classification  decision treeClassification  decision tree
Classification decision treeyazad dumasia
 
Machine Learning and Cognitive Fingerprinting - SparkCognition
Machine Learning and Cognitive Fingerprinting - SparkCognitionMachine Learning and Cognitive Fingerprinting - SparkCognition
Machine Learning and Cognitive Fingerprinting - SparkCognitionSparkCognition
 
Effort Estimation Techniques used in Software Projects
Effort Estimation Techniques used in Software ProjectsEffort Estimation Techniques used in Software Projects
Effort Estimation Techniques used in Software ProjectsDEEPRAJ PATHAK
 
Top 25 location R&D Hardware
Top 25 location R&D HardwareTop 25 location R&D Hardware
Top 25 location R&D HardwareCEB TalentNeuron
 
Big Data LDN 2017: Cognitive Search & Analytics – Bringing the Power of AI to...
Big Data LDN 2017: Cognitive Search & Analytics – Bringing the Power of AI to...Big Data LDN 2017: Cognitive Search & Analytics – Bringing the Power of AI to...
Big Data LDN 2017: Cognitive Search & Analytics – Bringing the Power of AI to...Matt Stubbs
 
Is your excel production code?
Is your excel production code?Is your excel production code?
Is your excel production code?ProCogia
 

Similar to Sentiment Analysis of Airline Tweets (20)

Course Project for Coursera Practical Machine Learning
Course Project for Coursera Practical Machine LearningCourse Project for Coursera Practical Machine Learning
Course Project for Coursera Practical Machine Learning
 
Explainable AI
Explainable AIExplainable AI
Explainable AI
 
Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...
Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...
Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...
 
Cist2014 slides
Cist2014 slidesCist2014 slides
Cist2014 slides
 
CAR EVALUATION DATABASE
CAR EVALUATION DATABASECAR EVALUATION DATABASE
CAR EVALUATION DATABASE
 
Findings, Conclusions, & RecommendationsReport Writing
Findings, Conclusions, & RecommendationsReport WritingFindings, Conclusions, & RecommendationsReport Writing
Findings, Conclusions, & RecommendationsReport Writing
 
Textual & Sentiment Analysis of Movie Reviews
Textual & Sentiment Analysis of Movie ReviewsTextual & Sentiment Analysis of Movie Reviews
Textual & Sentiment Analysis of Movie Reviews
 
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
 
Barga Data Science lecture 9
Barga Data Science lecture 9Barga Data Science lecture 9
Barga Data Science lecture 9
 
Peterson_-_Machine_Learning_Project
Peterson_-_Machine_Learning_ProjectPeterson_-_Machine_Learning_Project
Peterson_-_Machine_Learning_Project
 
Younger: Predicting Age with Deep Learning
Younger: Predicting Age with Deep LearningYounger: Predicting Age with Deep Learning
Younger: Predicting Age with Deep Learning
 
Simple rules for building robust machine learning models
Simple rules for building robust machine learning modelsSimple rules for building robust machine learning models
Simple rules for building robust machine learning models
 
Classification decision tree
Classification  decision treeClassification  decision tree
Classification decision tree
 
ASA.pptx
ASA.pptxASA.pptx
ASA.pptx
 
Machine Learning and Cognitive Fingerprinting - SparkCognition
Machine Learning and Cognitive Fingerprinting - SparkCognitionMachine Learning and Cognitive Fingerprinting - SparkCognition
Machine Learning and Cognitive Fingerprinting - SparkCognition
 
Effort Estimation Techniques used in Software Projects
Effort Estimation Techniques used in Software ProjectsEffort Estimation Techniques used in Software Projects
Effort Estimation Techniques used in Software Projects
 
Top 25 location R&D Hardware
Top 25 location R&D HardwareTop 25 location R&D Hardware
Top 25 location R&D Hardware
 
Uncertainties in Deep Learning
Uncertainties in Deep LearningUncertainties in Deep Learning
Uncertainties in Deep Learning
 
Big Data LDN 2017: Cognitive Search & Analytics – Bringing the Power of AI to...
Big Data LDN 2017: Cognitive Search & Analytics – Bringing the Power of AI to...Big Data LDN 2017: Cognitive Search & Analytics – Bringing the Power of AI to...
Big Data LDN 2017: Cognitive Search & Analytics – Bringing the Power of AI to...
 
Is your excel production code?
Is your excel production code?Is your excel production code?
Is your excel production code?
 

Recently uploaded

B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...shivangimorya083
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts ServiceCall Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Servicejennyeacort
 
Spark3's new memory model/management
Spark3's new memory model/managementSpark3's new memory model/management
Spark3's new memory model/managementakshesh doshi
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 

Recently uploaded (20)

B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts ServiceCall Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
 
Spark3's new memory model/management
Spark3's new memory model/managementSpark3's new memory model/management
Spark3's new memory model/management
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 

Sentiment Analysis of Airline Tweets

  • 1. General Assembly Data Science Airline Tweets Sentiment Analysis
  • 2. Natural language data pre-processing steps 02 PRE-PROCESSING What were the data collection process. When, where, and how? 01 DATA SETS AGENDA Sentiment Analysis
  • 3. Using machine learning algorithms, we attempt to classify positive and negative sentiments 04 MACHINE LEARNING BINARY CLASSIFICATIONS Using machine learning to predict positive, neutral and negative sentiments and determine the best model for data with unknown sentiments 05 MACHINE LEARNING MULTI-CLASS CLASSIFICATIONS Without using machine learning, can we use some basic rules to classify sentiments? 03 RULE-BASED CLASSIFICATIONS
  • 4. 07 CLOSING THOUGHTS FEEDBACK | Q&A The summary and statistics of the collected airline tweets 06 REASULTS
  • 7. Reformatted/cleaned tweets with graded sentiment of Major Airlines from Feb 2015 14,640 Tweets KAGGLE Commercial datasets provided by Newsroom with machine graded tweets 4,000 Tweets Newsroom Using Python and twython to retrieve tweets through Twitter’s API during 7 days period. 80,121 Tweets TWITTER API k SOURCES Sentiment Analysis
  • 9. DATA PRE- PROCESSING Natural language processes to prepare and transform the tweet content for various classification models and sentiment analysis
  • 10. Major processing steps NATURAL LANGUGE PROCESSING Sentiment Analysis For machine learning Vector  Matrix… get it? 4 Vectorizer Perform regular expression 2 RegEx Tokenize all tweet contents 1 Tokenization Remove all the English stop words 3 Stop Words
  • 11. Replacing all &amp symbols with the actual words ‘and’ with a space re.sub(r'&', r' and', tweet) Replace ‘&amp’ Signs Remove all annoying white spaces re.sub('[s]+', ' ', tweet) Remove Additional Spaces Convert all #hashtagwords to just ‘hashtagwords’ re.sub(r'#([^s]+)', r'1', tweet) Replace #hastags Turn all https://... and www… format to display just the word ‘URL’ re.sub('((www.[^s]+)|(https?://[^s]+))','URL',tweet) Eliminate URLs Convert all @username to display just ‘username’ without the ‘@’ sign re.sub(r'@([^s]+)', r'1', tweet) Replace @username Turn all letters to lower cases. One option is to retain original upper cases Lower Cases REGULAR EXPRESSION Sentiment Analysis tweet.lower() &
  • 12. RULE BASED CLASSIFICATION First we will use two simple rule-based techniques to conduct our sentiment analysis to see what happens
  • 13. • (pos) x (pos) = (pos) This movie is very good • (pos) x (neg) = (neg) This movie is not good at all • (neg) x (neg) = (pos) This movie was not bad Require a Pre-defined List of Positive & Negative Words ‘COMMON SENSE’ CLASSIFICATION RULES
  • 14. Accuracy NEWSROOM (4,000) KAGGLE (14,600) MANUAL LABEL (2,400) 49.50 % 52.45 % 47.53 %
  • 15. VADER is a lexicon and rule- based sentiment analysis tool that is especially attuned to sentiments expressed in social media. Valence Aware Dictionary and sEntiment Reasoner V.A.D.E.R. SENTIMENT SCORES
  • 16. Examples {‘neg’: 0.0, ‘neu’: 0.293, ‘pos’: 0.707, ‘compound’: 0.8646} “VADER is very smart, handsome, and funny!” {‘neg’: 0.0, ‘neu’: 0.254, ‘pos’: 0.746, ‘compound’: 0.8316} “VADER is smart, handsome and funny.” {‘neg’: 0.0, ‘neu’: 0.233, ‘pos’: 0.767, ‘compound’: 0.9342} “VADER is VERY SMART, handsome, and FUNNY!!!” {‘neg’: 0.779, ‘neu’: 0.221, ‘pos’: 0.0, ‘compound’: -0.5461} “Today SUX!” {‘neg’: 0.0, ‘neu’: 0.637, ‘pos’: 0.363, ‘compound’: 0.431} “At least it isn’t a horrible book.” {‘neg’: 0.154, ‘neu’: 0.42, ‘pos’: 0.426, ‘compound’: 0.5974} “Today kinda sux! But I’ll get by, lol :)”
  • 17. NEWSROOM (4,000) KAGGLE (14,600) MANUAL LABEL (2,400) 54.75 % 38.00 % 54.02 % Accuracy
  • 18. MACHINE LEARNING Part I – Binary Classifications
  • 19. Random Forest Perform Random Forest Model2 Hyperplane Linear SVM Perform Support Vector Classifier3 Data Prep Remove All Neutral Sentiment1 MACHINE LEARNING I Sentiment Analysis
  • 20. Training Accuracy: 99.88% Testing Accuracy: 86.49% TRAINING: 11,541 TESTING: 1,843 Training Accuracy: 99.91% Testing Accuracy: 84.48% TRAINING: 1,124 TESTING: 290 Training Accuracy: 99.87% Testing Accuracy: 89.58% TRAINING: 9,266 TESTING: 2,275 RANDOM FOREST Sentiment Analysis k k k
  • 21. Training Accuracy: 99.12% Testing Accuracy: 86.38% Training Accuracy: 99.56% Testing Accuracy: 84.14% Training Accuracy: 99.18% Testing Accuracy: 90.95% SVC Sentiment Analysis TRAINING: 11,541 TESTING: 1,843 k TRAINING: 9,266 TESTING: 2,275 kk TRAINING: 1,124 TESTING: 290
  • 22. MACHINE LEARING Part II – Multi-Class Classifications
  • 23. Apply Model on Testing model.fit ( ) AdaBoost MACHINE LEARNING II Sentiment Analysis Grid Search Cross-Validation GridSearchCV ( ) 5-Fold Cross Validation StratifiedKFold ( ) TRAINING: 2,040 TESTING: 360 TRAINING: 14,640 TESTING: 2,400 k SVC
  • 24. ADABOOST MODEL PERFORMANCE While the test accuracy is slightly lower, using Kaggle data as training set proved to be a stronger predictor based on the cross- validation accuracy results. In addition, it is superior in positive and neutral precision, not to mention much higher training volume. 96.89% 1.43% 33.85% 66.94% 64.86% 65.25% 50.00% 60.34% 60.88% 67.80% 0% 20% 40% 60% 80% 100% 120% [ -1] Precision [ne] Precision [+1] Precision Test Accuracy CV Accuracy Kaggle->Tweet Tweet->Tweet
  • 25. SVC MODEL PERFORMANCE Again, while we see the precision of the twitter data is superior, the Kaggle data yields a more balanced results for all three sentiments. Because the overall test accuracy is about the same, we chose to use Kaggle data as training due to its superior CV accuracy as well as its large sum of data points (~8X). 93.33% 14.29% 47.69% 69.72% 66.39% 79.34% 49.10% 60.34% 69.08% 71.91% 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% [ -1] Precision [ne] Precision [+1] Precision Test Accuracy CV Accuracy Kaggle->Tweet Tweet->Tweet
  • 26. CONCLUSION Support Vector Classifier Model Parameters: decision_fun=‘ovo’, kernel=‘linear’, cost=0.4, gamma=0 Use Entire Kaggle Dataset as Training Set (14,640 data points) Apply Model to Tweets of Unknown Sentiment
  • 27. STATISTICS Now we have the model, let’s take a look at the results of collected tweets!
  • 28. 5,705 5,478 5,728 10,516 12,567 9,647 11,718 18,762 0 2,000 4,000 6,000 8,000 10,000 12,000 14,000 16,000 18,000 20,000 United Southwest Delta American Original Tweets Re-Tweets NUMBER OF TWEETS Sentiment Analysis
  • 29. OVERALL SENTIMENT Sentiment Analysis SENTIMENT 59.5 %26.0% 14.5% 52,694 Original Tweets Positive Neutral Negative
  • 30. 0 2,000 4,000 6,000 8,000 10,000 12,000 14,000 16,000 18,000 20,000 American Delta Southwest United Negative Netual Positive Percentage of Original TweetsNumber of Original Tweets SENTIMENT PER AIRLINE Sentiment Analysis 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% American Delta Southwest United Negative Netual Positive
  • 31. - Positive Sentiment “Cenk Uygur” Negative Sentiment AMERICAN AIRLINE Sentiment Analysis
  • 32. - Positive Sentiment “Delta Assist” Negative Sentiment DELTA AIRLINE Sentiment Analysis
  • 33. “Dalton Rapattoni” Positive Sentiment “Dalton Rapattoni” Negative Sentiment SOUTHWEST AIRLINE Sentiment Analysis
  • 34. “Happy Birthday”, “90th" Positive Sentiment “Muslim”, “Shame” Negative Sentiment UNITED AIRLINE Sentiment Analysis
  • 35. RETWEET PER SENTIMENT Sentiment Analysis NEUTRAL POSITIVE 26.0% NEGATIVE 31.7% 14.5% 19.1% 59.5% 49.2% RetweetsOriginal Tweets If the sentiment is classify accurately, people are more likely to retweet a neutral or positive tweets than a negative tweet.
  • 36. RT @JackAndJackReal: Thanks @Delta for adding our movie to all your flights! If any of you are traveling and bored make sure u peep it 3,535 Retweets… RT @ggreenwald: Arab-American family of 5 going on vacation - 3 young kids in tow - removed from @United plane for "safety reasons" https:/… 352 Retweets… RT @UMG: The perfect trio @theweeknd @ddlovato and Lucian Grainge #MusicIsUniversal #UMGShowcase w/ @AmericanAir & @Citi https://t.co/J 3,795 Retweets… RT @DaltonRapattoni: Been home in Dallas for hours now but still in the airport. @SouthwestAir PLEASE find my bag. Please RT. #SWAFindDalto 1,952 Retweets… RT @Mets: We’re partnering with @Delta to send a lucky fan to games 1 & 2! RT for your chance to win #DeltaMetsSweeps https://t.co/CDELSe84 17,093 Retweets… RT @antoniodelotero: this skank bitch was SO RUDE to me on my flight!!!! @united I asked for her name and she goes, "it's Britney bitch.” 3,329 Retweets… RT @JensenAckles: @AmericanAir I know it's crazy #DFWsnow I'm just excited 2see my baby girl! Man freaking out next 2 me isn't helping 10,260 Retweets… RT @SouthwestAir: Yo #Kimye, we're really happy for you, we'll let you finish, but South is one of the best names of all time. 3,362 Retweets… MOST RETWEETED TWEETS Sentiment Analysis
  • 37. TWEET SOURCES The majority of the tweets are from iPhone. There is also almost 20% of tweets from Android devices. The rest from the Web client and other applications. There is little difference between the source of negative and positive tweets; but there seems to be more alternative sources for the neutral sentiment. 0% 10% 20% 30% 40% 50% 60% iPhone Android Web Others Negative Neutral Positive
  • 38. Photo Link Video LinkPhoto 16,787 SENTIMENT PER MEDIA TYPE Sentiment Analysis 1,299 540 SENTIMENT PositiveNeutralNegative
  • 40. https://github.com/michaelucsb/Python_Twitter-Sentiment_GA Each data point being validated by multiple people MULTIPLE VALIDATION Naïve Bayes Lemmatization OTHER MODELS Current coordinates data isn’t great for Geo-location analysis COORDINATES Unsupervised model to discover words association CLUSTERING THOUGHTS & FUTURE STEPS Sentiment Analysis
  • 42. Project Description Prior to machine learning, the project conducted two rule-based senti- ment classifications. One of which simply count the number of positive and negative words, and the other utilizes VADER lexicon, a pre-built sentiment analysis tool for Python. Both rule-based sentiment classifications achieved an accuracy of 47% to 55%. Next, the project used machine learning algorithms – Random Forest, and Support Vector Classifier – to conduct binary classification on positive and negative sentiments by removing all known neutral tweets. 85% to 90% accuracy were achieved on the test dataset. Finally, the project used Ada Boost and Support Vector Classifier to conduct multi-class classifications by including back all neutral tweets. Through grid-search and cross-validation, the project achieved a 70% overall accuracy on prediction and an 80% precision on negative sentiment prediction on the test dataset. Using the best model and parameters found, the model is applied to tweets with unknown sentiments. The statistics summary were included in the later half of the presentation. The presented word clouds also captured interesting stories during the time the tweets were collected. The project aims to use rule-based classification and machine learning algorithms to predict content sentiments of Twitter feeds that mentioned four major airlines: American Airline, United, Delta, and Southwest. The project collected 80,121 tweets during a 7-day period via Twitter’s API using Twython, a Python package; among these tweets, 2,400 randomly selected tweets were given a sentiment rating manually (test). Through Kaggle, the project also obtained a cleaned dataset contains 14,640 airline tweets with already rated sentiments (training). To start off the project, the tweets were processed using tokenization, regular expression, stop words removals, and vectorization. In regular expression, the project eliminated all URLs, “@” sign, “#” sign, and replaced “&” sign with the word “and”.