INST 737 – Twitter
Sentiment Prediction on
#Windows10 release
Anuj Sharma, Krishnesh Pujari and Rajesh
Gnanasekaran
12/03/15
Objective
• Twitter has recently come on par with other social media
platforms such as Facebook, Google+ and Myspace in
creating waves of sentiment on issues around the world.
• Perform Twitter sentiment analysis and sentiment
prediction on Microsoft’s Windows 10 release, which took
place on July 29th, 2015.
• Follow a semi-supervised learning technique to create the
target variable and use it in the classification models.
• Analyze and interpret the results and provide
recommendations to Microsoft.
About the Data
• Imported using NodeXL from Twitter Search Network
• Original dataset had 9000+ observations on the hashtag
‘#Windows10’ for the period from July 28th, 2015 to
August 5th, 2015
• After cleaning (missing values, duplicates, non-English
tweets), we ended up with 4646 observations with 28 original
factors and 19 derived features
• Performed feature engineering to arrive at these additional
features, as we felt they might better predict the target
factor, i.e., “Polarity”
• Types of Variables - Categorical, Continuous
Sentiment Analysis
● Tweet text cleaning - remove filler words, ignore words
which are not in English
● Used customized R code for text mining, which parsed
tweets and classified the words into +ve, -ve or neutral
polarities
● The code compared the words in each tweet with a
dictionary and mapped the resulting polarity to the tweet.
● Cross-checked that the code worked correctly by
creating about 100 tweets and manually checking their
polarity
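The dictionary lookup described above could be sketched roughly as follows. This is a Python illustration rather than the team's R code, which the slides do not show; the word lists, the function name `tweet_polarity` and the simple count-based scoring rule are all assumptions made for the example.

```python
import re

# Hypothetical mini-lexicons; the dictionary actually used is not given in the slides.
POSITIVE = {"love", "great", "fast", "smooth"}
NEGATIVE = {"hate", "slow", "crash", "broken"}

def tweet_polarity(text):
    """Label a tweet by counting positive and negative lexicon hits."""
    # Lowercase and keep alphabetic tokens only (digits and '#' are dropped).
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(tweet_polarity("I love the new #Windows10, so fast!"))
```

Running the same function over a small set of hand-labelled tweets is one way to reproduce the manual cross-check mentioned in the last bullet.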
Exploring the Data
● Created histograms and box plots to identify any unusual
behavior between the variables. Found some interesting
patterns
Continued...
● Tested the variables with Pearson’s correlation; found
significant correlation between factors like Tweets and
Followed. Made sure that we did not include both of these
variables together in logistic regression.
● Momentum of tweets shifted from +ve/neutral to -ve at the
end of the sample period; almost 80% of -ve tweets on 08/05
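As a rough illustration of the correlation screening, the sketch below computes Pearson's r for every pair of columns and flags pairs above a cutoff so that only one of them goes into the regression. The 0.8 cutoff, the helper names and the sample numbers are assumptions; the slides do not state the threshold or values used.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlated_pairs(columns, threshold=0.8):
    """Return pairs of column names whose |r| meets the (assumed) threshold."""
    names = sorted(columns)
    return [(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if abs(pearson(columns[a], columns[b])) >= threshold]

# Made-up values purely for illustration.
columns = {
    "tweets": [10, 20, 30, 40],
    "followed": [12, 19, 33, 41],
    "length": [5, 1, 4, 2],
}
print(correlated_pairs(columns))
```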
Feature Engineering
● Tweet timestamp was broken into Tweet date and Tweet
time
● Current Date
● Days difference = Tweet date minus upgrade date
● Number of weeks since joined Twitter
● Number of months since joined Twitter
● Log of number of months since joined Twitter
● Log of number of followers
● Log of number of people followed by the user
● Log of number of favorites
● Log of number of tweets
● Length of Tweet
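A minimal sketch of how the derived features listed above might be computed for one tweet. The function name, the argument names, the use of `log1p` to avoid taking the log of zero, and the 30.44-day month are assumptions for illustration; only the feature definitions themselves come from the slide.

```python
import math
from datetime import date, datetime

UPGRADE_DATE = date(2015, 7, 29)  # Windows 10 release date stated in the Objective slide

def derive_features(tweet_ts, joined, today, followers, followed, favorites, tweets, text):
    """Build the derived features for a single tweet (hypothetical helper)."""
    days_on_twitter = (today - joined).days
    months = days_on_twitter / 30.44  # assumption: average month length in days
    return {
        "tweet_date": tweet_ts.date(),
        "tweet_time": tweet_ts.time(),
        "days_difference": (tweet_ts.date() - UPGRADE_DATE).days,
        "weeks_since_joined": days_on_twitter // 7,
        "months_since_joined": months,
        # log1p(x) = log(1 + x); an assumption so that zero counts stay defined
        "log_months_since_joined": math.log1p(months),
        "log_followers": math.log1p(followers),
        "log_followed": math.log1p(followed),
        "log_favorites": math.log1p(favorites),
        "log_tweets": math.log1p(tweets),
        "tweet_length": len(text),
    }

features = derive_features(
    tweet_ts=datetime(2015, 8, 5, 14, 30),
    joined=date(2013, 8, 5),
    today=date(2015, 12, 3),
    followers=99, followed=0, favorites=5, tweets=1000,
    text="Loving #Windows10",
)
print(features["days_difference"], features["weeks_since_joined"])
```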
Multinomial Logistic Regression
and Interpretation
● Multinomial over Binomial - Target variable has more
than two values.
● Check which factors affect tweet polarity.
● Interpret using log odds to see the variation.
● Variables of importance: Relationship, No. of followers,
Tweet length, No. of weeks since joined Twitter
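To make the log-odds interpretation concrete, here is a small sketch of how a fitted multinomial model turns coefficients into class probabilities, with one class acting as the baseline. The coefficients, the choice of "neutral" as baseline and the function names are invented for illustration and are not the fitted values from this project.

```python
import math

def class_probabilities(features, coefs, baseline="neutral"):
    """coefs maps each non-baseline class to (intercept, {feature: weight})."""
    scores = {baseline: 0.0}  # the baseline class has a linear score of zero
    for label, (intercept, weights) in coefs.items():
        scores[label] = intercept + sum(w * features[k] for k, w in weights.items())
    total = sum(math.exp(s) for s in scores.values())
    return {label: math.exp(s) / total for label, s in scores.items()}

def log_odds(probs, label, baseline="neutral"):
    """Log odds of `label` against the baseline class."""
    return math.log(probs[label] / probs[baseline])

# Hypothetical coefficients, not results from the slides.
coefs = {
    "positive": (0.2, {"tweet_length": 0.01}),
    "negative": (-0.5, {"tweet_length": 0.02}),
}
probs = class_probabilities({"tweet_length": 100}, coefs)
print(round(log_odds(probs, "negative"), 3))
```

Under this parameterisation, each weight is the change in log odds of that class versus the baseline for a one-unit increase in the feature, which is what makes the coefficients interpretable.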
Results
Decision Trees Classification
and Interpretation
● Decision trees are an alternative to logistic regression
● CART (Classification and Regression Trees) method is
used to recursively classify the target variable
● Variables of importance: Tweet date, Days difference and
length of the tweet
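The recursive partitioning in CART rests on one repeated step: pick the split that minimises class impurity. A minimal sketch of that single step on one numeric feature, using Gini impurity, is shown below; the data values and function names are made up for the example and do not come from the project's dataset.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best rule `value <= threshold`."""
    best = (None, gini(labels))  # no split yet: impurity of the whole node
    for t in sorted(set(values))[:-1]:  # skip the max so the right side is never empty
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Illustrative data: days since the upgrade date and a polarity label.
days = [-1, 0, 1, 5, 6, 7]
polarity = ["positive"] * 3 + ["negative"] * 3
print(best_split(days, polarity))
```

A full tree applies the same search to each resulting subset until a stopping rule is met.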
Results
Random Forest Classification
and Interpretation
● Random Forest is an ensemble of decision trees, which
helps predict polarity more accurately
● Built a forest of 501 decision trees to identify important
predictors of polarity
● Variables of importance: Tweet date, Days difference and
length of tweet
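The ensemble idea can be sketched as bootstrap sampling plus a majority vote. The example below is a simplification: it bags one-split trees on a single feature and omits the per-split random feature selection that a real random forest adds. Only the tree count of 501 comes from the slide; the data, seed and function names are assumptions.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(sample):
    """Fit a one-split tree on (value, label) pairs; returns a predict function."""
    best = None
    for t in sorted({v for v, _ in sample})[:-1]:
        left = [l for v, l in sample if v <= t]
        right = [l for v, l in sample if v > t]
        # Misclassified points if each side predicts its majority label.
        errors = (len(left) - Counter(left).most_common(1)[0][1]
                  + len(right) - Counter(right).most_common(1)[0][1])
        if best is None or errors < best[0]:
            best = (errors, t, majority(left), majority(right))
    if best is None:  # the bootstrap sample held a single distinct value
        label = majority([l for _, l in sample])
        return lambda v: label
    _, t, left_label, right_label = best
    return lambda v: left_label if v <= t else right_label

def fit_forest(data, n_trees=501, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]

def predict(forest, value):
    """Majority vote over all trees."""
    return majority([tree(value) for tree in forest])

# Illustrative data: (days since upgrade, polarity).
data = [(-1, "positive"), (0, "positive"), (1, "positive"),
        (5, "negative"), (6, "negative"), (7, "negative")]
forest = fit_forest(data)
print(len(forest), predict(forest, 7))
```

An odd tree count such as 501 avoids tied votes when there are two classes.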
Results
Limitations
● The dataset covered a short span of time, between 07/28/15 and
08/05/15; with a larger sample, results may differ
● We limited the scope of this project to English-language
tweets only.
● We were not able to take advantage of the geo-spatial coordinates
as most of the records had n/a values.
Recommendations
● As negative sentiment starts to prevail after release, in
the latter half of the week, Microsoft should continue its
positive branding even after release.
● As tweets from seasoned Twitter users are more likely to
be negative, Microsoft should target those influential
accounts to spread positive word.
Thank You !
