SENTIMENT ANALYSIS OF TWEETS
 Predicting a Movie's Box Office Success

                Vasu Jain
                 Shu Cai

                12/05/2012
             Under Guidance of :
                Dr. Yan Liu
AGENDA
1. Introduction
2. Related Work
3. Methodology
4. Experiments
5. Conclusion
6. Q and A




                  Image source: SNLP Slides for Sentiment Analysis
INTRODUCTION
About Twitter

•   Social networking and microblogging service
•   Enables users to send and read messages
•   Messages of length up to 140 characters, known as "tweets".

Tweets contain rich information about people’s preferences.

People share their thoughts about movies using Twitter.

We analyze Twitter data to predict the success of a movie.
INTRODUCTION
People’s opinions of a movie have a huge impact on its
success.

Our project includes prediction using Twitter data, and analysis of
the prediction results.

A high volume of positive tweets may indicate the success of a movie.
But how can this be quantified?




      Image source: http://www.demainlaveille.fr/2012/05/06/pourquoi-twitter-ne-peut-pas-predire-les-elections-presidentielles/
RELATED WORK
Using social media to predict the future has become very popular in recent
years.

• Predicting the Future with Social Media (Sitaram Asur & Bernardo A.
  Huberman, 2010) tries to show that Twitter-based prediction of box
  office revenue performs better than market-based prediction.

• Predicting IMDB movie ratings using social media (Andrei Oghina,
  Mathias Breuss, Manos Tsagkias & Maarten de Rijke, 2012) uses Twitter
  and YouTube data to predict IMDb scores.

Our project includes prediction using Twitter data and investigation of two
new topics based on the prediction results.
RELATED WORK
• Predicting the results of the presidential election (USC Annenberg
  Innovation Lab & USC SAIL).

• Sentiment140 for discovering Twitter sentiment (sentiment140.com).
  It provides no movie prediction.
OUR WORK

• Data Collection: an existing Twitter data set and recent tweets via
  the Twitter API

• Data Pre-processing: get the "clean" data and transform it to the
  format we need

• Sentiment Analysis: train a classifier to classify the tweets as:
  positive, negative, neutral and irrelevant

• Prediction: use the statistics of the tweets' labels to predict the
  movie success (hit/flop/average)
METHODOLOGIES: Data Collection & Crawling
2009 data set: subset of the Stanford dataset (now unavailable)
   • 477 million tweets, covering June – Dec 2009
   • Filtered tweets during the critical period for each movie
   • 68.7 GB dataset (compressed format)
   • 30 movies, 6 million relevant tweets

2012 data set: live crawling using a script
   • Streaming API, via a Python library for Twitter,
      to collect data
   • Data retrieval using keywords for movies
   • Data collection focused on the critical period
   • 8 movies, 2.5 million tweets



                         Image source: http://drupal.org/project/twitterminer
METHODOLOGIES: Data Collection & Crawling
[Figure: number of tweets per week (y-axis "Tweets Number", 0 to 160,000) from week -6 to week 24.]

Critical period for the movie "Harry Potter and the Half-Blood Prince":
relationship between sent time and number of tweets for the movie.

                                                                                                   Image source: http://drupal.org/project/twitterminer
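
The weekly counts in a chart like this could be produced roughly as follows. This is a minimal sketch, not the authors' script: the function names, the week boundaries of the critical period (`first_week`, `last_week`) and the release date in the usage note are assumptions made for illustration, since the slides do not define them.

```python
from collections import Counter
from datetime import date


def week_offset(sent: date, release: date) -> int:
    """Whole weeks between release and tweet date (negative before release)."""
    return (sent - release).days // 7


def weekly_counts(sent_dates, release):
    """Count tweets per week relative to the release date."""
    return Counter(week_offset(d, release) for d in sent_dates)


def in_critical_period(sent, release, first_week=-2, last_week=4):
    """Assumed window: the slides do not state the actual week boundaries."""
    return first_week <= week_offset(sent, release) <= last_week
```

For example, with an illustrative release date of `date(2009, 7, 15)`, tweets sent on 14, 15, 21 and 22 July fall into weeks -1, 0, 0 and 1 respectively.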
METHODOLOGIES: Data Preprocessing
  Why data preprocessing?
  • Many noisy, spam, and irrelevant tweets in our dataset
  • Convert the data to the input format for our sentiment
     analysis tools.

  Techniques for preprocessing:
  • Removing URLs and user handles
  • Language detection to discard tweets not in English
  • Split the dataset into small chunks of ~25,000 tweets each
  • Process chunks in a distributed manner
  • Filter for tweets related to target movies using regular
     expressions.
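
Some of the preprocessing steps above can be sketched with the Python standard library. This is an illustrative sketch under assumptions, not the project's code: the regular expressions and the names `clean_tweet`, `movie_pattern` and `chunked` are hypothetical, and language detection and distributed processing are left out because the slides do not say which tools were used.

```python
import re
from itertools import islice

# Assumed patterns for URLs and @handles; the real filters may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HANDLE_RE = re.compile(r"@\w+")


def clean_tweet(text):
    """Remove URLs and user handles, then collapse leftover whitespace."""
    text = URL_RE.sub(" ", text)
    text = HANDLE_RE.sub(" ", text)
    return " ".join(text.split())


def movie_pattern(keywords):
    """Build a case-insensitive regex matching any of the movie keywords."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def chunked(items, size=25000):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```

A tweet would be kept for a movie when `movie_pattern([...]).search(clean_tweet(text))` finds a match; each chunk from `chunked(...)` could then be handed to a separate worker.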



                 Image source: http://mashable.com/2012/03/18/tweets-more-trustworthy-study/
METHODOLOGIES: Sentiment Analysis
  Algorithm:
  • Label tweets using the LingPipe sentiment analyzer, a natural
     language processing toolkit.
  • Sentence (tweet) based analysis with a logistic regression classifier
     (accuracy up to 80%).
  • Training & evaluation using the 2009 dataset, testing on the 2012 dataset.
  • The trained classifier labels each tweet as positive, negative, neutral or
     irrelevant.
  • Calculate the PT-NT Ratio for every movie. The PT-NT Ratio is a function
     of the positive tweet ratio, negative tweet ratio, total
     tweets, neutral tweets and irrelevant tweets.
  • Thresholds determine regions for the PT-NT Ratio. Each region
     corresponds to a Hit, Flop or Average result for a movie.
  • Movie success is correlated with the PT-NT Ratio.
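
The last steps can be illustrated with a small sketch. The slides do not give the exact PT-NT formula or the threshold values, so the code below assumes the simplest possible form (positive count divided by negative count) and uses placeholder thresholds (`flop_below`, `hit_from`); both are assumptions for illustration only.

```python
from collections import Counter


def pt_nt_ratio(labels):
    """Assumed form of the PT-NT Ratio: positive tweets / negative tweets."""
    counts = Counter(labels)
    positive, negative = counts["positive"], counts["negative"]
    if negative == 0:
        return float("inf") if positive else 0.0
    return positive / negative


def classify_movie(ratio, flop_below=1.5, hit_from=5.0):
    """Map a ratio to a region; the threshold values are placeholders."""
    if ratio < flop_below:
        return "Flop"
    if ratio < hit_from:
        return "Average"
    return "Hit"
```

With these placeholder thresholds, a movie with 6 positive and 2 negative tweets has a ratio of 3.0 and falls in the Average region; neutral and irrelevant tweets do not affect this simplified ratio.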
Experiments: Analysis of 30 Movies (Released in 2009)
Experiments: Movies vs. P/N Ratio, Profit Ratio
Experiments: Movies (Released in 2009) vs. PT-NT Ratio
Experiments: Analysis of 8 Movies (Released in 2012)
Experiments: Movies (Released in 2012) vs. PT-NT Ratio
Conclusion
Prediction for 2012 movies using our analysis:
   5 movies: Hit
   1 movie: Super hit
   1 movie: Average business
   Could not determine the success rate for one movie because its data was unavailable.

Comparing our prediction results with box office results to date:
   Prediction exactly right in four cases
   On the border line between hit and average in one case
   For the remaining movies we lack data to check our prediction confidence.

A prediction earns half an accuracy point if the movie’s classification is near a border.
Score of 4.5 out of 5 for accuracy, which equals 90%.
A great achievement for our model, even though there were limitations in the
number of movies, hand-labeled tweets, etc.
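
The scoring rule above works out as follows. This is a small sketch of the arithmetic only; the outcome labels (`"exact"`, `"border"`, `"wrong"`) are names chosen here for illustration.

```python
def accuracy_score(outcomes):
    """Full point for an exact prediction, half a point for a borderline one."""
    points = {"exact": 1.0, "border": 0.5, "wrong": 0.0}
    return sum(points[o] for o in outcomes) / len(outcomes)


# Four exact predictions and one borderline case, as reported on the slide.
outcomes = ["exact"] * 4 + ["border"]
score = accuracy_score(outcomes)
print(f"{score:.0%}")  # prints 90%
```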
Future Work
Bottlenecks:
1. Twitter data crawled by a third party.
2. Limitations of the Twitter APIs for crawling data.
3. Noise included in the 200 randomly picked tweets.
4. Movies released in a limited number of theaters
    (not enough data)

With more data, the model can be more accurate and reliable.

Future work:
1. Using other models and algorithms.
2. Adding temporal analysis to the project.
3. Considering retweets as a factor.



                Image source: http://www.theispot.com/whatsnew/2012/2/brucie-rosch-twitter-data.htm
Thank you

  Q/A
Extra Slides
Experiments: Snapshot of LingPipe's labelling results

Sentiment analysis of tweets

  • 1. SENTIMENT ANALYSIS OF TWEETS Predicting a Movie's Box Office Success Vasu Jain Shu Cai 12/05/2012
  • 2. SENTIMENT ANALYSIS OF TWEETS Predicting a Movie's Box Office success Under Guidance of : Dr. Yan Liu
  • 3. AGENDA 1. Introduction 2. Related Work 3. Methodology 4. Experiments 5. Conclusion 6. Q and A Image source: SNLP Slides for Sentiment Analysis
  • 4. INTRODUCTION About Twitter • Social networking and microblogging service • Enables users to send and read messages • Messages of length up to 140 characters, known as "tweets". Tweets contain rich information about people’s preferences. People share their thoughts about movies using Twitter. Data analysis on twitter data to predict the success of a movie.
  • 5. INTRODUCTION People’s opinions towards a movie have huge impact on its success. Our project includes prediction using Twitter data, and analysis of the prediction results. High volume of positive tweets may indicate success of a movie. But how to quantify ? Image source: http://www.demainlaveille.fr/2012/05/06/pourquoi-twitter-ne-peut-pas-predire-les-elections-presidentielles/
  • 6.
  • 7. RELATED WORK Using social media to predict the future has become very popular in recent years. • Predicting the Future with Social Media (Sitaram Asur & Bernardo A. Huberman, 2010) tries to show that Twitter-based prediction of box office revenue performs better than market-based prediction. • Predicting IMDB movie ratings using social media (Andrei Oghina, Mathias Breuss, Manos Tsagkias & Maarten de Rijke, 2012) uses Twitter and YouTube data to predict IMDb scores. Our project includes prediction using Twitter data and investigation of two new topics based on the prediction results.
  • 8. RELATED WORK • Predicting the results of the presidential election (USC Annenberg Innovation Lab & USC SAIL). • Sentiment 140 to discover the Twitter sentiment (sentiment140.com). No movie prediction is provided.
  • 9. OUR WORK • Data Collection: existing twitter data set and recent tweets via Twitter API • Data Pre-processing: get the "clean" data and transform it to the format we need • Sentiment Analysis: train a classifier to classify the tweets as: positive, negative, neutral and irrelevant • Prediction: use the statistics of the tweets' labels to predict the movie success (hit/flop/average)
  • 10. METHODOLOGIES: Data Collection & Crawling 2009 data set: subset of the Stanford dataset (now unavailable) • 477 million tweets, period of June – Dec 2009 • Filtered tweets during the critical period for each movie • 68.7 GB dataset (compressed format) • 30 movies, 6 million relevant tweets 2012 data set: live crawling using a script • Streaming API of a Python library for Twitter to collect data • Data retrieval using keywords for movies • Data collection focused on the critical period • 8 movies, 2.5 million tweets Image source: http://drupal.org/project/twitterminer
  • 11. METHODOLOGIES: Data Collection & Crawling [Chart: number of tweets (0 to 160,000) per week, from week -6 to week 24.] Critical period for the movie "Harry Potter and the Half-Blood Prince": shows the relationship between the time tweets were sent and the number of tweets for the movie. Image source: http://drupal.org/project/twitterminer
  • 12. METHODOLOGIES: Data Preprocessing Why data preprocessing? • Many noisy, spam, and irrelevant tweets in our dataset • Convert the data to the input format for our sentiment analysis tools. Techniques for preprocessing: • Removing URLs and user handles • Language detection to discard tweets not in English • Split the dataset into small chunks of ~25,000 tweets each • Process the chunks in a distributed fashion • Filter for tweets related to the target movies using regular expressions. Image source: http://mashable.com/2012/03/18/tweets-more-trustworthy-study/
  • 13. METHODOLOGIES: Sentiment Analysis Algorithm: • Labelling tweets using the LingPipe sentiment analyzer, a natural language processing toolkit. • Sentence (tweet) based analysis with a logistic regression classifier (accuracy up to 80%). • Training and evaluation using the 2009 dataset, testing on the 2012 dataset. • The trained classifier labels each tweet as positive, negative, neutral or irrelevant. • Calculate the PT-NT Ratio for every movie. The PT-NT Ratio is a function of the positive tweet ratio, negative tweet ratio, total tweets, neutral tweets and irrelevant tweets. • Thresholds determine regions for the PT-NT Ratio. Each region corresponds to a Hit, Flop or Average result for a movie. • Movie success is correlated with the PT-NT Ratio.
  • 14. Experiments: Analysis of 30 Movies (Released in 2009)
  • 15. Experiments: Movies vs. P/N Ratio, Profit Ratio
  • 16. Experiments: Movies (Released in 2009) vs. PT-NT Ratio
  • 17. Experiments: Analysis of 8 Movies (Released in 2012)
  • 18. Experiments: Movies (Released in 2012) vs. PT-NT Ratio
  • 19. Conclusion Prediction for 2012 movies using our analysis: 5 movies: Hit 1 movie: Super hit 1 movie: Average business Could not determine the success rate for one movie because its data was unavailable. Comparing our prediction results with box office results to date: Prediction exactly right in four cases. On the borderline between hit and average in one case. For the remaining movies we lack data to check our prediction confidence. Half an accuracy point is awarded if a movie's classification is near the border. Score of 4.5 out of 5 for accuracy, which equals 90%. A great achievement for our model, even though there were limitations in the number of movies, hand-labeled tweets, etc.
  • 20. Future Work Bottlenecks: 1. Twitter data crawled by a third party. 2. Limitations of the Twitter APIs for crawling data. 3. Noise included in the 200 randomly picked tweets. 4. Movies released in a limited number of theaters (not enough data). With more data, the model can be more accurate and reliable. Future work: 1. Using other models and algorithms. 2. Adding temporal analysis to the project. 3. Considering retweets as a factor. Image source: http://www.theispot.com/whatsnew/2012/2/brucie-rosch-twitter-data.htm
  • 21. Thank you Q/A
  • 23.
  • 24. Experiments: Snapshot of LingPipe's labelling results
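
The preprocessing steps from slide 12, the PT-NT Ratio thresholding from slide 13 and the accuracy scoring from slide 19 can be illustrated with a minimal Python sketch. It is not the authors' code. The slides do not give the PT-NT Ratio formula or the threshold values, so the ratio used here (positive count divided by negative count) and the thresholds `hit_threshold` and `flop_threshold` are assumptions, and all function names are hypothetical. The labels are assumed to come from an already trained classifier.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+|www\.\S+")
HANDLE_RE = re.compile(r"@\w+")


def clean_tweet(text):
    """Strip URLs and user handles, then collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = HANDLE_RE.sub(" ", text)
    return " ".join(text.split())


def filter_for_movie(tweets, pattern):
    """Clean each tweet and keep those matching the movie's regular expression."""
    movie_re = re.compile(pattern, re.IGNORECASE)
    cleaned = (clean_tweet(t) for t in tweets)
    return [t for t in cleaned if movie_re.search(t)]


def chunked(items, size=25000):
    """Yield successive chunks (the slides mention ~25,000 tweets per chunk)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def pt_nt_ratio(labels):
    """ASSUMPTION: positive count / negative count.

    The slides only say the ratio is a function of the label statistics,
    so this is the simplest stand-in, not the authors' formula.
    """
    counts = Counter(labels)
    if counts["negative"] == 0:
        return float("inf") if counts["positive"] else 0.0
    return counts["positive"] / counts["negative"]


def classify_movie(ratio, hit_threshold=5.0, flop_threshold=1.5):
    """Map a ratio to hit/flop/average. Threshold values are hypothetical."""
    if ratio >= hit_threshold:
        return "hit"
    if ratio < flop_threshold:
        return "flop"
    return "average"


def accuracy_score(exact, borderline, checked):
    """Full credit for exact predictions, half credit for borderline ones."""
    return (exact + 0.5 * borderline) / checked
```

With the figures reported on slide 19 (four exact predictions and one borderline case out of five checked movies), `accuracy_score(4, 1, 5)` gives 0.9, matching the 90% stated there.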