Predicting the Future With Social Media

Sitaram Asur, Bernardo A. Huberman
Social Computing Lab

The Social Computing Lab focuses on methods for harvesting the collective
intelligence of groups of people in order to realize greater value from the
interaction between users and information.


     Published on arXiv Cornell University – March 2010
                         http://arxiv.org/abs/1003.5699




      Maurizio Napolitano, SoNet group, http://sonet.fbk.eu - April 2010
SoNet Research Meetings
These slides were used for an internal presentation of
  the SoNet group.
Every week, one member of the SoNet group presents a
  research paper to the other members. The
  mentioned paper(s) are hence written by other
  researchers.
Being internal presentations, these slides might be a bit
  rough and unpolished.

You can find more information (including this
  presentation) about the SoNet group at
  http://sonet.fbk.eu
The question
 How can social media content be used to predict
  real-world outcomes?
The case study:
predicting box-office revenues for movies using
the chatter from Twitter
Why Twitter?
   several tens of millions of users who actively participate in the
   creation and propagation of content

Why movies?
  The topic of movies is of considerable interest among the social media
  user community
  The real-world outcomes can be easily observed from box-office
  revenue for movies
Topics
Viral marketing
• How buzz and attention are created for different movies
• How buzz and attention change over time

  Will movies that are well talked about be
               well watched?
Sentiments
• How sentiments are created
• How positive and negative opinions propagate
• How they influence people
What was discovered
• Social media feeds can be effective indicators of
   real-world performance


• The rate at which movie tweets are generated can be
   used to build a powerful model for predicting movie
   box-office revenue.


• The predictions are better than those produced by the
   Hollywood Stock Exchange, the gold standard in the
   industry
The dataset
  Twitter Search API, queried by using the movies' keywords
  • tweets
  • @userid
  • retweet

  2.89 million tweets
  referring to 24 different movies
  over a period of 3 months (Nov-Feb)
  from 1.2 million users

Movie                                 Release date
Armored                               2009-12-04
Daybreakers                           2010-01-08
Extraordinary Measures                2010-02-22
Leap Year                             2010-01-08
Princess And The Frog                 2009-11-13
Tooth Fairy                           2010-02-26
Avatar                                2009-12-18
Dear John                             2010-02-05
From Paris With Love                  2010-02-05
Legion                                2010-01-22
Sherlock Holmes                       2009-12-15
Transylmania                          2009-12-04
The Blind Side                        2009-11-15
Did You Hear About The Morgans        2009-12-08
The Imaginarium of Dr Parnassus       2010-01-08
Twilight: New Moon                    2009-11-20
Spy Next Door                         2010-01-15
When in Rome                          2010-01-29
The Book of Eli                       2010-01-15
Edge of Darkness                      2010-01-29
Invictus                              2009-12-11
Pirate Radio                          2009-11-13
The Crazies                           2010-02-26
Youth in Revolt                       2010-01-08



critical period = the time from the week before a movie's release to two weeks after it
Dataset characteristics
         Number of tweets per day for different movies (figure)

y → tweets
x → days
lines → movies

The trends look LIKE the box-office trends!!!
Dataset characteristics
             Number of tweets per unique authors for different movies (figure)

y → tweets per authors
x → days
lines → movies

ratio remains fairly consistent between 1 and 1.5
Dataset characteristics
          Log distribution of authors and tweets over the critical period (figure)

y → log(frequency of authors)
x → log(number of tweets)

POWER LAW – Zipfian distribution
A few authors generating a large number of tweets
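
A power law shows up as a roughly straight line on a log-log plot, so a quick check is to fit a line to log(frequency of authors) against log(number of tweets). The sketch below is only a rough diagnostic under stated assumptions: `loglog_slope` is a hypothetical helper, the fit is plain least squares, and the counts are made up for illustration (they are not the paper's data).

```python
import math
from collections import Counter

def loglog_slope(tweets_per_author):
    """Least-squares slope of log(frequency of authors) vs log(number of tweets)."""
    # frequency[k] = number of authors who posted exactly k tweets
    frequency = Counter(tweets_per_author)
    xs = [math.log(k) for k in frequency]
    ys = [math.log(frequency[k]) for k in frequency]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx  # needs at least two distinct tweet counts

# Made-up data: 64 authors with 1 tweet, 16 with 2, 4 with 4, 1 with 8,
# i.e. frequency = 64 / k**2, an exact power law with exponent -2.
counts = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
slope = loglog_slope(counts)
print(round(slope, 6))
```

A clearly negative slope with points close to the line is what "a few authors generating a large number of tweets" looks like numerically.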
Dataset characteristics
           Distribution of total authors and the movies they comment on (figure)

y → authors
x → number of movies

POWER LAW
A majority of the authors talking about only a few movies
Attention and popularity
                            Twitter and real world



“Prior to the release of a movie, media companies and
producers generate promotional information in the form of
trailer videos, news, blogs and photos.
We expect the tweets for movies before the time of their
release to consist primarily of such promotional campaigns,
geared to promote word-of-mouth cascades”


In Twitter:

                     tweets and retweets
referring to a particular URL (photos, trailers and other promotional material)
Attention and popularity
  Percentages of urls in tweets for different movies




there is a greater percentage of tweets containing urls
in the week prior to release than afterwards
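
A minimal sketch of how such percentages could be computed from raw tweet text. Everything here is an assumption for illustration: the helper name, the simple `http(s)://` pattern, the "RT @" prefix convention for spotting retweets, and the sample tweets.

```python
import re

# Assumption: a URL is anything starting with http:// or https://
URL_PATTERN = re.compile(r"https?://\S+")

def url_and_retweet_percentages(texts):
    """Return (% of tweets containing a URL, % of tweets that are retweets)."""
    total = len(texts)  # an empty list would raise ZeroDivisionError
    with_url = sum(1 for t in texts if URL_PATTERN.search(t))
    # Assumption: retweets follow the "RT @user" prefix convention
    retweets = sum(1 for t in texts if t.startswith("RT @"))
    return 100.0 * with_url / total, 100.0 * retweets / total

# Made-up sample tweets
sample = [
    "RT @fan: Avatar trailer http://example.com/trailer",
    "Can't wait for Avatar!",
    "New stills from Avatar http://example.com/photos",
    "RT @critic: Avatar opens Friday",
]
url_pct, rt_pct = url_and_retweet_percentages(sample)
print(url_pct, rt_pct)
```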
Attention and popularity
               tweets with url VS retweets

    URLs and RETWEETs PERCENTAGES FOR CRITICAL WEEK

        Features Week 0      Week 1      Week 2
        url        39.5      25.5        22.5
        retweet    12.1      12.1        11.66


   CORRELATION and COEFFICIENT OF DETERMINATION (R2)
       values for URLs and RETWEETs before release

        Features      Correlation   R2
        url           0.64          0.39
        retweet       0.5           0.20


“This result is quite surprising since we would expect
promotional material to contribute significantly to a movie’s
box-office income”
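
For reference, a small sketch of how a correlation coefficient and the matching R2 of a one-predictor linear fit are obtained (with a single predictor, the unadjusted R2 equals the squared Pearson correlation). The numbers below are invented for illustration and `pearson` is a hypothetical helper, not the authors' code.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up values: share of tweets with URLs (%) and opening revenue (million $)
url_pct = [10.0, 20.0, 30.0, 40.0]
revenue = [12.0, 9.0, 30.0, 25.0]

r = pearson(url_pct, revenue)
r_squared = r ** 2  # unadjusted R2 of the one-predictor linear fit
print(round(r, 3), round(r_squared, 3))
```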
Prediction
                   first weekend Box-office revenues

    “Using the tweets referring to movies prior to their release,
    can we accurately predict the box-office revenue generated
    by the movie in its opening weekend?”

      How to define a quantifiable measure on the tweets?

TWEETRATE
  number of tweets referring to a particular movie per hour

  $$\mathrm{Tweetrate}(mov) = \frac{|\mathit{tweets}(mov)|}{|\mathit{Time}\ (\text{in hours})|}$$

“the correlation of the average tweetrate with the box-office gross
for the 24 movies considered showed a strong positive correlation,
with a correlation coefficient value of 0.90”
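
The tweet-rate definition can be sketched in a few lines. This is an illustration under assumptions: `tweet_rate` is a hypothetical helper, the window is half-open, and the timestamps are made up.

```python
from datetime import datetime

def tweet_rate(timestamps, start, end):
    """Tweets per hour for one movie within the window [start, end)."""
    hours = (end - start).total_seconds() / 3600.0
    if hours <= 0:
        raise ValueError("end must be after start")
    in_window = [t for t in timestamps if start <= t < end]
    return len(in_window) / hours

# Made-up timestamps of tweets mentioning one movie
tweets = [
    datetime(2010, 1, 1, 0, 15),
    datetime(2010, 1, 1, 0, 45),
    datetime(2010, 1, 1, 1, 30),
    datetime(2010, 1, 1, 3, 59),
    datetime(2010, 1, 2, 0, 0),  # outside the window, not counted
]
rate = tweet_rate(tweets, datetime(2010, 1, 1), datetime(2010, 1, 1, 4))
print(rate)  # 4 tweets in 4 hours
```

Computing this value once per day over the week before release gives a tweet-rate time series for a movie.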
Prediction
                         use regression analysis!

Predictions were compared with the real box-office revenue information extracted from
the Box Office Mojo website => POSITIVE RESULTS


     Regression analysis with:

     •Time series values of the tweet rate for the 7 days
     before the release

     •thent → number of theaters in which the movies were
     released

     •HSX Index → the index of the Hollywood Stock
     Exchange
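
A minimal sketch of a linear regression of this kind, using only the standard library. It is not the authors' implementation: the helpers `fit_linear` and `adjusted_r2` are hypothetical, the tweet-rate series is shortened to 2 days (instead of 7) to keep the example small, and the data are constructed from known coefficients so the fit can be checked.

```python
def fit_linear(rows, y):
    """Ordinary least squares; returns [intercept, coef_1, ..., coef_p]."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(X[0])
    # Normal equations (X^T X) beta = X^T y as an augmented matrix
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * t for x, t in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    # (collinear predictors would lead to a division by zero here)
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

def adjusted_r2(rows, y, beta):
    """Adjusted R2 of the fitted model (n observations, p predictors)."""
    n, p = len(y), len(beta) - 1
    pred = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((t - q) ** 2 for t, q in zip(y, pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Made-up predictors per movie: (tweet rate day 1, tweet rate day 2, thent)
rows = [(1, 2, 1), (2, 1, 3), (3, 4, 2), (4, 3, 5), (5, 6, 4), (6, 5, 7)]
# Revenue constructed from known coefficients, so the fit should recover them
y = [2.0 + 3.0 * a + 0.5 * b + 4.0 * c for a, b, c in rows]

beta = fit_linear(rows, y)
adj = adjusted_r2(rows, y, beta)
print([round(b, 6) for b in beta], round(adj, 6))
```

With real, noisy data the adjusted R2 would fall below 1; comparing it across predictor sets is how the alternatives on these slides are ranked.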
Prediction
                 linear regression: the results

Features                         Adjusted R2     p-value***
Avg Tweet-rate                   0.80            3.65e-09

Tweet-rate timeseries            0.93            5.279e-09

Tweet-rate timeseries + thent    0.973           9.14e-12

HSX timeseries + thent           0.963           1.030e-10
Prediction
Predicted vs Actual box office scores using tweet-rate and HSX predictors
Prediction
                          Predicting prices


Prediction of HSX end of opening weekend price
Predictor                        Adjusted R2   p-value***
HSX timeseries + thent           0.95          4.495e-10
Tweet-rate timeseries + thent    0.97          2.379e-11


Coefficient of determination (R2) values using tweet-rate
timeseries for different weekends
Weekend          Adjusted R2
Jan 15-17        0.92
Jan 22-24        0.97
Jan 29-31        0.92
Feb 05-07        0.95

“The Hollywood Stock Exchange de-lists movie stocks after 4
weeks of release, which means that there is no timeseries
available for movies after 4 weeks. In the case of tweets, people
continue to discuss movies long after they are released”
Sentiment Analysis
investigate the importance of sentiments in predicting future outcomes

    •For each tweet, assign the label Positive, Negative or Neutral
        • Clean data (remove stop-words, URLs and user IDs;
          replace title, question marks, exclamation marks)
        • Amazon Mechanical Turk (1000 workers)

    •Use LingPipe – DynamicLMClassifier
           • Obtained an accuracy of 98%

    1) Define two variables

    $$\mathrm{Subjectivity} = \frac{|\text{Positive and Negative Tweets}|}{|\text{Neutral Tweets}|}$$

    $$\mathrm{PNratio} = \frac{|\text{Tweets with Positive Sentiment}|}{|\text{Tweets with Negative Sentiment}|}$$
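
Once each tweet carries a label, the two variables are simple ratios of counts. A minimal sketch, assuming the labels are the strings "Positive", "Negative" and "Neutral"; the helper name, the made-up label list, and the choice to return infinity when a denominator is zero are illustrative assumptions.

```python
from collections import Counter

def sentiment_ratios(labels):
    """Return (subjectivity, pn_ratio) for a list of per-tweet labels."""
    counts = Counter(labels)
    pos, neg, neu = counts["Positive"], counts["Negative"], counts["Neutral"]
    # Assumption: a zero denominator is reported as infinity
    subjectivity = (pos + neg) / neu if neu else float("inf")
    pn_ratio = pos / neg if neg else float("inf")
    return subjectivity, pn_ratio

# Made-up labels for one movie: 6 positive, 2 negative, 4 neutral tweets
labels = ["Positive"] * 6 + ["Negative"] * 2 + ["Neutral"] * 4
subjectivity, pn_ratio = sentiment_ratios(labels)
print(subjectivity, pn_ratio)
```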
Sentiment Analysis




                                           X → movies
the subjectivity increases after release   Y → subjectivity
Sentiment Analysis




Polarity (positive vs negative) moves in the same direction as the movies' success
X → movies
Y → polarity
Sentiment Analysis
       regression analysis and polarity (PNRatio)


    Predictor                        Adjusted R2   p-value

    Avg Tweet-rate                   0.79          8.39e-09
    Avg Tweet-rate + thent           0.83          7.93e-09
    Avg Tweet-rate + PNRatio         0.92          4.31e-12
    Tweet-rate timeseries            0.84          4.18e-06
    Tweet-rate timeseries + thent    0.863         3.64e-06
    Tweet-rate timeseries + PNRatio  0.94          1.84e-08


the sentiments do provide improvements, although they are not as
           important as the rate of tweets themselves
GENERAL PREDICTION MODEL FOR
        SOCIAL MEDIA

   $$y = \beta_a \cdot A + \beta_p \cdot P + \beta_d \cdot D + \epsilon$$

    A : rate of attention seeking
    P : polarity of sentiments and reviews
    D : distribution parameter

    y : the revenue to be predicted
    ε : the error
    β : the values corresponding to the regression
    coefficients
Bibliography

    D. M. Pennock, S. Lawrence, C. L. Giles, and F. A. Nielsen.
    The real power of artificial markets. Science, 291(5506):987–988,
    Jan 2001.

    W. Zhang and S. Skiena. Improving movie gross prediction
    through news analysis. In Web Intelligence, pages 301–304,
    2009.
These slides are released under
Creative Commons
Attribution-ShareAlike 2.5
You are free:
● to copy, distribute, display, and perform the work
● to make derivative works
● to make commercial use of the work

Under the following conditions:
Attribution. You must attribute the work in the manner specified by the author or licensor.
Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work
only under a license identical to this one.
● For any reuse or distribution, you must make clear to others the license terms of this work.
● Any of these conditions can be waived if you get permission from the copyright holder.


Your fair use and other rights are in no way affected by the above.
More info at http://creativecommons.org/licenses/by-sa/2.5/

Follow the white Rabbit: opportunità e trabocchetti nella nostra vita digitaleFollow the white Rabbit: opportunità e trabocchetti nella nostra vita digitale
Follow the white Rabbit: opportunità e trabocchetti nella nostra vita digitale
 
Strumenti e suggerimenti per creare grafici
Strumenti e suggerimenti per creare graficiStrumenti e suggerimenti per creare grafici
Strumenti e suggerimenti per creare grafici
 
Data Journalism e Fake News
Data Journalism e Fake NewsData Journalism e Fake News
Data Journalism e Fake News
 

Recently uploaded

Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeCzechDreamin
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...FIDO Alliance
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfFIDO Alliance
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfFIDO Alliance
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераMark Opanasiuk
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessUXDXConf
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...CzechDreamin
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Julian Hyde
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftshyamraj55
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?Mark Billinghurst
 
THE BEST IPTV in GERMANY for 2024: IPTVreel
THE BEST IPTV in  GERMANY for 2024: IPTVreelTHE BEST IPTV in  GERMANY for 2024: IPTVreel
THE BEST IPTV in GERMANY for 2024: IPTVreelreely ones
 
A Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System StrategyA Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System StrategyUXDXConf
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Patrick Viafore
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCzechDreamin
 
PLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsPLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsStefano
 
ECS 2024 Teams Premium - Pretty Secure
ECS 2024   Teams Premium - Pretty SecureECS 2024   Teams Premium - Pretty Secure
ECS 2024 Teams Premium - Pretty SecureFemke de Vroome
 
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfIntroduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfFIDO Alliance
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceSamy Fodil
 
Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGDSC PJATK
 

Recently uploaded (20)

Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджера
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdf
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoft
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?
 
THE BEST IPTV in GERMANY for 2024: IPTVreel
THE BEST IPTV in  GERMANY for 2024: IPTVreelTHE BEST IPTV in  GERMANY for 2024: IPTVreel
THE BEST IPTV in GERMANY for 2024: IPTVreel
 
A Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System StrategyA Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System Strategy
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
 
PLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsPLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. Startups
 
ECS 2024 Teams Premium - Pretty Secure
ECS 2024   Teams Premium - Pretty SecureECS 2024   Teams Premium - Pretty Secure
ECS 2024 Teams Premium - Pretty Secure
 
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfIntroduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM Performance
 
Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 Warsaw
 

Predicting The Future With Social Media

  • 1. Predicting the Future With Social Media. Sitaram Asur and Bernardo A. Huberman, Social Computing Lab. The Social Computing Lab focuses on methods for harvesting the collective intelligence of groups of people in order to realize greater value from the interaction between users and information. Published on arXiv (Cornell University), March 2010: http://arxiv.org/abs/1003.5699. Maurizio Napolitano, SoNet group, http://sonet.fbk.eu, April 2010
  • 2. SoNet Research Meetings. These slides were used for an internal presentation of the SoNet group. Every week, one member of the SoNet group presents a research paper to the other members. The papers mentioned are therefore written by other researchers. Being internal presentations, these slides might be a bit rough and unpolished. You can find more information about the SoNet group (including this presentation) at http://sonet.fbk.eu
  • 3. The question: how can social media content be used to predict real-world outcomes?
      The case study: predicting box-office revenues for movies using the chatter from Twitter.
      Why Twitter? Several tens of millions of users actively participate in the creation and propagation of content.
      Why movies? The topic of movies is of considerable interest among the social media user community, and the real-world outcomes can be easily observed from box-office revenue for movies.
  • 4. Topics
      Viral marketing
      • How buzz and attention are created for different movies
      • How buzz and attention change over time
      Will movies that are well talked about be well watched?
      Sentiments
      • How they are created
      • How positive and negative opinions propagate
      • How they influence people
  • 5. What was discovered
      • Social media feeds can be effective indicators of real-world performance
      • The rate at which movie tweets are generated can be used to build a powerful model for predicting movie box-office revenue
      • The predictions are better than those produced by the Hollywood Stock Exchange, the gold standard in the industry
  • 6. The dataset
      Source: Twitter Search API, queried by using the movies' keywords
      2.89 million tweets from 1.2 million users, over a period of 3 months (Nov-Feb)
      • tweets referring to 24 different movies
      • @userid
      • retweet
      Movie                                   Release date
      Armored                                 2009-12-04
      Daybreakers                             2010-01-08
      Extraordinary Measures                  2010-02-22
      Leap Year                               2010-01-08
      Princess And The Frog                   2009-11-13
      Tooth Fairy                             2010-02-26
      Avatar                                  2009-12-18
      Dear John                               2010-02-05
      From Paris With Love                    2010-02-05
      Legion                                  2010-01-22
      Sherlock Holmes                         2009-12-15
      Transylmania                            2009-12-04
      The Blind Side                          2009-11-15
      Did You Hear About The Morgans          2009-12-08
      The Imaginarium of Dr Parnassus         2010-01-08
      Twilight: New Moon                      2009-11-20
      Spy Next Door                           2010-01-15
      When in Rome                            2010-01-29
      The Book of Eli                         2010-01-15
      Edge of Darkness                        2010-01-29
      Invictus                                2009-12-11
      Pirate Radio                            2009-11-13
      The Crazies                             2010-02-26
      Youth in Revolt                         2010-01-08
      critical period = the period starting the week before a movie's release
  • 7. Dataset characteristics. Figure: number of tweets per unique authors for different movies (x: days; y: tweets; one line per movie). The curves look like the box-office trends.
  • 8. Dataset characteristics. Figure: number of tweets per unique authors for different movies (x: days; y: tweets per author; one line per movie). The ratio remains fairly consistent, between 1 and 1.5.
  • 9. Dataset characteristics. Figure: log distribution of authors and tweets over the critical period (x: log(number of tweets); y: log(frequency of authors)). Power law (Zipfian distribution): a few authors generate a large number of tweets.
  • 10. Dataset characteristics. Figure: distribution of total authors and the movies they comment on (x: number of movies; y: authors). Power law: a majority of the authors talk about only a few movies.
  • 11. Attention and popularity: Twitter and the real world
      “Prior to the release of a movie, media companies and producers generate promotional information in the form of trailer videos, news, blogs and photos. We expect the tweets for movies before the time of their release to consist primarily of such promotional campaigns, geared to promote word-of-mouth cascades”
      In Twitter: tweets and retweets referring to a particular URL (photos, trailers and other promotional material)
  • 12. Attention and popularity. Figure: percentages of URLs in tweets for different movies. There is a greater percentage of tweets containing URLs in the week prior to release than afterwards.
  • 13. Attention and popularity: tweets with URLs vs. retweets
      URL and retweet percentages for the critical weeks
        Features   Week 0   Week 1   Week 2
        url        39.5     25.5     22.5
        retweet    12.1     12.1     11.66
      Correlation and coefficient of determination (R²) values for URLs and retweets before release
        Features   Correlation   R²
        url        0.64          0.39
        retweet    0.5           0.20
      “This result is quite surprising since we would expect promotional material to contribute significantly to a movie’s box-office income”
  • 14. Prediction: first-weekend box-office revenues
      “Using the tweets referring to movies prior to their release, can we accurately predict the box-office revenue generated by the movie in its opening weekend?”
      How can the tweets be turned into a quantifiable measure? Tweet-rate: the number of tweets referring to a particular movie per hour
      $\mathrm{Tweetrate}(mov) = \frac{|\mathrm{tweets}(mov)|}{|\mathrm{Time\ (in\ hours)}|}$
      “the correlation of the average tweetrate with the box-office gross for the 24 movies considered showed a strong positive correlation, with a correlation coefficient value of 0.90”
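The tweet-rate definition on slide 14 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `tweet_rate`, the half-open time window and the sample timestamps are hypothetical and do not come from the paper or the slides.

```python
from datetime import datetime, timedelta

def tweet_rate(timestamps, start, end):
    """Tweets per hour for one movie over the window [start, end)."""
    hours = (end - start).total_seconds() / 3600.0
    if hours <= 0:
        raise ValueError("end must be after start")
    # Count only the tweets that fall inside the observation window.
    count = sum(1 for t in timestamps if start <= t < end)
    return count / hours

# Hypothetical data: four tweets, three of them inside a 6-hour window.
start = datetime(2010, 1, 1, 0, 0)
end = start + timedelta(hours=6)
stamps = [start + timedelta(hours=h) for h in (0.5, 2, 5.5, 7)]
rate = tweet_rate(stamps, start, end)
```

Averaging this value over the days before release would give the "average tweet-rate" feature used in the later regression slides.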
  • 15. Prediction: use regression analysis
      The predictions were compared with the real box-office revenue information extracted from the Box Office Mojo website => positive results
      Regression analysis with:
      • Time series values of the tweet-rate for the 7 days before the release
      • thent: the number of theaters in which the movie was released
      • HSX index: the index of the Hollywood Stock Exchange
  • 16. Prediction: linear regression, the results
        Features                         Adjusted R²   p-value***
        Avg Tweet-rate                   0.80          3.65e-09
        Tweet-rate timeseries            0.93          5.279e-09
        Tweet-rate timeseries + thent    0.973         9.14e-12
        HSX timeseries + thent           0.963         1.030e-10
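As a rough illustration of how an adjusted R² like those on slide 16 is obtained, the sketch below fits an ordinary least-squares model with NumPy and compares two feature sets. Everything here is an assumption for illustration: the data are synthetic, the helper `adjusted_r2` is hypothetical, and a single average tweet-rate column stands in for the 7-day time series used in the paper.

```python
import numpy as np

def adjusted_r2(features, target):
    """Fit ordinary least squares with an intercept and return adjusted R^2."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(target, dtype=float)
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])  # prepend the intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    # Penalise the plain R^2 for the number of predictors p.
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Synthetic stand-in data for 24 movies (not the paper's measurements).
rng = np.random.default_rng(0)
n_movies = 24
tweet_rate = rng.uniform(10, 1000, n_movies)
thent = rng.uniform(500, 3500, n_movies)
revenue = 0.05 * tweet_rate + 0.004 * thent + rng.normal(0, 2, n_movies)

adj_rate_only = adjusted_r2(tweet_rate[:, None], revenue)
adj_rate_thent = adjusted_r2(np.column_stack([tweet_rate, thent]), revenue)
```

Because the synthetic revenue depends on both columns, the model that also sees `thent` should score higher, mirroring the direction of the improvement reported on the slide.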
  • 17. Prediction. Figure: predicted vs. actual box-office scores using tweet-rate and HSX predictors
  • 18. Prediction: predicting prices
      Prediction of the HSX end-of-opening-weekend price
        Predictor                        Adjusted R²   p-value***
        HSX timeseries + thent           0.95          4.495e-10
        Tweet-rate timeseries + thent    0.97          2.379e-11
      “The Hollywood Stock Exchange de-lists movie stocks after 4 weeks of release, which means that there is no timeseries available for movies after 4 weeks. In the case of tweets, people continue to discuss movies long after they are released”
      Coefficient of determination (R²) values using tweet-rate timeseries for different weekends
        Weekend      Adjusted R²
        Jan 15-17    0.92
        Jan 22-24    0.97
        Jan 29-31    0.92
        Feb 05-07    0.95
  • 19. Sentiment Analysis: investigate the importance of sentiments in predicting future outcomes
      • For each tweet, assign the label Positive, Negative or Neutral
      • Clean the data (no stop-words; remove URLs and userids; replace titles, question marks and exclamation marks)
      • Amazon Mechanical Turk (1000 workers)
      • Use LingPipe (DynamicLMClassifier)
      • Obtained an accuracy of 98%
      Define two variables:
      $\mathrm{Subjectivity} = \frac{|\mathrm{Positive\ and\ Negative\ Tweets}|}{|\mathrm{Neutral\ Tweets}|}$
      $\mathrm{PNratio} = \frac{|\mathrm{Tweets\ with\ Positive\ Sentiment}|}{|\mathrm{Tweets\ with\ Negative\ Sentiment}|}$
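The two ratios on slide 19 are simple counts over labelled tweets. A minimal sketch, assuming the labels are the plain strings "Positive", "Negative" and "Neutral"; the function name `sentiment_ratios`, the sample labels and the choice to return infinity for an empty denominator are hypothetical and not taken from the paper.

```python
from collections import Counter

def sentiment_ratios(labels):
    """Return (subjectivity, pn_ratio) for a list of tweet labels."""
    counts = Counter(labels)
    pos, neg, neu = counts["Positive"], counts["Negative"], counts["Neutral"]
    # Assumption: an empty denominator is reported as infinity.
    subjectivity = (pos + neg) / neu if neu else float("inf")
    pn_ratio = pos / neg if neg else float("inf")
    return subjectivity, pn_ratio

# Hypothetical labels for one movie: 6 positive, 2 negative, 4 neutral.
labels = ["Positive"] * 6 + ["Negative"] * 2 + ["Neutral"] * 4
subjectivity, pn_ratio = sentiment_ratios(labels)
```

With these sample labels the subjectivity is 8 / 4 and the PNratio is 6 / 2.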
  • 20. Sentiment Analysis. Figure: subjectivity per movie (x: movies; y: subjectivity). The subjectivity increases after release.
  • 21. Sentiment Analysis. Figure: polarity per movie (x: movies; y: polarity). The positive and negative sentiments go in the same direction as the movies' success.
  • 22. Sentiment Analysis: regression analysis and polarity (PNRatio)
        Predictor                          Adjusted R²   p-value
        Avg Tweet-rate                     0.79          8.39e-09
        Avg Tweet-rate + thent             0.83          7.93e-09
        Avg Tweet-rate + PNRatio           0.92          4.31e-12
        Tweet-rate timeseries              0.84          4.18e-06
        Tweet-rate timeseries + thent      0.863         3.64e-06
        Tweet-rate timeseries + PNRatio    0.94          1.84e-08
      The sentiments do provide improvements, although they are not as important as the rate of tweets themselves.
  • 23. General prediction model for social media
      $y = \beta_a \cdot A + \beta_p \cdot P + \beta_d \cdot D + \epsilon$
      A: rate of attention seeking
      P: polarity of sentiments and reviews
      D: distribution parameter
      y: the revenue to be predicted
      ε: the error
      The β values correspond to the regression coefficients.
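To make the role of the β coefficients on slide 23 concrete, the sketch below recovers them by least squares from noise-free synthetic data. The arrays, the "true" coefficient values and the use of `numpy.linalg.lstsq` are assumptions for illustration only; the slides do not say how the coefficients were estimated beyond calling the method linear regression.

```python
import numpy as np

# Hypothetical observations for five movies (not data from the paper).
A = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # rate of attention seeking
P = np.array([2.0, 1.0, 4.0, 3.0, 6.0])       # polarity of sentiments
D = np.array([10.0, 20.0, 15.0, 30.0, 25.0])  # distribution parameter

# Assumed "true" coefficients, used only to generate y with zero error.
true_beta = np.array([2.0, 3.0, 0.5])
design = np.column_stack([A, P, D])
y = design @ true_beta

# Least squares estimates beta_a, beta_p, beta_d from (A, P, D, y).
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
predicted = design @ beta_hat
```

With real data the error term is non-zero, so the recovered coefficients would only approximate the underlying ones.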
  • 24. Bibliography
      • D. M. Pennock, S. Lawrence, C. L. Giles, and F. A. Nielsen. The real power of artificial markets. Science, 291(5506):987-988, Jan 2001.
      • W. Zhang and S. Skiena. Improving movie gross prediction through news analysis. In Web Intelligence, pages 301-304, 2009.
  • 25. These slides are released under Creative Commons Attribution-ShareAlike 2.5
      You are free:
      ● to copy, distribute, display, and perform the work
      ● to make derivative works
      ● to make commercial use of the work
      Under the following conditions:
      Attribution. You must attribute the work in the manner specified by the author or licensor.
      Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under a license identical to this one.
      ● For any reuse or distribution, you must make clear to others the license terms of this work.
      ● Any of these conditions can be waived if you get permission from the copyright holder.
      Your fair use and other rights are in no way affected by the above.
      More info at http://creativecommons.org/licenses/by-sa/2.5/