SlideShare a Scribd company logo
1 of 16
Download to read offline
ANLY500 Group #6
Yuanjie Lei
FangyaTan
ShuqingYang
YanhongYe
Introduction
• We found that the volume
of social conversations of
movies is highly correlated
with box office.
• E.g.The more people
discussing a movie on
Twitter, the higher the
movie’s box office.
• Our goal of this study is
using social data to predict
box office.
Research Question
• Study the correlation betweenTwitter data and/or
other factors and movies’ opening weekend box
office in the US movie market.
• Can we use social network (Twitter) data to predict
movies’ opening weekend box office?
Data Sources
Movies Meta Data
• We collected movies meta data from
some public sources online, key
variables include movie, movie id,
release date, genre, box office of the
opening weekend, rating, theater
counts.There are 417 movies in the
dataset.
MoviesVolume Data
• We obtained the volume data from a
social network analytics firm,
‘Networked Insights’.The dataset has
the daily conversation volume of
movies onTwitter. Key variables
include movie id, volume date, total
volume, intent volume, and etc.The
dataset has 84064 records.
Data Pre-processing
• Cleaned movie meta data, removed unreleased movies (box office is NA).
• Aggregated each daily volume variables to total volume. (post volume, intent
volume, positive volume, negative volume, female volume, male volume, organic
volume)
• Joined two datasets by movie id for EDA and model building.
EDA 1. Is there any differences in
behavior between different genres?
• Super Hero had the highest box office
on opening weekend, following by
Young Adult and Sci-Fi.
• Drama has the lowest box office on
average.
EDA 1. Is there any differences in
behavior between different genres?
• Young Adult and Super
Hero have the highest
average theater counts.
• Drama has the lowest
average theater counts.
EDA 1. Is there any differences in
behavior between different genres?
• We found significant growth of
volume at week -4, likely due to
TV & digital media campaigns.
• Most genres double their volume
from week -2 to week -1.
• Super Hero andYoung Adult
movies see significantly higher
volumes at all weeks followed by
Sci-Fi, Action and Family
Animation.
EDA 2. Which variables are correlated
with box office?
• Total volume, intent volume, and
organic volume are highly correlated
with box office. However, we should
be aware of the multicollinearity
issue since all 7 different types of
volume are highly correlated with
each other as well.
• Theater counts is also positively
correlated to box office.
Model 1.Linear Regression Model
• DependentVariable: Box Office
• IndependentVariable: Sum of PostVolume
• A film’s social volume alone is a strong indicator of opening weekend
performance.
• This simple linear regression model using the sum of post volume as a predictor of
opening weekend performance yields an R2 =68.31
Model 1.Linear Regression Model
Model 2.Multiple Linear Regression
Model - Improving the Predictive Model
• DependentVariable: Box Office
• IndependentVariable: Sum of PostVolume,Theater Counts
• Predicted opening box revenue=-33.447+1.53e(-5)*total volume +0.0167*theater
number
• Not all films are alike. Adding factorTheatre Counts greatly improves the accuracy
of the model.
• R2 =
Model 2.Multiple Linear Regression
Model - Improving the Predictive Model
Example - Prediction ofTomorrowland
• Predicted opening box revenue=-33.447+1.53e(-5)*total volume +0.0167*theater
number
• Total volume ofTomorrowland=250730,Theater number ofTomorrowland=3972
• Predicted opening box revenue=32.889 Million dollars
• Actual opening box revenue=33 Million dollars
• Standard error of opening box revenue=0.012
Limitation & Uncertainty
• Twitter data may not be a good representation of the population.
• Studio who invests more on promoting a movie may have higher volume than who
invests less.
• We haven’t taken into account the sentiment of the post volume, a movie who has
high negative post volume might result in lower box office.
Conclusion
• Social data is highly predictive of box office performance
• SocialVolume alone explains 75% of the variance!
• Model prediction 10-weeks prior to release can predict opening weekend
within 10 million 70% of the time.
• Studios can leverage these insights and make decisions based on social
volume performance.

More Related Content

What's hot

Cara mengatasi email yahoo yang di hack!
Cara mengatasi email yahoo yang di hack!Cara mengatasi email yahoo yang di hack!
Cara mengatasi email yahoo yang di hack!Dompet Dhuafa
 
Functional Forms of Regression Models | Eonomics
Functional Forms of Regression Models | EonomicsFunctional Forms of Regression Models | Eonomics
Functional Forms of Regression Models | EonomicsTransweb Global Inc
 
Seminar hasil -_copy
Seminar hasil -_copySeminar hasil -_copy
Seminar hasil -_copyReni Ningsih
 
Introduction to Regression Analysis
Introduction to Regression AnalysisIntroduction to Regression Analysis
Introduction to Regression AnalysisSibashis Chakraborty
 
Elaborazione della risposta idrologica del torrente Astico | Modello geomorfo...
Elaborazione della risposta idrologica del torrente Astico | Modello geomorfo...Elaborazione della risposta idrologica del torrente Astico | Modello geomorfo...
Elaborazione della risposta idrologica del torrente Astico | Modello geomorfo...MatteoMeneghetti3
 
Kubota b7300 hsd tractor parts catalogue manual
Kubota b7300 hsd tractor parts catalogue manualKubota b7300 hsd tractor parts catalogue manual
Kubota b7300 hsd tractor parts catalogue manualjfdjskmdmme
 
07 ch ken black solution
07 ch ken black solution07 ch ken black solution
07 ch ken black solutionKrunal Shah
 
Basic econometrics lectues_1
Basic econometrics lectues_1Basic econometrics lectues_1
Basic econometrics lectues_1Nivedita Sharma
 
12 ch ken black solution
12 ch ken black solution12 ch ken black solution
12 ch ken black solutionKrunal Shah
 
Derivatives and Pakistani Market
Derivatives and Pakistani MarketDerivatives and Pakistani Market
Derivatives and Pakistani Markethadishaikh2
 
Potilasvakuutuskeskuksen johtaja Minna Plit-Turunen: Potilasvakuutus potilaid...
Potilasvakuutuskeskuksen johtaja Minna Plit-Turunen: Potilasvakuutus potilaid...Potilasvakuutuskeskuksen johtaja Minna Plit-Turunen: Potilasvakuutus potilaid...
Potilasvakuutuskeskuksen johtaja Minna Plit-Turunen: Potilasvakuutus potilaid...Vakuutuskeskus
 
11 ch ken black solution
11 ch ken black solution11 ch ken black solution
11 ch ken black solutionKrunal Shah
 
16 ch ken black solution
16 ch ken black solution16 ch ken black solution
16 ch ken black solutionKrunal Shah
 
Advanced Econometrics by Sajid Ali Khan Rawalakot: 0334-5439066
Advanced Econometrics by Sajid Ali Khan Rawalakot: 0334-5439066Advanced Econometrics by Sajid Ali Khan Rawalakot: 0334-5439066
Advanced Econometrics by Sajid Ali Khan Rawalakot: 0334-5439066Sajid Ali Khan
 
Conectar un calendario con outlook en SharePoint 2013
Conectar un calendario con outlook en SharePoint 2013Conectar un calendario con outlook en SharePoint 2013
Conectar un calendario con outlook en SharePoint 2013OTic365
 
15 ch ken black solution
15 ch ken black solution15 ch ken black solution
15 ch ken black solutionKrunal Shah
 

What's hot (20)

National Certified Counselor (NCC)
National Certified Counselor (NCC)National Certified Counselor (NCC)
National Certified Counselor (NCC)
 
Ed Tech Proposal
Ed Tech ProposalEd Tech Proposal
Ed Tech Proposal
 
Cara mengatasi email yahoo yang di hack!
Cara mengatasi email yahoo yang di hack!Cara mengatasi email yahoo yang di hack!
Cara mengatasi email yahoo yang di hack!
 
Functional Forms of Regression Models | Eonomics
Functional Forms of Regression Models | EonomicsFunctional Forms of Regression Models | Eonomics
Functional Forms of Regression Models | Eonomics
 
Seminar hasil -_copy
Seminar hasil -_copySeminar hasil -_copy
Seminar hasil -_copy
 
Introduction to Regression Analysis
Introduction to Regression AnalysisIntroduction to Regression Analysis
Introduction to Regression Analysis
 
Elaborazione della risposta idrologica del torrente Astico | Modello geomorfo...
Elaborazione della risposta idrologica del torrente Astico | Modello geomorfo...Elaborazione della risposta idrologica del torrente Astico | Modello geomorfo...
Elaborazione della risposta idrologica del torrente Astico | Modello geomorfo...
 
Kubota b7300 hsd tractor parts catalogue manual
Kubota b7300 hsd tractor parts catalogue manualKubota b7300 hsd tractor parts catalogue manual
Kubota b7300 hsd tractor parts catalogue manual
 
ARIMA
ARIMA ARIMA
ARIMA
 
07 ch ken black solution
07 ch ken black solution07 ch ken black solution
07 ch ken black solution
 
Basic econometrics lectues_1
Basic econometrics lectues_1Basic econometrics lectues_1
Basic econometrics lectues_1
 
12 ch ken black solution
12 ch ken black solution12 ch ken black solution
12 ch ken black solution
 
Derivatives and Pakistani Market
Derivatives and Pakistani MarketDerivatives and Pakistani Market
Derivatives and Pakistani Market
 
Potilasvakuutuskeskuksen johtaja Minna Plit-Turunen: Potilasvakuutus potilaid...
Potilasvakuutuskeskuksen johtaja Minna Plit-Turunen: Potilasvakuutus potilaid...Potilasvakuutuskeskuksen johtaja Minna Plit-Turunen: Potilasvakuutus potilaid...
Potilasvakuutuskeskuksen johtaja Minna Plit-Turunen: Potilasvakuutus potilaid...
 
Vaccine Logistics and Supply Chain | National Cold Chain Assessment India
Vaccine Logistics and Supply Chain | National Cold Chain Assessment IndiaVaccine Logistics and Supply Chain | National Cold Chain Assessment India
Vaccine Logistics and Supply Chain | National Cold Chain Assessment India
 
11 ch ken black solution
11 ch ken black solution11 ch ken black solution
11 ch ken black solution
 
16 ch ken black solution
16 ch ken black solution16 ch ken black solution
16 ch ken black solution
 
Advanced Econometrics by Sajid Ali Khan Rawalakot: 0334-5439066
Advanced Econometrics by Sajid Ali Khan Rawalakot: 0334-5439066Advanced Econometrics by Sajid Ali Khan Rawalakot: 0334-5439066
Advanced Econometrics by Sajid Ali Khan Rawalakot: 0334-5439066
 
Conectar un calendario con outlook en SharePoint 2013
Conectar un calendario con outlook en SharePoint 2013Conectar un calendario con outlook en SharePoint 2013
Conectar un calendario con outlook en SharePoint 2013
 
15 ch ken black solution
15 ch ken black solution15 ch ken black solution
15 ch ken black solution
 

Similar to Social Volume Predicts 68% of Box Office

Storytelling with data think broad, mine deep, explain simply
Storytelling with data   think broad, mine deep, explain simplyStorytelling with data   think broad, mine deep, explain simply
Storytelling with data think broad, mine deep, explain simplyLuciano Pesci, PhD
 
A Movie Broker Dashboard in PowerBI
A Movie Broker Dashboard in PowerBIA Movie Broker Dashboard in PowerBI
A Movie Broker Dashboard in PowerBILeo Salemann
 
Predicting The Future With Social Media
Predicting The Future With Social MediaPredicting The Future With Social Media
Predicting The Future With Social MediaMaurizio Napolitano
 
5. Long tail vs superstar
5. Long tail vs superstar5. Long tail vs superstar
5. Long tail vs superstarJoost Rietveld
 
WWW2013: Web Usage Mining with Semantic Analysis
WWW2013: Web Usage Mining with Semantic AnalysisWWW2013: Web Usage Mining with Semantic Analysis
WWW2013: Web Usage Mining with Semantic AnalysisLaura Hollink
 
Funding Workshop Slides from Big Sky Film Festival Doc Shop
Funding Workshop Slides from Big Sky Film Festival Doc Shop Funding Workshop Slides from Big Sky Film Festival Doc Shop
Funding Workshop Slides from Big Sky Film Festival Doc Shop Reva Goldberg
 
ediz_volkan_social_data
ediz_volkan_social_dataediz_volkan_social_data
ediz_volkan_social_dataVolkan Ediz
 
Data science for advanced dummies
Data science for advanced dummiesData science for advanced dummies
Data science for advanced dummiesSaurav Chakravorty
 
Developing and Movie Recommendation System in R
Developing and Movie Recommendation System in RDeveloping and Movie Recommendation System in R
Developing and Movie Recommendation System in RJody Schechter
 
Distribution & Exhibition
Distribution & ExhibitionDistribution & Exhibition
Distribution & ExhibitionVictory Media
 
Netflix and the film industry
Netflix and the film industryNetflix and the film industry
Netflix and the film industrylou80
 
Distribution lesson part 2
Distribution lesson part 2Distribution lesson part 2
Distribution lesson part 2hasnmedia
 
Ibm advanced analytics platform for m&e
Ibm advanced analytics platform for m&eIbm advanced analytics platform for m&e
Ibm advanced analytics platform for m&eUnited Partners
 
W6 57-010126-2009-8
W6 57-010126-2009-8W6 57-010126-2009-8
W6 57-010126-2009-8OHM_Resistor
 

Similar to Social Volume Predicts 68% of Box Office (20)

Storytelling with data think broad, mine deep, explain simply
Storytelling with data   think broad, mine deep, explain simplyStorytelling with data   think broad, mine deep, explain simply
Storytelling with data think broad, mine deep, explain simply
 
Movies and Market Share
Movies and Market ShareMovies and Market Share
Movies and Market Share
 
A Movie Broker Dashboard in PowerBI
A Movie Broker Dashboard in PowerBIA Movie Broker Dashboard in PowerBI
A Movie Broker Dashboard in PowerBI
 
Predicting The Future With Social Media
Predicting The Future With Social MediaPredicting The Future With Social Media
Predicting The Future With Social Media
 
5. Long tail vs superstar
5. Long tail vs superstar5. Long tail vs superstar
5. Long tail vs superstar
 
WWW2013: Web Usage Mining with Semantic Analysis
WWW2013: Web Usage Mining with Semantic AnalysisWWW2013: Web Usage Mining with Semantic Analysis
WWW2013: Web Usage Mining with Semantic Analysis
 
Cinema stuff
Cinema stuffCinema stuff
Cinema stuff
 
Introducing telemetrics
Introducing telemetricsIntroducing telemetrics
Introducing telemetrics
 
Funding Workshop Slides from Big Sky Film Festival Doc Shop
Funding Workshop Slides from Big Sky Film Festival Doc Shop Funding Workshop Slides from Big Sky Film Festival Doc Shop
Funding Workshop Slides from Big Sky Film Festival Doc Shop
 
ediz_volkan_social_data
ediz_volkan_social_dataediz_volkan_social_data
ediz_volkan_social_data
 
Tim P
Tim P   Tim P
Tim P
 
Data science for advanced dummies
Data science for advanced dummiesData science for advanced dummies
Data science for advanced dummies
 
Developing and Movie Recommendation System in R
Developing and Movie Recommendation System in RDeveloping and Movie Recommendation System in R
Developing and Movie Recommendation System in R
 
Distribution & Exhibition
Distribution & ExhibitionDistribution & Exhibition
Distribution & Exhibition
 
Netflix and the film industry
Netflix and the film industryNetflix and the film industry
Netflix and the film industry
 
Distribution lesson part 2
Distribution lesson part 2Distribution lesson part 2
Distribution lesson part 2
 
LION LEGENDS PP Most recent
LION LEGENDS PP Most recent LION LEGENDS PP Most recent
LION LEGENDS PP Most recent
 
GROUP 8
GROUP 8GROUP 8
GROUP 8
 
Ibm advanced analytics platform for m&e
Ibm advanced analytics platform for m&eIbm advanced analytics platform for m&e
Ibm advanced analytics platform for m&e
 
W6 57-010126-2009-8
W6 57-010126-2009-8W6 57-010126-2009-8
W6 57-010126-2009-8
 

More from FangyaTan

Machine Learning for prediction of survival outcomes with Immune checkpoint i...
Machine Learning for prediction of survival outcomes with Immune checkpoint i...Machine Learning for prediction of survival outcomes with Immune checkpoint i...
Machine Learning for prediction of survival outcomes with Immune checkpoint i...FangyaTan
 
MANOVA paper review with example in R
MANOVA paper review with example in RMANOVA paper review with example in R
MANOVA paper review with example in RFangyaTan
 
Factor analysis in R with Five Personality Survey Mini Sample
Factor analysis in R with Five Personality Survey Mini SampleFactor analysis in R with Five Personality Survey Mini Sample
Factor analysis in R with Five Personality Survey Mini SampleFangyaTan
 
Logistic regression
Logistic regression Logistic regression
Logistic regression FangyaTan
 
Trade war Bayesian Analysis
Trade war Bayesian AnalysisTrade war Bayesian Analysis
Trade war Bayesian AnalysisFangyaTan
 
Data Analysis on Crime Rate
Data Analysis on Crime Rate Data Analysis on Crime Rate
Data Analysis on Crime Rate FangyaTan
 

More from FangyaTan (6)

Machine Learning for prediction of survival outcomes with Immune checkpoint i...
Machine Learning for prediction of survival outcomes with Immune checkpoint i...Machine Learning for prediction of survival outcomes with Immune checkpoint i...
Machine Learning for prediction of survival outcomes with Immune checkpoint i...
 
MANOVA paper review with example in R
MANOVA paper review with example in RMANOVA paper review with example in R
MANOVA paper review with example in R
 
Factor analysis in R with Five Personality Survey Mini Sample
Factor analysis in R with Five Personality Survey Mini SampleFactor analysis in R with Five Personality Survey Mini Sample
Factor analysis in R with Five Personality Survey Mini Sample
 
Logistic regression
Logistic regression Logistic regression
Logistic regression
 
Trade war Bayesian Analysis
Trade war Bayesian AnalysisTrade war Bayesian Analysis
Trade war Bayesian Analysis
 
Data Analysis on Crime Rate
Data Analysis on Crime Rate Data Analysis on Crime Rate
Data Analysis on Crime Rate
 

Recently uploaded

Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 

Recently uploaded (20)

Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 

Social Volume Predicts 68% of Box Office

  • 1. ANLY500 Group #6 Yuanjie Lei FangyaTan ShuqingYang YanhongYe
  • 2. Introduction • We found that the volume of social conversations of movies is highly correlated with box office. • E.g.The more people discussing a movie on Twitter, the higher the movie’s box office. • Our goal of this study is using social data to predict box office.
  • 3. Research Question • Study the correlation betweenTwitter data and/or other factors and movies’ opening weekend box office in the US movie market. • Can we use social network (Twitter) data to predict movies’ opening weekend box office?
  • 4. Data Sources Movies Meta Data • We collected movies meta data from some public sources online, key variables include movie, movie id, release date, genre, box office of the opening weekend, rating, theater counts.There are 417 movies in the dataset. MoviesVolume Data • We obtained the volume data from a social network analytics firm, ‘Networked Insights’.The dataset has the daily conversation volume of movies onTwitter. Key variables include movie id, volume date, total volume, intent volume, and etc.The dataset has 84064 records.
  • 5. Data Pre-processing • Cleaned movie meta data, removed unreleased movies (box office is NA). • Aggregated each daily volume variables to total volume. (post volume, intent volume, positive volume, negative volume, female volume, male volume, organic volume) • Joined two datasets by movie id for EDA and model building.
  • 6. EDA 1. Is there any differences in behavior between different genres? • Super Hero had the highest box office on opening weekend, following by Young Adult and Sci-Fi. • Drama has the lowest box office on average.
  • 7. EDA 1. Is there any differences in behavior between different genres? • Young Adult and Super Hero have the highest average theater counts. • Drama has the lowest average theater counts.
  • 8. EDA 1. Is there any differences in behavior between different genres? • We found significant growth of volume at week -4, likely due to TV & digital media campaigns. • Most genres double their volume from week -2 to week -1. • Super Hero andYoung Adult movies see significantly higher volumes at all weeks followed by Sci-Fi, Action and Family Animation.
  • 9. EDA 2. Which variables are correlated with box office? • Total volume, intent volume, and organic volume are highly correlated with box office. However, we should be aware of the multicollinearity issue since all 7 different types of volume are highly correlated with each other as well. • Theater counts is also positively correlated to box office.
  • 10. Model 1.Linear Regression Model • DependentVariable: Box Office • IndependentVariable: Sum of PostVolume • A film’s social volume alone is a strong indicator of opening weekend performance. • This simple linear regression model using the sum of post volume as a predictor of opening weekend performance yields an R2 =68.31
  • 12. Model 2.Multiple Linear Regression Model - Improving the Predictive Model • DependentVariable: Box Office • IndependentVariable: Sum of PostVolume,Theater Counts • Predicted opening box revenue=-33.447+1.53e(-5)*total volume +0.0167*theater number • Not all films are alike. Adding factorTheatre Counts greatly improves the accuracy of the model. • R2 =
  • 13. Model 2.Multiple Linear Regression Model - Improving the Predictive Model
  • 14. Example - Prediction ofTomorrowland • Predicted opening box revenue=-33.447+1.53e(-5)*total volume +0.0167*theater number • Total volume ofTomorrowland=250730,Theater number ofTomorrowland=3972 • Predicted opening box revenue=32.889 Million dollars • Actual opening box revenue=33 Million dollars • Standard error of opening box revenue=0.012
  • 15. Limitation & Uncertainty • Twitter data may not be a good representation of the population. • Studio who invests more on promoting a movie may have higher volume than who invests less. • We haven’t taken into account the sentiment of the post volume, a movie who has high negative post volume might result in lower box office.
  • 16. Conclusion • Social data is highly predictive of box office performance • SocialVolume alone explains 75% of the variance! • Model prediction 10-weeks prior to release can predict opening weekend within 10 million 70% of the time. • Studios can leverage these insights and make decisions based on social volume performance.