Predicting Worldwide Gross for Top-rated Movies
Marine Veits
Motivation
Create a linear regression model that can predict
Worldwide Gross of top-rated movies by
determining the features most influential to their
success.
Procedure
➔ Scraping
Collect data from various sources, clean it, and merge it into one dataframe
➔ EDA & OLS regression
Fit on the training set (70%), followed by model evaluation and regularization
➔ Choosing model
Apply the new model to the rest of the dataset (the remaining 30%)
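The split-and-fit step above can be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic, the column names `budget` and `runtime` and the helper functions are hypothetical, and the deck does not say which library was used. The sketch uses plain NumPy least squares, so it does not produce the p-values a statistics package would report.

```python
import numpy as np

def train_test_split_70_30(X, y, seed=0):
    """Shuffle rows and return a 70% training / 30% hold-out split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    cut = int(round(0.7 * len(y)))
    train, test = idx[:cut], idx[cut:]
    return X[train], X[test], y[train], y[test]

def fit_ols(X, y):
    """Ordinary least squares with an intercept: [intercept, coef_1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict(beta, X):
    """Apply fitted coefficients to a feature matrix."""
    return beta[0] + X @ beta[1:]

def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in for the scraped dataframe (NOT the author's data).
rng = np.random.default_rng(42)
n = 2100  # 100 movies x 21 years, before any cleaning
budget = rng.uniform(1, 300, n)    # hypothetical budget column, $M
runtime = rng.uniform(80, 180, n)  # hypothetical runtime column, minutes
X = np.column_stack([budget, runtime])
y = 2.5 * budget + 0.5 * runtime + rng.normal(0, 50, n)

X_train, X_test, y_train, y_test = train_test_split_70_30(X, y)
beta = fit_ols(X_train, y_train)
print("train R^2:", round(r_squared(y_train, predict(beta, X_train)), 3))
print("test  R^2:", round(r_squared(y_test, predict(beta, X_test)), 3))
```

Comparing the training and hold-out R^2 values is one simple way to check whether a model chosen on the 70% split generalizes to the remaining rows.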
Procedure: Data Scraping
Dataframe: top-rated 100 movies × 21 years
Features:
# Worldwide Gross
# Domestic Gross
# Foreign Gross
# Budget
# Runtime
# Composer
# Genre
# Year
# IMDB Metascore
# Rating
# Oscar Wins
# Winning Awards
Training Set Results: OLS
R^2: 0.47
Adjusted R^2: 0.46
Feature             Coef.     P>|t|
Budget ($M)         2.7038    0.000
Runtime (min)       0.6776    0.001
Year                3.6525    0.000
Rating              14.911    0.015
IMDB Metascore      1.2369    0.001
Top 15 Composers    40.921    0.000
Winning Awards      14.095    0.067
Oscar               55.205    0.000
Final Model: Degree 2 Polynomial Regression
R^2: 0.632
Feature             Coef.
Budget ($M)         1.426
Runtime (min)       -5.506
Year                -1887.55
Rating              -67.562
IMDB Metascore      1.571
Top 15 Composers    22.64
Winning Awards      3.400
Oscar               25.744
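For readers unfamiliar with degree-2 polynomial regression, here is a minimal sketch of the idea: expand the features with all squares and pairwise products, then fit ordinary least squares on the expanded matrix. The data is synthetic and the helper names are hypothetical. The slide lists only eight coefficients, and how the author built the polynomial terms is not stated, so this is not a reconstruction of the actual model.

```python
from itertools import combinations_with_replacement

import numpy as np

def poly2_features(X):
    """Return the original columns plus all squares and pairwise products."""
    n_features = X.shape[1]
    pairs = combinations_with_replacement(range(n_features), 2)
    second_order = [X[:, i] * X[:, j] for i, j in pairs]
    return np.column_stack([X] + second_order)

def fit_and_score(X, y):
    """Fit OLS with an intercept and return the in-sample R^2."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic data with a genuinely quadratic signal (NOT the author's data).
rng = np.random.default_rng(0)
x1 = rng.uniform(-2, 2, 500)
x2 = rng.uniform(-2, 2, 500)
X_lin = np.column_stack([x1, x2])
y = 3 * x1 + 0.5 * x1 ** 2 - 2 * x1 * x2 + rng.normal(0, 0.5, 500)

X_poly = poly2_features(X_lin)
r2_lin = fit_and_score(X_lin, y)
r2_poly = fit_and_score(X_poly, y)
print("columns:", X_lin.shape[1], "->", X_poly.shape[1])
print("R^2 linear:", round(r2_lin, 3), " R^2 degree 2:", round(r2_poly, 3))
```

With eight input features this expansion yields 8 + 36 = 44 columns. In-sample R^2 can never decrease when columns are added, so a comparison on held-out data is the more meaningful check.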
Predicted vs. Actual Values
Feature Importance
Quotes for illustration purposes only
Composers
Although this is not the most important feature in my model:
➔ When making a movie
You should have a really high budget!
➔ Giving the job to one of the top (10%) composers
You would have to pay roughly $5-22M
John Williams, Danny Elfman, Alan Silvestri, Hans Zimmer, James Newton Howard
Conclusions
The budget is the key indicator of Worldwide Gross.
Runtime and Year of production have negative coefficients in the final model.
The top-composers indicator contributes more to the model than winning an Oscar or other awards.
Metis Project 2: Predicting Worldwide Gross - JungleBoogie
