Predicting Worldwide Gross for Top-rated Movies
Marine Veits
Motivation
Create a linear regression model that can predict
Worldwide Gross of top-rated movies by
determining the features most influential to their
success.
Procedure
➔ Scraping
Scrape data from various sources, then clean and merge it into one dataframe
➔ EDA & OLS regression
Fit on the training set (70%), then evaluate and regularize the model
➔ Choosing model
Apply the chosen model to the rest of the dataset
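The 70% training split in the procedure can be sketched as follows. This is a minimal illustration using only the Python standard library, not the author's code: the function name `split_train_test`, the seed, and the placeholder rows are assumptions, and the 2,100-row count assumes no movies were dropped during cleaning.

```python
import random

def split_train_test(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

# Placeholder rows standing in for 100 movies x 21 years (assumed, no rows dropped)
movies = [{"id": i} for i in range(2100)]
train, test = split_train_test(movies)
print(len(train), len(test))  # 1470 630
```

The model is fitted on `train` only; `test` stays untouched until the final model has been chosen.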
Procedure: Data Scraping
Dataframe: top-rated 100 movies × 21 years
Features:
# Worldwide Gross
# Domestic Gross
# Foreign Gross
# Budget
# Runtime
# Composer
# Genre
# Year
# IMDB Metascore
# Rating
# Oscar Wins
# Winning Awards
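Merging scraped records from several sources into one row per movie might look like the sketch below. The slides do not name the sources or the join key, so the `(title, year)` key, the field names, and the sample records are all hypothetical.

```python
def merge_sources(*sources):
    """Merge per-source records that share the same (title, year) key."""
    merged = {}
    for source in sources:
        for record in source:
            key = (record["title"], record["year"])
            # Later sources add fields to (or overwrite fields of) the same movie
            merged.setdefault(key, {}).update(record)
    return list(merged.values())

# Hypothetical records from two made-up sources
gross_rows = [{"title": "Example Movie", "year": 2000, "worldwide_gross_musd": 300.0}]
rating_rows = [{"title": "Example Movie", "year": 2000, "metascore": 70, "runtime_min": 120}]

movies = merge_sources(gross_rows, rating_rows)
print(movies[0])
```

In practice, titles often differ slightly between sites, so a real pipeline would need to normalize the key before merging.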
Training Set Results: OLS
R^2: 0.47
Adjusted R^2: 0.46
Feature             Coef.     P>|t|
Budget ($M)         2.7038    0.000
Runtime (min)       0.6776    0.001
Year                3.6525    0.000
Rating              14.911    0.015
IMDB Metascore      1.2369    0.001
Top 15 Composers    40.921    0.000
Winning Awards      14.095    0.067
Oscar               55.205    0.000
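One way to read the OLS table: in a linear model, the change in the prediction is the sum of each coefficient times the change in its feature. The sketch below uses the coefficients from the slide. The dictionary keys and the function name are made up for illustration, and the result is in whatever units Worldwide Gross was modeled in, which the slide does not state.

```python
# Coefficients copied from the OLS slide; key names are illustrative
OLS_COEF = {
    "budget_musd": 2.7038,
    "runtime_min": 0.6776,
    "year": 3.6525,
    "rating": 14.911,
    "imdb_metascore": 1.2369,
    "top15_composer": 40.921,
    "winning_awards": 14.095,
    "oscar": 55.205,
}

def predicted_change(deltas, coef=OLS_COEF):
    """Change in predicted gross for the given feature changes, all else equal."""
    return sum(coef[name] * delta for name, delta in deltas.items())

# Example: $10M more budget plus hiring a top-15 composer
change = predicted_change({"budget_musd": 10, "top15_composer": 1})
print(round(change, 3))  # 67.959
```

Note that Winning Awards has p = 0.067, so its coefficient is not significant at the 5% level.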
Final Model: Degree 2 Polynomial Regression
R^2: 0.632
Feature             Coef.
Budget ($M)         1.426
Runtime (min)       -5.506
Year                -1887.55
Rating              -67.562
IMDB Metascore      1.571
Top 15 Composers    22.64
Winning Awards      3.400
Oscar               25.744
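A degree-2 polynomial regression fits a linear model on an expanded feature set: the original features plus all squares and pairwise products. The slide lists only eight coefficients, which appear to be the first-order terms. The sketch below shows the generic expansion without a bias column; it is an illustration, not the author's code, and `degree2_features` is a made-up name.

```python
from itertools import combinations_with_replacement

def degree2_features(row):
    """Return the original values followed by all squares and pairwise products."""
    pairs = combinations_with_replacement(range(len(row)), 2)
    return list(row) + [row[i] * row[j] for i, j in pairs]

print(degree2_features([2.0, 3.0]))          # [2.0, 3.0, 4.0, 6.0, 9.0]
print(len(degree2_features([1.0] * 8)))      # 44
```

With eight input features, the expansion yields 8 first-order and 36 second-order terms. This is why coefficients on the first-order terms, such as Year, can change sharply compared with plain OLS.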
Predicted vs. Actual Values
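The R^2 values reported for both models measure how closely predicted values track actual values: R^2 = 1 - SS_res / SS_tot. A minimal sketch of that computation follows. The sample numbers are invented for illustration and are not taken from the movie data.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Invented values, only to exercise the function
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
print(round(r_squared(actual, predicted), 3))  # 0.992
```

On a predicted-vs-actual plot, R^2 = 1 corresponds to every point lying on the diagonal.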
Feature Importance
Quotes for illustration purposes only
Composers
Although this is not the most important
feature in my model:
➔ When making a movie
You should have a really high budget!
➔ Giving the job to one of the top (10%) composers
You would have to pay roughly $5-22M
John Williams
Danny Elfman
Alan Silvestri
Hans Zimmer
James Newton Howard
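The "Top 15 Composers" feature in the model tables is a binary indicator. A sketch of how such a flag could be derived from a composer column follows. Only the five names shown on this slide are used; the full list of 15 is not given in the deck, so the set below is incomplete by construction, and `top_composer_flag` is a hypothetical helper.

```python
# Only the five composers named on the slide; the remaining ten are not listed
TOP_COMPOSERS = {
    "John Williams",
    "Danny Elfman",
    "Alan Silvestri",
    "Hans Zimmer",
    "James Newton Howard",
}

def top_composer_flag(composer, top=TOP_COMPOSERS):
    """Return 1 if the movie's composer is in the top set, else 0."""
    return 1 if composer.strip() in top else 0

print(top_composer_flag("Hans Zimmer"))   # 1
print(top_composer_flag("Someone Else"))  # 0
```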
Conclusions
The budget is the key indicator of Worldwide Gross.
Runtime & Year of production are negatively correlated with Worldwide Gross.
The top-composers indicator contributes more to the model than winning an Oscar or other awards.
Metis Project 2: Predicting Worldwide Gross - JungleBoogie
