2. Motivation
Create a linear regression model that can predict
Worldwide Gross of top-rated movies by
determining the features most influential to their
success.
3. Procedure
➔ Scraping
Scrape data from various sources, clean it, and
merge it into one dataframe
➔ EDA & OLS regression
Fit on the training set (70%), then evaluate the
model and apply regularization
➔ Choosing model
Apply the chosen model to the rest of the dataset
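A minimal sketch of the 70% / 30% split described in the procedure, using only the standard library. The helper name, the seed, and the row count of 1000 are illustrative assumptions, not values from the project.

```python
import random


def train_test_split_indices(n_rows, train_fraction=0.7, seed=0):
    """Shuffle row indices and split them into train and test lists."""
    indices = list(range(n_rows))
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(indices)
    n_train = int(round(n_rows * train_fraction))
    return indices[:n_train], indices[n_train:]


# Illustrative row count only; the real dataframe size may differ.
train_idx, test_idx = train_test_split_indices(1000)
```

The model is fitted only on the rows in `train_idx`; the rows in `test_idx` stay untouched until the final model has been chosen.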
4. Procedure: Data Scraping
Dataframe: top-rated 100 movies × 21 years
Features:
# Worldwide Gross
# Domestic Gross
# Foreign Gross
# Budget
# Runtime
# Composer
# Genre
# Year
# IMDB Metascore
# Rating
# Oscar Wins
# Winning Awards
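One way the scraped sources could be merged into a single dataframe is sketched below with pandas. The two tables, their column names, and every value in them are invented placeholders; the actual sources and join keys used in the project are not stated on the slides.

```python
import pandas as pd

# Hypothetical scraped tables: titles, values, and column names are invented.
gross = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "year": [2001, 2005, 2010],
    "worldwide_gross_musd": [310.0, 95.5, 720.2],
})
ratings = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie D"],
    "year": [2001, 2005, 2012],
    "metascore": [81, 67, 74],
})

# Inner join keeps only movies present in both sources.
# validate raises an error if a (title, year) key is duplicated on either side.
movies = gross.merge(ratings, on=["title", "year"], how="inner",
                     validate="one_to_one")
```

Joining on title and year together, rather than title alone, is an assumption meant to avoid mixing up remakes that share a name.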
5. Training Set Results: OLS
R^2: 0.47
Adjusted R^2: 0.46
Feature             Coef.     P>|t|
Budget ($M)         2.7038    0.000
Runtime (min)       0.6776    0.001
Year                3.6525    0.000
Rating              14.911    0.015
IMDB Metascore      1.2369    0.001
Top 15 Composers    40.921    0.000
Winning Awards      14.095    0.067
Oscar               55.205    0.000
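For reference, the sketch below shows how an OLS fit and the two goodness-of-fit numbers on this slide can be computed with NumPy. It runs on synthetic data generated in the snippet, not on the movie dataframe, and the function name `fit_ols` is an illustrative assumption.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit with an intercept.

    Returns (coefficients, R^2, adjusted R^2); coefficients[0] is the intercept.
    """
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])  # prepend the intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R^2 penalizes the number of predictors p.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return beta, r2, adj_r2


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 10.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)
beta, r2, adj_r2 = fit_ols(X, y)
```

Adjusted R^2 is never larger than R^2, and the gap shrinks as the number of rows grows relative to the number of features.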
6. Final Model: Degree 2 Polynomial Regression
R^2: 0.632
Feature             Coef.
Budget ($M)         1.426
Runtime (min)       -5.506
Year                -1887.55
Rating              -67.562
IMDB Metascore      1.571
Top 15 Composers    22.64
Winning Awards      3.400
Oscar               25.744
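A degree 2 polynomial regression is still a linear least-squares fit, only on an expanded feature set that adds squares and pairwise products. The sketch below illustrates this with NumPy on synthetic data; the helper `polynomial_features_deg2` is an illustrative assumption and not the code used for the final model.

```python
from itertools import combinations_with_replacement

import numpy as np


def polynomial_features_deg2(X):
    """Return columns [1, x_i, x_i * x_j for i <= j] for each row of X."""
    n, p = X.shape
    columns = [np.ones(n)]
    columns += [X[:, i] for i in range(p)]
    columns += [X[:, i] * X[:, j]
                for i, j in combinations_with_replacement(range(p), 2)]
    return np.column_stack(columns)


# Synthetic, noise-free data for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] ** 2 + 0.5 * X[:, 0] * X[:, 1]

# Column order for two features: 1, x0, x1, x0^2, x0*x1, x1^2
Phi = polynomial_features_deg2(X)
coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
```

In such a model the linear coefficient of a feature is only part of its effect, because the squared and interaction terms also contribute.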
9. Composers
Although this is not the most important
feature in my model:
➔ When making a movie
You should have a really high budget.
➔ When giving the job to one of the top
(10%) composers
You would have to pay roughly $5-22M.
John Williams
Danny Elfman
Alan Silvestri
Hans Zimmer
James Newton Howard
10. Conclusions
Budget is the key indicator of
Worldwide Gross.
Runtime and year of production are
negatively associated with
Worldwide Gross in the final model.
The top-composers indicator's
contribution to the model is higher
than that of winning an Oscar or other
awards.