Model Comparison
Movie Breakeven Analysis in the
U.S. Market
Liu Jialin | Priyadarshini Majumdar | Zhang Jiexi
Data Analytics Lab Project Challenge from Nov 23rd onwards at a theatre near YOU
INTRODUCTION
METHODOLOGY
What plays the most important role
in making a movie profitable?
Movie technical
• Language 4.3/10
• Content rating 4/10
• Aspect ratio 3/10
• Budget 2.5/10
• Duration 1.5/10
• Colour or B&W 1/10
IMDB website influence
• No. of IMDB users who voted 9/10
• No. of users who reviewed 8/10
• No. of critics for reviews 6/10
• IMDB score 5/10
Facebook influence
• Movie Facebook likes 4.5/10
• Actor 3 Facebook likes > Actor 2 > Actor 1
• Cast total Facebook likes 3.5/10
• Director Facebook likes 3.2/10
Poster and promotional
materials
• No. of faces in a poster 2.6/10
objectives
Data Processing
Remove duplicate entries in JMP.
Calculate gross profit = (gross − budget) / budget, as a percentage.
Create the binary Profit/Loss target
variable and remove missing
values.
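The data processing steps above were done in JMP. As a rough illustration only, the same logic can be sketched in plain Python. The field names (`title`, `year`, `gross`, `budget`), the use of title plus year as the duplicate key, and the rule that a movie counts as profitable when gross is at least equal to budget are assumptions made for this sketch; the poster does not state them.

```python
def add_profit_fields(movies):
    """Drop duplicates and rows with missing values, then add profit fields.

    `movies` is a list of dicts. The keys used here are hypothetical names,
    not the column names of the original data set.
    """
    seen = set()
    cleaned = []
    for movie in movies:
        # Assumption: a (title, year) pair identifies a duplicate entry.
        key = (movie["title"], movie.get("year"))
        if key in seen:
            continue
        seen.add(key)

        gross, budget = movie.get("gross"), movie.get("budget")
        # Remove rows where the profit ratio cannot be computed.
        if gross is None or budget is None or budget <= 0:
            continue

        # Gross profit as a percentage of budget.
        profit_pct = (gross - budget) / budget * 100
        # Assumption: breakeven (target = 1) means gross >= budget.
        target = 1 if profit_pct >= 0 else 0
        cleaned.append({**movie, "profit_pct": profit_pct, "profit": target})
    return cleaned
```

Keeping the percentage alongside the binary target makes it easy to revisit the breakeven threshold later without recomputing anything from the raw figures.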
SAS Enterprise Miner:
• Import the JMP file using File
Import and Save Data nodes.
• Change the level for Aspect Ratio
to nominal in the File Import node.
• Conduct text parsing,
text clustering and text filtering on
plot keywords and genres.
• Use Multiplot node to view the
distribution of the variables.
• Recode missing values and
erroneous entries using
Replacement node.
• Sample the data into Training Set
and Validation Set using the Data
Partition node.
Before running the parametric
models, fill in all missing values
using the Impute node and transform
the interval variables with skewed
distributions using the Transform
node.
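The partition, impute and transform steps above are SAS Enterprise Miner nodes. The following is a minimal Python sketch of the same ideas, not a reproduction of those nodes. The 70/30 split, median imputation and the log transform are assumptions chosen for illustration; the poster does not say which settings were used.

```python
import math
import random
import statistics


def partition(rows, train_fraction=0.7, seed=42):
    """Shuffle rows and split them into training and validation sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def impute_median(train, valid, column):
    """Fill missing values (None) in `column` with the training-set median.

    The median is taken from the training rows only, so that the
    validation set does not influence the fill value.
    """
    observed = [row[column] for row in train if row[column] is not None]
    fill = statistics.median(observed)
    for row in train + valid:
        if row[column] is None:
            row[column] = fill
    return fill


def log_transform(rows, column):
    """Reduce right skew in a non-negative interval variable with log(1 + x)."""
    for row in rows:
        row[column] = math.log1p(row[column])
```

Imputation runs before the transform here because `math.log1p` cannot be applied to a missing value.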
Predictive Model Construction
Decision Tree
A decision tree is a nonparametric algorithm that is capable of fitting a large number of
functional forms and mapping observations to categorical targets.
Model Comparison
Conclusion
Background:
The movie industry is one of the top-grossing industries in the world today,
and in the U.S. alone it was a 38-billion-dollar market as of 2016.
Motivation:
IMDB is one of the most visited sites that viewers
consult when deciding whether to watch a movie. Hence it has
a direct effect on whether a movie makes a profit or a loss.
Primary Objective: To develop a model that can predict whether a movie will
break even in the U.S. market or not.
Secondary Objective: To inform promoters who use social media for movie
promotion about which factors affect the outcome of a movie.
Confusion Matrix for Model Comparison
Gradient Boosting
A Gradient Boosting model builds a strong learner from a sequence of weak
tree learners, using a gradient descent algorithm. It is computationally intensive and
performs very well with a moderate number of variables after fine-tuning.
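To make the idea of combining weak trees concrete, here is a deliberately simplified Python sketch for a binary target and a single numeric input. It is not the SAS Enterprise Miner implementation: the one-split trees (stumps), the learning rate, the number of rounds and the log-loss pseudo-residuals (actual minus predicted probability) are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, residuals):
    """Find the one-split regression tree that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def boost(xs, ys, rounds=50, learning_rate=0.5):
    """Fit stumps one after another to the current pseudo-residuals.

    Assumes ys contains both classes (0 and 1) and xs has at least two
    distinct values.
    """
    base_rate = sum(ys) / len(ys)
    initial_score = math.log(base_rate / (1.0 - base_rate))
    scores = [initial_score] * len(xs)
    stumps = []
    for _ in range(rounds):
        # Pseudo-residuals for log loss: actual minus predicted probability.
        residuals = [y - sigmoid(s) for y, s in zip(ys, scores)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        scores = [s + learning_rate * (left_mean if x <= threshold else right_mean)
                  for x, s in zip(xs, scores)]
    return initial_score, learning_rate, stumps


def predict_proba(model, x):
    """Sum the scaled stump outputs and map the score to a probability."""
    initial_score, learning_rate, stumps = model
    score = initial_score + sum(
        learning_rate * (left_mean if x <= threshold else right_mean)
        for threshold, left_mean, right_mean in stumps
    )
    return sigmoid(score)
```

Each round corrects only what the previous rounds got wrong, which is why many weak stumps can add up to a strong model.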
Logistic Regression
Logistic regression describes the relationship between a categorical target variable and the
independent variables by estimating the probability from a cumulative logistic
distribution.
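As a small illustration of the cumulative logistic distribution mentioned above, the sketch below turns a linear combination of inputs into a breakeven probability. The function name, the coefficients and the intercept are hypothetical placeholders, not values fitted in this project.

```python
import math


def breakeven_probability(features, coefficients, intercept):
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum(c_i * x_i))))."""
    linear = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-linear))
```

A linear score of zero maps to a probability of 0.5, and larger scores move the probability towards 1.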
Neural Network
A neural network is a parametric model that accommodates a wider variety of nonlinear
relationships. A neural network also keeps in check the curse of dimensionality,
which bedevils attempts to model nonlinear functions with a large number of variables.
Data set
The data set was scraped from
IMDB using Python's Scrapy
library. This resulted in 5043
observations (movie titles) of 28 variables.
Random Forest
A random forest is an ensemble of decision trees. It averages the predicted probabilities of
a large number of overtrained decision trees, and is therefore more robust against overfitting
and generalizes better than a single decision tree.
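The averaging step described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the node used in the project: each tree is represented by a hypothetical callable that returns a breakeven probability, and bootstrap sampling is shown as one common way of giving each tree different training data.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, one such sample per tree."""
    return [rng.choice(rows) for _ in rows]


def forest_probability(trees, features):
    """Average the breakeven probabilities predicted by the individual trees."""
    return sum(tree(features) for tree in trees) / len(trees)
```

Because each tree overfits a different sample, their individual errors tend to cancel out in the average.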
Figure legend (factor influence): most influential factors, 2nd most influential factors, 3rd most influential factors, least influential factor
Target percentages show how reliable the model's predictions are likely to be
on future data. Outcome percentages, on the other hand,
indicate how accurately the model predicts the sample data set. For
Gradient Boosting and Neural Network, the Outcome 1/1 percentages
are above 75%, which means the models have correctly identified
more than 75% of the breakeven movies. The Target 1/1 percentages are above
70%, which means the models' predictions are reliable. Hence, Gradient
Boosting and Neural Network are the models chosen to predict the
breakeven status of future movies in the U.S. market.
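Following the interpretation given above, the two 1/1 percentages can be computed from confusion-matrix counts as sketched below. This reads the Outcome 1/1 percentage as the share of actual breakeven movies that were predicted as breakeven, and the Target 1/1 percentage as the share of predicted breakeven movies that actually broke even. That reading, the function names and the argument names are assumptions for illustration and should be checked against the SAS Enterprise Miner output.

```python
def outcome_pct_1_1(true_pos, false_neg):
    """Share of actual breakeven movies that the model predicted as breakeven."""
    return true_pos / (true_pos + false_neg) * 100


def target_pct_1_1(true_pos, false_pos):
    """Share of predicted breakeven movies that actually broke even."""
    return true_pos / (true_pos + false_pos) * 100
```

For instance, with hypothetical counts of 80 true positives and 20 false negatives, `outcome_pct_1_1(80, 20)` gives 80.0.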
Misclassification rate takes both the false positives
and the false negatives into consideration. Of
all the models, Gradient Boosting has the
lowest misclassification rate. This is not
surprising, given that its algorithm
seeks to minimise the intermediate
pseudo-residuals rather than relying on a single
splitting criterion as in Decision Tree and
Random Forest. Neural Network 2 performs
second best, suggesting that its more complex
algorithm, which loosely imitates the human brain,
has some advantage in building predictive
models.
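The misclassification rate described above is the share of all predictions that are wrong. A minimal Python sketch, with hypothetical argument names:

```python
def misclassification_rate(true_pos, false_pos, false_neg, true_neg):
    """Fraction of all movies whose breakeven status was predicted wrongly."""
    total = true_pos + false_pos + false_neg + true_neg
    return (false_pos + false_neg) / total
```

With hypothetical counts of 80 true positives, 30 false positives, 20 false negatives and 70 true negatives, the rate is 50 / 200 = 0.25.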
The analysis and data set rely heavily on online data, given that the data were extracted
from a movie rating website. Online data, however, is not the only defining factor.
• Hence, further analysis on predicting movie success should also take into
consideration traditional promotional channels such as theatre data.
• Additionally, this data was collected over a period of time, and the
popularity of a movie grows over time. Hence, for a more
accurate analysis, time-stamps of the metrics must be collected and taken into
consideration.
• The most important insight from the above predictive analysis is that the
online popularity of a movie is the best indicator of its success.
• IMDB is a sought-after site for movie opinions, and hence movie votes,
critic reviews and general public reviews are the greatest influencers.
• Among Facebook likes, Actor 3 Facebook likes are a better indicator than
Actor 2 and Actor 1 Facebook likes.
Gross profit = (gross − budget) / budget, expressed as a percentage
future work