“NOBODY KNOWS ANYTHING”*
Movie Investment Decision Analysis
* Except Us
Why are we here?
The film industry has a serious business problem
■ Successful investing in movie production is hard: upwards of 50% of all films lose money for their
backers (i.e. financially backing films is no better than random chance)
■ However, plenty of public data is available: www.themoviedb.org has millions of data points
on over 600k movies (and 1.9m people in the film industry)
■ Improving decision-making outcomes in this industry is a business problem that is ripe for DATA
ANALYTICS
■ We propose a supervised learning approach to generate binary classifications of profitability,
using only information known at the pitch stage (i.e. pre-production)
■ Predictive accuracy of > 80% is achievable using our proprietary model*
*Based on model backtesting on a randomly sampled 20% holdout set from 1965 to 2017
Data
We obtained raw data on ~5000 films released from 1960 to 2017 from public sources.
Detailed descriptive data on Cast, Crew, Genre, Financials, Production Company, Language, Filming Location and more were extracted.
Substantial data pre-processing and cleaning were performed.
EDA + Feature Engineering
We performed extensive exploratory data analysis in order to form hypotheses and identify potentially important features
We also constructed many novel features from the raw data, for example the average revenue and profitability of the previous movies of each key cast and crew member.
The final result was a clean, scaled, binarized dataset of 4803 rows with 346 predictor columns (mostly one-hot encoded).
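A "previous movies" feature of this kind has to be built from films released before the one being scored, otherwise the feature leaks the outcome it is meant to predict. The following is a minimal sketch of one way to do that, not the deck's actual pipeline; the record keys (`title`, `year`, `revenue`, `cast`) and the function name are hypothetical.

```python
from collections import defaultdict
from itertools import groupby


def prior_average_revenue(films):
    """Map each film title to the mean of its cast members' prior average revenue.

    `films` is a list of dicts with hypothetical keys 'title', 'year',
    'revenue' and 'cast'. Only films from strictly earlier years contribute,
    so the feature uses nothing that was unknown at the pitch stage.
    """
    totals = defaultdict(float)  # person -> summed revenue of earlier films
    counts = defaultdict(int)    # person -> number of earlier films
    features = {}
    ordered = sorted(films, key=lambda f: f["year"])
    for _, same_year in groupby(ordered, key=lambda f: f["year"]):
        batch = list(same_year)
        # Score every film of this year before updating anyone's history.
        for film in batch:
            history = [totals[p] / counts[p] for p in film["cast"] if counts[p] > 0]
            features[film["title"]] = sum(history) / len(history) if history else None
        for film in batch:
            for person in film["cast"]:
                totals[person] += film["revenue"]
                counts[person] += 1
    return features


# Toy records, invented for illustration only.
films = [
    {"title": "A", "year": 2000, "revenue": 100.0, "cast": ["ann", "bob"]},
    {"title": "B", "year": 2005, "revenue": 300.0, "cast": ["ann"]},
    {"title": "C", "year": 2010, "revenue": 50.0, "cast": ["ann", "cy"]},
]
features = prior_average_revenue(films)
print(features)
```

Films whose cast has no earlier credits get `None`, which leaves the choice of imputation strategy to a later step.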
Modelling
We explored the effectiveness of predictive modelling along 2 dimensions:
1) Problem framing: binary classification (Positive Class = top-quartile RoI) and multi-class classification (“Hit” = top-quartile RoI, “Loss” = RoI < 0, “Neutral” = everything in between)
2) Algorithm type: Decision Trees, Logistic Regression, Support Vector Machines, K-Nearest Neighbors, Random Forest and Bagged Decision Trees
Grid search was used for hyper-parameter tuning, and models were evaluated via K-fold cross-validation on the training data, primarily on classification accuracy and ROC AUC, before final testing on a hold-out data set.
The most effective models were tree-based: Random Forest and Bagged Decision Trees performed best.
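The two problem framings differ only in how RoI is turned into a target label. The sketch below illustrates that step under stated assumptions: RoI is taken to be (revenue - budget) / budget, which is consistent with “Loss” = RoI < 0 but is not spelled out in the deck, and all function names are hypothetical.

```python
import statistics


def roi(budget, revenue):
    # Assumed definition: net profit per unit of budget, so a loss gives RoI < 0.
    return (revenue - budget) / budget


def top_quartile_threshold(rois):
    # statistics.quantiles with n=4 returns the three quartile cut points;
    # the last one is the boundary of the top quartile.
    return statistics.quantiles(rois, n=4)[2]


def binary_label(value, hit_threshold):
    # Positive class = top-quartile RoI.
    return 1 if value >= hit_threshold else 0


def multiclass_label(value, hit_threshold):
    if value >= hit_threshold:
        return "Hit"
    if value < 0:
        return "Loss"
    return "Neutral"


# Toy RoI values, invented for illustration only.
sample_rois = [0, 1, 2, 3, 4, 5, 6, 7]
threshold = top_quartile_threshold(sample_rois)
print(threshold, [multiclass_label(r, threshold) for r in sample_rois])
print(multiclass_label(roi(budget=100, revenue=50), threshold))
```

To keep the hold-out evaluation honest, the quartile threshold would normally be computed on the training split only and then reused unchanged on the hold-out set.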
Evaluation – Multiclass Classification
Performance on the multi-class classification problem was poor:
In both of the best-performing models (Bagged Decision Trees and Random Forest), the classifier struggled to identify the majority of the target class (“Hit”) and to differentiate between “Hits”, “Losses” and the majority class “Neutral” (approx. 60% of the data set).
We posit that this is because the boundaries between classes are hard and somewhat artificial (e.g. RoI = 2.905 is a “Hit” but RoI = 2.895 is “Neutral”) and as such do not represent natural clusters in the decision space.
This level of performance is unlikely to be useful to the business end user.
[Figure panels: Bagged Decision Tree, Random Forest]
Evaluation – Binary Classification
By contrast, model performance on the binary classification problem, with the boundary of the positive class set at top-quartile return on investment (RoI), was highly satisfactory. Random Forest was the top-performing model evaluated.
The tuned RF model scored well on the most important criteria for the business problem:
■ Precision = TP / (TP + FP) = when we predict a movie will be profitable, we are right ~74% of the time
■ ROC AUC = degree of separability between predictions of the two classes = the average positive example has only ~19% of negative examples scored higher than it (i.e. AUC ≈ 0.81)
■ Lift = ratio of results obtained with and without the model
For our specific business problem we are less concerned with recall (the proportion of actual positives we identify), as “passing” on a movie that turns out to be profitable carries only an opportunity cost, not a real one.
These results should be considered in the context of the business problem: >50% of movies made lose money, and the base rate of “profitability” (as defined here) in our population is only 25%.
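The three criteria can be computed directly from labels and scores. The sketch below is illustrative rather than the deck's evaluation code: ROC AUC is calculated by pairwise comparison to match the ranking interpretation given above, and lift is taken as precision divided by the base rate, which is one common reading of "with and without the model" and an assumption here. Under that reading, the quoted figures would imply a lift of roughly 0.74 / 0.25 ≈ 3.

```python
def precision(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


def roc_auc(y_true, scores):
    """Share of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win. Quadratic in the number of examples, which is
    fine for a sketch on a few thousand films.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def lift(y_true, y_pred):
    # Assumed definition: precision relative to picking films at random.
    base_rate = sum(y_true) / len(y_true)
    return precision(y_true, y_pred) / base_rate


# Toy labels and scores, invented for illustration only.
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(precision(y_true, y_pred), roc_auc(y_true, scores), lift(y_true, y_pred))
```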
Business Use + Conclusions
Using population averages for film production
cost and revenue we can evaluate the model in
an expected value framework to assess
potential real-world business impact:
■ Funding the top 15% of movies we see, as
ranked by our tuned RF binary classifier,
would result in optimal expected return on
investment (total expected profit / total
expected investment)
■ Funding the top 35% of movies would
result in slightly higher total profit, but at a
cost of significantly higher capital required
for investment
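The trade-off between the two funding cut-offs can be reproduced with a simple profit curve. The sketch below assumes a fixed cost per film and fixed revenues for hits and misses; the function name and all monetary figures are hypothetical and chosen only to show the mechanics.

```python
def profit_curve(scores, y_true, cost, revenue_hit, revenue_miss):
    """Cumulative profit and RoI from funding the k highest-scored films, for each k.

    Assumes every film costs `cost`, every true hit returns `revenue_hit`
    and every other film returns `revenue_miss` (population averages).
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    curve, profit = [], 0.0
    for k, (_, is_hit) in enumerate(ranked, start=1):
        profit += (revenue_hit if is_hit else revenue_miss) - cost
        curve.append({"funded": k, "profit": profit, "roi": profit / (k * cost)})
    return curve


# Toy scores and outcomes, invented for illustration only.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1]
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
curve = profit_curve(scores, y_true, cost=10, revenue_hit=40, revenue_miss=5)

best_roi = max(curve, key=lambda point: point["roi"])
best_profit = max(curve, key=lambda point: point["profit"])
print(best_roi["funded"], best_profit["funded"])
```

In this toy data the RoI-maximising cut-off funds fewer films than the profit-maximising one, which is the same qualitative pattern as the 15% versus 35% result.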
Analysis of feature importance provides a useful guide for investors as to which attributes are correlated with high profitability:
[Figure panels: “Good” Features, “Bad” Features]
Visualisation of sample decision trees can help stakeholders interpret model predictions
Deployment Considerations:
■ Accuracy of source data (esp. on budgets, and for older films)
■ Model trained on pre-Covid data
■ Expected value assumes fixed cost and revenue per film
■ Profit curve ignores sequencing -> too many FPs on big-budget movies could lead to bankruptcy!
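The sequencing risk can be made concrete with a small Monte Carlo simulation. This is a rough sketch under strong assumptions that are not in the deck: films are funded one at a time from a single bankroll, a false positive loses its full cost, a true positive returns a fixed payoff immediately, and each pick succeeds independently with probability equal to the model's precision. All names and figures are hypothetical.

```python
import random


def ruin_probability(bankroll, cost, payoff, precision, n_films, n_trials=2000, seed=0):
    """Estimate the chance that capital runs out before n_films have been funded."""
    rng = random.Random(seed)
    ruined = 0
    for _ in range(n_trials):
        # Draw all outcomes up front so trials are comparable across bankrolls.
        outcomes = [rng.random() < precision for _ in range(n_films)]
        capital = bankroll
        for is_hit in outcomes:
            if capital < cost:
                ruined += 1  # cannot afford the next film
                break
            capital -= cost
            if is_hit:
                capital += payoff
    return ruined / n_trials


# Hypothetical figures: each film costs 10 and a hit returns 40.
thin = ruin_probability(bankroll=20, cost=10, payoff=40, precision=0.74, n_films=20)
deep = ruin_probability(bankroll=100, cost=10, payoff=40, precision=0.74, n_films=20)
print(thin, deep)
```

Even with the same precision, a thin bankroll can be wiped out by a short run of false positives, which the expected-value view does not capture.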
Opportunities for Enhancement:
■ Obtain more data
■ Refine multi-class model
■ Additional dimensionality reduction
