“NOBODY KNOWS ANYTHING”*
Movie Investment Decision Analysis
* Except Us
Why are we here?
The film industry has a serious business problem
■ Successful investing in movie production is hard: upwards of 50% of all films lose money for their
backers (i.e. financially backing films is no better than random chance)
■ However, plenty of public data is available: www.themoviedb.org has millions of datapoints
on over 600k movies (and 1.9m people in the film industry)
■ Improving decision-making outcomes in this industry is a business problem that is ripe for DATA
ANALYTICS
■ We propose a supervised learning approach to generate binary classifications of profitability,
using only information known at the pitch stage (i.e. pre-production)
■ Predictive accuracy of > 80% is achievable using our proprietary model*
*Based on model backtesting on a randomly sampled 20% holdout set of films released from 1960 to 2017
Data
We obtained raw data on ~5000 films released from 1960 to 2017 from public sources
Detailed descriptive data on Cast, Crew, Genre, Financials, Production Company, Language, Filming Location and more were extracted
Substantial data pre-processing and cleaning was performed
EDA + Feature Engineering
We performed extensive exploratory data analysis in order to form hypotheses and identify potentially important features
We also constructed many novel features from the raw data, for example the average revenue and profitability of previous movies for each key cast and crew member
The final result was a clean, scaled, binarized dataset of 4803 rows, with 346 predictor columns (mostly one-hot encoded)
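The "previous movies" feature described above can be sketched as follows. This is a minimal illustration, not the deck's actual pipeline: the record layout (keys 'title', 'year', 'revenue', 'cast') and the function name are assumptions made for the example. The point it shows is that only films released earlier may contribute, so the feature would be known at the pitch stage.

```python
from collections import defaultdict


def prior_average_revenue(films):
    """Average, over a film's cast, of each member's mean revenue on EARLIER films.

    `films` is a list of dicts with hypothetical keys 'title', 'year',
    'revenue' and 'cast'. Returns {title: feature value, or None if no
    cast member has any earlier film}.
    """
    history = defaultdict(list)  # person -> revenues of films already processed
    result = {}
    for film in sorted(films, key=lambda f: f["year"]):
        # Only films processed before this one count, so the feature
        # never sees the revenue of the film it describes.
        past = [sum(history[p]) / len(history[p]) for p in film["cast"] if history[p]]
        result[film["title"]] = sum(past) / len(past) if past else None
        for person in film["cast"]:
            history[person].append(film["revenue"])
    return result


# Toy data, invented for illustration only.
films = [
    {"title": "A", "year": 2000, "revenue": 100.0, "cast": ["X", "Y"]},
    {"title": "B", "year": 2005, "revenue": 300.0, "cast": ["X"]},
    {"title": "C", "year": 2010, "revenue": 50.0, "cast": ["X", "Z"]},
]
features = prior_average_revenue(films)
print(features)
```

Films released in the same year are processed in input order here; a real pipeline would probably want to sort by full release date.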
Modelling
We explored the effectiveness of predictive modelling along 2 dimensions:
1) Problem framing: both binary (Positive Class = top quartile RoI) and multi-class (“Hit” = top quartile RoI, “Loss” = RoI < 0, “Neutral” = everything in the middle) classification
2) Algorithm: Decision Trees, Logistic Regression, Support Vector Machines, K-Nearest Neighbors, Random Forest and Bagged Decision Trees were evaluated
Grid searching was employed for hyper-parameter tuning, and models were evaluated via K-fold cross-validation on the training data, primarily on classification accuracy and ROC AUC, before final testing on a hold-out data set.
The most effective models were tree-based: Random Forest and Bagged Decision Trees were the highest-performing models.
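The two target definitions above can be illustrated with a short sketch. The deck does not state how RoI is computed; the code assumes RoI = (revenue - budget) / budget, which is consistent with "Loss" meaning RoI < 0. The function names and sample values are invented for the example.

```python
import statistics


def roi(budget, revenue):
    # Assumed definition (not given in the deck): net profit over budget.
    return (revenue - budget) / budget


def label_films(rois):
    """Return (top-quartile cut-off, binary labels, multi-class labels)."""
    # Third quartile of the observed RoI values is the "top quartile" boundary.
    q3 = statistics.quantiles(rois, n=4)[2]
    binary = [1 if r >= q3 else 0 for r in rois]
    multi = ["Hit" if r >= q3 else "Loss" if r < 0 else "Neutral" for r in rois]
    return q3, binary, multi


# Hypothetical RoI values for eight films.
rois = [-0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 5.0]
q3, binary, multi = label_films(rois)
print(q3, binary, multi)
```

In practice the cut-off would presumably be computed on the training split only and then reused for the hold-out set, so that the labels do not depend on test data.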
Evaluation – Multiclass Classification
Performance on the multi-class classification problem was weak:
In both of these (best-performing!) models the classifier struggled to identify the majority of the target class (“Hit”), or to differentiate between “Hits”, “Losses” and the majority class of “Neutral” (approx. 60% of the data set)
We posit that this is because the boundaries between classes are hard and somewhat artificial (e.g. RoI = 2.905 is a “Hit” but RoI = 2.895 is “Neutral”) and as such do not represent natural clusters in the decision space
This level of performance is unlikely to be useful to the business end user.
[Figures: results for the Bagged Decision Tree and Random Forest models]
Evaluation – Binary Classification
Model performance on the binary classification problem, with the boundary of the positive class set at top-quartile return on investment (RoI), was highly satisfactory, however. Random Forest was the top-performing model evaluated.
The tuned RF model scored well on the most important criteria for the business problem:
■ Precision = TP / (TP + FP) = when we predict a movie will be profitable, we are right ~74% of the time
■ ROC AUC = degree of separability between predictions of the two classes = the average positive example has only ~19% of negative examples scored higher than it
■ Lift = ratio of results obtained with and without the model
For our specific business problem we are less concerned with recall (what proportion of the actual positives we identify), as “passing” on a movie that turns out to be profitable has only an opportunity cost, not a real cost
These results should be considered in the context of the business problem, e.g. >50% of movies made lose money, and the base rate of “profitability” (as defined here) in our population is only 25%
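The three criteria above can be computed from labels and model scores as in the sketch below. The lift formula used here (precision divided by the base rate of positives) is one common reading of "results with and without the model" and is an assumption, as are the toy labels and scores. Under that reading, the reported ~74% precision against a 25% base rate would correspond to a lift of roughly 3.

```python
def precision(y_true, y_pred):
    # TP / (TP + FP): how often a predicted "profitable" is actually profitable.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


def roc_auc(y_true, scores):
    # Probability that a random positive is scored above a random negative
    # (ties count as half a win).
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def lift(y_true, y_pred):
    # Assumed definition: precision of the model's picks over the base rate.
    return precision(y_true, y_pred) / (sum(y_true) / len(y_true))


# Toy example with a 25% base rate, invented for illustration.
y_true = [1, 0, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.2, 0.7, 0.4, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

p = precision(y_true, y_pred)
auc = roc_auc(y_true, scores)
l = lift(y_true, y_pred)
print(p, auc, l)
```

Read this way, "only ~19% of negative examples scored higher" corresponds to an AUC of about 0.81.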
Business Use + Conclusions
Using population averages for film production cost and revenue we can evaluate the model in an expected value framework to assess potential real-world business impact:
■ Funding the top 15% of movies we see, as ranked by our tuned RF binary classifier, would result in optimal expected return on investment (total expected profit / total expected investment)
■ Funding the top 35% of movies would result in slightly higher total profit, but at a cost of significantly higher capital required for investment
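The trade-off between the two funding cut-offs can be sketched as a simple profit curve. This is an illustration under stated assumptions rather than the deck's calculation: every film is given the same fixed cost, hits and non-hits each get a fixed revenue, and all numbers and names below are hypothetical.

```python
def profit_curve(y_true, scores, cost, revenue_hit, revenue_miss):
    """Cumulative profit and RoI from funding the top-k ranked films, for each k.

    Returns a list of (fraction funded, total profit, profit / capital invested).
    """
    # Rank films from highest to lowest model score.
    ranked = [t for _, t in sorted(zip(scores, y_true), key=lambda pair: -pair[0])]
    curve, profit = [], 0.0
    for k, is_hit in enumerate(ranked, start=1):
        profit += (revenue_hit if is_hit else revenue_miss) - cost
        curve.append((k / len(ranked), profit, profit / (k * cost)))
    return curve


# Hypothetical labels, scores and per-film economics.
y_true = [1, 0, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.2, 0.7, 0.4, 0.1, 0.05]
curve = profit_curve(y_true, scores, cost=10.0, revenue_hit=50.0, revenue_miss=5.0)

best_roi = max(curve, key=lambda row: row[2])     # best return per unit of capital
best_profit = max(curve, key=lambda row: row[1])  # largest absolute profit
print(best_roi, best_profit)
```

In this toy data the best RoI comes from funding a smaller share of films than the share that maximises total profit, which mirrors the 15% versus 35% pattern described above.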
Analysis of feature importance provides a useful guide to investors as to what attributes are correlated with high profitability:
[Figures: “Good” Features; “Bad” Features]
Visualisation of sample decision trees can help stakeholders interpret model predictions
Deployment Considerations:
■ Accuracy of source data (esp. on budgets, and for older films)
■ Model trained on pre-Covid data
■ Expected value assumes fixed cost and revenue per film
■ Profit curve ignores sequencing -> too many FPs on big-budget movies could lead to bankruptcy!
Opportunities for Enhancement:
■ Obtain more data
■ Refine multi-class model
■ Additional dimensionality reduction
