Presciient Training
The Zen of Predictive Modelling
Eugene Dubossarsky
eugene@presciient.com
+61414573322
@cargomoose
What This Talk Isn’t About
But worth mentioning anyway:
R and the Sydney Users of R Forum (SURF)
Analyst First
My Courses
Sydney Users of R Forum
• Just 1 shy of 500 members
• Regular meetups
• Study groups: introduction to R, “Machine Learning for Hackers”, “Elements of Statistical Learning”
R
• Do a Google image search for “ggplot2”
• Search for “r4stats” and “popularity”
• Join SURF
• Download R and start using it.
Analyst First
• Strategic, cultural, organisational and human issues in analytics
• Making analytics work in organisations
• Focus on the Human side of analytics
• International: Australia, NZ, Singapore, US, Japan, India, Hong Kong
• analystfirst.com: see “core principles” and “what is analyst first?”
My Analytics Training Courses
• Predictive Modelling, Data Mining, R, Forensic Analytics, Visualisation and Forecasting training courses
• Sydney, Melbourne, Canberra, Singapore
• Public and in-house
• Pre-prepared or customised
• Informal coaching/mentoring
• Strategy, review, advice and assistance with analytics capability development in your organisation
The Zen of Predictive Modelling
PredictiveModels
• The Most Important Part of My “Predictive Modelling and Data Mining Course”
• What every user of predictive modelling should know
• What every manager and owner of predictive modelling capability must know
• “Open Secrets” known to the masters
The Zen of Predictive Modelling
• To save people time
• To see the forest for the trees
• To get real value out of predictive analytics
The Right Point of View
Which is unlike the other two?
• Kohonen neural network
• Backpropagation neural network
• CART decision tree
The Right Point of View
Which is unlike the other two?
• CART decision tree
• Random Forest
• Support Vector Machine
The Right Point of View
Which is unlike the other two?
• Backpropagation Neural Network
• Linear Model
• CART Decision Tree
The Right Point of View
• Out Of Sample Accuracy
• Robustness (Out of Time Accuracy)
• Interpretability
• Implementability
The Right Point of View
Why build predictive models?
• Insights
• Operational prediction
• “What-if” analysis
What Do All Predictive Models Have in Common?
All Predictive Models:
• Have a training set of predictors and outcomes
• Probably have a cross-validation and test set of predictors and outcomes too.
• Are “fit” (optimised) to minimise an error function between their predicted outcomes and the target outcomes
• Are probably cross-validated to control overfitting on an out-of-sample data set
• Provide information on the relationship between the predictors and outcomes in the data
• Can be used to score new data (make new predictions)
• Can be deployed in IT systems
• Can be interrogated for insights
• Are only as accurate as the data allows
• Provide a (fairly) accurate estimate of how well they will predict on new data
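The train/test workflow above can be sketched in a few lines of plain Python (standard library only; the deck itself works in R). The synthetic data, the 80/20 split and the one-predictor least-squares line are illustrative assumptions, and any other model family could stand in for `fit_line`. The model is fitted on the training rows only; the held-out test rows are then scored once to estimate accuracy on new data.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def rmse(model, xs, ys):
    """Root mean squared error of the fitted line on the given rows."""
    intercept, slope = model
    sq = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return (sum(sq) / len(sq)) ** 0.5

# Hypothetical synthetic data: y = 2x + 1 plus unit-variance noise.
rng = random.Random(42)  # fixed seed so the split is reproducible
xs = [rng.uniform(0, 10) for _ in range(500)]
data = [(x, 2.0 * x + 1.0 + rng.gauss(0, 1.0)) for x in xs]
rng.shuffle(data)

# 80% of rows for fitting, 20% held out and scored only once.
cut = int(0.8 * len(data))
train, test = data[:cut], data[cut:]
train_x, train_y = zip(*train)
test_x, test_y = zip(*test)

model = fit_line(train_x, train_y)
train_rmse = rmse(model, train_x, train_y)
test_rmse = rmse(model, test_x, test_y)
print(f"train RMSE {train_rmse:.3f}, test RMSE {test_rmse:.3f}")
```

Training error alone tends to flatter a model; with a flexible family, a training RMSE far below the test RMSE is the usual sign of overfitting. Cross-validation repeats the split over several folds so the estimate depends less on one particular split.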
What Do All Predictive Model Insights Have in Common?
All Predictive Models:
• Have variable importance measures (a number of which can be applied to any model)
• Allow plotting predictors vs outcomes
• Have variable accuracy measures
• Can be resampled for more robust measures of accuracy
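Permutation importance is one variable importance measure that works with any model, because it only needs predictions: shuffle one predictor's column in held-out data and see how much the error grows. A minimal Python sketch, assuming a hypothetical k-nearest-neighbour regressor as the black-box model and synthetic data in which only predictor 0 carries signal:

```python
import random

def knn_predict(train, x, k=10):
    """Average outcome of the k training rows nearest to x (squared Euclidean distance)."""
    nearest = sorted(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, rows):
    """Mean squared error of the k-NN model on the given (x, y) rows."""
    return sum((y - knn_predict(train, x)) ** 2 for x, y in rows) / len(rows)

def permutation_importance(train, test, n_features, n_repeats=5, seed=0):
    """Increase in test MSE when one predictor's column is shuffled, averaged over repeats."""
    rng = random.Random(seed)
    baseline = mse(train, test)
    scores = []
    for j in range(n_features):
        increases = []
        for _ in range(n_repeats):
            column = [x[j] for x, _ in test]
            rng.shuffle(column)
            # Rebuild each predictor tuple with column j replaced by the shuffled values.
            shuffled = [(x[:j] + (v,) + x[j + 1:], y) for (x, y), v in zip(test, column)]
            increases.append(mse(train, shuffled) - baseline)
        scores.append(sum(increases) / n_repeats)
    return scores

# Hypothetical data: two predictors, but the outcome depends only on predictor 0.
rng = random.Random(3)
rows = []
for _ in range(400):
    x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    rows.append((x, 3.0 * x[0] + rng.gauss(0, 0.3)))
train, test = rows[:300], rows[300:]

scores = permutation_importance(train, test, n_features=2)
print(scores)  # large increase for predictor 0, close to zero for predictor 1
```

Averaging over repeated shuffles is a simple form of the resampling mentioned above; a larger `n_repeats` gives a steadier estimate at the cost of more predictions.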
What Do All Predictive Model Predictions Have in Common?
All Predictive Models:
• Make predictions that are numeric: estimates of amount for regression, and probability for classification
• All predictions are applications of the underlying model structure and parameters (formula) to new predictor data sets
• All predictions are deterministic. Once a model is fitted, the predictions for a given record will be the same every time. (The prediction may, however, be a distribution rather than a fixed point. Note also that model fitting itself may be random: some models may differ slightly each time they are fitted to the same data set.)
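To illustrate randomised fitting versus deterministic prediction, here is a hedged Python sketch of a bagged linear model. Bootstrap resampling is also the mechanism behind random forests, but this is not a random forest; the data, `fit_bagged` and its parameters are hypothetical. Fitting with different seeds gives slightly different models, while a fitted model scores the same record identically every time.

```python
import random

def fit_line(pairs):
    """Least-squares line through (x, y) pairs: returns (intercept, slope)."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    slope = sxy / sxx
    return my - slope * mx, slope

def fit_bagged(pairs, n_bags=25, seed=None):
    """Randomised fitting: average the lines fitted to bootstrap resamples."""
    rng = random.Random(seed)
    fits = [fit_line([rng.choice(pairs) for _ in pairs]) for _ in range(n_bags)]
    intercept = sum(a for a, _ in fits) / n_bags
    slope = sum(b for _, b in fits) / n_bags
    return intercept, slope

def predict(model, x):
    """Scoring is plain arithmetic on the fitted parameters: no randomness."""
    intercept, slope = model
    return intercept + slope * x

# Hypothetical data: y = 2x + 1 plus noise.
data_rng = random.Random(0)
xs = [data_rng.uniform(0, 10) for _ in range(200)]
pairs = [(x, 2.0 * x + 1.0 + data_rng.gauss(0, 1.0)) for x in xs]

model_a = fit_bagged(pairs, seed=1)
model_b = fit_bagged(pairs, seed=2)  # same data, different random resamples
print(model_a, model_b)              # slightly different parameters
print(predict(model_a, 5.0) == predict(model_a, 5.0))  # True: a fitted model is deterministic
```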
How Do Predictive Model Families Differ?
• Classification vs Regression (most families can do both)
• Predictive accuracy vs insights
• Predictive accuracy vs stability
• Deterministic fitting vs randomised fitting
• Specific insights
• Structure and complexity
• Model assumptions (linear models, neural nets)
• Model structure (trees vs additive models vs SVM vs Neural Nets etc)
• The kinds of insights models provide
• Tendency to overfit (most, but not all)
• Dependence on metrics
• Sensitivity to missing values and categorical variables
Becoming a Master of Modelling Kung Fu
• Predictive models should initially be thought of as a “black box”, recognising the characteristics that all models have in common
• The focus should be on the data, not the model.
• Focusing on the specific characteristics of the model matters when deciding on the degree of accuracy and the kinds of insights desired.
• It is good to start by working with one highly accurate, simple-to-use method (randomForest is a good choice) and one or two highly interpretable models (rpart decision trees and (generalised) linear models are good here).
• In fact, you can go a long way with just randomForest alone.
Becoming a Master of Modelling Kung Fu
• Master an adequate tool.
• Empty your mind of the tool. It is an illusion.
• Meditate on the data.
Meditating on Data
• Start with a highly accurate, nonparametric model you are comfortable with.
• The accuracy of a highly accurate method is close to the theoretical limit of accuracy possible on the data. World-class experts may get closer, but not a whole lot closer.
• So once you build the model, forget about the specific family you used. It is just a tool.
• Each predictor may provide a unique amount of predictability to the model. Measure it.
• Each predictor may be masked by other predictors. Be careful.
• Check the relationships between the data and the strongest predictors.
Meditating on Data
• There are at least 3 ways that a predictor can be important. They are not the same:
• What is the unique contribution of the predictor to the accuracy of the model?
• What is the individual predictive power of the predictor alone?
• How vital is the predictor to the structure of a particular model?
• The first two are about the data; the third is more about the specific model. Which is more important?
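The first two notions of importance, and the masking effect, can be seen with two strongly correlated predictors. This plain-Python sketch assumes linear models and synthetic data in which only `x1` drives the outcome, while `x2` is a noisy copy of `x1`. Individual power is taken as the R² of a one-predictor model; unique contribution is the drop in R² when the predictor is removed from the two-predictor model, computed here with the standard closed form for two predictors. The third notion, importance to a particular model's structure, depends on the model family and is not shown.

```python
import random

def corr(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

# Hypothetical data: x2 is a noisy copy of x1, and only x1 drives the outcome.
rng = random.Random(7)
n = 2000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [a + rng.gauss(0, 0.3) for a in x1]
y = [a + rng.gauss(0, 0.5) for a in x1]

r_y1, r_y2, r_12 = corr(y, x1), corr(y, x2), corr(x1, x2)

# Individual predictive power: R^2 of a one-predictor linear model is the squared correlation.
solo_x1, solo_x2 = r_y1 ** 2, r_y2 ** 2
# R^2 of the linear model using both predictors (closed form for exactly two predictors).
full = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
# Unique contribution: R^2 lost when the predictor is dropped from the two-predictor model.
unique_x1, unique_x2 = full - solo_x2, full - solo_x1

print(f"solo:   x1 {solo_x1:.3f}  x2 {solo_x2:.3f}")
print(f"unique: x1 {unique_x1:.3f}  x2 {unique_x2:.3f}")
```

Both predictors look strong on their own, yet `x2` adds almost nothing once `x1` is present, and `x1`'s unique contribution is only a small fraction of its individual power because `x2` masks it.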
The Predictive Modelling Master’s Data Meditation
• Start with a highly accurate, nonparametric model you are comfortable with.
• The accuracy of a highly accurate method is close to the theoretical limit of accuracy possible on the data. World-class experts may get closer, but not a whole lot closer.
• So once you build the model, forget about the specific family you used. It is just a tool.
• Measure model accuracy on out-of-sample data. Pay attention to any imbalances in accuracy across classes or data subsets.
• Measure model stability if necessary (it almost always is)
• Measure the importance of all variables, using the three main techniques.
• Measure again, holding some of the main predictors constant
• Measure (visualise) the effects of each predictor
• Build an interpretable model to help tell the story
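Measuring (visualising) the effect of a predictor can be done with a partial dependence curve, which, like permutation importance, needs only a model's predictions. In this minimal Python sketch, `predict` is a hypothetical stand-in for any fitted model: for each grid value, predictor `j` is set to that value in every record and the predictions are averaged. Plotting the returned values against the grid gives the effect plot.

```python
def partial_dependence(predict, rows, j, grid):
    """For each grid value v, set predictor j to v in every row and average the predictions."""
    curve = []
    for v in grid:
        preds = [predict(row[:j] + (v,) + row[j + 1:]) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve

def predict(row):
    """Hypothetical stand-in for a fitted model's predict function."""
    x0, x1 = row
    return 2.0 * x0 + x1 ** 2

rows = [(0.0, -1.0), (1.0, 0.0), (2.0, 1.0), (3.0, 2.0)]
print(partial_dependence(predict, rows, 0, [0.0, 1.0, 2.0]))  # [1.5, 3.5, 5.5]
```

Here the curve for predictor 0 rises by 2 per unit, recovering the linear term. Partial dependence can mislead when the chosen predictor is strongly correlated with others, because it evaluates the model on combinations of values that may never occur in the data.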
The Master Sharpens the Sword: Getting More Accuracy
• There is never enough data
• Trying other model families can yield some extra accuracy, but usually not much, and it is not the best use of time, though for some reason it is the favourite activity of new data miners.
• Tweaking model parameters can yield a little more accuracy. This is perhaps less of a waste of time, but still not the ideal focus.
• The most dramatic improvement in model accuracy comes from new predictors.
• New predictors may be entirely new data sets, or complex new transformations of existing data.
• A large, multi-tabular data set may well contain information that has not yet been captured in the modelling data.
• The most common information of this type involves relations between individual records (e.g. time-series windows, geographic neighbourhoods or social network statistics per record).
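A common way to turn relations between records into new predictors is a trailing time window per record. A minimal Python sketch, assuming one entity's records are already sorted by time; `trailing_window_features` and the `daily_spend` series are hypothetical. The window excludes the current record, so each feature uses only information available before that record.

```python
from collections import deque

def trailing_window_features(values, window):
    """For each record, summarise up to `window` records that precede it (not itself)."""
    recent = deque(maxlen=window)  # oldest value drops out automatically
    features = []
    for v in values:
        if recent:
            features.append({"prev_mean": sum(recent) / len(recent),
                             "prev_max": max(recent),
                             "n_prev": len(recent)})
        else:
            # No history yet for the first record.
            features.append({"prev_mean": None, "prev_max": None, "n_prev": 0})
        recent.append(v)
    return features

daily_spend = [10.0, 12.0, 8.0, 30.0, 11.0]  # hypothetical per-day records for one customer
feats = trailing_window_features(daily_spend, window=3)
print(feats[3])  # {'prev_mean': 10.0, 'prev_max': 12.0, 'n_prev': 3}
```

The same pattern extends to geographic neighbourhoods or social-network neighbours: replace "the previous records" with "the records related to this one" and summarise them.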
Illusions On the Path
• Colossal wastes of time can include:
• Trying to find the “right” model family
• Getting stuck in data preprocessing trying to get all the predictors “right”
• Trying to figure out what the targets should be (usually a sign that the business problem is not well understood)
• Trying to “improve” the model without defining what that means
The Sun Tzu of Modelling: Be Prepared
• Know what you are modelling and for what purpose.
• Know what your target variable is. You may have more than one.
• Do not hesitate: model with what you have, and add more predictors later.
• Messy data is better than no data
• Use the right error measures
• Know the connection between the model and your business
• Evaluate and interrogate the model accordingly
• Always question the business value of the analysis
• Always be ready to suggest the business use of the analysis
• Don’t assume that the client understands what to do with the model
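Choosing the right error measure matters most when classes are imbalanced, and it ties back to checking accuracy across classes. This small Python sketch uses made-up labels: a “model” that never predicts the rare class still scores 95% accuracy, while per-class recall (and balanced accuracy, their mean) exposes it.

```python
def per_class_recall(actual, predicted):
    """Fraction of each true class that the model labelled correctly."""
    recall = {}
    for c in sorted(set(actual)):
        hits = [p == c for a, p in zip(actual, predicted) if a == c]
        recall[c] = sum(hits) / len(hits)
    return recall

actual = [0] * 95 + [1] * 5  # hypothetical labels: 5% positives (e.g. churners)
predicted = [0] * 100        # a "model" that always predicts the majority class

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
recall = per_class_recall(actual, predicted)
balanced_accuracy = sum(recall.values()) / len(recall)

print(accuracy, recall, balanced_accuracy)  # 0.95 {0: 1.0, 1: 0.0} 0.5
```

Which measure is “right” depends on the business use: if missing a churner is costly, recall on the rare class matters far more than overall accuracy.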
Strategy and Tactics
• Why are you (re)building the model?
• If strategic: what is going to be done with the insights? By whom?
• If Operational: what are the key metrics – accuracy, value, deployability?
Questions?
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 

Recently uploaded (20)

Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 

The zen of predictive modelling

  • 1. Presciient Training The Zen of Predictive Modelling Eugene Dubossarsky eugene@presciient.com +61414573322 @cargomoose
  • 2. What This Talk Isn’t About But worth mentioning anyway: R and The Sydney Users of R Forum Analyst First My Courses
  • 3. Sydney Users of R Forum • Just 1 shy of 500 members • Regular meetups • Study groups: introduction to R, “Machine Learning for Hackers”, “Elements of Statistical Learning”
  • 4.
  • 5. R • Do a Google image search for “ggplot2” • Look for “r4stats”, “popularity” • Join SURF • Download R and start using it.
  • 6.
  • 7. Analyst First • Strategic, Cultural, Organisational, Human issues in analytics • Making analytics work in organisations • Focus on the Human side of analytics • International: Aust, NZ, Singapore, US, Japan, India, Hong Kong • analystfirst.com – see “core principles” and “what is analyst first?”
  • 8. My Analytics Training Courses • Predictive Modelling, Data Mining, R, Forensic Analytics, Visualisation, Forecasting training courses • Sydney, Melbourne, Canberra, Singapore • Public and in-house • Pre-prepared or customised • Informal coaching/mentoring • Strategy, Review, Advice and Assistance with Analytics Capability Development in your organisation
  • 9. The Zen of Predictive Modelling PredictiveModels • The Most Important Part of My “Predictive Modelling and Data Mining Course” • What every user of predictive modelling should know • What every manager and owner of predictive modelling capability must know • “Open Secrets” known to the masters
  • 10. The Zen of Predictive Modelling • To save people time • To see the forest for the trees • To get real value out of predictive analytics
  • 11. The Right Point of View: which is unlike the other two? • Kohonen neural network • Backpropagation neural network • CART decision tree
  • 12. The Right Point of View: which is unlike the other two? • CART decision tree • Random Forest • Support Vector Machine
  • 13. The Right Point of View: which is unlike the other two? • Backpropagation Neural Network • Linear Model • CART Decision Tree
  • 14. The Right Point of View • Out-of-Sample Accuracy • Robustness (Out-of-Time Accuracy) • Interpretability • Implementability
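A minimal Python sketch of why the first two criteria are listed separately. It assumes synthetic data in which the x-y relationship drifts over time; the drift, the 200-period cutoff and the helper names `fit_line` and `mae` are illustrative assumptions, not anything from the talk. The model is trained on a random part of the earlier periods, then scored on a random holdout from those same periods (out-of-sample) and on the later periods (out-of-time).

```python
import random

# Hypothetical helpers for illustration only.
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (one predictor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def mae(model, rows):
    """Mean absolute error of the line on (t, x, y) rows."""
    a, b = model
    return sum(abs(y - (a + b * x)) for _, x, y in rows) / len(rows)

# Synthetic data (an assumption): the slope drifts upward with time t.
random.seed(42)
rows = []
for t in range(300):
    x = random.uniform(0, 10)
    y = (2 + 0.01 * t) * x + random.gauss(0, 1)
    rows.append((t, x, y))

history = [r for r in rows if r[0] < 200]   # periods available at build time
future = [r for r in rows if r[0] >= 200]   # periods after deployment
random.shuffle(history)
train, holdout = history[:140], history[140:]

model = fit_line([x for _, x, _ in train], [y for _, _, y in train])
in_sample = mae(model, train)
out_of_sample = mae(model, holdout)   # random holdout, same time span
out_of_time = mae(model, future)      # later periods: tests robustness
```

With drift like this the random holdout typically looks much better than the later periods, which is why a good out-of-sample score on its own does not establish robustness.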
  • 17. The Right Point of View: why build predictive models? • Insights • Operational prediction • “What-if” analysis
  • 18. What Do All Predictive Models Have in Common? All Predictive Models: • Have a training set of predictors and outcomes • Probably have a cross-validation and test set of predictors and outcomes too • Are “fit” (optimised) to minimise an error function between their predictions and the target outcomes • Are probably cross-validated on an out-of-sample data set to control overfitting • Provide information on the relationship between the predictors and outcomes in the data • Can be used to score new data (make new predictions) • Can be deployed in IT systems • Can be interrogated for insights • Are only as accurate as the data allows • Provide a (fairly) accurate estimate of how well they will predict on new data
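To make the fit, cross-validate and score cycle concrete, here is a minimal Python sketch. It swaps in 1-D k-nearest-neighbour regression as the model purely because it fits in a few lines of standard-library code; the data are synthetic and `knn_predict` and `cv_error` are hypothetical helper names. The neighbour count k controls overfitting, and 5-fold cross-validation is used to choose it.

```python
import math
import random

# Hypothetical helpers: 1-D k-nearest-neighbour regression plus k-fold CV.
def knn_predict(train, x, k):
    """Average the outcomes of the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def cv_error(data, k, n_folds=5):
    """Mean squared error of k-NN, estimated by n_folds-fold cross-validation."""
    total = 0.0
    for f in range(n_folds):
        held_out = data[f::n_folds]                    # every n_folds-th record
        train = [p for i, p in enumerate(data) if i % n_folds != f]
        total += sum((y - knn_predict(train, x, k)) ** 2 for x, y in held_out)
    return total / len(data)                           # each record scored once

# Synthetic training set (an assumption): a smooth signal plus noise.
random.seed(1)
data = []
for _ in range(200):
    x = random.uniform(0, 10)
    data.append((x, math.sin(x) + random.gauss(0, 0.5)))

# k = 1 memorises the noise (overfits); CV picks a k that generalises better.
errors = {k: cv_error(data, k) for k in (1, 5, 15, 50)}
best_k = min(errors, key=errors.get)

# Refit on all the data with the chosen k and score a new record.
new_prediction = knn_predict(data, 2.5, best_k)
```

The same pattern applies to any model family: only `knn_predict` would change.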
  • 19. What Do All Predictive Model Insights Have in Common? All Predictive Models: • Have variable importance measures (a number of which can be applied to any model) • Allow plotting predictors vs outcomes • Have variable accuracy measures • Can be resampled for more robust measures of accuracy
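Permutation importance is one variable importance measure that can be applied to any model, because it only needs the model's predictions. A minimal sketch, assuming the model is available as a plain `predict(row)` function; `permutation_importance` and `black_box` are hypothetical names and the data are synthetic:

```python
import random

def permutation_importance(predict, rows, ys, n_repeats=10, seed=0):
    """Mean increase in squared error when one predictor column is shuffled.

    `predict` is any fitted model exposed as a function row -> prediction,
    so the measure does not depend on the model family (hypothetical helper).
    """
    rng = random.Random(seed)

    def mse(data):
        return sum((y - predict(r)) ** 2 for r, y in zip(data, ys)) / len(ys)

    baseline = mse(rows)
    importances = []
    for j in range(len(rows[0])):
        increase = 0.0
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between predictor j and the outcome
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            increase += mse(shuffled) - baseline
        importances.append(increase / n_repeats)
    return importances

# Synthetic data (an assumption): only the first predictor drives the outcome.
random.seed(3)
rows = [(random.uniform(0, 1), random.uniform(0, 1)) for _ in range(500)]
ys = [3 * a + random.gauss(0, 0.1) for a, _ in rows]

def black_box(row):
    """Stand-in for any fitted model; it happens to ignore row[1]."""
    return 3 * row[0]

importances = permutation_importance(black_box, rows, ys)
```

In practice the error would be measured on out-of-sample data, as with the other accuracy measures in this talk.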
  • 20. What Do All Predictive Model Predictions Have in Common? All Predictive Models: • Make predictions that are numeric: estimates of amount for regression, and probability for classification • Make predictions by applying the underlying model structure and parameters (formula) to new predictor data sets • Make deterministic predictions. Once a model is fitted, the prediction for a given record will be the same every time. (Though the prediction may be a distribution rather than a fixed point. Also, note that model fitting itself may be random: some models may differ slightly each time they are fitted to the same data set.)
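A small sketch of the distinction between randomised fitting and deterministic scoring. It uses bagged straight-line fits as an assumed example of a model whose fitting relies on random resampling; `fit_line`, `fit_bagged`, `predict` and the synthetic data are hypothetical:

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (hypothetical helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def fit_bagged(xs, ys, n_models=25, seed=0):
    """Randomised fitting: each member line is fitted to a bootstrap resample."""
    rng = random.Random(seed)
    n = len(xs)
    members = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        members.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return members

def predict(members, x):
    """Scoring applies a fixed formula to new data: no randomness here."""
    return sum(a + b * x for a, b in members) / len(members)

# Synthetic data (an assumption).
random.seed(7)
xs = [random.uniform(0, 10) for _ in range(100)]
ys = [1 + 2 * x + random.gauss(0, 1) for x in xs]

model_a = fit_bagged(xs, ys, seed=1)
model_b = fit_bagged(xs, ys, seed=2)   # same data, different random fit
```

`predict(model_a, 5.0)` returns the same value on every call, while `model_a` and `model_b` give slightly different answers for the same record.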
  • 21. How Do Predictive Model Families Differ? • Classification vs Regression (most families can do both) • Predictive accuracy vs insights • Predictive accuracy vs stability • Deterministic fitting vs randomised fitting • Specific insights • Structure and complexity • Model assumptions (linear models, neural nets) • Model structure (trees vs additive models vs SVM vs Neural Nets etc) • The kinds of insights models provide • Tendency to overfit (most families overfit, but not all) • Dependence on metrics • Sensitivity to missing values and categorical variables
  • 22. Becoming a Master of Modelling Kung Fu • Predictive models should initially be thought of as a “black box”, recognising the characteristics that all models have in common • The focus should be on the data, not the model. • Focusing on the specific characteristics of the model is important when deciding on the degree of accuracy desired and the kinds of insights desired. • It is good to start by working with one highly accurate, simple-to-use method (randomForest is a good choice) and one or two highly interpretable models (rpart decision trees and (generalised) linear models are good here). • In fact, you can go a long way with just randomForest alone.
  • 23. Becoming a Master of Modelling Kung Fu • Master an adequate tool. • Empty your mind of the tool. It is an illusion. • Meditate on the data.
  • 24. Meditating on Data • Start with a highly accurate, nonparametric model you are comfortable with. • The accuracy of a highly accurate method is close to the theoretical limit of accuracy possible on the data. World-class experts may get closer, but not a whole lot closer. • So once you build the model, forget about the specific family you used. It is just a tool. • Each predictor may provide a unique amount of predictability to the model. Measure it. • Each predictor may be masked by other predictors. Be careful. • Check the relationships between the strongest predictors and the outcome.
  • 25. Meditating on Data • There are at least 3 ways that a predictor can be important. They are not the same: • What is the unique contribution of the predictor to the accuracy of the model? • What is the individual predictive power of the predictor alone? • How vital is the predictor to the structure of a particular model? • The first two are about the data, the third is more about the specific model. Which is more important?
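The first two kinds of importance can be measured directly, and the gap between them is exactly the masking effect. A minimal sketch, assuming synthetic data in which `x1` and `x2` are near-copies of the same hidden driver and `x3` is an independent, weaker predictor. Ordinary least squares stands in for the model, and `fit_ols`, `mse` and `keep` are hypothetical helpers:

```python
import random

def fit_ols(rows, ys):
    """Least squares with an intercept, solving (X'X) b = X'y by Gauss-Jordan
    elimination. Hypothetical helper kept to the standard library."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    M = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * y for x, y in zip(X, ys))] for i in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda row: abs(M[row][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(p):
            if r != c:
                factor = M[r][c] / M[c][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]

def mse(coef, rows, ys):
    preds = [coef[0] + sum(b * v for b, v in zip(coef[1:], r)) for r in rows]
    return sum((y - q) ** 2 for y, q in zip(ys, preds)) / len(ys)

def keep(rows, cols):
    return [[r[j] for j in cols] for r in rows]

# Synthetic data (an assumption).
random.seed(0)
rows, ys = [], []
for _ in range(2000):
    z = random.gauss(0, 1)                 # hidden driver of the outcome
    x1 = z + random.gauss(0, 0.1)          # x1 and x2 are near-copies of z,
    x2 = z + random.gauss(0, 0.1)          # so each can mask the other
    x3 = random.gauss(0, 1)                # independent, weaker effect
    rows.append((x1, x2, x3))
    ys.append(z + 0.5 * x3 + random.gauss(0, 0.3))
train_x, train_y = rows[:1500], ys[:1500]
test_x, test_y = rows[1500:], ys[1500:]    # out-of-sample measurement

names = ["x1", "x2", "x3"]
full_error = mse(fit_ols(train_x, train_y), test_x, test_y)
unique, alone = {}, {}
for j, name in enumerate(names):
    others = [k for k in range(len(names)) if k != j]
    # 1. Unique contribution: how much worse is the model without this predictor?
    dropped = fit_ols(keep(train_x, others), train_y)
    unique[name] = mse(dropped, keep(test_x, others), test_y) - full_error
    # 2. Individual power: how good is this predictor on its own?
    single = fit_ols(keep(train_x, [j]), train_y)
    alone[name] = mse(single, keep(test_x, [j]), test_y)
# 3. Structural importance is specific to one fitted model (e.g. coefficient
#    size here, split counts in a tree) and is not computed in this sketch.
```

On data like this, `x1` should be strong on its own but add little once `x2` is present, while `x3` is weaker alone but cannot be replaced, so ranking predictors by only one of these measures tells a different story.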
  • 27. The Predictive Modelling Master’s Data Meditation • Start with a highly accurate, nonparametric model you are comfortable with. • The accuracy of a highly accurate method is close to the theoretical limit of accuracy possible on the data. World-class experts may get closer, but not a whole lot closer. • So once you build the model, forget about the specific family you used. It is just a tool. • Measure model accuracy on out-of-sample data. Pay attention to any imbalances in accuracy between classes or data subsets. • Measure model stability if necessary (it almost always is) • Measure the importance of all variables, using the three main techniques. • Measure again, holding some of the main predictors constant • Measure (visualise) the effects of each predictor • Build an interpretable model to help tell the story
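Two of the measurements above, class-level accuracy and stability, in a minimal sketch. The labels are hypothetical, and the bootstrap interval is just one simple way of gauging stability (an assumption, not the only approach):

```python
import random
from collections import defaultdict

def per_class_accuracy(y_true, y_pred):
    """Share of each true class that the model labels correctly (recall per class)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return {c: hits[c] / totals[c] for c in totals}

def bootstrap_accuracy_interval(y_true, y_pred, n_boot=200, seed=0):
    """Rough 95% interval for accuracy, from resampling the test records."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(sum(y_true[i] == y_pred[i] for i in idx) / n)
    scores.sort()
    return scores[int(0.025 * n_boot)], scores[int(0.975 * n_boot) - 1]

# Hypothetical out-of-sample labels: 90 records of class 0, 10 of class 1.
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 88 + [1] * 2 + [0] * 7 + [1] * 3

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)   # 0.91
by_class = per_class_accuracy(y_true, y_pred)   # class 1 is only 30% correct
low, high = bootstrap_accuracy_interval(y_true, y_pred)
```

An overall accuracy of 0.91 hides the fact that the minority class is mostly missed, and the bootstrap interval shows how much that headline figure could move on a different sample.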
  • 28. The Master Sharpens the Sword: Getting More Accuracy • There is never enough data • Trying other model families can yield some extra accuracy. Usually not much, and not the best use of time, though for some reason it is the favourite activity of new data miners. • Tweaking model parameters can yield a little more accuracy. This is perhaps less of a waste of time, but still not the ideal focus. • The most dramatic improvement in model accuracy comes from new predictors. • New predictors may be entirely new data sets, or complex new transformations of existing data. • A large, multi-tabular data set may well hold information that has not yet been captured in the modelling data. • The most common information of this type involves relations between individual records (e.g. time-series windows, geographic neighbourhoods or social network statistics per record).
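A sketch of the kind of relational predictor described above: a per-customer time-series window built from the records themselves. The field names `customer`, `day` and `amount`, the helper `add_window_features` and the 30-day window are all hypothetical:

```python
from collections import defaultdict, deque

def add_window_features(records, window_days=30):
    """For each record, summarise the same customer's records from the previous
    `window_days` days (excluding the current record).

    `records` are dicts with hypothetical keys 'customer', 'day' and 'amount'.
    """
    history = defaultdict(deque)    # customer -> deque of (day, amount)
    enriched = []
    for rec in sorted(records, key=lambda r: r["day"]):
        past = history[rec["customer"]]
        while past and past[0][0] <= rec["day"] - window_days:
            past.popleft()          # drop records that fell out of the window
        amounts = [amount for _, amount in past]
        row = dict(rec)
        row["prior_count"] = len(amounts)
        row["prior_mean"] = sum(amounts) / len(amounts) if amounts else 0.0
        enriched.append(row)
        past.append((rec["day"], rec["amount"]))
    return enriched

records = [
    {"customer": "A", "day": 0, "amount": 10.0},
    {"customer": "A", "day": 10, "amount": 20.0},
    {"customer": "B", "day": 12, "amount": 5.0},
    {"customer": "A", "day": 35, "amount": 30.0},
    {"customer": "A", "day": 50, "amount": 40.0},
]
features = add_window_features(records)
```

Each record only looks backwards in time, so the new predictors would also be available when scoring new data.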
  • 29. Illusions On the Path • Colossal wastes of time can include: • Trying to find the “right” model family • Getting stuck in data preprocessing trying to get all the predictors “right” • Trying to figure out what the targets should be (usually a sign that the business problem is not well understood) • Trying to “improve” the model without defining what that means
  • 30. The Sun Tzu of Modelling: Be Prepared • Know what you are modelling and for what purpose. • Know what your target variable is. You may have more than one. • Do not hesitate: model with what you have, and add more predictors later. • Messy data is better than no data • Use the right error measures • Know the connection between the model and your business • Evaluate and interrogate the model accordingly • Always question the business value of the analysis • Always be ready to suggest the business use of the analysis • Don’t assume that the client understands what to do with the model
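To illustrate “use the right error measures”, here is a sketch in which accuracy and a business cost rank two models in opposite orders. The fraud-style setting, the helper names and the costs of 100 per missed case and 1 per false alarm are hypothetical:

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def business_cost(y_true, y_pred, cost_missed=100.0, cost_false_alarm=1.0):
    """Total cost under hypothetical per-error costs (1 = positive case)."""
    missed = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    false_alarms = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return missed * cost_missed + false_alarms * cost_false_alarm

# Hypothetical test set: 20 positives (e.g. frauds) among 1000 records.
y_true = [1] * 20 + [0] * 980
model_a = [0] * 1000                                  # never flags anything
model_b = [1] * 15 + [0] * 5 + [1] * 50 + [0] * 930   # catches 15, 50 false alarms

acc_a, cost_a = accuracy(y_true, model_a), business_cost(y_true, model_a)
acc_b, cost_b = accuracy(y_true, model_b), business_cost(y_true, model_b)
```

Model A wins on accuracy (0.98 vs 0.945) yet costs 2000 against 550 for model B, so the error measure has to come from the business problem, not from habit.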
  • 31. Strategy and Tactics • Why are you (re)building the model? • If Strategic: what is going to be done with the insights? By whom? • If Operational: what are the key metrics – accuracy, value, deployability?