Copyright © 2010 SAS Institute Inc. All rights reserved.
Building Better Models
Malcolm Moore
JMP helps you make better decisions, faster
We will show how you can use JMP to:
• Build better models
• Manage “messy data” easily
• Compare alternative models and approaches to quickly:
  • Learn more from your data
  • Select the best variables
  • Make better predictions
• Communicate the consequences to execs and other stakeholders
• Make better decisions, faster
Ways of building better models
Help us to help you . . .
How many rows are in your data sets?
(Select one)
1. <1,000
2. 1,000 to 10,000
3. 10,001 to 100,000
4. 100,001 to 1M
5. >1M
How many columns are in your data sets?
(Select one)
1. <20
2. 20 to 50
3. 51 to 100
4. 101 to 1,000
5. >1,000
Are your Xs correlated?
(Select one)
1. No
2. Moderately correlated
3. Strongly correlated
Does your original data contain missing
cells, outliers or wrong values?
(Select one)
1. Rarely
2. Sometimes
3. Always
How do you analyse / make sense of data?
(Select all that apply)
1. Tabular summaries
2. Graphs
3. Statistical methods
4. Data mining or predictive modelling
5. Quality or reliability methods
What’s your knowledge of statistics?
(Select one)
1. Low
2. Moderate
3. High
What function best describes your work?
(Select one)
1. Academia
2. Research
3. Development
4. Production
5. Marketing or Sales
6. Support Services
Topics Covered
• Ways of building better statistical models.
• Common statistical modeling methods:
  • Decision Trees, Uplift Modelling
  • Regression, PLS
  • Neural Networks
  • Shrinkage methods
• Useful statistical modeling approaches:
  • Stepwise.
  • Boosting.
  • Model averaging, e.g. random forests.
• Strategies for missing data
• Case study approach to show the use of these methods and ideas.
What is a statistical model?
• An empirical model that relates a set of inputs (predictors, Xs) to one or more outcomes (responses, Ys).
• Separates the response variation into signal and noise: Y = f(X) + E
  • Y is one or more continuous or categorical response outcomes.
  • X is one or more continuous or categorical predictors.
  • f(X) describes predictable variation in Y (signal).
  • E describes non-predictable variation in Y (noise).
• “All models are wrong, but some are useful” – George Box
What is a predictive model?
• A type of statistical model where the focus is on predicting Y, regardless of the form used for f(X).
• There is less concern about the form of the model: parameter estimation isn’t the goal. The focus is on how well the model predicts.
• http://en.wikipedia.org/wiki/Predictive_modelling
Identifying a Useful Statistical Model
• “All models are wrong, but some are useful”, George Box
• How do we guard against producing results that look scientific or rigorous, but are at best irrelevant and at worst positively misleading?
• Put another way, how do we protect against overfitting, i.e. assigning too much of the variation in Y to f(X)?
Holdback Helps Prevent Overfitting
• Hold back some data that is not used to fit the model.
• Instead, use this data to select the model, i.e. select the model with the smallest validation error (e.g. validation root mean square error).
• A third subset (often called test data) can also be used to assess how well the model predicts previously unseen data (data not used to fit or select the model).
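The holdback idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 60/20/20 split, the simulated data and the family of k-nearest-neighbour smoothers used as candidate models are all hypothetical choices, not taken from the slides or from JMP.

```python
import math
import random

def split_rows(n, fractions=(0.6, 0.2, 0.2), seed=1):
    """Randomly assign row indices to training, validation and test subsets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(fractions[0] * n)
    n_valid = round(fractions[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_valid], idx[n_train + n_valid:]

def knn_predict(train, x, k):
    """Predict y at x as the mean y of the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def rmse(train, rows, k):
    """Root mean square error on `rows` of the k-NN model fit to `train`."""
    squared = [(y - knn_predict(train, x, k)) ** 2 for x, y in rows]
    return math.sqrt(sum(squared) / len(squared))

# Simulated data (hypothetical): Y = sin(X) + noise.
rng = random.Random(0)
data = []
for _ in range(300):
    x = rng.uniform(0.0, 6.0)
    data.append((x, math.sin(x) + rng.gauss(0.0, 0.3)))

train_idx, valid_idx, test_idx = split_rows(len(data))
train = [data[i] for i in train_idx]
valid = [data[i] for i in valid_idx]
test = [data[i] for i in test_idx]

# Select the candidate with the smallest validation RMSE ...
candidates = [1, 3, 5, 10, 20, 50]
valid_rmse = {k: rmse(train, valid, k) for k in candidates}
best_k = min(valid_rmse, key=valid_rmse.get)

# ... and use the untouched test rows for an honest error estimate.
test_rmse = rmse(train, test, best_k)
print(best_k, round(test_rmse, 3))
```

Note that the most flexible candidate (k = 1) reproduces the training rows perfectly, which is exactly why training error cannot be used to select the model.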
Model Validation Options
• For large datasets, use holdback, which randomly splits the data into two or three subsets:
  • Training: Used to build (fit or estimate) the model.
  • Validation: Used to select the “best” model, i.e. the model representing f(X) without overfitting.
  • Test: Used solely to evaluate the final model fit. Gives an honest assessment of how well the model predicts previously unseen data.
• For small datasets, use k-fold cross-validation:
  • Randomly divide the data into k separate groups (“folds”).
  • Hold out one of the folds from model building and fit a model to the rest of the data.
  • The held-out portion is “scored” (predicted) by the model, and measures of model error are recorded. Repeat for each fold.
  • Average the error estimates across folds and select the model with the smallest k-fold average error.
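The k-fold steps above can be sketched as follows. The two candidate models (a mean-only model and a straight line), the simulated data and the choice k = 5 are assumptions made purely for illustration.

```python
import random

def kfold_indices(n, k, seed=1):
    """Randomly divide row indices 0..n-1 into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def kfold_error(xs, ys, k, fit, predict, seed=1):
    """Average held-out mean squared error across the k folds."""
    fold_errors = []
    for held_out in kfold_indices(len(xs), k, seed):
        held = set(held_out)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        model = fit(train_x, train_y)            # fit on the other folds
        squared = [(ys[i] - predict(model, xs[i])) ** 2 for i in held_out]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / k

# Candidate 1 (hypothetical): predict the training mean everywhere.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)

def predict_mean(model, x):
    return model

# Candidate 2 (hypothetical): least-squares straight line.
def fit_line(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return (my - slope * mx, slope)

def predict_line(model, x):
    return model[0] + model[1] * x

# Small simulated data set: Y = 2 + 0.5 X + noise.
rng = random.Random(0)
xs = [float(i) for i in range(30)]
ys = [2.0 + 0.5 * x + rng.gauss(0.0, 1.0) for x in xs]

errors = {
    "mean": kfold_error(xs, ys, 5, fit_mean, predict_mean),
    "line": kfold_error(xs, ys, 5, fit_line, predict_line),
}
best_model = min(errors, key=errors.get)
print(best_model, errors)
```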
What About Missing Cells?
• Some data sets are full of missing values or cells.
• Standard methods drop a whole observation if any of the X’s are missing.
• With lots of X’s, you may end up with little or no data for modelling.
• Even when you do end up with enough data for modelling, if the mechanism that causes missing values is related to the response, the remaining data will be a biased sample.
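A quick calculation shows how fast complete rows disappear as the number of X’s grows. It assumes, purely for illustration, that every cell is missing independently with the same probability (5% here); real data rarely behaves so neatly.

```python
def complete_case_fraction(p_missing, n_columns):
    """Expected fraction of rows with no missing cell, assuming each cell
    is missing independently with probability p_missing."""
    return (1.0 - p_missing) ** n_columns

for n_columns in (5, 20, 50, 100):
    print(n_columns, round(complete_case_fraction(0.05, n_columns), 3))
```

Under these assumptions roughly 36% of rows survive with 20 columns, and fewer than 1% survive with 100 columns.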
Missing Values
• Sometimes emptiness is meaningful:
  • Loan applicant leaves ‘debt’ and ‘salary’ fields empty.
  • Job applicant leaves ‘previous job’ field empty.
  • Political candidate fills out a form and leaves ‘last conviction’ field empty.
• Missing values are values too. They are just harder to accommodate in statistical methods.
• Even if they are not informative, we don’t want to throw away data, making our models less informative (losing power).
• ‘Informative Missing’ puts all data to use.
Informative Missing
• Options for dealing with missing data depend on the modelling method.
• Regression methods:
  • Categorical predictor:
    » Creates a separate level for missing data and treats it as such.
  • Continuous predictor:
    » The column mean is substituted for the missing value.
    » Additionally, an indicator column is added to the predictors, taking the value 1 where data is missing and 0 otherwise.
• This can significantly improve the fit when data is not missing at random, and avoids the loss of data and power due to missing cells in other situations:
  http://blogs.sas.com/content/jmp/2013/10/29/its-not-just-what-you-say-but-what-you-dont-say-informative-missing/
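The coding described for regression methods can be sketched as below. The function names, the use of None for a missing cell and the "Missing" label are hypothetical choices for this illustration, not JMP's implementation.

```python
def code_continuous(values):
    """Informative-missing coding for a continuous predictor.

    Missing cells are represented by None. Returns the column with the
    mean of the observed values substituted for missing cells, plus a
    0/1 indicator column marking where the data was missing.
    """
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    imputed = [mean if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return imputed, indicator

def code_categorical(values, missing_label="Missing"):
    """Treat missing cells of a categorical predictor as their own level."""
    return [missing_label if v is None else v for v in values]

salary = [30.0, None, 50.0, 40.0, None]
imputed, is_missing = code_continuous(salary)
print(imputed)     # [30.0, 40.0, 50.0, 40.0, 40.0]
print(is_missing)  # [0, 1, 0, 0, 1]

previous_job = ["Analyst", None, "Engineer"]
print(code_categorical(previous_job))  # ['Analyst', 'Missing', 'Engineer']
```

Both the imputed column and the indicator column are then entered as predictors, so the model can estimate a separate effect for "was missing".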
Statistical Modeling
• We will take a case study approach to introducing some of the common statistical modeling methods, deployed with model validation approaches:
  • Types:
    » Decision Trees
    » Regression, PLS
    » Neural Networks
    » Shrinkage Methods
  • Approaches:
    » Stepwise
    » Boosting
    » Model averaging, e.g. random forests
Case Study 1: Regression
Banding in a Printing Process
Regression (continuous response)
• Examples
$$Y = f(X_1, X_2, \ldots, X_k)$$
$$Y = a_0 + a_1 X_1 + a_2 X_2 + \cdots + a_k X_k$$
$$Y = a_0 + \sum_i a_i X_i + \sum_i \sum_j a_{ij} X_i X_j$$
Regression (categorical response)
• Example: Logistic Regression
P[Y = target] is modelled as a function of X_1, X_2, ..., X_k:
$$P[Y = \text{target}] = \frac{1}{1 + e^{-f(X_1, X_2, \ldots, X_k)}}$$
$$f(X_1, X_2, \ldots, X_k) = a_0 + a_1 X_1 + a_2 X_2 + \cdots + a_k X_k$$
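As a small worked illustration of the logistic form, the sketch below evaluates P[Y = target] for given coefficients. The function names and the coefficient values are made up for the example.

```python
import math

def linear_predictor(coefs, xs):
    """f(X1, ..., Xk) = a0 + a1*X1 + ... + ak*Xk, with coefs = [a0, a1, ..., ak]."""
    return coefs[0] + sum(a * x for a, x in zip(coefs[1:], xs))

def prob_target(coefs, xs):
    """P[Y = target] = 1 / (1 + exp(-f(X1, ..., Xk)))."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(coefs, xs)))

coefs = [-1.0, 2.0]                  # hypothetical a0 and a1
print(prob_target(coefs, [0.5]))     # f = 0, so the probability is 0.5
print(prob_target(coefs, [2.0]))     # larger f gives a probability nearer 1
```

Whatever value f takes, the result always lies strictly between 0 and 1, which is why this form suits a categorical response.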
Model Selection
• Stepwise Regression
  • Start with a base model: intercept only or all terms.
  • If intercept only, find the term not yet included that explains the most variation and enter it into the model.
  • If all terms, remove the term that explains the least.
  • Continue until a stopping criterion is met (validation R-Square).
• A variation of stepwise regression is all possible subsets (best subset) regression.
  • Examine all 2, 3, 4, … term models and pick the best of each size. Sometimes statistical heredity is imposed to make the problem more tractable.
See Gardner, S. “Model Selection: Part 2 - Model Selection Procedures”, ASQ Statistics Division Newsletter, Volume 29, No. 3, Spring 2011, http://asqstatdiv.org/newsletterarch.php, for a discussion of stepwise regression for continuous response models.
Model Selection
• Drawbacks:
  • Selection is all or nothing. A term is either in the model or it isn’t.
  • May miss important X’s when the data are correlated, and parameter estimates can be unstable.
  • The optimal search may not follow a linear algorithmic path. Adding the best term at each step may not produce the best overall model.
  • Large models may be impossible to examine using all subsets regression.
• Shrinkage Methods:
  • Attempt to simultaneously minimize the prediction error and shrink the parameter estimates toward zero. Resulting estimates are biased, but prediction error is often smaller.
  • Can be considered as continuous model term selection.
  • Common techniques: Ridge Regression, LASSO, Elastic Net.
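To see why shrinkage acts like continuous term selection, consider the special case of orthonormal predictor columns, where each penalized estimate is a simple function of the ordinary least squares estimate b. The formulas below assume the penalties are scaled as lam*|beta| (LASSO) and (lam/2)*beta^2 (ridge) added to half the residual sum of squares; other software may scale the penalty differently, and the example estimates are made up.

```python
def ridge_shrink(b, lam):
    """Ridge: every estimate is scaled toward zero, none becomes exactly zero."""
    return b / (1.0 + lam)

def lasso_shrink(b, lam):
    """LASSO: soft thresholding; small estimates are set exactly to zero."""
    if abs(b) <= lam:
        return 0.0
    return b - lam if b > 0 else b + lam

def elastic_net_shrink(b, lam1, lam2):
    """Elastic Net: soft threshold (lam1), then ridge-style scaling (lam2)."""
    return lasso_shrink(b, lam1) / (1.0 + lam2)

ols = [3.0, -1.5, 0.4, -0.1]   # hypothetical least squares estimates
for b in ols:
    print(b, ridge_shrink(b, 0.5), lasso_shrink(b, 0.5),
          elastic_net_shrink(b, 0.5, 0.5))
```

With a penalty of 0.5, the LASSO drops the two smallest terms entirely while keeping shrunken versions of the two largest, whereas ridge keeps all four terms at reduced size.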
Banding in a Printing Process
Example 1
Case Study 2: Decision Trees
Which customer segments to target with campaigns
Decision Trees
• Also known as Recursive Partitioning, CHAID, CART
• Models are a series of nested IF() statements, where each condition in the IF() statement can be viewed as a separate branch in a tree.
• Branches are chosen so that the difference in the average response between paired branches is maximised.
• Doing so assigns more of the variation in Y to f(X).
• The algorithm gets more complicated and the computations more intensive with holdback.
Decision Tree
Goal is to predict those with a code of “1”. Overall rate is 3.23%.
Candidate X’s:
• Search through each of these.
• Examine splits for each unique level in each X.
• Find the split that maximizes the difference in proportions of the target variable.
• LogWorth = -log10(p-value) for the best split on each variable. The best split has the maximum LogWorth.
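The split search can be sketched for a single numeric X and a 0/1 target. As an assumption for this illustration, the p-value comes from an unadjusted Pearson chi-square test on the 2x2 table of branch by outcome; JMP's own LogWorth calculation may differ (for example, by adjusting the p-value for the number of candidate splits). The data are made up.

```python
import math

def logworth_2x2(a, b, c, d):
    """-log10(p-value) of the Pearson chi-square test (1 df) on [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0                       # degenerate table: no evidence
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df
    return float("inf") if p == 0.0 else -math.log10(p)

def best_split(xs, ys):
    """Try 'x < cut' at each unique level of x; return (cut, LogWorth) of the best."""
    best_cut, best_lw = None, 0.0
    for cut in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < cut]
        right = [y for x, y in zip(xs, ys) if x >= cut]
        lw = logworth_2x2(sum(left), len(left) - sum(left),
                          sum(right), len(right) - sum(right))
        if lw > best_lw:
            best_cut, best_lw = cut, lw
    return best_cut, best_lw

age = [22, 24, 25, 27, 30, 35, 40, 45, 50, 55]
target = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
cut, lw = best_split(age, target)
print(cut, round(lw, 2))   # the split 'age < 27' separates the two groups
```

In a full tree this search is run over every candidate X, and the variable whose best split has the largest LogWorth is used.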
Decision Tree
1st split: optimal split at Age < 28.
Notice the difference in the rates in each branch of the tree.
Repeat the “split search” across both “partitions” of the data. Find the optimal split across both branches.
Decision Tree
2nd split on CARDS (no CC vs some CC’s).
Notice the variation in the proportion of “1” in each branch.
Decision Tree
3rd split on TEL (# of handsets owned).
Notice the variation in the proportion of “1” in each branch.
Model Evaluation
• Continuous response models are evaluated using SSE (sum of squared errors) measures such as R^2 and adjusted R^2:
  • Other alternatives are information-based measures such as AIC and BIC.
• Categorical response models are evaluated on their ability to:
  • Sort portions of the data into different levels of response, using ROC curves and lift curves.
  • Categorize a new observation, measured by confusion matrices and rates, as well as the overall misclassification rate.
ROC Curve Example
ROC Curves
• The higher the ROC curve is above the 45 degree line, the better the model is at sorting the data compared with simple random sorting.
• The ROC curve is constructed on the sorted table (e.g. sort the data from highest Prob[Y==target] to lowest):
  For each row, if the actual value is equal to the target, the curve is drawn upward (vertically); otherwise it is drawn across (horizontally). Drawing ‘up’ means the model sorted well, drawing ‘across’ means the model did not sort well.
• A good general measure of how well the model is doing at prediction and sorting is the Area Under the Curve (AUC), which is just the area under the constructed ROC curve. This will be a value in the range [0,1]:
  Values greater than 0.5 indicate models that are better than simple random guessing.
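The construction described above can be sketched directly: sort by predicted probability, step up for each target row and across for each non-target row, then take the area. The sketch assumes both outcomes occur in the data and ignores tied probabilities; the six scored rows are made up.

```python
def roc_points(probs, actual, target=1):
    """ROC curve as a list of (x, y) points, built on the sorted table."""
    rows = sorted(zip(probs, actual), key=lambda r: r[0], reverse=True)
    n_pos = sum(1 for _, a in rows if a == target)
    n_neg = len(rows) - n_pos            # assumes n_pos > 0 and n_neg > 0
    x = y = 0.0
    points = [(x, y)]
    for _, a in rows:
        if a == target:
            y += 1.0 / n_pos             # draw up: the model sorted well
        else:
            x += 1.0 / n_neg             # draw across: it did not
        points.append((x, y))
    return points

def auc(points):
    """Area under the curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

probs = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
actual = [1, 1, 0, 1, 0, 0]
print(round(auc(roc_points(probs, actual)), 4))   # 0.8889, i.e. 8/9
```

The value 8/9 matches a pairwise count: of the 9 (target, non-target) pairs, 8 have the target row ranked higher.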
Decision Trees: Which customer segments are most likely to churn
Example 2
Case Study 3: Uplift Modeling
Which customer segments to target with campaigns
Uplift Modelling
• Also known as incremental modelling, true lift modelling or net modelling.
• Identifies individuals or sub-groups who are most likely to respond favourably to some action:
  • Customers likely to respond to marketing campaigns, to help optimize marketing decisions
  • Patients likely to respond to medical intervention, to help define personalized medicine protocols
• Unlike traditional partition models that find splits to optimize a prediction, uplift models find splits to maximize a treatment difference.
  • The best split is the split that maximises the interaction of the split and treatment.
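A rough sketch of an uplift split search follows. As a simplifying assumption, the split-by-treatment interaction is measured by the absolute gap between the two branches' treatment effects (treated response rate minus control response rate); a real implementation would more likely use a significance test of the interaction. The customer data are made up.

```python
def rate(rows):
    """Response rate of a group of rows."""
    return sum(r["y"] for r in rows) / len(rows)

def uplift(rows):
    """Treatment effect: treated response rate minus control response rate."""
    treated = [r for r in rows if r["treated"]]
    control = [r for r in rows if not r["treated"]]
    if not treated or not control:
        return None
    return rate(treated) - rate(control)

def best_uplift_split(rows, feature):
    """Split 'feature < cut' with the largest gap in uplift between branches."""
    best_cut, best_gap = None, 0.0
    for cut in sorted({r[feature] for r in rows})[1:]:
        left = [r for r in rows if r[feature] < cut]
        right = [r for r in rows if r[feature] >= cut]
        u_left, u_right = uplift(left), uplift(right)
        if u_left is None or u_right is None:
            continue
        gap = abs(u_left - u_right)
        if gap > best_gap:
            best_cut, best_gap = cut, gap
    return best_cut, best_gap

def make_rows(age, treated, responders, n=10):
    """n customers of one age and treatment group, `responders` of whom respond."""
    return [{"age": age, "treated": treated, "y": 1 if i < responders else 0}
            for i in range(n)]

# Hypothetical campaign: it lifts response for younger customers only.
rows = []
for age, resp_treated, resp_control in [(20, 6, 2), (25, 6, 2), (40, 3, 3), (50, 3, 3)]:
    rows += make_rows(age, True, resp_treated) + make_rows(age, False, resp_control)

print(best_uplift_split(rows, "age"))   # best cut is age < 40
```

An ordinary partition on response alone would chase overall response rate; this search instead finds where the treatment makes the biggest difference.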
Example
Example Continued
Case Study 4: Bootstrap Forest and Boosted Trees
Quantitative Structure-Activity Modeling
Improvements to Decision Trees
Two modifications to basic decision trees that (depending on circumstances and the specific data) may develop better models are:
1. Fit many models and average them:
   • Bootstrap Forest or Random Forest.
2. Fit a simple model, then boost it by fitting a simple model to the model errors, and repeat several times:
   • Model Boosting or Boosted Tree.
Bootstrap Forest
• Bootstrap Forest:
  • For each tree, take a random sample (with replacement) of rows.
  • For each split, take a random sample (30% sample) of X’s.
  • Build decision tree.
  • Repeat the above process, making many trees, and average predictions across all trees (bagging).
• Also known as a random forest.
• Works very well on wide tables (with correlated X’s).
• Can be used for both predictive modeling and variable selection.
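The recipe above can be sketched with the smallest possible trees. To keep the example short, each "tree" here is a single split (a stump) on a continuous response; that simplification, the function names and the simulated data are assumptions of this sketch, not a description of JMP's Bootstrap Forest.

```python
import random

def fit_stump(rows, ys, features):
    """Best least-squares split 'row[j] < cut' over the given feature indices."""
    mean_y = sum(ys) / len(ys)
    best = (sum((y - mean_y) ** 2 for y in ys), None, None, mean_y, mean_y)
    for j in features:
        for cut in sorted({r[j] for r in rows})[1:]:
            left = [y for r, y in zip(rows, ys) if r[j] < cut]
            right = [y for r, y in zip(rows, ys) if r[j] >= cut]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((y - lm) ** 2 for y in left)
                   + sum((y - rm) ** 2 for y in right))
            if sse < best[0]:
                best = (sse, j, cut, lm, rm)
    return best[1:]                      # (feature, cut, left mean, right mean)

def predict_stump(stump, row):
    j, cut, left_mean, right_mean = stump
    if j is None:                        # no useful split was found
        return left_mean
    return left_mean if row[j] < cut else right_mean

def fit_forest(rows, ys, n_trees=100, x_fraction=0.3, seed=1):
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    n_x = max(1, round(x_fraction * p))
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]   # rows, with replacement
        features = rng.sample(range(p), n_x)            # random subset of X's
        forest.append(fit_stump([rows[i] for i in sample],
                                [ys[i] for i in sample], features))
    return forest

def predict_forest(forest, row):
    """Average the predictions across all trees (bagging)."""
    return sum(predict_stump(s, row) for s in forest) / len(forest)

# Simulated data: only the first X matters; the other two are noise.
rng = random.Random(0)
rows = [[i % 10, rng.random(), rng.random()] for i in range(40)]
ys = [10.0 if r[0] >= 5 else 0.0 for r in rows]

forest = fit_forest(rows, ys)
print(predict_forest(forest, [9, 0.5, 0.5]), predict_forest(forest, [0, 0.5, 0.5]))
```

Counting how often each X is chosen for a split across the forest gives a simple basis for variable selection.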
See the Trees in the Forest
Tree on 1st Bootstrap Sample
Tree on 2nd Bootstrap Sample
Tree on 3rd Bootstrap Sample
…
Tree on 100th Bootstrap Sample
Average the Trees in the Forest
Boosted Tree
• Beginning with the first tree (layer), build a small, simple tree.
• From the residuals of the first tree, build another small, simple tree.
• Each subsequent layer in the model is fit to the residuals from the previous layer, and residuals are saved from that new model fit.
• This continues until a specified number of layers has been fit, or a determination has been made that adding successive layers doesn’t improve the fit of the model.
• The final model is the weighted accumulation of all of the model layers.
Boosted Tree Illustrated
Models: M1, M2, M3, …, M49
Final model:
$$M = M_1 + \varepsilon \cdot M_2 + \varepsilon \cdot M_3 + \cdots + \varepsilon \cdot M_{49}$$
where $\varepsilon$ is the learning rate.
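The layered fit and the final-model formula on this slide can be sketched with one-split trees (stumps) on a single X. The learning rate of 0.1, the stump as the "small simple tree" and the example data are assumptions chosen for illustration.

```python
def fit_stump(xs, residuals):
    """Best least-squares split 'x < cut'; returns (cut, left mean, right mean)."""
    best = None
    for cut in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, residuals) if x < cut]
        right = [r for x, r in zip(xs, residuals) if x >= cut]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, cut, lm, rm)
    return best[1:]

def predict_stump(stump, x):
    cut, left_mean, right_mean = stump
    return left_mean if x < cut else right_mean

def fit_boosted(xs, ys, n_layers=49, learning_rate=0.1):
    """Layers M1..Mn, each fit to the residuals left by the layers before it."""
    layers = []
    preds = [0.0] * len(xs)
    for layer in range(n_layers):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        # M = M1 + eps*M2 + ... + eps*Mn: the first layer has weight 1.
        weight = 1.0 if layer == 0 else learning_rate
        layers.append((weight, stump))
        preds = [p + weight * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return layers

def predict_boosted(layers, x):
    """Weighted accumulation of all of the model layers."""
    return sum(w * predict_stump(s, x) for w, s in layers)

def training_sse(layers, xs, ys):
    return sum((y - predict_boosted(layers, x)) ** 2 for x, y in zip(xs, ys))

xs = [i / 2 for i in range(20)]
ys = [x * x for x in xs]                 # a curve that one split cannot capture
one_layer = fit_boosted(xs, ys, n_layers=1)
boosted = fit_boosted(xs, ys)
print(training_sse(one_layer, xs, ys), training_sse(boosted, xs, ys))
```

In practice the number of layers would be chosen with validation data rather than fixed in advance, since training error keeps falling as layers are added.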
Boosted Tree
• Boosted Trees:
  • Primarily used for building prediction models.
  • Not as good as Bootstrap Forest at exploring all the relationships between Y and the X’s, but can still be used for that purpose.
  • Results in ‘smaller’ models (fewer arithmetic operations) and faster scoring.
Other Pro Modelling Methods
• PLS
• Neural Networks
• Shrinkage Methods:
  • Ridge Regression, LASSO, Elastic Net
• PCA Clustering
We have shown how you can use JMP to:
• Build better models
• Manage “messy data” easily
• Compare alternative models and approaches to quickly:
  • Learn more from your data
  • Select the best variables
  • Make better predictions
• Communicate the consequences to execs and other stakeholders
• Make better decisions, faster
How mining your data helps your company
• Increase growth and return
• Reduce costs
• Deliver a competitive edge
• Improve loyalty
• Accelerate innovation
• Speed time to market
JMP helps you make better decisions, faster
What are you going to do next?
Visit jmp.com for more information about JMP
Sign up for our webinars and seminars

More Related Content

What's hot

Everything You Wanted to Know About Definitive Screening Designs
Everything You Wanted to Know About Definitive Screening DesignsEverything You Wanted to Know About Definitive Screening Designs
Everything You Wanted to Know About Definitive Screening DesignsJMP software from SAS
 
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsTop 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsSri Ambati
 
Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...
Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...
Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...Edureka!
 
Hypothesis Testing: Relationships (Compare 2+ Factors)
Hypothesis Testing: Relationships (Compare 2+ Factors)Hypothesis Testing: Relationships (Compare 2+ Factors)
Hypothesis Testing: Relationships (Compare 2+ Factors)Matt Hansen
 
Statistics in the age of data science, issues you can not ignore
Statistics in the age of data science, issues you can not ignoreStatistics in the age of data science, issues you can not ignore
Statistics in the age of data science, issues you can not ignoreTuri, Inc.
 
00 DoE vers. OFAT (or COST) , a comparison
00 DoE vers. OFAT (or COST) , a  comparison 00 DoE vers. OFAT (or COST) , a  comparison
00 DoE vers. OFAT (or COST) , a comparison Stefan Moser
 
840 plenary elder_using his laptop
840 plenary elder_using his laptop840 plenary elder_using his laptop
840 plenary elder_using his laptopRising Media, Inc.
 
Hypothesis Testing: Finding the Right Statistical Test
Hypothesis Testing: Finding the Right Statistical TestHypothesis Testing: Finding the Right Statistical Test
Hypothesis Testing: Finding the Right Statistical TestMatt Hansen
 
Andrii Belas: A/B testing overview: use-cases, theory and tools
Andrii Belas: A/B testing overview: use-cases, theory and toolsAndrii Belas: A/B testing overview: use-cases, theory and tools
Andrii Belas: A/B testing overview: use-cases, theory and toolsLviv Startup Club
 
S01 Shainin component swap, DoE
S01 Shainin component swap, DoES01 Shainin component swap, DoE
S01 Shainin component swap, DoEStefan Moser
 
03 Design of Experiments - Factor prioritization
03 Design of Experiments - Factor prioritization03 Design of Experiments - Factor prioritization
03 Design of Experiments - Factor prioritizationStefan Moser
 
Hypothesis Testing: Central Tendency – Non-Normal (Nonparametric Overview)
Hypothesis Testing: Central Tendency – Non-Normal (Nonparametric Overview)Hypothesis Testing: Central Tendency – Non-Normal (Nonparametric Overview)
Hypothesis Testing: Central Tendency – Non-Normal (Nonparametric Overview)Matt Hansen
 
[db tech showcase Tokyo 2018] #dbts2018 #B16 『The Basics of Machine Learning』
[db tech showcase Tokyo 2018] #dbts2018 #B16 『The Basics of Machine Learning』[db tech showcase Tokyo 2018] #dbts2018 #B16 『The Basics of Machine Learning』
[db tech showcase Tokyo 2018] #dbts2018 #B16 『The Basics of Machine Learning』Insight Technology, Inc.
 
Machine Learning part 3 - Introduction to data science
Machine Learning part 3 - Introduction to data science Machine Learning part 3 - Introduction to data science
Machine Learning part 3 - Introduction to data science Frank Kienle
 
Risk Management in Data Analysis
Risk Management in Data AnalysisRisk Management in Data Analysis
Risk Management in Data AnalysisDavid Lee
 
Causal Inference, Reinforcement Learning, and Continuous Optimization
Causal Inference, Reinforcement Learning, and Continuous OptimizationCausal Inference, Reinforcement Learning, and Continuous Optimization
Causal Inference, Reinforcement Learning, and Continuous OptimizationScientificRevenue
 
AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...
AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...
AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...ijsc
 
From Raw Data to Deployed Product. Fast & Agile with CRISP-DM
From Raw Data to Deployed Product. Fast & Agile with CRISP-DMFrom Raw Data to Deployed Product. Fast & Agile with CRISP-DM
From Raw Data to Deployed Product. Fast & Agile with CRISP-DMMichał Łopuszyński
 
DIY market segmentation 20170125
DIY market segmentation 20170125DIY market segmentation 20170125
DIY market segmentation 20170125Displayr
 

What's hot (20)

Everything You Wanted to Know About Definitive Screening Designs
Everything You Wanted to Know About Definitive Screening DesignsEverything You Wanted to Know About Definitive Screening Designs
Everything You Wanted to Know About Definitive Screening Designs
 
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsTop 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner Pitfalls
 
Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...
Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...
Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...
 
Hypothesis Testing: Relationships (Compare 2+ Factors)
Hypothesis Testing: Relationships (Compare 2+ Factors)Hypothesis Testing: Relationships (Compare 2+ Factors)
Hypothesis Testing: Relationships (Compare 2+ Factors)
 
Statistics in the age of data science, issues you can not ignore
Statistics in the age of data science, issues you can not ignoreStatistics in the age of data science, issues you can not ignore
Statistics in the age of data science, issues you can not ignore
 
00 DoE vers. OFAT (or COST) , a comparison
00 DoE vers. OFAT (or COST) , a  comparison 00 DoE vers. OFAT (or COST) , a  comparison
00 DoE vers. OFAT (or COST) , a comparison
 
840 plenary elder_using his laptop
840 plenary elder_using his laptop840 plenary elder_using his laptop
840 plenary elder_using his laptop
 
Hypothesis Testing: Finding the Right Statistical Test
Hypothesis Testing: Finding the Right Statistical TestHypothesis Testing: Finding the Right Statistical Test
Hypothesis Testing: Finding the Right Statistical Test
 
Andrii Belas: A/B testing overview: use-cases, theory and tools
Andrii Belas: A/B testing overview: use-cases, theory and toolsAndrii Belas: A/B testing overview: use-cases, theory and tools
Andrii Belas: A/B testing overview: use-cases, theory and tools
 
S01 Shainin component swap, DoE
S01 Shainin component swap, DoES01 Shainin component swap, DoE
S01 Shainin component swap, DoE
 
03 Design of Experiments - Factor prioritization
03 Design of Experiments - Factor prioritization03 Design of Experiments - Factor prioritization
03 Design of Experiments - Factor prioritization
 
Chapter 021
Chapter 021Chapter 021
Chapter 021
 
Hypothesis Testing: Central Tendency – Non-Normal (Nonparametric Overview)
Hypothesis Testing: Central Tendency – Non-Normal (Nonparametric Overview)Hypothesis Testing: Central Tendency – Non-Normal (Nonparametric Overview)
Hypothesis Testing: Central Tendency – Non-Normal (Nonparametric Overview)
 
[db tech showcase Tokyo 2018] #dbts2018 #B16 『The Basics of Machine Learning』
[db tech showcase Tokyo 2018] #dbts2018 #B16 『The Basics of Machine Learning』[db tech showcase Tokyo 2018] #dbts2018 #B16 『The Basics of Machine Learning』
[db tech showcase Tokyo 2018] #dbts2018 #B16 『The Basics of Machine Learning』
 
Machine Learning part 3 - Introduction to data science
Machine Learning part 3 - Introduction to data science Machine Learning part 3 - Introduction to data science
Machine Learning part 3 - Introduction to data science
 
Risk Management in Data Analysis
Risk Management in Data AnalysisRisk Management in Data Analysis
Risk Management in Data Analysis
 
Causal Inference, Reinforcement Learning, and Continuous Optimization
Causal Inference, Reinforcement Learning, and Continuous OptimizationCausal Inference, Reinforcement Learning, and Continuous Optimization
Causal Inference, Reinforcement Learning, and Continuous Optimization
 
AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...
AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...
AI TESTING: ENSURING A GOOD DATA SPLIT BETWEEN DATA SETS (TRAINING AND TEST) ...
 
From Raw Data to Deployed Product. Fast & Agile with CRISP-DM
From Raw Data to Deployed Product. Fast & Agile with CRISP-DMFrom Raw Data to Deployed Product. Fast & Agile with CRISP-DM
From Raw Data to Deployed Product. Fast & Agile with CRISP-DM
 
DIY market segmentation 20170125
DIY market segmentation 20170125DIY market segmentation 20170125
DIY market segmentation 20170125
 

Viewers also liked

When a Linear Model Just Won't Do: Fitting Nonlinear Models in JMP
When a Linear Model Just Won't Do: Fitting Nonlinear Models in JMPWhen a Linear Model Just Won't Do: Fitting Nonlinear Models in JMP
When a Linear Model Just Won't Do: Fitting Nonlinear Models in JMPJMP software from SAS
 
Steps For A Screening DOE
Steps For A Screening DOESteps For A Screening DOE
Steps For A Screening DOEThomas Abraham
 
MT A: "Workshop: Building a Data-Driven Marketing Organization: Be the Agent ...
MT A: "Workshop: Building a Data-Driven Marketing Organization: Be the Agent ...MT A: "Workshop: Building a Data-Driven Marketing Organization: Be the Agent ...
MT A: "Workshop: Building a Data-Driven Marketing Organization: Be the Agent ...iMedia Connection
 
Becoming a data driven organization
Becoming a data driven organization Becoming a data driven organization
Becoming a data driven organization Magnus Backman
 
DOE Applications in Process Chemistry Presentation
DOE Applications in Process Chemistry PresentationDOE Applications in Process Chemistry Presentation
DOE Applications in Process Chemistry Presentationsaweissman
 
Visual Analytic Approaches for the Analysis of Spontaneously Reported Adverse...
Visual Analytic Approaches for the Analysis of Spontaneously Reported Adverse...Visual Analytic Approaches for the Analysis of Spontaneously Reported Adverse...
Visual Analytic Approaches for the Analysis of Spontaneously Reported Adverse...JMP software from SAS
 

Viewers also liked (6)

When a Linear Model Just Won't Do: Fitting Nonlinear Models in JMP
When a Linear Model Just Won't Do: Fitting Nonlinear Models in JMPWhen a Linear Model Just Won't Do: Fitting Nonlinear Models in JMP
When a Linear Model Just Won't Do: Fitting Nonlinear Models in JMP
 
Steps For A Screening DOE
Steps For A Screening DOESteps For A Screening DOE
Steps For A Screening DOE
 
MT A: "Workshop: Building a Data-Driven Marketing Organization: Be the Agent ...
MT A: "Workshop: Building a Data-Driven Marketing Organization: Be the Agent ...MT A: "Workshop: Building a Data-Driven Marketing Organization: Be the Agent ...
MT A: "Workshop: Building a Data-Driven Marketing Organization: Be the Agent ...
 
Becoming a data driven organization
Becoming a data driven organization Becoming a data driven organization
Becoming a data driven organization
 
DOE Applications in Process Chemistry Presentation
DOE Applications in Process Chemistry PresentationDOE Applications in Process Chemistry Presentation
DOE Applications in Process Chemistry Presentation
 
Visual Analytic Approaches for the Analysis of Spontaneously Reported Adverse...
Visual Analytic Approaches for the Analysis of Spontaneously Reported Adverse...Visual Analytic Approaches for the Analysis of Spontaneously Reported Adverse...
Visual Analytic Approaches for the Analysis of Spontaneously Reported Adverse...
 

Similar to Building Better Models

Some Frameworks for Improving Analytic Operations at Your Company
Some Frameworks for Improving Analytic Operations at Your CompanySome Frameworks for Improving Analytic Operations at Your Company
Some Frameworks for Improving Analytic Operations at Your CompanyRobert Grossman
 
Reasoning over big data
Reasoning over big dataReasoning over big data
Reasoning over big dataOSTHUS
 
machinelearning-191005133446.pdf
machinelearning-191005133446.pdfmachinelearning-191005133446.pdf
machinelearning-191005133446.pdfLellaLinton
 
Machine Learning: A Fast Review
Machine Learning: A Fast ReviewMachine Learning: A Fast Review
Machine Learning: A Fast ReviewAhmad Ali Abin
 
Data Analytics, Machine Learning, and HPC in Today’s Changing Application Env...
Building Better Models

  • 1. Copyright © 2010 SAS Institute Inc. All rights reserved. Building Better Models Malcolm Moore
  • 2. JMP helps you make better decisions, faster
  • 3. We will show how you can use JMP to:
      - Build better models
      - Manage “messy data” easily
      - Compare alternative models and approaches quickly
      - Learn more from your data
      - Select the best variables
      - Make better predictions
      - Communicate the consequences to execs and other stakeholders
      - Make better decisions, faster
  • 4. Ways of building better models. Help us to help you . . .
  • 5. How many rows are in your data sets? (Select one) 1. <1,000 2. 1,001 to 10,000 3. 10,001 to 100,000 4. 100,001 to 1M 5. >1M
  • 6. How many columns are in your data sets? (Select one) 1. <20 2. 21 to 50 3. 51 to 100 4. 101 to 1,000 5. >1,000
  • 7. Are your Xs correlated? (Select one) 1. No 2. Moderately correlated 3. Strongly correlated
  • 8. Does your original data contain missing cells, outliers or wrong values? (Select one) 1. Rarely 2. Sometimes 3. Always
  • 9. How do you analyse / make sense of data? (Select all that apply) 1. Tabular summaries 2. Graphs 3. Statistical methods 4. Data mining or predictive modelling 5. Quality or reliability methods
  • 10. What’s your knowledge of statistics? (Select one) 1. Low 2. Moderate 3. High
  • 11. What function best describes your work? (Select one) 1. Academia 2. Research 3. Development 4. Production 5. Marketing or Sales 6. Support Services
  • 12. Topics Covered
      - Ways of building better statistical models.
      - Common statistical modeling methods:
          - Decision Trees, Uplift Modelling
          - Regression, PLS
          - Neural Networks
          - Shrinkage methods
      - Useful statistical modeling approaches:
          - Stepwise.
          - Boosting.
          - Model averaging, e.g. random forests.
      - Strategies for missing data
      - Case study approach to show the use of these methods and ideas.
  • 13. What is a statistical model?
      - An empirical model that relates a set of inputs (predictors, Xs) to one or more outcomes (responses, Ys).
      - Separates the response variation into signal and noise: Y = f(X) + E
          - Y is one or more continuous or categorical response outcomes.
          - X is one or more continuous or categorical predictors.
          - f(X) describes predictable variation in Y (signal).
          - E describes non-predictable variation in Y (noise).
      - “All models are wrong, but some are useful”, George Box
  • 14. What is a predictive model?
      - A type of statistical model where the focus is on predicting Y, independent of the form used for f(X).
      - There is less concern about the form of the model: parameter estimation isn’t important. The focus is on how well it predicts.
      - http://en.wikipedia.org/wiki/Predictive_modelling
  • 15. Identifying a Useful Statistical Model
      - “All models are wrong, but some are useful”, George Box
      - How do we guard against producing results that look scientific or rigorous, but are at best irrelevant and at worst positively misleading?
      - Or, put another way, how do we protect against overfitting, i.e. assigning too much of the variation in Y to f(X)?
  • 16. Holdback Helps Prevent Overfitting
      - Hold back some data that is not used to fit the model.
      - Instead, use this data to select the model, i.e. select the model with the smallest validation error (for example, validation root mean square error).
      - A third subset (often called test data) can also be used to assess how well the model predicts previously unseen data (data not used to fit or select the model).
  • 17. Model Validation Options
      - Large datasets use holdback, which randomly splits the data into two or three subgroups:
          - Training: Used to build (fit or estimate) the model.
          - Validation: Used to select the “best” model, i.e. the model representing f(X) without overfitting.
          - Test: Used solely to evaluate the final model fit. Gives an honest assessment of how well the model predicts previously unseen data.
      - Small datasets use k-fold:
          - Randomly divide the data into k separate groups (“folds”).
          - Hold out one of the folds from model building and fit a model to the rest of the data.
          - The held-out portion is “scored” (predicted) by the model, and measures of model error are recorded. Repeat for each fold.
          - Average the error estimates across folds and select the model with the smallest k-fold average error.
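
The two validation options above can be sketched in a few lines of Python. This is a minimal illustration, not JMP's implementation: the function names, the 60/20/20 split fractions and the toy "predict the training mean" model are all assumptions made for the example.

```python
import random

def holdback_split(rows, fractions=(0.6, 0.2, 0.2), seed=0):
    """Randomly split rows into training, validation and test subsets.

    The 60/20/20 default is an illustrative assumption, not a recommendation.
    """
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(fractions[0] * len(shuffled))
    n_valid = int(fractions[1] * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])

def k_fold_indices(n_rows, k, seed=0):
    """Randomly assign row indices to k folds of near-equal size."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def k_fold_rmse(xs, ys, k, fit, predict, seed=0):
    """Average held-out RMSE over k folds for one candidate model."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        model = fit(train_x, train_y)          # fit on the other k-1 folds
        sq_err = [(ys[i] - predict(model, xs[i])) ** 2 for i in held_out]
        fold_errors.append((sum(sq_err) / len(sq_err)) ** 0.5)
    return sum(fold_errors) / k

# Toy candidate model (hypothetical): always predict the training mean.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)

def predict_mean(model, x):
    return model
```

To compare candidate models, one would call `k_fold_rmse` once per candidate with the same `seed` and keep the candidate with the smallest average error.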
  • 18. What About Missing Cells?
      - Some data sets are full of missing values or cells.
      - Standard methods drop a whole observation if any of the X’s are missing.
      - With lots of X’s you may end up with little or no data for modelling.
      - Even when you do end up with enough data for modelling, if the mechanism that causes missing values is related to the response, the data left will be a biased sample.
  • 19. Missing Values
      - Sometimes emptiness is meaningful:
          - Loan applicant leaves ‘debt’ and ‘salary’ fields empty.
          - Job applicant leaves ‘previous job’ field empty.
          - Political candidate fills out a form and leaves ‘last conviction’ field empty.
      - Missing values are values too. They are just harder to accommodate in statistical methods.
      - Even if they are not informative, we don’t want to throw away data, making our models less informative (losing power).
      - ‘Informative Missing’ puts all data to use.
  • 20. Informative Missing
      - Options for dealing with missing data depend on the modelling method.
      - Regression methods:
          - Categorical predictor: creates a separate level for missing data and treats it as such.
          - Continuous predictor:
              - The column mean is substituted for the missing value.
              - Additionally, an indicator column is added to the predictors, where rows take the value 1 where data is missing and 0 otherwise.
      - This can significantly improve the fit when data is missing not at random, and avoids data and power reduction due to missing cells in other situations: http://blogs.sas.com/content/jmp/2013/10/29/its-not-just-what-you-say-but-what-you-dont-say-informative-missing/
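
The coding scheme described for regression methods can be sketched as follows. This is an illustration only, not JMP's code: the function names are hypothetical, and `None` is assumed to mark a missing cell.

```python
def informative_missing(column):
    """Mean-impute a continuous column and add a missing-value indicator.

    `None` marks a missing cell (an assumption of this sketch).
    Returns (imputed_column, indicator_column).
    """
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    imputed = [mean if v is None else v for v in column]
    indicator = [1 if v is None else 0 for v in column]  # 1 where data was missing
    return imputed, indicator

def categorical_missing(column, label="Missing"):
    """Treat missing cells of a categorical column as their own level."""
    return [label if v is None else v for v in column]
```

Both returned columns of `informative_missing` would then enter the regression as predictors, so the model can estimate a separate effect for "this value was missing".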
  • 21. Statistical Modeling
      - We will take a case study approach to introducing some of the common statistical modeling methods deployed with model validation approaches:
          - Types: Decision Trees; Regression, PLS; Neural Networks; Shrinkage Methods
          - Approaches: Stepwise; Boosting; Model averaging, e.g. random forests
  • 22. Case Study 1: Regression. Banding in a Printing Process
  • 23. Regression (continuous response)
      - Examples:
          - $Y = f(X_1, X_2, \dots, X_k)$
          - $Y = a_0 + a_1 X_1 + a_2 X_2 + \cdots + a_k X_k$
          - $Y = a_0 + \sum_i a_i X_i + \sum_i \sum_{j>i} a_{ij} X_i X_j$
  • 24. Regression (categorical response)
      - Example: Logistic Regression, where $P[Y = \text{target}]$ is modelled through $f(X_1, X_2, \dots, X_k)$:
          - $P[Y = \text{target}] = \dfrac{1}{1 + e^{-f(X_1, X_2, \dots, X_k)}}$
          - $f(X_1, X_2, \dots, X_k) = a_0 + a_1 X_1 + a_2 X_2 + \cdots + a_k X_k$
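
As a small worked illustration of the logistic form, the sketch below evaluates the linear predictor and maps it to a probability. The coefficient values used in any call are hypothetical; in practice they would be estimated from data.

```python
import math

def linear_predictor(coefs, xs):
    """f(X) = a0 + a1*X1 + ... + ak*Xk, with coefs = [a0, a1, ..., ak]."""
    return coefs[0] + sum(a * x for a, x in zip(coefs[1:], xs))

def prob_target(coefs, xs):
    """P[Y = target] = 1 / (1 + exp(-f(X)))."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(coefs, xs)))
```

When f(X) = 0 the predicted probability is 0.5; large positive values of f(X) push it toward 1 and large negative values toward 0.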
  • 25. Model Selection
      - Stepwise Regression
          - Start with a base model: intercept only or all terms.
          - If intercept only, find the term not yet included that explains the most variation and enter it into the model.
          - If all terms, remove the term that explains the least.
          - Continue until a stopping criterion is met (such as validation R-Square).
      - A variation of stepwise regression is all possible subsets (best subset) regression.
          - Examine all 2-, 3-, 4-term (and so on) models and pick the best of each size. Sometimes statistical heredity is imposed to make the problem more tractable.
      - See Gardner, S. “Model Selection: Part 2 - Model Selection Procedures”, ASQ Statistics Division Newsletter, Volume 29, No. 3, Spring, 2011, http://asqstatdiv.org/newsletterarch.php, for a discussion of stepwise regression for continuous response models.
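
The forward variant of stepwise regression can be sketched as below. This is a simplified illustration under stated assumptions, not JMP's algorithm: terms are plain column indices, the entering term is the one with the highest training R-Square, and the search stops as soon as validation R-Square fails to improve. All function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(rows, ys, terms):
    """Least-squares fit of an intercept plus the chosen column indices."""
    design = [[1.0] + [row[t] for t in terms] for row in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve(xtx, xty)

def r_square(rows, ys, terms, coefs):
    """1 - SSE/SST of the fitted model on the given rows."""
    mean_y = sum(ys) / len(ys)
    preds = [coefs[0] + sum(c * row[t] for c, t in zip(coefs[1:], terms))
             for row in rows]
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - sse / sst

def forward_stepwise(train_x, train_y, valid_x, valid_y):
    """Enter the term explaining most training variation; stop when
    validation R-Square no longer improves."""
    n_terms = len(train_x[0])
    selected = []
    best = r_square(valid_x, valid_y, [], fit_ols(train_x, train_y, []))
    while len(selected) < n_terms:
        fits = {}
        for t in range(n_terms):
            if t not in selected:
                terms = selected + [t]
                coefs = fit_ols(train_x, train_y, terms)
                fits[t] = (r_square(train_x, train_y, terms, coefs), coefs)
        entering = max(fits, key=lambda t: fits[t][0])
        valid_r2 = r_square(valid_x, valid_y, selected + [entering],
                            fits[entering][1])
        if valid_r2 <= best:
            break  # stopping criterion: validation R-Square did not improve
        selected.append(entering)
        best = valid_r2
    return selected, best
```

The sketch assumes the candidate columns are not exactly collinear; a production implementation would need to handle singular designs.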
  • 26. Model Selection
      - Drawbacks:
          - Selection is all or nothing. The term either is in the model or isn’t.
          - May miss important X’s when the data are correlated, and parameter estimates can be unstable.
          - Optimal search may not follow a linear algorithmic path. Adding the best term at each step may not produce the best overall model.
          - Large models may be impossible to examine using all subsets regression.
      - Shrinkage Methods:
          - Attempt to simultaneously minimize the prediction error and shrink the parameter estimates toward zero. Resulting estimates are biased, but prediction error is often smaller.
          - Can be considered as continuous model term selection.
          - Common techniques: Ridge Regression, LASSO, Elastic Net.
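
To make the idea of shrinkage concrete, the sketch below gives the closed-form ridge and LASSO slope for the simplest case of a single centred predictor. The objective functions stated in the docstrings are the assumed penalised forms; the penalty weight `lam` and the function names are illustrative.

```python
def centered_sums(xs, ys):
    """Return (Sxx, Sxy) computed on mean-centred x and y."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxx, sxy

def ridge_slope(xs, ys, lam):
    """Minimise sum((y - b*x)^2) + lam * b^2 on centred data."""
    sxx, sxy = centered_sums(xs, ys)
    return sxy / (sxx + lam)

def lasso_slope(xs, ys, lam):
    """Minimise 0.5 * sum((y - b*x)^2) + lam * |b| on centred data."""
    sxx, sxy = centered_sums(xs, ys)
    shrunk = max(abs(sxy) - lam, 0.0)  # soft-thresholding
    return shrunk / sxx if sxy >= 0 else -shrunk / sxx
```

With `lam = 0` both reduce to the ordinary least-squares slope. As `lam` grows, ridge shrinks the slope smoothly toward zero, while LASSO sets it exactly to zero once `lam` reaches |Sxy|, which is why LASSO also acts as a term selector.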
  • 27. Example 1: Banding in a Printing Process
  • 28. Case Study 2: Decision Trees. Which customer segments to target with campaigns
  • 29. Decision Trees
      - Also known as Recursive Partitioning, CHAID, CART.
      - Models are a series of nested IF() statements, where each condition in the IF() statement can be viewed as a separate branch in a tree.
      - Branches are chosen so that the difference in the average response between paired branches is maximised.
      - Doing so assigns more of the variation in Y to f(X).
      - The algorithm gets more complicated and the computations more intensive with holdback.
  • 30. Decision Tree
      - Goal is to predict those with a code of “1”. Overall rate is 3.23%.
      - Candidate “X’s”:
          - Search through each of these.
          - Examine splits for each unique level in each X.
          - Find the split that maximizes the difference in proportions of the target variable.
          - LogWorth = -Log10(p-value) for the best split on each variable. The best split has the maximum LogWorth.
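
The split search can be sketched for one continuous X and a 0/1 target as follows. The p-value here comes from a Pearson chi-square test on the 2x2 table of branch by target, which is an assumption of this sketch; the statistic and any p-value adjustment that JMP applies may differ. Function names are hypothetical.

```python
import math

def logworth_for_split(left, right):
    """-log10(p) of a Pearson chi-square test on a 2x2 table (1 d.f.).

    `left` and `right` are lists of 0/1 target values in each branch.
    """
    n1, n2 = len(left), len(right)
    k1, k2 = sum(left), sum(right)
    n, k = n1 + n2, k1 + k2
    if k == 0 or k == n:
        return 0.0  # no variation in the target, nothing to split on
    chi2 = n * (k1 * (n2 - k2) - k2 * (n1 - k1)) ** 2 / (n1 * n2 * k * (n - k))
    # Upper tail of a chi-square with 1 d.f. equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return -math.log10(p_value) if p_value > 0 else float("inf")

def best_split(xs, ys):
    """Try a cut at every unique level of X (x < cut vs x >= cut) and
    return the cut with the maximum LogWorth."""
    best_cut, best_lw = None, -1.0
    for cut in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < cut]
        right = [y for x, y in zip(xs, ys) if x >= cut]
        lw = logworth_for_split(left, right)
        if lw > best_lw:
            best_cut, best_lw = cut, lw
    return best_cut, best_lw
```

Running `best_split` once per candidate X and comparing the returned LogWorth values mirrors the "search through each of these" step; the tree is then grown by repeating the search within each resulting partition.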
  • 31. Decision Tree
      - 1st split: optimal split at Age<28. Notice the difference in the rates in each branch of the tree.
      - Repeat the “split search” across both “partitions” of the data. Find the optimal split across both branches.
  • 32. Decision Tree
      - 2nd split on CARDS (no CC vs some CC’s). Notice the variation in the proportion of “1” in each branch.
  • 33. Decision Tree
      - 3rd split on TEL (# of handsets owned). Notice the variation in the proportion of “1” in each branch.
  • 34. Model Evaluation
      - Continuous response models are evaluated using SSE (sum of squared error) measures such as R^2 and adjusted R^2.
          - Other alternatives are information-based measures such as AIC and BIC.
      - Categorical response models are evaluated on their ability to:
          - Sort portions of the data into different levels of response, using ROC curves and Lift curves.
          - Categorize a new observation, measured by confusion matrices and rates, as well as the overall misclassification rate.
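
A few of these measures are simple enough to compute by hand. The sketch below shows R^2 for a continuous response, and confusion counts plus misclassification rate for a categorical one; function names and the 0/1 coding with `target=1` are assumptions for illustration.

```python
def r_square_from_predictions(actual, predicted):
    """R^2 = 1 - SSE/SST for a continuous response."""
    mean_y = sum(actual) / len(actual)
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    sst = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - sse / sst

def confusion_counts(actual, predicted, target=1):
    """Counts of true/false positives and negatives for one target level."""
    pairs = list(zip(actual, predicted))
    return {
        "tp": sum(1 for a, p in pairs if a == target and p == target),
        "fp": sum(1 for a, p in pairs if a != target and p == target),
        "fn": sum(1 for a, p in pairs if a == target and p != target),
        "tn": sum(1 for a, p in pairs if a != target and p != target),
    }

def misclassification_rate(actual, predicted):
    """Share of observations assigned to the wrong category."""
    return sum(1 for a, p in zip(actual, predicted) if a != p) / len(actual)
```

Computing these on validation or test data, not on the training data, gives the honest assessment discussed earlier in the deck.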
  • 35. ROC Curve Example
  • 36. ROC Curves
      • The higher the ROC curve sits above the 45 degree line, the better the model sorts the data compared with simple random sorting.
      • The ROC curve is constructed on the sorted table (e.g. sort the data from highest Prob[Y==target] to lowest). For each row, if the actual value is equal to the target, the curve is drawn upward (vertically); otherwise it is drawn across (horizontally). Drawing ‘up’ means the model sorted well; drawing ‘across’ means the model did not sort well.
      • A good general measure of how well the model predicts and sorts is the Area Under the Curve (AUC), which is just the area under the constructed ROC curve. This is a value in the range [0,1]; values greater than 0.5 indicate models that are better than simple random guessing.
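The construction described on this slide translates almost directly into code. The sketch below is a minimal illustration, assuming no tied probabilities (ties would need the tied rows to be stepped together); the function names and the six-row example are hypothetical.

```python
def roc_points(probs, actual, target=1):
    """Build the ROC curve from the table sorted by Prob[Y==target], highest first.

    Each target row moves the curve up by 1/n_target; every other row moves it
    across by 1/n_other. Assumption: no tied probabilities.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    n_target = sum(1 for a in actual if a == target)
    n_other = len(actual) - n_target
    x = y = 0.0
    points = [(x, y)]
    for i in order:
        if actual[i] == target:
            y += 1.0 / n_target   # drawn upward: the model sorted this row well
        else:
            x += 1.0 / n_other    # drawn across: the model sorted this row badly
        points.append((x, y))
    return points

def auc(points):
    """Area under the piecewise-linear curve (trapezoid rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical scored table: predicted probability and actual outcome.
probs = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
actual = [1, 1, 0, 1, 0, 0]
print(round(auc(roc_points(probs, actual)), 4))
```

In this example eight of the nine (target, non-target) pairs are ordered correctly, so the AUC is 8/9, well above the 0.5 of random sorting.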
  • 37. Decision Trees: Which customer segments are most likely to churn (Example 2)
  • 38. Case Study 3: Uplift Modeling. Which customer segments to target with campaigns
  • 39. Uplift Modelling
      • Also known as incremental modelling, true lift modelling or net modelling.
      • Identifies individuals or sub-groups who are most likely to respond favourably to some action:
          • Customers likely to respond to marketing campaigns, to help optimize marketing decisions.
          • Patients likely to respond to medical intervention, to help define personalized medicine protocols.
      • Unlike traditional partition models that find splits to optimize a prediction, uplift models find splits to maximize a treatment difference.
      • The best split is the split that maximises the interaction of the split and treatment.
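The idea of splitting on a treatment difference rather than on the response itself can be illustrated as follows. This is a deliberately simplified sketch: it scores a split by the raw gap in uplift between the two branches, whereas a real uplift platform would typically test the split-by-treatment interaction in a fitted model. All names and the toy data are hypothetical.

```python
def rate(responses):
    """Response rate of a group; 0.0 for an empty group (a simplification)."""
    return sum(responses) / len(responses) if responses else 0.0

def uplift(rows):
    """rows: (treated, responded) pairs. Uplift = treated rate - control rate."""
    treated = [r for t, r in rows if t]
    control = [r for t, r in rows if not t]
    return rate(treated) - rate(control)

def uplift_gap(rows, threshold):
    """Difference in uplift between the branches 'x < threshold' and 'x >= threshold'."""
    left = [(t, r) for x, t, r in rows if x < threshold]
    right = [(t, r) for x, t, r in rows if x >= threshold]
    return abs(uplift(left) - uplift(right))

def best_uplift_split(rows):
    """Pick the threshold whose branches differ most in treatment effect."""
    candidates = sorted({x for x, _, _ in rows})[1:]
    return max(candidates, key=lambda c: uplift_gap(rows, c))

# Hypothetical campaign data: (age, treated, responded).
# Only the younger customers respond, and only when treated.
rows = [
    (20, 1, 1), (20, 0, 0), (25, 1, 1), (25, 0, 0),
    (40, 1, 0), (40, 0, 0), (50, 1, 0), (50, 0, 0),
]
print(best_uplift_split(rows), uplift_gap(rows, 40))
```

A split that simply predicted the response well would not necessarily be chosen here; the chosen split is the one that separates customers who react to the campaign from those who do not.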
  • 40. Example
  • 41. Example Continued
  • 42. Case Study 4: Bootstrap Forest and Boosted Trees. Quantitative Structure-Activity Modeling
  • 43. Improvements to Decision Trees. Two modifications to basic decision trees that (depending on circumstances and the specific data) may develop better models are:
      1. Fit many models and average them: Bootstrap Forest or Random Forest.
      2. Fit a simple model and boost it by fitting a simple model to the model errors, repeating several times: Model Boosting or Boosted Tree.
  • 44. Bootstrap Forest
      • For each tree, take a random sample (with replacement) of rows.
      • For each split, take a random sample (30% sample) of X’s.
      • Build the decision tree.
      • Repeat the above process, making many trees, and average predictions across all trees (bagging).
      • Also known as a random forest.
      • Works very well on wide tables (with correlated X’s).
      • Can be used for both predictive modeling and variable selection.
  • 45. See the Trees in the Forest (figure panels: tree on 1st bootstrap sample, tree on 2nd bootstrap sample, tree on 3rd bootstrap sample, …, tree on 100th bootstrap sample)
  • 46. Average the Trees in the Forest
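The Bootstrap Forest recipe from these slides (resample rows with replacement, sample a fraction of the X's, build a tree, average many trees) can be sketched as below. To keep it short, each "tree" is a single-split regression stump, so the per-split sampling of X's happens once per tree; that simplification, the function names, and the toy data are assumptions for illustration only.

```python
import random

def fit_stump(X, y, features):
    """Lowest-SSE single split among the given feature indices."""
    best = None
    for j in features:
        for t in sorted({row[j] for row in X})[1:]:
            left = [yi for row, yi in zip(X, y) if row[j] < t]
            right = [yi for row, yi in zip(X, y) if row[j] >= t]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:  # no split available in this sample: predict the mean
        m = sum(y) / len(y)
        return (0, float("inf"), m, m)
    return best[1:]

def predict_stump(stump, row):
    j, t, ml, mr = stump
    return ml if row[j] < t else mr

def bootstrap_forest(X, y, n_trees=100, feature_frac=0.3, seed=1):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    k = max(1, round(feature_frac * p))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample of rows
        feats = rng.sample(range(p), k)              # random subset of X's
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    """Bagging: average the predictions of all trees."""
    return sum(predict_stump(s, row) for s in forest) / len(forest)

def out_of_bag_fraction(n, seed=1):
    """Share of rows never drawn in one bootstrap sample of size n."""
    rng = random.Random(seed)
    return 1 - len({rng.randrange(n) for _ in range(n)}) / n

# Hypothetical table: column 0 drives the response, columns 1 and 2 are noise.
X = [[i, (i * 7) % 10, (i * 3) % 10] for i in range(20)]
y = [0.0 if i < 10 else 1.0 for i in range(20)]
forest = bootstrap_forest(X, y)
print(predict_forest(forest, X[2]), predict_forest(forest, X[17]))
print(out_of_bag_fraction(10000))
```

Individual stumps built on the noise columns are poor models, but the average over the forest still separates low and high rows. The last line illustrates that a bootstrap sample leaves out a little over a third of the rows.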
  • 47. Boosted Tree
      • Beginning with the first tree (layer), build a small simple tree.
      • From the residuals of the first tree, build another small simple tree.
      • The next layer in the model is fit to the residuals from the previous layer, and residuals are saved from that new model fit.
      • This continues until a specified number of layers has been fit, or a determination has been made that adding successive layers doesn’t improve the fit of the model.
      • The final model is the weighted accumulation of all of the model layers.
  • 48. Boosted Tree Illustrated. The models (layers) $M_1, M_2, M_3, \ldots, M_{49}$ are accumulated into the final model $M = M_1 + \varepsilon \cdot M_2 + \varepsilon \cdot M_3 + \cdots + \varepsilon \cdot M_{49}$, where $\varepsilon$ is the learning rate.
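The layering described on the Boosted Tree slides can be sketched as follows. Each layer is a single-split stump fit to the residuals of the layers before it, and the layers are accumulated as M = M1 + ε·M2 + … with the first layer at full weight, mirroring the formula on the slide. The one-predictor setup, the function names, and the toy data are assumptions for illustration.

```python
def fit_stump(xs, rs):
    """Lowest-SSE single split 'x < t' on one predictor, fit to the values rs."""
    best = None
    for t in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, rs) if x < t]
        right = [r for x, r in zip(xs, rs) if x >= t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    return best[1:]  # assumes xs holds at least two distinct values

def boost(xs, ys, n_layers=49, learning_rate=0.1):
    """Fit each layer to the residuals left by the layers before it."""
    layers = []
    residuals = list(ys)
    for k in range(n_layers):
        t, ml, mr = fit_stump(xs, residuals)
        weight = 1.0 if k == 0 else learning_rate   # M1 + eps*M2 + ... as on the slide
        layers.append((weight, t, ml, mr))
        residuals = [r - weight * (ml if x < t else mr)
                     for x, r in zip(xs, residuals)]
    return layers

def predict(layers, x):
    """Final model: weighted accumulation of all layers."""
    return sum(w * (ml if x < t else mr) for w, t, ml, mr in layers)

def sse(layers, xs, ys):
    return sum((y - predict(layers, x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical curved relationship that a single stump cannot capture.
xs = list(range(10))
ys = [float(x * x) for x in xs]
one_layer = boost(xs, ys, n_layers=1)
many_layers = boost(xs, ys, n_layers=49)
print(sse(one_layer, xs, ys), sse(many_layers, xs, ys))
```

A small learning rate makes each later layer correct only part of the remaining error, which is why many layers are used; in practice the number of layers would be chosen with a holdback set rather than fixed in advance.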
  • 49. Boosted Trees
      • Primarily used for building prediction models.
      • Not as good as Bootstrap Forest at exploring all the relationships between Y and the X’s, but can still be used for that purpose.
      • Results in ‘smaller’ models (fewer arithmetic operations) and faster scoring.
  • 50. Other Pro Modelling Methods
      • PLS
      • Neural Networks
      • Shrinkage Methods: Ridge Regression, LASSO, Elastic Net
      • PCA Clustering
  • 51. We have shown how you can use JMP to
      • Build better models
      • Manage “messy data” easily
      • Compare alternative models and approaches to quickly:
          • Learn more from your data
          • Select the best variables
          • Make better predictions
      • Communicate the consequences to execs and other stakeholders
      • Make better decisions, faster
  • 52. How mining your data helps your company
      • Increase growth and return
      • Reduce costs
      • Deliver a competitive edge
      • Improve loyalty
      • Accelerate innovation
      • Speed time to market
  • 53. JMP helps you make better decisions, faster
  • 54. What are you going to do next? Visit jmp.com for more information about JMP. Sign up for our webinars and seminars.

Editor's Notes

  1. We start off by defining what a statistical model is. A statistical model is a function, f(X), that we use to predict some response or outcome, which we label Y. Here, X represents one or more continuous or categorical predictor variables. We write the statistical model as Y = f(X) + residual error. The residual error is the leftover part of the variation in Y that we cannot predict with the function f(X). During the process of building and evaluating statistical models, the residual error plays a key role, and examining and understanding it can help us greatly as we seek to build a good model.
  2. The Bootstrap Forest method applies ideas in data sampling and model averaging that help build predictive models with very good performance. The first idea that is applied is bootstrapping. A bootstrap sample from a data table is a 100% sample of the data, but performed with replacement. So, for instance, if we had a data table with 1000 rows of data, then a bootstrap sample would have 1000 rows of data. But the bootstrap sample would not be the same as the original data table, because sampling with replacement leads to some rows being sampled more than once (2, 3, 4, or even more times) and some rows not being sampled at all. Typically about 37% of the rows are not selected at all (the expected fraction approaches 1/e ≈ 0.368 as the table grows). If you build a decision tree model on that bootstrap sample, it may not, by itself, be a very good tree, because the bootstrapped data used to build it is so unrepresentative of the actual data. But if we repeat this process over and over again, and average all the models built across many bootstrap samples, this can lead to a model that performs very well. This approach of averaging models built across many bootstrap samples is known as Bootstrap Aggregation, or “bagging”. Bagged decision tree models, on average, perform better than a single decision tree built on the original data. To improve the model performance even more, we apply sampling of factors during the tree building process. For each tree built (on a separate bootstrap sample), for each split decision, a random subset of all the possible factors is selected and the optimal split is found among them. Typically we choose a small subset, say 25% of all the candidate factors. So at each split decision, about 75% of the factors are ignored, at random. This gives each factor an opportunity to contribute to the model. Combining bagging and variable sampling results in a tree building method known as a Bootstrap Forest. This is also known as a random forest technique.
  3. Boosting is a newer idea in data mining, where models are built in layers. (go to the next slide)