Copyright © 2010 SAS Institute Inc. All rights reserved.
Introduction to Modeling
Building Better Models – Part 1
What is a Model?
 An empirical representation that relates a set of inputs
(predictors, X) to one or more outcomes (responses, Y)
 Separates the response variation into signal and noise
Y = f(X) + E
 Y is one or more continuous or categorical response outcomes
 X is one or more continuous or categorical predictors
 f(X) describes predictable variation in Y (signal)
 E describes non-predictable variation in Y (noise)
 The mathematical form of f(X) can be based on domain
knowledge or mathematical convenience.
 “Remember that all models are wrong; the practical question is how wrong
do they have to be to not be useful.”
– George Box
What makes a Good Model?
 Accurate and precise estimation of predictor effects.
 Focus is on the right-hand side of the equation Y = f(X) + E.
 Interest is in the influence predictors have on the response(s).
 Design of experiments is an important tool for achieving this
goal. Good designs lead to good models.
 Accurate and precise prediction.
 Focus is on the left-hand side of the equation Y = f(X) + E.
 Interest is in best predicting outcomes from a set of inputs, often
from historical or in situ data.
 Predictive modeling (data mining) methodologies are important
tools for achieving this goal.
 These two goals typically occur at different points in the
data discovery process.
Modeling in Action
Diagram: the real world (“what is really happening”) and the model Y = f(X) + E (“what we think is happening”), linked in both directions by data collection, either historical/in situ or experimental.
Adapted from Box, Hunter, & Hunter
Introduction to Predictive Models
 A type of statistical model where the focus is on
predicting Y independent of the form used for f(X).
 There is less concern about the form of the model – parameter
estimation isn’t important. The focus is on how well it predicts.
 Very flexible models are often used to allow for a greater range
of possibilities.
 Variable selection/reduction is typically important.
 http://en.wikipedia.org/wiki/Predictive_modelling
Introduction to Predictive Models
 Example: Predict group (red or blue) using X1 and X2
Introduction to Predictive Models
 Two competing methods.
 Results are for a single sample.
Regression
Misclassification Rate = 32.5%
Nearest Neighbor
Misclassification Rate = 0%
Example from Hastie, Tibshirani, & Friedman: The Elements of Statistical Learning
Introduction to Predictive Models
 f(X): A majority of techniques are captured by two
general approaches:
 Global function of data
» More stable, less flexible
» Captures smooth relationship between continuous inputs and
outputs
» Examples: Regression, Generalized Regression, Neural
Networks, PLS, Discriminant Analysis
 Local function of data
» More flexible, less stable
» Captures local relationships in data (e.g., discrete shifts and
discontinuities)
» Examples: Nearest Neighbors, Bootstrap Forest, Boosted
Trees
Introduction to Predictive Models
 Second sample results
 Nearest neighbor seems to overfit sample 1
Regression
Misclassification Rate = 40%
Nearest Neighbor
Misclassification Rate = 41%
Preventing Model Overfitting
 If the model is flexible, what guards against overfitting
(i.e., producing predictions that are too optimistic)?
 Put another way, how do we avoid modeling the noise
variability as part of f(X)?
 Solution – Hold back part of the data and use it to check
against overfitting. Break the data into two or three sets:
 The model is built on the training set.
 The validation set is used to select the model, by determining
when the model is becoming too complex.
 The test set is used to evaluate how well the model predicts,
independent of the training and validation sets.
 Common methods include k-fold and random holdback.
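As a minimal sketch of the random-holdback idea (using scikit-learn on simulated data; this is an outside illustration, not part of the slides, which assume JMP's validation columns), the table can be split 60/20/20 into training, validation, and test sets:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Simulated stand-in data; any predictor matrix X and response y work.
X, y = make_classification(n_samples=500, n_features=5, random_state=1)

# 60/20/20 random holdback: first hold out 20% as the test set,
# then split the remainder 75/25 into training and validation.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.20, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=1)
# k-fold, the other common method, is available via
# sklearn.model_selection.KFold.
```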
Handling Missing Predictor Values
 Case-wise deletion – Easy, but reduces the sample
 Simple imputation – Replace the value with the variable
mean or median
 Multivariate imputation – Use the correlation between
multiple variables to determine what the replacement
value should be
 Model based imputation – Model with the non-missing
values, replace missing values based on similar cases
 Model free imputation – e.g., distance based, hot deck,
etc.
 Methods insensitive to missing values
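For illustration, here is a hedged sketch of the simple and multivariate approaches using scikit-learn's imputers (an outside example; JMP exposes these through its own platforms, as described on the next slides):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, IterativeImputer

# A small predictor matrix with missing cells.
X = np.array([[1.0, 2.0], [np.nan, 4.0], [5.0, np.nan], [7.0, 8.0]])

# Simple imputation: replace each missing value with its column mean.
X_simple = SimpleImputer(strategy="mean").fit_transform(X)

# Multivariate imputation: model each column from the others and
# iterate, exploiting the correlation among the predictors.
X_multi = IterativeImputer(random_state=0).fit_transform(X)
print(X_simple, X_multi, sep="\n")
```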
How Does JMP Account for Missing Data?
 Categorical
 Creates a separate level for missing data and treats it as such.
 Continuous
 Informative Missing/Missing Value Coding:
» Regression and Neural Network: The column mean is substituted for
the missing value. An indicator column is included in the predictors
where rows are 1 where data is missing, 0 otherwise. This can
significantly improve the fit when data is missing not at random.
» Partition: missing observations are considered on both sides of
the split and grouped with the side providing the better fit.
 Save Tolerant Prediction Formula (Partition):
» The predictor is randomly assigned to one of the splits.
» Only available if Informative Missing is not selected.
How does JMP Handle Missing Data?
 Multivariate Imputation – Available in the Multivariate
platform and Column Utilities: Explore Missing Values.
Based on the correlation structure of the continuous
predictors (the expectation conditional on the non-missing data).
 Model Based Imputation – Available in Fit Model and
PLS (JMP Pro). Impute missing predictors based on
partial least squares model.
Regression and Model Selection
Building Better Models – Part 1
Regression
 General linear regression typically uses simple
polynomial functions for f(X).
 For continuous y:
 For categorical y, the logistic function of f(X) is typically used.
$$f(x) = \beta_0 + \sum_{i=1}^{p} \beta_i x_i + \sum_{i=1}^{p} \sum_{j=i+1}^{p} \beta_{i,j}\, x_i x_j + \sum_{i=1}^{p} \beta_{ii}\, x_i^2$$

$$\frac{1}{1 + e^{-f(x)}}$$
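A small illustrative sketch of fitting this full quadratic f(x), for both a continuous and a categorical response, using scikit-learn rather than JMP's Fit Model (an outside example on simulated data):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1 + 2 * X[:, 0] - X[:, 1] ** 2 + rng.normal(scale=0.5, size=200)
y_cat = (y > y.mean()).astype(int)

# Expand X to the full quadratic model: main effects,
# two-way interactions, and squared terms.
X_quad = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)

ols = LinearRegression().fit(X_quad, y)            # continuous y
logit = LogisticRegression().fit(X_quad, y_cat)    # logistic 1/(1+e^-f(x))
```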
Model Selection
 Stepwise Regression
 Start with a base model: intercept only or all terms.
 If intercept only, find term not included that explains the most
variation and enter it into the model.
 If all terms, remove the term that explains the least.
 Continue until a stopping criterion (typically p-value based) is met.
 A variation of stepwise regression is all possible subsets
(best subset) regression.
 Examine all 2-, 3-, 4-, etc., term models and pick the best of
each size. Sometimes statistical heredity is imposed to make the
problem more tractable.
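As an outside illustration, scikit-learn's SequentialFeatureSelector implements forward selection; note it stops on a cross-validated score rather than the p-value rule described above, so it is an analogous, not identical, procedure:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=2)

# Forward selection: start from the intercept-only model and repeatedly
# add the term that most improves the cross-validated fit.
selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=3, direction="forward")
selector.fit(X, y)
print(selector.get_support())  # boolean mask of the selected predictors
```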
Generalized Regression
 For many engineers and scientists, modeling begins and
ends with ordinary least squares (OLS) and stepwise or
best subsets regression.
 Unfortunately, when predictors are highly correlated, OLS
may produce estimates that are less stable, with higher
prediction variance.
 In addition, there are common situations when OLS
assumptions are violated:
 Responses and/or errors are not normally distributed.
 The response is not a linear combination of the predictors.
 Error variance is not constant across the prediction range.
Penalized Regression
 Generalized regression (aka penalized regression or
shrinkage methods) can circumvent these problems.
 For correlated data, penalized methods produce more stable
estimates by accepting some bias in exchange for lower
prediction variance.
 Provides a continuous approach to variable selection. More
stable than stepwise regression.
 Can be used for problems with more variables than
observations, which cannot be estimated using OLS.
 Three common techniques:
 Ridge Regression: Stabilizes the estimates, but can’t be used for
variable selection.
 LASSO: Shrinks some coefficients exactly to zero, so it also
performs variable selection.
 Elastic Net: Weighted average of Ridge and LASSO.
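A brief sketch of the three techniques on simulated correlated data, using scikit-learn for illustration (the slides assume JMP Pro's Generalized Regression platform):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso, ElasticNet

# Many correlated predictors, a setting where OLS is unstable.
X, y = make_regression(n_samples=100, n_features=50, n_informative=5,
                       effective_rank=10, noise=1.0, random_state=3)

ridge = Ridge(alpha=1.0).fit(X, y)   # shrinks estimates, never zeroes them
lasso = Lasso(alpha=0.1).fit(X, y)   # shrinks some coefficients to exactly 0
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)  # blend of the two

print((lasso.coef_ == 0).sum(), "coefficients dropped by the LASSO")
```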
Quantile Regression
 Makes no distributional or variance-based assumptions.
 Allows the response/predictor relationship to vary with
response.
 Allows modeling of not just the median, but of any
quantile.
 Is more robust than OLS, lessening the influence of
high-leverage points.
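For illustration, a sketch with scikit-learn's QuantileRegressor (requires scikit-learn 1.0 or later; an outside example, not part of the slides), fitting two quantiles to heteroscedastic data:

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(4)
X = rng.uniform(0, 10, size=(300, 1))
# Heteroscedastic noise: the spread grows with X, violating OLS assumptions.
y = 2 * X[:, 0] + rng.normal(scale=0.5 + 0.3 * X[:, 0])

# Fit the median and the 90th percentile as separate models.
q50 = QuantileRegressor(quantile=0.5, alpha=0).fit(X, y)
q90 = QuantileRegressor(quantile=0.9, alpha=0).fit(X, y)
print(q50.coef_, q90.coef_)  # slopes differ because the spread grows with X
```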
Decision Trees
Overview
Decision Trees
 Also known as Recursive Partitioning, CHAID, CART
 Models are a series of nested IF() statements, where
each condition in the IF() statement can be viewed as a
separate branch in a tree.
 Branches are chosen so that the difference in the
average response (or average response rate) between
paired branches is maximized.
 Tree models are “grown” by adding more branches to
the tree so that more of the variability in the response is
explained by the model.
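A minimal sketch of recursive partitioning using scikit-learn (an outside illustration: it splits on Gini impurity by default, whereas JMP's Partition platform uses LogWorth):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=4, random_state=5)

# Grow a shallow tree; each split is chosen to maximize the separation
# in response rate between the paired branches.
tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree))  # the nested IF() structure, printed as rules
```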
Decision Tree
Step-by-Step
Goal is to predict those with a code of “1”
Overall Rate is 3.23%
Candidate “X’s”
• Search through each of these
• Examine Splits for each unique level
in each X
• Find Split that maximizes “LogWorth”
• Will find split that maximizes
difference in proportions of the
target variable
Decision Tree
Step-by-Step
1st Split:
Optimal Split at Age<28
Notice the difference in the rates
in each branch of the tree
Repeat “Split Search” across both “Partitions”
of the data. Find optimal split across both
branches.
Decision Tree
Step by Step
2nd split on CARDS (no CC vs. some CC’s)
Notice variation in the proportion of “1” in each branch
Decision Tree (Step by Step)
3rd split on TEL
(# of handsets owned)
Notice variation in
proportion of “1” in each
branch
Bootstrap Forest
 Bootstrap Forest
 For each tree, take a random sample (with replacement) of the
data table. Build out a decision tree on that sample.
 Make many trees and average their predictions (bagging).
 This is also known as the random forest technique.
 Works very well on wide tables.
 Can be used for both predictive modeling and variable
selection.
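An illustrative sketch with scikit-learn's RandomForestClassifier, one common implementation of the bagged-tree idea described above (the slides assume JMP Pro's Bootstrap Forest platform):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# A wide-ish table: many predictors, only a few of them informative.
X, y = make_classification(n_samples=500, n_features=20, n_informative=4,
                           random_state=6)

# 100 trees, each grown on a bootstrap sample of the rows; the forest's
# prediction is the average over all trees (bagging).
forest = RandomForestClassifier(n_estimators=100, random_state=6).fit(X, y)

# Importances support the variable-selection use mentioned above.
print(forest.feature_importances_.round(3))
```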
See the Trees in the Forest
Tree on 1st Bootstrap Sample
Tree on 2nd Bootstrap Sample
Tree on 3rd Bootstrap Sample
…
Tree on 100th Bootstrap Sample
Average the Trees in the Forest
Boosted Tree
 Begin by building a small, simple tree (the first layer).
 From the residuals of the first tree, build another small simple tree.
 This continues until a specified number of layers has been fit, or a
determination has been made that adding successive layers doesn’t
improve the fit of the model.
 The final model is the weighted accumulation of all of the model
layers.
Boosted Tree Illustrated
Models: M1, M2, M3, …, M49
Final Model: 𝑀 = 𝑀1 + 𝜀 ∙ 𝑀2 + 𝜀 ∙ 𝑀3 + ⋯ + 𝜀 ∙ 𝑀49, where 𝜀 is the learning rate
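A hedged sketch of the same layered idea using scikit-learn's gradient boosting (an analogous, not identical, algorithm to JMP's Boosted Tree):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=7)

# 49 small, simple trees (depth 2); each new layer is fit to the
# residuals of the model so far and added with learning rate ε.
boost = GradientBoostingClassifier(n_estimators=49, max_depth=2,
                                   learning_rate=0.1).fit(X, y)
print(boost.score(X, y))  # training accuracy; use a test set in practice
```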
Neural Networks
Overview
Neural Networks
 Neural Networks are highly flexible nonlinear models.
 A neural network can be viewed as a weighted sum of
nonlinear functions applied to linear models.
 The nonlinear functions are called activation functions. Each
function is considered a (hidden) node.
 The nonlinear functions are grouped in layers. There may be
more than one layer.
 Consider a generic example with a response Y and two
predictors X1 and X2. An example neural network that can
be fit to these data is given in the diagram that follows.
Example Neural Network Diagram
Diagram: Inputs → 1st Hidden Node Layer → 2nd Hidden Node Layer → Output
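As an outside illustration, a small two-hidden-layer network of this shape can be fit with scikit-learn's MLPRegressor (the slides assume JMP's Neural platform):

```python
from sklearn.datasets import make_regression
from sklearn.neural_network import MLPRegressor

# Response Y with two predictors X1 and X2, as in the generic example.
X, y = make_regression(n_samples=300, n_features=2, noise=1.0,
                       random_state=8)

# Two hidden layers of three tanh activation nodes each: a weighted sum
# of nonlinear functions applied to linear combinations of the inputs.
nn = MLPRegressor(hidden_layer_sizes=(3, 3), activation="tanh",
                  max_iter=5000, random_state=8).fit(X, y)
print(nn.score(X, y))
```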
Neural Networks
 Big Picture
 Can model:
» Continuous and categorical predictors
» Continuous and categorical responses
» Multiple responses (simultaneously)
 Can be numerically challenging and time consuming to fit
 NN models are very prone to overfitting if you are not careful
» JMP has many ways to help prevent overfitting
» Some type of validation is required
» Core analytics are designed to stop fitting process early
if overfitting is occurring.
See Gotwalt, C., “JMP® 9 Neural Platform Numerics”, Feb 2011,
http://www.jmp.com/blind/whitepapers/wp_jmp9_neural_104886.pdf
Model Comparison
Overview
Choosing the Best Model
 In many situations you would try many different types of
modeling methods
 Even within each modeling method, there are options to
create different models
 In Stepwise, the base/full model specification can be varied
 In Bootstrap Forest, the number of trees and the number of terms
sampled per split
 In Boosted Tree, the learning rate, number of layers, and base
tree size
 In Neural, the specification of the model, as well as the use of
boosting
 So how can you choose the “best”, most useful model?
The Importance of the Test Set
 One of the most important uses of having a training,
validation, AND test set is that you can use the test set
to assess each model on the same basis.
 Using the test set allows you to compare competing
models on the basis of model quality metrics
 R2
 Misclassification Rate
 ROC and AUC
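A minimal sketch of comparing two candidate models on the same held-out test set, using scikit-learn and simulated data for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=9)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=9)

# Score every candidate model on the same held-out test set.
for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(random_state=9)):
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    prob = model.predict_proba(X_test)[:, 1]
    print(type(model).__name__,
          "misclassification:", round(1 - accuracy_score(y_test, pred), 3),
          "AUC:", round(roc_auc_score(y_test, prob), 3))
```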
Appendix
Model Evaluation
 Continuous response models evaluated using SSE (sum
of squared error) measures such as R2, adjusted R2
 R2= 1 – (SSE / TSS)
» TSS is the total sum of squares for the response
» R2 = 1 → perfect prediction
» R2 = 0 → no better than using the response average
 Other alternatives are Information based measures such as AIC
and BIC, also dependent on the SSE.
 Categorical response models evaluated on ability:
 to sort the data, using ROC curves
 to categorize a new observation measured by confusion matrices
and rates, as well as overall misclassification rate
Demystifying ROC Curves
ROC Curve Construction Steps
 The ROC curve is constructed on the sorted table (e.g. sort the data
from highest Prob[Y==target] to lowest).
 For each row, if the actual value is equal to the positive target,
then the curve is drawn upward (vertically) an increment of
1/(Number of Positives)
 Otherwise the curve is drawn across (horizontally) an increment of
1/(Number of Negatives).
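These steps translate directly into code; here is a small self-contained sketch (illustrative, using NumPy):

```python
import numpy as np

def roc_curve_points(y_true, prob_target):
    """Trace the ROC curve exactly as described in the steps above."""
    order = np.argsort(-np.asarray(prob_target))   # sort high to low
    y_sorted = np.asarray(y_true)[order]
    n_pos = y_sorted.sum()
    n_neg = len(y_sorted) - n_pos
    xs, ys = [0.0], [0.0]
    for actual in y_sorted:
        if actual == 1:   # positive: step up by 1/(number of positives)
            xs.append(xs[-1])
            ys.append(ys[-1] + 1.0 / n_pos)
        else:             # negative: step right by 1/(number of negatives)
            xs.append(xs[-1] + 1.0 / n_neg)
            ys.append(ys[-1])
    return xs, ys

xs, ys = roc_curve_points([1, 0, 1, 1, 0], [0.9, 0.8, 0.7, 0.4, 0.2])
print(list(zip(xs, ys)))
```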
Using ROC Curves
The higher the curve is, the better the model is at sorting. The Area Under the Curve (AUC) will always be in the [0,1] range and can be used as a metric of model quality.
An ROC curve that is above the 45-degree line is better than “guessing” (sorting at random).
Every point on the curve represents the true positive rate (sensitivity) vs. the false positive rate (1 − Specificity) for a particular threshold that one could use to classify observations.

More Related Content

What's hot

00 DoE vers. OFAT (or COST) , a comparison
00 DoE vers. OFAT (or COST) , a  comparison 00 DoE vers. OFAT (or COST) , a  comparison
00 DoE vers. OFAT (or COST) , a comparison Stefan Moser
 
Big Data - To Explain or To Predict? Talk at U Toronto's Rotman School of Ma...
Big Data - To Explain or To Predict?  Talk at U Toronto's Rotman School of Ma...Big Data - To Explain or To Predict?  Talk at U Toronto's Rotman School of Ma...
Big Data - To Explain or To Predict? Talk at U Toronto's Rotman School of Ma...Galit Shmueli
 
To explain or to predict
To explain or to predictTo explain or to predict
To explain or to predictGalit Shmueli
 
Repurposing predictive tools for causal research
Repurposing predictive tools for causal researchRepurposing predictive tools for causal research
Repurposing predictive tools for causal researchGalit Shmueli
 
To Explain, To Predict, or To Describe?
To Explain, To Predict, or To Describe?To Explain, To Predict, or To Describe?
To Explain, To Predict, or To Describe?Galit Shmueli
 
03 Design of Experiments - Factor prioritization
03 Design of Experiments - Factor prioritization03 Design of Experiments - Factor prioritization
03 Design of Experiments - Factor prioritizationStefan Moser
 
S01 Shainin component swap, DoE
S01 Shainin component swap, DoES01 Shainin component swap, DoE
S01 Shainin component swap, DoEStefan Moser
 
Repurposing Classification & Regression Trees for Causal Research with High-D...
Repurposing Classification & Regression Trees for Causal Research with High-D...Repurposing Classification & Regression Trees for Causal Research with High-D...
Repurposing Classification & Regression Trees for Causal Research with High-D...Galit Shmueli
 
Causal Inference in Data Science and Machine Learning
Causal Inference in Data Science and Machine LearningCausal Inference in Data Science and Machine Learning
Causal Inference in Data Science and Machine LearningBill Liu
 
Hypothesis Testing: Relationships (Compare 2+ Factors)
Hypothesis Testing: Relationships (Compare 2+ Factors)Hypothesis Testing: Relationships (Compare 2+ Factors)
Hypothesis Testing: Relationships (Compare 2+ Factors)Matt Hansen
 
Confidence in Software Cost Estimation Results based on MMRE and PRED
Confidence in Software Cost Estimation Results based on MMRE and PREDConfidence in Software Cost Estimation Results based on MMRE and PRED
Confidence in Software Cost Estimation Results based on MMRE and PREDgregoryg
 
83 learningdecisiontree
83 learningdecisiontree83 learningdecisiontree
83 learningdecisiontreetahseen shaikh
 
Statistical flaws in excel calculations
Statistical flaws in excel calculationsStatistical flaws in excel calculations
Statistical flaws in excel calculationsProbodh Mallick
 
Hypothesis Testing: Finding the Right Statistical Test
Hypothesis Testing: Finding the Right Statistical TestHypothesis Testing: Finding the Right Statistical Test
Hypothesis Testing: Finding the Right Statistical TestMatt Hansen
 
DIY market segmentation 20170125
DIY market segmentation 20170125DIY market segmentation 20170125
DIY market segmentation 20170125Displayr
 
To combine forecasts or to combine forecast models?
To combine forecasts or to combine forecast models?To combine forecasts or to combine forecast models?
To combine forecasts or to combine forecast models?Devon K. Barrow
 
DIY Driver Analysis Webinar slides
DIY Driver Analysis Webinar slidesDIY Driver Analysis Webinar slides
DIY Driver Analysis Webinar slidesDisplayr
 
Statistics in the age of data science, issues you can not ignore
Statistics in the age of data science, issues you can not ignoreStatistics in the age of data science, issues you can not ignore
Statistics in the age of data science, issues you can not ignoreTuri, Inc.
 
Levine smume7 ch09 modified
Levine smume7 ch09 modifiedLevine smume7 ch09 modified
Levine smume7 ch09 modifiedmgbardossy
 

What's hot (20)

00 DoE vers. OFAT (or COST) , a comparison
00 DoE vers. OFAT (or COST) , a  comparison 00 DoE vers. OFAT (or COST) , a  comparison
00 DoE vers. OFAT (or COST) , a comparison
 
Big Data - To Explain or To Predict? Talk at U Toronto's Rotman School of Ma...
Big Data - To Explain or To Predict?  Talk at U Toronto's Rotman School of Ma...Big Data - To Explain or To Predict?  Talk at U Toronto's Rotman School of Ma...
Big Data - To Explain or To Predict? Talk at U Toronto's Rotman School of Ma...
 
To explain or to predict
To explain or to predictTo explain or to predict
To explain or to predict
 
Repurposing predictive tools for causal research
Repurposing predictive tools for causal researchRepurposing predictive tools for causal research
Repurposing predictive tools for causal research
 
To Explain, To Predict, or To Describe?
To Explain, To Predict, or To Describe?To Explain, To Predict, or To Describe?
To Explain, To Predict, or To Describe?
 
03 Design of Experiments - Factor prioritization
03 Design of Experiments - Factor prioritization03 Design of Experiments - Factor prioritization
03 Design of Experiments - Factor prioritization
 
S01 Shainin component swap, DoE
S01 Shainin component swap, DoES01 Shainin component swap, DoE
S01 Shainin component swap, DoE
 
Repurposing Classification & Regression Trees for Causal Research with High-D...
Repurposing Classification & Regression Trees for Causal Research with High-D...Repurposing Classification & Regression Trees for Causal Research with High-D...
Repurposing Classification & Regression Trees for Causal Research with High-D...
 
Causal Inference in Data Science and Machine Learning
Causal Inference in Data Science and Machine LearningCausal Inference in Data Science and Machine Learning
Causal Inference in Data Science and Machine Learning
 
Hypothesis Testing: Relationships (Compare 2+ Factors)
Hypothesis Testing: Relationships (Compare 2+ Factors)Hypothesis Testing: Relationships (Compare 2+ Factors)
Hypothesis Testing: Relationships (Compare 2+ Factors)
 
Confidence in Software Cost Estimation Results based on MMRE and PRED
Confidence in Software Cost Estimation Results based on MMRE and PREDConfidence in Software Cost Estimation Results based on MMRE and PRED
Confidence in Software Cost Estimation Results based on MMRE and PRED
 
83 learningdecisiontree
83 learningdecisiontree83 learningdecisiontree
83 learningdecisiontree
 
Statistical flaws in excel calculations
Statistical flaws in excel calculationsStatistical flaws in excel calculations
Statistical flaws in excel calculations
 
Hypothesis Testing: Finding the Right Statistical Test
Hypothesis Testing: Finding the Right Statistical TestHypothesis Testing: Finding the Right Statistical Test
Hypothesis Testing: Finding the Right Statistical Test
 
DIY market segmentation 20170125
DIY market segmentation 20170125DIY market segmentation 20170125
DIY market segmentation 20170125
 
To combine forecasts or to combine forecast models?
To combine forecasts or to combine forecast models?To combine forecasts or to combine forecast models?
To combine forecasts or to combine forecast models?
 
DIY Driver Analysis Webinar slides
DIY Driver Analysis Webinar slidesDIY Driver Analysis Webinar slides
DIY Driver Analysis Webinar slides
 
Statistics in the age of data science, issues you can not ignore
Statistics in the age of data science, issues you can not ignoreStatistics in the age of data science, issues you can not ignore
Statistics in the age of data science, issues you can not ignore
 
Levine smume7 ch09 modified
Levine smume7 ch09 modifiedLevine smume7 ch09 modified
Levine smume7 ch09 modified
 
Ch08 ci estimation
Ch08 ci estimationCh08 ci estimation
Ch08 ci estimation
 

Viewers also liked

When a Linear Model Just Won't Do: Fitting Nonlinear Models in JMP
When a Linear Model Just Won't Do: Fitting Nonlinear Models in JMPWhen a Linear Model Just Won't Do: Fitting Nonlinear Models in JMP
When a Linear Model Just Won't Do: Fitting Nonlinear Models in JMPJMP software from SAS
 
Building Models for Complex Design of Experiments
Building Models for Complex Design of ExperimentsBuilding Models for Complex Design of Experiments
Building Models for Complex Design of ExperimentsJMP software from SAS
 
Steps For A Screening DOE
Steps For A Screening DOESteps For A Screening DOE
Steps For A Screening DOEThomas Abraham
 
MT A: "Workshop: Building a Data-Driven Marketing Organization: Be the Agent ...
MT A: "Workshop: Building a Data-Driven Marketing Organization: Be the Agent ...MT A: "Workshop: Building a Data-Driven Marketing Organization: Be the Agent ...
MT A: "Workshop: Building a Data-Driven Marketing Organization: Be the Agent ...iMedia Connection
 
Advanced Use Cases of the Bootstrap Method in JMP Pro
Advanced Use Cases of the Bootstrap Method in JMP ProAdvanced Use Cases of the Bootstrap Method in JMP Pro
Advanced Use Cases of the Bootstrap Method in JMP ProJMP software from SAS
 
কীভাবে হালনাগাদকৃত কেবি লোকালাইজ করবেন
কীভাবে হালনাগাদকৃত কেবি লোকালাইজ করবেনকীভাবে হালনাগাদকৃত কেবি লোকালাইজ করবেন
কীভাবে হালনাগাদকৃত কেবি লোকালাইজ করবেনRaiyad Raad
 
Vicios del lenguaje
Vicios del lenguajeVicios del lenguaje
Vicios del lenguajeblft123
 
впн в россии
впн в россиивпн в россии
впн в россии19nature
 
Statistical Discovery for Consumer and Marketing Research
Statistical Discovery for Consumer and Marketing ResearchStatistical Discovery for Consumer and Marketing Research
Statistical Discovery for Consumer and Marketing ResearchJMP software from SAS
 
Photobooooooooth
PhotoboooooooothPhotobooooooooth
Photoboooooooothnadim1020
 
Localization with Mozilla
Localization with MozillaLocalization with Mozilla
Localization with MozillaRaiyad Raad
 
Angloingles
AngloinglesAngloingles
Angloinglesblft123
 
Webquest on output_devices[1]
Webquest on output_devices[1]Webquest on output_devices[1]
Webquest on output_devices[1]edtechfacey
 
Washington presentation 3.1
Washington presentation 3.1Washington presentation 3.1
Washington presentation 3.1jbuyonje
 
Perk acties a6
Perk acties a6Perk acties a6
Perk acties a6Jaap Kemp
 
Perk laskrant
Perk laskrantPerk laskrant
Perk laskrantJaap Kemp
 
Jeopardy (output devices)
Jeopardy (output devices)Jeopardy (output devices)
Jeopardy (output devices)edtechfacey
 

Viewers also liked (20)

When a Linear Model Just Won't Do: Fitting Nonlinear Models in JMP
When a Linear Model Just Won't Do: Fitting Nonlinear Models in JMPWhen a Linear Model Just Won't Do: Fitting Nonlinear Models in JMP
When a Linear Model Just Won't Do: Fitting Nonlinear Models in JMP
 
A Primer in Statistical Discovery
A Primer in Statistical DiscoveryA Primer in Statistical Discovery
A Primer in Statistical Discovery
 
Building Models for Complex Design of Experiments
Building Models for Complex Design of ExperimentsBuilding Models for Complex Design of Experiments
Building Models for Complex Design of Experiments
 
Steps For A Screening DOE
Steps For A Screening DOESteps For A Screening DOE
Steps For A Screening DOE
 
MT A: "Workshop: Building a Data-Driven Marketing Organization: Be the Agent ...
MT A: "Workshop: Building a Data-Driven Marketing Organization: Be the Agent ...MT A: "Workshop: Building a Data-Driven Marketing Organization: Be the Agent ...
MT A: "Workshop: Building a Data-Driven Marketing Organization: Be the Agent ...
 
Advanced Use Cases of the Bootstrap Method in JMP Pro
Advanced Use Cases of the Bootstrap Method in JMP ProAdvanced Use Cases of the Bootstrap Method in JMP Pro
Advanced Use Cases of the Bootstrap Method in JMP Pro
 
কীভাবে হালনাগাদকৃত কেবি লোকালাইজ করবেন
কীভাবে হালনাগাদকৃত কেবি লোকালাইজ করবেনকীভাবে হালনাগাদকৃত কেবি লোকালাইজ করবেন
কীভাবে হালনাগাদকৃত কেবি লোকালাইজ করবেন
 
Vicios del lenguaje
Vicios del lenguajeVicios del lenguaje
Vicios del lenguaje
 
впн в россии
впн в россиивпн в россии
впн в россии
 
Statistical Discovery for Consumer and Marketing Research
Statistical Discovery for Consumer and Marketing ResearchStatistical Discovery for Consumer and Marketing Research
Statistical Discovery for Consumer and Marketing Research
 
Photobooooooooth
PhotoboooooooothPhotobooooooooth
Photobooooooooth
 
Localization with Mozilla
Localization with MozillaLocalization with Mozilla
Localization with Mozilla
 
Cld 495 final
Cld 495 final Cld 495 final
Cld 495 final
 
Angloingles
AngloinglesAngloingles
Angloingles
 
Webquest on output_devices[1]
Webquest on output_devices[1]Webquest on output_devices[1]
Webquest on output_devices[1]
 
Washington presentation 3.1
Washington presentation 3.1Washington presentation 3.1
Washington presentation 3.1
 
IKT előadás
IKT előadásIKT előadás
IKT előadás
 
Perk acties a6
Perk acties a6Perk acties a6
Perk acties a6
 
Perk laskrant
Perk laskrantPerk laskrant
Perk laskrant
 
Jeopardy (output devices)
Jeopardy (output devices)Jeopardy (output devices)
Jeopardy (output devices)
 

Similar to Introduction to Modeling

Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Derek Kane
 
Top 100+ Google Data Science Interview Questions.pdf
Top 100+ Google Data Science Interview Questions.pdfTop 100+ Google Data Science Interview Questions.pdf
Top 100+ Google Data Science Interview Questions.pdfDatacademy.ai
 
Tree net and_randomforests_2009
Tree net and_randomforests_2009Tree net and_randomforests_2009
Tree net and_randomforests_2009Matthew Magistrado
 
0 Model Interpretation setting.pdf
0 Model Interpretation setting.pdf0 Model Interpretation setting.pdf
0 Model Interpretation setting.pdfLeonardo Auslender
 
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Parth Khare
 
A tour of the top 10 algorithms for machine learning newbies
A tour of the top 10 algorithms for machine learning newbiesA tour of the top 10 algorithms for machine learning newbies
A tour of the top 10 algorithms for machine learning newbiesVimal Gupta
 
M08 BiasVarianceTradeoff
M08 BiasVarianceTradeoffM08 BiasVarianceTradeoff
M08 BiasVarianceTradeoffRaman Kannan
 
Model validation strategies ftc 2018
Model validation strategies ftc 2018Model validation strategies ftc 2018
Model validation strategies ftc 2018Philip Ramsey
 
Machine learning session6(decision trees random forrest)
Machine learning   session6(decision trees random forrest)Machine learning   session6(decision trees random forrest)
Machine learning session6(decision trees random forrest)Abhimanyu Dwivedi
 
Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Researchjim
 
Data Mining in Market Research
Data Mining in Market ResearchData Mining in Market Research
Data Mining in Market Researchbutest
 
Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Researchkevinlan
 
13 random forest
13 random forest13 random forest
13 random forestVishal Dutt
 
1. F A Using S P S S1 (Saq.Sav) Q Ti A
1.  F A Using  S P S S1 (Saq.Sav)   Q Ti A1.  F A Using  S P S S1 (Saq.Sav)   Q Ti A
1. F A Using S P S S1 (Saq.Sav) Q Ti AZoha Qureshi
 
Factor analysis using SPSS
Factor analysis using SPSSFactor analysis using SPSS
Factor analysis using SPSSRemas Mohamed
 
모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로
모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로 모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로
모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로 r-kor
 
Factor analysis using spss 2005
Factor analysis using spss 2005Factor analysis using spss 2005
Factor analysis using spss 2005jamescupello
 
PREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMS
PREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMSPREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMS
PREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMSIJCI JOURNAL
 

Similar to Introduction to Modeling (20)

Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests
 
Top 100+ Google Data Science Interview Questions.pdf
Top 100+ Google Data Science Interview Questions.pdfTop 100+ Google Data Science Interview Questions.pdf
Top 100+ Google Data Science Interview Questions.pdf
 
Tree net and_randomforests_2009
Tree net and_randomforests_2009Tree net and_randomforests_2009
Tree net and_randomforests_2009
 
0 Model Interpretation setting.pdf
0 Model Interpretation setting.pdf0 Model Interpretation setting.pdf
0 Model Interpretation setting.pdf
 
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
 
A tour of the top 10 algorithms for machine learning newbies
A tour of the top 10 algorithms for machine learning newbiesA tour of the top 10 algorithms for machine learning newbies
A tour of the top 10 algorithms for machine learning newbies
 
M08 BiasVarianceTradeoff
M08 BiasVarianceTradeoffM08 BiasVarianceTradeoff
M08 BiasVarianceTradeoff
 
Model validation strategies ftc 2018
Model validation strategies ftc 2018Model validation strategies ftc 2018
Model validation strategies ftc 2018
 
Machine learning session6(decision trees random forrest)
Machine learning   session6(decision trees random forrest)Machine learning   session6(decision trees random forrest)
Machine learning session6(decision trees random forrest)
 
Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Research
 
Data Mining in Market Research
Data Mining in Market ResearchData Mining in Market Research
Data Mining in Market Research
 
Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Research
 
13 random forest
13 random forest13 random forest
13 random forest
 
Introductionedited
IntroductioneditedIntroductionedited
Introductionedited
 
1. F A Using S P S S1 (Saq.Sav) Q Ti A
1.  F A Using  S P S S1 (Saq.Sav)   Q Ti A1.  F A Using  S P S S1 (Saq.Sav)   Q Ti A
1. F A Using S P S S1 (Saq.Sav) Q Ti A
 
Factor analysis using SPSS
Factor analysis using SPSSFactor analysis using SPSS
Factor analysis using SPSS
 
모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로
모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로 모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로
모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로
 
Issues in DTL.pptx
Issues in DTL.pptxIssues in DTL.pptx
Issues in DTL.pptx
 
Factor analysis using spss 2005
Factor analysis using spss 2005Factor analysis using spss 2005
Factor analysis using spss 2005
 
PREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMS
PREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMSPREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMS
PREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMS
 

More from JMP software from SAS

Exploring Best Practises in Design of Experiments: A Data Driven Approach to ...
Exploring Best Practises in Design of Experiments: A Data Driven Approach to ...Exploring Best Practises in Design of Experiments: A Data Driven Approach to ...
Exploring Best Practises in Design of Experiments: A Data Driven Approach to ...JMP software from SAS
 
Exploring Best Practises in Design of Experiments
Exploring Best Practises in Design of ExperimentsExploring Best Practises in Design of Experiments
Exploring Best Practises in Design of ExperimentsJMP software from SAS
 
Evaluating & Monitoring Your Process Using MSA & SPC
Evaluating & Monitoring Your Process Using MSA & SPCEvaluating & Monitoring Your Process Using MSA & SPC
Evaluating & Monitoring Your Process Using MSA & SPCJMP software from SAS
 
Basic Design of Experiments Using the Custom DOE Platform
Basic Design of Experiments Using the Custom DOE PlatformBasic Design of Experiments Using the Custom DOE Platform
Basic Design of Experiments Using the Custom DOE PlatformJMP software from SAS
 
Correcting Misconceptions About Optimal Design
Correcting Misconceptions About Optimal DesignCorrecting Misconceptions About Optimal Design
Correcting Misconceptions About Optimal DesignJMP software from SAS
 
Visual Analytic Approaches for the Analysis of Spontaneously Reported Adverse...
Visual Analytic Approaches for the Analysis of Spontaneously Reported Adverse...Visual Analytic Approaches for the Analysis of Spontaneously Reported Adverse...
Visual Analytic Approaches for the Analysis of Spontaneously Reported Adverse...JMP software from SAS
 
The Bootstrap and Beyond: Using JSL for Resampling
The Bootstrap and Beyond: Using JSL for ResamplingThe Bootstrap and Beyond: Using JSL for Resampling
The Bootstrap and Beyond: Using JSL for ResamplingJMP software from SAS
 
Exploring Variable Clustering and Importance in JMP
Exploring Variable Clustering and Importance in JMPExploring Variable Clustering and Importance in JMP
Exploring Variable Clustering and Importance in JMPJMP software from SAS
 

More from JMP software from SAS (10)

Grafische Analyse Ihrer Excel Daten
Grafische Analyse  Ihrer Excel DatenGrafische Analyse  Ihrer Excel Daten
Grafische Analyse Ihrer Excel Daten
 
JMP for Ethanol Producers
JMP for Ethanol ProducersJMP for Ethanol Producers
JMP for Ethanol Producers
 
Exploring Best Practises in Design of Experiments: A Data Driven Approach to ...
Exploring Best Practises in Design of Experiments: A Data Driven Approach to ...Exploring Best Practises in Design of Experiments: A Data Driven Approach to ...
Exploring Best Practises in Design of Experiments: A Data Driven Approach to ...
 
Exploring Best Practises in Design of Experiments
Exploring Best Practises in Design of ExperimentsExploring Best Practises in Design of Experiments
Exploring Best Practises in Design of Experiments
 
Evaluating & Monitoring Your Process Using MSA & SPC
Evaluating & Monitoring Your Process Using MSA & SPCEvaluating & Monitoring Your Process Using MSA & SPC
Evaluating & Monitoring Your Process Using MSA & SPC
 
Basic Design of Experiments Using the Custom DOE Platform
Basic Design of Experiments Using the Custom DOE PlatformBasic Design of Experiments Using the Custom DOE Platform
Basic Design of Experiments Using the Custom DOE Platform
 
Correcting Misconceptions About Optimal Design
Correcting Misconceptions About Optimal DesignCorrecting Misconceptions About Optimal Design
Correcting Misconceptions About Optimal Design
 
Visual Analytic Approaches for the Analysis of Spontaneously Reported Adverse...
Visual Analytic Approaches for the Analysis of Spontaneously Reported Adverse...Visual Analytic Approaches for the Analysis of Spontaneously Reported Adverse...
Visual Analytic Approaches for the Analysis of Spontaneously Reported Adverse...
 
The Bootstrap and Beyond: Using JSL for Resampling
The Bootstrap and Beyond: Using JSL for ResamplingThe Bootstrap and Beyond: Using JSL for Resampling
The Bootstrap and Beyond: Using JSL for Resampling
 
Exploring Variable Clustering and Importance in JMP
Exploring Variable Clustering and Importance in JMPExploring Variable Clustering and Importance in JMP
Exploring Variable Clustering and Importance in JMP
 

Recently uploaded

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 

Recently uploaded (20)

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 

Introduction to Modeling

  • 1. Copyright © 2010 SAS Institute Inc. All rights reserved. Introduction to Modeling Bulding Better Models – Part 1
  • 2. 2 Copyright © 2010, SAS Institute Inc. All rights reserved. What is a Model?  An empirical representation that relates a set of inputs (predictors, X) to one or more outcomes (responses, Y)  Separates the response variation into signal and noise Y = f(X) + E  Y is one or more continuous or categorical response outcomes  X is one or more continuous or categorical predictors  f(X) describes predictable variation in Y (signal)  E describes non-predictable variation in Y (noise)  The mathematical form of f(X) can be based on domain knowledge or mathematical convenience.  “Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful.” – George Box
  • 3. 3 Copyright © 2010, SAS Institute Inc. All rights reserved. What makes a Good Model?  Accurate and precise estimation of predictor effects.  Focus is on the right hand side of the above equation.  Interest is in the influence predictors have on the response(s).  Design of experiments is an important tool for achieving this goal. Good designs lead to good models.  Accurate and precise prediction.  Focus is on the left hand side of the above equation.  Interest is in best predicting outcomes from a set of inputs, often from historical or in situ data.  Predictive modelling (data mining) methodologies are important tools for achieving this goal.  These two goals typically occur at different points in the data discovery process.
  • 4. 4 Copyright © 2010, SAS Institute Inc. All rights reserved. Modeling in Action What we think is happening Historical/In situ Data Collection Real World ModelY = f(X) + E Historical/In situ Data Collection Historical/In situ Data Collection Experimental Data Collection What is really happening Experimental Data Collection Experimental Data Collection Adapted from Box, Hunter, & Hunter
  • 5. 5 Copyright © 2010, SAS Institute Inc. All rights reserved. Introduction to Predictive Models  A type of statistical model where the focus is on predicting Y independent of the form used for f(X).  There is less concern about the form of the model – parameter estimation isn’t important. The focus is on how well it predicts.  Very flexible models are often used to allow for a greater range of possibilities.  Variable selection/reduction is typically important.  http://en.wikipedia.org/wiki/Predictive_modelling
  • 6. 6 Copyright © 2010, SAS Institute Inc. All rights reserved. Introduction to Predictive Models  Example: Predict group (red or blue) using X1 and X2
  • 7. 7 Copyright © 2010, SAS Institute Inc. All rights reserved. Introduction to Predictive Models  Two competing methods.  Results are for a single sample Regression Misclassification Rate = 32.5% Nearest Neighbor Misclassification Rate = 0% Example from Hastie, Tibshirani, & Freidman: Elements of Statistical Learning
  • 8. 8 Copyright © 2010, SAS Institute Inc. All rights reserved. Introduction to Predictive Models  f(X): A majority of techniques are captured by two general approaches:  Global function of data » More stable, less flexible » Captures smooth relationship between continuous inputs and outputs » Examples: Regression, Generalized Regression, Neural Networks, PLS, Discriminant Analysis  Local function of data » More flexible, less stable » Captures local relationships in data (e.g., discrete shifts and discontinuities) » Examples: Nearest Neighbors, Bootstrap Forrest, Boosted Trees
  • 9. 9 Copyright © 2010, SAS Institute Inc. All rights reserved. Introduction to Predictive Models  Second sample results  Nearest neighbor seems to overfit sample 1 Regression Misclassification Rate = 40% Nearest Neighbor Misclassification Rate = 41%
  • 10. 10 Copyright © 2010, SAS Institute Inc. All rights reserved. Preventing Model Overfitting  If the model is flexible what guards against overfitting (i.e., producing predictions that are too optimistic)?  Put another way, how do we protect from trying to model the noise variability as part of f(X)?  Solution – Hold back part of the data, using it to check against overfitting. Break the data into two or three sets:  The model is built on the training set.  The validation set is used to select model by determining when the model is becoming too complex  The test set is often used to evaluate how well model predicts independent of training and validation sets.  Common methods include k-fold and random holdback.
  • 11. 11 Copyright © 2010, SAS Institute Inc. All rights reserved. Handling Missing Predictor Values  Case-wise deletion – Easy, but reduces the sample  Simple imputation – Replace the value with the variable mean or median  Multivariate imputation – Use the correlation between multiple variables to determine what the replacement value should be  Model based imputation – Model with the non-missing values, replace missing values based on similar cases  Model free imputation – e.g., distance based, hot hand, etc.  Methods insensitive to missing values
  • 12. 12 Copyright © 2010, SAS Institute Inc. All rights reserved. How Does JMP Account for Missing Data?  Categorical  Creates a separate level for missing data and treats it as such.  Continuous  Informative Missing/Missing Value Coding: » Regression and Neural Network: The column mean is substituted for the missing value. An indicator column is included in the predictors where rows are 1 where data is missing, 0 otherwise. This can significantly improve the fit when data is missing not at random. » Partition: the missing observations are considered on both sides of the split. It is grouped with the side providing the better fit.  Save Tolerant Prediction Formula (Partition): » The predictor is randomly assigned to one of the splits. » Only available if Informative Missing is not selected.
  • 13. 13 Copyright © 2010, SAS Institute Inc. All rights reserved. How does JMP Handle Missing Data?  Multivariate Imputation – Available in the Multivariate platform and Column Utilities: Explore Missing Values. Based on the correlation structure of the continuous predictors (the expectation conditional on the non- missing data).  Model Based Imputation – Available in Fit Model and PLS (JMP Pro). Impute missing predictors based on partial least squares model.
  • 14. Copyright © 2010 SAS Institute Inc. All rights reserved. Regression and Model Selection Building Better Models – Part 1
  • 15. 15 Copyright © 2010, SAS Institute Inc. All rights reserved. Regression  General linear regression typically uses simple polynomial functions for f(X).  For continuous y:  For categorical y, the logistic function of f(X) is typically used.        p i iiji p i p ij ji p i ii xxxxxf 1 2 1 1 , 1 0  )( 1 1 xf e 
  • 16. 16 Copyright © 2010, SAS Institute Inc. All rights reserved. Model Selection  Stepwise Regression  Start with a base model: intercept only or all terms.  If intercept only, find term not included that explains the most variation and enter it into the model.  If all terms, remove the term that explains the least.  Continue until a (typically) p-value based stopping criterion is met.  A variation of stepwise regression is all possible subsets (best subset) regression.  Examine all 2, 3, 4, …, etc. term models and pick the best out of each. Sometimes statistical heredity is imposed to make the problem more tractable.
  • 17. 17 Copyright © 2010, SAS Institute Inc. All rights reserved. Generalized Regression  For many engineers and scientists, modeling begins and ends with ordinary least squares (OLS) and stepwise or best subsets regression.  Unfortunately, when data are highly correlated, OLS may produce estimates that are less stable and have higher prediction variance.  In addition, there are common situations where OLS assumptions are violated:  Responses and/or errors are not normally distributed.  The response is not a linear combination of the predictors.  Error variance is not constant across the prediction range.
  • 18. 18 Copyright © 2010, SAS Institute Inc. All rights reserved. Penalized Regression  Generalized regression (aka penalized regression or shrinkage methods) can circumvent these problems.  For correlated data, penalized methods produce more stable estimates by deliberately biasing the estimates in exchange for lower prediction variance.  Provides a continuous approach to variable selection that is more stable than stepwise regression.  Can be used for problems where there are more variables than observations, a case that cannot be estimated using OLS.  Three common techniques:  Ridge Regression: Stabilizes the estimates, but can't be used for variable selection.  LASSO: Shrinks some coefficients exactly to zero, so it also performs variable selection.  Elastic Net: Weighted average of Ridge and LASSO.
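A sketch of the three penalized fits named above via scikit-learn (an assumption; the deck uses JMP Pro's Generalized Regression). The penalty weights are arbitrary illustrative choices; in practice they would be tuned with validation data.

```python
# Ridge, LASSO, and Elastic Net on a wide problem (p > n), which
# plain OLS cannot estimate.
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 100))                  # more variables than observations
beta = np.zeros(100)
beta[:5] = 3.0                                  # only 5 truly active terms
y = X @ beta + rng.normal(size=50)

ridge = Ridge(alpha=1.0).fit(X, y)                      # shrinks, keeps all terms
lasso = Lasso(alpha=0.1).fit(X, y)                      # zeroes out terms (selection)
enet  = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)   # blend of the two penalties
print((lasso.coef_ != 0).sum(), "terms kept by the LASSO")
```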
  • 19. 19 Copyright © 2010, SAS Institute Inc. All rights reserved. Quantile Regression  Makes no distributional or variance-based assumptions.  Allows the response/predictor relationship to vary across the response distribution.  Allows modeling of not just the median, but of any quantile.  Is more robust than OLS, lessening the influence of high-leverage points.
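A hedged sketch of fitting several quantiles with scikit-learn's QuantileRegressor (an assumption requiring scikit-learn 1.0+; JMP Pro offers its own quantile regression). The simulated data deliberately have non-constant error variance, the situation quantile regression handles gracefully.

```python
# Fit the 10th, 50th, and 90th percentile lines of y given X.
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(3)
X = rng.uniform(0, 10, size=(200, 1))
y = 2 * X[:, 0] + rng.normal(scale=1 + 0.3 * X[:, 0])   # variance grows with X

fits = {q: QuantileRegressor(quantile=q, alpha=0.0).fit(X, y)
        for q in (0.1, 0.5, 0.9)}                       # lower tail, median, upper tail
```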
  • 20. Copyright © 2010 SAS Institute Inc. All rights reserved. Decision Trees Overview
  • 21. 21 Copyright © 2010, SAS Institute Inc. All rights reserved. Decision Trees  Also known as Recursive Partitioning, CHAID, CART  Models are a series of nested IF() statements, where each condition in the IF() statement can be viewed as a separate branch in a tree.  Branches are chosen so that the difference in the average response (or average response rate) between paired branches is maximized.  Tree models are "grown" by adding more branches to the tree, so that more of the variability in the response is explained by the model.
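To make the nested-IF view concrete, the three-split tree from the worked example on the following slides can be written out directly; the rates are the ones reported there, and the Python function form is an illustration rather than JMP output.

```python
# The final three-split tree as literal nested IF() statements,
# returning the predicted default rate in each terminal branch.
def predicted_default_rate(age, has_traditional_card, n_handsets):
    if age < 28:
        if has_traditional_card:        # Mastercard / VISA / Cheque Card
            return 0.0313
        else:                           # non-traditional or no credit cards
            return 0.0816
    else:
        if n_handsets < 2:
            return 0.0435
        else:
            return 0.0189

print(predicted_default_rate(age=24, has_traditional_card=False, n_handsets=1))  # 0.0816
```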
  • 22. 22 Copyright © 2010, SAS Institute Inc. All rights reserved. Decision Tree Step-by-Step Goal is to predict those with a code of "1". Overall rate is 3.23%. Candidate "X's": • Search through each of these • Examine splits for each unique level in each X • Find the split that maximizes "LogWorth" • This will find the split that maximizes the difference in proportions of the target variable
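A sketch of the split search just described: for every candidate cutpoint of one X, compute LogWorth = –log10(p-value) of the 2×2 test of the target proportion and keep the best. A chi-square test stands in here for JMP's LogWorth computation (which also adjusts p-values); the simulated rates echo the example.

```python
# Exhaustive cutpoint search maximizing LogWorth for one predictor.
import numpy as np
from scipy.stats import chi2_contingency

def best_split(x, y):
    best_cut, best_logworth = None, -np.inf
    for cut in np.unique(x)[1:]:                # candidate cutpoints
        left = x < cut
        table = np.array([[np.sum(y[left] == 1),  np.sum(y[left] == 0)],
                          [np.sum(y[~left] == 1), np.sum(y[~left] == 0)]])
        if table.min() == 0:                    # skip degenerate splits
            continue
        p = chi2_contingency(table)[1]
        logworth = -np.log10(p)                 # LogWorth = -log10(p-value)
        if logworth > best_logworth:
            best_cut, best_logworth = cut, logworth
    return best_cut, best_logworth

rng = np.random.default_rng(4)
age = rng.integers(18, 70, size=2000)
y = rng.binomial(1, np.where(age < 28, 0.065, 0.022))   # rates echo the example
print(best_split(age, y))                               # cut found near age 28
```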
  • 23. 23 Copyright © 2010, SAS Institute Inc. All rights reserved. Decision Tree Step-by-Step 1st Split: Optimal split at Age<28. Notice the difference in the rates in each branch of the tree. Repeat the split search across both partitions of the data, and find the optimal split across both branches.
  • 24. 24 Copyright © 2010, SAS Institute Inc. All rights reserved. Decision Tree Step-by-Step 2nd split on CARDS (no CCs vs. some CCs). Notice the variation in the proportion of "1" in each branch.
  • 25. 25 Copyright © 2010, SAS Institute Inc. All rights reserved. Decision Tree Step-by-Step 3rd split on TEL (# of handsets owned). Notice the variation in the proportion of "1" in each branch.
  • 26. 26 Copyright © 2010, SAS Institute Inc. All rights reserved. Bootstrap Forest  For each tree, take a random sample (with replacement) of the data table and build out a decision tree on that sample.  Make many trees and average their predictions (bagging).  This is also known as a random forest technique.  Works very well on wide tables.  Can be used for both predictive modeling and variable selection.
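A minimal bagging sketch: build each tree on a with-replacement resample and average the predicted probabilities. JMP's Bootstrap Forest also samples predictors at each split, which this sketch omits; scikit-learn's RandomForestClassifier bundles both ideas, and its use here is an assumption for illustration.

```python
# Bagged decision trees: average predictions across bootstrap samples.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bootstrap_forest_proba(X, y, X_new, n_trees=100, seed=0):
    rng = np.random.default_rng(seed)
    probs = np.zeros(len(X_new))
    for _ in range(n_trees):
        rows = rng.integers(0, len(X), size=len(X))     # bootstrap sample
        tree = DecisionTreeClassifier(max_depth=5).fit(X[rows], y[rows])
        # (assumes both classes appear in every bootstrap sample)
        probs += tree.predict_proba(X_new)[:, 1]
    return probs / n_trees                              # bagged average
```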
  • 27. 27 Copyright © 2010, SAS Institute Inc. All rights reserved. See the Trees in the Forest Tree on 1st Bootstrap Sample Tree on 2nd Bootstrap Sample Tree on 3rd Bootstrap Sample … Tree on 100th Bootstrap Sample
  • 28. 28 Copyright © 2010, SAS Institute Inc. All rights reserved. Average the Trees in the Forest
  • 29. 29 Copyright © 2010, SAS Institute Inc. All rights reserved. Boosted Tree  Beginning with the first tree (layer), build a small, simple tree.  From the residuals of the first tree, build another small, simple tree.  This continues until a specified number of layers has been fit, or it is determined that adding successive layers doesn't improve the fit of the model.  The final model is the weighted accumulation of all of the model layers.
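A hedged sketch of the layering idea: each small tree is fit to the residuals of the accumulated model and added in, scaled by a learning rate. The mean baseline, depth, and rate are illustrative assumptions rather than JMP's defaults.

```python
# Boosting by residuals: accumulate shallow trees, shrunk by a
# learning rate (the epsilon on the next slide).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boosted_tree_predict(X, y, X_new, n_layers=50, learn_rate=0.1):
    pred_train = np.full(len(X), y.mean())       # start from the overall mean
    pred_new = np.full(len(X_new), y.mean())
    for _ in range(n_layers):
        layer = DecisionTreeRegressor(max_depth=2).fit(X, y - pred_train)
        pred_train += learn_rate * layer.predict(X)      # shrink each layer's
        pred_new += learn_rate * layer.predict(X_new)    # contribution
    return pred_new
```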
  • 30. 30 Copyright © 2010, SAS Institute Inc. All rights reserved. Boosted Tree Illustrated Models $M_1, M_2, M_3, \ldots, M_{49}$ accumulate into the final model $$M = M_1 + \varepsilon \cdot M_2 + \varepsilon \cdot M_3 + \cdots + \varepsilon \cdot M_{49}$$ where $\varepsilon$ is the learning rate.
  • 31. Copyright © 2010 SAS Institute Inc. All rights reserved. Neural Networks Overview
  • 32. 32 Copyright © 2010, SAS Institute Inc. All rights reserved. Neural Networks  Neural Networks are highly flexible nonlinear models.  A neural network can be viewed as a weighted sum of nonlinear functions applied to linear models.  The nonlinear functions are called activation functions. Each function is considered a (hidden) node.  The nonlinear functions are grouped in layers. There may be more than one layer.  Consider a generic example where there is a response Y and two predictors X1 and X2. One type of neural network that can be fit to these data is given in the diagram that follows.
  • 33. 33 Copyright © 2010, SAS Institute Inc. All rights reserved. Example Neural Network Diagram: inputs → 2nd hidden node layer → 1st hidden node layer → output.
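A hedged numpy sketch of the forward pass such a diagram depicts: linear combinations feed each hidden node's activation function, and the output is a final linear combination. For simplicity every node here uses tanh, although the diagram mixes activation types, and the weights are random placeholders; fitting would estimate them all simultaneously.

```python
# Forward pass of a two-hidden-layer network with inputs (X1, X2).
import numpy as np

rng = np.random.default_rng(5)
# Placeholder weights; model fitting would estimate all of these at once.
W2, b2 = rng.normal(size=(3, 2)), rng.normal(size=3)   # 2nd hidden layer (3 nodes)
W1, b1 = rng.normal(size=(3, 3)), rng.normal(size=3)   # 1st hidden layer (3 nodes)
w0, b0 = rng.normal(size=3), rng.normal()              # output weights

def forward(x):
    h2 = np.tanh(W2 @ x + b2)       # activation applied to a linear combination
    h1 = np.tanh(W1 @ h2 + b1)
    return w0 @ h1 + b0             # the last step is essentially linear regression

print(forward(np.array([0.5, -1.2])))   # x = (X1, X2)
```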
  • 34. 34 Copyright © 2010, SAS Institute Inc. All rights reserved. Neural Networks  Big Picture  Can model: » Continuous and categorical predictors » Continuous and categorical responses » Multiple responses (simultaneously)  Can be numerically challenging and time consuming to fit  NN models are very prone to overfitting if you are not careful » JMP has many ways to help prevent overfitting » Some type of validation is required » The core analytics are designed to stop the fitting process early if overfitting is occurring. See Gotwalt, C., “JMP® 9 Neural Platform Numerics”, Feb 2011, http://www.jmp.com/blind/whitepapers/wp_jmp9_neural_104886.pdf
  • 35. Copyright © 2010 SAS Institute Inc. All rights reserved. Model Comparison Overview
  • 36. 36 Copyright © 2010, SAS Institute Inc. All rights reserved. Choosing the Best Model  In many situations you would try several different types of modeling methods.  Even within each modeling method, there are options to create different models:  In Stepwise, the base/full model specification can be varied.  In Bootstrap Forest, the number of trees and the number of terms sampled per split.  In Boosted Tree, the learning rate, number of layers, and base tree size.  In Neural, the specification of the model, as well as the use of boosting.  So how can you choose the "best", most useful model?
  • 37. 37 Copyright © 2010, SAS Institute Inc. All rights reserved. The Importance of the Test Set  One of the most important benefits of having a training, validation, AND test set is that you can use the test set to assess each model on the same basis.  Using the test set allows you to compare competing models on the basis of model quality metrics:  R2  Misclassification Rate  ROC curves and AUC
  • 38. Copyright © 2010 SAS Institute Inc. All rights reserved. Appendix
  • 39. 39 Copyright © 2010, SAS Institute Inc. All rights reserved. Model Evaluation  Continuous response models are evaluated using SSE (sum of squared error) based measures such as R2 and adjusted R2  R2 = 1 – (SSE / TSS) » TSS is the total sum of squares for the response » R2 = 1 → perfect prediction » R2 = 0 → no better than using the response average  Other alternatives are information-based measures such as AIC and BIC, also dependent on the SSE.  Categorical response models are evaluated on their ability:  to sort the data, using ROC curves  to categorize a new observation, measured by confusion matrices and rates, as well as the overall misclassification rate
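A direct sketch of the two headline metrics defined above: R-squared for a continuous response and misclassification rate for a categorical one (numpy arrays assumed as inputs).

```python
# R-squared and misclassification rate, computed from first principles.
import numpy as np

def r_squared(y, y_hat):
    sse = np.sum((y - y_hat) ** 2)          # sum of squared errors
    tss = np.sum((y - y.mean()) ** 2)       # total sum of squares
    return 1.0 - sse / tss                  # 1 = perfect, 0 = no better than the mean

def misclassification_rate(y, prob_hat, threshold=0.5):
    return np.mean((prob_hat >= threshold).astype(int) != y)
```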
  • 40. 40 Copyright © 2010, SAS Institute Inc. All rights reserved. Demystifying ROC Curves
  • 41. 41 Copyright © 2010, SAS Institute Inc. All rights reserved. ROC Curve Construction Steps  The ROC curve is constructed on the sorted table (e.g., sort the data from highest Prob[Y==target] to lowest).  For each row, if the actual value is equal to the positive target, then the curve is drawn upward (vertically) an increment of 1/(Number of Positives).  Otherwise it is drawn across (horizontally) an increment of 1/(Number of Negatives).
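A direct translation of the construction steps above into a sketch: sort by the predicted probability of the target, then step up for each actual positive and across for each actual negative.

```python
# Build ROC curve coordinates and the AUC from a sorted table.
import numpy as np

def roc_points(y, prob):
    order = np.argsort(-prob)                  # highest Prob[Y==target] first
    y = np.asarray(y)[order]
    tpr = np.concatenate([[0], np.cumsum(y == 1) / np.sum(y == 1)])   # up steps
    fpr = np.concatenate([[0], np.cumsum(y == 0) / np.sum(y == 0)])   # across steps
    return fpr, tpr

def auc(fpr, tpr):
    return np.trapz(tpr, fpr)                  # area under the curve, in [0, 1]
```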
  • 42. 42 Copyright © 2010, SAS Institute Inc. All rights reserved. Using ROC Curves The higher the curve is, the better the model is at sorting. The Area Under the Curve (AUC) will always be in the [0,1] range and can be used as a metric of model quality. An ROC curve that is above the 45-degree line in the chart is better than "guessing" (sorting at random). Every point on the curve represents the true positive rate (sensitivity) vs. the false positive rate (1 – specificity) for a particular threshold that one could use to classify observations.

Editor's Notes

  1. We start off by defining what a statistical model is. A statistical model is a function, f(X), that we use to predict some response or outcome, that we label Y. Here, X represents one or more continuous or categorical predictor variables. We write the statistical model as Y = f(X) + residual error. The residual error is the left over part of the variation in Y that we cannot predict with the function, f(X). It turns out that during the process of building and evaluating statistical models, the residual error plays a key role, and examining and understanding that residual error can help us greatly as we seek to build a good model.
  3. Decision Tree models are known by a number of different names and algorithms: Recursive partitioning, CHAID (which stands for Chi-Square Automatic Interaction Detector), and CART (which stands for Classification and Regression Trees). Decision tree models are nothing more than a series of nested IF statements. The IF-THEN and ELSE outcomes and conditions for the nested IF statements can be viewed as branches in a tree. Decision tree models are constructed so that for each branch in the tree, the difference in the average response is maximized. These models are grown by adding more and more branches to the tree, with each successive branch explaining more of the variation in the response variable.
  4. Consider a modeling example where the goal is to predict the likelihood that an individual will default on a loan. This is the same example that was used for illustrating how to construct an ROC Curve. Overall in this data, we have 27,900 individuals with a 3.23% default rate. We also have 24 candidate "X" variables that can be used to model and explain the default rate. The algorithm for the decision tree will search through each one of these candidate variables, and through each unique level or combination of levels for each variable, and will find the "split" in the data, based on one variable, that maximizes the difference in the response rate. (Actually, what is found is the split that has the largest LogWorth, which is just –log10(p-value) for the statistical test of the mean difference (or proportion difference) between the two groups.)
  5. In this example, the 1st split that is chosen is based on the AGE variable. Across all the possible splits, the algorithm found that the split that maximized the difference in the default rate was AGE < 28 vs. AGE >= 28. For the under-28 group, the default rate was 6.51%, and for the 28-and-older group the rate was 2.24%. Once this step is done, to find the next branch in the decision tree, we repeat the algorithm across each "partition" of the data, across each branch of the tree. So we look in the under-28 group and find the split that maximizes LogWorth, and also in the 28-and-older group. Then whichever one of those has the overall maximum LogWorth, we use that split in the model.
  6. For the 2nd split, it finds that the types of credit cards the under-28 group has describe the most variability in the default rate. For those with more traditional credit cards (Mastercard, VISA, Cheque Card), the default rate is 3.13%, while those with non-traditional or no credit cards have a default rate of 8.16%. To find the next split in the tree we repeat the process again across the 3 partitions of the data, represented by the 3 branches of the tree, to find the split that maximizes LogWorth.
  7. The 3rd split in the tree is found in the 28-and-older group, based on the number of telephone #'s the person has. Those who have 1 or fewer (fewer than 2) have a default rate of 4.35%, and those with 2 or more have a 1.89% rate. So with just 3 steps of the tree building process, we have gone from a "trunk" of a tree with a rate of 3.23%, to a tree with 4 branches that have rates ranging from 8.2% to 1.9%. We would normally continue this process of adding branches to the tree, so that we grow out a model that predicts well. One problem that decision trees have is that it is very easy to overfit a decision tree to the data. You could imagine that if we allowed the tree algorithm to continue until every branch had just ONE observation, the tree would fit the training data perfectly. One way to prevent this from happening is to set a minimum branch size requirement, so that no branch in a tree will have fewer than a specified number of rows or percentage of rows. Another good idea is to use hold out validation. As you grow the tree, you can check how well the tree is predicting the validation data, and if that stops improving as you add more complexity to the tree model, you can stop adding splits to the tree.
  8. The Bootstrap Forest method applies ideas in data sampling and model averaging that help build predictive models with very good performance. The first idea that is applied is bootstrapping. A bootstrap sample from a data table is a 100% sample of the data, but performed with replacement. So, for instance, if we had a data table with 1000 rows of data, then a bootstrap sample would also have 1000 rows of data. But the bootstrap sample would not be the same as the original data table, because sampling with replacement will lead to some rows being sampled more than once (2, 3, 4, or even more times), and some rows not being sampled at all. Typically about 37% of the data (roughly 1/e) is not selected at all. If you build a decision tree model on that bootstrap sample, it may not, by itself, be a very good tree, because the bootstrap sample it was built on is not fully representative of the actual data. But if we repeat this process over and over again, and average all the models built across many bootstrap samples, this can lead to a model that performs very well. This approach, of averaging models built across many bootstrap samples, is known as bootstrap aggregation, or "bagging". Bagged decision tree models, on average, perform better than a single decision tree built on the original data. To improve the model performance even more, we apply sampling of factors during the tree building process. For each tree built (on a separate bootstrap sample), for each split decision, a random subset of all the possible factors is selected and the optimal split is found among them. Typically we choose a small subset, say 25% of all the candidate factors. So at each split decision, about 75% of the factors are ignored, at random. This gives each factor an opportunity to contribute to the model. Combining bagging and variable sampling results in a tree building method known as a Bootstrap Forest. This is also known as a random forest technique.
  9. Boosting is a newer idea in data mining, where models are built in layers. (go to the next slide)
  10. Neural Networks are sometimes described as "black box" methods, in that the underlying models are complex and most applications of Neural Networks rely on software to fit the models for you automatically. It turns out that Neural Networks are nothing more than highly flexible non-linear models. While mathematically complex, the structure of a Neural Network model can be viewed as a network of connected blocks and nodes. To illustrate this, consider a simple example where there is one response variable, Y, and there are two predictor variables, X1 and X2. (next slide)
  11. This is a diagram representation of an example Neural Network. On the left hand side of the graph are the factors or inputs to the model. On the right hand side is the response or output from the model. In between is a network of nodes and links that represent the model. The types of neural networks you can fit in JMP (Pro) have either one or two layers, called "hidden" layers, and within each layer you can have one or more nodes. The nodes have a symbol overlaid on them that represents a mathematical transformation. The way this neural network is constructed is as follows: You take a linear combination of the inputs into each node in the 2nd hidden layer, and then that value is transformed by the associated node's "activation function". So, for instance, on the top node in the 2nd hidden layer, you take a linear combination, a + b*X1 + c*X2, and you transform it using the "S"-shaped function, which is a hyperbolic tangent function. For the middle node in the 2nd hidden layer, you take a separate linear combination, d + e*X1 + f*X2, into that node and perform a linear (or multiplicative scale) transformation on that value. For the bottom node in the 2nd hidden layer, you take yet another linear combination of the inputs, g + h*X1 + i*X2, and you transform them with a Gaussian function (also known as a radial function). What comes out of each of the nodes in the 2nd layer is a new transformed value. In this NN, those transformed values are treated like inputs to the 1st hidden layer, and the same approach is used: linear combinations of the 2nd hidden layer results are brought into each of the 1st hidden layer nodes, and another transformation is applied in each node. What comes out of the end of the network, the output of the 1st hidden layer, is then used to make a prediction of the output, the Y variable, by taking a linear combination of those values. The very final part of the NN, between the 1st hidden layer and the output, is essentially a simple linear regression. One way to think about what a neural network does is that it finds appropriate transformations of the inputs so that you can use those transformed values to build a linear regression model. The description that I just gave of this neural network is somewhat simplistic, in that the estimation (or model building) process is done simultaneously, treating the neural network as one big non-linear model. Also, it is important to mention that this particular Neural Network is just one of many neural networks we could fit to this data. The number of layers and the number of each type of node to use are left up to the user to specify.
  12. So what is the big picture on Neural Networks? These types of models are mathematically complex, but they can be easily represented and understood using the network diagram. They are complex models that can be more computationally challenging and time consuming to fit. The key strength of Neural Networks is their flexibility: they can model a wide variety of relationships between inputs and outputs. That strength is also a potential weakness of neural nets. Because they are so good at fitting data, they have a tendency to overfit. Because of that tendency, it is good to use some sort of validation strategy when you are building models and deciding which network model is the right one to use. In JMP, we require you to use some sort of validation, whether it is holdout or k-fold cross-validation. In addition, the underlying analytic method used to estimate the model parameters is designed to help reduce the overfitting problem. There is a very nice whitepaper on the JMP web site that describes this in greater detail, and the link is shown at the bottom of this slide.
  13. During this seminar, we will use various measures of model quality. For models where we are predicting a continuous response variable, the metrics we use are based on the sum of the squared error (SSE) in prediction. The most common metric we use is R^2 (R-squared). R^2 is defined as 1 – (SSE / TSS), where TSS is the total variability in the response. If the model predicts perfectly, then SSE is zero and R^2 is one. If the model doesn't predict any better than using the average of the response variable, then R^2 is zero. R^2 will range between [0,1] when evaluating the model on the training data, but it can actually become negative when evaluating the model on the validation or test data. A negative R^2 means the model is making worse predictions than just using the average response, and is a strong indication of overfitting. There are other model quality measures based on the SSE, such as AIC (Akaike's information criterion) and BIC (Bayesian information criterion), but we will not focus on using those metrics in this seminar. When the response is categorical, there are generalizations of R^2 which are used to help determine the appropriate model size, but we can also use other measures to assess model quality. When you have a categorical response model, the prediction model you build is a probability model: it predicts the probability of the outcome levels for the response. You can then use those probability models to make classification decisions about the expected response. This leads to metrics such as the misclassification rate, and a cross-classification matrix known as the confusion matrix. Additionally, these probability models can be used to "sort" the data, and ROC curves summarize the ability of these models to effectively sort the data.
  14. The ROC curve is a "mystical" entity. Many predictive modeling methods use ROC curves to measure model quality, but a simple and understandable explanation of ROC curves is hard to find. To help explain what an ROC curve is, we will use an example where a model was built to predict the probability of a binary outcome. In this case, the variable GB stands for "good"/"bad", and is an indicator of whether an individual defaulted on a loan or not. GB==1 indicates that the individual did NOT pay off the loan, and we have a model formula that predicts the probability that GB==1. What we do is sort the data, based on that model, from high predicted probability to low probability. If the model is predicting well, then what we would expect, upon sorting the data by predicted probability, is that we would have a lot of the defaulters at the top of the sorted table, and a lot of the non-defaulters at the bottom of the table. What is shown is the ROC curve which describes the model's sorting ability. Zooming in on the lower left hand corner of the graph, we can see how this relates to the sorted table. In the sorted table, the first 9 rows are sorted TOWARDS the target; that is, those rows have the highest predicted probability of default and they were actually defaulters. So, for each one of those rows, we draw the graph up, a total of 9 steps. Notice that the next 8 rows are not sorted toward the desired target; that is, they have a relatively high probability of default, but they were not defaulters. We draw the graph over, horizontally, 8 steps, one step for each one of those rows. The next 9 rows are sorted toward the target, and again we draw the graph up 9 steps.
  15. This is a verbal description of how to construct the ROC curve. No need to go over this again, as it was just described in the last slide.
  16. The step size for each row is based on the number of 1s and 0s in the data set, so that the graph, if we draw it based on the whole table, is plotted on the [0,1] x [0,1] range. For a model that is predicting well, you would expect that the graph, initially, would go more "up" than "over", and the graph would be raised above the 45-degree line shown on the chart. The higher the curve is above that 45-degree line, the better the model is doing at sorting. Because ROC curves are always on the same scale, we can also use the area under the curve (AUC) as a metric of model quality. A model that predicts perfectly (and therefore sorts perfectly) would have an ROC curve that goes up all the way to 1, and then over all the way to 1, so it looks like an upper right angle. A model that does no better than just picking or sorting at random would have an ROC curve that looks roughly like the 45-degree line. So the more the ROC curve is above the 45-degree line, the better the model. Each point on this curve corresponds to a particular probability that can be used as a threshold for making decisions about an observation. Take, for instance, the point indicated on the graph (click animation to show). If the associated probability threshold were used to make decisions about the "Good/Bad" outcome, we would expect to correctly classify about 57% of the "Bad" credit risks, but at the tradeoff cost of incorrectly classifying about 27% of "Good" credit risk individuals as a "Bad" credit risk. (An expected question: how do I find that decision threshold? Unfortunately, with JMP you have to search the sorted table to find it. In JMP 11 there will also be a "cumulative" gains chart that is easier to use to find this threshold in the table. The Y axis for the cumulative gains chart will be the same as the ROC curve (sensitivity), but the X-axis will be the sorted portion of the data, which you can then use to somewhat easily find the row and look up its associated predicted probability.)