Copyright © 2012, SAS Institute Inc. All rights reserved.
THE BOOTSTRAP AND BEYOND:
USING JSL FOR RESAMPLING
• Michael Crotty & Clay Barker
• Research Statisticians
• JMP Division, SAS Institute
OUTLINE
• Nonparametric Bootstrap
• Permutation Testing
• Parametric Bootstrap
• Bootstrap Aggregating (Bagging)
• Bootstrapping in R
• Stability Selection
• Wrap-up/Questions
NONPARAMETRIC BOOTSTRAP
INTRODUCTION TO THE BOOTSTRAP
• Introduced by Brad Efron in 1979; grown in popularity as computing power
increases
• Resampling technique that allows you to estimate the variance of statistics,
even when analytical expressions for the variance are difficult to obtain
• You want to know about the population, but all you have is one sample
• Treat the sample as a population and sample from it with replacement
• This is called a bootstrap sample
• Repeating this sampling scheme produces bootstrap replications
• For each bootstrap sample, you can calculate the statistic(s) of interest
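To make the resampling scheme concrete, here is a minimal R sketch (not the JSL demo from the talk): it draws bootstrap samples with replacement, computes a statistic on each, and summarizes the replicates. The data and the choice of the mean as the statistic are stand-ins for illustration.

set.seed(2012)
y <- rnorm(50, mean = 10, sd = 2)        # stand-in for the observed sample

n.boot <- 1000
boot.means <- replicate(n.boot, {
  y.star <- sample(y, replace = TRUE)    # a bootstrap sample: resample with replacement
  mean(y.star)                           # statistic of interest on that sample
})

sd(boot.means)                           # bootstrap estimate of the standard error
quantile(boot.means, c(0.025, 0.975))    # percentile-interval confidence limits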
BOOTSTRAP WORLD
• Efron & Tibshirani (1993) diagram of
the Real world and the Bootstrap
world to illustrate why bootstrapping
works
THE BOOTSTRAP IN JMP
• Possible to do a bootstrap analysis prior to JMP 10 using a script
• “One-click bootstrap” added to JMP Pro in Version 10
• Available in most Analysis platforms
• Rows need to be independent for the one-click bootstrap to be applicable
• Takes advantage of the Automatic Recalc feature
• Results can be analyzed in Distribution platform, which will know to provide
Bootstrap Confidence Limits, based on percentile interval method (Efron &
Tibshirani 1993)
NON-STANDARD QUANTITIES
• By non-standard, I mean statistics for which we don’t readily have standard
errors
• Could be unavailable in JMP
• Could be difficult to obtain analytically
• Example: Adjusted R^2 value in linear regression
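As a rough illustration of bootstrapping such a non-standard quantity, the R sketch below resamples rows and recomputes adjusted R² for each bootstrap sample; the built-in mtcars data and the mpg ~ wt + hp model are stand-ins, not from the talk.

set.seed(2012)
dat <- mtcars                            # stand-in data set
n <- nrow(dat)

adj.r2 <- replicate(1000, {
  d.star <- dat[sample(n, replace = TRUE), ]                 # resample rows with replacement
  summary(lm(mpg ~ wt + hp, data = d.star))$adj.r.squared    # recompute adjusted R^2
})

quantile(adj.r2, c(0.025, 0.975))        # bootstrap percentile interval for adjusted R^2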
RECAP
• Bootstrap is a powerful feature with many uses
• Primarily a UI feature, but capability is enhanced when scripted in JSL
• Allows us to get confidence intervals for statistics, functions of statistics and
curves
• Examples from Discovery 2012:
• Non-standard quantities
• Functions of the output
• Multiple tables in one bootstrap run
• Model from the Fit Curve platform
PERMUTATION TESTS
INTRODUCTION
• Introduced by R.A. Fisher in the 1930s
• Fisher wanted to demonstrate the validity of Student's t test without the normality assumption
• Permutation tests provide exact results, but they apply only to a narrow range of problems
• Must have something to permute (i.e. change the order of)
• Bootstrap hypothesis testing extends permutation testing to more problems
CONCEPTS
• Basic idea:
• Sample repeatedly from the permutation distribution
• Note that this is sampling without replacement (not with replacement)
• Resampling purpose is to permute (change the order of) the observations
• Compare the number of results more extreme than the observed result
• Calculate a p-value:
  p = #{results more extreme} / #{total iterations}
TWO-SAMPLE EXAMPLE
• Consider comparing the (possibly different) distributions (𝐹, 𝐺) of two samples
(sizes 𝑛, 𝑚 with 𝑁 = 𝑛 + 𝑚)
• 𝐻0: 𝐹 = 𝐺
• Under 𝐻0, all permutations of the observations across 𝐹, 𝐺 are equally likely
• There are (𝑁 choose 𝑛) possible permutations; generally, sampling from them is sufficient.
• For each permutation replication, determine whether the difference (𝜃̂∗) is greater than the observed difference (𝜃̂).
• Tabulate the number of times 𝜃̂∗ ≥ 𝜃̂ and divide by the number of replications.
• This is a one-sided permutation test; a two-sided test can be performed by taking absolute values of 𝜃̂∗ and 𝜃̂.
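A small R sketch of this two-sample permutation test (made-up data; the difference in group means plays the role of 𝜃̂):

set.seed(2012)
y1 <- rnorm(15, mean = 0)                     # sample of size n from F
y2 <- rnorm(20, mean = 0.5)                   # sample of size m from G
y  <- c(y1, y2)
n  <- length(y1); N <- length(y)

theta.hat <- mean(y1) - mean(y2)              # observed difference

theta.star <- replicate(5000, {
  idx <- sample(N, n)                         # permute: draw n labels without replacement
  mean(y[idx]) - mean(y[-idx])                # difference for this permutation
})

mean(theta.star >= theta.hat)                 # one-sided permutation p-value
mean(abs(theta.star) >= abs(theta.hat))       # two-sided version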
DEMONSTRATION #1
• Oneway platform in JMP can compute
robust mean estimates
• Test included is a Wald test with an asymptotic
Chi Square distribution p-value
• We wish to use a permutation test to avoid
the distributional assumption of the
asymptotic test
DEMONSTRATION #1
• Script input:
• continuous response
• categorical predictor
• # of permutation replications
• Script output:
• original Robust Fit
• permutation test results
newX = xVals[random shuffle(iVec)];
// set the x values to a random permutation
column(eval expr(xCol)) << set values(newX);
DEMONSTRATION #1
Robust mean permutation test demo
DEMONSTRATION #2
• Contingency platform in JMP performs two
Chi Square tests for testing if responses
differ across levels of the X variable
• Tests require that expected counts of
contingency table cells be > 5
• We wish to use a permutation test to avoid
this requirement
DEMONSTRATION #2
• Script input:
• categorical response
• categorical predictor
• # of permutation replications
• Script output:
• results of two original Chi Square tests
• permutation test results
newY = responseVals[random shuffle(ivec)];
column(eval expr(responseCol)) << set values(newY);
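For readers following along outside JMP, a rough R sketch of the same permutation scheme with made-up categorical data: shuffle the response, recompute the Chi Square statistic, and compare with the observed value.

set.seed(2012)
x <- sample(c("A", "B", "C"), 90, replace = TRUE)   # categorical predictor
y <- sample(c("yes", "no"),   90, replace = TRUE)   # categorical response

obs.stat <- suppressWarnings(chisq.test(table(x, y))$statistic)

perm.stat <- replicate(2000, {
  y.star <- sample(y)                               # random permutation of the response
  suppressWarnings(chisq.test(table(x, y.star))$statistic)
})

mean(perm.stat >= obs.stat)     # permutation p-value; no expected-cell-count requirement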
DEMONSTRATION #2
Contingency table permutation test demo
RECAP
• When available, full permutation tests provide exact results
• …and incomplete permutation tests still give good results
• If something can be permuted, these tests are easy to implement
• For other significance testing situations, bootstrap hypothesis testing works
PARAMETRIC BOOTSTRAP
INTRODUCTION
• JMP provides a one-click non-parametric bootstrap feature.
• But there are other variations of the bootstrap: residual resampling, Bayesian
bootstrap, …
• This section will provide an introduction to the parametric bootstrap and how
it can be implemented in JSL.
WHY NOT SAMPLE ROWS?
• There are times when we may not want to resample rows of our data.
[Figure: a nonparametric bootstrap sample of the data]
WHY NOT SAMPLE ROWS?
• For a sigmoid curve, we don’t want to lose any sections of the curve:
• Upper asymptote
• Lower asymptote
• Inflection point
• Similar issues arise with logistic regression, where resampling rows can lead to problems with separation.
• There are several alternatives to resampling rows; we will focus on the parametric bootstrap.
DETAILS
• Nearly identical to the nonparametric bootstrap, except for the way that we
generate our bootstrap samples.
• Nonparametric bootstrap samples from the empirical distribution of our data.
• Parametric bootstrap samples from the fitted parametric model for our data.
 Suppose we use 𝐹(𝛽) to model our response 𝑌. Fitting the model to our observed data gives us 𝐹(𝛽̂), our fitted parametric model.
 A parametric bootstrap algorithm:
1. Obtain 𝐹(𝛽̂) by fitting the parametric model to the observed data.
2. Use 𝐹(𝛽̂) to generate 𝑌𝑗∗, a vector of random pseudo-responses.
3. Fit 𝐹(𝛽) to 𝑌𝑗∗, giving us 𝛽̂𝑗∗.
4. Store 𝛽̂𝑗∗ and return to step 2, for 𝑗 = 1, …, 𝐵.
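A minimal R sketch of steps 1-4, assuming (purely for illustration) an exponential model for the response:

set.seed(2012)
y <- rexp(40, rate = 0.5)                 # observed data (stand-in)

rate.hat <- 1 / mean(y)                   # step 1: fit the model (MLE of the exponential rate)

B <- 1000
rate.star <- replicate(B, {
  y.star <- rexp(length(y), rate = rate.hat)   # step 2: pseudo-responses from the fitted model
  1 / mean(y.star)                             # step 3: refit, giving the bootstrap estimate
})                                             # step 4: replicate() stores each estimate

sd(rate.star)                             # parametric-bootstrap standard error
quantile(rate.star, c(0.025, 0.975))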
SIGMOID CURVE EXAMPLE
• A few slides ago, we saw an example of a sigmoid curve.
  𝐹(𝛽) = 𝑔(𝑥, 𝛽) = 𝛽3 + (𝛽4 − 𝛽3) / (1 + Exp[−𝛽1(𝑥 − 𝛽2)])
• Assume the response is normally distributed: 𝑦 ∼ 𝑁(𝑔(𝑥, 𝛽), 𝜎²).
• Fitting the curve to our original data set gives us an estimate for our coefficients and error variance.

Term       𝛽1     𝛽2     𝛽3     𝛽4     𝜎
Estimate   5.72   0.32   0.60   1.11   0.011
SIGMOID EXAMPLE CONTINUED
• Using our estimated coefficients and error variance,
  𝑦𝑗∗ = 𝑔(𝑥𝑗, 𝛽̂) + 𝜎̂𝜖𝑗
  where the 𝜖𝑗 are independent and identically distributed standard normal.
[Figure: a parametric bootstrap sample from the fitted curve]
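The R sketch below mirrors this setup (illustrative only): it simulates stand-in data from the curve using the estimates in the table above, refits with nls(), and then repeats the generate-and-refit loop to get parametric-bootstrap standard errors.

set.seed(2012)
g <- function(x, b) b[3] + (b[4] - b[3]) / (1 + exp(-b[1] * (x - b[2])))
x <- seq(-0.5, 1.5, length.out = 60)
y <- g(x, c(5.72, 0.32, 0.60, 1.11)) + rnorm(length(x), sd = 0.011)   # stand-in data

fit <- nls(y ~ b3 + (b4 - b3) / (1 + exp(-b1 * (x - b2))),
           start = list(b1 = 5, b2 = 0.3, b3 = 0.6, b4 = 1.1))
b.hat <- coef(fit)
s.hat <- summary(fit)$sigma

boot.coef <- replicate(500, {
  y.star <- g(x, b.hat) + s.hat * rnorm(length(x))      # y*_j = g(x_j, beta-hat) + sigma-hat * eps_j
  f.star <- try(nls(y.star ~ b3 + (b4 - b3) / (1 + exp(-b1 * (x - b2))),
                    start = as.list(b.hat)), silent = TRUE)
  if (inherits(f.star, "try-error")) rep(NA_real_, 4) else coef(f.star)
})

apply(boot.coef, 1, sd, na.rm = TRUE)    # parametric-bootstrap standard errors of the coefficients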
MORE DETAILS
• Resampling rows can be problematic.
• Any other reasons to use the parametric bootstrap?
• Results will be close to “textbook” formulae when available.
• Very nice for doing goodness of fit tests.
• Example: Normal goodness of fit test
  𝐻0: 𝑁(𝜇, 𝜎) appropriate for 𝑦   vs   𝐻1: 𝑁(𝜇, 𝜎) not appropriate
  Parametric bootstrap gives us the distribution of the test statistic under 𝐻0
  → perfect for calculating p-values
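One possible R sketch of such a test, using the Kolmogorov-Smirnov statistic with estimated parameters as the goodness-of-fit statistic (an assumption for illustration; the talk's demo may use a different statistic). The parametric bootstrap supplies the null distribution of the statistic.

set.seed(2012)
y <- rgamma(60, shape = 2)                     # stand-in data (not normal)

gof.stat <- function(z) {                      # KS statistic with estimated mean and sd
  suppressWarnings(ks.test(z, "pnorm", mean(z), sd(z))$statistic)
}
obs <- gof.stat(y)

null.stat <- replicate(1000, {                 # simulate from N(mu-hat, sigma-hat) under H0
  gof.stat(rnorm(length(y), mean(y), sd(y)))
})

mean(null.stat >= obs)                         # parametric-bootstrap p-value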
…AND JMP
• JMP provides a one-click nonparametric bootstrap, but a little bit of scripting
gets us the parametric bootstrap as well.
• Most (all?) modeling platforms in JMP allow you to save a prediction formula.
We can also create columns of random values for many distributions.
Put these two things together and we’re well on our way!
DEMONSTRATION
Parametric Bootstrapping Demonstration in JMP
BOOTSTRAP AGGREGATING (BAGGING)
BAGGING INTRODUCTION
• We have seen bootstrapping for inference; we can also use the bootstrap to improve prediction.
• Breiman (1996a) introduced the notion of “bootstrap aggregating” (or bagging
for short) to improve predictions.
• The name says it all…aggregate predictions across bootstrap samples.
BAGGING UNSTABLE PREDICTORS
• Breiman (1996b) introduced the idea of instability in model selection.
• Let's say that we are using our data
  𝐷 = { (𝒙𝑖, 𝑦𝑖), 𝑖 = 1, …, 𝑛 }
  to create a prediction function 𝜇̂(𝑥, 𝐷).
• If a small change in 𝐷 results in a large change in 𝜇̂(∙, ∙), we have an unstable predictor.
• A variety of techniques have been shown to be unstable:
Regression trees, best subset regression, forward selection, …
• Instability is a major concern when predicting new observations.
BAGGING THE BAGGING ALGORITHM
• A natural way to deal with instability is to observe the behavior of 𝜇 ∙,∙ for
repeated perturbations of the data → bootstrap it!
• Basic bagging algorithm:
1. Take a bootstrap sample 𝐷𝑗 from the observed data 𝐷.
2. Fit your model of choice to 𝐷𝑗, giving you predictor 𝜇̂𝑗(𝒙).
Repeat 1 and 2 for 𝑗 = 1, …, 𝑏.
Then the bagged prediction rule is
  𝜇̂𝑏𝑎𝑔(𝒙) = (1/𝑏) ∑𝑗=1…𝑏 𝜇̂𝑗(𝒙)
• Bagging a classifier is slightly different. You can either average over the
probability formula or use a voting scheme.
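An illustrative R sketch of this algorithm, using regression trees from the rpart package (assumed installed) as the unstable base learner and a piecewise-constant mean similar to the example on the next slide:

library(rpart)
set.seed(2012)
x <- runif(200, 0, 6)
y <- ifelse(x < 2, 1, ifelse(x <= 4, 5, 3)) + rnorm(200, sd = 0.5)
dat <- data.frame(x = x, y = y)

b <- 100
x.new <- data.frame(x = seq(0, 6, by = 0.1))
preds <- sapply(seq_len(b), function(j) {
  d.star <- dat[sample(nrow(dat), replace = TRUE), ]   # bootstrap sample D_j
  fit.j  <- rpart(y ~ x, data = d.star)                # predictor mu-hat_j fit to D_j
  predict(fit.j, newdata = x.new)                      # its predictions on a grid
})

mu.bag <- rowMeans(preds)    # bagged prediction: average the b bootstrap predictions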
BAGGING REGRESSION TREES
• Regression trees (as well as classification
trees) are known to be particularly unstable.
Ex:  𝑦(𝑥) = 1 if 𝑥 < 2;  5 if 2 ≤ 𝑥 ≤ 4;  3 if 𝑥 > 4
• A regression tree looks for optimal splits in your
predictors and fits a simple mean to each
section.
BAGGING MORE ON INSTABILITY
• In general, techniques that involve “hard decision rules” (like binary splits or
including/excluding terms) are likely to be unstable.
• Regression trees: binary splits
• Best subset: X1 is either included or left out of the model (nothing in between)
• Is anything stable???
One example: Penalized regression techniques can shrink estimates, which is kind of
like letting a variable partially enter the model.
BAGGING REGRESSION TREE EXAMPLE
A regression tree for these
data will change drastically
depending on whether or not
we include the point at x=21.
BAGGING REGRESSION TREE EXAMPLE
Predictions for a tree with a
single split.
Blue includes x=21.
Red excludes x=21.
This kind of difference is
crucial when we observe
new data.
BAGGING REGRESSION TREE EXAMPLE
The bagged predictor is a
compromise between the two.
BAGGING RECAP
• Bagging is a very useful tool that improves upon unstable predictors.
Added bonus: we also get a measure of uncertainty.
• Several features in JMP and JSL make it convenient to do bagging yourself.
Most platforms (if not all) allow you to save a prediction formula
linearModel << save prediction formula;
predFormula = column("Y Predictor") << get formula;
And we get a lot of mileage out of the resample freq() function
rsfCol = newColumn("rsf", set formula(resample freq(1)));
• An implementation of Random Forests was added to JMP Pro in version 10.
Random forests resample rows and columns.
BAGGING DEMONSTRATION
Bagging Demonstration in JMP
USING R FOR BOOTSTRAPPING
THE JMP INTERFACE TO R
• JMP 10 added the ability to transfer information between JMP and R (a very
powerful open-source statistical software package).
• R has packages to do bootstrapping, in particular the “boot” package.
• We can do the bootstrap in R using a custom-made JMP interface.
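For reference, the kind of R code such an interface might submit could look like the following boot-package sketch (illustrative; not the exact script from the demo):

library(boot)
set.seed(2012)
y <- rnorm(50, mean = 10, sd = 2)                  # stand-in data

boot.mean <- function(data, idx) mean(data[idx])   # statistic(data, resampled indices)
b <- boot(y, statistic = boot.mean, R = 1000)

b                            # bootstrap bias and standard error
boot.ci(b, type = "perc")    # percentile confidence interval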
CONNECTING TO R
• JMP is a very nice complement to R, making it easy to create convenient
interfaces to R and then present R results in JMP.
• A handful of JSL functions allow you to communicate with R.
R Init(); // Initializes the connection to R
x = [1, 2, 3];
R Send( x ); // sends the matrix x to R
R Submit("x <- 2*x"); // submits R Code
y = R Get(x);// gets the object x from R and names it y
R Term(); // Terminates the connection to R
• There are a few more JSL functions for communicating with R, but the
functions listed above will handle the majority of your needs.
CONNECTING TO R
• The R connection allows us to combine the strengths of R and JMP
DEMONSTRATION
JMP and R Integration Demonstration
STABILITY SELECTION
INTRODUCTION
• Bagging is a way to use resampling to improve prediction.
We can also use resampling to improve variable selection techniques.
• Meinshausen and Buhlmann (2010) introduced Stability selection, a very
general modification that can be used in conjunction with any traditional
variable selection technique.
• The motivation behind stability selection is simple and very intuitive:
If a predictor is typically included in the final model after doing variable
selection on a subset of the data, then it is probably a meaningful variable.
THE VARIABLE SELECTION PROBLEM
• Suppose that we have observed data
D = { (𝒙𝑖, 𝑦𝑖), 𝑖 = 1, … , 𝑛 }
𝒙𝑖 is a 𝑝 × 1 vector of predictors for observation 𝑖.
• We want to build a model for the response 𝑦 using a subset of the predictors
to improve both interpretation and predictive ability.
This is the classic variable selection problem.
• No shortage of variable selection techniques:
stepwise regression, best subset, penalized regression, …
THE VARIABLE SELECTION PROBLEM
• Usual linear models problem: we want to estimate the coefficient vector 𝛽 for the model 𝑌 = 𝑋𝛽 + 𝜀.
• We want our variable selection technique to set 𝛽𝑗 = 0 for some of the terms.
• We always have at least one tuning parameter (𝜆) that controls the complexity of the model. Doing variable selection yields a set 𝑆̂(𝜆) = { 𝑘 : 𝛽̂𝑘 ≠ 0 }.
• We tune the model using cross-validation, AIC, BIC, …

Variable Selection Technique    Tuning Parameter
Forward Selection               Alpha-to-enter
Backward Elimination            Alpha-to-leave
Best Subset                     Maximum model size considered
Lasso                           L1 norm
Least Angle Regression          Number of nonzero variables
THE DETAILS
• The stability selection algorithm:
1. Choose a random subsample 𝐷𝑘 of size 𝑛/2, without replacement, from 𝐷.
2. Use a variable selection technique to obtain 𝑆̂𝑘(𝜆), the set of variables with nonzero coefficients at tuning parameter value 𝜆.
3. Repeat steps 1 and 2 for 𝑘 = 1, …, 𝐵.
• Easy to implement: only as complicated as the underlying selection technique.
• We calculate Π̂𝑗(𝜆), the probability that variable 𝑗 is included in the model when doing selection (with tuning 𝜆) on a random subset of the data:
  Π̂𝑗(𝜆) = (1/𝐵) ∑𝑘=1…𝐵 𝐼(𝑥𝑗 ∈ 𝑆̂𝑘(𝜆))
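A condensed R sketch of this algorithm, using the lasso from the glmnet package as the underlying selection technique (an illustrative substitution; the talk's demo uses forward selection via JMP's Stepwise platform instead):

library(glmnet)
set.seed(2012)
n <- 100; p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] - 2 * x[, 2] + rnorm(n)                  # only the first two predictors matter

lambda <- exp(seq(log(1), log(0.01), length.out = 30))   # shared tuning-parameter grid
B <- 100
sel.count <- matrix(0, p, length(lambda))

for (k in seq_len(B)) {
  idx <- sample(n, floor(n / 2))                     # subsample of size n/2, without replacement
  fit <- glmnet(x[idx, ], y[idx], lambda = lambda)
  sel.count <- sel.count + (as.matrix(fit$beta) != 0)    # record which coefficients are nonzero
}

pi.hat <- sel.count / B                              # selection probabilities Pi-hat_j(lambda)
which(apply(pi.hat, 1, max) >= 0.8)                  # variables passing an example threshold of 0.8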
MORE DETAILS
• Applying the algorithm to a meaningful
range of 𝜆 values shows us how
inclusion probabilities change as a
function of the tuning parameter.
• It makes sense that if a term maintains
a high selection probability, then it
should be in our final model.
MORE DETAILS
• After looking across a meaningful range of 𝜆 values, we include variable 𝑥𝑗 in our final model if max𝜆 Π̂𝑗(𝜆) ≥ Π𝑡ℎ𝑟.
• Here Π𝑡ℎ𝑟 is a tuning parameter chosen in advance. Meinshausen and Buhlmann (2010) show that the results are not sensitive to the choice of Π𝑡ℎ𝑟.
MORE DETAILS
• Improves unstable variable selection problems.
• Very general, can be applied in most variable selection settings.
• Can greatly outperform cross-validation for high dimensional data (𝑛 ≪ 𝑝).
• Theory shows that stability selection controls the false discovery rate.
• Why subsample instead of the usual bootstrap?
  Taking a subsample (without replacement) of size 𝑛/2 provides a similar reduction in information as a standard bootstrap sample.
  It saves time when the underlying variable selection technique is computationally intense.
…AND JMP
• JMP provides several built-in variable selection techniques:
• Forward selection, backward elimination, Lasso, Elastic Net, …
• The most convenient in JMP? Forward selection with alpha-to-enter tuning.
• Scripting stability selection for forward selection is fairly straightforward using
a frequency column in conjunction with the Stepwise platform.
newFreq = J(n,1,0);
newFreq[random shuffle(rows)[1::floor(n/2)]]=1;
fCol << set values(newFreq);
• We have implemented a slight modification of the stability selection algorithm,
which Shah and Samworth (2013) shows to have some added perks.
DEMONSTRATION
Stability Selection Demonstration in JMP
WRAP-UP
WRAP-UP REFERENCES
• Breiman, L. (1996a). Bagging Predictors. Machine Learning, 24, 123-140.
• Breiman, L. (1996b). Heuristics of Instability and Stabilization in Model Selection. The Annals of Statistics, 24, 2350-2383.
• Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap, Chapman & Hall/CRC.
• Meinshausen, N. and Buhlmann, P. (2010). Stability Selection. Journal of the Royal Statistical Society Series B, 72, 417-473.
• Shah, R. and Samworth, R. (2013). Variable selection with error control: another look at stability selection. Journal of the Royal Statistical Society Series B, 75, 55-80.
THANK YOU!
The Bootstrap and Beyond: Using JSL for resampling
Michael Crotty, michael.crotty@sas.com
Clay Barker, clay.barker@sas.com
Research Statisticians
JMP Division, SAS Institute
More Related Content

Similar to The Bootstrap and Beyond: Using JSL for Resampling

Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
MLconf
 
"A Framework for Developing Trading Models Based on Machine Learning" by Kris...
"A Framework for Developing Trading Models Based on Machine Learning" by Kris..."A Framework for Developing Trading Models Based on Machine Learning" by Kris...
"A Framework for Developing Trading Models Based on Machine Learning" by Kris...
Quantopian
 
AIAA-SDM-SequentialSampling-2012
AIAA-SDM-SequentialSampling-2012AIAA-SDM-SequentialSampling-2012
AIAA-SDM-SequentialSampling-2012
OptiModel
 
ASS_SDM2012_Ali
ASS_SDM2012_AliASS_SDM2012_Ali
ASS_SDM2012_Ali
MDO_Lab
 
GLM & GBM in H2O
GLM & GBM in H2OGLM & GBM in H2O
GLM & GBM in H2O
Sri Ambati
 
Meetup_Consumer_Credit_Default_Vers_2_All
Meetup_Consumer_Credit_Default_Vers_2_AllMeetup_Consumer_Credit_Default_Vers_2_All
Meetup_Consumer_Credit_Default_Vers_2_All
Bernard Ong
 
Explicit Density Models
Explicit Density ModelsExplicit Density Models
Explicit Density Models
Sangwoo Mo
 
Rapid Miner
Rapid MinerRapid Miner
Rapid Miner
SrushtiSuvarna
 
Jorge Silva, Sr. Research Statistician Developer, SAS at MLconf ATL - 9/18/15
Jorge Silva, Sr. Research Statistician Developer, SAS at MLconf ATL - 9/18/15Jorge Silva, Sr. Research Statistician Developer, SAS at MLconf ATL - 9/18/15
Jorge Silva, Sr. Research Statistician Developer, SAS at MLconf ATL - 9/18/15
MLconf
 
Unit testing
Unit testingUnit testing
Unit testing
Adam Birr
 
Regularization in deep learning
Regularization in deep learningRegularization in deep learning
Regularization in deep learning
Kien Le
 
04-Data-Analysis-Overview.pptx
04-Data-Analysis-Overview.pptx04-Data-Analysis-Overview.pptx
04-Data-Analysis-Overview.pptx
Shree Shree
 
MLConf 2016 SigOpt Talk by Scott Clark
MLConf 2016 SigOpt Talk by Scott ClarkMLConf 2016 SigOpt Talk by Scott Clark
MLConf 2016 SigOpt Talk by Scott Clark
SigOpt
 
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
MLconf
 
Azure machine learning
Azure machine learningAzure machine learning
Azure machine learning
Simone Caldaro
 
R user group meeting 25th jan 2017
R user group meeting 25th jan 2017R user group meeting 25th jan 2017
R user group meeting 25th jan 2017
Garrett Teoh Hor Keong
 
Distributed Model Validation with Epsilon
Distributed Model Validation with EpsilonDistributed Model Validation with Epsilon
Distributed Model Validation with Epsilon
Sina Madani
 
Design p atterns
Design p atternsDesign p atterns
Design p atterns
Amr Abd El Latief
 
Kaggle Higgs Boson Machine Learning Challenge
Kaggle Higgs Boson Machine Learning ChallengeKaggle Higgs Boson Machine Learning Challenge
Kaggle Higgs Boson Machine Learning Challenge
Bernard Ong
 
Machine Learning in the Financial Industry
Machine Learning in the Financial IndustryMachine Learning in the Financial Industry
Machine Learning in the Financial Industry
Subrat Panda, PhD
 

Similar to The Bootstrap and Beyond: Using JSL for Resampling (20)

Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
 
"A Framework for Developing Trading Models Based on Machine Learning" by Kris...
"A Framework for Developing Trading Models Based on Machine Learning" by Kris..."A Framework for Developing Trading Models Based on Machine Learning" by Kris...
"A Framework for Developing Trading Models Based on Machine Learning" by Kris...
 
AIAA-SDM-SequentialSampling-2012
AIAA-SDM-SequentialSampling-2012AIAA-SDM-SequentialSampling-2012
AIAA-SDM-SequentialSampling-2012
 
ASS_SDM2012_Ali
ASS_SDM2012_AliASS_SDM2012_Ali
ASS_SDM2012_Ali
 
GLM & GBM in H2O
GLM & GBM in H2OGLM & GBM in H2O
GLM & GBM in H2O
 
Meetup_Consumer_Credit_Default_Vers_2_All
Meetup_Consumer_Credit_Default_Vers_2_AllMeetup_Consumer_Credit_Default_Vers_2_All
Meetup_Consumer_Credit_Default_Vers_2_All
 
Explicit Density Models
Explicit Density ModelsExplicit Density Models
Explicit Density Models
 
Rapid Miner
Rapid MinerRapid Miner
Rapid Miner
 
Jorge Silva, Sr. Research Statistician Developer, SAS at MLconf ATL - 9/18/15
Jorge Silva, Sr. Research Statistician Developer, SAS at MLconf ATL - 9/18/15Jorge Silva, Sr. Research Statistician Developer, SAS at MLconf ATL - 9/18/15
Jorge Silva, Sr. Research Statistician Developer, SAS at MLconf ATL - 9/18/15
 
Unit testing
Unit testingUnit testing
Unit testing
 
Regularization in deep learning
Regularization in deep learningRegularization in deep learning
Regularization in deep learning
 
04-Data-Analysis-Overview.pptx
04-Data-Analysis-Overview.pptx04-Data-Analysis-Overview.pptx
04-Data-Analysis-Overview.pptx
 
MLConf 2016 SigOpt Talk by Scott Clark
MLConf 2016 SigOpt Talk by Scott ClarkMLConf 2016 SigOpt Talk by Scott Clark
MLConf 2016 SigOpt Talk by Scott Clark
 
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
Scott Clark, Co-Founder and CEO, SigOpt at MLconf SF 2016
 
Azure machine learning
Azure machine learningAzure machine learning
Azure machine learning
 
R user group meeting 25th jan 2017
R user group meeting 25th jan 2017R user group meeting 25th jan 2017
R user group meeting 25th jan 2017
 
Distributed Model Validation with Epsilon
Distributed Model Validation with EpsilonDistributed Model Validation with Epsilon
Distributed Model Validation with Epsilon
 
Design p atterns
Design p atternsDesign p atterns
Design p atterns
 
Kaggle Higgs Boson Machine Learning Challenge
Kaggle Higgs Boson Machine Learning ChallengeKaggle Higgs Boson Machine Learning Challenge
Kaggle Higgs Boson Machine Learning Challenge
 
Machine Learning in the Financial Industry
Machine Learning in the Financial IndustryMachine Learning in the Financial Industry
Machine Learning in the Financial Industry
 

More from JMP software from SAS

The Straight Way to a Final Result: Mixture Design of Experiments
The Straight Way to a Final Result: Mixture Design of ExperimentsThe Straight Way to a Final Result: Mixture Design of Experiments
The Straight Way to a Final Result: Mixture Design of Experiments
JMP software from SAS
 
A Primer in Statistical Discovery
A Primer in Statistical DiscoveryA Primer in Statistical Discovery
A Primer in Statistical Discovery
JMP software from SAS
 
Grafische Analyse Ihrer Excel Daten
Grafische Analyse  Ihrer Excel DatenGrafische Analyse  Ihrer Excel Daten
Grafische Analyse Ihrer Excel Daten
JMP software from SAS
 
Building Better Models
Building Better ModelsBuilding Better Models
Building Better Models
JMP software from SAS
 
JMP for Ethanol Producers
JMP for Ethanol ProducersJMP for Ethanol Producers
JMP for Ethanol Producers
JMP software from SAS
 
Exploring Best Practises in Design of Experiments: A Data Driven Approach to ...
Exploring Best Practises in Design of Experiments: A Data Driven Approach to ...Exploring Best Practises in Design of Experiments: A Data Driven Approach to ...
Exploring Best Practises in Design of Experiments: A Data Driven Approach to ...
JMP software from SAS
 
Statistical Discovery for Consumer and Marketing Research
Statistical Discovery for Consumer and Marketing ResearchStatistical Discovery for Consumer and Marketing Research
Statistical Discovery for Consumer and Marketing Research
JMP software from SAS
 
Exploring Best Practises in Design of Experiments
Exploring Best Practises in Design of ExperimentsExploring Best Practises in Design of Experiments
Exploring Best Practises in Design of Experiments
JMP software from SAS
 
Statistical and Predictive Modelling
Statistical and Predictive ModellingStatistical and Predictive Modelling
Statistical and Predictive Modelling
JMP software from SAS
 
Evaluating & Monitoring Your Process Using MSA & SPC
Evaluating & Monitoring Your Process Using MSA & SPCEvaluating & Monitoring Your Process Using MSA & SPC
Evaluating & Monitoring Your Process Using MSA & SPC
JMP software from SAS
 
Everything You Wanted to Know About Definitive Screening Designs
Everything You Wanted to Know About Definitive Screening DesignsEverything You Wanted to Know About Definitive Screening Designs
Everything You Wanted to Know About Definitive Screening Designs
JMP software from SAS
 
Basic Design of Experiments Using the Custom DOE Platform
Basic Design of Experiments Using the Custom DOE PlatformBasic Design of Experiments Using the Custom DOE Platform
Basic Design of Experiments Using the Custom DOE Platform
JMP software from SAS
 
Correcting Misconceptions About Optimal Design
Correcting Misconceptions About Optimal DesignCorrecting Misconceptions About Optimal Design
Correcting Misconceptions About Optimal Design
JMP software from SAS
 
Visual Analytic Approaches for the Analysis of Spontaneously Reported Adverse...
Visual Analytic Approaches for the Analysis of Spontaneously Reported Adverse...Visual Analytic Approaches for the Analysis of Spontaneously Reported Adverse...
Visual Analytic Approaches for the Analysis of Spontaneously Reported Adverse...
JMP software from SAS
 
Building Models for Complex Design of Experiments
Building Models for Complex Design of ExperimentsBuilding Models for Complex Design of Experiments
Building Models for Complex Design of Experiments
JMP software from SAS
 
Introduction to Modeling
Introduction to ModelingIntroduction to Modeling
Introduction to Modeling
JMP software from SAS
 
New Design of Experiments Features in JMP 11
New Design of Experiments Features in JMP 11New Design of Experiments Features in JMP 11
New Design of Experiments Features in JMP 11
JMP software from SAS
 
When a Linear Model Just Won't Do: Fitting Nonlinear Models in JMP
When a Linear Model Just Won't Do: Fitting Nonlinear Models in JMPWhen a Linear Model Just Won't Do: Fitting Nonlinear Models in JMP
When a Linear Model Just Won't Do: Fitting Nonlinear Models in JMP
JMP software from SAS
 
Exploring Variable Clustering and Importance in JMP
Exploring Variable Clustering and Importance in JMPExploring Variable Clustering and Importance in JMP
Exploring Variable Clustering and Importance in JMP
JMP software from SAS
 

More from JMP software from SAS (19)

The Straight Way to a Final Result: Mixture Design of Experiments
The Straight Way to a Final Result: Mixture Design of ExperimentsThe Straight Way to a Final Result: Mixture Design of Experiments
The Straight Way to a Final Result: Mixture Design of Experiments
 
A Primer in Statistical Discovery
A Primer in Statistical DiscoveryA Primer in Statistical Discovery
A Primer in Statistical Discovery
 
Grafische Analyse Ihrer Excel Daten
Grafische Analyse  Ihrer Excel DatenGrafische Analyse  Ihrer Excel Daten
Grafische Analyse Ihrer Excel Daten
 
Building Better Models
Building Better ModelsBuilding Better Models
Building Better Models
 
JMP for Ethanol Producers
JMP for Ethanol ProducersJMP for Ethanol Producers
JMP for Ethanol Producers
 
Exploring Best Practises in Design of Experiments: A Data Driven Approach to ...
Exploring Best Practises in Design of Experiments: A Data Driven Approach to ...Exploring Best Practises in Design of Experiments: A Data Driven Approach to ...
Exploring Best Practises in Design of Experiments: A Data Driven Approach to ...
 
Statistical Discovery for Consumer and Marketing Research
Statistical Discovery for Consumer and Marketing ResearchStatistical Discovery for Consumer and Marketing Research
Statistical Discovery for Consumer and Marketing Research
 
Exploring Best Practises in Design of Experiments
Exploring Best Practises in Design of ExperimentsExploring Best Practises in Design of Experiments
Exploring Best Practises in Design of Experiments
 
Statistical and Predictive Modelling
Statistical and Predictive ModellingStatistical and Predictive Modelling
Statistical and Predictive Modelling
 
Evaluating & Monitoring Your Process Using MSA & SPC
Evaluating & Monitoring Your Process Using MSA & SPCEvaluating & Monitoring Your Process Using MSA & SPC
Evaluating & Monitoring Your Process Using MSA & SPC
 
Everything You Wanted to Know About Definitive Screening Designs
Everything You Wanted to Know About Definitive Screening DesignsEverything You Wanted to Know About Definitive Screening Designs
Everything You Wanted to Know About Definitive Screening Designs
 
Basic Design of Experiments Using the Custom DOE Platform
Basic Design of Experiments Using the Custom DOE PlatformBasic Design of Experiments Using the Custom DOE Platform
Basic Design of Experiments Using the Custom DOE Platform
 
Correcting Misconceptions About Optimal Design
Correcting Misconceptions About Optimal DesignCorrecting Misconceptions About Optimal Design
Correcting Misconceptions About Optimal Design
 
Visual Analytic Approaches for the Analysis of Spontaneously Reported Adverse...
Visual Analytic Approaches for the Analysis of Spontaneously Reported Adverse...Visual Analytic Approaches for the Analysis of Spontaneously Reported Adverse...
Visual Analytic Approaches for the Analysis of Spontaneously Reported Adverse...
 
Building Models for Complex Design of Experiments
Building Models for Complex Design of ExperimentsBuilding Models for Complex Design of Experiments
Building Models for Complex Design of Experiments
 
Introduction to Modeling
Introduction to ModelingIntroduction to Modeling
Introduction to Modeling
 
New Design of Experiments Features in JMP 11
New Design of Experiments Features in JMP 11New Design of Experiments Features in JMP 11
New Design of Experiments Features in JMP 11
 
When a Linear Model Just Won't Do: Fitting Nonlinear Models in JMP
When a Linear Model Just Won't Do: Fitting Nonlinear Models in JMPWhen a Linear Model Just Won't Do: Fitting Nonlinear Models in JMP
When a Linear Model Just Won't Do: Fitting Nonlinear Models in JMP
 
Exploring Variable Clustering and Importance in JMP
Exploring Variable Clustering and Importance in JMPExploring Variable Clustering and Importance in JMP
Exploring Variable Clustering and Importance in JMP
 

Recently uploaded

Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
Zilliz
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 

Recently uploaded (20)

Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 

The Bootstrap and Beyond: Using JSL for Resampling

  • 1. Copyr ight © 2012, SAS Institute Inc. All rights reser ved. THE BOOTSTRAP AND BEYOND: USING JSL FOR RESAMPLING • Michael Crotty & Clay Barker • Research Statisticians • JMP Division, SAS Institute
  • 2. Copyright © 2012, SAS Institute Inc. All rights reserved. OUTLINE • Nonparametric Bootstrap • Permutation Testing • Parametric Bootstrap • Bootstrap Aggregating (Bagging) • Bootstrapping in R • Stability Selection • Wrap-up/Questions
  • 3. Copyr ight © 2012, SAS Institute Inc. All rights reser ved. NONPARAMETRIC BOOTSTRAP
  • 4. Copyright © 2012, SAS Institute Inc. All rights reserved. NONPARAMETRIC BOOTSTRAP INTRODUCTION TO THE BOOTSTRAP • Introduced by Brad Efron in 1979; grown in popularity as computing power increases • Resampling technique that allows you to estimate the variance of statistics, even when analytical expressions for the variance are difficult to obtain • You want to know about the population, but all you have is one sample • Treat the sample as a population and sample from it with replacement • This is called a bootstrap sample • Repeating this sampling scheme produces bootstrap replication • For each bootstrap sample, you can calculate the statistic(s) of interest
  • 5. Copyright © 2012, SAS Institute Inc. All rights reserved. NONPARAMETRIC BOOTSTRAP BOOTSTRAP WORLD • Efron & Tibshirani (1993) diagram of the Real world and the Bootstrap world to illustrate why bootstrapping works
  • 6. Copyright © 2012, SAS Institute Inc. All rights reserved. NONPARAMETRIC BOOTSTRAP THE BOOTSTRAP IN JMP • Possible to do a bootstrap analysis prior to JMP 10 using a script • “One-click bootstrap” added to JMP Pro in Version 10 • Available in most Analysis platforms • Rows need to be independent for one-click bootstrap to be implemented • Takes advantage of the Automatic Recalc feature • Results can be analyzed in Distribution platform, which will know to provide Bootstrap Confidence Limits, based on percentile interval method (Efron & Tibshirani 1993)
  • 7. Copyright © 2012, SAS Institute Inc. All rights reserved. NONPARAMETRIC BOOTSTRAP NON-STANDARD QUANTITIES • By non-standard, I mean statistics for which we don’t readily have standard errors • Could be unavailable in JMP • Could be difficult to obtain analytically • Example: Adjusted R^2 value in linear regression
  • 8. Copyright © 2012, SAS Institute Inc. All rights reserved. NONPARAMETRIC BOOTSTRAP RECAP • Bootstrap is a powerful feature with many uses • Primarily a UI feature, but capability is enhanced when scripted in JSL • Allows us to get confidence intervals for statistics, functions of statistics and curves • Examples from Discovery 2012: • Non-standard quantities • Functions of the output • Multiple tables in one bootstrap run • Model from the Fit Curve platform
  • 9. Copyr ight © 2012, SAS Institute Inc. All rights reser ved. PERMUTATION TESTS
  • 10. Copyright © 2012, SAS Institute Inc. All rights reserved. PERMUTATION TESTS INTRODUCTION • Introduced by R.A. Fisher in the 1930’s • Fisher wanted to demonstrate the validity of Student’s t test without normality assumption • Provide exact results, but only apply to a narrow range of problems • Must have something to permute (i.e. change the order of) • Bootstrap hypothesis testing extends permutation testing to more problems
  • 11. Copyright © 2012, SAS Institute Inc. All rights reserved. PERMUTATION TESTS CONCEPTS • Basic idea: • Sample repeatedly from the permutation distribution • Note that this is sampling without replacement (not with replacement) • Resampling purpose is to permute (change the order of) the observations • Compare the number of results more extreme than the observed result • Calculate a p-value: # 𝑟𝑒𝑠𝑢𝑙𝑡𝑠 𝑚𝑜𝑟𝑒 𝑒𝑥𝑡𝑟𝑒𝑚𝑒 #{𝑡𝑜𝑡𝑎𝑙 𝑖𝑡𝑒𝑟𝑎𝑡𝑖𝑜𝑛𝑠}
  • 12. Copyright © 2012, SAS Institute Inc. All rights reserved. PERMUTATION TESTS TWO-SAMPLE EXAMPLE • Consider comparing the (possibly different) distributions (𝐹, 𝐺) of two samples (sizes 𝑛, 𝑚 with 𝑁 = 𝑛 + 𝑚) • 𝐻0: 𝐹 = 𝐺 • Under 𝐻0, all permutations of the observations across 𝐹, 𝐺 are equally likely • There are 𝑁 𝑛 possible permutations; generally sampling from these is sufficient. • For each permutation replication, determine if the difference ( 𝜃∗ ) is greater than the observed difference ( 𝜃). • Tabulate the number of times 𝜃∗ ≥ 𝜃 and divide by number of replications. • This is a one-sided permutation test; a two-sided test can be performed by taking absolute values of 𝜃∗ and 𝜃.
  • 13. Copyright © 2012, SAS Institute Inc. All rights reserved. PERMUTATION TESTS DEMONSTRATION #1 • Oneway platform in JMP can compute robust mean estimates • Test included is a Wald test with an asymptotic Chi Square distribution p-value • We wish to use a permutation test to avoid the distributional assumption of the asymptotic test
  • 14. Copyright © 2012, SAS Institute Inc. All rights reserved. PERMUTATION TESTS DEMONSTRATION #1 • Script input: • continuous response • categorical predictor • # of permutation replications • Script output: • original Robust Fit • permutation test results newX = xVals[random shuffle(iVec)]; // set the x values to a random permutation column(eval expr(xCol)) << set values(newX);
  • 15. Copyright © 2012, SAS Institute Inc. All rights reserved. PERMUTATION TESTS DEMONSTRATION #1 Robust mean permutation test demo
  • 16. Copyright © 2012, SAS Institute Inc. All rights reserved. PERMUTATION TESTS DEMONSTRATION #2 • Contingency platform in JMP performs two Chi Square tests for testing if responses differ across levels of the X variable • Tests require that expected counts of contingency table cells be > 5 • We wish to use a permutation test to avoid this requirement
  • 17. Copyright © 2012, SAS Institute Inc. All rights reserved. PERMUTATION TESTS DEMONSTRATION #2 • Script input: • categorical response • categorical predictor • # of permutation replications • Script output: • results of two original Chi Square tests • permutation test results newY = responseVals[random shuffle(ivec)]; column(eval expr(responseCol)) << set values(newY);
  • 18. Copyright © 2012, SAS Institute Inc. All rights reserved. PERMUTATION TESTS DEMONSTRATION #2 Contingency table permutation test demo
  • 19. Copyright © 2012, SAS Institute Inc. All rights reserved. PERMUTATION TESTS RECAP • When available, full permutation tests provide exact results • …and incomplete permutation tests still give good results • If something can be permuted, these tests are easy to implement • For other significance testing situations, bootstrap hypothesis testing works
  • 20. Copyr ight © 2012, SAS Institute Inc. All rights reser ved. PARAMETRIC BOOTSTRAP
  • 21. Copyright © 2012, SAS Institute Inc. All rights reserved. PARAMETRIC BOOTSTRAP INTRODUCTION • JMP provides a one-click non-parametric bootstrap feature. • But there are other variations of the bootstrap: residual resampling, Bayesian bootstrap, … • This section will provide an introduction to the parametric bootstrap and how it can be implemented in JSL.
  • 22. Copyright © 2012, SAS Institute Inc. All rights reserved. PARAMETRIC BOOTSTRAP WHY NOT SAMPLE ROWS? • There are times when we may not want to resample rows of our data. Nonparametric Bootstrap Sample
  • 23. Copyright © 2012, SAS Institute Inc. All rights reserved. PARAMETRIC BOOTSTRAP WHY NOT SAMPLE ROWS? • For a sigmoid curve, we don’t want to lose any sections of the curve: • Upper asymptote • Lower asymptote • Inflection point • Similar issues arise with logistic regression, resampling rows can lead to problems with separation. • There are several alternatives to resampling rows, we will focus on the parametric bootstrap.
  • 24. Copyright © 2012, SAS Institute Inc. All rights reserved. PARAMETRIC BOOTSTRAP DETAILS • Nearly identical to the nonparametric bootstrap, except for the way that we generate our bootstrap samples. • Nonparametric bootstrap samples from the empirical distribution of our data. • Parametric bootstrap samples from the fitted parametric model for our data.  Suppose we use 𝐹(𝛽) to model our response 𝑌. Fitting the model to our observed data gives us 𝐹( 𝛽), our fitted parametric model.  A nonparametric bootstrap algorithm is 1. Obtain 𝐹 𝛽 by fitting the parametric model to observed data. 2. Use 𝐹( 𝛽) to generate 𝑌𝑗 ∗ , a vector of random pseudo-responses 3. Fit 𝐹(𝛽) to 𝑌𝑗 ∗ , giving us 𝛽𝑗 ∗ 4. Store 𝛽𝑗 ∗ and return to step 2 for j=1…,B.
  • 25. Copyright © 2012, SAS Institute Inc. All rights reserved. PARAMETRIC BOOTSTRAP SIGMOID CURVE EXAMPLE • A few slides ago, we saw an example of a sigmoid curve. 𝐹 𝛽 = 𝑔 𝑥, 𝛽 = 𝛽3 + 𝛽4 − 𝛽3 1 + 𝐸𝑥𝑝[ −𝛽1 𝑥 − 𝛽2 ] • Assume the response is normally distributed: y ∼ 𝑁(𝑔 𝑥, 𝛽 , 𝜎2). • Fitting the curve to our original data set gives us an estimate for our coefficients and error variance. Term 𝛽1 𝛽2 𝛽3 𝛽4 𝜎 Estimate 5.72 .32 .60 1.11 .011
  • 26. Copyright © 2012, SAS Institute Inc. All rights reserved. PARAMETRIC BOOTSTRAP SIGMOID EXAMPLE CONTINUED • Using our estimated coefficients and error variance 𝑦𝑗 ∗ = 𝑔 𝑥𝑗, 𝛽 + 𝜎𝜖𝑗 where the 𝜖𝑗 are independent and identically distributed standard normal. Parametric Bootstrap sample
  • 27. Copyright © 2012, SAS Institute Inc. All rights reserved. PARAMETRIC BOOTSTRAP MORE DETAILS • Resampling rows can be problematic. • Any other reasons to use the parametric bootstrap? • Results will be close to “textbook” formulae when available. • Very nice for doing goodness of fit tests. • Example: Normal goodness of fit test 𝐻0: 𝑁 𝜇, 𝜎 appropriate for 𝑦 vs 𝐻1: 𝑁 𝜇, 𝜎 not appropriate Parametric bootstrap gives us the distribution of the test statistic under 𝐻0 → Perfect for calculating p-values
  • 28. Copyright © 2012, SAS Institute Inc. All rights reserved. PARAMETRIC BOOTSTRAP …AND JMP • JMP provides a one-click nonparametric bootstrap, but a little bit of scripting gets us the parametric bootstrap as well. • Most (all?) modeling platforms in JMP allow you to save a prediction formula. We can also create columns of random values for many distributions. Put these two things together and we’re well on our way!
  • 29. Copyright © 2012, SAS Institute Inc. All rights reserved. PARAMETRIC BOOTSTRAPPING DEMONSTRATION Parametric Bootstrapping Demonstration in JMP
  • 30. Copyr ight © 2012, SAS Institute Inc. All rights reser ved. BOOTSTRAP AGGREGATING (BAGGING)
  • 31. Copyright © 2012, SAS Institute Inc. All rights reserved. BAGGING INTRODUCTION • We have seen bootstrapping for inference, we can also use the bootstrap to improve prediction. • Breiman (1996a) introduced the notion of “bootstrap aggregating” (or bagging for short) to improve predictions. • The name says it all…aggregate predictions across bootstrap samples.
  • 32. Copyright © 2012, SAS Institute Inc. All rights reserved. BAGGING UNSTABLE PREDICTORS • Breiman (1996b) introduced the idea of instability in model selection. • Let’s say that we are using our data D = { (𝒙𝑖, 𝑦𝑖), 𝑖 = 1, … , 𝑛 } to create a prediction function 𝜇 𝑥, 𝐷 . • If a small change in 𝐷 results in a large change in 𝜇 ∙,∙ , we have an unstable predictor. • A variety of techniques have been shown to be unstable: Regression trees, best subset regression, forward selection, … • Instability is a major concern when predicting new observations.
Copyright © 2012, SAS Institute Inc. All rights reserved.
BAGGING
THE BAGGING ALGORITHM
• A natural way to deal with instability is to observe the behavior of μ̂(·, ·) for repeated perturbations of the data → bootstrap it!
• Basic bagging algorithm (a JSL sketch follows below):
1. Take a bootstrap sample D_j from the observed data D.
2. Fit your model of choice to D_j, giving you predictor μ̂_j(x).
Repeat 1 and 2 for j = 1, …, b. Then the bagged prediction rule is
  μ̂_bag(x) = (1/b) Σ_{j=1}^{b} μ̂_j(x)
• Bagging a classifier is slightly different: you can either average over the probability formula or use a voting scheme.
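Here is a minimal, self-contained JSL sketch of the algorithm (an illustration, not the talk's demo script): it bags an ordinary least-squares line on made-up data by resampling rows and averaging the prediction at a new point.

    // Sketch: bag a least-squares fit by resampling rows and averaging predictions.
    x = (1 :: 10)`;
    y = 2 + 3 * x + J( 10, 1, Random Normal( 0, 1 ) );   // made-up data
    X = J( 10, 1, 1 ) || x;                              // design matrix with intercept
    xnew = [1 5];                                        // predict at x = 5
    n = N Rows( y );
    B = 200;
    preds = J( B, 1, 0 );
    For( j = 1, j <= B, j++,
        idx = J( n, 1, Random Integer( 1, n ) );         // step 1: bootstrap sample of rows
        Xj = X[idx, 0];                                  // 0 = all columns
        yj = y[idx, 0];
        bj = Inverse( Xj` * Xj ) * Xj` * yj;             // step 2: refit on the bootstrap sample
        preds[j] = (xnew * bj)[1];
    );
    Show( Mean( preds ) );                               // bagged prediction at xnew
    Show( Std Dev( preds ) );                            // bonus: a measure of uncertainty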
Copyright © 2012, SAS Institute Inc. All rights reserved.
BAGGING
REGRESSION TREES
• Regression trees (as well as classification trees) are known to be particularly unstable.
  Example:  y(x) = 1 if x < 2;  5 if 2 ≤ x ≤ 4;  3 if x > 4
• A regression tree looks for optimal splits in your predictors and fits a simple mean to each section.
Copyright © 2012, SAS Institute Inc. All rights reserved.
BAGGING
MORE ON INSTABILITY
• In general, techniques that involve “hard decision rules” (like binary splits or including/excluding terms) are likely to be unstable.
  • Regression trees: binary splits
  • Best subset: X1 is either included or left out of the model (nothing in between)
• Is anything stable??? One example: penalized regression techniques can shrink estimates, which is kind of like letting a variable partially enter the model.
Copyright © 2012, SAS Institute Inc. All rights reserved.
BAGGING
REGRESSION TREE EXAMPLE
A regression tree for these data will change drastically depending on whether or not we include the point at x = 21.
Copyright © 2012, SAS Institute Inc. All rights reserved.
BAGGING
REGRESSION TREE EXAMPLE
Predictions for a tree with a single split: blue includes x = 21; red excludes x = 21.
This kind of difference is crucial when we observe new data.
Copyright © 2012, SAS Institute Inc. All rights reserved.
BAGGING
REGRESSION TREE EXAMPLE
The bagged predictor is a compromise between the two.
Copyright © 2012, SAS Institute Inc. All rights reserved.
BAGGING
RECAP
• Bagging is a very useful tool that improves upon unstable predictors. Added bonus: we also get a measure of uncertainty.
• Several features in JMP and JSL make it convenient to do bagging yourself.
  Most platforms (if not all) allow you to save a prediction formula:
    linearModel << Save Prediction Formula;
    predFormula = Column( "Y Predictor" ) << Get Formula;
  And we get a lot of mileage out of the Resample Freq() function:
    rsfCol = New Column( "rsf", Set Formula( Resample Freq( 1 ) ) );
• An implementation of random forests was added to JMP Pro in version 10. Random forests resample both rows and columns.
Copyright © 2012, SAS Institute Inc. All rights reserved.
BAGGING
DEMONSTRATION
Bagging Demonstration in JMP
Copyright © 2012, SAS Institute Inc. All rights reserved.
USING R FOR BOOTSTRAPPING
Copyright © 2012, SAS Institute Inc. All rights reserved.
USING R FOR BOOTSTRAPPING
THE JMP INTERFACE TO R
• JMP 10 added the ability to transfer information between JMP and R (a very powerful open-source statistical software package).
• R has packages to do bootstrapping, in particular the “boot” package.
• We can do the bootstrap in R using a custom-made JMP interface.
Copyright © 2012, SAS Institute Inc. All rights reserved.
USING R FOR BOOTSTRAPPING
CONNECTING TO R
• JMP is a very nice complement to R, making it easy to create convenient interfaces to R and then present R results in JMP.
• A handful of JSL functions allow you to communicate with R:
    R Init();                 // initializes the connection to R
    x = [1, 2, 3];
    R Send( x );              // sends the matrix x to R
    R Submit( "x <- 2*x" );   // submits R code
    y = R Get( x );           // gets the object x from R and names it y
    R Term();                 // terminates the connection to R
• There are a few more JSL functions for communicating with R, but the functions listed above will handle the majority of your needs.
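As a rough illustration of the workflow (not the presenters' demo), the sketch below sends made-up data to R, runs a nonparametric bootstrap of the mean with the boot package, and pulls the bootstrap standard error back into JSL. The statistic and the object names are placeholders.

    // Sketch: nonparametric bootstrap of a mean via R's "boot" package, driven from JSL.
    R Init();
    y = [4.1, 3.7, 5.2, 4.8, 4.4, 3.9, 5.0, 4.6];   // made-up data
    R Send( y );
    R Submit( "\[
    library(boot)
    meanStat <- function(d, idx) mean(d[idx])   # statistic(data, resampled indices)
    b <- boot(as.vector(y), meanStat, R = 1000)
    bootSE <- sd(b$t[, 1])                      # standard error of the bootstrap replicates
    ]\" );
    se = R Get( bootSE );   // bring the result back into JSL
    Show( se );
    R Term();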
Copyright © 2012, SAS Institute Inc. All rights reserved.
USING R FOR BOOTSTRAPPING
CONNECTING TO R
• The R connection allows us to combine the strengths of R and JMP.
Copyright © 2012, SAS Institute Inc. All rights reserved.
USING R FOR BOOTSTRAPPING
DEMONSTRATION
JMP and R Integration Demonstration
Copyright © 2012, SAS Institute Inc. All rights reserved.
STABILITY SELECTION
Copyright © 2012, SAS Institute Inc. All rights reserved.
STABILITY SELECTION
INTRODUCTION
• Bagging is a way to use resampling to improve prediction. We can also use resampling to improve variable selection techniques.
• Meinshausen and Buhlmann (2010) introduced stability selection, a very general modification that can be used in conjunction with any traditional variable selection technique.
• The motivation behind stability selection is simple and very intuitive: if a predictor is typically included in the final model after doing variable selection on a subset of the data, then it is probably a meaningful variable.
Copyright © 2012, SAS Institute Inc. All rights reserved.
STABILITY SELECTION
THE VARIABLE SELECTION PROBLEM
• Suppose that we have observed data D = { (x_i, y_i), i = 1, …, n }, where x_i is a p × 1 vector of predictors for observation i.
• We want to build a model for the response y using a subset of the predictors to improve both interpretation and predictive ability. This is the classic variable selection problem.
• No shortage of variable selection techniques: stepwise regression, best subset, penalized regression, …
Copyright © 2012, SAS Institute Inc. All rights reserved.
STABILITY SELECTION
THE VARIABLE SELECTION PROBLEM
• Usual linear models problem: we want to estimate the coefficient vector β for the model Y = Xβ + ε.
• We want our variable selection technique to set β̂_j = 0 for some of the terms.
• We always have at least one tuning parameter (λ) that controls the complexity of the model. Doing variable selection yields a set Ŝ(λ) = { k : β̂_k ≠ 0 }.
• We tune the model using cross-validation, AIC, BIC, …
  Variable Selection Technique   | Tuning Parameter
  Forward Selection              | Alpha-to-enter
  Backward Elimination           | Alpha-to-leave
  Best Subset                    | Maximum model size considered
  Lasso                          | L1 norm
  Least Angle Regression         | Number of nonzero variables
Copyright © 2012, SAS Institute Inc. All rights reserved.
STABILITY SELECTION
THE DETAILS
• The stability selection algorithm (a JSL sketch follows below):
1. Choose a random subsample D_k of size n/2 from D, drawn without replacement.
2. Use a variable selection technique on D_k to obtain Ŝ_k(λ), the set of variables with nonzero coefficients at tuning parameter value λ.
3. Repeat steps 1 and 2 for k = 1, …, B.
• Easy to implement: only as complicated as the underlying selection technique.
• We calculate Π̂_j(λ), the probability that variable j is included in the model when doing selection (with tuning λ) on a random subset of the data:
  Π̂_j(λ) = (1/B) Σ_{k=1}^{B} I( x_j ∈ Ŝ_k(λ) )
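Here is a self-contained JSL sketch of the subsampling loop (an illustration only). The selection step uses a simple stand-in rule, keeping x_j when its absolute correlation with y on the subsample exceeds λ; a real application would plug in forward selection, the lasso, or another technique from the table above.

    // Sketch: inclusion proportions over repeated half-subsamples with a stand-in selection rule.
    n = 100;  p = 5;
    X = J( n, p, Random Normal( 0, 1 ) );                  // made-up predictors
    y = 2 * X[0, 1] - 1.5 * X[0, 2] + J( n, 1, Random Normal( 0, 1 ) );
    lambda = 0.3;                                          // tuning parameter for the stand-in rule
    B = 100;
    inclusion = J( p, 1, 0 );
    For( k = 1, k <= B, k++,
        rows = Random Shuffle( (1 :: n)` );
        sub = rows[1 :: Floor( n / 2 )];                   // subsample of size n/2, without replacement
        Xs = X[sub, 0];
        ys = y[sub, 0];
        For( j = 1, j <= p, j++,
            xj = Xs[0, j];
            r = Sum( (xj - Mean( xj )) :* (ys - Mean( ys )) ) / ((N Rows( ys ) - 1) * Std Dev( xj ) * Std Dev( ys ));
            If( Abs( r ) > lambda, inclusion[j] = inclusion[j] + 1 );   // x_j selected on this subsample
        );
    );
    Show( inclusion / B );                                 // estimated inclusion probabilities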
Copyright © 2012, SAS Institute Inc. All rights reserved.
STABILITY SELECTION
MORE DETAILS
• Applying the algorithm to a meaningful range of λ values shows us how inclusion probabilities change as a function of the tuning parameter.
• It makes sense that if a term maintains a high selection probability, then it should be in our final model.
Copyright © 2012, SAS Institute Inc. All rights reserved.
STABILITY SELECTION
MORE DETAILS
• After looking across a meaningful range of λ values, we include variable x_j in our final model if
  max_λ Π̂_j(λ) ≥ Π_thr
• Here Π_thr is a tuning parameter chosen in advance. Meinshausen and Buhlmann (2010) show that the results are not sensitive to the choice of Π_thr.
Copyright © 2012, SAS Institute Inc. All rights reserved.
STABILITY SELECTION
MORE DETAILS
• Improves unstable variable selection techniques.
• Very general: can be applied in most variable selection settings.
• Can greatly outperform cross-validation for high-dimensional data (n ≪ p).
• Theory shows that stability selection controls the false discovery rate.
• Why subsample instead of the usual bootstrap? Taking a subsample (without replacement) of size n/2 provides a similar reduction in information as a standard bootstrap sample, and it saves time when the underlying variable selection technique is computationally intense.
Copyright © 2012, SAS Institute Inc. All rights reserved.
STABILITY SELECTION
…AND JMP
• JMP provides several built-in variable selection techniques: forward selection, backward elimination, Lasso, Elastic Net, …
• The most convenient in JMP? Forward selection with alpha-to-enter tuning.
• Scripting stability selection for forward selection is fairly straightforward using a frequency column in conjunction with the Stepwise platform (see the sketch below):
    newFreq = J( n, 1, 0 );
    newFreq[Random Shuffle( rows )[1 :: Floor( n / 2 )]] = 1;
    fCol << Set Values( newFreq );
• We have implemented a slight modification of the stability selection algorithm, which Shah and Samworth (2013) show to have some added perks.
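A hedged sketch of how that frequency-column trick sits inside the subsampling loop: the table and column names are assumptions, and the Stepwise call itself is left as a placeholder comment rather than guessed at.

    // Sketch: re-draw the half-subsample frequency column on each stability selection pass.
    dt = Current Data Table();
    n = N Rows( dt );
    fCol = dt << New Column( "freq", Numeric );
    rows = (1 :: n)`;
    B = 100;
    For( k = 1, k <= B, k++,
        newFreq = J( n, 1, 0 );
        newFreq[Random Shuffle( rows )[1 :: Floor( n / 2 )]] = 1;   // rows with freq 1 form the subsample
        fCol << Set Values( newFreq );
        // ...launch Stepwise with Freq( :freq ), run forward selection,
        // and record which terms entered on this subsample...
    );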
Copyright © 2012, SAS Institute Inc. All rights reserved.
STABILITY SELECTION
DEMONSTRATION
Stability Selection Demonstration in JMP
Copyright © 2012, SAS Institute Inc. All rights reserved.
WRAP-UP
Copyright © 2012, SAS Institute Inc. All rights reserved.
WRAP-UP
REFERENCES
• Breiman, L. (1996a). Bagging Predictors. Machine Learning, 24, 123-140.
• Breiman, L. (1996b). Heuristics of Instability and Stabilization in Model Selection. The Annals of Statistics, 24, 2350-2383.
• Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman & Hall/CRC.
• Meinshausen, N. and Buhlmann, P. (2010). Stability Selection. Journal of the Royal Statistical Society, Series B, 72, 417-473.
• Shah, R. and Samworth, R. (2013). Variable selection with error control: another look at stability selection. Journal of the Royal Statistical Society, Series B, 75, 55-80.
www.SAS.com
Copyright © 2012, SAS Institute Inc. All rights reserved.
THANK YOU!
The Bootstrap and Beyond: Using JSL for resampling
Michael Crotty, michael.crotty@sas.com
Clay Barker, clay.barker@sas.com
Research Statisticians
JMP Division, SAS Institute