The Bootstrap and Beyond: Using JSL for Resampling

Copyright © 2012, SAS Institute Inc. All rights reserved.
• Michael Crotty & Clay Barker
• Research Statisticians
• JMP Division, SAS Institute

OUTLINE
• Nonparametric Bootstrap
• Permutation Testing
• Parametric Bootstrap
• Bootstrap Aggregating (Bagging)
• Bootstrapping in R
• Stability Selection
• Wrap-up/Questions

NONPARAMETRIC BOOTSTRAP

INTRODUCTION TO THE BOOTSTRAP
• Introduced by Brad Efron in 1979; it has grown in popularity as computing power
has increased
• Resampling technique that allows you to estimate the variance of statistics,
even when analytical expressions for the variance are difficult to obtain
• You want to know about the population, but all you have is one sample
• Treat the sample as a population and sample from it with replacement
• This is called a bootstrap sample
• Repeating this sampling scheme produces bootstrap replications
• For each bootstrap sample, you can calculate the statistic(s) of interest
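
The resampling scheme above can be sketched outside JMP in a few lines. This is a minimal Python illustration, not the JMP implementation: the data values, the function name `bootstrap_replicates`, and the choice of the median as the statistic are all assumptions made for the example.

```python
import random
import statistics

def bootstrap_replicates(sample, statistic, n_boot=2000, seed=1):
    """Return n_boot values of `statistic`, one per bootstrap sample."""
    rng = random.Random(seed)
    n = len(sample)
    reps = []
    for _ in range(n_boot):
        # A bootstrap sample: n draws from the observed data, with replacement.
        boot = [sample[rng.randrange(n)] for _ in range(n)]
        reps.append(statistic(boot))
    return reps

# Hypothetical data, made up for illustration only.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 6.1, 3.3, 4.9]
reps = bootstrap_replicates(data, statistics.median)

# The standard deviation of the replicates estimates the standard error.
se_median = statistics.stdev(reps)
print("bootstrap SE of the median:", round(se_median, 3))
```

The median is used here because it has no simple closed-form standard error, which is the situation where the bootstrap is most useful.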

BOOTSTRAP WORLD
• Efron & Tibshirani (1993) diagram of
the Real world and the Bootstrap
world to illustrate why bootstrapping
works

THE BOOTSTRAP IN JMP
• Possible to do a bootstrap analysis prior to JMP 10 using a script
• “One-click bootstrap” added to JMP Pro in Version 10
• Available in most Analysis platforms
• Rows need to be independent for one-click bootstrap to be implemented
• Takes advantage of the Automatic Recalc feature
• Results can be analyzed in Distribution platform, which will know to provide
Bootstrap Confidence Limits, based on percentile interval method (Efron &
Tibshirani 1993)
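
The percentile interval method simply reads confidence limits off the sorted bootstrap replicates. The Python sketch below shows the idea under stated assumptions: the data are made up, `percentile_interval` is a hypothetical helper, and the quantile indexing rule is one simple choice that may differ slightly from what the Distribution platform does.

```python
import math
import random
import statistics

def percentile_interval(replicates, alpha=0.05):
    """Return the alpha/2 and 1 - alpha/2 empirical quantiles of the replicates."""
    ordered = sorted(replicates)
    last = len(ordered) - 1
    lower = ordered[math.floor((alpha / 2) * last)]
    upper = ordered[math.ceil((1 - alpha / 2) * last)]
    return lower, upper

# Hypothetical data, made up for illustration only.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 6.1, 3.3, 4.9]
rng = random.Random(1)
n = len(data)

# 2000 bootstrap replicates of the sample mean.
boot_means = [
    statistics.mean([data[rng.randrange(n)] for _ in range(n)])
    for _ in range(2000)
]
lower, upper = percentile_interval(boot_means)
print("95% percentile interval for the mean:", round(lower, 2), round(upper, 2))
```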

NON-STANDARD QUANTITIES
• By non-standard, I mean statistics for which we don’t readily have standard
errors
• Could be unavailable in JMP
• Could be difficult to obtain analytically
• Example: Adjusted R^2 value in linear regression
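
As a rough illustration of the adjusted R^2 example, the Python sketch below bootstraps rows of a one-predictor regression and reports percentile limits. The data, the function names, and the restriction to a single predictor are assumptions made to keep the example short; it is not the script used in the talk.

```python
import random

def adjusted_r2(xs, ys):
    """Adjusted R^2 for a one-predictor least squares line; None if degenerate."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return None  # resample with no variation in x or y
    r2 = sxy ** 2 / (sxx * syy)
    return 1 - (1 - r2) * (n - 1) / (n - 2)

def bootstrap_adjusted_r2(xs, ys, n_boot=2000, seed=1):
    """Resample rows (x, y pairs) with replacement and refit each time."""
    rng = random.Random(seed)
    n = len(xs)
    reps = []
    while len(reps) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        value = adjusted_r2([xs[i] for i in idx], [ys[i] for i in idx])
        if value is not None:
            reps.append(value)
    return sorted(reps)

# Hypothetical data, made up for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
ys = [2.3, 2.9, 4.1, 4.4, 6.2, 5.8, 7.9, 8.1, 8.8, 11.0, 10.4, 12.6]

reps = bootstrap_adjusted_r2(xs, ys)
lower, upper = reps[int(0.025 * len(reps))], reps[int(0.975 * len(reps))]
print("observed:", round(adjusted_r2(xs, ys), 3))
print("95% percentile limits:", round(lower, 3), round(upper, 3))
```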

RECAP
• Bootstrap is a powerful feature with many uses
• Primarily a UI feature, but capability is enhanced when scripted in JSL
• Allows us to get confidence intervals for statistics, functions of statistics and
curves
• Examples from Discovery 2012:
• Non-standard quantities
• Functions of the output
• Multiple tables in one bootstrap run
• Model from the Fit Curve platform

PERMUTATION TESTS

INTRODUCTION
• Introduced by R.A. Fisher in the 1930s
• Fisher wanted to demonstrate the validity of Student’s t test without the normality
assumption
• Provide exact results, but only apply to a narrow range of problems
• Must have something to permute (i.e. change the order of)
• Bootstrap hypothesis testing extends permutation testing to more problems

CONCEPTS
• Basic idea:
• Sample repeatedly from the permutation distribution
• Note that this is sampling without replacement (not with replacement)
• Resampling purpose is to permute (change the order of) the observations
• Count the number of results at least as extreme as the observed result
• Calculate a p-value:
$p = \frac{\#\{\text{results at least as extreme}\}}{\#\{\text{total iterations}\}}$

TWO-SAMPLE EXAMPLE
• Consider comparing the (possibly different) distributions (𝐹, 𝐺) of two samples
(sizes 𝑛, 𝑚 with 𝑁 = 𝑛 + 𝑚)
• 𝐻0: 𝐹 = 𝐺
• Under 𝐻0, all permutations of the observations across 𝐹, 𝐺 are equally likely
• There are $\binom{N}{n}$ possible permutations; generally sampling from these is sufficient.
• For each permutation replication, determine whether the difference ($\hat{\theta}^*$) is at least as large as the
observed difference ($\hat{\theta}$).
• Tabulate the number of times $\hat{\theta}^* \ge \hat{\theta}$ and divide by the number of replications.
• This is a one-sided permutation test; a two-sided test can be performed by taking
absolute values of $\hat{\theta}^*$ and $\hat{\theta}$.
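
The two-sample procedure can be sketched in Python as follows. This is an illustration under stated assumptions: the difference in means stands in for the statistic, the data are made up, and `permutation_test` is a hypothetical name rather than anything from the demonstration scripts.

```python
import random
import statistics

def permutation_test(sample_f, sample_g, n_perm=5000, two_sided=False, seed=1):
    """Approximate permutation p-value for a difference in means."""
    rng = random.Random(seed)
    n = len(sample_f)
    pooled = list(sample_f) + list(sample_g)

    def difference(values):
        # First n values play the role of sample F, the rest sample G.
        d = statistics.mean(values[:n]) - statistics.mean(values[n:])
        return abs(d) if two_sided else d

    observed = difference(pooled)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reorder without replacement
        if difference(pooled) >= observed:
            count += 1
    return count / n_perm

# Hypothetical data, made up for illustration only.
f = [9.1, 8.7, 9.5, 10.2, 9.9, 8.8]
g = [5.1, 4.7, 5.5, 6.0, 5.2, 4.9]
p_value = permutation_test(f, g)
print("one-sided permutation p-value:", p_value)
```

Sampling 5000 random permutations rather than enumerating all of them makes this an incomplete permutation test, so the p-value is an approximation to the exact one.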

DEMONSTRATION #1
• Oneway platform in JMP can compute
robust mean estimates
• Test included is a Wald test with an asymptotic
Chi Square distribution p-value
• We wish to use a permutation test to avoid
the distributional assumption of the
asymptotic test

DEMONSTRATION #1
• Script input:
• continuous response
• categorical predictor
• # of permutation replications
• Script output:
• original Robust Fit
• permutation test results
newX = xVals[random shuffle(iVec)];
// set the x values to a random permutation
column(eval expr(xCol)) << set values(newX);

DEMONSTRATION #1
Robust mean permutation test demo

DEMONSTRATION #2
• Contingency platform in JMP performs two
Chi Square tests for testing if responses
differ across levels of the X variable
• Tests require that expected counts of
contingency table cells be > 5
• We wish to use a permutation test to avoid
this requirement

DEMONSTRATION #2
• Script input:
• categorical response
• categorical predictor
• # of permutation replications
• Script output:
• results of two original Chi Square tests
• permutation test results
newY = responseVals[random shuffle(ivec)];
column(eval expr(responseCol)) << set values(newY);
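
For readers without JMP at hand, the same idea can be sketched in Python: shuffle the response labels, recompute a chi-square statistic, and count how often it is at least as large as the observed one. The Pearson statistic, the data, and the function names are assumptions for this example; the demonstration script itself relies on the Contingency platform.

```python
import random
from collections import Counter

def chi_square(xs, ys):
    """Pearson chi-square statistic for the table of x levels by y levels."""
    n = len(xs)
    x_totals = Counter(xs)
    y_totals = Counter(ys)
    cells = Counter(zip(xs, ys))
    stat = 0.0
    # Sorted keys keep the summation order fixed across permutations.
    for x in sorted(x_totals):
        for y in sorted(y_totals):
            expected = x_totals[x] * y_totals[y] / n
            stat += (cells[(x, y)] - expected) ** 2 / expected
    return stat

def contingency_permutation_test(xs, ys, n_perm=2000, seed=1):
    rng = random.Random(seed)
    observed = chi_square(xs, ys)
    shuffled = list(ys)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute the responses across rows
        if chi_square(xs, shuffled) >= observed:
            count += 1
    return count / n_perm

# Hypothetical data with small cell counts, made up for illustration only.
xs = ["a"] * 8 + ["b"] * 8
ys = ["u"] * 8 + ["v"] * 8
p_value = contingency_permutation_test(xs, ys)
print("permutation p-value:", p_value)
```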

DEMONSTRATION #2
Contingency table permutation test demo

RECAP
• When available, full permutation tests provide exact results
• …and incomplete permutation tests still give good results
• If something can be permuted, these tests are easy to implement
• For other significance testing situations, bootstrap hypothesis testing works

PARAMETRIC BOOTSTRAP

INTRODUCTION
• JMP provides a one-click nonparametric bootstrap feature.
• But there are other variations of the bootstrap: residual resampling, Bayesian
bootstrap, …
• This section will provide an introduction to the parametric bootstrap and how
it can be implemented in JSL.

WHY NOT SAMPLE ROWS?
• There are times when we may not want to resample rows of our data.
[Figure: a nonparametric bootstrap sample]

WHY NOT SAMPLE ROWS?
• For a sigmoid curve, we don’t want to lose any sections of the curve:
• Upper asymptote
• Lower asymptote
• Inflection point
• Similar issues arise with logistic regression, where resampling rows can lead to
problems with separation.
• There are several alternatives to resampling rows; we will focus on the
parametric bootstrap.

DETAILS
• Nearly identical to the nonparametric bootstrap, except for the way that we
generate our bootstrap samples.
• Nonparametric bootstrap samples from the empirical distribution of our data.
• Parametric bootstrap samples from the fitted parametric model for our data.
• Suppose we use $F(\beta)$ to model our response $Y$. Fitting the model to our
observed data gives us $F(\hat{\beta})$, our fitted parametric model.
• A parametric bootstrap algorithm is
1. Obtain $F(\hat{\beta})$ by fitting the parametric model to the observed data.
2. Use $F(\hat{\beta})$ to generate $Y_j^*$, a vector of random pseudo-responses.
3. Fit $F(\beta)$ to $Y_j^*$, giving us $\hat{\beta}_j^*$.
4. Store $\hat{\beta}_j^*$ and return to step 2, for $j = 1, \dots, B$.
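
The four steps can be sketched in Python with a deliberately simple model. As an assumption for this example only, the response is modeled as exponential with an unknown rate, because its fit is a one-liner; the data and function names are made up, and the sigmoid model from the talk would need a nonlinear fitter instead.

```python
import random
import statistics

def fit_rate(y):
    """Maximum likelihood estimate of an exponential rate: 1 / mean."""
    return 1 / statistics.mean(y)

def parametric_bootstrap(y, n_boot=2000, seed=1):
    rng = random.Random(seed)
    n = len(y)
    rate_hat = fit_rate(y)  # step 1: fit the model to the observed data
    reps = []
    for _ in range(n_boot):
        # Step 2: generate pseudo-responses from the fitted model.
        y_star = [rng.expovariate(rate_hat) for _ in range(n)]
        # Steps 3 and 4: refit to the pseudo-responses and store the estimate.
        reps.append(fit_rate(y_star))
    return rate_hat, reps

# Hypothetical data, made up for illustration only.
y = [0.8, 2.3, 0.4, 1.7, 3.1, 0.2, 1.1, 0.9, 2.6, 0.5]
rate_hat, reps = parametric_bootstrap(y)
print("estimated rate:", round(rate_hat, 3))
print("bootstrap SE:", round(statistics.stdev(reps), 3))
```

Note that no rows of `y` are resampled: the only thing carried from the data into each bootstrap sample is the fitted parameter.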

SIGMOID CURVE EXAMPLE
• A few slides ago, we saw an example of a sigmoid curve.
$F(\beta) = g(x, \beta) = \beta_3 + \frac{\beta_4 - \beta_3}{1 + \exp[-\beta_1 (x - \beta_2)]}$
• Assume the response is normally distributed: $y \sim N(g(x, \beta), \sigma^2)$.
• Fitting the curve to our original data set gives us an estimate for our
coefficients and error variance.
Term      $\beta_1$  $\beta_2$  $\beta_3$  $\beta_4$  $\sigma$
Estimate  5.72       0.32       0.60       1.11       0.011

SIGMOID EXAMPLE CONTINUED
• Using our estimated coefficients and error variance
$y_j^* = g(x_j, \hat{\beta}) + \hat{\sigma} \epsilon_j$
where the $\epsilon_j$ are independent and identically distributed standard normal.
[Figure: a parametric bootstrap sample]
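
Generating one such sample takes only a few lines. The Python sketch below plugs in the estimates from the table on the previous slide; the form of the curve (in particular the grouping $-\beta_1 (x - \beta_2)$ in the exponent) and the grid of x values are assumptions, since the original data and plot are not reproduced here.

```python
import math
import random

# Estimates reported for the sigmoid example: (beta1, beta2, beta3, beta4), sigma.
BETA_HAT = (5.72, 0.32, 0.60, 1.11)
SIGMA_HAT = 0.011

def g(x, beta):
    """Four-parameter sigmoid; the exponent grouping is an assumption."""
    b1, b2, b3, b4 = beta
    return b3 + (b4 - b3) / (1 + math.exp(-b1 * (x - b2)))

def pseudo_responses(xs, beta, sigma, rng):
    """One parametric bootstrap sample: fitted curve plus simulated normal error."""
    return [g(x, beta) + sigma * rng.gauss(0, 1) for x in xs]

# Hypothetical design points; every original x is kept, none are dropped.
xs = [i / 20 for i in range(21)]
rng = random.Random(1)
y_star = pseudo_responses(xs, BETA_HAT, SIGMA_HAT, rng)
print([round(v, 3) for v in y_star[:5]])
```

Because every x value appears in every sample, both asymptotes and the inflection point are always represented, which is the advantage over resampling rows.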

MORE DETAILS
• Resampling rows can be problematic.
• Any other reasons to use the parametric bootstrap?
• Results will be close to “textbook” formulae when available.
• Very nice for doing goodness of fit tests.
• Example: Normal goodness of fit test
$H_0$: $N(\mu, \sigma)$ appropriate for $y$ vs $H_1$: $N(\mu, \sigma)$ not appropriate
Parametric bootstrap gives us the distribution of the test statistic under 𝐻0
→ Perfect for calculating p-values
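
A sketch of such a goodness of fit test in Python follows. The choice of a Kolmogorov-Smirnov type distance as the test statistic is an assumption for illustration (the talk does not say which statistic the demonstration uses), and the data and function names are made up.

```python
import random
import statistics
from statistics import NormalDist

def ks_statistic(y):
    """Largest gap between the empirical CDF and the fitted normal CDF."""
    n = len(y)
    fitted = NormalDist(statistics.mean(y), statistics.stdev(y))
    d = 0.0
    for i, value in enumerate(sorted(y), start=1):
        cdf = fitted.cdf(value)
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

def normal_gof_pvalue(y, n_boot=1000, seed=1):
    """Parametric bootstrap p-value: simulate the statistic under H0."""
    rng = random.Random(seed)
    observed = ks_statistic(y)
    mu, sigma = statistics.mean(y), statistics.stdev(y)
    count = 0
    for _ in range(n_boot):
        # Draw from the fitted normal, then re-estimate mu and sigma inside
        # ks_statistic, just as was done for the observed data.
        y_star = [rng.gauss(mu, sigma) for _ in y]
        if ks_statistic(y_star) >= observed:
            count += 1
    return count / n_boot

# Hypothetical, clearly non-normal data (one extreme value).
y = [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.5, 0.6, 50.0]
p_value = normal_gof_pvalue(y)
print("bootstrap p-value:", p_value)
```

Refitting the parameters for each simulated sample matters: it makes the simulated statistics account for the fact that $\mu$ and $\sigma$ were estimated rather than known.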

…AND JMP
• JMP provides a one-click nonparametric bootstrap, but a little bit of scripting
gets us the parametric bootstrap as well.
• Most (all?) modeling platforms in JMP allow you to save a prediction formula.
We can also create columns of random values for many distributions.
Put these two things together and we’re well on our way!

DEMONSTRATION
Parametric Bootstrapping Demonstration in JMP

BOOTSTRAP AGGREGATING (BAGGING)

INTRODUCTION
• We have seen bootstrapping for inference; we can also use the bootstrap to
improve prediction.
• Breiman (1996a) introduced the notion of “bootstrap aggregating” (or bagging
for short) to improve predictions.
• The name says it all…aggregate predictions across bootstrap samples.

UNSTABLE PREDICTORS
• Breiman (1996b) introduced the idea of instability in model selection.
• Let’s say that we are using our data
D = { (𝒙𝑖, 𝑦𝑖), 𝑖 = 1, … , 𝑛 }
to create a prediction function $\hat{\mu}(x, D)$.
• If a small change in $D$ results in a large change in $\hat{\mu}(\cdot, \cdot)$, we have an unstable
predictor.
• A variety of techniques have been shown to be unstable:
Regression trees, best subset regression, forward selection, …
• Instability is a major concern when predicting new observations.

THE BAGGING ALGORITHM
• A natural way to deal with instability is to observe the behavior of $\hat{\mu}(\cdot, \cdot)$ for
repeated perturbations of the data → bootstrap it!
• Basic bagging algorithm:
1. Take a bootstrap sample $D_j$ from the observed data $D$.
2. Fit your model of choice to $D_j$, giving you predictor $\hat{\mu}_j(\boldsymbol{x})$.
Repeat 1 and 2 for $j = 1, \dots, b$.
Then the bagged prediction rule is
$\hat{\mu}_{bag}(\boldsymbol{x}) = \frac{1}{b} \sum_{j=1}^{b} \hat{\mu}_j(\boldsymbol{x})$
• Bagging a classifier is slightly different. You can either average over the
probability formula or use a voting scheme.
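
The regression version of the algorithm can be sketched in Python with a single-split tree as the unstable base model. Everything here is illustrative: the data are made up, the helper names are hypothetical, and the stump fitter is a bare-bones stand-in for a real tree platform.

```python
import random
import statistics

def fit_stump(xs, ys):
    """Single-split regression tree: returns (split, left_mean, right_mean)."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean, right_mean = statistics.mean(left), statistics.mean(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            split = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, split, left_mean, right_mean)
    if best is None:  # all x equal: fall back to the overall mean
        overall = statistics.mean(ys)
        return (pairs[0][0], overall, overall)
    return best[1:]

def predict_stump(stump, x):
    split, left_mean, right_mean = stump
    return left_mean if x < split else right_mean

def bagged_predict(xs, ys, x_new, b=200, seed=1):
    """Average the predictions of b stumps, each fit to a bootstrap sample."""
    rng = random.Random(seed)
    n = len(xs)
    predictions = []
    for _ in range(b):
        idx = [rng.randrange(n) for _ in range(n)]
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        predictions.append(predict_stump(stump, x_new))
    return statistics.mean(predictions)

# Hypothetical step-shaped data, made up for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [1, 1, 1, 1, 1, 5, 5, 5, 5, 5]
print("single stump:", fit_stump(xs, ys))
print("bagged prediction at x = 5.5:", bagged_predict(xs, ys, 5.5))
```

Near the split, the bagged prediction falls between the two leaf means because the bootstrap stumps disagree about where the split belongs.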

REGRESSION TREES
• Regression trees (as well as classification
trees) are known to be particularly unstable.
Ex: $y(x) = \begin{cases} 1 & x < 2 \\ 5 & 2 \le x \le 4 \\ 3 & x > 4 \end{cases}$
• A regression tree looks for optimal splits in your
predictors and fits a simple mean to each
section.

MORE ON INSTABILITY
• In general, techniques that involve “hard decision rules” (like binary splits or
including/excluding terms) are likely to be unstable.
• Regression trees: binary splits
• Best subset: X1 is either included or left out of the model (nothing in between)
• Is anything stable???
One example: Penalized regression techniques can shrink estimates, which is kind of
like letting a variable partially enter the model.

REGRESSION TREE EXAMPLE
A regression tree for these
data will change drastically
depending on whether or not
we include the point at x=21.

Predictions for a tree with a
single split.
Blue includes x=21.
Red excludes x=21.
This kind of difference is
crucial when we observe
new data.

The bagged predictor is a
compromise between the two.

RECAP
• Bagging is a very useful tool that improves upon unstable predictors.
Added bonus: we also get a measure of uncertainty.
• Several features in JMP and JSL make it convenient to do bagging yourself.
Most platforms (if not all) allow you to save a prediction formula
linearModel << save prediction formula;
predFormula = column("Y Predictor") << get formula;
And we get a lot of mileage out of the resample freq() function
rsfCol = newColumn("rsf", set formula(resample freq(1)));
• An implementation of Random Forests was added to JMP Pro in version 10.
Random forests resample rows and columns.
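
To see why a frequency column is enough to represent a bootstrap sample, consider the Python sketch below. It assumes that a bootstrap frequency column records how many times each row is drawn in a resample of size n, which is how this example models the idea; it does not call JMP, and `bootstrap_frequencies` is a hypothetical helper.

```python
import random
from collections import Counter

def bootstrap_frequencies(n, rng):
    """Number of times each of n rows appears in one bootstrap sample of size n."""
    counts = Counter(rng.randrange(n) for _ in range(n))
    return [counts[i] for i in range(n)]

# Hypothetical response values, made up for illustration only.
values = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9]
freq = bootstrap_frequencies(len(values), random.Random(1))

# A frequency-weighted statistic equals the statistic on the expanded sample.
weighted_mean = sum(f * v for f, v in zip(freq, values)) / sum(freq)
print("frequencies:", freq)
print("bootstrap mean:", round(weighted_mean, 3))
```

Rows with frequency 0 are left out of that resample, and rows with frequency 2 or more count multiple times, so no new data table has to be built for each bootstrap sample.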

DEMONSTRATION
Bagging Demonstration in JMP

USING R FOR BOOTSTRAPPING

THE JMP INTERFACE TO R
• JMP 10 added the ability to transfer information between JMP and R (a very
powerful open-source statistical software package).
• R has packages to do bootstrapping, in particular the “boot” package.
• We can do the bootstrap in R using a custom-made JMP interface.

CONNECTING TO R
• JMP is a very nice complement to R, making it easy to create convenient
interfaces to R and then present R results in JMP.
• A handful of JSL functions allow you to communicate with R.
R Init(); // Initializes the connection to R
x = [1, 2, 3];
R Send( x ); // sends the matrix x to R
R Submit("x <- 2*x"); // submits R Code
y = R Get(x);// gets the object x from R and names it y
R Term(); // Terminates the connection to R
• There are a few more JSL functions for communicating with R, but the
functions listed above will handle the majority of your needs.

CONNECTING TO R
• The R connection allows us to combine the strengths of R and JMP

DEMONSTRATION
JMP and R Integration Demonstration

STABILITY SELECTION

INTRODUCTION
• Bagging is a way to use resampling to improve prediction.
We can also use resampling to improve variable selection techniques.
• Meinshausen and Bühlmann (2010) introduced stability selection, a very
general modification that can be used in conjunction with any traditional
variable selection technique.
• The motivation behind stability selection is simple and very intuitive:
If a predictor is typically included in the final model after doing variable
selection on a subset of the data, then it is probably a meaningful variable.

THE VARIABLE SELECTION PROBLEM
• Suppose that we have observed data
D = { (𝒙𝑖, 𝑦𝑖), 𝑖 = 1, … , 𝑛 }
𝒙𝑖 is a 𝑝 × 1 vector of predictors for observation 𝑖.
• We want to build a model for the response 𝑦 using a subset of the predictors
to improve both interpretation and predictive ability.
This is the classic variable selection problem.
• No shortage of variable selection techniques:
stepwise regression, best subset, penalized regression, …

THE VARIABLE SELECTION PROBLEM
• Usual linear models problem: we want to
estimate the coefficient vector 𝛽 for the model
𝑌 = 𝑋𝛽 + 𝜀.
• Want our variable selection technique to set
𝛽𝑗 = 0 for some of the terms.
• We always have at least one tuning parameter
(λ) that controls the complexity of the model.
Doing variable selection yields a set
$\hat{S}(\lambda) = \{k : \hat{\beta}_k \neq 0\}$
• We tune the model using Cross-Validation, AIC,
BIC, …
Variable Selection Technique   Tuning Parameter
Forward Selection              Alpha-to-enter
Backward Elimination           Alpha-to-leave
Best Subset                    Maximum model size considered
Lasso                          L1 norm
Least Angle Regression         Number of nonzero variables

THE DETAILS
• The stability selection algorithm:
1. Choose a random subsample $D_k$ of size $\lfloor n/2 \rfloor$ from $D$, without replacement.
2. Use a variable selection technique to obtain $\hat{S}_k(\lambda)$, the set of variables with
nonzero coefficients for tuning parameter value $\lambda$.
3. Repeat steps 1 and 2 for $k = 1, \dots, B$.
• Easy to implement: only as complicated as the underlying selection technique.
• We calculate $\hat{\Pi}_j(\lambda)$, the estimated probability that variable $j$ is included in the model when doing
selection (with tuning $\lambda$) on a random subset of the data.
$\hat{\Pi}_j(\lambda) = \frac{1}{B} \sum_{k=1}^{B} I(x_j \in \hat{S}_k(\lambda))$
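
The algorithm can be sketched in Python once a selection technique is fixed. As a loudly labeled assumption, the `select` function below is a toy stand-in that keeps a predictor when its absolute correlation with the response is at least $\lambda$; it is not forward selection or the Lasso, and the simulated data are made up.

```python
import math
import random

def correlation(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    saa = sum((u - mean_a) ** 2 for u in a)
    sbb = sum((v - mean_b) ** 2 for v in b)
    if saa == 0 or sbb == 0:
        return 0.0
    sab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    return sab / math.sqrt(saa * sbb)

def select(X, y, lam):
    """Toy stand-in for a selection technique: keep j if |corr(x_j, y)| >= lam."""
    p = len(X[0])
    return {j for j in range(p)
            if abs(correlation([row[j] for row in X], y)) >= lam}

def inclusion_probabilities(X, y, lam, B=100, seed=1):
    """Fraction of half-size subsamples in which each variable is selected."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    counts = [0] * p
    for _ in range(B):
        rows = rng.sample(range(n), n // 2)  # subsample without replacement
        selected = select([X[i] for i in rows], [y[i] for i in rows], lam)
        for j in selected:
            counts[j] += 1
    return [c / B for c in counts]

# Simulated data: only the first of four predictors affects the response.
rng = random.Random(7)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(60)]
y = [2 * row[0] + rng.gauss(0, 0.5) for row in X]

probs = inclusion_probabilities(X, y, lam=0.5)
print("inclusion probabilities:", probs)
```

Swapping in a different `select` function is the only change needed to apply the same loop to another selection technique.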

MORE DETAILS
• Applying the algorithm to a meaningful
range of 𝜆 values shows us how
inclusion probabilities change as a
function of the tuning parameter.
• It makes sense that if a term maintains
a high selection probability, then it
should be in our final model.

MORE DETAILS
• After looking across a meaningful
range of 𝜆 values, we include variable
𝑥𝑗 in our final model if
• $\max_{\lambda} \hat{\Pi}_j(\lambda) \ge \Pi_{thr}$
• Here $\Pi_{thr}$ is a tuning parameter
chosen in advance. Meinshausen and
Bühlmann (2010) show that the
results are not sensitive to the choice
of $\Pi_{thr}$.
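
The final selection rule is a maximum over the grid of tuning values followed by a threshold. The short Python sketch below applies it to a made-up table of inclusion probabilities; the numbers, the threshold values, and the name `stable_variables` are assumptions for illustration.

```python
def stable_variables(pi_by_lambda, threshold):
    """Indices j whose inclusion probability reaches the threshold for some lambda."""
    n_vars = len(next(iter(pi_by_lambda.values())))
    max_pi = [max(pi[j] for pi in pi_by_lambda.values()) for j in range(n_vars)]
    return [j for j in range(n_vars) if max_pi[j] >= threshold]

# Hypothetical inclusion probabilities for three variables at three lambda values.
pi_by_lambda = {
    0.1: [1.0, 0.8, 0.4],
    0.2: [1.0, 0.5, 0.2],
    0.3: [0.9, 0.3, 0.1],
}
print(stable_variables(pi_by_lambda, threshold=0.75))
print(stable_variables(pi_by_lambda, threshold=0.9))
```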

MORE DETAILS
• Improves unstable variable selection problems.
• Very general, can be applied in most variable selection settings.
• Can greatly outperform cross-validation for high dimensional data (𝑛 ≪ 𝑝).
• Theory shows that stability selection controls the expected number of false discoveries.
• Why subsample instead of the usual bootstrap?
Taking a subsample (without replacement) of size $\lfloor n/2 \rfloor$ provides a similar
reduction in information as a standard bootstrap sample.
Saves time when the underlying variable selection technique is
computationally intense.

…AND JMP
• JMP provides several built-in variable selection techniques:
• Forward selection, backward elimination, Lasso, Elastic Net, …
• The most convenient in JMP? Forward selection with alpha-to-enter tuning.
• Scripting stability selection for forward selection is fairly straightforward using
a frequency column in conjunction with the Stepwise platform.
newFreq = J(n,1,0);
newFreq[random shuffle(rows)[1::floor(n/2)]]=1;
fCol << set values(newFreq);
• We have implemented a slight modification of the stability selection algorithm,
which Shah and Samworth (2013) show to have some added perks.

DEMONSTRATION
Stability Selection Demonstration in JMP

WRAP-UP

REFERENCES
• Breiman, L. (1996a). Bagging Predictors. Machine Learning, 24, 123-140.
• Breiman, L. (1996b). Heuristics of Instability and Stabilization in Model
Selection. The Annals of Statistics, 24, 2350-2383.
• Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap,
Chapman & Hall/CRC.
• Meinshausen, N. and Bühlmann, P. (2010). Stability Selection. Journal of the
Royal Statistical Society Series B, 72, 417-473.
• Shah, R. and Samworth, R. (2013). Variable selection with error control:
another look at stability selection. Journal of the Royal Statistical Society
Series B, 75, 55-80.

THANK YOU!
The Bootstrap and Beyond: Using JSL for resampling
Michael Crotty, michael.crotty@sas.com
Clay Barker, clay.barker@sas.com
Research Statisticians
JMP Division, SAS Institute
