# The Bootstrap and Beyond: Using JSL for Resampling

This presentation was originally given live at JMP Discovery Summit 2013 in San Antonio, Texas, USA. To sign up to attend this year's conference, visit http://jmp.com/summit

### The Bootstrap and Beyond: Using JSL for Resampling

1. Copyright © 2012, SAS Institute Inc. All rights reserved. THE BOOTSTRAP AND BEYOND: USING JSL FOR RESAMPLING • Michael Crotty & Clay Barker • Research Statisticians • JMP Division, SAS Institute
2. OUTLINE • Nonparametric Bootstrap • Permutation Testing • Parametric Bootstrap • Bootstrap Aggregating (Bagging) • Bootstrapping in R • Stability Selection • Wrap-up/Questions
3. NONPARAMETRIC BOOTSTRAP
4. NONPARAMETRIC BOOTSTRAP: INTRODUCTION TO THE BOOTSTRAP • Introduced by Brad Efron in 1979; it has grown in popularity as computing power has increased • Resampling technique that lets you estimate the variance of a statistic, even when an analytical expression for that variance is difficult to obtain • You want to know about the population, but all you have is one sample • Treat the sample as a population and sample from it with replacement • This is called a bootstrap sample • Repeating this sampling scheme produces bootstrap replications • For each bootstrap sample, you can calculate the statistic(s) of interest
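The resampling scheme on this slide can be sketched in a few lines. This is a minimal Python illustration using only the standard library, not the presenters' JSL; the data values, the helper name `bootstrap_replications`, and the choice of the median as the statistic are all assumptions made for the example.

```python
import random
import statistics

def bootstrap_replications(sample, statistic, n_boot=2000, seed=1):
    """Return n_boot bootstrap replications of `statistic` (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(sample)
    reps = []
    for _ in range(n_boot):
        # One bootstrap sample: n draws from the observed data, with replacement.
        boot = rng.choices(sample, k=n)
        reps.append(statistic(boot))
    return reps

# Made-up data, purely for illustration.
data = [4.1, 5.3, 2.8, 6.0, 5.5, 3.9, 4.7, 7.2, 5.1, 4.4]

reps = bootstrap_replications(data, statistics.median)

# The spread of the replications estimates the standard error of the median,
# a statistic with no simple closed-form standard error.
se_median = statistics.stdev(reps)
print(f"sample median = {statistics.median(data):.2f}, bootstrap SE = {se_median:.3f}")
```

Any function of a sample can be passed as `statistic`, which is what makes the approach attractive when no analytical variance is available.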
5. NONPARAMETRIC BOOTSTRAP: BOOTSTRAP WORLD • Efron & Tibshirani (1993) diagram of the Real world and the Bootstrap world to illustrate why bootstrapping works
6. NONPARAMETRIC BOOTSTRAP: THE BOOTSTRAP IN JMP • Possible to do a bootstrap analysis prior to JMP 10 using a script • “One-click bootstrap” added to JMP Pro in Version 10 • Available in most Analysis platforms • Rows need to be independent for one-click bootstrap to be implemented • Takes advantage of the Automatic Recalc feature • Results can be analyzed in Distribution platform, which will know to provide Bootstrap Confidence Limits, based on percentile interval method (Efron & Tibshirani 1993)
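The percentile interval method mentioned on this slide takes the confidence limits directly from the quantiles of the bootstrap replications. The following Python sketch illustrates the idea only; it is not how JMP's Distribution platform computes its limits, and the data, the helper names, and the interpolation rule in `quantile` are assumptions.

```python
import random
import statistics

def quantile(sorted_vals, p):
    """Quantile by linear interpolation between order statistics."""
    pos = p * (len(sorted_vals) - 1)
    lower = int(pos)  # floor, since pos >= 0
    upper = min(lower + 1, len(sorted_vals) - 1)
    frac = pos - lower
    return sorted_vals[lower] * (1 - frac) + sorted_vals[upper] * frac

def percentile_interval(sample, statistic, n_boot=2000, alpha=0.05, seed=1):
    """(1 - alpha) percentile interval from n_boot bootstrap replications."""
    rng = random.Random(seed)
    n = len(sample)
    reps = sorted(statistic(rng.choices(sample, k=n)) for _ in range(n_boot))
    return quantile(reps, alpha / 2), quantile(reps, 1 - alpha / 2)

# Made-up data, purely for illustration.
data = [4.1, 5.3, 2.8, 6.0, 5.5, 3.9, 4.7, 7.2, 5.1, 4.4]

lower, upper = percentile_interval(data, statistics.fmean)
print(f"95% percentile interval for the mean: ({lower:.2f}, {upper:.2f})")
```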
7. NONPARAMETRIC BOOTSTRAP: NON-STANDARD QUANTITIES • By non-standard, I mean statistics for which we don’t readily have standard errors • Could be unavailable in JMP • Could be difficult to obtain analytically • Example: Adjusted R^2 value in linear regression
8. NONPARAMETRIC BOOTSTRAP: RECAP • Bootstrap is a powerful feature with many uses • Primarily a UI feature, but capability is enhanced when scripted in JSL • Allows us to get confidence intervals for statistics, functions of statistics and curves • Examples from Discovery 2012: • Non-standard quantities • Functions of the output • Multiple tables in one bootstrap run • Model from the Fit Curve platform
9. PERMUTATION TESTS
10. PERMUTATION TESTS: INTRODUCTION • Introduced by R.A. Fisher in the 1930s • Fisher wanted to demonstrate the validity of Student’s t test without the normality assumption • Permutation tests provide exact results, but only apply to a narrow range of problems • Must have something to permute (i.e. change the order of) • Bootstrap hypothesis testing extends permutation testing to more problems
11. PERMUTATION TESTS: CONCEPTS • Basic idea: • Sample repeatedly from the permutation distribution • Note that this is sampling without replacement (not with replacement) • Resampling purpose is to permute (change the order of) the observations • Count the number of results more extreme than the observed result • Calculate a p-value: $p = \dfrac{\#\{\text{results more extreme}\}}{\#\{\text{total iterations}\}}$
12. PERMUTATION TESTS: TWO-SAMPLE EXAMPLE • Consider comparing the (possibly different) distributions $(F, G)$ of two samples (sizes $n$, $m$ with $N = n + m$) • $H_0: F = G$ • Under $H_0$, all permutations of the observations across $F$, $G$ are equally likely • There are $\binom{N}{n}$ possible permutations; generally sampling from these is sufficient. • For each permutation replication, determine whether the difference $\hat{\theta}^*$ is at least as large as the observed difference $\hat{\theta}$. • Tabulate the number of times $\hat{\theta}^* \ge \hat{\theta}$ and divide by the number of replications. • This is a one-sided permutation test; a two-sided test can be performed by taking absolute values of $\hat{\theta}^*$ and $\hat{\theta}$.
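The two-sample procedure above can be sketched as follows. This is an illustrative Python version, not the presenters' script; it assumes the statistic is a difference in means, and the data and the function name `permutation_test` are made up.

```python
import random
from statistics import fmean

def permutation_test(x, y, n_perm=5000, two_sided=False, seed=1):
    """Monte Carlo permutation p-value for a difference in means."""
    rng = random.Random(seed)
    n = len(x)
    pooled = list(x) + list(y)
    observed = fmean(x) - fmean(y)
    if two_sided:
        observed = abs(observed)
    count = 0
    for _ in range(n_perm):
        # Shuffling the pooled data reassigns observations to the two groups
        # without replacement, which is valid under H0: F = G.
        rng.shuffle(pooled)
        diff = fmean(pooled[:n]) - fmean(pooled[n:])
        if two_sided:
            diff = abs(diff)
        if diff >= observed:
            count += 1
    return count / n_perm

# Made-up measurements for two groups.
group_a = [12.1, 14.3, 13.8, 15.2, 14.9, 13.5]
group_b = [10.2, 11.8, 12.5, 11.1, 12.9, 10.7]

p_one = permutation_test(group_a, group_b)
p_two = permutation_test(group_a, group_b, two_sided=True)
print(f"one-sided p = {p_one:.4f}, two-sided p = {p_two:.4f}")
```

With only 12 observations there are $\binom{12}{6} = 924$ distinct splits, so full enumeration would also be feasible here; random sampling is what scales to larger problems.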
13. PERMUTATION TESTS: DEMONSTRATION #1 • Oneway platform in JMP can compute robust mean estimates • Test included is a Wald test with an asymptotic Chi Square distribution p-value • We wish to use a permutation test to avoid the distributional assumption of the asymptotic test
14. PERMUTATION TESTS: DEMONSTRATION #1 • Script input: • continuous response • categorical predictor • # of permutation replications • Script output: • original Robust Fit • permutation test results
    newX = xVals[random shuffle(iVec)]; // set the x values to a random permutation
    column(eval expr(xCol)) << set values(newX);
15. PERMUTATION TESTS: DEMONSTRATION #1 • Robust mean permutation test demo
16. PERMUTATION TESTS: DEMONSTRATION #2 • Contingency platform in JMP performs two Chi Square tests for testing if responses differ across levels of the X variable • Tests require that expected counts of contingency table cells be > 5 • We wish to use a permutation test to avoid this requirement
17. PERMUTATION TESTS: DEMONSTRATION #2 • Script input: • categorical response • categorical predictor • # of permutation replications • Script output: • results of two original Chi Square tests • permutation test results
    newY = responseVals[random shuffle(ivec)];
    column(eval expr(responseCol)) << set values(newY);
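The same shuffle-and-recompute idea can be illustrated outside JMP. The Python sketch below permutes the response labels and recomputes a Pearson chi-square statistic each time; the data are made up, and the Contingency platform's own two tests are not reproduced here.

```python
import random
from collections import Counter

def chi_square(xs, ys):
    """Pearson chi-square statistic for the two-way table of xs by ys."""
    n = len(xs)
    row, col, cell = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    stat = 0.0
    # Sorted keys keep the summation order fixed across permutations.
    for r in sorted(row):
        for c in sorted(col):
            expected = row[r] * col[c] / n
            stat += (cell[(r, c)] - expected) ** 2 / expected
    return stat

def permutation_pvalue(xs, ys, n_perm=5000, seed=1):
    rng = random.Random(seed)
    observed = chi_square(xs, ys)
    shuffled = list(ys)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute the response, keep the predictor fixed
        if chi_square(xs, shuffled) >= observed:
            count += 1
    return count / n_perm

# Made-up 2x2 example whose expected cell counts (4.5 and 3.5) are below 5.
x = ["a"] * 8 + ["b"] * 8
y = ["yes"] * 7 + ["no"] * 1 + ["yes"] * 2 + ["no"] * 6

p_value = permutation_pvalue(x, y)
print(f"chi-square = {chi_square(x, y):.3f}, permutation p = {p_value:.4f}")
```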
18. PERMUTATION TESTS: DEMONSTRATION #2 • Contingency table permutation test demo
19. PERMUTATION TESTS: RECAP • When available, full permutation tests provide exact results • …and incomplete permutation tests still give good results • If something can be permuted, these tests are easy to implement • For other significance testing situations, bootstrap hypothesis testing works
20. PARAMETRIC BOOTSTRAP
21. PARAMETRIC BOOTSTRAP: INTRODUCTION • JMP provides a one-click nonparametric bootstrap feature. • But there are other variations of the bootstrap: residual resampling, Bayesian bootstrap, … • This section will provide an introduction to the parametric bootstrap and how it can be implemented in JSL.
22. PARAMETRIC BOOTSTRAP: WHY NOT SAMPLE ROWS? • There are times when we may not want to resample rows of our data. [Figure: nonparametric bootstrap sample]
23. PARAMETRIC BOOTSTRAP: WHY NOT SAMPLE ROWS? • For a sigmoid curve, we don’t want to lose any sections of the curve: • Upper asymptote • Lower asymptote • Inflection point • Similar issues arise with logistic regression, where resampling rows can lead to problems with separation. • There are several alternatives to resampling rows; we will focus on the parametric bootstrap.
24. PARAMETRIC BOOTSTRAP: DETAILS • Nearly identical to the nonparametric bootstrap, except for the way that we generate our bootstrap samples. • The nonparametric bootstrap samples from the empirical distribution of our data. • The parametric bootstrap samples from the fitted parametric model for our data. • Suppose we use $F(\beta)$ to model our response $Y$. Fitting the model to our observed data gives us $F(\hat{\beta})$, our fitted parametric model. • A parametric bootstrap algorithm is:
    1. Obtain $F(\hat{\beta})$ by fitting the parametric model to the observed data.
    2. Use $F(\hat{\beta})$ to generate $Y_j^*$, a vector of random pseudo-responses.
    3. Fit $F(\beta)$ to $Y_j^*$, giving us $\hat{\beta}_j^*$.
    4. Store $\hat{\beta}_j^*$ and return to step 2, for $j = 1, \dots, B$.
25. PARAMETRIC BOOTSTRAP: SIGMOID CURVE EXAMPLE • A few slides ago, we saw an example of a sigmoid curve: $F(\beta) = g(x, \beta) = \beta_3 + \dfrac{\beta_4 - \beta_3}{1 + \exp[-\beta_1 (x - \beta_2)]}$ • Assume the response is normally distributed: $y \sim N(g(x, \beta), \sigma^2)$. • Fitting the curve to our original data set gives us an estimate for our coefficients and error variance.
    | Term | $\beta_1$ | $\beta_2$ | $\beta_3$ | $\beta_4$ | $\sigma$ |
    | --- | --- | --- | --- | --- | --- |
    | Estimate | 5.72 | 0.32 | 0.60 | 1.11 | 0.011 |
26. PARAMETRIC BOOTSTRAP: SIGMOID EXAMPLE CONTINUED • Using our estimated coefficients and error variance, generate $y_j^* = g(x_j, \hat{\beta}) + \hat{\sigma} \epsilon_j$, where the $\epsilon_j$ are independent and identically distributed standard normal. [Figure: parametric bootstrap sample]
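Generating one parametric bootstrap sample for the sigmoid example can be sketched as below. This Python illustration makes two assumptions: the curve is the four-parameter logistic $g(x, \beta) = \beta_3 + (\beta_4 - \beta_3) / (1 + \exp[-\beta_1 (x - \beta_2)])$, and the `xs` design points are hypothetical, since the original data are not in the slides. The estimates are the ones reported in the table.

```python
import math
import random

# Estimates reported on the slide: (beta1, beta2, beta3, beta4) and sigma.
BETA_HAT = (5.72, 0.32, 0.60, 1.11)
SIGMA_HAT = 0.011

def g(x, beta):
    """Four-parameter logistic curve (assumed form of the slide's sigmoid)."""
    b1, b2, b3, b4 = beta
    return b3 + (b4 - b3) / (1.0 + math.exp(-b1 * (x - b2)))

def parametric_bootstrap_sample(xs, beta, sigma, rng):
    """Pseudo-responses y* = g(x, beta_hat) + sigma_hat * eps, eps ~ N(0, 1)."""
    return [g(x, beta) + sigma * rng.gauss(0.0, 1.0) for x in xs]

# Hypothetical design points; every x value is kept in every sample.
xs = [i / 10 for i in range(-20, 31)]

rng = random.Random(1)
y_star = parametric_bootstrap_sample(xs, BETA_HAT, SIGMA_HAT, rng)
print(y_star[:3])
```

Because the x values are never resampled, both asymptotes and the inflection point are represented in every bootstrap sample, which is the motivation given on the earlier slides. Refitting the curve to each `y_star` would complete the algorithm.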
27. PARAMETRIC BOOTSTRAP: MORE DETAILS • Resampling rows can be problematic. • Any other reasons to use the parametric bootstrap? • Results will be close to “textbook” formulae when available. • Very nice for doing goodness of fit tests. • Example: Normal goodness of fit test, $H_0$: $N(\mu, \sigma)$ appropriate for $y$ vs $H_1$: $N(\mu, \sigma)$ not appropriate • Parametric bootstrap gives us the distribution of the test statistic under $H_0$ → Perfect for calculating p-values
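A goodness-of-fit test of this kind can be sketched as follows. The slide does not say which test statistic the presenters used; this Python illustration assumes a Kolmogorov-Smirnov distance with the mean and standard deviation re-estimated on every pseudo-sample, and the two test samples are constructed deterministically for the example.

```python
import math
import random
import statistics
from statistics import NormalDist

def ks_statistic(sample):
    """KS distance between the sample and a normal fitted to that sample."""
    n = len(sample)
    fitted = NormalDist(statistics.fmean(sample), statistics.stdev(sample))
    d = 0.0
    for i, v in enumerate(sorted(sample)):
        cdf = fitted.cdf(v)
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d

def parametric_bootstrap_pvalue(sample, n_boot=500, seed=1):
    rng = random.Random(seed)
    n = len(sample)
    mu_hat = statistics.fmean(sample)
    sigma_hat = statistics.stdev(sample)
    observed = ks_statistic(sample)
    count = 0
    for _ in range(n_boot):
        # Simulate from the fitted null model, then refit inside ks_statistic.
        pseudo = [rng.gauss(mu_hat, sigma_hat) for _ in range(n)]
        if ks_statistic(pseudo) >= observed:
            count += 1
    return count / n_boot

n = 60
# Constructed samples: quantiles of an exponential and of a standard normal.
skewed = [-math.log(1 - (i - 0.5) / n) for i in range(1, n + 1)]
bell = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

p_skewed = parametric_bootstrap_pvalue(skewed)
p_bell = parametric_bootstrap_pvalue(bell)
print(f"skewed sample p = {p_skewed:.3f}, bell-shaped sample p = {p_bell:.3f}")
```

Refitting on each pseudo-sample matters: it makes the simulated null distribution account for the fact that the parameters were estimated from the data.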
28. PARAMETRIC BOOTSTRAP: …AND JMP • JMP provides a one-click nonparametric bootstrap, but a little bit of scripting gets us the parametric bootstrap as well. • Most (all?) modeling platforms in JMP allow you to save a prediction formula. We can also create columns of random values for many distributions. Put these two things together and we’re well on our way!
29. PARAMETRIC BOOTSTRAPPING: DEMONSTRATION • Parametric Bootstrapping Demonstration in JMP
30. BOOTSTRAP AGGREGATING (BAGGING)
31. BAGGING: INTRODUCTION • We have seen bootstrapping for inference; we can also use the bootstrap to improve prediction. • Breiman (1996a) introduced the notion of “bootstrap aggregating” (or bagging for short) to improve predictions. • The name says it all…aggregate predictions across bootstrap samples.
32. BAGGING: UNSTABLE PREDICTORS • Breiman (1996b) introduced the idea of instability in model selection. • Let’s say that we are using our data $D = \{(\boldsymbol{x}_i, y_i),\ i = 1, \dots, n\}$ to create a prediction function $\hat{\mu}(\boldsymbol{x}, D)$. • If a small change in $D$ results in a large change in $\hat{\mu}(\cdot, \cdot)$, we have an unstable predictor. • A variety of techniques have been shown to be unstable: Regression trees, best subset regression, forward selection, … • Instability is a major concern when predicting new observations.
33. BAGGING: THE BAGGING ALGORITHM • A natural way to deal with instability is to observe the behavior of $\hat{\mu}(\cdot, \cdot)$ for repeated perturbations of the data → bootstrap it! • Basic bagging algorithm:
    1. Take a bootstrap sample $D_j$ from the observed data $D$.
    2. Fit your model of choice to $D_j$, giving you predictor $\hat{\mu}_j(\boldsymbol{x})$.
    Repeat 1 and 2 for $j = 1, \dots, b$. Then the bagged prediction rule is $\hat{\mu}_{bag}(\boldsymbol{x}) = \dfrac{1}{b} \sum_{j=1}^{b} \hat{\mu}_j(\boldsymbol{x})$
    • Bagging a classifier is slightly different. You can either average over the probability formula or use a voting scheme.
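The bagging algorithm can be sketched with the simplest unstable predictor, a regression tree with a single split. This is a Python illustration on made-up data, not the presenters' JSL demo; `fit_stump`, `predict_stump`, and `bagged_predictor` are hypothetical helper names.

```python
import random
from statistics import fmean

def fit_stump(points):
    """Single-split regression tree: returns (split, left_mean, right_mean)."""
    pts = sorted(points)
    best = None
    for i in range(1, len(pts)):
        if pts[i][0] == pts[i - 1][0]:
            continue  # cannot split between identical x values
        left = [y for _, y in pts[:i]]
        right = [y for _, y in pts[i:]]
        lm, rm = fmean(left), fmean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (pts[i - 1][0] + pts[i][0]) / 2, lm, rm)
    if best is None:  # all x identical: fall back to the overall mean
        m = fmean([y for _, y in pts])
        return (float("inf"), m, m)
    return best[1:]

def predict_stump(stump, x):
    split, left_mean, right_mean = stump
    return left_mean if x < split else right_mean

def bagged_predictor(points, b=200, seed=1):
    """Average the predictions of b stumps, each fit to a bootstrap sample."""
    rng = random.Random(seed)
    stumps = [fit_stump(rng.choices(points, k=len(points))) for _ in range(b)]
    return lambda x: fmean([predict_stump(s, x) for s in stumps])

# Made-up data: a step from about 1 to about 5 between x = 10 and x = 11.
points = [(x, (1.0 if x <= 10 else 5.0) + 0.1 * ((7 * x) % 5 - 2)) for x in range(1, 21)]

single = fit_stump(points)
bagged = bagged_predictor(points)
print("single stump at x=10:", predict_stump(single, 10))
print("bagged at x=10:", round(bagged(10), 3))
```

The single stump jumps abruptly at its split, while the bagged prediction near the boundary is an average over stumps whose split points differ, which smooths the hard decision rule.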
34. BAGGING: REGRESSION TREES • Regression trees (as well as classification trees) are known to be particularly unstable. Ex: $y(x) = \begin{cases} 1 & x < 2 \\ 5 & 2 \le x \le 4 \\ 3 & x > 4 \end{cases}$ • A regression tree looks for optimal splits in your predictors and fits a simple mean to each section.
35. BAGGING: MORE ON INSTABILITY • In general, techniques that involve “hard decision rules” (like binary splits or including/excluding terms) are likely to be unstable. • Regression trees: binary splits • Best subset: X1 is either included or left out of the model (nothing in between) • Is anything stable??? One example: Penalized regression techniques can shrink estimates, which is kind of like letting a variable partially enter the model.
36. BAGGING: REGRESSION TREE EXAMPLE • A regression tree for these data will change drastically depending on whether or not we include the point at x=21.
37. BAGGING: REGRESSION TREE EXAMPLE • Predictions for a tree with a single split. Blue includes x=21. Red excludes x=21. This kind of difference is crucial when we observe new data.
38. BAGGING: REGRESSION TREE EXAMPLE • The bagged predictor is a compromise between the two.
39. BAGGING: RECAP • Bagging is a very useful tool that improves upon unstable predictors. Added bonus: we also get a measure of uncertainty. • Several features in JMP and JSL make it convenient to do bagging yourself. • Most platforms (if not all) allow you to save a prediction formula:
    linearModel << save prediction formula;
    predFormula = column("Y Predictor") << get formula;
    • And we get a lot of mileage out of the resample freq() function:
    rsfCol = newColumn("rsf", set formula(resample freq(1)));
    • An implementation of Random Forests was added to JMP Pro in version 10. Random forests resample rows and columns.
40. BAGGING: DEMONSTRATION • Bagging Demonstration in JMP
41. USING R FOR BOOTSTRAPPING
42. USING R FOR BOOTSTRAPPING: THE JMP INTERFACE TO R • JMP 10 added the ability to transfer information between JMP and R (a very powerful open-source statistical software package). • R has packages to do bootstrapping, in particular the “boot” package. • We can do the bootstrap in R using a custom-made JMP interface.
43. USING R FOR BOOTSTRAPPING: CONNECTING TO R • JMP is a very nice complement to R, making it easy to create convenient interfaces to R and then present R results in JMP. • A handful of JSL functions allow you to communicate with R:
    R Init(); // initializes the connection to R
    x = [1, 2, 3];
    R Send( x ); // sends the matrix x to R
    R Submit( "x <- 2*x" ); // submits R code
    y = R Get( x ); // gets the object x from R and names it y
    R Term(); // terminates the connection to R
    • There are a few more JSL functions for communicating with R, but the functions listed above will handle the majority of your needs.
44. USING R FOR BOOTSTRAPPING: CONNECTING TO R • The R connection allows us to combine the strengths of R and JMP
45. USING R FOR BOOTSTRAPPING: DEMONSTRATION • JMP and R Integration Demonstration
46. STABILITY SELECTION
47. STABILITY SELECTION: INTRODUCTION • Bagging is a way to use resampling to improve prediction. We can also use resampling to improve variable selection techniques. • Meinshausen and Bühlmann (2010) introduced stability selection, a very general modification that can be used in conjunction with any traditional variable selection technique. • The motivation behind stability selection is simple and very intuitive: If a predictor is typically included in the final model after doing variable selection on a subset of the data, then it is probably a meaningful variable.
48. STABILITY SELECTION: THE VARIABLE SELECTION PROBLEM • Suppose that we have observed data $D = \{(\boldsymbol{x}_i, y_i),\ i = 1, \dots, n\}$, where $\boldsymbol{x}_i$ is a $p \times 1$ vector of predictors for observation $i$. • We want to build a model for the response $y$ using a subset of the predictors to improve both interpretation and predictive ability. This is the classic variable selection problem. • No shortage of variable selection techniques: stepwise regression, best subset, penalized regression, …
49. STABILITY SELECTION: THE VARIABLE SELECTION PROBLEM • Usual linear models problem: we want to estimate the coefficient vector $\beta$ for the model $Y = X\beta + \varepsilon$. • Want our variable selection technique to set $\hat{\beta}_j = 0$ for some of the terms. • We always have at least one tuning parameter ($\lambda$) that controls the complexity of the model. Doing variable selection yields a set $\hat{S}(\lambda) = \{k : \hat{\beta}_k \neq 0\}$ • We tune the model using Cross-Validation, AIC, BIC, …
    | Variable Selection Technique | Tuning Parameter |
    | --- | --- |
    | Forward Selection | Alpha-to-enter |
    | Backward Elimination | Alpha-to-leave |
    | Best Subset | Maximum model size considered |
    | Lasso | L1 norm |
    | Least Angle Regression | Number of nonzero variables |
50. STABILITY SELECTION: THE DETAILS • The stability selection algorithm:
    1. Choose a random subsample $D_k$ of size $n/2$ from $D$, without replacement.
    2. Use a variable selection technique on $D_k$ to obtain $\hat{S}_k(\lambda)$, the set of variables with nonzero coefficients for tuning parameter value $\lambda$.
    3. Repeat steps 1 and 2 for $k = 1, \dots, B$.
    • Easy to implement: only as complicated as the underlying selection technique. • We calculate $\hat{\Pi}_j(\lambda)$, the probability that variable $j$ is included in the model when doing selection (with tuning $\lambda$) on a random subset of the data: $\hat{\Pi}_j(\lambda) = \dfrac{1}{B} \sum_{k=1}^{B} I(x_j \in \hat{S}_k(\lambda))$
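The algorithm can be sketched end to end on simulated data. The selection technique in this Python illustration is a deliberately simple stand-in (keep the `q` predictors with the largest absolute correlation with the response, so `q` plays the role of $\lambda$); it is not forward selection and not the presenters' implementation. The simulated data, the threshold `PI_THR = 0.6`, and all helper names are assumptions.

```python
import math
import random
from statistics import fmean

def pearson(u, v):
    mu, mv = fmean(u), fmean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den if den > 0 else 0.0

def select_top_q(rows, y, q):
    """Stand-in selection technique: the q predictors most correlated with y."""
    p = len(rows[0])
    scores = [abs(pearson([r[j] for r in rows], y)) for j in range(p)]
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return set(ranked[:q])

def selection_proportions(rows, y, q, n_sub=100, seed=1):
    """Proportion of half-size subsamples in which each variable is selected."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    counts = [0] * p
    for _ in range(n_sub):
        idx = rng.sample(range(n), n // 2)  # subsample without replacement
        chosen = select_top_q([rows[i] for i in idx], [y[i] for i in idx], q)
        for j in chosen:
            counts[j] += 1
    return [c / n_sub for c in counts]

# Simulated data: only predictors 0 and 1 affect the response.
rng = random.Random(42)
n, p = 80, 8
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3 * r[0] - 3 * r[1] + rng.gauss(0, 0.5) for r in X]

q_grid = (1, 2)  # range of tuning parameter values to examine
pi_by_q = [selection_proportions(X, y, q) for q in q_grid]
pi_max = [max(pi[j] for pi in pi_by_q) for j in range(p)]

PI_THR = 0.6  # assumed threshold on the maximum selection proportion
stable = {j for j in range(p) if pi_max[j] >= PI_THR}
print("max selection proportions:", pi_max)
print("stable set:", sorted(stable))
```

Swapping `select_top_q` for any other selection routine leaves the rest of the loop unchanged, which reflects the slide's point that the method is only as complicated as the underlying technique.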
51. STABILITY SELECTION: MORE DETAILS • Applying the algorithm to a meaningful range of $\lambda$ values shows us how inclusion probabilities change as a function of the tuning parameter. • It makes sense that if a term maintains a high selection probability, then it should be in our final model.
52. STABILITY SELECTION: MORE DETAILS • After looking across a meaningful range of $\lambda$ values, we include variable $x_j$ in our final model if $\max_{\lambda} \hat{\Pi}_j(\lambda) \ge \Pi_{thr}$ • Here $\Pi_{thr}$ is a tuning parameter chosen in advance. Meinshausen and Bühlmann (2010) show that the results are not sensitive to the choice of $\Pi_{thr}$.
53. STABILITY SELECTION: MORE DETAILS • Improves unstable variable selection problems. • Very general, can be applied in most variable selection settings. • Can greatly outperform cross-validation for high dimensional data ($n \ll p$). • Theory shows that stability selection controls the false discovery rate. • Why subsample instead of the usual bootstrap? Taking a subsample (without replacement) of size $n/2$ provides a reduction in information similar to that of a standard bootstrap sample. It also saves time when the underlying variable selection technique is computationally intense.
54. STABILITY SELECTION: …AND JMP • JMP provides several built-in variable selection techniques: • Forward selection, backward elimination, Lasso, Elastic Net, … • The most convenient in JMP? Forward selection with alpha-to-enter tuning. • Scripting stability selection for forward selection is fairly straightforward using a frequency column in conjunction with the Stepwise platform:
    newFreq = J(n,1,0);
    newFreq[random shuffle(rows)[1::floor(n/2)]]=1;
    fCol << set values(newFreq);
    • We have implemented a slight modification of the stability selection algorithm, which Shah and Samworth (2013) show to have some added perks.
55. STABILITY SELECTION: DEMONSTRATION • Stability Selection Demonstration in JMP
56. WRAP-UP
57. WRAP-UP: REFERENCES
    • Breiman, L. (1996a). Bagging Predictors. Machine Learning, 24, 123-140.
    • Breiman, L. (1996b). Heuristics of Instability and Stabilization in Model Selection. The Annals of Statistics, 24, 2350-2383.
    • Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap, Chapman & Hall/CRC.
    • Meinshausen, N. and Bühlmann, P. (2010). Stability Selection. Journal of the Royal Statistical Society Series B, 72, 417-473.
    • Shah, R. and Samworth, R. (2013). Variable selection with error control: another look at stability selection. Journal of the Royal Statistical Society Series B, 75, 55-80.
58. THANK YOU! • The Bootstrap and Beyond: Using JSL for Resampling • Michael Crotty, michael.crotty@sas.com • Clay Barker, clay.barker@sas.com • Research Statisticians, JMP Division, SAS Institute • www.SAS.com