Resampling Methods
Outline
• Background
• Jackknife
• Bootstrap
• Permutation
• Cross-validation
Why do we need resampling?
• The purpose of statistics is to estimate parameters and assess their
reliability. Since estimators are functions of the sample points, they are
random variables. If we could find the distribution of such a random
variable (the sample statistic), we could assess the reliability of the
estimator.
• If we had the sampling distribution of the sample statistic, we could
estimate the variance of the estimator, construct intervals, and even test
hypotheses.
Why do we need resampling?
• Unfortunately, apart from the simplest cases, the sampling distribution is
not easy to derive.
• What is the sampling distribution of:
• The time since most recent common ancestor of all humans?
• The adjusted R-squared?
• The AIC?
• The beta coefficient when independence is violated?
• The number of connections in a neural net?
• The eigenvalues of PCA?
• A bifurcation point in a phylogenetic tree?
Why do we need resampling?
• Unfortunately, apart from the simplest cases, the sampling distribution is
not easy to derive. Several techniques exist to approximate these
distributions, e.g. the Laplace approximation, which gives an analytical
form for the approximate distribution. With the advent of computers, more
computationally intensive methods have emerged, and in many cases they work
satisfactorily.
Why do we need resampling?
• The t-distribution and chi-squared distribution are good
approximations for sufficiently large and/or normally-distributed
samples.
• However, when the data come from an unknown distribution or the sample
size is small, resampling tests are recommended.
Resampling Methods
• Jackknife
• Bootstrap
• Permutation
• Cross-validation
Resampling Method: Bootstrap
  Application: standard deviation, confidence interval, hypothesis testing, bias
  Sampling procedure: samples drawn at random, with replacement

Resampling Method: Jackknife
  Application: standard deviation, confidence interval, bias
  Sampling procedure: samples consist of the full data set with one observation left out

Resampling Method: Permutation
  Application: hypothesis testing
  Sampling procedure: samples drawn at random, without replacement

Resampling Method: Cross-validation
  Application: model validation
  Sampling procedure: data is randomly divided into two or more subsets, with results validated across sub-samples
Jackknife
Jackknife
The jackknife is used for bias removal. The mean-square error of an
estimator equals the square of its bias plus its variance; if the bias is
much larger than the variance, then under some circumstances the jackknife
can be used.

Description of the jackknife: Assume we have a sample of size n. We estimate
some sample statistic using all the data, $t_n$. Then, removing one point at
a time, we estimate $t_{n-1,i}$, where the subscripts indicate the size of
the sample and the index of the removed sample point. The new estimator is
derived as

$$t' = n t_n - (n-1)\,\bar t_{n-1}, \qquad \bar t_{n-1} = \frac{1}{n}\sum_{i=1}^{n} t_{n-1,i}$$

If the order of the bias of the statistic $t_n$ is $O(n^{-1})$, then after
the jackknife the order of the bias becomes $O(n^{-2})$.

The variance is estimated using

$$\hat V_J = \frac{n-1}{n}\sum_{i=1}^{n}\left(t_{n-1,i} - \bar t_{n-1}\right)^2$$

This procedure can be applied iteratively, i.e. the jackknife can be applied
again to the new estimator. The first application of the jackknife can
reduce bias without changing the variance of the estimator, but second and
higher-order applications can in general increase the variance of the
estimator.
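The two formulas above can be sketched in a few lines of NumPy (the function name `jackknife` is illustrative):

```python
import numpy as np

def jackknife(data, statistic):
    """Bias-corrected estimate t' and jackknife variance V_J for `statistic`."""
    data = np.asarray(data)
    n = len(data)
    t_n = statistic(data)
    # Leave-one-out estimates t_{n-1,i}
    loo = np.array([statistic(np.delete(data, i)) for i in range(n)])
    t_bar = loo.mean()
    t_prime = n * t_n - (n - 1) * t_bar                # bias-corrected estimator
    v_hat = (n - 1) / n * np.sum((loo - t_bar) ** 2)   # jackknife variance
    return t_prime, v_hat
```

For the sample mean, the bias correction leaves the estimate unchanged and the jackknife variance reduces to the familiar s²/n.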
Bootstrap
The bootstrap
• 1969: Simon publishes the bootstrap as an example in Basic Research
Methods in Social Science (the earlier pigfood example)
• 1979: Efron names the method and publishes the first paper on the bootstrap
Bootstrap (Nonparametric)
Have a random sample $x = (x_1, x_2, \ldots, x_n)$ from an unknown PDF $F$.
Want to estimate $\theta = t(F)$ based on $x$.
We calculate the estimate $\hat\theta = s(x)$ based on $x$.
Want to know how accurate $\hat\theta$ is.
Bootstrap (Nonparametric)
Notation:
• Random sample: $x = (x_1, x_2, \ldots, x_n)$
• Empirical distribution $\hat F$: places mass of $1/n$ at each observed data value.
• Bootstrap sample: a random sample of size n drawn from $\hat F$, denoted $x^* = (x_1^*, x_2^*, \ldots, x_n^*)$.
• Bootstrap replicate of $\hat\theta$: $\hat\theta^* = s(x^*)$.
Bootstrap (Nonparametric)
Bootstrap steps:
1. Select a bootstrap sample $x^* = (x_1^*, x_2^*, \ldots, x_n^*)$
consisting of n data values drawn with replacement from the original data
set.
2. Evaluate $\hat\theta^* = s(x^*)$ for the bootstrap sample.
3. Repeat steps 1 and 2 B times.
4. Estimate the standard error $se_F(\hat\theta)$ by the sample standard
deviation of the B replications:

$$\widehat{SE}_B = \sqrt{\frac{1}{B-1}\sum_{i=1}^{B}\left(\hat\theta^*_i - \bar\theta^*\right)^2}, \qquad \bar\theta^* = \frac{1}{B}\sum_{i=1}^{B}\hat\theta^*_i$$
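A minimal NumPy sketch of these four steps (function and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_se(data, statistic, B=1000):
    """Nonparametric bootstrap estimate of the standard error of `statistic`."""
    data = np.asarray(data)
    n = len(data)
    # Steps 1-3: B bootstrap samples of size n, drawn with replacement
    reps = np.array([statistic(rng.choice(data, size=n, replace=True))
                     for _ in range(B)])
    # Step 4: sample standard deviation of the B replicates
    return reps.std(ddof=1)
```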
The Bootstrap
• A new pigfood ration is tested on twelve pigs,
with six-week weight gains as follows:
• 496 544 464 416 512 560 608 544 480 466 512 496
• Mean: 508 ounces (we want to establish a confidence interval)
Draw simulated samples from a hypothetical universe that embodies all we
know about the universe this sample came from: our sample, replicated an
infinite number of times.
The Classic Bootstrap
1. Put the observed weight gains in a hat
2. Sample 12 with replacement
3. Record the mean
4. Repeat steps 2-3, say, 1000 times
5. Record the 5th and 95th percentiles (for a
90% confidence interval)
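The five steps above can be sketched as follows (Python/NumPy; the seed and B = 1000 are chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Step 1: the observed six-week weight gains (ounces) go "in the hat"
gains = np.array([496, 544, 464, 416, 512, 560, 608, 544, 480, 466, 512, 496])

# Steps 2-4: sample 12 with replacement, record the mean, repeat 1000 times
means = np.array([rng.choice(gains, size=gains.size, replace=True).mean()
                  for _ in range(1000)])

# Step 5: the 5th and 95th percentiles give a 90% confidence interval
lo, hi = np.percentile(means, [5, 95])
```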
Bootstrapped sample means (histogram)
Parametric Bootstrap
Resampling makes no assumptions about the population distribution; the
bootstrap covered thus far is the nonparametric bootstrap. If we have
information about the population distribution, it can be used in
resampling: instead of drawing from the sample itself, we draw from an
estimated population distribution. For example, if we know that the
population distribution is normal, we estimate its parameters using the
sample mean and variance, approximate the population distribution with this
fitted distribution, and use it to draw new samples.
Parametric Bootstrap
As expected, if the assumption about the population distribution is
correct, the parametric bootstrap will perform better than the
nonparametric bootstrap. If it is not correct, the nonparametric bootstrap
will perform better.
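A sketch of the parametric variant, assuming a normal population as in the example above (names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def parametric_bootstrap_se(data, statistic, B=1000):
    """Parametric bootstrap SE under an assumed normal population (a sketch)."""
    data = np.asarray(data)
    mu, sigma = data.mean(), data.std(ddof=1)  # fit the assumed normal model
    n = len(data)
    # Draw bootstrap samples from the fitted N(mu, sigma^2) rather than the data
    reps = np.array([statistic(rng.normal(mu, sigma, size=n)) for _ in range(B)])
    return reps.std(ddof=1)
```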
Example of Bootstrap (Nonparametric)
Have test scores (out of 100) for two consecutive years
for each of 60 subjects. Want to obtain the
correlation between the test scores and the variance
of the correlation estimate.
Can use bootstrap to obtain the variance estimate.
How many Bootstrap Replications, B?
• A fairly small number, B = 25, is sufficient to be "informative" (Efron)
• B = 50 is typically sufficient to provide a crude estimate of the SE, but
B > 200 is generally used
• CIs require larger values of B: no less than 500, with B = 1000
recommended
Permutation test
Permutation Tests
• In classical hypothesis testing, we start with assumptions about the
underlying distribution and then derive the sampling distribution of the
test statistic under H0.
• In permutation testing, these initial assumptions are not needed (except
exchangeability), and the sampling distribution of the test statistic under
H0 is computed by using permutations of the data.
Permutation Tests (example)
• The Permutation test is a technique that bases
inference on “experiments” within the observed
dataset.
• Consider the following example:
• In a medical experiment, rats are randomly
assigned to a treatment (Tx) or control (C) group.
• The outcome Xi is measured in the ith rat.
• Under H0, the outcome does not depend on whether
a rat carries the label Tx or C.
• Under H1, the outcome tends to be different, say larger, for rats labeled
Tx.
• A test statistic T measures the difference in
observed outcomes for the two groups. T may be
the difference in the two group means (or medians),
denoted as t for the observed data.
Permutation Tests (example)
• Under H0, the individual labels of Tx and C are
unimportant, since they have no impact on the
outcome. Since they are unimportant, the label can
be randomly shuffled among the rats without
changing the joint null distribution of the data.
• Shuffling the data creates a "new" dataset. It has the same rats, but
with the group labels changed, so as to appear as if there were different
group assignments.
Permutation Tests (example)
• Let t be the value of the test statistic from the
original dataset.
• Let t1 be the value of the test statistic computed from one dataset with
permuted labels.
• Consider all M possible permutations of the labels,
obtaining the test statistics,
t1, …, tM.
• Under H0, t1, …, tM are all generated from the same
underlying distribution that generated t.
Permutation Tests (example)
• Thus, t can be compared to the permuted-data test statistics,
t1, …, tM, to test the hypothesis and obtain a p-value, or to construct
confidence limits for the statistic.
Permutation Tests (example)
• Survival times
• Treated mice 94, 38, 23, 197, 99, 16, 141
• Mean: 86.8
• Untreated mice 52, 10, 40, 104, 51, 27, 146,
30, 46
• Mean: 56.2
(Efron & Tibshirani)
Permutation Tests (example)
Calculate the difference between the means of the two observed samples: it
is 30.6 days in favor of the treated mice.
Consider the two samples combined (16 observations) as the relevant
universe to resample from.
Permutation Tests (example)
• Draw 7 hypothetical observations and designate them "Treatment"; draw 9
hypothetical observations and designate them "Control".
• Compute and record the difference between the means of the two samples.
Permutation Tests (example)
• Repeat the previous two steps, perhaps 1000 times.
• Determine how often the resampled difference exceeds the observed
difference of 30.6.
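The resampling steps above can be sketched as follows (Python/NumPy; the one-sided count matches the "exceeds 30.6" criterion):

```python
import numpy as np

rng = np.random.default_rng(3)

treated = np.array([94, 38, 23, 197, 99, 16, 141])
control = np.array([52, 10, 40, 104, 51, 27, 146, 30, 46])
observed = treated.mean() - control.mean()     # roughly 30.6 days

pooled = np.concatenate([treated, control])    # the combined 16-observation universe
n_perm, count = 1000, 0
for _ in range(n_perm):
    rng.shuffle(pooled)
    # The first 7 values play "Treatment", the remaining 9 play "Control"
    diff = pooled[:7].mean() - pooled[7:].mean()
    if diff >= observed:
        count += 1
p_value = count / n_perm
```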
Permutation Tests (example)
Histogram of permuted differences (figure)
• If the group means are truly equal, then shuffling the group labels will
not have a big impact on the sum of the two groups (or on the mean, with
equal sample sizes). Some group sums will be larger than in the original
data set and some will be smaller.
Permutation Test Example 1
• 16!/(16−7)! = 57,657,600
• The dataset is too large to enumerate all permutations, so a large number
of random permutations are selected.
• When all permutations are enumerated, this is an exact permutation test.
What Are Permutation Tests?
• Permutation tests are significance tests based on permutation resamples
drawn at random from the original data. Permutation resamples are drawn
without replacement.
• Also called randomization tests, re-randomization tests, or exact tests.
• Introduced by R.A. Fisher and E.J.G. Pitman in the 1930s.
When Can We Use Permutation Tests?
• Only when we can see how to resample in a way that is consistent
with the study design and with the null hypothesis.
• If we cannot do a permutation test, we can often calculate a
bootstrap confidence interval instead.
Advantages
• Permutation tests exist for any test statistic, regardless of whether or
not its distribution is known.
• We are free to choose the statistic which best discriminates between
hypothesis and alternative and which minimizes losses.
• They can be used for:
  - analyzing unbalanced designs;
  - combining dependent tests on mixtures of categorical, ordinal, and
metric data.
Limitations
• An important assumption: the observations are exchangeable under the null
hypothesis.
• Consequence: tests of difference in location (like a permutation t-test)
require equal variance.
• In this respect, the permutation t-test shares the same weakness as the
classical Student's t-test.
Procedure of Permutation Tests
I. Analyze the problem.
  - What are the hypothesis and alternative?
  - What distribution is the data drawn from?
  - What losses are associated with bad decisions?
II. Choose a test statistic which will distinguish the hypothesis from the
alternative.
III. Compute the test statistic for the original observations.
IV. Rearrange the observations and compute the test statistic for all
possible permutations (rearrangements) of the data.
V. Make a decision: reject the hypothesis if the value of the test
statistic for the original data is an extreme value in the permutation
distribution of the statistic. Otherwise, accept the hypothesis and reject
the alternative.
Permutation Resampling Process
1. Collect data from the control & treatment groups, e.g. (5, 7, 1, 5, 9)
and (8, 4, 6, 8, 9, 7).
2. Merge the samples to form a pseudo-population: 5 7 8 4 1 6 8 9 7 5 9.
3. Sample without replacement from the pseudo-population to simulate the
control and treatment groups, e.g. (5, 7, 1, 5, 9) and (8, 4, 6, 8, 9, 7).
4. Compute the target statistic for each resample: Median(5, 7, 1, 5, 9)
and Median(8, 4, 6, 8, 9, 7).
5. Compute the "difference statistic", save the result in a table, and
repeat the resampling process for 1000+ iterations.
Example: "I Lost the Labels"
A physiology experiment to investigate the relationship between Vitamin E
and human "life-extending" effects.
Example: "I Lost the Labels"
• 6 petri dishes:
  • 3 containing standard medium
  • 3 containing standard medium + Vitamin E
• Without the labels, we have no way of knowing which cell cultures have
been treated with Vitamin E and which have not.
• There are six results, "121, 118, 110, 34, 12, 22", each belonging to one
of the petri dishes.
• Which number belongs to which dish?
Here is a simple sample: (figure)
Using the original t-test to find the p-value: (figure)
Permutation Resampling Process
1. Collect data from the control & treatment groups: (121, 118, 110) and
(34, 12, 22).
2. Merge the samples to form a pseudo-population: 121 118 110 34 12 22.
3. Sample without replacement from the pseudo-population to simulate the
control and treatment groups, e.g. (121, 118, 34) and (110, 12, 22).
4. Compute the target statistic for each resample: Mean(121, 118, 34) = 91
and Mean(110, 12, 22) = 48.
5. Compute the "difference statistic", save the result in a table, and
repeat the resampling process for 1000+ iterations.
After one permutation: (figure)
The formula needed for the permutation test: (figure)
All 20 permutations of the data: (figure)
Conclusion
• Test decision: count the permutations whose test statistic satisfies
|t| ≥ 13.0875, the absolute value obtained for the original labeling.
• We obtain the exact p-value p = 2/20 = 0.1.
• Note: if both groups have equal size, only half of the permutations is
really needed (by symmetry).
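The exact test can be reproduced in a short script: `itertools.combinations` enumerates the C(6, 3) = 20 equal-size relabelings, and a pooled-variance two-sample t statistic is assumed as the test statistic:

```python
import itertools
import numpy as np

values = np.array([121.0, 118.0, 110.0, 34.0, 12.0, 22.0])

def t_stat(a, b):
    """Pooled-variance two-sample t statistic."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))

# The original labeling: first three dishes treated, last three control
observed = abs(t_stat(values[:3], values[3:]))

# Enumerate all C(6,3) = 20 relabelings: an exact permutation test
splits = list(itertools.combinations(range(6), 3))
count = 0
for group in splits:
    rest = [i for i in range(6) if i not in group]
    t = abs(t_stat(values[list(group)], values[rest]))
    if t >= observed - 1e-9:        # tolerance for floating-point ties
        count += 1
p_exact = count / len(splits)
```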
What are resampling methods?
• Tools that involve repeatedly drawing samples from a training set and
refitting a model of interest on each sample, in order to obtain more
information about the fitted model
• Model assessment: estimate test error rates
• Model selection: select the appropriate level of model flexibility
• They are computationally expensive, but these days we have powerful
computers
• Two resampling methods:
  • Cross-validation
  • Bootstrapping
IOM 530: Intro. to Statistical Learning
Cross-validation
Cross-validation
• Cross-validation is a resampling technique used to estimate test error
and guard against overfitting.
The Validation Set Approach
• Suppose that we would like to find the set of variables that gives the
lowest test (not training) error rate
• If we have a large data set, we can achieve this goal by randomly
splitting the data into training and validation (testing) parts
• We would then use the training part to build each possible model (i.e.
the different combinations of variables) and choose the model that gives
the lowest error rate when applied to the validation data
(figure: data split into Training Data and Testing Data)
Example: Auto Data
• Suppose that we want to predict mpg from horsepower
• Two models:
  • mpg ~ horsepower
  • mpg ~ horsepower + horsepower^2
• Which model gives a better fit?
• Randomly split the Auto data set into training (196 obs.) and validation
(196 obs.) parts
• Fit both models using the training data set
• Then evaluate both models using the validation data set
• The model with the lowest validation (testing) MSE is the winner!
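A sketch of this comparison; since the Auto data set is not included here, simulated data with an invented quadratic relationship stands in for it:

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated stand-in for the Auto data (392 obs.); the true relationship
# between mpg and horsepower is made quadratic by construction
hp = rng.uniform(50, 200, size=392)
mpg = 60 - 0.4 * hp + 0.001 * hp**2 + rng.normal(0, 1, size=392)

# Randomly split into training (196 obs.) and validation (196 obs.) parts
perm = rng.permutation(392)
train, test = perm[:196], perm[196:]

def validation_mse(degree):
    """Fit mpg ~ poly(horsepower, degree) on training, score on validation."""
    coef = np.polyfit(hp[train], mpg[train], degree)
    pred = np.polyval(coef, hp[test])
    return np.mean((mpg[test] - pred) ** 2)

mse_linear, mse_quadratic = validation_mse(1), validation_mse(2)
```

With this simulated truth, the quadratic model attains the lower validation MSE and is declared the winner.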
Results: Auto Data
• Left: validation error rate for a single split
• Right: the validation method repeated 10 times, each time with a
different random split
• There is a lot of variability among the MSEs... not good! We need more
stable methods!
Leave-One-Out Cross-Validation (LOOCV)
• This method is similar to the validation set approach, but it tries to
address the latter's disadvantages
• For each suggested model, do:
  • Split the data set of size n into a training data set (blue) of size
n − 1 and a validation data set (beige) of size 1
  • Fit the model using the training data
  • Validate the model using the validation data, and compute the
corresponding MSE
  • Repeat this process n times
• The MSE for the model is the average of the n resulting MSEs
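A sketch of LOOCV for a polynomial fit (the helper name `loocv_mse` is illustrative):

```python
import numpy as np

def loocv_mse(x, y, degree):
    """LOOCV estimate of test MSE for a polynomial fit (illustrative helper)."""
    n = len(x)
    sq_errors = np.empty(n)
    for i in range(n):
        mask = np.ones(n, dtype=bool)
        mask[i] = False                          # hold out observation i
        coef = np.polyfit(x[mask], y[mask], degree)
        sq_errors[i] = (y[i] - np.polyval(coef, x[i])) ** 2
    return sq_errors.mean()                      # CV(n): average of the n MSEs
```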
FIGURE 5.3. A schematic display of LOOCV. A set of n data points is
repeatedly split into a training set (shown in blue) containing all but one
observation, and a validation set that contains only that observation
(shown in beige). The test error is then estimated by averaging the n
resulting MSEs. The first training set contains all but observation 1, the
second training set contains all but observation 2, and so forth.

observations, and a prediction $\hat y_1$ is made for the excluded
observation, using its value $x_1$. Since $(x_1, y_1)$ was not used in the
fitting process, $MSE_1 = (y_1 - \hat y_1)^2$ provides an approximately
unbiased estimate of the test error. But even though $MSE_1$ is unbiased
for the test error, it is a poor estimate because it is highly variable,
since it is based upon a single observation $(x_1, y_1)$.

We can repeat the procedure by selecting $(x_2, y_2)$ for the validation
data, training the statistical learning procedure on the n − 1 observations
$\{(x_1, y_1), (x_3, y_3), \ldots, (x_n, y_n)\}$, and computing
$MSE_2 = (y_2 - \hat y_2)^2$. Repeating this approach n times produces n
squared errors, $MSE_1, \ldots, MSE_n$. The LOOCV estimate for the test MSE
is the average of these n test error estimates:

$$CV_{(n)} = \frac{1}{n}\sum_{i=1}^{n} MSE_i. \qquad (5.1)$$

A schematic of the LOOCV approach is illustrated in Figure 5.3. LOOCV has a
couple of major advantages over the validation set approach. First, it has
far less bias: in LOOCV, we repeatedly fit the statistical learning method
on almost the entire data set.
LOOCV vs. the Validation Set Approach
• LOOCV has less bias
  • We repeatedly fit the statistical learning method using training data
that contains n − 1 obs., i.e. almost all of the data set is used
• LOOCV produces a less variable MSE
  • The validation approach produces a different MSE each time it is
applied, due to randomness in the splitting process, while performing LOOCV
multiple times will always yield the same results, because each split
leaves out exactly one observation
• LOOCV is computationally intensive (a disadvantage)
  • We fit each model n times!
k-Fold Cross-Validation
• LOOCV is computationally intensive, so we can run k-fold cross-validation
instead
• With k-fold CV, we divide the data set into K different parts (e.g.
K = 5 or K = 10)
• We then remove the first part, fit the model on the remaining K − 1
parts, and see how good the predictions are on the left-out part (i.e.
compute the MSE on the first part)
• We then repeat this K times, taking out a different part each time
• By averaging the K different MSEs we get an estimated validation (test)
error rate for new observations
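The k-fold procedure above can be sketched analogously (the helper name `kfold_mse` is illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

def kfold_mse(x, y, degree, k=5):
    """k-fold CV estimate of test MSE for a polynomial fit (a sketch)."""
    idx = rng.permutation(len(x))
    mses = []
    for fold in np.array_split(idx, k):          # k roughly equal parts
        train = np.setdiff1d(idx, fold)          # the remaining k-1 parts
        coef = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coef, x[fold])
        mses.append(np.mean((y[fold] - pred) ** 2))
    return np.mean(mses)                         # CV(k): average of the k MSEs
```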
FIGURE 5.5. A schematic display of 5-fold CV. A set of n observations is
randomly split into five non-overlapping groups. Each of these fifths acts
as a validation set (shown in beige), and the remainder as a training set
(shown in blue). The test error is estimated by averaging the five
resulting MSE estimates.

chapters. The magic formula (5.2) does not hold in general, in which case
the model has to be refit n times.

5.1.3 k-Fold Cross-Validation
An alternative to LOOCV is k-fold CV. This approach involves randomly
dividing the set of observations into k groups, or folds, of approximately
equal size. The first fold is treated as a validation set, and the method
is fit on the remaining k − 1 folds. The mean squared error, $MSE_1$, is
then computed on the observations in the held-out fold. This procedure is
repeated k times; each time, a different group of observations is treated
as a validation set. This process results in k estimates of the test error,
$MSE_1, MSE_2, \ldots, MSE_k$. The k-fold CV estimate is computed by
averaging these values,

$$CV_{(k)} = \frac{1}{k}\sum_{i=1}^{k} MSE_i. \qquad (5.3)$$

Figure 5.5 illustrates the k-fold CV approach. It is not hard to see that
LOOCV is a special case of k-fold CV in which k = n.
Auto Data: LOOCV vs. k-Fold CV
• Left: LOOCV error curve
• Right: 10-fold CV was run many times; the figure shows the slightly
different CV error rates
• LOOCV is a special case of k-fold CV, where k = n
• They are both stable, but LOOCV is more computationally intensive!
Auto Data: Validation Set Approach vs. k-Fold CV Approach
• Left: validation set approach
• Right: 10-fold cross-validation approach
• Indeed, 10-fold CV is more stable!
Bias-Variance Trade-off for k-Fold CV
• Putting aside that LOOCV is more computationally intensive than k-fold
CV: which is better, LOOCV or k-fold CV?
• LOOCV is less biased than k-fold CV (when k < n)
• But LOOCV has higher variance than k-fold CV (when k < n)
• Thus, there is a trade-off between the two
• Conclusion:
  • We tend to use k-fold CV with K = 5 or K = 10 (the "magical" K's)
  • It has been empirically shown that these yield test error rate
estimates that suffer neither from excessively high bias nor from very
high variance
Cross-Validation on Classification Problems
• Cross-validation can be used in a classification setting in a similar
manner
• Divide the data into K parts
• Hold out one part, fit using the remaining data, and compute the error
rate on the held-out data
• Repeat K times
• The CV error rate is the average over the K errors we have computed
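A sketch of CV in a classification setting; a simple 1-nearest-neighbour classifier is assumed here since no particular classifier is specified:

```python
import numpy as np

rng = np.random.default_rng(6)

def cv_error_rate(X, y, k=5):
    """k-fold CV error rate for a 1-nearest-neighbour classifier (a sketch)."""
    idx = rng.permutation(len(y))
    errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        # 1-NN: each held-out point takes the label of its closest training point
        dists = np.linalg.norm(X[fold][:, None, :] - X[train][None, :, :], axis=2)
        pred = y[train][dists.argmin(axis=1)]
        errors.append(np.mean(pred != y[fold]))
    return np.mean(errors)                       # average of the K fold error rates
```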
Software? R
• http://www.ats.ucla.edu/stat/r/library/bootstrap.htm
• http://spark.rstudio.com/ahmed/bootstrap/
• http://spark.rstudio.com/ahmed/permutation/
Survival Data Analysis for Sekolah Tinggi Ilmu Statistik JakartaSetia Pramana
 
The Role of Statistician in Personalized Medicine: An Overview of Statistical...
The Role of Statistician in Personalized Medicine: An Overview of Statistical...The Role of Statistician in Personalized Medicine: An Overview of Statistical...
The Role of Statistician in Personalized Medicine: An Overview of Statistical...Setia Pramana
 
“Big Data” and the Challenges for Statisticians
“Big Data” and the  Challenges for Statisticians“Big Data” and the  Challenges for Statisticians
“Big Data” and the Challenges for StatisticiansSetia Pramana
 
Getting a Scholarship, how?
Getting a Scholarship, how?Getting a Scholarship, how?
Getting a Scholarship, how?Setia Pramana
 
Kehidupan sehari-hari dengan Personnummer atau SIN Single Identity Number
Kehidupan sehari-hari dengan Personnummer atau SIN Single Identity NumberKehidupan sehari-hari dengan Personnummer atau SIN Single Identity Number
Kehidupan sehari-hari dengan Personnummer atau SIN Single Identity NumberSetia Pramana
 
Research possibilities with the Personal Identification Number (person nummer...
Research possibilities with the Personal Identification Number (person nummer...Research possibilities with the Personal Identification Number (person nummer...
Research possibilities with the Personal Identification Number (person nummer...Setia Pramana
 
Developing R Graphical User Interfaces
Developing R Graphical User InterfacesDeveloping R Graphical User Interfaces
Developing R Graphical User InterfacesSetia Pramana
 
Academia vs industry
Academia vs industryAcademia vs industry
Academia vs industrySetia Pramana
 
Gene sebuah nikmat Allah
Gene sebuah nikmat AllahGene sebuah nikmat Allah
Gene sebuah nikmat AllahSetia Pramana
 

More from Setia Pramana (20)

Big data for official statistics @ Konferensi Big Data Indonesia 2016
Big data for official statistics @ Konferensi Big Data Indonesia 2016 Big data for official statistics @ Konferensi Big Data Indonesia 2016
Big data for official statistics @ Konferensi Big Data Indonesia 2016
 
Introduction to Computational Statistics
Introduction to Computational StatisticsIntroduction to Computational Statistics
Introduction to Computational Statistics
 
Bioinformatics I-4 lecture
Bioinformatics I-4 lectureBioinformatics I-4 lecture
Bioinformatics I-4 lecture
 
Correlation and Regression Analysis using SPSS and Microsoft Excel
Correlation and Regression Analysis using SPSS and Microsoft ExcelCorrelation and Regression Analysis using SPSS and Microsoft Excel
Correlation and Regression Analysis using SPSS and Microsoft Excel
 
Pengalaman Menjadi Mahasiswa Muslim di Eropa
Pengalaman Menjadi Mahasiswa Muslim di EropaPengalaman Menjadi Mahasiswa Muslim di Eropa
Pengalaman Menjadi Mahasiswa Muslim di Eropa
 
Multivariate data analysis
Multivariate data analysisMultivariate data analysis
Multivariate data analysis
 
Molecular Subtyping of Breast Cancer and Somatic Mutation Discovery Using DNA...
Molecular Subtyping of Breast Cancer and Somatic Mutation Discovery Using DNA...Molecular Subtyping of Breast Cancer and Somatic Mutation Discovery Using DNA...
Molecular Subtyping of Breast Cancer and Somatic Mutation Discovery Using DNA...
 
The Role of The Statisticians in Personalized Medicine: An Overview of Stati...
The Role of The Statisticians in Personalized Medicine:  An Overview of Stati...The Role of The Statisticians in Personalized Medicine:  An Overview of Stati...
The Role of The Statisticians in Personalized Medicine: An Overview of Stati...
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
High throughput Data Analysis
High throughput Data AnalysisHigh throughput Data Analysis
High throughput Data Analysis
 
Research Methods for Computational Statistics
Research Methods for Computational StatisticsResearch Methods for Computational Statistics
Research Methods for Computational Statistics
 
Survival Data Analysis for Sekolah Tinggi Ilmu Statistik Jakarta
Survival Data Analysis for Sekolah Tinggi Ilmu Statistik JakartaSurvival Data Analysis for Sekolah Tinggi Ilmu Statistik Jakarta
Survival Data Analysis for Sekolah Tinggi Ilmu Statistik Jakarta
 
The Role of Statistician in Personalized Medicine: An Overview of Statistical...
The Role of Statistician in Personalized Medicine: An Overview of Statistical...The Role of Statistician in Personalized Medicine: An Overview of Statistical...
The Role of Statistician in Personalized Medicine: An Overview of Statistical...
 
“Big Data” and the Challenges for Statisticians
“Big Data” and the  Challenges for Statisticians“Big Data” and the  Challenges for Statisticians
“Big Data” and the Challenges for Statisticians
 
Getting a Scholarship, how?
Getting a Scholarship, how?Getting a Scholarship, how?
Getting a Scholarship, how?
 
Kehidupan sehari-hari dengan Personnummer atau SIN Single Identity Number
Kehidupan sehari-hari dengan Personnummer atau SIN Single Identity NumberKehidupan sehari-hari dengan Personnummer atau SIN Single Identity Number
Kehidupan sehari-hari dengan Personnummer atau SIN Single Identity Number
 
Research possibilities with the Personal Identification Number (person nummer...
Research possibilities with the Personal Identification Number (person nummer...Research possibilities with the Personal Identification Number (person nummer...
Research possibilities with the Personal Identification Number (person nummer...
 
Developing R Graphical User Interfaces
Developing R Graphical User InterfacesDeveloping R Graphical User Interfaces
Developing R Graphical User Interfaces
 
Academia vs industry
Academia vs industryAcademia vs industry
Academia vs industry
 
Gene sebuah nikmat Allah
Gene sebuah nikmat AllahGene sebuah nikmat Allah
Gene sebuah nikmat Allah
 

Recently uploaded

Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 

Recently uploaded (20)

Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 

Resampling Methods Explained

  • 2. Outline • Background • Jackknife • Bootstrap • Permutation • Cross-validation
  • 3. Why do we need resampling? • The purpose of statistics is to estimate parameters and assess their reliability. Since estimators are functions of the sample points, they are random variables. If we could find the distribution of such a random variable (the sample statistic), we could assess the reliability of the estimator. • If we had the sampling distribution of the statistic, we could estimate the variance of the estimator, construct interval estimates, and even test hypotheses
  • 4. Why do we need resampling? • Unfortunately, apart from the simplest cases, the sampling distribution is not easy to derive. • What is the sampling distribution of: • The time since the most recent common ancestor of all humans? • The adjusted R-squared? • The AIC? • The beta coefficient when independence is violated? • The number of connections in a neural net? • The eigenvalues of PCA? • A bifurcation point in a phylogenetic tree?
  • 5. Why do we need resampling? • Unfortunately, apart from the simplest cases, the sampling distribution is not easy to derive. There are several techniques for approximating these distributions, e.g., the Laplace approximation. These approximations give an analytical form for the approximate distribution. With the advent of computers, more computationally intensive methods have emerged; in many cases they work satisfactorily.
  • 6. Why do we need resampling? • The t-distribution and chi-squared distribution are good approximations for sufficiently large and/or normally distributed samples. • However, when the data come from an unknown distribution or the sample size is small, resampling tests are recommended.
  • 7. Resampling Methods • Jackknife • Bootstrap • Permutation • Cross-validation
  • 8. Method, application, and sampling procedure for each resampling method: • Bootstrap: standard deviation, confidence interval, hypothesis testing, bias; samples drawn at random, with replacement • Jackknife: standard deviation, confidence interval, bias; samples consist of the full data set with one observation left out • Permutation: hypothesis testing; samples drawn at random, without replacement • Cross-validation: model validation; data is randomly divided into two or more subsets, with results validated across sub-samples
  • 10. Jackknife The jackknife is used for bias removal. The mean-square error of an estimator equals the square of its bias plus its variance, so if the bias is much larger than the variance, the jackknife can be used under some circumstances. Description of the jackknife: assume we have a sample of size n. We estimate the sample statistic using all the data, t_n. Then, removing one point at a time, we estimate t_{n-1,i}, where the subscripts indicate the size of the sample and the index of the removed sample point. The new estimator is t' = n t_n - (n-1) t̄_{n-1}, where t̄_{n-1} = (1/n) Σ_{i=1}^{n} t_{n-1,i}. If the order of the bias of the statistic t_n is O(n⁻¹), then after the jackknife the order of the bias becomes O(n⁻²). The variance is estimated using V̂_J = ((n-1)/n) Σ_{i=1}^{n} (t_{n-1,i} - t̄_{n-1})². This procedure can be applied iteratively, i.e. the jackknife can be applied again to the new estimator. The first application of the jackknife can reduce bias without changing the variance of the estimator, but second and higher-order applications can in general increase the variance.
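As a sketch of the procedure above (not from the slides), the jackknife can be applied to the plug-in variance, a statistic whose O(1/n) bias it happens to remove exactly:

```python
from statistics import mean

def jackknife(data, stat):
    """Return the jackknife bias-corrected estimate t' and variance V_J."""
    n = len(data)
    t_n = stat(data)                                         # estimate from the full sample
    loo = [stat(data[:i] + data[i + 1:]) for i in range(n)]  # leave-one-out estimates t_{n-1,i}
    t_bar = mean(loo)
    t_corrected = n * t_n - (n - 1) * t_bar                  # t' = n*t_n - (n-1)*t_bar
    var_j = (n - 1) / n * sum((t - t_bar) ** 2 for t in loo)
    return t_corrected, var_j

def plugin_var(xs):
    """Plug-in variance (divides by n), whose bias is O(1/n)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

data = [2.0, 4.0, 6.0, 8.0, 10.0]   # toy data, chosen for easy hand-checking
t_prime, v_j = jackknife(data, plugin_var)
# For the plug-in variance the correction is exact: t_prime equals the
# unbiased sample variance (divide by n-1), which is 10.0 for this data.
```

Here the plug-in variance of the data is 8.0, and the jackknife recovers the unbiased value 10.0, illustrating the O(n⁻¹)-to-O(n⁻²) bias reduction.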
  • 12. The bootstrap • 1969 Simon publishes the bootstrap as an example in Basic Research Methods in Social Science (the earlier pigfood example) • 1979 Efron names and publishes first paper on the bootstrap
  • 13. Bootstrap (Nonparametric) We have a random sample x = (x₁, x₂, ..., xₙ) from an unknown PDF, F. We want to estimate θ = t(F) based on x. We calculate the estimate θ̂ = s(x) based on x, and want to know how accurate θ̂ is.
  • 14. Bootstrap (Nonparametric) Notation: Random sample: x = (x₁, x₂, ..., xₙ). Empirical distribution F̂: places mass 1/n at each observed data value. Bootstrap sample: a random sample of size n, drawn from F̂, denoted x* = (x₁*, x₂*, ..., xₙ*). Bootstrap replicate of θ̂: θ̂* = s(x*).
  • 15. Bootstrap (Nonparametric) Bootstrap steps: 1. Select a bootstrap sample x* = (x₁*, x₂*, ..., xₙ*) consisting of n data values drawn with replacement from the original data set. 2. Evaluate θ̂* = s(x*) for the bootstrap sample. 3. Repeat steps 1 and 2 B times. 4. Estimate the standard error se_F̂(θ̂) by the sample standard deviation of the B replications: SE_B = sqrt( (1/(B-1)) Σ_{i=1}^{B} (θ̂*(i) - θ̂*(·))² ), where θ̂*(·) is the mean of the B replicates.
  • 16. The Bootstrap • A new pigfood ration is tested on twelve pigs, with six-week weight gains as follows: • 496 544 464 416 512 560 608 544 480 466 512 496 • Mean: 508 ounces (establish a confidence interval)
  • 17. Draw simulated samples from a hypothetical universe that embodies all we know about the universe that this sample came from – our sample, replicated an infinite number of times The Classic Bootstrap
  • 18. 1. Put the observed weight gains in a hat 2. Sample 12 with replacement 3. Record the mean 4. Repeat steps 2-3, say, 1000 times 5. Record the 5th and 95th percentiles (for a 90% confidence interval)
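The five steps above can be sketched in Python (a minimal illustration using the weight gains listed earlier; B = 1000 replicates is an assumption of the sketch):

```python
import random
from statistics import mean

random.seed(0)
gains = [496, 544, 464, 416, 512, 560, 608, 544, 480, 466, 512, 496]

B = 1000
boot_means = sorted(
    mean(random.choices(gains, k=len(gains)))  # steps 1-3: sample 12 with replacement, record the mean
    for _ in range(B)                          # step 4: repeat 1000 times
)

# step 5: the 5th and 95th percentiles give a 90% confidence interval
lo, hi = boot_means[int(0.05 * B)], boot_means[int(0.95 * B)]
```

`random.choices` draws with replacement, which is exactly the "sample from the hat" step; sorting the replicates makes reading off percentiles trivial.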
  • 20. Parametric Bootstrap Resampling makes no assumptions about the population distribution; the bootstrap covered thus far is a nonparametric bootstrap. If we have information about the population distribution, this can be used in resampling: instead of drawing randomly from the sample, we draw from the assumed population distribution. For example, if we know that the population distribution is normal, we estimate its parameters using the sample mean and variance, approximate the population distribution with the fitted normal, and use it to draw new samples.
  • 21. Parametric Bootstrap As expected, if the assumption about population distribution is correct then the parametric bootstrap will perform better than the nonparametric bootstrap. If not correct, then the nonparametric bootstrap will perform better.
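A minimal parametric-bootstrap sketch (the data values and the normality assumption here are hypothetical, not from the slides):

```python
import random
from statistics import mean, stdev

random.seed(1)
sample = [5.1, 4.8, 6.0, 5.5, 4.9, 5.3, 5.8, 5.0]   # hypothetical measurements
mu, sigma = mean(sample), stdev(sample)             # fit a normal to the sample

B, n = 1000, len(sample)
boot_means = []
for _ in range(B):
    # draw each bootstrap sample from the fitted N(mu, sigma),
    # not from the empirical distribution of the data
    star = [random.gauss(mu, sigma) for _ in range(n)]
    boot_means.append(mean(star))

se_hat = stdev(boot_means)   # parametric bootstrap SE of the sample mean
```

The only change from the nonparametric version is the sampling line: `random.gauss(mu, sigma)` replaces drawing with replacement from the observed values.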
  • 22. Example of Bootstrap (Nonparametric) Have test scores (out of 100) for two consecutive years for each of 60 subjects. Want to obtain the correlation between the test scores and the variance of the correlation estimate. Can use bootstrap to obtain the variance estimate.
  • 23. How many Bootstrap Replications, B? • A fairly small number, B=25, is sufficient to be “informative” (Efron) • B=50 is typically sufficient to provide a crude estimate of the SE, but B>200 is generally used. • CIs require larger values of B, no less than 500, with B=1000 recommended.
  • 25. Permutation Tests • In classical hypothesis testing, we start with assumptions about the underlying distribution and then derive the sampling distribution of the test statistic under H0. • In permutation testing, the initial assumptions are not needed (except exchangeability), and the sampling distribution of the test statistic under H0 is computed by using permutations of the data.
  • 26. Permutation Tests (example) • The Permutation test is a technique that bases inference on “experiments” within the observed dataset. • Consider the following example: • In a medical experiment, rats are randomly assigned to a treatment (Tx) or control (C) group. • The outcome Xi is measured in the ith rat.
  • 27. • Under H0, the outcome does not depend on whether a rat carries the label Tx or C. • Under H1, the outcome tends to be different, say larger, for rats labeled Tx. • A test statistic T measures the difference in observed outcomes for the two groups. T may be the difference in the two group means (or medians), denoted as t for the observed data. Permutation Tests (example)
  • 28. • Under H0, the individual labels of Tx and C are unimportant, since they have no impact on the outcome. Since they are unimportant, the labels can be randomly shuffled among the rats without changing the joint null distribution of the data. • Shuffling the labels creates a “new” dataset: it has the same rats, but with the group labels changed so as to appear as if there were different group assignments. Permutation Tests (example)
  • 29. • Let t be the value of the test statistic from the original dataset. • Let t₁ be the value of the test statistic computed from one dataset with permuted labels. • Consider all M possible permutations of the labels, obtaining the test statistics t₁, …, t_M. • Under H0, t₁, …, t_M are all generated from the same underlying distribution that generated t. Permutation Tests (example)
  • 30. • Thus, t can be compared to the permuted-data test statistics, t₁, …, t_M, to test the hypothesis and obtain a p-value or to construct confidence limits for the statistic. Permutation Tests (example)
  • 31. • Survival times • Treated mice 94, 38, 23, 197, 99, 16, 141 • Mean: 86.8 • Untreated mice 52, 10, 40, 104, 51, 27, 146, 30, 46 • Mean: 56.2 (Efron & Tibshirani) Permutation Tests (example)
  • 32. Calculate the difference between the means of the two observed samples – it’s 30.6 days in favor of the treated mice. Consider the two samples combined (16 observations) as the relevant universe to resample from. Permutation Tests (example)
  • 33. • Draw 7 hypothetical observations and designate them "Treatment"; draw 9 hypothetical observations and designate them "Control". • Compute and record the difference between the means of the two samples. Permutation Tests (example)
  • 34. • Repeat steps 3 and 4 perhaps 1000 times. • Determine how often the resampled difference exceeds the observed difference of 30.6. Permutation Tests (example)
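The mouse example above can be sketched as a Monte Carlo permutation test (1000 random relabelings rather than full enumeration):

```python
import random
from statistics import mean

random.seed(2)
treated = [94, 38, 23, 197, 99, 16, 141]            # mean about 86.8
control = [52, 10, 40, 104, 51, 27, 146, 30, 46]    # mean about 56.2
observed = mean(treated) - mean(control)            # about 30.6 in favor of treatment

combined = treated + control                        # the combined "universe" of 16 values
M = 1000
count = 0
for _ in range(M):
    random.shuffle(combined)                        # random relabeling of the 16 mice
    perm_diff = mean(combined[:7]) - mean(combined[7:])
    if perm_diff >= observed:
        count += 1
p_value = count / M   # fraction of permuted differences at least as large as observed
```

Shuffling and splitting 7/9 is equivalent to drawing without replacement, matching the "draw 7, designate Treatment; draw 9, designate Control" steps.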
  • 35. Histogram of permuted differences
  • 36. • If the group means are truly equal, then shuffling the group labels will not have a big impact on the sum of the two groups (or the mean, with equal sample sizes). Some group sums will be larger than in the original data set and some will be smaller. Permutation Tests (example)
  • 37. Permutation Test Example 1 • 16!/(16-7)! = 57,657,600 • When the dataset is too large to enumerate all permutations, a large number of random permutations is selected instead. • When all permutations are enumerated, this is an exact permutation test.
  • 38. What Are Permutation Tests? • Permutation tests are significance tests based on permutation resamples drawn at random from the original data. Permutation resamples are drawn without replacement. • Also called randomization tests, re-randomization tests, or exact tests. • Introduced by R.A. Fisher and E.J.G. Pitman in the 1930s.
  • 39. When Can We Use Permutation Tests? • Only when we can see how to resample in a way that is consistent with the study design and with the null hypothesis. • If we cannot do a permutation test, we can often calculate a bootstrap confidence interval instead.
  • 40. Advantages • Permutation tests exist for any test statistic, regardless of whether or not its distribution is known • We are free to choose the statistic which best discriminates between hypothesis and alternative and which minimizes losses • They can be used for: analyzing unbalanced designs; combining dependent tests on mixtures of categorical, ordinal, and metric data.
  • 41. Limitations • An important assumption: the observations are exchangeable under the null hypothesis • A consequence: tests of difference in location (like a permutation t-test) require equal variance • In this respect, the permutation t-test shares the same weakness as the classical Student’s t-test.
  • 42. Procedure of Permutation Tests • I. Analyze the problem: What is the hypothesis and alternative? What distribution is the data drawn from? What losses are associated with bad decisions? • II. Choose a test statistic which will distinguish the hypothesis from the alternative. • III. Compute the test statistic for the original data of the observations.
  • 43. Procedure of Permutation Tests • IV. Rearrange the observations: compute the test statistic for all possible permutations (rearrangements) of the data. • V. Make a decision: reject the hypothesis if the value of the test statistic for the original data is an extreme value in the permutation distribution of the statistic; otherwise, accept the hypothesis and reject the alternative.
  • 44. Permutation Resampling Process (diagram): collect data from the control and treatment groups; merge the samples to form a pseudo-population; sample without replacement from the pseudo-population to simulate new control and treatment groups; compute the target statistic (here, the median of each group) for each resample; compute the “difference statistic”, save the result in a table, and repeat the resampling process for 1000+ iterations.
  • 45. Example: “I Lost the Labels” A physiology experiment to find the relationship between Vitamin E and human life extension.
  • 46. Example: “I Lost the Labels” • 6 petri dishes: • 3 containing standard medium • 3 containing standard medium + Vitamin E • Without the labels, we have no way of knowing which cell cultures have been treated with Vitamin E and which have not. • There are six results, “121, 118, 110, 34, 12, 22”, one from each petri dish. • Which numbers belong to which dishes?
  • 47. 47 Here is a simple sample:
  • 48. Using the original t-test to find the p-value
  • 49. Permutation Resampling Process (diagram): merge the six observations 121, 118, 110, 34, 12, 22 into a pseudo-population; sample without replacement to split them into two groups of three (one such relabeling gives group statistics of 91 and 48); compute the “difference statistic”, save the result in a table, and repeat the resampling process for 1000+ iterations.
  • 53. What is the conclusion? • Test decision: we count the permutations for which the absolute value of the test statistic is at least as large as the t = 13.0875 obtained for the original labeling. • We obtain the exact p-value p = 2/20 = 0.1. • Note: if both groups have equal size, only half of the permutations is really needed (by symmetry).
  • 54. What are resampling methods? • Tools that involve repeatedly drawing samples from a training set and refitting a model of interest on each sample, in order to obtain more information about the fitted model • Model Assessment: estimate test error rates • Model Selection: select the appropriate level of model flexibility • They are computationally expensive! But these days we have powerful computers • Two resampling methods: • Cross Validation • Bootstrapping (IOM 530: Intro. to Statistical Learning)
  • 56. Cross-validation • Cross-validation is a resampling technique to overcome overfitting.
  • 57. The Validation Set Approach • Suppose that we would like to find a set of variables that gives the lowest test (not training) error rate • If we have a large data set, we can achieve this goal by randomly splitting the data into training and validation (testing) parts • We would then use the training part to build each possible model (i.e. the different combinations of variables) and choose the model that gives the lowest error rate when applied to the validation data (diagram: the data split into Training Data and Testing Data)
  • 58. Example: Auto Data • Suppose that we want to predict mpg from horsepower • Two models: • mpg ~ horsepower • mpg ~ horsepower + horsepower² • Which model gives a better fit? • Randomly split the Auto data set into training (196 obs.) and validation (196 obs.) parts • Fit both models using the training data set • Then evaluate both models using the validation data set • The model with the lowest validation (testing) MSE is the winner!
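A sketch of the validation-set approach on synthetic data (the real Auto data set is not reproduced here; the linear trend and noise level are assumptions), comparing an intercept-only model against a simple linear fit:

```python
import random

random.seed(3)
# synthetic data with a linear trend plus noise (assumption, not the Auto data)
xs = [i / 10 for i in range(100)]
ys = [2.0 + 0.5 * x + random.gauss(0, 0.3) for x in xs]

pairs = list(zip(xs, ys))
random.shuffle(pairs)
train, valid = pairs[:50], pairs[50:]        # random 50/50 split

def fit_linear(data):
    """Closed-form least-squares fit y = a + b*x."""
    n = len(data)
    sx = sum(x for x, _ in data)
    sy = sum(y for _, y in data)
    sxx = sum(x * x for x, _ in data)
    sxy = sum(x * y for x, y in data)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return lambda x: a + b * x

def mse(model, data):
    return sum((y - model(x)) ** 2 for x, y in data) / len(data)

mean_y = sum(y for _, y in train) / len(train)
mse_const = mse(lambda x: mean_y, valid)     # intercept-only model
mse_lin = mse(fit_linear(train), valid)      # linear model
# the model with the lower validation MSE is preferred
```

Both models are fit on the training half only and scored on the held-out half, mirroring the train/validate split described for the Auto data.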
  • 59. Results: Auto Data • Left: Validation error rate for a single split • Right: Validation method repeated 10 times, each time the split is done randomly! • There is a lot of variability among the MSE’s… Not good! We need more stable methods! IOM 530: Intro. to Statistical Learning 59
  • 60. Leave-One-Out Cross Validation (LOOCV) • This method is similar to the Validation Set Approach, but it tries to address the latter’s disadvantages • For each suggested model, do: • Split the data set of size n into a training data set (size n - 1) and a validation data set (size 1) • Fit the model using the training data • Validate the model using the validation data, and compute the corresponding MSE_i = (y_i - ŷ_i)² • Repeat this process n times • The LOOCV estimate of the test MSE is the average of the n test error estimates: CV(n) = (1/n) Σ_{i=1}^{n} MSE_i • Since (x_i, y_i) is not used in the fit that predicts it, each MSE_i is an approximately unbiased, but highly variable, estimate of the test error (Figure 5.3 of ISLR shows a schematic of LOOCV: a set of n data points is repeatedly split into a training set containing all but one observation and a validation set containing only that observation)
• 61. LOOCV vs. the Validation Set Approach
• LOOCV has less bias
  • We repeatedly fit the statistical learning method using training data that contains n - 1 obs., i.e. almost all of the data set is used
• LOOCV produces a less variable MSE estimate across repeated runs
  • The validation approach produces a different MSE each time it is applied, due to randomness in the splitting process, while performing LOOCV multiple times will always yield the same results, because each split leaves out exactly 1 obs.
• LOOCV is computationally intensive (disadvantage)
  • We fit each model n times!
• 62. k-fold Cross Validation
• LOOCV is computationally intensive, so we can run k-fold Cross Validation instead
• With k-fold Cross Validation, we divide the data set into K different parts (e.g. K = 5, or K = 10, etc.)
• We then remove the first part, fit the model on the remaining K - 1 parts, and see how good the predictions are on the left-out part (i.e. compute the MSE on the first part)
• We then repeat this K different times, taking out a different part each time
• By averaging the K different MSE's we get an estimated validation (test) error rate for new observations:
  CV(k) = (1/k) * Σ_{i=1}^{k} MSE_i     (5.3)
[Figure 5.5 (ISLR): A schematic display of 5-fold CV. A set of n observations is randomly split into five non-overlapping groups. Each of these fifths acts in turn as a validation set (beige), and the remainder as a training set (blue). The test error is estimated by averaging the five resulting MSE estimates. LOOCV is the special case of k-fold CV in which k = n.]
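The k-fold procedure above can be sketched the same way. This is an illustrative pure-Python version (the names `fit_line` and `kfold_mse` are my own, not the slides'): indices are shuffled once, partitioned into k non-overlapping folds, and the k held-out-fold MSE's are averaged as in (5.3).

```python
# Minimal k-fold CV sketch (illustrative names, not from the slides).
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x, in pure Python."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def kfold_mse(xs, ys, k=5, seed=0):
    """CV(k) = (1/k) * sum of the k held-out-fold MSE's."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)        # random, non-overlapping folds
    folds = [idx[j::k] for j in range(k)]
    fold_mses = []
    for fold in folds:
        # Fit on the other k-1 folds, score on the held-out fold.
        train = [i for i in idx if i not in fold]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        mse = sum((ys[i] - (a + b * xs[i])) ** 2 for i in fold) / len(fold)
        fold_mses.append(mse)
    return sum(fold_mses) / k

xs = [float(i) for i in range(10)]
ys = [0.9 * x + 0.5 + (0.1 if i % 2 else -0.1) for i, x in enumerate(xs)]
print(kfold_mse(xs, ys, k=5))
```

Unlike LOOCV, the model is refit only k times rather than n times, which is why K = 5 or K = 10 is the usual practical choice.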
• 63. K-fold Cross Validation
• 64. Auto Data: LOOCV vs. K-fold CV
• Left: LOOCV error curve
• Right: 10-fold CV was run many times; the figure shows the slightly different CV error rates
• LOOCV is a special case of k-fold CV, where k = n
• They are both much more stable than the validation set approach, but LOOCV is more computationally intensive!
• 65. Auto Data: Validation Set Approach vs. K-fold CV Approach
• Left: Validation Set Approach
• Right: 10-fold Cross Validation Approach
• Indeed, 10-fold CV is more stable!
• 66. Bias-Variance Trade-off for k-fold CV
• Putting aside that LOOCV is more computationally intensive than k-fold CV… which is better, LOOCV or k-fold CV?
  • LOOCV is less biased than k-fold CV (when k < n)
  • But LOOCV has higher variance than k-fold CV (when k < n)
  • Thus, there is a trade-off between what to use
• Conclusion:
  • We tend to use k-fold CV with K = 5 or K = 10
  • These are the magical K's!
  • It has been empirically shown that they yield test error rate estimates that suffer neither from excessively high bias nor from very high variance
• 67. Cross Validation on Classification Problems
• Cross validation can be used in a classification setting in a similar manner:
  • Divide the data into K parts
  • Hold out one part, fit using the remaining data, and compute the error rate on the held-out part
  • Repeat K times
  • The CV error rate is the average over the K errors we have computed
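The classification recipe above differs from the regression case only in the loss: a misclassification rate replaces the MSE. A minimal sketch, using a toy 1-nearest-neighbour classifier on a 1-D feature (the names `nn_predict` and `cv_error_rate` are my own, not from the slides):

```python
# K-fold CV for classification (illustrative sketch, not from the slides).
def nn_predict(train, x):
    """1-nearest-neighbour on a 1-D feature: label of the closest point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def cv_error_rate(points, k=5):
    """K-fold CV misclassification rate, averaged over the K held-out parts."""
    folds = [points[j::k] for j in range(k)]
    rates = []
    for fold in folds:
        # Fit (here: memorize) the other parts, score on the held-out part.
        train = [p for p in points if p not in fold]
        wrong = sum(nn_predict(train, x) != y for x, y in fold)
        rates.append(wrong / len(fold))
    return sum(rates) / k

# Two well-separated classes on the number line (labels 0 and 1).
data = [(float(i), 0) for i in range(5)] + [(float(i + 10), 1) for i in range(5)]
print(cv_error_rate(data, k=5))
```

With clearly separated classes, every held-out point is classified correctly and the CV error rate is 0; overlapping classes would push the average rate up.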
• 68. Software? R
• http://www.ats.ucla.edu/stat/r/library/bootstrap.htm
• http://spark.rstudio.com/ahmed/bootstrap/
• http://spark.rstudio.com/ahmed/permutation/