Short Guides to Microeconometrics
Spring 2012
Kurt Schmidheiny
Universität Basel
The Bootstrap
1 Introduction
The bootstrap is a method to derive properties (standard errors, confidence intervals and critical values) of the sampling distribution of estimators. It is very similar to Monte Carlo techniques (see the corresponding handout). However, instead of fully specifying the data generating process (DGP), we use information from the sample.
In short, the bootstrap takes the sample (the values of the independent and dependent variables) as the population and the estimates from the sample as the true values. Instead of drawing from a specified distribution (such as the normal) with a random number generator, the bootstrap draws with replacement from the sample. It therefore takes the empirical distribution function (the step function) as the true distribution function. In the example of a linear regression model, the sample provides the empirical distribution of the dependent variable, the independent variables and the error term, as well as values for the constant, slope and error variance. The great advantage compared to Monte Carlo methods is that we make assumptions neither about the distributions nor about the true values of the parameters.
The bootstrap is typically used for consistent but biased estimators. In most cases we know the asymptotic properties of these estimators, so we could use asymptotic theory to derive the approximate sampling distribution. That is what we usually do when using, for example, maximum likelihood estimators. The bootstrap is an alternative way to produce approximations for the true small sample properties. So why (or when) would we use the bootstrap? There are two main reasons:
Version: 9-4-2012, 16:09
1a) The asymptotic sampling distribution is very difficult to derive.
1b) The asymptotic sampling distribution is too difficult to derive for me. This might apply to many multi-stage estimators. Example: the two-stage estimator of the Heckman sample selection model.
1c) The asymptotic sampling distribution is too time-consuming and error-prone for me to derive. This might apply to forecasts or statistics that are (nonlinear) functions of the estimated model parameters. Example: elasticities calculated from slope coefficients.
2) The bootstrap produces “better” approximations for some properties. It can be shown that bootstrap approximations converge faster for certain statistics (see footnote 1) than the approximations based on asymptotic theory. These bootstrap approximations are called asymptotic refinements. Example: the t-statistic of a mean or a slope coefficient.
Note that both asymptotic theory and the bootstrap only provide approximations for finite sample properties. The bootstrap produces consistent approximations for the sampling distribution for a variety of estimators such as the mean, the median, the coefficients in OLS and most econometric models. However, there are estimators (e.g. the maximum) for which the bootstrap fails to produce consistent approximations.
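One way to see why the maximum is problematic: a bootstrap sample of size N reproduces the sample maximum exactly whenever the largest observation is drawn at least once. Assuming the maximum is unique, this happens with probability 1 − (1 − 1/N)^N, which tends to 1 − 1/e ≈ 0.632 rather than to zero. The bootstrap distribution therefore keeps a large point mass at the sample maximum. The following Python sketch (the function name is hypothetical, chosen for illustration) evaluates this probability for several sample sizes.

```python
import math


def prob_resample_contains_max(n: int) -> float:
    """Probability that a size-n resample drawn with replacement contains
    the single largest observation at least once (unique maximum assumed)."""
    return 1.0 - (1.0 - 1.0 / n) ** n


# The probability does not vanish as the sample size grows.
for n in (10, 100, 1000, 10000):
    print(n, round(prob_resample_contains_max(n), 4))

# Limiting value 1 - 1/e for comparison.
print("limit", round(1.0 - math.exp(-1.0), 4))
```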
This handout covers the nonparametric bootstrap with paired sampling. This method is appropriate for randomly sampled cross-section data. Data from complex random sampling procedures (e.g. stratified sampling) require special attention; see the handout on “Clustering”. Time-series data and panel data also require more sophisticated bootstrap techniques.
1 These statistics are called asymptotically pivotal, i.e. their asymptotic distributions are independent of the data and of the true parameter values. This applies, for example, to all statistics with the standard normal or chi-squared as limiting distribution.
2 The Method: Nonparametric Bootstrap
2.1 Bootstrap Samples
Consider a sample of i = 1, ..., N independent observations of a dependent variable y and K + 1 explanatory variables x. A paired bootstrap sample is obtained by independently drawing N pairs $(x_i, y_i)$ from the observed sample with replacement. The bootstrap sample has the same number of observations as the original sample; however, some observations appear several times and others never. The bootstrap involves drawing a large number B of bootstrap samples. An individual bootstrap sample is denoted $(x_b^*, y_b^*)$, where $x_b^*$ is an $N \times (K+1)$ matrix and $y_b^*$ an N-dimensional column vector of the data in the b-th bootstrap sample.
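As a minimal sketch of paired sampling, the following Python snippet draws N row indices with replacement and applies the same indices to x and y, so each observation keeps its own regressor and outcome. The function name, the toy data and the seed are assumptions made for illustration; a single regressor stands in for the full x matrix.

```python
import random


def paired_bootstrap_sample(x, y, rng):
    """Draw N (x_i, y_i) pairs with replacement, keeping each pair intact."""
    n = len(y)
    idx = [rng.randrange(n) for _ in range(n)]  # row indices, repeats allowed
    return [x[i] for i in idx], [y[i] for i in idx]


rng = random.Random(12345)  # fixed seed only to make the run reproducible
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # toy data, for illustration only
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.3]

x_star, y_star = paired_bootstrap_sample(x, y, rng)
# Same size as the original sample; some pairs repeat, others are missing.
print(list(zip(x_star, y_star)))
```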
2.2 Bootstrap Standard Errors
The empirical standard deviation of a series of bootstrap replications of $\hat{\theta}$ can be used to approximate the standard error $se(\hat{\theta})$ of an estimator $\hat{\theta}$.
1. Draw B independent bootstrap samples $(x_b^*, y_b^*)$ of size N from (x, y). Usually B = 100 replications are sufficient.
2. Estimate the parameter θ of interest for each bootstrap sample: $\hat{\theta}_b^*$ for b = 1, ..., B.
3. Estimate $se(\hat{\theta})$ by
$$\widehat{se} = \sqrt{\frac{1}{B-1} \sum_{b=1}^{B} \left(\hat{\theta}_b^* - \bar{\hat{\theta}}^*\right)^2}$$
where $\bar{\hat{\theta}}^* = \frac{1}{B} \sum_{b=1}^{B} \hat{\theta}_b^*$.
The whole covariance matrix $V(\hat{\theta})$ of a vector $\hat{\theta}$ is estimated analogously.
If the estimator $\hat{\theta}$ is consistent and asymptotically normally distributed, bootstrap standard errors can be used to construct approximate confidence intervals and to perform asymptotic tests based on the normal distribution.
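The three steps for the bootstrap standard error can be sketched in Python for the slope of a univariate regression. This is an illustration under stated assumptions, not a reference implementation: the helper names, the simulated data and the seed are made up, and only the standard library is used.

```python
import math
import random


def ols_slope(x, y):
    """OLS slope of y on a constant and a single regressor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def bootstrap_se(x, y, estimator, B, rng):
    """Empirical standard deviation of B paired-bootstrap replications."""
    n = len(y)
    reps = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]            # step 1
        reps.append(estimator([x[i] for i in idx],
                              [y[i] for i in idx]))           # step 2
    mean_rep = sum(reps) / B
    return math.sqrt(sum((r - mean_rep) ** 2 for r in reps) / (B - 1))  # step 3


rng = random.Random(2012)
# Simulated data, for illustration only: y = 1 + 0.5 x + noise.
x = [rng.uniform(0.0, 10.0) for _ in range(50)]
y = [1.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]

slope = ols_slope(x, y)
se_slope = bootstrap_se(x, y, ols_slope, 100, rng)
print(slope, se_slope)
```

Passing the estimator as a function keeps the resampling loop generic, so the same loop could be reused for other statistics computed from (x, y).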
2.3 Confidence Intervals Based on Bootstrap Percentiles
We can construct a two-sided equal-tailed (1 − α) confidence interval for an estimate $\hat{\theta}$ from the empirical distribution function of a series of bootstrap replications. The (α/2) and the (1 − α/2) empirical percentiles of the bootstrap replications are used as lower and upper confidence bounds. This procedure is called the percentile bootstrap.
1. Draw B independent bootstrap samples $(x_b^*, y_b^*)$ of size N from (x, y). It is recommended to use B = 1000 or more replications.
2. Estimate the parameter θ of interest for each bootstrap sample: $\hat{\theta}_b^*$ for b = 1, ..., B.
3. Order the bootstrap replications of $\hat{\theta}$ such that $\hat{\theta}_1^* \le \dots \le \hat{\theta}_B^*$. The lower and upper confidence bounds are the B · α/2-th and B · (1 − α/2)-th ordered elements, respectively. For B = 1000 and α = 5% these are the 25th and 975th ordered elements. The estimated (1 − α) confidence interval of $\hat{\theta}$ is
$$\left[\hat{\theta}^*_{B \cdot \alpha/2},\ \hat{\theta}^*_{B \cdot (1-\alpha/2)}\right].$$
Note that these confidence intervals are in general not symmetric.
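The ordering rule in step 3 can be sketched in Python as follows. The example bootstraps the median of a simulated skewed sample. The function name, the data and the seed are assumptions for illustration, and the index rule is intended for cases where B · α/2 is an integer, as in the handout's example.

```python
import random
import statistics


def percentile_interval(replications, alpha=0.05):
    """Return the B*alpha/2-th and B*(1-alpha/2)-th ordered replications
    (1-based positions, as in step 3)."""
    ordered = sorted(replications)
    B = len(ordered)
    k_lo = max(1, round(B * alpha / 2))
    k_hi = min(B, round(B * (1 - alpha / 2)))
    return ordered[k_lo - 1], ordered[k_hi - 1]   # Python lists are 0-based


rng = random.Random(2012)
# Simulated skewed sample, for illustration only.
sample = [rng.expovariate(1.0) for _ in range(40)]

B = 1000
reps = []
for _ in range(B):
    resample = [sample[rng.randrange(len(sample))] for _ in range(len(sample))]
    reps.append(statistics.median(resample))

lower, upper = percentile_interval(reps, alpha=0.05)
print(statistics.median(sample), lower, upper)
```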
2.4 Bootstrap Hypothesis Tests
The approximate confidence interval in section 2.3 can be used to perform an approximate two-sided test of a null hypothesis of the form $H_0: \theta = \theta_0$. The null hypothesis is rejected at the significance level α if $\theta_0$ lies outside the two-tailed (1 − α) confidence interval.
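The decision rule amounts to a single comparison. The Python sketch below uses a hypothetical helper and hypothetical interval bounds that are not results from any data; treating a value exactly on a bound as inside the interval is a convention chosen here.

```python
def reject_null(theta0, lower, upper):
    """Two-sided test: reject H0: theta = theta0 if theta0 lies outside
    the (1 - alpha) confidence interval [lower, upper]."""
    return theta0 < lower or theta0 > upper


# Hypothetical 95% percentile interval, for illustration only.
lower, upper = 0.42, 0.61

print(reject_null(0.0, lower, upper))   # theta0 = 0.0 lies outside the interval
print(reject_null(0.5, lower, upper))   # theta0 = 0.5 lies inside the interval
```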
2.5 The bootstrap-t
Assume that we have consistent estimates of θ and $se(\hat{\theta})$ at hand and that the asymptotic distribution of the t-statistic is the standard normal
$$t = \frac{\hat{\theta} - \theta_0}{se(\hat{\theta})} \xrightarrow{d} N(0, 1).$$
Then we can calculate approximate critical values from percentiles of the empirical distribution of a series of bootstrap replications for the t-statistic.
1. Consistently estimate θ and $se(\hat{\theta})$ using the observed sample: $\hat{\theta}$, $se(\hat{\theta})$
2. Draw B independent bootstrap samples $(x_b^*, y_b^*)$ of size N from (x, y). It is recommended to use B = 1000 or more replications.
3. Estimate the t-value assuming $\theta_0 = \hat{\theta}$ for each bootstrap sample:
$$t_b^* = \frac{\hat{\theta}_b^* - \hat{\theta}}{se_b^*(\hat{\theta})} \quad \text{for } b = 1, ..., B$$
where $\hat{\theta}_b^*$ and $se_b^*(\hat{\theta})$ are estimates of the parameter θ and its standard error using the bootstrap sample.
4. Order the bootstrap replications of t such that $t_1^* \le \dots \le t_B^*$. The lower and upper critical values are then the B · α/2-th and B · (1 − α/2)-th elements, respectively. For B = 1000 and α = 5% these are the 25th and 975th ordered elements.
$$t_{\alpha/2} = t^*_{B \cdot \alpha/2}, \quad t_{1-\alpha/2} = t^*_{B \cdot (1-\alpha/2)}$$
These critical values can now be used in otherwise usual t-tests for θ.
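The four steps can be sketched in Python for the simplest case, the mean of a sample with standard error s/√N. The helper names, the simulated data, the seed and the tested value θ0 = 1 are assumptions for illustration only.

```python
import math
import random


def mean_and_se(sample):
    """Sample mean and its usual standard error s / sqrt(n)."""
    n = len(sample)
    mean = sum(sample) / n
    variance = sum((v - mean) ** 2 for v in sample) / (n - 1)
    return mean, math.sqrt(variance / n)


def bootstrap_t_replications(sample, B, rng):
    """t*_b = (theta*_b - theta_hat) / se*_b for b = 1, ..., B (steps 1 to 3)."""
    theta_hat, _ = mean_and_se(sample)
    n = len(sample)
    t_reps = []
    for _ in range(B):
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        theta_b, se_b = mean_and_se(resample)  # se_b > 0 unless all draws coincide
        t_reps.append((theta_b - theta_hat) / se_b)
    return t_reps


def equal_tailed_critical_values(t_reps, alpha=0.05):
    """B*alpha/2-th and B*(1-alpha/2)-th ordered t replications (step 4)."""
    ordered = sorted(t_reps)
    B = len(ordered)
    k_lo = max(1, round(B * alpha / 2))
    k_hi = min(B, round(B * (1 - alpha / 2)))
    return ordered[k_lo - 1], ordered[k_hi - 1]


rng = random.Random(2012)
sample = [rng.expovariate(1.0) for _ in range(30)]  # simulated skewed data
t_reps = bootstrap_t_replications(sample, 1000, rng)
t_lower, t_upper = equal_tailed_critical_values(t_reps, alpha=0.05)

# Otherwise usual t-test of H0: theta = theta0 with bootstrap critical values.
theta_hat, se_hat = mean_and_se(sample)
theta0 = 1.0  # hypothesised value, chosen for illustration
t_stat = (theta_hat - theta0) / se_hat
reject = t_stat < t_lower or t_stat > t_upper
print(t_lower, t_upper, t_stat, reject)
```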
The above bootstrap lower $t^*_{B \cdot \alpha/2}$ and upper $t^*_{B \cdot (1-\alpha/2)}$ critical values generally differ in absolute value. Alternatively, we can estimate symmetric critical values by adapting step 4:
4. Order the bootstrap replications of t such that $|t_1^*| \le \dots \le |t_B^*|$. The absolute critical value is then the B · (1 − α)-th element. For B = 1000 and α = 5% this is the 950th ordered element. The lower and upper critical values are, respectively:
$$t_{\alpha/2} = -|t^*_{B \cdot (1-\alpha)}|, \quad t_{1-\alpha/2} = |t^*_{B \cdot (1-\alpha)}|$$
The symmetric bootstrap-t is the preferred method for bootstrap hypothesis testing as it makes use of the faster convergence of t-statistics relative to asymptotic approximations (i.e. critical values from the t- or standard normal tables).
The bootstrap-t procedure can also be used to create confidence intervals using bootstrap critical values instead of the ones from the standard normal tables:
$$\left[\hat{\theta} + t_{\alpha/2} \cdot se(\hat{\theta}),\ \hat{\theta} + t_{1-\alpha/2} \cdot se(\hat{\theta})\right]$$
The confidence interval from bootstrap-t is not necessarily better than the percentile method. However, it is consistent with bootstrap-t hypothesis testing.
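The symmetric variant and the resulting confidence interval can be sketched in Python, again for the mean of a simulated sample. With symmetric critical values −c and c, the interval above becomes $\hat{\theta} \pm c \cdot se(\hat{\theta})$. All names, the data and the seed are assumptions for illustration.

```python
import math
import random


def symmetric_critical_value(t_reps, alpha=0.05):
    """B*(1-alpha)-th element of the ordered absolute t replications."""
    ordered = sorted(abs(t) for t in t_reps)
    B = len(ordered)
    k = min(B, max(1, round(B * (1 - alpha))))
    return ordered[k - 1]


def symmetric_bootstrap_t_interval(theta_hat, se_hat, crit):
    """Interval theta_hat -/+ crit * se_hat from a symmetric critical value."""
    return theta_hat - crit * se_hat, theta_hat + crit * se_hat


def mean_and_se(sample):
    """Sample mean and its usual standard error s / sqrt(n)."""
    n = len(sample)
    mean = sum(sample) / n
    variance = sum((v - mean) ** 2 for v in sample) / (n - 1)
    return mean, math.sqrt(variance / n)


rng = random.Random(2012)
sample = [rng.expovariate(1.0) for _ in range(30)]  # simulated skewed data
theta_hat, se_hat = mean_and_se(sample)

# Bootstrap replications of the t-statistic centered at theta_hat.
t_reps = []
for _ in range(1000):
    resample = [sample[rng.randrange(len(sample))] for _ in range(len(sample))]
    theta_b, se_b = mean_and_se(resample)
    t_reps.append((theta_b - theta_hat) / se_b)

crit = symmetric_critical_value(t_reps, alpha=0.05)
ci_lower, ci_upper = symmetric_bootstrap_t_interval(theta_hat, se_hat, crit)
print(crit, ci_lower, ci_upper)
```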
3 Implementation in Stata 12.0
Stata has very conveniently implemented the bootstrap for cross-section data. Bootstrap sampling and summarizing the results are done automatically by Stata. The Stata commands are shown for the example of a univariate regression of a variable y on x.
Case 1: Bootstrap standard errors are implemented as an option in the Stata command
Many Stata estimation commands such as regress have a built-in vce option to calculate bootstrap covariance estimates. For example
regress y x, vce(bootstrap, reps(100))
runs B = 100 bootstrap iterations of a linear regression and reports bootstrap standard errors along with confidence intervals and p-values based on the normal approximation and bootstrap standard errors. The postestimation command
regress y x, vce(bootstrap, reps(1000))
estat bootstrap, percentile
reports confidence bounds based on bootstrap percentiles rather than the
normal approximation. Remember that it is recommended to use at least
B = 1000 replications for bootstrap percentiles. The percentiles to be
reported are defined with the confidence level option. For example, the
0.5% and 99.5% percentiles that create the 99% confidence interval are
reported by
regress y x, vce(bootstrap, reps(1000)) level(99)
estat bootstrap, percentile
Case 2: The statistic of interest is returned by a single Stata command
The command
bootstrap, reps(100): reg y x
runs B = 100 bootstrap iterations of a linear regression and reports bootstrap standard errors along with confidence intervals and p-values based on the normal approximation and bootstrap standard errors. The postestimation command estat bootstrap is used to report confidence intervals based on bootstrap percentiles from e.g. B = 1000 replications:
bootstrap, reps(1000): reg y x
estat bootstrap, percentile
We can select a specific statistic to be recorded in the bootstrap iterations. For example, the slope coefficient only:
bootstrap _b[x], reps(100): reg y x
By default, Stata records the whole coefficient vector _b. Any value returned by a Stata command (see ereturn list) can be selected.
We can also record functions of returned statistics. For example, the following commands create bootstrap critical values at the 5% significance level of the t-statistic for the slope coefficient:
reg y x
scalar b = _b[x]
bootstrap t=((_b[x]-b)/_se[x]), reps(1000): reg y x, level(95)
estat bootstrap, percentile
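The respective symmetric critical values at the 5% significance level are calculated from the absolute t-statistic. With level(90), the upper bound of the reported percentile interval is the 95th percentile of |t|, which is the symmetric critical value: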
reg y x
scalar b = _b[x]
bootstrap t=abs((_b[x]-b)/_se[x]), reps(1000): reg y x, level(90)
estat bootstrap, percentile
We can save the bootstrap replications of the selected statistics in a normal Stata .dta file to further investigate the bootstrap sampling distribution. For example,
bootstrap b=_b[x], reps(1000) saving(bs_b, replace): reg y x
use bs_b, clear
histogram b
shows the bootstrap histogram of the sampling distribution of the slope
coefficient.
Note: it is important that all observations with missing values are
dropped from the dataset before using the bootstrap command. Missing
values will lead to different bootstrap sample sizes.
Case 3: The statistic of interest is calculated in a series of Stata commands
The first task is to define a program that produces the statistic of interest for a single sample. This program might involve several estimation commands and intermediate results. For example, the following program calculates the t-statistic centered at $\hat{\beta}$ in a regression of y on x
program tstat, rclass
reg y x
return scalar t = (_b[x]-b)/_se[x]
end
The last line of the program specifies the value that is investigated in the bootstrap: $(\hat{\beta} - b)/se(\hat{\beta})$, which will be returned under the name t. The definition of the program can be typed directly into the command window or be part of a do-file. The program should now be tested by typing
reg y x
scalar b = _b[x]
tstat
return list
The bootstrap is then performed by the Stata commands
reg y x
scalar b = _b[x]
bootstrap t=r(t), reps(1000): tstat
estat bootstrap, percentile
As in case 2, the bootstrap results can be saved and evaluated manually. For example,
reg y x
scalar b = _b[x]
bootstrap t=r(t), reps(1000) saving(bs_t): tstat
use bs_t, clear
centile t, centile(2.5, 97.5)
gen t_abs = abs(t)
centile t_abs, centile(95)
reports both asymmetric and symmetric critical values at the 5% significance level for t-tests on the slope coefficient.
4 See also ...
There is much more to the bootstrap than is presented in this handout. Instead of paired resampling there is residual resampling, which is often used in a time-series context. There is also a parametric bootstrap. The bootstrap can also be used to reduce the small sample bias of an estimator by bias corrections. The m out of n bootstrap is used to overcome some bootstrap failures. A method very similar to the bootstrap is the jackknife.
References
Efron, Bradley and Robert J. Tibshirani (1993), An Introduction to the Bootstrap, Boca Raton: Chapman & Hall. [A fairly advanced but nicely and practically explained comprehensive text by the inventor of the bootstrap.]
Brownstone, David and Robert Valletta (2001), The Bootstrap and Multiple Imputations: Harnessing Increased Computing Power for Improved Statistical Tests, Journal of Economic Perspectives, 15(4), 129-141. [An intuitive plea for the use of the bootstrap.]
Cameron, A. C. and P. K. Trivedi (2005), Microeconometrics: Methods and Applications, Cambridge University Press. Section 7.8 and chapter 11.
Horowitz, Joel L. (1999), The Bootstrap, in: Handbook of Econometrics, Vol. 5. [This is a very advanced description of the (asymptotic) properties of the bootstrap.]
Wooldridge, Jeffrey M. (2009), Introductory Econometrics: A Modern Approach, 4th ed., South-Western. Appendix 6A. [A first glance at the bootstrap.]

More Related Content

What's hot

2. polynomial interpolation
2. polynomial interpolation2. polynomial interpolation
2. polynomial interpolation
EasyStudy3
 
Roots of equations
Roots of equationsRoots of equations
Roots of equations
Mileacre
 

What's hot (19)

APPROACHES IN USING EXPECTATIONMAXIMIZATION ALGORITHM FOR MAXIMUM LIKELIHOOD ...
APPROACHES IN USING EXPECTATIONMAXIMIZATION ALGORITHM FOR MAXIMUM LIKELIHOOD ...APPROACHES IN USING EXPECTATIONMAXIMIZATION ALGORITHM FOR MAXIMUM LIKELIHOOD ...
APPROACHES IN USING EXPECTATIONMAXIMIZATION ALGORITHM FOR MAXIMUM LIKELIHOOD ...
 
Linear regression [Theory and Application (In physics point of view) using py...
Linear regression [Theory and Application (In physics point of view) using py...Linear regression [Theory and Application (In physics point of view) using py...
Linear regression [Theory and Application (In physics point of view) using py...
 
Roots equation
Roots equationRoots equation
Roots equation
 
Two methods for optimising cognitive model parameters
Two methods for optimising cognitive model parametersTwo methods for optimising cognitive model parameters
Two methods for optimising cognitive model parameters
 
Newton cotes integration method
Newton cotes integration  methodNewton cotes integration  method
Newton cotes integration method
 
Atan2
Atan2Atan2
Atan2
 
Interpolation and its applications
Interpolation and its applicationsInterpolation and its applications
Interpolation and its applications
 
Application of interpolation and finite difference
Application of interpolation and finite differenceApplication of interpolation and finite difference
Application of interpolation and finite difference
 
Monte Carlo Simulation Of Heston Model In Matlab(1)
Monte Carlo Simulation Of Heston Model In Matlab(1)Monte Carlo Simulation Of Heston Model In Matlab(1)
Monte Carlo Simulation Of Heston Model In Matlab(1)
 
Bisection
BisectionBisection
Bisection
 
Probability and random processes project based learning template.pdf
Probability and random processes project based learning template.pdfProbability and random processes project based learning template.pdf
Probability and random processes project based learning template.pdf
 
2. polynomial interpolation
2. polynomial interpolation2. polynomial interpolation
2. polynomial interpolation
 
Regula Falsi (False position) Method
Regula Falsi (False position) MethodRegula Falsi (False position) Method
Regula Falsi (False position) Method
 
Roots of equations
Roots of equationsRoots of equations
Roots of equations
 
Lar calc10 ch04_sec4
Lar calc10 ch04_sec4Lar calc10 ch04_sec4
Lar calc10 ch04_sec4
 
Numerical Method Analysis: Algebraic and Transcendental Equations (Non-Linear)
Numerical Method Analysis: Algebraic and Transcendental Equations (Non-Linear)Numerical Method Analysis: Algebraic and Transcendental Equations (Non-Linear)
Numerical Method Analysis: Algebraic and Transcendental Equations (Non-Linear)
 
Matlab plotting
Matlab plottingMatlab plotting
Matlab plotting
 
Bisection method
Bisection methodBisection method
Bisection method
 
Chapter 3
Chapter 3Chapter 3
Chapter 3
 

Viewers also liked

Cluster and multistage sampling
Cluster and multistage samplingCluster and multistage sampling
Cluster and multistage sampling
suncil0071
 

Viewers also liked (20)

India
IndiaIndia
India
 
Japan
JapanJapan
Japan
 
lassification with decision trees from a nonparametric predictive inference p...
lassification with decision trees from a nonparametric predictive inference p...lassification with decision trees from a nonparametric predictive inference p...
lassification with decision trees from a nonparametric predictive inference p...
 
ma52006id386
ma52006id386ma52006id386
ma52006id386
 
Quantum Minimax Theorem in Statistical Decision Theory (RIMS2014)
Quantum Minimax Theorem in Statistical Decision Theory (RIMS2014)Quantum Minimax Theorem in Statistical Decision Theory (RIMS2014)
Quantum Minimax Theorem in Statistical Decision Theory (RIMS2014)
 
Understanding the mysteries of the CSS property value syntax
Understanding the mysteries of the CSS property value syntaxUnderstanding the mysteries of the CSS property value syntax
Understanding the mysteries of the CSS property value syntax
 
E-learning
E-learning E-learning
E-learning
 
Statistics (1): estimation, Chapter 2: Empirical distribution and bootstrap
Statistics (1): estimation, Chapter 2: Empirical distribution and bootstrapStatistics (1): estimation, Chapter 2: Empirical distribution and bootstrap
Statistics (1): estimation, Chapter 2: Empirical distribution and bootstrap
 
Rivers of india
Rivers of indiaRivers of india
Rivers of india
 
Geography Fundamentals" Class - 12" NCERT
Geography Fundamentals" Class - 12" NCERTGeography Fundamentals" Class - 12" NCERT
Geography Fundamentals" Class - 12" NCERT
 
Resampling methods
Resampling methodsResampling methods
Resampling methods
 
Difference between statistical description and inference
Difference between statistical description and inferenceDifference between statistical description and inference
Difference between statistical description and inference
 
Cluster and multistage sampling
Cluster and multistage samplingCluster and multistage sampling
Cluster and multistage sampling
 
Agriculture -Geography - Class 10
Agriculture -Geography - Class 10Agriculture -Geography - Class 10
Agriculture -Geography - Class 10
 
Quick reminder is this a central tendency - spread - symmetry question(2)
Quick reminder   is this a central tendency - spread - symmetry question(2)Quick reminder   is this a central tendency - spread - symmetry question(2)
Quick reminder is this a central tendency - spread - symmetry question(2)
 
Introduction to Language and Linguistics 005: Morphology & Syntax
Introduction to Language and Linguistics 005: Morphology & SyntaxIntroduction to Language and Linguistics 005: Morphology & Syntax
Introduction to Language and Linguistics 005: Morphology & Syntax
 
General knowledge and current affairs
General knowledge and current affairs  General knowledge and current affairs
General knowledge and current affairs
 
current affairs 2016
current affairs 2016current affairs 2016
current affairs 2016
 
sampling ppt
sampling pptsampling ppt
sampling ppt
 
Mobile-First SEO - The Marketers Edition #3XEDigital
Mobile-First SEO - The Marketers Edition #3XEDigitalMobile-First SEO - The Marketers Edition #3XEDigital
Mobile-First SEO - The Marketers Edition #3XEDigital
 

Similar to Bootstrap2up

Using the Componentwise Metropolis-Hastings Algorithm to Sample from the Join...
Using the Componentwise Metropolis-Hastings Algorithm to Sample from the Join...Using the Componentwise Metropolis-Hastings Algorithm to Sample from the Join...
Using the Componentwise Metropolis-Hastings Algorithm to Sample from the Join...
Thomas Templin
 
fb69b412-97cb-4e8d-8a28-574c09557d35-160618025920
fb69b412-97cb-4e8d-8a28-574c09557d35-160618025920fb69b412-97cb-4e8d-8a28-574c09557d35-160618025920
fb69b412-97cb-4e8d-8a28-574c09557d35-160618025920
Karl Rudeen
 
Descriptive_Statistics : Introduction to Descriptive_Statistics,Central tende...
Descriptive_Statistics : Introduction to Descriptive_Statistics,Central tende...Descriptive_Statistics : Introduction to Descriptive_Statistics,Central tende...
Descriptive_Statistics : Introduction to Descriptive_Statistics,Central tende...
SanjanaSaxena17
 
Lecture7 cross validation
Lecture7 cross validationLecture7 cross validation
Lecture7 cross validation
Stéphane Canu
 
Essay On Linear Function
Essay On Linear FunctionEssay On Linear Function
Essay On Linear Function
Angie Lee
 

Similar to Bootstrap2up (20)

Some real life data analysis on stationary and non-stationary Time Series
Some real life data analysis on stationary and non-stationary Time SeriesSome real life data analysis on stationary and non-stationary Time Series
Some real life data analysis on stationary and non-stationary Time Series
 
Using the Componentwise Metropolis-Hastings Algorithm to Sample from the Join...
Using the Componentwise Metropolis-Hastings Algorithm to Sample from the Join...Using the Componentwise Metropolis-Hastings Algorithm to Sample from the Join...
Using the Componentwise Metropolis-Hastings Algorithm to Sample from the Join...
 
fb69b412-97cb-4e8d-8a28-574c09557d35-160618025920
fb69b412-97cb-4e8d-8a28-574c09557d35-160618025920fb69b412-97cb-4e8d-8a28-574c09557d35-160618025920
fb69b412-97cb-4e8d-8a28-574c09557d35-160618025920
 
Project Paper
Project PaperProject Paper
Project Paper
 
Monte carlo-simulation
Monte carlo-simulationMonte carlo-simulation
Monte carlo-simulation
 
Slides econometrics-2017-graduate-2
Slides econometrics-2017-graduate-2Slides econometrics-2017-graduate-2
Slides econometrics-2017-graduate-2
 
recko_paper
recko_paperrecko_paper
recko_paper
 
Talk 5
Talk 5Talk 5
Talk 5
 
Input analysis
Input analysisInput analysis
Input analysis
 
Quantitative Analysis for Emperical Research
Quantitative Analysis for Emperical ResearchQuantitative Analysis for Emperical Research
Quantitative Analysis for Emperical Research
 
Side Notes on Practical Natural Language Processing: Bootstrap Test
Side Notes on Practical Natural Language Processing: Bootstrap TestSide Notes on Practical Natural Language Processing: Bootstrap Test
Side Notes on Practical Natural Language Processing: Bootstrap Test
 
Output analysis of a single model
Output analysis of a single modelOutput analysis of a single model
Output analysis of a single model
 
Partha Sengupta_structural analysis.pptx
Partha Sengupta_structural analysis.pptxPartha Sengupta_structural analysis.pptx
Partha Sengupta_structural analysis.pptx
 
CLIM Program: Remote Sensing Workshop, Blocking Methods for Spatial Statistic...
CLIM Program: Remote Sensing Workshop, Blocking Methods for Spatial Statistic...CLIM Program: Remote Sensing Workshop, Blocking Methods for Spatial Statistic...
CLIM Program: Remote Sensing Workshop, Blocking Methods for Spatial Statistic...
 
Descriptive_Statistics : Introduction to Descriptive_Statistics,Central tende...
Descriptive_Statistics : Introduction to Descriptive_Statistics,Central tende...Descriptive_Statistics : Introduction to Descriptive_Statistics,Central tende...
Descriptive_Statistics : Introduction to Descriptive_Statistics,Central tende...
 
Lecture7 cross validation
Lecture7 cross validationLecture7 cross validation
Lecture7 cross validation
 
Machine Learning.pdf
Machine Learning.pdfMachine Learning.pdf
Machine Learning.pdf
 
working with python
working with pythonworking with python
working with python
 
Module-2_ML.pdf
Module-2_ML.pdfModule-2_ML.pdf
Module-2_ML.pdf
 
Essay On Linear Function
Essay On Linear FunctionEssay On Linear Function
Essay On Linear Function
 

Recently uploaded

FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
dollysharma2066
 
Abortion pill for sale in Muscat (+918761049707)) Get Cytotec Cash on deliver...
Abortion pill for sale in Muscat (+918761049707)) Get Cytotec Cash on deliver...Abortion pill for sale in Muscat (+918761049707)) Get Cytotec Cash on deliver...
Abortion pill for sale in Muscat (+918761049707)) Get Cytotec Cash on deliver...
instagramfab782445
 
Editorial design Magazine design project.pdf
Editorial design Magazine design project.pdfEditorial design Magazine design project.pdf
Editorial design Magazine design project.pdf
tbatkhuu1
 
VVIP CALL GIRLS Lucknow 💓 Lucknow < Renuka Sharma > 7877925207 Escorts Service
VVIP CALL GIRLS Lucknow 💓 Lucknow < Renuka Sharma > 7877925207 Escorts ServiceVVIP CALL GIRLS Lucknow 💓 Lucknow < Renuka Sharma > 7877925207 Escorts Service
VVIP CALL GIRLS Lucknow 💓 Lucknow < Renuka Sharma > 7877925207 Escorts Service
aroranaina404
 
call girls in Dakshinpuri (DELHI) 🔝 >༒9953056974 🔝 genuine Escort Service 🔝✔️✔️
call girls in Dakshinpuri  (DELHI) 🔝 >༒9953056974 🔝 genuine Escort Service 🔝✔️✔️call girls in Dakshinpuri  (DELHI) 🔝 >༒9953056974 🔝 genuine Escort Service 🔝✔️✔️
call girls in Dakshinpuri (DELHI) 🔝 >༒9953056974 🔝 genuine Escort Service 🔝✔️✔️
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Abortion Pills in Oman (+918133066128) Cytotec clinic buy Oman Muscat
Abortion Pills in Oman (+918133066128) Cytotec clinic buy Oman MuscatAbortion Pills in Oman (+918133066128) Cytotec clinic buy Oman Muscat
Abortion Pills in Oman (+918133066128) Cytotec clinic buy Oman Muscat
Abortion pills in Kuwait Cytotec pills in Kuwait
 
DESIGN THINKING in architecture- Introduction
DESIGN THINKING in architecture- IntroductionDESIGN THINKING in architecture- Introduction
DESIGN THINKING in architecture- Introduction
sivagami49
 

Recently uploaded (20)

FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
Abortion pill for sale in Muscat (+918761049707)) Get Cytotec Cash on deliver...
Abortion pill for sale in Muscat (+918761049707)) Get Cytotec Cash on deliver...Abortion pill for sale in Muscat (+918761049707)) Get Cytotec Cash on deliver...
Abortion pill for sale in Muscat (+918761049707)) Get Cytotec Cash on deliver...
 
Sector 104, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 104, Noida Call girls :8448380779 Model Escorts | 100% verifiedSector 104, Noida Call girls :8448380779 Model Escorts | 100% verified
Sector 104, Noida Call girls :8448380779 Model Escorts | 100% verified
 
Editorial design Magazine design project.pdf
Editorial design Magazine design project.pdfEditorial design Magazine design project.pdf
Editorial design Magazine design project.pdf
 
HiFi Call Girl Service Delhi Phone ☞ 9899900591 ☜ Escorts Service at along wi...
HiFi Call Girl Service Delhi Phone ☞ 9899900591 ☜ Escorts Service at along wi...HiFi Call Girl Service Delhi Phone ☞ 9899900591 ☜ Escorts Service at along wi...
HiFi Call Girl Service Delhi Phone ☞ 9899900591 ☜ Escorts Service at along wi...
 
SD_The MATATAG Curriculum Training Design.pptx
SD_The MATATAG Curriculum Training Design.pptxSD_The MATATAG Curriculum Training Design.pptx
SD_The MATATAG Curriculum Training Design.pptx
 
UI:UX Design and Empowerment Strategies for Underprivileged Transgender Indiv...
UI:UX Design and Empowerment Strategies for Underprivileged Transgender Indiv...UI:UX Design and Empowerment Strategies for Underprivileged Transgender Indiv...
UI:UX Design and Empowerment Strategies for Underprivileged Transgender Indiv...
 
Pooja 9892124323, Call girls Services and Mumbai Escort Service Near Hotel Gi...
Pooja 9892124323, Call girls Services and Mumbai Escort Service Near Hotel Gi...Pooja 9892124323, Call girls Services and Mumbai Escort Service Near Hotel Gi...
Pooja 9892124323, Call girls Services and Mumbai Escort Service Near Hotel Gi...
 
Jordan_Amanda_DMBS202404_PB1_2024-04.pdf
Jordan_Amanda_DMBS202404_PB1_2024-04.pdfJordan_Amanda_DMBS202404_PB1_2024-04.pdf

Bootstrap2up

Short Guides to Microeconometrics
Spring 2012
Kurt Schmidheiny
Universität Basel
Version: 9-4-2012, 16:09

The Bootstrap

1 Introduction

The bootstrap is a method to derive properties (standard errors, confidence intervals and critical values) of the sampling distribution of estimators. It is very similar to Monte Carlo techniques (see the corresponding handout). However, instead of fully specifying the data generating process (DGP), we use information from the sample.

In short, the bootstrap takes the sample (the values of the independent and dependent variables) as the population and the estimates of the sample as true values. Instead of drawing from a specified distribution (such as the normal) by a random number generator, the bootstrap draws with replacement from the sample. It therefore takes the empirical distribution function (the step function) as the true distribution function. In the example of a linear regression model, the sample provides the empirical distribution for the dependent variable, the independent variables and the error term as well as values for constant, slope and error variance. The great advantage compared to Monte Carlo methods is that we make assumptions neither about the distributions nor about the true values of the parameters.

The bootstrap is typically used for consistent but biased estimators. In most cases we know the asymptotic properties of these estimators, so we could use asymptotic theory to derive the approximate sampling distribution. That is what we usually do when using, for example, maximum likelihood estimators. The bootstrap is an alternative way to produce approximations for the true small sample properties. So why (or when) would we use the bootstrap? There are two main reasons:

1a) The asymptotic sampling distribution is very difficult to derive.

1b) The asymptotic sampling distribution is too difficult to derive for me. This might apply to many multi-stage estimators. Example: the two-stage estimator of the Heckman sample selection model.

1c) Deriving the asymptotic sampling distribution is too time-consuming and error-prone for me. This might apply to forecasts or statistics that are (nonlinear) functions of the estimated model parameters. Example: elasticities calculated from slope coefficients.

2) The bootstrap produces "better" approximations for some properties. It can be shown that bootstrap approximations converge faster for certain statistics¹ than the approximations based on asymptotic theory. These bootstrap approximations are called asymptotic refinements. Example: the t-statistic of a mean or a slope coefficient.

Note that both asymptotic theory and the bootstrap only provide approximations for finite sample properties. The bootstrap produces consistent approximations for the sampling distribution for a variety of estimators such as the mean, the median, the coefficients in OLS and most econometric models. However, there are estimators (e.g. the maximum) for which the bootstrap fails to produce consistent approximations.

This handout covers the nonparametric bootstrap with paired sampling. This method is appropriate for randomly sampled cross-section data. Data from complex random sampling procedures (e.g. stratified sampling) require special attention. See the handout on "Clustering". Time-series data and panel data also require more sophisticated bootstrap techniques.

¹ These statistics are called asymptotically pivotal, i.e. their asymptotic distributions are independent of the data and of the true parameter values. This applies, for example, to all statistics with the standard normal or chi-squared as limiting distribution.
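The failure of the bootstrap for the maximum, mentioned in the introduction, can be made visible with a small simulation. The following is only a sketch in Python with NumPy (not the handout's Stata); the uniform data, the seed and the values of N and B are arbitrary assumptions chosen for illustration. It shows that a large share of bootstrap samples reproduce the sample maximum exactly, so the bootstrap distribution of the maximum has a point mass that the true sampling distribution of a continuous variable does not have.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
N, B = 200, 5000                 # illustrative sample size and number of replications

# Hypothetical sample from a continuous distribution.
y = rng.uniform(0.0, 1.0, size=N)
y_max = y.max()

# Draw B bootstrap samples (with replacement) and record the maximum of each.
idx = rng.integers(0, N, size=(B, N))
boot_max = y[idx].max(axis=1)

# Share of bootstrap samples whose maximum equals the sample maximum.
share_equal = np.mean(boot_max == y_max)

# Probability that the largest observation is drawn at least once in N draws.
theory = 1.0 - (1.0 - 1.0 / N) ** N

print(f"share of replications with max* == max: {share_equal:.3f}")
print(f"1 - (1 - 1/N)^N:                        {theory:.3f}")
```

The second number approaches 1 - 1/e (about 0.63) as N grows, so the point mass does not vanish in large samples.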
2 The Method: Nonparametric Bootstrap

2.1 Bootstrap Samples

Consider a sample of $N$ independent observations, indexed $i = 1, ..., N$, of a dependent variable $y$ and $K + 1$ explanatory variables $x$. A paired bootstrap sample is obtained by independently drawing $N$ pairs $(x_i, y_i)$ from the observed sample with replacement. The bootstrap sample has the same number of observations as the original sample; however, some observations appear several times and others never. The bootstrap involves drawing a large number $B$ of bootstrap samples. An individual bootstrap sample is denoted $(x^*_b, y^*_b)$, where $x^*_b$ is an $N \times (K + 1)$ matrix and $y^*_b$ an $N$-dimensional column vector of the data in the $b$-th bootstrap sample.

2.2 Bootstrap Standard Errors

The empirical standard deviation of a series of bootstrap replications of $\hat{\theta}$ can be used to approximate the standard error $se(\hat{\theta})$ of an estimator $\hat{\theta}$.

1. Draw $B$ independent bootstrap samples $(x^*_b, y^*_b)$ of size $N$ from $(x, y)$. Usually $B = 100$ replications are sufficient.

2. Estimate the parameter $\theta$ of interest for each bootstrap sample: $\hat{\theta}^*_b$ for $b = 1, ..., B$.

3. Estimate $se(\hat{\theta})$ by

\[
\widehat{se}(\hat{\theta}) = \sqrt{\frac{1}{B-1} \sum_{b=1}^{B} \left(\hat{\theta}^*_b - \bar{\theta}^*\right)^2}
\quad \text{where} \quad
\bar{\theta}^* = \frac{1}{B} \sum_{b=1}^{B} \hat{\theta}^*_b .
\]

The whole covariance matrix $V(\hat{\theta})$ of a vector $\hat{\theta}$ is estimated analogously.

If the estimator $\hat{\theta}$ is consistent and asymptotically normally distributed, bootstrap standard errors can be used to construct approximate confidence intervals and to perform asymptotic tests based on the normal distribution.

2.3 Confidence Intervals Based on Bootstrap Percentiles

We can construct a two-sided equal-tailed $(1-\alpha)$ confidence interval for an estimate $\hat{\theta}$ from the empirical distribution function of a series of bootstrap replications. The $(\alpha/2)$ and the $(1 - \alpha/2)$ empirical percentiles of the bootstrap replications are used as lower and upper confidence bounds. This procedure is called the percentile bootstrap.

1. Draw $B$ independent bootstrap samples $(x^*_b, y^*_b)$ of size $N$ from $(x, y)$. It is recommended to use $B = 1000$ or more replications.

2. Estimate the parameter $\theta$ of interest for each bootstrap sample: $\hat{\theta}^*_b$ for $b = 1, ..., B$.

3. Order the bootstrap replications of $\hat{\theta}$ such that $\hat{\theta}^*_1 \le ... \le \hat{\theta}^*_B$. The lower and upper confidence bounds are the $B \cdot \alpha/2$-th and $B \cdot (1 - \alpha/2)$-th ordered elements, respectively. For $B = 1000$ and $\alpha = 5\%$ these are the 25th and 975th ordered elements. The estimated $(1-\alpha)$ confidence interval of $\hat{\theta}$ is

\[
\left[\hat{\theta}^*_{B \cdot \alpha/2},\ \hat{\theta}^*_{B \cdot (1-\alpha/2)}\right].
\]

Note that these confidence intervals are in general not symmetric.

2.4 Bootstrap Hypothesis Tests

The approximate confidence interval in section 2.3 can be used to perform an approximate two-sided test of a null hypothesis of the form $H_0: \theta = \theta_0$. The null hypothesis is rejected at the significance level $\alpha$ if $\theta_0$ lies outside the two-tailed $(1 - \alpha)$ confidence interval.
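The steps of sections 2.1 to 2.4 can be sketched in a few lines of code. The following is a minimal illustration in Python with NumPy, not the handout's Stata implementation; the simulated data, the seed and the helper names `ols_slope` and `paired_bootstrap` are assumptions made for this example only.

```python
import numpy as np

def ols_slope(x, y):
    """OLS slope from a regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

def paired_bootstrap(x, y, stat, B, rng):
    """Return B bootstrap replications of stat(x*, y*) using paired resampling."""
    N = len(y)
    reps = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, N, size=N)   # N row indices drawn with replacement
        reps[b] = stat(x[idx], y[idx])     # the pairs (x_i, y_i) stay together
    return reps

rng = np.random.default_rng(42)            # arbitrary seed
N = 100
x = rng.normal(size=N)                     # simulated data, for illustration only
y = 1.0 + 2.0 * x + rng.normal(size=N)

B, alpha = 1000, 0.05
reps = paired_bootstrap(x, y, ols_slope, B, rng)

# Section 2.2: bootstrap standard error (ddof=1 gives the 1/(B-1) factor).
se_boot = reps.std(ddof=1)

# Section 2.3: percentile confidence interval from the ordered replications.
ordered = np.sort(reps)
lower = ordered[round(B * alpha / 2) - 1]        # 25th ordered element
upper = ordered[round(B * (1 - alpha / 2)) - 1]  # 975th ordered element

# Section 2.4: reject H0: theta = theta0 if theta0 lies outside the interval.
theta0 = 0.0
reject = not (lower <= theta0 <= upper)

print(f"bootstrap se: {se_boot:.3f}")
print(f"{100 * (1 - alpha):.0f}% percentile CI: [{lower:.3f}, {upper:.3f}]")
print(f"reject H0 (slope = {theta0}): {reject}")
```

Resampling row indices rather than x and y separately is what makes this a paired bootstrap: each drawn observation keeps its own regressor and outcome.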
2.5 The bootstrap-t

Assume that we have consistent estimates $\hat{\theta}$ and $se(\hat{\theta})$ at hand and that the asymptotic distribution of the t-statistic is the standard normal:

\[
t = \frac{\hat{\theta} - \theta_0}{se(\hat{\theta})} \xrightarrow{d} N(0, 1).
\]

Then we can calculate approximate critical values from percentiles of the empirical distribution of a series of bootstrap replications of the t-statistic.

1. Consistently estimate $\theta$ and $se(\hat{\theta})$ using the observed sample: $\hat{\theta}$, $se(\hat{\theta})$.

2. Draw $B$ independent bootstrap samples $(x^*_b, y^*_b)$ of size $N$ from $(x, y)$. It is recommended to use $B = 1000$ or more replications.

3. Estimate the t-value assuming $\theta_0 = \hat{\theta}$ for each bootstrap sample:

\[
t^*_b = \frac{\hat{\theta}^*_b - \hat{\theta}}{se^*_b(\hat{\theta})} \quad \text{for } b = 1, ..., B,
\]

where $\hat{\theta}^*_b$ and $se^*_b(\hat{\theta})$ are estimates of the parameter $\theta$ and its standard error using the bootstrap sample.

4. Order the bootstrap replications of $t$ such that $t^*_1 \le ... \le t^*_B$. The lower and upper critical values are then the $B \cdot \alpha/2$-th and $B \cdot (1 - \alpha/2)$-th elements, respectively. For $B = 1000$ and $\alpha = 5\%$ these are the 25th and 975th ordered elements.

\[
t_{\alpha/2} = t^*_{B \cdot \alpha/2}, \qquad t_{1-\alpha/2} = t^*_{B \cdot (1-\alpha/2)}
\]

These critical values can now be used in otherwise usual t-tests for $\theta$.

The bootstrap lower ($t^*_{B \cdot \alpha/2}$) and upper ($t^*_{B \cdot (1-\alpha/2)}$) critical values above generally differ in absolute value. Alternatively, we can estimate symmetric critical values by adapting step 4:

4. Order the bootstrap replications of $t$ such that $|t^*_1| \le ... \le |t^*_B|$. The absolute critical value is then the $B \cdot (1 - \alpha)$-th element. For $B = 1000$ and $\alpha = 5\%$ this is the 950th ordered element. The lower and upper critical values are, respectively:

\[
t_{\alpha/2} = -|t^*_{B \cdot (1-\alpha)}|, \qquad t_{1-\alpha/2} = |t^*_{B \cdot (1-\alpha)}|
\]

The symmetric bootstrap-t is the preferred method for bootstrap hypothesis testing as it makes use of the faster convergence of t-statistics relative to asymptotic approximations (i.e. critical values from the t- or standard normal tables).
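The bootstrap-t steps above, including the symmetric variant of step 4, can be sketched as follows. This is an illustrative Python/NumPy sketch under assumptions of my own (simulated data, arbitrary seed, conventional homoskedastic OLS standard errors, a hypothetical helper `ols_slope_and_se`); it is not the handout's Stata code.

```python
import numpy as np

def ols_slope_and_se(x, y):
    """Slope and its conventional (homoskedastic) OLS standard error."""
    X = np.column_stack([np.ones_like(x), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])   # error variance estimate
    return beta[1], np.sqrt(s2 * XtX_inv[1, 1])

rng = np.random.default_rng(7)                   # arbitrary seed
N = 100
x = rng.normal(size=N)                           # simulated data, for illustration only
y = 1.0 + 2.0 * x + rng.normal(size=N)

# Step 1: estimate theta and se(theta) on the observed sample.
b_hat, se_hat = ols_slope_and_se(x, y)

# Steps 2 and 3: t-statistic centered at b_hat in each paired bootstrap sample.
B, alpha = 1000, 0.05
t_star = np.empty(B)
for b in range(B):
    idx = rng.integers(0, N, size=N)
    b_star, se_star = ols_slope_and_se(x[idx], y[idx])
    t_star[b] = (b_star - b_hat) / se_star

# Step 4: asymmetric critical values (25th and 975th ordered elements).
t_sorted = np.sort(t_star)
t_lower = t_sorted[round(B * alpha / 2) - 1]
t_upper = t_sorted[round(B * (1 - alpha / 2)) - 1]

# Step 4, symmetric variant: 950th ordered element of |t*|.
t_sym = np.sort(np.abs(t_star))[round(B * (1 - alpha)) - 1]

print(f"asymmetric critical values: {t_lower:.3f}, {t_upper:.3f}")
print(f"symmetric critical values:  {-t_sym:.3f}, {t_sym:.3f}")
```

Note that the standard error is re-estimated inside every bootstrap sample; reusing `se_hat` in the denominator would not give a bootstrap-t.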
The bootstrap-t procedure can also be used to create confidence intervals using bootstrap critical values instead of the ones from the standard normal tables:

\[
\left[\hat{\theta} + t_{\alpha/2} \cdot se(\hat{\theta}),\ \hat{\theta} + t_{1-\alpha/2} \cdot se(\hat{\theta})\right]
\]

The confidence interval from the bootstrap-t is not necessarily better than the one from the percentile method. However, it is consistent with bootstrap-t hypothesis testing.

3 Implementation in Stata 12.0

Stata has very conveniently implemented the bootstrap for cross-section data. Bootstrap sampling and summarizing the results are done automatically by Stata. The Stata commands are shown for the example of a univariate regression of a variable y on x.

Case 1: Bootstrap standard errors are implemented as an option in the Stata command

Many Stata estimation commands such as regress have a built-in vce option to calculate bootstrap covariance estimates. For example,

    regress y x, vce(bootstrap, reps(100))

runs B = 100 bootstrap iterations of a linear regression and reports bootstrap standard errors along with confidence intervals and p-values based on the normal approximation and bootstrap standard errors. The postestimation command estat bootstrap, as in

    regress y x, vce(bootstrap, reps(1000))
    estat bootstrap, percentile

reports confidence bounds based on bootstrap percentiles rather than the normal approximation. Remember that it is recommended to use at least B = 1000 replications for bootstrap percentiles. The percentiles to be reported are defined with the confidence level option. For example, the 0.5% and 99.5% percentiles that create the 99% confidence interval are reported by

    regress y x, vce(bootstrap, reps(1000)) level(99)
    estat bootstrap, percentile

Case 2: The statistic of interest is returned by a single Stata command

The command

    bootstrap, reps(100): reg y x

runs B = 100 bootstrap iterations of a linear regression and reports bootstrap standard errors along with confidence intervals and p-values based on the normal approximation and bootstrap standard errors. The postestimation command estat bootstrap is used to report confidence intervals based on bootstrap percentiles from e.g. B = 1000 replications:

    bootstrap, reps(1000): reg y x
    estat bootstrap, percentile

We can select a specific statistic to be recorded in the bootstrap iterations. For example, the slope coefficient only:

    bootstrap _b[x], reps(100): reg y x

By default, Stata records the whole coefficient vector _b. Any value returned by a Stata command (see ereturn list) can be selected. We can also record functions of returned statistics.
For example, the following commands create bootstrap critical values at the 5% significance level of the t-statistic for the slope coefficient:

    reg y x
    scalar b = _b[x]
    bootstrap t=((_b[x]-b)/_se[x]), reps(1000): reg y x, level(95)
    estat bootstrap, percentile

The respective symmetric critical values at the 5% significance level are calculated by

    reg y x
    scalar b = _b[x]
    bootstrap t=abs((_b[x]-b)/_se[x]), reps(1000): reg y x, level(90)
    estat bootstrap, percentile

We can save the bootstrap replications of the selected statistics in a normal Stata .dta file to further investigate the bootstrap sampling distribution. For example,

    bootstrap b=_b[x], reps(1000) saving(bs_b, replace): reg y x
    use bs_b, clear
    histogram b

shows the bootstrap histogram of the sampling distribution of the slope coefficient.

Note: it is important that all observations with missing values are dropped from the dataset before using the bootstrap command. Missing values will lead to different bootstrap sample sizes.

Case 3: The statistic of interest is calculated in a series of Stata commands

The first task is to define a program that produces the statistic of interest for a single sample. This program might involve several estimation commands and intermediate results. For example, the following program calculates the t-statistic centered at $\hat{\beta}$ in a regression of y on x:

    program tstat, rclass
        reg y x
        return scalar t = (_b[x]-b)/_se[x]
    end

The return scalar line of the program specifies the value that is investigated in the bootstrap, $(\hat{\beta} - b)/se(\hat{\beta})$, which will be returned under the name t. The definition of the program can be typed directly into the command window or be part of a do-file. The program should now be tested by typing

    reg y x
    scalar b = _b[x]
    tstat
    return list

The bootstrap is then performed by the Stata commands

    reg y x
    scalar b = _b[x]
    bootstrap t=r(t), reps(1000): tstat
    estat bootstrap, percentile

As in case 2, the bootstrap results can be saved and evaluated manually. For example,

    reg y x
    scalar b = _b[x]
    bootstrap t=r(t), reps(1000) saving(bs_t): tstat
    use bs_t, clear
    centile t, centile(2.5, 97.5)
    gen t_abs = abs(t)
    centile t_abs, centile(95)

reports both asymmetric and symmetric critical values at the 5% significance level for t-tests on the slope coefficient.

4 See also ...

There is much more to the bootstrap than is presented in this handout. Instead of paired resampling there is residual resampling, which is often used in a time-series context. There is also a parametric bootstrap. The bootstrap can also be used to reduce the small sample bias of an estimator by bias corrections. The m out of n bootstrap is used to overcome some bootstrap failures. A method very similar to the bootstrap is the jackknife.

References

Efron, Bradley and Robert J. Tibshirani (1993), An Introduction to the Bootstrap, Boca Raton: Chapman & Hall. [A fairly advanced but nicely and practically explained comprehensive text by the inventor of the bootstrap.]

Brownstone, David and Robert Valletta (2001), The Bootstrap and Multiple Imputations: Harnessing Increased Computing Power for Improved Statistical Tests, Journal of Economic Perspectives, 15(4), 129-141. [An intuitive plea for the use of the bootstrap.]

Cameron, A. C. and P. K. Trivedi (2005), Microeconometrics: Methods and Applications, Cambridge University Press. Section 7.8 and chapter 11.

Horowitz, Joel L. (1999), The Bootstrap, in: Handbook of Econometrics, Vol. 5. [This is a very advanced description of the (asymptotic) properties of the bootstrap.]

Wooldridge, Jeffrey M. (2009), Introductory Econometrics: A Modern Approach, 4th ed., South-Western. Appendix 6A. [A first glance at the bootstrap.]