Introduction to Bootstrap and elements of Markov Chains



  1. Statistics Lab, Rodolfo Metulini, IMT Institute for Advanced Studies, Lucca, Italy. Lesson 5, Introduction to Bootstrap and Introduction to Markov Chains, 23.01.2014
  2. Introduction Let's assume, for a moment, the CLT: if random samples of n observations y1, y2, ..., yn are drawn from a population with mean µ and standard deviation σ, then, for n sufficiently large, the sampling distribution of the sample mean can be approximated by a normal density with mean µ and variance σ²/n. Averages taken from any distribution will have an approximately normal distribution. The standard deviation of the sample mean decreases as the number of observations increases. But nobody tells us exactly how big the sample has to be.
  3. Why Bootstrap? Sometimes we cannot make use of the CLT, because: 1. nobody tells us exactly how big the sample has to be; 2. the sample can be really small, so that we are not encouraged to make any distributional assumption. We just have the data, and we let the raw data speak. The bootstrap method attempts to determine the probability distribution from the data itself, without recourse to the CLT. N.B. The bootstrap method is not a way of reducing the error! It only tries to estimate it.
  4. Basic Idea of Bootstrap Use the original sample as the population, and draw N samples from the original sample (these are the bootstrap samples). Define the estimator using the bootstrap samples. Figure: Real World versus Bootstrap World
  5. Structure of Bootstrap 1. Originally, from a list of data (the sample), one computes a statistic (an estimate). 2. Create an artificial list of data (a new sample) by randomly drawing elements from the original list, with replacement. 3. Compute a new statistic (estimate) from the new sample. 4. Repeat steps 2 and 3, say, 1000 times, and look at the distribution of these 1000 statistics.
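The four steps above can be sketched in a few lines. This is a minimal illustration in Python using only the standard library (the course itself uses R; the function name `bootstrap_statistics` and the sample values are purely illustrative):

```python
import random
import statistics

def bootstrap_statistics(data, stat=statistics.mean, n_boot=1000, seed=42):
    """Repeat steps 2 and 3: resample with replacement, recompute the statistic."""
    rng = random.Random(seed)
    return [stat(rng.choices(data, k=len(data))) for _ in range(n_boot)]

# Step 1: the original sample and its statistic
sample = [4.1, 5.3, 3.8, 6.0, 4.7, 5.5, 4.9, 3.2, 5.1, 4.4]
original_mean = statistics.mean(sample)

# Steps 2-4: 1000 bootstrap replicates of the sample mean
boot_means = bootstrap_statistics(sample)
```

The distribution of `boot_means` is then inspected in place of an assumed theoretical sampling distribution.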
  6. Sample mean Suppose we extracted a sample x = (x1, x2, ..., xn) from the population X. Let's say the sample size is small: n = 10. We can compute the sample mean x̄ using the values of the sample x. But, since n is small, the CLT does not hold, so we cannot say anything about the sample mean distribution. APPROACH: we extract M samples (or sub-samples) of dimension n from the sample x (with replacement). We can define the bootstrap sample means x̄_bi, ∀i = 1, ..., M. These become the new sample, with dimension M. Bootstrap sample mean: Mb(X) = (1/M) Σ_{i=1}^{M} x̄_bi. Bootstrap sample variance: Vb(X) = (1/(M − 1)) Σ_{i=1}^{M} (x̄_bi − Mb(X))²
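As a sketch of this approach (in Python rather than the course's R; the simulated n = 10 sample and seed are arbitrary), Mb(X) and Vb(X) can be computed directly from the M bootstrap means:

```python
import random
import statistics

rng = random.Random(1)
x = [rng.gauss(10, 3) for _ in range(10)]   # small sample, n = 10

M = 2000
# M bootstrap sample means, each from a resample of dimension n
boot_means = [statistics.mean(rng.choices(x, k=len(x))) for _ in range(M)]

Mb = statistics.mean(boot_means)        # bootstrap sample mean
Vb = statistics.variance(boot_means)    # bootstrap sample variance (M - 1 denominator)
```

`statistics.variance` already divides by M − 1, matching the formula on the slide.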
  7. Bootstrap Confidence interval with variance estimation Let's take a random sample of size 25 from a normal distribution with mean 10 and standard deviation 3. We can consider the sampling distribution of the sample mean and, from that, estimate the intervals. The bootstrap estimates the standard error by resampling the data in our original sample: instead of repeatedly drawing samples of size 25 from the population, we repeatedly draw new samples of size 25 from our original sample, resampling with replacement. We can then estimate the standard error of the sample mean using the standard deviation of the bootstrapped sample means.
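A minimal sketch of this procedure in Python (the course uses R; the seed and the 1.96 normal-approximation multiplier are choices made here for illustration):

```python
import random
import statistics

rng = random.Random(7)
# Original sample: size 25 from N(mean = 10, sd = 3)
x = [rng.gauss(10, 3) for _ in range(25)]

M = 2000
# Resample with replacement from the original sample, not the population
boot_means = [statistics.mean(rng.choices(x, k=25)) for _ in range(M)]

se = statistics.stdev(boot_means)       # bootstrap standard error of the mean
xbar = statistics.mean(x)
ci = (xbar - 1.96 * se, xbar + 1.96 * se)   # approximate 95% interval
```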
  8. Confidence interval with quantiles Suppose we have a sample of data from an exponential distribution with parameter λ: f(x|λ) = λe^(−λx) (remember the estimator λ̂ = 1/x̄). An alternative to the use of bootstrap estimated standard errors (the estimation of the sd from an exponential is not straightforward) is the use of bootstrap quantiles. We can obtain M bootstrap estimates λ̂_b and define q*(α) as the α quantile of the bootstrap distribution. The new bootstrap confidence interval for λ will be: [2λ̂ − q*(1 − α/2); 2λ̂ − q*(α/2)]
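The quantile-based (basic bootstrap) interval can be sketched as follows; this is illustrative Python (the true λ = 2, the sample size 30, and the seed are arbitrary choices, and the quantiles are taken by simple order-statistic indexing):

```python
import random

rng = random.Random(3)
lam_true = 2.0
x = [rng.expovariate(lam_true) for _ in range(30)]

lam_hat = 1 / (sum(x) / len(x))          # MLE: lambda_hat = 1 / x_bar

M = 2000
# M bootstrap estimates of lambda, sorted for quantile extraction
boots = sorted(
    1 / (sum(s) / len(s))
    for s in (rng.choices(x, k=len(x)) for _ in range(M))
)
alpha = 0.05
q_lo = boots[int((alpha / 2) * M)]           # q*(alpha / 2)
q_hi = boots[int((1 - alpha / 2) * M) - 1]   # q*(1 - alpha / 2)

# Basic bootstrap interval: [2*lam_hat - q*(1 - a/2), 2*lam_hat - q*(a/2)]
ci = (2 * lam_hat - q_hi, 2 * lam_hat - q_lo)
```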
  9. Regression model coefficient estimate with Bootstrap Now we consider the situation where we have data on two variables. This is the type of data that arises in a linear regression setting. It doesn't make sense to bootstrap the two variables separately, so they must remain linked when bootstrapped. For example, if our original data contains the observations (1,3), (2,6), (4,3), and (6,2), we re-sample this original sample in pairs. Recall that the linear regression model is: y = β0 + β1 x. We are going to construct a bootstrap interval for the slope coefficient β1: 1. Draw M bootstrap samples. 2. Compute the OLS β1 coefficient for each bootstrap sample. 3. Compute the bootstrap quantiles, and use the 0.025 and the 0.975 quantiles to define the confidence interval for β1.
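A sketch of this pairs bootstrap, using the four (x, y) observations from the slide (Python rather than R; the seed and M are arbitrary, and resamples where all x values coincide are skipped because the slope is undefined there):

```python
import random

data = [(1, 3), (2, 6), (4, 3), (6, 2)]      # the observations from the slide

def ols_slope(pairs):
    """OLS slope: sum((x - mx)(y - my)) / sum((x - mx)^2)."""
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxx = sum((p[0] - mx) ** 2 for p in pairs)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pairs)
    return sxy / sxx

rng = random.Random(5)
M = 1000
slopes = []
for _ in range(M):
    s = rng.choices(data, k=len(data))       # resample whole (x, y) pairs
    if len({p[0] for p in s}) > 1:           # skip degenerate resamples
        slopes.append(ols_slope(s))
slopes.sort()
ci = (slopes[int(0.025 * len(slopes))], slopes[int(0.975 * len(slopes)) - 1])
```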
  10. Regression model coefficient estimate with Bootstrap: sampling the residuals An alternative solution for the regression coefficient is a two-stage method in which: 1. You fit the regression on the original sample and compute the n regression residuals. 2. You draw M resamples of these residuals (with replacement), add each resampled set to the fitted values of the dependent variable to create M new response vectors, and re-estimate the regression on each, obtaining M bootstrapped β1. The method then uses the 0.025 and the 0.975 quantiles of the bootstrapped β1 to define the confidence interval.
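A sketch of this residual-resampling scheme, again on the four observations from the previous slide (illustrative Python; seed and M arbitrary):

```python
import random

x = [1, 2, 4, 6]
y = [3, 6, 3, 2]
n = len(x)

# Stage 1: fit OLS once on the original data and keep fitted values + residuals
mx, my = sum(x) / n, sum(y) / n
b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
      / sum((xi - mx) ** 2 for xi in x))
b0 = my - b1 * mx
fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

# Stage 2: resample residuals, rebuild y, re-estimate the slope M times
rng = random.Random(9)
M = 1000
slopes = []
for _ in range(M):
    e = rng.choices(resid, k=n)
    y_star = [fi + ei for fi, ei in zip(fitted, e)]
    my_s = sum(y_star) / n
    slopes.append(sum((xi - mx) * (yi - my_s) for xi, yi in zip(x, y_star))
                  / sum((xi - mx) ** 2 for xi in x))
slopes.sort()
ci = (slopes[int(0.025 * M)], slopes[int(0.975 * M) - 1])
```

Note that, unlike the pairs bootstrap, the x values stay fixed here: only the error term is resampled.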
  11. References Efron, B., Tibshirani, R. (1993). An Introduction to the Bootstrap (Vol. 57). CRC Press. Figure: Efron and Tibshirani's foundational book
  12. 12. Routines in R 1. boot, by Brian Ripley. Functions and datasets for bootstrapping from the book Bootstrap Methods and Their Applications by A. C. Davison and D. V. Hinkley (1997, CUP). 2. bootstrap, by Rob Tibshirani. Software (bootstrap, cross-validation, jackknife) and data for the book An Introduction to the Bootstrap by B. Efron and R. Tibshirani, 1993, Chapman and Hall
  13. Markov Chain Markov chains are an important concept in probability and many other areas of research. They are used to model the probability of belonging to a certain state in a certain period, given the state in the past period. Example with weather: what is the Markov probability that the state tomorrow will be sunny, given that today is rainy? The main properties of Markov chain processes are: memory of the process (usually the memory is fixed to 1); stationarity of the distribution.
  14. Chart 1 A picture of an easy example of a Markov chain with 2 possible states and transition probabilities. Figure: An example of a 2-state Markov chain
  15. Notation We define a stochastic process {Xn, n = 0, 1, 2, ...} that takes on a finite or countable number of possible values. Let the possible values be non-negative integers (i.e. Xn ∈ Z+). If Xn = i, then the process is said to be in state i at time n. The Markov process (in discrete time) is defined as follows: Pij = P[Xn+1 = j | Xn = i, Xn−1 = i(n−1), ..., X0 = i0] = P[Xn+1 = j | Xn = i], ∀i, j ∈ Z+. We call Pij a 1-step transition probability because we move from time n to time n + 1. It is a first-order Markov chain (memory = 1) because the probability of being in state j at time n + 1 only depends on the state at time n.
  16. Notation - 2 The n-step transition probability is P^n_ij = P[Xn+k = j | Xk = i], ∀n ≥ 0, i, j ≥ 0. The Chapman-Kolmogorov equations allow us to compute these n-step transition probabilities. They state that: P^(n+m)_ij = Σ_k P^n_ik P^m_kj, ∀n, m ≥ 0, ∀i, j ≥ 0. N.B. Basic probability properties: 1. Pij ≥ 0, ∀i, j ≥ 0; 2. Σ_(j≥0) Pij = 1, i = 0, 1, 2, ...
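The Chapman-Kolmogorov equations say that n-step probabilities come from multiplying the transition matrix by itself. A minimal Python sketch (the 2x2 matrix values are arbitrary illustrations, not from the slides):

```python
def mat_mult(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def n_step(P, n):
    """P^n via repeated multiplication (Chapman-Kolmogorov)."""
    result = [[float(i == j) for j in range(len(P))] for i in range(len(P))]
    for _ in range(n):
        result = mat_mult(result, P)
    return result

P = [[0.7, 0.3],
     [0.4, 0.6]]
P2 = n_step(P, 2)
# P2[0][0] = 0.7*0.7 + 0.3*0.4 = 0.61, and each row still sums to 1
```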
  17. Example: conditional probability Consider two states: 0 = rain and 1 = no rain. Define two probabilities: α = P00 = P[Xn+1 = 0 | Xn = 0], the probability it will rain tomorrow given it rained today; β = P10 = P[Xn+1 = 0 | Xn = 1], the probability it will rain tomorrow given it did not rain today. What is the probability it will rain the day after tomorrow given it rained today? The transition probability matrix will be: P = [α, 1 − α; β, 1 − β]
  18. Example: unconditional probability What is the unconditional probability it will rain the day after tomorrow? We need to define the unconditional, or marginal, distribution of the state at time n: P[Xn = j] = Σ_i P[Xn = j | X0 = i] P[X0 = i] = Σ_i P^n_ij αi, where αi = P[X0 = i], ∀i ≥ 0, and P[Xn = j | X0 = i] is the conditional probability just computed before.
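This marginal distribution can be computed by propagating the initial distribution through the transition matrix. A small Python illustration (the matrix entries are arbitrary example values, and the initial distribution assumes it rained today):

```python
P = [[0.7, 0.3],     # state 0 = rain: P00 = 0.7, P01 = 0.3
     [0.4, 0.6]]     # state 1 = no rain: P10 = 0.4, P11 = 0.6
a = [1.0, 0.0]       # alpha_i = P[X0 = i]: it rains today with certainty

def step(dist, P):
    """One application of P[X_{n+1} = j] = sum_i P[X_n = i] * P_ij."""
    return [sum(dist[i] * P[i][j] for i in range(len(P))) for j in range(len(P))]

dist = a
for _ in range(2):   # two days ahead
    dist = step(dist, P)
# dist[0] = P[X2 = 0] = 0.7*0.7 + 0.3*0.4 = 0.61
```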
  19. Stationary distributions A stationary distribution π is a probability distribution such that, when the Markov chain reaches it, it remains in that distribution forever. It means we are asking this question: what is the probability of being in a particular state in the long run? Let's define πj as the limiting probability that the process will be in state j at time n, or πj = lim(n→∞) P^n_ij. Using Fubini's theorem (to interchange limit and sum), we can characterize the stationary distribution as: πj = Σ_i πi Pij
  20. Example: stationary distribution Back to our example. We can compute the 2-step, 3-step, ..., n-step transition distributions and look at WHEN they reach convergence. An alternative method to compute the stationary distribution, for the 2-state chain, consists in using these easy closed-form formulas: π0 = β / (1 − α + β), π1 = (1 − α) / (1 − α + β)
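The two approaches can be checked against each other: iterate the chain until it converges and compare with the closed-form answer. A Python sketch (the values α = 0.7 and β = 0.4 are arbitrary illustrations):

```python
alpha, beta = 0.7, 0.4
P = [[alpha, 1 - alpha],
     [beta, 1 - beta]]

# Closed-form stationary distribution of the 2-state chain
pi0 = beta / (1 - alpha + beta)          # = 0.4 / 0.7
pi1 = (1 - alpha) / (1 - alpha + beta)   # = 0.3 / 0.7

# Convergence check: propagate the distribution many steps from state 0
dist = [1.0, 0.0]
for _ in range(100):
    dist = [sum(dist[i] * P[i][j] for i in range(2)) for j in range(2)]
# dist should now be (numerically) equal to (pi0, pi1)
```

Sanity check: π0 satisfies the stationarity equation π0 = π0·α + π1·β.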
  21. 21. References Ross, S. M. (2006). Introduction to probability models. Access Online via Elsevier. Figure: Cover of the 10th edition
  22. Routines in R markovchain, by Giorgio Alfredo Spedicato. A package for easily handling discrete Markov chains. MCMCpack, by Andrew D. Martin, Kevin M. Quinn, and Jong Hee Park. Performs Bayesian inference via Markov chain Monte Carlo (MCMC) methods.