ML-IR Discussion:
Bag of Little Bootstrap (BLB)
Agenda:
- Recap
- Why bootstrap
- What is bootstrap
- Bag of Little Bootstrap (BLB)
- Guarantees
- Examples
Recap:
Population → Our Sample
Goal: estimate the median!
Asymptotic Approach
Theory has it: the sample median m̂ is asymptotically normal,
sqrt(n) (m̂ − m) → N(0, 1 / (4 f(m)^2)),
where f is the population density at the true median m.
A 95% confidence interval is then m̂ ± 1.96 / (2 f(m) sqrt(n)).
Problems with the Asymptotic Approach:
- The density f is hard to estimate
- The sample size needed for the Central Limit Theorem to kick in is much larger than for the mean
- The true median is unknown
Solution:
When theory is too hard…
let's empirically estimate the theoretical truth!
Empirical Approach: Ideal
Sample from the population over and over again!
Each sample yields its own median estimate: Median Est 1, Median Est 2, …
Take the interval covering 95% of the sample medians.
But in reality we have only our one sample. Is it similar enough to the population?
Empirical Approach: Bootstrap
Efron & Tibshirani (1993)
From our sample, draw n points with replacement, over and over again.
Each resample yields a bootstrap median estimate: Median Est* 1, Median Est* 2, …
Take the interval covering 95% of the bootstrap medians.
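The percentile recipe above can be sketched in a few lines (a minimal illustration; the resample count B = 2000 and the synthetic data are my own choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.standard_normal(500)  # our one observed sample, size n
n, B = len(sample), 2000

# Draw n points with replacement, B times; record each resample's median.
boot_medians = np.array([
    np.median(rng.choice(sample, size=n, replace=True)) for _ in range(B)
])

# The interval covering the central 95% of the bootstrap medians.
lo, hi = np.quantile(boot_medians, [0.025, 0.975])
print(lo, hi)
```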
Empirical Approach: Bootstrap
Used for:
- Bias estimation
- Variance
- Confidence intervals
Main benefits:
- Automatic
- Flexible
- Fast convergence (Hall, 1992)
Key: There are 3 distributions
- The Population
- The Actual Sample: an approximate distribution for the population
- The Bootstrap Samples: approximate distributions for the actual sample
The bootstrap approximates the approximation:
- Is there bias?
- What's the variance?
- etc.
No free meals:
- Bootstrapping requires resampling the entire sample B times
- Each resample is size n
- Resampling only m < n points violates the sample-size properties
- The original sample size cannot be too small (“pre-asymptopia” cases)
Hope
- A size-n resample contains only ~0.632n unique values in expectation
- Sample less: m out of n bootstrap is possible with analytical adjustments (Bickel et al., 1997)

Intuition: each bootstrap resample does not really need all n distinct values.

Problem:
- The analytical adjustment is not as automatic as desirable
- m out of n bootstrap is sensitive to the choice of m
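The 0.632 figure comes from 1 − (1 − 1/n)^n → 1 − e⁻¹ ≈ 0.632; a quick numerical check (illustrative only):

```python
import numpy as np

n = 100_000
# Analytic expected fraction of unique values in a with-replacement resample.
analytic = 1 - (1 - 1 / n) ** n  # approaches 1 - exp(-1) ≈ 0.632

# Empirical check: one resample of size n from {0, ..., n-1}.
rng = np.random.default_rng(0)
resample = rng.integers(0, n, size=n)
empirical = np.unique(resample).size / n

print(analytic, empirical)
```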
Bag of Little Bootstrap
- Sample without replacement from the sample s times, into subsamples of size b
- Resample each subsample with replacement up to size n, r times
- Compute the median of each resample (Med 1, …, Med r)
- Compute a confidence interval for each subsample
- Take the average of the upper and lower endpoints across subsamples as the final confidence interval
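The procedure above can be sketched as follows (a minimal single-machine illustration; the hyperparameter values and the helper name `blb_median_ci` are my own, not from the slides):

```python
import numpy as np

def blb_median_ci(sample, b, s, r, alpha=0.05, seed=0):
    """Bag of Little Bootstraps CI for the median (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = len(sample)
    lowers, uppers = [], []
    for _ in range(s):
        # Subsample of size b, drawn without replacement.
        sub = rng.choice(sample, size=b, replace=False)
        # r resamples, each of full size n, drawn from the b subsample values.
        medians = [np.median(rng.choice(sub, size=n, replace=True))
                   for _ in range(r)]
        lo, hi = np.quantile(medians, [alpha / 2, 1 - alpha / 2])
        lowers.append(lo)
        uppers.append(hi)
    # Average the endpoints across the s subsamples.
    return np.mean(lowers), np.mean(uppers)

sample = np.random.default_rng(1).standard_normal(10_000)
b = int(len(sample) ** 0.6)  # b = n^gamma with gamma = 0.6
print(blb_median_ci(sample, b=b, s=10, r=50))
```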
Bag of Little Bootstrap
Kleiner et al. (2012)
Computational Gains:
- Each resample has only b unique values!
- Can draw a b-dimensional multinomial with n trials instead of materializing n points
- Scales in b instead of n
- Easily parallelizable

If b = n^0.6, for a dataset of size 1TB:
- Bootstrap storage demands ~632GB
- BLB storage demands ~4GB
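The multinomial trick fits in a few lines: rather than materializing an n-point resample, draw counts for the b subsample values and treat them as weights (illustrative sketch; the weighted-median computation here is my own):

```python
import numpy as np

rng = np.random.default_rng(0)
b, n = 100, 1_000_000
sub = np.sort(rng.standard_normal(b))  # the b unique subsample values, sorted

# One "resample of size n" is just b counts summing to n: storage is O(b).
counts = rng.multinomial(n, [1.0 / b] * b)

# Weighted median: the first value where the cumulative count passes n/2.
cum = np.cumsum(counts)
weighted_median = sub[np.searchsorted(cum, n / 2)]
print(weighted_median)
```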
Bag of Little Bootstrap
Theoretical guarantees:
- Consistency
- Higher order correctness
- Fast convergence rate (same as bootstrap)
Performance
b = n^gamma, 0.5 <= gamma <= 1
These choices of gamma ensure bootstrap convergence rates.

Relative error of confidence interval width of logistic regression coefficients (Kleiner et al., 2012)
[Plots: Gamma residuals; t-distributed residuals]
Performance vs Time
Selecting Hyperparameters
• b, the number of unique samples in each little bootstrap
• s, the number of size-b samples drawn w/o replacement
• r, the number of multinomials to draw

b: the larger the better
s, r: adaptively increase until convergence is reached (the median doesn't change)
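The adaptive rule for r can be sketched as a simple loop: keep adding resamples until the running median estimate stops moving (a hypothetical stopping rule; the tolerance and batch size are my own choices, not from the slides):

```python
import numpy as np

def adaptive_medians(sub, n, tol=1e-3, batch=20, max_r=2000, seed=0):
    """Grow r until the median-of-medians stabilizes (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    medians, prev = [], None
    while len(medians) < max_r:
        medians += [np.median(rng.choice(sub, size=n, replace=True))
                    for _ in range(batch)]
        cur = np.median(medians)
        if prev is not None and abs(cur - prev) < tol:
            break  # estimate has converged; stop adding resamples
        prev = cur
    return medians

sub = np.random.default_rng(1).standard_normal(200)  # a size-b subsample
medians = adaptive_medians(sub, n=10_000)
print(len(medians), np.median(medians))
```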
Bag of Little Bootstrap
Main benefits:
- Computationally friendly
- Maintains most statistical properties of bootstrap
- Flexibility
- More robust to choice of b than older methods
Reference
• Efron & Tibshirani (1993) An Introduction to the Bootstrap
• Kleiner et al. (2012) A Scalable Bootstrap for Massive Data

Thanks!
