ML-IR Discussion: Bag of Little Bootstrap (BLB)
Recap:
- Recap
- Why bootstrap
- What is bootstrap
- Bag of Little Bootstrap (BLB)
- Guarantees
- Examples
Recap:
Population

Our Sample
Estimate the median!
Asymptotic Approach
Theory has it: [formula image]
95% Confidence Interval [figure]
Problems with the asymptotic approach:
- Density “f” is hard to estimate
- The median needs a much larger sample size than the mean for the Central Limit Theorem to kick in
- True median unknown
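
For reference, the standard asymptotic result for the sample median is sketched below. That this is the formula the slide displayed is an assumption, since only the slide text survives. Here m is the true median, f the population density (assumed continuous and positive at m), and \hat{m}_n the sample median of n points. Both problems listed above are visible in it: the interval depends on the unknown f(m).

```latex
\sqrt{n}\,\bigl(\hat{m}_n - m\bigr) \xrightarrow{d} \mathcal{N}\!\left(0,\ \frac{1}{4 f(m)^2}\right),
\qquad
\hat{m}_n \pm \frac{1.96}{2 f(m)\sqrt{n}} \quad \text{(approximate 95\% interval)}
```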
Solution:
When theory is too hard…
Let’s empirically estimate the theoretical truth!
Empirical Approach: Ideal
Population
Sample over and over again!
Median Est 1
Median Est 2
Empirical Approach: Ideal
95% of sample medians
Similar enough?
Population
Our Sample
Empirical Approach: Bootstrap
Efron & Tibshirani (1993)
Our Sample
Draw n samples with replacement
Median Est* 1
Median Est* 2
Empirical Approach: Bootstrap
95% of sample medians
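
The resampling loop above can be sketched in a few lines of Python. This is a minimal illustration rather than the presenter's code: the function name `bootstrap_median_ci`, the number of resamples `n_boot`, and the exponential toy data are all assumptions made for the example, and the interval is the simple percentile interval.

```python
import numpy as np

def bootstrap_median_ci(sample, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the median (illustrative)."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample)
    n = sample.size
    # Each row indexes one bootstrap sample: n draws with replacement.
    idx = rng.integers(0, n, size=(n_boot, n))
    medians = np.median(sample[idx], axis=1)  # Median Est* 1, ..., Est* n_boot
    # Keep the central 95% of the bootstrap medians.
    lower, upper = np.quantile(medians, [alpha / 2, 1 - alpha / 2])
    return lower, upper

# Toy data standing in for "Our Sample" (assumed for illustration).
rng = np.random.default_rng(42)
sample = rng.exponential(scale=1.0, size=500)
lower, upper = bootstrap_median_ci(sample)
print(f"median = {np.median(sample):.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Note that no density estimate is needed anywhere, which is the "automatic" property the asymptotic approach lacks.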
Empirical Approach: Bootstrap
Used for:
- Bias estimation
- Variance
- Confidence intervals
Main benefits:
- Automatic
- Flexible
- Fast convergence (Hall, 1992)
Key: There are 3 distributions
Population
Approximate distribution
Actual Sample
Approximate distribution
Bootstrap Samples
Approximate the approximation
- Is there bias?
- What’s the variance?
- etc.
No free meals:
- Bootstrapping requires resampling the entire sample B times
- Each resample is of size n
- Sampling m < n will violate the sample size properties
- Original sample size cannot be too small
- “Pre-asymptopia” cases
Hope
- A resample of size n is expected to contain about 0.632n unique points
- Sample less: the m out of n bootstrap is possible with analytical adjustments (Bickel et al. 1997)

Intuition: each bootstrap needs fewer than all n values.

Problem:
- The analytical adjustment is not as automatic as desirable
- The m out of n bootstrap is sensitive to the choice of m
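
The 0.632 figure comes from the chance that a given point appears at least once in n draws with replacement, 1 - (1 - 1/n)^n, which tends to 1 - 1/e. A quick check, with the helper name `expected_unique_fraction` and the value of n chosen only for illustration:

```python
import numpy as np

def expected_unique_fraction(n):
    """Expected fraction of distinct points in n draws with replacement from n points."""
    return 1.0 - (1.0 - 1.0 / n) ** n

n = 10_000
rng = np.random.default_rng(0)
resample = rng.integers(0, n, size=n)      # indices of one bootstrap resample
simulated = np.unique(resample).size / n   # fraction of distinct points drawn

print(expected_unique_fraction(n), simulated, 1 - np.exp(-1))
```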
Bag of Little Bootstrap
- Subsample the sample without replacement s times, each subsample of size b
- Resample each subsample with replacement until the sample size is n, r times
- Compute the median of each resample (Med 1, …, Med r)
- Compute the confidence interval for each subsample
- Average the lower and upper endpoints across the s subsamples to get the final confidence interval
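
The steps above can be sketched as follows. This is a hedged illustration, not the reference implementation from the paper: the names `blb_median_ci` and `weighted_median`, the defaults for s, r and gamma, and the normal toy data are assumptions. Each size-n resample is stored as multinomial counts over the b subsample points, and the median is taken as the lower weighted median of those counts.

```python
import numpy as np

def weighted_median(values, counts):
    """Lower median of a dataset in which values[i] appears counts[i] times."""
    order = np.argsort(values)
    values, counts = values[order], counts[order]
    cum = np.cumsum(counts)
    # First value whose cumulative count reaches half of the total.
    return values[np.searchsorted(cum, cum[-1] / 2.0)]

def blb_median_ci(sample, s=10, r=100, gamma=0.6, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample)
    n = sample.size
    b = int(n ** gamma)                      # subsample size b = n^gamma
    lowers, uppers = [], []
    for _ in range(s):
        # Step 1: subsample b points without replacement.
        subsample = rng.choice(sample, size=b, replace=False)
        # Step 2: r resamples of size n, kept as counts over the b points.
        counts = rng.multinomial(n, np.full(b, 1.0 / b), size=r)
        # Step 3: median of each resample.
        medians = np.array([weighted_median(subsample, c) for c in counts])
        # Step 4: percentile confidence interval for this subsample.
        lo, hi = np.quantile(medians, [alpha / 2, 1 - alpha / 2])
        lowers.append(lo)
        uppers.append(hi)
    # Step 5: average the endpoints across the s subsamples.
    return float(np.mean(lowers)), float(np.mean(uppers))

rng = np.random.default_rng(1)
sample = rng.normal(size=10_000)             # toy data, assumed for illustration
lower, upper = blb_median_ci(sample)
print(f"BLB 95% CI for the median: ({lower:.3f}, {upper:.3f})")
```

The inner work touches only b values plus a length-b count vector, never an array of n points.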
Bag of Little Bootstrap
Kleiner et al. 2012
Computational gains:
- Each resample has only b unique values!
- Can sample a b-dimensional multinomial with n trials
- Scales in b instead of n
- Easily parallelizable
If b = n^(0.6), for a dataset of size 1TB:
- Bootstrap storage demands ~ 632GB
- BLB storage demands ~ 4GB
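
The storage figures can be reproduced with a little arithmetic. The slide does not state the record size, so the sketch below assumes each data point occupies 1 MB, which makes a 1 TB dataset n = 1,000,000 points; the variable names are illustrative.

```python
n = 1_000_000          # assumed: 1 TB dataset made of 1 MB data points
point_mb = 1.0         # assumed size of one data point, in MB

# A bootstrap resample holds about 0.632 * n distinct points.
bootstrap_gb = 0.632 * n * point_mb / 1000

# A BLB subsample holds b = n^0.6 distinct points.
b = round(n ** 0.6)
blb_gb = b * point_mb / 1000

print(f"bootstrap ~ {bootstrap_gb:.0f} GB, BLB ~ {blb_gb:.1f} GB (b = {b})")
```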
Bag of Little Bootstrap
Theoretical guarantees:
- Consistency
- Higher order correctness
- Fast convergence rate (same as bootstrap)
Performance
b = n^(gamma), 0.5 <= gamma <= 1
These choices of gamma ensure bootstrap convergence rates.
[Figure: relative error of confidence interval width for logistic regression coefficients (Kleiner et al. 2012); panels: Gamma residuals, t-distr residuals]
Performance vs Time [figure]
Selecting Hyperparameters
• b, the number of unique samples for each little bootstrap
• s, the number of size-b samples drawn w/o replacement
• r, the number of multinomials to draw

b: the larger the better
s, r: increase adaptively until convergence is reached (the median estimate stops changing)
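
One way to read "increase adaptively until convergence" for r is sketched below. The stopping rule is an assumption made for illustration, not a rule taken from the slides: resamples are added in batches until the confidence interval endpoints move by no more than a fraction `tol` of the interval width. The names `adaptive_r` and `draw_estimates`, and all defaults, are hypothetical.

```python
import numpy as np

def adaptive_r(draw_estimates, batch=20, tol=0.05, max_r=500, alpha=0.05):
    """Grow r in batches until the percentile CI endpoints stabilise."""
    q = [alpha / 2, 1 - alpha / 2]
    estimates = np.asarray(draw_estimates(batch), dtype=float)
    cur = prev = np.quantile(estimates, q)
    while estimates.size < max_r:
        estimates = np.concatenate([estimates, draw_estimates(batch)])
        cur = np.quantile(estimates, q)
        width = cur[1] - cur[0]
        # Stop once neither endpoint moved by more than tol * width.
        if width > 0 and np.max(np.abs(cur - prev)) <= tol * width:
            break
        prev = cur
    return estimates.size, cur

# Toy setup (assumed): one subsample of b = 250 points, resampled to n = 10,000.
rng = np.random.default_rng(0)
subsample = rng.normal(size=250)
n = 10_000

def draw_estimates(k):
    """Draw k multinomial resamples of size n and return their medians."""
    p = np.full(subsample.size, 1.0 / subsample.size)
    counts = rng.multinomial(n, p, size=k)
    return np.array([np.median(np.repeat(subsample, c)) for c in counts])

r, ci = adaptive_r(draw_estimates)
print(f"stopped at r = {r}, CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The same pattern could be applied to s by monitoring the averaged endpoints as subsamples are added.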
Bag of Little Bootstrap
Main benefits:
- Computationally friendly
- Maintains most statistical properties of bootstrap
- Flexibility
- More robust to choice of b than older methods
Reference
• Efron, Tibshirani (1993) An Introduction to the Bootstrap
• Kleiner et al. (2012) A Scalable Bootstrap for Massive Data

Thanks!
