ML-IR Discussion:
Bag of Little Bootstrap (BLB)
Outline:
- Recap
- Why bootstrap
- What is bootstrap
- Bag of Little Bootstrap (BLB)
- Guarantees
- Examples
Recap:
Population

Our Sample
Estimate the median!
Asymptotic Approach
Theory has it:

?
Asymptotic Approach

95%
Confidence Interval
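
For reference, the standard asymptotic result for the sample median, which is presumably what "Theory has it" refers to, can be written as below. This is a sketch under the assumption that the population density f is continuous and positive at the true median; the symbols \hat{m}_n (sample median) and m (true median) are notation introduced here, not taken from the slides.

```latex
\sqrt{n}\,\bigl(\hat{m}_n - m\bigr) \xrightarrow{d}
\mathcal{N}\!\left(0,\ \frac{1}{4 f(m)^2}\right),
\qquad
\text{95\% CI: } \hat{m}_n \pm \frac{1.96}{2 f(m)\sqrt{n}}
```

The interval depends on f(m), the unknown density evaluated at the unknown median, which is why the asymptotic approach is hard to apply directly.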
Problems with the asymptotic approach:

- Density “f” is hard to estimate
- The median needs a much larger sample size than the mean for
the Central Limit Theorem to kick in
- True median unknown
Solution:
When theory is too hard…
Let’s empirically estimate
theoretical truth!
Empirical Approach: Ideal
Population

Sample Over and
Over again!

Median Est 1

Median Est 2
Empirical Approach: Ideal
95% of sample medians
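
The ideal approach can be sketched in a few lines, assuming we could draw fresh samples from the population at will. The exponential population, the function names, and the parameter values are all hypothetical choices for illustration.

```python
import random
import statistics

def ideal_median_interval(draw_sample, n, repeats=2000, level=0.95, seed=0):
    """Central `level` range of sample medians over many fresh samples."""
    rng = random.Random(seed)
    medians = sorted(
        statistics.median(draw_sample(rng, n)) for _ in range(repeats)
    )
    lo_idx = int((1 - level) / 2 * repeats)
    hi_idx = int((1 + level) / 2 * repeats) - 1
    return medians[lo_idx], medians[hi_idx]

# Hypothetical population: exponential with rate 1 (true median = ln 2).
def draw_exponential(rng, n):
    return [rng.expovariate(1.0) for _ in range(n)]

lo, hi = ideal_median_interval(draw_exponential, n=200)
print(f"95% of sample medians fall in [{lo:.3f}, {hi:.3f}]")
```

In practice only one sample is available, which is the gap the bootstrap tries to close.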
Similar
Enough?
Population

Our Sample
Empirical Approach: Bootstrap
Efron & Tibshirani (1993)
Our Sample

Draw n observations
with replacement

Median Est* 1

Median Est* 2
Empirical Approach: Bootstrap
95% of sample medians
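
A minimal sketch of the bootstrap version, using the percentile method: the population is replaced by the one sample we have, and each resample draws n observations from it with replacement. The function name, B=1000, and the synthetic exponential sample are assumptions for illustration.

```python
import random
import statistics

def bootstrap_median_ci(sample, B=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the median."""
    rng = random.Random(seed)
    n = len(sample)
    # Each resample has the same size n as the original sample.
    medians = sorted(
        statistics.median(rng.choices(sample, k=n)) for _ in range(B)
    )
    lo_idx = int((1 - level) / 2 * B)
    hi_idx = int((1 + level) / 2 * B) - 1
    return medians[lo_idx], medians[hi_idx]

# Hypothetical data standing in for "our sample".
data_rng = random.Random(1)
sample = [data_rng.expovariate(1.0) for _ in range(500)]

lo, hi = bootstrap_median_ci(sample)
print(f"bootstrap 95% CI for the median: [{lo:.3f}, {hi:.3f}]")
```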
Empirical Approach: Bootstrap
Used for:
- Bias estimation
- Variance
- Confidence intervals
Main benefits:
- Automatic
- Flexible
- Fast convergence (Hall, 1992)
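
The first two uses can be sketched with the same resampling loop: bias is estimated as the mean of the bootstrap replicates minus the original estimate, and variance as the variance of the replicates. The function name and the Gaussian test data are assumptions for illustration.

```python
import random
import statistics

def bootstrap_bias_and_variance(sample, estimator, B=1000, seed=0):
    """Bootstrap estimates of the bias and variance of `estimator`."""
    rng = random.Random(seed)
    n = len(sample)
    theta_hat = estimator(sample)
    replicates = [estimator(rng.choices(sample, k=n)) for _ in range(B)]
    bias = statistics.mean(replicates) - theta_hat
    variance = statistics.variance(replicates)
    return bias, variance

# Hypothetical sample: 300 standard normal draws.
data_rng = random.Random(2)
sample = [data_rng.gauss(0.0, 1.0) for _ in range(300)]

bias, variance = bootstrap_bias_and_variance(sample, statistics.median)
print(f"estimated bias: {bias:.4f}, estimated variance: {variance:.5f}")
```

Because `estimator` is passed in as a function, the same loop works for any statistic, which is the "automatic" and "flexible" part.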
Key: There are 3 distributions
Population

Approximate
distribution
Actual Sample

Approximate
distribution

Bootstrap Samples

Approximate
the approximation
- Is there bias?
- What’s the variance?
- etc.
No free meals:
- Bootstrapping requires resampling the entire
sample B times
- Each resample has size n
- Resampling only m < n points violates the sample size
properties
- Original sample size cannot be too small
- “Pre-asymptopia” cases
Hope
- Each resample is expected to contain only about 0.632n distinct points
- Sample less: an m out of n bootstrap is possible with
analytical adjustments (Bickel 1997)

Intuition: Each bootstrap resample needs fewer than all n values.

Problem:
- The analytical adjustment is not as automatic as desired
- The m out of n bootstrap is sensitive to the choice of m
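
The 0.632 figure follows from a short calculation. A given point is missed by one draw with probability 1 - 1/n, so it is missed by all n draws of a resample with probability (1 - 1/n)^n. The symbol D_n (number of distinct points in a resample) is notation introduced here.

```latex
\Pr(\text{point } i \text{ appears in the resample})
  = 1 - \left(1 - \frac{1}{n}\right)^{n}
  \;\longrightarrow\; 1 - e^{-1} \approx 0.632,
\qquad
\mathbb{E}[D_n] = n\left[1 - \left(1 - \frac{1}{n}\right)^{n}\right] \approx 0.632\,n
```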
Bag of Little Bootstrap
- Subsample the sample without replacement s times, each subsample of size b
- Resample each subsample with replacement until the sample size is n; do this r times
- Compute the median of each resample (Med 1, ..., Med r)
- Compute the confidence interval for each subsample
- Average the lower endpoints and the upper endpoints across subsamples
to get the final confidence interval
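
The steps above can be sketched as follows for the median. This is a minimal illustration, not the reference implementation: the function names, the default values of s, r and gamma, and the synthetic data are assumptions, and each resample is materialized at full size n for clarity rather than speed.

```python
import random
import statistics

def percentile_interval(values, level=0.95):
    """Central `level` percentile interval of a list of numbers."""
    ordered = sorted(values)
    k = len(ordered)
    lo_idx = int((1 - level) / 2 * k)
    hi_idx = max(lo_idx, int((1 + level) / 2 * k) - 1)
    return ordered[lo_idx], ordered[hi_idx]

def blb_median_ci(sample, s=10, r=100, gamma=0.6, level=0.95, seed=0):
    """Bag of Little Bootstrap confidence interval for the median."""
    rng = random.Random(seed)
    n = len(sample)
    b = max(2, int(n ** gamma))  # subsample size b = n^gamma
    lowers, uppers = [], []
    for _ in range(s):
        # Step 1: subsample b points without replacement.
        subsample = rng.sample(sample, b)
        # Steps 2-3: resample up to size n with replacement, r times,
        # and compute the median of each resample.
        medians = [
            statistics.median(rng.choices(subsample, k=n)) for _ in range(r)
        ]
        # Step 4: one confidence interval per subsample.
        lo, hi = percentile_interval(medians, level)
        lowers.append(lo)
        uppers.append(hi)
    # Step 5: average the endpoints across the s subsamples.
    return statistics.mean(lowers), statistics.mean(uppers)

# Hypothetical data: 2000 exponential draws.
data_rng = random.Random(3)
sample = [data_rng.expovariate(1.0) for _ in range(2000)]

lo, hi = blb_median_ci(sample, s=5, r=50)
print(f"BLB 95% CI for the median: [{lo:.3f}, {hi:.3f}]")
```

The s iterations of the outer loop do not depend on each other, so they could run on separate workers.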
Bag of Little Bootstrap
Kleiner et al. 2012
Computational Gains:
- Each resample has at most b distinct values!
- A resample can be drawn as a b-dimensional multinomial
with n trials.
- Scales in b instead of n
- Easily parallelizable
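
A rough sketch of the multinomial idea: instead of storing n resampled values, store one count per subsample point and compute the statistic from values and counts. The helper names are hypothetical. The counts are drawn with a plain loop to stay within the standard library, which takes O(n) time but only O(b) memory; a dedicated multinomial sampler would avoid the loop over n.

```python
import random
from collections import Counter

def multinomial_counts(b, n, rng):
    """Counts from a uniform multinomial: n trials over b categories."""
    return Counter(rng.randrange(b) for _ in range(n))

def weighted_median(values, counts):
    """Lower median of the multiset where values[i] occurs counts[i] times."""
    total = sum(counts[i] for i in range(len(values)))
    running = 0
    # Walk the distinct values in increasing order, accumulating counts.
    for i in sorted(range(len(values)), key=values.__getitem__):
        running += counts[i]
        if 2 * running >= total:
            return values[i]

rng = random.Random(0)
subsample = [rng.gauss(0.0, 1.0) for _ in range(50)]  # b = 50 points
counts = multinomial_counts(len(subsample), 10_000, rng)  # n = 10,000
med = weighted_median(subsample, counts)
print(f"median of the size-n resample: {med:.4f}")
```

Only the 50 values and their 50 counts are held in memory, even though the resample they represent has 10,000 points.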
If b = n^(0.6) and the dataset is 1 TB:
- A bootstrap resample needs ~632 GB of storage
- A BLB subsample needs ~4 GB of storage
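
These figures can be reproduced with a short calculation under one assumed setup: n = 10^6 data points of 1 MB each, so the full dataset is 1 TB. The per-point size is an assumption for illustration; the slide gives only the totals.

```python
# Assumption for illustration: 10**6 points of 1 MB each (1 TB in total).
n = 10**6
mb_per_point = 1.0
gamma = 0.6

# Expected fraction of distinct points in a size-n bootstrap resample.
distinct_fraction = 1 - (1 - 1 / n) ** n
bootstrap_gb = distinct_fraction * n * mb_per_point / 1000

# A BLB subsample holds only b = n^gamma distinct points.
b = n ** gamma
blb_gb = b * mb_per_point / 1000

print(f"bootstrap resample: ~{bootstrap_gb:.0f} GB")
print(f"BLB subsample (b ~ {b:.0f} points): ~{blb_gb:.1f} GB")
```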
Bag of Little Bootstrap
Theoretical guarantees:
- Consistency
- Higher order correctness
- Fast convergence rate (same as bootstrap)
Performance
b = n^(gamma), 0.5 <= gamma <= 1
These choices of gamma ensure bootstrap convergence rates.
Relative error of confidence interval width of logistic regression
coefficients
(Kleiner et al. 2012)

Panels: Gamma residuals, t-distributed residuals
Performance vs Time
Selecting Hyperparameters
• b, the number of distinct points in each little bootstrap
• s, the number of size-b subsamples drawn without replacement
• r, the number of multinomial resamples drawn per subsample

b: the larger the better
s, r: increase adaptively until convergence
is reached (the median stops changing)
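
One simple way to increase r adaptively for a single subsample is sketched below. The stopping rule (endpoints moving by less than a fraction `tol` of the interval width between checks) and all names and defaults are assumptions for illustration, not the rule used by Kleiner et al.

```python
import random
import statistics

def adaptive_interval(subsample, n, level=0.95, step=20, max_r=400,
                      tol=0.05, seed=0):
    """Add resamples in batches until the interval endpoints stabilize."""
    rng = random.Random(seed)
    medians = []
    previous = None
    while len(medians) < max_r:
        medians.extend(
            statistics.median(rng.choices(subsample, k=n))
            for _ in range(step)
        )
        ordered = sorted(medians)
        k = len(ordered)
        current = (ordered[int((1 - level) / 2 * k)],
                   ordered[int((1 + level) / 2 * k) - 1])
        if previous is not None:
            width = max(current[1] - current[0], 1e-12)
            change = max(abs(current[0] - previous[0]),
                         abs(current[1] - previous[1]))
            if change / width <= tol:
                break  # endpoints barely moved: treat as converged
        previous = current
    return current, len(medians)

# Hypothetical subsample of b = 60 points, resampled up to n = 2000.
data_rng = random.Random(4)
subsample = [data_rng.expovariate(1.0) for _ in range(60)]

(lo, hi), r_used = adaptive_interval(subsample, n=2000)
print(f"interval [{lo:.3f}, {hi:.3f}] after r = {r_used} resamples")
```

The same idea could be applied to s by tracking the averaged endpoints as subsamples are added.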
Bag of Little Bootstrap
Main benefits:
- Computationally friendly
- Maintains most statistical properties of bootstrap
- Flexibility
- More robust to choice of b than older methods
References
• Efron & Tibshirani (1993) An Introduction to the Bootstrap
• Kleiner et al. (2012) A Scalable Bootstrap for Massive Data

Thanks!
