Introduction to the (Bag of Little) Bootstrap(s)
Agenda
1. Identification of the Problem
2. Classic Bootstrap
3. Bag of Little Bootstraps
4. Evaluation of Success (or Failure)
5. Next Steps
SECTION 1: Identification of the Problem
A/B Testing
CONTROL: No pugs. No change.
VARIANT: Etsy with PUG!
Standard A/B Experiment
(figure: controls vs. variants)
(figure: Browser # ABJ463MZ2 is bucketed OFF; controls vs. variants)
To pug or not to pug?
Standard error: accounts for variance and sample sizes → confidence intervals
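A minimal sketch of the standard error and 95% confidence interval for a difference in means. It assumes the unpooled standard error and a normal approximation (z = 1.96); the function name `diff_in_means_ci` and the toy data are illustrative, not Etsy's implementation.

```python
import math
import statistics

def diff_in_means_ci(control, variant, z=1.96):
    """Variant minus control: (difference, standard error, (low, high)).

    Assumes the unpooled standard error sqrt(s_c^2/n_c + s_v^2/n_v)
    and a normal approximation for the 95% interval.
    """
    diff = statistics.mean(variant) - statistics.mean(control)
    se = math.sqrt(statistics.variance(control) / len(control)
                   + statistics.variance(variant) / len(variant))
    return diff, se, (diff - z * se, diff + z * se)

# Toy 0/1 conversion data (illustrative only).
control = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
variant = [1, 1, 0, 1, 0, 1, 0, 1, 1, 0]
diff, se, (low, high) = diff_in_means_ci(control, variant)
print(f"diff={diff:.2f} se={se:.3f} 95% CI=({low:.3f}, {high:.3f})")
```

With only ten visits per arm, the interval here straddles 0, so this toy experiment could not call the pug a winner.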
(figure: confidence intervals relative to 0: ? → +)
A/A Testing
CONTROL: No pugs. No change.
VARIANT: No pugs. No change.
Standard A/A Experiment
→ + (a significant result, even though nothing changed)
What the?
Statistical Assumptions for Parametric Testing
i.i.d.: independent & identically distributed
(figure: observations from users A, A, B, C, D, E; user A appears twice)
normally distributed
Violated assumptions → confidence intervals that are too narrow
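To see how an A/A test can come back "+", here is a hypothetical simulation; the data-generating process and every name in it are assumptions, not Etsy data. Each user has their own conversion propensity and 1 to 20 visits, so visits from one user are not independent. A test that treats every visit as an i.i.d. observation flags far more than 5% of A/A tests, while a test with one observation per user should stay close to 5%.

```python
import numpy as np

def welch_z(a, b):
    """Difference in means divided by its unpooled standard error."""
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return (b.mean() - a.mean()) / se

def simulate_aa(n_users=200, n_sims=500, z=1.96, seed=0):
    """Share of simulated A/A tests called significant: (visit-level, user-level)."""
    rng = np.random.default_rng(seed)
    naive_hits = user_hits = 0
    for _ in range(n_sims):
        propensity = rng.beta(0.5, 0.5, size=n_users)  # each user's own conversion rate
        visits = rng.integers(1, 21, size=n_users)     # 1 to 20 visits per user
        arm = rng.integers(0, 2, size=n_users)         # A/A: bucket by user, no treatment
        user_of_visit = np.repeat(np.arange(n_users), visits)
        converted = (rng.random(user_of_visit.size) < propensity[user_of_visit]).astype(float)
        visit_arm = arm[user_of_visit]
        # Naive: every visit treated as an independent observation (violates i.i.d.).
        naive_hits += int(abs(welch_z(converted[visit_arm == 0], converted[visit_arm == 1])) > z)
        # User level: one observation (that user's conversion rate) per user.
        per_user = np.bincount(user_of_visit, weights=converted) / visits
        user_hits += int(abs(welch_z(per_user[arm == 0], per_user[arm == 1])) > z)
    return naive_hits / n_sims, user_hits / n_sims

naive_rate, user_rate = simulate_aa()
print(f"visit-level false positives: {naive_rate:.0%}, user-level: {user_rate:.0%}")
```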
SECTION 2: Classic Bootstrap
Sample boots: resample WITH replacement (do this many, many times)
Take the 95% confidence interval
(figure: boots 1 to 4 each produce results; the 2.5th and 97.5th percentiles of the distribution of t-statistics bound the 95% confidence interval)
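A minimal sketch of the classic bootstrap as the slides describe it: resample each arm WITH replacement many times, compute a t-statistic per boot, and take the 2.5th and 97.5th percentiles of those t-statistics. The unpooled standard error, the number of boots, and the function names are assumptions made for illustration.

```python
import numpy as np

def t_statistic(control, variant):
    """Difference in means over its unpooled standard error (assumed SE form)."""
    se = np.sqrt(control.var(ddof=1) / control.size + variant.var(ddof=1) / variant.size)
    return (variant.mean() - control.mean()) / se

def classic_bootstrap_interval(control, variant, n_boots=2000, seed=0):
    """2.5th and 97.5th percentiles of the boots' t-statistics."""
    rng = np.random.default_rng(seed)
    stats = np.empty(n_boots)
    for i in range(n_boots):
        # One boot: resample each arm WITH replacement, back to its original size.
        c = rng.choice(control, size=control.size, replace=True)
        v = rng.choice(variant, size=variant.size, replace=True)
        stats[i] = t_statistic(c, v)
    return np.percentile(stats, [2.5, 97.5])

# Illustrative data: the variant's mean is 1 unit higher.
rng = np.random.default_rng(42)
control = rng.normal(10.0, 2.0, size=500)
variant = rng.normal(11.0, 2.0, size=500)
low, high = classic_bootstrap_interval(control, variant)
print(f"95% interval of t-statistics: ({low:.2f}, {high:.2f})")
```

Because every row is resampled on its own, this version still assumes rows are i.i.d.; if a row is a visit, the dependence between visits from the same user is lost.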
BOOTSTRAP BY USER
instead of by "visit" (one user can have many "visits")

user id   visit id   $
A         45         1
A         23         1
A         85         0
B         37         0
C         12         1
C         72         0

Visits grouped by user (A, B, C) → closer to i.i.d. (independent & identically distributed)
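A minimal sketch of one user-level boot over the rows in the table above: resample USERS with replacement, and let each drawn user bring all of their visits. The `$` column is read here as a 0/1 purchase flag, which is an assumption since the slide does not define it; `visits_by_user` and `user_level_boot` are hypothetical helper names.

```python
import random
from collections import defaultdict

# Rows from the table above: (user id, visit id, $).
visits = [("A", 45, 1), ("A", 23, 1), ("A", 85, 0),
          ("B", 37, 0), ("C", 12, 1), ("C", 72, 0)]

def visits_by_user(rows):
    """Group (visit id, $) pairs under their user id."""
    grouped = defaultdict(list)
    for user, visit, dollars in rows:
        grouped[user].append((visit, dollars))
    return grouped

def user_level_boot(rows, rng):
    """One boot: draw users WITH replacement; each draw brings all of that user's visits."""
    grouped = visits_by_user(rows)
    users = sorted(grouped)
    drawn = rng.choices(users, k=len(users))  # same number of users as the original
    return [(user, visit, dollars) for user in drawn for visit, dollars in grouped[user]]

grouped = visits_by_user(visits)
boot = user_level_boot(visits, random.Random(7))
share_with_purchase = sum(d for _, _, d in boot) / len(boot)
print(boot, share_with_purchase)
```

Since a user's visits always travel together, the dependence between them is preserved inside each boot, and the units being resampled (users) are much closer to independent.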
SECTION 3: Bag of Little Bootstraps
Users in our original experiment: A B C D E F
Fetch a bag: resample withOUT replacement
bag 1: A E F B
Users in our original experiment: A B C D E F
bag 1: F A E B
Monte Carlo subsamples: resample WITH replacement (from our bag) *TO SIZE OF ORIGINAL DATA SET*
boot 1: E B F A F B
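A minimal sketch of the two steps above, using the slide's six users and a bag of four. The Monte Carlo subsample is stored as multinomial counts (how often each bag member is drawn) rather than as n materialized rows; Kleiner et al. use this representation so that a resample of size n costs only as much as the bag. The helper names and the seed are hypothetical; their experiments use bag sizes of the form n^γ with γ between 0.5 and 0.9.

```python
import numpy as np

users = np.array(list("ABCDEF"))  # users in the original experiment (n = 6)

def fetch_bag(users, bag_size, rng):
    """A bag: bag_size distinct users drawn WITHOUT replacement."""
    return rng.choice(users, size=bag_size, replace=False)

def monte_carlo_subsample(bag, n, rng):
    """Resample WITH replacement from the bag, up to the original data set size n.

    Returns one count per bag member; the counts always sum to n.
    """
    return rng.multinomial(n, np.full(bag.size, 1.0 / bag.size))

rng = np.random.default_rng(1)
bag = fetch_bag(users, 4, rng)
counts = monte_carlo_subsample(bag, users.size, rng)
boot = np.repeat(bag, counts)  # expanded only for display: 6 users drawn from a 4-user bag
print(bag, counts, boot)
```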
t-statistic = difference in populations / standard error [per Monte Carlo'd subsample]
→ bootstrapped confidence interval
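The slide's ratio written out, under two assumptions the slide does not state: the standard error is the unpooled (Welch) form, and a Monte Carlo'd subsample is stored as counts $c_i$, the number of times bag member $i$ (with metric value $y_i$) was drawn. Here $g \in \{C, V\}$ stands for control and variant.

```latex
t = \frac{\bar{y}_V - \bar{y}_C}{\sqrt{\dfrac{s_C^2}{n_C} + \dfrac{s_V^2}{n_V}}},
\qquad
n_g = \sum_{i \in g} c_i,
\qquad
\bar{y}_g = \frac{1}{n_g} \sum_{i \in g} c_i \, y_i,
\qquad
s_g^2 = \frac{1}{n_g - 1} \sum_{i \in g} c_i \, (y_i - \bar{y}_g)^2
```

With all $c_i = 1$ this reduces to the ordinary two-sample t-statistic on the bag itself.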
(figure: boots 1 to 4, each resampled from bag 1 (F A E B), yield t-statistics; their 2.5th and 97.5th percentiles give the confidence interval from bag 1)
Average the confidence intervals across bags
(figure: bags 1 to 4 each yield a confidence interval; the averaged 2.5th and 97.5th percentiles of the t-statistics give the final interval)
WINS
Fixes the i.i.d. & distribution problems
Suitable for distributed systems
Faster, less memory
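Putting the pieces together, a minimal Bag of Little Bootstraps sketch: fetch bags without replacement, draw Monte Carlo subsamples of the original size as multinomial counts, compute a count-weighted t-statistic per subsample, take each bag's 2.5th and 97.5th percentiles, and average those endpoints across bags. It assumes each row is one user's metric with an `arm` label (0 = control, 1 = variant); the bag count, `gamma`, boot count, and function names are hypothetical defaults, not values from the talk.

```python
import numpy as np

def weighted_t(values, counts, arm):
    """t-statistic for one Monte Carlo subsample stored as per-row counts."""
    parts = []
    for a in (0, 1):
        w, y = counts[arm == a], values[arm == a]
        n = w.sum()
        mean = (w * y).sum() / n
        var = (w * (y - mean) ** 2).sum() / (n - 1)
        parts.append((mean, var, n))
    (m0, v0, n0), (m1, v1, n1) = parts
    return (m1 - m0) / np.sqrt(v0 / n0 + v1 / n1)

def bag_of_little_bootstraps(values, arm, n_bags=20, gamma=0.7, n_boots=100, seed=0):
    rng = np.random.default_rng(seed)
    n = values.size
    b = int(n ** gamma)  # bag size
    lows, highs = [], []
    for _ in range(n_bags):
        idx = rng.choice(n, size=b, replace=False)  # fetch a bag WITHOUT replacement
        bag_values, bag_arm = values[idx], arm[idx]
        stats = np.empty(n_boots)
        for j in range(n_boots):
            # Resample WITH replacement from the bag, to the original size n.
            counts = rng.multinomial(n, np.full(b, 1.0 / b))
            stats[j] = weighted_t(bag_values, counts, bag_arm)
        low, high = np.percentile(stats, [2.5, 97.5])  # this bag's interval
        lows.append(low)
        highs.append(high)
    return float(np.mean(lows)), float(np.mean(highs))  # average across bags

rng = np.random.default_rng(42)
arm = rng.integers(0, 2, size=2000)
values = rng.normal(10.0, 2.0, size=2000) + 1.0 * arm  # illustrative: variant is 1 unit higher
low, high = bag_of_little_bootstraps(values, arm)
print(f"averaged interval of t-statistics: ({low:.2f}, {high:.2f})")
```

Each bag only ever touches `b` rows, which is where the memory savings come from, and the bags are independent of one another, so they can be processed on separate machines before the final averaging step.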
SECTION 4: Evaluation of Success (or Failure)
Reshuffled Data (Simulated A/As)
Generated Data (Simulated A/Bs)
Existing Experiments
SECTION 5: Next Steps
Hyperparameter optimization, Poisson Bootstrap
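The slide only names the Poisson bootstrap; one common reading, sketched here under that assumption, gives every row an independent Poisson(1) weight per boot instead of drawing a fixed-size multinomial resample. For large n the weights behave almost like multinomial counts, and each row's weight can be drawn without knowing n in advance, which suits streaming or distributed data. The function name and data are hypothetical.

```python
import numpy as np

def poisson_bootstrap_means(values, n_boots=1000, seed=0):
    """Bootstrap distribution of the mean using Poisson(1) weights per row."""
    rng = np.random.default_rng(seed)
    weights = rng.poisson(1.0, size=(n_boots, values.size))  # one row of weights per boot
    totals = weights.sum(axis=1)                              # effective resample sizes
    return (weights @ values) / totals                        # weighted mean per boot

values = np.arange(1000, dtype=float)
means = poisson_bootstrap_means(values)
low, high = np.percentile(means, [2.5, 97.5])
print(f"95% interval for the mean: ({low:.1f}, {high:.1f})")
```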
Support for larger experiments
Power calculations & complicated metrics
Resources
Kleiner, A., Talwalkar, A., Sarkar, P., and Jordan, M. I. A scalable bootstrap for massive data. arXiv preprint arXiv:1112.5016v2, 2012. URL http://arxiv.org/abs/1112.5016v2
Bakshy, E., and Eckles, D. Uncertainty in online experiments with dependent data: An evaluation of bootstrap methods. arXiv preprint arXiv:1304.7406v3, 2013. URL https://arxiv.org/pdf/1304.7406v3.pdf
Idea for Bootstrapping @ Etsy: @hpster (Hilary Parker)
QUESTIONS, COMMENTS, REFUTATIONS?
Contact me: Emily Sommer (esommer@etsy.com)
Thanks!
Meta experimentation at Etsy - Emily Sommer