Ton Wesseling
Jan 27 - 31, 2020
How an analyst can add value!
Digital Experiments
TON@ONLINEDIALOGUE.COM
User Growth by Wilson Joseph for the Noun Project
Business Science
Internet by Cindy Hu for the Noun Project
Internet / websites
Data Analyst by Five by Five for the Noun Project
Data Analyst
A/B-test by Evangeline White for the Noun Project
A/B-testing
Everyone seems to like it
2018 research by Optimizely
A/B-testing Culture
Hierarchy of evidence pyramid
A/B-testing mastery course
This talk mostly makes sense if you have 10,000 transactions or more per month – enough to get experimentation in the DNA of your organization.
“Our success at Amazon
is a function of
how many experiments
we do per year, per month, per
week, per day…”

Jeff Bezos, CEO Amazon
Data Analyst by Five by Five for the Noun Project
Data Analyst
In the new world it’s the companies that have lots of data and know how to properly use it that outperform the competition
What should be done with the A/B-test program?
A.  Increase budgets
•  More A/B-tests (quantity)
B.  Increase knowledge
•  Better A/B-tests (quality)
C.  Decrease budgets
•  Fewer A/B-tests (quantity)
This should always be the answer
A.  Increase budgets
•  More A/B-tests (quantity)
But in reality it’s different...
✓  You can calculate the answer
✓  You have a big influence on the outcome
DEF
The task of an analyst within an A/B-testing Culture
1.  Data
2.  Effectiveness
3.  Finance
Data
Let there be high quality data
Make sure all funnels are measured…
Make sure your testing solution has all users
Users on template: 42,186 (100%)
Users in the tool: 37,652 (89%)
Users with code executed: 34,312 (81%)
What if my experiments had 20% more users?
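One way to quantify this gap is to express each stage as a share of the users who loaded the template. The sketch below uses the three counts from the slide; the function name `stage_shares` is only illustrative and not part of any testing tool.

```python
def stage_shares(users_on_template, users_in_tool, users_with_code):
    """Return each stage as a share of the users who loaded the template."""
    return {
        "users in the tool": users_in_tool / users_on_template,
        "users with code executed": users_with_code / users_on_template,
    }

shares = stage_shares(42186, 37652, 34312)
for stage, share in shares.items():
    print(f"{stage}: {share:.0%}")

# Users who loaded the template but never had the test code executed.
missed_users = 42186 - 34312
extra_share = missed_users / 34312
print(f"missed users: {missed_users} (+{extra_share:.0%} on top of the tested users)")
```

With these counts, recovering the missed users would add a bit more than 20% extra users to every experiment on this template.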
Recognizing returning users
Buddhini S. on Jargon Wall
Be able to segment on page interactions
Be able to segment on who can be influenced
Be able to create behavioral segments
Typical ecommerce flow example:
✓  All users on your website with enough time to take action
✓  All users on your website with at least some interaction
✓  All users on your website with heavy interaction
✓  All users on your website with clear intent to buy
✓  All users on your website that are willing to buy
✓  All users on your website that succeed in buying
✓  All users on your website that return with intent to buy more
Funnel + average time
Scientific method
Data
Let there be high quality data
DEF
The task of an analyst within an A/B-testing Culture
1.  Data
2.  Effectiveness
3.  Finance
Effectiveness
Make sure you work on stuff with the highest potential outcome
Statistical Power
The likelihood that an experiment will detect an effect, when there is an effect there to be detected
Power & Significance (measured outcome versus reality)
Reality: H0 is true (new version is NOT better)
   Measured: Do not reject H0 (new version is NOT better) → Correct decision ☺
   Measured: Reject H0 (new version is better) → Type I error, False Positive (α)
Reality: H0 is false (new version is better)
   Measured: Do not reject H0 (new version is NOT better) → Type II error, False Negative (β)
   Measured: Reject H0 (new version is better) → Correct decision ☺
Power & Significance rule of thumb
Power
When you start: try to test on pages with a high Power (>80%) → otherwise you don’t detect effects when there is an effect to be detected (False negatives).
Significance
When you start: try to test against a high enough significance level (90%) → otherwise you’ll declare winners when in reality there isn’t an effect (False positives).
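To make the rule of thumb concrete, here is a minimal sketch of how Power can be approximated for a conversion-rate test. It assumes a one-sided two-proportion z-test with a normal approximation; that is an assumption for illustration, not necessarily the method behind the calculators referenced in this talk. The function name and the example rates are hypothetical.

```python
from statistics import NormalDist

def power_two_proportions(p_control, relative_uplift, users_per_variant,
                          confidence=0.90):
    """Approximate power of a one-sided two-proportion z-test."""
    nd = NormalDist()
    p_variant = p_control * (1 + relative_uplift)
    # Standard error of the difference between the two conversion rates.
    std_err = ((p_control * (1 - p_control) + p_variant * (1 - p_variant))
               / users_per_variant) ** 0.5
    z_crit = nd.inv_cdf(confidence)  # threshold for declaring a winner
    return nd.cdf((p_variant - p_control) / std_err - z_crit)

# Hypothetical page: 5% conversion rate, hoping for a 10% relative uplift.
for users in (5_000, 50_000):
    print(f"{users} users per variant: power = "
          f"{power_two_proportions(0.05, 0.10, users):.0%}")
```

When there is no real uplift, the same function returns the false positive rate (10% at a 90% significance level), which ties Power and Significance together in one formula.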
This looks good
This is fascinating
This makes me sad
https://abtestguide.com/abtestsize/
https://ondi.me/bandwidth
Prioritize based on MDE to start
Test Power Determination
DETERMINE UNIQUE WEEKLY VISITORS PER PAGE TYPE
✓  We run and evaluate A/B tests on the unique visitor metric: we want to influence unique users
→ Build a segment for each page type / segment / test platform combination
→ Look up the number of weekly visitors with this behavior (select multiple weeks and divide by the number of weeks to account for fluctuation)
DETERMINE UNIQUE VISITORS WITH A CONVERSION PER PAGE TYPE
✓  Visitors must have seen the test page before they converted
→ Build a 2nd sequential segment with page seen → converted
→ Look up the number of weekly visitors with a conversion (select multiple weeks and divide by the number of weeks to account for fluctuation)
→ Make sure you don’t have sampled data. Otherwise select a shorter period
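The steps above produce two numbers per page type: average weekly visitors and average weekly visitors with a conversion. The sketch below turns those into a relative minimum detectable effect (MDE) so page types can be ranked. It assumes a one-sided two-proportion z-test, two variants, and a four-week test; the page names and weekly counts are made up for illustration.

```python
from statistics import NormalDist, mean

def relative_mde(weekly_visitors, weekly_conversions, test_weeks=4,
                 confidence=0.90, power=0.80):
    """Approximate relative MDE for a two-variant test on one page type."""
    nd = NormalDist()
    # Average over several weeks to smooth out fluctuation.
    avg_visitors = mean(weekly_visitors)
    avg_conversions = mean(weekly_conversions)
    base_rate = avg_conversions / avg_visitors
    users_per_variant = avg_visitors * test_weeks / 2
    z_total = nd.inv_cdf(confidence) + nd.inv_cdf(power)
    absolute_mde = z_total * (2 * base_rate * (1 - base_rate)
                              / users_per_variant) ** 0.5
    return absolute_mde / base_rate

# Hypothetical weekly counts: (visitors, visitors with a conversion).
page_types = {
    "product page": ([41000, 43500, 42200, 42000], [1250, 1330, 1290, 1270]),
    "checkout": ([5200, 4900, 5100, 5000], [2600, 2450, 2580, 2490]),
}
mde_by_page = {name: relative_mde(v, c) for name, (v, c) in page_types.items()}
for name, mde in sorted(mde_by_page.items(), key=lambda item: item[1]):
    print(f"{name}: MDE of about {mde:.1%}")
```

In this made-up example the checkout ends up with the lower MDE despite far less traffic, because a much larger share of its visitors convert.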
Prioritize based on measured results!
With real data from your program your prioritization will change!
Type-M errors…
Prioritize based on measured results?
(100% - Type-M error) of course
Low Power gives a higher Type-M error
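The link between low Power and Type-M (magnitude) error can be illustrated with a small simulation: among the experiments that reach significance, the measured effect overstates the true effect, and more so when Power is low. This is a sketch under simplifying assumptions (normally distributed estimates, a one-sided test at 90%); the effect sizes are arbitrary units chosen for illustration.

```python
import random
from statistics import NormalDist, mean

def exaggeration_ratio(true_effect, std_err, confidence=0.90,
                       runs=20000, seed=1):
    """Average measured effect among significant winners / true effect."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(confidence)
    significant = []
    for _ in range(runs):
        measured = rng.gauss(true_effect, std_err)
        if measured / std_err > z_crit:  # one-sided: variant declared better
            significant.append(measured)
    return mean(significant) / true_effect

low_power = exaggeration_ratio(true_effect=1.0, std_err=1.0)   # power of roughly 40%
high_power = exaggeration_ratio(true_effect=3.0, std_err=1.0)  # power of roughly 96%
for label, ratio in (("low power", low_power), ("high power", high_power)):
    # Share of the measured uplift to subtract, in the spirit of (100% - Type-M error).
    print(f"{label}: measured / true = {ratio:.2f}, correction = {1 - 1 / ratio:.0%}")
```

Reading the correction as the Type-M error percentage is an interpretation made for this sketch; the talk itself only states that the measured result should be multiplied by (100% - Type-M error).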
Effectiveness
Make sure you work on stuff with the highest potential outcome
DEF
The task of an analyst within an A/B-testing Culture
1.  Data
2.  Effectiveness
3.  Finance
Finance
Business case calculations
What does your calculation look like?
If significant result:
Extra new customers per week × 52 weeks effective × Average lifetime value
Extra transactions per week × 26 weeks effective × Average order value
So this experiment will bring us:
€232,840
(revenue in 6 months after implementation)
➢  And then just add up all the winners from the past year?
➢  Which makes €5,273,132 for the whole program?
➢  And divide that by the yearly costs of €623,400?
➢  So your ROI is: €8.46 revenue per €1 investment?
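The naive business case that these questions challenge can be written down in a few lines. The program totals below come from the slide; the per-experiment inputs (120 extra transactions per week, an average order value of €75) are hypothetical, since the slide does not break down how €232,840 was reached.

```python
def naive_experiment_value(extra_per_week, weeks_effective, value_each):
    """Naive value of one winner: uplift x duration x value per unit."""
    return extra_per_week * weeks_effective * value_each

# Hypothetical winner: 120 extra transactions per week, 26 weeks, EUR 75 per order.
example_value = naive_experiment_value(120, 26, 75.0)
print(f"example winner: EUR {example_value:,.0f}")

# Program-level figures from the slide.
program_revenue = 5_273_132   # sum of all winners from the past year
yearly_costs = 623_400
naive_roi = program_revenue / yearly_costs
print(f"naive ROI: EUR {naive_roi:.2f} revenue per EUR 1 invested")
```

This reproduces the €8.46 figure, which the rest of the talk then corrects for Type-M errors and false discoveries.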
Implementing winners…
So that one experiment will bring us:
€232,840 × (100% - Type-M error %)?
(Yes, if it indeed is a true positive)
€232,840 × (100% - 12%) = €204,899
Let’s see if the results are already significant
Focusonpc via Pixabay
How NOT to shorten the length of your A/B-test
https://www.einarsen.no/is-your-ab-testing-effort-just-chasing-statistical-ghosts/
https://www.evanmiller.org/how-not-to-run-an-ab-test.html
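Checking repeatedly whether a result is "already significant" and stopping at the first hit inflates the false positive rate. The simulation below is a rough sketch of that effect on A/A tests, where both variants have the same conversion rate. It assumes a one-sided z-test at 90% and ten interim looks; all parameters are illustrative and not taken from the linked articles.

```python
import random
from statistics import NormalDist

def false_positive_rate(peek, looks=10, users_per_look=200, rate=0.10,
                        confidence=0.90, runs=500, seed=7):
    """Share of A/A tests (no real difference) that get declared a winner."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(confidence)
    winners = 0
    for _ in range(runs):
        conv_a = conv_b = users = 0
        for look in range(1, looks + 1):
            conv_a += sum(rng.random() < rate for _ in range(users_per_look))
            conv_b += sum(rng.random() < rate for _ in range(users_per_look))
            users += users_per_look
            if not (peek or look == looks):
                continue  # without peeking, only the final look is evaluated
            pooled = (conv_a + conv_b) / (2 * users)
            std_err = (2 * pooled * (1 - pooled) / users) ** 0.5
            if std_err > 0 and (conv_b - conv_a) / users / std_err > z_crit:
                winners += 1
                break  # stop the test at the first significant look
    return winners / runs

fpr_fixed = false_positive_rate(peek=False)
fpr_peeking = false_positive_rate(peek=True)
print(f"evaluate once at the end: {fpr_fixed:.0%} false positives")
print(f"stop at first significant look: {fpr_peeking:.0%} false positives")
```

Evaluating once at the planned end keeps the false positive rate near the nominal 10%, while stopping at the first significant look pushes it well above that.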
How to shorten the length of your A/B-test
https://codeascraft.com/2018/10/03/how-etsy-handles-peeking-in-a-b-testing/
https://medium.com/convoy-tech/the-power-of-bayesian-a-b-testing-f859d2219d5
https://booking.ai/how-booking-com-increases-the-power-of-online-experiments-with-cuped-995d186fff1d
“CUPED tries to remove variance in a metric that can be accounted for by pre-experiment information”
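As a minimal sketch of the idea in that quote: CUPED subtracts from each user's metric the part that is predicted by the same user's pre-experiment value, which lowers variance without shifting the mean. The data below is simulated and the variable names are hypothetical; a production setup would estimate the coefficient on the pooled data of all variants.

```python
import random
from statistics import mean, variance

def cuped_adjust(metric, pre_metric):
    """Return the metric with the part explained by pre-experiment data removed."""
    mean_pre, mean_metric = mean(pre_metric), mean(metric)
    covariance = sum((x - mean_pre) * (y - mean_metric)
                     for x, y in zip(pre_metric, metric)) / (len(metric) - 1)
    theta = covariance / variance(pre_metric)
    # Centering on mean_pre keeps the average of the metric unchanged.
    return [y - theta * (x - mean_pre) for x, y in zip(pre_metric, metric)]

# Simulated users: spend during the test correlates with spend before it.
rng = random.Random(42)
pre_spend = [rng.gauss(100, 20) for _ in range(2000)]
spend = [0.8 * x + rng.gauss(0, 10) for x in pre_spend]

adjusted = cuped_adjust(spend, pre_spend)
print(f"variance before: {variance(spend):.0f}, after: {variance(adjusted):.0f}")
```

Lower variance means a smaller standard error for the same number of users, which is why the technique can shorten a test or reveal smaller effects.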
You could even find more wins
https://booking.ai/how-booking-com-increases-the-power-of-online-experiments-with-cuped-995d186fff1d
SRM (Sample Ratio Mismatch) checks anybody?
Also check Lukas Vermeer at #CH2019: https://conversionhotel.com/session/keynote-2019-run-better-experiments-srm-checks/
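An SRM check compares the number of users in each variant with the split that was configured. A minimal sketch, assuming a two-variant test and a chi-square goodness-of-fit test with one degree of freedom, is shown below. The 0.01 threshold and the user counts are assumptions for illustration.

```python
from math import erfc, sqrt

def srm_p_value(users_a, users_b, expected_share_a=0.5):
    """P-value of a chi-square test (1 degree of freedom) on the traffic split."""
    total = users_a + users_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = ((users_a - expected_a) ** 2 / expected_a
            + (users_b - expected_b) ** 2 / expected_b)
    # For 1 degree of freedom the chi-square tail equals erfc(sqrt(chi2 / 2)).
    return erfc(sqrt(chi2 / 2))

SRM_THRESHOLD = 0.01  # assumed alert level, not prescribed by the talk

for users_a, users_b in ((17000, 17312), (16500, 17812)):
    p_value = srm_p_value(users_a, users_b)
    verdict = "possible SRM" if p_value < SRM_THRESHOLD else "split looks fine"
    print(f"{users_a} vs {users_b}: {verdict}")
```

A very low p-value means the observed split is unlikely under the configured allocation, so the experiment data should not be trusted until the cause is found.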
Running experiment dashboard
Should I stop the experiment?
✓  Is something broken? → YES
✓  Is there an SRM error? → YES
✓  Are we losing too much money? → YES
(and maybe a low chance of becoming significant if you can start a next experiment now)
Back to the calculation
€232,840 × (100% - Type-M error %)?
(Yes, if it indeed is a true positive)
€232,840 × (100% - 12%) = €204,899
Implementing winners…
What is your False Discovery Rate?
Significance border: 90%
100 experiments
20 significant outcomes
(100% - 90%) × 100 experiments = 10 expected false positives out of 20 measured wins
50%* (it’s a little lower, this is the poor man’s calculation)
(with every real win the number of experiments without wins becomes lower, which leads to fewer false positives)
So not really 50%
FDR* = (Measured Wins - ((Measured Wins - ((100% - Confidence Level) × Experiments)) / Confidence Level)) / Measured Wins
= (20 - ((20 - ((100% - 90%) × 100)) / 90%)) / 20
= 44%* (only if your power on all experiments was 100%)
(Your Power will be lower, which means you had more real wins that were not measured (false negatives). This leads to fewer experiments without an effect, so the number of false positives will be even lower.)
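Both versions of the False Discovery Rate calculation can be checked with a short sketch. The formulas follow the slides term by term; the function names are illustrative.

```python
def poor_mans_fdr(experiments, measured_wins, confidence):
    """Expected false positives as a share of measured wins."""
    expected_false_positives = (1 - confidence) * experiments
    return expected_false_positives / measured_wins

def adjusted_fdr(experiments, measured_wins, confidence):
    """FDR after accounting for real wins (assumes 100% power)."""
    expected_false_positives = (1 - confidence) * experiments
    real_wins = (measured_wins - expected_false_positives) / confidence
    return (measured_wins - real_wins) / measured_wins

print(f"poor man's FDR: {poor_mans_fdr(100, 20, 0.90):.0%}")
print(f"adjusted FDR:   {adjusted_fdr(100, 20, 0.90):.0%}")
```

With 100 experiments, 20 measured wins and a 90% significance border, this gives the 50% and 44% from the slides.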
Rule of thumb: once you have 10 winners or more
You can calculate your True Discovery Rate:
True Discovery Rate = Power × (Winners + Significance - 1) / (Winners × (Power + Significance - 1))
= 80% × (20% + 90% - 1) / (20% × (80% + 90% - 1))
= 0.08 / 0.14
= 57.14%
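The True Discovery Rate formula translates directly into code. In this sketch `win_rate` is the share of experiments with a significant outcome (the "Winners" term), and the function name is illustrative.

```python
def true_discovery_rate(win_rate, power, confidence):
    """Share of measured winners that are expected to be real wins."""
    return (power * (win_rate + confidence - 1)
            / (win_rate * (power + confidence - 1)))

tdr = true_discovery_rate(win_rate=0.20, power=0.80, confidence=0.90)
print(f"True Discovery Rate: {tdr:.2%}")

# With 100% power the result is the complement of the 44% FDR.
tdr_full_power = true_discovery_rate(win_rate=0.20, power=1.0, confidence=0.90)
print(f"At 100% power: {tdr_full_power:.2%}")
```

The second call shows that the formula is consistent with the False Discovery Rate calculation, which assumed 100% Power.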
FDR / TDR calculator
https://abtestguide.com/fdr/
So all your experiments will bring you:
Sum of (every winner × (100% - Type-M error % per winner))
× True Discovery Rate
× Implementation % (within x months…)
(assuming every new win is tested on the new default where all earlier wins are implemented)
So all your experiments will bring you:
€5,273,132 × (100% - 12% average Type-M) × 57.14% = €2,651,500
Maximize your growth within your ROI limit:
ROI = Value of A/B-testing for Optimization / Costs of A/B-testing for Optimization
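Putting the corrections together, the program value and the resulting ROI can be recomputed from the figures used earlier in the talk. The sketch below assumes an implementation rate of 100%, because the worked example on the slides leaves that factor out. The corrected ROI it prints is derived here and is not a number stated in the talk.

```python
def corrected_program_value(naive_revenue, avg_type_m_error,
                            true_discovery_rate, implementation_rate=1.0):
    """Naive revenue corrected for Type-M error, false discoveries and rollout."""
    return (naive_revenue * (1 - avg_type_m_error)
            * true_discovery_rate * implementation_rate)

value = corrected_program_value(
    naive_revenue=5_273_132,      # sum of all winners, from the slides
    avg_type_m_error=0.12,        # average Type-M error, from the slides
    true_discovery_rate=0.5714,   # from the True Discovery Rate slide
)
yearly_costs = 623_400            # from the slides
roi = value / yearly_costs
print(f"corrected program value: EUR {value:,.0f}")
print(f"corrected ROI: EUR {roi:.2f} revenue per EUR 1 invested")
```

Compared with the naive €8.46 per €1, the corrected figure is roughly half, which is the number to hold against the ROI limit.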
Are you above or below your ROI limit?
①  Above: Increase budgets
•  More A/B-tests (quantity)
•  Lower win%, more winners
②  Below: Increase knowledge
•  Better A/B-tests (quality)
•  Higher win%, more winners
③  Still below: Decrease budgets
•  Fewer A/B-tests (quantity)
•  Higher win%, fewer winners
You can help get to this answer
A.  Increase budgets
•  More A/B-tests (quantity)
✓  You can calculate the answer
✓  You have a big influence on the outcome
Data Analyst - The Noun Project icon from the Noun Project
An A/B-testing for growth analyst:
1.  Makes sure there is high quality Data available
2.  Steers the data chance on Effect
3.  Reports on the real Financial impact
Ton Wesseling
https://ondi.me/tonw
Let’s connect on LinkedIn

Keynote Ton Wesseling at Superweek 2020: How an analyst can add value!