Data Driven Test Strategy
© 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue
10:00 Introduction
10:15 Where to test?
10:45 What to test?
11:15 How to test?
11:45 Break
12:00 Bayesian statistics
12:30 Post-analysis
13:00 The end… 
Program
WORKSHOP DATA DRIVEN TEST STRATEGY
Intro
Analytics Psychology
Who are you?!
Junior analyst
Director Digital Analytics
Customer Experience Marketeer
CRO & Analytics
CRO Specialist
Conversion Specialist
Data Scientist
Web analyst
Web Analytics Consultant
Product Owner
Online Data Specialist
10:00 Introduction
10:15 Where to test?
10:45 What to test?
11:15 How to test?
11:45 Break
12:00 Bayesian statistics
12:30 Post-analysis
13:00 The end… 
Program
WORKSHOP DATA DRIVEN TEST STRATEGY
Where to test?
• Potential: Where can we get the biggest lift?
Where can we get the biggest lift?
Opinions are like **sholes: everyone has one
Without an analyzed reason to start, it doesn’t make any sense to start at all
Goal of the Customer Behaviour Study
• Insight into the most important customer journeys
• Understand behaviour
• Input for setting (test) hypotheses
Customer Behaviour Study
The 5 V’s: View, Voice, Verified (1st party), Verified (2nd party), Value
Customer Behaviour Study
View: Web Analytics, Heatmaps, Recordings, Market data
Web analytics (View)
• Where do visitors start on the site?
• Where do they come from?
• What is the flow of those visitors?
• Are there notable differences between segments or products?
• What’s the behavior on specific test pages?
Landing pages (View)
• Where do visitors start their journey?
• Difference between new / existing customers?
• Difference per device?
Traffic source (View)
• Where do visitors come from?
• Do they already have a product in mind?
• Do they already know the brand?
Funnel Analysis (View)
• What’s the CTR (and return rate) to the next step?
• Exit rate and time on page per step?
• Determine for each segment / product
Customer Journey Analysis (View)
Behavior on important pages (View)
• What’s the next action visitors take on the page?
• What was the previous action taken by visitors?
• What choices do they make?
• Does this differ among segments / products?
Heat- and scroll maps (View)
Session recordings (View)
Customer Behaviour Study
View: Web Analytics, Heatmaps, Recordings, Market data
Voice: Customer Service, Surveys, Feedback tools, User research
Talk to customer service!
Ask for feedback online (Voice)
User research (Voice)
• Interview customers
• Focus groups
• Usability studies
Customer Behaviour Study
View: Web Analytics, Heatmaps, Recordings
Voice: Customer Service, Surveys, Feedback tools, User research
Verified (1st party): Previous tests
Insights from previous tests (Verified, 1st party)
• What have you tested already?
• What have you learned from those experiments?
Customer Behaviour Study
View: Web Analytics, Heatmaps, Recordings, Market data
Voice: Customer Service, Surveys, Online Chat, Feedback tools, User research
Verified (1st party): Previous tests
Verified (2nd party): Scientific research, Competitors
Scientific literature (Verified, 2nd party)
• What do we know from scientific literature?
• In general about decision-making processes
• And specifically about the type of products sold
Customer Behaviour Study
View: Web Analytics, Heatmaps, Recordings, Market data
Voice: Customer Service, Surveys, Online Chat, Feedback tools, User research
Verified (1st party): Previous tests
Verified (2nd party): Scientific research, Competitors
Value: Mission, Vision, Strategy, Goals
What to do with all this info?
1. Fix all the bugs and implement no-brainers
2. Growth-hack in places where you cannot test
3. Set hypotheses and build an A/B-test roadmap
Set up concrete hypotheses
If I apply THIS, then THIS BEHAVIORAL CHANGE will happen (among THIS GROUP), because of THIS REASON.
Challenge your hypotheses
• Potential: Where can we get the biggest lift?
• Impact: Score hypotheses based on 5V
Impact: Score hypotheses based on 5V
Questions?
10:00 Introduction
10:15 What to test?
10:45 Where to test?
11:15 How to test?
11:45 Break
12:00 Bayesian statistics
12:30 Post-analysis
13:00 The end… 
Program
WORKSHOP DATA DRIVEN TEST STRATEGY
Where to test?
How should we test the hypotheses?
• Potential: Where can we get the biggest lift?
• Impact: Score hypotheses based on 5V
• Power: How should we test the hypotheses?
[Chart: conversions per month vs. time span, with bands labelled Risk, Optimization, Automation and Re-think, and reference lines at 10.000 and 1.000 conversions per month]
Where can we test?
Map out your different page (type)s and determine: can I run a test on this page with a power of >= 80%?
Test Power Determination
WHY?
• To determine which pages are eligible to test on
• To structure A/B-tests
• To define the pages with the highest impact
Power & Significance
What we measure (do not reject H0 vs. reject H0) against reality (H0 is true vs. H0 is false); the four outcomes are filled in below.
Null hypothesis (H0): the defendant is innocent
Alternative hypothesis (Ha): the defendant is guilty
Present the evidence: collect data
Judge the evidence: “Could the data plausibly have happened by chance if the defendant is actually innocent?”
Yes → fail to reject H0
No → reject H0
P-value
“Could the data plausibly have happened by chance if the null hypothesis is true?”
Null hypothesis (H0): conversion rates of the default and variation B are the same
Alternative hypothesis (Ha): variation B is better
Present the evidence: collect data
Yes → fail to reject H0
No → reject H0
Significance & Power
Measured vs. reality, the four possible outcomes:
• H0 is true, do not reject H0: correct decision (significance)
• H0 is true, reject H0: Type I error, false positive (α)
• H0 is false, do not reject H0: Type II error, false negative (β)
• H0 is false, reject H0: correct decision (power)
Power & Significance
Power: only test on pages with high power (>= 80%), otherwise you won’t detect effects when there is an effect to be detected (false negatives).
Significance: test against a high enough significance level (90% or 95%), otherwise you’ll declare a winner when in reality there isn’t an effect (false positives).
Power
“Statistical power is the likelihood that an experiment will detect an effect when there is an effect there to be detected.”
Depends on:
• Sample size
• Effect size
• Significance level
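To make that dependency concrete, here is a minimal JavaScript sketch of the power of a one-sided two-proportion z-test. It is an illustration only, not the calculator's actual implementation; the function names, the erf-style normal-CDF approximation and the example numbers are assumptions.

// Standard normal CDF via the Abramowitz-Stegun approximation (error ~1e-7)
function normCdf(z) {
  var t = 1 / (1 + 0.2316419 * Math.abs(z));
  var d = 0.3989423 * Math.exp(-z * z / 2);
  var p = d * t * (0.3193815 + t * (-0.3565638 + t * (1.781478 + t * (-1.821256 + t * 1.330274))));
  return z > 0 ? 1 - p : p;
}
// Power of a one-sided two-proportion z-test.
// n = visitors per variation, pA = baseline conversion rate,
// relUplift = relative uplift of B (0.10 = +10%), zAlpha = 1.645 for 95% one-sided significance.
function abTestPower(n, pA, relUplift, zAlpha) {
  var pB = pA * (1 + relUplift);
  var pPooled = (pA + pB) / 2;
  var seH0 = Math.sqrt(2 * pPooled * (1 - pPooled) / n);             // standard error if H0 is true
  var seH1 = Math.sqrt(pA * (1 - pA) / n + pB * (1 - pB) / n);       // standard error if H1 is true
  return normCdf(((pB - pA) - zAlpha * seH0) / seH1);
}
// More visitors or a bigger effect raises power; a stricter significance level lowers it:
console.log(abTestPower(10000, 0.05, 0.10, 1.645)); // baseline
console.log(abTestPower(20000, 0.05, 0.10, 1.645)); // sample size +
console.log(abTestPower(10000, 0.05, 0.15, 1.645)); // effect size +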
Power examples abtestguide.com/calc
Power examples abtestguide.com/calc
Power: 64,7%
Power: sample size +
Power: sample size +
Power: 85,4%
Power: effect size +
Power: effect size +
Power: 97,5%
Test Power Determination
WHY?
• To determine which pages are eligible to test on
• To structure A/B-tests
• To define the pages with the highest impact
Test Power Determination
HOW?
1. Determine important KPIs, segments / test platforms
2. Map out all the different page types for each flow
3. Determine unique weekly visitors per page type
4. Determine unique visitors with a conversion per page type
5. Determine test eligibility (test duration and minimum effect)
Test Power Determination
2. MAP OUT ALL THE DIFFERENT PAGE TYPES PER FLOW
1. Homepage
2. Listing
3. Product page
4. Campaign landing page
5. Cart
6. Checkout step 1
7. …
Test Power Determination
3. DETERMINE UNIQUE WEEKLY VISITORS PER PAGE TYPE
• We run and evaluate A/B tests on the unique-visitor metric: we want to influence unique users
Test Power Determination
4. DETERMINE UNIQUE VISITORS WITH A CONVERSION PER PAGE TYPE
• Visitors must have seen the test page before they converted
Word of Caution
KPI NEEDS TO BE BINARY!
1. A/B-test tools and calculators in the market are only compatible with binary variables (0/1 variables)
2. You either convert or you don’t
3. Most important assumption: the distribution follows the normal distribution (symmetrical)
→ KPI can’t be AOV, average satisfaction, number of pageviews, etc.
Test Power Determination
5. DETERMINE TEST ELIGIBILITY (TEST DURATION AND MINIMUM EFFECT)
Ondi.me/bandwidth
If weekly conversions are lower than 250, testing becomes very challenging
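As a rough illustration of steps 2-5, the JavaScript sketch below walks over hypothetical page types, computes the conversion rate per page and flags pages that clear the ~250 weekly conversions rule of thumb above. The page names and numbers are invented for the example.

var pageTypes = [
  { name: 'Homepage',        weeklyUsers: 80000, weeklyConvertedUsers: 1800 },
  { name: 'Product page',    weeklyUsers: 60000, weeklyConvertedUsers: 1500 },
  { name: 'Checkout step 1', weeklyUsers: 4000,  weeklyConvertedUsers: 220 }
];
pageTypes.forEach(function (p) {
  var cr = 100 * p.weeklyConvertedUsers / p.weeklyUsers;   // conversion rate per page type
  var eligible = p.weeklyConvertedUsers >= 250;            // rough eligibility bar from this deck
  console.log(p.name + ': CR ' + cr.toFixed(1) + '%, ' +
    (eligible ? 'eligible for testing' : 'too few conversions to test'));
});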
Test Power Determination
5. DETERMINE TEST ELIGIBILITY (TEST DURATION AND MINIMUM EFFECT)
How many weeks should we run the test?
How big of an uplift is feasible?
• The longer you test, the smaller the uplift you are able to detect
• Aim for a minimum detectable uplift below 10%
Sample size <> MDE Ondi.me/samplesize
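A minimal sketch of that trade-off, assuming a one-sided two-proportion test at 95% significance and 80% power (1.645 and 0.84 are the corresponding one-sided z-values). It approximates what such a calculator does; it is not the Ondi.me implementation.

// Required visitors per variation to detect a given relative uplift
function sampleSizePerVariation(pA, relUplift, zAlpha, zBeta) {
  var pB = pA * (1 + relUplift);
  var variance = pA * (1 - pA) + pB * (1 - pB);
  return Math.ceil(Math.pow(zAlpha + zBeta, 2) * variance / Math.pow(pB - pA, 2));
}
// The smaller the uplift you want to detect, the (much) larger the sample you need:
[0.02, 0.05, 0.10].forEach(function (uplift) {
  console.log('MDE +' + (uplift * 100) + '%: ~' +
    sampleSizePerVariation(0.05, uplift, 1.645, 0.84) + ' visitors per variation');
});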
Test Power Determination
5. DETERMINE TEST ELIGIBILITY (TEST DURATION AND MINIMUM EFFECT)
Ondi.me/bandwidth
Power: How much uplift do we expect?
How easy is it to test and implement?
• Potential: Where can we get the biggest lift?
• Impact: Score hypotheses based on 5V
• Power: How should we test these hypotheses?
• Ease: How easy is it to test and implement?
Prioritize based on Ease
Questions?
10:00 Introduction
10:15 What to test?
10:45 Where to test?
11:15 How to test?
11:45 Break
12:00 Bayesian statistics
12:30 Post-analysis
13:00 The end… 
Program
WORKSHOP DATA DRIVEN TEST STRATEGY
How to test?
Test duration
• You need a sample large enough not to be vulnerable to the data’s natural variability
• You need a sample representative of your overall audience
Variability – 10 coin tosses
• % heads varies between 30 and 80%
Variability – 100 coin tosses
• % heads varies between 49 and 54%
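A small JavaScript simulation of the same point (illustrative only): the percentage of heads swings widely with 10 tosses and settles close to 50% as the number of tosses grows.

function pctHeads(tosses) {
  var heads = 0;
  for (var i = 0; i < tosses; i++) if (Math.random() < 0.5) heads++;
  return 100 * heads / tosses;
}
// Five runs per sample size; larger samples cluster around 50% heads
[10, 100, 1000].forEach(function (n) {
  var runs = [];
  for (var r = 0; r < 5; r++) runs.push(pctHeads(n).toFixed(0) + '%');
  console.log(n + ' tosses: ' + runs.join(' '));
});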
Central Limit Theorem
“The sampling distribution of the mean of any independent, random variable will be normal or nearly normal, if the sample size is large enough.”
Representativeness
• Make sure the proportion of each group in the sample is representative of the total population
• Highly dependent upon sample size
Representativeness
                Variation A   Variation B
Visitors        1.000         1.000
Conversions     184           162
Conversion rate 18,4%         16,2%
→ Variation B performs 12,3% worse
Representativeness
                Variation A   Variation B
Visitors        1.000         1.000
Conversions     184           162
Conversion rate 18,4%         16,2%
% new visitors  53%           60%
(CR new visitors: 3,2%; CR repeat visitors: 35,6%)
→ Just because the sample isn’t representative (because of the low sample size), variation B seems to perform worse
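The mix effect can be checked with a short calculation, here as a JavaScript sketch using the segment conversion rates from the slide: both variations convert identically within each visitor type, and the overall difference comes purely from B receiving more new visitors.

var crNew = 0.032, crRepeat = 0.356; // segment conversion rates from the slide
function overallRate(shareNew) {
  // overall conversion rate as a weighted mix of new and repeat visitors
  return shareNew * crNew + (1 - shareNew) * crRepeat;
}
console.log('A (53% new): ' + (100 * overallRate(0.53)).toFixed(1) + '%'); // ~18,4%
console.log('B (60% new): ' + (100 * overallRate(0.60)).toFixed(1) + '%'); // ~16,2%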
Day-of-the-week effects
• Test for full weeks to rule out day-of-the-week effects
Don’t stop the test too early!
• Determine the test duration (sample size) upfront
• Stick to the duration!
Don’t stop the test too early!
http://www.einarsen.no/is-your-ab-testing-effort-just-chasing-statistical-ghosts/
Don’t stop the test too early!
https://destack.home.xs4all.nl/projects/significance/#
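The linked articles make this point with simulations; below is a simplified JavaScript sketch of the same idea (all sizes and thresholds are illustrative assumptions): an A/A test where both variations share the same true conversion rate, checked for significance after every batch of visitors, still declares a “winner” far more often than the nominal 5%.

function peekingFindsFalseWinner(trueRate, batches, batchSize, zCrit) {
  var visA = 0, visB = 0, convA = 0, convB = 0;
  for (var b = 0; b < batches; b++) {
    for (var j = 0; j < batchSize; j++) {
      visA++; if (Math.random() < trueRate) convA++;
      visB++; if (Math.random() < trueRate) convB++;
    }
    // two-proportion z-test after each batch ("peeking")
    var pA = convA / visA, pB = convB / visB;
    var pPool = (convA + convB) / (visA + visB);
    var se = Math.sqrt(pPool * (1 - pPool) * (1 / visA + 1 / visB));
    if (se > 0 && Math.abs(pB - pA) / se > zCrit) return true; // stopped early on a "winner"
  }
  return false;
}
var runs = 1000, falseWinners = 0;
for (var r = 0; r < runs; r++) {
  if (peekingFindsFalseWinner(0.05, 28, 500, 1.96)) falseWinners++;
}
console.log((100 * falseWinners / runs).toFixed(1) + '% of A/A tests declared a winner when peeking daily');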
Sample pollution
“Sample pollution happens when people in a test see both variations (ABBA effect) due to any uncontrolled external factor.”
Cookie deletion
• When visitors delete their cookies, they will be newly assigned to a variation on their next visit
• In the case of an A/B-test this means a 50% probability of seeing the wrong variation
• The more variations you test, the higher the probability of seeing the wrong variation
Cookie deletion
• 5-10% of visitors delete their cookies in every session
• Market estimates: 50% cookie deletion within a few months, 31% within a month
• How big is your cookie deletion rate within the test period?
Cross-device usage
• When a visitor also visits the site on another device, he/she has a 50% chance of seeing the wrong variation (with 2 variations)
• The more variations you test, the higher the probability that he/she will see the wrong variation
• Research how big this issue is on your site
Survey example: “Did you visit our website on other devices in the last week? (i.e. desktop/smartphone/tablet)
a) No, I only visited website.com with my current device
b) Yes, on 1 other device
c) Yes, on 2 or more other devices”
Customer Journey Effects
• When it takes a couple of visits for a visitor to convert, it’s likely that he/she visited the site prior to the test
• The longer the customer journey, the higher this likelihood
• Research how big this issue is on your site
Test Duration vs Sample Pollution
The longer the test runs, the smaller the difference in conversion rate you can detect, but… the longer the test runs, the higher the chance of sample pollution.
→ It’s a balance between the two: we recommend testing for max 4 weeks

Test integration & extra measurements
Can’t I just rely on the tool?
• You could, but tools are a black box
• If you integrate with your analytics tool you have more data: you control the data, you can do the calculations yourself and analyse more!
Test set-up & integration
• Integrate your test tool with your analytics tool
• Use Custom Dimensions / eVars or Event tracking to measure the variations
• Use the Code Editor rather than the standard integration, so you have better control over when to fire the measurement

//Set variable information
var testID = 'OD000',
    testNamed = 'Testname',
    testVariation = 'A: Control';
//Integration Google Analytics
ga('create', 'UA-xxxxxx-x', 'auto');
ga('send', 'event', 'AB-Test', testID + ' - ' + testNamed + ' - ' + testVariation,
   testVariation, {'nonInteraction': 1});
QA your test
• Is the variation working correctly?
• Are all the extra measurements working correctly?
• Always check browser compatibility
• And device compatibility
• Do not break dynamic stuff!
Questions?
10:00 Introduction
10:15 What to test?
10:45 Where to test?
11:15 How to test?
11:45 Break
12:00 Bayesian statistics
12:30 Post-analysis
13:00 The end… 
Program
WORKSHOP DATA DRIVEN TEST STRATEGY
Frequentist statistics vs. Bayesian statistics

Frequentist statistics
Null hypothesis (H0): conversion rates of the default and variation B are the same
Alternative hypothesis (Ha): conversion rate of variation B is better
Present the evidence: collect data
Judge the evidence (p-value): “Could the data plausibly have happened by chance if the null hypothesis is true?”
Yes → fail to reject H0
No → reject H0
H0 = variation A and B have the same conversion rate
Hard to understand: Ondi.me/vis
So the p-value only tells you: how unlikely it is that you found this result, given that the null hypothesis is true (that there is no difference between the conversion rates)
Focus on finding proof
Challenge 2: Focus on finding proof
What’s the alternative?
Frequentist statistics → Bayesian statistics
Bayesian Test evaluation
abtestguide.com/bayesian/
Bayesian Test evaluation
89,1%
A test result is the probability that B outperforms A, ranging from 0% to 100%
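As a rough sketch of where such a percentage comes from: the JavaScript below uses a normal approximation to the posteriors of the two conversion rates, which is an assumption for illustration, not necessarily the calculator's exact method, and the example numbers are made up.

function normCdf(z) { // Abramowitz-Stegun approximation of the standard normal CDF
  var t = 1 / (1 + 0.2316419 * Math.abs(z));
  var d = 0.3989423 * Math.exp(-z * z / 2);
  var p = d * t * (0.3193815 + t * (-0.3565638 + t * (1.781478 + t * (-1.821256 + t * 1.330274))));
  return z > 0 ? 1 - p : p;
}
function probBBeatsA(visitorsA, convA, visitorsB, convB) {
  var pA = convA / visitorsA, pB = convB / visitorsB;
  var varA = pA * (1 - pA) / visitorsA, varB = pB * (1 - pB) / visitorsB;
  return normCdf((pB - pA) / Math.sqrt(varA + varB)); // P(conversion rate B > conversion rate A)
}
console.log((100 * probBBeatsA(10000, 500, 10000, 540)).toFixed(1) + '% chance that B outperforms A');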
Make a risk assessment
IMPLEMENT B: PROBABILITY × EFFECT ON REVENUE*
Expected risk      10,9%   − € 204.400
Expected uplift    89,1%   € 647.150
Contribution               € 554.552
* Based on 6 months and an average order value of € 175
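The contribution line is an expected value: the probability of a real uplift times its revenue effect, minus the probability of a loss times the downside. A one-line JavaScript sketch, fed with the slide's rounded numbers (so the result differs slightly from the slide's contribution figure):

function expectedContribution(pWin, upliftRevenue, downsideRevenue) {
  // probability-weighted upside minus probability-weighted downside
  return pWin * upliftRevenue - (1 - pWin) * downsideRevenue;
}
console.log('Contribution ~ € ' + Math.round(expectedContribution(0.891, 647150, 204400)));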
What’s an acceptable probability?
• Depends on the business: how much risk is the business willing to take?
• Depends on the type of test: how invasive is the test? How many resources does it cost?
Adding direct value vs. learning user behavior
The cut-off probability for implementation is not the same as the cut-off probability for a learning.
We still need the scientist!

CHANCE      LEARNING? (Customer Intelligence)
< 70%       No learning
70 – 85%    Indication (need retest to confirm)
85 – 95%    Strong indication (need congruent other data)
> 95%       Learning
Questions?
10:00 Introduction
10:15 What to test?
10:45 Where to test?
11:15 How to test?
11:45 Break
12:00 Bayesian statistics
12:30 Post-analysis
13:00 The end… 
Program
WORKSHOP DATA DRIVEN TEST STRATEGY
Before you start analyzing
• Know the test duration. During the test, only check that conversions are coming in and that the traffic is evenly distributed.
• Check that the population of users that have seen the test is about the same per variation (to make sure you have a representative sample).
• Determine how to isolate the test population (users who have seen the variation).
• Determine the test goals and how to isolate those users (users who have seen the variation followed by a test goal).
Post-analysis - Basics
• Analyse in the analytics tool and not in the test tool.
• Avoid sampling.
• Analyse users, not sessions.
• Analyse users who have converted (0/1 variable), not users and total conversions.
Users of the test
• We run and evaluate A/B tests on the unique-visitor metric: we want to influence unique users
Users in the test
• Build a segment for the specific targeting of the test
Users with a conversion
• Visitors must have seen the test page before they converted
Users with a conversion
• Build a 2nd, sequential segment: page seen → converted
Build a Custom Report in GA
AND APPLY THE SEGMENTS FOR USERS AND USERS WITH A CONVERSION
Or even automate it! https://groups.google.com/forum/#!forum/google-analytics-spreadsheet-add-on
Or even automate it!
Calculate the chance that B > A
Main result (a)
THE TEST VARIATION LEADS TO A HIGHER ADD-TO-CART RATIO
• In the original, 14,89% of visitors add a product to their cart; in the variation, 15,49%. This is an uplift of +3,99%
• The chance that the variation leads to more visitors who put a product in their cart is 96,7%

   VISITORS   ADD TO CART   ATC RATIO   UPLIFT
A  24.590     3.662         14,89%
B  24.396     3.778         15,49%      +3,99%
Chance of B outperforming A: 96,7% (chance of A outperforming B: 3,3%)
Main result (b)
THE TEST VARIATION LEADS TO A HIGHER CONVERSION RATE
• In the original, 5,82% of visitors finish an order; in the variation, 6,11%. This is an uplift of +5,02%
• The chance that the variation leads to more orders is 91,3%

   VISITORS   ORDERS   CR      UPLIFT
A  24.590     1.431    5,82%
B  24.396     1.491    6,11%   +5,02%
Chance of B outperforming A: 91,3% (chance of A outperforming B: 8,7%)
Post-analysis - Deep dive
Analyse relevant segments:
• User type
• Device category
• Channel
• Entry page
Also look at other relevant statistics:
• Time on test page / exit% / bounce%
• Interactions on the test page
Word of Caution
DON’T SEGMENT ON EVERYTHING!
• Each analyzed segment needs to have enough visitors and conversions
• Be careful with too many segments: you might end up with false positives! (each segment counts as a new A/B-test, so the likelihood of a false positive increases with each segment)
• If you find a difference between segments → run a separate A/B-test to confirm the finding
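Why segmenting on everything backfires can be seen with a quick calculation: at a 5% false-positive rate per comparison, the chance of at least one false positive grows quickly with the number of segments analysed (illustrative JavaScript sketch).

var alpha = 0.05; // false-positive rate per comparison
[1, 5, 10, 20].forEach(function (segments) {
  var pAtLeastOne = 1 - Math.pow(1 - alpha, segments);
  console.log(segments + ' segments: ' + (100 * pAtLeastOne).toFixed(0) + '% chance of at least one false positive');
});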
Results per important segment
THE VARIATION PERFORMS BEST FOR DESKTOP AND RETURNING USERS
• The uplift in conversion rate is only apparent on desktop devices (+7,54%). On tablet devices a drop in conversion rate of -2,47% was measured
• For both new and returning visitors the variation performed better (new: +3,25%, returning: +7,62%)
Behavioral changes
MORE INTERACTION ABOVE THE FOLD
• The changes on the product page lead to more interaction with the page: more visitors navigated through the images and more visitors selected a size or color.
• The product description and details were placed below the fold (at least for tablet). Interaction with this element decreased, but still 13,5% click on this element.
• Exit rate and time on page weren’t influenced by the variation

PRODUCT PAGE INTERACTIONS
   CLICKED ON PHOTO   SELECTED SIZE/COLOR   CLICKED ON PRODUCT INFO   ADD TO WISHLIST
A  43,8%              16,9%                 15,0%                     1,4%
B  53,4%              17,5%                 13,5%                     1,4%
   +21,9%             +3,3%                 -9,9%                     +1,4%

PRODUCT PAGE STATISTICS
   EXIT RATE   TIME ON PAGE
A  8,60%       41 sec
B  8,64%       41 sec
ABTESTGUIDE.COM/BAYESIAN/
Calculate the impact on revenue
Business Case
EXPECTED EFFECT ON REVENUE IN 6 MONTHS AFTER IMPLEMENTATION
Implement winner → extra revenue in 6 months: + € 132.616
Draw conclusions
WHAT DID YOU LEARN FROM THIS TEST?
• A more balanced and structured product page (lower cognitive load) leads to more interaction above the fold and more bookers
• The product description and details are important for visitors
• For tablet visitors the new variation did not perform better. This might have been caused by the product description that was placed below the main image
ADVICE:
• Implement the variation on desktop devices and re-test with the product description above the fold for tablets
• Run more tests on lowering cognitive load on other pages of the site
Adjust the Test Roadmap
Questions?
Towards a Data Driven Test Strategy
From PIE to PIPE:
• Potential: where can we get the biggest lift?
• Impact: score hypotheses based on 5V
• Power: where should we test these hypotheses?
• Ease: how easy is it to test and implement?
Determine & Analyze Customer Journeys
Set up hypotheses and challenge them
Determine how and where you can test these hypotheses (length & MDE)
Prioritize based on Ease
And then…
• Keep in mind the possibility of sample pollution
• Determine extra needed measurements
• Run your test for the designated time period (!)
• Analyze your tests in the analytics tool
And then…
• Draw conclusions and implement winning variations asap
• Determine follow-up tests
• Adjust the prioritization model
• … and run another test :)
Successful program?
• Keep track of the number of tests
• Keep track of the percentage of winners
• Keep track of the specific test learnings and overall insights
Thank you!!
@AM_Klaassen
annelytics@outlook.com
nl.linkedin.com/in/amklaassen
Bayesian calculator: abtestguide.com/bayesian
Power calculator: abtestguide.com/calc
Test bandwidth: ondi.me/bandwidth
Sample size / MDE calculator: ondi.me/samplesize
  • 31. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Web Analytics Heatmaps Recordings Customer Service Surveys Feedback tools User research Previous tests Customer Behaviour Study View Voice Verified (1st party) Verified (2nd party) Value
  • 32. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Insights from previous tests Verified (1st party)  What have you tested already?  What have you learned from those experiments?
  • 33. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Web Analytics Heatmaps Recordings Market data Customer Service Surveys Online Chat Feedback tools User research Previous tests Customer Behaviour Study View Voice Verified (1st party) Verified (2nd party) Value Scientific research Competitors
  • 34. Scientific literature Verified (2nd party)  What do we know from scientific literature?  In general about decision-making processes  And specifically about the type of products sold
  • 35. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Web Analytics Heatmaps Recordings Market data Customer Service Surveys Online Chat Feedback tools User research Mission Vision Strategy Goals Previous tests Customer Behaviour Study View Voice Verified (1st party) Verified (2nd party) Value Scientific research Competitors
  • 36. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue What to do with all this info? 1. Fix all the bugs and implement no-brainers 2. Growth-hack on places where you cannot test 3. Set hypotheses and build an A/B-test road map
  • 37. Set up a concrete hypothesis: If I apply THIS, then THIS BEHAVIORAL CHANGE will happen (among THIS GROUP), because of THIS REASON.
  • 38. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Challenge your hypothesis • Potential: Where can we get the biggest lift? • Impact: Score hypothesis based on 5V
  • 39. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Impact: Score Hypothesis based on 5V
  • 41. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue 10:00 Introduction 10:15 What to test? 10:45 Where to test? 11:15 How to test? 11:45 Break 12:00 Bayesian statistics 12:30 Post-analysis 13:00 The end…  Program WORKSHOP DATA DRIVEN TEST STRATEGY
  • 43. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue How should we test the hypothesis? • Potential: Where can we get the biggest lift? • Impact: Score Hypothesis based on 5V • Power: How should we test the hypothesis?
  • 44.
  • 45. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue [Chart: conversions per month vs. time span, with labels Risk, + Optimization, + Automation and Re-think, and reference lines at 1.000 and 10.000 conversions per month]
  • 46. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Where can we test? Map out your different page(type)s and determine: Can I run a test on this page with a Power of >=80%?
  • 47. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Test Power Determination WHY?  To determine which pages are eligible to test on  To structure A/B-tests  To define the pages with the highest impact
  • 48. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Power & Significance Do not reject H0 Reject H0 H0 is true H0 is false Reality Measured
  • 49. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Null hypothesis H0: Defendant is innocent Alternative hypothesis Ha: Defendant is guilty Present the evidence Collect data Judge the evidence “Could the data plausibly have happened by chance if the defendant is actually innocent?” Yes Fail to reject H0 No Reject H0
  • 50. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue P-value “Could the data plausibly have happened by chance if the null hypothesis is true?” Null hypothesis H0: Conversion rates of default and variation B are the same Alternative hypothesis Ha: Variation B is better Present the evidence Collect data Yes Fail to reject H0 No Reject H0
  • 51. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Significance Do not reject H0 Reject H0 H0 is true H0 is false Correct decision  (Significance) Measured Reality
  • 52. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Significance Do not reject H0 Reject H0 H0 is true Type I False Positive (α) H0 is false Correct decision  (Significance) Measured Reality
  • 53. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Power Do not reject H0 Reject H0 H0 is true Correct decision  (Power) Type I False Positive (α) H0 is false Correct decision  (Significance) Measured Reality
  • 54. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Power Do not reject H0 Reject H0 H0 is true Correct decision  (Power) Type I False Positive (α) H0 is false Type II False Negative (β) Correct decision  (Significance) Measured Reality
  • 55. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Power & Significance Power Only test on pages with a high Power (>80%)  otherwise you don’t detect effects when there is an effect to be detected (False negatives). Significance Test against a high enough significance level (90% or 95%)  otherwise you’ll declare a winner, when in reality there isn’t an effect (False positives).
  • 56. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Power “Statistical power is the likelihood that an experiment will detect an effect when there is an effect there to be detected”. Depends on: • Sample size • Effect size • Significance level
  • 57. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Power examples abtestguide.com/calc
  • 58. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Power examples abtestguide.com/calc Power: 64,7%
  • 59. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Power: sample size +
  • 60. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Power: sample size + Power: 85,4%
  • 61. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Power: effect size +
  • 62. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Power: effect size + Power: 97,5%
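The calculator screenshots behind these three power examples are not preserved in this transcript, so the exact settings (one- or two-sided test, significance level) are unknown. As a hedged sketch rather than a reproduction of the 64,7% / 85,4% / 97,5% figures, the JavaScript below implements a generic one-sided two-proportion z-test power calculation with illustrative inputs; it only demonstrates the mechanism the slides describe: more sample and a bigger expected effect both increase power.

    // Standard normal CDF via the Abramowitz & Stegun erf approximation
    function normCdf(x) {
      var z = Math.abs(x) / Math.SQRT2;
      var t = 1 / (1 + 0.3275911 * z);
      var erf = 1 - (((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                - 0.284496736) * t + 0.254829592) * t) * Math.exp(-z * z);
      return x >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
    }

    // Power of a one-sided two-proportion z-test, per variation
    function abTestPower(visitors, conversions, relativeUplift, zAlpha) {
      var p1 = conversions / visitors;          // baseline conversion rate (control)
      var p2 = p1 * (1 + relativeUplift);       // expected conversion rate of the variation
      var se = Math.sqrt(p1 * (1 - p1) / visitors + p2 * (1 - p2) / visitors);
      return normCdf((p2 - p1) / se - zAlpha);  // chance of detecting the uplift
    }

    // Illustrative inputs (not the screenshot values): 5% baseline conversion rate,
    // one-sided 95% significance level (z = 1.645)
    abTestPower(20000, 1000, 0.05, 1.645); // ~0.30: 5% uplift, underpowered
    abTestPower(40000, 2000, 0.05, 1.645); // ~0.48: double the sample, more power
    abTestPower(20000, 1000, 0.10, 1.645); // ~0.72: double the expected effect, more power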
  • 63. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Test Power Determination WHY?  To determine which pages are eligible to test on  To structure A/B-tests  To define the pages with the highest impact
  • 64. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Test Power Determination HOW? 1. Determine important KPIs, segments / test platforms 2. Map out all the different page types for each flow 3. Determine unique weekly visitors per page type 4. Determine unique visitors with a conversion per page type 5. Determine test eligibility (test duration and minimum effect)
  • 65. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Test Power Determination 2. MAP OUT ALL THE DIFFERENT PAGE TYPES PER FLOW 1. Homepage 2. Listing 3. Product page 4. Campaign landing page 5. Cart 6. Checkout step 1 7. …
  • 66. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Test Power Determination 3. DETERMINE UNIQUE WEEKLY VISITORS PER PAGE TYPE  We run and evaluate A/B tests on the unique visitor metric: we want to influence unique users
  • 67. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Test Power Determination 4. DETERMINE UNIQUE VISITORS WITH A CONVERSION PER PAGE TYPE  Visitors must have seen the test page before they converted € Converted
  • 68. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Word of Caution KPI NEEDS TO BE BINARY! 1. A/B-test tools and calculators in the market are only compatible with binary variables (0/1 variables) 2. You either convert or you don’t 3. Most important assumption: the sampling distribution of the conversion rate is approximately normal (symmetrical)  KPI can’t be AOV, average satisfaction, number of pageviews etc.
  • 69. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Test Power Determination 5. DETERMINE TEST ELIGIBILITY (TEST DURATION AND MINIMUM EFFECT) Ondi.me/bandwidth If weekly conversions are lower than 250, then testing becomes very challenging
  • 70. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Test Power Determination 5. DETERMINE TEST ELIGIBILITY (TEST DURATION AND MINIMUM EFFECT) How many weeks should we run the test? How big of an uplift is feasible?  The longer you test, the smaller the uplift you are able to detect  Plan for uplifts of less than 10%; bigger winners are rare
  • 71. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Sample size <> MDE Ondi.me/samplesize
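The exact formula behind the linked ondi.me/samplesize tool is not shown in the slides, but the trade-off it visualises follows from the standard two-proportion sample size approximation. A minimal sketch with assumed inputs (5% baseline conversion rate, one-sided 95% significance, 80% power):

    // Rough sample size per variation needed to detect a given relative uplift (MDE)
    function sampleSizePerVariation(baselineRate, mde, zAlpha, zPower) {
      var p1 = baselineRate;
      var p2 = p1 * (1 + mde);                      // smallest uplift you still want to detect
      var variance = p1 * (1 - p1) + p2 * (1 - p2);
      return Math.ceil(Math.pow(zAlpha + zPower, 2) * variance / Math.pow(p2 - p1, 2));
    }

    sampleSizePerVariation(0.05, 0.10, 1.645, 0.84); // ~24.600 visitors per variation
    sampleSizePerVariation(0.05, 0.05, 1.645, 0.84); // roughly 4x as many: halving the MDE quadruples the sample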
  • 72. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Test Power Determination 5. DETERMINE TEST ELIGIBILITY (TEST DURATION AND MINIMUM EFFECT) Ondi.me/bandwidth
  • 73. Power: How much uplift do we expect?
  • 74. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue How easy is it to test and implement? • Potential: Where can we get the biggest lift? • Impact: Score Hypothesis based on 5V • Power: How should we test these hypotheses? • Ease: How easy is it to test and implement?
  • 77. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue 10:00 Introduction 10:15 What to test? 10:45 Where to test? 11:15 How to test? 11:45 Break 12:00 Bayesian statistics 12:30 Post-analysis 13:00 The end…  Program WORKSHOP DATA DRIVEN TEST STRATEGY
  • 79. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Test duration  You need a sample large enough not to be vulnerable to the data’s natural variability  You need a sample representative of your overall audience
  • 80. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Variability – 10 coin tosses  % heads varies between 30 and 80%
  • 81. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Variability – 100 coin tosses  % heads varies between 49 and 54%
  • 82. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Central Limit Theorem “The sampling distribution of the mean of any independent, random variable will be normal or nearly normal, if the sample size is large enough.”
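A quick simulation (assuming a fair coin; exact numbers differ per run) reproduces the kind of spread shown on the two coin-toss slides and illustrates why larger samples are needed before the conversion rate settles down:

    // % heads in a simulated run of coin tosses
    function percentHeads(tosses) {
      var heads = 0;
      for (var i = 0; i < tosses; i++) {
        if (Math.random() < 0.5) { heads++; }
      }
      return Math.round(100 * heads / tosses);
    }

    [percentHeads(10), percentHeads(10), percentHeads(10)];    // e.g. [30, 60, 80]: wide spread
    [percentHeads(100), percentHeads(100), percentHeads(100)]; // e.g. [46, 51, 54]: close to 50%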
  • 83. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Representativeness  Make sure the proportion of each group in the sample is representative of the total population  Highly dependent upon sample size
  • 84. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Representativeness Variation A Variation B Visitors 1.000 1.000 Conversions 184 162 Conversion rate 18,4% 16,2%  Variation B performs 12,3% worse
  • 85. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Representativeness Variation A Variation B Visitors 1.000 1.000 Conversions 184 162 Conversion rate 18,4% 16,2% % new visitors 53% 60%  Just because the sample isn’t representative (because of low sample size), variation B seems to perform worse CR new 3,2% CR repeat 35,6%
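The per-segment rates on this slide are enough to verify the mix effect yourself: blending the slide's 3,2% (new) and 35,6% (returning) conversion rates with each variation's share of new visitors reproduces the 18,4% and 16,2% figures, so the measured "-12,3%" is entirely a sampling artefact.

    var crNew = 0.032, crRepeat = 0.356;   // per-segment conversion rates from the slide

    function blendedRate(shareNew) {
      return shareNew * crNew + (1 - shareNew) * crRepeat;
    }

    blendedRate(0.53); // ~0.184 -> variation A's 18,4%
    blendedRate(0.60); // ~0.162 -> variation B's 16,2%, purely a difference in visitor mix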
  • 86. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Day-of-the-week effects  Test for full weeks to rule out day-of- the-week effects
  • 87. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Don’t stop test too early!  Determine the test duration (sample size) upfront  Stick to the duration!
  • 88. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Don’t stop test too early! http://www.einarsen.no/is-your-ab-testing-effort-just-chasing-statistical-ghosts/
  • 89. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Don’t stop test too early! https://destack.home.xs4all.nl/projects/significance/#
  • 90. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Sample pollution “Sample pollution happens when people in a test see both variations (ABBA-effect) due to an uncontrolled external factor.”
  • 91. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Cookie deletion  When visitors delete their cookies, they will be randomly reassigned to a variation on their next visit  For an A/B-test this means a 50% probability of seeing the wrong variation  The more variations you test, the higher the probability they will see the wrong variation
  • 92. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Cookie deletion  5-10% of visitors delete their cookies every session  Estimations in the market: 50% cookie deletion within a few months, 31% within a month  How big is your cookie deletion rate within the test period?
  • 93. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Cross-device usage  When a visitor also visits the site on another device, he/she has a 50% chance of seeing the wrong variation (with 2 variations)  The more variations you test, the higher the probability that he/she will see the wrong variation  Research how big this issue is on your site Did you visit our website on other devices in the last week? (i.e. desktop/smartphone/tablet) a) No, I only visited website.com with my current device b) Yes, on 1 other device c) Yes, on 2 or more other devices
  • 94. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Customer Journey Effects  When it takes a couple of visits for a visitor to convert, it’s likely that he/she visited the site prior to the test  The longer the customer journey, the higher this likelihood  Research how big this issue is on your site
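The 50% mentioned on the cookie-deletion and cross-device slides is the two-variation case of a more general point (a generalisation consistent with the slides' remark that more variations mean a higher probability): a re-bucketed visitor ends up in a different variation with probability (k - 1) / k.

    function wrongVariationChance(variations) {
      return (variations - 1) / variations;   // chance a re-bucketed visitor lands in another variation
    }

    wrongVariationChance(2); // 0.5  -> the 50% from the slides
    wrongVariationChance(4); // 0.75 -> more variations, more pollution risk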
  • 95. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Test Duration vs Sample Pollution The longer the test runs, the smaller the difference in conversion rate you can detect, but… the longer the test runs, the higher the chance of sample pollution  It’s a balance between the two: we recommend testing for a maximum of 4 weeks
  • 97. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Can’t I just rely on the tool?  You could, but tools are a black box  If you integrate with your analytics tool you have more data: you control the data, you can do the calculations yourself and analyse more!
  • 98. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Test set-up & integration  Integrate your test tool with your analytics tool  Use Custom Dimensions / eVars or Event tracking to measure the variations  Use the Code Editor rather than the standard integration, so you have better control over when to fire the measurement
    // Set variable information
    var testID = 'OD000',
        testNamed = 'Testname',
        testVariation = 'A: Control';
    // Integration Google Analytics
    ga('create', 'UA-xxxxxx-x', 'auto');
    ga('send', 'event', 'AB-Test', testID + ' - ' + testNamed + ' - ' + testVariation, testVariation, {'nonInteraction': 1});
  • 99. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue QA your test  Is the variation working correctly?  Are all the extra measurements working correctly?  Always check browser compatibility  And device compatibility  Do not break dynamic stuff!
  • 101. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue 10:00 Introduction 10:15 What to test? 10:45 Where to test? 11:15 How to test? 11:45 Break 12:00 Bayesian statistics 12:30 Post-analysis 13:00 The end…  Program WORKSHOP DATA DRIVEN TEST STRATEGY
  • 102. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Frequentist statistics Bayesian statistics
  • 104. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Null hypothesis H0: Conversion rates of default and variation B are the same Alternative hypothesis Ha: Conversion rate of variation B is better Present the evidence Collect data Judge the evidence (p-value) “Could the data plausibly have happened by chance if the null hypothesis is true?” Yes Fail to reject H0 No Reject H0
  • 105.
  • 106. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue H0 = Variation A and B have the same conversion rate Hard to Understand Ondi.me/vis
  • 107. So the p-value only tells you: How unlikely is it that you found this result, given that the null hypothesis is true (that there is no difference between the conversion rates)
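For completeness, this is roughly what that frequentist evaluation looks like in code: a one-sided two-proportion z-test with a pooled standard error (one common formulation; the slides don't say which exact test the tools use). It reuses the normCdf helper from the power sketch earlier.

    function pValueOneSided(visitorsA, convA, visitorsB, convB) {
      var pA = convA / visitorsA, pB = convB / visitorsB;
      var pPooled = (convA + convB) / (visitorsA + visitorsB);
      var se = Math.sqrt(pPooled * (1 - pPooled) * (1 / visitorsA + 1 / visitorsB));
      var z = (pB - pA) / se;
      return 1 - normCdf(z);   // how surprising is this result if H0 (no difference) is true?
    }

    // With the add-to-cart numbers from the post-analysis slides later on:
    pValueOneSided(24590, 3662, 24396, 3778); // ~0.03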
  • 108. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Focus on finding proof
  • 109. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Challenge 2: Focus on finding proof
  • 110. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue What’s the alternative? Frequentist statistics Bayesian statistics
  • 111. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Bayesian Test evaluation
  • 112. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue abtestguide.com/bayesian/
  • 113. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Bayesian Test evaluation 89,1% A test result is the probability that B outperforms A: ranging from 0% - 100%
  • 114. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue IMPLEMENT B PROBABILITY * EFFECT ON REVENUE Expected risk 10,9% - € 204.400 Expected uplift 89,1% € 647.150 Contribution € 554.552 * Based on 6 months and an average order value of € 175 Make a risk assessment
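The contribution on this slide is simply the probability-weighted sum of the two revenue scenarios; a two-line check with the slide's own numbers:

    var pWin = 0.891, winRevenue = 647150;      // chance B is better, and its 6-month revenue effect
    var pLose = 0.109, loseRevenue = -204400;   // chance B is worse, and its 6-month revenue effect
    pWin * winRevenue + pLose * loseRevenue;    // ~554.000, close to the slide's € 554.552 (difference is rounding of the probabilities)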
  • 115. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue What’s an acceptable probability?  Depends on the business: how much risk is the business willing to take?  Depends on the type of test: how invasive is the test? How many resources does it cost?
  • 116. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Adding direct value Learning user behavior
  • 117. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue The cut-off probability for implementation is not the same as the cut-off probability for a learning. We still need the scientist!
    CHANCE     LEARNING? (Customer Intelligence)
    < 70 %     No learning
    70 – 85 %  Indication (need retest to confirm)
    85 – 95 %  Strong indication (need congruent other data)
    > 95 %     Learning
  • 119. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue 10:00 Introduction 10:15 What to test? 10:45 Where to test? 11:15 How to test? 11:45 Break 12:00 Bayesian statistics 12:30 Post-analysis 13:00 The end…  Program WORKSHOP DATA DRIVEN TEST STRATEGY
  • 120. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Before you start analyzing  Know the test duration. During the test, only check that conversions are coming in and that traffic is evenly distributed.  Check whether the population of users that have seen the test is about the same per variation (to make sure you have a representative sample)  Determine how to isolate the test population (users who have seen the variation).  Determine the test goals and how to isolate those users (users who have seen the variation followed by a test goal).
  • 121. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Post-analysis - Basics  Analyse in the analytics tool and not in the test tool.  Avoid sampling.  Analyse users, not sessions.  Analyse users who have converted (0/1 variable), not users and total conversions.
  • 122. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Users of the test  We run and evaluate A/B tests on the unique visitor metric: we want to influence unique users
  • 123. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Users in the test  Build a segment for the specific targeting of the test
  • 124. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Users with a conversion  Visitors must have seen the test page before they converted € Converted
  • 125. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Users with a conversion  Build a 2nd sequential segment with page seen  converted
  • 126. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Build a Custom Report in GA AND APPLY THE SEGMENTS FOR USERS AND USERS WITH A CONVERSION
  • 127. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Or even automate it! https://groups.google.com/forum/#!forum/google-analytics-spreadsheet-add-on
  • 128. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Or even automate it!
  • 129. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Calculate the chance that B > A
  • 130. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Main result (a) THE TEST VARIATION LEADS TO A HIGHER ADD TO CART RATIO  In the original 14,89% add a product to their cart, in the variation 15,49%. This is an uplift of +3,99%  The chance that the variation leads to more visitors who put a product in their cart is 96,7% VISITORS ADD TO CART ATC RATIO UPLIFT A 24.590 3.662 14,89% B 24.396 3.778 15,49% +3,99% Chance 96,7% Chance 3,3% Chance of B outperforming A
  • 131. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Main result (b) THE TEST VARIATION LEADS TO A HIGHER CONVERSION RATE  In the original 5,82% finish an order, in the variation 6,11%. This is an uplift of +5,02%  The chance that the variation leads to more orders is 91,3% VISITORS ORDERS CR UPLIFT A 24.590 1.431 5,82% B 24.396 1.491 6,11% +5,02% Chance 91,3% Chance 8,7% Chance of B outperforming A
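A sketch of how such a "chance that B beats A" can be computed: put a Beta(1,1) prior on each conversion rate and use a normal approximation to the posterior difference. This is an approximation and not necessarily the exact method behind abtestguide.com/bayesian, but with the numbers from these two slides it lands on the same probabilities. It reuses the normCdf helper from the power sketch earlier.

    function chanceToBeatA(visitorsA, convA, visitorsB, convB) {
      function posterior(conversions, visitors) {
        var a = conversions + 1, b = visitors - conversions + 1;   // Beta(a, b) posterior
        return {
          mean: a / (a + b),
          variance: (a * b) / (Math.pow(a + b, 2) * (a + b + 1))
        };
      }
      var pa = posterior(convA, visitorsA), pb = posterior(convB, visitorsB);
      var z = (pb.mean - pa.mean) / Math.sqrt(pa.variance + pb.variance);
      return normCdf(z);
    }

    chanceToBeatA(24590, 3662, 24396, 3778); // ~0.967 -> the 96,7% add-to-cart result
    chanceToBeatA(24590, 1431, 24396, 1491); // ~0.913 -> the 91,3% order result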
  • 132. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Post-analysis - Deep dive Analyse relevant segments:  User type  Device category  Channel  Entry page Also look at other relevant statistics:  Time on test page / exit% / bounce%  Interactions on the test page
  • 133. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Word of Caution DON’T SEGMENT ON EVERYTHING! • Each analyzed segment needs to have enough visitors and conversions • Be careful with too many segments: you might end up with False Positives! (each segment counts as a new A/B-test, so the likelihood of a False Positive increases with each segment) • If you find a difference between segments  run a separate A/B-test to confirm the finding
  • 134. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Results per important segment  The uplift in conversion rate is only apparent on desktop devices (+7,54%). On tablet devices a drop in conversion rate of -2,47% was measured  For both new and returning visitors the variation performed better (new: +3,25%, returning: +7,62%) THE VARIATION PERFORMS BEST FOR DESKTOP AND RETURNING USERS
  • 135. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Behavioral changes  The changes on the product page lead to more interaction with the page: more visitors navigated through the images and more visitors selected a size or color.  The product description and details were placed below the fold (at least for tablet). Interaction with this element decreased. But still 13,5% clicks on this element.  Exit rate and time on page weren’t influenced by the variation MORE INTERACTION ABOVE THE FOLD CLICKED ON PHOTO SELECTED SIZE/COLOR CLICKED ON PRODUCT INFO ADD TO WISHLIST A 43,8% 16,9% 15,0% 1,4% B 53,4% 17,5% 13,5% 1,4% +21,9% +3,3% -9,9% +1,4% PRODUCT PAGE INTERACTIONS EXIT RATE TIME ON PAGE A 8,60% 41 sec B 8,64% 41 sec PRODUCT PAGE STATISTICS
  • 136. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue ABTESTGUIDE.COM/BAYESIAN/ Calculate the impact on revenue
  • 137. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Business Case EXPECTED EFFECT ON REVENUE IN 6 MONTHS AFTER IMPLEMENTATION IMPLEMENT WINNER IN 6 MONTHS EXTRA REVENUE + € 132.616
  • 138. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Draw conclusions WHAT DID YOU LEARN FROM THIS TEST?  A more balanced and structured product page (lower cognitive load) leads to more interaction above the fold and more bookers  The product description and details are important for visitors.  For tablet visitors the new variation did not perform better. This might have been caused by the product description that was placed below the main image. ADVICE: - Implement the variation on desktop devices and re-test with the product description above the fold for tablets. - Run more tests on lowering cognitive load on other pages of the site
  • 139. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Adjust the Test Roadmap
  • 141. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Towards a Data Driven Test Strategy From PIE to PIPE: • Potential: where can we get the biggest lift? • Impact: score hypothesis based on 5V • Power: where should we test these hypotheses? • Ease: how easy is it to test and implement?
  • 142. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Determine & Analyze Customer Journeys
  • 143. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Set-up Hypothesis and challenge them
  • 144. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Determine how and where you can test these hypotheses (length & MDE)
  • 145. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Prioritize based on Ease
  • 146. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue And then…  Keep in mind the possibility of sample pollution  Determine extra needed measurements  Run your test for the designated time period (!)  Analyze your tests in the analytics tool
  • 147. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue And then…  Draw conclusions and implement winning variations asap  Determine follow-up tests  Adjust the prioritization model  … and run another test 
  • 148. © 2017 – Online Dialogue - for MeasureCamp workshop attendees only – Do not duplicate, share or copy without permission of Online Dialogue Successful program?  Keep track of the number of tests  Keep track of the percentage of winners  Keep track of the specific test learnings and overall insights
  • 149. Thank you!! @AM_Klaassen annelytics@outlook.com nl.linkedin.com/in/amklaassen Bayesian calculator: abtestguide.com/bayesian Power calculator: abtestguide.com/calc Test bandwidth: ondi.me/bandwidth Sample size / MDE calculator: ondi.me/samplesize

Editor's Notes

  1. I am Annemarie Klaassen and I work as an analytics and optimization expert at Online Dialogue. I studied at Tilburg University where I completed my master’s in Leisure Studies and Marketing Management. I have a real passion for data and traveling. I actually just returned from a trip to NY, so I’m a bit jetlagged. Hopefully you won’t be able to notice it too much.
  2. We work at OD: a conversion rate optimization agency in Utrecht. Our goal is to grow businesses by improving their conversion rate.
  3. There are a couple more conversion rate optimization agencies in the Netherlands, but our USP is the combination we make between analytics and psychology. We combine data insights with psychological insights for evidence based growth.
  4. We do this for a bunch of clients in the Netherlands and also for some pretty cool international clients. For most we do high velocity testing, which means we run multiple tests per week for them.
  5. P: Potential I: Impact P: Power E: Ease
  6. Where does the attention on the page go? Which elements are and aren’t used?
  7. P: Potential I: Impact P: Power E: Ease
  8. The first thing you do is map out all the different page types you have on your website, then look at the weekly unique visitors you have on that page type and the conversions through that page as well. Then you determine whether the pages have enough test power – based on these numbers. Now you might wonder what test Power actually means, well..
  9. Frequentist testing is very much like a court trial in the US. The null hypothesis says that the defendant is innocent and the alternative hypothesis says that the defendant is guilty. We then present evidence, or in other words, collect data. Then, we judge this evidence and ask ourselves the question, could the data plausibly have happened by chance if the null hypothesis were true? If the data were likely to have occurred under the assumption that the null hypothesis were true, then we would fail to reject the null hypothesis, and state that the evidence is not sufficient to suggest that the defendant is guilty. If the data were very unlikely to have occurred, then the evidence raises more than a reasonable doubt about the null hypothesis, and hence we reject the null hypothesis. 
  10. This judging of evidence is done with the p-value.
  11. If you test against a significance level of 90%, then you will have a 10% false positive rate (10% of your declared winners aren’t real winners)
  12. If you test against a Power of 80%, then in 20% of the tests you won’t declare a winner when in fact there is one.
  13. The test power is the likelihood that an experiment will detect an effect when there is an effect to be detected. You want to make sure you can find the winning variation in the collected data. The power depends on 3 elements: the sample size (so on how much traffic you run your test), the effect size (that means the actual uplift in conversion) and the chosen significance level.
  14. If you visit Abtestguide.com you can calculate the Power of a test given the number of visitors and conversions and the expected uplift of the test. In this case you have 10.000 visitors per variation and 1000 conversions in the control. You expect an uplift of 5%. This results in a Power of only 65%. This is not very high! You will only detect a winner 65% of the time when there is a winner to be detected. A ground rule for the Power is at least 80%. To increase the Power of the test you can do 3 things:
  15. If you visit Abtestguide.com you can calculate the Power of a test given the number of visitors and conversions and the expected uplift of the test. In this case you have 10.000 visitors per variation and 1000 conversions in the control. You expect an uplift of 5%. This results in a Power of only 65%. This is not very high! You will only detect a winner 65% of the time when there is a winner to be detected. A ground rule for the Power is at least 80%. To increase the Power of the test you can do 3 things:
  16. You can increase the sample size: or the number of visitors in your experiment. If you double the test duration (so you get 20.000 visitors and 2000 conversions), then you see that the distributions of the 2 variations lie further apart. Hence, the power increases to 85,4%.
  17. You can increase the sample size: or the number of visitors in your experiment. If you double the test duration (so you get 20.000 visitors and 2000 conversions), then you see that the distributions of the 2 variations lie further apart. Hence, the power increases to 85,4%.
  18. The other element is effect size; how much uplift do you expect from your variation? If you expect an uplift of 10% instead of 5% then your test Power increases immensely. You need to be aware of what kind of uplift can be expected from the A/B-test. You learn this by doing a lot of experiments, but it’s quite rare to find a winning variation with an uplift higher than 10%. Most of the time it’s not higher than 5%. This of course also depends on the type of test you’re doing. If you only change a headline you probably won’t get a 10% uplift.
  19. The other element is effect size; how much uplift do you expect from your variation? If you expect an uplift of 10% instead of 5% then your test Power increases immensely. You need to be aware of what kind of uplift can be expected from the A/B-test. You learn this by doing a lot of experiments, but it’s quite rare to find a winning variation with an uplift higher than 10%. Most of the time it’s not higher than 5%. This of course also depends on the type of test you’re doing. If you only change a headline you probably won’t get a 10% uplift.
  20. P: Potential I: Impact P: Power E: Ease
  21. You can look at different segments in your data, look at click behavior per variation, time on page and other micro conversions.
  22. What are the main ways of analysing A/B-tests then? The most common approach to analysing A/B-tests is the t-test (which is based on frequentist statistics). But, over the last couple of years Bayesian statistics have grown in popularity. I will try to explain both in a bit.
  23. We will start with frequentist statistics.
  24. Frequentist testing is very much like a court trial in the US. The null hypothesis says that the defendant is innocent and the alternative hypothesis says that the defendant is guilty. We then present evidence, or in other words, collect data. Then, we judge this evidence and ask ourselves the question, could the data plausibly have happened by chance if the null hypothesis were true? If the data were likely to have occurred under the assumption that the null hypothesis were true, then we would fail to reject the null hypothesis, and state that the evidence is not sufficient to suggest that the defendant is guilty. If the data were very unlikely to have occurred, then the evidence raises more than a reasonable doubt about the null hypothesis, and hence we reject the null hypothesis. 
  25. It’s a mnemonic to remember what to do 
  26. I will give an example of how this translates to an A/B-test. When you use a t-test you first state a null hypothesis. You calculate the p-value and decide to reject the null hypothesis or not. So you try to reject the hypothesis that the conversion rates are the same. So, suppose you did an experiment and the p-value of that test was 0.01. The p-value in this experiment tells you that there is a 1% chance of observing a difference as large as you observed even if the two means are identical. The p-value is very low, so the H0 has to go.
  27. The other challenge with using frequentist statistics is that an A/B-test can only have 2 outcomes: you either have a winner or no winner. And the focus is on finding those real winners. You want to take as little risk as possible. This is not so surprising if you take into account that t-tests have been used in a lot of medical research as well. Of course you don’t want to bring a medicine to the market if you’re not 100% sure that it won’t make people worse or kill them. You don’t want to take any risk whatsoever. But businesses aren’t run this way. You need to take some risk in order to grow your business.
  28. If you take a look at this test-result you would conclude that there is no winner, that it shouldn’t be implemented and that the measured uplift in conversion rate wasn’t enough. So you will see this as a loser and move on to another test idea. However, there seems to be a positive movement (the measured uplift is 5%), but it isn’t big enough to recognize as a significant winner. You probably only need a few more conversions.
  29. If frequentist statistics confronts us with these kinds of challenges, what’s the alternative then? Well as I said earlier, the most common approach to analysing A/B-tests is the t-test (which is based on frequentist statistics). But, over the last couple of years more and more software packages (like VWO and Google Optimize) are switching to Bayesian statistics. And that’s not without reason, because using Bayesian statistics makes more sense, since it better suits how businesses are run, and I will show you why.
  30. So, when you use Bayesian statistics to evaluate your A/B-test, there is no difficult statistical terminology involved anymore. There’s no null hypothesis, no p-value or z-value et cetera. It just shows you the measured uplift and the probability that B is better than A. Easy right? Everyone can understand this. Based on the same numbers of the A/B-test we showed you earlier you have an 89,1% chance that B will actually be better than A. Probably every manager would understand this and will like these odds.
  31. Recently we turned this Bayesian Excel calculator into a webtool as well. It’s free for everyone to use. If you visit this URL you can input your test data and calculate! It will return the chance that B outperforms A.
  32. When using a Bayesian A/B-test evaluation method you no longer have a binary outcome like the t-test does. A test result won’t tell you winner / no winner, but a percentage between 0 and 100% whether the variation performs better than the original. In this example 89,1%. The question that remains is: is this enough to be implemented?
  33. What you can do is make a risk assessment. You can calculate what the results mean in terms of revenue. When the client decides to implement the variation they have a 10.9% chance of a drop in revenue of 200.000 in 6 months time (and an average order value of 175) But on the other hand, they also have a 89.1% chance that the variation is actually better and brings in nearly 650.000 euro. You can show this table to your boss and ask whether he would place the bet.
  34. Well that depends on a couple of things. If you would implement a test variation with a probability of 51% then you’re not doing much better than just flipping a coin. The risk of implementing a losing variation is quite high. Depending on the type of business you may be more or less willing to take risks. If you are a start-up you might want to take more risk than a full-grown business, but still we don’t really like the chance to lose money, so what we see with our clients is that most need at least a probability of 70%. But it also depends on the type of test. If you only changed a headline then the risk is lower than when you need to implement new functionality on the page. This will consume much more resources. Hence, you will need a higher probability.
  35. The purpose of A/B-testing is of course to add direct value, but we still want to learn about user behavior. If you really want to learn from user behavior then you need to test very strictly (say with >95%). Otherwise you only have a hunch, but you don’t have proof.
  36. We take these numbers as a ballpark. If the test has a probability lower than 70% we won’t see it as a learning. If the percentage lies between 70 and 85% we see it as an indication something is there, but we need a retest to confirm the learning. Anything between 85 and 95% is a very strong indication. So we would do follow-up tests on other parts of the website to see if it works there too. And the same as with a t-test: when the chance is higher than 95% we see it as a real learning. So even though you would implement the previous test, it doesn’t prove the stated hypothesis. It shows a strong indication, but to be sure the hypothesis is true you need follow-up tests to confirm this learning.
  37. Recently we turned this Bayesian Excel calculator into a webtool as well. It’s free for everyone to use. If you visit this URL you can input your test data and calculate! It will return the chance that B outperforms A.
  38. Average order value of € 63