There are countless ways to create flawed A/B tests, and even many CRO experts fall into these traps. A flawed test either yields no data at all or produces faulty data that leads you astray. From test setup to concluding analysis, this talk covered where online testing goes awry.
#INBOUND17
SCIENTIFIC METHOD
Unlike intuitive, philosophical, or religious methods for acquiring knowledge, the scientific method relies on empirical, repeatable tests to reveal the truth
QUESTION → OBSERVATIONS → HYPOTHESIS
1. NOT DOING ENOUGH RESEARCH
RESEARCH APPROACHES
QUALITATIVE QUANTITATIVE
RESEARCH-DRIVEN TEST DESIGN
FACTORS FOR TEST DURATION
2. SKIPPING TEST DURATION CALCULATION
1. Traffic Volume (Visitors)
2. Baseline Success Rate (KPI Completions)
3. Difference Between Experiences (Minimum Detectable Effect (MDE))
4. Statistical Significance
5. Statistical Power
EXPERIMENTAL DESIGN
QUESTION → OBSERVATIONS → HYPOTHESIS
3. MISSING A HYPOTHESIS
SOURCE: LIVE SCIENCE
“The basic idea of a hypothesis is that there is no pre-determined outcome.”
RISKS
• Not testing impactful changes
• Lack of focus in the experiment
• Tests can take far longer than they need to
• Chance of sample pollution (users with different devices or cleared cookies getting into a different experience)
• False positives (see next slide)
5. SETTING UP TOO MANY VARIATIONS
STATISTICAL RISK
Confidence Level = 100% − Significance Level
Significance = .05 → Confidence = 95%
.05 × 20 variations = 1 variation expected to look significant purely by chance
.05 × 80 variations = 4 variations expected to look significant purely by chance
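The arithmetic above can be sketched in a few lines. The function names are mine; the expected-false-positive numbers match the slide, and the familywise error rate (not on the slide, but the standard companion quantity) shows the chance of at least one fluke:

```python
def expected_false_positives(alpha, variations):
    """Expected number of variations that look 'significant' by chance alone."""
    return alpha * variations

def familywise_error_rate(alpha, variations):
    """Probability of at least one false positive across independent comparisons."""
    return 1 - (1 - alpha) ** variations

expected_false_positives(0.05, 20)  # 1.0, matching the slide
expected_false_positives(0.05, 80)  # 4.0, matching the slide
familywise_error_rate(0.05, 20)     # ~0.64: roughly a 64% chance of at least one fluke
```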
BONFERRONI CORRECTION
Confidence Level = 100% − Significance Level
Desired Confidence Level = 95% (α = .05)
.05 / 20 = .0025 → 0.25% significance level
100% − 0.25% = 99.75% confidence level for an individual test
.05 / 80 = .000625 → 0.0625% significance level
100% − 0.0625% = 99.9375% confidence level for an individual test
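The Bonferroni correction is just a division: shrink the per-test significance level so the whole family of comparisons stays at your desired level. A minimal sketch (function names are mine) reproducing the slide's numbers:

```python
def bonferroni_alpha(alpha, comparisons):
    """Per-test significance level that keeps the familywise level at `alpha`."""
    return alpha / comparisons

def per_test_confidence(alpha, comparisons):
    """Confidence level each individual test must reach."""
    return 1 - bonferroni_alpha(alpha, comparisons)

per_test_confidence(0.05, 20)  # 0.9975   → 99.75%, as on the slide
per_test_confidence(0.05, 80)  # 0.999375 → 99.9375%, as on the slide
```

The practical consequence: each variation must clear a much higher confidence bar, which demands far more traffic per variation, another reason to limit how many variations you run.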
FOOC FIXES
• Make sure the testing snippet is in the <head> (as high as possible!)
• Reduce the size of your testing snippet
• Don’t use testing software to make development changes (slows down the test load)
• Make sure jQuery is above the testing snippet on the page
• Use raw JavaScript instead of jQuery
• QA, QA, QA!
6. IGNORING PAGE FLICKER (FOOC)
EXAMPLE
Desired Update (Mid-Test)
Unfortunate Result
7. CHANGING THE SITE MID-TEST
EXAMPLE 2 (SAME TEST)
Original KPI measurement: stores.site.com
Day 3: URL changed to site.com/stores
KPI tracking broken
CHANGING THE KPI
If you use a metric further up the conversion funnel to speed up testing, you have to make sure that there is a direct, measurable relationship between that metric and your actual KPI
8. MEASURING THE WRONG KPI (OR CHANGING IT)
WAYS TO CHANGE TRAFFIC MID-TEST
• Start a paid search (PPC) campaign centered around a new keyword grouping
• Start a promotion (attracts deal-focused users)
• Have a publicity moment (news coverage, reddit, social media virality, etc.)
• Many, many more
9. DRIVING NEW TRAFFIC MID-TEST
SOURCE: OPTIMIZELY
“When you change a variation’s traffic allocation mid-experiment, all new users will be allocated accordingly from then on. However, all users that entered your experiment before the change will be bucketed into the same variation they entered previously, altering the results and making it difficult to interpret the conversion rate.”
10. CHANGING TRAFFIC ALLOCATION
WHY DOES THIS HAPPEN?
• Your company is risk-averse
• The executives feel like this will speed up the test (it doesn’t)
• Someone gets antsy (or excited) when results aren’t behaving the way they expected
• Etc.
WHAT’S THE RISK?
Risk of changing allocation mid-test:
N = 1M visitors per day
Friday: treatment performed better
Saturday: treatment performed better
Combined: treatment appears to have performed worse
Simpson’s Paradox
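Simpson's Paradox is easy to reproduce numerically. In this sketch the counts are hypothetical (the talk only gives N = 1M/day): treatment allocation is raised from 1% on a high-converting Friday to 50% on a low-converting Saturday, so the treatment wins each day yet loses overall because its traffic is concentrated on the weak day:

```python
# Hypothetical (visitors, conversions) counts; treatment allocation was
# raised from 1% on Friday to 50% on Saturday mid-test.
friday   = {"control": (990_000, 49_500), "treatment": (10_000, 550)}
saturday = {"control": (500_000, 10_000), "treatment": (500_000, 12_500)}

def rate(visitors, conversions):
    return conversions / visitors

for day, data in (("Friday", friday), ("Saturday", saturday)):
    c, t = rate(*data["control"]), rate(*data["treatment"])
    print(f"{day}: control {c:.1%}, treatment {t:.1%}")  # treatment wins both days

# Sum visitors and conversions across days for each arm.
combined = {
    arm: tuple(map(sum, zip(friday[arm], saturday[arm])))
    for arm in ("control", "treatment")
}
c, t = rate(*combined["control"]), rate(*combined["treatment"])
print(f"Combined: control {c:.1%}, treatment {t:.1%}")  # control wins overall
```

The per-day comparisons are fair (same day, both arms), but the combined comparison mixes unequal day weights per arm, which is exactly the distortion that changing allocation mid-test introduces.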
TIME SEGMENTS
If daily traffic isn’t representative of all traffic, you need to reach statistical significance across ‘time segment rotations,’ not necessarily days
Think of your traditional behavior cycles as ‘time segments’
13. IGNORING TEMPORAL FLUCTUATIONS
SEGMENTATION GROUPS
• Buyer modalities
• Gender
• Age
• Region
• Social groupings
• Etc.
14. ASSUMING SOMETHING WORKS FOR EVERYONE
STRATEGIES
• Find allies in your organization: site owners, project managers, key stakeholders
• Understand product development cycles and request queue structures
15. NOT IMPLEMENTING POSITIVE RESULTS