Optimization Summer Games - Get started with A/B testing
1. Results are always greener
on the other side
Lessons learned from failed or inconclusive experiments
Strategy Consultant
@LTatarov
Lev Tatarov
2. You are not going to get wins all the time!
* N = 90k, May 2014 - July 2016, >=10k visitors, wins = significant uplift on 1 or more goals
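The footnote counts a win as a significant uplift on at least one goal. The slides do not say which statistical test was used, so the following is only a minimal sketch, assuming a two-sided two-proportion z-test at a 5% significance level. The function names and visitor counts are hypothetical.

```python
from math import erfc, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_* are conversion counts, n_* are visitor counts per variation.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # No conversions at all (or all converted): nothing to compare.
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


def is_win(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """A 'win' here means B beats A and the difference is significant."""
    z, p_value = two_proportion_z_test(conv_a, n_a, conv_b, n_b)
    return z > 0 and p_value < alpha


# Hypothetical numbers: 10,000 visitors per variation.
print(is_win(500, 10_000, 600, 10_000))  # clear uplift
print(is_win(500, 10_000, 510, 10_000))  # tiny uplift
```

With more than one goal per experiment, checking each goal separately raises the chance of a false win, so a real analysis would usually also correct for multiple comparisons.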
3. You are not going to get wins all the time!
Inconclusive results
4. You are not going to get wins all the time!
Inconclusive results
No wins
5. We need to get better at learning from
losing and inconclusive experiments!
7. Hypothesis: If we add press mentions at the
bottom of the homepage, we will generate
more clicks on the CTA because it will create
trust in the brand
Blacklane
Result: No significant difference
A
B
Conclusion: Visitors are not driven to convert
by press mentions
Next steps: ...???
8. A great hypothesis begins with the problem, not the solution
Problem → Solution → Result
9. Meaningful hypotheses drive focus
[Diagram: over time, each problem is addressed by several solutions, all working toward the company goal]
11. Hypothesis: Because we have unused
real-estate above the fold on the homepage, if
we add press mentions, we will increase
booking CTA conversion
Blacklane
Result: No significant difference
A
B
Conclusion: Visitors are not driven to convert
by press mentions
Next steps: What else can we use this
real-estate for?
12. Blacklane - next solution
Hypothesis: Because we have unused
real-estate above the fold on the homepage, if
we add USPs, we will increase booking CTA
conversion
B
Result: Increased conversion on CTA!
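The two Blacklane hypotheses share the same problem and expected result and differ only in the solution. A minimal sketch of that "Because [problem], if we [solution], we will [result]" structure, reusing the wording from the slides; the `Hypothesis` class and its method names are hypothetical, not something the talk prescribes.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Hypothesis:
    """Problem-first hypothesis (hypothetical helper for illustration)."""
    problem: str
    solution: str
    expected_result: str

    def statement(self) -> str:
        return (f"Because {self.problem}, if we {self.solution}, "
                f"we will {self.expected_result}.")

    def with_solution(self, solution: str) -> "Hypothesis":
        # Keep the problem and expected result, swap only the solution.
        return replace(self, solution=solution)


press = Hypothesis(
    problem="we have unused real-estate above the fold on the homepage",
    solution="add press mentions",
    expected_result="increase booking CTA conversion",
)
usps = press.with_solution("add USPs")

print(press.statement())
print(usps.statement())
```

Because the problem is stated separately, a failed solution still leaves a clear next step: try another solution to the same problem.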
14. Hypothesis: Because videos are more
engaging and informative, if we use them
instead of photos on the product page,
conversion will increase
Chrome Industries
Result: +0.2% in conversion
A
B
15. Smoke testing
• Use a facade to test whether there is general
interest in a certain functionality
• Saves the effort of full implementation
• Allows gradual testing of functionalities
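A minimal sketch of the facade idea, assuming the facade is a visible entry point (for example a button) for a feature that does not exist yet, and that interest is measured as clicks per impression. The `FakeDoor` class, its methods and the "coming soon" message are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FakeDoor:
    """Facade for a feature that has not been built (hypothetical helper)."""
    feature: str
    impressions: int = 0
    clicks: int = 0

    def record_impression(self) -> None:
        # Called whenever the facade is shown to a visitor.
        self.impressions += 1

    def record_click(self) -> str:
        # Called when a visitor tries to use the feature.
        self.clicks += 1
        return f"'{self.feature}' is coming soon."

    def interest_rate(self) -> float:
        # Share of impressions that led to a click; 0.0 before any traffic.
        return self.clicks / self.impressions if self.impressions else 0.0


door = FakeDoor("product videos")
for _ in range(4):
    door.record_impression()
message = door.record_click()
print(message, door.interest_rate())
```

Only the counters need to exist up front; the real functionality is built later, if the measured interest justifies it.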
17. Insight #2
Use smoke testing to measure
demand for complex functionality
before it is built
18. Hypothesis: Because videos are more
engaging and informative, if we use videos
instead of photos on the product page,
conversion will increase
Chrome Industries
Result: +0.2% in conversion
A
B
Conclusion: The difference between variations is not
big enough to justify the production costs of videos
for all products
20. Hypothesis: Because of users’ reading habits (F
shape), if we move the videos link to the left
side of the menu, it will be more noticeable and
will drive more visitors to the videos page
IGN
Result: -92.3% in clicks
A
B
21. Insight #3
A significant drop in an important
metric might mean you found
something your users care about
or are sensitive to
22. Why? Visitors believed that the section had been
deleted, didn’t bother looking for it, or went to
find the videos elsewhere (YouTube)
Hypothesis: Because of users’ reading habits (F
shape), if we move the videos link to the left
side of the menu, it will be more noticeable and
will drive more visitors to the videos page
IGN
Result: -92.3% in clicks
A
B
Who? After segmenting the results, it was clear
that the change mostly affected returning
visitors
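A minimal sketch of that kind of segmentation, assuming each visitor is logged as a (segment, variation, clicked) record. The record layout, the function name and the sample data are hypothetical and are not IGN's data.

```python
from collections import defaultdict


def click_rate_by_segment(events):
    """Return click rate per (segment, variation).

    events: iterable of (segment, variation, clicked) tuples,
    one per visitor, where clicked is a bool.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [clicks, visitors]
    for segment, variation, clicked in events:
        cell = totals[(segment, variation)]
        cell[0] += int(clicked)
        cell[1] += 1
    return {key: clicks / visitors
            for key, (clicks, visitors) in totals.items()}


# Hypothetical visitors: only returning visitors react to variation B.
events = [
    ("returning", "A", True), ("returning", "A", True),
    ("returning", "B", False), ("returning", "B", False),
    ("new", "A", True), ("new", "A", False),
    ("new", "B", True), ("new", "B", False),
]
rates = click_rate_by_segment(events)
for key in sorted(rates):
    print(key, rates[key])
```

Comparing A and B within each segment shows whether an overall drop comes from one group of visitors rather than from everyone.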
24. Hypothesis: Because the current layout was
not clean and clear enough, if we give the user
an obvious next step and remove distractions,
cart check-out rates will increase
Rubylane
Result: Inconclusive
A
B
What happened?
25. Insight #4
If results are very unexpected,
take some time to validate your
test design
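The slides do not say what was wrong with the first Rubylane test. As one illustration of validating a test design, a common check is whether traffic was actually split as intended (a sample ratio mismatch check). The sketch below assumes a planned 50/50 split and a chi-square goodness-of-fit test with one degree of freedom; the function name and visitor counts are hypothetical.

```python
from math import erfc, sqrt


def srm_p_value(n_a, n_b, expected_share_a=0.5):
    """p-value that the observed split matches the planned split.

    n_a, n_b: visitors actually assigned to each variation.
    A very small p-value suggests the assignment itself is broken.
    """
    total = n_a + n_b
    exp_a = total * expected_share_a
    exp_b = total - exp_a
    chi2 = (n_a - exp_a) ** 2 / exp_a + (n_b - exp_b) ** 2 / exp_b
    # Survival function of chi-square with 1 degree of freedom.
    return erfc(sqrt(chi2 / 2))


# Hypothetical counts for a planned 50/50 split.
print(srm_p_value(5000, 5050))  # small imbalance
print(srm_p_value(5000, 5400))  # large imbalance
```

If the split is far from the plan, the results of the experiment should not be trusted until the cause is found and the test is rerun.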
26. Hypothesis: Because the current layout was
not clean and clear enough, if we give the user
an obvious next step and remove distractions,
cart check-out rates will increase
Rubylane - round 2!
Result: +5% cart check-out
A
B
27. Recap
● Strong hypotheses enable learning from failures:
start with a meaningful problem definition
● Small or inconclusive impact might mean that you
are not testing something your users care about
● Use smoke testing to measure demand for complex
functionality before it is built
● A significant drop in an important metric might
mean you found something your users care about
or are sensitive to
● If results are very unexpected, take some time to
validate your test design