Blake Commagere @commagere
A/B Testing
Technically Split or Bucket Testing, but nobody calls it that.
Who the hell am I?
It’s a fair question
● Started seven companies
  o Five of those have been bought
● Raised over 12M Angel / VC
● I build viral things
  o Plaxo
  o Causes
  o Vampires
● I’m obsessed with human behavior
2
My Fucking Talks
Analogies Rule
Dave McClure
● Makes sailors blush with his swearing
● Occasionally uses hyperbole in his talks
● Is funny
Blake Commagere
● Makes Dave McClure blush with his swearing
● The most egregious abuse of hyperbole
● Thinks he’s funny
3
This talk is for you.
Interrupt at any point.
Don’t be shy.
If it sucks and you never say anything, I’m blaming you.
4
What I’m covering
● What the hell is A/B testing?
● Why should you care?
● How should you A/B test?
● How can A/B testing fuck you over?
● A few ancillary benefits of A/B testing
5
What I’m NOT covering
AKA “Shit you should Google if you don’t already know”
● Familiarize yourself with concepts like
  o Confidence Intervals
  o Comparative Error
  o Statistical Significance
  o Sample Size requirements
      FYI, there are online calculators for all these things
● How to fucking do math
  o Yes, there’s a lot of math in A/B Testing
  o Chances are, the tools you use will do the math for you
      If not, your code better do the math for you
6
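If your tooling does not do the math for you, a significance check can be fairly small. The sketch below is one common approach (a pooled two-proportion z-test), not something the slides prescribe; the function names and the conversion numbers in the usage example are made up for illustration.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (A) and variant (B).

    conv_*: number of users who converted; n_*: number of users exposed.
    Returns (z, p_value) for a two-sided pooled z-test.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    # Standard normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

def is_significant(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """True if the difference clears the chosen confidence level."""
    _, p_value = two_proportion_z_test(conv_a, n_a, conv_b, n_b)
    return p_value < (1 - confidence)

# Illustrative numbers only: 100/1000 conversions in control, 130/1000 in the variant.
z, p = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z={z:.2f}, p={p:.3f}, significant={is_significant(100, 1000, 130, 1000)}")
```

This assumes each user is counted once and that samples are large enough for the normal approximation to be reasonable; someone on the team should still understand the concepts listed above.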
What the hell is A/B testing?
7
What the hell is A/B Testing?
Will it hurt?
● Randomized Experiment
  o Control & Variant
  o Tests against a specific goal
  o Typically you want to establish Statistical Significance with a Confidence Level of 95%
● Bucket or Split Testing
  o Technically, this is what you want most of the time
  o Multiple variants
  o People usually say A/B when they mean Bucket/Split
  o Just go with it
8
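The "bucket" part can be sketched in a few lines. This is a minimal example of one common technique, hashing the user ID so each user lands in the same bucket every time without storing the assignment. The function name, experiment name, and variant labels are hypothetical, not from any particular tool.

```python
import hashlib

def assign_bucket(user_id, experiment, variants):
    """Deterministically map a user to one of the variants.

    Salting the hash with the experiment name keeps assignments
    independent across experiments.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    index = int(digest, 16) % len(variants)
    return variants[index]

# Hypothetical experiment: a control plus two variants (a bucket test, not a strict A/B).
variants = ["control", "make_an_ad", "create_an_ad"]
print(assign_bucket("user-42", "cta_copy", variants))
```

Because the hash is effectively uniform, users spread roughly evenly across the variants, and the same user always sees the same variant on every visit.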
Why should you care?
9
Do I really need A/B Testing?
Technically, you may not need it
● Do you have users?
● If not, do you plan on having users?
● If not… congrats! You don’t need A/B Testing because you don’t have a business!
● For everyone else, you need A/B Testing.
10
You are not your user
Your users are much, much stupider
You
● Busy as hell
● Dedicated
● Very knowledgeable
● Eager to learn
● Not Stupid
Your User
● Lazy
● Impatient
● Stupid
● Stupid
● Real. Stupid.
11
The Result?
You’re building the product for you
● You are not your user
● Your instincts are probably wrong
● Your wants/needs are probably different
● You view the market differently
12
Still not convinced?
Create vs Make - one word makes all the difference
● Create and Make are synonyms
● “Make an Ad” vs “Create an Ad”
● Software prefers “Create” because of REST and CRUD
● “Make” outperformed “Create” by up to 20% for non-artists
● There is a mental barrier to Create - it sounds more involved/complex. Making is easier.
13
How should you A/B Test?
14
Blockers on your A/B testing
If this is happening, all your tests will suck
● Shit is broken
  o FIX IT NOW WHY ARE YOU FUCKING READING THIS
● Shit is slow
  o PageSpeed on frontend
  o Profilers on backend / mobile
● Official definitions of slow*
  o A webpage that takes > ~2 seconds to load and shows no progress indicator
  o An app that isn’t immediately responsive
*according to me
15
Ensuring tests maximize impact
AKA the most obvious advice ever
1) ABT. Always Be Testing
2) Win as frequently as possible
3) In areas that matter
● Any moron can get #1 right
● Most morons can’t do #2 or #3
● Essentially, you need to know what to test and how to test.
16
Only a few features matter
That’s it
● Users are either engaged or not
● Data on your userbase tells you which 2-3 features matter
● Optimize these features
● Make Onboarding focus on these features
● Facebook example:
  o Friends, Photos, Status Updates
  o New User Experience focused on these
  o Network leverage to help new users
17
Similarly, only a few flows matter
That’s also it
● Is a flow critical?
  o Does it get a user engaged with a critical feature?
  o Does it help another user engage with a critical feature?
● Is every step necessary?
  o Almost always, the answer is no
● Where are the leaks?
  o Almost always, this is the result of bad messaging
  o Find the worst leak, fix it, then move to the next one
18
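"Find the worst leak" can be made concrete once you have counts of how many users reach each step of a flow. The sketch below is a hedged illustration: the function name, the step names, and the counts are all invented, and real funnels would come from your analytics tool.

```python
def worst_leak(funnel):
    """Find the step transition that loses the largest share of users.

    funnel: ordered list of (step_name, users_reaching_step).
    Returns (from_step, to_step, drop_rate), or None if there is no transition.
    """
    worst = None
    for (name_a, count_a), (name_b, count_b) in zip(funnel, funnel[1:]):
        if count_a == 0:
            continue  # nobody reached this step, so nothing can leak from it
        drop = 1 - count_b / count_a
        if worst is None or drop > worst[2]:
            worst = (name_a, name_b, drop)
    return worst

# Made-up signup flow for illustration.
funnel = [
    ("landing", 1000),
    ("signup_form", 600),
    ("email_confirmed", 240),
    ("first_action", 200),
]
print(worst_leak(funnel))
```

In this invented example the biggest drop is between the signup form and email confirmation, so that is the step whose messaging you would test first before moving to the next leak.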
Your messaging is never perfect
AKA test your fucking messaging
● In most cases it is better to:
  o Use fewer words - your users are lazy
  o Be Colloquial - your users are dumb
  o Target the user - your users are selfish
● You can always target better
  o Some data is implicit in your existing data
  o Your users only care about what they get
● Language is always evolving
  o Language in every channel is evolving as well
  o Channels change & language in a channel changes!
19
Tools of the trade
● Web:
  o Google Analytics (Content Experiments)
  o Optimizely
  o MixPanel
● Mobile:
  o MixPanel
  o Optimizely
  o Swrve
20
Which tool should I use?
● The best one for your company:
  o It depends
  o On. so. many. things.
● You’ll probably end up doing some coding too.
● The worst one for your company?
  o Not having one
  o Creating your own from scratch
21
How A/B Testing can fuck you over
22
Pitfalls
Some very common mistakes
● Find the sample size needed for statistical significance in advance
  o You want a 95% confidence level
  o With 20,000 users, the required sample can be as small as 377 people (at a 5% margin of error). NOT a 50/50 split
● Do not take convenience samples
  o 2-4 weeks is ideal
● Test in 7, 14, 28 day intervals
  o Human behavior is day dependent, tied to the week
  o Your results will skew if you do 8 day tests
23
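The 377 figure matches what standard online sample-size calculators return for a population of 20,000 at a 95% confidence level. The sketch below assumes that formula (Cochran's formula with a finite-population correction, a 5% margin of error, and a worst-case proportion of 0.5); the slides do not say how the number was derived, so treat those parameters as assumptions. The second helper is a hypothetical way to keep test durations in whole weeks.

```python
import math

def sample_size(population, margin_of_error=0.05, z=1.96, p=0.5):
    """Required sample size for a given population.

    z=1.96 corresponds to a 95% confidence level; p=0.5 is the
    worst-case (most conservative) expected proportion.
    """
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2  # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)                # finite-population correction
    return math.ceil(n)

def round_up_to_full_weeks(days):
    """Round a planned test duration up to a whole number of weeks."""
    return math.ceil(days / 7) * 7

print(sample_size(20_000))        # 377
print(round_up_to_full_weeks(8))  # 14
```

Rounding up to full weeks means every weekday is represented equally often, which is the reason an 8 day test skews.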
The Lies We Tell Ourselves
Sometimes we like lying because we’re lazy
● You can make the numbers lie
● Avoiding a channel that you hate
  o Email works
  o SEO works
  o Twitter works
  o Facebook works
  o If a channel doesn’t work for you, you’re probably doing it wrong
24
Farmville
How to optimize for suck
[Images: Farmville on Launch | Farmville after ~2 yrs]
25
Ancillary Benefits
26
Designer vs Engineer
Your users are much, much stupider
Designer
● Better not suck at design
● Better have design exp
● Better respect design
● Will not always agree with engineers on design
● Not Stupid
Engineer
● Probably sucks at design
● May not have design exp
● Should respect design
● Will not always agree with designers on design
● Not Stupid
27
Using A/B Testing for Team Bonding
● The loudest voice usually gets their way
● This can stifle good ideas
● Sometimes it’s ok to test a bad idea
● Becomes a teachable moment
● You could be wrong (GASP!)
28
Want more swearing and/or advice?
blake.commagere@gmail.com
@commagere
29

A/B Testing That Matters


Editor's Notes

  • #4 Ok, obviously I’m a huge fan of Dave. I just like giving him attitude and poking fun at him.
  • #7 What is NOT covered is important to stress - you can look this shit up. If you don’t know this stuff already, don’t worry, I’m not going to waste everyone’s time making them do formulas. But someone on the team better be familiar with these concepts so that you A/B test correctly.
  • #16 Your messaging will be largely rendered irrelevant if it takes 7 seconds to load
  • #18 Your messaging will be largely rendered irrelevant if it takes 7 seconds to load
  • #20 Your messaging will be largely rendered irrelevant if it takes 7 seconds to load
  • #29 Let’s let the customer decide