A/B Testing for Everyone
Pavel Dmitriev
Some slides taken from talks by Ronny Kohavi
About Me
• B.S. Applied Math @ Moscow State University, Russia
• Ph.D. Computer Science @ Cornell University, focused on applied Machine Learning
• 3 years @ Yahoo!, worked on web crawling and indexing optimization
• 8 years @ Microsoft, worked on experimentation in Bing, MSN, O365, Skype, Windows
• 5 months @ Outreach, working on experimentation, ML, NLP
Outline
• Intro to A/B testing
• Examples of real experiments
• Experimentation adoption across industries
• Five challenges preventing faster adoption, in Sales and in Software
The Life of a Great Idea – True Bing Story
Control – Existing Display
Treatment – new idea called Long Ad Titles
The Life of a Great Idea
• It was one of hundreds of ideas on the table, and it seemed …meh…
• Stayed in the backlog in Feb, March, April, May, June
• Many features were ranked above it, and it was clear the idea was not going to make it any time soon
• An engineer thought it was trivial to implement. He implemented it and started an A/B test.
• Immediately an alert fired: the Revenue was abnormally high (usually indicates a bug)
• But in this case there was no bug. The idea increased Bing’s revenue by 12% (over $100M/year), without hurting user experience metrics!
We are bad at assessing the value of ideas
• The best revenue generating idea in Bing history was badly rated and delayed for
months!
At Microsoft, we ran a study in Bing and found that only ~1/3 of ideas developed were actually
good for users and business, ~1/3 were neutral, and ~1/3 were bad
• Only in Software Engineering?
In Sales, contradicting “best practices” are abundant. For example, best day to contact the
prospect is …
In Medicine, correctly evaluating an idea, e.g. a new drug, is a matter of life and death. The FDA and EMA do not trust expert opinions and mandate the use of Randomized Controlled Trials
We can’t trust our gut! To make the right choices we need data from real users!
Collecting Usage Data
• Companies have always been collecting data to learn what their users appear to
value
Interviews, focus groups, questionnaires, and other similar techniques are great at revealing
what users say they do
Although rich with qualitative information, the learnings from these techniques are typically
based on small samples and risk being biased, making it hard to generalize
• With the internet connectivity of the products, companies can collect feedback data
to learn what their customers actually value
Telemetry and logging reveal what the customers actually do
Use Data Correctly - Correlation is not
Causation
• Seattle is known for its rain
• Whenever I see people on the street carrying
umbrellas, very soon it starts raining
• I may conclude that umbrellas cause the rain,
and decide to ban them
• Banning umbrellas, however, won’t stop the
rain; it will just make everyone more wet
Photo by Mike Waller, taken from Flickr
Relying on correlations isn’t just neutral, it’s often harmful to the business!
Correlation is not Causation – Real
Example
• You observe the churn rates for users using/not-using your feature:
25% of new users who do NOT use your feature churn (stop using product 30 days later); only
10% of new users who use your feature churn
• [Wrong] Conclusion: your feature reduces churn and is thus critical for retention
Flaw: the relationship between the feature and retention is correlational; the data above is insufficient for any causal conclusion
• Example: Users who see error messages in Office 365 churn less.
This does NOT mean we should show more error messages. These users are simply heavier users of Office 365
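The churn example can be reproduced with a small simulation. This is only an illustrative sketch: the churn model, the probabilities, and the name `simulate_churn` are assumptions chosen for the example, not data from the talk. In the simulation the feature has no causal effect at all; a hidden trait (being a heavy user) makes users both more likely to use the feature and less likely to churn, which is enough to produce a gap in observed churn rates.

```python
import random

def simulate_churn(n_users=100_000, seed=0):
    """Simulate users where a hidden trait (heavy usage) drives both
    feature adoption and retention. The feature itself does nothing."""
    rng = random.Random(seed)
    counts = {True: [0, 0], False: [0, 0]}  # uses_feature -> [users, churned]
    for _ in range(n_users):
        heavy = rng.random() < 0.5
        # Hypothetical numbers: heavy users adopt the feature more often...
        uses_feature = rng.random() < (0.8 if heavy else 0.2)
        # ...and churn less, whether or not they use the feature.
        churned = rng.random() < (0.05 if heavy else 0.30)
        counts[uses_feature][0] += 1
        counts[uses_feature][1] += churned
    return {k: churned / users for k, (users, churned) in counts.items()}

rates = simulate_churn()
print(f"churn among feature users:     {rates[True]:.1%}")
print(f"churn among non-feature users: {rates[False]:.1%}")
```

With these made-up parameters the expected churn rates work out to about 10% for feature users and 25% for non-users, even though forcing everyone to use the feature would change nothing.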
Using Data Correctly – Before and After
Flaw: This approach misses time-related factors such as external events, weekends, holidays, seasonality, etc.
[Figure: Before and after example. Amazon Kindle Sales for Website A and Website B, with the annotation: Oprah calls Kindle "her new favorite thing"]
The new site (B) is always worse than the original (A), opposite of what observational data suggests
A/B Tests in One Slide
• Other names: Controlled Experiments, Randomized Clinical Trials (RCTs)
• Can have more than two variants: A/B/C/etc. tests are common
• Must run statistical tests to confirm differences are not due to chance
A/B Tests are the best scientific way to prove causality!
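As one possible illustration of the "statistical tests" bullet, here is a minimal sketch of a two-sided two-proportion z-test for a conversion-style metric. The slides do not say which test is used; the function name `two_proportion_z_test` and the counts are hypothetical.

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for the difference between two proportions.
    Returns (z statistic, p-value)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled rate under the null hypothesis that A and B are the same.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution, via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 1,000 users per variant, 100 vs 130 conversions.
z, p = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

A small p-value means a difference this large would be unlikely if the two variants truly performed the same; it is not the probability that the treatment works.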
Real Examples
• Three experiments
• Each had enough users for statistical validity
• For each experiment I’ll tell you the success metric
• Your job is to guess the result
Please stand up
You’ll choose between three options by raising your left hand, raising your right hand, or leaving both hands down
If you get it wrong, please sit down
• Since there are 3 choices for each question, random guessing implies 100%/3^3 =~ 4% will
get all three questions right. Let’s see how much better than random you can do.
Example 1: Outreach Email (Step 9, Day 7)
• Success metric: Reply Rate
Hey {{first_name}},
In short, we're a sales automation platform that makes your
reps life a lot easier. Our average companies (based on 1100+
companies) have tripled their reply rates on cold outbound
emails and boosted rep productivity by 2x.
We take what your best reps are doing and automate that
across your entire team so your weaker reps can work at the
highest possible same level. We also solve the issue of follow
up falling through the cracks and reps not going deep enough.
When can I get a few minutes on your calendar to discuss?
{{sender.first_name}}
{{first_name}},
I'm sure in your role you get a ton of sales-driven emails, probably most of which are
spam you have no interest in. My goal is to provide enough value to warrant a 15 minute
call with you.
What we do is put your sales process into a structured series of touch points which takes
care of your follow-up process for you. This ramps up reps activities and ensures that every
lead is thoroughly worked, never gets lost and receives the 5 to 12 touches where 80% of
sales happen.
Second, we do all the administrative work for in your CRM (Salesforce). This frees up your
reps time, logs their activities, and gives you 100% accurate reporting.
Finally, we open up the "Black Box" of sales and show you in real time how each rep is
performing, what activities they're doing, and what is and isn't working. This provides a solid
foundation to accurately forecast results, improve your outreach and train your team.
Over 1100 companies (like CenturyLink, Adobe, and Marketo) use us and their average rep
saves 2 hrs a day, and 2X's their productivity.
If you see value here can we set up a time next Tuesday or Wednesday to discuss?
{{sender.first_name}}
• Left: shorter, more “salesy”
• Right: longer, more “socially
• Raise your left hand if you think the Left version wins (stat-sig)
• Raise your right hand if you think the Right version wins (stat-sig)
• Don’t raise your hand if they are about the same (no stat-sig difference)
Example 1: Outreach Email (Step 9, Day 7)
• Left template has 70% higher reply rate… However, most replies are
negative or unsubscribe requests. The right template has higher positive
• If you did not raise your hand, sit down…
• If you raised your right hand, sit down…
Example 2: SERP Truncation
• SERP is a Search Engine Result Page
(shown on the right)
• Success Metric: Clickthrough Rate on first SERP
(ignore issues with click/back, page 2, etc.)
• Version A: show 10 algorithmic results
• Version B: show 8 algorithmic results by
removing the last two results (shown on the right)
• All else the same: task pane, ads, related
searches
• Raise your left hand if you think version A wins (10 results)
• Raise your right hand if you think version B wins (8 results)
• Don’t raise your hand if they are about the same
Example 2: SERP Truncation
• If you raised your left hand, sit down…
• If you raised your right hand, sit down…
• With over 3M users in each variant, we could not
detect a stat-sig delta. Users simply shifted the
clicks from the last two algorithmic results to
other elements of the page.
• Rule of Thumb: Shifting clicks is easy. Reducing
abandonment is hard.
Example 3: Windows Search Box
• The search box in the lower left corner of the screen on Windows machines
• Success metrics: more searches (and thus more Bing revenue)
• Raise your left hand if you think the Left version wins
• Raise your right hand if you think the Right version wins
• Don’t raise your hand if they are about the same
Example 3: Windows Search Box
• If you did not raise your hand, sit down…
• If you raised your left hand, sit down…
• The four variants we actually tested in order of performance are:
Type here to search (winner)
What can I help you find?
Ask me anything (Control - the design that shipped with Windows 10)
Search the web and Windows (worst)
Stop guessing – get the data!
Experimentation Adoption: Microsoft
Experimentation Adoption: Software Industry
• http://www.exp-growth.com/ - survey to determine the state of experimentation maturity (Fabijan et al, ICSE 2017, SEAA 2018)
[Figure: State of Exp Growth. Bar chart by maturity stage (Crawl, Walk, Run, Fly)]
Other industries? Let’s look at Sales
• Most of Outreach’s ~2500 customers fall into the Crawl stage, with many not doing any A/B testing at all
• Few sales organizations have a systematic experimentation program
• Huge potential: some experiments we ran doubled reply rates!
What are the reasons for low adoption in
Sales?
• A few facts about sales
Very traditional industry (some say the oldest profession on earth), slow to change
No formal education or degrees; considered entry level and low-paying
Requires extreme mental toughness. You are constantly ignored and told no. You’ve got a monthly quota, and if you don’t meet it 3 months in a row, you are fired
• There’s a fear of change: sales managers are afraid to try new ideas, fearing it may
cause harm and result in missing their quota
What are the reasons for low adoption in
Sales?
• Inadequate support for experimentation in sales tools, leading to most tests being
invalid, and inability to confidently make decisions even on valid tests
No statistical testing
Any user can turn the variants on/off any time during the test
Any user can edit the email being tested any time during the test
Vast majority of the tests are broken (e.g. imbalance in deliveries)
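One common way to flag an imbalance in deliveries is a sample ratio mismatch check: compare the observed split between variants with the intended split using a chi-square goodness-of-fit test. The slides do not describe how the imbalance was detected, so the sketch below is an assumption; the name `sample_ratio_mismatch`, the 50/50 default, and the strict `alpha=0.001` threshold are illustrative choices.

```python
import math

def sample_ratio_mismatch(n_a, n_b, expected_share_a=0.5, alpha=0.001):
    """Chi-square goodness-of-fit test (1 degree of freedom) comparing the
    observed split between two variants with the intended split.
    Returns (p_value, mismatch_detected)."""
    total = n_a + n_b
    exp_a = total * expected_share_a
    exp_b = total - exp_a
    chi2 = (n_a - exp_a) ** 2 / exp_a + (n_b - exp_b) ** 2 / exp_b
    # For 1 degree of freedom, the chi-square tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p_value, p_value < alpha

# Hypothetical delivery counts for a test intended to be 50/50.
print(sample_ratio_mismatch(1000, 1010))  # small difference
print(sample_ratio_mismatch(1000, 1200))  # large difference
```

When the check fires, the usual reading is that something in assignment or delivery is broken, so the metric comparison between variants should not be trusted until the cause is found.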
How to increase the adoption?
• We need to make experimentation
Trustworthy – results are correct and easily understood
Safe – impact of testing bad ideas is limited
Easy to use – enable non-technical sales managers and executives to answer their questions
• These are the same things I worked on trying to increase adoption of
experimentation at Microsoft!
Except… the bar is higher!!!
Five Gaps
Between the needs of the Sales Industry and the experimentation State of the Art
1. No open source trustworthy A/B testing solution
2. Difficult to come up with the right metrics
3. Small sample sizes
4. Difficult to understand results of statistical tests
5. Hard to translate business questions into experiment designs
Solving these issues will help accelerate experimentation adoption in Sales, Software, and other domains
#1. Open Source A/B Testing Platform
• Pretty much anything “platform” is open
source, except A/B testing
Wasabi, the only option, is not maintained
• Our http://exp-growth.com survey showed
that most companies build their own platform
from scratch (Fabijan et al, SEAA 2018).
This is hard - a big investment few
companies can afford.
[Figure: Type of Exp Platform. Bar chart (0% to 80%) over three categories: Internally developed platform; Third party platform; No platform (manual coding of experiments)]
• There’s a need for an easy to deploy and integrate open source A/B testing solution
that is easy to use, supports several common experiment designs, and provides safety
features
#2. Determining the right metrics
• How to judge the result of an A/B test?
OEC = Overall Evaluation Criterion, or OMTM = One Metric That Matters
A single metric or a few key metrics with well-defined decision criteria
• Two key properties:
1. Alignment with long-term company goals (directionality)
2. Ability to impact (sensitivity)
• Finding a good OMTM is hard, in Sales and in Software Products
Simple metrics like Opens or Replies to sales emails are not predictive of future sales (fail directionality)
Long-term metrics like Sales or Revenue take too long to measure - typical sales cycle takes months - and are hard to impact via small changes like email content (fail sensitivity)
Outreach solution – Positive Replies, where “positive” is determined via an ML classifier
See A/B Testing at Scale Tutorial for examples from Software industry
#3. Small Sample Sizes
• A typical 2-week A/B test for a mid-size Outreach customer will only have hundreds to thousands of data points in each variant
This translates to being able to detect only changes of ~20% or more
• Solutions:
Run bigger tests (at Outreach we recommend always running 50/50 tests)
Select more sensitive metrics: 20% increase in Revenue is hard, 20% increase in Positive
Replies is easier
Start by focusing on bigger changes rather than small tweaks. As the company grows and
volume of sales activity increases, can focus on smaller and smaller changes
Implement smarter experiment designs (e.g. cross-over design) and analysis methods (e.g.
CUPED)
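The link between sample size and the smallest detectable change can be sketched with the standard normal-approximation formula for comparing two proportions. This is a rough illustration, not the calculation used at Outreach: the 10% baseline rate, the 5% significance level, the 80% power, and the name `relative_mde` are all assumptions.

```python
import math
from statistics import NormalDist

def relative_mde(baseline_rate, n_per_variant, alpha=0.05, power=0.8):
    """Approximate minimum detectable effect, relative to the baseline rate,
    for a two-sided test comparing two proportions with equal sample sizes.
    Uses the baseline variance for both variants (a common simplification)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    se = math.sqrt(2 * baseline_rate * (1 - baseline_rate) / n_per_variant)
    return (z_alpha + z_power) * se / baseline_rate

# Hypothetical 10% baseline rate at several per-variant sample sizes.
for n in (500, 2000, 5000):
    print(f"n = {n:>5} per variant -> detectable relative change ~ {relative_mde(0.10, n):.0%}")
```

Because the detectable change shrinks only with the square root of the sample size, quadrupling the traffic halves it. This is why an even 50/50 split and more sensitive metrics matter so much at these volumes.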
#4. Understanding Experiment
Results
• Standard way of evaluating experiments via Null
Hypothesis Testing can be easily misinterpreted, leading
to wrong conclusions
See Steve Goodman’s A Dirty Dozen for 12 ways to get it
wrong
Can’t show p-values to sales reps, need an easier way to
interpret results
• Treatment effect may be different on different sub-populations
Results may vary depending on country, browser, location, prospect persona, sales step, etc.
How to automatically detect and visualize such heterogeneous results?
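A very simple starting point for spotting heterogeneous results is to compute the lift and a significance test separately for each segment. The sketch below is only an illustration under assumptions: the segment names, the counts, and the function `lift_by_segment` are hypothetical, and the talk does not prescribe this method. Note that testing many segments increases the chance of false positives, so a real implementation would need a multiple-comparison correction.

```python
import math

def lift_by_segment(rows):
    """rows: iterable of (segment, variant, successes, trials), variant "A" or "B".
    Returns {segment: (relative_lift, p_value)} using a two-proportion z-test."""
    by_segment = {}
    for segment, variant, successes, trials in rows:
        by_segment.setdefault(segment, {})[variant] = (successes, trials)
    results = {}
    for segment, variants in by_segment.items():
        (s_a, n_a), (s_b, n_b) = variants["A"], variants["B"]
        p_a, p_b = s_a / n_a, s_b / n_b
        pooled = (s_a + s_b) / (n_a + n_b)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
        z = (p_b - p_a) / se
        p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
        results[segment] = ((p_b - p_a) / p_a, p_value)
    return results

# Hypothetical data: the treatment helps one persona and hurts another.
rows = [
    ("VP Sales", "A", 100, 1000), ("VP Sales", "B", 150, 1000),
    ("Sales Ops", "A", 120, 1000), ("Sales Ops", "B", 90, 1000),
]
results = lift_by_segment(rows)
for segment, (lift, p) in results.items():
    print(f"{segment}: lift {lift:+.0%}, p-value {p:.4f}")
```

In this made-up data the overall effect would look small because the two segments move in opposite directions, which is the situation the bullet above warns about.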
#4. Understanding Experiment
Results
• Each experiment needs to have clear success criteria, mapping unambiguously to
positive/negative outcomes
• Summarize results and learnings in an easy to understand visual way (Fabijan et al,
SEAA 2018)
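One way to map a result unambiguously to an outcome, without showing a p-value, is to report a confidence interval for the difference and label the test by where that interval sits relative to zero. This is a sketch under assumptions, not the scheme from the cited paper: the three labels, the 95% confidence level, and the name `decide` are illustrative.

```python
import math
from statistics import NormalDist

def decide(successes_a, n_a, successes_b, n_b, confidence=0.95):
    """Map an A/B result to one of three outcomes using a confidence
    interval for the difference in rates (treatment B minus control A)."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    low, high = (p_b - p_a) - z * se, (p_b - p_a) + z * se
    if low > 0:
        return "positive", (low, high)
    if high < 0:
        return "negative", (low, high)
    return "inconclusive", (low, high)

# Hypothetical counts with 1,000 prospects per variant.
for counts in [(100, 1000, 150, 1000), (100, 1000, 105, 1000), (150, 1000, 100, 1000)]:
    outcome, (low, high) = decide(*counts)
    print(f"{counts}: {outcome}, difference between {low:+.1%} and {high:+.1%}")
```

Fixing the rule before the test starts keeps the outcome from being reinterpreted after the data comes in.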
#5. Answering Business Questions
• Traditionally, A/B testing has been used to answer simple yes/no questions like
Does my new medicine help?
Should I ship my new feature?
Is my new email subject line better?
• However, managers and execs think of bigger, more difficult questions
Does embedding videos in e-mails help?
How urgently should sales reps reply to prospects?
How much should I invest in improving performance of my site?
• Using A/B testing to help answer these questions can help greatly accelerate adoption of
experimentation
Run a series of experiments on embedding video across all key scenarios
Run a series of experiments notifying users to reply with different delays across multiple scenarios
Run a series of “slowdown” experiments to estimate impact of performance on revenue
• Need to develop design patterns for such “learning experiment series”
Summary
• We are bad at assessing the value of our ideas. Don’t trust experts – get the data!
• A/B testing is the best scientific way to measure causal impact of your work on users and business
• Experimentation adoption is growing in Software Industry, but very low in other industries like
Sales
• Five challenges slowing down the adoption:
1. No open source trustworthy A/B testing solution
2. Difficult to come up with the right metrics
3. Small sample sizes
4. Difficult to understand results of statistical tests
5. Hard to translate business questions into experiment designs
• Solving these challenges will not only help Sales, it will accelerate experimentation
adoption in Software and other industries, bringing experimentation to Everyone!
Questions?
Slides will be posted on my LinkedIn page: www.linkedin.com/in/paveldmitriev/
A/B Testing for Everyone

More Related Content

What's hot

7 Lessons From Building Search Engine Products by CB Insights PM
7 Lessons From Building Search Engine Products by CB Insights PM7 Lessons From Building Search Engine Products by CB Insights PM
7 Lessons From Building Search Engine Products by CB Insights PMProduct School
 
Product Management Metrics | Saeed Khan | ProductTank Toronto
Product Management Metrics | Saeed Khan | ProductTank Toronto Product Management Metrics | Saeed Khan | ProductTank Toronto
Product Management Metrics | Saeed Khan | ProductTank Toronto Product Tank Toronto
 
Optimizely Workshop: Take Action on Results with Statistics
Optimizely Workshop: Take Action on Results with StatisticsOptimizely Workshop: Take Action on Results with Statistics
Optimizely Workshop: Take Action on Results with StatisticsOptimizely
 
Optimizely Experience Customer Story - Atlassian
Optimizely Experience Customer Story - AtlassianOptimizely Experience Customer Story - Atlassian
Optimizely Experience Customer Story - AtlassianOptimizely
 
[CXL Live 16] A/B Testing Pitfalls: Getting Numbers is Easy; Getting Numbers ...
[CXL Live 16] A/B Testing Pitfalls: Getting Numbers is Easy; Getting Numbers ...[CXL Live 16] A/B Testing Pitfalls: Getting Numbers is Easy; Getting Numbers ...
[CXL Live 16] A/B Testing Pitfalls: Getting Numbers is Easy; Getting Numbers ...CXL
 
Webinar: Experimentation & Product Management by Indeed Product Lead
Webinar: Experimentation & Product Management by Indeed Product LeadWebinar: Experimentation & Product Management by Indeed Product Lead
Webinar: Experimentation & Product Management by Indeed Product LeadProduct School
 
AI and ML for Product Management by Smartsheet Sr Dir of PM
AI and ML for Product Management by Smartsheet Sr Dir of PMAI and ML for Product Management by Smartsheet Sr Dir of PM
AI and ML for Product Management by Smartsheet Sr Dir of PMProduct School
 
Top 5 product development engineer interview questions with answers
Top 5 product development engineer interview questions with answersTop 5 product development engineer interview questions with answers
Top 5 product development engineer interview questions with answerspresent0
 
Succesful Product Strategy | Moe Ali | ProductTank Toronto
Succesful Product Strategy | Moe Ali | ProductTank TorontoSuccesful Product Strategy | Moe Ali | ProductTank Toronto
Succesful Product Strategy | Moe Ali | ProductTank TorontoProduct Tank Toronto
 
Product Manager 101: What Does A Product Manager Actually Do?
Product Manager 101: What Does A Product Manager Actually Do?Product Manager 101: What Does A Product Manager Actually Do?
Product Manager 101: What Does A Product Manager Actually Do?Chris Cummings
 
An Experimentation Framework: How to Position for Triple Digit Growth
An Experimentation Framework: How to Position for Triple Digit GrowthAn Experimentation Framework: How to Position for Triple Digit Growth
An Experimentation Framework: How to Position for Triple Digit GrowthOptimizely
 
From Idea to Business with Lean Startup & the Progress Board
From Idea to Business with Lean Startup & the Progress Board From Idea to Business with Lean Startup & the Progress Board
From Idea to Business with Lean Startup & the Progress Board Strategyzer
 
How to Prepare For a Product Manager Interview by Google PM
How to Prepare For a Product Manager Interview by Google PMHow to Prepare For a Product Manager Interview by Google PM
How to Prepare For a Product Manager Interview by Google PMProduct School
 
Product Development, a PM Perspective by Microsoft Product Leader
Product Development, a PM Perspective by Microsoft Product LeaderProduct Development, a PM Perspective by Microsoft Product Leader
Product Development, a PM Perspective by Microsoft Product LeaderProduct School
 
How to Use Machine Learning as a Product Manager by Wework PM
 How to Use Machine Learning as a Product Manager by Wework PM How to Use Machine Learning as a Product Manager by Wework PM
How to Use Machine Learning as a Product Manager by Wework PMProduct School
 
Use the Progress Board to Test your Business Ideas
Use the Progress Board to Test your Business IdeasUse the Progress Board to Test your Business Ideas
Use the Progress Board to Test your Business IdeasStrategyzer
 
How Experimentation Plays a Role in PM by Expedia Sr. PM
How Experimentation Plays a Role in PM by Expedia Sr. PMHow Experimentation Plays a Role in PM by Expedia Sr. PM
How Experimentation Plays a Role in PM by Expedia Sr. PMProduct School
 
Losing is the New Winning
Losing is the New WinningLosing is the New Winning
Losing is the New WinningOptimizely
 
How to Use Data to Build Better Products by HelloSociety PM
How to Use Data to Build Better Products by HelloSociety PMHow to Use Data to Build Better Products by HelloSociety PM
How to Use Data to Build Better Products by HelloSociety PMProduct School
 
Product Analytics is Useless by Heap CEO
Product Analytics is Useless by Heap CEOProduct Analytics is Useless by Heap CEO
Product Analytics is Useless by Heap CEOProduct School
 

What's hot (20)

7 Lessons From Building Search Engine Products by CB Insights PM
7 Lessons From Building Search Engine Products by CB Insights PM7 Lessons From Building Search Engine Products by CB Insights PM
7 Lessons From Building Search Engine Products by CB Insights PM
 
Product Management Metrics | Saeed Khan | ProductTank Toronto
Product Management Metrics | Saeed Khan | ProductTank Toronto Product Management Metrics | Saeed Khan | ProductTank Toronto
Product Management Metrics | Saeed Khan | ProductTank Toronto
 
Optimizely Workshop: Take Action on Results with Statistics
Optimizely Workshop: Take Action on Results with StatisticsOptimizely Workshop: Take Action on Results with Statistics
Optimizely Workshop: Take Action on Results with Statistics
 
Optimizely Experience Customer Story - Atlassian
Optimizely Experience Customer Story - AtlassianOptimizely Experience Customer Story - Atlassian
Optimizely Experience Customer Story - Atlassian
 
[CXL Live 16] A/B Testing Pitfalls: Getting Numbers is Easy; Getting Numbers ...
[CXL Live 16] A/B Testing Pitfalls: Getting Numbers is Easy; Getting Numbers ...[CXL Live 16] A/B Testing Pitfalls: Getting Numbers is Easy; Getting Numbers ...
[CXL Live 16] A/B Testing Pitfalls: Getting Numbers is Easy; Getting Numbers ...
 
Webinar: Experimentation & Product Management by Indeed Product Lead
Webinar: Experimentation & Product Management by Indeed Product LeadWebinar: Experimentation & Product Management by Indeed Product Lead
Webinar: Experimentation & Product Management by Indeed Product Lead
 
AI and ML for Product Management by Smartsheet Sr Dir of PM
AI and ML for Product Management by Smartsheet Sr Dir of PMAI and ML for Product Management by Smartsheet Sr Dir of PM
AI and ML for Product Management by Smartsheet Sr Dir of PM
 
Top 5 product development engineer interview questions with answers
Top 5 product development engineer interview questions with answersTop 5 product development engineer interview questions with answers
Top 5 product development engineer interview questions with answers
 
Succesful Product Strategy | Moe Ali | ProductTank Toronto
Succesful Product Strategy | Moe Ali | ProductTank TorontoSuccesful Product Strategy | Moe Ali | ProductTank Toronto
Succesful Product Strategy | Moe Ali | ProductTank Toronto
 
Product Manager 101: What Does A Product Manager Actually Do?
Product Manager 101: What Does A Product Manager Actually Do?Product Manager 101: What Does A Product Manager Actually Do?
Product Manager 101: What Does A Product Manager Actually Do?
 
An Experimentation Framework: How to Position for Triple Digit Growth
An Experimentation Framework: How to Position for Triple Digit GrowthAn Experimentation Framework: How to Position for Triple Digit Growth
An Experimentation Framework: How to Position for Triple Digit Growth
 
From Idea to Business with Lean Startup & the Progress Board
From Idea to Business with Lean Startup & the Progress Board From Idea to Business with Lean Startup & the Progress Board
From Idea to Business with Lean Startup & the Progress Board
 
How to Prepare For a Product Manager Interview by Google PM
How to Prepare For a Product Manager Interview by Google PMHow to Prepare For a Product Manager Interview by Google PM
How to Prepare For a Product Manager Interview by Google PM
 
Product Development, a PM Perspective by Microsoft Product Leader
Product Development, a PM Perspective by Microsoft Product LeaderProduct Development, a PM Perspective by Microsoft Product Leader
Product Development, a PM Perspective by Microsoft Product Leader
 
How to Use Machine Learning as a Product Manager by Wework PM
 How to Use Machine Learning as a Product Manager by Wework PM How to Use Machine Learning as a Product Manager by Wework PM
How to Use Machine Learning as a Product Manager by Wework PM
 
Use the Progress Board to Test your Business Ideas
Use the Progress Board to Test your Business IdeasUse the Progress Board to Test your Business Ideas
Use the Progress Board to Test your Business Ideas
 
How Experimentation Plays a Role in PM by Expedia Sr. PM
How Experimentation Plays a Role in PM by Expedia Sr. PMHow Experimentation Plays a Role in PM by Expedia Sr. PM
How Experimentation Plays a Role in PM by Expedia Sr. PM
 
Losing is the New Winning
Losing is the New WinningLosing is the New Winning
Losing is the New Winning
 
How to Use Data to Build Better Products by HelloSociety PM
How to Use Data to Build Better Products by HelloSociety PMHow to Use Data to Build Better Products by HelloSociety PM
How to Use Data to Build Better Products by HelloSociety PM
 
Product Analytics is Useless by Heap CEO
Product Analytics is Useless by Heap CEOProduct Analytics is Useless by Heap CEO
Product Analytics is Useless by Heap CEO
 

Similar to A/B Testing for Everyone

A/B testing AI - Global Artificial Intelligence Conference 2019
A/B testing AI - Global Artificial Intelligence Conference 2019A/B testing AI - Global Artificial Intelligence Conference 2019
A/B testing AI - Global Artificial Intelligence Conference 2019Pavel Dmitriev
 
Non-Sales Questions That Lead to Sales
Non-Sales Questions That Lead to SalesNon-Sales Questions That Lead to Sales
Non-Sales Questions That Lead to SalesMailerMailer
 
Marketing Strategy Hacks
Marketing Strategy HacksMarketing Strategy Hacks
Marketing Strategy HacksApril Dunford
 
Growth Hacking Conference '17 - Antwerp
Growth Hacking Conference '17 - AntwerpGrowth Hacking Conference '17 - Antwerp
Growth Hacking Conference '17 - AntwerpThibault Imbert
 
FAQ for the Predictive Testing of Opportunities
FAQ for the Predictive Testing of OpportunitiesFAQ for the Predictive Testing of Opportunities
FAQ for the Predictive Testing of OpportunitiesThe Inovo Group
 
Hacking Lead Gen - Tools, Resources, and Strategy
Hacking Lead Gen - Tools, Resources, and StrategyHacking Lead Gen - Tools, Resources, and Strategy
Hacking Lead Gen - Tools, Resources, and StrategySales Hacker
 
Can I Buy An Essay Now Oelbert Gymnasiumoelbert
Can I Buy An Essay Now Oelbert GymnasiumoelbertCan I Buy An Essay Now Oelbert Gymnasiumoelbert
Can I Buy An Essay Now Oelbert GymnasiumoelbertAngela Gibbs
 
CEOFlow Introduction To Cold Calling 2.0 102007
CEOFlow Introduction To Cold Calling 2.0 102007CEOFlow Introduction To Cold Calling 2.0 102007
CEOFlow Introduction To Cold Calling 2.0 102007Aaron Ross
 
Questions That Uncover Hidden Sales Opportunities
Questions That Uncover Hidden Sales OpportunitiesQuestions That Uncover Hidden Sales Opportunities
Questions That Uncover Hidden Sales OpportunitiesMailerMailer
 
Data Driven Product Management - ProductTank Boston Feb '14
Data Driven Product Management - ProductTank Boston Feb '14Data Driven Product Management - ProductTank Boston Feb '14
Data Driven Product Management - ProductTank Boston Feb '14Quantopian
 
Quality Assurance, Testing, And Implementation




A/B Testing for Everyone

  • 2. A/B Testing for Everyone Pavel Dmitriev Some slides taken from talks by Ronny Kohavi
  • 3. About Me • B.S. Applied Math @ Moscow State University, Russia • Ph.D. Computer Science @ Cornell University, focused on applied Machine Learning • 3 years @ Yahoo!, worked on web crawling and indexing optimization • 8 years @ Microsoft, worked on experimentation in Bing, MSN, O365, Skype, Windows • 5 months @ Outreach, working on experimentation, ML, NLP
  • 5. Outline • Intro to A/B testing • Examples of real experiments • Experimentation adoption across industries • Five challenges preventing faster adoption, in Sales and in Software
  • 6. The Life of a Great Idea – True Bing Story • Control – existing display • Treatment – new idea called Long Ad Titles
  • 7. The Life of a Great Idea • It was one of hundreds of ideas on the table, and it seemed …meh… • Stayed in the backlog in … (slide timeline: Feb, March, April, May, June) • Many features were above it; it was clear the idea was not going to make it any time soon • The engineer thought it was trivial to implement. He implemented it and started an A/B test. • Immediately an alert fired: the Revenue was abnormally high (usually indicates a bug) • But in this case there was no bug. The idea increased Bing’s revenue by 12% (over $100M/year), without hurting user experience metrics!
  • 8. We are bad at assessing the value of ideas • The best revenue generating idea in Bing history was badly rated and delayed for months! At Microsoft, we ran a study in Bing and found that only ~1/3 of ideas developed were actually good for users and business, ~1/3 were neutral, and ~1/3 were bad • Only in Software Engineering? In Sales, contradictory “best practices” are abundant. For example, the best day to contact the prospect is … In Medicine, correctly evaluating an idea, e.g. a new drug, is a matter of life and death. The FDA and EMA do not trust expert opinions and mandate the use of Randomized Controlled Trials • We can’t trust our gut! To make the right choices we need data from real users!
  • 10. Collecting Usage Data • Companies have always been collecting data to learn what their users appear to value. Interviews, focus groups, questionnaires, and other similar techniques are great at revealing what users say they do. Although rich with qualitative information, the learnings from these techniques are typically based on small samples and risk being biased, making it hard to generalize. • With the internet connectivity of the products, companies can collect feedback data to learn what their customers actually value. Telemetry and logging reveal what the customers actually do.
  • 11. Use Data Correctly - Correlation is not Causation • Seattle is known for its rain • Whenever I see people on the street carrying umbrellas, very soon it starts raining • I may conclude that umbrellas cause the rain, and decide to ban them • Banning umbrellas, however, won’t stop the rain; it will just make everyone more wet • Relying on correlations isn’t just neutral, it’s often harmful to the business! (Photo by Mike Waller, taken from Flickr)
  • 12. Correlation is not Causation – Real Example • You observe the churn rates for users using/not-using your feature: 25% of new users who do NOT use your feature churn (stop using the product 30 days later); only 10% of new users who use your feature churn • [Wrong] Conclusion: your feature reduces churn and is thus critical for retention. Flaw: the relationship between the feature and retention is correlational; the data above is insufficient for any causal conclusion • Example: Users who see error messages in Office 365 churn less. This does NOT mean we should show more error messages. They are just heavier users of Office 365
  • 13. Using Data Correctly – Before and After • Flaw: This approach misses time related factors such as external events, weekends, holidays, seasonality, etc. • [Chart: Amazon Kindle Sales for Website A and Website B, before-and-after example, annotated: Oprah calls Kindle "her new favorite thing"] • The new site (B) is always worse than the original (A), opposite of what observational data suggests
  • 14. A/B Tests in One Slide • Other names: Controlled Experiments, Randomized Clinical Trials (RCTs) • Can have more than two variants: A/B/C/etc. tests are common • Must run statistical tests to confirm differences are not due to chance • A/B Tests are the best scientific way to prove causality!
  • 15. Real Examples • Three experiments • Each had enough users for statistical validity • For each experiment I’ll tell you the success metric • Your job is to guess the result. Please stand up. You’ll choose between three options by raising your left hand, raising your right hand, or leaving both hands down. If you get it wrong, please sit down • Since there are 3 choices for each question, random guessing implies 100%/3^3 ≈ 4% will get all three questions right. Let’s see how much better than random you can do.
  • 16. Example 1: Outreach Email (Step 9, Day 7) • Success metric: Reply Rate • Left template: Hey {{first_name}}, In short, we're a sales automation platform that makes your reps life a lot easier. Our average companies (based on 1100+ companies) have tripled their reply rates on cold outbound emails and boosted rep productivity by 2x. We take what your best reps are doing and automate that across your entire team so your weaker reps can work at the highest possible same level. We also solve the issue of follow up falling through the cracks and reps not going deep enough. When can I get a few minutes on your calendar to discuss? {{sender.first_name}} • Right template: {{first_name}}, I'm sure in your role you get a ton of sales-driven emails, probably most of which are spam you have no interest in. My goal is to provide enough value to warrant a 15 minute call with you. What we do is put your sales process into a structured series of touch points which takes care of your follow-up process for you. This ramps up reps activities and ensures that every lead is thoroughly worked, never gets lost and receives the 5 to 12 touches where 80% of sales happen. Second, we do all the administrative work for in your CRM (Salesforce). This frees up your reps time, logs their activities, and gives you 100% accurate reporting. Finally, we open up the "Black Box" of sales and show you in real time how each rep is performing, what activities they're doing, and what is and isn't working. This provides a solid foundation to accurately forecast results, improve your outreach and train your team. Over 1100 companies (like CenturyLink, Adobe, and Marketo) use us and their average rep saves 2 hrs a day, and 2X's their productivity. If you see value here can we set up a time next Tuesday or Wednesday to discuss? {{sender.first_name}} • Left: shorter, more “salesy” • Right: longer, more “socially …” • Raise your left hand if you think the Left version wins (stat-sig) • Raise your right hand if you think the Right version wins (stat-sig) • Don’t raise your hand if they are about the same (no stat-sig difference)
  • 17. Example 1: Outreach Email (Step 9, Day 7) • (Same Left and Right templates as on slide 16.) • Left template has 70% higher reply rate… However, most replies are negative or unsubscribe requests. The right template has a higher positive reply rate. • If you did not raise your hand, sit down… • If you raised your right hand, sit down…
  • 18. Example 2: SERP Truncation • SERP is a Search Engine Result Page (shown on the right) • Success Metric: Clickthrough Rate on first SERP (ignore issues with click/back, page 2, etc.) • Version A: show 10 algorithmic results • Version B: show 8 algorithmic results by removing the last two results (shown on the right) • All else the same: task pane, ads, related searches • Raise your left hand if you think version A wins (10 results) • Raise your right hand if you think version B wins (8 results) • Don’t raise your hand if they are about the same
  • 19. Example 2: SERP Truncation • If you raised your left hand, sit down… • If you raised your right hand, sit down… • With over 3M users in each variant, we could not detect a stat-sig delta. Users simply shifted the clicks from the last two algorithmic results to other elements of the page. • Rule of Thumb: Shifting clicks is easy. Reducing abandonment is hard.
  • 20. Example 3: Windows Search Box • The search box in the lower left corner of the screen on Windows machines • Success metrics: more searches (and thus more Bing revenue) • Raise your left hand if you think the Left version wins • Raise your right hand if you think the Right version wins • Don’t raise your hand if they are about the same
  • 21. Example 3: Windows Search Box • If you did not raise your hand, sit down… • If you raised your left hand, sit down… • The four variants we actually tested, in order of performance, are: 1. Type here to search (winner) 2. What can I help you find? 3. Ask me anything (Control - the design that shipped with Windows 10) 4. Search the web and Windows (worst) • Stop guessing – get the data!
  • 23. Experimentation Adoption: Software Industry • http://www.exp-growth.com/ - survey to determine the state of experimentation maturity (Fabijan et al, ICSE 2017, SEAA 2018) • [Chart: State of Exp Growth; stages: Crawl, Walk, Run, Fly]
  • 24. Other industries? Let’s look at Sales • Most of Outreach’s ~2500 customers fall into the Crawl stage, with many not doing any A/B testing at all • Few sales organizations have a systematic experimentation program • Huge potential: some experiments we ran doubled reply rates!
  • 25. What are the reasons for low adoption in Sales? • A few facts about sales: Very traditional industry (some say the oldest profession on earth), slow to change. No formal education or degrees, considered entry level and pays low. Requires extreme mental toughness: you are constantly ignored and told no. You’ve got a monthly quota, and if you don’t meet it 3 months in a row – you are fired • There’s a fear of change: sales managers are afraid to try new ideas, fearing it may cause harm and result in missing their quota
  • 26. What are the reasons for low adoption in Sales? • Inadequate support for experimentation in sales tools, leading to most tests being invalid, and inability to confidently make decisions even on valid tests • No statistical testing • Any user can turn the variants on/off any time during the test • Any user can edit the email being tested any time during the test • Vast majority of the tests are broken (e.g. imbalance in deliveries)
  • 27. How to increase the adoption? • We need to make experimentation: Trustworthy – results are correct and easily understood; Safe – impact of testing bad ideas is limited; Easy to use – enable non-technical sales managers and executives to answer their questions • These are the same things I worked on trying to increase adoption of experimentation at Microsoft! Except… the bar is higher!!!
  • 28. Five Gaps between the needs of the Sales industry and the experimentation state of the art: 1. No open source trustworthy A/B testing solution 2. Difficult to come up with the right metrics 3. Small sample sizes 4. Difficult to understand results of statistical tests 5. Hard to translate business questions into experiment designs • Solving these issues will help accelerate experimentation adoption in Sales, Software, and other domains
  • 29. #1. Open Source A/B Testing Platform • Pretty much anything “platform” is open source, except A/B testing. Wasabi, the only option, is not maintained • Our http://exp-growth.com survey showed that most companies build their own platform from scratch (Fabijan et al, SEAA 2018). This is hard - a big investment few companies can afford. • [Chart: Type of Exp Platform; categories: internally developed platform, third party platform, no platform (manual coding of experiments); axis from 0% to 80%] • There’s a need for an easy to deploy and integrate open source A/B testing solution that is easy to use, supports several common experiment designs, and provides safety features
  • 30. #2. Determining the right metrics • How to judge the result of an A/B test? OEC = Overall Evaluation Criterion, or OMTM = One Metric That Matters: a single metric or a few key metrics with well-defined decision criteria • Two key properties: 1. Alignment with long-term company goals (directionality) 2. Ability to impact (sensitivity) • Finding a good OMTM is hard, in Sales and in Software Products. Simple metrics like Opens or Replies to sales emails are not predictive of future sales (fail directionality). Long-term metrics like Sales or Revenue take too long to measure - a typical sales cycle takes months - and are hard to impact via small changes like email content (fail sensitivity). Outreach solution – Positive Replies, where “positive” is determined via an ML classifier. See A/B Testing at Scale Tutorial for examples from Software industry
  • 31. #3. Small Sample Sizes • A typical 2-week A/B test for a mid-size Outreach customer will only have hundreds to thousands of data points in each variant. This translates to being able to detect only changes of ~20% or more • Solutions: Run bigger tests (at Outreach we recommend always running 50/50 tests). Select more sensitive metrics: a 20% increase in Revenue is hard, a 20% increase in Positive Replies is easier. Start by focusing on bigger changes rather than small tweaks; as the company grows and the volume of sales activity increases, you can focus on smaller and smaller changes. Implement smarter experiment designs (e.g. cross-over design) and analysis methods (e.g. CUPED)
  • 32. #4. Understanding Experiment Results • Standard way of evaluating experiments via Null Hypothesis Testing can be easily misinterpreted, leading to wrong conclusions. See Steve Goodman’s A Dirty Dozen for 12 ways to get it wrong. Can’t show p-values to sales reps, need an easier way to interpret results • Treatment effect may be different on different sub-populations. Results may vary depending on country, browser, location, prospect persona, sales step, etc. How to automatically detect and visualize such heterogeneous results?
  • 33. #4. Understanding Experiment Results • Each experiment needs to have clear success criteria, mapping unambiguously to positive/negative outcomes • Summarize results and learnings in an easy to understand visual way (Fabijan et al, SEAA 2018)
  • 34. #5. Answering Business Questions • Traditionally, A/B testing has been used to answer simple yes/no questions like: Does my new medicine help? Should I ship my new feature? Is my new email subject line better? • However, managers and execs think of bigger, more difficult questions: Does embedding videos in e-mails help? How urgently should sales reps reply to prospects? How much should I invest in improving performance of my site? • Using A/B testing to help answer these questions can greatly accelerate adoption of experimentation: Run a series of experiments on embedding video across all key scenarios. Run a series of experiments notifying users to reply with different delays across multiple scenarios. Run a series of “slowdown” experiments to estimate impact of performance on revenue • Need to develop design patterns for such “learning experiment series”
  • 35. Summary • We are bad at assessing the value of our ideas. Don’t trust experts – get the data! • A/B testing is the best scientific way to measure causal impact of your work on users and business • Experimentation adoption is growing in Software Industry, but very low in other industries like Sales • Five challenges slowing down the adoption: 1. No open source trustworthy A/B testing solution 2. Difficult to come up with the right metrics 3. Small sample sizes 4. Difficult to understand results of statistical tests 5. Hard to translate business questions into experiment designs • Solving these challenges will not only help Sales, it will accelerate experimentation adoption in Software and other industries, bringing experimentation to Everyone!
  • 36. Questions? Slides will be posted on my LinkedIn page: www.linkedin.com/in/paveldmitriev/
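
Three of the statistical points in the slides (slide 14: confirm a difference is not due to chance; slide 26: tests broken by imbalance in deliveries; slide 31: small samples only detect changes of ~20% or more) can be illustrated with a short sketch. This is not the method used at Outreach or Microsoft; it is a minimal textbook version that assumes a binary metric such as reply rate, a normal approximation, a 5% significance level and 80% power. All function names and numbers below are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the rate difference B minus A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_b - p_a) / se
    return z, 2.0 * (1.0 - normal_cdf(abs(z)))


def srm_p_value(n_a: int, n_b: int, expected_share_a: float = 0.5) -> float:
    """Chi-square (1 degree of freedom) p-value for a delivery imbalance.

    A very small value suggests the split itself is broken, so the
    metric comparison should not be trusted.
    """
    total = n_a + n_b
    exp_a = total * expected_share_a
    exp_b = total - exp_a
    chi2 = (n_a - exp_a) ** 2 / exp_a + (n_b - exp_b) ** 2 / exp_b
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def relative_mde(baseline_rate: float, n_per_variant: int,
                 z_alpha: float = 1.96, z_power: float = 0.84) -> float:
    """Approximate smallest relative lift detectable in a 50/50 test.

    Defaults correspond to a two-sided 5% test with 80% power.
    """
    se = math.sqrt(2.0 * baseline_rate * (1.0 - baseline_rate) / n_per_variant)
    return (z_alpha + z_power) * se / baseline_rate


if __name__ == "__main__":
    # Hypothetical data: 2000 emails per variant, 200 vs 250 replies.
    z, p = two_proportion_z_test(200, 2000, 250, 2000)
    print(f"z = {z:.2f}, p = {p:.4f}")
    # Hypothetical delivery counts that should have been 50/50.
    print(f"SRM p-value = {srm_p_value(1000, 1200):.6f}")
    # Hypothetical 10% baseline reply rate.
    print(f"relative MDE = {relative_mde(0.10, 2000):.1%}")
```

With an assumed 10% baseline and 2000 data points per variant, the detectable relative lift comes out in the range of a few tens of percent, which is the same order of magnitude as the ~20% figure on slide 31.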

Editor's Notes

  1. I wanted to enable experimentation for Sales. On the one hand, there already was A/B testing support in Outreach product. So looks like I had a head start. But on the other hand, few organizations were using it. Today I’m going to share with you what I learned while working on increasing adoption of A/B testing in Sales, what challenges we need to solve to do it, and how solving these challenges will actually help us grow adoption of experimentation in Software and other industries.
  2. References for similar observations made by many others: “Google ran approximately 12,000 randomized experiments in 2009, with [only] about 10 percent of these leading to business changes” was stated in Uncontrolled by Jim Manzi. “80% of the time you/we are wrong about what a customer wants” was stated in Experimentation and Testing: A Primer by Avinash Kaushik, author of Web Analytics and Web Analytics 2.0. QualPro tested 150,000 ideas over 22 years and founder Charles Holland stated, “75 percent of important business decisions and business improvement ideas either have no impact on performance or actually hurt performance…” in Breakthrough Business Results With MVT. At Amazon, half of the experiments failed to show improvement.
  3. What customers say they like during a study may be different from what they actually like in their daily life. Example: In user studies customers always prefer richer web pages (with more images/videos/carousels/etc.). In real life, when page load time and the speed of getting to the result are important, richer pages slow users down and introduce distractions from the task at hand, and often end up doing worse.
  4. It’s not enough to collect the data, we need to use it correctly
  5. At Microsoft experimentation is winning. When I left 5 months ago more than 20 large Microsoft products were actively running experiments.
  6. In the software industry at large, experimentation adoption is growing fast. More and more companies are adopting an “experiment with everything” culture. Experimentation is becoming the norm, with many successful companies like Netflix, Pinterest, Intuit, LinkedIn, etc. citing it as one of the key factors of their success. This peer pressure forces even the companies that do not have a good understanding of the benefits of experimentation to adopt it.
  7. In sales, adoption of experimentation is low.
  8.  Cultural issues
  9. Technical issues. Outreach had “state of the art” experimentation capabilities. Others are similar or worse. What these issues mean is that even if someone tries to run an experiment, they are very likely to fail to obtain correct actionable results, which in turn creates an even bigger barrier for them to try it again.
  10. Even though on the surface the sales industry is so different from Software Engineering, the issues with respect to experimentation adoption are the same. For example, while you can teach Microsoft engineers how statistical testing works, just give them p-values and confidence intervals, and reasonably trust they will interpret it all correctly, this doesn’t work for sales people. Sales people need a clear answer: which variant wins and why.
  11. You will see that, while I’m going to be using the example of Sales, these are again the challenges faced by the Software industry as well. By solving them, we can increase experimentation adoption not only in Sales but in Software and other industries.
  12. From hosting of services, to data collection and processing, to data analysis and real time machine learning and AI – there are open source solutions supporting all of that. It is a lot of work to develop a trustworthy A/B testing platform. Most companies end up with very basic, often untrustworthy solutions, like the one we had at Outreach. If we are to increase the adoption of experimentation in any industry, we need a quality open source solution.
  13. If we run an A/B test but do not have a clear way to determine success, the test is much less valuable.
  14. Often the reason for complexity is that the experimentation system does not know what the success criteria are, so it has to return everything and have the user understand and interpret the results. If, on the other hand, the experimentation system knows the precise success criteria, it can do the analysis automatically and just give the user the answer. Interpretation difficulties and shortcomings of NHST can be greatly reduced in this case.
  15. Translation of the video question: Should I pay for a video tool and train my team on it? It is executives and managers who decide on adopting an experimentation program. If A/B testing can help them answer the questions they really care about, it will greatly accelerate the adoption.
  16. End by 1:00
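
Note 14 describes an experimentation system that knows the success criterion up front and returns an answer rather than raw statistics. A minimal sketch of that idea follows, assuming the success metric is a simple conversion rate (for example, email reply rate) and using a standard two-sided two-proportion z-test. The function name `decide`, the `alpha` threshold, and the verdict strings are illustrative assumptions, not something taken from the talk or from the Outreach product.

```python
import math


def decide(control_successes, control_n, treatment_successes, treatment_n, alpha=0.05):
    """Return a plain verdict for one conversion-rate metric.

    Hypothetical helper: uses a two-sided two-proportion z-test
    (normal approximation), which is only reasonable for fairly
    large samples.
    """
    p_control = control_successes / control_n
    p_treatment = treatment_successes / treatment_n

    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (control_successes + treatment_successes) / (control_n + treatment_n)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / treatment_n))
    if std_err == 0:
        return "no clear winner"

    z = (p_treatment - p_control) / std_err
    # Two-sided p-value for a standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))

    if p_value >= alpha:
        return "no clear winner"
    return "treatment wins" if z > 0 else "control wins"


if __name__ == "__main__":
    # Made-up counts purely for illustration.
    print(decide(100, 1000, 150, 1000))
    print(decide(100, 1000, 102, 1000))
```

The user sees only the verdict, while the p-value stays internal. With the small samples typical in Sales (challenge #3), the normal approximation used here may not hold, so a real system would likely need a different test.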