Google’s infamous AB test: testing 41 variants of mildly different shades of blue
Agenda
Longitudinal or pre-post testing is difficult since little variance is explained by product features. Other factors
impacting conversion are:
Price
Weekend/Weekday
Seasonality
Source of Traffic
Availability
Mix of users (distribution bias)
Clarity of product thinking & avoiding snowballing of incorrect insights
Why was conversion for Android version 5.5.6 better than 5.5.5 for the first 3 days?
(Hint: Early adopter bias: users with stable Wi-Fi who are loyal to the MMT app convert at a higher rate than all users)
Why is AB Testing needed?
Introduction to AB testing
Choosing Alia Bhatt as brand ambassador
A recommended hotel on the top of the listing
Impact of a fix for latency
Increase sign-in rate by increasing the size of the login button
Impact of showing packing list as a notification a day before the flight date
Quiz: What can or cannot be AB tested
AB testing is for low-hanging fruit, not quantum leaps: for those, user testing,
interviews and FGDs (focus group discussions), as well as analysis of existing data, are better.
Choosing Alia Bhatt as brand ambassador: No
A recommended hotel on the top of the listing: Yes
Impact of a fix for latency: Yes
Increase sign-in rate by increasing the size of the login button: Yes
Impact of showing packing list as a notification a day before the flight date: Tough, but theoretically yes
Quiz: What can or cannot be AB tested
Key Stages of AB Testing
Hypothesis Definition
Metric Identification
Determining Size & Duration
Tooling & Distribution
Invariance Testing
Analyzing Results
Almost all AB experiment hypotheses should look something like below:
Eg. 1
H0 (Null/Control): A big login button will not impact user login percentage
H1 (Test): A big login button will significantly increase user login percentage
Eg. 2
H0 (Control): Putting higher user rating hotels at the top of the listing doesn’t change conversion
H1 (Test): Putting higher user rating hotels at the top of the listing changes conversion significantly
It is good to articulate the hypothesis you’re testing in simple English at the start of the experiment. The
hypothesis should be phrased in terms of the user, not in terms of the feature. It’s okay if you skip this too as long as
you get the idea.
Hypotheses Definition
Counts, eg.
#Shoppers
#Users buying
#Orders
Rates, eg.
Click through Rate
Search to Shopper Rate
Bounce Rate
Probability (a user completes a task), eg.
User Conversion in the funnel
Metric identification (1/2)
Consider the following metrics for conversion:
1. #Order/#Visits to listing page
2. #Visitors to TY Page/#Visitors to Listing Page
3. #Visits to TY Page/#Visits to listing page
4. #Orders/#PageViews of listing page
Metric identification (2/2): Quiz
1 2 3 4
User refreshes the listing page
User breaks the booking into 2
User’s TY page gets refreshed
User does a browser back and the page is served from cache
User drops off on details and comes back via drop-off
notification
Omniture is not firing properly on listing page
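The four candidate definitions can be compared on a toy event log. This is only a sketch: the (user, visit, page) schema, the sample events and the order count are made up for illustration. It shows how a single listing-page refresh moves metric 4 while leaving the visit-based and visitor-based metrics alone.

```python
# Toy event log: (user_id, visit_id, page). Schema and data are hypothetical.
EVENTS = [
    ("u1", "v1", "listing"),
    ("u1", "v1", "listing"),  # u1 refreshes the listing page
    ("u1", "v1", "ty"),       # u1 reaches the thank-you (TY) page
    ("u2", "v2", "listing"),
    ("u3", "v3", "listing"),
    ("u3", "v4", "listing"),  # u3 comes back in a second visit
    ("u3", "v4", "ty"),
]
ORDERS = 2  # assumed: one order each for u1 and u3


def conversion_metrics(events, orders):
    """Compute the four candidate conversion metrics from the slide."""
    listing = [e for e in events if e[2] == "listing"]
    ty = [e for e in events if e[2] == "ty"]
    listing_visits = {visit for _, visit, _ in listing}
    listing_visitors = {user for user, _, _ in listing}
    ty_visits = {visit for _, visit, _ in ty}
    ty_visitors = {user for user, _, _ in ty}
    return {
        "1_orders_per_listing_visit": orders / len(listing_visits),
        "2_ty_visitors_per_listing_visitor": len(ty_visitors) / len(listing_visitors),
        "3_ty_visits_per_listing_visit": len(ty_visits) / len(listing_visits),
        "4_orders_per_listing_pageview": orders / len(listing),
    }


if __name__ == "__main__":
    for name, value in conversion_metrics(EVENTS, ORDERS).items():
        print(f"{name}: {value:.3f}")
```

Dropping the duplicate listing row for u1 changes only metric 4, which is the kind of sensitivity the quiz is probing.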
1. If showing a summary of hotel USPs on the details page is improving conversion?
2. If a user who purchased with MMT will come back again?
3. If we are sending too many or too few notifications to users?
How can you measure?
1. If showing a summary of hotel USPs on the details page is improving conversion?
A. A simple A/B set-up with and without the feature will help in evaluation
2. If a user who purchased with MMT will come back again?
A. A secondary metric, captured by asking buyers this question or through an NPS survey and comparing results,
should give some idea
3. If we are sending too many or too few notifications to users?
A. An indirect metric, measured as retained users on the app across the two variants
How can you measure?
Size & Duration
Reality            Test Output        Probability
Control is better  Control is better  1-α (confidence level)
Control is better  Test is better     α (significance level)
Test is better     Test is better     1-β (power)
Test is better     Control is better  β
α, the type-I error rate, is the probability of rejecting the null when it is true (Downside Error)
β, the type-II error rate, is the probability of accepting the null when the test is actually better (Opportunity Cost Error)
Typical targets for a test are α = 5% and 1-β = 80%
Size & Duration
Size:
• To figure out the sample size required to reach 80% power for the test, use a sample size calculator
• This many users need to be exposed to the smallest test variant being examined
Duration:
• Duration is an outcome of what % of traffic you can direct to the test, plus some minimum-duration considerations
• You might want to limit the % exposure of the experiment due to:
• Revenue impacts
• Leaving room for other people to experiment
• Even if the sample size for the required power can be reached in a shorter duration, it is good to reduce the exposure of
the experiment so that it runs long enough to include:
• At least 1 weekend and 1 set of weekdays
• Low & high discounting periods (if possible)
• Low & high availability periods (if possible)
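The size and duration steps above can be sketched in a few lines. This uses the standard normal-approximation formula for comparing two proportions, which is an assumption here: the calculator the slides point to may use a different formula and give somewhat different numbers. The function names and the traffic figures in the usage lines are made up for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(p_base, min_effect, alpha=0.05, power=0.80):
    """Users needed in EACH variant to detect an absolute lift of min_effect.

    p_base: baseline conversion rate; min_effect: practical significance
    threshold as an absolute difference (e.g. 0.02 for +2 points).
    """
    p_test = p_base + min_effect
    p_bar = (p_base + p_test) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_base * (1 - p_base) + p_test * (1 - p_test))
    ) ** 2
    return ceil(numerator / min_effect ** 2)


def duration_days(n_per_variant, variants, daily_users, exposure):
    """Days needed when only a share `exposure` (0-1) of daily users enters the test."""
    return ceil(n_per_variant * variants / (daily_users * exposure))


if __name__ == "__main__":
    n = sample_size_per_variant(p_base=0.10, min_effect=0.02)
    days = duration_days(n, variants=2, daily_users=5000, exposure=0.2)
    # Run for at least a full week even when `days` comes out smaller.
    print(n, max(days, 7))
```

Halving the exposure doubles the duration, so the traffic share and the minimum-duration rules have to be decided together.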
No Peeking
• It is important not to compromise the test (reduced power, inflated false positives) by changing decisions on insufficient data
• Best explained in the blog. The primary idea is that taking duration cues from early data introduces human error into
the measurement
• In case the sample size turns out to be very high, a few ways to reduce it are:
• Use a sequential sampling approach (reduces size by as much as 50% in some scenarios)
• Use a Bayesian sampling approach (mathematically intensive)
• Try matching the lowest unit of measurement with the lowest unit of distribution (e.g. instead of measuring
latency per user, measure latency per hit and distribute the experiment by hit)
• Try moving the experiment allocation closer to the step where there is an actual change (e.g. assign a payment
experiment to payment page users)
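Why peeking hurts can be seen in a small simulation. The sketch below runs many A/A experiments (both arms share the same true conversion rate, so every "significant" result is a false positive) and compares two policies: testing once at the planned sample size, and stopping at the first interim look that crosses the threshold. The number of looks, the sample sizes and the 10% conversion rate are arbitrary assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided alpha = 5%


def z_stat(conv_a, conv_b, n):
    """Pooled two-proportion z statistic for two arms of n users each."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = sqrt(2 * pooled * (1 - pooled) / n)
    return 0.0 if se == 0 else (conv_b - conv_a) / n / se


def false_positive_rates(sims=200, n=1000, looks=10, p=0.10, seed=7):
    """Return (fixed-horizon rate, peeking rate) over simulated A/A tests."""
    rng = random.Random(seed)
    step = n // looks
    fixed_hits = peeking_hits = 0
    for _ in range(sims):
        conv_a = conv_b = 0
        significant_at_any_look = False
        for i in range(1, n + 1):
            conv_a += rng.random() < p
            conv_b += rng.random() < p
            # Interim look every `step` users; the last look is at i == n.
            if i % step == 0 and abs(z_stat(conv_a, conv_b, i)) > Z_CRIT:
                significant_at_any_look = True
        fixed_hits += abs(z_stat(conv_a, conv_b, n)) > Z_CRIT
        peeking_hits += significant_at_any_look
    return fixed_hits / sims, peeking_hits / sims


if __name__ == "__main__":
    fixed, peeking = false_positive_rates()
    print(f"fixed horizon: {fixed:.3f}  stop at first significant look: {peeking:.3f}")
```

Because the final look is one of the interim looks, the peeking rate can never be lower than the fixed-horizon rate; in practice it is usually several times higher.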
Distribution Metric
1. Page Views
2. Cookies
3. Login-ID
4. Device ID
5. IP Address
Tooling & Distribution (1/2)
Which of 1-5 will not be hampered by the following?
User shortlists 2-3 hotels and comes back after a day
User starts search on mobile and books on desktop
User changes browsers on the machine
User logs out and continues with another ID
Typical requirements for an AB system are:
Each experiment should support multiple variants (A/B/C..) and each variant can be defined using a combination of
experiment variables
Each user is randomly assigned a variant (as per the distribution percentage). The system ensures users are served a
consistent experience based on their device ID or cookie (other distribution parameters like page view or visit might be
used, but cookie/device ID is the most stable)
The system auto-logs the variant that each user is exposed to in an analytics system
There are multiple AB testing systems available from several vendors, or one can be created internally using a tag
manager like Google Tag Manager
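A minimal sketch of the assignment and logging requirements above, assuming a hash of the experiment name plus the device ID or cookie is used for bucketing. The function names, the experiment name and the in-memory `sink` list are hypothetical stand-ins, not part of any particular vendor's system.

```python
import hashlib


def assign_variant(unit_id, experiment, weights):
    """Map a device ID or cookie to a variant, stable across requests.

    weights: dict of variant name -> share of traffic, summing to 1.0.
    """
    digest = hashlib.sha256(f"{experiment}:{unit_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform in [0, 1)
    cumulative = 0.0
    for variant, share in weights.items():
        cumulative += share
        if bucket < cumulative:
            return variant
    return variant  # guard against float rounding in the weights


def log_exposure(unit_id, experiment, variant, sink):
    """Record the exposure; `sink` stands in for an analytics system."""
    sink.append({"unit": unit_id, "experiment": experiment, "variant": variant})


if __name__ == "__main__":
    exposures = []
    weights = {"A": 0.5, "B": 0.25, "C": 0.25}
    for device in ("device-1", "device-2", "device-1"):
        variant = assign_variant(device, "login_button_size", weights)
        log_exposure(device, "login_button_size", variant, exposures)
    print(exposures)
```

Including the experiment name in the hash keeps assignments independent across experiments, so the same devices do not always land in the same arm.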
Tooling & Distribution (2/2)
A/A Testing:
Ideally, it is good to run one or more A/A tests on the same metric you’re planning to measure in the A/B test, both before
and after your test period
Even if the above is not feasible, do try to run A/A tests regularly to test the underlying system
Things to test during A/A tests:
Key metrics you measure (like conversion, counts, page views, etc.) and their statistical difference between the
two cohorts at different ratios of test & control
A/A & Invariance Testing
Invariance Testing
Identify invariance metrics: metrics that should not change between control & experiment
One of the basic invariants is the count of users assigned to each group. It is very
important to test this
Each of the invariants should be within statistical bounds between experiment and control
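The user-count invariant can be checked with a normal approximation to the binomial: given the intended split, the observed share of users in control should fall inside a confidence band. This is a sketch under that assumption; the function name and the counts in the usage lines are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def assignment_count_check(n_control, n_test, expected_control_share=0.5, alpha=0.05):
    """Return (passed, (lower, observed, upper)) for the control share of users."""
    total = n_control + n_test
    se = sqrt(expected_control_share * (1 - expected_control_share) / total)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower = expected_control_share - z * se
    upper = expected_control_share + z * se
    observed = n_control / total
    return lower <= observed <= upper, (lower, observed, upper)


if __name__ == "__main__":
    print(assignment_count_check(50_100, 49_900))  # small imbalance
    print(assignment_count_check(51_000, 49_000))  # larger imbalance
```

A failed check points to a problem in assignment or logging, so the experiment's results should not be analyzed until the cause is found.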
A/A & Invariance Testing
1. Remember the practical significance threshold used in the sample size calculator. That is
the smallest change we care about, so a statistically significant change smaller than the practical significance
threshold is useless.
2. Choose the distribution & test:
1. Counts: Poisson distribution or Poisson mean
2. Rates: Poisson distribution or Poisson mean
3. Click-through probability: binomial distribution & t-test (or chi-square test).
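For a click-through or conversion probability, the comparison can be sketched with a pooled two-proportion z-test plus a confidence interval on the difference. Note the swap: this is a z-test rather than the t-test named above (on a 2x2 table it matches the chi-square test without continuity correction, and at typical A/B sample sizes the t and z results are nearly identical). The function name and the counts are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def compare_proportions(conv_c, n_c, conv_t, n_t, alpha=0.05):
    """Two-sided test of test-vs-control conversion, with a CI on the difference."""
    p_c, p_t = conv_c / n_c, conv_t / n_t
    diff = p_t - p_c
    # Pooled standard error for the hypothesis test (H0: no difference).
    pooled = (conv_c + conv_t) / (n_c + n_t)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Unpooled standard error for the confidence interval.
    se_unpooled = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    margin = NormalDist().inv_cdf(1 - alpha / 2) * se_unpooled
    return {"diff": diff, "p_value": p_value, "ci": (diff - margin, diff + margin)}


if __name__ == "__main__":
    result = compare_proportions(conv_c=1000, n_c=10_000, conv_t=1150, n_t=10_000)
    print(result)
```

The interval, not just the p-value, is what gets compared against the practical significance threshold.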
Analyzing Results (1/3)
Analyzing Results (2/3): Taking Decision
[Figure: cases to classify as Launch, Don’t Launch, or Keep Testing]
[Figure answers: Yes (Launch); No, Keep Testing; No, Don’t Launch; No, Keep Testing]
Analyzing Results (3/3): Taking Decision
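One common way to turn a result into a decision is to compare the confidence interval of the difference (test minus control) with the practical significance threshold. The rule below is an assumption for illustration and is not necessarily the exact rule drawn on the slide; `decide` and `d_min` are hypothetical names.

```python
def decide(ci_low, ci_high, d_min):
    """Classify a result from the CI of (test - control) and a threshold d_min > 0."""
    if ci_low >= d_min:
        return "Launch"        # whole interval clears the practical threshold
    if ci_high < d_min:
        return "Don't Launch"  # even the optimistic end is too small to matter
    return "Keep Testing"      # interval straddles d_min: inconclusive


if __name__ == "__main__":
    print(decide(0.020, 0.030, d_min=0.01))
    print(decide(-0.005, 0.005, d_min=0.01))
    print(decide(0.006, 0.024, d_min=0.01))
```

The third case is statistically significant (the interval excludes zero) yet still inconclusive, because the true lift may be below the smallest change worth shipping.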
A/B/C Setup
A particular type of experiment set-up that is beneficial where there might be server-side & client-side effects that
introduce bias. A few examples:
Measure impact of persuasion shown (say last room left)
User might be positively impacted to convert higher, v/s
Higher latency to fetch persuasion might reduce conversion
Showing a message “Cheaper than Rajdhani” on flights > 75 mins duration and fare <3000
User might be positively impacted to convert, v/s
Conversion for cheaper flight (<3000) is generally higher
Showing a USP of the hotel generated from user reviews, eg. guests love this because: “great neighborhood to
stay”
User might be positively impacted to convert, v/s
Feature might only be visible on hotels with > X reviews (and hence bookings). There is an innate hotel bias.
In these scenarios, it is best to setup 3 variants:
A= Feature Off or Control
B= Feature On but not shown to users
C= Feature on and shown to users.
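With the three variants in place, the total lift splits into two parts: B vs A isolates the side effect of turning the feature on (for example, added latency), and C vs B isolates what seeing the feature does to users. A minimal sketch with made-up conversion rates; each difference would still need its own significance test.

```python
def decompose_abc(conv_a, conv_b, conv_c):
    """Split the total lift (C - A) into side effect and user-facing effect.

    conv_a: feature off; conv_b: feature on but hidden; conv_c: feature on and shown.
    """
    side_effect = conv_b - conv_a     # e.g. latency added by fetching the data
    display_effect = conv_c - conv_b  # what seeing the feature does to users
    return {
        "side_effect": side_effect,
        "display_effect": display_effect,
        "total": conv_c - conv_a,
    }


if __name__ == "__main__":
    # Hypothetical conversion rates for A, B and C.
    print(decompose_abc(conv_a=0.100, conv_b=0.097, conv_c=0.105))
```

A plain A vs C comparison would report only the total, hiding the case where a helpful message is partly cancelled out by the cost of fetching it.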
A/B/C Setup
AB testing in an organization typically goes through the following stages:
I would encourage you all to help your organization move to the
next stage in the AB testing journey
Best to be in a state where the company culture supports quick prototyping and testing with real users
Solving for multi device (stitching sessions) and other tracking limitations in the set-up
Higher standards of experiment analysis and responsible reporting
Things to Improve
Stages (from the slide diagram): Sanity Checks; Testing for conflict resolution; Testing for impact measurement; Testing for hypothesis; Rapid prototyping & testing
Definitely read the Evan Miller blog. It basically summarizes everything you need to know.
If you are keen to go into more detail on techniques and best practices, take the course on Udacity. Just doing the first chapter
would be good enough
Further Reading

Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 

Basics of AB testing in online products

  • 1.
  • 2. Google’s infamous AB test: testing 41 variants of mildly different shades of blue
  • 4. Why is AB Testing needed?
      • Longitudinal or pre-post testing is difficult because product features explain little of the variance in conversion. Other factors impacting conversion are: price, weekend/weekday, seasonality, source of traffic, availability, and mix of users (distribution bias).
      • Clarity of product thinking and avoiding the snowballing of incorrect insights.
      • Why was conversion for Android version 5.5.6 better than 5.5.5 for the first 3 days? (Hint: early-adopter bias. Users with stable wifi who are loyal to the MMT app convert higher than all users.)
  • 6. Quiz: What can or cannot be AB tested?
      • Choosing Alia Bhatt as brand ambassador
      • A recommended hotel at the top of the listing
      • Impact of a fix for latency
      • Increasing the sign-in rate by increasing the size of the login button
      • Impact of showing a packing list as a notification a day before the flight date
      AB testing is for low-hanging fruit, not quantum leaps: for those, user testing, interviews and FGDs, as well as analysis of existing data, are better.
  • 7. Quiz: What can or cannot be AB tested? (answers)
      • Choosing Alia Bhatt as brand ambassador: No
      • A recommended hotel at the top of the listing: Yes
      • Impact of a fix for latency: Yes
      • Increasing the sign-in rate by increasing the size of the login button: Yes
      • Impact of showing a packing list as a notification a day before the flight date: Tough, but theoretically yes
      AB testing is for low-hanging fruit, not quantum leaps: for those, user testing, interviews and FGDs, as well as analysis of existing data, are better.
  • 8. Key Stages of AB Testing: Hypothesis Definition, Metric Identification, Determining Size & Duration, Tooling & Distribution, Invariance Testing, Analyzing Results
  • 9. Hypothesis Definition
      Almost all AB experiment hypotheses should look something like the examples below.
      E.g. 1
      H0 (Null/Control): A big login button will not impact user login percentage
      H1 (Test): A big login button will significantly increase user login percentage
      E.g. 2
      H0 (Null/Control): Putting hotels with higher user ratings at the top of the listing doesn’t change conversion
      H1 (Test): Putting hotels with higher user ratings at the top of the listing changes conversion significantly
      It is good to articulate the hypothesis you’re testing in simple English at the start of the experiment. The hypothesis should be worded in terms of the user, not in terms of the feature. It’s okay to skip this too, as long as you get the idea.
  • 10. Metric identification (1/2)
      • Counts, e.g. #Shoppers, #Users buying, #Orders
      • Rates, e.g. click-through rate, search-to-shopper rate, bounce rate
      • Probability (that a user completes a task), e.g. user conversion in the funnel
  • 11. Metric identification (2/2): Quiz
      Consider the following metrics for conversion:
      1. #Orders/#Visits to listing page
      2. #Visitors to TY page/#Visitors to listing page
      3. #Visits to TY page/#Visits to listing page
      4. #Orders/#PageViews of listing page
      Check each of metrics 1 to 4 against the following scenarios:
      • User refreshes the listing page
      • User breaks the booking into 2
      • User’s TY page gets refreshed
      • User does a browser back and the page is served from cache
      • User drops off on details and comes back via a drop-off notification
      • Omniture is not firing properly on the listing page
  • 12. How can you measure?
      1. If showing a summary of hotel USPs on the details page is improving conversion?
      2. If a user who purchased with MMT will come back again?
      3. If we are sending too many or too few notifications to users?
  • 13. How can you measure? (answers)
      1. If showing a summary of hotel USPs on the details page is improving conversion? A. A simple A/B set-up with and without the feature will help in the evaluation.
      2. If a user who purchased with MMT will come back again? A. A secondary metric, captured by asking buyers this question or by running an NPS survey and comparing the results, should give some idea.
      3. If we are sending too many or too few notifications to users? A. An indirect metric, measured as retained users on the app across the two variants.
  • 14. Size & Duration
      Reality | Test output | Error
      Control is better | Control is better | 1-α (confidence level)
      Control is better | Test is better | α (significance)
      Test is better | Test is better | 1-β (power)
      Test is better | Control is better | β
      α, or type-I error, is the probability of rejecting the null when it is true (downside error).
      β, or type-II error, is the probability of accepting the null when the test is actually better (opportunity cost error).
      Target values for testing significance are α = 5% and 1-β = 80%.
  • 15. Size & Duration
      Size:
      • To figure out the sample size required to get 80% power for the test, use a sample size calculator.
      • This many users must be exposed to the smallest test variant being examined.
      Duration:
      • Is an outcome of what % of traffic you can direct to the test, plus some minimum duration considerations.
      • You might want to limit the percentage exposure of the experiment due to:
          • Revenue impacts
          • Leaving room for other people to experiment
      • Even if the sample size for the required power can be reached in a shorter duration, it is good to reduce the exposure of the experiment so that the run includes:
          • At least 1 weekend and weekdays
          • Low & high discounting periods (if possible)
          • Low & high availability periods (if possible)
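As a rough illustration of the size and duration reasoning above, here is a minimal sketch using the standard two-proportion sample size formula. It is not the calculator the slide refers to; the function names, the 7-day minimum (standing in for "at least 1 weekend and weekdays") and the example traffic numbers are all assumptions made for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, min_lift, alpha=0.05, power=0.80):
    """Users needed in EACH variant for a two-sided two-proportion z-test.

    baseline_rate: conversion of control, e.g. 0.05
    min_lift: smallest absolute change we care about, e.g. 0.01
    """
    p1 = baseline_rate
    p2 = baseline_rate + min_lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance term
    z_beta = NormalDist().inv_cdf(power)           # power term
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def duration_days(users_per_variant, daily_users, exposure_per_variant, min_days=7):
    """Days to run when each variant gets `exposure_per_variant` of daily traffic.

    min_days is an assumed floor so the run covers weekdays and a weekend.
    """
    days_for_size = ceil(users_per_variant / (daily_users * exposure_per_variant))
    return max(min_days, days_for_size)


if __name__ == "__main__":
    n = sample_size_per_variant(baseline_rate=0.05, min_lift=0.01)
    print("users per variant:", n)
    print("days:", duration_days(n, daily_users=20000, exposure_per_variant=0.10))
```

Halving the smallest lift you care about roughly quadruples the required sample, which is why the practical significance threshold is worth agreeing on before the experiment starts.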
  • 16. No Peeking
      • It is important not to undermine the test by taking a decision on insufficient data.
      • Best explained in the blog. The primary idea is that taking duration cues from early data introduces human error into the measurement.
      • In case the sample size turns out to be very high, a few ways to reduce it are:
          • Use a sequential sampling approach (reduces the size by as much as 50% in some scenarios)
          • Use a Bayesian sampling approach (mathematically intensive)
          • Try matching the lowest unit of measurement with the lowest unit of distribution (e.g. instead of measuring latency per user, measure latency per hit and distribute the experiment on hits)
          • Try moving the experiment allocation closer to the step where there is an actual change (e.g. assign a payment experiment to payment page users)
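The effect of peeking can be seen in a small simulation. The sketch below is an illustration only, not taken from the slides: it runs A/A experiments (both arms share the same true conversion rate, so every "significant" result is a false positive) and compares a team that checks the result at regular intervals and stops at the first significant reading with a team that looks only once at the end. All parameter values are assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist


def z_stat(conv_a, conv_b, n):
    """Two-proportion z statistic for two groups of equal size n."""
    pooled = (conv_a + conv_b) / (2 * n)
    if pooled in (0, 1):
        return 0.0  # no variance yet, nothing to test
    se = sqrt(2 * pooled * (1 - pooled) / n)
    return (conv_b / n - conv_a / n) / se


def false_positive_rates(n_sims=300, n_users=2000, n_peeks=20,
                         rate=0.10, alpha=0.05, seed=7):
    """Return (false positive rate with peeking, rate with one final look)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    step = n_users // n_peeks
    peeking_hits = fixed_hits = 0
    for _ in range(n_sims):
        conv_a = conv_b = 0
        crossed_at_some_peek = False
        for i in range(1, n_users + 1):
            conv_a += rng.random() < rate  # same true rate in both arms
            conv_b += rng.random() < rate
            if i % step == 0 and abs(z_stat(conv_a, conv_b, i)) > z_crit:
                crossed_at_some_peek = True  # a peeker would have stopped here
        peeking_hits += crossed_at_some_peek
        fixed_hits += abs(z_stat(conv_a, conv_b, n_users)) > z_crit
    return peeking_hits / n_sims, fixed_hits / n_sims


if __name__ == "__main__":
    peeking, fixed = false_positive_rates()
    print(f"false positives with peeking: {peeking:.2%}, single look: {fixed:.2%}")
```

The single-look rate should sit near the chosen α, while the peeking rate is typically several times higher, which is the reason for fixing the sample size before starting.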
  • 17. Tooling & Distribution (1/2)
      Distribution metric:
      1. Page views
      2. Cookies
      3. Login ID
      4. Device ID
      5. IP address
      Which of these (1 to 5) will not be hampered by each of the following?
      • User shortlists 2-3 hotels and comes back after a day
      • User starts a search on mobile and books on desktop
      • User changes browsers on the machine
      • User logs out and continues with another ID
  • 18. Tooling & Distribution (2/2)
      Typical requirements for an AB system are:
      • Each experiment should support multiple variants (A/B/C...), and each variant can be defined using a combination of experiment variables.
      • Each user is randomly assigned a variant (as per the distribution percentage).
      • The system ensures users are served a consistent experience based on their device ID or cookie (other distribution parameters like page view or visit might be used, but cookie/device ID is the most stable).
      • It auto-logs the variant that users are being exposed to in an analytics system.
      There are multiple AB testing systems available from several vendors, or one can easily be created internally using a tag manager such as Google Tag Manager.
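One common way to meet the "random but consistent" requirement above is to hash the distribution unit (cookie or device ID) together with the experiment name. The sketch below assumes this hashing approach; the slides do not say how any particular system does it, and the function name, experiment name and weights are hypothetical.

```python
import hashlib


def assign_variant(unit_id, experiment, weights):
    """Deterministically map a cookie/device ID to a variant.

    weights: dict of variant name -> percentage share, summing to 100.
    The same (unit_id, experiment) pair always gets the same variant,
    and different experiments are bucketed independently.
    """
    digest = hashlib.sha256(f"{experiment}:{unit_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10000  # 0..9999, i.e. 0.01% granularity
    cumulative = 0
    for variant, share in weights.items():
        cumulative += share * 100
        if bucket < cumulative:
            return variant
    raise ValueError("weights must sum to 100")


if __name__ == "__main__":
    weights = {"A": 50, "B": 25, "C": 25}  # hypothetical split
    variant = assign_variant("device-123", "big_login_button", weights)
    # In a real system this exposure would also be logged to analytics.
    print("device-123 ->", variant)
```

Because the assignment is a pure function of the ID, no lookup table is needed to keep a returning user in the same variant, though it inherits the weaknesses of the chosen ID (cleared cookies, multiple devices).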
  • 19. A/A & Invariance Testing
      A/A Testing:
      • Ideally, run one or more A/A tests, before and after your test period, on the same metric you plan to measure in the A/B test.
      • Even if the above is not feasible, do try to run A/A tests regularly to test the underlying system.
      • Things to test during A/A tests: the key metrics you measure (like conversion, counts, page views, etc.) and their statistical difference between the two cohorts at different ratios of test & control.
  • 20. A/A & Invariance Testing
      Invariance Testing:
      • Identify invariance metrics: metrics that should not change between control & experiment.
      • One of the basic invariants is the count of users assigned to each group. It is very important to test this.
      • Each of the invariants should be within statistical bounds between the experiment and control groups.
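The user-count invariant can be checked with a simple test of the observed split against the intended split. This is a minimal sketch using a normal approximation to the binomial; the function name and the example counts are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def sample_ratio_check(n_control, n_test, expected_control_share=0.5, alpha=0.05):
    """Check whether the control/test user counts match the intended split.

    Returns (z, p_value, ok). ok is False when the split deviates more than
    chance would explain, which points at a bug in assignment or logging.
    """
    total = n_control + n_test
    observed = n_control / total
    se = sqrt(expected_control_share * (1 - expected_control_share) / total)
    z = (observed - expected_control_share) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value >= alpha


if __name__ == "__main__":
    for control, test in [(5020, 4980), (5000, 5400)]:
        z, p, ok = sample_ratio_check(control, test)
        print(f"{control} vs {test}: z={z:.2f}, p={p:.4f}, invariant holds: {ok}")
```

If this check fails, the experiment metrics should not be trusted until the cause of the imbalance is found.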
  • 21. Analyzing Results (1/3)
      1. Remember the practical significance threshold used in the sample size calculator. That is the smallest change we care about, so a statistically significant change smaller than the practical significance threshold is useless.
      2. Choose the distribution & test:
          1. Counts: Poisson distribution or Poisson mean
          2. Rates: Poisson distribution or Poisson mean
          3. Click-through probability: binomial distribution & t-test (or chi-square test)
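For the click-through-probability case, the two checks above (statistical and practical significance) can be sketched as follows. This is an illustrative example of the chi-square option, written with the standard library only; the function names and numbers are assumptions, and it assumes every cell of the 2x2 table has a non-zero expected count.

```python
from math import erfc, sqrt


def chi_square_2x2(conv_a, n_a, conv_b, n_b):
    """Pearson chi-square test on a 2x2 table (1 degree of freedom,
    no continuity correction). Returns (chi2, p_value)."""
    table = [[conv_a, n_a - conv_a], [conv_b, n_b - conv_b]]
    row_totals = [n_a, n_b]
    total = n_a + n_b
    col_totals = [conv_a + conv_b, total - conv_a - conv_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of chi-square with 1 d.o.f.
    return chi2, erfc(sqrt(chi2 / 2))


def evaluate(conv_a, n_a, conv_b, n_b, practical_threshold, alpha=0.05):
    """Combine statistical significance with the practical threshold."""
    lift = conv_b / n_b - conv_a / n_a
    _, p_value = chi_square_2x2(conv_a, n_a, conv_b, n_b)
    significant = p_value < alpha
    return {
        "lift": lift,
        "p_value": p_value,
        "statistically_significant": significant,
        "practically_significant": significant and abs(lift) >= practical_threshold,
    }


if __name__ == "__main__":
    # Hypothetical counts: control 500/10000, test 600/10000
    print(evaluate(500, 10000, 600, 10000, practical_threshold=0.02))
```

In the hypothetical example the lift is statistically significant but below a 2-point practical threshold, which is the case the slide calls useless.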
  • 22. Analyzing Results (2/3): Taking Decision Launch Don’t Launch or Keep Testing
  • 23. Analyzing Results (2/3): Taking Decision [decision chart; surviving labels: Launch, Don’t Launch, Keep Testing, Yes, No]
  • 24. Analyzing Results (3/3): Taking Decision
  • 25. A/B/C Setup
      A particular type of experiment set-up that is beneficial where there might be server- and client-side effects that introduce bias. A few examples:
      • Measuring the impact of a persuasion shown (say, "last room left"): the user might be positively impacted to convert higher, versus higher latency to fetch the persuasion might reduce conversion.
      • Showing a message “Cheaper than Rajdhani” on flights with > 75 mins duration and fare < 3000: the user might be positively impacted to convert, versus conversion for cheaper flights (< 3000) is generally higher.
      • Showing a USP of the hotel generated from user reviews, e.g. guests love this because: “great neighborhood to stay”: the user might be positively impacted to convert, versus the feature might only be visible on hotels with > X reviews (and hence bookings), so there is an innate hotel bias.
      In these scenarios, it is best to set up 3 variants:
      A = Feature off (control)
      B = Feature on but not shown to users
      C = Feature on and shown to users
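With the three variants in place, the two competing effects can be separated by simple differences. The sketch below is only an illustration of that arithmetic; the function name and conversion rates are hypothetical, and each difference would still need its own significance test before being trusted.

```python
def decompose_abc(conv_a, conv_b, conv_c):
    """Split the net effect of a feature using an A/B/C set-up.

    conv_a: conversion with the feature off (control)
    conv_b: conversion with the feature on but not shown to users
    conv_c: conversion with the feature on and shown to users
    """
    return {
        # Cost or bias of merely running the feature (e.g. extra latency)
        "hidden_effect": conv_b - conv_a,
        # Effect of the user actually seeing the feature
        "display_effect": conv_c - conv_b,
        # What a plain A/B test (A vs C) would have reported
        "net_effect": conv_c - conv_a,
    }


if __name__ == "__main__":
    # Hypothetical conversion rates for variants A, B and C
    for name, value in decompose_abc(0.050, 0.048, 0.055).items():
        print(f"{name}: {value:+.3f}")
```

Here a plain A/B test would report only the net effect and hide the fact that the user-facing message is stronger than it looks, because part of its gain is eaten by the hidden cost.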
  • 26. AB testing in an organization typically goes through the following stages: Sanity Checks, Testing for conflict resolution, Testing for impact measurement, Testing for hypothesis, Rapid prototyping & testing.
      Would encourage you all to help your organization move to the next stage in the AB testing journey.
      Things to Improve:
      • Best to be in a state where the company culture supports quick prototyping and testing with real users
      • Solving for multi-device (stitching sessions) and other tracking limitations in the set-up
      • Higher standards of experiment analysis and responsible reporting
  • 27. Further Reading
      Definitely read the Evan Miller blog. It basically summarizes everything you need to know. If you are keen on more detail about techniques and best practices, take the course on Udacity. Just doing the first chapter would be good enough.