A/B Testing best practices, from strategic vision to operational considerations to communication and, finally, expectations management. We need to adhere to fundamental project management, technology, statistical, experimental design, UX design, customer relationship, business, and data principles to ensure that the insights, and hence the decisions, are as trustworthy as possible.
1. A/B Testing is not Art, it is Science
Business Analytics Innovation Summit | May 2015
2. Disclaimer:
Participation in this summit is purely on a personal basis and does not represent Visa in any form or
manner. The talk is based on learnings from work across industries and firms. Care has been taken to
ensure that no proprietary or work-related information of any firm is used in any material.
RAMKUMAR RAVICHANDRAN
Director, Insights at Visa, Inc.
Helps Executives/Product/Marketing with actionable insights
3. Quick recap on A/B Testing
4. OK, SO WHAT EXACTLY IS…
A/B Testing is the simplest form of Experimental Design, used to test Customers' reactions to something
new or changed (a feature, product, or campaign)….
[Diagram: "Similar" users are split between Variation 1 and Variation 2; each yields a test metric value (V1, V2), and the question is whether the delta (V1-V2) is statistically significant.]
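In code, the question in the diagram reduces to a two-proportion z-test on the two variations' metric values. A minimal sketch in plain Python; the conversion counts are made up for illustration:

```python
from math import sqrt, erf

def two_proportion_z_test(conv_1, n_1, conv_2, n_2):
    """Is the delta between two variations' conversion rates
    statistically significant? (two-sided two-proportion z-test)"""
    p_1, p_2 = conv_1 / n_1, conv_2 / n_2
    p_pool = (conv_1 + conv_2) / (n_1 + n_2)           # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_1 + 1 / n_2))
    z = (p_1 - p_2) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided
    return p_1 - p_2, z, p_value

# Illustrative counts: 520/10,000 converted in V1, 460/10,000 in V2.
delta, z, p = two_proportion_z_test(520, 10_000, 460, 10_000)
print(f"delta={delta:.4f}, z={z:.2f}, p={p:.4f}")  # significant at 90% if p < 0.10
```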
5. SOME SAMPLE APPLICATIONS…
Some use cases across industries and functions….

Function | Areas
Product Management | 1. Test the performance of a new product/feature/flow before actual rollout. 2. Optimize for Placement, Prominence, Messaging.
Marketing/Branding | Optimize campaigns for: 1. Channel – Email/Social/Offline/SEO/Alerts/Notifications; 2. Type – Promotions/Discounts, etc.; 3. Frequency – Monthly/Weekly; 4. Time – Seasonal, etc.; 5. Place – Retailers/Ads/Websites.
Operations | Redirect Customers through new queuing flows, FAQ pages, Chat terminals, etc.
Sales | New Onboarding Flow, Value Prop Communication, Execution Method, Channel.
Risk | New Risk Engine performance over the Current one.

…what to test is usually determined from Strategy, UX, Business Wisdom, Analytics, Research,
Mining, etc.
6. Common Misconceptions
7. A DAY IN THE LIFE OF AN A/B TESTER
*purely satirical, just to wake you up, and not indicative of anyone or anything; any similarity is purely coincidental!
https://www.youtube.com/watch?v=_CHLE9hmbEw
8. COMMON MISCONCEPTIONS
We often hear these statements in the context of testing…
• It's very easy
• A/B Testing will prove who is right
• Test everything
• Coolness lies in the quantity and complexity of the tests
• Oh, the results aren't significant – A/B Testing is a failure
…so let's check how many of these are right
9. The big picture
11. WHAT DO YOU MEAN BY THE RIGHT FACE?
• Message – Clear and crisp Value Prop and Call to Action (CTA)
• Prominence – Trendy and easy to spot
• Placement – Easily spotted and fitting with the Consumer's mental model
• Flow – Quick and efficient
• Form – Minimal and relevant elements only
12. WHAT ARE THE HIGH LEVEL STEPS?

PHASES: Strategy → Measure → Launch → Analyze

Strategy
• Actions: Define the question to be answered and why; design the changes; know the cost; finalize success criteria.
• Details – Questions: Target Customers; where and what is being checked? Why is this even being considered? Target metrics and success criteria.

Measure
• Actions: Analytics team creates direct/proxy metrics to measure the performance; instrument metrics if needed.
• Details – Primary Metrics, e.g., Click Through Rate, NPS. Secondary Metrics, e.g., Repeat Visits, Lifetime Value.

Launch
• Actions: Decide on the Research Methodology based on Analytical findings.
• Details – Research Methods: Attitudinal vs. Behavioral; Qualitative vs. Quantitative; Context for Product Use. Factors deciding Research Methods: speed of execution, cost of execution, reliability, Product Development Stage.

Analyze
• Actions: Quantify/Analyze the impact; size the potential impact of launching.
• Details – Factors deciding eventual rollout (in order of priority): Strategic need; estimated impact calculation from Analytics; findings from other sources (Data Analytics/Mining, Consumer Feedback).
13. WHEN TO USE WHICH METHOD?

Method | Description | Speed | Cost | Inference | Dev Stage
Prototyping | Create & test prototypes internally (externally, if needed) | Quickest (HTML prototypes) | Inexpensive (feedback incentives) | Directional | Ideation Stage
Usability Studies | Standardized lab experiments – panel(s) of employees/friends/family | Quick (panel, questions, read) | Relatively expensive (+Lab) | +Consistency across users | Ideation Stage
Focus Group | In-depth interviews for feedback | Slow (+Detailed interviews) | Expensive (+Incentive +Time) | +Additional context on Why | Ideation Stage
Surveys & Feedback | Email/pop-up surveys | Slower (+Response rate) | Expensive (infra to send, track & read) | +Strength of numbers | Ideation/Dev/Post Launch
Pre-Post | Roll out the changes and then test for impact | Slower (Dev+QA+Launch+Release cycle) | Costly (+Tech resources) | +Possible statistical significance, but risk of bad experience | Post Launch
A/B Testing | Different experiences to users, then measure the delta | Slowest (+Sampling +Profiling +Statistical inference) | Very costly (+Tech +Analytics +Time) | +Rigorous (statistical significance); risk of bad experience reduced | Pre Launch (after Dev)
14. A/B Testing
15. STEPS IN EXECUTING AN A/B TEST

Pre-Work
• Tasks: Strategic Objectives (Engagement, Satisfaction, Personalization, etc.); Analytics (Drivers Analysis, Data Gap Analysis, RoI Analysis); decision filters (A/B vs. Pre-Post vs. Usability vs. Drivers Modeling).
• Owners: Requestors, Product & Analytics. Outcome: Test type assignment.

Define & Prioritize
• Tasks: Type of Test (Placement, Prominence, Messaging, Form, Flow); Success Criteria (test metrics and estimated impact, $).
• Owners: Requestors, Product & Analytics. Outcome: Test prioritized & added to pipeline.

Design
• Tasks: Wireframe (expected change(s) vs. Control, design signed off); Target Criteria (who, where, when, #cells, exclusions if any); Analytical Details (sample size, #days to run, traffic split).
• Owners: Requestors, Product & Analytics. Outcome: Test document for Tech.

Set-up & Execution
• Tasks: Set-up (actual set-up on the front end); QA (initial QA – look & feel, compatibilities, loading, data, etc.).
• Owners: Technology. Outcome: Test prototype for UAT.

UAT & Sign-off
• Tasks: Sign-off from Product (per expectations); sign-off from Requester (per expectations; deviations OK?); sign-off from Analytics & Data (data validation results).
• Owners: Requestors, Product, BI & Analytics. Outcome: Go-ahead for launch.

Launch & Monitor
• Tasks: Monitor the test for data validity (if bad, work around or stop); stop the test when sample size needs are met.
• Owners: Analytics & Technology. Outcome: Test results.

Analysis & Readout
• Tasks: Impact calculation (calculate delta, significance & consistency); Go/No-go recommendation and $ impact of full rollout.
• Owners: Analytics. Outcome: Final readout.
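The traffic split in the Design and Set-up steps above needs a deterministic assignment, so that a returning user always sees the same variation. One common approach, sketched below, hashes the user ID together with a per-test name; this is an illustrative implementation, not any particular tool's:

```python
import hashlib

def assign_cell(user_id: str, test_name: str, split=(0.5, 0.5)):
    """Deterministically bucket a user into a test cell.
    `split` lists the traffic fraction per cell and may be non-50/50."""
    digest = hashlib.sha256(f"{test_name}:{user_id}".encode()).hexdigest()
    point = int(digest[:8], 16) / 0xFFFFFFFF   # uniform-ish value in [0, 1]
    cumulative = 0.0
    for cell, fraction in enumerate(split):
        cumulative += fraction
        if point <= cumulative:
            return cell
    return len(split) - 1                      # guard against float rounding

# Same user + same test always lands in the same cell:
print(assign_cell("user-42", "remove_ad_banner"))   # stable across calls
```

Salting by test name keeps assignments independent across concurrent tests, which also helps with the sample-fragmentation concern discussed later.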
16. PROJECT MANAGEMENT (ILLUSTRATIVE)

Test Details
• Priority: 1
• Test Description: Remove Ad banner on Yahoo home page
• Requestors/Key Stakeholders: User Experience
• Type of Change: Prominence
• Hypothesis: Removing Ad banners would reduce distraction and focus users on the CTA
• How did we arrive at this hypothesis: Product/Design Judgement
• Where will the Test happen: Home Page
• Target Audience: All Consumers

Expected Impact from the Test
• Primary Metrics: Click Through Rate x%, Net Promoter Score y%
• Secondary Metrics: Repeat Visits z%, Customer Lifetime Value a%
• Estimated Benefit (USD)

Other details from the Test
• Standard Test Plan Document Ready: Yes
• #Test Cells: 2
• #Days needed for the Test to run for a statistically significant sample: 40
• Design Ready?: Yes
• Specific Technical Requirements?
• Estimated Tech Effort/Cost (USD)
• Overall Test Cost (USD)
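A template like the one above can be carried as a structured record so every test enters the pipeline in the same shape. A hypothetical sketch; the field names mirror the illustrative slide rather than any standard schema:

```python
from dataclasses import dataclass, field

@dataclass
class TestPlan:
    priority: int
    description: str
    requestors: str            # e.g., "User Experience"
    change_type: str           # Placement / Message / Prominence / Flow / Targeting / Form
    hypothesis: str
    hypothesis_source: str     # Analytics, Consumer Feedback, Judgement, ...
    location: str              # where the test happens
    target_audience: str
    primary_metrics: dict = field(default_factory=dict)    # metric -> expected lift
    secondary_metrics: dict = field(default_factory=dict)
    n_cells: int = 2
    days_to_run: int = 0

plan = TestPlan(
    priority=1,
    description="Remove ad banner on home page",
    requestors="User Experience",
    change_type="Prominence",
    hypothesis="Removing ad banners reduces distraction and focuses users on the CTA",
    hypothesis_source="Product/Design Judgement",
    location="Home Page",
    target_audience="All Consumers",
    primary_metrics={"CTR": "x%", "NPS": "y%"},
    days_to_run=40,
)
```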
17. NECESSARY DETAILS FOR PROJECT MANAGEMENT
Sl. No. | Type of Change | Example
1 | Placement | Right top vs. Right bottom
2 | Message | Do this vs. Do that
3 | Prominence | Size, Color, etc.
4 | Flow | 3-step submission to 2-step submission, etc.
5 | Targeting | Different sets of actions for different sets of people
6 | Form | 5 fields to fill vs. 2 fields

Sl. No. | Type of Test
1 | One Cell Test (A/B Test)
2 | Multiple Test (A/B/C Test)
3 | Multivariate Test (A*B*C Test)

Sl. No. | How did we arrive at this hypothesis?
1 | Analytics
2 | Consumer Feedback
3 | Product/Design Judgement
4 | Competitive Pressures
5 | Legal Compliance
6 | Partnership Requirements
7 | Strategic need
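Note that in a multivariate (A*B*C) test the number of cells multiplies: every combination of factor levels is its own cell, which is one source of the sample fragmentation warned about later. A quick illustration with made-up factors:

```python
from itertools import product

# Hypothetical factors: 2 messages x 3 placements x 2 colors = 12 cells.
messages   = ["Do this", "Do that"]
placements = ["right-top", "right-bottom", "inline"]
colors     = ["blue", "green"]

cells = list(product(messages, placements, colors))
print(len(cells))   # 12 -- and each cell needs enough traffic for significance
```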
18. SAMPLE SIZE CALCULATION (ILLUSTRATIVE)
#Days for the test to run Avg counts per day #Sample Size Required in Test Group
40 10,000 40,000
Control proportion
(%)
Lift to test
(%)
Test
proportion
(%)
Acceptable False Positive
threshold:
Chances of incorrectly
identifying a lift when it's not
there
Acceptable False Negative threshold:
Chances of incorrectly identifying
there's no lift when there is one
60% 20% 72% 20% 20%
Required sample size and #days to run the test for required statistical significance…
What input metrics are required…
Calculations that happen in the backend…
Average proportion
(%)
Control Variance
{p*(1-p)}
Test variance
{p*(1-p)}
Avg variance
False Positive
(zcrit):
False negative
(zpwr)
64% 23% 23% 23% 1.28 1.28
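The backend figures above follow the standard two-proportion sample-size recipe. A sketch of that calculation in plain Python; note that a z value of 1.28 corresponds to 10% one-sided thresholds, while the 20% thresholds listed would give roughly 0.84, so treat the slide's numbers as illustrative:

```python
from statistics import NormalDist

def sample_size_per_cell(p_control, lift, alpha=0.20, beta=0.20):
    """Per-cell sample size for detecting a relative `lift` over
    `p_control`, using the average-variance form shown on the slide."""
    p_test = p_control * (1 + lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # false-positive threshold
    z_beta  = NormalDist().inv_cdf(1 - beta)    # false-negative threshold
    p_bar = (p_control + p_test) / 2            # average proportion
    var_avg = p_bar * (1 - p_bar)               # average variance
    n = (z_alpha + z_beta) ** 2 * 2 * var_avg / (p_test - p_control) ** 2
    return int(n) + 1

# Slide inputs: control 60%, lift 20% (so test 72%), alpha = beta = 20%.
print(sample_size_per_cell(0.60, 0.20))  # ≈ 89; with 10% thresholds (z ≈ 1.28) it is ≈ 205
```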
19. SAMPLE READOUT
Objective
Understand whether removing the Ad banner on the home page improves the click-through rate on articles and increases
consumer satisfaction
[Chart: Test metric – Click Through Rate; Test vs. Control values plotted with the delta between Test & Control.]
Key Findings
1. Removing the banner increased CTR by 100% and NPS by 20 points. This translates to $40M in Lifetime Value impact.
2. All the above lifts are statistically significant at the 90% confidence level. These lifts were also consistent over a two-week
time window.
Performance data (time window: Apr 1, 1980 to Apr 14, 1980)
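A readout like this boils down to the delta between test and control, a confidence interval around it, and a consistency check over the time window. A minimal sketch with made-up counts, using the 90% confidence level from the findings:

```python
from math import sqrt
from statistics import NormalDist

def delta_with_ci(conv_t, n_t, conv_c, n_c, confidence=0.90):
    """Lift between test and control with a normal-approximation CI."""
    p_t, p_c = conv_t / n_t, conv_c / n_c
    se = sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    delta = p_t - p_c
    return delta, (delta - z * se, delta + z * se)

# Illustrative totals (test CTR double the control, as in the findings).
# The lift is "consistent" if the CI excludes zero over the full window
# and the sign holds day over day.
delta, (lo, hi) = delta_with_ci(1_600, 20_000, 800, 20_000)
print(f"delta={delta:.3%}, 90% CI=({lo:.3%}, {hi:.3%})")
```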
20. Other Considerations & Best Practices
21. THINGS TO WATCH OUT FOR
• Engineering overheads – every time a new flow or any major addition to the experience is introduced,
new development is required. It has to go through the standard engineering prioritization route unless a SWAT team is
dedicated to it.
• Tricky QA situations – the QA team should be trained to handle A/B Testing scenarios and use cases, including integration
with automated QA tools. Security and front-end load-failure considerations apply on top of standard checks.
• Operational excellence requirements – testing of the tests in Sandbox, Staging, and Live Site testing areas.
End-to-end dry runs are mandatory before launching the tests.
• Analytical nuances – Experiment Design is the supreme need! External factors can easily invalidate A/B Testing.
Watch for sample fragmentation as the number and complexity of tests grows; consider the need for a Universal Control; impact should be
checked for significance over time.
• Data needs – reliable instrumentation, Testing Tool JavaScript placed correctly and with minimal
performance overhead, integration with the Web Analytics tool, and a data feed that can be tied to other data sources
(for deep dives).
• Branding guidelines – don't overwhelm and confuse users in the quest for multiple and complex tests; standardize
but customize the experience across channels and platforms; soft launches should be avoided as much as
possible.
• Proactive internal communication, specifically to client-facing teams.
• Strategic decisions – some changes have to go in irrespective of A/B Testing findings; the question then is
how to make them happen right. This is gradual ramp, progressive learning, and iterative improvement – a collection
of A/B Tests, not a one-off big one.
…A/B Testing can never be a failure: by definition it is a learning on whether the change was well
received by the user or not, and that informs the next steps.
22. Appendix
23. THANK YOU!
Would love to hear from you on any of the following forums…
https://twitter.com/decisions_2_0
http://www.slideshare.net/RamkumarRavichandran
https://www.youtube.com/channel/UCODSVC0WQws607clv0k8mQA/videos
http://www.odbms.org/2015/01/ramkumar-ravichandran-visa/
https://www.linkedin.com/pub/ramkumar-ravichandran/10/545/67a
24. RESEARCH/LEARNING RESOURCES
• When to use which Research Method
http://www.nngroup.com/articles/which-ux-research-methods/
• Building our own Participatory Research Community
http://uxmag.com/articles/build-your-own-participant-resource-for-ux-research
• Additional details on User Research Methods
http://www.usability.gov/what-and-why/user-research.html
• Practical questions on User Research
http://www.slideshare.net/dgcooley/introduction-to-ux-research-methods
• A/B Tool comparison
http://www.roidna.com/tools/ab-testing-tool/#tool-comparison
• Best Practices on A/B Testing
http://conversionxl.com/12-ab-split-testing-mistakes-i-see-businesses-make-all-the-time/
• Case Studies on A/B Testing
http://white.net/noise/30-multivariate-ab-split-testing-tools-tutorials-resources/
25. A/B TESTING TOOL EVALUATION STEPS
• Step 1: Decide on evaluation criteria & test use cases in discussion with various
stakeholder teams - Analytics & Testing, Business Intelligence, Marketing, Product
Management & Engineering
• Step 2: First round interview with the Sales teams to understand what tools
meet the criteria
• Step 3: Request product capability demo on the test use cases and evaluate the
level of investment (resources & time) needed for such use cases
• Step 4: Interview with current Customer references
• Step 5: Conduct specific “engineering/security” focused discussion to evaluate
the implementation cost, resources and time and fit with existing infrastructure
• Step 6: Cross-functional panel discussion on the findings from the evaluation
round and deciding on the vendor
26. A/B TESTING TOOL EVALUATION CRITERIA
• Type of Testing: A/B Testing, Multiple A/B Testing, Multi-factor testing
• Traffic distribution: Flexibility of Traffic distribution (non 50-50), Segmentation
(Region), Universal Control
• What can be tested: Placement, Prominence, Messaging, Funnels, Channels, etc.
• Test Metrics: Clicks, Page Views, Conversion, Time Spent, etc.
• Implementation effort: Time, Resources, What can & cannot be done, Latency, Winner
Variation ramp and Version Release dependencies in App Testing
• Channels: Web, Native App, Mobile Website
• Pricing packages: Users, Page Load, Monthly Service Contract (Type), etc.
• Programming experience: GUI vs. Coding (Small Test vs. Complex Test)
• Analysis options: Analysis & Reporting Flexibility, Post (or in-flight) Testing Segmentation
• Current Customer Base
• Security limitations