Building an experimentation
                          framework for web apps



                                  Zhi-Da Zhong
                                  zz@etsy.com



Tuesday, July 26, 2011
About the talk
                               Why
                               What
                            Framework
                            Break / hack
                            Tech Details
                             Test design
                              Analysis


Why?



Questions


                         “What will happen if I do X”?

                             “Is X better than Y?”




The future
                                  &
                         alternate universes
                              (We’re bad at those.)




Then what?



Experiments



                               Try it out.
                         Data beats speculation.




Experiments



                         Try different alternatives
                           on different people.




Which is better?



                                vs.




Not a great experiment




Web apps



Front end experiments


                         • Layout, colors, images, copy, ...
                         • No functional changes
                         • Impact can be surprisingly high




A little more complex...



                         • Multipage flows
                         • Functionality changes




Backend experiments



                         • Why not?
                         • Algorithms, architectures, batch processes, ...




The Etsy search backend


                         • New algorithm
                         • New RPC protocol
                         • New result data structure
                         • New Solr trunk snapshot

                         [Diagram: the web app's search() dispatches to either searchA() or
                         searchB(), which talk to search cluster A and search cluster B.]




DB re-architecture



                         • Postgres => Sharded MySQL
                         • Multiple experiments




Whole new features

                               New pages
                                   +
                              New DB tables
                                   +
                              New batch jobs
                                   +
                                   ...


Not just 2 variants



                         • A/B/C... tests
                         • Multi-variate tests




Caveats


                         • Content not under your control
                         • Price tests?
                         • Hard-to-measure/quantify things
                         • Long term impact?




Other tests



                         • Internal user testing
                         • Whitelisted user testing




Opt-in experiments




Complementary techniques


                         • Observed/recorded testing
                            - show different people the same thing

                         • Side-by-side testing
                            - show each person 2 alternatives




Side by side testing




How



A common approach


                         • JS-based
                         • Non-techie UI
                         • “No IT!”
                         • “Designed For Marketers, By Marketers”




Our approach


                         • The developer is the user
                         • Code as configuration
                         • An integral part of the dev process




Developer as the user



                         • The builder of the feature writes the test
                         • Not just a marketing tool




Code as config

                         • Simplicity
                         • Expressivity
                         • Quality
                         • Version => complete system state
                          •   Revision history




Part of the dev process



                            Every change is an experiment!




What does it look like?



Default => Experiment => (new) Default




To add a new feature...

                         + $config['new_search'] = array(
                         +    'enabled' => 'off'
                         + );


                             function search() {
                         +     if ($cfg->isEnabled('new_search')) {
                         +       return do_new_search();
                         +     }

                                 // existing stuff
                             }




Deploy that



Now we go crazy...


                         function do_new_search() {
                           // exciting new stuff
                           // that might or might not work
                           // but we can deploy it anyway
                           // since it’s flagged off
                         }




Internal user testing

                             $config['new_search'] = array(
                         +      'enabled' => 'rampup',
                         +      'rampup' => array(
                         +        'admin' => true
                         +      )
                             );




Whitelists

                             $config['new_search'] = array(
                                'enabled' => 'rampup',
                                'rampup' => array(
                         +        'whitelist' => array('zhida'),
                                  'admin' => true
                                )
                             );




Opt-in experiments

                             $config['new_search'] = array(
                                'enabled' => 'rampup',
                                'rampup' => array(
                         +        'group' => 12345,
                                  'admin' => true
                                )
                             );




A/B

                             $config['new_search'] = array(
                                'enabled' => 'rampup',
                                'rampup' => array(
                         +        'percent' => 1.5,
                                  'admin' => true
                                )
                             );




If it works...


                             $config['new_search'] = array(
                         +      'enabled' => 'on'
                             );




Order matters



                         Whitelist / Blacklist > Internal > Opt-in > Random
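
One way the ordering above could play out in code is sketched here. This is not the framework's actual implementation: the function name is_enabled(), the shape of the $user array, the 'blacklist' key, and the MD5-based bucketing are all assumptions made for illustration. Only the 'enabled', 'rampup', 'admin', 'whitelist', 'group' and 'percent' keys come from the slides.

```php
<?php
// Hypothetical sketch: how a flag check might interpret the config arrays
// from the slides. The $user shape and the 'blacklist' key are assumptions.
function is_enabled(array $config, string $feature, array $user): bool {
    $flag = $config[$feature] ?? array('enabled' => 'off');
    if ($flag['enabled'] === 'on') { return true; }
    if ($flag['enabled'] !== 'rampup') { return false; }   // 'off' or unknown
    $r = $flag['rampup'] ?? array();

    // 1. Whitelist / blacklist win over everything else.
    if (in_array($user['name'], $r['blacklist'] ?? array(), true)) { return false; }
    if (in_array($user['name'], $r['whitelist'] ?? array(), true)) { return true; }
    // 2. Internal (admin) users.
    if (!empty($r['admin']) && !empty($user['is_admin'])) { return true; }
    // 3. Opt-in: members of the configured group.
    if (isset($r['group']) && in_array($r['group'], $user['groups'] ?? array(), true)) {
        return true;
    }
    // 4. Random: a stable bucket in [0, 100) derived from feature + user name.
    if (isset($r['percent'])) {
        $bucket = (hexdec(substr(md5($feature . ':' . $user['name']), 0, 7)) % 10000) / 100;
        return $bucket < $r['percent'];
    }
    return false;
}

$config = array('new_search' => array(
    'enabled' => 'rampup',
    'rampup'  => array('whitelist' => array('zhida'), 'admin' => true, 'percent' => 1.5),
));
$zhida = array('name' => 'zhida', 'is_admin' => false, 'groups' => array());
var_dump(is_enabled($config, 'new_search', $zhida)); // bool(true): whitelisted
```

Checking the lists first means a whitelisted or blacklisted user gets a predictable answer regardless of what the later, broader rules would say.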




The framework



As easy as...


                         1. Pick a variant
                         2. Do what it says
                         3. Log the event




What's in a test?



Variants


                         • Key-value pairs
                          •   interpreted by the app

                         • Name
                          •   mostly for logging




SubjectIdProvider

                         function getID()

                         • Why?
                          •   hashing and other selectors
                          •   logging

                         • Types of subjects
                          •   Users...but not always
                          •   Different groups of users - sellers vs buyers, etc.
                          •   Different ways to identify them - signed in vs signed out



Selectors



                         function select($subjectID) => Variant Name




Combining multiple selectors

                         • OR
                          •   breaks blacklists

                         • AND
                          •   breaks whitelists

                         • Sequence
                          •   works!




Selector sequence



                         • Defines an ordering
                         • Returns A/B/C/... or <don't care>
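
A selector sequence might be sketched as follows. The assumption here is that each selector is a callable returning a variant name, or null for "don't care"; select_in_sequence() and the three example selectors are hypothetical names, not part of the published API.

```php
<?php
// Hypothetical sketch of a selector sequence. Each selector returns a variant
// name, or null for "don't care". All names here are illustrative.
function select_in_sequence(array $selectors, string $subjectId, string $default): string {
    foreach ($selectors as $select) {
        $variant = $select($subjectId);
        if ($variant !== null) {
            return $variant;       // the first selector with an opinion wins
        }
    }
    return $default;               // nobody cared
}

$blacklist = function (string $id): ?string { return $id === 'banned_user' ? 'A' : null; };
$whitelist = function (string $id): ?string { return $id === 'zhida' ? 'B' : null; };
$everyone  = function (string $id): ?string { return 'A'; };

$sequence = array($blacklist, $whitelist, $everyone);
echo select_in_sequence($sequence, 'zhida', 'A'), "\n";       // B
echo select_in_sequence($sequence, 'anyone_else', 'A'), "\n"; // A
```

Because evaluation stops at the first non-null answer, a blacklist placed early can force the default variant and a whitelist can force the new one, which is what plain OR and AND combinations cannot both guarantee.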




Loggers



                         function log($testKey, $variantKey, $subjectKey)
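
Putting the pieces together, the three steps (pick a variant, do what it says, log the event) could look roughly like this. The run_test() helper, the variant properties, and the in-memory logger are assumptions for illustration; only the log($testKey, $variantKey, $subjectKey) signature comes from the slide.

```php
<?php
// Hypothetical sketch of the three steps: pick a variant, act on it, log it.
// The variant properties and the in-memory logger are assumptions.
$variants = array(
    'control'   => array('results_per_page' => 20),
    'treatment' => array('results_per_page' => 40),
);

$events = array();
$log = function (string $testKey, string $variantKey, string $subjectKey) use (&$events): void {
    $events[] = "$testKey\t$variantKey\t$subjectKey";
};

function run_test(string $testKey, string $subjectKey, array $variants,
                  callable $select, callable $log): array {
    $variantKey = $select($subjectKey);          // 1. pick a variant
    $properties = $variants[$variantKey];        // 2. key-value pairs the app acts on
    $log($testKey, $variantKey, $subjectKey);    // 3. log the event
    return $properties;
}

$select = function (string $id): string { return $id === 'zhida' ? 'treatment' : 'control'; };
$props = run_test('search_page_size', 'zhida', $variants, $select, $log);
echo $props['results_per_page'], "\n"; // 40
```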




More => better

                         • More data
                         • More ways to track
                          •   access logs
                          •   3P analytics
                          •   custom




Access log augmentation


                         • Apache note
                         • Lots of log analysis tools
                          •   grep
                          •   $$




3P Analytics

                         • Quick to start
                         • May be cheap
                         • Volume?
                         • Lag time?
                         • Flexibility / customization?



3P Analytics - how


                         • Custom variables
                          •   take note of number & size limits

                         • Custom segments
                         • Canned metrics




3P Analytics - example

                         <script type="text/javascript">
                            var pageTracker = _gat._getTracker("UA-1234567-8");
                            pageTracker._initData();
                            pageTracker._setCustomVar(2, "AB", "search_test.variantC", 3);
                            pageTracker._trackPageview();
                         </script>




Our own event tracking

                         • HTML beacons
                         • Hadoop
                         • Cloud

                         [Diagram: HTML/JS sends an event beacon to the web app, which writes an
                         event log; Hadoop processes the log into results.]




Break / hack
                          https://github.com/etsy/ab




Building on top of the
                                core API


Test builders

                         • Capture common patterns
                          •   feature ramp ups
                          •   opt-in experiments

                         • Help with test design
                          •   weight equalization
                          •   multivariate testing




Automatic Dispatchers

                         • Separate dispatching and work
                         • Work with components that have well-defined
                           invocation APIs
                         • Define a particular level of granularity
                         • Feel like magic



Dispatcher example - MVC


                         • View dispatch
                         • Controller dispatch
                         • Spring framework, etc.




Selector Registry


                         • Reuse
                         • Clarity
                         • Documentation

                         $selectorReg = array(
                            'staff' => 'InternalUserSelector',
                            'whitelist' => 'WhitelistSelector',
                            'percent' => 'WeightedSelector'
                         );




Randomized Selector



What does it mean?


                         • Independent of subject attributes
                         • Independent of other tests
                         • Independent of (coarse-grained) time




Persistence


                         • Better experience
                         • Better data
                         • Multi-part tests
                         • ...but not forever




Ramping up/down


                         • Vary group sizes
                         • Reduce risk
                         • Distribute load




Persistence + Ramping

                         • Minimize inconsistency
                         • Ramping up
                          •   Should just add people to the treatment group

                         • Ramping down
                          •   Should just remove part of the treatment group




rand()

                         • Explicit persistence
                          •   Cookie
                          •   DB

                         • Scaling
                         • Maintenance



Hashing


                         variant = H(id)
                          •   Persistence
                          •   Attribute independence
                          •   Test independence?

                         variant = H(test id, id)
                          •   Persistence
                          •   Attribute independence
                          •   Test independence
                          •   What else? Weights!

                         h = H(test id, id)
                         variant = P(h, weights)
                          •   Persistence
                          •   Attribute independence
                          •   Test independence


Partitioning



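                         [Diagram: the hash value falls on a line from 0 to 1. A partition
                         at .5 assigns values below it to variant A and values above it to
                         variant B.]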
Ramping up



                         [Diagram: the same 0-to-1 hash line with the partition at .7, so
                         variant A covers a larger share than variant B.]
Which hash function?


                         • MD5/SHA-256/...
                         • Test it!
                         • But be careful...
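
The hash-then-partition idea, h = H(test id, id) followed by variant = P(h, weights), might be sketched like this. MD5 is one of the hash functions the slide names. The scaling of the digest into [0, 1), the ':' separator, and the function names hash_unit(), partition() and pick_variant() are assumptions, not the framework's API.

```php
<?php
// Hypothetical sketch of h = H(test id, id); variant = P(h, weights).
// The scaling into [0, 1) and all function names are assumptions.
function hash_unit(string $testId, string $subjectId): float {
    // First 7 hex digits (28 bits) of the MD5 digest, scaled into [0, 1).
    return hexdec(substr(md5($testId . ':' . $subjectId), 0, 7)) / 0x10000000;
}

function partition(float $h, array $weights): string {
    // $weights maps variant name => share; shares are assumed to sum to 1.
    $upper = 0.0;
    $last = '';
    foreach ($weights as $variant => $weight) {
        $upper += $weight;
        $last = $variant;
        if ($h < $upper) {
            return $variant;
        }
    }
    return $last;   // guards against rounding when the shares sum to just under 1
}

function pick_variant(string $testId, string $subjectId, array $weights): string {
    return partition(hash_unit($testId, $subjectId), $weights);
}

// Ramping A up from .5 to .7 should only add subjects to A, never remove any.
$moved = 0;
for ($i = 0; $i < 1000; $i++) {
    $before = pick_variant('new_search', "user$i", array('A' => 0.5, 'B' => 0.5));
    $after  = pick_variant('new_search', "user$i", array('A' => 0.7, 'B' => 0.3));
    if ($before === 'A' && $after !== 'A') { $moved++; }
}
echo $moved, "\n"; // 0
```

No cookie or database row is needed for persistence, since the same test id and subject id always hash to the same point. Including the test id in the hash input is what keeps a subject's position in one test from determining its position in another.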




A/B + opt-in


                         • Need to separate the groups for analysis
                         • Solution: use more than 2 variants!
                          •   Act according to variant properties
                          •   Track by variant name
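
A minimal sketch of that idea follows, with three variant names but only two behaviours. The variant names, the 'new_search' property, the 'opted_in' field, and the choose() helper are all hypothetical.

```php
<?php
// Hypothetical sketch: three variant names, two behaviours. The app acts on
// the 'new_search' property; analysis groups by variant name.
$variants = array(
    'control'         => array('new_search' => false),
    'treatment'       => array('new_search' => true),
    'optin_treatment' => array('new_search' => true),
);

function choose(array $user, callable $randomSelect): string {
    if (!empty($user['opted_in'])) {
        return 'optin_treatment';          // self-selected: kept out of the A/B comparison
    }
    return $randomSelect($user['name']);   // 'control' or 'treatment'
}

// Stand-in for a real randomized selector, fixed here to keep the example short.
$randomSelect = function (string $name): string { return 'treatment'; };

$a = choose(array('name' => 'fan', 'opted_in' => true), $randomSelect);
$b = choose(array('name' => 'visitor', 'opted_in' => false), $randomSelect);
echo $a, ' ', $b, "\n"; // optin_treatment treatment
var_dump($variants[$a]['new_search'] === $variants[$b]['new_search']); // bool(true)
```

Both users see the same feature, but their events are logged under different variant names, so opt-in users can be excluded when comparing the randomized groups.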




Analysis



...   Confidence interval ... something something
                                     ... Binomial ... blah blah ...




Confidence Intervals



                         • How sure are we?
                         • What if it were random?




Binomial experiments



                             HT HTTT HT H H
                             T HT HTT H HT H
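
To make "how sure are we?" concrete, here is a rough sketch of a 95% confidence interval for a success rate, applied to the first flip sequence above. It uses the normal (Wald) approximation, which is an assumption on my part: the slides do not say which method the talk uses, and this approximation is crude for samples this small.

```php
<?php
// Hypothetical sketch: a 95% confidence interval for a success rate using the
// normal (Wald) approximation. The talk does not say which method it uses.
function proportion_ci(int $successes, int $trials, float $z = 1.96): array {
    $p = $successes / $trials;
    $halfWidth = $z * sqrt($p * (1 - $p) / $trials);
    return array(max(0.0, $p - $halfWidth), min(1.0, $p + $halfWidth));
}

// Count heads and total flips in a sequence like the ones on the slide.
function count_heads(string $flips): array {
    $clean = str_replace(' ', '', $flips);
    return array(substr_count($clean, 'H'), strlen($clean));
}

list($heads, $n) = count_heads('HT HTTT HT H H');
list($low, $high) = proportion_ci($heads, $n);
printf("%d/%d heads, 95%% CI roughly [%.2f, %.2f]\n", $heads, $n, $low, $high);
```

With only ten flips the interval spans roughly 0.19 to 0.81, far too wide to tell a fair coin from a biased one. The same is true of an A/B test that has seen too little traffic.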




Results



Dashboards




A few test design tips



What's the question?


                             What metrics?

                            How much better?




Who?

                         • Different roles
                         • Old vs new
                          •   Novelty
                          •   Habit
                          •   Expectation




When?

                         • User types vary
                         • Activity patterns vary
                         • Site content might vary
                         • Performance might vary
                         • Full weeks are often a good starting point



Summary



Better living through
                                   experimentation


                         • More risk taking => better product
                         • MTTR
                         • Lower stress




You can too.



Tuesday, July 26, 2011

More Related Content

Similar to Building an experimentation framework

CMS Expo 2011 - Social Drupal
CMS Expo 2011 - Social DrupalCMS Expo 2011 - Social Drupal
CMS Expo 2011 - Social Drupal
Blake Hall
 
2011 july-gtug-high-replication-datastore
2011 july-gtug-high-replication-datastore2011 july-gtug-high-replication-datastore
2011 july-gtug-high-replication-datastore
ikailan
 
Writing a Crawler with Python and TDD
Writing a Crawler with Python and TDDWriting a Crawler with Python and TDD
Writing a Crawler with Python and TDD
Andrea Francia
 
Jazeed about Solr - People as A Search Problem
Jazeed about Solr - People as A Search ProblemJazeed about Solr - People as A Search Problem
Jazeed about Solr - People as A Search Problem
Lucidworks (Archived)
 
Creating common assessments in Limelight
Creating common assessments in LimelightCreating common assessments in Limelight
Creating common assessments in Limelight
Terri Sallee
 
SecurityBSides las vegas - Agnitio
SecurityBSides las vegas - AgnitioSecurityBSides las vegas - Agnitio
SecurityBSides las vegas - Agnitio
Security Ninja
 

Similar to Building an experimentation framework (20)

Time Series Data Storage in MongoDB
Time Series Data Storage in MongoDBTime Series Data Storage in MongoDB
Time Series Data Storage in MongoDB
 
Mozilla: Continuous Deploment on SUMO
Mozilla: Continuous Deploment on SUMOMozilla: Continuous Deploment on SUMO
Mozilla: Continuous Deploment on SUMO
 
CMS Expo 2011 - Social Drupal
CMS Expo 2011 - Social DrupalCMS Expo 2011 - Social Drupal
CMS Expo 2011 - Social Drupal
 
2011 july-gtug-high-replication-datastore
2011 july-gtug-high-replication-datastore2011 july-gtug-high-replication-datastore
2011 july-gtug-high-replication-datastore
 
Instagram Training for Collective Bias
Instagram Training for Collective BiasInstagram Training for Collective Bias
Instagram Training for Collective Bias
 
SBML (the Systems Biology Markup Language)
SBML (the Systems Biology Markup Language)SBML (the Systems Biology Markup Language)
SBML (the Systems Biology Markup Language)
 
Panasonic search
Panasonic searchPanasonic search
Panasonic search
 
Writing a Crawler with Python and TDD
Writing a Crawler with Python and TDDWriting a Crawler with Python and TDD
Writing a Crawler with Python and TDD
 
Frontend Caching, PHPTek 2011, Chicago
Frontend Caching, PHPTek 2011, ChicagoFrontend Caching, PHPTek 2011, Chicago
Frontend Caching, PHPTek 2011, Chicago
 
The State of Front End Web Development 2011
The State of Front End Web Development 2011The State of Front End Web Development 2011
The State of Front End Web Development 2011
 
Jazzed about Solr: People as a Search Problem - By Joshua Tuberville
Jazzed about Solr: People as a Search Problem - By Joshua TubervilleJazzed about Solr: People as a Search Problem - By Joshua Tuberville
Jazzed about Solr: People as a Search Problem - By Joshua Tuberville
 
Jazeed about Solr - People as A Search Problem
Jazeed about Solr - People as A Search ProblemJazeed about Solr - People as A Search Problem
Jazeed about Solr - People as A Search Problem
 
Creating common assessments in Limelight
Creating common assessments in LimelightCreating common assessments in Limelight
Creating common assessments in Limelight
 
Selenium Page Objects101
Selenium Page Objects101Selenium Page Objects101
Selenium Page Objects101
 
iPhone App from concept to product
iPhone App from concept to productiPhone App from concept to product
iPhone App from concept to product
 
eXo Software Factory Overview
eXo Software Factory OvervieweXo Software Factory Overview
eXo Software Factory Overview
 
SecurityBSides las vegas - Agnitio
SecurityBSides las vegas - AgnitioSecurityBSides las vegas - Agnitio
SecurityBSides las vegas - Agnitio
 
Drizzle 7.0, Future of Virtualizing
Drizzle 7.0, Future of VirtualizingDrizzle 7.0, Future of Virtualizing
Drizzle 7.0, Future of Virtualizing
 
JavaScript Secrets
JavaScript SecretsJavaScript Secrets
JavaScript Secrets
 
Lean UX Principles in Practice (Zach Larson on SideReel's iOS App)
Lean UX Principles in Practice (Zach Larson on SideReel's iOS App)Lean UX Principles in Practice (Zach Larson on SideReel's iOS App)
Lean UX Principles in Practice (Zach Larson on SideReel's iOS App)
 

Recently uploaded

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 

Building an experimentation framework

  • 1. Building an experimentation framework for web apps Zhi-Da Zhong zz@etsy.com Tuesday, July 26, 2011
  • 2. About the talk Why What Framework Break / hack Tech Details Test design Analysis Tuesday, July 26, 2011
  • 4. Questions “What will happen if I do X”? “Is X better than Y?” Tuesday, July 26, 2011
  • 5. The future & alternate universes (We’re bad at those.) Tuesday, July 26, 2011
  • 8. Experiments Try it out. Tuesday, July 26, 2011
  • 9. Experiments Try it out. Data beats speculation. Tuesday, July 26, 2011
  • 10. Experiments Try different alternatives on different people. Tuesday, July 26, 2011
  • 11. Experiments Try different alternatives on different people. Tuesday, July 26, 2011
  • 12. Which is better? v.s. Tuesday, July 26, 2011
  • 13. Not a great experiment Tuesday, July 26, 2011
  • 15. Front end experiments • Layout, colors, images, copy, ... • No functional changes • Impact can be surprisingly high Tuesday, July 26, 2011
  • 16. A little more complex... • Multipage flows • Functionality changes Tuesday, July 26, 2011
  • 17. Backend experiments • Why not? • Algorithms, architectures, batch processes, ... Tuesday, July 26, 2011
  • 18. The Etsy search backend Web app • New algorithm search() • New RPC protocol searchA() searchB() • New result data structure Search Search • New Solr trunk snapshot cluster A cluster B Tuesday, July 26, 2011
  • 19. DB re-architecture
      • Postgres => Sharded MySQL
      • Multiple experiments
  • 20. Whole new features
      • New pages + New DB tables + New batch jobs + ...
  • 21. Not just 2 variants
      • A/B/C... tests
      • Multivariate tests
  • 22. Caveats
      • Content not under your control
      • Price tests?
      • Hard-to-measure/quantify things
      • Long-term impact?
  • 23. Other tests
      • Internal user testing
      • Whitelisted user testing
  • 25. Complementary techniques
      • Observed/recorded testing: show different people the same thing
      • Side-by-side testing: show each person 2 alternatives
  • 26. Side-by-side testing
  • 28. A common approach
      • JS-based
      • Non-techie UI
      • “No IT!”
      • “Designed For Marketers, By Marketers”
  • 29. Our approach
      • The developer is the user
      • Code as configuration
      • An integral part of the dev process
  • 30. Developer as the user
      • The builder of the feature writes the test
      • Not just a marketing tool
  • 31. Code as config
      • Simplicity
      • Expressivity
      • Quality
      • Version => complete system state
      • Revision history
  • 32. Part of the dev process
      • Every change is an experiment!
  • 33. What does it look like?
  • 35. Default => Experiment => (new) Default
  • 36. To add a new feature...
        + $config['new_search'] = array(
        +     'enabled' => 'off'
        + );
          function search() {
        +     if ($cfg->isEnabled('new_search')) {
        +         return do_new_search();
        +     }
              // existing stuff
          }
  • 38. Now we go crazy...
          function do_new_search() {
              // exciting new stuff
              // that might or might not work
              // but we can deploy it anyway
              // since it’s flagged off
          }
  • 39. Internal user testing
          $config['new_search'] = array(
        +     'enabled' => 'rampup',
        +     'rampup' => array(
        +         'admin' => true
              )
          );
  • 40. Whitelists
          $config['new_search'] = array(
              'enabled' => 'rampup',
              'rampup' => array(
        +         'whitelist' => array('zhida'),
                  'admin' => true
              )
          );
  • 41. Opt-in experiments
          $config['new_search'] = array(
              'enabled' => 'rampup',
              'rampup' => array(
        +         'group' => 12345,
                  'admin' => true
              )
          );
  • 42. A/B
          $config['new_search'] = array(
              'enabled' => 'rampup',
              'rampup' => array(
        +         'percent' => 1.5,
                  'admin' => true
              )
          );
  • 43. If it works...
          $config['new_search'] = array(
        +     'enabled' => 'on'
          );
  • 44. Order matters
      • Whitelist / Blacklist > Internal > Opt-in > Random
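
The slides show the config but not how `isEnabled()` reads it. Below is a minimal sketch of one way the lookup could honor the ordering on slide 44. Everything in it is an assumption made for illustration: the function name `is_enabled`, the shape of the `$user` array, the `blacklist` key, and the MD5-based bucket used for the `percent` step are not taken from the talk.

```php
<?php
// Hypothetical sketch; not the talk's or any library's actual implementation.
function is_enabled(array $config, string $feature, array $user): bool
{
    $flag = $config[$feature] ?? array('enabled' => 'off');
    if ($flag['enabled'] === 'on') {
        return true;
    }
    if ($flag['enabled'] !== 'rampup') {
        return false; // 'off' or anything unrecognized
    }
    $rampup = $flag['rampup'] ?? array();

    // 1. Whitelist / blacklist win over everything else.
    if (in_array($user['name'], $rampup['blacklist'] ?? array(), true)) {
        return false;
    }
    if (in_array($user['name'], $rampup['whitelist'] ?? array(), true)) {
        return true;
    }
    // 2. Internal (admin) users.
    if (!empty($rampup['admin']) && !empty($user['is_admin'])) {
        return true;
    }
    // 3. Opt-in: the user belongs to the configured group.
    if (isset($rampup['group']) && in_array($rampup['group'], $user['groups'] ?? array(), true)) {
        return true;
    }
    // 4. Random: a stable per-user bucket in [0, 100) compared to the percentage.
    if (isset($rampup['percent'])) {
        $bucket = hexdec(substr(md5($feature . ':' . $user['name']), 0, 8)) / 4294967296 * 100;
        return $bucket < $rampup['percent'];
    }
    return false;
}

$config = array('new_search' => array(
    'enabled' => 'rampup',
    'rampup'  => array('whitelist' => array('zhida'), 'admin' => true, 'percent' => 1.5),
));
var_dump(is_enabled($config, 'new_search', array('name' => 'zhida')));
```

Checking the lists first means a whitelisted or blacklisted user gets the same answer no matter what the later, broader rules say.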
  • 46. As easy as...
  • 47. As easy as...
      1. Pick a variant
  • 48. As easy as...
      1. Pick a variant
      2. Do what it says
  • 49. As easy as...
      1. Pick a variant
      2. Do what it says
      3. Log the event
  • 50. What's in a test?
  • 51. Variants
      • Key-value pairs
          • interpreted by the app
      • Name
          • mostly for logging
  • 52. SubjectIdProvider
      • function getID()
      • Why?
          • hashing and other selectors
          • logging
      • Types of subjects
          • Users...but not always
          • Different groups of users: sellers vs buyers, etc.
          • Different ways to identify them: signed in vs signed out
  • 53. Selectors
      • function select($subjectID) => Variant Name
  • 54. Combining multiple selectors
      • OR
          • breaks blacklists
      • AND
          • breaks whitelists
      • Sequence
          • works!
  • 55. Selector sequence
      • Defines an ordering
      • Returns A/B/C/... or <don't care>
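
A selector sequence can be sketched in a few lines. This is an illustrative assumption, not the talk's code: here a selector is any callable that returns a variant name or `null` for "don't care", and `select_in_sequence`, the user names, and the variant names are all made up.

```php
<?php
// Hypothetical sketch of a selector sequence.
function select_in_sequence(array $selectors, string $subjectId, string $default): string
{
    foreach ($selectors as $select) {
        $variant = $select($subjectId);
        if ($variant !== null) {
            return $variant; // the first selector with an opinion wins
        }
    }
    return $default; // every selector said "don't care"
}

// Stand-in selectors: each returns a variant name or null.
$blacklist = function (string $id): ?string {
    return in_array($id, array('mallory'), true) ? 'old' : null;
};
$whitelist = function (string $id): ?string {
    return in_array($id, array('zhida', 'mallory'), true) ? 'new' : null;
};

// 'mallory' is on both lists; the blacklist comes first, so she gets 'old'.
echo select_in_sequence(array($blacklist, $whitelist), 'mallory', 'old'), "\n";
```

Because each selector can abstain, a whitelist never has to answer for users it does not list, which is what goes wrong when selectors are combined with plain AND or OR.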
  • 56. Loggers
      • function log($testKey, $variantKey, $subjectKey)
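
Put together, the three steps (pick a variant, do what it says, log the event) might look like the sketch below. Only the steps and the `log($testKey, $variantKey, $subjectKey)` argument order come from the slides; `run_test`, the stand-in selector, the variant properties, and the in-memory logger are assumptions for illustration.

```php
<?php
// Hypothetical glue code for the three steps.
function run_test(string $testKey, string $subjectId, callable $select, array $variants, callable $log): array
{
    $variantName = $select($subjectId);        // 1. Pick a variant
    $properties  = $variants[$variantName];    // 2. Do what it says: the app interprets these key-value pairs
    $log($testKey, $variantName, $subjectId);  // 3. Log the event
    return $properties;
}

$events = array();
$properties = run_test(
    'search_test',
    'user42',
    function (string $id): string { return 'B'; },  // stand-in selector
    array('A' => array('algo' => 'old'), 'B' => array('algo' => 'new')),
    function (string $test, string $variant, string $subject) use (&$events): void {
        $events[] = "$test.$variant $subject";      // stand-in logger
    }
);
```

The caller only sees the variant's properties, so the application code branches on `$properties['algo']` rather than on the variant name, which stays a logging concern.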
  • 57. More => better
      • More data
      • More ways to track
          • access logs
          • 3P analytics
          • custom
  • 58. Access log augmentation
      • Apache note
      • Lots of log analysis tools
          • grep
          • $$
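
As a rough illustration of the Apache note idea: PHP's `apache_note()` attaches a named value to the current request, and Apache's log format can print it with `%{name}n`. The sketch below is an assumption about how that could be wired up; the note name `ab`, the `test.variant` format (borrowed from the custom variable on slide 61), and both function names are hypothetical.

```php
<?php
// Hypothetical sketch: record test assignments as an Apache note.
function format_ab_note(array $assignments): string
{
    $pairs = array();
    foreach ($assignments as $test => $variant) {
        $pairs[] = $test . '.' . $variant;
    }
    return implode(',', $pairs);
}

function note_assignments(array $assignments): string
{
    $value = format_ab_note($assignments);
    // apache_note() is only defined when PHP runs as an Apache module.
    if (function_exists('apache_note')) {
        apache_note('ab', $value);
    }
    return $value;
}

echo note_assignments(array('search_test' => 'variantC', 'new_cart' => 'A')), "\n";
```

With a log format that includes `%{ab}n`, every access log line then carries the assignments, and something as simple as `grep 'search_test.variantC'` can pull out one variant's traffic.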
  • 59. 3P Analytics
      • Quick to start
      • May be cheap
      • Volume?
      • Lag time?
      • Flexibility / customization?
  • 60. 3P Analytics - how
      • Custom variables
          • take note of number & size limits
      • Custom segments
      • Canned metrics
  • 61. 3P Analytics - example
          <script type="text/javascript">
          var pageTracker = _gat._getTracker("UA-1234567-8");
          pageTracker._initData();
          pageTracker._setCustomVar(2, "AB", "search_test.variantC", 3);
          pageTracker._trackPageview();
          </script>
  • 62. Our own event tracking
      • HTML beacons
      • Hadoop
      • Cloud
      • [Diagram labels: Web app; HTML, JS; event beacon; Event log; Hadoop; Results]
  • 63. Break / hack
      • https://github.com/etsy/ab
  • 64. Building on top of the core API
  • 65. Test builders
      • Capture common patterns
          • feature ramp ups
          • opt-in experiments
      • Help with test design
          • weight equalization
          • multivariate testing
  • 66. Automatic Dispatchers
      • Separate dispatching and work
      • Work with components that have well-defined invocation APIs
      • Define a particular level of granularity
      • Feel like magic
  • 67. Dispatcher example - MVC
      • View dispatch
      • Controller dispatch
      • Spring framework, etc.
  • 68. Selector Registry
      • Reuse
      • Clarity
      • Documentation
          $selectorReg = array(
              'staff' => 'InternalUserSelector',
              'whitelist' => 'WhitelistSelector',
              'percent' => 'WeightedSelector'
          );
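
One way a registry like this could be used is to turn config keys into selector objects. The sketch below is a guess at that pattern, not the talk's implementation: the three class names and `$selectorReg` come from the slide, while the constructors, the `select()` bodies, the variant names, and `build_selectors` are invented for illustration.

```php
<?php
// Stub selectors: real ones would look users up instead of taking lists.
class InternalUserSelector
{
    private $staffIds;
    public function __construct(array $staffIds) { $this->staffIds = $staffIds; }
    public function select(string $id): ?string
    {
        return in_array($id, $this->staffIds, true) ? 'new' : null;
    }
}

class WhitelistSelector
{
    private $ids;
    public function __construct(array $ids) { $this->ids = $ids; }
    public function select(string $id): ?string
    {
        return in_array($id, $this->ids, true) ? 'new' : null;
    }
}

class WeightedSelector
{
    private $percent;
    public function __construct(float $percent) { $this->percent = $percent; }
    public function select(string $id): ?string
    {
        // Stable bucket in [0, 100) derived from the subject ID.
        $bucket = hexdec(substr(md5($id), 0, 8)) / 4294967296 * 100;
        return $bucket < $this->percent ? 'new' : 'old';
    }
}

$selectorReg = array(
    'staff'     => 'InternalUserSelector',
    'whitelist' => 'WhitelistSelector',
    'percent'   => 'WeightedSelector',
);

// Build selector objects in the order the config lists its keys.
function build_selectors(array $registry, array $config): array
{
    $selectors = array();
    foreach ($config as $key => $argument) {
        if (!isset($registry[$key])) {
            throw new InvalidArgumentException("Unknown selector: $key");
        }
        $class = $registry[$key];
        $selectors[] = new $class($argument);
    }
    return $selectors;
}

$selectors = build_selectors($selectorReg, array('whitelist' => array('zhida'), 'percent' => 1.5));
```

Keeping the key-to-class mapping in one place means a typo in a config key fails loudly instead of silently skipping a rule.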
  • 70. What does it mean?
  • 71. What does it mean?
      • Independent of subject attributes
  • 72. What does it mean?
      • Independent of subject attributes
      • Independent of other tests
  • 73. What does it mean?
      • Independent of subject attributes
      • Independent of other tests
      • Independent of (coarse-grained) time
  • 75. Persistence
      • Better experience
  • 76. Persistence
      • Better experience
      • Better data
  • 77. Persistence
      • Better experience
      • Better data
      • Multi-part tests
  • 78. Persistence
      • Better experience
      • Better data
      • Multi-part tests
      • ...but not forever
  • 79. Ramping up/down
      • Vary group sizes
      • Reduce risk
      • Distribute load
  • 80. Persistence + Ramping
      • Minimize inconsistency
      • Ramping up
          • Should just add people to the treatment group
      • Ramping down
          • Should just remove part of the treatment group
  • 81. rand()
      • Explicit persistence
          • Cookie
          • DB
      • Scaling
      • Maintenance
  • 82. Hashing
      • variant = H(id)
  • 83. Hashing
      • variant = H(id)
      • Persistence
  • 84. Hashing
      • variant = H(id)
      • Persistence
  • 85. Hashing
      • variant = H(id)
      • Attribute independence
      • Persistence
  • 86. Hashing
      • variant = H(id)
      • Persistence
      • Attribute independence
  • 87. Hashing
      • variant = H(id)
      • Test independence?
      • Persistence
      • Attribute independence
  • 88. Hashing
      • variant = H(test id, id)
      • Test independence
      • Persistence
      • Attribute independence
  • 89. Hashing
      • variant = H(test id, id)
      • Persistence
      • Attribute independence
      • Test independence
  • 90. Hashing
      • variant = H(test id, id)
      • What else?
      • Persistence
      • Attribute independence
      • Test independence
  • 91. Hashing
      • variant = H(test id, id)
      • Weights!
      • Persistence
      • Attribute independence
      • Test independence
  • 92. Hashing
      • h = H(test id, id)
      • Persistence
      • Attribute independence
      • Test independence
  • 93. Hashing
      • h = H(test id, id)
      • variant = P(h, weights)
      • Persistence
      • Attribute independence
      • Test independence
  • 94. Partitioning
      • [Diagram: hash range from 0 to 1]
  • 95. Partitioning
      • [Diagram: hash range from 0 to 1, with a partition at .5]
  • 96. Partitioning
      • [Diagram: hash range from 0 to 1, split into A and B by a partition at .5]
  • 97. Ramping up
      • [Diagram: hash range from 0 to 1, split into A and B by a partition at .7]
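
The two formulas, h = H(test id, id) and variant = P(h, weights), can be sketched as follows. This is a minimal illustration under stated assumptions: MD5 is one of the options the slides name, but taking its first 32 bits, the `:` separator, and the function names `hash_unit` and `partition` are choices made here, not details from the talk.

```php
<?php
// h = H(test id, id): map a (test, subject) pair to a stable number in [0, 1).
function hash_unit(string $testId, string $subjectId): float
{
    // First 8 hex digits of the MD5 digest = 32 bits, scaled into [0, 1).
    return hexdec(substr(md5($testId . ':' . $subjectId), 0, 8)) / 4294967296;
}

// variant = P(h, weights): walk the cumulative weights until h falls below one.
// Assumes $weights is a non-empty map of variant name => positive weight.
function partition(float $h, array $weights): string
{
    $total = array_sum($weights);
    $upper = 0.0;
    foreach ($weights as $variant => $weight) {
        $upper += $weight / $total;
        if ($h < $upper) {
            return $variant;
        }
    }
    return $variant; // floating-point rounding guard: fall back to the last variant
}

$h = hash_unit('search_test', 'user42');
echo partition($h, array('A' => 50, 'B' => 50)), "\n"; // partition at .5
echo partition($h, array('A' => 70, 'B' => 30)), "\n"; // ramped up to .7
```

Because the hash for a given test and subject never changes, moving the partition from .5 to .7 only reassigns subjects whose hash lies between the two marks: everyone already in A stays in A, which is the "should just add people" property from slide 80. Including the test id in the hash is what keeps a subject's bucket in one test from predicting their bucket in another.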
  • 98. Which hash function?
      • MD5/SHA-256/...
      • Test it!
      • But be careful...
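
"Test it!" could start with something as simple as checking that real-looking IDs spread evenly over the hash range. The sketch below is one possible check, not the talk's procedure: it assumes sequential integer IDs, an MD5-based hash scaled to [0, 1), ten buckets, and Pearson's chi-square statistic as the measure of unevenness.

```php
<?php
// Count how many of the subject IDs 1..$subjects land in each of $buckets
// equal slices of the hash range.
function bucket_counts(string $testId, int $subjects, int $buckets): array
{
    $counts = array_fill(0, $buckets, 0);
    for ($id = 1; $id <= $subjects; $id++) {
        $h = hexdec(substr(md5($testId . ':' . $id), 0, 8)) / 4294967296;
        $counts[(int) floor($h * $buckets)]++;
    }
    return $counts;
}

// Pearson's chi-square statistic against a uniform expectation.
function chi_square(array $counts): float
{
    $expected = array_sum($counts) / count($counts);
    $sum = 0.0;
    foreach ($counts as $observed) {
        $sum += ($observed - $expected) ** 2 / $expected;
    }
    return $sum;
}

$counts = bucket_counts('search_test', 10000, 10);
printf("chi-square = %.2f over %d buckets\n", chi_square($counts), count($counts));
```

With ten buckets there are nine degrees of freedom, so a statistic well above roughly 16.9 (the 5% critical value) would be a reason to look harder. Running the same check with the IDs your site actually uses, and across several test ids, is closer to what matters than any single run.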
  • 99. A/B + opt-in
      • Need to separate the groups for analysis
      • Solution: use more than 2 variants!
      • Act according to variant properties
      • Track by variant name
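
A sketch of the "more than 2 variants" idea: two variants share the same properties, so the app behaves identically for both, but they carry different names, so the logs can tell opted-in users apart from randomly assigned ones. The variant names, the `new_search` property, and `pick_variant` are hypothetical.

```php
<?php
// Hypothetical variant table: names are what gets logged, properties drive behavior.
$variants = array(
    'optin_new'  => array('new_search' => true),
    'ab_new'     => array('new_search' => true),
    'ab_control' => array('new_search' => false),
);

// $h is the subject's hash in [0, 1); $treatmentShare is the A/B partition point.
function pick_variant(bool $optedIn, float $h, float $treatmentShare): string
{
    if ($optedIn) {
        return 'optin_new'; // opt-in is checked before the random split
    }
    return $h < $treatmentShare ? 'ab_new' : 'ab_control';
}

$name = pick_variant(false, 0.2, 0.5);
$useNewSearch = $variants[$name]['new_search']; // act on the property, log the name
```

Analysis can then compare `ab_new` with `ab_control` alone, leaving out the self-selected `optin_new` group, which would otherwise bias the comparison.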
  • 101. ... Confidence interval ... something something ... Binomial ... blah blah ...
  • 102. Confidence Intervals
      • How sure are we?
      • What if it were random?
  • 104. Binomial experiments
      • HT HTTT HT H H
  • 105. Binomial experiments
      • HT HTTT HT H H
      • T HT HTT H HT H
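
The slides do not say which interval the talk used, so the following is only one common choice: the normal-approximation (Wald) interval for a binomial proportion, p ± z·sqrt(p(1 − p)/n), with z = 1.96 for roughly 95% confidence. The function name and the example counts are made up.

```php
<?php
// Normal-approximation confidence interval for a success rate.
// Reasonable for large samples with rates not too close to 0 or 1.
function conversion_interval(int $successes, int $trials, float $z = 1.96): array
{
    $p = $successes / $trials;
    $margin = $z * sqrt($p * (1 - $p) / $trials);
    return array(max(0.0, $p - $margin), min(1.0, $p + $margin));
}

list($low, $high) = conversion_interval(50, 100);
printf("50 heads in 100 flips: %.3f to %.3f\n", $low, $high); // 0.402 to 0.598
```

The width shrinks with the square root of the sample size, so ten times the traffic narrows the interval by a factor of about three. If the intervals for two variants overlap heavily, the data so far does not separate them from what random flips would produce.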
  • 108. A few test design tips
  • 110. What’s the question?
      • What metrics?
  • 111. What’s the question?
      • What metrics?
      • How much better?
  • 112. Who?
      • Different roles
      • Old vs new
      • Novelty
      • Habit
      • Expectation
  • 113. When?
      • User types vary
      • Activity patterns vary
      • Site content might vary
      • Performance might vary
      • Full weeks are often a good starting point
  • 115. Better living through experimentation
      • More risk taking => better product
      • MTTR
      • Lower stress
  • 116. You can too.