Building an experimentation framework

OSCON talk on building a simple but powerful framework for feature ramp ups, A/B and multivariate testing, and other types of experiments in web apps.

  1. Building an experimentation framework for web apps
     Zhi-Da Zhong, zz@etsy.com
     Tuesday, July 26, 2011
  2. About the talk
     • Why
     • What
     • Framework
     • Break / hack
     • Tech details
     • Test design
     • Analysis
  3. Why?
  4. Questions
     • “What will happen if I do X?”
     • “Is X better than Y?”
  5. The future & alternate universes
     (We’re bad at those.)
  6. Then what?
  7-9. Experiments
     Try it out.
     Data beats speculation.
  10-11. Experiments
     Try different alternatives on different people.
  12. Which is better? [image] vs. [image]
  13. Not a great experiment
  14. Web apps
  15. Front-end experiments
     • Layout, colors, images, copy, ...
     • No functional changes
     • Impact can be surprisingly high
  16. A little more complex...
     • Multipage flows
     • Functionality changes
  17. Backend experiments
     • Why not?
     • Algorithms, architectures, batch processes, ...
  18. The Etsy search backend
     • New algorithm
     • New RPC protocol
     • New result data structure
     • New Solr trunk snapshot
     [Diagram: the web app's search() dispatches either to searchA(), backed by search cluster A, or to searchB(), backed by search cluster B.]
  19. DB re-architecture
     • Postgres => sharded MySQL
     • Multiple experiments
  20. Whole new features
     New pages + new DB tables + new batch jobs + ...
  21. Not just 2 variants
     • A/B/C... tests
     • Multivariate tests
  22. Caveats
     • Content not under your control
     • Price tests?
     • Hard-to-measure/quantify things
     • Long-term impact?
  23. Other tests
     • Internal user testing
     • Whitelisted user testing
  24. Opt-in experiments
  25. Complementary techniques
     • Observed/recorded testing
        - show different people the same thing
     • Side-by-side testing
        - show each person 2 alternatives
  26. Side-by-side testing
  27. How
  28. A common approach
     • JS-based
     • Non-techie UI
     • “No IT!”
     • “Designed For Marketers, By Marketers”
  29. Our approach
     • The developer is the user
     • Code as configuration
     • An integral part of the dev process
  30. Developer as the user
     • The builder of the feature writes the test
     • Not just a marketing tool
  31. Code as config
     • Simplicity
     • Expressivity
     • Quality
     • Version => complete system state
     • Revision history
  32. Part of the dev process
     Every change is an experiment!
  33. What does it look like?
  35. Default => Experiment => (new) Default
  36. To add a new feature...
        + $config['new_search'] = array(
        +     'enabled' => 'off'
        + );

          function search() {
        +     if ($cfg->isEnabled('new_search')) {
        +         return do_new_search();
        +     }
              // existing stuff
          }
  37. Deploy that
  38. Now we go crazy...
          function do_new_search() {
              // exciting new stuff
              // that might or might not work
              // but we can deploy it anyway
              // since it's flagged off
          }
  39. Internal user testing
          $config['new_search'] = array(
        +     'enabled' => 'rampup',
        +     'rampup' => array(
        +         'admin' => true
        +     )
          );
  40. Whitelists
          $config['new_search'] = array(
              'enabled' => 'rampup',
              'rampup' => array(
        +         'whitelist' => array('zhida'),
                  'admin' => true
              )
          );
  41. Opt-in experiments
          $config['new_search'] = array(
              'enabled' => 'rampup',
              'rampup' => array(
        +         'group' => 12345,
                  'admin' => true
              )
          );
  42. A/B
          $config['new_search'] = array(
              'enabled' => 'rampup',
              'rampup' => array(
        +         'percent' => 1.5,
                  'admin' => true
              )
          );
  43. If it works...
          $config['new_search'] = array(
        +     'enabled' => 'on'
          );
  44. Order matters
     Whitelist / Blacklist > Internal > Opt-in > Random
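The precedence rule on slide 44 can be sketched as one function that walks the 'rampup' keys from slides 39-42 in order. This is a minimal sketch, not Etsy's implementation: the function name isEnabledFor, the $user array shape (id, name, is_admin, groups), and the 'blacklist' key are assumptions for illustration, and the random step hashes feature name plus user id instead of calling rand().

```php
<?php
// Hypothetical sketch of rampup evaluation in the order
// Whitelist / Blacklist > Internal > Opt-in > Random.
// The $user shape and the 'blacklist' key are assumptions.
function isEnabledFor(array $config, string $feature, array $user): bool
{
    $flag = $config[$feature] ?? ['enabled' => 'off'];
    if ($flag['enabled'] === 'on') {
        return true;
    }
    if ($flag['enabled'] !== 'rampup') {
        return false; // 'off' or an unrecognized value
    }
    $r = $flag['rampup'] ?? [];

    // 1. Explicit lists beat every other rule.
    if (in_array($user['name'], $r['whitelist'] ?? [], true)) {
        return true;
    }
    if (in_array($user['name'], $r['blacklist'] ?? [], true)) {
        return false;
    }
    // 2. Internal users.
    if (!empty($r['admin']) && $user['is_admin']) {
        return true;
    }
    // 3. Members of the opt-in group.
    if (isset($r['group']) && in_array($r['group'], $user['groups'], true)) {
        return true;
    }
    // 4. A random percentage, kept stable per user by hashing feature + user id.
    if (isset($r['percent'])) {
        $bucket = hexdec(substr(md5($feature . ':' . $user['id']), 0, 8)) / 4294967296;
        return $bucket * 100 < $r['percent'];
    }
    return false;
}
```

With the slide 42 config, roughly 1.5% of users who match none of the earlier rules would get new_search, and because the bucket comes from a hash rather than rand(), a given user gets the same answer on every request.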
  45. The framework
  46-49. As easy as...
     1. Pick a variant
     2. Do what it says
     3. Log the event
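The three steps can be sketched as a single function around one test. This is a hedged illustration: runTest and the plain-callable selector, action, and logger are hypothetical stand-ins, not the framework's actual API.

```php
<?php
// Hypothetical sketch of the three steps around a single test.
// $select: fn(string $subjectId): string   returns a variant name
// $act:    fn(array $properties)            does what the variant says
// $log:    fn(string $test, string $variant, string $subject): void
function runTest(string $testKey, string $subjectId, callable $select,
                 array $variants, callable $act, callable $log)
{
    $name = $select($subjectId);           // 1. pick a variant
    $result = $act($variants[$name]);      // 2. do what it says
    $log($testKey, $name, $subjectId);     // 3. log the event
    return $result;
}
```

Keeping "what to do" in the variant's properties and "what to record" in its name is what lets the later slides swap selectors and loggers without touching the feature code.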
  50. What's in a test?
  51. Variants
     • Key-value pairs
        - interpreted by the app
     • Name
        - mostly for logging
  52. SubjectIdProvider: function getID()
     • Why?
        - hashing and other selectors
        - logging
     • Types of subjects
        - Users... but not always
        - Different groups of users: sellers vs. buyers, etc.
        - Different ways to identify them: signed in vs. signed out
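A provider for the signed-in vs. signed-out case might look like the sketch below. Only getID() comes from the slide; the class name VisitorIdProvider, the 'u:'/'b:' prefixes, and passing the browser ID in (for example from a long-lived cookie) are assumptions.

```php
<?php
interface SubjectIdProvider
{
    public function getID(): string;
}

// Hypothetical provider: signed-in users are identified by user id,
// signed-out visitors by a browser id. The prefixes keep the two ID
// spaces from colliding when they are hashed or logged.
class VisitorIdProvider implements SubjectIdProvider
{
    public function __construct(private ?int $userId, private string $browserId) {}

    public function getID(): string
    {
        return $this->userId !== null ? 'u:' . $this->userId : 'b:' . $this->browserId;
    }
}
```

One trade-off of this choice: a visitor who signs in gets a different subject ID, so a hash-based selector may move them to a different variant.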
  53. Selectors
     function select($subjectID) => variant name
  54. Combining multiple selectors
     • OR
        - breaks blacklists
     • AND
        - breaks whitelists
     • Sequence
        - works!
  55. Selector sequence
     • Defines an ordering
     • Returns A/B/C/... or <don't care>
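A selector sequence can be sketched as a list of selectors tried in order, where the first one that returns a variant wins. Assumptions: null stands for <don't care>, and the three closures are stand-ins for whitelist, blacklist, and random selectors (the "random" one simply returns 'B').

```php
<?php
// Hypothetical sketch: each selector returns a variant name, or null for
// "don't care". The first selector with an opinion wins.
function selectInSequence(array $selectors, string $subjectId, string $default): string
{
    foreach ($selectors as $select) {
        $variant = $select($subjectId);
        if ($variant !== null) {
            return $variant;
        }
    }
    return $default;
}

$whitelist = fn(string $id): ?string => in_array($id, ['zhida'], true) ? 'B' : null;
$blacklist = fn(string $id): ?string => in_array($id, ['mallory'], true) ? 'A' : null;
$everyoneB = fn(string $id): ?string => 'B'; // stand-in for a random selector
```

With the sequence [$whitelist, $blacklist, $everyoneB], 'mallory' gets 'A' because the blacklist answers before the random selector. Under OR ("B if any selector says B"), the random stand-in would still put mallory in B, which is the sense in which OR breaks blacklists.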
  56. Loggers
     function log($testKey, $variantKey, $subjectKey)
  57. More => better
     • More data
     • More ways to track
        - access logs
        - 3P analytics
        - custom
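Tracking one exposure in several places can be sketched as a composite logger behind the log($testKey, $variantKey, $subjectKey) signature from slide 56. The Logger interface name, MultiLogger, and the in-memory MemoryLogger are hypothetical.

```php
<?php
interface Logger
{
    public function log(string $testKey, string $variantKey, string $subjectKey): void;
}

// Sends each exposure to every configured logger (access log, 3P analytics, custom...).
class MultiLogger implements Logger
{
    /** @param Logger[] $loggers */
    public function __construct(private array $loggers) {}

    public function log(string $testKey, string $variantKey, string $subjectKey): void
    {
        foreach ($this->loggers as $logger) {
            $logger->log($testKey, $variantKey, $subjectKey);
        }
    }
}

// Collects events in memory; handy in tests.
class MemoryLogger implements Logger
{
    public array $events = [];

    public function log(string $testKey, string $variantKey, string $subjectKey): void
    {
        $this->events[] = "$testKey:$variantKey:$subjectKey";
    }
}
```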
  58. Access log augmentation
     • Apache note
     • Lots of log analysis tools
        - grep
        - $$
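Under mod_php, PHP's apache_note() can attach the request's assignments as an Apache note, which the access log can then record. A sketch under assumptions: the note name ab_tests and the "test:variant;test:variant" encoding are hypothetical choices, and the function_exists() guard makes the call a no-op under the CLI or PHP-FPM, where apache_note() is not available.

```php
<?php
// Encode test assignments as "test:variant;test:variant" for one log field.
function formatAbNote(array $assignments): string
{
    ksort($assignments); // stable order keeps grep-based analysis simple
    $parts = [];
    foreach ($assignments as $test => $variant) {
        $parts[] = $test . ':' . $variant;
    }
    return implode(';', $parts);
}

// Attach the encoded assignments to the current request (mod_php only).
function recordAbNote(array $assignments): void
{
    if (function_exists('apache_note')) {
        apache_note('ab_tests', formatAbNote($assignments));
    }
}
```

On the Apache side, adding %{ab_tests}n to a LogFormat writes the note into each access log line, where grep or a log analysis tool can pick it up.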
  59. 3P Analytics
     • Quick to start
     • May be cheap
     • Volume?
     • Lag time?
     • Flexibility / customization?
  60. 3P Analytics - how
     • Custom variables
        - take note of number & size limits
     • Custom segments
     • Canned metrics
  61. 3P Analytics - example
        <script type="text/javascript">
          var pageTracker = _gat._getTracker("UA-1234567-8");
          pageTracker._initData();
          pageTracker._setCustomVar(2, "AB", "search_test.variantC", 3);
          pageTracker._trackPageview();
        </script>
  62. Our own event tracking
     • HTML and JS beacons
     • Hadoop
     • Cloud
     [Diagram: the web app's HTML and JS beacons send events to an event log, which Hadoop processes into results.]
  63. Break / hack
     https://github.com/etsy/ab
  64. Building on top of the core API
  65. Test builders
     • Capture common patterns
        - feature ramp-ups
        - opt-in experiments
     • Help with test design
        - weight equalization
        - multivariate testing
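A multivariate test builder with weight equalization can be sketched as: take each factor's levels, build their cross product, and give every combination the same weight. buildMultivariate, the dot-joined variant names, and the factor names in the test are hypothetical; the sketch assumes every factor has at least one level.

```php
<?php
// Hypothetical builder: factor => levels in, equally weighted variants out.
// Each variant is named after its levels and carries them as key-value properties.
function buildMultivariate(array $factors): array
{
    $combos = [[]];
    foreach ($factors as $factor => $levels) {
        $next = [];
        foreach ($combos as $combo) {
            foreach ($levels as $level) {
                $next[] = $combo + [$factor => $level];
            }
        }
        $combos = $next;
    }
    $weight = 1 / count($combos); // weight equalization (assumes no empty factor)
    $variants = [];
    foreach ($combos as $props) {
        $variants[implode('.', $props)] = ['weight' => $weight, 'props' => $props];
    }
    return $variants;
}
```

For two colors and two sizes this yields four variants (red.small, red.large, blue.small, blue.large), each with weight 0.25.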
  66. Automatic dispatchers
     • Separate dispatching and work
     • Work with components that have well-defined invocation APIs
     • Define a particular level of granularity
     • Feel like magic
  67. Dispatcher example - MVC
     • View dispatch
     • Controller dispatch
     • Spring framework, etc.
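A controller-level dispatcher can be sketched as a naming convention: for action "search" and variant "B", call search_B() if the controller defines it, otherwise fall back to search(). The convention, dispatch(), and SearchController are hypothetical, not Spring's or any framework's API.

```php
<?php
// Hypothetical dispatcher: prefer "<action>_<variant>" when it exists,
// otherwise run the default action.
function dispatch(object $controller, string $action, string $variant, array $args = [])
{
    $candidate = $action . '_' . $variant;
    $method = method_exists($controller, $candidate) ? $candidate : $action;
    return $controller->$method(...$args);
}

class SearchController
{
    public function search(string $q): string { return "old results for $q"; }
    public function search_B(string $q): string { return "new results for $q"; }
}
```

The granularity here is one controller method per variant; the work code never mentions the test, which is the "separate dispatching and work" point.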
  68. Selector Registry
     • Reuse
     • Clarity
     • Documentation
          $selectorReg = array(
              'staff'     => 'InternalUserSelector',
              'whitelist' => 'WhitelistSelector',
              'percent'   => 'WeightedSelector'
          );
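The registry lets a 'rampup' config block be turned into selector objects by key. A sketch: the two classes are simplified stand-ins whose constructor signature (config value plus test key) is an assumption, the 'staff' entry is left out for brevity, and buildSelectors() is hypothetical.

```php
<?php
// Simplified stand-ins for two registry entries. Each returns a variant
// name for a subject, or null for "don't care".
class WhitelistSelector
{
    public function __construct(private array $ids, private string $testKey) {}

    public function select(string $id): ?string
    {
        return in_array($id, $this->ids, true) ? 'on' : null;
    }
}

class WeightedSelector
{
    public function __construct(private float $percent, private string $testKey) {}

    public function select(string $id): ?string
    {
        $h = hexdec(substr(md5($this->testKey . ':' . $id), 0, 8)) / 4294967296;
        return $h * 100 < $this->percent ? 'on' : 'off';
    }
}

$selectorReg = array(
    'whitelist' => 'WhitelistSelector',
    'percent'   => 'WeightedSelector',
);

// Build one selector per recognized config key, keeping the config's order.
function buildSelectors(array $rampup, array $registry, string $testKey): array
{
    $selectors = [];
    foreach ($rampup as $key => $value) {
        if (isset($registry[$key])) {
            $class = $registry[$key];
            $selectors[] = new $class($value, $testKey);
        }
    }
    return $selectors;
}
```

Unknown keys are skipped, and the resulting list can be fed to a selector sequence in config order.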
  69. Randomized Selector
  70-73. What does it mean?
     • Independent of subject attributes
     • Independent of other tests
     • Independent of (coarse-grained) time
  74-78. Persistence
     • Better experience
     • Better data
     • Multi-part tests
     • ...but not forever
  79. Ramping up/down
     • Vary group sizes
     • Reduce risk
     • Distribute load
  80. Persistence + Ramping
     • Minimize inconsistency
     • Ramping up
        - should just add people to the treatment group
     • Ramping down
        - should just remove part of the treatment group
  81. rand()
     • Explicit persistence
        - cookie
        - DB
     • Scaling
     • Maintenance
  82-87. Hashing: variant = H(id)
     • Persistence
     • Attribute independence
     • Test independence?
  88-91. Hashing: variant = H(test id, id)
     • Persistence
     • Attribute independence
     • Test independence
     What else? Weights!
  92-93. Hashing: h = H(test id, id), variant = P(h, weights)
     • Persistence
     • Attribute independence
     • Test independence
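The hashing slides condense into a short sketch: hash the test ID together with the subject ID into a number in [0, 1), then partition that interval by the variant weights. Assumptions: MD5 truncated to its first 32 bits, the ':' separator, and the names hashUnit and pickVariant are illustrative choices.

```php
<?php
// h = H(test id, id): map (test, subject) to a stable number in [0, 1).
function hashUnit(string $testId, string $subjectId): float
{
    // First 32 bits of MD5 as an unsigned integer, scaled into [0, 1).
    return hexdec(substr(md5($testId . ':' . $subjectId), 0, 8)) / 4294967296;
}

// variant = P(h, weights): walk the cumulative weights and return the first
// variant whose slice of [0, 1) contains h. Weights need not sum to 1.
function pickVariant(float $h, array $weights): string
{
    $total = array_sum($weights);
    $cumulative = 0.0;
    foreach ($weights as $variant => $weight) {
        $cumulative += $weight / $total;
        if ($h < $cumulative) {
            return $variant;
        }
    }
    return array_key_last($weights); // guards against floating-point round-off
}
```

The same inputs always give the same h (persistence), h ignores everything about the subject except its ID (attribute independence), and including the test ID decorrelates assignments across tests. With weights ['A' => 1, 'B' => 1, 'C' => 2], C gets half of the hash range.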
  94-96. Partitioning
     [Diagram: the hash range from 0 to 1, split at a partition point of .5; variant A lies below the partition, variant B above it.]
  97. Ramping up
     [Diagram: the same hash range with the partition moved to .7, so A grows and B shrinks.]
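Because ramping only moves the partition point, going from .5 to .7 keeps everyone already in A and adds new people, which is the behavior slide 80 asks for. A sketch assuming A is the treatment occupying [0, p) of the hash range; inTreatment and the 1,000 synthetic IDs are illustrative.

```php
<?php
// Treatment A occupies [0, $p) of the hash range; ramping up just raises $p.
function inTreatment(string $testId, string $subjectId, float $p): bool
{
    $h = hexdec(substr(md5($testId . ':' . $subjectId), 0, 8)) / 4294967296;
    return $h < $p;
}

// Check: moving the partition from .5 to .7 never removes anyone from A.
$before = $after = $movedOut = 0;
for ($i = 0; $i < 1000; $i++) {
    $a = inTreatment('new_search', "u:$i", 0.5);
    $b = inTreatment('new_search', "u:$i", 0.7);
    $before += (int) $a;
    $after += (int) $b;
    if ($a && !$b) {
        $movedOut++;
    }
}
```

Ramping down from .7 to .5 is the mirror image: it only removes the slice [.5, .7) from the treatment group.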
  98. Which hash function?
     • MD5/SHA-256/...
     • Test it!
     • But be careful...
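"Test it!" can be as simple as hashing many IDs and checking that each bucket gets roughly its share. A sketch: the 10 buckets, 100,000 IDs, and 5% tolerance are arbitrary choices, and sequential synthetic IDs may not behave like real ones, which is one reason to be careful.

```php
<?php
// Count how many of $n synthetic subject IDs land in each of $buckets equal slices.
function bucketCounts(string $testId, int $n, int $buckets): array
{
    $counts = array_fill(0, $buckets, 0);
    for ($i = 0; $i < $n; $i++) {
        $h = hexdec(substr(md5($testId . ':' . $i), 0, 8)) / 4294967296;
        $counts[(int) floor($h * $buckets)]++;
    }
    return $counts;
}

$counts = bucketCounts('new_search', 100000, 10);
$expected = 100000 / 10;
// Largest relative deviation of any bucket from its expected count.
$maxDeviation = max(array_map(fn($c) => abs($c - $expected) / $expected, $counts));
```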
  99. A/B + opt-in
     • Need to separate the groups for analysis
     • Solution: use more than 2 variants!
        - act according to variant properties
        - track by variant name
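The "more than 2 variants" idea can be sketched as giving opt-in users and randomly assigned users different variant names with identical properties: the code acts on the properties, while the logs keep the groups apart. The variant names, the 'new_search' property, and variantFor() are hypothetical.

```php
<?php
// Two treatment variants behave the same but are logged separately.
$variants = [
    'control'          => ['new_search' => false],
    'random_treatment' => ['new_search' => true],
    'optin_treatment'  => ['new_search' => true],
];

// Opt-in users are recognized first; everyone else is split 50/50 by hash value $h.
function variantFor(bool $optedIn, float $h): string
{
    if ($optedIn) {
        return 'optin_treatment';
    }
    return $h < 0.5 ? 'random_treatment' : 'control';
}
```

Comparing random_treatment against control then measures the feature on an unbiased sample, while optin_treatment can be analyzed on its own.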
  100. Analysis
  101. ... Confidence interval ... something something ... Binomial ... blah blah ...
  102. Confidence intervals
     • How sure are we?
     • What if it were random?
  103-105. Binomial experiments
     HT HTTT HT H H
     T HT HTT H HT H
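For a binomial metric such as conversion rate, one common way to answer "how sure are we?" is a normal-approximation (Wald) confidence interval on the difference between two proportions. This is one method among several, it assumes reasonably large samples, and z = 1.96 corresponds to roughly 95% confidence; diffInterval is a hypothetical helper.

```php
<?php
// Wald interval for p_B - p_A, where each p is successes / trials.
function diffInterval(int $successA, int $trialsA, int $successB, int $trialsB, float $z = 1.96): array
{
    $pA = $successA / $trialsA;
    $pB = $successB / $trialsB;
    // Standard error of the difference of two independent proportions.
    $se = sqrt($pA * (1 - $pA) / $trialsA + $pB * (1 - $pB) / $trialsB);
    $diff = $pB - $pA;
    return [$diff - $z * $se, $diff + $z * $se];
}
```

With made-up counts of 200/10,000 conversions for A and 260/10,000 for B, the interval is roughly 0.0018 to 0.0102; it excludes zero, so the difference is unlikely to be pure chance at that confidence level.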
  106. Results
  107. Dashboards
  108. A few test design tips
  109-111. What's the question?
     What metrics?
     How much better?
  112. Who?
     • Different roles
     • Old vs. new
        - novelty
        - habit
        - expectation
  113. When?
     • User types vary
     • Activity patterns vary
     • Site content might vary
     • Performance might vary
     • Full weeks are often a good starting point
  114. Summary
  115. Better living through experimentation
     • More risk taking => better product
     • MTTR (mean time to recovery)
     • Lower stress
  116. You can too.
