A/B Testing Framework Design


Design documentation for engineers and startup founders on how to create an A/B testing framework in your language or MVC framework of choice.

Published in: Technology, Business

  1. A/B Testing Framework Design Issues
     Patrick McKenzie, 2010
     (This presentation is meant to be read. It is released under the Creative Commons By Attribution license. Feel free to spread it or use it.)
     www.abingo.org
     Please use or send to people who'd benefit.
  2. A/B Testing Frameworks
     • Why You Should Care
     • Core Use Scenarios
     • A/B Test Lifecycle
     • Design Decisions
     • Technical Considerations
     • API Considerations
  3. Why You Should Care
     There is a paucity of A/B testing frameworks.
     "I can probably name a dozen different systems for building high scale applications (distributed storage, message queues, caching layers, search engines, etc), but I can’t name a single AB testing framework other than Google Website Optimizer. That seems like a serious inversion of priorities for most startups."
     http://www.tomkleinpeter.com/2009/01/21/where-are-the-ab-testing-frameworks/
  4. Why You Should Care
     • A/B testing helps you validate your hypotheses about customers and product.
     • A/B testing is drop-dead easy if your tech supports it.
     • You won't do it otherwise, because it feels like boring busywork.
     "The goal is to have split-testing be a continuous part of our development process, so much so that it is considered a completely routine part of developing a new feature. In fact, I've seen this approach work so well that it would be considered weird and kind of silly for anyone to ship a new feature without subjecting it to a split-test. That's when this approach can pay huge dividends."
     (Eric Ries, in a blog post)
  5. Why You Should Care
     • There are only two decent A/B test frameworks for Rails. Both less than 9 months old.
     • There are (to the best of my knowledge) no OSS frameworks for Java, Python, etc.
     • You should write one. V1.0 can be done in 10 man-hours in modern MVC frameworks. Will be the best ROI you ever get.
     • This presentation hopes to save you time by telling you where the hard decisions are.
  6. Three Use Scenarios
     • Customers interacting with site.
     • Implementers coding A/B test.
     • Somebody interpreting results.
  7. User View of A/B Test (What Cindy Sees)
  8. User View of A/B Test (What Bob Sees)
  9. Key Points For Users
     • Users get consistent behavior. Cindy always sees her alternative. Bob always sees his.
     • A/B test doesn't break usage of site. (Sounds obvious, can be non-trivial. Test for interactions!)
     • Ending A/B test doesn't break site.
     Did you know that in Google Website Optimizer users can bookmark individual A/B alternatives because they have distinct URLs? And that after the test is over they may 404? Yeah. Don't do that.
  10. What Developers See
     • One line to add a test.
     • One line to track it.
     • No thought required beyond creating alternatives.
  11. What Internal Customers See
     • Simple, clear, actionable results.
     • Stats 101 not required.
     Your marketing team might know math. That doesn't mean they should have to.
  12. A/B Test Lifecycle
     • Come up with alternatives.
     • Code alternatives.
     • Test alternatives.
     • Deploy to site.
     • Users interact with alternatives.
     • Analyze results.
     • End test.
     When designing your A/B testing framework, keep in mind that you'll be doing all of the above. Eliminate as much friction from each step as possible: this decreases total time through the loop.
  13. Come up with alternatives.
     • Not generally a technical problem.
     • Inspiration can come from anywhere: a blog post, a passing fancy, customer comments.
     • Should never have to say "We can't do that!"
     • Strong recommendation: If we pay your salary, you are authorized to test.
     Customers do not think in terms of Model/View/Controller interfaces. They just want to know what the app can do. You should be able to A/B test from any point in the app.
  14. Code Alternatives
     • Programming is hard, but you have to do it anyway.
     • Programming A/B tests is easy: a one-liner and an if statement.
     • Testing framework handles all bookkeeping. Programmers never care.
     • Re-use conversion code. Typical businesses have lots of tests, few defined conversions. No need to reinvent the wheel every single time.
  15. Test Alternatives
     • A/B tests are live code. They can have bugs. You should be able to unit test like normal.
     • Helpful for developers to have access to quick "switch what test I'm seeing" functionality. Simplest example: manually add parameter to URL (&exampleTest=altA). Turn off feature in production.
     • Careful of test interactions. They are very easy to create once you start testing behavior in addition to display.
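The URL-parameter override can be sketched in a few lines. This is an illustration only: `alternative_to_show`, the `query_params` dict (parsed from the URL), and the `allow_override` flag are hypothetical names, not part of any framework mentioned in the slides.

```python
def alternative_to_show(test_name, alternatives, assigned, query_params, allow_override):
    """Return the alternative to render for this request.

    assigned       -- the alternative the framework already picked for this user
    query_params   -- dict of URL parameters, e.g. {"exampleTest": "altA"}
    allow_override -- should be False in production, so visitors cannot force a variant
    """
    forced = query_params.get(test_name) if allow_override else None
    # Ignore unknown values so a typo in the URL never breaks the page.
    return forced if forced in alternatives else assigned
```

With `allow_override=True`, requesting a page with `&exampleTest=altA` shows `altA` regardless of assignment. With the flag off, the parameter is ignored.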
  16. Deploy to site.
     • Avoid pointless work here. "Push code live, test starts automatically" is the ideal.
     • Testing framework should handle its own setup the first time a test is called. After that, re-use.
     • Note this decision is going to be made thousands or hundreds of thousands of times, possibly right after you push live: consider performance implications.
     • Can make code default to old version, control start/stop of test via dashboard. Could be worth it, adds complexity.
  17. Users interact with alternatives.
     • Happily, this takes very little work for you...
     • … except when it creates Heisenbugs.
     • In addition to thorough testing, make sure your "What The User Is Seeing" feature (you have one, right?) reflects their A/B tests.
  18. Analyze results.
     • Stats behind A/B tests may not be well understood. Impress on people that the stats are real, measured, and actionable. It doesn't matter if they think it is magic as long as they trust the magic.
     • Do significance testing so it isn't magic.
     • Doing significance testing is grunt work: let the computer do it.
     • Spend the extra time to make internal dashboard pretty. People trust pretty things.
     • A/B tests are not a good place to dig for data. One glance tells you all you need.
  19. End test
     • Simple solution: rip code out, test stops.
     • Simple solution requires redeploy. In event of bug or strong test result ("Oh my God what were we thinking!?!") might want immediate end button on dashboard. Be able to specify alternative.
     • Automatic end of test? Probably a misfeature, but easy to implement.
     • Ending test should switch all users to winner (or else you get to support old tests until doomsday). However, users have memories.
     • Negatively affected users (e.g. you end test in favor of higher price, user planning on buying later saw lower price) may be mad. Not a big problem, but be ready.
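One way to picture the "end button": the framework keeps a record of ended tests and their winners, and checks it before anything else. The sketch below uses a plain dict as a stand-in for whatever store the dashboard would write to. The names `ended_tests`, `end_test`, and `alternative_for` are made up for illustration.

```python
# test name -> winning alternative; hypothetical stand-in for dashboard state
ended_tests = {}

def end_test(test_name, winner):
    """Called by the dashboard's end button; takes effect without a redeploy."""
    ended_tests[test_name] = winner

def alternative_for(test_name, assigned):
    """Everyone sees the winner once a test has ended, else their own assignment."""
    return ended_tests.get(test_name, assigned)
```

The test code can then be ripped out at leisure in a later deploy.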
  20. Design Considerations
     • Tracking and managing identity.
     • How to choose alternatives by identity.
     • Where to store test participation.
     • Where to store alternatives.
     • Stats is hard, let's go shopping.
     • Presenting results.
  21. Tracking Identity
     • Cindy is Cindy, Bob is Bob, Cindy should always see Cindy's tests.
     • Cindy is not a cookie. Cindy is not a session. Cindy is freaking Cindy. Even when she is on a different computer.
     • You already have identity via user authentication. Probably want to punt identity problem there. Have it inform framework of current user identity.
     • Important edge case: new user signup should persist “identity” from anonymous visitor to identifiable user.
  22. Tracking Identity
     • Easiest identity is random number thrown into cookie. Associate with user accounts. Restore on login. Bam, done.
     • However, you will occasionally have A/B test conversions outside of Cindy's HTTP cycle. (e.g. Purchase notification comes from Paypal, not from Cindy. Cindy calls up to place order.) Think it through. Not terribly difficult if you plan for it.
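A rough sketch of the cookie-plus-account approach, assuming cookies are available as a mutable dict and using an in-memory dict where a real system would use its user table. `IdentityTracker`, the `ab_identity` cookie name, and the method names are all hypothetical.

```python
import uuid

class IdentityTracker:
    def __init__(self):
        # user id -> A/B identity; stand-in for a column on the users table
        self.identity_by_user = {}

    def identity_for_visitor(self, cookies):
        """Anonymous visitors get a random identity stored in a cookie."""
        if "ab_identity" not in cookies:
            cookies["ab_identity"] = uuid.uuid4().hex
        return cookies["ab_identity"]

    def on_signup(self, user_id, cookies):
        """Carry the anonymous identity over to the new account."""
        identity = self.identity_for_visitor(cookies)
        self.identity_by_user[user_id] = identity
        return identity

    def on_login(self, user_id, cookies):
        """Restore the account's identity, even on a different computer."""
        identity = self.identity_by_user.get(user_id)
        if identity is None:
            return self.on_signup(user_id, cookies)
        cookies["ab_identity"] = identity
        return identity
```

Because the identity lives on the account, a conversion that arrives outside the user's HTTP cycle (such as a payment notification) can look it up by user id instead of by cookie.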
  23. How To Choose Alternatives
     • If you have N alternatives, picking randomly and persisting it by identity works decently.
     • Another approach: MD5(identity) % number_of_alts. Saves space (marginally).
     • Don't need to save what test Cindy is seeing as long as you can reproduce it.
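The hashing approach might look like the following sketch. `pick_alternative` is an illustrative name; the only thing taken from the slide is the MD5(identity) % number_of_alts formula.

```python
import hashlib

def pick_alternative(identity, alternatives):
    """Deterministically map an identity to one of the alternatives."""
    digest = hashlib.md5(str(identity).encode("utf-8")).hexdigest()
    # Interpret the hex digest as one big integer, then reduce modulo N.
    return alternatives[int(digest, 16) % len(alternatives)]
```

One thing to weigh: hashing the identity alone puts a given user at the same index in every test that has the same number of alternatives. Mixing the test name into the hashed string is a common way to avoid that, though it goes beyond what the slide specifies.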
  25. Where to store test participation
     • Cookie/session bad idea: Cindy will log in at work tomorrow. She should see consistent behavior.
     • Cache (memcached) possible, but if Cindy is evicted from cache or cache resets, tough for Cindy and tough for you.
     • Persistent data store best bet. Will talk about specific data stores later in slides.
  26. Where to store alternatives
     • Many approaches. Whatever works for you.
     • A/Bingo puts alternatives directly in code. Easiest place, always right in front of developer, no conceptual overhead.
     • Vanity puts alternatives in special experiment files. Arguably cleaner code, but have to context-switch.
     • Google Website Optimizer has you define alternatives on a web form. Great for marketing department at insurance company. Don't do this. Greatly limits possibilities, increases integration work, blows testing to heck and back.
  27. Doing Stats
     • If possible, call out to dedicated stats modules/libraries to do stats.
     • Many types of possible stats for A/B testing. Pick one, stick with it. I use Z-scores because a) I remember them and b) implementation was drop-dead easy.
     • Sadly, Ruby lacks many good stats libraries. Oh, to be a Perl programmer...
     • This subject is worth its own presentation. See Ben Tilly: http://elem.com/~btilly/effective-ab-testing/
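For a sense of how little code a Z-score takes, here is a sketch of a standard two-proportion z-test with a pooled standard error. The slides do not say which exact formula the author uses, so treat this as one common textbook variant, not as A/Bingo's implementation. The function names are illustrative.

```python
import math

def z_score(conversions_a, participants_a, conversions_b, participants_b):
    """Two-proportion z-score (pooled); positive when A converts better than B."""
    rate_a = conversions_a / participants_a
    rate_b = conversions_b / participants_b
    pooled = (conversions_a + conversions_b) / (participants_a + participants_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / participants_a + 1 / participants_b))
    if std_err == 0:
        return 0.0  # nobody (or everybody) converted: nothing to distinguish
    return (rate_a - rate_b) / std_err

def two_sided_p_value(z):
    """Probability of a |z| at least this large under the null hypothesis."""
    return math.erfc(abs(z) / math.sqrt(2))
```

For example, 60 conversions out of 1,000 against 40 out of 1,000 gives a z-score of roughly 2.05, which is just past the conventional 95% threshold of 1.96.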
  28. Presenting Results
     • Text is easy! Graphs not quite.
     • Google's confidence bars are sexy... and pretty useless.
     • Use simple, human language to describe what confidence intervals and statistical significance mean.
     • De-emphasize null results (A > B but not statistically significantly so) but don't hide them. (After all, the fact that "this test was too close to call" tells you something useful.)
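A sketch of turning a p-value into a sentence a non-statistician can act on. The thresholds (0.01, 0.05, 0.10) and the wording are illustrative choices, and `describe_result` is a hypothetical helper. The "X% confident" phrasing is a deliberate simplification for a dashboard, not a rigorous statement.

```python
def describe_result(name_a, rate_a, name_b, rate_b, p_value):
    """Summarize a two-alternative test in one plain-English sentence."""
    leader, trailer = (name_a, name_b) if rate_a >= rate_b else (name_b, name_a)
    if p_value < 0.01:
        confidence = "99%"
    elif p_value < 0.05:
        confidence = "95%"
    elif p_value < 0.10:
        confidence = "90%"
    else:
        # Null result: still reported, just without a call to action.
        return (f"Too close to call: {leader} is ahead of {trailer}, "
                "but the difference could easily be chance.")
    return f"{leader} is beating {trailer}, and we are {confidence} confident the difference is real."
```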
  29. Technical Considerations
     • Less than 1,000 visitors per hour? Skip these slides.
     • A/B testing turns performance assumptions on their head: heavy writes in very bursty fashion ("as soon as test goes live"), very non-relational data, fairly infrequent reads (~3X writes on my site), extraordinarily infrequent use of summary statistics.
     • Practically tailor-made for key/value store, not so much for SQL.
  30. Queries You Have To Answer FAST
     • Who is Cindy? (user → identity)
     • Is Cindy participating in Test X?
     • If so, what alternative has she seen?
     • If not, what alternative should she see?
     • Record fact that Cindy is participating in Test X.
     • Has Cindy converted in Test X?
     • Record fact that Cindy converted for Test X.
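Each of these questions maps onto a single key lookup or write. The sketch below uses a Python dict as a stand-in for a key/value store. The key layout (`participation:<test>:<identity>` and so on) and the class and method names are assumptions made for illustration.

```python
class ParticipationStore:
    def __init__(self):
        self.kv = {}  # stand-in for a Redis- or memcached-style store

    def alternative_for(self, identity, test, alternatives, choose):
        """Answer 'is Cindy in Test X, and what does she see?' with one key."""
        key = f"participation:{test}:{identity}"
        if key not in self.kv:
            # First visit: pick an alternative and record participation.
            self.kv[key] = choose(identity, alternatives)
            self._increment(f"participants:{test}:{self.kv[key]}")
        return self.kv[key]

    def convert(self, identity, test):
        """Record a conversion at most once per identity per test."""
        seen_key = f"participation:{test}:{identity}"
        done_key = f"converted:{test}:{identity}"
        if seen_key in self.kv and done_key not in self.kv:
            self.kv[done_key] = True
            self._increment(f"conversions:{test}:{self.kv[seen_key]}")

    def _increment(self, key):
        self.kv[key] = self.kv.get(key, 0) + 1
```

The summary counters are updated on the write path, so no query ever has to scan participants.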
  31. Queries You Can Answer Leisurely
     • How many people have participated in Experiment X?
     • How many saw Alternative A?
     • Umm, do that stats magic for me.
  32. Query You Will NEVER ASK
     • Who saw Alternative A in Experiment X?
  33. Possible Architectures
     • Summary statistics (participant counts & conversion counts) in MySQL table with "fairly few" rows. Simple increment statements for updates.
     • Participation information (Cindy, Experiment X, Alternative A) in key/value store.
     • Or whole thing in key/value store.
  34. Quick Speed Improvement for SQL
     • Give each of your alternatives a unique string ID like MD5(experiment name, alternative name). Calculate that in application code. Index on column.
     • UPDATE alternatives SET participants = participants + 1 WHERE lookup_code = 'CALCULATED IN APPLICATION';
     • This avoids having to translate human name in code to ID in table. (Or having to use multi-column index for lookup.)
     • Note: I am not a very good guy with DBs, but I am informed this is fairly fast. Test for yourself.
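A runnable sketch of the lookup-code idea. It uses Python's built-in `sqlite3` with an in-memory database purely as a stand-in for MySQL. The table layout, the `/` separator inside the hash, and the helper names are assumptions.

```python
import hashlib
import sqlite3

def lookup_code(experiment, alternative):
    """Compute the row key in application code, as the slide suggests."""
    return hashlib.md5(f"{experiment}/{alternative}".encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE alternatives ("
    " lookup_code TEXT PRIMARY KEY,"  # primary key doubles as the index
    " participants INTEGER NOT NULL DEFAULT 0,"
    " conversions INTEGER NOT NULL DEFAULT 0)"
)

def register(experiment, alternative):
    """Create the summary row the first time an alternative is seen."""
    conn.execute(
        "INSERT OR IGNORE INTO alternatives (lookup_code) VALUES (?)",
        (lookup_code(experiment, alternative),),
    )

def record_participant(experiment, alternative):
    """Single-row increment keyed on the precomputed code."""
    conn.execute(
        "UPDATE alternatives SET participants = participants + 1 WHERE lookup_code = ?",
        (lookup_code(experiment, alternative),),
    )
```

`INSERT OR IGNORE` is SQLite syntax. On MySQL the equivalent would be `INSERT IGNORE`.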
  35. Specific Key/Value Store Recommendations
     • MySQL with big string columns for key, value: ewwwwww. I mean, ewwwwww.
     • Memcached: Acceptable (and fast) but not persistent. Also tends to only go down when server does. For A/B testing, might just re-run all in-progress tests if it dies.
     • MemcacheDB: Tried it. Has unacceptable performance when BerkeleyDB flushes to disk. (5 seconds+!)
     • Redis: Tried it. Not in production yet. My recommendation: very fast. Vanity also uses it.
  36. API Considerations
     Only need to expose two methods:
     • ab_test(name, alternatives, conversion_name)
     • conversion(conversion_name)
     Note lack of identity in method calls. Let the framework worry about that.
     How you specify alternatives is up to you. Array of strings is easy to understand.
  37. Consuming API
     ab_test(name, alternatives, conversion_name) returns the chosen alternative, handles all bookkeeping as side effect.
     Typically:
         if (ab_test(...) == "something") {
           # do something
         } else {
           # do something else
         }
     Fun opportunity for blocks/binding if your language supports that.
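To make the two-method API concrete, here is a small in-memory sketch. It is not A/Bingo's or Vanity's implementation. The `ABFramework` class, its attribute names, and the choice to hash the test name together with the identity are all assumptions for illustration. The point is that callers never pass an identity: the application's auth layer sets it once per request.

```python
import hashlib
from collections import Counter

class ABFramework:
    def __init__(self):
        self.identity = None           # set per request by the auth layer
        self.assignments = {}          # (identity, test name) -> alternative
        self.tests_by_conversion = {}  # conversion name -> set of test names
        self.converted = set()         # (identity, test name) already counted
        self.participants = Counter()  # (test name, alternative) -> count
        self.conversions = Counter()   # (test name, alternative) -> count

    def ab_test(self, name, alternatives, conversion_name=None):
        """Return this user's alternative; bookkeeping happens as a side effect."""
        key = (self.identity, name)
        if key not in self.assignments:
            digest = hashlib.md5(f"{name}:{self.identity}".encode("utf-8")).hexdigest()
            choice = alternatives[int(digest, 16) % len(alternatives)]
            self.assignments[key] = choice
            self.participants[(name, choice)] += 1
            self.tests_by_conversion.setdefault(conversion_name or name, set()).add(name)
        return self.assignments[key]

    def conversion(self, conversion_name):
        """Score every test tied to this conversion, once per user per test."""
        for name in self.tests_by_conversion.get(conversion_name, ()):
            key = (self.identity, name)
            if key in self.assignments and key not in self.converted:
                self.converted.add(key)
                self.conversions[(name, self.assignments[key])] += 1
```

Because several tests can share one `conversion_name`, a single `conversion("purchase")` call at checkout scores all of them, which is the conversion re-use the earlier slides argue for.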
  38. Got Questions?
     Great A/B testing resources:
     • Eric Ries (startuplessonslearned.com): heavy on motivation, less on stats/design decisions
     • #abtests and @abtests on Twitter. Good community, many ideas for inspiration.
     • http://abtests.com: ditto
     • http://www.bingocardcreator.com/abingo/resources: links I use when I forget the math.
     • http://www.kalzumeus.com: my blog
     • patrick@bingocardcreator.com: I'm always happy to chat about A/B testing, with anybody. Potentially available for consulting.