AB Testing at Expedia



  1. 1. AB Testing: Revolution through constant evolution
  2. 2. Expedia SF 114 Sansome @expediaeng Work with us:
  3. 3. Paul Lucas, Sr Director, Technology (wants to visit next: Greece); Jeff Madynski, Director, Technology (wants to visit next: Croatia); Anuj Gupta, Sr Software Dev Engineer (wants to visit next: Peru)
  4. 4. Revolution through constant evolution
  5. 5. Technology Evolution
     • V0: batch processing from Abacus exposure logs, Omniture, and booking datamart; Tableau visualization
     • V1: Storm, Kestrel, DynamoDB / PostgreSQL reading UIS messages and client log data (Nov 2014 - Dec 2015)
     • V2: introduce Kafka and Cassandra (May 2016)
  6. 6. TNL – original solution
     • Batch processing
     • Tableau visualization
     • Merged data from OMS/Omniture
     • Problems:
       – 1-2 day feedback loop
       – What if the test implementation had mistakes (bucketing not as anticipated)?
       – Fixing data import errors meant starting over again
  7. 7. TNL Dashboard v0 (diagram labels): Omniture click data, booking datamart, Abacus exposures, Tableau, Hadoop ETL
  8. 8. TNL v0 -> v1
  9. 9. Begin Jeff delete this page
  10. 10. TNL v1 Problems
      • Database size: 420 GB; queries took 3-5 minutes
      • Data drops (Kestrel)
      • Increase in data (multi-brand, more customers)
  11. 11. TNL v1->v1.1, v2 • Fighting fires, borrowing more time • POC next
  12. 12. Fighting fires – borrowing more time
  13. 13. User Interaction Service (UIS) Traffic
  14. 14. Scaling messaging system
      Kafka
      • Publish-subscribe based messaging system
      • Distributed and reliable
      • Longer retention and persistence
      • Monitoring dashboard and alerts
      • Buffer for system downtime
      Kestrel limitations
      • Message durability is not available
      • Approaching potential scalability issues
      • Inactive open source project
  15. 15. Scaling database performance
      • Database views for caching
        – Views created every 6 hours
        – UI only loads data from views
        – Read-only replicas for SELECT queries
      • Archive data
        – Moved old and completed experiment data to separate tables
        – DB cleanup using VACUUM and re-indexing
  16. 16. TNL Dashboard v2
  17. 17. Product Demo
  18. 18. Streaming
  19. 19. • Column-oriented, time-series schema • Time-to-live (TTL) on data • Only store the most popular aggregates
  20. 20. v1 vs v2
      • New architecture
        – More scalable
        – More responsive
        – Less prone to data loss
      • Lessons learnt
        – A system is only as fast as its slowest component
        – Fault tolerance and resilience
        – Partition data
        – Pre-production environment
  21. 21. Questions/discussion
  22. 22. APPENDIX
  23. 23. Apply statistical power to test results
      At a 90% confidence level, 1 out of 10 tests will be a false positive or negative.
                    Heads   Tails
      Right hand      51      49
      Left hand       49      51
      Right hand is superior at getting heads!
  24. 24. Do's and Don'ts when concluding tests
      • Don't call a test too early; this increases false positives or negatives
      • Don't call a test as soon as you see positive results, because test results frequently go up and down
      • To claim a test Winner/Loser, the positive/negative effect has to hold for at least 5 consecutive days and the trend has to be stable
      • Please note this type of chart is not currently available in the Test and Learn dashboard or SiteSpect UI; the shape of the confidence interval lines varies test by test
      • Define one success metric and run tests for a pre-determined duration (for hotel/flight tests in the US, suggest running until the confidence interval of conversion change is within +/- 1%); tests should run at least 10 days
      • Don't assume the midpoint (observed % change during the test period) will hold true after the feature is rolled out: a 4.0% +/- 4.0% test may have zero impact and may not be much better than a 1.0% +/- 1.0% test
      • Don't call an inconclusive test "trending positive" or "trending negative", as test results fluctuate
      • Contact the ARM testing team with questions
      Using 90% confidence level
      • Winner: lower bound of % change >= 0 (or probability of test being positive >= 95%)
      • Loser: upper bound of % change <= 0 (or probability of test being negative >= 95%)
      • Else: Inconclusive or Neutral
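
The Winner/Loser/Inconclusive rule above can be sketched in a few lines of Python. This is a minimal illustration, not how the Test and Learn dashboard computes its numbers: the slides do not say which interval method is used, so the sketch assumes a normal approximation on the log of the conversion-rate ratio, and the names lift_interval and call_test are hypothetical. A two-sided 90% interval leaves 5% in each tail, which is why a lower bound at or above zero lines up with a 95% probability of the test being positive.

```python
import math

# Two-sided 90% normal critical value (leaves 5% in each tail).
Z_90 = 1.6448536269514722


def lift_interval(control_successes, control_n, variant_successes, variant_n, z=Z_90):
    """Return (lift, lower, upper) as relative % change of variant vs control.

    Assumption: normal approximation on the log of the rate ratio; the
    slides do not state how the dashboard derives its interval.
    """
    if min(control_successes, variant_successes) <= 0:
        raise ValueError("need at least one success in each group")
    if control_successes > control_n or variant_successes > variant_n:
        raise ValueError("successes cannot exceed sample size")
    p_c = control_successes / control_n
    p_v = variant_successes / variant_n
    log_ratio = math.log(p_v / p_c)
    # Standard error of log(p_v / p_c) via the delta method.
    se = math.sqrt((1 - p_v) / variant_successes + (1 - p_c) / control_successes)
    lower = math.exp(log_ratio - z * se) - 1
    upper = math.exp(log_ratio + z * se) - 1
    return 100 * (p_v / p_c - 1), 100 * lower, 100 * upper


def call_test(lower, upper):
    """Apply the slide's rule to the bounds of the % change interval."""
    if lower >= 0:
        return "Winner"
    if upper <= 0:
        return "Loser"
    return "Inconclusive"


# Coin-flip example from the appendix: right hand 51 heads, left hand 49 heads,
# 100 flips each, with the left hand treated as the control.
lift, lower, upper = lift_interval(49, 100, 51, 100)
print(f"{lift:+.1f}% [{lower:+.1f}%, {upper:+.1f}%] -> {call_test(lower, upper)}")
```

Under these assumptions the coin-flip data shows an observed lift of about 4%, but the interval runs from roughly -17.5% to +31%, so the rule returns Inconclusive: the "superior" right hand is noise. The sketch covers only the interval check; the guidance on minimum duration and on the effect holding for 5 consecutive days still has to be applied separately.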