
N=10^9: Automated Experimentation at Scale

Wojciech Galuba, Decision Tools Lead, Facebook

Experimentation is a valuable tool for supporting product decisions, iterating on features and gaining actionable insights into people's behavior.

In this session Wojciech Galuba, Data Scientist at Facebook, presents an overview of Facebook's experimentation framework and how it is used to make day-to-day data-driven decisions at global scale.

The talk focuses on the challenges of building and scaling the analytics infrastructure, designing the tools for ease-of-use and ensuring broad adoption of sound experimentation methodologies across all the teams.

Published in: Technology, Business

N=10^9: Automated Experimentation at Scale

  1. N=10^9: Automated Experimentation at Scale. Wojciech Galuba, Decision Tools Lead, Facebook. @wgaluba
  2. N=10^9: Automated Experimentation at Scale. Wojtek Galuba (wgaluba@fb), Decision Tools Team Lead, Data Science Infrastructure, Facebook
  3. History of Data Science Infra at FB • Founded April 2012 • A group of data scientists and software engineers • Experienced firsthand the need for better infrastructure • Need continues to grow • Team doubled over the past year • Expect continued rapid growth this year
  4. Why do we experiment?
  5. Experimentation: product changes → metrics; experiments study this link
  6. Experiment to: Catch problems before they arise
  7. Experiment to: Choose between multiple options
  8. Experiment to: Challenge intuitions about product
  9. Experiment to: Not only evaluate ideas but generate new ones
  10. Challenges
  11. Many experiments • Experiments running in parallel • Modifying many different aspects of the product • Overlaps are possible and may conflict
  12. Many metric dimensions • Different contexts of user actions • Thousands of device types • Geography • Demographics • Time • Enormous space of possible questions
  13. Many teams • Many ways to run an experiment • Diverse audience for results • Huge set of results from every experiment • Many ways to interpret results
  14. Experimentation at Facebook
  15. An experiment
  16. QuickExperiment: divide people randomly into groups, e.g. color: blue, size: medium; color: blue, size: big; color: green, size: medium
  17. QuickExperiment • Centralized experiment management • Purely config-level: no code pushes to iterate • Automatic exposure logging
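The QuickExperiment properties listed above can be sketched in a few lines: assignment is a pure function of a config plus a hash, and every assignment is logged as an exposure. This is a minimal illustration, not Facebook's actual API; the config shape and names (`EXPERIMENT`, `assign`, `EXPOSURE_LOG`) are hypothetical.

```python
import hashlib

# Hypothetical config-driven experiment: iterating means editing this
# dict, not pushing code.
EXPERIMENT = {
    "name": "button_test",
    "params": {"color": ["blue", "green"], "size": ["medium", "big"]},
}

EXPOSURE_LOG = []  # in production this would feed a logging pipeline

def assign(experiment, user_id):
    """Deterministically pick each parameter value and log the exposure."""
    assignment = {}
    for param, choices in experiment["params"].items():
        digest = hashlib.sha1(
            f"{experiment['name']}.{param}.{user_id}".encode()
        ).hexdigest()
        assignment[param] = choices[int(digest, 16) % len(choices)]
    # Automatic exposure logging: analysis can later be restricted to
    # users who were actually exposed to the experiment.
    EXPOSURE_LOG.append((experiment["name"], user_id, assignment))
    return assignment
```

Because the hash is keyed on experiment name, parameter, and user id, the same user always lands in the same group across requests and machines, with no coordination needed.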
  18. PlanOut
  19. PlanOut • Open sourced: http://facebook.github.io/planout/ • Flexible experimental design • Full, programmatic control over param values
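"Full, programmatic control over param values" means the design is ordinary code, so one parameter's distribution can depend on another value or on user attributes. The sketch below imitates that style without depending on the PlanOut library; `weighted_choice` and the experiment logic are illustrative assumptions, not PlanOut's actual API.

```python
import hashlib

def _uniform(salt, unit, buckets=10**6):
    """Deterministic pseudo-random draw in [0, 1) keyed on salt and unit."""
    digest = hashlib.sha1(f"{salt}.{unit}".encode()).hexdigest()
    return (int(digest, 16) % buckets) / buckets

def weighted_choice(choices, weights, salt, unit):
    """Pick a choice with the given weights, deterministically per unit."""
    r = _uniform(salt, unit) * sum(weights)
    for choice, weight in zip(choices, weights):
        r -= weight
        if r < 0:
            return choice
    return choices[-1]

def assign(userid, is_new_user):
    """Programmatic design: new users get the treatment more often."""
    weights = [0.2, 0.8] if is_new_user else [0.5, 0.5]
    return {
        "banner": weighted_choice(["old", "new"], weights,
                                  salt="banner_exp", unit=userid)
    }
```

In the real PlanOut framework the same idea is expressed through its random assignment operators, with the unit of randomization passed explicitly, so designs stay both flexible and reproducible.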
  20. Experiment evaluation: exposures and metrics yield the % change from control to test (e.g. in posts), reported at 95%, 99%, and 99.9% confidence
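The "% change from control to test" with a confidence band can be computed along these lines. This is a standard normal-approximation sketch for a rate metric, not the talk's actual evaluation pipeline; for the stated confidence levels, use z = 1.96 (95%), 2.576 (99%), or 3.291 (99.9%).

```python
import math

def lift_confidence_interval(control_hits, control_n, test_hits, test_n, z=1.96):
    """Percent change from control to test, with an approximate CI.

    Uses the normal approximation for the difference of two proportions;
    the interval is scaled by the control rate to express it as % lift.
    """
    p_c = control_hits / control_n
    p_t = test_hits / test_n
    lift = (p_t - p_c) / p_c * 100.0
    # Standard error of the difference in proportions.
    se = math.sqrt(p_c * (1 - p_c) / control_n + p_t * (1 - p_t) / test_n)
    lo = (p_t - p_c - z * se) / p_c * 100.0
    hi = (p_t - p_c + z * se) / p_c * 100.0
    return lift, (lo, hi)
```

If the interval at the chosen confidence level excludes zero, the change is statistically distinguishable from no effect; how wide the interval is feeds directly into the decision-risk assessment on the next slide.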
  21. Assess decision risk using the 95%, 99%, and 99.9% confidence intervals
  22. Lessons learned
  23. Computing answers to an exponential number of possible questions is a balancing act: pre-compute (low specificity, low dimensionality, long-term) vs. compute on-the-fly (high specificity, high dimensionality, short-term)
  24. Tackling many dimensions: two sets of tools, one for exploration and one for extraction
  25. Automated exploration
  26. Enforce a lifecycle; in particular: clear experiment end dates
  27. Why lifecycle policy? • Unifies methodology across teams • Prevents tech debt buildup • Minimizes bad impact on product
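Because experiment configs are centrally managed, a lifecycle policy like the one above can be enforced mechanically at config-validation time. A minimal sketch, with hypothetical config fields and function name:

```python
from datetime import date

def validate_lifecycle(config, today=None):
    """Reject experiment configs that violate the lifecycle policy.

    Every experiment must declare an end date, and expired experiments
    are flagged so they get cleaned up instead of accumulating as tech debt.
    """
    today = today or date.today()
    if "end_date" not in config:
        raise ValueError("every experiment must declare an end date")
    if config["end_date"] < today:
        raise ValueError(
            f"experiment '{config['name']}' has ended; remove or renew it"
        )
    return True
```

Running such a check on every config change (or as a periodic sweep) is one way the uniform methodology and tech-debt prevention mentioned above can be automated rather than left to convention.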
  28. Ease of rapid iteration; safe and scientifically valid iteration
  29. Fast, but not too fast • Novelty effect vs. top engaged users bump • Understand if waiting helps
  30. Ensure mutual exclusion; across platforms, features and infra
  31. Why mutual exclusion? • Fewer experiment conflicts • Lower metrics variance
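One common way to get mutual exclusion, sketched below: experiments that could conflict share a "layer" keyed by a single hash, and each experiment claims a disjoint range of buckets, so no user is ever in two of them at once. The layer name, experiment names, and traffic splits here are illustrative assumptions.

```python
import hashlib

NUM_BUCKETS = 1000

def bucket(user_id, layer_salt):
    """Map a user to one of NUM_BUCKETS buckets, per layer."""
    digest = hashlib.sha1(f"{layer_salt}.{user_id}".encode()).hexdigest()
    return int(digest, 16) % NUM_BUCKETS

# Experiments in the same layer claim disjoint bucket ranges, which
# enforces mutual exclusion by construction.
UI_LAYER = {
    "color_exp": range(0, 100),     # 10% of users
    "layout_exp": range(100, 250),  # 15% of users
}

def experiment_for(user_id):
    """Return the single UI-layer experiment this user is in, if any."""
    b = bucket(user_id, layer_salt="ui_layer")
    for name, buckets in UI_LAYER.items():
        if b in buckets:
            return name
    return None  # user is in no experiment in this layer
```

Keeping conflicting experiments in one layer also lowers metric variance for each of them, since no treatment group is contaminated by another experiment's treatment.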
  32. Exposure-log everything • Measure effects on the exposed only • Condition analyses on the time since last exposure
  33. The culture: Experimentation gives focus; but watch out for tunnel vision!
  34. The culture: Cultivate sound practices; safe and low-impact experimentation
  35. The culture: Educate on data interpretation; uniform decision-making across teams
  36. Understanding uncertainty: “Robust misinterpretation of confidence intervals”, Rink Hoekstra et al., Psychonomic Bulletin & Review • Only 3% of scientists got all 6 answers right... • How do we educate the users of the tools?
  37. The three stages of experimentation infrastructure
  38. Stage 1: Artisanal (photo credit: Abhisek)
  39. Stage 2: Power tools
  40. Stage 2: Power tools
  41. Stage 3: Industrialized (photo credit: Steve Jurvetson)
  42. Conclusions: Empower, but don’t overwhelm
  43. Conclusions: Filter and automate, but maintain broad focus
  44. Conclusions: Clean data and powerful tools are great, but building the right experimentation culture is equally important
  45. N=10^9: Automated Experimentation at Scale. Wojciech Galuba, Decision Tools Lead, Facebook. @wgaluba
