Lean Experimentation

Talk given on lean experimentation in research and practice at Cornell Information Science.

Published in: Automotive, Technology, Business

Lean Experimentation

  1. Lean Experimentation: How to leverage online experiments in research and practice. Thomas Høgenhaven, Twitter: @thogenhaven. Cornell IS Breakfast Talk, April 4th, 2012. Friday, April 6, 12
  2. Agenda: 1. Conducting Online Experiments 2. Experimentation Literature 3. Experimentation in SMEs and Government Today 4. Lean Experimentation
  3. Part 1: Conducting Online Experiments
  4. The Why Bother Question: “While some social scientists engage in small-scale controlled experimentation with dozens of users or groups, the capacity to perform large-scale interventions with thousands of users opens up new opportunities for research.” (Preece and Shneiderman 2009: 25).
  5. What I Mean By Online Experiments: In online experiments, we are interested in examining online behavior, not just using the internet as a means to examine offline behavior.
  6. What I Mean By Online Experiments (diagram): Users are split across Variation A, Variation B, ... Variation n (independent variable). Online behavior is measured for each variation (dependent variable). A statistical test checks for a difference.
  7. The High-Level Experimental Process (Thomke 1998: 745).
  8. Example: Experimentation At Microsoft. Guess which one performs better in each of these 8 pairs. Anyone getting 6/8 right wins a t-shirt.
  9. Experimenting At Microsoft: Eight A/B pairs (1 to 8). For each pair: which one is significantly better? [ ] A [ ] B [ ] None of them. Source: Kohavi et al. (2009), Online Experimentation at Microsoft.
  10. Experimenting At Microsoft: 0 / 200 Microsoft employees got more than 5 / 8 answers right. Source: Kohavi et al. (2009), Online Experimentation at Microsoft.
  11. What Is The Effect Of Experiments? Improvement: 33%. No effect: 33%. Disimprovement: 33%. Source: Kohavi et al. (2009), Online Experimentation at Microsoft.
  12. Is That Just Microsoft Being Microsoft? No. Estimating the effects of changes is incredibly hard. Netflix considers 90% of what they try to be wrong.
  13. It’s Actually Hard To Predict: https://whichtestwon.com/past-tests
  14. Part 2: Experimental Literature
  15. Current Experimental Framework in HCI (diagram labels): Psychology & Social Psychology; Experimental methodology literature; HCI
  16. Offline And Online Experiments: • Psychology literature sometimes uses the internet to study human behavior • But it does not use the internet to study the internet
  17. For example... No mentions of experimentation in online environments (2010)
  18. Offline And Online Experiments (grid): Laboratory and Field against Offline and Online.
  19. Offline And Online Experiments (grid): Laboratory and Field against Offline and Online. Offline: psychology covers this.
  20. Offline And Online Experiments (grid): Laboratory and Field against Offline and Online. Offline: psychology covers this. Online: but not this.
  21. The Research There Is, Is Not Systematic: "To the extent of our knowledge, no research has so far been reported on treating online test design and implementation in a systematic manner" (Cámara and Kobsa 2009: 18).
  22. Online Experiments In Academia: CHI and CSCW use experiments all the time, but more can be invested in methodology literature. This will help explore the possibilities and limitations of online experimentation.
  23. Part 3: Experimentation In SMEs And Government Agencies Today
  24. State Of The Art In Industry Today: • Experimentation is increasing • At least 25 different software vendors • $0 - $320,000 a year (source: whichmvt.com)
  25. Practice Has Its Own Literature
  26. Website Experiments: Several ways to conduct experiments: 1. Server-side / Client-side 2. A/B Test / Multivariate Test
  27. Not Overly Expensive Software: Just 2 out of 25+ vendors: Google Website Optimizer (free) and Visual Website Optimizer ($600 - $3000 / year).
  28. A/B/n Experiment (diagram): Javascript splits users across Webpage A, Webpage B, ... Webpage n (independent variable). Behavior is measured on each webpage (dependent variable). A statistical test checks for a difference.
  29. Google Website Optimizer
  30. Limitations Of Mainstream Experimental Software: 1. Limited to between-subject design 2. Lack of data export 3. No control over statistical test 4. Expensive coding necessary
  31. Limitation 1: Limited To Between-Subject Design • Cannot control for individual differences (no such data is collected / made available) • Requires more experimental subjects • No pre-experimental data is collected
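
How many subjects a between-subject test needs can be estimated with a standard normal-approximation formula for comparing two conversion rates. The sketch below is not from the talk: the function name `subjects_per_group`, the default significance level and power, and the example rates are all illustrative assumptions.

```python
import math
from statistics import NormalDist


def subjects_per_group(p_control, p_variation, alpha=0.05, power=0.80):
    """Approximate subjects needed in EACH group of a between-subject test.

    Uses the common normal-approximation formula for two proportions.
    alpha and power are conventional defaults, not values from the talk.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    variance_sum = (p_control * (1 - p_control)
                    + p_variation * (1 - p_variation))
    effect = p_variation - p_control
    return math.ceil((z_alpha + z_power) ** 2 * variance_sum / effect ** 2)
```

Under these assumptions, detecting a lift from a 10% to a 13% conversion rate takes well over a thousand users per group, and smaller lifts take far more.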
  32. Limitation 2: Lack of Data Export
  33. Google Website Optimizer: Data Export
  34. Visual Website Optimizer
  35. Visual Website Optimizer: Data Export
  36. Software Limitations: Data Export • Some software is better than others • No data on individual users • No segmentation on background variables • This might be the biggest problem, as this is where many significant results lie.
  37. Limitation 3: No Choice Between Statistical Tests. Okay?
  38. Statistical Test = Chance To Beat Original: “The chance to beat original ... displays the probability that a combination will be more successful than the original version. When numbers in this column are high, perhaps around 95%, that means a given combination is probably a good candidate to replace your original content. Low numbers in this column mean that the corresponding combination is a poor candidate for replacement.” http://support.google.com/websiteoptimizer/bin/answer.py?hl=en&answer=55944
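
The quoted help text does not say how Google Website Optimizer computes "chance to beat original". One common way to get a number with that interpretation is a Bayesian simulation, sketched below. This is a swapped-in technique, not Google's documented method; the function name, the uniform Beta(1, 1) prior, and the number of draws are assumptions.

```python
import random


def chance_to_beat_original(conv_orig, n_orig, conv_var, n_var,
                            draws=20000, seed=0):
    """Estimate P(variation rate > original rate) by Monte Carlo.

    Each conversion rate gets a Beta posterior under a uniform prior
    (an assumption of this sketch, not taken from the source).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_orig = rng.betavariate(1 + conv_orig, 1 + n_orig - conv_orig)
        rate_var = rng.betavariate(1 + conv_var, 1 + n_var - conv_var)
        if rate_var > rate_orig:
            wins += 1
    return wins / draws
```

With identical results in both arms the estimate sits near 50%. It moves toward 100% as the variation pulls ahead, which matches the reading of the column described in the quote.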
  39. Visual Website Optimizer Is More Transparent: “Visual Website Optimizer uses z-tests for both A/B tests and multivariate tests.” Standard Error (SE) = Square root of (p * (1-p) / n). http://visualwebsiteoptimizer.com/split-testing-blog/what-you-really-need-to-know-about-mathematics-of-ab-split-testing/
  40. z-tests: • Focus on a single parameter • Assumes parametric assumptions are met (we don’t know if the data fits this)
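
The standard error formula quoted on the previous slide can be turned into a two-proportion z-test. This is a minimal sketch: the function names are hypothetical, and the unpooled standard error and two-sided p-value are assumptions, since the slides do not say which exact variant Visual Website Optimizer uses.

```python
import math
from statistics import NormalDist


def standard_error(p, n):
    """SE of a conversion rate, as quoted: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference B minus A.

    Assumes the normal approximation holds, which is exactly the
    parametric assumption the slide warns may not be met.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Unpooled standard error of the difference between the two rates.
    se_diff = math.sqrt(standard_error(p_a, n_a) ** 2
                        + standard_error(p_b, n_b) ** 2)
    z = (p_b - p_a) / se_diff
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value
```

For example, `two_proportion_z_test(100, 1000, 130, 1000)` compares a 10% and a 13% conversion rate on 1,000 users each.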
  41. Limitation 4: Coding Required (diagram): Javascript splits users across Webpage A, Webpage B, ... Webpage n (independent variable), and the webpage variations have to be coded. Behavior is measured on each webpage (dependent variable). A statistical test checks for a difference.
  42. Software Limitations: Expensive Coding. “We already coded it, so we might as well keep it. I hate working for no reason.”
  43. Software Limitations: Expensive Coding. “I knew this wouldn’t work! We should never have spent resources on it...”
  44. The Challenge: 1. Overcome methodological limitations of experimental software 2. Reduce development costs 3. Explore possibilities and limitations of online experimentation
  45. Part 4: Lean Experimentation
  46. Test Environment (diagram): Users are split across Proxy A, Proxy B, ... Proxy n (independent variable). Behavior on the website is measured for each proxy (dependent variable). A statistical test checks for a difference.
  47. Proxies For Experimentation: Website, Email, Survey, Ads
  48. Comparative Advantages And Disadvantages
  49. Lean Experimentation Principles: 1. Test assumptions, ideas, and theories 2. Test before coding, not after 3. Test in the field
  50. Principle 1: Test Assumptions, Ideas, And Theories
  51. Principle 2: Test Before Coding, Not After (diagram): Ideas pass through Experimentation, which separates Bad Ideas from Good Ideas before Implementation.
  52. Principle 3: Test In The Field • Identical design patterns have different effects in different contexts • E.g. social comparison information in competitive versus cooperative communities • Cocktail effects are largely unknown
  53. Requirements Of Lean Experimentation: 1. Independent groups 2. Random assignment 3. Allows tracking
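
The three requirements above can be sketched for an email or ad proxy as follows. This is only an illustration: the function names, the `exp` and `grp` query parameters, and the example URL are hypothetical and not taken from the talk.

```python
import random
from urllib.parse import urlencode


def assign_groups(user_ids, groups=("A", "B"), seed=None):
    """Randomly assign each user to exactly one group.

    Shuffling gives random assignment; dealing users out round-robin
    keeps the groups independent and of (nearly) equal size.
    """
    rng = random.Random(seed)
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    return {uid: groups[i % len(groups)] for i, uid in enumerate(shuffled)}


def tracked_link(base_url, experiment, group):
    """Tag a link so website visits can be traced back to a proxy group."""
    return f"{base_url}?{urlencode({'exp': experiment, 'grp': group})}"
```

Each group then receives its own version of the proxy (for example an email), and every link in it carries the group tag so behavior on the website can be attributed to the right variation.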
  54. Why Use Proxies For Experimentation?
  55. Test Environment: • Manipulates the independent variable through a proxy • Examines the dependent variable in a natural field environment
  56. Test Subjects: • Existing users (when using website, email, and survey) • Potential users (when using advertisements)
  57. Proposed Usage And Limitations. Good for: • Ideas • Theories • Hypotheses. Less suited for: • Small changes • Graphical changes • Features. Can be useful if testing assumptions.
  58. Data Output: • Mixed sources that need to be combined • Open rates / CTR from proxy • Web analytics • SQL databases
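
Combining the mixed sources listed above usually comes down to joining records on a shared user identifier. The sketch below assumes a proxy log (for example from an email tool) and a website log (from web analytics or a SQL export). All field names, such as `user_id`, `group`, `clicked`, and `converted`, are hypothetical.

```python
from collections import defaultdict


def summarize(proxy_rows, site_rows):
    """Join proxy data with website data and count outcomes per group.

    proxy_rows: dicts with 'user_id', 'group', 'clicked' (assumed schema)
    site_rows:  dicts with 'user_id', 'converted'        (assumed schema)
    """
    converted = {row["user_id"] for row in site_rows if row["converted"]}
    summary = defaultdict(lambda: {"sent": 0, "clicked": 0, "converted": 0})
    for row in proxy_rows:
        stats = summary[row["group"]]
        stats["sent"] += 1
        stats["clicked"] += int(row["clicked"])
        # A user counts as converted if the website log says so.
        stats["converted"] += int(row["user_id"] in converted)
    return dict(summary)
```

The per-group counts that come out of this join are the input a statistical test needs. Keeping the user identifier also makes it possible to segment on background variables later.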
  59. Durability Of Proxy Experiment Is Short (chart): email experiment, control group vs. experimentation group, weeks 0 to 3 (Wk0 - Wk3).
  60. Buy In Needed (from hard to sell to easy to sell): 1. Making changes on websites 2. Sending emails 3. Conducting surveys 4. Running ads
  61. Feedback Quality (from critical feedback to not so critical feedback): 1. Wireframes / early stage development 2. Finished / nearly finished stages
  62. Influence On Decisions: Increased likelihood of impact when getting experimental effect data early
