
A/B Testing for Game Design Iteration: A Bayesian Approach

  1. A/B Testing for Game Design Iteration: A Bayesian Approach. Steve Collins, CTO / Swrve
  2. Biography
  3. Push Notifications, In-App Messaging, A/B Testing, Targeting & Analytics
  4. Introduction to A/B Testing
  5. Game-as-a-service lifecycle: Development → Beta → "Soft" Launch → Launch, with a test / dev / update cycle repeating at every stage, before and after launch.
  6. Drivers of LTV (+$): On-boarding (first experience, tutorial flow, incentives / promos); Monetization (IAPs, virtual items, promotions / offers / ads, economy / game balance); Retention (messaging / push notifications, core loops, events, community / social); "Engagement" (fun factor, content refresh / evolution, level balance, progression). Drivers of UAC (-$): Acquisition (ads CPC / CPI, cross-promo, discovery, PR).
  7. ROI = LTV - UAC
  8. LTV = Σ_{x=1..n} (ARPU_x − COST_x) / (1 + WACC)^x − CAC, so ROI = LTV − UAC. But what counts as the "lifetime" n?
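The discounted-cash-flow LTV on this slide can be sketched directly. A minimal sketch; all the numbers below (ARPU, cost, WACC, horizon, CAC) are purely illustrative:

```python
# Discounted LTV: sum the per-period margin (ARPU - COST), discounted
# by the weighted average cost of capital (WACC), minus the customer
# acquisition cost (CAC). All inputs here are illustrative.
def ltv(arpu, cost, wacc, periods, cac):
    margin = arpu - cost
    return sum(margin / (1 + wacc) ** x for x in range(1, periods + 1)) - cac

# $1.00 ARPU, $0.20 cost per period, 10% WACC, 12 periods, $2.00 CAC:
print(round(ltv(1.00, 0.20, 0.10, 12, 2.00), 2))  # 3.45
```

The open question from the slide shows up as the `periods` parameter: the answer changes materially with the assumed lifetime.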
  9. [Chart: 30-day revenue by install-day cohort, with high / mean / low bands across days 1-32]
  10. The iteration loop: Understand (metrics → analytics → insight) → Test hypotheses (data-driven) → Take action (iterate, fail fast).
  11. 1. Split population → 2. Show variations → 3. Measure response → 4. Choose winner → 5. Deploy winner
  12. Architecture: the game client syncs state with the Game Server (backed by the Game State DB); the A/B Test Server delivers the A or B variant via the Content Management System; events (tagged with A or B) flow into the Data Warehouse for analysis.
  13. What to test?
  14. Message layouts / content: variant A vs. variant B
  15. Tutorial flow: finding the choke point
  16. Promotion discounts: -20% vs. -40%
  17. Elasticity testing (exchange rate): $4.99 for 100 coins (A) vs. $4.99 for 200 coins (B)
  18. Store inventory: price set A vs. price set B vs. price set C
  19. [Scatterplot: total spend vs. value of first purchase, with segments at $2 / $5 (A/B test), $15 / $30 (repeat purchases) and $70 (VIPs)]
  20. Timing: when and where to trigger. Example: wait 3 sessions, wait for the event "movie_closed", then show the in-app message "Request Facebook Registration" (FB Reg).
  21. [Chart: probability of winning vs. number of user trials (0 to 50,000)]
  22. Canacandycrushbalt*: Day-1 retention = 30%. (* Apologies to King and Adam Saltsman)
  23. Beta test: original vs. new version
  24. Expecting 30% day-1 retention. After 50 users, we see 0%. Is this bad?
  25. Null Hypothesis Testing (NHT) View
  26. The Null Hypothesis: model retention x as normally distributed, f(x) = (1 / √(2πσ²)) e^(−(x−µ)² / (2σ²)), and test H0: x = µ = 30%. Then accept or reject H0.
  27. [Chart: normal density centred at µ = 30%; an observation of 0% falls deep in the rejection region]
  28. Conclusion: 30% is unlikely to be the retention rate.
  29. Issue #1: the p-value
  30. The p-value: 95% of the mass lies within µ ± 1.96σ, with 2.5% in each tail. Key quantities: µ = np, σ = √(np(1−p)), z = (x̄ − µ) / σ, or with an estimated spread z = (x̄ − µ) / (σ̂ / √n), where SE(x̄) = σ̂ / √n; required sample size n = (z_α + z_β)² σ² / δ².
  31. The p-value: the probability of observing as extreme a result, assuming the null hypothesis is true. In other words: the probability of the data given the model.
  32. Null hypothesis H0, truth vs. observation: accepting H0 when it is false is a Type-II error (false negative); rejecting H0 when it is true is a Type-I error (false positive).
  33. With p < 0.05, all we can ever say is one of: "the retention rates are different" (and we will be wrong 5% of the time), or "there is not enough evidence that the retention rates differ".
  34. Actually... the evidence supports rejecting the null hypothesis: the probability of seeing a result as extreme as this, assuming the retention rate really is 30%, is less than 5% (p < 0.05).
  35. Issue #2: "Peeking"
  36. Required sample size per group: n ≈ 16σ² / δ², where δ is the change you need to detect and σ the standard deviation.
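The rule of thumb above is easy to evaluate. A minimal sketch, assuming a retention-rate test where the per-user variance is p(1 − p); the baseline and lift values are illustrative:

```python
# Rule-of-thumb sample size per group, n ≈ 16·σ²/δ², which corresponds
# roughly to α = 0.05 and 80% power. For a conversion/retention rate p,
# the variance of a single user's outcome is p(1 − p).
def sample_size(p, delta):
    variance = p * (1 - p)
    return round(16 * variance / delta ** 2)

# Detecting a 2-point absolute lift on 30% day-1 retention:
print(sample_size(0.30, 0.02))  # 8400 users per group
```

Note how fast n grows as the detectable change δ shrinks: halving δ quadruples the required sample.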
  37. "Peeking" inflates the false-positive rate. To keep it at 5% overall, each peek needs a stricter per-look threshold: 1 peek → 2.9%, 2 peeks → 2.2%, 3 → 1.8%, 5 → 1.4%, 10 → 1%. (http://www.evanmiller.org/how-not-to-run-an-ab-test.html)
  38. Issue #3: Family-wise Error
  39. Family-wise error: p = P(Type-I error) = 0.05, so 5% of the time we get a false positive for one treatment. P(no Type-I error) = 0.95, so for 2 treatments P(no Type-I error) = 0.95 × 0.95 = 0.9025, and P(at least one Type-I error) = 1 − 0.9025 = 0.0975.
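The compounding shown above generalizes: with k treatments each tested independently at level α, the chance of at least one false positive is 1 − (1 − α)^k. A quick check:

```python
# Family-wise error rate for k treatments each tested at level alpha.
def familywise_error(alpha, k):
    return 1 - (1 - alpha) ** k

print(familywise_error(0.05, 2))   # ≈ 0.0975, matching the slide
print(familywise_error(0.05, 10))  # ≈ 0.40: ten variants nearly flip a coin
```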
  40. Bayesian View
  41. [Chart: prior density over the retention rate, peaked near ~30%. The spread captures our "belief" uncertainty.]
  42. [Chart: posterior density after observing 0 retained users, shifted to ~15% and narrower: a stronger, updated "belief" after observing evidence.]
  43. The probability of the model given the data
  44. A fair coin: p(heads) = p(tails) = 0.5
  45. Tossing the coin: [Chart: proportion of heads over 50 flips of the sequence THTHTTTHHTHH…]
  46. Tossing the coin: [Chart, log-scale x-axis: the proportion of heads converges to the long-run average of 0.5]
  47. Terminology: p(x) is the probability of x; p(x, y) = p(y, x) is the joint probability of x and y; p(x|y) = p(x, y) / p(y) is the conditional probability of x given y. Hence p(x, y) = p(y|x) p(x) = p(x|y) p(y).
  48. The Bernoulli distribution: heads (H) = 1, tails (T) = 0, and a single toss has p(x|θ) = θ^x (1 − θ)^(1−x). For a "fair" coin θ = 0.5: p(heads = 1 | 0.5) = 0.5^1 (1 − 0.5)^0 = 0.5, and p(tails = 0 | 0.5) = 0.5^0 (1 − 0.5)^1 = 0.5.
  49. The Binomial: the probability of heads in a single toss is p(x|θ) = θ^x (1 − θ)^(1−x); the probability of x heads in n tosses is p(x|θ, n) = C(n, x) θ^x (1 − θ)^(n−x).
  50. 20 tosses of a fair coin: [Chart: binomial pmf p(x|0.5, 20)]
  51. 20 tosses of an "un-fair" coin: [Chart: binomial pmfs p(x|0.2, 20) and p(x|0.5, 20) compared]
  52. "Binomial likelihood": the likelihood of θ given observation i of x_i heads in n_i tosses is L(θ|x_i, n_i) = C(n_i, x_i) θ^(x_i) (1 − θ)^(n_i − x_i).
  53. The likelihood sharpens with more observations: [Charts: likelihood of θ for 1 head in 2 tosses, 5 heads in 10 tosses, and 500 heads in 1000 tosses, each progressively narrower around θ = 0.5]
  54. A recap: x is the observation (number of heads) and θ is the model parameter (e.g. "fair coin"). The likelihood p(x|θ, n) = θ^x (1 − θ)^(n−x) is the probability of the data given the model; what we actually want is p(θ|x), the probability of the model given the data.
  55. Note that p(x, y) = p(x|y) p(y) = p(y|x) p(x), but p(x|θ) ≠ p(θ|x), just as p(cloudy|raining) ≠ p(raining|cloudy).
  56. Bayes' Rule: from p(x, y) = p(x|y) p(y) = p(y|x) p(x) we get p(y|x) = p(x|y) p(y) / p(x), with p(x) = Σ_y p(x|y) p(y).
  57. Discrete form: p(y|x) = p(x|y) p(y) / Σ_y p(x|y) p(y). Continuous form: p(y|x) = p(x|y) p(y) / ∫ p(x|y) p(y) dy.
  58. Applied to the coin: p(θ|x) = p(x|θ) p(θ) / p(x), where p(θ|x) is the probability of the model given the number of heads, p(x|θ) the probability of that number of heads given the model, p(θ) the probability of the model, and p(x) = ∫ p(x|θ) p(θ) dθ.
  59. Naming the parts: p(θ|x) is the posterior, p(x|θ) the likelihood, p(θ) the prior, and p(x) = ∫ p(x|θ) p(θ) dθ the normalizing factor.
  60. The prior p(θ) captures our "belief" in the model, based on prior experience, observations or knowledge.
  61. Iterating with more data, the previous posterior becomes the next prior: p̂_0(θ|x_0) = p(x_0|θ) p(θ) / p(x_0), p̂_1(θ|x_1) = p(x_1|θ) p̂_0(θ) / p̂_0(x_1), …, p̂_n(θ|x_n) = p(x_n|θ) p̂_(n−1)(θ) / p̂_(n−1)(x_n). Each step is our best estimate so far.
  62. Selecting a prior: we'd like the product of prior and likelihood to be "like" the likelihood, and we'd like the integral ∫ θ^x (1 − θ)^(n−x) p(θ) dθ to be easily evaluated.
  63. The answer: a "conjugate prior".
  64. The Beta distribution: beta(θ|a, b) = θ^(a−1) (1 − θ)^(b−1) / B(a, b), where B(a, b) = Γ(a)Γ(b) / Γ(a + b) = (a − 1)!(b − 1)! / (a + b − 1)!. Intuitively, a = number of heads + 1 and b = number of tails + 1.
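Because the beta prior is conjugate to the binomial likelihood, the update is just adding counts. A minimal sketch using the beta(a, b) parameterization above:

```python
# Conjugate Beta-Binomial update: observing `heads` successes in
# `tosses` trials turns a beta(a, b) prior into a
# beta(a + heads, b + tosses - heads) posterior.
def update(a, b, heads, tosses):
    return a + heads, b + (tosses - heads)

# Uniform prior beta(1, 1), then 12 heads in 20 tosses:
a, b = update(1, 1, 12, 20)
print(a, b)         # 13 9
print(a / (a + b))  # posterior mean ≈ 0.59
```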
  65. Beta distribution [charts of the density for various a, b]
  66. Putting it together: the binomial likelihood p(x, n|θ) = θ^x (1 − θ)^(n−x) (x heads, n − x tails) times the beta prior θ^(a−1) (1 − θ)^(b−1) / B(a, b) gives the posterior p(θ|x, n) = θ^(x+a−1) (1 − θ)^(n−x+b−1) / B(x + a, n − x + b): another beta distribution.
  67. Putting it together: 1. Decide on a prior, which captures your belief. 2. Run the experiment and observe the data (heads, tails). 3. Determine your posterior based on the data. 4. Use the posterior as your new belief and re-run the experiment. 5. Rinse, repeat until you hit an actionable certainty.
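The five steps above amount to feeding each batch's posterior back in as the next batch's prior. A sketch with made-up daily retention batches (counts are illustrative):

```python
# Sequential Bayesian updating: yesterday's posterior is today's prior.
# Batches are (retained, cohort_size) pairs; the counts are illustrative.
a, b = 1, 1                      # uniform beta(1, 1) prior: no opinion yet
batches = [(28, 100), (35, 100), (31, 100)]
for retained, cohort in batches:
    a += retained                # "heads": retained users
    b += cohort - retained       # "tails": churned users
    print(f"posterior beta({a}, {b}), mean {a / (a + b):.3f}")
```

Because the conjugate update just accumulates counts, processing the batches one at a time gives exactly the same posterior as processing them all at once.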
  68. Uniform prior + "fair" coin data: pretty sure the coin is fair.
  69. "Coin is fair" prior + "fair" coin data: very sure the coin is fair.
  70. Uniform prior + "biased" coin data: pretty sure the coin is unfair.
  71. "Coin is fair" prior + "biased" coin data: not sure of anything yet!
  72. When to reject?
  73. The credible interval: with a uniform prior and "fair" coin data, the 95% credible interval covers the central mass of the posterior.
  74. The credible interval: with a uniform prior and "biased" coin data, the hypothesized value falls outside the credible interval.
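A 95% credible interval can be read straight off the beta posterior. SciPy's `beta.ppf` would give it exactly; the sketch below approximates it with only the standard library by sorting Monte Carlo samples:

```python
import random

# Approximate the central 95% credible interval of a beta(a, b)
# posterior by Monte Carlo sampling (stdlib only; scipy.stats.beta.ppf
# would compute it exactly).
def credible_interval(a, b, level=0.95, draws=100_000, seed=0):
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    lo = samples[int(draws * (1 - level) / 2)]
    hi = samples[int(draws * (1 + level) / 2)]
    return lo, hi

# Posterior after 12 heads in 20 tosses from a uniform prior: beta(13, 9)
lo, hi = credible_interval(13, 9)
print(f"[{lo:.2f}, {hi:.2f}]")  # roughly [0.39, 0.78]
```

If a candidate value for θ (say 0.5 for a fair coin) falls outside this interval, the posterior makes it a poor bet.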
  75. The prior captures our prior belief, expertise and opinion, and provides inertia: a strong prior means we need lots of evidence to contradict it, but results converge more quickly if the prior is "relevant". With enough samples, the prior's impact diminishes, rapidly.
  76. Running a test: A vs. B
  77. Multiple-variant tests: with one or more variants we have a multi-dimensional problem and need to evaluate volumes under the posterior. In general this requires numerical integration, e.g. Markov Chain Monte Carlo (MCMC).
  78. [Chart: the beta posteriors p(θ_a|x_a) and p(θ_b|x_b) for the two variants]
  79. Probability of winning: p(θ_a > θ_b) = ∫∫_(θ_a > θ_b) p(θ_a|x_a) p(θ_b|x_b) dθ_a dθ_b.
  80. [Chart: the joint posterior with the winning region θ_a > θ_b shaded]
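The probability-of-winning integral is easy to estimate by Monte Carlo: sample both posteriors and count how often A beats B. A sketch with illustrative conversion counts and uniform beta(1, 1) priors:

```python
import random

# Monte Carlo estimate of p(theta_a > theta_b) from two Beta posteriors,
# each built from a uniform beta(1, 1) prior plus observed conversions.
def prob_a_beats_b(a_conv, a_n, b_conv, b_n, draws=100_000, seed=1):
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        theta_a = rng.betavariate(1 + a_conv, 1 + a_n - a_conv)
        theta_b = rng.betavariate(1 + b_conv, 1 + b_n - b_conv)
        wins += theta_a > theta_b
    return wins / draws

# Variant A: 120/1000 converted; variant B: 100/1000 converted.
p = prob_a_beats_b(120, 1000, 100, 1000)
print(p)  # ≈ 0.92 for these illustrative counts
```

This sidesteps the double integral entirely, and the same sampling trick extends to more than two variants (probability of beating all).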
  81. What's the prior?
  82. Fit a beta: a = 42, b = 94
  83. Some examples...
  84. A successful test
  85. [Chart: probability of beating all]
  86. [Chart: observed conversion rates (with CI bounds)]
  87. A not-so-successful test...
  88. [Chart: probability of beating all]
  89. [Chart: observed conversion rate (posterior)]
  90. Assumptions: users are independent; users convert quickly (immediately); the probability of conversion is independent of time.
  91. [Chart: an un-converged conversion rate]
  92. Benefits / features: continuously observable; no need to fix the population size in advance; incorporates prior knowledge / expertise; the result is a "true" probability; a measure of the magnitude of the difference is given; a consistent framework for lots of different scenarios.
  93. Useful links:
      • https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers
      • "Doing Bayesian Data Analysis: A Tutorial with R and BUGS", John K. Kruschke
      • http://www.evanmiller.org/how-not-to-run-an-ab-test.html
      • http://www.kaushik.net (Occam's Razor blog)
      • http://exp-platform.com (Ronny Kohavi et al.)
  94. Thanks
  95. Multi-arm bandits: arms with (hidden) payout rates 5%, 6%, 10%, 4%; pull 1, pull 2, pull 3...
  96. Multi-arm bandits, Thompson Sampling: draw a sample from each "arm's" prior distribution; pick the largest and deploy that arm; evaluate success; update the posterior for the chosen "arm"; repeat. (Thompson, William R., "On the likelihood that one unknown probability exceeds another in view of the evidence of two samples", Biometrika, 25(3–4):285–294.)
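The loop on this slide can be sketched in a few lines. A minimal Thompson sampling simulation; the four "true" conversion rates are made up to mirror the previous slide:

```python
import random

rng = random.Random(42)
true_rates = [0.05, 0.06, 0.10, 0.04]  # hidden conversion rates per arm
wins = [0] * 4                          # successes per arm
losses = [0] * 4                        # failures per arm

for _ in range(5000):
    # 1. Draw a sample from each arm's Beta posterior (uniform prior).
    draws = [rng.betavariate(1 + wins[i], 1 + losses[i]) for i in range(4)]
    # 2. Pick the largest and deploy that arm.
    arm = draws.index(max(draws))
    # 3. Evaluate success; 4. update the chosen arm's posterior.
    if rng.random() < true_rates[arm]:
        wins[arm] += 1
    else:
        losses[arm] += 1

pulls = [wins[i] + losses[i] for i in range(4)]
print(pulls)  # most pulls should concentrate on the 10% arm
```

Unlike a fixed-split A/B test, the bandit shifts traffic toward the better arm while the experiment is still running, trading some statistical tidiness for lower opportunity cost.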
  97. https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers
  98. Difficulty tuning: a slightly silly example... Canacandycrushbalt
  99. Winner
