
Matt Gershoff

#EMSNYCDAY2

Published in: Marketing

  1. AB TESTING TO AI (REINFORCEMENT LEARNING)
  2. WHO IS THIS GUY? • Matt Gershoff • CEO: Conductrics • Twitter: @mgershoff • Email: matt@conductrics.com
  3. AI is …?
  4. WHAT WE WILL TALK ABOUT • Definition of Reinforcement Learning – Trial and Error Learning • AB Testing (Bayesian) • Multi-Armed Bandit (Automation) • Bandit with Targeting – Multi-Touch Point Optimization • Attribution = Dynamics • Q-Learning
  5. What is Reinforcement Learning?
  6. Reinforcement Learning is a Problem, not a Solution
  7. Reinforcement Learning Problem: Learn to make a Sequence of Decisions by Trial & Error in order to Achieve (delayed) Goal(s)
  8. EXAMPLE
  9. MARKETING PROBLEMS Online Applications – websites, mobile, things communicating via HTTP; Low-Risk Decisions* – i.e. ‘Which Banner’; High Volume* – not for one-off decisions, or decisions made infrequently (* High Volume/Low Risk from http://jtonedm.com/)
  10. TRIAL AND ERROR LEARNING: AB Testing/Bandit; Sequential Decisions; Targeting
  11. TRIAL AND ERROR: AB TESTING (Location: Page A; Decision: A or B; Objective/Payoff: Convert / Don’t Convert)
  12. TRIAL AND ERROR: AB TESTING – How to Solve?
  13. TRIAL AND ERROR: AB TESTING – How to Solve: 1. AB Testing
  14. AB Testing: Bayesian – Red Button vs. Green Button
  15. AB Testing: Bayesian – The Bayesian AB Test asks:
  16. AB Testing: Bayesian – Is P(Green|DATA) > P(Red|DATA)?
  17. BAYESIAN AB TESTING REVIEW P(Green > Red | DATA) = 50%; Sample Size = 0
  18. BAYESIAN AB TESTING REVIEW P(Green > Red | DATA) = 68%; Sample Size = 100
  19. BAYESIAN AB TESTING REVIEW P(Green > Red | DATA) = 94%; Sample Size = 1,000
  20. BAYESIAN AB TESTING REVIEW P(Green > Red | DATA) = 99.99…%; Sample Size = 10,000
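The sample-size progression on the slides above can be reproduced with a quick simulation. A minimal sketch, assuming Beta(1,1) priors and made-up conversion counts (the deck does not give the underlying data):

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=100_000, seed=7):
    """Monte Carlo estimate of P(B > A | data) under Beta(1,1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior for each button: Beta(conversions + 1, failures + 1)
        a = rng.betavariate(conv_a + 1, n_a - conv_a + 1)
        b = rng.betavariate(conv_b + 1, n_b - conv_b + 1)
        wins += b > a
    return wins / draws

# With no data the posteriors are identical, so P(Green > Red | DATA) is ~50%;
# as (hypothetical) data accumulates, the probability sharpens toward 0 or 1.
print(prob_b_beats_a(0, 0, 0, 0))
print(prob_b_beats_a(100, 1000, 120, 1000))
```

The exact percentages on the slides depend on the observed conversion data, which is why they move from 50% toward certainty as the sample grows.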
  21. AB TESTING -> LEARN FIRST: over Time, first Explore/Learn (Data Collection/Sample), then Exploit/Earn (Apply Learning)
  22. SINGLE LOCATION DECISIONS/AB TEST – How to Solve: 1. AB Testing 2. Multi-Arm Bandit (Location: Page A; Decision: A or B; Objective/Payoff: Convert / Don’t Convert)
  23. BANDIT: THOMPSON SAMPLING – Like Bayesian AB Testing: • Calculate P(A|Data) & P(B|Data). Unlike AB Testing: • Don’t make fair selections (50/50) • Select based on P(A|Data) & P(B|Data)
  24. ADAPTIVE: THOMPSON SAMPLING – Construct Probability Distributions for A, B, C: • Use Mean as center • Standard Deviation for spread
  25. ADAPTIVE: THOMPSON SAMPLING – For Each User: 1) Take a random sample from each distribution: A = 0.49
  26. ADAPTIVE: THOMPSON SAMPLING – For Each User: 1) Take a random sample from each distribution: A = 0.49, B = 0.51
  27. ADAPTIVE: THOMPSON SAMPLING – For Each User: 1) Take a random sample from each distribution: A = 0.49, B = 0.51, C = 0.46
  28. ADAPTIVE: THOMPSON SAMPLING – For Each User: 2) Pick the Option with the Highest Score (Option B): A = 0.49, B = 0.51, C = 0.46
  29. ADAPTIVE: THOMPSON SAMPLING – Repeat: 1) Take a random sample from each distribution
  30. ADAPTIVE: THOMPSON SAMPLING – Repeat: A = 0.52
  31. ADAPTIVE: THOMPSON SAMPLING – Repeat: A = 0.52, B = 0.43
  32. ADAPTIVE: THOMPSON SAMPLING – Repeat: A = 0.52, B = 0.43, C = 0.49
  33. ADAPTIVE: THOMPSON SAMPLING – Repeat: A = 0.52, B = 0.43, C = 0.49
  34. ADAPTIVE: THOMPSON SAMPLING – Selection Chance based on: 1. Relative estimated mean value of the option 2. Amount of overlap of the distributions (Selection Chance: Option A 67%, Option B 8%, Option C 25%)
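The per-user procedure on these slides is only a few lines of code. A sketch, again assuming Beta(1,1) priors and invented conversion counts:

```python
import random
from collections import Counter

def thompson_pick(stats, rng):
    """Sample one conversion rate per option from its Beta posterior,
    then serve the option whose sampled rate is highest."""
    draws = {
        name: rng.betavariate(conv + 1, trials - conv + 1)
        for name, (conv, trials) in stats.items()
    }
    return max(draws, key=draws.get)

rng = random.Random(42)
stats = {"A": (40, 1000), "B": (60, 1000), "C": (50, 1000)}  # hypothetical data
picks = Counter(thompson_pick(stats, rng) for _ in range(10_000))
# Selection shares reflect both the means and the overlap of the posteriors,
# so the leading option is served most often while the others still get
# exploratory traffic.
print(picks)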
  35. TARGETING: Trial and Error Learning; Sequential Decisions; Predictive Targeting
  36. PREDICTIVE TARGETING A Mapping: Behavioral Data -> Option/Actions
  37. Thompson Sampling with Targeting
  38. Thompson Sampling with Targeting
  39. LEARNING THE MAPPINGS • Regression (Lin, Logistic, etc.) • Deep Nets • Decision Trees (Source: Larochelle – Neural Networks 1 – DLSS 2017.pdf; Conductrics Inc. | Matt Gershoff | www.conductrics.com | @conductrics)
  40. REGRESSION f(x) = w_0 + Σ_d (w_d * x_d)
  41. DEEP LEARNING 1) Input Data 2) Hidden Layer 3) Hidden Layer 4) Output Layer (Source: Larochelle – Neural Networks 1 – DLSS 2017.pdf)
  42. What Simple Model? Model as Decision Tree
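The regression mapping on slide 40 is just a weighted sum of the behavioral inputs. A minimal sketch with hypothetical weights and features (nothing here comes from the deck's actual models):

```python
def linear_score(bias, weights, features):
    """f(x) = w0 + sum over d of (w_d * x_d): score an option for one user."""
    return bias + sum(w * x for w, x in zip(weights, features))

# Two made-up behavioral features, e.g. visit count and recency
score = linear_score(0.05, [0.2, -0.1], [1.0, 3.0])
```

Deep nets and decision trees learn the same kind of mapping from behavioral data to option values, just with more flexible functional forms.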
  43. REINFORCEMENT LEARNING
  44. REINFORCEMENT LEARNING 1. Sequential Decisions 2. Delayed Rewards
  45. EXAMPLE
  46. MULTI-TOUCH = DYNAMICS Enter Site -> Page 1 -> Page 2
  47. MULTI-TOUCH = DYNAMICS Enter Site -> Page 1 (A or B) -> Page 2 (C or D)
  48. MULTI-TOUCH = DYNAMICS Enter Site -> Page 1 (A or B) -> Page 2 (C or D) -> Exit Site
  49. MULTI-TOUCH = DYNAMICS Enter Site -> Page 1 (A or B) -> Page 2 (C or D) -> Goal or Exit Site
  50. MULTI-TOUCH = DYNAMICS Enter Site -> Page 1 (A or B) -> Page 2 (C or D) -> Goal or Exit Site
  51. MULTI-TOUCH = DYNAMICS Enter Site -> Page 1 (A or B) -> Page 2 (C or D) -> Goal or Exit Site
  52. MULTI-TOUCH = DYNAMICS 1. Conversion Rates: Page1:A 3%; Page1:B 4%; Page2:C 10%; Page2:D 12%
  53. MULTI-TOUCH = DYNAMICS 2. Transition Frequencies: Page1:A -> Page 2: 30%; Page1:B -> Page 2: 20%; Page2:C -> Page 1: 2%; Page2:D -> Page 1: 1%
  54. MULTI-TOUCH = DYNAMICS This is Complicated!
  55. Q Learning: Q(s_t, a_t) <- Q(s_t, a_t) + α[ r_{t+1} + γ * max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]
  56. Q-LEARNING Q(s_t, a_t) <- Q(s_t, a_t) + α[ r_{t+1} + γ * max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]
  57. Q-LEARNING Analytics Interpretation of Q-Learning: 1) Treat Landing on the Next Page like a regular conversion!
  58. Q-LEARNING Analytics Interpretation of Q-Learning: 1) Treat Landing on the Next Page like a regular conversion! 2) Use the estimates at the next step as the conversion value!
  59. Q-LEARNING 1) Take an action (Page 1: A or B)
  60. Q-LEARNING 1) Take an action – Pick A
  61. Q-LEARNING 2) Measure what the user does after
  62. Q-LEARNING 2) Do they Convert? ($10)
  63. Q-LEARNING 2) Yes! ($10)
  64. Q-LEARNING 2) Set r = $10 in Q(s_t, a_t) <- Q(s_t, a_t) + α[ r_{t+1} + γ * max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]
  65. Q-LEARNING EXACTLY the SAME as AB TESTING
  66. Q-LEARNING 3) Do they next go to Page 2?
  67. Q-LEARNING 3) Yes!
  68. Q-LEARNING 3) Yes! Now in the Dynamic part of the Path
  70. Q-LEARNING 4) Check Current Estimated Values of ‘C’ & ‘D’ (Page 2: C or D)
  71. Q-LEARNING 4) Check Current Estimated Values of ‘C’ & ‘D’ – Of course, initially C = $0; D = $0
  72. Q-LEARNING 4) Check Current Estimated Values of ‘C’ & ‘D’ – But assume mean of C = $1; D = $5
  73. Q-LEARNING 4) Set max_a Q(s_{t+1}, a) = $5 (the value of D)
  74. Q-LEARNING In Q(s_t, a_t) <- Q(s_t, a_t) + α[ r_{t+1} + γ * max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]: 1. γ is the discount rate 2. Related to Google’s Half Life 3. A 7-day half life -> γ ≈ 0.9
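The half-life framing on slide 74 pins down γ: if credit should halve every 7 days, the per-day discount is 0.5^(1/7) ≈ 0.91, i.e. roughly the 0.9 the slide quotes. A sketch of that conversion (the exact mapping is an assumption; the slide only states the rough correspondence):

```python
def gamma_from_half_life(half_life):
    """Per-step discount rate such that credit halves after `half_life` steps."""
    return 0.5 ** (1.0 / half_life)

gamma = gamma_from_half_life(7)  # roughly 0.9 for a 7-day half life
```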
  75. Q-LEARNING 5) Page1:A = $10 + 0.9 * $5
  76. Q-LEARNING Direct Credit: $10.0; Attribution Credit: $4.5
  77. Q-LEARNING Direct Credit: $10.0; Attribution Credit: $4.5; Total Page1:A: $14.5
  78. Q-LEARNING 5) Credit Page1:A = $14.5
  79. Q-LEARNING Attribution in just two simple steps: 1) Treat Landing on the Next Page like a regular conversion! 2) Use Predictions of future values at the next step as the conversion value!
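The two-step recipe above fits in one update function. A sketch of the Q-learning step as the deck applies it, reproducing the worked example (α = 1 and γ = 0.9 are the slides' implicit settings):

```python
def q_update(q_current, reward, next_q_values, alpha=1.0, gamma=0.9):
    """One Q-learning step: the reward is the direct conversion value, and the
    best current estimate at the next page supplies the attribution credit."""
    next_best = max(next_q_values) if next_q_values else 0.0
    target = reward + gamma * next_best
    return q_current + alpha * (target - q_current)

# Slides 62-78: a $10 direct conversion, then Page 2 where C = $1 and D = $5
credit = q_update(q_current=0.0, reward=10.0, next_q_values=[1.0, 5.0])
print(credit)  # -> 14.5

# With no direct conversion, only the discounted next-step value flows back
credit_no_direct = q_update(0.0, 0.0, [0.41])  # close to 0.369
```

With α = 1 this reduces to exactly the slide arithmetic, $10 + 0.9 × $5 = $14.5; a smaller α would blend the new observation into the running estimate instead of replacing it.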
  80. Q LEARNING + TARGETING User: Is a New User and from a Rural area (Page 1 -> Page 2, Action A)
  81. Q LEARNING + TARGETING User: Is a New User and from a Rural area
  82. Q LEARNING + TARGETING The Attribution calculation depends on [Rural; New]
  83. Q-VALUE: NEW & RURAL USER (Source: Conductrics Predictive Audience Discovery)
  84. Q-VALUE: NEW & RURAL USER (Source: Conductrics Predictive Audience Discovery)
  85. Q-VALUE: NEW & RURAL USER (Source: Conductrics Predictive Audience Discovery)
  86. Q-VALUE: NEW & RURAL USER 1. For New & Rural users, Option B has the highest value 2. Use the predicted value of Option B in the Q-value calculation (Source: Conductrics Predictive Audience Discovery)
  87. Q LEARNING + TARGETING Page1:A = 0 + 0.9 * 0.41, per Q(s_t, a_t) <- Q(s_t, a_t) + α[ r_{t+1} + γ * max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]
  88. Q LEARNING + TARGETING Page1:A = 0.369
  89. WHAT DID WE LEARN 1) Bandits help solve Automation 2) Attribution can be solved by hacking ‘AB Testing’ (Q-Learning) 3) Extended Attribution to include decisions/experiments 4) Looked into the eye of AI and Lived
  90. WAKE UP. WE ARE DONE! Twitter: @mgershoff Email: matt.gershoff@conductrics.com
