
Reinforcement learning conductrics-superweek2017

Web Optimization is a Reinforcement Learning problem. Q-Learning is introduced as a way to integrate AB Testing, Attribution, and Predictive Targeting.


  1. 1. Learning over Sequences of Decisions
  2. 2. Who is this guy? CEO / Co-Founder, Conductrics (www.conductrics.com). Past: Database Marketing. Education: Artificial Intelligence & Economics. Twitter: @mgershoff, @conductrics. Email: matt@conductrics.com. Blog: www.conductrics.com/blog
  3. 3. AI in the News www.Conductrics.com @conductrics
  4. 4. AI in the News
  5. 5. AI is …?
  6. 6. What’s In it For You
  7. 7. What’s In it For You • Reinforcement Learning (RL): • AB Testing
  8. 8. What’s In it For You • Reinforcement Learning (RL): • AB Testing • Attribution
  9. 9. What’s In it For You • Reinforcement Learning (RL): • AB Testing • Attribution • Predictive Targeting
  10. 10. What’s In it For You • Reinforcement Learning (RL): • AB Testing • Attribution • Predictive Targeting • An RL Solution from AI
  11. 11. What’s In it For You • Reinforcement Learning (RL): • AB Testing • Attribution • Predictive Targeting • An RL Solution from AI • Tell Everyone You Know AI!!!
  12. 12. What is Reinforcement Learning?
  13. 13. Reinforcement Learning is a Problem, not a Solution
  14. 14. Reinforcement Learning Problem: Learn to make a Sequence of Decisions by Trial & Error in order to Achieve some Goal(s)
  15. 15. Reinforcement Learning Example:
  16. 16. Reinforcement Learning
  19. 19. Part 1: AB TEST = Trial & Error Learning. Agenda: AB Testing, Sequential Decisions, Targeting
  20. 20. Single Location Decisions/AB Test. Home Page Hero (Decision Point); Decision: Simple Image, Fancy Version; Fancy Version; RL Agent
  21. 21. Single Location Decisions/AB Test. Location: Page A
  22. 22. Single Location Decisions/AB Test. Location: Page A; Decision: A, B
  23. 23. Single Location Decisions/AB Test. Location: Page A; Decision: A, B; Objective/Payoff: Convert
  24. 24. Single Location Decisions/AB Test. Location: Page A; Decision: A, B; Objective/Payoff: Convert
  25. 25. Single Location Decisions/AB Test. Location: Page A; Decision: A, B; Objective/Payoff: Convert, Don’t Convert
  26. 26. Single Location Decisions/AB Test. Location: Page A; Decision: A, B; Objective/Payoff: Convert, Don’t Convert
  27. 27. Single Location Decisions/AB Test. Location: Page A; Decision: A, B; Objective/Payoff: Convert, Don’t Convert. How to Solve:
  28. 28. Single Location Decisions/AB Test. Location: Page A; Decision: A, B; Objective/Payoff: Convert, Don’t Convert. How to Solve: 1. AB/MV Testing
  29. 29. Single Location Decisions/AB Test. Location: Page A; Decision: A, B; Objective/Payoff: Convert, Don’t Convert. How to Solve: 1. AB/MV Testing 2. Multi-Arm Bandit
  30. 30. Single Location Decisions/AB Test. Only need Conversion Data. Option: Value. A: 5%; B: 6%
  31. 31. Part 2: Attribution as Sequential Decisions. Agenda: AB Testing, Attribution, Targeting
  32. 32. Single Location Compound: MVT. Home Page: Banner, Hero (Decision Point); Decision #1: Banner A, Banner B, Banner C; Decision #2: Simple Image, Fancy Version; Banner C, Fancy Version; RL Agent; Search Results, Special Offers, Checkout, Signup, Home Page
  33. 33. Sequential Decisions -> Dynamics. Enter Site; Page 1; Page 2
  34. 34. Sequential Decisions -> Dynamics. Enter Site; Page 1: A, B; Page 2: C, D
  35. 35. Sequential Decisions -> Dynamics. Enter Site; Page 1: A, B; Page 2: C, D; Exit Site
  36. 36. Sequential Decisions -> Dynamics. Enter Site; Page 1: A, B; Page 2: C, D; Exit Site; Goal
  37. 37. Sequential Decisions -> Dynamics. Enter Site; Page 1: A, B; Page 2: C, D; Exit Site; Goal. So far just AB Testing
  38. 38. Sequential Decisions -> Dynamics. Enter Site; Page 1: A, B; Page 2: C, D; Exit Site; Goal. Now add Dynamics
  39. 39. Sequential Decisions -> Dynamics. Enter Site; Page 1: A, B; Page 2: C, D; Exit Site; Goal. Transitions = Dynamics
  40. 40. Sequential Decisions -> Dynamics. Enter Site; Page 1: A, B; Page 2: C, D; Exit Site; Goal
  41. 41. Sequential Decisions. 1. Conversion Rates. Option: Value. Page1:A: 3%; Page1:B: 4%; Page2:C: 10%; Page2:D: 12%
  42. 42. Sequential Decisions -> Dynamics. 1. Conversion Rates 2. Transition Frequencies. Page:Action: to Page 1 / to Page 2. Page1:A: - / 30%; Page1:B: - / 20%; Page2:C: 2% / -; Page2:D: 1% / -
  43. 43. Sequential Decisions -> Dynamics. This is Complicated!
  44. 44. How to Assign Value? Backward Calculation: Assign Values BACK to events AFTER the Conversion
  45. 45. Calculating Attribution: Backward Looking. Agent; Home Page, Search Results, Special Offers, Signup
  46. 46. How Does Google Do it?
  47. 47. How Does Google Do it?
  48. 48. How to Assign Value? Backward Calculation: Is this the ONLY way?
  49. 49. Q Learning: $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) \right]$
  50. 50. Q Learning: $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) \right]$
  51. 51. Q Learning: $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) \right]$
  52. 52. Q Learning: $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) \right]$
  53. 53. Forward View: Q Learning: $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) \right]$
  54. 54. Forward View: Q Learning. Analytics Interpretation of Q-Learning: 1) Treat Landing on the Next Page like a regular conversion!
  55. 55. Forward View: Q Learning. Analytics Interpretation of Q-Learning: 1) Treat Landing on the Next Page like a regular conversion! 2) Use the estimates at the next step as the conversion value!
  56. 56. Forward View: Q Learning. 1) Take an action. Page 1: A, B
  57. 57. Forward View: Q Learning. 1) Take an action: Pick A. Page 1: A
  58. 58. Forward View: Q Learning. 2) Measure what the user does after. Page 1: A
  59. 59. Forward View: Q Learning. 2) Do they Convert? $10. Page 1: A
  60. 60. Forward View: Q Learning. 2) Yes! $10. Page 1: A
  61. 61. Forward View: Q Learning. $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) \right]$ 2) Set r = $10. Page 1: A
  62. 62. Forward View: Q Learning. EXACTLY the SAME as AB TESTING. $10. Page 1: A
  63. 63. Forward View: Q Learning. 3) Do they next go to Page 2? Page 1: A; Page 2; Goal
  64. 64. Forward View: Q Learning. 3) Yes! Page 1: A; Page 2; Goal
  65. 65. Forward View: Q Learning. 3) Yes! Now in Dynamic part of Path. Page 1: A; Page 2; Goal
  66. 66. Forward View: Q Learning. 4) Check Current Estimated Values ‘C’ & ‘D’. Page 2: C, D
  67. 67. Forward View: Q Learning. 4) Check Current Estimated Values ‘C’ & ‘D’. Of course initially C=$0; D=$0. Page 2: C = $0, D = $0
  68. 68. Forward View: Q Learning. 4) Check Current Estimated Values ‘C’ & ‘D’. But let’s just assume a mean of C=$1; D=$5. Page 2: C = $1, D = $5
  69. 69. Forward View: Q Learning. $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) \right]$ 4) Set $\max_a Q(s_{t+1}, a)$ = $5. Page 2: C = $1, D = $5
  70. 70. Forward View: Q Learning. $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) \right]$ 1. γ is the discount rate 2. Related to Google’s Half Life 3. 7 day half life → 0.9
  71. 71. Forward View: Q Learning. $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) \right]$ 5) Page1:A = 1 + 0.9 * 5. Page 1: A; Page 2; Goal
  72. 72. Forward View: Q Learning. Direct Credit: $1.0; Attribution Credit: $4.5
  73. 73. Forward View: Q Learning. Direct Credit: $1.0; Attribution Credit: $4.5; Total Page1|A: $5.5
  74. 74. Forward View: Q Learning. 5) Credit Page1:A = 5.5. Page 1: A; Page 2; Goal
  75. 75. Forward View: Q Learning. Attribution in just two simple steps:
  76. 76. Forward View: Q Learning. Attribution in just two simple steps: 1) Treat Landing on Next Page like a regular conversion!
  77. 77. Forward View: Q Learning. Attribution in just two simple steps: 1) Treat Landing on Next Page like a regular conversion! 2) Use Predictions of future values at the next step as the conversion value!
  78. 78. Forward View: Q Learning. Attribution in just two simple steps: 1) Treat Landing on Next Page exactly like a conversion! 2) Use estimates at the next step as the conversion value! 3) This is guaranteed to converge to optimum result!!!
  79. 79. Part 3: Targeting. Agenda: Trial & Error Learning, Sequential Decisions, Targeting
  80. 80. Targeting = Decision Logic: IF [Customer] THEN [Experiences?]
  81. 81. Q Learning + Targeting. User: Is a New User and from Rural area. Page 1: A; Page 2
  82. 82. Q Learning + Targeting. User: Is a New User and from Rural area. Page 1: A; Page 2
  83. 83. Q Learning + Targeting. Attribution calculation depends on [Rural;New]. Page 1: A; Page 2
  84. 84. At Page2: Evaluate Value New & Rural Customer
  85. 85. At Page2: Evaluate Value New & Rural Customer
  86. 86. At Page2: Evaluate Value New & Rural Customer
  88. 88. Q Learning + Targeting. Predicted Value = 43%
  89. 89. Q Learning + Targeting. $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) \right]$ Page1:A = 0 + 0.9 * 43%. Page 1: A; Page 2
  90. 90. Case Study: Web & Call Center. Optimize Marketing Site and Call Center IVR. 1. Website: • Initial Offer • Online Chat option 2. Call Center: • Choice of IVR prompts. Result: Call Center Conversion by 26%; Increased average call value by $9.28
  91. 91. Case Study: Web and Call Center. Sales drive Optimization; User Features: • Time of Day • Geo-Data • Browser Data; Get Decision; Website; IVR Options Response; IVR Prompts: A;B;C;….; Get Decision; Call Agent; IVR ‘Rewards’; Web Decision
  92. 92. Targeted Multi-Touch Optimization. Application Layer. Touch-Point 1: User; Local Options Optimizer; Local Model 1 (F1 F2 Fn S). Touch-Point 2: User; Local Options Optimizer; Local Model 2 (F1 F2 Fn S). Steps: 1) Option Request 2) Option Response 3) Touch-Point Transition 4) Option Request 5) Calculates Attribution Credit and sends to Model1 6) Update Local Model1 using credit as a conversion 7) Option Response 8) Conversion 9) Update Local Model2 using conversion value. • Attribution Credit enables Local Optimizers to Solve Global Multi-Touch Optimization
  93. 93. What did we Do/Learn? 1) Attribution can be solved by hacking ‘AB Testing’ (Q-Learning) 2) Extended Attribution to include decisions/experiments 3) Looked into the eye of AI and Lived
  94. 94. References: 1) https://conductrics.com/data-science-resources-2 2) http://videolectures.net/mlss09uk_littman_rl (model based RL) 3) https://en.wikipedia.org/wiki/Markov_decision_process
  95. 95. Thank you!
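
The worked example on slides 66 to 74 (Page1:A earns a direct credit of $1.0 plus an attribution credit of 0.9 * $5 = $4.5) can be sketched in a few lines of Python. This is only an illustrative sketch, not the Conductrics implementation: the table layout, the function names `q_update` and `discount_from_half_life`, and the choice of `alpha=1.0` (so that a single update reproduces the slide numbers) are assumptions. The half-life helper assumes one discount step per day, under which a 7 day half life gives a discount of about 0.906, close to the 0.9 on slide 70.

```python
from collections import defaultdict


def discount_from_half_life(half_life_steps):
    """Discount rate gamma such that value halves after `half_life_steps` steps.

    Assumption: one discount step per day, so gamma ** half_life == 0.5.
    """
    if half_life_steps <= 0:
        raise ValueError("half_life_steps must be positive")
    return 0.5 ** (1.0 / half_life_steps)


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """One tabular Q-learning step; returns the target r + gamma * max_a Q(s', a)."""
    # Best current estimate at the next page (0.0 if the user exits the site).
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    # Move the old estimate toward the target by a fraction alpha.
    q[(state, action)] += alpha * (target - q[(state, action)])
    return target


# Hypothetical value table keyed by (page, option); unseen pairs start at $0.
q = defaultdict(float)
q[("Page2", "C")] = 1.0  # assumed mean value of option C (slide 68)
q[("Page2", "D")] = 5.0  # assumed mean value of option D (slide 68)

# User saw option A on Page 1, earned $1 directly, then landed on Page 2.
direct_credit = 1.0
target = q_update(q, "Page1", "A", reward=direct_credit,
                  next_state="Page2", next_actions=["C", "D"],
                  alpha=1.0, gamma=0.9)
attribution_credit = target - direct_credit

print(f"direct={direct_credit:.1f} attribution={attribution_credit:.1f} "
      f"total={q[('Page1', 'A')]:.1f}")
print(f"gamma for a 7 day half life: {discount_from_half_life(7):.3f}")
```

With a smaller `alpha`, the estimate for Page1:A would move only part of the way toward $5.5 on each visit, which is how the running average is built up over many users.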

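
The targeting step on slides 81 to 89 (Page1:A = 0 + 0.9 * 43% for a New and Rural user) can be sketched the same way. The snippet below is a minimal illustration under stated assumptions: the lookup table `predicted_value_at_page2` stands in for whatever predictive model supplies the 43% estimate, and the function name `targeted_credit` and the segment keys are hypothetical.

```python
# Hypothetical per-segment predictions of value at Page 2.
# In practice these would come from a predictive model, not a fixed table.
predicted_value_at_page2 = {
    ("new", "rural"): 0.43,  # the 43% predicted value from slide 88
}


def targeted_credit(segment, direct_reward, gamma=0.9, default=0.0):
    """Credit for the Page 1 action, using the segment's predicted next-step value."""
    next_value = predicted_value_at_page2.get(segment, default)
    return direct_reward + gamma * next_value


# New, Rural user: no direct conversion on Page 1, then a transition to Page 2.
credit = targeted_credit(("new", "rural"), direct_reward=0.0)
print(f"Page1:A credit for [Rural;New]: {credit:.3f}")
```

The only change from the untargeted case is that the next-step estimate is looked up per user segment, so the same Page 1 action can receive different attribution credit for different kinds of users.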