AB TESTING TO AI
(REINFORCEMENT LEARNING)
WHO IS THIS GUY?
• Matt Gershoff
• CEO: Conductrics
• Twitter: @mgershoff
• Email: matt@conductrics.com
AI is …?
WHAT WE WILL TALK ABOUT
• Definition of Reinforcement Learning
– Trial and Error Learning
• AB Testing (Bayesian)
• Multi-Armed Bandit (Automation)
• Bandit with Targeting
– Multi-Touch Point Optimization
• Attribution = Dynamics
• Q-Learning
What is Reinforcement Learning?
Reinforcement Learning is a Problem, not a Solution
Reinforcement Learning Problem:
Learn to make a Sequence of Decisions by Trial & Error in order to Achieve (delayed) Goal(s)
EXAMPLE
MARKETING PROBLEMS
Online Applications – websites, mobile, things communicating via HTTP
Low Risk Decisions* – e.g. ‘Which Banner’
High Volume* – not for one-off decisions, or decisions that are made infrequently
* High Volume/Low Risk from here http://jtonedm.com/
TRIAL AND ERROR LEARNING
AB
Testing/Bandit
Sequential
Decisions
Targeting
Location: Page A
Decision: A or B
Objective/Payoff: Convert / Don’t Convert
TRIAL AND ERROR: AB TESTING
How to Solve:
1. AB Testing
AB Testing: Bayesian
Red Button Green Button
Bayesian AB Test asks:
Is P(Green > Red | Data) large?
BAYESIAN AB TESTING REVIEW
P(Green > Red | Data) = 50%
Sample Size = 0
BAYESIAN AB TESTING REVIEW
P(Green > Red | Data) = 68%
Sample Size = 100
BAYESIAN AB TESTING REVIEW
P(Green > Red | Data) = 94%
Sample Size = 1,000
BAYESIAN AB TESTING REVIEW
P(Green > Red | Data) = 99.99…%
Sample Size = 10,000
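The P(Green > Red | Data) numbers above can be reproduced by Monte Carlo sampling from each arm's posterior. A minimal sketch, assuming a Beta-Bernoulli model with uniform Beta(1, 1) priors (the conversion counts are illustrative, not from the slides):

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=100_000):
    """Estimate P(B > A | data) by Monte Carlo: sample each arm's
    conversion rate from its Beta posterior (uniform Beta(1,1) prior)
    and count how often B's draw beats A's."""
    wins = 0
    for _ in range(draws):
        theta_a = random.betavariate(1 + conv_a, 1 + n_a - conv_a)
        theta_b = random.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if theta_b > theta_a:
            wins += 1
    return wins / draws

# With no data the two posteriors are identical, so P ≈ 50%.
print(prob_b_beats_a(0, 0, 0, 0))
# As data accumulates, the probability moves toward 0 or 1.
print(prob_b_beats_a(40, 500, 60, 500))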
AB TESTING -> LEARN FIRST
Time
Explore/Learn → Data Collection/Sample
Exploit/Earn → Apply Learning
How to Solve:
1. AB Testing
2. Multi-Armed Bandit
SINGLE LOCATION DECISIONS/AB TEST
Like Bayesian AB Testing
• Calculate P(A|Data) & P(B|Data)
Unlike AB Testing
• Don’t make fair selections (50/50)
• Select based on P(A|Data) & P(B|Data)
BANDIT: THOMPSON SAMPLING
Adaptive
Construct Probability Distributions
• Use Mean as center
• Standard Deviation for spread
[Plot: posterior distributions for Options A, B, C]
Adaptive – For Each User
1) Take a random sample from each distribution
A = 0.49, B = 0.51, C = 0.46
2) Pick the Option with the Highest Sample (Option B)
Adaptive – Repeat for the Next User
1) Take a random sample from each distribution
A = 0.52, B = 0.43, C = 0.49
ADAPTIVE: THOMPSON SAMPLING
Selection Chance based on:
1. Relative estimated mean value of the option
2. Amount of overlap of the distributions
[Bar chart – Selection Chance: Option A 67%, Option B 8%, Option C 25%]
ADAPTIVE: THOMPSON SAMPLING
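The whole Thompson sampling loop above fits in a few lines. A minimal Beta-Bernoulli sketch (the per-option conversion counts are hypothetical, not Conductrics data):

```python
import random

def thompson_select(arms):
    """One Thompson-sampling decision: draw a random sample from each
    arm's Beta posterior, then serve the arm with the highest draw."""
    samples = {name: random.betavariate(1 + conv, 1 + n - conv)
               for name, (conv, n) in arms.items()}
    return max(samples, key=samples.get)

# Hypothetical running totals per option: (conversions, trials).
arms = {"A": (30, 1000), "B": (50, 1000), "C": (40, 1000)}

# Simulate many users: the best-looking arm gets most of the traffic,
# but overlapping posteriors keep some exploration going.
counts = {"A": 0, "B": 0, "C": 0}
for _ in range(10_000):
    counts[thompson_select(arms)] += 1
print(counts)
```

Note how selection is adaptive rather than fair 50/50: the chance each arm is served falls out of the overlap between its posterior and the others'.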
Trial and
Error
Learning
Sequential
Decisions
Predictive
Targeting
TARGETING
PREDICTIVE TARGETING
A Mapping
Behavioral Data → Options/Actions
Thompson Sampling with Targeting
LEARNING THE MAPPINGS
• Regression (Lin, Logistic, etc.)
• Deep Nets
• Decision Trees
f(x) = w0 + Σ_d (w_d · x_d)
REGRESSION
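The regression mapping f(x) = w0 + Σ_d (w_d · x_d) can be sketched directly; the feature names and weights below are made up for illustration:

```python
def predict(weights, bias, x):
    """Linear scoring function: f(x) = w0 + sum_d(w_d * x_d)."""
    return bias + sum(w * xi for w, xi in zip(weights, x))

# Hypothetical features x = [is_new_user, is_rural] and illustrative
# weights for scoring one option's conversion probability.
score = predict([0.02, -0.01], 0.03, [1, 1])
print(score)  # ≈ 0.04
```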
1) Input Data
2) Hidden Layer
3) Hidden Layer
4) Output Layer
Source: Larochelle – Neural Networks 1 – DLSS 2017.pdf
DEEP LEARNING
Model as Decision Tree
What Simple Model?
REINFORCEMENT LEARNING
REINFORCEMENT LEARNING
1. Sequential Decisions
2. Delayed Rewards
EXAMPLE
Enter Site
Page 1 (Options A, B)
Page 2 (Options C, D)
Exit Site
Goal
MULTI-TOUCH = DYNAMICS
1. Conversion Rates
Option Value
Page1:A 3%
Page1:B 4%
Page2:C 10%
Page2:D 12%
MULTI-TOUCH = DYNAMICS
1. Conversion Rates
2. Transition Frequencies
Page:Action Page 1 Page 2
Page1:A - 30%
Page1:B - 20%
Page2:C 2% -
Page2:D 1% -
MULTI-TOUCH = DYNAMICS
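Combining the two tables gives a feel for why multi-touch is complicated. A hypothetical one-step look-ahead, assuming a conversion is worth 1 and the value of reaching Page 2 is its best option's conversion rate:

```python
# Numbers from the two tables above.
convert = {"Page1:A": 0.03, "Page1:B": 0.04,
           "Page2:C": 0.10, "Page2:D": 0.12}
goto_page2 = {"Page1:A": 0.30, "Page1:B": 0.20}

# Value of reaching Page 2 = its best option's conversion rate.
page2_value = max(convert["Page2:C"], convert["Page2:D"])

# An option's total value mixes its direct conversion rate with the
# chance it forwards the user to Page 2, whose options have value too.
for option, p_move in goto_page2.items():
    total = convert[option] + p_move * page2_value
    print(option, round(total, 4))
```

Under these assumptions Page1:A (0.03 + 0.30 × 0.12 = 0.066) edges out Page1:B (0.064) despite its lower direct conversion rate, which is exactly the attribution problem the rest of the deck addresses.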
This is Complicated!
MULTI-TOUCH = DYNAMICS
Q Learning
Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ · max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]
Q-LEARNING
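The update rule above can be sketched as a tiny tabular Q-learning step. This is a minimal illustration, not Conductrics' implementation; the page/option names and the alpha and gamma values are assumptions:

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    A next_state missing from q (end of session) adds no future value."""
    future = max(q[next_state].values()) if q.get(next_state) else 0.0
    q[state][action] += alpha * (reward + gamma * future - q[state][action])
    return q

# Two-page site from the slides: current estimates for Page 2 are C=$1, D=$5.
q = {"Page1": {"A": 0.0, "B": 0.0},
     "Page2": {"C": 1.0, "D": 5.0}}

# A user sees Page1:A, converts for $10, then lands on Page 2.
# With alpha=1.0 this reproduces the slides' $10 + 0.9 * $5 = $14.50 credit.
q_update(q, "Page1", "A", reward=10.0, next_state="Page2", alpha=1.0)
print(q["Page1"]["A"])  # 14.5
```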
Analytics Interpretation of Q-Learning
1)Treat Landing on the Next Page like a
regular conversion!
2)Use the estimates at the next step as the
conversion value!
Q-LEARNING
Page 1 A B
1) Take an action
Q-LEARNING
Page 1 A
1) Take an action – Pick A
Q-LEARNING
Page 1 A
2) Measure what user does after
Q-LEARNING
2) Do they Convert? Yes! ($10)
Page 1: A
Q-LEARNING
Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ · max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]
2) Set r = $10
$10
Page 1 A
Q-LEARNING
EXACTLY the SAME as AB TESTING
$10
Page 1 A
Q-LEARNING
3) Do they next go to Page 2?
Goal
Page 1 A
Page 2
Q-LEARNING
3) Yes! Now in Dynamic part of Path
Goal
Page 1
Page 2
A
Q-LEARNING
4) Check Current Estimated Values ‘C’ & ‘D’
Of course initially C=$0; D=$0
Page 2 C D
$0 $0
Q-LEARNING
4) Check Current Estimated Values ‘C’ & ‘D’
But assume mean of C=$1; D=$5
Page 2 C D
$1 $5
Q-LEARNING
Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ · max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]
4) Set max_a Q(s_{t+1}, a) = $5 (the value of D)
Page 2 C D
$1 $5
Q-LEARNING
Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ · max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]
1. γ is the discount rate
2. Related to Google’s half-life
3. A 7-day half-life → γ ≈ 0.9
Q-LEARNING
Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ · max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]
5) Page1:A = $10 + 0.9 × $5
$10
Page 1
Page 2
A
Q-LEARNING
Direct Credit: $10.00
Attribution Credit: $4.50
Total Page1:A: $14.50
Q-LEARNING
5) Credit Page1:A = $14.50
$10
Page 1
Page 2
A
Q-LEARNING
Attribution in just two simple steps:
1)Treat Landing on Next Page like a regular
conversion!
2)Use Predictions of future values at the
next step as the conversion value!
Q-LEARNING
Q Learning + Targeting
User: a New User from a Rural area
Page 1
Page 2
A
Attribution calculation depends on [Rural;New]
Page 1
Page 2
A
Q Learning + Targeting
Q-VALUE: NEW & RURAL USER
1. For New & Rural users Option B has highest value
2. Use the predicted value of Option B in the Q-value calculation
Source: Conductrics Predictive Audience Discovery
Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ · max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]
Page 1
Page 2
A
Page1:A = 0 + 0.9 × 0.41
Q Learning + Targeting
Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ · max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]
Page 1
Page 2
A
Page1:A = 0.369
Q Learning + Targeting
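The targeted update can be sketched numerically. Per the slides, 0.41 is the model's predicted value of the best next option (B) for the New & Rural segment; the rest follows the Q-learning formula:

```python
# Hypothetical numbers from the slides for a New & Rural user.
gamma = 0.9             # discount rate (7-day half-life per the slides)
reward = 0.0            # no direct conversion on this step
predicted_next = 0.41   # predicted value of the best next option (B)

# The future value plugged into the Q update is the *predicted*
# value for this user's segment, not a global average.
q_page1_a = reward + gamma * predicted_next
print(round(q_page1_a, 3))  # 0.369
```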
1) Bandits help solve Automation
2) Attribution can be solved by hacking ‘AB Testing’ (Q-Learning)
3) Extended Attribution to include decisions/experiments
4) Looked into the eye of AI and Lived
WHAT DID WE LEARN?
WAKE UP. WE ARE DONE!
Twitter: @mgershoff
Email: matt.gershoff@conductrics.com
Matt Gershoff