AB TESTING TO AI
(REINFORCEMENT LEARNING)
WHO IS THIS GUY?
• Matt Gershoff
• CEO: Conductrics
• Twitter: @mgershoff
• Email: matt@conductrics.com
AI is …?
WHAT WE WILL TALK ABOUT
• Definition of Reinforcement Learning
– Trial and Error Learning
• AB Testing (Bayesian)
• Multi-Armed Bandit (Automation)
• Bandit with Targeting
– Multi-Touch Point Optimization
• Attribution = Dynamics
• Q-Learning
What is Reinforcement Learning?
Reinforcement Learning is a Problem, not a Solution
Reinforcement Learning Problem:
Learn to make a Sequence of Decisions by Trial & Error in order to Achieve (delayed) Goal(s)
EXAMPLE
MARKETING PROBLEMS
Online Applications – websites, mobile, things communicating via HTTP
Low Risk Decisions* – e.g. ‘Which Banner’
High Volume* – not for one-off decisions, or decisions that are made infrequently
* High Volume/Low Risk from here http://jtonedm.com/
TRIAL AND ERROR LEARNING
AB
Testing/Bandit
Sequential
Decisions
Targeting
Location: Page A
Decision: A or B
Objective/Payoff: Convert / Don’t Convert
TRIAL AND ERROR: AB TESTING
How to Solve:
1. AB Testing
AB Testing: Bayesian
Red Button Green Button
Bayesian AB Test asks:
Is P(Green > Red | Data) large?
BAYESIAN AB TESTING REVIEW
P(Green > Red | Data) = 50%
Sample Size = 0
BAYESIAN AB TESTING REVIEW
P(Green > Red | Data) = 68%
Sample Size = 100
BAYESIAN AB TESTING REVIEW
P(Green > Red | Data) = 94%
Sample Size = 1,000
BAYESIAN AB TESTING REVIEW
P(Green > Red | Data) = 99.99…%
Sample Size = 10,000
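The P(Green > Red | Data) numbers above can be reproduced by Monte Carlo sampling from each arm's posterior. A minimal sketch, assuming a Beta-Bernoulli model with uniform Beta(1, 1) priors (the conversion counts are illustrative, not from the slides):

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=100_000):
    """Estimate P(B > A | data) by Monte Carlo: sample each arm's
    conversion rate from its Beta posterior (uniform Beta(1,1) prior)
    and count how often B's draw beats A's."""
    wins = 0
    for _ in range(draws):
        theta_a = random.betavariate(1 + conv_a, 1 + n_a - conv_a)
        theta_b = random.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if theta_b > theta_a:
            wins += 1
    return wins / draws

# With no data the two posteriors are identical, so P ≈ 50%.
print(prob_b_beats_a(0, 0, 0, 0))
# As data accumulates, the probability moves toward 0 or 1.
print(prob_b_beats_a(40, 500, 60, 500))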
AB TESTING -> LEARN FIRST
Time
Explore/Learn → Data Collection/Sample
Exploit/Earn → Apply Learning
How to Solve:
1. AB Testing
2. Multi-Armed Bandit
SINGLE LOCATION DECISIONS/AB TEST
Like Bayesian AB Testing
• Calculate P(A|Data) & P(B|Data)
Unlike AB Testing
• Don’t make fair selections (50/50)
• Select based on P(A|Data) & P(B|Data)
BANDIT: THOMPSON SAMPLING
Adaptive
Construct Probability Distributions
• Use Mean as center
• Standard Deviation for spread
[Plot: posterior distributions for Options A, B, C]
Adaptive – For Each User
1) Take a random sample from each distribution
A = 0.49, B = 0.51, C = 0.46
2) Pick the Option with the Highest Sample (Option B)
Adaptive – Repeat for the Next User
1) Take a random sample from each distribution
A = 0.52, B = 0.43, C = 0.49
ADAPTIVE: THOMPSON SAMPLING
Selection Chance based on:
1. Relative estimated mean value of the option
2. Amount of overlap of the distributions
[Bar chart – Selection Chance: Option A 67%, Option B 8%, Option C 25%]
ADAPTIVE: THOMPSON SAMPLING
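The whole Thompson sampling loop above fits in a few lines. A minimal Beta-Bernoulli sketch (the per-option conversion counts are hypothetical, not Conductrics data):

```python
import random

def thompson_select(arms):
    """One Thompson-sampling decision: draw a random sample from each
    arm's Beta posterior, then serve the arm with the highest draw."""
    samples = {name: random.betavariate(1 + conv, 1 + n - conv)
               for name, (conv, n) in arms.items()}
    return max(samples, key=samples.get)

# Hypothetical running totals per option: (conversions, trials).
arms = {"A": (30, 1000), "B": (50, 1000), "C": (40, 1000)}

# Simulate many users: the best-looking arm gets most of the traffic,
# but overlapping posteriors keep some exploration going.
counts = {"A": 0, "B": 0, "C": 0}
for _ in range(10_000):
    counts[thompson_select(arms)] += 1
print(counts)
```

Note how selection is adaptive rather than fair 50/50: the chance each arm is served falls out of the overlap between its posterior and the others'.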
Trial and
Error
Learning
Sequential
Decisions
Predictive
Targeting
TARGETING
PREDICTIVE TARGETING
A Mapping
Behavioral Data → Options/Actions
Thompson Sampling with Targeting
LEARNING THE MAPPINGS
• Regression (Lin, Logistic, etc.)
• Deep Nets
• Decision Trees
f(x) = w0 + Σ_d (w_d · x_d)
REGRESSION
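The regression mapping f(x) = w0 + Σ_d (w_d · x_d) can be sketched directly; the feature names and weights below are made up for illustration:

```python
def predict(weights, bias, x):
    """Linear scoring function: f(x) = w0 + sum_d(w_d * x_d)."""
    return bias + sum(w * xi for w, xi in zip(weights, x))

# Hypothetical features x = [is_new_user, is_rural] and illustrative
# weights for scoring one option's conversion probability.
score = predict([0.02, -0.01], 0.03, [1, 1])
print(score)  # ≈ 0.04
```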
1) Input Data
2) Hidden Layer
3) Hidden Layer
4) Output Layer
Source: Larochelle – Neural Networks 1 – DLSS 2017.pdf
DEEP LEARNING
Model as Decision Tree
What Simple Model?
REINFORCEMENT LEARNING
REINFORCEMENT LEARNING
1. Sequential Decisions
2. Delayed Rewards
EXAMPLE
Enter Site
Page 1 (Options A, B)
Page 2 (Options C, D)
Exit Site
Goal
MULTI-TOUCH = DYNAMICS
1. Conversion Rates
Option Value
Page1:A 3%
Page1:B 4%
Page2:C 10%
Page2:D 12%
MULTI-TOUCH = DYNAMICS
1. Conversion Rates
2. Transition Frequencies
Page:Action Page 1 Page 2
Page1:A - 30%
Page1:B - 20%
Page2:C 2% -
Page2:D 1% -
MULTI-TOUCH = DYNAMICS
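Combining the two tables gives a feel for why multi-touch is complicated. A hypothetical one-step look-ahead, assuming a conversion is worth 1 and the value of reaching Page 2 is its best option's conversion rate:

```python
# Numbers from the two tables above.
convert = {"Page1:A": 0.03, "Page1:B": 0.04,
           "Page2:C": 0.10, "Page2:D": 0.12}
goto_page2 = {"Page1:A": 0.30, "Page1:B": 0.20}

# Value of reaching Page 2 = its best option's conversion rate.
page2_value = max(convert["Page2:C"], convert["Page2:D"])

# An option's total value mixes its direct conversion rate with the
# chance it forwards the user to Page 2, whose options have value too.
for option, p_move in goto_page2.items():
    total = convert[option] + p_move * page2_value
    print(option, round(total, 4))
```

Under these assumptions Page1:A (0.03 + 0.30 × 0.12 = 0.066) edges out Page1:B (0.064) despite its lower direct conversion rate, which is exactly the attribution problem the rest of the deck addresses.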
This is Complicated!
MULTI-TOUCH = DYNAMICS
Q Learning
Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ · max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]
Q-LEARNING
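The update rule above can be sketched as a tiny tabular Q-learning step. This is a minimal illustration, not Conductrics' implementation; the page/option names and the alpha and gamma values are assumptions:

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    A next_state missing from q (end of session) adds no future value."""
    future = max(q[next_state].values()) if q.get(next_state) else 0.0
    q[state][action] += alpha * (reward + gamma * future - q[state][action])
    return q

# Two-page site from the slides: current estimates for Page 2 are C=$1, D=$5.
q = {"Page1": {"A": 0.0, "B": 0.0},
     "Page2": {"C": 1.0, "D": 5.0}}

# A user sees Page1:A, converts for $10, then lands on Page 2.
# With alpha=1.0 this reproduces the slides' $10 + 0.9 * $5 = $14.50 credit.
q_update(q, "Page1", "A", reward=10.0, next_state="Page2", alpha=1.0)
print(q["Page1"]["A"])  # 14.5
```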
Analytics Interpretation of Q-Learning
1)Treat Landing on the Next Page like a
regular conversion!
2)Use the estimates at the next step as the
conversion value!
Q-LEARNING
Page 1 A B
1) Take an action
Q-LEARNING
Page 1 A
1) Take an action – Pick A
Q-LEARNING
Page 1 A
2) Measure what user does after
Q-LEARNING
2) Do they Convert? Yes! ($10)
Page 1: A
Q-LEARNING
Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ · max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]
2) Set r = $10
$10
Page 1 A
Q-LEARNING
EXACTLY the SAME as AB TESTING
$10
Page 1 A
Q-LEARNING
3) Do they next go to Page 2?
Goal
Page 1 A
Page 2
Q-LEARNING
3) Yes! Now in Dynamic part of Path
Goal
Page 1
Page 2
A
Q-LEARNING
4) Check Current Estimated Values ‘C’ & ‘D’
Of course initially C=$0; D=$0
Page 2 C D
$0 $0
Q-LEARNING
4) Check Current Estimated Values ‘C’ & ‘D’
But assume mean of C=$1; D=$5
Page 2 C D
$1 $5
Q-LEARNING
Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ · max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]
4) Set max_a Q(s_{t+1}, a) = $5 (the value of D)
Page 2 C D
$1 $5
Q-LEARNING
Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ · max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]
1. γ is the discount rate
2. Related to Google’s half-life
3. A 7-day half-life → γ ≈ 0.9
Q-LEARNING
Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ · max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]
5) Page1:A = $10 + 0.9 × $5
$10
Page 1
Page 2
A
Q-LEARNING
Direct Credit: $10.00
Attribution Credit: $4.50
Total Page1:A: $14.50
Q-LEARNING
5) Credit Page1:A = $14.50
$10
Page 1
Page 2
A
Q-LEARNING
Attribution in just two simple steps:
1)Treat Landing on Next Page like a regular
conversion!
2)Use Predictions of future values at the
next step as the conversion value!
Q-LEARNING
Q Learning + Targeting
User: a New User from a Rural area
Page 1
Page 2
A
Attribution calculation depends on [Rural;New]
Page 1
Page 2
A
Q Learning + Targeting
Q-VALUE: NEW & RURAL USER
1. For New & Rural users Option B has highest value
2. Use the predicted value of Option B in the Q-value calculation
Source: Conductrics Predictive Audience Discovery
Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ · max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]
Page 1
Page 2
A
Page1:A = 0 + 0.9 × 0.41
Q Learning + Targeting
Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ · max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]
Page 1
Page 2
A
Page1:A = 0.369
Q Learning + Targeting
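The targeted update can be sketched numerically. Per the slides, 0.41 is the model's predicted value of the best next option (B) for the New & Rural segment; the rest follows the Q-learning formula:

```python
# Hypothetical numbers from the slides for a New & Rural user.
gamma = 0.9             # discount rate (7-day half-life per the slides)
reward = 0.0            # no direct conversion on this step
predicted_next = 0.41   # predicted value of the best next option (B)

# The future value plugged into the Q update is the *predicted*
# value for this user's segment, not a global average.
q_page1_a = reward + gamma * predicted_next
print(round(q_page1_a, 3))  # 0.369
```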
1) Bandits help solve Automation
2) Attribution can be solved by hacking ‘AB Testing’ (Q-Learning)
3) Extended Attribution to include decisions/experiments
4) Looked into the eye of AI and Lived
WHAT DID WE LEARN?
WAKE UP. WE ARE DONE!
Twitter: @mgershoff
Email: matt.gershoff@conductrics.com
Matt Gershoff