Uplift Modelling as a Tool for Making Causal Inferences at Shopify
1
Mojan Hamed
November 2019
● from Toronto, Canada
● uWaterloo, B.A.Sc Engineering and uWashington, M.A.Sc Applied Math
● data scientist at Shopify
2
this talk
3
4
5
“Most targeted marketing activity today, even when measured on the basis of incremental impact, is targeted on the basis of non-incremental models.”
— Radcliffe & Surry [1]
response model: “who will respond positively?”
uplift model: “who will respond positively, who would not have if not treated?”
Thus uplift modelling focuses on which individuals’ behaviour you can change.
6
target on the basis of incrementality
7
we may be tempted to use a response (propensity) model
[diagram: Training Population → Pilot Campaign → Model (ex. logistic regression) → Test Population]
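For concreteness, a minimal sketch of such a response model, assuming a pilot-campaign DataFrame with hypothetical columns feature_1, feature_2, and responded. This is the approach the following slides argue against using for targeting:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Pilot-campaign data: everyone here received the campaign, and we record whether
# they responded. Column names and values are hypothetical.
pilot = pd.DataFrame({
    "feature_1": [34, 65, 67, 24, 51, 12],
    "feature_2": [392, 583, 394, 960, 410, 720],
    "responded": [1, 0, 1, 0, 1, 0],
})

response_model = LogisticRegression().fit(
    pilot[["feature_1", "feature_2"]], pilot["responded"])

# Score the wider test population. These scores answer "who is likely to respond?",
# not "whose response will the treatment actually change?" -- which is the pitfall.
test_population = pd.DataFrame({"feature_1": [40, 70], "feature_2": [500, 900]})
scores = response_model.predict_proba(test_population)[:, 1]
print(scores)
```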
8
9
response model: modeling the probability of the effect
uplift model: modeling the probability of an increase in the effect
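In symbols (a standard formulation assumed here rather than copied from the slide, with Ti as the treatment indicator):

```latex
% response model: the probability of the effect
P(Y_i = 1 \mid X_i)

% uplift model: the increase in that probability caused by the treatment
\tau(X_i) = P(Y_i = 1 \mid X_i, T_i = 1) - P(Y_i = 1 \mid X_i, T_i = 0)
```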
10
11
● run a randomized experiment with the treatment
○ the effect can also be inferred from observational data if the unconfoundedness condition is met (out of scope)
● train a model that can estimate the causal effect (e.g. the CATE)
● evaluate model on incremental gains
● extend model to score individuals on an ongoing basis
12
Conditional Average Treatment Effect (CATE)
13
CATE(Xi) = E[Yi(treatment) − Yi(control) | Xi], where Xi is the vector of features and Yi is the individual outcome
Conditional Average Treatment Effect (CATE)
14
CATE(Xi) = E[Yi(treatment) − Yi(control) | Xi], where Xi is the vector of features and Yi is the individual outcome
The unobserved term is the counterfactual: “What would a person have done in an alternative universe where we didn’t treat them?”
input data
15
id   group       feature matrix
1    Treatment   34   392
2    Treatment   65   583
3    Control     67   394
4    Control     24   960
experiment results
16
id   group       feature matrix    observed outcome
1    Treatment   34   392          1
2    Treatment   65   583          0
3    Control     67   394          1
4    Control     24   960          0
17
id   group       feature matrix    observed   counterfactual
1    Treatment   34   392          1          0
2    Treatment   65   583          0          1
3    Control     67   394          1          1
4    Control     24   960          0          0
our uplift model will estimate the counterfactual column
18
id   group       feature matrix    observed   counterfactual   causal effect (CATE)
1    Treatment   34   392          1          0                 1
2    Treatment   65   583          0          1                -1
3    Control     67   394          1          1                 0
4    Control     24   960          0          0                 0
19
row 1 (observed 1, counterfactual 0, CATE +1): perfect target!
20
row 2 (observed 0, counterfactual 1, CATE −1): sleeping dog
21
row 3 (observed 1, counterfactual 1, CATE 0): sure thing
22
row 4 (observed 0, counterfactual 0, CATE 0): unpersuadable
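A small pandas sketch of this walkthrough, using the slides’ illustrative counterfactual values (in practice the model estimates them) and mapping each row’s effect onto the four segments named above:

```python
import pandas as pd

# The walkthrough table: the uplift model supplies the "counterfactual" column,
# and CATE is the outcome under treatment minus the outcome under control.
rows = pd.DataFrame({
    "id":             [1, 2, 3, 4],
    "group":          ["treatment", "treatment", "control", "control"],
    "feature_1":      [34, 65, 67, 24],
    "feature_2":      [392, 583, 394, 960],
    "observed":       [1, 0, 1, 0],
    "counterfactual": [0, 1, 1, 0],   # illustrative values; in practice estimated by the model
})

treated = rows["group"] == "treatment"
# For treated rows the counterfactual is the control outcome, and vice versa.
rows["cate"] = (rows["observed"] - rows["counterfactual"]).where(
    treated, rows["counterfactual"] - rows["observed"])

def segment(row):
    """Map each individual onto the four classic uplift segments."""
    if row["cate"] > 0:
        return "perfect target (persuadable)"
    if row["cate"] < 0:
        return "sleeping dog"
    return "sure thing" if row["observed"] == 1 else "unpersuadable"

rows["segment"] = rows.apply(segment, axis=1)
print(rows[["id", "group", "observed", "counterfactual", "cate", "segment"]])
```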
23
24
github.com/mojanh/explore
Three methods
25
Three methods
26
Train one model for E[Yi(treatment) | Xi] and another for E[Yi(control) | Xi]
Well known implementation: CausalLift
[diagram: Experiment → split into Treatment and Control groups;
 Model A = E[Yi(treatment) | Xi], fit on the treatment group;
 Model B = E[Yi(control) | Xi], fit on the control group;
 Uplift = Model A − Model B]
* to score individuals going forward, take the Δ of the two models
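A hand-rolled sketch of the two-model approach on synthetic experiment data (the slide names CausalLift as an implementation; the learner, column names, and data below are assumptions for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Hypothetical randomized experiment: treatment flag, features, binary outcome.
n = 1000
df = pd.DataFrame({
    "treated":   rng.integers(0, 2, n),
    "feature_1": rng.normal(size=n),
    "feature_2": rng.normal(size=n),
})
# Synthetic outcome whose treatment effect depends on feature_1 (illustration only).
p = 0.3 + 0.2 * df["treated"] * (df["feature_1"] > 0)
df["outcome"] = rng.binomial(1, p)

features = ["feature_1", "feature_2"]

# Model A: E[Y | X, treatment], fit on treated rows; Model B: E[Y | X, control].
model_a = GradientBoostingClassifier().fit(
    df.loc[df["treated"] == 1, features], df.loc[df["treated"] == 1, "outcome"])
model_b = GradientBoostingClassifier().fit(
    df.loc[df["treated"] == 0, features], df.loc[df["treated"] == 0, "outcome"])

# Uplift score = Model A - Model B, evaluated for every individual.
df["uplift"] = (model_a.predict_proba(df[features])[:, 1]
                - model_b.predict_proba(df[features])[:, 1])
```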
Three methods
27
● Very simple, intuitive
● Can work well for lower-complexity problems
● Disadvantages:
○ Behaviour of the uplift estimate may differ from that of the individual classifiers
○ Fitting each model to its main effect may miss the “weaker” uplift signal
○ Variable selection and weighting can differ between the two models
Three methods
28
● Requires a balanced, binary outcome variable
● Derive a transformed outcome and train a single model to optimize for it
○ Well known implementation: Pylift
proof [derivation shown as an image on the original slide]
* W is the treatment indicator (0, 1) and p = P(W = 1)
* to score individuals going forward, simply fit the single model
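The transformation itself was an image on the original slide; a commonly used version consistent with the W and p defined above is the transformed outcome z = y·(w − p) / (p·(1 − p)), whose conditional expectation equals the uplift under randomization. A hand-rolled sketch (not the Pylift API; column names, learner, and data are illustrative):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)

# Hypothetical randomized-experiment data: binary treatment W, binary outcome Y.
n = 1000
df = pd.DataFrame({
    "w":         rng.integers(0, 2, n),
    "feature_1": rng.normal(size=n),
    "feature_2": rng.normal(size=n),
})
df["y"] = rng.binomial(1, 0.3 + 0.2 * df["w"] * (df["feature_1"] > 0))

# Transformed outcome: z = y * (w - p) / (p * (1 - p)), with p = P(W = 1).
# In expectation E[z | X] equals the CATE, so a single regressor fit to z
# optimizes directly for uplift.
p = df["w"].mean()
df["z"] = df["y"] * (df["w"] - p) / (p * (1 - p))

features = ["feature_1", "feature_2"]
uplift_model = GradientBoostingRegressor().fit(df[features], df["z"])

# Scoring going forward only needs this single model.
df["uplift"] = uplift_model.predict(df[features])
```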
Three methods
29
● in general, the goal of a decision tree split is to:
○ minimize the Δ in size between splits (keep splits balanced)
○ maximize the Δ in value between splits (homogeneity within each split)
● for uplift we add one additional criterion:
○ maximize the Δ between control and treatment across the splits
● aka “difference of differences” (see the sketch below)
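A toy scoring function for a candidate split, implementing the “difference of differences” idea in its simplest form. Real uplift-tree implementations use divergence measures (KL, Euclidean, chi-squared) and balance penalties; the function name and data here are illustrative:

```python
import numpy as np

def delta_delta_score(x, y, treated, threshold):
    """Toy 'difference of differences' score for splitting on x < threshold.

    For each side of the split, compute the treatment-vs-control difference in
    mean outcome, then return how much that difference differs between sides.
    """
    left = x < threshold

    def uplift(mask):
        t, c = mask & (treated == 1), mask & (treated == 0)
        if t.sum() == 0 or c.sum() == 0:   # guard against empty groups
            return 0.0
        return y[t].mean() - y[c].mean()

    return abs(uplift(left) - uplift(~left))

# Example: pick the candidate threshold on a feature that maximizes the score.
rng = np.random.default_rng(2)
x = rng.normal(size=500)
treated = rng.integers(0, 2, 500)
y = rng.binomial(1, 0.3 + 0.25 * treated * (x > 0))
best = max(np.quantile(x, np.linspace(0.1, 0.9, 9)),
           key=lambda t: delta_delta_score(x, y, treated, t))
print("best threshold:", round(best, 2))
```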
Three methods
30
[diagram: a candidate split on feature X at threshold A, with branches a0 = {X < A} and a1 = {X > A}; the parent node’s treatment and control outcome distributions PT(Y) and PC(Y) are compared with PT(Y|a0), PC(Y|a0) and PT(Y|a1), PC(Y|a1) in the branches]
Three methods
31
[same split diagram as the previous slide, used to illustrate the “difference of differences” criterion: compare the treatment–control gap within each branch against the parent node]
we are attempting to validate the counterfactual, an event that never happened, so there is no “ground truth”.
32
first, plot your baseline curve using a random ordering of individuals
33
[plot: cumulative incremental response vs. percent of population targeted; “random” baseline curve]
then, plot the uplift curve by ordering individuals by descending incremental gain
34
[plot: cumulative incremental response vs. percent of population targeted; “random” baseline and “class transformation” curves]
you can compare many methods on the same plot!
35
[plot: cumulative incremental response vs. percent of population targeted; “random”, “class transformation”, and “two-model approach” curves]
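A sketch of how such a curve can be computed from held-out experiment data, using a simplified incremental-response estimate (packages like Pylift provide Qini/uplift curves directly; the function name, estimator, and data here are assumptions):

```python
import numpy as np
import pandas as pd

def cumulative_incremental_response(score, y, treated, n_points=20):
    """Cumulative incremental response at increasing targeting depths.

    Individuals are ranked by predicted uplift (descending). At each depth we
    estimate incremental responses among the targeted group as
    (treated response rate - control response rate) * number targeted.
    """
    df = pd.DataFrame({"score": score, "y": y, "w": treated})
    df = df.sort_values("score", ascending=False).reset_index(drop=True)
    depths = np.linspace(0, 1, n_points + 1)[1:]
    curve = []
    for frac in depths:
        top = df.iloc[: max(1, int(frac * len(df)))]
        t, c = top[top["w"] == 1], top[top["w"] == 0]
        if len(t) == 0 or len(c) == 0:
            curve.append(0.0)
            continue
        curve.append((t["y"].mean() - c["y"].mean()) * len(top))
    return depths, np.array(curve)

# Example on synthetic held-out data; any model's scores can be compared this way,
# against a straight-line "random" baseline from 0 to the overall incremental response.
rng = np.random.default_rng(3)
n = 2000
treated = rng.integers(0, 2, n)
x = rng.normal(size=n)
y = rng.binomial(1, 0.3 + 0.2 * treated * (x > 0))
score = x   # stand-in for a model's predicted uplift
depths, curve = cumulative_incremental_response(score, y, treated)
```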
if stakeholder buy-in is an issue, you can run an experiment targeting one group randomly and another based on uplift scores, and demonstrate the difference with statistical significance, since the model is targeting to maximize incremental lift
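One simple way to read out such an experiment: compare response rates between the randomly targeted arm and the uplift-targeted arm with a two-proportion z-test from statsmodels. The counts, arm sizes, and design details here are hypothetical:

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical results: arm A's targets were picked at random,
# arm B's targets were picked by descending uplift score.
conversions = [180, 230]    # responders in [random arm, uplift-targeted arm]
targeted    = [2000, 2000]  # customers targeted in each arm

z_stat, p_value = proportions_ztest(count=conversions, nobs=targeted)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
```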
36
37
Areas of Impact
38
Ex. Lead scoring for paid bidding system
Areas of Impact
39
Ex. Select the best offer for customers at risk of churn
Areas of Impact
40
Ex. Justifying the cost of customer support initiatives
Areas of Impact
41
Ex. Identify best candidates for promotional offer
Papers:
[1] Radcliffe & Surry, “Real-World Uplift Modelling with Significance-Based Uplift Trees”
[2] “Causal Inference and Uplift Modeling”
[3] “Uplift Modeling in Direct Marketing”
[4] “Decision Trees for Uplift Modeling with Single and Multiple Treatments”
Libraries:
[1] Pylift
[2] CausalLift
42
43
