Estimating Causal Effect of Ads in a Real-Time Bidding Platform

Prasad Chalasani (SVP Data Science, MediaMath)
Sep 24, 2016

Project Placebo
Or,
How to Measure Causal Effect of Ads in an RTB Platform
Placebo Team (alphabetical):
Ari Buchalter (President, Technology; co-founder)
Prasad Chalasani
Himanish Kushary
Jason Lei
Jonathan Marshall
Michael Neiss
Tristan Piron
Sara Skrmetti
Jawad Stouli
Jaynth Thiagarajan
Ezra Winston

listen to ~100 billion ad opportunities daily

respond with optimal bids within milliseconds
petabytes of data (ad impressions, visits, clicks, conversions)

Key Conceptual Take-aways
Definition of causal effect
Context: relationship to Machine Learning
Causal effect in a Real-Time Bidding Platform
  Simplest approach is wasteful
  Less wasteful approach: bias (non-compliance)
  MediaMath’s solution
Bayesian Methods for Ad Lift Confidence Bounds
  Gibbs Sampling (MCMC: Markov Chain Monte Carlo)
Complications unique to our setting:
  Long-running experiments
  Multiple cookies per user

Ad impact measurement
Advertisers want to know the impact of showing ads to people.

Measuring Ad Impact: Two Approaches
Observational studies:
Compare people who happen to be exposed vs not exposed
Bias is a big issue
Randomized tests:
Randomly assign people to test (exposed), control (un-exposed)

Causal Effect: the questions to ask
When a set of people $U$ is exposed to ads,
what is the average response rate $R_1$ of the people in $U$?
what would the response rate $R_0$ of $U$ have been, had they not seen the ad?
relative causal effect, or causal lift $= R_1 / R_0 - 1$

Causal Effect: Notation
“units” $i = 1, 2, \ldots, n$ (“users”, “user_context”, ...)
$Y_i = 1$ if unit $i$ responds (buys, subscribes, ...), else 0.
Each unit $i$ has 2 potential responses:
$Y_i(0)$ = response when not exposed to an ad
$Y_i(1)$ = response when exposed to an ad
$W_i = 1$ if unit $i$ is exposed to an ad, else 0.
Observed response: $Y_i^{obs} = Y_i(W_i)$
if $W_i = 1$, only $Y_i(1)$ is observed; $Y_i(0)$ is a counterfactual
$X_i$ = $k$-dimensional vector of features
e.g. (dayOfWeek, age, location, web-domain)
Unit-level causal effect is impossible to measure:
$\tau_i = Y_i(1) - Y_i(0)$
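
The notation above can be made concrete with a small simulation. This is only an illustrative sketch: the class `Unit`, the helper `simulate_units`, and all rates are hypothetical and not part of the talk. In a simulation both potential responses are known, but the observed response exposes only one of them per unit, which is why the unit-level effect cannot be measured from real data.

```python
import random
from dataclasses import dataclass


@dataclass
class Unit:
    y0: int  # potential response without the ad, Y_i(0)
    y1: int  # potential response with the ad, Y_i(1)
    w: int   # exposure indicator, W_i

    @property
    def y_obs(self) -> int:
        # Observed response Y_i(W_i): only one potential response is ever seen.
        return self.y1 if self.w == 1 else self.y0


def simulate_units(n, r0, r1, p_exposed, seed=0):
    """Draw n units with made-up response rates r0 (unexposed) and r1 (exposed)."""
    rng = random.Random(seed)
    return [
        Unit(
            y0=int(rng.random() < r0),
            y1=int(rng.random() < r1),
            w=int(rng.random() < p_exposed),
        )
        for _ in range(n)
    ]


units = simulate_units(n=5, r0=0.3, r1=0.5, p_exposed=0.5)
for u in units:
    hidden = "Y(0)" if u.w == 1 else "Y(1)"
    print(f"W={u.w}  Y_obs={u.y_obs}  counterfactual (never observed): {hidden}")
```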

Average Causal/Treatment Effects

Average Treatment Effect (ATE)
$\mathrm{ATE} = E[Y_i(1) - Y_i(0)]$
Average Treatment Effect on the Treated (ATET)
$\mathrm{ATET} = E[Y_i(1) - Y_i(0) \mid W_i = 1]$
Causal Lift ($L$) (this talk)
$L = \frac{E[Y_i(1) \mid W_i = 1]}{E[Y_i(0) \mid W_i = 1]} - 1$
Conditional Average Treatment Effect (Athey/Imbens et al)
$\tau(x) = E[Y_i(1) - Y_i(0) \mid X_i = x]$
Conditional Response Rate (usual Machine Learning problem)
$R(x) = E[Y_i(1) \mid X_i = x]$
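
To see how ATE, ATET, and causal lift relate, the sketch below evaluates them on a tiny hand-made population in which both potential responses are known. The function `average_effects` and the eight units are hypothetical illustrations. Real data never contains both responses for the same unit, so this is a thought experiment rather than an estimator.

```python
def average_effects(units):
    """units: list of (y0, y1, w) tuples with BOTH potential responses known."""
    treated = [(y0, y1) for y0, y1, w in units if w == 1]
    ate = sum(y1 - y0 for y0, y1, _ in units) / len(units)
    atet = sum(y1 - y0 for y0, y1 in treated) / len(treated)
    r1 = sum(y1 for _, y1 in treated) / len(treated)  # E[Y(1) | W = 1]
    r0 = sum(y0 for y0, _ in treated) / len(treated)  # E[Y(0) | W = 1]
    lift = r1 / r0 - 1
    return ate, atet, lift


# Made-up population: (Y(0), Y(1), W)
units = [
    (0, 1, 1), (0, 0, 1), (1, 1, 1), (1, 1, 1),  # exposed
    (0, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 0),  # not exposed
]
ate, atet, lift = average_effects(units)
print(f"ATE={ate} ATET={atet} lift={lift}")  # ATE=0.25 ATET=0.25 lift=0.5
```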

Causal Effect Illustration: Counterfactuals

Causal Effect with Counterfactuals
Counterfactuals are unobservable!
Instead of comparing:
Response rate of exposed users U vs
Counterfactual un-exposed response rate of the same users U,
We compare:
Response rate of exposed users U vs
Response rate of un-exposed users statistically equivalent to U.
⇒ using randomization
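
A small simulation can illustrate why randomization matters. In this sketch (all rates and the function `naive_lift` are assumptions made for illustration), "high-intent" users respond more often regardless of ads. In the observational setting they are also more likely to be exposed, so the naive exposed-vs-unexposed comparison overstates the lift. With random assignment the two groups are statistically equivalent and the comparison lands near the true lift.

```python
import random


def naive_lift(n, randomized, true_lift=0.2, seed=1):
    """Exposed-vs-unexposed lift under random or intent-driven exposure."""
    rng = random.Random(seed)
    n_exp = y_exp = n_unexp = y_unexp = 0
    for _ in range(n):
        high_intent = rng.random() < 0.3
        base_rate = 0.10 if high_intent else 0.02
        if randomized:
            p_exposed = 0.5                            # coin flip, ignores intent
        else:
            p_exposed = 0.8 if high_intent else 0.2    # exposure follows intent
        exposed = rng.random() < p_exposed
        rate = base_rate * (1 + true_lift) if exposed else base_rate
        responded = rng.random() < rate
        if exposed:
            n_exp += 1
            y_exp += responded
        else:
            n_unexp += 1
            y_unexp += responded
    return (y_exp / n_exp) / (y_unexp / n_unexp) - 1


obs_lift = naive_lift(200_000, randomized=False)
rand_lift = naive_lift(200_000, randomized=True)
print(f"true lift 0.20 | observational {obs_lift:.2f} | randomized {rand_lift:.2f}")
```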

Ideal Randomized Test:
Randomize after winning bid

Ideal Randomized Test: Ad lift

Ideal Randomized Test: Wasted spend

MediaMath’s approach:
Randomize before bidding

A Less Wasteful Randomized Test

Estimating the Counterfactual $R_{CW}$

Ad Lift Estimation
Main steps:
observe response rates $R_C$, $R_{TW}$, $R_{TL}$
observe test win-rate $w$
estimate the control counterfactual winner response-rate
$R_{CW} = \frac{R_C - (1 - w) R_{TL}}{w}$
compute lift $L = R_{TW} / R_{CW} - 1$
similar to Treatment Effect Under Non-compliance in clinical trials.
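
One way to read the $R_{CW}$ formula: if the control group is a mixture of would-be winners (fraction $w$) and would-be losers (fraction $1 - w$), and losers respond at the same rate in test and control because neither sees an ad, then $R_C = w R_{CW} + (1 - w) R_{TL}$, which rearranges to the expression above. The sketch below applies the steps. The function name `estimate_lift` and the example numbers are made up for illustration.

```python
def estimate_lift(r_c, r_tw, r_tl, w):
    """Point estimate of ad lift from observed rates.

    r_c  : response rate of the whole control group
    r_tw : response rate of test winners (saw the ad)
    r_tl : response rate of test losers (did not see the ad)
    w    : test win-rate
    """
    if not 0 < w <= 1:
        raise ValueError("win-rate w must be in (0, 1]")
    # Back out the counterfactual winner rate from the control mixture.
    r_cw = (r_c - (1 - w) * r_tl) / w
    if r_cw <= 0:
        raise ValueError("estimated counterfactual rate is not positive")
    return r_cw, r_tw / r_cw - 1


# Made-up inputs: 40% win-rate, losers respond at 1%, control overall at 1.4%.
r_cw, lift = estimate_lift(r_c=0.014, r_tw=0.025, r_tl=0.01, w=0.4)
print(f"R_CW = {r_cw:.4f}, lift = {lift:.2%}")
```

With noisy inputs the estimated $R_{CW}$ can come out at or below zero, which is one reason a point estimate alone is not enough.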

Ad Lift Estimation
How to compute the 90% confidence interval for $L$?

Ad Lift: Confidence Intervals with Gibbs sampler

Bayesian approach
Assume a random parameter vector $\theta$ consisting of:
$(R_{TW}, R_L, R_{CW}, w, \ldots)$
Set up prior distribution: $\theta \sim p(\theta)$
Sample $M$ values of unknown $\theta$ from posterior: Gibbs Sampler
$P(\theta \mid \text{Data}) \propto P(\text{Data} \mid \theta) \cdot p(\theta)$
For each sampled $\theta$ compute lift $L = R_{TW} / R_{CW} - 1$
Compute (0.05, 0.95) quantiles of sampled $L$ values

Ad Lift Gibbs Sampling: Random variables
Probabilities: $w$, $R_{TW}$, $R_{CW}$, $R_L$
Counts: $C_{W0}$, $C_{W1}$, $C_{L0}$, $C_{L1}$
Beta(1, 1) priors on probabilities, e.g.:
$w \sim \mathrm{Beta}(1, 1) = \mathrm{Uniform}(0, 1)$, ...

Ad Lift Gibbs Sampling: Posterior Probabilities
Likelihood of observed
$k = C_{L1} + T_{L1}$ conversions out of
$n = C_{L1} + T_{L1} + C_{L0} + T_{L0}$ trials,
given loser response-rate $R_L$:
$\mathrm{Binom}(k, n; R_L) \propto R_L^{k} (1 - R_L)^{n-k}$,
so posterior of $R_L$
$P(R_L \mid k, n) \propto P(k, n \mid R_L) \cdot p(R_L)$
$\propto R_L^{k} (1 - R_L)^{n-k} \cdot \mathrm{Beta}(1, 1)$
$\propto R_L^{(k+1)-1} (1 - R_L)^{(n-k+1)-1}$
$\propto \mathrm{Beta}(k + 1, n - k + 1)$
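
Because the posterior of the loser response rate is a Beta distribution, it can be sampled directly. The sketch below uses only the standard library. The function `loser_rate_posterior` and the counts are hypothetical. It draws from Beta(k + 1, n − k + 1) and reads off the 5% and 95% quantiles.

```python
import random


def loser_rate_posterior(k, n, n_samples=5000, seed=0):
    """Sorted posterior draws of R_L given k conversions in n loser trials,
    under a Beta(1, 1) prior."""
    rng = random.Random(seed)
    return sorted(rng.betavariate(k + 1, n - k + 1) for _ in range(n_samples))


# Made-up counts: 120 conversions among 12,000 losers (test and control).
samples = loser_rate_posterior(k=120, n=12_000)
mean = sum(samples) / len(samples)
lo = samples[int(0.05 * len(samples))]
hi = samples[int(0.95 * len(samples))]
print(f"posterior mean {mean:.5f}, 90% interval ({lo:.5f}, {hi:.5f})")
```

The sample mean should sit close to the analytic posterior mean (k + 1) / (n + 2).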

Ad Lift Gibbs Sampling: Posterior Counts
We observe $C_1 = C_{L1} + C_{W1}$ (total control conversions).
Need to sample $C_{L1}$, $C_{W1}$
$C_{W1}$ is a Binomial draw from $n = C_1$, with probability:
$P(\text{ctl winner} \mid \text{ctl conversion}) = \frac{w \cdot R_{CW}}{w \cdot R_{CW} + (1 - w) \cdot R_L}$
$C_{L1} = C_1 - C_{W1}$
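
Putting the pieces together, a Gibbs sampler alternates between sampling the latent control counts given the probabilities and sampling the probabilities given the counts. The sketch below is one possible reading of the slides, not MediaMath's implementation. The split of control non-converters and the Beta update for $w$ are assumptions added by symmetry, the function names and example counts are hypothetical, and the naive binomial draw is only practical for modest counts.

```python
import random


def binomial(rng, n, p):
    """Draw Binomial(n, p) as a sum of Bernoulli trials (fine for modest n)."""
    return sum(rng.random() < p for _ in range(n))


def gibbs_lift(tw1, tw0, tl1, tl0, c1, c0, n_samples=1000, burn_in=200, seed=0):
    """90% posterior interval for lift L = R_TW / R_CW - 1.

    tw1/tw0: test winners with/without conversion
    tl1/tl0: test losers with/without conversion
    c1/c0  : control users with/without conversion (winner status is latent)
    """
    rng = random.Random(seed)
    w, r_cw, r_l = 0.5, 0.5, 0.5  # arbitrary starting point
    lifts = []
    for step in range(burn_in + n_samples):
        # 1. Sample latent control counts given the current probabilities.
        p_w1 = w * r_cw / (w * r_cw + (1 - w) * r_l)
        # Assumption: non-converters are split by the analogous formula.
        p_w0 = w * (1 - r_cw) / (w * (1 - r_cw) + (1 - w) * (1 - r_l))
        cw1 = binomial(rng, c1, p_w1)
        cl1 = c1 - cw1
        cw0 = binomial(rng, c0, p_w0)
        cl0 = c0 - cw0
        # 2. Sample probabilities from Beta posteriors under Beta(1, 1) priors.
        w = rng.betavariate(1 + tw1 + tw0 + cw1 + cw0, 1 + tl1 + tl0 + cl1 + cl0)
        r_tw = rng.betavariate(1 + tw1, 1 + tw0)
        r_cw = rng.betavariate(1 + cw1, 1 + cw0)
        r_l = rng.betavariate(1 + tl1 + cl1, 1 + tl0 + cl0)
        if step >= burn_in:
            lifts.append(r_tw / r_cw - 1)
    lifts.sort()
    return lifts[int(0.05 * n_samples)], lifts[int(0.95 * n_samples)]


# Made-up counts: 4,000 test winners, 6,000 test losers, 3,000 control users.
lo, hi = gibbs_lift(tw1=200, tw0=3800, tl1=60, tl0=5940, c1=60, c0=2940)
print(f"90% interval for lift: ({lo:.2f}, {hi:.2f})")
```

For these counts the plug-in estimate is $R_{CW} = (0.02 - 0.6 \cdot 0.01) / 0.4 = 0.035$ and lift $= 0.05 / 0.035 - 1 \approx 0.43$, so the interval should bracket that value.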

Complication 1: We only observe cookies, not users;
A user’s cookies may be in both test and control
(Contamination)

Control Contamination due to Multiple Cookies

Cookie-Contamination Questions
How does cookie contamination affect measured lift?
Does the cookie-distribution matter?
everyone has k cookies vs an average of k cookies
What is the influence of the control percentage?
Simulations are the best way to understand this
Monte Carlo simulations using Spark

Simulations for cookie-contamination
A scenario is a combination of parameters:
M = # trials for this scenario, usually 10K-1M
n = # users, typically 10K - 10M
p = control percentage (usually 10-50%)
k = cookie-distribution, expressed as 1 : 100, or 1 : 70, 3 : 30
r = (un-contaminated) control user response rate
a = true lift, i.e. exposed user response rate = r * (1 + a).
A scenario file specifies a scenario in each row.
There could be thousands of scenarios.
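
A single trial of one scenario might look like the sketch below, written in plain Python rather than Spark. The contamination model is an assumption made for illustration, not taken from the talk: each cookie is assigned to control independently with probability p, a user counts as exposed if any of their cookies is in test, and a conversion is attributed to one of the user's cookies chosen at random. The cookie distribution is read as "cookie count : percent of users", so `{1: 70, 3: 30}` means 70% of users have one cookie and 30% have three.

```python
import random


def contaminated_lift(n, p, cookie_dist, r, a, seed=0):
    """One trial: lift measured at cookie level under the assumed model."""
    rng = random.Random(seed)
    counts, weights = zip(*cookie_dist.items())
    cookies = {"test": 0, "control": 0}
    conversions = {"test": 0, "control": 0}
    for _ in range(n):
        c = rng.choices(counts, weights)[0]  # this user's cookie count
        groups = ["control" if rng.random() < p else "test" for _ in range(c)]
        for g in groups:
            cookies[g] += 1
        # Assumption: any test cookie exposes the user to the ad.
        rate = r * (1 + a) if "test" in groups else r
        if rng.random() < rate:
            # Assumption: the conversion lands on a random cookie of the user.
            conversions[rng.choice(groups)] += 1
    r_test = conversions["test"] / cookies["test"]
    r_control = conversions["control"] / cookies["control"]
    return r_test / r_control - 1


clean = contaminated_lift(200_000, p=0.5, cookie_dist={1: 100}, r=0.05, a=0.5)
mixed = contaminated_lift(200_000, p=0.5, cookie_dist={3: 100}, r=0.05, a=0.5)
print(f"true lift 0.50 | 1 cookie/user: {clean:.2f} | 3 cookies/user: {mixed:.2f}")
```

Under this model, contaminated control cookies belong to users who did see ads, so the measured lift shrinks toward zero as the number of cookies per user grows. Repeating the trial M times per scenario would give the distribution of measured lift.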

Complication 2:

Long-Running Experiments
Ideal randomized test is instantaneous.

When a test is run for weeks/months,
a test user may sometimes be a winner, sometimes a loser.
How to define who is a “winner” and “loser”?
Crucial because lift $L = R_{TW} / R_{CW} - 1$.
Our approach (details omitted):
Ad influence period is limited
“refresh” a user after a suitable time-period elapses.
Count “user time-spans” rather than “users”
Identify “experiments” within a user’s time-line

MediaMath’s Placebo App
Currently in production for ~10 advertisers
Advertisers can specify which campaigns to measure
Lift estimation, Gibbs Sampling runs on AWS using Spark
Multiple runs of Gibbs Sampler in parallel (with different priors)

Thank you!
pchalasani@mediamath.com
