Estimating Causal Effect of Ads in a
Real-Time-Bidding Platform
Prasad Chalasani (SVP Data Science, MediaMath)
Sep 24, 2016
Project Placebo
Or,
How to Measure Causal Effect of Ads in an RTB Platform
Placebo Team (alphabetical):
Ari Buchalter (President, Technology; co-founder)
Prasad Chalasani
Himanish Kushary
Jason Lei
Jonathan Marshall
Michael Neiss
Tristan Piron
Sara Skrmetti
Jawad Stouli
Jaynth Thiagarajan
Ezra Winston
Listen to ~100 billion ad opportunities daily
Respond with optimal bids within milliseconds
Petabytes of data (ad impressions, visits, clicks, conversions)
Key Conceptual Take-aways
Definition of causal effect
Context: relationship to Machine Learning
Causal effect in a Real-Time Bidding Platform
Simplest approach is wasteful
Less wasteful approach: bias (non-compliance)
MediaMath’s solution
Bayesian Methods for Ad Lift Confidence Bounds
Gibbs Sampling (MCMC – Markov Chain Monte Carlo)
Complications unique to our setting:
Long-running experiments
Multiple cookies per user
Ad impact measurement
Advertisers want to know the impact of showing ads to people.
Measuring Ad Impact: Two Approaches
Observational studies:
Compare people who happen to be exposed vs not exposed
Bias a big issue
Randomized tests:
Randomly assign people to test (exposed), control (un-exposed)
Causal Effect: the questions to ask
When a set of people U is exposed to ads:
what is the average response rate R1 of the people in U?
what would the response rate R0 of U have been if they had not seen the ad?
Relative causal effect, or causal lift: L = R1/R0 − 1
Causal Effect: Notation
“units” i = 1, 2, . . . , n (“users”, “user_context”, . . . )
Yi = 1 if unit i responds (buys, subscribes, . . . ), else 0.
Each unit i has 2 potential responses:
Yi (0) = response when not exposed to an ad
Yi (1) = response when exposed to an ad
Wi = 1 if unit i exposed to ad, else 0.
Observed response: $Y_i^{\mathrm{obs}} = Y_i(W_i)$
if Wi = 1, only Yi (1) is observed, Yi (0) is a counterfactual
if Wi = 0, only Yi (0) is observed, Yi (1) is a counterfactual
Xi = k-dimensional vector of features
e.g. (dayOfWeek, age, location, web-domain)
Unit level causal effect is impossible to measure:
τi = Yi (1) − Yi (0)
Average Causal/Treatment Effects
Average Treatment Effect (ATE)
ATE = E[Yi (1) − Yi (0)]
Average Treatment Effect on the Treated (ATET)
ATET = E[Yi (1) − Yi (0) | Wi = 1]
Causal Lift (L) (this talk)
$L = \frac{E[Y_i(1) \mid W_i = 1]}{E[Y_i(0) \mid W_i = 1]} - 1$
Conditional Average Treatment Effect: (Athey/Imbens et al)
τ(x) = E[Yi (1) − Yi (0) | Xi = x]
Conditional Response Rate (usual Machine Learning problem)
R(x) = E[Yi (1) | Xi = x]
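To make these definitions concrete, here is a minimal Python sketch that evaluates ATE, ATET and causal lift on a toy population. It assumes both potential outcomes are known for every unit, which is only possible in a simulation. The tuple layout (y0, y1, w), the function names and the eight toy units are illustrative and not from the talk.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)

def ate(units):
    """Average treatment effect over all units: E[Y(1) - Y(0)]."""
    return mean(y1 - y0 for y0, y1, _ in units)

def atet(units):
    """Average treatment effect on the treated: E[Y(1) - Y(0) | W = 1]."""
    return mean(y1 - y0 for y0, y1, w in units if w == 1)

def causal_lift(units):
    """Causal lift on the treated: E[Y(1) | W = 1] / E[Y(0) | W = 1] - 1."""
    treated = [(y0, y1) for y0, y1, w in units if w == 1]
    r1 = mean(y1 for _, y1 in treated)
    r0 = mean(y0 for y0, _ in treated)  # must be non-zero for lift to be defined
    return r1 / r0 - 1

# Hypothetical units as (Y(0), Y(1), W); the first four are exposed.
units = [
    (0, 1, 1), (1, 1, 1), (0, 1, 1), (1, 1, 1),
    (0, 0, 0), (0, 1, 0), (0, 0, 0), (1, 1, 0),
]

print(ate(units), atet(units), causal_lift(units))
```

In real data only one of y0 and y1 is observed per unit, which is why none of these quantities can be computed directly.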
[Figure: Causal Effect Illustration]
[Figure: Causal Effect Illustration: Counterfactuals]
Causal Effect with Counterfactuals
Counterfactuals are unobservable!
Instead of comparing:
Resp-rate of exposed users U vs
Counterfactual un-exposed response-rate of same users U,
We compare:
Resp-rate of exposed users U vs
Resp-rate of un-exposed users statistically equivalent to U.
⟹ using randomization
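The following simulation sketch illustrates why randomization matters. It assumes a hypothetical latent "intent" that drives both the response rate and, in the observational case, the chance of being exposed. The rates, the true lift of 0.5 and the function name are made up for illustration.

```python
import random

TRUE_LIFT = 0.5  # assumed: the ad multiplies every user's response rate by 1.5

def measured_lift(n, randomized, seed=0):
    """Exposed response rate / un-exposed response rate - 1, over n simulated users."""
    rng = random.Random(seed)
    conversions = {0: 0, 1: 0}
    counts = {0: 0, 1: 0}
    for _ in range(n):
        intent = rng.random()               # hypothetical latent purchase intent
        base_rate = 0.02 + 0.10 * intent    # response rate without the ad
        if randomized:
            w = 1 if rng.random() < 0.5 else 0     # coin flip, independent of intent
        else:
            w = 1 if rng.random() < intent else 0  # high-intent users see more ads
        rate = base_rate * (1 + TRUE_LIFT) if w == 1 else base_rate
        counts[w] += 1
        if rng.random() < rate:
            conversions[w] += 1
    r1 = conversions[1] / counts[1]
    r0 = conversions[0] / counts[0]
    return r1 / r0 - 1

lift_observational = measured_lift(200_000, randomized=False)
lift_randomized = measured_lift(200_000, randomized=True)
print(lift_observational, lift_randomized)
```

Under these assumptions the randomized comparison lands close to the true lift, while the observational comparison is inflated because the exposed users had higher intent to begin with.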
Ideal Randomized Test:
Randomize after winning bid
[Figure: Ideal Randomized Test: Ad lift]
But is this practical?
[Figure: Ideal Randomized Test: Wasted spend]
MediaMath’s approach:
Randomize before bidding
[Figure: A Less Wasteful Randomized Test]
Compare RC vs RT?
Compare RC vs RTW? Win-bias
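A small worked example can show why neither comparison recovers the true lift. The sketch assumes hypothetical baseline rates in which would-be winners respond more often than losers even without an ad. All numbers and the function name are illustrative, and the direction of the win-bias would flip if winners had the lower baseline.

```python
def expected_rates(w, base_winner, base_loser, true_lift):
    """Expected response rates when only test-group winners see the ad."""
    r_tw = base_winner * (1 + true_lift)           # test winners: exposed
    r_tl = base_loser                              # test losers: never exposed
    r_t = w * r_tw + (1 - w) * r_tl                # whole test group
    r_c = w * base_winner + (1 - w) * base_loser   # control: nobody exposed
    return r_c, r_t, r_tw, r_tl

r_c, r_t, r_tw, r_tl = expected_rates(
    w=0.5, base_winner=0.04, base_loser=0.02, true_lift=0.5
)

lift_rt_vs_rc = r_t / r_c - 1    # diluted by the un-exposed test losers
lift_rtw_vs_rc = r_tw / r_c - 1  # win-bias: winners differ from the average control user
print(lift_rt_vs_rc, lift_rtw_vs_rc)
```

With these numbers the true lift is 0.5, the RC vs RT comparison understates it (about 0.33), and the RC vs RTW comparison overstates it (1.0).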
[Figure: Ad Lift: Proper Definition]
[Figure: Estimating the Counterfactual RCW]
Ad Lift Estimation
Main steps:
observe response rates RC , RTW , RTL
observe test win-rate w
estimate the control counterfactual winner response-rate
$R_{CW} = \frac{R_C - (1 - w) R_{TL}}{w}$
compute lift L = RTW /RCW − 1
similar to Treatment Effect Under Non-compliance in clinical trials.
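The formula for RCW follows from randomization: control and test should share the same win-rate w and the same loser response rate, so RC = w · RCW + (1 − w) · RTL, which can be solved for RCW. A minimal sketch of the steps above, with a hypothetical function name and made-up input rates:

```python
def estimate_lift(r_c, r_tw, r_tl, w):
    """Return (r_cw, lift) from observed rates and the test win-rate w.

    r_c  : response rate of the whole control group
    r_tw : response rate of test winners (exposed)
    r_tl : response rate of test losers (un-exposed)
    """
    if not 0 < w <= 1:
        raise ValueError("win rate w must be in (0, 1]")
    # Control counterfactual winner response rate.
    r_cw = (r_c - (1 - w) * r_tl) / w
    if r_cw <= 0:
        raise ValueError("estimated R_CW is not positive, so lift is undefined")
    return r_cw, r_tw / r_cw - 1

# Illustrative inputs only.
r_cw, lift = estimate_lift(r_c=0.03, r_tw=0.06, r_tl=0.02, w=0.5)
print(r_cw, lift)
```

With noisy data the point estimate of RCW can come out zero or negative, which is one reason a point estimate alone is not enough.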
Ad Lift Estimation
How to compute the 90% confidence interval for L?
Ad Lift: Confidence Intervals with Gibbs sampler
Bayesian approach
Assume a random parameter vector θ consisting of:
(RTW , RL, RCW , w, ...)
Set up prior distribution on θ ∼ p(θ)
Sample M values of unknown θ from posterior: Gibbs Sampler
P(θ |Data) ∝ P(Data | θ) · p(θ)
For each sampled θ compute lift L = RTW /RCW − 1
Compute (0.05, 0.95) quantiles of sampled L values
[Figure: Ad Lift: Gibbs Sampling]
Ad Lift Gibbs Sampling: Random variables
Probabilities: w, RTW , RCW , RL
Counts: CW0, CW1, CL0, CL1
Beta(1, 1) priors on probabilities, e.g.:
w ∼ Beta(1, 1) = Uniform(0, 1), . . .
Ad Lift Gibbs Sampling: Posterior Probabilities
Likelihood of observed
k = CL1 + TL1 conversions out of
n = CL1 + TL1 + CL0 + TL0 trials,
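given loser response-rate RL:
$\mathrm{Binom}(k; n, R_L) \propto R_L^{k} (1 - R_L)^{n-k}$,
so the posterior of RL is
$P(R_L \mid k, n) \propto P(k, n \mid R_L) \cdot p(R_L)$
$\propto R_L^{k} (1 - R_L)^{n-k} \cdot \mathrm{Beta}(1, 1)$
$\propto R_L^{(k+1)-1} (1 - R_L)^{(n-k+1)-1}$
$= \mathrm{Beta}(k + 1, n - k + 1)$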
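Because the posterior is a Beta distribution, it can be sampled directly. The sketch below draws posterior samples of the loser response rate with Python's standard `random.betavariate` and reads off the (0.05, 0.95) quantiles. The counts k = 30 and n = 1000 and the helper names are made up for illustration.

```python
import random

def loser_rate_posterior_samples(k, n, m=10_000, seed=0):
    """Draw m samples from Beta(k + 1, n - k + 1), the posterior under a Beta(1, 1) prior."""
    rng = random.Random(seed)
    return [rng.betavariate(k + 1, n - k + 1) for _ in range(m)]

def quantile(samples, q):
    """Simple empirical quantile: the value at position q in the sorted samples."""
    ordered = sorted(samples)
    return ordered[int(q * (len(ordered) - 1))]

# Hypothetical data: 30 conversions among 1000 losers.
samples = loser_rate_posterior_samples(k=30, n=1000)
posterior_mean = sum(samples) / len(samples)
low, high = quantile(samples, 0.05), quantile(samples, 0.95)
print(posterior_mean, low, high)
```

The sample mean should sit near the analytic posterior mean (k + 1) / (n + 2).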
Ad Lift Gibbs Sampling: Posterior Counts
We observe C1 = CL1 + CW1 (total control conversions).
Need to sample CL1, CW1
CW1 is a Binomial draw from n = C1, with probability:
$P(\text{ctl winner} \mid \text{ctl conversion}) = \frac{w \cdot R_{CW}}{w \cdot R_{CW} + (1 - w) \cdot R_L}$
CL1 = C1 − CW1
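Putting the pieces together, a Gibbs sampler for lift might look like the sketch below. It is an illustration under stated assumptions, not the production implementation. The split of control non-conversions (CW0, CL0) mirrors the CW1 step by analogy, the Beta updates assume Beta(1, 1) priors on every probability, and the function names, starting values and example counts are all hypothetical.

```python
import random

def binomial(rng, n, p):
    """Draw from Binomial(n, p) by summing n Bernoulli trials (fine for small n)."""
    return sum(1 for _ in range(n) if rng.random() < p)

def quantile(samples, q):
    ordered = sorted(samples)
    return ordered[int(q * (len(ordered) - 1))]

def gibbs_lift(tw1, tw0, tl1, tl0, c1, c0, n_iter=1500, burn_in=500, seed=0):
    """Return posterior samples of lift L = R_TW / R_CW - 1.

    tw1/tw0: test winners with/without conversion; tl1/tl0: same for test losers;
    c1/c0: control users with/without conversion (winner status unobserved).
    """
    rng = random.Random(seed)
    w, r_tw, r_cw, r_l = 0.5, 0.05, 0.05, 0.05  # arbitrary starting values
    lifts = []
    for it in range(n_iter):
        # 1. Sample the latent winner/loser split of the control counts.
        p_win_conv = w * r_cw / (w * r_cw + (1 - w) * r_l)
        cw1 = binomial(rng, c1, p_win_conv)
        cl1 = c1 - cw1
        p_win_nonconv = w * (1 - r_cw) / (w * (1 - r_cw) + (1 - w) * (1 - r_l))
        cw0 = binomial(rng, c0, p_win_nonconv)  # assumed analogue of the CW1 step
        cl0 = c0 - cw0
        # 2. Sample each probability from its conjugate Beta posterior.
        w = rng.betavariate(1 + tw1 + tw0 + cw1 + cw0, 1 + tl1 + tl0 + cl1 + cl0)
        r_tw = rng.betavariate(1 + tw1, 1 + tw0)
        r_cw = rng.betavariate(1 + cw1, 1 + cw0)
        r_l = rng.betavariate(1 + tl1 + cl1, 1 + tl0 + cl0)
        # 3. Record lift once the chain has warmed up.
        if it >= burn_in:
            lifts.append(r_tw / r_cw - 1)
    return lifts

# Made-up counts consistent with w = 0.5, R_TW = 0.06, R_CW = 0.04, R_L = 0.02.
lifts = gibbs_lift(tw1=240, tw0=3760, tl1=80, tl0=3920, c1=60, c0=1940)
lo, hi = quantile(lifts, 0.05), quantile(lifts, 0.95)
print(lo, hi)
```

The interval is wide here because the example control group is small; the counts were chosen so the underlying lift is 0.5.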
Complication 1: We only observe cookies, not users;
A user’s cookies may be in both test and control
(Contamination)
[Figure: Control Contamination due to Multiple Cookies]
Cookie-Contamination Questions
How does cookie contamination affect measured lift?
Does the cookie-distribution matter?
everyone has k cookies vs an average of k cookies
What is the influence of the control percentage?
Simulations are the best way to understand this
Monte Carlo simulations using Spark
Simulations for cookie-contamination
A scenario is a combination of parameters:
M = # trials for this scenario, usually 10K-1M
n = # users, typically 10K - 10M
p = control percentage (usually 10-50%)
k = cookie-distribution, expressed as cookies : % of users, e.g. 1 : 100, or 1 : 70, 3 : 30
r = (un-contaminated) control user response rate
a = true lift, i.e. exposed user response rate = r ∗ (1 + a).
A scenario file specifies a scenario in each row.
could be thousands of scenarios
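A single-machine sketch of one such scenario is shown below, in plain Python rather than Spark. Several modeling choices are assumptions made for illustration and are not stated in the talk: each cookie is assigned to control independently with probability p, a user counts as exposed if any of their cookies is in test, and a conversion is attributed to one of the user's cookies chosen at random. Function names are hypothetical.

```python
import random

def draw_cookie_count(cookie_dist, rng):
    """cookie_dist maps number of cookies -> fraction of users, e.g. {1: 0.7, 3: 0.3}."""
    u = rng.random()
    cumulative = 0.0
    for count, fraction in cookie_dist.items():
        cumulative += fraction
        if u < cumulative:
            return count
    return count  # guard against floating-point rounding in the fractions

def run_trial(n, p, cookie_dist, r, a, rng):
    """Simulate n users once and return the lift measured at cookie level."""
    cookies = {"test": 0, "control": 0}
    conversions = {"test": 0, "control": 0}
    for _ in range(n):
        k = draw_cookie_count(cookie_dist, rng)
        groups = ["control" if rng.random() < p else "test" for _ in range(k)]
        for g in groups:
            cookies[g] += 1
        exposed = "test" in groups           # assumed: any test cookie exposes the user
        rate = r * (1 + a) if exposed else r
        if rng.random() < rate:
            conversions[rng.choice(groups)] += 1  # assumed: credited to one random cookie
    r_test = conversions["test"] / cookies["test"]
    r_control = conversions["control"] / cookies["control"]
    return r_test / r_control - 1

def simulate_scenario(M, n, p, cookie_dist, r, a, seed=0):
    """Average measured lift over M independent trials of one scenario."""
    rng = random.Random(seed)
    return sum(run_trial(n, p, cookie_dist, r, a, rng) for _ in range(M)) / M

# Two illustrative scenarios with true lift a = 0.5 (p given as a fraction).
clean = simulate_scenario(M=10, n=20_000, p=0.5, cookie_dist={1: 1.0}, r=0.2, a=0.5, seed=1)
contaminated = simulate_scenario(M=10, n=20_000, p=0.5, cookie_dist={1: 0.7, 3: 0.3}, r=0.2, a=0.5, seed=1)
print(clean, contaminated)
```

Under these assumptions, control cookies that belong to exposed users raise the control response rate, so the contaminated scenario measures a lower lift than the one-cookie-per-user scenario.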
Complication 2:
Long-running experiments
Long-Running Experiments
Ideal randomized test is instantaneous.
When a test is run for weeks/months,
A test user may sometimes be a winner, sometimes loser.
How to define who is a “winner” and “loser”?
Crucial because lift L = RTW /RCW − 1.
Our approach (details omitted):
Ad influence period is limited
“refresh” a user after suitable time-period elapses.
Count “user time-spans” rather than “users”
Identify “experiments” within user’s time-line
MediaMath’s Placebo App
Currently in production for ∼ 10 advertisers
Advertisers can specify which campaigns to measure
Lift estimation, Gibbs Sampling runs on AWS using Spark
Multiple runs of Gibbs Sampler in parallel (with different priors)
Thank you!
pchalasani@mediamath.com

 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy Presentation
 

Estimating Causal Effect of Ads in a Real-Time Bidding Platform

  • 1. Estimating Causal Effect of Ads in a Real-Time-Bidding Platform Prasad Chalasani (SVP Data Science, MediaMath) Sep 24, 2016
  • 2. Project Placebo, or: How to Measure Causal Effect of Ads in an RTB Platform
      Placebo Team (alphabetical): Ari Buchalter (President, Technology; co-founder), Prasad Chalasani, Himanish Kushary, Jason Lei, Jonathan Marshall, Michael Neiss, Tristan Piron, Sara Skrmetti, Jawad Stouli, Jaynth Thiagarajan, Ezra Winston
  • 6. Listen to ~100 billion ad opportunities daily; respond with optimal bids within milliseconds; petabytes of data (ad impressions, visits, clicks, conversions)
  • 17. Key Conceptual Take-aways
      - Definition of causal effect
      - Context: relationship to Machine Learning
      - Causal effect in a Real-Time Bidding Platform
      - Simplest approach is wasteful
      - Less wasteful approach: bias (non-compliance)
      - MediaMath’s solution
      - Bayesian Methods for Ad Lift Confidence Bounds
      - Gibbs Sampling (MCMC: Markov Chain Monte Carlo)
      - Complications unique to our setting:
          - Long-running experiments
          - Multiple cookies per user
  • 18. Ad impact measurement Advertisers want to know the impact of showing ads to people.
  • 23. Measuring Ad Impact: Two Approaches
      - Observational studies: compare people who happen to be exposed vs. not exposed. Bias is a big issue.
      - Randomized tests: randomly assign people to test (exposed) or control (unexposed).
  • 26. Causal Effect: the questions to ask
      When a set of people U is exposed to ads:
      - What is the average response rate $R_1$ of the people in U?
      - What would the response rate $R_0$ of U have been if they had not seen the ad?
      - Relative causal effect, or causal lift: $R_1 / R_0 - 1$
  • 38. Causal Effect: Notation
      - “Units” $i = 1, 2, \ldots, n$ (“users”, “user_context”, ...)
      - $Y_i = 1$ if unit $i$ responds (buys, subscribes, ...), else 0.
      - Each unit $i$ has 2 potential responses:
          - $Y_i(0)$ = response when not exposed to an ad
          - $Y_i(1)$ = response when exposed to an ad
      - $W_i = 1$ if unit $i$ is exposed to an ad, else 0.
      - Observed response: $Y_i^{\mathrm{obs}} = Y_i(W_i)$
          - if $W_i = 1$, only $Y_i(1)$ is observed; $Y_i(0)$ is a counterfactual
          - if $W_i = 0$, only $Y_i(0)$ is observed; $Y_i(1)$ is a counterfactual
      - $X_i$ = k-dimensional vector of features, e.g. (dayOfWeek, age, location, web-domain)
      - Unit-level causal effect is impossible to measure: $\tau_i = Y_i(1) - Y_i(0)$
  • 44. Average Causal/Treatment Effects
      - Average Treatment Effect (ATE): $\mathrm{ATE} = E[Y_i(1) - Y_i(0)]$
      - Average Treatment Effect on the Treated (ATET): $\mathrm{ATET} = E[Y_i(1) - Y_i(0) \mid W_i = 1]$
      - Causal Lift (L) (this talk): $L = \frac{E[Y_i(1) \mid W_i = 1]}{E[Y_i(0) \mid W_i = 1]} - 1$
      - Conditional Average Treatment Effect (Athey/Imbens et al): $\tau(x) = E[Y_i(1) - Y_i(0) \mid X_i = x]$
      - Conditional Response Rate (usual Machine Learning problem): $R(x) = E[Y_i(1) \mid X_i = x]$
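The estimands on this slide can be made concrete with a toy calculation. The sketch below assumes a simulated population in which both potential outcomes of every unit are known, which is never the case with real data. The `units` table and the function names are hypothetical and are not from the talk.

```python
from statistics import mean

# Hypothetical toy population: one (Y0, Y1, W) tuple per unit.
# Y0 / Y1 are the potential responses without / with the ad, W is exposure.
units = [
    (0, 1, 1),
    (1, 1, 1),
    (0, 0, 1),
    (0, 1, 1),
    (0, 0, 0),
    (0, 1, 0),
    (0, 0, 0),
    (1, 1, 0),
]

def ate(units):
    """Average Treatment Effect: E[Y(1) - Y(0)] over all units."""
    return mean(y1 - y0 for y0, y1, _ in units)

def atet(units):
    """Average Treatment Effect on the Treated: E[Y(1) - Y(0) | W = 1]."""
    return mean(y1 - y0 for y0, y1, w in units if w == 1)

def causal_lift(units):
    """Causal lift: E[Y(1) | W = 1] / E[Y(0) | W = 1] - 1."""
    treated = [(y0, y1) for y0, y1, w in units if w == 1]
    return mean(y1 for _, y1 in treated) / mean(y0 for y0, _ in treated) - 1

print(ate(units), atet(units), causal_lift(units))
```

In real data only one of the two potential outcomes is seen per unit, so the denominator of the lift has to be estimated from a comparable unexposed group.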
  • 50. Causal Effect Illustration: Counterfactuals
  • 56. Causal Effect with Counterfactuals
      - Counterfactuals are unobservable!
      - Instead of comparing: response rate of exposed users U vs. counterfactual unexposed response rate of the same users U,
      - we compare: response rate of exposed users U vs. response rate of unexposed users statistically equivalent to U.
      - ⟹ achieved using randomization
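A small simulation shows why randomization matters here. The sketch below is illustrative only: the two user segments, their base response rates, the exposure probabilities, and the function `simulate` are all assumptions made for the example. The true lift is fixed at 50%. When exposure is random, the exposed and unexposed groups are statistically equivalent and the measured lift should land near 0.5. When high-intent users are more likely to be exposed, the naive comparison overstates the lift considerably.

```python
import random

def simulate(n, exposure_prob, true_lift=0.5, seed=0):
    """Measured lift = exposed response rate / unexposed response rate - 1.

    exposure_prob(high_intent) returns the probability that a user is exposed.
    All rates below are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    exposed = [0, 0]    # [responses, users]
    unexposed = [0, 0]
    for _ in range(n):
        high_intent = rng.random() < 0.5
        base_rate = 0.10 if high_intent else 0.02
        w = rng.random() < exposure_prob(high_intent)
        rate = base_rate * (1 + true_lift) if w else base_rate
        responded = rng.random() < rate
        group = exposed if w else unexposed
        group[0] += responded
        group[1] += 1
    return (exposed[0] / exposed[1]) / (unexposed[0] / unexposed[1]) - 1

# Randomized test: exposure does not depend on the user.
randomized = simulate(200_000, lambda high_intent: 0.5)
# Observational data: high-intent users are exposed more often.
observational = simulate(200_000, lambda high_intent: 0.8 if high_intent else 0.2)
print(f"randomized: {randomized:.2f}, observational: {observational:.2f}")
```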
  • 57. Ideal Randomized Test: Randomize after winning bid
  • 63. But is this practical?
  • 67. Ideal Randomized Test: Wasted spend
  • 70. A Less Wasteful Randomized Test
  • 77. Compare $R_C$ vs $R_{TW}$? Win-bias
  • 78. Ad Lift: Proper Definition
  • 93. Ad Lift Estimation
      Main steps:
      - observe response rates $R_C$, $R_{TW}$, $R_{TL}$
      - observe test win-rate $w$
      - estimate the control counterfactual winner response rate: $R_{CW} = \frac{R_C - (1 - w) R_{TL}}{w}$
      - compute lift $L = R_{TW} / R_{CW} - 1$
      - similar to Treatment Effect Under Non-compliance in clinical trials.
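The steps above can be sketched as a short function. One plausible reading of the $R_{CW}$ formula, stated here as an assumption, is that the control group is a mixture of would-be winners and would-be losers, $R_C = w R_{CW} + (1 - w) R_{CL}$, and that losers never see an ad, so randomization gives $R_{CL} = R_{TL}$. Solving for $R_{CW}$ gives the expression on the slide. The function name and the input numbers are hypothetical.

```python
def estimate_lift(r_c, r_tw, r_tl, w):
    """Point estimate of ad lift from observed rates.

    r_c  : response rate of the whole control group
    r_tw : response rate of test-group winners (exposed)
    r_tl : response rate of test-group losers (not exposed)
    w    : test win-rate, 0 < w <= 1
    """
    if not 0 < w <= 1:
        raise ValueError("win-rate w must be in (0, 1]")
    # Counterfactual response rate of control users who would have won.
    r_cw = (r_c - (1 - w) * r_tl) / w
    if r_cw <= 0:
        raise ValueError("estimated R_CW is not positive; lift is undefined")
    return r_tw / r_cw - 1

# Hypothetical inputs: 40% win-rate, losers respond at 1%, exposed winners
# at 3%, and the control group at 1.4% overall.
print(estimate_lift(r_c=0.014, r_tw=0.03, r_tl=0.01, w=0.4))
```

With noisy rates the estimated $R_{CW}$ can come out zero or negative, which is why the sketch refuses to return a lift in that case.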
  • 94. Ad Lift Estimation How to compute the 90% confidence interval for L?
  • 102. Ad Lift: Confidence Intervals with Gibbs sampler
      Bayesian approach:
      - Assume a random parameter vector $\theta$ consisting of $(R_{TW}, R_L, R_{CW}, w, \ldots)$
      - Set up a prior distribution $\theta \sim p(\theta)$
      - Sample M values of the unknown $\theta$ from the posterior (Gibbs Sampler): $P(\theta \mid \mathrm{Data}) \propto P(\mathrm{Data} \mid \theta) \cdot p(\theta)$
      - For each sampled $\theta$, compute lift $L = R_{TW} / R_{CW} - 1$
      - Compute the (0.05, 0.95) quantiles of the sampled L values
  • 103. Ad Lift: Gibbs Sampling
  • 113. Ad Lift Gibbs Sampling: Random variables
      - Probabilities: $w$, $R_{TW}$, $R_{CW}$, $R_L$
      - Counts: $C_{W0}$, $C_{W1}$, $C_{L0}$, $C_{L1}$
      - Beta(1, 1) priors on probabilities, e.g.: $w \sim \mathrm{Beta}(1, 1) = \mathrm{Uniform}(0, 1)$, ...
  • 119. Ad Lift Gibbs Sampling: Posterior Probabilities
      Likelihood of observed $k = C_{L1} + T_{L1}$ conversions out of $n = C_{L1} + T_{L1} + C_{L0} + T_{L0}$ trials, given loser response rate $R_L$:
          $\mathrm{Binom}(k, n; R_L) \propto R_L^{k} (1 - R_L)^{n-k}$,
      so the posterior of $R_L$ is
          $P(R_L \mid k, n) \propto P(k, n \mid R_L) \cdot p(R_L)$
          $\propto R_L^{k} (1 - R_L)^{n-k} \cdot \mathrm{Beta}(1, 1)$
          $\propto R_L^{(k+1)-1} (1 - R_L)^{(n-k+1)-1}$
          $\propto \mathrm{Beta}(k + 1, n - k + 1)$
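The conjugate update on this slide amounts to adding counts to the Beta parameters. The sketch below uses only the Python standard library. The function names, the counts (30 conversions in 1000 trials), and the number of draws are hypothetical choices for illustration.

```python
import random

def beta_posterior(k, n, a=1.0, b=1.0):
    """Posterior Beta parameters after k conversions in n trials,
    starting from a Beta(a, b) prior (Beta(1, 1) = uniform by default)."""
    return a + k, b + n - k

def credible_interval(alpha, beta, level=0.90, draws=20_000, seed=0):
    """Equal-tailed interval estimated from sorted posterior draws."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    lo = samples[int((1 - level) / 2 * draws)]
    hi = samples[int((1 + level) / 2 * draws) - 1]
    return lo, hi

# Hypothetical loser counts: 30 conversions out of 1000 trials.
alpha, beta = beta_posterior(k=30, n=1000)
lo, hi = credible_interval(alpha, beta)
print(alpha, beta, round(lo, 4), round(hi, 4))
```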
  • 122. Ad Lift Gibbs Sampling: Posterior Counts
      - We observe $C_1 = C_{L1} + C_{W1}$ (total control conversions). Need to sample $C_{L1}$, $C_{W1}$.
      - $C_{W1}$ is a Binomial draw from $n = C_1$, with probability: $P(\text{ctl winner} \mid \text{ctl conversion}) = \frac{w \cdot R_{CW}}{w \cdot R_{CW} + (1 - w) \cdot R_L}$
      - $C_{L1} = C_1 - C_{W1}$
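Putting the posterior-probability and posterior-count steps together, a minimal Gibbs sampler might look like the sketch below. It is not the production implementation. Several parts are assumptions made to complete the model: control non-conversions are split between would-be winners and losers by the analogous probability, $w$ is updated from both test and control winner/loser counts, and test winners and losers are observed directly. The counts and function names are hypothetical.

```python
import random

def binomial(rng, n, p):
    """Binomial(n, p) draw via n Bernoulli trials (stdlib only; fine for small n)."""
    return sum(rng.random() < p for _ in range(n))

def gibbs_lift(tw1, tw0, tl1, tl0, c1, c0, iters=1200, burn_in=200, seed=0):
    """Return the (0.05, 0.95) posterior quantiles of lift L = R_TW / R_CW - 1.

    tw1/tw0: test winners with/without conversion; tl1/tl0: same for test losers;
    c1/c0: control conversions/non-conversions (winner status unobserved).
    """
    rng = random.Random(seed)
    w, r_tw, r_cw, r_l = 0.5, 0.05, 0.05, 0.05  # arbitrary starting point
    lifts = []
    for it in range(iters):
        # Split control conversions into would-be winners and losers (slide 122).
        p1 = w * r_cw / (w * r_cw + (1 - w) * r_l)
        cw1 = binomial(rng, c1, p1)
        cl1 = c1 - cw1
        # Assumption: split control non-conversions the analogous way.
        p0 = w * (1 - r_cw) / (w * (1 - r_cw) + (1 - w) * (1 - r_l))
        cw0 = binomial(rng, c0, p0)
        cl0 = c0 - cw0
        # Conjugate Beta updates under Beta(1, 1) priors.
        w = rng.betavariate(1 + tw1 + tw0 + cw1 + cw0, 1 + tl1 + tl0 + cl1 + cl0)
        r_tw = rng.betavariate(1 + tw1, 1 + tw0)
        r_cw = rng.betavariate(1 + cw1, 1 + cw0)
        r_l = rng.betavariate(1 + tl1 + cl1, 1 + tl0 + cl0)
        if it >= burn_in:
            lifts.append(r_tw / r_cw - 1)
    lifts.sort()
    return lifts[int(0.05 * len(lifts))], lifts[int(0.95 * len(lifts)) - 1]

# Hypothetical counts generated with a true lift of 0.5 (4.5% vs 3% for winners).
low, high = gibbs_lift(tw1=90, tw0=1910, tl1=40, tl0=1960, c1=100, c0=3900)
print(f"90% interval for lift: ({low:.2f}, {high:.2f})")
```

The Bernoulli-loop binomial keeps the example dependency-free. For realistic counts a vectorized binomial sampler would be the natural replacement.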
  • 123. Complication 1: We only observe cookies, not users; A user’s cookies may be in both test and control (Contamination)
  • 125. Control Contamination due to Multiple Cookies
  • 135. Cookie-Contamination Questions
      - How does cookie contamination affect measured lift?
      - Does the cookie distribution matter? (everyone has k cookies vs. an average of k cookies)
      - What is the influence of the control percentage?
      - Simulations are the best way to understand this: Monte Carlo simulations using Spark
  • 136. Simulations for cookie-contamination
      A scenario is a combination of parameters:
      - M = # trials for this scenario, usually 10K-1M
      - n = # users, typically 10K-10M
      - p = control percentage (usually 10-50%)
      - k = cookie-distribution, expressed as 1 : 100, or 1 : 70, 3 : 30
      - r = (un-contaminated) control user response rate
      - a = true lift, i.e. exposed user response rate = r * (1 + a)
      A scenario file specifies one scenario per row; there could be thousands of scenarios.
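A single trial of one such scenario can be sketched in plain Python (the talk used Spark, which is omitted here). The sketch rests on several assumptions that are not in the slides: the cookie distribution is read as (cookies per user, percent of users) pairs, each cookie is assigned to control independently with probability p, a user counts as exposed if any of their cookies is in test, and a conversion is credited to one of the user's cookies at random. The function name is hypothetical.

```python
import random

def measured_lift(n, p, cookie_dist, r, a, seed=0):
    """One Monte Carlo trial: lift measured at the cookie level.

    n           : number of users
    p           : probability that a cookie is assigned to control
    cookie_dist : list of (cookies_per_user, percent_of_users) pairs
    r           : response rate of an unexposed user
    a           : true lift; exposed users respond at r * (1 + a)
    """
    rng = random.Random(seed)
    sizes = [k for k, _ in cookie_dist]
    weights = [pct for _, pct in cookie_dist]
    conversions = {"test": 0, "control": 0}
    cookies = {"test": 0, "control": 0}
    for _ in range(n):
        k = rng.choices(sizes, weights=weights)[0]
        groups = ["control" if rng.random() < p else "test" for _ in range(k)]
        for g in groups:
            cookies[g] += 1
        # Assumption: any test cookie means the user sees the ad.
        exposed = "test" in groups
        rate = r * (1 + a) if exposed else r
        if rng.random() < rate:
            # Assumption: the conversion is credited to a random cookie.
            conversions[rng.choice(groups)] += 1
    test_rate = conversions["test"] / cookies["test"]
    control_rate = conversions["control"] / cookies["control"]
    return test_rate / control_rate - 1

clean = measured_lift(100_000, p=0.5, cookie_dist=[(1, 100)], r=0.05, a=0.5)
contaminated = measured_lift(100_000, p=0.5, cookie_dist=[(3, 100)], r=0.05, a=0.5)
print(f"1 cookie/user: {clean:.2f}, 3 cookies/user: {contaminated:.2f}")
```

Under these assumptions, control cookies often belong to users who were exposed through another cookie, so the control response rate is inflated and the measured lift falls below the true lift.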
  • 141. Long-Running Experiments
      - Ideal randomized test is instantaneous.
      - When a test is run for weeks/months, a test user may sometimes be a winner, sometimes a loser.
      - How to define who is a “winner” and who is a “loser”? Crucial because lift $L = R_{TW} / R_{CW} - 1$.
      - Our approach (details omitted):
          - Ad influence period is limited
          - “Refresh” a user after a suitable time period elapses
          - Count “user time-spans” rather than “users”
          - Identify “experiments” within a user’s timeline
  • 142. MediaMath’s Placebo App
      - Currently in production for ~10 advertisers
      - Advertisers can specify which campaigns to measure
      - Lift estimation and Gibbs Sampling run on AWS using Spark
      - Multiple runs of the Gibbs Sampler in parallel (with different priors)