When marketing teams spend money on a paid acquisitions program it is crucial to understand the effect of that ad spend. In this talk, we will outline incrementality as a way to measure the causal impact that ad spend has on acquiring new customers and its advantages over more traditional metrics. We will walk through several ad measurement products available today and give examples of how to apply them to your business.
23. How many new customers did I acquire because I spent money?
24. Incrementality: how many more customers were acquired due to ad spend. (Users acquired while spending money on ads vs. users acquired while NOT spending money on ads.)
37. But we don’t want to scale down our advertising efforts…
38. ● It’s worth it in the long run
● In order to experiment in a rigorous way we need holdouts
● Experimentation costs might be lower than perceived
Organizational alignment
39. ● Want to understand iCPA, not % lift
● Adverse incentives from platforms can lead you to have a high % holdout
● What is the minimum % holdout I can have?
● You really may only need a holdout of a few percentage points
Practically…
41. Final words
● Traditional CPA metrics have problems
● Incrementality: How many more customers were acquired due to ad spend
● Compute iCPA often
● Ask for it from your advertising partners to help make it a standard
Editor's Notes
Introduce Stitch Fix
What is this offering all about?
Clients fill out style profile
Algorithms, merchandising and stylists work together to find you the right clothes
Warehousing and operations get it out the door on time!
We learn and serve you better
Leading the Algorithmic Acquisitions team
Our mission: help drive our paid client acquisitions program with our vast amounts of internal data
How much should we bid for certain users?
How should we retarget users?
Which types of users should we try to acquire now?
What type of mix should we be running?
The basis for all these questions is a solid framework for measurement
Previously worked on recommendation products at LinkedIn and a small e-commerce startup.
Built data products and recommendations systems to help drive retention, engagement, sales.
“What metric do I want to drive?”
Had to choose the right metric
CTR – might be the initial guess at a metric
Cons: can capture a bad type of engagement – not what you actually want to drive
How CTR can go wrong in:
Content/news: promote content that is clickbait or polarizing and/or offensive
People/connection recommendations: promote people with nice visuals/images, or images with questionable content
Item recommendations: promote items with questionable pictures/content, items that are overall popular
Examples of good metrics:
Content/news: increased number of sessions/ other downstream metrics
People/connections: increased number of messages, accepted connection requests
Items: increased number of purchases
Main problem we are all facing in advertising field
Given my past experience in creating products to increase sales/connections I began to really think about how we measured success.
What does “we want to grow” really mean?
What kind of behavior do we want to drive?
Volume vs high sales
What metric best captures that behavior?
Main theme: I want to spend money in the places that will get me the most clients
Need to understand which channels are the most effective at bringing in clients with a given amount of spend
How many new clients/purchases/revenue did I drive with my spend?
But still… what is the metric that captures this?
Want to measure the effectiveness of spend
CPA - the de facto metric
Captures directionality but has failures
Not causal – This metric does not quantify the effect that our ad spend had on acquiring users.
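For concreteness (the formulas are my phrasing, consistent with the talk's usage):
CPA = ad spend / attributed acquisitions
iCPA = ad spend / incremental acquisitions, where incremental acquisitions are the conversions in test minus the baseline estimated from a control holdout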
Let’s focus on the attribution issues associated with CPA
Issues first start to become clear when we actually try to define the “acquired users” part of the metric
What do we mean by acquired user?
“A user came to our site/made a purchase after clicking on an ad that they saw”
What if the user saw ads in more than one channel?
To simplify, companies will say that they use some form of attribution rule, e.g. “last touch”
Good: Simple to explain
Bad: Attribution problems
People usually tend to solve this with multi touch attribution:
Basic principle: by analyzing all the impressions and clicks a user had on their path to conversion, we can redistribute the credit from last touch to understand the impact each channel has on acquiring customers.
Con:
We do not have all the impressions from all of our channels.
Facebook – walled garden
Offline channels impossible to tell
Impressions/clicks are expensive to obtain. MTA solutions are also expensive.
Multi-touch attribution is not causal
We may be redistributing weight but do we know that the conversion actually happened due to ads?
Even if we had all the impressions and clicks from all of our channels, modern ad platforms target populations by features and criteria that only they know.
This creates an unbalanced population between those that are exposed to ads and those that are not
These user features are essentially confounders which explain conversions
We will never have those features
Not all channels compute CPA the same way
Inconsistency generally seen in offline channels
TV: spike analysis
Match back analysis
URL links
Application of multipliers
biases and seasonality
For TV we can use a spike analysis to measure impact of TV
Natural experiment
Look at website traffic – spikes usually correspond to a TV ad running
Time before and after a spike is used to estimate a baseline – the orange line
The difference between total users in the spike and the estimated baseline is assumed to be the additional users that TV brought in
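A minimal sketch of that arithmetic in R; the traffic numbers and spike window below are made up for illustration, not from the talk:

```r
# Hypothetical spike analysis: site visits per minute around one TV airing.
visits <- c(40, 42, 38, 41, 95, 130, 88, 55, 43, 39)  # made-up traffic
spike  <- 5:8                                          # minutes affected by the ad

# Estimate the baseline from the time before and after the spike.
baseline <- mean(visits[-spike])              # ~40.5 visits/minute

# Incremental visitors attributed to the airing: spike total minus baseline.
incremental <- sum(visits[spike] - baseline)  # ~206 visitors
```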
Limitation
Spikes most likely happen during a short time window and are unable to capture long term impact of TV
Long term impact is important to understand for top of funnel channels
Another example where measurement becomes difficult: Direct mail
Send a nice piece of mail to your customers and ask them to sign up via a URL link
It’s not going to happen – you will most likely see very few conversions come in via URL links
Have to figure out another way to estimate the impact of these campaigns
Use a match back analysis
Know the list of customers who received the mailer
Look at conversions that happened after the mailer was sent and see if you can identify which of the users who received the mailer ultimately ended up converting
Use that number to estimate the acquisitions brought in by the program
Problem:
What time window do I use?
Look at the example of signups after a direct mail campaign
The window is very long, and our way of choosing the window is arbitrary
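As a hedged sketch of why the window choice matters, here is a match-back join in R; the data frames, column names, and dates are invented for illustration:

```r
# Hypothetical match-back: join the mailer list to later conversions and
# count matches within a chosen attribution window. All data is made up.
library(dplyr)

mailers <- data.frame(customer_id = 1:5,
                      mail_date   = as.Date("2018-03-01"))
conversions <- data.frame(
  customer_id     = c(2, 3, 5),
  conversion_date = as.Date(c("2018-03-10", "2018-04-20", "2018-06-15")))

window_days <- 30  # the arbitrary choice that drives the whole estimate

matched <- mailers %>%
  inner_join(conversions, by = "customer_id") %>%
  mutate(days_to_convert = as.numeric(conversion_date - mail_date)) %>%
  filter(days_to_convert >= 0, days_to_convert <= window_days)

nrow(matched)  # 1 with a 30-day window, 2 at 60 days, 3 at 120 days
```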
Summarize:
At the end of the day we have CPA numbers for all our channels, which we are using to make decisions, but CPAs are not computed the same way across channels
Finally – the main issue with CPA is that it is not causal. It does not capture the effect that spending money on ads had on bringing customers to your business
Traditional CPA metric may incorrectly estimate the effects of your ads
Overestimate
High-intent users were likely to convert even without your ad
Ad still takes credit for those conversions
Looks more efficient than it is
Underestimate
Looking only at converters that clicked through
Will not value the effect that an impression of the ad had on the user
Looks less efficient than it is
Popular eBay case study that looked at exactly this: the effect of ads on their business
Wanted to study the effectiveness of their ads, in particular their branded search terms
Concerned that these types of ads targeted customers with high intent that would have converted anyway
Ran experiments that showed no measurable short-term value in brand keyword advertising
First (left) graph:
Halted SEM brand keywords on both Yahoo! and Microsoft (MSN)
Drop in clicks that came from branded KWs was made up by organic
Users found their way to eBay without branded terms
Second (right) graph:
Halted SEM brand keywords on Google
Same findings as on Yahoo! and MSN
“Shutting off paid search ads closed one (costly) path to a company’s website but diverted traffic to natural search, which is free to the advertiser”
CPA has problems:
attribution
Inconsistency
Doesn’t actually measure effectiveness of spend
What we really wanted to understand: how many **NEW** people did I bring in due to my ad spend?
AKA lift
We can design controlled experiments!!
We will walk you through a couple of examples of how to do this for some of the more popular channels
Incremental CPA should be the baseline metric
Let’s revisit the eBay story:
CPA would have indicated that spending money on branded keywords was the cheapest way to acquire clients
In this case CPA appeared to be very efficient, but iCPA would have captured the inefficiency
Now let’s walk through some basic practical examples that will get you started
Facebook
They have a measurement product to detect incrementality called conversion lift
Can identify users across devices
The way the study and analysis works
Allocate all users in the FB universe to test and control
Calculate the number of incremental users acquired at the end of the study
Example: 2 incremental users
Calculate iCPA
Called a conversion lift
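The arithmetic behind that, sketched in R with hypothetical counts (none of these figures come from the talk):

```r
# Illustrative conversion lift readout. All numbers are made up.
spend        <- 1000    # ad spend on the test group ($)
n_test       <- 100000  # users allocated to test (eligible to see ads)
n_control    <- 10000   # users allocated to control (ads withheld)
conv_test    <- 520     # conversions observed in test
conv_control <- 50      # conversions observed in control

# Scale control conversions up to the size of the test group, then take
# the difference as the incremental conversions driven by ads.
baseline    <- conv_control * (n_test / n_control)  # 500 expected conversions
incremental <- conv_test - baseline                 # 20 incremental users

icpa <- spend / incremental     # $50 per incremental user
lift <- incremental / baseline  # 4% lift over the no-ads baseline
```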
Facebook
This lift study is an intent-to-treat design
Pros: clean test
Cons: high variance – looking at conversions among exposed and non-exposed users
Comparing all users in control and test, not just those that would have been exposed
Causes more noise due to the conversions of the people in the bucket that would not have been exposed to ads
Getting the test off the ground
For one test -- talk to your FB rep
For multiple tests -- use the API
SEM
Cannot always identify users across devices
Must run a geo based study
Pros: cleaner than a cookie-based test, which has potential for significant cross-contamination
Cons: Can’t use this if you don’t operate in a large geographic area
How to set up the study and do the analysis:
Choose randomly which regions are in test and which are in control
We could compare treatment regions directly to control regions, BUT
Very few regions in each bucket – implies high variance
Different from FB, where we had many users in each of control/test
SEM
We alleviate noise issues by estimating the counterfactual (what is called a synthetic control) for the test regions from the control regions
In other words: estimate what would have happened in the test regions in the absence of ad spend
Understand the lift by looking at the difference between synthetic control and test
CausalImpact package in R lets you do just that with just a few lines of code
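A minimal example, closely following the package's own vignette; the simulated series below stand in for your test-region and control-region time series:

```r
library(CausalImpact)

# Simulated data: x is a control-region series, y is the test-region series,
# with an effect of +10 injected after the campaign launches at t = 71.
set.seed(1)
x <- 100 + arima.sim(model = list(ar = 0.999), n = 100)
y <- 1.2 * x + rnorm(100)
y[71:100] <- y[71:100] + 10
data <- cbind(y, x)  # response first, control series as covariates

pre.period  <- c(1, 70)    # before spend was changed in the test DMAs
post.period <- c(71, 100)  # while the test was running

impact <- CausalImpact(data, pre.period, post.period)
summary(impact)  # estimated lift with credible intervals
plot(impact)     # observed vs. synthetic control, pointwise and cumulative
```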
Getting test off the ground:
Talk to your Google rep – will have good advice on what to do
You will manually have to choose DMAs and turn off your spend in these DMAs
You will have to do the analysis yourself
Display – GDN
GDN has a Conversion Lift Study measurement product
The way it works:
User based study - try to identify users across devices
When you have an opportunity to show an ad
If user is in test – then show them the ad and record it
If user is in control – substitute the next ad in the auction and record that they would have seen the ad
Look at the difference in acquisitions/sales among users that saw your ads and users that would have seen your ads
The experiment setup is a ghost ads study
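A hedged sketch of the readout under this design, with made-up counts: the comparison is restricted to users who saw the ad vs. users whose ghost impression was recorded, which is where the variance reduction comes from:

```r
# Illustrative ghost-ads comparison. All counts are hypothetical.
conv_exposed <- 300    # conversions among exposed test users
n_exposed    <- 20000  # test users who actually saw the ad
conv_ghost   <- 250    # conversions among ghost-exposed control users
n_ghost      <- 20000  # control users who *would have* seen the ad

# Two-sample test of conversion rates (1.50% vs. 1.25% here).
prop.test(x = c(conv_exposed, conv_ghost), n = c(n_exposed, n_ghost))
```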
Pros:
Reduced noise:
only comparing the difference in the exposed (“blue”) acquisitions above
do not look at the difference in “yellow” – that is what an intent-to-treat study would have compared
Less variance -> shorter to run
Cons:
Users could clear their cookies so could inadvertently be exposed to both treatments
Display – Partners
It’s possible to run lift/incrementality tests on other display networks via partners.
Must talk to your partner to see if they offer these types of studies
In many cases they may offer to run a PSA study
The way it works:
Cookie based study – not user based
Off the bat we may have cross contamination issues
Cookies are split into control and test
When a user is set up to receive an ad (similar to ghost ads):
If user is in test – then show them the ad and record it
If user is in control – serve the user a PSA ad and record it
Pros:
It’s something
Cons:
Users could clear their cookies so could inadvertently be exposed to both treatments
Users in control are not exposed to competitor ads, possibly leading to fewer conversions in control
Need to be careful that targeting capabilities keep test/control buckets constant.
Most modern ad platforms will optimize the audience an ad is shown to.
PSA ads could cause ad platforms to target a fundamentally different audience in control, creating a bias in the audiences.
Need to verify with your partner how they deal with this.
Expensive since you may have to foot the bill for the PSA ads as well
Organizational alignment
Companies in a high growth mode can be hesitant about withholding spend for fear of not hitting numbers
Need to ask yourself: How much do I really have to hold out? How do I power my test? (i.e. how much data do I need in order to reach a statistically significant conclusion?)
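For a rough sense of the data requirement, base R's power.prop.test answers exactly this question; the rates below are invented for illustration:

```r
# How many users per bucket to detect a 10% relative lift on a 2% baseline
# conversion rate, at the usual 5% significance and 80% power?
power.prop.test(p1 = 0.020, p2 = 0.022, sig.level = 0.05, power = 0.80)
# Prints n per group (roughly 80k users per bucket at these made-up rates).
```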
Organizational alignment:
It’s important to commit to running tests.
You could be measuring the impact of your ads campaigns incorrectly and making non-optimal ad spend decisions
The rest of data science runs experiments with holdouts to understand the impact of recommendations/products/actions
Let’s bring performance marketing to that standard
Having a holdout may not actually be as bad as you might think:
Not only do you get gains from the learnings, but
by looking for the right thing to understand, we may be able to get away with a small % holdout
Remember:
iCPA is the metric you want to understand
NOT the % lift you saw by spending money on ads
% lift is not comparable across channels because they have different baselines, but iCPA is comparable
Example: you could run a lift study on a channel and see a 2% lift. Great to know, but if that translates to a $100 iCPA and you can only really tolerate paying $10 for a new user, then it’s not really worth it for you to be able to detect such a small lift
Adverse incentives from channel platforms – they want you to walk out of the lift study thinking that you had a % lift
Will advise you to have large holdouts in order to be able to detect the small lift
Work with your data science team to understand the minimum % holdout you need in order to detect your maximum allowable iCPA
In some cases you may only need a holdout of a few percentage points.
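One way to make that conversation concrete, sketched in R; every input below is a made-up assumption, and the key simplification is that the treated side covers essentially the whole population:

```r
# Translate a maximum tolerable iCPA into the lift the test must detect,
# then into a minimum holdout size. All inputs are hypothetical.
spend      <- 50000   # planned spend during the test ($)
max_icpa   <- 10      # most you will pay per incremental user ($)
population <- 5e6     # addressable users on the channel
base_rate  <- 0.01    # conversion rate without ads

# To come in at or under max_icpa, spend must drive at least this many
# incremental conversions...
min_incremental <- spend / max_icpa                       # 5,000 users
# ...which (treating nearly everyone) implies this treated rate:
treated_rate <- base_rate + min_incremental / population  # 0.011

pp <- power.prop.test(p1 = base_rate, p2 = treated_rate,
                      sig.level = 0.05, power = 0.80)
100 * pp$n / population  # minimum holdout: ~3% for these inputs
```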
Interesting analysis one can get from incrementality:
Most incremental user segments:
Check with your channel partners what features you can segment your learnings by
Age/geo/other user features
CPAs vs incremental CPAs
User segment 1 looks best under traditional CPA methods
User segment 2 is really the best segment to go after when you make ads
May shift the way you:
Make creatives – may now focus on user segment 2 vs user segment 1
Adjust how aggressively you bid