How do we optimize ad yield on the Spotify platform?
The types of questions we have
Find the total available audio ad impressions on the iOS platform
between 9/12/2013 and 9/13/2013 in the NYC metro area for male
users in the 18-35 age group who typically listen to hip-hop
What is unique about us?
• Rules triggering ad breaks are unique
• We also log user activity and audio streaming data
• Simulate ad delivery by replaying user events and triggering ad breaks
• Pre-compute impression aggregates for different dimensions and build a complex model to combine them
• Use a subset of impression data, then filter and extrapolate it using a simple model
Our Hadoop infrastructure
700 nodes in our Hadoop cluster
• Fast real-time lookup service
• Consistent results
• Ability to handle additional targeting
• Ability to scale
Use a subset of impression data, then filter and extrapolate it
using a simple model in a service
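The deck does not say how the subset of users is chosen. One option that fits its later advice to prefer deterministic algorithms is to hash each user ID into a bucket, so the same user is always in or out of the sample. This is a minimal sketch under that assumption; `in_sample`, `sample_impressions`, the MD5 hash, and the 1-in-100 rate (taken from the 15 million to 150,000 ratio quoted later) are all illustrative choices, not Spotify's actual method.

```python
import hashlib

# Hypothetical sampling rate: 1 in 100 users, matching the
# 15 million -> 150,000 population/sample ratio quoted in the deck.
SAMPLE_MODULUS = 100


def in_sample(user_id: str, modulus: int = SAMPLE_MODULUS) -> bool:
    """Return True if the user falls in the sample bucket.

    Hashing the user ID (instead of drawing random numbers) makes the
    choice deterministic: repeated forecasts for the same targeting see
    the same sampled users and therefore give the same answer.
    """
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % modulus == 0


def sample_impressions(impressions, modulus: int = SAMPLE_MODULUS):
    """Keep only impressions that belong to sampled users.

    `impressions` is assumed to be an iterable of (user_id, record) pairs.
    """
    return [(uid, rec) for uid, rec in impressions if in_sample(uid, modulus)]
```

Because sampling is per user rather than per impression, each sampled user keeps their full listening history, which matters later when per-user caps are applied.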
Now begins the fun part…
Let's dive deeper to solve this problem
What was the big picture going to be like?
High-level forecasting engine algorithm
[Diagram: forecasting engine stages, labeled "daily" and "once a minute"]
• Organic growth in inventory
• Cold start
Organic growth in inventory
Ad impression inventory in a growing market
Organic growth in inventory?
Ad impression inventory in a market with high conversion to premium
Ad impression inventory in a newly launched market
Ad impression inventory dip in early Q1
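The charts above show inventory that grows, shrinks as free users convert to premium, ramps up in a new market, or dips seasonally. A simple way to factor such a trend into a forecast is to estimate a recent week-over-week growth factor and compound it forward. This is a hypothetical sketch: `weekly_growth_factor`, `project`, and the four-week window are assumptions, not the model used in the talk.

```python
from statistics import mean


def weekly_growth_factor(weekly_impressions: list[float], weeks: int = 4) -> float:
    """Average week-over-week ratio over the last `weeks` transitions.

    A ratio above 1.0 models organic growth; below 1.0 models shrinking
    inventory, e.g. free users converting to premium (who hear no ads).
    """
    if len(weekly_impressions) < 2:
        return 1.0  # not enough history (cold start): assume a flat trend
    recent = weekly_impressions[-(weeks + 1):]
    ratios = [b / a for a, b in zip(recent, recent[1:]) if a > 0]
    return mean(ratios) if ratios else 1.0


def project(baseline: float, factor: float, weeks_ahead: int) -> float:
    """Compound the baseline forward by the growth factor."""
    return baseline * factor ** weeks_ahead
```

A recurring seasonal dip, such as the one in early Q1, is not captured by a short trailing window like this; it would need a separate seasonal adjustment, for example by comparing against the same period a year earlier.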
Volume of data
• Billions of ad impressions per month
• Terabytes of relevant forecasting data
Optimizing data retrieval
• We analyzed our data access patterns and found that over 75% of
our campaigns are targeted by age and location.
• So we mapped location to a list of users sorted by age using
• Optimized user lookup by location and age group to O(k log N),
down from the typical O(k N), where:
N: total users for a location
k: a constant
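Mapping each location to a list of users sorted by age means an age-group query becomes two binary searches for the range boundaries instead of a scan over all N users of that location. The sketch below assumes an in-memory index built with Python's `bisect` module; the `UserIndex` class and its `(user_id, location, age)` input are hypothetical, and reading k as the number of lookups per query is our interpretation of the constant.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class UserIndex:
    """Hypothetical in-memory index: location -> users sorted by age."""

    def __init__(self, users):
        # users: iterable of (user_id, location, age) tuples
        by_location = defaultdict(list)
        for user_id, location, age in users:
            by_location[location].append((age, user_id))
        self._ages = {}
        self._ids = {}
        for location, rows in by_location.items():
            rows.sort()  # sort by age (ties broken by user_id)
            self._ages[location] = [age for age, _ in rows]
            self._ids[location] = [uid for _, uid in rows]

    def lookup(self, location, min_age, max_age):
        """Return user IDs in `location` with min_age <= age <= max_age.

        Each binary search costs O(log N), instead of scanning all N
        users of the location.
        """
        ages = self._ages.get(location, [])
        lo = bisect_left(ages, min_age)
        hi = bisect_right(ages, max_age)
        return self._ids.get(location, [])[lo:hi]
```

Locating the range is O(log N); reading out the matched users still costs time proportional to how many match, which the sampling step keeps small.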
How do we find available inventory for the sample population?
1. Take all user ad impressions by applying “day of the month”
2. Apply filters by ad-type, location, age, gender, platform, etc.
3. Count the total impressions for all the users who match the filters
4. Read booked impressions for the similar target criteria from
5. Available inventory = total impressions – booked impressions
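Steps 1 to 5 can be sketched as a filter-count-subtract over the sampled impressions. This is a minimal illustration under several assumptions: the `Impression` and `Target` record shapes are hypothetical, "applying day of the month" is read as selecting historical impressions logged on the same day of the month, and `booked` is assumed to already be expressed at sample scale. Clamping at zero is our addition so that overbooking does not yield negative inventory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Impression:
    # Hypothetical record shape; the deck does not specify a schema.
    user_id: str
    day_of_month: int
    ad_type: str
    location: str
    age: int
    gender: str
    platform: str


@dataclass(frozen=True)
class Target:
    # Hypothetical targeting criteria for one forecast request.
    ad_type: str
    location: str
    min_age: int
    max_age: int
    gender: str
    platform: str

    def matches(self, imp: Impression) -> bool:
        return (imp.ad_type == self.ad_type
                and imp.location == self.location
                and self.min_age <= imp.age <= self.max_age
                and imp.gender == self.gender
                and imp.platform == self.platform)


def available_sample_inventory(impressions, target: Target,
                               day_of_month: int, booked: int) -> int:
    # Step 1: impressions logged on the same day of the month.
    same_day = (i for i in impressions if i.day_of_month == day_of_month)
    # Steps 2-3: apply the targeting filters and count the matches.
    total = sum(1 for i in same_day if target.matches(i))
    # Steps 4-5: subtract booked impressions, clamped at zero.
    return max(total - booked, 0)
```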
• Population (15 million) -> Sample (150,000)
• Scaling factor is 100
• Total available inventory = scaling factor * available inventory for sample
• Ad frequency capping
• Day of the week and time of the day filtering
• View per user (VPU) capping
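The extrapolation and the per-user filters above can be sketched as a few small functions. The scaling factor follows directly from the quoted numbers (15,000,000 / 150,000 = 100). Everything else here is an assumption for illustration: `in_time_window`, `apply_per_user_cap`, and `total_available_inventory` are hypothetical names, and the single per-user cap stands in for both frequency and VPU capping, since the deck does not say how those two caps differ.

```python
from collections import Counter
from datetime import datetime

# Figures from the deck: population of 15 million, sample of 150,000.
POPULATION = 15_000_000
SAMPLE_SIZE = 150_000
SCALING_FACTOR = POPULATION // SAMPLE_SIZE  # 100


def in_time_window(ts: datetime, weekdays: set[int], hours: range) -> bool:
    """Day-of-week and time-of-day filter (datetime.weekday(): Monday == 0)."""
    return ts.weekday() in weekdays and ts.hour in hours


def apply_per_user_cap(user_ids, cap_per_user: int) -> int:
    """Count at most `cap_per_user` impressions for each user.

    `user_ids` holds one entry per matching impression in the sample, so a
    heavy listener cannot contribute more than the cap.
    """
    per_user = Counter(user_ids)
    return sum(min(n, cap_per_user) for n in per_user.values())


def total_available_inventory(sample_available: int,
                              scaling_factor: int = SCALING_FACTOR) -> int:
    """Extrapolate sample-level availability to the whole population."""
    return scaling_factor * sample_available
```

Applying the caps before scaling keeps the extrapolation linear: a capped sample count of 3 becomes a population estimate of 300.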
What worked for us?
1. Fast lookups
2. Simple models scaled well
3. Deterministic algorithms easier to debug
4. Adding new targeting features was easy
5. Forecasting engine agnostic to changes in ad server
What didn’t work that well?
1. Campaign level forecasts difficult without simulation
2. Cold start is a real problem when there is no proxy dataset
3. Forecasting inventory for new ad types can be challenging
What we’ve learnt
• Think data volume
• Consider sampling
• Choose appropriate time window
• Analyze data access patterns and optimize for them
• Use deterministic algorithms
• Analyze data trends and factor them into the computation
• Simple models scale well
May 12, 2014
Email - Kinshuk@spotify.com