Ad Yield Optimization @ Spotify - DataGotham 2013

Transcript

  • 1. May 12, 2014 Ad Yield Optimization @ Spotify
  • 2. I’m Kinshuk Mishra • Work on distributed systems and data science problems • Lead architecture for ads backend platform at Spotify • You can find me @_kinshukmishra
  • 3. What is Spotify? • Started in 2006 • Currently has over 24 million users • 6 million paying users • Available in 28 countries • Over 300 engineers, of which 100 are in NYC
  • 4. Why are Ads important? • getFreeTierUsers() / getAllUsers() > 0.70 • getSpotifyPayoutToMusicLabels() = $$$ • Great medium for promotions and announcements
  • 5. Native Ads
  • 6. The problem: How do we optimize ad yield on the Spotify platform?
  • 7. The type of questions we have: What is the total number of available audio ad impressions on the iOS platform between 9/12/2013 and 9/13/2013 in the NYC metro area for male users aged 18-35 who typically listen to hip-hop?
  • 8. What is unique about us? • Rules triggering ad breaks are unique • We also log user activity and audio streaming data
  • 9. Different approaches • Simulate ad delivery by replaying user events and triggering ad breaks • Pre-compute impression aggregates for different dimensions and build a complex model to combine them • Use a subset of impression data, then filter and extrapolate it using a simple model
  • 10. Our Hadoop infrastructure: 700 nodes in our Hadoop cluster
  • 11. Some constraints • Fast real-time lookup service • Consistent results • Ability to handle additional targeting • Ability to scale
  • 12. The solution: Use a subset of impression data, then filter and extrapolate it using a simple model in a service
  • 13. But how? Now begins the fun part… Let's dive deeper to solve this problem
  • 14. What was the big picture going to be like? [Diagram components: Hadoop (ad impression log), Postgres DB (booked campaigns), forecasting engine, forecast query]
  • 15. High-level forecasting engine algorithm [Flowchart labels: log data, campaign data, load data cache (refresh labels: daily, once a minute), submit forecast query, wait for query, apply filter criteria to dataset, count available impressions, apply growth and other extrapolation factors]
  • 16. Some challenges… • Organic growth in inventory • Cold start • Seasonality
  • 17. Organic growth in inventory: ad impression inventory in a growing market
  • 18. Organic growth in inventory? Ad impression inventory in a market with high conversion to premium
  • 19. Cold start: ad impression inventory in a newly launched market
  • 20. Seasonality: ad impression inventory dip in early Q1
  • 21. Volume of data: data overload? • Billions of ad impressions per month • Terabytes of relevant forecasting data
  • 22. Sampling
  • 23. Caching [Diagram: data cache with entries dated 9/07/2013 through 9/14/2013, loaded from log data and campaign data; refresh labels: daily, once a minute]
  • 24. Optimizing data retrieval • We analyzed our data access pattern and found that over 75% of our campaigns are targeted by age and location. • So we mapped each location to a list of users sorted by age, using a SortedSetMultimap • This optimized user lookup by location and age group to O(k log N) from the typical O(kN), where N is the total number of users for a location and k is a constant
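
The lookup described on slide 24 can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the implementation from the talk: the slide names a SortedSetMultimap, and here a dict of age-sorted lists searched with `bisect` stands in for it. The function names `build_index` and `users_in_age_range`, the tuple layout, and the sample users are all hypothetical, and ages are assumed to be integers.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(users):
    """Map each location to a list of (age, user_id) pairs sorted by age.

    `users` is an iterable of (user_id, location, age) tuples (assumed shape).
    """
    index = defaultdict(list)
    for user_id, location, age in users:
        index[location].append((age, user_id))
    for entries in index.values():
        entries.sort()
    return index


def users_in_age_range(index, location, min_age, max_age):
    """Return ids of users at `location` with min_age <= age <= max_age.

    Two binary searches locate the slice boundaries in O(log N) instead of
    scanning all N users for the location. Assumes integer ages.
    """
    entries = index.get(location, [])
    lo = bisect_left(entries, (min_age,))      # first entry with age >= min_age
    hi = bisect_left(entries, (max_age + 1,))  # first entry with age > max_age
    return [user_id for _, user_id in entries[lo:hi]]


# Hypothetical sample data for illustration only.
sample_users = [
    ("u1", "NYC", 17),
    ("u2", "NYC", 18),
    ("u3", "NYC", 35),
    ("u4", "NYC", 36),
    ("u5", "SF", 25),
]
index = build_index(sample_users)
nyc_18_35 = users_in_age_range(index, "NYC", 18, 35)  # ["u2", "u3"]
```

Keeping the users for each location sorted by age means an age-group filter becomes a contiguous slice, which is why the search cost grows with log N rather than N.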
  • 25. Day of the Month [Chart: day-of-month axis with ticks 1 to 32, annotated "Growth"]
  • 26. How to find available inventory for sample population? 1. Take all user ad impressions by applying “day of the month” substitution 2. Apply filters by ad type, location, age, gender, platform, etc. 3. Count the total impressions for all the users who match 4. Read booked impressions for similar targeting criteria from the cache 5. Inventory available = total impressions - booked impressions
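
Steps 2, 3 and 5 on slide 26 can be sketched as below. This is an illustrative Python sketch, not the forecasting engine's code. It assumes step 1 has already happened (the impressions passed in are the substituted set) and that step 4 is reduced to a single booked count supplied by the caller. The dict-based record layout, the field names, and the helpers `matches` and `available_sample_inventory` are hypothetical.

```python
def matches(impression, criteria):
    """Return True if the impression satisfies every targeting criterion.

    `criteria` maps a field name to an exact value, or to an inclusive
    (min, max) tuple for range fields such as age.
    """
    for field, wanted in criteria.items():
        value = impression[field]
        if isinstance(wanted, tuple):
            low, high = wanted
            if not low <= value <= high:
                return False
        elif value != wanted:
            return False
    return True


def available_sample_inventory(impressions, criteria, booked_impressions):
    """Count matching impressions and subtract those already booked."""
    total = sum(1 for imp in impressions if matches(imp, criteria))
    return total - booked_impressions


# Hypothetical sample records for illustration only.
sample_impressions = [
    {"ad_type": "audio", "location": "NYC", "age": 24, "gender": "m", "platform": "ios"},
    {"ad_type": "audio", "location": "NYC", "age": 31, "gender": "m", "platform": "ios"},
    {"ad_type": "audio", "location": "NYC", "age": 40, "gender": "m", "platform": "ios"},
    {"ad_type": "display", "location": "NYC", "age": 24, "gender": "m", "platform": "ios"},
]
criteria = {"ad_type": "audio", "location": "NYC", "age": (18, 35),
            "gender": "m", "platform": "ios"}
available = available_sample_inventory(sample_impressions, criteria,
                                       booked_impressions=1)
```

Because the filter is a plain predicate over fields, adding a targeting dimension in this sketch only means adding another key to `criteria`.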
  • 27. Growth Factor: Keep it simple
  • 28. Extrapolation • Population (15 million) -> Sample (150,000) • Scaling factor is 100 • Total available inventory = scaling factor * available inventory for sample
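
The arithmetic on slide 28 can be written out as a short Python sketch. The population and sample sizes come from the slide; the function names and the sample inventory figure of 1,200 are hypothetical values chosen only to show the calculation.

```python
def scaling_factor(population_size, sample_size):
    """Ratio of the full population to the sampled population."""
    return population_size / sample_size


def extrapolate(sample_available, population_size, sample_size):
    """Scale available inventory measured on the sample up to the population."""
    return sample_available * scaling_factor(population_size, sample_size)


factor = scaling_factor(15_000_000, 150_000)       # 100.0, as on the slide
total = extrapolate(1_200, 15_000_000, 150_000)    # 1,200 is a made-up sample count
```

This simple linear scaling only holds if the sample is representative of the population for the targeting criteria being queried.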
  • 29. Other features • Ad frequency capping • Day of the week and time of the day filtering • View per user (VPU) capping
  • 30. What worked for us? 1. Fast lookups 2. Simple models scaled well 3. Deterministic algorithms are easier to debug 4. Adding new targeting features was easy 5. Forecasting engine agnostic to changes in ad server
  • 31. What didn’t work that well? 1. Campaign-level forecasts are difficult without simulation 2. Cold start is a real problem when there is no proxy dataset 3. Forecasting inventory for new ad types can be challenging
  • 32. What we’ve learnt • Think data volume • Consider sampling • Choose an appropriate time window • Analyze data access patterns and optimize for them • Use deterministic algorithms • Analyze data trends and factor those into the computation • Simple models scale well
  • 33. Thanks! Email: Kinshuk@spotify.com https://twitter.com/Spotifyjobs