Proprietary & Confidential. Copyright © 2014.
Behavioral Targeting @ Scale
- How did we know that this Ad was relevant for...
How did you know this ad would be relevant for me?
Published in: Technology, Business
  • When an advertiser works with Rocket Fuel, it immediately has access to 145 RTB advertising supply partners, 21M sites, 20B ad serving opportunities, 3B users on 92000 devices.
  • Real Time Auction
    Selecting the right ad for each auction
  • Automatically learning from every response & getting better
    Nobody else is doing this as fast, precisely, consistently for our customers
  • Stock career images (2), probably ask recruiting
  • Mention that it runs round the clock, handles upwards of 100 TB per day, stages vary in frequency, dependencies vary in frequency, need to play catch up, bugs
  • Mention that we went from
  • Give props to the open source stack.
  • How did you know this ad would be relevant for me?

    1. Behavioral Targeting @ Scale: How did we know that this ad was relevant for you? Savin Goyal, Sivasankaran Chandrasekar
    2. Advertiser / Rocket Fuel
        • 200+ RTB advertising supply partners
        • 50+ million websites
        • 50+ billion daily impressions
        • 3 billion consumers worldwide
        • 100,000+ devices
    3. Display Advertising Ecosystem (diagram labels: Exchanges, Ad Exchange, Rocket Fuel Platform, Auto Optimization, Real-Time Bidding, Agencies, Data Partners)
    4. Programmatic Buying (flow diagram with steps numbered 1 to 10; labels: Bid on Ad, User Data, Bid Request, Rocket Fuel, Winning Ad, Ad Request, Ad Served to User, Page Request, Web Browser, Rocket Fuel Platform, Smart Ad Servers, Response Prediction Models, Calculate Propensity Score, User Engagement Recorded, User Engages with Ad, Publishers, Refresh learning, Campaign & Audience Data, Qualify Campaign, Data Partners, Exchange Partners)
    5. Real Time Auction (figure: dollar amounts between $0.00 and $2.78 shown against the factors Site/Page, Geo/Weather, Time of Day, Brand Affinity, User)
    6. Real Time Auction (figure: three campaign goals, Leads & sales, Coupon downloads and Brand awareness, each with an Impression Scorecard over Demo, Brand Affinity, Time of Day, Geo/Weather, Site/Page, Ad Position, In-Market Behavior and Response; responses shown: +9.7%, +0.7%, +1.4%)
    7. Scalable Predictive Models
        • Feature grid (figure): each cell is marked Positive Lift, Marginal Impact or Negative Lift, with values from -23 to +28
        • Features: Age/Gender, Occupation, Income, Ethnicity, Purchase Intent, Online Purchases, Offline Purchases, Browsing Behavior, Site Actions, Zip Code, City/DMA, Search Sites, Search Categories, Recency, Search Keywords, Web Site/Page, Referral URL, Site Category, Bizographics, Social Interests, Lifestyle
        • Automated Feature Selection
            • Infinite number of models
            • Determine perfect model size
            • Balance past data fit and future generalization
        • Learn-Test-Refine
            • Automatically learn from each response
            • Cross-validate: A/B testing infrastructure
            • Training pipeline
    8. Throughput (requests per day)
        • Facebook likes: 5 B
        • Searches on Google: 6 B
        • Events processed by Rocket Fuel: 50 B
    9. Rocket Fuel Scale
        • 34,474 CPU processor cores
        • 2655 servers
        • 187.4 teraflops of computing
        • 188 terabytes of memory
            • 13X the memory of Jeopardy-winning IBM Watson
        • 42 petabytes of storage
            • 106X the data volume of the entire Library of Congress
    10. Data Warehouse Growth: 200 servers to 1400 servers; 5 PB to 41 PB; 8x
    11. Behavioral Targeting
    12. Behavioral Targeting
        • Leverage online activities on the web to learn about a user's
            • Long Term Interests: user is interested in luxury cars
            • Short Term Interests: user is looking for a pizza right now
        • Expand user set beyond retargeting
            • Explore vs. Exploit
            • Identify relevant users even if they have never been targeted previously
    13. Behavioral Targeting @ Rocket Fuel (pipeline diagram labels: Label Data, Train Model, Back Test, Calibrate, Training Events, Pixel Stream, Ad Logs, BT Features (HBase), Feature Generation, Score Profiles, Profile Generation, Scoring, Ad Serving Data Centers, Model)
    14. Hadoop/HBase @ Rocket Fuel
        • Cluster Highlights
            • 650+ slaves (64 GB + 12 × 3 TB)
            • 20 PB storage
            • HA Name Node setup
            • 9k map slots + 5.5k reduce slots
            • Co-located to run HBase for offline processing
            • HBase 0.94.15
            • 5-node ZooKeeper quorum
            • Monitoring with OpenTSDB
            • Dual master setup
    15. Behavioral Targeting @ Rocket Fuel
        • Read events of last N days
        • Event timeline (30 minutes): bmw.com 11:23, Cars 11:23; pizzahut.com 11:26, Food 11:26; honda.com 11:27, Cars 11:27
        • Behavioral Targeting Profile features: Recency, Frequency, Others..
        • Profile entry honda.com (11:27): Recent 6 hours: 5; Between 6 and 12 hours: 3; Between 12 hours and …
        • Profile entry Food (11:26): Recent 6 hours: 2; Between 6 and 12 hours: 7; Between 12 hours and …
    16. HBase Data Model
        • row_key: user_id
        • Single column family "u"
        • Column qualifier: <date><hour>:<type>:<value>
        • Cell value: [Protobuf] most recent timestamp, event details relative to timestamp
        • Example row ABCD06EFG: 2014060416:site:bmw.com (event details relative to 11:23); 2014060416:category:food (event details relative to 11:26)
        • Efficient look up for a given user
        • Access range of events by event date, hour and type
    17. (no text)
    18. Key Challenges: User Profile Freshness; Scaling Issues; Pipeline Failures
    19. User Profile Freshness
        • Strict latency requirements
        • Recent activity is a much better predictor
        • Solutions
            • Staggered Pipelines
            • Real Time Behavioral Targeting
    20. Staggered Pipelines (diagram: Source Data and five staggered Extract, Score, Filter, Upload pipelines)
    21. Real Time Behavioral Targeting
    22. Real Time Behavioral Targeting
        • Batched Profile: refreshed every N hours
        • Blackbird: HBase instance tuned for 2 ms latencies
        • Diagram labels: Offline BT Pipeline, BT Profile, Ad Servers, Merge Profiles, Logs, Blackbird, Online Profile, Record events for users in real time, Request, Response
    23. Batched Updates vs. Real Time Updates
        • Event Granularity. Batched Profile: aggregated over several hours/days. Real Time Profile: raw recorded events appended for recent N hours.
        • Processing Load. Batched Profile: requires minimal CPU processing. Real Time Profile: needs aggregation on-the-fly.
        • Disk Footprint. Batched Profile: compact representation captures several days. Real Time Profile: strict limits to ensure read times are acceptable.
        • Coverage. Batched Profile: all interactions. Real Time Profile: only interactions at a data center.
        • Real Time Profile updated in milliseconds
        • Batched Profile refreshed every N hours
    24. Scaling Issues
        • 3X growth in events processed/year
            • First Party Data
            • App Interactions
            • Geo-location Data
            • …
        • Case Studies
            • HBase Region Hot-spotting
            • Network Bandwidth Troubles
    25. HBase Region Hot Spotting
    26. HBase Region Hot-spotting (diagram: several HBase regions, one under high write load)
        • High write load
        • Region split (painful!)
        • Some users more active than others
        • No control over the user IDs generated
        • Still problematic
        • Non-uniform distribution!
    27. HBase Region Hot-spotting
        • Uneven write-load distribution
            • Non-uniform row key distribution
        • Salt row keys to ensure uniform distribution
            • Fixed-length hashed prefix
            • Murmur hash based prefix + original user ID
        • Uniform pre-splits
    28. HBase Region Hot-spotting
        • Don't stop at salting
        • Map input splits configured for region boundaries
        • Diagram: a Key Partitioner turns 'k' input splits into 'm' splits for 'm' regions
        • Sample user IDs: 1234557, 1234568, 1234579, 1234583, 1234594, .., ZZAHT654, ZZZGT934, ZZZZNGA2, ZZZZKLO1
        • Region boundaries: Region 1 x03x85x1ExB8ZZZZZZ; Region 2 x07x5CxF5xC2928ZZ; Region m xFFxAEx14xE1Z28ZZ
        • Salted keys, split 1: x01x85x1ExB811ZKL1, x01x86x1ExB8129542, .., x03x85x1ExB8ZZZKL1
        • Salted keys, split 2: x05x35x9Ex18087KL1, x06x86x1ExB8AHV24, .., x07x5CxF5xC16534Z
        • Salted keys, split m: xEBx27x92x1508RKL1, xFEx86x1ExB8AHV24, .., xFFxAEx14x126534Z
    29. HBase Key Partitioner
        • As many splits as regions to maximize parallelism
        • Key Partitioner (MR):
            • Reads region boundaries of the HBase table
            • Salts and sorts row keys accordingly
            • Multiple Output Format to optimize the reduce phase
        • Each generated split file corresponds to a single region
        • Drastically reduces read latencies
    30. Network Bandwidth Troubles
    31. Data Center Expansion
    32. Network Bandwidth Constraints
        • Consistently overshot bandwidth limit during uploads
        • All sorts of delays (Redis, MySQL, Blackbird…)
        • Bidding hampered
    33. Solutions
        • Intelligent storage: protobufs everywhere
        • Throttle writes
        • Geo-splitting
    34. Geo Splitting
    35. Geo-splitting
        • Tag user's location history & predict future data center visits
            • f(dc, geo_history, bt_profile)
        • A separate workflow periodically generates geo-split rules:
            • Clusters users & analyzes migration patterns
            • Ensures maximal look-up coverage of profiles
            • Minimizes total number of profiles stored
        • Ensures efficient use of resources, with minimal impact on performance
    36. Geo-splitting (pipeline diagram labels: Label Data, Train Model, Back Test, Calibrate, Training Events, Pixel Stream, Ad Logs, BT Features (HBase), Feature Generation, Score Profiles, Profile Generation, Scoring, Ad Serving Data Centers, Model, Cluster Users, Analyze Patterns, Generate Rules, Geo-split)
    37. (no text)
    38. Quick Recovery From Failures
        • Break pipeline into short payloads
        • Fail fast, recover fast!
        • Actionable alerts, cut down noise
    39. Quick Recovery From Failures
        • Materialize data as frequently as possible
        • Cross-system fault tolerance
        • Idempotency
        • Backfill at EOD to plug holes if needed
    40. Shout-outs!
    41. Shout-outs!
    42. Shout-outs!
    43. Shout-outs!
    44. We Are Hiring!
    45. Questions? Thank You! Sivasankaran Chandrasekar (chandra@rocketfuel.com), Savin Goyal (savin@rocketfuel.com)
    46. We are hiring! (as always) http://rocketfuel.com/careers savin@rocketfuel.com chandra@rocketfuel.com
    47. (no text)
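
Slide 16 describes one HBase row per user, with column qualifiers of the form <date><hour>:<type>:<value>. The sketch below only illustrates how such qualifiers could be built and how a date/hour/type prefix selects a range of events. The function names, the in-memory dict standing in for a row, and the placeholder cell values are assumptions for illustration; no HBase client is involved and this is not Rocket Fuel's code.

```python
from datetime import datetime
from typing import Dict, Optional


def column_qualifier(ts: datetime, event_type: str, value: str) -> bytes:
    """Build a qualifier shaped like <date><hour>:<type>:<value>."""
    return f"{ts:%Y%m%d%H}:{event_type}:{value}".encode("utf-8")


def qualifier_prefix(ts: datetime, event_type: Optional[str] = None) -> bytes:
    """Prefix matching every event in one hour, optionally of one type."""
    prefix = f"{ts:%Y%m%d%H}:"
    if event_type is not None:
        prefix += f"{event_type}:"
    return prefix.encode("utf-8")


def select_columns(row: Dict[bytes, bytes], prefix: bytes) -> Dict[bytes, bytes]:
    """Return the cells of one row whose qualifier starts with the prefix.

    `row` is a plain dict standing in for a single HBase row (hypothetical).
    """
    return {q: v for q, v in sorted(row.items()) if q.startswith(prefix)}


hour = datetime(2014, 6, 4, 16, 23)
row = {
    column_qualifier(hour, "site", "bmw.com"): b"<protobuf bytes>",
    column_qualifier(hour, "category", "food"): b"<protobuf bytes>",
    column_qualifier(datetime(2014, 6, 4, 17, 1), "site", "honda.com"): b"<protobuf bytes>",
}
same_hour = select_columns(row, qualifier_prefix(hour))
sites_only = select_columns(row, qualifier_prefix(hour, "site"))
```

Because the date and hour lead the qualifier, qualifiers sort chronologically within a row, which is what makes the "access range of events by event date, hour and type" bullet workable with a simple prefix match.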

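
Slides 15, 22 and 23 describe a batched profile of recency-bucketed counts that is merged at serving time with raw events recorded in real time. A minimal sketch of that merge follows, under stated assumptions: the bucket names, the `Event` type and the `merge_profiles` function are hypothetical, the last bucket's upper bound is elided on the slide and is treated here as open-ended, and the real-time store is assumed to hold only events newer than the last batch refresh so nothing is double counted.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    feature: str       # e.g. "site:honda.com" or "category:Food"
    timestamp: float   # seconds since the epoch


HOUR = 3600
# Upper bound (exclusive) of each recency bucket, in seconds of age.
BUCKETS = ((6 * HOUR, "recent_6h"), (12 * HOUR, "6h_to_12h"))


def bucket_for(age_seconds: float) -> str:
    """Map an event's age to a recency bucket name."""
    for upper, name in BUCKETS:
        if age_seconds < upper:
            return name
    return "older_than_12h"  # assumption: open-ended last bucket


def aggregate(events: Iterable[Event], now: float) -> Counter:
    """Aggregate raw events on the fly into (feature, bucket) counts."""
    counts: Counter = Counter()
    for event in events:
        counts[(event.feature, bucket_for(now - event.timestamp))] += 1
    return counts


def merge_profiles(batched: Counter, realtime: Iterable[Event], now: float) -> Counter:
    """Add freshly aggregated real-time counts to the batched counts."""
    return batched + aggregate(realtime, now)


now = 1_000_000.0
batched = Counter({("site:honda.com", "recent_6h"): 3})
realtime = [
    Event("site:honda.com", now - 1 * HOUR),
    Event("site:honda.com", now - 2 * HOUR),
    Event("category:Food", now - 7 * HOUR),
]
merged = merge_profiles(batched, realtime, now)
```

The sketch ignores that batched buckets age between refreshes; a real merge would have to account for that drift.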
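
Slides 27 to 29 describe salting row keys with a fixed-length hashed prefix, pre-splitting the table uniformly, and partitioning keys by region boundary. The sketch below illustrates the idea only. The slides name a Murmur hash; to stay within the standard library this example substitutes the first bytes of an MD5 digest, which is an assumption and not what the slides describe. The 4-byte prefix width and all function names are likewise hypothetical.

```python
import bisect
import hashlib
from typing import List

PREFIX_BYTES = 4  # assumed width; the slide only says "fixed length"


def salt_row_key(user_id: str) -> bytes:
    """Prepend a fixed-length hash of the user ID to the original ID."""
    raw = user_id.encode("utf-8")
    # Stand-in hash: MD5 from the standard library, not Murmur.
    prefix = hashlib.md5(raw).digest()[:PREFIX_BYTES]
    return prefix + raw


def uniform_split_points(num_regions: int) -> List[bytes]:
    """Split points dividing the prefix space into equal-sized ranges."""
    space = 1 << (8 * PREFIX_BYTES)
    return [
        (i * space // num_regions).to_bytes(PREFIX_BYTES, "big")
        for i in range(1, num_regions)
    ]


def region_for(row_key: bytes, split_points: List[bytes]) -> int:
    """Index of the region whose [start, end) key range holds row_key."""
    return bisect.bisect_right(split_points, row_key[:PREFIX_BYTES])


splits = uniform_split_points(4)
load = [0, 0, 0, 0]
for n in range(10_000):
    # Sequential IDs would all land in one region without salting.
    load[region_for(salt_row_key(str(1_234_557 + n)), splits)] += 1
```

Grouping keys by `region_for` before writing mirrors the "each generated split file corresponds to a single region" bullet: every writer then touches exactly one region.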