  • When an advertiser works with Rocket Fuel, it immediately has access to 145 RTB advertising supply partners, 21M sites, 20B ad-serving opportunities, and 3B users on 92,000 devices.
  • Real Time Auction
    Selecting the right ad for each auction
  • Automatically learning from every response & getting better
    Nobody else is doing this as quickly, precisely, and consistently for our customers
  • Stock career images (2), probably ask recruiting
  • Mention that it runs round the clock, handles upwards of 100 TB per day, stages vary in frequency, dependencies vary in frequency, need to play catch up, bugs
  • Mention that we went from
  • Give props to the open source stack.

Transcript

  • 1. Proprietary & Confidential. Copyright © 2014. Behavioral Targeting @ Scale - How did we know that this ad was relevant for you? Savin Goyal, Sivasankaran Chandrasekar
  • 2. Advertiser / Rocket Fuel: 200+ RTB advertising supply partners, 50M+ websites, 50B+ daily impressions, 3B worldwide consumers, 100,000+ devices
  • 3. Display Advertising Ecosystem
    [Diagram labels: Exchanges, Ad Exchange, Rocket Fuel Platform, Auto Optimization, Real-Time Bidding, Agencies, Data Partners]
  • 4. Programmatic Buying
    [Ten-step flow diagram. Labels: Web Browser, Page Request, Publishers, Ad Request, Exchange Partners, Bid Request, User Data, Data Partners, Rocket Fuel Platform (Smart Ad Servers, Response Prediction Models), Campaign & Audience Data, Qualify Campaign, Calculate Propensity Score, Bid on Ad, Rocket Fuel Winning Ad, Ad Served to User, User Engages with Ad, User Engagement Recorded, Refresh Learning]
  • 5. Real Time Auction
    [Figure: grids of candidate bid prices (for example $1.25, $2.11, $1.26, $2.78); factor labels: User, Brand Affinity, Time of Day, Geo/Weather, Site/Page]
  • 6. Real Time Auction
    [Figure: three Impression Scorecards, one per campaign goal: leads & sales, coupon downloads, brand awareness. Scorecard rows: Demo, Brand Affinity, Time of Day, Geo/Weather, Site/Page, Ad Position, In-Market Behavior, Response. Response values shown: +9.7%, +0.7%, +1.4%]
  • 7. Scalable Predictive Models
    [Figure: grid of per-feature lift values; legend: Positive Lift, Marginal Impact, Negative Lift]
      • Features: Age/Gender, Occupation, Income, Ethnicity, Purchase Intent, Online Purchases, Offline Purchases, Browsing Behavior, Site Actions, Zip Code, City/DMA, Search Sites, Search Categories, Recency, Search Keywords, Web Site/Page, Referral URL, Site Category, Bizographics, Social Interests, Lifestyle
      • Automated Feature Selection
          • Infinite number of models
          • Determine perfect model size
          • Balance past data fit and future generalization
      • Learn-Test-Refine
          • Automatically learn from each response
          • Cross-validate: A/B testing infrastructure
          • Training pipeline
  • 8. Throughput (requests per day)
      • Facebook likes: 5B
      • Searches on Google: 6B
      • Events processed by Rocket Fuel: 50B
  • 9. Rocket Fuel Scale
      • 34,474 CPU processor cores
      • 2,655 servers
      • 187.4 teraflops of computing
      • 188 terabytes of memory
          • 13x the memory of Jeopardy-winning IBM Watson
      • 42 petabytes of storage
          • 106x the data volume of the entire Library of Congress
  • 10. Data Warehouse Growth (8x): from 200 servers to 1,400 servers, and from 5 PB to 41 PB
  • 11. Behavioral Targeting
  • 12. Behavioral Targeting
      • Leverage online activities on the web to learn about a user's:
          • Long-term interests (e.g., the user is interested in luxury cars)
          • Short-term interests (e.g., the user is looking for a pizza right now)
      • Expand the user set beyond retargeting
          • Explore vs. exploit
          • Identify relevant users even if they have never been targeted previously
  • 13. Behavioral Targeting @ Rocket Fuel
    [Pipeline diagram]
      • Training: Label Data, Train Model, Back Test, Calibrate
      • Feature Generation: Events, Pixel Stream, Ad Logs, BT Features (HBase)
      • Scoring: Score Profiles, Profile Generation
      • Ad Serving Data Centers
      • Model
  • 14. Hadoop/HBase @ Rocket Fuel
      • Cluster highlights
          • 650+ slaves (64 GB + 12 × 3 TB)
          • 20 PB storage
          • HA NameNode setup
          • 9k map slots + 5.5k reduce slots
          • Co-located to run HBase for offline processing
      • HBase 0.94.15
          • 5-node ZooKeeper quorum
          • Monitoring with OpenTSDB
          • Dual-master setup
  • 15. Behavioral Targeting @ Rocket Fuel
      • Read events of the last N days
          • bmw.com at 11:23 (category: Cars)
          • pizzahut.com at 11:26 (category: Food)
          • honda.com at 11:27 (category: Cars)
      • Behavioral Targeting Profile: recency, frequency, others...
          • honda.com, last seen 11:27. Recent 6 hours: 5; between 6 and 12 hours: 3; between 12 hours and …
          • Food, last seen 11:26. Recent 6 hours: 2; between 6 and 12 hours: 7; between 12 hours and …
    [Timeline labels: 11:23, 11:26, 11:27; 30 minutes]
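The recency and frequency counts on this slide can be sketched as a small aggregation over (feature, timestamp) events. This is only an illustration: the bucket names, the `build_profile` function, and the catch-all third bucket are assumptions, since the slide elides everything past the 12-hour boundary.

```python
from collections import Counter
from datetime import timedelta

# Bucket edges in hours. The slide names "recent 6 hours" and "between 6 and
# 12 hours"; the final catch-all bucket is an assumption for this sketch.
BUCKET_EDGES_HOURS = (6, 12)


def bucket_label(age: timedelta) -> str:
    """Map the age of an event to a (hypothetical) recency bucket name."""
    hours = age.total_seconds() / 3600
    if hours < BUCKET_EDGES_HOURS[0]:
        return "recent_6h"
    if hours < BUCKET_EDGES_HOURS[1]:
        return "6h_to_12h"
    return "older_than_12h"


def build_profile(events, now):
    """Aggregate (feature, timestamp) events into per-feature recency and frequency.

    Returns {feature: {"last_seen": datetime, "counts": Counter of bucket labels}}.
    """
    profile = {}
    for feature, ts in events:
        entry = profile.setdefault(feature, {"last_seen": ts, "counts": Counter()})
        entry["last_seen"] = max(entry["last_seen"], ts)  # recency
        entry["counts"][bucket_label(now - ts)] += 1      # frequency per bucket
    return profile
```

Each feature keeps one "last seen" timestamp plus a handful of counters, which is what keeps a profile covering several days compact.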
  • 16. HBase Data Model
      • row_key: user_id (e.g., ABCD06EFG)
      • Single column family "u"
      • Column qualifier: <date><hour>:<type>:<value> (e.g., 2014060416:site:bmw.com, 2014060416:category:food)
      • Cell value: [Protobuf] most recent timestamp, plus event details relative to that timestamp (e.g., relative to 11:23 or to 11:26)
      • Efficient lookup for a given user
      • Access a range of events by event date, hour, and type
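A minimal sketch of the qualifier layout described on this slide, using plain strings rather than an HBase client. The function names are hypothetical; only the `<date><hour>:<type>:<value>` layout and the column family "u" come from the slide.

```python
from datetime import datetime

COLUMN_FAMILY = "u"  # single column family, as on the slide


def column_qualifier(ts: datetime, event_type: str, value: str) -> str:
    """Build a <date><hour>:<type>:<value> qualifier."""
    return f"{ts:%Y%m%d%H}:{event_type}:{value}"


def hour_type_prefix(ts: datetime, event_type: str) -> str:
    """Prefix shared by every qualifier of one type within one hour."""
    return f"{ts:%Y%m%d%H}:{event_type}:"


def parse_qualifier(qualifier: str):
    """Split a qualifier back into (date_hour, type, value).

    maxsplit=2 keeps any ':' inside the value intact.
    """
    date_hour, event_type, value = qualifier.split(":", 2)
    return date_hour, event_type, value
```

Because the date and hour lead the qualifier, qualifiers within a row sort chronologically, so a prefix or range filter on the qualifier can select events by date, hour, and type.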
  • 18. Key Challenges
      • User profile freshness
      • Scaling issues
      • Pipeline failures
  • 19. User Profile Freshness
      • Strict latency requirements
      • Recent activity is a much better predictor
      • Solutions:
          • Staggered pipelines
          • Real-time behavioral targeting
  • 20. Staggered Pipelines
    [Diagram: Source Data feeds five staggered runs, each with the stages Extract, Score, Filter, Upload]
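One way to read the staggered-pipeline diagram: if a single Extract, Score, Filter, Upload run takes D hours, launching k runs D/k hours apart delivers a fresh upload every D/k hours instead of every D hours. The sketch below works through that arithmetic. The even spacing, the function names, and the durations used are assumptions; the slide gives no numbers.

```python
import math


def staggered_schedule(run_duration_h, num_pipelines, horizon_h):
    """Return (start_hour, upload_hour) pairs for runs launched within the horizon.

    Assumes runs are launched evenly, run_duration_h / num_pipelines apart.
    """
    interval = run_duration_h / num_pipelines  # gap between consecutive launches
    num_runs = math.ceil(horizon_h / interval)
    return [(i * interval, i * interval + run_duration_h) for i in range(num_runs)]


def refresh_interval(schedule):
    """Largest gap between consecutive uploads, i.e. how often profiles refresh."""
    uploads = [upload for _, upload in schedule]
    return max(later - earlier for earlier, later in zip(uploads, uploads[1:]))
```

With a hypothetical 5-hour run, five staggered pipelines upload once an hour, while a single back-to-back pipeline uploads once every five hours.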
  • 21. Real Time Behavioral Targeting
  • 22. Real Time Behavioral Targeting
    [Architecture diagram]
      • Batched profile: Logs, Offline BT Pipeline, BT Profile; refreshed every N hours
      • Online profile: Blackbird, an HBase instance tuned for 2 ms latencies; records events for users in real time
      • Ad servers: send a request to Blackbird, receive the response, and merge the profiles
  • 23. Batched Updates vs. Real Time Updates
      • Event granularity
          • Batched profile: aggregated over several hours/days
          • Real-time profile: raw recorded events appended for the recent N hours
      • Processing load
          • Batched profile: requires minimal CPU processing
          • Real-time profile: needs aggregation on the fly
      • Disk footprint
          • Batched profile: compact representation captures several days
          • Real-time profile: strict limits to ensure read times are acceptable
      • Coverage
          • Batched profile: all interactions
          • Real-time profile: only interactions at a data center
      • Real-time profile updated in milliseconds
      • Batched profile refreshed every N hours
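A minimal sketch of what merging the two profiles at serving time could look like: the batched profile arrives pre-aggregated, while the real-time profile holds raw events that must be aggregated on the fly. The function names, the count-based profile shape, and the cutoff rule that avoids double counting are all assumptions, not details from the slides.

```python
from collections import Counter


def aggregate_raw_events(raw_events, now_ts, window_seconds):
    """Aggregate raw (feature, unix_ts) events on the fly, skipping expired ones."""
    counts = Counter()
    for feature, ts in raw_events:
        if 0 <= now_ts - ts <= window_seconds:
            counts[feature] += 1
    return counts


def merge_profiles(batched_counts, raw_events, batch_cutoff_ts, now_ts, window_seconds):
    """Combine a batched profile with real-time events recorded after the batch cutoff.

    Assumption: events at or before batch_cutoff_ts are already reflected in
    batched_counts, so they are skipped here to avoid counting them twice.
    """
    fresh = [(feature, ts) for feature, ts in raw_events if ts > batch_cutoff_ts]
    merged = Counter(batched_counts)
    merged.update(aggregate_raw_events(fresh, now_ts, window_seconds))
    return merged
```

Keeping the real-time side limited to a short window bounds the work done per ad request, which matches the slide's point about strict limits on the real-time profile.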
  • 24. Scaling Issues
      • 3x growth in events processed per year
          • First-party data
          • App interactions
          • Geo-location data
          • …
      • Case studies
          • HBase region hot-spotting
          • Network bandwidth troubles
  • 25. HBase Region Hot Spotting
  • 26. HBase Region Hot-spotting
    [Diagram: HBase regions before and after a split]
      • High write load on an HBase region
      • Region split (painful!)
      • Still problematic: non-uniform distribution!
      • Some users are more active than others
      • No control over the user IDs generated
  • 27. HBase Region Hot-spotting
      • Uneven write-load distribution
          • Non-uniform row key distribution
      • Salt row keys to ensure uniform distribution
          • Fixed-length hashed prefix
          • Key layout: Murmur hash based prefix, followed by the original user ID
      • Uniform pre-splits
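The salting scheme can be sketched as follows. Note the substitution: the slide names a Murmur hash, but this sketch uses the first bytes of SHA-256 from the Python standard library as a stand-in, since only the fixed-length, uniformly distributed prefix matters for the illustration. The 4-byte prefix length and all function names are assumptions.

```python
import hashlib
from typing import List

SALT_BYTES = 4  # assumed fixed prefix length


def salted_row_key(user_id: str) -> bytes:
    """Prefix the user ID with a fixed-length hash of itself.

    SHA-256 is a stand-in here; the slide uses a Murmur hash based prefix.
    """
    raw = user_id.encode("utf-8")
    prefix = hashlib.sha256(raw).digest()[:SALT_BYTES]
    return prefix + raw


def original_user_id(row_key: bytes) -> str:
    """Recover the user ID by dropping the fixed-length prefix."""
    return row_key[SALT_BYTES:].decode("utf-8")


def uniform_presplits(num_regions: int) -> List[bytes]:
    """Split points that divide the prefix space into equal-sized regions."""
    space = 256 ** SALT_BYTES
    return [((i * space) // num_regions).to_bytes(SALT_BYTES, "big")
            for i in range(1, num_regions)]
```

Because the prefix is derived from the user ID itself, a point lookup can recompute the salted key from the ID alone, and because hashed prefixes are spread evenly, evenly spaced pre-splits should receive a roughly even share of users.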
  • 28. HBase Region Hot-spotting
      • Don't stop at salting
      • Map input splits configured for region boundaries
    [Diagram: the Key Partitioner turns 'k' splits of unsalted keys into 'm' splits of salted keys, one per region, for 'm' regions]
      • Unsalted input keys: 1234557, 1234568, 1234579, 1234583, 1234594, .., ZZAHT654, ZZZGT934, ZZZZNGA2, ZZZZKLO1
      • Region 1 (boundary \x03\x85\x1E\xB8ZZZZZZ): \x01\x85\x1E\xB811ZKL1, \x01\x86\x1E\xB8129542, .., \x03\x85\x1E\xB8ZZZKL1
      • Region 2 (boundary \x07\x5C\xF5\xC2928ZZ): \x05\x35\x9E\x18087KL1, \x06\x86\x1E\xB8AHV24, .., \x07\x5C\xF5\xC16534Z
      • Region m (boundary \xFF\xAE\x14\xE1Z28ZZ): \xEB\x27\x92\x1508RKL1, \xFE\x86\x1E\xB8AHV24, .., \xFF\xAE\x14\x126534Z
  • 29. HBase Key Partitioner
      • As many splits as regions to maximize parallelism
      • Key Partitioner (MR):
          • Reads the region boundaries of the HBase table
          • Salts and sorts row keys accordingly
          • Uses a multiple output format to optimize the reduce phase
          • Each generated split file corresponds to a single region
      • Drastically reduces read latencies
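The core of the key partitioner, assigning each salted key to the region that owns it and emitting one sorted split per region, can be sketched in plain Python. This stands in for the MapReduce job described on the slide; the function names are hypothetical, and the region boundaries are passed in rather than read from an HBase table.

```python
from bisect import bisect_right
from collections import defaultdict


def region_index(row_key: bytes, split_points) -> int:
    """Index of the region owning row_key.

    split_points is the sorted list of region start keys, excluding the first
    region's empty start key. A key equal to a split point belongs to the
    region that starts there, hence bisect_right.
    """
    return bisect_right(split_points, row_key)


def partition_by_region(row_keys, split_points):
    """Group keys into one sorted split per region: {region_index: [keys]}."""
    buckets = defaultdict(list)
    for key in row_keys:
        buckets[region_index(key, split_points)].append(key)
    return {idx: sorted(keys) for idx, keys in sorted(buckets.items())}
```

With one split per region, each downstream task reads from or writes to a single region, and the sorted order within a split matches the order in which that region stores its rows.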
  • 30. Network Bandwidth Troubles
  • 31. Data Center Expansion
  • 32. Network Bandwidth Constraints
      • Consistently overshot the bandwidth limit during uploads
      • All sorts of delays (Redis, MySQL, Blackbird…)
      • Bidding hampered
  • 33. Solutions
      • Intelligent storage: protobufs everywhere
      • Throttle writes
      • Geo-splitting
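The slides do not say how writes were throttled. One common approach is a token bucket, sketched below purely as an illustration: uploads spend byte tokens that refill at a fixed rate, so sustained throughput stays under a chosen limit. The class name, parameters, and the injectable clock are all assumptions.

```python
import time


class TokenBucket:
    """Token-bucket throttle (an assumed technique, not confirmed by the slides)."""

    def __init__(self, rate_bytes_per_s, capacity_bytes, clock=time.monotonic):
        self.rate = rate_bytes_per_s    # refill rate
        self.capacity = capacity_bytes  # maximum burst size
        self.clock = clock              # injectable for testing
        self.tokens = capacity_bytes
        self.last = clock()

    def _refill(self):
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_consume(self, nbytes):
        """Return True and spend tokens if nbytes may be written now."""
        self._refill()
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False
```

An uploader would call `try_consume(len(payload))` before each write and back off briefly whenever it returns False.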
  • 34. Geo Splitting
  • 35. Geo-splitting
      • Tag the user's location history and predict future data center visits
          • f(dc, geo_history, bt_profile)
      • A separate workflow periodically generates geo-split rules:
          • Clusters users and analyzes migration patterns
          • Ensures maximal look-up coverage of profiles
          • Minimizes the total number of profiles stored
      • Ensures efficient use of resources, with minimal impact on performance
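The slide gives only the signature f(dc, geo_history, bt_profile), not the rule itself. As a purely hypothetical stand-in, the sketch below uploads a profile to a data center only if that data center served at least a minimum share of the user's past requests. The threshold, the function names, and the data center names in the usage note are invented for illustration.

```python
from collections import Counter


def target_data_centers(geo_history, min_share=0.25):
    """Data centers that served at least min_share of the user's past requests.

    geo_history is a list of data center names, one per past request.
    """
    if not geo_history:
        return set()
    counts = Counter(geo_history)
    total = len(geo_history)
    return {dc for dc, n in counts.items() if n / total >= min_share}


def should_upload(dc, geo_history, bt_profile, min_share=0.25):
    """Hypothetical f(dc, geo_history, bt_profile): upload only non-empty
    profiles, and only to data centers the user is likely to visit again."""
    return bool(bt_profile) and dc in target_data_centers(geo_history, min_share)
```

For a user seen eight times at a hypothetical "us-east" and twice at "eu-west", this rule stores the profile only at "us-east", trading a little look-up coverage for fewer stored profiles and less upload traffic.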
  • 36. Geo-splitting
    [Pipeline diagram]
      • Training: Label Data, Train Model, Back Test, Calibrate
      • Feature Generation: Events, Pixel Stream, Ad Logs, BT Features (HBase)
      • Scoring: Score Profiles, Profile Generation
      • Ad Serving Data Centers
      • Model
      • Geo-split: Cluster Users, Analyze Patterns, Generate Rules
  • 38. Quick Recovery From Failures
      • Break the pipeline into short payloads
      • Fail fast, recover fast!
      • Actionable alerts; cut down noise
  • 39. Quick Recovery From Failures
      • Materialize data as frequently as possible
      • Cross-system fault tolerance
      • Idempotency
      • Backfill at EOD to plug holes if needed
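Idempotency and end-of-day backfill fit together naturally: if each stage writes its output under a partition key and overwrites on re-run, then retrying or backfilling can never duplicate data. The sketch below uses an in-memory dict as the store. The function names and the hourly partition keys in the usage note are assumptions.

```python
def run_stage(store, partition, compute):
    """Idempotent stage: re-running for a partition overwrites, never appends."""
    store[partition] = compute(partition)


def missing_partitions(store, expected):
    """Partitions that should exist but were never materialized."""
    return [p for p in expected if p not in store]


def backfill(store, expected, compute):
    """Plug holes by running the stage for every missing partition.

    Returns the list of partitions that were filled.
    """
    filled = missing_partitions(store, expected)
    for partition in filled:
        run_stage(store, partition, compute)
    return filled
```

If the run for hour "2014060401" failed during the day, an end-of-day `backfill` over the day's expected hours fills only that hole, and calling it a second time is a no-op.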
  • 40. Shout-outs!
  • 44. We Are Hiring!
  • 45. Questions? Thank you! Sivasankaran Chandrasekar (chandra@rocketfuel.com), Savin Goyal (savin@rocketfuel.com)
  • 46. We are hiring! (as always) http://rocketfuel.com/careers, savin@rocketfuel.com, chandra@rocketfuel.com