Sifting, Sorting and Scanning for Gold – The Real-Time User Data Problem



Brian Bulkowski, founder and CEO of Citrusleaf, will focus on the role of innovative NoSQL databases in advertising and the problem of real-time user data. The talk will feature case studies from online and mobile advertising companies that face data challenges of hundreds of terabytes and require millisecond response times. He will discuss the data problems in real-time bidding applications and present some ideas for moving beyond the last-click attribution model. He will illustrate why scalability and speed are so critical and how new technology approaches are driving innovation in digital advertising.

Published in: Technology, Business


  1. Citrusleaf: “Sifting, Sorting and Scanning for Gold – The Real-Time User Data Problem”
     Brian Bulkowski, CEO and Founder
     © 2011 Citrusleaf. All rights reserved.
  2. NoSQL in Digital Advertising
     • Display advertising – today's state of the art
     • Real-time data in digital advertising
     • Case study – User data store & cookie mapping
     • Market differentiation
     • Moving beyond the last click
  3. Display Advertising
     • Lie: Content determines the value of an impression
     • Lie:
       – The value of an individual user is constant
       – The decision funnel
     • Truth: The right ad, to the right user, at the right time
  4. Today's state of the art
     • Impression logs => user segments (warehouse)
     • Third-party data => user segments
     • Region segments
     • Real-time per-user factors
       – Frequency caps
       – Session management for cookie-less users
     • Mapping external partner IDs
     • Build optimization tables based on log analysis
     Simple math determines the highest-value ad
  5. Real-time data requirements
     • Scalable and flexible
     • 100% availability
     • Billion-object support
     • High performance with low hardware cost
     • Sophisticated data eviction policies
  6. Case Study (North America)
     Map store
       – Key: cookie string or partner ID
       – Value: internal user ID (8 bytes)
       – 1B ~ 1.5B objects
       – Write load: 1k ~ 2k per second
       – Read load: 10k ~ 50k per second
       – Configuration: DRAM backed by disk
       – Lowest latency (0.4 ms)
     User store
       – Key: internal user ID
       – Value: segment data, frequency caps, other optimization data
       – 500M ~ 800M objects
       – Write load: 10k ~ 20k per second
       – Read load: 20k ~ 100k per second
       – Configuration: DRAM backed by disk, 1T ~ 4T user storage
       – Low latency (0.8 ms)
  7. Market differentiation
     • Control over impressions
     • Quality and variety of inventory
     • Better insight generation & tracking
     • And: high-quality user understanding
     Trend: multiple user stores
  8. The next challenge
     Weakness: “last ad takes all”. Performance is judged externally by the last advertiser before a conversion.
     So… game the system to place strategically.
     New technology allows real-time understanding of user behavior.
     But… will advertisers stand for it?
  9. Indexed user behavior storage
     • Store 100B+ user behavior objects (1B per day, 90 days)
     • Index by user_id, giving sub-second access to the entire behavior chain
     • Optimized for rotational media
     • Time-based eviction
  10. Indexed user behavior storage
      • Solves the “needle in a haystack” problem
      • Allows immediate behavioral triggering
      • Removes ETL and Map/Reduce scans
      • Allows sophisticated attribution models (and advanced optimization and reporting)
      The right ad, to the right user, at the right time
  11. Citrusleaf: Making Web Scale Easy and Affordable
      Contact info: brian@citrusleaf.com
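Slide 4 says real-time per-user factors such as frequency caps feed a decision in which "simple math determines the highest-value ad". The deck does not give that math. The Python sketch below assumes a common cost-per-click formulation (expected value of an impression = bid per click × predicted click-through rate) and skips any ad the user has already seen up to its cap. The Ad fields and choose_ad are hypothetical names for illustration, not Citrusleaf's API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Ad:
    ad_id: str
    bid_cpc: float        # advertiser's bid per click (hypothetical field)
    predicted_ctr: float  # click-through rate from an optimization table (hypothetical)
    freq_cap: int         # max impressions of this ad per user

def expected_value(ad: Ad) -> float:
    # Expected revenue of one impression = price per click * probability of a click.
    return ad.bid_cpc * ad.predicted_ctr

def choose_ad(candidates: list[Ad], impressions_seen: dict[str, int]) -> Optional[Ad]:
    """Return the highest-value ad this user has not yet hit the cap on.

    impressions_seen maps ad_id -> impressions already served to this user,
    i.e. the per-user frequency-cap data that would be read from a user store.
    """
    eligible = [ad for ad in candidates
                if impressions_seen.get(ad.ad_id, 0) < ad.freq_cap]
    if not eligible:
        return None  # every candidate is capped for this user
    return max(eligible, key=expected_value)
```

In a live bidder the impressions_seen counts would be fetched per request; here a plain dict stands in for that read.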
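Slide 5 asks for "sophisticated data eviction policies" without naming one. As one hypothetical example, the sketch below combines two common rules: a record expires a fixed time after it was written (TTL), and when the store holds more than its capacity the least recently used record is dropped. EvictingStore, max_items and ttl_seconds are illustrative names; the clock is injectable so expiry can be tested deterministically.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional

class EvictingStore:
    """Key-value store with two (hypothetical) eviction rules:
    1. a record expires ttl_seconds after it was written;
    2. above max_items records, the least recently used one is evicted."""

    def __init__(self, max_items: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_items = max_items
        self.ttl = ttl_seconds
        self.clock = clock
        # key -> (write_time, value); iteration order is least to most recently used
        self._data: "OrderedDict[str, tuple[float, Any]]" = OrderedDict()

    def put(self, key: str, value: Any) -> None:
        self._data[key] = (self.clock(), value)
        self._data.move_to_end(key)  # newest write is most recently used
        self._evict()

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        written, value = entry
        if self.clock() - written > self.ttl:
            del self._data[key]  # lazily drop an expired record
            return None
        self._data.move_to_end(key)  # a read counts as a use
        return value

    def _evict(self) -> None:
        now = self.clock()
        # Rule 1: remove every expired record.
        for key in [k for k, (t, _) in self._data.items() if now - t > self.ttl]:
            del self._data[key]
        # Rule 2: trim least recently used records until within capacity.
        while len(self._data) > self.max_items:
            self._data.popitem(last=False)
```

The full expiry scan on every put is O(n) and only suits a sketch; a production store would more likely expire records in the background or from a time-ordered index.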
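Slide 6 describes two stores: a map store keyed by a cookie string or partner ID whose value is an 8-byte internal user ID, and a user store keyed by that internal ID that holds segment data and frequency caps. A minimal sketch of that two-read lookup follows, assuming plain Python dicts in place of the database and using struct to pack the ID into exactly 8 bytes. TwoStoreLookup, resolve and link are hypothetical names.

```python
import struct
from dataclasses import dataclass, field

@dataclass
class UserProfile:
    segments: set[str] = field(default_factory=set)          # segment data
    freq_caps: dict[str, int] = field(default_factory=dict)  # ad_id -> impressions served

class TwoStoreLookup:
    """Map store: cookie or partner ID -> 8-byte internal user ID.
    User store: internal user ID -> profile. Dicts stand in for the database."""

    def __init__(self) -> None:
        self.map_store: dict[str, bytes] = {}
        self.user_store: dict[int, UserProfile] = {}
        self._next_id = 1

    def resolve(self, external_key: str) -> int:
        """Read 1: map an external key to the internal ID, creating a user if unseen."""
        packed = self.map_store.get(external_key)
        if packed is None:
            user_id = self._next_id
            self._next_id += 1
            self.map_store[external_key] = struct.pack(">Q", user_id)  # unsigned 64-bit
            self.user_store[user_id] = UserProfile()
            return user_id
        return struct.unpack(">Q", packed)[0]

    def link(self, new_key: str, known_key: str) -> None:
        """Cookie mapping: point new_key (e.g. a partner ID) at known_key's user."""
        user_id = self.resolve(known_key)
        self.map_store[new_key] = struct.pack(">Q", user_id)

    def profile(self, external_key: str) -> UserProfile:
        """Read 2: fetch segment and frequency-cap data from the user store."""
        return self.user_store[self.resolve(external_key)]
```

A fixed 8-byte value helps keep the larger of the two stores compact: at 8 bytes each, the case study's upper figure of 1.5B map-store values comes to 12 GB before keys and per-record overhead.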
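Slide 9 describes a behavior store indexed by user_id with time-based eviction. At the stated 1B events per day over 90 days, the window holds about 1B × 90 = 90B events, the same order as the "100B+" on the slide. The sketch below keeps each user's events sorted by timestamp so the whole chain is a single lookup, and drops events older than the retention window when the chain is read. BehaviorStore and its methods are hypothetical; the deck says the real store is optimized for rotational media, which this in-memory sketch does not attempt to model.

```python
import bisect
from collections import defaultdict

DAY = 86_400  # seconds per day

class BehaviorStore:
    """Behavior events indexed by user_id, each chain sorted by timestamp,
    with events older than the retention window evicted on read."""

    def __init__(self, retention_days: int = 90) -> None:
        self.retention = retention_days * DAY
        self._chains: defaultdict[str, list[tuple[int, str]]] = defaultdict(list)

    def record(self, user_id: str, ts: int, event: str) -> None:
        # insort keeps the chain ordered even if events arrive out of order.
        bisect.insort(self._chains[user_id], (ts, event))

    def chain(self, user_id: str, now: int) -> list[tuple[int, str]]:
        """Return the user's full behavior chain inside the retention window."""
        self._evict(user_id, now)
        return list(self._chains.get(user_id, []))

    def _evict(self, user_id: str, now: int) -> None:
        events = self._chains.get(user_id)
        if not events:
            return
        cutoff = now - self.retention
        # (cutoff,) sorts before any (cutoff, event), so events at exactly cutoff are kept.
        keep_from = bisect.bisect_left(events, (cutoff,))
        del events[:keep_from]
```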
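Slides 8 and 10 contrast last-click attribution ("last ad takes all") with the more sophisticated models that a per-user behavior chain makes possible. The deck does not say which models it has in mind; the sketch below sets last-click beside two widely used alternatives, linear and time-decay, each returning the fraction of one conversion credited to each touchpoint. The function names and the 7-day half-life default are illustrative assumptions.

```python
def last_click(touchpoints: list[str]) -> dict[str, float]:
    """'Last ad takes all': the final touchpoint before conversion gets 100%."""
    return {touchpoints[-1]: 1.0} if touchpoints else {}

def linear(touchpoints: list[str]) -> dict[str, float]:
    """Every touchpoint in the behavior chain gets an equal share."""
    credit: dict[str, float] = {}
    if not touchpoints:
        return credit
    share = 1.0 / len(touchpoints)
    for tp in touchpoints:
        credit[tp] = credit.get(tp, 0.0) + share
    return credit

def time_decay(touchpoints: list[tuple[str, float]],
               half_life_days: float = 7.0) -> dict[str, float]:
    """Weight each (touchpoint, days_before_conversion) by 0.5 ** (age / half_life),
    then normalize so the credits sum to 1."""
    weights: dict[str, float] = {}
    for tp, age_days in touchpoints:
        weights[tp] = weights.get(tp, 0.0) + 0.5 ** (age_days / half_life_days)
    total = sum(weights.values())
    return {tp: w / total for tp, w in weights.items()} if total else {}
```

For a chain of ads A, B and C seen 14, 7 and 0 days before a conversion, last-click gives C all the credit, linear gives each a third, and time-decay with a 7-day half-life gives C 4/7, B 2/7 and A 1/7.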