5. Ad Cloud Architecture
[Architecture diagram. Labeled components: Ad Exchanges; Bidders + Ad Servers; User Data; Optimization Models; Stats Platform; User Data Platform; Browsers, Mobile Apps, Connected TVs, Social platforms; DMP; Ad Delivery Data; Machine Learning Platform; Ad opportunities; Capping-Pacing; Frequency Caps]
7. Stats Architecture
[Architecture diagram. Labeled components: Data Event Servers; Data event http; Data loader; Stream Processor; Social Event Service; Couchbase; Druid; Vertica; Stats event; Rollup event; S3; Partner Report Service; Client Report Service; Billing Ingestion Service; UI; NetSuite; Machine Learning; Business Intelligence; Apps; Social APIs; MySQL; mysqlbinlog; Pixels; Clients, Partners; Qubole / EMR; RTB; Attribution Log Ingestor; Attribution Service; Attainment Service / Real Time Stats API]
8. Scale
● 3.5 to 4 billion events processed per day at peak
● 2016 peak volume increased 2.5x over 2015
● Real-time stats in UI within 5 seconds after an event is received
● Data to data warehouse within 10 minutes
● 40+ event types
● 18 Kafka brokers handling 30 topics
● Produce ~3 TB of data per day, consume ~23 TB of data per day
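To put the figures above in per-second terms, here is a rough back-of-the-envelope calculation. It only rearranges the numbers on the slide. It assumes 1 TB = 10**12 bytes, that load is spread evenly over the day, and that all produced bytes are event payloads, none of which the slide states.

```python
SECONDS_PER_DAY = 24 * 60 * 60

events_low, events_high = 3.5e9, 4.0e9  # events per day at peak (from the slide)
produced_tb, consumed_tb = 3, 23        # TB per day (from the slide)

# Average rate if the daily peak volume were spread evenly (an assumption).
avg_rate_low = events_low / SECONDS_PER_DAY
avg_rate_high = events_high / SECONDS_PER_DAY

# How many times each produced byte is read by consumers, on average.
fan_out = consumed_tb / produced_tb

# Assumes 1 TB = 10**12 bytes and that all produced bytes are event payloads.
bytes_per_event = produced_tb * 10**12 / events_high

print(f"{avg_rate_low:,.0f} to {avg_rate_high:,.0f} events/s on average")
print(f"each produced byte is read about {fan_out:.1f} times")
print(f"about {bytes_per_event:.0f} bytes per event")
```

Real traffic is bursty, so instantaneous peaks are likely higher than these averages.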
9. Context
● Started at kafka-0.7.2
● Immediate need for Exactly-Once semantics
● Kafka Streams is a distant future...
28. Costs of Application Side Deduplication
● Consumer state is larger - includes mapping producerId ⟼ producerOffset
● Adds complexity to consumer code
○ Consumers must participate in deduplication
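A minimal sketch of what this consumer-side state might look like, assuming each message carries a producer id and a per-producer, strictly increasing offset, and that messages from one producer arrive in order within a partition. The names `Message`, `DedupConsumer`, and `handle` are hypothetical and are not taken from the talk or from any Kafka API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Message:
    producer_id: str      # hypothetical: unique id of the producing process
    producer_offset: int  # hypothetical: per-producer, strictly increasing sequence
    payload: str


@dataclass
class DedupConsumer:
    # The extra consumer state: producerId -> last producerOffset processed.
    last_seen: Dict[str, int] = field(default_factory=dict)
    processed: List[str] = field(default_factory=list)

    def handle(self, msg: Message) -> bool:
        """Process msg unless it is a duplicate; return True if it was processed."""
        if msg.producer_offset <= self.last_seen.get(msg.producer_id, -1):
            return False  # already seen (for example, a retried send): drop it
        self.processed.append(msg.payload)
        self.last_seen[msg.producer_id] = msg.producer_offset
        return True


consumer = DedupConsumer()
stream = [
    Message("p1", 0, "a"),
    Message("p1", 1, "b"),
    Message("p1", 1, "b"),  # duplicate caused by a retried send
    Message("p2", 0, "c"),
    Message("p1", 2, "d"),
]
results = [consumer.handle(m) for m in stream]
```

The map grows with the number of distinct producers, which is the state cost noted above, and every consumer has to run this check, which is the complexity cost.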
29. Benefits of Application Side Deduplication
● Eliminates need for atomic offset storage
○ Kafka’s solution introduces atomic cross-topic transactions to deal with this
● Allows ordered reprocessing of partition data
● Efficient partition recovery
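One way to read the first and second bullets: if the producerId ⟼ producerOffset map is saved together with the consumer's results, a restart from a stale Kafka offset re-reads some records but does not apply them twice. The sketch below illustrates that reading only. The record layout, the `consume` function, and the running total are hypothetical and are not taken from the talk.

```python
from typing import Dict, List, Tuple

# Each record: (producer_id, producer_offset, amount). Names are hypothetical.
Record = Tuple[str, int, int]


def consume(partition: List[Record], start: int,
            last_seen: Dict[str, int], total: int) -> Tuple[Dict[str, int], int]:
    """Read the partition in order from index `start`, skipping records already applied."""
    last_seen = dict(last_seen)  # work on a copy; the caller keeps its snapshot
    for producer_id, producer_offset, amount in partition[start:]:
        if producer_offset <= last_seen.get(producer_id, -1):
            continue  # already reflected in `total`
        total += amount
        last_seen[producer_id] = producer_offset
    return last_seen, total


partition = [("p1", 0, 5), ("p2", 0, 7), ("p1", 1, 1), ("p2", 1, 2)]

# First run: three records are processed and the state is snapshotted,
# but the last committed Kafka offset is stale (still 1).
state, total = consume(partition[:3], 0, {}, 0)
stale_offset = 1

# Recovery: restart from the stale offset with the snapshotted state.
recovered_state, recovered_total = consume(partition, stale_offset, state, total)

# Reference: one clean pass over the whole partition.
_, clean_total = consume(partition, 0, {}, 0)
```

Under these assumptions the recovered total matches the clean pass, so the Kafka offset only needs to be "old enough" and does not have to be stored atomically with the results.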