Apache Kafka in Adobe Ad Cloud's Analytics Platform

Presented at the July 2017 Bay Area Apache Kafka meetup event hosted by Confluent

Published in: Software

  1. 1. Kafka in Adobe Ad Cloud's Analytics Platform Michael Schiff | Lead Software Engineer Vikram Patankar | Senior Engineering Manager
  2. 2. Agenda ● Adobe Ad Cloud ● Architecture ● Delivery Guarantees ● Duplicates ● Building exactly-once semantics
  3. 3. Ad-Tech Ecosystem ● RTB Ecosystem (diagram): Advertiser, DSP, Ad Exchange, SSP, Publisher ● Source: https://www.slideshare.net/stanislavmikhailiyk/what-is-real-time-bidding-dsp-ssp-dmp-atd-itd
  4. 4. Adobe Ad Cloud Platform Optimize Buy Measure Plan
  5. 5. Ad Cloud Architecture Ad Exchanges Bidders + Ad Servers User Data Optimization Models Stats Platform User Data Platform Browsers, Mobile Apps, Connected TVs, Social platforms DMP Ad Delivery Data Machine Learning Platform Ad opportunities Capping-Pacing Frequency Caps
  6. 6. Stats Platform Browsers, Mobile Apps, Connected TVs, Social platforms Ad Delivery Data Ad Cloud Architecture
  7. 7. Stats Architecture Data Event Servers Data event http Data loader Stream Processor Social Event Service Couchbase Druid Vertica Stats event Rollup event S3 Partner Report service Client Report Service Billing Ingestion Service UI UI Netsuite Machine Learning Business Intelligence Apps Social APIs Mysql mysqlbinlog Pixels Clients, Partners Clients Qubole / EMR RTB Attribution Log Ingestor Attribution Service Attribution Service Attainment Service / Real Time Stats API
  8. 8. Scale ● 3.5 to 4 billion events processed per day at peak ● 2016 peak volume increased 2.5x over 2015 ● Real-time stats in UI within 5 seconds after an event is received ● Data to data warehouse within 10 minutes ● 40+ event types ● 18 Kafka brokers handling 30 topics ● Produce ~3 TB of data per day, consume ~23 TB of data per day
  9. 9. Context ● Started at kafka-0.7.2 ● Immediate need for Exactly-Once semantics ● Kafka Streams is a distant future...
  10. 10. Delivery Guarantees ● At Most Once ● At Least Once ● Exactly Once
  11. 11. “Exactly Once Delivery” is a Lie ● At least once + idempotent events
  12. 12. When Do I Care? ● Applications where exact counts matter
  13. 13. What’s Involved 1. Consumer-side offset tracking 2. Producer-side delivery guarantees 3. Producer-Consumer co-operation
  14. 14. Non-Atomic Offset Tracking Crashes between committing side-effects and committing offsets produce duplicates.
  15. 15. Atomic Offset Tracking Commit offsets with the side-effect they produce. Consumer does not produce duplicates.
  16. 16. Producer Side Delivery Guarantees acks : {0=don’t wait, 1=wait for leader ack, all=wait for ack from all replicas} retries : > 0
  20. 20. Producer Side Delivery Guarantees At least once, In Order per Partition Delivery
  21. 21. Producer Side Delivery Guarantees In Order Delivery max.in.flight.requests.per.connection=1
  22. 22. Producer Offsets
  23. 23. Consuming Producer Offsets
  27. 27. Chaining Producer & Consumer
  28. 28. Costs of Application Side Deduplication ● Consumer state is larger - includes mapping producerId ⟼ producerOffset ● Adds complexity to consumer code ○ Consumers must participate in deduplication
  29. 29. Benefits of Application Side Deduplication ● Eliminates need for atomic offset storage ○ Kafka’s solution introduces atomic cross-topic transactions to deal with this ● Allows ordered reprocessing of partition data ● Efficient partition recovery
  30. 30. We’re Hiring !!
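The diagrams for slides 22 to 27 did not survive in this transcript, so the following is only a minimal sketch of how application-side deduplication might look, based on the consumer state described on slide 28 (a mapping producerId ⟼ producerOffset). It assumes each producer stamps its events with its own monotonically increasing offset and that events from one producer arrive in order within a partition (slides 20 and 21). The names `Event`, `DedupConsumer`, `producer_id`, and `producer_offset` are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    producer_id: str      # hypothetical: identifies the producing process
    producer_offset: int  # hypothetical: per-producer, monotonically increasing
    payload: str


@dataclass
class DedupConsumer:
    # Consumer state: producerId -> highest producerOffset already applied.
    last_seen: Dict[str, int] = field(default_factory=dict)
    applied: List[str] = field(default_factory=list)

    def handle(self, event: Event) -> bool:
        """Apply the event unless it is a replay; return True if applied."""
        if event.producer_offset <= self.last_seen.get(event.producer_id, -1):
            return False  # duplicate from a producer retry or a consumer replay
        self.applied.append(event.payload)  # stand-in for the real side effect
        self.last_seen[event.producer_id] = event.producer_offset
        return True


def demo() -> DedupConsumer:
    consumer = DedupConsumer()
    partition_log = [
        Event("p1", 0, "a"),
        Event("p1", 1, "b"),
        Event("p1", 1, "b"),  # retried send: same producer offset appears twice
        Event("p2", 0, "c"),
        Event("p1", 2, "d"),
    ]
    for event in partition_log:
        consumer.handle(event)
    return consumer
```

Because duplicates are filtered by producer offset rather than by Kafka offset, a consumer built this way could replay a partition from an earlier position without double counting, provided `last_seen` is persisted together with the applied results.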

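Slides 14 and 15 contrast non-atomic and atomic offset tracking. As a rough illustration only, the sketch below stores a counter (the side effect) and the consumer offset in the same SQLite transaction, so a crash between the two writes rolls both back and a redelivered message is applied exactly once. SQLite, the table names, and the `process` function are assumptions made for the example; the talk does not say which store or schema was used.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE counts (key TEXT PRIMARY KEY, n INTEGER NOT NULL)")
    conn.execute(
        "CREATE TABLE offsets (part INTEGER PRIMARY KEY, next_offset INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def next_offset(conn: sqlite3.Connection, part: int) -> int:
    row = conn.execute(
        "SELECT next_offset FROM offsets WHERE part = ?", (part,)
    ).fetchone()
    return row[0] if row else 0


def count(conn: sqlite3.Connection, key: str) -> int:
    row = conn.execute("SELECT n FROM counts WHERE key = ?", (key,)).fetchone()
    return row[0] if row else 0


def process(conn, part: int, offset: int, key: str, crash: bool = False) -> None:
    if offset < next_offset(conn, part):
        return  # already applied in an earlier, committed transaction
    with conn:  # one transaction: commits on success, rolls back on exception
        conn.execute("INSERT OR IGNORE INTO counts(key, n) VALUES (?, 0)", (key,))
        conn.execute("UPDATE counts SET n = n + 1 WHERE key = ?", (key,))
        if crash:
            raise RuntimeError("simulated crash before the offset is stored")
        conn.execute(
            "INSERT OR REPLACE INTO offsets(part, next_offset) VALUES (?, ?)",
            (part, offset + 1),
        )


def demo() -> sqlite3.Connection:
    conn = open_store()
    process(conn, 0, 0, "click")
    try:
        process(conn, 0, 1, "click", crash=True)  # side effect is rolled back
    except RuntimeError:
        pass
    process(conn, 0, 1, "click")  # redelivery after restart is applied once
    process(conn, 0, 1, "click")  # a further replay is ignored
    return conn
```

If the counter update and the offset write were committed separately, the simulated crash would leave the count incremented with the old offset still stored, which is the duplicate scenario slide 14 describes.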