
Introducing Exactly Once Semantics To Apache Kafka

Here are the slides from my talk on introducing exactly-once semantics to Apache Kafka, given at Kafka Summit NYC on 8 May 2017.

The slides dive into the design of transactions in Apache Kafka.

Introducing Exactly Once Semantics To Apache Kafka

  1. Introducing Exactly Once Semantics in Apache Kafka (Jason Gustafson, Guozhang Wang, Sriram Subramaniam, and Apurva Mehta)
  2. On deck: • Kafka’s existing delivery semantics • Why did we improve them? • What’s new? • How do you use it? • Summary
  3. Apache Kafka’s existing semantics
  4-12. Existing Semantics (diagram sequence; no extractable text on these slides)
  13. TL;DR – What we have today • At-least-once, in-order delivery per partition. • Producer retries can introduce duplicates (see the sketch below).
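     Why duplicates happen: a producer that retries after a lost acknowledgement cannot know whether the first attempt was actually written, so the retry may append the same record twice. A minimal sketch of the classic at-least-once setup with the Java client (broker address, topic, and values are illustrative):

         import java.util.Properties;
         import org.apache.kafka.clients.producer.KafkaProducer;
         import org.apache.kafka.clients.producer.ProducerRecord;

         Properties props = new Properties();
         props.put("bootstrap.servers", "localhost:9092");  // illustrative
         props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
         props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
         props.put("acks", "all");   // wait for full replication before acking
         props.put("retries", "3");  // retry on transient errors

         KafkaProducer<String, String> producer = new KafkaProducer<>(props);
         // If the broker writes the record but the ack is lost in transit,
         // the automatic retry can land the same record in the log twice.
         producer.send(new ProducerRecord<>("transfers", "txn-42", "alice pays bob $10"));
         producer.close();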
  14. Why improve?
  15. Why improve? • Stream processing is becoming an ever bigger part of the data landscape. • Apache Kafka is the heart of the streams platform. • Strengthening Kafka’s semantics expands the universe of streaming applications.
  16. A motivating example: a peer-to-peer lending platform that processes micro-loans between users.
  17. A Peer-to-Peer Lender
  18. The Basic Flow
  19. Offset commits
  20. Reprocessed transfer, eek!
  21. Lost money! Eek eek!
  22. What’s new?
  23. What’s new • Exactly-once, in-order delivery per partition • Atomic writes across multiple partitions • Performance considerations
  24. What’s new, Part 1: Exactly-once, in-order delivery per partition
  25-32. The idempotent producer (diagram sequence; no extractable text on these slides)
  33. TL;DR • Sequence numbers and producer IDs enable de-dup and are stored in the log, so de-dup works transparently across leader changes. • Will not de-dup application-level resends. • Works transparently – no API changes (see the sketch below).
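     Turning on the idempotent producer is a single config change; a minimal sketch (broker address and topic are illustrative):

         import java.util.Properties;
         import org.apache.kafka.clients.producer.KafkaProducer;
         import org.apache.kafka.clients.producer.ProducerRecord;

         Properties props = new Properties();
         props.put("bootstrap.servers", "localhost:9092");  // illustrative
         props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
         props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
         // The broker now de-dups retries using the producer id plus per-partition sequence numbers:
         props.put("enable.idempotence", "true");

         KafkaProducer<String, String> producer = new KafkaProducer<>(props);
         producer.send(new ProducerRecord<>("transfers", "txn-42", "alice pays bob $10"));  // send() is unchanged
         producer.close();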
  34. What’s new, part 2: Multi-partition writes.
  35. Introducing ‘transactions’:
         producer.initTransactions();
         try {
             producer.beginTransaction();
             producer.send(record0);
             producer.send(record1);
             producer.sendOffsetsToTransaction(…);
             producer.commitTransaction();
         } catch (ProducerFencedException e) {
             producer.close();
         } catch (KafkaException e) {
             producer.abortTransaction();
         }
  36. Introducing ‘transactions’
  37. Initializing ‘transactions’
  38. Transactional sends – part 1
  39. Transactional sends – part 2
  40. Commit – phase 1
  41-42. Commit – phase 2 (two diagram steps)
  43. Success!
  44. Let’s review the APIs:
         producer.initTransactions();
         try {
             producer.beginTransaction();
             producer.send(record0);
             producer.send(record1);
             producer.sendOffsetsToTransaction(…);
             producer.commitTransaction();
         } catch (ProducerFencedException e) {
             producer.close();
         } catch (KafkaException e) {
             producer.abortTransaction();
         }
  45-48. Let’s review the APIs (the same code as above, stepped through call by call)
  49. Consumer returns only committed messages
  50. Some notes on consuming transactions • Two ‘isolation levels’: read_committed and read_uncommitted. • Messages are read in offset order. • read_committed consumers read only up to the point where there are no open transactions (the last stable offset).
  51. TL;DR • The transaction coordinator and transaction log maintain transaction state. • Use the new producer APIs for transactions. • Consumers can read only committed messages.
  52. Part 3: Performance!
  53. What’s new, part 3: Performance boost! • Up to +20% producer throughput • Up to +50% consumer throughput • Up to -20% disk utilization • Savings start when you batch • Details: https://bit.ly/kafka-eos-perf
  54. Too good to be true? Let’s understand how!
  55. The old message format
  56. The new format
  57-58. The new format -> new fields (two diagram steps)
  59. The new format -> delta encoding
  60. A visual comparison with 7 records, 10 bytes each
  61. TL;DR • With a batch size of just 2, the new format starts saving space. • Savings are maximal for large batches of small messages. • Hence higher throughput when I/O bound. • Works as soon as you upgrade to the new format (rough arithmetic below).
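     Rough arithmetic behind the savings (the byte counts are approximations of the old and new on-disk formats, not exact figures from the talk): the old format carries roughly 34 bytes of fixed overhead per record (offset, size, CRC, magic, attributes, timestamp, key/value lengths), so the 7 records of 10 bytes each from slide 60 cost about 7 × (34 + 10) = 308 bytes. The new format pays one batch header of about 61 bytes plus a few varint-encoded bytes per record (say ~8 for tiny records): 61 + 7 × (8 + 10) ≈ 187 bytes, roughly 40% smaller, and the gap widens as the batch grows.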
  62. Cool! But how do I use this?
  63. Producer configs • enable.idempotence = true • max.in.flight.requests.per.connection = 1 • acks = “all” • retries > 1 (preferably MAX_INT) • transactional.id = ‘some unique id’ (full sketch below)
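     Putting those producer configs together; a minimal sketch (broker address and transactional.id are illustrative):

         import java.util.Properties;
         import org.apache.kafka.clients.producer.KafkaProducer;

         Properties props = new Properties();
         props.put("bootstrap.servers", "localhost:9092");  // illustrative
         props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
         props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
         props.put("enable.idempotence", "true");
         props.put("max.in.flight.requests.per.connection", "1");
         props.put("acks", "all");
         props.put("retries", Integer.toString(Integer.MAX_VALUE));
         props.put("transactional.id", "transfer-processor-1");  // must be stable and unique per producer instance

         KafkaProducer<String, String> producer = new KafkaProducer<>(props);
         producer.initTransactions();  // registers with the transaction coordinator and fences older instances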
  64. Consumer configs • isolation.level: “read_committed” or “read_uncommitted” (sketch below)
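     A matching consumer sketch (group id and topic are illustrative):

         import java.util.Collections;
         import java.util.Properties;
         import org.apache.kafka.clients.consumer.KafkaConsumer;

         Properties props = new Properties();
         props.put("bootstrap.servers", "localhost:9092");  // illustrative
         props.put("group.id", "transfer-readers");          // illustrative
         props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
         props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
         // Only deliver messages from committed transactions:
         props.put("isolation.level", "read_committed");

         KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
         consumer.subscribe(Collections.singletonList("transfers"));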
  65. Streams config • processing.guarantee = “exactly_once” (sketch below)
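     In Kafka Streams the whole machinery sits behind a single setting; a minimal sketch (application id and servers are illustrative):

         import java.util.Properties;
         import org.apache.kafka.streams.StreamsConfig;

         Properties props = new Properties();
         props.put(StreamsConfig.APPLICATION_ID_CONFIG, "transfer-app");   // illustrative
         props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
         // One config enables end-to-end exactly-once processing:
         props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, "exactly_once");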
  66. Putting it together • We reviewed Kafka’s existing delivery semantics • Understood why we wanted to improve them • Learned how they have been strengthened • Learned how the new semantics work
  67. When is it available? Available to try in Kafka 0.11, June 2017.
  68. Thank You!
