
Apache Kafka: New Features That You Might Not Know About



In the last two years Apache Kafka rapidly introduced new versions, going from 0.10.x to 2.x. It can be hard to keep up with all the updates and a lot of companies still run 0.10.x clusters (or even older ones).

Join this session to learn about exciting new features introduced in Kafka 0.11, 1.0, 1.1 and 2.0, including, but not limited to, the new protocol and message headers, transactional support and exactly-once delivery semantics, as well as controller changes that make it possible to shut down even large clusters in seconds.

Published in: Software


  1. 1. Apache Kafka: New Features That You Might Not Know About Yaroslav Tkachenko Software Architect at Activision
  2. 2. Apache Kafka Versions [timeline figure; versions shown: 1.0.0, 1.0.1, 1.1.1, 2.0.0, 2.0.1, 2.1.0; dates marked: May 2016, June 2017, November 2017, July 2018]
  3. 3. 0.11: New Message Format
  4. 4. Message Format v2
     Record Batch:
       ...
       magic: 2
       …
       attributes: … bit 4: isTransactional …
       producerId: int64
       producerEpoch: int16
       records: [Record]
     Record:
       ...
       key: byte[]
       value: byte[]
       headers: [Header]
     Header:
       ...
       headerKey: String
       value: byte[]
  5. 5. 0.11: Headers
  6. 6. Message Headers
     public interface Header {
       String key();
       byte[] value();
     }

     List<Header> headers = Arrays.asList(
       new RecordHeader("hkey1", "hvalue1".getBytes()),
       new RecordHeader("hkey2", "hvalue2".getBytes())
     );
     new ProducerRecord<>("topic", 0, "key", "value", headers);
  7. 7. Message Headers
     Pros • No need to deserialize the whole message payload for routing / filtering use-cases
     Cons • Harder to save the headers together with the payload when archiving, persisting to data stores or integrating with 3rd party systems
  8. 8. Message Headers
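The routing benefit of headers can be sketched without a Kafka dependency. The `Message` class, the `route` method and the `event-type` header name below are hypothetical stand-ins, not part of the Kafka client API. The sketch only shows that a routing decision can read a header and never touch the payload bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class HeaderRouting {
    // Hypothetical stand-in for a record: headers plus an opaque payload.
    static final class Message {
        final Map<String, byte[]> headers;
        final byte[] payload;

        Message(Map<String, byte[]> headers, byte[] payload) {
            this.headers = headers;
            this.payload = payload;
        }
    }

    // Picks a destination from the "event-type" header (an assumed name);
    // the payload is never read or deserialized here.
    static String route(Message message) {
        byte[] eventType = message.headers.get("event-type");
        if (eventType == null) {
            return "unrouted";
        }
        return "events." + new String(eventType, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, byte[]> headers = new HashMap<>();
        headers.put("event-type", "login".getBytes(StandardCharsets.UTF_8));
        Message message = new Message(headers, new byte[] {1, 2, 3});
        System.out.println(route(message));
    }
}
```

With the real client, the same idea would apply to the headers carried by a consumed record.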
  9. 9. 0.11: Transactions
  10. 10. Transactions • Atomic writes to multiple Kafka topics and partitions • Offset commits happen in the same transaction • + epoch for every producer • Consumers must use “read_committed” isolation level for consuming only committed transactional data
  11. 11. Transactions
      KafkaProducer<String, String> producer = ...
      producer.initTransactions();
      KafkaConsumer<String, String> consumer = ...
      // subscribe() takes a collection of topic names
      consumer.subscribe(Collections.singletonList("inputTopic"));
      ConsumerRecords<String, String> records = consumer.poll(Long.MAX_VALUE);
      try {
        producer.beginTransaction();
        for (ConsumerRecord<String, String> record : records) {
          producer.send(processAndProduceRecord("outputTopic", record));
        }
        // consumed offsets are committed as part of the same transaction
        producer.sendOffsetsToTransaction(currentOffsets(consumer), groupId);
        producer.commitTransaction();
      } catch (Exception e) {
        producer.abortTransaction();
      }
  12. 12. Transactions
  13. 13. Transactions In practice, for a producer producing 1KB records at maximum throughput, committing messages every 100ms results in only a 3% degradation in throughput.
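The transactional flow above depends on a few client settings that the slides only mention in passing. The following is a minimal sketch of those settings using plain `java.util.Properties`. The keys are standard Kafka client configuration names; the broker address, group id and transactional id values are placeholders chosen for illustration.

```java
import java.util.Properties;

public class TransactionalConfig {
    // Producer side: a transactional.id is required before initTransactions().
    static Properties producerProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");          // placeholder address
        props.setProperty("transactional.id", "example-transactional-id"); // placeholder id
        return props;
    }

    // Consumer side: only read records from committed transactions, and leave
    // offset commits to the producer's transaction instead of auto-commit.
    static Properties consumerProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
        props.setProperty("group.id", "example-group");           // placeholder group
        props.setProperty("isolation.level", "read_committed");
        props.setProperty("enable.auto.commit", "false");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(producerProps());
        System.out.println(consumerProps());
    }
}
```

These objects would be passed to the `KafkaProducer` and `KafkaConsumer` constructors, together with serializer and deserializer settings that are omitted here.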
  14. 14. 0.11: Exactly-Once Delivery
  15. 15. Exactly-Once: Why is it so Hard?
  16. 16. Delivery Guarantees
      At most once • May or may not be received • No duplicates • Possibly missing data
      At least once • Delivery guaranteed • Possible duplicates • No missing data
      Exactly once • Delivery guaranteed • No duplicates • No missing data
  17. 17. Idempotence: Idempotent producer writes. Transactions: Transactions API, atomic writes and reads.
  18. 18. Idempotence • Unique producer ID is assigned to each producer • Monotonically increasing sequence number is generated for every topic/partition write • Broker persists and validates sequence numbers: • lower number → duplicate, reject • higher number → out-of-sequence error, reject • exactly one greater than the last → allow
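The sequence-number rules on this slide can be illustrated with a small simulation. This is a simplified model, not broker code: the `SequenceValidator` class and its `Result` enum are hypothetical, and it assumes sequence numbers start at 0 and are tracked per producer ID and topic/partition.

```java
import java.util.HashMap;
import java.util.Map;

public class SequenceValidator {
    enum Result { ACCEPTED, DUPLICATE, OUT_OF_SEQUENCE }

    // Last accepted sequence number per (producerId, topic/partition) pair.
    private final Map<String, Integer> lastSequence = new HashMap<>();

    Result append(long producerId, String topicPartition, int sequence) {
        String key = producerId + "|" + topicPartition;
        // Assumption: -1 means nothing has been written yet, so 0 is expected first.
        int last = lastSequence.getOrDefault(key, -1);
        if (sequence <= last) {
            return Result.DUPLICATE;        // lower (or equal) number: already seen
        }
        if (sequence > last + 1) {
            return Result.OUT_OF_SEQUENCE;  // gap: an earlier write is missing
        }
        lastSequence.put(key, sequence);    // exactly one greater than the last
        return Result.ACCEPTED;
    }

    public static void main(String[] args) {
        SequenceValidator validator = new SequenceValidator();
        System.out.println(validator.append(1L, "topic-0", 0));
        System.out.println(validator.append(1L, "topic-0", 0)); // retry of the same write
        System.out.println(validator.append(1L, "topic-0", 2)); // skips sequence 1
    }
}
```

A producer retry after a lost acknowledgement resends the same sequence number, which is why the duplicate check is enough to keep retries from writing a record twice.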
  19. 19. Enabling Exactly-Once in Kafka Streams? Just set “processing.guarantee” to “exactly_once”. That’s it! You don’t need to think about checkpointing and related challenges (like in some other frameworks...)
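As a minimal sketch, the Kafka Streams setting can be expressed with plain `java.util.Properties`. The `processing.guarantee` key and the `exactly_once` value come from the slide; the application id and broker address are placeholders chosen for illustration.

```java
import java.util.Properties;

public class StreamsExactlyOnceConfig {
    static Properties streamsProps() {
        Properties props = new Properties();
        props.setProperty("application.id", "example-streams-app"); // placeholder name
        props.setProperty("bootstrap.servers", "localhost:9092");   // placeholder address
        // The single setting the slide refers to.
        props.setProperty("processing.guarantee", "exactly_once");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(streamsProps());
    }
}
```

These properties would be handed to the `KafkaStreams` constructor along with the topology.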
  20. 20. 1.1: Controller Improvements
  21. 21. Controller Improvements • One Controller per cluster • Responsible for state management of partitions and replicas • Communicates with Zookeeper
  22. 22. Before 1.1.0 • Updating partition leaders one by one, sequentially during the controlled shutdown • Zookeeper Synchronous API is used during the controlled shutdown and controller failover • Controlled shutdown time: 6.5 minutes
      After 1.1.0 • Updating partition leaders in batches during the controlled shutdown • Zookeeper Asynchronous API is used during the controlled shutdown and controller failover • Controlled shutdown time: 3 seconds
  23. 23. 2.0: Kafka Streams Improvements
  24. 24. Kafka Streams Improvements • Message header support in the Processor API • TopicNameExtractor for dynamic routing • kafka-streams-testutil helper for unit-testing • Scala wrapper for the Streams DSL
  25. 25. Thanks! @sap1ens