
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams


Presented at Kafka Summit SF 2017 by Guozhang Wang, Confluent



  1. Exactly-once Stream Processing with Kafka Streams. Guozhang Wang, Kafka Summit SF, Aug. 28, 2017
  2. Outline: • Stream processing with Kafka • Exactly-once for stream processing • How Kafka Streams enables exactly-once
  3. Stream Processing with Kafka. Diagram: Your App reads the Ads Clicks and Ads Displays topics, processes them with local State, and writes Billing Updates and Fraud Suspects.
  4. Stream Processing with Kafka. Same diagram, now showing acks on the produced output and an offset commit on the consumed input.
  5. Stream Processing: Do it Yourself

     while (isRunning) {
       // read some messages from Kafka
       inputMessages = consumer.poll();
       // do some processing...
       // send output messages back to Kafka, wait for ack
       producer.send(outputMessages).get();
       // commit offsets for processed messages
       consumer.commit(..);
     }
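The poll/process/send/commit cycle above can be sketched without a broker. The class below is an illustrative in-memory stand-in (all names invented): lists play the role of topics and an index plays the role of the committed offset, making it visible that the commit point always trails the sends.

```java
import java.util.*;

// Hypothetical in-memory stand-ins for Kafka topics, to illustrate the
// poll -> process -> send -> commit cycle of the DIY loop.
public class DiyLoop {
    // "topic" = list of records; committedOffset = index of next record to read
    static List<String> inputTopic = List.of("a", "b", "c");
    static List<String> outputTopic = new ArrayList<>();
    static int committedOffset = 0;                 // what consumer.commit() persists

    // One iteration: read a batch, transform it, write results, then commit.
    static void runOnce(int batchSize) {
        int end = Math.min(committedOffset + batchSize, inputTopic.size());
        List<String> batch = inputTopic.subList(committedOffset, end); // poll()
        for (String rec : batch) outputTopic.add(rec.toUpperCase());   // process + send
        committedOffset = end;                                         // commit()
    }

    public static void main(String[] args) {
        while (committedOffset < inputTopic.size()) runOnce(2);
        System.out.println(outputTopic); // prints [A, B, C]
    }
}
```

A crash between the send step and the commit step leaves records processed but uncommitted, which is exactly the window the later error-scenario slides explore.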
  6. DIY Stream Processing is Hard: • Ordering • Partitioning & Scalability • Fault tolerance • State Management • Time, Window & Out-of-order Data • Re-processing
  7. “Full stack” evaluation: API & coding; operations, debugging, …
  8. Exactly-Once: an application property for stream processing such that, for each received record, its processing results are reflected exactly once, even under failures.
  9. Error Scenario #1: Duplicate Writes. Diagram: the Streams App processes Topics A and B into Topics C and D, with an ack on the produce.
  10. Error Scenario #1: Duplicate Writes. Producer config: retries = N (default = 0); a retried send whose original ack was lost shows up as a duplicate write.
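The idempotence building block introduced later in the talk neutralizes exactly this retry duplication: the broker tracks a sequence number per producer per partition and drops any send it has already appended. The class below is a toy model of that idea, not Kafka's actual implementation; all names are invented.

```java
import java.util.*;

// Toy model of broker-side de-duplication for an idempotent producer:
// the "broker" remembers the last appended sequence number and silently
// drops a retried send that carries an already-seen sequence.
public class SequenceDedup {
    final List<String> log = new ArrayList<>();   // the partition's log
    int lastSequence = -1;                        // last appended sequence

    // Returns true if the record was appended, false if it was a duplicate.
    boolean append(int sequence, String value) {
        if (sequence <= lastSequence) return false;  // retry of an acked send
        log.add(value);
        lastSequence = sequence;
        return true;
    }
}
```

With this scheme a producer can retry aggressively: the retried send either fills a gap or is recognized as a duplicate, so the log stays correct either way.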
  11. Error Scenario #2: Re-process. Diagram: the Streams App acks its output to Topics C and D and commits offsets for Topics A and B.
  12. Error Scenario #2: Re-process. Diagram: after a failure the app restarts with stale state and re-processes input, re-emitting results.
  13. Life before 0.11: At-least-once + Dedup. Diagram: the process with its state, reading Topics A and B and writing Topics C and D.
  14. So how to achieve Exactly-Once?
  15. Life before 0.11: At-least-once + Dedup. Diagram: acked produces to the output topics.
  16. Life before 0.11: At-least-once + Dedup. Diagram: acked produces plus an offset commit.
  17. Life before 0.11: At-least-once + Dedup. Example: duplicated records 2 2, 3 3, 4 4 are de-duplicated downstream.
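The pre-0.11 workaround sketched on this slide is application-level de-duplication: remember which record identifiers have already been processed and drop repeats after a re-process. A minimal sketch, with invented names:

```java
import java.util.*;

// Application-level de-dup, the "at-least-once + dedup" pattern:
// the processed-ID set filters out records replayed after a failure.
public class AtLeastOnceDedup {
    final Set<String> seen = new HashSet<>();
    final List<String> results = new ArrayList<>();

    void handle(String recordId, String value) {
        if (!seen.add(recordId)) return;   // already processed: drop duplicate
        results.add(value);
    }
}
```

Feeding it the slide's duplicated stream 2 2, 3 3, 4 4 yields 2, 3, 4. The catch, and the talk's motivation, is that the "seen" set itself must be fault-tolerant and every record needs a stable unique ID, which is exactly the bookkeeping the 0.11 machinery removes.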
  18. Exactly-once, the Kafka Way! (0.11+)
  19. Exactly-once, the Kafka Way! (0.11+). Building blocks to achieve exactly-once: • Idempotence: de-duped sends, in order, per partition • Transactions: atomic multi-sends across topic partitions • Kafka Streams: exactly-once enabled with a single knob
  20. Kafka Streams (0.10+): • New client library beyond producer and consumer • Powerful yet easy to use • Event time, stateful processing • Out-of-order handling • Highly scalable, distributed, fault tolerant • and more..
  21. Anywhere, anytime.
  22. Anywhere, anytime:

     <dependency>
       <groupId>org.apache.kafka</groupId>
       <artifactId>kafka-streams</artifactId>
       <version>0.11.0.0</version>
     </dependency>
  23. Simple is Beautiful
  24. Kafka Streams DSL

     public static void main(String[] args) {
       // specify the processing topology by first reading in a stream from a topic
       KStream<String, String> words = builder.stream("topic1");
       // count the words in this stream as an aggregated table
       KTable<String, Long> counts = words.groupBy(..).count("Counts");
       // write the result table to a new topic
       counts.to("topic2");
       // create a streams client and start running it
       KafkaStreams streams = new KafkaStreams(builder, config);
       streams.start();
     }
  25.-27. Kafka Streams DSL (the same example, walked through step by step)
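The groupBy-then-count in the DSL example above materializes a running count per key in the resulting KTable. As a rough plain-Java analogy of those semantics (illustrative names, no Kafka involved):

```java
import java.util.*;

// Plain-Java analogy for words.groupBy(..).count(): group a stream of
// words by the word itself and keep a running count per key, which is
// what the resulting KTable materializes.
public class WordCountAnalogy {
    static Map<String, Long> count(List<String> words) {
        Map<String, Long> counts = new HashMap<>();
        for (String w : words) counts.merge(w, 1L, Long::sum);
        return counts;
    }
}
```

The important difference from this analogy is that in Kafka Streams the map is a fault-tolerant state store and each update is also an output record, which is why the count can be written onward with counts.to("topic2").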
  28. Processor Topology

     KStream<..> stream1 = builder.stream("topic1");
     KStream<..> stream2 = builder.stream("topic2");
     KStream<..> joined = stream1.leftJoin(stream2, ...);
     KTable<..> aggregated = joined.groupBy(...).count("store");
     aggregated.to("topic3");
  29.-31. Processor, Stream, State Store: the same topology, highlighting each element in turn.
  32. Kafka Streams DSL (the same word-count example as slide 24)
  33. Processor Topology. Diagram: the Kafka Streams topology with its state, reading from and writing to Kafka.
  34. Processing in Kafka Streams. Diagram: Topic A and Topic B, each with partitions P1 and P2.
  35. Processing in Kafka Streams. Diagram: a processor topology instantiated per partition.
  36. Processing in Kafka Streams. Diagram: the per-partition topologies reading Topic A and writing Topic B.
  37. Processing in Kafka Streams. Diagram: Task1 and Task2, each with its own State, run inside app instances MyApp.1 and MyApp.2, reading Topic A and writing Topic B.
  38. Processing in Kafka Streams. Diagram: each task's state is additionally backed up to a Kafka changelog topic.
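The changelog topic on this slide is what makes local state recoverable: every store update is also appended to the changelog, so a restarted task can rebuild its store by replaying it. A minimal sketch of that dual-write-and-replay idea, with invented names and a list standing in for the changelog topic:

```java
import java.util.*;

// Sketch of state restoration from a changelog: every local state update
// is also appended to the changelog, so a fresh task can rebuild the
// store by replaying the changelog from the beginning.
public class ChangelogRestore {
    final Map<String, Long> store = new HashMap<>();
    final List<Map.Entry<String, Long>> changelog = new ArrayList<>();

    void put(String key, long value) {
        store.put(key, value);
        changelog.add(Map.entry(key, value)); // dual-write to the changelog
    }

    // Rebuild an empty store by replaying the changelog in order.
    Map<String, Long> restore() {
        Map<String, Long> fresh = new HashMap<>();
        for (var e : changelog) fresh.put(e.getKey(), e.getValue());
        return fresh;
    }
}
```

Because later entries overwrite earlier ones for the same key, replaying the full changelog reproduces exactly the live store; real changelog topics are additionally log-compacted so the replay stays bounded.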
  39. Exactly-once with Kafka. Diagram: acked produces to Topics C and D, an offset commit for Topics A and B, and state updates in the processor.
  40. Exactly-once with Kafka: • Acked produce to sink topics • Offset commit for source topics • State update on the processor
  41. Exactly-Once with Kafka: • Acked produce to sink topics • Offset commit for source topics • State update on the processor
  42. Exactly-Once with Kafka: the same three effects, All or Nothing.
  43. Exactly-Once with Kafka Streams (0.11+): • Acked produce to sink topics • Offset commit for source topics • State update on the processor
  44. Exactly-Once with Kafka Streams (0.11+): • Acked produce to sink topics • A batch of records sent to the offset topic • State update on the processor
  45. Exactly-Once with Kafka Streams (0.11+): • Acked produce to sink topics • A batch of records sent to the offset topic • A batch of records sent to changelog topics
  46. Exactly-Once with Kafka Streams (0.11+): • A batch of records sent to sink topics • A batch of records sent to the offset topic • A batch of records sent to changelog topics
  47. Exactly-Once with Kafka Streams (0.11+): all three batches of records, All or Nothing.
  48.-52. Kafka Streams Exactly-Once: the transactional loop, built up step by step. Inside one transaction the app begins the transaction, polls input records, processes them, sends results to the output topic, sends state updates to the changelog topic, and sends consumed offsets for the input topics; any KafkaException diverts to the failure path.
  53. Kafka Streams Exactly-Once

     try {
       producer.beginTxn();
       recs = consumer.poll();
       for (Record rec : recs) {
         // process ..
         producer.send("output", ..);
         producer.send("changelog", ..);
         producer.sendOffsets("input", ..);
       }
       producer.commitTxn();
     } catch (KafkaException e) {
       producer.abortTxn();
     }
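The all-or-nothing behavior of the loop above can be modeled without a broker. The class below is a toy model (invented names; real Kafka uses transaction markers and fenced producer IDs rather than client-side buffering): writes made inside a transaction only become visible on commit, so the output, changelog, and offset writes either all land or none do.

```java
import java.util.*;
import java.util.function.Consumer;

// Toy model of a transaction's all-or-nothing visibility: sends are
// buffered while the transaction is open, published atomically on
// commit, and discarded on abort.
public class TxnBuffer {
    final List<String> visible = new ArrayList<>();      // committed records
    private final List<String> pending = new ArrayList<>(); // open transaction

    void send(String record) { pending.add(record); }    // buffered write
    void commitTxn()         { visible.addAll(pending); pending.clear(); }
    void abortTxn()          { pending.clear(); }        // nothing published

    // Run a transaction body, committing on success and aborting on failure,
    // mirroring the try/catch shape of the slide's loop.
    void runTxn(Consumer<TxnBuffer> body) {
        try { body.accept(this); commitTxn(); }
        catch (RuntimeException e) { abortTxn(); }
    }
}
```

A crash mid-transaction leaves nothing visible, so a restarted task re-reads from the last committed offsets and re-emits without duplication, which is the exactly-once property the talk defines.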
  54.-58. Exactly-Once with Failures. Diagram sequence: one of three Streams instances fails, its task migrates to a surviving instance, the task's state is restored from the Kafka changelog topic, and processing resumes.
  59. Exactly-Once with Failures. Config: processing.mode = exactly-once (default = at-least-once) [KIP-98, KIP-129]
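In the released Kafka Streams API this knob is spelled processing.guarantee (StreamsConfig.PROCESSING_GUARANTEE_CONFIG) with values at_least_once (the default) and exactly_once; the slide's "processing.mode" is shorthand. A minimal config sketch, with illustrative application and broker values:

```java
import java.util.Properties;

// Minimal Streams configuration flipping on exactly-once processing.
// "my-streams-app" and "localhost:9092" are illustrative placeholders.
public class EosConfig {
    public static Properties props() {
        Properties p = new Properties();
        p.put("application.id", "my-streams-app");
        p.put("bootstrap.servers", "localhost:9092");
        p.put("processing.guarantee", "exactly_once"); // the one-knob switch
        return p;
    }
}
```

Everything else (transactional producers, offset sends, commit/abort handling) is managed by the library once this single property is set, which is the talk's "exactly-once made easy" point.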
  60. Simple is Beautiful: API & coding; “full stack” evaluation; operations, debugging, …
  61. Life is Good with Exactly-Once, but..
  62. What if not all my data is in Kafka?
  63. (image-only slide)
  64. Connectors: • 60+ since the first release (0.9+) • 20+ from partners (exactly-once support coming)
  65. Connect: End-to-End Exactly-Once. Diagram: Connect moves data between external systems (the “Wild Wild West”) and the Kafka Exactly-Once Zone, where Streams apps process it.
  66.-68. Take-aways: • Exactly-once: an important property for stream processing • Kafka Streams: exactly-once processing made easy

     THANKS! Guozhang Wang | guozhang@confluent.io | @guozhangwang
     Additional Resources: http://www.confluent.io/resources
