1
Building Stream
Processing Applications
with Apache® Kafka’s™
Exactly-Once Processing
Guarantees
Matthias J. Sax | Software Engineer
matthias@confluent.io
@MatthiasJSax
2
Apache Kafka
• A distributed Streaming Platform
Consumers
Producers
Connectors
Processing
3
Confluent
• Founded by the original creators of Apache Kafka
• Headquartered in Palo Alto, CA
KSQL: Streaming SQL for Apache Kafka
Developer Preview (https://github.com/confluentinc/ksql)
4
How to Build Applications with Apache Kafka
• Streams API
• Client library (it’s actually much more, but you use it like one)
• DIY using Consumer/Producer API
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-streams</artifactId>
<version>0.11.0.1</version>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>0.11.0.1</version>
</dependency>
5
Apache Kafka is a Streaming Platform
Does NOT run inside
the Kafka brokers!
6
Deploy as you wish
7
Streams API
The easiest way to use exactly-once semantics!
• Easy-to-use and powerful DSL plus low-level Processor API
• Filter, aggregations, windows, joins, tables, punctuations, …
• Rich time semantics (event time, ingestion time, processing time)
• Elastic, scalable, fault-tolerant (including state)
• S, M, L, XL, … use cases
• No need to change any code to use exactly-once!
• Config parameter processing.guarantee = "exactly_once"
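As a minimal sketch, turning on exactly-once for a Streams application really is a single config entry. The string key below is the plain-string form of the Streams config `processing.guarantee`; the `application.id` and `bootstrap.servers` values are hypothetical placeholders. In a full application, these Properties would be passed to the KafkaStreams constructor.

```java
import java.util.Properties;

// Minimal sketch: enable exactly-once for a Streams app.
// No topology code changes are needed.
Properties props = new Properties();
props.put("application.id", "my-streams-app");    // hypothetical app id
props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker
props.put("processing.guarantee", "exactly_once");
```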
8
Apache Kafka’s Exactly-once Guarantees
• Avoids duplicates on writes
• Frees application code from de-duplication
• Simplifies application development
• Enables new use cases with strong consistency guarantees
• Stock market
• Financial industry
• Billing
• Etc…
9
Core Concepts in Streams API
10
Topics, Streams, and Tables
11
Processing Streams
KStreamBuilder builder = new KStreamBuilder();
KStream<Long,String> inputStream =
    builder.stream("input-topic");
KStream<Long,String> outputStream =
    inputStream.mapValues(
        value -> value.toLowerCase());
outputStream.to("output-topic");
12
Using Tables
KTable<Long,String> inputTable = builder.table("changelog-topic");
13
Using Tables
KStream<Long,String> enrichedStream =
inputStream.join(inputTable, …);
14
Aggregating Streams
KTable<Long,Long> countPerKey =
enrichedStream.groupByKey()
.count();
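To illustrate what `groupByKey().count()` computes, here is a hypothetical sketch using plain Java collections instead of Kafka Streams: records are grouped by key and the occurrences per key are counted (the keys and values are made-up sample data).

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sample records as (key, value) pairs -- hypothetical data.
List<Map.Entry<Long, String>> records = List.of(
    Map.entry(1L, "a"), Map.entry(1L, "b"), Map.entry(2L, "c"));

// Equivalent of groupByKey().count(): occurrences per key.
Map<Long, Long> countPerKey = records.stream()
    .collect(Collectors.groupingBy(Map.Entry::getKey, Collectors.counting()));
// countPerKey now maps 1 -> 2 and 2 -> 1
```

The real KTable result is continuously updated as new records arrive, whereas this sketch computes a single snapshot.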
15
End-To-End Application and Exactly-Once
read – process – write
track input offsets – track state updates – write output
16
Application Failure Scenarios: tracking offsets
Application
(k,v1) (k,v2)
Duplicate reads result in duplicate writes.
read(k,v1) -> process -> output -> commit offsets
read(k,v2) -> process -> output -> commit offsets
read(k,v2) -> process -> output -> CRASH
17
Application Failure Scenarios: state update
Application
(k,v1) (k,v2)
read(k,v1) -> process/state -> output -> commit offsets
read(k,v2) -> process/state -> output -> commit offsets
read(k,v2) -> process/state -> CRASH
Duplicate reads result in corrupted state and thus
wrong results (e.g., over-counting).
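The over-counting failure above can be sketched with plain Java collections: if the state update is applied but the offset commit is lost in a crash, the same record is replayed and counted twice (the key name and delivery sequence are hypothetical).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: state updates not atomic with offset commits.
// After a crash before the offset commit, record (k,v2) is replayed.
Map<String, Long> countStore = new HashMap<>();
String[] delivered = {"k", "k"}; // (k,v2) processed, crash, (k,v2) replayed

for (String key : delivered) {
    countStore.merge(key, 1L, Long::sum);
}
// countStore holds 2 for "k", although only one logical record arrived
```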
18
Application Failure Scenarios: Error on Write
Producer
Application
Consumer
Application
(k,v1) (k,v2)
Duplicate writes lead to wrong results
and downstream duplicate reads.
write(k,v1)
write(k,v2)
read(k,v1)
read(k,v2)
read(k,v2)
(k,v2)
19
Error Propagation
Application Application Application
20
Exactly-Once in Kafka Streams API
Kafka’s Streams API provides
Exactly-Once Processing Guarantees,
via an atomic read-process-write pattern.
This allows for deep processing pipelines with
exactly-once guarantees.
21
Exactly-Once in Apache Kafka Streams API
22
Exactly-Once in Kafka Streams API (since v0.11.0)
• Builds on top of KafkaProducer and KafkaConsumer
• In v0.11.0, KafkaProducer adds:
• Idempotent writes
• Transactional API
• Includes offset commits in a producer transaction
• No offset commits via KafkaConsumer
• In v0.11.0, KafkaConsumer adds:
• read_committed mode (vs. read_uncommitted)
23
How to use exactly-once capabilities:
• Streams API (the easiest way to use exactly-once semantics)
• Config parameter processing.guarantee = "exactly_once"
• Idempotent Producer
• Config parameter enable.idempotence = true
• Transactional Producer
• Config parameter transactional.id = "my-unique-tid"
• And Transactional API (hard to use correctly – even if it looks simple on the surface)
• Transactional Consumer
• Config parameter isolation.level = "read_committed"
(default: "read_uncommitted")
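In plain client code, the settings above translate into producer and consumer configuration entries like the following sketch. The broker address and transactional id are hypothetical placeholders; the config keys themselves are Kafka's documented producer/consumer settings.

```java
import java.util.Properties;

// Producer: idempotent writes plus a transactional id.
// (Setting transactional.id implies idempotence, but it is
// shown explicitly here for clarity.)
Properties producerProps = new Properties();
producerProps.put("bootstrap.servers", "localhost:9092"); // hypothetical
producerProps.put("enable.idempotence", "true");
producerProps.put("transactional.id", "my-unique-tid");

// Consumer: only deliver records from committed transactions.
Properties consumerProps = new Properties();
consumerProps.put("bootstrap.servers", "localhost:9092"); // hypothetical
consumerProps.put("isolation.level", "read_committed");   // default: read_uncommitted
```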
24
Transactional API
producer.initTransactions();
try {
    producer.beginTransaction();
    producer.send(message1);
    producer.send(message2);
    // Commit the consumed offsets through the producer, so they
    // are part of the same transaction as the writes.
    producer.sendOffsetsToTransaction(…);
    producer.commitTransaction();
} catch (ProducerFencedException e) {
    // A newer producer with the same transactional.id took over:
    // this instance must stop and must not retry.
    producer.close();
} catch (KafkaException e) {
    // Potentially recoverable error: abort and retry the transaction.
    producer.abortTransaction();
}
25
Summary
• Streams API is the easiest way to build applications with Apache Kafka
• It’s a library that enriches your application
• No compute cluster
• It provides end-to-end exactly-once processing guarantees
• Kafka’s exactly-once guarantees provide strong semantics that simplify your
application code
26
Material
• Download Confluent Open Source: https://www.confluent.io/product/confluent-open-source/
• Check out the docs: https://docs.confluent.io/
• Check our blog:
• Exactly-Once: https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
• Micro-Service Blog Series: https://www.confluent.io/blog/data-dichotomy-rethinking-the-way-we-treat-data-and-services/
• Kafka Summit talks:
• Exactly-Once (NY): https://www.confluent.io/kafka-summit-nyc17/resource/#exactly-once-semantics_slide
• Exactly-Once with Streams API (SF): https://www.confluent.io/kafka-summit-sf17/resource/#Exactly-once-Stream-Processing-with-Kafka-Streams_slide
• Micro-Services (SF): https://www.confluent.io/kafka-summit-sf17/resource/#building-event-driven-services-stateful-streams_slide
27
Thank You
We are hiring!
