Join Semantics in Kafka Streams

With the recent adoption of Confluent and Kafka Streams, organizations have seen significantly improved system stability from a real-time processing framework, as well as better scalability and lower maintenance costs.
The focus of this webinar is:
~Different join operators in Kafka Streams.
~Exploring the different ways to join data in Kafka Streams, both with and without shared keys.
~How to put the application owner in control by leveraging a simplified, app-centric architecture.

If you have any queries, contact Himani by email at himani.arora@knoldus.in

  1. Join Semantics in Kafka Streams
     Himani Arora, Software Consultant, Knoldus Inc.
  2. Agenda
     ● Introduction to Apache Kafka
     ● Introduction to Streams API
     ● How to use Streams API
     ● Join Operations supported in Kafka Streams
     ● Different types of Joins
  3. Apache Kafka
  4. Introduction
     ● Apache Kafka is a distributed streaming platform in which producers send messages (key-value pairs) to topics, and consumers poll those topics to read the messages. Each topic is partitioned, and the partitions are distributed among brokers.
     ● It has four core APIs:
       ○ Producer API
       ○ Consumer API
       ○ Streams API
       ○ Connector API
  5. Streams API
     ● Kafka Streams is a client library for processing and analyzing data stored in Kafka.
     ● There are two main abstractions in the Streams API:
       ○ A KStream is a stream of key-value pairs. Operations such as map and filter on a KStream are stateless, and aggregating a KStream turns it into the other core abstraction.
       ○ A KTable, often described as a "changelog stream", holds the latest value for each message key and updates automatically as new messages arrive.
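     The difference between the two abstractions can be modeled in a few lines of plain Java (Java 9+). This is a hypothetical sketch, not the Kafka Streams API: `StreamVsTable` and `toTable` are illustrative names, a KStream is assumed to be an ordered list of records, and tombstones (null values that delete a key) are left out.

     ```java
     import java.util.*;

     // Hypothetical model, not Kafka Streams code: a KStream is an ordered list of
     // key-value records; replaying it into a KTable keeps only the latest value per key.
     class StreamVsTable {
         // Each record upserts its key, so later values overwrite earlier ones.
         static Map<String, String> toTable(List<Map.Entry<String, String>> stream) {
             Map<String, String> table = new LinkedHashMap<>();
             for (Map.Entry<String, String> rec : stream) {
                 table.put(rec.getKey(), rec.getValue());
             }
             return table;
         }

         public static void main(String[] args) {
             List<Map.Entry<String, String>> stream = List.of(
                 Map.entry("alice", "Paris"),
                 Map.entry("bob", "Rome"),
                 Map.entry("alice", "Berlin"));
             System.out.println("KStream view: " + stream);          // all three records
             System.out.println("KTable view:  " + toTable(stream)); // {alice=Berlin, bob=Rome}
         }
     }
     ```

     Replaying three records for two keys leaves two table rows, and the later value for the repeated key wins.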
  6. How to install the Streams API?
     ● No installation is needed. Build Apps, Not Clusters!
     ● It is a library and can be added to your app like any other library, for example as a Maven dependency:

     ```xml
     <dependency>
         <groupId>org.apache.kafka</groupId>
         <artifactId>kafka-streams</artifactId>
         <version>1.1.0</version>
     </dependency>
     ```
  7. Joins
     Kafka Streams supports three types of joins:
     ● Inner joins
       ○ Produce an output when both input sources have a record with the same key.
     ● Left joins
       ○ Produce an output for each record in the left (primary) input source. If the other source has no value for that key, the other value is null.
     ● Outer joins
       ○ Produce an output for each record in either input source. If only one source contains a key, the value from the other source is null.
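     The three join types can be sketched over two keyed inputs with a small plain-Java model. It is a hypothetical illustration, not Kafka Streams code: `JoinTypes`, `Kind`, and `join` are made-up names, and the model looks at a single snapshot of each input, ignoring time, windows, and updates.

     ```java
     import java.util.*;

     // Hypothetical snapshot model of inner, left, and outer joins over two keyed inputs.
     // Not the Kafka Streams API; time, windowing, and updates are ignored.
     class JoinTypes {
         enum Kind { INNER, LEFT, OUTER }

         // Returns "key:leftValue,rightValue" strings; a missing side prints as null.
         static List<String> join(Map<String, String> left, Map<String, String> right, Kind kind) {
             Set<String> keys = new LinkedHashSet<>(left.keySet());
             if (kind == Kind.OUTER) {
                 keys.addAll(right.keySet()); // outer join: keys from either side
             }
             List<String> out = new ArrayList<>();
             for (String key : keys) {
                 String l = left.get(key);
                 String r = right.get(key);
                 if (kind == Kind.INNER && (l == null || r == null)) {
                     continue; // inner join: both sides must have the key
                 }
                 out.add(key + ":" + l + "," + r);
             }
             return out;
         }

         public static void main(String[] args) {
             Map<String, String> left = new TreeMap<>(Map.of("a", "1", "b", "2"));
             Map<String, String> right = new TreeMap<>(Map.of("b", "x", "c", "y"));
             for (Kind kind : Kind.values()) {
                 System.out.println(kind + " -> " + join(left, right, kind));
             }
         }
     }
     ```

     With left `{a=1, b=2}` and right `{b=x, c=y}`, the inner join keeps only `b`, the left join adds `a` with a null right side, and the outer join also adds `c` with a null left side.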
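  8. Supported joins for each combination of input types

     | Left input (Type 1) | Right input (Type 2) | Inner join | Left join | Outer join |
     |---|---|---|---|---|
     | KStream | KStream | ✔ | ✔ | ✔ |
     | KStream | KTable | ✔ | ✔ | ✖ |
     | KStream | GlobalKTable | ✔ | ✔ | ✖ |
     | KTable | KTable | ✔ | ✔ | ✔ |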
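  9. KStream-KStream Join
     ● This is a sliding-window join: two records with the same key are joined if their timestamps are close to each other, that is, if they differ by no more than the window size.
     ● These joins are always windowed because otherwise the internal state store used to perform the join would grow indefinitely.
     ● Because KStream-KStream joins are always windowed, we must provide a join window.

     ```java
     import java.util.concurrent.TimeUnit;
     import org.apache.kafka.common.serialization.Serdes;
     import org.apache.kafka.streams.kstream.JoinWindows;
     import org.apache.kafka.streams.kstream.KStream;

     // Assumes left is a KStream<String, Long> and right is a KStream<String, Double>.
     // This serde-argument overload is deprecated since Kafka 1.0 in favor of passing Joined.with(...).
     KStream<String, String> joined = left.join(right,
         (leftValue, rightValue) -> "left=" + leftValue + ", right=" + rightValue, /* ValueJoiner */
         JoinWindows.of(TimeUnit.MINUTES.toMillis(5)), /* join records at most 5 minutes apart */
         Serdes.String(), /* key */
         Serdes.Long(),   /* left value */
         Serdes.Double()  /* right value */
     );
     ```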
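     The window condition can be modeled in plain Java. This is a hypothetical sketch, not the Kafka Streams implementation: each side buffers its records, and a newly arriving record is joined with buffered records of the other side that have the same key and a timestamp at most `windowMs` away, mirroring the symmetric window set by `JoinWindows.of(...)`. Grace periods, state retention, and out-of-order handling are ignored, and `WindowedStreamJoin` and `Rec` are illustrative names.

     ```java
     import java.util.*;

     // Hypothetical model of a KStream-KStream inner join with a symmetric time window.
     // Not the Kafka Streams API; grace periods and state retention are ignored.
     class WindowedStreamJoin {
         static final class Rec {
             final boolean fromLeft; // true for the left stream, false for the right stream
             final String key;
             final String value;
             final long ts;          // record timestamp in milliseconds

             Rec(boolean fromLeft, String key, String value, long ts) {
                 this.fromLeft = fromLeft;
                 this.key = key;
                 this.value = value;
                 this.ts = ts;
             }
         }

         // Processes records in arrival order and returns the joined outputs.
         static List<String> join(List<Rec> arrivals, long windowMs) {
             List<Rec> leftBuffer = new ArrayList<>();
             List<Rec> rightBuffer = new ArrayList<>();
             List<String> out = new ArrayList<>();
             for (Rec rec : arrivals) {
                 // Probe the other side's buffer for same-key records inside the window.
                 List<Rec> other = rec.fromLeft ? rightBuffer : leftBuffer;
                 for (Rec o : other) {
                     if (o.key.equals(rec.key) && Math.abs(o.ts - rec.ts) <= windowMs) {
                         Rec l = rec.fromLeft ? rec : o;
                         Rec r = rec.fromLeft ? o : rec;
                         out.add("left=" + l.value + ", right=" + r.value);
                     }
                 }
                 // Remember this record so later records from the other side can match it.
                 if (rec.fromLeft) {
                     leftBuffer.add(rec);
                 } else {
                     rightBuffer.add(rec);
                 }
             }
             return out;
         }

         public static void main(String[] args) {
             long fiveMinutes = 5 * 60 * 1000L;
             List<Rec> arrivals = List.of(
                 new Rec(true, "k1", "A", 0L),
                 new Rec(false, "k1", "a", 4 * 60 * 1000L),   // 4 minutes after A: inside the window
                 new Rec(false, "k1", "b", 11 * 60 * 1000L)); // 11 minutes after A: outside the window
             System.out.println(join(arrivals, fiveMinutes)); // [left=A, right=a]
         }
     }
     ```

     Because both sides are buffered, the join fires no matter which side arrives first; the window only bounds how far apart the two timestamps may be.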
  10. KTable-KTable Join
      ● KTable-KTable joins are designed to be consistent with their counterparts in relational databases. They are always non-windowed joins.
      ● The changelog streams of both KTables are materialized into local state stores that represent the latest snapshot of each table. The join result is a new KTable that represents the changelog stream of the join operation.

      ```java
      import org.apache.kafka.streams.kstream.KTable;

      // Assumes left and right are KTables with the same key type (String here).
      KTable<String, String> joined = left.join(right,
          (leftValue, rightValue) -> "left=" + leftValue + ", right=" + rightValue /* ValueJoiner */
      );
      ```
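      The update behavior can be sketched with a hypothetical plain-Java model of an inner KTable-KTable join (not the Kafka Streams API). Each side is materialized as a map of latest values, and an update on either side looks up the other side and, when both values exist, appends the new join result for that key to an output changelog. Deletes (tombstones) and record caching, which can collapse consecutive updates, are left out, and `TableTableJoin` is an illustrative name.

      ```java
      import java.util.*;

      // Hypothetical model of a non-windowed KTable-KTable inner join.
      // Both sides keep the latest value per key; an update on EITHER side may emit output.
      class TableTableJoin {
          private final Map<String, String> left = new HashMap<>();
          private final Map<String, String> right = new HashMap<>();
          final List<String> changelog = new ArrayList<>(); // changelog of the join result table

          void updateLeft(String key, String value) {
              left.put(key, value);
              emit(key);
          }

          void updateRight(String key, String value) {
              right.put(key, value);
              emit(key);
          }

          // Inner join: only emit once both tables hold a value for the key.
          private void emit(String key) {
              String l = left.get(key);
              String r = right.get(key);
              if (l != null && r != null) {
                  changelog.add(key + " -> left=" + l + ", right=" + r);
              }
          }

          public static void main(String[] args) {
              TableTableJoin join = new TableTableJoin();
              join.updateLeft("k1", "A");  // no output yet: right table has no k1
              join.updateRight("k1", "a"); // k1 -> left=A, right=a
              join.updateLeft("k1", "B");  // k1 -> left=B, right=a (replaces the previous result)
              System.out.println(join.changelog);
          }
      }
      ```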
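  11. KStream-KTable Join
      ● KStream-KTable joins are asymmetric, non-windowed joins. They let you perform a table lookup against a KTable every time a new record is received from the KStream.
      ● In contrast to stream-stream and table-table joins, which are both symmetric, a stream-table join is asymmetric: only records arriving on the stream side trigger a join, while updates to the table only change the state used by later lookups.

      ```java
      import org.apache.kafka.common.serialization.Serdes;
      import org.apache.kafka.streams.kstream.KStream;

      // Assumes left is a KStream<String, Long> and right is a KTable with String keys.
      // This serde-argument overload is deprecated since Kafka 1.0 in favor of passing Joined.
      KStream<String, String> joined = left.join(right,
          (leftValue, rightValue) -> "left=" + leftValue + ", right=" + rightValue, /* ValueJoiner */
          Serdes.String(), /* key */
          Serdes.Long()    /* left value */
      );
      ```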
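      The asymmetry shows up in a hypothetical plain-Java model of an inner KStream-KTable join (not the Kafka Streams API): a stream record is joined with the table value current at the moment it arrives, while table updates only change state. `StreamTableJoin` and its methods are illustrative names.

      ```java
      import java.util.*;

      // Hypothetical model of a KStream-KTable inner join.
      // Only stream records trigger output; table updates just change the lookup state.
      class StreamTableJoin {
          private final Map<String, String> table = new HashMap<>();
          final List<String> output = new ArrayList<>();

          void onTableUpdate(String key, String value) {
              table.put(key, value); // state change only, no join output
          }

          void onStreamRecord(String key, String value) {
              String tableValue = table.get(key);
              if (tableValue != null) { // inner join: drop stream records with no table match
                  output.add("left=" + value + ", right=" + tableValue);
              }
          }

          public static void main(String[] args) {
              StreamTableJoin join = new StreamTableJoin();
              join.onStreamRecord("k1", "A"); // dropped: table has no k1 yet
              join.onTableUpdate("k1", "a");  // no output
              join.onStreamRecord("k1", "B"); // left=B, right=a
              System.out.println(join.output);
          }
      }
      ```

      The first stream record is dropped because the table has no entry for its key yet, even though one arrives later; the order in which stream and table records are processed therefore affects the output.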
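  12. KStream-GlobalKTable Join
      ● KStream-GlobalKTable joins are always non-windowed joins.
      ● They differ from KStream-KTable joins in the following ways:
        ○ They allow for efficient star joins, joining a large-scale fact stream with dimension tables.
        ○ They allow for joining against foreign keys: the lookup key is derived from the stream record instead of having to be the stream's own key.
        ○ They are often more efficient than their partitioned KTable counterpart.

      ```java
      import org.apache.kafka.streams.kstream.KStream;

      // Assumes left is a KStream with String keys and right is a GlobalKTable with Integer keys,
      // because the KeyValueMapper below maps each stream record to an Integer lookup key.
      KStream<String, String> joined = left.join(right,
          (leftKey, leftValue) -> leftKey.length(), /* KeyValueMapper: derive the key to look up against the table */
          (leftValue, rightValue) -> "left=" + leftValue + ", right=" + rightValue /* ValueJoiner */
      );
      ```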
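      The foreign-key lookup can be sketched as a hypothetical plain-Java model (not the Kafka Streams API). The global table is assumed to be a complete local copy of all keys, so a `BiFunction` standing in for the `KeyValueMapper` can derive any lookup key from the stream record. `GlobalTableJoin` is an illustrative name; the example reuses the slide's key-length mapping.

      ```java
      import java.util.*;
      import java.util.function.BiFunction;

      // Hypothetical model of a KStream-GlobalKTable inner join.
      // The lookup key is derived from each stream record, so it need not equal the stream key.
      class GlobalTableJoin {
          static <K, V, GK, GV> List<String> join(
                  List<Map.Entry<K, V>> stream,
                  Map<GK, GV> globalTable,           // full copy of the table on every instance
                  BiFunction<K, V, GK> keySelector) { // plays the role of the KeyValueMapper
              List<String> out = new ArrayList<>();
              for (Map.Entry<K, V> rec : stream) {
                  GK lookupKey = keySelector.apply(rec.getKey(), rec.getValue());
                  GV tableValue = globalTable.get(lookupKey);
                  if (tableValue != null) { // inner join: no match, no output
                      out.add("left=" + rec.getValue() + ", right=" + tableValue);
                  }
              }
              return out;
          }

          public static void main(String[] args) {
              // Same key derivation as the slide: look up the table by the length of the stream key.
              Map<Integer, String> byLength = Map.of(3, "three", 5, "five");
              List<Map.Entry<String, String>> stream = List.of(
                  Map.entry("abc", "x"), Map.entry("hello", "y"), Map.entry("hi", "z"));
              System.out.println(join(stream, byLength, (key, value) -> key.length()));
          }
      }
      ```

      Swapping the mapper for one that returns part of the record value (for example a product ID carried by an order) turns the same code into a foreign-key lookup.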
  13. References
      ● https://kafka.apache.org/documentation/streams
      ● https://docs.confluent.io/current/streams/developer-guide/dsl-api.html
      ● https://www.confluent.io/blog/crossing-streams-joins-apache-kafka
  14. Q&A
      Please email your queries to himani.arora@knoldus.in
