
Kafka Connect and Streams (Concepts, Architecture, Features)


A high-level introduction to Kafka Connect and Kafka Streams, two components of the open source Apache Kafka project, covering their concepts, architecture, and features.



  1. Kafka Connect and Kafka Streams: The Rise of Apache Kafka as a Streaming Platform. Kai Waehner, Technology Evangelist (kontakt@kai-waehner.de, LinkedIn @KaiWaehner, www.confluent.io, www.kai-waehner.de)
  2. Apache Kafka: A Distributed, Scalable Commit Log. Related talks: https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/63921 (2018), https://qconlondon.com/london2018/presentation/cloud-native-and-scalable-kafka-architecture (2018)
  3. Apache Kafka: A Distributed, Scalable Commit Log
  4. Apache Kafka: A Distributed, Scalable Commit Log
  5. A Brief History of Apache Kafka and Confluent (2012 to 2018): 0.7 cluster mirroring, 0.8 intra-cluster replication, 0.9 data integration (Connect API), 0.10 data processing (Streams API), 0.11 exactly-once semantics, 1.0 "Enterprise Ready", CP 4.1 KSQL GA
  6. Apache Kafka: The Rise of a Streaming Platform
  7. Apache Kafka as Single Shared Source of Truth for (Micro)Services [diagram: Orders, Customers, Payments, and Stock services sharing Kafka]
  8. Independent Dev / Test / Prod
  9. No Matter Where it Runs
  10. Kafka Connect: Declarative Data Integration for Apache Kafka
  11. Apache Kafka as Central Nervous System
  12. Kafka Connect
  13. Standalone Mode
  14. Distributed Mode
  15. Scalable Consumption
  16. Certified Connectors
  17. (Distributed) Workers
  18. Converters
  19. Avro Converter
  20. Single Message Transforms
  21. Single Message Transforms
      Modify events before storing in Kafka:
      • Mask sensitive information
      • Add identifiers
      • Tag events
      • Lineage/provenance
      • Remove unnecessary columns
      Modify events going out of Kafka:
      • Route high-priority events to faster data stores
      • Direct events to different Elasticsearch indexes
      • Cast data types to match the destination
      • Remove unnecessary columns
  22. Built-in Transformations
      • InsertField: add a field using either static data or record metadata
      • ReplaceField: filter or rename fields
      • MaskField: replace a field with a valid null value for its type (0, empty string, etc.)
      • ValueToKey: set the key to one of the value's fields
      • HoistField: wrap the entire event as a single field inside a Struct or a Map
      • ExtractField: extract a specific field from a Struct or Map and include only this field in the result
      • SetSchemaMetadata: modify the schema name or version
      • TimestampRouter: modify the topic of a record based on the original topic and timestamp; useful when a sink needs to write to different tables or indexes based on timestamps
      • RegexRouter: modify the topic of a record based on the original topic, a replacement string, and a regular expression
      • "Build your own": a Transformation is just a Java class (see the sketch below)
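      To illustrate the "build your own" bullet, here is a minimal sketch of a custom Single Message Transform. The class name StaticTopicRouter and the target.topic config key are hypothetical examples; only the Transformation interface and ConnectRecord.newRecord(...) come from the Connect API.

      import java.util.Map;
      import org.apache.kafka.common.config.ConfigDef;
      import org.apache.kafka.connect.connector.ConnectRecord;
      import org.apache.kafka.connect.transforms.Transformation;

      // Hypothetical SMT that routes every record to one fixed topic.
      public class StaticTopicRouter<R extends ConnectRecord<R>> implements Transformation<R> {

          public static final String TOPIC_CONFIG = "target.topic";

          private static final ConfigDef CONFIG_DEF = new ConfigDef()
              .define(TOPIC_CONFIG, ConfigDef.Type.STRING, ConfigDef.Importance.HIGH,
                      "Topic to route all records to");

          private String targetTopic;

          @Override
          public void configure(Map<String, ?> configs) {
              targetTopic = (String) configs.get(TOPIC_CONFIG);
          }

          @Override
          public R apply(R record) {
              // newRecord(...) copies the record, replacing only the topic here.
              return record.newRecord(targetTopic, record.kafkaPartition(),
                      record.keySchema(), record.key(),
                      record.valueSchema(), record.value(),
                      record.timestamp());
          }

          @Override
          public ConfigDef config() {
              return CONFIG_DEF;
          }

          @Override
          public void close() {
              // no resources to release
          }
      }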
  23. Kafka Streams: Stream Processing natively on top of Apache Kafka, without an additional big data cluster
  24. Kafka Streams - Part of Apache Kafka
  25. Stream Processing: Data at Rest vs. Data in Motion
  26. Key Concepts
  27. Kafka Streams - Processor Topology
      1) Read input from Kafka
      2) Apply operators (forming a directed acyclic graph): filter / map / aggregation / joins; operators can be stateful
      3) Write the result back to Kafka
      (A minimal sketch of such a topology follows below.)
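      A minimal sketch of these three steps in the Java DSL; the topic names input-topic and output-topic are made-up examples:

      import org.apache.kafka.common.serialization.Serdes;
      import org.apache.kafka.streams.StreamsBuilder;
      import org.apache.kafka.streams.Topology;
      import org.apache.kafka.streams.kstream.Consumed;
      import org.apache.kafka.streams.kstream.Produced;

      public class TopologyExample {

          static Topology buildTopology() {
              StreamsBuilder builder = new StreamsBuilder();

              builder.stream("input-topic",                 // 1) read input from Kafka
                      Consumed.with(Serdes.String(), Serdes.String()))
                     .filter((key, value) -> value != null) // 2) stateless operators forming a DAG
                     .mapValues(String::toUpperCase)
                     .to("output-topic",                    // 3) write the result back to Kafka
                      Produced.with(Serdes.String(), Serdes.String()));

              return builder.build();
          }
      }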
  28. Kafka Streams - Runtime
  29. Kafka Streams - Distributed State
  30. Kafka Streams - Scaling
  31. Kafka Streams - Streams and Tables
  32. Kafka Streams - Streams and Tables
  33. KStream

      // Example: reading data from Kafka
      KStream<byte[], String> textLines = builder.stream("textlines-topic",
          Consumed.with(Serdes.ByteArray(), Serdes.String()));

      // Example: transforming data
      KStream<byte[], String> upperCasedLines =
          textLines.mapValues(String::toUpperCase);

  34. KTable

      // Example: aggregating data
      KTable<String, Long> wordCounts = textLines
          .flatMapValues(textLine -> Arrays.asList(textLine.toLowerCase().split("\\W+")))
          .groupBy((key, word) -> word)
          .count();
  35. Kafka Streams - a complete streaming microservice, ready for production at large scale. Three steps: app configuration, defining the processing (here: WordCount), starting the processing. A sketch of the full application follows below.
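      A sketch of the complete WordCount application along those three steps, assuming a local broker, the textlines-topic input from above, and a hypothetical wordcounts-topic output:

      import java.util.Arrays;
      import java.util.Properties;
      import org.apache.kafka.common.serialization.Serdes;
      import org.apache.kafka.streams.KafkaStreams;
      import org.apache.kafka.streams.StreamsBuilder;
      import org.apache.kafka.streams.StreamsConfig;
      import org.apache.kafka.streams.kstream.KStream;
      import org.apache.kafka.streams.kstream.KTable;
      import org.apache.kafka.streams.kstream.Produced;

      public class WordCountApp {
          public static void main(String[] args) {
              // 1) App configuration
              Properties props = new Properties();
              props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-app");
              props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
              props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
              props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

              // 2) Define processing (WordCount)
              StreamsBuilder builder = new StreamsBuilder();
              KStream<String, String> textLines = builder.stream("textlines-topic");
              KTable<String, Long> wordCounts = textLines
                  .flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\W+")))
                  .groupBy((key, word) -> word)
                  .count();
              wordCounts.toStream().to("wordcounts-topic",
                  Produced.with(Serdes.String(), Serdes.Long()));

              // 3) Start processing (and close cleanly on shutdown)
              KafkaStreams streams = new KafkaStreams(builder.build(), props);
              streams.start();
              Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
          }
      }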
  37. What if you are NOT a Java coder? [Chart: coding sophistication vs. population. Core developers use Java; KSQL expands the realm of stream processing to data engineers, BI analysts, and core developers who don't like Java.]
  38. KSQL is the Streaming SQL Engine for Apache Kafka
  39. KSQL - The Streaming SQL Engine for Apache Kafka
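      For a flavor of what that means, a small KSQL sketch; the pageviews stream, its columns, and its topic are made-up examples, not from the slides:

      -- Declare a stream over an existing Kafka topic
      CREATE STREAM pageviews (userid VARCHAR, pageid VARCHAR)
        WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');

      -- Continuously count views per page
      SELECT pageid, COUNT(*) FROM pageviews GROUP BY pageid;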
  40. Questions? Kai Waehner, Technology Evangelist (kontakt@kai-waehner.de, LinkedIn @KaiWaehner, www.confluent.io, www.kai-waehner.de)
