Event Hub (i.e. Kafka) in Modern Data Architecture

Today's modern data architectures and their implementations contain an Event Hub. What are the benefits of placing an Event Hub in a Modern Data (Analytics) Architecture? What exactly is an Event Hub and what capabilities should it provide? Why is Apache Kafka the most popular realization of an Event Hub?
These and many other questions will be answered in this session. The talk will start with a vendor-neutral definition of the capabilities of an Event Hub.
Then the session will highlight the different architecture styles that can be supported using an Event Hub (Kafka), such as Streaming Data Integration, Stream Analytics and Decoupled Event-Driven Applications, and how these can be combined into a unified architecture, making the Event Hub the central nervous system of an enterprise architecture. We will end with an overview of the Kafka ecosystem and a placement of the various components onto the Modern Data (Analytics) Architecture.

Published in: Data & Analytics

  1. Event Hub (Kafka) in Modern Data Architecture. Guido Schmutz, @gschmutz, http://guidoschmutz.wordpress.com
  2. Guido: working at Trivadis for more than 23 years; Consultant, Trainer, Platform Architect for Java, Oracle, SOA and Big Data / Fast Data; Oracle Groundbreaker Ambassador & Oracle ACE Director; @gschmutz, guidoschmutz.wordpress.com. 192nd edition. BASEL | BERN | BRUGG | BUKAREST | DÜSSELDORF | FRANKFURT A.M. | FREIBURG I.BR. | GENF | HAMBURG | KOPENHAGEN | LAUSANNE | MANNHEIM | MÜNCHEN | STUTTGART | WIEN | ZÜRICH
  3. What exactly is an Event Hub?
  4. Event Hub – as a starting point. [Diagram: Event Hub]
  5. Event Hub – an Infrastructure with these capabilities
     1. topic semantics (publish/subscribe): a message can be consumed by 0 to n consumers
     2. queue semantics: a message is consumed by exactly one consumer
     3. horizontally scalable: throughput increases with more resources
     4. auto-scaling: scaling up and down with load
     5. highly available: no single point of failure
     6. control/handle back-pressure
     7. durable: messages must not be lost
     8. schema-less: no knowledge of message content and format
     9. efficient support of Stream and Batch Consumers (offline and with a large backlog)
     10. (unlimited) retention of messages (long-term storage)
     11. guaranteed ordering of messages
     12. support for re-consumption of events
     13. access control: control over who can produce and consume which events
     14. interoperable: support for different clients
  6. Kafka – the most popular Event Hub
  7. Kafka – the most popular Event Hub. [Diagram: Kafka Cluster with Broker 1 to 3; ZooKeeper Ensemble with ZK 1 to 3; Schema Registry; Service 1; Management tools (Control Center, Kafka Manager, KAdmin); Producer 1 to 3 and kafkacat; Consumer 1 to 3.] Data retention: never, time-based (TTL) or size-based, log-compacted. Capabilities checklist: 1. topic semantics; 2. queue semantics; 3. horizontally scalable; 4. auto-scaling; 5. highly available; 6. back-pressure; 7. durable; 8. schema-less/opaque; 9. Stream and Batch Consumers; 10. (Unlimited) Retention; 11. Guaranteed ordering; 12. re-consumption of events; 13. Access Control; 14. Interoperable
  8. Event Hub – capabilities supported by Kafka (light grey = limited support)
     1. topic semantics (publish/subscribe): a message can be consumed by 0 to n consumers
     2. queue semantics: a message is consumed by exactly one consumer
     3. horizontally scalable: throughput increases with more resources
     4. auto-scaling: scaling up and down with load
     5. highly available: no single point of failure
     6. control/handle back-pressure
     7. durable: messages must not be lost
     8. schema-less: no knowledge of message content and format
     9. efficient support of Stream and Batch Consumers (offline and with a large backlog)
     10. (unlimited) retention of messages (long-term storage)
     11. guaranteed ordering of messages
     12. support for re-consumption of events
     13. access control: control over who can produce and consume which events
     14. interoperable: support for different clients
  9. Event Hub - Kafka Alternatives? Cloud Services?
     • Alternatives: traditional Message Brokers (with a lot of limitations regarding Event Hub capabilities), Apache Pulsar, Solace, Pravega (Dell Streaming Platform), Oracle AQ (Kafka API coming)
     • Cloud: Cloud Services, Cloud Services with Kafka API, Kafka Cloud Services
  10. Event Hub - core building block of a Modern Data Architecture
  11. Event Hub – as a starting point. [Diagram: Event Hub; streaming data sources (Vehicle, Environmental, Warehouse, E-Commerce)]
  12. Using Stream Data Integration for integrating various data sources. [Diagram: Event Hub, Stream Data Integration; streaming data sources (Vehicle, Environmental, Warehouse, E-Commerce)]
  13. Using Edge Computing and Stream Data Integration: MQTT as a gateway to Kafka. [Diagram: Event Hub, Stream Data Integration, Gateway; streaming data sources (Vehicle, Environmental, Warehouse, E-Commerce)]
  14. Stream Data Integration – Kafka Connect / StreamSets
      • Kafka Connect: declarative style, simple data flows; framework is part of Apache Kafka; many connectors available; Single Message Transforms (SMT)
      • StreamSets: GUI-based, drag-and-drop Data Flow Pipelines; both stream and batch processing (micro-batching); custom sources, sinks, processors
  15. Using Stream Analytics
      • Time Windowed State Management
      • Stream-to-Table Joins
      • Stream-to-Stream Joins
      • Event Pattern Detection
      • Machine Learning Model Execution (Inference) [1]
      [Diagram: Event Hub, Stream Analytics, Stream Data Integration, Gateway; streaming data sources (Vehicle, Environmental, Warehouse, E-Commerce)]
  16. Stream Analytics - Kafka Streams
      • Kafka Streams: programmatic API, "just" a Java library; fault-tolerant local state; Fixed, Sliding and Session Windowing; Stream-Stream / Stream-Table Joins; at-least-once and exactly-once
      • ksqlDB: stream processing with zero coding using a SQL-like language; built on top of Kafka Streams; interactive (CLI) and headless (cmd file)
      [Diagram: a Java Application embedding Kafka Streams reads the trucking_driver topic from the Kafka Broker; a ksqlDB Engine (built on Kafka Streams) reads the same topic and is driven by ksqlDB REST Commands and the ksqlDB CLI, with push and pull queries]
  17. Using Stream Analytics: push results back to a new topic so other interested parties can use them too! [Diagram: Event Hub, Stream Analytics, Stream Data Integration, Gateway; streaming data sources (Vehicle, Environmental, Warehouse, E-Commerce)]
  18. Using Stream Data Integration to call back to the data source (to an actuator). [Diagram: Event Hub, Stream Analytics, Stream Data Integration, Gateway; streaming data sources (Vehicle, Environmental, Warehouse, E-Commerce)]
  19. Using Streaming Visualization: ksqlDB pull queries or Kafka Streams Interactive Queries allow querying the state of a stream processor [2]. [Diagram: Event Hub, Stream Analytics, Streaming Visualize, Stream Data Integration, Gateway; streaming data sources (Vehicle, Environmental, Warehouse, E-Commerce)]
  20. (Right-Time) Legacy Systems Integration: Stream-to-Table join. [Diagram: Event Hub, Stream Analytics, Streaming Visualize, Stream Data Integration, CDC, Gateway, Legacy App; streaming data sources (Vehicle, Environmental, Warehouse, E-Commerce); legacy data sources]
  21. Kafka as an Event Hub (callout on capability 10). Capabilities checklist: 1. topic semantics; 2. queue semantics; 3. horizontally scalable; 4. auto-scaling; 5. highly available; 6. back-pressure; 7. durable; 8. schema-less/opaque; 9. Stream and Batch Consumers; 10. (Unlimited) Retention; 11. Guaranteed ordering; 12. re-consumption of events; 13. Access Control; 14. Interoperable
  22. (Right-Time) Legacy Systems Integration. [Diagram: Event Hub, Stream Analytics, Streaming Visualize, Stream Data Integration, CDC, Gateway, Legacy App, Machine, IIoT; streaming data sources (Vehicle, Environmental, Warehouse, E-Commerce); legacy data sources]
  23. Providing "Materialized Views" in RDBMS or NoSQL Datastores: bootstrap the "Materialized View" from event history. [Diagram: Event Hub, Stream Analytics, Streaming Visualize, Stream Data Integration, CDC, Gateway, Legacy App, Machine, IIoT, NoSQL, RDBMS, Micro-Batch Visualize; streaming data sources (Vehicle, Environmental, Warehouse, E-Commerce); legacy data sources]
  24. Modern Event-Driven Apps (aka Microservices): a microservice participates as both a consumer and a producer of events. [Diagram: Event Hub, Stream Analytics, Streaming Visualize, Stream Data Integration, CDC, Gateway, Legacy App, Machine, IIoT, 1st Microservice, NoSQL, RDBMS, Micro-Batch Visualize; streaming data sources (Vehicle, Environmental, Warehouse, E-Commerce); legacy data sources]
  25. Modern Event-Driven Apps (aka Microservices): the 2nd microservice consumes events from the 1st; bootstrap from event history [3]. [Diagram: Event Hub, Stream Analytics, Streaming Visualize, Stream Data Integration, CDC, Gateway, Legacy App, Machine, IIoT, 1st Microservice, 2nd Microservice, NoSQL, RDBMS, Micro-Batch Visualize; streaming data sources (Vehicle, Environmental, Warehouse, E-Commerce); legacy data sources]
  26. Bi-Directional Legacy Systems Integration [4]. [Diagram: Event Hub, Stream Analytics, Streaming Visualize, Stream Data Integration, CDC, AQ, Gateway, Legacy App, Machine, IIoT, 1st Microservice, 2nd Microservice, NoSQL, RDBMS, Micro-Batch Visualize; streaming data sources (Vehicle, Environmental, Warehouse, E-Commerce); legacy data sources]
  27. Hybrid Cloud Scenario. [Diagram: two Event Hubs connected by Event Hub Mirroring; Stream Analytics, Streaming Visualize, Stream Data Integration, CDC, AQ, Gateway, Legacy App, Machine, IIoT, 1st Microservice, 2nd Microservice, NoSQL, RDBMS, Micro-Batch Visualize; streaming data sources (Vehicle, Environmental, Warehouse, E-Commerce); legacy data sources]
  28. Event Hub as "Virtualized" Data Lake for Batch Analytics. [Diagram: Event Hub, Stream Analytics, Batch Analytics, Streaming Visualize, Stream Data Integration, CDC, AQ, Gateway, Legacy App, Machine, IIoT, 1st Microservice, 2nd Microservice, NoSQL, RDBMS, Micro-Batch Visualize; streaming data sources (Vehicle, Environmental, Warehouse, E-Commerce); legacy data sources]
  29. Kafka Storage (callout on capability 10). [Diagram: Local Storage, where Broker 1 to 3 hold hot and cold data, versus Tiered Storage (Confluent Enterprise), where Broker 1 to 3 hold hot data and Object Storage holds cold data.] Data retention: never, time-based (TTL) or size-based, log-compacted. Capabilities checklist: 1. topic semantics; 2. queue semantics; 3. horizontally scalable; 4. auto-scaling; 5. highly available; 6. back-pressure; 7. durable; 8. schema-less/opaque; 9. Stream and Batch Consumers; 10. (Unlimited) Retention; 11. Guaranteed ordering; 12. re-consumption of events; 13. Access Control; 14. Interoperable
  30. "Materialized" Data Lake for Batch Analytics. [Diagram: Event Hub, Stream Analytics, Batch Analytics, Streaming Visualize, Batch Visualize, Micro-Batch Visualize, Stream Data Integration, Batch Data Integration, CDC, Gateway, Legacy App, Machine, IIoT, 1st Microservice, 2nd Microservice, NoSQL, RDBMS, Data Lake / DWH; streaming data sources (Vehicle, Environmental, Warehouse, E-Commerce); legacy data sources]
  31. Serverless/Function as a Service (FaaS). [Diagram: Event Hub, Stream Analytics, Batch Analytics, Streaming Visualize, Batch Visualize, Micro-Batch Visualize, Stream Data Integration, Batch Data Integration, CDC, Gateway, Legacy App, Machine, IIoT, 1st Microservice, 2nd Microservice, Serverless FaaS, NoSQL, RDBMS, Data Lake / DWH; streaming data sources (Vehicle, Environmental, Warehouse, E-Commerce); legacy data sources]
  32. Event Hub becomes the central nervous system for your information! [Diagram: Event Hub, Stream Analytics, Batch Analytics, Streaming Visualize, Batch Visualize, Micro-Batch Visualize, Stream Data Integration, Batch Data Integration, CDC, Gateway, Legacy App, Machine, IIoT, 1st Microservice, 2nd Microservice, Serverless FaaS, NoSQL, RDBMS, Data Lake / DWH; streaming data sources (Vehicle, Environmental, Warehouse, E-Commerce); legacy data sources]
  33. Event Hub becomes the central nervous system for your information! Log as a first-class citizen! Turning the database inside out! [Diagram: Event Hub, Stream Analytics, Batch Analytics, Streaming Visualize, Batch Visualize, Micro-Batch Visualize, Stream Data Integration, Batch Data Integration, CDC, Gateway, Legacy App, Machine, IIoT, two Microservices, Serverless FaaS, NoSQL, RDBMS, Data Lake / DWH; streaming data sources (Vehicle, Environmental, Warehouse, E-Commerce); legacy data sources]
  34. Summary
  35. Ref Architecture: Modern Data Platform. [Diagram labels. Sources: Bulk Source and Event Source (Location, DB Extract, File, Weather, DB, IoT Data, Mobile Apps, Social). Platform: Edge Node (Rules, Event Hub, Storage); Event Hub; Stream Processing Cluster (Stream Processor, Model / State); Microservice Cluster (Microservice, Data, API); Big Data Platform / Hadoop Cluster (Storage: Raw, Refined/UsageOpt; Parallel Processing; Query Engine); Rules Engine; Governance (Data Catalog). Consumers: BI Apps, Data Science Workbench, Enterprise App, Enterprise Data Warehouse. Flows and interfaces: Event Stream, Bulk Data Flow, File Import / SQL Import, SQL Export, Service, SQL / Search, "Native" Raw, RDBMS]
  36. Reference
      1. Stream Processing Concepts and Frameworks
      2. Streaming Visualization
      3. Building event-driven (Micro)Services with Apache Kafka
      4. Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
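
The topic semantics, queue semantics, guaranteed ordering and re-consumption capabilities from the capability list on slide 5 can be illustrated with a minimal in-memory sketch. It is not Kafka and uses no Kafka API: the `EventLog` class, its `publish`/`poll`/`seek` methods and the sample readings are hypothetical names chosen for illustration, and a single shared offset per group is a simplification that stands in for Kafka's partition-based consumer groups.

```python
from collections import defaultdict


class EventLog:
    """Append-only, in-memory log for a single partition (illustration only)."""

    def __init__(self):
        self._events = []                   # retained events, in arrival order
        self._offsets = defaultdict(int)    # consumer group -> next offset to read

    def publish(self, event):
        """Append an event and return the offset it was stored at."""
        self._events.append(event)
        return len(self._events) - 1

    def poll(self, group, max_events=10):
        """Return the next events for a group and advance that group's offset."""
        start = self._offsets[group]
        batch = self._events[start:start + max_events]
        self._offsets[group] = start + len(batch)
        return batch

    def seek(self, group, offset):
        """Reposition a group, e.g. to offset 0 to re-consume the history."""
        self._offsets[group] = offset


log = EventLog()
for reading in ["speed=80", "speed=95", "speed=120"]:
    log.publish(reading)

# Topic semantics: independent groups each see every event, in publish order.
analytics = log.poll("analytics")
archive = log.poll("archive")

# Queue semantics: two workers polling under the same group share the events,
# so each event is handed to exactly one of them.
worker_a = log.poll("workers", max_events=2)
worker_b = log.poll("workers", max_events=2)

# Re-consumption: events are retained after being read, so a group can rewind.
log.seek("analytics", 0)
replayed = log.poll("analytics")
```

Because reading only moves a per-group offset and never deletes anything, the same retained log serves both semantics and also allows replay, which is the property the later slides rely on when bootstrapping a materialized view or a new microservice from event history.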
