
Introducing AMQ Streams data streaming with Apache Kafka



Apache Kafka has emerged as a leading platform for building real-time data pipelines. It provides support for high-throughput/low-latency messaging, as well as sophisticated development options that cover all the stages of a distributed data-streaming pipeline—from ingestion to processing.

In this session, we'll introduce and demonstrate AMQ Streams, an enterprise-grade distribution of Apache Kafka. We'll showcase how AMQ Streams can be deployed on OpenShift to simplify the task of configuring, deploying, and managing Kafka clusters and Kafka-based applications at scale.

With AMQ Streams joining Broker and Interconnect, AMQ now offers a rich suite of messaging technologies covering every type of connectivity challenge, for example, IoT data ingestion, enterprise application integration, secure cross-data center connectivity, and data streaming. In this session, we'll also compare the different AMQ messaging engines and highlight their respective sweet spots.

Published in: Software


  1. INTRODUCING AMQ STREAMS: Data streaming with Apache Kafka
     David Ingham, Director, Software Engineering (@dingha)
     Paolo Patierno, Principal Software Engineer (@ppatierno)
     May 10, 2018
  2. Red Hat AMQ 2018: Flexible messaging for the enterprise, cloud and Internet of Things
     Broker
       • Queuing and pub/sub
       • Rich feature set
       • JMS 2.0 compliance
       • Best-in-class perf
       • Based on Apache ActiveMQ Artemis
     Interconnect
       • Message routing
       • Secure messaging backbone for hybrid cloud
       • Based on Apache Qpid Dispatch Router
     Streams (preview)
       • Durable pub/sub
       • Replayable streams
       • Highly scalable
       • Based on Apache Kafka
     Online (preview)
       • Scalable, “self-service” messaging-as-a-service utility based on OpenShift
       • Available for self-managed and Red Hat-managed deployments (AMQ Online)
     Diagram labels: Standard Protocols, Polyglot Clients, Common Management
  3. What is Apache Kafka?
     A publish/subscribe messaging system?
     A streaming data platform?
     A distributed, horizontally scalable, fault-tolerant commit log?
  4. Apache Kafka Concepts
     • Messages are sent to and received from a topic
       • Topics are split into one or more partitions (aka shards)
       • All actual work is done at the partition level; the topic is just a virtual object
     • Each message is written to exactly one partition
       • Partitioning is usually done based on the message key
       • Message ordering within a partition is fixed
     • Retention
       • Based on size / message age
       • Compacted based on message key
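The key-based partitioning described on this slide can be sketched in a few lines. This is a minimal illustration, not Kafka code: `choose_partition` and `append_all` are hypothetical helpers, and CRC32 stands in for the murmur2 hash that Kafka's Java client uses by default for keyed messages.

```python
import zlib
from collections import defaultdict


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index (CRC32 is a stand-in hash)."""
    return zlib.crc32(key) % num_partitions


def append_all(messages, num_partitions):
    """Append (key, value) pairs to in-memory partitions, in arrival order."""
    partitions = defaultdict(list)
    for key, value in messages:
        partitions[choose_partition(key, num_partitions)].append((key, value))
    return partitions


messages = [(b"sensor-a", 1), (b"sensor-b", 1), (b"sensor-a", 2), (b"sensor-a", 3)]
log = append_all(messages, num_partitions=3)

# Every message with the same key lands in the same partition, in send order.
p = choose_partition(b"sensor-a", 3)
print([value for key, value in log[p] if key == b"sensor-a"])
```

Because the same key always hashes to the same partition, per-key ordering follows from per-partition ordering; no ordering is implied across partitions.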
  5. Kafka concepts: Topics & partitions
     [Figure: a Producer appends messages to Partition 0, Partition 1 and Partition 2; each partition is a sequence of numbered offsets running from old to new.]
  6. Kafka concepts: Topics & partitions
     [Figure: a Consumer reads messages from Partition 0, Partition 1 and Partition 2; each partition is a sequence of numbered offsets running from old to new.]
  7. Kafka concepts: High availability
     [Figure: Broker 1, Broker 2 and Broker 3 each hold replicas of T1 - P1, T1 - P2, T2 - P1 and T2 - P2.]
     Leaders and followers are spread across the cluster
  8. Kafka concepts: High availability
     [Figure: Broker 1, Broker 2 and Broker 3 each hold replicas of T1 - P1, T1 - P2, T2 - P1 and T2 - P2.]
     If a broker hosting a partition leader goes down, a new leader for that partition is elected on a different node
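A toy model of the failover shown on these two slides, assuming three brokers that each hold a replica of every partition. `elect_leaders` is a hypothetical helper; picking the first surviving replica is a simplification, since real Kafka elects leaders through its controller and only from in-sync replicas.

```python
def elect_leaders(assignment, leaders, failed_broker):
    """Return the partition -> leader map after one broker fails.

    assignment: partition name -> ordered list of broker ids holding a replica
    leaders:    partition name -> broker id of the current leader
    """
    new_leaders = {}
    for partition, replicas in assignment.items():
        current = leaders[partition]
        if current != failed_broker:
            new_leaders[partition] = current  # leader unaffected
            continue
        survivors = [b for b in replicas if b != failed_broker]
        # Simplification: promote the first surviving replica, if any.
        new_leaders[partition] = survivors[0] if survivors else None
    return new_leaders


assignment = {
    "T1-P1": [1, 2, 3],
    "T1-P2": [2, 3, 1],
    "T2-P1": [3, 1, 2],
    "T2-P2": [1, 3, 2],
}
leaders = {partition: replicas[0] for partition, replicas in assignment.items()}
after = elect_leaders(assignment, leaders, failed_broker=1)
print(after)
```

Only the partitions that broker 1 was leading change leader; the others are untouched.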
  9. AMQ Broker & AMQ Streams: Key differences
     Aspect | AMQ Broker (ActiveMQ Artemis) | AMQ Streams (Kafka)
     Model | “Smart broker, dumb clients” | “Dumb broker, smart clients”
     Durability | Volatile or durable storage | Durable storage
     Storage duration | Temporary storage of messages | Potential long-term storage of messages
     Message retention | Retained until consumed | Retained until expired or compacted
     Consumer state | Broker managed | Client managed (can be stored in broker)
     Selectors | Yes, per consumer | No
     Stream replay | No | Yes
     High-availability | Replication | Replication
     Protocols | AMQP, MQTT, OpenWire, Core, STOMP | Kafka protocol
     Delivery guarantees | Best-effort or guaranteed | Best-effort or guaranteed
  10. AMQ Streams: Why should you use AMQ Streams?
     ● Scalability and performance
       ○ Designed for horizontal scalability
     ● Message ordering guarantee
       ○ At partition level
     ● Message rewind/replay
       ○ “Long term” storage
       ○ Application state can be reconstructed by replaying the messages
       ○ Combined with compacted topics, Kafka can be used as a key-value store
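The replay and compaction points can be illustrated with an in-memory log of `(key, value)` records. This is a sketch only: `replay` and `compact` are hypothetical helpers, and a `None` value models the tombstone (null value) that marks a key as deleted in a compacted topic.

```python
def replay(log):
    """Rebuild key-value state by applying every record in order."""
    state = {}
    for key, value in log:
        if value is None:          # tombstone: the key was deleted
            state.pop(key, None)
        else:
            state[key] = value
    return state


def compact(log):
    """Keep only the latest record per key, preserving relative order."""
    last_index = {key: i for i, (key, _) in enumerate(log)}
    return [record for i, record in enumerate(log) if last_index[record[0]] == i]


log = [("a", 1), ("b", 2), ("a", 3), ("b", None), ("c", 4)]
print(replay(log))
print(compact(log))
```

Replaying the compacted log yields the same state as replaying the full log, which is why a compacted topic can serve as a key-value store.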
  11. AMQ Streams: What’s the catch?
     ● Kafka protocol cannot be proxied
       ○ Clients need access to all brokers in the cluster
       ○ Producers/consumers might need to maintain a large number of TCP connections
       ○ Proxying via HTTP REST or AMQP could be a solution
     ● Dumb broker, smart clients
       ○ Carefully decide the “right” number of partitions for each topic
       ○ Adding partitions can change the destination partition for “keyed” messages
       ○ Removing partitions is not possible
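Why adding partitions can move keyed messages follows from hash-modulo partitioning. The sketch below is an illustration under assumptions: `choose_partition` is a hypothetical helper, CRC32 stands in for the client's real hash, and the `device-N` keys are made up.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Hash-modulo partition selection (CRC32 is a stand-in hash)."""
    return zlib.crc32(key) % num_partitions


keys = [f"device-{i}".encode() for i in range(100)]

# Compare each key's destination before and after growing from 3 to 4 partitions.
moved = [k for k in keys if choose_partition(k, 3) != choose_partition(k, 4)]
print(f"{len(moved)} of {len(keys)} keys map to a different partition")
```

Records already written stay where they are, so after the change new messages for a moved key go to a different partition than its history, and per-key ordering no longer holds across the change.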
  12. AMQ Streams on OpenShift
     ● Based on the open source project Strimzi
     ● Provides:
       ○ Docker images for running Apache Kafka and Zookeeper
       ○ Tooling for managing and configuring Apache Kafka clusters and topics
     ● Follows the Kubernetes “operator” model
     ● OpenShift 3.9 and higher
  13. AMQ Streams on OpenShift: What is Strimzi?
     ● Open source project focused on running Apache Kafka on Kubernetes and OpenShift
     ● Licensed under Apache License 2.0
     ● Web site:
     ● GitHub:
     ● Slack:
     ● Mailing list:
     ● Twitter: @strimziio
  14. AMQ Streams on OpenShift: The challenges
     ● Apache Kafka is *stateful*, which means we require …
       ○ … a stable broker identity
       ○ … a way for the brokers to discover each other on the network
       ○ … durable broker state (i.e., the messages)
       ○ … the ability to recover broker state after a failure
     ● All the above are true for Apache Zookeeper as well
     ● StatefulSets, PersistentVolumeClaims, Services can help but …
  15. It’s not easy!
  16. AMQ Streams on OpenShift: Goals
     ● Simplifying the Apache Kafka deployment on OpenShift
     ● Using the OpenShift native mechanisms for...
       ○ Provisioning the cluster
       ○ Managing the topics
     ● … thereby removing the need to use Kafka command-line tools
     ● Providing a better integration with applications running on OpenShift
       ○ microservices, data streaming, event-sourcing, etc.
  17. AMQ Streams on OpenShift: The “Operator” model
     ● An application used to create, configure and manage other complex applications
       ○ Contains specific domain / application knowledge
     ● Controller operates based on input from Config Maps or Custom Resource Definitions
       ○ User describes the desired state
       ○ Controller applies this state to the application
     ● It watches the *desired* state and the *actual* state …
       ○ … and takes appropriate actions when they differ
     Loop: Observe, Analyze, Act
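The observe/analyze/act loop can be sketched as a reconciliation function over two dictionaries. This is a generic illustration of the operator pattern, not Strimzi's controller code; the function names, the resource names and the `kafka_nodes` field are hypothetical.

```python
def analyze(desired, actual):
    """Compare desired and actual state and list the actions needed."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name, spec))
        elif actual[name] != spec:
            actions.append(("update", name, spec))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name, None))
    return actions


def act(actual, actions):
    """Apply the actions to the (in-memory) actual state."""
    for verb, name, spec in actions:
        if verb == "delete":
            del actual[name]
        else:
            actual[name] = dict(spec)


def reconcile(desired, actual):
    """One pass of the loop: observe both states, analyze, then act."""
    actions = analyze(desired, actual)
    act(actual, actions)
    return actions


desired = {"my-cluster": {"kafka_nodes": 3}}
actual = {"my-cluster": {"kafka_nodes": 1}, "old-cluster": {"kafka_nodes": 1}}
print(reconcile(desired, actual))
print(reconcile(desired, actual))  # nothing left to do on the second pass
```

A real controller would run this on every change event and act through the Kubernetes API rather than on a dictionary; the point here is that the loop is idempotent once actual matches desired.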
  18. AMQ Streams on OpenShift: Config Map versus Custom Resource Definitions
     ● Controllers currently use Config Maps for configuration
       ○ The main advantage of Config Maps is that no special permissions are needed to install Strimzi/AMQ Streams on OpenShift
     ● CRDs have some advantages as well
       ○ Flexible data structure
       ○ Possibility to set permissions for the CRD resources
     ● Support for CRDs is on the backlog
  19. Cluster Controller: Creating and managing Apache Kafka clusters
     [Figure: the Cluster Controller reads a Config Map and manages Zookeeper and Kafka.]
  20. Cluster Controller: Creating a cluster
     ● Able to deploy two types of clusters
       ○ Kafka (alongside a Zookeeper ensemble)
       ○ Kafka Connect (including S2I support for custom connector plugins)
     ● The ConfigMap specifies
       ○ Number of nodes
       ○ Broker configuration
       ○ Healthchecks
       ○ Metrics exported for Prometheus
     ● Ephemeral or persistent storage
  21. Cluster Controller: Managing a cluster
     ● Modify the ConfigMap to update the cluster
       ○ Scale up/down
       ○ Configuration changes (rolling updates)
     ● Delete the ConfigMap to de-provision the cluster
       ○ Persisted data will be deleted according to the user configuration
  23. Topic Controller: Creating and managing Kafka topics
     [Figure: the Topic Controller reads a Config Map and manages topics in Kafka, which is backed by Zookeeper.]
  24. Topic Controller: Creating and managing Kafka topics
     ● Topics can be created by…
       ○ Writing a ConfigMap
       ○ Interacting directly with the Kafka cluster
       ○ Other components automatically (Kafka Connect, Kafka Streams)
     ● Consistency is handled with a 3-way diff between
       ○ Our own Zookeeper store
       ○ Apache Kafka/Zookeeper
       ○ ConfigMaps
  25. Topic Controller: Creating and managing Kafka topics
     [Figure: the Topic Controller performs a 3-way diff between the Config Map, the Kafka topics, and Zookeeper (the Topic Controller’s own storage).]
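One way to picture the 3-way diff is as a merge of two edited copies against a common base, where the controller's own store plays the role of the base. The sketch below is an assumption-laden illustration, not the Topic Controller's algorithm: `three_way_merge` and its `prefer` tie-break parameter are hypothetical, and the slides do not say how conflicting edits are resolved.

```python
_MISSING = object()  # marks a key that is absent on one side


def three_way_merge(base, config_map, kafka, prefer="kafka"):
    """Merge topic settings from the ConfigMap and Kafka against a base copy.

    Returns (merged, conflicts). A key is a conflict when both sides
    changed it to different values; `prefer` picks the winner (assumption).
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(config_map) | set(kafka)):
        b = base.get(key, _MISSING)
        c = config_map.get(key, _MISSING)
        k = kafka.get(key, _MISSING)
        if c == k:            # both sides agree
            value = c
        elif c == b:          # only Kafka changed it
            value = k
        elif k == b:          # only the ConfigMap changed it
            value = c
        else:                 # both changed it, differently
            conflicts.append(key)
            value = k if prefer == "kafka" else c
        if value is not _MISSING:
            merged[key] = value
    return merged, conflicts


base = {"partitions": 3, "retention.ms": "604800000"}
config_map = {"partitions": 6, "retention.ms": "604800000"}
kafka = {"partitions": 3, "retention.ms": "86400000", "cleanup.policy": "compact"}
print(three_way_merge(base, config_map, kafka))
```

Keeping a private copy of the last agreed state is what lets the controller tell which side changed, instead of only seeing that the two sides differ.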
  27. AMQ Streams on OpenShift: Planned for 1.0
     ● Detailed Kafka configuration (buffers, topic defaults, etc.)
     ● TLS encryption and authentication
     ● Authentication options
       ○ TLS Client Authentication
       ○ SASL-SCRAM mechanism with credentials stored in Zookeeper
       ○ SASL-PLAIN mechanism with credentials stored in an OpenShift secret
     ● Authorization using ACL rules stored in Zookeeper
     ● Resources configuration (memory and CPU limits, ...)
     ● Scaling (with manual partition reassignment)
     ● Managing topics
  28. AMQ Streams on OpenShift: Planned after 1.0
     ● Kafka updates
     ● Automated partition balancing and automated scaling
     ● Additional authentication options
       ○ Using Red Hat SSO, LDAP, Kubernetes tokens
     ● Exposing Kafka cluster outside of OpenShift
     ● Service broker integration
     ● Integrated AMQP, MQTT and HTTP bridge
     ● Integrated Schema registry
     ● MirrorMaker Operator
  30. IoT and Kafka Streams
     [Figure: Device App(s) publish to the iot-temperature topic in Kafka (backed by Zookeeper); a Kafka Streams App reads iot-temperature and writes to iot-temperature-max; a Consumer App reads iot-temperature-max.]
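The slide names only the topics, so the processing step below is a guess at what the demo's streams application computes: the maximum temperature per fixed time window. It is plain Python rather than the Kafka Streams API, and `max_per_window`, the window size and the sample readings are all hypothetical.

```python
def max_per_window(readings, window_seconds):
    """Return (window_start, max_temperature) pairs, ordered by window start.

    readings: iterable of (timestamp_seconds, temperature) tuples, standing in
    for records consumed from the iot-temperature topic.
    """
    maxima = {}
    for timestamp, temperature in readings:
        start = timestamp - timestamp % window_seconds  # tumbling window start
        if start not in maxima or temperature > maxima[start]:
            maxima[start] = temperature
    return sorted(maxima.items())


readings = [(0, 20), (2, 25), (4, 22), (5, 19), (9, 21), (11, 30)]
# Each result would be published to the iot-temperature-max topic.
print(max_per_window(readings, window_seconds=5))
```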
  31. Summary
     ● AMQ Streams is a distribution of Apache Kafka included as part of the AMQ product
     ● Simplifies the deployment, management and monitoring of Kafka on OpenShift using the Operator approach
     ● Fully open source, based on Strimzi
     ● Available now as a Developer Preview. Signup:
     ● Beta tentatively planned for Summer 2018, with GA in late Fall 2018
     Diagram labels: Red Hat AMQ (AMQ Broker, AMQ Interconnect, AMQ Streams, AMQ Online), Standard Protocols, Polyglot Clients, Common Mgmt
  32. Thank you! Any questions?
  33. LATER TODAY
     Thu May 10, 1:45 PM – 3:45 PM
     Red Hat AMQ Online – Messaging-as-a-Service
     Ulf Lilleengen, Paolo Patierno
     Moscone South 214 [W1098]
  34. Resources
     ● Strimzi :
     ● Apache Kafka :
     ● AMQ Streams Dev Preview :
     ● Demo :
  35. THANK YOU