
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp


Apache Kafka is the most widely used data streaming broker in the industry. It can easily handle millions of messages and is the foundation of many architectures based on events, microservices, orchestration, ... and now cloud environments. OpenShift is the most widely adopted Platform as a Service (PaaS). It is based on Kubernetes and helps companies easily deploy any kind of workload in a cloud environment. Thanks to many of its features, it is the foundation for many architectures based on stateless applications and for building new Cloud Native Applications. Strimzi is an open source community that implements a set of Kubernetes Operators to help you deploy and manage Apache Kafka brokers in OpenShift environments.

These slides introduce Strimzi as a new component on OpenShift to manage your Apache Kafka clusters.

Slides used at OpenShift Meetup Spain:
- https://www.meetup.com/es-ES/openshift_spain/events/261284764/

  1. Where Apache Kafka meets OpenShift
  2. oc whoami $ oc whoami rmarting, jromanmartin $ oc describe user rmarting jromanmartin Name: Jose Roman Martin Gil Created: 43 years ago Labels: father, husband, friend, runner, curious, red hatter, developer (in any order) Annotations: Principal Middleware Architect @ Red Hat Identities: GitHub: https://github.com/rmarting LinkedIn: https://www.linkedin.com/in/jromanmartin/ @jromanmartin
  3. Spoiler!! It is a true love story!
  4. What is Apache Kafka? Apache Kafka is a distributed system designed for streams. It is built to be horizontally scalable and fault-tolerant, works as a commit log, and enables distributed data streams and stream processing applications.
  5. What is Apache Kafka? ● Developed at LinkedIn back in 2010, open sourced in 2011 ● Designed to be fast, scalable, durable and available ● Distributed by nature ● Data partitioning (sharding) ● High throughput / low latency ● Ability to handle a huge number of consumers
  6. Main Use Cases ● Website Activity Tracker: rebuild the user activity tracking pipeline as a set of real-time publish-subscribe feeds; activity is published to central topics, with one topic per activity type ● Messaging: replacement of a traditional message broker, with better throughput, built-in partitioning, replication, fault-tolerance and strong durability ● Metrics: aggregation of statistics from distributed applications to produce centralized feeds of operational data
  7. Main Use Cases ● Stream Processing: enables continuous, real-time applications built to react to, process, or transform streams ● Data Integration: captures streams of events or data changes and feeds these to other data systems ● Log Aggregation: abstracts the details of files and gives event data as a stream of messages; offers good performance and stronger durability guarantees thanks to replication
  8. Apache Kafka Core Components: Zookeeper, Kafka, Applications, Admin tools
  9. Topics and Partitions ● Messages/records are sent to and received from a topic ○ Topics are split into one or more partitions ○ All actual work is done at the partition level; the topic is just a virtual object ● Each message is written to only one selected partition ○ Partitioning is usually done based on the message key ○ Message ordering within a partition is fixed ● Retention policies ○ Based on size / message age ○ Compacted based on message key ● Replication ○ Each partition can exist in one or more backup copies to achieve HA
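
As an aside not on the slides, the standard kafka-topics.sh tool bundled with Apache Kafka shows these ideas in practice: topics are created with a partition count, a replication factor, and a retention or compaction policy. The topic names and the localhost:9092 broker address below are illustrative assumptions.

    # Time/size-based retention: keep roughly 7 days or 1 GiB of data per partition
    bin/kafka-topics.sh --bootstrap-server localhost:9092 --create \
      --topic page-views --partitions 3 --replication-factor 2 \
      --config retention.ms=604800000 --config retention.bytes=1073741824

    # Compacted topic: only the latest record per message key is kept
    bin/kafka-topics.sh --bootstrap-server localhost:9092 --create \
      --topic user-profiles --partitions 3 --replication-factor 2 \
      --config cleanup.policy=compact
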
  10. Topics and Partitions - Producers [diagram: a producer appending new messages to the end of partitions 0, 1 and 2]
  11. Topics and Partitions - Consumers [diagram: a consumer reading messages from partitions 0, 1 and 2, from older to newer offsets]
  12. High Availability [diagram: partitions of topics T1 and T2 spread across brokers 1-3, with replication from leaders to followers]
  13. High Availability [diagram: after a broker failure, new leaders are elected on different brokers]
  14. High Availability [diagram: producers and consumers reading from and writing to the partition leaders]
  15. Partitions Assignment [diagram: a topic with partitions 0-3 assigned across the consumers of consumer groups 1 and 2]
  16. Consumer Groups Rebalancing [diagram: partitions 0-3 reassigned among the consumers of groups 1 and 2 after a rebalance]
  17. Max Parallelism & Idle Consumer [diagram: a consumer group with more consumers than partitions, leaving one consumer idle]
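
A minimal sketch, not from the slides, of how consumer groups behave in practice (topic name, group name and broker address are assumptions): consumers sharing a group.id split the partitions among themselves, so starting more consumers than there are partitions leaves some of them idle.

    # Start two consumers in the same group; each one receives a subset of the partitions
    bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
      --topic page-views --group group-1 &
    bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
      --topic page-views --group group-1 &

    # Inspect the partition assignment and lag of the group
    bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
      --describe --group group-1
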
  18. Applications ● They are really “smart” (unlike in “traditional” messaging) ● Configured with a “bootstrap servers” list for fetching the initial metadata ○ Where are the topics of interest? Connect to the brokers which hold the partition leaders ○ The producer specifies the destination partition ○ The consumer handles the message offsets to read ○ If an error happens, refresh the metadata (something has changed in the cluster) ● Batching on producing/consuming
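
As a hedged illustration of the bootstrap and batching points above (broker address and topic are assumptions), the console clients shipped with Kafka take a bootstrap list and pass tuning properties straight through to the underlying producer and consumer.

    # Producer: bootstrap list plus batching tuning (linger.ms, batch.size)
    bin/kafka-console-producer.sh --broker-list localhost:9092 \
      --topic page-views \
      --producer-property linger.ms=50 \
      --producer-property batch.size=65536

    # Consumer: fetches metadata from the bootstrap list, then reads from the partition leaders
    bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
      --topic page-views --from-beginning
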
  19. What is OpenShift? A distribution of Kubernetes optimized for continuous application development and multi-tenant deployment.
  20. OpenShift enables Developer Productivity [diagram: workloads (Spring & Java EE, microservices, functions, languages, databases, application services) on top of a code, build, test, deploy, monitor, review cycle, supported by self-service infrastructure, automated build & deploy, CI/CD pipelines, consistent environments, configuration management, and app logs & metrics]
  21. But ... ● Apache Kafka is stateful, which means we require: ○ A stable broker identity ○ A way for the brokers to discover each other on the network ○ Durable broker state ○ The ability to recover broker state after a failure ● All of the above is true for Zookeeper as well, and ... ○ Each node has the configuration of the others ○ Nodes need to be able to communicate with each other ● Accessing Kafka is not so simple ● OpenShift provides several services ...
  22. Who is going to help me?
  23. Strimzi is a set of enabling services that allows Apache Kafka to work in OpenShift as a first-class citizen and to be installed easily, configured and managed simply. Strimzi is our hero!
  24. What is Strimzi? ● Simplifies Apache Kafka deployment on OpenShift ● Uses the OpenShift native mechanisms for: ○ Provisioning Kafka clusters ○ Managing topics and users ● Provides: ○ Linux container images for running Apache Kafka and Zookeeper ○ Tooling for managing and configuring Apache Kafka clusters, topics and users ● Follows the Kubernetes Operator model ● Available from OperatorHub.io
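
One possible installation sketch, assuming the install manifests published on strimzi.io (the URL, project name and exact steps may differ by version; check https://strimzi.io for the current instructions):

    # Create a project and deploy the Strimzi Cluster Operator into it
    oc new-project my-kafka
    oc apply -f 'https://strimzi.io/install/latest?namespace=my-kafka' -n my-kafka

    # Wait for the operator pod to become ready
    oc get pods -n my-kafka -w
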
  25. OperatorHub.io ● Home for the Kubernetes community to share Operators ● Many different Operators available ... ○ Big Data, Cloud Provider, Database, Logging & Tracing, Monitoring, ... ● ... and growing!
  26. Kubernetes Operators ● An application-specific controller is used to create, configure and manage a complex application: ○ The controller contains specific domain/application knowledge ○ Usually used for stateful applications (databases, ...) which are non-trivial to operate on Kubernetes/OpenShift ● The controller operates based on input from Config Maps or Custom Resource Definitions ○ The user describes the desired state ○ The controller applies this state to the application
  27. Operator Pattern ● Observe: monitor the current state of the application ● Analyze: compare the actual state to the desired state ● Act: resolve any differences between the actual and desired state
  28. Strimzi Operators: Kafka Operator, Topic Operator, User Operator
  29. Strimzi Operators at a glance [diagram: Strimzi custom resources (Kafka, KafkaTopic, KafkaUser) in OpenShift drive the Kafka, Topic and User Operators, which deploy and manage the Kafka cluster (Kafka brokers), the Zookeeper cluster (Zookeeper nodes), topics and users]
  30. Kafka Operator ● Creating and managing Apache Kafka clusters ○ Number of Zookeeper, Kafka broker and Kafka Connect nodes ○ Configuration of Kafka and Kafka Connect ● Responsible for: ○ Deployment ○ Storage ○ Scale up / scale down ○ Metrics ○ Healthchecks ● One Cluster Operator can manage several clusters in parallel ○ Can cover one or more projects
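
A hedged example of letting the Cluster Operator drive a scale-up (the cluster name meetup-cluster comes from the Kafka CRD example later in these slides; patching is just one way to change the desired state, and label names may vary by Strimzi version):

    # Raise the number of Kafka brokers; the operator reconciles the change
    oc patch kafka meetup-cluster --type merge \
      -p '{"spec":{"kafka":{"replicas":3}}}'

    # Watch the new broker pod appear
    oc get pods -l strimzi.io/cluster=meetup-cluster -w
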
  31. Topic Operator ● Creating and managing Kafka topics ● Some Kafka components (Streams, Connect) often create their own topics, so the operator keeps a bi-directional synchronization: ○ Changes done directly in Kafka/Zookeeper are applied to the custom resources ○ Changes done in the custom resources are applied to the Kafka topics ● The Topic Operator resolves conflicts using a 3-way diff between: ○ Its own Zookeeper-based store ○ Apache Kafka / Zookeeper ○ The custom resources
  32. User Operator ● Creating and managing Kafka users ● Unlike the Topic Operator, it does not sync any changes from the Kafka cluster to OCP ● Users are not expected to be managed directly in the Kafka cluster in parallel with the User Operator ● User credentials are managed in a Secret ● Manages authorization rules
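
A sketch of consuming the generated credentials (the secret and key names follow the KafkaUser example later in these slides and may differ between Strimzi versions):

    # For a TLS user, the User Operator creates a Secret named after the user
    oc get secret roman -o yaml

    # Extract the client certificate and key for an application to use
    oc get secret roman -o jsonpath='{.data.user\.crt}' | base64 -d > user.crt
    oc get secret roman -o jsonpath='{.data.user\.key}' | base64 -d > user.key
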
  33. Strimzi Custom Resource Definitions ● Extensions of the Kubernetes API to store a collection of API objects ● Allow you to introduce your own API into a cluster ● Strimzi CRDs: ○ Kafka: definition of a Kafka cluster ○ KafkaTopic: definition of a Kafka topic ○ KafkaUser: definition of a Kafka user
  34. Kafka CRD

    apiVersion: kafka.strimzi.io/v1alpha1
    kind: Kafka
    metadata:
      name: meetup-cluster
    spec:
      kafka:
        replicas: 2
        # listeners, config, metrics, jvmOptions, resources
        storage:
          type: persistent-claim
          size: 12Gi
      zookeeper:
        replicas: 3
        # jvmOptions, resources
        storage:
          type: persistent-claim
          size: 5Gi
      entityOperator:
        # topicOperator, userOperator, tlsSidecar
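
A minimal usage sketch (the file name is an assumption): apply the resource and let the Kafka Operator create the Zookeeper and Kafka pods, services and persistent volume claims.

    # Create (or update) the cluster definition
    oc apply -f kafka-meetup-cluster.yaml

    # Check the custom resource and watch the operator roll out the cluster
    oc get kafka meetup-cluster -o yaml
    oc get pods -w
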
  35. KafkaTopic & KafkaUser CRD

    apiVersion: kafka.strimzi.io/v1alpha1
    kind: KafkaTopic
    metadata:
      name: meetings
      labels:
        strimzi.io/cluster: meetup-cluster
    spec:
      partitions: 10
      replicas: 2
    ---
    apiVersion: kafka.strimzi.io/v1alpha1
    kind: KafkaUser
    metadata:
      name: roman
      labels:
        strimzi.io/cluster: meetup-cluster
    spec:
      authentication:
        type: tls
      authorization:
        type: simple
        acls:
          - resource:
              type: topic
              name: meetings
              patternType: literal
            operation: Read
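
And a similarly hedged sketch for topics and users (file name assumed): the Topic and User Operators pick the resources up and create the corresponding topic and credentials in the cluster.

    # Create the topic and the user through the Kubernetes API
    oc apply -f kafka-topic-and-user.yaml

    # The operators reconcile them into the Kafka cluster
    oc get kafkatopics,kafkausers
    oc get secret roman    # TLS credentials generated by the User Operator
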
  36. Strimzi is much more ...
  37. Demo Time !!!
  38. Get Started !!! ● Strimzi: ○ https://strimzi.io ○ https://github.com/strimzi ● OpenShift: ○ https://www.openshift.com ○ http://learn.openshift.com ● Apache Kafka: ○ https://kafka.apache.org ● OperatorHub.io: ○ https://www.operatorhub.io
  39. Questions?
  40. Thank you!
