Brian S Paskin, Senior Application Architect, R&D Services, IBM Cloud Innovations Lab
Updated 22 May 2019
Kafka and IBM Event Streams Basics
What is Kafka
• Kafka was originally developed at LinkedIn in 2010 and open sourced in 2011
• A commercial distribution with extras is maintained by Confluent, founded by Kafka's original creators from LinkedIn
• A distributed publish/subscribe middleware where all records are persisted
• Used as part of Event Driven Architectures
• Fault tolerant and scalable when running multiple brokers with multiple partitions
• Kafka itself runs on Java
• Uses Apache ZooKeeper for metadata (leader and follower coordination)
• Can be used with the Java Message Service (JMS) API, but does not support all JMS features
• Kafka clients are written in many languages
– C/C++, Python, Go, Erlang, .NET, Clojure, Ruby, Node.js, HTTP REST proxy, Perl, stdin/stdout, PHP, Rust, alternative Java, Storm, Scala DSL, Swift
Brokers and Clusters
• A broker is an instance of Kafka, identified by an integer id in its configuration file
• More than one broker working together forms a cluster
– Can span multiple systems
• All brokers in a cluster know about all other brokers
• All information is written to disk
• The broker a client first connects to is called the bootstrap broker
• A cluster durably persists all published records for the retention period
– The default retention period is 1 week
Topics and Partitions
• A topic is a category or feed name to which records are published
– Subtopics are not supported (e.g. sports/football, sports/football/ASRoma)
• A partition is an ordered, immutable sequence of records for a specific topic
• The records in a partition are each assigned a sequential id number called the offset
• A topic can have multiple partitions that may span brokers in the cluster (see the sketch after this list)
– Allows for fault tolerance and parallel consumption of messages
• Partitions can be replicated with in-sync replicas (ISRs) that passively follow the leader
• Each partition and its replicas have an elected leader
– If the broker hosting a leader goes down, a new leader is elected
– There cannot be more replicas than brokers
• A broker can host more than one partition, including multiple partitions of the same topic
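As a concrete illustration (not from the original deck; the broker address and topic name are placeholders), a topic with three partitions and a replication factor of one can be created from Java with the AdminClient:

// Creates a topic with 3 partitions and replication factor 1.
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // The replication factor cannot exceed the number of brokers in the cluster
            NewTopic topic = new NewTopic("demo-topic", 3, (short) 1);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}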
Topics and Partitions (diagram: cluster with brokers and three partition scenarios)
Records
• Records consist of a key, a value, and a timestamp
– A key is not required
– The timestamp is added automatically
– The key and value can be Objects
• Records are serialized by Producers and deserialized by Consumers (see the sketch after this list)
– Several serializers/deserializers are provided
– Custom serializers/deserializers can be written
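A minimal producer-side sketch of such a record, assuming String keys and values and a placeholder topic name; the timestamp is filled in automatically because none is supplied:

// Sends one record with a key and a value; a timestamp is set automatically at send time.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class RecordExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The key is optional; records with the same key always go to the same partition
            producer.send(new ProducerRecord<>("demo-topic", "match-1", "AS Roma 2 - 1 Lazio"));
        }
    }
}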
Producers
• A Producer writes a record to a Topic
– If the topic has more than one partition, records are distributed round robin across the partitions
– If a key is given, records with that key are always written to the same partition
• For guaranteed delivery there are three types of acknowledgements (acks)
– 0: no acknowledgement (fire and forget)
– 1: wait for the leader to acknowledge
– all: wait for the leader and the in-sync replicas to acknowledge
• The Producer retries if an acknowledgement is never received
– Records can arrive out of order
– Retries may cause duplicate records
• Producers can be made idempotent, which prevents writing a message twice
• Producers can use message compression
– Supported compression codecs are Snappy, GZIP and LZ4
– Consumers automatically detect that a message is compressed and decompress it
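A minimal sketch of the producer configuration behind these options; the broker addresses and the choice of codec are placeholders, not prescribed by the slides:

// Producer settings for acknowledgements, idempotence and compression.
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class ReliableProducerConfig {
    static Properties build() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ACKS_CONFIG, "all");                 // "0", "1" or "all"
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);    // prevents duplicates caused by retries
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");  // "snappy", "gzip" or "lz4"
        return props;
    }
}

Enabling idempotence also forces acks=all and turns on retries, matching the delivery guarantees described above.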
Producers
• Producers can send messages in batches for efficiency (the sketch after this list shows the relevant settings)
– By default up to 5 requests can be in flight at a time
– Multiple messages are placed in a batch and sent all at once
– Introducing a small delay before sending can lead to better performance
– A batch is sent when the delay expires or the batch size is filled
– Messages larger than the batch size are not batched
• If Producers send faster than Brokers can handle, the Producers can be slowed
– Set the buffer memory used to store unsent records
– Set the blocking time (milliseconds)
– After that, an error is thrown indicating the records cannot be sent
• A Schema Registry is available to validate data using the Confluent Schema Registry
– Uses Apache Avro
– Protects against bad data or schema mismatches
– Schemas are self describing
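The batching and back-pressure bullets above correspond to a handful of producer properties. A minimal sketch, with illustrative values (defaults are noted in the comments):

// Batching and back-pressure settings for a producer (values are illustrative).
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class BatchingProducerConfig {
    static Properties build() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        props.put(ProducerConfig.LINGER_MS_CONFIG, 5);                        // small delay so batches can fill (default 0)
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 32 * 1024);               // batch size in bytes (default 16 KB)
        props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 64L * 1024 * 1024);    // memory for unsent records (default 32 MB)
        props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, 5000);                  // how long send() blocks before throwing an error
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5);   // in-flight requests per broker connection
        return props;
    }
}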
Consumers
• Consumers subscribe to 1 or more Topics
– Read from all assigned partitions starting at the last committed offset and consume records in FIFO order within a partition
– A topic can have multiple consumers subscribed to it
– Consumers can reset the offset if records need to be processed again
• Multiple Consumers in a consumer group each read exclusively from a fixed set of partitions (see the sketch after this list)
– Having more consumers in a group than partitions leaves some consumers inactive
– Adding or removing Consumers automatically rebalances the partitions across the Consumers
• Consumers can be made idempotent in application code
• The Schema Registry is also available to Consumers
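A minimal consumer sketch, assuming a placeholder group id and topic; each member of the group receives records only from the partitions assigned to it, and seek() could be used to re-read from an earlier offset:

// Polls records as part of a consumer group and prints partition, offset and value.
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class GroupConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");   // partitions are shared across this group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton("demo-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}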
Connectors
• Connectors integrate external systems with Kafka: sources feed data in, sinks take data out
– Import from sources such as databases (via JDBC), Blockchain, Salesforce, Twitter, etc.
– Export to sinks such as AWS S3, Elasticsearch, JDBC databases, Twitter, Splunk, etc.
– A Connect cluster pulls from a source and publishes to Kafka (or the reverse for sinks)
– Can be used together with Streams
– Confluent Hub has many connectors already available
• Connectors can be managed with REST calls
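For example, a Connect worker's REST interface (port 8083 by default; localhost is an assumption here) can be queried for its deployed connectors. A sketch using the JDK 11 HttpClient:

// Lists the connectors deployed on a Kafka Connect cluster via its REST interface.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ListConnectors {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors"))  // Connect REST endpoint
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());  // JSON array of connector names
    }
}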
Streams
• A Streams application consumes from a Topic, processes the data, and publishes to another Topic
• Several built-in functions to process or transform data (two are shown in the sketch after this list)
– Custom functions can also be created
– branch, filter, filterNot, flatMap, flatMapValues, foreach, groupByKey, groupBy, join, leftJoin, map, mapValues, merge, outerJoin, peek, print, selectKey, through, transform, transformValues
• Exactly-once processing is supported
• Event-time windowing is supported
– Groups of records with the same key can be used for stateful operations
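A minimal Streams topology sketch using two of the built-in functions; the application id and topic names are placeholders:

// Consumes input-topic, filters and transforms the values, and publishes to output-topic.
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class UppercaseStream {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "demo-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        // Exactly-once processing can be enabled via StreamsConfig.PROCESSING_GUARANTEE_CONFIG

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("input-topic");
        source.filter((key, value) -> value != null)     // built-in filter
              .mapValues(value -> value.toUpperCase())   // built-in mapValues
              .to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}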
Zookeeper Quick Look
• Open source project from Apache
• Comes in the same package as Kafka
• Centralized system for maintaining configuration information in a distributed system
• There is a Leader service and follower services that exchange information
• Runs on Java
• Always start an odd number of ZooKeeper services
• Keeps its information in files
• You do not need to use the ZooKeeper provided with Kafka
Kafka Command Line Basics
• Start Zookeeper as a daemon
zookeeper-server-start.sh -daemon ../config/zookeeper.properties
• Stop Zookeeper
zookeeper-server-stop.sh
• Start Kafka as a daemon
kafka-server-start.sh -daemon ../config/server.properties
• Stop Kafka
kafka-server-stop.sh
• Create a topic with a number of partitions and a replication factor
kafka-topics.sh --bootstrap-server host:port --topic topicName --create --partitions 3 --replication-factor 1
• List Topics
kafka-topics.sh --bootstrap-server host:port --list
Kafka Command Line Basics
• Retrieve information about a Topic
kafka-topics.sh --bootstrap-server host:port --topic topicName --describe
• Delete a Topic
kafka-topics.sh --bootstrap-server host:port --topic topicName --delete
• Produce messages to a Topic
kafka-console-producer.sh --broker-list host:port --topic topicName
• Consume from a Topic from the current Offset
kafka-console-consumer.sh --bootstrap-server host:port --topic topicName
• Consume from a Topic from the Beginning Offset
kafka-console-consumer.sh --bootstrap-server host:port --topic topicName --from-beginning
Kafka Command Line Basics
• Consume from a Topic using a Consumer Group
kafka-console-consumer.sh --bootstrap-server host:port --topic topicName --group groupName
Event Streams
• Event Streams is IBM’s implementation of Kafka
– Available in several different versions with different support options
• IBM Event Streams is Kafka with enterprise features and IBM Support
• IBM Event Streams Community Edition is a free version for evaluation and demo use
• IBM Event Streams on IBM Cloud is Kafka as a service on the IBM Cloud
• Supported on Red Hat OpenShift and IBM Cloud Private
• Contains a REST Proxy interface for Producers
• Can be used with external monitoring tools
• Producer Dashboard
• Health Checks for Cluster, Deployment and Topics
• Geo-replication of Topics for high availability and scalability
• Encrypted communications
Event Streams on IBM Cloud
• Select Event Streams from the Catalog
• Enter the details and which plan is to be used
– Classic, as a Cloud Foundry service
– Standard, as a standard Kubernetes-based service
– Enterprise, as a dedicated instance
• Fill out topic information and other attributes
• Create credentials that can be used by selecting Service Credentials
• Viewing the credentials shows the broker hosts and ports, Admin URL, user id and password (used in the sketch after this list)
• IBM Cloud has its own Event Streams (ES) CLI to connect
• IBM MQ Connectors are available
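A minimal client configuration sketch, assuming the SASL_SSL/PLAIN settings that Event Streams on IBM Cloud documents; the broker list and API key come from the Service Credentials created above:

// Builds client properties for connecting to Event Streams on IBM Cloud over SASL_SSL.
import java.util.Properties;
import org.apache.kafka.clients.CommonClientConfigs;
import org.apache.kafka.common.config.SaslConfigs;

public class EventStreamsConfig {
    static Properties build(String bootstrapServers, String apiKey) {
        Properties props = new Properties();
        props.put(CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_SSL");
        props.put(SaslConfigs.SASL_MECHANISM, "PLAIN");
        // Assumed convention: user "token" with the service API key as the password
        props.put(SaslConfigs.SASL_JAAS_CONFIG,
                "org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username=\"token\" password=\"" + apiKey + "\";");
        return props;
    }
}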
Kafka and IBM MQ
Kafka:
• Kafka is a pub/sub engine with streams and connectors
• All topics are persistent
• All subscribers are durable
• Adding brokers requires little work (changing a configuration file)
• Topics can be spread across brokers (partitions) with a command
• Producers and Consumers are aware of changes made to the cluster
• Can have any number of replica partitions

IBM MQ:
• MQ is a queuing and pub/sub engine with file transfer, MQTT, AMQP and other capabilities
• Queues and topics can be persistent or non-persistent
• Subscribers can be durable or non-durable
• Adding QMGRs requires some work (add the QMGRs to the cluster, add cluster channels; queues and topics need to be added to the cluster)
• Queues and topics can be spread across a cluster by adding them to clustered QMGRs
• All MQ clients require a CCDT file to know of changes if not using a gateway QMGR
• Can have 2 replicas of a QMGR (RDQM), or Multi-Instance QMGRs
Kafka and IBM MQ
Kafka:
• Simple load balancing
• Can reread messages
• All clients connect using a single connection method
• Stream processing is built in
• Has connection security, authentication security, and ACLs (read/write to a Topic)

IBM MQ:
• Load balancing can be simple or more complex using weights and affinity
• Cannot reread messages that have already been processed
• MQ has Channels which allow different clients to connect, each able to have different security requirements
• Stream processing is not built in, but can be added with third party libraries such as MicroProfile Reactive Streams, ReactiveX, etc.
• Has connection security, channel security, authentication security, message security/encryption, ACLs for each Object, and third party plugins (Channel Exits)
Kafka and IBM MQ
Kafka:
• Built on Java, so it can run on any platform that supports Java 8+
• Monitoring uses statistics provided by the Kafka CLI, open source tools, or Confluent Control Center

IBM MQ:
• Latest version runs natively on AIX, IBM i, Linux systems, Solaris, Windows and z/OS
• Much more can be monitored; monitoring via the PCF API, MQ Explorer, the MQ CLI (runmqsc), and third party tools (Tivoli, CA APM, Help Systems, open source, etc.)
More information
• Sample code on GitHub
• Kafka documentation
• Event Streams documentation
• Event Streams on IBM Cloud
• Event Streams sample on GitHub
• IBM Cloud Event Driven Architecture (EDA) Reference
• IBM Cloud EDA Solution

Editor's Notes

  • #3 Observer Pattern - https://www.tutorialspoint.com/design_pattern/observer_pattern.htm
  • #8 Serializers: ByteArraySerializer, ByteBufferSerializer, BytesSerializer, DoubleSerializer, ExtendedSerializer.Wrapper, FloatSerializer, IntegerSerializer, LongSerializer, SessionWindowedSerializer, ShortSerializer, StringSerializer, TimeWindowedSerializer, UUIDSerializer Deserializers: ByteArrayDeserializer, ByteBufferDeserializer, BytesDeserializer, DoubleDeserializer, ExtendedDeserializer.Wrapper, FloatDeserializer, IntegerDeserializer, LongDeserializer, SessionWindowedDeserializer, ShortDeserializer, StringDeserializer, TimeWindowedDeserializer, UUIDDeserializer
  • #9 When an idempotent producer is set, the property producerProps.put("enable.idempotence", "true") is added. This changes the following settings: retries = MAX_INT, acks = all
  • #10 To add a delay, change the property linger.ms = 5 (default 0). To change the batch size: batch.size (default 16 KB). To change the buffer memory: buffer.memory (default 32 MB). To change the blocking milliseconds: max.block.ms (default 60000)