Kafka Summit SF 2017 - Kafka and the Polyglot Programmer

© 2017 IBM Corporation
Edoardo Comar ecomar@uk.ibm.com
Andrew Schofield andrew_schofield@uk.ibm.com
IBM Message Hub
Kafka and the Polyglot Programmer
A brief overview of the Kafka clients ecosystem

Session objectives
• Show some comparable Kafka usage from different languages
• Show a little of what goes on underneath a Kafka client
• Appreciate the heavy lifting a client has to do
• We’ll proceed in reverse order though 🙂

How does an application talk to Kafka?
• Protocol-level client libraries (implementing the Kafka “wire” API)
• The “official” Java client
• librdkafka C/C++ library and wrappers for other languages
• Other clients from a large open-source ecosystem
• Alternative “message-level” APIs
• Kafkacat, REST
• Higher-level APIs
• Kafka Connect, Kafka Streams

What is the Kafka protocol (the ‘wire’ API)?
• A set of Request/Response message pairs
• e.g.: ProduceRequest, ProduceResponse (ApiKey=0)
• A set of error codes
• e.g.: Unknown Topic Or Partition (code=3)
• Messages exchanged using Kafka’s own binary protocol
• Over TCP (or TLS)
• It’s not HTTP, AMQP, MQTT …
• All requests initiated by the clients.
• Brokers send Responses
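
To make the request/response pairing concrete, here is a minimal Python sketch that frames an ApiVersionsRequest (ApiKey=18, version 0, which has an empty body). It assumes the classic request header layout (api_key, api_version, correlation_id, client_id) preceded by a 4-byte size; the helper names and the client id `polyglot-demo` are invented for illustration and nothing is sent over a socket.

```python
import struct

API_VERSIONS = 18  # ApiKey of ApiVersionsRequest


def encode_request(api_key, api_version, correlation_id, client_id, body=b""):
    """Frame one request: int32 size, then header, then body (all big-endian)."""
    cid = client_id.encode("utf-8")
    # int16 api_key, int16 api_version, int32 correlation_id, int16 client_id length
    header = struct.pack(">hhih", api_key, api_version, correlation_id, len(cid)) + cid
    payload = header + body
    return struct.pack(">i", len(payload)) + payload


def decode_response_header(frame):
    """Return (size, correlation_id) read from the start of a response frame."""
    size, correlation_id = struct.unpack_from(">ii", frame, 0)
    return size, correlation_id


request = encode_request(API_VERSIONS, 0, correlation_id=42, client_id="polyglot-demo")
```

The broker echoes the correlation id in its response, which is how a client with several requests in flight on one connection matches each response to its request.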

Kafka’s TCP binary protocol
• Open-source protocol (obviously!)
• Messages defined in terms of Serializable data structures
• Primitive types (intNN, nullable string) + Arrays
• Struct types, e.g. RecordBatch for a sequence of Records (key, value, metadata)
• Clients typically hold multiple long-lived TCP connections
• One per broker node
• Clients expected to use non-blocking I/O
http://kafka.apache.org/protocol
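
As a rough illustration of the primitive types, the following Python sketch serializes a nullable string (int16 length, -1 for null) and an array (int32 element count) and reads them back. The function names are hypothetical; the layout follows the protocol guide's classic, non-compact encodings.

```python
import struct
from typing import List, Optional, Tuple


def write_nullable_string(value: Optional[str]) -> bytes:
    """NULLABLE_STRING: int16 length (-1 for null) followed by UTF-8 bytes."""
    if value is None:
        return struct.pack(">h", -1)
    raw = value.encode("utf-8")
    return struct.pack(">h", len(raw)) + raw


def read_nullable_string(buf: bytes, pos: int) -> Tuple[Optional[str], int]:
    """Return the decoded string (or None) and the position just past it."""
    (length,) = struct.unpack_from(">h", buf, pos)
    pos += 2
    if length < 0:
        return None, pos
    return buf[pos:pos + length].decode("utf-8"), pos + length


def write_string_array(items: List[Optional[str]]) -> bytes:
    """ARRAY: int32 element count followed by each encoded element."""
    return struct.pack(">i", len(items)) + b"".join(
        write_nullable_string(item) for item in items)


def read_string_array(buf: bytes, pos: int = 0):
    (count,) = struct.unpack_from(">i", buf, pos)
    pos += 4
    items = []
    for _ in range(count):
        item, pos = read_nullable_string(buf, pos)
        items.append(item)
    return items, pos


encoded = write_string_array(["test", "orders"])
decoded, end = read_string_array(encoded)
```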

Kafka message capture with Wireshark
$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

Anatomy of a wire message
[Figure: wire message layout, Magic = 1 (v0.10.x)]

In 0.11 RecordBatch superseded MessageSet
• Magic value = 2
• Records have Headers (KIP-82)
• They look like footers 😀
• Metadata for Exactly-Once Semantics
• Space savings for large batches
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-Messagesets
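
Part of the space saving comes from the v2 record format storing lengths and deltas as variable-length integers instead of fixed-width ones. A minimal Python sketch of zigzag varint encoding for 32-bit values follows; the function names are hypothetical and it covers only the integer encoding, not a full record.

```python
def encode_varint(value: int) -> bytes:
    """Zigzag-encode a signed 32-bit int, then emit 7 bits per byte, low bits first."""
    zigzag = ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF
    out = bytearray()
    while zigzag >= 0x80:
        out.append((zigzag & 0x7F) | 0x80)  # high bit set: more bytes follow
        zigzag >>= 7
    out.append(zigzag)
    return bytes(out)


def decode_varint(buf: bytes, pos: int = 0):
    """Return (value, next_position) for a varint starting at pos."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    # Undo zigzag: even numbers are non-negative, odd numbers are negative.
    return (result >> 1) ^ -(result & 1), pos


small_delta = encode_varint(1)    # one byte instead of a fixed four
larger = encode_varint(300)       # two bytes
```

Small values such as offset deltas within a batch fit in a single byte, which is where the saving for large batches adds up.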

It’s an open-source protocol …
… so anyone can write a client, in any language?
• In theory, yes
• In practice, it’s a very big investment
• A lot of intelligence goes in the client
• Partitioning
• Consumer Group assignment
• Complexity has grown a lot since 0.8 …
• Consumer group protocol
• Security protocols/SASL mechanisms
• KIP-4 (administrative actions)
• Exactly-Once Semantics
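
As one example of the logic that lives in the client, here is a Python sketch of key-based partitioning. It is intended to mirror the Java client's default behaviour for keyed records (murmur2 hash of the key bytes, made positive, modulo the partition count), but it is an independent port that has not been checked against the Java implementation here, so treat it as a sketch rather than a drop-in replacement.

```python
def murmur2(data: bytes) -> int:
    """32-bit murmur2 as an unsigned int (port of the hash used for record keys)."""
    length = len(data)
    m = 0x5BD1E995
    h = (0x9747B28C ^ length) & 0xFFFFFFFF
    full = length // 4
    for i in range(full):
        k = int.from_bytes(data[i * 4:i * 4 + 4], "little")
        k = (k * m) & 0xFFFFFFFF
        k ^= k >> 24
        k = (k * m) & 0xFFFFFFFF
        h = (h * m) & 0xFFFFFFFF
        h ^= k
    tail = data[full * 4:]
    if len(tail) == 3:
        h ^= tail[2] << 16
    if len(tail) >= 2:
        h ^= tail[1] << 8
    if len(tail) >= 1:
        h ^= tail[0]
        h = (h * m) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * m) & 0xFFFFFFFF
    h ^= h >> 15
    return h


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition: clear the sign bit, then take the modulo."""
    return (murmur2(key) & 0x7FFFFFFF) % num_partitions


chosen = partition_for(b"user-42", 6)
```

Every client that wants records with the same key to land on the same partition as the Java client's must reproduce this hash exactly, which is one reason a from-scratch client is a larger investment than it first appears.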

The evolution of the Kafka API
Kafka version   Released    # of API keys (RPCs)   # of error codes*
0.7.2           Nov 2012     5                      6
0.8.0           Nov 2013     8                     13
0.9.0           Nov 2015    17                     33
0.10.0          May 2016    19                     37
0.10.2          Feb 2017    21                     46
0.11.0          June 2017   33                     55
* Including -1 UNKNOWN and 0 NO_ERROR
• Brokers support older clients
• Recent clients support somewhat older brokers
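
Compatibility in both directions comes down to agreeing on a version per API. The Python sketch below picks, for each ApiKey, the highest version supported by both sides, as a client might do after an ApiVersions exchange. The version ranges are made-up illustrative numbers, not the real ranges of any release.

```python
from typing import Dict, Optional, Tuple

VersionRange = Tuple[int, int]  # (min_version, max_version), inclusive


def negotiate(client: Dict[int, VersionRange],
              broker: Dict[int, VersionRange]) -> Dict[int, Optional[int]]:
    """For each ApiKey the client knows, pick the highest version both sides support."""
    chosen = {}
    for api_key, (c_min, c_max) in client.items():
        if api_key not in broker:
            chosen[api_key] = None  # broker predates this API
            continue
        b_min, b_max = broker[api_key]
        usable = min(c_max, b_max)
        chosen[api_key] = usable if usable >= max(c_min, b_min) else None
    return chosen


# Illustrative ranges only: a newer client talking to an older broker.
client_apis = {0: (0, 3), 1: (0, 5), 22: (0, 0)}
broker_apis = {0: (0, 2), 1: (0, 3)}
agreed = negotiate(client_apis, broker_apis)
```

A `None` entry means the feature behind that API is simply unavailable against this broker, and the client has to degrade or fail cleanly.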

What makes for a good choice of client?
• Good support for the features of Apache Kafka
• Message keys, committing offsets, exactly-once semantics, ...
• Blending natural idioms of the language with proper use of Kafka
• Solid software engineering
• Responsive community support
• Native code or ‘pure’
• Particularly important in the cloud
• Does it support the technologies you have chosen to use?
• Message encoding, Schema Registry, ...

Let’s take a look at some different clients
Project               Language               Pure or native code
Apache Kafka client   Java                   pure
librdkafka            C / C++                –
node-kafka-native     JavaScript (Node.js)   native
node-rdkafka          JavaScript (Node.js)   native
confluent-kafka-go    Go                     native
Sarama                Go                     pure
kafkacat              CLI / Shell scripts    –
Confluent Kafka REST  Any                    –

Java producer
• Part of Apache Kafka
• Best for feature support and performance
• Asynchronous with batching
• Highly configurable
• Rich metrics
https://kafka.apache.org/0110/javadoc/index.html
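
The idea behind "asynchronous with batching" can be sketched with a toy accumulator in Python. This is not the Java client's implementation: `ToyAccumulator` and its methods are hypothetical, it only models a size limit in the spirit of the `batch.size` setting, and it leaves out the time-based trigger that `linger.ms` provides.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class ToyAccumulator:
    """Groups records per partition and hands full batches to `send`."""

    def __init__(self, batch_size: int, send: Callable[[int, List[bytes]], None]):
        self.batch_size = batch_size  # max payload bytes per batch
        self.send = send
        self.batches: Dict[int, List[bytes]] = defaultdict(list)

    def append(self, partition: int, value: bytes) -> None:
        batch = self.batches[partition]
        # Ship the current batch first if this record would overflow it.
        if batch and sum(map(len, batch)) + len(value) > self.batch_size:
            self.flush(partition)
        self.batches[partition].append(value)

    def flush(self, partition: int) -> None:
        batch = self.batches.pop(partition, [])
        if batch:
            self.send(partition, batch)

    def flush_all(self) -> None:
        for partition in list(self.batches):
            self.flush(partition)


sent = []
accumulator = ToyAccumulator(10, lambda p, b: sent.append((p, b)))
for value in (b"aaaa", b"bbbb", b"cccc", b"dd"):
    accumulator.append(0, value)  # returns immediately; sending happens per batch
accumulator.flush_all()
```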

Java consumer
• Part of Apache Kafka
• Best for feature support and performance
• Single-threaded
• Polls for records, and this is also a liveness check
• Commits offsets automatically, async or sync
https://kafka.apache.org/0110/javadoc/index.html
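
The interplay of polling and committing is easier to see with a toy model. The Python sketch below uses an in-memory stand-in for a single partition (`ToyConsumer` is hypothetical, not a Kafka client) to show why committing after processing gives at-least-once delivery: a crash before the commit means the same records are delivered again.

```python
from typing import List


class ToyConsumer:
    """In-memory stand-in for one partition; not a real Kafka client."""

    def __init__(self, log: List[str]):
        self.log = log
        self.committed = 0  # offset of the next record to read after a restart
        self.position = 0

    def poll(self, max_records: int = 2) -> List[tuple]:
        end = min(self.position + max_records, len(self.log))
        batch = [(offset, self.log[offset]) for offset in range(self.position, end)]
        self.position = end
        return batch

    def commit(self, offset: int) -> None:
        self.committed = offset

    def restart(self) -> None:
        self.position = self.committed  # resume from the last committed offset


consumer = ToyConsumer(["a", "b", "c", "d"])
processed = []

for offset, value in consumer.poll():   # offsets 0 and 1
    processed.append(value)
consumer.commit(offset + 1)             # commit the NEXT offset to read

for offset, value in consumer.poll():   # offsets 2 and 3
    processed.append(value)
consumer.restart()                      # simulated crash before committing

for offset, value in consumer.poll():   # offsets 2 and 3 are delivered again
    processed.append(value)
```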

C / C++ librdkafka
• Fully featured native code Kafka client library
• Portable: supports Linux, macOS, Windows and more
• Used as the basis for many client libraries in other languages
• Does a good job of keeping pace with Kafka releases
• A bit tricky to build on platforms other than Linux if you want security
• SASL only recently supported on Windows
• SSL on macOS requires Homebrew
• Can emit metrics
• At broker and topic-partition levels
https://github.com/edenhill/librdkafka

C librdkafka producer
• Concepts very similar to Apache Kafka client
• But you have to manage memory yourself
• Uses callbacks to report status but you have to poll to have them fire
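
The "poll to have callbacks fire" pattern can be modelled in a few lines of Python. `ToyProducer` below is a hypothetical stand-in, not librdkafka's API: delivery reports are queued internally and the application's callback only runs when the application calls `poll()`, on the application's own thread.

```python
from collections import deque
from typing import Callable, Optional


class ToyProducer:
    """Queues delivery reports and serves them only from poll()."""

    def __init__(self):
        self._reports = deque()

    def produce(self, value: bytes,
                on_delivery: Callable[[Optional[str], bytes], None]) -> None:
        # Pretend the send completed in the background; only queue the report.
        self._reports.append((on_delivery, None, value))

    def poll(self) -> int:
        """Run queued callbacks on the caller's thread; return how many fired."""
        fired = 0
        while self._reports:
            callback, error, value = self._reports.popleft()
            callback(error, value)
            fired += 1
        return fired


delivered = []
producer = ToyProducer()
producer.produce(b"hello", lambda err, value: delivered.append(value))
before_poll = list(delivered)   # still empty: nothing fires until poll()
fired = producer.poll()
```

An application that produces but never polls would never see its delivery reports, which is a common surprise for people coming from the Java client.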

C librdkafka consumer
• This is the low-level consumer interface
• The high-level one supports consumer groups
• Thread-safe (unlike the Java consumer)

C++ librdkafka consumer
• Built on top of the C library
• Looks more similar to Java, primarily because it’s object-oriented
• Again, there’s a need to make a regular call to respond to callbacks

JS node-kafka-native
• Another Node.js module wrapping librdkafka
• Looked promising but ultimately not updated to keep up with new features
• No updates for a long time now
• Use node-rdkafka instead
https://github.com/alfred-landrum/node-kafka-native

JS node-rdkafka producer
• Third-party Node.js module wrapping librdkafka
• Natural Node.js style of event delivery
• A good example of the community working well
https://github.com/Blizzard/node-rdkafka

JS node-rdkafka consumer
• Supports many of the features of consuming such as rebalancing, committing offsets
• There’s also a streaming interface
https://github.com/Blizzard/node-rdkafka
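
Rebalancing means the partitions of a topic are redistributed whenever group membership changes. As a sketch of what that redistribution can look like, the Python function below follows the range-style strategy (sort the members, give each a contiguous block, with the first few taking one extra partition). The function name and member ids are hypothetical, and real clients may be configured with other strategies.

```python
from typing import Dict, List


def range_assign(members: List[str], num_partitions: int) -> Dict[str, List[int]]:
    """Give each member a contiguous block of partitions, extras to the first members."""
    members = sorted(members)
    per_member, extra = divmod(num_partitions, len(members))
    assignment = {}
    start = 0
    for index, member in enumerate(members):
        count = per_member + (1 if index < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


before = range_assign(["c1", "c2"], 5)        # two consumers share five partitions
after = range_assign(["c1", "c2", "c3"], 5)   # a third consumer joins: rebalance
```

After the rebalance, a consumer that lost partitions should stop processing them and have its offsets committed, which is why clients expose rebalance events to the application.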

confluent-kafka-go producer
• Confluent Go client based on librdkafka
• Two variants of producer
• Function-based
• Channel-based
• Delivery reports emitted on Events channel

confluent-kafka-go consumer
• This variant of the API uses polling and then a type switch

confluent-kafka-go consumer
• This variant of the API uses a channel to deliver messages and events such as rebalance

Go Sarama producer
• Third-party pure Go client
• Currently at 0.10.2.x level
https://shopify.github.io/sarama/

Go Sarama consumer
• Consumer groups not supported yet
• No offset tracking
• Available as third-party extensions
https://shopify.github.io/sarama/

kafkacat
• Command-line non-JVM Kafka producer and consumer
• Unsurprisingly, uses librdkafka too
• Useful in shell scripts and just for trying stuff out on the command line
https://github.com/edenhill/kafkacat

Confluent Kafka REST
• Part of Confluent Platform
• Integrated with Schema Registry
• Use any language…
• A bit tricky to format the data correctly
https://github.com/confluentinc/kafka-rest
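
To give a feel for the formatting involved, here is a Python sketch that builds a produce request body, assuming the REST Proxy's v2 "binary" embedded format, in which keys and values travel as base64 strings inside a JSON document. The helper name is hypothetical, and the sketch only builds the body; it would be POSTed to the topic's URL with the content type shown.

```python
import base64
import json

# Assumed content type for the v2 binary embedded format.
CONTENT_TYPE = "application/vnd.kafka.binary.v2+json"


def binary_produce_body(records):
    """Build a produce body from (key, value) byte pairs; a key may be None."""
    def b64(data):
        return None if data is None else base64.b64encode(data).decode("ascii")

    return json.dumps(
        {"records": [{"key": b64(key), "value": b64(value)} for key, value in records]}
    )


body = binary_produce_body([(b"user-1", b"hello"), (None, b"world")])
```

Forgetting the base64 step, or sending the wrong content type for the chosen embedded format, are the kind of mistakes that make the data "a bit tricky" to get right.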

Do non-Java users face many issues?
• Most problems are conceptual
• Many new users struggle with the concepts of Kafka 🙁
• Users assume it’s the same as traditional messaging systems
• Partitions, consumer groups, at-least-once semantics, ...
• Documentation nowadays is getting really good
• Historically, lack of best-practice examples in the various languages
• Handling expected errors properly is a common theme
• Failed commits, producer timeouts, ...
• Non-Java clients lag behind Java in features
• librdkafka is doing a great job here, but dependent clients need to expose the features
• Even more true for independent clients
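
Handling expected errors usually means retrying the ones that are transient and surfacing the rest. A minimal Python sketch of that pattern follows; `RetriableError`, `with_retries` and `flaky_commit` are hypothetical names, and a real client would map specific error codes (a request timeout, for example) onto the retriable category.

```python
import time


class RetriableError(Exception):
    """Hypothetical marker for errors worth retrying, e.g. a request timeout."""


def with_retries(operation, attempts=3, base_delay=0.01, sleep=time.sleep):
    """Call operation(); on RetriableError back off exponentially and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except RetriableError:
            if attempt == attempts:
                raise  # out of attempts: let the application decide
            sleep(base_delay * 2 ** (attempt - 1))


calls = {"count": 0}


def flaky_commit():
    """Fails twice with a retriable error, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RetriableError("commit timed out")
    return "committed"


delays = []
result = with_retries(flaky_commit, sleep=delays.append)  # record delays, do not sleep
```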

Summary
• Kafka has mature clients for several popular languages
• Java still gives the best experience
• librdkafka is delivering a solid base for non-Java clients
• At the expense of native code
• Some third-party ‘pure’ clients look good too
• But the community support needs to stay the course

A few links
Kafka Protocol
http://kafka.apache.org/protocol
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
Kafka Clients directory
https://cwiki.apache.org/confluence/display/KAFKA/Clients
Code samples / modules
https://github.com/ibm-messaging/message-hub-samples
https://www.npmjs.com/package/message-hub-rest
http://docs.confluent.io/current/clients/index.html

Q & A
Contact us through the Summit App or via email
ecomar@uk.ibm.com
andrew_schofield@uk.ibm.com
Thanks!
