Kafka talk
© 2014 MapR Technologies
Agenda
• Why are message queuing systems important?
• Just enough Kafka
• Under the covers
• Production Issues
• What’s coming next?
• Community
Why are message queuing systems important?
Common Scenario in Enterprises – ETL/Data Integration
Message Queuing Systems
Just enough Kafka
• Producer (a minimal producer sketch follows this list)
• Broker
• Topics
• Partitions – Random & Semantic
• ISR – Leader & Controller, high watermark
• Zookeeper – offsets
• Consumer
• Consumer Groups
• Commit Log – TTLs, compactions
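These pieces map directly onto the client API. Below is a minimal sketch using the 0.8.2+ Java producer; the broker address and the `events` topic are hypothetical stand-ins. A record sent with a key is semantically partitioned (the same key always lands on the same partition), while a keyless record is spread across partitions:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class MinimalProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // hypothetical broker
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        // Semantic partitioning: records with the same key land on the same partition.
        producer.send(new ProducerRecord<>("events", "user-42", "clicked"));
        // No key: the producer spreads records across partitions.
        producer.send(new ProducerRecord<>("events", "page viewed"));
        producer.close();
    }
}
```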
Under the covers
Basics
Replicas Layout - ISR
Commit Protocol
Replication response times
When producer receives ack    Time to publish a message (ms)    Durability on failures
No ack                        0.29                              High probability of data loss
Wait for the leader           1.05                              Some data loss
Wait for committed            2.05                              Very low probability of data loss
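The three rows correspond to the producer's ack setting. A hedged sketch of how each level would be configured with the 0.8.2+ Java producer (broker address hypothetical):

```java
import java.util.Properties;

public class AckConfigs {
    // Producer properties for a given durability level (0.8.2+ Java producer).
    // acks=0  -> "No ack": fastest, high probability of data loss on failure.
    // acks=1  -> "Wait for the leader": lost if the leader dies before replication.
    // acks=-1 -> "Wait for committed": every in-sync replica must acknowledge.
    static Properties producerProps(String acks) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // hypothetical broker
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("acks", acks);
        // With acks=-1, also set the topic-level min.insync.replicas so a write
        // fails loudly when too few replicas are in sync.
        return props;
    }
}
```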
Producer throughput
Message Size vs Throughput (count)
New (0.8.2-beta) JVM Producer
1 producer, replication x 3, async     786,980 records/sec (75.1 MB/sec)
1 producer, replication x 3, sync      421,823 records/sec (40.2 MB/sec)
3 producers, replication x 3, async    2,024,032 records/sec (193.0 MB/sec)

End-to-end latency: 2 ms (median), 3 ms (99th percentile), 14 ms (99.9th percentile)
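The async/sync gap comes down to whether the caller blocks on the returned future. A hedged sketch of both styles against the 0.8.2+ Java producer (topic and values hypothetical):

```java
import java.util.concurrent.Future;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class SyncVsAsync {
    static void send(KafkaProducer<String, String> producer) throws Exception {
        ProducerRecord<String, String> record =
            new ProducerRecord<>("events", "key", "value");  // hypothetical topic

        // Async: hand the record to the client's background batching thread
        // and move on; this is the style the higher throughput figure measures.
        producer.send(record);

        // Sync: block on the future until the broker acknowledges. Batching is
        // defeated, which is why throughput drops sharply in the numbers above.
        Future<RecordMetadata> ack = producer.send(record);
        RecordMetadata meta = ack.get();
        System.out.println("offset=" + meta.offset());
    }
}
```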
From 0.8.2-beta onwards
Consumer throughput
Features in 0.8 which enable low latency
• A message is committed once replicated, before being flushed to disk
• Consumer long polling
• Byte offsets replaced by logical message offsets (see the sketch after this list)
• Messages have sequential ids
• The key is stored with the message
• Enables deleting individual messages
• Fetch from arbitrary offsets
• Balance between latency and durability
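To illustrate the offset change referenced above: with logical offsets, a consumer position is "message N" rather than a byte position in a file. A toy model (hypothetical names, not Kafka's actual classes) of resolving a logical offset to a segment file with a floor lookup:

```java
import java.util.TreeMap;

// Toy model of offset lookup: each log segment is keyed by its base offset,
// so resolving a logical offset is a floor lookup plus a relative scan.
public class OffsetIndex {
    // base offset of segment -> segment file name (hypothetical layout)
    private final TreeMap<Long, String> segments = new TreeMap<>();

    void addSegment(long baseOffset, String file) {
        segments.put(baseOffset, file);
    }

    // Find the segment holding the given logical offset, as a consumer's
    // "fetch from arbitrary offset" request would require.
    String segmentFor(long offset) {
        Long base = segments.floorKey(offset);
        if (base == null) throw new IllegalArgumentException("offset too old: " + offset);
        return segments.get(base);
    }

    public static void main(String[] args) {
        OffsetIndex index = new OffsetIndex();
        index.addSegment(0L, "00000000000000000000.log");
        index.addSegment(1000L, "00000000000000001000.log");
        System.out.println(index.segmentFor(1500L));  // -> 00000000000000001000.log
    }
}
```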
Production Issues
Production Issues
• 0.7 to 0.8 was not backwards compatible – a migration tool had to be used.
• There is no automatic data balancing; partitions have to be added and removed manually.
• No web UI that gives visibility into the cluster ships with the distribution.
• Upgrading versions and Scala binary incompatibilities are a major issue.
• Flaky Zookeeper connections.
• Both the producer and the consumer must know the schema of the messages written to the topic, especially if there are multiple versions.
• Kafka is optimized for small messages (typically ~1 KB); larger messages decrease throughput drastically and cause GC issues.
• The async producer can lose messages when the callback is not handled properly, typically during a burst of messages (see the sketch after this list).
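The last bullet is avoidable: attaching a callback to the async send means a failed delivery is at least logged or retried instead of silently dropped. A hedged sketch against the 0.8.2+ Java producer (Java 8 assumed); the retry here is naive and for illustration only:

```java
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class SafeAsyncSend {
    static void send(KafkaProducer<String, String> producer,
                     ProducerRecord<String, String> record) {
        producer.send(record, new Callback() {
            @Override
            public void onCompletion(RecordMetadata metadata, Exception e) {
                if (e != null) {
                    // During a burst the client buffer can fill and sends can fail;
                    // without this branch those records vanish silently.
                    System.err.println("send failed, re-queueing: " + e.getMessage());
                    producer.send(record, this);  // naive retry; may reorder messages
                }
            }
        });
    }
}
```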
• No reliable inter-datacenter replication – MirrorMaker is getting better at avoiding data loss when it shuts down uncleanly – KAFKA-1650
• Brokers with a high number of partitions experience high CPU utilization – KAFKA-1952
• There is no serializer/deserializer API in the new Java client, so application developers need to write serialization code explicitly – KAFKA-1797
• The Snappy compressor is not thread safe – KAFKA-1721
• Leadership election state goes stale and never recovers without restarting all brokers – KAFKA-1825
• Issue with the async producer at 250 messages/second of size 25K – KAFKA-1789
• The Kafka server can miss Zookeeper watches during long zkclient callbacks – KAFKA-1155
• SimpleConsumerShell ONLY connects to the first host in the broker-list string to fetch topic metadata – KAFKA-599
• CPU usage spikes to 100% when the network connection is lost – KAFKA-1642
• 521 unresolved bugs
• About 1,200 resolved bugs
Typical Maintenance
• Adding and removing topics
• Modifying topics
• Graceful shutdown
• Balancing leadership
• Checking consumer position
• Mirroring data between clusters
• Expanding your cluster
• Decommissioning brokers
• Increasing replication factor
Typical hardware requirements
• Brokers:
Spec: Intel Xeon 2.5 GHz processor with six cores, six 7200 RPM SATA drives (JBOD or RAID 10), 32 GB of RAM, 1 Gb Ethernet
Typically network IO and disk IO bound. Price: ~$6K
EC2: c3.2xlarge. Price: $312.48/month
• Producers, consumers, and Zookeeper can run multitenant
• Consumers tend to be CPU bound
How and when to scale the cluster?
• Estimate the traffic pattern ahead of time.
• Typically brokers are network IO and disk IO bound.
• Assess how far behind the consumer offsets are relative to the producer (a lag sketch follows this list).
• If brokers are running beyond 70% network utilization, add a new broker to the cluster and rebalance the partitions.
• Periodically run the load-testing tool (part of the distribution) to assess whether the cluster can meet SLA requirements.
• Keep track of the size of the log files retained on disk and make sure sufficient storage space is available.
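The lag check mentioned above is simple arithmetic once the offsets are in hand: per-partition lag is the log-end offset minus the group's committed offset. A sketch, assuming the offset values are gathered externally (e.g. via the offset checker tool or JMX):

```java
import java.util.HashMap;
import java.util.Map;

public class LagCheck {
    // Per-partition lag: how far the consumer trails the producer. Offsets are
    // assumed to be collected externally (offset checker tool, JMX, etc.).
    static Map<Integer, Long> lag(Map<Integer, Long> logEndOffsets,
                                  Map<Integer, Long> committedOffsets) {
        Map<Integer, Long> lag = new HashMap<>();
        for (Map.Entry<Integer, Long> e : logEndOffsets.entrySet()) {
            long committed = committedOffsets.getOrDefault(e.getKey(), 0L);
            lag.put(e.getKey(), e.getValue() - committed);
        }
        return lag;
    }
}
```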
What’s next?
What’s coming in the 0.9 release and beyond?
• Operational improvements
• Native offset storage – moving away from Zookeeper
• New Producer
• New Consumer
• Connection Quotas
• Schema repository
• Enhanced security
• Support for LZ4 compression
• Leader balancing
• New log Compaction controls
• Latency improvements
• Performance improvements
• Idempotence, Transactions
State of the Community
Community
• 10 committers – mostly from LinkedIn and Confluent
• 53 contributors
• Clients written in multiple languages, including Python, Go, Ruby, C, C++, Perl, and Clojure
Apache Kafka Version distribution
Source: https://sematext.files.wordpress.com/2015/03/kafka-versions-usage-pie.png
• Cloudera released Kafka 1.2.0 as part of their distribution, with integrations to Flume and Spark; 37 bugs were fixed in the new release.
• Good integrations with Storm, Spark Streaming, and Hadoop (Camus).
• MirrorMaker for mirroring data between clusters.
• Open-source web console for a UI.
• Flume and Kafka integration – Flafka.
• Yahoo open-sourced a Kafka Manager.
• varnishkafka – a Varnish log collector with an integrated Kafka producer.
• kafkatee – consumes messages from one or more topics and writes to multiple outputs.
• No credible open-source integrations with HBase and Sqoop yet.
• Work is being done on a Kafka schema store (a hedged envelope sketch follows this list).
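Until a schema store lands, one common convention is to prefix each message with a schema version byte so producers and consumers can evolve independently. A purely hypothetical envelope sketch (not a Kafka API):

```java
import java.nio.ByteBuffer;

// Hypothetical versioned envelope: one byte of schema version in front of the
// payload, so consumers can pick the right decoder when a topic carries
// multiple message versions.
public class VersionedMessage {
    static byte[] wrap(byte version, byte[] payload) {
        return ByteBuffer.allocate(1 + payload.length)
                         .put(version)
                         .put(payload)
                         .array();
    }

    static byte version(byte[] message) {
        return message[0];
    }

    static byte[] payload(byte[] message) {
        byte[] out = new byte[message.length - 1];
        System.arraycopy(message, 1, out, 0, out.length);
        return out;
    }
}
```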
Q&A
@mapr maprtech
mgunturu@mapr.com
Engage with us!
MapR
maprtech
mapr-technologies
Appendix
Message Queue Performance
http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
Consumer
• Zookeeper interaction
• Threading/partitions
• Message decoding
• Commit/offset management
• Plug in more consumers
• Layered on the high-level Kafka consumer (see the sketch after this list)
• Batching
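For reference, a hedged sketch of the 0.8 high-level consumer this layer builds on; the connector handles Zookeeper interaction, partition threading, and offset commits (group and topic names hypothetical):

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class HighLevelConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");  // hypothetical ensemble
        props.put("group.id", "demo-group");               // consumer group

        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        // One stream (thread) for the topic; more streams -> more partitions in parallel.
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
            connector.createMessageStreams(Collections.singletonMap("events", 1));

        ConsumerIterator<byte[], byte[]> it = streams.get("events").get(0).iterator();
        while (it.hasNext()) {
            System.out.println(new String(it.next().message()));
            connector.commitOffsets();  // offset management handled by the connector
        }
    }
}
```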
Message Queuing Systems – Stream processing
From 0.8.2 onwards
Log Compaction
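Compaction retains the most recent value for each key and drops older records; a null value (tombstone) deletes the key. A toy model of what a compacted log converges to (not Kafka's cleaner code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of log compaction: replaying the log into a map keeps exactly
// the latest value per key, which is what the compacted log converges to.
public class CompactionSketch {
    public static void main(String[] args) {
        String[][] log = {
            {"user-1", "addr=A"},
            {"user-2", "addr=B"},
            {"user-1", "addr=C"},   // supersedes the first record
            {"user-2", null}        // tombstone: deletes the key entirely
        };
        Map<String, String> compacted = new LinkedHashMap<>();
        for (String[] record : log) {
            if (record[1] == null) compacted.remove(record[0]);
            else compacted.put(record[0], record[1]);
        }
        System.out.println(compacted);  // {user-1=addr=C}
    }
}
```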


Editor's Notes

• #5 Jerry-rigged piping is built between systems and applications on an as-needed basis. A lot of the time there is an impedance mismatch, and we typically deploy asynchronous processing to counter it, i.e., request-response web services for any downstream processing. This approach is very ad hoc, and over time the set-up gets more and more complex. Data becomes unreliable and data quality suffers.
• #6 Kafka decouples data pipelines and acts as a central repository of data streams. It takes care of any impedance mismatch between different applications and the analytics necessary, and enables the Lambda Architecture seamlessly – tried and tested extensively at LinkedIn. That architecture has its limitations, as code has to be written twice, once in the realtime layer and once in the batch layer. There was a blog post by Jay Kreps, who founded a new company called Confluent, discussing the Kappa Architecture – http://radar.oreilly.com/2014/07/questioning-the-lambda-architecture.html. Twitter's Summingbird is a complex system which addresses these issues – https://blog.twitter.com/2013/streaming-mapreduce-with-summingbird
• #7 Producers push: batching, compression, sync (ack) or async (auto batch); sequential writes give guaranteed ordering within each partition. Consumers pull: no state is held by the broker, and consumers control their position in the stream. compression.codec selects GZIP or Snappy compression for messages from the producer to the broker. Zero copy is used between producers/consumers and the broker – zero copy is a feature of FileChannel (Java NIO) which avoids redundant data copies between intermediate buffers and reduces the number of context switches between user space and kernel space: http://kafka.apache.org/documentation.html#maximizingefficiency. Messages stay on disk when consumed and are deleted on TTL or by compaction: https://kafka.apache.org/documentation.html#compaction
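To make the zero-copy point concrete: FileChannel.transferTo is the Java NIO call the note refers to, letting the kernel move bytes from the page cache straight to a socket. A minimal stand-alone illustration (file name and socket address are stand-ins, not broker code):

```java
import java.io.FileInputStream;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;

public class ZeroCopyDemo {
    public static void main(String[] args) throws Exception {
        try (FileChannel log = new FileInputStream("segment.log").getChannel();  // stand-in log segment
             SocketChannel consumer = SocketChannel.open(new InetSocketAddress("localhost", 9000))) {
            long position = 0, remaining = log.size();
            while (remaining > 0) {
                // transferTo moves bytes file -> socket inside the kernel,
                // skipping the usual copy through a user-space buffer.
                long sent = log.transferTo(position, remaining, consumer);
                position += sent;
                remaining -= sent;
            }
        }
    }
}
```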
• #9 Partitions are sequential writes.
• #11 The partition count is kept higher than the number of brokers, so that leader partitions are evenly distributed across brokers, distributing the read/write load.
• #16 While configuring a producer, set acks=-1: a message is considered successfully delivered only after ALL the ISRs have acknowledged writing it. Set the topic-level configuration min.insync.replicas, which specifies the number of replicas that must acknowledge a write for it to be considered successful; if this minimum cannot be met and acks=-1, the producer will raise an exception. Set the broker configuration parameter unclean.leader.election.enable to false: this setting essentially means you are prioritizing durability over availability, since Kafka would avoid electing a leader, and instead make the partition unavailable, if no ISR is available to become the next leader safely. Failure cases: 1) Leader failure – Zookeeper has an ephemeral node to which the leader/controller is subscribed; when the leader fails, it notifies the controller. 2) Follower failure – what happens when a broker goes down, how new messages are handled, and what happens when the broker which was down comes back up and is ready to rejoin the ISR.
• #25 No UI gives visibility into the number of topics, partitions, consumers/consumer groups, topic names, log retention (time/capacity), current log size, consumer group IDs, partitions, offsets, lag, throughput, or owner.
  • #26 KAFKA-1890 - Fix bug preventing Mirror Maker from successful rebalance
• #27 Expanding your cluster: just assign a broker ID and start Kafka on a new node. Partitions are not automatically assigned; they have to be manually migrated with the partition re-assignment tool, which generates and executes a custom re-assignment plan and verifies its status. Decommissioning brokers: use a custom re-assignment plan to move all the replicas to one or more other brokers (evenly distributed).
• #31 While most of the focus for 0.6 and 0.7 was scalability, at-least-once semantics, strong enough guarantees, not falling over, persistence, and efficiency, the roadmap adds: non-blocking IO for the producer, with flush logic moving completely to the background; better durability and consistency controls; better partitioning; and security (https://cwiki.apache.org/confluence/display/KAFKA/Security) – authentication via TLS/SSL and Kerberos, plus pluggable authorization.
• #35 Yahoo's manager for Kafka manages multiple clusters, allows easy inspection of cluster state, and takes care of partition assignments and reassignments (based on the current state of the cluster). varnishkafka is a Varnish log collector with an integrated Apache Kafka producer; written from scratch with performance and modularity in mind, it consumes about a third of the CPU that varnishncsa does and has a far more frugal memory approach. kafkatee consumes messages from one or more Kafka topics and writes them to one or more outputs – either command pipes or files. Opportunity: no well-recognized HBase connector with Kafka.
• #43 Streaming support, reliability, guaranteed ordering of messages within a partition, scalability, runs on a distributed infrastructure, persistence, compression.