Kafka talk
© 2014 MapR Technologies
Agenda
• Why are message queuing systems important?
• Just enough Kafka
• Under the covers
• Production Issues
• What’s coming next?
• Community
Why are message queuing systems important?
Common Scenario in Enterprises – ETL/Data Integration
Message Queuing Systems
Just enough Kafka
• Producer (a minimal producer sketch follows this list)
• Broker
• Topics
• Partitions – Random & Semantic
• ISR – Leader & Controller, high watermark
• Zookeeper – offsets
• Consumer
• Consumer Groups
• Commit Log – TTLs, compactions
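These pieces map directly onto the client API. Below is a minimal sketch using the 0.8.2+ Java producer; the broker address and the `events` topic are hypothetical stand-ins. A record sent with a key is semantically partitioned (the same key always lands on the same partition), while a keyless record is spread across partitions:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class MinimalProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // hypothetical broker
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        // Semantic partitioning: records with the same key land on the same partition.
        producer.send(new ProducerRecord<>("events", "user-42", "clicked"));
        // No key: the producer spreads records across partitions.
        producer.send(new ProducerRecord<>("events", "page viewed"));
        producer.close();
    }
}
```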
Under the covers
Basics
Replicas Layout - ISR
Commit Protocol
Replication response times
When producer receives ack    Time to publish a message (ms)    Durability on failures
No ack                        0.29                              High probability of data loss
Wait for the leader           1.05                              Some data loss
Wait for committed            2.05                              Very low probability of data loss
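The three rows correspond to the producer's ack setting. A hedged sketch of how each level would be configured with the 0.8.2+ Java producer (broker address hypothetical):

```java
import java.util.Properties;

public class AckConfigs {
    // Producer properties for a given durability level (0.8.2+ Java producer).
    // acks=0  -> "No ack": fastest, high probability of data loss on failure.
    // acks=1  -> "Wait for the leader": lost if the leader dies before replication.
    // acks=-1 -> "Wait for committed": every in-sync replica must acknowledge.
    static Properties producerProps(String acks) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // hypothetical broker
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("acks", acks);
        // With acks=-1, also set the topic-level min.insync.replicas so a write
        // fails loudly when too few replicas are in sync.
        return props;
    }
}
```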
Producer throughput
Message Size vs Throughput (count)
New (0.8.2-beta) JVM Producer
1 producer, replication x 3, async     786,980 records/sec (75.1 MB/sec)
1 producer, replication x 3, sync      421,823 records/sec (40.2 MB/sec)
3 producers, replication x 3, async    2,024,032 records/sec (193.0 MB/sec)

End-to-end latency: 2 ms (median), 3 ms (99th percentile), 14 ms (99.9th percentile)
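The async/sync gap comes down to whether the caller blocks on the returned future. A hedged sketch of both styles against the 0.8.2+ Java producer (topic and values hypothetical):

```java
import java.util.concurrent.Future;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class SyncVsAsync {
    static void send(KafkaProducer<String, String> producer) throws Exception {
        ProducerRecord<String, String> record =
            new ProducerRecord<>("events", "key", "value");  // hypothetical topic

        // Async: hand the record to the client's background batching thread
        // and move on; this is the style the higher throughput figure measures.
        producer.send(record);

        // Sync: block on the future until the broker acknowledges. Batching is
        // defeated, which is why throughput drops sharply in the numbers above.
        Future<RecordMetadata> ack = producer.send(record);
        RecordMetadata meta = ack.get();
        System.out.println("offset=" + meta.offset());
    }
}
```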
From 0.8.2-beta onwards
Consumer throughput
Features in 0.8 which enable low latency
• A message is committed once replicated, before being flushed to disk
• Consumer long polling
• Byte offsets replaced by logical message offsets (see the sketch after this list)
• Messages have sequential ids
• The key is stored with the message
• Enables deleting individual messages
• Fetch from arbitrary offsets
• Balance between latency and durability
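To illustrate the offset change referenced above: with logical offsets, a consumer position is "message N" rather than a byte position in a file. A toy model (hypothetical names, not Kafka's actual classes) of resolving a logical offset to a segment file with a floor lookup:

```java
import java.util.TreeMap;

// Toy model of offset lookup: each log segment is keyed by its base offset,
// so resolving a logical offset is a floor lookup plus a relative scan.
public class OffsetIndex {
    // base offset of segment -> segment file name (hypothetical layout)
    private final TreeMap<Long, String> segments = new TreeMap<>();

    void addSegment(long baseOffset, String file) {
        segments.put(baseOffset, file);
    }

    // Find the segment holding the given logical offset, as a consumer's
    // "fetch from arbitrary offset" request would require.
    String segmentFor(long offset) {
        Long base = segments.floorKey(offset);
        if (base == null) throw new IllegalArgumentException("offset too old: " + offset);
        return segments.get(base);
    }

    public static void main(String[] args) {
        OffsetIndex index = new OffsetIndex();
        index.addSegment(0L, "00000000000000000000.log");
        index.addSegment(1000L, "00000000000000001000.log");
        System.out.println(index.segmentFor(1500L));  // -> 00000000000000001000.log
    }
}
```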
Production Issues
Production Issues
• 0.7 to 0.8 was not backwards compatible – a migration tool had to be used.
• There is no automatic data balancing; partitions have to be added and removed manually.
• No web UI that gives visibility into the cluster ships with the distribution.
• Upgrading versions and Scala binary incompatibilities are a major issue.
• Flaky Zookeeper connections.
• Both the producer and the consumer must know the schema of the messages written to the topic, especially if there are multiple versions.
• Kafka is optimized for small messages (typically ~1 KB); larger messages decrease throughput drastically and cause GC issues.
• The async producer can lose messages when the callback is not handled properly, typically during a burst of messages (see the sketch after this list).
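The last bullet is avoidable: attaching a callback to the async send means a failed delivery is at least logged or retried instead of silently dropped. A hedged sketch against the 0.8.2+ Java producer (Java 8 assumed); the retry here is naive and for illustration only:

```java
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class SafeAsyncSend {
    static void send(KafkaProducer<String, String> producer,
                     ProducerRecord<String, String> record) {
        producer.send(record, new Callback() {
            @Override
            public void onCompletion(RecordMetadata metadata, Exception e) {
                if (e != null) {
                    // During a burst the client buffer can fill and sends can fail;
                    // without this branch those records vanish silently.
                    System.err.println("send failed, re-queueing: " + e.getMessage());
                    producer.send(record, this);  // naive retry; may reorder messages
                }
            }
        });
    }
}
```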
• No reliable inter-datacenter replication – MirrorMaker is getting better at avoiding data loss when it shuts down uncleanly – KAFKA-1650
• Brokers with a high number of partitions experience high CPU utilization – KAFKA-1952
• There is no serializer/deserializer API in the new Java client, so application developers need to write serialization code explicitly – KAFKA-1797
• The Snappy compressor is not thread safe – KAFKA-1721
• Leadership election state goes stale and never recovers without restarting all brokers – KAFKA-1825
• Issue with the async producer at 250 messages/second of size 25K – KAFKA-1789
• The Kafka server can miss Zookeeper watches during long zkclient callbacks – KAFKA-1155
• SimpleConsumerShell ONLY connects to the first host in the broker-list string to fetch topic metadata – KAFKA-599
• CPU usage spikes to 100% when the network connection is lost – KAFKA-1642
• 521 unresolved bugs
• About 1,200 resolved bugs
Typical Maintenance
• Adding and removing topics
• Modifying topics
• Graceful shutdown
• Balancing leadership
• Checking consumer position
• Mirroring data between clusters
• Expanding your cluster
• Decommissioning brokers
• Increasing replication factor
Typical hardware requirements
• Brokers:
Spec: Intel Xeon 2.5 GHz processor with six cores, six 7200 RPM SATA drives (JBOD or RAID 10), 32 GB of RAM, 1 Gb Ethernet
Typically network IO and disk IO bound. Price: ~$6K
EC2: c3.2xlarge. Price: $312.48/month
• Producers, consumers, and Zookeeper can run multitenant
• Consumers tend to be CPU bound
How and when to scale the cluster?
• Estimate the traffic pattern ahead of time.
• Typically brokers are network IO and disk IO bound.
• Assess how far behind the consumer offsets are relative to the producer (a lag sketch follows this list).
• If brokers are running beyond 70% network utilization, add a new broker to the cluster and rebalance the partitions.
• Periodically run the load-testing tool (part of the distribution) to assess whether the cluster can meet SLA requirements.
• Keep track of the size of the log files retained on disk and make sure sufficient storage space is available.
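The lag check mentioned above is simple arithmetic once the offsets are in hand: per-partition lag is the log-end offset minus the group's committed offset. A sketch, assuming the offset values are gathered externally (e.g. via the offset checker tool or JMX):

```java
import java.util.HashMap;
import java.util.Map;

public class LagCheck {
    // Per-partition lag: how far the consumer trails the producer. Offsets are
    // assumed to be collected externally (offset checker tool, JMX, etc.).
    static Map<Integer, Long> lag(Map<Integer, Long> logEndOffsets,
                                  Map<Integer, Long> committedOffsets) {
        Map<Integer, Long> lag = new HashMap<>();
        for (Map.Entry<Integer, Long> e : logEndOffsets.entrySet()) {
            long committed = committedOffsets.getOrDefault(e.getKey(), 0L);
            lag.put(e.getKey(), e.getValue() - committed);
        }
        return lag;
    }
}
```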
What’s next?
What’s coming in the 0.9 release and beyond?
• Operational improvements
• Native offset storage – moving away from Zookeeper
• New Producer
• New Consumer
• Connection Quotas
• Schema repository
• Enhanced security
• Support for LZ4 compression
• Leader balancing
• New log Compaction controls
• Latency improvements
• Performance improvements
• Idempotence, Transactions
State of the Community
Community
• 10 committers – mostly from LinkedIn and Confluent
• 53 contributors
• Clients written in multiple languages, including Python, Go, Ruby, C, C++, Perl, and Clojure
Apache Kafka Version distribution
Source: https://sematext.files.wordpress.com/2015/03/kafka-versions-usage-pie.png
• Cloudera released Kafka 1.2.0 as part of their distribution, with integrations to Flume and Spark; 37 bugs were fixed in the new release.
• Good integrations with Storm, Spark Streaming, and Hadoop (Camus).
• MirrorMaker for mirroring data between clusters.
• Open-source web console for a UI.
• Flume and Kafka integration – Flafka.
• Yahoo open-sourced a Kafka Manager.
• varnishkafka – a Varnish log collector with an integrated Kafka producer.
• kafkatee – consumes messages from one or more topics and writes to multiple outputs.
• No credible open-source integrations with HBase and Sqoop yet.
• Work is being done on a Kafka schema store (a hedged envelope sketch follows this list).
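Until a schema store lands, one common convention is to prefix each message with a schema version byte so producers and consumers can evolve independently. A purely hypothetical envelope sketch (not a Kafka API):

```java
import java.nio.ByteBuffer;

// Hypothetical versioned envelope: one byte of schema version in front of the
// payload, so consumers can pick the right decoder when a topic carries
// multiple message versions.
public class VersionedMessage {
    static byte[] wrap(byte version, byte[] payload) {
        return ByteBuffer.allocate(1 + payload.length)
                         .put(version)
                         .put(payload)
                         .array();
    }

    static byte version(byte[] message) {
        return message[0];
    }

    static byte[] payload(byte[] message) {
        byte[] out = new byte[message.length - 1];
        System.arraycopy(message, 1, out, 0, out.length);
        return out;
    }
}
```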
Q&A
@mapr maprtech
mgunturu@mapr.com
Engage with us!
MapR
maprtech
mapr-technologies
Appendix
Message Queue Performance
http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
Consumer
• Zookeeper interaction
• Threading/partitions
• Message decoding
• Commit/offset management
• Plug in more consumers
• Layered on the high-level Kafka consumer (see the sketch after this list)
• Batching
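For reference, a hedged sketch of the 0.8 high-level consumer this layer builds on; the connector handles Zookeeper interaction, partition threading, and offset commits (group and topic names hypothetical):

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class HighLevelConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");  // hypothetical ensemble
        props.put("group.id", "demo-group");               // consumer group

        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        // One stream (thread) for the topic; more streams -> more partitions in parallel.
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
            connector.createMessageStreams(Collections.singletonMap("events", 1));

        ConsumerIterator<byte[], byte[]> it = streams.get("events").get(0).iterator();
        while (it.hasNext()) {
            System.out.println(new String(it.next().message()));
            connector.commitOffsets();  // offset management handled by the connector
        }
    }
}
```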
Message Queuing Systems – Stream processing
From 0.8.2 onwards
Log Compaction
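Compaction retains the most recent value for each key and drops older records; a null value (tombstone) deletes the key. A toy model of what a compacted log converges to (not Kafka's cleaner code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of log compaction: replaying the log into a map keeps exactly
// the latest value per key, which is what the compacted log converges to.
public class CompactionSketch {
    public static void main(String[] args) {
        String[][] log = {
            {"user-1", "addr=A"},
            {"user-2", "addr=B"},
            {"user-1", "addr=C"},   // supersedes the first record
            {"user-2", null}        // tombstone: deletes the key entirely
        };
        Map<String, String> compacted = new LinkedHashMap<>();
        for (String[] record : log) {
            if (record[1] == null) compacted.remove(record[0]);
            else compacted.put(record[0], record[1]);
        }
        System.out.println(compacted);  // {user-1=addr=C}
    }
}
```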


Editor's Notes

• #5 Jerry-rigged piping is built between systems and applications on an as-needed basis. A lot of the time there is an impedance mismatch, and we typically deploy asynchronous processing to counter it, i.e., request-response web services for any downstream processing. This approach is very ad hoc, and over time the set-up gets more and more complex. Data becomes unreliable and data quality suffers.
• #6 Kafka decouples data pipelines and acts as a central repository of data streams. It takes care of any impedance mismatch between different applications and the analytics necessary, and enables the Lambda Architecture seamlessly – tried and tested extensively at LinkedIn. That architecture has its limitations, as code has to be written twice, once in the realtime layer and once in the batch layer. There was a blog post by Jay Kreps, who founded a new company called Confluent, discussing the Kappa Architecture – http://radar.oreilly.com/2014/07/questioning-the-lambda-architecture.html. Twitter's Summingbird is a complex system which addresses these issues – https://blog.twitter.com/2013/streaming-mapreduce-with-summingbird
• #7 Producers push: batching, compression, sync (ack) or async (auto batch); sequential writes give guaranteed ordering within each partition. Consumers pull: no state is held by the broker, and consumers control their position in the stream. compression.codec selects GZIP or Snappy compression for messages from the producer to the broker. Zero copy is used between producers/consumers and the broker – zero copy is a feature of FileChannel (Java NIO) which avoids redundant data copies between intermediate buffers and reduces the number of context switches between user space and kernel space: http://kafka.apache.org/documentation.html#maximizingefficiency. Messages stay on disk when consumed and are deleted on TTL or by compaction: https://kafka.apache.org/documentation.html#compaction
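To make the zero-copy point concrete: FileChannel.transferTo is the Java NIO call the note refers to, letting the kernel move bytes from the page cache straight to a socket. A minimal stand-alone illustration (file name and socket address are stand-ins, not broker code):

```java
import java.io.FileInputStream;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;

public class ZeroCopyDemo {
    public static void main(String[] args) throws Exception {
        try (FileChannel log = new FileInputStream("segment.log").getChannel();  // stand-in log segment
             SocketChannel consumer = SocketChannel.open(new InetSocketAddress("localhost", 9000))) {
            long position = 0, remaining = log.size();
            while (remaining > 0) {
                // transferTo moves bytes file -> socket inside the kernel,
                // skipping the usual copy through a user-space buffer.
                long sent = log.transferTo(position, remaining, consumer);
                position += sent;
                remaining -= sent;
            }
        }
    }
}
```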
• #9 Partitions are sequential writes.
• #11 The partition count is kept higher than the number of brokers, so that leader partitions are evenly distributed across brokers, distributing the read/write load.
• #16 While configuring a producer, set acks=-1: a message is considered successfully delivered only after ALL the ISRs have acknowledged writing it. Set the topic-level configuration min.insync.replicas, which specifies the number of replicas that must acknowledge a write for it to be considered successful; if this minimum cannot be met and acks=-1, the producer will raise an exception. Set the broker configuration parameter unclean.leader.election.enable to false: this setting essentially means you are prioritizing durability over availability, since Kafka would avoid electing a leader, and instead make the partition unavailable, if no ISR is available to become the next leader safely. Failure cases: 1) Leader failure – Zookeeper has an ephemeral node to which the leader/controller is subscribed; when the leader fails, it notifies the controller. 2) Follower failure – what happens when a broker goes down, how new messages are handled, and what happens when the broker which was down comes back up and is ready to rejoin the ISR.
• #25 No UI gives visibility into the number of topics, partitions, consumers/consumer groups, topic names, log retention (time/capacity), current log size, consumer group IDs, partitions, offsets, lag, throughput, or owner.
  • #26 KAFKA-1890 - Fix bug preventing Mirror Maker from successful rebalance
• #27 Expanding your cluster: just assign a broker ID and start Kafka on a new node. Partitions are not automatically assigned; they have to be manually migrated with the partition re-assignment tool, which generates and executes a custom re-assignment plan and verifies its status. Decommissioning brokers: use a custom re-assignment plan to move all the replicas to one or more other brokers (evenly distributed).
• #31 While most of the focus for 0.6 and 0.7 was scalability, at-least-once semantics, strong enough guarantees, not falling over, persistence, and efficiency, the roadmap adds: non-blocking IO for the producer, with flush logic moving completely to the background; better durability and consistency controls; better partitioning; and security (https://cwiki.apache.org/confluence/display/KAFKA/Security) – authentication via TLS/SSL and Kerberos, plus pluggable authorization.
• #35 Yahoo's manager for Kafka manages multiple clusters, allows easy inspection of cluster state, and takes care of partition assignments and reassignments (based on the current state of the cluster). varnishkafka is a Varnish log collector with an integrated Apache Kafka producer; written from scratch with performance and modularity in mind, it consumes about a third of the CPU that varnishncsa does and has a far more frugal memory approach. kafkatee consumes messages from one or more Kafka topics and writes them to one or more outputs – either command pipes or files. Opportunity: no well-recognized HBase connector with Kafka.
• #43 Streaming support, reliability, guaranteed ordering of messages within a partition, scalability, runs on a distributed infrastructure, persistence, compression.