RabbitMQ & Kafka
Unless otherwise indicated, these slides are © 2013-2019 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Introduction
Madhav Sathe
Platform Architect
@madhav_sathe
Zoe Vance
Product Lead
zvance@pivotal.io
Agenda
Benefits
When to Use Each
Challenges
RabbitMQ Building Blocks
Developer Experiences
Live Coding
Kafka Building Blocks
Tips
Benefits
What is RabbitMQ?
A general-purpose message broker, based around message queues, designed with a smart broker / passive consumer model
Why RabbitMQ - mature and stable
Why RabbitMQ - wide and extendable support
Plugins
Tier 1 (19)
Community
- Routing
- Auth
- Mgmt
- Clustering
- Logging
- Queues
- Protocols
Client Libraries
Java (4)
Spring (3)
.Net (6)
Ruby (7)
Python (4)
PHP (7)
JavaScript & Node (4)
Rust (2)
Objective-C & Scala (1)
Other JVM (11)
C & C++ (4)
Go (3)
… find more in the RabbitMQ docs
Why RabbitMQ - first class monitoring
Why RabbitMQ - extremely flexible routing
RabbitMQ Broker
Exchange(s)
Bindings
Queues
Why RabbitMQ
Easy to scale by adding/removing competing consumers on a single queue
Can be configured for consistency, high availability, low latency, high throughput
Cluster rolling upgrades via feature flags
Easy to get started
Supports strict ordering
Wider use cases, e.g., event-driven microservices, RPC, ETL (with SCDF), enterprise message bus, pub-sub messaging, real-time analytics (with Reactor and RabbitMQ Reactive API)
What is Kafka?
Distributed & partitioned commit log with messaging semantics
Distributed real-time streaming platform
Why Kafka
Very high throughput with data guarantees
Massive scale
Ability to replay events
De facto standard for streaming platforms
Ability to plug-n-play consumer groups on a topic
Supports strict ordering
Replace complex data architectures
Broad ecosystem of connectors
Schema evolution with backward compatibility
Wider use cases: pub-sub messaging, event-driven microservices, log store, streaming, event sourcing, CDC, enterprise data pipelines
Why Kafka
Messaging API
Apps
Streaming API
Apps
Connect Source
Connect Sink
When to use each
When to use RabbitMQ Over Kafka
If you don’t have specific Kafka requirements, then RabbitMQ gives you greater flexibility, can meet high-throughput and real-time event-processing needs, and has a lower cost of operations
Evolving application requirements
Decoupled producers and consumers (using exchanges)
Consumers independently bring their own queue that binds to exchanges
Consuming applications don’t need to process messages that aren’t relevant
When to use Kafka Over RabbitMQ
Streaming platform
Extremely high throughput
Joining multiple streams or streams and tables to enrich the data
Massive scale (RMQ suffers beyond 5 brokers)
Replay
Schema evolution
Challenges
Challenges in RabbitMQ
Operational complexity in resolving network partitions
Queues are single-threaded
Scaling brokers >3 becomes complicated and can have negative performance impacts
No event replay
Does not natively support stateful streaming use cases (but can do so with Reactor + RabbitMQ Reactive API with an external store such as Redis)
Challenges in Kafka
No free lunch - operational complexity
Requires a separate Zookeeper cluster
Requires meticulous planning to select partition count
Storage management overheads
Careful coordination needed between teams writing consumer groups and producers
No out-of-the-box management & monitoring console in upstream OSS Kafka
Streaming API support restricted mainly to Java (KSQL can help but needs a separate cluster)
Let us play a little game
Let us say we are introducing topic “Bar” for “appA”, which requires parallelism of 20 consumers. So we define topic “Bar” with 20 partitions.
And YOU are a Kafka architect 😍
At some later point, “appB” also needs to stream events from “Bar”. But “appB” already streams events from topic “Foo”, and it actually needs to join the events from “Foo” and “Bar”. However, “Foo” has only 10 partitions, and a Kafka Streams join requires both topics to have the same number of partitions.
And YOU are a Kafka architect 😕
Now “appC” is interested in events from “Bar”. However, “appC” wants ordering across the whole topic, so “appC” really needs “Bar” to have only one partition.
And YOU are a Kafka architect 😢
Both “Foo” and “Bar” events are becoming popular, so “appD” doesn’t want to be left behind and wants both events. However, “appD” needs an ordering guarantee across all the events.
And YOU are a Kafka architect ……….. 😼 🤟
https://kafka.apache.org/20/documentation/streams/developer-guide/dsl-api.html#join-co-partitioning-requirements
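The conflicting requirements in this game can be written down as three small checks. This is only a sketch: the `Topic` class and the helper names are hypothetical, and the join check looks at partition counts alone, although real co-partitioning also depends on keys and partitioning strategy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topic:
    """Hypothetical stand-in for a Kafka topic; only the partition count matters here."""
    name: str
    partitions: int


def max_parallelism(topic: Topic) -> int:
    # Within one consumer group, each partition is read by at most one consumer.
    return topic.partitions


def can_join(left: Topic, right: Topic) -> bool:
    # Simplified co-partitioning check: both inputs need the same partition count.
    return left.partitions == right.partitions


def has_total_order(topic: Topic) -> bool:
    # Ordering is per partition, so topic-wide ordering needs exactly one partition.
    return topic.partitions == 1


foo = Topic("Foo", 10)
bar = Topic("Bar", 20)

print("appA gets 20 consumers:", max_parallelism(bar) >= 20)
print("appB can join Foo and Bar:", can_join(foo, bar))
print("appC gets topic-wide ordering on Bar:", has_total_order(bar))
```

No single partition count for “Bar” satisfies appA, appB and appC at once, which is the point of the game.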
Credit: Jack Vanlightly @vanlightly
Building Blocks of RabbitMQ
Messaging in RabbitMQ
Producers → Message → Broker (Exchange(s) → Bindings → Queues) → Consumers
Direct Exchange
Direct exchange delivers messages to queues when the message routing key exactly matches the queue’s binding key.
Example: routing key images.crop → queue cropper; routing key images.resize → queue resizer
Topic Exchange
Topic exchange delivers messages to queues when the routing key matches the wildcard pattern in the queue’s binding key.
Example: binding key *.*.error → queue errors; binding key eu.de.* → queue geos
Fanout Exchange
Fanout exchange delivers messages to all bound queues regardless of routing keys or pattern matching.
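The three exchange types can be compared side by side with a small in-memory model. This is a sketch of the routing rules only, not RabbitMQ client code; `topic_matches` and `route` are hypothetical helper names, and the bindings reuse the examples from the slides.

```python
def topic_matches(binding_key: str, routing_key: str) -> bool:
    """AMQP topic matching: '*' stands for exactly one word, '#' for zero or more."""
    def match(pattern, words):
        if not pattern:
            return not words
        if pattern[0] == "#":
            # '#' either absorbs nothing, or absorbs one word and stays active.
            return match(pattern[1:], words) or (bool(words) and match(pattern, words[1:]))
        if not words:
            return False
        return pattern[0] in ("*", words[0]) and match(pattern[1:], words[1:])

    return match(binding_key.split("."), routing_key.split("."))


def route(exchange_type: str, bindings, routing_key: str):
    """Return the sorted queue names that receive a message.

    bindings is a list of (binding_key, queue_name) pairs.
    """
    if exchange_type == "fanout":
        matched = [queue for _, queue in bindings]
    elif exchange_type == "direct":
        matched = [queue for key, queue in bindings if key == routing_key]
    elif exchange_type == "topic":
        matched = [queue for key, queue in bindings if topic_matches(key, routing_key)]
    else:
        raise ValueError(f"unsupported exchange type: {exchange_type}")
    return sorted(set(matched))


image_bindings = [("images.crop", "cropper"), ("images.resize", "resizer")]
log_bindings = [("*.*.error", "errors"), ("eu.de.*", "geos")]

print(route("direct", image_bindings, "images.crop"))
print(route("topic", log_bindings, "eu.de.error"))
print(route("fanout", log_bindings, "anything"))
```

A routing key such as eu.de.error matches both topic bindings, so the message is copied to both queues.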
Clustering with RabbitMQ
RabbitMQ Roadmap
RabbitMQ OSS
RabbitMQ 3.8 just out!
- Quorum queues using Raft to provide persistent and fault-tolerant messaging systems
- Mixed-version rolling upgrades
- Enhanced observability (new metrics and a built-in plugin with visualizations in Grafana)
- OAuth 2.0 support
RMQ for Pivotal Platform
RabbitMQ for PCF 1.18 just out!
- Support for off-platform application instances connecting to on-platform RabbitMQ instances
RMQ on Kubernetes
In closed beta
Goal is to provide a great developer and day-2 operational experience (automated reliable upgrades, problem resolution and actionable observability)
PAS, PKS, DIY
RabbitMQ Developer Experience
Options for Java developers
Spring Cloud Stream
● Abstraction over protocol
● Abstraction over messaging vendor
● Same code regardless of messaging broker
Spring AMQP
● Provides a "template" as a high-level abstraction for sending and receiving messages.
● Support for message-driven POJOs with a "listener container".
● Similar to the JMS support in the Spring Framework.
RabbitMQ AMQP Client
● Low-level API to RabbitMQ
Reactor RabbitMQ
● Reactive API for RabbitMQ based on Reactor and the RabbitMQ Java Client
● Functional APIs enable messages to be published/consumed with non-blocking back-pressure and very low overhead
Kafka Developer Experience
Spring Integration
Spring for Apache Kafka
Spring Cloud Stream Binder for Kafka
Spring Cloud Stream Kafka Streams Binder
KStream, KTable & GlobalKTable
Spring Cloud Stream
Input & Output Message Channels
Spring Cloud Kafka Stream
Spring Cloud Stream Microservice
Spring Cloud Stream Kafka Streams Microservice
Binder
Demo
JustRide: Driver Behavior Driven Car Insurance
[Architecture diagram. Components: Car Events, Speed Check Processor, Violations, Score Processor, Customer Score, Customer Score Sink, Customer Info, Customer Scores API, Dashboard. Runs on Pivotal Platform across vSphere, Azure & Azure Stack, Google Cloud and AWS.]
For the demo
Car Events Load Generator (SCSt)
Car Events
Speed Check Processor (SCSt + KStream)
Violations
Score Processor (SCSt + KStream)
Main business logic
application.yaml to define input and output bindings with topics as destinations
Model for the demo
CarEvent:
- uuid
- latitude
- longitude
- speed
Simple JSON-friendly POJO
Message Key: uuid
ViolationEvent:
- uuid
- List<CarEvent>
- violationCount
- start
- end
State Management:
- new() - ‘start’ timestamp, new ArrayList<CarEvent>
- addCarEvent() - check speed, add to list, increase violationCount
- closeWindow() - ‘end’ timestamp
JSON-friendly “self-aware” data model
Message Key: uuid
Speed Check Processor (SCSt + KStream)
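A minimal sketch of this data model, written in Python rather than the demo’s Java. The field names follow the slide; the `SPEED_LIMIT` value and the choice to store every event while counting only the speeding ones are assumptions, since the slide does not spell them out.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional

SPEED_LIMIT = 100.0  # assumed threshold; the slides do not give one


@dataclass
class CarEvent:
    uuid: str
    latitude: float
    longitude: float
    speed: float


@dataclass
class ViolationEvent:
    uuid: str
    car_events: List[CarEvent] = field(default_factory=list)
    violation_count: int = 0
    start: float = field(default_factory=time.time)  # 'start' timestamp set on creation
    end: Optional[float] = None

    def add_car_event(self, event: CarEvent) -> None:
        # Check speed, add to list, increase violation_count when over the limit.
        self.car_events.append(event)
        if event.speed > SPEED_LIMIT:
            self.violation_count += 1

    def close_window(self) -> None:
        # 'end' timestamp marks the window as closed.
        self.end = time.time()


window = ViolationEvent(uuid="car-1")
window.add_car_event(CarEvent("car-1", 52.52, 13.40, 88.0))
window.add_car_event(CarEvent("car-1", 52.53, 13.41, 131.0))
window.close_window()
print(window.violation_count, len(window.car_events))
```

Because the aggregate carries its own state transitions, the stream processor only has to call these methods as events for a given uuid arrive.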
Building Blocks of Kafka
Kafka is Log & Real-time*, How’s That Possible?
Sequential I/O as opposed to random → huge benefits for disk performance
Extremely smart utilization of OS page cache → achieves reads and writes without disk IOPS in the call path
Zero-copy sendfile → kernel copies the data directly from the disk file to the socket, without going through the application
In-memory performance
* When evil scenarios are avoided :)
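The zero-copy idea can be tried with Python’s standard `socket.sendfile`, which uses the operating system’s `sendfile` call where available and otherwise falls back to ordinary sends. This is only an illustration of the technique, not Kafka code; `send_log_segment` and the payload are made up for the example.

```python
import os
import socket
import tempfile


def send_log_segment(path: str, conn: socket.socket) -> int:
    """Send a whole file over a connected stream socket; returns bytes sent."""
    with open(path, "rb") as segment:
        # The file contents are handed to the kernel instead of being read
        # into application buffers first (when os.sendfile is available).
        return conn.sendfile(segment)


payload = b"offset-0|offset-1|offset-2\n" * 16
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    segment_path = tmp.name

broker_side, consumer_side = socket.socketpair()
sent = send_log_segment(segment_path, broker_side)
broker_side.close()

received = b""
while chunk := consumer_side.recv(4096):
    received += chunk
consumer_side.close()
os.remove(segment_path)

print(sent, received == payload)
```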
[Diagram: producer write path with ACK = 1. Components: Producer, Broker A with in-memory cache and file system. Step 1: send message; steps 2-4 are unlabeled; step 5: receive ACK.]
https://www.confluent.io/kafka-summit-sf18/kafka-on-zfs
[Diagram: consumer read path in five numbered steps. Components: Consumer, Broker A with in-memory cache and file system, socket buffer, NIC buffer.]
https://medium.com/@sunny_81705/what-makes-apache-kafka-so-fast-71b477dcbf0
Broker and Cluster
Broker A (Controller), Broker B, Broker C
Controller responsibilities:
- Monitoring other brokers
- Broker shutdown
- Election of partition leaders
- Tell brokers about partition leaders
Zookeeper Cluster
/controller → Broker A
/topic/A/0 → Broker A
/topic/A/1 → Broker B
/topic/A/2 → Broker C
Topic and Partitions Basics
Topic with Partition 0 and Partition 1
A topic is divided into partitions; each partition is essentially a log file
Partitions can have replicas for HA
One of the replicas is chosen as the Leader
All reads and writes happen only on the Leader
A publisher can only append to the partition
Ordering is guaranteed only within a partition
Producer API
Cluster: 2 brokers, Topic: trades, Partitions: 2, Replicas: 2
Brokers A and B each hold both partitions: one as Leader, the other as Follower
Producers send messages in batches
Messages without a key are shared across partitions in round-robin fashion
Hash-based partition selection: messages with the same key go to the same partition
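The two partition-selection rules can be sketched in a few lines. `SketchPartitioner` is a hypothetical class, and `zlib.crc32` stands in for the hash a real Kafka client uses, so the partition numbers it yields will not match a real cluster.

```python
import itertools
import zlib
from typing import Optional


class SketchPartitioner:
    """Toy partitioner: hash for keyed messages, round robin for keyless ones."""

    def __init__(self, num_partitions: int):
        self.num_partitions = num_partitions
        self._round_robin = itertools.cycle(range(num_partitions))

    def partition(self, key: Optional[bytes]) -> int:
        if key is None:
            # No key: spread messages evenly across partitions.
            return next(self._round_robin)
        # Same key always hashes to the same partition (crc32 is a stand-in hash).
        return zlib.crc32(key) % self.num_partitions


partitioner = SketchPartitioner(num_partitions=2)
print([partitioner.partition(None) for _ in range(4)])
print(partitioner.partition(b"ACME") == partitioner.partition(b"ACME"))
```

Keyed messages keep their per-key ordering because they always land in the same partition, at the cost of possible skew when a few keys dominate.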
Partition and Consumer Offset
Partition P0 holds records at offsets 0 1 2 3 4 5 6 7; producers/publishers append at the end
Each consumer tracks its own offset in P0: the diagram marks three consumer offsets (Alana’s, Cody’s, Zoe’s) at positions 1, 3 and 6
Partition offset map topic
Just reset your offset to replay the messages
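Replay by offset reset can be shown with a toy append-only log. `PartitionLog` and `Consumer` are hypothetical classes for illustration, not the Kafka consumer API; the key point is that reading never removes records, so moving the offset back replays them.

```python
class PartitionLog:
    """Append-only list of records; the list index is the offset."""

    def __init__(self):
        self._records = []

    def append(self, value) -> int:
        self._records.append(value)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset: int, max_records: int = 100):
        return self._records[offset:offset + max_records]


class Consumer:
    """Each consumer keeps its own position in the shared log."""

    def __init__(self, log: PartitionLog, offset: int = 0):
        self.log = log
        self.offset = offset

    def poll(self, max_records: int = 100):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch

    def seek(self, offset: int) -> None:
        # Resetting the offset is all that is needed to replay.
        self.offset = offset


p0 = PartitionLog()
for n in range(8):
    p0.append(f"event-{n}")

alana = Consumer(p0, offset=1)
first_pass = alana.poll()
alana.seek(0)
replayed = alana.poll()
print(len(first_pass), len(replayed))
```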
Kafka Streams
Cluster
Topic: trades (KStream)
Topic: company profiles (KTable)
Topic: recommendations (KStream)
Kafka-managed topic
Kafka Streaming Microservice
Fault-tolerant state store
Partitions and Consumer Groups
Cluster with Topic: trades, Partition: 0 and Partition: 1, read by one Consumer Group. The diagram builds up in steps:
- Consumer Group with Consumer 1
- Consumer Group with Consumer 1 and Consumer 2
- Consumer Group with Consumer 1, Consumer 2 and Consumer 3
- Consumer Group with Consumer 1 and 🔥 Consumer 2 (Consumer 2 fails)
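The diagram steps can be replayed with a simple assignment function. This is a round-robin sketch under the rule that each partition goes to exactly one consumer in the group; `assign` is a hypothetical helper, and real Kafka clients offer several assignment strategies.

```python
def assign(partitions, consumers):
    """Give each partition to exactly one consumer, round robin."""
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        assignment[consumers[index % len(consumers)]].append(partition)
    return assignment


trades = [0, 1]  # topic "trades" with two partitions

print(assign(trades, ["consumer-1"]))
print(assign(trades, ["consumer-1", "consumer-2"]))
print(assign(trades, ["consumer-1", "consumer-2", "consumer-3"]))
# consumer-2 fails: the group rebalances and consumer-1 takes over its partition
print(assign(trades, ["consumer-1"]))
```

With two partitions, a third consumer receives nothing, so the partition count caps the useful size of a consumer group.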
Tips for RabbitMQ
Tips to efficiently use RabbitMQ
● Overall
○ Clarify requirements of your application
○ Monitor RabbitMQ
○ Use the wide range of RabbitMQ resources
● Queues
○ Happy Rabbit is an Empty Rabbit - keep queues short
○ For performance, use in-memory, non-mirrored queues
○ For HA & data safety, use Quorum Queues
● Producer/Consumers
○ Use multiple consumers if consumers are slow or there are too many producers
○ If messages can’t be lost, use acknowledgments
○ For performance, use autoack
● Resources
○ Every connection, channel, and queue costs memory & CPU; the more there are and the harder they work, the more resources are required
○ For throughput, have as many queues as cores on the underlying nodes of a multi-core system
● Messages
○ For performance, keep messages in memory
○ If messages can’t be lost, ensure you use pre-fetch values with publisher confirms/consumer acks
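The trade-off between acknowledgments and autoack can be seen in a toy queue model. `SketchQueue` is a hypothetical in-memory class, not a RabbitMQ client; it only mimics the rules that unacknowledged deliveries are requeued when a consumer dies and that prefetch caps the number of unacknowledged deliveries.

```python
from collections import deque


class SketchQueue:
    def __init__(self, messages, prefetch: int = 10):
        self.ready = deque(messages)
        self.unacked = {}          # delivery tag -> message awaiting ack
        self.prefetch = prefetch
        self._next_tag = 1

    def deliver(self, auto_ack: bool = False):
        """Return (delivery_tag, message), or None if nothing can be delivered."""
        if not self.ready:
            return None
        if not auto_ack and len(self.unacked) >= self.prefetch:
            return None  # prefetch limit reached: wait for acks
        message = self.ready.popleft()
        tag = self._next_tag
        self._next_tag += 1
        if not auto_ack:
            self.unacked[tag] = message
        return tag, message

    def ack(self, tag: int) -> None:
        self.unacked.pop(tag, None)

    def consumer_died(self) -> None:
        # Requeue everything delivered but never acknowledged, oldest first.
        for tag in sorted(self.unacked, reverse=True):
            self.ready.appendleft(self.unacked.pop(tag))


safe = SketchQueue(["a", "b", "c"], prefetch=2)
first = safe.deliver()
second = safe.deliver()
blocked = safe.deliver()      # None: two deliveries are still unacked
safe.ack(first[0])
safe.consumer_died()          # "b" was never acked, so it comes back

fast = SketchQueue(["a", "b"])
fast.deliver(auto_ack=True)
fast.consumer_died()          # "a" is gone for good

print(list(safe.ready), list(fast.ready))
```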
Tips for Kafka
Tips to efficiently use Kafka
● Use ACK = All if you love your data, ACK = 1 for a balance between reliability & latency
● Route “hot keys” to a different topic to avoid “hot partitions”
● Producer efficiency
○ Pick an optimal batch size that fills up fast so that you get a good mix of throughput and latency
○ A linger of 5 ms is usually a good rule of thumb
○ Increase batch size for higher throughput
○ Lower linger for lower latency
● If you have large files to send, consider the following options
○ Put files in a shared location and send the location of the files over Kafka
○ Break the file down to the right size and use keys to ensure ordered processing
● Tips on using keys
○ Setting Key=Null gives the best performance and balanced partitions across the cluster
○ Explore the possibility of leveraging downstream stores to establish ordering
○ Use keys only if you need ordered messaging in real time or joins across different topics
● A lot of Kafka’s performance depends on the availability of page cache and on GC overhead, so monitor these two parameters with extra care
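The producer tips map onto three standard Kafka producer properties: `acks`, `linger.ms` and `batch.size`. The sketch below only builds a settings dictionary; `producer_config` and its goal names are hypothetical, and the numbers are illustrative starting points rather than recommendations from the slides (apart from the 5 ms linger).

```python
def producer_config(goal: str = "balanced") -> dict:
    """Return illustrative producer settings for a tuning goal."""
    config = {
        "acks": "1",          # balance between reliability and latency
        "linger.ms": 5,       # rule-of-thumb linger from the tips
        "batch.size": 16384,  # bytes; illustrative baseline
    }
    if goal == "durability":
        config["acks"] = "all"          # wait for all in-sync replicas
    elif goal == "throughput":
        config["batch.size"] = 65536    # larger batches, fewer requests
    elif goal == "latency":
        config["linger.ms"] = 0         # send as soon as possible
    elif goal != "balanced":
        raise ValueError(f"unknown goal: {goal}")
    return config


for goal in ("balanced", "durability", "throughput", "latency"):
    print(goal, producer_config(goal))
```

A dictionary like this would typically be passed to whichever Kafka client library is in use; measuring with real traffic is still needed before settling on values.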
Watch out! Things causing adverse impact in Kafka
● Extremely large messages may block consumers if the consumer is not configured with an adequate buffer size. If there are some large messages, consider using a separate topic
● Ad-hoc offset resets/replays while writes and other reads are going on
● ACK=All will have some impact on latency, so plan the partitions accordingly
○ In-sync replicas lagging behind; always watch your ISR list
● Watch out for disk IOPS during reads and writes
○ Less available memory for page caching forces more disk IOPS
○ Vastly lagging consumers working at different speeds force disk IOPS
● Brokers running different versions within a cluster can cause performance issues
○ A few brokers running much faster or much slower than the rest of the cluster
● Watch out for zombie brokers in older Kafka versions
● Old Kafka client libraries (producer/consumer) may have an adverse impact on throughput and latency
● Check rebalancing of partitions if a broker goes down; if you don’t throttle it, it can use up all your network bandwidth
● Restarting a cluster with a large number of partitions: leader election takes time
● If a consumer crashes or is unable to send heartbeats, partition reassignment will take place, and during this time no consumer in the group can process any message
Thanks to
Jack Vanlightly
Marcial Rosales
Soby Chacko
Sina Sajoodi
Wayne Lund
Timothy Dalsing
Dan Carwin
Gerhard Lazu
Karl Nilsson
Arnaud Cogoluegnes
James Williams
Resources
https://rabbitmqsummit.com/
https://kafka-summit.org/
https://www.rabbitmq.com/
https://kafka.apache.org/
https://www.rabbitmq.com/blog/tag/3-8/
https://www.confluent.io/
https://jack-vanlightly.com/blog/2017/12/3/rabbitmq-vs-kafka-series-introduction
Thank You!