Building distributed systems is challenging. Luckily, Apache Kafka provides a powerful toolkit for putting together big services as a set of scalable, decoupled components. In this talk, I'll describe some of the design tradeoffs when building microservices, and how Kafka's powerful abstractions can help. I'll also talk a little bit about what the community has been up to with Kafka Streams, Kafka Connect, and exactly-once semantics.
Presentation by Colin McCabe, Confluent, Big Data Day LA
Roadmap
● Example network service
• Why microservices?
• Why Kafka?
● Apache Kafka background
● How Kafka helps scale microservices
● Kafka APIs
• Kafka Connect API
• Kafka Streams API
● Wrap up
● New Kafka features and improvements
Themes
● Improving Decoupling
• Everything in one big app: no decoupling
• Microservices with REST: multiple services
• Microservices with Kafka: decoupled services sharing data
● Improving Scalability
• Everything in one big app: single node
• Microservices with REST: one node per service
• Microservices with Kafka: scalable microservices
Apache Kafka
● A distributed streaming platform
● https://kafka.apache.org/intro
● Kafka was built at LinkedIn around 2010
● Multi-platform: clients in Java, Scala, C, C++, Python, Go, C#, …
Kafka is Durable
● Data is replicated to multiple servers and persisted to disk.
● Configurable log retention.
● Consumers can read from any part of the log.
[Diagram: Frontend, ‘views’ topic]
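The idea that consumers can read from any part of the log can be sketched with a toy in-memory log. This is only an illustration: `TopicLog`, `append`, and `read` are hypothetical names, not the Kafka client API, and nothing here is replicated or persisted to disk.

```python
class TopicLog:
    """Toy stand-in for one topic partition (hypothetical, not the Kafka API)."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Append a record to the end of the log and return its offset."""
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]


views = TopicLog()
for story in ["story-1", "story-2", "story-3"]:
    views.append(story)

# Reading does not remove records, so each reader picks its own offset.
replay_from_start = views.read(0)   # a reader replaying the whole log
only_latest = views.read(2)         # a reader starting near the end
```

Because records stay in the log until retention removes them, a new reader can start from offset 0 long after the data was written.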
Scaling with Kafka
● Can have multiple producers writing to a topic
● Can have multiple consumers reading from a topic
● Can add new microservices to consume data easily
• Example: add more microservices processing views
• Organize microservices around data, rather than APIs
● Can add more Kafka brokers to handle more messages and topics
• Horizontal scalability
Scaling a Topic with Multiple Partitions
[Diagram: Frontend, events topic, three Backend instances]
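A rough sketch of how keyed records could be spread over partitions. It assumes a CRC32 hash as a stand-in (Kafka's Java client hashes keys with murmur2), and `choose_partition`, the keys, and the partition count of 3 are made up for illustration.

```python
import zlib


def choose_partition(key, num_partitions):
    """Map a key to a partition; the same key always lands on the same partition.

    CRC32 is only a stand-in hash for this sketch.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 3  # assumed partition count for the example
partitions = [[] for _ in range(NUM_PARTITIONS)]

events = [("alice", "view"), ("bob", "view"), ("alice", "click"), ("chao", "view")]
for key, value in events:
    partitions[choose_partition(key, NUM_PARTITIONS)].append((key, value))
```

Since every record with the same key goes to the same partition, per-key ordering is preserved even though the topic as a whole is spread across several partitions.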
Load Balancing with Multiple Consumers
[Diagram: Frontend, story_emails topic, emailer consumer group]
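Load balancing within a consumer group can be pictured as dividing a topic's partitions among the group's members. The sketch below uses a simple round-robin rule; `assign_partitions`, the consumer names, and the six partitions are assumptions for illustration, and this is not Kafka's actual group coordination protocol.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over consumers round-robin.

    Each partition gets exactly one owner within the group.
    """
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


story_email_partitions = list(range(6))  # assumed: 6 partitions

# Two emailer instances share the work...
before = assign_partitions(story_email_partitions, ["emailer-1", "emailer-2"])

# ...and adding a third instance spreads the same partitions more thinly.
after = assign_partitions(
    story_email_partitions, ["emailer-1", "emailer-2", "emailer-3"]
)
```

Adding a consumer to the group reduces each member's share of partitions, which is how the emailer service scales out.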
Calculating News Reader Metrics
clicks + locations = clicks per location

clicks        locations        clicks per location
Alice  13     Alice  europe    europe  68
Bob     4     Bob    us        us      23
Chao   25     Chao   asia      asia    25
Bob    19     Bob    us        ...
Dave   55     Dave   europe
...           ...
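The calculation above can be reproduced with plain Python as a sanity check. This is only a sketch of the join-then-aggregate logic using in-memory lists, not the Kafka Streams API; the variable names are made up.

```python
from collections import defaultdict

# The two inputs from the slide: click counts and user locations.
clicks = [("Alice", 13), ("Bob", 4), ("Chao", 25), ("Bob", 19), ("Dave", 55)]
locations = [
    ("Alice", "europe"),
    ("Bob", "us"),
    ("Chao", "asia"),
    ("Bob", "us"),
    ("Dave", "europe"),
]

# Treat locations as a lookup table: the latest location seen for each user.
location_by_user = dict(locations)

# Join each click record to its user's location, then sum per location.
clicks_per_location = defaultdict(int)
for user, count in clicks:
    clicks_per_location[location_by_user[user]] += count
```

The result matches the slide: europe 68 (13 + 55), us 23 (4 + 19), asia 25.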
Kafka Streams API
● Inputs and outputs are Kafka streams
● Fault-tolerance, rebalancing, scalability provided by Kafka
● KStream
● KTable
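One way to picture the difference between KStream and KTable: a stream treats every record as an independent event, while a table treats each record as an update to the current value for its key. The sketch below is plain Python, not the Kafka Streams API; the function names and sample records are hypothetical.

```python
records = [("Alice", 13), ("Bob", 4), ("Bob", 19)]


def as_stream_totals(records):
    """Stream view: every record is a separate event, so values add up."""
    totals = {}
    for key, value in records:
        totals[key] = totals.get(key, 0) + value
    return totals


def as_table(records):
    """Table view: a later record for a key replaces the earlier one."""
    table = {}
    for key, value in records:
        table[key] = value
    return table


stream_totals = as_stream_totals(records)  # Bob: 4 + 19
table_view = as_table(records)             # Bob: latest value only
```

The same records give Bob a total of 23 when read as a stream of events, but a current value of 19 when read as a table.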
New Kafka Features and Improvements
● Exactly once semantics in Kafka 0.11
• https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
● Consumer and producer performance improvements
• Up to +20% producer throughput
• Up to +50% consumer throughput
● Better CLASSPATH isolation for Kafka Connect connectors
Conclusion
● The loose coupling, deployability, and testability of microservices make them a great way to scale.
● Apache Kafka is an incredibly useful building block for many different microservices.
● Kafka is reliable and does the heavy lifting.
● Kafka Connect is a great API for connecting with external databases, Hadoop clusters, and other external systems.
● Kafka Streams can process data in real time.
● https://www.confluent.io/solutions/microservices/