Abstract: Building distributed systems is challenging. Apache Kafka provides a toolkit for assembling large services from scalable, decoupled components. In this talk, I'll describe some of the design tradeoffs in building microservices and how Kafka's abstractions can help. I'll also cover what the community has been doing with Kafka Streams, Kafka Connect, and exactly-once semantics.
3. 3
Network Services
Services
● Expose functionality over the network
● Manage state and business logic
Important aspects
● Availability
● Maintainability
● Consistency
● Extensibility
4. 4
Microservices vs Monolithic Services
Microservices
● Multiple components
● Loose coupling
● Organized around capabilities
Monolithic Services
● “One big app”
● Usually a single process
● No separation of concerns
5. 5
Why Microservices?
● Microservices
○ Bounded contexts
○ Easier to test
○ Easier to scale to multiple servers
○ Easier to scale to multiple teams
○ More robust
● Monolithic Services
○ Easier to get started with
6. 6
Synchronous vs. Asynchronous Communication
● Synchronous
○ Request / Response
○ REST
○ gRPC
○ Apache Thrift
● Asynchronous
○ Message queue
○ Kafka
○ ZeroMQ
[Diagram: Service A and Service B]
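The difference between the two styles can be sketched in a few lines. This is a minimal in-process illustration, not Kafka code: `service_b`, `call_synchronously`, and `call_asynchronously` are hypothetical names, and a standard-library `queue.Queue` stands in for a real message queue.

```python
import queue
import threading
from typing import List, Optional


def service_b(request: str) -> str:
    """Hypothetical downstream service that handles one request."""
    return "handled:" + request


def call_synchronously(requests: List[str]) -> List[str]:
    # Request/response: Service A blocks on every call until B answers.
    return [service_b(r) for r in requests]


def call_asynchronously(requests: List[str]) -> List[str]:
    # Message queue: Service A enqueues and moves on, while Service B
    # drains the queue on its own thread at its own pace.
    mailbox: "queue.Queue[Optional[str]]" = queue.Queue()
    results: List[str] = []

    def consumer() -> None:
        while True:
            msg = mailbox.get()
            if msg is None:  # sentinel: no more messages
                break
            results.append(service_b(msg))

    worker = threading.Thread(target=consumer)
    worker.start()
    for r in requests:
        mailbox.put(r)  # unbounded queue, so A never waits for B here
    mailbox.put(None)
    worker.join()
    return results
```

Both functions produce the same results; what changes is whether Service A has to wait for Service B on each request.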
10. 10
Apache Kafka
● A distributed streaming platform
● That lets you publish and subscribe to streams of records
● … in a fault-tolerant, real-time way
○ https://kafka.apache.org/intro
● Open source
○ https://www.confluent.io/download/
11. 11
Kafka History
● Kafka was built at LinkedIn around 2010
● https://cwiki.apache.org/confluence/display/KAFKA/Powered+By
● Multi-platform: clients in Java, Scala, C, C++, Python, Go, C#, …
13. 13
Topics
● A topic is a category or feed name
● Divided into partitions
● Can have multiple consumers and producers per topic
● Can view older messages
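A toy in-memory model may help make these properties concrete. This is a sketch under stated assumptions, not Kafka's implementation: `ToyTopic` is a hypothetical class, and `zlib.crc32` is only a stand-in for whatever hash a real client uses to pick a partition.

```python
import zlib
from typing import List, Tuple


class ToyTopic:
    """Hypothetical in-memory topic: a named set of append-only partitions."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: List[List[Tuple[str, str]]] = [
            [] for _ in range(num_partitions)
        ]

    def publish(self, key: str, value: str) -> Tuple[int, int]:
        # Records with the same key always land in the same partition.
        # crc32 is an assumption made for this sketch.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append((key, value))
        offset = len(self.partitions[partition]) - 1
        return partition, offset

    def read(self, partition: int, offset: int = 0) -> List[Tuple[str, str]]:
        # Reading does not remove anything, so any number of consumers
        # can re-read older messages by choosing an earlier offset.
        return self.partitions[partition][offset:]
```

Because `read` never deletes records, two consumers can read the same partition independently, each tracking its own offset.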
14. 14
Log-Compacted Topics
● Regular topics
○ A stream of messages: key/value pairs
○ As new messages are added, old ones may be deleted to make space.
● Log-Compacted Topics
○ Retain the last known value for each key
○ Can easily fetch the value associated with a key
○ Acts a little bit like a table
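The idea of keeping only the last known value per key can be sketched as follows. This is a simplified model and not how the broker implements compaction; `compact` and `lookup` are hypothetical helper names.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, str]  # (key, value)


def compact(log: List[Record]) -> List[Record]:
    """Keep only the most recent record for each key, in log order."""
    latest_offset: Dict[str, int] = {}
    for offset, (key, _value) in enumerate(log):
        latest_offset[key] = offset  # later records overwrite earlier ones
    return [
        record
        for offset, record in enumerate(log)
        if latest_offset[record[0]] == offset
    ]


def lookup(log: List[Record], key: str) -> Optional[str]:
    """Table-like read: the last known value for a key, or None."""
    return dict(compact(log)).get(key)
```

For the log `[("k1", "a"), ("k2", "b"), ("k1", "c")]`, compaction drops the older `k1` record and `lookup` for `k1` returns `"c"`.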
20. 20
Kafka Streams
● docs.confluent.io/current/streams/
● Process data, not just transport it
● Makes stream processing simpler and easier
● Applications are fault-tolerant and elastic: scaling and load balancing are handled by Kafka.
● The inputs and outputs are just Kafka topics.
● A library, not a framework.
21. 21
Kafka Streams
● Tables and streams are duals
○ A stream can be viewed as the changelog for a table
○ A table is just a cache of the latest value associated with a key in a stream
● The result of an operation like a join or a count can be viewed as a table
[Diagram: a KStream of records (K1, V1), (K2, V2), (K3, V3) alongside a KTable mapping K1 -> V1, K2 -> V2, K3 -> V3]
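The duality can be shown with plain Python data structures. This sketch does not use the Kafka Streams API; `table_from_stream` and `count_by_key` are hypothetical helpers, with a list of pairs standing in for a stream and a dict standing in for a table.

```python
from typing import Dict, Hashable, List, Tuple


def table_from_stream(stream: List[Tuple[Hashable, object]]) -> Dict:
    """Stream -> table: replay the changelog, keeping the latest value per key."""
    table: Dict = {}
    for key, value in stream:
        table[key] = value
    return table


def count_by_key(
    stream: List[Tuple[Hashable, object]],
) -> Tuple[Dict, List[Tuple[Hashable, int]]]:
    """Count records per key.

    Returns the count table and the changelog stream of its updates.
    """
    counts: Dict = {}
    changelog: List[Tuple[Hashable, int]] = []
    for key, _value in stream:
        counts[key] = counts.get(key, 0) + 1
        changelog.append((key, counts[key]))  # table -> stream: emit each update
    return counts, changelog
```

Replaying the changelog returned by `count_by_key` through `table_from_stream` rebuilds the same count table, which is the sense in which a stream is the changelog of a table.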
24. 24
Achieving Exactly-Once Semantics
● Idempotence: exactly-once, in-order semantics per partition
● Transactions: atomic writes across multiple partitions
● Exactly-once support in Kafka Streams
● https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
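The idempotence bullet can be illustrated with a toy model of one partition that deduplicates retried sends using a producer ID and a per-producer sequence number. This is a simplified sketch, not the broker's actual logic; `ToyPartition` and its method are hypothetical, and transactions are not modeled.

```python
from typing import Dict, List


class ToyPartition:
    """Hypothetical partition that accepts each (producer, sequence) once."""

    def __init__(self) -> None:
        self.log: List[str] = []
        self.last_seq: Dict[str, int] = {}  # producer_id -> last accepted seq

    def append(self, producer_id: str, seq: int, value: str) -> bool:
        expected = self.last_seq.get(producer_id, -1) + 1
        if seq < expected:
            # A retry of a send that already succeeded: drop it so the
            # record appears in the log exactly once.
            return False
        if seq > expected:
            # A gap means an earlier send is missing, so appending now
            # would break in-order delivery.
            raise ValueError("out-of-order sequence number")
        self.log.append(value)
        self.last_seq[producer_id] = seq
        return True
```

A producer that resends sequence 0 after a lost acknowledgement gets `False` back on the second attempt, and the log still contains the record only once.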
25. 25
Conclusion
● The loose coupling, deployability, and testability of microservices make them a great way to scale.
● Apache Kafka is an incredibly useful building block for many different microservices.
● Kafka is reliable and does the heavy lifting
● Kafka is more than just a pipe: Kafka Streams can process data in real time. Libraries, not frameworks. Deploy your way.