1
Building a Streaming
Platform with Kafka
Pere Urbón-Bayes
Technical Architect (TAM)
pere@confluent.io
2
Overview
1. Set the stage.
2. Introducing the key concepts (Kafka Broker, Connect, and KStreams)
3. Using events for notifications and state transfer
4. Let's build a small application
5. Conclusion
3
Kafka & Confluent
4
Is Kafka a Streaming Platform?
The Log Connectors
Producer Consumer
Streaming Engine
6
authorization_attempts possible_fraud
What exactly is Stream Processing?
7
CREATE STREAM possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 MINUTE)
GROUP BY card_number
HAVING count(*) > 3;
What exactly is Stream Processing?
authorization_attempts possible_fraud
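The query above can be mimicked in a few lines of plain Java. This is a toy sketch, not KSQL itself (the class and method names are invented): bucket authorization attempts into 5-minute tumbling windows, count per card, and flag any card with more than 3 attempts in a single window.

```java
import java.util.*;

public class FraudWindow {
    static final long WINDOW_MS = 5 * 60 * 1000; // 5-minute tumbling windows

    // attempts: (cardNumber, timestampMs) events.
    // Flags cards with more than 3 attempts inside one window.
    public static Set<String> possibleFraud(List<Map.Entry<String, Long>> attempts) {
        Map<String, Integer> counts = new HashMap<>(); // key: card + windowStart
        Set<String> flagged = new HashSet<>();
        for (Map.Entry<String, Long> e : attempts) {
            // Tumbling window: each timestamp falls in exactly one bucket.
            long windowStart = (e.getValue() / WINDOW_MS) * WINDOW_MS;
            String key = e.getKey() + "@" + windowStart;
            int c = counts.merge(key, 1, Integer::sum);
            if (c > 3) flagged.add(e.getKey()); // HAVING count(*) > 3
        }
        return flagged;
    }
}
```

The window-start arithmetic (`timestamp / size * size`) is what makes the windows tumbling rather than sliding: buckets are fixed and non-overlapping.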
13
Streaming is the toolset for dealing with events as they move!
14
Looking more closely: What is a Streaming Platform?
The Log Connectors
Producer Consumer
Streaming Engine
15
Looking more closely: Kafka’s Distributed Log
The Log Connectors
Producer Consumer
Streaming Engine
16
Kafka’s Distributed Log: A durable messaging system
Kafka is similar to a traditional messaging system (ActiveMQ,
RabbitMQ, ...) but with:
• Better scalability
• Fault tolerance
• High availability
• Better storage
17
The log is a simple idea
Messages are always
appended at the end
Old New
18
Consumers have a position all of their own
Sally
is here
George
is here
Fred
is here
Old New
Scan Scan
Scan
19
Only Sequential Access
Old New
Read to offset & scan
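The three ideas above (messages appended at the end, each consumer holding its own position, strictly sequential reads) fit in a tiny sketch. This is a toy illustration with invented names, not Kafka's actual storage code:

```java
import java.util.*;

public class ToyLog {
    private final List<String> entries = new ArrayList<>();
    // Each consumer tracks its own next read offset, independently.
    private final Map<String, Integer> offsets = new HashMap<>();

    // New messages are always appended at the end.
    public int append(String message) {
        entries.add(message);
        return entries.size() - 1; // the record's offset
    }

    // Sequential access only: read from this consumer's offset onward,
    // then advance its position. Other consumers are unaffected.
    public List<String> poll(String consumer, int maxRecords) {
        int from = offsets.getOrDefault(consumer, 0);
        int to = Math.min(entries.size(), from + maxRecords);
        offsets.put(consumer, to);
        return new ArrayList<>(entries.subList(from, to));
    }
}
```

Note that reading never deletes anything: Sally consuming a record does not stop Fred from reading it later, which is the key difference from a traditional queue.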
20
Scaling Out
21
Shard data to get scalability
Messages are sent to different
partitions
Producer (1) Producer (2) Producer (3)
Cluster of machines
Partitions live on different machines
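A minimal sketch of the sharding rule, assuming hash-based assignment. (Kafka's default partitioner actually uses murmur2 on the serialized key; `hashCode()` here is a stand-in for illustration.)

```java
public class ToyPartitioner {
    // Records with the same key always land on the same partition,
    // preserving per-key ordering while load spreads across machines.
    public static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions); // always non-negative
    }
}
```

The important property is determinism: every producer computes the same partition for the same key, with no coordination.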
22
Replicate to get fault tolerance
replicate
msg
msg
leader
Machine A
Machine B
23
Replication provides resiliency
A ‘replica’ takes over on machine failure
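A toy sketch of the idea, with invented names: every write lands on both the leader and a follower, so when the leader's machine fails, reads fail over to a replica that already holds all the data.

```java
import java.util.*;

public class ToyReplica {
    public final List<String> leader = new ArrayList<>();  // Machine A
    public final List<String> replica = new ArrayList<>(); // Machine B
    public boolean leaderAlive = true;

    public void write(String msg) {
        leader.add(msg);
        replica.add(msg); // replication: the follower copies every message
    }

    // On leader failure, serve from the replica: no data is lost.
    public List<String> read() {
        return leaderAlive ? leader : replica;
    }
}
```

Real Kafka replicates asynchronously with an in-sync-replica protocol and leader election; this sketch only shows why a surviving copy makes failover safe.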
24
Linearly Scalable Architecture
Single topic:
- Many producer machines
- Many consumer machines
- Many broker machines
No bottleneck!
Consumers
Producers
KAFKA
25
Clusters can be connected to provide worldwide, localized views
NY
London
Tokyo
Replicator Replicator
Replicator
26
The Connect API
The Log Connectors
Producer Consumer
Streaming Engine
27
Ingest / Egest into practically any data source
Kafka
Connect
Kafka
Connect
Kafka
28
List of Kafka Connect sources and sinks (and more…)
Amazon S3
Elasticsearch
HDFS
JDBC
Couchbase
Cassandra
Oracle
SAP
Vertica
Blockchain
JMX
Kinesis
MongoDB
MQTT
NATS
Postgres
RabbitMQ
Redis
Twitter
DynamoDB
FTP
GitHub
BigQuery
Google Pub Sub
RethinkDB
Salesforce
Solr
Splunk
29
The Kafka Streams API / KSQL
The Log Connectors
Producer Consumer
Streaming Engine
30
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 MINUTE)
GROUP BY card_number
HAVING count(*) > 3;
Engine for Continuous Computation
31
But it’s just an API
public static void main(String[] args) {
  StreamsBuilder builder = new StreamsBuilder();
  builder.stream("caterpillars")
         .map((k, v) -> coolTransformation(k, v))
         .to("butterflies");
  new KafkaStreams(builder.build(), props()).start();
}
32
Compacted
Topic
Join
Stream
Table
Kafka
Kafka Streams / KSQL
Topic
Join Streams and Tables
33
Windows / Retention – Handle Late Events
The asynchronous dilemma: Who was first? The order or the payment?
KAFKA
Payments
Orders
Buffer 5 mins
Emailer
Join by Key
34
KAFKA
Payments
Orders
Buffer 5 mins
Emailer
Join by Key
KStream orders = builder.stream("Orders");
KStream payments = builder.stream("Payments");
orders.join(payments, KeyValue::new, JoinWindows.of(1 * MIN))
      .peek((key, pair) -> emailer.sendMail(pair));
Windows / Retention – Handle Late Events
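The buffered join in the diagram can be sketched naively as follows. This is a toy nested-loop join with hypothetical types, not the KStreams implementation (which buffers each side in windowed state stores): an order and a payment with the same key are joined only if their timestamps fall within the window.

```java
import java.util.*;

public class ToyJoin {
    // Each side is a list of (key, timestampMs) events.
    // Emit the key once per order/payment pair inside the join window,
    // regardless of which side arrived first.
    public static List<String> joinByKey(List<Map.Entry<String, Long>> orders,
                                         List<Map.Entry<String, Long>> payments,
                                         long windowMs) {
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, Long> o : orders)
            for (Map.Entry<String, Long> p : payments)
                if (o.getKey().equals(p.getKey())
                        && Math.abs(o.getValue() - p.getValue()) <= windowMs)
                    joined.add(o.getKey());
        return joined;
    }
}
```

Because the window is symmetric in time, the "who was first?" question disappears: the payment may precede the order or follow it, and the pair still joins.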
35
A KTable is just a stream with infinite retention
KAFKA
Emailer
Orders, Payments
Customers
Join
36
KStream orders = builder.stream("Orders");
KStream payments = builder.stream("Payments");
KTable customers = builder.table("Customers");
orders.join(payments, EmailTuple::new, JoinWindows.of(1*MIN))
      .join(customers, (tuple, cust) -> tuple.setCust(cust))
      .peek((key, tuple) -> emailer.sendMail(tuple));
KAFKA
Emailer
Orders, Payments
Customers
Join
Materialize a
table in two
lines of code!
A KTable is just a stream with infinite retention
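The stream/table duality behind KTables can be shown in a toy form (invented class name): replay a (key, value) changelog stream and keep only the latest value per key, and you have materialized a table.

```java
import java.util.*;

public class ToyTable {
    // Replaying the changelog from the beginning rebuilds the table;
    // later events for the same key overwrite earlier ones.
    public static Map<String, String> materialize(
            List<Map.Entry<String, String>> changelog) {
        Map<String, String> table = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : changelog)
            table.put(e.getKey(), e.getValue());
        return table;
    }
}
```

This is why infinite retention matters: as long as the log keeps every update, any service can rebuild its local view of the table at any time just by re-reading the stream.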
37
The Log Connectors
Producer Consumer
Streaming Engine
Kafka is a complete Streaming Platform
38
What happens when we apply this to Microservices?
Microservices
39
Microservices App
Increasingly we build ecosystems: Microservices
40
We break them into services that have specific roles
Customer
Service
Shipping
Service
41
The Problem is now your DATA
42
Most services share the same core facts.
Orders Customers
Catalog
Most
services live
in here
43
Kafka works as a Backbone for Services to exchange Events
Kafka
Notification
Data is
replicated
45
An example:
Buying an iPad
46
Buying an iPad (with REST)
• Orders Service calls Shipping
Service to tell it to ship item.
• Shipping service looks up
address to ship to (from
Customer Service)
Submit
Order
shipOrder() getCustomer()
Orders
Service
Shipping
Service
Customer
Service
Webserver
47
Buying an iPad with Events for Notification
Message Broker (Kafka)
Submit
Order
Order
Created
getCustomer()
REST
Notification
Orders
Service
Shipping
Service
Customer
Service
Webserver
KAFKA
- Orders Service no longer
knows about the Shipping
service (or any other service).
Events are fire and forget.
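The decoupling can be sketched as a toy in-process bus (the `ToyBus` class is invented for illustration; Kafka additionally persists the events, so subscribers need not even be online at publish time):

```java
import java.util.*;
import java.util.function.Consumer;

public class ToyBus {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    // The Shipping Service registers interest in a topic.
    public void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    // Fire and forget: the Orders Service publishes an event without
    // knowing who (if anyone) consumes it.
    public void publish(String topic, String event) {
        subscribers.getOrDefault(topic, List.of()).forEach(h -> h.accept(event));
    }
}
```

The sender's code never mentions the receiver, so new services can subscribe to "Order Created" later without any change to the Orders Service.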
48
Customer
Updated
Submit
Order
Order
Created
Data is
replicated
Orders
Service
Shipping
Service
Customer
Service
Webserver
KAFKA
Buying an iPad with Events for Replication
- Call to the Customer Service is gone.
- Instead, data is replicated, as
events, into the Shipping
Service, where it is queried
locally.
49
Event streams are the key to scalable service ecosystems
Sender has no knowledge of
who consumes the event they
send. This decouples the
system.
Orders
Service
50
A Richer
Microservices
Application
51
KAFKA
Order
Requested Order
Validated
Order
Received
Browser
Webserver
Orders
Service
Orders Service validates orders by operating on the event stream
KStreams API used
to validate orders as
they are created.
52
KAFKA
Order
Requested Order
Validated
Order
Received
Browser
Webserver
Orders
Service
Wrap your Messages in Schemas: The Schema Registry
KStreams API used
to validate orders as
they are created.
Schema Registry
53
KAFKA
Order
Requested Order
Validated
Order
Received
Browser
Webserver
Orders
Service
Use Connect / CDC to Evolve Away From Legacy
KStreams API used
to validate orders as
they are created.
Connect
Stock
Schema Registry
54
KAFKA
Order
Requested Order
Validated
Order
Received
Browser
Webserver
Orders
Service
Store Events in the Log long term (KTables)
Connect
Stock
Schema Registry
Lookup table (a.k.a. View)
created inside the Orders
Service
(i.e. data is moved from Kafka to
the KStreams API so it can be
referred to locally)
Stock
55
KAFKA
Order
Requested Order
Validated
Order
Received
Browser
Webserver
Orders
Service
Writable table created
for reserved stocks
Connect
Products
Stock
You can create tables that are writable too
Stock
Reserved Stocks
Reserved Stocks
Schema Registry
56
KAFKA
Order
Requested Order
Validated
Order
Received
Browser
Webserver
Orders
Service
Connect
Products
Schema Registry
Stock
Reads & Writes wrapped in Transactional Guarantees
Stock
Reserved Stocks
Reserved Stocks
TRANSACTION
57
KAFKA
Order
Requested Order
Validated
Order
Received
Browser
Webserver
Orders
Service
Connect
Products
Schema Registry
Stock
Or, if you prefer, just use a database (Materialized View)
Reserved Stocks
Connect
“Materialized” View in a database
58
POST
GET
Load
Balancer
ORDERS TOPIC
ORDERS OV TOPIC
Order
Validations
KAFKA
INVENTORY
Orders
Inventory
Fraud
Service
Order
Details
Service
Inventory
Service
(see previous
figure)
Order
Created
Order
Validated
Orders View
Q in
CQRS
Orders
Service
C in
CQRS
Services in the Micro: Orders Service
Find the code online:
https://github.com/confluentinc/kafka-streams-examples/tree/3.3.0-post/src/main/java/io/confluent/examples/streams/microservices
59
Orders Customers
Payments
Stock
KStreams/KSQL
Larger Ecosystems
HISTORICAL
EVENT STREAMS
60
KAFKA (one cluster per region)
New York
Tokyo
London
Global / Disconnected Ecosystems
61
Key benefits of a Streaming Platform
Streams help you improve Microservices deployments in a number of ways:
• Decouple ecosystems so they are more pluggable and easier to change.
• Evolve away from legacy systems.
• Improve response times by building asynchronicity-first solutions.
• Bring progressive responsiveness into the core of your platform.
• Build an agnostic central nervous system for your data systems.
As well:
• Treat data as a first-class citizen, allowing true independence between teams (producers and
consumers).
• Safely manage the evolution of data in the ecosystem as time passes.
• Embrace event sourcing with an immutable log that can be rewound and replayed.
• Build different (materialized) views based on each service's requirements.
64
Services on a
Streaming
Platform
65
66
Thank You! Questions?
Pere Urbón-Bayes
Technical Architect (TAM)
pere@confluent.io
http://www.twitter.com/purbon