Making Kafka
Cloud Native
1
Companies are becoming software.
2
Old World
...disconnected use cases
around the edges
of a company
Software is...
New World
...a platform for
directly transacting
business
3
Rich front-end
customer experiences
4
Intelligent, real-time
backend operations
5
What technologies enable this?
6
Internet of
Things
Mobile
Microservices
Machine
Learning
7
Data in Motion
Cloud
Data at Rest
8
9
Databases
Databases are fundamentally incomplete.
Databases are designed for siloed, UI-centric applications.
10
Data at rest
Slow, daily
batch processing
Simple, static
real-time queries
Databases
A software defined company requires
total connectivity and instant reaction
in real-time.
11
PUBLIC CLOUD
With only data at rest, a company is
fragmented and siloed
LINE OF BUSINESS 02
LINE OF BUSINESS 01
12
Data in Motion
13
The Paradigm for Data in Motion: Event Streams
Data Management = Storage + Flow
14
Rich front-end
customer experiences
Real-time
Data
Real-time
Stream Processing
Real-time backend
operations
QUERY
A Sale
A shipment
A Trade
A Customer
Experience
PUBLIC CLOUD
Enterprise Architecture is a Big Mess
LINE OF BUSINESS 02
LINE OF BUSINESS 01
15
Central Nervous System
16
Kafka
ksqlDB:
Data in Motion + Data at Rest
17
User Payments
Jay 42
Sue 18
Fred 65
... ...
User
Jay 695
Sue 430
User
Jay 695
Sue 430
Tables Streams
User Credit Score
Jay 695
Sue 430
Fred 710
V1
V3
V2
Traditional Database
Active Query, Passive Data: DB Table
SELECT * FROM DB_TABLE
Stream Processing
Active Data, Passive Query: Event Stream
CREATE TABLE T AS SELECT * FROM EVENT_STREAM
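The contrast above can be sketched in a few lines of Python. This is a minimal illustration, not ksqlDB itself: the `one_shot_query` function and the `StandingQuery` class are hypothetical names, and the credit scores are the ones shown on the earlier slide. A one-shot query reads whatever the table holds at the moment it runs; a standing query is registered once, and its result updates as each event arrives.

```python
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, int]  # (user, credit_score)


def one_shot_query(table: Dict[str, int]) -> Dict[str, int]:
    """Active query, passive data: read the table once, at call time."""
    return dict(table)


class StandingQuery:
    """Active data, passive query: the query stays put and events flow through it.

    Hypothetical helper for illustration; it keeps the latest score per user.
    """

    def __init__(self) -> None:
        self.table: Dict[str, int] = {}
        self._listeners: List[Callable[[str, int], None]] = []

    def subscribe(self, listener: Callable[[str, int], None]) -> None:
        self._listeners.append(listener)

    def on_event(self, event: Event) -> None:
        user, score = event
        self.table[user] = score          # update the materialized result
        for listener in self._listeners:  # push the change to subscribers
            listener(user, score)


# One-shot query: the snapshot does not see later writes.
db_table = {"Jay": 695, "Sue": 430}
snapshot = one_shot_query(db_table)
db_table["Fred"] = 710

# Standing query: every event updates the result and notifies subscribers.
query = StandingQuery()
changes: List[Event] = []
query.subscribe(lambda user, score: changes.append((user, score)))
for event in [("Jay", 695), ("Sue", 430), ("Fred", 710)]:
    query.on_event(event)
```

The snapshot still holds only Jay and Sue, while the standing query's table includes Fred and the subscriber has seen all three changes.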
Build a complete streaming app
with a few SQL statements
20
1. Capture events
2. Perform continuous transformations
3. Create materialized views
4. Serve lookups against materialized views
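The four steps can be mimicked in plain Python to show how they fit together. This is a sketch under stated assumptions, not a ksqlDB program: a fixed list stands in for a Kafka topic, the first three payments come from the earlier slide while the last two are made up, and the "drop non-positive amounts" rule is an invented transformation. A real stream never ends, so the view would be updated incrementally rather than built in one pass.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Payment = Tuple[str, int]  # (user, amount)


def capture() -> Iterator[Payment]:
    """Step 1: capture events. A fixed list stands in for a Kafka topic."""
    yield from [("Jay", 42), ("Sue", 18), ("Fred", 65), ("Jay", 8), ("Sue", -5)]


def transform(events: Iterable[Payment]) -> Iterator[Payment]:
    """Step 2: continuous transformation. Drop non-positive amounts (assumed rule)."""
    return (event for event in events if event[1] > 0)


def materialize(events: Iterable[Payment]) -> Dict[str, int]:
    """Step 3: materialized view. Keep a running total per user."""
    view: Dict[str, int] = defaultdict(int)
    for user, amount in events:
        view[user] += amount
    return dict(view)


def lookup(view: Dict[str, int], user: str) -> int:
    """Step 4: serve a lookup against the materialized view."""
    return view.get(user, 0)


totals = materialize(transform(capture()))
print(lookup(totals, "Jay"))  # 50
```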
Central Nervous System
21
Kafka
Internet of
Things
Mobile
Microservices
Machine
Learning
22
Cloud
Data in Motion
The Rise of Cloud &
Cloud Native Data Services
23
Over the next four years $140B+ in IT
spend will move to the cloud.
24
Cloud Native Data Systems
Aurora DynamoDB Kinesis
S3 Spanner Snowflake
25
What is a Cloud-Native Data System?
Elastic
Usage-based Cost Model
Infinite
API-driven Operations
Secure and Reliable
Serverless
Global
Multitenant
26
My Experience
27
Managing Kafka at Scale is Hard
Tweaking servers
Tuning GC
Fiddling with ZK
Upgrades
Patches
Rebalances
Mirroring
28
What my manager
thinks I do
What the Front-end
team thinks I do
What I think I do
What I actually do
“I know! We’ll use
Kubernetes!”
30
Cloud Native Data Systems
Are The Future
31
Moving up the stack
32
If running Kafka is so hard, why is Kafka winning?
33
Apache Kafka
Amazon
Kinesis
Microsoft Azure
Event Hubs
Google Cloud
Pub/Sub
VS
End-to-End Solution
34
Pub/Sub Stream
Processing
Storage
Connectors
Community
28,030 Stack Overflow questions
210 meetups
58,902 meetup attendees
11,315 Jiras for Apache Kafka
51,335 emails sent to the Apache Kafka mailing list
667 KIPs
35
ksqlDB
Ecosystem
A point of view
Open Platforms Win
Linux Email
TCP/IP
x86
38
For Kafka to Thrive There Must Be
Cloud Native Kafka Services
39
The Capabilities of a Cloud-Native Data System
Elastic
Usage-based Cost Model
Infinite
API-driven Operations
Secure and Reliable
Serverless
Global
Multitenant
40
Cloud-native
Kinesis
Event Hubs
Pub/Sub
Kafka
Self-managed and
Semi-managed
41
Making a Cloud Native
Kafka Service
42
Sign up
43
Elastic
44
Infinite Storage
45
Usage-based Billing
46
Cost Effective
47
Complete
48
Connect
49
ksqlDB
50
Data Governance meets Data Discovery
Self-service Platform
Security
Data Catalog
Data Lineage
Data Policies
Data Quality
51
Confluent Cloud Data Governance
Data Quality
Increase data trust
● Enterprise-ready Schema Registry
● Schema management UI
● Broker-side schema ID validation
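To make the idea of schema ID validation concrete, here is a rough sketch of the kind of check involved. It relies on the Schema Registry wire format, in which a serialized message starts with a zero magic byte followed by a 4-byte big-endian schema ID. The function names, the set of registered IDs, and the sample payloads are hypothetical; the actual broker-side feature validates against the registry rather than an in-memory set.

```python
import struct
from typing import Optional, Set

MAGIC_BYTE = 0  # first byte of the Schema Registry wire format


def extract_schema_id(payload: bytes) -> Optional[int]:
    """Return the schema ID from the 5-byte header, or None if it is malformed."""
    if len(payload) < 5 or payload[0] != MAGIC_BYTE:
        return None
    (schema_id,) = struct.unpack(">I", payload[1:5])  # 4-byte big-endian ID
    return schema_id


def accept(payload: bytes, registered_ids: Set[int]) -> bool:
    """Accept a message only if it carries a known schema ID (simplified rule)."""
    schema_id = extract_schema_id(payload)
    return schema_id is not None and schema_id in registered_ids


registered = {7, 12}  # hypothetical IDs standing in for the registry
good = bytes([MAGIC_BYTE]) + struct.pack(">I", 7) + b"body"
unknown = bytes([MAGIC_BYTE]) + struct.pack(">I", 99) + b"body"
raw = b"plain text"  # no header at all
```

With these inputs, only `good` passes; `unknown` carries an unregistered ID and `raw` has no valid header.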
Data Catalog
Classify, organize, discover
● Search and discover schema metadata
● Manage data classifications
● Classify schemas with tags
Data Lineage
Turn data visibility on
● Visualize complex data-in-motion pipelines
● Audit data movement across systems
NOW IN EARLY ACCESS
52
Everywhere
53
54
Kafka must be
everywhere
your data is
Everywhere
55
● Create a fabric for event streams that spans the globe
● Span cloud environments and on premises
● Enable dynamic, API-driven replication topologies
● Mirror exact offsets between clusters
● No additional moving parts to manage or monitor
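Why exact offset mirroring matters can be shown with a toy model. This is not how Cluster Linking is implemented: `PartitionLog` and both mirroring functions are hypothetical, a partition is reduced to a Python list, and the event names are placeholders. The point is that when offsets are preserved, a consumer's committed offset on the source means the same thing on the destination; when records are simply re-produced into a cluster that already holds data, the offsets shift and need translation.

```python
from typing import Dict, List


class PartitionLog:
    """A single partition modeled as a list; the list index is the offset."""

    def __init__(self) -> None:
        self.records: List[str] = []

    def append(self, record: str) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset assigned to this record

    def read(self, offset: int) -> str:
        return self.records[offset]


def mirror_exact(src: PartitionLog) -> PartitionLog:
    """Copy records so each one lands at the same offset as on the source."""
    dst = PartitionLog()
    for offset, record in enumerate(src.records):
        assert dst.append(record) == offset
    return dst


def mirror_by_reproducing(src: PartitionLog, dst: PartitionLog) -> Dict[int, int]:
    """Re-produce records into an existing log; return source -> destination offsets."""
    return {offset: dst.append(record) for offset, record in enumerate(src.records)}


source = PartitionLog()
for record in ["sale", "shipment", "trade", "refund"]:
    source.append(record)

exact = mirror_exact(source)

other = PartitionLog()
other.append("unrelated")  # the destination already held a record
mapping = mirror_by_reproducing(source, other)

committed = 2  # a consumer's committed offset on the source
```

Here `exact.read(committed)` returns the same record as the source, while `other.read(committed)` returns a different one and the consumer would need `mapping` to find its place.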
56
Global: Cluster linking
Keep Data in Sync Globally
to serve regional needs
Currently in Preview
What about on premises?
57
Confluent for Kubernetes
Introducing a Declarative, API-driven Control Plane to deploy
and manage Confluent in Private Infrastructures.
AVAILABLE NOW!
Runs on Kubernetes: the infrastructure
runtime for cloud-native architectures.
Declarative API for operating Confluent in production
Integrates with the cloud-native ecosystem for security, reliability, and DevOps automation
Manage topics and RBAC policies through Infrastructure as Code
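A declarative control plane boils down to comparing the state you declared with the state that exists and working out the difference. The sketch below illustrates that reconcile step for topics. It is not the Confluent for Kubernetes API: the `reconcile` function, the action names, and the topic names are all hypothetical. The one Kafka-specific rule it encodes is that a topic's partition count can be increased but not decreased.

```python
from typing import Dict, List, Tuple

TopicSpec = Dict[str, int]  # topic name -> partition count
Action = Tuple[str, str]    # (action, topic name)


def reconcile(desired: TopicSpec, actual: TopicSpec) -> List[Action]:
    """Return the actions needed to move the actual state toward the desired state."""
    actions: List[Action] = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(("create", name))
        elif desired[name] > actual[name]:
            actions.append(("add_partitions", name))
        elif desired[name] < actual[name]:
            # Kafka cannot shrink a topic's partition count, so only report it.
            actions.append(("cannot_shrink", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


desired = {"orders": 6, "payments": 3}  # what the declared configuration says
actual = {"orders": 3, "legacy": 1}     # what the cluster currently has
plan = reconcile(desired, actual)
```

Running the same function when the two states already match yields an empty plan, which is what makes the approach safe to repeat.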
58
Not Just About One Cloud Service,
Kafka Itself Improves
59
KIP-500 Early Access
No ZooKeeper
60
Event-Driven Raft Consensus (AKA KRaft)
[Diagram: cluster with ZooKeeper vs. cluster with Quorum Controller; L denotes quorum leader]
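In a Raft-style quorum, a record in the metadata log counts as committed once a majority of voters have stored it. The sketch below shows only that majority rule. It is a generic simplification, not Kafka's controller code: it ignores leader election and terms, and the `committed_offset` function, controller names, and offsets are made up for illustration.

```python
from typing import Dict


def committed_offset(replicated: Dict[str, int]) -> int:
    """Return the highest log offset stored on a majority of voters.

    `replicated` maps each voter (leader included) to the last offset it holds.
    """
    offsets = sorted(replicated.values(), reverse=True)
    majority = len(offsets) // 2 + 1
    # The majority-th largest offset is held by at least `majority` voters.
    return offsets[majority - 1]


voters = {"controller-1": 10, "controller-2": 8, "controller-3": 5}
before = committed_offset(voters)  # two of three voters hold offset 8

voters["controller-3"] = 10        # the lagging voter catches up
after = committed_offset(voters)   # two of three voters now hold offset 10
```

With three voters the quorum tolerates one slow or failed controller: the committed offset advances as soon as any two agree.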
61
Simplicity is Sophistication
Simpler
operations
Tighter
security
2 Million+ partitions
per cluster
Single process
execution
62
Simpler
deployment
Support for up to 2 million partitions
[Chart: Controlled Shutdown Time and Recovery from Uncontrolled Shutdown, ZooKeeper vs. Quorum Controller]
63
Event Streaming Platform in Every Company
64
Kafka
This is the essential architecture for
companies that are becoming software.
65
Making Kafka
Cloud Native
66

Making Kafka Cloud Native | Jay Kreps, Co-Founder & CEO, Confluent