Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME

Conﬂuent Platform
Components Introduction
Customer Success Engineering

Agenda
3
01
Conﬂuent Platform
What makes up the Conﬂuent Platform
02
Basic Concepts
Events, Distributed Commit Log, Event
Streaming/Processing
03
Components
Brokers, Zookeeper, Clients, REST Proxy,
Schema Registry ,Connect, Kafka Streams,
ksqlDB, Control Center, Health+
Features
Multi-Region Clusters, Tiered Storage,
Cluster Linking, Self Balancing clusters
04

What is the Conﬂuent Platform?
An Enterprise Event Streaming Platform built around Apache
Kafka

Dynamic Performance & Elasticity
Elastic Scaling | Infinite Storage
Self-Balancing Clusters | Tiered Storage
Flexible DevOps Automation
Confluent for K8s | Ansible Playbooks
Marketplace Availability
Management & Monitoring
Cloud Data Flow | Metrics API
Control Center | Health+
Streaming Database
ksqlDB
Rich Pre-built Ecosystem
Connectors | Hub | Schema Registry
Multi-language Development
Non-Java Clients | REST Proxy
Admin REST APIs
Global Resilience
Multi AZ Clusters | 99.95% SLA | Replicator
Multi-Region Clusters | Cluster Linking
Data Compatibility
Schema Registry | Schema Validation
Enterprise-grade Security
RBAC | BYOK | Private Networking
Encryption | Audit Logs
TCO / ROI
Revenue / Cost / Risk Impact
Complete Engagement Model
Efficient
Operations at Scale
Unrestricted
Developer Productivity
Production-stage
Prerequisites
Partnership for
Business Success
Availability Everywhere
Committer-driven Expertise
Cloud service
Software
Fully Managed Cloud Service Self-managed Software
Training Partners
Enterprise
Support
Professional
Services
ARCHITECT
OPERATOR
DEVELOPER EXECUTIVE
Apache Kafka
Complete: Confluent completes Apache Kafka

Confluent Platform Components
https://www.confluent.io/whitepaper/confluent-enterprise-reference-architecture/
Application
Sticky Load Balancer
REST Proxy
Proxy
Kafka Brokers
Broker +
Rebalancer
ZooKeeper Nodes
ZK ZK ZK
Proxy
Broker +
Rebalancer
Broker +
Rebalancer
Broker +
Rebalancer
Schema Registry
Leader Follower
ZK ZK
Confluent
Control Center
Application
Clients
KStreams
pp
Streams
Kafka Connect
Worker +
Connectors
or
Replicator
Microservice
Worker +
Connectors
or
Replicator
ksqlDB
ksqlDB
Server
ksqlDB
Server

Apache Kafka is a Distributed Commit Log
Process streams of events
and produce new ones
In real time, as they occur
110101
010111
001101
100010
Publish and subscribe to
streams of events
Similar to a message queue
110101
010111
001101
100010
Store streams of events In a fault tolerant way
110101
010111
001101
100010
12

Anatomy of a Kafka Topic
1 2 3 4 5 6 8 9
7
Partition 1
Old New
1 2 3 4 5 6 8
7
Partition 0 10
9 11 12
Partition 2 1 2 3 4 5 6 8
7 10
9 11 12
Writes
1 2 3 4 5 6 8
7 10
9 11 12
Producers
Writes
Consumer A
(offset=4)
Consumer B
(offset=7)
Reads

Apache Kafka - scale out and failover
16
Broker 1
Topic1
partition1
Broker 2 Broker 3 Broker 4
Topic1
partition1
Topic1
partition1
Topic1
partition2
Topic1
partition2
Topic1
partition2
Topic1
partition3
Topic1
partition4
Topic1
partition3
Topic1
partition3
Topic1
partition4
Topic1
partition4

Apache Zookeeper - cluster coordination
17
Broker 1
partition
Broker 2
(controller) Broker 3 Broker 4
Zookeeper 2
partition
partition
Zookeeper 1
Zookeeper 3
(leader)
partition
partition
partition
partition
Stores metadata:
heartbeats, watches,
controller elections,
cluster/topic conﬁgs,
permissions writes go to leader

Clients
Smart Clients to dumb pipes

What happens inside a producer?
21
Producer
Producer Record
Topic
[Partition]
[Timestamp]
Value
Serializer Partitioner
Topic A
Partition 0
Batch 0
Batch 1
Batch 2
Topic B
Partition 1
Batch 0
Batch 1
Batch 2
Kafka
Broker
Send()
Retry
?
Fail
?
Yes
No
non-retriable
exception
success metadata
Yes
[Headers]
[Key]

Make Kafka
Widely Accessible
to Developers
Enable all developers to
leverage Kafka throughout
the organization with a wide
variety of Conﬂuent clients
Conﬂuent Clients
Battle-tested and high performing
producer and consumer APIs (plus
admin client)

Connect Any
Application to Kafka
REST Proxy
Non-Java
Applications
Native Kafka Java
Applications
Schema Registry
REST / HTTP
Allows third-party apps to
produce and consume
messages
Communicate via
HTTP-connected devices
Provides a RESTful
interface to a Kafka cluster
REST Proxy

29
Admin REST APIs
Confluent Platform
introduces REST APIs for
administrative
operations to simplify
Kafka management
Admin REST APIs add even greater
flexibility in how you manage Kafka:
Describe, list, and configure brokers
Create, delete, describe, list, and configure
topics
Delete, describe, and list consumer groups
Create, delete, describe, and list ACLs
List partition reassignments
Confluent offers several options to run
admin operations, including Control
Center, the CLI, and Kafka clients...

REST Proxy: Key Features
API endpoints:
• Produce messages:
Topic : POST /topics/(string:topic_name)
Partition : POST /topics/(string:topic_name)/partitions/(int:partition_id)
• Consume messages (Note: requires stickyness to REST Proxy instance):
Consumer Group : GET /consumers/(string:group_name)/instances/(string:instance)/records
• Consumer group management:
Add/Remove Instances: POST /consumers/(string:group_name), DELETE
/consumers/(string:group_name)/instances/(string:instance)
Commit/Get Offsets : POST or GET /consumers/(string:group_name)/instances/(string:instance)/offsets
Modify Subscriptions: POST, GET or DELETE /consumers/(string:group_name)/instances/(string:instance)/subscription
Modify Assignments : POST or GET /consumers/(string:group_name)/instances/(string:instance)/assignments
Reposition : POST or GET /consumers/(string:group_name)/instances/(string:instance)/positions
• Get Metadata:
Topic : GET /topics, GET /topics/(string:topic_name)
Partition : GET /topics/(string:topic_name)/partitions/(int:partition_id)
Broker : GET /brokers
• Admin functions (preview):
Create Topic : POST /clusters/(string:cluster_id)/topics
Delete Topic: DELETE /clusters/(string:cluster_id)/topics/(string:topic_name)
List Topic Conﬁgs: Partition : GET /clusters/(string:cluster_id)/topics/(string:topic_name)/conﬁgs
30

Conﬂuent Schema Registry
Enforce Producer/Consumer compatibility

Many sources without a policy
causes mayhem in a centralized
data pipeline
Ensuring downstream systems can
use the data is key to an operational
stream pipeline
Even within a single application,
different formats can be presented Incompatibly formatted message
The Challenge of
Data Compatibility
at Scale
App 03
App 02
App 01
32

Enable Application
Development
Compatibility
App 1
!
Schema
Registry
Kafka
topic
!
Serializer
App 1
Serializer
Develop using standard schemas
• Store and share a versioned
history of all standard schemas
• Validate data compatibility at the
client level
Reduce operational complexity
• Avoid time-consuming
coordination among developers to
standardize on schemas
Schema Registry

Schema Registry: Key Features
• Manage schemas and enforce schema policies
Define, per Kafka topic, a set of compatible schemas that are “allowed”
Schemas can be defined by an admin or by clients at runtime
Avro, Protobuf, and JSON schemas all supported
• Automatic validation when data is written to a topic
If the data doesn’t match the schema, the producer gets an error
• Works transparently
When used with Confluent Kafka clients, Kafka REST Proxy, and Kafka Streams
• Integrates with Kafka Connect
• Integrates with Kafka Streams
• Supports high availability (within a datacenter)
• Supports multi-datacenter deployments
34

Kafka Connect
No/Low Code connectivity to many systems

Kafka Connect
No-Code way of connecting known systems (databases, object storage, queues, etc)
to Apache Kafka
Some code can be written to do custom transforms and data conversions though
maybe out of the box Single Message Transforms and Converters exist
Kafka Connect Kafka Connect
Data
sources
Data
sinks

Kafka
Cluster
Kafka Connect
Durable Data
Pipelines
Schema
Registry
Worker
Integrate upstream and
downstream systems with Apache
Kafka®
• Capture schema from sources, use
schema to inform data sinks
• Highly Available workers ensure
data pipelines aren’t interrupted
• Extensible framework API for
building custom connectors
Kafka Connect
Worker
Worker
Worker

Instantly Connect Popular Data Sources & Sinks
Data Diode
210+
pre-built
connectors
90+ Confluent Supported 60+ Partner Supported, Confluent Verified

Confluent HUB
Easily browse connectors by:
• Source vs Sinks
• Confluent vs Partner supported
• Commercial vs Free
• Available in Confluent Cloud
confluent.io/hub
Instantly Connect
Popular Data
Sources & Sinks

Kafka Connect
Connectors are reusable components that
know how to talk to speciﬁc sources and
sinks.

Kafka Streams
Build apps which with stream processing inside

Stream Processing by Analogy
46
Kafka Cluster
Connect API Stream Processing Connect API
$ cat < in.txt | grep "ksql" | tr a-z A-Z > out.txt

Kafka Streams
Scalable Stream
Processing
Build scalable, durable
stream-processing services with
the Kafka Streams Java Library
• Simple functional API
• Powerful Processing API
• No Framework needed, it’s a
Library, use it and deploy it as any
other JVM Library
builder.stream(inputTopic)
.map((k, v) ->
new KeyValue<>(
(String) v.getAccountId(),
(Integer) v.getTotalValue())
)
.groupByKey()
.count()
.toStream().to(outputTopic);

Where does the processing code run?
49
Brokers?
Nope!
App
Streams
API
Same app, many instances
App
Streams
API
App
Streams
API

Leverages Consumer Group Protocol
50
App
Streams
API
Same app, many instances
App
Streams
API
App
Streams
API

ksqlDB
Stream processing using SQL and much more

Stream Processing in Kafka
52
Flexibility Simplicity
Producer/Consume
r
Kafka Streams API
● subscribe()
● poll()
● send()
● ﬂush()
● ﬁlter()
● map()
● join()
● aggregate()
ksqlDB
● Select…from…
● Join…where…
● Group by..

ksqlDB provides one solution for capturing events, stream processing, and serving both push
and pull queries
Simplify Your Stream Processing Architecture
DB
APP
APP
DB
PULL
PUSH
CONNECTORS
STREAM
PROCESSING
STATE STORES
ksqlDB
1 2
APP

Streaming app with 4 SQL statements
59
Serve lookups against
materialized views
Create
materialized views
Perform continuous
transformations
CREATE SOURCE CONNECTOR jdbcConnector WITH (
‘connector.class’ = '...JdbcSourceConnector',
‘connection.url’ = '...',
…);
CREATE STREAM purchases AS
SELECT viewtime, userid,pageid,
TIMESTAMPTOSTRING(viewtime, 'yyyy-MM-dd HH:mm:ss.SSS')
FROM pageviews;
CREATE TABLE orders_by_country AS
SELECT country, COUNT(*) AS order_count, SUM(order_total) AS order_total
FROM purchases
WINDOW TUMBLING (SIZE 5 MINUTES)
LEFT JOIN purchases ON purchases.customer_id = user_profiles.customer_id
GROUP BY country
EMIT CHANGES;
SELECT * FROM orders_by_country WHERE country='usa';
Capture data

Conﬂuent
Control Center
The simplest way to operate
and build applications with
Apache Kafka
For Operators
Centrally manage and monitor
multi-cluster environments and security
For Developers
View messages, topics and schemas,
manage connectors and build ksqlDB
queries

Messages
Browse messages, and search
offsets or timestamps by
partition
Topics
Create, edit, delete and view all
topics in one place
Schemas
Create, edit and view topic
schemas, and compare schema
versions
Accelerate Application Development and
Integration

Adhere to Established
Event Streaming SLAs
Monitor and optimize
system health
• Broker and ZooKeeper uptime
• Under replicated partitions
• Out of sync replicas
• Disk usage and distribution
• Alerting
Broker overview
Cluster overview

Accelerate Application
Development and
Integration
Simplify the developer’s
mental model for ksqlDB
• View a summary of all clusters
• Develop and run queries
• Support multiple ksqlDB clusters
at a time
Query editor

What is Health+?
● Intelligent alerts: manage Health+
intelligent alerts via Confluent Cloud’s UI
● Accelerated Confluent Support:
Support uses the performance
metadata to help you with questions or
problems even faster.
● Monitoring dashboards: view all of your
critical metrics in a single cloud-based
dashboard.
● Confluent Telemetry Reporter: send
performance metadata back to
Confluent via Confluent Telemetry
Reporter plugin.

Intelligent alerts There are 50+ alerts available today (with
many more to come), including:
● Request handler idle percentage
● Network processor idle percentage
● Active controller count
● Ofﬂine partitions
● Unclean leader elections
● Under replicated partitions
● Under min in-sync replicas
● Disk usage
● Unused topics
● No metrics from cluster in one hour

How does it work ?
● Confluent Telemetry Reporter is a plugin
that runs inside each Confluent
Platform service (only brokers at the
moment) to push metadata about the
service to Confluent
● Data is sent over HTTP using an
encrypted connection, once per minute
by default
● _confluent-telemetry-metrics topic
is where metrics are stored

72
Self-Balancing
Clusters
Self-Balancing Clusters
automate partition
rebalances to improve
Kafka’s performance,
elasticity, and ease of
operations
Shrinkage
Uneven
load
Expansion
Rebalances are required regularly to
optimize cluster performance:

73
Self-Balancing
Clusters
Self-Balancing Clusters
automate partition
rebalances to improve
Kafka’s performance,
elasticity, and ease of
operations
Manual Rebalance Process:
$ cat partitions-to-move.json
{
"partitions": [{
"topic": "foo",
"partition": 1,
"replicas": [1, 2, 4]
}, ...],
"version": 1
}
$ kafka-reassign-partitions ...
Conﬂuent Platform:
No complex math, no risk of human error
Self-Balancing

75
Tiered Storage
Tiered Storage enables
inﬁnite data retention
and elastic scalability by
decoupling the compute
and storage layers in
Kafka
Event Streaming is storage-intensive:
...
Micro-
service
...
SFDC App
Splunk
...
Device
Logs
Object
Storage
Main-
frame
...
Hadoop
Data Stores
3rd Party Apps Custom Apps /
Microservices
Logs

76
Tiered Storage
Kafka
Tiered Storage allows Kafka to
recognize two layers of storage:
Brokers
Cost-effective
Object Storage
Ofﬂoad old data
to object store

77
Tiered Storage
Kafka
Tiered Storage delivers three primary
benefits that revolutionize the way
our customers experience Kafka:
Infinite data retention
Reimagine what event streaming apps can do
Reduced infrastructure costs
Offload data to cost-effective object storage
Platform elasticity
Scale compute and storage independently

80
Multi-Region
Cluster (MRC)
A cluster stretched across multiple
regions that can replicate synchronously
and asynchronously. It requires 3 data
centers/regions minimum (at least for
zookeeper).
It is offset preserving and has automatic
client failover with no custom code.
Note: A rack can only have synchronous
or asynchronous replicas for a topic, not
both. But you can have multiple racks in
a DC/Zone

84
Cluster Linking
Cluster Linking allows you to directly connect
clusters together and mirror topics from one
cluster to another without the need for Connect.
Cluster Linking makes it much easier to build
multi-datacenter, multi-cluster, and hybrid
cloud deployments.

Sharing data between independent
clusters or migrating clusters presents
two challenges:
1. Requires deploying a separate Connect
cluster
2. Offsets are not preserved, so messages
are at risk of being skipped or reread
85
Cluster Linking
Cluster Linking
simpliﬁes hybrid cloud
and multi-cloud
deployments for Kafka
1
2
0 1 2 3 4 ...
4 5 6 7 8 ...
Topic 1, DC 1:
Topic 1, DC 2:
DC 1: DC 2:

86
Cluster Linking
Cluster Linking
simpliﬁes hybrid cloud
and multi-cloud
deployments for Kafka
Cluster Linking requires no additional
infrastructure and preserves offsets:
Migrate
clusters to
Conﬂuent
Cloud

Questions?
ableasdale@conﬂuent.io

Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME

Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME

Similar to Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME (20)

More from confluent

More from confluent (20)

Recently uploaded

Recently uploaded (20)

Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME