Learnings From Shipping
1000+ Streaming Data Pipelines
To Production
Hakan Lofcali, Stefan Sprenger
{hakan,stefan}@datacater.io
WHAT WE DO / WHO WE ARE
‣We develop tools for developers working with streaming data
‣With Kafka, Kubernetes, and fewer than 5 developers, we built a platform that helped teams deploy more than 1,000 streaming data pipelines to production
‣Let us take you through our journey: the tools we adopted, the hurdles we encountered, and the solutions we found
‣Infra Space
‣Customer Solutions Space
STREAMING DATA PIPELINES
Continuous Applications of Data Transformations
[Diagram: Kafka Connect source connector -> Kafka topic -> Kafka Streams -> Kafka topic -> Kafka Connect sink connector]
STREAMING PIPELINES IN THE WILD
Customer communications in real-time
[Diagram labels: Clickstream data, Raw data, Process clickstream events, Actionable data, Outbox service]
GOALS FOR THIS TALK
Avoid common pitfalls in streaming ETL
‣How to operate streaming data pipelines in an efficient and robust manner?
‣How to deal with resource-leaking Kafka Connect connectors?
‣How to monitor and debug running pipelines?
‣What are ways to deal with large data sources or slow data sinks?
‣What is missing in today’s ecosystem for streaming to become a commodity?
How to operate streaming data pipelines in an efficient and robust manner?
SEPARATE BY NODE POOL
We operate one K8s cluster with multiple node pools
[Diagram: one K8s cluster with three node pools: Kafka node pool, control plane node pool, Kafka Connect node pool. Labels: Apache Kafka Broker*, Strimzi Kafka Operator, DataCater Control Plane, Quarkus Pipeline Pod, K8s StatefulSet, K8s Deployment(s)]
▸ Max 110 pods per node
▸ Max 5,000 nodes per cluster
▸ Max 150,000 pods in total
▸ *Separate Kafka and Kafka Connect clusters
PROCESS ORCHESTRATION
State-of-the-art Orchestration
▸ We started out on a single VM and moved to a distributed process orchestration tool
▸ Kafka’s ecosystem lags behind state-of-the-art process orchestration tools such as Kubernetes and Nomad
▸ ksqlDB and Kafka Connect manage processes, but, as we will see, they lack fundamental patterns needed to operate them at scale
[Diagram labels: Kafka Streams, Single VM, Docker, Java, Quarkus, Kubernetes, Quarkus SmallRye Reactive Messaging]
STARTUP TIME
[Timeline diagram of startup time from scheduling to the first processed event. Labels: Scheduled, Liveness OK, First Event Processed, Kafka Streams. Time marks: 0s, 5s, 10s, 30s, 60s]
WORKLOAD DENSITY
[Diagram: Docker on a single VM runs Kafka Streams apps at RAM < 1.5GB each; Kubernetes runs Quarkus apps at RAM < 300MB each]
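The density difference can be made concrete with a little arithmetic. The sketch below treats the RAM figures from the slide as per-pod budgets and combines them with the 110-pods-per-node limit quoted earlier. The node size and the helper name `pods_per_node` are assumptions made for illustration, and memory reserved for the operating system and the kubelet is ignored.

```python
# Hypothetical sizing helper; the node size below is an assumption, not a figure from the talk.
MAX_PODS_PER_NODE = 110  # Kubernetes per-node pod limit quoted in the talk


def pods_per_node(node_ram_mb: int, pod_ram_mb: int) -> int:
    """Return how many pods fit on one node, bounded by RAM and by the pod limit."""
    if node_ram_mb <= 0 or pod_ram_mb <= 0:
        raise ValueError("RAM values must be positive")
    by_memory = node_ram_mb // pod_ram_mb
    return min(by_memory, MAX_PODS_PER_NODE)


NODE_RAM_MB = 32_000  # assumed node size
kafka_streams_pods = pods_per_node(NODE_RAM_MB, 1_500)  # RAM < 1.5GB per pod
quarkus_pods = pods_per_node(NODE_RAM_MB, 300)          # RAM < 300MB per pod
print(kafka_streams_pods, quarkus_pods)
```

Under these assumptions the same node hosts roughly five times as many Quarkus pipelines as Kafka Streams pipelines, and on larger nodes the pod limit, not memory, becomes the bound.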
STRIMZI FOR KAFKA
[Diagram: Kubernetes with a dedicated node pool for Kafka; the Strimzi Kafka Operator manages a Kubernetes StatefulSet whose pods each run an Apache Kafka broker]
CONSUMER RE-BALANCING
[Diagram: pods running "summit" consumers 0, 1, 2, … that read from "summit" partitions 0, 1, 2, …]
▸ During consumer re-balancing, no records are consumed until the coordinator has completed the re-balance
▸ The number of consumers can change due to errors, disconnections, or scaling triggered by new load requirements
UNEXPECTED SHUTDOWN
[Chart: startup time versus partition size (log size). Annotations: Not Linear, Point of No Recovery]
How to deal with resource-leaking Kafka Connect connectors?
KAFKA CONNECT SELF-MANAGED
Kubernetes (K8s); Dedicated Kafka Connect Nodepool
[Diagram sequence: tasks of the ElasticSearch Sink, S3 Source, PostgreSQL Source, and MySQL CDC Source connectors run on the pods of a Kafka Connect cluster (K8s Deployment). Individual tasks (S3 Source Task A, ElasticSearch Sink Task C) are highlighted in some frames, and the number of pods changes between frames. Later frames show several separate Connect clusters instead of one shared cluster.]
TAKE-AWAYS
Key Learnings
▸ Utilise state-of-the-art orchestration tools.
▸ Running Kafka on Kubernetes does not bring automatic elasticity.
▸ Kafka Connect is not self-contained. This becomes a bigger headache the more connector tasks run in a given cluster.
▸ Think about startup time throughout your tech stack, from Kafka brokers through Connect tasks to streaming applications.
How to monitor and debug pipelines?
MONITORING STREAMING DATA PIPELINES
[Diagram: Kafka Connect source connector -> Kafka topic -> Kafka Streams -> Kafka topic -> Kafka Connect sink connector]
POTENTIAL PRODUCTION ISSUES
Most common issues in streaming data pipelines
▸ External data sources or data sinks are unavailable (temporarily)
▸ Consumers (processors or sink connectors) are slower than producers
▸ Processing of events fails
MONITORING CONNECTORS
Monitoring the health of connectors
[Diagram: Kafka Connect source connector -> Kafka topic -> Kafka Streams -> Kafka topic -> Kafka Connect sink connector]
‣Periodically call /connectors/:connector_name/status and investigate the response
Healthy:
GET /connectors/hdfs-sink/status
{
  "name": "hdfs-sink",
  "connector": {
    "state": "RUNNING",
    "worker_id": "localhost:8083"
  },
  "tasks": [
    {
      "id": 0,
      "state": "RUNNING",
      "worker_id": "localhost:8083"
    }
  ]
}
Unhealthy:
GET /connectors/hdfs-sink/status
{
  "name": "hdfs-sink",
  "connector": {
    "state": "FAILED",
    "worker_id": "localhost:8083"
  },
  "tasks": [
    {
      "id": 0,
      "state": "FAILED",
      "worker_id": "localhost:8083",
      "trace": "org.apache.kafka.common.errors.RecordTooLargeException"
    }
  ]
}
‣If the connector has failed, try to restart it (this deals with, e.g., temporary API outages) and escalate or alert after X restarts
‣Sometimes, escalating directly might be reasonable
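The restart-then-escalate policy can be sketched as a small decision function over the status response. This is a minimal sketch, not the speakers' implementation: the names `failed_task_ids` and `next_action` and the value of `MAX_RESTARTS` (the "X" above) are assumptions. The HTTP calls are left out on purpose; the Kafka Connect REST API offers POST /connectors/:connector_name/restart and POST /connectors/:connector_name/tasks/:task_id/restart for the restart step.

```python
from typing import Any

MAX_RESTARTS = 3  # the "X" from the slide; the concrete value is an assumption


def failed_task_ids(status: dict[str, Any]) -> list[int]:
    """Return the ids of all tasks that report the FAILED state."""
    return [t["id"] for t in status.get("tasks", []) if t.get("state") == "FAILED"]


def next_action(status: dict[str, Any], restarts_so_far: int,
                max_restarts: int = MAX_RESTARTS) -> str:
    """Decide what to do with a connector: 'ok', 'restart', or 'escalate'."""
    connector_failed = status.get("connector", {}).get("state") == "FAILED"
    if not connector_failed and not failed_task_ids(status):
        return "ok"
    if restarts_so_far >= max_restarts:
        return "escalate"  # automated restarts did not help, alert a human
    return "restart"


# Status payload shaped like the unhealthy response of the status endpoint.
unhealthy = {
    "name": "hdfs-sink",
    "connector": {"state": "FAILED", "worker_id": "localhost:8083"},
    "tasks": [{"id": 0, "state": "FAILED", "worker_id": "localhost:8083"}],
}
print(next_action(unhealthy, restarts_so_far=0))
print(next_action(unhealthy, restarts_so_far=MAX_RESTARTS))
```

A periodic job would fetch the status, call `next_action`, and keep the restart counter per connector, resetting it once the connector reports RUNNING again.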
MONITORING BACKPRESSURE
Consumer Lags
[Diagram: Kafka topic -> consumer]
‣Consumer lag is the difference between the latest offset available in the Kafka topic (partition) and the latest offset processed by the consumer
‣It indicates how far consumers are behind producers, measured in number of records
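The definition translates directly into a per-partition subtraction. The sketch below works on plain dictionaries of offsets so that it stays independent of any client library. The function name and the sample offsets are made up for illustration, and treating a partition without a committed offset as starting at offset 0 is a simplifying assumption.

```python
def consumer_lag(end_offsets: dict[int, int],
                 committed_offsets: dict[int, int]) -> dict[int, int]:
    """Return the lag per partition: latest available offset minus processed offset."""
    lag = {}
    for partition, end_offset in end_offsets.items():
        # Assumption: a partition without a committed offset has not been read at all.
        committed = committed_offsets.get(partition, 0)
        lag[partition] = max(end_offset - committed, 0)
    return lag


# Made-up offsets for a topic with three partitions.
end_offsets = {0: 1500, 1: 1200, 2: 900}
committed_offsets = {0: 1450, 1: 1200}

lag_per_partition = consumer_lag(end_offsets, committed_offsets)
total_lag = sum(lag_per_partition.values())
print(lag_per_partition, total_lag)
```

Alerting on the total is usually enough to detect backpressure; the per-partition view helps to spot a single stuck consumer.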
Consumer Lags in Streaming Data Pipelines
[Diagram: Kafka Connect source connector -> Kafka topic -> Kafka Streams -> Kafka topic -> Kafka Connect sink connector]
Kafka Streams Consumer Lag
‣Number of records that have been extracted by the data source connector but have not yet been processed by the Kafka Streams app
‣If data processing is slower than extraction, you might want to increase the degree of parallelism of the Kafka Streams app
Sink Connector Consumer Lag
‣Number of records that have been processed by the Kafka Streams app but have not yet been published by the sink connector
‣If publishing data to the data sinks is slower than processing, you might want to increase the number of tasks of the sink connector
DEAD-LETTER QUEUES
Keep track of errors in processing
‣By default, Kafka Connect connectors fail when they encounter errors in processing
‣We recommend configuring a dead-letter queue (topic) for storing records that could not be processed
‣Monitor the dead-letter queue topic and manually investigate failed records
errors.tolerance = all
errors.deadletterqueue.topic.name = topic-dlq
[Diagram: Kafka Connect source connector; successfully processed records go to the topic, failed records go to the dead-letter queue topic]
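As a sketch of where these two settings go, the snippet below adds them to a connector configuration before it would be submitted to Kafka Connect. The helper name `with_dead_letter_queue`, the topic "orders", and the choice of an Elasticsearch sink are assumptions for illustration. Note that in Apache Kafka Connect the errors.deadletterqueue.* settings take effect for sink connectors; the extra errors.deadletterqueue.context.headers.enable setting stores the failure reason in the headers of the dead-letter records.

```python
import json


def with_dead_letter_queue(config: dict[str, str], dlq_topic: str) -> dict[str, str]:
    """Return a copy of a sink connector config with dead-letter queue handling enabled."""
    updated = dict(config)
    updated["errors.tolerance"] = "all"  # keep running when a record fails
    updated["errors.deadletterqueue.topic.name"] = dlq_topic
    # Attach the failure context (e.g. the exception) as headers to dead-letter records.
    updated["errors.deadletterqueue.context.headers.enable"] = "true"
    return updated


# Hypothetical base configuration of a sink connector.
base_config = {
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "topics": "orders",
}
sink_config = with_dead_letter_queue(base_config, "topic-dlq")
print(json.dumps(sink_config, indent=2))
```

The resulting dictionary is what would be sent as the "config" object when creating or updating the connector.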
What are ways to deal with large data sources or slow data sinks?
DEALING WITH LARGE DATA SOURCES
[Diagram: PostgreSQL (TBs of data) -> Debezium source connector]
‣Large data sources hurt most when performing initial snapshots, which can take hours
‣Use multiple connectors for the same database and make use of table.include.list
‣Adjust the snapshot query and consider only a subset of the data source
‣Mitigate pain with incremental snapshotting
‣Accelerate snapshotting with parallelisation
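Splitting one database across several connectors can be sketched as follows. The snippet distributes tables round-robin and emits one partial Debezium configuration per connector, each with its own table.include.list. The connector names, slot names, and table names are hypothetical. A separate slot.name per connector is included because each Debezium PostgreSQL connector needs its own replication slot; connection settings are omitted.

```python
def split_tables(tables: list[str], connector_count: int) -> list[dict[str, str]]:
    """Distribute tables round-robin over connectors and return partial configs."""
    if connector_count < 1:
        raise ValueError("connector_count must be at least 1")
    groups = [tables[i::connector_count] for i in range(connector_count)]
    return [
        {
            "name": f"pg-source-{i}",    # hypothetical connector name
            "slot.name": f"dc_slot_{i}",  # one replication slot per connector
            "table.include.list": ",".join(group),
        }
        for i, group in enumerate(groups)
        if group  # skip connectors that would have no tables
    ]


# Made-up table names.
tables = ["public.orders", "public.customers", "public.events",
          "public.payments", "public.items"]
for config in split_tables(tables, 2):
    print(config)
```

Round-robin ignores table sizes; with a few very large tables it may be better to give each of them a dedicated connector.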
DEALING WITH SLOW DATA SINKS
[Diagram: Kafka Connect sink connector -> Elasticsearch]
‣Detect slow data sinks by monitoring the sink connector consumer lag
‣Parallelise sending records to the data sink by increasing the number of connector tasks
‣If the data sink supports it, batch multiple records and send them to the data sink with one request
‣Avoid duplicated data delivery by adjusting max.poll.records or max.poll.interval.ms
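The batching idea can be illustrated independently of any particular sink. The sketch below groups a stream of records into fixed-size batches so that each batch can be sent with a single request. The function name `batches` and the sample records are assumptions, and the actual send call is left out.

```python
from typing import Iterable, Iterator


def batches(records: Iterable[dict], batch_size: int) -> Iterator[list[dict]]:
    """Yield lists of at most batch_size records, preserving the input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: list[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final, possibly smaller batch


# Made-up records; each batch would become one request to the data sink.
records = [{"id": i} for i in range(10)]
batch_sizes = [len(b) for b in batches(records, 4)]
print(batch_sizes)
```

Ten records with a batch size of four result in three requests instead of ten. Larger batches reduce the request overhead but increase the amount of data that has to be re-sent when a request fails.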
What is missing in today’s ecosystem for streaming to become a commodity?
SERVERLESS TOPICS
‣Partitioned topics are the de-facto standard for persisting events
‣# partitions = maximum degree of parallelism
‣Choosing the number of partitions remains a crucial question with significant impact on future cost and performance, and it needs to be answered at topic creation time (!)
‣The ability to choose the degree of parallelism dynamically would make it easier to cope with peak loads
"Horizontal Partition Autoscaler"
[Diagram: a topic scales up from 1 partition to 3 partitions and scales back down to 1 partition]
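Why the partition count caps parallelism can be shown in a few lines: within one consumer group, each partition is read by at most one consumer, so additional consumers stay idle. The function names below are made up for this sketch.

```python
def active_consumers(partitions: int, consumers: int) -> int:
    """Return how many consumers of one group can actually receive records."""
    return min(partitions, consumers)


def idle_consumers(partitions: int, consumers: int) -> int:
    """Return how many consumers of one group get no partition assigned."""
    return max(consumers - partitions, 0)


# Scaling the consumers from 1 to 3 only helps if the topic has enough partitions.
for partitions in (1, 3):
    print(partitions, active_consumers(partitions, 3), idle_consumers(partitions, 3))
```

With one partition, two of three consumers are idle; with three partitions, all three are busy. This is the gap an autoscaler for partitions would close.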
EASE OPERATIONS
More and better managed services
‣Operating streaming data pipelines boils down to running multiple distributed systems, which remains one of the big hurdles to their adoption
‣Managed services can reduce the operational pain
‣We are witnessing the rise of cloud/SaaS offerings but believe there is still a lot of room for improvement
Summary
TAKE-AWAYS
Summary
‣Throwing Kafka and Kafka Connect at Kubernetes is beneficial but does not provide a true cloud-native experience. It takes a few extra steps to, for instance, apply the self-containment principle to Kafka Connect.
‣If possible, handle errors of connectors or streaming applications in an automated manner without bringing the pipeline down.
‣Many issues occur when integrating external systems that you do not control, e.g., snapshotting a very large database table or sending events to slow APIs.
Questions?

Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 

Recently uploaded (20)

Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 

Learnings From Shipping 1000+ Streaming Data Pipelines To Production with Hakan Lofcali & Stefan Sprenger

  • 1. Learnings From Shipping 1000+ Streaming Data Pipelines To Production Hakan Lofcali, Stefan Sprenger {hakan,stefan}@datacater.io
  • 2. ‣We develop tools for developers working with streaming data ‣With Kafka, Kubernetes, and fewer than 5 developers, we built a platform that helped teams to deploy more than 1,000 streaming data pipelines to production ‣Let’s take you on our journey and the tools we adopted, hurdles encountered, and solutions found ‣Infra Space ‣Customer Solutions Space 2 WHAT WE DO / WHO WE ARE
  • 3. 3 STREAMING DATA PIPELINES Continuous Applications of Data Transformations Kafka Topic Kafka Connect Source Connector Kafka Connect Sink Connector Kafka Topic Kafka Streams
  • 4. 4 STREAMING PIPELINES IN THE WILD Customer communications in real-time Actionable data Clickstream data Outbox service Raw data Process clickstream events
  • 5. 5 GOALS FOR THIS TALK Avoid common pitfalls in streaming ETL ‣How to operate streaming data pipelines in an efficient and robust manner? ‣How to deal with resource-leaking Kafka Connect connectors? ‣How to monitor and debug running pipelines? ‣What are ways to deal with large data sources or slow data sinks? ‣What is missing in today’s ecosystem for streaming to become a commodity?
  • 6. How to operate streaming data pipelines in an efficient and robust manner? 6
  • 7. 7 SEPARATE BY NODE POOL Kafka NodePool Apache Kafka Broker* Strimzi Kafka Operator Control Plane Nodepool Pod We operate one K8s Cluster - Multiple Node pools K8s StatefulSet K8s Deployment DataCater Control Plane K8s Deployment … Kafka Connect Nodepool Quarkus Pipeline Pod K8s Deployment(s) ▸ Max 110 pods per node ▸ Max 5,000 nodes per cluster ▸ Max 150,000 pods in total ▸ *Separate Kafka and Kafka Connect clusters
  • 8. State-of-the-art Orchestration 8 PROCESS ORCHESTRATION ▸ We started out on a single VM and moved to a distributed process orchestration tool ▸ Kafka’s ecosystem lags behind state-of-the-art process orchestration tools like Kubernetes, Nomad, etc. ▸ ksqlDB and Kafka Connect manage processes, but we will see how they lack fundamental patterns needed to operate them at scale Kafka Streams Single VM Docker Java Quarkus Kubernetes
  • 9. Quarkus SmallRye Reactive Messaging 9 STARTUP TIME Scheduled 0s Scheduled 60s First Event Processed First Event Processed 5s … Liveness OK 10s Kafka Streams 30s
  • 10. 10 WORKLOAD DENSITY Docker on Single VM Quarkus Quarkus Quarkus Kubernetes Kafka Streams Kafka Streams RAM < 1.5GB RAM < 1.5GB RAM < 300MB RAM < 300MB RAM < 300MB
  • 11. 11 STRIMZI FOR KAFKA … Kubernetes; Dedicated node pool for Kafka Apache Kafka Broker Apache Kafka Broker Apache Kafka Broker Strimzi Kafka Operator Kubernetes StatefulSet Pod Pod Pod
  • 12. 12 CONSUMER RE-BALANCING summit consumer 0 summit consumer 1 summit consumer 2 … Pod Pod Pod summit partition 0 summit partition 1 summit partition 2 … … … … … …
  • 13. 13 CONSUMER RE-BALANCING summit consumer 0 summit consumer 1 summit consumer 2 … Pod Pod Pod summit partition 0 summit partition 1 summit partition 2 … … … … … … ▸ During consumer re-balancing, nothing is consumed until the coordinator has completed the re-balance ▸ The number of consumers can change due to errors, disconnections, or new load requirements
  • 15. 15 UNEXPECTED SHUTDOWN Startup Time Partition Size Not Linear Point of No Recovery Log Size
  • 16. How to deal with resource-leaking Kafka Connect connectors? 16
  • 17. 17 KAFKA CONNECT SELF-MANAGED … Kubernetes (K8s); Dedicated Kafka Connect Nodepool ElasticSearch Sink Task C S3 Source Task A PostgreSQL Source Task A K8s Deployment / Connect Cluster Pod Pod Pod PostgreSQL Source Task B MySQL CDC Source Task A MySQL CDC Source Task B
  • 18. K8s Deployment / Connect Cluster 18 KAFKA CONNECT SELF-MANAGED … Kubernetes (K8s); Dedicated Kafka Connect Nodepool ElasticSearch Sink Task C MySQL CDC Source Task A MySQL CDC Source Task B S3 SOURCE TASK A Pod Pod PostgreSQL Source Task A PostgreSQL Source Task B
  • 19. K8s Deployment / Connect Cluster 19 KAFKA CONNECT SELF-MANAGED … Kubernetes (K8s); Dedicated Kafka Connect Nodepool ElasticSearch Sink Task C S3 SOURCE TASK A Pod Pod MySQL CDC Source Task A MySQL CDC Source Task B PostgreSQL Source Task A PostgreSQL Source Task B
  • 20. K8s Deployment / Connect Cluster 20 KAFKA CONNECT SELF-MANAGED … Kubernetes (K8s); Dedicated Kafka Connect Nodepool ELASTICSEARCH SINK TASK C S3 SOURCE TASK A Pod Pod MySQL CDC Source Task A MySQL CDC Source Task B PostgreSQL Source Task A PostgreSQL Source Task B
  • 21. Connect Cluster Connect Cluster 21 KAFKA CONNECT SELF-MANAGED … Kubernetes (K8s); Dedicated Kafka Connect Nodepool ElasticSearch Sink Task C S3 Source Task A Connect Cluster Pod Pod Pod MySQL CDC Source Task A MySQL CDC Source Task B PostgreSQL Source Task A PostgreSQL Source Task B
  • 22. Connect Cluster Connect Cluster 22 KAFKA CONNECT SELF-MANAGED … Kubernetes (K8s); Dedicated Kafka Connect Nodepool ElasticSearch Sink Task A PostgreSQL Source Task A Connect Cluster Pod Pod S3 Source Task A Pod MySQL CDC Source Task A
  • 23. Connect Cluster Connect Cluster 23 KAFKA CONNECT SELF-MANAGED … Kubernetes (K8s); Dedicated Kafka Connect Nodepool S3 SOURCE TASK A ElasticSearch Sink Task A PostgreSQL Source Task A Connect Cluster Pod Pod MySQL CDC Source Task A
  • 24. Connect Cluster Connect Cluster 24 KAFKA CONNECT SELF-MANAGED … Kubernetes (K8s); Dedicated Kafka Connect Nodepool S3 SOURCE TASK A ElasticSearch Sink Task A PostgreSQL Source Task A Connect Cluster Pod Pod MySQL CDC Source Task A
  • 25. Connect Cluster Connect Cluster 25 KAFKA CONNECT SELF-MANAGED … Kubernetes (K8s); Dedicated Kafka Connect Nodepool ElasticSearch Sink Task A PostgreSQL Source Task A Connect Cluster Pod Pod S3 Source Task A Pod MySQL CDC Source Task A
  • 26. ▸ Utilise state of the art orchestration tools. ▸ Running Kafka on Kubernetes does not bring automatic elasticity. ▸ Kafka Connect is not self-contained. This will become a larger headache the more connector tasks are running in a given cluster. ▸ Think about startup time throughout your tech stack. From Kafka brokers over Connect tasks to streaming applications. 26 TAKE-AWAYS Key Learnings
  • 27. How to monitor and debug pipelines? 27
  • 28. 28 MONITORING STREAMING DATA PIPELINES Kafka Topic Kafka Connect Source Connector Kafka Connect Sink Connector Kafka Topic Kafka Streams
  • 29. ▸ External data sources or data sinks are unavailable (temporarily) ▸ Consumers (processors or sink connectors) are slower than producers ▸ Processing of events fails 29 POTENTIAL PRODUCTION ISSUES Most common issues in streaming data pipelines
  • 30. ▸ External data sources or data sinks are unavailable (temporarily) ▸ Consumers (processors or sink connectors) are slower than producers ▸ Processing of events fails 30 POTENTIAL PRODUCTION ISSUES Most common issues in streaming data pipelines
  • 31. 31 MONITORING CONNECTORS Kafka Topic Kafka Connect Source Connector Kafka Connect Sink Connector Kafka Topic Kafka Streams Monitoring the health of connectors ‣Periodically call /connectors/:connector_name/status and investigate the response
  • 32. 32 MONITORING CONNECTORS GET /connectors/hdfs-sink/status { "name": "hdfs-sink", "connector": { "state": "RUNNING", "worker_id": "localhost:8083" }, "tasks": [ { "id": 0, "state": "RUNNING", "worker_id": "localhost:8083" } ] } Healthy
  • 33. 33 MONITORING CONNECTORS GET /connectors/hdfs-sink/status { "name": "hdfs-sink", "connector": { "state": "FAILED", "worker_id": "localhost:8083" }, "tasks": [ { "id": 0, "state": "FAILED", "worker_id": "localhost:8083", "trace": "org.apache.kafka.common.errors.RecordTooLargeException\n" } ] } Unhealthy
  • 34. 34 MONITORING CONNECTORS Kafka Topic Kafka Connect Source Connector Kafka Connect Sink Connector Kafka Topic Kafka Streams ‣Periodically call /connectors/:connector_name/status and investigate the response ‣If failed, try to restart the connector (e.g., deals with temporary API outages) and escalate or alert after X restarts ‣Sometimes, directly escalating might be reasonable Monitoring the health of connectors
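
The check-restart-escalate loop from slides 31 to 34 could be sketched as follows. This is a minimal illustration, not the speakers' implementation: the function name `decide_action`, the `max_restarts` threshold (the "X" on the slide), and the choice to treat only `FAILED` as unhealthy are assumptions. The input is the parsed JSON body returned by `GET /connectors/:connector_name/status`, as shown on slides 32 and 33.

```python
from typing import Literal

Action = Literal["ok", "restart", "escalate"]


def decide_action(status: dict, restarts_so_far: int, max_restarts: int = 3) -> Action:
    """Decide what to do with a connector, given its parsed status response.

    Assumption: only the FAILED state counts as unhealthy; other states
    (for example PAUSED) are left alone in this sketch.
    """
    # Collect the state of the connector itself and of every task.
    states = [status.get("connector", {}).get("state")]
    states += [task.get("state") for task in status.get("tasks", [])]

    if "FAILED" not in states:
        return "ok"
    # Restarting helps with temporary outages; give up after max_restarts.
    if restarts_so_far >= max_restarts:
        return "escalate"
    return "restart"


# Status payloads modelled on slides 32 and 33.
healthy = {
    "name": "hdfs-sink",
    "connector": {"state": "RUNNING", "worker_id": "localhost:8083"},
    "tasks": [{"id": 0, "state": "RUNNING", "worker_id": "localhost:8083"}],
}
failed = {
    "name": "hdfs-sink",
    "connector": {"state": "FAILED", "worker_id": "localhost:8083"},
    "tasks": [{"id": 0, "state": "FAILED", "worker_id": "localhost:8083"}],
}

if __name__ == "__main__":
    print(decide_action(healthy, restarts_so_far=0))
    print(decide_action(failed, restarts_so_far=0))
    print(decide_action(failed, restarts_so_far=3))
```

A scheduler would call this periodically per connector, issue a restart through the Kafka Connect REST API when the result is "restart", and page someone when it is "escalate". For errors that a restart cannot fix, escalating directly may be more reasonable, as the slide notes.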
  • 35. ▸ External data sources or data sinks are unavailable (temporarily) ▸ Consumers (processors or sink connectors) are slower than producers ▸ Processing of events fails 35 POTENTIAL PRODUCTION ISSUES Most common issues in streaming data pipelines
  • 36. 36 MONITORING BACKPRESSURE Consumer Lags Kafka Topic Consumer ‣Difference between the latest offset available in the Kafka topic (partition) and the latest offset processed by the consumer ‣Indicates how far consumers are behind producers in terms of the number of records processed
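
The lag definition above is a per-partition subtraction. The following sketch works on plain dictionaries so it does not depend on a particular Kafka client; the function name `consumer_lag` and the treatment of partitions without a committed offset (counted from offset 0) are assumptions made for illustration.

```python
from __future__ import annotations


def consumer_lag(end_offsets: dict[int, int], committed_offsets: dict[int, int]) -> dict[int, int]:
    """Return the lag per partition: latest available offset minus latest processed offset.

    Assumption: a partition without a committed offset is treated as
    unprocessed, so its lag equals its end offset.
    """
    lag = {}
    for partition, end_offset in end_offsets.items():
        committed = committed_offsets.get(partition, 0)
        # Clamp at zero in case the two readings were taken at slightly different times.
        lag[partition] = max(end_offset - committed, 0)
    return lag


if __name__ == "__main__":
    per_partition = consumer_lag({0: 100, 1: 50, 2: 75}, {0: 90, 1: 50})
    print(per_partition)
    print(sum(per_partition.values()))
```

Summing the per-partition values gives a single number per consumer group that can be alerted on, for instance when it keeps growing over several measurements.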
  • 38. 38 MONITORING BACKPRESSURE Kafka Topic Kafka Connect Source Connector Kafka Connect Sink Connector Kafka Topic Kafka Streams Kafka Streams Consumer Lag ‣Number of records that have been extracted by the data source connector but have not yet been processed by the Kafka Streams app ‣If data processing is slower than extraction, you might want to increase the degree of parallelism of the Kafka Streams app
  • 39. 39 MONITORING BACKPRESSURE Kafka Topic Kafka Connect Source Connector Kafka Connect Sink Connector Kafka Topic Kafka Streams Sink Connector Consumer Lag ‣Number of records that have been processed by the Kafka Streams app but have not yet been published by the sink connector ‣If publishing data to the data sinks is slower than processing, you might want to increase the number of tasks of the sink connector
  • 40. ▸ External data sources or data sinks are unavailable (temporarily) ▸ Consumers (processors or sink connectors) are slower than producers ▸ Processing of events fails 40 POTENTIAL PRODUCTION ISSUES Most common issues in streaming data pipelines
  • 41. 41 DEAD-LETTER QUEUES Keep track of errors in processing ‣By default, Kafka Connect connectors fail when observing errors in processing ‣We recommend configuring a dead-letter queue (topic) for storing records that could not be processed ‣Monitor the dead-letter queue topic and manually investigate failed records errors.tolerance = all errors.deadletterqueue.topic.name = topic-dlq Topic Dead-letter queue topic Successful processing Failed processing Kafka Connect Source Connector
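
As a sketch of how the two properties from slide 41 fit into a connector configuration, the helper below adds them to an existing config dictionary. The helper name, the placeholder connector class, and the extra `errors.deadletterqueue.context.headers.enable` property are assumptions added for illustration; in Apache Kafka Connect the dead-letter queue properties apply to sink connectors.

```python
import json


def with_dead_letter_queue(connector_config: dict, dlq_topic: str) -> dict:
    """Return a copy of the connector config with dead-letter queue handling enabled."""
    config = dict(connector_config)
    config.update({
        # Keep the connector running when a record cannot be processed.
        "errors.tolerance": "all",
        # Topic that receives the records that failed.
        "errors.deadletterqueue.topic.name": dlq_topic,
        # Assumption: also record the failure reason in message headers.
        "errors.deadletterqueue.context.headers.enable": "true",
    })
    return config


if __name__ == "__main__":
    base = {
        # Hypothetical connector class, used only as a placeholder.
        "connector.class": "com.example.HypotheticalSinkConnector",
        "topics": "topic",
    }
    print(json.dumps(with_dead_letter_queue(base, "topic-dlq"), indent=2))
```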
  • 42. What are ways to deal with large data sources or slow data sinks? 42
  • 43. 43 DEALING WITH LARGE DATA SOURCES ‣Hurts a lot when performing initial snapshots, which can take hours ‣Use multiple connectors for the same database and make use of table.include.list ‣Adjust the snapshot query and consider only a subset of the data source ‣Mitigate pain with incremental snapshotting ‣Accelerate snapshotting with parallelisation PostgreSQL Debezium Source Connector TBs of data
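
One way to read the "multiple connectors for the same database" advice is to spread the tables over several connector configurations, each with its own `table.include.list`. The sketch below does this round-robin. The function name, the naming scheme, and the per-connector `slot.name` (the Debezium PostgreSQL connector reads from a replication slot, so each connector needs its own) are assumptions; connection settings and other required properties are left out.

```python
from __future__ import annotations


def split_tables(tables: list[str], connector_count: int, name_prefix: str = "pg-source") -> list[dict]:
    """Distribute tables round-robin over up to connector_count connector configs."""
    if connector_count < 1:
        raise ValueError("connector_count must be at least 1")

    configs = []
    for index in range(connector_count):
        group = tables[index::connector_count]
        if not group:
            continue  # fewer tables than connectors
        configs.append({
            "name": f"{name_prefix}-{index}",
            "config": {
                "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
                # Each connector snapshots and streams only its own tables.
                "table.include.list": ",".join(group),
                # Assumption: one replication slot per connector.
                "slot.name": f"{name_prefix}_{index}".replace("-", "_"),
            },
        })
    return configs


if __name__ == "__main__":
    for connector in split_tables(["public.orders", "public.users", "public.events"], 2):
        print(connector["name"], connector["config"]["table.include.list"])
```

Round-robin ignores table sizes; when a few tables dominate the data volume, grouping by size would balance snapshot time better.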
  • 44. 44 DEALING WITH SLOW DATA SINKS Kafka Connect Sink Connector Elasticsearch ‣Detect slow data sinks by monitoring the sink connector consumer lag ‣Parallelise sending records to the data sink by increasing the number of connector tasks ‣If available, batch multiple records and send them with one request to the data sink ‣Avoid duplicated data delivery by adjusting max.poll.records or max.poll.interval.ms
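
The batching advice above can be illustrated with a small generator that groups records so that each group is sent to the data sink in one request. The function name `batches` and the fixed batch size are assumptions; the actual request to the sink is not shown.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of up to batch_size records, preserving their order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Flush the remaining records that did not fill a whole batch.
    if batch:
        yield batch


if __name__ == "__main__":
    for batch in batches(range(5), batch_size=2):
        # A real sink writer would send one request per batch here.
        print(batch)
```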
  • 45. What is missing in today’s ecosystem for streaming to become a commodity? 45
  • 46. 46 SERVERLESS TOPICS ‣Partitioned topics are the de-facto standard for persisting events ‣# partitions = maximum degree of parallelism ‣Choosing the number of partitions remains a crucial question with significant impact on future cost and performance, and needs to be answered at topic creation time (!) ‣Having the ability to dynamically choose the degree of parallelism would make it easier to cope with peak loads "Horizontal Partition Autoscaler" Partition 0 1 partition Partition 0 Partition 1 Partition 2 3 partitions Partition 0 1 partition Scale Up Scale Down
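
The rule "# partitions = maximum degree of parallelism" can be made concrete with a little arithmetic: within one consumer group, each partition is read by at most one consumer, so consumers beyond the partition count stay idle. The function name below is an assumption.

```python
from typing import Tuple


def consumer_utilisation(partitions: int, consumers: int) -> Tuple[int, int]:
    """Return (active, idle) consumer counts for one consumer group on one topic."""
    # At most one consumer per partition can be doing work.
    active = min(partitions, consumers)
    return active, consumers - active


if __name__ == "__main__":
    # Scaling from 3 to 5 consumers on a 3-partition topic adds no parallelism.
    print(consumer_utilisation(partitions=3, consumers=3))
    print(consumer_utilisation(partitions=3, consumers=5))
```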
  • 47. 47 EASE OPERATIONS More and better managed services ‣Operating streaming data pipelines boils down to running multiple distributed systems and remains one of the big hurdles for its adoption ‣Managed services can reduce the operational pain ‣We witness the rise of cloud/SaaS offerings but believe there is still lots of room for improvement
  • 49. 49 TAKE-AWAYS Summary ‣Throwing Kafka and Kafka Connect at Kubernetes is beneficial but does not provide a true cloud-native experience. It takes a few steps to, for instance, apply the self-containment principle to Kafka Connect. ‣If possible, try to handle errors of connectors or streaming applications in an automated manner without bringing the pipeline down ‣A lot of issues occur when integrating external systems that you do not control, e.g., snapshotting a very large database table, sending events to slow APIs, etc.