Restoring Restoration’s Reputation in Kafka Streams
Bruno Cadonna & Lucas Brutschy
About the Presenters
Bruno Cadonna
Senior Software Engineer @ Confluent
Committer & PMC member @ Apache Kafka
Lucas Brutschy
Senior Software Engineer @ Confluent
Contributor @ Apache Kafka
Outline
The basics
Why does restoration have a poor reputation?
Restoring restoration’s reputation
Experiments
Broader vision & future work
Takeaways
The basics
Kafka Streams
Stateful processors use local state stores
[Figure: a stream of records (e.g., 1234 2345 9876 2846 and xxxx xxxx xxxx 2846) flowing through stateless and stateful processors]
Subtopologies in Kafka Streams
A topology consists of subtopologies connected by Kafka topics
[Figure: Subtopology A defines a new key, and its records are repartitioned through a Kafka topic into Subtopology B; legend: stream, stateless processor, stateful processor]
Tasks in Kafka Streams
A task is an instance of a subtopology that processes the records of one partition of the subtopology's input topics
[Figure: subtopology A runs as tasks A0, A1, A2 and subtopology B as tasks B0, B1, B2, one task per input partition]
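The task/partition relationship can be made concrete with a small sketch. It assumes, as the task IDs in the restoration logs of this talk suggest (e.g. `1_4`), that a task is identified as `<subtopology>_<partition>`. The class and method names are hypothetical and not Kafka Streams API; Kafka Streams' own type for this is `org.apache.kafka.streams.processor.TaskId`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Kafka Streams API. It only mirrors the
// "<subtopology>_<partition>" format of the task IDs printed in the Streams logs.
final class TaskIds {

    private TaskIds() { }

    /**
     * Creates one task ID per input partition of each subtopology.
     * partitionsPerSubtopology[i] is the partition count of subtopology i's input topics.
     */
    static List<String> enumerate(int[] partitionsPerSubtopology) {
        List<String> taskIds = new ArrayList<>();
        for (int subtopology = 0; subtopology < partitionsPerSubtopology.length; subtopology++) {
            for (int partition = 0; partition < partitionsPerSubtopology[subtopology]; partition++) {
                taskIds.add(subtopology + "_" + partition);
            }
        }
        return taskIds;
    }

    public static void main(String[] args) {
        // Subtopology A (id 0) and subtopology B (id 1), each reading 3 partitions,
        // as in the figure: A0..A2 become 0_0..0_2 and B0..B2 become 1_0..1_2.
        System.out.println(enumerate(new int[] {3, 3}));
        // prints [0_0, 0_1, 0_2, 1_0, 1_1, 1_2]
    }
}
```

With two subtopologies reading topics with 10 partitions, `enumerate(new int[] {10, 10})` yields 20 task IDs, the same number of tasks as in the "Restoration took 189227 ms for all tasks" log line of the benchmark.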
Task assignment in Kafka Streams
Kafka Streams distributes tasks over the Streams clients
[Figure: tasks A0, A1, A2, B0, B1, and B2 distributed over Streams clients 1, 2, and 3]
During failover, Kafka Streams moves tasks to the running clients
Running clients do not have local state for newly assigned tasks
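A toy model of failover shows why running clients end up without local state. It keeps tasks on the surviving clients and spreads the failed client's tasks over them round-robin. This placement rule and all names are hypothetical simplifications: Kafka Streams' actual assignor also weighs existing local state, standby tasks, and load balance, and standby tasks are ignored here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of failover; the real Kafka Streams assignor is more elaborate.
final class FailoverSketch {

    private FailoverSketch() { }

    /** Initial round-robin placement of tasks onto clients. */
    static Map<String, List<String>> assign(List<String> tasks, List<String> clients) {
        if (clients.isEmpty()) {
            throw new IllegalArgumentException("need at least one client");
        }
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        for (String client : clients) {
            assignment.put(client, new ArrayList<>());
        }
        for (int i = 0; i < tasks.size(); i++) {
            assignment.get(clients.get(i % clients.size())).add(tasks.get(i));
        }
        return assignment;
    }

    /** Keeps tasks on surviving clients and spreads the failed client's tasks over them. */
    static Map<String, List<String>> failover(Map<String, List<String>> assignment, String failedClient) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : assignment.entrySet()) {
            if (!entry.getKey().equals(failedClient)) {
                result.put(entry.getKey(), new ArrayList<>(entry.getValue()));
            }
        }
        List<String> orphaned = assignment.getOrDefault(failedClient, List.of());
        List<String> survivors = new ArrayList<>(result.keySet());
        if (survivors.isEmpty() && !orphaned.isEmpty()) {
            throw new IllegalStateException("no running client left");
        }
        for (int i = 0; i < orphaned.size(); i++) {
            result.get(survivors.get(i % survivors.size())).add(orphaned.get(i));
        }
        return result;
    }

    /** Tasks a client owns after failover but did not own before: no local state, so they must be restored. */
    static Set<String> needRestoration(Map<String, List<String>> before, Map<String, List<String>> after) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, List<String>> entry : after.entrySet()) {
            List<String> previouslyOwned = before.getOrDefault(entry.getKey(), List.of());
            for (String task : entry.getValue()) {
                if (!previouslyOwned.contains(task)) {
                    result.add(task);
                }
            }
        }
        return result;
    }
}
```

With tasks A0, B0, A1, B1, A2, B2 spread over three clients, removing `client-2` moves B0 and A2 to the other clients, and `needRestoration` reports exactly those two tasks.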
Restoration in Kafka Streams
Kafka Streams replicates the content of state stores to changelog topics in Kafka (replication)
When tasks are moved to a different Streams client, their states are restored from Kafka (restoration)
[Figure: tasks A0 and B1 move between Streams clients 1, 2, and 3; their states are replicated to Kafka and restored from it]
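Replication and restoration can be illustrated with an in-memory model in which a `List` stands in for a changelog topic partition. Every write to the local store is also appended to the changelog, and restoring replays the changelog from the beginning, so the latest value per key wins and a `null` value (a tombstone) deletes the key. This is a hypothetical sketch; real state stores are typically RocksDB-backed, and restoration reads the changelog with a dedicated restore consumer.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, in-memory model: a List stands in for the changelog topic partition.
final class ChangelogStoreSketch {

    /** A changelog record; a null value marks a deletion (tombstone). */
    static final class ChangelogRecord {
        final String key;
        final Long value;

        ChangelogRecord(String key, Long value) {
            this.key = key;
            this.value = value;
        }
    }

    private final Map<String, Long> localStore = new HashMap<>();
    private final List<ChangelogRecord> changelog;

    ChangelogStoreSketch(List<ChangelogRecord> changelog) {
        this.changelog = changelog;
    }

    /** Every write goes to the local store and is replicated to the changelog. */
    void put(String key, Long value) {
        apply(key, value);
        changelog.add(new ChangelogRecord(key, value));
    }

    Long get(String key) {
        return localStore.get(key);
    }

    /** Rebuilds the store on another client by replaying the changelog from the beginning. */
    static ChangelogStoreSketch restore(List<ChangelogRecord> changelog) {
        ChangelogStoreSketch store = new ChangelogStoreSketch(changelog);
        for (ChangelogRecord record : changelog) {
            store.apply(record.key, record.value);
        }
        return store;
    }

    private void apply(String key, Long value) {
        if (value == null) {
            localStore.remove(key);
        } else {
            localStore.put(key, value);
        }
    }
}
```

Replaying from the beginning is what makes restoration expensive for large stores, which is the cost the rest of the talk is about.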
Standby tasks
● Hot replicas of a local state store
● Assigned to Streams clients that DO NOT own the task whose state is replicated
● Update the local state store from the changelog topic
● No processing
● Avoid restoration when tasks are moved to other Streams clients
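Standby tasks are switched on with the `num.standby.replicas` Streams config, whose default is 0. A minimal sketch using plain `java.util.Properties`, so it compiles without the Kafka dependency; the application ID and bootstrap address are placeholders. With Kafka Streams on the classpath, `StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG` can replace the string literal.

```java
import java.util.Properties;

// Plain Properties so the sketch compiles without kafka-streams on the classpath.
final class StandbyConfig {

    private StandbyConfig() { }

    static Properties withStandbys(int standbyReplicas) {
        if (standbyReplicas < 0) {
            throw new IllegalArgumentException("standbyReplicas must be >= 0");
        }
        Properties props = new Properties();
        props.put("application.id", "restoration-demo");  // placeholder
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        // Number of hot replicas (standby tasks) kept for each stateful task on
        // other Streams clients; the default is 0, i.e. no standby tasks.
        props.put("num.standby.replicas", Integer.toString(standbyReplicas));
        return props;
    }
}
```

Passing `StandbyConfig.withStandbys(1)` to `new KafkaStreams(topology, props)` would keep one standby per stateful task, provided there is a second Streams client to host it, since a standby is never placed on the client that owns the active task.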
How to see restoration in your applications
Verify progress of restoration:
[2023-04-26 13:43:48,739] INFO stream-thread [<client ID>-StreamThread-1] Restoration in progress for
10 partitions. {benchmark-store-changelog-5: position=26324, end=252519, totalRestored=26272}
{benchmark-store-changelog-4: position=26360, end=252296, totalRestored=26336}
{benchmark-store-changelog-3: position=26394, end=252304, totalRestored=26351}
{benchmark-store-changelog-2: position=26279, end=252514, totalRestored=26240}
{benchmark-store-changelog-9: position=13270, end=252490, totalRestored=13239}
{benchmark-store-changelog-8: position=13277, end=252495, totalRestored=13247}
{benchmark-store-changelog-7: position=13273, end=252527, totalRestored=13234}
{benchmark-store-changelog-6: position=13266, end=252508, totalRestored=13245}
{benchmark-store-changelog-1: position=13298, end=252710, totalRestored=13278}
{benchmark-store-changelog-0: position=13275, end=252705, totalRestored=13256}
(org.apache.kafka.streams.processor.internals.StoreChangelogReader:575)
Verify whether restoration finished:
[2023-04-26 13:49:05,441] INFO stream-thread [<client ID>-StreamThread-1] Restoration took 189227 ms for all
tasks [1_0, 0_9, 0_8, 1_9, 0_7, 1_8, 0_6, 1_7, 0_5, 1_6, 0_4, 1_5, 0_3, 1_4, 0_2, 1_3, 0_1, 1_2, 0_0, 1_1]
(org.apache.kafka.streams.processor.internals.StreamThread:902)
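When only logs are available, the per-changelog entries of the "Restoration in progress" line can be turned into numbers. A minimal sketch, assuming the `{<changelog-partition>: position=…, end=…, totalRestored=…}` layout shown above; that is a log format, not a stable API, and may change between versions. Class and method names are hypothetical. Programmatically, the supported hook is a `StateRestoreListener` registered with `KafkaStreams#setGlobalStateRestoreListener`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log parser; the matched layout is taken from the log line above.
final class RestoreLogProgress {

    // Matches entries such as
    // {benchmark-store-changelog-5: position=26324, end=252519, totalRestored=26272}
    private static final Pattern ENTRY = Pattern.compile(
            "\\{([^:{}]+): position=(\\d+), end=(\\d+), totalRestored=(\\d+)\\}");

    private RestoreLogProgress() { }

    /** Offsets still to be restored per changelog partition: end - position. */
    static Map<String, Long> remaining(String logText) {
        Map<String, Long> result = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(logText);
        while (m.find()) {
            long position = Long.parseLong(m.group(2));
            long end = Long.parseLong(m.group(3));
            result.put(m.group(1).trim(), Math.max(0L, end - position));
        }
        return result;
    }

    /** Overall fraction done, assuming every changelog was restored from offset 0. */
    static double fractionDone(String logText) {
        long positions = 0;
        long ends = 0;
        Matcher m = ENTRY.matcher(logText);
        while (m.find()) {
            positions += Long.parseLong(m.group(2));
            ends += Long.parseLong(m.group(3));
        }
        return ends == 0 ? 0.0 : (double) positions / ends;
    }
}
```

For `benchmark-store-changelog-5` in the log above, `remaining` reports 226195 offsets left (252519 minus 26324).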
Why does restoration have a poor reputation?
Phases of a Stream thread
[Figure: timeline of a stream thread after a rebalance. In the "Restoration Phase" it alternates Poll and Restore for active and standby tasks; once all active tasks have caught up, the "Processing Phase" starts and Process steps are added to the loop]
All active tasks are blocked from processing until restoration is complete
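The blocking effect can be reproduced with a toy model of the coupled loop. In this hypothetical simulation (not Kafka Streams code), every loop iteration restores at most one batch per active task, and processing is only allowed once no active task is still restoring, so a single large store delays every task on the thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical simulation of the coupled poll/restore/process loop.
final class CoupledLoopModel {

    private CoupledLoopModel() { }

    /**
     * Returns the loop iteration from which the thread may process records.
     * remainingRecords maps each active task to the changelog records it still has
     * to restore; each iteration restores at most batchSize records per task.
     */
    static int iterationsUntilProcessing(Map<String, Long> remainingRecords, long batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be > 0");
        }
        Map<String, Long> remaining = new LinkedHashMap<>(remainingRecords);
        int iteration = 0;
        while (true) {
            iteration++;
            // "Poll" + "Restore": one batch for every task that is still restoring.
            boolean allCaughtUp = true;
            for (Map.Entry<String, Long> entry : remaining.entrySet()) {
                long left = Math.max(0L, entry.getValue() - batchSize);
                entry.setValue(left);
                if (left > 0) {
                    allCaughtUp = false;
                }
            }
            // "Process" is reached only once every active task has caught up.
            if (allCaughtUp) {
                return iteration;
            }
        }
    }
}
```

For example, with 100 records left for task A0, 1,000 for task B1, and a batch size of 100, A0 has caught up after the first iteration but cannot process until iteration 10, when B1 is done.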
Checkpointing offsets of states
No checkpoints during restoration!
[Figure: the same stream thread timeline, with checkpoints (✔) taken only in the processing phase and none in the restoration phase]
If restoration is interrupted, it has to restart from the beginning
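The cost of an interrupted restoration without checkpoints can be put into numbers. In this hypothetical sketch, a restart resumes from the last checkpointed changelog offset if there is one, and otherwise from the start of the changelog. Offset spans are used as an approximation of record counts, and the log start offset is a parameter because a real changelog need not start at offset 0.

```java
import java.util.OptionalLong;

// Hypothetical helper: where does a restarted restoration begin, and how much is left?
final class CheckpointResume {

    private CheckpointResume() { }

    /** Restart offset: the checkpoint if one exists, otherwise the start of the changelog. */
    static long restartOffset(OptionalLong checkpointedOffset, long logStartOffset) {
        return checkpointedOffset.isPresent()
                ? Math.max(checkpointedOffset.getAsLong(), logStartOffset)
                : logStartOffset;
    }

    /** Approximate number of changelog records to read again after a restart. */
    static long recordsToRestore(OptionalLong checkpointedOffset, long logStartOffset, long endOffset) {
        return Math.max(0L, endOffset - restartOffset(checkpointedOffset, logStartOffset));
    }
}
```

Using `benchmark-store-changelog-5` from the earlier log (position 26324, end 252519) as an example: if restoration is interrupted there, a restart without a checkpoint reads all 252519 offsets again, while a checkpoint at 26324 leaves 226195.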
More disadvantages of coupling restoration / processing
➔ If restoration is slow, we risk falling out of the consumer group
➔ If restoration is slow and exactly-once semantics are enabled, we
risk timing out transactions
➔ Both trigger a rebalance and, with it, another round of restoration
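The two timeouts involved are the consumer's `max.poll.interval.ms` and, with exactly-once enabled, the producer's `transaction.timeout.ms`; in a Streams config they can be targeted with the `consumer.` and `producer.` prefixes. A sketch with plain `Properties`; the values passed in are up to the caller and nothing here is a recommendation. Raising these limits only widens the margin, it also delays the detection of clients that are genuinely stuck, and the broker's `transaction.max.timeout.ms` caps the transaction timeout.

```java
import java.util.Properties;

// Plain Properties sketch; the caller chooses the values, none are recommendations.
final class RestorationTimeouts {

    private RestorationTimeouts() { }

    static Properties exactlyOnceWithWiderTimeouts(long maxPollIntervalMs, long transactionTimeoutMs) {
        Properties props = new Properties();
        props.put("application.id", "restoration-demo");  // placeholder
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("processing.guarantee", "exactly_once_v2");
        // Forwarded to the Streams consumers: maximum time between two poll() calls
        // before the member is considered failed and removed from the group.
        props.put("consumer.max.poll.interval.ms", Long.toString(maxPollIntervalMs));
        // Forwarded to the Streams producers: maximum time a transaction may stay
        // open before the transaction coordinator aborts it.
        props.put("producer.transaction.timeout.ms", Long.toString(transactionTimeoutMs));
        return props;
    }
}
```

This treats the symptom; the state updater described next removes the cause by taking restoration off the stream thread.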
Restoring restoration’s reputation
Introducing the state updater
Restoration and processing happen in parallel
[Figure: after a rebalance, the stream thread keeps alternating Poll and Process, while the state updater restores active tasks A0 and B1 and updates standby tasks; restored tasks are handed back to the stream thread for processing]
Checkpointing offsets during restoration
Checkpoints are taken during restoration and processing
[Figure: checkpoints (✔) taken on both the stream thread and the state updater timelines]
Communication between stream thread and state updater
[Figure: the stream thread (Rebalance, Poll, Process) passes tasks to add and remove to the state updater (Restore); the state updater returns restored tasks, removed tasks, and exceptions together with the affected tasks]
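The hand-over on this slide can be modelled with thread-safe queues. In this minimal, hypothetical sketch (not the `DefaultStateUpdater` implementation), the stream thread enqueues tasks to add, a separate thread "restores" them, and restored tasks and failed tasks with their exceptions come back through queues that the stream thread drains between its own poll/process iterations. Removing tasks is left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the stream thread / state updater hand-over; not Kafka Streams code.
final class StateUpdaterSketch implements AutoCloseable {

    // Stream thread -> state updater: tasks whose state must be restored.
    private final BlockingQueue<String> tasksToAdd = new LinkedBlockingQueue<>();
    // State updater -> stream thread: tasks that finished restoration.
    private final Queue<String> restoredTasks = new ConcurrentLinkedQueue<>();
    // State updater -> stream thread: failed tasks together with their exception.
    private final Queue<Map.Entry<String, RuntimeException>> failedTasks = new ConcurrentLinkedQueue<>();
    private final Thread updaterThread;
    private volatile boolean running = true;

    StateUpdaterSketch() {
        updaterThread = new Thread(this::restoreLoop, "state-updater-sketch");
        updaterThread.setDaemon(true);
        updaterThread.start();
    }

    /** Called by the stream thread; does not block. */
    void add(String taskId) {
        tasksToAdd.add(taskId);
    }

    /** Called by the stream thread between its poll/process iterations. */
    List<String> drainRestoredTasks() {
        return drain(restoredTasks);
    }

    List<Map.Entry<String, RuntimeException>> drainFailedTasks() {
        return drain(failedTasks);
    }

    private void restoreLoop() {
        while (running) {
            String task;
            try {
                task = tasksToAdd.poll(10, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            if (task == null) {
                continue;
            }
            try {
                restore(task);
                restoredTasks.add(task);
            } catch (RuntimeException e) {
                failedTasks.add(Map.entry(task, e));
            }
        }
    }

    // Stand-in for reading the changelog and writing into the local store.
    // For illustration only, task IDs starting with "corrupt" fail.
    private static void restore(String taskId) {
        if (taskId.startsWith("corrupt")) {
            throw new IllegalStateException("cannot restore " + taskId);
        }
    }

    private static <T> List<T> drain(Queue<T> queue) {
        List<T> drained = new ArrayList<>();
        T item;
        while ((item = queue.poll()) != null) {
            drained.add(item);
        }
        return drained;
    }

    @Override
    public void close() {
        running = false;
        try {
            updaterThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because the stream thread only drains queues, it never blocks on restoration; a failure surfaces as an entry pairing the task with its exception, mirroring the "Exceptions and tasks" arrow.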
New Metrics
active-restoring-tasks: the number of active tasks currently undergoing restoration
standby-updating-tasks: the number of standby tasks currently undergoing state update
idle-ratio: the fraction of time the thread spent idling
active-restore-ratio: the fraction of time the thread spent restoring active tasks
standby-update-ratio: the fraction of time the thread spent updating standby tasks
checkpoint-ratio: the fraction of time the thread spent checkpointing the restoration progress of tasks
KIP-869 (Guozhang Wang)
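The ratio metrics split a thread's time across its activities. A sketch of that arithmetic, assuming for illustration that the four activities fill the whole measurement window so that the ratios add up to 1; names are hypothetical, and the exact measurement windows in KIP-869 may differ. In a running application, the actual values are exposed through `KafkaStreams#metrics()` and JMX.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical arithmetic behind the *-ratio metrics; not the Kafka Streams implementation.
final class StateUpdaterRatios {

    private StateUpdaterRatios() { }

    /** Maps each activity's time (ms) in a window to its share of the window's total time. */
    static Map<String, Double> ratios(long idleMs, long activeRestoreMs,
                                      long standbyUpdateMs, long checkpointMs) {
        long total = idleMs + activeRestoreMs + standbyUpdateMs + checkpointMs;
        Map<String, Double> result = new LinkedHashMap<>();
        result.put("idle-ratio", share(idleMs, total));
        result.put("active-restore-ratio", share(activeRestoreMs, total));
        result.put("standby-update-ratio", share(standbyUpdateMs, total));
        result.put("checkpoint-ratio", share(checkpointMs, total));
        return result;
    }

    private static double share(long part, long total) {
        return total == 0 ? 0.0 : (double) part / total;
    }
}
```

A high `active-restore-ratio` together with a low `idle-ratio` indicates a thread that is busy restoring rather than waiting for data.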
Run it!
See how tasks are passed to the state updater:
[2023-04-26 17:12:29,915] INFO state-updater [<client ID>-StateUpdater-1] Stateful active task 1_4 was added to the state updater (org.apache.kafka.streams.processor.internals.DefaultStateUpdater:370)
and how they complete restoration:
[2023-04-26 17:17:00,936] INFO state-updater [<client ID>-StateUpdater-1] Stateful active task 1_4 completed restoration (org.apache.kafka.streams.processor.internals.DefaultStateUpdater:449)
The state updater is available in Kafka 3.5 but gated by an internal flag
Try it out by setting the __state.updater.enabled__ Streams config
(use at your own risk)
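For trying this out with Kafka 3.5, a sketch of the Streams properties. The `__state.updater.enabled__` key is taken from this slide; it is an internal, unsupported flag whose name or existence may change in later releases. Application ID and bootstrap servers are placeholders.

```java
import java.util.Properties;

// Sketch only: enables the internal state updater flag named in the talk (Kafka 3.5).
final class StateUpdaterFlag {

    private StateUpdaterFlag() { }

    static Properties experimentalStateUpdater() {
        Properties props = new Properties();
        props.put("application.id", "state-updater-trial"); // placeholder
        props.put("bootstrap.servers", "localhost:9092");    // placeholder
        // Internal flag, not part of the public config; use at your own risk.
        props.put("__state.updater.enabled__", true);
        return props;
    }
}
```

Once enabled, the "state-updater [...StateUpdater-1]" log lines above should appear alongside the stream thread logs.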
Experiments
Setup
[Figure: benchmark topology from an input topic to an output topic via two repartition topics; one part of the topology can process without restoration, while the stateful part can process only after restoration]
Getting back into the business sooner…
[Figure: results without (w/o) and with (w/) the state updater]
● A single Streams client
● Topics with 10 partitions
● Runtime: 20 min
● After 10 min, the client is restarted and its local state is cleaned up
● The Streams client needs to restore all state
CPU load
[Figure: CPU load of the stream thread and the state updater, without and with the state updater]
Broader vision & future work
Old model
[Figure: one stream thread containing the restore consumer, the main consumer, the producer, and the Poll/Restore/Process loop]
Multi-core scenario
[Figure: several stream threads, each with its own restore consumer, main consumer, producer, and Poll/Restore/Process loop]
The vision
[Figure: a poll thread with the main consumer and the poll loop, a state updater with the restore consumer and the restore loop, a producer thread with the producer, and processing threads 1 to N]
Restoration with exactly-once semantics
- When a Streams client fails, the local state may have been updated since the last commit
- We cannot “roll back” the state
- The state has to be wiped, triggering a full restoration
KIP-844: Transactional State Stores
KIP-892: Transactional Semantics for State Stores
Takeaways
• Restoration decoupled from processing
• Checkpointing during restoration
• Part of larger change in threading model
Joint work with Guozhang Wang
Thank you!
Challenges
The Stream thread loop
[Figure: the stream thread loop cycles through Poll, Restore, and Process]
One more detail: checkpointing
[Figure: the Poll, Restore, Process loop writes the latest changelog offsets to a checkpoint file (✔)]
The checkpoint file keeps the latest offset of the changelog topic
It answers the question “is our state up-to-date?” after a restart
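What the checkpoint file provides can be sketched as a tiny offset file: one line per changelog partition with its latest restored position, written to a temporary file and then moved into place so that a crash does not leave a half-written checkpoint. The line layout, the class name, and the assumption that offsets are stored as the next position to read (like `position` in the restoration log) are hypothetical simplifications; Kafka Streams' own checkpoint files use their own versioned format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified checkpoint file: "<changelog-partition> <offset>" per line.
final class CheckpointFile {

    private final Path file;

    CheckpointFile(Path file) {
        this.file = file;
    }

    /** Writes all offsets to a temp file, then atomically replaces the checkpoint. */
    void write(Map<String, Long> offsets) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Long> entry : new TreeMap<>(offsets).entrySet()) {
            lines.add(entry.getKey() + " " + entry.getValue());
        }
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.write(tmp, lines, StandardCharsets.UTF_8);
        Files.move(tmp, file, StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Reads the checkpoint; a missing file means no known position, so restore from scratch. */
    Map<String, Long> read() throws IOException {
        Map<String, Long> offsets = new TreeMap<>();
        if (!Files.exists(file)) {
            return offsets;
        }
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            if (line.isBlank()) {
                continue;
            }
            int space = line.lastIndexOf(' ');
            offsets.put(line.substring(0, space), Long.parseLong(line.substring(space + 1)));
        }
        return offsets;
    }

    /** "Is our state up-to-date?": every changelog's checkpointed position reached its end offset. */
    static boolean upToDate(Map<String, Long> checkpoint, Map<String, Long> endOffsets) {
        for (Map.Entry<String, Long> entry : endOffsets.entrySet()) {
            Long checkpointed = checkpoint.get(entry.getKey());
            if (checkpointed == null || checkpointed < entry.getValue()) {
                return false;
            }
        }
        return true;
    }
}
```

After a restart, comparing `read()` with the changelog end offsets tells whether the local state can be used as is or how far restoration has to go.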
Model with the state updater
[Figure: the stream thread with the main consumer, the producer, and the Poll/Process loop; the state updater with the restore consumer and the restore loop]
[Figure: cycles saved vs. cycles gained]
We save cycles on restoration!
We gain cycles during processing?
* Throughput 15% lower!
Causing a slowdown, even when barely running at all?
[Figure: stream thread and state updater thread]
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
 

Recently uploaded

UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 

Recently uploaded (20)

UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 

Restoring Restoration's Reputation in Kafka Streams with Bruno Cadonna & Lucas Brutschy

  • 1. Restoring Restoration’s Reputation in Kafka Streams, Bruno Cadonna & Lucas Brutschy
  • 2. About the Presenters: Bruno Cadonna, Senior Software Engineer @ Confluent, Committer & PMC member @ Apache Kafka; Lucas Brutschy, Senior Software Engineer @ Confluent, Contributor @ Apache Kafka
  • 3. Outline: The basics; Why does restoration have a poor reputation?; Restoring restoration’s reputation; Experiments; Broader vision & future work; Takeaways
  • 5. Kafka Streams: Stateful processors use local state stores. [Diagram: records flowing through stateless and stateful processors]
  • 6. Subtopologies in Kafka Streams: A topology consists of subtopologies connected by Kafka topics. [Diagram: a processor in Subtopology A defines a new key, and the records are repartitioned through a Kafka topic into Subtopology B]
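Why does defining a new key split the topology? Records are partitioned by key. After re-keying, records that now share a key may sit in different input partitions, and therefore in different tasks. Writing them through a repartition topic brings them back together. The stdlib-only Java sketch below models that step. It is a simplified model, not the Kafka Streams API. `RepartitionSketch`, `Rec`, and `partitionFor` are hypothetical names, and `Math.floorMod(key.hashCode(), n)` stands in for Kafka's default partitioner, which hashes the serialized key bytes with murmur2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Simplified model of the repartition step between two subtopologies.
// All names are hypothetical; the hash is NOT Kafka's murmur2-based partitioner.
final class RepartitionSketch {

    record Rec(String key, String value) {}

    // Stand-in partitioner: records with equal keys always land in the same partition.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Re-keys every record and routes it to the partition of its new key, as if it were
    // written to a repartition topic that the downstream subtopology reads.
    static Map<Integer, List<Rec>> repartition(List<Rec> input,
                                               Function<Rec, String> newKey,
                                               int numPartitions) {
        Map<Integer, List<Rec>> byPartition = new TreeMap<>();
        for (Rec r : input) {
            Rec rekeyed = new Rec(newKey.apply(r), r.value());
            byPartition.computeIfAbsent(partitionFor(rekeyed.key(), numPartitions),
                                        p -> new ArrayList<>())
                       .add(rekeyed);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        // Hypothetical input keyed by id; the new key is taken from the value.
        List<Rec> input = List.of(
            new Rec("1234", "shopA"), new Rec("2345", "shopB"),
            new Rec("9876", "shopA"), new Rec("2846", "shopB"));
        System.out.println(repartition(input, Rec::value, 3));
    }
}
```

In the Kafka Streams DSL, this situation typically arises from a key-changing operation such as `selectKey` followed by a grouping or join. In that case, Kafka Streams inserts the repartition topic automatically.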
  • 7. Tasks in Kafka Streams: A task is an instance of a subtopology that processes records from one topic partition. [Diagram: tasks A0, A1, A2 of Subtopology A and tasks B0, B1, B2 of Subtopology B]
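The log lines on slides 11 and 21 name tasks as `<subtopology>_<partition>`, for example `1_4`. The log on slide 11 lists 20 tasks, `0_0` through `1_9`: two subtopologies times 10 partitions. The sketch below models that naming, assuming one task per subtopology and input partition. `TaskIdSketch` is a hypothetical stand-in written for this illustration, not Kafka Streams' own `TaskId` class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a task identifier: one task per (subtopology, input partition).
record TaskIdSketch(int subtopology, int partition) {

    // Parses the "<subtopology>_<partition>" form used in the log lines, e.g. "1_4".
    static TaskIdSketch parse(String s) {
        int sep = s.indexOf('_');
        if (sep < 0) throw new IllegalArgumentException("expected <subtopology>_<partition>: " + s);
        return new TaskIdSketch(Integer.parseInt(s.substring(0, sep)),
                                Integer.parseInt(s.substring(sep + 1)));
    }

    // A subtopology reading topics with N partitions yields N tasks.
    static List<TaskIdSketch> tasksFor(int numSubtopologies, int numPartitions) {
        List<TaskIdSketch> tasks = new ArrayList<>();
        for (int sub = 0; sub < numSubtopologies; sub++) {
            for (int p = 0; p < numPartitions; p++) {
                tasks.add(new TaskIdSketch(sub, p));
            }
        }
        return tasks;
    }

    @Override
    public String toString() {
        return subtopology + "_" + partition;
    }
}
```

For example, `TaskIdSketch.tasksFor(2, 10)` returns 20 tasks, matching the task list in the restoration log.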
  • 8. Task assignment in Kafka Streams: Kafka Streams distributes tasks over the Streams clients. During failover, Kafka Streams moves tasks to the running clients, but the running clients do not have local state for the newly assigned tasks. [Diagram: tasks A0, B0, A1, A2, B1, B2 distributed over Streams clients 1, 2, and 3]
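To see why failover leads to restoration, here is a deliberately naive Java model. Tasks are spread round-robin over clients. When a client fails, its tasks go to the survivors, which hold no local state for them. The round-robin policy is an assumption for illustration only; Kafka Streams' actual assignor is stickier and takes existing local state into account. `AssignmentSketch`, `roundRobin`, and `failover` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Naive model of task assignment and failover; NOT Kafka Streams' real assignor.
final class AssignmentSketch {

    // Spreads tasks over clients in round-robin order.
    static Map<String, List<String>> roundRobin(List<String> tasks, List<String> clients) {
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        for (String client : clients) assignment.put(client, new ArrayList<>());
        for (int i = 0; i < tasks.size(); i++) {
            assignment.get(clients.get(i % clients.size())).add(tasks.get(i));
        }
        return assignment;
    }

    // Moves the tasks of a failed client to the surviving clients (mutating `assignment`)
    // and returns, per survivor, the newly received tasks it has no local state for.
    static Map<String, List<String>> failover(Map<String, List<String>> assignment, String failed) {
        List<String> orphaned = assignment.remove(failed);
        Map<String, List<String>> needRestoration = new LinkedHashMap<>();
        if (orphaned == null) return needRestoration;           // unknown client: nothing moves
        List<String> survivors = new ArrayList<>(assignment.keySet());
        if (survivors.isEmpty()) throw new IllegalStateException("no running clients left");
        for (String client : survivors) needRestoration.put(client, new ArrayList<>());
        for (int i = 0; i < orphaned.size(); i++) {
            String target = survivors.get(i % survivors.size());
            assignment.get(target).add(orphaned.get(i));
            needRestoration.get(target).add(orphaned.get(i));
        }
        return needRestoration;
    }

    public static void main(String[] args) {
        Map<String, List<String>> assignment = roundRobin(
            List.of("A0", "B0", "A1", "B1", "A2", "B2"),
            List.of("client1", "client2", "client3"));
        System.out.println("before:     " + assignment);
        System.out.println("to restore: " + failover(assignment, "client2"));
        System.out.println("after:      " + assignment);
    }
}
```

With six tasks on three clients, failing `client2` hands `B0` to `client1` and `A2` to `client3`. Both tasks must be restored before they can be processed.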
  • 9. Restoration in Kafka Streams: Kafka Streams replicates the content of state stores to changelog topics in Kafka. When tasks are moved to a different Streams client, their states are restored from Kafka. [Diagram: tasks A0 and B1 moving between Streams clients 1, 2, and 3, with arrows labeled Replication and Restoration]
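Conceptually, restoring a store means replaying its changelog topic into the local store. The Java sketch below models this with an in-memory list, whose index plays the role of the Kafka offset, and a `HashMap` as the store. A `null` value marks a delete (tombstone). It is a simplified model under these assumptions, with hypothetical names. Real restoration reads the changelog with a restore consumer and writes batches into the state store, typically RocksDB.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of changelog-based restoration (hypothetical names throughout).
final class ChangelogRestoreSketch {

    // One changelog record; a null value is a tombstone that deletes the key.
    record ChangeRecord(String key, String value) {}

    // Replays changelog records from `fromOffset` up to the end into `store` and returns the
    // next offset to read, i.e. the position a checkpoint would remember.
    static long restore(Map<String, String> store, List<ChangeRecord> changelog, long fromOffset) {
        for (long offset = Math.max(0, fromOffset); offset < changelog.size(); offset++) {
            ChangeRecord r = changelog.get((int) offset);
            if (r.value() == null) {
                store.remove(r.key());
            } else {
                store.put(r.key(), r.value());
            }
        }
        return Math.max(fromOffset, changelog.size());
    }

    public static void main(String[] args) {
        List<ChangeRecord> changelog = List.of(
            new ChangeRecord("2846", "4"),
            new ChangeRecord("1234", "3"),
            new ChangeRecord("2846", "5"),
            new ChangeRecord("1234", null));              // delete
        Map<String, String> store = new HashMap<>();
        long next = restore(store, changelog, 0);
        System.out.println(store + ", next offset " + next); // {2846=5}, next offset 4
    }
}
```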
  • 10. Standby tasks:
      ● Hot local replicas of a local state store
      ● Assigned to Streams clients that DO NOT own the task whose state is replicated
      ● Update the local state store from the changelog topic
      ● No processing
      ● Avoid restoration when tasks are moved to other Streams clients
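Standby tasks are enabled through the `num.standby.replicas` Streams config. The minimal sketch below uses plain `java.util.Properties` string keys so it compiles without the Kafka Streams jar. With the library, you would normally use constants such as `StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG`. The application id and broker address are placeholders. The helper `recordsToRestore` is a hypothetical back-of-the-envelope calculation: at failover, a client only replays the part of the changelog that its local copy does not already hold.

```java
import java.util.Properties;

// Enabling standby tasks. String keys keep this compilable without the Kafka Streams jar;
// with the library you would normally use StreamsConfig constants instead.
final class StandbySketch {

    static Properties streamsProps() {
        Properties props = new Properties();
        props.put("application.id", "benchmark-app");       // placeholder application id
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        props.put("num.standby.replicas", "1");              // one hot replica per stateful task
        return props;
    }

    // Records a client must replay when it takes over a task: everything after the position
    // of its local copy (pass 0 if it has no local state for the task).
    static long recordsToRestore(long changelogEndOffset, long localPosition) {
        return Math.max(0, changelogEndOffset - localPosition);
    }

    public static void main(String[] args) {
        System.out.println(streamsProps().getProperty("num.standby.replicas")); // 1
        System.out.println(recordsToRestore(252_519, 0));       // 252519: no local state
        System.out.println(recordsToRestore(252_519, 252_500)); // 19: caught-up standby
    }
}
```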
  • 11. How to see restoration in your applications
      Verify the progress of restoration:
      [2023-04-26 13:43:48,739] INFO stream-thread [<client ID>-StreamThread-1] Restoration in progress for 10 partitions.
        {benchmark-store-changelog-5: position=26324, end=252519, totalRestored=26272}
        {benchmark-store-changelog-4: position=26360, end=252296, totalRestored=26336}
        {benchmark-store-changelog-3: position=26394, end=252304, totalRestored=26351}
        {benchmark-store-changelog-2: position=26279, end=252514, totalRestored=26240}
        {benchmark-store-changelog-9: position=13270, end=252490, totalRestored=13239}
        {benchmark-store-changelog-8: position=13277, end=252495, totalRestored=13247}
        {benchmark-store-changelog-7: position=13273, end=252527, totalRestored=13234}
        {benchmark-store-changelog-6: position=13266, end=252508, totalRestored=13245}
        {benchmark-store-changelog-1: position=13298, end=252710, totalRestored=13278}
        {benchmark-store-changelog-0: position=13275, end=252705, totalRestored=13256}
        (org.apache.kafka.streams.processor.internals.StoreChangelogReader:575)
      Verify whether restoration finished:
      [2023-04-26 13:49:05,441] INFO stream-thread [<client ID>-StreamThread-1] Restoration took 189227 ms for all tasks [1_0, 0_9, 0_8, 1_9, 0_7, 1_8, 0_6, 1_7, 0_5, 1_6, 0_4, 1_5, 0_3, 1_4, 0_2, 1_3, 0_1, 1_2, 0_0, 1_1] (org.apache.kafka.streams.processor.internals.StreamThread:902)
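When many partitions are restoring, the "Restoration in progress" line can be turned into per-partition progress figures. The minimal sketch below parses the entries shown on slide 11 with a regular expression. The pattern is written against this internal log message, which may change between Kafka versions. Progress is computed as `position / end`, which equals the restored fraction only if restoration started at offset 0 (an assumption). For programmatic tracking, Kafka Streams also offers the `StateRestoreListener` callback interface, registered with `KafkaStreams#setGlobalStateRestoreListener`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extracts "{<changelog partition>: position=..., end=..., totalRestored=...}" entries
// from the "Restoration in progress" log line (format taken from the example log).
final class RestoreLogSketch {

    private static final Pattern ENTRY = Pattern.compile(
        "\\{([\\w.-]+): position=(\\d+), end=(\\d+), totalRestored=(\\d+)\\}");

    // Maps each changelog partition to position / end, i.e. the share of the changelog read so far.
    static Map<String, Double> progress(String logLine) {
        Map<String, Double> result = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(logLine);
        while (m.find()) {
            long position = Long.parseLong(m.group(2));
            long end = Long.parseLong(m.group(3));
            result.put(m.group(1), end == 0 ? 1.0 : (double) position / end);
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "Restoration in progress for 10 partitions. "
            + "{benchmark-store-changelog-5: position=26324, end=252519, totalRestored=26272} "
            + "{benchmark-store-changelog-9: position=13270, end=252490, totalRestored=13239}";
        progress(line).forEach((partition, fraction) ->
            System.out.printf("%s: %.1f%%%n", partition, 100 * fraction));
    }
}
```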
  • 12. Why does restoration have a poor reputation?
  • 13. Phases of a stream thread: All active tasks are blocked from processing until restoration is complete. [Diagram: timeline of a stream thread after a rebalance. In the “Restoration Phase”, it alternates Poll and Restore for active and standby tasks; once all active tasks have caught up, the “Processing Phase” alternates Poll, Restore, and Process]
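The cost of this coupling shows up in a toy model of the loop. Every iteration restores one batch per restoring task, and processing starts only once no active task has anything left to restore. The sketch below is a simplification under assumed inputs (records per task, a fixed batch size) with hypothetical names. It ignores standby tasks and the real restore consumer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of the stream thread loop before the state updater: Poll, Restore, and Process
// only once ALL active tasks have caught up. Inputs are made-up record counts.
final class CoupledLoopSketch {

    // Returns the loop iteration in which processing starts; it is the same for every task.
    static int firstProcessingIteration(Map<String, Long> recordsToRestore, long batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        Map<String, Long> left = new LinkedHashMap<>(recordsToRestore);
        int iteration = 0;
        while (true) {
            iteration++;
            // poll() of the main consumer would happen here
            if (left.values().stream().allMatch(r -> r == 0)) {
                return iteration;                        // process() all active tasks
            }
            // restore one batch for every task that is still behind
            left.replaceAll((task, r) -> Math.max(0, r - batchSize));
        }
    }

    public static void main(String[] args) {
        // A0 alone is ready in iteration 2; next to the large B1 it has to wait until iteration 11.
        System.out.println(firstProcessingIteration(Map.of("A0", 1_000L), 1_000));                   // 2
        System.out.println(firstProcessingIteration(Map.of("A0", 1_000L, "B1", 10_000L), 1_000));   // 11
    }
}
```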
  • 14. Checkpointing offsets of states: No checkpoints during restoration! If restoration is interrupted, it has to restart from the beginning. [Diagram: the same stream thread timeline; checkpoints (✔) are taken only in the processing phase]
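A worked example makes the cost concrete. Suppose restoring 250,000 records into an empty store is interrupted, for example by a rebalance, after 205,000 records. The sketch below counts the records read in total. `checkpointInterval = 0` models the behavior on this slide, while a positive interval models periodic checkpoints taken during restoration. The numbers, the interval-based policy, and the name `CheckpointSketch` are assumptions for illustration.

```java
// Counts changelog records read when restoring `total` records into an empty store is
// interrupted once after `interruptedAt` records and then resumed.
// checkpointInterval == 0 means no checkpoints during restoration (restart from scratch);
// a positive value means a checkpoint every `checkpointInterval` records (an assumption).
final class CheckpointSketch {

    static long recordsRead(long total, long interruptedAt, long checkpointInterval) {
        if (interruptedAt < 0 || interruptedAt > total) {
            throw new IllegalArgumentException("interruptedAt must be within [0, total]");
        }
        long resumeFrom = checkpointInterval > 0
            ? (interruptedAt / checkpointInterval) * checkpointInterval // last checkpoint taken
            : 0L;                                                       // progress is lost
        return interruptedAt + (total - resumeFrom);
    }

    public static void main(String[] args) {
        System.out.println(recordsRead(250_000, 205_000, 0));      // 455000
        System.out.println(recordsRead(250_000, 205_000, 10_000)); // 255000
    }
}
```

Without checkpoints, the interruption costs a second full pass over the changelog. With a checkpoint every 10,000 records, only the 5,000 records after the last checkpoint are read twice.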
  • 15. More disadvantages of coupling restoration and processing:
      ➔ If restoration is slow, we risk falling out of the consumer group
      ➔ If restoration is slow and exactly-once semantics are enabled, we risk timing out transactions
      ➔ Both trigger a rebalance and, with it, restoration
  • 17. Introducing the state updater: Restoration and processing happen in parallel. [Diagram: after a rebalance, the stream thread alternates Poll and Process for active tasks, while the state updater thread restores active tasks A0 and B1 and updates standby tasks]
  • 18. Checkpointing offsets during restoration: Checkpoints are taken during restoration and processing. [Diagram: the state updater takes checkpoints (✔) while restoring tasks A0 and B1, and the stream thread takes checkpoints while processing]
  • 19. Communication between the stream thread and the state updater: The stream thread (Rebalance, Poll, Process) passes tasks to add and remove to the state updater (Restore). The state updater passes back restored tasks, removed tasks, and exceptions together with the affected tasks.
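The hand-off can be modeled with two queues. The stream thread puts tasks to restore on one, and the state updater thread puts restored tasks on the other. In the toy Java model below, the updater restores one batch per task per round. A task with little state is therefore handed back, and could be processed, while a larger one is still restoring. This is a sketch under simplifying assumptions with hypothetical names, not the `DefaultStateUpdater` implementation. Removal requests, removed tasks, and exceptions are omitted, task ids are assumed unique, and all tasks are handed over before the updater starts so the output is deterministic.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Toy model of the stream thread / state updater hand-off (hypothetical names throughout).
final class StateUpdaterSketch {

    // A task to restore: its id and the number of restore batches it still needs.
    record Task(String id, int batchesToRestore) {}

    // Returns task ids in the order in which the stream thread could start processing them.
    static List<String> run(List<Task> tasks) {
        BlockingQueue<Task> tasksToAdd = new LinkedBlockingQueue<>(tasks); // stream thread -> updater
        BlockingQueue<String> restoredTasks = new LinkedBlockingQueue<>(); // updater -> stream thread

        Thread stateUpdater = new Thread(() -> {
            Map<String, Integer> restoring = new LinkedHashMap<>();
            int done = 0;
            while (done < tasks.size()) {
                Task added;
                while ((added = tasksToAdd.poll()) != null) {           // pick up new tasks
                    restoring.put(added.id(), added.batchesToRestore());
                }
                Iterator<Map.Entry<String, Integer>> it = restoring.entrySet().iterator();
                while (it.hasNext()) {                                   // one batch per task
                    Map.Entry<String, Integer> entry = it.next();
                    int left = entry.getValue() - 1;
                    if (left <= 0) {
                        restoredTasks.add(entry.getKey());               // hand back restored task
                        it.remove();
                        done++;
                    } else {
                        entry.setValue(left);
                    }
                }
            }
        }, "state-updater");
        stateUpdater.start();

        List<String> processingOrder = new ArrayList<>();
        try {
            // Stream thread: process each task as soon as it is reported restored.
            while (processingOrder.size() < tasks.size()) {
                String ready = restoredTasks.poll(10, TimeUnit.SECONDS);
                if (ready == null) throw new IllegalStateException("restoration timed out");
                processingOrder.add(ready);                              // process() could start here
            }
            stateUpdater.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for restoration", e);
        }
        return processingOrder;
    }

    public static void main(String[] args) {
        // B1 has a lot of state, A0 very little: A0 becomes available first.
        System.out.println(run(List.of(new Task("B1", 10), new Task("A0", 1)))); // [A0, B1]
    }
}
```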
  • 20. New metrics (KIP-869, Guozhang Wang):
      active-restoring-tasks: the number of active tasks currently undergoing restoration
      standby-updating-tasks: the number of standby tasks currently undergoing state update
      idle-ratio: the fraction of time the thread spent idling
      active-restore-ratio: the fraction of time the thread spent restoring active tasks
      standby-update-ratio: the fraction of time the thread spent updating standby tasks
      checkpoint-ratio: the fraction of time the thread spent checkpointing tasks
      Further metrics track restored records and restoration progress.
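The ratio metrics describe how a thread splits its time. This Java sketch illustrates what they express, not how KIP-869 implements them. It accumulates time per activity over a measurement window and reports each share. The enum constants map to idle-ratio, active-restore-ratio, standby-update-ratio, and checkpoint-ratio; all names and numbers are hypothetical.

```java
import java.util.EnumMap;
import java.util.Map;

// Illustrates what the ratio metrics express: the share of a measurement window a thread
// spent on each activity. The accounting here is an assumption, not KIP-869's implementation.
final class RatioMetricsSketch {

    // IDLE -> idle-ratio, ACTIVE_RESTORE -> active-restore-ratio,
    // STANDBY_UPDATE -> standby-update-ratio, CHECKPOINT -> checkpoint-ratio
    enum Activity { IDLE, ACTIVE_RESTORE, STANDBY_UPDATE, CHECKPOINT }

    private final Map<Activity, Long> nanos = new EnumMap<>(Activity.class);

    // Adds time spent on one activity during the current window.
    void add(Activity activity, long elapsedNanos) {
        nanos.merge(activity, elapsedNanos, Long::sum);
    }

    // Share of the window spent on the activity (0.0 for an empty window).
    double ratio(Activity activity, long windowNanos) {
        return windowNanos <= 0 ? 0.0 : (double) nanos.getOrDefault(activity, 0L) / windowNanos;
    }

    public static void main(String[] args) {
        RatioMetricsSketch m = new RatioMetricsSketch();
        long window = 1_000_000_000L;                      // 1 s window, made-up numbers
        m.add(Activity.ACTIVE_RESTORE, 700_000_000L);
        m.add(Activity.CHECKPOINT, 50_000_000L);
        m.add(Activity.IDLE, 250_000_000L);
        for (Activity a : Activity.values()) {
            System.out.println(a + ": " + m.ratio(a, window));
        }
    }
}
```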
  • 21. Run it!
      See how tasks are passed to the state updater:
      [2023-04-26 17:12:29,915] INFO state-updater [<client ID>-StateUpdater-1] Stateful active task 1_4 was added to the state updater (org.apache.kafka.streams.processor.internals.DefaultStateUpdater:370)
      and how they complete restoration:
      [2023-04-26 17:17:00,936] INFO state-updater [<client ID>-StateUpdater-1] Stateful active task 1_4 completed restoration (org.apache.kafka.streams.processor.internals.DefaultStateUpdater:449)
      The state updater is available in Kafka 3.5 but gated by an internal flag. Try it out by setting the __state.updater.enabled__ Streams config (use at your own risk).
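The sketch below provides two small helpers for trying this out. The first adds the internal flag named on this slide to a `Properties` object. Passing the value as a `Boolean` is an assumption, and because the flag is internal, its behavior may differ across versions. The second computes how long a task spent in the state updater from the timestamps of the two log lines on this slide. For task `1_4`, that is 4 min 31.021 s. `StateUpdaterTrySketch` and its methods are hypothetical names.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Properties;

final class StateUpdaterTrySketch {

    private static final DateTimeFormatter LOG_TIMESTAMP =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Copies the given config and sets the internal flag named on the slide (Kafka 3.5).
    // Internal and unsupported: use at your own risk. The Boolean value type is an assumption.
    static Properties withStateUpdater(Properties base) {
        Properties props = new Properties();
        props.putAll(base);
        props.put("__state.updater.enabled__", true);
        return props;
    }

    // Time between the "was added to the state updater" and "completed restoration" log lines.
    static Duration restorationTime(String addedTimestamp, String completedTimestamp) {
        return Duration.between(LocalDateTime.parse(addedTimestamp, LOG_TIMESTAMP),
                                LocalDateTime.parse(completedTimestamp, LOG_TIMESTAMP));
    }

    public static void main(String[] args) {
        System.out.println(withStateUpdater(new Properties()).get("__state.updater.enabled__")); // true
        // Timestamps of task 1_4 from the log lines on this slide.
        System.out.println(restorationTime("2023-04-26 17:12:29,915",
                                           "2023-04-26 17:17:00,936").toMillis());       // 271021
    }
}
```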
  • 24. Getting back into business sooner… [Chart: w/o state updater vs. w/ state updater]
      ● A single Streams client
      ● Topics with 10 partitions
      ● Runtime: 20 min
      ● After 10 min, the client is restarted and its local state is cleaned up
      ● The Streams client needs to restore all state
  • 25. CPU load [Chart: w/o state updater vs. w/ state updater; series: stream thread, state updater]
  • 26. Broader vision & future work
  • 28. Multi-core scenario [Diagram: four stream threads, each with its own restore consumer, main consumer, producer, and Poll/Restore/Process loop]
  • 29. … The vision [Diagram: a state updater with the restore consumer and restore loop, a poll thread with the main consumer and poll loop, a producer thread with the producer, and processing threads 1 to N]
  • 30. Restoration with exactly-once semantics:
      - When a Streams client fails, the local state may have been updated since the last commit
      - We cannot “roll back” the state
      - State has to be wiped, triggering full restoration
      KIP-844: Transactional State Stores
      KIP-892: Transactional Semantics for State Stores
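The rule on this slide can be written as a small decision function. In this minimal sketch with hypothetical names, exactly-once is the mode selected with the `processing.guarantee` Streams config (for example `exactly_once_v2`), and `cleanShutdown` stands for a client that committed and closed its tasks properly. It encodes only what the slide states, not Kafka Streams' actual state-directory handling. It also assumes that without exactly-once, restoration resumes from the last checkpoint.

```java
// Where does restoration start after a restart? Encodes only the rule on this slide.
// Parameter names are hypothetical; 0 stands for "the beginning of the changelog".
final class EosRestoreSketch {

    static long restoreStartOffset(boolean exactlyOnce, boolean cleanShutdown, long checkpointedOffset) {
        if (exactlyOnce && !cleanShutdown) {
            // Local state may contain uncommitted writes that cannot be rolled back:
            // wipe it and restore the whole changelog.
            return 0L;
        }
        // Otherwise resume from the last checkpointed offset (an assumption for this sketch).
        return checkpointedOffset;
    }

    public static void main(String[] args) {
        System.out.println(restoreStartOffset(true, false, 26_324L));  // 0
        System.out.println(restoreStartOffset(false, false, 26_324L)); // 26324
    }
}
```

KIP-844 and KIP-892 address this by making state stores transactional, so that uncommitted writes would not need a full wipe.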
  • 32. Takeaways:
      • Restoration decoupled from processing
      • Checkpointing during restoration
      • Part of a larger change in the threading model
      Joint work with Guozhang Wang
  • 35. The stream thread loop: Poll, Restore, Process
  • 36. One more detail: checkpointing. [Diagram: the Poll, Restore, Process loop writing a checkpoint (✔) to a checkpoint file] The checkpoint file keeps the latest offset of the changelog topic and answers the question “is our state up to date?” after a restart.
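A checkpoint file can be sketched in a few lines of Java. The format here, one `<changelog partition> <offset>` line per partition, is a simplified assumption rather than Kafka Streams' on-disk format, and `CheckpointFileSketch` is a hypothetical name. The sketch makes two design choices. First, the file is replaced through a temporary file and an atomic move, so a crash cannot leave a half-written checkpoint. Second, a missing file is treated as "no checkpoint", so restoration starts from the beginning.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified checkpoint file: one "<changelog partition> <offset>" line per partition.
final class CheckpointFileSketch {

    // Writes a temp file and moves it over the old checkpoint in one atomic step.
    static void write(Path file, Map<String, Long> offsets) {
        List<String> lines = new ArrayList<>();
        offsets.forEach((partition, offset) -> lines.add(partition + " " + offset));
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try {
            Files.write(tmp, lines);
            Files.move(tmp, file, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the checkpoint; an empty result means "no checkpoint, restore from the beginning".
    static Map<String, Long> read(Path file) {
        Map<String, Long> offsets = new TreeMap<>();
        if (!Files.exists(file)) return offsets;
        try {
            for (String line : Files.readAllLines(file)) {
                if (line.isBlank()) continue;
                String[] parts = line.trim().split("\\s+");
                offsets.put(parts[0], Long.parseLong(parts[1]));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempDirectory("task-1_4").resolve("checkpoint");
        System.out.println(read(file));                  // {}
        write(file, Map.of("benchmark-store-changelog-4", 26_360L));
        System.out.println(read(file));                  // {benchmark-store-changelog-4=26360}
    }
}
```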
  • 37. Model with the state updater [Diagram: the stream thread with the main consumer, producer, and Poll/Process loop; the state updater with the restore consumer and restore loop]
  • 39. Cycles saved / cycles gained: We save cycles on restoration!
  • 40. Cycles saved / cycles gained: Do we gain cycles during processing?* (*Throughput 15% lower!)
  • 41. Causing a slowdown, even when barely running at all? [Diagram: stream thread and state updater thread]