Getting Under the Hood of Kafka Streams -
Optimizing Storage engine to Tune Up Performance
Agenda
★ What is a Storage Engine?
★ State store in Kafka Streams
★ 3 tips and 1 game changing trick
to improve performance today
★ What is Speedb
Storage Engine
- A software component responsible for managing how data is
stored, retrieved, and updated in a database or data processing
system
- Plays a crucial role in determining the performance and
efficiency of data storage and retrieval operations
- Embedded in the application software stack
- Data management capabilities: snapshots, transactions, ordered data, iterators
Stateful Kafka Streams
- Kafka Streams uses a storage engine as its state store
- Stateful processing handles the current event in the context of other events
- Tracks information over time during real-time data processing
Examples of stateful operations:
- count(): how many times a key has been seen in the event stream
- aggregate(): combines the events seen so far for a key; the aggregated result does not have to be of the same type as the input events
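As a rough illustration of why count() needs state, the sketch below keeps a per-key counter in an in-memory map that stands in for the state store. It is not the Kafka Streams API: the class name `CountSketch`, its methods, and the `HashMap` store are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a stateful count(): the map plays the role of the state store.
class CountSketch {
    private final Map<String, Long> store = new HashMap<>();

    // Process one event: read the previous count for the key, add one, write it back.
    long process(String key) {
        return store.merge(key, 1L, Long::sum);
    }

    // Current count for a key, or 0 if the key has never been seen.
    long countFor(String key) {
        return store.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        CountSketch counter = new CountSketch();
        for (String key : List.of("a", "b", "a", "a")) {
            counter.process(key);
        }
        System.out.println("a=" + counter.countFor("a") + ", b=" + counter.countFor("b"));
    }
}
```

Every event triggers a read and a write against the store, which is why the storage engine sits directly in the data path.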
RocksDB: Kafka Streams' default storage engine
- The default storage engine in Kafka Streams is RocksDB
- RocksDB is a popular key-value storage engine by Facebook
- A fork of LevelDB, by Google
- LSM tree implementation
- Better than a B-tree on write-intensive workloads
- Very complex: it has dozens of configuration parameters
Why does it matter?
- The storage engine resides in the data path and has a major impact on overall performance
- Handles all put/get/delete/update operations
- Example: state restore when scaling out or recovering (chart: lower is better)
3 tips to increase your stateful streaming
performance today
and 1 game changing trick
Speedb/RocksDB High Level Architecture
[Diagram: writes go to the memtable; when the memtable is full it becomes an immutable memtable and is flushed to an SST file in L0. SST files are organized in levels L0, L1, L2, ..., Ln. Reads for data not in the memtable go to the SST files.]
1. Write buffer size (memtable)
- Data is written to the write buffer
- When the memtable is full the data is flushed to the
disk
- The size of the memtable affects the flush frequency
and the file size
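The link between write buffer size and flush frequency can be sketched with a toy model: writes accumulate in a sorted in-memory map, and once the configured size is reached the map is flushed as one file. The class `MemtableSketch`, its byte accounting, and the buffer sizes used are assumptions for illustration, not RocksDB code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of a write buffer (memtable) that flushes to "SST files" when full.
class MemtableSketch {
    private final long writeBufferSizeBytes;
    private TreeMap<String, String> memtable = new TreeMap<>(); // sorted by key
    private long memtableBytes = 0;
    final List<TreeMap<String, String>> flushedFiles = new ArrayList<>(); // newest last

    MemtableSketch(long writeBufferSizeBytes) {
        this.writeBufferSizeBytes = writeBufferSizeBytes;
    }

    void put(String key, String value) {
        memtable.put(key, value);
        memtableBytes += key.length() + value.length(); // rough size estimate
        if (memtableBytes >= writeBufferSizeBytes) {
            flush();
        }
    }

    // Hand the full memtable over as one file and start a fresh, empty one.
    private void flush() {
        flushedFiles.add(memtable);
        memtable = new TreeMap<>();
        memtableBytes = 0;
    }

    // Writes the given number of unique entries and reports how many flushes happened.
    static int flushesFor(long bufferBytes, int entries) {
        MemtableSketch db = new MemtableSketch(bufferBytes);
        for (int i = 0; i < entries; i++) {
            db.put(String.format("key%06d", i), "v");
        }
        return db.flushedFiles.size();
    }

    public static void main(String[] args) {
        System.out.println("small buffer flushes: " + flushesFor(1_000, 10_000));
        System.out.println("large buffer flushes: " + flushesFor(4_000, 10_000));
    }
}
```

In this model each entry counts as 10 bytes, so 10,000 writes cause 100 flushes with a 1,000-byte buffer and 25 flushes with a 4,000-byte buffer: a buffer four times larger produces a quarter of the flushes, each yielding a larger file.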
[Diagram: writes fill memtables (mem1, mem2, mem3); a full memtable is flushed to an SST file in L0]
Write Buffer Size - Considerations
Larger write buffer:
- Fewer flush operations
- Reduced write latency
- Lower write amplification (less data written to disk)
- Improved read performance for recent data
- Consumes extra memory
Smaller write buffer:
- More frequent flushes
- Higher write amplification
- Increased write latency
- May require more frequent disk access for recent data
Write Buffer Size - db_bench Test Results
Write amplification (lower is better)
- Write amplification dropped from 11 to 8 (roughly 30% lower)
OPs
- Up to 50% increase in write performance
- 20% more on average
P99 Comparison: 64MB vs 16MB
TEST NAME                   P99 - 64MB   P99 - 16MB   P99 - diff
readrandomwriterandom_5     1940.18      2588.6       -25.05%
readrandomwriterandom_50    865.01       1285.75      -32.72%
readrandomwriterandom_70    858.19       1404.23      -38.89%
Up to ~40% decrease in P99 latency when the write buffer size is set to 64MB
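The "P99 - diff" column can be reproduced from the two P99 columns as the relative change of the 64MB result against the 16MB baseline. The short sketch below recomputes it; the class and method names are hypothetical, and the units of the P99 values are whatever the original benchmark reported.

```java
// Recomputes the "P99 - diff" column from the two P99 columns of the table above.
class P99Diff {
    // Relative change of the 64MB result against the 16MB baseline, in percent.
    static double diffPercent(double p99With64Mb, double p99With16Mb) {
        return (p99With64Mb - p99With16Mb) / p99With16Mb * 100.0;
    }

    public static void main(String[] args) {
        double[][] rows = {
            {1940.18, 2588.6},   // readrandomwriterandom_5
            {865.01, 1285.75},   // readrandomwriterandom_50
            {858.19, 1404.23},   // readrandomwriterandom_70
        };
        for (double[] row : rows) {
            System.out.printf("%.2f%%%n", diffPercent(row[0], row[1]));
        }
    }
}
```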
Write Buffer Size
For a total RAM footprint increase of (64MB recommended - 16MB default today) * 4 (max memtable number) = 192MB per CF (so around 200MB), you gain about a 30% performance increase.
So, if you have a low number of partitions or a lot of free memory, this would be good for you.
Note: the setting is not system-wide; you can increase it for one specific CF (partition) and not the rest.
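The footprint arithmetic above can be checked with a few lines of code. This is a back-of-the-envelope sketch only: the class name, method names, and the count of 10 column families in the usage example are assumptions for illustration.

```java
// Back-of-the-envelope estimate of the extra memtable memory described above.
class WriteBufferFootprint {
    // Extra RAM per column family (CF), in MB, when raising the write buffer size.
    static long extraMbPerCf(long newBufferMb, long oldBufferMb, int maxMemtables) {
        return (newBufferMb - oldBufferMb) * maxMemtables;
    }

    // Extra RAM for several CFs (for example, one per partition), in MB.
    static long extraMbTotal(long newBufferMb, long oldBufferMb, int maxMemtables, int cfCount) {
        return extraMbPerCf(newBufferMb, oldBufferMb, maxMemtables) * cfCount;
    }

    public static void main(String[] args) {
        System.out.println(extraMbPerCf(64, 16, 4) + " MB per CF");
        System.out.println(extraMbTotal(64, 16, 4, 10) + " MB for 10 CFs");
    }
}
```

Because the cost scales with the number of CFs, the increase is cheapest when there are few partitions or when only selected CFs are given the larger buffer.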
2. Performance vs Memory Tradeoff
- In a heavy write workload, the write buffer cannot keep up with the write rate
- There are 2 options:
  - Consume extra memory to handle the new writes
  - Delay the writes, potentially causing application stalls
MAX memtables=2
2. Performance vs Memory Tradeoff
Performance (Allow_stalls: disabled):
- RocksDB consumes extra memory
- Can lead to OOM
Memory (Allow_stalls: enabled):
- Serious performance impact
- RocksDB may enter deadlocks
Performance vs Memory Tradeoff
Pinning (performance):
- Pin index and filter blocks for better performance
- RocksDB consumes extra memory
- Can lead to OOM
LRU (memory):
- Use the LRU cache for filter and index blocks
- Performance impact when data needs to be loaded into the cache
3. Compaction Method
- Reorders the LSM tree for better performance and lower space usage (garbage collection)
- Periodic process
- Merges SST files: removes duplicate or overwritten keys
- Multi-threaded process
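The merge step of compaction can be sketched as follows: several sorted files are combined into one, and when a key appears more than once only the newest value survives. This is a minimal single-threaded illustration under the assumption that each SST file is a sorted map; `CompactionSketch` and `compact` are hypothetical names, not RocksDB or Speedb code.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy compaction: merge several sorted "SST files" into one, keeping only the newest value per key.
class CompactionSketch {
    // Files are given oldest first; a later file overwrites keys from earlier files.
    static TreeMap<String, String> compact(List<? extends Map<String, String>> filesOldestFirst) {
        TreeMap<String, String> merged = new TreeMap<>();
        for (Map<String, String> file : filesOldestFirst) {
            merged.putAll(file);
        }
        return merged;
    }

    public static void main(String[] args) {
        TreeMap<String, String> older = new TreeMap<>(Map.of("a", "1", "b", "1"));
        TreeMap<String, String> newer = new TreeMap<>(Map.of("b", "2", "c", "2"));
        System.out.println(compact(List.of(older, newer)));
    }
}
```

The two input files hold four entries in total, while the merged result holds three: the overwritten value of key `b` is dropped, which is where the space savings come from.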
Compaction Types: Universal vs. Leveled
- Write rate: Universal > Leveled
- Space amplification: Universal > Leveled
- Write amplification: Universal > Leveled
Universal vs. Leveled Compaction
Results (lower is better)
- 60% less space amplification
- 22% less write amplification
Configuration:
- 80 million keys of 1KB
- WB size: 16MB
Universal vs. Leveled Compaction (WB size 16MB)
Results
Leveled compaction provides:
- 12% improvement in mixed workload
- 38% improvement in Seek random
- 10% less write performance
Configuration:
- 80 Million keys of 1KB
- WB size: 16MB
Universal vs. Leveled Compaction (WB size 64MB)
Results
- 10% improvement with Leveled compaction plus a 64MB write buffer size
Test:
- Overwrite (100% random write)
A Game changing trick to
Boost your Application
A drop-in replacement for RocksDB
Speedb Open Source
- Speedb open source is a community-led project
- A fork of RocksDB, fully compatible
- Rebased onto the latest RocksDB regularly
Community-driven
Resource utilization
High and stable performance
Developer experience
Data that has not been flushed: secured on the WAL but not yet "cleaned"
Performance Stabilization: Delayed write
- A mechanism that slows down writes once a certain threshold is reached
- Speedb delays writes moderately to avoid stalls and performance instability
Global delayed write
- Every Kafka Streams partition is translated to a RocksDB instance
- Partitions allow parallel processing
- The global delayed write takes all instances into account when deciding on write rate changes
Delayed write
Environment: 16 cores, 128GB RAM, NVMe
Scenario: 100% random write workload
Results: no more hiccups; stable performance, no stalls
Write Buffer Manager
Speedb offers a new write buffer manager (WBM) that keeps performance stable while staying within the memory boundaries.
- Limits the sum of all memtables so that it does not exceed its size
The problem: when the WBM reaches 90% usage, it causes a huge performance issue.
- Without stalls, the size is not enforced.
- Serious impact on performance, deadlocks, etc.
Write Buffer Manager
- Simplicity: define a single parameter
- Memory consumption: stays within the memory boundaries
- Stable performance: eliminates stalls
Write Buffer Manager
Environment: 16 cores, 128GB RAM, NVMe
Scenario: 95% write workload
Results (chart: lower is better):
- Stable performance (no stalls)
- 24% less memory usage
Proactive Flushes: test and results
Environment: 16 cores, 128GB RAM, NVMe
Scenario: as described above
Results: 45% improvement in writes
[Chart: Proactive Flushes and Native; axis: secs_elapsed]
Static Pinning
- Pinning index and filter blocks in memory gives better performance than the LRU cache
- The risk with RocksDB is running into an OOM condition
- Speedb added a safety belt that provides the benefit of pinning without the memory risk
Static Pinning
Environment: 8 cores, 64GB RAM, HDD
Scenario: 100% random read workload
Results: 130% improvement in read random workload
Speedb Enterprise
- Based on Speedb OSS
- For high-scale systems with SLAs
Includes:
- Adaptive multi-dimensional compaction
- Adaptive media throughput control
- Professional services and support
Why Speedb Enterprise?
- Removes the data size limitation (~100GB)
- Reduced CPU usage due to lower write amplification
- No IO hiccups during compaction
- Eliminates the performance degradation when the data set is ±20GB
- Improved SSD endurance due to lower write amplification
Reduce the write amplification factor (WAF) from ~24 to ~4
[Chart: Seek while Write, RocksDB vs. Speedb; y-axis: IOPS; x-axis: time (in seconds)]
[Chart: 95% write, 5% read, RocksDB vs. Speedb; y-axis: IOPS; x-axis: time (in seconds)]
Adaptive media throughput Control
Summary
1. Increase the write buffer size from 16MB to 64MB
2. Weigh the memory vs. performance tradeoff
3. Change the compaction method to Leveled
4. Use the Speedb storage engine in your Kafka Streams application
How to replace RocksDB with Speedb?
1. Download a compiled Kafka Streams version with Speedb
2. Compile it yourself from source
3. Replace the RocksDB library with the Speedb library
Follow the doc for full instructions
Speedb GitHub
The Hive
Speedb serves as a hive where Speedb/RocksDB users
and contributors can collaborate on the development
of new storage engine capabilities to address the
needs of modern, data-intensive workloads
Thank you
Because it is optimized for small-object writes, RocksDB is most popular as a state store and for metadata, caching, indexing, and similar uses in databases (like Redis on Flash), applications, and event streaming platforms like Kafka Streams, Apache Flink, or Apache Spark.
RocksDB write performance suffers when the database size exceeds 50GB. In this case the write amplification may become very large, which means that the system needs to do many more reads and writes for each application write.
Compaction Method
- L0: stores recent data, sorted by flush time
- L1-Lmax: data sorted by key
- Lmax: oldest data
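The level layout above implies a lookup order for reads: the memtable first, then L0 files from newest to oldest, then the deeper levels. The sketch below illustrates that order under simplifying assumptions: each file or level is modeled as a single sorted map, and the class `LsmReadSketch` and its field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.TreeMap;

// Toy LSM read path: memtable first, then L0 files newest to oldest, then L1..Lmax.
class LsmReadSketch {
    final TreeMap<String, String> memtable = new TreeMap<>();
    final List<TreeMap<String, String>> level0NewestFirst = new ArrayList<>();
    final List<TreeMap<String, String>> deeperLevels = new ArrayList<>(); // index 0 = L1

    Optional<String> get(String key) {
        if (memtable.containsKey(key)) {
            return Optional.of(memtable.get(key));
        }
        // L0 files are ordered by flush time and may overlap, so check the newest first.
        for (TreeMap<String, String> file : level0NewestFirst) {
            if (file.containsKey(key)) {
                return Optional.of(file.get(key));
            }
        }
        // Deeper levels hold older data sorted by key; the first hit is the newest version.
        for (TreeMap<String, String> level : deeperLevels) {
            if (level.containsKey(key)) {
                return Optional.of(level.get(key));
            }
        }
        return Optional.empty();
    }
}
```

A key that exists only in Lmax, or not at all, is the most expensive case in this model because every layer above it is checked first.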
Bloom Filter
- The data is always sorted, which is optimal for sequential reads
- A Bloom filter improves random read performance
- Reduces the number of reads from the disk
- Returns either "possibly in set" or "definitely not in set"
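The two possible answers of a Bloom filter can be seen in a minimal implementation. This is a sketch for illustration, not the filter used by RocksDB or Speedb: the class `BloomFilterSketch`, the double-hashing scheme based on `String.hashCode()`, and the constructor parameters are all assumptions.

```java
import java.util.BitSet;

// Minimal Bloom filter sketch: several hash probes into a bit array.
class BloomFilterSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    BloomFilterSketch(int expectedKeys, int bitsPerKey, int numHashes) {
        this.numBits = Math.max(1, expectedKeys * bitsPerKey);
        this.bits = new BitSet(numBits);
        this.numHashes = numHashes;
    }

    // Double hashing: probe i uses h1 + i * h2, both derived from the key's hash code.
    private int probe(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1; // odd step so successive probes differ
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(String key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(probe(key, i));
        }
    }

    // false: "definitely not in set"; true: "possibly in set" (false positives are possible).
    boolean mightContain(String key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(probe(key, i))) {
                return false;
            }
        }
        return true;
    }
}
```

A read path can skip an SST file entirely whenever `mightContain` returns false, which is how the filter reduces disk reads for keys that are not present.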
Bloom filter: test and results
Environment: 8 cores, 64GB RAM, HDD
Scenario: 1B KV pairs, 16B key, 256B value, 4 threads, 20 bits per key (bpk), high read-miss rate (similar to a read-before-write workload)
Results: 130% improvement in read misses (read random)
[Chart: readrandom; 10000000000 objects / 100% reads / 0% writes]
Bloom filter: test and results
Environment: 16 cores, 128GB RAM, NVMe
Scenario: 1B KV pairs, 16B key, 256B value, 4 threads, 29 vs. 40 bpk
Results: 26% reduction in memory usage
[Chart: readrandom; 10000000000 objects / 100% reads / 0% writes]
[Chart: memory consumption, Bloom 40 BPK vs. Speedb bloom 29 BPK; values shown: 5578 and 4122]
New Sorted hash Memtable
- The sorted hash memtable improves overall
performance
- Data structure: array of vectors + hash
table
- Improved read while write performance
Memtable: test and results
Environment: 16 cores, 128GB RAM, NVMe
Scenario: 1B KV pairs, 16B key, 64B value, 50 threads, db_bench
Results: 17% gain in overwrite, 13% gain in mixed (90/10) workload
[Chart: overwrite; 10000000000 objects / 0% reads / 100% writes]
[Chart: readrandomwriterandom_90; 10000000000 objects / 90% reads / 10% writes]
RocksDB vs Speedb OSS vs Enterprise
Features compared across RocksDB, Speedb OSS, and Speedb Enterprise:
- Professional services
- Eliminate write stalls
- Reduced memory usage
- High performance at scale (dataset >30GB)
- Adaptive multi-dimensional compaction

My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 

Recently uploaded (20)

Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 

Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune Up Performance

  • 1. Getting Under the Hood of Kafka Streams - Optimizing Storage engine to Tune Up Performance
  • 2. Agenda
    ★ What is a Storage Engine?
    ★ State store in Kafka Streams
    ★ 3 tips and 1 game-changing trick to improve performance today
    ★ What is Speedb
  • 4. Storage Engine
    - A software component responsible for managing how data is stored, retrieved, and updated in a database or data processing system
    - Plays a crucial role in determining the performance and efficiency of data storage and retrieval operations
    - Embedded in the application software stack
    - Data management capabilities: snapshots, transactions, ordered data, iterators
  • 6. Stateful Kafka Streams
    - Kafka Streams uses a storage engine to store state
    - State makes it possible to process the current event in the context of other events
    - Tracks information over time in real-time data processing
    Examples of state-storing operations:
    - count(): how many times a key has been seen in the event stream
    - aggregate(): aggregates the data seen so far by combining events; they don’t have to be of the same type
  • 7. RocksDB: Kafka’s default storage engine
    - RocksDB is the default storage engine for Kafka Streams state stores
    - A popular key-value storage engine, developed by Facebook
    - A fork of LevelDB, by Google
    - LSM-tree implementation
    - Performs better than a B-tree on write-intensive workloads
    - Very complex: it has dozens of configuration parameters
  • 8. Why does it matter?
    - The storage engine sits in the data path and has a major impact on overall performance
    - Handles all put/get/delete/update operations
    - Example: state restore when scaling out or recovering
    [Chart: state restore time, lower is better]
  • 9. 3 tips to increase your stateful streaming performance today and 1 game changing trick
  • 10. Speedb/RocksDB High Level Architecture
    [Diagram: writes go to the memtable; when the memtable is full it becomes an immutable memtable and is flushed to an SST file in L0; SST files are organized in levels L0, L1, L2, ..., Ln; reads go to the SST files when the data is not in the memtable]
  • 11. 1. Write buffer size (memtable)
    - Data is written to the write buffer
    - When the memtable is full, the data is flushed to disk
    - The size of the memtable affects the flush frequency and the file size
    [Diagram: writes fill memtables mem1, mem2, mem3; mem3 is flushed to an SST file in L0]
  • 12. Write Buffer Size - Considerations
    Larger write buffer:
    - Fewer flush operations
    - Reduced write latency
    - Lower write amplification (less data written to disk)
    - Improved read performance for recent data
    - Consumes extra memory
    Smaller write buffer:
    - More frequent flushes
    - Higher write amplification
    - Increased write latency
    - May require more frequent disk access for recent data
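To make the flush-frequency effect concrete, here is a minimal Python sketch. It is a toy model, not RocksDB code: it assumes a memtable is flushed exactly when the buffered bytes reach the write buffer size, and the function name and the 1 GiB workload are illustrative assumptions.

```python
MB = 1024 * 1024


def count_flushes(total_bytes_written: int, write_buffer_bytes: int) -> int:
    """Toy model: one flush each time a full write buffer's worth of data arrives."""
    if write_buffer_bytes <= 0:
        raise ValueError("write buffer size must be positive")
    return total_bytes_written // write_buffer_bytes


written = 1024 * MB  # arbitrary workload size, chosen only for illustration
print(count_flushes(written, 16 * MB))  # 64 flushes with a 16MB buffer
print(count_flushes(written, 64 * MB))  # 16 flushes with a 64MB buffer
```

In this model a 4x larger buffer means 4x fewer (and 4x larger) flushes, at the cost of holding more data in memory.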
  • 13. Write Buffer Size - db_bench Test Results
    Write amplification (lower is better):
    - Write amplification changed from 11 to 8 (about 30% lower)
    Operations per second:
    - Up to 50% increase in write performance
    - 20% increase on average
  • 14. P99 Comparison: 64MB vs 16MB
    TEST NAME                   P99 - 64MB   P99 - 16MB   P99 - diff
    readrandomwriterandom_5     1940.18      2588.6       -25.05%
    readrandomwriterandom_50    865.01       1285.75      -32.72%
    readrandomwriterandom_70    858.19       1404.23      -38.89%
    Up to ~40% decrease in P99 latency when the write buffer size is set to 64MB
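The "P99 - diff" figures are the relative change of the 64MB result against the 16MB baseline. The short Python sketch below recomputes them from the two P99 values on the slide; the helper name `pct_diff` is an assumption made for illustration.

```python
def pct_diff(new: float, baseline: float) -> float:
    """Relative change of `new` versus `baseline`, in percent, rounded to 2 decimals."""
    return round((new - baseline) / baseline * 100, 2)


# (P99 with 64MB write buffer, P99 with 16MB write buffer), as given on the slide
p99 = {
    "readrandomwriterandom_5": (1940.18, 2588.6),
    "readrandomwriterandom_50": (865.01, 1285.75),
    "readrandomwriterandom_70": (858.19, 1404.23),
}

for test_name, (p99_64mb, p99_16mb) in p99.items():
    print(f"{test_name}: {pct_diff(p99_64mb, p99_16mb)}%")
```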
  • 15. Write Buffer Size
    - Raising the write buffer from 16MB (today's default) to 64MB (recommended) increases the RAM footprint by (64 - 16) * 4 (max number of memtables) MB per CF, so around 200MB, in exchange for a 30% performance increase
    - This is a good fit if you have a low number of partitions or a lot of free memory
    - Note: the setting is not system-wide; you can increase it for one specific CF (partition) and not the rest
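The footprint estimate on this slide is simple arithmetic, sketched below in Python. The function name and the `column_families` parameter are assumptions added for illustration; the 16MB, 64MB, and 4-memtable figures come from the slide.

```python
def extra_memtable_mb(new_mb: int, old_mb: int, max_memtables: int,
                      column_families: int = 1) -> int:
    """Worst-case extra RAM (in MB) when every memtable of every CF is full."""
    return (new_mb - old_mb) * max_memtables * column_families


print(extra_memtable_mb(64, 16, 4))      # 192 MB for a single CF
print(extra_memtable_mb(64, 16, 4, 10))  # 1920 MB if ten CFs are all enlarged
```

The second call shows why the recommendation is tied to a low partition count: the extra memory grows linearly with the number of CFs that get the larger buffer.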
  • 16. 2. Performance vs Memory Tradeoff
    - Under a heavy write workload, the write buffer cannot keep up with the write rate
    - There are 2 options:
      - Consume extra memory to handle the new writes
      - Delay the writes, potentially causing application stalls
    [Diagram: max memtables = 2]
  • 17. 2. Performance vs Memory Tradeoff
    Performance (allow_stalls: disabled):
    - RocksDB consumes extra memory
    - Can lead to OOM
    Memory (allow_stalls: enabled):
    - Serious performance impact
    - RocksDB may enter deadlocks
  • 18. Performance vs Memory Tradeoff
    Pinning (performance):
    - Pin index and filter blocks for better performance
    - RocksDB consumes extra memory
    - Can lead to OOM
    LRU (memory):
    - Use the LRU cache for filter and index blocks
    - Performance impact when data needs to be loaded into the cache
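The pinning versus LRU tradeoff can be illustrated with a small Python sketch. This is a toy cache written for this example, not the RocksDB block cache: the class, its methods, and the block names are all assumptions. Pinned blocks live outside the LRU capacity, which is exactly where the extra memory comes from.

```python
from collections import OrderedDict


class BlockCache:
    """Toy cache: an LRU of bounded size plus an unbounded set of pinned blocks."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lru = OrderedDict()  # block_id -> block, oldest first
        self.pinned = {}          # never evicted, so it costs extra memory
        self.misses = 0

    def pin(self, block_id, block):
        self.pinned[block_id] = block

    def get(self, block_id, load):
        if block_id in self.pinned:
            return self.pinned[block_id]
        if block_id in self.lru:
            self.lru.move_to_end(block_id)  # mark as most recently used
            return self.lru[block_id]
        self.misses += 1                    # simulated disk read
        block = load(block_id)
        self.lru[block_id] = block
        if len(self.lru) > self.capacity:
            self.lru.popitem(last=False)    # evict the least recently used block
        return block


def count_misses(pin_index: bool) -> int:
    cache = BlockCache(capacity=2)
    load = lambda block_id: f"<contents of {block_id}>"
    if pin_index:
        cache.pin("index", load("index"))
    for block_id in ["index", "data1", "data2", "index"]:
        cache.get(block_id, load)
    return cache.misses


print(count_misses(pin_index=False))  # 4: the index block is evicted and reloaded
print(count_misses(pin_index=True))   # 2: only the data blocks are loaded
```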
  • 19. 3. Compaction Method
    - Reorders the LSM tree for better performance and lower space usage (garbage collection)
    - Periodic process
    - Merges SST files: removes duplicate or overwritten keys
    - Multi-threaded process
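As a rough illustration of what merging SST files means, the Python sketch below merges sorted runs and keeps only the newest version of each key. It is a simplified model under stated assumptions, not the RocksDB algorithm: entries are `(key, sequence_number, value)` tuples, each input run is sorted by key with newer versions first, a value of `None` stands for a delete marker, and delete markers are dropped as if this were a compaction into the last level.

```python
import heapq

TOMBSTONE = None  # assumed marker for a deleted key


def compact(sst_files):
    """Merge sorted runs of (key, seqno, value); keep the newest version per key."""
    # Order by key, then by descending sequence number so the newest version comes first.
    merged = heapq.merge(*sst_files, key=lambda entry: (entry[0], -entry[1]))
    output = []
    last_key = object()  # sentinel that equals no real key
    for key, seqno, value in merged:
        if key == last_key:
            continue  # older, overwritten version of a key already handled
        last_key = key
        if value is not TOMBSTONE:
            output.append((key, seqno, value))
    return output


older = [("a", 1, "1"), ("b", 2, "2"), ("c", 3, "3")]
newer = [("a", 4, "10"), ("c", 5, TOMBSTONE)]
print(compact([older, newer]))  # [('a', 4, '10'), ('b', 2, '2')]
```

Five input entries shrink to two, which is the space reclamation (garbage collection) the slide refers to.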
  • 20. Compaction Types: Universal vs. Leveled
    - Write rate: Universal > Leveled
    - Space amplification: Universal > Leveled
    - Write amplification: Universal > Leveled
  • 21. Universal vs. Leveled Compaction
    Results (lower is better):
    - 60% less space amplification
    - 22% less write amplification
    Configuration:
    - 80 million keys of 1KB
    - WB size: 16MB
  • 22. Universal vs. Leveled Compaction (WB size 16MB)
    Results: leveled compaction provides:
    - 12% improvement in mixed workload
    - 38% improvement in Seek random
    - 10% less write performance
    Configuration:
    - 80 million keys of 1KB
    - WB size: 16MB
  • 23. Universal vs. Leveled Compaction (WB size 64MB)
    Results:
    - 10% improvement with leveled compaction + 64MB write buffer size
    Test:
    - Overwrite (100% random write)
  • 24. A Game changing trick to Boost your Application
  • 25. A drop-in replacement for RocksDB
  • 26. Speedb Open Source
    - Speedb open source is a community-led project
    - A fork of RocksDB, fully compatible
    - Rebased onto the latest RocksDB regularly
    Focus areas: community-driven, resource utilization, high and stable performance, developer experience
    - Data that has not been flushed: secured on the WAL but not yet “cleaned”
  • 27. Performance Stabilization: Delayed write
    - A mechanism that slows down writes when a certain threshold is reached
    - Speedb delays the writes moderately to avoid stalls and performance instability
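One way to picture a "moderate" delay is a write rate that ramps down gradually between two thresholds instead of dropping abruptly. The Python sketch below is purely illustrative and is not Speedb's or RocksDB's actual algorithm: the function, its parameters, the thresholds, and the linear ramp are all assumptions.

```python
def allowed_write_rate(pending_bytes: float, soft_limit: float, hard_limit: float,
                       max_rate: float, min_rate: float) -> float:
    """Hypothetical throttle: full speed below soft_limit, linear ramp down to min_rate."""
    if pending_bytes <= soft_limit:
        return max_rate
    if pending_bytes >= hard_limit:
        return min_rate
    fraction = (pending_bytes - soft_limit) / (hard_limit - soft_limit)
    return max_rate - fraction * (max_rate - min_rate)


# Illustrative numbers only: limits in MB of pending data, rates in MB/s.
for pending in (50, 100, 150, 200):
    print(pending, allowed_write_rate(pending, soft_limit=100, hard_limit=200,
                                      max_rate=1000, min_rate=100))
```

Because the rate changes smoothly with the backlog, writers slow down before the hard limit is hit, which is the intuition behind avoiding full stalls.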
  • 28. Global delayed write
    - Every Kafka Streams partition is translated to a RocksDB instance
    - Partitions allow parallel processing
    - The global delayed write takes all instances into account when deciding on write rate changes
  • 29. Delayed write
    Environment: 16 cores, 128GB RAM, NVMe
    Scenario: 100% random write workload
    Results: stable performance, no stalls ("No More Hiccups!")
  • 30. Write Buffer Manager
    - Limits the sum of all memtables so that it does not exceed the configured size
    - The problem: when the WBM reaches 90% usage, it causes a huge performance issue. Without stalls, the size is not enforced. Serious impact on performance, deadlocks, etc.
    - Speedb offers a new write buffer manager that keeps performance stable while staying within the memory boundaries
  • 31. Write Buffer Manager
    - Simplicity: define a single parameter
    - Memory consumption: stays within the memory boundaries
    - Stable performance: eliminates stalls
  • 32. Write Buffer Manager
    Environment: 16 cores, 128GB RAM, NVMe
    Scenario: 95% write workload
    Results (lower is better):
    - Stable performance (no stalls)
    - 24% less memory usage
  • 33. Proactive Flushes: test and results
    Environment: 16 cores, 128GB RAM, NVMe
    Scenario: as described above
    Results: 45% improvement in writes
    [Chart: Proactive Flushes vs. Native, secs_elapsed]
  • 34. Static Pinning
    - Pinning index and filter blocks in memory performs better than the LRU cache
    - The risk with RocksDB is running into an OOM condition
    - Speedb added a safety belt that provides the benefit of pinning without the memory risk
  • 35. Static Pinning
    Environment: 8 cores, 64GB RAM, HDD
    Scenario: 100% random read workload
    Results: 130% improvement in read random workload
  • 37. Speedb Enterprise
    - Based on Speedb OSS
    - For high-scale systems with SLAs
    Includes:
    - Adaptive multi-dimensional compaction
    - Adaptive media throughput control
    - Professional services and support
  • 38. Why Speedb Enterprise?
    - Removes the data size limitation (~100GB)
    - Reduced CPU usage due to lower write amplification
    - No IO hiccups during compaction
    - Eliminates the performance degradation when the data set is ±20GB
    - Improved SSD endurance due to lower write amplification
  • 39. Reduce WAF from ~24 to ~4
  • 40. Seek while Write
    [Chart: IOPS over time (in seconds), RocksDB vs. Speedb]
  • 41. 95% write, 5% read
    [Chart: IOPS over time (in seconds), RocksDB vs. Speedb]
  • 44. Summary
    1. Increase write buffer size from 16MB to 64MB
    2. Memory vs. performance tradeoff
    3. Change the compaction method to Level
    4. Use Speedb storage engine in your Kafka Streams application
  • 45. How to replace RocksDB with Speedb?
    1. Download a compiled Kafka Streams version with Speedb
    2. Compile it yourself from source
    3. Replace the RocksDB lib with the Speedb lib
    Follow the doc for full instructions (Speedb GitHub)
  • 46. The Hive
    Speedb serves as a hive where Speedb/RocksDB users and contributors can collaborate on the development of new storage engine capabilities to address the needs of modern, data-intensive workloads
  • 48. Because it is optimized for small-object writes, RocksDB is most popular for state stores, metadata, caching, indexing, and similar uses in databases (like Redis on Flash), applications, and event streaming platforms like Kafka Streams, Apache Flink, or Apache Spark. RocksDB write performance suffers when the database size exceeds 50GB. In that case the write amplification may become very large, which means the system needs to do many more reads and writes for each application write.
  • 49. Compaction Method
    - L0: stores recent data, sorted by flush time
    - L1-Lmax: data sorted by key
    - Lmax: oldest data
  • 50. Bloom Filter
    - The data is always sorted, which is optimal for sequential reads
    - A Bloom filter improves random read performance
    - Reduces the number of reads from disk
    - Returns either "possibly in set" or "definitely not in set"
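The "possibly in set" versus "definitely not in set" behaviour can be seen in a minimal Bloom filter sketch. This Python version is a teaching toy, not the RocksDB or Speedb filter: the class, the SHA-256 based hashing, and the sizing (20 bits per key for 1,000 keys, 7 hash functions) are assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: no false negatives, a tunable rate of false positives."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Derive num_hashes bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely not in set"; True means "possibly in set".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


bloom = BloomFilter(num_bits=20 * 1000, num_hashes=7)
for i in range(1000):
    bloom.add(f"key-{i}".encode())

print(bloom.might_contain(b"key-42"))  # True: keys that were added are never missed
```

A lookup that gets `False` can skip reading the SST file entirely, which is how the filter reduces disk reads for keys that are not present.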
  • 51. Bloom filter: test and results
    Environment: 8 cores, 64GB RAM, HDD
    Scenario: 1B KV pairs, 16B key, 256B value, 4 threads, 20 bpk, high read-miss (similar to a read-before-write workload)
    Results: 130% improvement in read misses (read random)
    [Chart: readrandom, 10000000000 objects, 100% reads / 0% writes]
  • 52. Bloom filter: test and results
    Environment: 16 cores, 128GB RAM, NVMe
    Scenario: 1B KV pairs, 16B key, 256B value, 4 threads, 29 vs 40 bpk
    Results: 26% reduction in memory usage
    [Charts: readrandom, 10000000000 objects, 100% reads / 0% writes; memory consumption, Bloom 40 BPK vs. Speedb bloom 29 BPK (chart values 5578 and 4122)]
  • 53. New Sorted Hash Memtable
    - The sorted hash memtable improves overall performance
    - Data structure: array of vectors + hash table
    - Improved read-while-write performance
  • 54. Memtable: test and results
    Environment: 16 cores, 128GB RAM, NVMe
    Scenario: 1B KV pairs, 16B key, 64B value, 50 threads, db_bench
    Results: 17% gain in overwrite, 13% gain in mixed (90/10)
    [Charts: overwrite, 10000000000 objects, 0% reads / 100% writes; readrandomwriterandom_90, 10000000000 objects, 90% reads / 10% writes]
  • 55. RocksDB vs Speedb OSS vs Enterprise
    Compared products: RocksDB, Speedb OSS, Speedb Enterprise
    Compared capabilities:
    - Professional services
    - Eliminate write stalls
    - Reduced memory usage
    - High performance at scale (dataset >30GB)
    - Adaptive multi-dimensional compaction