"With a few tweaks under the hood of your Kafka Streams implementation, you can greatly improve performance. Sound too good to be true? Well, the secret lies in understanding storage engines.
You may already know that if you're using Kafka Streams you already have a storage engine in place, but do you know what options are available to tune it for optimal performance and scalability?
This presentation discusses the importance of choosing and optimizing storage engines for Kafka Streams applications.
Outline:
- What a storage engine is and how it relates to stateful Kafka Streams
- The importance of understanding storage engines for optimal performance and scalability
- Evaluation of storage engines: overview of popular engines, including LevelDB, RocksDB, and the open-source Speedb
- Review of the 5 most relevant configurable items and how they affect performance
- Practical ways to optimize and fine-tune your storage engine
- Showcase: 2-minute drop-in replacement demonstration"
1. Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune Up Performance
2. Agenda
★ What is a Storage Engine?
★ State store in Kafka Streams
★ 3 tips and 1 game-changing trick to improve performance today
★ What is Speedb
4. Storage Engine
- A software component responsible for managing how data is stored, retrieved, and updated in a database or data processing system
- Plays a crucial role in determining the performance and efficiency of data storage and retrieval operations
- Embedded in the application software stack
- Data management capabilities: snapshots, transactions, ordered data, iterators
6. Stateful Kafka Streams
- Kafka Streams uses a storage engine for storing state
- Stateful processing makes it possible to process the current event in the context of other events
- Tracks information over time, in real-time data processing
Examples of state-storing operations:
- count() - how many times have we seen a key in the event stream
- aggregate() - combines the data you've seen so far with incoming events; the events don't have to be of the same type
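Conceptually, a count() over a keyed event stream keeps a per-key counter in a state store and updates it one event at a time. A minimal plain-Java sketch of that idea, using a HashMap as a stand-in for the state store (in the real Kafka Streams DSL this would be groupByKey().count()):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StatefulCountDemo {

    // Stand-in "state store": key -> number of times seen so far.
    static Map<String, Long> countByKey(List<String> keys) {
        Map<String, Long> store = new HashMap<>();
        for (String key : keys) {
            // Read-modify-write against the state store, one event at a time.
            store.merge(key, 1L, Long::sum);
        }
        return store;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = countByKey(List.of("user-a", "user-b", "user-a"));
        System.out.println(counts.get("user-a")); // 2
        System.out.println(counts.get("user-b")); // 1
    }
}
```

Every such read-modify-write goes through the storage engine, which is why its tuning matters so much for stateful workloads.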
7. RocksDB: Kafka's default storage engine
- The default storage engine in Kafka Streams is RocksDB
- RocksDB is a popular key-value storage engine, developed by Facebook
- A fork of LevelDB, by Google
- LSM-tree implementation
- Better than a B-tree on write-intensive workloads
- Very complex; it has dozens of parameters to configure
8. Why does it matter?
- The storage engine resides in the data path and has a major impact on overall performance
- Handles all put/get/delete/update operations
- Example: state restore during scale-out or recovery
(chart: state restore time, lower is better)
9. 3 tips to increase your stateful streaming performance today, and 1 game-changing trick
10. Speedb/RocksDB High-Level Architecture
(diagram: writes go to the memtable; when the memtable is full it becomes an immutable memtable and is flushed to an SST file in L0; levels L0 through Ln hold progressively more SST files; reads are served from the memtable, or from the SST files when the data is not in the memtable)
11. 1. Write buffer size (memtable)
- Data is written to the write buffer
- When the memtable is full, the data is flushed to disk
- The size of the memtable affects the flush frequency and the file size
(diagram: writes fill memtables mem1-mem3; a full memtable is flushed to SST files in L0)
12. Write Buffer Size - Considerations
Larger write buffer:
- Fewer flush operations
- Reduced write latency
- Lower write amplification (less data written to disk)
- Improved read performance for recent data
- Consumes extra memory
Smaller write buffer:
- More frequent flushes
- Higher write amplification
- Increased write latency
- May require more frequent disk access for recent data
13. Write Buffer Size - db_bench Test Results
Write amplification:
- Write amplification changed from 11 to 8 (30% lower)
Ops:
- Up to 50% increase in write performance
- 20% more on average
(chart: write amplification, 30% decrease, lower is better)
14. P99 Comparison: 64MB vs 16MB

TEST NAME                  | P99 - 64MB | P99 - 16MB | P99 - diff
readrandomwriterandom_5    | 1940.18    | 2588.6     | -25.05%
readrandomwriterandom_50   | 865.01     | 1285.75    | -32.72%
readrandomwriterandom_70   | 858.19     | 1404.23    | -38.89%

Up to ~40% decrease in P99 latency when the write buffer size is set to 64MB
15. Write Buffer Size
For a total increase in RAM footprint of (64MB recommended - 16MB default) × 4 (max memtable number) per CF, i.e. around 200MB, you gain a 30% performance increase.
So, if you have a low number of partitions or a lot of free memory, this would be good for you.
Note: it's not system-wide; you can increase one specific CF (partition) and not the rest.
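In Kafka Streams, RocksDB options like the write buffer size are set through a RocksDBConfigSetter. A minimal sketch, assuming the standard kafka-streams and rocksdbjni APIs (the class name and store-name prefix below are illustrative):

```java
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.Options;

// Registered via StreamsConfig:
// props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, CustomRocksDBConfig.class);
public class CustomRocksDBConfig implements RocksDBConfigSetter {

    @Override
    public void setConfig(final String storeName, final Options options,
                          final Map<String, Object> configs) {
        // Raise the write buffer from the 16MB default to the recommended 64MB.
        // storeName lets you apply this to one specific store (CF/partition) only.
        if (storeName.startsWith("my-heavy-store")) { // illustrative store name
            options.setWriteBufferSize(64 * 1024 * 1024L);
        }
    }

    @Override
    public void close(final String storeName, final Options options) {
        // Nothing allocated here, so nothing to release.
    }
}
```

Because setConfig is called per store, the per-CF tuning mentioned in the note above falls out naturally from checking storeName.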
16. 2. Performance vs Memory Tradeoff
- In a heavy-write workload scenario the write buffer cannot handle the heavy write rate
- There are 2 options:
  - Consume extra memory to handle the new writes
  - Delay the writes, potentially running into application stalls
(diagram: max memtables = 2)
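The memory side of this tradeoff is governed by how many memtables a store may hold before writes stall. A hedged sketch using the rocksdbjni Options API (the values are illustrative, not a recommendation):

```java
import org.rocksdb.Options;

public class MemtableTradeoff {

    // Illustrative tuning of the memory-vs-stall tradeoff.
    static Options tuneMemtables(Options options) {
        // Allow more in-flight memtables: consumes extra memory,
        // but absorbs write bursts without stalling the application.
        options.setMaxWriteBufferNumber(4); // the slide's diagram shows 2
        // Start flushing as soon as one memtable is full.
        options.setMinWriteBufferNumberToMerge(1);
        return options;
    }
}
```

With a lower max the engine protects memory but delays writes sooner; with a higher max it trades RAM for burst absorption.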
18. Pinning vs. LRU: Performance vs Memory Tradeoff
Pinning - Performance:
- Pin index and filter blocks for better performance
- RocksDB consumes extra memory
- Can lead to OOM
LRU - Memory:
- Use the LRU cache for filter and index blocks
- Performance impact when data needs to be loaded into the cache
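Both sides of this tradeoff are configured on the table format. A minimal sketch, assuming a recent rocksdbjni BlockBasedTableConfig API (the cache size is illustrative):

```java
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;

public class PinningTradeoff {

    static Options configure(Options options, boolean preferPerformance) {
        BlockBasedTableConfig table = new BlockBasedTableConfig();
        if (preferPerformance) {
            // Pinning side: keep index/filter blocks resident outside the
            // block cache for speed. Risk: unbounded extra memory, possible OOM.
            table.setCacheIndexAndFilterBlocks(false);
        } else {
            // LRU side: index/filter blocks compete for block-cache space.
            // Bounded memory, but reads may stall while blocks are reloaded.
            table.setCacheIndexAndFilterBlocks(true);
            table.setBlockCache(new LRUCache(32 * 1024 * 1024L)); // 32MB, illustrative
            // Keep at least L0's filter/index blocks pinned to soften the hit.
            table.setPinL0FilterAndIndexBlocksInCache(true);
        }
        return options.setTableFormatConfig(table);
    }
}
```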
19. 3. Compaction Method
- Periodic, multi-threaded background process
- Merges SST files: removes duplicate or overwritten keys
- Reorders the LSM tree for better performance and lower space usage (garbage collection)
20. Compaction Types: Universal vs. Leveled
- Write rate: Universal > Leveled
- Space amplification: Universal > Leveled
- Write amplification: Universal > Leveled
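Switching the compaction method is a one-line option in rocksdbjni; a sketch (inside a Kafka Streams RocksDBConfigSetter this would go in setConfig):

```java
import org.rocksdb.CompactionStyle;
import org.rocksdb.Options;

public class CompactionChoice {

    static Options useLeveledCompaction(Options options) {
        // Leveled compaction: lower space and write amplification,
        // at the cost of some write throughput.
        return options.setCompactionStyle(CompactionStyle.LEVEL);
    }

    static Options useUniversalCompaction(Options options) {
        // Universal compaction: higher write rate, more space consumed.
        return options.setCompactionStyle(CompactionStyle.UNIVERSAL);
    }
}
```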
21. Universal vs. Leveled Compaction
Results (Leveled compaction):
- 60% less space amplification
- 22% less write amplification
Configuration:
- 80 million keys of 1KB
- WB size: 16MB
(chart: amplification, lower is better)
22. Universal vs. Leveled Compaction (WB size 16MB)
Results - Leveled compaction provides:
- 12% improvement in mixed workload
- 38% improvement in random seek
- 10% less write performance
Configuration:
- 80 million keys of 1KB
- WB size: 16MB
23. Universal vs. Leveled Compaction (WB size 64MB)
Results:
- 10% improvement with Leveled compaction + 64MB write buffer size
Test:
- Overwrite (100% random writes)
26. Speedb Open Source
- Speedb open source is a community-led project
- A fork of RocksDB, fully compatible
- Rebased onto the latest RocksDB regularly
(diagram: community-driven, resource utilization, high and stable performance, developer experience; WAL note: data that has not been flushed is secured in the WAL but not yet "cleaned")
27. Performance Stabilization: Delayed Write
- A mechanism to slow down writes when a certain threshold is reached
- Speedb delays writes gradually to avoid stalls and performance instability
28. Global Delayed Write
- Every Kafka Streams partition is translated to a RocksDB instance
- Partitions allow parallel processing
- The global delayed write takes all instances into account when deciding on write-rate changes
30. Write Buffer Manager
- The write buffer manager (WBM) limits the sum of all memtables so it does not exceed its configured size
- The problem: when the WBM reaches 90% usage it causes a huge performance issue - serious impact on performance, deadlocks, etc.
- Without stalls, the size is not enforced
- Speedb offers a new write buffer manager that keeps performance stable while staying within the memory boundaries
34. Static Pinning
- Pinning index and filter blocks in memory gives better performance than the LRU cache
- The risk with RocksDB is getting into an OOM condition
- Speedb adds a safety belt that provides the benefit of pinning without the memory risk
37. Speedb Enterprise
- Based on Speedb OSS
- For high-scale systems with SLAs
Includes:
- Adaptive multi-dimensional compaction
- Adaptive media throughput control
- Professional services and support
38. Why Speedb Enterprise?
- Removes the data size limitation (~100GB)
- Reduced CPU usage due to lower write amplification
- No IO hiccups during compaction
- Eliminates the performance degradation when the data set is ±20GB
- Improved SSD endurance due to lower write amplification
44. Summary
1. Increase the write buffer size from 16MB to 64MB
2. Memory vs. performance tradeoff
3. Change the compaction method to Leveled
4. Use the Speedb storage engine in your Kafka Streams application
45. How to replace RocksDB with Speedb?
1. Download a compiled Kafka Streams version with Speedb
2. Compile it yourself from source
3. Replace the RocksDB lib with the Speedb lib
Follow the doc for full instructions: Speedb GitHub
46. TheHive
Speedb serves as a hive where Speedb/RocksDB users and contributors can collaborate on the development of new storage-engine capabilities to address the needs of modern, data-intensive workloads
48. P99
Due to its optimization for small-object writes, RocksDB is typically most popular in state stores, metadata, caching, indexing, and similar applications: supporting databases (like Redis on Flash), applications, and event streaming platforms like Kafka Streams, Apache Flink, or Apache Spark.
RocksDB write performance suffers when the database size exceeds 50GB. In this case the write amplification may reach a very large number, which means the system needs to do many more reads and writes for each application write.
49. Compaction Method
- L0: stores recent data, sorted by flush time
- L1-Lmax: data sorted by key
- Lmax: oldest data
50. Bloom Filter
- The data is always sorted - optimal for sequential reads
- A Bloom filter improves random-read performance
- Reduces the number of reads from disk
- Returns either "possibly in set" or "definitely not in set"
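The two possible answers can be demonstrated with a tiny plain-Java sketch: a BitSet plus double hashing. The parameters here are illustrative and not RocksDB's actual filter implementation:

```java
import java.util.BitSet;

public class BloomDemo {
    private static final int NUM_BITS = 1024;
    private static final int NUM_HASHES = 3;

    private final BitSet bits = new BitSet(NUM_BITS);

    // Derive the i-th bit position for a key via simple double hashing.
    private static int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = Integer.reverse(h1) | 1; // make the second hash odd
        return Math.floorMod(h1 + i * h2, NUM_BITS);
    }

    public void add(String key) {
        for (int i = 0; i < NUM_HASHES; i++) bits.set(position(key, i));
    }

    // false => "definitely not in set" (no false negatives are possible)
    // true  => "possibly in set" (false positives can occur)
    public boolean mightContain(String key) {
        for (int i = 0; i < NUM_HASHES; i++) {
            if (!bits.get(position(key, i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        BloomDemo bloom = new BloomDemo();
        bloom.add("key-1");
        System.out.println(bloom.mightContain("key-1")); // true: possibly in set
    }
}
```

Because a negative answer is definitive, the storage engine can skip reading an SST file from disk entirely whenever its filter says "definitely not in set".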