Tiered Storage 101
Maria Berinde-Tâmpănariu
Staff Solutions Engineer
Kafka Storage
Factors influencing the amount of required storage:
- number of topics/partitions,
- rate of incoming messages,
- retention period (the log.retention.* broker settings).
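As a rough illustration of how these factors combine, the sketch below estimates cluster-wide disk usage. It assumes that the number of topics/partitions is already folded into one aggregate ingest rate, and it adds a replication factor, which the slide does not mention. The function name and the workload figures are hypothetical.

```python
def required_storage_bytes(ingest_bytes_per_sec, retention_seconds, replication_factor=3):
    """Rough estimate: every byte written is kept for the whole retention
    period and is stored once per replica. Ignores compression, index files
    and free-space headroom."""
    return ingest_bytes_per_sec * retention_seconds * replication_factor


# Hypothetical workload: 10 MB/s across all topics, 7 days retention, 3 replicas.
total = required_storage_bytes(10 * 1000**2, 7 * 24 * 3600, 3)
print(f"{total / 1000**4:.2f} TB")  # prints "18.14 TB"
```

Doubling either the ingest rate or the retention period doubles the estimate, which is why long retention quickly becomes the dominant cost.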
Extending Kafka Storage
Increase Disk Capacity
- not always possible
- finite
- must predict growth
Increase the Number of Brokers
- adds memory and CPU, as well
- less cost-efficient
- more complex deployment & operations
Use Tiered Storage
What is Tiered Storage?
Broker Configuration
Remote Storage Manager
• lifecycle of remote log segments and indexes
• interface to be implemented
• storage-specific implementations not part of Apache Kafka repo
Remote Log Metadata Manager
• lifecycle of metadata about remote log segments
• strongly consistent semantics
• implementation with storage as an internal topic (can be changed)
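A minimal sketch of how the two plugins might be wired into an Apache Kafka broker, assuming the KIP-405 property names listed in the Apache Kafka documentation. The RemoteStorageManager class name and class path are hypothetical placeholders, since storage-specific implementations are not shipped with Apache Kafka, and the listener name depends on your broker setup. Verify the names against the documentation for your Kafka version.

```python
# Broker properties for the two tiered storage plugins (Apache Kafka, KIP-405).
broker_props = {
    # Turns on tiered storage support on the broker.
    "remote.log.storage.system.enable": "true",
    # Hypothetical placeholder: supply your own RemoteStorageManager plugin.
    "remote.log.storage.manager.class.name": "com.example.tiered.MyRemoteStorageManager",
    "remote.log.storage.manager.class.path": "/opt/kafka/plugins/my-rsm/*",
    # Topic-based metadata manager that keeps metadata in an internal topic.
    "remote.log.metadata.manager.class.name":
        "org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager",
    # Assumed listener name; it must match a listener defined on the broker.
    "remote.log.metadata.manager.listener.name": "PLAINTEXT",
}


def render_properties(props):
    """Render a dict as key=value lines, as used in server.properties."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


print(render_properties(broker_props))
```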
What Does this Look Like in Action?
Example: Confluent Tiered Storage Configuration for AWS S3
# enables Tiered Storage on the broker
confluent.tier.feature=true
# sets the default for new topics
confluent.tier.enable=true
# sets the storage service, in this case AWS S3
confluent.tier.backend=S3
# AWS S3 storage configuration
confluent.tier.s3.bucket=<BUCKET_NAME>
confluent.tier.s3.prefix=<DIRECTORY-PATH>
confluent.tier.s3.region=<REGION>
# overrides the default value, which is 3
# confluent.tier.metadata.replication.factor=1
# authentication to the storage service
confluent.tier.s3.cred.file.path
Topic Configuration
• Tiered Storage enabled
• retention
  - total retention
  - local retention
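The topic-level settings above can be sketched as follows, assuming the Apache Kafka (KIP-405) topic configuration names remote.storage.enable, retention.ms and local.retention.ms. Confluent Platform uses its own confluent.tier.* names instead. The helper function and the retention values are hypothetical, and the check only handles positive values, not special values such as -1.

```python
DAY_MS = 24 * 60 * 60 * 1000
HOUR_MS = 60 * 60 * 1000


def tiered_topic_config(total_retention_ms, local_retention_ms):
    """Build topic configs for a tiered topic (assumes positive values)."""
    # Local retention is a subset of the total retention window.
    if local_retention_ms > total_retention_ms:
        raise ValueError("local retention must not exceed total retention")
    return {
        "remote.storage.enable": "true",          # Tiered Storage enabled
        "retention.ms": str(total_retention_ms),  # total retention
        "local.retention.ms": str(local_retention_ms),  # local retention
    }


# Hypothetical topic: keep 30 days in total, but only 6 hours on broker disks.
config = tiered_topic_config(30 * DAY_MS, 6 * HOUR_MS)
print(",".join(f"{key}={value}" for key, value in config.items()))
# prints "remote.storage.enable=true,retention.ms=2592000000,local.retention.ms=21600000"
```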
Step 1: The message is written to the local disk.
Step 2: The log segment is uploaded to the remote storage.
Step 3: When the local retention expires, the message is marked for deletion on the local storage.
Step 4: When the total retention expires, the segment is deleted from the cluster.
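The four steps can be modelled with a small simulation. This is a simplified sketch, not how the broker implements retention: it measures a segment's age from the moment it was closed, and it assumes that a segment is never dropped from local disk before it has been uploaded. All names and values are hypothetical.

```python
from dataclasses import dataclass

HOUR_MS = 60 * 60 * 1000


@dataclass
class Segment:
    closed_at_ms: int       # time the segment was rolled (Step 1 is complete)
    uploaded: bool = False  # True once Step 2 has copied it to remote storage


def segment_locations(seg, now_ms, local_retention_ms, total_retention_ms):
    """Return the set of tiers that still hold the segment at now_ms."""
    age_ms = now_ms - seg.closed_at_ms
    if age_ms >= total_retention_ms:
        return set()                # Step 4: deleted from the cluster
    locations = set()
    if seg.uploaded:
        locations.add("remote")     # Step 2: a copy lives in remote storage
    if age_ms < local_retention_ms or not seg.uploaded:
        locations.add("local")      # Step 3 removes it only after upload
    return locations


seg = Segment(closed_at_ms=0, uploaded=True)
local_ms, total_ms = 6 * HOUR_MS, 7 * 24 * HOUR_MS
print(sorted(segment_locations(seg, 1 * HOUR_MS, local_ms, total_ms)))    # ['local', 'remote']
print(sorted(segment_locations(seg, 12 * HOUR_MS, local_ms, total_ms)))   # ['remote']
print(sorted(segment_locations(seg, 8 * 24 * HOUR_MS, local_ms, total_ms)))  # []
```

Between the end of local retention and the end of total retention the data is served from remote storage only, which is what frees up broker disk space.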
Summary
- Tiered Storage offers a cost-effective way to increase the storage available in a Kafka cluster.
- Clients continue to interact with the cluster as usual.
- Operating the cluster becomes easier because of a less complex deployment and quicker recovery times.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage
https://kafka.apache.org/documentation/#tiered_storage
https://docs.confluent.io/platform/current/clusters/tiered-storage.html
Tiered Storage 101 | Kafka Summit London
