Optimize Costs and Scale Your Streaming Applications with Virtually Unlimited Storage from AWS Services

© 2023, Amazon Web Services, Inc. or its affiliates.
Vidhi Taneja, Principal Product Manager, Amazon MSK, AWS
Todd McGrath, Principal Solutions Architect, Streaming Services, AWS

Agenda
Growing demand for retaining data for longer in Kafka
Achieve cost efficiencies with Amazon MSK Tiered Storage
Amazon MSK Tiered Storage in action
Learnings and conclusions

Amazon Managed Streaming for Apache Kafka
A fully managed service for Apache Kafka
Amazon MSK: Provisioned | Serverless

Multi-AZ replication. Horizontal and vertical scaling. Multiple levels of security: encryption at rest and in transit, VPC isolation, and IAM, Secrets Manager, and TLS-based authentication.
Redshift and OpenSearch streaming ingestion, AWS Lambda as a data consumer, schema management with AWS Glue Schema Registry, and stream processing with Amazon MSF.
Maximize Apache Kafka performance (high throughput, low latency) at any scale. Monitoring via Amazon CloudWatch or Open Monitoring with Prometheus.

Customers running Amazon MSK at scale
Ingesting millions of events/s for interaction analytics
Ingesting 20 TB/day of in-game events for game analytics
Ingesting billions of application events/day
Leveraging as a strategic backbone to exchange messages between processes
Ingesting 40 GB/s of telemetry
Ingesting from hundreds of real estate data sources nationwide
Ingesting 6 billion events/day for event logging pipeline
Leveraging for real-time communication between multiple microservices

“Retain data for longer”

The growing demand for retaining data for longer
Training or scoring machine learning models
Regulatory compliance
Recomputing results after application logic changes or unplanned outages

Approaches for longer retention
Amazon MSK -> Kinesis Data Firehose -> Amazon S3 (new): no code, fully managed, built-in transformations
Amazon MSK -> Amazon MSK Connect (sink connector) -> Amazon S3: managed Kafka Connect, fully compatible, automatic scaling
Amazon MSK -> Amazon MSF -> Amazon S3: managed Flink service, processes real-time data streams

Drawbacks of traditional approaches to longer data retention in a Kafka cluster
Tightly coupled compute and storage on Kafka
Increased cost
Longer time to failure recovery and rebalance on Kafka

“Retain data for longer”

Amazon MSK Tiered Storage
Launched in 2022
Based on KIP-405: Kafka Tiered Storage
Save with low-cost tier
Scale storage without adding brokers
No additional infrastructure to manage
Scale to virtually unlimited storage
Faster partition rebalancing and recovery

AWS’ Contributions to Open Source Kafka
85%+ Code Reviews
Integration testing frameworks
Critical bug fixes
Code Contributions
New KIPs for Tiered Storage
Documentation
Customer experience definition
Mentorship to new community members
Testing and reporting issues

Customer Savings with Tiered Storage
Comparing costs without and with Tiered Storage:
One customer achieved 56% savings in their infrastructure costs
Another customer reduced $/GB by 27% while increasing their retention by 3x

Customers finding value from Tiered Storage
Independent storage scaling from compute
Enables high throughput workloads
Use same app code for historical and real-time data
Longer retention, improved cluster capacity utilization

Virtually unlimited and cost-effective storage tier
Amazon MSK
Tiered Storage

Enabling Amazon MSK Tiered Storage
1. Enable tiered storage on the cluster using the MSK console or the AWS CLI
2. Enable tiered storage on an existing Kafka topic
bin/kafka-configs.sh --bootstrap-server $bsrv --alter --entity-type topics --entity-name vidhi --add-config 'remote.storage.enable=true,local.retention.ms=7200000,retention.ms=604800000'
Or create a new tiered storage enabled topic
bin/kafka-topics.sh --create --bootstrap-server $bsrv --replication-factor 2 --partitions 1 --topic vidhi --config remote.storage.enable=true --config local.retention.ms=7200000 --config retention.ms=604800000
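The two retention values in the commands above are given in milliseconds and are easy to misread. As a minimal sketch, the Python below converts them to readable durations and checks that local retention does not exceed overall retention. The helper names `describe_ms` and `check_tiered_retention` are hypothetical and not part of Kafka or Amazon MSK, and the ordering check is an assumed sanity condition rather than a statement of how the broker validates topic configs.

```python
from datetime import timedelta


def describe_ms(ms: int) -> str:
    """Render a millisecond duration as whole days and hours."""
    delta = timedelta(milliseconds=ms)
    hours = delta.seconds // 3600
    return f"{delta.days} d {hours} h"


def check_tiered_retention(local_retention_ms: int, retention_ms: int) -> bool:
    """Assumed sanity check: the local tier keeps data no longer than the topic overall."""
    return 0 < local_retention_ms <= retention_ms


LOCAL_RETENTION_MS = 7_200_000   # local.retention.ms from the commands above
RETENTION_MS = 604_800_000       # retention.ms from the commands above

print("local:  ", describe_ms(LOCAL_RETENTION_MS))
print("overall:", describe_ms(RETENTION_MS))
print("valid:  ", check_tiered_retention(LOCAL_RETENTION_MS, RETENTION_MS))
```

With these values, closed segments stay on broker storage for 2 hours and in the topic overall for 7 days.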

Kafka partitions and segments
Topic: Partition 0, Partition 1, Partition 2
Partition 0: Segment 0 (closed), Segment 1 (active, receives writes)

Data lifecycle in a tiered storage enabled topic
Example scenario: a topic with 2 days local retention and 5 days overall retention
Time T0: before you enable tiered storage
Topic partition 0
Local log segments: Segment 1, Segment 2, Segment 3 (active)

Time T1 (< 2 days): tiered storage enabled; Segments 1 and 2 copied to tiered storage
Topic partition 0
Local log segments: Segment 1, Segment 2, Segment 3 (active)
Tiered storage log segments: Segment 1, Segment 2

Time T2: local retention in effect
Topic partition 0
Local log segments: Segment 3 (active); Segments 1 and 2 deleted
Tiered storage log segments: Segment 1, Segment 2

Time T3: overall retention in effect
Topic partition 0
Local log segments: Segment 3 (active)
Tiered storage log segments: Segments 1 and 2 deleted
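The T0 to T3 walkthrough above can be sketched as a small function that reports which tiers hold a segment at a given age. This is a simplified model and not broker code: the function name `segment_tiers` and its parameters are hypothetical, age is assumed to be measured from the segment's newest record, and retention is assumed to take effect instantly, whereas a real broker applies it on periodic checks.

```python
def segment_tiers(age_days, is_active, tiered_enabled=True,
                  local_retention_days=2.0, retention_days=5.0):
    """Return the set of tiers ("local", "tiered") that hold a segment.

    Defaults follow the example scenario: 2 days local retention,
    5 days overall retention.
    """
    if is_active:
        return {"local"}  # the active segment is only on broker storage
    if not tiered_enabled:
        # T0: without tiered storage, overall retention governs the local copy
        return {"local"} if age_days < retention_days else set()
    tiers = set()
    if age_days < local_retention_days:
        tiers.add("local")   # T1: closed segment still within local retention
    if age_days < retention_days:
        tiers.add("tiered")  # T1 and T2: copy in tiered storage is retained
    return tiers


for label, age in [("T1", 1), ("T2", 3), ("T3", 6)]:
    print(label, sorted(segment_tiers(age, is_active=False)))
```

A closed segment aged 1 day is in both tiers, at 3 days it is only in tiered storage, and at 6 days it is gone from both.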

How read works with tiered storage
Kafka client -> fetch request -> ReplicaManager
ReplicaManager -> ReadFromLocalLog -> local storage
ReplicaManager -> RemoteLogManager -> remote storage
Source: KIP-405
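The read path above comes down to a routing decision based on the requested offset. The sketch below illustrates that idea only. The `PartitionOffsets` class, its field names, and `route_fetch` are hypothetical, and the rule that offsets below the local log start are served from remote storage is a simplification of the behavior KIP-405 describes.

```python
from dataclasses import dataclass


@dataclass
class PartitionOffsets:
    log_start_offset: int        # oldest offset still retained in any tier
    local_log_start_offset: int  # oldest offset still on broker storage
    log_end_offset: int          # next offset to be written


def route_fetch(offset: int, p: PartitionOffsets) -> str:
    """Decide which tier would serve a fetch at the given offset."""
    if offset < p.log_start_offset or offset > p.log_end_offset:
        return "out_of_range"
    if offset >= p.local_log_start_offset:
        return "local"   # corresponds to ReadFromLocalLog in the diagram
    return "remote"      # corresponds to RemoteLogManager in the diagram


offsets = PartitionOffsets(log_start_offset=100,
                           local_log_start_offset=500,
                           log_end_offset=900)
for requested in (50, 100, 499, 500, 900, 901):
    print(requested, route_fetch(requested, offsets))
```

The consumer issues the same fetch request in every case, which is why the same application code can read both historical and real-time data.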

Amazon MSK Tiered Storage

Takeaways
Zero infrastructure management
Consumption-based pricing
Fully compatible with Kafka APIs
Virtually unlimited storage
Similar read latencies
Amazon MSK
Tiered Storage

“Enable tiered storage on your MSK clusters today to achieve storage elasticity and retain data longer at a lower cost with virtually unlimited capacity”
Vidhi Taneja
vidhit@amazon.com
Todd McGrath
toddmcg@amazon.com

Thank you!
