1. © 2023, Amazon Web Services, Inc. or its affiliates.
Optimize costs and scale your streaming applications with virtually unlimited storage
Vidhi Taneja
Principal Product Manager, Amazon MSK, AWS
Todd McGrath
Principal Solutions Architect, Streaming Services, AWS
2.
Agenda
Growing demand for retaining data longer in Kafka
Achieve cost efficiencies with Amazon MSK Tiered Storage
Amazon MSK Tiered Storage in action
Learnings and conclusions
3.
Amazon Managed Streaming for Apache Kafka
A fully managed service for
and
Amazon MSK
Provisioned | Serverless
4.
Amazon MSK
Multi-AZ replication. Horizontal and vertical scaling. Multiple levels of security. Encryption at rest and in transit. VPC isolation and IAM, Secrets Manager, and TLS-based authentication
Redshift and OpenSearch streaming ingestion, AWS Lambda as a data consumer, schema management with AWS Glue Schema Registry, stream processing with Amazon Managed Service for Apache Flink (Amazon MSF)
Maximize Apache Kafka performance (high throughput, low latency) at any scale. Monitoring via Amazon CloudWatch or Open Monitoring with Prometheus
5.
Customers running Amazon MSK at scale
Ingesting millions of events/s for interaction analytics
Ingesting 20 TB/day of in-game events for game analytics
Ingesting billions of application events/day
Leveraging as a strategic backbone to exchange messages between processes
Ingesting 40 GB/s of telemetry
Ingesting from hundreds of real estate data sources nationwide
Ingesting 6 billion events/day for event logging pipeline
Leveraging for real-time communication between multiple micro-services
6.
“Retain data for ”
7.
The growing demand for retaining data for longer
Training or scoring machine learning models
Regulatory compliance reasons
Recomputing results after application logic changes or unplanned outages
8.
Approaches for longer retention
No Code (New!): fully managed, built-in transformations. Amazon MSK -> Kinesis Data Firehose -> Amazon S3
Managed Kafka Connect: fully compatible, automatic scaling. Amazon MSK -> Amazon MSK Connect (Sink Connector) -> Amazon S3
Managed Flink Service: process real-time data streams. Amazon MSK -> Amazon MSF -> Amazon S3
9.
“Retain data for ”
10.
Traditional approaches to longer data retention in a Kafka cluster can be
Tightly coupled compute and storage on Kafka
Increased cost
Longer time to failure recovery and rebalance on Kafka
11.
“Retain data for ”
12.
Amazon MSK Tiered Storage
Launched in 2022
Based on KIP-405: Kafka Tiered Storage
Save with low-cost tier
Scale storage without adding brokers
No additional infrastructure to manage
Scale to virtually unlimited storage
Faster partition rebalancing and recovery
13.
AWS’ Contributions to Open Source Kafka
85%+ Code Reviews Integration testing frameworks
Critical bug fixes
Code Contributions
New KIPs for Tiered Storage
Documentation
Customer experience definition
Mentorship to new community members
Testing and reporting issues
14.
Customer Savings with Tiered Storage
56% cost savings: achieved 56% savings in their infrastructure costs (chart: without vs. with Tiered Storage)
3x data retention: reduced $/GB by 27% while increasing their retention by 3x (chart: without vs. with Tiered Storage)
15.
Customers finding value from Tiered Storage
Independent storage scaling from compute
Enables high throughput workloads
Use same app code for historical and real-time data
Longer retention, improved cluster capacity utilization
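To make "independent storage scaling from compute" concrete, here is a rough back-of-the-envelope sketch. The ingest rate and replication factor are hypothetical, the retention values are borrowed from the 2-day local / 5-day overall example used later in this deck, and the model assumes the tiered copy is stored once while broker storage is multiplied by the replication factor. It is not a pricing calculator.

```python
def broker_storage_tb(ingest_tb_per_day, retention_days, replication_factor):
    """Approximate broker disk needed to hold `retention_days` of data."""
    return ingest_tb_per_day * retention_days * replication_factor


INGEST_TB_PER_DAY = 1.0    # hypothetical workload
REPLICATION_FACTOR = 3     # hypothetical topic setting
LOCAL_RETENTION_DAYS = 2   # kept on broker disks
TOTAL_RETENTION_DAYS = 5   # kept overall

# Without tiered storage, brokers hold the full retention window.
without_tiering = broker_storage_tb(
    INGEST_TB_PER_DAY, TOTAL_RETENTION_DAYS, REPLICATION_FACTOR)

# With tiered storage, brokers hold only the local retention window.
with_tiering_local = broker_storage_tb(
    INGEST_TB_PER_DAY, LOCAL_RETENTION_DAYS, REPLICATION_FACTOR)

# Assumption: the tiered copy is stored once, for up to the full window.
with_tiering_remote = INGEST_TB_PER_DAY * TOTAL_RETENTION_DAYS

print(f"Broker storage without tiering: {without_tiering} TB")
print(f"Broker storage with tiering:    {with_tiering_local} TB")
print(f"Tiered storage (upper bound):   {with_tiering_remote} TB")
```

Under these assumptions, lengthening overall retention grows only the low-cost tier, so broker disks (and broker count) stay sized for the local window.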
16.
Virtually unlimited and cost-effective storage tier
Amazon MSK Tiered Storage
17.
Enabling Amazon MSK Tiered Storage
1 Enable tiered storage on the cluster using the MSK console or AWS CLI
2 Enable tiered storage on a Kafka topic
bin/kafka-configs.sh --bootstrap-server $bsrv --alter --entity-type topics --entity-name vidhi --add-config 'remote.storage.enable=true,local.retention.ms=7200000,retention.ms=604800000'
Or
Create a new tiered storage enabled topic
bin/kafka-topics.sh --create --bootstrap-server $bsrv --replication-factor 2 --partitions 1 --topic vidhi --config remote.storage.enable=true --config local.retention.ms=7200000 --config retention.ms=604800000
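The millisecond values in the commands above are easy to mistype, so a small helper can derive them from readable durations. This is a minimal sketch: the function names are hypothetical, and the check that local retention does not exceed overall retention is a sanity check added for illustration. The three config keys are the ones shown on the slide.

```python
from datetime import timedelta


def to_ms(duration: timedelta) -> int:
    """Convert a duration to whole milliseconds."""
    return int(duration.total_seconds() * 1000)


def tiered_topic_configs(local_retention: timedelta,
                         total_retention: timedelta) -> dict:
    """Build the topic configs used to enable tiered storage on a topic."""
    local_ms = to_ms(local_retention)
    total_ms = to_ms(total_retention)
    if local_ms > total_ms:
        raise ValueError("local retention must not exceed overall retention")
    return {
        "remote.storage.enable": "true",
        "local.retention.ms": str(local_ms),
        "retention.ms": str(total_ms),
    }


def as_add_config(configs: dict) -> str:
    """Render configs as a comma-separated key=value string."""
    return ",".join(f"{key}={value}" for key, value in configs.items())


# The slide's values: 2 hours locally, 7 days overall.
configs = tiered_topic_configs(timedelta(hours=2), timedelta(days=7))
print(as_add_config(configs))
```

The printed string can be passed as the value of `--add-config` in the `kafka-configs.sh` command.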
18.
Kafka partitions and segments
Topic: Partition 0, Partition 1, Partition 2
Partition 0: Segment 0 (closed), Segment 1 (active, receives writes)
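The closed/active distinction can be sketched with a toy partition log. This is an illustrative model only: the class names are hypothetical, and segments here roll after a fixed record count, whereas Kafka rolls segments by size or time.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    base_offset: int                       # offset of the first record
    records: list = field(default_factory=list)
    closed: bool = False                   # closed segments take no writes


class PartitionLog:
    """Toy append-only log: writes always go to the last (active) segment."""

    def __init__(self, max_records_per_segment: int):
        self.max_records = max_records_per_segment
        self.segments = [Segment(base_offset=0)]
        self.next_offset = 0

    def append(self, record) -> int:
        active = self.segments[-1]
        if len(active.records) >= self.max_records:
            # Roll: close the full segment and open a new active one.
            active.closed = True
            active = Segment(base_offset=self.next_offset)
            self.segments.append(active)
        active.records.append(record)
        offset = self.next_offset
        self.next_offset += 1
        return offset


log = PartitionLog(max_records_per_segment=3)
for i in range(4):
    log.append(f"event-{i}")

for index, segment in enumerate(log.segments):
    state = "Closed" if segment.closed else "Active"
    print(f"Segment {index}: {state}, base offset {segment.base_offset}")
```

After four writes the log matches the slide: Segment 0 is closed and Segment 1 is the active segment that receives new writes.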
19.
Data lifecycle in a tiered storage enabled topic
Example scenario: A topic with 2 days local retention and 5 days overall retention
Time T0: Before you enable tiered storage
Topic Partition 0
Local log segments: Segment 1, Segment 2, Segment 3 (active)
20.
Data lifecycle in a tiered storage enabled topic
Time T1 (< 2 days): Tiered storage enabled. Segments 1 and 2 copied to tiered storage
Topic Partition 0
Local log segments: Segment 1, Segment 2, Segment 3 (active)
Tiered storage log segments: Segment 1, Segment 2
21.
Data lifecycle in a tiered storage enabled topic
Time T2: Local retention in effect
Topic Partition 0
Local log segments: Segment 1 and Segment 2 deleted; Segment 3 (active)
Tiered storage log segments: Segment 1, Segment 2
22.
Data lifecycle in a tiered storage enabled topic
Time T3: Overall retention in effect
Topic Partition 0
Local log segments: Segment 3 (active)
Tiered storage log segments: Segment 1 and Segment 2 deleted
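The T0 to T3 walkthrough can be summarized as a small rule set. The sketch below is a simplified model, not broker logic: the function name and the sample ages are hypothetical, a closed segment is assumed to be copied as soon as tiering is enabled, and age is measured in days from the segment's newest record.

```python
LOCAL_RETENTION_DAYS = 2   # local retention in the example scenario
TOTAL_RETENTION_DAYS = 5   # overall retention in the example scenario


def segment_locations(age_days, is_active=False, tiering_enabled=True):
    """Return where a segment is stored, given its age in days."""
    if is_active:
        return ["local"]              # the active segment is never tiered
    if age_days >= TOTAL_RETENTION_DAYS:
        return []                     # overall retention: deleted everywhere
    if not tiering_enabled:
        return ["local"]
    if age_days < LOCAL_RETENTION_DAYS:
        return ["local", "tiered"]    # copied, local copy still retained
    return ["tiered"]                 # local retention: local copy deleted


# (label, hypothetical age of Segments 1-2 in days, tiering enabled?)
timeline = [
    ("T0", 0.5, False),
    ("T1", 1.0, True),
    ("T2", 3.0, True),
    ("T3", 6.0, True),
]
for label, age, enabled in timeline:
    closed = segment_locations(age, tiering_enabled=enabled)
    active = segment_locations(0.0, is_active=True, tiering_enabled=enabled)
    print(label, "Segments 1-2:", closed or "deleted", "| Segment 3:", active)
```

Each printed row corresponds to one of the four slides: local only, local plus tiered, tiered only, then deleted.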
23.
How read works with tiered storage
Kafka Client -> Fetch request -> ReplicaManager
ReplicaManager -> ReadFromLocalLog -> Local storage
ReplicaManager -> RemoteLogManager -> Remote storage
Source: KIP-405
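The routing decision in the read path can be sketched as an offset comparison. This is a simplified model of the flow described in KIP-405, not the broker's code: the class, field, and function names are hypothetical, and the broker's actual handling of edge cases is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class PartitionState:
    remote_log_start_offset: int   # earliest offset held in tiered storage
    local_log_start_offset: int    # earliest offset still on broker disk
    log_end_offset: int            # next offset to be written


def route_fetch(state: PartitionState, fetch_offset: int) -> str:
    """Decide which tier would serve a fetch starting at `fetch_offset`."""
    if (fetch_offset < state.remote_log_start_offset
            or fetch_offset > state.log_end_offset):
        return "out_of_range"
    if fetch_offset >= state.local_log_start_offset:
        return "local"    # served from the local log
    return "remote"       # handed off to the remote log reader


state = PartitionState(remote_log_start_offset=0,
                       local_log_start_offset=1000,
                       log_end_offset=1500)

for offset in (1200, 200, 2000):
    print(offset, "->", route_fetch(state, offset))
```

A consumer uses the same fetch API in both cases, which is why the same application code can read historical and real-time data.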
24.
Amazon MSK Tiered Storage
26.
Takeaways
Amazon MSK Tiered Storage
Zero infrastructure management
Consumption based pricing
Fully compatible with Kafka APIs
Virtually unlimited storage
Similar read latencies
27.
“Enable tiered storage on your MSK clusters today to achieve storage elasticity and retain data longer at a lower cost with virtually unlimited capacity”
Documentation | Blog
Vidhi Taneja
vidhit@amazon.com
Todd McGrath
toddmcg@amazon.com
28.
Thank you!
Vidhi Taneja
vidhit@amazon.com
Todd McGrath
toddmcg@amazon.com