© 2023, Amazon Web Services, Inc. or its affiliates.
Optimize costs and scale your streaming applications with virtually unlimited storage
Principal Product Manager
Amazon MSK
AWS
Todd McGrath
Vidhi Taneja
Principal Solutions Architect
Streaming Services
AWS

Agenda
Growing demand of retaining data for longer in Kafka
Achieve cost efficiencies with Amazon MSK Tiered Storage
Amazon MSK Tiered Storage in action
Learnings and conclusions
Amazon Managed Streaming for Apache Kafka
A fully managed service for
and
Amazon MSK
Provisioned | Serverless

Amazon MSK
Multi-AZ replication. Horizontal and vertical scaling. Multiple levels of security: encryption at rest and in transit, VPC isolation, and IAM, Secrets Manager, and TLS-based authentication.
Redshift and OpenSearch streaming ingestion, AWS Lambda as a data consumer, schema management with AWS Glue Schema Registry, stream processing with Amazon Managed Service for Apache Flink (Amazon MSF).
Maximize Apache Kafka performance (high throughput, low latency) at any scale. Monitoring via Amazon CloudWatch or Open Monitoring with Prometheus.

Customers running Amazon MSK at scale
Ingesting millions of events/s for interaction analytics
Ingesting 20 TB/day of in-game events for game analytics
Ingesting billions of application events/day
Leveraging as a strategic backbone to exchange messages between processes
Ingesting 40 GB/s of telemetry
Ingesting from hundreds of real estate data sources nationwide
Ingesting 6 billion events/day for event logging pipeline
Leveraging for real-time communication between multiple micro-services
“Retain data for ”

The growing demand for retaining data for longer
Training or scoring machine learning models
Regulatory compliance reasons
Recompute results for application logic change or unplanned outages

Approaches for longer retention
New! No code, fully managed, built-in transformations: Amazon MSK -> Kinesis Data Firehose -> Amazon S3
Managed Kafka Connect, fully compatible, automatic scaling: Amazon MSK -> Amazon MSK Connect (sink connector) -> Amazon S3
Managed Flink service, process real-time data streams: Amazon MSK -> Amazon MSF -> Amazon S3
“Retain data for ”

Challenges of traditional approaches to longer data retention in a Kafka cluster
Tightly coupled compute and storage on Kafka
Increased cost
Longer time to failure recovery and rebalance on Kafka

“Retain data for ”

Amazon MSK Tiered Storage
Launched in 2022. Based on KIP-405: Kafka Tiered Storage
Save with low-cost tier
Scale storage without adding brokers
No additional infrastructure to manage
Scale to virtually unlimited storage
Faster partition rebalancing and recovery

AWS’ Contributions to Open Source Kafka
85%+ Code Reviews
Integration testing frameworks
Critical bug fixes
Code Contributions
New KIPs for Tiered Storage
Documentation
Customer experience definition
Mentorship to new community members
Testing and reporting issues

Customer Savings with Tiered Storage
Without vs. with Tiered Storage: achieved 56% savings in their infrastructure costs
Without vs. with Tiered Storage: reduced $/GB by 27% while increasing their retention by 3x
Customers finding value from Tiered Storage
Independent storage scaling from compute
Enables high throughput workloads
Use same app code for historical and real-time data
Longer retention, improved cluster capacity utilization

Virtually unlimited and cost-effective storage tier
Amazon MSK Tiered Storage

Enabling Amazon MSK Tiered Storage
1 Enable tiered storage on the cluster using the MSK console or the AWS CLI
2 Enable tiered storage on a Kafka topic
bin/kafka-configs.sh --bootstrap-server $bsrv --alter --entity-type topics --entity-name vidhi --add-config 'remote.storage.enable=true,local.retention.ms=7200000,retention.ms=604800000'
Or
Create a new tiered storage enabled topic
bin/kafka-topics.sh --create --bootstrap-server $bsrv --replication-factor 2 --partitions 1 --topic vidhi --config remote.storage.enable=true --config local.retention.ms=7200000 --config retention.ms=604800000
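
The retention values in the commands above are in milliseconds: 7200000 ms is 2 hours of local retention and 604800000 ms is 7 days of overall retention. The following is a minimal Python sketch for deriving such values and assembling the --add-config string. The helper names to_ms and tiered_topic_config are hypothetical, and the check that local retention does not exceed overall retention is an assumption of this sketch, not something stated on the slide.

```python
from datetime import timedelta


def to_ms(duration: timedelta) -> int:
    """Convert a timedelta to whole milliseconds (the unit of Kafka's *.ms settings)."""
    return duration // timedelta(milliseconds=1)


def tiered_topic_config(local_retention: timedelta, total_retention: timedelta) -> str:
    """Build the comma-separated value passed to kafka-configs.sh --add-config.

    Hypothetical helper: it only formats the three topic settings shown on the slide.
    """
    # Assumption of this sketch: the local tier should not outlive the overall retention.
    if local_retention > total_retention:
        raise ValueError("local retention must not exceed overall retention")
    settings = {
        "remote.storage.enable": "true",
        "local.retention.ms": to_ms(local_retention),
        "retention.ms": to_ms(total_retention),
    }
    # The value is a plain comma-separated list of key=value pairs, without spaces.
    return ",".join(f"{key}={value}" for key, value in settings.items())


if __name__ == "__main__":
    print(tiered_topic_config(timedelta(hours=2), timedelta(days=7)))
```

Computing the values from named durations makes it harder to confuse hours with days when picking the two retention settings.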

Kafka partitions and segments
Topic: Partition 0, Partition 1, Partition 2
Partition 0: Segment 0 (closed), Segment 1 (active, receives writes)

Data lifecycle in a tiered storage enabled topic
Example scenario: A topic with 2 days local retention and 5 days overall retention (Topic Partition 0)
Time T0 - Before you enable tiered storage. Local log segments: Segment 1, Segment 2, Segment 3 (active).
Time T1 (< 2 days) - Tiered storage enabled. Segment 1 and 2 copied to tiered storage. Local log segments: Segment 1, Segment 2, Segment 3 (active). Tiered storage log segments: Segment 1, Segment 2.
Time T2 - Local retention in effect. Segment 1 and 2 deleted from local storage. Local log segments: Segment 3 (active). Tiered storage log segments: Segment 1, Segment 2.
Time T3 - Overall retention in effect. Segment 1 and 2 deleted from tiered storage. Local log segments: Segment 3 (active). Tiered storage log segments: none.
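
The lifecycle steps T0 to T3 can be sketched as a small function that reports where a segment lives based on its age. This is a simplified model, not broker code: the function name segment_location and its return labels are hypothetical, it assumes tiered storage is already enabled and that every closed segment has already been copied to the tiered store, and it ignores size-based retention.

```python
from datetime import timedelta


def segment_location(age: timedelta,
                     local_retention: timedelta,
                     total_retention: timedelta,
                     active: bool = False) -> str:
    """Return where a segment of the given age is stored in this simplified model."""
    if active:
        # The active segment is still being written and stays on local storage only.
        return "local"
    if age >= total_retention:
        # Overall retention in effect (T3): removed from the tiered store as well.
        return "deleted"
    if age >= local_retention:
        # Local retention in effect (T2): only the tiered copy remains.
        return "tiered"
    # Closed and younger than local retention (T1): present in both places.
    return "local+tiered"


if __name__ == "__main__":
    local, total = timedelta(days=2), timedelta(days=5)
    for days in (1, 3, 6):
        print(days, segment_location(timedelta(days=days), local, total))
```

With the example scenario of 2 days local and 5 days overall retention, a closed segment aged 1 day is in both tiers, one aged 3 days is only in tiered storage, and one aged 6 days is gone.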

How read works with tiered storage
Kafka Client -> Fetch request -> ReplicaManager
ReplicaManager -> ReadFromLocalLog -> Local storage
ReplicaManager -> RemoteLogManager -> Remote storage
Source: KIP-405
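
The read path above can be illustrated with a small routing sketch: a fetch is served from local storage when the requested offset is still on the broker's disk, and from remote storage otherwise. This is a loose model of the idea in KIP-405, not Kafka's implementation. The class PartitionLog, the function route_fetch, and the returned labels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PartitionLog:
    """Offsets describing one partition in this simplified model."""
    log_start_offset: int        # earliest offset retained anywhere (tiered or local)
    local_log_start_offset: int  # earliest offset still on the broker's local disk
    log_end_offset: int          # offset that the next written record will get


def route_fetch(log: PartitionLog, fetch_offset: int) -> str:
    """Decide which storage tier would serve a fetch starting at fetch_offset."""
    if fetch_offset < log.log_start_offset or fetch_offset > log.log_end_offset:
        raise ValueError(f"offset {fetch_offset} is out of range")
    if fetch_offset >= log.local_log_start_offset:
        # Recent data: read from the local log.
        return "local"
    # Older data that has aged out of the local tier: read from remote storage.
    return "remote"


if __name__ == "__main__":
    log = PartitionLog(log_start_offset=0, local_log_start_offset=1000, log_end_offset=1500)
    print(route_fetch(log, 1200))
    print(route_fetch(log, 10))
```

Because the routing happens on the broker, the client issues the same fetch request in both cases, which is why the same application code can read historical and real-time data.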
Amazon MSK Tiered Storage
Takeaways
Zero infrastructure management
Consumption based pricing
Fully compatible with Kafka APIs
Virtually unlimited storage
Similar read latencies
Amazon MSK Tiered Storage
“Enable tiered storage on your MSK clusters today to achieve storage elasticity and retain data longer at a lower cost with virtually unlimited capacity”
Vidhi Taneja
vidhit@amazon.com
Todd McGrath
toddmcg@amazon.com
Thank you!
