© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. SUMMIT
Ditching the overhead: Moving Apache Kafka workloads into Amazon MSK
Damian Wylie
Principal product manager
AWS
ADB301
Agenda
A brief intro to Apache Kafka
The challenges of running Apache Kafka in production
How Amazon MSK addresses these challenges so that you don’t have to
Announcements
Replicating or migrating your workloads using MirrorMaker
Related breakouts
ADB206: A deep dive into Amazon MSK
Damian Wylie and Vijay Kistampalli
Chalk Talk, W184a @ 5:00 p.m. on Thursday, May 30
Apache Kafka use cases
Real-time web and log analytics
Messaging
Transaction and event sourcing
Decoupled microservices
Streaming ETL
Apache Kafka anatomy 101
[Diagram: producers write to an Apache Kafka cluster of three brokers, coordinated by Apache ZooKeeper; data consumers read from the brokers.]
Apache Kafka anatomy: Writes to partitions
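[Diagram: a topic with 3 partitions (Partition 1, Partition 2, Partition 3). Writes from producers are appended at the newest end of each partition; within a partition, records are numbered by offset, starting at 0 for the oldest data.]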
Apache Kafka anatomy: Reads from partitions
[Diagram: the same topic with 3 partitions, read by a consumer group of three consumers. A marker in each partition shows the next consumer offset, which advances from the oldest toward the newest data.]
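To make the write and read paths concrete, here is a toy bash model (an illustration only, not Kafka client code): each partition is an append-only array, a record's offset is its index, and the consumer group keeps one next consumer offset per partition. The helpers `produce` and `consume` and the `next_offset` array are hypothetical names. It needs bash 4.3 or later for namerefs.

```shell
#!/usr/bin/env bash
# Toy in-memory model of one topic with 3 partitions (illustration only, not Kafka code).
# Each partition is an append-only array; a record's offset is its array index.
# Needs bash 4.3+ for namerefs (local -n).
set -eo pipefail

partition_0=() partition_1=() partition_2=()
next_offset=(0 0 0)   # the consumer group's next consumer offset, per partition

# Producer: append record $2 at the newest end of partition $1.
produce() {
  local -n log="partition_$1"
  log+=("$2")
}

# Consumer: read partition $1 from the group's next offset up to the newest
# record, then move the next offset past what was read (a "commit").
consume() {
  local -n log="partition_$1"
  local p=$1 off
  for (( off = ${next_offset[p]}; off < ${#log[@]}; off++ )); do
    echo "partition=$p offset=$off value=${log[off]}"
  done
  next_offset[p]=${#log[@]}
}

produce 0 "a"; produce 0 "b"; produce 2 "c"
consume 0   # reads offsets 0 and 1 of partition 0
produce 0 "d"
consume 0   # reads only offset 2
```

The second `consume 0` prints only the record at offset 2, because the group's next consumer offset already sits past the records it read earlier; offsets are per partition, so partition 2 is unaffected.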
Challenges operating Apache Kafka
Difficult to set up
Hard to achieve high availability
Difficult to scale
AWS integrations require custom development
No console, no visible metrics
$f(\text{Kafka usage}) = \sum_{n=1}^{\infty} \text{SRE}$
What Amazon MSK does for you
• Makes Apache Kafka more accessible to your organization
• Drives best practices through design, defaults, and automation
• Allows developers to focus more on application development and less on
infrastructure management
• Amazon MSK is committed to improving open-source Apache Kafka
$f(\text{Kafka usage}) = \sum_{n=1}^{\infty} \text{Streaming Apps}$
Getting started with Amazon MSK is easy
• Fully compatible with Apache Kafka v1.1.1 and v2.1.0
• AWS Management Console and AWS API for provisioning
• Clusters are set up automatically in minutes
• Provision Apache Kafka brokers and storage
• Create and tear down clusters on demand
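Provisioning through the AWS API can be scripted with the AWS CLI command `aws kafka create-cluster`. A minimal sketch, assuming placeholder subnet and security group IDs, a hypothetical cluster name `demo-cluster`, and 100 GiB of storage per broker; by default it only prints the command (set `DRY_RUN=0` to call AWS). The number of brokers must be a multiple of the number of client subnets, here three.

```shell
#!/usr/bin/env bash
# Sketch: provision an MSK cluster with "aws kafka create-cluster".
# Cluster name, subnet IDs, and security group ID are placeholders.
# DRY_RUN=1 (the default here) prints the command instead of calling AWS.
set -eo pipefail

workdir=$(mktemp -d)
cat > "$workdir/brokernodegroupinfo.json" <<'EOF'
{
  "InstanceType": "kafka.m5.large",
  "ClientSubnets": ["subnet-aaaa1111", "subnet-bbbb2222", "subnet-cccc3333"],
  "SecurityGroups": ["sg-0123456789abcdef0"],
  "StorageInfo": { "EbsStorageInfo": { "VolumeSize": 100 } }
}
EOF

cmd=(aws kafka create-cluster
  --cluster-name "demo-cluster"
  --kafka-version "2.1.0"
  --number-of-broker-nodes 3
  --broker-node-group-info "file://$workdir/brokernodegroupinfo.json")

if [[ "${DRY_RUN:-1}" == 1 ]]; then
  printf '%q ' "${cmd[@]}"; echo   # show what would run
else
  "${cmd[@]}"
fi
```

Tearing the cluster down again is the matching `aws kafka delete-cluster --cluster-arn <cluster ARN>` call.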
Where’s Apache ZooKeeper?
Apache ZooKeeper is under the hood
It is highly available, fully managed,
automatically provisioned, and included
with each cluster at no additional cost
How connectivity works
How pricing works
• On-demand, hourly pricing is prorated to the second
• Broker and storage pricing
• Broker pricing starts with kafka.m5.large at $0.21 per hour
• Storage pricing is $0.10 per GB-month
• Data transfer for replication within the cluster is included at no additional cost, as are the ZooKeeper nodes
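As a worked example of the pricing model, consider three kafka.m5.large brokers with 1,000 GB of storage each for one month, using the 2019 list prices on this slide. Assumptions: a 30-day month (720 hours), storage provisioned per broker for the whole month, and no charges beyond brokers and storage; current prices vary by Region. Brokers cost 3 × $0.21 × 720 = $453.60 and storage costs 3 × 1,000 × $0.10 = $300.00, for $753.60 in total. The hypothetical `estimate_monthly_cost` helper below does the same arithmetic.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope monthly estimate using the 2019 prices quoted on this slide:
# kafka.m5.large at $0.21/hour and storage at $0.10/GB-month.
# Assumes a 720-hour month by default and storage provisioned per broker.
set -eo pipefail

# estimate_monthly_cost BROKERS GB_PER_BROKER [HOURS]
estimate_monthly_cost() {
  local brokers=$1 gb_per_broker=$2 hours=${3:-720}
  awk -v b="$brokers" -v gb="$gb_per_broker" -v h="$hours" 'BEGIN {
    broker_cost  = b * 0.21 * h     # broker-hours x hourly rate
    storage_cost = b * gb * 0.10    # GB-months x monthly rate
    printf "%.2f\n", broker_cost + storage_cost
  }'
}

estimate_monthly_cost 3 1000   # prints 753.60
```

Because billing is prorated to the second, passing fewer hours (for a cluster created mid-month) scales the broker portion linearly.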
Launching now
Amazon MSK customer references
“Reduced maintenance overhead”
“Made it easy to set up, maintain, and scale Kafka clusters”
“Accelerates time to market”
“Ensures data durability, cluster availability, and scalability”
“Significantly increase[s] the efficiency of our teams and
reduce[s] time spent maintaining our clusters”
New security features
Encryption in transit via TLS
Configurable separately for in-cluster (broker-to-broker) and client-to-broker traffic
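Encryption in transit is chosen at cluster creation through the `EncryptionInTransit` settings passed to `aws kafka create-cluster --encryption-info`; `ClientBroker` accepts `TLS`, `TLS_PLAINTEXT`, or `PLAINTEXT`. A minimal sketch that writes that JSON and a matching client configuration, assuming a placeholder trust store path. With TLS, clients should connect using the `BootstrapBrokerStringTls` value returned by `aws kafka get-bootstrap-brokers`.

```shell
#!/usr/bin/env bash
# Sketch: request TLS for both in-cluster and client-broker traffic at cluster
# creation, and write a matching client config. File locations are placeholders.
set -eo pipefail

workdir=$(mktemp -d)

# Used as: aws kafka create-cluster ... --encryption-info file://<this file>
cat > "$workdir/encryption-info.json" <<'EOF'
{
  "EncryptionInTransit": {
    "InCluster": true,
    "ClientBroker": "TLS"
  }
}
EOF

# Producer/consumer settings for a cluster that only accepts TLS clients.
cat > "$workdir/client-tls.properties" <<'EOF'
security.protocol=SSL
ssl.truststore.location=/tmp/kafka.client.truststore.jks
EOF

echo "wrote $workdir/encryption-info.json and $workdir/client-tls.properties"
```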
New security features
Mutual TLS authentication
Certificate-based authentication using AWS Certificate Manager Private Certificate Authority (AWS PCA)
1. Create a private CA with a root certificate in AWS Certificate Manager (ACM)
2. Create an Amazon MSK cluster with client authentication enabled, selecting one or more private CAs
3. Configure producers and consumers with a trust store and a client certificate issued by the private CA
4. Configure Apache Kafka ACLs using the certificate's distinguished name (DN) as the principal
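The four steps above map to a few artifacts: the `--client-authentication` JSON for `aws kafka create-cluster`, a client properties file with a keystore and trust store, and a `kafka-acls.sh` rule keyed on the certificate's DN. A sketch, assuming a placeholder CA ARN, keystore paths and passwords, ZooKeeper address, topic `orders`, and a client certificate whose subject is `CN=orders-consumer`; the ACL command is printed rather than run. A consumer usually also needs Read on its consumer group (`--group`).

```shell
#!/usr/bin/env bash
# Sketch of the mutual-TLS steps above. The CA ARN, keystore paths and
# passwords, ZooKeeper address, principal DN, and topic are placeholders.
set -eo pipefail

workdir=$(mktemp -d)

# Step 2: used as aws kafka create-cluster ... --client-authentication file://<this file>
cat > "$workdir/client-authentication.json" <<'EOF'
{
  "Tls": {
    "CertificateAuthorityArnList": [
      "arn:aws:acm-pca:us-east-1:111122223333:certificate-authority/EXAMPLE"
    ]
  }
}
EOF

# Step 3: the keystore holds the client certificate issued by the private CA.
cat > "$workdir/client-mtls.properties" <<'EOF'
security.protocol=SSL
ssl.truststore.location=/tmp/kafka.client.truststore.jks
ssl.keystore.location=/tmp/kafka.client.keystore.jks
ssl.keystore.password=changeit
ssl.key.password=changeit
EOF

# Step 4: allow the certificate's DN to read one topic (printed, not executed).
acl_cmd=(bin/kafka-acls.sh
  --authorizer-properties "zookeeper.connect=z-1.example:2181"
  --add --allow-principal "User:CN=orders-consumer"
  --operation Read --topic orders)
printf '%q ' "${acl_cmd[@]}"; echo
```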
New compliance features
HIPAA eligible
AWS CloudTrail for API auditing
New ease of use features
Custom configurations
For new clusters; support for updating existing clusters coming soon
Cluster-wide storage scaling
Cluster tagging and tag-based IAM policies
New ease of use features
Custom configurations (CLI only)
Console support coming in the next few weeks
auto.create.topics.enable
delete.topic.enable
group.initial.rebalance.delay.ms
group.max.session.timeout.ms
group.min.session.timeout.ms
log.cleaner.delete.retention.ms
log.cleaner.min.cleanable.ratio
log.flush.interval.messages
log.flush.interval.ms
log.retention.bytes
log.retention.hours
log.retention.minutes
log.retention.ms
log.roll.ms
log.segment.bytes
max.incremental.fetch.session.cache.slots
message.max.bytes
min.insync.replicas
num.partitions
offsets.retention.minutes
transaction.max.timeout.ms
unclean.leader.election.enable
zookeeper.connection.timeout.ms
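Since only the properties listed above are accepted, it can help to check a configuration file before uploading it with `aws kafka create-configuration`. A bash sketch, assuming a placeholder file and a hypothetical configuration name `demo-config`; `check_config` and `is_supported` are hypothetical helpers, and the allowlist is copied from this slide, so it may be out of date. The upload command is printed, not run.

```shell
#!/usr/bin/env bash
# Sketch: check a custom configuration against the properties listed on this
# slide before uploading it. File contents and configuration name are placeholders.
set -eo pipefail

supported=(
  auto.create.topics.enable delete.topic.enable group.initial.rebalance.delay.ms
  group.max.session.timeout.ms group.min.session.timeout.ms
  log.cleaner.delete.retention.ms log.cleaner.min.cleanable.ratio
  log.flush.interval.messages log.flush.interval.ms log.retention.bytes
  log.retention.hours log.retention.minutes log.retention.ms log.roll.ms
  log.segment.bytes max.incremental.fetch.session.cache.slots message.max.bytes
  min.insync.replicas num.partitions offsets.retention.minutes
  transaction.max.timeout.ms unclean.leader.election.enable
  zookeeper.connection.timeout.ms
)

is_supported() {
  local key=$1 k
  for k in "${supported[@]}"; do
    [[ $k == "$key" ]] && return 0
  done
  return 1
}

# check_config FILE: report each unsupported key on stderr; fail if any were found.
check_config() {
  local line key bad=0
  while IFS= read -r line || [[ -n $line ]]; do
    [[ -z ${line//[[:space:]]/} || $line == \#* ]] && continue   # skip blanks/comments
    key=${line%%=*}
    key=${key//[[:space:]]/}
    if ! is_supported "$key"; then
      echo "unsupported property: $key" >&2
      bad=1
    fi
  done < "$1"
  return "$bad"
}

workdir=$(mktemp -d)
cat > "$workdir/cluster-config.properties" <<'EOF'
auto.create.topics.enable=false
log.retention.hours=168
min.insync.replicas=2
EOF

if check_config "$workdir/cluster-config.properties"; then
  echo "aws kafka create-configuration --name demo-config --kafka-versions 2.1.0" \
       "--server-properties fileb://$workdir/cluster-config.properties"
fi
```

The resulting configuration ARN and revision are then referenced from `aws kafka create-cluster --configuration-info` when the cluster is created.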
How performance meets cost
How to ditch the overhead
MirrorMaker v1: How it works
MirrorMaker command
# NUM_STREAMS (consumer threads per process) and TOPIC_REGEX (Java regex of topics to mirror) are placeholders
bin/kafka-mirror-maker.sh \
  --consumer.config consumer.properties \
  --producer.config producer.properties \
  --num.streams "$NUM_STREAMS" \
  --whitelist "$TOPIC_REGEX"
MirrorMaker v1 best practices
Run the tool close to the destination, in this case in the VPC with your MSK cluster; if encryption in transit is required, run it close to the source instead
For no data loss and preserved ordering
For the consumer, set enable.auto.commit=false
For the producer
max.in.flight.requests.per.connection=1
retries=2147483647 (Integer.MAX_VALUE)
acks=all
max.block.ms=9223372036854775807 (Long.MAX_VALUE)
For MirrorMaker
set --abort.on.send.failure true
For high producer throughput
max.in.flight.requests.per.connection > 1 (warning: ordering is no longer guaranteed when retries occur)
Enable compression (compression.type=gzip)
Buffer messages and fill message batches: tune buffer.memory, batch.size, linger.ms
Tune socket buffers: receive.buffer.bytes, send.buffer.bytes
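The "no data loss and order" settings can be captured in the two files passed to MirrorMaker. A minimal sketch, assuming placeholder broker addresses, a hypothetical `group.id` of `mirrormaker-demo`, and an example topic regex `orders.*`; it writes the files to a temporary directory and prints the launch command rather than running it.

```shell
#!/usr/bin/env bash
# Sketch: MirrorMaker v1 configs for "no data loss and order", plus the launch
# command. Broker addresses, group.id, and the topic regex are placeholders.
set -eo pipefail

workdir=$(mktemp -d)

# Consumer side reads from the source cluster. Auto-commit is off, so offsets
# are committed by MirrorMaker itself after it flushes its producer.
cat > "$workdir/consumer.properties" <<'EOF'
bootstrap.servers=source-broker-1:9092
group.id=mirrormaker-demo
enable.auto.commit=false
EOF

# Producer side writes to the destination (MSK) cluster.
cat > "$workdir/producer.properties" <<'EOF'
bootstrap.servers=b-1.example.kafka.us-east-1.amazonaws.com:9092
acks=all
retries=2147483647
max.in.flight.requests.per.connection=1
max.block.ms=9223372036854775807
EOF

echo "bin/kafka-mirror-maker.sh --consumer.config $workdir/consumer.properties" \
     "--producer.config $workdir/producer.properties" \
     "--num.streams 3 --whitelist 'orders.*' --abort.on.send.failure true"
```

For the high-throughput variant, the same producer file would instead raise `max.in.flight.requests.per.connection` and add `compression.type=gzip`, accepting the ordering trade-off noted above.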
MirrorMaker v1 best practices
For high consumer throughput
Increase the number of consumer threads per MirrorMaker process (--num.streams)
Increase the number of MirrorMaker processes across machines before increasing threads, to allow for high availability
Add MirrorMaker processes first on the same machine and then on different machines (all with the same group.id)
Isolate topics that have very high throughput and use separate MirrorMaker instances for them
For management and configuration
Use AWS CloudFormation and configuration management tools like Chef and Ansible
Use Amazon EFS file system mounts to keep all configuration files accessible from all Amazon EC2 instances
Use containers for easy scaling and management of MirrorMaker instances
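These scaling steps interact with partition count: MirrorMaker processes that share a group.id form one consumer group, and each partition is consumed by at most one thread in that group, so threads beyond the total partition count of the mirrored topics sit idle. A small bash sketch of that arithmetic; the `thread_usage` helper is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: how many MirrorMaker consumer threads actually receive partitions.
# All processes share one group.id, so threads = processes x num.streams,
# and each partition is assigned to at most one thread in the group.
set -eo pipefail

# thread_usage PARTITIONS PROCESSES NUM_STREAMS -> "active=N idle=M"
thread_usage() {
  local partitions=$1 processes=$2 streams=$3
  local threads=$(( processes * streams ))
  local active=$(( threads < partitions ? threads : partitions ))
  echo "active=$active idle=$(( threads - active ))"
}

thread_usage 12 3 6   # prints active=12 idle=6
thread_usage 12 2 4   # prints active=8 idle=0
```

Idle threads are not wasted entirely: they take over partitions if another process in the group fails, which is the high-availability argument for spreading processes across machines.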
MirrorMaker v1 limitations
Does not replicate topic configurations
It creates topics in the destination cluster only if auto.create.topics.enable=true there, and those topics get the destination cluster's default configuration
Can cause configuration divergence
With auto.create.topics.enable=false, topics have to be created manually in the destination
Topic configuration changes have to be replicated manually
Message offsets might not match between source and destination clusters
To avoid duplicates, shut down producers to the source cluster, confirm that consumers have consumed all messages, let MirrorMaker replicate all messages to the destination cluster, and then start consumers on the destination cluster with auto.offset.reset=latest
Failover and disaster recovery scenarios are not supported
Whitelists and blacklists only support Java-style regular expressions and cannot be updated dynamically
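Because MirrorMaker v1 does not copy topic configuration, one way to avoid divergence is to pre-create destination topics explicitly. The bash helper below only builds `kafka-topics.sh --create` command lines; `create_topic_cmd`, the ZooKeeper address, the topic, and its settings are placeholders. It uses `--zookeeper`, which `kafka-topics.sh` expects in the Kafka 1.1.1 and 2.1.0 versions MSK supports; source values can be read with `kafka-configs.sh --zookeeper <source ZooKeeper> --describe --entity-type topics --entity-name <topic>`.

```shell
#!/usr/bin/env bash
# Sketch: build commands that pre-create destination topics with explicit
# settings, since MirrorMaker v1 does not copy topic configuration.
# The ZooKeeper string, topic, and settings are placeholders.
set -eo pipefail

# create_topic_cmd ZK TOPIC PARTITIONS REPLICATION_FACTOR [key=value ...]
create_topic_cmd() {
  local zk=$1 topic=$2 partitions=$3 rf=$4
  shift 4
  local cmd="bin/kafka-topics.sh --zookeeper $zk --create --topic $topic"
  cmd+=" --partitions $partitions --replication-factor $rf"
  local cfg
  for cfg in "$@"; do
    cmd+=" --config $cfg"   # one --config flag per topic-level override
  done
  echo "$cmd"
}

create_topic_cmd "z-1.example:2181" orders 6 3 retention.ms=604800000 cleanup.policy=delete
```

Printing instead of executing makes it easy to review the generated commands, or to feed them to a shell once the values match the source cluster.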
MirrorMaker v1 limitations
• Minimal operational and management support
• Only supports at-least-once guarantees; does not support an idempotent
producer or transactions
• Minimal metrics support
• Any configuration change requires bouncing (restarting) the MirrorMaker cluster
• Rebalancing causes latency spikes, which may trigger further rebalances
MirrorMaker v2
Addresses the limitations of v1
• Leverages the Kafka Connect framework and ecosystem
• Detects new topics and partitions
• Automatically syncs topic configuration between clusters
• Supports active/active cluster pairs, as well as any number of active clusters
• Provides new metrics, including end-to-end replication latency across multiple data centers or clusters
• Emits offsets required to migrate consumers between clusters and tooling for offset translation
• Supports a high-level configuration file for specifying multiple clusters and replication flows in one
place, compared to low-level producer/consumer properties for each MM1 process
• https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0
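KIP-382 describes MirrorMaker 2's single high-level configuration file. MM2 shipped in Apache Kafka 2.4.0, after this talk, and is launched with `bin/connect-mirror-maker.sh`. A minimal sketch of a one-way flow into MSK, assuming hypothetical cluster aliases `source` and `msk` and placeholder broker addresses. With the default replication policy, mirrored topics appear in the destination prefixed with the source alias (for example `source.orders`).

```shell
#!/usr/bin/env bash
# Sketch of a MirrorMaker 2 configuration (KIP-382; shipped in Apache Kafka 2.4.0).
# Cluster aliases and broker addresses are placeholders.
set -eo pipefail

workdir=$(mktemp -d)
cat > "$workdir/mm2.properties" <<'EOF'
# Cluster aliases, each with its own connection settings
clusters = source, msk
source.bootstrap.servers = source-broker-1:9092
msk.bootstrap.servers = b-1.example.kafka.us-east-1.amazonaws.com:9092

# One replication flow: source -> msk, all topics
source->msk.enabled = true
source->msk.topics = .*

# Keep topic configuration in sync and emit consumer offset checkpoints
sync.topic.configs.enabled = true
emit.checkpoints.enabled = true
EOF

echo "bin/connect-mirror-maker.sh $workdir/mm2.properties"
```

Compared with MirrorMaker v1, the flow direction, topic selection, and both clusters' connection settings live in one file instead of separate consumer and producer property files per process.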
Migration and replication guide
1. Create Amazon MSK destination cluster
2. Start MirrorMaker from an Amazon EC2 instance in the same Amazon VPC as the destination cluster
3. Inspect MirrorMaker lag
4. If you are migrating, once MirrorMaker has caught up, redirect producers and consumers to the new cluster using the Amazon MSK cluster's bootstrap broker value
5. Shut down MirrorMaker
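For step 3, one way to inspect lag is to describe MirrorMaker's consumer group on the source cluster with `kafka-consumer-groups.sh --describe` and add up the LAG column; when the total reaches 0 with source producers stopped, MirrorMaker has caught up. A minimal sketch; `total_lag` is a hypothetical helper and the sample output is made up for illustration. For step 4, `aws kafka get-bootstrap-brokers --cluster-arn <cluster ARN>` returns the cluster's bootstrap broker string.

```shell
#!/usr/bin/env bash
# Sketch for step 3: total MirrorMaker consumer lag, computed from the output of
# "bin/kafka-consumer-groups.sh --bootstrap-server <source brokers> --describe --group <MirrorMaker group.id>".
# The LAG column is located by its header because column order differs between
# Kafka versions; rows whose LAG is not a number (for example "-") are skipped.
set -eo pipefail

total_lag() {
  awk '
    !col { for (i = 1; i <= NF; i++) if ($i == "LAG") col = i; if (col) next }
    col && $col ~ /^[0-9]+$/ { sum += $col }
    END { print sum + 0 }
  '
}

# Example with captured output (values made up for illustration):
sample='TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
orders 0 100 120 20 consumer-1 /10.0.0.1 mm-0
orders 1 50 50 0 consumer-2 /10.0.0.2 mm-1'
printf '%s\n' "$sample" | total_lag   # prints 20
```

In practice the live command output would be piped in directly, and the check repeated until it reports 0.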
Demo
Thank you!
Damian Wylie
wylied@amazon.com
LinkedIn: wyliedamian
Twitter: @DamianWylie

Amazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Amazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
Amazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Amazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
Amazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
Amazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
Amazon Web Services
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
Amazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
Amazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Ditching the overhead - Moving Apache Kafka workloads into Amazon MSK - ADB301 - Chicago AWS Summit

  • 1. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS Summit. Ditching the overhead: Moving Apache Kafka workloads into Amazon MSK. Damian Wylie, Principal Product Manager, AWS. ABD301
  • 2. Agenda: a brief intro to Apache Kafka; the challenges of running Apache Kafka in production; how Amazon MSK addresses these challenges so that you don't have to; announcements; replicating or migrating your workloads using MirrorMaker.
  • 3. Related breakouts: ADB206: A deep dive into Amazon MSK (Damian Wylie and Vijay Kistampalli), Chalk Talk, W184a @ 5:00 p.m. on Thursday, May 30.
  • 5. Apache Kafka use cases: real-time web and log analytics; messaging; transaction and event sourcing; decoupled microservices; streaming ETL.
  • 6. Apache Kafka anatomy 101 (diagram: producers write to a cluster of three brokers coordinated by Apache ZooKeeper; a data consumer reads from the cluster)
  • 7. Apache Kafka anatomy: Writes to partitions (diagram: a topic with 3 partitions, each an ordered sequence of offsets running from oldest data to newest data; writes from producers are appended at the newest end)
  • 8. Apache Kafka anatomy: Reads from partitions (diagram: the same topic with 3 partitions read by a consumer group of three consumers; a marker on each partition shows the next consumer offset)
  • 9. Challenges operating Apache Kafka: difficult to set up; hard to achieve high availability; difficult to scale; AWS integrations = development work; no console, no visible metrics. $f(\text{kafka usage}) = \sum_{n=1}^{\infty} \text{SRE}$
  • 11. What Amazon MSK does for you • Makes Apache Kafka more accessible to your organization • Drives best practices through design, defaults, and automation • Allows developers to focus more on application development and less on infrastructure management • Amazon MSK is committed to improving open-source Apache Kafka. $f(\text{kafka usage}) = \sum_{n=1}^{\infty} \text{Streaming Apps}$
  • 12. Getting started with Amazon MSK is easy • Fully compatible with Apache Kafka v1.1.1 and v2.1.0 • AWS Management Console and AWS API for provisioning • Clusters are set up automatically in minutes • Provision Apache Kafka brokers and storage • Create and tear down clusters on demand
  • 13. Where's Apache ZooKeeper? Apache ZooKeeper is under the hood. It is highly available, fully managed, automatically provisioned, and included with each cluster at no additional cost.
  • 14. How connectivity works
  • 15. How pricing works • On-demand, hourly pricing is prorated to the second • Broker and storage pricing • Broker pricing starts with kafka.m5.large at $0.21 per hour • Storage pricing is $0.10 per GB-month • Data transfer from replication within the cluster and ZooKeeper nodes are included at no additional cost
  • 16. Launching now
  • 17. New! New! New!
  • 18. Amazon MSK customer references: “Reduced maintenance overhead”; “Made it easy to set up, maintain, and scale Kafka clusters”; “Accelerates time to market”; “Ensures data durability, cluster availability, and scalability”; “Significantly increase[s] the efficiency of our teams and reduce[s] time spent maintaining our clusters”
  • 19. New security features: Encryption in transit via TLS, both between brokers within the cluster (inCluster) and between clients and brokers (clientBroker)
  • 20. New security features: Mutual TLS authentication. Certificate-based authentication using AWS Certificate Manager Private Certificate Authority (AWS PCA): 1. Create a PCA with a root certificate within AWS ACM. 2. Create an Amazon MSK cluster with authentication enabled, selecting the PCAs. 3. Configure consumers and producers with a certificate issued by the root CA and with a trust store. 4. Apache Kafka ACLs can now be configured using the certificate's distinguished name (DN) as the principal user.
  • 21. New compliance features: HIPAA eligible; AWS CloudTrail for API auditing
  • 22. New ease of use features: custom configurations (for new clusters; support for updating existing clusters coming soon); cluster-wide storage scaling; cluster tagging and tag-based IAM policies
  • 23. New ease of use features: custom configurations (CLI only; console support coming in the next few weeks). Configurable properties: auto.create.topics.enable, delete.topic.enable, group.initial.rebalance.delay.ms, group.max.session.timeout.ms, group.min.session.timeout.ms, log.cleaner.delete.retention.ms, log.cleaner.min.cleanable.ratio, log.flush.interval.messages, log.flush.interval.ms, log.retention.bytes, log.retention.hours, log.retention.minutes, log.retention.ms, log.roll.ms, log.segment.bytes, max.incremental.fetch.session.cache.slots, message.max.bytes, min.insync.replicas, num.partitions, offsets.retention.minutes, transaction.max.timeout.ms, unclean.leader.election.enable, zookeeper.connection.timeout.ms
  • 24. How performance meets cost
  • 25. How to ditch the overhead
  • 26. MirrorMaker v1: How it works
  • 27. MirrorMaker command: bin/kafka-mirror-maker.sh --consumer.config consumer.properties --producer.config producer.properties --num.streams <number of consumer threads> --num.producers <number of producers> --whitelist '<regex topics>' [--blacklist '<regex topics>']
  • 28. MirrorMaker v1 best practices. Run the tool in the destination (in this case, in the VPC with your MSK cluster); if encryption in transit is required, run it in the source. For no data loss and preserved ordering: for the consumer, set enable.auto.commit=false; for the producer, set max.in.flight.requests.per.connection=1, retries=Integer.MAX_VALUE, acks=all, and max.block.ms=Long.MAX_VALUE; for MirrorMaker, set --abort.on.send.failure (abortOnSendFail). For high producer throughput: set max.in.flight.requests.per.connection above 1 (warning: ordering is no longer guaranteed); enable compression (compression.type=gzip); buffer messages and fill message batches by tuning buffer.memory, batch.size, and linger.ms; tune socket buffers with receive.buffer.bytes and send.buffer.bytes.
  • 29. MirrorMaker v1 best practices. For high consumer throughput: increase the number of threads/consumers per MirrorMaker process (--num.streams); increase the number of MirrorMaker processes across machines before increasing threads, to allow for high availability; increase the number of MirrorMaker processes first on the same machine and then on different machines (all with the same group.id); isolate topics that have very high throughput and mirror them with separate MirrorMaker instances. For management and configuration: use AWS CloudFormation and configuration management tools like Chef and Ansible; use Amazon EFS file system mounts to keep all configuration files accessible from all Amazon EC2 instances; use containers for easy scaling and management of MirrorMaker instances.
  • 30. MirrorMaker v1 limitations. Does not replicate topic configurations: it replicates topics if auto.create.topics.enable = true in the destination cluster, but those topics are created with the destination cluster's default configuration, which can cause configuration divergence. With auto.create.topics.enable = false, topics have to be created manually in the destination, and topic configuration changes have to be replicated manually. Message offsets might not match between source and destination clusters; to avoid duplicates, shut down producers to the source cluster, confirm that consumers have consumed all messages, let MirrorMaker replicate all messages to the destination cluster, and then start consumers on the destination cluster with auto.offset.reset = latest. Failover and disaster recovery scenarios are not supported. Whitelists and blacklists only support Java-style regular expressions and cannot be updated dynamically.
  • 31. MirrorMaker v1 limitations • Minimal operational and management support • Only supports at-least-once guarantees; does not support an idempotent producer or transactions • Minimal metrics support • Any configuration change means the MirrorMaker cluster must be bounced (restarted) • Rebalancing causes latency spikes, which may trigger further rebalances
  • 32. MirrorMaker v2 addresses the limitations of v1 • Leverages the Kafka Connect framework and ecosystem • Detects new topics and partitions • Automatically syncs topic configuration between clusters • Supports active/active cluster pairs, as well as any number of active clusters • Provides new metrics, including end-to-end replication latency across multiple data centers or clusters • Emits the offsets required to migrate consumers between clusters, plus tooling for offset translation • Supports a high-level configuration file for specifying multiple clusters and replication flows in one place, compared to low-level producer/consumer properties for each MM1 process • https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0
  • 33. Migration and replication guide: 1. Create the Amazon MSK destination cluster. 2. Start MirrorMaker from an Amazon EC2 instance within the same Amazon VPC as the destination cluster. 3. Inspect MirrorMaker lag. 4. If you are migrating, once MirrorMaker has caught up, redirect producers and consumers to the new cluster using the Amazon MSK cluster's bootstrap broker value. 5. Shut down MirrorMaker.
  • 34. Demo
  • 35. Thank you! Damian Wylie, wylied@amazon.com, LinkedIn: wyliedamian, Twitter: @DamianWylie
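Slides 7 and 8 show producers appending to partitions and a consumer group tracking its next offset per partition. The sketch below is a toy in-memory model of that idea, not a Kafka client. The CRC32 key hashing and the round-robin assignment are illustrative assumptions: Kafka's Java producer hashes keys with murmur2, and its default assignor in the 2.x releases is the range assignor.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from zlib import crc32


@dataclass
class Topic:
    """Toy topic: each partition is an append-only list whose index is the offset."""

    num_partitions: int
    partitions: list[list[str]] = field(init=False)

    def __post_init__(self) -> None:
        self.partitions = [[] for _ in range(self.num_partitions)]

    def produce(self, key: str, value: str) -> tuple[int, int]:
        """Append a record to the partition picked by hashing its key; return (partition, offset)."""
        # Assumption: CRC32 stands in for the murmur2 hash used by Kafka's Java producer.
        partition = crc32(key.encode("utf-8")) % self.num_partitions
        self.partitions[partition].append(value)
        return partition, len(self.partitions[partition]) - 1


def assign_partitions(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Round-robin assignment: each partition is owned by exactly one consumer in the group."""
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for partition in range(num_partitions):
        assignment[consumers[partition % len(consumers)]].append(partition)
    return assignment


def poll(topic: Topic, partition: int, next_offset: dict[int, int], max_records: int = 10) -> list[str]:
    """Read from the group's next offset for one partition, then advance that offset."""
    start = next_offset.get(partition, 0)
    records = topic.partitions[partition][start:start + max_records]
    next_offset[partition] = start + len(records)
    return records
```

Because the key decides the partition, records with the same key keep their relative order, and a fourth consumer added to a group reading a three-partition topic is assigned nothing to read.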
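Slide 12 mentions provisioning through the AWS API, and slide 19 the inCluster and clientBroker encryption settings. A minimal sketch, assuming boto3's `kafka` client, that builds the keyword arguments for `create_cluster` without calling AWS. The cluster name, subnet IDs, and security group IDs are hypothetical, and the even-spread check reflects our understanding that MSK requires the broker count to be a multiple of the number of client subnets.

```python
from __future__ import annotations

VALID_CLIENT_BROKER = ("TLS", "TLS_PLAINTEXT", "PLAINTEXT")


def build_create_cluster_request(
    name: str,
    subnet_ids: list[str],
    security_group_ids: list[str],
    kafka_version: str = "2.1.0",
    broker_count: int = 3,
    instance_type: str = "kafka.m5.large",
    volume_size_gib: int = 1000,
    client_broker: str = "TLS",
    in_cluster: bool = True,
) -> dict:
    """Build keyword arguments for boto3's kafka client create_cluster call."""
    if not subnet_ids:
        raise ValueError("at least one client subnet is required")
    if broker_count % len(subnet_ids) != 0:
        # Assumption: brokers are spread evenly across the client subnets (one per AZ).
        raise ValueError("broker_count must be a multiple of the number of subnets")
    if client_broker not in VALID_CLIENT_BROKER:
        raise ValueError(f"unsupported ClientBroker setting: {client_broker}")
    return {
        "ClusterName": name,
        "KafkaVersion": kafka_version,
        "NumberOfBrokerNodes": broker_count,
        "BrokerNodeGroupInfo": {
            "InstanceType": instance_type,
            "ClientSubnets": subnet_ids,
            "SecurityGroups": security_group_ids,
            # EBS volume size per broker, in GiB.
            "StorageInfo": {"EbsStorageInfo": {"VolumeSize": volume_size_gib}},
        },
        "EncryptionInfo": {
            # ClientBroker: client-to-broker traffic; InCluster: broker-to-broker traffic.
            "EncryptionInTransit": {"ClientBroker": client_broker, "InCluster": in_cluster},
        },
    }
```

The resulting dict could then be passed as `boto3.client("kafka").create_cluster(**request)`. Keeping the builder free of boto3 makes the request easy to review or unit test before anything is provisioned.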
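Slide 15's prices can be turned into a rough estimate. The sketch below uses the 2019 figures quoted on the slide (actual prices vary by region and have changed since). It assumes a 730-hour month and that storage is billed on the total GB provisioned across all brokers. `Decimal` avoids floating-point rounding in currency arithmetic.

```python
from decimal import Decimal

# Prices quoted on slide 15 (2019); treat them as illustrative, not current.
BROKER_PRICE_PER_HOUR = Decimal("0.21")      # kafka.m5.large
STORAGE_PRICE_PER_GB_MONTH = Decimal("0.10")


def broker_cost(num_brokers: int, seconds: int) -> Decimal:
    """Broker charge for a running time in seconds, prorated from the hourly rate."""
    hours = Decimal(seconds) / Decimal(3600)
    return num_brokers * hours * BROKER_PRICE_PER_HOUR


def storage_cost(total_gb: int, months: Decimal = Decimal(1)) -> Decimal:
    """Storage charge for the GB provisioned across all brokers."""
    return total_gb * months * STORAGE_PRICE_PER_GB_MONTH
```

For example, three kafka.m5.large brokers running a 730-hour month come to 3 × 730 × $0.21 = $459.90, and 1,000 GB on each broker adds 3,000 × $0.10 = $300.00, for about $759.90 a month. A single broker running for 90 minutes costs 1.5 × $0.21 = $0.315.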
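Step 3 on slide 20 configures producers and consumers with a certificate issued by the private CA plus a trust store. The sketch below renders the corresponding Kafka client properties using the standard `security.protocol` and `ssl.*` client settings. The file paths and passwords are hypothetical placeholders, and obtaining the certificate and importing it into the key store are not shown.

```python
from __future__ import annotations


def tls_client_properties(
    truststore_path: str,
    keystore_path: str | None = None,
    keystore_password: str | None = None,
    key_password: str | None = None,
) -> str:
    """Render Kafka client properties for TLS; adding a key store enables mutual TLS."""
    props = {
        "security.protocol": "SSL",
        # Trust store: lets the client verify the brokers' certificates.
        "ssl.truststore.location": truststore_path,
    }
    if keystore_path is not None:
        # Key store: holds the client certificate issued by the private CA.
        props["ssl.keystore.location"] = keystore_path
        if keystore_password is not None:
            props["ssl.keystore.password"] = keystore_password
        if key_password is not None:
            props["ssl.key.password"] = key_password
    return "".join(f"{k}={v}\n" for k, v in props.items())
```

With mutual TLS, the broker identifies the client by its certificate's distinguished name, which is what slide 20's step 4 uses as the principal in Apache Kafka ACLs (typically written as `User:` followed by the DN).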
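Slide 23 lists the broker properties that could be set through a custom configuration. The sketch below validates overrides against that list and renders them as `server.properties` bytes. The allow-list reflects the slide at the time of the talk and may have changed since; the example values in the usage are hypothetical.

```python
from __future__ import annotations

# Properties listed on slide 23 as configurable at the time of the talk.
SUPPORTED = frozenset({
    "auto.create.topics.enable", "delete.topic.enable",
    "group.initial.rebalance.delay.ms", "group.max.session.timeout.ms",
    "group.min.session.timeout.ms", "log.cleaner.delete.retention.ms",
    "log.cleaner.min.cleanable.ratio", "log.flush.interval.messages",
    "log.flush.interval.ms", "log.retention.bytes", "log.retention.hours",
    "log.retention.minutes", "log.retention.ms", "log.roll.ms",
    "log.segment.bytes", "max.incremental.fetch.session.cache.slots",
    "message.max.bytes", "min.insync.replicas", "num.partitions",
    "offsets.retention.minutes", "transaction.max.timeout.ms",
    "unclean.leader.election.enable", "zookeeper.connection.timeout.ms",
})


def render_server_properties(overrides: dict[str, object]) -> bytes:
    """Validate overrides against the allow-list and render them as server.properties bytes."""
    unknown = sorted(set(overrides) - SUPPORTED)
    if unknown:
        raise ValueError(f"not configurable via custom configuration: {unknown}")
    lines = []
    for key in sorted(overrides):
        value = overrides[key]
        # Kafka expects lowercase true/false in properties files.
        text = str(value).lower() if isinstance(value, bool) else str(value)
        lines.append(f"{key}={text}")
    return ("\n".join(lines) + "\n").encode("utf-8")
```

The bytes could then be supplied as the `ServerProperties` argument of boto3's `create_configuration`, together with `Name` and `KafkaVersions`.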
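Slide 28's no-data-loss settings can be written out as the two files passed to `--consumer.config` and `--producer.config`. A sketch: the bootstrap addresses and group ID are hypothetical, and the Java constants `Integer.MAX_VALUE` and `Long.MAX_VALUE` are spelled out as their numeric values.

```python
from __future__ import annotations

INT_MAX = 2**31 - 1   # Java Integer.MAX_VALUE
LONG_MAX = 2**63 - 1  # Java Long.MAX_VALUE


def mirror_maker_properties(source_bootstrap: str, dest_bootstrap: str, group_id: str) -> dict[str, dict[str, str]]:
    """Consumer (source side) and producer (destination side) settings for no data loss."""
    consumer = {
        "bootstrap.servers": source_bootstrap,
        "group.id": group_id,
        # Commit offsets only after records are safely produced, not on a timer.
        "enable.auto.commit": "false",
    }
    producer = {
        "bootstrap.servers": dest_bootstrap,
        "acks": "all",                                  # wait for all in-sync replicas
        "retries": str(INT_MAX),                        # retry transient failures indefinitely
        "max.in.flight.requests.per.connection": "1",  # keep ordering across retries
        "max.block.ms": str(LONG_MAX),                  # block rather than drop when the buffer is full
    }
    return {"consumer.properties": consumer, "producer.properties": producer}


def render(props: dict[str, str]) -> str:
    """Render a dict as Java properties-file lines."""
    return "".join(f"{k}={v}\n" for k, v in props.items())
```

Each dict can be written to its file with `render` and passed to `bin/kafka-mirror-maker.sh` as on slide 27.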
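Slide 29 suggests scaling MirrorMaker with more processes and more threads per process (`--num.streams`). Because each partition is read by at most one consumer thread in a group, threads far beyond the total partition count of the mirrored topics sit idle. A rough sizing helper under that assumption; the per-process cap is a hypothetical input you would pick from the instance size.

```python
import math


def plan_streams(total_partitions: int, processes: int, max_streams_per_process: int) -> int:
    """Suggest --num.streams per MirrorMaker process, all processes sharing one group.id."""
    if total_partitions < 1 or processes < 1 or max_streams_per_process < 1:
        raise ValueError("all arguments must be positive")
    # Spread partitions evenly over processes; with ceil, at most processes - 1 threads idle.
    return min(max_streams_per_process, math.ceil(total_partitions / processes))
```

For example, 24 partitions across 3 processes suggests 8 threads each; if that exceeds the cap, adding processes on more machines (rather than threads) also improves availability, as the slide recommends.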
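Slide 32 describes MirrorMaker v2's single high-level configuration file. The sketch below renders one in the `clusters`, `<alias>.bootstrap.servers`, and `<source>-><target>.enabled` / `.topics` layout proposed in KIP-382. MirrorMaker 2 was still a proposal at the time of this talk (it shipped later, in Apache Kafka 2.4), and the cluster aliases and addresses are hypothetical.

```python
from __future__ import annotations


def mm2_properties(clusters: dict[str, str], flows: list[tuple[str, str, str]]) -> str:
    """Render one MirrorMaker 2 style config covering all clusters and replication flows.

    clusters: alias -> bootstrap servers; flows: (source alias, target alias, topic regex).
    """
    lines = [f"clusters = {', '.join(clusters)}"]
    for alias, bootstrap in clusters.items():
        lines.append(f"{alias}.bootstrap.servers = {bootstrap}")
    for source, target, topics in flows:
        if source not in clusters or target not in clusters:
            raise ValueError(f"unknown cluster in flow {source}->{target}")
        lines.append(f"{source}->{target}.enabled = true")
        lines.append(f"{source}->{target}.topics = {topics}")
    return "\n".join(lines) + "\n"
```

Compared with one consumer.properties/producer.properties pair per MirrorMaker v1 process, every cluster and flow is declared in one place.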
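Step 3 of slide 33 is to inspect MirrorMaker lag. Running `bin/kafka-consumer-groups.sh --describe --group <group>` against the source cluster reports, per partition, the group's committed offset and the log-end offset; lag is their difference. A sketch of the "caught up" check before step 4's cut-over, assuming you have already collected those two offsets per partition; treating a missing commit as offset 0 is a simplification.

```python
from __future__ import annotations


def replication_lag(log_end_offsets: dict[int, int], committed_offsets: dict[int, int]) -> dict[int, int]:
    """Per-partition lag of the MirrorMaker consumer group on the source cluster."""
    # Simplification: a partition with no committed offset is treated as starting at 0.
    return {p: end - committed_offsets.get(p, 0) for p, end in log_end_offsets.items()}


def caught_up(lag: dict[int, int], threshold: int = 0) -> bool:
    """True when every partition's lag is at or below the threshold."""
    return all(value <= threshold for value in lag.values())
```

For a full migration, check this only after producers to the source cluster have been stopped, so that the log-end offsets stop moving.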