1
Bridge to Cloud: Using Apache Kafka to Migrate to AWS
Priya Shivakumar, Confluent Cloud
Konstantine Karantasis, Confluent Cloud
Rohit Pujari, Amazon Web Services
2
Speakers
Priya is the Director of Product at Confluent, where she focuses on product and go-to-market strategy for Confluent Cloud, a fully managed Apache
Kafka service. She brings more than a decade of experience in the tech industry as an engineering and product leader. Prior to Confluent, she led
product marketing and GTM strategy for vSAN at VMware. As a growth strategy consultant at LEK before that, Priya advised clients on a range of
growth issues, including new product introduction, commercialization, geographic expansion, and acquisition.
Priya Shivakumar
Director of Product, Confluent
Konstantine Karantasis is a Software Engineer at Confluent, Inc. He’s a main contributor to Apache Kafka and its Connect API and he’s the author of
widely used software, such as Confluent’s S3 and Replicator Connectors, class loading isolation in Apache Kafka Connect, Confluent CLI and more.
Previously, he built scalable open source web-services at Yahoo and did research on high-performance computing at the University of Illinois at
Urbana-Champaign. Konstantine holds a Ph.D. from the University of Patras, Greece.
Konstantine Karantasis
Software Engineer, Confluent
Rohit is a Partner Solutions Architect with AWS. He focuses on growing the cloud business of AWS's top technology partners by helping them build,
innovate, and go to market with customer-centric solutions on AWS. Rohit brings a wealth of experience in data engineering and analytics from having
worked with customers of all sizes in various stages of their data journey.
Rohit Pujari
Partner Solutions Architect, Amazon Web Services
3
Agenda
The Great “Cloud Shift”
What is a Bridge to Cloud?
Key Considerations for Cloud Migration
A Customer Story
Live Demo
Q&A
4
Poll
How is your infrastructure set up today?
• Mostly on-premises
• Mostly in the cloud
• Combination of on-premises and cloud (hybrid model)
5
The Great “Cloud Shift”
A cloud product is not complete
without a cloud migration story.
6
Cloud Migration: A one time thing?
7
In reality, we keep running
Our apps are complex and intricately tied together.
Some, built to run on legacy systems, are immovable.
8
We don’t want to just move.
We want to build for the cloud.
9
[Diagram: DC-1 with six apps and the data stores Elasticsearch, MySQL, Oracle, Teradata, Redshift, and S3]
10
[Diagram: App, Oracle, S3]
11
CONFIDENTIAL
Apache Kafka, the de facto standard for real-time streaming
Real-time | Uses disk structure for constant performance at petabyte scale
Scalable | Distributed, scales quickly and easily without downtime
Persistent | Persists messages on disk, enables intra-cluster replication
Reliable | Replicates data, auto-balances consumers upon failure
In production at more than a third of the Fortune 500
2 trillion messages a day at LinkedIn
500 billion events a day (1.3 PB) at Netflix
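As a rough sanity check on the Netflix figure, the sketch below derives average throughput and event size. It assumes the 1.3 PB is a daily volume in decimal petabytes (10^15 bytes) and that traffic is spread evenly over the day; neither assumption is stated on the slide, and the function name `daily_rates` is purely illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_rates(events_per_day, bytes_per_day):
    """Return (events/s, bytes/s, average bytes per event) for a daily volume."""
    events_per_sec = events_per_day / SECONDS_PER_DAY
    bytes_per_sec = bytes_per_day / SECONDS_PER_DAY
    avg_event_bytes = bytes_per_day / events_per_day
    return events_per_sec, bytes_per_sec, avg_event_bytes


# Assumption: 500 billion events and 1.3e15 bytes per day, evenly distributed.
events_per_sec, bytes_per_sec, avg_event_bytes = daily_rates(500e9, 1.3e15)
print(f"{events_per_sec / 1e6:.2f} million events/s")
print(f"{bytes_per_sec / 1e9:.1f} GB/s")
print(f"{avg_event_bytes:.0f} bytes per event")
```

Under those assumptions this works out to roughly 5.8 million events per second, about 15 GB/s, and an average event of about 2.6 KB. Real traffic is bursty, so peak rates would be higher than these averages.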
12
Poll
How are you using Apache Kafka today?
• Apache Kafka on-premises
• Apache Kafka in cloud
• Apache Kafka on-premises and cloud (hybrid)
• Not using Apache Kafka currently
13
What is our bridge?
Confluent Replicator
14
Replicator | Reliable, Scalable, Simple
Category | Feature | Replicator | MirrorMaker
Reliable | Auto creation of topics | ✔ | Partial
Reliable | New partition addition, configuration replication | ✔ | X
Reliable | Single message transformations | ✔ | X
Reliable | Active-active replication | ✔ | X
Scalable | Aggregate cluster: single management point for multiple clusters | ✔ | X
Scalable | Auto scale: scale replication processes as Kafka traffic increases with a single configuration | ✔ | X
Simple | Control Center integration: manage and monitor replication via Control Center UI | ✔ | X
Disaster recovery support | Active-active replication: redirect events to avoid infinite replication loops in active-active configurations | ✔ | X
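To make the replication-loop problem concrete, here is a minimal in-memory sketch of two clusters replicating to each other. It is not Replicator's implementation, which the slide does not describe; it only assumes that each message carries a tag naming the cluster where it was first produced, and that a link skips messages whose origin is the destination. All class and field names are hypothetical, and the origin check as written only covers the two-cluster case.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    value: str
    origin: str  # name of the cluster where the message was first produced


@dataclass
class Cluster:
    name: str
    log: list = field(default_factory=list)

    def produce(self, value):
        """Append a locally produced message, tagged with this cluster's name."""
        self.log.append(Message(value, origin=self.name))


class ReplicationLink:
    """One-way copy from src to dest that remembers how far it has read."""

    def __init__(self, src, dest):
        self.src, self.dest = src, dest
        self.next_offset = 0

    def run_once(self):
        copied = 0
        for message in self.src.log[self.next_offset:]:
            # Skip messages that originated at the destination; copying them
            # back is what would create an infinite loop.
            if message.origin != self.dest.name:
                self.dest.log.append(message)
                copied += 1
        self.next_offset = len(self.src.log)
        return copied


dc, aws = Cluster("dc1"), Cluster("aws")
dc.produce("order-1")
aws.produce("order-2")
links = [ReplicationLink(dc, aws), ReplicationLink(aws, dc)]

first_pass = [link.run_once() for link in links]
second_pass = [link.run_once() for link in links]
print(first_pass, second_pass)
```

After the first pass both clusters hold both messages, and the second pass copies nothing. Without the origin check, each pass would copy the other side's replicas back and the logs would grow without bound.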
15
Disaster Recovery and Bridge to Cloud
Confluent Replicator enables multi-DC deployments of Apache Kafka:
● On-premises: DC to DC
● Hybrid: DC to Region
● Cloud: Region to Region
16
Establish Your Foundation
Step 1: Deploy Kafka on-premises and on AWS
[Diagram: App, Oracle, Teradata, S3]
17
Establish Your Foundation
Step 2: Create your pipeline and replicate your topics to your AWS cluster
[Diagram: App, Oracle, Teradata, Confluent Replicator]
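As a hedged illustration of what creating the pipeline might involve, the sketch below builds a Replicator connector definition as a JSON payload. The property names follow the Confluent Replicator documentation as best recalled and should be verified against the version you run; the connector name, bootstrap addresses, topic names, and the `dc1${topic}` rename format are placeholder assumptions, and security settings are omitted.

```python
import json

REPLICATOR_CLASS = "io.confluent.connect.replicator.ReplicatorSourceConnector"
BYTE_ARRAY_CONVERTER = "io.confluent.connect.replicator.util.ByteArrayConverter"


def build_replicator_config(name, src_bootstrap, dest_bootstrap, topics,
                            rename_format="${topic}"):
    """Build a Kafka Connect connector definition for a one-way topic copy."""
    return {
        "name": name,
        "config": {
            "connector.class": REPLICATOR_CLASS,
            # Copy keys and values as raw bytes, without re-serializing them.
            "key.converter": BYTE_ARRAY_CONVERTER,
            "value.converter": BYTE_ARRAY_CONVERTER,
            "src.kafka.bootstrap.servers": src_bootstrap,
            "dest.kafka.bootstrap.servers": dest_bootstrap,
            "topic.whitelist": ",".join(topics),
            "topic.rename.format": rename_format,
            "tasks.max": "1",
        },
    }


# All values below are hypothetical placeholders.
config = build_replicator_config(
    name="dc1-to-aws",
    src_bootstrap="kafka.dc1.example.internal:9092",
    dest_bootstrap="kafka.aws.example.internal:9092",
    topics=["users", "clicks"],
    rename_format="dc1${topic}",
)
print(json.dumps(config, indent=2))
```

A payload like this would typically be submitted to the Kafka Connect REST API's `/connectors` endpoint on a Connect worker that has the Replicator plugin installed. Prefixing destination topics with the source datacenter name is one way to keep replicated data distinguishable from topics produced directly in the cloud.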
18
Let the Traffic Flow
Step 3: Migrate app by app, database by database
[Diagram: two apps, Oracle, S3]
Topic: dc1users
Topic: dc1clicks
19
[Diagram: DC-1, DC-NY, and DC-LON with cloud regions amer-northeast and emea-central; components shown: six apps, Elasticsearch, MySQL, Oracle, Teradata, DynamoDB, RDS, Redshift, S3]
20
[Diagram: DC-1, DC-NY, and DC-LON with cloud regions amer-northeast and emea-central; components shown: six apps, Elasticsearch, MySQL, Oracle, Teradata, Redshift, S3]
21
Confluent Cloud manages Kafka for you
Mission-Critical Reliability from the original creators of Apache Kafka
Complete Streaming Service that evolves with your needs
Freedom of Choice for the best of private and public clouds
22
Confluent Cloud | Battle tested for massive scale, mission-critical pipelines
Delivered by Experts | Trusted by Enterprises | Backed by Guarantees
● Designed by creators of Kafka
● Operated by Kafka committers
● 100+ years of combined Kafka experience
● 80% of Kafka code commits
● More than a third of Fortune 100 companies trust Confluent
● Sub-25ms^ latencies at multi-GBps throughput
● Highly secure with ACLs, RBAC and encryption*
● 99.95% uptime SLA guarantee
● < 1 hr response time SLA for P1 issues
^P95 latency
*ACLs, RBAC coming soon to Confluent Cloud
23
Confluent Cloud | Kafka re-engineered for cloud
MANAGEMENT & MONITORING | Multi-tenancy | Quotas | Usage tracking and billing
DATA COMPATIBILITY | Schema registry
DEVELOPMENT & CONNECTIVITY | Clients | Connectors | REST proxy | KSQL
APACHE KAFKA® | Connect | Pub-Sub | Streams
[Diagram: event sources (database changes, log events, IoT events, web events) feed Confluent Cloud, which serves data integration (Hadoop, database, data warehouse, CRM) and real-time apps (transformations, custom apps, analytics, monitoring)]
24
Confluent Cloud | Industry’s only hybrid Kafka service
Private Cloud | Deploy on-premises in your datacenter with Confluent Enterprise
Public Cloud | Migrate to or adopt cloud at your own pace with fully managed Confluent Cloud
Hybrid Cloud | Build a persistent bridge between datacenter and cloud with Confluent Replicator
25
Confluent | Singular Kafka focus and innovation
Apache Kafka re-engineered for the cloud
Confluent Vision for Kafka
Global
● Automated disaster recovery
● Global applications with geo-awareness
Infinite
● Efficient and infinite data with tiered storage
● Unlimited horizontal scalability for single clusters
● Faster elastic scaling for brokers and partitions
Elastic
● Easy Kubernetes-based orchestration and management with Confluent Operator
● Faster elastic scaling when adding brokers and partitions
26
Confluent Cloud + AWS Ecosystem
[Diagram: three zones labeled On-premises, Confluent Cloud on AWS, and AWS Ecosystem; components shown: two apps, Oracle, Confluent Enterprise, Confluent Replicator, Confluent Cloud, Confluent S3 Connector; AWS ecosystem uses: run SQL queries, visualize data, run interactive queries, run ad-hoc and big data analysis]
27
Key Considerations for Cloud Migration
● Security
● Compliance
● Availability
● Cost
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Move to AWS
Strengthen your security posture
● Automate with deeply integrated security tools and services
● Inherit global security and compliance controls
● Highest standards for privacy and data security
● Largest network of security partners and solutions
● Scale with superior visibility and control that satisfies the most risk-sensitive orgs
Highest standards for privacy
● Encrypt data in transit and at rest with keys managed by AWS Key Management Service (KMS), or manage your own encryption keys with AWS CloudHSM using FIPS 140-2 Level 3 validated HSMs
● Meet data residency requirements: choose an AWS Region, and AWS will not replicate your content elsewhere unless you choose to do so
● Access services and tools that enable you to build GDPR-compliant infrastructure on top of AWS
● Comply with local data privacy laws by controlling who can access content, its lifecycle, and its disposal
AWS Virtual Private Cloud
● Virtual Private Cloud (VPC): a virtual network defined in AWS
● Dedicated to an AWS account
● Provides logical isolation from other virtual networks in AWS
● Launch cloud resources such as EC2 instances into a VPC
VPC Peering for secure data transfer
● Set up VPC peering between VPCs for private traffic
● A VPC peering connection is a networking connection between two VPCs that enables traffic to be routed using private IP addresses
● Instances in either VPC can communicate with each other as if they were within the same network
● Create a VPC peering connection between:
○ Your own VPCs, if you have multiple
○ Your VPC and the Confluent Cloud VPC
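One practical prerequisite worth checking before requesting a peering connection is that the two VPCs' IPv4 CIDR blocks do not overlap, since AWS does not allow peering between VPCs with overlapping ranges. The sketch below uses Python's standard `ipaddress` module for that check; the function name `can_peer` and the example CIDR blocks are illustrative assumptions.

```python
import ipaddress


def can_peer(cidr_a, cidr_b):
    """Return True if the two CIDR blocks do not overlap."""
    network_a = ipaddress.ip_network(cidr_a)
    network_b = ipaddress.ip_network(cidr_b)
    return not network_a.overlaps(network_b)


# Hypothetical address ranges for your VPC and the VPC you want to peer with.
print(can_peer("10.0.0.0/16", "10.1.0.0/16"))    # distinct ranges
print(can_peer("10.0.0.0/16", "10.0.128.0/17"))  # second range sits inside the first
```

The first pair is safe to peer; the second is not, because 10.0.128.0/17 falls within 10.0.0.0/16. Planning address ranges up front avoids having to re-address a VPC later.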
32
Security | Confluent Cloud
● Confluent Cloud offers dedicated clusters for greater isolation and security
● Confluent Cloud supports VPC peering for private data transfer
● With VPC peering, your cloud applications can securely transfer data to your dedicated Confluent Cloud cluster
● Data does not need to travel over the public internet
[Diagram: your datacenter (Confluent Replicator); your AWS VPC (cloud applications); VPC peering; CCloud VPC in the AWS Region with an ELB (private IP) and dedicated single-AZ Kafka nodes N1, N2, N3]
33
Availability | Multi-AZ with Confluent Cloud
● Confluent Cloud offers zone-level failure protection with Multi-AZ clusters
● Confluent Cloud guarantees 99.95% uptime with SLA
● Maximize availability with 3x replication across 3 zones
● Use Confluent Replicator to stream data across regions for region-level failure protection
[Diagram: customer VPC; VPC peering; CCloud VPC in the AWS Region with an ELB (private IP) and dedicated Multi-AZ Kafka nodes N1, N2, N3 across AZ1, AZ2, AZ3]
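To put the 99.95% uptime figure in concrete terms, the short calculation below converts an uptime percentage into the downtime it permits over a period. This is plain arithmetic; how an actual SLA measures uptime and over what window is defined by the service agreement, which the slide does not detail. The function name is illustrative.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(uptime_percent, period_days):
    """Minutes of downtime permitted by an uptime percentage over a period."""
    downtime_fraction = 1 - uptime_percent / 100
    return downtime_fraction * period_days * MINUTES_PER_DAY


for label, days in [("30-day month", 30), ("365-day year", 365)]:
    print(f"{label}: {allowed_downtime_minutes(99.95, days):.1f} minutes")
```

At 99.95%, that is about 21.6 minutes in a 30-day month, or roughly 4.4 hours over a 365-day year.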
34
Compliance | Confluent Cloud
● SOC-2 Type II compliant
● GDPR ready
● PCI Phase 1 complete
● HIPAA (coming soon)
35
Cost | Right-size, architect, and outsource as necessary
Right sizing
● Customers often incorrectly size / oversize clusters in the cloud
● Cut costs by right sizing, using reserved instances, and signing up for the AWS Enterprise Discount Program (EDP)
Optimizing for time-to-market
● Getting up and running quickly is critical for costs and competitiveness
● Managed services and consumption-based models help with speed to market
De-risk downtime and lag
● Distributed systems are complex with lots of APIs, metrics, systems, and configs
● Stateful systems also require careful capacity planning
● Plan carefully, ensure in-house expertise or get expert help to avoid business risk
Architecting for cost
● Data transfer costs can be significant, especially for data pipelines and streaming systems
● Eliminate point-to-point data transfer (often duplicate data transfer) with Replicator
● Avoid data egress costs for VPC-to-VPC transfer with Confluent Cloud VPC peering
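The saving from eliminating duplicate point-to-point transfer can be estimated with simple arithmetic. The sketch below compares several cloud consumers each pulling the same data across the datacenter-to-cloud link against replicating that data across the link once. The daily volume, consumer count, and per-GB price are hypothetical inputs chosen for illustration, not AWS or Confluent pricing.

```python
def monthly_transfer_cost(gb_per_day, consumers, price_per_gb, days=30):
    """Return (point_to_point_cost, replicate_once_cost) for one month."""
    # Point-to-point: every consumer moves its own copy across the link.
    point_to_point = gb_per_day * consumers * price_per_gb * days
    # Replicate once: the data crosses the link a single time.
    replicate_once = gb_per_day * price_per_gb * days
    return point_to_point, replicate_once


# Hypothetical inputs: 100 GB/day, 4 consumers, 0.10 currency units per GB.
point_to_point, replicate_once = monthly_transfer_cost(100, 4, 0.10)
print(f"point-to-point: {point_to_point:.2f}")
print(f"replicate once: {replicate_once:.2f}")
print(f"saving: {point_to_point - replicate_once:.2f}")
```

In this model the cross-link cost scales with the number of consumers in the point-to-point case and stays flat when the data is replicated once, so the saving grows as more applications move to the cloud.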
36
Confluent | Self-managed or fully-managed on AWS
Self-Managed Software | Confluent Platform | The Leading Distribution of Apache Kafka | Deploy on any platform on-premises or on AWS
Fully-Managed Service | Confluent Cloud | Apache Kafka Re-engineered for the Cloud | Available as a fully managed service on AWS
37
Confluent | Complete portfolio of products and services built around Kafka
Complete support across the entire adoption lifecycle
Kafka Training | Confluent Platform | Professional Services | Fully Managed Kafka
38
A Customer Story
A Way to Simplify Diverse Infrastructure for a Leading Financial Services Company
39
A Way to Simplify Diverse Infrastructure
[Diagram: before vs. after comparison]
● Reduces total traffic between on-premises systems and cloud
● Uses Confluent Replicator to sync to the cloud
● Can be used for migrating applications to the cloud
40
Live Demo
41
Q&A
42
Next Steps
Ready to go? Get started at:
cnfl.io/aws-18
43
