SlideShare a Scribd company logo
1 of 93
Download to read offline
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Multi-Region Active-Active
Architecture
G l e n n G o r e , D i r e c t o r , S o l u t i o n s A r c h i t e c t u r e
G i r i s h P a t i l , S e n i o r S o l u t i o n s A r c h i t e c t
D a r i n B r i s k m a n , D e v e l o p e r E v a n g e l i s t
ARC319
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Session objectives
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Session objectives
1. Understand how to think about application availability
2. Understand how AWS makes services highly-available
3. Understand why you might need Multi-Region Active-Active
architecture
4. Review AWS services that are useful for Multi-Region Active-
Active design
5. Trade-offs involved in the design
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Understand how to think about
application availability
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Where are we?
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Calculating availability with hard
dependencies
Application
Dependency 1 Dependency 3
99% 90%
<= 90%
Dependency 2
95%
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
“A chain is only as strong as its
weakest link.”
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Calculating availability with redundant
components
Application
Component 1 Component 1 Replica
90% 90%
~99%
Assuming
instantaneous
failover!
Let us understand
intuitively
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
“Component redundancy
increases availability
significantly!”
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
“Nines” of availability
Reliability Pillar White-paper (Nov 2017)
Availability Max Disruption (per year) Application Categories
99% 3 days 15 hours Batch processing, data extraction, transfer, and load jobs
99.90% 8 hours 45 minutes Internal tools like knowledge management, project tracking
99.95% 4 hours 22 minutes Online commerce, point of sale
99.99% 52 minutes Video delivery, ridesharing services, broadcast systems
99.999% 5 minutes ATM transactions, telecommunications systems
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Your application’s availability
depends …
… n o t only on availab ility of AWS se rvice s, b u t also on
the qu ality of you r software and you r adhe re nce to
op e rational b e st p ractice s !
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
“Nines” of availability
Reliability Pillar whitepaper (Nov 2017)
Availability Max Disruption (per year) Application Categories
99% 3 days 15 hours Batch processing, data extraction, transfer, and load jobs
99.90% 8 hours 45 minutes Internal tools like knowledge management, project tracking
99.95% 4 hours 22 minutes Online commerce, point of sale
99.99% 52 minutes Video delivery, ridesharing services, broadcast systems
99.999% 5 minutes ATM transactions, telecommunications systems
Consider Multi-Region
Active-Active for this class
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Understand how AWS makes services
highly available
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Region-wide AWS services
Amazon
DynamoDB
Amazon
RDS
Amazon
ElastiCache
Amazon
S3
Amazon
EFS
Amazon
SQS
Amazon
Kinesis
Amazon ES
Default
Configurable for
multi-AZ deployment
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
High availability for EC2 – instance
recovery
Instance ID,
private IP addresses,
Elastic IP addresses, and
all instance metadata
Instance ID,
private IP addresses,
Elastic IP addresses, and
all instance metadata
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-recover.html
Loss of network connectivity
Loss of system power
Software issues on the physical host
Hardware issues on the physical host that impact network
reachability
New instance, same in all
manners as old instance
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
High availability for EC2 – Auto Scaling
Availability Zone 1 Availability Zone 2
Auto-scaling Group
Fresh server from AMI
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Understand why you might need
Multi-Region Active-Active
architecture
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Serving geographically distributed
customer base
ChinaUSA West USA East EU-East
Users from
San Francisco
Users from
New York
Users from
London
Users from
Shanghai
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Guarding against failure of your applications
in one region
Applications
in US West
Applications
in US East
Users from
San Francisco
Users from
New York
AWS Service 1
AWS Service 2
AWS Service 3
AWS Service 4
AWS Service 1
AWS Service 2
AWS Service 3
AWS Service 4
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Cost effective DR: Why not use DR all the
time?
DR environments that don’t get used
1. Fall out of sync, eventually
2. Waste money
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Key Technology Requirements
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Reliable & secure network
Network backbone
VPN or encrypted (SSL) communication over public IP
Region BRegion A Encrypted data
Backbone
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data replication
1. Synchronous
2. Asynchronous
o Nearly continuous (lag of few seconds or minutes)
o Batch (hourly/daily/weekly copy)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Code synchronization
1. Staged deployment across regions
o Rolling
o Blue-Green
o Rollback facility
2. Parameterized localization
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Traffic segregation & management
Segregation options
Explicit – different URLs
• e.g. east.abc-corp.com and west.abc-corp.com
Implicit (DNS level) – the same URL
• e.g. www.abc-corp.com
Traffic management infrastructure
• Throttling
• Internal redirecting
• External redirecting
When users reach wrong
regions
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Monitoring
Application & infrastructure health
Replication lag & code sync monitoring
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Multi-tenancy
What is a tenant?
-A unit of movement/failover
US EastUS West
Backbone
California
Texas New York
Pennsylvania
Texas
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Direction of replication – before
Texas
containers DB DB
Region A Region B
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Direction of replication – after
Texas
containersDB DB
Region A Region B
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Failover scripts
Failover scripts should be able to:
1. Traffic rerouting for a tenant
2. Change direction of replication
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Key architectural considerations
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Tolerance for network partitioning
Failure of one region should not lead to failure of applications in
another
Regional independence for request serving – no API calls from one
region to another
Region BRegion A
Backbone
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Minimal data replication requirements
Does all data need to be replicated?
If yes, does it need to replicated synchronously?
Does all data need to be replicated continuously?
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Classification of data
Transactions
Catalog
information
Events,
objects
Server logs
Purchase record Product details Click stream Https logs
Low volume,
but highly
critical
High volume,
but less
critical
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Sync vs. async replication modes
Tenant
1
App
DB DB
Tenant
1
App
DB DB
Write
Write
Replicate
Replicate
AckAck
Ack Ack
Cons: Network & target
dependent!
Pros: Guarantees
consistency
Cons: Two databases can go
out of sync
Pros: Network & target
independent
Region A Region B
Async preferred,
when possible!
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Concept of data replication lanes
Synchronous replication
Asynchronous nearly
continuous replication
Asynchronous batch
replication
Transactions
Catalog
information
Most difficult to manage
Easiest to manage
Events
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Ideal replication system
Each data store type will need a different technology.
But at the the minimum:
1. It should report replication lag
2. It should report record offset
3. Should be able to retry replication of failed records
ReplicatorSource DB Target DB
High Level
Metrics
monitoring
Try until successful
Replication
lag
Record offset
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Distributed system design best practices
Eventual
consistencyIdempotency
Static
stability
ThrottlingExponential
backup
Circuit
breaking
Reliability Pillar whitepaper (Nov 2017)
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
How AWS enables customers to
deploy Multi-Region Active-Active
Architecture
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS worldwide network backbone
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Multi-Region VPN – over non-AWS network
https://aws.amazon.com/answers/networking/aws-multiple-region-multi-vpc-connectivity/
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Multi-Region VPN – over AWS network
https://aws.amazon.com/answers/networking/aws-multiple-region-multi-vpc-connectivity/
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
INTER REGION VPC PEERING
N E W
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
HQ
OREGON
VPC
Shared
services
VPC
VPC
Customer
network
OREGON REGION
VPC
VPC
VPC VPC
VPC
IRELAND REGION
VPC
Shared
services
VPC
VPC
VPC
VPC
VPC
N O O V E R L A P P I N G I P
A D D R E S S S P A C E
SHARED SERVICES
A M A Z O N B A C K B O N E
V P C P E E R
C R O S S R E G I O N P E E R E D
C O N N E C T I O N E N C R Y P T E D
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Key benefits
1. Works similar to existing Intra Region VPC Peering
2. Data always stays on the AWS backbone
3. Data always encrypted by default
4. No need to use Gateways, third-party VPN solutions to connect across
regions.
5. No additional charges for using interregion VPC peering. Customers pay
standard data transfer rates
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon RDS Multi-AZ deployment
Availability Zone 1 Availability Zone 2
Security group
mydb1.abc45345.eu-west-1.rds.amazonaws.com:3306
VPC subnetVPC subnet
Synchronous
physical replication
• Standbys ensure zero data loss in event of the master’s failure.
• Always have a stand-by. Always!!!
• Also note, cross AZ failovers are automatic & fast; whereas cross-
region failovers take time and need nontrivial planning.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Supported DBs
• MySQL
• MariaDB
• PostgreSQL
• Aurora
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ReadRepl.html#USER_ReadRepl.XRgn
http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/AuroraMySQL.Replication.CrossRegion.html
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Monitoring RDS replication lag
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Very simple plan
All regions send
critical write
traffic to a single
master
Replication traffic
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
“Simple, but higher latency and
network dependency”
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Better plan (for most cases)
Tenant in
Region A
Tenant in
Region B
Tenant in
Region C
Sync
Async
Async
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Applications get faster response
Some committed transactions may not make it to the failover region, before the switch.
Committed transactions are safe due to standby.
They can be recovered with help of reconciliation techniques.
However
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Zero application downtime from ANY instance failure
Zero application downtime from ANY AZ failure
Faster write performance and higher scale
Sign up for single-region multi-master preview today;
Multi-Region Multi-Master coming in 2018
Availability
Zone 1
Scale out both reads and writes
Availability
Zone 2
Availability
Zone 3
Application
Read/Write
Master 1
Shared distributed storage volume
Read/Write
Master 2
Read/Write
Master 3
Aurora multi-master—scale out reads & writes
F i r s t M y S Q L c o m p a t i b l e D B s e r v i c e w i t h s c a l e o u t a c r o s s m u l t i p l e d a t a c e n t e r s
Database Migration Service (DMS) for
granular replication
DMS
Replication instance
Source Target
Update
t1 t2
t1
t2
Transactions Change
apply
after bulk
load
Change data capture
(supported for MySQL, MariaDB,
Aurora, PostgreSQL)
Details: http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.html
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon DynamoDB
F a s t a n d f l e x i b l e N o S Q L d a t a b a s e s e r v i c e f o r a n y s c a l e
Fast, consistent
performance
Highly scalable Fully managed Business critical
reliability
Consistent single-digit millisecond
latency; DAX in-memory
performance reduces response
times to microseconds
Auto-scaling to hundreds
of terabytes of data that
serve millions of
requests per second
Automatic provisioning,
infrastructure
management, scaling,
and configuration with
zero downtime
Data is replicated across
fault tolerant Availability
Zones, with fine-grained
access control
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Amazon DynamoDB Global Tables ( G A )
Fi r s t f u l l y m a n a g e d , m u l t i - m a s t e r , m u l t i - r e g i o n d a t a b a s e
Build high performance, globally distributed applications
Low latency reads & writes to locally available tables
Disaster proof with multi-region redundancy
Easy to set up and no application rewrites required
Globally dispersed users
Replica (N. America)
Replica (Europe)
Replica (Asia)
Global App
Global Table
Open Source Cross-
Region Replication Library
or
Lambda
Amazon DynamoDB cross-region
replication with streams
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What is AWS Lambda
• It is a serverless, event driven compute service
• Define functions, and the service executes them as many times as input
arrives
• Many supported event sources including: Kinesis and DynamoDB streams
• Scales automatically with event rate
• Supported languages: Java, .Net, Node.js, Python
Ensures streamed records in shards are
processed (replicated in this case) in order!
DynamoDB Streams and AWS Lambda
Error handling in Lambda
1. For stream-based event sources (Amazon Kinesis Streams and
DynamoDB streams), AWS Lambda polls your stream and invokes your
Lambda function.
2. Therefore, if a Lambda function fails, AWS Lambda attempts to
process the erring batch of records until the time the data expires from
the stream.
3. The exception is treated as blocking, and AWS Lambda will not read
any new records from the stream until the failed batch of records
either expires or processed successfully.
This ensures that AWS Lambda processes the stream events in order.
http://docs.aws.amazon.com/lambda/latest/dg/retries-on-errors.html
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
ElasticSearch cross region replication
Amazon Kinesis
Firehose
Amazon Kinesis
Firehose
Amazon ES
Amazon ES
Source
Source needs to have tracking
to have successful posting to
both regions
Region A
Region B
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Redshift cross region replication
Amazon RedshiftAmazon Kinesis
Firehose
Amazon Kinesis
Firehose
Amazon Redshift
Source
Source needs to have tracking
to have successful posting to
both regions
Region A
Region B
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Global Traffic Management with
Route 53
Global service
Load balancers and/or servers across multiple regions
globally
Region A
Region B
Region C
User for Region A
User for Region C
User for Region B
Traffic flow: endpoints
• Hybrid/low level infrastructure: IP address or CNAME
• ELB Classic Load Balancer / Application Load Balancer
• Amazon S3 website
• Amazon CloudFront distribution
• AWS Elastic Beanstalk environment
Traffic flow: Quick overview of rules
Simple routing policy – Use for a single resource that performs a given
function for your domain, for example, a web server that serves content for
the example.com website.
Failover routing policy – Use when you want to configure active-passive
failover.
Geolocation routing policy – Use when you want to route traffic based on
the location of your users.
Traffic flow: Quick overview of rules
Geoproximity routing policy – Use when you want to route traffic based on
the location of your resources and, optionally, shift traffic from one resource
in one location to resources in another.
Latency routing policy – Use when you have resources in multiple locations
and you want to route traffic to the resource that provides the best latency.
Multivalue answer routing policy – Use when you want Amazon Route 53 to
respond to DNS queries with up to eight healthy records selected at random.
Weighted routing policy – Use to route traffic to multiple resources in
proportions that you specify.
Traffic policy example
Example of an advanced policy
Failover Rule Geolocation Rule
Failover Rule
End points
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Cross-region code deployment
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
BLUE-GREEN
Link
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
CANARY
Link
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Is code deployment any different?
No. DevOps pipelines work the same way.
Important trade-off you should make:
1. Simultaneous deployments
2. “One region at a time” deployments
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Using AWS services
https://aws.amazon.com/blogs/devops/building-a-cross-regioncross-account-code-deployment-solution-on-aws/
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Parameter localization
App 1 App 2 App 1 App 2
Local parameters DB Local parameters DB
Region A Region B
Central parameter
configuration application
DevOps pipeline deploying
same code base
Take a look at Netflix
Archaius!
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Cross Region Monitoring
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Is monitoring any different?
Yes. You need to do additional monitoring.
1. Replication lags (very critical)
2. Record offset tracking
Need to put important stats for two/more regions on a
single canvas for proper monitoring.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Monitoring replication lag – CloudWatch
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Monitoring Amazon S3 file replication progress
https://aws.amazon.com/answers/infrastructure-management/crr-monitor/
Info on new
files
Info on
replicated files
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Monitoring using serverless components
Arrow direction indicates general direction of data flow
EC2 instances
VPC
Flow Logs
CloudTrail
Audit Logs
S3
Access
Logs
ELB
Access
Logs
CloudFront
Access
Logs
SNS
Notifications
DynamoDB
Streams
SES
Inbound
Email
Cognito
Events
Kinesis
Streams
CloudWatch
Events &
Alarms
Config
Rules
AmazonKinesis
Firehose
AmazonKinesis
Firehose
Kinesis
Agent
https://d0.awsstatic.com/whitepapers/whitepaper-use-amazon-elasticsearch-to-log-and-monitor-almost-everything.pdf
Highly available
deployment across
two AZs
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Not all metrics are created equal!
High-Level Metric (high importance, low volume):
• User experience
• User count
• Transaction count
• Replication status
Low-Level Metric (relatively low importance, high volume):
• HTTP request count
• Read vs. write throughout
• Cache hit vs. miss
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
High-Level Metrics monitoring
Low-Level Metrics
monitoring
Low-Level Metrics
monitoring
High-Level
Metrics
monitoring
High-Level Metrics
monitoring
App monitoring agents App monitoring agents
Region A Region B
Replicate only
High-Level Metrics.
Use region tags.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Some useful open source projects
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Zuul & Isthmus from Netflix
Request rerouting for users coming from
unwanted region, handling load balancer failures
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
EVCache from Netflix
For remote cache invalidation and synchronization
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Conclusions
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Conclusions
o Avoid synchronous replication & simultaneous deployments as
much as possible
o Design applications for idempotency & eventual consistency as
much as possible
o Closely monitor replication & code sync delays
o Have push buttons ready to switch traffic for tenants
o Make High-Level Metrics monitoring systems also Multi-Region
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Conclusions
o It is an involved exercise. It requires careful planning
and design.
o However, various AWS services make implementation
much easier by doing undifferentiated heavy lifting
for our customers.
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Conclusions
o For companies with extremely high availability
requirements or/and geographically distributed user
base, benefits of Multi-Region Active-Active
architecture can be profound.
o In such cases, consider designing applications for
Multi-Region Active-Active implementation from DAY
ONE.
But, again as we say in Amazon, today is DAY ONE!
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
THANK YOU!

More Related Content

What's hot

What's hot (20)

AWS Fargate와 Amazon ECS를 활용한 CI/CD 모범사례 - 유재석, AWS 솔루션즈 아키텍트 :: AWS Game Mast...
AWS Fargate와 Amazon ECS를 활용한 CI/CD 모범사례 - 유재석, AWS 솔루션즈 아키텍트 :: AWS Game Mast...AWS Fargate와 Amazon ECS를 활용한 CI/CD 모범사례 - 유재석, AWS 솔루션즈 아키텍트 :: AWS Game Mast...
AWS Fargate와 Amazon ECS를 활용한 CI/CD 모범사례 - 유재석, AWS 솔루션즈 아키텍트 :: AWS Game Mast...
 
AWS for Backup and Recovery
AWS for Backup and RecoveryAWS for Backup and Recovery
AWS for Backup and Recovery
 
AWS 101
AWS 101AWS 101
AWS 101
 
[NEW LAUNCH!] Deep Dive on Amazon FSx for Windows File Server (STG322-R) - AW...
[NEW LAUNCH!] Deep Dive on Amazon FSx for Windows File Server (STG322-R) - AW...[NEW LAUNCH!] Deep Dive on Amazon FSx for Windows File Server (STG322-R) - AW...
[NEW LAUNCH!] Deep Dive on Amazon FSx for Windows File Server (STG322-R) - AW...
 
Introduction to the Well-Architected Framework and Tool - SVC208 - Anaheim AW...
Introduction to the Well-Architected Framework and Tool - SVC208 - Anaheim AW...Introduction to the Well-Architected Framework and Tool - SVC208 - Anaheim AW...
Introduction to the Well-Architected Framework and Tool - SVC208 - Anaheim AW...
 
DevSecOps reference architectures 2018
DevSecOps reference architectures 2018DevSecOps reference architectures 2018
DevSecOps reference architectures 2018
 
Kubernetes/ EKS - 김광영 (AWS 솔루션즈 아키텍트)
Kubernetes/ EKS - 김광영 (AWS 솔루션즈 아키텍트)Kubernetes/ EKS - 김광영 (AWS 솔루션즈 아키텍트)
Kubernetes/ EKS - 김광영 (AWS 솔루션즈 아키텍트)
 
AWS WAF
AWS WAFAWS WAF
AWS WAF
 
AWS VPC Fundamentals- Webinar
AWS VPC Fundamentals- WebinarAWS VPC Fundamentals- Webinar
AWS VPC Fundamentals- Webinar
 
Palo Alto Networks CASB
Palo Alto Networks CASBPalo Alto Networks CASB
Palo Alto Networks CASB
 
AWS Black Belt Online Seminar 2016 AWS上でのActive Directory構築
AWS Black Belt Online Seminar 2016 AWS上でのActive Directory構築AWS Black Belt Online Seminar 2016 AWS上でのActive Directory構築
AWS Black Belt Online Seminar 2016 AWS上でのActive Directory構築
 
AWS Certified Solutions Architect Professional Course S1-S5
AWS Certified Solutions Architect Professional Course S1-S5AWS Certified Solutions Architect Professional Course S1-S5
AWS Certified Solutions Architect Professional Course S1-S5
 
워크로드 특성에 따른 안전하고 효율적인 Data Lake 운영 방안
워크로드 특성에 따른 안전하고 효율적인 Data Lake 운영 방안워크로드 특성에 따른 안전하고 효율적인 Data Lake 운영 방안
워크로드 특성에 따른 안전하고 효율적인 Data Lake 운영 방안
 
AWS Security Best Practices and Design Patterns
AWS Security Best Practices and Design PatternsAWS Security Best Practices and Design Patterns
AWS Security Best Practices and Design Patterns
 
The Qa Testing Checklists for Successful Cloud Migration
The Qa Testing Checklists for Successful Cloud MigrationThe Qa Testing Checklists for Successful Cloud Migration
The Qa Testing Checklists for Successful Cloud Migration
 
AWS Web Application Firewall and AWS Shield - Webinar
AWS Web Application Firewall and AWS Shield - Webinar AWS Web Application Firewall and AWS Shield - Webinar
AWS Web Application Firewall and AWS Shield - Webinar
 
CI/CD pipelines on AWS - Builders Day Israel
CI/CD pipelines on AWS - Builders Day IsraelCI/CD pipelines on AWS - Builders Day Israel
CI/CD pipelines on AWS - Builders Day Israel
 
Gateway/APIC security
Gateway/APIC securityGateway/APIC security
Gateway/APIC security
 
Infrastructure is code with the AWS CDK - MAD312 - New York AWS Summit
Infrastructure is code with the AWS CDK - MAD312 - New York AWS SummitInfrastructure is code with the AWS CDK - MAD312 - New York AWS Summit
Infrastructure is code with the AWS CDK - MAD312 - New York AWS Summit
 
CI/CD for Modern Applications
CI/CD for Modern ApplicationsCI/CD for Modern Applications
CI/CD for Modern Applications
 

Similar to ARC319_Multi-Region Active-Active Architecture

Building Global Serverless Backends powered by Amazon DynamoDB Global Tables
Building Global Serverless Backends powered by Amazon DynamoDB Global TablesBuilding Global Serverless Backends powered by Amazon DynamoDB Global Tables
Building Global Serverless Backends powered by Amazon DynamoDB Global Tables
Amazon Web Services
 
Building Global Serverless Backends
Building Global Serverless BackendsBuilding Global Serverless Backends
Building Global Serverless Backends
Amazon Web Services
 

Similar to ARC319_Multi-Region Active-Active Architecture (20)

How to Design a Multi-Region Active-Active Architecture
How to Design a Multi-Region Active-Active ArchitectureHow to Design a Multi-Region Active-Active Architecture
How to Design a Multi-Region Active-Active Architecture
 
ARC306_High Resiliency & Availability Of Online Entertainment Communities Usi...
ARC306_High Resiliency & Availability Of Online Entertainment Communities Usi...ARC306_High Resiliency & Availability Of Online Entertainment Communities Usi...
ARC306_High Resiliency & Availability Of Online Entertainment Communities Usi...
 
AWS Database and Analytics State of the Union - 2017 - DAT201 - re:Invent 2017
AWS Database and Analytics State of the Union - 2017 - DAT201 - re:Invent 2017AWS Database and Analytics State of the Union - 2017 - DAT201 - re:Invent 2017
AWS Database and Analytics State of the Union - 2017 - DAT201 - re:Invent 2017
 
Building Global Serverless Backends powered by Amazon DynamoDB Global Tables
Building Global Serverless Backends powered by Amazon DynamoDB Global TablesBuilding Global Serverless Backends powered by Amazon DynamoDB Global Tables
Building Global Serverless Backends powered by Amazon DynamoDB Global Tables
 
Building Multiregion Serverless Backends
Building Multiregion Serverless BackendsBuilding Multiregion Serverless Backends
Building Multiregion Serverless Backends
 
Building Global Multi-Region, Active-Active Serverless Backends I AWS Dev Day...
Building Global Multi-Region, Active-Active Serverless Backends I AWS Dev Day...Building Global Multi-Region, Active-Active Serverless Backends I AWS Dev Day...
Building Global Multi-Region, Active-Active Serverless Backends I AWS Dev Day...
 
Building a Multi-Region, Active-Active Serverless Backends.
Building a Multi-Region, Active-Active Serverless Backends.Building a Multi-Region, Active-Active Serverless Backends.
Building a Multi-Region, Active-Active Serverless Backends.
 
Building Global Serverless Backends
Building Global Serverless BackendsBuilding Global Serverless Backends
Building Global Serverless Backends
 
Journey Towards Scaling Your API to 10 Million Users
Journey Towards Scaling Your API to 10 Million UsersJourney Towards Scaling Your API to 10 Million Users
Journey Towards Scaling Your API to 10 Million Users
 
AWS X-Ray: Debugging Applications at Scale - AWS Online Tech Talks
AWS X-Ray: Debugging Applications at Scale - AWS Online Tech TalksAWS X-Ray: Debugging Applications at Scale - AWS Online Tech Talks
AWS X-Ray: Debugging Applications at Scale - AWS Online Tech Talks
 
ARC207 Monitoring Performance of Enterprise Applications on AWS: Understandin...
ARC207 Monitoring Performance of Enterprise Applications on AWS: Understandin...ARC207 Monitoring Performance of Enterprise Applications on AWS: Understandin...
ARC207 Monitoring Performance of Enterprise Applications on AWS: Understandin...
 
Learn how to build serverless applications using the AWS Serverless Platform-...
Learn how to build serverless applications using the AWS Serverless Platform-...Learn how to build serverless applications using the AWS Serverless Platform-...
Learn how to build serverless applications using the AWS Serverless Platform-...
 
STG401_This Is My Architecture
STG401_This Is My ArchitectureSTG401_This Is My Architecture
STG401_This Is My Architecture
 
Scale Website dan Mobile Applications Anda di AWS hingga 10 juta pengguna
Scale Website dan Mobile Applications Anda di AWS hingga 10 juta penggunaScale Website dan Mobile Applications Anda di AWS hingga 10 juta pengguna
Scale Website dan Mobile Applications Anda di AWS hingga 10 juta pengguna
 
DEV325_Application Deployment Techniques for Amazon EC2 Workloads with AWS Co...
DEV325_Application Deployment Techniques for Amazon EC2 Workloads with AWS Co...DEV325_Application Deployment Techniques for Amazon EC2 Workloads with AWS Co...
DEV325_Application Deployment Techniques for Amazon EC2 Workloads with AWS Co...
 
ARC207_Monitoring Performance of Enterprise Applications on AWS
ARC207_Monitoring Performance of Enterprise Applications on AWSARC207_Monitoring Performance of Enterprise Applications on AWS
ARC207_Monitoring Performance of Enterprise Applications on AWS
 
Networking State of the Union - NET205 - re:Invent 2017
Networking State of the Union - NET205 - re:Invent 2017Networking State of the Union - NET205 - re:Invent 2017
Networking State of the Union - NET205 - re:Invent 2017
 
ARC201_Scaling Up to Your First 10 Million Users
ARC201_Scaling Up to Your First 10 Million UsersARC201_Scaling Up to Your First 10 Million Users
ARC201_Scaling Up to Your First 10 Million Users
 
Increasing Productivity with End-User Computing Solutions on AWS
  Increasing Productivity with End-User Computing Solutions on AWS  Increasing Productivity with End-User Computing Solutions on AWS
Increasing Productivity with End-User Computing Solutions on AWS
 
ABD206-Building Visualizations and Dashboards with Amazon QuickSight
ABD206-Building Visualizations and Dashboards with Amazon QuickSightABD206-Building Visualizations and Dashboards with Amazon QuickSight
ABD206-Building Visualizations and Dashboards with Amazon QuickSight
 

More from Amazon Web Services

Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
Amazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
Amazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
Amazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
Amazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

ARC319_Multi-Region Active-Active Architecture

  • 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Multi-Region Active-Active Architecture G l e n n G o r e , D i r e c t o r , S o l u t i o n s A r c h i t e c t u r e G i r i s h P a t i l , S e n i o r S o l u t i o n s A r c h i t e c t D a r i n B r i s k m a n , D e v e l o p e r E v a n g e l i s t ARC319
  • 2. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Session objectives
  • 3. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Session objectives 1. Understand how to think about application availability 2. Understand how AWS makes services highly-available 3. Understand why you might need Multi-Region Active-Active architecture 4. Review AWS services that are useful for Multi-Region Active- Active design 5. Trade-offs involved in the design
  • 4. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Understand how to think about application availability
  • 5. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Where are we?
  • 6. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Calculating availability with hard dependencies Application Dependency 1 Dependency 3 99% 90% <= 90% Dependency 2 95%
  • 7. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. “A chain is only as strong as its weakest link.”
  • 8. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Calculating availability with redundant components Application Component 1 Component 1 Replica 90% 90% ~99% Assuming instantaneous failover! Let us understand intuitively
  • 9. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. “Component redundancy increases availability significantly!”
  • 10. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. “Nines” of availability Reliability Pillar White-paper (Nov 2017) Availability Max Disruption (per year) Application Categories 99% 3 days 15 hours Batch processing, data extraction, transfer, and load jobs 99.90% 8 hours 45 minutes Internal tools like knowledge management, project tracking 99.95% 4 hours 22 minutes Online commerce, point of sale 99.99% 52 minutes Video delivery, ridesharing services, broadcast systems 99.999% 5 minutes ATM transactions, telecommunications systems
  • 11. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Your application’s availability depends … … n o t only on availab ility of AWS se rvice s, b u t also on the qu ality of you r software and you r adhe re nce to op e rational b e st p ractice s !
  • 12. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. “Nines” of availability Reliability Pillar whitepaper (Nov 2017) Availability Max Disruption (per year) Application Categories 99% 3 days 15 hours Batch processing, data extraction, transfer, and load jobs 99.90% 8 hours 45 minutes Internal tools like knowledge management, project tracking 99.95% 4 hours 22 minutes Online commerce, point of sale 99.99% 52 minutes Video delivery, ridesharing services, broadcast systems 99.999% 5 minutes ATM transactions, telecommunications systems Consider Multi-Region Active-Active for this class
  • 13. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Understand how AWS makes services highly available
  • 14. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 15. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 16. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Region-wide AWS services Amazon DynamoDB Amazon RDS Amazon ElastiCache Amazon S3 Amazon EFS Amazon SQS Amazon Kinesis Amazon ES Default Configurable for multi-AZ deployment
  • 17. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. High availability for EC2 – instance recovery Instance ID, private IP addresses, Elastic IP addresses, and all instance metadata Instance ID, private IP addresses, Elastic IP addresses, and all instance metadata http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-recover.html Loss of network connectivity Loss of system power Software issues on the physical host Hardware issues on the physical host that impact network reachability New instance, same in all manners as old instance
  • 18. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. High availability for EC2 – Auto Scaling Availability Zone 1 Availability Zone 2 Auto-scaling Group Fresh server from AMI
  • 19. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Understand why you might need Multi-Region Active-Active architecture
  • 20. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Serving geographically distributed customer base ChinaUSA West USA East EU-East Users from San Francisco Users from New York Users from London Users from Shanghai
  • 21. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Guarding against failure of your applications in one region Applications in US West Applications in US East Users from San Francisco Users from New York AWS Service 1 AWS Service 2 AWS Service 3 AWS Service 4 AWS Service 1 AWS Service 2 AWS Service 3 AWS Service 4
  • 22. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Cost effective DR: Why not use DR all the time? DR environments that don’t get used 1. Fall out of sync, eventually 2. Waste money
  • 23. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Key Technology Requirements
  • 24. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Reliable & secure network Network backbone VPN or encrypted (SSL) communication over public IP Region BRegion A Encrypted data Backbone
  • 25. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data replication 1. Synchronous 2. Asynchronous o Nearly continuous (lag of few seconds or minutes) o Batch (hourly/daily/weekly copy)
  • 26. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Code synchronization 1. Staged deployment across regions o Rolling o Blue-Green o Rollback facility 2. Parameterized localization
  • 27. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Traffic segregation & management Segregation options Explicit – different URLs • e.g. east.abc-corp.com and west.abc-corp.com Implicit (DNS level) – the same URL • e.g. www.abc-corp.com Traffic management infrastructure • Throttling • Internal redirecting • External redirecting When users reach wrong regions
  • 28. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Monitoring Application & infrastructure health Replication lag & code sync monitoring
  • 29. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Multi-tenancy What is a tenant? -A unit of movement/failover US EastUS West Backbone California Texas New York Pennsylvania Texas
  • 30. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Direction of replication – before Texas containers DB DB Region A Region B
  • 31. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Direction of replication – after Texas containersDB DB Region A Region B
  • 32. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Failover scripts Failover scripts should be able to: 1. Traffic rerouting for a tenant 2. Change direction of replication
  • 33. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Key architectural considerations
  • 34. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Tolerance for network partitioning Failure of one region should not lead to failure of applications in another Regional independence for request serving – no API calls from one region to another Region BRegion A Backbone
  • 35. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Minimal data replication requirements Does all data need to be replicated? If yes, does it need to replicated synchronously? Does all data need to be replicated continuously?
  • 36. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Classification of data Transactions Catalog information Events, objects Server logs Purchase record Product details Click stream Https logs Low volume, but highly critical High volume, but less critical
  • 37. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Sync vs. async replication modes Tenant 1 App DB DB Tenant 1 App DB DB Write Write Replicate Replicate AckAck Ack Ack Cons: Network & target dependent! Pros: Guarantees consistency Cons: Two databases can go out of sync Pros: Network & target independent Region A Region B Async preferred, when possible!
  • 38. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Concept of data replication lanes Synchronous replication Asynchronous nearly continuous replication Asynchronous batch replication Transactions Catalog information Most difficult to manage Easiest to manage Events
  • 39. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Ideal replication system Each data store type will need a different technology. But at the the minimum: 1. It should report replication lag 2. It should report record offset 3. Should be able to retry replication of failed records ReplicatorSource DB Target DB High Level Metrics monitoring Try until successful Replication lag Record offset
  • 40. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Distributed system design best practices Eventual consistencyIdempotency Static stability ThrottlingExponential backup Circuit breaking Reliability Pillar whitepaper (Nov 2017)
  • 41. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. How AWS enables customers to deploy Multi-Region Active-Active Architecture
  • 42. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS worldwide network backbone
  • 43. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Multi-Region VPN – over non-AWS network https://aws.amazon.com/answers/networking/aws-multiple-region-multi-vpc-connectivity/
  • 44. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Multi-Region VPN – over AWS network https://aws.amazon.com/answers/networking/aws-multiple-region-multi-vpc-connectivity/
  • 45. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. INTER REGION VPC PEERING N E W
  • 46. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. HQ OREGON VPC Shared services VPC VPC Customer network OREGON REGION VPC VPC VPC VPC VPC IRELAND REGION VPC Shared services VPC VPC VPC VPC VPC N O O V E R L A P P I N G I P A D D R E S S S P A C E SHARED SERVICES A M A Z O N B A C K B O N E V P C P E E R C R O S S R E G I O N P E E R E D C O N N E C T I O N E N C R Y P T E D
  • 47. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Key benefits 1. Works similar to existing Intra Region VPC Peering 2. Data always stays on the AWS backbone 3. Data always encrypted by default 4. No need to use Gateways, third-party VPN solutions to connect across regions. 5. No additional charges for using interregion VPC peering. Customers pay standard data transfer rates
  • 48. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 49. Amazon RDS Multi-AZ deployment Availability Zone 1 Availability Zone 2 Security group mydb1.abc45345.eu-west-1.rds.amazonaws.com:3306 VPC subnetVPC subnet Synchronous physical replication • Standbys ensure zero data loss in event of the master’s failure. • Always have a stand-by. Always!!! • Also note, cross AZ failovers are automatic & fast; whereas cross- region failovers take time and need nontrivial planning.
  • 50. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Supported DBs • MySQL • MariaDB • PostgreSQL • Aurora http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ReadRepl.html#USER_ReadRepl.XRgn http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/AuroraMySQL.Replication.CrossRegion.html
  • 51. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Monitoring RDS replication lag
  • 52. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Very simple plan All regions send critical write traffic to a single master Replication traffic
  • 53. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. “Simple, but higher latency and network dependency”
  • 54. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Better plan (for most cases) Tenant in Region A Tenant in Region B Tenant in Region C Sync Async Async
  • 55. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Applications get faster response Some committed transactions may not make it to the failover region, before the switch. Committed transactions are safe due to standby. They can be recovered with help of reconciliation techniques. However
  • 56. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Zero application downtime from ANY instance failure Zero application downtime from ANY AZ failure Faster write performance and higher scale Sign up for single-region multi-master preview today; Multi-Region Multi-Master coming in 2018 Availability Zone 1 Scale out both reads and writes Availability Zone 2 Availability Zone 3 Application Read/Write Master 1 Shared distributed storage volume Read/Write Master 2 Read/Write Master 3 Aurora multi-master—scale out reads & writes F i r s t M y S Q L c o m p a t i b l e D B s e r v i c e w i t h s c a l e o u t a c r o s s m u l t i p l e d a t a c e n t e r s
  • 57. Database Migration Service (DMS) for granular replication DMS Replication instance Source Target Update t1 t2 t1 t2 Transactions Change apply after bulk load Change data capture (supported for MySQL, MariaDB, Aurora, PostgreSQL) Details: http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.html
  • 58. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon DynamoDB F a s t a n d f l e x i b l e N o S Q L d a t a b a s e s e r v i c e f o r a n y s c a l e Fast, consistent performance Highly scalable Fully managed Business critical reliability Consistent single-digit millisecond latency; DAX in-memory performance reduces response times to microseconds Auto-scaling to hundreds of terabytes of data that serve millions of requests per second Automatic provisioning, infrastructure management, scaling, and configuration with zero downtime Data is replicated across fault tolerant Availability Zones, with fine-grained access control
  • 59. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon DynamoDB Global Tables ( G A ) Fi r s t f u l l y m a n a g e d , m u l t i - m a s t e r , m u l t i - r e g i o n d a t a b a s e Build high performance, globally distributed applications Low latency reads & writes to locally available tables Disaster proof with multi-region redundancy Easy to set up and no application rewrites required Globally dispersed users Replica (N. America) Replica (Europe) Replica (Asia) Global App Global Table
  • 60. Open Source Cross- Region Replication Library or Lambda Amazon DynamoDB cross-region replication with streams
  • 61. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. What is AWS Lambda • It is a serverless, event driven compute service • Define functions, and the service executes them as many times as input arrives • Many supported event sources including: Kinesis and DynamoDB streams • Scales automatically with event rate • Supported languages: Java, .Net, Node.js, Python Ensures streamed records in shards are processed (replicated in this case) in order!
  • 62. DynamoDB Streams and AWS Lambda
  • 63. Error handling in Lambda 1. For stream-based event sources (Amazon Kinesis Streams and DynamoDB streams), AWS Lambda polls your stream and invokes your Lambda function. 2. Therefore, if a Lambda function fails, AWS Lambda attempts to process the erring batch of records until the time the data expires from the stream. 3. The exception is treated as blocking, and AWS Lambda will not read any new records from the stream until the failed batch of records either expires or processed successfully. This ensures that AWS Lambda processes the stream events in order. http://docs.aws.amazon.com/lambda/latest/dg/retries-on-errors.html
  • 64. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. ElasticSearch cross region replication Amazon Kinesis Firehose Amazon Kinesis Firehose Amazon ES Amazon ES Source Source needs to have tracking to have successful posting to both regions Region A Region B
  • 65. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Redshift cross region replication Amazon RedshiftAmazon Kinesis Firehose Amazon Kinesis Firehose Amazon Redshift Source Source needs to have tracking to have successful posting to both regions Region A Region B
  • 66. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Global Traffic Management with Route 53
  • 67. Global service Load balancers and/or servers across multiple regions globally Region A Region B Region C User for Region A User for Region C User for Region B
  • 68. Traffic flow: endpoints • Hybrid/low level infrastructure: IP address or CNAME • ELB Classic Load Balancer / Application Load Balancer • Amazon S3 website • Amazon CloudFront distribution • AWS Elastic Beanstalk environment
  • 69. Traffic flow: Quick overview of rules Simple routing policy – Use for a single resource that performs a given function for your domain, for example, a web server that serves content for the example.com website. Failover routing policy – Use when you want to configure active-passive failover. Geolocation routing policy – Use when you want to route traffic based on the location of your users.
  • 70. Traffic flow: Quick overview of rules Geoproximity routing policy – Use when you want to route traffic based on the location of your resources and, optionally, shift traffic from one resource in one location to resources in another. Latency routing policy – Use when you have resources in multiple locations and you want to route traffic to the resource that provides the best latency. Multivalue answer routing policy – Use when you want Amazon Route 53 to respond to DNS queries with up to eight healthy records selected at random. Weighted routing policy – Use to route traffic to multiple resources in proportions that you specify.
  • 72. Example of an advanced policy Failover Rule Geolocation Rule Failover Rule End points
  • 73. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Cross-region code deployment
  • 74. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. BLUE-GREEN Link
  • 75. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. CANARY Link
  • 76. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Is code deployment any different? No. DevOps pipelines work the same way. Important trade-off you should make: 1. Simultaneous deployments 2. “One region at a time” deployments
  • 77. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Using AWS services https://aws.amazon.com/blogs/devops/building-a-cross-regioncross-account-code-deployment-solution-on-aws/
  • 78. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Parameter localization App 1 App 2 App 1 App 2 Local parameters DB Local parameters DB Region A Region B Central parameter configuration application DevOps pipeline deploying same code base Take a look at Netflix Archaius!
  • 79. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Cross Region Monitoring
  • 80. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Is monitoring any different? Yes. You need to do additional monitoring. 1. Replication lags (very critical) 2. Record offset tracking Need to put important stats for two/more regions on a single canvas for proper monitoring.
  • 81. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Monitoring replication lag – CloudWatch
  • 82. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Monitoring Amazon S3 file replication progress https://aws.amazon.com/answers/infrastructure-management/crr-monitor/ Info on new files Info on replicated files
  • 83. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Monitoring using serverless components Arrow direction indicates general direction of data flow EC2 instances VPC Flow Logs CloudTrail Audit Logs S3 Access Logs ELB Access Logs CloudFront Access Logs SNS Notifications DynamoDB Streams SES Inbound Email Cognito Events Kinesis Streams CloudWatch Events & Alarms Config Rules AmazonKinesis Firehose AmazonKinesis Firehose Kinesis Agent https://d0.awsstatic.com/whitepapers/whitepaper-use-amazon-elasticsearch-to-log-and-monitor-almost-everything.pdf Highly available deployment across two AZs
  • 84. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Not all metrics are created equal! High-Level Metric (high importance, low volume): • User experience • User count • Transaction count • Replication status Low-Level Metric (relatively low importance, high volume): • HTTP request count • Read vs. write throughout • Cache hit vs. miss
  • 85. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. High-Level Metrics monitoring Low-Level Metrics monitoring Low-Level Metrics monitoring High-Level Metrics monitoring High-Level Metrics monitoring App monitoring agents App monitoring agents Region A Region B Replicate only High-Level Metrics. Use region tags.
  • 86. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Some useful open source projects
  • 87. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Zuul & Isthmus from Netflix Request rerouting for users coming from unwanted region, handling load balancer failures
  • 88. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. EVCache from Netflix For remote cache invalidation and synchronization
  • 89. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Conclusions
  • 90. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Conclusions o Avoid synchronous replication & simultaneous deployments as much as possible o Design applications for idempotency & eventual consistency as much as possible o Closely monitor replication & code sync delays o Have push buttons ready to switch traffic for tenants o Make High-Level Metrics monitoring systems also Multi-Region
  • 91. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Conclusions o It is an involved exercise. It requires careful planning and design. o However, various AWS services make implementation much easier by doing undifferentiated heavy lifting for our customers.
  • 92. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Conclusions o For companies with extremely high availability requirements or/and geographically distributed user base, benefits of Multi-Region Active-Active architecture can be profound. o In such cases, consider designing applications for Multi-Region Active-Active implementation from DAY ONE. But, again as we say in Amazon, today is DAY ONE!
  • 93. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. THANK YOU!