© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Under the Hood of Route 53
Gavin McCullagh
System Development Engineer
Amazon Route 53
ARC408
Alec Peterson
General Manager
Amazon Route 53
It’s not DNS
There’s no way it’s DNS
It was DNS
u/SSBroski
A haiku about DNS
Agenda
Design for high availability: Amazon Route 53
public DNS data plane
Redundancy, redundancy, redundancy
Blast radius reduction
Customer isolation
Constant work (maybe)
Agenda
Goal: A good discussion about design patterns
Questions, discussions, debates are welcome
Really
Definitions
PoP: (or point of presence) basic data center footprint.
Multiple DNS servers. Often co-located.
Data plane: the DNS service that answers queries.
Consists of many PoPs.
Control plane: the Web API that accepts calls to create
and update zones and records.
Blast radius: the scope of impact when a problem occurs.
Eyeball/transit: networks that host clients vs. the
transit providers that interconnect them
Availability SLAs
SLA = service level agreement
99.9% SLA
1 min 26.4 sec per day
43 min 49.7 sec per month
8 hours 45 min 57.0 sec per year
Availability SLAs
99.99% SLA
8.6 sec per day
4 min 23 sec per month
52 min 35.7 sec per year
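The downtime figures on the SLA slides can be reproduced with a few lines of arithmetic. The sketch below assumes a mean Gregorian year of 365.2425 days and a month of one twelfth of that; this is an assumption, chosen because it matches the numbers shown. The function names are illustrative.

```python
# Downtime budget implied by an availability SLA.
# Assumption: a year is 365.2425 days and a month is one twelfth of a year,
# which reproduces the per-month and per-year figures on the slides.

SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365.2425 * SECONDS_PER_DAY
SECONDS_PER_MONTH = SECONDS_PER_YEAR / 12


def downtime_budget(sla_percent: float) -> dict:
    """Return the allowed downtime in seconds per day, month, and year."""
    unavailable = 1 - sla_percent / 100
    return {
        "day": unavailable * SECONDS_PER_DAY,
        "month": unavailable * SECONDS_PER_MONTH,
        "year": unavailable * SECONDS_PER_YEAR,
    }


def fmt(seconds: float) -> str:
    """Format a duration in seconds as 'H h M min S.S sec'."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{int(hours)} h {int(minutes)} min {secs:.1f} sec"


for sla in (99.9, 99.99, 100.0):
    budget = downtime_budget(sla)
    print(sla, {period: fmt(s) for period, s in budget.items()})
```

At 100% the budget is zero in every period, which is the point of the next slide.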
Availability SLAs
100% SLA
Makes the math 100% easier
Why 100%?
Every 99.99% SLA service depends on DNS
What about DNS caching?
Suppose 300 resolvers each cache a record with a TTL of 60 seconds
Some resolver features help
Prefetching: refresh cached records before they expire
Stale caching: use the last-known good answer
Most resolvers don’t do any of this
TTLs: 5 sec (Amazon Simple Storage Service [Amazon S3], Amazon DynamoDB,
Amazon Relational Database Service [Amazon RDS]), 1 sec (Amazon Aurora).
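One way to read the 300-resolver example: if cache expiry times are spread evenly across the TTL, some resolver has to come back to the authoritative servers every fraction of a second, so caching alone does not hide a short outage. The sketch below works through that arithmetic. The uniform-expiry model and the function names are assumptions for illustration, not something stated in the talk.

```python
def mean_gap_between_expiries(resolvers: int, ttl_seconds: float) -> float:
    """Average time between cache expirations across all resolvers,
    assuming expiry times are spread uniformly over one TTL."""
    return ttl_seconds / resolvers


def resolvers_affected(resolvers: int, ttl_seconds: float,
                       outage_seconds: float) -> float:
    """Expected number of resolvers whose cached answer expires during an
    outage, under the same uniform-expiry assumption and with no
    prefetching or stale caching."""
    fraction = min(1.0, outage_seconds / ttl_seconds)
    return resolvers * fraction


# 300 resolvers, TTL 60 s: how often does some cache expire, and how many
# resolvers would notice a 5-second outage?
print(mean_gap_between_expiries(300, 60))
print(resolvers_affected(300, 60, outage_seconds=5))
```

With the 5-second and 1-second TTLs listed above, the gap between expirations shrinks further, which makes the same argument more strongly.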
Design goals
100% data plane availability
Support all AWS customers
Customer isolation
Low latency
Affordable
Common failures
Things that can fail:
Hosts, switches, routers, power, PoPs
Network paths, transit providers, TLDs
Solution: Disposable, independent PoPs
Uncommon failures
How could a whole data plane fail?
Deployments
Operator Makes Global Change
Common Routing
Common Transit
Redundant data planes
A Route 53 delegation set:
gavinmc.com. 172800 IN NS ns-190.awsdns-23.com.
gavinmc.com. 172800 IN NS ns-1084.awsdns-07.org.
gavinmc.com. 172800 IN NS ns-1831.awsdns-36.co.uk.
gavinmc.com. 172800 IN NS ns-634.awsdns-15.net.
DNS resolvers retry against each NS
Each data plane (“stripe”) is one /23 subnet, routed
independently
Our stripes deployed, operated separately
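The retry behaviour that makes four independent stripes useful can be sketched as a loop over the delegation set. This is a simplified model: real resolvers pick nameservers by measured latency and other heuristics rather than in list order, and `query` and `make_query` are hypothetical stand-ins for the network.

```python
from typing import Callable, Optional, Sequence, Set

# The four nameservers from the delegation set on the slide, one per stripe.
DELEGATION_SET = [
    "ns-190.awsdns-23.com.",
    "ns-1084.awsdns-07.org.",
    "ns-1831.awsdns-36.co.uk.",
    "ns-634.awsdns-15.net.",
]

Query = Callable[[str, str], Optional[str]]


def resolve(name: str, nameservers: Sequence[str], query: Query) -> Optional[str]:
    """Try each nameserver in turn and return the first answer.

    `query(nameserver, name)` is a hypothetical transport function that
    returns an answer string, or None on timeout or failure.
    """
    for ns in nameservers:
        answer = query(ns, name)
        if answer is not None:
            return answer
    return None  # every stripe failed


def make_query(failed_suffixes: Set[str]) -> Query:
    """Build a fake transport in which whole stripes (by name suffix) are down."""
    def query(ns: str, name: str) -> Optional[str]:
        if any(ns.endswith(suffix) for suffix in failed_suffixes):
            return None
        return f"answer for {name} from {ns}"
    return query


# Losing the COM and ORG stripes still leaves CO.UK and NET to answer.
print(resolve("gavinmc.com.", DELEGATION_SET, make_query({".com.", ".org."})))
```

The zone only becomes unresolvable when all four stripes fail at once, which is why the stripes are routed, deployed, and operated independently.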
Result
Loss of one data plane has minimal impact for any customer
PoP black holing
Routing problem
Transit provider congestion event
TLD failure
Questions
How big is your failure domain?
Do you operate isolated, redundant failure
domains?
Is this a pattern you use or would consider?
What pros/cons do you see?
Blast radius reduction—Anycast
20x PoPs advertising each IP prefix to BGP
Resolvers hit nearest PoP for each stripe
Reduces blast radius, improves latency
If a PoP fails, we route it elsewhere
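A toy model of anycast for one stripe: every PoP announces the same prefix, a resolver lands on whichever announcing PoP is cheapest to reach, and a failed PoP withdraws its announcement so traffic moves to the next one. Real BGP path selection is far richer than a single cost number. The PoP names are borrowed from the deployment slide; the costs are made up.

```python
from typing import Dict, Optional


def nearest_pop(path_cost: Dict[str, float],
                advertising: Dict[str, bool]) -> Optional[str]:
    """Pick the PoP a resolver's packets would reach for one anycast prefix.

    `path_cost` maps PoP name to a made-up routing cost from this resolver;
    `advertising` says whether each PoP currently announces the prefix.
    Lowest cost stands in for BGP best-path selection.
    """
    candidates = [pop for pop, up in advertising.items() if up]
    if not candidates:
        return None
    return min(candidates, key=lambda pop: path_cost[pop])


# Hypothetical costs from one resolver to three PoPs.
costs = {"FRA53": 10.0, "JFK1": 80.0, "SEA4": 140.0}
announcing = {"FRA53": True, "JFK1": True, "SEA4": True}

print(nearest_pop(costs, announcing))   # FRA53
announcing["FRA53"] = False             # the PoP fails and withdraws its route
print(nearest_pop(costs, announcing))   # JFK1
```

Only resolvers that were routed to the failed PoP are affected, and only for the stripe that PoP served.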
COM stripe
NET stripe
ORG stripe
CO.UK stripe
Non-striped anycast
Route 53
Blast Radius: Deployments
CO.UK NET ORG COM OnePoP Gamma
TST1 FRA53
ATL50 EWR50 JFK1 IAD12
ORD51 SEA4 SFO9 ORD50
PHL50 JFK5 JFK6 ORD54
Results
Data plane failures are typically geo and stripe contained
Individual bad clients are geo contained
Deployment failures are geo and stripe contained
Latency Trade-Offs
Questions?
How do you contain Blast Radius?
Do you align deployments with blast radius?
Do you partition your service endpoints?
Customer isolation
Prevent customers impacting each other
Trade-offs:
Multi-tenant services offer cost efficiency
Single tenant gives isolation, but is expensive
Blast radius
Horizontal scaling
Sharding
Shuffle sharding
Shuffle sharding Route 53
Route 53 has 512x nameservers per stripe
Every hosted zone gets one NS on each stripe
Guaranteed max overlap of 2 nameservers between any two delegation sets
A Route 53 delegation set:
gavinmc.com. 172800 IN NS ns-190.awsdns-23.com.
gavinmc.com. 172800 IN NS ns-634.awsdns-15.net.
gavinmc.com. 172800 IN NS ns-1084.awsdns-07.org.
gavinmc.com. 172800 IN NS ns-1831.awsdns-36.co.uk.
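A minimal sketch of how delegation sets with a bounded overlap could be allocated: pick one nameserver index per stripe at random and reject any candidate that shares more than two nameservers with a set already handed out. The talk does not describe Route 53's actual allocator, so the rejection-sampling approach, the stripe order, and all names below are assumptions.

```python
import random
from typing import List, Tuple

STRIPES = ("com", "net", "org", "co.uk")
NAMESERVERS_PER_STRIPE = 512

DelegationSet = Tuple[int, ...]


def overlap(a: DelegationSet, b: DelegationSet) -> int:
    """Number of stripes on which two delegation sets use the same nameserver."""
    return sum(1 for x, y in zip(a, b) if x == y)


def allocate(count: int, per_stripe: int = NAMESERVERS_PER_STRIPE,
             max_overlap: int = 2, seed: int = 0) -> List[DelegationSet]:
    """Allocate `count` delegation sets, one nameserver index per stripe.

    A candidate is rejected if it shares more than `max_overlap`
    nameservers with any set allocated so far. The pairwise check is
    quadratic, which is fine for a demonstration only.
    """
    rng = random.Random(seed)
    allocated: List[DelegationSet] = []
    while len(allocated) < count:
        candidate = tuple(rng.randrange(per_stripe) for _ in STRIPES)
        if all(overlap(candidate, other) <= max_overlap for other in allocated):
            allocated.append(candidate)
    return allocated


sets = allocate(500)
worst = max(overlap(a, b) for i, a in enumerate(sets) for b in sets[i + 1:])
print(len(sets), "delegation sets, worst pairwise overlap:", worst)
```

With at most two shared nameservers, a problem that takes out every nameserver of one delegation set still leaves any other set with at least two working nameservers.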
Results
Benefits:
High availability and customer isolation at low cost
Rare single-customer impacts are contained to the single customer
Route 53 continually meets 100% availability SLA for customers
Results
Challenges:
Customer experience monitoring can be challenging
Nameservers are easy to confuse or typo
Capacity management can be challenging
Using shuffle sharding
Routing layer, for example, per customer/resource DNS names
For example: Elastic Load Balancing, Amazon CloudFront, unique names
Smart retrying client
Means to withdraw failing endpoints
Multiple redundant endpoints
Route 53 as routing layer
1x DNS name per customer
1x A and/or AAAA record per physical endpoint
Health checks (NB fail open)
WRR combinations of ALIAS to endpoints
Multi-value answers
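The "fail open" note on health checks can be read as: withdraw endpoints that look unhealthy, but if every endpoint looks unhealthy, serve all of them rather than nothing. The sketch below illustrates that reading; the function name and data shapes are hypothetical.

```python
from typing import Dict, List


def answers(records: Dict[str, str], healthy: Dict[str, bool]) -> List[str]:
    """Return the IP addresses to serve for one DNS name.

    Unhealthy endpoints are withdrawn. If no endpoint is reported
    healthy, fail open and return every address, since an empty answer
    guarantees an outage while a stale one might still work.
    """
    up = [ip for name, ip in records.items() if healthy.get(name, False)]
    return up if up else list(records.values())


records = {"endpoint1": "1.2.3.4", "endpoint2": "1.2.3.5", "endpoint3": "1.2.3.6"}

# One endpoint down: it is withdrawn from answers.
print(answers(records, {"endpoint1": True, "endpoint2": False, "endpoint3": True}))
# Everything reported down: fail open and serve all endpoints.
print(answers(records, {"endpoint1": False, "endpoint2": False, "endpoint3": False}))
```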
Example: Weighted round robin
endpoint1.mydomain.com/A 1.2.3.4 (Health Checked)
endpoint2.mydomain.com/A 1.2.3.5 (Health Checked)
endpoint3.mydomain.com/A 1.2.3.6 (Health Checked)
…
customer1.mydomain.com./A WRR(ALIAS endpoint1, ALIAS endpoint2)
customer2.mydomain.com./A WRR(ALIAS endpoint1, ALIAS endpoint3)
customer3.mydomain.com./A WRR(ALIAS endpoint2, ALIAS endpoint3)
…
Example: Multi-value answers
endpoint1.mydomain.com/A 1.2.3.4 (Health Checked)
endpoint2.mydomain.com/A 1.2.3.5 (Health Checked)
endpoint3.mydomain.com/A 1.2.3.6 (Health Checked)
…
customer1.mydomain.com./A MVA(ALIAS endpoint1, ALIAS endpoint2)
customer2.mydomain.com./A MVA(ALIAS endpoint1, ALIAS endpoint3)
customer3.mydomain.com./A MVA(ALIAS endpoint2, ALIAS endpoint3)
…
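Both examples give each customer name a different pair of endpoints. A small sketch of generating such an assignment, assuming each customer simply takes the next unused combination; the function name and the fixed shard size of 2 are illustrative choices.

```python
from itertools import combinations, islice
from math import comb
from typing import Dict, List, Tuple


def assign_shards(customers: List[str], endpoints: List[str],
                  shard_size: int = 2) -> Dict[str, Tuple[str, ...]]:
    """Give each customer a distinct combination of `shard_size` endpoints."""
    available = comb(len(endpoints), shard_size)
    if len(customers) > available:
        raise ValueError(
            f"only {available} distinct shards for {len(customers)} customers")
    shards = islice(combinations(endpoints, shard_size), len(customers))
    return dict(zip(customers, shards))


mapping = assign_shards(
    ["customer1", "customer2", "customer3"],
    ["endpoint1", "endpoint2", "endpoint3"],
)
for customer, shard in mapping.items():
    print(customer, "->", shard)
```

Because every pair is distinct, any two customers share at most one endpoint, and the number of available shards grows quickly with the number of endpoints (`comb(n, 2)`).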
Questions?
Would you build this?
Do you see pros/cons we’ve missed?
What tools would you look for?
“Bimodal is the practice of managing two separate but
coherent styles of work: one focused on predictability;
the other on exploration.”
Gartner IT Glossary
“If your system has a mode change once every six
months, you should plan for an outage about twice a
year.”
Alec Peterson
GM & Plagiarist, Route 53
Non-constant work
Database failure, failover to standby
DC failure, failover to remote DC
Dependency fails, fall back to alt code
path
API Caller changes pattern
Major storage failure, revert to backups
Anti-patterns
Untested bimodal/fallback paths
Do it all the time or don’t ever do it
Optimizing for 99.99% of cases
If X fails (0.01%), try Y
Accepting unbounded work from clients
Throttling, fail fast
DNS data propagation
[Diagram: Route 53 control plane, config data store, PoP1, PoP2, PoP3, PoP4]
Route 53 health checks
[Diagram: your endpoint and checkers in us-east-1, us-west-2, eu-west-1, ap-southeast-1, sa-east-1, ap-southeast-2, us-east-2]
Route 53 DNS API
ListResourceRecordSets is paginated.
API Calls are throttled globally, by
customer and by call type.
If API overloaded, fail requests fast.
Work is bounded to a limit we can sustain.
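Throttling with fast failure can be sketched with token buckets: one global bucket plus one per customer and call type, and a request that finds no token is rejected immediately instead of being queued. The rates, the keying by (customer, call type), and the class names are assumptions for illustration and are not Route 53's real limits.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucket:
    """Allow `rate` calls per second on average, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate, self.burst, self.clock = rate, burst, clock
        self.tokens = burst
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # fail fast: never queue, never block


class Throttle:
    """Throttle globally and per (customer, call type). Limits are made up."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self.global_bucket = TokenBucket(rate=100, burst=100, clock=clock)
        self.per_key: Dict[Tuple[str, str], TokenBucket] = {}

    def allow(self, customer: str, call_type: str) -> bool:
        key = (customer, call_type)
        if key not in self.per_key:
            self.per_key[key] = TokenBucket(rate=5, burst=5, clock=self.clock)
        # Per-key check first, so a noisy caller is rejected before it can
        # drain the global budget shared by everyone else.
        return self.per_key[key].try_acquire() and self.global_bucket.try_acquire()


throttle = Throttle()
print([throttle.allow("customer1", "ListResourceRecordSets") for _ in range(7)])
```

Rejecting excess calls keeps the work the API accepts within what it can sustain, whatever the callers do.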
Is this always possible?
Single-Master Databases
DNS
Nameserver failures == traffic shifts.
Caching/Retries cause surprising query load
increase after outages.
Zone Transfer is incremental, but falls back to
full.
Constant work
Bounded workloads in all operating modes
Prefer redundant work always vs occasionally
increased work
Be wary of optimizing for most but not all workloads. Be
wary of caches.
Throttle APIs, bound their work
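The contrast between "redundant work always" and "occasionally increased work" can be shown with a toy config push. The delta version does work proportional to the number of changes; the snapshot version sends the same fixed-size table every cycle. This is an illustration of the principle only, not a description of how Route 53 propagates data, and all names are hypothetical.

```python
from typing import Dict, List

Config = Dict[str, str]


def push_deltas(previous: Config, current: Config) -> List[str]:
    """Variable work: the payload grows with the number of changed keys,
    so a burst of changes means a burst of work. (Deletions are omitted
    for brevity.)"""
    return [k for k in current if previous.get(k) != current[k]]


def push_snapshot(current: Config, table_size: int) -> List[str]:
    """Constant work: always send a fixed-size table, changed or not.
    Unused slots are padded so quiet and busy cycles cost the same."""
    if len(current) > table_size:
        raise ValueError("config exceeds the bounded table size")
    rows = [f"{k}={v}" for k, v in sorted(current.items())]
    return rows + [""] * (table_size - len(rows))


old = {"a": "1", "b": "2", "c": "3"}
quiet = {"a": "1", "b": "2", "c": "3"}   # nothing changed
busy = {"a": "9", "b": "8", "c": "7"}    # everything changed

print(len(push_deltas(old, quiet)), len(push_deltas(old, busy)))
print(len(push_snapshot(quiet, 8)), len(push_snapshot(busy, 8)))
```

The snapshot path costs more on a quiet day, but it has only one operating mode, so the busy day has already been exercised on every cycle before it arrives.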
Thank you!
Alec Peterson
General Manager
Route 53
Gavin McCullagh
System Development Engineer
Route 53
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

AWS Route 53: Redundant Data Planes and Sharding for Availability and Isolation

  • 2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
      Under the Hood of Route 53 (ARC408)
      Gavin McCullagh, System Development Engineer, Amazon Route 53
      Alec Peterson, General Manager, Amazon Route 53
  • 3. A haiku about DNS (u/SSBroski)
      It’s not DNS
      There’s no way it’s DNS
      It was DNS
  • 4. Agenda
      Design for high availability: Amazon Route 53 public DNS data plane
      Redundancy, redundancy, redundancy
      Blast radius reduction
      Customer isolation
      Constant work (maybe)
  • 5. Agenda
      Goal: A good discussion about design patterns
      Questions, discussions, debates are welcome
      Really
  • 6. Definitions
      PoP (point of presence): basic data center footprint. Multiple DNS servers. Often co-located.
      Data plane: the DNS service that answers queries. Consists of many PoPs.
      Control plane: the Web API that accepts calls to create and update zones and records.
      Blast radius: the scope of impact when a problem occurs.
      Eyeball/transit: networks hosting clients vs. interconnecting transit providers.
  • 8. Availability SLAs
      SLA = service level agreement
      99.9% SLA allows downtime of:
        1 min 26.4 sec per day
        43 min 49.7 sec per month
        8 hours 45 min 57.0 sec per year
  • 9. Availability SLAs
      99.99% SLA allows downtime of:
        8.6 sec per day
        4 min 23 sec per month
        52 min 35.7 sec per year
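
The downtime budgets on the two SLA slides can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming a mean Gregorian year of 365.2425 days and a month of one twelfth of that; the slides do not state which calendar lengths they used, but these values match their figures. The names `allowed_downtime` and `format_duration` are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365.2425  # assumption: mean Gregorian year

# Period lengths in seconds.
PERIODS = {
    "day": SECONDS_PER_DAY,
    "month": SECONDS_PER_DAY * DAYS_PER_YEAR / 12,
    "year": SECONDS_PER_DAY * DAYS_PER_YEAR,
}


def allowed_downtime(availability, period_seconds):
    """Seconds of downtime permitted in one period at the given availability."""
    return (1.0 - availability) * period_seconds


def format_duration(seconds):
    """Render a number of seconds as hours, minutes and seconds."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{int(hours)} h {int(minutes)} min {secs:.1f} s"


if __name__ == "__main__":
    for sla in (0.999, 0.9999, 1.0):
        for name, length in PERIODS.items():
            budget = format_duration(allowed_downtime(sla, length))
            print(f"{sla:.2%} per {name}: {budget}")
```

At 100% the budget is zero for every period, which is the sense in which the next slide says the math gets easier.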
  • 10. Availability SLAs
      100% SLA
      Makes the math 100% easier
      Why 100%? Every 99.99% SLA service depends on DNS
  • 11. What about DNS caching?
      Suppose 300x resolvers cache a TTL of 60 seconds
      Some resolver features help:
        Prefetching: fetch cached records early
        Stale caching: use the last-known good answer
      Most resolvers don’t do any of this
      TTLs: 5 sec (Amazon Simple Storage Service [Amazon S3], Amazon DynamoDB, Amazon Relational Database Service [Amazon RDS]), 1 sec (Amazon Aurora)
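
One possible reading of the "300x resolvers, TTL of 60" line is that caching hides very little of an outage once many resolvers are involved. The sketch below works through that reading. It assumes each resolver re-fetches a busy name once per TTL and that cache expiries are spread evenly over time; neither assumption is stated on the slide, and both function names are illustrative.

```python
def mean_seconds_between_refreshes(resolvers, ttl_seconds):
    """Average gap between cache refreshes arriving at the authoritative
    servers, assuming one refresh per resolver per TTL, evenly staggered."""
    return ttl_seconds / resolvers


def fraction_of_caches_expired(outage_seconds, ttl_seconds):
    """Share of resolvers whose cached answer expires during an outage,
    under the same even-staggering assumption."""
    return min(1.0, outage_seconds / ttl_seconds)


if __name__ == "__main__":
    print(mean_seconds_between_refreshes(300, 60))  # seconds between refreshes
    print(fraction_of_caches_expired(30, 60))       # share affected by a 30 s outage
    print(fraction_of_caches_expired(30, 5))        # same outage with a 5 s TTL
```

Under these assumptions some resolver needs a fresh answer every fraction of a second, so even a short authoritative outage is visible to clients almost immediately, and more so with the 1 to 5 second TTLs listed on the slide.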
  • 12. Design goals
      100% data plane availability
      Support all AWS customers
      Customer isolation
      Low latency
      Affordable
  • 14. Common failures
      Things that can fail:
        Hosts, switches, routers, power, PoPs
        Network paths, transit providers, TLDs
      Solution: disposable, independent PoPs
  • 15. Uncommon failures
      How could a whole data plane fail?
        Deployments
        Operator makes global change
        Common routing
        Common transit
  • 16. Redundant data planes
      A Route 53 delegation set:
        gavinmc.com. 172800 IN NS ns-190.awsdns-23.com.
        gavinmc.com. 172800 IN NS ns-1084.awsdns-07.org.
        gavinmc.com. 172800 IN NS ns-1831.awsdns-36.co.uk.
        gavinmc.com. 172800 IN NS ns-634.awsdns-15.net.
      DNS resolvers retry against each NS
      Each data plane (“stripe”) is one /23 subnet, routed independently
      Stripes are deployed and operated separately
  • 17. Result
      Loss of one data plane has minimal impact for any customer. Examples:
        PoP black holing
        Routing problem
        Transit provider congestion event
        TLD failure
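
The retry behavior that makes one lost stripe survivable can be sketched as a loop over the delegation set. This is a simplified illustration, not how any particular resolver is implemented: real resolvers typically order nameservers by measured latency or at random and apply timeouts, and the `is_reachable` callback here is a hypothetical stand-in for sending a query and waiting for a response.

```python
# Delegation set from the slide above, one nameserver per stripe.
DELEGATION_SET = [
    "ns-190.awsdns-23.com.",
    "ns-1084.awsdns-07.org.",
    "ns-1831.awsdns-36.co.uk.",
    "ns-634.awsdns-15.net.",
]


def resolve(nameservers, is_reachable):
    """Try each nameserver in turn; return (nameserver, attempts) for the
    first one that answers, or (None, attempts) if none does."""
    attempts = 0
    for ns in nameservers:
        attempts += 1
        if is_reachable(ns):
            return ns, attempts
    return None, attempts


if __name__ == "__main__":
    # Simulate the loss of the whole COM stripe.
    com_stripe_down = lambda ns: not ns.endswith(".com.")
    print(resolve(DELEGATION_SET, com_stripe_down))
```

With the COM stripe unreachable, the query costs one extra attempt and is then answered from another stripe, which is the "minimal impact" the Result slide describes.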
  • 18. Questions
      How big is your failure domain?
      Do you operate isolated, redundant failure domains?
      Is this a pattern you use or would consider?
      What pros/cons do you see?
  • 20. Blast radius reduction: Anycast
      20x PoPs advertising each IP prefix to BGP
      Resolvers hit nearest PoP for each stripe
      Reduces blast radius, improves latency
      If a PoP fails, we route it elsewhere
  • 21. COM stripe
  • 22. NET stripe
  • 23. ORG stripe
  • 24. CO.UK stripe
  • 25. Non-striped anycast
  • 26. Route 53
  • 27. Blast radius: deployments
      [Figure: deployment diagram. Column labels: CO.UK, NET, ORG, COM, OnePoP, Gamma. PoP labels: TST1, FRA53, ATL50, EWR50, JFK1, IAD12, ORD51, SEA4, SFO9, ORD50, PHL50, JFK5, JFK6, ORD54.]
  • 28. Results
      Data plane failures are typically geo and stripe contained
      Individual bad clients are geo contained
      Deployment failures are geo and stripe contained
      Latency trade-offs
  • 29. Questions?
      How do you contain blast radius?
      Do you align deployments with blast radius?
      Do you partition your service endpoints?
  • 31. Customer isolation
      Prevent customers impacting each other
      Trade-offs:
        Multi-tenant services offer cost efficiency
        Single tenant gives isolation, but is expensive
        Blast radius
  • 32. Horizontal scaling
  • 33. Horizontal scaling
  • 34. Horizontal scaling
  • 35. Horizontal scaling
  • 36. Horizontal scaling
  • 37. Sharding
  • 38. Sharding
  • 39. Sharding
  • 40. Shuffle sharding
  • 41. Shuffle sharding
  • 42. Shuffle sharding
  • 43. Shuffle sharding
  • 44. Shuffle sharding
  • 45. Shuffle sharding
  • 46. Shuffle sharding Route 53
      Route 53 has 512x nameservers per stripe
      Every hosted zone gets one NS on each stripe
      Guaranteed max overlap of 2x nameservers
      A Route 53 delegation set:
        gavinmc.com. 172800 IN NS ns-190.awsdns-23.com.
        gavinmc.com. 172800 IN NS ns-634.awsdns-15.net.
        gavinmc.com. 172800 IN NS ns-1084.awsdns-07.org.
        gavinmc.com. 172800 IN NS ns-1831.awsdns-36.co.uk.
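
The three numbers on this slide (512 nameservers per stripe, one nameserver per stripe per zone, at most 2 shared nameservers between any two zones) are enough to sketch an allocator. The version below uses random selection with rejection, which is an assumption made for illustration; the slides do not describe how Route 53 actually allocates delegation sets. Nameservers are represented as indexes 0 to 511 within each stripe.

```python
import random

STRIPES = ("com", "net", "org", "co.uk")
NAMESERVERS_PER_STRIPE = 512
MAX_OVERLAP = 2

# Number of distinct delegation sets before the overlap rule is applied.
TOTAL_COMBINATIONS = NAMESERVERS_PER_STRIPE ** len(STRIPES)


def overlap(a, b):
    """Count the stripes on which two delegation sets share a nameserver."""
    return sum(1 for x, y in zip(a, b) if x == y)


def allocate(existing, rng, max_attempts=1000):
    """Pick one nameserver per stripe at random, rejecting any candidate
    that shares more than MAX_OVERLAP nameservers with an existing set."""
    for _ in range(max_attempts):
        candidate = tuple(rng.randrange(NAMESERVERS_PER_STRIPE) for _ in STRIPES)
        if all(overlap(candidate, other) <= MAX_OVERLAP for other in existing):
            return candidate
    raise RuntimeError("no delegation set found within max_attempts")


def allocate_many(count, seed=53):
    """Allocate `count` delegation sets that pairwise respect MAX_OVERLAP."""
    rng = random.Random(seed)
    sets = []
    for _ in range(count):
        sets.append(allocate(sets, rng))
    return sets


if __name__ == "__main__":
    zones = allocate_many(200)
    print(TOTAL_COMBINATIONS, zones[:3])
```

Because any two zones share at most two of their four nameservers, a problem that takes out every nameserver serving one zone still leaves every other zone with at least two working nameservers. The linear scan over existing sets is only suitable for a demonstration at this scale.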
  • 47. Results
      Benefits:
        High availability and customer isolation at low cost
        Rare single-customer impacts are contained to the single customer
        Route 53 continually meets 100% availability SLA for customers
  • 48. Results
      Challenges:
        Customer experience monitoring can be challenging
        Nameservers are easy to confuse or typo
        Capacity management can be challenging
  • 49. Using shuffle sharding
      Routing layer, for example, per customer/resource DNS names
        For example: Elastic Load Balancing, Amazon CloudFront, unique names
      Smart retrying client
      Means to withdraw failing endpoints
      Multiple redundant endpoints
  • 50. Route 53 as routing layer
      1x DNS name per customer
      1x A and/or AAAA record per physical endpoint
      Health checks (NB fail open)
      WRR combinations of ALIAS to endpoints
      Multi-value answers
  • 51. Example: Weighted round robin
        endpoint1.mydomain.com/A 1.2.3.4 (Health Checked)
        endpoint2.mydomain.com/A 1.2.3.5 (Health Checked)
        endpoint3.mydomain.com/A 1.2.3.6 (Health Checked)
        …
        customer1.mydomain.com./A WRR(ALIAS endpoint1, ALIAS endpoint2)
        customer2.mydomain.com./A WRR(ALIAS endpoint1, ALIAS endpoint3)
        customer3.mydomain.com./A WRR(ALIAS endpoint2, ALIAS endpoint3)
        …
  • 52. Example: Multi-value answers
        endpoint1.mydomain.com/A 1.2.3.4 (Health Checked)
        endpoint2.mydomain.com/A 1.2.3.5 (Health Checked)
        endpoint3.mydomain.com/A 1.2.3.6 (Health Checked)
        …
        customer1.mydomain.com./A MVA(ALIAS endpoint1, ALIAS endpoint2)
        customer2.mydomain.com./A MVA(ALIAS endpoint1, ALIAS endpoint3)
        customer3.mydomain.com./A MVA(ALIAS endpoint2, ALIAS endpoint3)
        …
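
The customer-to-endpoint mapping in the two examples above is every 2-element combination of the endpoint pool, handed out one per customer. The sketch below rebuilds that mapping in plain Python and models which addresses a customer name would return when endpoints are unhealthy. It is a local simulation only and makes no Route 53 API calls; the function name `answers` is illustrative, and the fail-open branch is a simplified interpretation of the "NB fail open" note on the routing-layer slide.

```python
from itertools import combinations

# Endpoint records from the slides.
ENDPOINTS = {
    "endpoint1": "1.2.3.4",
    "endpoint2": "1.2.3.5",
    "endpoint3": "1.2.3.6",
}
CUSTOMERS = ["customer1", "customer2", "customer3"]

# Each customer gets a distinct pair of endpoints (its shuffle shard).
SHARDS = list(combinations(sorted(ENDPOINTS), 2))
ASSIGNMENT = dict(zip(CUSTOMERS, SHARDS))


def answers(customer, unhealthy):
    """Addresses served for a customer name given a set of unhealthy endpoints.

    If every endpoint in the shard is unhealthy, return all of them
    (fail open) rather than an empty answer.
    """
    shard = ASSIGNMENT[customer]
    healthy = [ENDPOINTS[e] for e in shard if e not in unhealthy]
    return healthy if healthy else [ENDPOINTS[e] for e in shard]


if __name__ == "__main__":
    for customer in CUSTOMERS:
        print(customer, ASSIGNMENT[customer], answers(customer, {"endpoint1"}))
```

With three endpoints and pairs, any single endpoint failure leaves every customer with one healthy address, and no two customers share the same pair, so a customer that overloads both of its endpoints fully affects only itself.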
  • 53. Questions?
      Would you build this?
      Do you see pros/cons we’ve missed?
      What tools would you look for?
  • 55. “Bimodal is the practice of managing two separate but coherent styles of work: one focused on predictability; the other on exploration.” Gartner IT Glossary
  • 56. “If your system has a mode change once every six months, you should plan for an outage about twice a year.” Alec Peterson GM & Plagiarist, Route 53
  • 57. Non-constant work
      Database failure, failover to standby
      DC failure, failover to remote DC
      Dependency fails, fall back to alternative code path
      API caller changes pattern
      Major storage failure, revert to backups
  • 58. Anti-patterns
      Untested bimodal/fallback paths
        Do it all the time or don’t ever do it
      Optimizing for 99.99% of cases
        If X fails (0.01%), try Y
      Accepting unbounded work from clients
        Throttling, fail fast
  • 59. DNS data propagation
      [Figure: Route 53 Control Plane, Config Data Store, and PoP1, PoP2, PoP3, PoP4.]
  • 60. Route 53 health checks
      [Figure: your endpoint, checked by checkers in us-east-1, us-west-2, eu-west-1, ap-southeast-1, sa-east-1, ap-southeast-2, and us-east-2.]
  • 61. Route 53 DNS API
      ListResourceRecordSets is paginated.
      API calls are throttled globally, by customer and by call type.
      If API overloaded, fail requests fast.
      Work is bounded to a limit we can sustain.
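
Throttling with fast failure is commonly built on a token bucket, sketched below. This is a generic illustration of the idea on the slide, not a description of how the Route 53 API is implemented; the class name, rate, and capacity are all assumptions. The clock is passed in explicitly so the behavior is deterministic and easy to test.

```python
class TokenBucket:
    """Admit at most `rate` requests per second with bursts up to `capacity`."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)
        self.updated = now

    def allow(self, now):
        """Return True and spend a token if one is available.

        Returns False immediately otherwise: the caller is rejected
        (fail fast) instead of being queued, so pending work stays bounded.
        """
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    # Hypothetical limit: 1 request per second, burst of 2.
    bucket = TokenBucket(rate=1, capacity=2)
    print([bucket.allow(now=t) for t in (0.0, 0.0, 0.0, 1.0)])
```

To mirror "globally, by customer and by call type", one bucket could be kept per scope (for example keyed by customer and by call name), with a request admitted only if every applicable bucket allows it.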
  • 62. Is this always possible?
      Single-master databases
      DNS:
        Nameserver failures == traffic shifts.
        Caching/retries cause surprising query load increase after outages.
        Zone transfer is incremental, but falls back to full.
  • 63. Constant work
      Bounded workloads in all operating modes
      Prefer redundant work always vs occasionally increased work
      Be wary of optimizing for most but not all workloads.
      Be wary of caches.
      Throttle APIs, bound their work
  • 64. Thank you!
      Alec Peterson, General Manager, Route 53
      Gavin McCullagh, System Development Engineer, Route 53