PUBLIC SECTOR SUMMIT
Washington, DC
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
PUBLIC SECTOR SUMMIT
Failure is not an Option
Designing Highly Resilient AWS Systems
Tim Griesbach
Manager, Solutions Architecture
AWS WWPS
302955
Agenda
What are we planning for? Risk and Resiliency requirements.
Think resiliently. Principles of Resiliency
Resilient design. System, Test, and Operations patterns & best practices
“Everything fails, all the time”
- Werner Vogels
(CTO, Amazon.com)
Resiliency is the ability of a system to recover quickly and continue operating even when a failure occurs.
Push for resiliency
IT failures lead to broad societal impact
• Government Services, Airline, Financial, Communications
Reputation / Legal
• More and more people depend on IT systems for everything.
$$$
• Lost productivity, idle time of people dependent on system
• Lost productivity, time putting out fires and recovering
• Lost revenue from system
What are we planning for?
Consider each application's significance to your business, and the potential impact if a disruption occurs.
Cause | Examples | Probability
Operator error | Manual operator error | HIGH
Deployment induced | Software, hardware, network, or configuration deployment; both automated and manual changes | HIGH
Load induced | Change in behavior, either of a specific caller or in aggregate; service reaching a tipping point; load failures can occur in the network; distributed denial of service (DDoS) | HIGH
Data induced | Data accepted by the system that it can’t process (“poison pill”) | MED
Credential expiration | Expiration of a certificate or credentials | MED
Hardware failure | Any hardware component in the system, e.g. hosts, storage, network, or elsewhere | LOW
Infrastructure | Power feed or environmental conditions | LOW
On-premises data center realities
• Traditional DR “weekend tests”
• Connected to the internet? Exposed to same external attacks
• DR site is always “ACTIVE” and hence you are paying for resources
• Data is constantly replicated
• Datacenter security compliance is expensive and resource intensive
Think Resiliently. Principles of Resiliency
AVAILABILITY, RELIABILITY, AND RESILIENCE IN 21ST CENTURY ARCHITECTURES
Test recovery procedures
Automatically recover from failure
Scale horizontally to improve availability
Stop guessing capacity
Manage change through automation
What do those 9’s really mean?
Availability | Max disruption (per year) | Max disruption (per month) | Max disruption (per day)
99% | 3 days 15 hours | 7.31 hours | 14.4 minutes
99.9% | 8 hours 45 minutes | 43.83 minutes | 1.44 minutes
99.95% | 4 hours 22 minutes | 21.92 minutes | 43.2 seconds
99.99% | 52 minutes | 4.38 minutes | 8.64 seconds
99.999% | 5 minutes | 26.3 seconds | 864 milliseconds
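The figures above follow from simple arithmetic: the allowed disruption is the unavailable fraction multiplied by the length of the period. A minimal sketch of that calculation follows. The function name and the 365.25-day year are assumptions chosen for illustration (the latter matches the monthly column, for example 7.31 hours at 99%).

```python
from datetime import timedelta

# Period lengths. A 365.25-day year is an assumption for this sketch.
YEAR = timedelta(days=365.25)
MONTH = YEAR / 12
DAY = timedelta(days=1)


def max_disruption(availability_percent: float, period: timedelta) -> timedelta:
    """Return the downtime budget for a period at the given availability."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return period * unavailable_fraction


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.95, 99.99, 99.999):
        print(
            f"{target}%:",
            max_disruption(target, YEAR), "per year,",
            max_disruption(target, MONTH), "per month,",
            max_disruption(target, DAY), "per day",
        )
```

Running the same function against a proposed SLA is a quick way to check whether an availability target is realistic for a team that recovers manually.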
Recovery Point and Recovery Time Objective
[Figure: timeline with labels Disaster, Recovery point, Data loss, Recovery time, Down time, Time]
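The two objectives can be checked against an actual incident with a little date arithmetic: the data loss window runs from the last recovery point to the disaster, and the down time runs from the disaster to restoration. This is a minimal sketch. The class, field names, and timestamps are hypothetical and only illustrate the definitions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Incident:
    last_recovery_point: datetime  # last backup or replicated state that can be restored
    disaster_at: datetime          # moment the failure occurred
    service_restored_at: datetime  # moment the system was serving again

    @property
    def data_loss_window(self) -> timedelta:
        """Work done after the recovery point is lost."""
        return self.disaster_at - self.last_recovery_point

    @property
    def downtime(self) -> timedelta:
        """Time the system was unavailable."""
        return self.service_restored_at - self.disaster_at


def meets_objectives(incident: Incident, rpo: timedelta, rto: timedelta) -> bool:
    """True when data loss stays within the RPO and downtime within the RTO."""
    return incident.data_loss_window <= rpo and incident.downtime <= rto


if __name__ == "__main__":
    incident = Incident(
        last_recovery_point=datetime(2019, 6, 11, 9, 45),
        disaster_at=datetime(2019, 6, 11, 10, 0),
        service_restored_at=datetime(2019, 6, 11, 11, 30),
    )
    print(incident.data_loss_window, incident.downtime)
    print(meets_objectives(incident, rpo=timedelta(minutes=15), rto=timedelta(hours=1)))
```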
Risk Management
You control how you manage your own risks
Customer Risk Appetite and Desired Control Environment: Business Risks, Sourcing Risks, Technology Risks, Security Risks, Compliance
Customer Provided and Managed Controls: Classification, Security Policy, Encryption, Governance, ITDaM, ITSM, Monitoring, Operations, Malware
AWS Provided, Customer Configured and Managed Controls: Virtual Private Cloud, Key Management, Logging, Other AWS features and services
AWS Managed and Audited Controls: SOC 1, SOC 2, PCI-DSS, NIST 800-53, ISO 27001
Design Systems Resiliently
GALL’S LAW
A complex system that works is invariably found to have evolved from a simple system that worked.
It’s not binary. Start somewhere and scale up.
“There is no compression algorithm for experience.”
AWS has had 13+ years to build the world’s most reliable, secure, scalable, and
cost-effective infrastructure.
• Your operational DNA has to be crafted for reliability.
• Service SLAs between 99.9% and 100% availability
• Amazon S3 is designed for 99.999999999% durability
• AWS Availability Zones exist on isolated fault lines, flood plains, networks, and local electrical grids to
substantially reduce the chance of simultaneous failure.
• Disaster is inevitable; automation + redundancy = availability.
We are driven to remove any and all causes of failure. Our goal is to make our operational
performance indistinguishable from perfect.
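The "automation + redundancy = availability" point can be made concrete with the standard arithmetic for independent components. This is a minimal sketch and not a figure from the talk: the function names are hypothetical, the 99% input is illustrative, and the formulas assume failures are independent, which is what Availability Zone isolation is meant to approximate.

```python
def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of redundant copies where any one copy can serve.

    Assumes independent failures: the system is down only when every copy is down.
    """
    if copies < 1:
        raise ValueError("at least one copy is required")
    return 1.0 - (1.0 - component_availability) ** copies


def serial_availability(*availabilities: float) -> float:
    """Availability of a chain of hard dependencies: all must be up."""
    result = 1.0
    for availability in availabilities:
        result *= availability
    return result


if __name__ == "__main__":
    # One 99% component versus the same component in two or three locations.
    for copies in (1, 2, 3):
        print(copies, parallel_availability(0.99, copies))
    # Two 99.9% services that both have to work.
    print(serial_availability(0.999, 0.999))
```

Redundancy raises availability quickly, while every additional hard dependency lowers it, which is one reason to start simple and add pieces deliberately.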
AWS REGIONAL EXPANSION
23 Regions and 67 AZs
2 GovCloud Regions Today
4 New Regions and 12 AZs
New GovCloud, TS, and Secret Regions Coming Soon
AWS CONNECTIVITY OPTIONS
Amazon Global Network
[Map legend: AWS Region with Multiple Edge Locations; Amazon CloudFront PoPs; AWS Direct Connect Location]
96 AWS Direct Connect locations
Customers can reach every public AWS Region from the local Direct Connect location (except China)
Resilient AWS Cloud
Infrastructure
Regions, AZs, Networking
Service Design
Cell-based architecture
Multi-AZ architecture
Microservice architecture
Distributed systems best practices
Understand the AWS Services scope
Single AZ, Regional, Global, Cross-Regional capability
Figure 3
Resilient Networking
Networking is the foundation
Packets must get from point A to point B
Ensure the network supporting your applications is appropriately redundant, always available, and seamlessly routed.
AWS provides a global infrastructure with 20 Regions and 61 Availability Zones (at the time of publication)
AWS services
Amazon EC2 networking
Amazon Virtual Private Cloud (VPC), VPC Peering, VPC Sharing
AWS gateways for external, internal, and back to on-premises routing (VPN, Transit)
DNS (Amazon Route 53)
Elastic Load Balancing (ELB)
Resilient Data
Must have confidence in the resilience of your data
Many forms: filesystem, block storage, databases, in memory caches
Consider how eventual consistency impacts design
AWS services
Amazon S3 cross-region replication
Cross region snapshots (Amazon EBS volumes)
Amazon RDS cross region replicas
AWS Storage & File Gateway
Amazon FSx for Windows and Lustre
Figure 10
Self-Healing applications
Highly resilient applications must be able to self-heal.
How
Leverage Microservices app architecture
Decouple inter-dependencies, loose coupling
Remove state from app components
AWS services
Elastic Load Balancing
AWS Auto Scaling
Amazon Simple Queue Service
Single Region: Multi AZ
Start here before adopting more complex architecture
Only consider multi-region if requirements dictate
Pros
Availability of AWS Region-wide services, including Amazon S3, Amazon DynamoDB, Amazon EFS, Amazon SQS, Amazon Kinesis
Much less complexity in design, implementation, and
operations.
Cons
If you need >99.9% resiliency, consider multi-region.
May not meet needs of regulators
Multi-Region: Active-Standby
Traditional DR Pattern
Backup env used in event of failure only
Pros
For Apps which cannot use native AWS features
Least # changes to the application
Cons
Delays while Standby becomes Active (hrs)
RPO limited by replication lag
AWS Services
Amazon RDS
Amazon Route 53
Multi-Region: Active-active
Both stacks active, traffic distributed
Data replication critical, must consider latency impacts
Pros
Zero RTO
Works well for apps that can partition users
Cons
Data replication must be handled by Applications
AWS Services
Storage replication from APN partners
Amazon RDS
Amazon DynamoDB
Amazon Aurora
AWS Database Migration Service
Amazon Route 53
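Partitioning users across active stacks can be as simple as a deterministic mapping from user to home Region, with a fallback when that Region is unhealthy. The sketch below is one possible approach, not the method from the talk. The function names and Region list are hypothetical, and in practice the routing decision would usually sit in DNS or at the edge rather than in application code.

```python
import hashlib
from typing import AbstractSet, Sequence


def home_region(user_id: str, regions: Sequence[str]) -> str:
    """Map a user to a stable home Region using a hash of the user ID."""
    if not regions:
        raise ValueError("at least one region is required")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(regions)
    return regions[index]


def route(user_id: str, regions: Sequence[str], healthy: AbstractSet[str]) -> str:
    """Prefer the home Region; otherwise use the next healthy Region in order."""
    start = list(regions).index(home_region(user_id, regions))
    for offset in range(len(regions)):
        candidate = regions[(start + offset) % len(regions)]
        if candidate in healthy:
            return candidate
    raise RuntimeError("no healthy region available")


if __name__ == "__main__":
    regions = ["us-east-1", "us-west-2"]
    print(route("alice", regions, healthy={"us-east-1", "us-west-2"}))
    print(route("alice", regions, healthy={"us-west-2"}))
```

Keeping a user pinned to one Region under normal conditions limits how often the same record is written in two places, which eases the replication concern listed under Cons.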
Multi-Region: Dual-write
Shared nothing architecture – all TX processed in
duplicate/parallel
Good for legacy applications
Pros
Zero RPO
Little/No change to apps in each region
Cons
Requires checkpointing
Reconciliation jobs to ensure sites in sync
Downstream apps must avoid duplicates
AWS Services
AWS Lambda
Amazon Route 53
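Because every transaction is processed in both Regions, a downstream consumer sees each one twice and has to apply it once. A minimal sketch of an idempotent consumer follows. It assumes each transaction carries a unique ID; the class and field names are hypothetical, and a real system would keep the seen-ID set in durable storage rather than in memory.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass(frozen=True)
class Transaction:
    tx_id: str         # unique ID assigned before the dual write (assumption)
    amount_cents: int


@dataclass
class DownstreamLedger:
    balance_cents: int = 0
    seen_tx_ids: Set[str] = field(default_factory=set)

    def apply(self, tx: Transaction) -> bool:
        """Apply a transaction once; return False when it is a duplicate."""
        if tx.tx_id in self.seen_tx_ids:
            return False
        self.seen_tx_ids.add(tx.tx_id)
        self.balance_cents += tx.amount_cents
        return True


if __name__ == "__main__":
    ledger = DownstreamLedger()
    deposit = Transaction(tx_id="tx-001", amount_cents=2500)
    print(ledger.apply(deposit))  # copy delivered by the first Region
    print(ledger.apply(deposit))  # same transaction from the second Region
    print(ledger.balance_cents)
```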
Anti-Patterns
• Replicating on-premises problems & patterns to the cloud
• Use of non-redundant architectures to meet schedules
• Single data center (single Availability Zone) architectures
• Reusing manual processes
• Data retention practices, Failover & Scaling
• Responding to monitoring alerts and metrics (vs self-healing, auto scaling)
• Assuming data is safe in your data center
Don't sacrifice long-term value
for short-term results
Resilient operations, often overlooked
Operations is a key pillar of resiliency
Success in operations means you
• Implement changes successfully and consistently
• Have insight into operational health
• Have insight into achievement of business outcomes
• Respond in a timely and effective manner to events impacting the application
How?
• Perform operations as code
• Annotate documentation
• Make frequent, small, reversible changes
• Refine operations procedures frequently
• Anticipate failure
• Learn from operational failures
Operations monitoring
Must detect failures fast
Applications emit telemetry so failures can be detected
Processes defined and understood
AWS Services
Amazon CloudWatch
AWS Personal Health Dashboard
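Fast detection usually means evaluating emitted telemetry against a threshold over a short window, so a single noisy datapoint does not page anyone but a sustained problem does. The sketch below shows an "M out of N datapoints" evaluator in plain Python. It is an illustration of the idea only, not the Amazon CloudWatch API, and the class name, threshold, and sample values are hypothetical.

```python
from collections import deque


class ErrorRateAlarm:
    """Alarm when at least `datapoints_to_alarm` of the last
    `evaluation_periods` datapoints exceed `threshold`."""

    def __init__(self, threshold: float, datapoints_to_alarm: int, evaluation_periods: int):
        if not 1 <= datapoints_to_alarm <= evaluation_periods:
            raise ValueError("datapoints_to_alarm must be between 1 and evaluation_periods")
        self.threshold = threshold
        self.datapoints_to_alarm = datapoints_to_alarm
        # Sliding window of booleans: True means the datapoint breached the threshold.
        self._window = deque(maxlen=evaluation_periods)

    def record(self, error_rate: float) -> str:
        """Add one datapoint and return the resulting state: "OK" or "ALARM"."""
        self._window.append(error_rate > self.threshold)
        breaching = sum(self._window)
        return "ALARM" if breaching >= self.datapoints_to_alarm else "OK"


if __name__ == "__main__":
    alarm = ErrorRateAlarm(threshold=0.05, datapoints_to_alarm=2, evaluation_periods=3)
    for rate in (0.01, 0.10, 0.02, 0.20, 0.01, 0.01):
        print(rate, alarm.record(rate))
```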
Operations automation
“A key mechanism to achieve this is to automate
the management as much as possible, removing
error prone, manual operations.” - Werner Vogels
Application deployment
Infrastructure as code – integrates infrastructure and application change processes
Examples include staged deployment, canary deployments, isolation zone deployments, and
automatic roll back
AWS Services
AWS CodeBuild
AWS CodeCommit
AWS CodeDeploy
AWS CodePipeline
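A canary deployment with automatic rollback can be sketched as a loop over traffic stages that stops at the first unhealthy one. This is a simplified model and not how AWS CodeDeploy is configured: the function name, the stage percentages, and the error-rate callback are all hypothetical stand-ins for a real deployment tool and its monitoring.

```python
from typing import Callable, Sequence, Tuple


def canary_rollout(
    stages: Sequence[int],
    observed_error_rate: Callable[[int], float],
    max_error_rate: float,
) -> Tuple[str, int]:
    """Shift traffic to the new version stage by stage.

    Returns ("completed", last_stage) when every stage stays healthy, or
    ("rolled_back", failing_stage) as soon as one stage breaches the limit.
    """
    if not stages:
        raise ValueError("at least one traffic stage is required")
    for percent in stages:
        if observed_error_rate(percent) > max_error_rate:
            # Automatic rollback: all traffic returns to the old version.
            return ("rolled_back", percent)
    return ("completed", stages[-1])


if __name__ == "__main__":
    # Hypothetical new version that only misbehaves under heavier load.
    def flaky(percent: int) -> float:
        return 0.05 if percent >= 25 else 0.001

    print(canary_rollout([5, 25, 100], flaky, max_error_rate=0.01))
    print(canary_rollout([5, 25, 100], lambda percent: 0.001, max_error_rate=0.01))
```

The small first stage is the point of the pattern: a bad change is caught while it affects only a fraction of requests.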
Testing enforces resiliency
Application testing & certification
Resilient systems continue to operate successfully in the presence of failures.
Failure Mode and Effects Analysis (FMEA) – industry standard technique
Estimate a risk priority number (RPN) between 1 and 1000
Rank probability, severity, and observability each on a 1-10 scale, where 1 is good and 10 is bad, and multiply them.
A perfectly low probability, low impact, easy to measure risk has an RPN of 1.
An extremely frequent, permanently damaging, impossible to detect risk has an RPN of 1000.
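The RPN arithmetic described above is a product of three ratings, which makes it easy to rank failure modes in a few lines. In this minimal sketch the function name is hypothetical and the ratings in the usage example are made-up values for illustration, not assessments from the talk.

```python
def risk_priority_number(probability: int, severity: int, observability: int) -> int:
    """Multiply three 1-10 ratings (1 is good, 10 is bad) into an RPN of 1-1000."""
    ratings = {
        "probability": probability,
        "severity": severity,
        "observability": observability,
    }
    for name, rating in ratings.items():
        if not 1 <= rating <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {rating}")
    return probability * severity * observability


if __name__ == "__main__":
    # Hypothetical ratings: (probability, severity, observability).
    failure_modes = {
        "Operator error during deployment": (8, 6, 4),
        "Expired certificate": (5, 7, 3),
        "Loss of a single host": (3, 2, 2),
    }
    ranked = sorted(
        failure_modes.items(),
        key=lambda item: risk_priority_number(*item[1]),
        reverse=True,
    )
    for name, scores in ranked:
        print(risk_priority_number(*scores), name)
```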
Failure impact analysis
Failure | Effect | Mitigation | Result
Failure of an AZ | Temporary capacity reduction | Automatic failover to secondary AZ | Temporary performance degradation
Total failure of satellite region | Data replication offline | Repair/reconfigure replication using alternate region | No service interruption
Partition of network between regions | Data replication offline | Auto recovery when network is available | No service interruption
Total failure of primary region | Service offline | Failover to secondary region | Service restored within two hours
Chaos engineering
The cloud has ushered in a new method of testing
Principles of Chaos Engineering – “Chaos Engineering can be thought of as the facilitation of experiments to uncover systemic weaknesses.” https://principlesofchaos.org/
Principles
Build a hypothesis around steady state behavior
Apply variations to simulate real world events
Run experiments in production
Automate the experiments to run continuously
Minimize blast radius of failures
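The principles above map onto a simple experiment loop: confirm steady state, inject one failure, measure again, and always restore. The sketch below runs that loop against a toy in-memory service. Everything in it is hypothetical (the service model, the names, the 99% threshold), and a real experiment would target actual infrastructure with far more careful guard rails.

```python
class ReplicatedService:
    """Toy service: a request succeeds while any replica is healthy
    (a load balancer that skips unhealthy replicas is assumed)."""

    def __init__(self, replicas: int):
        self.healthy = [True] * replicas

    def handle_request(self) -> bool:
        return any(self.healthy)


def success_rate(service: ReplicatedService, requests: int = 100) -> float:
    """Fraction of sampled requests that succeed."""
    return sum(service.handle_request() for _ in range(requests)) / requests


def run_experiment(service: ReplicatedService, victim: int, minimum_success_rate: float = 0.99) -> bool:
    """Return True when the steady-state hypothesis holds during the failure."""
    # 1. Hypothesis: the success rate stays at or above the minimum.
    if success_rate(service) < minimum_success_rate:
        raise RuntimeError("system is not in steady state; aborting experiment")
    # 2. Inject a real-world event: take one replica out of service.
    service.healthy[victim] = False
    try:
        during_failure = success_rate(service)
    finally:
        # 3. Always restore, so the blast radius stays limited to the experiment.
        service.healthy[victim] = True
    return during_failure >= minimum_success_rate


if __name__ == "__main__":
    print(run_experiment(ReplicatedService(replicas=3), victim=0))
    print(run_experiment(ReplicatedService(replicas=1), victim=0))
```

A failed hypothesis, as in the single-replica case, is the useful outcome: it points at a weakness before a real outage does.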
Continuous Testing of Infrastructure
Regularly execute tests in stable, production & production-like test environments.
Treat Infrastructure as Code
• CI/CD Test in Infrastructure Build Pipeline
• Testing of infrastructure during Integration Test
Future Considerations
• Consider serverless: it reduces maintenance and moves the responsibility for the resilient design to AWS.
• Take advantage of our distributed systems by building on top of them – Amazon S3/AWS Lambda/Amazon ECS.
• Break systems down into smaller pieces along logical seams. Reduce the blast radius of a failure of any individual piece of the system.
• Use the AWS Well-Architected Tool to assess your applications - https://aws.amazon.com/well-architected-tool
Additional Resources
• AWS Well-Architected - https://aws.amazon.com/architecture/well-architected
• AWS Whitepaper: Building Mission-Critical Financial Services Applications on AWS, April 2019
• AWS re:Invent 2018: Close Loops & Opening Minds: How to Take Control of Systems, Big & Small - https://www.youtube.com/watch?v=O8xLxNje30M
• Building Microservices: Designing Fine-Grained Systems
• AWS re:Invent 2018: Architecture Patterns for Multi-Region Active-Active Applications (ARC209-R2)
Thank you!
Tim Griesbach
awstim@amazon.com

More Related Content

What's hot

Route53 및 CloudFront를 이용한 CDN 활용기 - AWS Summit Seoul 2017
Route53 및 CloudFront를 이용한 CDN 활용기 - AWS Summit Seoul 2017Route53 및 CloudFront를 이용한 CDN 활용기 - AWS Summit Seoul 2017
Route53 및 CloudFront를 이용한 CDN 활용기 - AWS Summit Seoul 2017
Amazon Web Services Korea
 
Chaos engineering & Gameday on AWS
Chaos engineering & Gameday on AWSChaos engineering & Gameday on AWS
Chaos engineering & Gameday on AWS
Bilal Aybar
 
An Introduction to the AWS Well Architected Framework - Webinar
An Introduction to the AWS Well Architected Framework - WebinarAn Introduction to the AWS Well Architected Framework - Webinar
An Introduction to the AWS Well Architected Framework - Webinar
Amazon Web Services
 
AWS Summit Seoul 2023 | 가격은 저렴, 성능은 최대로! 확 달라진 Amazon EC2 알아보기
AWS Summit Seoul 2023 | 가격은 저렴, 성능은 최대로! 확 달라진 Amazon EC2 알아보기AWS Summit Seoul 2023 | 가격은 저렴, 성능은 최대로! 확 달라진 Amazon EC2 알아보기
AWS Summit Seoul 2023 | 가격은 저렴, 성능은 최대로! 확 달라진 Amazon EC2 알아보기
Amazon Web Services Korea
 
DevOps, Common use cases, Architectures, Best Practices
DevOps, Common use cases, Architectures, Best PracticesDevOps, Common use cases, Architectures, Best Practices
DevOps, Common use cases, Architectures, Best Practices
Shiva Narayanaswamy
 
AWS Summit Seoul 2023 | 스타트업의 서버리스 기반 SaaS 데이터 처리 및 데이터웨어하우스 구축 사례
AWS Summit Seoul 2023 | 스타트업의 서버리스 기반 SaaS 데이터 처리 및 데이터웨어하우스 구축 사례AWS Summit Seoul 2023 | 스타트업의 서버리스 기반 SaaS 데이터 처리 및 데이터웨어하우스 구축 사례
AWS Summit Seoul 2023 | 스타트업의 서버리스 기반 SaaS 데이터 처리 및 데이터웨어하우스 구축 사례
Amazon Web Services Korea
 
Amazon EKS - Elastic Container Service for Kubernetes
Amazon EKS - Elastic Container Service for KubernetesAmazon EKS - Elastic Container Service for Kubernetes
Amazon EKS - Elastic Container Service for Kubernetes
Amazon Web Services
 
Using AWS Control Tower to govern multi-account AWS environments at scale - G...
Using AWS Control Tower to govern multi-account AWS environments at scale - G...Using AWS Control Tower to govern multi-account AWS environments at scale - G...
Using AWS Control Tower to govern multi-account AWS environments at scale - G...
Amazon Web Services
 
Azure DevOps Presentation
Azure DevOps PresentationAzure DevOps Presentation
Azure DevOps Presentation
InCycleSoftware
 
Improve Developer Experience with Developer Portal
Improve Developer Experience with Developer PortalImprove Developer Experience with Developer Portal
Improve Developer Experience with Developer Portal
Kumton Suttiraksiri
 
Architecture Patterns for Multi-Region Active-Active Applications (ARC209-R2)...
Architecture Patterns for Multi-Region Active-Active Applications (ARC209-R2)...Architecture Patterns for Multi-Region Active-Active Applications (ARC209-R2)...
Architecture Patterns for Multi-Region Active-Active Applications (ARC209-R2)...
Amazon Web Services
 
AWS 기반 클라우드 아키텍처 모범사례 - 삼성전자 개발자 포털/개발자 워크스페이스 - 정영준 솔루션즈 아키텍트, AWS / 유현성 수석,...
AWS 기반 클라우드 아키텍처 모범사례 - 삼성전자 개발자 포털/개발자 워크스페이스 - 정영준 솔루션즈 아키텍트, AWS / 유현성 수석,...AWS 기반 클라우드 아키텍처 모범사례 - 삼성전자 개발자 포털/개발자 워크스페이스 - 정영준 솔루션즈 아키텍트, AWS / 유현성 수석,...
AWS 기반 클라우드 아키텍처 모범사례 - 삼성전자 개발자 포털/개발자 워크스페이스 - 정영준 솔루션즈 아키텍트, AWS / 유현성 수석,...
Amazon Web Services Korea
 
Developer Experience on AWS
Developer Experience on AWSDeveloper Experience on AWS
Developer Experience on AWS
Amazon Web Services
 
Azure DevOps
Azure DevOpsAzure DevOps
Azure DevOps
Juan Fabian
 
AWS Summit Seoul 2023 | SOCAR는 어떻게 2만대의 차량을 운영할까?: IoT Data의 수집부터 분석까지
AWS Summit Seoul 2023 | SOCAR는 어떻게 2만대의 차량을 운영할까?: IoT Data의 수집부터 분석까지AWS Summit Seoul 2023 | SOCAR는 어떻게 2만대의 차량을 운영할까?: IoT Data의 수집부터 분석까지
AWS Summit Seoul 2023 | SOCAR는 어떻게 2만대의 차량을 운영할까?: IoT Data의 수집부터 분석까지
Amazon Web Services Korea
 
Azure DevOps Best Practices Webinar
Azure DevOps Best Practices WebinarAzure DevOps Best Practices Webinar
Azure DevOps Best Practices Webinar
Cambay Digital
 
Introduction to the Well-Architected Framework and Tool - SVC208 - Anaheim AW...
Introduction to the Well-Architected Framework and Tool - SVC208 - Anaheim AW...Introduction to the Well-Architected Framework and Tool - SVC208 - Anaheim AW...
Introduction to the Well-Architected Framework and Tool - SVC208 - Anaheim AW...
Amazon Web Services
 
비즈니스 리더를 위한 디지털 트랜스포메이션 트렌드 - 김지현, 김영현 AWS 사업개발 매니저 :: AWS re:Invent re:Cap 2021
비즈니스 리더를 위한 디지털 트랜스포메이션 트렌드 - 김지현, 김영현 AWS 사업개발 매니저 :: AWS re:Invent re:Cap 2021비즈니스 리더를 위한 디지털 트랜스포메이션 트렌드 - 김지현, 김영현 AWS 사업개발 매니저 :: AWS re:Invent re:Cap 2021
비즈니스 리더를 위한 디지털 트랜스포메이션 트렌드 - 김지현, 김영현 AWS 사업개발 매니저 :: AWS re:Invent re:Cap 2021
Amazon Web Services Korea
 
DevOps beyond the Tools
DevOps beyond the ToolsDevOps beyond the Tools
DevOps beyond the Tools
Johann-Peter Hartmann
 
FinOps - AWS Cost and Operational Efficiency - Pop-up Loft Tel Aviv
FinOps - AWS Cost and Operational Efficiency - Pop-up Loft Tel AvivFinOps - AWS Cost and Operational Efficiency - Pop-up Loft Tel Aviv
FinOps - AWS Cost and Operational Efficiency - Pop-up Loft Tel Aviv
Amazon Web Services
 

What's hot (20)

Route53 및 CloudFront를 이용한 CDN 활용기 - AWS Summit Seoul 2017
Route53 및 CloudFront를 이용한 CDN 활용기 - AWS Summit Seoul 2017Route53 및 CloudFront를 이용한 CDN 활용기 - AWS Summit Seoul 2017
Route53 및 CloudFront를 이용한 CDN 활용기 - AWS Summit Seoul 2017
 
Chaos engineering & Gameday on AWS
Chaos engineering & Gameday on AWSChaos engineering & Gameday on AWS
Chaos engineering & Gameday on AWS
 
An Introduction to the AWS Well Architected Framework - Webinar
An Introduction to the AWS Well Architected Framework - WebinarAn Introduction to the AWS Well Architected Framework - Webinar
An Introduction to the AWS Well Architected Framework - Webinar
 
AWS Summit Seoul 2023 | 가격은 저렴, 성능은 최대로! 확 달라진 Amazon EC2 알아보기
AWS Summit Seoul 2023 | 가격은 저렴, 성능은 최대로! 확 달라진 Amazon EC2 알아보기AWS Summit Seoul 2023 | 가격은 저렴, 성능은 최대로! 확 달라진 Amazon EC2 알아보기
AWS Summit Seoul 2023 | 가격은 저렴, 성능은 최대로! 확 달라진 Amazon EC2 알아보기
 
DevOps, Common use cases, Architectures, Best Practices
DevOps, Common use cases, Architectures, Best PracticesDevOps, Common use cases, Architectures, Best Practices
DevOps, Common use cases, Architectures, Best Practices
 
AWS Summit Seoul 2023 | 스타트업의 서버리스 기반 SaaS 데이터 처리 및 데이터웨어하우스 구축 사례
AWS Summit Seoul 2023 | 스타트업의 서버리스 기반 SaaS 데이터 처리 및 데이터웨어하우스 구축 사례AWS Summit Seoul 2023 | 스타트업의 서버리스 기반 SaaS 데이터 처리 및 데이터웨어하우스 구축 사례
AWS Summit Seoul 2023 | 스타트업의 서버리스 기반 SaaS 데이터 처리 및 데이터웨어하우스 구축 사례
 
Amazon EKS - Elastic Container Service for Kubernetes
Amazon EKS - Elastic Container Service for KubernetesAmazon EKS - Elastic Container Service for Kubernetes
Amazon EKS - Elastic Container Service for Kubernetes
 
Using AWS Control Tower to govern multi-account AWS environments at scale - G...
Using AWS Control Tower to govern multi-account AWS environments at scale - G...Using AWS Control Tower to govern multi-account AWS environments at scale - G...
Using AWS Control Tower to govern multi-account AWS environments at scale - G...
 
Azure DevOps Presentation
Azure DevOps PresentationAzure DevOps Presentation
Azure DevOps Presentation
 
Improve Developer Experience with Developer Portal
Improve Developer Experience with Developer PortalImprove Developer Experience with Developer Portal
Improve Developer Experience with Developer Portal
 
Architecture Patterns for Multi-Region Active-Active Applications (ARC209-R2)...
Architecture Patterns for Multi-Region Active-Active Applications (ARC209-R2)...Architecture Patterns for Multi-Region Active-Active Applications (ARC209-R2)...
Architecture Patterns for Multi-Region Active-Active Applications (ARC209-R2)...
 
AWS 기반 클라우드 아키텍처 모범사례 - 삼성전자 개발자 포털/개발자 워크스페이스 - 정영준 솔루션즈 아키텍트, AWS / 유현성 수석,...
AWS 기반 클라우드 아키텍처 모범사례 - 삼성전자 개발자 포털/개발자 워크스페이스 - 정영준 솔루션즈 아키텍트, AWS / 유현성 수석,...AWS 기반 클라우드 아키텍처 모범사례 - 삼성전자 개발자 포털/개발자 워크스페이스 - 정영준 솔루션즈 아키텍트, AWS / 유현성 수석,...
AWS 기반 클라우드 아키텍처 모범사례 - 삼성전자 개발자 포털/개발자 워크스페이스 - 정영준 솔루션즈 아키텍트, AWS / 유현성 수석,...
 
Developer Experience on AWS
Developer Experience on AWSDeveloper Experience on AWS
Developer Experience on AWS
 
Azure DevOps
Azure DevOpsAzure DevOps
Azure DevOps
 
AWS Summit Seoul 2023 | SOCAR는 어떻게 2만대의 차량을 운영할까?: IoT Data의 수집부터 분석까지
AWS Summit Seoul 2023 | SOCAR는 어떻게 2만대의 차량을 운영할까?: IoT Data의 수집부터 분석까지AWS Summit Seoul 2023 | SOCAR는 어떻게 2만대의 차량을 운영할까?: IoT Data의 수집부터 분석까지
AWS Summit Seoul 2023 | SOCAR는 어떻게 2만대의 차량을 운영할까?: IoT Data의 수집부터 분석까지
 
Azure DevOps Best Practices Webinar
Azure DevOps Best Practices WebinarAzure DevOps Best Practices Webinar
Azure DevOps Best Practices Webinar
 
Introduction to the Well-Architected Framework and Tool - SVC208 - Anaheim AW...
Introduction to the Well-Architected Framework and Tool - SVC208 - Anaheim AW...Introduction to the Well-Architected Framework and Tool - SVC208 - Anaheim AW...
Introduction to the Well-Architected Framework and Tool - SVC208 - Anaheim AW...
 
비즈니스 리더를 위한 디지털 트랜스포메이션 트렌드 - 김지현, 김영현 AWS 사업개발 매니저 :: AWS re:Invent re:Cap 2021
비즈니스 리더를 위한 디지털 트랜스포메이션 트렌드 - 김지현, 김영현 AWS 사업개발 매니저 :: AWS re:Invent re:Cap 2021비즈니스 리더를 위한 디지털 트랜스포메이션 트렌드 - 김지현, 김영현 AWS 사업개발 매니저 :: AWS re:Invent re:Cap 2021
비즈니스 리더를 위한 디지털 트랜스포메이션 트렌드 - 김지현, 김영현 AWS 사업개발 매니저 :: AWS re:Invent re:Cap 2021
 
DevOps beyond the Tools
DevOps beyond the ToolsDevOps beyond the Tools
DevOps beyond the Tools
 
FinOps - AWS Cost and Operational Efficiency - Pop-up Loft Tel Aviv
FinOps - AWS Cost and Operational Efficiency - Pop-up Loft Tel AvivFinOps - AWS Cost and Operational Efficiency - Pop-up Loft Tel Aviv
FinOps - AWS Cost and Operational Efficiency - Pop-up Loft Tel Aviv
 

Similar to Failure is not an Option - Designing Highly Resilient AWS Systems

Scale - Failure is not an Option: Designing Highly Resilient AWS Systems
Scale - Failure is not an Option: Designing Highly Resilient AWS SystemsScale - Failure is not an Option: Designing Highly Resilient AWS Systems
Scale - Failure is not an Option: Designing Highly Resilient AWS Systems
Amazon Web Services
 
NIST Compliance, AWS Federal Pop-Up Loft
NIST Compliance, AWS Federal Pop-Up LoftNIST Compliance, AWS Federal Pop-Up Loft
NIST Compliance, AWS Federal Pop-Up Loft
Amazon Web Services
 
Hybrid Solutions at the Edge – Go Global Faster, Efficiently, and More Secure...
Hybrid Solutions at the Edge – Go Global Faster, Efficiently, and More Secure...Hybrid Solutions at the Edge – Go Global Faster, Efficiently, and More Secure...
Hybrid Solutions at the Edge – Go Global Faster, Efficiently, and More Secure...
Amazon Web Services
 
Cybersecurity: A Drive Force Behind Cloud Adoption
Cybersecurity: A Drive Force Behind Cloud AdoptionCybersecurity: A Drive Force Behind Cloud Adoption
Cybersecurity: A Drive Force Behind Cloud Adoption
Amazon Web Services
 
Automated Security Remediation
Automated Security RemediationAutomated Security Remediation
Automated Security Remediation
Amazon Web Services
 
AWS PROTECTED - Why This Matters to Australia.
AWS PROTECTED - Why This Matters to Australia.AWS PROTECTED - Why This Matters to Australia.
AWS PROTECTED - Why This Matters to Australia.
Amazon Web Services
 
Innovate - Cybersecurity: A Drive Force Behind Cloud Adoption
Innovate - Cybersecurity: A Drive Force Behind Cloud AdoptionInnovate - Cybersecurity: A Drive Force Behind Cloud Adoption
Innovate - Cybersecurity: A Drive Force Behind Cloud Adoption
Amazon Web Services
 
Cost Optimization on AWS
Cost Optimization on AWSCost Optimization on AWS
Cost Optimization on AWS
Amazon Web Services
 
Innovate - Become Migration Ready: Accelerate and Optimise your Cloud Adoptio...
Innovate - Become Migration Ready: Accelerate and Optimise your Cloud Adoptio...Innovate - Become Migration Ready: Accelerate and Optimise your Cloud Adoptio...
Innovate - Become Migration Ready: Accelerate and Optimise your Cloud Adoptio...
Amazon Web Services
 
Cost Optimisation
Cost OptimisationCost Optimisation
Cost Optimisation
Amazon Web Services
 
From Monolith to Microservices
From Monolith to MicroservicesFrom Monolith to Microservices
From Monolith to Microservices
Amazon Web Services
 
2. migration, disaster recovery and business continuity in the cloud
2. migration, disaster recovery and business continuity in the cloud2. migration, disaster recovery and business continuity in the cloud
2. migration, disaster recovery and business continuity in the cloud
Reham Maher El-Safarini
 
Cost Optimization on AWS (REPEAT)
Cost Optimization on AWS (REPEAT)Cost Optimization on AWS (REPEAT)
Cost Optimization on AWS (REPEAT)
Amazon Web Services
 
Breaking Up the Monolith with Containers
Breaking Up the Monolith with ContainersBreaking Up the Monolith with Containers
Breaking Up the Monolith with ContainersAmazon Web Services
 
Leaping Over the Skills Gap - Accelerate Your Journey with AMS
Leaping Over the Skills Gap - Accelerate Your Journey with AMSLeaping Over the Skills Gap - Accelerate Your Journey with AMS
Leaping Over the Skills Gap - Accelerate Your Journey with AMS
Amazon Web Services
 
Continuous Diagnostics and Mitigation (CDM) at Cloud Scale: How Federal Agenc...
Continuous Diagnostics and Mitigation (CDM) at Cloud Scale: How Federal Agenc...Continuous Diagnostics and Mitigation (CDM) at Cloud Scale: How Federal Agenc...
Continuous Diagnostics and Mitigation (CDM) at Cloud Scale: How Federal Agenc...
Amazon Web Services
 
Desktop-as-a-Service: Flexible Application Delivery to Cloud-Native Desktops
Desktop-as-a-Service: Flexible Application Delivery to Cloud-Native DesktopsDesktop-as-a-Service: Flexible Application Delivery to Cloud-Native Desktops
Desktop-as-a-Service: Flexible Application Delivery to Cloud-Native Desktops
Amazon Web Services
 
以容器技術為基礎的混合雲設計架構
以容器技術為基礎的混合雲設計架構以容器技術為基礎的混合雲設計架構
以容器技術為基礎的混合雲設計架構
Amazon Web Services
 
How Nubank is building a customer-obsessed bank - FSV201 - New York AWS Summit
How Nubank is building a customer-obsessed bank - FSV201 - New York AWS SummitHow Nubank is building a customer-obsessed bank - FSV201 - New York AWS Summit
How Nubank is building a customer-obsessed bank - FSV201 - New York AWS Summit
Amazon Web Services
 




Failure is not an Option - Designing Highly Resilient AWS Systems

  • 1. Public Sector Summit, Washington, DC
  • 2. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Failure is not an Option: Designing Highly Resilient AWS Systems. Tim Griesbach, Manager, Solutions Architecture, AWS WWPS. 302955
  • 3. Agenda
      • What are we planning for? Risk and resiliency requirements.
      • Think resiliently. Principles of resiliency.
      • Resilient design. System, test, and operations patterns and best practices.
  • 4. “Everything fails, all the time” - Werner Vogels (CTO, Amazon.com)
  • 5. Resiliency is the ability of a system to recover quickly and continue operating even when a failure occurs.
  • 6. Push for resiliency
      • IT failures lead to broad societal impact: government services, airlines, financial, communications.
      • Reputation / legal: more and more people depend on IT systems for everything.
      • $$$: lost productivity (idle time of people dependent on the system), lost productivity (time spent putting out fires and recovering), lost revenue from the system.
  • 7. What are we planning for?
  • 8.
  • 9.
  • 10.
  • 11. Consider each application's significance to your business, and the potential impact if a disruption occurs.
  • 12.
  • 13. Causes of failure (cause, examples, probability)
      • Operator error (probability: HIGH): manual operator error.
      • Deployment induced (probability: HIGH): software, hardware, network, or configuration deployment; both automated and manual changes.
      • Load induced (probability: HIGH): change in behavior, either of a specific caller or in aggregate; service reaching a tipping point; load failures can occur in the network; denial of service (DDoS).
      • Data induced (probability: MED): data accepted by the system that it can’t process (“poison pill”).
      • Credential expiration (probability: MED): expiration of a certificate or credentials.
      • Hardware failure (probability: LOW): any hardware component in the system, i.e. hosts, storage, network, or elsewhere.
      • Infrastructure (probability: LOW): power feed or environmental conditions.
  • 14. On-premises data center realities
      • Traditional DR “weekend tests”
      • Connected to the internet? Then exposed to the same external attacks.
      • DR site is always “ACTIVE”, so you are always paying for its resources.
      • Data is constantly replicated.
      • Data center security compliance is expensive and resource intensive.
  • 15. Think resiliently. Principles of resiliency.
  • 16. Availability, reliability, and resilience in 21st century architectures
      • Test recovery procedures
      • Automatically recover from failure
      • Scale horizontally to improve availability
      • Stop guessing capacity
      • Manage change through automation
  • 17. What do those 9’s really mean? (availability: max disruption per year, per month, per day)
      • 99%: 3 days 15 hours per year, 7.31 hours per month, 14.4 minutes per day
      • 99.9%: 8 hours 45 minutes per year, 43.83 minutes per month, 1.44 minutes per day
      • 99.95%: 4 hours 22 minutes per year, 21.92 minutes per month, 43.2 seconds per day
      • 99.99%: 52 minutes per year, 4.38 minutes per month, 8.64 seconds per day
      • 99.999%: 5 minutes per year, 26.3 seconds per month, 864 milliseconds per day
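The figures on this slide follow from simple arithmetic: the allowed downtime is the unavailable fraction multiplied by the length of the period. The sketch below reproduces them, assuming a 365.25-day year and a month of one twelfth of a year (these period lengths are an assumption chosen because they match the slide's monthly values; the function name is illustrative).

```python
HOURS_PER_DAY = 24.0
HOURS_PER_YEAR = 365.25 * 24.0          # assumed year length
HOURS_PER_MONTH = HOURS_PER_YEAR / 12.0  # assumed average month


def max_disruption_seconds(availability_pct: float, period_hours: float) -> float:
    """Return the downtime budget, in seconds, for one period."""
    unavailable_fraction = 1.0 - availability_pct / 100.0
    return unavailable_fraction * period_hours * 3600.0


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.95, 99.99, 99.999):
        per_year_h = max_disruption_seconds(pct, HOURS_PER_YEAR) / 3600.0
        per_month_min = max_disruption_seconds(pct, HOURS_PER_MONTH) / 60.0
        per_day_s = max_disruption_seconds(pct, HOURS_PER_DAY)
        print(f"{pct}%: {per_year_h:.2f} h/year, "
              f"{per_month_min:.2f} min/month, {per_day_s:.3f} s/day")
```

Running the loop gives a quick way to check a proposed availability target against what the business can actually tolerate.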
  • 18. Recovery Point and Recovery Time Objective (timeline diagram; labels: Disaster, Recovery point, Data loss, Recovery time, Down time, Time)
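One way to read the timeline: data loss is the gap between the last recovery point and the disaster, and down time is the gap between the disaster and the moment service is restored. The sketch below measures both for a single incident and compares them with the objectives. The class and function names, and the timestamps in the usage example, are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    last_recovery_point: datetime  # last backup/replica known to be good
    disaster: datetime             # moment the failure occurred
    service_restored: datetime     # moment the system was usable again

    @property
    def data_loss_window(self) -> timedelta:
        """Work done after the last recovery point is lost."""
        return self.disaster - self.last_recovery_point

    @property
    def downtime(self) -> timedelta:
        """Time the system was unavailable."""
        return self.service_restored - self.disaster


def meets_objectives(incident: Incident, rpo: timedelta, rto: timedelta) -> bool:
    """True if the incident stayed within both the RPO and the RTO."""
    return incident.data_loss_window <= rpo and incident.downtime <= rto


if __name__ == "__main__":
    incident = Incident(
        last_recovery_point=datetime(2019, 6, 11, 9, 45),
        disaster=datetime(2019, 6, 11, 10, 0),
        service_restored=datetime(2019, 6, 11, 11, 30),
    )
    print(incident.data_loss_window, incident.downtime)
    print(meets_objectives(incident, rpo=timedelta(minutes=15), rto=timedelta(hours=2)))
```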
  • 19. You control how you manage your own risks (diagram)
      • Customer risk appetite and desired control environment: business risks, sourcing risks, technology risks, security risks
      • Customer provided and managed controls: classification, security policy, encryption, governance, ITDaM, ITSM, monitoring, operations, malware, risk management
      • AWS provided, customer configured and managed controls: Virtual Private Cloud, key management, logging, other AWS features and services
      • AWS managed and audited controls: SOC 1, SOC 2, PCI-DSS, NIST 800-53, ISO 27001
      • Diagram labels: Compliance, AWS, Customers
  • 20. Design systems resiliently
  • 21. Gall’s Law: A complex system that works is invariably found to have evolved from a simple system that worked.
  • 22. It’s not binary. Start somewhere and scale up.
  • 23. “There is no compression algorithm for experience.” AWS has had 13+ years to build the world’s most reliable, secure, scalable, and cost-effective infrastructure.
      • Your operational DNA has to be crafted for reliability.
      • Service SLAs between 99.9% and 100% availability
      • Amazon S3 is designed for 99.999999999% durability
      • AWS Availability Zones exist on isolated fault lines, flood plains, networks, and local electrical grids to substantially reduce the chance of simultaneous failure.
      • Disaster is inevitable; automation + redundancy = availability.
      We are driven to remove any and all causes of failure. Our goal is to make our operational performance indistinguishable from perfect.
  • 24. AWS regional expansion
      • 23 Regions and 67 AZs
      • 4 new Regions and 12 AZs
      • 2 GovCloud Regions today
      • New GovCloud, TS, and Secret Regions coming soon
  • 25. AWS connectivity options
      • Map legend: Amazon Global Network; AWS Region with multiple edge locations; Amazon CloudFront PoPs; AWS Direct Connect location
      • 96 AWS Direct Connect locations
      • Customers can reach every public AWS Region from the local Direct Connect location (except China)
  • 26. Resilient AWS cloud infrastructure (Figure 3)
      • Regions, AZs, networking
      • Service design: cell-based architecture, multi-AZ architecture, microservice architecture, distributed systems best practices
      • Understand the scope of each AWS service: single-AZ, Regional, global, cross-Region capability
  • 27. Resilient networking
      • Networking is the foundation: packets must get from point A to point B.
      • Ensure the network supporting your applications is appropriately redundant, always available, and seamlessly routed.
      • AWS provides a global infrastructure with 20 Regions and 61 Availability Zones (at the time of publication).
      • AWS services: Amazon EC2 networking; Amazon Virtual Private Cloud (VPC), VPC Peering, VPC Sharing; AWS gateways for external, internal, and back-to-on-premises routing (VPN, Transit); DNS (Route 53); Elastic Load Balancing (ELB)
  • 28. Resilient data (Figure 10)
      • You must have confidence in the resilience of your data.
      • Data takes many forms: filesystems, block storage, databases, in-memory caches.
      • Consider how eventual consistency impacts design.
      • AWS services: Amazon S3 cross-Region replication; cross-Region snapshots (Amazon EBS volumes); Amazon RDS cross-Region replicas; AWS Storage Gateway and File Gateway; Amazon FSx for Windows and Lustre
  • 29. Self-healing applications
      • Highly resilient applications must be able to self-heal.
      • How: leverage a microservices application architecture; decouple inter-dependencies (loose coupling); remove state from application components.
      • AWS services: Elastic Load Balancing, AWS Auto Scaling, Amazon Simple Queue Service
  • 30.
  • 31. Single Region: Multi-AZ
      • Start here before adopting a more complex architecture.
      • Only consider multi-Region if requirements dictate.
      • Pros: availability of AWS Region-wide services, including Amazon S3, Amazon DynamoDB, Amazon EFS, Amazon SQS, and Amazon Kinesis; much less complexity in design, implementation, and operations.
      • Cons: if you need >99.9% resiliency, consider multi-Region; may not meet the needs of regulators.
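To reason about when a single Region is enough, it can help to estimate how redundancy and dependencies combine. The sketch below uses the textbook formulas for redundant and serial components. It assumes failures are independent, which is a simplification that is not stated on the slide, and the function names and input figures are illustrative rather than AWS commitments.

```python
import math


def redundant_availability(single: float, copies: int) -> float:
    """Availability of `copies` independent replicas when any one suffices."""
    return 1.0 - (1.0 - single) ** copies


def serial_availability(*dependencies: float) -> float:
    """Availability of a chain in which every dependency must be up."""
    return math.prod(dependencies)


if __name__ == "__main__":
    # Hypothetical: one deployment at 99.9%, duplicated in a second location.
    print(redundant_availability(0.999, 2))
    # Hypothetical: an application that needs two 99.9% dependencies at once.
    print(serial_availability(0.999, 0.999))
```

Under these assumptions, duplicating a 99.9% component raises the estimate, while chaining two hard dependencies lowers it, which is one reason to remove dependencies before adding Regions.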
  • 32. Multi-Region: Active-Standby
      • Traditional DR pattern; the backup environment is used only in the event of failure.
      • Pros: suits applications that cannot use native AWS features; fewest changes to the application.
      • Cons: delays while Standby becomes Active (hours); RPO limited by replication lag.
      • AWS services: Amazon RDS, Amazon Route 53
  • 33. Multi-Region: Active-Active
      • Both stacks are active and traffic is distributed between them.
      • Data replication is critical; latency impacts must be considered.
      • Pros: zero RTO; works well for applications that can partition users.
      • Cons: data replication must be handled by applications.
      • AWS services: storage replication from APN partners, Amazon RDS, Amazon DynamoDB, Amazon Aurora, AWS Database Migration Service, Amazon Route 53
  • 34. Multi-Region: Dual-write
      • Shared-nothing architecture: all transactions are processed in duplicate, in parallel.
      • Good for legacy applications.
      • Pros: zero RPO; little or no change to applications in each Region.
      • Cons: requires checkpointing; reconciliation jobs are needed to keep sites in sync; downstream applications must avoid duplicates.
      • AWS services: AWS Lambda, Amazon Route 53
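The requirement that downstream applications avoid duplicates can be illustrated with an idempotent consumer: since both Regions emit every transaction, the consumer applies each transaction ID once and ignores the second copy. This is a minimal in-memory sketch. The class name and ID scheme are hypothetical, and a real system would keep the set of seen IDs in durable storage.

```python
from typing import List, Set


class DeduplicatingConsumer:
    """Applies each transaction once, even if both Regions deliver it."""

    def __init__(self) -> None:
        self._seen_ids: Set[str] = set()  # in-memory only; durable in practice
        self.applied: List[str] = []

    def handle(self, tx_id: str, payload: str) -> bool:
        """Apply the payload unless tx_id was already processed.

        Returns True if the payload was applied, False if it was a duplicate.
        """
        if tx_id in self._seen_ids:
            return False
        self._seen_ids.add(tx_id)
        self.applied.append(payload)
        return True


if __name__ == "__main__":
    consumer = DeduplicatingConsumer()
    # The same transaction arrives from two Regions (hypothetical IDs).
    print(consumer.handle("tx-1", "debit 10"))  # first copy
    print(consumer.handle("tx-1", "debit 10"))  # duplicate from the other Region
    print(consumer.applied)
```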
  • 35. Anti-patterns
      • Replicating on-premises problems and patterns in the cloud
      • Using non-redundant architectures to meet schedules
      • Single data center (Availability Zone) architectures
      • Reusing manual processes
      • Data retention practices, failover, and scaling
      • Responding to monitoring alerts and metrics (vs. self-healing, auto scaling)
      • Assuming data is safe in your data center
      Don't sacrifice long-term value for short-term results.
  • 36. Resilient operations, often overlooked
  • 37. Operations is a key pillar of resiliency
      • Success in operations means you: implement changes successfully and consistently; have insight into operational health; have insight into the achievement of business outcomes; respond in a timely and effective manner to events impacting the application.
      • How? Perform operations as code; annotate documentation; make frequent, small, reversible changes; refine operations procedures frequently; anticipate failure; learn from operational failures.
  • 38. Operations monitoring
      • Failures must be detected fast.
      • Applications emit telemetry so failures can be detected.
      • Processes are defined and understood.
      • AWS services: Amazon CloudWatch, AWS Personal Health Dashboard
  • 39. Operations automation: “A key mechanism to achieve this is to automate the management as much as possible, removing error prone, manual operations.” - Werner Vogels
  • 40. Application deployment
      • Infrastructure as code integrates infrastructure and application change processes.
      • Examples include staged deployments, canary deployments, isolation zone deployments, and automatic rollback.
      • AWS services: AWS CodeBuild, AWS CodeCommit, AWS CodeDeploy, AWS CodePipeline
  • 41. Testing enforces resiliency
  • 42.
  • 43. Application testing & certification
      • Resilient systems continue to operate successfully in the presence of failures.
      • Failure Mode Effect Analysis (FMEA) is an industry standard technique.
      • Estimate a risk priority number (RPN) between 1 and 1000: rank probability, severity, and observability each on a 1-10 scale, where 1 is good and 10 is bad, and multiply them.
      • A perfectly low probability, low impact, easy to measure risk has an RPN of 1. An extremely frequent, permanently damaging, impossible to detect risk has an RPN of 1000.
      • Failure impact analysis (failure, effect, mitigation, result):
          • Failure of an AZ. Effect: temporary capacity reduction. Mitigation: automatic failover to secondary AZ. Result: temporary performance degradation.
          • Total failure of satellite region. Effect: data replication offline. Mitigation: repair/reconfigure replication using alternate region. Result: no service interruption.
          • Partition of network between regions. Effect: data replication offline. Mitigation: auto recovery when network is available. Result: no service interruption.
          • Total failure of primary region. Effect: service offline. Mitigation: failover to secondary region. Result: service restored within two hours.
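The RPN calculation described on this slide is a product of three 1-10 scores. The sketch below computes it and ranks a few failure modes from highest to lowest risk. The failure names come from the slide, but the scores assigned to them are hypothetical and only illustrate the ranking.

```python
def risk_priority_number(probability: int, severity: int, observability: int) -> int:
    """Multiply three 1-10 scores (1 is good, 10 is bad) into an RPN of 1-1000."""
    for name, score in (("probability", probability),
                        ("severity", severity),
                        ("observability", observability)):
        if not 1 <= score <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {score}")
    return probability * severity * observability


if __name__ == "__main__":
    # Hypothetical scores: (probability, severity, observability).
    failure_modes = {
        "Failure of an AZ": (3, 4, 2),
        "Partition of network between regions": (2, 3, 5),
        "Total failure of primary region": (1, 9, 2),
    }
    ranked = sorted(failure_modes.items(),
                    key=lambda item: risk_priority_number(*item[1]),
                    reverse=True)
    for name, scores in ranked:
        print(f"{risk_priority_number(*scores):4d}  {name}")
```

Ranking by RPN gives a simple order in which to design and test mitigations.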
  • 44. Chaos engineering
      • The cloud has ushered in a new method of testing.
      • Principles of Chaos Engineering: “Chaos Engineering can be thought of as the facilitation of experiments to uncover systemic weaknesses.” https://principlesofchaos.org/
      • Principles: build a hypothesis around steady state behavior; apply variations to simulate real world events; run experiments in production; automate the experiments to run continuously; minimize the blast radius of failures.
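The principles above can be sketched as a small experiment loop: verify the steady state, inject a failure with a limited blast radius, check whether the hypothesis still holds, then restore. The example below is a toy, in-process simulation and not a real fault-injection tool. `ReplicaPool`, `success_rate`, and `run_experiment` are hypothetical names, and the 99% threshold is an assumed steady-state hypothesis.

```python
from typing import List


class ReplicaPool:
    """Toy stand-in for a service with several replicas behind a load balancer."""

    def __init__(self, replica_count: int) -> None:
        self.healthy: List[bool] = [True] * replica_count

    def handle_request(self, request_id: int) -> bool:
        """Try replicas round-robin; fail only if no replica is healthy."""
        n = len(self.healthy)
        return any(self.healthy[(request_id + offset) % n] for offset in range(n))


def success_rate(pool: ReplicaPool, requests: int = 100) -> float:
    """Fraction of simulated requests that succeed."""
    return sum(pool.handle_request(i) for i in range(requests)) / requests


def run_experiment(pool: ReplicaPool, victims: List[int], threshold: float = 0.99) -> bool:
    """Return True if the steady-state hypothesis survives the injected failure."""
    # 1. Confirm the steady state before touching anything.
    if success_rate(pool) < threshold:
        raise RuntimeError("system is not in steady state; aborting experiment")
    # 2. Inject the failure, limited to the listed replicas (blast radius).
    for index in victims:
        pool.healthy[index] = False
    try:
        # 3. Check whether the hypothesis still holds under failure.
        return success_rate(pool) >= threshold
    finally:
        # 4. Always restore the replicas, even if the check raises.
        for index in victims:
            pool.healthy[index] = True


if __name__ == "__main__":
    print(run_experiment(ReplicaPool(3), victims=[0]))  # redundant pool
    print(run_experiment(ReplicaPool(1), victims=[0]))  # single point of failure
```

Because the function is plain code, it could be scheduled to run continuously, in line with the automation principle.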
  • 45.
  • 46. Continuous testing of infrastructure
      • Regularly execute tests in stable, production, and production-like test environments.
      • Treat infrastructure as code: CI/CD tests in the infrastructure build pipeline; testing of infrastructure during integration test.
  • 47.
  • 48. Future considerations
      • Consider serverless: it reduces maintenance and moves responsibility for the resilient design to AWS.
      • Take advantage of our distributed systems by building on top of them: Amazon S3, AWS Lambda, Amazon ECS.
      • Break systems down into smaller pieces along logical seams. Reduce the blast radius of a failure of any individual piece of the system.
      • Leverage the Well-Architected Tool to assess your applications: https://aws.amazon.com/well-architected-tool
  • 49. Additional resources
      • AWS Well-Architected: https://aws.amazon.com/architecture/well-architected
      • AWS Whitepaper: Building Mission-Critical Financial Services Applications on AWS, April 2019
      • re:Invent 2018: Close Loops & Opening Minds: How to Take Control of Systems, Big & Small: https://www.youtube.com/watch?v=O8xLxNje30M
      • Building Microservices: Designing Fine-Grained Systems
      • AWS re:Invent 2018: Architecture Patterns for Multi-Region Active-Active Applications (ARC209-R2)
  • 50.
  • 51. Thank you! Tim Griesbach, awstim@amazon.com