SlideShare a Scribd company logo
1 of 30
Download to read offline
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Resiliency Testing: Verify That Your
System Is as Reliable as You Think
Rodney Lester
Reliability Lead
AWS Well Architected
A R C 4 0 3
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Agenda
Deploy testing infrastructure
AWS well-architected reliability
pillar
Scenario walkthrough
Lab work
Summary of tests and best
practices
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Related breakouts
Thursday, November 29
Chaos Engineering and Scalability at Audible.com
1 p.m. – 2 p.m. | Aria West, Level 3, Ironwood 5
Tuesday, November 27
Globalizing Player Accounts at Riot Games While Maintaining
Availability
1:45 p.m. – 2:45 p.m. | Aria East, Plaza Level, Orovada 2
Tuesday, November 27
How Intuit TurboTax Ran Entirely on AWS for 2017 Taxes
10:45 a.m. – 11:45 a.m. | Venetian, Level 2, Venetian F
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Download the lab guide
• Navigate to and download the lab guide
• https://bit.ly/2QjJASo
• Execute the first section: Deploying the Infrastructure
• We can start testing when the web servers are available in the Ohio Region
• This will take five minutes to start the execution and about 30 minutes until you can perform
tests
• The replication to a second region will be available about 25 minutes later
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
What is it?
“The reliability pillar encompasses the ability of a system to recover from
infrastructure or service disruptions, dynamically acquire computing
resources to meet demand, and mitigate disruptions such as
misconfigurations or transient network issues.” – AWS Well-Architected
Framework whitepaper
Design principles
• Test recovery procedures
• Automatically recover from failures
• Scale horizontally to increase aggregate system availability
• Stop guessing capacity
• Manage change using automation
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
What is it?
“The reliability pillar encompasses the ability of a system to recover from
infrastructure or service disruptions, dynamically acquire computing
resources to meet demand, and mitigate disruptions such as
misconfigurations or transient network issues.” – AWS Well-Architected
Framework whitepaper
Design principles
• Test recovery procedures
• Automatically recover from failures
• Scale horizontally to increase aggregate system availability
• Stop guessing capacity
• Manage change using automation
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Failure management
“Everything fails, all the time.” – Werner Vogels, Amazon CTO
We are going to work on withstanding component failures and planning
for recovery
Netflix OSS Simian Army, specifically “Chaos” suite, is an example of how
to test fault tolerance
• Chaos Monkey – injects failure on instances/containers
• Chaos Gorilla (deprecated) – simulates an AZ failure
• Chaos Kong – simulates an AWS Region failure
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Automated recovery
Manual recovery will take much longer
Use Amazon Web Services (AWS) to automate
• AWS Auto Scaling
• Amazon Relational Database Service (Amazon RDS)
• Amazon Route 53
• Amazon Simple Storage Service (Amazon S3)
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Why are we here?
Just use Netflix OSS, right?
Wait, Chaos Monkey is written in Java or Go?
Wait, I need to share SSH keys to inject failure?
What do you mean there is no AZ failure?
(No more Chaos Gorilla?)
This creates fear, uncertainty, and doubt
Can I do this?
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Yes, you can do this!
You can write code to perform these scenarios in any language
You can use AWS Systems Manager and Amazon Elastic Compute Cloud
Run Command to inject failure without using SSH
You can write your own AZ failure simulation
You can simulate your own regional failure
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Scenario for testing
AWS US-EAST-2 Region
Virtual private cloud
Availability zoneAvailability zone Availability zone
IGW routed subnet IGW routed subnet IGW routed subnet
Private subnet Private subnet Private subnet
Application
Load Balancer
MySQL DB
Multi-AZ
MySQL DB
Multi-AZ
Instance Instance
Auto Scale
app tier
Bucket
Instance Instance Instance Instance
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
But wait, there’s more….
AWS US-WEST-2 Region
Virtual private cloud
Availability zoneAvailability zone Availability zone
IGW routed subnet IGW routed subnet IGW routed subnet
Private subnet Private subnet Private subnet
Application
Load Balancer
MySQL DB
Multi-AZ
MySQL DB
Multi-AZ
Instance Instance
Auto Scale
app tier
Bucket
Instance Instance
Instance Instance
RDS DB
read
replica
AWS DMS
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
And even more
AWS US-EAST-2 Region
Virtual private cloud
Availability zoneAvailability zone Availability zone
IGW routed subnet IGW routed subnet IGW routed subnet
Private subnet Private subnet Private subnet
Application
Load Balancer
MySQL DB
Multi-AZ
MySQL DB
Multi-AZ
Instance Instance
Auto Scale
app tier
Bucket
Instance Instance Instance Instance
RDS DB
read
replica
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Start at “Discussion and Example Failure Scenarios”
Minimum requirements to start testing
Step Functions state machine should have completed “WaitForWebApp” state
No need to wait for read replicas or database migration instance
Decide which language you are comfortable using
• Bash
• PowerShell
• Java
• C#
• Python
Ask for help if you need it
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Start at “Cross Region Failure Scenarios”
Minimum requirements to start testing
Step Functions state machine should have completed “WaitForDMS” state, which is the last
state
Scenarios are limited due to features and student environment
requirements
Follow instructions
Ask for help if you need it
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Simulated failures
AWS US-EAST-2 Region
Virtual private cloud
Availability zoneAvailability zone Availability zone
IGW routed subnet IGW routed subnet IGW routed subnet
Private subnet Private subnet Private subnet
Application
Load Balancer
MySQL DB
Multi-AZ
MySQL DB
Multi-AZ
Instance Instance
Auto Scale
app tier
Bucket
Instance Instance Instance Instance
RDS DB
read
replica
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
What did you learn?
We’d like you to lose the fear of writing these tests
The simulation might not be obvious – discuss and think about the effects
before implementation and revise based on results, but writing this code
is not difficult
On very large implementations, you’ll need to use orchestration to have
things happen almost simultaneously
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Things to watch out for
You will find problems and resolve them, but real failure might not look
like the simulation
Expect this to be an ongoing effort
Add these tests to your pipeline/acceptance testing
Some failure modes are destructive
Automated deployment will help bring the environment back
Failover is easier than failback
Transactions will be in flight
Flapping can be worse than an outage
Game day exercises are essential
Practice, practice, practice
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
This code is available for download
Download it at (this is also in the lab guide):
https://bit.ly/2Nb0XHg
It is licensed under the Apache License, Version 2.0
https://aws.amazon.com/apache2.0
Thank you!
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Rodney Lester
rodneyle@amazon.com
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.

More Related Content

What's hot

Amazon WorkSpaces for Regulated Industries (BAP211) - AWS re:Invent 2018
Amazon WorkSpaces for Regulated Industries (BAP211) - AWS re:Invent 2018Amazon WorkSpaces for Regulated Industries (BAP211) - AWS re:Invent 2018
Amazon WorkSpaces for Regulated Industries (BAP211) - AWS re:Invent 2018Amazon Web Services
 
How Amazon WorkSpaces Powers the Hands-On Labs (BAP317) - AWS re:Invent 2018
How Amazon WorkSpaces Powers the Hands-On Labs (BAP317) - AWS re:Invent 2018How Amazon WorkSpaces Powers the Hands-On Labs (BAP317) - AWS re:Invent 2018
How Amazon WorkSpaces Powers the Hands-On Labs (BAP317) - AWS re:Invent 2018Amazon Web Services
 
Amazon GuardDuty Threat Detection and Remediation
Amazon GuardDuty Threat Detection and RemediationAmazon GuardDuty Threat Detection and Remediation
Amazon GuardDuty Threat Detection and RemediationAmazon Web Services
 
How Amazon.com Migrates Inventory Management Systems (DAT346) - AWS re:Invent...
How Amazon.com Migrates Inventory Management Systems (DAT346) - AWS re:Invent...How Amazon.com Migrates Inventory Management Systems (DAT346) - AWS re:Invent...
How Amazon.com Migrates Inventory Management Systems (DAT346) - AWS re:Invent...Amazon Web Services
 
Operational Excellence for Identity & Access Management (SEC334) - AWS re:Inv...
Operational Excellence for Identity & Access Management (SEC334) - AWS re:Inv...Operational Excellence for Identity & Access Management (SEC334) - AWS re:Inv...
Operational Excellence for Identity & Access Management (SEC334) - AWS re:Inv...Amazon Web Services
 
Hands-On: Automating AWS Infrastructure with PowerShell (WIN308) - AWS re:Inv...
Hands-On: Automating AWS Infrastructure with PowerShell (WIN308) - AWS re:Inv...Hands-On: Automating AWS Infrastructure with PowerShell (WIN308) - AWS re:Inv...
Hands-On: Automating AWS Infrastructure with PowerShell (WIN308) - AWS re:Inv...Amazon Web Services
 
Amazon CI-CD Practices for Software Development Teams
Amazon CI-CD Practices for Software Development Teams Amazon CI-CD Practices for Software Development Teams
Amazon CI-CD Practices for Software Development Teams Amazon Web Services
 
Petabyte-Scale Migration to Amazon S3 Building Photobox's Data Lake (STG393) ...
Petabyte-Scale Migration to Amazon S3 Building Photobox's Data Lake (STG393) ...Petabyte-Scale Migration to Amazon S3 Building Photobox's Data Lake (STG393) ...
Petabyte-Scale Migration to Amazon S3 Building Photobox's Data Lake (STG393) ...Amazon Web Services
 
Pause and Resume your EC2 Instances with Hibernate (CMP392) - AWS re:Invent 2018
Pause and Resume your EC2 Instances with Hibernate (CMP392) - AWS re:Invent 2018Pause and Resume your EC2 Instances with Hibernate (CMP392) - AWS re:Invent 2018
Pause and Resume your EC2 Instances with Hibernate (CMP392) - AWS re:Invent 2018Amazon Web Services
 
SID304 Threat Detection and Remediation with Amazon GuardDuty
 SID304 Threat Detection and Remediation with Amazon GuardDuty SID304 Threat Detection and Remediation with Amazon GuardDuty
SID304 Threat Detection and Remediation with Amazon GuardDutyAmazon Web Services
 
Building Microservices with the Twelve Factor App Pattern on AWS
Building Microservices with the Twelve Factor App Pattern on AWSBuilding Microservices with the Twelve Factor App Pattern on AWS
Building Microservices with the Twelve Factor App Pattern on AWSAmazon Web Services
 
Up and Running with Amazon Linux WorkSpaces (BAP207-R1) - AWS re:Invent 2018
Up and Running with Amazon Linux WorkSpaces (BAP207-R1) - AWS re:Invent 2018Up and Running with Amazon Linux WorkSpaces (BAP207-R1) - AWS re:Invent 2018
Up and Running with Amazon Linux WorkSpaces (BAP207-R1) - AWS re:Invent 2018Amazon Web Services
 
AWS Storage Leadership Session: What's New in Amazon S3, Amazon EFS, Amazon E...
AWS Storage Leadership Session: What's New in Amazon S3, Amazon EFS, Amazon E...AWS Storage Leadership Session: What's New in Amazon S3, Amazon EFS, Amazon E...
AWS Storage Leadership Session: What's New in Amazon S3, Amazon EFS, Amazon E...Amazon Web Services
 
Layered Perimeter Protection for Apps Running on AWS (CTD201-R1) - AWS re:Inv...
Layered Perimeter Protection for Apps Running on AWS (CTD201-R1) - AWS re:Inv...Layered Perimeter Protection for Apps Running on AWS (CTD201-R1) - AWS re:Inv...
Layered Perimeter Protection for Apps Running on AWS (CTD201-R1) - AWS re:Inv...Amazon Web Services
 
Set Up Compliance Automation Using AWS Management Tools (SEC317) - AWS re:Inv...
Set Up Compliance Automation Using AWS Management Tools (SEC317) - AWS re:Inv...Set Up Compliance Automation Using AWS Management Tools (SEC317) - AWS re:Inv...
Set Up Compliance Automation Using AWS Management Tools (SEC317) - AWS re:Inv...Amazon Web Services
 
Hitchhiker's Guide to Cloud Ops
Hitchhiker's Guide to Cloud Ops Hitchhiker's Guide to Cloud Ops
Hitchhiker's Guide to Cloud Ops Amazon Web Services
 
SRV205 Architectures and Strategies for Building Modern Applications on AWS
 SRV205 Architectures and Strategies for Building Modern Applications on AWS SRV205 Architectures and Strategies for Building Modern Applications on AWS
SRV205 Architectures and Strategies for Building Modern Applications on AWSAmazon Web Services
 
Inventory and Patch Management Using AWS Systems Manager (ARC332) - AWS re:In...
Inventory and Patch Management Using AWS Systems Manager (ARC332) - AWS re:In...Inventory and Patch Management Using AWS Systems Manager (ARC332) - AWS re:In...
Inventory and Patch Management Using AWS Systems Manager (ARC332) - AWS re:In...Amazon Web Services
 

What's hot (20)

Amazon WorkSpaces for Regulated Industries (BAP211) - AWS re:Invent 2018
Amazon WorkSpaces for Regulated Industries (BAP211) - AWS re:Invent 2018Amazon WorkSpaces for Regulated Industries (BAP211) - AWS re:Invent 2018
Amazon WorkSpaces for Regulated Industries (BAP211) - AWS re:Invent 2018
 
How Amazon WorkSpaces Powers the Hands-On Labs (BAP317) - AWS re:Invent 2018
How Amazon WorkSpaces Powers the Hands-On Labs (BAP317) - AWS re:Invent 2018How Amazon WorkSpaces Powers the Hands-On Labs (BAP317) - AWS re:Invent 2018
How Amazon WorkSpaces Powers the Hands-On Labs (BAP317) - AWS re:Invent 2018
 
Amazon GuardDuty Threat Detection and Remediation
Amazon GuardDuty Threat Detection and RemediationAmazon GuardDuty Threat Detection and Remediation
Amazon GuardDuty Threat Detection and Remediation
 
How Amazon.com Migrates Inventory Management Systems (DAT346) - AWS re:Invent...
How Amazon.com Migrates Inventory Management Systems (DAT346) - AWS re:Invent...How Amazon.com Migrates Inventory Management Systems (DAT346) - AWS re:Invent...
How Amazon.com Migrates Inventory Management Systems (DAT346) - AWS re:Invent...
 
Operational Excellence for Identity & Access Management (SEC334) - AWS re:Inv...
Operational Excellence for Identity & Access Management (SEC334) - AWS re:Inv...Operational Excellence for Identity & Access Management (SEC334) - AWS re:Inv...
Operational Excellence for Identity & Access Management (SEC334) - AWS re:Inv...
 
Container Scheduling
Container SchedulingContainer Scheduling
Container Scheduling
 
Hands-On: Automating AWS Infrastructure with PowerShell (WIN308) - AWS re:Inv...
Hands-On: Automating AWS Infrastructure with PowerShell (WIN308) - AWS re:Inv...Hands-On: Automating AWS Infrastructure with PowerShell (WIN308) - AWS re:Inv...
Hands-On: Automating AWS Infrastructure with PowerShell (WIN308) - AWS re:Inv...
 
Deep dive - AWS Fargate
Deep dive - AWS FargateDeep dive - AWS Fargate
Deep dive - AWS Fargate
 
Amazon CI-CD Practices for Software Development Teams
Amazon CI-CD Practices for Software Development Teams Amazon CI-CD Practices for Software Development Teams
Amazon CI-CD Practices for Software Development Teams
 
Petabyte-Scale Migration to Amazon S3 Building Photobox's Data Lake (STG393) ...
Petabyte-Scale Migration to Amazon S3 Building Photobox's Data Lake (STG393) ...Petabyte-Scale Migration to Amazon S3 Building Photobox's Data Lake (STG393) ...
Petabyte-Scale Migration to Amazon S3 Building Photobox's Data Lake (STG393) ...
 
Pause and Resume your EC2 Instances with Hibernate (CMP392) - AWS re:Invent 2018
Pause and Resume your EC2 Instances with Hibernate (CMP392) - AWS re:Invent 2018Pause and Resume your EC2 Instances with Hibernate (CMP392) - AWS re:Invent 2018
Pause and Resume your EC2 Instances with Hibernate (CMP392) - AWS re:Invent 2018
 
SID304 Threat Detection and Remediation with Amazon GuardDuty
 SID304 Threat Detection and Remediation with Amazon GuardDuty SID304 Threat Detection and Remediation with Amazon GuardDuty
SID304 Threat Detection and Remediation with Amazon GuardDuty
 
Building Microservices with the Twelve Factor App Pattern on AWS
Building Microservices with the Twelve Factor App Pattern on AWSBuilding Microservices with the Twelve Factor App Pattern on AWS
Building Microservices with the Twelve Factor App Pattern on AWS
 
Up and Running with Amazon Linux WorkSpaces (BAP207-R1) - AWS re:Invent 2018
Up and Running with Amazon Linux WorkSpaces (BAP207-R1) - AWS re:Invent 2018Up and Running with Amazon Linux WorkSpaces (BAP207-R1) - AWS re:Invent 2018
Up and Running with Amazon Linux WorkSpaces (BAP207-R1) - AWS re:Invent 2018
 
AWS Storage Leadership Session: What's New in Amazon S3, Amazon EFS, Amazon E...
AWS Storage Leadership Session: What's New in Amazon S3, Amazon EFS, Amazon E...AWS Storage Leadership Session: What's New in Amazon S3, Amazon EFS, Amazon E...
AWS Storage Leadership Session: What's New in Amazon S3, Amazon EFS, Amazon E...
 
Layered Perimeter Protection for Apps Running on AWS (CTD201-R1) - AWS re:Inv...
Layered Perimeter Protection for Apps Running on AWS (CTD201-R1) - AWS re:Inv...Layered Perimeter Protection for Apps Running on AWS (CTD201-R1) - AWS re:Inv...
Layered Perimeter Protection for Apps Running on AWS (CTD201-R1) - AWS re:Inv...
 
Set Up Compliance Automation Using AWS Management Tools (SEC317) - AWS re:Inv...
Set Up Compliance Automation Using AWS Management Tools (SEC317) - AWS re:Inv...Set Up Compliance Automation Using AWS Management Tools (SEC317) - AWS re:Inv...
Set Up Compliance Automation Using AWS Management Tools (SEC317) - AWS re:Inv...
 
Hitchhiker's Guide to Cloud Ops
Hitchhiker's Guide to Cloud Ops Hitchhiker's Guide to Cloud Ops
Hitchhiker's Guide to Cloud Ops
 
SRV205 Architectures and Strategies for Building Modern Applications on AWS
 SRV205 Architectures and Strategies for Building Modern Applications on AWS SRV205 Architectures and Strategies for Building Modern Applications on AWS
SRV205 Architectures and Strategies for Building Modern Applications on AWS
 
Inventory and Patch Management Using AWS Systems Manager (ARC332) - AWS re:In...
Inventory and Patch Management Using AWS Systems Manager (ARC332) - AWS re:In...Inventory and Patch Management Using AWS Systems Manager (ARC332) - AWS re:In...
Inventory and Patch Management Using AWS Systems Manager (ARC332) - AWS re:In...
 

Similar to Resiliency Testing: Verify That Your System Is as Reliable as You Think (ARC403) - AWS re:Invent 2018

How to Determine if You Are Well-Architected for Reliability
How to Determine if You Are Well-Architected for ReliabilityHow to Determine if You Are Well-Architected for Reliability
How to Determine if You Are Well-Architected for ReliabilityAmazon Web Services
 
Rodney Lester: Well-Architected - Reliability Instructor Led Lab.pdf
Rodney Lester: Well-Architected - Reliability Instructor Led Lab.pdfRodney Lester: Well-Architected - Reliability Instructor Led Lab.pdf
Rodney Lester: Well-Architected - Reliability Instructor Led Lab.pdfAmazon Web Services
 
How to Determine If You Are Well Architected for Resiliency (or How I Learned...
How to Determine If You Are Well Architected for Resiliency (or How I Learned...How to Determine If You Are Well Architected for Resiliency (or How I Learned...
How to Determine If You Are Well Architected for Resiliency (or How I Learned...Amazon Web Services
 
Getting Started with Serverless Architectures with Microservices_AWSPSSummit_...
Getting Started with Serverless Architectures with Microservices_AWSPSSummit_...Getting Started with Serverless Architectures with Microservices_AWSPSSummit_...
Getting Started with Serverless Architectures with Microservices_AWSPSSummit_...Amazon Web Services
 
2018 10-19-jc conf-embrace-legacy-java-ee-by-aws-serverless
2018 10-19-jc conf-embrace-legacy-java-ee-by-aws-serverless2018 10-19-jc conf-embrace-legacy-java-ee-by-aws-serverless
2018 10-19-jc conf-embrace-legacy-java-ee-by-aws-serverlessKim Kao
 
Ensuring Your Windows Server Workloads Are Well-Architected - AWS Online Tech...
Ensuring Your Windows Server Workloads Are Well-Architected - AWS Online Tech...Ensuring Your Windows Server Workloads Are Well-Architected - AWS Online Tech...
Ensuring Your Windows Server Workloads Are Well-Architected - AWS Online Tech...Amazon Web Services
 
Enabling Your Organization’s Amazon Redshift Adoption – Going from Zero to He...
Enabling Your Organization’s Amazon Redshift Adoption – Going from Zero to He...Enabling Your Organization’s Amazon Redshift Adoption – Going from Zero to He...
Enabling Your Organization’s Amazon Redshift Adoption – Going from Zero to He...Amazon Web Services
 
Building Well Architected .NET Apps (WIN304) - AWS re:Invent 2018
Building Well Architected .NET Apps (WIN304) - AWS re:Invent 2018Building Well Architected .NET Apps (WIN304) - AWS re:Invent 2018
Building Well Architected .NET Apps (WIN304) - AWS re:Invent 2018Amazon Web Services
 
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018Amazon Web Services
 
Scaling from zero to millions of users
Scaling from zero to millions of usersScaling from zero to millions of users
Scaling from zero to millions of usersAmazon Web Services
 
Nuvem Híbrida - EBC on the road Brazil Edition [Portuguese]
Nuvem Híbrida - EBC on the road Brazil Edition [Portuguese]Nuvem Híbrida - EBC on the road Brazil Edition [Portuguese]
Nuvem Híbrida - EBC on the road Brazil Edition [Portuguese]Amazon Web Services
 
Design, Deploy, & Optimize SQL Server Workloads - SRV209 - Chicago AWS Summit
Design, Deploy, & Optimize SQL Server Workloads - SRV209 - Chicago AWS SummitDesign, Deploy, & Optimize SQL Server Workloads - SRV209 - Chicago AWS Summit
Design, Deploy, & Optimize SQL Server Workloads - SRV209 - Chicago AWS SummitAmazon Web Services
 
Design, Deploy, Optimize SQL Server Workloads on AWS - SRV209 - Anaheim AWS S...
Design, Deploy, Optimize SQL Server Workloads on AWS - SRV209 - Anaheim AWS S...Design, Deploy, Optimize SQL Server Workloads on AWS - SRV209 - Anaheim AWS S...
Design, Deploy, Optimize SQL Server Workloads on AWS - SRV209 - Anaheim AWS S...Amazon Web Services
 
Chaos Engineering: Why Breaking Things Should Be Practised.
Chaos Engineering: Why Breaking Things Should Be Practised.Chaos Engineering: Why Breaking Things Should Be Practised.
Chaos Engineering: Why Breaking Things Should Be Practised.Adrian Hornsby
 
Lock it Down: How to Secure your AWS Account and your Organization's Accounts
Lock it Down: How to Secure your AWS Account and your Organization's AccountsLock it Down: How to Secure your AWS Account and your Organization's Accounts
Lock it Down: How to Secure your AWS Account and your Organization's AccountsAmazon Web Services
 
Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...
Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...
Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...Amazon Web Services
 
Evolve Your Incident Response Process and Powers for AWS - SID306 - Chicago A...
Evolve Your Incident Response Process and Powers for AWS - SID306 - Chicago A...Evolve Your Incident Response Process and Powers for AWS - SID306 - Chicago A...
Evolve Your Incident Response Process and Powers for AWS - SID306 - Chicago A...Amazon Web Services
 
Design, Deploy, & Optimize SQL Server Workloads
Design, Deploy, & Optimize SQL Server Workloads Design, Deploy, & Optimize SQL Server Workloads
Design, Deploy, & Optimize SQL Server Workloads Amazon Web Services
 
AWS Lambda use cases and best practices - Builders Day Israel
AWS Lambda use cases and best practices - Builders Day IsraelAWS Lambda use cases and best practices - Builders Day Israel
AWS Lambda use cases and best practices - Builders Day IsraelAmazon Web Services
 

Similar to Resiliency Testing: Verify That Your System Is as Reliable as You Think (ARC403) - AWS re:Invent 2018 (20)

How to Determine if You Are Well-Architected for Reliability
How to Determine if You Are Well-Architected for ReliabilityHow to Determine if You Are Well-Architected for Reliability
How to Determine if You Are Well-Architected for Reliability
 
Rodney Lester: Well-Architected - Reliability Instructor Led Lab.pdf
Rodney Lester: Well-Architected - Reliability Instructor Led Lab.pdfRodney Lester: Well-Architected - Reliability Instructor Led Lab.pdf
Rodney Lester: Well-Architected - Reliability Instructor Led Lab.pdf
 
How to Determine If You Are Well Architected for Resiliency (or How I Learned...
How to Determine If You Are Well Architected for Resiliency (or How I Learned...How to Determine If You Are Well Architected for Resiliency (or How I Learned...
How to Determine If You Are Well Architected for Resiliency (or How I Learned...
 
Getting Started with Serverless Architectures with Microservices_AWSPSSummit_...
Getting Started with Serverless Architectures with Microservices_AWSPSSummit_...Getting Started with Serverless Architectures with Microservices_AWSPSSummit_...
Getting Started with Serverless Architectures with Microservices_AWSPSSummit_...
 
2018 10-19-jc conf-embrace-legacy-java-ee-by-aws-serverless
2018 10-19-jc conf-embrace-legacy-java-ee-by-aws-serverless2018 10-19-jc conf-embrace-legacy-java-ee-by-aws-serverless
2018 10-19-jc conf-embrace-legacy-java-ee-by-aws-serverless
 
Ensuring Your Windows Server Workloads Are Well-Architected - AWS Online Tech...
Ensuring Your Windows Server Workloads Are Well-Architected - AWS Online Tech...Ensuring Your Windows Server Workloads Are Well-Architected - AWS Online Tech...
Ensuring Your Windows Server Workloads Are Well-Architected - AWS Online Tech...
 
Enabling Your Organization’s Amazon Redshift Adoption – Going from Zero to He...
Enabling Your Organization’s Amazon Redshift Adoption – Going from Zero to He...Enabling Your Organization’s Amazon Redshift Adoption – Going from Zero to He...
Enabling Your Organization’s Amazon Redshift Adoption – Going from Zero to He...
 
Building Well Architected .NET Apps (WIN304) - AWS re:Invent 2018
Building Well Architected .NET Apps (WIN304) - AWS re:Invent 2018Building Well Architected .NET Apps (WIN304) - AWS re:Invent 2018
Building Well Architected .NET Apps (WIN304) - AWS re:Invent 2018
 
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
Build Your Own Log Analytics Solutions on AWS (ANT323-R) - AWS re:Invent 2018
 
Enterprise Security
Enterprise SecurityEnterprise Security
Enterprise Security
 
Scaling from zero to millions of users
Scaling from zero to millions of usersScaling from zero to millions of users
Scaling from zero to millions of users
 
Nuvem Híbrida - EBC on the road Brazil Edition [Portuguese]
Nuvem Híbrida - EBC on the road Brazil Edition [Portuguese]Nuvem Híbrida - EBC on the road Brazil Edition [Portuguese]
Nuvem Híbrida - EBC on the road Brazil Edition [Portuguese]
 
Design, Deploy, & Optimize SQL Server Workloads - SRV209 - Chicago AWS Summit
Design, Deploy, & Optimize SQL Server Workloads - SRV209 - Chicago AWS SummitDesign, Deploy, & Optimize SQL Server Workloads - SRV209 - Chicago AWS Summit
Design, Deploy, & Optimize SQL Server Workloads - SRV209 - Chicago AWS Summit
 
Design, Deploy, Optimize SQL Server Workloads on AWS - SRV209 - Anaheim AWS S...
Design, Deploy, Optimize SQL Server Workloads on AWS - SRV209 - Anaheim AWS S...Design, Deploy, Optimize SQL Server Workloads on AWS - SRV209 - Anaheim AWS S...
Design, Deploy, Optimize SQL Server Workloads on AWS - SRV209 - Anaheim AWS S...
 
Chaos Engineering: Why Breaking Things Should Be Practised.
Chaos Engineering: Why Breaking Things Should Be Practised.Chaos Engineering: Why Breaking Things Should Be Practised.
Chaos Engineering: Why Breaking Things Should Be Practised.
 
Lock it Down: How to Secure your AWS Account and your Organization's Accounts
Lock it Down: How to Secure your AWS Account and your Organization's AccountsLock it Down: How to Secure your AWS Account and your Organization's Accounts
Lock it Down: How to Secure your AWS Account and your Organization's Accounts
 
Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...
Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...
Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...
 
Evolve Your Incident Response Process and Powers for AWS - SID306 - Chicago A...
Evolve Your Incident Response Process and Powers for AWS - SID306 - Chicago A...Evolve Your Incident Response Process and Powers for AWS - SID306 - Chicago A...
Evolve Your Incident Response Process and Powers for AWS - SID306 - Chicago A...
 
Design, Deploy, & Optimize SQL Server Workloads
Design, Deploy, & Optimize SQL Server Workloads Design, Deploy, & Optimize SQL Server Workloads
Design, Deploy, & Optimize SQL Server Workloads
 
AWS Lambda use cases and best practices - Builders Day Israel
AWS Lambda use cases and best practices - Builders Day IsraelAWS Lambda use cases and best practices - Builders Day Israel
AWS Lambda use cases and best practices - Builders Day Israel
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Resiliency Testing: Verify That Your System Is as Reliable as You Think (ARC403) - AWS re:Invent 2018

  • 1.
  • 2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Resiliency Testing: Verify That Your System Is as Reliable as You Think Rodney Lester Reliability Lead AWS Well Architected A R C 4 0 3
  • 3. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Agenda Deploy testing infrastructure AWS well-architected reliability pillar Scenario walkthrough Lab work Summary of tests and best practices
  • 4. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Related breakouts Thursday, November 29 Chaos Engineering and Scalability at Audible.com 1 p.m. – 2 p.m. | Aria West, Level 3, Ironwood 5 Tuesday, November 27 Globalizing Player Accounts at Riot Games While Maintaining Availability 1:45 p.m. – 2:45 p.m. | Aria East, Plaza Level, Orovada 2 Tuesday, November 27 How Intuit TurboTax Ran Entirely on AWS for 2017 Taxes 10:45 a.m. – 11:45 a.m. | Venetian, Level 2, Venetian F
  • 5. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 6. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Download the lab guide • Navigate to and download the lab guide • https://bit.ly/2QjJASo • Execute the first section: Deploying the Infrastructure • We can start testing when the web servers are available in the Ohio Region • This will take five minutes to start the execution and about 30 minutes until you can perform tests • The replication to a second region will be available about 25 minutes later
  • 7. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 8. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. What is it? “The reliability pillar encompasses the ability of a system to recover from infrastructure or service disruptions, dynamically acquire computing resources to meet demand, and mitigate disruptions such as misconfigurations or transient network issues.” – AWS Well-Architected Framework whitepaper Design principles • Test recovery procedures • Automatically recover from failures • Scale horizontally to increase aggregate system availability • Stop guessing capacity • Manage change using automation
  • 9. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. What is it? “The reliability pillar encompasses the ability of a system to recover from infrastructure or service disruptions, dynamically acquire computing resources to meet demand, and mitigate disruptions such as misconfigurations or transient network issues.” – AWS Well-Architected Framework whitepaper Design principles • Test recovery procedures • Automatically recover from failures • Scale horizontally to increase aggregate system availability • Stop guessing capacity • Manage change using automation
  • 10. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Failure management “Everything fails, all the time.” – Werner Vogels, Amazon CTO We are going to work on withstanding component failures and planning for recovery Netflix OSS Simian Army, specifically “Chaos” suite, is an example of how to test fault tolerance • Chaos Monkey – injects failure on instances/containers • Chaos Gorilla (deprecated) – simulates an AZ failure • Chaos Kong – simulates an AWS Region failure
  • 11. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Automated recovery Manual recovery will take much longer Use Amazon Web Services (AWS) to automate • AWS Auto Scaling • Amazon Relational Database Service (Amazon RDS) • Amazon Route 53 • Amazon Simple Storage Service (Amazon S3)
  • 12. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Why are we here? Just use Netflix OSS, right? Wait, Chaos Monkey is written in Java or Go? Wait, I need to share SSH keys to inject failure? What do you mean there is no AZ failure? (No more Chaos Gorilla?) This creates fear, uncertainty, and doubt Can I do this?
  • 13. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Yes, you can do this! You can write code to perform these scenarios in any language You can use AWS Systems Manager and Amazon Elastic Compute Cloud Run Command to inject failure without using SSH You can write your own AZ failure simulation You can simulate your own regional failure
  • 14. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 15. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Scenario for testing AWS US-EAST-2 Region Virtual private cloud Availability zoneAvailability zone Availability zone IGW routed subnet IGW routed subnet IGW routed subnet Private subnet Private subnet Private subnet Application Load Balancer MySQL DB Multi-AZ MySQL DB Multi-AZ Instance Instance Auto Scale app tier Bucket Instance Instance Instance Instance
  • 16. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. But wait, there’s more…. AWS US-WEST-2 Region Virtual private cloud Availability zoneAvailability zone Availability zone IGW routed subnet IGW routed subnet IGW routed subnet Private subnet Private subnet Private subnet Application Load Balancer MySQL DB Multi-AZ MySQL DB Multi-AZ Instance Instance Auto Scale app tier Bucket Instance Instance Instance Instance RDS DB read replica AWS DMS
  • 17. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. And even more AWS US-EAST-2 Region Virtual private cloud Availability zoneAvailability zone Availability zone IGW routed subnet IGW routed subnet IGW routed subnet Private subnet Private subnet Private subnet Application Load Balancer MySQL DB Multi-AZ MySQL DB Multi-AZ Instance Instance Auto Scale app tier Bucket Instance Instance Instance Instance RDS DB read replica
  • 18. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 19. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 20. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Start at “Discussion and Example Failure Scenarios” Minimum requirements to start testing Step Functions state machine should have completed “WaitForWebApp” state No need to wait for read replicas or database migration instance Decide which language you are comfortable using • Bash • PowerShell • Java • C# • Python Ask for help if you need it
  • 21. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 22. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Start at “Cross Region Failure Scenarios” Minimum requirements to start testing Step Functions state machine should have completed “WaitForDMS” state, which is the last state Scenarios are limited due to features and student environment requirements Follow instructions Ask for help if you need it
  • 23. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 24. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Simulated failures AWS US-EAST-2 Region Virtual private cloud Availability zoneAvailability zone Availability zone IGW routed subnet IGW routed subnet IGW routed subnet Private subnet Private subnet Private subnet Application Load Balancer MySQL DB Multi-AZ MySQL DB Multi-AZ Instance Instance Auto Scale app tier Bucket Instance Instance Instance Instance RDS DB read replica
  • 25. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 26. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. What did you learn? We’d like you to lose the fear of writing these tests The simulation might not be obvious – discuss and think about the effects before implementation and revise based on results, but writing this code is not difficult On very large implementations, you’ll need to use orchestration to have things happen almost simultaneously
  • 27. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Things to watch out for You will find problems and resolve them, but real failure might not look like the simulation Expect this to be an ongoing effort Add these tests to your pipeline/acceptance testing Some failure modes are destructive Automated deployment will help bring the environment back Failover is easier than failback Transactions will be in flight Flapping can be worse than an outage Game day exercises are essential Practice, practice, practice
  • 28. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. This code is available for download Download it at (this is also in the lab guide): https://bit.ly/2Nb0XHg It is licensed under the Apache License, Version 2.0 https://aws.amazon.com/apache2.0
  • 29. Thank you! © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Rodney Lester rodneyle@amazon.com
  • 30. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.