© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Closing Loops and Opening Minds: How to
Take Control of Systems, Big and Small
Colm MacCárthaigh
Senior Principal Engineer
AWS
ARC337
“Quality is not an act, it is a habit”
Aristotle, some time around 350BC
Amazon CloudFront Control Plane
[Diagram: quadrants labeled (-, +), (+, +), (-, -), (+, -)]
What goes into high quality designs
Diverse creative minds working in a fearless environment
Systematic reviews and mechanisms to share lessons
Use well-worn patterns where possible and focus
invention where it is truly needed
Testing, testing, testing, testing, testing
How we make trade-offs in design
Control Planes vs. Data Planes
Control Planes are often a bigger design
challenge than the data planes that they
support.
Poorly designed Control Planes have the
ability to cause large outages, or worse:
misconfigurations and corruption.
What do Control Planes do in the Cloud?
Manage the life cycle for resources
Provision software
Provision service configuration
Provision user configuration
Control Theory 101
• Independently discovered in several
fields of engineering and science
• Formalized in the early-to-mid
twentieth century
• One of the most under-appreciated
branches of science, incredibly relevant
to distributed systems
Control Theory 101
PID
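The slide names PID control without showing it. As a rough illustration, here is a minimal sketch of a textbook discrete PID (proportional, integral, derivative) controller. The gains, the setpoint of 100.0, and the toy system being driven are all assumptions chosen for the example, not values from the talk.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook discrete PID controller; the gains are illustrative assumptions."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, setpoint: float, measured: float, dt: float) -> float:
        error = setpoint - measured          # P: how far off we are right now
        self.integral += error * dt          # I: accumulated past error
        derivative = (error - self.prev_error) / dt  # D: how fast the error changes
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(steps: int = 200) -> float:
    """Drive a toy system (value += control output) toward a setpoint of 100.0."""
    pid = PID(kp=0.5, ki=0.1, kd=0.05)
    value = 0.0
    for _ in range(steps):
        # Closed loop: each correction is computed from a fresh measurement.
        value += pid.update(setpoint=100.0, measured=value, dt=1.0)
    return value
```

With these assumed gains the toy system settles on the setpoint. The point of the sketch is only that the controller acts on measured error rather than on what it believes it already did.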
Pattern 1: Checksum all of the things
watch:
  out:
    for:
      - YAML
this:
  file:
    can:
      be:
        - truncated
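A truncated YAML file can still parse as valid YAML, so a parser alone will not notice the damage. A minimal sketch of the checksum idea, assuming the publisher ships a SHA-256 digest alongside the payload (the function names `publish` and `verify` are hypothetical):

```python
import hashlib


def publish(payload: bytes) -> tuple[bytes, str]:
    """Return the payload together with its SHA-256 hex digest."""
    return payload, hashlib.sha256(payload).hexdigest()


def verify(payload: bytes, expected_digest: str) -> bool:
    """Accept the payload only if it hashes to the digest it was published with."""
    return hashlib.sha256(payload).hexdigest() == expected_digest


config = (
    b"watch:\n  out:\n    for:\n      - YAML\n"
    b"this:\n  file:\n    can:\n      be:\n        - truncated\n"
)
payload, digest = publish(config)

# Simulate a transfer that was cut off halfway. The first half is still
# well-formed YAML on its own, which is exactly why the checksum is needed.
truncated = payload[: payload.index(b"this:")]
```

Here `verify(payload, digest)` accepts the full file and `verify(truncated, digest)` rejects the cut-off one, even though both would load without a parse error.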
Pattern 2: Cryptographic Authentication
Encrypt and authenticate everything! Control Planes
are powerful, security-critical systems
Be able to revoke and rotate every credential. But also
watch out for certificate expiries
Prevent human access to production credentials
Never allow a non-production control plane to talk to
the production data plane
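As a rough sketch of the authentication and rotation points above, the following signs control messages with HMAC-SHA256 and looks keys up by ID, so that a key can be rotated in or revoked by editing the keyring. The keyring layout, key IDs, and secrets are hypothetical, and the sketch covers authentication only; encryption would be handled separately (for example by TLS).

```python
import hashlib
import hmac


def sign(key_id: str, message: bytes, keyring: dict[str, bytes]) -> bytes:
    """Compute an HMAC-SHA256 tag for the message using the named key."""
    return hmac.new(keyring[key_id], message, hashlib.sha256).digest()


def verify(key_id: str, message: bytes, tag: bytes, keyring: dict[str, bytes]) -> bool:
    """Accept the message only if the tag matches under a key that is still valid."""
    key = keyring.get(key_id)
    if key is None:
        return False  # unknown or revoked key
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)  # constant-time comparison
```

Keeping two keys in the keyring at once is what makes rotation possible without downtime: receivers accept both while senders move to the new one, and the old key is then deleted.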
Pattern 3: Cells, Shells, and Poison Tasters
We divide up our control planes horizontally into
regions, availability zones and cells
It’s also common to compartmentalize control
planes so that the data plane is insulated from
control plane crashes
Poison tasters: check up front that a change is
safe
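One way to picture a poison taster: the change is tried on a throwaway copy first, and the real cells are touched only if that trial survives. This is a minimal sketch under that assumption; `taste`, `roll_out`, and the `timeout_ms` rule are hypothetical names invented for the example.

```python
from typing import Callable


def taste(change: dict, validate: Callable[[dict], None]) -> bool:
    """Try the change on a throwaway copy; True only if validation passes."""
    try:
        validate(dict(change))
        return True
    except Exception:
        return False


def roll_out(change: dict, cells: list[dict], validate: Callable[[dict], None]) -> bool:
    """Apply the change to every cell, but only after the taster survives."""
    if not taste(change, validate):
        return False  # poisoned: no cell is touched
    for cell in cells:
        cell.update(change)
    return True


def validate_timeout(config: dict) -> None:
    # Hypothetical safety rule, for illustration only.
    if config.get("timeout_ms", 1) <= 0:
        raise ValueError("timeout_ms must be positive")
```

A real rollout would also proceed cell by cell with checks in between; the sketch applies to all cells at once to keep the taster step in focus.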
Pattern 4: Asynchronous Coupling
Synchronous systems are very strongly coupled
A problem in a synchronous downstream
dependency has immediate impact on the
upstream callers
Retries from upstream callers can all too easily
fan out and amplify problems
Pattern 4: Asynchronous Coupling
Asynchronously coupled systems tend to be more
tolerant
Can make partial progress even when some
components are unavailable
Workflows and queues can be tuned to have
deterministic retry behaviors
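To make "deterministic retry behaviors" concrete, here is a minimal sketch of a queue worker with a hard cap on attempts per task and a fixed, capped backoff schedule. The function names, the default of three attempts, and the backoff parameters are assumptions. The sketch does not sleep; in a real worker the schedule would set the delay before each retry.

```python
from collections import deque
from typing import Callable


def backoff_schedule(max_attempts: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Delay in seconds before each attempt: doubles each time, never above cap."""
    return [min(cap, base * 2 ** i) for i in range(max_attempts)]


def drain(tasks: list[str], handler: Callable[[str], None], max_attempts: int = 3):
    """Run every task; failures go to the back of the queue, then to a dead-letter list."""
    done, dead = [], []
    pending = deque((task, 0) for task in tasks)
    while pending:
        task, attempts = pending.popleft()
        try:
            handler(task)
            done.append(task)
        except Exception:
            if attempts + 1 < max_attempts:
                pending.append((task, attempts + 1))  # retry later, behind other work
            else:
                dead.append(task)  # give up: total work stays bounded
    return done, dead
```

Because a failing task is re-queued behind the others, one broken dependency does not block unrelated work, and the handler is called at most `max_attempts` times per task no matter how the failure behaves.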
Pattern 5: Closed Feedback Loops
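The slide gives only the pattern's name. One plausible reading, sketched below, is that a control plane should not assume a push succeeded: it measures what each host actually reports and keeps correcting until observed state matches desired state. The `push` and `report` callbacks and the round limit are hypothetical.

```python
from typing import Callable


def reconcile(
    desired: int,
    hosts: list[str],
    push: Callable[[str, int], None],
    report: Callable[[str], int],
    max_rounds: int = 5,
) -> list[str]:
    """Push until every host reports the desired version; return any stragglers."""
    for _ in range(max_rounds):
        # Measure first: act only on hosts whose observed state is wrong.
        lagging = [h for h in hosts if report(h) != desired]
        if not lagging:
            break
        for host in lagging:
            push(host, desired)
    return [h for h in hosts if report(h) != desired]
```

An open-loop version would call `push` once per host and stop. The closed loop differs in that a silently dropped push is detected by the next measurement and retried, and anything still lagging at the end is returned so it can be alarmed on.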
Pattern 6: Small pushes and large pulls
Very Frequently Asked Question: Is it better to
push, or to pull?
For example: should data plane hosts accept
connections and be pushed configurations, or
should they connect to the control plane and pull
them?
It’s really the wrong question!
Pattern 6: Small pushes and large pulls
Long-lived connections can support pushing
timely updates regardless of the “direction” of
the connection
Better to ask: which fleet is bigger? In general,
small fleets should connect to bigger fleets.
This avoids the problems of small fleets being
overwhelmed with thundering herds and retry
storms
Pattern 7: Avoiding Cold Starts and Cold Caches
Caches are bi-modal systems. Super fast when
they have entries, and slow when they are empty
A thundering herd hitting a cold cache can
prevent it from ever getting warm
Retry storms often need to be moderated by
throttles
Pattern 7: Avoiding Cold Starts and Cold Caches
Work out if you really need a cache at all
Pre-warm caches before accepting requests
Consider serving stale entries when backends are
unavailable
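Two of the suggestions above, pre-warming and serving stale entries, can be sketched in a few lines. This is an illustrative cache, not a production design; the class name, the TTL handling, and the injectable clock are assumptions made so the behavior is easy to see.

```python
import time
from typing import Callable


class StaleTolerantCache:
    """Cache that can be pre-warmed and falls back to stale entries on backend errors."""

    def __init__(self, fetch: Callable[[str], str], ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._entries: dict[str, tuple[str, float]] = {}  # key -> (value, fetched_at)

    def prewarm(self, keys: list[str]) -> None:
        """Fill the cache before the host starts accepting requests."""
        for key in keys:
            self._entries[key] = (self._fetch(key), self._clock())

    def get(self, key: str) -> str:
        entry = self._entries.get(key)
        now = self._clock()
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]  # fresh hit
        try:
            value = self._fetch(key)
        except Exception:
            if entry is not None:
                return entry[0]  # backend unavailable: serve the stale entry
            raise  # nothing cached at all, so the error has to surface
        self._entries[key] = (value, now)
        return value
```

Whether stale data is acceptable depends on what is being cached; the sketch assumes it is preferable to an error.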
Pattern 8: Throttles
Throttles and rate-limits are often needed to
moderate problem requestors and to dampen
fluctuating systems
Example: Amazon Elastic Load Balancer and
Amazon Elastic Compute Cloud (Amazon EC2)
It takes careful work to ensure that throttling does
not impact the end customer experience
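The slides do not say which throttling algorithm is used. A token bucket is one common choice, sketched here as an assumption: it allows short bursts up to `burst` requests while holding the long-run rate to `rate` requests per second.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket throttle: refills at `rate` tokens per second, holds at most `burst`."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.burst = burst
        self._clock = clock
        self._tokens = float(burst)  # start full so an idle caller can burst
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; otherwise reject the request."""
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.burst, self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

A rejected request costs almost nothing to handle, which is what lets a throttle dampen a retry storm instead of joining it.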
Pattern 9: Deltas
What happens when we do have too much
configuration state to push around?
More efficient to compute deltas and distribute
patches
But how do we actually do that?
Pattern 9: Deltas
Key   Value   Version
foo   bar     1
foo   baz     2
foo   bar     3
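The slides show the versioned table but not the protocol around it. One plausible reading, sketched here as an assumption, is that each write gets a new monotonically increasing version, a client remembers the last version it saw, and the server answers with only the newer entries. The function names are hypothetical.

```python
# The table from the slides as an append-only log of (key, value, version).
LOG = [
    ("foo", "bar", 1),
    ("foo", "baz", 2),
    ("foo", "bar", 3),
]


def delta_since(log: list[tuple[str, str, int]], since_version: int) -> dict[str, tuple[str, int]]:
    """Latest (value, version) per key among entries newer than since_version."""
    latest: dict[str, tuple[str, int]] = {}
    for key, value, version in log:
        if version > since_version:
            if key not in latest or version > latest[key][1]:
                latest[key] = (value, version)
    return latest


def apply_delta(state: dict[str, str], delta: dict[str, tuple[str, int]]) -> dict[str, str]:
    """Return a new state with the delta's values patched in."""
    patched = dict(state)
    for key, (value, _version) in delta.items():
        patched[key] = value
    return patched
```

A client that last saw version 1 receives `{"foo": ("bar", 3)}`. The value matches what it already holds, but the version tells it that it is now caught up to 3; a client already at version 3 receives an empty delta.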
Pattern 10: Modality and Constant-Work
So far, we can build a loosely coupled control
plane, with deltas to minimize work, and throttles
to keep things safe
But what if a LOT of things change at the same
time?
We don’t want to build up backlogs and queues
and introduce lag
Pattern 10: Modality and Constant-Work
Systems that change performance in response to
workload or data patterns can be fragile
Example: Relational databases are great for
flexible business queries, but terrible for stable
control planes. Hidden optimizations and query
plan flips can wreak havoc
Deployments, peak events, power events, all incur
risk because they can be new modes
Pattern 10: Modality and Constant-Work
How dumb would it be to make a really, really
simple control plane?
User calls an API that edits a configuration file on
Amazon Simple Storage Service (Amazon S3).
Push that configuration file every 10 seconds ...
whether it changed or not!
Very, very reliable and robust
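A minimal sketch of that constant-work loop: on every tick the full configuration is read and sent to every host, with no check for whether anything changed. The `read_config` and `send` callbacks are hypothetical stand-ins for the S3 read and the push; a real loop would also wait 10 seconds between ticks.

```python
from typing import Callable


def constant_work_push(
    read_config: Callable[[], dict],
    hosts: list[str],
    send: Callable[[str, dict], None],
    ticks: int,
) -> int:
    """Each tick, send the full config to every host, changed or not."""
    sends = 0
    for _ in range(ticks):
        snapshot = read_config()  # always the whole file, never a diff
        for host in hosts:
            send(host, snapshot)
            sends += 1
    return sends
```

The number of sends depends only on `ticks` and the number of hosts. A quiet day and a day where everything changes cost exactly the same, so there is no separate busy mode to fail in.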
Pattern 10: Modality and Constant-Work
Our network health checks, including Amazon
Route 53 health checks, are a good example
Health checks are happening all of the time
Results are being published to consumers, all of the
time
Zone or Region failure = no difference!
Pattern 10: Modality and Constant-Work
100 nodes requesting a configuration every
second
$1200 / year in request costs
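The slide does not show how the figure is derived. A back-of-the-envelope check, assuming the requests are Amazon S3 GETs priced at $0.0004 per 1,000 requests (an assumed rate; actual pricing varies by Region and over time):

```python
NODES = 100
REQUESTS_PER_NODE_PER_SECOND = 1
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000

# Assumption: S3 GET pricing of $0.0004 per 1,000 requests.
PRICE_PER_1000_REQUESTS_USD = 0.0004

requests_per_year = NODES * REQUESTS_PER_NODE_PER_SECOND * SECONDS_PER_YEAR
annual_cost_usd = requests_per_year / 1000 * PRICE_PER_1000_REQUESTS_USD

print(f"{requests_per_year:,} requests/year, about ${annual_cost_usd:,.0f}/year")
```

Under that assumed price, roughly 3.15 billion requests a year come to a little over $1,260, which is in line with the slide's rounded $1200.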
What did we learn about building stable systems?
Closing loops is critical: measure the progress!
Loose asynchronous coupling helps
Think about the modalities of the system
Our lessons are baked into Amazon API Gateway
and AWS Lambda
Thank you!
Introduzione a Amazon Elastic Container Service
Amazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Closing Loops and Opening Minds: How to Take Control of Systems, Big and Small (ARC337) - AWS re:Invent 2018

  • 1.
  • 2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Closing Loops and Opening Minds: How to Take Control of Systems, Big and Small. Colm MacCárthaigh, Senior Principal Engineer, AWS. ARC337
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8. “Quality is not an act, it is a habit” Aristotle, some time around 350BC
  • 9. Amazon CloudFront Control Plane
  • 10. CloudFront Control Plane
  • 11. CloudFront Control Plane. Quadrant labels: (-, +), (-, -), (+, -), (+, +)
  • 12. What goes into high quality designs: Diverse creative minds working in a fearless environment. Systematic reviews and mechanisms to share lessons. Use well-worn patterns where possible and focus invention where it is truly needed. Testing, testing, testing, testing, testing.
  • 13. How we make trade-offs in design
  • 14. Control Planes vs. Data Planes: Control planes are often a bigger design challenge than the data planes that they support. Poorly designed control planes have the ability to cause large outages, or worse: misconfigurations and corruption.
  • 15. What do Control Planes do in the Cloud? Manage the life cycle for resources. Provision software. Provision service configuration. Provision user configuration.
  • 16. What do Control Planes do in the Cloud? Manage the life cycle for resources. Provision software. Provision service configuration. Provision user configuration.
  • 17.
  • 18. Control Theory 101: Independently discovered in several fields of engineering and science. Formalized in the early-to-mid twentieth century. One of the most under-appreciated branches of science, incredibly relevant to distributed systems.
  • 19. Control Theory 101
  • 20. Control Theory 101
  • 21. Control Theory 101
  • 22. Control Theory 101
  • 23. Control Theory 101
  • 24. Control Theory 101: PID
  • 25.
  • 26.
  • 27. Pattern 1: Checksum all of the things
  • 28. Pattern 1: Checksum all the things. Example: watch: out: for: - YAML this: file: can: be: -truncated
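Pattern 1's warning can be made concrete with a small sketch. Some formats can still parse after being cut short, so a successful parse is not proof that a configuration arrived intact. The example below is illustrative only: the function names `publish` and `load`, the digest-then-newline framing, and the choice of JSON with SHA-256 are assumptions made for this sketch, not details from the talk.

```python
import hashlib
import json


def publish(config: dict) -> bytes:
    """Serialize a config and prepend a SHA-256 digest of the payload.

    The framing (hex digest, newline, payload) is a hypothetical choice
    made for this sketch.
    """
    payload = json.dumps(config, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest().encode("ascii")
    return digest + b"\n" + payload


def load(blob: bytes) -> dict:
    """Verify the digest before parsing; raise ValueError on any mismatch."""
    digest, separator, payload = blob.partition(b"\n")
    expected = hashlib.sha256(payload).hexdigest().encode("ascii")
    if not separator or digest != expected:
        raise ValueError("checksum mismatch: refusing to apply config")
    return json.loads(payload)
```

A receiver that calls `load` on a blob that lost its last few bytes gets a `ValueError` instead of a silently shortened configuration.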
  • 29. Pattern 2: Cryptographic Authentication. Encrypt and authenticate everything! Control planes are powerful and security-critical systems. Be able to revoke and rotate every credential, but also watch out for certificate expiries. Prevent human access to production credentials. Never allow a non-production control plane to talk to the production data plane.
  • 30. Pattern 3: Cells, Shells, and Poison Tasters. We divide up our control planes horizontally into regions, availability zones, and cells. It’s also common to compartmentalize control planes so that the data plane is insulated from control plane crashes. Poison tasters: check up front that a change is safe.
  • 31. Pattern 4: Asynchronous Coupling. Synchronous systems are very strongly coupled. A problem in a synchronous downstream dependency has immediate impact on the upstream callers. Retries from upstream callers can all too easily fan out and amplify problems.
  • 32. Pattern 4: Asynchronous Coupling. Asynchronously coupled systems tend to be more tolerant. They can make partial progress even when some components are unavailable. Workflows and queues can be tuned to have deterministic retry behaviors.
  • 33. Pattern 5: Closed Feedback Loops
  • 34. Pattern 6: Small pushes and large pulls. Very Frequently Asked Question: Is it better to push, or to pull? For example: should data plane hosts accept connections and be pushed configurations, or should they connect to the control plane and pull them? It’s really the wrong question!
  • 35. Pattern 6: Small pushes and large pulls. Long-lived connections can support pushing timely updates regardless of the “direction” of the connection. Better to ask: which fleet is bigger? In general, small fleets should connect to bigger fleets. This avoids the problems of small fleets being overwhelmed with thundering herds and retry storms.
  • 36. Pattern 7: Avoiding Cold Starts and Cold Caches. Caches are bi-modal systems: super fast when they have entries, and slow when they are empty. A thundering herd hitting a cold cache can prevent it from ever getting warm. Retry storms often need to be moderated by throttles.
  • 37. Pattern 7: Avoiding Cold Starts and Cold Caches. Work out if you really need a cache at all. Pre-warm caches before accepting requests. Consider serving stale entries when backends are unavailable.
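Two of the suggestions in Pattern 7, pre-warming and serving stale entries, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the class name `StaleTolerantCache`, the `fetch` callback, and the injectable `clock` are hypothetical, and a real cache would also need eviction and concurrency control.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


class StaleTolerantCache:
    """Cache that can be pre-warmed and falls back to stale entries."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # hypothetical backend lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def prewarm(self, keys: Iterable[str]) -> None:
        """Fill the cache before the host starts accepting requests."""
        for key in keys:
            self._entries[key] = (self._fetch(key), self._clock())

    def get(self, key: str) -> str:
        entry = self._entries.get(key)
        now = self._clock()
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]          # fresh hit
        try:
            value = self._fetch(key)
        except Exception:
            if entry is not None:
                return entry[0]      # backend unavailable: serve stale
            raise                    # nothing cached, so surface the error
        self._entries[key] = (value, now)
        return value
```

Serving a stale entry trades freshness for availability, so it only suits data where an old answer is safer than no answer.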
  • 38. Pattern 8: Throttles. Throttles and rate limits are often needed to moderate problem requestors and to dampen fluctuating systems. Example: Amazon Elastic Load Balancer and Amazon Elastic Compute Cloud (Amazon EC2). It takes careful work to ensure that throttling does not impact the end customer experience.
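One common way to build the kind of throttle Pattern 8 describes is a token bucket. The talk does not say which algorithm AWS uses, so the sketch below is a generic illustration; the class name, parameters, and injectable clock are assumptions.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `burst` requests, refilling at `rate_per_second`."""

    def __init__(self, rate_per_second: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate_per_second
        self._capacity = burst
        self._tokens = burst         # start with a full bucket
        self._clock = clock
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True if the request may proceed, False if throttled."""
        now = self._clock()
        # Refill in proportion to elapsed time, capped at the bucket size.
        elapsed = now - self._last
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

Keeping one bucket per caller, rather than one shared bucket, is a typical way to stop a single problem requestor from consuming the allowance of everyone else.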
  • 39. Pattern 9: Deltas. What happens when we do have too much configuration state to push around? It is more efficient to compute deltas and distribute patches. But how do we actually do that?
  • 40. Pattern 9: Deltas. Table columns: Key, Value. Rows: (foo, bar).
  • 41. Pattern 9: Deltas. Table columns: Key, Value, Version. Rows: (foo, bar, 1).
  • 42. Pattern 9: Deltas. Table columns: Key, Value, Version. Rows: (foo, bar, 1), (foo, baz, 2).
  • 43. Pattern 9: Deltas. Table columns: Key, Value, Version. Rows: (foo, bar, 1), (foo, baz, 2), (foo, bar, 3).
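The versioned table in the Pattern 9 slides suggests one way to compute deltas: stamp every write with an increasing version, then ask for everything newer than the version a consumer already holds. The slides show only the table, so the class `VersionedStore` and its `delta_since` query are an assumed reading of how the version column would be used.

```python
from typing import Dict, List, Tuple


class VersionedStore:
    """Append-only key/value log where every write gets a new version."""

    def __init__(self) -> None:
        self._version = 0
        self._log: List[Tuple[str, str, int]] = []

    def put(self, key: str, value: str) -> int:
        """Record a write and return the version assigned to it."""
        self._version += 1
        self._log.append((key, value, self._version))
        return self._version

    def delta_since(self, version: int) -> Dict[str, Tuple[str, int]]:
        """Return the latest (value, version) for keys changed after `version`."""
        latest: Dict[str, Tuple[str, int]] = {}
        for key, value, written_at in self._log:
            if written_at > version:
                latest[key] = (value, written_at)  # later writes overwrite
        return latest
```

Replaying the rows from the slides (foo=bar, then foo=baz, then foo=bar again), a consumer holding version 1 still receives a patch for `foo` at version 3, even though the value matches what it already has. The version column records that a change happened, which a plain comparison of values would miss.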
  • 44. Pattern 10: Modality and Constant-Work. So far, we can build a loosely coupled control plane, with deltas to minimize work, and throttles to keep things safe. But what if a LOT of things change at the same time? We don’t want to build up backlogs and queues and introduce lag.
  • 45. Pattern 10: Modality and Constant-Work. Systems that change performance in response to workload or data patterns can be fragile. Example: Relational databases are great for flexible business queries, but terrible for stable control planes. Hidden optimizations and query plan flips can wreak chaos. Deployments, peak events, and power events all incur risk because they can be new modes.
  • 46. Pattern 10: Modality and Constant-Work. How dumb would it be to make a really, really simple control plane? A user calls an API that edits a configuration file on Amazon Simple Storage Service (Amazon S3). Push that configuration file every 10 seconds … whether it changed or not! Very, very reliable and robust.
  • 47. Pattern 10: Modality and Constant-Work. Our network health checks, including Amazon Route 53 health checks, are a good example. Health checks are happening all of the time. Results are being published to consumers, all of the time. Zone or Region failure = no difference!
  • 48. Pattern 10: Modality and Constant-Work. 100 nodes requesting a configuration every second: $1200 / year in request costs.
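The cost figure on slide 48 can be checked with simple arithmetic. The slide does not state a unit price, so the sketch below assumes $0.0004 per 1,000 requests (an assumed figure for illustration; check current pricing for your Region and storage class) and a 365-day year.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000


def yearly_request_cost(nodes: int,
                        requests_per_second_per_node: float,
                        price_per_1000_requests: float) -> float:
    """Cost of a fleet polling at a constant rate for one year."""
    total_requests = nodes * requests_per_second_per_node * SECONDS_PER_YEAR
    return total_requests / 1000 * price_per_1000_requests


# Assumed price: $0.0004 per 1,000 requests (not stated on the slide).
cost = yearly_request_cost(nodes=100,
                           requests_per_second_per_node=1,
                           price_per_1000_requests=0.0004)
print(f"${cost:,.2f} per year")
```

Under those assumptions the total is about 3.15 billion requests a year and roughly $1,261, which is in line with the slide's rounded $1200. The cost is also the same whether or not anything changed, which is the point of a constant-work design.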
  • 49. What did we learn about building stable systems? Closing loops is critical: measure the progress! Loose asynchronous coupling helps. Think about the modalities of the system. Our lessons are baked into Amazon API Gateway and AWS Lambda.
  • 50. Thank you!
  • 51.