What is the AWS Well-Architected Framework?
The AWS Well-Architected Framework is a white paper published by Amazon.
It is written by a team of AWS Solutions Architects and aims to share best
practices and core strategies for architecting in the cloud.
The white paper is designed for technical employees at all levels, including:
▪ Operations Team Members
The paper outlines 5 pillars, which are the foundation of creating a Well-Architected
It also discusses general design principles to facilitate good design in the cloud.
Security Reliability Performance
Stop guessing your capacity needs
With the cloud, there is no need to have
resources idling away doing nothing or
have downtime due to capacity being
exceeded. Scale up or down as needed.
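As a rough illustration of scaling up or down as needed, the sketch below resizes a fleet in proportion to how far observed utilization is from a target. The function name, the utilization figures, and the min/max bounds are assumptions made for this example; it is not an AWS API, only a simplified version of the idea.

```python
import math


def desired_capacity(current_instances, observed_utilization, target_utilization,
                     min_instances=1, max_instances=20):
    """Return the instance count that would bring utilization back to the target.

    Hypothetical helper: a simple proportional rule, clamped to fleet bounds.
    """
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    raw = current_instances * observed_utilization / target_utilization
    # Round up so the fleet is never sized below what the load requires.
    return max(min_instances, min(max_instances, math.ceil(raw)))


print(desired_capacity(4, 90, 60))  # busy fleet: scale out to 6
print(desired_capacity(4, 15, 60))  # idle fleet: scale in to 1
```

The clamp keeps a runaway metric from scaling the fleet without limit, and keeps at least one instance running.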
Test systems at production scale
In the cloud, you can create a production
scale test environment on-demand and
simply shut it down as soon as you are
Automate to make architectural
Automation allows you to create and
replicate your systems at low cost and
avoid the expense of manual effort.
Allow for evolutionary architectures
Rather than static, one-time architecture
choices, businesses can take advantage of
innovations and change their architecture.
e.g. New Instance Classes / Lambda vs EC2
Drive architectures using data
In the cloud, you can collect data on how
your architectural choices affect the
behaviour of your workload. This lets
you make fact-based decisions on how
to improve your workload.
e.g. MySQL RDS vs Aurora
Improve through game days
Test how your architecture and
processes perform by regularly
scheduling game days to simulate
e.g. Black Friday Deals
▪ Perform operations with code
▪ Annotate documentation
▪ Make frequent, small, reversible
▪ Anticipate failure
- Test for responses to
- Simian army (Chaos
monkey, chaos snail) used
▪ Learn from operational events and
▪ Refine operations procedure
▪ How are you evolving your
workload while minimizing the
impact of change?
▪ How do you monitor your
workload to ensure it is operating
▪ How do you respond to unplanned
▪ How is escalation managed when
responding to unplanned
Security Design Principles
▪ Implement a strong identity
▪ Enable traceability
▪ Apply security at all layers
▪ Automate security best practices
▪ Protect data in transit and at rest
▪ Prepare for security events
▪ How are you protecting access to and
use of the AWS root account
▪ How are you enforcing network and
host level boundary protection?
▪ How are you encrypting and
protecting your data at rest?
▪ How are you encrypting and
protecting your data in transit?
▪ How are you managing keys and
▪ How are you capturing and analyzing
▪ Sample of 6 questions, full 12 are in
Reliability Design Principles
▪ Test recovery procedures
▪ Automatically recover from failure
▪ Scale horizontally to increase
aggregate system availability
▪ Stop guessing capacity
▪ Manage change in automation
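To see why scaling horizontally increases aggregate availability, here is a small worked calculation. It assumes instance failures are independent and that the system stays up as long as at least one instance is up; real deployments rarely meet both assumptions exactly, and the function name is made up for this sketch.

```python
def aggregate_availability(instance_availability, instance_count):
    """Availability of a group that is up while at least one instance is up.

    Assumes independent failures (an idealization for illustration).
    """
    if not 0.0 <= instance_availability <= 1.0:
        raise ValueError("instance_availability must be between 0 and 1")
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    # The group is down only when every instance is down at the same time.
    return 1.0 - (1.0 - instance_availability) ** instance_count


for count in (1, 2, 3):
    print(count, f"{aggregate_availability(0.99, count):.4%}")
```

Under these assumptions, two instances that are each 99% available give roughly 99.99% for the pair.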
▪ How does your system adapt to
changes in demand?
▪ How are you monitoring AWS
▪ How are you executing change?
▪ How are you backing up your
▪ How does your system withstand
▪ How are you testing resiliency?
▪ How are you planning for disaster
▪ Democratize advanced
▪ Go global in minutes
▪ Use Serverless architecture
▪ Experiment more often
▪ Mechanical sympathy
▪ How do you select the best
▪ How did you select your compute
▪ How do you select your storage
▪ How do you select your database
▪ How do you configure your
▪ How do you ensure that you continue
to have the most appropriate resource
type as new resource types and
features are introduced?
▪ Adopt a consumption model
▪ Measure overall efficiency
▪ Stop spending money on data
▪ Analyze and attribute expenditure
▪ Use managed services to reduce
the cost of ownership
▪ Are you considering cost when
you select AWS services for your
▪ Have you sized your resources to
meet your cost targets?
▪ Have you selected the appropriate
pricing model to meet cost
▪ How do you make sure your
capacity matches but does not
exceed what you need?
▪ How are you monitoring usage
▪ Do you decommission resources
that you no longer need or stop
resources that are temporarily not
Thanks for listening
Operational Excellence: Run and monitor systems to deliver business value, and continually improve supporting processes and procedures.
Security: Protect information, systems and assets while delivering value through risk assessments and mitigation strategies.
Reliability: The ability of a system to recover from infrastructure or service disruptions, dynamically acquire computing resources to meet demand and mitigate disruptions such as misconfigurations or transient network issues.
Performance Efficiency: Use resources efficiently to meet system requirements and to maintain efficiency as demand changes and technologies evolve.
Cost Optimization: The ability to avoid or eliminate unneeded cost or suboptimal resources.
- Multiple layers of defense are advisable in any environment.
- Boundary protection: VPC security groups, NACLs
- Monitoring points of ingress/egress
- Comprehensive logging
Democratize advanced technologies
Amazon’s way of saying: use managed resources where possible, especially where the technology is difficult or complicated, e.g. media transcoding, NoSQL databases.
- Understanding the hardware makes you a better developer. Consider data access patterns when selecting database or storage approaches. Consider the instance type: optimized for memory vs compute?
How do you ensure that you continue to have the most appropriate resource type as new resource types and features are introduced?
- In other words, how do you ensure the correct choice you made stays correct as new products/instance classes are brought to market?
Adopt a consumption model
Pay only for what you need. Stop services when not in use. Running a service only for the 40 hours of a developer’s work week, rather than all 168 hours, cuts costs by roughly 75%.
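The arithmetic behind that figure can be checked with a short sketch. It assumes the bill is purely proportional to running hours, which ignores charges that continue while a service is stopped (for example, storage); the function name is illustrative.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours in a week


def savings_fraction(hours_used_per_week):
    """Fraction of an always-on bill saved by running only the given hours.

    Assumes cost scales linearly with running hours (a simplification).
    """
    if not 0 <= hours_used_per_week <= HOURS_PER_WEEK:
        raise ValueError("hours_used_per_week must be between 0 and 168")
    return 1 - hours_used_per_week / HOURS_PER_WEEK


print(f"{savings_fraction(40):.1%}")  # 76.2%
```

So a 40-hour week saves a little over three quarters of the always-on cost.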
Have you sized your resources to meet your cost targets?
i.e. a small instance that takes 23 hours to run an operation could actually cost more than a large instance that runs the same code in under 1 hour.
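A quick sketch of that comparison follows. The hourly rates are invented for illustration and are not real AWS prices; the 23-hour and 1-hour runtimes come from the note above, with 1 hour used as an upper bound for the large instance.

```python
def job_cost(hourly_rate, runtime_hours):
    """Cost of one job, assuming billing is proportional to runtime."""
    return hourly_rate * runtime_hours


# Hypothetical rates: the large instance costs 16x more per hour.
small = job_cost(hourly_rate=0.10, runtime_hours=23)
large = job_cost(hourly_rate=1.60, runtime_hours=1)

print(f"small instance: {small:.2f}")  # 2.30
print(f"large instance: {large:.2f}")  # 1.60
print("large is cheaper" if large < small else "small is cheaper")
```

With these assumed rates, the instance that looks more expensive per hour finishes the job for less.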
Spot / On-Demand / Reserved
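One way to reason about the three pricing models is to compare a month's cost for an expected number of running hours. The sketch below uses invented rates and treats Reserved as a flat monthly commitment and Spot as usable only when the workload tolerates interruption; actual pricing terms are more detailed than this.

```python
def monthly_costs(hours, on_demand_hourly, spot_hourly, reserved_monthly):
    """Estimated monthly cost per pricing model (all rates are hypothetical)."""
    return {
        "on-demand": hours * on_demand_hourly,
        "spot": hours * spot_hourly,
        "reserved": reserved_monthly,  # paid whether or not the instance runs
    }


def cheapest_model(costs, interruptible):
    """Pick the cheapest model, excluding spot for non-interruptible work."""
    candidates = {name: cost for name, cost in costs.items()
                  if interruptible or name != "spot"}
    return min(candidates, key=candidates.get)


always_on = monthly_costs(730, 0.10, 0.03, 45.0)
office_hours = monthly_costs(160, 0.10, 0.03, 45.0)

print(cheapest_model(always_on, interruptible=False))     # reserved
print(cheapest_model(office_hours, interruptible=False))  # on-demand
print(cheapest_model(always_on, interruptible=True))      # spot
```

The break-even point between On-Demand and Reserved in this model is simply the reserved monthly cost divided by the on-demand hourly rate.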