2. Who I am
• Carlos Henrique Mecking
• Solutions Architect
• Twitter: @henriquemecking
• Github: @mecking
3. First: The Six Advantages and Benefits
Trade capital expense for variable expense
Benefit from massive economies of scale
Eliminate guessing on your capacity needs
Increase speed and agility
Stop spending money on running and maintaining data centers
Go global in minutes
5. Topics
• History / why we need it
• What it is
• Design principles
• Pillars
• Security
• Reliability
• Performance Efficiency
• Cost Optimization
• Operational Excellence
6. History
“AWS Solutions Architects have years of experience architecting
solutions across a wide variety of business verticals and use cases,
and we have helped design and review thousands of customers’
architectures on AWS. From this experience, we have identified
best practices and core strategies for architecting systems in the
cloud.”
7. The AWS Well-Architected Framework
Increases awareness of architectural best practices
Addresses foundational areas that are often neglected
Provides a consistent approach to evaluating architectures
(Diagram: AWS Well-Architected Framework, made up of Design Principles, Pillars, Questions, and Best Practices)
8. Design principles
• Stop guessing your capacity needs: use the cloud’s ability to scale on demand rather than
guessing capacity needs and risking inadequate capacity.
• Test systems at production scale: scale up the system to what it would be in production
and test it to see how it works in the real environment. Decommission the extra
resources once the test is over.
• Automate to make architectural experimentation easier: automate the entire process of
creating a system so that it can be replicated easily. Automation also makes it simple to
return to a previous setup.
• Allow for evolutionary architectures: automation enables architects to evolve systems
as needed, easily testing and setting up new configurations.
• Data-driven architectures: collect the operational data needed to evaluate how
architectural changes affect the workloads. The same data can also be used to tune
the automation code.
• Improve through game days: inject failures to simulate operational events in production,
understand how the system behaves when they occur, and correct it if necessary.
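The "stop guessing your capacity needs" principle can be illustrated with a simple proportional scaling rule. This is only a minimal sketch: the function name, the metric values, and the instance bounds are all assumptions for illustration, and it is not the algorithm that any AWS scaling service actually uses.

```python
import math


def desired_capacity(current_instances, observed_metric, target_metric,
                     min_instances=1, max_instances=20):
    """Size the fleet so the per-instance metric lands near the target.

    Hypothetical helper: observed_metric and target_metric are the same
    per-instance measurement (for example average CPU percent).
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    # Scale the fleet in proportion to how far the metric is from target.
    raw = math.ceil(current_instances * observed_metric / target_metric)
    # Clamp to the configured bounds so a bad reading cannot run away.
    return max(min_instances, min(max_instances, raw))
```

For example, four instances at 90% CPU with a 60% target would grow to six, and the same fleet at 15% would shrink to the configured minimum.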
10. Security Pillar: Principles
1. Apply security at all layers
2. Enable traceability
3. Implement a principle of least privilege
4. Focus on securing your system
5. Automate security best practices
11. Security Pillar: Question/Example
SEC 1. How are you protecting access to and use of the AWS root account credentials?
The AWS root account credentials are similar to root or local admin in other operating
systems and should be used very sparingly. The current best practice is to create AWS
Identity and Access Management (IAM) users, associate them to an administrator group,
and use the IAM user to manage the account. The AWS root account should not have API
keys, should have a strong password, and should be associated with a hardware multi-
factor authentication (MFA) device. This forces the only use of the root identity to be via
the AWS Management Console and does not allow the root account to be used for
application programming interface (API) calls. Note that some resellers or regions do not
distribute or support the AWS root account credentials.
Best practices:
• MFA and Minimal Use of Root: The AWS root account credentials are used only for the
minimal required activities.
• No use of Root
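The SEC 1 guidance (no API keys on root, MFA enabled) can be checked from an IAM account summary. A minimal sketch, assuming the summary is the "SummaryMap" dictionary returned by the IAM GetAccountSummary API (for example via boto3's `get_account_summary()`); the function name is hypothetical. Note that the summary reports only whether MFA is enabled, not whether the device is a hardware one.

```python
def root_account_findings(summary_map):
    """Return a list of root-credential problems found in an IAM account summary.

    summary_map is assumed to be the "SummaryMap" dict, e.g.
    boto3.client("iam").get_account_summary()["SummaryMap"].
    """
    findings = []
    # 1 means the root account has at least one access key.
    if summary_map.get("AccountAccessKeysPresent", 0) != 0:
        findings.append("root account has API access keys")
    # 1 means an MFA device is enabled for the root account.
    if summary_map.get("AccountMFAEnabled", 0) != 1:
        findings.append("root account has no MFA device")
    return findings
```

An empty list means both checks passed; anything else is a candidate item for the review.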
12. Security Pillar: Questions
• 1. Identity and access management
• SEC 1. How are you protecting access to and use of the AWS root account credentials?
• BP: MFA and Minimal Use of Root. The AWS root account credentials are used only for
the minimal required activities.
• BP: No use of Root
• SEC 2. How are you defining roles and responsibilities of system users to control human
access to the AWS Management Console and API?
• BP: Employee Life-Cycle Managed. Employee life-cycle policies are defined and enforced.
• BP: Least Privilege. Users, groups, and roles are clearly defined and granted only the
minimum privileges needed to accomplish business requirements.
• SEC 3. How are you limiting automated access to AWS resources?
• BP: Static Credentials Used for Automated Access. Store these securely.
• BP: Dynamic Authentication for Automated Access. Manage using instance profiles or
AWS STS.
• 2. Detective controls
• SEC 4. How are you capturing and analyzing logs?
• BP: Activity Monitored Appropriately. Amazon CloudWatch logs, events, VPC flow logs,
ELB logs, S3 bucket logs, AWS CloudTrail enabled, monitored operating system or
application logs
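For the least-privilege best practice under SEC 2, a coarse first check is to look for Allow statements that use a bare wildcard. A minimal sketch over a standard IAM policy document; the function name is hypothetical, and the heuristic is deliberately blunt (some actions legitimately require `"Resource": "*"`), so treat hits as prompts for review rather than violations.

```python
def overly_broad_statements(policy):
    """Return Allow statements that grant every action or touch every resource."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a lone statement may not be wrapped in a list
        statements = [statements]

    def as_list(value):
        # Action and Resource may each be a string or a list of strings.
        return value if isinstance(value, list) else [value]

    flagged = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = as_list(stmt.get("Action", []))
        resources = as_list(stmt.get("Resource", []))
        if "*" in actions or "*" in resources:
            flagged.append(stmt)
    return flagged
```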
13. Security Pillar: Questions
3. Infrastructure protection
• SEC 5. How are you enforcing network and host-level boundary protection?
• BP: Controlled Network Traffic in VPC. For example, use firewalls, security groups, NACLs,
a bastion host, etc.
• BP: Controlled Network Traffic at the Boundary. For example, use AWS WAF, host-based
firewalls, security groups, NACLs, etc.
• SEC 6. How are you leveraging AWS service level security features?
• BP: Using Additional Features Where Appropriate
• SEC 7. How are you protecting the integrity of the operating systems on your Amazon EC2
instances?
• BP: File Integrity. File integrity controls are used for EC2 instances.
• BP: EC2 Intrusion Detection. Host-based intrusion detection controls are used for EC2
instances.
• BP: AWS Marketplace or Partner Solution. A solution from the AWS Marketplace or from a
Partner.
• BP: Configuration Management Tool. Use of a custom Amazon Machine Image (AMI) or
configuration management tools (such as Puppet or Chef) that are secured by default.
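The file integrity practice under SEC 7 boils down to recording known-good hashes and comparing against them later. The sketch below shows the idea with the standard library only; the function names are hypothetical, and a real deployment would more likely rely on a dedicated tool and would need to protect the baseline itself from tampering.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_baseline(paths):
    """Record the current hash of every monitored file."""
    return {str(p): sha256_of(p) for p in paths}


def changed_files(baseline):
    """Return monitored paths that are missing or whose hash differs from the baseline."""
    changed = []
    for path, expected in baseline.items():
        if not Path(path).is_file() or sha256_of(path) != expected:
            changed.append(path)
    return sorted(changed)
```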
14. Security Pillar: Questions
4. Data protection
• SEC 8. How are you classifying your data?
• BP: Using Data Classification Schema
• BP: All data is Treated as Sensitive
• SEC 9. How are you encrypting and protecting your data at rest?
• BP: Not Required. Encryption of data at rest is not required.
• BP: Encrypting at Rest
• SEC 10. How are you managing keys?
• BP: AWS CloudHSM, Using AWS Service Controls, Using Client Side, AWS Marketplace or
Partner Solution
• SEC 11. How are you encrypting and protecting your data in transit?
• BP: Not Required. Encryption is not required on data in transit.
• BP: Encrypted Communications. TLS or equivalent is used for communication as appropriate.
5. Incident response
• SEC 12. How do you ensure you have the appropriate incident response?
• BP: Pre-Provisioned Access, Pre-Deployed Tools, Non-Production Game Days, Production
Game Days
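For the "Encrypted Communications" practice under SEC 11, a client can insist on certificate verification and a minimum TLS version. A minimal sketch using Python's standard `ssl` module; the function name and the choice of TLS 1.2 as the floor are assumptions, not something the framework prescribes.

```python
import ssl


def strict_tls_context():
    """Client-side TLS context that verifies certificates and refuses old protocols."""
    # create_default_context() already enables certificate and hostname checks.
    ctx = ssl.create_default_context()
    # Assumed policy for this sketch: nothing older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

The returned context can be passed to standard-library clients, for example as the `context` argument of `urllib.request.urlopen`.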
15. Reliability Pillar: Principles
1. Test recovery procedures
2. Automatically recover from failure
3. Scale horizontally to increase aggregate system availability
4. Stop guessing capacity
5. Manage change in automation
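One small-scale instance of "automatically recover from failure" is retrying a transient error with exponential backoff and jitter. This is a sketch only: the function name, the defaults, and the choice to retry on any exception are assumptions, and real code would normally retry only on errors known to be transient.

```python
import random
import time


def call_with_retries(operation, max_attempts=5, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep):
    """Call operation(); on an exception, wait a random backoff and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Exponential ceiling (0.1, 0.2, 0.4, ...) capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: pick a random wait so many clients do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
```

The `sleep` parameter exists so the waiting can be replaced in tests.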
16. Reliability Pillar: Questions
1. Foundations
• REL 1. How do you manage AWS service limits for your accounts?
• BP: Monitor and Manage Limits
• BP: Set Up Automated Monitoring
• REL 2. How are you planning your network topology on AWS?
• BP: Highly Available Connectivity Between AWS and On-Premises Environment (as Applicable)
2. Change Management
• REL 3. How does your system adapt to changes in demand?
• BP: Automated Scaling
• BP: Load Tested
• REL 4. How are you monitoring AWS resources?
• …
• REL 5. How are you executing change?
3. Failure Management
• REL 6. How are you backing up your data?
• REL 7. How does your system withstand component failures?
• REL 8. How are you testing for resiliency?
• REL 9. How are you planning for disaster recovery?
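The automated limit monitoring suggested under REL 1 can be reduced to comparing usage against each limit and flagging anything close to exhaustion. A minimal sketch: the function name, the limit names, and the 80% threshold are hypothetical, and in practice the two dictionaries would be filled from whatever source reports the account's usage and limits.

```python
def limits_near_exhaustion(usage, limits, threshold=0.8):
    """Return {name: ratio} for every limit whose usage is at or above threshold."""
    report = {}
    for name, limit in limits.items():
        if limit <= 0:
            continue  # skip limits that are unset or reported as zero
        ratio = usage.get(name, 0) / limit
        if ratio >= threshold:
            report[name] = round(ratio, 2)
    return report
```

With 18 of 20 instances in use and 2 of 5 VPCs, only the instance limit would be reported, at a ratio of 0.9.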
17. Performance Efficiency: Principles
1. Democratize advanced technologies
2. Go global in minutes
3. Use serverless architectures
4. Experiment more often
5. Mechanical sympathy
18. Performance Efficiency
1. Selection
• PERF 1. How do you select the best performing architecture?
• PERF 2. How do you select your compute solution?
• PERF 3. How do you select your storage solution?
• PERF 4. How do you select your database solution?
• PERF 5. How do you select your network solution?
2. Review
• PERF 6. How do you ensure that you continue to have the most appropriate resource type as
new resource types and features are introduced?
3. Monitoring
• PERF 7. How do you monitor your resources post-launch to ensure they are performing as
expected?
4. Tradeoffs
• PERF 8. How do you use tradeoffs to improve performance?
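For PERF 7, "performing as expected" is often stated as a latency percentile target. The sketch below uses the nearest-rank percentile definition; the function names, the 95th percentile, and the millisecond targets are assumptions for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_latency_target(samples_ms, target_ms, pct=95):
    """True when the chosen percentile of observed latencies is within the target."""
    return percentile(samples_ms, pct) <= target_ms
```

A percentile is used instead of the mean because a few very slow requests can hide behind a healthy average.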
19. Cost Optimization: Principles
1. Adopt a consumption model
2. Benefit from economies of scale
3. Stop spending money on data center operations
4. Analyze and attribute expenditure
5. Use managed services to reduce cost of ownership
20. Cost Optimization
1. Cost-Effective Resources
• COST 1. Are you considering cost when you select AWS services for your solution?
• COST 2. Have you sized your resources to meet your cost targets?
• COST 3. Have you selected the appropriate pricing model to meet your cost targets?
2. Matching Supply and Demand
• COST 4. How do you make sure your capacity matches but does not substantially exceed
what you need?
3. Expenditure Awareness
• COST 5. Did you consider data-transfer charges when designing your architecture?
• COST 6. How are you monitoring usage and spending?
• COST 7. Do you decommission resources that you no longer need or stop resources that are
temporarily not needed?
• COST 8. What access controls and procedures do you have in place to govern AWS usage?
4. Optimizing Over Time
• COST 9. How do you manage and/or consider the adoption of new services?
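The pricing-model question in COST 3 is at heart a break-even calculation between paying per hour and paying a fixed commitment. A minimal sketch with made-up prices: the function names and every number are hypothetical and do not reflect actual AWS pricing.

```python
def break_even_hours(on_demand_hourly, committed_monthly):
    """Hours per month above which the fixed monthly commitment becomes cheaper."""
    return committed_monthly / on_demand_hourly


def cheaper_pricing_model(hours_per_month, on_demand_hourly, committed_monthly):
    """Pick the cheaper of pay-per-hour and a fixed monthly commitment."""
    on_demand_cost = hours_per_month * on_demand_hourly
    return "committed" if committed_monthly < on_demand_cost else "on-demand"
```

With a hypothetical 0.25 per hour on demand and 100 per month committed, the break-even point is 400 hours: a workload running around the clock favors the commitment, while one running 200 hours a month does not.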
21. Operational Excellence: Principles
1. Perform operations with code
2. Align operations processes to business objectives
3. Make regular, small, incremental changes
4. Test for responses to unexpected events
5. Learn from operational events and failures
6. Keep operations procedures current
22. Operational Excellence
1. Preparation
• OPS 1. What best practices for cloud operations are you using?
• OPS 2. How are you doing configuration management for your workload?
2. Operations
• OPS 3. How are you evolving your workload while minimizing the impact of change?
• OPS 4. How do you monitor your workload to ensure it is operating as expected?
3. Responses
• OPS 5. How do you respond to unplanned operational events?
• OPS 6. How is escalation managed when responding to unplanned operational events?
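One way to approach OPS 3 (evolving a workload while minimizing the impact of change) is a canary release: send a small share of traffic to the new version and compare its error rate with the current one. This is a sketch under stated assumptions; the function name, the 50% tolerance, and the minimum sample size are hypothetical, and the framework does not prescribe this particular rule.

```python
def canary_decision(baseline_errors, baseline_requests,
                    canary_errors, canary_requests,
                    max_relative_increase=0.5, min_requests=100):
    """Return "wait", "promote", or "rollback" for a canary deployment."""
    if canary_requests < min_requests:
        return "wait"  # not enough canary traffic to judge yet
    baseline_rate = baseline_errors / baseline_requests if baseline_requests else 0.0
    canary_rate = canary_errors / canary_requests
    # The canary may be at most max_relative_increase worse than the baseline.
    allowed = baseline_rate * (1 + max_relative_increase)
    return "promote" if canary_rate <= allowed else "rollback"
```

Note that with an error-free baseline this rule rolls back on any canary error, which may be stricter than intended.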