AWS Well-Architected Framework: Operational Excellence Pillar
Hello there
I’m Jonathan LaCour, CTO of Reliam
● Technologist
● Programmer
● Cloud Strategist
● Bourbon Junkie
Nice to meet you!
Reliam is an AWS certified consulting and managed
services provider based in Southern California.
Serving customers globally, from startups to enterprises, Reliam’s certified solutions
architects and engineers incorporate AWS best practices, including the Well-Architected
Framework, to advise companies on workload migration, architecture, and optimization
to drive rapid adoption of AWS services and high customer satisfaction.
Reliam’s obsessive customer focus, coupled with operational excellence, expert
technical solutions, industry-leading SLAs, and proven strategies & best practices,
delivers on our promise to each customer to ensure their continued success
throughout the entire lifecycle of their technology journey.
Question
How is your company currently using AWS?
Agenda
● Introduction to Well-Architected
● Design Principles for Operational Excellence
● Areas of Operational Excellence
○ Preparation
○ Operation
○ Evolution
● Q&A
The Well-Architected Framework
AWS Well-Architected
The Five Pillars
● Operational Excellence
● Security
● Reliability
● Performance Efficiency
● Cost Optimization
AWS Well-Architected
Benefits
● Build and deploy faster
● Lower or mitigate risk
● Make informed decisions
● Learn AWS best practices
Reliam’s Insights from Well-Architected Reviews
Reliam Accelerated Well-Architected Review Package
● Review: Free
● Remediation: $10,000 for 40 hours
● AWS Service Credit: $5,000
Operational Excellence Pillar
Operational Excellence
Design Principles
● Perform operations as code
● Annotated documentation
● Frequent, small, reversible changes
● Refine ops procedures frequently
● Anticipate failure
● Learn from all operational failures
Perform operations as code
Software Engineering Practices
● Automated testing
● CI/CD pipelines
● Version control
● Code review and standards
Operations as Code
● Everything is software
● Bring software practices to
operations and infrastructure
● De-risk, ensure consistency
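As a minimal sketch of operations as code, the Python below builds a CloudFormation template as plain data, so the definition can live in version control, go through code review, and be covered by automated tests. The function names, the logical ID, and the "team" tag are hypothetical, chosen only for illustration.

```python
import json


def build_bucket_template(logical_id: str, team: str) -> dict:
    """Return a CloudFormation template (as a dict) for one versioned S3 bucket.

    The logical ID and the "team" tag key are illustrative assumptions.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"Versioned bucket owned by {team}",
        "Resources": {
            logical_id: {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    "VersioningConfiguration": {"Status": "Enabled"},
                    "Tags": [{"Key": "team", "Value": team}],
                },
            }
        },
    }


def render(template: dict) -> str:
    """Serialize deterministically so diffs in code review stay small."""
    return json.dumps(template, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render(build_bucket_template("ArtifactBucket", "platform")))
```

Because the output is deterministic, the same inputs always produce the same template text, which is what makes review and automated testing practical.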
Question
Has your organization adopted Infrastructure as Code?
Annotated documentation
On-Prem Environments
● Manual documentation
● Prone to error
● Docs drift from reality
● Operational agility suffers
Cloud Environments
● Automated documentation
● Useful to humans & systems
● Docs reflect reality
● Operational agility improves
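One way to keep docs from drifting is to generate them from the same annotated definitions that describe the environment. The sketch below assumes resources are described in a CloudFormation-style dict and that each resource may carry a free-form "Notes" entry under its Metadata attribute; the "Notes" key and the example stack are hypothetical.

```python
def render_inventory(template: dict) -> str:
    """Build a plain-text inventory from the annotations in a template."""
    lines = [template.get("Description", "(no description)"), ""]
    for logical_id, resource in sorted(template.get("Resources", {}).items()):
        # "Notes" is an assumed annotation key, not a CloudFormation built-in.
        notes = resource.get("Metadata", {}).get("Notes", "(undocumented)")
        lines.append(f"{logical_id} [{resource['Type']}]: {notes}")
    return "\n".join(lines)


EXAMPLE_STACK = {
    "Description": "Order processing stack",
    "Resources": {
        "OrdersQueue": {
            "Type": "AWS::SQS::Queue",
            "Metadata": {"Notes": "Buffers incoming orders"},
        },
        "Worker": {"Type": "AWS::Lambda::Function"},
    },
}

if __name__ == "__main__":
    print(render_inventory(EXAMPLE_STACK))
```

Resources without an annotation show up as "(undocumented)", which turns missing documentation into something a review or a pipeline check can catch.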
Frequent, small, reversible changes
Traditional Approach
● Software releases are large,
high-risk bundles of changes
● Agile practitioners bundle
many “sprints” into a release
● Systems are monolithic
Continuous Approach
● Change is the new normal
● Systems are composed of
small, focused components
● All changes are designed to
be quickly reversible
Question
What is your typical release cadence?
Refine operations procedures frequently
Software Engineering Procedures
● Regular cadence of
“retrospective” meetings
● Improvements progressively
integrated
Operations Procedures
● Regular cadence of “game
days” and associated retros
● Improvements progressively
integrated
Question
Does your organization have regular “Game Days”?
Anticipate failure
Typical Operations Teams
● Reactive approach to failure
● Post-mortem exercises after
failures, if at all
● Problems usually discovered
in production
Operationally Excellent Teams
● Proactive approach to failure
● Pre-mortem exercises
● Test, validate, & measure
scenarios in Game Days
● Problems usually anticipated
Question
Do you regularly schedule “pre-mortem” meetings?
Learn from all operational failures
Evolution Requires Sharing
● Drive change by sharing
● Involve product, marketing,
and finance in improvements
● Establish a culture of
continuous evolution
Operational Excellence
Focus Areas
● Preparation
● Operation
● Evolution
Focus: Preparation
Operational Priorities
Successful operations teams are enlightened operations teams.
● Experts on their workloads
● Aware of shared business goals
● Clearly understand their role
● Grasp of regulatory and compliance constraints
Proper prioritization without context is impossible.
Question
Do you feel that your operations teams are enlightened?
Focus: Preparation
Design for Operations
Intentionally consider deployment, updates, and operations by design.
● Everything as code
● Structured CI/CD pipelines
● Shared libraries of common tools and templates
● Obsessive observability – data, data, data!
Empower yourself to act quickly during incidents.
Focus: Preparation
Operational Readiness
Technology is important, but so are process and procedure.
● Accurate documentation – checklists, runbooks, and playbooks
● Trained, right-sized team… no shortcuts!
● Governance to control readiness
Codify process and procedure with AWS: resource tags, event triggers, AWS
Systems Manager Run Command, Lambda, CloudWatch Events, etc.
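Governance through resource tags can itself be codified. The sketch below is a plain-Python readiness check, assuming a hypothetical policy that every resource must carry non-empty "owner", "environment", and "runbook" tags. The tag names and the input shape are assumptions, and the resource data would in practice come from an inventory export rather than being hard-coded.

```python
# Hypothetical tagging policy: these keys are assumptions for illustration.
REQUIRED_TAGS = {"owner", "environment", "runbook"}


def missing_tags(resource_tags: dict) -> set:
    """Return required tag keys that are absent or empty on one resource."""
    present = {key for key, value in resource_tags.items() if value}
    return REQUIRED_TAGS - present


def readiness_report(resources: dict) -> dict:
    """Map each non-compliant resource name to its sorted missing tags."""
    report = {}
    for name, tags in resources.items():
        gaps = missing_tags(tags)
        if gaps:
            report[name] = sorted(gaps)
    return report


if __name__ == "__main__":
    inventory = {
        "web": {"owner": "ops", "environment": "prod", "runbook": "restart-web"},
        "db": {"owner": "data", "environment": ""},
    }
    print(readiness_report(inventory))
```

A check like this can run in a pipeline or on a schedule, so readiness is verified continuously instead of being audited by hand.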
Question
Does your operations team have documented procedures?
Focus: Operation
Understanding Operational Health
Operational excellence requires immediately available, accurate insight into
key metrics that are aligned with business requirements.
● Performance, cost, availability, latency, etc.
● Collect and aggregate data
● Implement dashboards and alerting
AWS provides CloudWatch, Amazon Elasticsearch Service with Kibana, and many
other tools to enable your understanding of operational health.
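To make "collect, aggregate, alert" concrete, here is a small self-contained sketch that aggregates raw samples into two key metrics and compares them with thresholds. It does not call CloudWatch; the thresholds (300 ms p95 latency, 99.9% availability) and all names are assumptions standing in for targets derived from your business requirements.

```python
import math
from dataclasses import dataclass


@dataclass
class HealthSummary:
    p95_latency_ms: float
    availability: float
    alerts: list


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty sequence."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms, ok_count, total_count,
              max_p95_ms=300.0, min_availability=0.999):
    """Aggregate samples and list which (assumed) thresholds are breached."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    p95 = percentile(latencies_ms, 95)
    availability = ok_count / total_count
    alerts = []
    if p95 > max_p95_ms:
        alerts.append("latency")
    if availability < min_availability:
        alerts.append("availability")
    return HealthSummary(p95, availability, alerts)


if __name__ == "__main__":
    print(summarize([120, 80, 95, 400, 110], ok_count=998, total_count=1000))
```

The same shape applies whatever tool stores the data: aggregate first, then alert on the aggregate instead of on individual samples.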
Focus: Operation
Responding to Events
Armed with key metrics, alerting, and dashboards, your team can respond to
events with confidence.
● Consider business impact when prioritizing
● Script responses through operations as code, leveraging data
● Implement automated rollbacks to known good versions
● Embrace AWS Auto Scaling
After navigating an incident, always perform root cause analysis and a full
post-mortem.
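An automated rollback to a known good version can be sketched in a few lines. The class below is a toy model, not a deployment tool: the Deployer name, the health-check callback, and the event log are all hypothetical, and a real pipeline would replace the callback with checks against live metrics.

```python
from typing import Callable, List


class Deployer:
    """Toy model of deploy-then-verify with automatic rollback."""

    def __init__(self, initial_version: str):
        self.known_good: List[str] = [initial_version]
        self.active = initial_version
        self.events: List[str] = []

    def deploy(self, version: str, is_healthy: Callable[[str], bool]) -> bool:
        """Activate a version; keep it if healthy, otherwise roll back.

        Returns True when the new version stays active.
        """
        self.active = version
        self.events.append(f"deployed {version}")
        if is_healthy(version):
            self.known_good.append(version)
            return True
        # Health check failed: restore the most recent known good version.
        self.active = self.known_good[-1]
        self.events.append(f"rolled back to {self.active}")
        return False


if __name__ == "__main__":
    d = Deployer("v1")
    d.deploy("v2", lambda v: True)
    d.deploy("v3", lambda v: False)
    print(d.active, d.events)
```

The event log doubles as raw material for the post-mortem, since it records what was deployed and when a rollback happened.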
Focus: Evolution
Learning from Experience
The greatest indicator of success for ops teams? A passion for learning.
● Every incident is an opportunity
● Encourage ops teams to analyze, experiment, and improve
● AWS provides an extensive platform to enable this
Be sure to pull in all parts of the business to add differing points of view,
surfacing new opportunities for improvement.
Question
How does your organization view operational events?
Focus: Evolution
Share Learnings
Many companies have multiple product and operations teams. Share your
lessons broadly to drive a culture of improvement.
● Leverage the AWS platform for sharing best practices, such as
CloudFormation templates, Chef cookbooks, and Lambda functions.
● Use AWS IAM to define permissions for controlled access.
Evolution isn’t a localized process.
Summary
The AWS Well-Architected Framework is a powerful collection of best practices
Well-Architected Program Partners like Reliam can help accelerate your journey
Operational Excellence Pillar
● Design Principles
● Focus Areas
○ Preparation
○ Operation
○ Evolution
Q&A
Thanks for Attending!
