In this presentation, we will tackle the 'Operational Excellence Pillar' of the AWS Well-Architected Framework. This pillar focuses on running and monitoring systems that deliver business value, and continually improving processes and procedures.
Amazon Web Services (AWS) has spent years working with thousands of companies across all industries to create the most comprehensive collection of best practices and guidance known as the Well-Architected Framework. This resource is available for organizations undergoing a cloud transformation who want to ensure their success on AWS.
Topics include:
- How operational excellence is a consequence of culture.
- The six design principles for operational excellence in the cloud.
- The focus areas of cloud operational excellence.
- What operational excellence looks like in practice.
2. I’m Jonathan LaCour, CTO of Reliam
● Technologist
● Programmer
● Cloud Strategist
● Bourbon Junkie
Nice to meet you!
Hello there
3. Reliam is an AWS-certified consulting and managed services provider based in Southern California.
Serving customers globally, from startups to enterprises, Reliam’s certified solutions architects and engineers incorporate AWS best practices, including the Well-Architected Framework, to advise companies on workload migration, architecture, and optimization to drive rapid adoption of AWS services and high customer satisfaction.
Reliam’s obsessive customer focus, coupled with operational excellence, expert technical solutions, industry-leading SLAs, and proven strategies & best practices, delivers on our promise to each customer to ensure their continued success throughout the entire lifecycle of their technology journey.
15. Annotated documentation
On-Prem Environments
● Manual documentation
● Prone to error
● Docs drift from reality
● Operational agility suffers
Cloud Environments
● Automated documentation
● Useful to humans & systems
● Docs reflect reality
● Operational agility improves
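To make "automated documentation" concrete, here is a minimal Python sketch, assuming resource metadata (ID, type, and owner/purpose tags) has already been collected from your accounts. The record fields and the `render_inventory` helper are hypothetical. Regenerating the document from live metadata on every run is what keeps it from drifting away from reality.

```python
from typing import Dict, List

# Hypothetical inventory record: resource ID, type, and tag-derived metadata.
Resource = Dict[str, str]


def render_inventory(resources: List[Resource]) -> str:
    """Render a Markdown table that documents each resource from its own metadata."""
    lines = [
        "| ID | Type | Owner | Purpose |",
        "|----|------|-------|---------|",
    ]
    # Sort by ID so regenerated docs are stable and easy to diff in version control.
    for res in sorted(resources, key=lambda item: item["id"]):
        owner = res.get("owner", "UNKNOWN")  # flag missing metadata instead of hiding it
        purpose = res.get("purpose", "UNKNOWN")
        lines.append(f"| {res['id']} | {res['type']} | {owner} | {purpose} |")
    return "\n".join(lines)


if __name__ == "__main__":
    inventory = [
        {"id": "i-0b2", "type": "EC2 instance", "owner": "payments", "purpose": "API"},
        {"id": "db-1", "type": "RDS instance", "owner": "payments"},
    ]
    print(render_inventory(inventory))
```

Because the output is generated, it is equally useful to humans (as a readable table) and to systems (it can be checked into version control and diffed).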
16. Frequent, small, reversible changes
Traditional Approach
● Software releases are large, high-risk bundles of changes
● Agile practitioners bundle many “sprints” into a release
● Systems are monolithic
Continuous Approach
● Change is the new normal
● Systems are composed of small, focused components
● All changes are designed to be quickly reversible
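One way to picture "designed to be quickly reversible" is to pair every small change with its own undo action. The Python sketch below is illustrative only: `Change` and `ChangeLog` are hypothetical names, and a real system would revert infrastructure or deployments rather than an in-memory dictionary.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Change:
    """A small change paired with the action that undoes it."""
    name: str
    apply: Callable[[], None]
    revert: Callable[[], None]


@dataclass
class ChangeLog:
    """Stack of applied changes, so the newest is reverted first."""
    applied: List[Change] = field(default_factory=list)

    def apply(self, change: Change) -> None:
        change.apply()
        # Record the change only after it succeeds, so rollback never
        # reverts something that was not applied.
        self.applied.append(change)

    def rollback(self, count: int = 1) -> List[str]:
        """Revert up to `count` of the most recent changes; return their names."""
        reverted = []
        for _ in range(min(count, len(self.applied))):
            change = self.applied.pop()
            change.revert()
            reverted.append(change.name)
        return reverted


if __name__ == "__main__":
    config = {"timeout_seconds": 30}
    log = ChangeLog()
    log.apply(Change(
        name="raise-timeout",
        apply=lambda: config.update(timeout_seconds=60),
        revert=lambda: config.update(timeout_seconds=30),
    ))
    print(config)          # {'timeout_seconds': 60}
    print(log.rollback())  # ['raise-timeout']
    print(config)          # {'timeout_seconds': 30}
```

Keeping each change small means each undo action stays simple enough to trust under pressure.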
20. Anticipate failure
Typical Operations Teams
● Reactive approach to failure
● Post-mortem exercises after failures, if at all
● Problems usually discovered in production
Operationally Excellent Teams
● Proactive approach to failure
● Pre-mortem exercises
● Test, validate, & measure scenarios in Game Days
● Problems usually anticipated
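A Game Day can start as small as injecting a dependency failure and checking that the system degrades as designed. This Python sketch assumes a hypothetical recommendation service with a static fallback list; `RecommendationClient`, `homepage_items`, and `game_day` are illustrative names, not part of any AWS tool.

```python
from typing import Dict, List

# Static fallback served when personalization is unavailable (hypothetical).
FALLBACK_ITEMS = ["bestseller-1", "bestseller-2"]


class RecommendationClient:
    """Hypothetical downstream dependency; fail=True injects an outage."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail

    def top_items(self) -> List[str]:
        if self.fail:
            raise ConnectionError("recommendation service unavailable")
        return ["item-42", "item-7"]


def homepage_items(client: RecommendationClient) -> List[str]:
    """Serve personalized items, degrading to a static list if the dependency fails."""
    try:
        return client.top_items()
    except ConnectionError:
        return FALLBACK_ITEMS


def game_day() -> Dict[str, bool]:
    """Run each scenario and record whether the system behaved as designed."""
    return {
        "baseline": homepage_items(RecommendationClient()) == ["item-42", "item-7"],
        "dependency_down": homepage_items(RecommendationClient(fail=True)) == FALLBACK_ITEMS,
    }


if __name__ == "__main__":
    print(game_day())  # {'baseline': True, 'dependency_down': True}
```

A scenario that returns False is exactly the kind of problem a pre-mortem aims to surface before production does.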
22. Learn from all operational failures
Evolution Requires Sharing
● Drive change by sharing
● Involve product, marketing, and finance in improvements
● Establish a culture of continuous evolution
24. Focus: Preparation
Operational Priorities
Successful operations teams are enlightened operations teams.
● Experts on their workloads
● Aware of shared business goals
● Clearly understand their role
● Familiar with regulatory and compliance constraints
Proper prioritization without context is impossible.
26. Focus: Preparation
Design for Operations
Intentionally consider deployment, updates, and operations by design.
● Everything as code
● Structured CI/CD pipelines
● Shared libraries of common tools and templates
● Obsessive observability – data, data, data!
Empower yourself to act quickly during incidents.
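As an example of "everything as code" backed by a shared library of templates, the following Python sketch builds a CloudFormation template from a reusable helper. The `AWS::S3::Bucket` resource type and its `VersioningConfiguration` and `Tags` properties are standard CloudFormation; the `tagged_bucket` helper, the `team`/`purpose` tag keys, and the logical resource name are assumptions for illustration.

```python
import json
from typing import Any, Dict


def tagged_bucket(team: str, purpose: str) -> Dict[str, Any]:
    """Shared-library helper: a versioned S3 bucket that always carries ownership tags."""
    return {
        "Type": "AWS::S3::Bucket",
        "Properties": {
            "VersioningConfiguration": {"Status": "Enabled"},
            "Tags": [
                {"Key": "team", "Value": team},
                {"Key": "purpose", "Value": purpose},
            ],
        },
    }


def build_template() -> str:
    """Assemble a CloudFormation template (JSON) from shared building blocks."""
    template = {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Example stack generated from code",
        "Resources": {
            # Logical ID and tag values are illustrative.
            "AppLogsBucket": tagged_bucket(team="payments", purpose="logs"),
        },
    }
    return json.dumps(template, indent=2)


if __name__ == "__main__":
    print(build_template())
```

The generated JSON could then flow through a CI/CD pipeline like any other template, and every team that calls the helper gets the same tagging and versioning defaults.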
27. Focus: Preparation
Operational Readiness
Technology is important, but so are process and procedure.
● Accurate documentation – checklists, runbooks, and playbooks
● Trained, right-sized team… no shortcuts!
● Governance to control readiness
Codify process and procedure with AWS: resource tags, event triggers, AWS Systems Manager Run Command, Lambda, CloudWatch Events, etc.
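As a sketch of codifying readiness with resource tags and event triggers, the Python Lambda handler below checks a resource's tags against a required set. It assumes the function is invoked by a tag-change event that carries the resource ARNs in `resources` and the current tags in `detail["tags"]`; verify that event shape against the CloudWatch Events (EventBridge) documentation for your event source. The `REQUIRED_TAGS` policy is hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical readiness policy: tags every production resource must carry.
REQUIRED_TAGS = {"owner", "runbook", "environment"}


def missing_tags(tags: Dict[str, str]) -> List[str]:
    """Return required tag keys absent from `tags`, sorted for stable output."""
    return sorted(REQUIRED_TAGS.difference(tags))


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Report whether the resource(s) in a tag-change event meet the tag policy.

    Assumes the event carries resource ARNs in event["resources"] and the
    resource's current tags in event["detail"]["tags"].
    """
    tags = event.get("detail", {}).get("tags", {})
    missing = missing_tags(tags)
    # A fuller version might notify a team or open a ticket at this point.
    return {
        "resources": event.get("resources", []),
        "ready": not missing,
        "missing_tags": missing,
    }
```

Keeping the policy check as a plain function, separate from the handler, makes it easy to reuse in a CI/CD pipeline before resources are ever deployed.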
29. Focus: Operation
Understanding Operational Health
Operational excellence requires immediately available, accurate insight into key metrics that are aligned with business requirements.
● Performance, cost, availability, latency, etc.
● Collect and aggregate data
● Implement dashboards and alerting
AWS provides Amazon CloudWatch, Amazon Elasticsearch Service with Kibana, and many other tools to help you understand operational health.
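Alerting usually comes down to evaluating a metric against a threshold over a window. The Python sketch below mimics, in simplified form, CloudWatch's "M out of N" alarm evaluation (its `EvaluationPeriods` and `DatapointsToAlarm` settings and its `OK`/`ALARM`/`INSUFFICIENT_DATA` states). It runs locally on a list of datapoints, does not call CloudWatch, and ignores CloudWatch's missing-data treatment options.

```python
from typing import Sequence


def alarm_state(datapoints: Sequence[float], threshold: float,
                evaluation_periods: int, datapoints_to_alarm: int) -> str:
    """Simplified "M out of N" evaluation: ALARM when at least
    `datapoints_to_alarm` of the last `evaluation_periods` datapoints exceed
    `threshold`. Missing-data handling is intentionally ignored."""
    if not 1 <= datapoints_to_alarm <= evaluation_periods:
        raise ValueError("need 1 <= datapoints_to_alarm <= evaluation_periods")
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    window = datapoints[-evaluation_periods:]
    breaching = sum(1 for value in window if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


if __name__ == "__main__":
    # p99 latency in milliseconds, one datapoint per minute (illustrative values).
    latency_ms = [120, 130, 510, 620, 140, 700]
    print(alarm_state(latency_ms, threshold=500,
                      evaluation_periods=5, datapoints_to_alarm=3))  # ALARM
```

Requiring M of N breaching datapoints, rather than a single spike, is a common way to keep alerts tied to sustained business impact instead of noise.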
30. Focus: Operation
Responding to Events
Armed with key metrics, alerting, and dashboards, your team can respond to events with confidence.
● Consider business impact when prioritizing
● Script responses through operations as code, leveraging data
● Implement automated rollbacks to known good versions
● Embrace AWS Auto Scaling
After navigating an incident, always perform root cause analysis and a full post-mortem.
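Automated rollback to a known good version needs two pieces: a deployment history that records which versions passed health checks, and a rule for when to revert. The Python sketch below assumes both; the `Deployment` record, the error-rate threshold, and the returned action strings are hypothetical, and a real pipeline would trigger the rollback rather than return a description of it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Deployment:
    version: str
    healthy: bool  # outcome of post-deploy health checks


def rollback_target(history: List[Deployment]) -> Optional[str]:
    """Most recent healthy version older than the current (last) deployment."""
    for deployment in reversed(history[:-1]):
        if deployment.healthy:
            return deployment.version
    return None


def respond(history: List[Deployment], error_rate: float, max_error_rate: float) -> str:
    """Keep the current version, or roll back automatically if errors exceed the limit."""
    if not history:
        raise ValueError("deployment history is empty")
    if error_rate <= max_error_rate:
        return "keep " + history[-1].version
    target = rollback_target(history)
    if target is None:
        return "page on-call: no known good version"
    return "roll back to " + target


if __name__ == "__main__":
    history = [Deployment("v1.4", True), Deployment("v1.5", False), Deployment("v1.6", True)]
    print(respond(history, error_rate=0.05, max_error_rate=0.01))  # roll back to v1.4
```

Skipping versions that failed their own health checks (v1.5 here) is what makes the target "known good" rather than merely "previous".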
31. Focus: Evolution
Learning from Experience
The greatest indicator of success for ops teams? A passion for learning.
● Every incident is an opportunity
● Encourage ops teams to analyze, experiment, and improve
● AWS provides an extensive platform to enable this
Be sure to pull in all parts of the business to add differing points of view, surfacing new opportunities for improvement.
33. Focus: Evolution
Share Learnings
Many companies have multiple product and operations teams. Share your lessons broadly to drive a culture of improvement.
● Leverage the AWS platform for sharing best practices, such as CloudFormation templates, Chef cookbooks, and Lambda functions.
● Use AWS IAM to define permissions for controlled access.
Evolution isn’t a localized process.
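To share artifacts such as CloudFormation templates with controlled access, a team might publish them to a common S3 bucket and grant read-only access with IAM. The Python sketch below builds such a policy document; the bucket name is hypothetical, while the `Version`, `Statement`, `Effect`, `Action`, and `Resource` elements and the `s3:ListBucket`/`s3:GetObject` actions follow the standard IAM policy grammar.

```python
import json
from typing import Any, Dict

# Hypothetical bucket where teams publish shared templates, cookbooks, and Lambda packages.
SHARED_BUCKET = "example-shared-ops-artifacts"


def read_only_sharing_policy(bucket: str) -> Dict[str, Any]:
    """IAM policy granting read-only access to a shared artifacts bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Sid": "ListSharedArtifacts",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Reading applies to the objects inside the bucket.
                "Sid": "ReadSharedArtifacts",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(read_only_sharing_policy(SHARED_BUCKET), indent=2))
```

Granting read-only access to consumers, and write access only to the owning team, lets learnings spread widely without losing control over what is published.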
34. Summary
The AWS Well-Architected Framework (WAF) is a powerful collection of best practices
WAF Program Partners like Reliam can help accelerate your journey
Operational Excellence Pillar
● Design Principles
● Focus Areas
○ Preparation
○ Operation
○ Evolution