© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Building a Monitoring Plan
Paul Ferguson, Senior Consultant, Professional Services, Amazon Web Services
Chris Kozlowski, Senior Technical Account Manager, Amazon Web Services
Who we are
Paul Ferguson – Senior Consultant, London
Chris Kozlowski – Senior Technical Account Manager, US East Coast
Buzzword Bingo
Observability
Operational Intelligence
‘No Ops’
Composable monitoring
Event correlation
Signal to noise ratio
Alarm fatigue
Single pane of glass
E-bonding
What we’ll discuss…
• Who needs to be involved, and why
• What to monitor
• What makes for an effective monitoring rule
• What tools to use and when
• Metrics, business outcomes, improvements
Operational Challenges will always exist…
But with proper planning and design, you will be ready for them.
So Why Monitor In the First Place?
To Gain Insights!
• Customer Experience
• Performance & Cost
• Trends
• Troubleshooting & Remediation
• Learning & Improvement
What Goes Into a Monitoring Plan?
• Alerts
• System Knowledge
• People
• Actions
• Tools
People
Roles and Responsibilities
• Operations – First Responders, Triage
• Developers/Engineers – Define normal operation
• Management – Tasked with making business decisions in response to events
Monitoring Plan
System Knowledge
Categories of Insight (FCAPS)
• Faults
• Configuration
• Accounting
• Performance
• Security
Things to Monitor
• AWS Global Infrastructure: Regions, Availability Zones, Edge Locations
• AWS Foundation Services: Compute, Storage, Database, Networking
• Operating Systems, Applications, Databases, Networking
• Example components: Internet Gateway, Elastic Load Balancer, Web Servers (EC2 w/ Auto Scaling), RDS
Monitoring Plan
Crafting Alerts
Anatomy of an Effective Alert
• FCAPS Category: Performance
• Element: Web Server
• Custom Alert: ALARM when site latency >= 2s for 1 minute
• Runbook
• Owner
• Test Action
[Diagram: Amazon CloudWatch, Elastic Load Balancing, Auto Scaling, EC2 Instances]
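The alert on this slide can be written down as data. What follows is a minimal sketch in Python, assuming an Application Load Balancer sits in front of the web servers and that the average response time is the statistic of interest. The `EffectiveAlert` class is illustrative and not part of any AWS SDK, and the alarm name, load balancer identifier, SNS topic ARN, owner, and runbook URL are hypothetical placeholders. The dictionary it builds uses the parameter names of the CloudWatch `PutMetricAlarm` API, so it could be passed as keyword arguments to boto3's `put_metric_alarm`.

```python
from dataclasses import dataclass


@dataclass
class EffectiveAlert:
    """One alert, carrying the fields shown on the slide (illustrative only)."""
    fcaps_category: str       # e.g. "Performance"
    element: str              # e.g. "Web Server"
    name: str                 # hypothetical alarm name
    threshold_seconds: float  # latency threshold from the slide: 2 s
    period_seconds: int       # evaluation window from the slide: 1 minute
    runbook: str              # hypothetical runbook location
    owner: str                # hypothetical owning team
    test_action: str          # how the alert is exercised before go-live

    def to_cloudwatch_params(self, load_balancer: str, topic_arn: str) -> dict:
        """Build keyword arguments for the CloudWatch PutMetricAlarm API."""
        description = (
            f"{self.fcaps_category} / {self.element}. "
            f"Runbook: {self.runbook}. Owner: {self.owner}"
        )
        return {
            "AlarmName": self.name,
            "AlarmDescription": description,
            # Assumption: Application Load Balancer metric, reported in seconds.
            "Namespace": "AWS/ApplicationELB",
            "MetricName": "TargetResponseTime",
            "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer}],
            "Statistic": "Average",  # assumption: the slide does not name a statistic
            "Period": self.period_seconds,
            "EvaluationPeriods": 1,
            "Threshold": self.threshold_seconds,
            "ComparisonOperator": "GreaterThanOrEqualToThreshold",
            "AlarmActions": [topic_arn],
        }


latency_alert = EffectiveAlert(
    fcaps_category="Performance",
    element="Web Server",
    name="site-latency-high",
    threshold_seconds=2.0,
    period_seconds=60,
    runbook="https://wiki.example.com/runbooks/site-latency",
    owner="web-operations",
    test_action="Add artificial delay in a test environment and confirm the alarm fires",
)

params = latency_alert.to_cloudwatch_params(
    load_balancer="app/example-alb/0123456789abcdef",
    topic_arn="arn:aws:sns:us-east-1:123456789012:ops-pager",
)
```

Keeping the runbook and owner in the alarm description means the responder sees them in the notification itself.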
The Drive Towards Achieving Business Insight
• Metrics: CPU Wait %, Disk Queue Depth
• Operational Outcomes: Webpage Response Time, Job Run Length
• Business Insight: Customer Sentiment, SLAs
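As a rough illustration of moving from an operational outcome (webpage response time) toward a business-level figure (an SLA), the sketch below computes the share of requests served within a target time and compares it with an SLA. The sample response times, the 2-second target, and the 95% SLA are invented for the example and do not come from the talk.

```python
def sla_compliance(response_times_s, target_s):
    """Return the fraction of requests that completed within target_s seconds."""
    if not response_times_s:
        raise ValueError("at least one response time is required")
    within_target = sum(1 for t in response_times_s if t <= target_s)
    return within_target / len(response_times_s)


# Hypothetical webpage response times, in seconds.
samples = [0.4, 0.7, 1.1, 2.5, 0.9, 3.2, 0.6, 0.8, 1.4, 0.5]

compliance = sla_compliance(samples, target_s=2.0)
sla_target = 0.95  # hypothetical SLA: 95% of requests within 2 seconds
sla_met = compliance >= sla_target

print(f"{compliance:.0%} of requests within 2.0s; SLA met: {sla_met}")
```

A number like this is easier for management to act on than CPU wait percentages, which is the direction the slide argues for.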
Alerting Best Practices
• Break alert crafting into batches, highest priority first
• Refine quickly
• Alert only to prompt an action
• Write descriptive alerts to aid prompt resolution
• Don’t rely on email alone
Monitoring Plan
System Knowledge
Component      Area
IGW            Faults
ELB            Faults
ELB            Performance
Web Servers    Faults
Web Servers    Performance
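The plan rows above can also be kept as data, which makes it easy to see which categories of insight each component does not yet cover. This is a minimal sketch: the `uncovered_areas` helper is hypothetical, and a gap it reports is only a prompt for discussion, since not every component needs an entry in every category.

```python
# The five FCAPS categories used in this deck.
FCAPS = ("Faults", "Configuration", "Accounting", "Performance", "Security")

# (component, area) rows taken from the plan table.
plan_rows = [
    ("IGW", "Faults"),
    ("ELB", "Faults"),
    ("ELB", "Performance"),
    ("Web Servers", "Faults"),
    ("Web Servers", "Performance"),
]


def uncovered_areas(rows):
    """Map each component to the FCAPS areas that have no plan row yet."""
    covered = {}
    for component, area in rows:
        covered.setdefault(component, set()).add(area)
    return {
        component: [area for area in FCAPS if area not in areas]
        for component, areas in covered.items()
    }


gaps = uncovered_areas(plan_rows)
for component, missing in gaps.items():
    print(f"{component}: not yet covered -> {', '.join(missing)}")
```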
Tools to use
How to select a good tool
• Let your requirements dictate your tools
• Start with the tools you have
• Consider using native tools on the platform
• Integrate tools - ergonomics matter
Example Workload
• CloudTrail – logging of API calls
• AWS Config Rules – configuration
• CloudWatch – resources
• APM – customer experience and synthetic monitoring
Dashboards
Actions and Improvements
Actions
• Every alert and event should end in an action to be taken
• Escalations to another person should end with an action to be taken by them
• Actions are not only technical. Plan for what business decisions might need to be made
• Runbooks and Playbooks
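The first two points can be checked mechanically once the plan is written down. The sketch below is a hypothetical check, not a tool from the talk: `PlanEntry`, its field names, and the sample entries are all invented for illustration. It flags alerts that have no action and escalations that hand over to someone without telling them what to do.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlanEntry:
    """One alert in the monitoring plan (field names are illustrative)."""
    alert: str
    action: Optional[str] = None             # what the first responder does
    escalates_to: Optional[str] = None       # who is contacted next, if anyone
    escalation_action: Optional[str] = None  # what that person is expected to do


def find_dead_ends(entries: List[PlanEntry]) -> List[str]:
    """Return a description of every alert or escalation with no action."""
    problems = []
    for entry in entries:
        if not entry.action:
            problems.append(f"{entry.alert}: no action defined")
        if entry.escalates_to and not entry.escalation_action:
            problems.append(
                f"{entry.alert}: escalation to {entry.escalates_to} has no action"
            )
    return problems


entries = [
    PlanEntry(
        alert="site-latency-high",
        action="Follow the latency runbook",
        escalates_to="service owner",
        escalation_action="Decide whether to add capacity or roll back",
    ),
    PlanEntry(alert="elb-unhealthy-hosts", action="Follow the ELB runbook",
              escalates_to="management"),
    PlanEntry(alert="disk-queue-depth"),
]

dead_ends = find_dead_ends(entries)
for problem in dead_ends:
    print(problem)
```

Business decisions fit the same shape: the escalation action for management can be a decision to make rather than a technical step.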
Improvement and Automation
Whenever alarms and alerts fire, identify whether they can be improved.
• Make the alert more accurate, descriptive, and timely
• Improve the remediation
• Establish processes with people first, then automate
• Identify routine or standard changes as early candidates for automation
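One possible way to find alerts worth improving is to record, each time an alert fires, whether anyone had to act on it, and then review the alerts that rarely lead to action. This is a sketch under stated assumptions: the firing log, the alert names, and the 50% cut-off are invented, and `review_candidates` is a hypothetical helper.

```python
from collections import Counter


def review_candidates(firings, min_actionable_ratio=0.5):
    """Return alert names whose share of actionable firings is below the cut-off.

    firings is an iterable of (alert_name, action_taken) pairs.
    """
    fired = Counter()
    acted = Counter()
    for alert_name, action_taken in firings:
        fired[alert_name] += 1
        if action_taken:
            acted[alert_name] += 1
    return sorted(
        name for name in fired
        if acted[name] / fired[name] < min_actionable_ratio
    )


# Hypothetical log: (alert name, did someone need to act?)
firing_log = [
    ("site-latency-high", True),
    ("site-latency-high", True),
    ("disk-queue-depth", False),
    ("disk-queue-depth", False),
    ("disk-queue-depth", True),
    ("elb-unhealthy-hosts", True),
]

noisy = review_candidates(firing_log)
print("Review these alerts first:", noisy)
```

Alerts that always lead to the same manual step are the opposite case, and are natural early candidates for automation.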
Summary and Next Steps
- Check your monitoring approach
  - Is it user-centric?
  - Are you measuring the right things?
- Write a monitoring plan
- Start monitoring, test, and iterate
The reason operations exists is to support the needs of the business.
Thank you!

