Gaining Better Observability of Your VMs with Amazon CloudWatch - AWS Online Tech Talks

Learning Objectives:
- Leverage the CloudWatch Agent to easily monitor VM health in a unified way
- Combine utilization, performance, and alarms to build an operational dashboard
- Understand how Rackspace is leveraging the CloudWatch Agent to manage AWS resources

  1. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
     Gaining Better Observability of Your VMs with Amazon CloudWatch
     AWS Online Tech Talks, May 21, 2018
     Bob Wilkinson, GM – Amazon CloudWatch
     Jon Madison, Manager, Product Engineering – Rackspace
  2. Speakers
     • Bob Wilkinson – AWS GM, Amazon CloudWatch
     • Jon Madison – Rackspace Manager, Product Engineering
  3. Agenda
     • Introduction to Amazon CloudWatch
     • Monitoring & Observability
     • CloudWatch Agent in Action
     • Rackspace – Scaling Operations with CloudWatch
     • Closing
  4. Introduction to Amazon CloudWatch
  5. Amazon CloudWatch at a Glance: gain system-wide visibility into resource utilization, application performance, and operational health
     • Monitor: get metrics on key resources; observe application and operational health; monitor custom metrics and log files
     • Act: SNS notifications; automated alarm actions; event-driven corrective actions
     • Analyze: visualize through dashboards; 1-second granularity; unified operational view; 15 months of data retention
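The 1-second granularity point also applies to custom metrics. As a minimal sketch, assuming you publish with the AWS SDK for Python (boto3), the function below only builds the keyword arguments for the CloudWatch `PutMetricData` call; setting `StorageResolution` to 1 is what requests high-resolution storage. The namespace `MyApp/Checkout`, the metric `OrdersProcessed`, and the instance ID are hypothetical placeholders.

```python
from datetime import datetime, timezone


def build_high_res_metric(namespace, metric_name, value, dimensions, unit="Count"):
    """Build keyword arguments for CloudWatch PutMetricData (boto3 put_metric_data).

    StorageResolution=1 stores the data point at 1-second resolution;
    omitting it (or using 60) gives standard 1-minute resolution.
    """
    if namespace.startswith("AWS/"):
        # Namespaces beginning with "AWS/" are reserved for AWS services.
        raise ValueError("custom namespaces must not start with 'AWS/'")
    return {
        "Namespace": namespace,
        "MetricData": [
            {
                "MetricName": metric_name,
                "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
                "Timestamp": datetime.now(timezone.utc),
                "Value": float(value),
                "Unit": unit,
                "StorageResolution": 1,
            }
        ],
    }


# Hypothetical namespace, metric, and instance ID, for illustration only.
kwargs = build_high_res_metric(
    "MyApp/Checkout", "OrdersProcessed", 42, {"InstanceId": "i-0123456789abcdef0"}
)
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("cloudwatch").put_metric_data(**kwargs)
print(kwargs["MetricData"][0]["StorageResolution"])  # prints 1
```

Keeping request construction separate from the network call makes the payload easy to inspect and test before anything is sent.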
  6. “Amazon CloudWatch monitors more than 800 trillion metric observations, triggers more than 2 trillion events, and ingests more than 50 petabytes of logs per month” (as of March 2018)
  7. Monitoring & Observability
  8. From Monitoring to Observability
     • Monitoring: reports overall system health
     • Observability: granular insights into system behavior; detailed metrics, enhanced monitoring with alerting, visualization, and log aggregation & analytics; used for debugging, complex troubleshooting, system performance analysis, etc.
  9. Observability Is Challenging
     • Complex applications & microservices
     • Agile infrastructure
     • Distributed systems
     • Disparate tooling
     • High customer expectations
  10. Observability Needs Granular Metrics
     • Default metrics (EC2 instance metrics): CPUUtilization, DiskReadBytes, DiskReadOps, DiskWriteBytes, DiskWriteOps, NetworkIn, NetworkOut, NetworkPacketsIn, NetworkPacketsOut
     • + Metrics available with the CloudWatch Agent:
       • CPU: cpu_time_guest, cpu_time_guest_nice, cpu_time_idle, cpu_time_iowait, cpu_time_irq, cpu_time_nice, cpu_time_softirq, cpu_time_steal, cpu_time_system, cpu_time_user, cpu_usage_guest, cpu_usage_guest_nice, cpu_usage_idle, cpu_usage_iowait, cpu_usage_irq, cpu_usage_nice, cpu_usage_softirq, cpu_usage_steal, cpu_usage_system, cpu_usage_user
       • Disk: disk_free, disk_inodes_free, disk_inodes_total, disk_inodes_used, disk_total, disk_used, disk_used_percent, diskio_io_time, diskio_iops_in_progress, diskio_read_bytes, diskio_read_time, diskio_reads, diskio_write_bytes, diskio_write_time, diskio_writes
       • Memory: mem_active, mem_available, mem_available_percent, mem_buffered, mem_cached, mem_free, mem_inactive, mem_total, mem_used, mem_used_percent
       • Swap: swap_free, swap_used, swap_used_percent
       • Network: net_bytes_recv, net_bytes_sent, net_drop_in, net_drop_out, net_err_in, net_err_out, net_packets_recv, net_packets_sent
       • Network Statistics: netstat_tcp_close, netstat_tcp_close_wait, netstat_tcp_closing, netstat_tcp_established, netstat_tcp_fin_wait1, netstat_tcp_fin_wait2, netstat_tcp_last_ack, netstat_tcp_listen, netstat_tcp_none, netstat_tcp_syn_recv, netstat_tcp_syn_sent, netstat_tcp_time_wait, netstat_udp_socket
       • Processes: processes_blocked, processes_dead, processes_idle, processes_paging, processes_running, processes_sleeping, processes_stopped, processes_total, processes_total_threads, processes_wait, processes_zombies
     • + Metric enhancements: contextual resource information, custom dimensions, autodetect Region, aggregate metrics, 1-second resolution
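To make the agent-side enhancements concrete, here is a minimal sketch that builds the `agent` and `metrics` sections of a CloudWatch agent JSON configuration as a Python dictionary. It assumes the agent's documented config keys (`metrics_collection_interval`, `append_dimensions`, `aggregation_dimensions`, `metrics_collected`); the measurement choices and the `Team` custom dimension are hypothetical examples, not a recommended set. Measurement names can be written with or without their section prefix (for example, `used_percent` under `disk` is published as `disk_used_percent`).

```python
import json


def agent_metrics_config(interval_seconds=60):
    """Sketch of the "agent" and "metrics" sections of a CloudWatch agent config.

    An interval below 60 seconds (e.g. 1) makes the agent publish
    high-resolution metrics.
    """
    return {
        "agent": {"metrics_collection_interval": interval_seconds},
        "metrics": {
            # EC2 context appended to every metric; the agent fills in the
            # ${aws:...} placeholders from the instance it runs on.
            "append_dimensions": {
                "InstanceId": "${aws:InstanceId}",
                "InstanceType": "${aws:InstanceType}",
                "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
            },
            # Additionally publish rollups aggregated per Auto Scaling group.
            "aggregation_dimensions": [["AutoScalingGroupName"]],
            "metrics_collected": {
                "cpu": {
                    "measurement": ["cpu_usage_idle", "cpu_usage_iowait", "cpu_usage_steal"],
                    "totalcpu": True,
                },
                "mem": {"measurement": ["mem_used_percent"]},
                "disk": {"measurement": ["used_percent"], "resources": ["/"]},
                "swap": {"measurement": ["swap_used_percent"]},
                "netstat": {
                    "measurement": ["tcp_established", "tcp_time_wait"],
                    # Hypothetical custom dimension, added only to netstat metrics.
                    "append_dimensions": {"Team": "payments"},
                },
            },
        },
    }


config = agent_metrics_config(interval_seconds=1)
print(json.dumps(config, indent=2))
```

Generating the file from code rather than editing JSON by hand keeps per-environment variations (interval, dimensions) in one place.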
  11. CloudWatch Agent in Action
  12. The CloudWatch Agent Simplifies Observability
     • Unified agent: metrics & logs; for EC2 and on-premises servers; Linux & Windows
     • Enhanced metrics & logs: collects in-guest system metrics; appends EC2 dimensions; supports custom dimensions
  13. Getting Started Experience
     • Install and configure through AWS Systems Manager integration
     • Provides defaults specific to OS type (Windows vs. Linux)
     • Basic, Standard, or Advanced options (complement the default EC2 metrics, or collect granular per-resource metrics)
     • Log collection (can specify multiple file_paths)
     • Migrate from the previous CloudWatch Logs agent
     • Curated metric set specific to the environment
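Log collection with multiple `file_path` entries can be sketched the same way. This minimal example assumes the agent's `logs.logs_collected.files.collect_list` structure and its `{instance_id}` placeholder for stream names; the file paths and the `/vm/...` log group names are hypothetical.

```python
import json


def agent_logs_config(files, log_group_prefix="/vm"):
    """Sketch of the "logs" section of a CloudWatch agent config.

    `files` maps each local file path to a short name that becomes part
    of its log group name; one collect_list entry is emitted per file.
    """
    collect_list = [
        {
            "file_path": path,
            "log_group_name": f"{log_group_prefix}/{name}",
            # The agent replaces {instance_id} with the EC2 instance ID.
            "log_stream_name": "{instance_id}",
        }
        for path, name in files.items()
    ]
    return {"logs": {"logs_collected": {"files": {"collect_list": collect_list}}}}


# Hypothetical Linux log files and log group names.
logs_section = agent_logs_config({"/var/log/messages": "messages", "/var/log/secure": "secure"})
print(json.dumps(logs_section, indent=2))
```

The resulting dictionary can be merged with a metrics section into one JSON file. On Linux, the agent can then load it with `amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:<path> -s`, or with `-c ssm:<parameter-name>` when the configuration is stored in Systems Manager Parameter Store.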
  14. Ideal Path to Gain Observability of Your VMs
     • Install the CloudWatch Agent
     • Collect Metrics and Logs
     • Build and View Dashboards
     • Create Alarms & Actions
     • Generate New Time Series Using Metric Math
  15. Collect Metrics and Logs
  16. Collect Metrics and Logs
  17. Build and View Dashboards
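A dashboard that places default EC2 metrics next to agent metrics can be described as a JSON `DashboardBody`. This minimal sketch assumes the agent publishes to its default `CWAgent` namespace and that `mem_used_percent` carries only an `InstanceId` dimension; CloudWatch matches dimension sets exactly, so extra appended dimensions would require listing them too. The instance ID, Region, and dashboard name are hypothetical.

```python
import json


def vm_dashboard_body(instance_id, region):
    """Sketch of a DashboardBody (JSON string) for CloudWatch PutDashboard.

    One line-graph widget plots default EC2 CPU next to agent-collected memory.
    The CWAgent line matches only if the metric was published with exactly
    the dimension InstanceId (dimension sets must match exactly).
    """
    widget = {
        "type": "metric",
        "x": 0,
        "y": 0,
        "width": 12,
        "height": 6,
        "properties": {
            "title": f"Utilization {instance_id}",
            "region": region,
            "stat": "Average",
            "period": 300,
            "view": "timeSeries",
            "metrics": [
                ["AWS/EC2", "CPUUtilization", "InstanceId", instance_id],
                # "CWAgent" is the agent's default namespace.
                ["CWAgent", "mem_used_percent", "InstanceId", instance_id],
            ],
        },
    }
    return json.dumps({"widgets": [widget]})


body = vm_dashboard_body("i-0123456789abcdef0", "us-east-1")
# With boto3: boto3.client("cloudwatch").put_dashboard(DashboardName="vm-health", DashboardBody=body)
print(body)
```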
  18. Create Alarms & Actions
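An alarm with an SNS action on an agent metric might look like the following minimal sketch, which only builds the keyword arguments for the CloudWatch `PutMetricAlarm` call. The threshold, the "3 of 5 one-minute periods" rule, the instance ID, and the SNS topic ARN are hypothetical choices, and the dimensions must match those the agent actually published.

```python
def memory_alarm_params(instance_id, sns_topic_arn, threshold=90.0):
    """Keyword arguments for CloudWatch PutMetricAlarm (boto3 put_metric_alarm).

    Fires when average memory use exceeds `threshold` percent in 3 of the
    last 5 one-minute periods, and notifies the given SNS topic.
    """
    return {
        "AlarmName": f"mem-used-high-{instance_id}",
        "AlarmDescription": "Agent-reported memory utilization is high",
        "Namespace": "CWAgent",
        "MetricName": "mem_used_percent",
        # Must match the dimensions the agent actually published, exactly.
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 60,
        "EvaluationPeriods": 5,
        "DatapointsToAlarm": 3,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "missing",
        "AlarmActions": [sns_topic_arn],
    }


# Hypothetical instance ID and topic ARN.
params = memory_alarm_params(
    "i-0123456789abcdef0", "arn:aws:sns:us-east-1:123456789012:ops-alerts"
)
# With boto3: boto3.client("cloudwatch").put_metric_alarm(**params)
print(params["AlarmName"])
```

Requiring 3 of 5 breaching data points, rather than a single one, is a common way to avoid paging on brief spikes.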
  19. Generate New Time Series Using Metric Math
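Metric math derives a new time series from existing ones at query time. As a minimal sketch, assuming the `MetricDataQueries` format of the CloudWatch `GetMetricData` API and its `PERIOD()` function, the queries below sum `NetworkIn` and `NetworkOut` and divide by the period to obtain total network throughput in bytes per second. The instance ID is hypothetical, and `evaluate_locally` is a hypothetical helper that only reproduces the arithmetic on sample numbers.

```python
def network_throughput_queries(instance_id, period=300):
    """MetricDataQueries for CloudWatch GetMetricData using metric math.

    m1 and m2 fetch per-period byte sums; e1 derives a new time series,
    total network throughput in bytes per second.
    """

    def sum_of(query_id, metric_name):
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/EC2",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
                },
                "Period": period,
                "Stat": "Sum",
            },
            "ReturnData": False,  # used only as inputs to the expression
        }

    return [
        sum_of("m1", "NetworkIn"),
        sum_of("m2", "NetworkOut"),
        {
            "Id": "e1",
            "Expression": "(m1 + m2) / PERIOD(m1)",
            "Label": "Network throughput (bytes/sec)",
            "ReturnData": True,
        },
    ]


def evaluate_locally(m1_sums, m2_sums, period):
    """Apply the same arithmetic to sample sums, to show what e1 computes."""
    return [(a + b) / period for a, b in zip(m1_sums, m2_sums)]


queries = network_throughput_queries("i-0123456789abcdef0")
# With boto3: boto3.client("cloudwatch").get_metric_data(
#     MetricDataQueries=queries, StartTime=start, EndTime=end)
print(evaluate_locally([600, 300], [300, 0], 300))  # prints [3.0, 1.0]
```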
  20. Rackspace – Scaling Operations with Amazon CloudWatch
  21. About Rackspace
     • AWS Premier Consulting Partner and audited Managed Service Provider (MSP)
     • A leader in the 2017 Gartner Magic Quadrant for Public Cloud Infrastructure Managed Service Providers, Worldwide
     • Managed Service Provider to over half the Fortune 100
     • Provides a full-stack portfolio, from managed operations and applications to security, professional services, and enterprise migration and transformation
  22. Key Use Cases Driving Customer Value
     1. Solving enterprise challenges of cost optimization and cost governance
     2. Supporting day-to-day customer operations
     3. Enabling automation to lower the time to diagnose and resolve infrastructure and operating system issues
  23. Use Case 1: Cost Control with Amazon CloudWatch
     • Rackspace provides services for cost and performance optimization
     • CloudWatch replaces expensive on-premises infrastructure monitoring
     • The CloudWatch Agent provides on-instance insight into disk size, memory/swap utilization, etc.
     • Rackspace recently saved a customer ~$500k in yearly AWS spend by:
       • Right-sizing instances based on performance insights from CloudWatch metrics
       • Consolidating unused instances
       • Migrating to new instance families
     • Proposed Reserved Instance (RI) purchases would add a further ~$100k in savings
     • Rackspace also has tooling to manage spend alerting, bill consolidation, and cost allocation/chargeback
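Right-sizing depends on in-instance data such as memory, which the default EC2 metrics do not include. The rule below is a hypothetical heuristic for illustration, not Rackspace's method: it takes utilization samples (for example `CPUUtilization` and the agent's `mem_used_percent`), computes a nearest-rank 95th percentile for each, and flags the instance as a downsize or upsize candidate. The `low`/`high` thresholds and the sample data are assumptions.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile; pct is in the range (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def rightsizing_hint(cpu_samples, mem_samples, low=40.0, high=90.0):
    """Classify an instance from utilization samples (percent).

    Hypothetical rule: if p95 CPU and p95 memory are both below `low`, the
    instance may be oversized; if either exceeds `high`, it may be undersized.
    """
    cpu_p95 = percentile(cpu_samples, 95)
    mem_p95 = percentile(mem_samples, 95)
    if cpu_p95 < low and mem_p95 < low:
        return "downsize-candidate"
    if cpu_p95 > high or mem_p95 > high:
        return "upsize-candidate"
    return "keep"


# Hypothetical hourly samples: CPUUtilization (EC2) and mem_used_percent (agent).
print(rightsizing_hint([10, 12, 15, 20, 18], [30, 35, 38, 32, 36]))  # prints downsize-candidate
```

Using a high percentile rather than the average avoids downsizing an instance that is mostly idle but regularly hits short peaks.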
  24. Use Case 2: Scale Operations with Amazon CloudWatch
     • Rackspace provides 24x7x365 Fanatical Support services
     • CloudWatch provides infrastructure monitoring and alerting for hundreds of AWS customer accounts
     • Rackspace acts as the first line of response for infrastructure and operating system alarms
     • The CloudWatch Agent enables monitoring of operating system performance and logs, and dashboards give our operations team more context
  25. Use Case 3: Automation with Amazon CloudWatch
     • CloudWatch and the CloudWatch Agent integrate with other AWS services (SNS, Lambda, etc.) to provide automation opportunities
     • Examples:
       • Low disk space reports
       • Restarting runaway processes
       • Running diagnostic tools (top, free, etc.) in response to instance metric alarms
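Running diagnostic tools in response to an alarm could be wired as CloudWatch alarm → SNS topic → Lambda function. The sketch below is a hypothetical handler that assumes the SNS message is the alarm notification JSON with `NewStateValue` and a `Trigger` object whose `Dimensions` are `name`/`value` pairs (verify this against a real notification). It does not call AWS; it returns Run Command requests that a deployed version could pass to the SSM `send_command` API with the `AWS-RunShellScript` document. The metric-to-command mapping is an assumption.

```python
import json

# Hypothetical mapping from alarm metric to read-only diagnostic commands.
DIAGNOSTICS = {
    "disk_used_percent": ["df -h", "df -i"],
    "mem_used_percent": ["free -m", "ps aux --sort=-%mem | head -n 10"],
    "CPUUtilization": ["top -b -n 1 | head -n 20"],
}


def plan_diagnostics(event):
    """Turn SNS-delivered CloudWatch alarm notifications into Run Command plans."""
    plans = []
    for record in event.get("Records", []):
        alarm = json.loads(record["Sns"]["Message"])
        if alarm.get("NewStateValue") != "ALARM":
            continue  # ignore OK / INSUFFICIENT_DATA transitions
        trigger = alarm.get("Trigger", {})
        dims = {d["name"]: d["value"] for d in trigger.get("Dimensions", [])}
        commands = DIAGNOSTICS.get(trigger.get("MetricName"))
        if commands and "InstanceId" in dims:
            plans.append(
                {
                    "InstanceIds": [dims["InstanceId"]],
                    "DocumentName": "AWS-RunShellScript",
                    "Parameters": {"commands": commands},
                }
            )
    return plans


def lambda_handler(event, context):
    # A deployed handler could run each plan with
    # boto3.client("ssm").send_command(**plan); this sketch only returns them.
    return plan_diagnostics(event)
```

Keeping the commands read-only limits the blast radius; automated remediation such as restarting processes would need stricter guards.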
  26. Recommended Best Practices
     • Use the in-instance visibility of the CloudWatch Agent to maximize your cost optimization strategy
     • Use the CloudWatch Agent and dashboards to give Operations more context and reduce MTTR
     • Tie CloudWatch into other systems to increase automated handling and diagnosis of issues
  27. Closing
  28. Closing Thoughts
     1. Have a strong monitoring & observability strategy
     2. Focus on collecting health and behavior metrics
     3. Improve performance, minimize production impact, and control costs for a better end-user experience
  29. Next Steps
     • Get started with CloudWatch for free today: aws.amazon.com/cloudwatch
       Many applications can operate within our monthly free tier limits:
       • Basic monitoring
       • 10 custom or detailed metrics
       • 10 alarms
       • 3 dashboards of 50 metrics each
       • 1 million API requests
     • Install the CloudWatch Agent: docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Install-CloudWatch-Agent
     • Use metric math to create new time series: docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/using-metric-math
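Because metrics published by the CloudWatch agent are billed as custom metrics, it can help to check a planned setup against the free-tier allowances. This minimal sketch hard-codes the allowances exactly as listed on the Next Steps slide (May 2018); treat them as assumptions and check current CloudWatch pricing. The usage figures are hypothetical.

```python
# Free-tier allowances as listed on the Next Steps slide (May 2018).
# These are assumptions for illustration; check current CloudWatch pricing.
FREE_TIER = {
    "custom_or_detailed_metrics": 10,
    "alarms": 10,
    "dashboards": 3,
    "api_requests": 1_000_000,
}
METRICS_PER_FREE_DASHBOARD = 50


def over_free_tier(usage):
    """Return the names of the allowances that monthly `usage` exceeds."""
    exceeded = [name for name, limit in FREE_TIER.items() if usage.get(name, 0) > limit]
    # Each free dashboard may hold up to 50 metrics.
    if any(n > METRICS_PER_FREE_DASHBOARD for n in usage.get("dashboard_metrics", [])):
        exceeded.append("dashboard_metrics")
    return exceeded


# Hypothetical month: 8 custom metrics, 12 alarms, 2 dashboards (20 and 51 metrics).
print(over_free_tier({"custom_or_detailed_metrics": 8, "alarms": 12,
                      "dashboards": 2, "dashboard_metrics": [20, 51]}))
# prints ['alarms', 'dashboard_metrics']
```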
