Shashi Raina [AWS] & Al Sargent [InfluxData] | Build Modern Monitoring with InfluxDB and AWS | InfluxDays Virtual Experience London 2020

Infrastructure management and observability can be challenging in industries like financial services where uptime is of paramount importance. Learn how AWS and InfluxDB solutions overcome these challenges by delivering real-time visibility into systems to help quickly identify and mitigate potential outages. We’ll share real examples of how these solutions were implemented, providing better observability into infrastructure and transactions, and how these organizations were able to pass on these performance improvements to their customers as a result. Finally, this session will share how you can get started with a free trial of InfluxDB on AWS.


  1. Build modern monitoring with InfluxDB and AWS. Shashi Raina, Partner Solution Architect, AWS; Al Sargent, Sr. Director, Product Marketing, InfluxData
  2. Monitoring AWS: Best Practices • Monitoring AWS with InfluxDB • Configure Telegraf to monitor AWS • Demo
  3. Best Practices for monitoring AWS
  4. © 2020 InfluxData. All rights reserved. So Why Monitor in the First Place? To Gain Insights! • Customer Experience • Performance & Cost • Trends • Troubleshooting & Remediation • Learning & Improvement
  5. What Goes Into a Monitoring Plan? • Alerts • System Knowledge • People • Actions • Tools
  6. Alerting Best Practices • Break alert crafting into batches, highest priority first • Refine quickly • Alert to prompt an action • Write descriptive alerts to aid prompt resolution • Don't rely on email alone
  9. Summary • Check your monitoring approach: Is it user-centric? Are you measuring the right things? • Write a monitoring plan • Start monitoring, test, and iterate. The reason operations exists is to support the needs of the business.
  10. Monitoring AWS with InfluxDB
  11. Architecture diagram in three stages: Accumulate, Analyze, Act.
      • Accumulate. Sources: AWS ECS & Fargate, AWS EC2, AWS EKS, AWS Kinesis, IoT devices & sensors, AWS RDS (MariaDB, MySQL, Postgres), 93 AWS services. Telegraf Inputs: CloudWatch plugin, ECS plugin, System plugin, Docker plugin, Kubernetes plugins, Kinesis plugin, MQTT & Modbus plugins. Also shown: Flux RDS joins, Client Libraries.
      • Analyze. InfluxDB, a purpose-built time series database: realtime data stream processing, visualization & dashboarding, data analysis & anomaly detection, alerting & notifications. Available as InfluxDB Cloud and InfluxDB Enterprise on AWS Global Infrastructure, with AWS Marketplace Billing.
      • Act. Alerting Systems: PagerDuty, Slack, Webhooks. Telegraf Outputs: AWS Kinesis, AWS CloudWatch. Grafana.
  12. Telegraf input plugins on GitHub (CloudWatch, ECS, System, Docker, Kubernetes, Kinesis, MQTT & Modbus): github.com/influxdata/telegraf/tree/master/plugins/inputs
  13. Set up telegraf.conf for CloudWatch
  14. Architecture diagram from slide 11, repeated with the CloudWatch plugin and the 93 AWS services it covers called out.
  15. First, the overall agent config

          [agent]
            # Run telegraf with debug log messages?
            debug = false
            # How often to collect metrics
            interval = "30s"
            # Default flushing interval for all outputs
            flush_interval = "10s"
            # How many metrics to cache
            metric_buffer_limit = 50000
  16. Specify your cloud region

          [[inputs.cloudwatch]]
            # Specify your AWS Region
            # region = "eu-central-1"  # Frankfurt
            # region = "eu-north-1"    # Stockholm
            # region = "eu-west-1"     # Dublin
            # region = "eu-west-2"     # London
            # region = "eu-west-3"     # Paris
            # region = "eu-south-1"    # Milan
            # region = "us-east-1"     # Virginia
            # region = "us-east-2"     # Ohio
            # region = "us-west-1"     # Northern California
            region = "us-west-2"       # Oregon
            # https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html
  17. Specify your AWS credentials

          # AWS credentials
          # Credentials are loaded in the following order
          # 1) Assumed credentials via STS if role_arn is specified
          # 2) explicit credentials from 'access_key' and 'secret_key'
          # 3) shared profile from 'profile'
          # 4) environment variables
          # 5) shared credentials file
          # 6) EC2 Instance Profile
          # access_key = ""
          # secret_key = ""
          # token = ""
          # role_arn = ""
          # profile = ""
          shared_credential_file = "./credentials"

      Contents of the file named credentials:

          [default]
          aws_access_key_id = AKIAI53FASDFP7J3KQ
          aws_secret_access_key = 4EZ7As/Lmr2d1JgUaIdakr+58hpBJ
  18. Specify your collection timing

          # Requested CloudWatch aggregation Period (required - must be a multiple of 60s)
          period = "1m"
          # Collection Delay (required - must account for metrics availability via CloudWatch API)
          delay = "5m"
          # Recommended: use metric 'interval' that is a multiple of 'period' to avoid
          # gaps or overlap in pulled data
          interval = "1m"
  19. Configure your metric namespaces

          # Metric Statistic Namespace (required)
          # List of namespaces: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/aws-services-cloudwatch-metrics.html
          namespace = "AWS/EC2"
          # Maximum requests per second. Note that the global default AWS rate limit is
          # 50 reqs/sec, so if you define multiple namespaces, these should add up to a
          # maximum of 50.
          ratelimit = 25
  20. docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/aws-services-cloudwatch-metrics.html
  21. Add tags to find your metrics more easily

          # Optional tags that you can add
          [inputs.cloudwatch.tags]
            plugin = 'cloudwatch'
            aws_service = 'ec2'
  23. Specify the metrics you want Telegraf to pull

          [[inputs.cloudwatch.metrics]]
            # List of EC2 metrics available:
            # https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/viewing_metrics_with_cloudwatch.html
            names = ["StatusCheckFailed", "EBSReadBytes", "CPUSurplusCreditsCharged", "EBSByteBalance%",
                     "StatusCheckFailed_System", "EBSWriteBytes", "NetworkIn", "CPUCreditUsage",
                     "EBSIOBalance%", "EBSReadOps", "CPUCreditBalance", "StatusCheckFailed_Instance",
                     "CPUUtilization", "NetworkOut"]
  24. List of metrics varies by AWS service
  25. Specify your instance

          [[inputs.cloudwatch.metrics.dimensions]]
            name = "InstanceId"
            # This will be unique for each AWS instance
            value = "i-06025f2c26acfbf47"
  26. Best practice: Have Telegraf monitor itself

          # Collect metrics on Telegraf itself
          [[inputs.internal]]
            collect_memstats = true
            # Tag stats with the metric name for easier retrieval
            [inputs.internal.tags]
              plugin = 'internal'
  27. Send data to your InfluxDB instance

          [[outputs.influxdb_v2]]
            # Location of your InfluxDB Cloud instance
            # Cloud URLs: https://v2.docs.influxdata.com/v2.0/reference/urls/
            urls = ["https://us-west-2-1.aws.cloud2.influxdata.com"]
            # Store token in an environment variable called US_WEST_2_1
            token = "$US_WEST_2_1"
            # Your org is the email you signed up with
            organization = "asargent@influxdata.com"
            bucket = "aws"
            # About 5x faster
            content_encoding = "gzip"
  28. You can dual-write to multiple instances

          [[outputs.influxdb_v2]]
            # Cloud URLs: https://v2.docs.influxdata.com/v2.0/reference/urls/
            urls = ["https://eu-central-1-1.aws.cloud2.influxdata.com"]
            # Store token in an environment variable called EU_CENTRAL_1_1
            token = "$EU_CENTRAL_1_1"
            organization = "asargent+aws-eu-central-1@influxdata.com"
            bucket = "aws"
            content_encoding = "gzip"
  29. To troubleshoot, write line protocol to stdout

          [[outputs.file]]
            files = ["stdout"]
            data_format = "influx"
  30. Demo
  31. github.com/alsargent/telegraf-cloudwatch
  32. Architecture diagram from slide 11, repeated with the InfluxDB Cloud and InfluxDB Enterprise on AWS Global Infrastructure portion (including AWS Marketplace Billing) called out.
  33. InfluxDB on AWS
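
Slides 15 through 29 show telegraf.conf one fragment at a time. As a rough sketch of how those fragments might fit together in a single file, the block below assembles them in slide order. It is not the presenters' published file (see the repository on slide 31 for that), and it makes a few assumptions: the metric list is shortened to three of the names from slide 23, the second (EU) output from slide 28 is left out, and the instance ID, organization, and bucket are the presenters' values, which you would replace with your own.

```toml
# telegraf.conf: slide fragments assembled in slide order.
# A sketch under the assumptions stated above, not an official sample.

[agent]
  debug = false
  interval = "30s"               # default collection interval
  flush_interval = "10s"         # default flush interval for all outputs
  metric_buffer_limit = 50000    # how many metrics to cache

[[inputs.cloudwatch]]
  region = "us-west-2"                       # Oregon; pick your own region
  shared_credential_file = "./credentials"   # file with the [default] profile
  period = "1m"                              # must be a multiple of 60s
  delay = "5m"                               # allow for CloudWatch API lag
  interval = "1m"                            # a multiple of 'period'
  namespace = "AWS/EC2"
  ratelimit = 25                             # keep the total across namespaces <= 50

  # Plain plugin options must appear above this sub-table; anything written
  # after it would be read as a tag.
  [inputs.cloudwatch.tags]
    plugin = 'cloudwatch'
    aws_service = 'ec2'

  [[inputs.cloudwatch.metrics]]
    # Assumption: a shortened subset of the names listed on slide 23.
    names = ["CPUUtilization", "NetworkIn", "NetworkOut"]

    [[inputs.cloudwatch.metrics.dimensions]]
      name = "InstanceId"
      value = "i-06025f2c26acfbf47"          # replace with your instance ID

# Have Telegraf monitor itself.
[[inputs.internal]]
  collect_memstats = true
  [inputs.internal.tags]
    plugin = 'internal'

[[outputs.influxdb_v2]]
  urls = ["https://us-west-2-1.aws.cloud2.influxdata.com"]
  token = "$US_WEST_2_1"                     # read from the environment
  organization = "asargent@influxdata.com"   # replace with your org
  bucket = "aws"
  content_encoding = "gzip"

# Troubleshooting aid: also print line protocol to stdout.
[[outputs.file]]
  files = ["stdout"]
  data_format = "influx"
```

Assuming Telegraf is installed and the US_WEST_2_1 environment variable holds a valid token, starting the agent with `telegraf --config telegraf.conf` should print line protocol to stdout while also writing to the bucket. Once the data looks right, the `[[outputs.file]]` section can be removed.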
