Prometheus on AWS

Prometheus on AWS (English version)

Published in: Technology

  1. 1. Prometheus on AWS
  2. 2. About me • Mitsuhiro Tanda • Infrastructure Engineer @GREE • Using Prometheus on AWS (1 year) • Grafana committer • @mtanda
  3. 3. Features • multi-dimensional data model • flexible query language • pull model over HTTP • service discovery • Prometheus values reliability
  4. 4. AWS Monitoring Problems • Instance lifecycle is short • Instances are launched/terminated by ASG • Instance workload is not the same across AZs, …
  5. 5. Why we use Prometheus • multi-dimensional data model & flexible query language – aggregate metrics by Role/AZ and compare the results – detect instances whose workload differs from others in the same Role • pull model over HTTP & service discovery – specify monitoring targets by Role, ... – easily adapt as the number of monitoring targets increases
  6. 6. multi-dimensional data model • record instance metadata to labels
       key                          value
       instance_id                  i-1234abcd
       instance_type                ec2, rds, elasticache, elb, …
       instance_model               t2.large, m4.large, c4.large, r3.large, …
       region                       ap-northeast-1, us-east-1, …
       availability_zone            ap-northeast-1a, ap-northeast-1c, …
       role (instance tag)          web, db, …
       environment (instance tag)   production, staging, …
  7. 7. avg(cpu) by (availability_zone)
  8. 8. cpu{role="web"}
  9. 9. avg(cpu) by (role)
  10. 10. Service Discovery • automatically detects monitoring targets • Prometheus provides several SD mechanisms – ec2_sd, consul_sd, kubernetes_sd, file_sd • (a fundamental feature for the pull architecture)
  11. 11. ec2_sd • detects monitoring targets via the ec2:DescribeInstances API • specify monitoring targets by AZ, instance tags, ... • example setting that selects Web Role targets:
        - job_name: 'job_name'
          ec2_sd_configs:
            - region: ap-northeast-1
              port: 9100
          relabel_configs:
            - source_labels: [__meta_ec2_tag_Role]
              regex: web.*
              action: keep
  12. 12. How we deploy settings • [Diagram labels: edit, pack, upload, deploy; Prometheus (for web) with Role=web; Prometheus (for db) with Role=db] • The logo belongs to the Jenkins project (https://jenkins.io/).
  13. 13. CloudWatch support • We store CloudWatch metrics in Prometheus • We don't use cloudwatch_exporter, because it depends on Java • We created an in-house CloudWatch exporter with aws-sdk-go • Recording timestamps causes some problems – CloudWatch metrics emission is delayed by several minutes – Prometheus treats the metrics as stale and drops them – I gave up recording timestamps for some metrics
  14. 14. Instance Spec we use • use t2.micro - t2.medium instances • use gp2 EBS, volume size is 50-100GB • If the number of monitoring targets is 50-100, t2.medium is enough to monitor them • I recommend using t2.small or larger – t2.micro's memory size is not enough – need to change storage.local.memory-chunks • Sudden load increases can be handled by burst – t2 instance burst – EBS (gp2) burst
  15. 15. Disk write workload
  16. 16. Disk usage • calculated per monitored instance • We have 150 - 300 metrics per instance • scrape interval is 15 seconds • Disk usage is approximately 200MB per month
  17. 17. Long term metrics storage • Prometheus doesn't support summarizing metrics the way rrdtool does • The data size becomes large if you set a long retention period • The default retention period is 15 days • Prometheus is not designed for long term metrics storage • To store metrics for the long term – Use Remote Storage (e.g. Graphite) – Launch another Prometheus for long term storage, and store summarized metrics data (we created a metrics summarizing exporter)
  18. 18. After 1 year of use • daily operation – Prometheus workload is very stable – almost no operation required • upgrading Prometheus – need to change the configuration file due to format changes – breaking changes will keep coming until version 1.0 • supporting new monitoring target middleware – create an exporter for each middleware – thanks to Prometheus's powerful query language, exporters become very simple
  19. 19. Reference URL • http://www.robustperception.io/automatically-monitoring-ec2-instances/ • http://www.robustperception.io/how-to-have-labels-for-machine-roles/ • http://www.robustperception.io/life-of-a-label/ • http://www.slideshare.net/FabianReinartz/prometheus-storage-57557499
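
As a rough illustration of what the `keep` relabel rule on slide 11 does, the Python sketch below filters a list of discovered targets by their `__meta_ec2_tag_Role` label. This is not Prometheus code: the `keep_targets` helper and the sample targets are hypothetical. The only behaviour it borrows from Prometheus is that relabel regexes are anchored at both ends and that a missing label is treated as an empty string.

```python
import re


def keep_targets(targets, source_label, pattern):
    """Return only the targets whose source_label value fully matches pattern.

    Hypothetical helper that mimics `action: keep`; a missing label is
    treated as the empty string.
    """
    regex = re.compile(pattern)
    kept = []
    for labels in targets:
        value = labels.get(source_label, "")
        # fullmatch mirrors the anchoring of relabel regexes (^...$).
        if regex.fullmatch(value):
            kept.append(labels)
    return kept


# Made-up discovery results, for illustration only.
discovered = [
    {"__address__": "10.0.0.1:9100", "__meta_ec2_tag_Role": "web"},
    {"__address__": "10.0.0.2:9100", "__meta_ec2_tag_Role": "web-api"},
    {"__address__": "10.0.0.3:9100", "__meta_ec2_tag_Role": "db"},
    {"__address__": "10.0.0.4:9100"},  # instance without a Role tag
]

web_targets = keep_targets(discovered, "__meta_ec2_tag_Role", "web.*")
for target in web_targets:
    print(target["__address__"])
```

With the sample data, only the `web` and `web-api` instances remain. Because of the anchoring, a role such as `myweb` would not match `web.*`.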

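The disk usage figures on slide 16 can be sanity-checked with simple arithmetic. The sketch below assumes a 30-day month, reads "200MB" as decimal megabytes, and treats each "metric" as one time series. None of these is stated in the slides, so the result is only an order-of-magnitude estimate of the implied storage cost per sample.

```python
SCRAPE_INTERVAL_S = 15                      # from the slide
SECONDS_PER_MONTH = 30 * 24 * 60 * 60       # assumption: 30-day month
DISK_BYTES_PER_MONTH = 200 * 10**6          # assumption: 200MB = 200 * 10^6 bytes

# Samples collected for one time series over a month.
samples_per_series = SECONDS_PER_MONTH // SCRAPE_INTERVAL_S


def bytes_per_sample(metrics_per_instance):
    """Implied bytes per stored sample for one monitored instance."""
    total_samples = samples_per_series * metrics_per_instance
    return DISK_BYTES_PER_MONTH / total_samples


for metrics in (150, 300):                  # range given on the slide
    total = samples_per_series * metrics
    print(metrics, total, round(bytes_per_sample(metrics), 2))
```

Under these assumptions, one instance produces roughly 26 to 52 million samples per month, so 200MB corresponds to roughly 4 to 8 bytes per sample.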