SlideShare a Scribd company logo
Prometheus on AWS
About me
• MitsuhiroTanda
• Infrastructure Engineer @GREE
• Use Prometheus on AWS (1 year)
• Grafana committer
• @mtanda
Features
• multi-dimensional data model
• flexible query language
• pull model over HTTP
• service discovery
• Prometheus values reliability
AWS Monitoring Problems
• Instance lifecycle is short
• Instance is launched/terminated byASG
• Instance workload is not same amongAZ, …
Why we use Prometheus
• multi-dimensional data model & flexible query
language
– aggregate metrics by Role/AZ, and compare the result
– detect the instance which workload is differ among the
Role
• pull model over HTTP & service discovery
– specify monitoring target by Role, ...
– easily adapt monitoring target increase
multi-dimensional data model
• record instance metadata to labels
key value
instance_id i-1234abcd
instance_type ec2, rds, elasticache, elb, …
instance_model t2.large, m4.large, c4.large, r3.large, …
region ap-northeast-1, us-east-1, …
availability_zone ap-northeast-1a, ap-northeast-1c, …
role (instance tag) web, db, …
environment (instance tag) production, staging, …
avg(cpu) by (availability_zone)
cpu{role="web"}
avg(cpu) by (role)
Service Discovery
• auto detect monitoring target
• Prometheus provides several SD
– ec2_sd, consul_sd, kubernetes_sd, file_sd
• (fundamental feature for Pull architecture)
ec2_sd
• detect monitoring target by ec2:DescribeInstances API
• specify monitoring target by AZ, InstanceTags, ...
• example setting for specifying Web Role target
- job_name: 'job_name'
ec2_sd_configs:
- region: ap-northeast-1
port: 9100
relabel_configs:
- source_labels: [__meta_ec2_tag_Role]
regex: web.*
action: keep
How we deploy setting
Prometheus
(for web)
Prometheus
(for db)
Role=web Role=db
pack
upload
deploy
edit
このロゴはJenkins project (https://jenkins.io/)に帰属します。
CloudWatch support
• We store CloudWatch metrics to Prometheus
• Don't use cloudwatch_exporter, because it's depend on
Java
• Create in-house CloudWatch exporter by aws-sdk-go
• Recording timestamp cause some problems
– CloudWatch metrics emission is delayed for several minutes
– Prometheus treat the metrics as stale, and drop it
– I give up to record timestamp for some metrics
Instance Spec we use
• use t2.micro - t2.medium instance
• use gp2 EBS, volume size is 50-100GB
• If the number of monitoring target is 50-100, t2.medium is enough to
monitor them
• I recommend to use t2.small or upper
– t2.micro's memory size is not enough
– need to change storage.local.memory-chunks
• Sudden load increase can handled by Burst
– t2 Instance burst
– EBS(gp2) burst
Disk write workload
Disk usage
• calculate per monitoring target instance
• We have 150 - 300 metrics per one instance
• scrape interval is 15 seconds
• Disk usage becomes approximately 200MB
per 1 month
Long term metrics storage
• Prometheus doesn't support summarize metrics like rrdtool
• The data size becomes large if you set long retention period
• The default retention period is 15 days
• Prometheus is not designed for long term metrics storage
• To store metrics for a long term
– Use Remote Storage (e.g. Graphite)
– Launch another Prometheus for long term storage, and store
summarized metrics data (we create metrics summarize exporter)
Using 1 year
• daily operation
– Prometheus workload is very stable
– mostly no operation required
• upgrade Prometheus
– need to change configuration file due to format change
– breaking change will come until version 1.0
• support new monitoring target middleware
– create exporter for each middleware
– by using Prometheus powerful query, exporter becomes very simple
Reference URL
• http://www.robustperception.io/automatically-monitoring-ec2-instances/
• http://www.robustperception.io/how-to-have-labels-for-machine-roles/
• http://www.robustperception.io/life-of-a-label/
• http://www.slideshare.net/FabianReinartz/prometheus-storage-57557499

More Related Content

What's hot

Prometheus
PrometheusPrometheus
Prometheus
wyukawa
 
Kubernetes at Telekom Austria Group
Kubernetes at Telekom Austria Group Kubernetes at Telekom Austria Group
Kubernetes at Telekom Austria Group
Oliver Moser
 
Prometheus london
Prometheus londonPrometheus london
Prometheus london
wyukawa
 
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Brian Brazil
 
An Introduction to Prometheus
An Introduction to PrometheusAn Introduction to Prometheus
An Introduction to Prometheus
Evgeny Shmarnev
 
3.2 Streaming and Messaging
3.2 Streaming and Messaging3.2 Streaming and Messaging
3.2 Streaming and Messaging
振东 刘
 
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Brian Brazil
 
Monitoring a Kubernetes-backed microservice architecture with Prometheus
Monitoring a Kubernetes-backed microservice architecture with PrometheusMonitoring a Kubernetes-backed microservice architecture with Prometheus
Monitoring a Kubernetes-backed microservice architecture with Prometheus
Fabian Reinartz
 
Prometheus for Monitoring Metrics (Fermilab 2018)
Prometheus for Monitoring Metrics (Fermilab 2018)Prometheus for Monitoring Metrics (Fermilab 2018)
Prometheus for Monitoring Metrics (Fermilab 2018)
Brian Brazil
 
Cloud Monitoring with Prometheus
Cloud Monitoring with PrometheusCloud Monitoring with Prometheus
Cloud Monitoring with Prometheus
QAware GmbH
 
Project Reactor By Example
Project Reactor By ExampleProject Reactor By Example
Project Reactor By Example
Denny Abraham Cheriyan
 
Lessons Learned from Building and Operating Scuba
Lessons Learned from Building and Operating ScubaLessons Learned from Building and Operating Scuba
Lessons Learned from Building and Operating Scuba
SingleStore
 
Prometheus Overview
Prometheus OverviewPrometheus Overview
Prometheus Overview
Brian Brazil
 
Project Frankenstein: A multitenant, horizontally scalable Prometheus as a se...
Project Frankenstein: A multitenant, horizontally scalable Prometheus as a se...Project Frankenstein: A multitenant, horizontally scalable Prometheus as a se...
Project Frankenstein: A multitenant, horizontally scalable Prometheus as a se...
Weaveworks
 
Low latency stream processing with jet
Low latency stream processing with jetLow latency stream processing with jet
Low latency stream processing with jet
StreamNative
 
Portable Streaming Pipelines with Apache Beam
Portable Streaming Pipelines with Apache BeamPortable Streaming Pipelines with Apache Beam
Portable Streaming Pipelines with Apache Beam
confluent
 
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache KafkaKafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka
confluent
 
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
Brian Brazil
 
Distributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola ScaleDistributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola Scale
Apache Kafka TLV
 
Introduction to Streaming Distributed Processing with Storm
Introduction to Streaming Distributed Processing with StormIntroduction to Streaming Distributed Processing with Storm
Introduction to Streaming Distributed Processing with Storm
Brandon O'Brien
 

What's hot (20)

Prometheus
PrometheusPrometheus
Prometheus
 
Kubernetes at Telekom Austria Group
Kubernetes at Telekom Austria Group Kubernetes at Telekom Austria Group
Kubernetes at Telekom Austria Group
 
Prometheus london
Prometheus londonPrometheus london
Prometheus london
 
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
Monitoring Kubernetes with Prometheus (Kubernetes Ireland, 2016)
 
An Introduction to Prometheus
An Introduction to PrometheusAn Introduction to Prometheus
An Introduction to Prometheus
 
3.2 Streaming and Messaging
3.2 Streaming and Messaging3.2 Streaming and Messaging
3.2 Streaming and Messaging
 
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
Your data is in Prometheus, now what? (CurrencyFair Engineering Meetup, 2016)
 
Monitoring a Kubernetes-backed microservice architecture with Prometheus
Monitoring a Kubernetes-backed microservice architecture with PrometheusMonitoring a Kubernetes-backed microservice architecture with Prometheus
Monitoring a Kubernetes-backed microservice architecture with Prometheus
 
Prometheus for Monitoring Metrics (Fermilab 2018)
Prometheus for Monitoring Metrics (Fermilab 2018)Prometheus for Monitoring Metrics (Fermilab 2018)
Prometheus for Monitoring Metrics (Fermilab 2018)
 
Cloud Monitoring with Prometheus
Cloud Monitoring with PrometheusCloud Monitoring with Prometheus
Cloud Monitoring with Prometheus
 
Project Reactor By Example
Project Reactor By ExampleProject Reactor By Example
Project Reactor By Example
 
Lessons Learned from Building and Operating Scuba
Lessons Learned from Building and Operating ScubaLessons Learned from Building and Operating Scuba
Lessons Learned from Building and Operating Scuba
 
Prometheus Overview
Prometheus OverviewPrometheus Overview
Prometheus Overview
 
Project Frankenstein: A multitenant, horizontally scalable Prometheus as a se...
Project Frankenstein: A multitenant, horizontally scalable Prometheus as a se...Project Frankenstein: A multitenant, horizontally scalable Prometheus as a se...
Project Frankenstein: A multitenant, horizontally scalable Prometheus as a se...
 
Low latency stream processing with jet
Low latency stream processing with jetLow latency stream processing with jet
Low latency stream processing with jet
 
Portable Streaming Pipelines with Apache Beam
Portable Streaming Pipelines with Apache BeamPortable Streaming Pipelines with Apache Beam
Portable Streaming Pipelines with Apache Beam
 
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache KafkaKafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka
 
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
 
Distributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola ScaleDistributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola Scale
 
Introduction to Streaming Distributed Processing with Storm
Introduction to Streaming Distributed Processing with StormIntroduction to Streaming Distributed Processing with Storm
Introduction to Streaming Distributed Processing with Storm
 

Viewers also liked

Grafana optimization for Prometheus
Grafana optimization for PrometheusGrafana optimization for Prometheus
Grafana optimization for Prometheus
Mitsuhiro Tanda
 
Application security as crucial to the modern distributed trust model
Application security as crucial to   the modern distributed trust modelApplication security as crucial to   the modern distributed trust model
Application security as crucial to the modern distributed trust model
LINE Corporation
 
Implementing Trusted Endpoints in the Mobile World
Implementing Trusted Endpoints in the Mobile WorldImplementing Trusted Endpoints in the Mobile World
Implementing Trusted Endpoints in the Mobile World
LINE Corporation
 
“Your Security, More Simple.” by utilizing FIDO Authentication
“Your Security, More Simple.” by utilizing FIDO Authentication“Your Security, More Simple.” by utilizing FIDO Authentication
“Your Security, More Simple.” by utilizing FIDO Authentication
LINE Corporation
 
Drawing the Line Correctly: Enough Security, Everywhere
Drawing the Line Correctly:   Enough Security, EverywhereDrawing the Line Correctly:   Enough Security, Everywhere
Drawing the Line Correctly: Enough Security, Everywhere
LINE Corporation
 
FRONTIERS IN CRYPTOGRAPHY
FRONTIERS IN CRYPTOGRAPHYFRONTIERS IN CRYPTOGRAPHY
FRONTIERS IN CRYPTOGRAPHY
LINE Corporation
 
FIDO認証で「あんしんをもっと便利に」
FIDO認証で「あんしんをもっと便利に」FIDO認証で「あんしんをもっと便利に」
FIDO認証で「あんしんをもっと便利に」
LINE Corporation
 
ゲーム開発を加速させる クライアントセキュリティ
ゲーム開発を加速させる クライアントセキュリティゲーム開発を加速させる クライアントセキュリティ
ゲーム開発を加速させる クライアントセキュリティ
LINE Corporation
 

Viewers also liked (8)

Grafana optimization for Prometheus
Grafana optimization for PrometheusGrafana optimization for Prometheus
Grafana optimization for Prometheus
 
Application security as crucial to the modern distributed trust model
Application security as crucial to   the modern distributed trust modelApplication security as crucial to   the modern distributed trust model
Application security as crucial to the modern distributed trust model
 
Implementing Trusted Endpoints in the Mobile World
Implementing Trusted Endpoints in the Mobile WorldImplementing Trusted Endpoints in the Mobile World
Implementing Trusted Endpoints in the Mobile World
 
“Your Security, More Simple.” by utilizing FIDO Authentication
“Your Security, More Simple.” by utilizing FIDO Authentication“Your Security, More Simple.” by utilizing FIDO Authentication
“Your Security, More Simple.” by utilizing FIDO Authentication
 
Drawing the Line Correctly: Enough Security, Everywhere
Drawing the Line Correctly:   Enough Security, EverywhereDrawing the Line Correctly:   Enough Security, Everywhere
Drawing the Line Correctly: Enough Security, Everywhere
 
FRONTIERS IN CRYPTOGRAPHY
FRONTIERS IN CRYPTOGRAPHYFRONTIERS IN CRYPTOGRAPHY
FRONTIERS IN CRYPTOGRAPHY
 
FIDO認証で「あんしんをもっと便利に」
FIDO認証で「あんしんをもっと便利に」FIDO認証で「あんしんをもっと便利に」
FIDO認証で「あんしんをもっと便利に」
 
ゲーム開発を加速させる クライアントセキュリティ
ゲーム開発を加速させる クライアントセキュリティゲーム開発を加速させる クライアントセキュリティ
ゲーム開発を加速させる クライアントセキュリティ
 

Similar to Prometheus on AWS

Presto At Treasure Data
Presto At Treasure DataPresto At Treasure Data
Presto At Treasure Data
Taro L. Saito
 
Understanding Elastic Block Store Availability and Performance
Understanding Elastic Block Store Availability and PerformanceUnderstanding Elastic Block Store Availability and Performance
Understanding Elastic Block Store Availability and Performance
Amazon Web Services
 
Big data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on dockerBig data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on docker
Federico Palladoro
 
Fastest Servlets in the West
Fastest Servlets in the WestFastest Servlets in the West
Fastest Servlets in the West
Stuart (Pid) Williams
 
Presto
PrestoPresto
Presto
Knoldus Inc.
 
Cloud Security Monitoring and Spark Analytics
Cloud Security Monitoring and Spark AnalyticsCloud Security Monitoring and Spark Analytics
Cloud Security Monitoring and Spark Analytics
amesar0
 
Megastore by Google
Megastore by GoogleMegastore by Google
Megastore by Google
Ankita Kapratwar
 
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
Anna Ossowski
 
Hardware Provisioning
Hardware Provisioning Hardware Provisioning
Hardware Provisioning
MongoDB
 
Capacity Planning
Capacity PlanningCapacity Planning
Capacity Planning
MongoDB
 
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
Amazon Web Services
 
Log analytics with ELK stack
Log analytics with ELK stackLog analytics with ELK stack
Log analytics with ELK stack
AWS User Group Bengaluru
 
Performance testing in scope of migration to cloud by Serghei Radov
Performance testing in scope of migration to cloud by Serghei RadovPerformance testing in scope of migration to cloud by Serghei Radov
Performance testing in scope of migration to cloud by Serghei Radov
Valeriia Maliarenko
 
DataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big Data
DataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big DataDataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big Data
DataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big Data
Hakka Labs
 
Optimizing spark based data pipelines - are you up for it?
Optimizing spark based data pipelines - are you up for it?Optimizing spark based data pipelines - are you up for it?
Optimizing spark based data pipelines - are you up for it?
Etti Gur
 
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
Amazon Web Services
 
Presto Summit 2018 - 07 - Lyft
Presto Summit 2018 - 07 - LyftPresto Summit 2018 - 07 - Lyft
Presto Summit 2018 - 07 - Lyft
kbajda
 
NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1
Ruslan Meshenberg
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
javier ramirez
 
Architectures, Frameworks and Infrastructure
Architectures, Frameworks and InfrastructureArchitectures, Frameworks and Infrastructure
Architectures, Frameworks and Infrastructureharendra_pathak
 

Similar to Prometheus on AWS (20)

Presto At Treasure Data
Presto At Treasure DataPresto At Treasure Data
Presto At Treasure Data
 
Understanding Elastic Block Store Availability and Performance
Understanding Elastic Block Store Availability and PerformanceUnderstanding Elastic Block Store Availability and Performance
Understanding Elastic Block Store Availability and Performance
 
Big data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on dockerBig data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on docker
 
Fastest Servlets in the West
Fastest Servlets in the WestFastest Servlets in the West
Fastest Servlets in the West
 
Presto
PrestoPresto
Presto
 
Cloud Security Monitoring and Spark Analytics
Cloud Security Monitoring and Spark AnalyticsCloud Security Monitoring and Spark Analytics
Cloud Security Monitoring and Spark Analytics
 
Megastore by Google
Megastore by GoogleMegastore by Google
Megastore by Google
 
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
 
Hardware Provisioning
Hardware Provisioning Hardware Provisioning
Hardware Provisioning
 
Capacity Planning
Capacity PlanningCapacity Planning
Capacity Planning
 
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
 
Log analytics with ELK stack
Log analytics with ELK stackLog analytics with ELK stack
Log analytics with ELK stack
 
Performance testing in scope of migration to cloud by Serghei Radov
Performance testing in scope of migration to cloud by Serghei RadovPerformance testing in scope of migration to cloud by Serghei Radov
Performance testing in scope of migration to cloud by Serghei Radov
 
DataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big Data
DataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big DataDataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big Data
DataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big Data
 
Optimizing spark based data pipelines - are you up for it?
Optimizing spark based data pipelines - are you up for it?Optimizing spark based data pipelines - are you up for it?
Optimizing spark based data pipelines - are you up for it?
 
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
 
Presto Summit 2018 - 07 - Lyft
Presto Summit 2018 - 07 - LyftPresto Summit 2018 - 07 - Lyft
Presto Summit 2018 - 07 - Lyft
 
NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
 
Architectures, Frameworks and Infrastructure
Architectures, Frameworks and InfrastructureArchitectures, Frameworks and Infrastructure
Architectures, Frameworks and Infrastructure
 

Recently uploaded

UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 

Recently uploaded (20)

UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 

Prometheus on AWS

  • 2. About me • MitsuhiroTanda • Infrastructure Engineer @GREE • Use Prometheus on AWS (1 year) • Grafana committer • @mtanda
  • 3. Features • multi-dimensional data model • flexible query language • pull model over HTTP • service discovery • Prometheus values reliability
  • 4. AWS Monitoring Problems • Instance lifecycle is short • Instance is launched/terminated byASG • Instance workload is not same amongAZ, …
  • 5. Why we use Prometheus • multi-dimensional data model & flexible query language – aggregate metrics by Role/AZ, and compare the result – detect the instance which workload is differ among the Role • pull model over HTTP & service discovery – specify monitoring target by Role, ... – easily adapt monitoring target increase
  • 6. multi-dimensional data model • record instance metadata to labels key value instance_id i-1234abcd instance_type ec2, rds, elasticache, elb, … instance_model t2.large, m4.large, c4.large, r3.large, … region ap-northeast-1, us-east-1, … availability_zone ap-northeast-1a, ap-northeast-1c, … role (instance tag) web, db, … environment (instance tag) production, staging, …
  • 10. Service Discovery • auto detect monitoring target • Prometheus provides several SD – ec2_sd, consul_sd, kubernetes_sd, file_sd • (fundamental feature for Pull architecture)
  • 11. ec2_sd • detect monitoring target by ec2:DescribeInstances API • specify monitoring target by AZ, InstanceTags, ... • example setting for specifying Web Role target - job_name: 'job_name' ec2_sd_configs: - region: ap-northeast-1 port: 9100 relabel_configs: - source_labels: [__meta_ec2_tag_Role] regex: web.* action: keep
  • 12. How we deploy setting Prometheus (for web) Prometheus (for db) Role=web Role=db pack upload deploy edit このロゴはJenkins project (https://jenkins.io/)に帰属します。
  • 13. CloudWatch support • We store CloudWatch metrics to Prometheus • Don't use cloudwatch_exporter, because it's depend on Java • Create in-house CloudWatch exporter by aws-sdk-go • Recording timestamp cause some problems – CloudWatch metrics emission is delayed for several minutes – Prometheus treat the metrics as stale, and drop it – I give up to record timestamp for some metrics
  • 14. Instance Spec we use • use t2.micro - t2.medium instance • use gp2 EBS, volume size is 50-100GB • If the number of monitoring target is 50-100, t2.medium is enough to monitor them • I recommend to use t2.small or upper – t2.micro's memory size is not enough – need to change storage.local.memory-chunks • Sudden load increase can handled by Burst – t2 Instance burst – EBS(gp2) burst
  • 16. Disk usage • calculate per monitoring target instance • We have 150 - 300 metrics per one instance • scrape interval is 15 seconds • Disk usage becomes approximately 200MB per 1 month
  • 17. Long term metrics storage • Prometheus doesn't support summarize metrics like rrdtool • The data size becomes large if you set long retention period • The default retention period is 15 days • Prometheus is not designed for long term metrics storage • To store metrics for a long term – Use Remote Storage (e.g. Graphite) – Launch another Prometheus for long term storage, and store summarized metrics data (we create metrics summarize exporter)
  • 18. Using 1 year • daily operation – Prometheus workload is very stable – mostly no operation required • upgrade Prometheus – need to change configuration file due to format change – breaking change will come until version 1.0 • support new monitoring target middleware – create exporter for each middleware – by using Prometheus powerful query, exporter becomes very simple
  • 19. Reference URL • http://www.robustperception.io/automatically-monitoring-ec2-instances/ • http://www.robustperception.io/how-to-have-labels-for-machine-roles/ • http://www.robustperception.io/life-of-a-label/ • http://www.slideshare.net/FabianReinartz/prometheus-storage-57557499