Prometheus on AWS

Mitsuhiro Tanda

Prometheus on AWS (English version)

About me
• Mitsuhiro Tanda
• Infrastructure Engineer @GREE
• Have used Prometheus on AWS for 1 year
• Grafana committer
• @mtanda

Features
• multi-dimensional data model
• flexible query language
• pull model over HTTP
• service discovery
• Prometheus values reliability

AWS Monitoring Problems
• Instance lifecycle is short
• Instances are launched/terminated by ASG
• Instance workload differs among AZs, …

Why we use Prometheus
• multi-dimensional data model & flexible query language
– aggregate metrics by Role/AZ, and compare the results
– detect an instance whose workload differs from the rest of its Role
• pull model over HTTP & service discovery
– specify monitoring targets by Role, ...
– easily adapt to an increase in monitoring targets
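
To make "detect an instance whose workload differs from the rest of its Role" concrete, here is a minimal Python sketch. It is not from the talk: the function name, the sample values, and the 50% relative threshold are all assumptions for illustration. In Prometheus itself this comparison would be written as a query over labeled metrics.

```python
from statistics import mean

def instances_differing_from_role(cpu_by_instance, roles, threshold=0.5):
    """Return instance IDs whose CPU deviates from their role's mean
    by more than `threshold` (relative deviation). Hypothetical helper."""
    by_role = {}
    for instance, cpu in cpu_by_instance.items():
        by_role.setdefault(roles[instance], []).append(cpu)
    role_mean = {role: mean(values) for role, values in by_role.items()}

    outliers = []
    for instance, cpu in cpu_by_instance.items():
        avg = role_mean[roles[instance]]
        if avg > 0 and abs(cpu - avg) / avg > threshold:
            outliers.append(instance)
    return sorted(outliers)

# Made-up CPU utilization (0.0 - 1.0) and role tags per instance.
cpu = {"i-aaa": 0.30, "i-bbb": 0.32, "i-ccc": 0.90, "i-ddd": 0.10}
roles = {"i-aaa": "web", "i-bbb": "web", "i-ccc": "web", "i-ddd": "db"}
print(instances_differing_from_role(cpu, roles))
```

With these made-up numbers only i-ccc stands out, because it runs far above the mean of the other web instances.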

multi-dimensional data model
• record instance metadata as labels

key                          value
instance_id                  i-1234abcd
instance_type                ec2, rds, elasticache, elb, …
instance_model               t2.large, m4.large, c4.large, r3.large, …
region                       ap-northeast-1, us-east-1, …
availability_zone            ap-northeast-1a, ap-northeast-1c, …
role (instance tag)          web, db, …
environment (instance tag)   production, staging, …
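
The labels above are what make queries such as avg(cpu) by (role) or avg(cpu) by (availability_zone) possible. The following Python sketch only mimics that grouping on in-memory data. The `avg_by` helper and the sample values are assumptions for illustration, not part of Prometheus.

```python
from collections import defaultdict

def avg_by(samples, label):
    """Average sample values grouped by one label, similar in spirit to
    the PromQL expression avg(metric) by (label)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for labels, value in samples:
        key = labels[label]
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Each sample is (labels, value); the values are made up.
samples = [
    ({"instance_id": "i-1234abcd", "role": "web",
      "availability_zone": "ap-northeast-1a"}, 0.5),
    ({"instance_id": "i-5678ef01", "role": "web",
      "availability_zone": "ap-northeast-1c"}, 0.75),
    ({"instance_id": "i-9abc2345", "role": "db",
      "availability_zone": "ap-northeast-1a"}, 0.25),
]

print(avg_by(samples, "role"))
print(avg_by(samples, "availability_zone"))
```

The same samples give a different view depending on which label is chosen, which is the point of recording instance metadata as labels.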

Service Discovery
• automatically detect monitoring targets
• Prometheus provides several SD mechanisms
– ec2_sd, consul_sd, kubernetes_sd, file_sd
• (fundamental feature for a pull architecture)

ec2_sd
• detect monitoring targets via the ec2:DescribeInstances API
• specify monitoring targets by AZ, instance tags, ...
• example setting that selects Web Role targets

- job_name: 'job_name'
  ec2_sd_configs:
    - region: ap-northeast-1
      port: 9100
  relabel_configs:
    - source_labels: [__meta_ec2_tag_Role]
      regex: web.*
      action: keep
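
As a rough model of what the keep action in the setting above does: a target is kept only when the value of its source label matches the regex, and Prometheus anchors relabel regexes so the whole value must match. The Python sketch below imitates that on made-up targets. The `keep_targets` helper and the addresses are assumptions for illustration and are not Prometheus code.

```python
import re

def keep_targets(targets, source_label, regex):
    """Keep targets whose source label value fully matches the regex.
    A missing label is treated as an empty string."""
    pattern = re.compile(regex)
    return [t for t in targets if pattern.fullmatch(t.get(source_label, ""))]

# Made-up discovered targets with their EC2 "Role" tag as a meta label.
targets = [
    {"__address__": "10.0.0.1:9100", "__meta_ec2_tag_Role": "web"},
    {"__address__": "10.0.0.2:9100", "__meta_ec2_tag_Role": "web-api"},
    {"__address__": "10.0.0.3:9100", "__meta_ec2_tag_Role": "db"},
    {"__address__": "10.0.0.4:9100", "__meta_ec2_tag_Role": "myweb"},
    {"__address__": "10.0.0.5:9100"},  # instance without a Role tag
]

kept = keep_targets(targets, "__meta_ec2_tag_Role", "web.*")
print([t["__address__"] for t in kept])
```

Because the match is anchored, a role such as "myweb" is dropped even though it contains "web".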

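How we deploy settings
[Diagram: settings go through steps labeled edit, pack, upload, and deploy to two Prometheus servers: Prometheus (for web) monitoring Role=web, and Prometheus (for db) monitoring Role=db.]
This logo belongs to the Jenkins project (https://jenkins.io/).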

CloudWatch support
• We store CloudWatch metrics in Prometheus
• We don't use cloudwatch_exporter, because it depends on Java
• We created an in-house CloudWatch exporter with aws-sdk-go
• Recording timestamps causes some problems
– CloudWatch metrics are emitted with a delay of several minutes
– Prometheus treats the metrics as stale, and drops them
– I gave up recording timestamps for some metrics
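
The timestamp problem can be illustrated with a small Python sketch. It assumes a fixed staleness window of five minutes; that value, the function name, and the example delays are assumptions for illustration and should be checked against the Prometheus version in use.

```python
STALENESS_WINDOW_SECONDS = 5 * 60  # assumed window, for illustration only

def is_stale(sample_timestamp, query_time, window=STALENESS_WINDOW_SECONDS):
    """Return True if the sample is older than the staleness window
    at the moment it is looked up."""
    return query_time - sample_timestamp > window

now = 10_000  # arbitrary "current" time in seconds

# A sample stamped with the original CloudWatch time, arriving 7 minutes late.
delayed_cloudwatch_sample = now - 7 * 60
# A sample stamped with the scrape time, 15 seconds ago.
scrape_time_sample = now - 15

print(is_stale(delayed_cloudwatch_sample, now))
print(is_stale(scrape_time_sample, now))
```

Under these assumptions the delayed sample is already outside the window when it arrives, while a sample stamped at scrape time is not. That is the trade-off behind not recording the original timestamp for some metrics.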

Instance Spec we use
• use t2.micro - t2.medium instances
• use gp2 EBS, volume size is 50-100GB
• If the number of monitoring targets is 50-100, t2.medium is enough to monitor them
• I recommend using t2.small or larger
– t2.micro's memory size is not enough
– need to change storage.local.memory-chunks
• Sudden load increases can be handled by burst
– t2 instance burst
– EBS (gp2) burst

Disk usage
• calculated per monitored instance
• We have 150 - 300 metrics per instance
• scrape interval is 15 seconds
• Disk usage is approximately 200MB per month
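
The figures above imply a rough per-sample storage cost, which the following Python sketch works out. It assumes a 30-day month, 1 MB = 1,000,000 bytes, and that each "metric" is one time series producing one sample per scrape. None of these assumptions are stated in the slides.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month

def samples_per_month(series, scrape_interval_seconds):
    """Number of samples one instance produces in a month."""
    return series * (SECONDS_PER_MONTH // scrape_interval_seconds)

def bytes_per_sample(disk_bytes, series, scrape_interval_seconds):
    """Average on-disk bytes per sample implied by the observed usage."""
    return disk_bytes / samples_per_month(series, scrape_interval_seconds)

DISK_BYTES = 200 * 1_000_000  # approximately 200MB per month (from the slide)

for series in (150, 300):
    count = samples_per_month(series, 15)
    cost = bytes_per_sample(DISK_BYTES, series, 15)
    print(f"{series} series: {count} samples/month, {cost:.2f} bytes/sample")
```

Under these assumptions one instance produces about 26 to 52 million samples per month, so 200MB corresponds to roughly 4 to 8 bytes per sample.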

Long term metrics storage
• Prometheus doesn't support summarizing (downsampling) metrics the way rrdtool does
• The data size becomes large if you set a long retention period
• The default retention period is 15 days
• Prometheus is not designed for long-term metrics storage
• To store metrics for the long term
– Use remote storage (e.g. Graphite)
– Launch another Prometheus for long-term storage, and store summarized metrics data (we created a metrics-summarizing exporter)
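
What "summarized metrics data" could look like is sketched below in Python: raw 15-second samples are reduced to one average per hour. This is not the in-house exporter mentioned above, whose design the slides do not describe. The `summarize` function, the hourly bucket size, and the sample data are assumptions for illustration.

```python
from collections import defaultdict

def summarize(samples, bucket_seconds=3600):
    """Downsample (timestamp, value) pairs to one average per time bucket.
    Timestamps are integer seconds; each bucket is keyed by its start time."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        bucket_start = timestamp - timestamp % bucket_seconds
        buckets[bucket_start].append(value)
    return [(start, sum(values) / len(values))
            for start, values in sorted(buckets.items())]

# Made-up raw samples: two in the first hour, two in the second.
raw = [(0, 1.0), (15, 3.0), (3600, 5.0), (3615, 7.0)]
print(summarize(raw))
```

Storing only such averages trades resolution for size, which is the same trade-off rrdtool makes when it consolidates older data.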

After 1 year of use
• daily operation
– Prometheus workload is very stable
– almost no operational work required
• upgrading Prometheus
– need to change the configuration file when its format changes
– breaking changes will keep coming until version 1.0
• supporting new middleware as monitoring targets
– create an exporter for each middleware
– thanks to Prometheus's powerful query language, exporters can be very simple

Reference URL
• http://www.robustperception.io/automatically-monitoring-ec2-instances/
• http://www.robustperception.io/how-to-have-labels-for-machine-roles/
• http://www.robustperception.io/life-of-a-label/
• http://www.slideshare.net/FabianReinartz/prometheus-storage-57557499

Slide titles without bullet text
• Slide 7: avg(cpu) by (availability_zone)
• Slide 8: cpu{role="web"}
• Slide 9: avg(cpu) by (role)
• Slide 15: Disk write workload
