Prometheus is an open source monitoring project used to gather metrics.
It has many built-in capabilities, such as service discovery, which make it well suited to automated environments.
This talk gives a brief introduction to Prometheus, covers the latest developments, and then offers practical tips and examples of how you can use it in an automated world.
3. • Applications are short lived
• Updated often
• Infrastructure changes
(Nothing new...)
A fast changing world
@roidelapluie
4. • Monitoring an infrastructure
• Monitoring user experience
... together (dev&ops)
Monitoring in a "fast changing world"
5. CPU Usage, Disk space, Memory, Open file descriptors, ...
Infrastructure monitoring
6. Request Rate, Request Errors, Request Duration
Utilization, Saturation, Errors
User experience monitoring
RED method by Tom Wilkie, USE method by Brendan Gregg
7. • High level overview of the state of a service/component
• Performance
• Availability
• Technical components
What is going on?
What is monitoring?
8. • Understand how your services behave
• As if you were in their place
• Without specific code
Why is this going on?
What's observability?
9. • Monitoring is required
• Some monitoring systems are designed for observability
• If lucky, monitoring is enough
• Observability is removing luck
How do monitoring and observability connect?
17. • Open Source monitoring solution
• Graduated CNCF Project
• Born in 2012, publicly announced in 2015
• Collects metrics
• Plenty of integrations
• Service discovery, e.g. for Kubernetes
• Easy to use query language
• Built-in alerting
Prometheus
18. • A community
• A server and many other components
• An ecosystem
What "Prometheus" means
19. • Open Source
• Pull-based Monitoring over HTTP
• Powerful query language
• Optimized TSDB
The Prometheus server
26. Prometheus scrapes metrics over HTTP.
caddy_http_requests_total{code="200",method="POST",path="/load"} 1
Dimensional data model, for filtering and aggregation.
Metrics and Labels
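To illustrate the dimensional data model, here is a minimal Python sketch that parses sample lines like the one above and then filters and aggregates them by label. The parser is deliberately simplified (no timestamps, no unescaping of label values), and the second and third sample lines are hypothetical values added for illustration.

```python
import re
from collections import defaultdict

# One simple sample line: name{label="value",...} value
# Simplification: timestamps and HELP/TYPE comment lines are not handled.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Parse one exposition line into (name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


def sum_by(samples, label):
    """Sum sample values per value of one label (in the spirit of `sum by`)."""
    totals = defaultdict(float)
    for _name, labels, value in samples:
        totals[labels.get(label, "")] += value
    return dict(totals)


lines = [
    'caddy_http_requests_total{code="200",method="POST",path="/load"} 1',
    'caddy_http_requests_total{code="200",method="GET",path="/"} 5',  # hypothetical
    'caddy_http_requests_total{code="500",method="GET",path="/"} 2',  # hypothetical
]
samples = [parse_sample(line) for line in lines]

# Filtering: keep only the series whose "code" label is "200".
ok_samples = [s for s in samples if s[1].get("code") == "200"]

# Aggregation: total requests per status code.
per_code = sum_by(samples, "code")
```

In Prometheus itself this filtering and aggregation is done in the query language on the server; the sketch only shows why labels make both operations straightforward.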
29. • Metrics do not represent problems
• Metrics represent a state, give insights
• Metrics can be graphed
• You can alert based on them
Metrics and monitoring
30. In general you can just expose counters and let the monitoring server do the real maths.
That keeps the overhead on apps very low.
Exposed metrics are "raw"
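As a rough illustration of the maths the server does on raw counters, the Python sketch below derives a per-second rate from a series of scraped counter values, treating any drop in value as a restart of the application. It is a simplification under stated assumptions: the sample data is hypothetical, and the real query language does more (for example, extrapolating to the edges of the time window).

```python
def counter_increase(samples):
    """Total increase over a list of (timestamp_seconds, counter_value) samples."""
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current < previous:
            # The counter went down: assume the app restarted from zero.
            increase += current
        else:
            increase += current - previous
    return increase


def per_second_rate(samples):
    """Average per-second increase between the first and last sample."""
    if len(samples) < 2:
        return 0.0
    elapsed = samples[-1][0] - samples[0][0]
    return counter_increase(samples) / elapsed if elapsed > 0 else 0.0


# Hypothetical scrapes every 15 seconds; the app restarts before t=45.
scrapes = [(0, 100), (15, 130), (30, 160), (45, 10), (60, 40)]
total = counter_increase(scrapes)
rate = per_second_rate(scrapes)
```

The application only increments a number; rates, resets and windows are all handled on the monitoring side.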
36. Let's see what makes Prometheus play nicely with automation tools.
Automating Prometheus
37. • Works on your machine
• Container ready
• Not tied to Kubernetes (see prometheus-operator)
Deploy anywhere
38. • Reloads on SIGHUP
• /-/reload endpoint (--web.enable-lifecycle)
• Working to have less and less overhead on reloads
Reloading Prometheus
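Both reload mechanisms are easy to drive from automation. The Python sketch below shows one possible way to do it, assuming Prometheus listens on localhost:9090, was started with --web.enable-lifecycle, and that the caller knows its process ID; the function names are illustrative only.

```python
import os
import signal
import urllib.request


def build_reload_request(base_url="http://localhost:9090"):
    """Build the POST request for the /-/reload endpoint."""
    url = base_url.rstrip("/") + "/-/reload"
    return urllib.request.Request(url, data=b"", method="POST")


def reload_via_http(base_url="http://localhost:9090"):
    """Ask Prometheus to reload; requires --web.enable-lifecycle."""
    with urllib.request.urlopen(build_reload_request(base_url), timeout=10) as response:
        return response.status == 200


def reload_via_signal(pid):
    """Send SIGHUP to a running Prometheus process (Unix only)."""
    os.kill(pid, signal.SIGHUP)


request = build_reload_request("http://localhost:9090/")
```

The HTTP variant is convenient when the automation tool runs on another host; the signal variant needs no extra flag but must run on the same machine.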
40. Plenty of situations do not require a reload of Prometheus:
• Password files
• TLS certificates
Prometheus will read them before use, no reload needed!
Not reloading Prometheus
41. HashiCorp Vault enables retrieving temporary secrets and writing them to a file.
./vault agent -config vault-agent.hcl
Using Vault
https://inuits.eu/blog/prometheus-consul-vault-228/
44. The "web-config" file is read on every request:
• No need to reload
• Instantly change passwords, cert files
Shared config format between Prometheus and exporters!
TLS and basic auth
45. Prometheus has a snapshot API.
Enable with --web.enable-admin-api
curl -d{} http://localhost:9090/api/v1/admin/tsdb/snapshot
Prometheus TSDB is made of immutable blocks. Snapshots use hard links.
Backups
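Why hard links make snapshots cheap can be shown with a few lines of Python. This is only an illustration of the file-system mechanism, not of the Prometheus code: the file names are hypothetical, and it assumes a file system that supports hard links.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    block_file = os.path.join(root, "block-chunk")        # stands in for a TSDB block file
    snapshot_file = os.path.join(root, "snapshot-chunk")  # stands in for its snapshot copy

    with open(block_file, "wb") as f:
        f.write(b"immutable block data")

    # A hard link is a second name for the same data: nothing is copied.
    os.link(block_file, snapshot_file)
    same_inode = os.stat(block_file).st_ino == os.stat(snapshot_file).st_ino
    link_count = os.stat(block_file).st_nlink

    # Deleting the original name (as retention would) leaves the snapshot intact.
    os.remove(block_file)
    with open(snapshot_file, "rb") as f:
        survived = f.read()
```

Because blocks are immutable, the linked files never change under the snapshot, so it can be copied off the machine at leisure.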
47. • Easier to know what's down with Pull
• Easy debugging (curl)
• Easier to spread the load
• Central configuration point
• High Availability
Prometheus pull model
48. • Prometheus must know what to pull
• Source of Truth
• Service Discovery != Auto discovery
• Event based when possible
Service Discovery
49. • Kubernetes
• Consul
• Cloud providers (Azure, AWS, GCP, DigitalOcean, Hetzner, Scaleway, Linode)
• Docker & Docker Swarm
• And more! 20+ external SD in total.
Sources of Truth
50. • Static SD: into Prometheus main config
• File SD: Files on disk
• HTTP SD: HTTP endpoints
Generic Service Discovery
https://inuits.eu/blog/prometheus-http-service-discovery/
52. • Both integrate your own SD systems into Prometheus
• File SD is event based (inotify)
• HTTP SD can be integrated in your apps
File SD vs HTTP SD
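File SD and HTTP SD share the same JSON shape: a list of target groups, each with "targets" and optional "labels". The Python sketch below writes such a file from your own source of truth. Writing to a temporary file and renaming it is an assumption about good practice here, so that a watcher never sees a half-written file; the addresses, label values and file name are hypothetical.

```python
import json
import os
import tempfile


def write_file_sd(path, groups):
    """Write target groups to a file SD JSON file via an atomic rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # The ".tmp" suffix keeps the temporary file out of a "*.json" file pattern.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(groups, f, indent=2)
    os.replace(tmp_path, path)


# Hypothetical inventory pulled from your own source of truth.
groups = [
    {"targets": ["10.0.0.1:9100", "10.0.0.2:9100"], "labels": {"env": "prod"}},
    {"targets": ["10.0.1.1:9100"], "labels": {"env": "staging"}},
]

with tempfile.TemporaryDirectory() as directory:
    target_file = os.path.join(directory, "targets.json")
    write_file_sd(target_file, groups)
    with open(target_file) as f:
        loaded = json.load(f)
    leftovers = os.listdir(directory)
```

An application exposing HTTP SD would serve the same list as a JSON response instead of writing it to disk.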
53. Labels can be used to configure targets.
• __address__: 127.0.0.1:9090
• __metrics_path__: /metrics
• __scheme__: http or https
• __param_<name>: http parameter
• __scrape_interval__, __scrape_timeout__: 1m
Labels
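How these special labels combine into the URL that gets scraped can be sketched in Python as follows. This is an illustrative reconstruction, not the Prometheus implementation; the defaults (http and /metrics) follow the values listed above, and the example targets are hypothetical.

```python
from urllib.parse import urlencode


def scrape_url(labels):
    """Build the scrape URL for a target from its special labels."""
    scheme = labels.get("__scheme__", "http")
    path = labels.get("__metrics_path__", "/metrics")
    # Every __param_<name> label becomes an HTTP query parameter <name>.
    params = {
        key[len("__param_"):]: value
        for key, value in labels.items()
        if key.startswith("__param_")
    }
    url = f"{scheme}://{labels['__address__']}{path}"
    if params:
        url += "?" + urlencode(sorted(params.items()))
    return url


plain = scrape_url({"__address__": "127.0.0.1:9090"})
probe = scrape_url({
    "__address__": "example.org:9115",  # hypothetical exporter address
    "__scheme__": "https",
    "__metrics_path__": "/probe",
    "__param_module": "http_2xx",
})
```

Since all of these are ordinary labels, service discovery and relabeling can change how a target is scraped without touching the rest of the configuration.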
54. Additionally, extra labels are added by SD.
• __meta_kubernetes_pod_label_app
• __meta_digitalocean_region
• __meta_linode_public_ipv6
• __meta_scaleway_instance_status
Meta labels
https://prometheus.io/docs/prometheus/latest/configuration/configuration/
55. A fundamental principle in Prometheus.
Transform input labels into a new set of labels.
Relabeling
56. • Rename, merge, replace labels
• Conditionally drop label sets
• Only keep label sets
Relabeling actions
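The three families of actions can be sketched with a small Python model of a relabeling pipeline. It is a simplified illustration, not the Prometheus implementation: only the replace, keep and drop actions are modelled, rules are plain dictionaries, and replacements use Python's `\1` group syntax rather than the `$1` syntax of a real configuration.

```python
import re


def relabel(labels, rules):
    """Apply rules in order; return the new labels, or None if the set is dropped."""
    labels = dict(labels)
    for rule in rules:
        action = rule.get("action", "replace")
        separator = rule.get("separator", ";")
        # Source label values are concatenated, then matched against the full regex.
        value = separator.join(labels.get(name, "") for name in rule.get("source_labels", []))
        match = re.fullmatch(rule.get("regex", "(.*)"), value, flags=re.DOTALL)
        if action == "keep":
            if match is None:
                return None
        elif action == "drop":
            if match is not None:
                return None
        elif action == "replace":
            if match is not None:
                new_value = match.expand(rule.get("replacement", r"\1"))
                if new_value:
                    labels[rule["target_label"]] = new_value
                else:
                    labels.pop(rule["target_label"], None)
        else:
            raise ValueError(f"unsupported action: {action}")
    return labels


rules = [
    # Only keep label sets whose app label is "web" or "api".
    {"action": "keep", "source_labels": ["__meta_kubernetes_pod_label_app"], "regex": "web|api"},
    # Copy the meta label into a permanent "app" label.
    {"source_labels": ["__meta_kubernetes_pod_label_app"], "target_label": "app"},
]

kept = relabel(
    {"__address__": "10.0.0.1:8080", "__meta_kubernetes_pod_label_app": "web"}, rules
)
dropped = relabel(
    {"__address__": "10.0.0.2:8080", "__meta_kubernetes_pod_label_app": "batch"}, rules
)
```

The first label set passes the keep rule and gains an `app` label; the second does not match and is discarded.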
57. • Get lots of labels as input
• Turn them into targets
• Remove labels prefixed with __
• Can use "special labels"
Target relabeling
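The last step of target relabeling can be sketched in Python as follows: once the rules have run, labels prefixed with `__` are removed, and the `instance` label falls back to the value of `__address__` if no rule set it. The function name and the input labels are hypothetical; this is an illustration of the behaviour, not the actual code.

```python
def finalize_target(labels):
    """Turn a relabeled label set into the labels attached to scraped series."""
    labels = dict(labels)
    # If relabeling did not set "instance", it defaults to the target address.
    labels.setdefault("instance", labels.get("__address__", ""))
    # Labels prefixed with "__" only configure the scrape and are then removed.
    return {name: value for name, value in labels.items() if not name.startswith("__")}


final_labels = finalize_target({
    "__address__": "10.0.0.1:8080",
    "__scheme__": "http",
    "__meta_kubernetes_pod_label_app": "web",
    "app": "web",
    "job": "kubernetes-pods",
})
```

Anything worth keeping from the meta labels therefore has to be copied into a regular label by a relabeling rule before this step.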