An open-source monitoring solution.
Prometheus - Monitoring system & time series database
Takeaways:
• What is Prometheus?
• Difference between Nagios and Prometheus
• PromQL (Prometheus Query Language)
• Time series DB
• Grafana
• Live Demo
What is Prometheus?
• Prometheus is an open-source systems monitoring and alerting
toolkit originally built at SoundCloud.
• Inspired by Google’s Borgmon Monitoring System
• Written in Go (also known as Golang). Go is syntactically
similar to C and is widely used in production at Google and in
many other organizations and open-source projects.
• It is now a standalone open source project and maintained
independently of any company. To emphasize this, and to clarify
the project's governance structure, Prometheus joined the CNCF
in 2016 as the second hosted project, after Kubernetes.
• The core Prometheus server is a single binary, with no
dependencies like Zookeeper, Consul, Cassandra, Hadoop or the
internet. All it needs is local disk, preferably an SSD.
• It is a systems and service monitoring system. It collects metrics
from configured targets at given intervals, evaluates rule
expressions, displays the results, and can trigger alerts if some
condition is observed to be true.
https://appinventiv.com/blog/mini-guide-to-go-programming-language/
ABOUT CNCF
• The Linux Foundation is the CNCF's parent organization.
• The CNCF focuses on open-source cloud computing for applications; not
to be confused with OpenStack, which targets
infrastructure.
• Netflix pioneered the concept of cloud native as a
practical tool.
• Cloud native is a term used to describe container-based
environments. Cloud native technologies are
used to develop applications built with services
packaged in containers, deployed as microservices,
and managed on elastic infrastructure through
agile DevOps processes and continuous delivery
workflows.
• August 9, 2018 - CNCF Announces Prometheus
Graduation.
https://www.cncf.io/webinars/what-is-cloud-native-and-why-does-it-exist/
Why Prometheus?
• Multi-dimensional data model: series are identified by labels such as instance, service, endpoint, and method.
• Operational simplicity
• Scalable data collection
• Powerful query language
Each of these features already existed in other systems;
Prometheus combined them all.
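To illustrate the multi-dimensional data model: a series is identified by its metric name plus its full label set, and a selector such as node_cpu_seconds_total{mode="system"} picks every series whose labels match. The sketch below is a simplified, hypothetical model (equality matchers only, no regex matchers), not Prometheus's actual storage or query code; the sample values are made up.

```python
from typing import Dict, FrozenSet, List, Tuple

# A series identity: metric name plus an immutable set of (label, value) pairs.
SeriesKey = Tuple[str, FrozenSet[Tuple[str, str]]]


def series_key(name: str, labels: Dict[str, str]) -> SeriesKey:
    """Build a hashable identity from a metric name and its labels."""
    return (name, frozenset(labels.items()))


def select(series: Dict[SeriesKey, float], name: str,
           matchers: Dict[str, str]) -> List[SeriesKey]:
    """Return keys whose name matches and whose labels equal every matcher."""
    result = []
    for key in series:
        metric, labels = key
        label_map = dict(labels)
        if metric == name and all(label_map.get(k) == v for k, v in matchers.items()):
            result.append(key)
    return result


# Hypothetical samples: same metric name, different label sets = different series.
store = {
    series_key("node_cpu_seconds_total", {"instance": "a:9100", "cpu": "0", "mode": "system"}): 12.5,
    series_key("node_cpu_seconds_total", {"instance": "a:9100", "cpu": "0", "mode": "user"}): 40.0,
    series_key("node_cpu_seconds_total", {"instance": "b:9100", "cpu": "0", "mode": "system"}): 7.25,
}

system_series = select(store, "node_cpu_seconds_total", {"mode": "system"})
print(len(system_series))  # prints 2
```

Adding a new label value (for example a third instance) simply creates another series; no schema change is needed, which is what makes the model "multi-dimensional".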
Nagios – an Overview
• The Industry Standard In IT Infrastructure Monitoring
• First launched in 1999. Nagios is officially sponsored by Nagios Enterprises.
• Nagios Core is a free and open-source software application that monitors systems,
networks, and infrastructure. Nagios offers monitoring and alerting services for servers, switches,
applications, and services. It alerts users when things go wrong and alerts them a second time when
the problem has been resolved.
• NDOUTILS: the NDOUTILS addon is designed to store all configuration and event data from Nagios
in a database. It requires a MariaDB or MySQL database for storing Nagios Core data.
• RRDtool and Highcharts are included to create customizable graphs that can be displayed in
dashboards.
• (Nagios Core vs Nagios XI) Nagios Core is open source whereas Nagios XI is a commercial,
enterprise version of Nagios.
• Historical performance data used to generate graphs is stored in Round Robin Database
(RRD) files.
• rrdcached: on a Nagios XI server, rrdcached collects host and service performance data and then
flushes it to the appropriate RRD files at a specified interval. This reduces the amount of disk activity
needed to keep a large number of RRD files current for performance graphs.
Nagios vs Prometheus
• Nagios is primarily about alerting based on the exit codes of
scripts.
• Nagios is host-based. Each host can have one or more services
and each service can perform one check.
• There is no notion of labels or a query language.
• Nagios has no storage per-se, beyond the current check state.
There are plugins which can store data such as for
visualisation.
• Nagios XI - Using Grafana With Existing Performance Data:
Grafana uses the existing performance data files (RRD) to
generate the graphs.
• Overall, Nagios is suitable for basic monitoring of small and/or
static systems where blackbox probing is sufficient. If you want
to do whitebox monitoring, or have a dynamic or cloud based
environment, then Prometheus is a good choice.
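The contrast above can be sketched in Python. A Nagios-style check collapses a measurement into a plugin exit code (by the Nagios plugin convention 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN), whereas a Prometheus target exposes the raw, labeled value and leaves thresholds to PromQL alerting rules. The thresholds, the metric name disk_used_ratio, and its labels are hypothetical.

```python
from typing import Tuple

# Nagios plugin exit-code convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def nagios_check(used_ratio: float, warn: float = 0.80, crit: float = 0.90) -> Tuple[int, str]:
    """Blackbox-style check: reduce the measurement to a state plus a message."""
    if not 0.0 <= used_ratio <= 1.0:
        return UNKNOWN, f"UNKNOWN - invalid ratio {used_ratio}"
    if used_ratio >= crit:
        return CRITICAL, f"CRITICAL - disk {used_ratio:.0%} used"
    if used_ratio >= warn:
        return WARNING, f"WARNING - disk {used_ratio:.0%} used"
    return OK, f"OK - disk {used_ratio:.0%} used"


def prometheus_sample(used_ratio: float, instance: str, mountpoint: str) -> str:
    """Whitebox-style exposition: keep the raw value and labels; alert later in PromQL."""
    return f'disk_used_ratio{{instance="{instance}",mountpoint="{mountpoint}"}} {used_ratio}'


code, message = nagios_check(0.85)
print(code, message)                              # prints: 1 WARNING - disk 85% used
print(prometheus_sample(0.85, "web1:9100", "/"))
# A real Nagios plugin would finish with sys.exit(code) so Nagios can read the state.
```

Note how the Nagios result discards the actual value once the state is decided, while the Prometheus sample keeps it queryable across any label dimension.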
Cacti
Should we cry or laugh?
Prometheus at Canonical
• Ref:
https://prometheus.io/blog/2016/11/16/interview-with-canonical/
Architecture
Architecture - Explanation
• Prometheus scrapes metrics from instrumented jobs, either directly or via an
intermediary push gateway for short-lived jobs. It stores all scraped samples
locally and runs rules over this data to either aggregate and record new time
series from existing data or generate alerts.
• Prometheus pulls (scrapes) metrics; pulling is considered slightly better than pushing.
• For cases where pushing is unavoidable, such as components that cannot be scraped, Prometheus
offers the Pushgateway. It lets short-lived service-level batch jobs push their time series
to an intermediary job, which Prometheus then scrapes.
• Limitation: Prometheus is not suitable for use cases such as per-request billing, because the
collected data will likely not be detailed and complete enough.
• Grafana or other API consumers can be used to visualize the collected data.
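For the push path described above, the Pushgateway accepts metrics in the text exposition format over HTTP at /metrics/job/<job_name> (port 9091 by default). The sketch below only builds the request with the standard library and does not send it; the job name nightly_backup and the metric batch_last_success_timestamp_seconds are hypothetical. In practice the official prometheus_client library's push_to_gateway() does this for you.

```python
import time
import urllib.request
from urllib.parse import quote


def build_push_request(gateway: str, job: str, last_success: float) -> urllib.request.Request:
    """Build (but do not send) a Pushgateway request for a short-lived batch job."""
    # Text exposition format: an optional TYPE line, then "name value", newline-terminated.
    body = (
        "# TYPE batch_last_success_timestamp_seconds gauge\n"
        f"batch_last_success_timestamp_seconds {last_success}\n"
    ).encode("utf-8")
    url = f"http://{gateway}/metrics/job/{quote(job, safe='')}"
    # PUT replaces all metrics in the job's group; POST would replace only same-named metrics.
    return urllib.request.Request(url, data=body, method="PUT",
                                  headers={"Content-Type": "text/plain; version=0.0.4"})


req = build_push_request("localhost:9091", "nightly_backup", time.time())
print(req.get_method(), req.full_url)
# To actually push once a Pushgateway is running: urllib.request.urlopen(req)
```

Prometheus then scrapes the Pushgateway like any other target, so the batch job's last result survives after the job itself has exited.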
Alertmanager
• Grouping: combines alerts of a similar nature into a single notification, which is useful during larger outages when many systems fail at once and
hundreds to thousands of alerts may be firing simultaneously.
• Inhibition: suppressing notifications for certain alerts if certain
other alerts are already firing.
• Silences: a straightforward way to simply mute alerts for a given time.
• The following external notification systems are supported:
Email, generic webhooks, HipChat, OpsGenie, PagerDuty, Pushover, Slack
• To make Prometheus highly available: Run identical Prometheus servers on two or
more separate machines. Identical alerts will be deduplicated by the Alertmanager.
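The grouping and deduplication ideas above can be illustrated with a small sketch: alerts from two identical (HA) Prometheus servers are deduplicated by their full label set, then grouped by a chosen subset of labels so that one notification covers many firing alerts. This is a simplified model of the concepts, not Alertmanager's implementation; the group_by labels and the alert data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Alert = Dict[str, str]  # in this sketch an alert is identified by its label set


def dedup(alerts: List[Alert]) -> List[Alert]:
    """Drop alerts whose full label set was already seen (e.g. from an HA replica)."""
    seen = set()
    unique = []
    for alert in alerts:
        key = frozenset(alert.items())
        if key not in seen:
            seen.add(key)
            unique.append(alert)
    return unique


def group(alerts: List[Alert], group_by: List[str]) -> Dict[Tuple[str, ...], List[Alert]]:
    """Bucket alerts by the values of the group_by labels; one notification per bucket."""
    groups: Dict[Tuple[str, ...], List[Alert]] = defaultdict(list)
    for alert in alerts:
        groups[tuple(alert.get(label, "") for label in group_by)].append(alert)
    return dict(groups)


replica_a = [{"alertname": "InstanceDown", "cluster": "eu", "instance": f"web{i}"} for i in range(3)]
replica_b = [dict(a) for a in replica_a]  # the identical HA server fires the same alerts

notifications = group(dedup(replica_a + replica_b), ["alertname", "cluster"])
print(len(notifications))                          # 1 notification ...
print(len(notifications[("InstanceDown", "eu")]))  # ... covering 3 alerts
```

Six raw alerts become one notification: deduplication removes the replica copies, and grouping folds the three instances into a single message.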
Time Series Database (TSDB)
• What is a time series? The value of something tracked over time.
• A series is identified by its labels (key/value pairs): identifier -> (t0, v0), (t1, v1), (t2, v2), (t3, v3), ... Each data
point is a tuple of a timestamp and a value. For monitoring purposes, the
timestamp is an integer and the value any number.
Example: this could be a temperature once a day, or requests to your API once a minute.
The latter could look like:
my_api_requests: 5@1:00PM 2@1:01PM 18@1:02PM
• This data model is fundamentally the same as OpenTSDB's.
• Prometheus includes a local on-disk time series database, but also optionally
integrates with remote storage systems
• Ingested samples are grouped into blocks of two hours. Each two-hour block
consists of a directory containing one or more chunk files that contain all time
series samples for that window of time, as well as a metadata file and index file
(which indexes metric names and labels to time series in the chunk files). When
series are deleted via the API, deletion records are stored in separate tombstone
files (instead of deleting the data immediately from the chunk files).
• A limitation of the local storage is that it is not clustered or replicated. Using
RAID for disk availability, snapshots for backups, capacity planning, etc., is therefore
recommended for improved durability. Alternatively, external storage may be used
via the remote read/write APIs.
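The data model and two-hour blocks described above can be sketched as follows: each sample is a (timestamp, value) pair with the timestamp in integer milliseconds, and samples are bucketed into two-hour windows. This illustrates the grouping only (no chunks, index, tombstones, or compression); aligning windows to multiples of two hours since the Unix epoch is an assumption of this sketch, and the values loosely follow the my_api_requests example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[int, float]          # (timestamp in ms since epoch, value)
BLOCK_MS = 2 * 60 * 60 * 1000       # two hours in milliseconds


def block_start(ts_ms: int) -> int:
    """Align a timestamp down to the start of its two-hour window."""
    return ts_ms - ts_ms % BLOCK_MS


def bucket_into_blocks(samples: List[Sample]) -> Dict[int, List[Sample]]:
    """Group one series' samples by the two-hour window they fall into."""
    blocks: Dict[int, List[Sample]] = defaultdict(list)
    for ts, value in samples:
        blocks[block_start(ts)].append((ts, value))
    return dict(blocks)


# my_api_requests sampled once a minute, starting one minute before a window boundary.
start = 10 * BLOCK_MS - 60_000
series = [(start + i * 60_000, v) for i, v in enumerate([5.0, 2.0, 18.0])]

for window, points in sorted(bucket_into_blocks(series).items()):
    print(window, points)   # first window holds 1 sample, the next holds 2
```

Because each window is self-contained, old data can be dropped by deleting whole block directories rather than rewriting files.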
TSDB Configuration:
• Prometheus has several flags that allow configuring the local storage.
The most important ones are:
--storage.tsdb.path: This determines where Prometheus writes its database. Defaults to data/.
--storage.tsdb.retention.time: This determines when to remove old data. Defaults to 15d.
--storage.tsdb.retention.size: This determines the maximum number of bytes that storage blocks can use. The oldest
data will be removed first. Defaults to 0 (disabled).
--storage.tsdb.wal-compression: This flag enables compression of the write-ahead log (WAL). Depending on your data,
you can expect the WAL size to be halved with little extra CPU load.
• TSDB Storage as follows
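The two retention flags above can be illustrated with a simplified sketch: blocks whose newest sample is older than the time limit are dropped, then the oldest remaining blocks are dropped until the total fits the size limit (0 meaning disabled, as with --storage.tsdb.retention.size). The Block type and all numbers are hypothetical; Prometheus's actual retention logic has more detail than this.

```python
from dataclasses import dataclass
from typing import List

DAY_MS = 24 * 60 * 60 * 1000


@dataclass
class Block:
    min_time: int    # ms since epoch of the oldest sample
    max_time: int    # ms since epoch of the newest sample
    size_bytes: int


def apply_retention(blocks: List[Block], now_ms: int,
                    retention_ms: int = 15 * DAY_MS, max_bytes: int = 0) -> List[Block]:
    """Return the blocks that survive time-based and size-based retention."""
    # Time retention: drop blocks whose newest sample is older than now - retention.
    kept = [b for b in blocks if b.max_time >= now_ms - retention_ms]
    kept.sort(key=lambda b: b.min_time)          # oldest first
    if max_bytes > 0:                            # 0 means size retention is disabled
        total = sum(b.size_bytes for b in kept)
        while kept and total > max_bytes:
            total -= kept.pop(0).size_bytes      # the oldest data is removed first
    return kept


now = 100 * DAY_MS
# Two-hour blocks that start 20, 10, 5 and 1 day(s) ago, 500 bytes each.
blocks = [Block(now - d * DAY_MS, now - d * DAY_MS + DAY_MS // 12, 500) for d in (20, 10, 5, 1)]
survivors = apply_retention(blocks, now, max_bytes=1000)
print([round((now - b.min_time) / DAY_MS) for b in survivors])   # prints [5, 1]
```

With the default 15d limit the 20-day-old block is removed by time; the 1000-byte size limit then also removes the 10-day-old block.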
Prometheus - Demo
Free Online Demo:
http://demo.robustperception.io:9090/graph
• Prometheus means "forethinker".
• Prometheus is a Titan. (Figuratively, a titan is an
extremely important person: Albert Einstein
was a titan in the world of science.)
• A trickster figure, he was a champion of
mankind known for his wily intelligence,
who stole fire from Zeus and the gods and
gave it to mortals.
• Prometheus is also a 2012 science fiction film, named after
its spaceship.
Are you a Titan, or just wearing a Titan watch?
Let’s Start - Prometheus
• Prerequisite: configure prometheus.yml (e.g., scrape interval, target servers to be monitored, Alertmanager configuration).
• The config file is written in YAML format. Prometheus can reload its configuration at runtime. A configuration reload is triggered by sending a
SIGHUP to the Prometheus process or by sending an HTTP POST request to the /-/reload endpoint (when the --web.enable-lifecycle flag is
enabled).
• The kill command can send any signal to a process. However, a process responds only if it is programmed to handle that signal
(SIGKILL and SIGSTOP cannot be caught or ignored). Linux signal numbers run up to 64 (list them with kill -l); some useful ones are:
- SIGHUP (1) - Hangup detected on controlling terminal or death of controlling process.
- SIGKILL (9) - Kill signal, i.e. kill the running process.
- SIGSTOP (19) - Stop process.
- SIGCONT (18) - Continue process if stopped.
To send SIGKILL to PID 1234, use: kill -9 1234
To send SIGHUP to PID 1234, use: kill -1 1234
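Both reload triggers described on this slide can be sketched with the Python standard library: sending SIGHUP to the Prometheus process, or POSTing to /-/reload (which only works when Prometheus was started with --web.enable-lifecycle). The default URL and the PID are placeholders; the HTTP request is built here but not sent.

```python
import os
import signal
import urllib.request


def reload_via_sighup(pid: int) -> None:
    """Equivalent to `kill -1 <pid>` (Unix only)."""
    os.kill(pid, signal.SIGHUP)


def build_reload_request(base_url: str = "http://localhost:9090") -> urllib.request.Request:
    """POST /-/reload; requires Prometheus to run with --web.enable-lifecycle."""
    return urllib.request.Request(base_url.rstrip("/") + "/-/reload", data=b"", method="POST")


req = build_reload_request()
print(req.get_method(), req.full_url)   # prints: POST http://localhost:9090/-/reload
# urllib.request.urlopen(req)           # uncomment to trigger the reload over HTTP
# reload_via_sighup(1234)               # PID 1234, as in the kill examples
```

The HTTP route is convenient from other machines or containers, while SIGHUP needs access to the process itself but no extra flag.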
Prometheus – Exporter
• Exporters bridge the gap between Prometheus and systems that don't export metrics
in the Prometheus format.
• Official and externally contributed exporters are available, e.g. for MySQL, OracleDB,
Dell/IBM hardware, Jira, Hadoop storage, Apache HTTP, AWS APIs, Docker, SNMP, etc.
https://prometheus.io/docs/instrumenting/exporters/
• Build your own exporter, e.g. for:
- Whether an important cron job succeeded
- Any new error in the TimesTen DB error.log
- Online store: total successful vs. failed orders
- Order data metrics for dashboard integration
- Whether an important file was received/processed
- Top-selling products/categories
- 5-star to 1-star review analysis
- etc.
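A custom exporter for the online-store case above could look roughly like this minimal standard-library sketch: it renders order counters in the Prometheus text exposition format and serves them on /metrics. The metric name shop_orders_total, its status label, port 8000, and the in-memory counts are all hypothetical; for real exporters the official prometheus_client library is the usual choice.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict

# Hypothetical in-memory counts; a real exporter would read them from the order database.
ORDER_COUNTS: Dict[str, int] = {"success": 120, "failure": 3}


def render_metrics(counts: Dict[str, int]) -> str:
    """Render counters in the Prometheus text exposition format."""
    lines = [
        "# HELP shop_orders_total Total orders by outcome.",
        "# TYPE shop_orders_total counter",
    ]
    for status, value in sorted(counts.items()):
        lines.append(f'shop_orders_total{{status="{status}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(ORDER_COUNTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Blocks forever; Prometheus would then scrape http://<host>:<port>/metrics."""
    HTTPServer(("", port), MetricsHandler).serve_forever()


print(render_metrics(ORDER_COUNTS), end="")
# serve()   # uncomment to start the exporter
```

Exposing success and failure as one metric with a status label (rather than two metrics) lets PromQL compute ratios such as failures over all orders with a single selector.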
Node Exporter - Monitors hardware and OS metrics
PromQL - Prometheus Query Language
• Prometheus provides a functional query
language.
• It lets the user select and aggregate time series data
in real time. The result of an expression can either
be shown as a graph, viewed as tabular data in
Prometheus's expression browser, or consumed
by external systems via the HTTP API.
• The Prometheus query language allows you to
slice and dice the dimensional data for ad-hoc
exploration, graphing, and alerting.
Time Series Selectors
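• Instant vector: a set of time series containing a single sample for each series, all sharing the same timestamp. In the simplest
form, only a metric name is specified, e.g. node_load1.
• Range vector: a set of time series containing a range of samples over time for each series. A
range duration is appended in square brackets ([]) at the end of a
vector selector, e.g. node_load1[5m].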
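The difference between the two selector types can be sketched over an in-memory list of samples for one series: an instant selector returns the most recent sample at or before the evaluation time (Prometheus looks back up to 5 minutes by default), while a range selector returns every sample inside the window. This is a simplified illustration, not the PromQL engine; it uses a left-open window (at - window, at], and the node_load1 data is hypothetical.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]   # (timestamp in seconds, value)
LOOKBACK = 300.0               # default lookback: 5 minutes


def instant(samples: List[Sample], at: float) -> Optional[Sample]:
    """Latest sample at or before `at`, if it lies within the lookback window."""
    candidates = [s for s in samples if at - LOOKBACK <= s[0] <= at]
    return max(candidates, default=None)


def range_select(samples: List[Sample], at: float, window: float) -> List[Sample]:
    """All samples in (at - window, at]; node_load1[5m] corresponds to window=300."""
    return [s for s in samples if at - window < s[0] <= at]


# node_load1 scraped every 60 s (hypothetical values 0.0, 1.0, ..., 9.0)
load = [(t, float(i)) for i, t in enumerate(range(0, 600, 60))]
print(instant(load, 570))                   # prints (540, 9.0): one value
print(len(range_select(load, 570, 300)))    # prints 5: several values
```

This is also why functions such as rate() take a range vector: they need several samples per series, whereas an instant vector supplies only one.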
Metric types
• Counter: A counter is a cumulative metric that
represents a single monotonically increasing counter
whose value can only increase or be reset to zero on
restart. For example, you can use a counter to represent
the number of requests served, tasks completed, or
errors.
• Gauge: A gauge is a metric that represents a single
numerical value that can arbitrarily go up and down, e.g.
temperatures or current memory usage.
• Histogram: A histogram samples observations (usually
things like request durations or response sizes) and
counts them in configurable buckets.
• Summary: Similar to a histogram, a summary samples
observations (usually things like request durations and
response sizes), but it calculates configurable quantiles
over a sliding time window.
https://povilasv.me/prometheus-tracking-request-duration/
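How a histogram counts observations into configurable buckets can be sketched as follows: each bucket counts observations less than or equal to its upper bound le (so exposed buckets are cumulative, ending with +Inf), and the histogram also tracks _sum and _count. The bucket bounds and observations are hypothetical; this is a sketch of the concept, not a client library.

```python
import bisect
import math
from typing import List


class Histogram:
    """Cumulative-bucket histogram, rendered like Prometheus' text format."""

    def __init__(self, bounds: List[float]) -> None:
        self.bounds = sorted(bounds) + [math.inf]     # implicit +Inf bucket
        self.counts = [0] * len(self.bounds)          # per-bucket (non-cumulative) counts
        self.sum = 0.0
        self.count = 0

    def observe(self, value: float) -> None:
        # First bucket whose upper bound is >= value (le = "less than or equal").
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.sum += value
        self.count += 1

    def exposition(self, name: str) -> List[str]:
        lines, running = [], 0
        for bound, n in zip(self.bounds, self.counts):
            running += n                               # exposed buckets are cumulative
            le = "+Inf" if math.isinf(bound) else repr(bound)
            lines.append(f'{name}_bucket{{le="{le}"}} {running}')
        lines.append(f"{name}_sum {self.sum}")
        lines.append(f"{name}_count {self.count}")
        return lines


h = Histogram([0.1, 0.5, 1.0])
for seconds in (0.05, 0.1, 0.3, 0.7, 2.0):   # hypothetical request durations
    h.observe(seconds)
print("\n".join(h.exposition("http_request_duration_seconds")))
```

Because buckets are plain counters, histograms from many instances can be summed in PromQL before estimating quantiles, something a summary's precomputed quantiles cannot offer.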
Operators
• Binary comparison operators:
==, !=, >, <, >=, <=
• Binary arithmetic operators:
+, -, *, /, % (modulo), ^ (power/exponentiation)
• Logical/set binary operators:
and (intersection), or (union), unless (complement)
• Built-in aggregation operators:
sum, min, max, avg, stddev, stdvar, count, count_values, bottomk, topk, quantile
- These operators can either aggregate over all label dimensions or preserve
distinct dimensions using the
by or without clause.
https://blog.pvincent.io/2017/12/prometheus-blog-series-part-2-metric-types/
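The by and without clauses above differ only in which labels survive into the output series: by keeps the listed labels, without keeps everything except them. A simplified sketch of sum by (...) and sum without (...) over hypothetical node_filesystem_size_bytes series (the metric name is left out of the label sets, as PromQL drops it when aggregating):

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Tuple

Labels = FrozenSet[Tuple[str, str]]


def aggregate_sum(series: Dict[Labels, float], labels: Iterable[str],
                  without: bool = False) -> Dict[Labels, float]:
    """sum by(labels) when without=False, sum without(labels) when without=True."""
    chosen = set(labels)
    out: Dict[Labels, float] = defaultdict(float)
    for key, value in series.items():
        if without:
            kept = frozenset((k, v) for k, v in key if k not in chosen)
        else:
            kept = frozenset((k, v) for k, v in key if k in chosen)
        out[kept] += value
    return dict(out)


def lbl(**kw: str) -> Labels:
    """Helper to build a label set from keyword arguments."""
    return frozenset(kw.items())


fs = {
    lbl(instance="a", device="sda1", mountpoint="/"): 100.0,
    lbl(instance="a", device="sdb1", mountpoint="/data"): 400.0,
    lbl(instance="b", device="sda1", mountpoint="/"): 250.0,
}
print(aggregate_sum(fs, ["instance"]))                            # sum by(instance)
print(aggregate_sum(fs, ["device", "mountpoint"], without=True))  # same grouping here
```

With these three labels both forms give the same result; they diverge when series carry extra labels (for example job or fstype), which without keeps and by drops.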
Basic Functions
• PromQL has 46 functions, and growing.
• Most common mathematical functions, plus
date/time functions such as day_of_month(), month(), year(), minute(), hour(), and time(), are
available.
• In practice, the most commonly used
are:
- rate()
- irate(): should only be used when graphing
volatile, fast-moving counters.
- increase()
- label_join()/label_replace()
- <aggregation>_over_time():
min_over_time
max_over_time
avg_over_time
sum_over_time
count_over_time
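Why rate() and increase() are preferred over subtracting raw counter values: counters reset to zero on restart, so these functions treat any decrease as a reset. The sketch below computes a simplified increase and per-second rate over one range; unlike real PromQL it does not extrapolate to the edges of the window, so its results can differ slightly. The counter samples are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]   # (timestamp in seconds, counter value)


def simple_increase(samples: List[Sample]) -> float:
    """Sum of deltas between consecutive samples, treating any drop as a reset to zero."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur   # after a reset, count from 0
    return total


def simple_rate(samples: List[Sample]) -> float:
    """Average per-second increase between the first and last sample."""
    if len(samples) < 2:
        raise ValueError("rate needs at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    return simple_increase(samples) / elapsed


# A request counter scraped every 15 s; the process restarted between t=30 and t=45.
reqs = [(0, 100.0), (15, 130.0), (30, 160.0), (45, 10.0), (60, 40.0)]
print(simple_increase(reqs))          # prints 100.0 (30 + 30 + 10 + 30)
print(round(simple_rate(reqs), 2))    # prints 1.67 requests per second
```

A naive last-minus-first subtraction would report 40 - 100 = -60 here, which is why counters should always go through rate() or increase().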
Wow! Functions
• delta()
• holt_winters()
• predict_linear()
• clamp_max()
• clamp_min()
• histogram_quantile()
Holt-Winters
https://www.otexts.org/fpp/7/5
New Relic Doc
• Averages have the big drawback
of hiding the distribution and preventing the discovery
of outliers/deviations.
• Quantiles are a better measurement for this kind
of metric, as they reveal the
distribution. For example, if the request latency
0.5-quantile (50th percentile) is 100ms, it
means that 50% of requests completed under
100ms. Similarly, if the 0.99-quantile (99th
percentile) is 4s, it means that 1% of requests
took more than 4s to respond.
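The quantile idea above is what histogram_quantile() estimates from histogram buckets: it finds the bucket containing the target rank and interpolates linearly inside it, assuming observations are spread evenly within each bucket. The sketch below is a simplified version that ignores several edge cases the real function handles; the bucket counts are hypothetical.

```python
import math
from typing import List, Tuple

Bucket = Tuple[float, float]   # (upper bound le, cumulative count), sorted by le


def quantile_from_buckets(q: float, buckets: List[Bucket]) -> float:
    """Estimate the q-quantile by linear interpolation within cumulative buckets."""
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                return lower_bound        # cannot interpolate into +Inf
            if count == lower_count:      # only possible when rank == 0
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return math.nan


# Latency buckets in seconds: 60 of 100 requests took <= 0.1 s, 90 took <= 0.5 s, etc.
latency = [(0.1, 60.0), (0.5, 90.0), (1.0, 99.0), (math.inf, 100.0)]
print(quantile_from_buckets(0.5, latency))    # about 0.083 s
print(quantile_from_buckets(0.99, latency))   # about 1.0 s
```

The estimate is only as precise as the bucket layout: any quantile landing in the 0.5 to 1.0 s bucket is interpolated, so bucket bounds should be placed near the latencies you care about.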
predict_linear()
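predict_linear(v[range], t) fits a simple least-squares linear regression to the samples in the range and extrapolates it t seconds ahead, which is how a query like predict_linear(node_load1[1h], 4*3600) forecasts four hours out. A minimal sketch with a hypothetical free-disk series (a common use, e.g. for "disk full soon" alerts); the real function also handles edge cases not covered here.

```python
from typing import List, Tuple

Sample = Tuple[float, float]   # (timestamp in seconds, value)


def predict_linear(samples: List[Sample], at: float, seconds_ahead: float) -> float:
    """Least-squares line through the samples, evaluated at `at + seconds_ahead`."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = cov / var
    intercept = mean_v - slope * mean_t
    return intercept + slope * (at + seconds_ahead)


# Free disk space (GB) falling 1 GB every 10 minutes over the last hour.
disk_free = [(i * 600.0, 50.0 - i) for i in range(7)]
print(predict_linear(disk_free, at=3600.0, seconds_ahead=4 * 3600))   # about 20.0
```

Least squares smooths over noisy samples, so a single spike within the range shifts the forecast only a little.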
Demo Queries
• max by(instance)(node_filesystem_size_bytes)
• max without(device, fstype, mountpoint)(node_filesystem_size_bytes)
• sum without(device, fstype, mountpoint)(node_filesystem_size_bytes)
• sum(node_filesystem_size_bytes)
• round(sum(node_filesystem_size_bytes)/1024/1024/1024)
• round(sum by(instance, device)(node_filesystem_size_bytes)/1024/1024/1024)
• deriv(node_load1[5m])
• rate(node_cpu_seconds_total{mode="system"}[5m])
• min_over_time(node_load1[5m])
• max_over_time(node_load1[5m])
• avg_over_time(node_load1[5m])
• sum_over_time(node_load1[5m])
• count_over_time(node_load1[5m])
• delta(node_hwmon_temp_celsius[1h])
• clamp_max(node_load1,1.2)
• clamp_min(clamp_max(node_load1,1.2),1.05)
• predict_linear(node_load1[1h],4*3600)
• quantile without(cpu)(0.9, rate(node_cpu_seconds_total{mode="system"}[5m]))
• topk(3, sum by (mode) (node_cpu_seconds_total))
• bottomk(3, sum by (le) (alertmanager_http_request_duration_seconds_bucket))
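The demo queries above can also be consumed by external systems through the HTTP API. A standard-library sketch that builds an instant-query URL for /api/v1/query and extracts values from a response; the server address is a placeholder, and the sample response is hand-written in the API's JSON shape, not captured from a real server.

```python
import json
from typing import Dict, List, Tuple
from urllib.parse import urlencode


def query_url(base: str, promql: str) -> str:
    """Instant-query URL; the PromQL expression must be URL-encoded."""
    return f"{base.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(body: str) -> List[Tuple[Dict[str, str], float]]:
    """Extract (labels, value) pairs from an instant-vector query response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    # Each result's "value" is [unix_timestamp, "value as a string"].
    return [(r["metric"], float(r["value"][1])) for r in payload["data"]["result"]]


url = query_url("http://localhost:9090", "max by(instance)(node_filesystem_size_bytes)")
print(url)
# Against a live server: import urllib.request; body = urllib.request.urlopen(url).read().decode()

example = ('{"status":"success","data":{"resultType":"vector","result":'
           '[{"metric":{"instance":"a:9100"},"value":[1700000000.0,"1073741824"]}]}}')
print(parse_vector(example))   # prints [({'instance': 'a:9100'}, 1073741824.0)]
```

Values arrive as strings so that special values such as NaN and +Inf survive JSON encoding, which is why the sketch converts them with float().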
Grafana – Demo
• Download and install Grafana as described at https://grafana.com/grafana/download/beta
• After installation, use the commands below to start the server, check its status, or stop it. Other methods
exist; see the installation guide for details.
gmv-evo@gmvevo:~/Downloads$ sudo systemctl start grafana-server
gmv-evo@gmvevo:~/Downloads$ sudo systemctl status grafana-server
gmv-evo@gmvevo:~/Downloads$ sudo systemctl stop grafana-server
• Open http://localhost:3000 and complete the login setup.
• Configure Prometheus as a data source and import the Node Exporter dashboard:
https://grafana.com/grafana/dashboards/1860
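Adding Prometheus as a Grafana data source can also be scripted. A hedged sketch against Grafana's HTTP API (POST /api/datasources), assuming the default admin/admin credentials of a fresh local install and Prometheus on localhost:9090; check the Grafana HTTP API documentation for your version before relying on these field names. The request is built, not sent.

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana: str = "http://localhost:3000",
                             prometheus: str = "http://localhost:9090",
                             user: str = "admin", password: str = "admin") -> urllib.request.Request:
    """Build a POST /api/datasources request registering Prometheus in Grafana."""
    payload = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus,
        "access": "proxy",       # the Grafana backend queries Prometheus for the browser
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{grafana.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )


req = build_datasource_request()
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req)   # send it once Grafana is running
```

Scripting the data source makes a demo setup reproducible; the dashboard import can then be done through the UI as described above.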
Wow! What a Grafana Dashboard Does for Us!
Out of Syllabus: Topics to Look Out For
• Remote Endpoints and Storage - long term storage
• Alertmanager - Webhook Receiver (Gmail, etc)
• Prometheus Concerns - fixed by Cortex and Thanos
https://grafana.com/blog/2019/11/21/promcon-recap-two-households-both-alike-in-dignity-cortex-and-thanos/
• Prometheus open bugs and fixes:
https://github.com/prometheus/prometheus/issues?
• Cloud Monitoring : Nagios vs. Prometheus
• Google's mtail - Extract Prometheus metrics from application logs.
• Prometheus is a system to collect and process metrics, not an event
logging system; for event logs, the ELK stack is the answer.
Study Material: Free & Paid
Free
• https://prometheus.io/docs/introduction/overview/
• https://promcon.io/2019-munich/stream/
• Prometheus Monitoring : The Definitive Guide in 2019
• subreddit collecting all Prometheus-related resources on the internet.
• https://training.robustperception.io/ - Introduction to Prometheus
• SoundCloud - What makes Prometheus a “next generation” monitoring system?
Cost
• Understanding PromQL by Robust Perception
• Prometheus: Up & Running by O'Reilly
Thanks for Listening!!!
Be happy and make others happy. How? As given by my aasan:
Go below what you have # Dream above what you have # First love what you have
Spread info what you have # Get info what others have # Help as per what you have


Prometheus - Intro, CNCF, TSDB,PromQL,Grafana

  • 1. a open-source monitoring solution. Prometheus - Monitoring system & time series database
  • 2. Takeaways: • What is Prometheus? • Difference Between Nagios vs Prometheus • PromQL (Prometheus Query Language) • Time series DB • Grafana • Live Demo
  • 3. What is Prometheus? • Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. • Inspired by Google’s Borgmon Monitoring System • Written in Go .. Go, also known as Golang.. Go is syntactically similar to C. Go is widely used in production at Google and in many other organizations and open-source projects. • It is now a standalone open source project and maintained independently of any company. To emphasize this, and to clarify the project's governance structure, Prometheus joined the CNCF in 2016 as the second hosted project, after Kubernetes. • The core Prometheus server is a single binary, with no dependencies like Zookeeper, Consul, Cassandra, Hadoop or the internet. All it needs is local disk, preferably an SSD. • It is a systems and service monitoring system. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts if some condition is observed to be true. https://appinventiv.com/blog/mini-guide-to-go-programming-language/
  • 4. ABOUT • The Linux Foundation is the parent. • OpenSource cloud computing for applications. Not to confuse with OpenStack which is for infrastructure. • Netflix pioneered the concept of cloud native as a practical tool • Cloud native is a term used to describe container- based environments. Cloud native technologies are used to develop applications built with services packaged in containers, deployed as microservices and managed on elastic infrastructure through agile DevOps processes and continuous delivery workflows. • August 9, 2018 - CNCF Announces Prometheus Graduation. https://www.cncf.io/webinars/what-is-cloud-native-and-why-does-it-exist/
  • 5. Why Prometheus?  Multi-Dimensional Data Model – Ex: instance, service, endpoint, and method.  Operational Simplicity  Scalable data Collection  Powerful query Language. All of these features existed in various systems. However, Prometheus combined them all.
  • 6. Nagios – an Overview • The Industry Standard In IT Infrastructure Monitoring • First launched in 1999.Nagios is officially sponsored by Nagios Enterprises. • Nagios Core, is a free and open-source computer-software application that monitors systems, networks and infrastructure. Nagios offers monitoring and alerting services for servers, switches, applications and services. It alerts users when things go wrong and alerts them a second time when the problem has been resolved. • NDOUTILS -The NDOUTILS addon is designed to store all configuration and event data from Nagios in a database. It requires a MariaDB or MySQL database for storing Nagios Core data . • RRDtool and Highcharts are included to create customizable graphs that can be displayed in dashboards. • (Nagios Core vs Nagios XI) Nagios Core is open source whereas Nagios XI is a commercial, enterprise version of Nagios. • Historical performance data that is used to generate graphs are stored in Round Robin Database (RRD) files. • Rrdcached - On a Nagios XI server, rrdcached collects host and service performance data and then flushes it to the appropriate rrd files at a specified interval. This reduces the amount of disk activity needed to keep a large number of rrd files current for performance graphs.
  • 7. Nagios vs Prometheus • Nagios is primarily about alerting based on the exit codes of scripts. • Nagios is host-based: each host can have one or more services, and each service can perform one check. • There is no notion of labels or a query language. • Nagios has no storage per se, beyond the current check state. There are plugins that can store data, for example for visualisation. • Nagios XI, using Grafana with existing performance data: Grafana uses the existing performance data files (RRD) to generate the graphs. • Overall, Nagios is suitable for basic monitoring of small and/or static systems where blackbox probing is sufficient. If you want to do whitebox monitoring, or have a dynamic or cloud-based environment, then Prometheus is a good choice.
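To make "alerting based on the exit codes of scripts" concrete, here is a minimal sketch of a Nagios-style check in Python. The exit codes 0/1/2/3 (OK, WARNING, CRITICAL, UNKNOWN) and the `| perfdata` suffix follow the usual Nagios plugin convention; the disk check itself, its thresholds and the function names are illustrative assumptions.

```python
import shutil

# Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(used_pct, warn, crit):
    """Map one measurement to a Nagios state and a status line."""
    if used_pct >= crit:
        return CRITICAL, f"DISK CRITICAL - {used_pct:.1f}% used"
    if used_pct >= warn:
        return WARNING, f"DISK WARNING - {used_pct:.1f}% used"
    return OK, f"DISK OK - {used_pct:.1f}% used"


def check_disk(path="/", warn=80.0, crit=90.0):
    """Print one status line and return the exit code Nagios would act on."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    used_pct = 100.0 * usage.used / usage.total
    code, message = evaluate(used_pct, warn, crit)
    # Text after "|" is performance data, which add-ons can store for graphs
    print(f"{message} | used_pct={used_pct:.1f}%;{warn};{crit}")
    return code


# As a real plugin, the script would end with: raise SystemExit(check_disk())
```

Nagios itself only keeps the resulting state; the measured number survives only in the perfdata string, which is why graphing needs add-ons.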
  • 8. Cacti • Should we cry or laugh?
  • 9. Prometheus at Canonical • Ref: https://prometheus.io/blog/2016/11/16/interview-with-canonical/
  • 11. Architecture - Explanation • Prometheus scrapes metrics from instrumented jobs, either directly or via an intermediary push gateway for short-lived jobs. It stores all scraped samples locally and runs rules over this data to either aggregate and record new time series from existing data or generate alerts. • Pulling is slightly better than pushing (for example, it is easier to tell when a target is down), though this is not a major point. • For cases where you must push, Prometheus offers the Pushgateway, since occasionally you will need to monitor components that cannot be scraped. The Pushgateway allows short-lived, service-level batch jobs to push their time series to an intermediary job that Prometheus can scrape. • Limitation: Prometheus is not a good choice when you need 100% accuracy, such as for per-request billing, as the collected data will likely not be detailed and complete enough. • Grafana or other API consumers can be used to visualize the collected data.
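The pull model can be sketched as: fetch a target's metrics text, parse it, and stamp every sample with the scrape time. This Python sketch is an illustration, not Prometheus's implementation: it handles only simple `name{label="value"} number` lines (no escaping, no per-sample timestamps), and `fetch` is a hypothetical callable standing in for the HTTP GET to the target's /metrics endpoint. The synthetic `up` series (1 on success, 0 on failure) mirrors what Prometheus records per target, which is one reason pulling makes it easy to tell when a target is down.

```python
import time


def parse_exposition(text):
    """Parse simple text-format lines into (name, labels, value) tuples.

    Simplified: skips comments, and assumes no escaped quotes, no commas or
    spaces inside label values, and no per-sample timestamps.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or # HELP / # TYPE metadata
        series, value = line.rsplit(" ", 1)
        name, labels = series, {}
        if "{" in series:
            name, label_str = series.split("{", 1)
            for pair in label_str.rstrip("}").split(","):
                if pair:
                    key, val = pair.split("=", 1)
                    labels[key.strip()] = val.strip().strip('"')
        samples.append((name, labels, float(value)))
    return samples


def scrape_once(fetch, job, instance, now=None):
    """Pull one target and return (name, labels, timestamp, value) samples."""
    ts = time.time() if now is None else now
    target = {"job": job, "instance": instance}
    try:
        text = fetch()  # stands in for an HTTP GET of the target's /metrics
    except OSError:
        return [("up", target, ts, 0.0)]  # target unreachable
    scraped = [
        (name, {**labels, **target}, ts, value)  # attach the target's labels
        for name, labels, value in parse_exposition(text)
    ]
    scraped.append(("up", target, ts, 1.0))
    return scraped
```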
  • 12. Alertmanager • Grouping combines alerts of a similar nature into a single notification. This is useful during larger outages, when many systems fail at once and hundreds to thousands of alerts may be firing simultaneously. • Inhibition suppresses notifications for certain alerts if certain other alerts are already firing. • Silences are a straightforward way to simply mute alerts for a given time. • The following external systems are supported: Email, generic webhooks, HipChat, OpsGenie, PagerDuty, Pushover, Slack. • To make Prometheus highly available, run identical Prometheus servers on two or more separate machines; identical alerts will be deduplicated by the Alertmanager.
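Grouping, inhibition and silences can be illustrated with plain dictionaries. A minimal sketch under stated assumptions: alerts are hypothetical `{"labels": {...}}` dicts and matchers are simple equality checks. The real Alertmanager is configured through its YAML file (a `route` with `group_by`, `inhibit_rules` with an `equal` list) and its silence API, with richer matching than shown here.

```python
from collections import defaultdict


def matches(labels, matcher):
    """True if every key/value in the matcher equals the alert's label."""
    return all(labels.get(k) == v for k, v in matcher.items())


def group_alerts(alerts, group_by):
    """Bucket alerts by their group_by label values: one notification per bucket."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


def inhibit(alerts, source, target, equal):
    """Drop target alerts while a source alert fires with identical `equal` labels."""
    firing_sources = [a["labels"] for a in alerts if matches(a["labels"], source)]
    kept = []
    for alert in alerts:
        labels = alert["labels"]
        muted = matches(labels, target) and any(
            all(src.get(k) == labels.get(k) for k in equal) for src in firing_sources
        )
        if not muted:
            kept.append(alert)
    return kept


def apply_silences(alerts, silences, now):
    """Mute alerts matched by any silence whose time window contains `now`."""
    return [
        a for a in alerts
        if not any(
            s["starts_at"] <= now < s["ends_at"] and matches(a["labels"], s["matchers"])
            for s in silences
        )
    ]
```

For example, with `source={"severity": "critical"}`, `target={"severity": "warning"}` and `equal=["cluster"]`, a critical NodeDown alert in one cluster suppresses that cluster's warnings but not those of other clusters.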
  • 13. Time Series Database (TSDB) • What is a time series? The value of something tracked over time. • A series is identified by its labels (key/value pairs): identifier -> (t0, v0), (t1, v1), (t2, v2), (t3, v3), .... Each data point is a tuple of a timestamp and a value. For the purpose of monitoring, the timestamp is an integer and the value any number. Example: this could be temperature once a day, or requests to your API once a minute. The latter could look like: my_api_requests: 5@1:00PM 2@1:01PM 18@1:02PM • The data model is fundamentally the same as OpenTSDB's. • Prometheus includes a local on-disk time series database, but also optionally integrates with remote storage systems. • Ingested samples are grouped into blocks of two hours. Each two-hour block consists of a directory containing one or more chunk files that contain all time series samples for that window of time, as well as a metadata file and an index file (which indexes metric names and labels to time series in the chunk files). When series are deleted via the API, deletion records are stored in separate tombstone files (instead of deleting the data immediately from the chunk files). • A limitation of the local storage is that it is not clustered or replicated. Hence, using RAID for disk availability, snapshots for backups, capacity planning, etc. is recommended for improved durability. Alternatively, external storage may be used via the remote read/write APIs.
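The block layout and tombstones can be sketched as a toy in-memory model. Assumptions: timestamps are in milliseconds (as in Prometheus), windows are aligned to two-hour boundaries, the `job` label is illustrative, and `cut_blocks`, `delete_series` and `read` are hypothetical helpers. The real TSDB writes compressed chunk files, an index and a metadata file per block directory, which this sketch does not attempt.

```python
from collections import defaultdict

BLOCK_MS = 2 * 60 * 60 * 1000  # two hours, in milliseconds


def series_id(name, labels):
    """Identify a series by its metric name plus its sorted label pairs."""
    return (name,) + tuple(sorted(labels.items()))


def cut_blocks(samples):
    """Group (series, timestamp_ms, value) samples into two-hour blocks.

    Returns {block_start_ms: {series: [(timestamp_ms, value), ...]}}.
    """
    blocks = defaultdict(lambda: defaultdict(list))
    for sid, t, v in samples:
        start = t - t % BLOCK_MS  # align to a two-hour boundary
        blocks[start][sid].append((t, v))
    return blocks


def delete_series(tombstones, block_start, sid, t_min, t_max):
    """Record a deletion interval instead of rewriting the block's data."""
    tombstones.setdefault((block_start, sid), []).append((t_min, t_max))


def read(blocks, tombstones, block_start, sid):
    """Return a series' samples from one block, skipping tombstoned intervals."""
    deleted = tombstones.get((block_start, sid), [])
    return [
        (t, v) for t, v in blocks.get(block_start, {}).get(sid, [])
        if not any(lo <= t <= hi for lo, hi in deleted)
    ]
```

Deleting only adds a tombstone entry; the stored samples stay untouched until a later compaction would rewrite the block.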
  • 14. TSDB Configuration • Prometheus has several flags that allow configuring the local storage. The most important ones are: --storage.tsdb.path: where Prometheus writes its database. Defaults to data/. --storage.tsdb.retention.time: when to remove old data. Defaults to 15d. --storage.tsdb.retention.size: the maximum number of bytes that storage blocks can use; the oldest data is removed first. Defaults to 0 (disabled). --storage.tsdb.wal-compression: enables compression of the write-ahead log (WAL). Depending on your data, you can expect the WAL size to be halved with little extra CPU load. • TSDB storage layout (figure)
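How the two retention flags interact can be sketched as follows. Assumptions: `Block` and `apply_retention` are hypothetical, the cutoff is measured from a caller-supplied `now_ms`, and whole blocks are dropped; the real TSDB's reference point and size accounting may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class Block:
    min_time: int    # oldest sample in the block, ms
    max_time: int    # newest sample in the block, ms
    size_bytes: int


def apply_retention(blocks, now_ms, retention_ms, max_bytes=0):
    """Return the blocks that survive time- and size-based retention, oldest first.

    max_bytes=0 disables size-based retention, like the flag's default.
    """
    cutoff = now_ms - retention_ms
    # Time-based: drop whole blocks whose newest sample is older than the cutoff
    kept = sorted((b for b in blocks if b.max_time >= cutoff), key=lambda b: b.min_time)
    if max_bytes > 0:
        # Size-based: remove the oldest blocks first until the total fits
        while kept and sum(b.size_bytes for b in kept) > max_bytes:
            kept.pop(0)
    return kept
```

With the default 15d time retention and size retention disabled, only age matters; setting a byte limit additionally trims from the oldest end.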
  • 15. Prometheus - Demo Free Online Demo: http://demo.robustperception.io:9090/graph
  • 16. • Prometheus means "forethinker". • Prometheus is a Titan. (In everyday English, a titan is an extremely important person: Albert Einstein was a titan in the world of science.) • A trickster figure, he was a champion of mankind known for his wily intelligence, who stole fire from Zeus and the gods and gave it to mortals. • Prometheus is also a 2012 science fiction film about a spaceship. Are you a Titan, or just wearing a Titan watch?
  • 17. Let’s Start - Prometheus • Prerequisite: configure prometheus.yml (e.g. scrape interval, target servers to be monitored, Alertmanager configuration). • The config file is written in YAML format. Prometheus can reload its configuration at runtime. A configuration reload is triggered by sending a SIGHUP to the Prometheus process or by sending an HTTP POST request to the /-/reload endpoint (when the --web.enable-lifecycle flag is enabled). • The kill command can send any signal to a process; however, a process only responds if it is programmed to recognize that signal. There are 64 signals (list them with kill -l); particularly useful ones include: ◦ SIGHUP (1) - Hangup detected on controlling terminal or death of controlling process. ◦ SIGKILL (9) - Kill signal, i.e. kill the running process. ◦ SIGSTOP (19) - Stop process. ◦ SIGCONT (18) - Continue process if stopped. • To send a SIGKILL to PID 1234 use: kill -9 1234 • To send a SIGHUP to PID 1234 use: kill -1 1234
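A process only reacts to SIGHUP if it is programmed to. The Python sketch below shows a toy long-running process that reloads its configuration on SIGHUP (POSIX only), plus a helper that builds the HTTP POST to /-/reload, which Prometheus accepts only when started with --web.enable-lifecycle. `Server`, `load_config` and the default URL are illustrative assumptions.

```python
import signal
import urllib.request


class Server:
    """Toy long-running process that re-reads its config on SIGHUP (POSIX only)."""

    def __init__(self, load_config):
        self.load_config = load_config  # hypothetical callable returning the config
        self.config = load_config()
        self.reloads = 0
        # Without a handler, SIGHUP's default action terminates the process
        signal.signal(signal.SIGHUP, self._on_sighup)

    def _on_sighup(self, signum, frame):
        self.config = self.load_config()
        self.reloads += 1


def reload_request(base_url="http://localhost:9090"):
    """Build the POST to /-/reload (Prometheus must run with --web.enable-lifecycle)."""
    return urllib.request.Request(base_url.rstrip("/") + "/-/reload", method="POST")
```

Against a real Prometheus, `kill -HUP <pid>` from a shell or `urllib.request.urlopen(reload_request())` would trigger the same kind of reload.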
  • 18. Prometheus – Exporter • Exporters bridge the gap between Prometheus and systems that don’t export metrics in the Prometheus format. • There are official and externally contributed exporters available, e.g. for MySQL, OracleDB, Dell/IBM hardware, Jira, Hadoop storage, Apache HTTP, AWS APIs, Docker, SNMP, etc. https://prometheus.io/docs/instrumenting/exporters/ • Build your own exporter, for example for: ◦ whether an important cron job succeeded ◦ any new error in the TimesTen DB error.log ◦ an online selling website: total order success vs failure ◦ order data metrics for dashboard integration ◦ whether an important file was received/processed ◦ top-selling product/category ◦ 5-star to 1-star review metric analysis ◦ etc.
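As a sketch of the "total order success vs failure" idea, here is a tiny custom exporter using only the Python standard library. The metric name `shop_orders_total`, the `status` label and port 8000 are hypothetical; the text format (`# HELP`, `# TYPE`, `name{label="value"} value`) and the `text/plain; version=0.0.4` content type follow Prometheus's text exposition format. In practice the official `prometheus_client` library handles this for you.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class OrderMetrics:
    """Hypothetical order counters for an online shop."""

    def __init__(self):
        self._lock = threading.Lock()  # the HTTP server may read while orders are recorded
        self.orders = {"success": 0, "failure": 0}

    def record(self, status):
        with self._lock:
            self.orders[status] += 1

    def render(self):
        """Render the counters in the Prometheus text exposition format."""
        lines = [
            "# HELP shop_orders_total Orders processed, by outcome.",
            "# TYPE shop_orders_total counter",
        ]
        with self._lock:
            for status, count in sorted(self.orders.items()):
                lines.append(f'shop_orders_total{{status="{status}"}} {count}')
        return "\n".join(lines) + "\n"


METRICS = OrderMetrics()


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = METRICS.render().encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


# To serve for Prometheus to scrape (port 8000 is an arbitrary choice):
# HTTPServer(("", 8000), MetricsHandler).serve_forever()
```

A PromQL query such as `rate(shop_orders_total{status="failure"}[5m])` could then chart the failure rate in Grafana.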
  • 19. Node Exporter – Monitors hardware and OS metrics
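To show where host metrics like `node_load1` come from, here is a minimal sketch that turns a Linux /proc/loadavg line into gauges named like Node Exporter's. It is an illustration only: Node Exporter is written in Go and has many collectors, and `parse_loadavg`, `read_loadavg` and `render_gauges` are hypothetical helpers.

```python
def parse_loadavg(text):
    """Parse the first three fields of /proc/loadavg (1, 5, 15 minute load averages)."""
    one, five, fifteen = (float(field) for field in text.split()[:3])
    return {"node_load1": one, "node_load5": five, "node_load15": fifteen}


def read_loadavg(path="/proc/loadavg"):
    """Read the live values (Linux only)."""
    with open(path) as f:
        return parse_loadavg(f.read())


def render_gauges(gauges):
    """Render the values in the Prometheus text exposition format."""
    lines = []
    for name, value in gauges.items():
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


print(render_gauges(parse_loadavg("0.52 0.58 0.59 1/467 12345")))
```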
  • 20. PromQL - Prometheus Query Language • Prometheus provides a functional query language. • It lets the user select and aggregate time series data in real time. The result of an expression can either be shown as a graph, viewed as tabular data in Prometheus's expression browser, or consumed by external systems via the HTTP API. • The Prometheus query language allows you to slice and dice the dimensional data for ad-hoc exploration, graphing, and alerting.
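"Consumed by external systems via the HTTP API" looks roughly like this: an instant query is a GET to /api/v1/query, and the JSON reply carries each series' labels and a `[timestamp, "value"]` pair. A minimal Python sketch; the base URL is an assumption and error handling is deliberately thin.

```python
import json
import urllib.parse
import urllib.request


def query_url(base_url, expr, at=None):
    """Build an instant-query URL; `at` is an optional Unix timestamp (the `time` parameter)."""
    params = {"query": expr}
    if at is not None:
        params["time"] = str(at)
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode(params)


def parse_vector(body):
    """Turn an instant-vector response into a list of (labels, float value)."""
    payload = json.loads(body)
    if payload["status"] != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected a vector result, got " + data["resultType"])
    # Each "value" is [unix_timestamp, "sample value as a string"]
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def instant_query(base_url, expr):
    """Run a query against a live server (assumed reachable at base_url)."""
    with urllib.request.urlopen(query_url(base_url, expr)) as resp:
        return parse_vector(resp.read())


# e.g. instant_query("http://localhost:9090", "sum by (instance) (node_filesystem_size_bytes)")
```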
  • 21. Time Series Selectors • Instant vector: exactly one sample value per time series, at the evaluation time. In the simplest form, only a metric name is specified, e.g. node_load1. • Range vector: any number of sample values per time series between two timestamps. A range duration is appended in square brackets ([]) at the end of a vector selector, e.g. node_load1[5m].
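The difference between the two selector types can be sketched over one series' `(timestamp_seconds, value)` samples. Assumptions: the 5-minute lookback used for instant selectors is Prometheus's default (`--query.lookback-delta`), the windows here are left-open, and edge cases (staleness markers, exact boundary handling across versions) are ignored.

```python
LOOKBACK_S = 300  # Prometheus's default lookback delta (5m) for instant selectors


def instant_select(samples, t, lookback=LOOKBACK_S):
    """Value of the newest sample in (t - lookback, t], or None if there is none."""
    recent = [(ts, v) for ts, v in samples if t - lookback < ts <= t]
    return max(recent)[1] if recent else None


def range_select(samples, t, duration):
    """Every sample in (t - duration, t], like appending [duration] to a selector."""
    return [(ts, v) for ts, v in samples if t - duration < ts <= t]


node_load1 = [(0, 1.0), (60, 2.0), (120, 3.0)]  # (unix seconds, value)
print(instant_select(node_load1, 130))     # 3.0
print(range_select(node_load1, 120, 120))  # [(60, 2.0), (120, 3.0)]
```

An instant selector yields one value per series, which is why `node_load1` can be graphed directly, while a range vector must be passed to a function such as `rate()` or `avg_over_time()` first.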
  • 22. Metric types • Counter: a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart. For example, you can use a counter to represent the number of requests served, tasks completed, or errors. • Gauge: a metric that represents a single numerical value that can arbitrarily go up and down, e.g. temperatures or current memory usage. • Histogram: samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values. • Summary: similar to a histogram, a summary samples observations (usually things like request durations and response sizes), but it calculates configurable quantiles over a sliding time window. https://povilasv.me/prometheus-tracking-request-duration/
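A minimal Python sketch of the semantics of three of the four types. It is an illustration, not a client library: the class names are hypothetical, the Summary (which computes quantiles client-side) is omitted, and the histogram follows Prometheus's convention of cumulative `le` buckets ending in `+Inf`, plus a running sum and count.

```python
import math


class Counter:
    """Only goes up; a process restart starts it again from zero."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


class Gauge:
    """Can be set, or moved up and down arbitrarily."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value

    def inc(self, amount=1.0):
        self.value += amount

    def dec(self, amount=1.0):
        self.value -= amount


class Histogram:
    """Cumulative buckets (observation <= upper bound), plus sum and count."""

    def __init__(self, upper_bounds):
        self.bounds = sorted(upper_bounds) + [math.inf]  # +Inf bucket holds every observation
        self.buckets = [0] * len(self.bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        for i, bound in enumerate(self.bounds):
            if value <= bound:
                self.buckets[i] += 1  # cumulative: one observation lands in several buckets
        self.sum += value
        self.count += 1
```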
  • 23. Operators • Binary comparison operators: ==, !=, >, <, >=, <= • Binary arithmetic operators: +, -, *, /, % (modulo), ^ (power/exponentiation) • Logical/set binary operators: and (intersection), or (union), unless (complement) • Built-in aggregation operators: sum, min, max, avg, stddev, stdvar, count, count_values, bottomk, topk, quantile. These operators can either aggregate over all label dimensions or preserve distinct dimensions using a by or without clause. https://blog.pvincent.io/2017/12/prometheus-blog-series-part-2-metric-types/
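The `by`/`without` behaviour of aggregation operators can be sketched over an instant vector of `(labels, value)` pairs. A simplified illustration: only a handful of operators are included, and `aggregate` is a hypothetical helper. As in PromQL, `without` drops the listed labels (and the metric name), `by` keeps only the listed ones, and no clause aggregates everything into one series.

```python
from collections import defaultdict

OPS = {
    "sum": sum,
    "min": min,
    "max": max,
    "count": len,
    "avg": lambda xs: sum(xs) / len(xs),
}


def aggregate(vector, op, by=None, without=None):
    """Aggregate [(labels, value)] like PromQL `op by (...)` or `op without (...)`.

    Returns {sorted label pairs of the output series: aggregated value}.
    """
    if by is None and without is None:
        by = ()  # no clause: collapse everything into a single series
    groups = defaultdict(list)
    for labels, value in vector:
        if by is not None:
            kept = {k: v for k, v in labels.items() if k in by}
        else:
            drop = set(without) | {"__name__"}  # the metric name is dropped too
            kept = {k: v for k, v in labels.items() if k not in drop}
        groups[tuple(sorted(kept.items()))].append(value)
    return {key: OPS[op](values) for key, values in groups.items()}
```

This mirrors why `max by(instance)` and `max without(device, fstype, mountpoint)` in the demo queries can return the same series: both end up grouping by the labels that remain.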
  • 24. Basic Functions • PromQL has 46 functions, and growing… • Most mathematical functions, as well as day_of_month, day_of_week, month, year, minute, hour and time, are available. • From a Prometheus perspective, we mostly use: ◦ rate() ◦ irate(): irate should only be used when graphing volatile, fast-moving counters. ◦ increase() ◦ label_join()/label_replace() ◦ <aggregation>_over_time(): min_over_time, max_over_time, avg_over_time, sum_over_time, count_over_time
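The counter-reset handling behind `rate()`, `increase()` and `irate()` can be sketched as below. These are simplified versions under stated assumptions: the real functions also extrapolate to the edges of the range window, so their results differ slightly from these, and the function names are hypothetical.

```python
def increase_simple(samples):
    """Sum of increases between consecutive samples; a drop is treated as a reset to 0."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def rate_simple(samples):
    """Average per-second increase between the first and last sample (no extrapolation)."""
    if len(samples) < 2:
        return None
    return increase_simple(samples) / (samples[-1][0] - samples[0][0])


def irate_simple(samples):
    """Per-second increase using only the last two samples."""
    if len(samples) < 2:
        return None
    (t1, v1), (t2, v2) = samples[-2:]
    delta = v2 - v1 if v2 >= v1 else v2  # reset between the two samples
    return delta / (t2 - t1)
```

Because `irate` looks only at the last two samples, it reacts quickly to spikes, which is why it suits graphs of volatile counters rather than alerts.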
  • 25. Wow! Functions • delta() • holt_winters() • predict_linear() • clamp_max() • clamp_min() • histogram_quantile() • Holt-Winters: https://www.otexts.org/fpp/7/5 • New Relic doc (figure) ◦ Averages unfortunately have the big drawback of hiding the distribution and preventing the discovery of outliers/deviations. ◦ Quantiles are a better measurement for this kind of metric, as they allow you to understand the distribution. For example, if the request latency 0.5-quantile (50th percentile) is 100ms, 50% of requests completed in under 100ms. Similarly, if the 0.99-quantile (99th percentile) is 4s, 1% of requests took longer than 4s. • predict_linear() (figure)
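Two of these functions are easy to sketch. `predict_linear` fits a least-squares line and extrapolates it; `histogram_quantile` finds the bucket containing the requested rank and interpolates linearly inside it. Simplified illustrations, assuming non-negative observations, at least two samples, `0 < q <= 1`, and cumulative `(upper_bound, count)` buckets that end with `+Inf`.

```python
import math


def predict_linear(samples, eval_time, horizon):
    """Fit v = a + b*t by least squares and evaluate it at eval_time + horizon."""
    if len(samples) < 2:
        return None
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = cov / var
    intercept = mean_v - slope * mean_t
    return intercept + slope * (eval_time + horizon)


def histogram_quantile(q, buckets):
    """Estimate quantile q (0 < q <= 1) from cumulative (upper_bound, count) buckets."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)  # +Inf sorts last
    total = buckets[-1][1]  # the +Inf bucket's count is the total
    if total == 0:
        return math.nan  # no observations
    rank = q * total
    lower, below = 0.0, 0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper):
                return lower  # rank in the +Inf bucket: report the highest finite bound
            # linear interpolation between this bucket's lower and upper bound
            return lower + (upper - lower) * (rank - below) / (count - below)
        lower, below = upper, count
    return math.nan
```

The interpolation is also why quantiles from histograms are estimates: the answer can only be as precise as the bucket boundaries you configured.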
  • 26. Demo Queries • max by(instance)(node_filesystem_size_bytes) • max without(device, fstype, mountpoint)(node_filesystem_size_bytes) • sum without(device, fstype, mountpoint)(node_filesystem_size_bytes) • sum(node_filesystem_size_bytes) • round(sum(node_filesystem_size_bytes)/1024/1024/1024) • round(sum by(instance, device)(node_filesystem_size_bytes)/1024/1024/1024) • rate(node_load1[5m]) • rate(node_cpu_seconds_total{mode="system"}[5m]) • min_over_time(node_load1[5m]) • max_over_time(node_load1[5m]) • avg_over_time(node_load1[5m]) • sum_over_time(node_load1[5m]) • count_over_time(node_load1[5m]) • delta(node_hwmon_temp_celsius[1h]) • clamp_max(node_load1,1.2) • clamp_min(clamp_max(node_load1,1.2),1.05) • predict_linear(node_load1[1h],4*3600) • quantile without(cpu)(0.9, rate(node_cpu_seconds_total{mode="system"}[5m])) • topk(3, sum by (mode) (node_cpu_seconds_total)) • bottomk(3, sum by (le) (alertmanager_http_request_duration_seconds_bucket))
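For checking demo results by hand, this sketch reproduces the arithmetic of a few queries above on a list of `(timestamp, value)` samples: the `*_over_time` family, `clamp_min`/`clamp_max`, and the bytes-to-GiB conversion. Helper names are hypothetical; note that PromQL's `round()` rounds ties up, unlike Python's built-in `round()`, so the sketch uses `math.floor(x + 0.5)`.

```python
import math

OVER_TIME = {
    "min": min,
    "max": max,
    "sum": sum,
    "count": len,
    "avg": lambda xs: sum(xs) / len(xs),
}


def over_time(samples, fn):
    """<fn>_over_time: aggregate the values of one series' range, ignoring timestamps."""
    return OVER_TIME[fn]([v for _, v in samples])


def clamp(value, min_value=None, max_value=None):
    """clamp_min(clamp_max(x, max_value), min_value) for a single value."""
    if max_value is not None:
        value = min(value, max_value)
    if min_value is not None:
        value = max(value, min_value)
    return value


def bytes_to_gib(n_bytes):
    """round(x/1024/1024/1024) with PromQL's ties-round-up rule."""
    return math.floor(n_bytes / 1024 / 1024 / 1024 + 0.5)
```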
  • 27. Grafana – Demo • Download and install Grafana as described at https://grafana.com/grafana/download/beta • After installation, use the commands below to start, check the status of, or stop the service. There are other ways too; follow the installation guide for details (logs attached). ◦ sudo systemctl start grafana-server ◦ sudo systemctl status grafana-server ◦ sudo systemctl stop grafana-server • Open http://localhost:3000 and configure the login. • Configure Prometheus as a data source and import the Node Exporter dashboard: https://grafana.com/grafana/dashboards/1860
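Instead of clicking through the UI, the Prometheus data source can also be added via Grafana's HTTP API (`POST /api/datasources`). A minimal sketch that only builds the request; the URLs and the default `admin`/`admin` credentials are assumptions to adjust for your installation.

```python
import base64
import json
import urllib.request


def datasource_request(grafana_url="http://localhost:3000",
                       prometheus_url="http://localhost:9090",
                       user="admin", password="admin"):
    """Build a Grafana HTTP API request that registers Prometheus as a data source."""
    payload = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # Grafana's backend queries Prometheus on the browser's behalf
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "Authorization": "Basic " + token},
        method="POST",
    )
```

Once grafana-server is running, `urllib.request.urlopen(datasource_request())` would send it; the Node Exporter dashboard can then be imported on top of this data source.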
  • 28. Wow! What a Grafana dashboard does for us!!!
  • 29. Out of Syllabus – Topics to Look Out For • Remote endpoints and storage: long-term storage • Alertmanager: webhook receiver (Gmail, etc.) • Prometheus concerns addressed by Cortex and Thanos: https://grafana.com/blog/2019/11/21/promcon-recap-two-households-both-alike-in-dignity-cortex-and-thanos/ • Prometheus open bugs and fixes: https://github.com/prometheus/prometheus/issues? • Cloud monitoring: Nagios vs. Prometheus • Google's mtail: extract Prometheus metrics from application logs. • Prometheus is a system to collect and process metrics, not an event logging system; for event logging, the ELK stack is the answer.
  • 30. Study Material – Free & Paid • https://prometheus.io/docs/introduction/overview/ • https://promcon.io/2019-munich/stream/ • Prometheus Monitoring: The Definitive Guide in 2019 • A subreddit collecting all Prometheus-related resources on the internet. • https://training.robustperception.io/ - Introduction to Prometheus • SoundCloud - What makes Prometheus a “next generation” monitoring system? Paid: • Understanding PromQL by Robust Perception • Prometheus: Up & Running by O'Reilly
  • 31. Thanks for Listening!!! Be happy and make others happy. How? As given by my aasan: Go below what you have # Dream above what you have # First love what you have # Spread info what you have # Get info what others have # Help as per what you have