Prometheus - Monitoring system & time series database
An open-source monitoring solution.
Takeaways:
• What is Prometheus?
• Differences between Nagios and Prometheus
• PromQL (Prometheus Query Language)
• Time series DB
• Grafana
• Live Demo
What is Prometheus?
• Prometheus is an open-source systems monitoring and alerting
toolkit originally built at SoundCloud.
• Inspired by Google’s Borgmon Monitoring System
• Written in Go, also known as Golang. Go is syntactically
similar to C and is widely used in production at Google and in
many other organizations and open-source projects.
• It is now a standalone open source project and maintained
independently of any company. To emphasize this, and to clarify
the project's governance structure, Prometheus joined the CNCF
in 2016 as the second hosted project, after Kubernetes.
• The core Prometheus server is a single binary, with no
dependencies like Zookeeper, Consul, Cassandra, Hadoop or the
internet. All it needs is local disk, preferably an SSD.
• It is a systems and service monitoring system. It collects metrics
from configured targets at given intervals, evaluates rule
expressions, displays the results, and can trigger alerts if some
condition is observed to be true.
https://appinventiv.com/blog/mini-guide-to-go-programming-language/
About CNCF (Cloud Native Computing Foundation)
• The Linux Foundation is its parent organization.
• It promotes open-source cloud computing for applications; not
to be confused with OpenStack, which is for
infrastructure.
• Netflix pioneered the concept of cloud native as a
practical tool.
• Cloud native is a term used to describe container-
based environments. Cloud native technologies are
used to develop applications built with services
packaged in containers, deployed as microservices
and managed on elastic infrastructure through
agile DevOps processes and continuous delivery
workflows.
• August 9, 2018 - CNCF Announces Prometheus
Graduation.
https://www.cncf.io/webinars/what-is-cloud-native-and-why-does-it-exist/
Why Prometheus?
• Multi-dimensional data model, e.g. labels such as instance, service, endpoint, and method.
• Operational simplicity.
• Scalable data collection.
• Powerful query language.
All of these features already existed in various systems.
However, Prometheus combined them all.
Nagios – an Overview
• The Industry Standard In IT Infrastructure Monitoring
• First launched in 1999. Nagios is officially sponsored by Nagios Enterprises.
• Nagios Core is a free and open-source computer-software application that monitors systems,
networks and infrastructure. Nagios offers monitoring and alerting services for servers, switches,
applications and services. It alerts users when things go wrong and alerts them a second time when
the problem has been resolved.
• NDOUTILS - The NDOUTILS addon is designed to store all configuration and event data from Nagios
in a database. It requires a MariaDB or MySQL database for storing Nagios Core data.
• RRDtool and Highcharts are included to create customizable graphs that can be displayed in
dashboards.
• (Nagios Core vs Nagios XI) Nagios Core is open source whereas Nagios XI is a commercial,
enterprise version of Nagios.
• Historical performance data used to generate graphs is stored in Round Robin Database
(RRD) files.
• rrdcached - On a Nagios XI server, rrdcached collects host and service performance data and then
flushes it to the appropriate RRD files at a specified interval. This reduces the amount of disk activity
needed to keep a large number of RRD files current for performance graphs.
Nagios vs Prometheus
• Nagios is primarily about alerting based on the exit codes of
scripts.
• Nagios is host-based. Each host can have one or more services
and each service can perform one check.
• There is no notion of labels or a query language.
• Nagios has no storage per-se, beyond the current check state.
There are plugins which can store data such as for
visualisation.
• Nagios XI - Using Grafana With Existing Performance Data:
Grafana uses the existing performance data files (RRD) to
generate the graphs.
• Overall, Nagios is suitable for basic monitoring of small and/or
static systems where blackbox probing is sufficient. If you want
to do whitebox monitoring, or have a dynamic or cloud based
environment, then Prometheus is a good choice.
Cacti
Should we cry or laugh?
Prometheus at Canonical
• Ref: https://prometheus.io/blog/2016/11/16/interview-with-canonical/
Architecture
Architecture - Explanation
• Prometheus scrapes metrics from instrumented jobs, either directly or via an
intermediary push gateway for short-lived jobs. It stores all scraped samples
locally and runs rules over this data to either aggregate and record new time
series from existing data or generate alerts.
• Overall, pulling is slightly better than pushing, though this should not be considered a major point when choosing a monitoring system.
• Occasionally you will need to monitor components which cannot be scraped. For these cases,
where you must push, Prometheus offers the Pushgateway: it lets short-lived service-level batch
jobs push their time series to an intermediary job which Prometheus can then scrape.
• Limitation: if you need 100% accuracy, e.g. for per-request billing, Prometheus is not a good
choice, as the collected data will likely not be detailed and complete enough.
• Grafana or other API consumers can be used to visualize the collected data.
Alertmanager
• Grouping: Useful during larger outages when many systems fail at once and
hundreds to thousands of alerts may be firing simultaneously
• Inhibition is a concept of suppressing notifications for certain alerts if certain
other alerts are already firing.
• Silences are a straightforward way to simply mute alerts for a given time
• The following external systems are supported:
Email
Generic Webhooks
HipChat
OpsGenie
PagerDuty
Pushover
Slack
• To make Prometheus highly available: Run identical Prometheus servers on two or
more separate machines. Identical alerts will be deduplicated by the Alertmanager.
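As a rough illustration of grouping and of the deduplication mentioned above, here is a minimal Python sketch. It assumes alerts are plain dicts of labels and treats alerts with identical label sets as duplicates (e.g. the same alert fired by two identical Prometheus servers). The function `group_alerts` and the sample alerts are hypothetical; the real Alertmanager configures grouping through `group_by` in its routing tree and layers timers, routing, inhibition and silences on top.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Deduplicate alerts with identical label sets, then group them.

    alerts:   list of dicts mapping label name -> label value
    group_by: label names whose values define one notification group
    Returns {tuple of group label values: [unique alerts in that group]}.
    """
    seen = set()
    groups = defaultdict(list)
    for alert in alerts:
        # Identical label sets (e.g. from two HA Prometheus replicas) count once.
        fingerprint = tuple(sorted(alert.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        key = tuple(alert.get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


if __name__ == "__main__":
    alerts = [
        {"alertname": "InstanceDown", "cluster": "eu", "instance": "db1"},
        {"alertname": "InstanceDown", "cluster": "eu", "instance": "db2"},
        # The same alert again, as sent by a second, identical Prometheus server.
        {"alertname": "InstanceDown", "cluster": "eu", "instance": "db1"},
        {"alertname": "InstanceDown", "cluster": "us", "instance": "web1"},
    ]
    for key, members in group_alerts(alerts, ["alertname", "cluster"]).items():
        print(key, len(members))
```

Instead of four notifications, the receiver would get one per group: two for the `eu` cluster (db1 counted once) and one for `us`.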
Time Series Database (TSDB)
• What is a time series? The value of something tracked over time.
• A series is identified by its metric name and labels (key/value pairs): identifier -> (t0, v0), (t1, v1), (t2, v2), (t3, v3), .... Each data
point is a tuple of a timestamp and a value. For the purpose of monitoring, the
timestamp is an integer and the value any number.
Example: this could be a temperature once a day, or requests to your API once a minute.
The latter could look like:
my_api_requests: 5@1:00PM 2@1:01PM 18@1:02PM
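A minimal Python sketch of this data model, for illustration only: a series is identified by its sorted label set (with the metric name stored under the `__name__` label, as Prometheus does internally) and holds (timestamp, value) pairs. The `Series` class is hypothetical, and integer millisecond timestamps are an assumption chosen to mirror Prometheus.

```python
from dataclasses import dataclass, field


@dataclass
class Series:
    """One time series: an identifying label set plus its samples."""

    labels: dict  # includes "__name__" for the metric name
    samples: list = field(default_factory=list)  # [(timestamp_ms, value), ...]

    def identifier(self):
        # Sorted (name, value) pairs give a stable, hashable series identity.
        return tuple(sorted(self.labels.items()))

    def append(self, ts_ms, value):
        # This sketch only accepts samples strictly newer than the last one.
        if self.samples and ts_ms <= self.samples[-1][0]:
            raise ValueError("out-of-order sample")
        self.samples.append((int(ts_ms), float(value)))


if __name__ == "__main__":
    # my_api_requests: 5@1:00PM 2@1:01PM 18@1:02PM (one sample per minute)
    series = Series({"__name__": "my_api_requests", "instance": "api-1"})
    for minute, value in enumerate([5, 2, 18]):
        series.append(minute * 60_000, value)
    print(series.identifier())
    print(series.samples)
```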
• The data model is fundamentally the same as OpenTSDB's.
• Prometheus includes a local on-disk time series database, but also optionally
integrates with remote storage systems
• Ingested samples are grouped into blocks of two hours. Each two-hour block
consists of a directory containing one or more chunk files that contain all time
series samples for that window of time, as well as a metadata file and index file
(which indexes metric names and labels to time series in the chunk files). When
series are deleted via the API, deletion records are stored in separate tombstone
files (instead of deleting the data immediately from the chunk files).
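The two-hour grouping can be sketched as bucketing sample timestamps into fixed windows. This minimal Python sketch assumes windows aligned to multiples of two hours since the Unix epoch and millisecond timestamps; the function names are hypothetical, and the real TSDB first buffers recent samples in an in-memory head block and later compacts blocks into larger ranges, which is ignored here.

```python
from collections import defaultdict

BLOCK_RANGE_MS = 2 * 60 * 60 * 1000  # two hours in milliseconds


def block_start(ts_ms):
    """Start of the two-hour window containing ts_ms (assumed epoch-aligned)."""
    return (ts_ms // BLOCK_RANGE_MS) * BLOCK_RANGE_MS


def split_into_blocks(samples):
    """Group (timestamp_ms, value) samples by two-hour window.

    Returns {window_start_ms: [samples falling in that window]}.
    """
    blocks = defaultdict(list)
    for ts_ms, value in samples:
        blocks[block_start(ts_ms)].append((ts_ms, value))
    return dict(blocks)


if __name__ == "__main__":
    samples = [(0, 1.0), (3_600_000, 2.0), (7_200_000, 3.0), (14_399_999, 4.0)]
    for start, members in sorted(split_into_blocks(samples).items()):
        print(start, members)
```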
• A limitation of the local storage is that it is not clustered or replicated. Hence, using
RAID for disk availability, snapshots for backups, capacity planning, etc. is
recommended for improved durability. Alternatively, external storage may be used
via the remote read/write APIs.
TSDB Configuration:-
• Prometheus has several flags that allow configuring the local storage.
The most important ones are:
--storage.tsdb.path: This determines where Prometheus writes its database. Defaults to data/.
--storage.tsdb.retention.time: This determines when to remove old data. Defaults to 15d.
--storage.tsdb.retention.size: This determines the maximum number of bytes that storage blocks can use. The oldest
data will be removed first. Defaults to 0 (disabled).
--storage.tsdb.wal-compression: This flag enables compression of the write-ahead log (WAL). Depending on your data,
you can expect the WAL size to be halved with little extra CPU load.
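How the two retention flags interact can be sketched as dropping whole blocks, oldest first. In this minimal Python sketch the block tuples and `apply_retention` are hypothetical; real Prometheus deletes only persisted blocks and also accounts for other on-disk data when enforcing the size limit, which is ignored here.

```python
def apply_retention(blocks, now_ms, retention_ms=None, max_bytes=0):
    """Return the blocks that survive time- and size-based retention.

    blocks:       list of (min_time_ms, max_time_ms, size_bytes) tuples
    retention_ms: drop blocks that end before now_ms - retention_ms (None = off)
    max_bytes:    cap on total block size; 0 means disabled, like the flag's default
    """
    kept = sorted(blocks)  # oldest block first
    if retention_ms is not None:
        cutoff = now_ms - retention_ms
        kept = [b for b in kept if b[1] >= cutoff]
    if max_bytes > 0:
        # Delete whole blocks, oldest first, until the remainder fits the cap.
        while kept and sum(b[2] for b in kept) > max_bytes:
            kept.pop(0)
    return kept


if __name__ == "__main__":
    DAY_MS = 86_400_000
    blocks = [(0, DAY_MS, 100), (DAY_MS, 2 * DAY_MS, 100), (2 * DAY_MS, 3 * DAY_MS, 100)]
    print(apply_retention(blocks, now_ms=3 * DAY_MS, retention_ms=DAY_MS))
    print(apply_retention(blocks, now_ms=3 * DAY_MS, max_bytes=150))
```

With the 15d default for time-based retention, `retention_ms` would be `15 * DAY_MS`.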
• TSDB Storage as follows
Prometheus - Demo
Free Online Demo:
http://demo.robustperception.io:9090/graph
• Prometheus means "forethinker".
• Prometheus is a Titan. (Figuratively, a titan is an
extremely important person: Albert Einstein
was a titan in the world of science.)
• A trickster figure, he was a champion of
mankind known for his wily intelligence,
who stole fire from Zeus and the gods and
gave it to mortals.
• Prometheus is also a 2012 science fiction film,
named after its spaceship.
Are you a Titan or just wearing a Titan watch?
Let’s Start - Prometheus
• Prerequisite: configure prometheus.yml (e.g. scrape interval, target servers to be monitored, Alertmanager configuration, etc.)
• The config file is written in YAML format. Prometheus can reload its configuration at runtime. A configuration reload is triggered by sending a
SIGHUP to the Prometheus process or by sending an HTTP POST request to the /-/reload endpoint (when the --web.enable-lifecycle flag is
enabled).
• The kill command can send any signal to a process. However, a process only responds to a signal if it is
programmed to recognize it. Signal numbers go up to 64 on Linux (list them with kill -l); some particularly useful ones are:
- SIGHUP (1) - Hangup detected on controlling terminal or death of controlling process.
- SIGKILL (9) - Kill signal, i.e. kill the running process.
- SIGSTOP (19) - Stop the process.
- SIGCONT (18) - Continue the process if stopped.
To send a SIGKILL signal to PID 1234, use: kill -9 1234
To send a SIGHUP signal to PID 1234, use: kill -1 1234
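A process only reacts to SIGHUP if it installs a handler for it. The minimal Python sketch below (Unix only, and not Prometheus code, which is written in Go) illustrates that pattern: a hypothetical `reload_config` handler runs when the process receives SIGHUP. The script sends the signal to itself with `os.kill`, the programmatic equivalent of `kill -1 <pid>`.

```python
import os
import signal
import time

reloads = []  # one entry per simulated configuration reload


def reload_config(signum, frame):
    # A real server would re-read and validate its config file (e.g. prometheus.yml) here.
    reloads.append(signum)


def main():
    # Install the handler; without it, SIGHUP's default action terminates the process.
    signal.signal(signal.SIGHUP, reload_config)
    os.kill(os.getpid(), signal.SIGHUP)  # same effect as: kill -1 <this pid>
    time.sleep(0.1)  # give the interpreter a moment to run the Python-level handler
    print("config reloads so far:", len(reloads))


if __name__ == "__main__":
    main()
```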
Prometheus – Exporter
• Exporters bridge the gap between Prometheus and systems that don't export metrics
in the Prometheus format.
• There are official and externally contributed exporters available, e.g. for MySQL, Oracle DB,
Dell/IBM hardware, Jira, Hadoop storage, Apache HTTP, AWS APIs, Docker, SNMP, etc.
https://prometheus.io/docs/instrumenting/exporters/
• Build your own exporter, e.g. for:
- Whether an important cron job succeeded or not.
- Any new error from the TimesTen DB error.log.
- From an online-shop perspective: total order successes vs failures.
- Order data metrics for dashboard integration.
- Whether an important file was received/processed or not.
- Top-selling products/categories.
- 5-star to 1-star review metric analysis.
- etc.
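The exporter ideas above boil down to serving metrics as text in the Prometheus exposition format over HTTP. The following is a minimal, standard-library-only Python sketch; the metric names (`shop_orders_total`, `cronjob_last_success`), labels, values and port 8000 are all hypothetical. In practice the official Python client library (`prometheus_client`) handles the format and the HTTP endpoint for you.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def escape_label_value(value):
    # Text format rules: escape backslash, double quote and newline in label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name, metric_type, help_text, samples):
    """Render one metric family; samples is a list of (labels dict, value)."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        label_str = ",".join(
            f'{key}="{escape_label_value(str(val))}"' for key, val in sorted(labels.items())
        )
        lines.append(f"{name}{{{label_str}}} {value}" if label_str else f"{name} {value}")
    return "\n".join(lines) + "\n"


def collect():
    # Hypothetical values; a real exporter would query the shop DB, cron logs, etc.
    return render_metric(
        "shop_orders_total", "counter", "Orders by outcome.",
        [({"status": "success"}, 1520), ({"status": "failure"}, 37)],
    ) + render_metric(
        "cronjob_last_success", "gauge", "1 if the last cron job run succeeded, else 0.",
        [({"job_name": "nightly_backup"}, 1)],
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = collect().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    # Blocks forever; point a Prometheus scrape job at http://<host>:<port>/metrics.
    HTTPServer(("", port), MetricsHandler).serve_forever()


if __name__ == "__main__":
    print(collect(), end="")  # call serve() instead to expose this over HTTP
```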
Node Exporter - Monitors hardware and OS metrics
PromQL - Prometheus Query Language
• Prometheus provides a functional query
language.
• It lets users select and aggregate time series data
in real time. The result of an expression can either
be shown as a graph, viewed as tabular data in
Prometheus's expression browser, or consumed
by external systems via the HTTP API.
• The Prometheus query language allows you to
slice and dice the dimensional data for ad-hoc
exploration, graphing, and alerting.
Time Series Selectors
• Instant vector: a set of time series with a single sample per series, all
sharing the same timestamp. In the simplest form, only a metric name is
specified, e.g. node_load1.
• Range vector: a set of time series with any number of samples per series
over a time range. A range duration is appended in square brackets ([]) at
the end of a vector selector, e.g. node_load1[5m].
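The difference between the two selector types can be sketched over in-memory samples. Assumptions in this minimal Python sketch: timestamps in seconds, samples sorted by time, a 5-minute lookback for instant selection (Prometheus's default lookback delta), and a range window that includes both endpoints; exact boundary handling has differed between Prometheus versions, and staleness markers are ignored. The function names are hypothetical.

```python
LOOKBACK_S = 300  # assumed lookback for instant selection: 5 minutes


def instant_select(samples, eval_time):
    """Latest sample at or before eval_time, if it is at most LOOKBACK_S old.

    samples: (timestamp_s, value) pairs sorted by time. Returns None if no
    sample is recent enough (the series is then absent from the result).
    """
    recent = [(t, v) for t, v in samples if eval_time - LOOKBACK_S <= t <= eval_time]
    return recent[-1] if recent else None


def range_select(samples, eval_time, range_s):
    """All samples in [eval_time - range_s, eval_time], like metric[range]."""
    return [(t, v) for t, v in samples if eval_time - range_s <= t <= eval_time]


if __name__ == "__main__":
    node_load1 = [(0, 1.0), (60, 2.0), (120, 3.0)]  # scraped every 60 s
    print(instant_select(node_load1, 150))     # node_load1 evaluated at t=150
    print(range_select(node_load1, 150, 120))  # node_load1[2m] evaluated at t=150
    print(instant_select(node_load1, 1000))    # last sample too old: None
```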
Metric types
• Counter: A counter is a cumulative metric that
represents a single monotonically increasing counter
whose value can only increase or be reset to zero on
restart. For example, you can use a counter to represent
the number of requests served, tasks completed, or
errors.
• Gauge: A gauge is a metric that represents a single
numerical value that can arbitrarily go up and down, e.g.
temperatures or current memory usage.
• Histogram: A histogram samples observations (usually
things like request durations or response sizes) and
counts them in configurable buckets.
• Summary: Similar to a histogram, a summary samples
observations (usually things like request durations and
response sizes). While it also provides a total count of
observations and a sum of all observed values, it calculates
configurable quantiles over a sliding time window.
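How a histogram records observations can be sketched as cumulative buckets, each counting the observations less than or equal to its upper bound (`le`), plus a `+Inf` bucket, a `_sum` and a `_count`. The `Histogram` class and the bucket bounds below are hypothetical; Prometheus client libraries implement this for you.

```python
import bisect
import math


class Histogram:
    """Cumulative-bucket histogram in the style of Prometheus client libraries."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds) + [math.inf]  # upper bounds ("le"), plus +Inf
        self.counts = [0] * len(self.bounds)       # per-bucket (non-cumulative) counts
        self.total = 0.0                           # exposed as <name>_sum
        self.count = 0                             # exposed as <name>_count

    def observe(self, value):
        # First bucket whose upper bound is >= value, since buckets mean "less than or equal".
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.total += value
        self.count += 1

    def cumulative_buckets(self):
        """[(le, observations <= le), ...] as exposed in <name>_bucket."""
        running, out = 0, []
        for bound, c in zip(self.bounds, self.counts):
            running += c
            out.append((bound, running))
        return out


if __name__ == "__main__":
    h = Histogram([0.1, 0.5, 1.0])  # request duration buckets in seconds
    for duration in [0.05, 0.1, 0.3, 0.7, 2.0]:
        h.observe(duration)
    print(h.cumulative_buckets(), h.count, h.total)
```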
https://povilasv.me/prometheus-tracking-request-duration/
Operators
• Binary Comparison Operators:
==, !=, >, <, >=, <=
• Binary Arithmetic Operators:
+, -, *, /, % (modulo), ^ (power/exponentiation)
• Logical/set Binary operators:
and (intersection), or (union), unless (complement)
• Built-in aggregation operators:
sum, min, max, avg, stddev, stdvar, count, count_values, bottomk, topk, quantile
- These operators can either be used to aggregate over all label dimensions or preserve
distinct dimensions with a by or without clause.
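The `by` and `without` clauses decide which labels survive aggregation. Here is a minimal Python sketch of `sum by (...)` and `sum without (...)`, assuming an instant vector represented as a list of (labels dict, value) pairs; like PromQL, `without` also drops the metric name (`__name__`). The function name `aggregate_sum` and the filesystem samples are hypothetical.

```python
from collections import defaultdict


def aggregate_sum(vector, by=None, without=None):
    """PromQL-style `sum by (...)` / `sum without (...)` over an instant vector.

    vector: list of (labels dict, value) pairs. Pass exactly one of `by`
    or `without`. Returns {sorted tuple of kept labels: summed value}.
    """
    if (by is None) == (without is None):
        raise ValueError("pass exactly one of by= or without=")
    groups = defaultdict(float)
    for labels, value in vector:
        if by is not None:
            kept = {k: v for k, v in labels.items() if k in by}
        else:
            # `without` drops the listed labels and, as in PromQL, the metric name.
            kept = {k: v for k, v in labels.items() if k not in without and k != "__name__"}
        groups[tuple(sorted(kept.items()))] += value
    return dict(groups)


if __name__ == "__main__":
    name = "node_filesystem_size_bytes"
    fs = [
        ({"__name__": name, "instance": "a", "device": "sda1", "mountpoint": "/"}, 100.0),
        ({"__name__": name, "instance": "a", "device": "sdb1", "mountpoint": "/data"}, 400.0),
        ({"__name__": name, "instance": "b", "device": "sda1", "mountpoint": "/"}, 50.0),
    ]
    print(aggregate_sum(fs, by=["instance"]))                    # like sum by(instance)
    print(aggregate_sum(fs, without=["device", "mountpoint"]))  # same groups here
```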
https://blog.pvincent.io/2017/12/prometheus-blog-series-part-2-metric-types/
Basic Functions
• PromQL has 46 functions & growing…
• Most of the mathematical functions, as well as time
functions such as day_of_month, day_of_week, month, year,
minute, hour and time, are available.
• In practice, the ones used most often are:
- rate()
- irate(): irate should only be used when graphing
volatile, fast-moving counters.
- increase()
- label_join()/label_replace()
- <aggregation>_over_time():
min_over_time
max_over_time
avg_over_time
sum_over_time
count_over_time
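The core idea behind `rate()`, `irate()` and `increase()` on counters can be sketched in Python. This is deliberately simplified under stated assumptions: samples are (seconds, value) pairs sorted by time, any drop in value is treated as a counter reset, and, unlike real PromQL, there is no extrapolation to the edges of the range window, so results will differ somewhat from Prometheus. The `simple_*` function names are hypothetical.

```python
def _increase_with_resets(samples):
    """Sum of increases between consecutive samples; a drop is a counter reset."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarted from zero, so all of `cur` is new increase.
        total += cur - prev if cur >= prev else cur
    return total


def simple_increase(samples):
    """Like increase(v[range]) but without extrapolation to the window edges."""
    return _increase_with_resets(samples)


def simple_rate(samples):
    """Like rate(v[range]): average per-second increase across the samples."""
    if len(samples) < 2:
        return None
    return _increase_with_resets(samples) / (samples[-1][0] - samples[0][0])


def simple_irate(samples):
    """Like irate(v[range]): per-second increase between the last two samples only."""
    if len(samples) < 2:
        return None
    (t1, v1), (t2, v2) = samples[-2], samples[-1]
    return (v2 - v1 if v2 >= v1 else v2) / (t2 - t1)


if __name__ == "__main__":
    # Hypothetical counter scraped every 60 s; it resets between t=120 and t=180.
    requests_total = [(0, 0.0), (60, 60.0), (120, 180.0), (180, 30.0)]
    print(simple_increase(requests_total))  # 210.0
    print(simple_rate(requests_total))
    print(simple_irate(requests_total))     # 0.5
```

The example shows why `irate()` is volatile: it only reflects the last interval (0.5/s here), while `rate()` averages over the whole window.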
Wow! Functions
• delta()
• holt_winters()
• predict_linear()
• clamp_max()
• clamp_min()
• histogram_quantile()
Holt-Winters
https://www.otexts.org/fpp/7/5
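`holt_winters(v, sf, tf)` applies double exponential smoothing (Holt's linear method) to a gauge range vector, with a smoothing factor `sf` and a trend factor `tf`, both strictly between 0 and 1. A minimal Python sketch of that method follows; initializing the trend from the first two samples is an assumption matching the usual textbook formulation, and the PromQL implementation may differ in details.

```python
def holt_winters(values, sf, tf):
    """Double exponential smoothing; returns the last smoothed level.

    values: series values in time order (at least two)
    sf: smoothing factor, tf: trend factor, both strictly between 0 and 1
    """
    if len(values) < 2:
        raise ValueError("need at least two samples")
    if not (0 < sf < 1 and 0 < tf < 1):
        raise ValueError("sf and tf must be between 0 and 1")
    level, trend = values[0], values[1] - values[0]
    for x in values[1:]:
        prev_level = level
        # New level blends the observation with the previous level projected by the trend.
        level = sf * x + (1 - sf) * (prev_level + trend)
        # New trend blends the latest level change with the previous trend.
        trend = tf * (level - prev_level) + (1 - tf) * trend
    return level


if __name__ == "__main__":
    # Hypothetical noisy gauge readings; a lower sf gives more weight to older data.
    print(holt_winters([10.0, 12.0, 9.0, 14.0, 11.0], 0.3, 0.1))
```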
New Relic Doc
• Averages unfortunately have the big drawback
of hiding the distribution and preventing the discovery
of outliers/deviations.
• Quantiles are a better measurement for this kind
of metric, as they allow you to understand the
distribution. For example, if the request latency
0.5-quantile (50th percentile) is 100ms, it
means that 50% of requests completed under
100ms. Similarly, if the 0.99-quantile (99th
percentile) is 4s, it means that 1% of requests
took more than 4s to respond.
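`histogram_quantile()` estimates such quantiles from cumulative histogram buckets: it finds the bucket containing the target rank and interpolates linearly inside it. A simplified Python sketch, assuming buckets given as (upper bound, cumulative count) pairs ending with `+Inf`, a lowest bucket starting at 0, and 0 < q <= 1; real PromQL handles more edge cases.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 < q <= 1) from cumulative histogram buckets.

    buckets: [(upper_bound, cumulative_count), ...] sorted by bound, with at
    least one finite bucket and a final +Inf bucket holding the total count.
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        raise ValueError("need finite buckets plus a final +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first finite bucket whose cumulative count reaches the rank.
    for i, (upper, cum) in enumerate(buckets[:-1]):
        if cum >= rank:
            break
    else:
        # The rank falls into the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    lower, below = (buckets[i - 1][0], buckets[i - 1][1]) if i > 0 else (0.0, 0)
    # Interpolate linearly, assuming observations are spread evenly in the bucket.
    return lower + (upper - lower) * (rank - below) / (cum - below)


if __name__ == "__main__":
    latency_buckets = [(0.1, 2), (0.5, 3), (1.0, 4), (math.inf, 5)]  # hypothetical
    print(histogram_quantile(0.5, latency_buckets))   # median estimate
    print(histogram_quantile(0.99, latency_buckets))  # capped at the last finite bound
```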
predict_linear()
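`predict_linear(v, t)` fits a straight line to the samples in the range vector `v` by simple linear regression and extrapolates it `t` seconds ahead, as in the demo query `predict_linear(node_load1[1h],4*3600)`. A least-squares sketch in Python, assuming (seconds, value) samples and, for simplicity, an evaluation time equal to the last sample's timestamp unless given; the function signature is hypothetical.

```python
def predict_linear(samples, t_ahead, eval_time=None):
    """Least-squares line through (time_s, value) samples, evaluated at
    eval_time + t_ahead. eval_time defaults to the last sample's time."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    if eval_time is None:
        eval_time = samples[-1][0]
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("all samples share one timestamp")
    slope = cov / var  # change in value per second
    intercept = mean_v - slope * mean_t
    return intercept + slope * (eval_time + t_ahead)


if __name__ == "__main__":
    # Hypothetical gauge growing by 10 every minute; predict one minute ahead.
    print(predict_linear([(0, 10.0), (60, 20.0), (120, 30.0)], 60))
```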
Demo Queries
• max by(instance)(node_filesystem_size_bytes)
• max without(device, fstype, mountpoint)(node_filesystem_size_bytes)
• sum without(device, fstype, mountpoint)(node_filesystem_size_bytes)
• sum(node_filesystem_size_bytes)
• round(sum(node_filesystem_size_bytes)/1024/1024/1024)
• round(sum by(instance, device)(node_filesystem_size_bytes)/1024/1024/1024)
• deriv(node_load1[5m])
• rate(node_cpu_seconds_total{mode="system"}[5m])
• min_over_time(node_load1[5m])
• max_over_time(node_load1[5m])
• avg_over_time(node_load1[5m])
• sum_over_time(node_load1[5m])
• count_over_time(node_load1[5m])
• delta(node_hwmon_temp_celsius[1h])
• clamp_max(node_load1,1.2)
• clamp_min(clamp_max(node_load1,1.2),1.05)
• predict_linear(node_load1[1h],4*3600)
• quantile without(cpu)(0.9, rate(node_cpu_seconds_total{mode="system"}[5m]))
• topk(3, sum by (mode) (node_cpu_seconds_total))
• bottomk(3, sum by (le) (alertmanager_http_request_duration_seconds_bucket))
Grafana – Demo
• Download and install Grafana as described at https://grafana.com/grafana/download/beta
• After installation, use the commands below to start Grafana, check its status, or stop it. There are other
ways too; see the installation guide for details (attached logs).
$ sudo systemctl start grafana-server
$ sudo systemctl status grafana-server
$ sudo systemctl stop grafana-server
• Open http://localhost:3000 and complete the login setup.
• Add Prometheus as a data source and import the Node Exporter dashboard:
https://grafana.com/grafana/dashboards/1860
Wow! Look what a Grafana dashboard does for us!!!
Out of Syllabus – Topics to Look Out For
• Remote Endpoints and Storage - long term storage
• Alertmanager - Webhook Receiver (Gmail, etc)
• Prometheus concerns fixed by Cortex and Thanos:
https://grafana.com/blog/2019/11/21/promcon-recap-two-households-both-alike-in-dignity-cortex-and-thanos/
• Prometheus open bugs and fixes:
https://github.com/prometheus/prometheus/issues?
• Cloud Monitoring : Nagios vs. Prometheus
• Google's mtail - Extract Prometheus metrics from application logs.
• Prometheus is a system to collect and process metrics, not an event
logging system; for event logs, the ELK stack is the answer.
Study Material –Free & Cost
Free
• https://prometheus.io/docs/introduction/overview/
• https://promcon.io/2019-munich/stream/
• Prometheus Monitoring : The Definitive Guide in 2019
• subreddit collecting all Prometheus-related resources on the internet.
• https://training.robustperception.io/ - Introduction to Prometheus
• SoundCloud - What makes Prometheus a “next generation” monitoring
system?
Cost
• Understanding PromQL by Robust Perception
• Prometheus: Up & Running by O'Reilly
Thanks for Listening!!!
Be happy and make others happy. How? As given by my aasan:
Go below what you have # Dream above what you have # First love what you have
Spread info what you have # Get info what others have # Help as per what you have

From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 

Recently uploaded (20)

Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 

Prometheus - Intro, CNCF, TSDB, PromQL, Grafana

  • 1. Prometheus - Monitoring system & time series database: an open-source monitoring solution.
  • 2. Takeaways:
    • What is Prometheus?
    • Differences between Nagios and Prometheus
    • PromQL (Prometheus Query Language)
    • Time series DB
    • Grafana
    • Live demo
  • 3. What is Prometheus?
    • Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud.
    • Inspired by Google’s Borgmon monitoring system.
    • Written in Go (also known as Golang), a language syntactically similar to C that is widely used in production at Google and in many other organizations and open-source projects.
    • It is now a standalone open-source project, maintained independently of any company. To emphasize this, and to clarify the project's governance structure, Prometheus joined the CNCF in 2016 as the second hosted project, after Kubernetes.
    • The core Prometheus server is a single binary, with no dependencies such as ZooKeeper, Consul, Cassandra, Hadoop, or the internet. All it needs is local disk, preferably an SSD.
    • It is a systems and service monitoring system: it collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts when some condition is observed to be true.
    • https://appinventiv.com/blog/mini-guide-to-go-programming-language/
  • 4. About the CNCF
    • Its parent organization is the Linux Foundation.
    • It focuses on open-source cloud computing for applications; not to be confused with OpenStack, which is for infrastructure.
    • Netflix pioneered cloud native as a practical approach.
    • Cloud native is a term used to describe container-based environments. Cloud native technologies are used to develop applications built with services packaged in containers, deployed as microservices, and managed on elastic infrastructure through agile DevOps processes and continuous delivery workflows.
    • August 9, 2018: CNCF announces Prometheus graduation.
    • https://www.cncf.io/webinars/what-is-cloud-native-and-why-does-it-exist/
  • 5. Why Prometheus?
    • Multi-dimensional data model, e.g. instance, service, endpoint, and method.
    • Operational simplicity.
    • Scalable data collection.
    • Powerful query language.
    • Each of these features existed in various systems before, but Prometheus combined them all.
  • 6. Nagios: an overview
    • "The industry standard in IT infrastructure monitoring."
    • First launched in 1999; officially sponsored by Nagios Enterprises.
    • Nagios Core is a free and open-source application that monitors systems, networks, and infrastructure. It offers monitoring and alerting for servers, switches, applications, and services, alerting users when things go wrong and alerting them a second time when the problem has been resolved.
    • NDOUtils: an addon designed to store all configuration and event data from Nagios in a database. It requires a MariaDB or MySQL database for storing Nagios Core data.
    • RRDtool and Highcharts are included to create customizable graphs that can be displayed in dashboards.
    • Nagios Core vs Nagios XI: Nagios Core is open source, whereas Nagios XI is a commercial, enterprise version of Nagios.
    • Historical performance data used to generate graphs is stored in Round Robin Database (RRD) files.
    • rrdcached: on a Nagios XI server, rrdcached collects host and service performance data and then flushes it to the appropriate RRD files at a specified interval. This reduces the disk activity needed to keep a large number of RRD files current for performance graphs.
  • 7. Nagios vs Prometheus
    • Nagios is primarily about alerting based on the exit codes of scripts.
    • Nagios is host-based: each host can have one or more services, and each service can perform one check.
    • There is no notion of labels or a query language.
    • Nagios has no storage per se, beyond the current check state. There are plugins that can store data, for example for visualisation.
    • Nagios XI, using Grafana with existing performance data: Grafana uses the existing performance data files (RRD) to generate the graphs.
    • Overall, Nagios is suitable for basic monitoring of small and/or static systems where blackbox probing is sufficient. If you want to do whitebox monitoring, or have a dynamic or cloud-based environment, then Prometheus is a good choice.
  • 8. Cacti: should we cry or laugh?
  • 9. Prometheus at Canonical
    • Ref: https://prometheus.io/blog/2016/11/16/interview-with-canonical/
  • 11. Architecture: explanation
    • Prometheus scrapes metrics from instrumented jobs, either directly or via an intermediary push gateway for short-lived jobs. It stores all scraped samples locally and runs rules over this data to either aggregate and record new time series from existing data or generate alerts.
    • Overall, pulling is slightly better than pushing.
    • For cases where you must push, Prometheus offers the Pushgateway, since occasionally you will need to monitor components that cannot be scraped. The Pushgateway allows you to push time series from short-lived service-level batch jobs to an intermediary job which Prometheus can scrape.
    • Limitation: do not use the data collected for monitoring for billing, as it will likely not be detailed and complete enough.
    • Grafana or other API consumers can be used to visualize the collected data.
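The slide above says Prometheus pulls (scrapes) metrics from instrumented jobs. Targets serve those metrics over HTTP in a plain-text exposition format, one sample per line. The Python sketch below parses a simplified version of that format to show what a scrape actually returns. It is not Prometheus's own parser (which is written in Go): it ignores escaping inside label values, and the sample payload is made up.

```python
import re

# One sample line: metric_name{label="value",...} value [timestamp_ms]
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?$'
)
# Simplification: label values may not contain escaped quotes or braces.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text):
    """Return a list of (metric_name, labels, value, timestamp_ms_or_None)."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE metadata
        m = SAMPLE_RE.match(line)
        if m is None:
            raise ValueError(f"cannot parse line: {line!r}")
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        ts = int(m.group("ts")) if m.group("ts") else None
        # float() also accepts the special values "+Inf", "-Inf" and "NaN"
        samples.append((m.group("name"), labels, float(m.group("value")), ts))
    return samples


scraped = """\
# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
http_requests_total{method="post",code="400"} 3 1395066363000
node_load1 0.42
"""
for sample in parse_exposition(scraped):
    print(sample)
```

Each `(name, labels)` pair identifies one time series; the scraper attaches the scrape time to every sample that does not carry its own timestamp.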
  • 12. Alertmanager
    • Grouping: useful during larger outages, when many systems fail at once and hundreds to thousands of alerts may be firing simultaneously.
    • Inhibition: suppressing notifications for certain alerts if certain other alerts are already firing.
    • Silences: a straightforward way to simply mute alerts for a given time.
    • The following external systems are supported: Email, generic webhooks, HipChat, OpsGenie, PagerDuty, Pushover, Slack.
    • To make Prometheus highly available, run identical Prometheus servers on two or more separate machines. Identical alerts will be deduplicated by the Alertmanager.
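The grouping, inhibition, and HA deduplication ideas above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Alertmanager's implementation or its configuration format: alerts are plain label dictionaries, deduplication assumes both Prometheus replicas send identical label sets, and the alert names, clusters, and severities are hypothetical.

```python
from collections import defaultdict


def dedupe(alerts):
    """Collapse alerts with identical label sets (e.g. sent by two HA Prometheus replicas)."""
    return list({frozenset(a.items()): a for a in alerts}.values())


def group_alerts(alerts, group_by):
    """Group alerts by the values of the `group_by` labels: one notification per group."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[tuple(alert.get(label, "") for label in group_by)].append(alert)
    return dict(groups)


def matches(alert, matcher):
    """True if every label in `matcher` has the same value on the alert."""
    return all(alert.get(k) == v for k, v in matcher.items())


def inhibit(alerts, source_match, target_match, equal):
    """Drop target alerts while a different source alert with equal `equal` labels fires."""
    sources = [a for a in alerts if matches(a, source_match)]
    return [
        a for a in alerts
        if not (
            matches(a, target_match)
            and any(
                s is not a and all(a.get(label) == s.get(label) for label in equal)
                for s in sources
            )
        )
    ]


firing = dedupe([
    {"alertname": "InstanceDown", "cluster": "eu1", "instance": "db1", "severity": "critical"},
    {"alertname": "InstanceDown", "cluster": "eu1", "instance": "db1", "severity": "critical"},
    {"alertname": "HighLatency", "cluster": "eu1", "instance": "web1", "severity": "warning"},
    {"alertname": "HighLatency", "cluster": "us1", "instance": "web2", "severity": "warning"},
])
# A critical alert in a cluster mutes warnings from the same cluster.
notify = inhibit(firing, {"severity": "critical"}, {"severity": "warning"}, equal=["cluster"])
groups = group_alerts(notify, group_by=("alertname", "cluster"))
print(groups)
```

Silences would be one more filter of the same shape (drop alerts matching a matcher until an expiry time), so they are left out.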
  • 13. Time Series Database (TSDB)
    • What is a time series? The value of something tracked over time.
    • A series is identified by its labels (key/value pairs): identifier -> (t0, v0), (t1, v1), (t2, v2), (t3, v3), .... Each data point is a tuple of a timestamp and a value. For monitoring purposes, the timestamp is an integer and the value any number.
    • Example: this could be temperature once a day, or requests to your API once a minute. The latter could look like: my_api_requests: 5@1:00PM 2@1:01PM 18@1:02PM
    • The data model is fundamentally the same as OpenTSDB's.
    • Prometheus includes a local on-disk time series database, but also optionally integrates with remote storage systems.
    • Ingested samples are grouped into blocks of two hours. Each two-hour block consists of a directory containing one or more chunk files that contain all time series samples for that window of time, as well as a metadata file and an index file (which indexes metric names and labels to time series in the chunk files). When series are deleted via the API, deletion records are stored in separate tombstone files (instead of the data being deleted immediately from the chunk files).
    • A limitation of the local storage is that it is not clustered or replicated. Hence using RAID for disk availability, snapshots for backups, capacity planning, etc., is recommended for improved durability. Alternatively, external storage may be used via the remote read/write APIs.
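A rough Python model of the layout described above: each series is identified by its label set (Prometheus stores the metric name as the `__name__` label), and samples are bucketed into two-hour blocks by timestamp. Aligning block boundaries to multiples of two hours since the Unix epoch is a simplifying assumption here; chunk encoding, the index, and tombstones are left out entirely.

```python
BLOCK_MS = 2 * 60 * 60 * 1000  # two hours, in milliseconds (Prometheus timestamps are ms)


def block_start(ts_ms):
    """Start of the two-hour window a sample falls into (aligned to the Unix epoch)."""
    return ts_ms - ts_ms % BLOCK_MS


def group_into_blocks(series):
    """series: {label_set: [(ts_ms, value), ...]} -> {block_start: {label_set: samples}}."""
    blocks = {}
    for labels, samples in series.items():
        for ts, value in samples:
            per_block = blocks.setdefault(block_start(ts), {})
            per_block.setdefault(labels, []).append((ts, value))
    return blocks


# The slide's my_api_requests example: one sample per minute, identified by its labels.
api = frozenset({("__name__", "my_api_requests"), ("instance", "api-1")})
series = {api: [(0, 5.0), (60_000, 2.0), (120_000, 18.0), (BLOCK_MS + 60_000, 7.0)]}
blocks = group_into_blocks(series)
print(sorted(blocks))  # start times of the blocks that received samples
```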
  • 14. TSDB configuration
    • Prometheus has several flags that configure the local storage. The most important ones are:
      • --storage.tsdb.path: where Prometheus writes its database. Defaults to data/.
      • --storage.tsdb.retention.time: when to remove old data. Defaults to 15d.
      • --storage.tsdb.retention.size: the maximum number of bytes that storage blocks can use; the oldest data is removed first. Defaults to 0 (disabled).
      • --storage.tsdb.wal-compression: enables compression of the write-ahead log (WAL). Depending on your data, you can expect the WAL size to be halved with little extra CPU load.
    • TSDB storage as follows
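The two retention flags can be illustrated with a simplified Python model: blocks whose newest sample is older than the retention time are dropped, then, if a size limit is set, the oldest remaining blocks are dropped until the total fits. The `Block` class and the `ref_ms` reference-time parameter are assumptions for illustration; the real TSDB's exact rules differ in details.

```python
from dataclasses import dataclass

DAY_MS = 24 * 60 * 60 * 1000


@dataclass
class Block:
    min_time_ms: int
    max_time_ms: int
    size_bytes: int


def apply_retention(blocks, ref_ms, retention_ms=15 * DAY_MS, max_bytes=0):
    """Return the blocks kept after time- and size-based retention, oldest first.

    Time: drop blocks whose newest sample is more than retention_ms before ref_ms
    (15 days mirrors the --storage.tsdb.retention.time default).
    Size: if max_bytes > 0, drop the oldest blocks until the total fits
    (0 means disabled, mirroring the --storage.tsdb.retention.size default).
    """
    kept = sorted(
        (b for b in blocks if ref_ms - b.max_time_ms <= retention_ms),
        key=lambda b: b.min_time_ms,
    )
    if max_bytes > 0:
        total = sum(b.size_bytes for b in kept)
        while kept and total > max_bytes:
            total -= kept.pop(0).size_bytes  # the oldest data is removed first
    return kept


H2 = 2 * 60 * 60 * 1000
blocks = [Block(d * DAY_MS, d * DAY_MS + H2, 100) for d in (0, 10, 20)]
now = 20 * DAY_MS + H2
print(apply_retention(blocks, now))                 # the 20-day-old block is gone
print(apply_retention(blocks, now, max_bytes=150))  # size limit also drops day 10
```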
  • 15. Prometheus demo
    • Free online demo: http://demo.robustperception.io:9090/graph
  • 16.
    • Prometheus means "forethinker".
    • Prometheus is a Titan. (A titan is also an extremely important person: Albert Einstein was a titan in the world of science.)
    • A trickster figure, he was a champion of mankind known for his wily intelligence, who stole fire from Zeus and the gods and gave it to mortals.
    • Prometheus is also a 2012 science fiction film about a spaceship.
    • Are you a Titan, or just wearing a Titan watch?
  • 17. Let’s start: Prometheus
    • Prerequisite: configure prometheus.yml (e.g. scrape interval, target servers to be monitored, Alertmanager configuration).
    • The config file is written in YAML. Prometheus can reload its configuration at runtime: a reload is triggered by sending SIGHUP to the Prometheus process, or by sending an HTTP POST request to the /-/reload endpoint (when the --web.enable-lifecycle flag is enabled).
    • The kill command can send any signal to a process. A process only handles a signal in a custom way (such as Prometheus reloading on SIGHUP) if it is programmed to; otherwise the default action applies, and SIGKILL and SIGSTOP cannot be caught at all. kill -l lists all signals (numbered up to 64 on Linux); particularly useful ones include:
      • SIGHUP (1): hangup detected on controlling terminal, or death of controlling process.
      • SIGKILL (9): kill signal, i.e. kill the running process.
      • SIGSTOP (19): stop the process.
      • SIGCONT (18): continue the process if stopped.
    • To send a kill signal to PID 1234: kill -9 1234
    • To send a SIGHUP signal to PID 1234: kill -1 1234
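Prometheus handles SIGHUP inside its own Go code; the Python toy below only illustrates the general pattern a process uses to reload its configuration when it receives that signal, keeping the old configuration if the new one fails to load. It assumes a POSIX system (SIGHUP does not exist on Windows), and the `load` callable and config dictionaries are hypothetical stand-ins for re-parsing prometheus.yml.

```python
import os
import signal


class ReloadableConfig:
    """Re-reads its configuration whenever the process receives SIGHUP (POSIX only)."""

    def __init__(self, load):
        self._load = load  # callable returning a freshly parsed config
        self.config = load()
        self.reloads = 0
        # signal.signal() must be called from the main thread
        signal.signal(signal.SIGHUP, self._on_sighup)

    def _on_sighup(self, signum, frame):
        try:
            new_config = self._load()
        except Exception as exc:  # keep running with the old config if the new one is bad
            print(f"config reload failed, keeping previous config: {exc}")
            return
        self.config = new_config
        self.reloads += 1


# Hypothetical stand-in for re-parsing prometheus.yml: each call returns the next version.
versions = iter([{"scrape_interval": "15s"}, {"scrape_interval": "30s"}])
holder = ReloadableConfig(lambda: next(versions))

# Equivalent to running `kill -1 <pid>` (or `kill -HUP <pid>`) from a shell.
os.kill(os.getpid(), signal.SIGHUP)
print(holder.config)
```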
  • 18. Prometheus exporters
    • Exporters bridge the gap between Prometheus and systems that don't export metrics in the Prometheus format.
    • Official and externally contributed exporters are available, e.g. for MySQL, Oracle DB, Dell/IBM hardware, Jira, Hadoop storage, Apache HTTP, AWS APIs, Docker, SNMP, etc.: https://prometheus.io/docs/instrumenting/exporters/
    • Build your own exporter, for example for:
      • Whether an important cron job succeeded or not.
      • Any new error in the TimesTen DB error.log.
      • An online selling website: total successful vs failed orders.
      • Order data metrics for dashboard integration.
      • Whether an important file was received/processed or not.
      • Top-selling products/categories.
      • 5-star to 1-star review metric analysis.
      • etc.
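A custom exporter only has to serve its current values as text on an HTTP endpoint that Prometheus can scrape. The sketch below uses just the Python standard library; the metric names (`shop_orders_total`, `batch_cron_last_run_success`), their values, and port 8000 are hypothetical, picked to match two of the ideas listed above.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical values; a real exporter would read them from logs, a DB, cron status, etc.
ORDERS = {"success": 1520, "failure": 37}
LAST_CRON_SUCCESS = 1  # 1 = the last run of the important cron job succeeded, 0 = it failed


def render_metrics():
    """Render the current values in the Prometheus text exposition format."""
    lines = [
        "# HELP shop_orders_total Orders processed, by outcome.",
        "# TYPE shop_orders_total counter",
    ]
    for status, count in sorted(ORDERS.items()):
        lines.append(f'shop_orders_total{{status="{status}"}} {count}')
    lines += [
        "# HELP batch_cron_last_run_success Whether the last cron run succeeded (1) or not (0).",
        "# TYPE batch_cron_last_run_success gauge",
        f"batch_cron_last_run_success {LAST_CRON_SUCCESS}",
    ]
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, answering Prometheus scrapes on http://<host>:<port>/metrics."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` and adding `<host>:8000` as a target in a scrape job in prometheus.yml would let Prometheus collect these series. In practice the official `prometheus_client` Python library generates this format for you and is the better choice beyond a quick experiment.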
  • 19. Node Exporter: monitors hardware and OS metrics
  • 20. PromQL: Prometheus Query Language
    • Prometheus provides a functional query language.
    • It lets the user select and aggregate time series data in real time. The result of an expression can either be shown as a graph, viewed as tabular data in Prometheus's expression browser, or consumed by external systems via the HTTP API.
    • The Prometheus query language allows you to slice and dice the dimensional data for ad-hoc exploration, graphing, and alerting.
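External systems consume query results through the HTTP API: an instant query is a GET request to `/api/v1/query` with the expression in the `query` parameter and an optional evaluation `time`. The Python sketch below builds that URL and reads the JSON response. The base URL (a local server on the default port 9090) is an assumption, and error handling is kept minimal.

```python
import json
import urllib.parse
import urllib.request


def instant_query_url(base_url, expr, eval_time=None):
    """Build a GET URL for the Prometheus HTTP API instant-query endpoint."""
    params = {"query": expr}
    if eval_time is not None:
        params["time"] = str(eval_time)  # RFC 3339 string or Unix timestamp in seconds
    return f"{base_url.rstrip('/')}/api/v1/query?{urllib.parse.urlencode(params)}"


def instant_query(base_url, expr, timeout=10):
    """Run the query and return the `data.result` list (needs a reachable server)."""
    with urllib.request.urlopen(instant_query_url(base_url, expr), timeout=timeout) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    return payload["data"]["result"]


print(instant_query_url("http://localhost:9090", "sum(node_filesystem_size_bytes)"))
```

Against a running server, `instant_query("http://localhost:9090", "up")` would return one entry per series, each with a `metric` label map and a `[timestamp, "value"]` pair.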
  • 21. Time series selectors
    • Instant vector: a single sample per time series, all sharing the same timestamp. In the simplest form, only a metric name is specified, e.g. node_load1.
    • Range vector: any number of samples per time series between two timestamps. A range duration is appended in square brackets ([]) at the end of a vector selector, e.g. node_load1[5m].
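The difference between the two selector types can be modelled over a list of `(timestamp_ms, value)` samples. This Python sketch assumes the default five-minute lookback window for instant selectors and treats windows as left-open, `(t - range, t]`, which matches recent Prometheus versions (older versions also included the left boundary); staleness markers are ignored.

```python
LOOKBACK_MS = 5 * 60 * 1000  # default lookback window for instant selectors (5 minutes)


def instant_select(samples, eval_ms, lookback_ms=LOOKBACK_MS):
    """Newest sample in (eval_ms - lookback_ms, eval_ms], or None if the series is absent.

    PromQL then reports that value at the evaluation timestamp itself.
    """
    window = [(t, v) for t, v in samples if eval_ms - lookback_ms < t <= eval_ms]
    return max(window, key=lambda s: s[0]) if window else None


def range_select(samples, eval_ms, range_ms):
    """Every sample in (eval_ms - range_ms, eval_ms], oldest first."""
    return sorted((t, v) for t, v in samples if eval_ms - range_ms < t <= eval_ms)


# node_load1 scraped once a minute (timestamps in ms)
load = [(0, 0.5), (60_000, 0.7), (120_000, 0.9), (180_000, 1.1)]
print(instant_select(load, 200_000))         # node_load1 evaluated at t=200s
print(range_select(load, 180_000, 120_000))  # node_load1[2m] evaluated at t=180s
```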
  • 22. Metric types
    • Counter: a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart. For example, you can use a counter to represent the number of requests served, tasks completed, or errors.
    • Gauge: a metric that represents a single numerical value that can arbitrarily go up and down, e.g. temperatures or current memory usage.
    • Histogram: samples observations (usually things like request durations or response sizes) and counts them in configurable buckets.
    • Summary: similar to a histogram, a summary samples observations (usually things like request durations and response sizes). While it also provides a total count of observations and a sum of all observed values, it calculates configurable quantiles over a sliding time window.
    • https://povilasv.me/prometheus-tracking-request-duration/
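The first three metric types map naturally onto small classes. The Python sketch below is an illustration, not the official client library, and its bucket bounds are arbitrary. It shows why histogram buckets are cumulative: each `le` ("less than or equal") bucket counts every observation up to its upper bound, and an implicit `+Inf` bucket always exists. Summaries are omitted because their client-side quantile calculation needs considerably more machinery.

```python
import bisect
import math


class Counter:
    """Only goes up; it returns to zero only when the process restarts."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("a counter can only increase")
        self.value += amount


class Gauge:
    """A single value that can go up and down arbitrarily."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value

    def inc(self, amount=1.0):
        self.value += amount  # negative amounts are fine for a gauge


class Histogram:
    """Counts observations into buckets and tracks their count and sum."""

    def __init__(self, buckets=(0.1, 0.5, 1.0)):
        self.upper_bounds = sorted(buckets) + [math.inf]  # the +Inf bucket always exists
        self.bucket_counts = [0] * len(self.upper_bounds)
        self.count = 0
        self.sum = 0.0

    def observe(self, value):
        # First bucket whose upper bound is >= value (upper bounds are inclusive)
        self.bucket_counts[bisect.bisect_left(self.upper_bounds, value)] += 1
        self.count += 1
        self.sum += value

    def cumulative(self):
        """(le, count of observations <= le) pairs, as exposed by the *_bucket series."""
        pairs, running = [], 0
        for le, n in zip(self.upper_bounds, self.bucket_counts):
            running += n
            pairs.append((le, running))
        return pairs


requests = Counter()
requests.inc()
requests.inc(2)

temperature = Gauge()
temperature.set(21.5)
temperature.inc(-0.5)

latency = Histogram()
for seconds in (0.05, 0.3, 0.5, 0.7, 2.0):
    latency.observe(seconds)
print(latency.cumulative())
```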
  • 23. Operators
    • Binary comparison operators: ==, !=, >, <, >=, <=
    • Binary arithmetic operators: +, -, *, /, % (modulo), ^ (power/exponentiation)
    • Logical/set binary operators: and (intersection), or (union), unless (complement)
    • Built-in aggregation operators: sum, min, max, avg, stddev, stdvar, count, count_values, bottomk, topk, quantile. These operators can either aggregate over all label dimensions or preserve distinct dimensions using by or without.
    • https://blog.pvincent.io/2017/12/prometheus-blog-series-part-2-metric-types/
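`by` and `without` differ only in which labels survive the aggregation. This Python sketch applies `sum` to an instant vector represented as `(labels, value)` pairs, mirroring the `sum by(...)` and `sum without(...)` queries on the demo-queries slide; the instances, devices, and sizes are invented for the example.

```python
from collections import defaultdict


def aggregate_sum(vector, by=None, without=None):
    """sum by(...) / sum without(...) / plain sum over an instant vector.

    vector: list of (labels_dict, value). Give at most one of `by` / `without`;
    with neither, everything is summed into one series with no labels.
    """
    if by is not None and without is not None:
        raise ValueError("use either by or without, not both")
    groups = defaultdict(float)
    for labels, value in vector:
        if by is not None:
            key = {k: v for k, v in labels.items() if k in by}
        elif without is not None:
            # Aggregation results never keep the metric name (__name__)
            key = {k: v for k, v in labels.items() if k not in without and k != "__name__"}
        else:
            key = {}
        groups[tuple(sorted(key.items()))] += value
    return [(dict(k), v) for k, v in groups.items()]


name = {"__name__": "node_filesystem_size_bytes"}
vector = [
    ({**name, "instance": "a:9100", "device": "sda1", "mountpoint": "/"}, 100.0),
    ({**name, "instance": "a:9100", "device": "sdb1", "mountpoint": "/data"}, 300.0),
    ({**name, "instance": "b:9100", "device": "sda1", "mountpoint": "/"}, 50.0),
]
print(aggregate_sum(vector, by={"instance"}))                    # sum by(instance)
print(aggregate_sum(vector, without={"device", "mountpoint"}))   # same groups here
print(aggregate_sum(vector))                                     # sum(...)
```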
  • 24. Basic functions
    • PromQL has 46 functions, and the number is growing.
    • Most mathematical functions are available, as are time functions such as day_of_month(), day_of_week(), month(), year(), hour(), minute(), and time().
    • From a Prometheus monitoring perspective, the most commonly used are:
      • rate()
      • irate(): should only be used when graphing volatile, fast-moving counters.
      • increase()
      • label_join() / label_replace()
      • <aggregation>_over_time(): min_over_time, max_over_time, avg_over_time, sum_over_time, count_over_time
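rate() and irate() both work on counters, and both must cope with counter resets (a drop in value means the process restarted and the counter began again from zero). The Python sketch below shows that core idea with timestamps in seconds. It deliberately skips the extrapolation to the window edges that PromQL's rate() and increase() perform, so its numbers will not match Prometheus exactly.

```python
def _counter_increase(samples):
    """Sum of increases between consecutive samples, treating any drop as a counter reset."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarted at zero, so the whole new value is increase
        total += (cur - prev) if cur >= prev else cur
    return total


def simple_rate(samples):
    """Average per-second increase over [(ts_seconds, value), ...], oldest first."""
    if len(samples) < 2:
        return None
    return _counter_increase(samples) / (samples[-1][0] - samples[0][0])


def simple_increase(samples):
    """Total increase (PromQL's increase() is rate() multiplied by the range in seconds)."""
    return _counter_increase(samples) if len(samples) >= 2 else None


def simple_irate(samples):
    """Per-second rate from the last two samples only, as irate() does."""
    if len(samples) < 2:
        return None
    (t1, v1), (t2, v2) = samples[-2:]
    return ((v2 - v1) if v2 >= v1 else v2) / (t2 - t1)


# A requests counter scraped every 15 s; the process restarted between t=30 and t=45.
requests = [(0, 100), (15, 130), (30, 160), (45, 10), (60, 40)]
print(simple_increase(requests), simple_rate(requests), simple_irate(requests))
```

Because irate() looks only at the last two samples, it reacts immediately to spikes, which is why the slide recommends it only for graphing volatile counters.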
  • 25. Wow! Functions
    • delta()
    • holt_winters()
    • predict_linear()
    • clamp_max()
    • clamp_min()
    • histogram_quantile()
    • Holt-Winters: https://www.otexts.org/fpp/7/5
    • New Relic doc:
      • Averages unfortunately have the big drawback of hiding the distribution and preventing the discovery of outliers/deviations.
      • Quantiles are a better measurement for this kind of metric, as they show the distribution. For example, if the 0.5-quantile (50th percentile) of request latency is 100ms, 50% of requests completed in under 100ms. Similarly, if the 0.99-quantile (99th percentile) is 4s, 1% of requests took more than 4s.
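Two of the functions above are easy to approximate by hand. The Python sketch below estimates a quantile from cumulative histogram buckets by linear interpolation inside the bucket that contains the requested rank, as histogram_quantile() does, and fits a least-squares line for predict_linear(). Simplifications (assumptions): the lowest bucket is taken to start at 0, and predict_linear() here projects from the last sample rather than from the query's evaluation time.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (le, count) buckets sorted by le, last one +Inf."""
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    if len(buckets) < 2 or buckets[-1][1] == 0:
        return math.nan
    rank = q * buckets[-1][1]  # position of the wanted observation among all of them
    for i, (le, cumulative) in enumerate(buckets):
        if cumulative >= rank:
            break
    if math.isinf(le):
        return buckets[-2][0]  # rank lies in the +Inf bucket: highest finite bound
    lower_le, lower_count = (0.0, 0) if i == 0 else buckets[i - 1]
    in_bucket = cumulative - lower_count
    if in_bucket == 0:
        return lower_le
    # Assume observations are spread evenly inside the bucket and interpolate linearly
    return lower_le + (le - lower_le) * (rank - lower_count) / in_bucket


def predict_linear(samples, seconds_ahead):
    """Fit v = a + b*t by least squares; return the value seconds_ahead after the last sample."""
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / sum(
        (t - mean_t) ** 2 for t, _ in samples
    )
    intercept = mean_v - slope * mean_t
    return intercept + slope * (samples[-1][0] + seconds_ahead)


latency_buckets = [(0.1, 1), (0.5, 3), (1.0, 4), (math.inf, 5)]
print(histogram_quantile(0.5, latency_buckets))   # median estimate
print(histogram_quantile(0.99, latency_buckets))  # falls in the +Inf bucket

# Free disk space in GB, sampled every 60 s, shrinking by 6 GB per minute
free_gb = [(0, 100.0), (60, 94.0), (120, 88.0), (180, 82.0)]
print(predict_linear(free_gb, 4 * 3600))  # 4 h ahead; negative means the disk runs out first
```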
  • 26. Demo queries
    • max by(instance)(node_filesystem_size_bytes)
    • max without(device, fstype, mountpoint)(node_filesystem_size_bytes)
    • sum without(device, fstype, mountpoint)(node_filesystem_size_bytes)
    • sum(node_filesystem_size_bytes)
    • round(sum(node_filesystem_size_bytes)/1024/1024/1024)
    • round(sum by(instance, device)(node_filesystem_size_bytes)/1024/1024/1024)
    • rate(node_load1[5m])
    • rate(node_cpu_seconds_total{mode="system"}[5m])
    • min_over_time(node_load1[5m])
    • max_over_time(node_load1[5m])
    • avg_over_time(node_load1[5m])
    • sum_over_time(node_load1[5m])
    • count_over_time(node_load1[5m])
    • delta(node_hwmon_temp_celsius[1h])
    • clamp_max(node_load1,1.2)
    • clamp_min(clamp_max(node_load1,1.2),1.05)
    • predict_linear(node_load1[1h],4*3600)
    • quantile without(cpu)(0.9, rate(node_cpu_seconds_total{mode="system"}[5m]))
    • topk(3, sum by (mode) (node_cpu_seconds_total))
    • bottomk(3, sum by (le) (alertmanager_http_request_duration_seconds_bucket))
  • 27. Grafana demo
    • Download and install Grafana as described at https://grafana.com/grafana/download/beta
    • After installing, start the service, check its status, or stop it as follows (there are other ways too; see the installation guide for details):
      • sudo systemctl start grafana-server
      • sudo systemctl status grafana-server
      • sudo systemctl stop grafana-server
    • Open http://localhost:3000 and configure the login.
    • Configure Prometheus as a data source and import the Node Exporter dashboard: https://grafana.com/grafana/dashboards/1860
  • 28. Wow! What a Grafana dashboard does for us!
  • 29. Out of syllabus: triggers to look out for
    • Remote endpoints and storage: long-term storage.
    • Alertmanager webhook receivers (Gmail, etc.).
    • Prometheus concerns addressed by Cortex and Thanos: https://grafana.com/blog/2019/11/21/promcon-recap-two-households-both-alike-in-dignity-cortex-and-thanos/
    • Prometheus open bugs and fixes: https://github.com/prometheus/prometheus/issues?
    • Cloud monitoring: Nagios vs. Prometheus.
    • Google's mtail: extract Prometheus metrics from application logs.
    • Prometheus is a system to collect and process metrics, not an event logging system; for logs, the ELK stack is the answer.
  • 30. Study material: free and paid
    • Free:
      • https://prometheus.io/docs/introduction/overview/
      • https://promcon.io/2019-munich/stream/
      • Prometheus Monitoring: The Definitive Guide in 2019
      • A subreddit collecting all Prometheus-related resources on the internet
      • https://training.robustperception.io/ - Introduction to Prometheus
      • SoundCloud - What makes Prometheus a “next generation” monitoring system?
    • Paid:
      • Understanding PromQL by Robust Perception
      • Prometheus: Up & Running by O'Reilly
  • 31. Thanks for listening!
    • Be happy and make others happy. How? As given by my aasan:
      • Go below what you have
      • Dream above what you have
      • First love what you have
      • Spread info what you have
      • Get info what others have
      • Help as per what you have