#DOPPA17
Prometheus: Monitoring
Pravin Magdum
9th September 2017
As the author of this presentation, I/we own the copyright and confirm the originality of the content. I/we allow Agile Testing Alliance to use the content for social media
marketing and to publish it on the ATA blog or ATA social media channels (provided due credit is given to me/us).
Who Am I?
• DevOps evangelist @ Crevise Technology
• Developer turned DevOps evangelist
• 9+ years of development and project management experience in various technologies
• Love to resolve tech problems and debug issues
What is monitoring, and why do it?
• Continuously keep track of the status of the system
• Continuously keep track of deployed applications
• Get the earliest warning of failures, defects or problems so they can be fixed
• Trending over time - helps decide when to upgrade or downgrade infra resources
• To know when things go wrong
• If an issue persists, analyse the collected data to debug it and prevent it in future
• Black box monitoring
• White box monitoring
Black box monitoring
• Just like smoke testing
• Examples - ping, HTTP requests
• Checks whether the server is up and working, etc.
• When - when the system is broken, and to test from outside the network
• Won't tell you what is going on inside the machine
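A black box check of the "HTTP request" kind can be sketched in a few lines of Python. This is only an illustration using the standard library: the function name http_probe and the target URL are assumptions, not part of the original deck.

```python
import http.client
import urllib.error
import urllib.request


def http_probe(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with a 2xx/3xx status, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, DNS failure, timeout, HTTP 4xx/5xx, malformed URL...
        return False


if __name__ == "__main__":
    # Hypothetical target: a Prometheus server on its default port.
    print(http_probe("http://localhost:9090/"))
```

The probe only says whether the endpoint answered from the outside; it reports nothing about CPU, memory or other internals.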
White box monitoring
• Complementary to black box monitoring
• Tells you what is going on inside the system
• Examples - CPU usage, network usage
What to Monitor?
• First, understand what holds business value for you and your customers.
• CPU, memory, IO, storage - typical metrics
• Application monitoring - to run the application in a cluster depending on these metrics
• Predict resource utilization to avoid downtime.
Prometheus
• Inspired by Google's Borgmon monitoring system
• Mainly written in Go, publicly launched in 2015
• Open source monitoring and alerting system with an active ecosystem
• Used by Docker, DigitalOcean and CoreOS, to name a few
Prometheus Offers
• Multi-dimensional data model (time series data)
– No dotted strings like "doppa.pune"
– Key-value pairs {event="Doppa", city="pune"}
• Powerful queries - to leverage this dimensionality
• Precise alerting
• Pull model over HTTP
• Scalable
• Dashboards
• Efficient
– A single server can handle millions of metrics
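To see why key-value labels are more useful than a dotted string, here is a small Python sketch of label-based selection. It is a toy model, not how Prometheus stores data; the metric name attendees and the sample values are made up for illustration.

```python
from typing import Dict, List, NamedTuple


class Series(NamedTuple):
    name: str
    labels: Dict[str, str]
    value: float


def select(series: List[Series], name: str, **matchers: str) -> List[Series]:
    """Return series with the given metric name whose labels equal every matcher."""
    return [
        s for s in series
        if s.name == name
        and all(s.labels.get(key) == val for key, val in matchers.items())
    ]


# Hypothetical samples, reusing the event/city labels from the slide.
SAMPLES = [
    Series("attendees", {"event": "Doppa", "city": "pune"}, 120.0),
    Series("attendees", {"event": "Doppa", "city": "mumbai"}, 80.0),
    Series("attendees", {"event": "Other", "city": "pune"}, 40.0),
]

# Slice by any dimension without changing how the metric is named.
pune = select(SAMPLES, "attendees", city="pune")
doppa_total = sum(s.value for s in select(SAMPLES, "attendees", event="Doppa"))
```

With a name like "doppa.pune", the same questions would need string parsing or a separate metric per combination.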
Let's understand with a simple diagram
Components
• Prometheus server - scrapes and stores time series data
• Exporters - expose metrics from resources
• Alert rules - define the conditions that raise alerts
• Alertmanager - notifies different communication channels about alerts
Powerful Queries
• Can multiply, join, add, aggregate and predict in the same query
• Can evaluate current as well as historical data
• E.g.
– Which are the top 3 services consuming the most CPU, or more than 80%?
– Will my storage be full in the next 4 hours?
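The "full in the next 4 hours?" question is the kind of thing PromQL's predict_linear() function answers by fitting a straight line through recent samples. The Python sketch below shows the idea with an ordinary least-squares fit. It is a simplified model (it extrapolates from the newest sample, and the free-space numbers are invented), not the Prometheus implementation.

```python
from typing import List, Tuple


def predict_linear(samples: List[Tuple[float, float]], seconds_ahead: float) -> float:
    """Fit value = slope * t + intercept by least squares and
    extrapolate `seconds_ahead` past the newest sample."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must have distinct timestamps")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = cov / var_t
    intercept = mean_v - slope * mean_t
    newest_t = max(t for t, _ in samples)
    return slope * (newest_t + seconds_ahead) + intercept


# Hypothetical (timestamp in seconds, free space in GB) samples: 10 GB lost per hour.
FREE_SPACE = [(0.0, 100.0), (3600.0, 90.0), (7200.0, 80.0)]

free_in_4h = predict_linear(FREE_SPACE, 4 * 3600)
will_be_full = free_in_4h <= 0
```

An alert would fire when the predicted free space drops to zero or below within the chosen horizon.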
Some Query examples
• CPU: 100 - (avg by (instance) (irate(node_cpu{instance="node1:9100",job="node",mode="idle"}[1m])) * 100)
• Memory: node_memory_MemTotal{job="node",instance="node1:9100"}
  - node_memory_MemFree{job="node",instance="node1:9100"}
  - node_memory_Buffers{job="node",instance="node1:9100"}
  - node_memory_Cached{job="node",instance="node1:9100"}
• Disk write (KiB/s): irate(node_disk_bytes_written[60s]) / 1024
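The CPU query relies on irate(), which turns an ever-increasing counter into a per-second rate using the last two samples in the range. Here is a minimal Python sketch of that calculation for a single core. The function names and sample values are illustrative assumptions, and the "avg by (instance)" step across cores is left out.

```python
from typing import List, Tuple


def irate(samples: List[Tuple[float, float]]) -> float:
    """Per-second rate of a counter from its last two (timestamp, value) samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    if t1 <= t0:
        raise ValueError("timestamps must be increasing")
    # A counter that went down is treated as having been reset to zero.
    increase = v1 - v0 if v1 >= v0 else v1
    return increase / (t1 - t0)


def cpu_usage_percent(idle_seconds: List[Tuple[float, float]]) -> float:
    """100 minus the idle percentage, mirroring the CPU query on the slide."""
    return 100 - irate(idle_seconds) * 100


# Hypothetical idle-time counter: 12 idle seconds accumulated over a 15 s scrape interval.
IDLE = [(0.0, 1000.0), (15.0, 1012.0)]
usage = cpu_usage_percent(IDLE)
```

Because the idle counter counts seconds, its per-second rate is the idle fraction, so a core that was idle for 12 of 15 seconds comes out at roughly 20% usage.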
Out-of-the-box feature
• The textfile collector is similar to the Pushgateway in that it allows exporting statistics from batch jobs and shell scripts.
• Useful for metrics that node exporter does not export itself
• You can still get such metrics into Prometheus with the help of the textfile collector
• Produce output that is compatible with the Prometheus text output format
• Or write your own exporters to feed Prometheus
• ./node_exporter --collector.textfile.directory=Metrics
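A batch job can feed the textfile collector by writing a small file into the configured directory. The Python sketch below renders one gauge in the Prometheus text format and writes it atomically (temporary file, then rename) so a half-written file is never scraped. The metric name, label and file name are hypothetical; the collector only picks up files ending in .prom.

```python
import os
import tempfile
from typing import Dict


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, help_text: str, value: float, labels: Dict[str, str]) -> str:
    """Render a single gauge sample with its HELP and TYPE lines."""
    label_text = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
    )
    selector = f"{{{label_text}}}" if label_text else ""
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{name}{selector} {value}\n"
    )


def write_prom_file(directory: str, filename: str, content: str) -> str:
    """Write content to directory/filename via a temp file and an atomic rename."""
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; the exporter must be able to read it
    final_path = os.path.join(directory, filename)
    os.replace(tmp_path, final_path)
    return final_path


if __name__ == "__main__":
    # "Metrics" matches the directory passed to node_exporter on the slide.
    text = render_gauge(
        "backup_last_success_timestamp_seconds",  # hypothetical metric name
        "Time of the last successful backup.",
        1504944000,
        {"job": "nightly"},
    )
    print(write_prom_file("Metrics", "backup.prom", text))
```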
Running Prometheus
• Download Prometheus from https://prometheus.io/download/#prometheus
• Extract and run - done.
• Let's hit http://localhost:9090
• Let's see it in action
Node Exporter installation
• Again, two steps
• Go to https://prometheus.io/download/#node_exporter
• Download node exporter, extract and run - done
• Exports metrics on port 9100
• Let's hit http://localhost:9100
Configuration
• It's time to tell Prometheus to pull metrics from node exporter
• Edit prometheus.yml - the configuration file for Prometheus
• scrape_interval - 15 sec
• scrape_configs: what we are scraping
• targets: IPs/hostnames of the nodes to monitor
• labels: logical grouping of hosts
Sample configuration file
global:
  scrape_interval: 15s
rule_files:
  - 'CriticalAlert.Rules'
scrape_configs:
  - job_name: node
    static_configs:
      - targets:
          - "IP:9100"
        labels:
          group: 'QA-Env'
Reload Prometheus
curl -X POST http://localhost:9090/-/reload
# The curl command above reloads the Prometheus server with the new configuration, without a restart
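The same reload can be triggered from a deployment script. Here is a minimal Python sketch using only the standard library. The function names are assumptions; note that Prometheus 2.x only accepts this request when started with --web.enable-lifecycle (sending SIGHUP to the process is the other way to reload).

```python
import urllib.request


def build_reload_request(base_url: str = "http://localhost:9090") -> urllib.request.Request:
    """Build the POST request for the /-/reload endpoint."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/-/reload", data=b"", method="POST"
    )


def reload_prometheus(base_url: str = "http://localhost:9090", timeout: float = 5.0) -> int:
    """Send the reload request and return the HTTP status code.

    Raises urllib.error.URLError (or HTTPError) if the server is
    unreachable or rejects the request.
    """
    with urllib.request.urlopen(build_reload_request(base_url), timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    print(reload_prometheus())
```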
Alert rules and Alertmanager
• Define alert rules with powerful PromQL queries
• Predict linear changes on nodes
• Send alerts on your choice of communication channel
• e.g. Slack, PagerDuty, email, SMS, etc.
Setting up Alertmanager
• Alertmanager can be configured to send Prometheus alerts to your mailbox or Slack, trigger automated calls in critical situations, etc.
• Download Alertmanager from https://prometheus.io/download/#alertmanager
• Extract and configure
• Run
Alert rule - is the instance up and running?
ALERT InstanceDown
  IF up == 0
  FOR 10m
  LABELS { severity = "CRITICAL" }
  ANNOTATIONS {
    summary = "Instance down",
    description = "{{ $labels.group }}-{{ $labels.instance }} - instance has been down for more than 10 minutes."
  }
Alert rule - is my CPU usage going beyond 75%?
ALERT NodeCPUUsage
  IF (100 - (avg by (instance) (irate(node_cpu{job="node",mode="idle"}[1m])) * 100)) > 75
  FOR 2m
  LABELS { severity = "CRITICAL" }
  ANNOTATIONS {
    summary = "{{ $labels.group }}-{{ $labels.instance }}: High CPU usage detected",
    description = "CPU usage is above 75% (current value is: {{ $value }})"
  }
Alert rule - is storage more than 80% full?
ALERT filesystem_threshold_exceeded
  IF 100 * (1 - (node_filesystem_free{mountpoint="/"} / node_filesystem_size{mountpoint="/"})) > 80
  LABELS { severity = "CRITICAL" }
  ANNOTATIONS {
    summary = "{{ $labels.group }}-{{ $labels.instance }}: High filesystem usage detected",
    description = "This device's filesystem usage has exceeded the threshold with a value of {{ $value }}."
  }
alertmanager.yml
Alertmanager configuration file
global:
route:
  repeat_interval: 4h
  routes:
    - receiver: email-QA
      match:
        group: 'QA-trad'
    - receiver: email-prod
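The routing tree above can be read as "first matching child route wins". The Python sketch below models only that part of the behaviour, with exact label matching and no grouping, timing or continue handling. The receiver names and the group label come from the slide; everything else is a simplification for illustration.

```python
from typing import Dict, List, Optional

# Child routes in the order they appear in the configuration.
# A route with no "match" entries accepts every alert, so it acts as a catch-all.
ROUTES: List[Dict] = [
    {"receiver": "email-QA", "match": {"group": "QA-trad"}},
    {"receiver": "email-prod", "match": {}},
]


def pick_receiver(routes: List[Dict], alert_labels: Dict[str, str]) -> Optional[str]:
    """Return the receiver of the first route whose matchers all equal the alert's labels."""
    for route in routes:
        if all(alert_labels.get(key) == val for key, val in route["match"].items()):
            return route["receiver"]
    return None


qa_receiver = pick_receiver(ROUTES, {"group": "QA-trad", "severity": "CRITICAL"})
other_receiver = pick_receiver(ROUTES, {"group": "Prod", "severity": "CRITICAL"})
```

Order matters: if the catch-all route were listed first, the QA route would never be reached.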
Receivers
receivers:
  - name: "email-prod"
    email_configs:
      - to: 'Prodsupport@crevise.com'
        from: 'no-reply@crevise.com'
        smarthost: 'smtp.office365.com:587'
        auth_username: 'no-reply@crevise.com'
        auth_identity: 'no-reply@crevise.com'
        auth_password: 'fXXXX'
Slack Alerts
Email Alerts
Set up Grafana
• Download the tarball from the official site, extract it and run the binary
• Check the URL http://<Server-IP>:3000
• Add Prometheus as a data source:
– Go to http://<Server-IP>:3000
– Enter the username admin and password admin, and then click "Log In"
– Click "Data Sources" in the left menu
– Click "Add new" in the top menu
Grafana Dashboard
Thank you !!
Questions ?
Reachable at
pravin.magdum@crevise.com
Twitter - @pravin_magdum

 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 

Recently uploaded (20)

Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 

Monitoring with Prometheus

  • 2. Who am I?
      • DevOps evangelist @ Crevise Technology
      • Developer turned DevOps evangelist
      • 9+ years of development and project management experience in various technologies
      • Love to resolve tech problems and debug issues
  • 3. What is monitoring, and why do it?
      • Continuously keep track of the status of the system
      • Continuously keep track of deployed applications
      • Get the earliest warning of failures, defects, or problems so they can be fixed
      • See trends over time, which helps when upgrading or downgrading infrastructure resources
      • Know when things go wrong
      • If an issue persists, analyse the collected data to debug it and prevent it in future
      • Black box monitoring
      • White box monitoring
  • 4. Black box monitoring
      • Just like smoke testing
      • Examples: ping, HTTP requests
      • Checks whether the server is up and working
      • When to use: when the system is broken, and to test from outside the network
      • Gives no information about what is going on inside the machine
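A black box check can be as small as a TCP connect attempt from outside the host. The sketch below is a simpler cousin of the ping and HTTP checks named on the slide, not something taken from the deck; the function name `port_is_open` and the default timeout are assumptions made for illustration.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, DNS failure, and timeout.
        return False
```

A result of `False` only says that the service is unreachable from where the check runs; it says nothing about why, which is the limitation the slide points out.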
  • 5. White box monitoring
      • Complementary to black box monitoring
      • Gives information about what is going on inside the system
      • Examples: CPU usage, network usage
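By contrast, a white box check runs on the machine itself and reads its internal state. The following is a minimal sketch using only the Python standard library; the function names and the choice of readings are assumptions, and a real setup would rely on an exporter rather than a script like this.

```python
import os
import shutil


def disk_used_percent(path: str = "/") -> float:
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def collect_local_metrics(path: str = "/") -> dict:
    """Gather a few readings that are only visible from inside the host."""
    metrics = {
        "disk_used_percent": disk_used_percent(path),
        "cpu_count": os.cpu_count() or 0,
    }
    # os.getloadavg() exists on Unix-like systems only and may fail there too.
    if hasattr(os, "getloadavg"):
        try:
            metrics["load_1m"] = os.getloadavg()[0]
        except OSError:
            pass
    return metrics
```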
  • 6. What to monitor?
      • First, understand what holds business value for you and your customers
      • Typical metrics: CPU, memory, I/O, storage
      • Application monitoring: to run the application in a cluster depending on these metrics
      • Predict resource utilization to avoid downtime
  • 7. Prometheus
      • Inspired by Google's Borgmon monitoring system
      • Mainly written in Go; publicly launched in 2015
      • Open source monitoring and alerting system with an active ecosystem
      • Used by Docker, DigitalOcean, and CoreOS, to name a few
  • 8. Prometheus offers
      • Multi-dimensional data model (time series data)
          – No dotted strings like "doppa.pune"
          – Key-value pairs: {event="Doppa", city="pune"}
      • Powerful queries that leverage this dimensionality
      • Precise alerting
      • Pull model over HTTP
      • Scalable
      • Dashboards
      • Efficient
          – A single server can handle millions of metrics
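To make the difference between dotted names and key-value labels concrete, here is a small sketch of a label-indexed store. It illustrates the idea only and is not how Prometheus stores data; the metric name `talk_attendees` and all values are hypothetical.

```python
def make_series(name, **labels):
    """Identify a series by its metric name plus an unordered set of labels."""
    return (name, frozenset(labels.items()))


def select(store, name, **matchers):
    """Return (labels, value) pairs for series whose labels include every matcher."""
    wanted = set(matchers.items())
    result = []
    for (series_name, labels), value in store.items():
        if series_name == name and wanted <= labels:
            result.append((dict(labels), value))
    return result


# Hypothetical data: one metric, sliced by two independent dimensions.
store = {
    make_series("talk_attendees", event="Doppa", city="pune"): 120.0,
    make_series("talk_attendees", event="Doppa", city="mumbai"): 80.0,
    make_series("talk_attendees", event="Other", city="pune"): 40.0,
}

pune_total = sum(value for _, value in select(store, "talk_attendees", city="pune"))
```

With a dotted name such as "doppa.pune", selecting every series for one city regardless of event would need string parsing; with labels it is a plain equality match on one dimension.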
  • 9. Let's understand with a simple diagram
  • 10. Components
      • Prometheus server: scrapes and stores time series data
      • Exporters: expose metrics from resources
      • Alert rules: define alert conditions
      • Alertmanager: notifies about alerts on different communication channels
  • 11. Powerful queries
      • Can multiply, join, add, aggregate, and predict in the same query
      • Can evaluate current as well as historical data
      • For example:
          – Which are the top 3 services consuming the most CPU, or more than 80%?
          – Will my storage be full in the next 4 hours?
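The "will my storage be full in the next 4 hours?" question comes down to fitting a straight line through recent samples and extrapolating. The sketch below does that with ordinary least squares, in the spirit of PromQL's `predict_linear`. It is an approximation under stated assumptions: it extrapolates from the last sample's timestamp, and the function names and sample data are made up for illustration.

```python
def predict_linear(samples, seconds_ahead):
    """Least-squares extrapolation of (timestamp, value) samples.

    Returns the predicted value `seconds_ahead` seconds after the last sample.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    intercept = mean_v - slope * mean_t
    last_t = samples[-1][0]
    return intercept + slope * (last_t + seconds_ahead)


def will_fill_within(free_bytes_samples, seconds_ahead):
    """True if the fitted trend reaches zero free bytes within the horizon."""
    return predict_linear(free_bytes_samples, seconds_ahead) <= 0


# Hypothetical readings of free bytes, one per hour.
free_bytes = [(0, 1000.0), (3600, 800.0), (7200, 600.0)]
full_in_4h = will_fill_within(free_bytes, 4 * 3600)
```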
  • 12. Some query examples
      • CPU:
          100 - (avg by (instance) (irate(node_cpu{instance="node1:9100",job="node",mode="idle"}[1m])) * 100)
      • Memory:
          node_memory_MemTotal{job="node",instance="node1:9100"}
            - node_memory_MemFree{job="node",instance="node1:9100"}
            - node_memory_Buffers{job="node",instance="node1:9100"}
            - node_memory_Cached{job="node",instance="node1:9100"}
      • Disk write:
          irate(node_disk_bytes_written[60s]) / 1024
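The CPU expression reads more easily once the arithmetic is spelled out: take the per-second rate of the idle counter from its last two samples, average it across CPUs, and subtract from 100. A rough sketch of that calculation follows; it ignores counter resets, and the function names and sample values are hypothetical.

```python
def irate(samples):
    """Per-second rate computed from the last two (timestamp, value) samples."""
    (t1, v1), (t2, v2) = samples[-2], samples[-1]
    return (v2 - v1) / (t2 - t1)


def cpu_usage_percent(idle_seconds_by_cpu):
    """100 minus the average idle fraction across CPUs, as a percentage."""
    rates = [irate(samples) for samples in idle_seconds_by_cpu.values()]
    return 100.0 - (sum(rates) / len(rates)) * 100.0


# Hypothetical idle-time counters (seconds) sampled 60 seconds apart.
idle = {
    "cpu0": [(0, 100.0), (60, 130.0)],  # idle for 30 of the 60 seconds
    "cpu1": [(0, 200.0), (60, 245.0)],  # idle for 45 of the 60 seconds
}
usage = cpu_usage_percent(idle)
```

Here the two CPUs were idle 50% and 75% of the time, so the average idle fraction is 0.625 and the reported usage is 37.5%.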
  • 13. Out-of-the-box feature: textfile collector
      • The textfile collector is similar to the Pushgateway in that it allows exporting statistics from batch jobs and shell scripts
      • Some metrics are not exported by node exporter; you can still get such metrics into Prometheus with the help of the textfile collector
      • Produce output that is compatible with the Prometheus text output format
      • Write your own exporters to feed Prometheus
      • ./node_exporter --collector.textfile.directory=Metrics
  • 14. Running Prometheus
      • Download Prometheus: https://prometheus.io/download/#prometheus
      • Extract and run. Done.
      • Open http://localhost:9090
      • Let's see it in action
  • 15. Node exporter installation
      • Again, two steps
      • Go to https://prometheus.io/download/#node_exporter and download node exporter
      • Extract and run. Done.
      • Exports metrics on port 9100
      • Open http://localhost:9100
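What node exporter serves on port 9100 is plain text, one sample per line, under the `/metrics` path. The sketch below fetches and parses that output in a deliberately simplified way: it assumes each sample line ends with a single numeric value and carries no trailing timestamp. The function names and the sample text are illustrative.

```python
import urllib.request


def parse_metrics(text):
    """Map each series (name plus labels, as written) to its float value."""
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def fetch_metrics(url="http://localhost:9100/metrics", timeout=5.0):
    """Download the exposition text from an exporter and parse it."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))


# Made-up excerpt in the shape node exporter produces.
example = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_cpu{cpu="cpu0",mode="idle"} 1234.5
"""
parsed = parse_metrics(example)
```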
  • 16. Configuration
      • It's time to tell Prometheus to pull metrics from node exporter
      • Edit prometheus.yml, the configuration file for Prometheus
      • scrape_interval: 15 seconds
      • scrape_configs: what we are scraping
      • targets: IP addresses or hostnames of the nodes to monitor
      • labels: logical grouping of hosts
  • 17. Sample configuration file
      global:
        scrape_interval: 15s
      rule_files:
        - 'CriticalAlert.Rules'
      scrape_configs:
        - job_name: node
          static_configs:
            - targets:
                - "IP:9100"
              labels:
                group: 'QA-Env'
  • 18. Reload Prometheus
      curl -X POST http://localhost:9090/-/reload
      # The curl command above reloads the Prometheus server with the new configuration, without a restart
  • 19. Alert rules and Alertmanager
      • Define alert rules with powerful PromQL queries
      • Predict linear changes on nodes
      • Send alerts on the communication channel of your choice
      • e.g. Slack, PagerDuty, email, SMS
  • 20. Setting up Alertmanager
      • Alertmanager can be configured to send Prometheus alerts to your mailbox or Slack, or to trigger automated calls in critical situations
      • Download Alertmanager from https://prometheus.io/download/#alertmanager
      • Extract and configure
      • Run
  • 21. Alert rule: is the instance up and running?
      ALERT InstanceDown
        IF up == 0
        FOR 10m
        LABELS { severity = "CRITICAL" }
        ANNOTATIONS {
          summary = "Instance down",
          description = "{{ $labels.group }}-{{ $labels.instance }} - instance has been down for more than 10 minutes."
        }
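The FOR clause means the condition has to hold continuously for the stated duration before the alert fires; until then it is only pending. The sketch below walks a series of `up` samples through that inactive, pending, firing progression. It is a simplified model of the rule's behaviour, not Prometheus code, and the function name and sample data are assumptions.

```python
def alert_states(samples, for_seconds):
    """Classify each (timestamp, up) sample as inactive, pending, or firing."""
    states = []
    active_since = None  # timestamp at which 'up == 0' first became true
    for timestamp, up in samples:
        if up == 0:
            if active_since is None:
                active_since = timestamp
            held_for = timestamp - active_since
            state = "firing" if held_for >= for_seconds else "pending"
        else:
            # Any healthy sample resets the clock.
            active_since = None
            state = "inactive"
        states.append((timestamp, state))
    return states


# Hypothetical samples, five minutes apart; the instance is down from t=300 to t=900.
history = [(0, 1), (300, 0), (600, 0), (900, 0), (1200, 1)]
states = alert_states(history, for_seconds=600)
```

With `FOR 10m` (600 seconds), the outage that starts at t=300 is pending at t=300 and t=600, fires at t=900, and clears at t=1200.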
  • 22. Alert rule: is CPU usage going beyond 75%?
      ALERT NodeCPUUsage
        IF (100 - (avg by (instance) (irate(node_cpu{job="node",mode="idle"}[1m])) * 100)) > 75
        FOR 2m
        LABELS { severity = "CRITICAL" }
        ANNOTATIONS {
          summary = "{{ $labels.group }}-{{ $labels.instance }}: High CPU usage detected",
          description = "CPU usage is above 75% (current value is: {{ $value }})"
        }
  • 23. Alert rule: is storage more than 80% full?
      ALERT filesystem_threshold_exceeded
        IF 100 * (1 - (node_filesystem_free{mountpoint="/"} / node_filesystem_size{mountpoint="/"})) > 80
        LABELS { severity = "CRITICAL" }
        ANNOTATIONS {
          summary = "{{ $labels.group }}-{{ $labels.instance }} High filesystem usage is detected",
          description = "This device's filesystem usage has exceeded the threshold with a value of {{ $value }}."
        }
  • 24. AlertManager.yml: Alertmanager configuration file
      global:
      route:
        repeat_interval: 4h
        routes:
          - receiver: email-QA
            match:
              group: 'QA-trad'
          - receiver: email-prod
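The routing block above is evaluated top to bottom: the first child route whose `match` labels all equal the alert's labels wins, and a route with no `match` catches everything that reaches it. A minimal sketch of that selection follows. It deliberately leaves out Alertmanager features such as `continue`, regular-expression matching, nesting, and grouping, and the function name is hypothetical.

```python
def pick_receiver(alert_labels, routes, default_receiver=None):
    """Return the receiver of the first route whose match labels all agree."""
    for route in routes:
        match = route.get("match", {})
        if all(alert_labels.get(key) == value for key, value in match.items()):
            return route["receiver"]
    return default_receiver


# The two child routes from the slide, written as Python data.
ROUTES = [
    {"receiver": "email-QA", "match": {"group": "QA-trad"}},
    {"receiver": "email-prod"},  # no match block: acts as the catch-all
]

qa_receiver = pick_receiver({"group": "QA-trad", "severity": "CRITICAL"}, ROUTES)
other_receiver = pick_receiver({"group": "payments"}, ROUTES)
```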
  • 25. Receivers
      receivers:
        - name: "email-prod"
          email_configs:
            - to: 'Prodsupport@crevise.com'
              from: 'no-reply@crevise.com'
              smarthost: 'smtp.office365.com:587'
              auth_username: 'no-reply@crevise.com'
              auth_identity: 'no-reply@crevise.com'
              auth_password: 'fXXXX'
  • 26. Slack Alerts
  • 27. Email Alerts
  • 28. Set up Grafana
      • Download the tar archive from the official site, extract it, and run the binary
      • Check the URL http://<Server-IP>:3000
      • Add Prometheus as a data source:
          – Go to http://<Server-IP>:3000
          – Enter the username admin and password admin, then click "Log In"
          – Click "Data Sources" in the left menu
          – Click "Add new" in the top menu
  • 30. Thank you! Questions?
      • Reachable at pravin.magdum@crevise.com
      • Twitter: @pravin_magdum