Agile Lean Ireland - Workshop - Cloud native monitoring with Prometheus


Note that the provided environments will not be available outside the workshop; you can follow the instructions at https://github.com/PierreVincent/prometheus-workshop to run the environment yourself.

In the world of cloud native and distributed applications, Prometheus has quickly risen to be one of the leading open-source monitoring tools. In this workshop, you will learn what you need to get started with Prometheus for monitoring a service-oriented architecture.

We will cover:
- The core concepts of Prometheus
- Instrumenting your code to expose metrics
- Querying Prometheus to gain insights on how your applications behave
- Defining rules to trigger alerts based on metrics and thresholds
- Building Grafana dashboards combining multiple metrics


Cloud Native Monitoring with Prometheus & Grafana
April 26th, 2019 – Dublin
@PierreVincent pvincent.io

Reaching production is only the beginning

Pierre Vincent
Infrastructure & Reliability Manager
@PierreVincent
pvincent.io

Workshop Overview
Slides - Metrics & Prometheus basics
Part 1 - Intro to Prometheus UI and Queries
Part 2 - Building Grafana Dashboards
Part 3 - Creating Prometheus Alerts
Part 4 - Instrumenting Code (Golang)

Metrics
System metrics: CPU usage
Application metrics: Error rates
Business metrics: Customer conversions

“Cloud Native” changes the game
Monolithic architectures → Loosely-coupled architectures
Long-running instances → Short-lived instances
Long-running servers → Short/Medium-lived servers
Microservices
Auto-scaling deployments
Multiple deploys/day
Cloud VMs
Auto-scaling clusters
SOA

Overview
Prometheus scrapes the /metrics endpoint of each target:
Servers / VMs
Appliances/Infra
Services

Scraping for samples
User Service /metrics:
    # HELP http_requests_total Total number of http requests by response status code
    # TYPE http_requests_total counter
    http_requests_total{endpoint="/login",status="200"} 1584
    http_requests_total{endpoint="/login",status="500"} 9
    ...
Every scrape interval, each value results in a sample, which is persisted in the TSDB:
    metric: http_requests_total
    labels: endpoint=/login, status=200
    timestamp: 1519205931
    value: 1584
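To make the step from an exposition line to a stored sample concrete, here is a minimal Go sketch using only the standard library. The `Sample` type and `parseLine` function are hypothetical names introduced for illustration, not part of Prometheus. The parser is deliberately simplified: it assumes label values contain no commas, braces or escaped quotes, and that the line carries no explicit timestamp, so the scrape time is passed in.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Sample mirrors the fields shown on the slide: metric name, labels,
// scrape timestamp and value. (Hypothetical type, for illustration only.)
type Sample struct {
	Metric    string
	Labels    map[string]string
	Timestamp int64 // Unix seconds, assigned at scrape time
	Value     float64
}

// parseLine turns one exposition line into a Sample.
// Simplifying assumptions: no commas, braces or escaped quotes inside
// label values, and no timestamp on the line itself.
func parseLine(line string, scrapedAt int64) (Sample, error) {
	s := Sample{Labels: map[string]string{}, Timestamp: scrapedAt}
	line = strings.TrimSpace(line)
	sp := strings.LastIndex(line, " ")
	if sp < 0 {
		return s, fmt.Errorf("no value in %q", line)
	}
	v, err := strconv.ParseFloat(line[sp+1:], 64)
	if err != nil {
		return s, err
	}
	s.Value = v
	series := strings.TrimSpace(line[:sp])
	brace := strings.Index(series, "{")
	if brace < 0 {
		// Series without labels, e.g. "up 1".
		s.Metric = series
		return s, nil
	}
	if !strings.HasSuffix(series, "}") {
		return s, fmt.Errorf("unterminated label set in %q", line)
	}
	s.Metric = series[:brace]
	body := series[brace+1 : len(series)-1]
	if body == "" {
		return s, nil
	}
	for _, pair := range strings.Split(body, ",") {
		k, val, ok := strings.Cut(pair, "=")
		if !ok {
			return s, fmt.Errorf("bad label %q", pair)
		}
		s.Labels[strings.TrimSpace(k)] = strings.Trim(strings.TrimSpace(val), `"`)
	}
	return s, nil
}

func main() {
	s, err := parseLine(`http_requests_total{endpoint="/login",status="200"} 1584`, 1519205931)
	if err != nil {
		panic(err)
	}
	fmt.Println(s.Metric, s.Labels["endpoint"], s.Labels["status"], s.Timestamp, s.Value)
}
```

Comment lines starting with `#` (HELP and TYPE) are not handled here; a real scraper would use them as metadata rather than as samples.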

Our example
http-simulator exposes /metrics with:
● http_requests_total
● http_request_duration_milliseconds
● standard Go metrics

Option 1: Deploy on your own cluster
See instructions in kubernetes/install
OR
Option 2: Use pre-deployed setup
prometheus.prom-workshop.pvincent.io
grafana.prom-workshop.pvincent.io

http://grafana.prom-workshop.pvincent.io
PierreVincent/prometheus-workshop
http://prometheus.prom-workshop.pvincent.io
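As a rough idea of what a service like http-simulator does to expose `http_requests_total` on /metrics, here is a minimal Go sketch that uses only the standard library. It is not the workshop's code and it is not the Prometheus Go client library; `requestCounter`, `newRequestCounter`, `Inc` and `Render` are hypothetical names, and the port in the commented-out server lines is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
	"sort"
	"strings"
	"sync"
)

// requestCounter is a hypothetical, stdlib-only stand-in for a labelled counter.
type requestCounter struct {
	mu     sync.Mutex
	counts map[[2]string]float64 // key: {endpoint, status}
}

func newRequestCounter() *requestCounter {
	return &requestCounter{counts: map[[2]string]float64{}}
}

// Inc records one request for the given endpoint and status code.
func (c *requestCounter) Inc(endpoint, status string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[[2]string{endpoint, status}]++
}

// Render returns the counter in the text exposition format,
// with series sorted by endpoint and then status for stable output.
func (c *requestCounter) Render() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	keys := make([][2]string, 0, len(c.counts))
	for k := range c.counts {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		if keys[i][0] != keys[j][0] {
			return keys[i][0] < keys[j][0]
		}
		return keys[i][1] < keys[j][1]
	})
	var b strings.Builder
	b.WriteString("# HELP http_requests_total Total number of http requests by response status code\n")
	b.WriteString("# TYPE http_requests_total counter\n")
	for _, k := range keys {
		fmt.Fprintf(&b, "http_requests_total{endpoint=%q,status=%q} %g\n", k[0], k[1], c.counts[k])
	}
	return b.String()
}

// ServeHTTP lets the counter be mounted directly as the /metrics handler.
func (c *requestCounter) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4")
	fmt.Fprint(w, c.Render())
}

func main() {
	c := newRequestCounter()
	c.Inc("/login", "200")
	c.Inc("/login", "200")
	c.Inc("/login", "500")
	fmt.Print(c.Render())
	// To serve it for Prometheus to scrape (blocks, so left commented out;
	// the port is an assumption):
	// http.Handle("/metrics", c)
	// http.ListenAndServe(":8080", nil)
}
```

In practice the official client library handles registration, label escaping and the standard Go runtime metrics; the sketch only shows the shape of the output a scrape receives.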

Exercises 1 - Counters & Rates
● What's the overall request rate (with a 1-minute rolling window) for the http-simulator service?
    sum(rate(http_requests_total{app="http-simulator"}[1m]))
● How many requests per minute are errors?
    60 * sum(rate(http_requests_total{app="http-simulator", status="500"}[1m]))
● What's the error rate (in %) of requests to the /users endpoint?
    100 * sum(rate(http_requests_total{app="http-simulator", endpoint="/users", status="500"}[1m]))
        / sum(rate(http_requests_total{app="http-simulator", endpoint="/users"}[1m]))
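The arithmetic behind these queries can be sketched in a few lines of Go. This is a simplified approximation of what `rate()` computes, not Prometheus's implementation: it divides the counter increase by the time between the first and last sample in the window and treats any decrease as a counter reset, but it skips the extrapolation to the window boundaries that Prometheus applies. `Point`, `simpleRate` and the sample values are made up for illustration.

```go
package main

import "fmt"

// Point is one scraped counter sample (hypothetical type).
type Point struct {
	T float64 // Unix seconds
	V float64 // counter value
}

// simpleRate approximates rate(): the per-second increase across the points
// in a window. A decrease between two samples is treated as a counter reset.
// Unlike Prometheus, it does not extrapolate to the window boundaries.
func simpleRate(pts []Point) float64 {
	if len(pts) < 2 {
		return 0
	}
	increase := 0.0
	for i := 1; i < len(pts); i++ {
		d := pts[i].V - pts[i-1].V
		if d < 0 {
			// Counter reset: the counter restarted from zero.
			d = pts[i].V
		}
		increase += d
	}
	elapsed := pts[len(pts)-1].T - pts[0].T
	if elapsed <= 0 {
		return 0
	}
	return increase / elapsed
}

func main() {
	// Made-up samples scraped every 15s over a 1-minute window.
	all := []Point{{0, 1000}, {15, 1150}, {30, 1300}, {45, 1450}, {60, 1600}}
	errs := []Point{{0, 9}, {15, 12}, {30, 15}, {45, 18}, {60, 21}}

	total := simpleRate(all)   // increase of 600 over 60s
	failed := simpleRate(errs) // increase of 12 over 60s

	fmt.Printf("%.2f req/s, %.1f errors/min, %.1f%% errors\n",
		total, 60*failed, 100*failed/total)
}
```

Multiplying the per-second rate by 60 gives errors per minute, and dividing the error rate by the total rate gives the error percentage, which is the same structure as the second and third queries.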

Exercises 2 - Latency distribution
● What is the median latency of all requests to the http-simulator service?
    histogram_quantile(0.5, sum by (le) (rate(http_request_duration_milliseconds_bucket{app="http-simulator"}[5m])))
  Summing by le merges all series into one histogram before the quantile is taken; without it, the query returns one median per series.
● Does the /users endpoint fulfill the SLO of three nines (99.9%) of requests responding within 400ms?
    sum(http_request_duration_milliseconds_bucket{app="http-simulator", status="200", endpoint="/users", le="400"})
        / sum(http_request_duration_milliseconds_count{app="http-simulator", status="200", endpoint="/users"})
  The result is a ratio; the SLO is met when it is at least 0.999.
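To show how a quantile is estimated from cumulative buckets, here is a small Go sketch of the linear interpolation that `histogram_quantile` performs inside the bucket containing the target rank, together with the bucket-over-count ratio used for the SLO check. It is a simplified illustration rather than Prometheus's code: it assumes buckets are already sorted with `+Inf` last, and `Bucket`, `quantile`, `fractionWithin`, `exampleBuckets` and the counts are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// Bucket is one cumulative histogram bucket: Count observations were <= Le.
type Bucket struct {
	Le    float64
	Count float64
}

// quantile estimates the q-quantile (0 < q <= 1) from cumulative buckets
// sorted by ascending Le, the last one being +Inf. It interpolates linearly
// inside the bucket that contains the target rank.
func quantile(q float64, buckets []Bucket) float64 {
	n := len(buckets)
	if n < 2 || q <= 0 || q > 1 || !math.IsInf(buckets[n-1].Le, 1) {
		return math.NaN()
	}
	total := buckets[n-1].Count
	if total == 0 {
		return math.NaN()
	}
	rank := q * total
	i := 0
	for i < n-1 && buckets[i].Count < rank {
		i++
	}
	if i == n-1 {
		// Rank falls in the +Inf bucket: report the highest finite bound.
		return buckets[n-2].Le
	}
	lower, below := 0.0, 0.0 // the first bucket is assumed to start at 0
	if i > 0 {
		lower, below = buckets[i-1].Le, buckets[i-1].Count
	}
	return lower + (buckets[i].Le-lower)*(rank-below)/(buckets[i].Count-below)
}

// fractionWithin returns the share of observations at or below the bucket
// bound le, or NaN if there is no such bucket or no observations.
func fractionWithin(le float64, buckets []Bucket) float64 {
	if len(buckets) == 0 {
		return math.NaN()
	}
	total := buckets[len(buckets)-1].Count
	for _, b := range buckets {
		if b.Le == le && total > 0 {
			return b.Count / total
		}
	}
	return math.NaN()
}

// exampleBuckets returns made-up cumulative counts, bounds in milliseconds.
func exampleBuckets() []Bucket {
	return []Bucket{{100, 40}, {200, 80}, {400, 99}, {math.Inf(1), 100}}
}

func main() {
	b := exampleBuckets()
	fmt.Println(quantile(0.5, b))                // 125
	fmt.Println(fractionWithin(400, b) >= 0.999) // false
}
```

With these made-up counts, the median lands a quarter of the way into the 100-200ms bucket, and 99% of requests finish within 400ms, which falls short of a 99.9% target. The estimate is only as precise as the bucket boundaries, which is why an SLO threshold is best chosen to match a bucket bound.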

Exercises 3 - Grafana widgets
Some examples of widgets (or come up with your own):
● Graph of latency distribution
● Cumulative % graph of endpoint request rate
● Memory usage over time
● CPU usage over time
● Graph of the % of requests fulfilling the 400ms SLO for the /login endpoint
● ...


