Tikal Knowledge
Haggai Philip Zagury - DevOps Group Lead - Tikal Knowledge
FullStack Developers Israel
INTRO - WHO WE ARE
WHO WE ARE ?
▸ Tikal helps ISVs in Israel & abroad with their technological challenges.
▸ Our engineers are full-stack developers with expertise in Android, DevOps, Java, JS, Ruby & Python.
▸ We are passionate about technology and specialize in open-source technologies.
▸ Our tech and group leaders help establish & enhance existing software teams with innovative & creative thinking.
https://www.meetup.com/full-stack-developer-il/
INTRODUCTION TO MODERN MONITORING
CURRENT STATUS [ INFRASTRUCTURE ]
▸ AWS, Cloud, Hybrid / Multi Cloud
▸ Define metrics and system health based on experience and application-specific behaviors.
▸ Many False Positives
▸ Scaling is hard [ semi-auto, manual ]
COMMON MONITORING STATUS
▸ Ops own the monitoring domain
▸ Define metrics and system health based on experience and application-specific behaviors.
▸ Many False Positives
▸ Scaling is hard [ semi-auto, manual ]
COMMON MONITORING SOLUTIONS
▸ Amazon CloudWatch
▸ New Relic
▸ Nagios
▸ AppDynamics
▸ Datadog
▸ Many more …
GOALS
▸ Improve existing monitoring and RCA indicators
▸ Reduce false positives & ‘customer driven alerting’
▸ Proactively identify data anomalies / deviations
▸ Provide meaningful / intelligent notifications [ severity, SLA compliance, etc. ]
▸ Proactively remediate commonly known issues, or set the foundation of a robust substitute
▸ Provide KPI integration policy & methodology for both DevOps & R&D teams
CHALLENGES
▸ Preserve the knowledge and insights in the existing Monitoring system
▸ Cultural changes:
▸ APM is part of the development process
▸ Monitoring tools are part of the developer stack (otherwise developers get woken up by every issue with their code/app)
▸ On-call isn’t only for Ops … Everybody’s accountable
▸ Break down the “wall of confusion” between dev and ops
PHILOSOPHY
The Gap of Traditional Monitoring
- We know what we want to know …
System Metrics
Not enough || Too much, a little too late
We do not always know what we are looking at / for …
Is this OK ?! || Normal
What happened at 4 AM?
If you’re lucky
+
= No action needed
Go back to sleep
( you still woke up ! )
REALITY
Murphy’s law …
Stop using Nagios
(so it can die peacefully)
Feb 13, 2014 [ slideshare ]
In 2 words:
Configuration files…
In a few more:
- resources
- services
- dependencies
- …
Traditional Monitoring
• Reliable
• Durable
• Scalable
Conclusion …
System monitoring alone does not suffice; enter APM
HOW DID WE GET HERE
TRADITIONAL MONITORING WAS(IS) ALL ABOUT THE “BLACK BOX” | “OS” METRICS
▸ All we care about is that the system is OK …
[Diagram: APPLICATION FRONTEND | APPLICATION BACKEND | APPLICATION DATABASE]
OPS ARE WORKING ON OPTIMIZING INFRASTRUCTURE …
▸ Throw more RAM & “Reports”
▸ Add another node to the “FE cluster”
▸ Add another shard to the DB …
▸ …
APPLICATION …
IN THE PAST ~10 YEARS
▸ Developers have started to implement METRICS
▸ Organizations are adopting Standards
▸ Common metrics have become a commodity
REALITY PREVAILS
[Diagram: APPLICATION FRONTEND | APPLICATION BACKEND | APPLICATION DATABASE | APPLICATION …]
Multiple Dimensions
• [ Stability ]
• Ops dimension
• [ Innovation ]
• Dev dimension
• Product dimension
Even More
• Environment [ stg, uat, prod ]
• Application Stack(s) || tags || types
• Business metrics
TEAMS | SCOPES | METRICS - COME TOGETHER
Apply
MONITORING CRITERIA
▸ Server (OS) level monitoring
▸ Application Monitoring (APM)
▸ Perimeter (External website) monitoring
▸ Event driven remediation
▸ Alerting and Escalation
▸ Associated log data & anomaly detection
REQUIRED FEATURES
Accessibility
Scheduling
SLAs assured
Auth & Authorization
Escalation
Durable & Resilient
Forensics
Automatic
Flexible & Elastic
Accountable
IT’S AN ITERATIVE PROCESS
▸ How quickly did we recover ?
▸ What worked / didn’t work ?
▸ Iterative improvements [ Chaos Monkey, 10 story test ]
▸ RCA -> Remediation [ a.k.a False positive lifecycle ]
METHODOLOGY
HOW TO DEFINE A METRIC OR ALERT VS. HOW TO STORE DATA
▸ A Metric’s Lifecycle & Design
▸ Time Series Data stream(s) || source(s)
▸ Common tagging
▸ Metric naming conventions and implications
▸ Micro Services, Integration of Traditional and New Generation solutions
▸ Choose short, mid & long term tools / services
A METRIC’S LIFECYCLE
NEW (A) METRIC
INFRASTRUCTURE (OS) | APPLICATION | EXTERNAL (DEPENDENCY / ENDPOINT)
REMEDIABLE ? | ALERTABLE ? | LOG CORRELATION | SCOPE OF IMPACT
LEARN IN DEV | STG -> DEFINE IN DEV | STG -> SHIP TO PROD
A METRIC’S LIFECYCLE - “TAG-ABLE” == FILTERABLE | MEASURABLE | QUANTIFIABLE
ENVIRONMENT: DEVELOPMENT | STAGING | PRODUCTION
A METRIC’S LIFECYCLE
- QUANTIFIABLE METRICS: SEVERITY, CRITICAL STATE
- EXPOSING A SERVICE
- CONSUMING A SERVICE
- - WHY DOES MY SERVICE HAVE AN OS IMPACT ?
- - IS IT BY DESIGN ?
- FALLBACK METHODS ?
- ALTERNATE ENDPOINTS / RETRY ?
- FEATURE TOGGLE
- DEFINE SEVERITY
TSD PRINCIPLES
Credit->http://opentsdb.net/overview.html
DATAPOINTS
Credit->https://www.datadoghq.com/blog/the-power-of-tagged-metrics/
In tools like Prometheus you don't need the timestamp; it just uses the collection timestamp
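A tagged datapoint can be pictured with a minimal Python sketch: a metric name, key/value tags and a value, rendered in the Prometheus text exposition format with the timestamp left out so that the server stamps the sample at collection time. The function name `format_sample` and the example metric are illustrative assumptions, not taken from the slides.

```python
def _escape(value):
    # Label values escape backslash, double quote and newline.
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(name, labels, value):
    """Render one sample as a text exposition line, without a timestamp."""
    if labels:
        pairs = ",".join(
            '{}="{}"'.format(key, _escape(val)) for key, val in sorted(labels.items())
        )
        return "{}{{{}}} {}".format(name, pairs, value)
    return "{} {}".format(name, value)


if __name__ == "__main__":
    # Hypothetical datapoint: one HTTP request counter, tagged by env and method.
    print(format_sample("http_requests_total", {"method": "GET", "env": "prod"}, 1027))
```

Each distinct combination of tag values is its own time series, which is what makes the data filterable later on.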
MIX ’N’ MATCH
SHORT | MID | LONG TERM SOLUTIONS
PROMETHEUS
https://github.com/prometheus/prometheus
FEATURES
▸ Open-source systems monitoring and alerting toolkit
▸ A multi-dimensional data model (time series identified by metric name and key/value pairs)
▸ A flexible query language to leverage this dimensionality
▸ No reliance on distributed storage; single server nodes are autonomous
▸ Time series collection happens via a pull model over HTTP
▸ Pushing time series is supported via an intermediary gateway
▸ Targets are discovered via service discovery or static configuration
▸ Multiple modes of graphing and dashboarding support
PROMETHEUS ARCHITECTURE
[Diagram components: Dashboarding | Prometheus Server (Retrieval / Collection, DataSeries Storage [DB], PromQL, web UI) | Alertmanager | Prometheus server(s) | Push Gateway | Service Discovery Providers | Prometheus exporters]
UNTIL NOW
‣ Try providing this to each developer
‣ Sensu has a very similar approach to APM …
‣ Complexity is the barrier …
UNTIL NOW
‣ Pull has become an advantage …
‣ Severity is implied [TSD]
‣ False Positives reduction
‣ Docker makes it super simple
‣ Go Lang lightweight approach
IMPLEMENTATION
IMPLEMENTATION
‣ Review old system metrics & capabilities and decide what’s good and what’s bad
‣ What can move
‣ What needs to stay | integrate to new system
‣ Prometheus deployment is Automated from day 1
‣ Prometheus exporter services are tagged and labeled per application stack | layer
‣ Preferably Dockerized
‣ Metric Design Workshops | meetings | slack group
‣ Alert Design Workshops | meetings | slack group
‣ Team metric tags and Alerting & Escalation
STEP1 - IMPLEMENT DISCOVERY
AWS Discovery -> https://github.com/prometheus/prometheus/tree/master/discovery
[Diagram: NEW NODE DEPLOYMENT | SERVICE DISCOVERY | DEV | STAGING | PRODUCTION | STACK / APP NAME | Alertmanager]
STEP2 - IMPLEMENT EXPORTERS
https://prometheus.io/docs/instrumenting/exporters/
Official node exporter -> https://github.com/prometheus/node_exporter
Mssql Exporter -> https://hub.docker.com/r/awaragi/prometheus-mssql-exporter/
Nagios Exporter -> https://github.com/m-lab/prometheus-nagios-exporter
STEP3 - IMPLEMENT CUSTOM APPLICATION METRICS
https://prometheus.io/docs/instrumenting/exporters/
Windows WMI -> https://github.com/martinlindhe/wmi_exporter
Java -> https://github.com/prometheus/jmx_exporter
node.js -> https://www.npmjs.com/browse/keyword/prometheus
.Net -> https://github.com/andrasm/prometheus-net
STEP4 - ADAPT TO YOUR INFRA MONITORING [ FILTER || TAG || SELECTOR ]
kubernetes_sd_config
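The filter / tag / selector idea can be sketched in Python as a simplified model of a relabeling "keep" rule: the values of the chosen source labels are joined with a separator and the target is kept only if the whole string matches a regular expression. The label names `env` and `stack` are hypothetical, and Python's `re` stands in for the RE2 syntax Prometheus itself uses, so treat this as an illustration rather than the actual implementation.

```python
import re


def keep(targets, source_labels, regex, separator=";"):
    """Keep discovered targets whose joined source label values fully match regex."""
    pattern = re.compile(regex)
    kept = []
    for labels in targets:
        joined = separator.join(labels.get(name, "") for name in source_labels)
        if pattern.fullmatch(joined):
            kept.append(labels)
    return kept


if __name__ == "__main__":
    # Hypothetical discovery result: three nodes tagged by environment and stack.
    discovered = [
        {"instance": "10.0.0.1:9100", "env": "prod", "stack": "web"},
        {"instance": "10.0.0.2:9100", "env": "stg", "stack": "web"},
        {"instance": "10.0.0.3:9100", "env": "prod", "stack": "db"},
    ]
    for target in keep(discovered, ["env", "stack"], r"prod;web"):
        print(target["instance"])
```

The same selection, expressed against discovery metadata, is what lets one Prometheus configuration serve several environments or stacks.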
STEP 5 - METRIC DESIGN
‣ Review sample METRICS and GRAPHS
‣ Define | Reuse
‣ Naming conventions { https://prometheus.io/docs/practices/naming/ }
‣ Quantifiable [ numbers not strings … ]
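A metric design workshop could lean on a small checker like the following Python sketch. It applies the classic name patterns from the Prometheus naming documentation (metric names matching `[a-zA-Z_:][a-zA-Z0-9_:]*`, label names matching `[a-zA-Z_][a-zA-Z0-9_]*`, double-underscore label prefixes reserved) plus the `_total` suffix convention for counters. The function `check_metric` and its messages are assumptions made for illustration.

```python
import re

METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
LABEL_NAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def check_metric(name, labels, metric_type):
    """Return a list of naming problems; an empty list means the name looks fine."""
    problems = []
    if not METRIC_NAME_RE.fullmatch(name):
        problems.append("invalid metric name")
    if metric_type == "counter" and not name.endswith("_total"):
        problems.append("counter should end in _total")
    for label in labels:
        if not LABEL_NAME_RE.fullmatch(label):
            problems.append("invalid label name: " + label)
        elif label.startswith("__"):
            problems.append("reserved label name: " + label)
    return problems


if __name__ == "__main__":
    print(check_metric("http_requests_total", ["method", "env"], "counter"))
    print(check_metric("http-requests", ["method"], "counter"))
```

A check of this kind fits naturally into the pull/merge request flow for new metrics.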
DASHBOARDS
DEVELOPER TOOL
DEVELOPER TOOL - SIMPLE GRAPHS
DEVELOPER TOOL - METRICS - USING PROMQL
▸ Simple queries:
▸ rate(http_requests_total[5m])
▸ Linear predictions
▸ predict_linear(node_filesystem_free[1h], 4*3600)
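To build intuition for the two queries above, here is a simplified Python sketch of the underlying arithmetic. `simple_rate` computes a per-second increase over a window of counter samples and treats a drop as a counter reset; `predict_linear` fits a least-squares line and extrapolates it. These are deliberate simplifications: the real PromQL functions also extrapolate to the window boundaries and predict relative to the evaluation time, which this sketch does not do.

```python
def simple_rate(samples):
    """Per-second increase; samples is [(timestamp_seconds, counter_value), ...], oldest first."""
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset, so it restarted from zero.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


def predict_linear(samples, seconds_ahead):
    """Least-squares line through the samples, evaluated seconds_ahead after the last sample."""
    count = len(samples)
    if count < 2:
        return None
    mean_t = sum(t for t, _ in samples) / count
    mean_v = sum(v for _, v in samples) / count
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = cov / var_t
    intercept = mean_v - slope * mean_t
    return intercept + slope * (samples[-1][0] + seconds_ahead)


if __name__ == "__main__":
    requests = [(0, 100), (60, 160), (120, 220)]
    print(simple_rate(requests))
    # Hypothetical free-space samples over one hour, predicted four hours ahead.
    free_bytes = [(0, 1000), (1800, 900), (3600, 800)]
    print(predict_linear(free_bytes, 4 * 3600))
```

A prediction at or below zero is the "disk will fill up in four hours" signal that makes this kind of query useful for alerting ahead of time.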
GRAFANA - SIMILAR WORKING EXPERIENCE - MUCH NICER
STEP 6 - ALERT DESIGN
‣ Review new METRICS and GRAPHS; define | design thresholds
‣ Define Severity
‣ Ownership
‣ Escalation ladder
ALERTING
ALERT DESIGN
▸ ALERT <alert name>
▸ IF <expression>
▸ [ FOR <duration> ]
▸ [ LABELS <label set> ]
▸ [ ANNOTATIONS <label set> ]
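The optional FOR clause is what keeps a short blip from paging anyone: the expression has to stay true for the whole duration before the alert fires. A minimal Python sketch of that behaviour follows; the function name `alert_states` and the evaluation timestamps are assumptions for illustration, and details such as evaluation intervals and resolved notifications are left out.

```python
def alert_states(evaluations, for_seconds):
    """Map successive rule evaluations [(timestamp_seconds, condition_is_true), ...] to alert states."""
    states = []
    active_since = None
    for timestamp, condition in evaluations:
        if not condition:
            # The condition cleared, so the pending timer starts over.
            active_since = None
            states.append((timestamp, "inactive"))
            continue
        if active_since is None:
            active_since = timestamp
        held_for = timestamp - active_since
        states.append((timestamp, "firing" if held_for >= for_seconds else "pending"))
    return states


if __name__ == "__main__":
    evaluations = [(0, True), (60, True), (120, False), (180, True), (240, True), (300, True)]
    for timestamp, state in alert_states(evaluations, for_seconds=120):
        print(timestamp, state)
```

Choosing the duration is one of the simplest levers for reducing false positives.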
ALERT FOR ANY INSTANCE UNDER HIGH LOAD (node_load1 > 0.5)
ALERT high_load
  IF node_load1 > 0.5
  ANNOTATIONS {
    summary = "Instance {{ $labels.instance }} under high load",
    description = "{{ $labels.instance }} of job {{ $labels.job }} is under high load.",
  }
STILL LOOKING FOR ONLINE EDITOR FOR EASE OF DEVELOPMENT
https://github.com/alerta/prometheus-config
SIMPLE YAML FILE
route:                       # WHERE TO ROUTE TO
  receiver: 'slack'
receivers:                   # ROUTER DETAILS
  - name: 'slack'
    slack_configs:
      - send_resolved: true
        username: '<username>'
        channel: '#<channel-name>'
        api_url: '<incoming-webhook-url>'
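Once there is more than one receiver, the route becomes a tree. The following Python sketch models a simplified version of how Alertmanager picks a receiver: child routes are checked in order, the first one whose `match` labels all equal the alert's labels wins, and a child without its own receiver inherits the parent's. The `pagerduty` receiver and the `severity` / `team` labels are hypothetical, and features such as `match_re`, `continue` and grouping are left out.

```python
def pick_receiver(route, alert_labels):
    """Return the receiver name for an alert, walking the route tree depth-first."""
    for child in route.get("routes", []):
        matchers = child.get("match", {})
        if all(alert_labels.get(name) == value for name, value in matchers.items()):
            inherited = dict(child)
            # A child route without its own receiver falls back to its parent's.
            inherited.setdefault("receiver", route["receiver"])
            return pick_receiver(inherited, alert_labels)
    return route["receiver"]


# Hypothetical routing tree: pages go to PagerDuty, everything else to Slack.
ROUTE = {
    "receiver": "slack",
    "routes": [
        {"match": {"severity": "page"}, "receiver": "pagerduty"},
        {"match": {"team": "db"}},
    ],
}

if __name__ == "__main__":
    print(pick_receiver(ROUTE, {"alertname": "high_load", "severity": "page"}))
    print(pick_receiver(ROUTE, {"alertname": "high_load", "severity": "warning"}))
```

Keeping severity and ownership as labels on the alert is what makes this kind of routing possible without touching the alert rules themselves.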
ALERTING
global:                      # Variables | Global configuration
  resolve_timeout: 5m
  smtp_require_tls: true
  pagerduty_url: https://events.pagerduty.com/generic/2010-04-15/create_event.json
  hipchat_url: https://api.hipchat.com/
  opsgenie_api_host: https://api.opsgenie.com/
  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/
route:
  receiver: slack
receivers:                   # Channel Configuration
  - name: slack
    slack_configs:
      - send_resolved: true
        api_url: <secret>
        channel: '#<channel-name>'
        username: <username>
        color: '{{ if eq .Status "firing" }}danger{{ else }}good{{ end }}'
        title: '{{ template "slack.default.title" . }}'
        title_link: '{{ template "slack.default.titlelink" . }}'
        pretext: '{{ template "slack.default.pretext" . }}'
        text: '{{ template "slack.default.text" . }}'
        fallback: '{{ template "slack.default.fallback" . }}'
        icon_emoji: '{{ template "slack.default.iconemoji" . }}'
        icon_url: '{{ template "slack.default.iconurl" . }}'
templates: []
ALERT TEMPLATING
▸ What | How to say …
https://prometheus.io/blog/2016/03/03/custom-alertmanager-templates/
      - send_resolved: true
        api_url: <secret>
        channel: '#<channel-name>'
        username: <username>
        color: '{{ if eq .Status "firing" }}danger{{ else }}good{{ end }}'
        title: '{{ template "slack.default.title" . }}'
        title_link: '{{ template "slack.default.titlelink" . }}'
        pretext: '{{ template "slack.default.pretext" . }}'
        text: '{{ template "slack.default.text" . }}'
        fallback: '{{ template "slack.default.fallback" . }}'
        icon_emoji: '{{ template "slack.default.iconemoji" . }}'
        icon_url: '{{ template "slack.default.iconurl" . }}'
SILENCING, VIA UI / API
ANSWERS REQUIRED FEATURES
Accessibility
Scheduling
SLAs assured
Auth & Authorization
Escalation
Durable & Resilient
Forensics
Automatic
Flexible & Elastic
Accountable
NEXT STEPS
INFRASTRUCTURE (OS) | APPLICATION | EXTERNAL (DEPENDENCY / ENDPOINT)
REMEDIABLE ? | ALERTABLE ? | LOG CORRELATION
ALERT MANAGER | LEGACY
IDENTIFY | CHOOSE
DEMO TIME
‣ Docker-compose - ready for R&D to start using in order to create custom application metrics.
‣ Prometheus, Node_exporter, Alertmanager, cAdvisor, Grafana
DOCKER SETTINGS - VOLUMES, NETWORKS
version: '2'                 # Docker-compose version
volumes:                     # Docker volumes for Prometheus and Grafana
  prometheus_data: {}
  grafana_data: {}
networks:                    # Docker networks
  front-tier:
    driver: bridge
  back-tier:
    driver: bridge
PROMETHEUS - OFFICIAL CONTAINER
services:
  prometheus:                # Docker service name
    image: prom/prometheus
    container_name: prometheus
    volumes:                 # Docker volumes for prometheus and grafana
      - ./prometheus/:/etc/prometheus/
      - prometheus_data:/prometheus
    command:                 # Configuration
      - '-config.file=/etc/prometheus/prometheus.yml'
      - '-storage.local.path=/prometheus'
      - '-alertmanager.url=http://alertmanager:9093'
    expose:                  # Expose as service on specified port
      - 9090
    ports:                   # Ports to expose as service
      - 9090:9090
    links:                   # Link to cadvisor & alertmanager
      - cadvisor:cadvisor
      - alertmanager:alertmanager
    depends_on:
      - cadvisor
    networks:                # Network placement 'back-tier'
      - back-tier
NODE-EXPORTER [ NODE METRICS COLLECTOR ]
  node-exporter:
    container_name: node-exporter
    image: prom/node-exporter
    volumes:                 # Access to /proc /sys: what to mount from OS to container for metric collection
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command: '-collector.procfs=/host/proc -collector.sysfs=/host/sys
      -collector.filesystem.ignored-mount-points="^(/rootfs|/host|)/(sys|proc|dev|host|etc)($$|/)"
      -collector.filesystem.ignored-fs-types="^(sys|proc|auto|cgroup|devpts|ns|au|fuse.lxc|mqueue)(fs|)$$"'
    expose:
      - 9100
    networks:
      - back-tier
ALERT MANAGER
  alertmanager:
    image: prom/alertmanager
    ports:
      - 9093:9093
    volumes:
      - ./alertmanager/:/etc/alertmanager/
    networks:
      - back-tier
    command:
      - '-config.file=/etc/alertmanager/config.yml'
      - '-storage.path=/alertmanager'
CADVISOR
  cadvisor:
    image: google/cadvisor
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    expose:
      - 8080
    networks:
      - back-tier
GRAFANA
  grafana:
    image: grafana/grafana
    depends_on:
      - prometheus
    ports:
      - 3000:3000
    volumes:
      - grafana_data:/var/lib/grafana
    env_file:
      - config.monitoring
    networks:
      - back-tier
      - front-tier
DOCKER PS
CONTAINER ID   IMAGE                COMMAND                  CREATED        STATUS         PORTS                    NAMES
3dcfd7c289cb   grafana/grafana      "/run.sh"                21 hours ago   Up 4 minutes   0.0.0.0:3000->3000/tcp   prometheus_grafana_1
2b2817fc0bd9   prom/prometheus      "/bin/prometheus -..."   21 hours ago   Up 4 minutes   0.0.0.0:9090->9090/tcp   prometheus
d2c6849d3bd9   google/cadvisor      "/usr/bin/cadvisor..."   21 hours ago   Up 4 minutes   8080/tcp                 prometheus_cadvisor_1
d4a3c3ceb97d   prom/node-exporter   "/bin/node_exporte..."   21 hours ago   Up 4 minutes   9100/tcp                 node-exporter
75eb08791ea9   prom/alertmanager    "/bin/alertmanager..."   21 hours ago   Up 4 minutes   0.0.0.0:9093->9093/tcp   prometheus_alertmanager_1
DEMO PROJECT ON GITHUB
https://github.com/shelleg/monlog-compose-stack
‣ All containers are monitored by Prometheus and graphed in a nice, small project.
ROLLOUT [ LLD ]
PLACEMENT OPTIONS
‣ 1 main prometheus server vs. 1 Prometheus server per team
‣ 1 Alert-manager [ with pre-defined “receivers” ] vs. 1 per team / concern
DEPLOYMENT OPTIONS
‣ Automate deployment of prometheus server(s) / Alert-manager [ pre-defined “receivers” ]
‣ Ansible, Puppet, etc.
‣ Jenkins
‣ The combination of the 2 ;)
‣ Automation helps solve the “one-to-many” dilemma IMHO …
DEVELOPER STACK
‣ Options:
‣ Personal Docker / Docker-compose[ private fork if desired ]
‣ A small startup.cmd / startup.sh starting the Go applications of Prometheus & Alertmanager
‣ A centralized Grafana / Alertmanager with only Prometheus on the dev machine
‣ Toolkit for
‣ developing metrics, alarms, graphs
‣ Add exporters to configuration [ tendency :: as common as you develop new services ]
‣ SDLC -> Git pull/merge request mechanism
DEVELOPER STACK(S) - EXAMPLE
ALERTS IN SCM MASTER -> STG -> PRD
POPULATE ALERTS | METRICS | DASHBOARDS VIA SCM
1. Use “ready made” || good starting point graphs from the Grafana dashboard exchange, or build your own
2. Customize
3. Add / push to git master branch
4. “ci” server -> listens on Git hook -> push to staging
5. “ci” server -> waits for manual trigger -> push to production
CONTINUOUS DELIVERY OPTIONS [ ADDING AN ALERT SAMPLE WORKFLOW ]
master (dev) -> staging -> production
DEVELOP -> DEPLOY TO STAGE -> DEPLOY TO PROD
1 centralized repo, branch per env / prometheus instance
CONTINUOUS DELIVERY OPTIONS [ ADDING GRAPHS ]
master (dev) -> staging -> production
DEVELOP -> DEPLOY TO STAGE -> DEPLOY TO PROD
“Grafana Dashboard hub”
- separate repo ?
- part of monitoring repo ?
CI PIPELINE - DATA ORIGINS & PRESENTATION
Exporters: App Metrics | OS Metrics
REGION | POD | INSTANCE | *
Filter Tags & Alerts
CI PIPELINE
[Diagram: DEV | STAGING | PRODUCTION | STACK / APP NAME | ALERTMANAGER | GRAFANA | Web-hook (PR-builder) | OPS “CLEANUP” ROUTINE(S)]
BUILDING THE PIPELINE
‣ Routine on submit / push builds to dev/stg
‣ Run daily / weekly deployments of Alerts (prometheus) | Dashboards (grafana)
‣ Avoid / rollback any manual changes of Alerts / Graphs etc
‣ Help make automation a common practice
‣ Scheduled task which syncs and re-configures the desired state from SCM
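The scheduled sync can be sketched in Python as a comparison between what SCM says should be deployed and what is actually on the server, producing the actions needed to roll back manual changes. The function names and the `*.rules` file names are hypothetical; a real job would read the files from the git checkout and from the Prometheus or Grafana host, then reload the service.

```python
import hashlib


def digest(text):
    """Content hash used to detect files that were edited by hand."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_sync(desired, deployed):
    """Compare {file name: content} from SCM with what is deployed; return the actions to take."""
    actions = []
    for name in sorted(desired):
        if name not in deployed:
            actions.append(("create", name))
        elif digest(desired[name]) != digest(deployed[name]):
            actions.append(("overwrite", name))
    # Anything deployed but absent from SCM was added manually and gets removed.
    for name in sorted(set(deployed) - set(desired)):
        actions.append(("delete", name))
    return actions


if __name__ == "__main__":
    scm = {"node.rules": "ALERT high_load ...", "app.rules": "ALERT app_down ..."}
    live = {"node.rules": "ALERT high_load ...", "app.rules": "edited by hand", "tmp.rules": "test"}
    for action, name in plan_sync(scm, live):
        print(action, name)
```

Because SCM is treated as the only source of truth, every change to alerts or dashboards has to go through the pull/merge request flow to survive the next sync.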
MEASURE THE PIPELINE
‣ Pipeline steps are monitored
‣ Expose metrics such as:
‣ deployment time & status [ in env | stack etc ]
‣ count (# of alerts, new vs old last week, month etc)
‣ Metric counters [ application metrics ] …
‣ [ Jenkins exporter || push gateway TBD ]
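If the push gateway route is chosen, a pipeline step could report its own deployment time and status roughly as in the Python sketch below. It only builds the request: the URL follows the Pushgateway path scheme `/metrics/job/<job>/<label>/<value>` and the body uses the text exposition format. The host `pushgateway:9091`, the metric names and the grouping labels are assumptions for illustration, and label values containing slashes are not handled.

```python
from urllib.parse import quote


def pushgateway_url(base_url, job, grouping=None):
    """Build the push URL for a job and optional grouping labels."""
    path = "/metrics/job/" + quote(job, safe="")
    for name, value in sorted((grouping or {}).items()):
        path += "/{}/{}".format(quote(name, safe=""), quote(value, safe=""))
    return base_url.rstrip("/") + path


def deployment_payload(duration_seconds, success):
    """Render deployment time and status as two gauges (hypothetical metric names)."""
    lines = [
        "# TYPE deployment_duration_seconds gauge",
        "deployment_duration_seconds {}".format(duration_seconds),
        "# TYPE deployment_success gauge",
        "deployment_success {}".format(1 if success else 0),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(pushgateway_url("http://pushgateway:9091", "deploy", {"env": "stg", "stack": "web"}))
    print(deployment_payload(42.5, True), end="")
```

The CI job would send the payload to that URL with an HTTP PUT or POST at the end of each deployment, and Prometheus would then scrape the gateway like any other target.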
FEEDBACK / QUESTIONS ? I’M HERE …
HAGZAG@TIKALK.COM, 0545302525
Haggai Philip Zagury - Tikal Knowledge
MONITORING HLD
FullStack Developers Israel

More Related Content

What's hot

DevOps Transformation in Technical
DevOps Transformation in TechnicalDevOps Transformation in Technical
DevOps Transformation in Technical
Opsta
 
DevOps Training | DevOps Training Video | DevOps Tools | DevOps Tutorial For ...
DevOps Training | DevOps Training Video | DevOps Tools | DevOps Tutorial For ...DevOps Training | DevOps Training Video | DevOps Tools | DevOps Tutorial For ...
DevOps Training | DevOps Training Video | DevOps Tools | DevOps Tutorial For ...
Simplilearn
 
Gitops: a new paradigm for software defined operations
Gitops: a new paradigm for software defined operationsGitops: a new paradigm for software defined operations
Gitops: a new paradigm for software defined operations
Mariano Cunietti
 
DevOps Tutorial For Beginners | DevOps Tutorial | DevOps Tools | DevOps Train...
DevOps Tutorial For Beginners | DevOps Tutorial | DevOps Tools | DevOps Train...DevOps Tutorial For Beginners | DevOps Tutorial | DevOps Tools | DevOps Train...
DevOps Tutorial For Beginners | DevOps Tutorial | DevOps Tools | DevOps Train...
Simplilearn
 
DevOps - A Gentle Introduction
DevOps - A Gentle IntroductionDevOps - A Gentle Introduction
DevOps - A Gentle Introduction
Ganesh Samarthyam
 
DevOps Transformation: Learnings and Best Practices
DevOps Transformation: Learnings and Best PracticesDevOps Transformation: Learnings and Best Practices
DevOps Transformation: Learnings and Best Practices
QBurst
 
GitOps with Gitkube
GitOps with GitkubeGitOps with Gitkube
GitOps with Gitkube
Tirumarai Selvan
 
GitOps - Modern best practices for high velocity app dev using cloud native t...
GitOps - Modern best practices for high velocity app dev using cloud native t...GitOps - Modern best practices for high velocity app dev using cloud native t...
GitOps - Modern best practices for high velocity app dev using cloud native t...
Weaveworks
 
How to implement DevOps in your Organization
How to implement DevOps in your OrganizationHow to implement DevOps in your Organization
How to implement DevOps in your Organization
Dalibor Blazevic
 
Devops Devops Devops
Devops Devops DevopsDevops Devops Devops
Devops Devops Devops
Kris Buytaert
 
Rapid Strategic SRE Assessments
Rapid Strategic SRE AssessmentsRapid Strategic SRE Assessments
Rapid Strategic SRE Assessments
Marc Hornbeek
 
Building a DevOps organization
Building a DevOps organizationBuilding a DevOps organization
Building a DevOps organization
Zinnov
 
Azure DevOps & GitHub... Better Together!
Azure DevOps & GitHub... Better Together!Azure DevOps & GitHub... Better Together!
Azure DevOps & GitHub... Better Together!
Lorenzo Barbieri
 
GitOps 101 Presentation.pdf
GitOps 101 Presentation.pdfGitOps 101 Presentation.pdf
GitOps 101 Presentation.pdf
ssuser31375f
 
Monitoring Pull vs Push, InfluxDB and Prometheus
Monitoring Pull vs Push, InfluxDB and PrometheusMonitoring Pull vs Push, InfluxDB and Prometheus
Monitoring Pull vs Push, InfluxDB and Prometheus
Gianluca Arbezzano
 
我們與Azure DevOps的距離
我們與Azure DevOps的距離我們與Azure DevOps的距離
我們與Azure DevOps的距離
Edward Kuo
 
DevOps Best Practices
DevOps Best PracticesDevOps Best Practices
DevOps Best Practices
Giragadurai Vallirajan
 
Introduction to devops
Introduction to devopsIntroduction to devops
Introduction to devops
UtpalenduChakrobortt1
 
cLoki: Like Loki but for ClickHouse
cLoki: Like Loki but for ClickHousecLoki: Like Loki but for ClickHouse
cLoki: Like Loki but for ClickHouse
Altinity Ltd
 
Learn from the Experts: Using DORA Metrics to Accelerate Value Stream Flow
Learn from the Experts: Using DORA Metrics to Accelerate Value Stream FlowLearn from the Experts: Using DORA Metrics to Accelerate Value Stream Flow
Learn from the Experts: Using DORA Metrics to Accelerate Value Stream Flow
DevOps.com
 

What's hot (20)

DevOps Transformation in Technical
DevOps Transformation in TechnicalDevOps Transformation in Technical
DevOps Transformation in Technical
 
DevOps Training | DevOps Training Video | DevOps Tools | DevOps Tutorial For ...
DevOps Training | DevOps Training Video | DevOps Tools | DevOps Tutorial For ...DevOps Training | DevOps Training Video | DevOps Tools | DevOps Tutorial For ...
DevOps Training | DevOps Training Video | DevOps Tools | DevOps Tutorial For ...
 
Gitops: a new paradigm for software defined operations
Gitops: a new paradigm for software defined operationsGitops: a new paradigm for software defined operations
Gitops: a new paradigm for software defined operations
 
DevOps Tutorial For Beginners | DevOps Tutorial | DevOps Tools | DevOps Train...
DevOps Tutorial For Beginners | DevOps Tutorial | DevOps Tools | DevOps Train...DevOps Tutorial For Beginners | DevOps Tutorial | DevOps Tools | DevOps Train...
DevOps Tutorial For Beginners | DevOps Tutorial | DevOps Tools | DevOps Train...
 
DevOps - A Gentle Introduction
DevOps - A Gentle IntroductionDevOps - A Gentle Introduction
DevOps - A Gentle Introduction
 
DevOps Transformation: Learnings and Best Practices
DevOps Transformation: Learnings and Best PracticesDevOps Transformation: Learnings and Best Practices
DevOps Transformation: Learnings and Best Practices
 
GitOps with Gitkube
GitOps with GitkubeGitOps with Gitkube
GitOps with Gitkube
 
GitOps - Modern best practices for high velocity app dev using cloud native t...
GitOps - Modern best practices for high velocity app dev using cloud native t...GitOps - Modern best practices for high velocity app dev using cloud native t...
GitOps - Modern best practices for high velocity app dev using cloud native t...
 
How to implement DevOps in your Organization
How to implement DevOps in your OrganizationHow to implement DevOps in your Organization
How to implement DevOps in your Organization
 
Devops Devops Devops
Devops Devops DevopsDevops Devops Devops
Devops Devops Devops
 
Rapid Strategic SRE Assessments
Rapid Strategic SRE AssessmentsRapid Strategic SRE Assessments
Rapid Strategic SRE Assessments
 
Building a DevOps organization
Building a DevOps organizationBuilding a DevOps organization
Building a DevOps organization
 
Azure DevOps & GitHub... Better Together!
Azure DevOps & GitHub... Better Together!Azure DevOps & GitHub... Better Together!
Azure DevOps & GitHub... Better Together!
 
GitOps 101 Presentation.pdf
GitOps 101 Presentation.pdfGitOps 101 Presentation.pdf
GitOps 101 Presentation.pdf
 
Monitoring Pull vs Push, InfluxDB and Prometheus
Monitoring Pull vs Push, InfluxDB and PrometheusMonitoring Pull vs Push, InfluxDB and Prometheus
Monitoring Pull vs Push, InfluxDB and Prometheus
 
我們與Azure DevOps的距離
我們與Azure DevOps的距離我們與Azure DevOps的距離
我們與Azure DevOps的距離
 
DevOps Best Practices
DevOps Best PracticesDevOps Best Practices
DevOps Best Practices
 
Introduction to devops
Introduction to devopsIntroduction to devops
Introduction to devops
 
cLoki: Like Loki but for ClickHouse
cLoki: Like Loki but for ClickHousecLoki: Like Loki but for ClickHouse
cLoki: Like Loki but for ClickHouse
 
Learn from the Experts: Using DORA Metrics to Accelerate Value Stream Flow
Learn from the Experts: Using DORA Metrics to Accelerate Value Stream FlowLearn from the Experts: Using DORA Metrics to Accelerate Value Stream Flow
Learn from the Experts: Using DORA Metrics to Accelerate Value Stream Flow
 

Similar to Modern Monitoring [ with Prometheus ]

Transforming to OpenStack: a sample roadmap to DevOps
Transforming to OpenStack: a sample roadmap to DevOpsTransforming to OpenStack: a sample roadmap to DevOps
Transforming to OpenStack: a sample roadmap to DevOps
Nicolas (Nick) Barcet
 
Chaos is a ladder !
Chaos is a ladder !Chaos is a ladder !
Chaos is a ladder !
Haggai Philip Zagury
 
Data Center Migration Essentials - Adam Saint-Prix Tim Wong
Data Center Migration Essentials - Adam Saint-Prix Tim WongData Center Migration Essentials - Adam Saint-Prix Tim Wong
Data Center Migration Essentials - Adam Saint-Prix Tim Wong
Atlassian
 
ISACA Ireland Keynote 2015
ISACA Ireland Keynote 2015ISACA Ireland Keynote 2015
ISACA Ireland Keynote 2015
Shannon Lietz
 
5 Steps to Get Precise SAP Impact-Based Testing
5 Steps to Get Precise SAP Impact-Based Testing5 Steps to Get Precise SAP Impact-Based Testing
5 Steps to Get Precise SAP Impact-Based Testing
TurnKey Solutions
 
Winning Governance Strategies for the Technology Disruptions of our Time
Winning Governance Strategies for the Technology Disruptions of our TimeWinning Governance Strategies for the Technology Disruptions of our Time
Winning Governance Strategies for the Technology Disruptions of our Time
CloudHesive
 
GitOps, Driving NGN Operations Teams 211127 #kcdgt 2021
GitOps, Driving NGN Operations Teams 211127 #kcdgt 2021GitOps, Driving NGN Operations Teams 211127 #kcdgt 2021
GitOps, Driving NGN Operations Teams 211127 #kcdgt 2021
William Caban
 
Success Factors for a Mature Microservices Implementation
Success Factors for a Mature Microservices ImplementationSuccess Factors for a Mature Microservices Implementation
Success Factors for a Mature Microservices Implementation
Dustin Ruehle
 
AWS Public Sector Symposium 2014 Canberra | Continuous Integration and Deploy...
Modern Monitoring [ with Prometheus ]

  • 1. Tikal Knowledge | Haggai Philip Zagury - DevOps Group Lead - Tikal Knowledge
  • 2. FullStack Developers Israel | INTRO - WHO WE ARE ▸ Tikal helps ISVs in Israel & abroad with their technological challenges. ▸ Our engineers are full-stack developers with expertise in Android, DevOps, Java, JS, Ruby & Python. ▸ We are passionate about technology and specialize in open-source technologies. ▸ Our tech and group leaders help establish & enhance existing software teams with innovative & creative thinking. https://www.meetup.com/full-stack-developer-il/
  • 3. INTRODUCTION TO MODERN MONITORING | CURRENT STATUS [ INFRASTRUCTURE ] ▸ AWS, Cloud, Hybrid / Multi Cloud ▸ Define metrics and system health based on experience and application-specific behaviors. ▸ Many false positives ▸ Scaling is hard [ semi-auto, manual ]
  • 4. COMMON MONITORING STATUS ▸ Ops own the monitoring domain ▸ Define metrics and system health based on experience and application-specific behaviors. ▸ Many false positives ▸ Scaling is hard [ semi-auto, manual ]
  • 5. COMMON MONITORING SOLUTIONS ▸ CloudWatch ▸ New Relic ▸ Nagios ▸ AppDynamics ▸ Datadog ▸ Many more …
  • 6. GOALS ▸ Improve existing monitoring and RCA indicators ▸ Reduce false positives & ‘customer-driven alerting’ ▸ Proactively identify data anomalies / deviations ▸ Provide meaningful / intelligent notifications [ severity, SLA compliance etc. ] ▸ Proactively remediate commonly known issues, or set the foundation of a robust substitute ▸ Provide KPI integration policy & methodology for both DevOps & R&D teams
  • 7. CHALLENGES ▸ Preserve the knowledge and insights in the existing monitoring system ▸ Cultural changes: ▸ APM is part of the development process ▸ Monitoring tools are part of the developer stack (otherwise the developer is woken up by every issue with their code/app) ▸ On-call isn’t only for Ops … everybody’s accountable ▸ Break down the “wall of confusion” between dev and ops
  • 9. The Gap of Traditional Monitoring - We know what we want to know …
  • 10. System Metrics: not enough || too much, a little too late
  • 11. We do not always know what we are looking at / for …
  • 12. Is this OK?! || Normal? What happened at 4 AM?
  • 13. If you’re lucky + = No action needed
  • 14. Go back to sleep ( you still woke up ! )
  • 16. Stop using Nagios (so it can die peacefully) Feb 13, 2014 [ slideshare ]
  • 17. In 2 words: Configuration files … In a few more: - resources - services - dependencies - …
  • 18. Traditional Monitoring • Reliable • Durable • Scalable Conclusion … system monitoring does not suffice, enter APM
  • 19. HOW DID WE GET HERE
  • 20. TRADITIONAL MONITORING WAS (IS) ALL ABOUT THE “BLACK BOX” | “OS” METRICS ▸ All we care about is that the system is OK … [ diagram: APPLICATION FRONTEND; APPLICATION BACKEND; APPLICATION DATABASE ]
  • 21. OPS ARE WORKING ON OPTIMIZING INFRASTRUCTURE … ▸ Throw more RAM & “Reports” ▸ Add another node to the “FE cluster” ▸ Add another shard to the DB … ▸ … APPLICATION …
  • 22. IN THE PAST ~10 YEARS ▸ Developers have started to implement metrics ▸ Organizations are adopting standards ▸ Common metrics have become a commodity
  • 25. Multiple Dimensions • [ Stability ] • Ops dimension • [ Innovation ] • Dev dimension • Product dimension
  • 26. Even More • Environment [ stg, uat, prod ] • Application Stack(s) || tags || types • Business metrics
  • 27. TEAMS | SCOPES | METRICS - COME TOGETHER
  • 30. MONITORING CRITERIA ▸ Server (OS) level monitoring ▸ Application Monitoring (APM) ▸ Perimeter (external website) monitoring ▸ Event-driven remediation ▸ Alerting and escalation ▸ Associated log data & anomaly detection
  • 31. REQUIRED FEATURES: Accessibility | Scheduling | SLAs assured | Auth & Authorization | Escalation | Durable & Resilient | Forensics | Automatic | Flexible & Elastic | Accountable
  • 32. IT’S AN ITERATIVE PROCESS ▸ How quickly did we recover? ▸ What worked / didn’t work? ▸ Iterative improvements [ Chaos Monkey, 10 story test ] ▸ RCA -> Remediation [ a.k.a. false positive lifecycle ]
  • 34. HOW TO DEFINE A METRIC OR ALERT VS. HOW TO STORE DATA ▸ A metric’s lifecycle & design ▸ Time series data stream(s) || source(s) ▸ Common tagging ▸ Metric naming conventions and implications ▸ Microservices, integration of traditional and new-generation solutions ▸ Choose short-, mid- & long-term tools / services
  • 35. A METRIC’S LIFECYCLE [ diagram: NEW (A) METRIC; INFRASTRUCTURE (OS); APPLICATION; EXTERNAL (DEPENDENCY / ENDPOINT); REMEDIABLE ?; ALERTABLE ?; LOG CORRELATION; SCOPE OF IMPACT; LEARN IN DEV | STG; DEFINE IN DEV | STG; SHIP TO PROD ]
  • 36. A METRIC’S LIFECYCLE - “TAG-ABLE” == FILTERABLE | MEASURABLE | QUANTIFIABLE [ diagram: NEW (A) METRIC; INFRASTRUCTURE (OS); APPLICATION; EXTERNAL (DEPENDENCY / ENDPOINT); REMEDIABLE ?; ALERTABLE ?; LOG CORRELATION; SCOPE OF IMPACT; LEARN IN DEV | STG; DEFINE IN DEV | STG; SHIP TO PROD; ENVIRONMENT: DEVELOPMENT, STAGING, PRODUCTION ]
  • 37. A METRIC’S LIFECYCLE [ diagram: NEW (A) METRIC; INFRASTRUCTURE (OS); APPLICATION; EXTERNAL (DEPENDENCY / ENDPOINT); REMEDIABLE ?; ALERTABLE ?; LOG CORRELATION; SCOPE OF IMPACT; LEARN IN DEV | STG; DEFINE IN DEV | STG; SHIP TO PROD ] - QUANTIFIABLE METRICS: SEVERITY, CRITICAL STATE - EXPOSING A SERVICE - CONSUMING A SERVICE - WHY DOES MY SERVICE HAVE AN OS IMPACT ? - IS IT BY DESIGN ? - FALLBACK METHODS ? - ALTERNATE ENDPOINTS / RETRY ? - FEATURE TOGGLE - DEFINE SEVERITY
  • 38. TSD PRINCIPLES Credit -> http://opentsdb.net/overview.html
  • 39. DATAPOINTS Credit -> https://www.datadoghq.com/blog/the-power-of-tagged-metrics/ In tools like Prometheus you don't need the timestamp; it just uses the collection timestamp
  • 40. MIX ’N’ MATCH
  • 41. SHORT | MID | LONG TERM SOLUTIONS
  • 43. PROMETHEUS FEATURES ▸ Open-source systems monitoring and alerting toolkit ▸ A multi-dimensional data model (time series identified by metric name and key/value pairs) ▸ A flexible query language to leverage this dimensionality ▸ No reliance on distributed storage; single server nodes are autonomous ▸ Time series collection happens via a pull model over HTTP ▸ Pushing time series is supported via an intermediary gateway ▸ Targets are discovered via service discovery or static configuration ▸ Multiple modes of graphing and dashboarding support
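As a rough illustration of the pull model, static configuration and the push gateway listed above, a minimal Prometheus configuration file could look like the sketch below. The job names, the 15s interval and the target addresses (`localhost:9090`, `pushgateway:9091`) are assumptions for illustration and do not come from the slides.

```yaml
# prometheus.yml (sketch; job names and targets are assumed)
global:
  scrape_interval: 15s          # how often Prometheus pulls each target over HTTP

scrape_configs:
  # Static configuration: Prometheus scrapes its own /metrics endpoint.
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']

  # Short-lived jobs push to the Pushgateway; Prometheus then pulls from it.
  - job_name: pushgateway
    honor_labels: true          # keep the job/instance labels set by the pushing client
    static_configs:
      - targets: ['pushgateway:9091']
```

Each entry under `scrape_configs` is one pull job; the push gateway is simply another target that Prometheus scrapes.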
  • 44. PROMETHEUS ARCHITECTURE [ diagram: Dashboarding; Prometheus Server; Alertmanager; Retrieval / Collection; Data Series Storage [DB]; PromQL; web UI; Prometheus server(s); Push Gateway; Service Discovery Providers; Prometheus exporters ]
  • 45. UNTIL NOW ‣ Try providing this to each developer ‣ Sensu has a very similar approach to APM … ‣ Complexity is the barrier …
  • 46. UNTIL NOW ‣ Pull has become an advantage … ‣ Severity is implied [TSD] ‣ False positives reduction ‣ Docker makes it super simple ‣ Go’s lightweight approach
  • 48. IMPLEMENTATION ‣ Review the old system’s metrics & capabilities and decide what’s good and what’s bad ‣ What can move ‣ What needs to stay | integrate into the new system ‣ Prometheus deployment is automated from day 1 ‣ Prometheus exporter services are tagged and labeled per application stack | layer ‣ Preferably Dockerized ‣ Metric design workshops | meetings | Slack group ‣ Alert design workshops | meetings | Slack group ‣ Team metric tags and alerting & escalation
  • 49. STEP 1 - IMPLEMENT DISCOVERY AWS Discovery -> https://github.com/prometheus/prometheus/tree/master/discovery [ diagram: NEW NODE DEPLOYMENT; SERVICE DISCOVERY; DEV; STAGING; PRODUCTION; STACK / APP NAME; Alertmanager ]
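A minimal sketch of what AWS discovery might look like in the Prometheus configuration, assuming EC2 instances carry `Environment`, `Stack` and `Monitoring` tags. The tag names, the region and the job name are hypothetical; `ec2_sd_configs` and the `__meta_ec2_tag_<key>` meta labels are part of Prometheus itself.

```yaml
scrape_configs:
  - job_name: node
    ec2_sd_configs:
      - region: eu-west-1       # assumed region
        port: 9100              # node_exporter default port
    relabel_configs:
      # Copy EC2 tags into Prometheus labels so every series is filterable
      # by environment (dev / staging / production) and stack / app name.
      - source_labels: [__meta_ec2_tag_Environment]
        target_label: environment
      - source_labels: [__meta_ec2_tag_Stack]
        target_label: stack
      # Only scrape instances that opted in through a (hypothetical) tag.
      - source_labels: [__meta_ec2_tag_Monitoring]
        regex: enabled
        action: keep
```

With this in place a newly deployed node is picked up on the next discovery refresh, with no manual edit to a target list.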
  • 50. STEP 2 - IMPLEMENT EXPORTERS https://prometheus.io/docs/instrumenting/exporters/ Official node exporter -> https://github.com/prometheus/node_exporter MSSQL exporter -> https://hub.docker.com/r/awaragi/prometheus-mssql-exporter/ Nagios exporter -> https://github.com/m-lab/prometheus-nagios-exporter
  • 51. STEP 3 - IMPLEMENT CUSTOM APPLICATION METRICS https://prometheus.io/docs/instrumenting/exporters/ Windows WMI -> https://github.com/martinlindhe/wmi_exporter Java -> https://github.com/prometheus/jmx_exporter node.js -> https://www.npmjs.com/browse/keyword/prometheus .NET -> https://github.com/andrasm/prometheus-net
  • 52. STEP 4 - ADAPT TO YOUR INFRA MONITORING [ FILTER || TAG || SELECTOR ] kubernetes_sd_config
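A hedged sketch of the filter / tag idea with Kubernetes service discovery. The `prometheus.io/scrape` annotation is a common community convention rather than something built into Prometheus, and the `app` pod label and job name are assumptions.

```yaml
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # FILTER: keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # TAG: expose the namespace and the pod's "app" label as series labels
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_label_app]
        target_label: app
```

The resulting `namespace` and `app` labels can then be used as selectors in queries, for example `{namespace="prod", app="api"}`.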
  • 53. STEP 5 - METRIC DESIGN ‣ Review sample metrics and graphs ‣ Define | Reuse ‣ Naming conventions { https://prometheus.io/docs/practices/naming/ } ‣ Quantifiable [ numbers, not strings … ]
  • 55. DEVELOPER TOOL
  • 56. DEVELOPER TOOL - SIMPLE GRAPHS
  • 57. DEVELOPER TOOL - METRICS - USING PROMQL ▸ Simple queries: ▸ rate(http_requests_total[5m]) ▸ Linear predictions: ▸ predict_linear(node_filesystem_free[1h], 4*3600)
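The two queries above can be put to work in a rule file. The sketch below uses the YAML rule-file format of Prometheus 2.0 and later; the group name, rule names, the `for` duration and the `severity` label are assumptions. Newer node_exporter releases expose the filesystem metric as `node_filesystem_free_bytes`, so the name from the slide may need adjusting.

```yaml
groups:
  - name: promql_examples
    rules:
      # Recording rule: precompute the per-job request rate over 5 minutes.
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))

      # Alert when the last hour's trend predicts no free space in 4 hours.
      - alert: DiskWillFillIn4Hours
        expr: predict_linear(node_filesystem_free[1h], 4 * 3600) < 0
        for: 5m
        labels:
          severity: warning
```

`predict_linear` fits a linear regression over the 1h window and extrapolates it 4*3600 seconds ahead, so a negative result means the free space is predicted to run out within that time.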
  • 58. GRAFANA - SIMILAR WORKING EXPERIENCE - MUCH NICER
  • 59. GRAFANA - SIMILAR WORKING EXPERIENCE - MUCH NICER
  • 60. STEP 6 - ALERT DESIGN ‣ Review new metrics and graphs, define | design thresholds ‣ Define severity ‣ Ownership ‣ Escalation ladder
  • 62. ALERT DESIGN ▸ ALERT <alert name> ▸ IF <expression> ▸ [ FOR <duration> ] ▸ [ LABELS <label set> ] ▸ [ ANNOTATIONS <label set> ]
  • 63. ALERT FOR ANY INSTANCE THAT IS UNDER HIGH LOAD
      ALERT high_load
        IF node_load1 > 0.5
        ANNOTATIONS {
          description = "{{ $labels.instance }} of job {{ $labels.job }} is under high load.",
          summary = "Instance {{ $labels.instance }} under high load"
        }
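The rule above uses the Prometheus 1.x rule syntax. In Prometheus 2.0 and later, rules are written in YAML; a roughly equivalent sketch follows. The `for` duration and the `severity` labels are assumptions that are not in the slide, and the second rule is an added illustration of detecting an unreachable instance with the built-in `up` metric.

```yaml
groups:
  - name: node_alerts
    rules:
      - alert: high_load
        expr: node_load1 > 0.5
        for: 5m                   # assumed; the slide's rule has no FOR clause
        labels:
          severity: warning       # assumed severity label
        annotations:
          summary: "Instance {{ $labels.instance }} under high load"
          description: "{{ $labels.instance }} of job {{ $labels.job }} is under high load."

      # An instance that has been unreachable for more than 5 minutes.
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical      # assumed severity label
        annotations:
          summary: "Instance {{ $labels.instance }} down"
```

The `for` clause keeps an alert in the pending state until the expression has been true for the whole duration, which is one way to cut down on false positives from short spikes.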
  • 64. STILL LOOKING FOR AN ONLINE EDITOR FOR EASE OF DEVELOPMENT https://github.com/alerta/prometheus-config
  • 65. SIMPLE YAML FILE
      route:                      # where to route to
        receiver: 'slack'
      receivers:                  # router details
        - name: 'slack'
          slack_configs:
            - send_resolved: true
              username: '<username>'
              channel: '#<channel-name>'
              api_url: '<incoming-webhook-url>'
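To reflect the severity and escalation ideas from the alert design step, the single route can grow into a small routing tree. This is a sketch under assumptions: the `severity` label, the `pagerduty` receiver, the timing values and all placeholder keys are hypothetical. Recent Alertmanager releases prefer `matchers` over the older `match` key used here.

```yaml
route:
  receiver: slack                 # default receiver for everything else
  group_by: [alertname, job]      # batch related alerts into one notification
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    # Escalate critical alerts to the on-call pager.
    - match:
        severity: critical
      receiver: pagerduty

receivers:
  - name: slack
    slack_configs:
      - send_resolved: true
        channel: '#<channel-name>'
        api_url: '<incoming-webhook-url>'
  - name: pagerduty
    pagerduty_configs:
      - service_key: '<pagerduty-service-key>'
```

Alerts that match no child route fall back to the top-level receiver, so the Slack channel still sees every alert that is not critical.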
  • 66. ALERTING [ slide callouts: Channel Configuration; Variables | Global configuration ]
      global:
        resolve_timeout: 5m
        smtp_require_tls: true
        pagerduty_url: https://events.pagerduty.com/generic/2010-04-15/create_event.json
        hipchat_url: https://api.hipchat.com/
        opsgenie_api_host: https://api.opsgenie.com/
        victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/
      route:
        receiver: slack
      receivers:
        - name: slack
          slack_configs:
            - send_resolved: true
              api_url: <secret>
              channel: '#<channel-name>'
              username: <username>
              color: '{{ if eq .Status "firing" }}danger{{ else }}good{{ end }}'
              title: '{{ template "slack.default.title" . }}'
              title_link: '{{ template "slack.default.titlelink" . }}'
              pretext: '{{ template "slack.default.pretext" . }}'
              text: '{{ template "slack.default.text" . }}'
              fallback: '{{ template "slack.default.fallback" . }}'
              icon_emoji: '{{ template "slack.default.iconemoji" . }}'
              icon_url: '{{ template "slack.default.iconurl" . }}'
      templates: []
  • 67. ALERT TEMPLATING ▸ What | How to say … https://prometheus.io/blog/2016/03/03/custom-alertmanager-templates/
      - send_resolved: true
        api_url: <secret>
        channel: '#<channel-name>'
        username: <username>
        color: '{{ if eq .Status "firing" }}danger{{ else }}good{{ end }}'
        title: '{{ template "slack.default.title" . }}'
        title_link: '{{ template "slack.default.titlelink" . }}'
        pretext: '{{ template "slack.default.pretext" . }}'
        text: '{{ template "slack.default.text" . }}'
        fallback: '{{ template "slack.default.fallback" . }}'
        icon_emoji: '{{ template "slack.default.iconemoji" . }}'
        icon_url: '{{ template "slack.default.iconurl" . }}'
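As a hedged illustration of overriding the default templates, the receiver below builds the Slack title and text directly from alert data. The message layout is an assumption; `.Status`, `.CommonLabels`, `.Alerts`, `.Annotations` and `toUpper` are part of Alertmanager's notification template data and functions.

```yaml
receivers:
  - name: slack
    slack_configs:
      - api_url: '<incoming-webhook-url>'
        channel: '#<channel-name>'
        send_resolved: true
        # e.g. "[FIRING] high_load"
        title: '[{{ .Status | toUpper }}] {{ .CommonLabels.alertname }}'
        # One line per alert in the group, taken from its summary annotation.
        text: "{{ range .Alerts }}{{ .Annotations.summary }}\n{{ end }}"
```

The `text` value is double-quoted so that YAML turns `\n` into a real line break between alerts.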
  • 68. SILENCING, VIA UI / API
  • 69. ANSWERS REQUIRED FEATURES: Accessibility | Scheduling | SLAs assured | Auth & Authorization | Escalation | Durable & Resilient | Forensics | Automatic | Flexible & Elastic | Accountable
  • 70. NEXT STEPS [ diagram: INFRASTRUCTURE (OS); APPLICATION; EXTERNAL (DEPENDENCY / ENDPOINT); REMEDIABLE ?; ALERTABLE ?; LOG CORRELATION; ALERT MANAGER; LEGACY; IDENTIFY; CHOOSE ]
  • 71. DEMO TIME ‣ Docker-compose - ready for R&D to start using, to create and run custom application metrics. ‣ Prometheus, node_exporter, Alertmanager, cAdvisor, Grafana
  • 72. INTRODUCTION TO MODERN MONITORING DOCKER SETTINGS - VOLUMES, NETWORKS version: ‘2' volumes: prometheus_data: {} grafana_data: {} networks: front-tier: driver: bridge back-tier: driver: bridge Docker-compose version Docker volumes for preometheus and grafana Docker Networks Tikal Knowledge
  • 73. INTRODUCTION TO MODERN MONITORING PROMETHEUS - OFFICIAL CONTAINER
      services:
        prometheus:
          image: prom/prometheus
          container_name: prometheus
          volumes:
            - ./prometheus/:/etc/prometheus/
            - prometheus_data:/prometheus
          command:
            - '-config.file=/etc/prometheus/prometheus.yml'
            - '-storage.local.path=/prometheus'
            - '-alertmanager.url=http://alertmanager:9093'
          expose:
            - 9090
          ports:
            - 9090:9090
          links:
            - cadvisor:cadvisor
            - alertmanager:alertmanager
          depends_on:
            - cadvisor
          networks:
            - back-tier
    [ Callouts: Docker service name; Docker volumes for prometheus and grafana; Expose as service on specified port; Ports to expose as service; Link to cadvisor & alertmanager; Network placement ‘back-tier’; Configuration ]
  • 74. INTRODUCTION TO MODERN MONITORING NODE-EXPORTER [ NODE METRICS COLLECTOR ]
        node-exporter:
          container_name: node-exporter
          image: prom/node-exporter
          volumes:
            - /proc:/host/proc:ro
            - /sys:/host/sys:ro
            - /:/rootfs:ro
          command: '-collector.procfs=/host/proc -collector.sysfs=/host/sys -collector.filesystem.ignored-mount-points="^(/rootfs|/host|)/(sys|proc|dev|host|etc)($$|/)" -collector.filesystem.ignored-fs-types="^(sys|proc|auto|cgroup|devpts|ns|au|fuse.lxc|mqueue)(fs|)$$"'
          expose:
            - 9100
          networks:
            - back-tier
    [ Callouts: Access to /proc and /sys; What to mount from the OS into the container for metric collection ]
  • 75. INTRODUCTION TO MODERN MONITORING ALERT MANAGER
        alertmanager:
          image: prom/alertmanager
          ports:
            - 9093:9093
          volumes:
            - ./alertmanager/:/etc/alertmanager/
          networks:
            - back-tier
          command:
            - '-config.file=/etc/alertmanager/config.yml'
            - '-storage.path=/alertmanager'
  • 76. INTRODUCTION TO MODERN MONITORING CADVISOR
        cadvisor:
          image: google/cadvisor
          volumes:
            - /:/rootfs:ro
            - /var/run:/var/run:rw
            - /sys:/sys:ro
            - /var/lib/docker/:/var/lib/docker:ro
          expose:
            - 8080
          networks:
            - back-tier
  • 77. INTRODUCTION TO MODERN MONITORING GRAFANA
        grafana:
          image: grafana/grafana
          depends_on:
            - prometheus
          ports:
            - 3000:3000
          volumes:
            - grafana_data:/var/lib/grafana
          env_file:
            - config.monitoring
          networks:
            - back-tier
            - front-tier
  • 78. INTRODUCTION TO MODERN MONITORING DOCKER PS
      CONTAINER ID   IMAGE                COMMAND                  CREATED        STATUS         PORTS                    NAMES
      3dcfd7c289cb   grafana/grafana      "/run.sh"                21 hours ago   Up 4 minutes   0.0.0.0:3000->3000/tcp   prometheus_grafana_1
      2b2817fc0bd9   prom/prometheus      "/bin/prometheus -..."   21 hours ago   Up 4 minutes   0.0.0.0:9090->9090/tcp   prometheus
      d2c6849d3bd9   google/cadvisor      "/usr/bin/cadvisor..."   21 hours ago   Up 4 minutes   8080/tcp                 prometheus_cadvisor_1
      d4a3c3ceb97d   prom/node-exporter   "/bin/node_exporte..."   21 hours ago   Up 4 minutes   9100/tcp                 node-exporter
      75eb08791ea9   prom/alertmanager    "/bin/alertmanager..."   21 hours ago   Up 4 minutes   0.0.0.0:9093->9093/tcp   prometheus_alertmanager_1
  • 79. INTRODUCTION TO MODERN MONITORING DEMO PROJECT ON GITHUB https://github.com/shelleg/monlog-compose-stack Tikal Knowledge
  • 80. INTRODUCTION TO MODERN MONITORING ‣ All containers - monitored by prometheus + graphed in a small nice project. Tikal Knowledge
  • 81. TEXT ROLLOUT [ LLD ] Tikal Knowledge
  • 82. INTRODUCTION TO MODERN MONITORING PLACEMENT OPTIONS ‣ 1 main prometheus server vs. 1 Prometheus server per team ‣ 1 Alert-manager [ with pre-defined “receivers” ] vs. 1 per team / concern Tikal Knowledge
  • 83. INTRODUCTION TO MODERN MONITORING DEPLOYMENT OPTIONS ‣ Automate deployment of prometheus server(s) / Alert-manager [ pre-defined “receivers” ] ‣ Ansible, puppet etc ‣ Jenkins ‣ The combination of the 2 ;) ‣ Automation helps solve the “one 2 Many” dilemma IMHO … Tikal Knowledge
  • 84. INTRODUCTION TO MODERN MONITORING DEVELOPER STACK ‣ Options: ‣ Personal Docker / Docker-compose [ private fork if desired ] ‣ A small startup.cmd / startup.sh starting the Go applications of prometheus & alertmanager ‣ A centralized Grafana / Alertmanager with only prometheus on the dev machine ‣ Toolkit for ‣ developing metrics, alarms, graphs ‣ adding exporters to configuration [ tendency :: as common as you develop new services ] ‣ SDLC -> Git pull/merge request mechanism Tikal Knowledge
  • 85. INTRODUCTION TO MODERN MONITORING DEVELOPER STACK(S) - EXAMPLE Tikal Knowledge
  • 86. INTRODUCTION TO MODERN MONITORING ALERTS IN SCM MASTER -> STG -> PRD Tikal Knowledge
  • 87. INTRODUCTION TO MODERN MONITORING POPULATE ALERTS | METRICS | DASHBOARDS VIA SCM 1. Use “ready made” || good starting point graphs from the grafana dashboard exchange or build your own 2. Customize 3. Add / push to git master branch 4. “ci” server -> listens on Git hook -> pushes to staging 5. “ci” server -> waits for manual trigger -> pushes to production Tikal Knowledge
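Steps 4 and 5 for dashboards could look roughly like the sketch below, which a "ci" job would run against staging and later against production. It assumes Grafana's `POST /api/dashboards/db` endpoint with an API token; the function names and the commit message are hypothetical.

```python
import copy
import json
import urllib.request


def build_import_payload(exported, message):
    """Wrap a dashboard exported to git so it can be pushed to another Grafana."""
    dashboard = copy.deepcopy(exported)
    # The numeric id is local to the Grafana it was exported from, so clear it
    # and let the target instance assign its own.
    dashboard["id"] = None
    return {"dashboard": dashboard, "overwrite": True, "message": message}


def push_dashboard(grafana_url, api_token, payload):
    """Send the payload to Grafana (assumed endpoint: /api/dashboards/db)."""
    request = urllib.request.Request(
        grafana_url.rstrip("/") + "/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_token,
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Setting `overwrite` to true means the copy in git wins over whatever is currently deployed, so the dashboard in each environment always reflects the branch it was deployed from.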
  • 88. INTRODUCTION TO MODERN MONITORING CONTINUOUS DELIVERY OPTIONS [ ADDING AN ALERT SAMPLE WORKFLOW ] master (dev) staging production DEVELOP DEPLOY TO STAGE DEPLOY TO PROD 1 centralized repo branch per env / prometheus instance Tikal Knowledge
  • 89. INTRODUCTION TO MODERN MONITORING CONTINUOUS DELIVERY OPTIONS [ ADDING GRAPHS ] master (dev) staging production DEVELOP DEPLOY TO STAGE DEPLOY TO PROD “Grafana Dashboard hub” - separate repo ? - part of monitoring repo ? Tikal Knowledge
  • 90. INTRODUCTION TO MODERN MONITORING CI PIPELINE -DATA ORIGINS & PRESENTATION Exporters REGION POD INSTANCE * } } App Metrics OS Metrics Filter Tags & Alerts Tikal Knowledge
  • 91. INTRODUCTION TO MODERN MONITORING CI PIPELINE DEV STAGING PRODUCTION STACK / APP NAME ALERTMANAGER ALERTMANAGER Web-hook (PR-builder) GRAFANA GRAFANA OPS “CLEANUP” ROUTINE(S) Tikal Knowledge
  • 92. INTRODUCTION TO MODERN MONITORING BUILDING THE PIPELINE ‣ Routine on submit / push builds to dev/stg ‣ Run daily / weekly deployments of Alerts (prometheus) | Dashboards (grafana) ‣ Avoid / rollback any manual changes of Alerts / Graphs etc ‣ Help make automation a common practice ‣ Scheduled task which syncs and re-configures the desired state from SCM Tikal Knowledge
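The scheduled "sync desired state from SCM" task boils down to comparing what is in the repository with what is deployed and then acting on the difference. A minimal sketch of that comparison follows; it assumes both sides can be read as a mapping of file path to file content, and all names are hypothetical.

```python
import hashlib


def digest(content):
    """Content hash used to detect manual edits on the deployed side."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def plan_sync(desired, deployed):
    """Return the actions that make `deployed` match `desired`.

    Both arguments map a file path (alert rule file, dashboard JSON, ...)
    to its content. `desired` comes from the SCM checkout, `deployed` from
    the running environment.
    """
    plan = {"create": [], "update": [], "delete": []}
    for path, content in desired.items():
        if path not in deployed:
            plan["create"].append(path)
        elif digest(content) != digest(deployed[path]):
            plan["update"].append(path)  # manual change: roll back to SCM
    for path in deployed:
        if path not in desired:
            plan["delete"].append(path)  # created by hand, not in SCM
    return {action: sorted(paths) for action, paths in plan.items()}
```

Running this on a schedule and applying the resulting plan is what rolls back manual changes: anything edited or added by hand shows up under `update` or `delete` on the next run.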
  • 93. INTRODUCTION TO MODERN MONITORING MEASURE THE PIPELINE ‣ Pipeline steps are monitored ‣ Expose metrics such as: ‣ deployment time & status [ in env | stack etc ] ‣ count (# of alerts, new vs old last week, month etc) ‣ Metric counters [ application metrics ] … ‣ [ Jenkins exporter || push gateway TBD ] Tikal Knowledge
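If the push gateway option is chosen, a pipeline step could report its own metrics roughly as follows. This is a sketch under assumptions: the metric names are invented for illustration, and the push targets the Pushgateway's `/metrics/job/<job>` path using the Prometheus text exposition format.

```python
import urllib.request


def escape_label(value):
    """Escape backslash, double quote and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauges(samples):
    """Render (name, labels, value) tuples as text-format gauges.

    Samples are grouped by metric name, because the format expects all
    lines of one metric to appear together under a single TYPE line.
    """
    grouped = {}
    for name, labels, value in samples:
        grouped.setdefault(name, []).append((labels, value))
    lines = []
    for name, series in grouped.items():
        lines.append(f"# TYPE {name} gauge")
        for labels, value in series:
            label_text = ",".join(
                f'{key}="{escape_label(val)}"' for key, val in sorted(labels.items())
            )
            prefix = f"{name}{{{label_text}}}" if label_text else name
            lines.append(f"{prefix} {value}")
    return "\n".join(lines) + "\n"


def push(gateway_url, job, body):
    """PUT replaces every metric previously pushed for this job."""
    request = urllib.request.Request(
        f"{gateway_url.rstrip('/')}/metrics/job/{job}",
        data=body.encode("utf-8"),
        method="PUT",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

A deploy job might end with `push("http://pushgateway:9091", "deploy_alerts", render_gauges([...]))`, reporting for example a hypothetical `pipeline_deploy_duration_seconds` labelled by env and stack, which Prometheus then scrapes from the gateway like any other target.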
  • 94. FEEDBACK / QUESTIONS ? I’M HERE … HAGZAG@TIKALK.COM, 0545302525 Haggai Philip Zagury - Tikal Knowledge MONITORING HLD FullStack Developers Israel