Classic or Cloud: It Doesn't Matter.
Monitoring Without the Balancing Act, with OMD
– Part II: Cloud Monitoring –
Ulrike Klusik
22.11.2019
Difference between classical and cloud applications
[Diagram: classic vs. cloud deployments]
• classic: a fixed set of instances and resources (e.g. App1/Inst1, App1/Inst2, App2/Inst1) deployed in a fixed order and recorded in a CMDB together with application, version, and limits
• cloud: instances and resources on demand (App1 Inst 1 … Inst N, App2 Inst 1 … Inst N)
Monitoring Challenges in the Cloud
• A cloud infrastructure is a platform for highly available applications that scale on demand
• Hence the monitoring infrastructure must also be scalable to satisfy these needs
• Monitoring of central services: immediate alerts about reduced availability
• Monitoring of resource usage: early alerts so capacity can be extended in time
• Rapidly changing applications:
• Fixed checks quickly become outdated
• It is important to have more performance metrics available than are used for the current alerting, e.g. for detailed post-mortem analysis
The monitoring solution needs to know exactly what is running at the moment and needs to collect many metrics.
Prometheus for Monitoring in the Cloud
• The open source monitoring and alerting solution for containerized systems and especially Kubernetes:
• Can determine metric sources (aka targets) dynamically via service discovery for Kubernetes, most public cloud providers, and other container registries
• Gathers/scrapes metrics from these targets
• Alert rules, defined as expressions over these metrics, describe problematic conditions (see the rule sketch below)
• The Alertmanager receives these alerts, deduplicates them, and routes them, e.g. via email or generic webhooks, to incident management systems
• Visualization is typically done via Grafana
[Architecture diagram from https://prometheus.io/assets/architecture.png]
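As an illustration of the alert rules mentioned above, here is a minimal Prometheus rule file. It is a sketch, not taken from the talk: the group name, severity label, and threshold are assumptions; only the built-in "up" metric and the rule syntax are standard Prometheus.

  groups:
  - name: example-availability        # group name is made up
    rules:
    # Fire when a scrape target has been unreachable for five minutes;
    # Prometheus itself sets "up" to 0 whenever a scrape fails.
    - alert: TargetDown
      expr: up == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: '{{ $labels.instance }} of job {{ $labels.job }} is down'

The Alertmanager then deduplicates and routes the resulting alert as described above.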
Example: Monitoring OpenShift Clusters
• OpenShift is a commercial Kubernetes implementation
• The central service URLs of the cluster infrastructure are stable,
• but the infrastructure objects to be monitored (nodes, pods) change rapidly
• The API already provides metadata about the cluster components, so the metric targets can be determined generically (see the discovery sketch below)
[Diagram from https://docs.okd.io/3.11/architecture/index.html]
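A minimal sketch of such generic target discovery in a Prometheus scrape configuration, assuming Prometheus runs inside the cluster with a service account; the job name is made up:

  scrape_configs:
  - job_name: openshift-nodes              # hypothetical job name
    scheme: https
    kubernetes_sd_configs:
    - role: node                           # ask the cluster API for all current nodes
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

Nodes that appear or disappear are picked up automatically on the next service discovery refresh, so no check definitions go stale.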
ConSol OpenShift Infrastructure Monitoring Architecture
[Architecture diagram: a Prometheus instance (port 9090) in the OpenShift project prometheus-infra-mon scrapes, per node, the node-exporter (9100) and Kubelet + cAdvisor, plus KSM/OSM (8080) and the OpenShift services: HAProxy (router), etcd (on the masters), api-servers, kube controllers, EFK logging (via pods), and GlusterFS (via the Heketi route). Selected metrics are forwarded via remote write to InfluxDB (8086) on the OMD server, which also runs the Alertmanager (443, clustering possible) and Grafana (443). Alerts reach incident management systems (e.g. Remedy, ServiceNow) via a custom webhook.]
• Most OpenShift services already provide Prometheus metrics (more with each version > 3.6)
• Node-exporter for operating system metrics
• KSM/OSM (kube-state-metrics / openshift-state-metrics): metrics about cluster objects and their states
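The "remote write (selected metrics)" arrow in the architecture above could look as follows in the Prometheus configuration. This is a sketch assuming InfluxDB's Prometheus remote-write endpoint; the host name and database are placeholders:

  remote_write:
  - url: https://omd.example.com:8086/api/v1/prom/write?db=prometheus   # placeholder host and db
    write_relabel_configs:
    # Forward only the node-exporter metrics; everything else stays local.
    - source_labels: [__name__]
      regex: node_.*
      action: keep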
Visualization: Cluster Monitoring Cockpit via Grafana
• Top-down approach, drilling from the cluster level into individual objects:
[Dashboard hierarchy: Cluster Overview → Cluster Resources; Cluster Overview → Node Resources → Pod Details; Cluster Overview → Service Dashboard → Service Details / Pod Details]
Dashboard: Entry Dashboard "Cluster Overview" per Cluster
• Cluster Services section: overall status from URL checks and pod availabilities
• Color coding shows the worst status in the selected time period (see the sketch below)
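One way to compute such a "worst status over the period" value is to take the minimum of the check result over the time range. A sketch as a recording rule, assuming the URL checks come from the blackbox exporter (probe_success); the rule and group names are made up:

  groups:
  - name: cluster-overview-status          # group name is made up
    rules:
    # 0 if the URL check failed at any point in the last hour, 1 otherwise;
    # probe_success is the blackbox exporter's result metric.
    - record: job:probe_success:worst_1h
      expr: min_over_time(probe_success[1h])

In practice the same min_over_time() expression can also be used directly in a Grafana panel over the dashboard's time range.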
Dashboard: Entry Dashboard "Cluster Overview" per Cluster
• Overview of current alerts:
• only listed by alert name
• details are in Prometheus or in the incident management system (which is notified via the Alertmanager)
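Such an alert list can be driven by Prometheus' built-in ALERTS metric, which carries one series per active alert. A sketch, again as a recording rule with made-up names:

  groups:
  - name: alert-overview                   # group name is made up
    rules:
    # Number of currently firing alerts, grouped by alert name;
    # the ALERTS series are maintained automatically by Prometheus.
    - record: alertname:alerts_firing:count
      expr: count by (alertname) (ALERTS{alertstate="firing"})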
Dashboard: Services, e.g. Router/HAProxy
General idea for the service dashboards:
• Health: availability and errors (see the sketch below)
• System: drill-through to the pods
• Basic general info: the most important performance metrics
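For the router/HAProxy example, a typical health panel is the rate of HTTP 5xx responses per backend. A sketch assuming the standard haproxy_exporter metric names:

  groups:
  - name: router-health                    # group name is made up
    rules:
    # Per-backend rate of HTTP 5xx responses over five minutes,
    # based on the haproxy_exporter's response counters.
    - record: backend:haproxy_http_5xx:rate5m
      expr: rate(haproxy_backend_http_responses_total{code="5xx"}[5m])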
Dashboard: Node Resources
• Details on one cluster node:
• resource capacities
• number of pods, with drill-through
• availability via node status
• operating system metrics from node-exporter (see the sketch below)
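A typical panel on such a node dashboard derives CPU utilization from the node-exporter counters. A sketch as a recording rule with made-up names:

  groups:
  - name: node-resources                   # group name is made up
    rules:
    # CPU utilization per node: one minus the idle fraction,
    # averaged over all cores of node_cpu_seconds_total.
    - record: instance:node_cpu_utilisation:rate5m
      expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))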
Conclusion
• OMD Labs integrates the tools needed to monitor all kinds of infrastructures.
• It is open source.
• We have a lot of experience implementing monitoring solutions based on OMD Labs for complex and dynamically changing IT infrastructures.
• We can customize it to your needs.
• Check it out:
https://labs.consol.de/de/omd/index.html
https://labs.consol.de/de/omd/getting_started.html
Thank you very much!
