SlideShare a Scribd company logo
Implementing Observability
for Kubernetes
José Manuel Ortega(@jmortegac)
Agenda
● Introducing the concept of observability
● Implementing Kubernetes observability
● Observability stack in K8s
● Integrating Prometheus with
OpenTelemetry
Introducing the concept of observability
● Software architecture is more complex.
● Pillars of observability—logs, metrics,
and traces.
● Observability is now a top priority for
DevOps teams.
Introducing the concept of observability
● Monitoring
● Logging
● Tracing
Introducing the concept of observability
● time=”2019-12-23T01:27:38-04:00″
level=debug msg=”Application starting”
environment=dev
● http_requests_total=100
log metric
Introducing the concept of observability
Implementing Kubernetes observability
1. Node status. Current health status and availability of the
node.
2. Node resource usage metrics. Disk and memory
utilization, CPU and network bandwidth.
3. Implementation status. Current and desired state of the
deployments in the cluster.
4. Number of pods. Kubernetes internal components and
processes use this information to manage the workload and
schedule the pods.
Implementing Kubernetes observability
1. Kubernetes metrics. These metrics apply to the number
and types of resources within a pod. This metric includes
resource limit tracking to avoid running out of system
resources.
2. Container metrics. These metrics capture the utilization of
container-level resources, such as CPU, memory, and
network usage.
3. Application metrics. Such metrics include the number of
active or online users and response times.
Implementing Kubernetes observability
Implementing Kubernetes observability
Implementing Kubernetes observability
Observability stack in K8s
● Kubewatch is an open-source Kubernetes monitoring
tool that sends notifications about changes in a
Kubernetes cluster to various communication channels,
such as Slack, Microsoft Teams, or email.
● It monitors Kubernetes resources, such as deployments,
services, and pods, and alerts users in real-time when
changes occur.
https://github.com/vmware-archive/kubewatch
Observability stack in K8s
https://github.com/salesforce/sloop
Observability stack in K8s
● Jaeger is an open-source distributed tracing system
● The tool is designed to monitor and troubleshoot
distributed microservices, mostly focusing on:
○ Distributed context propagation
○ Distributed transaction monitoring
○ Root cause analysis
○ Service dependency analysis
○ Performance/latency optimization
https://www.jaegertracing.io
Observability stack in K8s
Observability stack in K8s
Observability stack in K8s
https://www.jaegertracing.io/docs/1.46/operator
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
name: simplest
Observability stack in K8s
● Fluentd is an open-source data collector for
unified logging layers.
● It works with Kubernetes running as
DaemonSet. This combination ensures that all
nodes run one copy of a pod.
https://www.fluentd.org
Observability stack in K8s
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
name: fluentd
namespace: kube-system
spec:
containers:
– name: fluentd
image:
quay.io/fluent/fluentd-kubernetes-daemonset
Observability stack in K8s
Observability stack in K8s
● Prometheus is a cloud native time series data
store with built-in rich query language for
metrics.
● Collecting data with Prometheus opens up many
possibilities for increasing the observability of
your infrastructure and the containers running in
Kubernetes cluster.
https://prometheus.io
Observability stack in K8s
● Multi-dimensional data model
● Prometheus query language(PromQL)
● Data collection
● Storage
● Visualization(Grafana)
https://prometheus.io
Observability stack in K8s
Observability stack in K8s
● Most of the metrics can be exported using node_exporter
https://github.com/prometheus/node_exporter and cAdvisor
https://github.com/google/cadvisor
○ Resource utilization saturation. The containers’ resource
consumption and allocation.
○ The number of failing pods and errors within a specific
namespace.
○ Kubernetes resource capacity. The total number of
nodes, CPU cores, and memory available.
Observability stack in K8s
Observability stack in K8s
● Service dependencies & communication map
○ What services are communicating with each other?
○ What HTTP calls are being made?
● Operational monitoring & alerting
○ Is any network communication failing?
○ Is the communication broken on layer 4 (TCP) or layer 7 (HTTP)?
● Application monitoring
○ What is the rate of 5xx or 4xx HTTP response codes for a
particular service or across all clusters?
● Security observability
○ Which services had connections blocked due to network policy?
https://github.com/cilium/hubble
Observability stack in K8s
https://github.com/cilium/hubble
Observability stack in K8s
https://github.com/cilium/hubble
Service Dependency Graph
Observability stack in K8s
https://github.com/cilium/hubble
Networking Behavior
Observability stack in K8s
https://github.com/cilium/hubble
HTTP Request/Response Rate & Latency
Integrating Prometheus with OpenTelemetry
Integrating Prometheus with OpenTelemetry
Integrating Prometheus with OpenTelemetry
● Receivers: are the data sources of observability
information.
● Processors: they process the information received before it
is exported to the different backends.
● Exporters: they are in charge of exporting the information to
the different backends, such as Jaeger or Kafka
Integrating Prometheus with Open
https://github.com/open-telemetry/opentelemetry-collector
otel-collector:
image: otel/opentelemetry-collector:latest
command: [ "--config=/etc/otel-collector-config.yaml" ]
volumes:
- ./otel-collector-config.yaml:/etc/otel-collector-config.yaml:Z
ports:
- "13133:13133"
- "4317:4317"
- "4318:4318"
depends_on:
- jaeger
Integrating Prometheus with OpenTelemetry
otel-collector-config.yaml
processors:
batch:
extensions:
health_check:
service:
extensions: [health_check]
pipelines:
traces:
receivers: [otlp]
processors: [batch]
exporters: [jaeger]
receivers:
otlp:
protocols:
grpc:
endpoint: otel-collector:4317
exporters:
jaeger:
endpoint: jaeger:14250
tls:
insecure: true
Integrating Prometheus with OpenTelemetry
otel-collector-config.yaml
Integrating Prometheus with OpenTelemetry
https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor
Integrating Prometheus with OpenTelemetry
Integrating Prometheus with OpenTelemetry
receivers:
..
prometheus:
config:
scrape_configs:
- job_name: 'service-a'
scrape_interval: 2s
metrics_path: '/metrics/prometheus'
static_configs:
- targets: [ 'service-a:8080' ]
- job_name: 'service-b'
scrape_interval: 2s
metrics_path: '/actuator/prometheus'
static_configs:
- targets: [ 'service-b:8081' ]
- job_name: 'service-c'
scrape_interval: 2s
Integrating Prometheus with OpenTelemetry
exporters:
…
prometheusremotewrite:
endpoint: http://prometheus:9090/api/v1/write
tls:
insecure: true
● active in Prometheus “--web.enable-remote-write-receiver”
Integrating Prometheus with OpenTelemetry
https://github.com/open-telemetry/opentelemetry-demo
Integrating Prometheus with OpenTelemetry
https://github.com/open-telemetry/opentelemetry-demo
Integrating Prometheus with OpenTelemetry
https://github.com/open-telemetry/opentelemetry-demo
Integrating Prometheus with OpenTelemetry
https://github.com/open-telemetry/opentelemetry-demo
Conclusions
● Lean on the native capabilities of Kubernetes for the
collection and exploitation of metrics in order to know
the state of health of your pods and, in general, of your
cluster.
● Use these metrics to be able to create alarms that
proactively notify us of errors or even allow us to
anticipate issues in our applications or infraestructure.
¡Thank you!
@jmortegac
https://www.linkedin.com
/in/jmortega1
https://jmortega.github.io

More Related Content

Similar to Implementing Observability for Kubernetes.pdf

Using Kubernetes to make cellular data plans cheaper for 50M users
Using Kubernetes to make cellular data plans cheaper for 50M usersUsing Kubernetes to make cellular data plans cheaper for 50M users
Using Kubernetes to make cellular data plans cheaper for 50M users
Mirantis
 
Kubernetes One-Click Deployment: Hands-on Workshop (Mainz)
Kubernetes One-Click Deployment: Hands-on Workshop (Mainz)Kubernetes One-Click Deployment: Hands-on Workshop (Mainz)
Kubernetes One-Click Deployment: Hands-on Workshop (Mainz)
QAware GmbH
 
Intro to GitOps with Weave GitOps, Flagger and Linkerd
Intro to GitOps with Weave GitOps, Flagger and LinkerdIntro to GitOps with Weave GitOps, Flagger and Linkerd
Intro to GitOps with Weave GitOps, Flagger and Linkerd
Weaveworks
 
Microservices Part 4: Functional Reactive Programming
Microservices Part 4: Functional Reactive ProgrammingMicroservices Part 4: Functional Reactive Programming
Microservices Part 4: Functional Reactive Programming
Araf Karsh Hamid
 
08 - kubernetes.pptx
08 - kubernetes.pptx08 - kubernetes.pptx
08 - kubernetes.pptx
RanjithM61
 
kubernetesssssssssssssssssssssssssss.pdf
kubernetesssssssssssssssssssssssssss.pdfkubernetesssssssssssssssssssssssssss.pdf
kubernetesssssssssssssssssssssssssss.pdf
bchiriamina2
 
OSMC 2019 | Monitoring Cockpit for Kubernetes Clusters by Ulrike Klusik
OSMC 2019 | Monitoring Cockpit for Kubernetes Clusters by Ulrike KlusikOSMC 2019 | Monitoring Cockpit for Kubernetes Clusters by Ulrike Klusik
OSMC 2019 | Monitoring Cockpit for Kubernetes Clusters by Ulrike Klusik
NETWAYS
 
Kubernetes PPT.pptx
Kubernetes PPT.pptxKubernetes PPT.pptx
Kubernetes PPT.pptx
ssuser0cc9131
 
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
GetInData
 
Monitoring Cockpit for OpenShift Clusters
Monitoring Cockpit for OpenShift ClustersMonitoring Cockpit for OpenShift Clusters
Monitoring Cockpit for OpenShift Clusters
ConSol Consulting & Solutions Software GmbH
 
OpenTelemetry 101 FTW
OpenTelemetry 101 FTWOpenTelemetry 101 FTW
OpenTelemetry 101 FTW
NGINX, Inc.
 
software defined network, openflow protocol and its controllers
software defined network, openflow protocol and its controllerssoftware defined network, openflow protocol and its controllers
software defined network, openflow protocol and its controllersIsaku Yamahata
 
Exploring the power of OpenTelemetry on Kubernetes
Exploring the power of OpenTelemetry on KubernetesExploring the power of OpenTelemetry on Kubernetes
Exploring the power of OpenTelemetry on Kubernetes
Red Hat Developers
 
Xpdays: Kubernetes CI-CD Frameworks Case Study
Xpdays: Kubernetes CI-CD Frameworks Case StudyXpdays: Kubernetes CI-CD Frameworks Case Study
Xpdays: Kubernetes CI-CD Frameworks Case Study
Denys Vasyliev
 
Monitoring kubernetes with prometheus-operator
Monitoring kubernetes with prometheus-operatorMonitoring kubernetes with prometheus-operator
Monitoring kubernetes with prometheus-operator
Lili Cosic
 
2307 - DevBCN - Otel 101_compressed.pdf
2307 - DevBCN - Otel 101_compressed.pdf2307 - DevBCN - Otel 101_compressed.pdf
2307 - DevBCN - Otel 101_compressed.pdf
DimitrisFinas1
 
Monitoring&Logging - Stanislav Kolenkin
Monitoring&Logging - Stanislav Kolenkin  Monitoring&Logging - Stanislav Kolenkin
Monitoring&Logging - Stanislav Kolenkin
Kuberton
 
Zephyr Introduction - Nordic Webinar - Sept. 24.pdf
Zephyr Introduction - Nordic Webinar - Sept. 24.pdfZephyr Introduction - Nordic Webinar - Sept. 24.pdf
Zephyr Introduction - Nordic Webinar - Sept. 24.pdf
AswathRangaraj1
 
Monitoring in 2017 - TIAD Camp Docker
Monitoring in 2017 - TIAD Camp DockerMonitoring in 2017 - TIAD Camp Docker
Monitoring in 2017 - TIAD Camp Docker
The Incredible Automation Day
 
Integrating Puppet and Gitolite for sysadmins cooperations
Integrating Puppet and Gitolite for sysadmins cooperationsIntegrating Puppet and Gitolite for sysadmins cooperations
Integrating Puppet and Gitolite for sysadmins cooperations
Luca Mazzaferro
 

Similar to Implementing Observability for Kubernetes.pdf (20)

Using Kubernetes to make cellular data plans cheaper for 50M users
Using Kubernetes to make cellular data plans cheaper for 50M usersUsing Kubernetes to make cellular data plans cheaper for 50M users
Using Kubernetes to make cellular data plans cheaper for 50M users
 
Kubernetes One-Click Deployment: Hands-on Workshop (Mainz)
Kubernetes One-Click Deployment: Hands-on Workshop (Mainz)Kubernetes One-Click Deployment: Hands-on Workshop (Mainz)
Kubernetes One-Click Deployment: Hands-on Workshop (Mainz)
 
Intro to GitOps with Weave GitOps, Flagger and Linkerd
Intro to GitOps with Weave GitOps, Flagger and LinkerdIntro to GitOps with Weave GitOps, Flagger and Linkerd
Intro to GitOps with Weave GitOps, Flagger and Linkerd
 
Microservices Part 4: Functional Reactive Programming
Microservices Part 4: Functional Reactive ProgrammingMicroservices Part 4: Functional Reactive Programming
Microservices Part 4: Functional Reactive Programming
 
08 - kubernetes.pptx
08 - kubernetes.pptx08 - kubernetes.pptx
08 - kubernetes.pptx
 
kubernetesssssssssssssssssssssssssss.pdf
kubernetesssssssssssssssssssssssssss.pdfkubernetesssssssssssssssssssssssssss.pdf
kubernetesssssssssssssssssssssssssss.pdf
 
OSMC 2019 | Monitoring Cockpit for Kubernetes Clusters by Ulrike Klusik
OSMC 2019 | Monitoring Cockpit for Kubernetes Clusters by Ulrike KlusikOSMC 2019 | Monitoring Cockpit for Kubernetes Clusters by Ulrike Klusik
OSMC 2019 | Monitoring Cockpit for Kubernetes Clusters by Ulrike Klusik
 
Kubernetes PPT.pptx
Kubernetes PPT.pptxKubernetes PPT.pptx
Kubernetes PPT.pptx
 
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
 
Monitoring Cockpit for OpenShift Clusters
Monitoring Cockpit for OpenShift ClustersMonitoring Cockpit for OpenShift Clusters
Monitoring Cockpit for OpenShift Clusters
 
OpenTelemetry 101 FTW
OpenTelemetry 101 FTWOpenTelemetry 101 FTW
OpenTelemetry 101 FTW
 
software defined network, openflow protocol and its controllers
software defined network, openflow protocol and its controllerssoftware defined network, openflow protocol and its controllers
software defined network, openflow protocol and its controllers
 
Exploring the power of OpenTelemetry on Kubernetes
Exploring the power of OpenTelemetry on KubernetesExploring the power of OpenTelemetry on Kubernetes
Exploring the power of OpenTelemetry on Kubernetes
 
Xpdays: Kubernetes CI-CD Frameworks Case Study
Xpdays: Kubernetes CI-CD Frameworks Case StudyXpdays: Kubernetes CI-CD Frameworks Case Study
Xpdays: Kubernetes CI-CD Frameworks Case Study
 
Monitoring kubernetes with prometheus-operator
Monitoring kubernetes with prometheus-operatorMonitoring kubernetes with prometheus-operator
Monitoring kubernetes with prometheus-operator
 
2307 - DevBCN - Otel 101_compressed.pdf
2307 - DevBCN - Otel 101_compressed.pdf2307 - DevBCN - Otel 101_compressed.pdf
2307 - DevBCN - Otel 101_compressed.pdf
 
Monitoring&Logging - Stanislav Kolenkin
Monitoring&Logging - Stanislav Kolenkin  Monitoring&Logging - Stanislav Kolenkin
Monitoring&Logging - Stanislav Kolenkin
 
Zephyr Introduction - Nordic Webinar - Sept. 24.pdf
Zephyr Introduction - Nordic Webinar - Sept. 24.pdfZephyr Introduction - Nordic Webinar - Sept. 24.pdf
Zephyr Introduction - Nordic Webinar - Sept. 24.pdf
 
Monitoring in 2017 - TIAD Camp Docker
Monitoring in 2017 - TIAD Camp DockerMonitoring in 2017 - TIAD Camp Docker
Monitoring in 2017 - TIAD Camp Docker
 
Integrating Puppet and Gitolite for sysadmins cooperations
Integrating Puppet and Gitolite for sysadmins cooperationsIntegrating Puppet and Gitolite for sysadmins cooperations
Integrating Puppet and Gitolite for sysadmins cooperations
 

More from Jose Manuel Ortega Candel

Herramientas de benchmarks para evaluar el rendimiento en máquinas y aplicaci...
Herramientas de benchmarks para evaluar el rendimiento en máquinas y aplicaci...Herramientas de benchmarks para evaluar el rendimiento en máquinas y aplicaci...
Herramientas de benchmarks para evaluar el rendimiento en máquinas y aplicaci...
Jose Manuel Ortega Candel
 
Asegurando tus APIs Explorando el OWASP Top 10 de Seguridad en APIs.pdf
Asegurando tus APIs Explorando el OWASP Top 10 de Seguridad en APIs.pdfAsegurando tus APIs Explorando el OWASP Top 10 de Seguridad en APIs.pdf
Asegurando tus APIs Explorando el OWASP Top 10 de Seguridad en APIs.pdf
Jose Manuel Ortega Candel
 
PyGoat Analizando la seguridad en aplicaciones Django.pdf
PyGoat Analizando la seguridad en aplicaciones Django.pdfPyGoat Analizando la seguridad en aplicaciones Django.pdf
PyGoat Analizando la seguridad en aplicaciones Django.pdf
Jose Manuel Ortega Candel
 
Ciberseguridad en Blockchain y Smart Contracts: Explorando los Desafíos y Sol...
Ciberseguridad en Blockchain y Smart Contracts: Explorando los Desafíos y Sol...Ciberseguridad en Blockchain y Smart Contracts: Explorando los Desafíos y Sol...
Ciberseguridad en Blockchain y Smart Contracts: Explorando los Desafíos y Sol...
Jose Manuel Ortega Candel
 
Evolution of security strategies in K8s environments- All day devops
Evolution of security strategies in K8s environments- All day devops Evolution of security strategies in K8s environments- All day devops
Evolution of security strategies in K8s environments- All day devops
Jose Manuel Ortega Candel
 
Evolution of security strategies in K8s environments.pdf
Evolution of security strategies in K8s environments.pdfEvolution of security strategies in K8s environments.pdf
Evolution of security strategies in K8s environments.pdf
Jose Manuel Ortega Candel
 
Computación distribuida usando Python
Computación distribuida usando PythonComputación distribuida usando Python
Computación distribuida usando Python
Jose Manuel Ortega Candel
 
Seguridad en arquitecturas serverless y entornos cloud
Seguridad en arquitecturas serverless y entornos cloudSeguridad en arquitecturas serverless y entornos cloud
Seguridad en arquitecturas serverless y entornos cloud
Jose Manuel Ortega Candel
 
Construyendo arquitecturas zero trust sobre entornos cloud
Construyendo arquitecturas zero trust sobre entornos cloud Construyendo arquitecturas zero trust sobre entornos cloud
Construyendo arquitecturas zero trust sobre entornos cloud
Jose Manuel Ortega Candel
 
Tips and tricks for data science projects with Python
Tips and tricks for data science projects with Python Tips and tricks for data science projects with Python
Tips and tricks for data science projects with Python
Jose Manuel Ortega Candel
 
Sharing secret keys in Docker containers and K8s
Sharing secret keys in Docker containers and K8sSharing secret keys in Docker containers and K8s
Sharing secret keys in Docker containers and K8s
Jose Manuel Ortega Candel
 
Implementing cert-manager in K8s
Implementing cert-manager in K8sImplementing cert-manager in K8s
Implementing cert-manager in K8s
Jose Manuel Ortega Candel
 
Python para equipos de ciberseguridad(pycones)
Python para equipos de ciberseguridad(pycones)Python para equipos de ciberseguridad(pycones)
Python para equipos de ciberseguridad(pycones)
Jose Manuel Ortega Candel
 
Python para equipos de ciberseguridad
Python para equipos de ciberseguridad Python para equipos de ciberseguridad
Python para equipos de ciberseguridad
Jose Manuel Ortega Candel
 
Shodan Tips and tricks. Automatiza y maximiza las búsquedas shodan
Shodan Tips and tricks. Automatiza y maximiza las búsquedas shodanShodan Tips and tricks. Automatiza y maximiza las búsquedas shodan
Shodan Tips and tricks. Automatiza y maximiza las búsquedas shodan
Jose Manuel Ortega Candel
 
ELK para analistas de seguridad y equipos Blue Team
ELK para analistas de seguridad y equipos Blue TeamELK para analistas de seguridad y equipos Blue Team
ELK para analistas de seguridad y equipos Blue Team
Jose Manuel Ortega Candel
 
Monitoring and managing Containers using Open Source tools
Monitoring and managing Containers using Open Source toolsMonitoring and managing Containers using Open Source tools
Monitoring and managing Containers using Open Source tools
Jose Manuel Ortega Candel
 
Python Memory Management 101(Europython)
Python Memory Management 101(Europython)Python Memory Management 101(Europython)
Python Memory Management 101(Europython)
Jose Manuel Ortega Candel
 
SecDevOps containers
SecDevOps containersSecDevOps containers
SecDevOps containers
Jose Manuel Ortega Candel
 
Python memory managment. Deeping in Garbage collector
Python memory managment. Deeping in Garbage collectorPython memory managment. Deeping in Garbage collector
Python memory managment. Deeping in Garbage collector
Jose Manuel Ortega Candel
 

More from Jose Manuel Ortega Candel (20)

Herramientas de benchmarks para evaluar el rendimiento en máquinas y aplicaci...
Herramientas de benchmarks para evaluar el rendimiento en máquinas y aplicaci...Herramientas de benchmarks para evaluar el rendimiento en máquinas y aplicaci...
Herramientas de benchmarks para evaluar el rendimiento en máquinas y aplicaci...
 
Asegurando tus APIs Explorando el OWASP Top 10 de Seguridad en APIs.pdf
Asegurando tus APIs Explorando el OWASP Top 10 de Seguridad en APIs.pdfAsegurando tus APIs Explorando el OWASP Top 10 de Seguridad en APIs.pdf
Asegurando tus APIs Explorando el OWASP Top 10 de Seguridad en APIs.pdf
 
PyGoat Analizando la seguridad en aplicaciones Django.pdf
PyGoat Analizando la seguridad en aplicaciones Django.pdfPyGoat Analizando la seguridad en aplicaciones Django.pdf
PyGoat Analizando la seguridad en aplicaciones Django.pdf
 
Ciberseguridad en Blockchain y Smart Contracts: Explorando los Desafíos y Sol...
Ciberseguridad en Blockchain y Smart Contracts: Explorando los Desafíos y Sol...Ciberseguridad en Blockchain y Smart Contracts: Explorando los Desafíos y Sol...
Ciberseguridad en Blockchain y Smart Contracts: Explorando los Desafíos y Sol...
 
Evolution of security strategies in K8s environments- All day devops
Evolution of security strategies in K8s environments- All day devops Evolution of security strategies in K8s environments- All day devops
Evolution of security strategies in K8s environments- All day devops
 
Evolution of security strategies in K8s environments.pdf
Evolution of security strategies in K8s environments.pdfEvolution of security strategies in K8s environments.pdf
Evolution of security strategies in K8s environments.pdf
 
Computación distribuida usando Python
Computación distribuida usando PythonComputación distribuida usando Python
Computación distribuida usando Python
 
Seguridad en arquitecturas serverless y entornos cloud
Seguridad en arquitecturas serverless y entornos cloudSeguridad en arquitecturas serverless y entornos cloud
Seguridad en arquitecturas serverless y entornos cloud
 
Construyendo arquitecturas zero trust sobre entornos cloud
Construyendo arquitecturas zero trust sobre entornos cloud Construyendo arquitecturas zero trust sobre entornos cloud
Construyendo arquitecturas zero trust sobre entornos cloud
 
Tips and tricks for data science projects with Python
Tips and tricks for data science projects with Python Tips and tricks for data science projects with Python
Tips and tricks for data science projects with Python
 
Sharing secret keys in Docker containers and K8s
Sharing secret keys in Docker containers and K8sSharing secret keys in Docker containers and K8s
Sharing secret keys in Docker containers and K8s
 
Implementing cert-manager in K8s
Implementing cert-manager in K8sImplementing cert-manager in K8s
Implementing cert-manager in K8s
 
Python para equipos de ciberseguridad(pycones)
Python para equipos de ciberseguridad(pycones)Python para equipos de ciberseguridad(pycones)
Python para equipos de ciberseguridad(pycones)
 
Python para equipos de ciberseguridad
Python para equipos de ciberseguridad Python para equipos de ciberseguridad
Python para equipos de ciberseguridad
 
Shodan Tips and tricks. Automatiza y maximiza las búsquedas shodan
Shodan Tips and tricks. Automatiza y maximiza las búsquedas shodanShodan Tips and tricks. Automatiza y maximiza las búsquedas shodan
Shodan Tips and tricks. Automatiza y maximiza las búsquedas shodan
 
ELK para analistas de seguridad y equipos Blue Team
ELK para analistas de seguridad y equipos Blue TeamELK para analistas de seguridad y equipos Blue Team
ELK para analistas de seguridad y equipos Blue Team
 
Monitoring and managing Containers using Open Source tools
Monitoring and managing Containers using Open Source toolsMonitoring and managing Containers using Open Source tools
Monitoring and managing Containers using Open Source tools
 
Python Memory Management 101(Europython)
Python Memory Management 101(Europython)Python Memory Management 101(Europython)
Python Memory Management 101(Europython)
 
SecDevOps containers
SecDevOps containersSecDevOps containers
SecDevOps containers
 
Python memory managment. Deeping in Garbage collector
Python memory managment. Deeping in Garbage collectorPython memory managment. Deeping in Garbage collector
Python memory managment. Deeping in Garbage collector
 

Recently uploaded

FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 

Recently uploaded (20)

FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 

Implementing Observability for Kubernetes.pdf

  • 2. Agenda ● Introducing the concept of observability ● Implementing Kubernetes observability ● Observability stack in K8s ● Integrating Prometheus with OpenTelemetry
  • 3. Introducing the concept of observability ● Software architecture is more complex. ● Pillars of observability—logs, metrics, and traces. ● Observability is now a top priority for DevOps teams.
  • 4. Introducing the concept of observability ● Monitoring ● Logging ● Tracing
  • 5. Introducing the concept of observability ● time=”2019-12-23T01:27:38-04:00″ level=debug msg=”Application starting” environment=dev ● http_requests_total=100 log metric
  • 6. Introducing the concept of observability
  • 7. Implementing Kubernetes observability 1. Node status. Current health status and availability of the node. 2. Node resource usage metrics. Disk and memory utilization, CPU and network bandwidth. 3. Implementation status. Current and desired state of the deployments in the cluster. 4. Number of pods. Kubernetes internal components and processes use this information to manage the workload and schedule the pods.
  • 8. Implementing Kubernetes observability 1. Kubernetes metrics. These metrics apply to the number and types of resources within a pod. This metric includes resource limit tracking to avoid running out of system resources. 2. Container metrics. These metrics capture the utilization of container-level resources, such as CPU, memory, and network usage. 3. Application metrics. Such metrics include the number of active or online users and response times.
  • 12. Observability stack in K8s ● Kubewatch is an open-source Kubernetes monitoring tool that sends notifications about changes in a Kubernetes cluster to various communication channels, such as Slack, Microsoft Teams, or email. ● It monitors Kubernetes resources, such as deployments, services, and pods, and alerts users in real-time when changes occur. https://github.com/vmware-archive/kubewatch
  • 13. Observability stack in K8s https://github.com/salesforce/sloop
  • 14. Observability stack in K8s ● Jaeger is an open-source distributed tracing system ● The tool is designed to monitor and troubleshoot distributed microservices, mostly focusing on: ○ Distributed context propagation ○ Distributed transaction monitoring ○ Root cause analysis ○ Service dependency analysis ○ Performance/latency optimization https://www.jaegertracing.io
  • 17. Observability stack in K8s https://www.jaegertracing.io/docs/1.46/operator apiVersion: jaegertracing.io/v1 kind: Jaeger metadata: name: simplest
  • 18. Observability stack in K8s ● Fluentd is an open-source data collector for unified logging layers. ● It works with Kubernetes running as DaemonSet. This combination ensures that all nodes run one copy of a pod. https://www.fluentd.org
  • 19. Observability stack in K8s apiVersion: extensions/v1beta1 kind: DaemonSet metadata: name: fluentd namespace: kube-system spec: containers: – name: fluentd image: quay.io/fluent/fluentd-kubernetes-daemonset
  • 21. Observability stack in K8s ● Prometheus is a cloud native time series data store with built-in rich query language for metrics. ● Collecting data with Prometheus opens up many possibilities for increasing the observability of your infrastructure and the containers running in Kubernetes cluster. https://prometheus.io
  • 22. Observability stack in K8s ● Multi-dimensional data model ● Prometheus query language(PromQL) ● Data collection ● Storage ● Visualization(Grafana) https://prometheus.io
  • 24. Observability stack in K8s ● Most of the metrics can be exported using node_exporter https://github.com/prometheus/node_exporter and cAdvisor https://github.com/google/cadvisor ○ Resource utilization saturation. The containers’ resource consumption and allocation. ○ The number of failing pods and errors within a specific namespace. ○ Kubernetes resource capacity. The total number of nodes, CPU cores, and memory available.
  • 26. Observability stack in K8s ● Service dependencies & communication map ○ What services are communicating with each other? ○ What HTTP calls are being made? ● Operational monitoring & alerting ○ Is any network communication failing? ○ Is the communication broken on layer 4 (TCP) or layer 7 (HTTP)? ● Application monitoring ○ What is the rate of 5xx or 4xx HTTP response codes for a particular service or across all clusters? ● Security observability ○ Which services had connections blocked due to network policy? https://github.com/cilium/hubble
  • 27. Observability stack in K8s https://github.com/cilium/hubble
  • 28. Observability stack in K8s https://github.com/cilium/hubble Service Dependency Graph
  • 29. Observability stack in K8s https://github.com/cilium/hubble Networking Behavior
  • 30. Observability stack in K8s https://github.com/cilium/hubble HTTP Request/Response Rate & Latency
  • 33. Integrating Prometheus with OpenTelemetry ● Receivers: are the data sources of observability information. ● Processors: they process the information received before it is exported to the different backends. ● Exporters: they are in charge of exporting the information to the different backends, such as Jaeger or Kafka
  • 35. https://github.com/open-telemetry/opentelemetry-collector otel-collector: image: otel/opentelemetry-collector:latest command: [ "--config=/etc/otel-collector-config.yaml" ] volumes: - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml:Z ports: - "13133:13133" - "4317:4317" - "4318:4318" depends_on: - jaeger Integrating Prometheus with OpenTelemetry otel-collector-config.yaml
  • 36. processors: batch: extensions: health_check: service: extensions: [health_check] pipelines: traces: receivers: [otlp] processors: [batch] exporters: [jaeger] receivers: otlp: protocols: grpc: endpoint: otel-collector:4317 exporters: jaeger: endpoint: jaeger:14250 tls: insecure: true Integrating Prometheus with OpenTelemetry otel-collector-config.yaml
  • 37. Integrating Prometheus with OpenTelemetry https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor
  • 39. Integrating Prometheus with OpenTelemetry receivers: .. prometheus: config: scrape_configs: - job_name: 'service-a' scrape_interval: 2s metrics_path: '/metrics/prometheus' static_configs: - targets: [ 'service-a:8080' ] - job_name: 'service-b' scrape_interval: 2s metrics_path: '/actuator/prometheus' static_configs: - targets: [ 'service-b:8081' ] - job_name: 'service-c' scrape_interval: 2s
  • 40. Integrating Prometheus with OpenTelemetry exporters: … prometheusremotewrite: endpoint: http://prometheus:9090/api/v1/write tls: insecure: true ● active in Prometheus “--web.enable-remote-write-receiver”
  • 41. Integrating Prometheus with OpenTelemetry https://github.com/open-telemetry/opentelemetry-demo
  • 42. Integrating Prometheus with OpenTelemetry https://github.com/open-telemetry/opentelemetry-demo
  • 43. Integrating Prometheus with OpenTelemetry https://github.com/open-telemetry/opentelemetry-demo
  • 44. Integrating Prometheus with OpenTelemetry https://github.com/open-telemetry/opentelemetry-demo
  • 45. Conclusions ● Lean on the native capabilities of Kubernetes for the collection and exploitation of metrics in order to know the state of health of your pods and, in general, of your cluster. ● Use these metrics to be able to create alarms that proactively notify us of errors or even allow us to anticipate issues in our applications or infraestructure.