SlideShare a Scribd company logo
1 of 69
Download to read offline
Monitoring in
Big Data Platform
Author: Albert Lewandowski
•Big Data DevOps Engineer in Getindata
•Focused on infrastructure, cloud, Internet of Things and
Big Data
Who am I?
Agenda
1. Monitoring overview.
2. Metrics.
a. Infrastructure
b. Big Data Components
c. Applications.
3. Logs analysis.
a. Overview.
b. Logging solutions.
c. Why and how to implement?
4. Q&A
Monitoring
Overview
Push vs. Pull model
Push Pull
Agents push metrics Collector takes metrics
Polling task fully distributed among agents,
resulting in linear scalability.
Workload on central poller increases with the
number of devices polled.
Push agents are inherently secure against remote
attacks since they do not listen for network
connections.
Polling protocol can potentially open up system to
remote access and denial of service attacks.
Relatively inflexible: pre-determined, fixed set of
measurements are periodically exported.
Flexible: poller can ask for any metric at any time.
Monitoring Overview
Prometheus Architecture
Prometheus Architecture
Prometheus is an open-source systems monitoring and alerting toolkit
originally built at SoundCloud and joined the Cloud Native
Computing Foundation in 2016 as the second hosted project,
after Kubernetes.
Components:
• server
• client libraries for instrumenting application code
• a push gateway for supporting short-lived jobs
• special-purpose exporters for services
• an AlertManager to handle alerts
Prometheus’ stories
•Use service discovery, it’s great
• Discover where Flink JM and TM expose their
metrics.
•How to provide HA?
• Think of using long-term storage like Thanos
or M3 or Cortex
•Do you need archived data?
•Monitor Prometheus even if it’s a
monitoring tool.
Service Discovery
It is used for discovering scrape targets.
We can use Kubernetes, Consul or file-based
service discovery and many others.
Prometheus Security
Remember:
■ It should not by default be possible for a target to expose data
that impersonates a different target. The honor_labels option
removes this protection, as can certain relabelling setups.
■ --web.enable-admin-api flag controls access to the
administrative HTTP API which includes functionality such as
deleting time series. This is disabled by default. If enabled,
administrative and mutating functionality will be accessible
under the /api/*/admin/ paths.
■ Any secrets stored in template files could be exfiltrated by
anyone able to configure receivers in the Alertmanager
configuration file.
Prometheus Security
Remember:
■ Prometheus and its components do not provide any
server-side authentication, authorization or encryption. If
you require this, it is recommended to use a reverse proxy.
■ As administrative and mutating endpoints are intended to
be accessed via simple tools such as cURL, there is no built
in CSRF protection.
In the future, server-side TLS support will be rolled out to the different
Prometheus projects. Those projects include Prometheus,
Alertmanager, Pushgateway and the official exporters. Authentication
of clients by TLS client certs will also be supported.
HA for Prometheus
Simple
Two Prometheus
instances
HA for Prometheus
HA with cold storage
● Cortex (below)
● M3DB
● Thanos
Monitoring
Infrastructure
Servers Monitoring
■ Using Node Exporter or any custom exporter
■ Predict disk usage
■ Send alerts in case of any issue or any dangerous
situation
Kubernetes Monitoring
■ Nodes monitoring
■ Pods monitoring
■ Namespace monitoring
Check:
- resources limits and resources requests
- resource utilization
- Node Exporter
- Kube State metrics -it is a simple service that listens to the Kubernetes
API server and generates metrics about the state of the objects.
Monitoring
Components
Hadoop Stack Overview
Kafka exporter
We can scrape and expose
mBeans of a JMX target.
Add to KAFKA_OPTS
export
KAFKA_OPTS='-javaagent:/opt/
jmx-exporter/jmx-exporter.ja
r=9101:/etc/jmx-exporter/kaf
ka.yml'
Kafka exporter
Metrics example:
• Number or replicas
• Under replicated partitions
• State of each broker
• JVM metrics
What about Kafka?
• Critical component in many systems
• A lot of data so any issue can cause
business loss
• What about adding more brokers?
What about Kafka?
• Keep it simple with JMX exporter.
• Monitor partitions state, number of
replicas
• You must have alerts when the number
of under replicated partitions grows fast
Monitoring
Applications
When should I use PushGateway?
• When monitoring multiple instances through a single
Pushgateway, the Pushgateway becomes both a
single point of failure and a potential bottleneck.
• You lose Prometheus's automatic instance health
monitoring via the up metric (generated on every
scrape).
• The Pushgateway never forgets series pushed to it
and will expose them to Prometheus forever unless
those series are manually deleted via the
Pushgateway's API.
Blackbox Exporter
● Official Prometheus Exporter, written in Go
● Single binary installed on monitoring server
● Allows probing API endpoints
https://github.com/prometheus/blackbox_exporter
Airflow Exporter
● Provides metrics for Airflow:
○ dag_status
○ task_status
○ run_duration
● Simple install via pip
https://github.com/epoch8/airflow-exporter
Knox Exporter
● Provides metrics about access to WebHDFS and Hive
● Written in Java, manual installation as a service, needs
configuration file
https://github.com/marcelmay/apache-knox-exporter
NiFi
We use PrometheusReportingTask to report metrics in
Prometheus format.
It created metrics HTTP endpoint with information about
the JVM and the NiFi instance.
For flows, we need to use custom reporters like own NAR
processor that can count the number of flow files or any
different metric.
Bash Exporter
● Written in Go, manual installation as a service, needs
additional bash scripts to run
● Can make metric from output of bash command
● Status for Hadoop component:
○ Zeppelin
○ Ranger
○ HiveServer
https://github.com/gree-gorey/bash-exporter
Hive Query
● Running Hive query via crontab and writing output to
file
● Running bash exporter reading file and displaying
metrics for Prometheus
● Reading metrics in Grafana
Flink monitoring
The most important processing tool.
Flink jobs must run all the time.
Flink monitoring
Processing metrics
• Number of processed events
• Lag of processing that may trigger
alerts
• JVM parameters
Flink stability
• How many restarts does Flink have?
• Can it finish making checkpoints
and savepoints?
Before Spark 3.0
Spark & Prometheus
What about Spark jobs?
• Let’s connect Prometheus with
statsd_exporter
• Use built in sink for statsd in Spark
Spark & Prometheus
The best way to achieve stable metrics system
with right timestamp is about using statsd_sink.
These metrics can be sent to statsd_exporter
(https://github.com/prometheus/statsd_exporter)
and then can be exposed to the Prometheus.
Spark & Prometheus
The different approach based on JMX Exporter
and using Spark’s JMXSink.
The next is about using Spark’s GraphiteSink and
using Prometheus Graphite Exporter.
The last is about building custom exporter of
metrics.
From Spark 3.0
Spark & Prometheus
Spark 3.0 introduces the following resources:
- PrometheusServlet - which makes the
Master/Worker/Driver nodes expose metrics in
a Prometheus format.
- Prometheus Resource - which export metrics of
all executors at the driver.
Monitoring
Cloud
AWS
- Cloudwatch
- Cloudwatch Logs
Google Cloud
- Operations metrics
- Operations Logging
Formerly Stackdriver
Microsoft Azure
- Azure Monitor:
- Logs
- Metrics
Monitoring
Demo
Logs analysis
Overview
Logs analysis
Logging solutions
Log analytics
ELK
Elasticsearch
Elasticsearch is a distributed, open source
search and analytics engine for all types of
data, including textual, numerical,
geospatial, structured, and unstructured.
Elasticsearch is built on Apache Lucene
and was first released in 2010
Elasticsearch
Elasticsearch is a distributed, open source
search and analytics engine for all types of
data, including textual, numerical,
geospatial, structured, and unstructured.
Elasticsearch is built on Apache Lucene
and was first released in 2010
Elasticsearch Use cases
• Logging and log analytics
• Infrastructure metrics and container
monitoring
• Application performance monitoring
• Geospatial data analysis and
visualization
• Security analytics
• Business analytics
Logs & Metrics
Logs & Metrics
Log analytics
Loki
Loki
Loki is a horizontally-scalable,
highly-available, multi-tenant log
aggregation system inspired by
Prometheus. It is designed to be very cost
effective and easy to operate. It does not
index the contents of the logs, but rather a
set of labels for each log stream.
Loki Overview
Loki receives logs in separate streams, where each
stream is uniquely identified by its tenant ID
and its set of labels.
As log entries from a stream arrive, they are
GZipped as "chunks" and saved in the chunks
store.
The index stores each stream's label set and links
them to the individual chunks.
Log analytics with Loki
LogQL
This example counts all the log lines within the last
five minutes for the MySQL job.
rate( ( {job="mysql"} |= "error" !=
"timeout)[10s] ) )
It counts the entries for each log stream
count_over_time({job="mysql"}[5m])
Promtail
Promtail is an agent which ships the
contents of local logs to a private Loki
instance. It is deployed to every machine
that has applications needed to be
monitored.
Currently, Promtail can tail logs from two
sources: local log files and the systemd
journal (on AMD64 machines only).
Pipelines in Promtail
A pipeline is used to transform a single
log line, its labels, and its timestamp.
A pipeline is comprised of a set of
stages.
Loki - HA
ELK vs. Loki+Promtail
ELK Loki + Promtail
Data visualisation Kibana Grafana
Query performance Faster due to indexed
all the data
Slower due to indexing
only labels
Resource
consumption
Higher due to the need
of indexing
Lower due to index
only labels
ELK vs. Loki+Promtail
ELK Loki + Promtail
Indexing Keys and content of
each key
Only labels
Query language Query DSL or Lucene
QL
LogQL
Log pipeline More components
1. Fluentd/Fluentbit
-> Logstash ->
Elasticsearch
2. Filebeat ->
Logstash ->
Elasticsearch
Fewer components:
Promtail -> Loki
Logs analysis
Why and how to implement?
Article
https://getindata.com/blog/why-log-analyti
cs-important-monitoring-system
Whitepaper
Monitoring and Observability for Data Platform
https://getindata.com/blog/white-paper-big-
data-monitoring-observability-data-platfor
m
Logs analysis
Demo
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Q&A

More Related Content

Similar to Monitoring in Big Data Platform - Albert Lewandowski, GetInData

Troubleshooting and Best Practices with WSO2 Enterprise Integrator
Troubleshooting and Best Practices with WSO2 Enterprise IntegratorTroubleshooting and Best Practices with WSO2 Enterprise Integrator
Troubleshooting and Best Practices with WSO2 Enterprise IntegratorWSO2
 
Monitoring Your AWS EKS Environment with Datadog
Monitoring Your AWS EKS Environment with DatadogMonitoring Your AWS EKS Environment with Datadog
Monitoring Your AWS EKS Environment with DatadogDevOps.com
 
Troubleshooting and Best Practices with WSO2 Enterprise Integrator
Troubleshooting and Best Practices with WSO2 Enterprise IntegratorTroubleshooting and Best Practices with WSO2 Enterprise Integrator
Troubleshooting and Best Practices with WSO2 Enterprise IntegratorWSO2
 
Prometheus for Monitoring Metrics (Fermilab 2018)
Prometheus for Monitoring Metrics (Fermilab 2018)Prometheus for Monitoring Metrics (Fermilab 2018)
Prometheus for Monitoring Metrics (Fermilab 2018)Brian Brazil
 
Monitoring kubernetes with prometheus-operator
Monitoring kubernetes with prometheus-operatorMonitoring kubernetes with prometheus-operator
Monitoring kubernetes with prometheus-operatorLili Cosic
 
Observability for Integration Using WSO2 Enterprise Integrator
Observability for Integration Using WSO2 Enterprise IntegratorObservability for Integration Using WSO2 Enterprise Integrator
Observability for Integration Using WSO2 Enterprise IntegratorWSO2
 
Implementing Observability for Kubernetes.pdf
Implementing Observability for Kubernetes.pdfImplementing Observability for Kubernetes.pdf
Implementing Observability for Kubernetes.pdfJose Manuel Ortega Candel
 
Streaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogStreaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogJoe Stein
 
DevOps Spain 2019. Beatriz Martínez-IBM
DevOps Spain 2019. Beatriz Martínez-IBMDevOps Spain 2019. Beatriz Martínez-IBM
DevOps Spain 2019. Beatriz Martínez-IBMatSistemas
 
Monitoring With Prometheus
Monitoring With PrometheusMonitoring With Prometheus
Monitoring With PrometheusKnoldus Inc.
 
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)Brian Brazil
 
Prometheus and Docker (Docker Galway, November 2015)
Prometheus and Docker (Docker Galway, November 2015)Prometheus and Docker (Docker Galway, November 2015)
Prometheus and Docker (Docker Galway, November 2015)Brian Brazil
 
Prometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,GrafanaPrometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,GrafanaSridhar Kumar N
 
How to Improve the Observability of Apache Cassandra and Kafka applications...
How to Improve the Observability of Apache Cassandra and Kafka applications...How to Improve the Observability of Apache Cassandra and Kafka applications...
How to Improve the Observability of Apache Cassandra and Kafka applications...Paul Brebner
 
OSMC 2019 | Monitoring Cockpit for Kubernetes Clusters by Ulrike Klusik
OSMC 2019 | Monitoring Cockpit for Kubernetes Clusters by Ulrike KlusikOSMC 2019 | Monitoring Cockpit for Kubernetes Clusters by Ulrike Klusik
OSMC 2019 | Monitoring Cockpit for Kubernetes Clusters by Ulrike KlusikNETWAYS
 
vRO Training Document
vRO Training DocumentvRO Training Document
vRO Training DocumentMayank Goyal
 
Monitoring at scale: Migrating to Prometheus at Fastly
Monitoring at scale: Migrating to Prometheus at FastlyMonitoring at scale: Migrating to Prometheus at Fastly
Monitoring at scale: Migrating to Prometheus at FastlyMarcus Barczak
 
OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...
OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...
OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...NETWAYS
 

Similar to Monitoring in Big Data Platform - Albert Lewandowski, GetInData (20)

Troubleshooting and Best Practices with WSO2 Enterprise Integrator
Troubleshooting and Best Practices with WSO2 Enterprise IntegratorTroubleshooting and Best Practices with WSO2 Enterprise Integrator
Troubleshooting and Best Practices with WSO2 Enterprise Integrator
 
Monitoring Your AWS EKS Environment with Datadog
Monitoring Your AWS EKS Environment with DatadogMonitoring Your AWS EKS Environment with Datadog
Monitoring Your AWS EKS Environment with Datadog
 
Troubleshooting and Best Practices with WSO2 Enterprise Integrator
Troubleshooting and Best Practices with WSO2 Enterprise IntegratorTroubleshooting and Best Practices with WSO2 Enterprise Integrator
Troubleshooting and Best Practices with WSO2 Enterprise Integrator
 
Prometheus for Monitoring Metrics (Fermilab 2018)
Prometheus for Monitoring Metrics (Fermilab 2018)Prometheus for Monitoring Metrics (Fermilab 2018)
Prometheus for Monitoring Metrics (Fermilab 2018)
 
Prometheus workshop
Prometheus workshopPrometheus workshop
Prometheus workshop
 
Monitoring kubernetes with prometheus-operator
Monitoring kubernetes with prometheus-operatorMonitoring kubernetes with prometheus-operator
Monitoring kubernetes with prometheus-operator
 
System monitoring
System monitoringSystem monitoring
System monitoring
 
Observability for Integration Using WSO2 Enterprise Integrator
Observability for Integration Using WSO2 Enterprise IntegratorObservability for Integration Using WSO2 Enterprise Integrator
Observability for Integration Using WSO2 Enterprise Integrator
 
Implementing Observability for Kubernetes.pdf
Implementing Observability for Kubernetes.pdfImplementing Observability for Kubernetes.pdf
Implementing Observability for Kubernetes.pdf
 
Streaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogStreaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit Log
 
DevOps Spain 2019. Beatriz Martínez-IBM
DevOps Spain 2019. Beatriz Martínez-IBMDevOps Spain 2019. Beatriz Martínez-IBM
DevOps Spain 2019. Beatriz Martínez-IBM
 
Monitoring With Prometheus
Monitoring With PrometheusMonitoring With Prometheus
Monitoring With Prometheus
 
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
Prometheus: A Next Generation Monitoring System (FOSDEM 2016)
 
Prometheus and Docker (Docker Galway, November 2015)
Prometheus and Docker (Docker Galway, November 2015)Prometheus and Docker (Docker Galway, November 2015)
Prometheus and Docker (Docker Galway, November 2015)
 
Prometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,GrafanaPrometheus - Intro, CNCF, TSDB,PromQL,Grafana
Prometheus - Intro, CNCF, TSDB,PromQL,Grafana
 
How to Improve the Observability of Apache Cassandra and Kafka applications...
How to Improve the Observability of Apache Cassandra and Kafka applications...How to Improve the Observability of Apache Cassandra and Kafka applications...
How to Improve the Observability of Apache Cassandra and Kafka applications...
 
OSMC 2019 | Monitoring Cockpit for Kubernetes Clusters by Ulrike Klusik
OSMC 2019 | Monitoring Cockpit for Kubernetes Clusters by Ulrike KlusikOSMC 2019 | Monitoring Cockpit for Kubernetes Clusters by Ulrike Klusik
OSMC 2019 | Monitoring Cockpit for Kubernetes Clusters by Ulrike Klusik
 
vRO Training Document
vRO Training DocumentvRO Training Document
vRO Training Document
 
Monitoring at scale: Migrating to Prometheus at Fastly
Monitoring at scale: Migrating to Prometheus at FastlyMonitoring at scale: Migrating to Prometheus at Fastly
Monitoring at scale: Migrating to Prometheus at Fastly
 
OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...
OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...
OSDC 2018 | Hardware-level data-center monitoring with Prometheus by Conrad H...
 

More from GetInData

How do we work with customers on Big Data / ML / Analytics Projects using Scr...
How do we work with customers on Big Data / ML / Analytics Projects using Scr...How do we work with customers on Big Data / ML / Analytics Projects using Scr...
How do we work with customers on Big Data / ML / Analytics Projects using Scr...GetInData
 
Data-Driven Fast Track: Introduction to data-drivenness with Piotr Menclewicz
Data-Driven Fast Track: Introduction to data-drivenness with Piotr MenclewiczData-Driven Fast Track: Introduction to data-drivenness with Piotr Menclewicz
Data-Driven Fast Track: Introduction to data-drivenness with Piotr MenclewiczGetInData
 
How NOT to win a Kaggle competition
How NOT to win a Kaggle competitionHow NOT to win a Kaggle competition
How NOT to win a Kaggle competitionGetInData
 
How to become good Developer in Scrum Team?
How to become good Developer in Scrum Team? How to become good Developer in Scrum Team?
How to become good Developer in Scrum Team? GetInData
 
OpenLineage & Airflow - data lineage has never been easier
OpenLineage & Airflow - data lineage has never been easierOpenLineage & Airflow - data lineage has never been easier
OpenLineage & Airflow - data lineage has never been easierGetInData
 
Benefits of a Homemade ML Platform
Benefits of a Homemade ML PlatformBenefits of a Homemade ML Platform
Benefits of a Homemade ML PlatformGetInData
 
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInDataModel serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInDataGetInData
 
Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...
Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...
Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...GetInData
 
MLOps implemented - how we combine the cloud & open-source to boost data scie...
MLOps implemented - how we combine the cloud & open-source to boost data scie...MLOps implemented - how we combine the cloud & open-source to boost data scie...
MLOps implemented - how we combine the cloud & open-source to boost data scie...GetInData
 
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...GetInData
 
Feast + Amundsen Integration - Mariusz Strzelecki, GetInData
Feast + Amundsen Integration - Mariusz Strzelecki, GetInDataFeast + Amundsen Integration - Mariusz Strzelecki, GetInData
Feast + Amundsen Integration - Mariusz Strzelecki, GetInDataGetInData
 
Kubernetes and real-time analytics - how to connect these two worlds with Apa...
Kubernetes and real-time analytics - how to connect these two worlds with Apa...Kubernetes and real-time analytics - how to connect these two worlds with Apa...
Kubernetes and real-time analytics - how to connect these two worlds with Apa...GetInData
 
Big data trends - Krzysztof Zarzycki, GetInData
Big data trends - Krzysztof Zarzycki, GetInDataBig data trends - Krzysztof Zarzycki, GetInData
Big data trends - Krzysztof Zarzycki, GetInDataGetInData
 
Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...
Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...
Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...GetInData
 
Complex event processing platform handling millions of users - Krzysztof Zarz...
Complex event processing platform handling millions of users - Krzysztof Zarz...Complex event processing platform handling millions of users - Krzysztof Zarz...
Complex event processing platform handling millions of users - Krzysztof Zarz...GetInData
 
Predicting Startup Market Trends based on the news and social media - Albert ...
Predicting Startup Market Trends based on the news and social media - Albert ...Predicting Startup Market Trends based on the news and social media - Albert ...
Predicting Startup Market Trends based on the news and social media - Albert ...GetInData
 
Managing Big Data projects in a constantly changing environment - Rafał Zalew...
Managing Big Data projects in a constantly changing environment - Rafał Zalew...Managing Big Data projects in a constantly changing environment - Rafał Zalew...
Managing Big Data projects in a constantly changing environment - Rafał Zalew...GetInData
 
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...GetInData
 
Strategies for on premise to Google Cloud migration - Mateusz Pytel, GetInData
Strategies for on premise to Google Cloud migration - Mateusz Pytel, GetInDataStrategies for on premise to Google Cloud migration - Mateusz Pytel, GetInData
Strategies for on premise to Google Cloud migration - Mateusz Pytel, GetInDataGetInData
 
Monitoring environment based on satellite data with Python and PySpark - Albe...
Monitoring environment based on satellite data with Python and PySpark - Albe...Monitoring environment based on satellite data with Python and PySpark - Albe...
Monitoring environment based on satellite data with Python and PySpark - Albe...GetInData
 

More from GetInData (20)

How do we work with customers on Big Data / ML / Analytics Projects using Scr...
How do we work with customers on Big Data / ML / Analytics Projects using Scr...How do we work with customers on Big Data / ML / Analytics Projects using Scr...
How do we work with customers on Big Data / ML / Analytics Projects using Scr...
 
Data-Driven Fast Track: Introduction to data-drivenness with Piotr Menclewicz
Data-Driven Fast Track: Introduction to data-drivenness with Piotr MenclewiczData-Driven Fast Track: Introduction to data-drivenness with Piotr Menclewicz
Data-Driven Fast Track: Introduction to data-drivenness with Piotr Menclewicz
 
How NOT to win a Kaggle competition
How NOT to win a Kaggle competitionHow NOT to win a Kaggle competition
How NOT to win a Kaggle competition
 
How to become good Developer in Scrum Team?
How to become good Developer in Scrum Team? How to become good Developer in Scrum Team?
How to become good Developer in Scrum Team?
 
OpenLineage & Airflow - data lineage has never been easier
OpenLineage & Airflow - data lineage has never been easierOpenLineage & Airflow - data lineage has never been easier
OpenLineage & Airflow - data lineage has never been easier
 
Benefits of a Homemade ML Platform
Benefits of a Homemade ML PlatformBenefits of a Homemade ML Platform
Benefits of a Homemade ML Platform
 
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInDataModel serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
 
Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...
Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...
Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...
 
MLOps implemented - how we combine the cloud & open-source to boost data scie...
MLOps implemented - how we combine the cloud & open-source to boost data scie...MLOps implemented - how we combine the cloud & open-source to boost data scie...
MLOps implemented - how we combine the cloud & open-source to boost data scie...
 
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
 
Feast + Amundsen Integration - Mariusz Strzelecki, GetInData
Feast + Amundsen Integration - Mariusz Strzelecki, GetInDataFeast + Amundsen Integration - Mariusz Strzelecki, GetInData
Feast + Amundsen Integration - Mariusz Strzelecki, GetInData
 
Kubernetes and real-time analytics - how to connect these two worlds with Apa...
Kubernetes and real-time analytics - how to connect these two worlds with Apa...Kubernetes and real-time analytics - how to connect these two worlds with Apa...
Kubernetes and real-time analytics - how to connect these two worlds with Apa...
 
Big data trends - Krzysztof Zarzycki, GetInData
Big data trends - Krzysztof Zarzycki, GetInDataBig data trends - Krzysztof Zarzycki, GetInData
Big data trends - Krzysztof Zarzycki, GetInData
 
Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...
Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...
Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...
 
Complex event processing platform handling millions of users - Krzysztof Zarz...
Complex event processing platform handling millions of users - Krzysztof Zarz...Complex event processing platform handling millions of users - Krzysztof Zarz...
Complex event processing platform handling millions of users - Krzysztof Zarz...
 
Predicting Startup Market Trends based on the news and social media - Albert ...
Predicting Startup Market Trends based on the news and social media - Albert ...Predicting Startup Market Trends based on the news and social media - Albert ...
Predicting Startup Market Trends based on the news and social media - Albert ...
 
Managing Big Data projects in a constantly changing environment - Rafał Zalew...
Managing Big Data projects in a constantly changing environment - Rafał Zalew...Managing Big Data projects in a constantly changing environment - Rafał Zalew...
Managing Big Data projects in a constantly changing environment - Rafał Zalew...
 
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...
 
Strategies for on premise to Google Cloud migration - Mateusz Pytel, GetInData
Strategies for on premise to Google Cloud migration - Mateusz Pytel, GetInDataStrategies for on premise to Google Cloud migration - Mateusz Pytel, GetInData
Strategies for on premise to Google Cloud migration - Mateusz Pytel, GetInData
 
Monitoring environment based on satellite data with Python and PySpark - Albe...
Monitoring environment based on satellite data with Python and PySpark - Albe...Monitoring environment based on satellite data with Python and PySpark - Albe...
Monitoring environment based on satellite data with Python and PySpark - Albe...
 

Recently uploaded

Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAbdelrhman abooda
 

Recently uploaded (20)

Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
 

Monitoring in Big Data Platform - Albert Lewandowski, GetInData

  • 1. Monitoring in Big Data Platform Author: Albert Lewandowski
  • 2. •Big Data DevOps Engineer in Getindata •Focused on infrastructure, cloud, Internet of Things and Big Data Who am I?
  • 3. Agenda 1. Monitoring overview. 2. Metrics. a. Infrastructure b. Big Data Components c. Applications. 3. Logs analysis. a. Overview. b. Logging solutions. c. Why and how to implement? 4. Q&A
  • 5. Push vs. Pull model Push Pull Agents push metrics Collector takes metrics Polling task fully distributed among agents, resulting in linear scalability. Workload on central poller increases with the number of devices polled. Push agents are inherently secure against remote attacks since they do not listen for network connections. Polling protocol can potentially open up system to remote access and denial of service attacks. Relatively inflexible: pre-determined, fixed set of measurements are periodically exported. Flexible: poller can ask for any metric at any time.
  • 8. Prometheus Architecture Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud and joined the Cloud Native Computing Foundation in 2016 as the second hosted project, after Kubernetes. Components: • server • client libraries for instrumenting application code • a push gateway for supporting short-lived jobs • special-purpose exporters for services • an AlertManager to handle alerts
  • 9. Prometheus’ stories •Use service discovery, it’s great • Discover where Flink JM and TM expose their metrics. •How to provide HA? • Think of using long-term storage like Thanos or M3 or Cortex •Do you need archived data? •Monitor Prometheus even if it’s a monitoring tool.
  • 10. Service Discovery It is used for discovering scrape targets. We can use Kubernetes, Consul or file-based service discovery and many others.
  • 11. Prometheus Security Remember: ■ It should not by default be possible for a target to expose data that impersonates a different target. The honor_labels option removes this protection, as can certain relabelling setups. ■ --web.enable-admin-api flag controls access to the administrative HTTP API which includes functionality such as deleting time series. This is disabled by default. If enabled, administrative and mutating functionality will be accessible under the /api/*/admin/ paths. ■ Any secrets stored in template files could be exfiltrated by anyone able to configure receivers in the Alertmanager configuration file.
  • 12. Prometheus Security Remember: ■ Prometheus and its components do not provide any server-side authentication, authorization or encryption. If you require this, it is recommended to use a reverse proxy. ■ As administrative and mutating endpoints are intended to be accessed via simple tools such as cURL, there is no built in CSRF protection. In the future, server-side TLS support will be rolled out to the different Prometheus projects. Those projects include Prometheus, Alertmanager, Pushgateway and the official exporters. Authentication of clients by TLS client certs will also be supported.
  • 13. HA for Prometheus Simple Two Prometheus instances
  • 14. HA for Prometheus HA with cold storage ● Cortex (below) ● M3DB ● Thanos
  • 16. Servers Monitoring ■ Using Node Exporter or any custom exporter ■ Predict disk usage ■ Send alerts in case of any issue or any dangerous situation
  • 17. Kubernetes Monitoring ■ Nodes monitoring ■ Pods monitoring ■ Namespace monitoring Check: - resources limits and resources requests - resource utilization - Node Exporter - Kube State metrics -it is a simple service that listens to the Kubernetes API server and generates metrics about the state of the objects.
  • 20. Kafka exporter We can scrape and expose mBeans of a JMX target. Add to KAFKA_OPTS export KAFKA_OPTS='-javaagent:/opt/ jmx-exporter/jmx-exporter.ja r=9101:/etc/jmx-exporter/kaf ka.yml'
  • 21. Kafka exporter Metrics example: • Number or replicas • Under replicated partitions • State of each broker • JVM metrics
  • 22. What about Kafka? • Critical component in many systems • A lot of data so any issue can cause business loss • What about adding more brokers?
  • 23. What about Kafka? • Keep it simple with JMX exporter. • Monitor partitions state, number of replicas • You must have alerts when the number of under replicated partitions grows fast
  • 25. When should I use PushGateway? • When monitoring multiple instances through a single Pushgateway, the Pushgateway becomes both a single point of failure and a potential bottleneck. • You lose Prometheus's automatic instance health monitoring via the up metric (generated on every scrape). • The Pushgateway never forgets series pushed to it and will expose them to Prometheus forever unless those series are manually deleted via the Pushgateway's API.
  • 26. Blackbox Exporter ● Official Prometheus Exporter, written in Go ● Single binary installed on monitoring server ● Allows probing API endpoints https://github.com/prometheus/blackbox_exporter
  • 27. Airflow Exporter ● Provides metrics for Airflow: ○ dag_status ○ task_status ○ run_duration ● Simple install via pip https://github.com/epoch8/airflow-exporter
  • 28. Knox Exporter ● Provides metrics about access to WebHDFS and Hive ● Written in Java, manual installation as a service, needs configuration file https://github.com/marcelmay/apache-knox-exporter
  • 29. NiFi We use PrometheusReportingTask to report metrics in Prometheus format. It created metrics HTTP endpoint with information about the JVM and the NiFi instance. For flows, we need to use custom reporters like own NAR processor that can count the number of flow files or any different metric.
  • 30. Bash Exporter ● Written in Go, manual installation as a service, needs additional bash scripts to run ● Can make metric from output of bash command ● Status for Hadoop component: ○ Zeppelin ○ Ranger ○ HiveServer https://github.com/gree-gorey/bash-exporter
  • 31. Hive Query ● Running Hive query via crontab and writing output to file ● Running bash exporter reading file and displaying metrics for Prometheus ● Reading metrics in Grafana
  • 32. Flink monitoring The most important processing tool. Flink jobs must run all the time.
  • 33. Flink monitoring Processing metrics • Number of processed events • Lag of processing that may trigger alerts • JVM parameters Flink stability • How many restarts does Flink have? • Can it finish making checkpoints and savepoints?
  • 36. What about Spark jobs? • Let’s connect Prometheus with statsd_exporter • Use built in sink for statsd in Spark
  • 37. Spark & Prometheus The best way to achieve stable metrics system with right timestamp is about using statsd_sink. These metrics can be sent to statsd_exporter (https://github.com/prometheus/statsd_exporter) and then can be exposed to the Prometheus.
  • 38. Spark & Prometheus The different approach based on JMX Exporter and using Spark’s JMXSink. The next is about using Spark’s GraphiteSink and using Prometheus Graphite Exporter. The last is about building custom exporter of metrics.
  • 40. Spark & Prometheus Spark 3.0 introduces the following resources: - PrometheusServlet - which makes the Master/Worker/Driver nodes expose metrics in a Prometheus format. - Prometheus Resource - which export metrics of all executors at the driver.
  • 43. Google Cloud - Operations metrics - Operations Logging Formerly Stackdriver
  • 44. Microsoft Azure - Azure Monitor: - Logs - Metrics
  • 49. Elasticsearch Elasticsearch is a distributed, open source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured. Elasticsearch is built on Apache Lucene and was first released in 2010
  • 50. Elasticsearch Elasticsearch is a distributed, open source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured. Elasticsearch is built on Apache Lucene and was first released in 2010
  • 51. Elasticsearch Use cases • Logging and log analytics • Infrastructure metrics and container monitoring • Application performance monitoring • Geospatial data analysis and visualization • Security analytics • Business analytics
  • 55. Loki Loki is a horizontally-scalable, highly-available, multi-tenant log aggregation system inspired by Prometheus. It is designed to be very cost effective and easy to operate. It does not index the contents of the logs, but rather a set of labels for each log stream.
  • 56. Loki Overview Loki receives logs in separate streams, where each stream is uniquely identified by its tenant ID and its set of labels. As log entries from a stream arrive, they are GZipped as "chunks" and saved in the chunks store. The index stores each stream's label set and links them to the individual chunks.
  • 58. LogQL This example counts all the log lines within the last five minutes for the MySQL job. rate( ( {job="mysql"} |= "error" != "timeout)[10s] ) ) It counts the entries for each log stream count_over_time({job="mysql"}[5m])
  • 59. Promtail Promtail is an agent which ships the contents of local logs to a private Loki instance. It is deployed to every machine that has applications needed to be monitored. Currently, Promtail can tail logs from two sources: local log files and the systemd journal (on AMD64 machines only).
  • 60. Pipelines in Promtail A pipeline is used to transform a single log line, its labels, and its timestamp. A pipeline is comprised of a set of stages.
  • 62. ELK vs. Loki+Promtail ELK Loki + Promtail Data visualisation Kibana Grafana Query performance Faster due to indexed all the data Slower due to indexing only labels Resource consumption Higher due to the need of indexing Lower due to index only labels
  • 63. ELK vs. Loki+Promtail ELK Loki + Promtail Indexing Keys and content of each key Only labels Query language Query DSL or Lucene QL LogQL Log pipeline More components 1. Fluentd/Fluentbit -> Logstash -> Elasticsearch 2. Filebeat -> Logstash -> Elasticsearch Fewer components: Promtail -> Loki
  • 64. Logs analysis Why and how to implement?
  • 66.
  • 67. Whitepaper Monitoring and Observability for Data Platform https://getindata.com/blog/white-paper-big- data-monitoring-observability-data-platfor m
  • 69. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Q&A