Monitor Apache Spark 3
on Kubernetes using
Metrics and Plugins
Luca Canali
Data Engineer, CERN
About Luca
• Data Engineer at CERN
• Data analytics and Spark service, database services
• 20+ years with databases and data engineering
• Passionate about performance engineering
• Repos, blogs, presentations
• @LucaCanaliDB
https://github.com/lucacanali
http://cern.ch/canali
Agenda
• Apache Spark monitoring: ecosystem and motivations
• Spark metrics system
• Spark 3 plugins
• Metrics for Spark on K8S and cloud storage
• How you can run a Spark performance dashboard
Performance Troubleshooting Goals
• The key to good performance:
• You run good execution plans
• There are no serialization points
• Without these all bets are off!
• Attribution: I first heard this from Andrew Holdsworth (context: Oracle DB performance discussion ~2007)
• Spark users care about performance at scale
• Investigate execution plans and bottlenecks in the workload
• Useful tools are the Spark Web UI and metrics instrumentation!
Apache Spark Monitoring Ecosystem
• Web UI
• Details on jobs, stages, tasks, SQL, streaming, etc
• Default URL: http://driver:4040
• https://spark.apache.org/docs/latest/web-ui.html
• Spark REST API + Spark Listeners (Developer API)
• Exposes task metrics and executor metrics
• https://spark.apache.org/docs/latest/monitoring.html
• Spark Metrics System
• Implemented using the Dropwizard metrics library
Spark Metrics System
• Many metrics instrument Spark workloads:
• https://spark.apache.org/docs/latest/monitoring.html#metrics
• Metrics are emitted by the driver, the executors, and other Spark components
into sinks (several sink types are available)
• Examples of metrics: number of active tasks, jobs/stages completed and failed,
executor CPU used, executor run time, garbage collection time, shuffle metrics,
I/O metrics, memory usage details, etc.
Explore Spark Metrics
• Web UI Servlet Sink
• Metrics from the driver + executors (executor metrics are exposed here only when Spark runs in local mode)
• By default: metrics are exposed using the metrics servlet on the WebUI
http://driver_host:4040/metrics/json
• Prometheus sink for the driver and for spark standalone, exposes metrics on the Web UI too
• *.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
• *.sink.prometheusServlet.path=/metrics/prometheus
• JMX Sink
• Use the JMX sink and explore metrics with jconsole
• Configuration: *.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
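To illustrate what the servlet sink returns, here is a small Scala sketch that pulls a gauge value out of a payload in the /metrics/json format. The JSON fragment and metric names below are made-up samples for illustration (real payloads are larger and application-specific), and the regex extraction is a shortcut; production code should use a proper JSON library.

```scala
// Illustrative: extract a gauge value from a (sample) /metrics/json payload.
val sampleJson =
  """{"gauges":{"app-123.driver.BlockManager.memory.maxMem_MB":{"value":1024},
    |"app-123.driver.DAGScheduler.job.activeJobs":{"value":2}},"counters":{}}""".stripMargin

def gaugeValue(json: String, metricSuffix: String): Option[Long] = {
  // Match "<anything><metricSuffix>":{"value":<number>}
  val pattern = ("\"[^\"]*" + java.util.regex.Pattern.quote(metricSuffix) +
    "\":\\{\"value\":(\\d+)").r
  pattern.findFirstMatchIn(json).map(_.group(1).toLong)
}

println(gaugeValue(sampleJson, "DAGScheduler.job.activeJobs"))  // Some(2)
```

The same idea applies to counters and timers in the payload; metric names are prefixed with the application ID and component (driver, executor ID).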
Spark Metrics System and Monitoring Pipeline
How to Use Spark Metrics – Sink to InfluxDB
• Edit $SPARK_HOME/conf/metrics.properties
• Alternative: use the config parameters spark.metrics.conf.*
• Use this method if running on K8S on Spark version 2.x or 3.0, see also SPARK-30985
cat $SPARK_HOME/conf/metrics.properties

*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=<graphiteEndPoint_influxDB_hostName>
*.sink.graphite.port=<graphite_listening_port>
*.sink.graphite.period=10
*.sink.graphite.unit=seconds
*.sink.graphite.prefix=lucatest
*.source.jvm.class=org.apache.spark.metrics.source.JvmSource
$SPARK_HOME/bin/spark-shell \
  --conf "spark.metrics.conf.*.sink.graphite.class"="org.apache.spark.metrics.sink.GraphiteSink" \
  --conf "spark.metrics.conf.*.sink.graphite.host"="<graphiteEndPoint_influxDB_hostName>" \
  ..etc..
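For context on what the Graphite sink emits (and why it can feed InfluxDB, whose Graphite listener accepts the same wire format): each sample is one plaintext line of the form `<prefix>.<metric.path> <value> <unix_timestamp>`. A minimal Scala sketch formatting such lines (the metric names and values below are illustrative, not actual Spark output):

```scala
// Sketch: format metric samples as Graphite plaintext protocol lines,
// the wire format that GraphiteSink sends over TCP.
def graphiteLine(prefix: String, metric: String, value: Double, epochSeconds: Long): String =
  s"$prefix.$metric $value $epochSeconds"

val ts = 1620000000L  // fixed timestamp for the example
val lines = Seq(
  graphiteLine("lucatest", "driver.DAGScheduler.job.activeJobs", 2.0, ts),
  graphiteLine("lucatest", "0.executor.cpuTime", 1.23e9, ts)
)
lines.foreach(println)
// lucatest.driver.DAGScheduler.job.activeJobs 2.0 1620000000
// lucatest.0.executor.cpuTime 1.23E9 1620000000
```

The `prefix` corresponds to the `*.sink.graphite.prefix` setting above; the rest of the metric path is built by Spark from the application ID and component.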
Metrics Visualization with Grafana
• Metrics visualized as time series
• Metrics values visualized vs. time and detailed per executor
(Example dashboard panels: number of active tasks; executor JVM CPU usage)
Metrics for Spark on Kubernetes
• Spark on Kubernetes has been GA since Spark 3.1.1
• Running Spark on cloud resources is widely used
• Need for improved instrumentation in Spark
• Spark 3 metrics system can be extended with plugins
• Plugin metrics to monitor K8S pods' resource usage (CPU, memory, network, etc.)
• Plugin instrumentation for cloud filesystems (S3A, GS, WASBS, OCI, CERN's EOS, etc.)
Spark Plugins and
Custom Extensions to the Metrics System
• Plugin interface introduced in Spark 3
• Plugins are user-provided code run at the start of the executors (and of the driver)
• Plugins make it possible to extend the Spark metrics system with custom code and instrumentation
A First Look at the Spark Plugins API
• Code snippets for demonstration
• A Spark Plugin implementing a metric that reports the constant value 42
Spark 3.x Plugin API
metricRegistry allows you to register user-provided
sources with the Spark metrics system
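The slide's code snippet did not survive this transcript; a minimal Scala sketch along the same lines is shown below: a Spark 3 plugin that registers a Dropwizard gauge reporting the constant value 42. The class name `Demo42Plugin` is made up for illustration; the sketch assumes spark-core 3.x on the classpath and only runs inside a Spark application, so it is not standalone-runnable.

```scala
import java.util.{Map => JMap}
import com.codahale.metrics.Gauge
import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, PluginContext, SparkPlugin}

// A Spark 3 plugin registering one gauge, "constant42", with the metrics system.
class Demo42Plugin extends SparkPlugin {
  override def driverPlugin(): DriverPlugin = null  // executor-side metric only

  override def executorPlugin(): ExecutorPlugin = new ExecutorPlugin {
    override def init(ctx: PluginContext, extraConf: JMap[String, String]): Unit = {
      // metricRegistry is a Dropwizard MetricRegistry provided by the PluginContext
      ctx.metricRegistry.register("constant42", new Gauge[Int] {
        override def getValue: Int = 42
      })
    }
  }
}
```

Enabled with `--conf spark.plugins=Demo42Plugin` (with the compiled class on the driver/executor classpath); plugin metrics appear in the metrics system under the `plugin.<plugin class name>` namespace.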
‘Kick the Tires’ of Spark Plugins
▪ From https://github.com/cerndb/SparkPlugins
▪ RunOSCommandPlugin - runs an OS command as the executor starts
▪ DemoMetricsPlugin - shows how to integrate with Spark Metrics
bin/spark-shell --master k8s://https://<K8S URL> \
  --packages ch.cern.sparkmeasure:spark-plugins_2.12:0.1 \
  --conf spark.plugins=ch.cern.RunOSCommandPlugin,ch.cern.DemoMetricsPlugin
Plugins for Spark on Kubernetes
• Measures metrics related to pods' resource usage
• Integrated with the rest of the Spark metrics
• Plugin code implemented using cgroup instrumentation
• Example: measure CPU from /sys/fs/cgroup/cpuacct/cpuacct.usage
bin/spark-shell --master k8s://https://<K8S URL> \
  --packages ch.cern.sparkmeasure:spark-plugins_2.12:0.1 \
  --conf spark.kubernetes.container.image=<registry>/spark:v311 \
  --conf spark.plugins=ch.cern.CgroupMetrics \
  ...
CgroupMetrics Plugin
▪ Metrics (gauges), in ch.cern.CgroupMetrics plugin:
▪ CPUTimeNanosec: CPU time used by the processes in the cgroup
▪ this includes CPU used by Python processes (PySpark UDF)
▪ MemoryRss: number of bytes of anonymous and swap cache memory.
▪ MemorySwap: number of bytes of swap usage.
▪ MemoryCache: number of bytes of page cache memory.
▪ NetworkBytesIn: network traffic inbound.
▪ NetworkBytesOut: network traffic outbound.
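To make the cgroup mechanism concrete, here is a simplified Scala sketch of the kind of reading and parsing involved. The file paths are the standard cgroup v1 locations mentioned above; the parsing functions take file contents as strings so they are easy to test, and this is an illustration of the approach, not the plugin's actual implementation.

```scala
import scala.io.Source
import scala.util.Try

// cgroup v1 pseudo-files this kind of instrumentation reads:
//   /sys/fs/cgroup/cpuacct/cpuacct.usage  -> single number: CPU time in nanoseconds
//   /sys/fs/cgroup/memory/memory.stat     -> "key value" lines (cache, rss, swap, ...)

def parseCpuTimeNanos(content: String): Long = content.trim.toLong

def parseMemoryStat(content: String): Map[String, Long] =
  content.linesIterator.flatMap { line =>
    line.split("\\s+") match {
      case Array(k, v) => Try(k -> v.toLong).toOption  // skip malformed lines
      case _           => None
    }
  }.toMap

// Reading the real files only works inside a cgroup v1 environment (e.g. a pod):
def readCgroupFile(path: String): Option[String] =
  Try(Source.fromFile(path).mkString).toOption

val stat = parseMemoryStat("cache 4096\nrss 1048576\nswap 0\n")
println(stat("rss"))  // 1048576
```

In the plugin these values are exposed as Dropwizard gauges, so they flow through the same sink pipeline as the built-in Spark metrics.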
Plugin to Measure Cloud Filesystems
▪ Example of how to measure S3 throughput metrics
▪ Note: Apache Spark natively instruments only HDFS and the local filesystem
▪ The plugin uses the Hadoop client API for Hadoop-compatible filesystems
▪ Metrics: bytesRead, bytesWritten, readOps, writeOps
--conf spark.plugins=ch.cern.CloudFSMetrics
--conf spark.cernSparkPlugin.cloudFsName=<name of the filesystem>
(example: "s3a", "gs", "wasbs", "oci", "root", etc.)
Tooling for a Spark Performance Dashboard
• Code and examples to get started:
• https://github.com/cerndb/spark-dashboard
• It simplifies the configuration of InfluxDB as a sink for Spark metrics
• Grafana dashboards with pre-built panels, graphs, and queries
• Option 1: Dockerfile
• Use this for testing locally
• From Dockerhub: lucacanali/spark-dashboard:v01
• Option 2: Helm chart
• Use this for testing on Kubernetes
Tooling for Spark Plugins
• Code and examples to get started:
• https://github.com/cerndb/SparkPlugins
• --packages ch.cern.sparkmeasure:spark-plugins_2.12:0.1
• Plugins for OS metrics (Spark on K8S)
• --conf spark.plugins=ch.cern.CgroupMetrics
• Plugins for measuring cloud storage
• Hadoop Compatible Filesystems: s3a, gs, wasbs, oci, root, etc..
• --conf spark.plugins=ch.cern.CloudFSMetrics
• --conf spark.cernSparkPlugin.cloudFsName=<filesystem name>
How to Deploy the Dashboard - Example
docker run --network=host -d lucacanali/spark-dashboard:v01
INFLUXDB_ENDPOINT=`hostname`
bin/spark-shell --master k8s://https://<K8S URL> \
  --packages ch.cern.sparkmeasure:spark-plugins_2.12:0.1 \
  --conf spark.plugins=ch.cern.CgroupMetrics \
  --conf "spark.metrics.conf.*.sink.graphite.class"="org.apache.spark.metrics.sink.GraphiteSink" \
  --conf "spark.metrics.conf.*.sink.graphite.host"=$INFLUXDB_ENDPOINT \
  --conf "spark.metrics.conf.*.sink.graphite.port"=2003 \
  --conf "spark.metrics.conf.*.sink.graphite.period"=10 \
  --conf "spark.metrics.conf.*.sink.graphite.unit"=seconds \
  --conf "spark.metrics.conf.*.sink.graphite.prefix"="luca" \
  --conf spark.metrics.appStatusSource.enabled=true
...
Spark Performance Dashboard
• Visualize Spark metrics
• Real-time + historical data
• Summaries and time series of key metrics
• Data for root-cause analysis
Dashboard Annotations
• Annotate the workload graphs with:
• Start/end times for job IDs, stage IDs, and SQL execution IDs
INFLUXDB_HTTP_ENDPOINT="http://`hostname`:8086"
--packages ch.cern.sparkmeasure:spark-measure_2.12:0.17 \
--conf spark.sparkmeasure.influxdbURL=$INFLUXDB_HTTP_ENDPOINT \
--conf spark.extraListeners=ch.cern.sparkmeasure.InfluxDBSink
Demo – Spark Performance Dashboard
(5 min)
Use Plugins to Instrument Custom Libraries
• Augment Spark metrics with metrics from custom code
• Example: experimental way to measure I/O time with metrics
• Measure read time, seek time and CPU time during read operations for HDFS, S3A, etc.
▪ Custom S3A jar with time instrumentation (for Hadoop 3.2.0, use with Spark 3.0 and 3.1):
https://github.com/LucaCanali/hadoop/tree/s3aAndHDFSTimeInstrumentation
▪ Metrics: S3AReadTimeMuSec, S3ASeekTimeMuSec, S3AGetObjectMetadataMuSec
▪ Spark metrics with custom Spark plugin:
▪ --packages ch.cern.sparkmeasure:spark-plugins_2.12:0.1
▪ --conf spark.plugins=ch.cern.experimental.S3ATimeInstrumentation
Lessons Learned and Further Work
▪ Feedback on deploying the dashboard @CERN
▪ We provide the Spark dashboard as an optional configuration for the CERN
Jupyter-based data analysis platform
▪ There is a cognitive load in understanding the available metrics and the troubleshooting process
▪ We set data retention and in general pay attention not to overload InfluxDB
▪ Core Spark development
▪ A native InfluxDB sink would be useful (currently Spark has a Graphite sink)
▪ Spark is not yet fully instrumented; a few areas remain to be covered, for example
instrumentation of I/O time and of Python UDF run time.
Conclusions
▪ Spark metrics and dashboard
▪ Provide extensive performance monitoring data. Complement the Web UI.
▪ Spark plugins
▪ Introduced in Spark 3.0, make it easy to augment the Spark metrics system.
▪ Use plugins to monitor Spark on Kubernetes and cloud filesystem usage.
▪ Tooling
▪ Get started with a dashboard based on InfluxDB and Grafana.
▪ Build your plugins and dashboards, and share your experience!
Acknowledgments and Links
▪ Thanks to CERN data analytics and Spark service team
▪ Thanks to Apache Spark committers and Spark community
▪ For their help with PRs: SPARK-22190, SPARK-25170, SPARK-25228, SPARK-26928,
SPARK-27189, SPARK-28091, SPARK-29654, SPARK-30041, SPARK-30775, SPARK-31711
▪ Links:
• https://github.com/cerndb/SparkPlugins
• https://github.com/cerndb/spark-dashboard
• https://db-blog.web.cern.ch/blog/luca-canali/2019-02-performance-dashboard-apache-spark
• https://github.com/LucaCanali/sparkMeasure
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

🚂🚘 Premium Girls Call Bangalore 🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
🚂🚘 Premium Girls Call Bangalore  🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...🚂🚘 Premium Girls Call Bangalore  🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
🚂🚘 Premium Girls Call Bangalore 🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
bhupeshkumar0889
 
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
kinni singh$A17
 
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy DsouzaOpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata
 
potential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in generalpotential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in general
huseindihon
 
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
dizzycaye
 
Girls call in Hyderabad 000XX00000 Provide Best And Top Girl Service And No1 ...
Girls call in Hyderabad 000XX00000 Provide Best And Top Girl Service And No1 ...Girls call in Hyderabad 000XX00000 Provide Best And Top Girl Service And No1 ...
Girls call in Hyderabad 000XX00000 Provide Best And Top Girl Service And No1 ...
avanikakapoor
 
🚂🚘 Premium Girls Call Guwahati 🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
🚂🚘 Premium Girls Call Guwahati  🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...🚂🚘 Premium Girls Call Guwahati  🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
🚂🚘 Premium Girls Call Guwahati 🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
kuldeepsharmaks8120
 
Girls Call Chennai 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Chennai 000XX00000 Provide Best And Top Girl Service And No1 in CityGirls Call Chennai 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Chennai 000XX00000 Provide Best And Top Girl Service And No1 in City
solankikamal004
 
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
ginni singh$A17
 
Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...
Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...
Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...
birajmohan012
 
Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...
Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...
Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...
sharonblush
 
Lucknow Girls Call Indira Nagar Lucknow 08630512678 Provide Best And Top Girl...
Lucknow Girls Call Indira Nagar Lucknow 08630512678 Provide Best And Top Girl...Lucknow Girls Call Indira Nagar Lucknow 08630512678 Provide Best And Top Girl...
Lucknow Girls Call Indira Nagar Lucknow 08630512678 Provide Best And Top Girl...
bangaloreakshitakaus
 
Experience, Excellence & Commitment are the characteristics that describe Fla...
Experience, Excellence & Commitment are the characteristics that describe Fla...Experience, Excellence & Commitment are the characteristics that describe Fla...
Experience, Excellence & Commitment are the characteristics that describe Fla...
kittycrispy617
 
CHAPTER-1-Introduction-to-Marketing.pptx
CHAPTER-1-Introduction-to-Marketing.pptxCHAPTER-1-Introduction-to-Marketing.pptx
CHAPTER-1-Introduction-to-Marketing.pptx
girewiy968
 
the potential of the development of the Ford–Fulkerson algorithm to solve the...
the potential of the development of the Ford–Fulkerson algorithm to solve the...the potential of the development of the Ford–Fulkerson algorithm to solve the...
the potential of the development of the Ford–Fulkerson algorithm to solve the...
huseindihon
 
Celonis Busniess Analyst Virtual Internship.pptx
Celonis Busniess Analyst Virtual Internship.pptxCelonis Busniess Analyst Virtual Internship.pptx
Celonis Busniess Analyst Virtual Internship.pptx
AnujaGaikwad28
 
🚂🚘 Premium Girls Call Nashik 🛵🚡000XX00000 💃 Choose Best And Top Girl Service...
🚂🚘 Premium Girls Call Nashik  🛵🚡000XX00000 💃 Choose Best And Top Girl Service...🚂🚘 Premium Girls Call Nashik  🛵🚡000XX00000 💃 Choose Best And Top Girl Service...
🚂🚘 Premium Girls Call Nashik 🛵🚡000XX00000 💃 Choose Best And Top Girl Service...
kuldeepsharmaks8120
 
Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...
Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...
Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...
janvikumar4133
 
VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...
VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...
VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...
sukaniyasunnu
 
DU degree offer diploma Transcript
DU degree offer diploma TranscriptDU degree offer diploma Transcript
DU degree offer diploma Transcript
uapta
 

Recently uploaded (20)

🚂🚘 Premium Girls Call Bangalore 🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
🚂🚘 Premium Girls Call Bangalore  🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...🚂🚘 Premium Girls Call Bangalore  🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
🚂🚘 Premium Girls Call Bangalore 🛵🚡000XX00000 💃 Choose Best And Top Girl Serv...
 
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
Noida Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Ava...
 
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy DsouzaOpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
OpenMetadata Spotlight - OpenMetadata @ Aspire by Vinol Joy Dsouza
 
potential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in generalpotential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in general
 
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
Female Service Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Se...
 
Girls call in Hyderabad 000XX00000 Provide Best And Top Girl Service And No1 ...
Girls call in Hyderabad 000XX00000 Provide Best And Top Girl Service And No1 ...Girls call in Hyderabad 000XX00000 Provide Best And Top Girl Service And No1 ...
Girls call in Hyderabad 000XX00000 Provide Best And Top Girl Service And No1 ...
 
🚂🚘 Premium Girls Call Guwahati 🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
🚂🚘 Premium Girls Call Guwahati  🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...🚂🚘 Premium Girls Call Guwahati  🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
🚂🚘 Premium Girls Call Guwahati 🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
 
Girls Call Chennai 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Chennai 000XX00000 Provide Best And Top Girl Service And No1 in CityGirls Call Chennai 000XX00000 Provide Best And Top Girl Service And No1 in City
Girls Call Chennai 000XX00000 Provide Best And Top Girl Service And No1 in City
 
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
 
Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...
Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...
Beautiful Girls Call Pune 000XX00000 Provide Best And Top Girl Service And No...
 
Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...
Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...
Best Girls Call Navi Mumbai 9930245274 Provide Best And Top Girl Service And ...
 
Lucknow Girls Call Indira Nagar Lucknow 08630512678 Provide Best And Top Girl...
Lucknow Girls Call Indira Nagar Lucknow 08630512678 Provide Best And Top Girl...Lucknow Girls Call Indira Nagar Lucknow 08630512678 Provide Best And Top Girl...
Lucknow Girls Call Indira Nagar Lucknow 08630512678 Provide Best And Top Girl...
 
Experience, Excellence & Commitment are the characteristics that describe Fla...
Experience, Excellence & Commitment are the characteristics that describe Fla...Experience, Excellence & Commitment are the characteristics that describe Fla...
Experience, Excellence & Commitment are the characteristics that describe Fla...
 
CHAPTER-1-Introduction-to-Marketing.pptx
CHAPTER-1-Introduction-to-Marketing.pptxCHAPTER-1-Introduction-to-Marketing.pptx
CHAPTER-1-Introduction-to-Marketing.pptx
 
the potential of the development of the Ford–Fulkerson algorithm to solve the...
the potential of the development of the Ford–Fulkerson algorithm to solve the...the potential of the development of the Ford–Fulkerson algorithm to solve the...
the potential of the development of the Ford–Fulkerson algorithm to solve the...
 
Celonis Busniess Analyst Virtual Internship.pptx
Celonis Busniess Analyst Virtual Internship.pptxCelonis Busniess Analyst Virtual Internship.pptx
Celonis Busniess Analyst Virtual Internship.pptx
 
🚂🚘 Premium Girls Call Nashik 🛵🚡000XX00000 💃 Choose Best And Top Girl Service...
🚂🚘 Premium Girls Call Nashik  🛵🚡000XX00000 💃 Choose Best And Top Girl Service...🚂🚘 Premium Girls Call Nashik  🛵🚡000XX00000 💃 Choose Best And Top Girl Service...
🚂🚘 Premium Girls Call Nashik 🛵🚡000XX00000 💃 Choose Best And Top Girl Service...
 
Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...
Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...
Beautiful Girls Call 9711199171 9711199171 Provide Best And Top Girl Service ...
 
VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...
VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...
VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...
 
DU degree offer diploma Transcript
DU degree offer diploma TranscriptDU degree offer diploma Transcript
DU degree offer diploma Transcript
 

Explore Spark Metrics
• Web UI Servlet Sink
• Metrics from the driver + executor (executor metrics available only if Spark is in local mode)
• By default, metrics are exposed by the metrics servlet on the Web UI: http://driver_host:4040/metrics/json
• Prometheus sink for the driver and for Spark standalone; exposes metrics on the Web UI too
• *.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
• *.sink.prometheusServlet.path=/metrics/prometheus
• JMX sink
• Use the JMX sink and explore metrics using jconsole
• Configuration: *.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
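The JSON served by the metrics servlet can also be inspected programmatically. A minimal sketch of pulling gauge values out of the Dropwizard-style JSON document: the payload below is a hypothetical, abbreviated example of what /metrics/json returns (a live setup would fetch it from http://driver_host:4040/metrics/json instead).

```python
import json

# Hypothetical, abbreviated payload in the Dropwizard JSON format,
# as served by the metrics servlet at /metrics/json.
payload = """
{
  "version": "4.0.0",
  "gauges": {
    "local-1613406375312.driver.BlockManager.memory.memUsed_MB": {"value": 0},
    "local-1613406375312.driver.DAGScheduler.job.activeJobs": {"value": 1}
  },
  "counters": {},
  "timers": {}
}
"""

metrics = json.loads(payload)
# List gauge names and values, stripping the "<app-id>.driver." prefix
# (metric names are namespaced by application id and component).
gauges = {name.split(".", 2)[-1]: g["value"] for name, g in metrics["gauges"].items()}
print(gauges)
```

The same traversal works for the "counters" and "timers" sections of the payload.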
Spark Metrics System and Monitoring Pipeline
How to Use Spark Metrics – Sink to InfluxDB
• Edit $SPARK_HOME/conf/metrics.properties
• Alternative: use the config parameters spark.metrics.conf.*
• Use this method if running on K8S on Spark version 2.x or 3.0, see also SPARK-30985

cat $SPARK_HOME/conf/metrics.properties
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=<graphiteEndPoint_influxDB_hostName>
*.sink.graphite.port=<graphite_listening_port>
*.sink.graphite.period=10
*.sink.graphite.unit=seconds
*.sink.graphite.prefix=lucatest
*.source.jvm.class=org.apache.spark.metrics.source.JvmSource

$SPARK_HOME/bin/spark-shell
  --conf "spark.metrics.conf.*.sink.graphite.class"="org.apache.spark.metrics.sink.GraphiteSink"
  --conf "spark.metrics.conf.*.sink.graphite.host"="<graphiteEndPoint_influxDB_hostName>"
  ..etc..
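When editing metrics.properties is not practical, each sink setting maps one-to-one to a spark.metrics.conf.* configuration parameter. A small sketch that generates the corresponding --conf flags from a dict of sink settings (the hostname value is a placeholder, not a real endpoint):

```python
# Sketch: turn Graphite-sink settings (as they would appear in
# metrics.properties) into equivalent spark.metrics.conf.* flags
# for spark-shell / spark-submit.
sink_settings = {
    "*.sink.graphite.class": "org.apache.spark.metrics.sink.GraphiteSink",
    "*.sink.graphite.host": "influxdb.example.com",  # placeholder endpoint
    "*.sink.graphite.port": "2003",
    "*.sink.graphite.period": "10",
    "*.sink.graphite.unit": "seconds",
    "*.sink.graphite.prefix": "lucatest",
}

flags = [f'--conf "spark.metrics.conf.{k}"="{v}"' for k, v in sink_settings.items()]
print(" \\\n  ".join(flags))
```

The quoting matters on the command line because the keys contain the `*` wildcard, which the shell would otherwise try to expand.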
Metrics Visualization with Grafana
• Metrics visualized as time series
• Metrics values visualized vs. time and detailed per executor
• Example panels: Number of Active Tasks, Executor JVM CPU Usage
Metrics for Spark on Kubernetes
• Spark on Kubernetes is GA since Spark 3.1.1
• Running Spark on cloud resources is widely used
• Need for improved instrumentation in Spark
• The Spark 3 metrics system can be extended with plugins
• Plugin metrics to monitor K8S pods' resource usage (CPU, memory, network, etc.)
• Plugin instrumentation for cloud filesystems (S3A, GS, WASBS, OCI, CERN's EOS, etc.)
Spark Plugins and Custom Extensions to the Metrics System
• Plugin interface introduced in Spark 3
• Plugins are user-provided code run at the start of the executors (and of the driver)
• Plugins make it possible to extend Spark metrics with custom code and instrumentation
A First Look at the Spark Plugins API
• Code snippets for demonstration
• A Spark plugin implementing a metric that reports the constant value 42
• Spark 3.x Plugin API: metricRegistry allows registering user-provided sources with the Spark metrics system
‘Kick the Tires’ of Spark Plugins
▪ From https://github.com/cerndb/SparkPlugins
▪ RunOSCommandPlugin - runs an OS command as the executor starts
▪ DemoMetricsPlugin - shows how to integrate with Spark Metrics

bin/spark-shell --master k8s://https://<K8S URL>
  --packages ch.cern.sparkmeasure:spark-plugins_2.12:0.1
  --conf spark.plugins=ch.cern.RunOSCommandPlugin,ch.cern.DemoMetricsPlugin
Plugins for Spark on Kubernetes
• Measure metrics related to pods' resource usage
• Integrated with the rest of the Spark metrics
• Plugin code implemented using cgroup instrumentation
• Example: measure CPU from /sys/fs/cgroup/cpuacct/cpuacct.usage

bin/spark-shell --master k8s://https://<K8S URL>
  --packages ch.cern.sparkmeasure:spark-plugins_2.12:0.1
  --conf spark.kubernetes.container.image=<registry>/spark:v311
  --conf spark.plugins=ch.cern.CgroupMetrics
  ...
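cpuacct.usage is a monotonically increasing counter of cumulative CPU time (across all cores) in nanoseconds, so turning it into a utilization figure means differencing two samples over a wall-clock interval. A minimal sketch of that arithmetic, with made-up sample values (on a live pod the counter would be read from /sys/fs/cgroup/cpuacct/cpuacct.usage):

```python
def avg_cpu_cores(usage_ns_start, usage_ns_end, wall_ns):
    """Average number of CPU cores busy over the sampling interval.

    cpuacct.usage reports cumulative CPU time (summed over all cores)
    in nanoseconds, so the counter delta divided by the wall-clock
    interval gives the average number of cores in use.
    """
    return (usage_ns_end - usage_ns_start) / wall_ns

# Made-up sample: 3 seconds of CPU time consumed over a 2-second
# wall-clock interval -> on average 1.5 cores busy.
cores = avg_cpu_cores(10_000_000_000, 13_000_000_000, 2_000_000_000)
print(cores)  # 1.5
```

Note that, as the slide on CgroupMetrics points out, this cgroup counter also accounts for CPU used by Python worker processes (PySpark UDFs), which JVM-level metrics miss.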
CgroupMetrics Plugin
▪ Metrics (gauges) in the ch.cern.CgroupMetrics plugin:
▪ CPUTimeNanosec: CPU time used by the processes in the cgroup
▪ this includes CPU used by Python processes (PySpark UDFs)
▪ MemoryRss: number of bytes of anonymous and swap cache memory
▪ MemorySwap: number of bytes of swap usage
▪ MemoryCache: number of bytes of page cache memory
▪ NetworkBytesIn: network traffic inbound
▪ NetworkBytesOut: network traffic outbound
Plugin to Measure Cloud Filesystems
▪ Example of how to measure S3 throughput metrics
▪ Note: Apache Spark instruments only HDFS and the local filesystem
▪ The plugin uses the Hadoop client API for Hadoop-compatible filesystems
▪ Metrics: bytesRead, bytesWritten, readOps, writeOps

--conf spark.plugins=ch.cern.CloudFSMetrics
--conf spark.cernSparkPlugin.cloudFsName=<name of the filesystem>
  (example: "s3a", "gs", "wasbs", "oci", "root", etc.)
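Gauges such as bytesRead are cumulative counters, so a throughput figure again comes from differencing consecutive samples. A small sketch, using invented sample values for illustration:

```python
def read_throughput_mb_s(samples):
    """Read throughput between consecutive (epoch_seconds, bytes_read) samples.

    bytesRead is a cumulative counter, so each interval's throughput is
    the byte delta divided by the time delta.
    """
    out = []
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        out.append((b1 - b0) / (t1 - t0) / 1_000_000)  # MB/s
    return out

# Invented samples: 200 MB read in the first 10 s, 100 MB in the next 10 s.
samples = [(0, 0), (10, 200_000_000), (20, 300_000_000)]
print(read_throughput_mb_s(samples))  # [20.0, 10.0]
```

In the dashboard this differencing is typically done by the query layer (e.g. a non-negative derivative over the stored time series) rather than in application code.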
Tooling for a Spark Performance Dashboard
• Code and examples to get started: https://github.com/cerndb/spark-dashboard
• It simplifies the configuration of InfluxDB as a sink for Spark metrics
• Grafana dashboards with pre-built panels, graphs, and queries
• Option 1: Dockerfile
• Use this for testing locally
• From Docker Hub: lucacanali/spark-dashboard:v01
• Option 2: Helm chart
• Use for testing on Kubernetes
Tooling for Spark Plugins
• Code and examples to get started: https://github.com/cerndb/SparkPlugins
• --packages ch.cern.sparkmeasure:spark-plugins_2.12:0.1
• Plugins for OS metrics (Spark on K8S)
• --conf spark.plugins=ch.cern.CgroupMetrics
• Plugins for measuring cloud storage
• Hadoop-compatible filesystems: s3a, gs, wasbs, oci, root, etc.
• --conf spark.plugins=ch.cern.CloudFSMetrics
• --conf spark.cernSparkPlugin.cloudFsName=<filesystem name>
How to Deploy the Dashboard - Example

docker run --network=host -d lucacanali/spark-dashboard:v01
INFLUXDB_ENDPOINT=`hostname`

bin/spark-shell --master k8s://https://<K8S URL>
  --packages ch.cern.sparkmeasure:spark-plugins_2.12:0.1
  --conf spark.plugins=ch.cern.CgroupMetrics
  --conf "spark.metrics.conf.*.sink.graphite.class"="org.apache.spark.metrics.sink.GraphiteSink"
  --conf "spark.metrics.conf.*.sink.graphite.host"=$INFLUXDB_ENDPOINT
  --conf "spark.metrics.conf.*.sink.graphite.port"=2003
  --conf "spark.metrics.conf.*.sink.graphite.period"=10
  --conf "spark.metrics.conf.*.sink.graphite.unit"=seconds
  --conf "spark.metrics.conf.*.sink.graphite.prefix"="luca"
  --conf spark.metrics.appStatusSource.enabled=true
  ...
Spark Performance Dashboard
• Visualize Spark metrics
• Real-time + historical data
• Summaries and time series of key metrics
• Data for root-cause analysis
Dashboard Annotations
• Annotations to the workload graphs with:
• Start/end time for job id, or stage id, or SQL id

INFLUXDB_HTTP_ENDPOINT="http://`hostname`:8086"
--packages ch.cern.sparkmeasure:spark-measure_2.12:0.17
--conf spark.sparkmeasure.influxdbURL=$INFLUXDB_HTTP_ENDPOINT
--conf spark.extraListeners=ch.cern.sparkmeasure.InfluxDBSink
Demo – Spark Performance Dashboard (5 min)
Use Plugins to Instrument Custom Libraries
• Augment Spark metrics with metrics from custom code
• Example: an experimental way to measure I/O time with metrics
• Measure read time, seek time and CPU time during read operations for HDFS, S3A, etc.
▪ Custom S3A jar with time instrumentation (for Hadoop 3.2.0, use with Spark 3.0 and 3.1): https://github.com/LucaCanali/hadoop/tree/s3aAndHDFSTimeInstrumentation
▪ Metrics: S3AReadTimeMuSec, S3ASeekTimeMuSec, S3AGetObjectMetadataMuSec
▪ Spark metrics with a custom Spark plugin:
▪ --packages ch.cern.sparkmeasure:spark-plugins_2.12:0.1
▪ --conf spark.plugins=ch.cern.experimental.S3ATimeInstrumentation
Lessons Learned and Further Work
▪ Feedback from deploying the dashboard @CERN
▪ We provide the Spark dashboard as an optional configuration for the CERN Jupyter-based data analysis platform
▪ Cognitive load to understand the available metrics and the troubleshooting process
▪ We set data retention policies and in general pay attention not to overload InfluxDB
▪ Core Spark development
▪ A native InfluxDB sink would be useful (currently Spark has a Graphite sink)
▪ Spark is not yet fully instrumented; a few areas are yet to be covered, for example instrumentation of I/O time and Python UDF run time
Conclusions
▪ Spark metrics and dashboard
▪ Provide extensive performance monitoring data; complement the Web UI
▪ Spark plugins
▪ Introduced in Spark 3.0, they make it easy to augment the Spark metrics system
▪ Use plugins to monitor Spark on Kubernetes and cloud filesystem usage
▪ Tooling
▪ Get started with a dashboard based on InfluxDB and Grafana
▪ Build your own plugins and dashboards, and share your experience!
Acknowledgments and Links
▪ Thanks to the CERN data analytics and Spark service team
▪ Thanks to the Apache Spark committers and the Spark community for their help with PRs: SPARK-22190, SPARK-25170, SPARK-25228, SPARK-26928, SPARK-27189, SPARK-28091, SPARK-29654, SPARK-30041, SPARK-30775, SPARK-31711
▪ Links:
• https://github.com/cerndb/SparkPlugins
• https://github.com/cerndb/spark-dashboard
• https://db-blog.web.cern.ch/blog/luca-canali/2019-02-performance-dashboard-apache-spark
• https://github.com/LucaCanali/sparkMeasure