Native Support of Prometheus Monitoring in Apache Spark 3.0

Native Support of Prometheus Monitoring
in Apache Spark 3
Dongjoon Hyun
DB Tsai
SPARK+AI SUMMIT 2020

Who am I
Dongjoon Hyun
Apache Spark PMC and Committer
Apache ORC PMC and Committer
Apache REEF PMC and Committer
https://github.com/dongjoon-hyun
https://www.linkedin.com/in/dongjoon
@dongjoonhyun

Who am I
DB Tsai
Apache Spark PMC and Committer
Apache SystemML PMC and Committer
Apache Yunikorn Committer
Apache Bahir Committer
https://github.com/dbtsai
https://www.linkedin.com/in/dbtsai
@dbtsai

Three popular methods
Monitoring Apache Spark
Web UI (Live and History Server)
• Jobs, Stages, Tasks, SQL queries
• Executors, Storage
Logs
• Event logs and Spark process logs
• Listeners (SparkListener, StreamingQueryListener, SparkStatusTracker, …)
Metrics
• Various numeric values

Early warning instead of post-mortem process
Metrics are useful to handle gray failures
Monitoring and alerting on Spark jobs' gray failures
• Memory leak or misconfiguration
• Performance degradation
• Growing internal state of streaming jobs

An open-source systems monitoring and alerting toolkit
Prometheus
Provides
• a multi-dimensional data model
• operational simplicity
• scalable data collection
• a powerful query language
A good option for Apache Spark Metrics
Prometheus Server
Prometheus Web UI
Alert Manager
Pushgateway
https://en.wikipedia.org/wiki/Prometheus_(software)

Using JmxSink and JMXExporter combination
Spark 2 with Prometheus (1/3)
Enable Spark’s built-in JmxSink in Spark’s conf/metrics.properties
Deploy Prometheus' JMXExporter library and its config file
Expose JMXExporter port, 9404, to Prometheus
Add `-javaagent` option to the target (master/worker/executor/driver/…)
-javaagent:./jmx_prometheus_javaagent-0.12.0.jar=9404:config.yaml

Using GraphiteSink and GraphiteExporter combination
Set up Graphite server
Enable Spark’s built-in GraphiteSink with several configurations
Enable Prometheus’ GraphiteExporter at Graphite

Custom sink (or 3rd party Sink) + Pushgateway server
Set up Pushgateway server
Develop a custom sink (or use 3rd party libs) with Prometheus dependency
Deploy the sink libraries and their configuration file to the cluster

Pros and Cons
Pros
• Used already in production
• A general approach
Cons
• Difficult to set up in new environments
• Some custom libraries may have a dependency on Spark versions

Easy usage
Goal in Apache Spark 3
Be independent from the existing Metrics pipeline
• Use new endpoints and disable them by default
• Avoid introducing new dependencies
Reuse the existing resources
• Use the officially documented ports of Master/Worker/Driver
• Take advantage of Prometheus Service Discovery in K8s as much as possible

SPARK-29674 / SPARK-29557
DropWizard Metrics 4 for JDK11
Timeline (figure): Spark releases 1.6, 2.0, 2.1, 2.2, 2.3, 2.4, 3.0 and DropWizard Metrics versions 3.1.2, 3.1.5, 4.1.1 across 2016-2020

SPARK-29674 / SPARK-29557
DropWizard Metrics 4 for JDK11
DropWizard Metrics 3.x (Spark 1/2)
metrics_master_workers_Value 0.0
DropWizard Metrics 4.x (Spark 3)
metrics_master_workers_Value{type="gauges",} 0.0
metrics_master_workers_Number{type="gauges",} 0.0

A new metric source
ExecutorMetricsSource
Collect executor memory metrics at the driver and expose them as ExecutorMetricsSource and
via the REST API (SPARK-23429, SPARK-27189, SPARK-27324, SPARK-24958)
• JVMHeapMemory / JVMOffHeapMemory
• OnHeapExecutionMemory / OffHeapExecutionMemory
• OnHeapStorageMemory / OffHeapStorageMemory
• OnHeapUnifiedMemory / OffHeapUnifiedMemory
• DirectPoolMemory / MappedPoolMemory
• MinorGCCount / MinorGCTime
• MajorGCCount / MajorGCTime
• ProcessTreeJVMVMemory
• ProcessTreeJVMRSSMemory
• ProcessTreePythonVMemory
• ProcessTreePythonRSSMemory
• ProcessTreeOtherVMemory
• ProcessTreeOtherRSSMemory
JVM Process Tree

Prometheus-format endpoints
Support Prometheus more natively (1/2)
PrometheusServlet: A friend of MetricsServlet
• A new metric sink supporting Prometheus-format (SPARK-29032)
• Unified configuration via conf/metrics.properties
• No additional system requirements (services / libraries / ports)

PrometheusResource: A single endpoint for all executor memory metrics
• A new metric endpoint to export all executor metrics at the driver (SPARK-29064/SPARK-29400)
• The most efficient way to discover and collect them because the driver already has all the information
• Enabled by `spark.ui.prometheus.enabled` (default: false)

spark_info and service discovery
Add spark_info metric (SPARK-31743)
• A standard Prometheus way to expose version and revision
• Monitoring Spark jobs per version
Support driver service annotation in K8S (SPARK-31696)
• Used by Prometheus service discovery

SPARK-29032 Add PrometheusServlet to monitor Master/Worker/Driver
PrometheusServlet
Make Master/Worker/Driver expose the metrics in Prometheus format at the existing port
Follow the output style of the "Spark JmxSink + Prometheus JMXExporter + javaagent" approach
Component  Port  Prometheus Endpoint (New in 3.0)    JSON Endpoint (Since initial release)
Driver     4040  /metrics/prometheus/                /metrics/json/
Worker     8081  /metrics/prometheus/                /metrics/json/
Master     8080  /metrics/master/prometheus/         /metrics/master/json/
Master     8080  /metrics/applications/prometheus/   /metrics/applications/json/

Use conf/metrics.properties like the other sinks
PrometheusServlet Configuration
Copy conf/metrics.properties.template to conf/metrics.properties
Uncomment the following lines in conf/metrics.properties
*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
*.sink.prometheusServlet.path=/metrics/prometheus
master.sink.prometheusServlet.path=/metrics/master/prometheus
applications.sink.prometheusServlet.path=/metrics/applications/prometheus
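Once the servlet is enabled, the endpoint returns plain Prometheus text lines like the ones shown earlier in the deck. The following is a minimal sketch of reading such output outside of Prometheus, for example to sanity-check a new configuration. The function names `parse_prometheus_text` and `scrape` are hypothetical, the default URL assumes a local driver on port 4040, and the parser is deliberately naive (it assumes label values contain no commas or spaces).

```python
import urllib.request


def parse_prometheus_text(text):
    """Parse simple Prometheus text lines into (name, labels, value) tuples.

    Naive sketch: assumes label values contain no commas or spaces.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines (# HELP / # TYPE).
        if not line or line.startswith("#"):
            continue
        # The sample value is the last whitespace-separated field.
        metric, _, value = line.rpartition(" ")
        name, _, label_part = metric.partition("{")
        labels = {}
        # The label block may end with a trailing comma, e.g. {type="gauges",}
        for pair in label_part.rstrip("}").split(","):
            pair = pair.strip()
            if pair:
                key, _, val = pair.partition("=")
                labels[key.strip()] = val.strip().strip('"')
        samples.append((name.strip(), labels, float(value)))
    return samples


def scrape(url="http://localhost:4040/metrics/prometheus/"):
    """Fetch the endpoint (assumed URL) and parse its body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))
```

Calling `scrape()` against a running driver would return a list of samples; the parsing step can be exercised offline with any captured output.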

SPARK-29064 Add PrometheusResource to export executor metrics
PrometheusResource
New endpoint with information similar to the JSON endpoint
Driver exposes all executor memory metrics in Prometheus format
Component  Port  Prometheus Endpoint (New in 3.0)  JSON Endpoint (Since 1.4)
Driver     4040  /metrics/executors/prometheus/    /api/v1/applications/{id}/executors/

Use spark.ui.prometheus.enabled
PrometheusResource Configuration
Run spark-shell with the configuration
Run `curl` with the new endpoint
$ bin/spark-shell \
    -c spark.ui.prometheus.enabled=true \
    -c spark.executor.processTreeMetrics.enabled=true
$ curl http://localhost:4040/metrics/executors/prometheus/ | grep executor | head -n1
metrics_executor_rddBlocks{application_id="...", application_name="...", executor_id="..."} 0
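Because every line from this endpoint carries an `executor_id` label, the output can be pivoted into one record per executor. The sketch below shows one way to do that with regular expressions. The function name `metrics_by_executor` and the label values in the test data are hypothetical; only the metric names `metrics_executor_rddBlocks` and `metrics_executor_totalTasks_total` are taken from this deck.

```python
import re
from collections import defaultdict

# Matches lines such as: metrics_executor_rddBlocks{executor_id="1", ...} 0
# Sketch only: assumes label values do not contain '}' or '"'.
SAMPLE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def metrics_by_executor(text):
    """Group samples as {executor_id: {metric_name: value}}."""
    result = defaultdict(dict)
    for line in text.splitlines():
        match = SAMPLE.match(line.strip())
        if not match:
            continue  # comments, blank lines, or unlabeled samples
        labels = dict(LABEL.findall(match.group("labels")))
        executor = labels.get("executor_id")
        if executor is not None:
            result[executor][match.group("name")] = float(match.group("value"))
    return dict(result)
```

Feeding it the `curl` output from the command above would give a dictionary keyed by executor ID, which is convenient for quick comparisons between executors.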

Key Monitoring Scenarios on K8s clusters
Monitoring batch job memory behavior => A risk to be killed?
Monitoring dynamic allocation behavior => Unexpected slowness?
Monitoring streaming job behavior => Latency?

Use Prometheus Service Discovery
Monitoring batch job memory behavior (1/2)
Configuration                                            Value
spark.ui.prometheus.enabled                              true
spark.kubernetes.driver.annotation.prometheus.io/scrape  true
spark.kubernetes.driver.annotation.prometheus.io/path    /metrics/executors/prometheus/
spark.kubernetes.driver.annotation.prometheus.io/port    4040

Monitoring batch job memory behavior (2/2)
spark-submit --master k8s://$K8S_MASTER --deploy-mode cluster \
    -c spark.driver.memory=2g \
    -c spark.executor.instances=30 \
    -c spark.kubernetes.driver.annotation.prometheus.io/scrape=true \
    -c spark.kubernetes.driver.annotation.prometheus.io/path=/metrics/executors/prometheus/ \
    -c spark.kubernetes.driver.annotation.prometheus.io/port=4040 \
    -c spark.kubernetes.container.image=spark:3.0.0 \
    --class org.apache.spark.examples.SparkPi \
    local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar 200000
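When many jobs need the same scrape annotations, generating the `-c` options programmatically can reduce copy-and-paste errors. The following is a minimal sketch of that idea. All function names here are hypothetical helpers, not part of Spark; the configuration keys and values are the ones listed in the configuration table of this deck.

```python
import shlex

ANNOTATION_PREFIX = "spark.kubernetes.driver.annotation.prometheus.io"


def prometheus_scrape_conf(path="/metrics/executors/prometheus/", port=4040):
    """Return the settings that let Prometheus discover the driver pod."""
    return {
        "spark.ui.prometheus.enabled": "true",
        f"{ANNOTATION_PREFIX}/scrape": "true",
        f"{ANNOTATION_PREFIX}/path": path,
        f"{ANNOTATION_PREFIX}/port": str(port),
    }


def conf_args(conf):
    """Turn a {key: value} mapping into repeated '-c key=value' options."""
    args = []
    for key, value in conf.items():
        args += ["-c", f"{key}={value}"]
    return args


def spark_submit_command(master, main_class, jar, app_args, conf):
    """Assemble the argument list for a cluster-mode spark-submit call."""
    return (
        ["spark-submit", "--master", master, "--deploy-mode", "cluster"]
        + conf_args(conf)
        + ["--class", main_class, jar]
        + list(app_args)
    )


def render(cmd):
    """Render the argument list as a shell-safe string for logging."""
    return shlex.join(cmd)
```

The resulting list can be passed to `subprocess.run`, or rendered with `render` and reviewed before submission.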

Set spark.dynamicAllocation.*
-c spark.dynamicAllocation.enabled=true
-c spark.dynamicAllocation.executorIdleTimeout=5
-c spark.dynamicAllocation.shuffleTracking.enabled=true
-c spark.dynamicAllocation.maxExecutors=50
… (the same) …
https://gist.githubusercontent.com/dongjoon-hyun/.../dynamic-pi.py 10000
`dynamic-pi.py` computes Pi, sleeps for 1 minute, and computes Pi again.

Select a single Spark app
rate(metrics_executor_totalTasks_total{...}[1m])
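To make the query above concrete: `rate(...[1m])` reports the per-second increase of a counter over the last minute. The sketch below is a simplified illustration of that arithmetic, not Prometheus's actual implementation, which also extrapolates to the window boundaries. The function name `simple_rate` and the sample values are hypothetical.

```python
def simple_rate(samples):
    """Approximate a per-second counter rate.

    samples: list of (timestamp_seconds, counter_value) pairs inside the
    window, sorted by time. A drop in value is treated as a counter reset.
    Returns None when fewer than two samples are available.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    previous = samples[0][1]
    for _, value in samples[1:]:
        if value < previous:
            # Counter reset (e.g. process restart): count from zero again.
            increase += value
        else:
            increase += value - previous
        previous = value
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None
```

For example, a `totalTasks` counter that goes from 0 to 120 over 60 seconds corresponds to a rate of 2 tasks per second under this simplified calculation.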

Inform Prometheus of both metrics endpoints
Driver service annotation
-c spark.kubernetes.driver.annotation.prometheus.io/scrape=true
-c spark.kubernetes.driver.annotation.prometheus.io/path=/metrics/prometheus/
-c spark.kubernetes.driver.annotation.prometheus.io/port=4040
-c spark.kubernetes.driver.service.annotation.prometheus.io/scrape=true
-c spark.kubernetes.driver.service.annotation.prometheus.io/path=/metrics/executors/prometheus/
-c spark.kubernetes.driver.service.annotation.prometheus.io/port=4040
…

Executor Allocation Ratio (figure): spark.dynamicAllocation.maxExecutors=30 vs. spark.dynamicAllocation.maxExecutors=300

Set spark.sql.streaming.metricsEnabled=true (default:false)
Monitoring streaming job behavior (1/2)
Metrics
• latency
• inputRate-total
• processingRate-total
• states-rowsTotal
• states-usedBytes
• eventTime-watermark
Prefix of streaming query metric names
• metrics_[namespace]_spark_streaming_[queryName]

All metrics are important for alerting
Monitoring streaming job behavior (2/2)
latency > micro-batch interval
• Spark can endure this for a while, but the job needs to be redesigned to prevent future outages
states-rowsTotal grows indefinitely
• These jobs will eventually die due to OOM
- SPARK-27340 Alias on TimeWindow expression cause watermark metadata lost (Fixed in 3.0)
- SPARK-30553 Fix structured-streaming java example error
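The two alert conditions above can be expressed as simple predicates. The following is a hedged sketch of the logic only; in practice these checks would usually be written as Prometheus alerting rules. The function names, the `min_samples` threshold, and the assumption that latency and the trigger interval are both given in milliseconds are illustrative choices, not something this deck specifies.

```python
def latency_alert(latency_ms, trigger_interval_ms):
    """True when a micro-batch takes longer than the trigger interval.

    Assumes both values are expressed in milliseconds.
    """
    return latency_ms > trigger_interval_ms


def state_growth_alert(rows_total_samples, min_samples=5):
    """True when states-rowsTotal strictly increased across every sample.

    rows_total_samples: values ordered by time. With fewer than
    min_samples values (an arbitrary threshold), no alert is raised.
    """
    if len(rows_total_samples) < min_samples:
        return False
    pairs = zip(rows_total_samples, rows_total_samples[1:])
    return all(later > earlier for earlier, later in pairs)
```

A state size that plateaus or shrinks at any point in the observed window does not trigger the second check, which keeps jobs with working watermarks from raising false alarms.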

Separation of concerns
Prometheus Federation and Alert
Diagram: each user namespace (namespace1, namespace2, …) runs its own Prometheus stack (Prometheus Server, Prometheus Web UI, Alert Manager, Pushgateway)
• Metrics for batch job monitoring
• Metrics for streaming job monitoring
A cluster-wide Prometheus (Admin) with the same stack collects a subset of metrics (spark_info + ...)

New endpoints are still experimental
Limitations and Tips
New endpoints expose only Spark metrics starting with `metrics_` or `spark_info`
• `javaagent` method can expose more metrics like `jvm_info`
PrometheusServlet does not follow the Prometheus naming convention
• Instead, it is designed to follow the Spark 2 naming convention for consistency within Spark
The number of metrics grows if we don't set the following
writeStream.queryName("spark")
spark.metrics.namespace=spark
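The reason these two settings matter is that both values are embedded in the metric name prefix `metrics_[namespace]_spark_streaming_[queryName]`. The sketch below illustrates the effect. It rests on two assumptions: that the namespace defaults to the application ID when `spark.metrics.namespace` is unset, and that an unnamed query falls back to a generated identifier. The function name, the sanitizing of non-alphanumeric characters to underscores, and the sample IDs are hypothetical.

```python
import re


def streaming_metric_prefix(namespace, query_name):
    """Build the metric name prefix for a streaming query.

    Assumption: characters outside [a-zA-Z0-9_] are replaced with '_'.
    """
    raw = f"metrics_{namespace}_spark_streaming_{query_name}"
    return re.sub(r"[^a-zA-Z0-9_]", "_", raw)


def distinct_prefixes(runs):
    """Count how many different prefixes a series of job runs produces.

    runs: iterable of (namespace, query_name) pairs, one per run.
    """
    return len({streaming_metric_prefix(ns, name) for ns, name in runs})
```

With a fixed namespace and query name, every restart of the job reuses the same prefix; with per-run identifiers, each restart adds a new set of metric names to Prometheus.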

Summary
Spark 3 provides better integration with Prometheus monitoring
• Especially in K8s environments, metric collection becomes much easier than in Spark 2
New Prometheus-style endpoints are independent and additional options
• Users can migrate to the new endpoints or use them alongside the existing methods
