SlideShare a Scribd company logo
Native Support of Prometheus Monitoring
in Apache Spark 3
Dongjoon Hyun
DB Tsai
SPARK+AI SUMMIT 2020
Who am I
Dongjoon Hyun
Apache Spark PMC and Committer
Apache ORC PMC and Committer
Apache REEF PMC and Committer
https://github.com/dongjoon-hyun
https://www.linkedin.com/in/dongjoon
@dongjoonhyun
Who am I
DB Tsai
Apache Spark PMC and Committer
Apache SystemML PMC and Committer
Apache Yunikorn Committer
Apache Bahir Committer
https://github.com/dbtsai
https://www.linkedin.com/in/dbtsai
@dbtsai
Three popular methods
Monitoring Apache Spark
Web UI (Live and History Server)
• Jobs, Stages,Tasks, SQL queries
• Executors, Storage
Logs
• Event logs and Spark process logs
• Listeners (SparkListener, StreamingQueryListener, SparkStatusTracker, …)
Metrics
• Various numeric values
Early warning instead of post-mortem process
Metrics are useful to handle gray failures
Monitoring and alerting Spark jobs’gray failures
• Memory Leak or misconfiguration
• Performance degradation
• Growing streaming job’s inter-states
An open-source systems monitoring and alerting toolkit
Prometheus
Provides
• a multi-dimensional data model
• operational simplicity
• scalable data collection
• a powerful query language
A good option for Apache Spark Metrics
Prometheus Server
Prometheus Web UI
Alert Manager
Pushgateway
https://en.wikipedia.org/wiki/Prometheus_(software)
Using JmxSink and JMXExporter combination
Spark 2 with Prometheus (1/3)
Enable Spark’s built-in JmxSink in Spark’s conf/metrics.properties
Deploy Prometheus' JMXExporter library and its config file
Expose JMXExporter port, 9404, to Prometheus
Add `-javaagent` option to the target (master/worker/executor/driver/…)
-javaagent:./jmx_prometheus_javaagent-0.12.0.jar=9404:config.yaml
Using GraphiteSink and GraphiteExporter combination
Spark 2 with Prometheus (2/3)
Set up Graphite server
Enable Spark’s built-in Graphite Sink with several configurations
Enable Prometheus’GraphiteExporter at Graphite
Custom sink (or 3rd party Sink) + Pushgateway server
Spark 2 with Prometheus (3/3)
Set up Pushgateway server
Develop a custom sink (or use 3rd party libs) with Prometheus dependency
Deploy the sink libraries and its configuration file to the cluster
Pros and Cons
Pros
• Used already in production
• A general approach
Cons
• Difficult to setup at new environments
• Some custom libraries may have a dependency on Spark versions
Easy usage
Goal in Apache Spark 3
Be independent from the existing Metrics pipeline
• Use new endpoints and disable it by default
• Avoid introducing new dependency
Reuse the existing resources
• Use official documented ports of Master/Worker/Driver
• Take advantage of Prometheus Service Discovery in K8s as much as possible
What's new in Spark 3 Metrics
SPARK-29674 / SPARK-29557
DropWizard Metrics 4 for JDK11
Timeline
2.3 3.02.41.6 2.1 2.22.0
4.1.13.1.53.1.2DropWizard Metrics
Spark
20202019201820172016Year
DropWizard Metrics 4.x (Spark 3)
SPARK-29674 / SPARK-29557
DropWizard Metrics 4 for JDK11
Timeline
DropWizard Metrics 3.x (Spark 1/2)
metrics_master_workers_Value 0.0 metrics_master_workers_Value{type="gauges",} 0.0
metrics_master_workers_Number{type=“gauges",} 0.0
2.3 3.02.41.6 2.1 2.22.0
4.1.13.1.53.1.2DropWizard Metrics
Spark
20202019201820172016Year
A new metric source
ExecutorMetricsSource
Collect executor memory metrics to driver and expose it as ExecutorMetricsSource and
REST API (SPARK-23429, SPARK-27189, SPARK-27324, SPARK-24958)
• JVMHeapMemory / JVMOffHeapMemory
• OnHeapExecutionMemory / OffHeapExecutionMemory
• OnHeapStorageMemory / OffHeapStorageMemory
• OnHeapUnifiedMemory / OffHeapUnifiedMemory
• DirectPoolMemory / MappedPoolMemory
• MinorGCCount / MinorGCTime
• MajorGCCount / MajorGCTime
• ProcessTreeJVMVMemory
• ProcessTreeJVMRSSMemory
• ProcessTreePythonVMemory
• ProcessTreePythonRSSMemory
• ProcessTreeOtherVMemory
• ProcessTreeOtherRSSMemory
JVM Process Tree
Prometheus-format endpoints
Support Prometheus more natively (1/2)
PrometheusServlet: A friend of MetricSevlet
• A new metric sink supporting Prometheus-format (SPARK-29032)
• Unified way of configurations via conf/metrics.properties
• No additional system requirements (services / libraries / ports)
Prometheus-format endpoints
Support Prometheus more natively (1/2)
PrometheusServlet: A friend of MetricSevlet
• A new metric sink supporting Prometheus-format (SPARK-29032)
• Unified way of configurations via conf/metrics.properties
• No additional system requirements (services / libraries / ports)
PrometheusResource: A single endpoint for all executor memory metrics
• A new metric endpoint to export all executor metrics at driver (SPARK-29064/SPARK-29400)
• The most efficient way to discover and collect because driver has all information already
• Enabled by `spark.ui.prometheus.enabled` (default:false)
spark_info and service discovery
Support Prometheus more natively (2/2)
Add spark_info metric (SPARK-31743)
• A standard Prometheus way to expose
version and revision
• Monitoring Spark jobs per version
Support driver service annotation in K8S (SPARK-31696)
• Used by Prometheus service discovery
Under the hood
SPARK-29032AddPrometheusServlettomonitorMaster/Worker/Driver
PrometheusServlet
Make Master/Worker/Driver expose the metrics in Prometheus format at the existing port
Follow the output style of "Spark JMXSink + Prometheus JMXExporter + javaagent" way
Port Prometheus Endpoint (New in 3.0) JSON Endpoint (Since initial release)
Driver 4040 /metrics/prometheus/ /metrics/json/
Worker 8081 /metrics/prometheus/ /metrics/json/
Master 8080 /metrics/master/prometheus/ /metrics/master/json/
Master 8080 /metrics/applications/prometheus/ /metrics/applications/json/
Spark Driver Endpoint Example
Use conf/metrics.properties like the other sinks
PrometheusServlet Configuration
Copy conf/metrics.properties.template to conf/metrics.properties
Uncomment like the following in conf/metrics.properties
*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
*.sink.prometheusServlet.path=/metrics/prometheus
master.sink.prometheusServlet.path=/metrics/master/prometheus
applications.sink.prometheusServlet.path=/metrics/applications/prometheus
SPARK-29064AddPrometheusResourcetoexportexecutormetrics
PrometheusResource
New endpoint with the similar information of JSON endpoint
Driver exposes all executor memory metrics in Prometheus format
Port Prometheus Endpoint (New in 3.0) JSON Endpoint (Since 1.4)
Driver 4040 /metrics/executors/prometheus/ /api/v1/applications/{id}/executors/
Use spark.ui.prometheus.enabled
PrometheusResource Configuration
Run spark-shell with configuration
Run `curl` with the new endpoint
$ bin/spark-shell 
-c spark.ui.prometheus.enabled=true 
-c spark.executor.processTreeMetrics.enabled=true
$ curl http://localhost:4040/metrics/executors/prometheus/ | grep executor | head -n1
metrics_executor_rddBlocks{application_id="...", application_name="...", executor_id="..."} 0
Monitoring in K8s cluster
Key Monitoring Scenarios on K8s clusters
Monitoring batch job memory behavior
Monitoring dynamic allocation behavior
Monitoring streaming job behavior
Key Monitoring Scenarios on K8s clusters
Monitoring batch job memory behavior
Monitoring dynamic allocation behavior
Monitoring streaming job behavior
=> A risk to be killed?
=> Unexpected slowness?
=> Latency?
Use Prometheus Service Discovery
Monitoring batch job memory behavior (1/2)
Configuration Value
spark.ui.prometheus.enabled true
spark.kubernetes.driver.annotation.prometheus.io/scrape true
spark.kubernetes.driver.annotation.prometheus.io/path /metrics/executors/prometheus/
spark.kubernetes.driver.annotation.prometheus.io/port 4040
Monitoring batch job memory behavior (2/2)
spark-submit --master k8s://$K8S_MASTER --deploy-mode cluster 
-c spark.driver.memory=2g 
-c spark.executor.instances=30 
-c spark.ui.prometheus.enabled=true 
-c spark.kubernetes.driver.annotation.prometheus.io/scrape=true 
-c spark.kubernetes.driver.annotation.prometheus.io/path=/metrics/executors/prometheus/ 
-c spark.kubernetes.driver.annotation.prometheus.io/port=4040 
-c spark.kubernetes.container.image=spark:3.0.0 
--class org.apache.spark.examples.SparkPi 
local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar 200000
OOMKilled at Driver
Set spark.dynamicAllocation.*
Monitoring dynamic allocation behavior
spark-submit --master k8s://$K8S_MASTER --deploy-mode cluster 
-c spark.dynamicAllocation.enabled=true 
-c spark.dynamicAllocation.executorIdleTimeout=5 
-c spark.dynamicAllocation.shuffleTracking.enabled=true 
-c spark.dynamicAllocation.maxExecutors=50 
-c spark.ui.prometheus.enabled=true 
… (the same) …
https://gist.githubusercontent.com/dongjoon-hyun/.../dynamic-pi.py 10000
Set spark.dynamicAllocation.*
Monitoring dynamic allocation behavior
spark-submit --master k8s://$K8S_MASTER --deploy-mode cluster 
-c spark.dynamicAllocation.enabled=true 
-c spark.dynamicAllocation.executorIdleTimeout=5 
-c spark.dynamicAllocation.shuffleTracking.enabled=true 
-c spark.dynamicAllocation.maxExecutors=50 
-c spark.ui.prometheus.enabled=true 
… (the same) …
https://gist.githubusercontent.com/dongjoon-hyun/.../dynamic-pi.py 10000
`dynamic-pi.py` computes Pi, sleeps 1 minutes, and computes Pi again.
Select a single Spark app
rate(metrics_executor_totalTasks_total{...}[1m])
Inform Prometheus both metrics endpoints
Driver service annotation
spark-submit --master k8s://$K8S_MASTER --deploy-mode cluster 
-c spark.ui.prometheus.enabled=true 
-c spark.kubernetes.driver.annotation.prometheus.io/scrape=true 
-c spark.kubernetes.driver.annotation.prometheus.io/path=/metrics/prometheus/ 
-c spark.kubernetes.driver.annotation.prometheus.io/port=4040 
-c spark.kubernetes.driver.service.annotation.prometheus.io/scrape=true 
-c spark.kubernetes.driver.service.annotation.prometheus.io/path=/metrics/executors/prometheus/ 
-c spark.kubernetes.driver.service.annotation.prometheus.io/port=4040 
…
spark.dynamicAllocation.maxExecutors=30
spark.dynamicAllocation.maxExecutors=300
Executor Allocation Ratio
Set spark.sql.streaming.metricsEnabled=true (default:false)
Monitoring streaming job behavior (1/2)
Metrics
• latency
• inputRate-total
• processingRate-total
• states-rowsTotal
• states-usedBytes
• eventTime-watermark
Prefix of streaming query metric names
• metrics_[namespace]_spark_streaming_[queryName]
•
All metrics are important for alert
Monitoring streaming job behavior (2/2)
latency > micro-batch interval
• Spark can endure some situations, but the job needs to be re-design to prevent future
outage
states-rowsTotal grows indefinitely
• These jobs will die eventually due to OOM
- SPARK-27340 Alias on TimeWindow expression cause watermark metadata lost (Fixed at 3.0)
- SPARK-30553 Fix structured-streaming java example error
Separation of concerns
Prometheus Federation and Alert
Prometheus
Server
Prometheus Web UI
Alert Manager
Pushgateway
namespace1 (User)
… Prometheus
Server
Prometheus Web UI
Alert Manager
Pushgateway
namespace2 (User)
Prometheus
Server
Prometheus Web UI
Alert Manager
Pushgateway
Cluster-wise prometheus (Admin)
Metrics for batch job monitoring Metrics for streaming job monitoring
a subset of metrics
(spark_info + ...)
New endpoints are still experimental
Limitations and Tips
New endpoints expose only Spark metrics starting with `metrics_` or `spark_info`
• `javaagent` method can expose more metrics like `jvm_info`
PrometheusSevlet does not follow Prometheus naming convention
• Instead, it's designed to follow Spark 2 naming convention for consistency in Spark
The number of metrics grows if we don't set the followings
writeStream.queryName("spark")
spark.metrics.namespace=spark
Summary
Spark 3 provides a better integration with Prometheus monitoring
• Especially, in K8s environment, the metric collections become much easier than Spark 2
New Prometheus style endpoints are independent and additional options
• Users can migrate into new endpoints or use them with the existing methods in a mixed
way
Thank you!

More Related Content

What's hot

Memory Management in Apache Spark
Memory Management in Apache SparkMemory Management in Apache Spark
Memory Management in Apache Spark
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
Databricks
 
Monitor Apache Spark 3 on Kubernetes using Metrics and Plugins
Monitor Apache Spark 3 on Kubernetes using Metrics and PluginsMonitor Apache Spark 3 on Kubernetes using Metrics and Plugins
Monitor Apache Spark 3 on Kubernetes using Metrics and Plugins
Databricks
 
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
Databricks
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQL
Databricks
 
Extending Spark With Java Agent (handout)
Extending Spark With Java Agent (handout)Extending Spark With Java Agent (handout)
Extending Spark With Java Agent (handout)Jaroslav Bachorik
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark Metrics
Databricks
 
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
Spark Summit
 
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Databricks
 
Running Kafka as a Native Binary Using GraalVM with Ozan Günalp
Running Kafka as a Native Binary Using GraalVM with Ozan GünalpRunning Kafka as a Native Binary Using GraalVM with Ozan Günalp
Running Kafka as a Native Binary Using GraalVM with Ozan Günalp
HostedbyConfluent
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
Flink Forward
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
Flink Forward
 
Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache Spark
Databricks
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilities
Nishith Agarwal
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
Flink Forward
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Databricks
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
Flink Forward
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Databricks
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
Flink Forward
 

What's hot (20)

Memory Management in Apache Spark
Memory Management in Apache SparkMemory Management in Apache Spark
Memory Management in Apache Spark
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
 
Monitor Apache Spark 3 on Kubernetes using Metrics and Plugins
Monitor Apache Spark 3 on Kubernetes using Metrics and PluginsMonitor Apache Spark 3 on Kubernetes using Metrics and Plugins
Monitor Apache Spark 3 on Kubernetes using Metrics and Plugins
 
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
The Top Five Mistakes Made When Writing Streaming Applications with Mark Grov...
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQL
 
Extending Spark With Java Agent (handout)
Extending Spark With Java Agent (handout)Extending Spark With Java Agent (handout)
Extending Spark With Java Agent (handout)
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark Metrics
 
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
Spark SQL: Another 16x Faster After Tungsten: Spark Summit East talk by Brad ...
 
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
 
Running Kafka as a Native Binary Using GraalVM with Ozan Günalp
Running Kafka as a Native Binary Using GraalVM with Ozan GünalpRunning Kafka as a Native Binary Using GraalVM with Ozan Günalp
Running Kafka as a Native Binary Using GraalVM with Ozan Günalp
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
 
Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache Spark
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilities
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 

Similar to Native Support of Prometheus Monitoring in Apache Spark 3.0

Native support of Prometheus monitoring in Apache Spark 3
Native support of Prometheus monitoring in Apache Spark 3Native support of Prometheus monitoring in Apache Spark 3
Native support of Prometheus monitoring in Apache Spark 3
Dongjoon Hyun
 
03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving
03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving
03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving
Databricks
 
Getting Started with Apache Spark on Kubernetes
Getting Started with Apache Spark on KubernetesGetting Started with Apache Spark on Kubernetes
Getting Started with Apache Spark on Kubernetes
Databricks
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Databricks
 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Joe Stein
 
Service Mesh - Observability
Service Mesh - ObservabilityService Mesh - Observability
Service Mesh - Observability
Araf Karsh Hamid
 
Monitoring in Big Data Platform - Albert Lewandowski, GetInData
Monitoring in Big Data Platform - Albert Lewandowski, GetInDataMonitoring in Big Data Platform - Albert Lewandowski, GetInData
Monitoring in Big Data Platform - Albert Lewandowski, GetInData
GetInData
 
What is New with Apache Spark Performance Monitoring in Spark 3.0
What is New with Apache Spark Performance Monitoring in Spark 3.0What is New with Apache Spark Performance Monitoring in Spark 3.0
What is New with Apache Spark Performance Monitoring in Spark 3.0
Databricks
 
Apache spark 2.4 and beyond
Apache spark 2.4 and beyondApache spark 2.4 and beyond
Apache spark 2.4 and beyond
Xiao Li
 
Databricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupDatabricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User Group
Paco Nathan
 
Monitoring the Dynamic Resource Usage of Scala and Python Spark Jobs in Yarn:...
Monitoring the Dynamic Resource Usage of Scala and Python Spark Jobs in Yarn:...Monitoring the Dynamic Resource Usage of Scala and Python Spark Jobs in Yarn:...
Monitoring the Dynamic Resource Usage of Scala and Python Spark Jobs in Yarn:...
Spark Summit
 
Regain Control Thanks To Prometheus
Regain Control Thanks To PrometheusRegain Control Thanks To Prometheus
Regain Control Thanks To Prometheus
Etienne Coutaud
 
The power of linux advanced tracer [POUG18]
The power of linux advanced tracer [POUG18]The power of linux advanced tracer [POUG18]
The power of linux advanced tracer [POUG18]
Mahmoud Hatem
 
Typesafe spark- Zalando meetup
Typesafe spark- Zalando meetupTypesafe spark- Zalando meetup
Typesafe spark- Zalando meetup
Stavros Kontopoulos
 
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Provectus
 
Web Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC ProjectWeb Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC ProjectSaltlux Inc.
 
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
DataStax Academy
 
Running Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using KubernetesRunning Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using Kubernetes
Databricks
 
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
GetInData
 
Streaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogStreaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit Log
Joe Stein
 

Similar to Native Support of Prometheus Monitoring in Apache Spark 3.0 (20)

Native support of Prometheus monitoring in Apache Spark 3
Native support of Prometheus monitoring in Apache Spark 3Native support of Prometheus monitoring in Apache Spark 3
Native support of Prometheus monitoring in Apache Spark 3
 
03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving
03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving
03 2014 Apache Spark Serving: Unifying Batch, Streaming, and RESTful Serving
 
Getting Started with Apache Spark on Kubernetes
Getting Started with Apache Spark on KubernetesGetting Started with Apache Spark on Kubernetes
Getting Started with Apache Spark on Kubernetes
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
 
Service Mesh - Observability
Service Mesh - ObservabilityService Mesh - Observability
Service Mesh - Observability
 
Monitoring in Big Data Platform - Albert Lewandowski, GetInData
Monitoring in Big Data Platform - Albert Lewandowski, GetInDataMonitoring in Big Data Platform - Albert Lewandowski, GetInData
Monitoring in Big Data Platform - Albert Lewandowski, GetInData
 
What is New with Apache Spark Performance Monitoring in Spark 3.0
What is New with Apache Spark Performance Monitoring in Spark 3.0What is New with Apache Spark Performance Monitoring in Spark 3.0
What is New with Apache Spark Performance Monitoring in Spark 3.0
 
Apache spark 2.4 and beyond
Apache spark 2.4 and beyondApache spark 2.4 and beyond
Apache spark 2.4 and beyond
 
Databricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupDatabricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User Group
 
Monitoring the Dynamic Resource Usage of Scala and Python Spark Jobs in Yarn:...
Monitoring the Dynamic Resource Usage of Scala and Python Spark Jobs in Yarn:...Monitoring the Dynamic Resource Usage of Scala and Python Spark Jobs in Yarn:...
Monitoring the Dynamic Resource Usage of Scala and Python Spark Jobs in Yarn:...
 
Regain Control Thanks To Prometheus
Regain Control Thanks To PrometheusRegain Control Thanks To Prometheus
Regain Control Thanks To Prometheus
 
The power of linux advanced tracer [POUG18]
The power of linux advanced tracer [POUG18]The power of linux advanced tracer [POUG18]
The power of linux advanced tracer [POUG18]
 
Typesafe spark- Zalando meetup
Typesafe spark- Zalando meetupTypesafe spark- Zalando meetup
Typesafe spark- Zalando meetup
 
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
 
Web Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC ProjectWeb Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC Project
 
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
 
Running Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using KubernetesRunning Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using Kubernetes
 
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
 
Streaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogStreaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit Log
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
Databricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
 

Recently uploaded

一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 

Recently uploaded (20)

一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 

Native Support of Prometheus Monitoring in Apache Spark 3.0

  • 1. Native Support of Prometheus Monitoring in Apache Spark 3 Dongjoon Hyun DB Tsai SPARK+AI SUMMIT 2020
  • 2. Who am I Dongjoon Hyun Apache Spark PMC and Committer Apache ORC PMC and Committer Apache REEF PMC and Committer https://github.com/dongjoon-hyun https://www.linkedin.com/in/dongjoon @dongjoonhyun
  • 3. Who am I DB Tsai Apache Spark PMC and Committer Apache SystemML PMC and Committer Apache Yunikorn Committer Apache Bahir Committer https://github.com/dbtsai https://www.linkedin.com/in/dbtsai @dbtsai
  • 4. Three popular methods Monitoring Apache Spark Web UI (Live and History Server) • Jobs, Stages,Tasks, SQL queries • Executors, Storage Logs • Event logs and Spark process logs • Listeners (SparkListener, StreamingQueryListener, SparkStatusTracker, …) Metrics • Various numeric values
  • 5. Early warning instead of post-mortem process Metrics are useful to handle gray failures Monitoring and alerting Spark jobs’gray failures • Memory Leak or misconfiguration • Performance degradation • Growing streaming job’s inter-states
  • 6. An open-source systems monitoring and alerting toolkit Prometheus Provides • a multi-dimensional data model • operational simplicity • scalable data collection • a powerful query language A good option for Apache Spark Metrics Prometheus Server Prometheus Web UI Alert Manager Pushgateway https://en.wikipedia.org/wiki/Prometheus_(software)
  • 7. Using JmxSink and JMXExporter combination Spark 2 with Prometheus (1/3) Enable Spark’s built-in JmxSink in Spark’s conf/metrics.properties Deploy Prometheus' JMXExporter library and its config file Expose JMXExporter port, 9404, to Prometheus Add `-javaagent` option to the target (master/worker/executor/driver/…) -javaagent:./jmx_prometheus_javaagent-0.12.0.jar=9404:config.yaml
  • 8. Using GraphiteSink and GraphiteExporter combination Spark 2 with Prometheus (2/3) Set up Graphite server Enable Spark’s built-in Graphite Sink with several configurations Enable Prometheus’GraphiteExporter at Graphite
  • 9. Custom sink (or 3rd party Sink) + Pushgateway server Spark 2 with Prometheus (3/3) Set up Pushgateway server Develop a custom sink (or use 3rd party libs) with Prometheus dependency Deploy the sink libraries and its configuration file to the cluster
  • 10. Pros and Cons Pros • Used already in production • A general approach Cons • Difficult to setup at new environments • Some custom libraries may have a dependency on Spark versions
  • 11. Easy usage Goal in Apache Spark 3 Be independent from the existing Metrics pipeline • Use new endpoints and disable it by default • Avoid introducing new dependency Reuse the existing resources • Use official documented ports of Master/Worker/Driver • Take advantage of Prometheus Service Discovery in K8s as much as possible
  • 12. What's new in Spark 3 Metrics
  • 13. SPARK-29674 / SPARK-29557 DropWizard Metrics 4 for JDK11 Timeline 2.3 3.02.41.6 2.1 2.22.0 4.1.13.1.53.1.2DropWizard Metrics Spark 20202019201820172016Year
  • 14. DropWizard Metrics 4.x (Spark 3) SPARK-29674 / SPARK-29557 DropWizard Metrics 4 for JDK11 Timeline DropWizard Metrics 3.x (Spark 1/2) metrics_master_workers_Value 0.0 metrics_master_workers_Value{type="gauges",} 0.0 metrics_master_workers_Number{type=“gauges",} 0.0 2.3 3.02.41.6 2.1 2.22.0 4.1.13.1.53.1.2DropWizard Metrics Spark 20202019201820172016Year
  • 15. A new metric source ExecutorMetricsSource Collect executor memory metrics to driver and expose it as ExecutorMetricsSource and REST API (SPARK-23429, SPARK-27189, SPARK-27324, SPARK-24958) • JVMHeapMemory / JVMOffHeapMemory • OnHeapExecutionMemory / OffHeapExecutionMemory • OnHeapStorageMemory / OffHeapStorageMemory • OnHeapUnifiedMemory / OffHeapUnifiedMemory • DirectPoolMemory / MappedPoolMemory • MinorGCCount / MinorGCTime • MajorGCCount / MajorGCTime • ProcessTreeJVMVMemory • ProcessTreeJVMRSSMemory • ProcessTreePythonVMemory • ProcessTreePythonRSSMemory • ProcessTreeOtherVMemory • ProcessTreeOtherRSSMemory JVM Process Tree
  • 16. Prometheus-format endpoints Support Prometheus more natively (1/2) PrometheusServlet: A friend of MetricSevlet • A new metric sink supporting Prometheus-format (SPARK-29032) • Unified way of configurations via conf/metrics.properties • No additional system requirements (services / libraries / ports)
  • 17. Prometheus-format endpoints Support Prometheus more natively (1/2) PrometheusServlet: A friend of MetricSevlet • A new metric sink supporting Prometheus-format (SPARK-29032) • Unified way of configurations via conf/metrics.properties • No additional system requirements (services / libraries / ports) PrometheusResource: A single endpoint for all executor memory metrics • A new metric endpoint to export all executor metrics at driver (SPARK-29064/SPARK-29400) • The most efficient way to discover and collect because driver has all information already • Enabled by `spark.ui.prometheus.enabled` (default:false)
  • 18. spark_info and service discovery Support Prometheus more natively (2/2) Add spark_info metric (SPARK-31743) • A standard Prometheus way to expose version and revision • Monitoring Spark jobs per version Support driver service annotation in K8S (SPARK-31696) • Used by Prometheus service discovery
  • 20. SPARK-29032AddPrometheusServlettomonitorMaster/Worker/Driver PrometheusServlet Make Master/Worker/Driver expose the metrics in Prometheus format at the existing port Follow the output style of "Spark JMXSink + Prometheus JMXExporter + javaagent" way Port Prometheus Endpoint (New in 3.0) JSON Endpoint (Since initial release) Driver 4040 /metrics/prometheus/ /metrics/json/ Worker 8081 /metrics/prometheus/ /metrics/json/ Master 8080 /metrics/master/prometheus/ /metrics/master/json/ Master 8080 /metrics/applications/prometheus/ /metrics/applications/json/
  • 22. Use conf/metrics.properties like the other sinks PrometheusServlet Configuration Copy conf/metrics.properties.template to conf/metrics.properties Uncomment like the following in conf/metrics.properties *.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet *.sink.prometheusServlet.path=/metrics/prometheus master.sink.prometheusServlet.path=/metrics/master/prometheus applications.sink.prometheusServlet.path=/metrics/applications/prometheus
  • 23. SPARK-29064AddPrometheusResourcetoexportexecutormetrics PrometheusResource New endpoint with the similar information of JSON endpoint Driver exposes all executor memory metrics in Prometheus format Port Prometheus Endpoint (New in 3.0) JSON Endpoint (Since 1.4) Driver 4040 /metrics/executors/prometheus/ /api/v1/applications/{id}/executors/
  • 24. Use spark.ui.prometheus.enabled PrometheusResource Configuration Run spark-shell with configuration Run `curl` with the new endpoint $ bin/spark-shell -c spark.ui.prometheus.enabled=true -c spark.executor.processTreeMetrics.enabled=true $ curl http://localhost:4040/metrics/executors/prometheus/ | grep executor | head -n1 metrics_executor_rddBlocks{application_id="...", application_name="...", executor_id="..."} 0
  • 25. Monitoring in K8s cluster
  • 26. Key Monitoring Scenarios on K8s clusters Monitoring batch job memory behavior Monitoring dynamic allocation behavior Monitoring streaming job behavior
  • 27. Key Monitoring Scenarios on K8s clusters Monitoring batch job memory behavior Monitoring dynamic allocation behavior Monitoring streaming job behavior => A risk to be killed? => Unexpected slowness? => Latency?
  • 28. Use Prometheus Service Discovery Monitoring batch job memory behavior (1/2) Configuration Value spark.ui.prometheus.enabled true spark.kubernetes.driver.annotation.prometheus.io/scrape true spark.kubernetes.driver.annotation.prometheus.io/path /metrics/executors/prometheus/ spark.kubernetes.driver.annotation.prometheus.io/port 4040
  • 29. Monitoring batch job memory behavior (2/2) spark-submit --master k8s://$K8S_MASTER --deploy-mode cluster -c spark.driver.memory=2g -c spark.executor.instances=30 -c spark.ui.prometheus.enabled=true -c spark.kubernetes.driver.annotation.prometheus.io/scrape=true -c spark.kubernetes.driver.annotation.prometheus.io/path=/metrics/executors/prometheus/ -c spark.kubernetes.driver.annotation.prometheus.io/port=4040 -c spark.kubernetes.container.image=spark:3.0.0 --class org.apache.spark.examples.SparkPi local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar 200000
  • 31. Set spark.dynamicAllocation.* Monitoring dynamic allocation behavior spark-submit --master k8s://$K8S_MASTER --deploy-mode cluster -c spark.dynamicAllocation.enabled=true -c spark.dynamicAllocation.executorIdleTimeout=5 -c spark.dynamicAllocation.shuffleTracking.enabled=true -c spark.dynamicAllocation.maxExecutors=50 -c spark.ui.prometheus.enabled=true … (the same) … https://gist.githubusercontent.com/dongjoon-hyun/.../dynamic-pi.py 10000
  • 32. Set spark.dynamicAllocation.* Monitoring dynamic allocation behavior spark-submit --master k8s://$K8S_MASTER --deploy-mode cluster -c spark.dynamicAllocation.enabled=true -c spark.dynamicAllocation.executorIdleTimeout=5 -c spark.dynamicAllocation.shuffleTracking.enabled=true -c spark.dynamicAllocation.maxExecutors=50 -c spark.ui.prometheus.enabled=true … (the same) … https://gist.githubusercontent.com/dongjoon-hyun/.../dynamic-pi.py 10000 `dynamic-pi.py` computes Pi, sleeps 1 minutes, and computes Pi again.
  • 33. Select a single Spark app rate(metrics_executor_totalTasks_total{...}[1m])
  • 34. Inform Prometheus both metrics endpoints Driver service annotation spark-submit --master k8s://$K8S_MASTER --deploy-mode cluster -c spark.ui.prometheus.enabled=true -c spark.kubernetes.driver.annotation.prometheus.io/scrape=true -c spark.kubernetes.driver.annotation.prometheus.io/path=/metrics/prometheus/ -c spark.kubernetes.driver.annotation.prometheus.io/port=4040 -c spark.kubernetes.driver.service.annotation.prometheus.io/scrape=true -c spark.kubernetes.driver.service.annotation.prometheus.io/path=/metrics/executors/prometheus/ -c spark.kubernetes.driver.service.annotation.prometheus.io/port=4040 …
  • 36. Set spark.sql.streaming.metricsEnabled=true (default:false) Monitoring streaming job behavior (1/2) Metrics • latency • inputRate-total • processingRate-total • states-rowsTotal • states-usedBytes • eventTime-watermark Prefix of streaming query metric names • metrics_[namespace]_spark_streaming_[queryName] •
  • 37. All metrics are important for alert Monitoring streaming job behavior (2/2) latency > micro-batch interval • Spark can endure some situations, but the job needs to be re-design to prevent future outage states-rowsTotal grows indefinitely • These jobs will die eventually due to OOM - SPARK-27340 Alias on TimeWindow expression cause watermark metadata lost (Fixed at 3.0) - SPARK-30553 Fix structured-streaming java example error
  • 38. Separation of concerns Prometheus Federation and Alert Prometheus Server Prometheus Web UI Alert Manager Pushgateway namespace1 (User) … Prometheus Server Prometheus Web UI Alert Manager Pushgateway namespace2 (User) Prometheus Server Prometheus Web UI Alert Manager Pushgateway Cluster-wise prometheus (Admin) Metrics for batch job monitoring Metrics for streaming job monitoring a subset of metrics (spark_info + ...)
  • 39. New endpoints are still experimental Limitations and Tips New endpoints expose only Spark metrics starting with `metrics_` or `spark_info` • `javaagent` method can expose more metrics like `jvm_info` PrometheusSevlet does not follow Prometheus naming convention • Instead, it's designed to follow Spark 2 naming convention for consistency in Spark The number of metrics grows if we don't set the followings writeStream.queryName("spark") spark.metrics.namespace=spark
  • 40. Summary Spark 3 provides a better integration with Prometheus monitoring • Especially, in K8s environment, the metric collections become much easier than Spark 2 New Prometheus style endpoints are independent and additional options • Users can migrate into new endpoints or use them with the existing methods in a mixed way