Delight: An Improved Apache Spark UI, Free & Cross-Platform
Friday, May 28th at 11:40am PDT
Jean-Yves Stephan & Julien Dumazert
Co-Founders of Data Mechanics
/whoami
Jean-Yves “JY” Stephan
Co-Founder & CEO @ Data Mechanics
jy@datamechanics.co
Previously:
Software Engineer and
Spark Infrastructure Lead @ Databricks
Julien Dumazert
Co-Founder & CTO @ Data Mechanics
julien@datamechanics.co
Previously:
Lead Data Scientist @ ContentSquare
Data Scientist @ BlaBlaCar
Agenda
▪ A primer on Data Mechanics
▪ The Vision Behind Delight
▪ How Delight Works
▪ Performance Tuning Session
▪ Future Roadmap
Data Mechanics: our mission is to make Spark more developer-friendly & cost-effective
https://www.datamechanics.co
Developer-friendly: Run Dockerized Spark apps from anywhere, and monitor them from our intuitive UI.
Cost-Effective: Your pipelines are continuously scaled and optimized for stability and performance.
Flexible: Benefit from the open k8s ecosystem in your account, in your VPC, without the complexity.
A serverless Spark platform in your cloud account
A managed, autoscaling, Kubernetes cluster in your AWS, GCP, or Azure account, in your VPC
Data Mechanics Gateway: Notebooks, API, GUI
Customer story: A migration from EMR to Data Mechanics
“Leveraging Data Mechanics Spark
expertise and platform decreases cost
while letting us sleep well at night and
achieve the plans we dream about”
Dale McCrory, Chief Product Officer
Read our blog post Migrating from EMR to Data Mechanics for details
https://www.datamechanics.co/blog-post/migrating-from-emr-to-spark-on-kubernetes-with-data-mechanics
AWS Costs: 100% → 35%
App Startup: 40s → 20s
App Duration: 150s → 90s
Agenda
▪ A primer on Data Mechanics
▪ The Vision Behind Delight
▪ How Delight Works
▪ Performance Tuning Session
▪ Future Roadmap
Problems with the Spark UI
● It’s hard to get a bird’s-eye view
○ Too much noise
○ Needs “tribal knowledge”
● No system metrics
○ Memory, CPU, I/O
○ Requires jumping to another
monitoring tool (not Spark-centric)
● The Spark History Server
○ Slow & Unstable
○ Requires setup & maintenance
How Delight Can Help
● Memory & CPU Metrics
○ Taken from Spark
○ Aligned on the same timeline
as your Spark phases
● Identify performance issues
○ Make problems obvious
○ Give automated tuning
recommendations
● Easy to set up
○ Agent running in the Spark driver
○ Hosted dashboard
We’re now opening up Delight to any Spark user
https://www.datamechanics.co/delight
July 2020
Blog post with design prototype published. 500 sign-ups.
November 2020
MVP released: Dashboard + Hosted Spark History Server. Particularly useful for Spark-on-Kubernetes.
February 2021
Internal release to Data Mechanics customers. Usability and stability fixes.
April 2021
Delight public release. Works on top of any Spark platform.
Agenda
▪ A primer on Data Mechanics
▪ The Vision Behind Delight
▪ How Delight Works
▪ Performance Tuning Session
▪ Future Roadmap
An open-source agent talking to a hosted backend
Your Spark application, on your Spark infrastructure (cloud or on-premise, commercial or open-source)
● Data Mechanics Agent: open-sourced SparkListener
● Encrypted event logs sent over HTTPS
Data Mechanics Backend
● Log Collector
● Storage: automated cleanup after 30 days
● Webapp: web dashboard at delight.datamechanics.co
How to get started with Delight
Example: Installation Instructions on Databricks
https://github.com/datamechanics/delight
Example: Installation Instructions on EMR
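Across platforms, the setup boils down to a few Spark conf entries that load the open-source agent and register its listener. A sketch, assuming the package coordinates and conf keys from the project's README (check github.com/datamechanics/delight for the exact values for your Spark version; the access token is a placeholder):

```python
# Spark conf entries to enable the Delight agent (sketch, values from the
# open-source repo's README; verify against the repo for your Spark version).
delight_conf = {
    # Pull the agent jar (the open-sourced SparkListener)
    "spark.jars.packages": "co.datamechanics:delight_2.12:latest-SNAPSHOT",
    "spark.jars.repositories": "https://oss.sonatype.org/content/repositories/snapshots",
    # Register the listener that streams encrypted event logs over HTTPS
    "spark.extraListeners": "co.datamechanics.delight.DelightListener",
    # Personal token from delight.datamechanics.co (placeholder)
    "spark.delight.accessToken.secret": "<your-access-token>",
}
```

The same keys work whether you pass them via `spark-submit --conf`, a Databricks cluster's Spark config box, or an EMR configuration JSON.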
The dashboard lists your completed Spark apps ...
… with high-level stats to help track your costs
● CPU Uptime (in core-hours)
○ # of CPU cores provisioned by an app, times their uptime
○ Example: 3 executors, with 2 cores
each, up for 1 hour => 6 core hours
● Spark tasks (in hours)
○ Sum of the duration of all the Spark
tasks in your application
○ “Real work” done by Spark
○ Example: 72 minutes
● Efficiency (%)
○ Spark Tasks / CPU Uptime ratio
○ % of the time your Spark
executors are busy running tasks
○ Example: 72 min / 6 core-hours = 20%
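The arithmetic behind these three metrics can be checked in a few lines, using the numbers straight from the slide:

```python
# CPU Uptime: 3 executors x 2 cores each, up for 1 hour => 6 core-hours
cpu_uptime_core_hours = 3 * 2 * 1.0

# Spark tasks: sum of all task durations, 72 minutes converted to hours
spark_task_hours = 72 / 60

# Efficiency: Spark Tasks / CPU Uptime
efficiency = spark_task_hours / cpu_uptime_core_hours
print(f"{efficiency:.0%}")  # 20%
```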
… with high-level stats to help track your costs
Good Efficiency!
Poor Efficiency!
Delight can help you identify & fix inefficiencies
● Common root causes:
○ Lack of dynamic allocation
○ Overprovisioning # of executors
○ Too small a # of partitions (in the Spark config, or in the input data partitioning scheme)
○ Task duration skew caused by data skew
○ Slow object store commits
○ Long periods of driver-only work (e.g. pure Python code)
● The Data Mechanics platform has many optimizations to help increase our
customers' efficiency
○ So we can reduce their cloud costs
○ Our pricing is based on Spark Tasks time, not on CPU Uptime, so our incentives are aligned
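As an illustration of the first fix, dynamic allocation is switched on through standard Spark conf properties. A sketch with placeholder values to tune for your workload (shuffle tracking is what lets it work on Kubernetes, which has no external shuffle service):

```python
# Sketch: standard Spark confs for dynamic allocation (Spark 3.x property
# names); min/max executor counts are illustrative placeholders.
dyn_alloc_conf = {
    "spark.dynamicAllocation.enabled": "true",
    # Required on Kubernetes, where there is no external shuffle service
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "1",
    "spark.dynamicAllocation.maxExecutors": "10",
    # Also worth tuning when the partition count is too small
    "spark.sql.shuffle.partitions": "200",
}
```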
Agenda
▪ A primer on Data Mechanics
▪ The Vision Behind Delight
▪ How Delight Works
▪ Performance Tuning Session
▪ Future Roadmap
Our future plans for Delight
https://www.datamechanics.co/delight
June 2021
Executor page: memory usage graph for each executor.
July 2021
Driver memory: collect and display driver memory usage.
August 2021
Automated recommendations: Delight surfaces issues and gives resolution tips.
September 2021
Real-time metrics, while the app is running. Useful for streaming apps.
What are your plans for Delight? Try it out & let us know!
Get started at https://delight.datamechanics.co
Thank You!
Your feedback is important to us.
Don’t forget to rate and review the sessions.
github.com/datamechanics/delight
www.datamechanics.co/