Delight: An Improved Apache Spark UI, Free & Cross-Platform
Friday, May 28th at 11:40am PDT
Jean-Yves Stephan & Julien Dumazert
Co-Founders of Data Mechanics
/whoami
Jean-Yves “JY” Stephan
Co-Founder & CEO @ Data Mechanics
jy@datamechanics.co
Previously:
Software Engineer and
Spark Infrastructure Lead @ Databricks
Julien Dumazert
Co-Founder & CTO @ Data Mechanics
julien@datamechanics.co
Previously:
Lead Data Scientist @ ContentSquare
Data Scientist @ BlaBlaCar
Agenda
▪ A primer on Data Mechanics
▪ The Vision Behind Delight
▪ How Delight Works
▪ Performance Tuning Session
▪ Future Roadmap
Data Mechanics: our mission is to make Spark more developer-friendly & cost-effective
https://www.datamechanics.co
Developer-friendly: Run Dockerized Spark apps from anywhere, and monitor them from our intuitive UI.
Cost-Effective: Your pipelines are continuously scaled and optimized for stability and performance.
Flexible: Benefit from the open k8s ecosystem in your account, in your VPC, without the complexity.
A serverless Spark platform in your cloud account
A managed, autoscaling, Kubernetes cluster in your AWS, GCP, or Azure account, in your VPC
Data Mechanics Gateway: Notebooks, API, GUI
Customer story: A migration from EMR to Data Mechanics
“Leveraging Data Mechanics Spark
expertise and platform decreases cost
while letting us sleep well at night and
achieve the plans we dream about”
Dale McCrory, Chief Product Officer
Read our blog post Migrating from EMR to Data Mechanics for details
https://www.datamechanics.co/blog-post/migrating-from-emr-to-spark-on-kubernetes-with-data-mechanics
AWS Costs: 100% → 35%
App Startup: 40s → 20s
App Duration: 150s → 90s
Agenda
▪ A primer on Data Mechanics
▪ The Vision Behind Delight
▪ How Delight Works
▪ Performance Tuning Session
▪ Future Roadmap
Problems with the Spark UI
● It’s hard to get a bird’s-eye view
○ Too much noise
○ Needs “tribal knowledge”
● No system metrics
○ Memory, CPU, I/O
○ Requires jumping to another
monitoring tool (not Spark-centric)
● The Spark History Server
○ Slow & Unstable
○ Requires setup & maintenance
How Delight Can Help
● Memory & CPU Metrics
○ Taken from Spark
○ Aligned on the same timeline
as your Spark phases
● Identify performance issues
○ Make problems obvious
○ Give automated tuning
recommendations
● Easy to set up
○ Agent running in the Spark driver
○ Hosted dashboard
We’re now opening up Delight to any Spark user
https://www.datamechanics.co/delight
July 2020
Blog post with design prototype published. 500 sign-ups.
November 2020
MVP released: Dashboard + Hosted Spark History Server. Particularly useful for Spark-on-Kubernetes.
February 2021
Internal release to Data Mechanics customers. Usability and stability fixes.
April 2021
Delight public release. Works on top of any Spark platform.
Agenda
▪ A primer on Data Mechanics
▪ The Vision Behind Delight
▪ How Delight Works
▪ Performance Tuning Session
▪ Future Roadmap
An open-source agent talking to a hosted backend
Your Spark application, on your Spark infrastructure (cloud or on-premise, commercial or open-source)
● Data Mechanics Agent: open-sourced SparkListener
● Encrypted event logs sent over HTTPS
Data Mechanics Backend
● Log Collector
● Storage: automated cleanup after 30 days
● Webapp: web dashboard at delight.datamechanics.co
How to get started with Delight
Example: Installation Instructions on Databricks
https://github.com/datamechanics/delight
Example: Installation Instructions on EMR
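Across platforms, the setup boils down to a few Spark conf entries that load the open-source agent and register its listener. A sketch, assuming the package coordinates and conf keys from the project's README (check github.com/datamechanics/delight for the exact values for your Spark version; the access token is a placeholder):

```python
# Spark conf entries to enable the Delight agent (sketch, values from the
# open-source repo's README; verify against the repo for your Spark version).
delight_conf = {
    # Pull the agent jar (the open-sourced SparkListener)
    "spark.jars.packages": "co.datamechanics:delight_2.12:latest-SNAPSHOT",
    "spark.jars.repositories": "https://oss.sonatype.org/content/repositories/snapshots",
    # Register the listener that streams encrypted event logs over HTTPS
    "spark.extraListeners": "co.datamechanics.delight.DelightListener",
    # Personal token from delight.datamechanics.co (placeholder)
    "spark.delight.accessToken.secret": "<your-access-token>",
}
```

The same keys work whether you pass them via `spark-submit --conf`, a Databricks cluster's Spark config box, or an EMR configuration JSON.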
The dashboard lists your completed Spark apps ...
… with high-level stats to help track your costs
● CPU Uptime (in core-hours)
○ # of CPU cores provisioned by an app, times their uptime
○ Example: 3 executors, with 2 cores
each, up for 1 hour => 6 core hours
● Spark tasks (in hours)
○ Sum of the duration of all the Spark
tasks in your application
○ “Real work” done by Spark
○ Example: 72 minutes
● Efficiency (%)
○ Spark Tasks / CPU Uptime ratio
○ % of the time your Spark
executors are busy running tasks
○ Example: 72 min / 6 core-hours = 20%
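The arithmetic behind these three metrics can be checked in a few lines, using the numbers straight from the slide:

```python
# CPU Uptime: 3 executors x 2 cores each, up for 1 hour => 6 core-hours
cpu_uptime_core_hours = 3 * 2 * 1.0

# Spark tasks: sum of all task durations, 72 minutes converted to hours
spark_task_hours = 72 / 60

# Efficiency: Spark Tasks / CPU Uptime
efficiency = spark_task_hours / cpu_uptime_core_hours
print(f"{efficiency:.0%}")  # 20%
```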
… with high-level stats to help track your costs
Good Efficiency!
Poor Efficiency!
Delight can help you identify & fix inefficiencies
● Common root causes:
○ Lack of dynamic allocation
○ Overprovisioning # of executors
○ Too small a # of partitions (in the Spark config, or in the input data partitioning scheme)
○ Task duration skew caused by data skew
○ Slow object store commits
○ Long periods of driver-only work (e.g. pure Python code)
● The Data Mechanics platform has many optimizations to help increase our
customers' efficiency
○ So we can reduce their cloud costs
○ Our pricing is based on Spark Tasks time, not on CPU Uptime, so our incentives are aligned
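As an illustration of the first fix, dynamic allocation is switched on through standard Spark conf properties. A sketch with placeholder values to tune for your workload (shuffle tracking is what lets it work on Kubernetes, which has no external shuffle service):

```python
# Sketch: standard Spark confs for dynamic allocation (Spark 3.x property
# names); min/max executor counts are illustrative placeholders.
dyn_alloc_conf = {
    "spark.dynamicAllocation.enabled": "true",
    # Required on Kubernetes, where there is no external shuffle service
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "1",
    "spark.dynamicAllocation.maxExecutors": "10",
    # Also worth tuning when the partition count is too small
    "spark.sql.shuffle.partitions": "200",
}
```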
Agenda
▪ A primer on Data Mechanics
▪ The Vision Behind Delight
▪ How Delight Works
▪ Performance Tuning Session
▪ Future Roadmap
Our future plans for Delight
https://www.datamechanics.co/delight
June 2021
Executor page: memory usage graph for each executor.
July 2021
Driver memory: collect and display driver memory usage.
August 2021
Automated recommendations: Delight surfaces issues and gives resolution tips.
September 2021
Real-time metrics, while the app is running. Useful for streaming apps.
What are your plans for Delight? Try it out & let us know!
Get started at https://delight.datamechanics.co
Thank You!
Your feedback is important to us.
Don’t forget to rate and review the sessions.
github.com/datamechanics/delight
www.datamechanics.co/