Sparklint
A Tool for Identifying and Tuning Inefficient Spark Jobs Across Your Cluster
Simon Whitear
Principal Engineer
Why Sparklint?
• A successful Spark cluster grows rapidly
• Capacity and capability mismatches arise
• Leads to resource contention
• Tuning process is non-trivial
• The current Spark UI is operationally focused
We wanted to understand application efficiency
Sparklint provides:
• Live view of batch & streaming application stats, or
• Event-by-event analysis of historical event logs
• Stats and graphs for:
– Idle time
– Core usage
– Task locality
Sparklint Listener:
Sparklint Server:
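In live mode, Sparklint attaches to a running application as an extra SparkListener and serves its stats; the server mode replays historical event logs instead. A minimal sketch of wiring up the live listener, using the listener class name given in the project README (the exact sparklint artifact to put on the classpath depends on your Spark and Scala versions, so check the repo):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: register the Sparklint listener so live stats are gathered from the
// application's event stream. The sparklint jar must be on the driver classpath
// (e.g. via --packages or --jars); see https://github.com/groupon/sparklint for
// the artifact matching your Spark and Scala versions.
val conf = new SparkConf()
  .setAppName("access-log-analysis")
  .set("spark.extraListeners", "com.groupon.sparklint.SparklintListener")

val sc = new SparkContext(conf)
```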
Demo…
• Simulated workload analyzing site access logs (see the sketch after this list):
– read text file as JSON
– convert to Record(ip, verb, status, time)
– countByIp, countByStatus, countByVerb
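A minimal sketch of that workload, assuming json4s for the per-line JSON parse and countByValue for the counts (the actual demo code is ReduceByKey.scala in the Sparklint repo and may differ):

```scala
import org.apache.spark.SparkContext
import org.json4s.{DefaultFormats, Formats}
import org.json4s.jackson.JsonMethods.parse

// Illustrative record type; field names as given on the slide.
case class Record(ip: String, verb: String, status: Int, time: Long)

def runWorkload(sc: SparkContext, path: String): Unit = {
  // Read the access log as text and JSON-parse each line into a Record.
  val records = sc.textFile(path).mapPartitions { lines =>
    implicit val formats: Formats = DefaultFormats
    lines.map(line => parse(line).extract[Record])
  }

  // Three independent jobs, each with one map stage (parsing) and one reduce stage.
  val byIp     = records.map(_.ip).countByValue()
  val byStatus = records.map(_.status).countByValue()
  val byVerb   = records.map(_.verb).countByValue()
}
```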
Job took 10m7s to finish.
Already a pretty good distribution; low idle time indicates good worker usage and minimal driver node interaction in the job.
But overall utilization is low, which is reflected in the common occurrence of the IDLE state (unused cores).
Job took 15m14s to finish.
Core usage increased and the job is more efficient; execution time increased, but the app is not CPU bound.
Job took 9m24s to finish.
Core utilization decreased proportionally, trading efficiency for execution time.
Lots of IDLE state shows we are over-allocating resources.
Job took 11m34s to finish.
Core utilization remains low; the config settings are not right for this workload.
Dynamic allocation is only effective at app start due to the long executorIdleTimeout setting.
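The dynamic allocation behaviour described here is controlled by a handful of settings; a hedged sketch (the executor counts are illustrative, and classic dynamic allocation also needs the external shuffle service):

```scala
import org.apache.spark.SparkConf

// Sketch of the dynamic allocation knobs exercised in these runs.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")        // classic dynamic allocation needs the external shuffle service
  .set("spark.dynamicAllocation.minExecutors", "1")
  .set("spark.dynamicAllocation.maxExecutors", "32")   // illustrative cap
  // Executors are released only after being idle this long (default 60s), so with
  // tasks far shorter than this the allocation barely changes after start-up.
  .set("spark.dynamicAllocation.executorIdleTimeout", "60s")
```

The next run keeps everything else the same but drops executorIdleTimeout to 10s, so executors are reclaimed between the short tasks.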
Job took 33m5s to finish.
Core utilization is up, but execution time is up dramatically due to reclaiming resources before each short-running task.
IDLE state is reduced to a minimum and it looks efficient, but execution is much slower due to dynamic allocation overhead.
Job took 7m34s to finish.
Core utilization is way up, with lower execution time.
Parallel execution is clearly visible in overlapping stages.
Flat tops show we are becoming CPU bound.
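The gain here comes from submitting the three independent count jobs concurrently and letting the FAIR scheduler share cores between them. A minimal sketch of that pattern, building on the workload sketch above (the pool names and the use of plain Scala Futures are assumptions, not the original demo code):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Assumes spark.scheduler.mode=FAIR was set on the SparkConf and that `Record`
// and the parsed `records` RDD come from the workload sketch earlier.
def runCountsInParallel(sc: SparkContext, records: RDD[Record]): Unit = {
  // Run a blocking Spark job on its own thread, assigned to its own fair pool.
  def inPool[T](pool: String)(job: => T): Future[T] = Future {
    sc.setLocalProperty("spark.scheduler.pool", pool) // thread-local property
    job
  }

  val byIp     = inPool("by-ip")     { records.map(_.ip).countByValue() }
  val byStatus = inPool("by-status") { records.map(_.status).countByValue() }
  val byVerb   = inPool("by-verb")   { records.map(_.verb).countByValue() }

  Await.result(byIp, Duration.Inf)
  Await.result(byStatus, Duration.Inf)
  Await.result(byVerb, Duration.Inf)
}
```

Under the default FIFO mode the first submitted job gets priority on the available cores; FAIR mode lets concurrently submitted jobs share them.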
Job took 5m6s to finish.
Core utilization decreases, trading efficiency for execution time again here.
Thanks to dynamic allocation, utilization is high despite this being a bi-modal application.
Data loading and mapping require a large core count to get throughput.
Aggregation and IO of results are optimized for end file size and therefore require fewer cores.
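This bi-modal shape (a wide parse/map phase followed by a narrow aggregate-and-write tail) is roughly the pattern sketched below; the input path, partition counts and per-line parse are illustrative:

```scala
import org.apache.spark.SparkContext

// Sketch of the bi-modal shape: a wide load/map phase, then a narrow aggregate/write tail.
def bimodalJob(sc: SparkContext, input: String, output: String): Unit = {
  // Wide phase: read and map with high parallelism, so a large core count pays off.
  val hits = sc.textFile(input, minPartitions = 512)   // illustrative partition count
    .map(line => (line.split(" ")(0), 1L))              // key by first field, e.g. the client IP

  // Narrow phase: aggregate, then coalesce to a partition count chosen for output
  // file size rather than CPU; dynamic allocation releases the now-idle executors.
  hits.reduceByKey(_ + _)
    .coalesce(16)
    .saveAsTextFile(output)
}
```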
Future Features:
• History Server event sources
• Inline recommendations
• Auto-tuning
• Streaming stage parameter delegation
• Replay capable listener
Credits:
• Lead developer is Robert Xue
• https://github.com/roboxue
• SDE @ Groupon
Contribute!
Sparklint is OSS:
https://github.com/groupon/sparklint
Q+A
Editor's Notes
1. Spark cluster success: the platform rolls out with a maximum supported load. Early projects ramp up and usage is fine. Early successes feed back into recommendations to use the platform. New users start loading up the platform just as the initial successes are being scaled. Platform limits are hit, and scaling requirements only now begin to be understood and planned for. Rough times follow whilst the platform operators learn to lead the application usage.
◦ The Spark UI provides masses of info, but by default only for recent jobs/stages/tasks and only while the job is alive.
◦ When the Spark UI is served from the History Server there is still little summary information for debugging the job config: have I used the right magic numbers (locality wait, cores, numPartitions, job scheduling mode, etc.)?
◦ It is difficult to compare different executions of the same job because of this missing level of summary (execution time is almost the only metric to compare).
2.
◦ A mechanism to listen to the Spark event log stream and accumulate lifetime stats without losing (too many) details, using constant memory in live mode thanks to the gauges we use.
◦ The mechanism also provides convenient replay when serving from a file.
◦ A set of stats and graphs that describe job performance uniformly: 1. idle time (the duration when all calculation is done on the driver node, something to avoid); 2. max core usage and core usage percentage (should be neither too high nor too low; we are thinking about supplementing it with the average number of tasks in wait); 3. task execution time for a given stage by locality (which honestly describes the opportunity cost of a lower locality and indicates the idle locality wait config).
3. We use ReduceByKey.scala in the repo as a sample to demo a series of attempts at optimizing a Spark application. The logs are included as well, and the highlights of each run are annotated in the screenshots in the attachment. The application basically reads a text file, JSON-parses each line into "Record(ip: String, verb: String, status: Int, time: Long)", then does countByIp, countByStatus and countByVerb on them, repeated 10 times. These are three independent map-reduce jobs, each with one map stage (parsing) and one reduce stage (countByXXX). Algorithm-level optimization is out of scope here. The app needs a constant number of CPU seconds and a floating but bounded amount of network I/O time (decided by job locality) to finish the execution.
4. We use 16 cores as the baseline. The job takes 10 minutes to finish. The annotations in the picture describe what we are running here and how to read the Sparklint graph. After reading the chart, we decided to decrease the core count to see whether the execution time doubles, to figure out whether we are bound by CPU.
5. Using 8 cores, the job took 15 minutes to finish, shorter than the 20-minute expectation, proving that we are not bound by CPU. This sawtooth pattern by itself indicates we are not CPU bound and can be used as a classic example; an example of a CPU-bound application can be found in the last demo slide. This leads to another angle of optimization: job scheduling tweaks.
6. Using 32 cores, the job took 9 minutes to finish, proving again that throwing more cores at the job doesn't provide commensurate performance gains. The graph is a classic example of over-allocating resources. We can assume we need no more than 24 cores to do the work effectively, so now we can look into other ways of tuning the job: dynamic allocation and increased parallelism.
7. We try to optimize the resource requirement by using dynamic allocation, initially with the default executorIdleTimeout of 1 minute. This also led us to try 1 core per executor. Since we don't usually have any task longer than 1 minute, this showed that dynamic allocation is not the key to optimizing this kind of app with short tasks.
8. We reduced executorIdleTimeout to 10s. In this way we decreased the resource footprint and increased utilization. However, this is a false saving for this job, because throughput is reduced by the low core supply and the overhead of acquiring executors. This example proved again that dynamic allocation doesn't solve the optimization challenge when we have short tasks. So, let's try parallelism inside the job using FAIR scheduling.
9. Using 16 cores and the FAIR scheduler, this simple tweak cut the execution time from 10 minutes to 7.5 minutes, and our job now becomes CPU bound (see annotation). Running the three count jobs in parallel under FAIR scheduling increases efficiency and reduces runtime, allowing us to become CPU bound.
10. Using 32 cores and the FAIR scheduler, the execution time becomes 5 minutes (compared with 9 minutes in picture 3 using the same resources). We reduce efficiency in order to gain execution time; this is a decision for the team. If there is a hard SLA to hit, it may be worth running with lower utilization. We can now call the job scheduling optimization done.
11. This demos the correct scenario for dynamic allocation, and shows that throwing more CPU at the job helps when it is CPU bound (the flat tops in the usage graph are the clear proof). In this case the partition count is chosen to optimize file size on HDFS, so the team is comfortable with the runtime.
12. Sparklint can easily distinguish CPU-bound from job-scheduling-bound applications (we are working on automating this judgment by using the average number of pending tasks). It is really easy to spot when a job is limited not by CPU but by job scheduling (which leads to low core usage) and driver-node operations (which lead to idle time). In theory your app will be 2x faster if you throw 2x the cores at it, but this is not always true. The point of Spark-level optimization is to make your job CPU bound, at which point you can decide freely between the dollars gained from a faster application and the dollars spent on providing more cores. If your job is CPU bound, simply add cores. If your job has a lot of idle time, try to decrease it by reducing unwanted/unintended driver-node operations (these can be simple things like doing a map on a large array instead of an RDD that someone forgot about). If your job is job-scheduling bound, you can both reduce waste by using dynamic allocation (which in turn gives you high throughput when needed) and submit independent jobs in parallel using Futures and the FAIR scheduler: http://spark.apache.org/docs/latest/configuration.html#scheduling