Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Sparklens: Understanding the Scalability Limits of Spark Applications with Rohit Karlupia

267 views

Published on

One of the common requests we receive from customers (at Qubole) is debugging slow spark application. Usually this process is done with trial and error, which takes time and requires running clusters beyond normal usage (read wasted resources). Moreover, it doesn’t tell us where to looks for further improvements. We at Qubole are looking into making this process more self-serve. Towards this goal we have built a tool (OSS https://github.com/qubole/sparklens) based on spark event listener framework.

From a single run of the application, Sparklens provides insights about scalability limits of given spark application. In this talk we will cover what Sparklens does and theory behind Sparklens. We will talk about how structure of spark application puts important constraints on its scalability. How can we find these structural constraints and how to use these constraints as a guide in solving performance and scalability problems of spark applications.

This talk will help audience in answering the following questions about their spark applications: 1) Will their application run faster with more executors? 2) How will cluster utilization change as number of executors change? 3) What is the absolute minimum time this application will take even if we give it infinite executors? 4) What is the expected wall clock time for the application when we fix the most important structural limits of these application? Sparklens makes the ROI of additional executor extremely obvious for a given application and needs just a single run of the application to determine how application with behave with different executor counts. Specifically, it will help managers take the correct side of the tradeoff between spending developer time optimising applications vs spending money on compute bills.

Published in: Data & Analytics
  • If you want to download or read this book, copy link or url below in the New tab ......................................................................................................................... DOWNLOAD FULL PDF EBOOK here { http://bit.ly/2m6jJ5M } .........................................................................................................................
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • If you want to download or read this book, Copy link or url below in the New tab ......................................................................................................................... DOWNLOAD FULL PDF EBOOK here { http://bit.ly/2m6jJ5M } ......................................................................................................................... Download EPUB Ebook here { http://bit.ly/2m6jJ5M } ......................................................................................................................... Download Doc Ebook here { http://bit.ly/2m6jJ5M } ......................................................................................................................... .........................................................................................................................
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • If you want to download or read this book, copy link or url below in the New tab ......................................................................................................................... DOWNLOAD FULL PDF EBOOK here { http://bit.ly/2m6jJ5M } .........................................................................................................................
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • If you want to download or read this book, Copy link or url below in the New tab ......................................................................................................................... DOWNLOAD FULL PDF EBOOK here { http://bit.ly/2m6jJ5M } ......................................................................................................................... Download EPUB Ebook here { http://bit.ly/2m6jJ5M } ......................................................................................................................... Download Doc Ebook here { http://bit.ly/2m6jJ5M } ......................................................................................................................... .........................................................................................................................
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Be the first to like this

Sparklens: Understanding the Scalability Limits of Spark Applications with Rohit Karlupia

  1. 1. Rohit Karlupia, Qubole Sparklens: Understanding the scalability limits of Spark applications @qubole #Sparklens #SAISDev17
  2. 2. Agenda 2Share your thoughts @qubole #Sparklens #SAISDev17 PERFORMANCE TUNING PITFALLS THEORY BEHIND SPARKLENS QUBOLE SPARKLENS TUNING EXAMPLE
  3. 3. Spark Tuning: Usual Suspects 3Share your thoughts @qubole #Sparklens #SAISDev17 ProfilingExperimentsResource Utilization
  4. 4. Minimize Doing Nothing 4 Driver Stage 1 Stage 2 Stage 3 Core1 Core2 Core3 Core4 Time Share your thoughts @qubole Sparklens
  5. 5. Driver Side Computations 5 Driver Stage 1 Stage 2 Stage 3 Core1 Core2 Core3 Core4 Time Share your thoughts @qubole #Sparklens #SAISDev17
  6. 6. Driver does... File listing & split computation Loading of hive tables FOC Collect df.toPandas() 6Share your thoughts @qubole #Sparklens #SAISDev17
  7. 7. Not Enough Tasks 7 Driver Stage 1 Stage 2 Stage 3 Core1 Core2 Core3 Core4 Time Share your thoughts @qubole #Sparklens #SAISDev17
  8. 8. Controlling number of tasks HDFS block size Min/max split size Default Parallelism Shuffle Partitions Repartitions 8Share your thoughts @qubole #Sparklens #SAISDev17
  9. 9. Non-Uniform Tasks: Skew 9 Driver Stage 1 Stage 2 Stage 3 Core1 Core2 Core3 Core4 Time Share your thoughts @qubole #Sparklens #SAISDev17
  10. 10. Critical Path: Limit to scalability 10 Driver Stage 1 Stage 2 Stage 3 Core1 Core2 Core3 Core4 Time Share your thoughts @qubole #Sparklens #SAISDev17
  11. 11. Ideal Application Time 11 Driver Stage 1 Stage 2 Stage 3 Core1 Core2 Core3 Core4 Time Share your thoughts @qubole #Sparklens #SAISDev17
  12. 12. Latency Vs Compute vs Developer 12 Critical Path Time Wall Clock Time Ideal* Application Time Share your thoughts @qubole #Sparklens #SAISDev17
  13. 13. Structure of a Spark application Spark application is either executing in driver or in parallel in executors Child stage is not executed until all parent stages are complete Stage is not complete until all tasks of stage are complete 13Share your thoughts @qubole Sparklens
  14. 14. 14 Sparklens in Action Performance Tuning 603 lines of unfamiliar Scala code Share your thoughts @qubole #Sparklens #SAISDev17
  15. 15. Sparklens: First Pass Driver WallClock 41m 40s 26% Executor WallClock 117m 03s 74% Total WallClock 158m 44s Critical Path 127m 41s Ideal Application 43m 32s @qubole #Sparklens #SAISDev17
  16. 16. Observations & Actions • The application had too many stages (697) • The Critical Path Time was 3X the Ideal Application Time • Instead of letting spark write to hive table, the code was doing serial writes to each partition, in a loop • We changed the code to let spark write to partitions in parallel @qubole #Sparklens #SAISDev17
  17. 17. Sparklens: Second Pass Driver WallClock 02m 28s 9% Executor WallClock 24m 03s 91% Total WallClock 26m 32s Critical Path 25m 27s Ideal Application 04m 48s @qubole #Sparklens #SAISDev17
  18. 18. Count Time Utilisation 10 44m 51% 20 34m 33% 50 28m 16% 80 27m 10% 100 26m 8% 110 26m 8% 120 26m 7% 150 25m 5% 200 25m 4% 300 25m 3% 400 25m 2% 500 25m 1% Sparklens Performance Prediction @qubole #Sparklens #SAISDev17
  19. 19. Executor Utilisation ECCH available 320h 50m ECCH used 31h 00m 9% ECCH wasted 289h 50m 91% ECCH: Executor Core Compute Hour @qubole #Sparklens #SAISDev17
  20. 20. Per Stage Metrics Stage-ID WallClock Core Task PRatio -----Task------ Stage% ComputeHours Count Skew StageSkew 0 0.27 00h 00m 2 0.00 1.00 0.78 1 0.37 00h 00m 10 0.01 1.05 0.85 33 85.84 03h 18m 10 0.01 1.07 1.00 Stage-ID OIRatio |* ShuffleWrite% ReadFetch% GC% *| 0 0.00 |* 0.00 0.00 3.03 *| 1 0.00 |* 0.00 0.00 2.02 *| 33 0.00 |* 0.00 0.00 0.23 *| CCH 3h 18m Task Count 10 Total Cores 800 @qubole #Sparklens #SAISDev17
  21. 21. Observations & Actions • 85% of time spent in a single stage with very low number of tasks. • 91% compute wasted on executor side. • Found that repartition(10) was called somewhere in code, resulting in only 10 tasks. Removed it. • Also increased the spark.sql.shuffle.partitions from default 200 to 800@qubole #Sparklens #SAISDev17
  22. 22. Sparklens: Third Pass Driver WallClock 02m 34s 26% Executor WallClock 07m 13s 74% Total WallClock 09m 48s Critical Path 07m 18s Ideal Application 07m 09s @qubole #Sparklens #SAISDev17
  23. 23. Using Sparklens —packages qubole:sparklens:0.2.0-s_2.11 —conf spark.extraListener=com.qubole.sparklens.QuboleJobListener For inline processing, add following extra command line options to spark-submit Old event log files (history server) —packages qubole:sparklens:0.2.0-s_2.11 --class com.qubole.sparklens.app.ReporterApp dummy-arg <eventLogFile> source=history Special Sparklens output files (very small file with all the relevant data) —packages qubole:sparklens:0.2.0-s_2.11 --class com.qubole.sparklens.app.ReporterApp dummy-arg <eventLogFile>@qubole #Sparklens #SAISDev17 https://github.com/qubole/sparklens
  24. 24. http://sparklens.qubole.net @qubole #Sparklens #SAISDev17
  25. 25. Future Work • Sparklens for spark pipelines • Elasticity aware autoscaling @qubole #Sparklens #SAISDev17

×