SlideShare a Scribd company logo
1 of 25
Download to read offline
Rohit Karlupia, Qubole
Sparklens: Understanding
the scalability limits of Spark
applications
@qubole #Sparklens #SAISDev17
Agenda
2Share your thoughts @qubole #Sparklens #SAISDev17
PERFORMANCE
TUNING PITFALLS
THEORY BEHIND
SPARKLENS
QUBOLE SPARKLENS
TUNING EXAMPLE
Spark Tuning: Usual Suspects
3Share your thoughts @qubole #Sparklens #SAISDev17
ProfilingExperimentsResource
Utilization
Minimize Doing Nothing
4
Driver Stage 1 Stage 2 Stage 3
Core1
Core2
Core3
Core4
Time
Share your thoughts @qubole Sparklens
Driver Side Computations
5
Driver Stage 1 Stage 2 Stage 3
Core1
Core2
Core3
Core4
Time
Share your thoughts @qubole #Sparklens #SAISDev17
Driver does...
File listing & split
computation
Loading of hive tables
FOC
Collect
df.toPandas()
6Share your thoughts @qubole #Sparklens #SAISDev17
Not Enough Tasks
7
Driver Stage 1 Stage 2 Stage 3
Core1
Core2
Core3
Core4
Time
Share your thoughts @qubole #Sparklens #SAISDev17
Controlling
number of tasks
HDFS block size
Min/max split size
Default Parallelism
Shuffle Partitions
Repartitions
8Share your thoughts @qubole #Sparklens #SAISDev17
Non-Uniform Tasks: Skew
9
Driver Stage 1 Stage 2 Stage 3
Core1
Core2
Core3
Core4
Time
Share your thoughts @qubole #Sparklens #SAISDev17
Critical Path: Limit to scalability
10
Driver Stage 1 Stage 2 Stage 3
Core1
Core2
Core3
Core4
Time
Share your thoughts @qubole #Sparklens #SAISDev17
Ideal Application Time
11
Driver Stage 1 Stage 2 Stage 3
Core1
Core2
Core3
Core4
Time
Share your thoughts @qubole #Sparklens #SAISDev17
Latency Vs Compute vs Developer
12
Critical Path
Time
Wall Clock
Time
Ideal*
Application
Time
Share your thoughts @qubole #Sparklens #SAISDev17
Structure of a
Spark application
Spark application is either
executing in driver or in
parallel in executors
Child stage is not executed
until all parent stages are
complete
Stage is not complete until
all tasks of stage are
complete
13Share your thoughts @qubole Sparklens
14
Sparklens in Action
Performance Tuning 603 lines of unfamiliar Scala code
Share your thoughts @qubole #Sparklens #SAISDev17
Sparklens: First Pass
Driver WallClock 41m 40s 26%
Executor WallClock 117m 03s 74%
Total WallClock 158m 44s
Critical Path 127m 41s
Ideal Application 43m 32s
@qubole
#Sparklens
#SAISDev17
Observations & Actions
• The application had too many stages (697)
• The Critical Path Time was 3X the Ideal Application Time
• Instead of letting spark write to hive table, the code was doing
serial writes to each partition, in a loop
• We changed the code to let spark write to partitions in parallel
@qubole
#Sparklens
#SAISDev17
Sparklens: Second Pass
Driver WallClock 02m 28s 9%
Executor WallClock 24m 03s 91%
Total WallClock 26m 32s
Critical Path 25m 27s
Ideal Application 04m 48s
@qubole
#Sparklens
#SAISDev17
Count Time Utilisation
10 44m 51%
20 34m 33%
50 28m 16%
80 27m 10%
100 26m 8%
110 26m 8%
120 26m 7%
150 25m 5%
200 25m 4%
300 25m 3%
400 25m 2%
500 25m 1%
Sparklens Performance Prediction
@qubole
#Sparklens
#SAISDev17
Executor Utilisation
ECCH available 320h 50m
ECCH used 31h 00m 9%
ECCH wasted 289h 50m 91%
ECCH: Executor Core Compute Hour
@qubole
#Sparklens
#SAISDev17
Per Stage Metrics
Stage-ID WallClock Core Task PRatio -----Task------
Stage% ComputeHours Count Skew StageSkew
0 0.27 00h 00m 2 0.00 1.00 0.78
1 0.37 00h 00m 10 0.01 1.05 0.85
33 85.84 03h 18m 10 0.01 1.07 1.00
Stage-ID OIRatio |* ShuffleWrite% ReadFetch% GC% *|
0 0.00 |* 0.00 0.00 3.03 *|
1 0.00 |* 0.00 0.00 2.02 *|
33 0.00 |* 0.00 0.00 0.23 *|
CCH 3h 18m
Task Count 10
Total Cores 800
@qubole
#Sparklens
#SAISDev17
Observations & Actions
• 85% of time spent in a single stage with very low number of
tasks.
• 91% compute wasted on executor side.
• Found that repartition(10) was called somewhere in code,
resulting in only 10 tasks. Removed it.
• Also increased the spark.sql.shuffle.partitions from default 200
to 800@qubole
#Sparklens
#SAISDev17
Sparklens: Third Pass
Driver WallClock 02m 34s 26%
Executor WallClock 07m 13s 74%
Total WallClock 09m 48s
Critical Path 07m 18s
Ideal Application 07m 09s
@qubole
#Sparklens
#SAISDev17
Using Sparklens
—packages qubole:sparklens:0.2.0-s_2.11
—conf spark.extraListener=com.qubole.sparklens.QuboleJobListener
For inline processing, add following extra command line options to spark-submit
Old event log files (history server)
—packages qubole:sparklens:0.2.0-s_2.11 --class
com.qubole.sparklens.app.ReporterApp dummy-arg <eventLogFile>
source=history
Special Sparklens output files (very small file with all the relevant data)
—packages qubole:sparklens:0.2.0-s_2.11 --class
com.qubole.sparklens.app.ReporterApp dummy-arg <eventLogFile>@qubole
#Sparklens
#SAISDev17 https://github.com/qubole/sparklens
http://sparklens.qubole.net
@qubole
#Sparklens
#SAISDev17
Future Work
• Sparklens for spark pipelines
• Elasticity aware autoscaling
@qubole
#Sparklens
#SAISDev17

More Related Content

What's hot

HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
enissoz
 
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 Best Practice of Compression/Decompression Codes in Apache Spark with Sophia... Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Databricks
 

What's hot (20)

Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
 
Sharding
ShardingSharding
Sharding
 
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
Flink Forward San Francisco 2019: Moving from Lambda and Kappa Architectures ...
 
Spark streaming: Best Practices
Spark streaming: Best PracticesSpark streaming: Best Practices
Spark streaming: Best Practices
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQL
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
Running Kafka as a Native Binary Using GraalVM with Ozan Günalp
Running Kafka as a Native Binary Using GraalVM with Ozan GünalpRunning Kafka as a Native Binary Using GraalVM with Ozan Günalp
Running Kafka as a Native Binary Using GraalVM with Ozan Günalp
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache Calcite
 
Cosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle ServiceCosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle Service
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ Salesforce
 
Tuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkTuning and Debugging in Apache Spark
Tuning and Debugging in Apache Spark
 
Vectorized Query Execution in Apache Spark at Facebook
Vectorized Query Execution in Apache Spark at FacebookVectorized Query Execution in Apache Spark at Facebook
Vectorized Query Execution in Apache Spark at Facebook
 
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital KediaTuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
 
Materialized Column: An Efficient Way to Optimize Queries on Nested Columns
Materialized Column: An Efficient Way to Optimize Queries on Nested ColumnsMaterialized Column: An Efficient Way to Optimize Queries on Nested Columns
Materialized Column: An Efficient Way to Optimize Queries on Nested Columns
 
Improving Apache Spark for Dynamic Allocation and Spot Instances
Improving Apache Spark for Dynamic Allocation and Spot InstancesImproving Apache Spark for Dynamic Allocation and Spot Instances
Improving Apache Spark for Dynamic Allocation and Spot Instances
 
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 Best Practice of Compression/Decompression Codes in Apache Spark with Sophia... Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 
Presto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation EnginesPresto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation Engines
 

Similar to Sparklens: Understanding the Scalability Limits of Spark Applications with Rohit Karlupia

RMOUG 2013 - Where did my CPU go?
RMOUG 2013 - Where did my CPU go?RMOUG 2013 - Where did my CPU go?
RMOUG 2013 - Where did my CPU go?
Kristofferson A
 
Where Did My CPU Go?
Where Did My CPU Go?Where Did My CPU Go?
Where Did My CPU Go?
Enkitec
 
Rmoug13 - where did my CPU go?
Rmoug13 - where did my CPU go?Rmoug13 - where did my CPU go?
Rmoug13 - where did my CPU go?
Enkitec
 
Reliable Performance at Scale with Apache Spark on Kubernetes
Reliable Performance at Scale with Apache Spark on KubernetesReliable Performance at Scale with Apache Spark on Kubernetes
Reliable Performance at Scale with Apache Spark on Kubernetes
Databricks
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
Spark Summit
 
OOW 2013: Where did my CPU go
OOW 2013: Where did my CPU goOOW 2013: Where did my CPU go
OOW 2013: Where did my CPU go
Kristofferson A
 
Cooperative Task Execution for Apache Spark
Cooperative Task Execution for Apache SparkCooperative Task Execution for Apache Spark
Cooperative Task Execution for Apache Spark
Databricks
 

Similar to Sparklens: Understanding the Scalability Limits of Spark Applications with Rohit Karlupia (20)

Java Performance: Speedup your application with hardware counters
Java Performance: Speedup your application with hardware countersJava Performance: Speedup your application with hardware counters
Java Performance: Speedup your application with hardware counters
 
Phil Basford - machine learning at scale with aws sage maker
Phil Basford - machine learning at scale with aws sage makerPhil Basford - machine learning at scale with aws sage maker
Phil Basford - machine learning at scale with aws sage maker
 
Machine learning at scale with aws sage maker
Machine learning at scale with aws sage makerMachine learning at scale with aws sage maker
Machine learning at scale with aws sage maker
 
RMOUG 2013 - Where did my CPU go?
RMOUG 2013 - Where did my CPU go?RMOUG 2013 - Where did my CPU go?
RMOUG 2013 - Where did my CPU go?
 
Where Did My CPU Go?
Where Did My CPU Go?Where Did My CPU Go?
Where Did My CPU Go?
 
Rmoug13 - where did my CPU go?
Rmoug13 - where did my CPU go?Rmoug13 - where did my CPU go?
Rmoug13 - where did my CPU go?
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
 
Scaling sql server 2014 parallel insert
Scaling sql server 2014 parallel insertScaling sql server 2014 parallel insert
Scaling sql server 2014 parallel insert
 
Using AWR for SQL Analysis
Using AWR for SQL AnalysisUsing AWR for SQL Analysis
Using AWR for SQL Analysis
 
TestUpload
TestUploadTestUpload
TestUpload
 
Ccna labs-udemy
Ccna labs-udemyCcna labs-udemy
Ccna labs-udemy
 
Profiling & Testing with Spark
Profiling & Testing with SparkProfiling & Testing with Spark
Profiling & Testing with Spark
 
Sge
SgeSge
Sge
 
Reliable Performance at Scale with Apache Spark on Kubernetes
Reliable Performance at Scale with Apache Spark on KubernetesReliable Performance at Scale with Apache Spark on Kubernetes
Reliable Performance at Scale with Apache Spark on Kubernetes
 
Apache Beam (incubating)
Apache Beam (incubating)Apache Beam (incubating)
Apache Beam (incubating)
 
Scaling Apache Spark at Facebook
Scaling Apache Spark at FacebookScaling Apache Spark at Facebook
Scaling Apache Spark at Facebook
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
OOW 2013: Where did my CPU go
OOW 2013: Where did my CPU goOOW 2013: Where did my CPU go
OOW 2013: Where did my CPU go
 
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
 
Cooperative Task Execution for Apache Spark
Cooperative Task Execution for Apache SparkCooperative Task Execution for Apache Spark
Cooperative Task Execution for Apache Spark
 

More from Databricks

Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
yulianti213969
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
acoha1
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
acoha1
 
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
mikehavy0
 
Abortion pills in Jeddah |+966572737505 | get cytotec
Abortion pills in Jeddah |+966572737505 | get cytotecAbortion pills in Jeddah |+966572737505 | get cytotec
Abortion pills in Jeddah |+966572737505 | get cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Huawei Ransomware Protection Storage Solution Technical Overview Presentation...
Huawei Ransomware Protection Storage Solution Technical Overview Presentation...Huawei Ransomware Protection Storage Solution Technical Overview Presentation...
Huawei Ransomware Protection Storage Solution Technical Overview Presentation...
LuisMiguelPaz5
 
Displacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second DerivativesDisplacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second Derivatives
23050636
 
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
zifhagzkk
 

Recently uploaded (20)

obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
 
Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024
 
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTS
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTSDBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTS
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTS
 
社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction
 
Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting Techniques
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?
 
Seven tools of quality control.slideshare
Seven tools of quality control.slideshareSeven tools of quality control.slideshare
Seven tools of quality control.slideshare
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
 
Pentesting_AI and security challenges of AI
Pentesting_AI and security challenges of AIPentesting_AI and security challenges of AI
Pentesting_AI and security challenges of AI
 
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
Abortion Clinic in Kempton Park +27791653574 WhatsApp Abortion Clinic Service...
 
Abortion pills in Jeddah |+966572737505 | get cytotec
Abortion pills in Jeddah |+966572737505 | get cytotecAbortion pills in Jeddah |+966572737505 | get cytotec
Abortion pills in Jeddah |+966572737505 | get cytotec
 
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeCredit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
 
Huawei Ransomware Protection Storage Solution Technical Overview Presentation...
Huawei Ransomware Protection Storage Solution Technical Overview Presentation...Huawei Ransomware Protection Storage Solution Technical Overview Presentation...
Huawei Ransomware Protection Storage Solution Technical Overview Presentation...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Displacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second DerivativesDisplacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second Derivatives
 
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
 
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
 

Sparklens: Understanding the Scalability Limits of Spark Applications with Rohit Karlupia

  • 1. Rohit Karlupia, Qubole Sparklens: Understanding the scalability limits of Spark applications @qubole #Sparklens #SAISDev17
  • 2. Agenda 2Share your thoughts @qubole #Sparklens #SAISDev17 PERFORMANCE TUNING PITFALLS THEORY BEHIND SPARKLENS QUBOLE SPARKLENS TUNING EXAMPLE
  • 3. Spark Tuning: Usual Suspects 3Share your thoughts @qubole #Sparklens #SAISDev17 ProfilingExperimentsResource Utilization
  • 4. Minimize Doing Nothing 4 Driver Stage 1 Stage 2 Stage 3 Core1 Core2 Core3 Core4 Time Share your thoughts @qubole Sparklens
  • 5. Driver Side Computations 5 Driver Stage 1 Stage 2 Stage 3 Core1 Core2 Core3 Core4 Time Share your thoughts @qubole #Sparklens #SAISDev17
  • 6. Driver does... File listing & split computation Loading of hive tables FOC Collect df.toPandas() 6Share your thoughts @qubole #Sparklens #SAISDev17
  • 7. Not Enough Tasks 7 Driver Stage 1 Stage 2 Stage 3 Core1 Core2 Core3 Core4 Time Share your thoughts @qubole #Sparklens #SAISDev17
  • 8. Controlling number of tasks HDFS block size Min/max split size Default Parallelism Shuffle Partitions Repartitions 8Share your thoughts @qubole #Sparklens #SAISDev17
  • 9. Non-Uniform Tasks: Skew 9 Driver Stage 1 Stage 2 Stage 3 Core1 Core2 Core3 Core4 Time Share your thoughts @qubole #Sparklens #SAISDev17
  • 10. Critical Path: Limit to scalability 10 Driver Stage 1 Stage 2 Stage 3 Core1 Core2 Core3 Core4 Time Share your thoughts @qubole #Sparklens #SAISDev17
  • 11. Ideal Application Time 11 Driver Stage 1 Stage 2 Stage 3 Core1 Core2 Core3 Core4 Time Share your thoughts @qubole #Sparklens #SAISDev17
  • 12. Latency Vs Compute vs Developer 12 Critical Path Time Wall Clock Time Ideal* Application Time Share your thoughts @qubole #Sparklens #SAISDev17
  • 13. Structure of a Spark application Spark application is either executing in driver or in parallel in executors Child stage is not executed until all parent stages are complete Stage is not complete until all tasks of stage are complete 13Share your thoughts @qubole Sparklens
  • 14. 14 Sparklens in Action Performance Tuning 603 lines of unfamiliar Scala code Share your thoughts @qubole #Sparklens #SAISDev17
  • 15. Sparklens: First Pass Driver WallClock 41m 40s 26% Executor WallClock 117m 03s 74% Total WallClock 158m 44s Critical Path 127m 41s Ideal Application 43m 32s @qubole #Sparklens #SAISDev17
  • 16. Observations & Actions • The application had too many stages (697) • The Critical Path Time was 3X the Ideal Application Time • Instead of letting spark write to hive table, the code was doing serial writes to each partition, in a loop • We changed the code to let spark write to partitions in parallel @qubole #Sparklens #SAISDev17
  • 17. Sparklens: Second Pass Driver WallClock 02m 28s 9% Executor WallClock 24m 03s 91% Total WallClock 26m 32s Critical Path 25m 27s Ideal Application 04m 48s @qubole #Sparklens #SAISDev17
  • 18. Count Time Utilisation 10 44m 51% 20 34m 33% 50 28m 16% 80 27m 10% 100 26m 8% 110 26m 8% 120 26m 7% 150 25m 5% 200 25m 4% 300 25m 3% 400 25m 2% 500 25m 1% Sparklens Performance Prediction @qubole #Sparklens #SAISDev17
  • 19. Executor Utilisation ECCH available 320h 50m ECCH used 31h 00m 9% ECCH wasted 289h 50m 91% ECCH: Executor Core Compute Hour @qubole #Sparklens #SAISDev17
  • 20. Per Stage Metrics Stage-ID WallClock Core Task PRatio -----Task------ Stage% ComputeHours Count Skew StageSkew 0 0.27 00h 00m 2 0.00 1.00 0.78 1 0.37 00h 00m 10 0.01 1.05 0.85 33 85.84 03h 18m 10 0.01 1.07 1.00 Stage-ID OIRatio |* ShuffleWrite% ReadFetch% GC% *| 0 0.00 |* 0.00 0.00 3.03 *| 1 0.00 |* 0.00 0.00 2.02 *| 33 0.00 |* 0.00 0.00 0.23 *| CCH 3h 18m Task Count 10 Total Cores 800 @qubole #Sparklens #SAISDev17
  • 21. Observations & Actions • 85% of time spent in a single stage with very low number of tasks. • 91% compute wasted on executor side. • Found that repartition(10) was called somewhere in code, resulting in only 10 tasks. Removed it. • Also increased the spark.sql.shuffle.partitions from default 200 to 800@qubole #Sparklens #SAISDev17
  • 22. Sparklens: Third Pass Driver WallClock 02m 34s 26% Executor WallClock 07m 13s 74% Total WallClock 09m 48s Critical Path 07m 18s Ideal Application 07m 09s @qubole #Sparklens #SAISDev17
  • 23. Using Sparklens —packages qubole:sparklens:0.2.0-s_2.11 —conf spark.extraListener=com.qubole.sparklens.QuboleJobListener For inline processing, add following extra command line options to spark-submit Old event log files (history server) —packages qubole:sparklens:0.2.0-s_2.11 --class com.qubole.sparklens.app.ReporterApp dummy-arg <eventLogFile> source=history Special Sparklens output files (very small file with all the relevant data) —packages qubole:sparklens:0.2.0-s_2.11 --class com.qubole.sparklens.app.ReporterApp dummy-arg <eventLogFile>@qubole #Sparklens #SAISDev17 https://github.com/qubole/sparklens
  • 25. Future Work • Sparklens for spark pipelines • Elasticity aware autoscaling @qubole #Sparklens #SAISDev17