JVM and OS Tuning for accelerating Spark application
Tatsuhiro Chiba
Researcher at IBM
Feb. 9, 2016
This presentation was used in my talk at Hadoop Spark Conference Japan 2016.
1.
© 2015 IBM Corporation
Optimizing Spark Applications through JVM- and OS-Level Tuning
Feb. 8, 2016
Tatsuhiro Chiba (chiba@jp.ibm.com)
IBM Research - Tokyo
2.
© 2016 IBM Corporation
Hadoop / Spark Conference Japan 2016
Performance Innovation Laboratory, IBM Research - Tokyo

Who am I?
Tatsuhiro Chiba (千葉 立寛)
Staff Researcher at IBM Research - Tokyo
Research interests
– Parallel distributed systems and middleware
– Parallel distributed programming languages
– High performance computing
Twitter: @tatsuhiro
Today's content also appears in:
– Appendix D of "Sparkによる実践データ解析" (O'Reilly Japan)
– "Workload Characterization and Optimization of TPC-H Queries on Apache Spark", IBM Research Report
3.
Summary – after applying JVM and OS tuning

Machine Spec: CPU: POWER8 3.3GHz (2 sockets x 12 cores), Memory: 1TB, Disk: 1TB
OS: Ubuntu 14.10 (Kernel: 3.16.0-31-generic)
Optimized JVM Option: -Xmx24g -Xms24g -Xmn12g -Xgcthreads12 -Xtrace:none -Xnoloa -XlockReservation -Xgcthreads6 -Xnocompactgc -Xdisableexplicitgc -XX:-RuntimeInstrumentation -Xlp
Executor JVMs: 4
OS Settings: NUMA aware affinity=enabled, large page=enabled
Spark Version: 1.4.1
JVM Version: java version "1.8.0" (IBM J9 VM, build pxl6480sr2-20151023_01(SR2))

[Figure: execution time (sec.) of Q1, Q3, Q5, Q9, and Kmeans, original vs. optimized, with speedup (%)]
4.
Benchmark 1 – Kmeans

Kmeans application
– Varied the number of clusters K for the same dataset
– The first Kmeans job takes much longer because it loads the data into memory
Synthetic data generator program
– Used BigDataBench, published at http://prof.ict.ac.cn/
– Generated a 6GB dataset that includes over 65M data points

Kmeans:
    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    // input data is cached
    val data = sc.textFile("file:///tmp/kmeans-data", 2)
    val parsedData = data.map(s => Vectors.dense(
      s.split(' ').map(_.toDouble))).persist()

    // run Kmeans with varying # of clusters
    // (best cost so far, k); a var because it is reassigned in the loop
    var bestK = (100.0, 1)
    for (k <- 2 to 11) {
      val clusters = new KMeans()
        .setK(k).setMaxIterations(5)
        .setRuns(1).setInitializationMode("random")
        .setEpsilon(1e-30).run(parsedData)
      // evaluate
      val error = clusters.computeCost(parsedData)
      if (bestK._1 > error) {
        bestK = (error, k)
      }
    }
5.
Benchmark 2 – TPC-H

TPC-H benchmark on Spark SQL
– TPC-H is often used for SQL-on-Hadoop systems
– Spark SQL can run HiveQL directly through hiveserver2 (Thrift server) and beeline (JDBC client)
– We modified the TPC-H queries published at https://github.com/rxin/TPC-H-Hive
Table data generator
– Used the DBGEN program and generated a 100GB dataset (scale factor = 100)
– Loaded the data into Hive tables in Parquet format with Snappy compression

TPC-H Q1 (Hive):
    select
      l_returnflag,
      l_linestatus,
      sum(l_quantity) as sum_qty,
      sum(l_extendedprice) as sum_base_price,
      sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
      sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
      avg(l_quantity) as avg_qty,
      avg(l_extendedprice) as avg_price,
      avg(l_discount) as avg_disc,
      count(*) as count_order
    from lineitem
    where l_shipdate <= '1998-09-01'
    group by l_returnflag, l_linestatus
    order by l_returnflag, l_linestatus;
6.
Machine & Software Spec and Spark Settings

Processor | # Cores | SMT | Memory | OS
POWER8 3.30 GHz * 2 | 24 cores (2 sockets * 12 cores) | 8 (total 192 hardware threads) | 1TB | Ubuntu 14.10 (kernel 3.16.0-31)
Xeon E5-2699 v3 2.30 GHz | 36 cores (2 sockets x 18 cores) | 2 (total 72 hardware threads) | 755GB | Ubuntu 15.04 (kernel 3.19.0-26)

Software | Version
Spark | 1.4.1, 1.5.2, 1.6.0
Hadoop (HDFS) | 2.6.0
Java | 1.8.0 (IBM J9 VM SR2)
Scala | 2.10.4

Default Spark Settings
– # of Executor JVMs: 1
– # of worker threads: 48
– Total heap size: 192GB (nursery = 48g, tenure = 144g)
7.
JVM Tuning – Heap Space Sizing

Garbage Collection tuning points
– GC algorithms
– GC threads
– Heap sizing
Heap sizing is the simplest way to reduce GC overhead
– A bigger young space helps to achieve over 30% improvement
But a small old space may cause many global GCs
– Cached RDDs stay in the Java heap

[Figure: execution time (sec.) for Kmeans and TPC-H Q9 with -Xmn48g, -Xmn96g, and -Xmn144g]

Young Space (-Xmn) | Execution Time (sec) | GC ratio (%) | Minor GC avg. pause time | # Minor GC | # Major GC
48g (default) | 400 s | 20 % | 2.1 s | 39 | 1
96g | 306 s | 18 % | 3.4 s | 22 | 1
144g | 300 s | 14 % | 3.6 s | 14 | 0
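
The figures in the table above can be turned into relative gains with a little arithmetic. The sketch below is only an illustration: the object and method names are made up for this example, and it assumes that "GC ratio" means the share of total execution time spent in GC, which the slide does not state explicitly.

```scala
object HeapSizingGain {
  // One row of the heap sizing table: -Xmn setting, execution time, GC ratio
  final case class Run(flag: String, secs: Double, gcRatioPct: Double)

  // Values copied from the table above
  val runs: Seq[Run] = Seq(
    Run("-Xmn48g", 400.0, 20.0),
    Run("-Xmn96g", 306.0, 18.0),
    Run("-Xmn144g", 300.0, 14.0)
  )

  // Relative reduction in execution time versus the baseline, in percent
  def reductionPct(baselineSecs: Double, tunedSecs: Double): Double =
    (baselineSecs - tunedSecs) / baselineSecs * 100.0

  // Estimated seconds spent in GC (assumes GC ratio = GC time / total time)
  def gcSeconds(totalSecs: Double, gcRatioPct: Double): Double =
    totalSecs * gcRatioPct / 100.0

  def main(args: Array[String]): Unit = {
    val base = runs.head
    for (r <- runs) {
      val gain = reductionPct(base.secs, r.secs)
      val gc = gcSeconds(r.secs, r.gcRatioPct)
      println(f"${r.flag}%-9s time reduction: $gain%.1f%%  estimated GC time: $gc%.1f s")
    }
  }
}
```

For the rows in this table the reduction works out to roughly 23.5% for -Xmn96g and 25% for -Xmn144g relative to the 48g default.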
8.
JVM Tuning – Other Options

JVM options tuning points
– Monitor threads tuning
– GC tuning
– Java thread tuning
– JIT tuning, etc.
Result
– Proper JVM options improve application performance by over 20%

[Figure: execution time (sec.) of Q1 and Q5 for option 0 to option 4, with speedup (%) for Q1 and Q5]

# | JVM Options
Option 0 (baseline) | -Xmn96g -Xdump:heap:none -Xdump:system:none -XX:+RuntimeInstrumentation -agentpath:/path/to/libjvmti_oprofile.so -verbose:gc -Xverbosegclog:/tmp/gc.log -Xjit:verbose={compileStart,compileEnd},vlog=/tmp/jit.log
Option 1 (Monitor) | Option 0 + "-Xtrace:none"
Option 2 (GC) | Option 1 + "-Xgcthreads48 -Xnoloa -Xnocompactgc -Xdisableexplicitgc"
Option 3 (Thread) | Option 2 + "-XlockReservation"
Option 4 (JIT) | Option 3 + "-XX:-RuntimeInstrumentation"
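
Because each option set in the table builds on the previous one, the sets can be generated cumulatively. The sketch below is a hypothetical helper, not something from the talk: it models only the flags added in Options 1 to 4 (the baseline profiling flags of Option 0 are left out), and it assumes the flags are handed to executors through Spark's `spark.executor.extraJavaOptions` setting. The flags themselves are specific to the IBM J9 VM.

```scala
object JvmOptionSets {
  // Flags added at each step, as listed in the table above
  val monitor: Seq[String] = Seq("-Xtrace:none")
  val gc: Seq[String] =
    Seq("-Xgcthreads48", "-Xnoloa", "-Xnocompactgc", "-Xdisableexplicitgc")
  val thread: Seq[String] = Seq("-XlockReservation")
  val jit: Seq[String] = Seq("-XX:-RuntimeInstrumentation")

  // Options 1 to 4 are cumulative: each one contains every earlier step
  val cumulative: Seq[Seq[String]] =
    Seq(monitor, gc, thread, jit).scanLeft(Seq.empty[String])(_ ++ _).tail

  // Render one option set as a spark-submit --conf argument
  // (assumption: flags are passed via spark.executor.extraJavaOptions)
  def asSparkConf(flags: Seq[String]): String =
    "--conf \"spark.executor.extraJavaOptions=" + flags.mkString(" ") + "\""

  def main(args: Array[String]): Unit =
    cumulative.zipWithIndex.foreach { case (flags, i) =>
      println(s"Option ${i + 1}: ${asSparkConf(flags)}")
    }
}
```

Heap size is deliberately not part of these strings; Spark expects the executor heap to be set through its own memory settings rather than through extra Java options.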
9.
JVM Tuning – JVM Counts

Experiment
– Kept the total number of worker threads and the total heap size constant
– Changed the number of Executor JVMs
– 1 JVM: 48 worker threads & 192GB heap
– 2 JVMs: 24 worker threads & 96GB heap per JVM
– 4 JVMs: 12 worker threads & 48GB heap per JVM
Result
– Using a single big Executor JVM is not always best
– Dividing it into smaller JVMs
  • helps to reduce GC overhead
  • helps to reduce resource contention
Kmeans case
– The performance gap comes from the first Kmeans job, especially from data loading
– After the RDD is loaded into memory, computation performance is similar

[Figure: execution time (sec.) of Q1, Q3, Q5, Q9, and Kmeans with 1, 2, and 4 JVMs, with improvement (%) for 2 JVMs and 4 JVMs]
[Figure: execution time (sec.) of each Kmeans clustering job iteration (K = 2, 3, .. 11) with 1, 2, and 4 JVMs]
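
The experiment keeps the total budget fixed and only changes how it is divided. A minimal sketch of that division follows; `ExecutorSplit`, `ExecutorShape`, and `split` are names invented for this example, and how the resulting values are applied to a cluster is not described in the slides.

```scala
object ExecutorSplit {
  // Per-JVM resources for a given number of executor JVMs
  final case class ExecutorShape(jvms: Int, threadsPerJvm: Int, heapGbPerJvm: Int)

  // Divide a fixed budget of worker threads and heap evenly across executor JVMs
  def split(totalThreads: Int, totalHeapGb: Int, jvms: Int): ExecutorShape = {
    require(jvms > 0, "need at least one JVM")
    require(totalThreads % jvms == 0 && totalHeapGb % jvms == 0,
      "budget must divide evenly across JVMs")
    ExecutorShape(jvms, totalThreads / jvms, totalHeapGb / jvms)
  }

  def main(args: Array[String]): Unit =
    // 48 worker threads and 192GB heap in total, as in the experiment above
    for (n <- Seq(1, 2, 4)) println(split(48, 192, n))
}
```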
10.
OS Tuning – NUMA aware process affinity

Setting NUMA-aware process affinity for each Executor JVM speeds up execution
– By reducing scheduling overhead
– By reducing cache misses and stall cycles
Result
– Achieved a 3 – 14% improvement in all benchmarks without any adverse effects

[Figure: four Spark Executor JVMs (JVM 0 to JVM 3, 12 threads each) placed on NUMA nodes NUMA0 to NUMA3 across Socket 0 and Socket 1, each node with its own DRAM]

numactl -c [0-7],[8-15],[16-23],[24-31],[32-39],[40-47]

[Figure: execution time (sec.) of Q1, Q5, Q9, and Kmeans with NUMA off and NUMA on, with speedup (%)]
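
One way to picture per-executor binding is to generate a launch prefix for each JVM. The sketch below is an assumption-laden illustration and differs from the slide's own command: instead of `-c` with CPU ranges it uses node-level binding through `--cpunodebind` and `--membind`, both standard numactl options. The memory binding is an addition not mentioned in the talk, the node numbers 0 to 3 are taken from the diagram labels, and `<executor launch command>` is a placeholder.

```scala
object NumaBinding {
  // Command prefix that pins a process's CPUs and memory to one NUMA node.
  // Assumption: node-level binding stands in for the CPU ranges on the slide.
  def prefix(node: Int): String =
    s"numactl --cpunodebind=$node --membind=$node"

  // Round-robin mapping from executor index to NUMA node
  def nodeFor(executor: Int, numaNodes: Int): Int = {
    require(numaNodes > 0, "need at least one NUMA node")
    executor % numaNodes
  }

  def main(args: Array[String]): Unit =
    // Four executor JVMs on four NUMA nodes, as in the diagram
    for (jvm <- 0 until 4)
      println(s"JVM $jvm: ${prefix(nodeFor(jvm, 4))} <executor launch command>")
}
```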
11.
OS Tuning – Large Page

How to use large pages
– Reserve large pages on Linux by changing a kernel parameter
– Append "-Xlp" to the Executor JVM options
Result
– Achieved a 3 – 5% improvement

[Figure: Kmeans execution time (sec.) with PageSize=64K and PageSize=16M, NUMA off and NUMA on]
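
Reserving large pages means deciding how many to reserve. The sketch below estimates that count under stated assumptions: the slide does not name the kernel parameter, so `vm.nr_hugepages` (the usual Linux setting) is assumed here, and the 48GB heap per JVM is borrowed from the 4-JVM configuration rather than from this experiment. The result is a lower bound, since a JVM can use large pages for more than the Java heap.

```scala
object LargePageBudget {
  // Number of large pages needed to back a heap of the given size, rounded up
  def pagesFor(heapMb: Long, pageMb: Long): Long = {
    require(heapMb >= 0 && pageMb > 0, "sizes must be positive")
    (heapMb + pageMb - 1) / pageMb
  }

  def main(args: Array[String]): Unit = {
    val heapMb = 48L * 1024 // assumed: 48GB heap per executor JVM
    val pageMb = 16L        // 16M pages, as in the PageSize=16M measurement
    val jvms = 4L           // assumed: four executor JVMs
    val perJvm = pagesFor(heapMb, pageMb)
    println(s"pages per JVM: $perJvm, pages for $jvms JVMs: ${perJvm * jvms}")
    // Assumed kernel parameter; the slide only says "kernel parameter"
    println(s"sysctl -w vm.nr_hugepages=${perJvm * jvms}")
  }
}
```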
12.
Comparison of Default and Optimized w/ 1.4.1, 1.5.2, and 1.6.0

Newer versions generally achieve good performance
JVM & OS tuning still help to improve Spark performance
Tungsten & other new features (e.g. Unified Memory Management) can reduce GC overhead drastically

[Figure: execution time (sec.) of Q1, Q3, and Q5 on Spark 1.4.1, 1.5.2, and 1.6.0, default vs. optimized]
[Figure: execution time (sec.) of Q9, Q19, and Q21 on Spark 1.4.1, 1.5.2, and 1.6.0, default vs. optimized; the chart also carries the data labels 711 and 632]