Optimizing Hortonworks
Apache Spark machine learning
workloads for contemporary
Open Platforms
Raj Krishnamurthy, Indrajit Poddar (I.P), IBM Systems
Animesh Trivedi, Bernard Metzler, IBM Research
© International Business Machines (IBM) 2017
Please Note:
• IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice and at IBM’s
sole discretion.
• Information regarding potential future products is intended to outline our general product direction and it should not be
relied on in making a purchasing decision.
• The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver
any material, code or functionality. Information about potential future products may not be incorporated into any contract.
• The development, release, and timing of any future features or functionality described for our products remains at our sole
discretion.
• Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The
actual throughput or performance that any user will experience will vary depending upon many factors, including
considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage
configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results
similar to those stated here.
Agenda
Spark, Machine Learning and Deep Learning Overview
Why OpenPOWER?
Deep Learning with OpenPOWER GPUs
Spark Machine Learning performance tuning with OpenPOWER CPUs
IO Optimization for the Spark TeraSort benchmark
What is Apache Spark?
• Unified Analytics Platform
  – Combines streaming, graph, machine learning and SQL analytics on a single platform
  – Simplified, multi-language programming model
  – Interactive and batch
• In-Memory Design
  – Pipelines multiple iterations over a single copy of data in memory
  – Superior performance
  – Natural successor to MapReduce
A fast and general engine for large-scale data processing.
[Diagram: R, Scala, SQL, Python and Java APIs over the Spark Core API, with the Spark SQL, Streaming, MLlib and GraphX libraries]
Machine Learning and Deep Learning (ML/DL)
What you and I (our brains) do without even thinking about it: we recognize a bicycle.
Apr 7, 2017
Now machines are learning the way we learn…
Artificial Neural Networks
From "Texture of the Nervous System of Man and the Vertebrates" by Santiago Ramón y Cajal.
But training needs a lot of computational resources
Easy scale-out is possible, but model training is not easy to distribute.
Training can take hours, days or weeks.
Input data and model sizes are becoming larger than ever (e.g. video input, billions of features, etc.), and Moore's law is dying.
Resulting in a need for real-time analytics with:
• whole-system optimization
• offloaded computation
• accelerators, and
• higher memory-bandwidth systems
Today's challenges demand whole system innovation
[Chart: data growth, 2010 to 2020, structured vs. unstructured data, headed toward 44 zettabytes ("you are here"). Data holds competitive value.]
[Chart: processor-technology price/performance, 2000 to 2020, with Moore's Law flattening. Full system and stack open innovation is required across firmware/OS, accelerators, software, storage and network.]
OpenPOWER: open hardware for high performance
Systems designed for big data analytics and superior cloud economics. Up to:
• 12 cores per CPU
• 96 hardware threads per CPU
• 1 TB RAM
• 7.6 Tb/s combined I/O bandwidth
GPUs and FPGAs coming…
[Chart: OpenPOWER vs. traditional Intel x86]
http://www.softlayer.com/POWER-SERVERS
https://mc.jarvice.com/
OpenPOWER Ecosystem – Members
POWER8 Processor - Design
• Cores: 12 cores, 8 threads per core; TDP: 130W and 190W; 64K data cache and 32K instruction cache per core
• Accelerators: crypto & memory expansion; transactional memory
• Caches: 512 KB SRAM L2 per core; 96 MB eDRAM shared L3
• Memory subsystem: memory interface control, with memory buffers holding a 128MB cache; ~70ns latency to memory
• Bus interfaces: Durable Memory attach Interface (DMI); integrated PCIe Gen3; SMP interconnect for up to 4 sockets
• Technology: 22nm SOI, eDRAM, 15 metal layers, 650mm²

Coherent Accelerator Processor Interface (CAPI): IBM & partner devices attach over CAPI/PCIe
• Virtual addressing: the accelerator can work with the same memory addresses that the processors use; pointers are de-referenced the same as in the host application; removes OS & device-driver overhead
• Hardware-managed cache coherence: enables the accelerator to participate in "locks" as a normal thread; lowers latency over an IO communication model
• 6 hardware partners developing with CAPI; over 20 CAPI solutions, all listed at http://ibm.biz/powercapi
• Examples of available CAPI solutions: IBM Data Engine for NoSQL, DRC Graphfind analytics, Erasure Code Acceleration for Hadoop

Newly announced OpenPOWER systems and solutions:
http://openpowerfoundation.org/wp-content/uploads/2016/04/HardwareRevealFlyerFinal.pdf
Introducing Minsky: the S822LC OpenPOWER system for HPC,
the first custom-built GPU accelerator server with NVLink
• 2.5x faster CPU-GPU data communication via NVLink (80 GB/s) than via PCIe (32 GB/s); x86 servers have no NVLink between CPU & GPU, leaving a PCIe bottleneck
• Custom-built GPU accelerator server
• High-speed NVLink connections between CPUs & GPUs and among GPUs
• Features the novel NVIDIA P100 Pascal GPU accelerator
[Diagram: POWER8 NVLink server (P8 CPUs and GPUs linked over NVLink) vs. x86 servers with PCIe]
M. Gschwind, Bringing the Deep Learning Revolution into the Enterprise

Deep Learning on OpenPOWER with GPUs
Transparent acceleration without code changes
Introducing PowerAI: get started fast with Deep Learning
• A package of pre-compiled major deep learning frameworks
• Easy to install & get started with deep learning, with enterprise-class support
• Enabled by high-performance computing infrastructure, built for performance to take advantage of NVLink
https://www.ibm.com/ms-en/marketplace/deep-learning-platform
Machine and Deep Learning analytics on OpenPOWER: no code changes needed!
ATLAS (Automatically Tuned Linear Algebra Software)
https://www.ibm.com/developerworks/community/blogs/fe313521-2e95-46f2-817d-44a4f27eba32/entry/DeepLearning4J_Deep_Learning_with_Java_Spark_and_Power?lang=en
OpenPOWER: GPU support
Mesos supports GPU scheduling (credit: Kevin Klaues, Mesosphere)
Huge speed-ups with GPUs and OpenPOWER!
Enabling Accelerators/GPUs in the cloud stack
[Diagram: deep learning training + inference running in containers and images, on accelerators, managed by clustering frameworks]
TensorFlow on Tesla P100: PowerAI is 30% faster
• IBM S822LC: 20 cores @ 2.86 GHz, 512GB memory / 4 NVIDIA Tesla P100 GPUs / Ubuntu 16.04 / CUDA 8.0.44 / cuDNN 5.1 / TensorFlow 0.12.0 / Inception v3 benchmark (64-image minibatch)
• Intel Broadwell E5-2640v4: 20 cores @ 2.6 GHz, 512GB memory / 4 NVIDIA Tesla P100 GPUs / Ubuntu 16.04 / CUDA 8.0.44 / cuDNN 5.1 / TensorFlow 0.12.0 / Inception v3 benchmark (64-image minibatch)
[Chart: images/second; larger is better]
PowerAI vs DGX-1: 1.6x TensorFlow throughput per dollar
▪ TensorFlow 0.12 on the IBM PowerAI platform takes advantage of the full capabilities of NVLink
▪ For image classification and analysis this means a 1.6x price-performance advantage relative to the NVIDIA DGX-1

  System                             Images/second  List price  $ / image / second
  NVIDIA DGX-1 (8 P100 GPU, 512GB)   330            $129,000    $390
  PowerAI (4 P100 GPU, 512GB)        273            $67,000     $241

(Lower cost per image/second is better.)
NVLink and P100 advantage
• NVLink reduces communication time and overhead
• Incorporates the fastest GPU for deep learning
• Data gets from GPU to GPU and from memory to GPU faster, for shorter training times
[Chart: ImageNet / AlexNet, minibatch size 128, time per minibatch: x86-based GPU system 170 ms; POWER8 + Tesla P100 + NVLink 78 ms]
The IBM advantage: data communication and GPU performance
Spark Machine Learning performance tuning on OpenPOWER
What knobs can you tweak?
Spark on OpenPOWER
• Streaming and SQL benefit from high thread density and concurrency
  – Processing multiple packets of a stream and different stages of a message-stream pipeline
  – Processing multiple rows from a query
• Machine Learning benefits from large caches and memory bandwidth
  – Iterative algorithms on the same data
  – Fewer core pipeline stalls and overall higher throughput
• Graph algorithms also benefit from large caches, memory bandwidth and higher thread strength
  – Flexibility to go from 8 SMT threads per core to 4 or 2
  – Manages the balance between thread performance and throughput
• Headroom
  – Balanced resource utilization, more efficient scale-out
  – Multi-tenant deployments
Roofline Spark Performance Model
Navigation from "out of box" through "good enough" toward "roofline" performance uses the Spark tunables plus system-resource workload characterization and analysis to look for fundamental inefficiencies.
An exhaustive search of the tunable space:
  FOR 1 … MAX WORKERS
    FOR 1 … MAX CPUS PER NODE
      FOR 1 … MAX THREADS PER CPU
        FOR 1 … MAX PARTITIONS
is unwieldy and complicated (with some respite in ML workloads from data sampling), hence a Performance Navigation Automation Script.
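The nested loops above can be sketched as a small enumeration script. This is a hypothetical illustration of why the space is unwieldy; the function name, parameters, and the partition step are our assumptions, not from the slides:

```python
from itertools import product

def sweep_space(max_workers, max_cpus_per_node, max_threads_per_cpu,
                max_partitions, partition_step=200):
    """Enumerate (workers, cpus, threads, partitions) candidates from the
    nested FOR loops. Real sweeps sample this space (e.g. coarse partition
    steps, or data sampling for ML workloads) rather than walking it all."""
    partitions = range(partition_step, max_partitions + 1, partition_step)
    return list(product(range(1, max_workers + 1),
                        range(1, max_cpus_per_node + 1),
                        range(1, max_threads_per_cpu + 1),
                        partitions))

# Even a small 6-node cluster yields hundreds of candidate configurations:
space = sweep_space(max_workers=6, max_cpus_per_node=2,
                    max_threads_per_cpu=8, max_partitions=1200)
print(len(space))  # 6 * 2 * 8 * 6 = 576
```

An automation script typically prunes this grid, sweeping one dimension at a time while holding the others at their current best values.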
Performance Tuning Tips for a Machine Learning Workload
Methodology: an Alternating Least Squares based matrix factorization application
Top-down approach:
• Application: a large number of Spark tunables (Spark executors, Spark cores, …)
• Default configurations give "out of box" performance
Optimization process:
• Spark executor instances
• Spark executor cores
• Spark executor memory
• Spark shuffle location and manager
• RDD persistence storage level
Bottom-up approach:
• System hardware: characterizing the workload through resource monitoring
• Custom Spark tunables from configuration sweeps lead toward roofline performance
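The five tunables in the optimization process above map onto spark-submit options. A hedged sketch follows; the property names are standard Spark configuration keys, but the values and the application jar name (`mf-app.jar`) are illustrative assumptions, not the slides' measured configuration:

```python
# Hypothetical sketch: the five tunables expressed as spark-submit --conf options.
tunables = {
    "spark.executor.instances": "6",           # Spark executor instances
    "spark.executor.cores": "40",              # Spark executor cores
    "spark.executor.memory": "480g",           # Spark executor memory
    "spark.local.dir": "/data/spark-shuffle",  # Spark shuffle location
    "spark.shuffle.manager": "sort",           # Spark shuffle manager
}
# The RDD persistence storage level is set in application code, e.g.
# rdd.persist(StorageLevel.MEMORY_AND_DISK_SER), not via spark-submit.

cmd = ["spark-submit"]
for key, value in sorted(tunables.items()):
    cmd += ["--conf", f"{key}={value}"]
cmd.append("mf-app.jar")
command_line = " ".join(cmd)
print(command_line)
```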
Workflow
• Matrix Factorization from SPARKBENCH: https://github.com/SparkTC/spark-bench
• Training
• Validation
• Prediction
With permission: Raj Krishnamurthy, Strata NYC 2016
Matrix Factorization with Alternating Least Squares
Parameters used for data generation in the MF application:

  Data generation parameter   Value
  Rows in data matrix         62000
  Columns in data matrix      62000
  Data set size               100 GB

  Spark parameter      Value for MF
  Master nodes         1
  Worker nodes         6
  Executors per node   1
  Executor cores       80 / 40 / 24
  Executor memory      480 GB
  Shuffle location     HDDs
  Input storage        HDFS
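For intuition about what ALS.train is doing in the jobs listed next, here is a toy, rank-1 alternating-least-squares sketch in pure Python: fix one factor, solve for the other in closed form, and repeat. This is illustrative only; MLlib's distributed implementation handles sparse ratings, higher ranks, and regularization:

```python
def als_rank1(R, iterations=20):
    """Factor a fully-observed m x n matrix R as u * v^T by alternating
    closed-form least-squares updates of u and v."""
    m, n = len(R), len(R[0])
    u = [1.0] * m
    v = [1.0] * n
    for _ in range(iterations):
        # With v fixed, each u[i] minimizing sum_j (R[i][j] - u[i]*v[j])^2
        # has the closed-form solution below; then symmetrically for v.
        vv = sum(x * x for x in v)
        u = [sum(R[i][j] * v[j] for j in range(n)) / vv for i in range(m)]
        uu = sum(x * x for x in u)
        v = [sum(R[i][j] * u[i] for i in range(m)) / uu for j in range(n)]
    return u, v

# A perfectly rank-1 "ratings" matrix is recovered almost exactly:
R = [[2.0, 4.0], [1.0, 2.0], [3.0, 6.0]]
u, v = als_rank1(R)
print(u[0] * v[0])  # ~2.0, matching R[0][0]
```

Each ALS iteration re-reads the same input data, which is why the workload rewards large caches, memory bandwidth, and a well-chosen RDD persistence level.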
  Job  Function                    Description / API called
  7    Mean at MFApp.java          AbstractJavaRDDLike.map; MatrixFactorizationModel.predict; JavaDoubleRDD.mean
  6    Aggregate at MFModel.scala  MatrixFactorizationModel.predict; MatrixFactorizationModel.countApproxDistinctUserProduct
  5    First at MFModel.scala      ml.recommendation.ALS.computeFactors
  4    First at MFModel.scala      ml.recommendation.ALS.computeFactors
  3    Count at ALS.scala          ALS.train and ALS.initialize
  2    Count at ALS.scala          ALS.train
  1    Count at ALS.scala          ALS.train
  0    Count at ALS.scala          ALS.train
Analyzing the Spark Configuration Sweep
Various configurations tried in optimizing the MF application on Spark:

  Config  Executor cores  GC options            RDD compression  Storage level        Partitions  Shuffle manager  Runtime (min)
  1       80              Default               TRUE             memory_and_disk      1000        Sort-based       40
  2       80              Default               FALSE            memory_only          1000        Sort-based       34
  3       40              Default               FALSE            memory_only          1000        Sort-based       26
  4       40              ParallelGCThreads=40  FALSE            memory_only          1000        Sort-based       24
  5       40              ParallelGCThreads=40  TRUE             memory_and_disk_ser  1000        Sort-based       20
  6       40              ParallelGCThreads=40  TRUE             memory_only_ser      1000        Sort-based       25
  7       40              ParallelGCThreads=40  FALSE            memory_only          800         Sort-based       26
  8       40              ParallelGCThreads=40  FALSE            memory_only          1200        Sort-based       27
  9       24              ParallelGCThreads=24  FALSE            memory_and_disk_ser  1000        Sort-based       21
  10      24              ParallelGCThreads=24  FALSE            memory_and_disk_ser  1000        Tungsten-sort    19
  11      24              Default               FALSE            memory_and_disk_ser  1000        Tungsten-sort    18
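The winning configuration (#11, 18 minutes) can be written out as Spark properties. This is a sketch: the property names are standard Spark configuration keys, but spellings and the availability of a separate tungsten-sort shuffle manager should be checked against the Spark version in use:

```python
# Configuration #11 from the sweep above, expressed as Spark properties.
best_config = {
    "spark.executor.cores": "24",
    "spark.rdd.compress": "false",            # RDD compression = FALSE
    "spark.shuffle.manager": "tungsten-sort",
    "spark.default.parallelism": "1000",      # partition number
    # GC options: Default (no -XX:ParallelGCThreads override)
}
# The storage level is set in application code:
storage_level = "MEMORY_AND_DISK_SER"  # rdd.persist(StorageLevel.MEMORY_AND_DISK_SER)

config_text = "\n".join(f"{k} {v}" for k, v in sorted(best_config.items()))
print(config_text)
```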
GC and Memory Footprint
Run time and GC time of Stage 68 (the last stage) for different configurations:

  Configuration  Run time of last stage  GC time of last stage
  1              12 min                  4.4 min
  4              4.4 min                 1.8 min
  9              3.5 min                 1.6 min
  11             47 s                    16 s
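The ParallelGCThreads values in the sweep are HotSpot JVM garbage-collector flags rather than Spark properties. A sketch of how they would typically be passed to executors (the property and flags are standard Spark/HotSpot; pairing the thread count with the executor core count is our illustration):

```
# spark-defaults.conf sketch: match parallel GC threads to the executor core count
spark.executor.extraJavaOptions  -XX:+UseParallelGC -XX:ParallelGCThreads=24
```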
Last Stage Analysis
Characterizing Configuration #1
[Charts: CPU utilization and memory utilization on a worker node for configuration 1]
Characterizing Configuration #1 and Configuration #11
[Chart: memory footprint of configuration 11]
Summary - How to Optimize Closer to Roofline Performance, Faster
• Classify the workload as CPU-, memory-, IO- or mixed (CPU, memory, IO) intensive
• Characterize the "out-of-the-box" workload to understand its CPU, memory, IO and network performance characteristics
• Floorplan cluster resources
• Tune the "out-of-the-box" workload to navigate the "roofline" performance space in the above-named dimensions
  – If the workload is memory-, IO- or network-bound, tune Spark to increase operational intensity (operations/byte) as much as possible, to make it CPU-bound
• Divide the search space into regions and perform an exhaustive search within each
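The classification step above can be sketched as a tiny helper that labels the dominant bottleneck from resource-monitoring numbers. The thresholds and function name are illustrative assumptions, not from the slides:

```python
def classify_workload(cpu_util, mem_bw_util, io_util, net_util, threshold=0.7):
    """Return the resources whose utilization (0..1) exceeds `threshold`.
    Multiple labels indicate a mixed-intensive workload."""
    bound = [name for name, util in (("CPU", cpu_util), ("memory", mem_bw_util),
                                     ("IO", io_util), ("network", net_util))
             if util >= threshold]
    if not bound:
        return ["none (headroom everywhere; look for software inefficiency)"]
    return bound

print(classify_workload(0.95, 0.40, 0.20, 0.10))  # ['CPU']
print(classify_workload(0.50, 0.90, 0.80, 0.20))  # ['memory', 'IO']
```

A memory- or IO-bound result is the cue to raise operational intensity (e.g. via serialization, compression, or partitioning choices) until the workload becomes CPU-bound.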
IO Optimizations
How to take advantage of faster networks?
THE GAP – HIGH-PERFORMANCE NETWORKS
[Chart: runtime (secs) on 1, 10, and 40 Gbps networks]
THE PERFORMANCE LOSS IN THE BIG-DATA STACK
High-performance I/O devices lose their performance in the stack to:
• Data copies
• Context switches
• Cache pollution
• Deep call-stacks
• Legacy I/O interfaces
The Crail Architecture (WWW.CRAIL.IO)
• A high-performance data fabric for the Apache data processing stack
• Relies on the principles of user-level IO
• Separation between control path and data path
• User-space direct-access I/O architecture / layer cut-through
• Builds on a distributed, shared data store
• No changes to the overall data processing framework
• Optimized to serve short-lived data sharing and staging
[Diagram: Spark / Flink / Storm … over Crail Store (with Spark-specific shuffle and broadcast modules) and HDFS, on a high-performance RDMA network with zero-copy access]
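Because Crail plugs in beneath Spark, enabling it is a configuration change rather than a code change. A sketch of the relevant spark-defaults.conf entries, with class names as published by the Crail Spark-IO plugin; verify them against the current WWW.CRAIL.IO documentation:

```
spark.shuffle.manager    org.apache.spark.shuffle.crail.CrailShuffleManager
spark.broadcast.factory  org.apache.spark.broadcast.CrailBroadcastFactory
```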
EVALUATION – TERASORT
[Chart: runtime (seconds) of the map and reduce phases, Spark vs. Spark/Crail, 12.8 TB TeraSort data set]
128-node OpenPOWER cluster:
• 2 x IBM POWER8 10-core @ 2.9 GHz
• DRAM: 512GB DDR4
• 4 x 1.2 TB NVMe SSD
• 100GbE Mellanox ConnectX-4 EN (RoCE)
• Ubuntu 16.04 (kernel 4.4.0-31)
• Spark 2.0.2
Performance gain: 6x
• Most gain comes from the reduce phase:
  – The Crail shuffler is much faster than Spark's built-in shuffler
  – Dramatically reduced CPU involvement
  – Dramatically improved network usage
• Map phase: all activity is local
  – Still faster than vanilla Spark
EVALUATION – TERASORT: NETWORK IO
• Vanilla Spark runs on 100GbE
• Spark/Crail runs on 100Gb RoCE/RDMA
• Vanilla Spark peaks at ~10Gb/s
• The Spark/Crail shuffle delivers ~70Gb/s per node
EVALUATION – TERASORT CPU EFFICIENCY
• Spark/Crail completes much faster despite a comparable CPU load
• Spark/Crail CPU efficiency is close to the 2016 sorting benchmark winner: 3.13 vs. 4.4 GB/min/core
• The 2016 winner runs native C code!

                      Spark + Crail  Spark 2.0.2  Winner 2014  Winner 2016
  Size (TB)           12.8           12.8         100          100
  Time (sec)          98             527          1406         98.6
  Cores               2560           2560         6592         10240
  Nodes               128            128          206          512
  Network (Gb/s)      100            100          10           100
  Rate (TB/min)       7.8            1.4          4.27         44.78
  Rate/core (GB/min)  3.13           0.58         0.66         4.4
CRAIL WITH THE HORTONWORKS STACK
[Diagram: user interfaces and compute frameworks over a resource manager and scalable, fault-tolerant, cost-efficient storage; the high-performance Crail fabric plugs in underneath, providing shuffle, broadcast, RPCs, caching, a key-value store, an HDFS plugin, and more]
Roadmap
Where is OpenPOWER headed?
Accelerator Technology

               2015               2016                 2017
  IBM CPUs     POWER8             POWER8 with NVLink   POWER9
  OpenPOWER    CAPI interface     NVLink               Enhanced CAPI & NVLink
  Networking   Connect-IB,        ConnectX-4,          ConnectX-5,
               FDR InfiniBand,    EDR InfiniBand,      next-gen InfiniBand,
               PCIe Gen3          CAPI over PCIe Gen3  enhanced CAPI over PCIe Gen4
  NVIDIA GPUs  Kepler, PCIe Gen3  Pascal, NVLink       Volta, enhanced NVLink
NOTICES AND DISCLAIMERS
Copyright © 2016 by International Business Machines Corporation (IBM). No part of this document may be reproduced or transmitted in any form without written permission from IBM.
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM.
Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial
publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. THIS DOCUMENT IS DISTRIBUTED "AS IS"
WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM THE USE OF THIS INFORMATION, INCLUDING BUT NOT
LIMITED TO, LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT OR LOSS OF OPPORTUNITY. IBM products and services are warranted according to the terms and conditions of the
agreements under which they are provided.
IBM products are manufactured from new parts or new and used parts. In some cases, a product may not be new and may have been previously installed. Regardless, our warranty terms apply.
Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.
Performance data contained herein was generally obtained in controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used
IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary.
References in this document to IBM products, programs, or services do not imply that IBM intends to make such products, programs or services available in all countries in which IBM
operates or does business.
Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions
are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation.
It is the customer’s responsibility to insure its own compliance with legal requirements and to obtain advice of competent legal counsel as
to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any
actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services
or products will ensure that the customer is in compliance with any law.
NOTICES AND DISCLAIMERS, CONT'D.
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those
products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of
non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to
interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
AND FITNESS FOR A PARTICULAR PURPOSE.
The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right.
IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DOORS®, Emptoris®, Enterprise Document Management System™, FASP®, FileNet®, Global Business
Services ®, Global Technology Services ®, IBM ExperienceOne™, IBM SmartCloud®, IBM Social Business®, Information on Demand, ILOG, Maximo®, MQIntegrator®, MQSeries®, Netcool®,
OMEGAMON, OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®, PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®,
Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS, Sterling Commerce®, StoredIQ, Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and
System z® Z/OS, are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or
other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.
Q & A
2016 August POWER Up Your Insights - IBM System Summit Mumbai
 
Demystify OpenPOWER
Demystify OpenPOWERDemystify OpenPOWER
Demystify OpenPOWER
 
IBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWERIBM Data Centric Systems & OpenPOWER
IBM Data Centric Systems & OpenPOWER
 
Ibm symp14 referentin_barbara koch_power_8 launch bk
Ibm symp14 referentin_barbara koch_power_8 launch bkIbm symp14 referentin_barbara koch_power_8 launch bk
Ibm symp14 referentin_barbara koch_power_8 launch bk
 
Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8
 
Enabling a hardware accelerated deep learning data science experience for Apa...
Enabling a hardware accelerated deep learning data science experience for Apa...Enabling a hardware accelerated deep learning data science experience for Apa...
Enabling a hardware accelerated deep learning data science experience for Apa...
 
Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...
 
Lessons Learned from Deploying Apache Spark as a Service on IBM Power Systems...
Lessons Learned from Deploying Apache Spark as a Service on IBM Power Systems...Lessons Learned from Deploying Apache Spark as a Service on IBM Power Systems...
Lessons Learned from Deploying Apache Spark as a Service on IBM Power Systems...
 
Enabling a hardware accelerated deep learning data science experience for Apa...
Enabling a hardware accelerated deep learning data science experience for Apa...Enabling a hardware accelerated deep learning data science experience for Apa...
Enabling a hardware accelerated deep learning data science experience for Apa...
 
Capi snap overview
Capi snap overviewCapi snap overview
Capi snap overview
 
Mauricio breteernitiz hpc-exascale-iscte
Mauricio breteernitiz hpc-exascale-iscteMauricio breteernitiz hpc-exascale-iscte
Mauricio breteernitiz hpc-exascale-iscte
 
OpenPOWER Boot camp in Zurich
OpenPOWER Boot camp in ZurichOpenPOWER Boot camp in Zurich
OpenPOWER Boot camp in Zurich
 
Power overview 2018 08-13b
Power overview 2018 08-13bPower overview 2018 08-13b
Power overview 2018 08-13b
 
OpenCAPI next generation accelerator
OpenCAPI next generation accelerator OpenCAPI next generation accelerator
OpenCAPI next generation accelerator
 
IBM Power Systems - enabling cloud solutions
IBM Power Systems - enabling cloud solutionsIBM Power Systems - enabling cloud solutions
IBM Power Systems - enabling cloud solutions
 
IBM Power leading Cognitive Systems
IBM Power leading Cognitive SystemsIBM Power leading Cognitive Systems
IBM Power leading Cognitive Systems
 

Recently uploaded

Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 

Recently uploaded (20)

Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 

Optimizing Hortonworks Apache Spark machine learning workloads for contemporary Open Platforms

  • 1. Optimizing Hortonworks Apache Spark machine learning workloads for contemporary Open Platforms Raj Krishnamurthy, Indrajit Poddar (I.P), IBM Systems Animesh Trivedi, Bernard Metzler, IBM Research © International Business Machines (IBM) 2017
  • 2. Please Note: • IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice and at IBM’s sole discretion. • Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. • The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. • The development, release, and timing of any future features or functionality described for our products remains at our sole discretion. • Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here. 2 © International Business Machines (IBM) 2017
  • 3. Agenda Spark, Machine Learning and Deep Learning Overview Why OpenPower ? Deep Learning with OpenPOWER GPUs Spark Machine Learning performance tuning with OpenPower CPUs IO Optimization for Spark TeraSort benchmark 3 © International Business Machines (IBM) 2017
  • 4. What is Apache Spark • Unified Analytics Platform – Combine streaming, graph, machine learning and sql analytics on a single platform – Simplified, multi-language programming model – Interactive and Batch • In-Memory Design – Pipelines multiple iterations on single copy of data in memory – Superior Performance – Natural Successor to MapReduce Fast and general engine for large-scale data processing Spark Core API R Scala SQL Python Java Spark SQL Streaming MLlib GraphX 4 © International Business Machines (IBM) 2017
  • 5. Machine Learning and Deep Learning (ML/DL) What you and I (our brains) do without even thinking about it: we recognize a bicycle. Apr 7, 2017 (c) International Business Machines (IBM) 2017 5
  • 6. 6 Now machines are learning the way we learn…. From "Texture of the Nervous System of Man and the Vertebrates" by Santiago Ramón y Cajal. Artificial Neural Networks Apr 7, 2017(c) International Business Machines (IBM) 2017
  • 7. But training needs a lot of computational resources. Scale-out is easy, but model training is not easy to distribute. Training can take hours, days or weeks. Input data and model sizes are becoming larger than ever (e.g. video input, billions of features), and Moore's law is dying. Real-time analytics therefore needs: whole-system optimization, offloaded computation, accelerators, and higher memory-bandwidth systems. Apr 7, 2017 (c) International Business Machines (IBM) 2017 7
  • 8. Today’s challenges demand whole-system innovation. You are here: 44 zettabytes of unstructured data projected by 2020 (chart: structured vs. unstructured data growth, 2010 to 2020). Data holds competitive value; full system and stack open innovation is required. Data growth is outpacing the price/performance gains of Moore's Law processor technology (2000 to 2020), so improvement must come from the whole stack: firmware / OS, accelerators, software, storage, network. 8 © International Business Machines (IBM) 2017
  • 9. 9 OpenPOWER: open hardware for high performance. Systems designed for big data analytics and superior cloud economics. Up to: 12 cores per CPU, 96 hardware threads per CPU, 1 TB RAM, 7.6 Tb/s combined I/O bandwidth. GPUs and FPGAs coming… (OpenPOWER vs. traditional Intel x86) http://www.softlayer.com/POWER-SERVERS https://mc.jarvice.com/ Apr 7, 2017 (c) International Business Machines (IBM) 2017
  • 10. 10 OpenPower Ecosystem – Members (c) International Business Machines (IBM) 2017 Apr 7, 2017
  • 11. POWER8 Processor - Design (22nm SOI, eDRAM, 15 ML, 650mm2; die blocks: cores, memory interface control, SMP interconnect, CAPI/PCI, DMI to memory and IBM & partner devices)
Cores: 12 cores / 8 threads per core; TDP: 130W and 190W; 64K data cache, 32K instruction cache
Accelerators: crypto & memory expansion; transactional memory
Caches: 512 KB SRAM L2 per core; 96 MB eDRAM shared L3
Memory subsystem: memory buffers with 128MB cache; ~70ns latency to memory
Bus interfaces: Durable Memory attach Interface (DMI); integrated PCIe Gen3; SMP interconnect for up to 4 sockets
Coherent Accelerator Processor Interface (CAPI):
– Virtual addressing: the accelerator can work with the same memory addresses that the processors use; pointers are de-referenced the same as in the host application; removes OS & device driver overhead
– Hardware-managed cache coherence: enables the accelerator to participate in “locks” as a normal thread; lowers latency over an IO communication model
6 hardware partners developing with CAPI; over 20 CAPI solutions, all listed at http://ibm.biz/powercapi. Examples of available CAPI solutions: IBM Data Engine for NoSQL; DRC Graphfind analytics; Erasure Code Acceleration for Hadoop
Newly announced OpenPOWER systems and solutions: http://openpowerfoundation.org/wp-content/uploads/2016/04/HardwareRevealFlyerFinal.pdf
11 © International Business Machines (IBM) 2017
  • 12. Introducing Minsky: S822LC OpenPOWER system for HPC, the first custom-built GPU accelerator server with NVLink | 12
– 2.5x faster CPU-GPU data communication via NVLink: POWER8 NVLink server (NVLink, 80 GB/s, between CPUs & GPUs and among GPUs) vs. x86 servers with PCIe (32 GB/s; no NVLink between CPU & GPU, so PCIe is the bottleneck)
– High-speed NVLink connections between CPUs & GPUs and among GPUs
– Features the novel NVIDIA P100 Pascal GPU accelerator
M. Gschwind, Bringing the Deep Learning Revolution into the Enterprise
  • 13. Deep Learning on OpenPOWER with GPUs Transparent acceleration without code changes | 13 Apr 7, 2017(c) International Business Machines (IBM) 2017
  • 14. Introducing PowerAI: Get started fast with Deep Learning 14 Enabled by High Performance Computing Infrastructure. Package of pre-compiled major deep learning frameworks. Easy to install & get started with deep learning, with Enterprise-Class support, tuned for performance to take advantage of NVLink. https://www.ibm.com/ms-en/marketplace/deep-learning-platform
  • 15. Machine and Deep Learning analytics on OpenPOWER: no code changes needed!! 15 ATLAS (Automatically Tuned Linear Algebra Software) https://www.ibm.com/developerworks/community/blogs/fe313521-2e95-46f2-817d-44a4f27eba32/entry/DeepLearning4J_Deep_Learning_with_Java_Spark_and_Power?lang=en
  • 16. OpenPOWER: GPU support 16 Credit: Kevin Klaues, Mesosphere Mesos supports GPU scheduling Huge speed-ups with GPUs and OpenPOWER!
  • 17. Enabling Accelerators/GPUs in the cloud stack 17 Deep Learning Training + Inference Containers and images Accelerators Clustering frameworks
  • 18. TensorFlow on Tesla P100: PowerAI is 30% faster 18
IBM S822LC: 20 cores, 2.86 GHz, 512GB memory / 4 NVIDIA Tesla P100 GPUs / Ubuntu 16.04 / CUDA 8.0.44 / cuDNN 5.1 / TensorFlow 0.12.0 / Inception v3 benchmark (64-image minibatch)
Intel Broadwell E5-2640v4: 20 cores, 2.6 GHz, 512GB memory / 4 NVIDIA Tesla P100 GPUs / Ubuntu 16.04 / CUDA 8.0.44 / cuDNN 5.1 / TensorFlow 0.12.0 / Inception v3 benchmark (64-image minibatch)
Larger value is better
  • 19. PowerAI vs DGX-1: 1.6x TensorFlow throughput per dollar 19
▪ TensorFlow 0.12 on the IBM PowerAI platform takes advantage of the full capabilities of NVLink
▪ For image classification and analysis this means a 1.6x price/performance advantage relative to the NVIDIA DGX-1
System | Images/second | List price | $ per image/second
NVIDIA DGX-1 (8 P100 GPUs, 512GB mem) | 330 | $129,000 | $390
PowerAI (4 P100 GPUs, 512GB mem) | 273 | $67,000 | $241
Lower cost is better
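The 1.6x headline figure follows directly from the table's dollars-per-image-per-second column; a quick stdlib-only check using the slide's own prices and throughputs:

```python
# Price/performance = list price / images-per-second, per the slide's table.
dgx1 = 129_000 / 330      # ≈ $391 per image/s (slide rounds to $390)
powerai = 67_000 / 273    # ≈ $245 per image/s (slide lists $241)

# Ratio of the slide's stated $/image/s values gives the headline advantage.
advantage = 390 / 241
print(round(advantage, 1))  # → 1.6
```

Note the PowerAI quotient computes to roughly $245 rather than the slide's $241; the table's own rounded values are what yield the 1.6x claim.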
  • 20. NVLink and P100 advantage | 20: IBM advantage in data communication and GPU performance
• NVLink reduces communication time and overhead
• Incorporates the fastest GPU for deep learning
• Data gets from GPU to GPU and from memory to GPU faster, for shorter training times
ImageNet / AlexNet, minibatch size = 128: x86-based GPU system 170 ms vs. POWER8 + Tesla P100 + NVLink 78 ms
  • 21. Spark Machine Learning performance tuning on OpenPOWER What knobs can you tweak? | 21 Apr 7, 2017(c) International Business Machines (IBM) 2017
  • 22. Spark on OpenPower • Streaming and SQL benefit from High Thread Density and Concurrency • Processing multiple packets of a stream and different stages of a message stream pipeline • Processing multiple rows from a query 2 2 © International Business Machines (IBM) 2017
  • 23. • Machine Learning benefits from Large Caches and Memory Bandwidth • Iterative Algorithms on the same data • Fewer core pipeline stalls and overall higher throughput 2 3 Spark on OpenPower © International Business Machines (IBM) 2017
  • 24. • Graph algorithms also benefit from Large Caches, Memory Bandwidth and Higher Thread Strength • Flexibility to go from 8 SMT threads per core to 4 or 2 • Manage Balance between thread performance and throughput 24 Spark on OpenPower © International Business Machines (IBM) 2017
  • 25. • Headroom • Balanced resource utilization, more efficient scale-out • Multi-tenant deployments 2 5 Spark on OpenPower © International Business Machines (IBM) 2017
  • 26. Roofline SPARK Performance Model 26
Spark tunables (FOR 1 … MAX WORKERS; FOR 1 … MAX CPU PER NODE; FOR 1 … MAX THREADS PER CPU; FOR 1 … MAX PARTITIONS): unwieldy & complicated (some respite in ML workloads from data sampling)
Spark performance progression: “out of box”, then good enough, then “roofline”
“Roofline” performance navigation uses system-resource workload characterization and analysis to look for fundamental inefficiencies, automated by a performance-navigation script
© International Business Machines (IBM) 2017
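The nested FOR loops above describe an exhaustive configuration sweep over the Spark tunables. A minimal stdlib-only sketch of such a sweep generator (the candidate values are hypothetical, not the study's actual bounds):

```python
import itertools

# Hypothetical search-space values; a real cluster would use its own maxima.
workers = [2, 4, 6]
cores_per_executor = [24, 40, 80]
threads_per_cpu = [2, 4, 8]
partitions = [800, 1000, 1200]

# The Cartesian product enumerates every candidate Spark configuration,
# which is why the slide calls an unguided sweep unwieldy.
sweep = list(itertools.product(workers, cores_per_executor,
                               threads_per_cpu, partitions))
print(len(sweep))  # → 81 candidate configurations (3 * 3 * 3 * 3)
```

This combinatorial growth is exactly why the slide advocates characterization-guided navigation (and data sampling for ML workloads) over brute force.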
  • 27. Performance Tuning Tips for a Machine Learning Workload 27
Methodology: Alternating Least Squares based matrix factorization application
Top-down approach: application, a large number of Spark tunables (Spark executors, Spark cores, …), default configurations, out-of-box performance
Bottom-up approach: system hardware, characterizing the workload through resource monitoring, custom Spark tunables from configuration sweeps, roofline performance
Optimization process: Spark executor instances; Spark executor cores; Spark executor memory; Spark shuffle location and manager; RDD persistence storage level
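The knobs in the optimization process above map onto spark-submit options. A hypothetical invocation sketch (the flag values are illustrative, not the tuned values from the study; `MFApp` and the jar name are placeholders):

```shell
# Illustrative spark-submit flags for the tunables named on the slide:
# executor instances, executor cores, executor memory, shuffle manager,
# and shuffle location. The RDD persistence storage level (e.g.
# MEMORY_AND_DISK_SER) is set in application code via rdd.persist().
spark-submit \
  --num-executors 6 \
  --executor-cores 40 \
  --executor-memory 480g \
  --conf spark.shuffle.manager=sort \
  --conf spark.local.dir=/mnt/hdd/shuffle \
  --class MFApp mf-app.jar
```

The `spark.shuffle.manager` setting (sort vs. tungsten-sort, as swept on slide 30) applies to the Spark 1.x line used in this era of the study; later Spark releases removed that knob.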
  • 28. WorkFlow 28 • Matrix Factorization from SPARKBENCH - https://github.com/SparkTC/spark-bench • Training • Validation • Prediction With permission - Raj Krishnamurthy STRATA NYC 2016 © International Business Machines (IBM) 2017
  • 29. Matrix Factorization with Alternating Least Squares 29
Parameters used for data generation in the MF application:
Data generation parameter | Value
Rows in data matrix | 62000
Columns in data matrix | 62000
Data set size | 100 GB
Spark parameter | Value for MF
Master node | 1
Worker nodes | 6
Executors per node | 1
Executor cores | 80 / 40 / 24
Executor memory | 480 GB
Shuffle location | HDDs
Input storage | HDFS
Job | Function | Description / API called
7 | Mean at MFApp.java | AbstractJavaRDDLike.map; MatrixFactorizationModel.predict; JavaDoubleRDD.mean
6 | Aggregate at MFModel.scala | MatrixFactorizationModel.predict; MatrixFactorizationModel.countApproxDistinctUserProduct
5 | First at MFModel.scala | ml.recommendation.ALS.computeFactors
4 | First at MFModel.scala | ml.recommendation.ALS.computeFactors
3 | Count at ALS.scala | ALS.train and ALS.initialize
2 | Count at ALS.scala | ALS.train
1 | Count at ALS.scala | ALS.train
0 | Count at ALS.scala | ALS.train
© International Business Machines (IBM) 2017
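ALS alternates between fixing one factor and solving a least-squares problem for the other. A toy rank-1 sketch in plain Python (the 2x2 rating matrix is invented for illustration; MLlib's ALS solves the same per-row subproblem, blocked and in parallel across the cluster):

```python
# Toy rank-1 ALS: factor R ≈ u * v^T by alternating least-squares updates.
R = [[5.0, 3.0], [4.0, 2.4]]  # hypothetical rating matrix (exactly rank 1)
u, v = [1.0, 1.0], [1.0, 1.0]

def sq_error():
    return sum((R[i][j] - u[i] * v[j]) ** 2 for i in range(2) for j in range(2))

for _ in range(5):
    # Fix v and solve least squares for each u[i]; then fix u and solve for v[j].
    for i in range(2):
        u[i] = sum(R[i][j] * v[j] for j in range(2)) / sum(x * x for x in v)
    for j in range(2):
        v[j] = sum(R[i][j] * u[i] for i in range(2)) / sum(x * x for x in u)

print(sq_error())  # converges to ~0 for a rank-1 matrix
```

Each sweep re-reads the same rating matrix, which is why the workload rewards Spark's in-memory caching and, per the earlier slides, large CPU caches and memory bandwidth.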
  • 30. Analyzing SPARK Configuration Sweep 30
Various configurations tried in optimizing the MF application on Spark:
Configuration      | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11
Executor cores     | 80 | 80 | 40 | 40 | 40 | 40 | 40 | 40 | 24 | 24 | 24
GC options         | Default | Default | Default | ParallelGCThreads=40 | ParallelGCThreads=40 | ParallelGCThreads=40 | ParallelGCThreads=40 | ParallelGCThreads=40 | ParallelGCThreads=24 | ParallelGCThreads=24 | Default
RDD compression    | TRUE | FALSE | FALSE | FALSE | TRUE | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE
Storage level      | memory_and_disk | memory_only | memory_only | memory_only | memory_and_disk_ser | memory_only_ser | memory_only | memory_only | memory_and_disk_ser | memory_and_disk_ser | memory_and_disk_ser
Partition numbers  | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 800 | 1200 | 1000 | 1000 | 1000
Shuffle manager    | Sort | Sort | Sort | Sort | Sort | Sort | Sort | Sort | Sort | Tungsten-sort | Tungsten-sort
Run-time (minutes) | 40 | 34 | 26 | 24 | 20 | 25 | 26 | 27 | 21 | 19 | 18
© International Business Machines (IBM) 2017
  • 31. GC and Memory Footprint 31
Run time and GC time of stage 68 for different configurations:
Configuration | Run time of last stage | GC time of last stage
1 | 12 min | 4.4 min
4 | 4.4 min | 1.8 min
9 | 3.5 min | 1.6 min
11 | 47 s | 16 s
© International Business Machines (IBM) 2017
  • 32. Last Stage Analysis 32© International Business Machines (IBM) 2017
  • 33. Characterizing Configuration #1 33 CPU utilization on a worker node (configuration 1 ) Memory utilization on a worker node ( configuration 1) © International Business Machines (IBM) 2017
  • 34. Characterizing Configuration #1 and Configuration #11 34 Memory footprint of configuration 11 © International Business Machines (IBM) 2017
  • 35. Summary - How to Optimize Closer to Roofline Performance Faster? • Classify the workload as CPU, memory, IO or mixed (CPU, memory, IO) intensive • Characterize the “out-of-the-box” workload to understand CPU, memory, IO and network performance characteristics • Floorplan cluster resources • Tune the “out-of-the-box” workload to navigate the “roofline” performance space in the above-named dimensions – If the workload is memory/IO/network bound, tune SPARK to increase operational intensity (operations/byte) as much as possible to make it CPU bound • Divide the search space into regions and perform exhaustive search 35 © International Business Machines (IBM) 2017
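The roofline model referenced above bounds attainable throughput by the lesser of peak compute and memory bandwidth times operational intensity, which is why raising operations/byte can turn a memory-bound workload CPU-bound. A stdlib-only sketch (the peak and bandwidth numbers are hypothetical):

```python
def attainable_gflops(intensity_flops_per_byte, peak_gflops, bandwidth_gbs):
    """Classic roofline: performance is capped by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gbs * intensity_flops_per_byte)

# Hypothetical machine: 500 GFLOP/s peak compute, 100 GB/s memory bandwidth.
# Below the ridge point (5 FLOP/byte) the workload is memory bound; raising
# operational intensity moves it up the slanted roof toward the compute roof.
print(attainable_gflops(1.0, 500, 100))   # → 100.0 (memory bound)
print(attainable_gflops(10.0, 500, 100))  # → 500 (compute bound)
```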
  • 36. IO Optimizations How to take advantage of faster networks? 36Apr 7, 2017(c) International Business Machines (IBM) 2017
  • 37. THE GAP – HIGH-PERFORMANCE NETWORKS: runtime (secs) chart across 1, 10, and 40 Gbps networks 37 Apr 7, 2017 (c) International Business Machines (IBM) 2017
  • 38. THE PERFORMANCE LOSS IN THE BIG-DATA STACK High-Performance I/O devices • Data copies • Context switches • Cache pollution • Deep call-stacks • Legacy I/O interfaces 38Apr 7, 2017(c) International Business Machines (IBM) 2017
  • 39. The Crail Architecture WWW.CRAIL.IO  A high-performance data fabric for the Apache Data Processing Stack  Relies on the principles of user level IO  Separation between control path and data path  User-space direct-access I/O architecture/layer cut-through  Builds on a distributed, shared data store  No changes to overall data processing framework  Is optimized to serve short-lived data sharing and staging spark / flink / storm … HDFS Crail Store High Performance RDMA Network zerocopy spark specific shuffle broadcast 39Apr 7, 2017(c) International Business Machines (IBM) 2017
  • 40. EVALUATION - TERASORT 40
12.8 TB data set, TeraSort (map and reduce phases); runtime chart (seconds): vanilla Spark vs. Spark/Crail
128-node OpenPOWER cluster: 2 x IBM POWER8 10-core @ 2.9 GHz; 512GB DDR4 DRAM; 4 x 1.2 TB NVMe SSD; 100GbE Mellanox ConnectX-4 EN (RoCE); Ubuntu 16.04 (kernel 4.4.0-31); Spark 2.0.2
Performance gain: 6x
• Most gain from the reduce phase: the Crail shuffler is much faster than Spark's built-in shuffler; dramatically reduced CPU involvement; dramatically improved network usage
• Map phase: all activity local; still faster than vanilla Spark
Apr 7, 2017 (c) International Business Machines (IBM) 2017
  • 41. EVALUATION – TERASORT: NETWORK IO • Vanilla Spark runs on 100GbE • Spark/Crail runs on 100Gb RoCE/RDMA • Vanilla Spark peaks at ~10Gb/s • Spark/Crail shuffle delivers ~70Gb/s per node 41Apr 7, 2017(c) International Business Machines (IBM) 2017
  • 42. EVALUATION – TERASORT CPU EFFICIENCY 42
• Spark/Crail completes much faster despite comparable CPU load
• Spark/Crail CPU efficiency is close to the 2016 sorting benchmark winner: 3.13 vs. 4.4 GB/min/core
• The 2016 winner runs native C code!
Metric | Spark + Crail | Spark 2.0.2 | Winner 2014 | Winner 2016
Size (TB) | 12.8 | 12.8 | 100 | 100
Time (sec) | 98 | 527 | 1406 | 98.6
Cores | 2560 | 2560 | 6592 | 10240
Nodes | 128 | 128 | 206 | 512
Network (Gb/s) | 100 | 100 | 10 | 100
Rate (TB/min) | 7.8 | 1.4 | 4.27 | 44.78
Rate/core (GB/min) | 3.13 | 0.58 | 0.66 | 4.4
Apr 7, 2017 (c) International Business Machines (IBM) 2017
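The per-core efficiency figure can be reproduced from the table's own entries (assuming 1024 GB per TB, which is what makes the arithmetic land on 3.13):

```python
# Spark/Crail row: 12.8 TB sorted in 98 s on 2560 cores.
size_gb = 12.8 * 1024                # 13107.2 GB
minutes = 98 / 60
rate_gb_per_min = size_gb / minutes  # ≈ 8025 GB/min, i.e. ≈ 7.8 TB/min
per_core = rate_gb_per_min / 2560
print(round(per_core, 2))  # → 3.13 GB/min/core, matching the table
```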
  • 43. CRAIL WITH THE HORTONWORKS STACK scalable, fault-tolerant, cost-efficient storage resource manager compute frameworks user interfaces broadcast HDFS plugin RPCs shuffle caching key-value store ... High-performance Crailfabric 43Apr 7, 2017(c) International Business Machines (IBM) 2017
  • 44. Roadmap Where is OpenPOWER headed? | 44 Apr 7, 2017(c) International Business Machines (IBM) 2017
  • 45. Accelerator Technology 45
2015: POWER8 (OpenPower CAPI interface, CAPI over PCIe Gen3); Mellanox Connect-IB FDR Infiniband (PCIe Gen3); NVIDIA Kepler (PCIe Gen3)
2016: POWER8 with NVLink (CAPI over PCIe Gen3); Mellanox ConnectX-4 EDR Infiniband; NVIDIA Pascal (NVLink)
2017: POWER9 (enhanced CAPI over PCIe Gen4 & enhanced NVLink); Mellanox ConnectX-5 next-gen Infiniband; NVIDIA Volta (enhanced NVLink)
© International Business Machines (IBM) 2017
  • 46. NOTICES AND DISCLAIMERS 46 Copyright © 2016 by International Business Machines Corporation (IBM). No part of this document may be reproduced or transmitted in any form without written permission from IBM. U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM. Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM THE USE OF THIS INFORMATION, INCLUDING BUT NOT LIMITED TO, LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT OR LOSS OF OPPORTUNITY. IBM products and services are warranted according to the terms and conditions of the agreements under which they are provided. IBM products are manufactured from new parts or new and used parts. In some cases, a product may not be new and may have been previously installed. Regardless, our warranty terms apply.” Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice. Performance data contained herein was generally obtained in a controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary. References in this document to IBM products, programs, or services does not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business. 
Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation. It is the customer’s responsibility to ensure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer is in compliance with any law. © 2016 International Business Machines Corporation
  • 47. NOTICES AND DISCLAIMERS CONT’D. 47 Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right. IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DOORS®, Emptoris®, Enterprise Document Management System™, FASP®, FileNet®, Global Business Services®, Global Technology Services®, IBM ExperienceOne™, IBM SmartCloud®, IBM Social Business®, Information on Demand, ILOG, Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®, PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS, Sterling Commerce®, StoredIQ, Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. 
A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.