Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
OpenPOWER ADG + IBM Deep Learning Cluster Reference Architecture

Florin Manaila
Senior IT Architect and Inventor
Cognitive Systems (HPC and Deep Learning)
IBM Systems Hardware Europe
Welcome, everyone, to the AI and OpenPOWER event.
Founding Members in 2013
Ecosystem
This is What A Revolution Looks Like © 2018 OpenPOWER Foundation

Chip / SoC
Boards / Systems
I/O / Storage / Acceleration
Software
System / Integration
Implementation / HPC / Research

328+ Members
33 Countries
70+ ISVs

Active Membership From All Layers of the Stack
100k+ Linux applications running on Power
2,300 ISVs have written code on Linux

Partners bring systems to market:
150+ OpenPOWER Ready certified products
20+ systems manufacturers
40+ POWER-based systems shipping or in development
100+ collaborative innovations under way
POWER Roadmap
OpenPOWER in Action
Academic Membership
A*STAR, ASU, ASTRI, Moscow State University, Carnegie Mellon University, CDAC, Colorado School of Mines, CINECA, CFMS, Coimbatore Institute of Technology, Dalian University of Technology, GSIC, Hartree Centre, ICM, IIIT Bangalore, IIT Bombay, Indian Institute of Technology Roorkee, ICCS, INAF, FZ Jülich, LSU, BSC, Nanyang Technological University, National University of Singapore, NIT Mangalore, NIT Warangal, Northeastern University in China, ORNL, OSU, Rice, Rome HPC Center, LLNL, Sandia, SASTRA University, Seoul National University, Shanghai Jiao Tong University, SICSR, TEES, Tohoku University, Tsinghua University, University of Arkansas, SDSC, Unicamp, University of Central Florida, University of Florida, University of Hawaii, University of Hyderabad, University of Illinois, University of Michigan, University of Oregon, University of Patras, University of Southern California, TACC, Waseda University, IISc, Loyola, IIT Roorkee

Currently 100+ academic members in the OpenPOWER Foundation.
Goals of the Academia Discussion Group
§ Provide training and exchange of experience and know-how
§ Provide a platform for networking among academic members
§ Work on engagement of the HPC community
§ Enable co-design/development activities
OpenPOWER Foundation
A growing number of academic organizations have become members of the OpenPOWER Foundation.
The Academia Discussion Group provides a platform for training, networking, engagement, and enablement of co-design.
Those who have not yet joined are welcome to do so:
https://members.openpowerfoundation.org/wg/AcademiaDG/mail/index
The OpenPOWER AI virtual university focuses on bringing together industry, government, and academic expertise to connect and help shape the future of AI:
https://www.youtube.com/channel/UCYLtbUp0AH0ZAv5mNut1Kcg
IBM Deep Learning Cluster Reference Architecture
Distributed Deep Learning Approach
Single accelerator: 1x accelerator (longest training time)
Data parallel: 4x accelerators
Model parallel: 4x accelerators
Data and model parallel: 4x n accelerators (shortest training time)
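The trade-off above can be pictured with a minimal, framework-free sketch of synchronous data-parallel training: each worker computes a gradient on its own data shard, and an all-reduce averages the gradients before every update. The toy linear model and all names below are assumptions made for illustration, not part of the reference architecture.

```python
# Minimal sketch of synchronous data-parallel training (illustration only,
# not IBM code). Each "worker" owns a shard of the data, computes a local
# gradient, and an all-reduce averages the gradients before every update --
# the same pattern DL frameworks apply across GPUs or nodes.

def local_gradient(w, shard):
    # Gradient of the mean squared error of the toy model y = w * x.
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def train_data_parallel(data, n_workers=4, lr=0.003, steps=50):
    shards = [data[i::n_workers] for i in range(n_workers)]  # split the data
    w = 0.0                                                  # replicated model
    for _ in range(steps):
        grads = [local_gradient(w, s) for s in shards]       # parallel in reality
        g = sum(grads) / n_workers                           # all-reduce (average)
        w -= lr * g                                          # identical update on every worker
    return w

# Synthetic data from y = 3x; training recovers w close to 3.
data = [(x, 3.0 * x) for x in range(1, 21)]
print(round(train_data_parallel(data), 3))  # prints 3.0
```

With model parallelism the split is over layers rather than data shards; combining both gives the rightmost column above.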
Phases of AI development
Experimentation Phase
– Single-node
– Small-scale data
– Algorithm prototyping and hyper-parameter tuning
Scaling Phase
– Multi-node
– Medium-scale data (local SSDs or NVMe drives)
Production Phase
– Cluster deployment
– Upstream data pipeline
– Inference
Challenges in Deep Learning
§ Storage performance / data pipeline
§ Network performance
§ Orchestration
§ Management and monitoring of the cluster
§ Monitoring of DL training or DL inference
§ Scaling
§ Efficiency
§ Data ingest
§ ILM (information lifecycle management)
§ Backup
§ Rapid release rate of new DL frameworks and versions
§ Software refresh cycle
Deep Learning Scaling Challenges
§ Model replication
§ Device placement for variables
§ Fault tolerance
§ Sessions and servers
§ Monitoring training sessions
§ Data splitting
Some Data Scientist Considerations

Data size
– If the input data is especially large, the entire model and its working set might not fit onto a single GPU
– A shared file system is required if the number of records is prohibitively large
– If the number of records is large, convergence can be sped up using multiple GPUs or distributed models
Model size
– Splitting the model across multiple GPUs (model parallelism) is required if the network is larger than the memory of the GPU used
Number of updates
– Multi-GPU configurations on a single server (4, 6, 8 GPUs) should be considered when the number and size of updates are considerable
Hardware
– Network speed plays a crucial role in distributed model settings
– InfiniBand RDMA and MPI play an important role (MPI latency is 1–3 µs/message due to OS bypass)
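As a rough illustration, these rules of thumb can be written down as a tiny decision heuristic. The thresholds and function names below are assumptions made for the sketch, not product parameters.

```python
# Illustrative heuristic only: mapping the considerations above to a
# parallelization strategy. The threshold values are arbitrary examples.

def choose_strategy(model_gb, gpu_mem_gb, n_records, many_records=1_000_000):
    model_parallel = model_gb > gpu_mem_gb     # network larger than one GPU's memory
    data_parallel = n_records >= many_records  # enough records to shard usefully
    if model_parallel and data_parallel:
        return "data + model parallel"
    if model_parallel:
        return "model parallel"
    if data_parallel:
        return "data parallel"
    return "single accelerator"

# A 40GB model on 16GB GPUs with millions of records needs both kinds.
print(choose_strategy(model_gb=40, gpu_mem_gb=16, n_records=5_000_000))
```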
Standards
§ Mellanox InfiniBand
§ RDMA over InfiniBand
§ NVIDIA GPUs and related software
§ Containers
§ Workload managers (LSF, Slurm, Kubernetes, etc.)
§ xCAT
§ High-performance file system
§ Python 2.x and/or 3.x
§ DL frameworks (Caffe, TensorFlow, Torch, etc.)
§ SSD/NVMe
Functional Requirements
§ NVIDIA GPUs in SXM2 form factor
§ InfiniBand EDR interconnect with no oversubscription
§ Islands approach for large clusters
§ 1:2 InfiniBand oversubscription between islands
§ High-performance file system using SSDs, NVMe, or Flash
§ MPI
§ Job scheduler support for GPU-based containers
§ Job scheduler Python integration
§ DL framework support for NVLink
§ Distributed Deep Learning
§ Large Model Support
§ HDFS support
§ IPMI support
§ Management and monitoring of the infrastructure with xCAT or similar, with a web interface
§ Visualization of distributed deep learning training activities
Non-Functional Requirements
§ Accessibility
§ Auditability and control
§ Availability
§ Backup
§ Fault tolerance (e.g., operational system monitoring, measuring, and management)
§ Open-source frameworks
§ Resilience
§ Scalability in an integrated way (from 2 to 2,000 nodes)
§ Security and privacy
§ Throughput
§ Performance / short training times
§ Platform compatibility
Architecture Decisions
Containers vs Bare Metal
Architecture Decisions
Storage
Architecture for an experimental IBM Deep Learning System
Hardware Overview
Data scientists' workstations
Internal SAS drives and NVMe storage
POWER accelerated servers with GPUs
InfiniBand EDR point-to-point connection
Architecture for small IBM Deep Learning Cluster
Hardware Overview
Architecture for small IBM Deep Learning Cluster
Hardware Overview for fully containerized environment
Architecture for large IBM Deep Learning Cluster
Hardware Overview
Architecture for small to large IBM Deep Learning Cluster
Storage – Spectrum Scale
Powered by IBM Spectrum Scale: a global namespace with automated data placement and data migration across Flash, disk, tape, shared-nothing clusters, and a transparent cloud tier.

Access protocols:
– File: POSIX, NFS, SMB
– Block: iSCSI
– Object: Swift, S3
– Analytics: transparent HDFS, Spark
– OpenStack: Cinder, Glance, Manila

Data services: encryption, compression, Spectrum Scale RAID on JBOD/JBOF, worldwide data distribution (AFM) across sites, and a DR site via AFM-DR.

The deep learning cluster accesses the file system via native RDMA over InfiniBand; the tape and transparent cloud tiers hold long-term data only.
Architecture for large IBM Deep Learning Cluster
Compute (InfiniBand) Networking
Compute Island #1, Compute Island #2, …, Management and I/O Island #1
– L3 spine switches (L3-1 … L3-X) connect the islands
– Each compute island has L1 leaf switches (L1-1 … L1-Y) and L2 switches (L2-1 … L2-Z), with 18x links down to the compute nodes
– The management and I/O island has L2 switches with 18x links to the login/service nodes and 18x links to the IBM ESS

NOTE: The number of InfiniBand switches depends on the number of compute nodes, the required oversubscription, and the number of available IB ports per switch.
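To make that note concrete, here is a back-of-the-envelope sketch for sizing the leaf (L1) layer of a two-level fat tree built from 36-port switches. The formula and function name are assumptions made for illustration, not an IBM sizing tool.

```python
# Back-of-the-envelope leaf-switch count for a fat tree of 36-port switches
# (illustration only). With oversubscription r:1, a leaf splits its ports
# so that downlinks = r * uplinks.
import math

def leaf_switches(compute_nodes, ports_per_switch=36, oversub=1.0):
    downlinks = math.floor(ports_per_switch * oversub / (oversub + 1))
    return math.ceil(compute_nodes / downlinks)

print(leaf_switches(108))               # non-blocking: 18 down / 18 up -> 6 leaves
print(leaf_switches(108, oversub=2.0))  # 2:1: 24 down / 12 up -> 5 leaves
```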
Architecture for large IBM Deep Learning Cluster
Management Networking
Architecture for large IBM Deep Learning Cluster
Docker Containers (only for HPC-based customers)
Physical View of small IBM Deep Learning Cluster
Hardware rack view
Compute Nodes (9x)
• Shown with decorative bezel; hardware viewable behind bezel
Network Switch Location
• Shown with blank cover
• 3 EIA
Empty Space
• 2 EIA
• Space reserved in the back for power, cooling, cabling escape
Empty Space
• 1 EIA
• Space reserved in the back for power, cabling escape
Compute Nodes (9x)
• Shown with decorative bezel; hardware viewable behind bezel
Physical View of small IBM Deep Learning Cluster – Sample Scalability
Hardware rack view
Scale by a factor of:
– 2x storage (capacity and performance)
– 3.2x compute
– 1:1 IB oversubscription
Architecture for an experimental IBM Deep Learning System
Software Overview
Option 1: RHEL 7.5, Mellanox OFED 4, CUDA 9, cuDNN 7, IBM Spectrum MPI, PowerAI 5.1, Docker, Anaconda, nvidia-docker
Option 2: RHEL 7.5, Mellanox OFED 4, CUDA 9, cuDNN 7, Docker, ICP with Kubernetes, running PowerAI Base, PowerAI Vision, and DSX Local containers
Option 3: RHEL 7.5, Mellanox OFED 4, CUDA 9, cuDNN 7, IBM Spectrum MPI, PowerAI 5.1, Docker, Anaconda, nvidia-docker, IBM Spectrum LSF
Architecture for an experimental IBM Deep Learning System
Software Overview
Option 1 (containerized):
– Compute nodes: RHEL 7.5, Mellanox OFED 4, CUDA 9, cuDNN 7, Docker, ICP Compute, running PowerAI Base, PowerAI Vision, and DSX Local containers
– Master node: RHEL 7.5, Mellanox OFED 4, Docker, ICP Master with Kubernetes, plus xCAT and Grafana
Option 2 (LSF):
– Compute nodes: RHEL 7.5, Mellanox OFED 4, CUDA 9, cuDNN 7, IBM Spectrum MPI, PowerAI 5.1, Docker, Anaconda, nvidia-docker, LSF Client
– Master node: RHEL 7.5, Mellanox OFED 4, IBM Spectrum MPI, LSF Master
IBM Cloud Private Architecture Overview
Containerized environment based on Kubernetes
Architecture Overview for IBM Deep Learning Cluster
Hardware Components
§ Login Nodes (40c POWER9, 2x V100 GPUs, 256GB RAM, 2x 960GB SSD, IB EDR, 10GbE, 1Gbps)
§ Service/Master Nodes (40c POWER, 256GB RAM, 4x 960GB SSD, IB EDR, 10GbE)
§ CES Nodes (40c POWER, 256GB RAM, 2x 960GB SSD, IB EDR, 10GbE)
§ Compute/Worker Nodes (40c POWER9, 4x V100 GPUs, 512GB RAM, 2x 960GB SSD, 1x 1.6TB NVMe adapter, IB EDR, 1Gbps)
§ 36-port Mellanox InfiniBand EDR switches, including IB cables
§ IBM Ethernet switches for management (48x 1Gbps ports and 4x 10GbE ports), including cables and SFP+ modules
§ IBM ESS GS2S, with InfiniBand EDR and a 10GbE network for storage
IBM Newell
AC922 System Architecture Overview
Architecture Overview for IBM Deep Learning Cluster
Operational Model 1
Data scientists connect to 2x IBM AC922 systems via SSHv2 and HTTP, using the DIGITS web interface, the Python CLI, and AI Vision.
Architecture Overview for IBM Deep Learning Cluster
Operational Model 2
Data scientists connect to 2x IBM AC922 systems via SSHv2 and HTTP, using Jupyter Notebook, the Python CLI, and TensorBoard.
LSF New GPU Scheduling Options
GPU mode management
§ The user can request the desired GPU mode for the job. If the mode of a GPU needs to be changed for a job to run, its original mode is restored after the job completes.
GPU allocation policy
§ Support for reserving physical GPU resources
§ A "best-effort" GPU allocation policy that considers CPU-GPU affinity, the current GPU mode, and GPU job load
§ Exports CUDA_VISIBLE_DEVICES for use in job pre/post scripts
Integrated support for IBM Spectrum MPI
§ Exports per-task environment variables CUDA_VISIBLE_DEVICES%d
§ IBM Spectrum MPI applies the correct CVD mask to each task
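The per-task masking can be pictured with a small sketch: the launcher derives one CUDA_VISIBLE_DEVICES value per rank so each task sees only its assigned device. The round-robin policy and names below are assumptions made for the example, not Spectrum MPI internals.

```python
# Sketch of per-task GPU masking (illustration only, not LSF/Spectrum MPI
# code). Each launched rank gets an environment in which
# CUDA_VISIBLE_DEVICES exposes exactly one of the job's allocated GPUs.
import os

def mask_for_rank(rank, allocated_gpus):
    # Simple round-robin; a real scheduler also weighs CPU-GPU affinity,
    # the current GPU mode, and GPU job load.
    return str(allocated_gpus[rank % len(allocated_gpus)])

allocated = [0, 1, 2, 3]  # GPUs the scheduler reserved for this job
for rank in range(4):
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=mask_for_rank(rank, allocated))
    print(rank, env["CUDA_VISIBLE_DEVICES"])
```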
LSF Docker Support
Starting with LSF 10.1.0.3, LSF supports NVIDIA's distribution of Docker, which allows LSF's CPU, cgroup, and GPU allocation functionality to work correctly.

Begin Application
NAME = nvidia-docker
CONTAINER = nvidia-docker[image(nvidia/cuda) options(--rm --net=host --ipc=host --sig-proxy=false) starter(lsfadmin)]
End Application

$ bsub -app nvidia-docker -gpu "num=1" ./ibm-powerai
HW Design: Elastic Storage Server (ESS)
Software
§ IBM Spectrum Scale for IBM Elastic Storage Server
§ Red Hat Enterprise Linux
Data Server Summary
§ 2x 20-core POWER8, 3.42 GHz
§ 2x 256GB DDR4 memory
§ 4x 100Gb/s InfiniBand EDR
Storage SSD Enclosures
§ 2x 24x 3.84TB SSD (288 SSDs)
§ Approx. 128TB usable capacity (8+2 parity)
§ Burst buffer capacity: the sum of all NVMe devices in the compute nodes
HW Design: Burst Buffer Integration
§ The compute node SSD uses a standard XFS Linux file system
§ The burst buffer is a file transfer service
§ Raw data transfer uses NVMe over Fabrics (formerly called FlashDirect): think of it as RDMA targeting NVMe memory
§ Data is transferred between the ESS I/O node and the NVMe PCIe device
§ Data is placed directly onto (or pulled from) the NVMe PCIe device, avoiding CPU/GPU usage
§ Hardware offload support is built into ConnectX-5
§ File system: the burst buffer determines where to place data on the NVMe PCIe device, consistent with where the file system expects it, and is optimized for direct placement of data
Thank you
Florin Manaila
Senior IT Architect and Inventor
Cognitive Systems (HPC and Deep Learning)
florin.manaila@de.ibm.com
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
 
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systemsAI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
 
AI in healthcare - Use Cases
AI in healthcare - Use Cases AI in healthcare - Use Cases
AI in healthcare - Use Cases
 
AI in Health Care using IBM Systems/OpenPOWER systems
AI in Health Care using IBM Systems/OpenPOWER systemsAI in Health Care using IBM Systems/OpenPOWER systems
AI in Health Care using IBM Systems/OpenPOWER systems
 
AI in Healh Care using IBM POWER systems
AI in Healh Care using IBM POWER systems AI in Healh Care using IBM POWER systems
AI in Healh Care using IBM POWER systems
 
Poster from NUS
Poster from NUSPoster from NUS
Poster from NUS
 
SAP HANA on POWER9 systems
SAP HANA on POWER9 systemsSAP HANA on POWER9 systems
SAP HANA on POWER9 systems
 
Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9
 
AI in the enterprise
AI in the enterprise AI in the enterprise
AI in the enterprise
 
Robustness in deep learning
Robustness in deep learningRobustness in deep learning
Robustness in deep learning
 
Perspectives of Frond end Design
Perspectives of Frond end DesignPerspectives of Frond end Design
Perspectives of Frond end Design
 
A2O Core implementation on FPGA
A2O Core implementation on FPGAA2O Core implementation on FPGA
A2O Core implementation on FPGA
 

Recently uploaded

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 

Recently uploaded (20)

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 

Distributed deep learning reference architecture v3.2l

  • 1. Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation OpenPOWER ADG + IBM Deep Learning Cluster Reference Architecture — Florin Manaila Senior IT Architect and Inventor Cognitive Systems (HPC and Deep Learning) IBM Systems Hardware Europe
  • 2. OpenPOWER ADG 2Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
  • 3. Welcome to the AI and OpenPOWER event 3
  • 6. Chip / SOC This is What A Revolution Looks Like © 2018 OpenPOWER Foundation I/O / Storage / Acceleration Boards / Systems Software System / Integration Implementation / HPC / Research
  • 7. Chip / SOC This is What A Revolution Looks Like © 2018 OpenPOWER Foundation I/O / Storage / Acceleration Boards / Systems Software System / Integration Implementation / HPC / Research 328+ Members 33 Countries 70+ ISVs
  • 8. Chip / SOC This is What A Revolution Looks Like © 2018 OpenPOWER Foundation I/O / Storage / Acceleration Boards / Systems Software System / Integration Implementation / HPC / Research 328+ Members 33 Countries 70+ ISVs Active Membership From All Layers of the Stack 100k+ Linux Applications Running on Power 2300 ISVs Written Code on Linux Partners Bring Systems to Market 150+ OpenPOWER Ready Certified Products 20+ Systems Manufacturers 40+ POWER-based systems shipping or in development 100+ Collaborative innovations under way
  • 10. OpenPOWER in Action 10Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
  • 11. Academic Membership 11 A*STAR ASU ASTRI Moscow State University Carnegie Mellon Univ. CDAC Colorado School of Mines CINECA CFMS Coimbatore Institute of Technology Dalian University of Technology GSIC Hartree Centre ICM IIIT Bangalore IIT Bombay Indian Institute of Technology Roorkee ICCS INAF FZ Jülich LSU BSC Nanyang Technological University National University of Singapore NIT Mangalore NIT Warangal Northeastern University in China ORNL OSU RICE Rome HPC Center LLNL SANDIA SASTRA University Seoul National University Shanghai Jiao Tong University SICSR TEES Tohoku University Tsinghua University University of Arkansas SDSC Unicamp University of Central Florida University of Florida University of Hawaii University of Hyderabad University of Illinois University of Michigan University of Oregon University of Patras University of Southern California TACC Waseda University IISc, Loyola, IIT Roorkee Currently 100+ academic members in OPF
  • 12. Goals of the Academia Discussion Group 12 § Provide training and exchange of experience and know-how § Provide platform for networking among academic members § Work on engagement of HPC community § Enable co-design/development activities
  • 13. OpenPOWER Foundation 13 A growing number of academic organizations have become members of the OpenPOWER Foundation. The Academia Discussion Group provides a platform for training, networking, engagement and enablement of co-design. Those who have not yet joined: you are welcome to join https://members.openpowerfoundation.org/wg/AcademiaDG/mail/index OpenPOWER AI Virtual University focuses on bringing together industry, government and academic expertise to connect and help shape the AI future. https://www.youtube.com/channel/UCYLtbUp0AH0ZAv5mNut1Kcg
  • 14. IBM Deep Learning Cluster Reference Architecture 14Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
  • 15. Distributed Deep Learning Approach 15 SINGLE ACCELERATOR DATA PARALLEL MODEL PARALLEL DATA AND MODEL PARALLEL 1x Accelerator 4x Accelerators 4x Accelerators 4x n Accelerators Longer Training Time → Shorter Training Time (diagram: data shards distributed across System 1 … System n) Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
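The data-parallel column above can be sketched numerically. This is a hypothetical, NumPy-only illustration (the function names are mine, not part of any IBM or PowerAI API): each worker computes a gradient on its shard of the batch, and the shard gradients are averaged, which is the all-reduce step a real cluster performs over the interconnect.

```python
# Hypothetical sketch of one synchronous data-parallel SGD step (NumPy only).
# Function names are illustrative, not part of any IBM/PowerAI API.
import numpy as np

def local_gradient(w, x, y):
    # Mean-squared-error gradient for a linear model y ~ x @ w,
    # computed on one worker's shard of the batch.
    return 2.0 * x.T @ (x @ w - y) / len(x)

def data_parallel_step(w, x, y, n_workers, lr=0.01):
    # Split the global batch across workers (the "data parallel" case above).
    xs = np.array_split(x, n_workers)
    ys = np.array_split(y, n_workers)
    grads = [local_gradient(w, xi, yi) for xi, yi in zip(xs, ys)]
    # All-reduce: average the per-worker gradients, apply one shared update.
    return w - lr * np.mean(grads, axis=0)
```

With equal shard sizes the averaged gradient equals the single-accelerator gradient, so the update is mathematically unchanged while each worker's share of the compute shrinks.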
  • 16. Phases of AI development 16 Experimentation Phase – Single-node – Small scale data – Algorithm prototyping and hyper-parameters Scaling Phase – Multi-node – Medium scale data (local SSDs or NVMes) Production Phase – Cluster deployment – Upstream data pipeline – Inference Experimentation Scaling Production Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
  • 17. Challenges in Deep Learning 17 § Storage performance / Data-pipeline § Network performance § Orchestration § Management and monitoring of the cluster § Monitoring of DL training or DL inference § Scaling § Efficiency § Data ingest § ILM § Backup § Accelerated rate of new DL frameworks and versions § Software refresh cycle Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
  • 18. Deep Learning Scaling Challenges 18 § Model replication § Device placement for variables § Fault tolerance § Sessions and Servers § Monitoring training session § Data splitting Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
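Of these scaling challenges, data splitting is the easiest to illustrate concretely. A minimal sketch, assuming integer record indices and a fixed worker count (the helper name is hypothetical): round-robin sharding gives each worker a disjoint slice whose union covers every record exactly once per epoch.

```python
# Minimal sketch of the "data splitting" item above: round-robin sharding of
# record indices across workers. The helper name is hypothetical.
def shard_indices(num_records, num_workers, worker_rank):
    # Worker r takes records r, r + W, r + 2W, ... so shards are disjoint
    # and their union covers every record exactly once per epoch.
    return list(range(worker_rank, num_records, num_workers))
```

Real frameworks layer shuffling and checkpoint-aware resumption on top of this, but the contract is the same: no record read twice, none skipped.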
  • 19. Some Data Scientist considerations 19 Data Size – The entire model might not fit onto a single GPU if the size of the input data is especially large – A shared file system is required if the number of records is prohibitively large – If the number of records is large, convergence can be sped up using multiple GPUs or distributed models Model Size – Splitting the model across multiple GPUs (model parallel) is required if the size of the network is larger than the GPU's memory # Updates – Multiple GPU configurations on a single server (4, 6, 8) should be taken into consideration when the number and size of the updates are considerable Hardware – Network speeds play a crucial role in distributed model settings – InfiniBand RDMA and MPI play an important role (MPI latency is 1-3 µs/message due to OS bypass) Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
  • 20. Standards 20 § Mellanox InfiniBand § RDMA over InfiniBand § NVIDIA GPUs and related software § Containers § Workload Managers (LSF, SLURM, Kubernetes, etc.) § xCAT § High Performance File System § Python 2.x and/or 3.x § DL Frameworks (Caffe, TF, Torch, etc.) § SSD/NVMe Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
  • 21. Functional Requirements 21 § NVIDIA GPUs in SXM2 form factor § InfiniBand EDR interconnect with no over-subscription § Islands approach for large clusters § Inter-island 1:2 InfiniBand over-subscription § High performance file system using SSDs, NVMes or Flash § MPI § Job scheduler support for GPU-based containers § Job scheduler Python integration § DL frameworks support for NVLINK § Distributed Deep Learning § Large Model Support § HDFS support § IPMI support § Management and monitoring of the infrastructure with xCAT or similar and a web interface § Visualization of Distributed Deep Learning training activities Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
  • 22. Non-Functional Requirements 22 § Accessibility § Auditability and Control § Availability § Backup § Fault tolerance (e.g. Operational System Monitoring, Measuring, and Management) § Open Source Frameworks § Resilience § Scalability in an integrated way (from 2 nodes to 2000 nodes) § Security and Privacy § Throughput § Performance / short training times § Platform compatibility Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
  • 23. Architecture Decisions Containers vs Bare Metal 23Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
  • 24. Architecture Decisions Storage 24Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
  • 25. Architecture for an experimental IBM Deep Learning System Hardware Overview 26 Experimentation Scaling Production Data Scientists workstations Data Scientists Internal SAS drives and NVMes POWER Accelerated Servers with GPUs InfiniBand EDR P2P connection Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
  • 26. Architecture for small IBM Deep Learning Cluster Hardware Overview 27 Experimentation Scaling Production Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
  • 27. Architecture for small IBM Deep Learning Cluster Hardware Overview for fully containerized environment 28 Group Name / DOC ID / Month XX, 2017 / © 2017 IBM Corporation Experimentation Scaling Production
  • 28. 29 Experimentation Scaling Production Architecture for large IBM Deep Learning Cluster Hardware Overview Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
  • 29. Architecture for small to large IBM Deep Learning Cluster Storage – Spectrum Scale 30 Block iSCSI Data Scientists workstations Data Scientists and applications Traditional applications Global Namespace Analytics Transparent HDFS Spark OpenStack Cinder Glance Manila Object Swift S3 Transparent Cloud Powered by IBM Spectrum Scale Automated data placement and data migration Disk Tape Shared Nothing Cluster Flash Transparent Cloud Tier SMB NFS POSIX File Worldwide Data Distribution (AFM) Site B Site A Site C Encryption DR Site AFM-DR JBOD/JBOF Spectrum Scale RAID Compression Deep Learning Cluster Native RDMA over InfiniBand Long Term Only Experimentation Scaling Production Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
  • 30. Architecture for large IBM Deep Learning Cluster Compute (InfiniBand) Networking 31 Compute Island #1 Compute Island #2 Mng and IO Island #1 L3-1 L3-X L2-1 L2-Z 18x Links to Login, Srv 18x Links to IBM ESS L1-1 L1-Y L2-1 L2-Z 18x Links to Compute 18x Links to Compute L1-1 L1-Y L2-1 L2-Z 18x Links to Compute 18x Links to Compute NOTE: The number of InfiniBand switches depends on the number of compute nodes and the required oversubscription, as well as the number of available IB ports per switch Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
  • 31. Architecture for large IBM Deep Learning Cluster Management Networking 32Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
  • 32. Architecture for large IBM Deep Learning Cluster Docker Containers (only for HPC based customers) 33Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
  • 33. Physical View of small IBM Deep Learning Cluster Hardware rack view 34 Compute Nodes (9x) • Shown with decorative bezel • Hardware viewable behind bezel Network Switch Location • Shown with blank cover • 3 EIA Empty Space • 2 EIA • Space reserved in the back for power, cooling, cabling escape Empty Space • 1 EIA • Space reserved in the back for power, cabling escape Compute Nodes (9x) • Shown with decorative bezel • Hardware viewable behind bezel Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
  • 34. Physical View of small IBM Deep Learning Cluster – Sample Scalability Hardware rack view 35 Scale by factor of: - 2x Storage (capacity and performance) - 3.2x Compute - 1:1 IB Oversubscription Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
  • 35. Architecture for an experimental IBM Deep Learning System Software Overview 36 Experimentation Scaling Production RHEL 7.5 Mlnx OFED 4 CUDA 9 cuDNN 7 IBM Spectrum MPI PowerAI 5.1 Docker Anaconda Nvidia-Docker RHEL 7.5 Mlnx OFED 4 CUDA 9 cuDNN 7 Docker ICP with K8s PowerAI Base PowerAI Vision PowerAI Base PowerAI Base DSX Local RHEL 7.5 Mlnx OFED 4 CUDA 9 cuDNN 7 IBM Spectrum MPI PowerAI 5.1 Docker Anaconda Nvidia-Docker IBM Spectrum LSF Option 1 Option 2 Option 3 Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
  • 36. Architecture for an experimental IBM Deep Learning System Software Overview 37 Experimentation Scaling Production RHEL 7.5 Mlnx OFED 4 CUDA 9 cuDNN 7 Docker ICP Compute PowerAI Base PowerAI Vision PowerAI Base PowerAI Base DSX Local RHEL 7.5 Mlnx OFED 4 CUDA 9 cuDNN 7 IBM Spectrum MPI PowerAI 5.1 Docker Anaconda Nvidia-Docker LSF Client Option 1 Option 2 RHEL 7.5 Mlnx OFED 4 IBM Spectrum MPI LSF Master RHEL 7.5 Mlnx OFED 4 Docker ICP Master with K8s xCAT, Grafana Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
  • 37. IBM Cloud Private Architecture Overview Containerized environment based on Kubernetes 38Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
  • 38. Architecture Overview for IBM Deep Learning Cluster Hardware Components 39 § Login Nodes (40c POWER9, 2x V100 GPUs, 256GB RAM, 2x 960GB SSD, IB EDR, 10GE, 1Gbps) § Service/Master Nodes (40c POWER, 256GB RAM, 4x 960GB SSD, IB EDR, 10GE) § CES Nodes (40c POWER, 256GB RAM, 2x 960GB SSD, IB EDR, 10GE) § Compute/Worker Nodes (40c POWER9, 4x V100 GPUs, 512GB RAM, 2x 960GB SSD, 1x 1.6TB NVMe adapter, IB EDR, 1Gbps) § EDR Mellanox InfiniBand Switches with 36 ports, including IB cables § IBM Ethernet Switches for management (48 ports 1Gbps and 4x 10GE ports), including cables and SFP+ § IBM ESS GS2S, with InfiniBand EDR and 10GE network for storage Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
  • 39. IBM Newell AC922 System Architecture Overview 40Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
  • 40. Architecture Overview for IBM Deep Learning Cluster Operational Model 1 41 Data Scientists 2x IBM AC922 SSHv2 HTTP DIGITS Web CLI - Python AI Vision Experimentation Scaling Production Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
  • 41. Architecture Overview for IBM Deep Learning Cluster Operational Model 2 42 Data Scientists 2x IBM AC922 SSHv2 HTTP Jupyter Notebook CLI - Python TensorBoard Experimentation Scaling Production Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
  • 42. LSF New GPU Scheduling Options 43 GPU mode management § The user can request the desired GPU mode for the job. If the mode of a GPU needs to be changed for a job to run, its original mode will be restored after the job completes. GPU allocation policy § Support reserving physical GPU resources § Provide “best-effort” GPU allocation policy considering: CPU-GPU affinity, current GPU mode and GPU job load § Export CUDA_VISIBLE_DEVICES for use in job pre/post scripts Integrated support for IBM Spectrum MPI § Export per-task environment variables CUDA_VISIBLE_DEVICES%d § IBM Spectrum MPI will apply the correct CVD mask to each task Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
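A job's pre/post scripts can read the exported mask like this. A hedged sketch assuming integer device IDs (CUDA_VISIBLE_DEVICES may also carry GPU UUID strings, which this simple parser does not handle); the helper name is illustrative, not an LSF API.

```python
# Hedged sketch: reading the CUDA_VISIBLE_DEVICES mask that the scheduler
# exports for a task. Assumes integer device IDs; real masks may instead
# contain GPU UUID strings, which this simple parser does not handle.
import os

def visible_gpus(env=None):
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    return [int(d) for d in raw.split(",") if d.strip()]
```

Note that inside the job the CUDA runtime renumbers the visible devices from 0, so application code addresses local devices 0..N-1; the physical IDs in the mask matter mainly for logging and affinity checks.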
  • 43. LSF Docker Support 44 Starting with LSF 10.1.0.3, we provide support for NVIDIA's distribution of Docker, which allows LSF's CPU, cgroup and GPU allocation functionality to work correctly. Begin Application NAME = nvidia-docker CONTAINER = nvidia-docker [ image(nvidia/cuda) options(--rm --net=host --ipc=host --sig-proxy=false) starter(lsfadmin)] End Application $ bsub -app nvidia-docker -gpu "num=1" ./ibm-powerai Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
  • 44. HW Design: Elastic Storage Server (ESS) 45 Software § IBM Spectrum Scale for IBM Elastic Storage Server § Red Hat Enterprise Linux Data Server Summary § 2x 20 Cores POWER8 3.42 GHz § 2x 256GB DDR4 Memory § 4x 100Gb/s InfiniBand EDR Storage SSD Enclosures § 2x 24 3.84 TB SSD (288 SSD) § ca. 128TB usable capacity (8+2 parity) § Burst Buffer capacity - sum of all NVMes in the compute nodes Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
  • 45. HW Design: Burst Buffer Integration 46 § Compute Node SSD uses a standard XFS Linux file system § Burst buffer is a file transfer service § Raw data transfer uses NVMe over Fabrics § Formerly called FlashDirect § Think of it as: RDMA targeting NVMe memory § Data transferred between ESS I/O node and NVMe PCIe device § Data is directly placed onto NVMe PCIe device (or pulled from) § Avoiding CPU/GPU usage § Hardware offload support built into ConnectX-5 § File system § BB determines where to place data onto NVMe PCIe § Consistent with where the file system expects § Optimized for direct placement of data. NVMe PCIe Device Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
  • 46. Thank you 47 Florin Manaila Senior IT Architect and Inventor Cognitive Systems (HPC and Deep Learning) florin.manaila@de.ibm.com Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation
  • 47. 48Cognitive Systems / v3.1 / May 28 / © 2018 IBM Corporation