AI OpenPOWER Academia Discussion Group

OpenPOWER and AI Workshop
Ganesan Narayanasamy
IBM

Welcome to the AI and OpenPOWER Bootcamp

OpenPOWER & AI Workshop at BSC, Barcelona
By OpenPOWER Academia
18th and 19th June 2018
Day 1 is meant as an introduction for everyone interested in using AI.
Day 2 is meant to go deeper with those who have especially challenging projects.

Agenda
Day 1 - June 18th 2018
9:00 am to 9:30 am      Welcome and OpenPOWER ADG features
9:30 am to 10:15 am     Introduction to POWER9 and PowerAI
10:15 am to 10:30 am    Break
10:30 am to 11:15 am    Large Model Support and Distributed Deep Learning
11:15 am to 12:00 noon  Use Case Demonstration with PowerAI
12:00 noon to 1:00 pm   Lunch
1:00 pm to 1:45 pm      Mellanox Feature Updates
1:45 pm to 2:45 pm      CFD Simulation on Power
2:45 pm to 3:00 pm      Break
3:00 pm to 3:45 pm      Introduction to Snap Machine Learning
3:45 pm to 4:45 pm      Snap Machine Learning Demos, Q&A
4:45 pm to 5:00 pm      Wrap up and Q&A

Agenda
Day 2 - June 19th 2018
9:00 am to 9:30 am      Quick review of Day 1
9:30 am to 12:00 pm     Deep Learning Exercise II using Nimbix / other infrastructure
                        Industry-specific use cases (LMS)
12:00 pm to 1:00 pm     Lunch
1:00 pm to 4:30 pm      Deep Learning Exercise II using Nimbix / other infrastructure
                        Industry-specific use cases using P9 features (LMS and DDL)

This is What a Revolution Looks Like
© 2018 OpenPOWER Foundation
Ecosystem layers: Chip / SOC; Boards / Systems; I/O / Storage / Acceleration; Software; System / Integration; Implementation / HPC / Research

OpenPOWER Foundation: 328+ members, 33 countries, 70+ ISVs
Layers: Chip / SOC; Boards / Systems; Software
Active Membership From All Layers of the Stack
- 100k+ Linux applications running on Power
- 2300 ISVs have written code on Linux
Partners Bring Systems to Market
- 150+ OpenPOWER Ready certified products
- 20+ systems manufacturers
- 40+ POWER-based systems shipping or in development
- 100+ collaborative innovations under way

OpenPOWER in Action

What is CORAL?
The program through which Summit & Sierra are procured.
- Several DOE labs have strong supercomputing programs and facilities.
- To bring the next generation of leading supercomputers to these labs, DOE
created CORAL (the Collaboration of Oak Ridge, Argonne, and Livermore) to
jointly procure these systems and, in so doing, align strategy and resources
across the DOE enterprise.
- The DOE labs were grouped for collaboration based on common acquisition
timing. Collaboration is a win-win for all parties.
Systems: “Summit” and “Sierra”
OpenPOWER Technologies: IBM POWER CPUs, NVIDIA Tesla GPUs, Mellanox EDR
100Gb/s InfiniBand
Paving The Road to Exascale Performance

Academic Membership
- Currently more than 100 academic members in OPF, including:
A*STAR, ASU, ASTRI, Moscow State University, Carnegie Mellon Univ., CDAC,
Colorado School of Mines, CINECA, CFMS, Coimbatore Institute of Technology,
Dalian University of Technology, GSIC, Hartree Centre, ICM, IIIT Bangalore,
IIT Bombay, Indian Institute of Technology Roorkee, ICCS, INAF, FZ Jülich,
LSU, BSC, Nanyang Technological University, National University of Singapore,
NIT Mangalore, NIT Warangal, Northeastern University in China, ORNL, OSU,
RICE, Rome HPC Center, LLNL, SANDIA, SASTRA University,
Seoul National University, Shanghai Jiao Tong University, SICSR, TEES,
Tohoku University, Tsinghua University, University of Arkansas, SDSC, Unicamp,
University of Central Florida, University of Florida, University of Hawaii,
University of Hyderabad, University of Illinois, University of Michigan,
University of Oregon, University of Patras, University of Southern California,
TACC, Waseda University, IISc, Loyola

Goals of the Academia Discussion Group
- Provide training and exchange of experience and know-how
- Provide a platform for networking among academic members
- Work on engagement of the HPC community
- Enable co-design/development activities

Conclusions
- A growing number of academic organizations have become members of the
OpenPOWER Foundation.
- The Academia Discussion Group provides a platform for training,
networking, engagement and enablement of co-design.
- Those who have not yet joined are welcome to join:
https://members.openpowerfoundation.org/wg/AcademiaDG/mail/index
- The OpenPOWER AI Virtual University focuses on bringing together industry,
government and academic expertise to connect and help shape the future
of AI.
- https://www.youtube.com/channel/UCYLtbUp0AH0ZAv5mNut1Kcg

1. CPU
- POWER9 NX gzip has the potential, for workloads that operate on compressed
data, to reduce the memory footprint and I/O bottlenecks in the pre-processing
stage; it is not available today but will hopefully arrive soon.
- The CPU has direct access to GPU memory without the need for migration; this
is not yet used in the TensorFlow or Caffe components of PowerAI.
- VSX3 can accelerate media processing and pre-processing for computer
vision:
http://www.eecg.utoronto.ca/~moshovos/ACA06/readings/altivec.pdf
2. System Memory
- 8x DDR4 memory channels provide more memory bandwidth and help reduce
memory contention in AI workloads.
- Managed memory is cache-coherent between CPU and GPU; this is not yet used
in the TensorFlow or Caffe components of PowerAI.
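The claim that compressed workloads reduce memory footprint and I/O volume can be illustrated with a minimal sketch using Python's standard gzip module. This runs in software and does not use any POWER9 on-chip acceleration; the function name and the sample data are hypothetical and chosen only for illustration.

```python
import gzip


def compression_ratio(raw: bytes, level: int = 6) -> float:
    """Return original size divided by gzip-compressed size."""
    compressed = gzip.compress(raw, compresslevel=level)
    # Round-trip check: decompression must restore the exact input.
    assert gzip.decompress(compressed) == raw
    return len(raw) / len(compressed)


# Hypothetical, highly repetitive pre-processing input (CSV-like rows).
sample = b"timestamp,sensor_id,value\n" + b"2018-06-18,42,0.125\n" * 10_000
ratio = compression_ratio(sample)
print(f"{len(sample)} bytes compressed by a factor of {ratio:.1f}")
```

The ratio depends heavily on the data: repetitive text compresses well, while already-compressed images or video barely shrink.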

3. GPU
- NVLink 2.0 between the CPU and the GPU allows faster data movement from the
CPU to the GPU when datasets are large, in the range of terabytes.
- GPUDirect RDMA to unified memory; this does not appear to be used today in
the TensorFlow or Caffe components of PowerAI.
- Technologies such as LMS are the best fit for large models like deep residual
networks (e.g. ResNet-152):
https://arxiv.org/pdf/1803.06333
4. InfiniBand
- MPI / DDL / Horovod have the potential to exploit this unique multi-host
Socket Direct adapter and provide the lowest possible latency between many
learners during training, which should lead to lower training times. Possible
improvements in training efficiency over the existing research paper:
https://arxiv.org/pdf/1708.02188
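The communication step between learners that MPI, DDL, or Horovod run over the interconnect is, at its core, an all-reduce that averages gradients. The following is a minimal single-process sketch of that averaging only; it involves no network, and the function name `allreduce_mean` and the sample gradients are hypothetical, not part of any of those libraries.

```python
from typing import List


def allreduce_mean(gradients: List[List[float]]) -> List[List[float]]:
    """Simulate an all-reduce: every learner ends up with the element-wise mean."""
    if not gradients:
        return []
    n_learners = len(gradients)
    length = len(gradients[0])
    if any(len(g) != length for g in gradients):
        raise ValueError("all learners must hold gradients of the same length")
    mean = [sum(g[i] for g in gradients) / n_learners for i in range(length)]
    # Each learner receives its own copy of the averaged gradient.
    return [list(mean) for _ in range(n_learners)]


# Four hypothetical learners, each holding a two-element local gradient.
local_grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
synced = allreduce_mean(local_grads)
print(synced[0])
```

In a real cluster every learner blocks on this exchange at each training step, which is why interconnect latency directly affects training time.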

5. I/O
- PCIe Gen4 gives NVMe adapters more bandwidth (13.5 GB/s vs 6.8 GB/s with
PCIe Gen3) for caching datasets on compute nodes, closer to the GPUs; this
helps considerably when pre-fetching data into system memory.
- OpenCAPI provides more bandwidth for other types of accelerators, such as
FPGAs, giving the option of fast inference; possibly also other kinds of
DRAM in the future.
6. Others
- Water-cooled systems, available with 4 or 6 GPUs, make AI solutions much
more efficient at scale, taking into account a power consumption of 300 W
per GPU.
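The figures quoted above can be turned into a quick back-of-the-envelope calculation. This sketch uses only the numbers stated in the slides (13.5 GB/s, 6.8 GB/s, 300 W per GPU); the 1 TB dataset size and the function names are assumptions for illustration, and real transfers will see protocol and file-system overhead that is ignored here.

```python
GEN4_GB_PER_S = 13.5  # NVMe over PCIe Gen4, figure quoted in the slides
GEN3_GB_PER_S = 6.8   # NVMe over PCIe Gen3, figure quoted in the slides
GPU_WATTS = 300       # power consumption per GPU, figure quoted in the slides


def transfer_seconds(dataset_gb: float, bandwidth_gb_per_s: float) -> float:
    """Idealized time to stream a dataset at a sustained bandwidth."""
    return dataset_gb / bandwidth_gb_per_s


def gpu_power_watts(n_gpus: int, watts_per_gpu: int = GPU_WATTS) -> int:
    """Total GPU power draw for one node, ignoring CPU and other components."""
    return n_gpus * watts_per_gpu


dataset_gb = 1000.0  # hypothetical 1 TB dataset
t_gen4 = transfer_seconds(dataset_gb, GEN4_GB_PER_S)
t_gen3 = transfer_seconds(dataset_gb, GEN3_GB_PER_S)
print(f"Gen4: {t_gen4:.0f} s, Gen3: {t_gen3:.0f} s")
print(f"4 GPUs: {gpu_power_watts(4)} W, 6 GPUs: {gpu_power_watts(6)} W")
```

The speed-up is simply the ratio of the two bandwidths, so it is independent of the dataset size chosen.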
