OpenPOWER and AI Workshop
Ganesan Narayanasamy
IBM
Welcome to the AI and OpenPOWER Bootcamp
OpenPOWER & AI Workshop at BSC, Barcelona
By OpenPOWER Academia
Day 1 is meant as an introduction for everyone interested in using AI.
Day 2 is meant to go deeper with those who have especially challenging projects.
18th and 19th June 2018
Agenda
Day 1 - June 18th 2018
9:00 am to 9:30 am     Welcome and OpenPOWER ADG features
9:30 am to 10:15 am    Introduction to Power 9 and PowerAI
10:15 am to 10:30 am   Break
10:30 am to 11:15 am   Large Model Support and Distributed Deep Learning
11:15 am to 12:00 noon Use Case Demonstration with PowerAI
12:00 noon to 1:00 pm  Lunch
1:00 pm to 1:45 pm     Mellanox Feature Updates
1:45 pm to 2:45 pm     CFD Simulation on Power
2:45 pm to 3:00 pm     Break
3:00 pm to 3:45 pm     Introduction to Snap Machine Learning
3:45 pm to 4:45 pm     Snap Machine Learning Demos, Q&A
4:45 pm to 5:00 pm     Wrap-up and Q&A
Agenda
Day 2 - June 19th 2018
9:00 am to 9:30 am     Quick review of Day 1
9:30 am to 12:00 pm    Deep Learning Exercise II using Nimbix/other infrastructure; industry-specific use cases (LMS)
12:00 pm to 1:00 pm    Lunch
1:00 pm to 4:30 pm     Deep Learning Exercise II using Nimbix/other infrastructure; industry-specific use cases using P9 features (LMS and DDL)
Founding Members in 2013

Ecosystem
This is What A Revolution Looks Like © 2018 OpenPOWER Foundation
[Ecosystem diagram: Chip / SoC, Boards / Systems, System / Integration, I/O / Storage / Acceleration, Software, Implementation / HPC / Research]
328+ Members · 33 Countries · 70+ ISVs
Active Membership From All Layers of the Stack
100k+ Linux applications running on Power
2,300 ISVs have written code on Linux

Partners Bring Systems to Market
150+ OpenPOWER Ready certified products
20+ systems manufacturers
40+ POWER-based systems shipping or in development
100+ collaborative innovations under way
OpenPOWER in Action
What is CORAL?
The program through which Summit & Sierra are procured.
• Several DOE labs have strong supercomputing programs and facilities.
• To bring the next generation of leading supercomputers to these labs, DOE created CORAL (the Collaboration of Oak Ridge, Argonne, and Livermore) to jointly procure these systems and, in so doing, align strategy and resources across the DOE enterprise.
• The labs were grouped into the collaboration based on common acquisition timing; the collaboration is a win-win for all parties.

“Summit” System / “Sierra” System
OpenPOWER technologies: IBM POWER CPUs, NVIDIA Tesla GPUs, Mellanox EDR 100 Gb/s InfiniBand
Paving the Road to Exascale Performance
Academic Membership
• Currently more than 100 academic members in the OpenPOWER Foundation (OPF)
A*STAR, ASU, ASTRI, Moscow State University, Carnegie Mellon University, CDAC, Colorado School of Mines, CINECA, CFMS, Coimbatore Institute of Technology, Dalian University of Technology, GSIC, Hartree Centre, ICM, IIIT Bangalore, IIT Bombay, Indian Institute of Technology Roorkee, ICCS, INAF, FZ Jülich, LSU, BSC, Nanyang Technological University, National University of Singapore, NIT Mangalore, NIT Warangal, Northeastern University in China, ORNL, OSU, Rice, Rome HPC Center, LLNL, Sandia, SASTRA University, Seoul National University, Shanghai Jiao Tong University, SICSR, TEES, Tohoku University, Tsinghua University, University of Arkansas, SDSC, Unicamp, University of Central Florida, University of Florida, University of Hawaii, University of Hyderabad, University of Illinois, University of Michigan, University of Oregon, University of Patras, University of Southern California, TACC, Waseda University, IISc, Loyola, IIT Roorkee
Goals of the Academia Discussion Group
• Provide training and exchange of experience and know-how
• Provide a platform for networking among academic members
• Work on engagement of the HPC community
• Enable co-design/development activities
Conclusions
• A growing number of academic organizations have become members of the OpenPOWER Foundation.
• The Academia Discussion Group provides a platform for training, networking, engagement and enablement of co-design.
• Those who have not yet joined are welcome to join:
https://members.openpowerfoundation.org/wg/AcademiaDG/mail/index
• The OpenPOWER AI Virtual University focuses on bringing together industry, government and academic expertise to connect and help shape the future of AI:
https://www.youtube.com/channel/UCYLtbUp0AH0ZAv5mNut1Kcg
POWER9 Advantages (AC922)
1. CPU
- The POWER9 NX gzip accelerator has the potential, when working with compressed workloads, to reduce the memory footprint and I/O bottlenecks in the pre-processing stage; it is not available today, but hopefully will be soon.
- The CPU has direct access to GPU memory without the need for migration; not exploited today in the TensorFlow or Caffe parts of PowerAI.
- VSX-3 can accelerate media processing/pre-processing for computer vision:
http://www.eecg.utoronto.ca/~moshovos/ACA06/readings/altivec.pdf
2. System Memory
- 8x DDR4 memory channels provide high memory bandwidth and help prevent memory contention in AI workloads.
- Managed memory is cache-coherent between the CPU and GPU; not exploited today in the TensorFlow or Caffe parts of PowerAI.
3. GPU
- NVLink 2.0 between the CPU and GPU allows faster data movement from the CPU to the GPU when datasets are large, in the range of TBs.
- GPUDirect RDMA to unified memory; likely not exploited today in the TensorFlow or Caffe parts of PowerAI.
- Technologies such as LMS (Large Model Support) are a good fit for large models such as deep residual networks, e.g. ResNet-152 (see the sketch after this section):
https://arxiv.org/pdf/1803.06333
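As a rough illustration of the point above, here is a minimal sketch of enabling TensorFlow Large Model Support (TFLMS), assuming the TFLMS v1 interface from the PowerAI 1.5.x era; the import path, the LMS constructor argument and the 'adam_optimizer' scope name are assumptions and may differ between PowerAI releases, so check the release documentation.

# A hedged sketch: rewrite the TensorFlow graph with LMS so that tensors can
# be swapped between GPU memory and host memory over NVLink 2.0.
# ASSUMPTION: module path and API follow the TFLMS v1 (PowerAI 1.5.x) docs.
import tensorflow as tf
from tensorflow.contrib.lms import LMS  # assumed PowerAI-specific module

x = tf.placeholder(tf.float32, shape=[None, 784])
y = tf.placeholder(tf.int64, shape=[None])
hidden = tf.layers.dense(x, 4096, activation=tf.nn.relu)
logits = tf.layers.dense(hidden, 10)
loss = tf.losses.sparse_softmax_cross_entropy(labels=y, logits=logits)

with tf.name_scope('adam_optimizer'):
    train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

# Insert swap-out/swap-in nodes around the training scope before creating the
# session; after this rewrite, training proceeds exactly as usual.
lms_obj = LMS({'adam_optimizer'})
lms_obj.run(graph=tf.get_default_graph())

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # feed mini-batches to sess.run(train_op, feed_dict=...) as usual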
4. InfiniBand
- MPI / DDL / Horovod have the potential to exploit the unique multi-host Socket Direct adapter and provide the lowest possible latency between many learners during training, leading to lower training times (see the Horovod sketch after this section). Possible improvements in training efficiency over existing research:
https://arxiv.org/pdf/1708.02188
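To make the multi-learner point concrete, here is a minimal sketch of data-parallel training with Horovod over MPI (which in turn can run over the InfiniBand fabric described above); the model and dataset are placeholders chosen purely for illustration.

# A minimal Horovod sketch: one process ("learner") per GPU, gradients
# averaged across all learners every step.
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()

# Pin each learner to its local GPU.
config = tf.ConfigProto()
config.gpu_options.visible_device_list = str(hvd.local_rank())
tf.keras.backend.set_session(tf.Session(config=config))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# Scale the learning rate by the number of learners and wrap the optimizer so
# gradients are all-reduced across learners.
opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(0.01 * hvd.size()))
model.compile(loss='sparse_categorical_crossentropy', optimizer=opt)

# Keep all learners consistent by broadcasting the initial weights from rank 0.
callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]

(x, y), _ = tf.keras.datasets.mnist.load_data()
x = x.reshape(-1, 784).astype('float32') / 255.0
model.fit(x, y, batch_size=128, epochs=1, callbacks=callbacks,
          verbose=1 if hvd.rank() == 0 else 0)

Launched with, for example, mpirun -np 4 python train_hvd.py (the script name is illustrative), one process per GPU; the same script also runs under DDL's launcher on PowerAI systems.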
5. I/O
- PCIe Gen4 offers more bandwidth to NVMe adapters (13.5 GB/s vs 6.8 GB/s with PCIe Gen3), which is used to cache datasets on the compute nodes, closer to the GPUs; this helps greatly with pre-fetching data into system memory (see the data-staging sketch after this section).
- OpenCAPI provides more bandwidth for other types of accelerators such as FPGAs, giving the option of fast inference; possibly other kinds of DRAM in the future.
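A small sketch of the data-staging pattern this enables: streaming a dataset from shared storage, caching it on node-local NVMe, and pre-fetching batches into host memory with tf.data. The paths (/gpfs/..., /nvme/...) are illustrative assumptions, not fixed mount points.

# A minimal sketch: read TFRecords from a shared filesystem, cache them on
# fast node-local NVMe, and pre-fetch batches ahead of the GPU.
import tensorflow as tf

files = tf.data.Dataset.list_files('/gpfs/datasets/train-*.tfrecord')  # shared filesystem (assumed path)
dataset = (tf.data.TFRecordDataset(files, num_parallel_reads=8)
           .cache('/nvme/cache/train')   # spill the cache to local NVMe (assumed path)
           .shuffle(10000)
           .batch(256)
           .prefetch(4))                 # overlap input I/O with GPU compute

# iterate over `dataset` in the training loop as usual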
6. Others
- Water-cooled systems, available in 4-GPU and 6-GPU configurations, make AI solutions much more efficient at scale, taking into consideration the 300 W per-GPU power consumption.
THANK YOU!

AI OpenPOWER Academia Discussion Group
