The document discusses IBM's Power Systems as an expert platform for artificial intelligence. Some key points:
- Power Systems are designed for modern AI workloads, with accelerated computing capabilities like GPUs and FPGAs.
- The IBM Power AC922 server provides an "acceleration superhighway" between CPUs, GPUs, and other accelerators for optimal AI performance.
- Tests show the AC922 can reduce AI model training times by 3.8x compared to x86 systems, thanks to features like high bandwidth NVLink connections between components.
- IBM's PowerAI software tools help make AI development easier on the Power platform.
8. “POWER9 is an absolute beast when it comes to moving data, critical for AI-centric processes.”
– Charles King, President and principal analyst, Pund-IT Inc.
“IBM’s POWER9 is literally the Swiss Army knife of ML acceleration as it supports an astronomical amount of IO and bandwidth, 10X of anything that’s out there today.”
– Patrick Moorhead, Principal analyst, Moor Insights & Strategy
“Google is excited about IBM’s progress in the development of the latest Power technology.”
– Bart Sano, VP of Google Platforms
“IBM Power is a great cognitive platform, if not the best out there.”
10. AC922
An acceleration superhighway — unleash accelerated computing potential in the post-CPU-only era.
Designed for the AI era — architected for the modern analytics and AI workloads that fuel insights.
Delivering enterprise-class AI — cutting-edge AI innovation data scientists desire, with the dependability IT requires.
14. PowerAI package & tools
Democratization of AI
Power System AC922 server: POWER9 CPUs with NVLink-attached NVIDIA GPUs
Frameworks and libraries: TensorFlow, Caffe, Theano, Chainer, Torch, DL4J, DIGITS, OpenBLAS, …
Tools: PowerAI DL Insight, Spectrum Conductor, PowerAI Vision
Data Science Experience (DSX)
15. Maximize research productivity running training for medical/satellite images with Caffe on the AC922
• 3.8x reduction in training time vs. tested x86 systems, running 1000 iterations to train on 2k x 2k images
• Critical machine learning (ML) capabilities such as regression, nearest neighbor, recommendation systems, clustering, etc. operate on more than just the GPU memory
• NVLink 2.0 enables enhanced host-to-GPU communication
• Large Model Support — use system memory and GPU memory together to support more complex and higher-resolution data
3.8x reduction in AI model training time vs. tested x86 systems
Results are based on IBM internal measurements running 1000 iterations of the Enlarged GoogLeNet model (mini-batch size = 5) on the Enlarged ImageNet dataset (2240x2240).
Power AC922: 40 cores (2 x 20c chips), POWER9 with NVLink 2.0, 2.25 GHz, 1024 GB memory, 4x Tesla V100 GPU; Red Hat Enterprise Linux 7.4 for Power Little Endian (POWER9) with CUDA 9.1 / cuDNN 7.
Competitive stack: 2x Intel Xeon E5-2640 v4, 20 cores (2 x 10c chips) / 40 threads, 2.4 GHz, 1024 GB memory, 4x Tesla V100 GPU; Ubuntu 16.04 with CUDA 9.0 / cuDNN 7.
Software: IBM Caffe with LMS. Source code: https://github.com/ibmsoe/caffe/tree/master-lms
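Large Model Support works by keeping the full set of tensors in the (larger) host memory and staging only a working set into GPU memory over NVLink. The following is a minimal conceptual sketch of that idea in plain Python — it is not IBM's LMS implementation, and the class name, byte sizes, and LRU eviction policy are all illustrative assumptions:

```python
from collections import OrderedDict

class LargeModelCache:
    """Illustrative sketch of the Large Model Support idea: all tensors live
    in host memory, and only a subset that fits the GPU budget is resident on
    the device at any time (LRU eviction stands in for LMS's real policy)."""

    def __init__(self, gpu_budget_bytes):
        self.gpu_budget = gpu_budget_bytes
        self.host = {}                 # all tensors: name -> size in bytes
        self.device = OrderedDict()    # GPU-resident subset, in LRU order
        self.transfers = 0             # host<->GPU copies (NVLink traffic)

    def register(self, name, nbytes):
        self.host[name] = nbytes

    def touch(self, name):
        """Make `name` resident on the GPU, evicting LRU tensors as needed."""
        if name in self.device:
            self.device.move_to_end(name)
            return
        need = self.host[name]
        while sum(self.device.values()) + need > self.gpu_budget:
            self.device.popitem(last=False)   # evict least recently used
        self.device[name] = need
        self.transfers += 1

# A "model" twice the size of GPU memory still runs, at the cost of transfers.
cache = LargeModelCache(gpu_budget_bytes=16)
for i in range(8):
    cache.register(f"layer{i}", nbytes=4)     # 32 bytes total > 16 budget
for step in range(2):                          # two forward passes
    for i in range(8):
        cache.touch(f"layer{i}")
print(cache.transfers)   # 16: every layer is re-staged on each pass
```

The fast CPU-GPU link matters precisely because of that last line: when the working set exceeds GPU memory, every pass pays for data staging, so transfer bandwidth becomes the bottleneck.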
[Chart: Caffe — more accuracy (3.8 iterations vs. 1). In the time a Xeon system with 4x V100 completes one training iteration, the AC922 with 4x V100 completes three full iterations plus 80% of a fourth, with accuracy improving on each run.]
17. POWER9
An acceleration superhighway. The only processor specifically designed for the AI era.
• 4x threads per core vs. x86
• Up to 9.5x more I/O bandwidth than x86
• 2.6x more RAM possible vs. x86
• 1st CPU to deliver PCIe Gen 4
18. Summit
AI at unrivaled scale
Feature                 | Titan                              | Summit
Application performance | Baseline                           | 5-10x Titan
Number of nodes         | 18,688                             | ~4,600
Node performance        | 1.4 TF                             | >40 TF
Total performance       | 27,122 TF                          | ~200,000 TF
Total system memory     | 710 TB                             | >10 PB DDR4 + HBM2 + non-volatile
Processors              | 1 AMD Opteron™ + 1 NVIDIA Kepler™  | 2 IBM POWER9™ + 6 NVIDIA Volta™
File system             | 32 PB, 1 TB/s, Lustre®             | 250 PB, 2.5 TB/s, GPFS™
Peak power consumption  | 9 MW                               | 15 MW
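As a sanity check, the table's raw numbers can be compared against its headline claim. The quoted 5-10x is application performance; a quick back-of-the-envelope in Python (using only figures from the table) shows the raw FLOPS ratio landing inside that range:

```python
# Cross-check the Summit-vs-Titan comparison using the table's own figures.
titan = {"nodes": 18_688, "node_tf": 1.4, "total_tf": 27_122, "power_mw": 9}
summit = {"nodes": 4_600, "node_tf": 40.0, "total_tf": 200_000, "power_mw": 15}

speedup = summit["total_tf"] / titan["total_tf"]      # ~7.4x raw FLOPS
node_ratio = summit["node_tf"] / titan["node_tf"]     # ~29x per node
power_ratio = summit["power_mw"] / titan["power_mw"]  # ~1.7x the power

print(f"total: {speedup:.1f}x, per-node: ~{node_ratio:.0f}x, "
      f"power: {power_ratio:.2f}x")
```

So Summit delivers roughly 7.4x Titan's aggregate FLOPS from a quarter of the nodes, for less than twice the power draw.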
19. POWER9 Family
2017:
• AC922 — 2-socket, Linux; scale-out 2-socket SMP with direct memory attach
2018:
• 2-socket Entry — robust 2-socket SMP, direct memory attach
• 4-socket Midrange
• 4- to 16-socket Modular High-end — large-scale multi-socket SMP, buffered memory attach
Across the family:
• More performance and scale via POWER9 processors
• More memory capacity for in-memory DB
• Reduced latency and improved throughput with enhanced I/O support: PCIe Gen 4, integrated NVMe flash (bootable)
• High-bandwidth (25 Gb/s) links for GPU/OpenCAPI acceleration
There are three key points about the AC922 that make it the best server for AI. First, as mentioned, it is designed from the ground up for AI workloads, starting with the acceleration superhighway: in the AC922 we have introduced second-generation NVLink between the CPU and GPU, which is 5.6x faster than the PCIe Gen 3 architecture to which x86 remains committed, among other advanced I/O interfaces. Second, we did not focus only on NVLink and the GPU, but designed a balanced system for the AI era, with industry-leading memory bandwidth and PCIe Gen 4 buses for the best network connectivity with InfiniBand and high-performance storage adapters. Lastly, we took the open-source deep learning frameworks and optimized them around this advanced design, added enhancements such as Spectrum Conductor for distributed deep learning (DDL) and Large Model Support, while supporting everything from the hardware to the software in the solution. The result is the best server and solution for enterprise AI. Additionally, this server design will find use in applications such as HPC and accelerated databases, so do not think it is just for AI.
When we say POWER9 is the ideal processor for acceleration, you simply have to look at the various advanced I/O interfaces that are only available on Power. If we normalize bandwidth to PCIe Gen 3 speeds and compare the other interfaces we have introduced into the POWER9 processor: PCIe Gen 4, for example, has double the bandwidth of the PCIe Gen 3 to which x86 remains committed. Additionally, Power is the only processor with NVLink between the processor and the accelerator, delivering up to 10 times the data bandwidth, all while providing a coherent interface.
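The normalization described above can be made concrete with a few lines of Python. The multipliers are the deck's own marketing figures (Gen 4 at 2x Gen 3, NVLink at "up to 10x"), not measured bandwidths:

```python
# Relative accelerator-attach bandwidth, normalized to PCIe Gen 3 = 1.0.
# Multipliers are the deck's own claims, not benchmark results.
PCIE_GEN3 = 1.0
interfaces = {
    "PCIe Gen 3 (x86 today)": PCIE_GEN3,
    "PCIe Gen 4 (POWER9)": 2.0 * PCIE_GEN3,           # double Gen 3
    "NVLink 2.0 (POWER9 CPU-GPU)": 10.0 * PCIE_GEN3,  # "up to 10x"
}
for name, rel in sorted(interfaces.items(), key=lambda kv: kv[1]):
    # Crude bar chart: 4 marks per normalized unit of bandwidth.
    print(f"{name:30s} {'#' * int(rel * 4)} {rel:.0f}x")
```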
One of the more prevalent AI frameworks is Caffe. This result, with Caffe and Large Model Support running on POWER9, shows the value of NVLink 2.0 and the performance it can deliver when the problem set becomes larger than the memory on the GPU cards. These tests move large amounts of data between the CPU and the GPU, and can reduce model training times by 3.8x.
NOTE: These are capabilities of the POWER9 CPU, not necessarily capabilities of the scale-up CPUs found in the AC922.
At the center of our differentiation is the processor. Everything starts from here, and it is designed for the cognitive era. POWER has always had a stronger core, with up to 4x the threads of x86. The architecture also delivers an advantage in memory bandwidth for a balanced system design, easing data movement within the system. One of the core differentiators POWER delivers is its advanced I/O interfaces. Last fall we introduced POWER8 with NVLink, the first processor with NVLink between the CPU and the GPU. With POWER9 we introduced more advanced interfaces: next-generation NVLink, PCIe Gen 4, and OpenCAPI.
2x core performance vs. x86
1.5x core performance vs. POWER8
2x more memory vs. POWER8