This presentation was given at the Green500 BoF at SC21, where PFN's VP of Computing Infrastructure Yusuke Doi discussed power measurement for PFN's MN-3 supercomputer with MN-Core™ accelerators, and how the company improved MN-3's power efficiency from 29.70 GF/W to 39.38 GF/W in five months.
More about MN-Core: https://projects.preferred.jp/mn-core/en/
More about MN-3: https://projects.preferred.jp/supercomputers/en/
3. Preferred Networks, Inc.
Industry domains: transportation, manufacturing, life sciences, materials, robots, entertainment
Founded: March 2014
Directors: CEO Toru Nishikawa, COO Daisuke Okanohara, CTO Ryosuke Okuta
Locations: Tokyo, Japan (HQ); Burlingame, CA, US (Preferred Networks America, Inc.)
Make the real world computable
4. How much information can you extract from a single image?
Our pixel-accurate object detection model extracts a large amount of rich features from a single image using:
● State-of-the-art algorithms
● Hyperparameter tuning and optimization using Optuna™
● Proprietary CG-based, annotation-free data generation and augmentation, combined with domain transfer to real images
● Distributed / large-batch training
5. Computational Chemistry with Deep Learning
Behavior model with a neural network: searching for new materials on computers.
[Diagram: atoms → energy and force → molecular dynamics → physical properties]
6. Our Capabilities
Deep learning: world-class researchers focusing on deep learning
Expertise: a wide range of deep expertise, from robotics to genomics to computational chemistry
Private supercomputer: world-class computational resources designed for deep learning applications
Software: in-house development of OSS and a hyperparameter tuning library to accelerate software development
7. MN-3 and MN-Core: Deep Learning Supercomputer
MN-3 node specs:
● MN-Core (deep learning processor): MN-Core board x 4
● CPU: Intel Xeon 8260M, 2-way (48 physical cores)
● Memory: 384 GB DDR4
● Storage-class memory: 3 TB Intel Optane DC Persistent Memory
● Network: MN-Core DirectConnect (112 Gbps) x 2, Mellanox ConnectX-6 (100 GbE) x 2, on-board (10 GbE) x 2
For more information please visit: https://projects.preferred.jp/supercomputers/en/
8. MN-3 and HPL
MN-3 is the world's most energy-efficient supercomputer for deep learning.
We use HPL to understand how to run our computer efficiently.
Green500 / TOP500 history:
● 2021/11, 39.38 GFlops/W (Green500 #1 / TOP500 #301)
● 2021/06, 29.70 GFlops/W (Green500 #1 / TOP500 #335)
● 2020/11, 26.04 GFlops/W (Green500 #2 / TOP500 #330)
● 2020/06, 21.11 GFlops/W (Green500 #1 / TOP500 #393)
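The Green500 figure of merit is simply HPL Rmax divided by the average system power drawn while the benchmark runs. A minimal sketch of the calculation, with illustrative numbers rather than MN-3's actual Rmax and power:

```python
def gflops_per_watt(rmax_gflops: float, avg_power_watts: float) -> float:
    """Green500 energy efficiency: sustained HPL GFlops per watt of average power."""
    return rmax_gflops / avg_power_watts

# Illustrative only: a system sustaining 1,969,000 GFlops at 50 kW average power
# would report 39.38 GFlops/W.
print(gflops_per_watt(1_969_000, 50_000))
```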
9. MN-Core: Giant SIMD Processor
● Single instruction stream
● 500 W/package @ 32.8 TF (DP)
○ 65 GF/W on chip (ideal case)
● Hierarchical structure with a unique on-chip network (broadcast, aggregation, etc.)
● Deterministic and transparent from software
○ No cache; software must manage data copies between layers
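The ideal-case on-chip figure follows directly from the quoted package specs; a quick back-of-envelope check:

```python
# Ideal-case on-chip efficiency from the quoted package numbers:
peak_dp_tflops = 32.8   # double-precision peak per package
package_watts = 500.0   # power per package
gflops_per_watt = peak_dp_tflops * 1000.0 / package_watts
print(round(gflops_per_watt, 1))  # 65.6, i.e. the ~65 GF/W ideal case quoted above
```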
10. Idea Behind MN-Core: Transparent Hardware for High Performance
Philosophy behind MN-Core hardware:
"By providing only the functions necessary for computation and controlling them completely with software, we can achieve high execution efficiency and power efficiency with minimal hardware. This is difficult to achieve with cache-based parallel processors whose behavior is hidden from software."
— Prof. Makino (Kobe Univ.)
12. Update to Level 3 Power Measurement
● Level 2 in our first Green500 (June 2020)
● Upgraded to meet Level 3 requirements
○ The #1 system should be one of the best examples of power measurement
● We have measured to Level 3 criteria since our second Green500 (Nov. 2020)
[Photo: upgrading the power facility and measurement devices to meet Level 3 requirements]
13. Level 3 Measurements: Power System of MN-3
[Diagram: two 200 V / 600 A (3P3W) feeds and a 200 V / 150 A (3P3W) feed supply MN-3A Zone 0 and Zone 1 (32 nodes, as two groups of 16 behind Smart PDUs x 10 and x 8) and the MN-3 interconnect; power is measured with revenue-grade meters (ME110SSR x 4) and a power analyzer (WT1800E, 6 elements). Measurements flow into a TSDB; the HPL program triggers measurement, and feedback is delivered via a Slack bot and the web (Grafana).]
15. Measurement System Supporting Continuous Improvement
● The more iterations, the more improvements
● Key to rapid iteration: quickly sharing the results of experiments
● Automated reporting system
○ Issues a unique ID to each HPL run
○ Records timestamps of the core/full phases with the ID
○ Generates a summary and graph of power measurements
○ Shares the results in Slack
● It helps us quickly understand the effects of development
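The steps above can be sketched as follows. This is a hypothetical illustration, not PFN's actual tooling: the class and field names (`HplRun`, `record_phase`, `summarize`) are invented, and the numbers are synthetic. Each run gets a unique ID, phase timestamps are recorded against it, and a summary is built from the power samples that fall inside the core phase, ready to be posted to Slack.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class HplRun:
    # Unique ID issued to each HPL run, used to tag all records.
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex[:8])
    phases: dict = field(default_factory=dict)        # phase name -> (start, end)
    power_samples: list = field(default_factory=list) # (timestamp, watts)

    def record_phase(self, name: str, start: float, end: float) -> None:
        self.phases[name] = (start, end)

    def summarize(self, gflops: float) -> dict:
        """Average power over the 'core' phase and the resulting GFlops/W."""
        start, end = self.phases["core"]
        watts = [w for t, w in self.power_samples if start <= t <= end]
        avg_power = sum(watts) / len(watts)
        return {"run_id": self.run_id,
                "avg_power_w": avg_power,
                "gflops_per_w": gflops / avg_power}

# Synthetic example: a steady 50 kW draw during a core phase from t=100 to t=200.
run = HplRun()
run.record_phase("core", 100.0, 200.0)
run.power_samples = [(t, 50_000.0) for t in range(90, 210, 10)]
report = run.summarize(gflops=1_970_000.0)  # illustrative Rmax, not MN-3's figure
message = f"HPL run {report['run_id']}: {report['gflops_per_w']:.2f} GFlops/W"
# In the real system this summary would be posted to Slack (e.g. via a webhook).
```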
18. MN-Core Challenge on HPL
● 2020/06, 21.11 GFlops/W, efficiency 41% (#1)
○ Initial challenge; made it work
● 2020/11, 26.04 GFlops/W, efficiency 53% (#2)
○ Optimization of scheduling and GEMM
● 2021/06, 29.70 GFlops/W, efficiency 58% (#1)
○ Even more optimization
● 2021/11, 39.38 GFlops/W, efficiency 64% (#1)
○ Interconnect improvement, aggressive software-level clock gating, even even more optimization
19. Execution Efficiency Improvement Breakdown: 58% → 64% (6 pt gain)
● +2 pt (58 → 60): Optimized the DGEMM kernel and reorganized the overlap of DGEMM and communication (swap and panel broadcast)
● +3 pt (60 → 63): Increased the bandwidth of the interconnect (MN-Core DirectConnect) and overlapped more computation and communication
● +1 pt (63 → 64): Optimized other parts, including panel factorization and dynamic code generation
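Execution efficiency here is sustained HPL performance (Rmax) as a fraction of theoretical peak (Rpeak), so with peak unchanged, the step from 58% to 64% translates directly into more sustained GFlops:

```python
# With Rpeak fixed, Rmax scales linearly with execution efficiency.
speedup = 0.64 / 0.58
print(f"{speedup:.3f}")  # 1.103 -> about 10% more sustained GFlops
```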
20. Power Efficiency Improvement Breakdown: 29.70 GF/W → 39.38 GF/W (9.68 GF/W gain)
● +3.4 GF/W: Corresponding to the improvement in execution efficiency
● +4.4 GF/W: Generating "energy-efficient instructions" in software: stopping unused arithmetic units, using scratchpad flip-flops instead of SRAMs, reducing the energy consumption of data copies, etc.
● +1.9 GF/W: Other tuning, including core voltage and frequency
[Figures: stopping an ALU in a PE; reusing a matrix as much as possible to reduce data copies]
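As a sanity check, the three per-item gains (each rounded to one decimal) sum back to the headline improvement, to rounding:

```python
baseline = 29.70         # GFlops/W, June 2021 result
gains = [3.4, 4.4, 1.9]  # execution efficiency, energy-efficient instructions, voltage/freq tuning
total = baseline + sum(gains)
print(round(total, 2))   # 39.4; matches the reported 39.38 GF/W to rounding
```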
21. Summary
● The unique computation framework of MN-Core
○ Deterministic and transparent hardware, fully controlled by software
○ Application-specific optimization
● HPL is a very useful benchmark for understanding the efficiency of a new style of computer in a real environment
○ Precise and integrated measurement is essential for continuous improvement
Please visit us at booth #1521!