Yusuke Doi, Ph.D
Corporate Officer, VP of Computing Infrastructure, Preferred Networks, Inc.
MN-3, MN-Core and HPL
SC21 Green500 BOF
Who Are We?
Why Do We Need Computing Power?
Preferred Networks Inc.
Industry Domains: Transportation, Manufacturing, Life Sciences, Materials, Robots, Entertainment
Founded: March 2014
Directors:
● CEO Toru Nishikawa
● COO Daisuke Okanohara
● CTO Ryosuke Okuta
Located:
● Tokyo, Japan (HQ)
● Burlingame, CA, US (Preferred Networks America, Inc.)
Make the real world computable
How much information can you extract from a single image?
Our pixel-accurate object detection model extracts a large amount of rich features from a single image using:
● State-of-the-art algorithms
● Hyperparameter tuning and optimization using Optuna™
● Proprietary CG-based, annotation-free data generation and augmentation, combined with domain transfer to real images
● Distributed / large-batch training
Computational Chemistry with Deep Learning
Searching for new materials on computers
[Figure labels: behavior model with neural network; atom; energy and force; physical property; molecular dynamics]
Our Capabilities
● Deep Learning: World-class researchers focusing on deep learning
● Expertise: Wide range of deep expertise, from robotics to genomics to computational chemistry
● Private Supercomputer: World-class computational resources designed for deep learning applications
● Software: In-house development of OSS and a hyperparameter tuning library to accelerate software development
MN-3 and MN-Core: Deep Learning Supercomputer
MN-3 node specs
MN-Core: MN-Core Board x 4
CPU: Intel Xeon 8260M, 2-way (48 physical cores)
Memory: 384 GB DDR4
Storage Class Memory: 3 TB Intel Optane DC Persistent Memory
Network:
● MN-Core DirectConnect (112 Gbps) x 2
● Mellanox ConnectX-6 (100 GbE) x 2
● On board (10 GbE) x 2
Deep learning processor MN-Core
For more information please visit: https://projects.preferred.jp/supercomputers/en/
MN-3 and HPL
MN-3 is the world’s most energy-efficient supercomputer for deep learning.
We use HPL to understand how to run our computer efficiently.
Green500 / TOP500 history:
● 2021/11, 39.38 GFlops/W (Green500 #1 / TOP500 #301)
● 2021/06, 29.70 GFlops/W (Green500 #1 / TOP500 #335)
● 2020/11, 26.04 GFlops/W (Green500 #2 / TOP500 #330)
● 2020/06, 21.11 GFlops/W (Green500 #1 / TOP500 #393)
MN-Core
Giant SIMD Processor
● Single instruction stream
● 500 W/package @ 32.8 TF (DP)
○ 65 GF/W on chip (ideal case)
● Hierarchical structure with a unique on-chip network (broadcast, aggregation, etc.)
● Deterministic/transparent from software
○ No cache; software manages data copies between layers
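The "65 GF/W on chip (ideal case)" figure follows from the two numbers quoted on the same slide. A minimal sketch of that arithmetic, with hypothetical variable and function names chosen for illustration:

```python
# Figures quoted on the slide; the names are illustrative only.
PEAK_DP_TFLOPS = 32.8     # double-precision peak per package
PACKAGE_POWER_W = 500.0   # power per package


def gflops_per_watt(tflops: float, watts: float) -> float:
    """Convert a TFlops figure and a power draw into GFlops/W."""
    return tflops * 1000.0 / watts


ideal = gflops_per_watt(PEAK_DP_TFLOPS, PACKAGE_POWER_W)
print(f"{ideal:.1f} GF/W")  # 65.6 GF/W, quoted on the slide as 65 GF/W
```

This is a chip-level upper bound at peak throughput; the system-level Green500 figures also include everything else in the measured system and the achieved HPL efficiency.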
Idea Behind MN-Core: Transparent Hardware for High Performance
Philosophy behind MN-Core Hardware
By providing only the functions necessary for computation and controlling them completely with software, we can achieve high execution efficiency/power efficiency with minimal hardware. This is difficult to achieve with cache-based parallel processors whose behavior is hidden to software.
Prof. Makino (Kobe Univ.)
Power Measurements
Update to Level 3 power measurement
● Level 2 in our first Green500 (June 2020)
● Upgraded to meet Level 3 requirements
○ The #1 system should be one of the best examples in power measurement
● We have measured under Level 3 criteria since our second Green500 (Nov. 2020)
[Figure callout: upgrading power facility and measurement devices to meet Level 3 requirements]
Level 3 Measurements: Power System of the MN-3
[Diagram. Labels recovered from the figure:]
● Power feeds: 200 V 600 A (3P3W) x 2, 200 V 150 A (3P3W)
● MN-3A (Zone 0), 32 nodes: MN-3 nodes x 16, MN-3 nodes x 16
● MN-3A (Zone 1)
● MN-3 Interconnect
● Smart PDU x 10, Smart PDU x 8
● Revenue grade meter ME110SSR x 4
● Power Analyzer WT1800E (6 elements)
● MN-3 power measurement system: TSDB, HPL program, trigger, feedback via Slack bot and Web (Grafana)
Everybody can see results on Slack and Grafana dashboards
Measurement system supporting continuous improvements
● The more iterations, the more improvements
● Key to rapid iterations: how we quickly share the results of experiments
● Automated reporting system
○ Issues a unique ID to each HPL run
○ Records timestamp of core/full phase with the ID
○ Generates summary and graph of power measurements
○ Shares the results in Slack
● It helps us quickly understand the effects of development
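The reporting steps listed above can be sketched as follows. This is a minimal illustration and not the system described in the slides: the class, field, and function names are hypothetical, the power samples are synthetic, and posting to Slack or Grafana is left out.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class HplRun:
    """One benchmark run. All names here are hypothetical."""
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    phases: dict = field(default_factory=dict)  # phase name -> (start_s, end_s)

    def mark(self, phase: str, start_s: float, end_s: float) -> None:
        """Record the timestamps of a phase (e.g. "core" or "full")."""
        self.phases[phase] = (start_s, end_s)


def average_power_w(samples, start_s, end_s):
    """Time-weighted mean power over [start_s, end_s] (trapezoidal rule).

    samples: list of (timestamp_s, watts) tuples sorted by timestamp.
    """
    window = [(t, p) for t, p in samples if start_s <= t <= end_s]
    if len(window) < 2:
        raise ValueError("need at least two samples inside the phase")
    energy_j = sum((t1 - t0) * (p0 + p1) / 2.0
                   for (t0, p0), (t1, p1) in zip(window, window[1:]))
    return energy_j / (window[-1][0] - window[0][0])


def summarize(run, samples, rmax_gflops):
    """Build a plain-text summary that a bot could post somewhere."""
    lines = [f"HPL run {run.run_id}"]
    for phase, (start_s, end_s) in run.phases.items():
        watts = average_power_w(samples, start_s, end_s)
        lines.append(
            f"{phase}: {watts:.1f} W avg, {rmax_gflops / watts:.2f} GFlops/W")
    return "\n".join(lines)


# Synthetic data for illustration only; not measurements from MN-3.
samples = [(0.0, 1000.0), (10.0, 2000.0), (20.0, 2000.0), (30.0, 1000.0)]
run = HplRun()
run.mark("full", 0.0, 30.0)
run.mark("core", 10.0, 20.0)
print(summarize(run, samples, rmax_gflops=40000.0))
```

Keying every record to one run ID is what lets the timestamps, the power trace, and the HPL result be joined later without manual bookkeeping.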
Optimization
We targeted 40 GF/W with our 12 nm accelerator
MN-Core Challenge on HPL
● 2020/06, 21.11 GFlops/W, efficiency 41% (#1)
○ Initial challenge, made it work
● 2020/11, 26.04 GFlops/W, efficiency 53% (#2)
○ Optimization on scheduling, GEMM
● 2021/06, 29.70 GFlops/W, efficiency 58% (#1)
○ Even more optimization
● 2021/11, 39.38 GFlops/W, efficiency 64% (#1)
○ Interconnect improvement, aggressive software-level clock gating, even even more optimization
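One rough way to separate execution-efficiency gains from power-side gains is to divide each reported GFlops/W figure by its execution efficiency. This ratio is not reported in the slides; it is a derived quantity used here only for illustration, and it assumes that power draw would not change with efficiency.

```python
# (list date, GFlops/W, HPL execution efficiency) as quoted on the slide.
HISTORY = [
    ("2020/06", 21.11, 0.41),
    ("2020/11", 26.04, 0.53),
    ("2021/06", 29.70, 0.58),
    ("2021/11", 39.38, 0.64),
]


def per_unit_efficiency(gf_per_w: float, efficiency: float) -> float:
    """GFlops/W divided by execution efficiency (illustrative metric)."""
    return gf_per_w / efficiency


for date, gfw, eff in HISTORY:
    ratio = per_unit_efficiency(gfw, eff)
    print(f"{date}: {gfw:.2f} GF/W at {eff:.0%} -> {ratio:.1f}")
```

Under that assumption the ratio stays near 50 for the first three lists and rises to about 61.5 for 2021/11, which is in line with the slides attributing part of the last gain to power-side changes such as clock gating.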
Execution Efficiency Improvement Breakdown
58% → 64% (6pt gain)
● +2pt (58→60): Optimized the DGEMM kernel and reorganized the overlap of DGEMM and communication (swap and panel broadcast)
● +3pt (60→63): Increased the bandwidth of the interconnect (MN-Core DirectConnect) and overlapped more calculation and communication
● +1pt (63→64): Optimized other parts, including panel factorization and dynamic code generation
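The idea of overlapping computation with communication can be illustrated with a one-step lookahead: the transfer of the next panel is started before the update with the current panel begins. The sketch below only shows the scheduling pattern. The function names are hypothetical stand-ins, not HPL or MN-Core APIs, and Python threads are used purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def broadcast_panel(k: int) -> list:
    """Stand-in for communication: pretend to receive panel k."""
    return [k] * 4


def update_trailing(panel: list) -> int:
    """Stand-in for the DGEMM update that consumes a panel."""
    return sum(x * x for x in panel)


def run_sequential(steps: int) -> list:
    """Communicate, then compute, one step at a time."""
    return [update_trailing(broadcast_panel(k)) for k in range(steps)]


def run_overlapped(steps: int) -> list:
    """Start fetching panel k+1 before computing with panel k."""
    results = []
    if steps <= 0:
        return results
    with ThreadPoolExecutor(max_workers=1) as comm:
        pending = comm.submit(broadcast_panel, 0)
        for k in range(steps):
            panel = pending.result()  # wait until panel k has arrived
            if k + 1 < steps:
                # Communication for step k+1 is now in flight.
                pending = comm.submit(broadcast_panel, k + 1)
            # This update runs while the next transfer proceeds.
            results.append(update_trailing(panel))
    return results


print(run_overlapped(4) == run_sequential(4))
```

Both schedules produce the same results; the overlapped one hides transfer time behind computation whenever the transfer is shorter than the update, which is why higher interconnect bandwidth and more overlap both show up as efficiency gains.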
Power Efficiency Improvement Breakdown
29.70 GF/W → 39.38 GF/W (9.68 GF/W gain)
● +3.4 GF/W: Corresponding to the improvement of execution efficiency
● +4.4 GF/W: Generating "energy-efficient instructions" by software: stopping unused arithmetic units, using scratchpad FFs instead of SRAMs, reducing the energy consumption of data copy, etc.
● +1.9 GF/W: Other tuning, including the core voltage and frequency
[Figure callouts: stop an ALU in a PE; reuse a matrix as much as possible to reduce data copy]
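As a quick consistency check, the three itemized contributions can be summed and compared with the reported total gain. The dictionary keys below are shortened labels chosen for illustration; the numbers are the ones quoted on the slide.

```python
BEFORE_GFW, AFTER_GFW = 29.70, 39.38

# Itemized gains in GF/W as quoted on the slide (labels abbreviated).
CONTRIBUTIONS = {
    "execution efficiency": 3.4,
    "energy-efficient instructions": 4.4,
    "voltage/frequency and other tuning": 1.9,
}

total_gain = AFTER_GFW - BEFORE_GFW
itemized = sum(CONTRIBUTIONS.values())

for name, gain in CONTRIBUTIONS.items():
    print(f"{name}: +{gain:.1f} GF/W ({gain / total_gain:.0%} of the gain)")
print(f"itemized {itemized:.1f} vs reported {total_gain:.2f} GF/W")
```

The items sum to 9.7 GF/W, which matches the reported 9.68 GF/W gain to the one-decimal precision used in the breakdown. The software-generated "energy-efficient instructions" account for the largest share.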
Summary
● Unique computation framework of MN-Core
○ Deterministic and transparent hardware fully controlled by software
○ Application-specific optimization
● HPL is a very useful benchmark for understanding the efficiency of a new style of computer in a real environment
○ Precise and integrated measurement is essential for continuous improvement
Please visit us at booth #1521!
