① Wisteria/BDEC-01: the "Computation + Data + Learning" Integration Supercomputer System
② h3-Open-BDEC: an Innovative Software Infrastructure
③ The Next Move: Miyabi and Initiatives for the Future
Information Technology Center, The University of Tokyo
① Wisteria/BDEC-01: the "Computation + Data + Learning" Integration Supercomputer System
Information Technology Center, The University of Tokyo
[Timeline figure: Supercomputers @ ITC/U.Tokyo, 2001-2030; 3,000+ users, 55+% from outside U.Tokyo. Systems: Hitachi SR2201 (307.2 GF), Hitachi SR8000/MPP (2,073.6 GF), Hitachi SR8000 (1,024 GF), Hitachi SR11000 J1/J2 (IBM POWER5+; 5.35 TF, 18.8 TF), Hitachi SR16K/M1 "Yayoi" (IBM POWER7; 54.9 TF), Hitachi HA8000 "T2K Todai" (AMD Opteron; 140 TF), Fujitsu FX10 "Oakleaf-FX" (SPARC64 IXfx; 1.13 PF), Reedbush-U/H/L (SGI-HPE; Intel BDW + NVIDIA P100; 3.36 PF), Oakforest-PACS (Fujitsu; Intel Xeon Phi; 25.0 PF), Oakbridge-CX (Fujitsu; Intel CLX; 6.61 PF), Wisteria/BDEC-01 (Fujitsu; A64FX, Intel Icelake + NVIDIA A100; 33.1 PF), Miyabi/OFP-II (Intel SPR, NVIDIA GH200; 80+ PF), BDEC-02 (150+ PF). Storage: Ipomoea-01 (25 PB), Ipomoea-02, Ipomoea-03. Architecture eras marked along the timeline: pseudo-vector (SR8000/HARP-1E), multicore CPU, GPU/accelerators.]
3
The Future of Supercomputing
• Diversification of workloads
  – Computational science and engineering: simulations
  – Large-scale data analysis
  – AI, machine learning
• Integration of (Simulation (Computation) + Data + Learning)
  ⇒ effective for realizing Society 5.0
  – Fusion of physical space and cyberspace
    • S: Simulation (Computation)
    • D: Data
    • L: Learning
  – Simulation + Data + Learning = S+D+L
BDEC: S + D + L; mdx: S + D + L
4
• Both started operation in spring 2021 at the Kashiwa II campus
  – BDEC (Wisteria/BDEC-01): a "smart" supercomputer
  – Data Platform (mdx): cloud-like, more flexible
[Pie charts: research areas by machine hours on the CPU cluster Oakbridge-CX and the GPU cluster Reedbush-L. Categories: Engineering/Manufacturing, Earth/Space Science, Materials Science, Energy/Physics, Information Science (Systems, Algorithms, AI), Education, Industry, Biological Science/Biomechanics, Bioinformatics, Social Science & Economics, Data Science/Data Assimilation.]
Wisteria/BDEC-01
• Operation started on May 14, 2021
• 33.1 PF, 8.38 PB/sec by Fujitsu
– ~4.5 MVA with cooling, ~360 m²
5
• 2 Types of Node Groups
– Hierarchical, Hybrid, Heterogeneous (h3)
– Simulation Node Group: Odyssey
• Fujitsu PRIMEHPC FX1000 (A64FX), 25.9 PF
– 7,680 nodes (368,640 cores), Tofu-D
– General Purpose CPU + HBM
– Commercial Version of “Fugaku”
– Data/Learning Node Group: Aquarius
• Data Analytics & AI/Machine Learning
• Intel Xeon Ice Lake + NVIDIA A100, 7.2PF
– 45 nodes (90x Ice Lake, 360x A100), IB-HDR
• DL nodes are connected to external resources directly
• File Systems: SFS (Shared/Large) + FFS (Fast/Small)
[System diagram: the 1st BDEC (Big Data & Extreme Computing) system, an HW platform for integration of (S+D+L). Simulation nodes "Odyssey" (Fujitsu/Arm A64FX; 25.9 PF, 7.8 PB/s) and data/learning nodes "Aquarius" (Intel Ice Lake + NVIDIA A100; 7.20 PF, 578.2 TB/s) share the Fast File System (FFS; 1 PB, 1.0 TB/s) and the Shared File System (SFS; 25.8 PB, 500 GB/s), are interconnected at 2.0 TB/s, and reach external resources over the external network at 800 Gbps.]
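For reference, the headline figures quoted for the whole system are just the sums of the two node groups:
\[ 25.9\ \mathrm{PF} + 7.20\ \mathrm{PF} \approx 33.1\ \mathrm{PF}, \qquad 7.8\ \mathrm{PB/s} + 0.578\ \mathrm{PB/s} \approx 8.38\ \mathrm{PB/s}. \]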
64th TOP500 List (Nov. 2024), http://www.top500.org/
Columns: rank, system (year, country), site; computer/vendor; cores; Rmax (PFLOPS); Rpeak (PFLOPS) with HPL efficiency Rmax/Rpeak; GFLOPS/W; power (kW).
1. El Capitan (2024, USA), DOE/NNSA/LLNL: HPE Cray EX255a, AMD 4th Gen EPYC 24C 1.8 GHz, AMD Instinct MI300A, Slingshot-11, TOSS; 11,039,616 cores; Rmax 1,742.00 (= 1.742 EF); Rpeak 2,746.38 (63.4%); 58.99 GFLOPS/W; 29,581 kW
2. Frontier (2021, USA), DOE/SC/Oak Ridge National Laboratory: HPE Cray EX235a, AMD Optimized 3rd Gen. EPYC 64C 2 GHz, AMD Instinct MI250X, Slingshot-11; 9,066,176 cores; Rmax 1,353.00; Rpeak 2,055.72 (65.8%); 54.98 GFLOPS/W; 24,607 kW
3. Aurora (2023, USA), DOE/SC/Argonne National Laboratory: HPE Cray EX, Intel Exascale Compute Blade, Xeon CPU Max 9470 52C 2.4 GHz, Intel Data Center GPU Max, Slingshot-11, Intel; 9,264,128 cores; Rmax 1,012.00; Rpeak 1,980.01 (51.1%); 26.15 GFLOPS/W; 38,698 kW
4. Eagle (2023, USA), Microsoft: Microsoft NDv5, Xeon Platinum 8480C 48C 2 GHz, NVIDIA H100, NVIDIA InfiniBand NDR; 2,073,600 cores; Rmax 561.20; Rpeak 846.84 (66.3%)
5. HPC6 (2024, Italy), Eni S.p.A.: HPE Cray EX235a, AMD Optimized 3rd Gen. EPYC 64C 2 GHz, AMD Instinct MI250X, Slingshot-11, RHEL 8.9; 3,143,520 cores; Rmax 477.90; Rpeak 606.97 (78.7%); 56.48 GFLOPS/W; 8,461 kW
6. Fugaku (2020, Japan), R-CCS, RIKEN: Fujitsu PRIMEHPC FX1000, Fujitsu A64FX 48C 2.2 GHz, Tofu-D; 7,630,848 cores; Rmax 442.01; Rpeak 537.21 (82.3%); 14.78 GFLOPS/W; 29,899 kW
7. Alps (2024, Switzerland), Swiss National Supercomputing Centre (CSCS): HPE Cray EX254n, NVIDIA Grace 72C 3.1 GHz, NVIDIA GH200 Superchip, Slingshot-11; 2,121,600 cores; Rmax 434.90; Rpeak 574.84 (75.7%); 61.05 GFLOPS/W; 7,124 kW
8. LUMI (2023, Finland), EuroHPC/CSC: HPE Cray EX235a, AMD Optimized 3rd Gen. EPYC 64C 2 GHz, AMD Instinct MI250X, Slingshot-11; 2,752,704 cores; Rmax 379.70; Rpeak 531.51 (71.4%); 53.43 GFLOPS/W; 7,107 kW
9. Leonardo (2023, Italy), EuroHPC/Cineca: BullSequana XH2000, Xeon Platinum 8358 32C 2.6 GHz, NVIDIA A100 SXM4 64 GB, quad-rail NVIDIA HDR100; 1,824,768 cores; Rmax 241.20; Rpeak 306.31 (78.7%); 32.19 GFLOPS/W; 7,494 kW
10. Tuolumne (2024, USA), DOE/NNSA/LLNL: HPE Cray EX255a, AMD 4th Gen EPYC 24C 1.8 GHz, AMD Instinct MI300A, Slingshot-11, TOSS; 1,161,216 cores; Rmax 208.10; Rpeak 288.88 (72.0%); 61.45 GFLOPS/W; 3,387 kW
13. Venado (2024, USA), DOE/NNSA/LANL: HPE Cray EX254n, NVIDIA Grace 72C 3.1 GHz, NVIDIA GH200 Superchip, Slingshot-11; 481,440 cores; Rmax 98.51; Rpeak 130.44 (75.5%); 59.29 GFLOPS/W; 1,662 kW
16. CHIE-3 (2024, Japan), SoftBank Corp.: NVIDIA DGX H100, Xeon Platinum 8480C 56C 2 GHz, NVIDIA H100, InfiniBand NDR400, Ubuntu 22.04.4 LTS; 163,200 cores; Rmax 91.94; Rpeak 138.32 (66.5%)
17. CHIE-2 (2024, Japan), SoftBank Corp.: same configuration as CHIE-3; 163,200 cores; Rmax 89.78; Rpeak 138.32 (64.9%)
18. JETI (2024, Germany), EuroHPC/FZJ: BullSequana XH3000, Grace Hopper Superchip 72C 3 GHz, NVIDIA GH200 Superchip, quad-rail NVIDIA InfiniBand NDR200, RedHat Linux, Modular OS; 391,680 cores; Rmax 83.14; Rpeak 94.00 (88.4%); 63.43 GFLOPS/W; 1,311 kW
22. CEA-HE (2024, France), CEA: BullSequana XH3000, Grace Hopper Superchip 72C 3 GHz, NVIDIA GH200 Superchip, quad-rail BXI v2, EVIDEN; 389,232 cores; Rmax 64.32; Rpeak 103.48 (62.2%); 52.17 GFLOPS/W; 1,233 kW
28. Miyabi-G (2024, Japan), JCAHPC: Fujitsu, Supermicro ARS 111GL DNHR LCC, Grace Hopper Superchip 72C 3 GHz, InfiniBand NDR200, Rocky Linux; 80,640 cores; Rmax 46.80; Rpeak 72.80 (64.3%); 47.59 GFLOPS/W; 983 kW
36. TSUBAME 4.0 (2024, Japan), Institute of Science Tokyo: HPE Cray XD665, AMD EPYC 9654 96C 2.4 GHz, NVIDIA H100 SXM5 94 GB, InfiniBand NDR200; 172,800 cores; Rmax 39.62; Rpeak 61.60 (64.3%); 48.55 GFLOPS/W; 816 kW
Wisteria/BDEC-01 (Odyssey) (2021, Japan): Rpeak 25.95
Rmax: HPL (Linpack) performance (PFLOPS); Rpeak: peak performance (PFLOPS); power in kW
6
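As a sanity check on the Odyssey entry, its Rpeak follows from the node specification on slide 5, assuming the usual 32 double-precision flops/cycle/core of the A64FX (two 512-bit SVE FMA pipelines):
\[ 48\ \mathrm{cores} \times 2.2\ \mathrm{GHz} \times 32\ \mathrm{flop/cycle} \approx 3.38\ \mathrm{TF/node}, \qquad 7{,}680 \times 3.38\ \mathrm{TF} \approx 25.95\ \mathrm{PF}. \]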
Research Areas based on Machine Hours (FY.2023, April-March): ■CPU, ■GPU
7
[Pie charts for the CPU resources Odyssey (A64FX) and OBCX (Cascade Lake; retired at the end of September 2023) and the GPU resource Aquarius (A100). Labeled areas include Engineering, Earth/Space, Materials, Energy/Physics, Biophysics, Bioinformatics, AI, Data Analytics, Information Science (Systems/Algorithms), Education, Industry, Social Science & Economics, and Data.]
8
[Workflow diagram (Japanese version of slide 12): computational-science codes run on the simulation node group Odyssey (25.9 PF, 7.8 PB/s) and machine learning / DDA on the data/learning node group Aquarius (7.20 PF, 578.2 TB/s), with the FFS (1.0 PB, 1.0 TB/s) and SFS (25.8 PB, 0.50 TB/s) in between. Computation results and observation data feed data assimilation and data analysis; optimized models and parameters are returned to the simulations. External resources (servers, storage, DBs, sensor networks, etc.) are reached over the external network.]
② h3-Open-BDEC: an Innovative Software Infrastructure
Information Technology Center, The University of Tokyo
Innovative Simulation Methods for the Exascale Era by Integration of (Computation + Data + Learning)
10
• To sustain scientific discovery on exascale-class (Fugaku-class and beyond) supercomputers, we propose innovative simulation methods based on the integration of (Computation + Data + Learning) (S+D+L), bringing ideas from data science and machine learning into computational science.
  – "Innovative simulation methods for the exascale era by integration of (Computation + Data + Learning)" (JSPS Grant-in-Aid for Scientific Research (S), PI: Kengo Nakajima (ITC/U.Tokyo), FY2019-FY2023)
• Development of the innovative software infrastructure "h3-Open-BDEC": positioning the U.Tokyo BDEC system (Wisteria/BDEC-01), Fugaku, and similar systems as platforms for (S+D+L) integration, and aiming to draw out the full capability of supercomputers while minimizing computation and power consumption, the research focuses on two items:
  – Innovative numerical methods based on new computing principles: variable-precision arithmetic, accuracy verification, and automatic tuning
  – Innovative machine-learning methods based on the hierarchical data-driven approach (hDDA) and related techniques
  – Hierarchical, Hybrid, Heterogeneous ⇒ h3
h3-Open-BDEC
An innovative software infrastructure for integration of "Computation + Data + Learning"
JSPS Grant-in-Aid for Scientific Research (S) (FY2019-FY2023, PI: Kengo Nakajima)
https://h3-open-bdec.cc.u-tokyo.ac.jp/
① Innovative numerical methods based on new computing principles: variable-precision arithmetic, accuracy verification, and automatic tuning (a generic sketch of the variable-precision idea follows the module list below)
② Innovative machine-learning methods based on the hierarchical data-driven approach and related techniques
③ Software and utilities for heterogeneous environments (e.g. Wisteria/BDEC-01)
11
h3 = Hierarchical, Hybrid, Heterogeneous; BDEC = Big Data & Extreme Computing. Modules of h3-Open-BDEC:
• New principle for computations (numerical algorithms/libraries)
  – h3-Open-MATH: algorithms with high performance, reliability, and efficiency
  – h3-Open-VER: verification of accuracy
  – h3-Open-AT: automatic tuning
• Simulation + Data + Learning (application development frameworks)
  – h3-Open-APP (Simulation): application development
  – h3-Open-DATA (Data): data science
  – h3-Open-DDA (Learning): data-driven approach
• Integration + communications + utilities (control & utility)
  – h3-Open-SYS: control & integration
  – h3-Open-UTIL: utilities for large-scale computing
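As a concrete illustration of item ① (not code from h3-Open-MATH or h3-Open-VER, whose interfaces are not shown here), the following minimal NumPy sketch combines the ingredients in their simplest textbook form: the expensive solve runs in low precision, the residual check plays the role of accuracy verification, and the iteration count is the kind of knob an auto-tuner could adjust.

```python
# Mixed-precision solve with iterative refinement: a generic stand-in for the
# "variable precision + accuracy verification" idea, not the h3-Open-BDEC API.
import numpy as np

def mixed_precision_solve(A, b, tol=1e-12, max_iter=10):
    """Solve Ax = b: solve in float32, verify and refine the result in float64."""
    A32 = A.astype(np.float32)
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(max_iter):
        r = b - A @ x                                    # residual in float64 (verification)
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break                                        # accuracy target met
        dx = np.linalg.solve(A32, r.astype(np.float32))  # cheap low-precision correction
        x += dx.astype(np.float64)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 500)) + 500.0 * np.eye(500)   # well-conditioned test matrix
b = rng.standard_normal(500)
x = mixed_precision_solve(A, b)
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))        # relative residual
```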
Wisteria/BDEC-01: The First "Really Heterogeneous" System in the World
12
[Workflow diagram (English version of slide 8): simulation codes on the simulation nodes Odyssey exchange results, observation data, and optimized models and parameters with machine learning / DDA, data assimilation, and data analysis on the data/learning nodes Aquarius; external resources (servers, storage, DBs, sensors, etc.) are reached over the external network.]
[Wisteria/BDEC-01 platform diagram as on slide 5.]
h3-Open-UTIL/MP (h3o-U/MP)
(HPC+AI) Coupling
[Dr. H. Yashiro, NIES]
13
[Diagram: h3o-U/MP, with a Fortran<->Python adapter, couples an HPC application (Fortran) running as simulations on Odyssey with an analysis/ML application (Python) running on Aquarius; the huge amount of simulation data output is passed to the Python side for surrogate models, visualization, and statistics.]
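To make the Python side of such a coupling concrete, here is a hypothetical sketch of an ML application that receives fields from the Fortran code every step, trains a surrogate online, and sends predictions back. The coupler interface (get_field/put_field on a coupler object) is an assumption made for illustration; the actual h3-Open-UTIL/MP Python adapter (h3opp.py on slide 16) is not documented in this material.

```python
# Hypothetical ML-side loop for an h3-Open-UTIL/MP style coupling.
# "coupler" with get_field()/put_field() is an assumed stand-in for the real
# Python adapter, whose API is not shown here.
import torch
from torch import nn

class Surrogate(nn.Module):
    """Small MLP standing in for the 'surrogate model' box in the diagram."""
    def __init__(self, n_in: int, n_out: int, n_hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_in, n_hidden), nn.ReLU(),
                                 nn.Linear(n_hidden, n_out))

    def forward(self, x):
        return self.net(x)

def ml_side_loop(coupler, n_steps: int, n_in: int, n_out: int):
    """Each coupling step: receive fields, train online, return predictions."""
    model = Surrogate(n_in, n_out)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(n_steps):
        x = torch.as_tensor(coupler.get_field("input"), dtype=torch.float32)    # from the Fortran app
        y = torch.as_tensor(coupler.get_field("reference"), dtype=torch.float32)
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
        coupler.put_field("surrogate", model(x).detach().numpy())               # back to the Fortran app
    return model
```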
h3-Open-SYS/WaitIO-Socket
14
• Wisteria/BDEC-01
– Aquarius (GPU: NVIDIA A100)
– Odyssey (CPU: A64FX)
• Combining Odyssey-Aquarius
– Single MPI Job over O-A is
impossible
• Connection between Odyssey-
Aquarius
– IB-EDR with 2TB/sec.
– Fast File System
– h3-Open-SYS/WaitIO-Socket
• Library for Inter-Process
Communication through IB-
EDR with MPI-like interface
[Wisteria/BDEC-01 platform diagram as on slide 5, with the WaitIO-Socket connection highlighted between Odyssey and Aquarius.]
h3-Open-SYS/WaitIO-File
15
• Wisteria/BDEC-01
– Aquarius (GPU: NVIDIA A100)
– Odyssey (CPU: A64FX)
• Combining Odyssey-Aquarius
– Single MPI Job over O-A is
impossible
• Connection between Odyssey-
Aquarius
– IB-EDR with 2TB/sec.
– Fast File System
– h3-Open-SYS/WaitIO-File
• Library for Inter-Process
Communication through FFS
with MPI-like interface
[Wisteria/BDEC-01 platform diagram as on slide 5, with the WaitIO-File path through the Fast File System highlighted.]
h3-Open-UTIL/MP +
h3-Open-SYS/WaitIO-Socket
Available in June 2022
16
[Software-stack diagram. May 2021, MPI only: a Fortran application (NICAM) and a Python application (PyTorch) are coupled by h3-Open-UTIL/MP (Jcup, jcup_mpi_lib.f90, the h3open modules, and the h3opp.py / h3open_py.f90 Fortran<->Python adapters) over plain MPI, i.e. within a single node group. June 2022, coupler + WaitIO: an MPI+WaitIO wrapper under the same stack lets the coupled applications run across Odyssey and Aquarius over IB-EDR.]
17
API of h3-Open-SYS/WaitIO-Socket/-File
PB (Parallel Block): each coupled application
WaitIO API (description):
– waitio_isend: non-blocking send
– waitio_irecv: non-blocking receive
– waitio_wait: completion of waitio_isend / waitio_irecv
– waitio_init: initialization of WaitIO
– waitio_get_nprocs: number of processes in each PB (Parallel Block)
– waitio_create_group, waitio_create_group_wranks: create communication groups among PBs
– waitio_group_rank: rank ID within the group
– waitio_group_size: size of each group
– waitio_pb_size: size of the entire PB
– waitio_pb_rank: rank ID within the entire PB
[Sumimoto et al. 2021]
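WaitIO itself exposes an MPI-like Fortran/C interface; the following Python-style pseudo-code is only meant to show the order in which the calls from the table above would typically be used between two PBs (for example a simulation job on Odyssey and an ML job on Aquarius). The "waitio" module object and all argument names are assumptions made for this sketch.

```python
# Pseudo-code: typical WaitIO call sequence between two Parallel Blocks (PBs).
# The real library is MPI-like Fortran/C; this Python-flavored wrapper and its
# argument names (dest, source, group) are hypothetical.
import numpy as np

def exchange_step(waitio, my_pb, peer_pb, send_buf):
    waitio.waitio_init()                                  # initialize WaitIO
    n_peer = waitio.waitio_get_nprocs(peer_pb)            # processes in the peer PB
    grp = waitio.waitio_create_group([my_pb, peer_pb])    # group spanning both PBs
    me = waitio.waitio_group_rank(grp)                    # my rank within the group

    recv_buf = np.empty_like(send_buf)
    sreq = waitio.waitio_isend(send_buf, dest=me % n_peer, group=grp)    # non-blocking send
    rreq = waitio.waitio_irecv(recv_buf, source=me % n_peer, group=grp)  # non-blocking receive
    waitio.waitio_wait(sreq)                              # complete the send
    waitio.waitio_wait(rreq)                              # complete the receive
    return recv_buf
```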
Atmosphere-ML Coupling [Yashiro (NIES), Arakawa (ClimTech/U.Tokyo)]
• Motivation of this experiment
  – Two types of atmospheric models: cloud-resolving vs. cloud-parameterizing
  – A cloud-resolving model is difficult to use for climate simulation
  – A parameterized model relies on many assumptions
  – Idea: replace the cloud-process calculation of the low-resolution model with ML
[Diagram "applying ML to an atmospheric model": a high-resolution atmospheric model (convection-resolving mode) and a low-resolution model (convection-parameterization mode) are coupled to an ML application (Python), with and without grid remapping of the physical-process inputs and outputs; the callout "Replacing this part with AI" marks the parameterized physics. Coupling Phase 1: training with high-resolution NICAM data. Coupling Phase 2: replacing the physical process in low-resolution NICAM with machine learning. Node shares labeled in the figure: Odyssey 75%, Aquarius 25%, coupling via h3-Open-UTIL/MP (coupler) + h3-Open-SYS/WaitIO-Socket (~0%).]
Experimental Design
• Atmospheric model on Odyssey
  – NICAM: global non-hydrostatic model with an icosahedral grid
  – Resolution: horizontal 10,240; vertical 78
• ML on Aquarius
  – Framework: PyTorch
  – Method: three-layer MLP
  – Resolution: horizontal 10,240; vertical 78
• Experimental design
  – Phase 1: PyTorch is trained to reproduce the output variables of the cloud-physics subroutine from its input variables.
  – Phase 2: the output variables are reproduced from the input variables and the training results.
• Training data
  – Input: total air density (rho), internal energy (ein), density of water vapor (rho_q)
  – Output: tendencies of the input variables computed within the cloud-physics subroutine, i.e. \(\Delta\rho/\Delta t\), \(\Delta e_{\mathrm{in}}/\Delta t\), \(\Delta\rho_q/\Delta t\)
[Diagram: Phase 1 (training): the atmospheric model (convection scheme ON) on the simulation node Odyssey passes the inputs and outputs of its cloud-physics subroutine to the ML application (Python) on the data/learning node Aquarius. Phase 2 (test): the ML application returns the predicted outputs in place of the subroutine.]
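A minimal PyTorch sketch of the Phase-1 setup described above is shown below. The hidden width, learning rate, number of epochs, and the random stand-in data are assumptions (they are not specified here); in the actual experiment the training data are NICAM columns delivered from Odyssey through h3-Open-UTIL/MP + WaitIO.

```python
# Three-layer MLP mapping cloud-physics inputs (rho, ein, rho_q on 78 levels)
# to their tendencies, as in Phase 1. Sizes and hyperparameters are placeholders.
import torch
from torch import nn

N_LEV = 78                # vertical levels
N_IN = 3 * N_LEV          # rho, ein, rho_q for one column
N_OUT = 3 * N_LEV         # their tendencies

model = nn.Sequential(    # "three-layer MLP"
    nn.Linear(N_IN, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, N_OUT))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stand-in for one batch of the 10,240 horizontal columns.
x = torch.randn(10240, N_IN)
y_true = torch.randn(10240, N_OUT)

for epoch in range(10):                        # Phase 1: training
    loss = nn.functional.mse_loss(model(x), y_true)
    opt.zero_grad(); loss.backward(); opt.step()

y_pred = model(x)                              # Phase 2: prediction replaces the subroutine output
```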
Test Calculation
• The output variables are computed from the input variables with the trained PyTorch model
• The rough distribution of all variables is reproduced well
• The reproduction of extreme values is still poor
[Figure: maps of total air density, internal energy, and density of water vapor, comparing the input, the simulation output, and the ML output (prediction by ML/NN).]
③ The Next Move: Miyabi and Initiatives for the Future
Information Technology Center, The University of Tokyo
[Timeline figure repeated from slide 3: Supercomputers @ ITC/U.Tokyo, 2001-2030; 3,000+ users, 55+% from outside U.Tokyo.]
22
(Wisteria-) Mercury
• GPU cluster with 16 H100 GPUs, serving as a prototype system for OFP-II
• Operation from Nov. 2023; installed and operated by Nippon Comsys
• Per node: Intel Xeon Platinum 8468 (Sapphire Rapids, 48c, 2.1 GHz) x2, 512 GB DDR5-4800, NVIDIA H100 SXM5 x4; manufactured by Dell
• NVIDIA InfiniBand NDR (800 Gbps) + 400G Ethernet (RoCE)
• Lustre file system: 67 TB on NVMe SSD, plus 26 PB available from Ipomoea-01
[Network diagram: the 4 compute nodes, the login/management node, the fast storage, and the Ipomoea-01 large-scale common storage are interconnected with IB-NDR (800 Gbps), NDR400 x2, HDR200, HDR100 x8, 100G (x20), and 400G Ethernet (RoCEv2) links.]
Sep. 4th, 2023, 44th Advanced Supercomputing Environment (ASE) Seminar
23
Miyabi System
24
[Node diagram: NVIDIA GH200 Grace Hopper Superchip. Grace CPU (72c, 2.6 GHz) with 120 GB LPDDR5X (512 GB/s); Hopper GPU with 96 GB HBM3 (4.022 TB/s); NVLink-C2C (450 GB/s) between CPU and GPU; ConnectX-7 IB NDR HCA (IB NDR200, 200 Gbps) on PCIe Gen5 x8; 1.92 TB NVMe SSD on PCIe Gen4 x4.]
• Acc-Group: CPU+GPU: NVIDIA GH200
– Node: NVIDIA GH200 Grace-Hopper Superchip
• Grace: 72c, 3.456 TF, 120 GB, 512 GB/sec (LPDDR5X)
• H100: 66.9 TF DP-Tensor Core, 96 GB, 4,022 GB/sec (HBM3)
– Cache Coherent between CPU-GPU
• NVMe SSD for each GPU: 1.9TB, 8.0GB/sec, GPUDirect Storage
– Total (Aggregated Performance: CPU+GPU)
• 1,120 nodes, 78.8 PF, 5.07 PB/sec, IB-NDR 200
• CPU-Group: CPU Only: Intel Xeon Max 9480 (SPR)
– Node: Intel Xeon Max 9480 (1.9 GHz, 56c) x 2
• 6.8 TF, 128 GiB, 3,200 GB/sec (HBM2e only)
– Total
• 190 nodes, 1.3 PF, IB-NDR 200
• 372 TB/sec for STREAM Triad (Peak: 608 TB/sec)
Miyabi (1/2)
Operation starts in January 2025
25
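The aggregate Acc-Group and CPU-Group performance figures follow directly from the per-node numbers above:
\[ 1{,}120 \times (66.9 + 3.456)\ \mathrm{TF} \approx 78.8\ \mathrm{PF}, \qquad 190 \times 6.8\ \mathrm{TF} \approx 1.3\ \mathrm{PF}. \]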
Miyabi (2/2)
Operation starts in January 2025
26
• File system: DDN EXAScaler, Lustre FS
  – 11.3 PB (NVMe SSD), 1.0 TB/s; "Ipomoea-01" with 26 PB is also available
• All nodes are connected with full bisection bandwidth
  – (400 Gbps / 8) × (32×20 + 16×1) = 32.8 TB/s (spelled out below)
• Operation starts in January 2025; h3-Open-SYS/WaitIO will be adopted for communication between the Acc-Group and the CPU-Group
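Breaking that bisection-bandwidth figure down:
\[ \frac{400\ \mathrm{Gbps}}{8\ \mathrm{bit/byte}} = 50\ \mathrm{GB/s\ per\ link}, \qquad 32 \times 20 + 16 \times 1 = 656\ \mathrm{links}, \qquad 656 \times 50\ \mathrm{GB/s} = 32.8\ \mathrm{TB/s}. \]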
[Diagram: the Acc-Group (NVIDIA GH200 x 1,120; 78.2 PF, 5.07 PB/s) and the CPU-Group (Intel Xeon Max (HBM2e) x2 x 190; 1.3 PF, 608 TB/s) are connected to the file system (DDN EXAScaler, 11.3 PB, 1.0 TB/s) and to the Ipomoea-01 common shared storage (26 PB); link types labeled in the figure: IB-NDR (400 Gbps), IB-NDR200 (200 Gbps), IB-HDR (200 Gbps).]
27
64th TOP500 List (Nov. 2024), http://www.top500.org/
[Same list as on slide 6 (ranks 1-36), shown again with the Odyssey entry added:]
58. Wisteria/BDEC-01 (Odyssey) (2021, Japan): Fujitsu PRIMEHPC FX1000, A64FX 48C 2.2 GHz, Tofu-D; 368,640 cores; Rmax 22.12; Rpeak 25.95; 15.07 GFLOPS/W; 1,468 kW
Rmax: HPL (Linpack) performance (PFLOPS); Rpeak: peak performance (PFLOPS); power in kW
28
Detailed Plan for Porting
• Strong support from NVIDIA Japan
• 3,000+ OFP users: two categories of support
• Self-porting: various options
  – 1-week hackathon (online/hybrid) every 3 months, using Slack for communication
  – Monthly open consultation meeting via Zoom (non-users can join)
  – Portal site with useful information (in Japanese)
    • https://jcahpc.github.io/gpu_porting/
• Supported porting
  – Community codes with many users (17, next page) + OpenFOAM (by NVIDIA)
  – Budget for outsourcing
  – Started in October 2022: meetings every 3-4 months
  – Many members of the "supported porting" groups also join the hackathons
• Most of our users' codes are parallelized with MPI+OpenMP
  – OpenACC is recommended
29
30
Category: Name (Organization): Target, Method etc. [Language]
• Engineering (5)
  – FrontISTR (U.Tokyo): solid mechanics, FEM [Fortran]
  – FrontFlow/blue (FFB) (U.Tokyo): CFD, FEM [Fortran]
  – FrontFlow/red (AFFr) (Advanced Soft): CFD, FVM [Fortran]
  – FFX (U.Tokyo): CFD, Lattice Boltzmann Method (LBM) [Fortran]
  – CUBE (Kobe U./RIKEN): CFD, hierarchical Cartesian grid [Fortran]
• Biophysics (3)
  – ABINIT-MP (Rikkyo U.): drug discovery etc., FMO [Fortran]
  – UT-Heart (UT Heart, U.Tokyo): heart simulation, FEM etc. [Fortran, C]
  – Lynx (Simula, U.Tokyo): cardiac electrophysiology, FVM [C]
• Physics (3)
  – MUTSU/iHallMHD3D (NIFS): turbulent MHD, FFT [Fortran]
  – Nucl_TDDFT (Tokyo Tech): nuclear physics, time-dependent DFT [Fortran]
  – Athena++ (Tohoku U. etc.): astrophysics/MHD, FVM/AMR [C++]
• Climate/Weather/Ocean (4)
  – SCALE (RIKEN): climate/weather, FVM [Fortran]
  – NICAM (U.Tokyo, RIKEN, NIES): global climate, FVM [Fortran]
  – MIROC-GCM (AORI/U.Tokyo): atmospheric science, FFT etc. [Fortran77]
  – Kinaco (AORI/U.Tokyo): ocean science, FDM [Fortran]
• Earthquake (4)
  – OpenSWPC (ERI/U.Tokyo): earthquake wave propagation, FDM [Fortran]
  – SPECFEM3D (Kyoto U.): earthquake simulations, spectral FEM [Fortran]
  – hbi_hacapk (JAMSTEC, U.Tokyo): earthquake simulations, H-matrix [Fortran]
  – sse_3d (NIED): earthquake science, BEM (CUDA Fortran) [Fortran]
JHPC-quantum Project
31
32
R&D of a Quantum-Supercomputer Collaboration Platform for Expanding the Computable Domain
(Post-5G Information and Communication System Infrastructure Enhancement R&D Project / Development of Post-5G Information and Communication Systems)
Implementing organizations: RIKEN and SoftBank Corp., jointly with the University of Tokyo and Osaka University
Overview: Research and develop quantum-HPC collaboration system software that links quantum computers and supercomputers (HPC), and use it to build a quantum-supercomputer collaboration platform that enables computations in domains that have been out of reach for supercomputers alone. Demonstrate the advantage of quantum-HPC collaborative applications over computation on existing supercomputers alone, and develop technologies for deploying the quantum-HPC collaboration software running on this platform as services provided over post-5G networks.
1. Purpose of development
• Quantum computers operate on principles completely different from conventional computers and are expected to bring dramatic speed-ups in information processing; at present, however, it is difficult to achieve both scale-up and error correction of computation results at the same time, so practical use of quantum computers on their own is expected to take time.
• Meanwhile, as digitalization advances, improving information-processing capability is an urgent task and early use of quantum computers is being called for; combining them with classical computers is regarded as promising.
• In this project, ahead of the rest of the world, software, platforms, and applications for the collaborative use of quantum computers and supercomputers will be developed and built, and their effectiveness will be demonstrated as technologies to be deployed as services in the post-5G era.
2. Development items
• Quantum-HPC collaboration system software: system software that links supercomputers and quantum computers and uses the most suitable computing resources seamlessly and efficiently.
• Modular quantum software library: modular software tailored to application fields, plus upper-level software libraries realizing error mitigation and circuit optimization matched to the characteristics of quantum computers; combining the modules enables the development of advanced quantum applications.
• Cloud technology for the quantum-supercomputer collaboration platform: cloud infrastructure software that supports the use of quantum applications, with future commercial deployment in mind.
3. Configuration of the quantum-supercomputer collaboration platform
• Two quantum computers with different characteristics will be installed at the RIKEN Center for Computational Science (Kobe) and at RIKEN Wako, and a platform will be built linking them with Fugaku and with the supercomputers of the University of Tokyo and Osaka University.
[Diagram: the quantum-supercomputer collaboration platform connects quantum computers (superconducting and ion-trap) and quantum-computation simulators (high-performance GPU systems and the large-scale simulator on the supercomputer Fugaku) with supercomputers, PCs/servers, and other classical computers through remote procedure calls, a collaboration scheduler, a quantum-HPC programming environment, quantum-supercomputer cloud technology, the quantum-HPC collaboration system software, the modular quantum-computation software library, and quantum-HPC program optimization (error mitigation, circuit optimization).]
• The quantum-supercomputer collaboration platform will start operation in FY2026 and will be used to demonstrate the effectiveness of quantum-HPC collaborative applications; a pre-release of the platform is planned for the second half of FY2028.
QC-HPC Hybrid Computing
• JHPC-quantum (FY.2023-FY.2028)
– https://jhpc-quantum.org/
– RIKEN, Softbank, U.Tokyo, Osaka U.
• supported by Japanese Government (METI/NEDO)
– Two Real Quantum Computers will be installed
• IBM’s Superconducting QC at RIKEN-Kobe (100+Qubit)
• Quantinuum’s Ion-Trap QC at RIKEN-Wako (20+Qubit)
– Target Applications
• Quantum Sciences, Quantum Machine Learning etc.
33
[Diagram: h3-Open-SYS/WaitIO spans CPUs (A64FX/Arm, x86 Intel/AMD), GPUs (NVIDIA, Intel, AMD, Arm), other accelerators (SambaNova, Cerebras, Graphcore, etc.), and quantum devices.]
• Role of U.Tokyo
– R&D on System SW for QC-HPC Hybrid Environment (QC as Accelerators)
• Extension of h3-Open-BDEC
• Fugaku & Miyabi will be connected to QCs in March 2026 (or before)
Wisteria/BDEC-01 with Odyssey-Aquarius
Simulator of (QC-HPC) Hybrid Environment
34
[Wisteria/BDEC-01 platform diagram as on slide 5: Odyssey and Aquarius with the FFS/SFS file systems and the external network.]
System SW for QC-HPC Hybrid Environment (1/2)
35
• Quantum computer = accelerator of supercomputers: QC-HPC hybrid
• System SW for efficient and smooth operation of the QC-HPC hybrid environment (QC = quantum computer, including simulators on supercomputers)
  – QHscheduler: a job scheduler that can simultaneously use multiple computing resources distributed across remote locations (a conceptual sketch follows the figure below)
  – h3-Open-BDEC/QH: coupling software to efficiently implement and integrate communication and data transfer between QC and HPC, online and in real time
  – Collaboration with RIKEN R-CCS, funded by the Japanese Government
• Target applications
  – AI for HPC, combined workloads
    • Simulations in computational science
    • Quantum machine learning
  – Quantum simulations, error correction
[Diagram: QHscheduler and h3-Open-BDEC/QH connect multiple HPC systems (1)-(3) with quantum computers (a), (b) and a QC simulated on HPC.]
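The following is a conceptual sketch only, not the QHscheduler API (which is not shown in this material): it illustrates the idea of one scheduler fanning work out to several remote HPC systems and QCs at the same time and gathering the results, using hypothetical submit_hpc()/submit_qc() helpers.

```python
# Conceptual illustration of co-scheduling remote HPC and QC resources.
# submit_hpc()/submit_qc() are placeholders, not real QHscheduler calls.
from concurrent.futures import ThreadPoolExecutor

def submit_hpc(site: str, job: str) -> str:
    # placeholder: submit a batch job to a remote HPC site and wait for its result
    return f"{site}:{job}:done"

def submit_qc(device: str, circuit: str) -> str:
    # placeholder: run a circuit on a remote QC (or on a QC simulator on HPC)
    return f"{device}:{circuit}:counts"

def hybrid_step(param: float) -> list[str]:
    """Fan one iteration out to HPC(1)-(3) and QC(a), (b) concurrently."""
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(submit_hpc, "HPC-1", f"simulation(param={param})"),
            pool.submit(submit_hpc, "HPC-2", "data-assimilation"),
            pool.submit(submit_qc, "QC-a", f"ansatz(param={param})"),
            pool.submit(submit_qc, "QC-b", f"ansatz(param={param})"),
        ]
        return [f.result() for f in futures]   # gather everything for the next iteration

print(hybrid_step(0.5))
```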
System SW for QC-HPC Hybrid Environment (2/2)
36
• Innovations
  – This is the world's first attempt to link multiple supercomputers and quantum computers installed at different sites in real time.
  – In particular, by using multiple QCs simultaneously, it is possible to form a virtual QC with higher processing capacity.
• Many groups around the world are thinking about the same thing
  – The idea can be extended to any type of system
[Diagram: as on the previous slide (QHscheduler and h3-Open-BDEC/QH connecting HPC (1)-(3), QC (a), (b), and QC-on-HPC), now with an AI component added.]
Wisteria/BDEC-01 with Odyssey-Aquarius
Simulator of the (QC-HPC) Hybrid Environment
37
• CUDA-Q has already been installed on Aquarius
• The QC-HPC hybrid environment is now under construction using h3-Open-BDEC
• 1-day tutorial on December 18 (Wed)
  – Let's experience "Quantum/HPC Hybrid" with "CUDA-Q + Wisteria/BDEC-01 + h3-Open-BDEC"!
  – World's 1st tutorial on "QC-HPC" with hands-on exercises
    • VQE (Variational Quantum Eigensolver): a toy problem
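For a flavor of what the hands-on VQE toy might look like, here is a minimal CUDA-Q Python sketch (the kind of two-qubit example used in CUDA-Q documentation). The Hamiltonian, ansatz, and the brute-force parameter scan are illustrative assumptions and may differ from what the tutorial actually uses; method names such as expectation() can also vary slightly between CUDA-Q versions.

```python
# Toy VQE-style example with CUDA-Q: minimize <H> over a one-parameter ansatz
# by a simple scan (a real VQE would use a classical optimizer instead).
import numpy as np
import cudaq
from cudaq import spin

@cudaq.kernel
def ansatz(theta: float):
    q = cudaq.qvector(2)
    x(q[0])                 # prepare |10>
    ry(theta, q[1])         # single variational rotation
    x.ctrl(q[1], q[0])      # entangle the two qubits

# Two-qubit test Hamiltonian (illustrative coefficients).
hamiltonian = (5.907 - 2.1433 * spin.x(0) * spin.x(1)
                     - 2.1433 * spin.y(0) * spin.y(1)
                     + 0.21829 * spin.z(0) - 6.125 * spin.z(1))

energies = [(cudaq.observe(ansatz, hamiltonian, float(t)).expectation(), float(t))
            for t in np.linspace(-np.pi, np.pi, 51)]
e_min, theta_min = min(energies)
print(f"minimum energy {e_min:.4f} at theta = {theta_min:.3f}")
```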
BDEC-02+mdx3
38
mdx I & II: Current Systems
• IaaS-based high-performance academic cloud
  • mdx I @ U.Tokyo (April 2023 ~ now)
    • VM hosting (on a cluster of 368 CPU nodes + 40 GPU nodes)
    • Peta-scale storage (Lustre + S3, around 25 PB in total)
    • Global IPs for web services
  • mdx II @ Osaka U. (2025)
    • 60 CPU nodes + 1 PB storage (+ GPU nodes under preparation)
  • https://mdx.jp
• Targeting data science and cross-disciplinary research
  • Around 120 research projects ongoing (April 2023 ~ now): computer science, LLM, ML, social science, life science, physics, chemistry, materials science, etc.
  • Used not only for traditional HPC workloads but also for data collection, web services, etc.
• Example
  • ARIM-mdx Data System: Nextcloud + Jupyter service for materials science
    • Data collection / analysis from experimental facilities & HPC systems
    • https://arim.mdx.jp/
39
mdx III: Key Concepts
• Container-based high-performance academic cloud
  • On top of a CPU/GPU cluster
  • VM hosting (mdx I & II) → container hosting (mdx III)
• Targets:
  • Data-centric projects (current) + ML inference (new)
• Key concepts:
  • Interactive
    • Jupyter, VSCode, remote desktop, etc.
  • Long-running
    • Web services, data collection, databases, inference deployment, etc.
  • HPC offloading & storage sharing
    • Large-scale batch workloads, training, simulation, etc.
  • Customizable
    • Custom containers, tenant isolation, etc.
  • (Inherited from mdx I & II)
40
[Diagram: on mdx III, data analysis/development environments, web servers, DB servers, and inference servers run as containers, ingest data from external data sources, and offload training and simulation workloads to BDEC-02.]
BDEC-02: 2nd System of the BDEC Project
• Operation starts in fall 2027 to spring 2028, hopefully
  • Wisteria/BDEC-01 retires at the end of April 2027
• University of Tokyo system with 200-250 PF (150 PF is realistic)
  • HPL performance is not a big issue
• "Real-time" integration of (S+D+L) towards "AI for Science"
• NVIDIA GPUs, or accelerators whose programming environments are compatible with OpenACC/StdPar
  • Fortran is still very important
• Connection to special devices/systems
  • Quantum computers etc.
  • h3-Open-BDEC, h3-Open-BDEC/QH
• Integration with mdx-Next (mdx3)
  • Cloud-like data platform
  • (At least part of the) storage to be shared
41
BDEC-02+mdx3 Concept
• The concept of BDEC-02/mdx3 is to save costs by sharing computing resources as much as possible and avoiding redundant configurations.
• Currently, Wisteria/BDEC-01 and mdx are separate systems in the same room.
42
[Diagram: today, Wisteria/BDEC-01 (Odyssey, Aquarius) and mdx each have their own nodes and routers to the Internet; in the planned Wisteria/BDEC-02 + mdx3 configuration, node groups A and B, the mdx3 nodes, a common router to the Internet, and the Ipomoea-01 (and Ipomoea-02) storage are shared.]
PCCC24 (24th PC Cluster Symposium): Information Technology Center, The University of Tokyo, Theme 3 "The Next Move: Miyabi and Initiatives for the Future"