16. h3-Open-BDEC
Innovative software infrastructure realizing the integration of "computation + data + learning"
JSPS Grant-in-Aid for Scientific Research (S), FY2019 through FY2023, PI: Kengo Nakajima
https://h3-open-bdec.cc.u-tokyo.ac.jp/
① Innovative numerical methods based on new principles of computing: variable-precision arithmetic, accuracy verification, and automatic tuning
② Innovative machine learning methods based on hierarchical data-driven approaches and related techniques
③ Software and utilities for heterogeneous environments (e.g., Wisteria/BDEC-01)
h3 = Hierarchical, Hybrid, Heterogeneous; BDEC = Big Data & Extreme Computing
h3-Open-BDEC consists of three groups of libraries:
• Numerical Alg./Library (New Principle for Computations)
– h3-Open-MATH: Algorithms with High Performance, Reliability, Efficiency
– h3-Open-VER: Verification of Accuracy
– h3-Open-AT: Automatic Tuning
• App. Dev. Framework (Simulation + Data + Learning)
– h3-Open-APP: Simulation (Application Development)
– h3-Open-DATA: Data (Data Science)
– h3-Open-DDA: Learning (Data Driven Approach)
• Control & Utility (Integration + Communications + Utilities)
– h3-Open-SYS: Control & Integration
– h3-Open-UTIL: Utilities for Large-Scale Computing
17. Wisteria/BDEC-01: The First “Really Heterogeneous” System in the World
Figure: Wisteria/BDEC-01, a platform for integration of (S+D+L), Big Data & Extreme Computing
• Simulation Nodes (Odyssey): Fujitsu/Arm A64FX, 25.9 PF, 7.8 PB/s
• Data/Learning Nodes (Aquarius): Intel Ice Lake + NVIDIA A100, 7.20 PF, 578.2 TB/s
• Odyssey and Aquarius are connected at 2.0 TB/s
• Fast File System (FFS): 1 PB, 1.0 TB/s
• Shared File System (SFS): 25.8 PB, 500 GB/s
• External network: 800 Gbps to external resources (servers, storage, DB, sensors, etc.)
• Workflow: simulation codes on Odyssey produce results; machine learning/DDA on Aquarius combines them with observation data for data assimilation and data analysis, and feeds optimized models and parameters back to the simulations
18. h3-Open-UTIL/MP (h3o-U/MP): (HPC+AI) Coupling [Dr. H. Yashiro, NIES]
Figure: h3o-U/MP couples an HPC application (Fortran) with an analysis/ML application (Python) through a Fortran-Python adapter. A huge amount of simulation data is output by the simulations on Odyssey and consumed by the AI/ML application on Aquarius for surrogate models, visualization, and statistics.
19. h3-Open-SYS/WaitIO-Socket
• Wisteria/BDEC-01
– Aquarius (GPU: NVIDIA A100)
– Odyssey (CPU: A64FX)
• Combining Odyssey-Aquarius
– A single MPI job over O-A is impossible
• Connection between Odyssey and Aquarius
– IB-EDR with 2 TB/sec
– Fast File System
– h3-Open-SYS/WaitIO-Socket: a library for inter-process communication through IB-EDR with an MPI-like interface
Figure: Wisteria/BDEC-01 system diagram (as on slide 17); WaitIO-Socket handles communication between Odyssey and Aquarius over the 2.0 TB/s interconnect.
20. h3-Open-SYS/WaitIO-File
• Wisteria/BDEC-01
– Aquarius (GPU: NVIDIA A100)
– Odyssey (CPU: A64FX)
• Combining Odyssey-Aquarius
– A single MPI job over O-A is impossible
• Connection between Odyssey and Aquarius
– IB-EDR with 2 TB/sec
– Fast File System
– h3-Open-SYS/WaitIO-File: a library for inter-process communication through the FFS with an MPI-like interface
Figure: Wisteria/BDEC-01 system diagram (as on slide 17); WaitIO-File handles communication between Odyssey and Aquarius through the Fast File System (FFS).
22. API of h3-Open-SYS/WaitIO-Socket/-File
PB (Parallel Block): each application
WaitIO API and description:
– waitio_isend: non-blocking send
– waitio_irecv: non-blocking receive
– waitio_wait: completion of waitio_isend/waitio_irecv
– waitio_init: initialization of WaitIO
– waitio_get_nprocs: number of processes for each PB (Parallel Block)
– waitio_create_group, waitio_create_group_wranks: creating communication groups among PBs
– waitio_group_rank: rank ID in the group
– waitio_group_size: size of each group
– waitio_pb_size: size of the entire PB
– waitio_pb_rank: rank ID of the entire PB
[Sumimoto et al. 2021]
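The listing above gives only the routine names. To make the usage pattern concrete, the following is a minimal Python sketch of the communication style WaitIO provides: non-blocking send/receive between two separately launched "parallel blocks", completed by an explicit wait. It is a conceptual analogue using plain TCP sockets and threads, not the actual WaitIO API (which is called from Fortran/C with the routines listed above); the host, port, and message layout are arbitrary choices for this toy example.

```python
# Conceptual analogue of the WaitIO pattern (NOT the real library API):
# two "parallel blocks" (PBs), launched as separate processes, exchange a
# message via non-blocking send/receive plus an explicit wait, mirroring
# waitio_isend / waitio_irecv / waitio_wait.
import socket
import sys
import threading

HOST, PORT = "127.0.0.1", 45321          # arbitrary endpoint for this sketch


class Request:
    """Handle returned by isend/irecv; wait() blocks until completion."""
    def __init__(self, fn):
        self.result = None
        self._t = threading.Thread(target=self._run, args=(fn,))
        self._t.start()

    def _run(self, fn):
        self.result = fn()

    def wait(self):                       # analogous to waitio_wait
        self._t.join()
        return self.result


def recv_exact(conn, n):
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def isend(conn, payload: bytes) -> Request:   # analogous to waitio_isend
    msg = len(payload).to_bytes(8, "big") + payload
    return Request(lambda: conn.sendall(msg))


def irecv(conn) -> Request:                   # analogous to waitio_irecv
    def _recv():
        size = int.from_bytes(recv_exact(conn, 8), "big")
        return recv_exact(conn, size)
    return Request(_recv)


if __name__ == "__main__":
    role = sys.argv[1]                        # "pb0" or "pb1"
    if role == "pb0":                         # e.g., the simulation PB
        with socket.create_server((HOST, PORT)) as srv:
            conn, _ = srv.accept()
            isend(conn, b"fields from the simulation PB").wait()
    else:                                     # e.g., the data/learning PB
        with socket.create_connection((HOST, PORT)) as conn:
            print("received:", irecv(conn).wait())
```

Running one instance with argument pb0 and another with pb1 mimics two separately submitted jobs, i.e., the two PBs connected by WaitIO.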
23. Atmosphere-ML Coupling [Yashiro (NIES), Arakawa (ClimTech/U.Tokyo)]
Motivation of this experiment
• Two types of atmospheric models: cloud-resolving vs. cloud-parameterizing
– Cloud-resolving models are difficult to use for climate simulation
– Parameterized models rely on many assumptions
• Idea: replace the cloud-process calculation in the low-resolution model with ML ("replacing this part with AI")
Diagram of applying ML to an atmospheric model
• High-resolution atmospheric model (convection-resolving mode): the input/output of its physical process is coupled to the ML app (Python) with grid remapping
• Low-resolution atmospheric model (convection-parameterization mode): coupled to the ML app without grid remapping
• Coupling Phase 1: training with high-resolution NICAM data
• Coupling Phase 2: replacing the physical process in low-resolution NICAM with machine learning
• Coupling: h3-Open-UTIL/MP (coupler) + h3-Open-SYS/WaitIO-Socket; node allocation in the figure: Odyssey 75%, Aquarius 25%, coupler ~0%
24. Experimental Design
Atmospheric model on Odyssey
• NICAM: global non-hydrostatic model with an icosahedral grid
• Resolution: horizontal: 10240, vertical: 78
ML on Aquarius
• Framework: PyTorch
• Method: three-layer MLP
• Resolution: horizontal: 10240, vertical: 78
Experimental design
• Phase 1: PyTorch is trained to reproduce the output variables from the input variables of the cloud physics subroutine.
• Phase 2: Reproduce the output variables from the input variables and the training results.
Training data
• Input: total air density (rho), internal energy (ein), density of water vapor (rho_q)
• Output: tendencies of the input variables computed within the cloud physics subroutine
Figure: Phase 1 (training phase) and Phase 2 (test phase). The atmospheric model (convection scheme ON) on the Odyssey simulation nodes exchanges the input and output of its cloud physics subroutine with the ML app (Python) on the Aquarius data/learning nodes; the ML output consists of the tendencies of rho, ein, and rho_q (Δrho, Δein, Δrho_q per ΔT).
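To make Phase 1 concrete, here is a minimal training sketch under stated assumptions: PyTorch and the three-layer MLP are from the slide, while the feature layout (the 3 input variables and their 3 tendencies stacked over 78 vertical levels, one sample per horizontal column), the hidden width of 256, the optimizer settings, and the random arrays standing in for the NICAM-derived training data are all placeholders.

```python
# Phase 1 sketch: train a three-layer MLP to reproduce the cloud physics
# subroutine's outputs (tendencies) from its inputs (rho, ein, rho_q).
import torch
import torch.nn as nn

N_COLS, N_LEV, N_VARS = 10240, 78, 3
N_IN = N_OUT = N_LEV * N_VARS            # 234 features in, 234 out (assumed layout)

# Stand-in for the training set delivered from the high-resolution NICAM run.
x = torch.randn(N_COLS, N_IN)            # rho, ein, rho_q per level, per column
y = torch.randn(N_COLS, N_OUT)           # tendencies from the cloud physics subroutine

# Three-layer MLP surrogate for the cloud physics subroutine (hidden width assumed).
model = nn.Sequential(
    nn.Linear(N_IN, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, N_OUT),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(x, y), batch_size=512, shuffle=True)

for epoch in range(10):
    for xb, yb in loader:
        opt.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        opt.step()
    print(f"epoch {epoch}: loss {loss.item():.4e}")

torch.save(model.state_dict(), "cloud_physics_mlp.pt")   # reused in Phase 2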
25. Test Calculation
• The output variables are computed from the input variables with the trained PyTorch model.
• The rough distribution of all variables is reproduced well.
• The reproduction of extreme values is poor.
Figure: input, simulation output, and ML output for total air density, internal energy, and density of water vapor (simulation vs. prediction by ML/NN).
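Phase 2 then uses the trained network in place of the cloud physics subroutine. A minimal inference-side sketch follows, with the same assumed feature layout and hidden width as in the training sketch above and random arrays standing in for the columns received from Odyssey via the coupler.

```python
# Phase 2 sketch: the trained MLP predicts the tendencies that would otherwise
# be computed by the cloud physics subroutine.  Feature layout and hidden width
# must match the (assumed) training configuration above.
import torch
import torch.nn as nn

N_COLS, N_LEV, N_VARS = 10240, 78, 3
N_IN = N_OUT = N_LEV * N_VARS

model = nn.Sequential(
    nn.Linear(N_IN, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, N_OUT),
)
model.load_state_dict(torch.load("cloud_physics_mlp.pt"))
model.eval()

# Stand-in for the input columns delivered from low-resolution NICAM on Odyssey.
columns = torch.randn(N_COLS, N_IN)
with torch.no_grad():
    tendencies = model(columns)   # returned to the model in place of the subroutine output
print(tendencies.shape)           # torch.Size([10240, 234])
```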
29. (Wisteria-) Mercury
• GPU cluster with 16 H100 GPUs, serving as a prototype system for OFP-II
• Operation will start in Nov. 2023; installed and operated by Nippon Comsys
• Compute nodes (4 nodes, manufactured by Dell): Intel Xeon Platinum 8468 (Sapphire Rapids, 48c, 2.1 GHz) x2, DDR5-4800 512 GB + NVIDIA H100 SXM5 x4
• NVIDIA InfiniBand-NDR (800 Gbps) + 400 G Ethernet (RoCEv2)
• Lustre file system: 67 TB of NVMe SSD (fast storage) + 26 PB available from Ipomoea-01 (large-scale common storage)
Figure: network configuration connecting the compute nodes, login/management nodes, fast storage, and Ipomoea-01 (IB-NDR 800 Gbps, NDR400 x2 per node, HDR200/HDR100 x8, 400G Ethernet (RoCEv2), 100G links, 100G x20 to Ipomoea-01).
30. OFP-II (1/2)
Bid opened on November 9th
• Group-A: CPU only: Intel Xeon Max 9480 (SPR)
– Node: Intel Xeon Max 9480 (1.9 GHz, 56c) x 2
  • 6.8 TF, 128 GiB, 3,200 GB/sec (HBM2e only)
– Total
  • 190 nodes, 1.3 PF, IB-NDR 200
  • 372 TB/sec for STREAM Triad (peak: 608 TB/sec)
• Group-B: CPU+GPU: NVIDIA GH200
– Node: NVIDIA GH200 Grace Hopper Superchip
  • Grace: 72c, 2.9 TF, 111.7 GiB, 512 GB/sec (LPDDR5X)
  • H100: 66.9 TF DP Tensor Core, 89.4 GiB, 4,022 GB/sec (HBM3)
  • NVMe SSD for each GPU: 1.9 TB, 8.0 GB/sec
– Total (aggregated performance: CPU+GPU)
  • 1,120 nodes, 78.2 PF, 5.07 PB/sec, IB-NDR 200
31. OFP-II (2/2)
Bid opened on November 9th
• File system: DDN EXAScaler, Lustre file system
– 10.3 PB (NVMe SSD), 1.0 TB/sec
– “Ipomoea-01” with 26 PB is also available
• All compute nodes in Group-A/B are connected with full bisection bandwidth
– (400 Gbps / 8) × (32 × 20 + 16 × 1) = 32.8 TB/sec (a quick check follows below)
• Operation starts in January 2025; h3-Open-SYS/WaitIO will be adopted for communication between Group-A and Group-B
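The bisection-bandwidth figure quoted above follows directly from the per-link data rate and the link count given in the formula; a short check in Python (the 32x20 + 16x1 link count is taken from the slide, not derived here):

```python
# Full bisection bandwidth of OFP-II, as quoted on the slide:
# 400 Gbps IB-NDR links = 50 GB/s each; 32 links x 20 groups + 16 links x 1 group.
link_gb_per_s = 400 / 8                        # GB/s per 400 Gbps link
n_links = 32 * 20 + 16 * 1                     # links crossing the bisection
print(link_gb_per_s * n_links / 1000, "TB/s")  # -> 32.8 TB/s
```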
Figure: IB-NDR (400 Gbps) fabric connecting Group-A (Intel Xeon Max (HBM2e) x 2 x 190 nodes; 1.3 PF, 608 TB/sec; IB-NDR200), Group-B (NVIDIA GH200 x 1,120; 78.2 PF, 5.07 PB/sec), the file system (DDN EXAScaler, 10.3 PB, 1.0 TB/sec), and the Ipomoea-01 common shared storage (26 PB; IB-HDR(200)).
32. Detailed Plan for Porting
• Strong support by NVIDIA Japan
• 3,000+ OFP users: two categories of support
• Self Porting: various options
– 1-week hackathon (online/hybrid) every 3 months, utilizing Slack for communication
– Monthly open meeting for consultation via Zoom (non-users can join)
– Portal site with useful information (in Japanese)
  • https://jcahpc.github.io/gpu_porting/
• Supported Porting
– Community codes with many users (17, next page) + OpenFOAM (by NVIDIA)
– Budget for outsourcing
– Started in October 2022: meetings every 3-4 months
– Many members of the “Supported Porting” groups also join the hackathons
• Most of our users’ codes are parallelized with MPI+OpenMP
– OpenACC is recommended
33. Community Codes for Supported Porting
Category / Name (Organizations) / Target, Method etc. / Language
Engineering (3)
– FrontISTR (U.Tokyo): Solid Mechanics, FEM (Fortran)
– FrontFlow/blue (U.Tokyo): CFD, FEM (Fortran)
– FrontFlow/red (Advanced Soft): CFD, FVM (Fortran)
Biophysics (3)
– ABINIT-MP (Rikkyo U.): Drug Discovery etc., FMO (Fortran)
– UT-Heart (UT Heart, U.Tokyo): Heart Simulation, FEM etc. (Fortran, C)
– Lynx (Simula, U.Tokyo): Cardiac Electrophysiology, FVM (C)
Physics (3)
– MUTSU/iHallMHD3D (NIFS): Turbulent MHD, FFT (Fortran)
– Nucl_TDDFT (Tokyo Tech): Nuclear Physics, Time-Dependent DFT (Fortran)
– Athena++ (Tohoku U. etc.): Astrophysics/MHD, FVM/AMR (C++)
Climate/Weather/Ocean (4)
– SCALE (RIKEN): Climate/Weather, FVM (Fortran)
– NICAM (U.Tokyo, RIKEN, NIES): Global Climate, FVM (Fortran)
– MIROC-GCM (AORI/U.Tokyo): Atmospheric Science, FFT etc. (Fortran77)
– Kinaco (AORI/U.Tokyo): Ocean Science, FDM (Fortran)
Earthquake (4)
– OpenSWPC (ERI/U.Tokyo): Earthquake Wave Propagation, FDM (Fortran)
– SPECFEM3D (Kyoto U.): Earthquake Simulations, Spectral FEM (Fortran)
– hbi_hacapk (JAMSTEC, U.Tokyo): Earthquake Simulations, H-Matrix (Fortran)
– sse_3d (NIED): Earthquake Science, BEM with CUDA Fortran (Fortran)
35. System SW for QC-HPC Hybrid Environment (1/2)
• Quantum computer = accelerator of supercomputers: QC-HPC hybrid
• System software for efficient and smooth operation of the QC-HPC hybrid environment (QC: quantum computers, including simulators on supercomputers)
– QHscheduler: a job scheduler that can simultaneously use multiple computer resources distributed across remote locations
– h3-Open-BDEC/QH: a coupler that efficiently implements and integrates communication and data transfer between QC and HPC, online and in real time
– Collaboration with RIKEN R-CCS, funded by the Japanese Government
• Target applications
– AI for HPC, combined workloads
  • Simulations in computational science
  • Quantum machine learning
– Quantum simulations, error correction
Figure: QHscheduler and h3-Open-BDEC/QH coordinating multiple HPC systems (HPC (1)-(3)), quantum computers (QC (a), QC (b)), and a QC simulator running on HPC.
36. System SW for QC-HPC Hybrid Environment (2/2)
• Innovations
– This is the world's first attempt to link multiple supercomputers and quantum computers installed at different sites in real time.
– In particular, by using multiple QCs simultaneously, it is possible to form a virtual QC with higher processing capacity.
• Many people around the world are thinking about the same thing
– This idea can be extended to any type of system
Figure: as on the previous slide, QHscheduler and h3-Open-BDEC/QH coordinating HPC (1)-(3), QC (a)/(b), a QC simulator on HPC, and AI workloads.