SlideShare a Scribd company logo
1 of 28
Download to read offline
Shinji Sumimoto, Ph.D.
Next Generation Technical Computing Unit
FUJITSU LIMITED
Jun. 20th, 2019
The First SVE Enabled Arm Processor:
A64FX and Building up Arm HPC
Ecosystem
Copyright 2019 FUJITSU LIMITED0
AHUG@ISC19 Workshop
Outline of This Talk
The First SVE Enabled Arm Processor: A64FX
A64FX: High Performance Arm CPU
Arm HPC Ecosystem Development
Arm HPC Software Topics
•Activities with Arm, Linaro and OSS Community
•OSS Application Porting Updates
1 Copyright 2019 FUJITSU LIMITED
 Inheriting Fujitsu HPC CPU technologies with
commodity standard ISA
A64FX: High Performance Arm CPU
Copyright 2019 FUJITSU LIMITED2
High Performance Arm CPU “A64FX”
 Architecture features
ISA Armv8.2-A (AArch64 only) SVE (Scalable Vector
Extension)
SIMD width 512-bit
Precision FP64/32/16, INT64/32/16/8
# of cores 48 computing cores + 4 assistant cores (4 CMGs)
Memory HBM2: Peak B/W 1024 GB/s
Interconnect TofuD: 28 Gbps x 2 lanes x 10 ports
 Peak performance (Chip-level)
0.128 0.128
2.7+
5.4+
10.8+
21.6+
0
5
10
15
20
25
64 bits 32 bits 16 bits 8 bits
SPARC64 VIIIfx
(K computer)
A64FX
(Supercomputer "Fugaku")
(TOPS)
(Element size)
N/A N/A
HPC AI
Copyright 2019 FUJITSU LIMITED3
HBM2
HBM2
HBM2
HBM2
TofuD
Controller
PCIe
Controller
NetworkonChip
CMG(Core Memory Group)
specification
13 cores
L2 Cache 8 MiB
Mem 8 GiB, 256 GB/s
TofuD
28 Gbps x 2 lanes x 10 ports
I/O
PCIe Gen3 16 lanes
Core Core Core Core Core Core Core Core Core Core
Core Core Core Core Core
Core Core Core Core
Core Core Core Core Core Core Core
Core Core Core Core Core Core Core Core Core Core Core Core
Core Core Core Core Core Core Core Core Core Core
Core Core Core Core
L2
Cache
L2
Cache
L2
Cache
L2
Cache
HBM2
Interface
HBM2
Interface
HBM2
interface
HBM2
Interface
PCIe InterfaceTofuD Interface
RING-Bus
1 Peta FLOPS System: K computer vs. Post-K
 K computer
 80x compute racks & 20x disk racks
 Post-K (Now Fugaku)
 1x rack w/ SSDs
2.2m
More applications as well as
system software will come in
collaboration with
Open Source Community
K computer Post-K
Compute nodes 7,680(=96x80)
384
IO nodes 4,80(=6x80)
Footprint (m2) 128(=4x32) 1.1
SPARC Linux Arm Linux
Copyright 2019 FUJITSU LIMITED4
+ ICC
SPARC64 VIIIfx
A64FX
A64FX CPU Performance Evaluation
Over 2.5x faster in HPC & AI benchmarks than SPARC64 XIfx
Copyright 2019 FUJITSU LIMITED5
A64FX Performance Comparison(1/2)
Himeno Benchmark (Fortran90)
† “Performance evaluation of a vector supercomputer SX-aurora TSUBASA”,
SC18, https://dl.acm.org/citation.cfm?id=3291728
Copyright 2019 FUJITSU LIMITED6
A64FX Performance Comparison(2/2)
Copyright 2019 FUJITSU LIMITED
WRF: Weather Research and Forecasting model
 Vectorizing loops including IF-constructs is key optimization
 Source code tuning using directives promotes compiler optimizations
x
x
7
 With Arm and Linaro
 With OSS Community: SPACK, Open MPI and Lustre
Arm HPC Software Topics:
Activities with Community
Copyright 2019 FUJITSU LIMITED8
Strong Relationship with Arm HPC community
 Arm
 Great Establishment and Contribution to Arm HPC base
such as SVE Support of Linux GCC and OpenHPC
https://developer.arm.com/hpc
 Linaro
 Building binary portability on Arm HPC
•Standardization of Arm Basic System Software (Linux Kernel,
glibc, GCC etc.) and Upstreaming to OSS community
•Developing and upstreaming SVE software to OSS
community
https://www.linaro.org/sig/hpc/
 OpenHPC
 Developing Standard IA and Arm HPC software
portability
 Distribution
•2017/11: v1.3.3 the first Arm version distributed
•2019/6: v1.3.8 for Arm distributed
http://www.openhpc.community/
Copyright 2019 FUJITSU LIMITED9
Activities with Arm and Linaro
LLVM SVE upstreaming and OSS porting with Arm
Variable Vector Length Support for LLVM Community in
cooperation with Arm
OpenHPC with Linaro:
Mr. Okamoto(Fujitsu) has been selected a 2018-2019
TSC(Technical Steering Committee) member
Development Status with Linaro
LLVM/Clang for aarch64 Improvement: now ongoing
•Register allocation, Software pipelining support, Vectorization/SIMDization
•Pushing SVE support to the LLVM community in cooperation with Arm,
Variable Vector Length Support is critical issue to introduce to LLVM tree.
QEMU/SVE Development: for building SVE software development
•V4.0.0 released: https://www.qemu.org/
10 Copyright 2019 FUJITSU LIMITED
QEMU/SVE Development with Linaro
https://www.qemu.org/
11 Copyright 2019 FUJITSU LIMITED
 2019/4/23: Version 4.0.0 Released
QEMU on Armv8+SVE with Fedora
 Fedora/aarch64 : https://alt.fedoraproject.org/en/alt/
 Fedora 29 supports SVE Enabled Kernel and GCC Compilers
 Easy to Use: Downloading raw-image and Running with virt-manager
•CPU type: max (not cortex-xx)
12 Copyright 2019 FUJITSU LIMITED
 In collaboration with LLNL and R-CCS
Our goals of SPACK world
and testing status
Copyright 2019 FUJITSU LIMITED13
About SPACK package manager
14 Copyright 2019 FUJITSU LIMITED
Software Installation with SPACK on
QEMU/Aarch64 w/ SVE
 Installing hdf5
15 Copyright 2019 FUJITSU LIMITED
Our goals of Arm HPC Ecosystem with SPACK
 Pushing forward building up Arm HPC w/SVE Ecosystem
 Building tools and applications for Arm HPC w/SVE
 Distributing and Sharing execution binaries for not only in computational
centers but also in Arm HPC users in the world.
 Final Target is binary distribution including OpenHPC distribution
 Sharing community build binary packages in the world
 Not only private package building environment but also sharing binaries in
centers, countries, and world
 Not limited open source packages, but also commercial binaries.
SPACK
Custom
User
System
Manager SPACK
Shared Bin.
SPACK
Custom
SPACK
Custom
User User
RPM(Deb)
Shared Bin.
Computation Center
SPACK
Custom
User
System
Manager SPACK
Shared Bin.
SPACK
Custom
SPACK
Custom
User User
RPM(Deb)
Shared Bin.
Computation Center
Binary
Distribution
Cloud
SPACK
Shared Bin.
RPM(Deb)
Shared Bin.
OpenHPC
Shared Bin.
OpenHPC
Shared Bin.
OpenHPC
Shared Bin.
Commercial
HPC Bin.
Commercial
HPC Bin.
Commercial
HPC Bin.
Copyright 2019 FUJITSU LIMITED16
Testing Status
 Testing Environments:
 Platforms: Thunder X2 system and Qemu 3.1 for Aarch64 w/ SVE
 OS: CentOS 7.x, Fedora29
 Networking environment: Proxy Internet Access w/ User Authentication
 First Impression:
 Working fine very easily on Aarch64 environment
 Good for custom binary building tool for each execution environment
•Rpm packages on the Internet are unified binaries.
 Sometimes timeout on downloading because of heavily congestion internet
•Curl gave-up and build error occurred
 Some applications/tools fail to compile not implemented for Aarch64
•Overall test results are as follows
Copyright 2019 FUJITSU LIMITED17
Result of package installation using Spack
 Total 3,225 packages are registered (releases/v0.12.1 6th June
2019)
 Confirmation condition
 Compiling without modifying anything except URL and checksum
 Using gcc 4.8.5
 No confirmation for execution
 # of Success
 Evaluation Status
 Around 70% of packages are success on Aarch64
 Existing Several failure patterns
•Some of them can be fixed by modification of configuration files
18 Copyright 2019 FUJITSU LIMITED
NOW (Jun. 2019) Jan. 2019
X86(gcc) 2,386/3225(73%) 2,284/2,907(78%)
Arm(gcc) 2,199/3225(68%) 1,336/2,907(45%)
Post-K(Now Fugaku) Software Stack
 Post-K system supports SBSA/SBBR
 Keeping binary compatibility with the other Aarch64 based systems.
Copyright 2019 FUJITSU LIMITED
Post-K System Hardware
Linux OS / McKernel (Lightweight Kernel)
FUJITSU Technical Computing Suite / RIKEN Advanced System Software
Post-K Applications
Management Software Programming EnvironmentHierarchical File I/O Software
System management
for highly available &
power saving operation
Job management for
higher system
utilization & power
efficiency
Lustre-based
distributed file system
FEFS
OpenMP, COARRAY, Math Libs.
Compilers (C, C++, Fortran)
Debugging and tuning tools
MPI (Open MPI, MPICH)
XcalableMP
Application-oriented
file I/O middleware
Post-K
Under Development
w/ RIKEN
19
Open MPI: from SC18 BoF Slides
https://www.open-mpi.org/papers/sc-2018
20 Copyright 2019 FUJITSU LIMITED
Open MPI: from SC18 BoF Slides
https://www.open-mpi.org/papers/sc-2018
21 Copyright 2019 FUJITSU LIMITED
Half-precision(FP16)datatype development started in cooperation with ANL and Mellanox
Open MPI: from SC18 BoF Slides
https://www.open-mpi.org/papers/sc-2018
22 Copyright 2019 FUJITSU LIMITED
Lustre Community: OpenSFS and EOFS
 OpenSFS: US Based Non-profit Organization
 President: Stephen Simms (Indiana University)
 EOFS: EU Based Non-profit Organization
 President: Frank Baetke (HPE)
 Lustre for Arm
 Fujitsu is member of OpenSFS and will support
Lustre based products.
 Two Major Events
 Lustre User Group(LUG)
•LUG 19@Houston, 2019/5/15-17
http://opensfs.org/events/
 Lustre Admins and Devs workshop(LAD)
•LAD 19@Paris, 2019/9/24-25
https://www.eofs.eu/events/lad19
Slides Archives are on each site
23 Copyright 2019 FUJITSU LIMITED
2018/11: Whamcloud has started Lustre client
support on Arm based platforms
24 Copyright 2019 FUJITSU LIMITED
https://www.ddn.com/press-releases/ddn-unveils-professional-support-lustre-arm-based-rm-platforms/
Arm HPC Software Topics:
OSS Application Porting Updates
Copyright 2019 FUJITSU LIMITED25
OSS apps porting at Arm HPC Users Group
 Twelve primary OSS applications are listed and being tested in the
Users Group for each compilers, collaboratively w/ Arm
Copyright 2019 FUJITSU LIMITED
* Registered by Fujitsu
(http://arm-hpc.gitlab.io/)
26
Application Lang. GCC LLVM Arm Fujitsu
LAMMPS C++ Modified Modified Modified Modified
GROMACS C Modified Modified Modified Modified
GAMESS* Fortran Modified Modified Modified Modified
OpenFOAM C++ Modified Modified Modified Modified
Siesta* Fortran Ok in as is Issues found Issues found Modified
NAMD C++ Modified Modified Modified Modified
WRF Fortran Modified Modified Modified Modified
Quantum
ESPRESSO
Fortran Ok in as is Ok in as is Ok in as is Modified
NWChem Fortran Ok in as is Modified Modified Modified
ABINIT Fortran Modified Modified Modified Modified
CP2K Fortran Ok in as is Issues found Issues found Modified
NEST* C++ Ok in as is Modified Modified Modified
USQCD (MILC) C Ok in as is Modified Modified Modified
BLAST* C++ Ok in as is Modified Modified Modified
Copyright 2019 FUJITSU LIMITED27

More Related Content

Similar to The First SVE Enabled Arm Processor: A64FX and Building up Arm HPC Ecosystem

Implementing AI: High Performance Architectures: Large scale HPC hardware in ...
Implementing AI: High Performance Architectures: Large scale HPC hardware in ...Implementing AI: High Performance Architectures: Large scale HPC hardware in ...
Implementing AI: High Performance Architectures: Large scale HPC hardware in ...KTN
 
Introduction of Fujitsu's HPC Processor for the Post-K Computer
Introduction of Fujitsu's HPC Processor for the Post-K ComputerIntroduction of Fujitsu's HPC Processor for the Post-K Computer
Introduction of Fujitsu's HPC Processor for the Post-K Computerinside-BigData.com
 
Post-K: Building the Arm HPC Ecosystem
Post-K: Building the Arm HPC EcosystemPost-K: Building the Arm HPC Ecosystem
Post-K: Building the Arm HPC EcosystemLinaro
 
CNCF and Fujitsu
CNCF and FujitsuCNCF and Fujitsu
CNCF and FujitsuLF Events
 
Llilum 161108 at MVP Global Summit 2016
Llilum 161108 at MVP Global Summit 2016Llilum 161108 at MVP Global Summit 2016
Llilum 161108 at MVP Global Summit 2016Atomu Hidaka
 
“Khronos Standard APIs for Accelerating Vision and Inferencing,” a Presentati...
“Khronos Standard APIs for Accelerating Vision and Inferencing,” a Presentati...“Khronos Standard APIs for Accelerating Vision and Inferencing,” a Presentati...
“Khronos Standard APIs for Accelerating Vision and Inferencing,” a Presentati...Edge AI and Vision Alliance
 
Strata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise GatewayStrata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise GatewayLuciano Resende
 
In Network Computing Prototype Using P4 at KSC/KREONET 2019
In Network Computing Prototype Using P4 at KSC/KREONET 2019In Network Computing Prototype Using P4 at KSC/KREONET 2019
In Network Computing Prototype Using P4 at KSC/KREONET 2019Kentaro Ebisawa
 
P4+ONOS SRv6 tutorial.pptx
P4+ONOS SRv6 tutorial.pptxP4+ONOS SRv6 tutorial.pptx
P4+ONOS SRv6 tutorial.pptxtampham61268
 
Involvement in OpenHPC
Involvement in OpenHPC	Involvement in OpenHPC
Involvement in OpenHPC Linaro
 
Webinar gravado: Programando Microcontroladores ARM da Microchip usando MPLAB...
Webinar gravado: Programando Microcontroladores ARM da Microchip usando MPLAB...Webinar gravado: Programando Microcontroladores ARM da Microchip usando MPLAB...
Webinar gravado: Programando Microcontroladores ARM da Microchip usando MPLAB...Embarcados
 
2009-09-24 Get the Hype on System z Webinar with IBM, Current & Future Linux ...
2009-09-24 Get the Hype on System z Webinar with IBM, Current & Future Linux ...2009-09-24 Get the Hype on System z Webinar with IBM, Current & Future Linux ...
2009-09-24 Get the Hype on System z Webinar with IBM, Current & Future Linux ...Shawn Wells
 
Fujitsu World Tour 2016 - Ateliers - HPC et Simulation numérique : L’innovati...
Fujitsu World Tour 2016 - Ateliers - HPC et Simulation numérique : L’innovati...Fujitsu World Tour 2016 - Ateliers - HPC et Simulation numérique : L’innovati...
Fujitsu World Tour 2016 - Ateliers - HPC et Simulation numérique : L’innovati...Fujitsu France
 
An Update on the European Processor Initiative
An Update on the European Processor InitiativeAn Update on the European Processor Initiative
An Update on the European Processor Initiativeinside-BigData.com
 
Civil Infrastructure Platform: Industrial Grade SLTS Kernel and Base-layer De...
Civil Infrastructure Platform: Industrial Grade SLTS Kernel and Base-layer De...Civil Infrastructure Platform: Industrial Grade SLTS Kernel and Base-layer De...
Civil Infrastructure Platform: Industrial Grade SLTS Kernel and Base-layer De...Yoshitake Kobayashi
 
LCU14 Keynote by George Grey
LCU14 Keynote by George GreyLCU14 Keynote by George Grey
LCU14 Keynote by George GreyLinaro
 

Similar to The First SVE Enabled Arm Processor: A64FX and Building up Arm HPC Ecosystem (20)

Implementing AI: High Performance Architectures: Large scale HPC hardware in ...
Implementing AI: High Performance Architectures: Large scale HPC hardware in ...Implementing AI: High Performance Architectures: Large scale HPC hardware in ...
Implementing AI: High Performance Architectures: Large scale HPC hardware in ...
 
Introduction of Fujitsu's HPC Processor for the Post-K Computer
Introduction of Fujitsu's HPC Processor for the Post-K ComputerIntroduction of Fujitsu's HPC Processor for the Post-K Computer
Introduction of Fujitsu's HPC Processor for the Post-K Computer
 
Post-K: Building the Arm HPC Ecosystem
Post-K: Building the Arm HPC EcosystemPost-K: Building the Arm HPC Ecosystem
Post-K: Building the Arm HPC Ecosystem
 
CNCF and Fujitsu
CNCF and FujitsuCNCF and Fujitsu
CNCF and Fujitsu
 
Llilum 161108 at MVP Global Summit 2016
Llilum 161108 at MVP Global Summit 2016Llilum 161108 at MVP Global Summit 2016
Llilum 161108 at MVP Global Summit 2016
 
“Khronos Standard APIs for Accelerating Vision and Inferencing,” a Presentati...
“Khronos Standard APIs for Accelerating Vision and Inferencing,” a Presentati...“Khronos Standard APIs for Accelerating Vision and Inferencing,” a Presentati...
“Khronos Standard APIs for Accelerating Vision and Inferencing,” a Presentati...
 
Strata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise GatewayStrata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise Gateway
 
RDMA on ARM
RDMA on ARMRDMA on ARM
RDMA on ARM
 
In Network Computing Prototype Using P4 at KSC/KREONET 2019
In Network Computing Prototype Using P4 at KSC/KREONET 2019In Network Computing Prototype Using P4 at KSC/KREONET 2019
In Network Computing Prototype Using P4 at KSC/KREONET 2019
 
P4+ONOS SRv6 tutorial.pptx
P4+ONOS SRv6 tutorial.pptxP4+ONOS SRv6 tutorial.pptx
P4+ONOS SRv6 tutorial.pptx
 
Involvement in OpenHPC
Involvement in OpenHPC	Involvement in OpenHPC
Involvement in OpenHPC
 
Webinar gravado: Programando Microcontroladores ARM da Microchip usando MPLAB...
Webinar gravado: Programando Microcontroladores ARM da Microchip usando MPLAB...Webinar gravado: Programando Microcontroladores ARM da Microchip usando MPLAB...
Webinar gravado: Programando Microcontroladores ARM da Microchip usando MPLAB...
 
2009-09-24 Get the Hype on System z Webinar with IBM, Current & Future Linux ...
2009-09-24 Get the Hype on System z Webinar with IBM, Current & Future Linux ...2009-09-24 Get the Hype on System z Webinar with IBM, Current & Future Linux ...
2009-09-24 Get the Hype on System z Webinar with IBM, Current & Future Linux ...
 
Open stack nova reverse engineer
Open stack nova reverse engineerOpen stack nova reverse engineer
Open stack nova reverse engineer
 
Fujitsu World Tour 2016 - Ateliers - HPC et Simulation numérique : L’innovati...
Fujitsu World Tour 2016 - Ateliers - HPC et Simulation numérique : L’innovati...Fujitsu World Tour 2016 - Ateliers - HPC et Simulation numérique : L’innovati...
Fujitsu World Tour 2016 - Ateliers - HPC et Simulation numérique : L’innovati...
 
An Update on the European Processor Initiative
An Update on the European Processor InitiativeAn Update on the European Processor Initiative
An Update on the European Processor Initiative
 
Civil Infrastructure Platform: Industrial Grade SLTS Kernel and Base-layer De...
Civil Infrastructure Platform: Industrial Grade SLTS Kernel and Base-layer De...Civil Infrastructure Platform: Industrial Grade SLTS Kernel and Base-layer De...
Civil Infrastructure Platform: Industrial Grade SLTS Kernel and Base-layer De...
 
LCU14 Keynote by George Grey
LCU14 Keynote by George GreyLCU14 Keynote by George Grey
LCU14 Keynote by George Grey
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
An Update on Arm HPC
An Update on Arm HPCAn Update on Arm HPC
An Update on Arm HPC
 

Recently uploaded

Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 

Recently uploaded (20)

Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 

The First SVE Enabled Arm Processor: A64FX and Building up Arm HPC Ecosystem

  • 1. Shinji Sumimoto, Ph.D. Next Generation Technical Computing Unit FUJITSU LIMITED Jun. 20th, 2019 The First SVE Enabled Arm Processor: A64FX and Building up Arm HPC Ecosystem Copyright 2019 FUJITSU LIMITED0 AHUG@ISC19 Workshop
  • 2. Outline of This Talk The First SVE Enabled Arm Processor: A64FX A64FX: High Performance Arm CPU Arm HPC Ecosystem Development Arm HPC Software Topics •Activities with Arm, Linaro and OSS Community •OSS Application Porting Updates 1 Copyright 2019 FUJITSU LIMITED
  • 3.  Inheriting Fujitsu HPC CPU technologies with commodity standard ISA A64FX: High Performance Arm CPU Copyright 2019 FUJITSU LIMITED2
  • 4. High Performance Arm CPU “A64FX”  Architecture features ISA Armv8.2-A (AArch64 only) SVE (Scalable Vector Extension) SIMD width 512-bit Precision FP64/32/16, INT64/32/16/8 # of cores 48 computing cores + 4 assistant cores (4 CMGs) Memory HBM2: Peak B/W 1024 GB/s Interconnect TofuD: 28 Gbps x 2 lanes x 10 ports  Peak performance (Chip-level) 0.128 0.128 2.7+ 5.4+ 10.8+ 21.6+ 0 5 10 15 20 25 64 bits 32 bits 16 bits 8 bits SPARC64 VIIIfx (K computer) A64FX (Supercomputer "Fugaku") (TOPS) (Element size) N/A N/A HPC AI Copyright 2019 FUJITSU LIMITED3 HBM2 HBM2 HBM2 HBM2 TofuD Controller PCIe Controller NetworkonChip CMG(Core Memory Group) specification 13 cores L2 Cache 8 MiB Mem 8 GiB, 256 GB/s TofuD 28 Gbps x 2 lanes x 10 ports I/O PCIe Gen3 16 lanes Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core Core L2 Cache L2 Cache L2 Cache L2 Cache HBM2 Interface HBM2 Interface HBM2 interface HBM2 Interface PCIe InterfaceTofuD Interface RING-Bus
  • 5. 1 Peta FLOPS System: K computer vs. Post-K  K computer  80x compute racks & 20x disk racks  Post-K (Now Fugaku)  1x rack w/ SSDs 2.2m More applications as well as system software will come in collaboration with Open Source Community K computer Post-K Compute nodes 7,680(=96x80) 384 IO nodes 4,80(=6x80) Footprint (m2) 128(=4x32) 1.1 SPARC Linux Arm Linux Copyright 2019 FUJITSU LIMITED4 + ICC SPARC64 VIIIfx A64FX
  • 6. A64FX CPU Performance Evaluation Over 2.5x faster in HPC & AI benchmarks than SPARC64 XIfx Copyright 2019 FUJITSU LIMITED5
  • 7. A64FX Performance Comparison(1/2) Himeno Benchmark (Fortran90) † “Performance evaluation of a vector supercomputer SX-aurora TSUBASA”, SC18, https://dl.acm.org/citation.cfm?id=3291728 Copyright 2019 FUJITSU LIMITED6
  • 8. A64FX Performance Comparison(2/2) Copyright 2019 FUJITSU LIMITED WRF: Weather Research and Forecasting model  Vectorizing loops including IF-constructs is key optimization  Source code tuning using directives promotes compiler optimizations x x 7
  • 9.  With Arm and Linaro  With OSS Community: SPACK, Open MPI and Lustre Arm HPC Software Topics: Activities with Community Copyright 2019 FUJITSU LIMITED8
  • 10. Strong Relationship with Arm HPC community  Arm  Great Establishment and Contribution to Arm HPC base such as SVE Support of Linux GCC and OpenHPC https://developer.arm.com/hpc  Linaro  Building binary portability on Arm HPC •Standardization of Arm Basic System Software (Linux Kernel, glibc, GCC etc.) and Upstreaming to OSS community •Developing and upstreaming SVE software to OSS community https://www.linaro.org/sig/hpc/  OpenHPC  Developing Standard IA and Arm HPC software portability  Distribution •2017/11: v1.3.3 the first Arm version distributed •2019/6: v1.3.8 for Arm distributed http://www.openhpc.community/ Copyright 2019 FUJITSU LIMITED9
  • 11. Activities with Arm and Linaro LLVM SVE upstreaming and OSS porting with Arm Variable Vector Length Support for LLVM Community in cooperation with Arm OpenHPC with Linaro: Mr. Okamoto(Fujitsu) has been selected a 2018-2019 TSC(Technical Steering Committee) member Development Status with Linaro LLVM/Clang for aarch64 Improvement: now ongoing •Register allocation, Software pipelining support, Vectorization/SIMDization •Pushing SVE support to the LLVM community in cooperation with Arm, Variable Vector Length Support is critical issue to introduce to LLVM tree. QEMU/SVE Development: for building SVE software development •V4.0.0 released: https://www.qemu.org/ 10 Copyright 2019 FUJITSU LIMITED
  • 12. QEMU/SVE Development with Linaro https://www.qemu.org/ 11 Copyright 2019 FUJITSU LIMITED  2019/4/23: Version 4.0.0 Released
  • 13. QEMU on Armv8+SVE with Fedora  Fedora/aarch64 : https://alt.fedoraproject.org/en/alt/  Fedora 29 supports SVE Enabled Kernel and GCC Compilers  Easy to Use: Downloading raw-image and Running with virt-manager •CPU type: max (not cortex-xx) 12 Copyright 2019 FUJITSU LIMITED
  • 14.  In collaboration with LLNL and R-CCS Our goals of SPACK world and testing status Copyright 2019 FUJITSU LIMITED13
  • 15. About SPACK package manager 14 Copyright 2019 FUJITSU LIMITED
  • 16. Software Installation with SPACK on QEMU/Aarch64 w/ SVE  Installing hdf5 15 Copyright 2019 FUJITSU LIMITED
  • 17. Our goals of Arm HPC Ecosystem with SPACK  Pushing forward building up Arm HPC w/SVE Ecosystem  Building tools and applications for Arm HPC w/SVE  Distributing and Sharing execution binaries for not only in computational centers but also in Arm HPC users in the world.  Final Target is binary distribution including OpenHPC distribution  Sharing community build binary packages in the world  Not only private package building environment but also sharing binaries in centers, countries, and world  Not limited open source packages, but also commercial binaries. SPACK Custom User System Manager SPACK Shared Bin. SPACK Custom SPACK Custom User User RPM(Deb) Shared Bin. Computation Center SPACK Custom User System Manager SPACK Shared Bin. SPACK Custom SPACK Custom User User RPM(Deb) Shared Bin. Computation Center Binary Distribution Cloud SPACK Shared Bin. RPM(Deb) Shared Bin. OpenHPC Shared Bin. OpenHPC Shared Bin. OpenHPC Shared Bin. Commercial HPC Bin. Commercial HPC Bin. Commercial HPC Bin. Copyright 2019 FUJITSU LIMITED16
  • 18. Testing Status  Testing Environments:  Platforms: Thunder X2 system and Qemu 3.1 for Aarch64 w/ SVE  OS: CentOS 7.x, Fedora29  Networking environment: Proxy Internet Access w/ User Authentication  First Impression:  Working fine very easily on Aarch64 environment  Good for custom binary building tool for each execution environment •Rpm packages on the Internet are unified binaries.  Sometimes timeout on downloading because of heavily congestion internet •Curl gave-up and build error occurred  Some applications/tools fail to compile not implemented for Aarch64 •Overall test results are as follows Copyright 2019 FUJITSU LIMITED17
  • 19. Result of package installation using Spack  Total 3,225 packages are registered (releases/v0.12.1 6th June 2019)  Confirmation condition  Compiling without modifying anything except URL and checksum  Using gcc 4.8.5  No confirmation for execution  # of Success  Evaluation Status  Around 70% of packages are success on Aarch64  Existing Several failure patterns •Some of them can be fixed by modification of configuration files 18 Copyright 2019 FUJITSU LIMITED NOW (Jun. 2019) Jan. 2019 X86(gcc) 2,386/3225(73%) 2,284/2,907(78%) Arm(gcc) 2,199/3225(68%) 1,336/2,907(45%)
  • 20. Post-K(Now Fugaku) Software Stack  Post-K system supports SBSA/SBBR  Keeping binary compatibility with the other Aarch64 based systems. Copyright 2019 FUJITSU LIMITED Post-K System Hardware Linux OS / McKernel (Lightweight Kernel) FUJITSU Technical Computing Suite / RIKEN Advanced System Software Post-K Applications Management Software Programming EnvironmentHierarchical File I/O Software System management for highly available & power saving operation Job management for higher system utilization & power efficiency Lustre-based distributed file system FEFS OpenMP, COARRAY, Math Libs. Compilers (C, C++, Fortran) Debugging and tuning tools MPI (Open MPI, MPICH) XcalableMP Application-oriented file I/O middleware Post-K Under Development w/ RIKEN 19
  • 21. Open MPI: from SC18 BoF Slides https://www.open-mpi.org/papers/sc-2018 20 Copyright 2019 FUJITSU LIMITED
  • 22. Open MPI: from SC18 BoF Slides https://www.open-mpi.org/papers/sc-2018 21 Copyright 2019 FUJITSU LIMITED Half-precision(FP16)datatype development started in cooperation with ANL and Mellanox
  • 23. Open MPI: from SC18 BoF Slides https://www.open-mpi.org/papers/sc-2018 22 Copyright 2019 FUJITSU LIMITED
  • 24. Lustre Community: OpenSFS and EOFS  OpenSFS: US Based Non-profit Organization  President: Stephen Simms (Indiana University)  EOFS: EU Based Non-profit Organization  President: Frank Baetke (HPE)  Lustre for Arm  Fujitsu is member of OpenSFS and will support Lustre based products.  Two Major Events  Lustre User Group(LUG) •LUG 19@Houston, 2019/5/15-17 http://opensfs.org/events/  Lustre Admins and Devs workshop(LAD) •LAD 19@Paris, 2019/9/24-25 https://www.eofs.eu/events/lad19 Slides Archives are on each site 23 Copyright 2019 FUJITSU LIMITED
  • 25. 2018/11: Whamcloud has started Lustre client support on Arm based platforms 24 Copyright 2019 FUJITSU LIMITED https://www.ddn.com/press-releases/ddn-unveils-professional-support-lustre-arm-based-rm-platforms/
  • 26. Arm HPC Software Topics: OSS Application Porting Updates Copyright 2019 FUJITSU LIMITED25
  • 27. OSS apps porting at Arm HPC Users Group  Twelve primary OSS applications are listed and being tested in the Users Group for each compilers, collaboratively w/ Arm Copyright 2019 FUJITSU LIMITED * Registered by Fujitsu (http://arm-hpc.gitlab.io/) 26 Application Lang. GCC LLVM Arm Fujitsu LAMMPS C++ Modified Modified Modified Modified GROMACS C Modified Modified Modified Modified GAMESS* Fortran Modified Modified Modified Modified OpenFOAM C++ Modified Modified Modified Modified Siesta* Fortran Ok in as is Issues found Issues found Modified NAMD C++ Modified Modified Modified Modified WRF Fortran Modified Modified Modified Modified Quantum ESPRESSO Fortran Ok in as is Ok in as is Ok in as is Modified NWChem Fortran Ok in as is Modified Modified Modified ABINIT Fortran Modified Modified Modified Modified CP2K Fortran Ok in as is Issues found Issues found Modified NEST* C++ Ok in as is Modified Modified Modified USQCD (MILC) C Ok in as is Modified Modified Modified BLAST* C++ Ok in as is Modified Modified Modified