Huawei’s requirements for
the ARM based HPC solution readiness
Joshua.Mora@Huawei.com
Chief Architect microprocessor and applications for HPC and BigData
R&D IT Product Line.
Futurewei, Santa Clara, USA
Arm Architecture HPC Workshop by Linaro and HiSilicon, 7/26/2018
Objectives of the presentation
• A high-level review of a wide range of requirements to architect a
competitive ARM based HPC solution is provided.
• The review combines both industry and Huawei’s unique views with
the intent to:
• communicate openly the alignment with, and support for, ongoing
efforts carried out by other key ARM players
• brief on the areas of differentiation in which Huawei is investing
towards the research, development and deployment of
homegrown ARM based HPC solution(s).
Market opportunities and Timelines
• ARM, partners and vendors on both the HW and SW side are creating a set
of competitive products that customers are evaluating and investing
in, with visibility in 2018-2020.
• ARM based HPC initiatives and business cases, currently led by
customers in research institutions, are a clear server market
reaction to the stagnation of x86 based solutions over the past
~4 years. The result is competitive performance of ARM cores
and SoCs and the growth/maturity of core SW with the help of key
entities such as Linaro and ARM vendors.
• We believe at Huawei that 2018-2020 is a crucial window of
opportunity to demonstrate the value of ARM based solutions,
in the HPC space among others.
Execution model
• Our execution model will allow Huawei to participate in the
aforementioned window of opportunity.
• The strategy for developing ARM based HPC solutions has 2 phases:
• Phase 1 (development ready): A variety of Hi1616 based
platforms (reliable and performant, ~ Broadwell) have been
available to enable partners to build both HW and SW
ecosystems (core components of the HPC solution), including
applications.
• Phase 2 (business ready): A similar number of Hi1620 based
platforms (with competitive performance against currently
available x86 CPUs) will soon be available for a
“smooth/quick” update/transition from phase 1.
Very High Level Requirements
• Ultimate objective: turn key HPC solutions.
• Define/architect from HW perspective:
• compute tier, ARM based, with and without accelerators
• storage tier, ARM based, with and without accelerators
• Networking, support for IB and RoCE, with “smart” capabilities
• Define/architect from SW perspective:
• BIOS/FW platform specific
• OS tuning (incl. drivers and system libraries) and certification,
platform agnostic
• HPC SW stack optimized and certified for specific platforms
• Applications optimized and certified for specific platforms
• Deployment models: on premise and cloud
Speeding up ARM based HPC solution adoption
• Focus on development all the way to turn key solutions (like
cluster management, containers and applications), not just core
components (like drivers, OS, MPI, compiler, math library).
• Investment in deploying ARM based HPC solutions as a service
in the cloud: customers should not need to be aware of which
architecture is delivering the HPC service (i.e. HPC application
execution must meet performance targets in an affordable way).
• This effort requires alignment among ARM, HW vendors, SW vendors
and cloud providers, and clear communication to customers through a
variety of events such as this one.
• It cannot easily be driven by a single ARM based vendor alone.
• Huawei therefore acknowledges and supports these activities, which
will pave the road for ARM based business.
HW Requirements for ARM based HPC solution
• CPU
• Race for memory bandwidth: window of opportunity with 8 memory
channels/CPU and memory frequencies up to 3200MHz, leading to
2P system memory bandwidth >300GB/s (measured)
• Large core count with competitive performance: ~64 cores/CPU
(without SMT/HT) at high core frequency, up to 3GHz
• >128-bit vector instructions
• Low local and remote random memory access latency: <90nsec and
<200nsec respectively
• Efficient hardware prefetchers to get high single core bandwidth
(>20GB/s), so that a few cores in a NUMA node can saturate the memory
controller bandwidth (good for core licensed applications).
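As a rough sanity check of the bandwidth figures above, the sketch below computes a theoretical peak. It assumes (the slide does not say so) that each memory channel is 64 bits wide and that "3200MHz" means 3200 mega-transfers per second, as with DDR4-3200. The function name and its parameters are illustrative only.

```python
def peak_bandwidth_gbs(channels, mega_transfers_per_s, bytes_per_transfer=8, sockets=1):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    bytes_per_transfer=8 is an assumption: a 64-bit wide memory channel.
    """
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer * sockets / 1e9


# Figures taken from the slide: 8 channels per CPU, up to 3200 MT/s, 2P system.
per_cpu = peak_bandwidth_gbs(channels=8, mega_transfers_per_s=3200)
two_socket = peak_bandwidth_gbs(channels=8, mega_transfers_per_s=3200, sockets=2)

# Fraction of the theoretical peak that the >300 GB/s target represents.
target_fraction = 300.0 / two_socket

print(f"peak per CPU:   {per_cpu:.1f} GB/s")
print(f"peak 2P system: {two_socket:.1f} GB/s")
print(f"300 GB/s is {target_fraction:.0%} of the 2P peak")
```

Under these assumptions the >300GB/s target corresponds to sustaining roughly three quarters of the theoretical 2P peak.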
SW Requirements for ARM based HPC solution
• Ongoing development efforts in 2 complementary solutions:
• Fully open source: “community based support/you are mostly on your
own”
• Fully commercial: “we support you everywhere”
[Figure: cost/revenue/margin versus value added (basic performance, support, optimizations) for the open source solution and the commercial solution]
SW Requirements for ARM based HPC solution
• For either one, the SW stack looks as follows:
• BIOS, OS, drivers
• Cluster management for monitoring, provisioning, scheduling
• Containers for application deployment
• Development tools (compiler, profiler, debugger)
• Libraries (Math, MPI)
• Applications (different verticals)
• Parallel File System
• The open source effort is centered on OpenHPC (activity reviewed with
Linaro), and the application focus is driven by current business
opportunities such as CFD, weather, bioinformatics and astrophysics.
SW Requirements for ARM based HPC solution
• Partnerships with ISVs:
• ISVs are fundamental for the healthy growth of the ARM
business if we are pursuing turn key solutions.
• While our final objective is to deliver good performance on our
platform, we are encouraging the ISVs to reach out to the other
ARM vendors in order to grow the portfolio of ARM based
solutions available to customers in 2018-2020.
• We follow the 2 phase execution model with the ISVs.
• Reseller agreements facilitate the adoption of high quality
software stacks optimized for ARM.
• We aim to deploy the turn key solutions with those
ISVs both on premise and in the cloud to speed up adoption.
SW Requirements for ARM based HPC solution
• Math libraries supporting hybrid MPI + OpenMP for multi chip module
SoCs with low communication/synchronization overheads within a node.
• Optimized multithreaded libraries based on task scheduling of a DAG
(Directed Acyclic Graph) to leverage the high core count CPU.
• Opportunities to reduce bandwidth requirements and make the libraries more
scalable on large core count architectures.
HPCG: Leveraging DAG for efficient OpenMP execution of Gauss-Seidel algorithm
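The slide graphics are not available here, so the following is only a minimal sketch of one common way to exploit the dependency DAG of a Gauss-Seidel sweep, namely level scheduling. It is not the presenter's implementation. It assumes a structurally symmetric matrix, and a small 2D 5-point Laplacian stands in for the HPCG operator. All function names are hypothetical. Rows in the same level have no dependencies on each other, so the inner loop over a level is the part a threaded implementation (for example an OpenMP parallel loop) could run concurrently.

```python
def laplacian_2d(nx, ny):
    """Sparse rows {column: value} of a 5-point Laplacian on an nx-by-ny grid."""
    rows = []
    for iy in range(ny):
        for ix in range(nx):
            i = iy * nx + ix
            row = {i: 4.0}
            if ix > 0:
                row[i - 1] = -1.0
            if ix < nx - 1:
                row[i + 1] = -1.0
            if iy > 0:
                row[i - nx] = -1.0
            if iy < ny - 1:
                row[i + nx] = -1.0
            rows.append(row)
    return rows


def relax(rows, b, x, i):
    """Gauss-Seidel update of unknown i using the current contents of x."""
    s = sum(a * x[j] for j, a in rows[i].items() if j != i)
    return (b[i] - s) / rows[i][i]


def gs_levels(rows):
    """Level of each row in the forward-sweep DAG (edges j -> i for j < i)."""
    level = [0] * len(rows)
    for i, row in enumerate(rows):
        deps = [level[j] for j in row if j < i]
        level[i] = 1 + max(deps) if deps else 0
    return level


def gs_forward_sequential(rows, b, x):
    """Reference forward sweep in natural row order."""
    x = list(x)
    for i in range(len(rows)):
        x[i] = relax(rows, b, x, i)
    return x


def gs_forward_by_levels(rows, b, x):
    """Forward sweep processed level by level (assumes a symmetric pattern)."""
    x = list(x)
    level = gs_levels(rows)
    for lv in range(max(level) + 1):
        batch = [i for i in range(len(rows)) if level[i] == lv]
        # Rows in one batch are mutually independent: compute all updates
        # first, then write them back, as parallel threads would.
        updates = {i: relax(rows, b, x, i) for i in batch}
        for i, value in updates.items():
            x[i] = value
    return x


rows = laplacian_2d(4, 4)
b = [1.0] * len(rows)
x0 = [0.0] * len(rows)
x_seq = gs_forward_sequential(rows, b, x0)
x_lvl = gs_forward_by_levels(rows, b, x0)
num_levels = max(gs_levels(rows)) + 1
max_diff = max(abs(u - v) for u, v in zip(x_seq, x_lvl))
print(num_levels, max_diff)
```

For this grid the levels are the anti-diagonal wavefronts, so the level-ordered sweep reproduces the sequential result while exposing several independent rows per step.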
HPCG: Leveraging DAG to fuse Gauss-Seidel with the residual computation (bandwidth reduction)
[Figure: two charts, problem size 96x96x96 on Hi1616, 1 to 32 cores, series "FW GS + R", "FW FS GS" and "FW FS GS Opt".]
[Chart 1: relative performance increase versus #cores. Data labels: baseline 100 at every core count; 237.6, 228.2, 241.6, 220.7, 175.7, 185.8. Annotation: 1.8X better.]
[Chart 2: speedup versus #cores, with an ideal line (1.0, 2.0, 4.0, 8.0, 16.0, 32.0). Data labels: 1.0, 2.0, 4.2, 7.7, 11.2, 23.6. Annotations: superlinear cache effects with respect to 1 core; memory bandwidth saturation at 12 of 16 cores in a NUMA node.]
FW: Forward pass; similar benefits for the backward pass.
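To make the chart annotations concrete, the sketch below turns the speedup data labels into parallel efficiency (speedup divided by core count). It assumes that the labels 1.0, 2.0, 4.2, 7.7, 11.2 and 23.6 belong to a single measured series at 1, 2, 4, 8, 16 and 32 cores. The chart image is not available to confirm this, and the variable names are illustrative.

```python
cores = [1, 2, 4, 8, 16, 32]
# Speedup data labels from the chart, assumed to form one measured series.
speedup = [1.0, 2.0, 4.2, 7.7, 11.2, 23.6]

# Parallel efficiency: 1.0 is ideal scaling, above 1.0 is superlinear.
efficiency = [s / c for s, c in zip(speedup, cores)]

for c, s, e in zip(cores, speedup, efficiency):
    note = "superlinear" if e > 1.0 else ""
    print(f"{c:>2} cores: speedup {s:>5.1f}, efficiency {e:.2f} {note}")
```

Under that assumption the 4-core point is slightly superlinear, consistent with the cache-effect annotation, and efficiency drops at 16 cores, where the chart notes memory bandwidth saturation within a NUMA node.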
SW Requirements for ARM based HPC solution
• MPI validation, optimization and certification across a set of
configurations:
• Inter node communication with NIC type: IB, RoCE
• Intra node communication
• Operating systems: open source and commercial
• Compilers: open source and commercial
• MPI primitives: P2P, collectives
• Platform optimization and certification
• ISV + MPI optimization and certification
• Integration of ISV + MPI with cluster management
• Participating in OpenUCX to drive features and optimization on ARM
• Providing early access to clusters through the HPC-AI Advisory Council.
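To give a sense of how quickly the MPI certification matrix grows, the sketch below enumerates the configuration axes listed above. It assumes that every combination has to be covered, which the slide does not state. The dictionary keys and labels are illustrative, not part of any Huawei tooling.

```python
from itertools import product

# Axes taken from the slide; the full cross product is an assumption.
nic_types = ["IB", "RoCE"]
operating_systems = ["open source OS", "commercial OS"]
compilers = ["open source compiler", "commercial compiler"]
primitives = ["P2P", "collectives"]

# Inter node runs depend on the NIC type.
inter_node = [
    {"scope": "inter node", "nic": nic, "os": os_kind, "compiler": comp, "primitive": prim}
    for nic, os_kind, comp, prim in product(nic_types, operating_systems, compilers, primitives)
]

# Intra node runs do not involve a NIC.
intra_node = [
    {"scope": "intra node", "nic": None, "os": os_kind, "compiler": comp, "primitive": prim}
    for os_kind, comp, prim in product(operating_systems, compilers, primitives)
]

matrix = inter_node + intra_node
print(len(inter_node), len(intra_node), len(matrix))
```

Even with two options per axis the matrix has a few dozen entries per platform, before ISV applications and cluster management integration are added.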
Services Requirements for ARM based HPC solution
• Dedicated seasoned team with HPC skills (yes, we are hiring!)
spread across China, EU and US to optimize ARM based HPC
solutions, delivering:
• Optimizations of open source applications
• Support to ISVs in their porting, optimization and certification
efforts
• Training on ARM CPU, platform and software stacks
• Benchmarking team for business support
• That same team interacts closely with the HiSilicon team to
squeeze performance out of applications and to drive new features for
next generation CPUs for HPC.
Services Requirements for ARM based HPC solution
• Dedicated HPC team with multidisciplinary and overlapping skills
Group skills 1: CPU centric
CPU architecture, compiler technology, algorithms, performance modeling,
profiling
Group skills 2: System centric
CPU architecture, system architecture, networking, parallel file systems, operating
systems and driver tuning
Group skills 3: Math centric
Linear algebra, statistics, algorithms, data structures, MPI, OpenMP, partial
differential equations, numerical methods, sometimes also one of the verticals
Group skills 4: Vertical centric
Individuals with vertical market experience, also strong in linear algebra, partial
differential equations, numerical methods
If you want to know more
• Both vendors and customers are encouraged to sign an NDA for
disclosure of details of Huawei’s ARM based HPC solutions and
availability timelines.
• We are planning to progressively unveil more details in 2H 2018
at multiple events such as SC18, including both open source and
commercial application demos.
Thank you
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoft
 

Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora

  • 1. Huawei’s requirements for the ARM based HPC solution readiness
    Joshua.Mora@Huawei.com, Chief Architect, microprocessor and applications for HPC and BigData, R&D IT Product Line, Futurewei, Santa Clara, USA
    Arm Architecture HPC Workshop by Linaro and HiSilicon, 7/26/2018
  • 2. Objectives of the presentation
    • This presentation gives a high level review of the wide range of requirements for architecting a competitive ARM based HPC solution.
    • The review combines industry views with Huawei’s own, with the intent to:
      • communicate openly Huawei’s alignment with and support for the ongoing efforts carried out by other key ARM players
      • brief on the areas of differentiation in which Huawei is investing for the research, development and deployment of homegrown ARM based HPC solution(s).
  • 3. Market opportunities and Timelines
    • ARM, its partners, and HW and SW vendors are creating a set of competitive products that customers are evaluating and investing in, with visibility in 2018-2020.
    • ARM based HPC initiatives and business cases, currently led by customers in research institutions, are a clear server market reaction to the stagnation of x86 based solutions over the past ~4 years. The result is competitive performance of the ARM core and SoCs, and the growth/maturity of core SW with the help of key entities such as Linaro and the ARM vendors.
    • We believe at Huawei that 2018-2020 is a crucial window of opportunity to demonstrate the value of ARM based solutions, in the HPC space among others.
  • 4. Execution model
    • Our execution model will allow Huawei to participate in the aforementioned window of opportunity.
    • The strategy for developing ARM based HPC solutions has 2 phases:
      • Phase 1 (development ready): A variety of Hi1616 based platforms (reliable and performant, ~ Broadwell) have been available to enable partners to build both the HW and SW ecosystems (the core components of the HPC solution), including applications.
      • Phase 2 (business ready): A similar number of Hi1620 based platforms (with competitive performance against currently available x86 CPUs) will soon become available, enabling a “smooth/quick” update/transition from phase 1.
  • 5. Very High Level Requirements
    • Ultimate objective: turnkey HPC solutions.
    • Define/architect from HW perspective:
      • compute tier, ARM based, with and without accelerators
      • storage tier, ARM based, with and without accelerators
      • networking, support for IB and RoCE, with “smart” capabilities
    • Define/architect from SW perspective:
      • BIOS/FW, platform specific
      • OS tuning (incl. drivers and system libraries) and certification, platform agnostic
      • HPC SW stack optimized and certified for specific platforms
      • Applications optimized and certified for specific platforms
    • Deployment models: on premise and cloud
  • 6. Speeding up ARM based HPC solution adoption
    • Focus development all the way up to turnkey solutions (such as cluster management, containers and applications), not just core components (such as drivers, OS, MPI, compiler, math library).
    • Invest in deploying ARM based HPC solutions as a service in the cloud: customers should not need to be aware of which architecture is delivering the HPC service (i.e. HPC application execution must meet performance targets in an affordable way).
    • This effort requires alignment among ARM, HW vendors, SW vendors and cloud providers, and clear communication to customers through a variety of events such as this one.
    • It cannot easily be driven by a single ARM based vendor alone.
    • Huawei therefore acknowledges and supports these activities, which will pave the road for ARM based business.
  • 7. HW Requirements for ARM based HPC solution
    • CPU
      • Race for memory bandwidth: window of opportunity with 8 memory channels/CPU and memory frequencies up to 3200MHz, leading to 2P system memory bandwidth >300GB/s (measured).
      • Large core count with competitive performance: ~64 cores/CPU (without SMT/HT) at high core frequency, up to 3GHz.
      • >128bit vector instructions
      • Low local and remote random memory access latency: <90nsec and <200nsec respectively.
      • Efficient hardware prefetchers to get high single core bandwidth (>20GB/s), so that a few cores in a numanode saturate the memory controller bandwidth (good for core licensed applications).
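As a rough sanity check on the bandwidth figures of slide 7, the sketch below computes the theoretical peak DRAM bandwidth. It assumes 64-bit (8-byte) DDR channels and reads the quoted "3200MHz" as 3200 mega-transfers per second; neither assumption is stated on the slide, and the function name is purely illustrative.

```python
def peak_bandwidth_gbs(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumes each channel moves `bytes_per_transfer` bytes per transfer
    (8 bytes for a 64-bit DDR channel, which is an assumption here).
    """
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


# Figures taken from the slide: 8 channels per CPU, up to 3200 "MHz", 2P system.
per_socket = peak_bandwidth_gbs(channels=8, mega_transfers_per_s=3200)
two_socket = 2 * per_socket

# The slide quotes >300 GB/s measured on a 2P system.
measured_target = 300.0
efficiency = measured_target / two_socket

print(f"peak per socket: {per_socket:.1f} GB/s")
print(f"peak 2P system:  {two_socket:.1f} GB/s")
print(f"300 GB/s corresponds to {efficiency:.0%} of the 2P peak")
```

Under these assumptions the >300GB/s measured target sits at roughly three quarters of the theoretical 2P peak, which is a plausible range for a sustained streaming measurement.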
  • 8. SW Requirements for ARM based HPC solution
    • Ongoing development efforts in 2 complementary solutions:
      • Fully open source, “community based support/you are mostly on your own”
      • Fully commercial, “we support you everywhere”
    [Diagram: cost/revenue/margin against value added (optimizations, support, basic performance) for the commercial solution and the open source solution.]
  • 9. SW Requirements for ARM based HPC solution
    • For either one, the SW stack looks as follows:
      • BIOS, OS, drivers
      • Cluster management for monitoring, provisioning, scheduling
      • Containers for application deployment
      • Development tools (compiler, profiler, debugger)
      • Libraries (Math, MPI)
      • Applications (different verticals)
      • Parallel File System
    • The open source effort is centered on OpenHPC (activity reviewed with Linaro), and the application focus is driven by current business opportunities such as CFD, weather, bioinformatics and astrophysics.
  • 10. SW Requirements for ARM based HPC solution
    • Partnerships with ISVs:
      • ISVs are fundamental to the healthy growth of the ARM business if we are pursuing turnkey solutions.
      • While our final objective is to deliver good performance on our platform, we are encouraging the ISVs to reach out to the other ARM vendors in order to grow the portfolio of ARM based solutions available to customers in 2018-2020.
      • We follow the 2 phase execution model with the ISVs.
      • Reseller agreements to facilitate the adoption of high quality software stacks optimized for ARM.
      • We intend to deploy the turnkey solutions with those ISVs both on premise and in the cloud to speed up adoption.
  • 11. SW Requirements for ARM based HPC solution
    • Math libraries supporting hybrid MPI + OpenMP for multi chip module SoCs, with low communication/synchronization overheads within the node.
    • Optimized multithreaded libraries based on task scheduling of a DAG (Directed Acyclic Graph) to leverage the high core count CPU.
    • Opportunities to reduce bandwidth requirements and make the libraries more scalable on large core count architectures.
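To illustrate how a dependency DAG exposes parallelism in a Gauss-Seidel sweep, here is a minimal sketch of level scheduling (wavefront ordering). The slides do not say which DAG scheduling scheme is used, so this is one standard technique rather than the presenter's method. It uses plain Python and a 2D 5-point grid for brevity, whereas HPCG uses a 3D 27-point stencil; both function names are hypothetical.

```python
def gauss_seidel_levels(lower_deps):
    """Group rows into levels for a forward Gauss-Seidel sweep.

    lower_deps[i] lists the rows j < i that row i reads during the sweep.
    Rows in the same level have no dependencies on each other, so they can
    be updated concurrently (for example by separate OpenMP threads or tasks).
    """
    level = [0] * len(lower_deps)
    for i, deps in enumerate(lower_deps):
        if deps:
            # A row can only start once all rows it depends on are done.
            level[i] = 1 + max(level[j] for j in deps)
    groups = [[] for _ in range(max(level) + 1)] if level else []
    for i, lv in enumerate(level):
        groups[lv].append(i)
    return groups


def grid_5pt_lower_deps(nx, ny):
    """Lower-triangular dependencies of a 2D 5-point stencil, row-major order."""
    deps = []
    for y in range(ny):
        for x in range(nx):
            d = []
            if x > 0:
                d.append(y * nx + x - 1)    # west neighbour
            if y > 0:
                d.append((y - 1) * nx + x)  # south neighbour
            deps.append(d)
    return deps


levels = gauss_seidel_levels(grid_5pt_lower_deps(4, 3))
for n, rows in enumerate(levels):
    print(f"level {n}: rows {rows}")
```

Each level is an anti-diagonal of the grid. The sweep proceeds level by level with a synchronization between levels, so the available parallelism is bounded by the widest level.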
  • 12. SW Requirements for ARM based HPC solution
    • HPCG: Leveraging DAG for efficient OpenMP execution of Gauss-Seidel algorithm
  • 13. SW Requirements for ARM based HPC solution
    • HPCG: Leveraging DAG for efficient OpenMP execution of Gauss-Seidel algorithm
  • 14. SW Requirements for ARM based HPC solution
    • HPCG: Leveraging DAG for efficient OpenMP execution of Gauss-Seidel algorithm
  • 15. SW Requirements for ARM based HPC solution
    • HPCG: Leveraging DAG to fuse Gauss-Seidel with Residual (bw reduction)
    [Chart: relative performance increase vs. #cores (1, 2, 4, 8, 16, 32), problem size 96x96x96 on Hi1616. Series: FW GS + R, FW FS GS, FW FS GS Opt. Data labels: 100 at every core count; 237.6, 228.2, 241.6, 220.7, 175.7, 185.8.]
    [Chart: speedup vs. #cores (1, 2, 4, 8, 16, 32), problem size 96x96x96 on Hi1616. Series: FW GS + R, FW FS GS, FW FS GS Opt, Ideal. Data labels: 1.0, 2.0, 4.2, 7.7, 11.2, 23.6 against ideal 1.0, 2.0, 4.0, 8.0, 16.0, 32.0.]
    [Chart annotations: superlinear cache effects wrt 1 core; memory bandwidth saturation at 12/16 cores in numanode; 1.8X better.]
    • FW: Forward pass; similar benefits for the backward pass.
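The speedup labels on slide 15 can be turned into parallel efficiency (speedup divided by core count), which makes the two annotated effects easier to see. The sketch below uses only the numbers printed on the slide; which of the plotted series they belong to is not recoverable from the transcript, so treat the attribution as unknown. The function name is illustrative.

```python
# Data labels from the speedup chart (problem size 96x96x96 on Hi1616).
cores = [1, 2, 4, 8, 16, 32]
speedup = [1.0, 2.0, 4.2, 7.7, 11.2, 23.6]


def parallel_efficiency(speedups, core_counts):
    """Parallel efficiency = speedup / number of cores (1.0 is ideal)."""
    return [s / p for s, p in zip(speedups, core_counts)]


efficiency = parallel_efficiency(speedup, cores)
for p, s, e in zip(cores, speedup, efficiency):
    print(f"{p:2d} cores: speedup {s:5.1f}, efficiency {e:.2f}")
```

An efficiency above 1.0 at 4 cores matches the "superlinear cache effects" annotation, and the drop to about 0.7 at 16 cores is consistent with the "memory bandwidth saturation" annotation.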
  • 16. SW Requirements for ARM based HPC solution
    • MPI validation, optimization and certification across a set of configurations:
      • Inter node communication with NIC type: IB, RoCE
      • Intra node communication
      • Operating systems: open source and commercial
      • Compiler: open source and commercial
      • MPI primitives: P2P, collectives
    • Platform optimization and certification
    • ISV + MPI optimization and certification
    • Integration of ISV + MPI with cluster management
    • Participating in OpenUCX to drive features and optimization on ARM
    • Provide early access to clusters through the HPC-AI Advisory Council.
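The configuration axes on slide 16 multiply quickly. The sketch below enumerates one possible validation matrix from them. How the axes combine is an assumption (for instance, that intra node tests do not depend on the NIC type), and all labels and the function name are illustrative rather than taken from any Huawei tooling.

```python
from itertools import product

# Axes as listed on the slide; the string labels are illustrative.
NICS = ["IB", "RoCE"]
OSES = ["open source OS", "commercial OS"]
COMPILERS = ["open source compiler", "commercial compiler"]
PRIMITIVES = ["P2P", "collectives"]


def certification_matrix():
    """Enumerate (scope, nic, os, compiler, primitive) test configurations.

    Assumption: inter-node runs vary over the NIC type, while intra-node
    runs do not involve a NIC, so their NIC field is None.
    """
    inter = [
        ("inter-node", nic, os_name, compiler, primitive)
        for nic, os_name, compiler, primitive in product(NICS, OSES, COMPILERS, PRIMITIVES)
    ]
    intra = [
        ("intra-node", None, os_name, compiler, primitive)
        for os_name, compiler, primitive in product(OSES, COMPILERS, PRIMITIVES)
    ]
    return inter + intra


matrix = certification_matrix()
print(f"{len(matrix)} configurations to validate")
for config in matrix[:3]:
    print(config)
```

Adding further axes, such as platform or ISV application, multiplies the count again, which is one reason certification is listed as a distinct work item.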
  • 17. Services Requirements for ARM based HPC solution
    • Dedicated, seasoned team with HPC skills (yes, we are hiring!) spread across China, the EU and the US to optimize ARM based HPC solutions, delivering:
      • Optimizations of open source applications
      • Support to ISVs in their porting, optimization and certification efforts
      • Training on ARM CPU, platform and software stacks
      • Benchmarking team for business support
    • That very same team interacts closely with the HiSilicon team to squeeze performance out of applications and to drive new features for next generation CPUs for HPC.
  • 18. Services Requirements for ARM based HPC solution
    • Dedicated HPC team with multidisciplinary and overlapping skills:
      • Group skills 1, CPU centric: CPU architecture, compiler technology, algorithms, performance modeling, profiling.
      • Group skills 2, System centric: CPU architecture, system architecture, networking, parallel file systems, operating systems and driver tuning.
      • Group skills 3, Math centric: linear algebra, statistics, algorithms, data structures, MPI, OpenMP, partial differential equations, numerical methods, sometimes also one of the verticals.
      • Group skills 4, Vertical centric: individuals with vertical market experience, also strong on linear algebra, partial differential equations, numerical methods.
  • 19. If you want to know more
    • Both vendors and customers are encouraged to sign an NDA for disclosure of the details of Huawei’s ARM based HPC solutions and their availability timelines.
    • We plan to unveil progressively more details in 2H 2018 at multiple events such as SC18, including both open source and commercial application demos.
  • 20. Thank you