Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora

Talk Title: Huawei’s requirements for the ARM based HPC solution readiness
Talk Abstract:
A high-level review of a wide range of requirements to architect a competitive ARM based HPC solution is provided. The review combines both industry and Huawei’s unique views with the intent not only to communicate openly the alignment with and support of ongoing efforts carried out by other key ARM players, but also to brief on the areas of differentiation in which Huawei is investing for the research, development and deployment of homegrown ARM based HPC solution(s).


Speaker: Joshua Mora
Speaker Bio:
20 years of experience in research and development of both software and hardware for high performance computing. Currently leading the architecture definition and development of ARM based HPC solutions, both hardware and software, all the way to the applications (i.e. turnkey HPC solutions for different compute intensive markets where ARM will succeed!).

Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora

  1. Huawei’s requirements for the ARM based HPC solution readiness
     Joshua.Mora@Huawei.com
     Chief Architect microprocessor and applications for HPC and BigData R&D IT Product Line. Futurewei, Santa Clara, USA
     Arm Architecture HPC Workshop by Linaro and HiSilicon, 7/26/2018
  2. Objectives of the presentation
     • A high-level review of a wide range of requirements to architect a competitive ARM based HPC solution is provided.
     • The review combines both industry and Huawei’s unique views with the intent to:
       • communicate openly the alignment with and support of ongoing efforts carried out by other key ARM players
       • brief on the areas of differentiation in which Huawei is investing for the research, development and deployment of homegrown ARM based HPC solution(s).
  3. Market opportunities and Timelines
     • ARM, its partners and vendors, on both the HW and SW sides, are creating a set of competitive products that customers are evaluating and investing in, with visibility in 2018-2020.
     • ARM based HPC initiatives and business cases, currently led by customers in research institutions, are a clear server market reaction to the stagnation of x86 based solutions over the past ~4 years. The result is competitive performance of the ARM core and SoCs and the growth/maturity of core SW with the help of key entities such as Linaro and ARM vendors.
     • We believe at Huawei that 2018-2020 is a crucial window of opportunity to demonstrate the value of ARM based solutions, in the HPC space among others.
  4. Execution model
     • Our execution model will allow Huawei to participate in the aforementioned window of opportunity.
     • The strategy for executing the development of ARM based HPC solutions has 2 phases:
       • Phase 1 (development ready): A variety of Hi1616 based platforms (reliable and performant, ~ Broadwell) have been available to enable partners to build both HW and SW ecosystems (core components of the HPC solution), including applications.
       • Phase 2 (business ready): A similar number of Hi1620 based platforms (with competitive performance against currently available x86 CPUs) will soon become available for a “smooth/quick” update/transition from phase 1.
  5. Very High Level Requirements
     • Ultimate objective: turnkey HPC solutions.
     • Define/architect from the HW perspective:
       • compute tier, ARM based, with and without accelerators
       • storage tier, ARM based, with and without accelerators
       • networking, with support for IB and RoCE and with “smart” capabilities
     • Define/architect from the SW perspective:
       • BIOS/FW, platform specific
       • OS tuning (incl. drivers and system libraries) and certification, platform agnostic
       • HPC SW stack optimized and certified for specific platforms
       • applications optimized and certified for specific platforms
     • Deployment models: on premise and cloud
  6. Speeding up ARM based HPC solution adoption
     • Focus on development all the way to turnkey solutions (like cluster management, containers and applications), not just core components (like drivers, OS, MPI, compiler, math library).
     • Investment in deploying the ARM based HPC solution as a service in the cloud: customers should not need to be aware of which architecture is delivering the HPC service (i.e. HPC application execution must meet performance targets in an affordable way).
     • This effort requires alignment with ARM, HW vendors, SW vendors and cloud providers, and clear communication to customers through a variety of events such as this one.
     • It cannot easily be driven by a single ARM based vendor alone.
     • Huawei therefore acknowledges and supports these activities, which will pave the road for ARM based business.
  7. HW Requirements for ARM based HPC solution
     • CPU
       • Race for memory bandwidth: window of opportunity with 8 memory channels/CPU and memory frequencies up to 3200 MHz, leading to 2P system memory bandwidth >300 GB/s (measured).
       • Large core count with competitive performance: ~64 cores/CPU (without SMT/HT) at high core frequency, up to 3 GHz.
       • >128-bit vector instructions.
       • Low local and remote random memory access latency: <90 ns and <200 ns respectively.
       • Efficient hardware prefetchers to get high single-core bandwidth (>20 GB/s), so that a few cores in a NUMA node saturate the memory controller bandwidth (good for core-licensed applications).
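The memory bandwidth figures on slide 7 can be sanity-checked with a back-of-the-envelope calculation. The sketch below is illustrative only and rests on two assumptions that the slide does not state: each channel has a 64-bit (8-byte) data bus, as is usual for DDR4, and "3200 MHz" means 3200 mega-transfers per second. Under those assumptions the theoretical peak is 204.8 GB/s per CPU and 409.6 GB/s for a 2P system, so a measured >300 GB/s would be roughly 73% of peak or better. The function name `peak_bandwidth_gbs` is made up for this example.

```python
# Back-of-the-envelope peak memory bandwidth for a 2-socket (2P) system.
# Assumptions (not stated on the slide): 64-bit (8-byte) data bus per channel,
# and "3200 MHz" read as 3200 mega-transfers per second.

def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: float,
                       bytes_per_transfer: int = 8, sockets: int = 1) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_s = sockets * channels * mega_transfers_per_s * 1e6 * bytes_per_transfer
    return bytes_per_s / 1e9

per_socket = peak_bandwidth_gbs(channels=8, mega_transfers_per_s=3200)
two_socket = peak_bandwidth_gbs(channels=8, mega_transfers_per_s=3200, sockets=2)

# Fraction of the theoretical 2P peak represented by the 300 GB/s mark.
efficiency_at_300 = 300.0 / two_socket

print(f"peak per socket: {per_socket:.1f} GB/s")
print(f"peak 2P system:  {two_socket:.1f} GB/s")
print(f"300 GB/s is {efficiency_at_300:.0%} of the 2P peak")
```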
  8. SW Requirements for ARM based HPC solution
     • Ongoing development efforts in 2 complementary solutions:
       • Fully open source: “community based support/you are mostly on your own”
       • Fully commercial: “we support you everywhere”
     [Diagram labels: Cost/Revenue/Margin; Value added: optimizations, support, basic performance; Commercial solution; Open source solution.]
  9. SW Requirements for ARM based HPC solution
     • For either one, the SW stack looks as follows:
       • BIOS, OS, drivers
       • Cluster management for monitoring, provisioning, scheduling
       • Containers for application deployment
       • Development tools (compiler, profiler, debugger)
       • Libraries (Math, MPI)
       • Applications (different verticals)
       • Parallel File System
     • The open source effort is built around OpenHPC (activity reviewed with Linaro), and the application focus is driven by current business opportunities such as CFD, weather, bioinformatics and astrophysics.
  10. SW Requirements for ARM based HPC solution
     • Partnerships with ISVs:
       • ISVs are fundamental for the healthy growth of the ARM business if we are pursuing turnkey solutions.
       • While our final objective is to deliver good performance on our platform, we are encouraging the ISVs to reach out to the other ARM vendors in order to grow the portfolio of ARM based solutions available to customers in 2018-2020.
       • We follow the 2 phase execution model with the ISVs.
       • Reseller agreements to facilitate the adoption of high quality software stacks optimized for ARM.
       • We intend to deploy the turnkey solutions with those ISVs both on premise and in the cloud to speed up adoption.
  11. SW Requirements for ARM based HPC solution
     • Math libraries supporting hybrid MPI + OpenMP for multi-chip-module SoCs, with low communication/synchronization overheads within the node.
     • Optimized multithreaded libraries based on task scheduling of a DAG (Directed Acyclic Graph) to leverage the high core count of the CPU.
     • Opportunities to reduce bandwidth requirements and improve scalability on large core count architectures.
  12. SW Requirements for ARM based HPC solution
     HPCG: Leveraging DAG for efficient OpenMP execution of the Gauss-Seidel algorithm
  13. SW Requirements for ARM based HPC solution
     HPCG: Leveraging DAG for efficient OpenMP execution of the Gauss-Seidel algorithm
  14. SW Requirements for ARM based HPC solution
     HPCG: Leveraging DAG for efficient OpenMP execution of the Gauss-Seidel algorithm
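The figures for slides 12 to 14 did not survive, so the exact scheme presented is unknown. As a generic illustration of how a dependency DAG exposes parallelism in a Gauss-Seidel sweep, the following sketch uses level scheduling: row i of a forward sweep depends on every earlier row j < i with a nonzero A[i][j], and rows at the same depth of that DAG can be updated concurrently (for example by OpenMP threads or tasks). It assumes a structurally symmetric matrix, which holds for stencil matrices such as the one in HPCG. All function names and the small 2D Laplacian test matrix are made up for this example.

```python
from collections import defaultdict

def laplacian_2d(n):
    """5-point Laplacian on an n x n grid, stored as {row: {col: value}}."""
    rows = {}
    for y in range(n):
        for x in range(n):
            i = y * n + x
            row = {i: 4.0}
            if x > 0:
                row[i - 1] = -1.0
            if x < n - 1:
                row[i + 1] = -1.0
            if y > 0:
                row[i - n] = -1.0
            if y < n - 1:
                row[i + n] = -1.0
            rows[i] = row
    return rows

def dag_levels(rows):
    """Group rows by depth in the forward-sweep dependency DAG.
    Row i depends on every row j < i with a nonzero A[i][j]."""
    level = {}
    for i in sorted(rows):
        deps = [level[j] for j in rows[i] if j < i]
        level[i] = 1 + max(deps) if deps else 0
    groups = defaultdict(list)
    for i, depth in level.items():
        groups[depth].append(i)
    return [groups[depth] for depth in sorted(groups)]

def update_row(rows, b, x, i):
    """One Gauss-Seidel update of unknown i, in place."""
    s = sum(v * x[j] for j, v in rows[i].items() if j != i)
    x[i] = (b[i] - s) / rows[i][i]

def gs_sequential(rows, b, x):
    """Reference forward sweep in natural row order."""
    for i in sorted(rows):
        update_row(rows, b, x, i)
    return x

def gs_by_levels(rows, b, x, levels):
    """Forward sweep level by level. Rows inside one level are mutually
    independent, so the inner loop is the part a parallel runtime could
    distribute across cores; it runs serially here."""
    for group in levels:
        for i in group:
            update_row(rows, b, x, i)
    return x

A = laplacian_2d(4)
levels = dag_levels(A)
b = [1.0] * 16
x_seq = gs_sequential(A, b, [0.0] * 16)
x_lvl = gs_by_levels(A, b, [0.0] * 16, levels)
print(len(levels), "levels, widest has", max(len(g) for g in levels), "rows")
print("same result as sequential sweep:", x_seq == x_lvl)
```

For this grid the levels are the anti-diagonals, so the available parallelism grows with the grid width while the number of synchronization points grows with the number of levels. A task-based runtime can relax those level barriers by letting each row start as soon as its own predecessors finish.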
  15. SW Requirements for ARM based HPC solution
     HPCG: Leveraging DAG to fuse Gauss-Seidel with Residual (bandwidth reduction)
     [Figure, two panels, problem size 96x96x96 on Hi1616, x-axis: #cores (1, 2, 4, 8, 16, 32).
      Panel 1, y-axis: relative performance increase; data labels 100 at every core count against 237.6, 228.2, 241.6, 220.7, 175.7, 185.8.
      Panel 2, y-axis: speedup; data labels 1.0, 2.0, 4.2, 7.7, 11.2, 23.6 against Ideal 1.0, 2.0, 4.0, 8.0, 16.0, 32.0.
      Legend: FW GS + R, FW FS GS, FW FS GS Opt, Ideal.
      Annotations: superlinear cache effects with respect to 1 core; memory bandwidth saturation, 12/16 cores in NUMA node; 1.8x better.]
     FW: forward pass; similar benefits for the backward pass.
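Slide 15 does not spell out how the Gauss-Seidel sweep and the residual are fused. One possible way to obtain the residual r = b - A x without a second pass over the matrix, shown here purely as an assumption-laden sketch, relies on A being symmetric (as the HPCG matrix is): after row i is updated by delta_i, each earlier row j < i with a nonzero A[j][i] loses A[j][i] * delta_i from its residual, and by symmetry that coefficient is already at hand as A[i][j]. The helper names and the small tridiagonal test matrix are made up for this example.

```python
def tridiagonal(n):
    """Symmetric 1D Laplacian as {row: {col: value}}, for illustration only."""
    rows = {}
    for i in range(n):
        row = {i: 2.0}
        if i > 0:
            row[i - 1] = -1.0
        if i < n - 1:
            row[i + 1] = -1.0
        rows[i] = row
    return rows

def gs_then_residual(rows, b, x):
    """Reference: forward sweep, then a second pass over A for r = b - A x."""
    for i in sorted(rows):
        s = sum(v * x[j] for j, v in rows[i].items() if j != i)
        x[i] = (b[i] - s) / rows[i][i]
    return [b[i] - sum(v * x[j] for j, v in rows[i].items()) for i in sorted(rows)]

def gs_fused_residual(rows, b, x):
    """Forward sweep and residual in a single pass over A.
    Assumes A is symmetric and rows are numbered 0..n-1."""
    r = [0.0] * len(b)
    for i in sorted(rows):
        s = sum(v * x[j] for j, v in rows[i].items() if j != i)
        new = (b[i] - s) / rows[i][i]
        delta = new - x[i]
        x[i] = new
        # Earlier rows were solved against the old x[i]; correct their residuals.
        for j, v in rows[i].items():
            if j < i:
                r[j] -= v * delta  # v = A[i][j] = A[j][i] by symmetry
    return r

A = tridiagonal(6)
b = [1.0] * 6
r_ref = gs_then_residual(A, b, [0.0] * 6)
r_fused = gs_fused_residual(A, b, [0.0] * 6)
max_diff = max(abs(p - q) for p, q in zip(r_ref, r_fused))
print("largest difference between the two residuals:", max_diff)
```

The fused variant touches each matrix row once instead of twice, which is the kind of memory traffic saving the slide title refers to as bandwidth reduction. The two residuals agree up to floating-point rounding, not bit for bit.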
  16. SW Requirements for ARM based HPC solution
     • MPI validation, optimization and certification across a set of configurations:
       • Inter-node communication by NIC type: IB, RoCE
       • Intra-node communication
       • Operating systems: open source and commercial
       • Compilers: open source and commercial
       • MPI primitives: P2P, collectives
     • Platform optimization and certification
     • ISV + MPI optimization and certification
     • Integration of ISV + MPI with cluster management
     • Participating in OpenUCX to drive features and optimization on ARM
     • Providing early access to clusters through the HPC-AI Advisory Council.
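To give a feel for how quickly the MPI validation effort on slide 16 grows, the configuration axes it lists can be enumerated as a cross product. This is only a sketch: the labels are paraphrased from the slide, the dictionary keys are made up, and a real certification matrix would also include specific OS releases, compiler versions and MPI implementations that the slide does not name.

```python
from itertools import product

# Axes taken from the slide; the exact label strings are illustrative.
transports = ["inter-node IB", "inter-node RoCE", "intra-node"]
operating_systems = ["open source OS", "commercial OS"]
compilers = ["open source compiler", "commercial compiler"]
primitives = ["point-to-point", "collectives"]

validation_matrix = [
    {"transport": t, "os": o, "compiler": c, "primitive": p}
    for t, o, c, p in product(transports, operating_systems, compilers, primitives)
]

print(len(validation_matrix), "configurations to validate")
for config in validation_matrix[:3]:
    print(config)
```

Even with only two or three choices per axis, the matrix reaches 24 combinations per platform, before ISV applications and cluster management integration are layered on top.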
  17. Services Requirements for ARM based HPC solution
     • Dedicated, seasoned team with HPC skills (yes, we are hiring!) spread across China, the EU and the US to optimize ARM based HPC solutions, delivering:
       • Optimizations of open source applications
       • Support to ISVs in their porting, optimization and certification efforts
       • Training on ARM CPUs, platforms and software stacks
       • Benchmarking team for business support
     • That same team interacts closely with the HiSilicon team to squeeze performance out of applications and to drive new features for next generation CPUs for HPC.
  18. Services Requirements for ARM based HPC solution
     • Dedicated HPC team with multidisciplinary and overlapping skills:
       • Group skills 1, CPU centric: CPU architecture, compiler technology, algorithms, performance modeling, profiling.
       • Group skills 2, system centric: CPU architecture, system architecture, networking, parallel file systems, operating systems and driver tuning.
       • Group skills 3, math centric: linear algebra, statistics, algorithms, data structures, MPI, OpenMP, partial differential equations, numerical methods, sometimes also one of the verticals.
       • Group skills 4, vertical centric: individuals with vertical market experience, also strong in linear algebra, partial differential equations and numerical methods.
  19. If you want to know more
     • Both vendors and customers are encouraged to sign an NDA for disclosure of details of Huawei’s ARM based HPC solutions and their availability timelines.
     • We are planning to progressively unveil more details in 2H 2018 at multiple events such as SC18, including both open source and commercial application demos.
  20. Thank you
