
MN-3, MN-Core and HPL - SC21 Green500 BOF


This presentation was given at the Green500 BoF at SC21, in which PFN's VP of Computing Infrastructure Yusuke Doi discussed the power measurement for PFN's MN-3 supercomputer with MN-Core™ accelerators and how the company improved MN-3's power efficiency from 29.7 GF/W to 39.38 GF/W in five months.
More about MN-Core: https://projects.preferred.jp/mn-core/en/
More about MN-3: https://projects.preferred.jp/supercomputers/en/


  1. MN-3, MN-Core and HPL
     SC21 Green500 BOF
     Yusuke Doi, Ph.D., Corporate Officer, VP of Computing Infrastructure, Preferred Networks, Inc.
  2. Who Are We? Why Do We Need Computing Power?
  3. Preferred Networks, Inc.: Make the Real World Computable
     ● Founded: March 2014
     ● Directors: CEO Toru Nishikawa, COO Daisuke Okanohara, CTO Ryosuke Okuta
     ● Located: Tokyo, Japan (HQ); Burlingame, CA, US (Preferred Networks America, Inc.)
     ● Industry domains: transportation, manufacturing, life sciences, materials, robots, entertainment
  4. How much information can you extract from a single image?
     Our pixel-accuracy object detection model extracts a large amount of rich features from a single image using:
     ● State-of-the-art algorithms
     ● Hyperparameter tuning and optimization using Optuna™
     ● Proprietary CG-based annotation-free data generation and augmentation, combined with domain transfer to real images
     ● Distributed / large-batch training
  5. Computational Chemistry with Deep Learning: searching for new materials on computers
     [Diagram: a neural-network behavior model maps atoms to energy and force, driving molecular dynamics and physical-property prediction.]
  6. Our Capabilities
     ● Deep Learning: world-class researchers focusing on deep learning
     ● Expertise: a wide range of deep expertise, from robotics to genomics to computational chemistry
     ● Private Supercomputer: world-class computational resources designed for deep learning applications
     ● Software: in-house development of OSS and a hyperparameter tuning library to accelerate software development
  7. MN-3 and MN-Core: Deep Learning Supercomputer
     MN-3 node specs:
     ● MN-Core: MN-Core board x 4 (deep learning processor)
     ● CPU: Intel Xeon 8260M, 2-way (48 physical cores)
     ● Memory: 384 GB DDR4
     ● Storage-class memory: 3 TB Intel Optane DC Persistent Memory
     ● Network: MN-Core DirectConnect (112 Gbps) x 2, Mellanox ConnectX-6 (100 GbE) x 2, on-board (10 GbE) x 2
     For more information, please visit: https://projects.preferred.jp/supercomputers/en/
  8. MN-3 and HPL
     MN-3 is the world's most energy-efficient supercomputer for deep learning. We use HPL to understand how to run our computer efficiently.
     Green500 / TOP500 history:
     ● 2021/11: 39.38 GFlops/W (Green500 #1 / TOP500 #301)
     ● 2021/06: 29.70 GFlops/W (Green500 #1 / TOP500 #335)
     ● 2020/11: 26.04 GFlops/W (Green500 #2 / TOP500 #330)
     ● 2020/06: 21.11 GFlops/W (Green500 #1 / TOP500 #393)
  9. MN-Core
     Giant SIMD processor:
     ● Single instruction stream
     ● 500 W/package at 32.8 TF (DP), i.e. 65 GF/W on chip in the ideal case
     ● Hierarchical structure with a unique on-chip network (broadcast, aggregation, etc.)
     ● Deterministic and transparent to software: no cache; software must manage data copies between layers
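The "65 GF/W on chip (ideal case)" figure on this slide follows directly from the package numbers; a one-line sanity check:

```python
# Ideal on-chip power efficiency of one MN-Core package, computed
# from the figures on this slide (32.8 TFLOPS double-precision peak
# at 500 W package power).
peak_tflops = 32.8        # DP peak, TFLOPS
package_power_w = 500.0   # package power, W

gf_per_w = peak_tflops * 1000 / package_power_w  # TFLOPS -> GFLOPS
print(f"ideal on-chip efficiency: {gf_per_w:.1f} GF/W")
```

This yields about 65.6 GF/W, which the slide rounds down to 65 GF/W; real-world HPL efficiency is necessarily lower because the whole system (CPUs, memory, interconnect, cooling overhead at the node level) draws power too.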
  10. Idea Behind MN-Core: Transparent Hardware for High Performance
     "By providing only the functions necessary for computation and controlling them completely with software, we can achieve high execution efficiency and power efficiency with minimal hardware. This is difficult to achieve with cache-based parallel processors whose behavior is hidden from software." (Prof. Makino, Kobe Univ.)
  11. Power Measurements
  12. Update to Level 3 Power Measurement
     ● Level 2 in our first Green500 (June 2020)
     ● Upgraded the power facility and measurement devices to meet Level 3 requirements; the #1 system should be one of the best examples of power measurement
     ● We have measured under Level 3 criteria since our second Green500 (Nov. 2020)
  13. Level 3 Measurements: Power System of the MN-3
     [Diagram: two 200 V / 600 A 3P3W feeds and one 200 V / 150 A 3P3W feed supply MN-3A Zone 0 (32 nodes, Smart PDU x 10), MN-3A Zone 1 (16 + 16 nodes, Smart PDU x 8), and the MN-3 interconnect. Revenue-grade meters (ME110SSR x 4) and a power analyzer (WT1800E, 6 elements) feed a TSDB; the HPL program triggers measurement, with feedback via a Slack bot and the web (Grafana).]
  14. Everybody can see the results on Slack and Grafana dashboards
  15. Measurement System Supporting Continuous Improvement
     ● The more iterations, the more improvements
     ● Key to rapid iteration: how quickly we share the results of experiments
     ● Automated reporting system:
       ○ Issues a unique ID to each HPL run
       ○ Records timestamps of the core/full phases with the ID
       ○ Generates a summary and graph of the power measurements
       ○ Shares the results in Slack
     ● This helps us quickly understand the effects of development changes
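The reporting flow on this slide could be sketched roughly as below. This is an illustrative assumption, not PFN's actual tooling: the function name `report_run` and all power/performance values are made up for the example, and a real Slack bot would post the string to a channel instead of printing it.

```python
# Hedged sketch of the automated HPL reporting flow: issue a unique
# ID per run, record core-phase timestamps, and build a one-line
# summary for Slack. Names and numbers are illustrative only.
import uuid
from datetime import datetime, timezone

def report_run(core_start, core_end, avg_power_w, gflops):
    run_id = uuid.uuid4().hex[:8]                 # unique ID per HPL run
    duration = (core_end - core_start).total_seconds()
    gf_per_w = gflops / avg_power_w               # headline Green500 metric
    return (f"HPL run {run_id}: core phase {duration:.0f}s, "
            f"{avg_power_w:.0f} W avg, {gf_per_w:.2f} GF/W")

# Example with made-up numbers (a 1-hour core phase):
t0 = datetime(2021, 10, 1, 12, 0, tzinfo=timezone.utc)
t1 = datetime(2021, 10, 1, 13, 0, tzinfo=timezone.utc)
print(report_run(t0, t1, avg_power_w=65000, gflops=2_560_000))
```

The point of such automation is the feedback loop the slide describes: every run gets an ID and a shareable summary, so an optimization's effect on GF/W is visible to the whole team within minutes.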
  16. Optimization
  17. We targeted 40 GF/W with our 12 nm accelerator
  18. MN-Core Challenge on HPL
     ● 2020/06: 21.11 GFlops/W, efficiency 41% (#1): initial challenge, made it work
     ● 2020/11: 26.04 GFlops/W, efficiency 53% (#2): optimization of scheduling and GEMM
     ● 2021/06: 29.70 GFlops/W, efficiency 58% (#1): even more optimization
     ● 2021/11: 39.38 GFlops/W, efficiency 64% (#1): interconnect improvement, aggressive software-level clock gating, and still more optimization
  19. Execution Efficiency Improvement Breakdown
     58% → 64% (6 pt gain)
     ● +2 pt (58 → 60): optimized the DGEMM kernel and reorganized the overlap of DGEMM and communication (swap and panel broadcast)
     ● +3 pt (60 → 63): increased the bandwidth of the interconnect (MN-Core DirectConnect) and overlapped more computation and communication
     ● +1 pt (63 → 64): optimized other parts, including panel factorization and dynamic code generation
  20. Power Efficiency Improvement Breakdown
     29.70 GF/W → 39.38 GF/W (9.68 GF/W gain)
     ● +3.4 GF/W: corresponding to the improvement in execution efficiency
     ● +4.4 GF/W: generating "energy-efficient instructions" in software: stopping unused arithmetic units, using scratchpad flip-flops instead of SRAMs, reducing the energy consumption of data copies, etc.
     ● +1.9 GF/W: other tuning, including core voltage and frequency
     [Figures: stopping an ALU in a PE; reusing a matrix as much as possible to reduce data copies]
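A rough cross-check of the first breakdown item: if power draw and peak performance were held fixed, GF/W would scale linearly with execution efficiency. Using the rounded percentages from the slides, this is only an approximation; the slides' exact (unrounded) efficiencies and the interaction with the other optimizations account for the gap to the stated +3.4 GF/W.

```python
# Sanity check: scale the 2021/06 result by the efficiency ratio to
# estimate how much of the GF/W gain the efficiency improvement alone
# would explain. Figures are from the slides (rounded percentages).
before_gfw, before_eff = 29.70, 0.58   # 2021/06 submission
after_eff = 0.64                       # 2021/11 execution efficiency

implied_gfw = before_gfw * after_eff / before_eff
gain_from_eff = implied_gfw - before_gfw
print(f"implied gain from efficiency alone: {gain_from_eff:.2f} GF/W")
```

This gives roughly +3.1 GF/W, in the same ballpark as the slide's +3.4 GF/W attribution, with the remaining ~6.3 GF/W coming from the energy-efficient instruction generation and voltage/frequency tuning listed above.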
  21. Summary
     ● Unique computation framework of MN-Core:
       ○ Deterministic and transparent hardware, fully controlled by software
       ○ Application-specific optimization
     ● HPL is a very useful benchmark for understanding the efficiency of a new style of computer in a real environment
       ○ Precise and integrated measurement is essential for continuous improvement
     Please visit us at booth #1521!
