Fugaku is designed to be the centerpiece of Japan's Society 5.0 vision. It is being built as the world's first exascale supercomputer, targeting speeds over 100 times faster than the previous K supercomputer on some applications. Fugaku will have over 150,000 nodes, more than 150 petabytes per second of aggregate memory bandwidth, and a peak performance of over 400 petaflops for double-precision calculations. It aims to accelerate high performance computing, big data, and AI workloads for important societal domains such as healthcare, disaster prevention, energy, and manufacturing.
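As a quick sanity check, the headline figures above are mutually consistent. A sketch with assumed round numbers (the real machine has somewhat more nodes and a configurable clock):

```python
# Back-of-the-envelope check of Fugaku's headline numbers. The values are
# assumptions taken from the article, not exact machine specs.
nodes = 150_000            # "over 150,000 nodes"
per_node_tflops = 2.7      # A64FX double-precision peak per node

system_peak_pflops = nodes * per_node_tflops / 1000.0
print(f"~{system_peak_pflops:.0f} PFLOPS")  # ~405 PFLOPS, matching "over 400 petaflops"
```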
A64fx and Fugaku - A Game Changing, HPC / AI Optimized Arm CPU to enable Exascale Performance (inside-BigData.com)
In this video from Linaro Connect 2019, Satoshi Matsuoka from Riken presents: A64fx and Fugaku - A Game Changing, HPC / AI Optimized Arm CPU to enable Exascale Performance.
"Fugaku is the flagship next generation national supercomputer being developed by Riken R-CCS and Fujitsu in collaboration. Fugaku will have hyperscale datacenter class resources in a single exascale machine, with more than 150,000 nodes of server-class Fujitsu A64fx many-core Arm CPUs featuring the new SVE (Scalable Vector Extension) with low-precision math for the first time in the world, accelerating both HPC and AI workloads, augmented with HBM2 memory paired with each CPU, exhibiting nearly a terabyte/s of memory bandwidth for both HPC and AI rapid data movements."
Watch the video: https://wp.me/p3RLHQ-kYn
Learn more: https://postk-web.r-ccs.riken.jp/
and
https://connect.linaro.org/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck, Yuichiro Ajima from Fujitsu presents: The Tofu Interconnect D.
"Through the development of post-K, which will be equipped with this CPU, Fujitsu will contribute to the resolution of social and scientific issues in such computer simulation fields as cutting-edge research, health and longevity, disaster prevention and mitigation, energy, as well as manufacturing, while enhancing industrial competitiveness and contributing to the creation of Society 5.0 by promoting applications in big data and AI fields."
Learn more: https://insidehpc.com/2018/08/fujitsu-unveils-details-post-k-supercomputer-processor-powered-arm/
and
http://www.fujitsu.com/jp/solutions/business-technology/tc/catalog/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Arm A64fx and Post-K: Game-Changing CPU & Supercomputer for HPC, Big Data, & AI (inside-BigData.com)
Satoshi Matsuoka from RIKEN gave this talk at the HPC User Forum in Santa Fe.
"With the rapid rise of Big Data and AI as a new breed of high-performance workloads on supercomputers, we need to accommodate them at scale, hence the need for R&D on HW and SW infrastructures where traditional simulation-based HPC and BD/AI converge, in a BYTES-oriented fashion. Post-K is the flagship next generation national supercomputer being developed by Riken and Fujitsu in collaboration. Post-K will have hyperscale-class resources in one exascale machine, with well more than 100,000 nodes of server-class A64fx many-core Arm CPUs, realized through an extensive co-design process involving the entire Japanese HPC community.
Rather than focusing on double-precision flops, which are of lesser utility, Post-K, especially its A64fx processor and the Tofu-D network, is designed to sustain extreme bandwidth on realistic applications, including those for oil and gas such as seismic wave propagation, CFD, and structural codes, besting its rivals by several factors in measured performance. Post-K is slated to perform 100 times faster on some key applications compared to its predecessor, the K computer, and will also likely be the premier big data and AI/ML infrastructure. Currently, we are conducting research to scale deep learning to more than 100,000 nodes on Post-K, where we would obtain near top GPU-class performance on each node."
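The "BYTES-oriented" emphasis above is about sustained memory bandwidth rather than peak flops. A minimal, STREAM-copy-style probe of sustained copy bandwidth, sketched in pure Python (illustrative only; real measurements use compiled kernels and report much higher numbers):

```python
import time

# Measure sustained copy bandwidth by repeatedly bulk-copying a buffer.
# Working-set size and repetition count are arbitrary choices for the sketch.
N = 16 * 1024 * 1024          # 16 MiB working set
src = bytearray(N)
dst = bytearray(N)

reps = 20
t0 = time.perf_counter()
for _ in range(reps):
    dst[:] = src              # bulk copy: reads N bytes, writes N bytes
elapsed = time.perf_counter() - t0

gbytes = 2 * N * reps / 1e9   # bytes read + bytes written
print(f"sustained copy bandwidth: {gbytes / elapsed:.2f} GB/s")
```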
Watch the video: https://wp.me/p3RLHQ-k6G
Learn more: https://en.wikichip.org/wiki/supercomputers/post-k
and
http://hpcuserforum.com
In this deck from the Perth HPC Conference, Rob Farber from TechEnablement presents: AI is Impacting HPC Everywhere.
"The convergence of AI and HPC has created a fertile venue that is ripe for imaginative researchers — versed in AI technology — to make a big impact in a variety of scientific fields. From new hardware to new computational approaches, the true impact of deep- and machine learning on HPC is, in a word, “everywhere”. Just as technology changes in the personal computer market brought about a revolution in the design and implementation of the systems and algorithms used in high performance computing (HPC), so are recent technology changes in machine learning bringing about an AI revolution in the HPC community. Expect new HPC analytic techniques including the use of GANs (Generative Adversarial Networks) in physics-based modeling and simulation, as well as reduced precision math libraries such as NLAFET and HiCMA to revolutionize many fields of research. Other benefits of the convergence of AI and HPC include the physical instantiation of data flow architectures in FPGAs and ASICs, plus the development of powerful data analytic services."
Learn more: http://www.techenablement.com/
and
http://hpcadvisorycouncil.com/events/2019/australia-conference/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the Argonne Training Program on Extreme-Scale Computing 2019, Howard Pritchard from LANL and Simon Hammond from Sandia present: NNSA Explorations: ARM for Supercomputing.
"The Arm-based Astra system at Sandia will be used by the National Nuclear Security Administration (NNSA) to run advanced modeling and simulation workloads for addressing areas such as national security, energy and science.
"By introducing Arm processors with the HPE Apollo 70, a purpose-built HPC architecture, we are bringing powerful elements, like optimal memory performance and greater density, to supercomputers that existing technologies in the market cannot match,” said Mike Vildibill, vice president, Advanced Technologies Group, HPE. “Sandia National Laboratories has been an active partner in leveraging our Arm-based platform since its early design, and featuring it in the deployment of the world’s largest Arm-based supercomputer, is a strategic investment for the DOE and the industry as a whole as we race toward achieving exascale computing.”
Watch the video: https://wp.me/p3RLHQ-l29
Learn more: https://insidehpc.com/2018/06/arm-goes-big-hpe-builds-petaflop-supercomputer-sandia/
and
https://extremecomputingtraining.anl.gov/agenda-2019/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this slidecast, Bill Mannel from SGI presents an update on the company's innovative HPC solutions.
Learn more at: http://sgi.com
Watch the presentation video: http://insidehpc.com/2013/07/01/slidecast-sgi-product-update-for-june-2013/
Implementing AI: High Performance Architectures: Large scale HPC hardware in the age of AI (KTN)
The Implementing AI: High Performance Architectures webinar, hosted by KTN and eFutures, was the fourth event in the Implementing AI webinar series.
The focus of the webinar was the impact of processing AI data on data centres - particularly from the technology perspective. Prof. Simon McIntosh-Smith, Professor in High Performance Computing, University of Bristol, covered Large scale HPC hardware in the age of AI.
This slide deck gives a detailed view of the hardware architecture of the Summit supercomputer, including its CPUs, GPUs, interconnect network, and applications.
Today Fujitsu published specifications for the A64FX CPU to be featured in the post-K computer, a future machine designed to be 100 times faster than the legendary K computer that dominated the TOP500 for years.
A64FX is the world's first CPU to adopt the Scalable Vector Extension (SVE), an extension of Armv8-A instruction set architecture for supercomputers. Building on over 60 years' worth of Fujitsu-developed microarchitecture, this chip offers peak performance of over 2.7 TFLOPS, demonstrating superior HPC and AI performance. A64FX offers a number of features, including broad utility supporting a wide range of applications, massive parallelization through the Tofu interconnect, low power consumption, and mainframe-class reliability.
A64FX is the world's first CPU to adopt the SVE of Arm Limited's Armv8-A instruction set architecture, extended for supercomputers. Fujitsu collaborated with Arm, contributing to the development of the SVE as a lead partner, and adopted the results in the A64FX.
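For reference, the "over 2.7 TFLOPS" peak can be reconstructed from the chip's shape. The parameters below (48 compute cores, two 512-bit SVE FMA pipes per core, 1.8 GHz base clock) are assumptions consistent with public A64FX material; boost clocks give a higher figure:

```python
# Deriving the A64FX double-precision peak from assumed chip parameters.
cores         = 48          # compute cores (assistant cores excluded)
sve_pipes     = 2           # 512-bit FMA pipelines per core
fp64_lanes    = 512 // 64   # 8 doubles per SVE vector
flops_per_fma = 2           # multiply + add count as two flops
clock_hz      = 1.8e9       # assumed base clock

peak_tflops = cores * sve_pipes * fp64_lanes * flops_per_fma * clock_hz / 1e12
print(f"{peak_tflops:.2f} TFLOPS")  # 2.76 TFLOPS
```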
Learn more: https://wp.me/p3RLHQ-iYt
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Early Benchmarking Results for Neuromorphic Computing (Desmond Yuen)
An update on the Intel Neuromorphic Research Community’s growth and benchmark results, including the addition of new corporate members and numerous new benchmarking updates computed on Intel’s neuromorphic test chip, Loihi.
Designed without compromise for unlimited innovation, Bull's HPC clusters of bullx servers are deployed on several continents with petascale computing power, for applications ranging from sports car design to simulating the full observable universe.
The Barcelona Supercomputing Center (BSC) was established in 2005 and hosts MareNostrum, one of the most powerful supercomputers in Spain. We are the pioneering supercomputing center in Spain. Our specialty is high performance computing (HPC), and our mission is twofold: to provide supercomputing infrastructure and services to Spanish and European scientists, and to generate knowledge and technology to transfer to society. We are a Severo Ochoa Center of Excellence, a top-tier member of the European research infrastructure PRACE (Partnership for Advanced Computing in Europe), and we manage the Red Española de Supercomputación (RES). As a research center, we have more than 456 experts from 45 countries, organized in four major research areas: computer science, life sciences, earth sciences, and computational applications in science and engineering.
The RAPIDS suite of software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.
Despite the growing number of deep learning practitioners and researchers, many of them do not use GPUs, which can lead to long training and evaluation cycles and impractical research.
In his talk, Lior shares how to get started with GPUs and some of the best practices that helped him during research and work. The talk is for everyone who works with machine learning (deep learning experience is NOT mandatory!). It covers the very basics of how GPUs work, CUDA drivers, IDE configuration, training, inference, and multi-GPU training.
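A conceptual sketch of data-parallel multi-GPU training, in pure Python so it runs anywhere: each "device" computes a gradient on its own data shard, and the gradients are averaged, which is what an all-reduce does on real GPUs. The model, data, and learning rate are illustrative, not from the talk:

```python
def grad_on_shard(w, shard):
    # gradient of mean squared error for the linear model y = w * x
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def train_step(w, shards, lr=0.01):
    grads = [grad_on_shard(w, s) for s in shards]  # one gradient per "device"
    avg_grad = sum(grads) / len(grads)             # the all-reduce step
    return w - lr * avg_grad

data = [(x, 3.0 * x) for x in range(1, 9)]         # ground truth: w = 3
shards = [data[:4], data[4:]]                      # split across two "devices"
w = 0.0
for _ in range(200):
    w = train_step(w, shards)
print(round(w, 3))  # converges to 3.0
```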
In this deck from the HPC User Forum in Austin, Yutaka Ishikawa from Riken AICS presents: Japan's post K Computer.
Watch the video presentation: http://wp.me/p3RLHQ-fJ6
Learn more: http://hpcuserforum.com
Is it Possible to Build the Airbus of Supercomputing in Europe? (AMETIC)
Presentation by Mateo Valero, Director of the Barcelona Supercomputing Center, at the 30th edition of the Encuentros de Telecomunicaciones y Economía Digital.
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY... (ijdpsjournal)
This paper studies the performance and energy consumption of several multi-core, multi-CPU, and many-core hardware platforms and software stacks for parallel programming. It uses the Multimedia Multiscale Parser (MMP), a computationally demanding image encoder application that was ported to several parallel hardware and software environments, as a benchmark. Hardware-wise, the study assesses NVIDIA's Jetson TK1 development board, the Raspberry Pi 2, and a dual Intel Xeon E5-2620/v2 server, as well as NVIDIA's discrete GPUs GTX 680, Titan Black Edition, and GTX 750 Ti. The assessed parallel programming paradigms are OpenMP, Pthreads, and CUDA, plus a single-thread sequential version, all running in a Linux environment. While the CUDA-based implementation delivered the fastest execution, the Jetson TK1 proved to be the most energy-efficient platform, regardless of the parallel software stack used. Although it has the lowest power demand, the Raspberry Pi 2's energy efficiency is hindered by its lengthy execution times, so it effectively consumes more energy than the Jetson TK1. Surprisingly, OpenMP delivered twice the performance of the Pthreads-based implementation, proving the maturity of the tools and libraries supporting OpenMP.
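The paper's central metric is energy-to-solution, i.e. joules = watts × seconds. A tiny sketch with purely illustrative numbers (not the paper's measurements) shows how the lowest-power platform can still lose on energy when its runtime is long enough:

```python
# Energy-to-solution comparison with made-up numbers for illustration.
platforms = {
    "low-power board A": {"watts": 5.0,   "seconds": 900.0},
    "low-power board B": {"watts": 4.0,   "seconds": 1500.0},  # lowest power
    "server":            {"watts": 180.0, "seconds": 30.0},
}
for name, p in platforms.items():
    joules = p["watts"] * p["seconds"]
    print(f"{name}: {joules:.0f} J")
# Board B draws the least power yet consumes the most energy (6000 J vs
# board A's 4500 J), mirroring the Raspberry Pi 2 vs Jetson TK1 result.
```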
T21. Fujitsu World Tour India 2016 - Education, Research and Design (Fujitsu India)
Find out how Fujitsu has helped hundreds of customers in India redefine their High Performance computing environment - and streamlined processes to enhance throughput.
The Implementing AI: High Performance Architectures webinar, hosted by KTN and eFutures, was the fourth event in the Implementing AI summer webinar series.
Every business is increasing the use of artificial intelligence to gain efficiency and to make better decisions. These new demands for data processing are not well delivered by traditional computer architectures. Enterprises, developers, data scientists, and researchers need new platforms that unify all AI workloads, simplifying infrastructure and accelerating ROI. This has led to the development of high performance and specialised hardware devices to meet these new demands.
The focus of this webinar was the impact of processing AI data on data centres - particularly from the technology perspective. The webinar had four presentations from experts, covering the opportunities, implementation techniques, and case studies, followed by a panel Q&A session.
In this deck from the RISC-V Workshop in Barcelona, Mateo Valero, Director of the Barcelona Supercomputer center, explains how the RISC-V architecture can play a main role in new supercomputer architectures.
"RISC-V is an open, free ISA enabling a new era of processor innovation through open standard collaboration. Born in academia and research, RISC-V ISA delivers a new level of free, extensible software and hardware freedom on architecture, paving the way for the next 50 years of computing design and innovation."
Watch the video interview:
Learn more: https://tmt.knect365.com/risc-v-workshop-barcelona/
In this deck from the 2018 RISC-V Workshop in Barcelona, Prof. Mateo Valero presents: European Processor Initiative & RISC-V.
Watch the video: https://wp.me/p3RLHQ-kHU
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Learn more: https://riscv.org/2018/05/risc-v-workshop-in-barcelona-proceedings/
and
https://www.european-processor-initiative.eu/
Top 10 Supercomputers With Descriptive Information & Analysis (NomanSiddiqui41)
Top 10 Supercomputers Report
What is a Supercomputer?
A supercomputer is a computer with a high level of performance compared to a general-purpose computer. The performance of a supercomputer is commonly measured in floating-point operations per second (FLOPS) rather than million instructions per second (MIPS). Since 2017, supercomputers have existed that can perform over 10^17 FLOPS (a hundred quadrillion FLOPS, or 100 petaFLOPS).
Supercomputers play an important role in the field of computational science, and are used for a wide range of computationally intensive tasks in various fields, including quantum mechanics, weather forecasting, climate research, oil and gas exploration, molecular modeling (computing the structures and properties of chemical compounds, biological macromolecules, polymers, and crystals), and physical simulations (such as simulations of the early moments of the universe, airplane and spacecraft aerodynamics, the detonation of nuclear weapons, and nuclear fusion). They have been essential in the field of cryptanalysis.
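A quick unit check for the FLOPS figures above:

```python
# 1e17 FLOPS expressed in peta- and exa- units.
PETA, EXA = 1e15, 1e18
flops = 1e17
print(flops / PETA, "PFLOPS")  # 100.0 PFLOPS
print(flops / EXA,  "EFLOPS")  # 0.1 EFLOPS
```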
1. The Fugaku Supercomputer
Introduction:
Fugaku is a supercomputer at the RIKEN Center for Computational Science in Kobe, Japan (exascale on low-precision benchmarks, petascale on the mainstream HPL benchmark). Its development started in 2014 as the successor to the K computer, and it entered full operation in 2021. Fugaku made its debut in 2020, becoming the fastest supercomputer in the world on the June 2020 TOP500 list, as well as the first Arm-architecture-based computer to achieve this. In June 2020 it achieved 1.42 exaFLOPS on the HPL-AI benchmark, making it the first supercomputer ever to exceed 1 exaFLOPS on any benchmark. As of November 2021, Fugaku was the fastest supercomputer in the world. It is named after an alternative name for Mount Fuji.
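HPL-AI rewards exactly this pattern: do the heavy solve in low precision, then recover full accuracy with iterative refinement in double precision. A self-contained sketch that emulates low precision by rounding every result to three significant digits (the 2x2 system is hypothetical, not the actual benchmark):

```python
def round_sig(v, digits=3):
    # emulate a low-precision number format by rounding to N significant digits
    return float(f"%.{digits - 1}e" % v)

def solve2_low(A, b):
    # 2x2 Cramer's rule with every result rounded: a "low-precision solver"
    det = round_sig(A[0][0] * A[1][1] - A[0][1] * A[1][0])
    x0 = round_sig((b[0] * A[1][1] - b[1] * A[0][1]) / det)
    x1 = round_sig((A[0][0] * b[1] - A[1][0] * b[0]) / det)
    return [x0, x1]

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = solve2_low(A, b)                      # cheap low-precision solve
for _ in range(5):                        # refinement in full double precision
    r = [b[i] - (A[i][0] * x[0] + A[i][1] * x[1]) for i in range(2)]
    d = solve2_low(A, r)                  # correction, again in low precision
    x = [x[i] + d[i] for i in range(2)]
print(x)  # converges to the exact solution [1/11, 7/11]
```

Each refinement step shrinks the error by roughly the low-precision rounding factor, which is why a cheap solver plus a few double-precision residuals reaches full accuracy.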
Functional Units, Co-Design and System for the Supercomputer “Fugaku”
1. Performance estimation tool: This tool takes Fujitsu FX100 execution profile data as input (FX100 is the previous Fujitsu supercomputer) and projects performance for a given set of architecture parameters. The projection is modeled on the Fujitsu microarchitecture, and the tool can also estimate power consumption from the architecture model.
2. Fujitsu in-house processor simulator: We used an extended FX100 SPARC instruction-set simulator and compiler, developed by Fujitsu, for preliminary studies in the initial phase, and an Armv8+SVE simulator and compiler afterward.
3. Gem5 simulator for the Post-K processor: The Post-K processor simulator, based on the open-source system-level processor simulator Gem5, was developed by RIKEN during the co-design for architecture verification and performance tuning. A fundamental problem is the scale of the scientific applications expected to run on Post-K: even our target applications are thousands of lines of code, written with complex algorithms and data structures.
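A toy version of such a performance-projection tool is the roofline model: attainable performance is capped by either peak compute or memory bandwidth times arithmetic intensity. The numbers below are illustrative, roughly A64FX-like (not from the co-design tools described above):

```python
def roofline_gflops(peak_gflops, bw_gbs, flops_per_byte):
    # attainable performance is the lower of the compute and memory roofs
    return min(peak_gflops, bw_gbs * flops_per_byte)

peak = 2765.0        # GFLOPS, assumed chip peak
bandwidth = 1024.0   # GB/s, assumed HBM2 bandwidth (~1 TB/s)

for intensity in (0.25, 1.0, 2.7, 10.0):  # flops per byte moved
    attainable = roofline_gflops(peak, bandwidth, intensity)
    bound = "memory-bound" if attainable < peak else "compute-bound"
    print(f"intensity={intensity:5.2f} -> {attainable:7.1f} GFLOPS ({bound})")
```

Low-intensity kernels (like stencils or sparse solvers) sit on the bandwidth roof, which is why a bandwidth-oriented co-design raises their projected performance more than extra flops would.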
In this slidecast, Gilad Shainer from Mellanox describes how the company's interconnect solutions power HPC and Ai workloads.
"Mellanox is leading industry innovation by providing the highest throughput and lowest latency HDR 200Gb/s InfiniBand and Ethernet solutions available today, with a clear roadmap for tomorrow. Please visit Mellanox Technologies in booth #3207 to see the latest in our industry-leading HDR 200Gb/s InfiniBand and Spectrum Ethernet solutions, and see how Mellanox is already paving the way to Exascale."
Learn more: http://mellanox.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
At ISC'10, Fujitsu will be introducing a number of key technologies including information on Japan's Next-Generation Supercomputer to be installed at RIKEN.
By Koichi Hirai, Fujitsu
Post-K will be an Arm-based supercomputer, but there are not many Arm-based servers for HPC today. We therefore believe an Arm HPC ecosystem needs to be built before Post-K's release. In this presentation, we describe our collaboration efforts to build the Arm HPC ecosystem.
For more info on The Linaro High Performance Computing (HPC) visit https://www.linaro.org/sig/hpc/
4. Specifications
Massively parallel, general-purpose supercomputer
No. of nodes: 88,128
Peak speed: 11.28 Petaflops
Memory: 1.27 PB
Network: 6-dim mesh-torus (Tofu)
Top500 ranking
LINPACK measures the speed and efficiency of linear equation calculations; real applications require more complex computations.
No. 1 in Jun. & Nov. 2011
No. 20 in Jun. 2019
HPCG ranking
Measures the speed and efficiency of solving linear equations using HPCG; better correlates to actual applications.
No. 1 in Nov. 2017, No. 3 since Jun. 2018
Graph 500 ranking
"Big Data" supercomputer ranking; measures the ability to handle data-intensive loads.
No. 1 for 9 consecutive editions since 2015
K computer (decommissioned Aug. 16, 2019)
ACM Gordon Bell Prize ("Best HPC Application of the Year"): winner 2011 & 2012, several finalists
First supercomputer in the world to retire as #1 in a major ranking (Graph 500)
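The specifications above are internally consistent; using the K computer's published per-node peak (8-core SPARC64 VIIIfx at 16 GFLOPS per core -- a figure from Riken's K specifications, not from this slide), the node count reproduces the quoted peak:

```python
# Sanity-check the K computer's peak from its published per-node spec.
nodes = 88_128                 # slide: "No. of nodes: 88,128"
gflops_per_node = 128          # SPARC64 VIIIfx: 8 cores x 16 GFLOPS (published spec)
peak_pflops = nodes * gflops_per_node / 1e6
print(round(peak_pflops, 2))   # -> 11.28, matching "Peak speed: 11.28 Petaflops"
```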
6. From K to Fugaku: Dramatic Change
K: Focus mainly on Basic Science & Simulations
Pro: Modernized Japanese SC to massively parallel
Con: Large-scale simulations are 'transient': great results, but success is left to scientific 'chance'
Impact on IT, industry and society moderate
Fugaku: Centerpiece of Society 5.0
Designed to have a continuum to IT infrastructure at large (Arm, RHEL Linux, Container, Cloud)
High impact & ROI for IT, industry & society
Can sustain its value towards the future, fostering both basic science & Society 5.0 usage
Fundamental Change Towards Society 5.0
7. The "Fugaku" 富岳 "Exascale" Supercomputer for Society 5.0
Mt. Fuji representing the ideal of supercomputing:
Broad Base --- Applicability & Capacity
Broad Applications: Simulation, Data Science, AI, ...
Broad User Base: Academia, Industry, Cloud Startups, ...
High Peak --- Acceleration of Large-Scale Applications (Capability)
8. Co-Design Activities in Fugaku
Extremely tight collaboration between the co-design app centers, Riken, and Fujitsu, etc.
Chose 9 representative apps as the "target application" scenario
Achieve up to x100 speedup c.f. K computer
Also ease of programming, broad SW ecosystem, very low power, ...
Multiple activities since 2011
9 Priority App Areas of high concern to the general public: Medical/Pharma, Environment/Disaster, Energy, Manufacturing, ...
Select representatives from 100s of applications signifying various computational characteristics (Science by Computing)
Design systems with parameters that consider various application characteristics (Science of Computing)
9. Arm64fx & Fugaku 富岳 /Post-K are:
Fujitsu-Riken design A64fx ARM v8.2 (SVE), 48/52-core CPU
HPC optimized: extremely high package-integrated memory BW (1 TByte/s), on-die Tofu-D network BW (~400 Gbps), high SVE FLOPS (~3 Teraflops), various AI support (FP16, INT8, etc.)
Gen purpose CPU – Linux, Windows (Word), other SCs/Clouds
Extremely power efficient – > 10x power/perf efficiency for CFD benchmark over current mainstream x86 CPU
Largest and fastest supercomputer ever built circa 2020
> 150,000 nodes, superseding LLNL Sequoia
> 150 PetaByte/s memory BW
Tofu-D 6D Torus NW, 60 Petabps injection BW (10x global IDC traffic)
25~30 PB NVMe L1 storage
The first 'exascale' machine (not exa 64-bit flops => apps perf.)
Acceleration of HPC, Big Data, and AI for Society 5.0
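The system-level figures follow from the per-node numbers; a quick cross-check (assuming exactly 150,000 nodes, the slide's lower bound):

```python
# Aggregate Fugaku figures from per-node specs quoted on the slide.
nodes = 150_000                  # "> 150,000 nodes"
node_mem_bw_tb_s = 1.0           # HBM2: ~1 TByte/s per CPU
node_inject_gbps = 400           # Tofu-D: ~400 Gbps per node
total_mem_pb_s = nodes * node_mem_bw_tb_s / 1e3
total_inject_pbps = nodes * node_inject_gbps / 1e6
print(total_mem_pb_s)            # -> 150.0, the "> 150 PetaByte/s memory BW"
print(total_inject_pbps)         # -> 60.0, the "60 Petabps injection BW"
```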
10. Fugaku's Fujitsu A64fx Processor is...
a Many-Core ARM CPU...
48 compute cores + 2 or 4 assistant (OS) cores
Brand new core design
Near Xeon-class integer performance per core
ARM v8 --- 64-bit ARM ecosystem
Tofu-D + PCIe 3 external connection
...but also an accelerated GPU-like processor
SVE 512-bit x 2 vector extensions (ARM & Fujitsu)
Integer (1, 2, 4, 8 bytes) + Float (16, 32, 64 bits)
Cache + scratchpad-like local memory (sector cache)
HBM2 on-package memory – massive mem BW (Bytes/DPF ~0.4)
Streaming memory access, strided access, scatter/gather, etc.
Intra-chip barrier synch. and other memory enhancing features
GPU-like high performance in HPC, AI/Big Data, Auto Driving...
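A sketch of those access patterns in NumPy, standing in for SVE vector code (the array and index values are invented for illustration):

```python
import numpy as np

# The three access patterns the A64FX memory system accelerates,
# expressed as vectorized NumPy operations.
a = np.arange(16, dtype=np.float64)

stream = a + 1.0                 # streaming: contiguous, unit-stride access
strided = a[::4] * 2.0           # strided: every 4th element
idx = np.array([3, 0, 7, 12])
gathered = a[idx]                # gather: vectorized indexed loads
a[idx] = gathered + 100.0        # scatter: vectorized indexed stores
```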
11. FUJITSU CONFIDENTIAL
Green500, Nov. 2019
"A64FX prototype" – Fujitsu A64FX 48C 2 GHz – ranked #1 on the list
768x general-purpose A64FX CPUs w/o accelerators
• 1.9995 PFLOPS @ HPL, 84.75% efficiency
• 16.876 GF/W
• Power quality level 2
Copyright 2019 FUJITSU LIMITED
12. [Scatter plot: SC19 Green500 rank and first-appeared TOP500 edition vs. GF/W, systems with and without accelerators]
"A64FX prototype", prototype of Supercomputer Fugaku, ranked #1, 16.876 GF/W
Sunway TaihuLight: #35, 6.051 GF/W
Endeavor: #94, 2.961 GF/W
Approx. 3x better GF/W than the 2nd CPU-only system
The first system w/o accelerators ranked #1 in 9.5 years
Copyright 2019 FUJITSU LIMITED
13. Significance of Green500 Achievement
Green500 until now: specialized architectures, e.g. GPUs and custom chips, dominated
Green500 w/Fugaku: first time ever a general-purpose CPU became #1
14. A64FX CPU performance evaluation for real apps
Open-source software, real apps on an A64FX @ 2.2 GHz
Up to 1.8x faster over the latest x86 processor (24-core, 2.9 GHz) x 2, or 3.6x per socket
High memory B/W and long SIMD length of A64FX work effectively with these applications
[Bar chart: relative speed-up ratio, FUJITSU A64FX vs. x86 (24-core, 2.9 GHz) x2, for OpenFOAM, FrontISTR, ABINIT, SALMON, SPECFEM3D, WRF, MPAS]
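The per-socket figure is just the dual-socket speedup scaled by the socket count:

```python
# Convert the measured system-level speedup into a per-socket figure.
speedup_vs_dual_socket = 1.8   # one A64FX vs. 2x 24-core 2.9 GHz x86
sockets = 2
per_socket = speedup_vs_dual_socket * sockets
print(per_socket)              # -> 3.6, the "3.6x per socket" on the slide
```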
15. A64FX CPU power efficiency for real apps
Performance / energy consumption on an A64FX @ 2.2 GHz
Up to 3.7x more efficient over the latest x86 processor (24-core, 2.9 GHz) x2
High efficiency is achieved by energy-conscious design and implementation
[Bar chart: relative power efficiency ratio, FUJITSU A64FX vs. x86 (24-core, 2.9 GHz) x2, for OpenFOAM, FrontISTR, ABINIT, SALMON, SPECFEM3D, WRF, MPAS]
16. Fugaku is a Year's worth of IT in Japan

                  Smartphones                IDC Servers incl. Clouds    Fugaku             K Computer
Units             20 million (2/3 annual     = 300,000 (2/3 annual       = 1                30~100
                  shipments in Japan)        shipments in Japan)
Power             10 W x 20 mil = 200 MW     600-700 W x 300K = 200 MW   30 MW              15 MW
                                             (incl. cooling)
CPU ISA /         Arm,                       x86/Arm,                    Arm,               Sparc, proprietary
System SW         iOS/Android Linux          Linux (Red Hat etc.)/Win    Linux (Red Hat     Linux, low generality
                                                                         etc.)
AI Acceleration   Custom ASIC,               Gen. purpose accelerator,   Gen. CPU,          None
                  inference only             e.g. GPU                    SVE instructions
17. Fugaku Performance Estimate on 9 Co-Design Target Apps (as of 2019/05/14)

                               Post-K                   K
Peak DP (double precision)     > 400+ Pflops (34x+)     11.3 Pflops
Peak SP (single precision)     > 800+ Pflops (70x+)     11.3 Pflops
Peak HP (half precision)       > 1600+ Pflops (141x+)   --
Total memory bandwidth         > 150+ PB/sec (29x+)     5,184 TB/sec

Speedup over K, by priority issue area:
Category: Health and longevity
1. Innovative computing infrastructure for drug discovery: 125x+, GENESIS (MD for proteins)
2. Personalized and preventive medicine using big data: 8x+, Genomon (genome processing: genome alignment)
Category: Disaster prevention and Environment
3. Integrated simulation systems induced by earthquake and tsunami: 45x+, GAMERA (earthquake simulator; FEM on unstructured & structured grids)
4. Meteorological and global environmental prediction using big data: 120x+, NICAM+LETKF (weather prediction system using big data; structured-grid stencil & ensemble Kalman filter)
Category: Energy issue
5. New technologies for energy creation, conversion/storage, and use: 40x+, NTChem (molecular electronic simulation; structure calculation)
6. Accelerated development of innovative clean energy systems: 35x+, Adventure (computational mechanics system for large-scale analysis and design; unstructured grid)
Category: Industrial competitiveness enhancement
7. Creation of new functional devices and high-performance materials: 30x+, RSDFT (ab-initio simulation; density functional theory)
8. Development of innovative design and production processes: 25x+, FFB (large eddy simulation; unstructured grid)
Category: Basic science
9. Elucidation of the fundamental laws and evolution of the universe: 25x+, LQCD (lattice QCD simulation; structured-grid Monte Carlo)

Performance target goal: peak performance to be achieved; 100 times faster than K for some applications (tuning included); 30 to 40 MW power consumption
Geometric mean of performance speedup of the 9 target applications over the K computer: > 37x+
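The "> 37x+" geometric mean can be reproduced from the nine per-app speedup estimates:

```python
import math

# The nine co-design target-app speedup estimates over K, in slide order.
speedups = [125, 8, 45, 120, 40, 35, 30, 25, 25]
geo_mean = math.exp(sum(map(math.log, speedups)) / len(speedups))
print(round(geo_mean, 1))   # -> ~37.4, consistent with the "> 37x+" on the slide
```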
18. Fugaku Programming Environment
Programming Languages and Compilers provided by Fujitsu
Fortran 2008 & Fortran 2018 subset
C11 & GNU and Clang extensions
C++14 & C++17 subset and GNU and Clang extensions
OpenMP 4.5 & OpenMP 5.0 subset
Java
Parallel Programming Language & Domain Specific Library provided by RIKEN
XcalableMP
FDPS (Framework for Developing Particle Simulator)
Process/Thread Library provided by RIKEN
PiP (Process in Process)
Script Languages provided by Linux distributor
E.g., Python+NumPy, SciPy
Communication Libraries
MPI 3.1 & MPI 4.0 subset
Open MPI base (Fujitsu), MPICH (RIKEN)
Low-level Communication Libraries
uTofu (Fujitsu), LLC (RIKEN)
File I/O Libraries provided by RIKEN
Lustre
pnetCDF, DTF, FTAR
Math Libraries
BLAS, LAPACK, ScaLAPACK, SSL II (Fujitsu)
EigenEXA, Batched BLAS (RIKEN)
Programming Tools provided by Fujitsu
Profiler, Debugger, GUI
NEW: Containers (Singularity) and other Cloud APIs
NEW: AI software stacks (w/ARM)
Optimized PyTorch, TensorFlow, etc.
NEW: DoE Spack Package Manager
GCC and LLVM will also be available
19. New PRIMEHPC Lineup
PRIMEHPC FX1000: supercomputer optimized for large-scale computing
PRIMEHPC FX700: supercomputer based on standard technologies
High scalability, high density, superior power efficiency
Ease of use and installation
Copyright 2019 FUJITSU LIMITED
20. Introducing the Cray CS500 – Fujitsu A64FX Arm Server
Next-generation Arm® solution
Cray Fujitsu Technology Agreement
Supported in Cray CS500 infrastructure
Cray Programming Environment
Leadership performance for many memory-intensive HPC applications
GA in mid-2020
22. Two-fold Fugaku Missions
Carry-over mission of K: large-scale simulation etc.; support of basic sciences
Society 5.0 mission imposed by CSTI: rapidly deliver tangible results of high interest to society and the SDGs
Enhancement of R&D missions carried over from K: merit-based
Central platform for R&D of next-gen Society 5.0 applications: strategic, w/industry
23. Society5.0 will Fundamentally Change the Internet (1)
Current Internet: more than 90% of traffic IDC => edge, video traffic (YouTube, Netflix, etc.), caching optimization
As such, IDC+Internet+Edge (smartphones, PCs) are optimized as a CDN (Contents Delivery Network)
Same data used many times (e.g. YouTube, Netflix) -> moderate demand for storage
Intra-IDC traffic is mainly 'North-South', i.e. between storage nodes in the IDC towards the outgoing traffic to the edge. Intra-IDC 'East-West' traffic is mostly for control and does not need a high-performance interconnect as supercomputers do
Exo-IDC traffic is downstream outgoing. Since the same data are sent repeatedly, caching traffic in the Internet is effective, e.g. Akamai
Low affinity with scientific big data (-_-;)
How will this change with Society 5.0?
24. Society5.0 will Fundamentally Change the Internet (2)
Future Internet: traffic will reverse, Edge => IDC, variety of sensor data, in-flight processing
IDC+Internet+Edge: a real-time data analytics infrastructure
Raw edge data are all different; no way to store them all (CAGR of storage devices ~15%) -> must 'throw away' data
- Sophisticated AI real-time analytics necessary, not just simple compression
Intra-IDC networks require high performance as in SCs, due to AI+HPC-centric workloads (IDCs to become supercomputers)
Inter-IDC network: Edge -> IDC upstream ingest; caching does not work; need continuous real-time processing to 'throw away' data => from data-centric to compute-centric
Compression is largely 'feature detection'. Data analytics is mainly (1) correlative analysis between data, and (2) data reconstruction from features
High affinity with scientific instrumentation and data analytics ☺
25. Society5.0 and the Internet

                      Current Internet                    Society5.0 Internet
Data Type             Same data are copied & reused       Different data from sensors
Data Flow Direction   IDC => Edge (video)                 Edge => IDC (w/processing)
Intra-IDC Traffic     Small, North-South dominant         Large (HPC & AI), East-West dominant,
                                                          supercomputer interconnect
Overall               CDN dominated by video (YouTube)    From data-centric to compute-centric
26. Can Simulation be the Core Society5.0 Technology?
Society5.0 (def. by MIA): "Human-centric society where convergence of cyber space (virtual world) and physical space (real world) results in economic development as well as resolution of societal issues."
Simulation models the physical world in cyberspace => fundamental to Society5.0
That is why major US IT companies are pushing HPC
But Society 5.0 is mainly sensing => analytics => actuation
Here, simulation is DIFFICULT TO APPLY due to cost
=> AI surrogates & AI-reduced simulation models
=> we must train the AI through simulation
As such, HPC+AI would be fundamental: a winning formula against GAFA, including for R-CCS
27. Fugaku will be a centerpiece of Society 5.0 by cyber-physical assimilation of data & simulation of HPC and AI
[Diagram: IoT sensing of nature and of human society and economy in the physical world, synchronized with simulation and big data in cyberspace; big data assimilation (AI x HPC) predicts & controls societal issues, e.g. weather]
Figure originally by Takemasa Miyoshi, R-CCS
28. Precise Bloodflow Simulation of Artery on TSUBAME2.0 (Bernaschi et al., IAC-CNR, Italy)
Personal CT scan + simulation => accurate diagnostics of cardiac illness
5 billion red blood cells + 10 billion degrees of freedom
29. MUPHY: Multiphysics simulation of blood flow (Melchionna, Bernaschi et al.)
Combined Lattice-Boltzmann (LB) simulation for plasma and Molecular Dynamics (MD) for red blood cells
Realistic geometry (from CAT scan)
Two levels of parallelism: CUDA (on GPU) + MPI
• 1 billion mesh nodes for the LB component
• 100 million RBCs
Fluid: blood plasma, Lattice Boltzmann; body: red blood cells, extended MD; RBCs represented as ellipsoidal particles; coupled multiphysics simulation with the MUPHY software
The irregular mesh is divided using the PT-SCOTCH tool, considering cutoff distance
ACM Gordon Bell Prize 2011 Honorable Mention: 4000 GPUs, 0.6 Petaflops
30. Patient-specific cardiac blood flow simulation is scientifically significant, but cannot be utilized directly for clinical diagnostics due to cost
Tsubame2 requires 48 hours to simulate ONE heartbeat
To compute in one hour on Fugaku requires 150 racks
BW-bound algorithm: TSUBAME2 ~= Fugaku 3 racks, x 48
Need $1 bil over 6 years => at most 40,000 patients => at least $25K per patient just for diagnosis; not feasible
Large-scale simulation is beneficial when (1) results are fairly predictable, and (2) results can be utilized broadly, but...
Most science, both fundamental and applied, involves numerous & repetitive experimentations due to cost
Basic science: large simulations cannot be executed often, so the probability of new discoveries becomes low
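The slide's arithmetic, spelled out (the ~150-rack figure is 3 racks times the 48x time compression):

```python
# Rack estimate: TSUBAME2 ~ 3 Fugaku racks; 48 h of runtime compressed into 1 h.
racks_needed = 3 * 48                      # -> 144, i.e. the ~150 racks quoted
# Cost estimate: $1B over 6 years serves at most 40,000 patients.
cost_per_patient = 1_000_000_000 / 40_000
print(racks_needed, cost_per_patient)      # -> 144 25000.0 ("at least $25K per patient")
```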
31. AI to the Rescue: Role of AI in Society5.0 (1)
IT infrastructure for real-time sensing -> analysis -> actuation
Analytics and decision-making by a surrogate AI is lightweight -> appropriate for actual use in cyber-physical apps
But AI needs to be trained: we DON'T NEED to collect large data blindly; rather, appropriate sampling of the model space
But simple observations or a priori rules are insufficient, especially to cover low-probability 'rare events'
E.g., a pedestrian jumping in front of a car in auto-driving
We use simulation to generate rare events
High-level simulation capabilities essential
Search strategies in parameter space achieved by AI methods
High cost of repeated simulation amortized by multiple uses of the trained network
32. HPC and AI Convergence for Society 5.0 Manufacturing [Tsubokura et al., R-CCS]
Combining ML/deep learning, data assimilation, and multivariate optimization with simulation for new-generation manufacturing
Use output of high-resolution simulation data to train AI
Construct AI surrogate model by training on simulation data, allowing real-time CFD to facilitate designer-engineer collaboration, multivariate design optimization, etc.
Use NN to derive a reduced simulation model, allowing a digital twin in cyberspace corresponding to entities in physical space for real-time interactions
[Diagram: simulation in cyberspace -> AI training -> surrogate AI models (real-time CFD: designer-engineer collaboration, multivariate design optimization) and reduced simulation models (digital twin: auto-driving, real-world design evaluation)]
33. Example: Automotive Multivariate CFD Design using HPC & AI
Train AI with simulation inputs to create an AI surrogate model
Hundreds of CFD simulations on variable car designs generate the training input data (max epoch 10000, batch size 100)
Surrogate NN: body design input => aerodynamic properties (CD, ΔCD)
Multivariate optimization w/genetic algorithm
Derive multiple optimal car designs rapidly: e.g. fuel-efficient design, wind-shear-resistant design
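A minimal sketch of the surrogate-plus-GA loop described above, in pure Python. The quadratic "drag" function, the two design parameters, and all GA settings are invented stand-ins; the real pipeline trains a neural-network surrogate on hundreds of CFD runs:

```python
import random

random.seed(0)

# Hypothetical surrogate: stands in for an NN trained on CFD runs,
# mapping two body-design parameters to a drag coefficient CD.
def surrogate_cd(x, y):
    return (x - 0.3) ** 2 + 0.5 * (y - 0.7) ** 2 + 0.25

def evolve(generations=60, pop_size=40):
    pop = [(random.random(), random.random()) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda d: surrogate_cd(*d))      # rank by predicted drag
        elite = pop[: pop_size // 4]                  # selection: keep best quarter
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = random.sample(elite, 2)
            cx = (a[0] + b[0]) / 2 + random.gauss(0, 0.05)  # crossover + mutation
            cy = (a[1] + b[1]) / 2 + random.gauss(0, 0.05)
            children.append((cx, cy))
        pop = elite + children
    return min(pop, key=lambda d: surrogate_cd(*d))

best = evolve()
print(best, surrogate_cd(*best))   # converges near (0.3, 0.7), CD near 0.25
```

Because the surrogate is cheap to evaluate, the GA can afford thousands of design evaluations per second, which is the point of replacing full CFD in the inner loop.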
34. Smart Cities and High Performance Computing [Ito et al., R-CCS]
Social phenomena are complex, and have too many DOF and regimes/phases. HPC with AI is a powerful tool to attack the complexity.
Example: car traffic in Kobe city (~30,000 roads, ~100,000 cars/day)
Deep neural network analysis: learning of 30,000-dimensional results of Kobe traffic simulations -> classification of city traffic modes
Multivariate analysis of simulation results: the No. 1 factor (colored) explains 10% of traffic, followed by ~300 factors, from 5000 simulations
Example: tsunami evacuation planning of Nishi-yodogawa ward in Osaka (~9,000 roads, 86 safe places, ~50,000 evacuees from 146 regions)
Genetic optimization of evacuation simulations: a GA code for evacuation plans evolves towards "simple" and "effective" plans -> Pareto-optimal plans
"Big data" through 5G and its successors will not only be monitored, but also be used on HPC and AI for optimization and redesign of our society.
35. Revolutionizing Weather Prediction for Tokyo Olympics [Miyoshi et al., R-CCS]
1-h-lead downpour forecast refreshed every 30 seconds at 100-m mesh
[Diagram: IoT sensing of nature and of human society and economy in the real world, synchronized with simulation and big data in cyberspace; big data assimilation (AI x HPC) predicts & controls -- prepared for weather]
36. Society5.0 innovates Big Data from being purely data-centric to becoming compute-centric
Society5.0 incurs a big change in big data: data must be thrown away
Simple disposal: simple data selection, compression, etc.
Correct disposal: extract the minimum set of features that allows both the necessary analysis and reconstruction of the data
Such extraction is to be performed multiple times from edge => IDC
E.g. extract CAD data from the photos of a building
Data reconstruction methodologies:
Simulation: from physical models parameterized by the features
Generative AI, e.g. GANs
Both are expensive, appropriate for HPC
Seamless integration required: Society5.0 -> true HPC & AI
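A toy illustration of "correct disposal" under a deliberately simple assumed model: 100 noisy sensor samples are reduced to two fitted features (slope and intercept of a least-squares line), from which the stream can be approximately reconstructed. Real systems would extract learned features rather than a linear fit:

```python
# Hypothetical sensor stream: a linear trend plus alternating +/-0.1 noise.
samples = [(t, 2.0 * t + 1.0 + ((-1) ** t) * 0.1) for t in range(100)]

# 'Correct disposal': keep only model features, not the raw samples.
n = len(samples)
sx = sum(t for t, _ in samples); sy = sum(v for _, v in samples)
sxx = sum(t * t for t, _ in samples); sxy = sum(t * v for t, v in samples)
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)     # least-squares fit
intercept = (sy - slope * sx) / n

features = (slope, intercept)          # 2 numbers replace 100 samples
reconstruct = lambda t: features[0] * t + features[1]
max_err = max(abs(reconstruct(t) - v) for t, v in samples)
print(features, max_err)               # slope ~2.0, intercept ~1.0, err ~0.1
```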
37. R-CCS & Fugaku Society 5.0 Values
• "Fugaku Arm": pinnacle of the Arm ecosystem dominant for IoT
• A64fx CPU: world's fastest gen-purpose CPU (but can run PowerPoint)
• Converged HPC, Cloud, AI, IoT software stack
• Support for modern SW, e.g. VMs, containers, packaging (Spack), etc.
• "Fugaku AI" development
• PyTorch, TensorFlow based on SVE & DNNL for A64fx, Eigen, ...
• Collaboration between Fujitsu (Lab & Product), Arm, Riken (R-CCS, ...)
• Other research efforts on convergence of HPC & AI
• "Fugaku Cloud" facilitates broad cloud-style services
• Collaboration with cloud service providers -> 8 companies accepted
• 2020 experimental services, 2021 production
• "Fugaku Live Stream" provides multiple live data streams to Society5.0 IoT applications
• Variety of live data from multiple sources stored for a certain period for R&D of Society 5.0 apps as well as basic science apps
• Collaboration with various public & private sectors
38. Fugaku Cloud Strategy
Industry use of Fugaku via intermediary cloud SaaS vendors, Fugaku as IaaS
A64fx and other Fugaku technology being incorporated into the Cloud
HPC SaaS providers serve industry users through various cloud service APIs for HPC (KVM/Singularity, Kubernetes, etc.), alongside other IaaS and commercial clouds, with an extreme performance advantage
Cloud workloads becoming HPC (including AI) -> significant performance advantage -> millions of units shipped to the Cloud -> rebirth of JP semicon
39. Fugaku accessibility with Cloud APIs
Compute nodes
Jobs can be executed via the Fujitsu batch job scheduler
CUI and access API (NEWT 2.0 based) are available
Interactive use is also available under batch job scheduling
KVM and Singularity will be tested
Front-end/PrePost environment
Multi-architecture based: x86 (w/ GPU), Arm TX2 (w/ GPU), A64FX (48 nodes)
Interactive/batch/OpenStack/Kubernetes (will be tested)
NEW: Amazon S3 compatible object storage (under procurement)
Job control via SSH access and API; OpenStack and Kubernetes will be tested
40. Cloud Service Providers Partnership
Action Items
• Cool project name and logo!
• Trial methods to provide computing resources of Fugaku to end-users via service providers
• Evaluate the effectiveness of the methods as quantitatively as possible and organize the issues
• The knowledge gained will be fed back to the government's scheme design for Fugaku
https://www.r-ccs.riken.jp/library/topics/200213.html (in Japanese)
41. Unprecedented Scalability of Data-/Model-Parallel Massive-Scale Deep Learning on Fugaku
Fugaku processor:
◆ High-perf FP16 & Int8
◆ High mem BW for convolution
◆ Built-in scalable Tofu network
Unprecedented DL scalability
High-performance DNN convolution: low-precision ALU + high memory bandwidth + advanced combining of convolution algorithms (FFT + Winograd + GEMM)
High-performance and ultra-scalable network for massive scaling of model & data parallelism: Tofu network w/high injection BW for fast reduction
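Of the three convolution algorithms named above, the GEMM formulation is the easiest to sketch: lower the input into an "im2col" matrix, and convolution becomes a matrix product. A pure-Python toy (single channel, invented 3x3 input):

```python
# GEMM-based convolution: one of the algorithm choices listed on the slide,
# alongside FFT and Winograd. im2col turns each k x k patch into a row.
def im2col(img, k):
    h, w = len(img), len(img[0])
    rows = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            rows.append([img[i + di][j + dj] for di in range(k) for dj in range(k)])
    return rows

def conv2d_gemm(img, kernel):
    k = len(kernel)
    flat = [kernel[di][dj] for di in range(k) for dj in range(k)]
    # Each output element is a dot product: a row of the im2col matrix
    # times the flattened kernel, i.e. a matrix-vector multiply overall.
    return [sum(a * b for a, b in zip(row, flat)) for row in im2col(img, k)]

img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]          # picks out x[i][j] + x[i+1][j+1]
print(conv2d_gemm(img, kernel))    # -> [6, 8, 12, 14]
```

In production stacks the GEMM is executed by a tuned BLAS, which is why high memory bandwidth and low-precision ALUs pay off directly.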
42. Fujitsu-Riken-Arm joint effort on AI framework development on SVE/A64FX
MOU signed Fujitsu-Riken Nov. 25, 2019; w/Arm in the works
Optimization levels:
Step 1: DL frameworks (PyTorch, TensorFlow)
Step 2: Arm DNNL
Step 3: Arm Xbyak, Eigen + libraries, down to the A64fx processor
Exaops of sim, data, and AI on Fugaku and Cloud
Copyright 2019 FUJITSU LIMITED
43. Large Scale Public AI Infrastructures in Japan
(System | Deployed | Purpose | AI Processor | Inference Peak Perf. | Training Peak Perf. (FP32/FP16) | Top500 Perf/Rank | Green500 Perf/Rank)
Tokyo Tech TSUBAME3 | July 2017 | HPC + AI, Public | NVIDIA P100 x 2160 | 45.8 PF (FP16) | 22.9 PF / 45.8 PF | 8.125 PF #22 | 13.704 GF/W #5
U-Tokyo Reedbush-H/L | Apr. 2018 (update) | HPC + AI, Public | NVIDIA P100 x 496 | 10.71 PF (FP16) | 5.36 PF / 10.71 PF | (unranked) | (unranked)
U-Kyushu ITO-B | Oct. 2017 | HPC + AI, Public | NVIDIA P100 x 512 | 11.1 PF (FP16) | 5.53 PF / 11.1 PF | (unranked) | (unranked)
AIST-AIRC AICC | Oct. 2017 | AI, Lab Only | NVIDIA P100 x 400 | 8.64 PF (FP16) | 4.32 PF / 8.64 PF | 0.961 PF #446 | 12.681 GF/W #7
Riken-AIP Raiden | Apr. 2018 (update) | AI, Lab Only | NVIDIA V100 x 432 | 54.0 PF (FP16) | 6.40 PF / 54.0 PF | 1.213 PF #280 | 11.363 GF/W #10
AIST-AIRC ABCI | Aug. 2018 | AI, Public | NVIDIA V100 x 4352 | 544.0 PF (FP16) | 65.3 PF / 544.0 PF | 19.88 PF #7 | 14.423 GF/W #4
NICT (unnamed) | Summer 2019 | AI, Lab Only | NVIDIA V100 x ~1,700 | ~210 PF (FP16) | ~26 PF / ~210 PF | ???? | ????
C.f. US ORNL Summit | Summer 2018 | HPC + AI, Public | NVIDIA V100 x 27,000 | 3,375 PF (FP16) | 405 PF / 3,375 PF | 143.5 PF #1 | 14.668 GF/W #3
Riken R-CCS Fugaku | 2020~2021 | HPC + AI, Public | Fujitsu A64fx > x 150,000 | > 4000 Pops (Int8) | > 1000 PF / > 2000 PF | > 400 PF #1 (2020?) | > 15 GF/W ???
ABCI 2 | 2022 | AI | Future GPU | similar | similar | ~100 PF | 25~30 GF/W
Japan totals: inference 838.5 PF, training 86.9 PF; vs. Summit: inference 1/4, training 1/5
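The FP16 columns in the table are consistent with per-device tensor peaks (NVIDIA's published figures, not from the slide: P100 ~21.2 TF FP16, V100 ~125 TF FP16 with Tensor Cores):

```python
# Cross-check FP16 peak columns from per-GPU peaks (NVIDIA published specs).
p100_tf16, v100_tf16 = 21.2, 125.0
print(round(2160 * p100_tf16 / 1000, 1))    # TSUBAME3 -> 45.8 PF
print(round(4352 * v100_tf16 / 1000, 1))    # ABCI     -> 544.0 PF
print(round(27_000 * v100_tf16 / 1000))     # Summit   -> 3375 PF
```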
45. [Diagram: Many-Core Era vs. Post-Moore Cambrian Era, separated by an 'M-P Extinction Event' ~2025]
Many-Core Era:
Transistor lithography scaling (CMOS logic circuits, DRAM/SRAM)
Homogeneous general-purpose compute nodes + localized data, loosely coupled with electronic interconnect
Flops-centric massively parallel architecture, monolithic system software, and monolithic algorithms and apps (hardware/software system APIs)
Post-Moore Cambrian Era:
Novel devices + CMOS (dark silicon): nanophotonics, non-volatile devices, etc.
Heterogeneous CPUs + holistic data, ultra-tightly coupled w/aggressive 3-D + photonic switching interconnect
"Cambrian" heterogeneous architecture, system software, and algorithms and apps (hardware/software system APIs)
Elements: reconfigurable dataflow, optical computing, DNN & neuromorphic, massive-BW 3-D packaging, quantum computing, low-precision/error-prone computing, non-volatile memory
46. Towards a 100x processor in 2028
NEDO 100x Processor Project, Riken (R-CCS)/U-Tokyo/Tokyo Tech, 12 Apr. 2019
Various combinations of CPU architectures, new memory devices and 3-D technologies
Perf. measurement/characterization/models for high-BW intra-chip data movement
Cost models and algorithms for horizontal & hierarchical data movement
Programming models and heterogeneous resource management
2028 strawman 100x architecture for future HPC & BD/AI converged applications:
100x BYTES-centric architecture through performance simulations (R-CCS)
Heterogeneous & high-BW vector CGRA architecture (R-CCS)
High-bandwidth hierarchical memory systems and their management & programming (Tokyo Tech)
New programming methodologies for heterogeneous, high-bandwidth systems (Univ. Tokyo)
> 1 TB NVM, novel memory devices, 10 Tbps WDM network