Future of HPC
Putchong Uthayopas
Department of Computer Engineering,
Faculty of Engineering, Kasetsart University
Bangkok, Thailand.
putchong@ku.th
What is High Performance Computing?
• The use of very powerful computing
systems to solve large-scale problems
• Hardware and system architecture
• Scalable Software system and tools
• Why?
• Science and Engineering research
• CFD, Genomics, Automobile Design, Drug
discovery, disaster simulation
• Business is moving more toward the analysis of
Big Data/AI/Machine Intelligence
• Social Media
• Sensors: location, image, and video data from drones
Next-Generation Challenging Applications
• Traditional computational science with
much larger problems
• Disaster and earthquake simulation
• Urban science
• Data Intensive science
• Square Kilometre Array Project
• Once the telescope is operational, it will be
collecting data in the ballpark of 11 exabytes
per day!
• AI / machine learning / deep learning
applications to science
• Guiding simulation
• Insight into massive scientific data
Parallel Computing
• Parallel computing is the use of multiple processors
or computers to solve a common task
• Break a large task into many small sub-tasks
• Execute these sub-tasks on multiple cores or processors
• Collect the results together (see the sketch below)
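A minimal CUDA sketch of this break/execute/collect pattern (an illustration, not from the original slides; the problem size and launch dimensions are arbitrary assumptions): a large array sum is broken into per-thread partial sums, executed across many cores, and the partial results are collected with an atomic add.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread sums its own slice of the array (a sub-task);
// partial results are then collected with an atomic add.
__global__ void partialSum(const float *x, int n, float *result) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    float local = 0.0f;
    for (int i = tid; i < n; i += stride)   // break: grid-stride loop
        local += x[i];
    atomicAdd(result, local);               // collect: combine partials
}

int main() {
    const int n = 1 << 20;                  // illustrative problem size
    float *x, *result;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&result, sizeof(float));
    for (int i = 0; i < n; ++i) x[i] = 1.0f;
    *result = 0.0f;

    partialSum<<<256, 256>>>(x, n, result); // execute on many cores
    cudaDeviceSynchronize();
    printf("sum = %f\n", *result);          // expect 1048576.0

    cudaFree(x); cudaFree(result);
    return 0;
}
```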
How to achieve
parallelism?
• Adding more concurrency
into hardware
• Processor, I/O, memory
• Adding more concurrency
into software
• How to express
parallelism better in
software
• Adding more concurrency
into algorithm
• How to do many things at
the same time
• How to make people
think in parallel
System level
Parallelism in
Hardware
• Using multithreading on many-core
processors that are compatible
with conventional processors
• Accelerators
• Using a very large number of
small processor cores in a
SIMD model, evolving from
graphics technology
• NVIDIA GPU
• Hardware using FPGAs
Heterogeneous Computing
• CPU (host) + GPUs (devices) Computing
• CPU and GPUs are different entities.
• Both have their own memory space.
• Offloading parallel computation to GPUs.
Source: Introduction to CUDA, Mike Clark, NVIDIA, http://www.int.washington.edu/PROGRAMS/12-2c/week3/clark_01.pdf
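To make the host/device separation concrete, here is a minimal CUDA offloading sketch (a SAXPY kernel, chosen as an illustrative stand-in for any data-parallel computation): the CPU and GPU each have their own allocations, inputs are copied to the device, the kernel runs there, and results are copied back.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Device code: each GPU thread handles one element.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host (CPU) memory and device (GPU) memory are separate spaces.
    float *hx = (float *)malloc(bytes), *hy = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    float *dx, *dy;
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);

    // Offload: copy inputs over, launch the kernel, copy results back.
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, dx, dy);
    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);

    printf("y[0] = %f\n", hy[0]);  // expect 4.0

    cudaFree(dx); cudaFree(dy);
    free(hx); free(hy);
    return 0;
}
```

The explicit cudaMemcpy calls are what "both have their own memory space" means in practice: nothing moves between CPU and GPU unless the program moves it.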
CPUs vs GPUs
• Intel Xeon Scalable: 28 cores per socket.
Up to 8 sockets
• Optimized for latency
• Get job done fast.
• Lower Memory Bandwidth
• But larger memory size
• Suitable for generic workloads
• NVIDIA V100: 5120 cores. Up to 16 GPUs
per node
• Optimized for throughput
• Get more jobs done
• Higher Memory Bandwidth
• But smaller memory size
• Suitable for highly parallel workloads.
Source: CUDA Programming Guide, NVIDIA, https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
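One way to see these throughput-oriented numbers on a real machine is to query the CUDA runtime. The sketch below prints each GPU's multiprocessor count, memory size, and the peak-bandwidth estimate used by NVIDIA's deviceQuery sample (2 x memory clock x bus width); treat it as an illustration of the comparison above, not a benchmark.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, d);
        // Peak bandwidth (GB/s): double data rate * clock * bus width.
        double bw = 2.0 * p.memoryClockRate * (p.memoryBusWidth / 8.0) / 1.0e6;
        printf("GPU %d: %s\n", d, p.name);
        printf("  multiprocessors : %d\n", p.multiProcessorCount);
        printf("  global memory   : %.1f GB\n", p.totalGlobalMem / 1.0e9);
        printf("  peak bandwidth  : %.0f GB/s\n", bw);
    }
    return 0;
}
```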
Challenges
• A large number of cores will have to divide the memory among them
• Much smaller memory per core
• Demands high memory bandwidth
• Still need an effective fine grain parallel programming model
• No free lunch: programmers have to do some work
Getting the most out of these systems
•Data partitioning strategies that generate less
communication
•Scheduling communication in algorithms to reduce
transmission conflicts
•Restructuring algorithms to use local or neighborhood
communication rather than costly global
communication (see the sketch after this list)
•Mapping algorithms to fully utilize the memory
hierarchy and processor structure
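A minimal CUDA sketch of the last two ideas, using a 1D stencil as an illustrative kernel (the stencil itself is an assumption, not from the slides): each block stages its tile of data, plus a small halo, in fast on-chip shared memory, so all neighborhood communication stays local instead of repeatedly touching slow global memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define RADIUS 3
#define BLOCK  256

// Each block copies its tile plus RADIUS halo cells on each side
// into shared memory; all neighbor reads then stay on-chip.
__global__ void stencil1d(const float *in, float *out, int n) {
    __shared__ float tile[BLOCK + 2 * RADIUS];
    int g = blockIdx.x * blockDim.x + threadIdx.x;  // global index
    int l = threadIdx.x + RADIUS;                   // local tile index

    tile[l] = in[g];
    if (threadIdx.x < RADIUS) {                     // edge threads load halo
        tile[l - RADIUS] = (g >= RADIUS) ? in[g - RADIUS] : 0.0f;
        tile[l + BLOCK]  = (g + BLOCK < n) ? in[g + BLOCK] : 0.0f;
    }
    __syncthreads();                                // tile fully staged

    float sum = 0.0f;
    for (int k = -RADIUS; k <= RADIUS; ++k)
        sum += tile[l + k];                         // neighborhood-only reads
    out[g] = sum;
}

int main() {
    const int n = 1 << 20;                          // multiple of BLOCK
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;

    stencil1d<<<n / BLOCK, BLOCK>>>(in, out, n);
    cudaDeviceSynchronize();
    printf("out[n/2] = %f\n", out[n / 2]);          // expect 7.0 (2*RADIUS+1)

    cudaFree(in); cudaFree(out);
    return 0;
}
```

The same principle scales up: on a cluster, the analogue is exchanging halo regions with neighboring nodes instead of performing costly global all-to-all communication.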
Approaches to Future HPC Systems
• Conventional CPUs with GPUs and a high-speed GPU interconnect
• Summit (IBM POWER9 + V100 GPUs)
• Energy-efficient many-core systems
• Post-K system (RIKEN, Japan)
• Sunway TaihuLight
• Balanced systems for next-generation workloads
• ABCI (AIST, Japan)
TOP500 List
Who is at the top of
the world?
SUMMIT
What makes a supercomputer smart?
In the case of the Oak Ridge Leadership
Computing Facility’s newest leadership-class
system, Summit, the answer is in the
architecture.
Summit, an IBM AC922 system, links more than
27,000 NVIDIA Volta V100 GPUs with more
than 9,000 IBM Power9 CPUs to provide
unprecedented opportunities for the integration of
artificial intelligence (AI) and scientific discovery.
Applying AI techniques like deep learning to
automate, accelerate, and drive understanding at
supercomputer scales will help scientists achieve
breakthroughs in human health, energy, and
engineering and answer fundamental questions
about the universe.
The arrival of AI supercomputing and Summit
means science has never been smarter.
Post-K system
References: Introduction of Post-K development, Yutaka Ishikawa, RIKEN AICS, 12th of December, 2017
Reference: Thomas Sterling, “HPC Achievement and Impact2018”, ISC2018 Keynote session
Most Powerful Supercomputer in China
https://www.nextplatform.com/2016/06/20/look-inside-chinas-chart-topping-new-supercomputer/
ABCI
Reference: Jack Dongarra, Current High-Performance Computing Trends and Challenges for the Future, ASC16
Computing insight, 2016.
Going to exascale computing and
beyond
Reference: Thomas Sterling, “HPC Achievement and Impact2018”, ISC2018
Keynote session
Exascale applications
https://www.exascaleproject.org/focus_area/software/
The common requirement across
all three problems is for a flexible
deep learning engine that:
• scales to use the full memory (> PBs)
• effectively utilizes high-performance interconnects
• effectively exploits advanced memory hierarchies
incorporating both ultra-high-bandwidth memory
stacks and non-volatile memory
• takes full advantage of floating-point accelerators
for vector and matrix operations
http://candle.cels.anl.gov/
Exascale Deep Learning and Simulation
Enabled Precision Medicine for Cancer focuses
on building a scalable deep neural network code
called the CANcer Distributed Learning
Environment (CANDLE) that addresses three top
challenges of the National Cancer Institute:
understanding the molecular basis of key protein
interactions, developing predictive models for
drug response and automating the analysis and
extraction of information from millions of cancer
patient records to determine optimal cancer
treatment strategies.
Rick Stevens, Principal Investigator, Argonne National
Laboratory, with Los Alamos National Laboratory, Lawrence
Livermore National Laboratory, Oak Ridge National Laboratory
and the National Cancer Institute.
The scale of the deep learning in
this problem comes from the
size of the state space (O(10^9))
that must be navigated and the
number of model parameters needed to
describe each state (O(10^12)).
Post-K applications
References: Introduction of Post-K development, Yutaka Ishikawa, RIKEN AICS, 12th of December, 2017
Exascale Computing War
• China has scheduled its first exascale
computer for 2020
• The first US exascale system is
scheduled for 2021 at Argonne
National Laboratory
• Japan has scheduled the full Post-K
system for 2022
• The EU plans to reach exascale in
2021
http://www.sciencemag.org/news/2018/02/racing-match-chinas-growing-computer-power-us-outlines-design-exascale-computer
On the Horizon: Quantum Computing
• Potentially much faster than conventional
computers for certain classes of applications
• Many engineering problems still have to be
solved to make a stable and powerful
platform
• A simple way to program quantum
computers is much needed
• Back ends in the cloud (Google, IBM)
Yottaflops = 1,000,000 Exaflops ☺
Reference: Thomas Sterling, “HPC Achievement and Impact2018”, ISC2018 Keynote session
Thank you
