Future of HPC
Putchong Uthayopas
Department of Computer Engineering,
Faculty of Engineering, Kasetsart University
Bangkok, Thailand.
putchong@ku.th
What is High Performance Computing?
• The use of very powerful computing
systems to solve large-scale problems
• Hardware and system architecture
• Scalable Software system and tools
• Why?
• Science and Engineering research
• CFD, Genomics, Automobile Design, Drug
discovery, disaster simulation
• Business is moving more toward the analysis of
Big Data/AI/Machine Intelligence
• Social Media
• Sensors: location, image, and video data from drones
Next-Generation Challenging Applications
• Traditional computational science with
much larger problems
• Disaster and earthquake simulation
• Urban science
• Data Intensive science
• Square Kilometre Array Project
• Once the telescope is operational, it will be
collecting data in the ballpark of 11 exabytes
per day!
• AI / machine learning / deep learning
applications to science
• Guiding simulation
• Insight into massive scientific data
Parallel Computing
• Parallel computing is the use of multiple processors
or computers to solve a common task
• Break a large task into many small sub-tasks
• Execute these sub-tasks on multiple cores or processors
• Collect the results together (see the sketch below)
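A minimal CUDA sketch of this break/execute/collect pattern (an illustration, not from the original slides; the problem size and launch dimensions are arbitrary assumptions): a large array sum is broken into per-thread partial sums, executed across many cores, and the partial results are collected with an atomic add.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread sums its own slice of the array (a sub-task);
// partial results are then collected with an atomic add.
__global__ void partialSum(const float *x, int n, float *result) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    float local = 0.0f;
    for (int i = tid; i < n; i += stride)   // break: grid-stride loop
        local += x[i];
    atomicAdd(result, local);               // collect: combine partials
}

int main() {
    const int n = 1 << 20;                  // illustrative problem size
    float *x, *result;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&result, sizeof(float));
    for (int i = 0; i < n; ++i) x[i] = 1.0f;
    *result = 0.0f;

    partialSum<<<256, 256>>>(x, n, result); // execute on many cores
    cudaDeviceSynchronize();
    printf("sum = %f\n", *result);          // expect 1048576.0

    cudaFree(x); cudaFree(result);
    return 0;
}
```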
How to achieve
parallelism?
• Adding more concurrency
into hardware
• Processor, I/O, memory
• Adding more concurrency
into software
• How to express
parallelism better in
software
• Adding more concurrency
into algorithm
• How to do many things at
the same time
• How to make people
think in parallel
System level
Parallelism in
Hardware
• Using multithreading on many-core
processors that are compatible
with conventional processors
• Accelerators
• Using a very large number of
small processor cores in a
SIMD model, evolving from
graphics technology
• NVIDIA GPU
• Hardware using FPGAs
Heterogeneous Computing
• CPU (host) + GPUs (devices) Computing
• CPU and GPUs are different entities.
• Both have their own memory space.
• Offloading parallel computation to GPUs.
Source: Introduction to CUDA, Mike Clark, NVIDIA, http://www.int.washington.edu/PROGRAMS/12-2c/week3/clark_01.pdf
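To make the host/device separation concrete, here is a minimal CUDA offloading sketch (a SAXPY kernel, chosen as an illustrative stand-in for any data-parallel computation): the CPU and GPU each have their own allocations, inputs are copied to the device, the kernel runs there, and results are copied back.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Device code: each GPU thread handles one element.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host (CPU) memory and device (GPU) memory are separate spaces.
    float *hx = (float *)malloc(bytes), *hy = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    float *dx, *dy;
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);

    // Offload: copy inputs over, launch the kernel, copy results back.
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, dx, dy);
    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);

    printf("y[0] = %f\n", hy[0]);  // expect 4.0

    cudaFree(dx); cudaFree(dy);
    free(hx); free(hy);
    return 0;
}
```

The explicit cudaMemcpy calls are what "both have their own memory space" means in practice: nothing moves between CPU and GPU unless the program moves it.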
CPUs vs GPUs
• Intel Xeon Scalable: 28 cores per socket.
Up to 8 sockets
• Optimized for latency
• Get job done fast.
• Lower Memory Bandwidth
• But larger memory size
• Suitable for generic workloads
• NVIDIA V100: 5120 cores. Up to 16 GPUs
per node
• Optimized for throughput
• Get more jobs done
• Higher Memory Bandwidth
• But smaller memory size
• Suitable for highly parallel workloads.
Source: CUDA Programming Guide, NVIDIA, https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
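One way to see these throughput-oriented numbers on a real machine is to query the CUDA runtime. The sketch below prints each GPU's multiprocessor count, memory size, and the peak-bandwidth estimate used by NVIDIA's deviceQuery sample (2 x memory clock x bus width); treat it as an illustration of the comparison above, not a benchmark.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, d);
        // Peak bandwidth (GB/s): double data rate * clock * bus width.
        double bw = 2.0 * p.memoryClockRate * (p.memoryBusWidth / 8.0) / 1.0e6;
        printf("GPU %d: %s\n", d, p.name);
        printf("  multiprocessors : %d\n", p.multiProcessorCount);
        printf("  global memory   : %.1f GB\n", p.totalGlobalMem / 1.0e9);
        printf("  peak bandwidth  : %.0f GB/s\n", bw);
    }
    return 0;
}
```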
Challenges
• A large number of cores will have to divide the memory among them
• Much smaller memory per core
• Demands high memory bandwidth
• Still need an effective fine grain parallel programming model
• No free lunch: programmers have to do some work
Getting the most out of these systems
•Data partitioning strategies that generate less
communication
•Scheduling communication in algorithms to reduce
transmission conflicts
•Restructuring algorithms to use local or neighborhood
communication rather than costly global
communication (see the sketch after this list)
•Mapping algorithms to fully utilize the memory
hierarchy and processor structure
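A minimal CUDA sketch of the last two ideas, using a 1D stencil as an illustrative kernel (the stencil itself is an assumption, not from the slides): each block stages its tile of data, plus a small halo, in fast on-chip shared memory, so all neighborhood communication stays local instead of repeatedly touching slow global memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define RADIUS 3
#define BLOCK  256

// Each block copies its tile plus RADIUS halo cells on each side
// into shared memory; all neighbor reads then stay on-chip.
__global__ void stencil1d(const float *in, float *out, int n) {
    __shared__ float tile[BLOCK + 2 * RADIUS];
    int g = blockIdx.x * blockDim.x + threadIdx.x;  // global index
    int l = threadIdx.x + RADIUS;                   // local tile index

    tile[l] = in[g];
    if (threadIdx.x < RADIUS) {                     // edge threads load halo
        tile[l - RADIUS] = (g >= RADIUS) ? in[g - RADIUS] : 0.0f;
        tile[l + BLOCK]  = (g + BLOCK < n) ? in[g + BLOCK] : 0.0f;
    }
    __syncthreads();                                // tile fully staged

    float sum = 0.0f;
    for (int k = -RADIUS; k <= RADIUS; ++k)
        sum += tile[l + k];                         // neighborhood-only reads
    out[g] = sum;
}

int main() {
    const int n = 1 << 20;                          // multiple of BLOCK
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;

    stencil1d<<<n / BLOCK, BLOCK>>>(in, out, n);
    cudaDeviceSynchronize();
    printf("out[n/2] = %f\n", out[n / 2]);          // expect 7.0 (2*RADIUS+1)

    cudaFree(in); cudaFree(out);
    return 0;
}
```

The same principle scales up: on a cluster, the analogue is exchanging halo regions with neighboring nodes instead of performing costly global all-to-all communication.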
Approaches to Future HPC Systems
• Conventional CPUs with GPUs and a high-speed GPU interconnect
• Summit (IBM POWER9 + V100 GPUs)
• Energy-efficient many-core systems
• Post-K system (RIKEN, Japan)
• Sunway TaihuLight
• Balanced systems for next-generation workloads
• ABCI (AIST, Japan)
TOP500 List
Who is at the top of
the world?
SUMMIT
What makes a supercomputer smart?
In the case of the Oak Ridge Leadership
Computing Facility’s newest leadership-class
system, Summit, the answer is in the
architecture.
Summit, an IBM AC922 system, links more than
27,000 NVIDIA Volta V100 GPUs with more
than 9,000 IBM Power9 CPUs to provide
unprecedented opportunities for the integration of
artificial intelligence (AI) and scientific discovery.
Applying AI techniques like deep learning to
automate, accelerate, and drive understanding at
supercomputer scales will help scientists achieve
breakthroughs in human health, energy, and
engineering and answer fundamental questions
about the universe.
The arrival of AI supercomputing and Summit
means science has never been smarter.
Post-K system
References: Introduction of Post-K development, Yutaka Ishikawa, RIKEN AICS, 12th of December, 2017
Reference: Thomas Sterling, “HPC Achievement and Impact2018”, ISC2018 Keynote session
Most Powerful Supercomputer in China
https://www.nextplatform.com/2016/06/20/look-inside-chinas-chart-topping-new-supercomputer/
ABCI
Reference: Jack Dongarra, Current High-Performance Computing Trends and Challenges for the Future, ASC16
Computing insight, 2016.
Going to exascale computing and
beyond
Reference: Thomas Sterling, “HPC Achievement and Impact2018”, ISC2018
Keynote session
Exascale applications
https://www.exascaleproject.org/focus_area/software/
The common requirement across
all three problems is for a flexible
deep learning engine that:
• scales to use the full memory (> PBs)
• effectively utilizes high-performance interconnects
• effectively exploits advanced memory hierarchies
incorporating both ultra-high-bandwidth memory
stacks and non-volatile memory
• takes full advantage of floating-point accelerators
for vector and matrix operations
http://candle.cels.anl.gov/
Exascale Deep Learning and Simulation
Enabled Precision Medicine for Cancer focuses
on building a scalable deep neural network code
called the CANcer Distributed Learning
Environment (CANDLE) that addresses three top
challenges of the National Cancer Institute:
understanding the molecular basis of key protein
interactions, developing predictive models for
drug response and automating the analysis and
extraction of information from millions of cancer
patient records to determine optimal cancer
treatment strategies.
Rick Stevens, Principal Investigator, Argonne National
Laboratory, with Los Alamos National Laboratory, Lawrence
Livermore National Laboratory, Oak Ridge National Laboratory
and the National Cancer Institute.
The scale of the deep learning in
this problem comes from the
size of the state space (O(10^9))
that must be navigated and the
number of model parameters needed to
describe each state (O(10^12)).
Post-K applications
References: Introduction of Post-K development, Yutaka Ishikawa, RIKEN AICS, 12th of December, 2017
Exascale Computing War
• China has scheduled its first exascale
computer for 2020
• The first US exascale system is
scheduled for 2021 at Argonne
National Laboratory
• Japan has scheduled the full Post-K
system for 2022
• The EU plans to reach exascale in
2021
http://www.sciencemag.org/news/2018/02/racing-match-chinas-growing-computer-power-us-outlines-design-exascale-computer
On the Horizon: Quantum Computing
• Potentially much faster than conventional
computers for certain classes of applications
• Many engineering problems still have to be
solved to make a stable and powerful
platform
• A simple way to program quantum
computers is much needed
• Back ends in the cloud (Google, IBM)
Yottaflops = 1,000,000 Exaflops ☺
Reference: Thomas Sterling, “HPC Achievement and Impact2018”, ISC2018 Keynote session
Thank you
