C++ on its way to exascale and beyond
– The HPX Parallel Runtime System
Thomas Heller (
January 21, 2016
This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No.
What is Exascale anyway?
Exascale in numbers
• An exascale computer is supposed to execute 10¹⁸ floating point operations in a second
• Exa: 10¹⁸ = 1000000000000000000
• People on Earth: 7.3 billion = 7.3 × 10⁹
• Imagine each person is able to compute one operation per second. To perform the 10¹⁸ operations an exascale computer completes in one second, all of them together would need
⇒ 136986301 seconds
⇒ 2283105 minutes
⇒ 38051 hours
⇒ 1585 days
⇒ 4 years
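The chain of conversions can be checked with a few lines of C++. This is only the slide's own arithmetic (10¹⁸ operations shared by 7.3 billion people at one operation per second each), with a 365-day year as the one added assumption; the function names are illustrative.

```cpp
// Back-of-the-envelope check of the slide's numbers (assumes a 365-day year).
constexpr double kExaOps = 1e18;   // operations an exascale machine does per second
constexpr double kPeople = 7.3e9;  // world population used on the slide

// Time needed by all people together, each doing one operation per second.
constexpr double seconds_for_everyone() { return kExaOps / kPeople; }
constexpr double minutes_for_everyone() { return seconds_for_everyone() / 60.0; }
constexpr double hours_for_everyone()   { return minutes_for_everyone() / 60.0; }
constexpr double days_for_everyone()    { return hours_for_everyone() / 24.0; }
constexpr double years_for_everyone()   { return days_for_everyone() / 365.0; }
```

Truncated to whole numbers, the five values match the list above; the unrounded year figure is about 4.3.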
C++ on its way to exascale and beyond – The HPX Parallel Runtime System
21.01.2016 | Thomas Heller |
3/ 51
Why do we need that many calculations?
• How do we program those beasts?
⇒ Massively parallel processors
⇒ Massive number of compute nodes
⇒ Deep memory hierarchies
• How can we design the architecture to be affordable?
⇒ Biggest operational cost is energy
⇒ Power envelope of 20 MW
⇒ Current fastest computer (Tian-He 2): 17 MW
Current Development
Current #1 System:
• Tian-He 2: 33.9 PFLOPS
• Roughly 3.4% of an exaflop
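Putting the 20 MW power envelope next to the current #1 system shows the size of the efficiency gap. The sketch below simply divides the figures quoted on these slides and treats them as exact; the constant and function names are illustrative.

```cpp
// Energy-efficiency arithmetic behind the 20 MW goal (inputs taken from the slides).
constexpr double kExaflops       = 1e18;    // target: 10^18 FLOP/s
constexpr double kPowerEnvelopeW = 20e6;    // 20 MW power envelope
constexpr double kTianhe2Flops   = 33.9e15; // Tian-He 2: 33.9 PFLOPS
constexpr double kTianhe2Watts   = 17e6;    // Tian-He 2: 17 MW

// GFLOPS delivered per watt of power.
constexpr double gflops_per_watt(double flops, double watts) {
    return flops / watts / 1e9;
}

// Share of an exaflop that a machine reaches.
constexpr double fraction_of_exaflop(double flops) { return flops / kExaflops; }

// Efficiency improvement needed to fit 1 EFLOP/s into the 20 MW envelope.
constexpr double required_efficiency_gain() {
    return gflops_per_watt(kExaflops, kPowerEnvelopeW) /
           gflops_per_watt(kTianhe2Flops, kTianhe2Watts);
}
```

This gives 50 GFLOPS/W for the exascale target against roughly 2 GFLOPS/W for Tian-He 2, a gap of about 25×, and 33.9 PFLOPS is about 3.4% of an exaflop.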
Hardware Trends
• ARM: Low-power ARM64 cores (maybe adding embedded GPUs)
• IBM: POWER + NVIDIA Accelerators
• Intel: Knights Landing (Xeon Phi) Many Core processor
How will C++ deal with all that?!?
• Programmability
• Expressing Parallelism
• Expressing Data Locality
The 4 Horsemen of the Apocalypse: SLOW
• Starvation
• Latency
• Overhead
• Waiting for contention
State of the Art
• Modern architectures impose massive challenges on programmability in
the context of performance portability
• Massive increase in on-node parallelism
• Deep memory hierarchies
• The only portable parallelization solutions for C++ programmers (today):
OpenMP and MPI
• Hugely successful for years
• Widely used and supported
• Simple use for simple use cases
• Very portable
• Highly optimized
State of the Art – Parallelism in C++
• C++11 introduced lower-level abstractions
• std::thread, std::mutex, std::future, etc.
• Fairly limited, more is needed
• C++ needs stronger support for higher-level parallelism
• Several proposals to the Standardization Committee are accepted or
under consideration
• Technical Specification: Concurrency (P0159, note: misnomer)
• Technical Specification: Parallelism (P0024)
• Other smaller proposals: resumable functions, task regions, executors
• Currently there is no overarching vision related to higher-level parallelism
• Goal is to standardize a ‘big story’ by 2020
• No need for OpenMP, OpenACC, OpenCL, etc.
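To make "lower-level" concrete, here is a small sketch that uses only the C++11 facilities named above. The first function manages two std::threads and a std::mutex by hand; the second lets std::async and std::future carry the partial result. The function names are illustrative, not taken from any proposal.

```cpp
#include <future>
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

// Explicit threads: split the range, join by hand, guard the shared total.
long long sum_with_threads(const std::vector<int>& v) {
    long long total = 0;
    std::mutex m;
    auto mid = v.begin() + v.size() / 2;
    auto worker = [&](std::vector<int>::const_iterator first,
                      std::vector<int>::const_iterator last) {
        long long partial = std::accumulate(first, last, 0LL);
        std::lock_guard<std::mutex> lock(m);  // protect the shared total
        total += partial;
    };
    std::thread t1(worker, v.begin(), mid);
    std::thread t2(worker, mid, v.end());
    t1.join();
    t2.join();
    return total;
}

// std::async: the future carries the lower half's result back to the caller.
long long sum_with_async(const std::vector<int>& v) {
    auto mid = v.begin() + v.size() / 2;
    std::future<long long> lower = std::async(std::launch::async,
        [&v, mid] { return std::accumulate(v.begin(), mid, 0LL); });
    long long upper = std::accumulate(mid, v.end(), 0LL);
    return lower.get() + upper;  // blocks until the lower half is done
}
```

Both versions work, but partitioning, synchronization and load balancing remain entirely the programmer's job, which is the gap the higher-level proposals aim to close.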
Stepping Aside – Introducing HPX
HPX – A general purpose parallel Runtime System
• Solidly based on a theoretical foundation – a well-defined, new execution
model (ParalleX)
• Exposes a coherent and uniform, standards-oriented API for ease of
programming parallel and distributed applications.
• Enables writing fully asynchronous code using hundreds of millions of threads.
• Provides unified syntax and semantics for local and remote operations.
• Open Source: Published under the Boost Software License
HPX – A general purpose parallel Runtime System
HPX represents an innovative mixture of
• A global system-wide address space (AGAS – Active Global Address Space)
• Fine-grain parallelism and lightweight synchronization
• Combined with implicit, work-queue-based, message-driven computation
• Full semantic equivalence of local and remote execution, and
• Explicit support for hardware accelerators (through percolation)
HPX 101 – The programming model
[Figure: localities 0 … N-1 joined by a global address space, managed by the Active Global Address Space (AGAS) service]
future<id_type> id =
    new_<Component>(locality, ...);
future<R> result =
    async(action, id.get(), ...);
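To illustrate what the global address space buys, here is a deliberately tiny, single-process model of the idea. `ToyAddressSpace`, `Counter`, `create` and `async_add` are hypothetical names, not the HPX API: objects receive a global id, a table remembers the owning locality, and callers invoke objects through the id with one syntax. A real AGAS additionally resolves ids across nodes and turns the call into a message to the owning locality.

```cpp
#include <cstdint>
#include <future>
#include <map>
#include <memory>

// Hypothetical component: a counter object that could live on any locality.
struct Counter {
    int value = 0;
    int add(int delta) { return value += delta; }
};

// Toy stand-in for AGAS: maps a global id to (owning locality, object).
// Everything lives in one process here.
class ToyAddressSpace {
public:
    // Loosely analogous to new_<Component>(locality): create and register.
    std::uint64_t create(int locality) {
        const std::uint64_t id = next_id_++;
        table_[id] = Entry{locality, std::make_shared<Counter>()};
        return id;
    }

    int locality_of(std::uint64_t id) const { return table_.at(id).locality; }

    // Loosely analogous to async(action, id, ...): same call syntax wherever
    // the object lives. Calls on one object are not synchronized in this toy,
    // so callers must not overlap them.
    std::future<int> async_add(std::uint64_t id, int delta) {
        std::shared_ptr<Counter> obj = table_.at(id).object;
        return std::async(std::launch::async,
                          [obj, delta] { return obj->add(delta); });
    }

private:
    struct Entry {
        int locality = 0;
        std::shared_ptr<Counter> object;
    };
    std::map<std::uint64_t, Entry> table_;
    std::uint64_t next_id_ = 1;
};
```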
HPX 101 – Overview
R f(p...) | Synchronous (returns R) | Asynchronous (returns future<R>) | Fire & Forget (returns void)
Functions (direct) | f(p...) | async(f, p...) | apply(f, p...)
Functions (lazy) | bind(f, p...)(...) | async(bind(f, p...), ...) | apply(bind(f, p...), ...)
Actions (direct) | HPX_ACTION(f, a); a()(id, p...) | HPX_ACTION(f, a); async(a(), id, p...) | HPX_ACTION(f, a); apply(a(), id, p...)
Actions (lazy) | HPX_ACTION(f, a); bind(a(), id, p...)(...) | HPX_ACTION(f, a); async(bind(a(), id, p...), ...) | HPX_ACTION(f, a); apply(bind(a(), id, p...), ...)
The synchronous and asynchronous forms for plain functions (f(p...), bind, async) correspond to the C++ Standard Library.
In addition: dataflow(func, f1, f2);
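The function rows of the table have close counterparts in standard C++. The sketch below is only an analogy under that assumption: std::async plays the role of async, std::bind gives the lazy form, and, because the standard library has no apply, a detached std::thread stands in for fire-and-forget. Nothing here is HPX code, and actions (remote invocations) have no standard equivalent.

```cpp
#include <atomic>
#include <functional>
#include <future>
#include <thread>

int square(int x) { return x * x; }

// Synchronous: returns R directly.
int call_sync(int x) { return square(x); }

// Asynchronous: returns future<R> immediately; the work runs on another thread.
std::future<int> call_async(int x) {
    return std::async(std::launch::async, square, x);
}

// Lazy: bind the arguments now, invoke later.
std::function<int()> make_lazy(int x) { return std::bind(square, x); }

// Fire & forget: nothing is returned to the caller. The detached thread writes
// to an atomic only so that this example can be observed.
void call_fire_and_forget(int x, std::atomic<int>& sink) {
    std::thread([x, &sink] { sink.store(square(x)); }).detach();
}
```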
The Future, an example
int universal_answer() { return 42; }
void deep_thought() {
    future<int> promised_answer
        = async(&universal_answer);
    // do other things for 7.5 million years
    cout << promised_answer.get() << endl;
    // prints 42, eventually
}
Compositional facilities
• Sequential composition of futures
future<string> make_string() {
    future<int> f1 =
        async([]() -> int { return 123; });
    future<string> f2 = f1.then(
        [](future<int> f) -> string
        {
            // here .get() won't block
            return to_string(f.get());
        });
    return f2;
}
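std::future in C++11 through C++17 has no then member (it appeared only in the Concurrency TS), so the sketch below emulates sequential composition with a hypothetical free function `then`. Unlike HPX, it parks a helper thread in wait() until the input is ready, which is exactly the overhead a real continuation avoids; it only shows the shape of the data flow.

```cpp
#include <future>
#include <string>
#include <utility>

// Hypothetical free function emulating future::then: runs `cont` with the
// ready future once `fut` has a value.
template <typename T, typename Cont>
auto then(std::future<T> fut, Cont cont) {
    return std::async(std::launch::async,
        [fut = std::move(fut), cont = std::move(cont)]() mutable {
            fut.wait();                   // fut is ready after this line
            return cont(std::move(fut));  // so .get() inside cont won't block
        });
}

// The slide's make_string, expressed with the helper above.
std::future<std::string> make_string() {
    std::future<int> f1 = std::async(std::launch::async, [] { return 123; });
    return then(std::move(f1), [](std::future<int> f) {
        return std::to_string(f.get());
    });
}
```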
Compositional facilities
• Parallel composition of futures
future<int> test_when_all() {
    future<int> future1 =
        async([]() -> int { return 125; });
    future<string> future2 =
        async([]() -> string { return string("hi"); });
    auto all_f = when_all(future1, future2);
    future<int> result = all_f.then(
        [](auto f) -> int {
            return do_work(f.get());
        });
    return result;
}
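The same parallel composition can be approximated with std::future, assuming a hypothetical variadic `when_all` helper: it returns a future that becomes ready once every input is ready and yields the (now ready) input futures as a tuple, similar in spirit to the when_all above. The continuation is again emulated with a blocking helper task, and `do_work` is a made-up stand-in for the slide's undefined function.

```cpp
#include <future>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical when_all for std::future.
template <typename... Ts>
std::future<std::tuple<std::future<Ts>...>> when_all(std::future<Ts>... futs) {
    return std::async(std::launch::async,
        [](std::future<Ts>... fs) {
            (fs.wait(), ...);                          // wait for every input
            return std::make_tuple(std::move(fs)...);  // hand them back, ready
        },
        std::move(futs)...);
}

// Hypothetical stand-in for the slide's do_work.
int do_work(std::tuple<std::future<int>, std::future<std::string>> t) {
    return std::get<0>(t).get() + static_cast<int>(std::get<1>(t).get().size());
}

std::future<int> test_when_all() {
    std::future<int> future1 = std::async(std::launch::async, [] { return 125; });
    std::future<std::string> future2 =
        std::async(std::launch::async, [] { return std::string("hi"); });
    auto all_f = when_all(std::move(future1), std::move(future2));
    // Emulated continuation: runs do_work once the combined future is ready.
    return std::async(std::launch::async, [all_f = std::move(all_f)]() mutable {
        return do_work(all_f.get());
    });
}
```

With these made-up inputs the result is 125 + 2 (the length of "hi").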
Dataflow – The new 'async' (HPX)
• What if one or more arguments to 'async' are futures themselves?
• Normal behavior: pass futures through to function
• Extended behavior: wait for futures to become ready before invoking the function
template <typename F, typename... Arg>
future<result_of_t<F(Arg...)>>
// requires(is_callable<F(Arg...)>)
dataflow(F&& f, Arg&&... arg);
• If ArgN is a future, then the invocation of F will be delayed
• Non-future arguments are passed through
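A rough model of these dataflow semantics in standard C++, under clearly stated assumptions: `dataflow` and `wait_if_future` are hypothetical helpers, the delay is implemented by blocking a helper thread (HPX instead schedules the call when the last input becomes ready), and the ready futures themselves are handed to `f`, so `f` can call `.get()` without blocking.

```cpp
#include <future>
#include <utility>

// Non-future arguments: nothing to wait for.
template <typename T>
void wait_if_future(T&) {}

// Future arguments: block until the value is available.
template <typename T>
void wait_if_future(std::future<T>& f) { f.wait(); }

template <typename F, typename... Args>
auto dataflow(F f, Args... args) {
    return std::async(std::launch::async,
        [f = std::move(f)](Args... as) mutable {
            (wait_if_future(as), ...);   // delay until every future is ready
            return f(std::move(as)...);  // futures passed through, now ready
        },
        std::move(args)...);
}
```

For example, `dataflow([](std::future<int> x, int y) { return x.get() + y; }, std::move(fut), 22)` mixes a future and a plain int, which is passed through unchanged.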
Parallel Algorithms
Concepts of Parallelism – Parallel Execution Properties
• The execution restrictions applicable for the work items
• In what sequence the work items have to be executed
• Where the work items should be executed
• The parameters of the execution environment
Concepts and Types of Parallelism
• Execution Policies
• Executors: Sequence, Where
• Executor Parameters: Grain Size
• Built on top of these: Futures, Async, Dataflow; Parallel Algorithms; Fork-Join, etc.
Execution Policies (std)
• Specify execution guarantees (in terms of thread-safety) for executed
parallel tasks:
• sequential_execution_policy: seq
• parallel_execution_policy: par
• parallel_vector_execution_policy: par_vec
• In the Parallelism TS, used for parallel algorithms only
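The difference between seq and par is a contract about what the element function may assume. The sketch below uses hypothetical tag types that only mimic the TS names (C++17 later adopted them as std::execution::seq, par and par_unseq, the last replacing par_vec) and dispatches on the tag to either a plain loop or a hand-rolled split across std::thread workers.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical policy tags that only mimic the names seq and par.
struct sequential_tag {};
struct parallel_tag {};
inline constexpr sequential_tag seq{};
inline constexpr parallel_tag par{};

// seq: a plain loop on the calling thread, in order.
template <typename T, typename F>
void transform_inplace(sequential_tag, std::vector<T>& v, F f) {
    for (auto& x : v) x = f(x);
}

// par: elements are processed concurrently by several threads, in no
// particular order, so f must not touch shared mutable state unsynchronized.
template <typename T, typename F>
void transform_inplace(parallel_tag, std::vector<T>& v, F f) {
    const std::size_t n = v.size();
    const std::size_t workers = std::max(1u, std::thread::hardware_concurrency());
    const std::size_t chunk = (n + workers - 1) / workers;
    std::vector<std::thread> threads;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(n, begin + chunk);
        threads.emplace_back([&v, f, begin, end] {
            for (std::size_t i = begin; i < end; ++i) v[i] = f(v[i]);
        });
    }
    for (auto& t : threads) t.join();
}
```

The caller's code stays the same apart from the first argument, which is the point of expressing parallelism through a policy.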
Execution Policies (Extensions)
• Asynchronous Execution Policies:
• sequential_task_execution_policy: seq(task)
• parallel_task_execution_policy: par(task)
• In both cases the formerly synchronous functions return a future<>
• Instruct the parallel construct to be executed asynchronously
• Allows integration with asynchronous control flow
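An asynchronous policy changes the return type rather than the work: the call returns immediately with a future. The sketch below assumes a hypothetical tag `par_task` and a hypothetical `reduce_async` algorithm built on std::async; HPX's par(task) is a real policy object, but this is not its implementation.

```cpp
#include <future>
#include <numeric>
#include <vector>

// Hypothetical tag standing in for par(task): the algorithm returns a
// future instead of blocking the caller until the result is available.
struct parallel_task_tag {};
inline constexpr parallel_task_tag par_task{};

// Asynchronous reduction. `v` must stay alive until the future is ready,
// because both tasks read it by reference.
template <typename T>
std::future<T> reduce_async(parallel_task_tag, const std::vector<T>& v, T init) {
    return std::async(std::launch::async, [&v, init] {
        auto mid = v.begin() + v.size() / 2;
        // The lower half runs on its own task, the upper half on this one.
        std::future<T> lower = std::async(std::launch::async,
            [&v, mid] { return std::accumulate(v.begin(), mid, T{}); });
        T upper = std::accumulate(mid, v.end(), T{});
        return init + lower.get() + upper;
    });
}
```

Because the result is a future, it can be fed into the same composition tools (continuations, when_all, dataflow) as any other asynchronous value.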
Executors
• Executors are objects responsible for
• Creating execution agents on which work is performed (P0058)
• In P0058 this is limited to parallel algorithms, here much broader use
• Abstraction of the (potentially platform-specific) mechanisms for launching work
• Responsible for defining the Where and How of the execution of tasks
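A minimal sketch of the executor idea, assuming a hypothetical member function `async_execute(f)` that returns a future (the name is illustrative, not a claim about the exact P0058 interface). The two executors differ only in where and when the task runs; the generic algorithm does not change.

```cpp
#include <future>
#include <utility>
#include <vector>

// Hypothetical executor: launches each task on a new thread.
struct thread_executor {
    template <typename F>
    auto async_execute(F f) {
        return std::async(std::launch::async, std::move(f));
    }
};

// Hypothetical executor: defers each task until its result is requested,
// then runs it on the requesting thread.
struct deferred_executor {
    template <typename F>
    auto async_execute(F f) {
        return std::async(std::launch::deferred, std::move(f));
    }
};

// Generic code: *what* to compute is fixed, *where/how* comes from the executor.
template <typename Executor>
std::vector<int> squares_on(Executor& exec, int n) {
    std::vector<std::future<int>> futures;
    for (int i = 0; i < n; ++i)
        futures.push_back(exec.async_execute([i] { return i * i; }));
    std::vector<int> out;
    for (auto& f : futures) out.push_back(f.get());
    return out;
}
```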
Execution Parameters
Allow control over the grain size of work
• i.e. the number of iterations of a parallel for_each run on the same thread
• Similar to OpenMP scheduling policies: static, guided, dynamic
• Much finer control
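Grain size in its simplest (static) form can be sketched as below. `static_chunk_size` and `for_each_chunked` are hypothetical names for this illustration; the parameter fixes how many consecutive iterations a single task handles, comparable to OpenMP's schedule(static, chunk), and the function reports how many tasks it created.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical executor parameter: consecutive iterations per task.
struct static_chunk_size {
    std::size_t size;
};

// Applies f to every element, one task per chunk; returns the task count.
template <typename T, typename F>
std::size_t for_each_chunked(std::vector<T>& v, static_chunk_size chunk, F f) {
    const std::size_t step = std::max<std::size_t>(1, chunk.size);
    std::vector<std::future<void>> tasks;
    for (std::size_t begin = 0; begin < v.size(); begin += step) {
        const std::size_t end = std::min(v.size(), begin + step);
        tasks.push_back(std::async(std::launch::async, [&v, f, begin, end] {
            for (std::size_t i = begin; i < end; ++i) f(v[i]);
        }));
    }
    for (auto& t : tasks) t.get();  // wait and propagate exceptions
    return tasks.size();
}
```

Larger chunks mean fewer tasks and less scheduling overhead but coarser load balancing; guided or dynamic schemes adapt the chunk size at run time instead.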
Putting it all together – SAXPY routine with data locality
• a[i] = b[i] ∗ x + c[i], for i from 0 to N − 1
• Using parallel algorithms
• Explicit control over data locality
• No raw loops
Putting it all together – SAXPY routine with data locality
Complete serial version:
std::vector<double> a = ...;
std::vector<double> b = ...;
std::vector<double> c = ...;
double x = ...;
std::transform(b.begin(), b.end(),
    c.begin(), a.begin(),
    [x](double bb, double cc)
    {
        return bb * x + cc;
    });
Putting it all together – SAXPY routine with data locality
Parallel version, no data locality:
std::vector<double> a = ...;
std::vector<double> b = ...;
std::vector<double> c = ...;
double x = ...;
parallel::transform(parallel::par,
    b.begin(), b.end(),
    c.begin(), c.end(), a.begin(),
    [x](double bb, double cc)
    {
        return bb * x + cc;
    });
Putting it all together – SAXPY routine with data locality
Parallel version, with data locality:
std::vector<double, numa_allocator> a = ...;
std::vector<double, numa_allocator> b = ...;
std::vector<double, numa_allocator> c = ...;
double x = ...;
for (auto& numa_executor : numa_executors) {
    parallel::transform(
        parallel::par.on(numa_executor),
        b.begin() +..., b.begin() +...,
        c.begin() +..., c.begin() +..., a.begin() +...,
        [x](double bb, double cc)
        { return bb * x + cc; });
}
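The partitioning behind the data-locality version can be sketched in standard C++, assuming a hypothetical `saxpy_partitioned` function in which each "domain" is simply a std::thread that owns one contiguous slice of a, b and c. This sketch does not pin threads or place memory; on a real NUMA system one would typically also let each worker initialize its own slice so that first-touch page placement keeps the data local.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// a[i] = b[i] * x + c[i], split into contiguous slices, one worker per slice.
// `domains` is a hypothetical stand-in for the number of NUMA domains.
void saxpy_partitioned(std::vector<double>& a, const std::vector<double>& b,
                       const std::vector<double>& c, double x,
                       std::size_t domains) {
    const std::size_t n = a.size();
    if (domains == 0) domains = 1;
    const std::size_t slice = (n + domains - 1) / domains;
    std::vector<std::thread> workers;
    for (std::size_t d = 0; d < domains; ++d) {
        const std::size_t begin = d * slice;
        const std::size_t end = std::min(n, begin + slice);
        if (begin >= end) break;  // fewer elements than domains
        workers.emplace_back([&a, &b, &c, x, begin, end] {
            // Each worker only touches its own slice of a, b and c.
            std::transform(b.begin() + begin, b.begin() + end,
                           c.begin() + begin, a.begin() + begin,
                           [x](double bb, double cc) { return bb * x + cc; });
        });
    }
    for (auto& w : workers) w.join();
}
```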
Case Studies
• C++ Auto-parallelizing framework
• Open Source
• High scalability
• Wide range of platform support
Futurizing the Simulation Flow
Basic Simulation flow:
for (Region r : innerRegion) {
    update(r, oldGrid, newGrid, step);
}
swap(oldGrid, newGrid);
for (Region r : outerGhostZoneRegion) {
    notifyPatchProviders(r, oldGrid);
}
for (Region r : outerGhostZoneRegion) {
    update(r, oldGrid, newGrid, step);
}
for (Region r : innerGhostZoneRegion) {
    notifyPatchAccepters(r, oldGrid);
}
Futurizing the Simulation Flow
Futurized Simulation flow:
parallel for (Region r : innerRegion) {
    update(r, oldGrid, newGrid, step);
}
swap(oldGrid, newGrid); ++step;
parallel for (Region r : outerGhostZoneRegion) {
    notifyPatchProviders(r, oldGrid);
}
parallel for (Region r : outerGhostZoneRegion) {
    update(r, oldGrid, newGrid, step);
}
parallel for (Region r : innerGhostZoneRegion) {
    notifyPatchAccepters(r, oldGrid);
}
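The futurized flow is pseudocode ("parallel for"). A minimal standard C++ sketch of its first part follows, under explicitly hypothetical assumptions: a 1D grid of doubles, `Region` as an index range, and `update` as a simple two-neighbour average. Only the inner-region updates are launched as futures, and the caller waits for all of them right before the swap; an HPX version would attach continuations instead of blocking and would overlap the ghost-zone steps as well.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Hypothetical region: cells [begin, end) with 1 <= begin and end <= size - 1.
struct Region { std::size_t begin, end; };

// Hypothetical update rule: each cell becomes the mean of its two neighbours.
void update(Region r, const std::vector<double>& oldGrid,
            std::vector<double>& newGrid) {
    for (std::size_t i = r.begin; i < r.end; ++i)
        newGrid[i] = 0.5 * (oldGrid[i - 1] + oldGrid[i + 1]);
}

// One futurized step: all inner regions run as independent tasks.
void step_futurized(const std::vector<Region>& inner,
                    std::vector<double>& oldGrid, std::vector<double>& newGrid) {
    std::vector<std::future<void>> pending;
    for (Region r : inner)
        pending.push_back(std::async(std::launch::async, update, r,
                                     std::cref(oldGrid), std::ref(newGrid)));
    // Ghost-zone communication could overlap with the updates at this point.
    for (auto& f : pending) f.get();
    std::swap(oldGrid, newGrid);
}
```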
HPXCL – Extending the Global Address Space
• All GPU devices are addressable globally
• GPU memory can be allocated and referenced remotely
• Events are extensions of the shared state
⇒ API embedded into the already existing future facilities
From async to GPUs
Spawning single tasks not feasible
⇒ offload a work group (Think of parallel::for_each)
auto devices
    = hpx::opencl::find_devices(hpx::find_here(),
// create buffers, programs and kernels ...
hpx::opencl::buffer buf = devices[0].create_buffer(
auto write_future = buf.enqueue_write(some_vec.begin(), some_vec.end());
auto kernel_future = kernel.enqueue(dim,
• Proof of Concept
• Future Directions:
• Embed OpenCL devices behind Execution Policies and Executors
• Hide OpenCL stuff behind parallel algorithms
• Hide OpenCL buffer management behind "distributed data structures"
Mandelbrot example
Maps API
Acknowledgements to Martin Stumpf
Performance Results
[Figure: Execution times of HPX and MPI N-Body codes (SMP, weak scaling); x-axis: number of cores on one node (1–16); series: Comm HPX, Comm MPI]
[Figure: Weak scaling results for HPX N-Body code (single Xeon Phi, futurized); x-axis: number of cores (0–60); series: 1, 2, 3 and 4 threads/core]
[Figure: Weak scaling results for HPX N-Body codes (host cores and Xeon Phi accelerator); x-axis: number of nodes (0–16), 16 cores on host, full Xeon Phi]
STREAM Benchmark
[Figure: STREAM benchmark (50 million data points); x-axis: number of cores per NUMA domain (1–12); series: HPX and OpenMP, each with 1 and 2 NUMA domains]
Matrix Transpose
[Figure: Matrix transpose (SMP, 24k×24k matrices); x-axis: number of cores per NUMA domain (1–12); series: HPX and OMP, each with 1 and 2 NUMA domains]
[Figure: Matrix transpose (SMP, 24k×24k matrices); x-axis: number of cores per NUMA domain (1–12); series: HPX (2 NUMA domains), MPI (1 NUMA domain, 12 ranks), MPI (2 NUMA domains, 24 ranks), MPI+OMP (2 NUMA domains)]
[Figure: Matrix transpose (Xeon Phi, 24k×24k matrices); x-axis: number of cores (0–60); series: HPX and OMP with 1, 2 and 4 PUs per core]
[Figure: Matrix transpose (distributed, 18k×18k elements per node); x-axis: number of nodes, 16 cores each (2–8)]
What’s beyond Exascale?
Higher-level parallelization abstractions in C++:
• uniform, versatile, and generic
• All of this is enabled by use of modern C++ facilities
• Runtime system (fine-grain, task-based schedulers)
• Performant, portable implementation
Parallelism is here to stay!
• Massively parallel hardware is already part of our daily lives!
• Parallelism is observable everywhere:
⇒ IoT: Massive number of devices existing in parallel
⇒ Embedded: Meet massively parallel energy-aware systems (Epiphany, DSPs, ...)
⇒ Automotive: Massive amount of parallel sensor data to process
• We all need solutions on how to deal with this, efficiently and pragmatically
More Information
• #STE||AR @
• FET-HPC (H2020): AllScale (
• DOE: Part of X-Stack
Introduction to Apache Apex and writing a big data streaming application
Introduction to Apache Apex and writing a big data streaming application  Introduction to Apache Apex and writing a big data streaming application
Introduction to Apache Apex and writing a big data streaming application
Build your shiny new pc, with Pangoly
Build your shiny new pc, with PangolyBuild your shiny new pc, with Pangoly
Build your shiny new pc, with Pangoly
HDFS Internals
HDFS InternalsHDFS Internals
HDFS Internals
Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)
Hadoop Interacting with HDFS
Hadoop Interacting with HDFSHadoop Interacting with HDFS
Hadoop Interacting with HDFS
Introduction to UNIX Command-Lines with examples
Introduction to UNIX Command-Lines with examplesIntroduction to UNIX Command-Lines with examples
Introduction to UNIX Command-Lines with examples
Introduction to Real-Time Data Processing
Introduction to Real-Time Data ProcessingIntroduction to Real-Time Data Processing
Introduction to Real-Time Data Processing

Similar to C++ on its way to exascale and beyond -- The HPX Parallel Runtime System

Software Abstractions for Parallel Hardware
Software Abstractions for Parallel HardwareSoftware Abstractions for Parallel Hardware
Software Abstractions for Parallel Hardware
Joel Falcou
Micro-Benchmarking Considered Harmful
Micro-Benchmarking Considered HarmfulMicro-Benchmarking Considered Harmful
Micro-Benchmarking Considered Harmful
Thomas Wuerthinger
The Download: Tech Talks by the HPCC Systems Community, Episode 11
The Download: Tech Talks by the HPCC Systems Community, Episode 11The Download: Tech Talks by the HPCC Systems Community, Episode 11
The Download: Tech Talks by the HPCC Systems Community, Episode 11
HPCC Systems
Petapath HP Cast 12 - Programming for High Performance Accelerated Systems
Petapath HP Cast 12 - Programming for High Performance Accelerated SystemsPetapath HP Cast 12 - Programming for High Performance Accelerated Systems
Petapath HP Cast 12 - Programming for High Performance Accelerated Systems
Meetup: Big Data NLP with HPCC Systems® - A Development Ride from Spray to TH...
Meetup: Big Data NLP with HPCC Systems® - A Development Ride from Spray to TH...Meetup: Big Data NLP with HPCC Systems® - A Development Ride from Spray to TH...
Meetup: Big Data NLP with HPCC Systems® - A Development Ride from Spray to TH...
HPCC Systems
Going deep (learning) with tensor flow and quarkus
Going deep (learning) with tensor flow and quarkusGoing deep (learning) with tensor flow and quarkus
Going deep (learning) with tensor flow and quarkus
Red Hat Developers
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
Ioan Toma
HDR Defence - Software Abstractions for Parallel Architectures
HDR Defence - Software Abstractions for Parallel ArchitecturesHDR Defence - Software Abstractions for Parallel Architectures
HDR Defence - Software Abstractions for Parallel Architectures
Joel Falcou
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
DataWorks Summit
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre..."APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...
Edge AI and Vision Alliance
Chapel-on-X: Exploring Tasking Runtimes for PGAS Languages
Chapel-on-X: Exploring Tasking Runtimes for PGAS LanguagesChapel-on-X: Exploring Tasking Runtimes for PGAS Languages
Chapel-on-X: Exploring Tasking Runtimes for PGAS Languages
Akihiro Hayashi
Deployment of an HPC Cloud based on Intel hardware
Deployment of an HPC Cloud based on Intel hardwareDeployment of an HPC Cloud based on Intel hardware
Deployment of an HPC Cloud based on Intel hardware
Intel IT Center
Hopsworks - ExtremeEarth Open Workshop
Hopsworks - ExtremeEarth Open WorkshopHopsworks - ExtremeEarth Open Workshop
Hopsworks - ExtremeEarth Open Workshop
University of Maribor
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and CloudsArchitecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
Alluxio, Inc.
Scientific Computing @ Fred Hutch
Scientific Computing @ Fred HutchScientific Computing @ Fred Hutch
Scientific Computing @ Fred Hutch
Dirk Petersen
High performance computing for research
High performance computing for researchHigh performance computing for research
High performance computing for research
Esteban Hernandez
C cerin piv2017_c
C cerin piv2017_cC cerin piv2017_c
C cerin piv2017_c
Bertrand Tavitian
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
Edge AI and Vision Alliance

Similar to C++ on its way to exascale and beyond -- The HPX Parallel Runtime System (20)

Software Abstractions for Parallel Hardware
Software Abstractions for Parallel HardwareSoftware Abstractions for Parallel Hardware
Software Abstractions for Parallel Hardware
Micro-Benchmarking Considered Harmful
Micro-Benchmarking Considered HarmfulMicro-Benchmarking Considered Harmful
Micro-Benchmarking Considered Harmful
The Download: Tech Talks by the HPCC Systems Community, Episode 11
The Download: Tech Talks by the HPCC Systems Community, Episode 11The Download: Tech Talks by the HPCC Systems Community, Episode 11
The Download: Tech Talks by the HPCC Systems Community, Episode 11
Petapath HP Cast 12 - Programming for High Performance Accelerated Systems
Petapath HP Cast 12 - Programming for High Performance Accelerated SystemsPetapath HP Cast 12 - Programming for High Performance Accelerated Systems
Petapath HP Cast 12 - Programming for High Performance Accelerated Systems
Meetup: Big Data NLP with HPCC Systems® - A Development Ride from Spray to TH...
Meetup: Big Data NLP with HPCC Systems® - A Development Ride from Spray to TH...Meetup: Big Data NLP with HPCC Systems® - A Development Ride from Spray to TH...
Meetup: Big Data NLP with HPCC Systems® - A Development Ride from Spray to TH...
Going deep (learning) with tensor flow and quarkus
Going deep (learning) with tensor flow and quarkusGoing deep (learning) with tensor flow and quarkus
Going deep (learning) with tensor flow and quarkus
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
HDR Defence - Software Abstractions for Parallel Architectures
HDR Defence - Software Abstractions for Parallel ArchitecturesHDR Defence - Software Abstractions for Parallel Architectures
HDR Defence - Software Abstractions for Parallel Architectures
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre..."APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...
"APIs for Accelerating Vision and Inferencing: Options and Trade-offs," a Pre...
Chapel-on-X: Exploring Tasking Runtimes for PGAS Languages
Chapel-on-X: Exploring Tasking Runtimes for PGAS LanguagesChapel-on-X: Exploring Tasking Runtimes for PGAS Languages
Chapel-on-X: Exploring Tasking Runtimes for PGAS Languages
Deployment of an HPC Cloud based on Intel hardware
Deployment of an HPC Cloud based on Intel hardwareDeployment of an HPC Cloud based on Intel hardware
Deployment of an HPC Cloud based on Intel hardware
Hopsworks - ExtremeEarth Open Workshop
Hopsworks - ExtremeEarth Open WorkshopHopsworks - ExtremeEarth Open Workshop
Hopsworks - ExtremeEarth Open Workshop
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and CloudsArchitecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
Architecting a Heterogeneous Data Platform Across Clusters, Regions, and Clouds
Scientific Computing @ Fred Hutch
Scientific Computing @ Fred HutchScientific Computing @ Fred Hutch
Scientific Computing @ Fred Hutch
High performance computing for research
High performance computing for researchHigh performance computing for research
High performance computing for research
C cerin piv2017_c
C cerin piv2017_cC cerin piv2017_c
C cerin piv2017_c
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...
“Khronos Group Standards: Powering the Future of Embedded Vision,” a Presenta...

Recently uploaded

Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
Philip Schwarz
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
Ayan Halder
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
Aftab Hussain
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
Drona Infotech
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Mind IT Systems
Launch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in MinutesLaunch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in Minutes
Roshan Dwivedi
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
Shane Coughlan
SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024
Hironori Washizaki
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
Philip Schwarz
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
Aftab Hussain
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
Boni García

Recently uploaded (20)

Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Launch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in MinutesLaunch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in Minutes
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)

C++ on its way to exascale and beyond -- The HPX Parallel Runtime System

  • 1. C++ on its way to exascale and beyond – The HPX Parallel Runtime System
      Thomas Heller (
      January 21, 2016
      This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 671603
  • 2. What is Exascale anyway?
  • 3. Exascale in numbers
      • An exascale computer is supposed to execute 10¹⁸ floating-point operations per second
      • Exa: 10¹⁸ = 1000000000000000000
      • People on Earth: 7.3 billion = 7.3 × 10⁹
      • Imagine every person on Earth computing one operation per second. To match one second of an exascale computer, it would take:
        ⇒ 136986301 seconds
        ⇒ 2283105 minutes
        ⇒ 38051 hours
        ⇒ 1585 days
        ⇒ about 4 years
      C++ on its way to exascale and beyond – The HPX Parallel Runtime System 21.01.2016 | Thomas Heller | 3/ 51
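The arithmetic on this slide can be reproduced with a few lines of C++. This is a minimal sketch; `HumanTime` and `human_time` are illustrative names, and a 365-day year (no leap years) is assumed.

```cpp
// How long would `people` humans, each doing one operation per second,
// need to perform `ops` operations? Everything follows from one division.
struct HumanTime {
    double seconds, minutes, hours, days, years;
};

HumanTime human_time(double ops, double people) {
    HumanTime t{};
    t.seconds = ops / people;
    t.minutes = t.seconds / 60.0;
    t.hours = t.minutes / 60.0;
    t.days = t.hours / 24.0;
    t.years = t.days / 365.0;  // assumes a 365-day year
    return t;
}
```

Calling `human_time(1e18, 7.3e9)` and truncating each field gives the slide's figures: 136986301 seconds, 2283105 minutes, 38051 hours and 1585 days, or roughly 4.3 years.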
  • 4. Why do we need that many calculations?
  • 12. Challenges
      • How do we program those beasts?
        ⇒ Massively parallel processors
        ⇒ Massive number of compute nodes
        ⇒ Deep memory hierarchies
      • How can we design the architecture to be affordable?
        ⇒ The biggest operational cost is energy
        ⇒ Power envelope of 20 MW
        ⇒ Current fastest computer (Tian-He 2): 17 MW
  • 13. Current Development
      Current #1 system:
      • Tian-He 2: 33.9 PFLOPS
      • About 3.4% of an exaflop (33.9 of 1000 PFLOPS)
  • 14. Hardware Trends
      • ARM: Low-power ARM64 cores (maybe adding embedded GPU accelerators)
      • IBM: POWER + NVIDIA accelerators
      • Intel: Knights Landing (Xeon Phi) many-core processor
  • 15. How will C++ deal with all that?!?
  • 16. Challenges
      • Programmability
      • Expressing parallelism
      • Expressing data locality
  • 17. The 4 Horsemen of the Apocalypse: SLOW
      • Starvation
      • Latency
      • Overhead
      • Waiting for contention
  • 18. State of the Art
      • Modern architectures impose massive challenges on programmability in the context of performance portability
        • Massive increase in on-node parallelism
        • Deep memory hierarchies
      • The only portable parallelization solutions for C++ programmers (today): OpenMP and MPI
        • Hugely successful for years
        • Widely used and supported
        • Simple to use for simple use cases
        • Very portable
        • Highly optimized
  • 19. State of the Art – Parallelism in C++
      • C++11 introduced lower-level abstractions
        • std::thread, std::mutex, std::future, etc.
        • Fairly limited; more is needed
        • C++ needs stronger support for higher-level parallelism
      • Several proposals to the Standardization Committee are accepted or under consideration
        • Technical Specification: Concurrency (P0159; note: a misnomer)
        • Technical Specification: Parallelism (P0024)
        • Other smaller proposals: resumable functions, task regions, executors
      • Currently there is no overarching vision related to higher-level parallelism
        • The goal is to standardize a 'big story' by 2020
        • No need for OpenMP, OpenACC, OpenCL, etc.
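For contrast with the higher-level facilities discussed later, the sketch below uses only the C++11 building blocks named on the slide. `sum_with_threads` and `sum_via_promise` are hypothetical illustrations; note how thread creation, locking, joining and result hand-off all have to be managed by hand.

```cpp
#include <future>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Sums 1..n with C++11 primitives: std::thread for concurrency and a
// std::mutex protecting the shared total. Assumes num_threads >= 1.
long long sum_with_threads(long long n, unsigned num_threads) {
    long long total = 0;
    std::mutex total_mutex;
    std::vector<std::thread> workers;

    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            long long local = 0;
            // Thread t handles t+1, t+1+num_threads, t+1+2*num_threads, ...
            for (long long i = t + 1; i <= n; i += num_threads) local += i;
            std::lock_guard<std::mutex> lock(total_mutex);  // serialize the update
            total += local;
        });
    }
    for (auto& w : workers) w.join();  // every thread must be joined explicitly
    return total;
}

// Hands a result from a worker thread back to the caller through a
// std::promise / std::future pair.
long long sum_via_promise(long long n) {
    std::promise<long long> p;
    std::future<long long> f = p.get_future();
    std::thread worker([n](std::promise<long long> result) {
        result.set_value(n * (n + 1) / 2);  // closed-form sum of 1..n
    }, std::move(p));
    long long value = f.get();  // blocks until set_value has been called
    worker.join();
    return value;
}
```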
  • 20. Stepping Aside – Introducing HPX
  • 21. HPX – A general purpose parallel Runtime System
      • Solidly based on a theoretical foundation – a well-defined, new execution model (ParalleX)
      • Exposes a coherent and uniform, standards-oriented API for ease of programming parallel and distributed applications
      • Enables writing fully asynchronous code using hundreds of millions of threads
      • Provides unified syntax and semantics for local and remote operations
      • Open source: published under the Boost Software License
  • 22. HPX – A general purpose parallel Runtime System
      HPX represents an innovative mixture of
      • A global system-wide address space (AGAS – Active Global Address Space)
      • Fine-grain parallelism and lightweight synchronization
      • Combined with implicit, work-queue-based, message-driven computation
      • Full semantic equivalence of local and remote execution, and
      • Explicit support for hardware accelerators (through percolation)
  • 23–27. HPX 101 – The programming model
      Figure (built up over slides 23–27). Labels: Locality 0, Locality 1, ..., Locality i, ..., Locality N-1, each with Memory, a Thread-Scheduler and several Threads; Global Address Space; Parcelport; Active Global Address Space (AGAS) Service.
      Code shown on the slide:
```cpp
future<id_type> id = new_<Component>(locality, ...);
future<R> result = async(id.get(), action, ...);
```
  • 28. HPX 101 – Overview
      Legend (color-coded on the slide): C++, C++ Standard Library, HPX

      | R f(p...)          | Synchronous (returns R)  | Asynchronous (returns future<R>) | Fire & Forget (returns void)    |
      |--------------------|--------------------------|----------------------------------|---------------------------------|
      | Functions (direct) | f(p...)                  | async(f, p...)                   | apply(f, p...)                  |
      | Functions (lazy)   | bind(f, p...)(...)       | async(bind(f, p...), ...)        | apply(bind(f, p...), ...)       |
      | Actions (direct)   | HPX_ACTION(f, a)         | HPX_ACTION(f, a)                 | HPX_ACTION(f, a)                |
      |                    | a()(id, p...)            | async(a(), id, p...)             | apply(a(), id, p...)            |
      | Actions (lazy)     | HPX_ACTION(f, a)         | HPX_ACTION(f, a)                 | HPX_ACTION(f, a)                |
      |                    | bind(a(), id, p...)(...) | async(bind(a(), id, p...), ...)  | apply(bind(a(), id, p...), ...) |

      In addition: dataflow(func, f1, f2);
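The table uses HPX names. As a rough, non-HPX point of comparison, this sketch shows the four function call styles (direct, lazy, asynchronous, fire & forget) with only the standard library. `square` and the `call_*` functions are hypothetical; the standard library has no direct counterpart of `apply`, so a detached `std::thread` stands in for it here, and actions (remote invocations) are not covered at all.

```cpp
#include <atomic>
#include <functional>
#include <future>
#include <thread>

// Hypothetical stand-in for `f` in the table.
int square(int x) { return x * x; }

// Synchronous, direct: call f and get R back.
int call_sync(int x) { return square(x); }

// Synchronous, lazy: std::bind captures the arguments now; the call happens later.
int call_lazy(int x) {
    auto bound = std::bind(square, x);  // nothing has run yet
    return bound();                     // square(x) runs here
}

// Asynchronous: the result arrives through a std::future<R>.
std::future<int> call_async(int x) {
    return std::async(std::launch::async, square, x);
}

// Fire & forget: start the work and return void immediately. Completion can
// only be observed through the atomic `out`, which the caller must keep alive
// until the store has happened.
void call_apply(int x, std::atomic<int>& out) {
    std::thread([x, &out] { out.store(square(x)); }).detach();
}
```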
  • 29. The Future, an example
```cpp
int universal_answer() { return 42; }

void deep_thought()
{
    future<int> promised_answer = async(&universal_answer);
    // do other things for 7.5 million years
    cout << promised_answer.get() << endl;   // prints 42, eventually
}
```
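The slide's code uses HPX's `future` and `async` without namespace qualification. A standard-library analogue, assuming only `<future>`, behaves the same way for a local call; here `deep_thought` returns the answer instead of printing it so the result can be checked. `std::launch::async` is passed explicitly because the default launch policy is allowed to defer the call until `get()`.

```cpp
#include <future>

int universal_answer() { return 42; }

int deep_thought() {
    // std::launch::async requests a separate thread, so the caller is free to
    // do other work while the answer is being computed.
    std::future<int> promised_answer =
        std::async(std::launch::async, &universal_answer);
    // ... do other things ...
    return promised_answer.get();  // blocks until the value is ready
}
```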
  • 30. Compositional facilities
      • Sequential composition of futures
```cpp
future<string> make_string()
{
    future<int> f1 = async([]() -> int { return 123; });
    future<string> f2 = f1.then(
        [](future<int> f) -> string {
            // here .get() won't block
            return to_string(f.get());
        });
    return f2;
}
```
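Standard C++ `std::future` has no `.then()` member (it was proposed in the Concurrency TS). To illustrate what sequential composition does, here is a sketch of a hypothetical free function `then` that emulates it with `std::async`: it waits for the input future on another thread and then calls the continuation with the ready future, so `.get()` inside the continuation does not block. Unlike a native continuation, this emulation occupies a thread while waiting.

```cpp
#include <future>
#include <string>
#include <utility>

// Hypothetical helper emulating future::then for std::future.
template <typename T, typename F>
auto then(std::future<T> fut, F continuation) {
    return std::async(std::launch::async,
        [fut = std::move(fut), continuation = std::move(continuation)]() mutable {
            fut.wait();                           // wait until the input is ready
            return continuation(std::move(fut));  // .get() inside won't block
        });
}

std::future<std::string> make_string() {
    std::future<int> f1 = std::async(std::launch::async, [] { return 123; });
    return then(std::move(f1), [](std::future<int> f) {
        return std::to_string(f.get());
    });
}
```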
  • 31. Compositional facilities
      • Parallel composition of futures
```cpp
future<int> test_when_all()
{
    future<int> future1 = async([]() -> int { return 125; });
    future<string> future2 = async([]() -> string { return string("hi"); });

    // when_all takes ownership of its input futures, so they are moved in
    auto all_f = when_all(std::move(future1), std::move(future2));

    future<int> result = all_f.then(
        [](auto f) -> int {
            // f is a ready future holding a tuple of the two ready input futures
            return do_work(f.get());
        });
    return result;
}
```
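Likewise, here is a sketch of a hypothetical two-argument `when_all` for `std::future`, mirroring the slide: the returned future becomes ready when both inputs are ready and holds both of them in a `std::tuple`. The slide leaves `do_work` undefined; the version below is an assumption that simply adds the integer to the string's length, and `test_when_all` returns a plain `int` rather than a future so the result is easy to check.

```cpp
#include <future>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical when_all for two std::futures: the result is ready once both
// inputs are ready and carries both (ready) futures in a tuple.
template <typename A, typename B>
std::future<std::tuple<std::future<A>, std::future<B>>>
when_all(std::future<A> a, std::future<B> b) {
    return std::async(std::launch::async,
        [a = std::move(a), b = std::move(b)]() mutable {
            a.wait();
            b.wait();
            return std::make_tuple(std::move(a), std::move(b));
        });
}

// Hypothetical do_work: combines both results.
int do_work(std::tuple<std::future<int>, std::future<std::string>> t) {
    return std::get<0>(t).get() + static_cast<int>(std::get<1>(t).get().size());
}

int test_when_all() {
    auto future1 = std::async(std::launch::async, [] { return 125; });
    auto future2 = std::async(std::launch::async, [] { return std::string("hi"); });
    auto all_f = when_all(std::move(future1), std::move(future2));
    return do_work(all_f.get());  // 125 + length of "hi"
}
```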
  • 32. Dataflow – The new 'async' (HPX)
      • What if one or more arguments to 'async' are futures themselves?
      • Normal behavior: pass futures through to the function
      • Extended behavior: wait for futures to become ready before invoking the function:
```cpp
template <typename F, typename... Args>
future<result_of_t<F(Args...)>>
// requires(is_callable<F(Args...)>)
dataflow(F&& f, Args&&... args);
```
      • If ArgN is a future, then the invocation of F will be delayed
      • Non-future arguments are passed through
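A minimal sketch of the dataflow idea on top of `std::future`, assuming C++17 (`std::apply`, fold expressions): the hypothetical `dataflow` below waits for every argument that is a `std::future` and only then invokes `f`, passing the ready futures and the plain arguments through unchanged. Unlike a runtime-level implementation, this version blocks a thread while it waits.

```cpp
#include <future>
#include <tuple>
#include <utility>

// Non-future arguments need no waiting.
template <typename T>
void wait_if_future(T&) {}

// Future arguments are waited on until they are ready.
template <typename T>
void wait_if_future(std::future<T>& f) { f.wait(); }

// Hypothetical dataflow: f is invoked only once every std::future argument is
// ready; all arguments are then passed through to f.
template <typename F, typename... Args>
auto dataflow(F f, Args... args) {
    return std::async(std::launch::async,
        [f = std::move(f), t = std::make_tuple(std::move(args)...)]() mutable {
            std::apply([](auto&... a) { (wait_if_future(a), ...); }, t);
            return std::apply([&f](auto&... a) { return f(std::move(a)...); }, t);
        });
}

// Example: combines two asynchronously produced values once both are ready.
int add_when_ready() {
    std::future<int> a = std::async(std::launch::async, [] { return 20; });
    std::future<int> b = std::async(std::launch::async, [] { return 22; });
    auto sum = dataflow(
        [](std::future<int> x, std::future<int> y, int offset) {
            return x.get() + y.get() + offset;  // .get() no longer blocks here
        },
        std::move(a), std::move(b), 0);
    return sum.get();
}
```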
  • 33. Parallel Algorithms
  • 34. Concepts of Parallelism – Parallel Execution Properties
      • The execution restrictions applicable for the work items
      • In what sequence the work items have to be executed
      • Where the work items should be executed
      • The parameters of the execution environment
  • 40. Concepts and Types of Parallelism • Application: Futures, Async, Dataflow; Parallel Algorithms; Fork-Join, etc. • Concepts: Execution Policies (restrictions), Executors (sequence, where), Executor Parameters (grain size)
  • 41. Execution Policies (std) • Specify execution guarantees (in terms of thread safety) for executed parallel tasks: • sequential_execution_policy: seq • parallel_execution_policy: par • parallel_vector_execution_policy: par_vec • In the Parallelism TS, used for parallel algorithms only
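To illustrate how execution policies work as types, here is a hypothetical tag-dispatch sketch: seq_sketch and par_sketch are made-up stand-ins for seq and par, and for_each_sketch is not the standard algorithm. The policy object carries no data; it only selects an overload, and choosing the parallel one is the caller's promise that f is safe to run concurrently on different elements.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Hypothetical policy tags mirroring seq and par (tag dispatch only).
struct sequential_policy_sketch {};
struct parallel_policy_sketch {};
inline constexpr sequential_policy_sketch seq_sketch{};
inline constexpr parallel_policy_sketch par_sketch{};

// seq: invoke f on each element, in order, on the calling thread.
template <typename It, typename F>
void for_each_sketch(sequential_policy_sketch, It first, It last, F f)
{
    for (; first != last; ++first)
        f(*first);
}

// par: f may run concurrently on different elements, so f must be safe
// to call from several threads at once. Requires random-access iterators.
template <typename It, typename F>
void for_each_sketch(parallel_policy_sketch, It first, It last, F f)
{
    It mid = first + (last - first) / 2;
    // Left half on a new thread, right half on the calling thread.
    auto left = std::async(std::launch::async, [=] {
        for (It it = first; it != mid; ++it)
            f(*it);
    });
    for (It it = mid; it != last; ++it)
        f(*it);
    left.get();
}
```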
  • 42. Execution Policies (Extensions) • Asynchronous Execution Policies: • sequential_task_execution_policy: seq(task) • parallel_task_execution_policy: par(task) • In both cases the formerly synchronous functions return a future<> • Instruct the parallel construct to be executed asynchronously • Allows integration with asynchronous control flow
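A sketch of what the task variant changes, assuming a hypothetical task_policy_sketch tag in place of par(task) and a hypothetical transform_sketch function: the algorithm body is unchanged, but the task overload returns a std::future immediately, so the caller can keep working and join later.

```cpp
#include <algorithm>
#include <cassert>
#include <future>
#include <vector>

// Hypothetical tag standing in for seq(task) / par(task).
struct task_policy_sketch {};

// Synchronous form: returns the end of the written output range.
template <typename InIt, typename OutIt, typename F>
OutIt transform_sketch(InIt first, InIt last, OutIt dest, F f)
{
    return std::transform(first, last, dest, f);
}

// Task form: same algorithm, but the call returns at once with a future
// that becomes ready when the whole range has been processed.
template <typename InIt, typename OutIt, typename F>
std::future<OutIt> transform_sketch(task_policy_sketch, InIt first, InIt last,
                                    OutIt dest, F f)
{
    return std::async(std::launch::async,
        [=] { return std::transform(first, last, dest, f); });
}
```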
  • 43. Executors • Executors are objects responsible for • Creating execution agents on which work is performed (P0058) • In P0058 this is limited to parallel algorithms; here the use is much broader • Abstraction of the (potentially platform-specific) mechanisms for launching work • Responsible for defining the Where and How of the execution of tasks
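A minimal sketch of the executor idea, assuming a hypothetical async_execute member function; thread_executor_sketch, inline_executor_sketch and compute_answer are illustrative names, not HPX or P0058 API. The generic function never decides where the work runs; passing a different executor changes that decision without touching the calling code.

```cpp
#include <cassert>
#include <future>
#include <type_traits>
#include <utility>

// Hypothetical executor: launches each piece of work on a new OS thread.
struct thread_executor_sketch
{
    template <typename F>
    auto async_execute(F f)
    {
        return std::async(std::launch::async, std::move(f));
    }
};

// Hypothetical executor: runs the work immediately on the calling thread
// and hands back an already-ready future.
struct inline_executor_sketch
{
    template <typename F>
    auto async_execute(F f)
    {
        using R = std::invoke_result_t<F&>;
        std::packaged_task<R()> task(std::move(f));
        std::future<R> result = task.get_future();
        task();  // executes f right here
        return result;
    }
};

// Generic code only talks to the executor: "where and how" is the executor's job.
template <typename Executor>
int compute_answer(Executor& exec)
{
    auto f = exec.async_execute([] { return 6 * 7; });
    return f.get();
}
```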
  • 44. Execution Parameters • Allow control over the grain size of work, i.e. the number of iterations of a parallel for_each that run on the same thread • Similar to OpenMP scheduling policies: static, guided, dynamic • Much finer control
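A sketch of a static grain-size parameter, analogous to OpenMP's schedule(static, chunk). The struct static_chunk_size_sketch and the function for_each_chunked are hypothetical; each chunk of consecutive iterations becomes one std::async task, and the function returns the number of tasks it spawned so the effect of the chunk size is visible.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical parameter: every task processes `chunk` consecutive iterations.
struct static_chunk_size_sketch { std::size_t chunk; };

// Requires random-access iterators. Returns the number of tasks spawned.
template <typename It, typename F>
std::size_t for_each_chunked(static_chunk_size_sketch param, It first, It last, F f)
{
    std::vector<std::future<void>> tasks;
    std::size_t const n = static_cast<std::size_t>(last - first);
    std::size_t const chunk = std::max<std::size_t>(param.chunk, 1);  // guard against 0
    for (std::size_t begin = 0; begin < n; begin += chunk)
    {
        std::size_t const end = std::min(begin + chunk, n);
        tasks.push_back(std::async(std::launch::async, [=] {
            for (std::size_t i = begin; i != end; ++i)
                f(first[i]);
        }));
    }
    for (auto& t : tasks)
        t.get();
    return tasks.size();
}
```

With 10 elements and a chunk size of 4, three tasks are created (iterations 0 to 3, 4 to 7, and 8 to 9).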
  • 45. Putting it all together – SAXPY routine with data locality • a[i] = b[i] ∗ x + c[i], for i from 0 to N − 1 • Using parallel algorithms • Explicit control over data locality • No raw loops
  • 46. Putting it all together – SAXPY routine with data locality • Complete serial version:

    #include <algorithm>
    #include <vector>

    std::vector<double> a = ...;
    std::vector<double> b = ...;
    std::vector<double> c = ...;
    double x = ...;

    // The binary std::transform takes only the begin iterator of the second input range
    std::transform(b.begin(), b.end(), c.begin(), a.begin(),
        [x](double bb, double cc) { return bb * x + cc; });
  • 47. Putting it all together – SAXPY routine with data locality • Parallel version, no data locality:

    std::vector<double> a = ...;
    std::vector<double> b = ...;
    std::vector<double> c = ...;
    double x = ...;

    parallel::transform(parallel::par,
        b.begin(), b.end(),
        c.begin(), c.end(),
        a.begin(),
        [x](double bb, double cc) { return bb * x + cc; });
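For readers without HPX at hand, a sketch of the same parallel SAXPY using only std::async. The function saxpy_parallel and its parts parameter (assumed to be at least 1) are hypothetical: the ranges are split into that many contiguous chunks and each chunk runs the serial std::transform with the slide's lambda. HPX's parallel::par would instead hand the chunks to its own task scheduler.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// a[i] = b[i] * x + c[i], split into `parts` contiguous chunks, one task each.
// Assumes parts >= 1 and that a, b and c have the same size.
void saxpy_parallel(std::vector<double>& a, std::vector<double> const& b,
                    std::vector<double> const& c, double x, std::size_t parts)
{
    std::size_t const n = a.size();
    std::size_t const step = (n + parts - 1) / parts;  // ceiling division
    std::vector<std::future<void>> tasks;
    for (std::size_t begin = 0; begin < n; begin += step)
    {
        std::size_t const end = std::min(begin + step, n);
        tasks.push_back(std::async(std::launch::async, [&, begin, end] {
            std::transform(b.begin() + begin, b.begin() + end,
                           c.begin() + begin, a.begin() + begin,
                           [x](double bb, double cc) { return bb * x + cc; });
        }));
    }
    for (auto& t : tasks)
        t.get();  // join every chunk before returning
}
```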
  • 48. Putting it all together – SAXPY routine with data locality • Parallel version, with data locality:

    std::vector<double, numa_allocator> a = ...;
    std::vector<double, numa_allocator> b = ...;
    std::vector<double, numa_allocator> c = ...;
    double x = ...;

    for (auto& numa_executor : numa_executors)
    {
        parallel::transform(
            parallel::par.on(numa_executor),
            b.begin() + ..., b.begin() + ...,
            c.begin() + ..., c.begin() + ...,
            a.begin() + ...,
            [x](double bb, double cc) { return bb * x + cc; });
    }
  • 49. Case Studies
  • 50. LibGeoDecomp • C++ auto-parallelizing framework • Open source • High scalability • Wide range of platform support
  • 51. LibGeoDecomp: Futurizing the Simulation Flow • Basic simulation flow:

    for (Region r : innerRegion) {
        update(r, oldGrid, newGrid, step);
    }
    swap(oldGrid, newGrid);
    ++step;
    for (Region r : outerGhostZoneRegion) {
        notifyPatchProviders(r, oldGrid);
    }
    for (Region r : outerGhostZoneRegion) {
        update(r, oldGrid, newGrid, step);
    }
    for (Region r : innerGhostZoneRegion) {
        notifyPatchAccepters(r, oldGrid);
    }
  • 52. LibGeoDecomp: Futurizing the Simulation Flow • Futurized simulation flow:

    parallel for (Region r : innerRegion) {
        update(r, oldGrid, newGrid, step);
    }
    swap(oldGrid, newGrid);
    ++step;
    // continuation
    parallel for (Region r : outerGhostZoneRegion) {
        notifyPatchProviders(r, oldGrid);
    }
    // continuation
    parallel for (Region r : outerGhostZoneRegion) {
        update(r, oldGrid, newGrid, step);
    }
    // continuation
    parallel for (Region r : innerGhostZoneRegion) {
        notifyPatchAccepters(r, oldGrid);
    }
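A small sketch of the futurized pattern, assuming a hypothetical Region type (a half-open index range) and hypothetical helpers parallel_for_regions and then_run; none of these are LibGeoDecomp or HPX API. Each "parallel for" yields a future, and the next stage is attached to it as a continuation. Here the continuation blocks a helper thread instead of using HPX's non-blocking .then().

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <utility>
#include <vector>

// Hypothetical stand-in for a LibGeoDecomp region: a half-open index range.
struct Region { std::size_t begin, end; };

// "parallel for" over regions: one task per region (f must return void).
// The returned future is ready once every region has been processed.
template <typename F>
std::future<void> parallel_for_regions(std::vector<Region> const& regions, F f)
{
    std::vector<std::future<void>> tasks;
    for (Region const& r : regions)
        tasks.push_back(std::async(std::launch::async, f, r));
    return std::async(std::launch::async, [tasks = std::move(tasks)]() mutable {
        for (auto& t : tasks)
            t.get();
    });
}

// Continuation: start the next stage (which returns a future) once prev is ready.
template <typename Next>
std::future<void> then_run(std::future<void> prev, Next next)
{
    return std::async(std::launch::async, [prev = std::move(prev), next]() mutable {
        prev.get();
        next().get();
    });
}
```

As a usage example, a first stage that increments every cell of each region can be chained with a second stage that multiplies them by 10; because the second stage starts only after the first future is ready, every cell ends up as (0 + 1) * 10 = 10.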
  • 53. HPXCL – Extending the Global Address Space • All GPU devices are globally addressable • GPU memory can be allocated and referenced remotely • Events are extensions of the shared state ⇒ the API is embedded into the already existing future facilities
  • 54. From async to GPUs • Spawning single tasks is not feasible ⇒ offload a work group (think of parallel::for_each):

    auto devices = hpx::opencl::find_devices(
        hpx::find_here(), CL_DEVICE_TYPE_GPU).get();

    // create buffers, programs and kernels ...
    hpx::opencl::buffer buf =
        devices[0].create_buffer(CL_MEM_READ_WRITE, 4711);

    auto write_future = buf.enqueue_write(some_vec.begin(), some_vec.end());
    auto kernel_future = kernel.enqueue(dim, write_future);
  • 55. From async to GPUs • Proof of concept • Future directions: • Embed OpenCL devices behind execution policies and executors • Hide OpenCL details behind parallel algorithms • Hide OpenCL buffer management behind "distributed data structures"
  • 56. Mandelbrot example [Architecture diagram with components: Queue, Google Maps API, Client, Generator, Workers, Webserver] Acknowledgements to Martin Stumpf
  • 59. LibGeoDecomp Performance Results [Figure: Execution Times of HPX and MPI N-Body Codes (SMP, Weak Scaling); x-axis: number of cores on one node; y-axis: time [s]; series: Sim HPX, Sim MPI, Comm HPX, Comm MPI]
  • 61. LibGeoDecomp Performance Results [Figure: Weak Scaling Results for HPX N-Body Code (Single Xeon Phi, Futurized); x-axis: number of cores; y-axis: performance in GFLOPS; series: 1, 2, 3 and 4 threads/core]
  • 62. LibGeoDecomp Performance Results [Figure: Weak Scaling Results for HPX N-Body Codes (Host Cores and Xeon Phi Accelerator); x-axis: number of nodes (16 cores on host, full Xeon Phi); y-axis: performance in TFLOPS; series: HPX, Peak]
  • 63. STREAM Benchmark [Figure: TRIAD STREAM Results (50 million data points); x-axis: number of cores per NUMA domain; y-axis: bandwidth [GB/s]; series: HPX (1 NUMA domain), OpenMP (1 NUMA domain), HPX (2 NUMA domains), OpenMP (2 NUMA domains)]
  • 64. Matrix Transpose [Figure: Matrix Transpose (SMP, 24k×24k matrices); x-axis: number of cores per NUMA domain; y-axis: data transfer rate [GB/s]; series: HPX (1 NUMA domain), HPX (2 NUMA domains), OMP (1 NUMA domain), OMP (2 NUMA domains)]
  • 65. Matrix Transpose [Figure: Matrix Transpose (SMP, 24k×24k matrices); x-axis: number of cores per NUMA domain; y-axis: data transfer rate [GB/s]; series: HPX (2 NUMA domains), MPI (1 NUMA domain, 12 ranks), MPI (2 NUMA domains, 24 ranks), MPI+OMP (2 NUMA domains)]
  • 66. Matrix Transpose [Figure: Matrix Transpose (Xeon Phi, 24k×24k matrices); x-axis: number of cores; y-axis: data transfer rate [GB/s]; series: HPX and OMP with 4, 2 and 1 PUs per core]
  • 67. Matrix Transpose [Figure: Matrix Transpose (Distributed, 18k×18k elements per node); x-axis: number of nodes (16 cores each); y-axis: data transfer rate [GB/s]; series: HPX, MPI]
  • 68. What's beyond Exascale?
  • 69. Conclusions • Higher-level parallelization abstractions in C++: uniform, versatile, and generic • All of this is enabled by the use of modern C++ facilities • Runtime system (fine-grained, task-based schedulers) • Performant, portable implementation
  • 70. Parallelism is here to stay! • Massively parallel hardware is already part of our daily lives! • Parallelism is observable everywhere: ⇒ IoT: massive numbers of devices existing in parallel ⇒ Embedded: massively parallel, energy-aware systems (Epiphany, DSPs, FPGAs) ⇒ Automotive: massive amounts of parallel sensor data to process • We all need efficient and pragmatic solutions for dealing with this
  • 71. More Information • #STE||AR @ • Collaborations: • FET-HPC (H2020): AllScale • NSF: STORM • DOE: Part of X-Stack