An introduction to GrPPI
Generic Reusable Parallel Patterns Interface
J. Daniel Garcia
ARCOS Group
University Carlos III of Madrid
Spain
November 2019
J. Daniel Garcia – ARCOS@UC3M (josedaniel.garcia@uc3m.es) – Twitter: @jdgarciauc3m
Warning
This work is under the Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license.
You are free to Share: copy and redistribute the material in any medium or format.
Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
NonCommercial: You may not use the material for commercial purposes.
NoDerivatives: If you remix, transform, or build upon the material, you may not distribute the modified material.
ARCOS@uc3m
UC3M: A young international research oriented university.
ARCOS: An applied research group.
Lines: High Performance Computing, Big data,
Cyberphysical Systems, and Programming models for
application improvement.
Programming Models for Application Improvement:
Provide programming tools for improving:
Performance.
Energy efficiency.
Maintainability.
Correctness.
Standardization:
ISO/IEC JTC1/SC22/WG21. ISO C++ standards committee.
Acknowledgements
Project 801091 “ASPIDE: Exascale programming models for
extreme data processing” funded by the European Commission
through H2020 FET-HPC (2018-2020).
Project ICT 644235 “REPHRASE: REfactoring Parallel
Heterogeneous Resource-aware Applications” funded by the
European Commission through H2020 program (2015-2018).
Project TIN2016-79673-P “Towards Unification of HPC and
Big Data Paradigms” funded by the Spanish Ministry of
Economy and Competitiveness (2016-2019).
GrPPI team
Main team
J. Daniel Garcia (UC3M, lead).
Javier Garcia Blas (UC3M).
David del Río (UC3M).
Javier Fernández (UC3M).
Cooperation
Marco Danelutto (Univ. Pisa)
Massimo Torquati (Univ. Pisa)
Marco Aldinucci (Univ. Torino)
Fabio Tordini (Univ. Torino)
Plácido Fernández (UC3M-CERN).
Manuel F. Dolz (Univ. Jaume I).
. . .
Introduction
1 Introduction
2 Simple use
3 Patterns in GrPPI
4 Evaluation
5 Conclusions
Sequential Programming versus Parallel Programming
Sequential programming
Well-known set of control-structures embedded in
programming languages.
Control structures inherently sequential.
Traditional Parallel programming
Constructs adapting sequential control structures to the
parallel world (e.g. parallel-for).
But wait!
What if we had constructs that could be both sequential
and parallel?
Software design
There are two ways of constructing a software design:
One way is to make it so simple that there are obviously no deficiencies,
and the other way is to make it so complicated that there are no obvious deficiencies.
The first method is far more difficult.
C.A.R. Hoare
Adding two vectors
Traditional way
using numvec = std::vector<double>;
numvec add(const numvec & v1, const numvec & v2) {
  numvec res;
  res.reserve(v1.size()); // Assume equal sizes
  for (std::size_t i = 0; i < v1.size(); ++i) {
    res.push_back(v1[i] + v2[i]);
  }
  return res;
}
Adds additional constraints.
Traversing in-order.
Potential mistakes.
i<v1.size() versus i<=v1.size().
The STL way (C++98/03)
typedef std::vector<double> numvec;
numvec add(const numvec & v1, const numvec & v2) {
  numvec res;
  res.reserve(v1.size()); // Assume equal sizes
  std::transform(v1.begin(), v1.end(), v2.begin(),
                 std::back_inserter(res),
                 std::plus<double>());
  return res;
}
Minimize off-by-one mistakes.
Type specific optimizations.
The STL way (C++14)
using numvec = std::vector<double>;
numvec add(const numvec & v1, const numvec & v2) {
  numvec res;
  res.reserve(v1.size()); // Assume equal sizes
  std::transform(v1.begin(), v1.end(), v2.begin(),
                 std::back_inserter(res),
                 [](auto x, auto y) { return x + y; });
  return res;
}
Use (possibly generic) lambdas.
A brief history of patterns
From building and architecture (Christopher Alexander):
1977: A Pattern Language: Towns, Buildings, Construction.
1979: The Timeless Way of Building.
To software design (Gamma et al.):
1993: Design Patterns: Abstraction and Reuse of Object-Oriented Design. ECOOP.
1995: Design Patterns. Elements of Reusable
Object-Oriented Software.
To parallel programming (McCool, Reinders, Robinson):
2012: Structured Parallel Programming: Patterns for
Efficient Computation.
Parallel Patterns
Levels of patterns:
Software Architecture Patterns.
Design Patterns.
Algorithm Strategy Patterns.
Implementation patterns.
Algorithm Strategy Pattern or Algorithmic Skeleton
A way to codify best practices in parallel programming in
a reusable way.
A single Parallel Pattern may have different realizations
in different programming models.
Example: map pattern
Apply a single elemental operation to different data
items.
SAXPY: Sequential
for (std::size_t i = 0; i < n; ++i) {
  y[i] = a * x[i] + y[i];
}
SAXPY: OpenMP
#pragma omp parallel for
for (std::size_t i = 0; i < n; ++i) {
  y[i] = a * x[i] + y[i];
}
SAXPY: TBB
tbb::parallel_for(tbb::blocked_range<std::size_t>(0, n),
  [&](const tbb::blocked_range<std::size_t> & r) {
    // blocked_range exposes index bounds, not iterators to dereference
    for (auto i = r.begin(); i != r.end(); ++i) {
      y[i] = a * x[i] + y[i];
    }
  });
Ideals for Parallel Patterns
Applications should be expressed independently of the
execution model.
Computations should be expressed in terms of
structured composable patterns.
Multiple back-ends should be offered with simple
switching mechanisms.
Interface should integrate seamlessly with modern C++
and its standard library.
Applications should be able to take advantage of modern
C++ language features.
C++17: Execution policies
An execution policy disambiguates between parallel
algorithm overloads.
sequenced_policy: Not parallelized.
std::execution::seq.
parallel_policy: Execution can be parallelized in multiple
threads.
std::execution::par.
parallel_unsequenced_policy: Execution can be
parallelized, vectorized, or migrated across threads.
std::execution::par_unseq.
C++17: Parallel algorithms
Most algorithms in <algorithm> and <numeric> have
execution policy based versions.
SAXPY: C++17
std::transform(std::execution::par,
               x.begin(), x.end(), y.begin(), y.begin(),
               [=](auto vx, auto vy) { return a * vx + vy; });
However . . .
No fine control knobs.
Lack of task parallelism and stream parallelism.
OK as an entry level to parallelism.
GrPPI
https://github.com/arcosuc3m/grppi
Generic reusable Parallel Pattern Interface.
A header-only library (might change).
A set of execution policies.
A set of type-safe generic algorithms.
Requires C++14.
Apache 2.0 License.
Simple use
Example: Transforming a sequence
Given a sequence of frames generate a new sequence of
frames in grayscale.
struct frame { /* ... */ };
frame togray(const frame & f);
[Figure: each input frame[i] (i = 0..4) is passed through togray() to produce the corresponding output frame[i].]
Transforming a sequence
Traditional explicit loop
using frameseq = std::vector<frame>;
frameseq seq_togray(const frameseq & s) {
  frameseq r;
  r.reserve(s.size());
  // Requires processing in-order
  for (const auto & f : s) {
    r.push_back(togray(f));
  }
  return r;
}
STL way
using frameseq = std::vector<frame>;
frameseq seq_togray(const frameseq & s) {
  frameseq r;
  r.reserve(s.size());
  std::transform(s.begin(), s.end(), std::back_inserter(r), togray);
  return r;
}
Parallel STL way (C++17)
using frameseq = std::vector<frame>;
frameseq seq_togray(const frameseq & s) {
  // Pre-sized output: parallel algorithms need forward iterators,
  // and concurrent push_back through back_inserter would be a data race
  frameseq r(s.size());
  // No execution order assumed
  std::transform(std::execution::par,
                 s.begin(), s.end(), r.begin(), togray);
  return r;
}
GrPPI
using frameseq = std::vector<frame>;
frameseq seq_togray(const frameseq & s) {
  frameseq r(s.size());
  grppi::sequential_execution seq;
  grppi::map(seq, s.begin(), s.end(), r.begin(), togray);
  return r;
}
GrPPI + lambda
using frameseq = std::vector<frame>;
frameseq seq_togray(const frameseq & s) {
  frameseq r(s.size());
  grppi::sequential_execution seq;
  grppi::map(seq, s.begin(), s.end(), r.begin(),
    [](const frame & f) { return filter(f,64); });
  return r;
}
GrPPI + lambda, without iterators
using frameseq = std::vector<frame>;
frameseq seq_togray(const frameseq & s) {
  frameseq r(s.size());
  grppi::sequential_execution seq;
  grppi::map(seq, s, r,
    [](const frame & f) { return filter(f,64); });
  return r;
}
Patterns in GrPPI
Controlling execution
3 Patterns in GrPPI
Controlling execution
Patterns overview
Data patterns
Streaming patterns
Execution types
Execution model is encapsulated in execution types.
Always provided as first argument to patterns.
Current concrete execution types:
Sequential: sequential_execution.
ISO C++ Threads: parallel_execution_native.
OpenMP: parallel_execution_omp.
Intel TBB: parallel_execution_tbb.
FastFlow: parallel_execution_ff.
Run-time polymorphic wrapper through type erasure:
dynamic_execution.
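The type-erasure idea behind a run-time polymorphic execution wrapper can be sketched in standard C++. This is only an illustration under stated assumptions, not GrPPI's implementation: the names any_execution, seq_exec, threaded_exec, for_each_index, and double_all are hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical execution types (not GrPPI's): each one knows how to run
// a body for every index in [0, n).
struct seq_exec {
  void for_each_index(std::size_t n,
                      const std::function<void(std::size_t)> & body) const {
    for (std::size_t i = 0; i < n; ++i) { body(i); }
  }
};

struct threaded_exec {
  unsigned num_threads = 2;
  void for_each_index(std::size_t n,
                      const std::function<void(std::size_t)> & body) const {
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
      // Thread t handles indices t, t + num_threads, t + 2*num_threads, ...
      workers.emplace_back([&body, n, t, this] {
        for (std::size_t i = t; i < n; i += num_threads) { body(i); }
      });
    }
    for (auto & w : workers) { w.join(); }
  }
};

// Type-erased wrapper: stores any execution type behind a virtual interface,
// so the concrete execution can be chosen at run time.
class any_execution {
public:
  template <typename E>
  any_execution(E e) : impl_{std::make_unique<model<E>>(std::move(e))} {}

  void for_each_index(std::size_t n,
                      const std::function<void(std::size_t)> & body) const {
    impl_->for_each_index(n, body);
  }

private:
  struct concept_t {
    virtual ~concept_t() = default;
    virtual void for_each_index(
        std::size_t n,
        const std::function<void(std::size_t)> & body) const = 0;
  };
  template <typename E>
  struct model final : concept_t {
    explicit model(E e) : exec{std::move(e)} {}
    void for_each_index(
        std::size_t n,
        const std::function<void(std::size_t)> & body) const override {
      exec.for_each_index(n, body);
    }
    E exec;
  };
  std::unique_ptr<concept_t> impl_;
};

// Doubles every element using whichever execution was selected at run time.
std::vector<double> double_all(const any_execution & ex,
                               const std::vector<double> & v) {
  std::vector<double> res(v.size());
  ex.for_each_index(v.size(), [&](std::size_t i) { res[i] = 2 * v[i]; });
  return res;
}
```

A caller could build any_execution{seq_exec{}} or wrap a threaded_exec depending on a run-time flag and pass either to double_all unchanged. The cost is one virtual call per pattern invocation.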
Execution model properties
Some execution types allow finer configuration.
Example: Concurrency degree.
Interface:
ex.set_concurrency_degree(4);
int n = ex.concurrency_degree();
Default values:
Sequential ⇒ 1.
Native ⇒ std::thread::hardware_concurrency().
OpenMP ⇒ omp_get_num_threads().
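A minimal sketch of how an execution type might expose such a property, assuming a native-threads default. The class name native_like_execution is hypothetical, and clamping non-positive values to 1 is an assumption of this sketch, not documented GrPPI behavior.

```cpp
#include <thread>

// Hypothetical execution type (not GrPPI's) exposing a concurrency degree.
class native_like_execution {
public:
  // Default: number of hardware threads, or 1 if it cannot be determined
  // (std::thread::hardware_concurrency() may return 0).
  native_like_execution() noexcept : degree_{default_degree()} {}

  // Assumption of this sketch: non-positive requests are clamped to 1.
  void set_concurrency_degree(int n) noexcept { degree_ = (n > 0) ? n : 1; }
  int concurrency_degree() const noexcept { return degree_; }

private:
  static int default_degree() noexcept {
    const unsigned hw = std::thread::hardware_concurrency();
    return hw == 0 ? 1 : static_cast<int>(hw);
  }
  int degree_;
};
```

Keeping the degree inside the execution object means the pattern call itself does not change when the degree is tuned.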
Patterns overview
A classification
Data patterns: Express computations over a data set.
map, reduce, map/reduce, stencil.
Task patterns: Express task composition.
divide/conquer.
Streaming patterns: Express computations over a
(possibly unbounded) data set.
pipeline.
Specialized stages: farm, filter, reduction, iteration.
Data patterns
Patterns on data sets
A data pattern performs an operation on one or more data
sets that are already in memory.
Input:
One or more data sets.
Operations.
Output:
A data set (map, stencil).
A single value (reduce, map/reduce).
Maps on data sequences
A map pattern applies an operation to every element in a
data set, generating a new data set.
[Figures: unidimensional map; multidimensional map.]
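One possible realization of a unidimensional map with plain ISO C++ threads, as a sketch only. The helper chunked_map and its chunking strategy are hypothetical and are not how GrPPI implements map; the operation is assumed to be safe to call concurrently and to return a type other than bool.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <type_traits>
#include <vector>

// Hypothetical helper (not GrPPI's map): applies op to every element of in,
// splitting the index range into contiguous chunks, one per thread.
template <typename T, typename Op>
auto chunked_map(const std::vector<T> & in, Op op, unsigned num_threads) {
  using result_type = std::decay_t<decltype(op(in.front()))>;
  std::vector<result_type> out(in.size());
  num_threads = std::max(1u, num_threads);
  const std::size_t chunk = (in.size() + num_threads - 1) / num_threads;
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < num_threads; ++t) {
    const std::size_t first = std::min(in.size(), t * chunk);
    const std::size_t last = std::min(in.size(), first + chunk);
    if (first == last) { break; }  // No work left for further threads
    // Each thread writes a disjoint slice of out, so no locking is needed.
    workers.emplace_back([&in, &out, &op, first, last] {
      for (std::size_t i = first; i < last; ++i) { out[i] = op(in[i]); }
    });
  }
  for (auto & w : workers) { w.join(); }
  return out;
}
```

Because every output element depends only on the input element at the same index, the chunks can run in any order, which is what makes map a data-parallel pattern.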
Single sequences mapping
Double all elements in vector sequentially
std::vector<double> double_elements(const std::vector<double> & v)
{
  std::vector<double> res(v.size());
  grppi::sequential_execution seq;
  grppi::map(seq, v, res,
    [](double x) { return 2*x; });
  return res;
}
Double all elements in vector with OpenMP
std::vector<double> double_elements(const std::vector<double> & v)
{
  std::vector<double> res(v.size());
  grppi::parallel_execution_omp omp;
  grppi::map(omp, v, res,
    [](double x) { return 2*x; });
  return res;
}
Multiple sequences mapping
Add two vectors
template <typename Execution>
std::vector<double> add_vectors(const Execution & ex,
    const std::vector<double> & v1,
    const std::vector<double> & v2)
{
  auto size = std::min(v1.size(), v2.size());
  std::vector<double> res(size);
  grppi::map(ex, zip(v1, v2), res,
    [](double x, double y) { return x+y; });
  return res;
}
Add three vectors
template <typename Execution>
std::vector<double> add_vectors(const Execution & ex,
    const std::vector<double> & v1,
    const std::vector<double> & v2,
    const std::vector<double> & v3)
{
  // Use the shortest of the three inputs
  auto size = std::min({v1.size(), v2.size(), v3.size()});
  std::vector<double> res(size);
  grppi::map(ex, zip(v1,v2,v3), res,
    [](double x, double y, double z) { return x+y+z; });
  return res;
}
Heterogeneous mapping
The result can be of a different type.
Complex vector from real and imaginary vectors
template <typename Execution>
std::vector<std::complex<double>> create_cplx(const Execution & ex,
    const std::vector<double> & re,
    const std::vector<double> & im)
{
  auto size = std::min(re.size(), im.size());
  std::vector<std::complex<double>> res(size);
  grppi::map(ex, zip(re,im), res,
    [](double r, double i) -> std::complex<double> { return {r,i}; });
  return res;
}
Reductions on data sequences
A reduce pattern combines all values in a data set using a
binary combination operation.
[Figure: reduction of a data set, starting from an identity value (id), into a single result.]
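A sketch of how a reduce could be realized with plain threads: per-chunk partial results that start from the identity value, followed by a final sequential combination. The helper chunked_reduce is hypothetical and is not GrPPI's reduce. It assumes the operation is associative, the identity is its neutral element, and T is not bool.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper (not GrPPI's reduce): each thread folds one contiguous
// chunk starting from the identity value, then the partial results are
// combined sequentially. If op is not associative, or identity is not its
// neutral element, the result depends on the chunking.
template <typename T, typename Op>
T chunked_reduce(const std::vector<T> & v, T identity, Op op,
                 unsigned num_threads) {
  num_threads = std::max(1u, num_threads);
  const std::size_t chunk = (v.size() + num_threads - 1) / num_threads;
  std::vector<T> partial(num_threads, identity);
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < num_threads; ++t) {
    const std::size_t first = std::min(v.size(), t * chunk);
    const std::size_t last = std::min(v.size(), first + chunk);
    if (first == last) { break; }  // No work left for further threads
    // Each thread writes only its own slot of partial.
    workers.emplace_back([&v, &partial, &op, identity, first, last, t] {
      partial[t] = std::accumulate(v.begin() + first, v.begin() + last,
                                   identity, op);
    });
  }
  for (auto & w : workers) { w.join(); }
  return std::accumulate(partial.begin(), partial.end(), identity, op);
}
```

The identity value is what lets idle threads and empty inputs contribute a neutral partial result, which is why the pattern takes it as an explicit argument.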
Homogeneous reduction
Add a sequence of values
template <typename Execution>
double add_sequence(const Execution & ex, const std::vector<double> & v)
{
  return grppi::reduce(ex, v, 0.0,
    [](double x, double y) { return x+y; });
}
Heterogeneous reduction
Add areas of shapes
template <typename Execution>
double add_areas(const Execution & ex, const std::vector<shape> & shapes)
{
  return grppi::reduce(ex, shapes, 0.0,
    [](double a, const auto & s) {
      return a + s.area();
    });
}
Note: Better expressed as a map/reduce.
Map/reduce pattern
A map/reduce pattern combines a map pattern and a
reduce pattern into a single pattern.
1. One or more data sets are mapped by applying a transformation operation.
2. The results are combined by a reduction operation.
A map/reduce could also be expressed by the
composition of a map and a reduce.
However, map/reduce may potentially fuse both stages,
allowing for extra optimizations.
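The difference between composing the two stages and fusing them can be sketched sequentially in standard C++. The helpers map_then_reduce and fused_map_reduce are hypothetical names for illustration and say nothing about how GrPPI fuses stages internally.

```cpp
#include <numeric>
#include <vector>

// Composition: a map that materializes an intermediate vector, then a reduce.
template <typename T, typename R, typename Transform, typename Combine>
R map_then_reduce(const std::vector<T> & v, R identity,
                  Transform transform, Combine combine) {
  std::vector<R> mapped;
  mapped.reserve(v.size());
  for (const auto & x : v) { mapped.push_back(transform(x)); }
  return std::accumulate(mapped.begin(), mapped.end(), identity, combine);
}

// Fused version: each element is transformed and combined in one pass,
// so no intermediate container is allocated.
template <typename T, typename R, typename Transform, typename Combine>
R fused_map_reduce(const std::vector<T> & v, R identity,
                   Transform transform, Combine combine) {
  R result = identity;
  for (const auto & x : v) { result = combine(result, transform(x)); }
  return result;
}
```

Both functions return the same value. The fused one avoids the temporary storage and a second traversal, which is the kind of optimization a dedicated map/reduce pattern leaves room for.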
Map/reduce
[Figure: map/reduce of a data set, starting from an identity value (id), into a single result.]
Single sequence map/reduce
Sum of squares
template <typename Execution>
double sum_squares(const Execution & ex, const std::vector<double> & v)
{
  return grppi::map_reduce(ex, v, 0.0,
    [](double x) { return x*x; },
    [](double x, double y) { return x+y; }
  );
}
Map/reduce on two data sets
Scalar product
template <typename Execution>
double scalar_product(const Execution & ex,
    const std::vector<double> & v1,
    const std::vector<double> & v2)
{
  return grppi::map_reduce(ex, zip(v1,v2), 0.0,
    [](double x, double y) { return x*y; },
    [](double x, double y) { return x+y; });
}
Heterogeneous reduction versus map/reduce
Add areas of shapes
template <typename Execution>
double add_areas(const Execution & ex, const std::vector<shape> & shapes)
{
  return grppi::map_reduce(ex, shapes, 0.0,
    [](const auto & s) { return s.area(); },
    [](double a, double b) { return a+b; });
}
Note: Better than previous heterogeneous reduce.
Canonical map/reduce
Given a sequence of words, produce a container where:
The key is the word.
The value is the number of occurrences of that word.
Word frequencies
template <typename Execution>
auto word_freq(const Execution & ex, const std::vector<std::string> & words)
{
  using namespace std;
  using dictionary = std::map<string,int>;
  return grppi::map_reduce(ex, words, dictionary{},
    // Map: each word becomes a one-entry dictionary.
    [](string w) -> dictionary { return {{w,1}}; },
    // Reduce: merge the counts of rhs into lhs.
    [](dictionary & lhs, const dictionary & rhs) -> dictionary {
      for (auto & entry : rhs) { lhs[entry.first] += entry.second; }
      return lhs;
    });
}
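For reference, the same computation can be written sequentially in plain standard C++, without GrPPI. This is only a sketch of what the map and reduce steps compute; the names map_word, merge_counts and word_freq_seq are hypothetical and do not belong to GrPPI.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

using dictionary = std::map<std::string, int>;

// Map step: one word becomes a one-entry dictionary.
dictionary map_word(const std::string & w) { return {{w, 1}}; }

// Reduce step: merge the counts of rhs into lhs.
dictionary merge_counts(dictionary lhs, const dictionary & rhs) {
  for (const auto & entry : rhs) { lhs[entry.first] += entry.second; }
  return lhs;
}

// Sequential reference: fold every mapped word into the result.
dictionary word_freq_seq(const std::vector<std::string> & words) {
  dictionary result;  // identity value: the empty dictionary
  for (const auto & w : words) {
    result = merge_counts(std::move(result), map_word(w));
  }
  return result;
}
```

A parallel back-end may apply the reduce step to partial dictionaries in any grouping, which is why the merge is written as an operation on two dictionaries rather than on a dictionary and a word.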
Streaming patterns
3 Patterns in GrPPI
Controlling execution
Patterns overview
Data patterns
Streaming patterns
Pipeline pattern
A pipeline pattern allows processing a data stream where
the computation may be divided into multiple stages.
Each stage processes the data item generated in the
previous stage and passes the produced result to the next
stage.
Standalone pipeline
A standalone pipeline is a top-level pipeline.
Invoking the pipeline translates into its execution.
Given:
A generator $g : \emptyset \to T_1 \cup \emptyset$
A sequence of transformers $t_i : T_i \to T_{i+1}$
For every non-empty value generated by $g$, it evaluates:
$t_n(t_{n-1}(\ldots t_1(g())))$
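The definition above can be sketched sequentially in standard C++17 for the case of two transformers. This is an illustration of the semantics only, not GrPPI code; counter and run_sequential are hypothetical names, and results are collected in a vector so they can be inspected.

```cpp
#include <optional>
#include <vector>

// Generator g: yields next, next+1, ..., max-1, then an empty optional.
struct counter {
  int next = 0;
  int max = 0;
  std::optional<int> operator()() {
    if (next < max) return next++;
    return std::nullopt;  // end of stream
  }
};

// Evaluates t2(t1(g())) for every non-empty value produced by g.
template <typename Generator, typename T1, typename T2>
auto run_sequential(Generator g, T1 t1, T2 t2) {
  using result_type = decltype(t2(t1(*g())));
  std::vector<result_type> out;
  for (auto item = g(); item; item = g()) {
    out.push_back(t2(t1(*item)));
  }
  return out;
}
```

For example, run_sequential(counter{0, 4}, t1, t2) applies t2 after t1 to the values 0, 1, 2 and 3, and stops at the first empty value.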
Generators
A generator g is any callable C++ entity that:
Takes no arguments.
Returns a value of type T that may hold (or not) a value.
Null value signals end of stream.
The return value must be any type that:
Is copy-constructible or move-constructible.
T x = g();
Is contextually convertible to bool
if (x) { /* ... */ }
if (!x) { /* ... */ }
Can be dereferenced
auto val = *x;
The standard library offers an excellent candidate:
std::experimental::optional<T> (std::optional<T> since C++17).
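A minimal sketch of the three requirements, in standard C++17: a consumer that relies only on construction from g(), conversion to bool and dereferencing. The names drain, make_optional_gen and make_pointer_gen are hypothetical. The std::unique_ptr generator is included only to show that the requirements as listed are not tied to optional; whether GrPPI itself accepts such a type is not checked here.

```cpp
#include <memory>
#include <optional>
#include <type_traits>
#include <vector>

// Consumes any generator whose result is movable, testable as bool
// and dereferenceable.
template <typename Generator>
auto drain(Generator g) {
  using value_type = std::decay_t<decltype(*g())>;
  std::vector<value_type> values;
  for (auto x = g(); x; x = g()) {  // T x = g(); if (x) ...
    values.push_back(*x);           // auto val = *x;
  }
  return values;
}

// Generator returning std::optional<int>: empty optional ends the stream.
auto make_optional_gen(int n) {
  return [i = 0, n]() mutable -> std::optional<int> {
    if (i < n) return i++;
    return {};
  };
}

// Generator returning std::unique_ptr<int>: nullptr ends the stream.
auto make_pointer_gen(int n) {
  return [i = 0, n]() mutable -> std::unique_ptr<int> {
    if (i < n) return std::make_unique<int>(i++);
    return nullptr;
  };
}
```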
Simple pipeline
x -> x*x -> 1/x -> print
template <typename Execution>
void run_pipe(const Execution & ex, int n)
{
  grppi::pipeline(ex,
    [i=0,max=n] () mutable -> optional<int> {
      if (i<max) return i++;
      else return {};
    },
    [](int x) -> double { return x*x; },
    [](double x) { return 1/x; },
    [](double x) { cout << x << "\n"; }
  );
}
Nested pipelines
Pipelines may be nested.
An inner pipeline:
Does not take an execution policy.
All stages are transformers (no generator).
The last stage must also produce values.
The inner pipeline uses the same execution policy as
the outer pipeline.
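Seen from the outer pipeline, an inner pipeline behaves like a single transformer: it receives an item and its last stage returns a value. A minimal sketch of that idea in standard C++, using a hypothetical compose helper that is not part of GrPPI:

```cpp
#include <utility>

// Fuses two transformers into one stage that applies t2 after t1.
// There is no generator, and the last stage returns a value.
template <typename T1, typename T2>
auto compose(T1 t1, T2 t2) {
  return [t1 = std::move(t1), t2 = std::move(t2)](auto && x) {
    return t2(t1(std::forward<decltype(x)>(x)));
  };
}
```

Unlike this sequential fusion, a real inner pipeline may still run its stages concurrently under the outer execution policy.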
Image processing
void process(std::istream & in_file, std::ostream & out_file) {
  grppi::parallel_execution_native ex;
  grppi::pipeline(ex,
    [&in_file]() -> optional<frame> {
      frame f = read_frame(in_file);
      if (!in_file) return {};
      return f;
    },
    grppi::pipeline(
      [](const frame & f) { return filter(f); },
      [](const frame & f) { return gray_scale(f); }
    ),
    [&out_file](const frame & f) { write_frame(out_file, f); }
  );
}
Piecewise pipelines
Image processing
void process(std::istream & in_file, std::ostream & out_file) {
  auto reader = [&in_file]() -> optional<frame> {
    frame f = read_frame(in_file);
    if (!in_file) return {};
    return f;
  };
  auto transformer = grppi::pipeline(
    [](const frame & f) { return filter(f); },
    [](const frame & f) { return gray_scale(f); }
  );
  auto writer = [&out_file](const frame & f) { write_frame(out_file, f); };
  grppi::parallel_execution_native ex;
  grppi::pipeline(ex, reader, transformer, writer);
}
Farm pattern
A farm is a streaming pattern applicable to a stage in a
pipeline, providing multiple tasks to process data items
from a data stream.
A farm has an associated cardinality which is the number
of parallel tasks used to serve the stage.
Each task in a farm runs a transformer for each data item
it receives.
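The idea of cardinality can be sketched with standard C++ threads over a finite batch of items. This is an assumption-laden illustration, not how GrPPI implements a farm: farm_apply is a hypothetical name, the input is a vector rather than a stream, and the result type is assumed to be default-constructible and different from bool.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Applies `transformer` to every item using `cardinality` worker threads.
// Each worker repeatedly claims the next unprocessed index.
template <typename T, typename Transformer>
auto farm_apply(int cardinality, const std::vector<T> & items,
                Transformer transformer) {
  using result_type = decltype(transformer(items.front()));
  std::vector<result_type> results(items.size());
  std::atomic<std::size_t> next{0};
  std::vector<std::thread> workers;
  for (int w = 0; w < cardinality; ++w) {
    workers.emplace_back([&] {
      for (std::size_t i = next.fetch_add(1); i < items.size();
           i = next.fetch_add(1)) {
        // Every index is claimed by exactly one worker.
        results[i] = transformer(items[i]);
      }
    });
  }
  for (auto & t : workers) { t.join(); }
  return results;
}
```

Results are stored by index, so the output keeps the input order regardless of which worker handled each item.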
Farms in pipelines
Improving a video
template <typename Execution>
void run_pipe(const Execution & ex, std::ifstream & filein, std::ofstream & fileout)
{
  grppi::pipeline(ex,
    [&filein] () -> optional<frame> {
      frame f = read_frame(filein);
      if (!filein) return {};
      return f;
    },
    grppi::farm(4, [](const frame & f) { return improve(f); }),
    [&fileout] (const frame & f) { write_frame(fileout, f); }
  );
}
Piecewise farms
Improving a video
template <typename Execution>
void run_pipe(const Execution & ex, std::ifstream & filein, std::ofstream & fileout)
{
  auto improver = grppi::farm(4, [](const frame & f) { return improve(f); });
  grppi::pipeline(ex,
    [&filein] () -> optional<frame> {
      frame f = read_frame(filein);
      if (!filein) return {};
      return f;
    },
    improver,
    [&fileout] (const frame & f) { write_frame(fileout, f); }
  );
}
Ordering
Signals whether pipeline items must be consumed in the same
order in which they were produced.
Do they need to be time-stamped?
Default is ordered.
API
ex.enable_ordering()
ex.disable_ordering()
bool o = ex.is_ordered()
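One way to restore order when items are stamped with a sequence number is a reorder buffer placed in front of the consumer. The sketch below is standard C++ and makes no claim about GrPPI internals; reorder_buffer is a hypothetical name.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Restores production order for items tagged with a sequence number.
template <typename T>
class reorder_buffer {
public:
  // Accepts one (sequence number, item) pair and returns every item
  // that can now be released in order.
  std::vector<T> push(std::size_t seq, T item) {
    pending_.emplace(seq, std::move(item));
    std::vector<T> ready;
    for (auto it = pending_.find(next_); it != pending_.end();
         it = pending_.find(next_)) {
      ready.push_back(std::move(it->second));
      pending_.erase(it);
      ++next_;
    }
    return ready;
  }

private:
  std::size_t next_ = 0;              // sequence number expected next
  std::map<std::size_t, T> pending_;  // items that arrived too early
};
```

With ordering disabled, this buffering and the stamps themselves are unnecessary, which is the trade-off the question on the slide points at.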
Queueing properties
Some policies (native and omp) use queues to
communicate between pipeline stages.
Properties:
Queue size: Buffer size of the queue.
Mode: blocking versus lock-free.
API
ex.set_queue_attributes(100, mode::blocking)
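To make the two properties concrete, here is a minimal sketch of a bounded queue in blocking mode, written in standard C++. It is not the GrPPI queue; blocking_queue is a hypothetical name and the lock-free mode is not shown.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <utility>

// Bounded queue between two stages: capacity is the "queue size" property.
template <typename T>
class blocking_queue {
public:
  explicit blocking_queue(std::size_t capacity) : capacity_{capacity} {}

  // Producer side: blocks while the queue is full.
  void push(T item) {
    std::unique_lock<std::mutex> lock{mutex_};
    not_full_.wait(lock, [this] { return items_.size() < capacity_; });
    items_.push(std::move(item));
    not_empty_.notify_one();
  }

  // Consumer side: blocks while the queue is empty.
  T pop() {
    std::unique_lock<std::mutex> lock{mutex_};
    not_empty_.wait(lock, [this] { return !items_.empty(); });
    T item = std::move(items_.front());
    items_.pop();
    not_full_.notify_one();
    return item;
  }

private:
  std::size_t capacity_;
  std::mutex mutex_;
  std::condition_variable not_full_;
  std::condition_variable not_empty_;
  std::queue<T> items_;
};
```

A small capacity limits how far a fast stage can run ahead of a slow one; a large capacity trades memory for fewer stalls.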
Filter pattern
A filter pattern discards (or keeps) the data items from a
data stream based on the outcome of a predicate.
This pattern can be used only as a stage of a pipeline.
Alternatives:
Keep: Only data items satisfying the predicate are sent to
the next stage.
Discard: Only data items not satisfying the predicate are
sent to the next stage.
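The two alternatives differ only in the sense of the predicate. A sequential sketch in standard C++ over a vector instead of a stream; keep_items and discard_items are hypothetical names, not GrPPI functions.

```cpp
#include <vector>

// Keep: forwards only the items that satisfy the predicate.
template <typename T, typename Predicate>
std::vector<T> keep_items(const std::vector<T> & in, Predicate pred) {
  std::vector<T> out;
  for (const auto & x : in) {
    if (pred(x)) out.push_back(x);
  }
  return out;
}

// Discard: forwards only the items that do not satisfy the predicate.
template <typename T, typename Predicate>
std::vector<T> discard_items(const std::vector<T> & in, Predicate pred) {
  return keep_items(in, [&pred](const T & x) { return !pred(x); });
}
```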
Filtering in
Print primes
bool is_prime(int n);
template <typename Execution>
void print_primes(const Execution & ex, int n)
{
  grppi::pipeline(ex,
    [i=0,max=n]() mutable -> optional<int> {
      if (i<=max) return i++;
      else return {};
    },
    grppi::keep(is_prime),
    [](int x) { cout << x << "\n"; }
  );
}
Filtering out
Discard words
template <typename Execution>
void print_long_words(const Execution & ex, std::istream & is)
{
  grppi::pipeline(ex,
    [&is]() -> optional<string> {
      string word;
      is >> word;
      if (!is) { return {}; }
      else { return word; }
    },
    // Words shorter than 4 characters are dropped.
    grppi::discard([](std::string w) { return w.length() < 4; }),
    [](std::string w) { cout << w << "\n"; }
  );
}
Evaluation
1 Introduction
2 Simple use
3 Patterns in GrPPI
4 Evaluation
5 Conclusions
Example
Video frame processing for border detection with two
filters.
Gaussian blur.
Sobel.
Using pipeline pattern:
S1: Frame reading.
S2: Gaussian blur (may apply farm).
S3: Sobel filter (may apply farm).
S4: Frame writing.
Approaches:
Manual.
GrPPI.
Parallelization effort
% LOC increase, by pipeline composition
Pipeline composition   C++ Threads   OpenMP      Intel TBB   GrPPI
( p | p | p | p )      +8.8 %        +13.0 %     +25.9 %     +1.8 %
( p | f | p | p )      +59.4 %       +62.6 %     +25.9 %     +3.1 %
( p | p | f | p )      +60.0 %       +63.9 %     +25.9 %     +3.1 %
( p | f | f | p )      +106.9 %      +109.4 %    +25.9 %     +4.4 %
Performance: frames per second
[Figure: frames per second versus video resolution (480p, 720p, 1080p, 1440p, 2160p) in four panels, one per pipeline composition: (p|p|p|p), (p|f|p|p), (p|p|f|p), (p|f|f|p). Series: C++11, GrPPI C++11, OpenMP, GrPPI OpenMP, TBB, GrPPI TBB.]
Brain MRI (Magnetic Resonance Imaging)
Non-intrusive method for imaging
internal anatomy.
Huge amount of data generated.
Applied to neurosciences.
Bipolar disorder.
Paranoia.
Schizophrenia.
Identification of fibers and connectivity between areas in
the brain.
Fibers in the brain
MRI Evaluation
[Figure: execution time in seconds (0 to 1000) versus number of threads (1 to 19). Series: GRPPI+SEQ, GRPPI+TBB, GRPPI+OMP, GRPPI+THR, GRPPI+FastFlow.]
Conclusions
1 Introduction
2 Simple use
3 Patterns in GrPPI
4 Evaluation
5 Conclusions
Summary
A unified programming model for sequential and
parallel modes.
Multiple back-ends available: sequential, OpenMP, TBB,
native, FastFlow.
Current pattern set:
Data: map, reduce, map/reduce, stencil.
Task: divide/conquer.
Streaming: pipeline with nesting of farm, filter,
reduction, iteration.
Low programming effort.
Very low performance overhead.
Further reading
A Generic Parallel Pattern Interface for Stream and Data Processing. D. del
Rio, M. F. Dolz, J. Fernández, J. D. García. Concurrency and Computation:
Practice and Experience. 2017.
Exploring stream parallel patterns in distributed MPI environments. Javier
López-Gómez, Javier Fernández Muñoz, David del Rio Astorga, Manuel F. Dolz,
J. Daniel Garcia. Parallel Computing. 84:24–36, 2019.
Challenging the abstraction penalty in parallel patterns libraries. J. D.
Garcia, D. del Rio, M. Aldinucci, F. Tordini, M. Danelutto, G. Mencagli, M.
Torquati. Journal of Supercomputing, 21pp. 2019.
Paving the way towards high-level parallel pattern interfaces for data
stream processing. D. del Rio, M. F. Dolz, J. Fernandez, and J. D. Garcia.
Future Generation Computer Systems 87:228-241. 2018
Supporting Advanced Patterns in GrPPI: a Generic Parallel Pattern
Interface. D. del Rio, M. F. Dolz, J. Fernandez, and J. D. Garcia. Auto-DaSP 2017
(Euro-Par 2017).
Towards Automatic Parallelization of Stream Processing Applications. M. F.
Dolz, D. del Rio, J. Fernandez, J. D. Garcia, and J. Carretero, IEEE Access,
6:39944-39961. 2018.
Finding parallel patterns through static analysis in C++ applications. D. R.
del Astorga, M. F. Dolz, L. M. Sanchez, J. D. Garcia, M. Danelutto, and M.
Torquati, International Journal of High Performance Computing Applications,
2017.
GrPPI
https://github.com/arcosuc3m/grppi
Future work
Additional backends:
Accelerator integration, distributed computing, . . .
Next generation C++:
Concepts (reduce metaprogramming), modules.
Executors: parallel versus reactive models.
Ranges.
More patterns and fewer patterns:
Simplified interface.
Pattern catalog extension.
Minimal building blocks identification.
Pattern composition:
Pattern fusion.
Pattern transformation rules.
Multi-context approach for heterogeneous computing.
Better support of NUMA and affinity.
The future
grppi::omp_execution omp;
grppi::cuda_execution cuda;
std::vector<int> v = get();
auto comp = v |
  grppi::map([](auto x) { return x*x; }) |
  grppi::reduce([](auto x, auto y) { return x+y; });
comp >> grppi::on(omp);
auto pipe = generator |
  ( grppi::seq([](const frame & f) { return filter(f); }) |
    grppi::farm(4, [](const frame & f) { return to_gray(f); })
    >> on(cuda)
  ) |
  frame_writer >> on(omp);
Augmented Reality (AR) with Augin Software.pptx
 
Linux Systems Programming: Semaphores, Shared Memory, and Message Queues
Linux Systems Programming: Semaphores, Shared Memory, and Message QueuesLinux Systems Programming: Semaphores, Shared Memory, and Message Queues
Linux Systems Programming: Semaphores, Shared Memory, and Message Queues
 
Online crime reporting system project.pdf
Online crime reporting system project.pdfOnline crime reporting system project.pdf
Online crime reporting system project.pdf
 
handbook on reinforce concrete and detailing
handbook on reinforce concrete and detailinghandbook on reinforce concrete and detailing
handbook on reinforce concrete and detailing
 
Geometric constructions Engineering Drawing.pdf
Geometric constructions Engineering Drawing.pdfGeometric constructions Engineering Drawing.pdf
Geometric constructions Engineering Drawing.pdf
 
Autodesk Construction Cloud (Autodesk Build).pptx
Autodesk Construction Cloud (Autodesk Build).pptxAutodesk Construction Cloud (Autodesk Build).pptx
Autodesk Construction Cloud (Autodesk Build).pptx
 
ALCOHOL PRODUCTION- Beer Brewing Process.pdf
ALCOHOL PRODUCTION- Beer Brewing Process.pdfALCOHOL PRODUCTION- Beer Brewing Process.pdf
ALCOHOL PRODUCTION- Beer Brewing Process.pdf
 
Research Methodolgy & Intellectual Property Rights Series 2
Research Methodolgy & Intellectual Property Rights Series 2Research Methodolgy & Intellectual Property Rights Series 2
Research Methodolgy & Intellectual Property Rights Series 2
 
Dynamo Scripts for Task IDs and Space Naming.pptx
Dynamo Scripts for Task IDs and Space Naming.pptxDynamo Scripts for Task IDs and Space Naming.pptx
Dynamo Scripts for Task IDs and Space Naming.pptx
 
"United Nations Park" Site Visit Report.
"United Nations Park" Site  Visit Report."United Nations Park" Site  Visit Report.
"United Nations Park" Site Visit Report.
 
Introduction to Arduino Programming: Features of Arduino
Introduction to Arduino Programming: Features of ArduinoIntroduction to Arduino Programming: Features of Arduino
Introduction to Arduino Programming: Features of Arduino
 

Simplifying parallelism with Generic Parallel Patterns

  • 1. An introduction to GrPPI: Generic Reusable Parallel Patterns Interface
    J. Daniel Garcia, ARCOS Group, University Carlos III of Madrid, Spain. November 2019.
    CC BY-NC-ND – J. Daniel Garcia – ARCOS@UC3M (josedaniel.garcia@uc3m.es) – Twitter: @jdgarciauc3m
  • 2. Warning
    This work is under the Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license.
    You are free to Share: copy and redistribute the material in any medium or format.
    Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
    NonCommercial: You may not use the material for commercial purposes.
    NoDerivatives: If you remix, transform, or build upon the material, you may not distribute the modified material.
  • 3. ARCOS@uc3m
    UC3M: A young, international, research-oriented university.
    ARCOS: An applied research group.
    Lines: High Performance Computing, Big Data, Cyber-physical Systems, and Programming Models for Application Improvement.
    Programming Models for Application Improvement: provide programming tools for improving performance, energy efficiency, maintainability, and correctness.
    Standardization: ISO/IEC JTC1/SC22/WG21, the ISO C++ standards committee.
  • 4. Acknowledgements
    Project 801091 "ASPIDE: Exascale programming models for extreme data processing", funded by the European Commission through H2020 FET-HPC (2018-2020).
    Project ICT 644235 "REPHRASE: REfactoring Parallel Heterogeneous Resource-aware Applications", funded by the European Commission through the H2020 program (2015-2018).
    Project TIN2016-79673-P "Towards Unification of HPC and Big Data Paradigms", funded by the Spanish Ministry of Economy and Competitiveness (2016-2019).
  • 5. GrPPI team
    Main team: J. Daniel Garcia (UC3M, lead), Javier Garcia Blas (UC3M), David del Río (UC3M), Javier Fernández (UC3M).
    Cooperation: Marco Danelutto (Univ. Pisa), Massimo Torquati (Univ. Pisa), Marco Aldinucci (Univ. Torino), Fabio Tordini (Univ. Torino), Plácido Fernández (UC3M-CERN), Manuel F. Dolz (Univ. Jaume I), ...
  • 6. Outline: 1 Introduction. 2 Simple use. 3 Patterns in GrPPI. 4 Evaluation. 5 Conclusions.
    Section 1: Introduction
  • 9. Sequential Programming versus Parallel Programming
    Sequential programming: a well-known set of control structures embedded in programming languages. Control structures are inherently sequential.
    Traditional parallel programming: constructs adapting sequential control structures to the parallel world (e.g. parallel-for).
    But wait! What if we had constructs that could be both sequential and parallel?
  • 15. Software design
    "There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult." (C.A.R. Hoare)
  • 17. Adding two vectors: traditional way
      using numvec = std::vector<double>;
      numvec add(const numvec & v1, const numvec & v2) {
        numvec res;
        res.reserve(v1.size()); // Assume equal sizes
        for (std::size_t i = 0; i < v1.size(); ++i) {
          res.push_back(v1[i] + v2[i]);
        }
        return res;
      }
    Adds additional constraints: traversing in order.
    Potential mistakes: i<v1.size() versus i<=v1.size().
  • 19. Adding two vectors: the STL way (C++98/03)
      using numvec = std::vector<double>;
      numvec add(const numvec & v1, const numvec & v2) {
        numvec res;
        res.reserve(v1.size()); // Assume equal sizes
        std::transform(v1.begin(), v1.end(), v2.begin(),
                       std::back_inserter(res), std::plus<>{});
        return res;
      }
    Minimizes off-by-one mistakes.
    Allows type-specific optimizations.
  • 21. Adding two vectors: the STL way (C++14)
      using numvec = std::vector<double>;
      numvec add(const numvec & v1, const numvec & v2) {
        numvec res;
        res.reserve(v1.size()); // Assume equal sizes
        std::transform(v1.begin(), v1.end(), v2.begin(),
                       std::back_inserter(res),
                       [](auto x, auto y) { return x + y; });
        return res;
      }
    Use (possibly generic) lambdas.
  • 24. A brief history of patterns
    From building and architecture (Christopher Alexander):
      1977: A Pattern Language: Towns, Buildings, Construction.
      1979: The Timeless Way of Building.
    To software design (Gamma et al.):
      1993: Design Patterns: Abstraction and Reuse of Object-Oriented Design. ECOOP.
      1995: Design Patterns: Elements of Reusable Object-Oriented Software.
    To parallel programming (McCool, Reinders, Robison):
      2012: Structured Parallel Programming: Patterns for Efficient Computation.
  • 27. Parallel Patterns
    Levels of patterns: Software Architecture Patterns. Design Patterns. Algorithm Strategy Patterns. Implementation Patterns.
    Algorithm Strategy Pattern or Algorithmic Skeleton: a way to codify best practices in parallel programming in a reusable way.
    A single Parallel Pattern may have different realizations in different programming models.
  • 30. Example: map pattern
    Apply a single elemental operation to different data items.
    SAXPY, sequential:
      for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
      }
    SAXPY, OpenMP:
      #pragma omp parallel for
      for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
      }
    SAXPY, TBB:
      tbb::parallel_for(tbb::blocked_range<std::size_t>(0, n),
        [&](const tbb::blocked_range<std::size_t> & r) {
          for (auto i = r.begin(); i != r.end(); ++i) {
            y[i] = a * x[i] + y[i];
          }
        });
  • 35. Ideals for Parallel Patterns
    Applications should be expressed independently of the execution model.
    Computations should be expressed in terms of structured composable patterns.
    Multiple back-ends should be offered with simple switching mechanisms.
    The interface should integrate seamlessly with modern C++ and its standard library.
    Applications should be able to take advantage of modern C++ language features.
  • 36. C++17: Execution policies
    An execution policy disambiguates the overloads of parallel algorithms.
    sequenced_policy (std::execution::seq): not parallelized.
    parallel_policy (std::execution::par): execution can be parallelized across multiple threads.
    parallel_unsequenced_policy (std::execution::par_unseq): execution can be parallelized, vectorized, or migrated across threads.
  • 39. C++17: Parallel algorithms
    Most algorithms in <algorithm> and <numeric> have execution-policy-based versions.
    SAXPY, C++17:
      std::transform(std::execution::par,
                     x.begin(), x.end(), y.begin(), y.begin(),
                     [=](auto vx, auto vy) { return a * vx + vy; });
    However:
      No fine control knobs.
      Lack of task parallelism and stream parallelism.
      OK as an entry level to parallelism.
  • 41. GrPPI
    https://github.com/arcosuc3m/grppi
    Generic Reusable Parallel Pattern Interface.
    A header-only library (might change).
    A set of execution policies.
    A set of type-safe generic algorithms.
    Requires C++14.
    Apache 2.0 License.
  • 42. Section 2: Simple use
  • 43. Example: Transforming a sequence
    Given a sequence of frames, generate a new sequence of frames in grayscale.
      struct frame { /* ... */ };
      frame togray(const frame & f);
    (Figure: each input frame[i], for i = 0..4, is passed through togray() to produce the corresponding output frame[i].)
  • 44. Transforming a sequence: traditional explicit loop
      using frameseq = std::vector<frame>;
      frameseq seq_togray(const frameseq & s) {
        frameseq r;
        r.reserve(s.size());
        // Requires processing in-order
        for (const auto & f : s) {
          r.push_back(togray(f));
        }
        return r;
      }
  • 45. Transforming a sequence: STL way
      using frameseq = std::vector<frame>;
      frameseq seq_togray(const frameseq & s) {
        frameseq r;
        r.reserve(s.size());
        std::transform(s.begin(), s.end(), std::back_inserter(r), togray);
        return r;
      }
  • 46. Transforming a sequence: Parallel STL way (C++17)
      using frameseq = std::vector<frame>;
      frameseq seq_togray(const frameseq & s) {
        frameseq r(s.size());
        // No execution order assumed
        std::transform(std::execution::par, s.begin(), s.end(), r.begin(), togray);
        return r;
      }
  • 47. Transforming a sequence: GrPPI
      using frameseq = std::vector<frame>;
      frameseq seq_togray(const frameseq & s) {
        frameseq r(s.size());
        grppi::sequential_execution seq;
        grppi::map(seq, s.begin(), s.end(), r.begin(), togray);
        return r;
      }
  • 48. Transforming a sequence: GrPPI + lambda
      using frameseq = std::vector<frame>;
      frameseq seq_togray(const frameseq & s) {
        frameseq r(s.size());
        grppi::sequential_execution seq;
        grppi::map(seq, s.begin(), s.end(), r.begin(),
          [](const frame & f) { return filter(f, 64); });
        return r;
      }
  • 49. Transforming a sequence: GrPPI + lambda, without iterators (range interface)
      using frameseq = std::vector<frame>;
      frameseq seq_togray(const frameseq & s) {
        frameseq r(s.size());
        grppi::sequential_execution seq;
        grppi::map(seq, s, r,
          [](const frame & f) { return filter(f, 64); });
        return r;
      }
  • 50. Section 3: Patterns in GrPPI
  • 51. Patterns in GrPPI: Controlling execution
    Subsections: Controlling execution. Patterns overview. Data patterns. Streaming patterns.
  • 52. Execution types
    The execution model is encapsulated in execution types.
    They are always provided as the first argument to patterns.
    Current concrete execution types:
      Sequential: sequential_execution.
      ISO C++ Threads: parallel_execution_native.
      OpenMP: parallel_execution_omp.
      Intel TBB: parallel_execution_tbb.
      FastFlow: parallel_execution_ff.
    Run-time polymorphic wrapper through type erasure: dynamic_execution.
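The idea of passing the execution model as the first argument can be sketched in standard C++ alone. The following is a minimal illustration, not GrPPI's implementation: `seq_exec`, `thread_exec`, `apply_map`, and `double_all` are hypothetical names invented here, and the threaded version uses a naive one-thread-per-block split.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical execution types (illustrative only, not GrPPI classes).
struct seq_exec {};
struct thread_exec { unsigned workers = 2; };

// Sequential realization of a map: one in-order pass.
template <typename In, typename Out, typename F>
void apply_map(const seq_exec &, const In & in, Out & out, F f) {
  std::transform(in.begin(), in.end(), out.begin(), f);
}

// Threaded realization: each thread handles one contiguous block.
template <typename In, typename Out, typename F>
void apply_map(const thread_exec & ex, const In & in, Out & out, F f) {
  const std::size_t n = in.size();
  const std::size_t workers = std::max(1u, ex.workers);
  const std::size_t chunk = (n + workers - 1) / workers;
  std::vector<std::thread> pool;
  for (std::size_t b = 0; b < n; b += chunk) {
    const std::size_t e = std::min(n, b + chunk);
    pool.emplace_back([&in, &out, f, b, e] {
      // Blocks are disjoint, so threads never write the same element.
      for (std::size_t i = b; i < e; ++i) out[i] = f(in[i]);
    });
  }
  for (auto & t : pool) t.join();
}

// User code is written once and works with either execution type.
template <typename Execution>
std::vector<double> double_all(const Execution & ex,
                               const std::vector<double> & v) {
  std::vector<double> res(v.size());
  apply_map(ex, v, res, [](double x) { return 2 * x; });
  return res;
}
```

Calling `double_all(seq_exec{}, v)` and `double_all(thread_exec{3}, v)` should give the same result. Only the first argument changes, which is the switching mechanism the slide describes.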
  • 55. Execution model properties
    Some execution types allow finer configuration.
    Example: concurrency degree.
    Interface:
      ex.set_concurrency_degree(4);
      int n = ex.concurrency_degree();
    Default values:
      Sequential ⇒ 1.
      Native ⇒ std::thread::hardware_concurrency().
      OpenMP ⇒ omp_get_num_threads().
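As a rough sketch of what such a property might look like on a thread-based execution type, the class below mirrors the two member functions named on the slide. The class name `native_exec_sketch` is hypothetical, and both the fallback to 1 when the hardware reports 0 and the clamping of non-positive values are assumptions made here for illustration, not documented GrPPI behaviour.

```cpp
#include <thread>

// Hypothetical execution type, named only for this illustration.
class native_exec_sketch {
public:
  // Default degree: what the hardware reports, or 1 if it reports 0 (unknown).
  native_exec_sketch() noexcept
    : degree_{fallback(std::thread::hardware_concurrency())} {}

  // Assumption: non-positive requests are clamped to 1.
  void set_concurrency_degree(int n) noexcept { degree_ = n > 0 ? n : 1; }

  int concurrency_degree() const noexcept { return degree_; }

private:
  static int fallback(unsigned hw) noexcept {
    return hw == 0 ? 1 : static_cast<int>(hw);
  }
  int degree_;
};
```

Keeping the degree inside the execution object means a pattern call does not need an extra parameter. The same call site can run with 1 or 16 workers depending only on how `ex` was configured.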
  • 56. Patterns in GrPPI: Patterns overview
  • 59. A classification
    Data patterns: express computations over a data set.
      map, reduce, map/reduce, stencil.
    Task patterns: express task composition.
      divide/conquer.
    Streaming patterns: express computations over a (possibly unbounded) data set.
      pipeline.
      Specialized stages: farm, filter, reduction, iteration.
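To make the difference between data patterns and streaming patterns concrete, here is a minimal sequential sketch of a pipeline over a stream whose length is not known in advance. It is illustrative only: `run_pipeline` and `squares_up_to` are hypothetical names, the code needs C++17 for `std::optional`, and the convention that an empty optional ends the stream is an assumption of this sketch.

```cpp
#include <optional>
#include <vector>

// Hypothetical helper: generator -> transform stage -> sink, one item at a time.
template <typename Gen, typename Stage, typename Sink>
void run_pipeline(Gen gen, Stage stage, Sink sink) {
  // The generator signals end of stream by returning an empty optional.
  while (auto item = gen()) {
    sink(stage(*item));
  }
}

// Example use: stream the integers 1..n, square each, collect the results.
std::vector<int> squares_up_to(int n) {
  std::vector<int> out;
  int next = 1;
  run_pipeline(
    [&]() -> std::optional<int> {
      if (next > n) return std::nullopt;
      return next++;
    },
    [](int x) { return x * x; },
    [&](int y) { out.push_back(y); });
  return out;
}
```

Unlike a map, nothing here requires the whole input to be in memory. Items are produced, transformed, and consumed one by one, which is what leaves room for a parallel back-end to overlap the stages.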
  • 60. Patterns in GrPPI: Data patterns
  • 61. Patterns on data sets
    A data pattern performs an operation on one or more data sets that are already in memory.
    Input: one or more data sets, plus operations.
    Output: a data set (map, stencil) or a single value (reduce, map/reduce).
  • 64. Maps on data sequences
    A map pattern applies an operation to every element in a data set, generating a new data set.
    (Figures: unidimensional map; multidimensional map.)
  • 65. Single sequence mapping: double all elements in a vector sequentially
      std::vector<double> double_elements(const std::vector<double> & v) {
        std::vector<double> res(v.size());
        grppi::sequential_execution seq;
        grppi::map(seq, v, res, [](double x) { return 2 * x; });
        return res;
      }
  • 66. Single sequence mapping: double all elements in a vector with OpenMP
      std::vector<double> double_elements(const std::vector<double> & v) {
        std::vector<double> res(v.size());
        grppi::parallel_execution_omp omp;
        grppi::map(omp, v, res, [](double x) { return 2 * x; });
        return res;
      }
  • 67. Multiple sequences mapping: add two vectors
      template <typename Execution>
      std::vector<double> add_vectors(const Execution & ex,
                                      const std::vector<double> & v1,
                                      const std::vector<double> & v2) {
        auto size = std::min(v1.size(), v2.size());
        std::vector<double> res(size);
        grppi::map(ex, zip(v1, v2), res,
          [](double x, double y) { return x + y; });
        return res;
      }
  • 68. Multiple sequences mapping: add three vectors
      template <typename Execution>
      std::vector<double> add_vectors(const Execution & ex,
                                      const std::vector<double> & v1,
                                      const std::vector<double> & v2,
                                      const std::vector<double> & v3) {
        auto size = std::min({v1.size(), v2.size(), v3.size()});
        std::vector<double> res(size);
        grppi::map(ex, zip(v1, v2, v3), res,
          [](double x, double y, double z) { return x + y + z; });
        return res;
      }
  • 69. Heterogeneous mapping
    The result can be of a different type.
    Complex vector from real and imaginary vectors:
      template <typename Execution>
      std::vector<std::complex<double>> create_cplx(const Execution & ex,
                                                    const std::vector<double> & re,
                                                    const std::vector<double> & im) {
        auto size = std::min(re.size(), im.size());
        std::vector<std::complex<double>> res(size);
        grppi::map(ex, zip(re, im), res,
          [](double r, double i) -> std::complex<double> { return {r, i}; });
        return res;
      }
  • 70. Reductions on data sequences
    A reduce pattern combines all values in a data set using a binary combination operation.
    (Figure: the elements are combined pairwise, starting from an identity value "id", into a single "result".)
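The role of the identity value and of the binary operation can be seen in a block-wise reduction. The sketch below is sequential and uses only the standard library; `chunked_reduce` is a hypothetical helper written for this illustration, not a GrPPI function. It shows the decomposition a parallel back-end could use: reduce each block independently, then combine the partial results.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: reduce `v` in `chunks` independent blocks, then combine.
template <typename T, typename Op>
T chunked_reduce(const std::vector<T> & v, std::size_t chunks,
                 T identity, Op op) {
  chunks = std::max<std::size_t>(1, chunks);
  const std::size_t step = (v.size() + chunks - 1) / chunks;
  std::vector<T> partial;
  for (std::size_t b = 0; b < v.size(); b += step) {
    const std::size_t e = std::min(v.size(), b + step);
    const auto first = v.begin() + static_cast<std::ptrdiff_t>(b);
    const auto last = v.begin() + static_cast<std::ptrdiff_t>(e);
    // Every block starts from the identity, so it does not bias the result.
    partial.push_back(std::accumulate(first, last, identity, op));
  }
  // Combining partial results this way is only valid if op is associative.
  return std::accumulate(partial.begin(), partial.end(), identity, op);
}
```

With an associative operation such as integer addition and identity 0, the result does not depend on the number of blocks. An operation such as subtraction would break that property, which is why a reduce pattern constrains the combiner.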
  • 71. Data patterns: Homogeneous reduction. Add a sequence of values:
      template <typename Execution>
      double add_sequence(const Execution & ex, const std::vector<double> & v)
      {
        return grppi::reduce(ex, v, 0.0,
          [](double x, double y) { return x+y; });
      }
  • 72. Data patterns: Heterogeneous reduction. Add areas of shapes:
      template <typename Execution>
      double add_areas(const Execution & ex, const std::vector<shape> & shapes)
      {
        return grppi::reduce(ex, shapes, 0.0,
          [](double a, const auto & s) { return a + s.area(); });
      }
    Note: Better expressed as a map/reduce.
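The heterogeneous reduction can be modelled sequentially with std::accumulate. This is a sketch in plain C++, not GrPPI code; the shape struct and the name add_areas_seq are hypothetical stand-ins for the types on the slide.

```cpp
#include <numeric>
#include <vector>

// Hypothetical stand-in for the slide's shape type.
struct shape {
  double width;
  double height;
  double area() const { return width * height; }
};

// Sequential reference model (not GrPPI): the accumulator is a double,
// while the elements are shapes, so the operation has signature
// (double, shape) -> double.
double add_areas_seq(const std::vector<shape>& shapes) {
  return std::accumulate(shapes.begin(), shapes.end(), 0.0,
                         [](double a, const shape& s) { return a + s.area(); });
}
```

The mixed signature is the weak point: two partial sums are both doubles, and the operation above cannot combine them. That is one reason to prefer a map/reduce, where the reduction is a plain (double, double) -> double.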
  • 73. Data patterns: Map/reduce pattern. A map/reduce pattern combines a map pattern and a reduce pattern into a single pattern: (1) one or more data sets are mapped by applying a transformation operation; (2) the results are combined by a reduction operation. A map/reduce could also be expressed as the composition of a map and a reduce. However, map/reduce may fuse both stages, allowing for extra optimizations.
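The two steps above have a direct sequential counterpart in the C++17 standard library, std::transform_reduce, which also fuses the map and the reduce into one pass. The sketch below is plain C++17, not GrPPI code; the function names are hypothetical, and the two-sequence version assumes both vectors have the same length.

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Single-sequence map/reduce: map x -> x*x, then reduce with +.
double sum_squares_seq(const std::vector<double>& v) {
  return std::transform_reduce(v.begin(), v.end(), 0.0,
                               std::plus<>{},                    // reduction
                               [](double x) { return x * x; });  // transformation
}

// Two-sequence map/reduce: map (x, y) -> x*y, then reduce with +.
// Assumes v1.size() == v2.size(); the default operations of this overload
// are std::plus (reduction) and std::multiplies (transformation).
double scalar_product_seq(const std::vector<double>& v1,
                          const std::vector<double>& v2) {
  return std::transform_reduce(v1.begin(), v1.end(), v2.begin(), 0.0);
}
```

Because no intermediate vector of mapped values is materialized, this illustrates the kind of fusion a dedicated map/reduce pattern makes possible.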
  • 74. Data patterns: Map/reduce. [Diagram; labels: id, result.]
  • 75. Data patterns: Single sequence map/reduce. Sum of squares:
      template <typename Execution>
      double sum_squares(const Execution & ex, const std::vector<double> & v)
      {
        return grppi::map_reduce(ex, v, 0.0,
          [](double x) { return x*x; },
          [](double x, double y) { return x+y; }
        );
      }
  • 76. Data patterns: Map/reduce on two data sets. Scalar product:
      template <typename Execution>
      double scalar_product(const Execution & ex,
                            const std::vector<double> & v1,
                            const std::vector<double> & v2)
      {
        return grppi::map_reduce(ex, zip(v1,v2), 0.0,
          [](double x, double y) { return x*y; },
          [](double x, double y) { return x+y; });
      }
  • 77. Data patterns: Heterogeneous reduction versus map/reduce. Add areas of shapes:
      template <typename Execution>
      double add_areas(const Execution & ex, const std::vector<shape> & shapes)
      {
        return grppi::map_reduce(ex, shapes, 0.0,
          [](const auto & s) { return s.area(); },
          [](double a, double b) { return a+b; }
        );
      }
    Note: Better than the previous heterogeneous reduce.
  • 83. Data patterns: Canonical map/reduce. Given a sequence of words, produce a container where the key is the word and the value is the number of occurrences of that word. Word frequencies:
      template <typename Execution>
      auto word_freq(const Execution & ex, const std::vector<std::string> & words)
      {
        using namespace std;
        using dictionary = std::map<string,int>;
        return grppi::map_reduce(ex, words, dictionary{},
          // Map: each word becomes a one-entry dictionary.
          [](string w) -> dictionary { return {{w,1}}; },
          // Reduce: merge the counts of rhs into lhs.
          [](dictionary & lhs, const dictionary & rhs) -> dictionary {
            for (auto & entry : rhs) {
              lhs[entry.first] += entry.second;
            }
            return lhs;
          });
      }
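The same word count can be written as a sequential model in plain C++ to make the two roles explicit. This is a sketch, not GrPPI code; to_entry, merge and word_freq_seq are hypothetical names, and merge takes its left operand by value so it can also be applied to temporaries.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

using dictionary = std::map<std::string, int>;

// Map step: one word becomes a dictionary with a single entry.
dictionary to_entry(const std::string& w) { return {{w, 1}}; }

// Reduce step: adds the counts of rhs into lhs and returns the merged result.
// The operation is associative, so partial dictionaries can be merged in any grouping.
dictionary merge(dictionary lhs, const dictionary& rhs) {
  for (const auto& entry : rhs) {
    lhs[entry.first] += entry.second;
  }
  return lhs;
}

// Sequential reference model of the whole map/reduce.
dictionary word_freq_seq(const std::vector<std::string>& words) {
  dictionary result;  // identity value: the empty dictionary
  for (const auto& w : words) {
    result = merge(std::move(result), to_entry(w));
  }
  return result;
}
```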
  • 84. Outline of section 3, Patterns in GrPPI: Controlling execution; Patterns overview; Data patterns; Streaming patterns (next).
  • 85. Streaming patterns: Pipeline pattern. A pipeline pattern allows processing a data stream where the computation may be divided into multiple stages. Each stage processes the data item generated by the previous stage and passes the produced result to the next stage.
  • 87. Streaming patterns: Standalone pipeline. A standalone pipeline is a top-level pipeline. Invoking the pipeline translates into its execution. Given a generator $g : \emptyset \to T_1 \cup \emptyset$ and a sequence of transformers $t_i : T_i \to T_{i+1}$, for every non-empty value generated by $g$ it evaluates $t_n(t_{n-1}(\ldots t_1(g())))$.
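The definition above can be read as the following sequential loop, shown here for a generator and two transformers. It is a sketch in plain C++17, not GrPPI code: run_pipeline_seq is a hypothetical name, and the results are collected into a vector only so that they can be inspected, whereas a real pipeline would normally end in a consumer stage.

```cpp
#include <optional>
#include <type_traits>
#include <vector>

// Sequential model of a pipeline g -> t1 -> t2: for every non-empty value
// produced by the generator g, evaluate t2(t1(value)).
template <typename Generator, typename T1, typename T2>
auto run_pipeline_seq(Generator g, T1 t1, T2 t2) {
  using result_type = std::decay_t<decltype(t2(t1(*g())))>;
  std::vector<result_type> results;
  // An empty value from the generator signals the end of the stream.
  while (auto item = g()) {
    results.push_back(t2(t1(*item)));
  }
  return results;
}
```

In a parallel execution the stages run concurrently on different items, but the value computed for each item is the same as in this loop.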
  • 92. Streaming patterns: Generators. A generator g is any callable C++ entity that takes no argument and returns a value of type T that may or may not hold a value; an empty (null) value signals the end of the stream. The return type must be copy-constructible or move-constructible (T x = g();), contextually convertible to bool (if (x) { /* ... */ } and if (!x) { /* ... */ }), and dereferenceable (auto val = *x;). The standard library offers an excellent candidate: std::experimental::optional<T>.
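A generator meeting these requirements can be sketched with std::optional, which was standardized in C++17 and satisfies all three conditions (constructible from the call result, contextually convertible to bool, dereferenceable). The sketch is plain C++17, not GrPPI code; make_counter and drain_sum are hypothetical names.

```cpp
#include <optional>

// Returns a generator that produces 0, 1, ..., max-1 and then an empty
// optional, which signals the end of the stream.
inline auto make_counter(int max) {
  return [i = 0, max]() mutable -> std::optional<int> {
    if (i < max) return i++;
    return std::nullopt;
  };
}

// Consumes a generator using only the three required operations:
// initialization from g(), conversion to bool, and dereferencing.
template <typename Generator>
int drain_sum(Generator g) {
  int total = 0;
  while (auto x = g()) {  // stops at the first empty value
    total += *x;
  }
  return total;
}
```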
  • 93. Streaming patterns: Simple pipeline. x -> x*x -> 1/x -> print:
      template <typename Execution>
      void run_pipe(const Execution & ex, int n)
      {
        grppi::pipeline(ex,
          [i=0,max=n] () mutable -> optional<int> {
            if (i<max) return i++;
            else return {};
          },
          [](int x) -> double { return x*x; },
          [](double x) { return 1/x; },
          [](double x) { cout << x << "\n"; }
        );
      }
  • 94. Streaming patterns: Nested pipelines. Pipelines may be nested. An inner pipeline does not take an execution policy; all its stages are transformers (no generator); and its last stage must also produce values. The inner pipeline uses the same execution policy as the outer pipeline.
  • 99. Streaming patterns: Nested pipelines. Image processing:
      void process(std::istream & in_file, std::ostream & out_file)
      {
        grppi::parallel_execution_native ex;
        grppi::pipeline(ex,
          [&in_file]() -> optional<frame> {
            frame f = read_frame(in_file);
            if (!in_file) return {};
            return f;
          },
          pipeline(
            [](const frame & f) { return filter(f); },
            [](const frame & f) { return gray_scale(f); }
          ),
          [&out_file](const frame & f) { write_frame(out_file, f); }
        );
      }
  • 104. Streaming patterns: Piecewise pipelines. Image processing:
      void process(std::istream & in_file, std::ostream & out_file)
      {
        auto reader = [&in_file]() -> optional<frame> {
          frame f = read_frame(in_file);
          if (!in_file) return {};
          return f;
        };
        auto transformer = pipeline(
          [](const frame & f) { return filter(f); },
          [](const frame & f) { return gray_scale(f); }
        );
        auto writer = [&out_file](const frame & f) { write_frame(out_file, f); };
        grppi::parallel_execution_native ex;
        grppi::pipeline(ex, reader, transformer, writer);
      }
  • 105. Streaming patterns: Farm pattern. A farm is a streaming pattern applicable to a stage in a pipeline, providing multiple tasks to process data items from a data stream. A farm has an associated cardinality, which is the number of parallel tasks used to serve the stage. Each task in a farm runs a transformer for each data item it receives.
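The idea of a farm with a given cardinality can be sketched with standard threads. This is not how GrPPI implements farms; it is a simplified model in plain C++17 with hypothetical names (farm_sketch), working on an in-memory vector instead of a stream. It assumes the transformer is safe to call concurrently and that its result type is default-constructible and not bool.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Applies `transform` to every item using `cardinality` worker threads.
// Each worker repeatedly claims the next unprocessed index, so items are
// distributed dynamically. Results are stored at the index of their input,
// which keeps the output in input order.
template <typename In, typename Transformer>
auto farm_sketch(std::size_t cardinality, const std::vector<In>& items,
                 Transformer transform) {
  using Out = decltype(transform(items.front()));
  if (cardinality == 0) cardinality = 1;  // at least one task serves the stage
  std::vector<Out> results(items.size());
  std::atomic<std::size_t> next{0};

  auto worker = [&] {
    for (std::size_t i = next.fetch_add(1); i < items.size();
         i = next.fetch_add(1)) {
      results[i] = transform(items[i]);  // distinct index per claim: no data race
    }
  };

  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < cardinality; ++t) workers.emplace_back(worker);
  for (auto& w : workers) w.join();
  return results;
}
```

For example, farm_sketch(4, frames, improve) would serve the stage with four tasks, the same cardinality used on the following slides.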
  • 106. Streaming patterns: Farms in pipelines. Improving a video:
      template <typename Execution>
      void run_pipe(const Execution & ex, std::ifstream & filein, std::ofstream & fileout)
      {
        grppi::pipeline(ex,
          [&filein] () -> optional<frame> {
            frame f = read_frame(filein);
            if (!filein) return {};
            return f;
          },
          farm(4, [](const frame & f) { return improve(f); }),
          [&fileout] (const frame & f) { write_frame(fileout, f); }
        );
      }
  • 107. Streaming patterns: Piecewise farms. Improving a video:
      template <typename Execution>
      void run_pipe(const Execution & ex, std::ifstream & filein, std::ofstream & fileout)
      {
        auto improver = farm(4, [](const frame & f) { return improve(f); });
        grppi::pipeline(ex,
          [&filein] () -> optional<frame> {
            frame f = read_frame(filein);
            if (!filein) return {};
            return f;
          },
          improver,
          [&fileout] (const frame & f) { write_frame(fileout, f); }
        );
      }
  • 108. Streaming patterns: Ordering. Signals whether pipeline items must be consumed in the same order they were produced. Do they need to be time-stamped? Default is ordered. API:
      ex.enable_ordering()
      ex.disable_ordering()
      bool o = ex.is_ordered()
  • 109. Streaming patterns: Queueing properties. Some policies (native and omp) use queues to communicate between pipeline stages. Properties: queue size (buffer size of the queue) and mode (blocking versus lock-free). API:
      ex.set_queue_attributes(100, mode::blocking)
  • 111. Streaming patterns: Filter pattern. A filter pattern discards (or keeps) the data items from a data stream based on the outcome of a predicate. This pattern can be used only as a stage of a pipeline. Alternatives: Keep (only data items satisfying the predicate are sent to the next stage) and Discard (only data items not satisfying the predicate are sent to the next stage).
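The two alternatives differ only in the sense of the predicate, which the following sequential model shows on an in-memory sequence. It is a sketch in plain C++, not GrPPI code; keep_seq and discard_seq are hypothetical names built on std::copy_if and std::remove_copy_if.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Keep: forwards only the items that satisfy the predicate.
template <typename T, typename Predicate>
std::vector<T> keep_seq(const std::vector<T>& in, Predicate pred) {
  std::vector<T> out;
  std::copy_if(in.begin(), in.end(), std::back_inserter(out), pred);
  return out;
}

// Discard: forwards only the items that do NOT satisfy the predicate.
template <typename T, typename Predicate>
std::vector<T> discard_seq(const std::vector<T>& in, Predicate pred) {
  std::vector<T> out;
  std::remove_copy_if(in.begin(), in.end(), std::back_inserter(out), pred);
  return out;
}
```

With the same predicate, the two outputs partition the input: every item goes to exactly one of them.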
  • 112. Streaming patterns: Filtering in. Print primes:
      bool is_prime(int n);

      template <typename Execution>
      void print_primes(const Execution & ex, int n)
      {
        grppi::pipeline(ex,
          [i=0,max=n]() mutable -> optional<int> {
            if (i<=max) return i++;
            else return {};
          },
          grppi::keep(is_prime),
          [](int x) { cout << x << "\n"; }
        );
      }
  • 113. Streaming patterns: Filtering out. Discard words:
      template <typename Execution>
      void discard_words(const Execution & ex, std::istream & is)
      {
        grppi::pipeline(ex,
          [&is]() -> optional<string> {
            string word;
            is >> word;
            if (!is) { return {}; }
            else { return word; }
          },
          // Words shorter than 4 characters are dropped.
          grppi::discard([](std::string w) { return w.length() < 4; }),
          [](std::string w) { cout << w << "\n"; }
        );
      }
  • 114. Outline: 1 Introduction; 2 Simple use; 3 Patterns in GrPPI; 4 Evaluation (next); 5 Conclusions.
  • 117. Evaluation: Example. Video frame processing for border detection with two filters: Gaussian blur and Sobel. Using the pipeline pattern: S1, frame reading; S2, Gaussian blur (may apply farm); S3, Sobel filter (may apply farm); S4, frame writing. Approaches compared: manual and GrPPI.
  • 118. Evaluation: Parallelization effort. Percentage increase in lines of code (LOC) for each pipeline composition:
      Pipeline composition   C++ Threads   OpenMP      Intel TBB   GrPPI
      ( p | p | p | p )      +8.8 %        +13.0 %     +25.9 %     +1.8 %
      ( p | f | p | p )      +59.4 %       +62.6 %     +25.9 %     +3.1 %
      ( p | p | f | p )      +60.0 %       +63.9 %     +25.9 %     +3.1 %
      ( p | f | f | p )      +106.9 %      +109.4 %    +25.9 %     +4.4 %
  • 119. Evaluation: Performance in frames per second. [Figure: four panels, one per pipeline composition: (p|p|p|p), (p|f|p|p), (p|p|f|p), (p|f|f|p). x-axis: video resolution (480p, 720p, 1080p, 1440p, 2160p); y-axis: frames per second (logarithmic scale). Series: C++11, GrPPI C++11, OpenMP, GrPPI OpenMP, TBB, GrPPI TBB.]
  • 120. Evaluation: Brain MRI (Magnetic Resonance Imaging). Non-intrusive method for imaging internal anatomy. Huge amount of data generated. Applied to neurosciences: bipolar disorder, paranoia, schizophrenia. Identification of fibers and connectivity between areas in the brain.
  • 121. Evaluation: Fibers in the brain. [Figure.]
  • 122. Evaluation: MRI evaluation. [Figure: x-axis: number of threads (1 to 19); y-axis: time in seconds (0 to 1000). Series: GRPPI+SEQ, GRPPI+TBB, GRPPI+OMP, GRPPI+THR, GRPPI+FastFlow.]
  • 123. Outline: 1 Introduction; 2 Simple use; 3 Patterns in GrPPI; 4 Evaluation; 5 Conclusions (next).
  • 124. Conclusions: Summary. A unified programming model for sequential and parallel modes. Multiple back-ends available: sequential, OpenMP, TBB, native, FastFlow. Current pattern set: data (map, reduce, map/reduce, stencil); task (divide/conquer); streaming (pipeline with nesting of farm, filter, reduction, iteration). Low programming effort. Very low performance overhead.
  • 125. Conclusions: Further reading.
      A Generic Parallel Pattern Interface for Stream and Data Processing. D. del Rio, M. F. Dolz, J. Fernández, J. D. García. Concurrency and Computation: Practice and Experience, 2017.
      Exploring stream parallel patterns in distributed MPI environments. Javier López-Gómez, Javier Fernández Muñoz, David del Rio Astorga, Manuel F. Dolz, J. Daniel Garcia. Parallel Computing, 84:24–36, 2019.
      Challenging the abstraction penalty in parallel patterns libraries. J. D. Garcia, D. del Rio, M. Aldinucci, F. Tordini, M. Danelutto, G. Mencagli, M. Torquati. Journal of Supercomputing, 21pp., 2019.
      Paving the way towards high-level parallel pattern interfaces for data stream processing. D. del Rio, M. F. Dolz, J. Fernandez, J. D. Garcia. Future Generation Computer Systems, 87:228–241, 2018.
      Supporting Advanced Patterns in GrPPI: a Generic Parallel Pattern Interface. D. del Rio, M. F. Dolz, J. Fernandez, J. D. Garcia. Auto-DaSP 2017 (Euro-Par 2017).
  • 126. Conclusions: Further reading (continued).
      Towards Automatic Parallelization of Stream Processing Applications. M. F. Dolz, D. del Rio, J. Fernandez, J. D. Garcia, J. Carretero. IEEE Access, 6:39944–39961, 2018.
      Finding parallel patterns through static analysis in C++ applications. D. R. del Astorga, M. F. Dolz, L. M. Sanchez, J. D. Garcia, M. Danelutto, M. Torquati. International Journal of High Performance Computing Applications, 2017.
  • 127. Conclusions: GrPPI. https://github.com/arcosuc3m/grppi
  • 131. Conclusions: Future work. Additional backends: accelerator integration, distributed computing, ... Next generation C++: concepts (reduce metaprogramming), modules; executors (parallel versus reactive models); ranges. More patterns and fewer patterns: simplified interface; pattern catalog extension; identification of minimal building blocks. Pattern composition: pattern fusion; pattern transformation rules. Multi-context approach for heterogeneous computing. Better support of NUMA and affinity.
  • 134. Conclusions: The future.
      grppi::omp_execution omp;
      grppi::cuda_execution cuda;
      std::vector<int> v = get();
      auto comp = v |
        grppi::map([](auto x) { return x*x; }) |
        grppi::reduce([](auto x, auto y) { return x+y; });
      comp >> grppi::on(omp);
      auto pipe = generator |
        ( grppi::seq([](const frame & f) { return filter(f); }) |
          grppi::farm(4, [](const frame & f) { return to_gray(f); }) >> on(cuda) ) |
        frame_writer >> on(omp);
  • 135. An introduction to GrPPI: Generic Reusable Parallel Patterns Interface. J. Daniel Garcia, ARCOS Group, University Carlos III of Madrid, Spain. November 2019.