SlideShare a Scribd company logo
1 of 28
Download to read offline
DASH: A C++ PGAS Library for
Distributed Data Structures and
Parallel Algorithms
www.dash-project.org
Tobias Fuchs
fuchst@nm.ifi.lmu.de
Ludwig-Maximilians-Universität München, MNM-Team
| 2DASH
DASH - Overview
 DASH is a C++ template library that offers
– distributed data structures and parallel algorithms
– a complete PGAS (part. global address space) programming
system without a custom (pre-)compiler
 PGAS Terminology – ShMem Analogy
Unit: The individual participants in a DASH
program, usually full OS processes.
Private
Shared
Unit 0 Unit 1 Unit N-1
int b;
int c;
dash::Array a(1000);
int a;
…
dash::Shared s;
10..190..9 ..999
Shared data:
managed by DASH
in a virtual global
address space
Private data:
managed by regular
C/C++ mechanisms
| 3DASH
DASH - Partitioned Global Address Space
 Data affinity
– data has well-defined owner but can be accessed by any unit
– data locality important for performance
– support for the owner computes execution model
 DASH:
– unified access to
local and remote
data in global
memory space
| 4DASH
DASH - Partitioned Global Address Space
 Data affinity
– data has well-defined owner but can be accessed by any unit
– data locality important for performance
– support for the owner computes execution model
 DASH:
– unified access to
local and remote
data in global
memory space
– and explicit views
on local memory
space
| 5DASH
DASH Project Structure
Phase I (2013-2015) Phase II (2016-2018)
LMU Munich
Project lead,
C++ template library
Project lead,
C++ template library,
data dock
TU Dresden
Libraries and
interfaces, tools
Smart data
structures, resilience
HLRS Stuttgart DART runtime DART runtime
KIT Karlsruhe Application studies
IHR Stuttgart
Smart deployment,
Application studies
| 6DASH
DART: The DASH Runtime
 The DART Interface
– plain C (99) interface
– SPMD execution model
– global memory abstraction
– one-sided RDMA operations and
synchronization communication
– topology discovery, locality hierarchy
 Several implementations
– DART-SHMEM shared-memory based implementation
– DART-CUDA supports GPUs, based on DART-SHMEM
– DART-GASPI Initial implementation using GASPI
– DART-MPI MPI-3 RDMA “workhorse” implementation
| 7DASH
Distributed Data Structures
 DASH offers distributed data structures
– Flexible data distribution schemes
– Example: dash::Array<T>
dash::Array<int> arr(100);
if (dash::myid() == 0) {
for (auto i=0; i < arr.size(); i++)
arr[i] = i;
}
arr.barrier();
if (dash::myid() == 1) {
for (auto e: arr)
cout << (int)e << " ";
cout << endl;
}
DASH global array of 100 integers,
distributed over all units,
default distribution is BLOCKED
unit 0 writes to the array using the
global index i. Operator [] is
overloaded for the dash::Array.
$ mpirun -n 4 ./array
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70
71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87
88 89 90 91 92 93 94 95 96 97 98 99
unit 1 executes a range-based for loop
over the DASH array
| 8DASH
Accessing Local Data
 Access to local ranges in container:
local-view proxy .local
dash::Array<int> arr(100);
for (int i=0; i < arr.lsize(); i++)
arr.local[i] = dash::myid();
arr.barrier();
if (dash::myid() == 0) {
for (auto e: arr)
cout << (int)e << " ";
cout << endl;
}
$ mpirun -n 4 ./array
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
3 3 3 3
.local is a proxy object that
represents the part of the data
that is local to a unit.
 no communication here!
.lsize() is short hand for
.local.size() and returns the
number of local elements
| 9DASH
Efficient Local Access
 IGUPs Benchmark: independent parallel updates
| 10DASH
Using STL Algorithms
int main(int argc, char* argv[])
{
dash::init(&argc, &argv);
dash::Array<int> a(1000);
if (dash::myid() == 0) {
// global iterators and std. algorithms
std::sort(a.begin(), a.end());
}
// local access using local iterators
std::fill(a.lbegin(), a.lend(), 23+dash::myid());
dash::finalize();
}
Collective constructor, all
units involved
Standard library algorithms
work with DASH global
iterator ranges …
… as well as DASH local
iterator ranges
 STL algorithms can be used with DASH containers
… on both local and global ranges
| 11DASH
DASH Distributed Data Structures Overview
Container Description Data distribution
Array<T> 1D Array static, configurable
NArray<T, N> N-dim. Array static, configurable
Shared<T> Shared scalar fixed (at 0)
Directory(*)<T> Variable-size,
locally indexed
array
manual,
load-balanced
List<T> Variable-size
linked list
dynamic,
load-balanced
Map<T> Variable-size
associative map
dynamic, balanced
by hash function
(*) Under construction
| 12DASH
Data Distribution Patterns
 Data distribution patterns are configurable
 Assume 4 units
 Custom data distributions:
– just implement the DASH Pattern concept
dash::Array<int> arr1(20); // default: BLOCKED
dash::Array<int> arr2(20, dash::BLOCKED)
dash::Array<int> arr3(20, dash::CYCLIC)
dash::Array<int> arr4(20, dash::BLOCKCYCLIC(3))
// use your own data distribution:
dash::Array<int, MyPattern> arr5(20, MyPattern(…))
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
0 8 12 164 51 13 179 102 6 18143 7 11 15 19
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
BLOCKED
CYCLIC
BLOCKCYCLIC(3)
arr1, arr2
arr3
arr4
| 13DASH
Multidimensional Data Distribution (1)
 dash::Pattern<N> specifies N-dim data distribution
– Blocked, cyclic, and block-cyclic in multiple dimensions
Pattern<2>(20, 15)
(BLOCKED,
NONE)
(NONE,
BLOCKCYCLIC(2))
(BLOCKED,
BLOCKCYCLIC(3))
Extent in first and
second dimension
Distribution in first and
second dimension
| 14DASH
Multidimensional Data Distribution (2)
 Example: tiled and tile-shifted data distribution
(TILE(4), TILE(3))
ShiftTilePattern<2>(32, 24)TilePattern<2, COL_MAJOR>(20, 15)
(TILE(5), TILE(5))
| 15DASH
The N-Dimensional Array
 Distributed Multidimensional Array Abstraction
dash::NArray (dash::Matrix)
– Dimension is a template parameter
– Element access using coordinates or linear index
– Support for custom index types
– Support for row-major and column-major storage
dash::NArray<int, 2> mat(40, 30); // 1200 elements
int a = mat(i,j); // Fortran style access
int b = mat[i][j]; // chained subscripts
auto loc = mat.local;
int c = mat.local[i][j];
int d = *(mat.local.begin()); // local iterator
| 16DASH
Multidimensional Views
 Lightweight Multidimensional Views
// 8x8 2D array
dash::NArray<int, 2> mat(8,8);
// linear access using iterators
dash::distance(mat.begin(), mat.end()) == 64
// create 2x5 region view
auto reg = matrix.cols(2,5).rows(3,2);
// region can be used just like 2D array
cout << reg[1][2] << endl; // ‘7’
dash::distance(reg.begin(), reg.end()) == 10
| 17DASH
DASH Algorithms (1)
 Growing number of DASH equivalents for STL algorithms:
 Examples of STL algorithms ported to DASH
(which also work for multidimensional ranges!)
– dash::copy range[i] <- range2[i]
– dash::fill range[i] <- val
– dash::generate range[i] <- func()
– dash::for_each func(range[i])
– dash::transform range[i] = func(range2[i])
– dash::accumulate sum(range[i]) (0<=i<=n-1)
– dash::min_element min(range[i]) (0<=i<=n-1)
dash::GlobIter<T> dash::fill(GlobIter<T> begin,
GlobIter<T> end,
T val);
| 18DASH
DASH Algorithms (2)
 Example: Find the min. element in a distributed array
dash::Array<int> arr(100, dash::BLOCKED);
// ...
auto min = dash::min_element(arr.begin(),
arr.end());
if (dash::myid() == 0) {
cout << “Global minimum: “ << (int)*min
<< endl;
}
Collective call,
returns global pointer
to min. element
 reduce results of
std::min_element
of all local ranges
 Features
– Still works when using CYCLIC or any other distribution
– Still works when using a range other than [begin, end)
| 19DASH
Performance of dash::min_element() (int)
| 20DASH
Asynchronous Communication
 Realized via two mechanisms:
– Asynchronous execution policy, e.g. dash::copy_async()
– .async proxy object on DASH containers
for (i = dash::myid(); i < NUPDATE; i += dash::size()) {
ran = (ran << 1) ^ (((int64_t) ran < 0) ? POLY : 0);
// using .async proxy object
// .async proxy object for async. communication
Table.async[ran & (TableSize-1)] ^= ran;
}
Table.flush();
// async. copy of global range to local memory
std::vector<int> lcopy(block.size());
auto fut = dash::copy_async(block.begin(), block.end(),
lcopy.begin());
...
fut.wait();
| 21DASH
Hierarchical Machines
 Machines are getting increasingly hierarchical
– Both within nodes and between nodes
– Data locality is the most crucial factor for performance and
energy efficiency
Hierarchical locality not well supported by current
approaches. PGAS languages usually only offer a two-level
differentiation (local vs. remote).
| 22DASH
dash::Team& t0 = dash::Team::All();
dash::Array<int> arr1(100);
// same as
dash::Array<int> arr2(100, t0);
// allocate array over t1
auto t1 = t0.split(2)
dash::Array<int> arr3(100, t1);
Teams
 Teams are everywhere in DASH (but not always visible)
– Team: ordered set of units
– Default team: dash::Team::All() - all units that exist at startup
– team.split(N) – split an existing team into N sub-teams
DART_TEAM_ALL
{0,…,7}
Node 0 {0,…,3} Node 1 {4,…,7}
ND 0 {0,1} ND 1 {2,3} ND 0 {4,5} ND 1 {6,7}
ID=2
ID=0
ID=1
ID=2 ID=3 ID=3 ID=4
| 25DASH
Porting LULESH
 LULESH: A shock-hydrodynamics mini-application
class Domain // local data
{
private:
std::vector<Real_t> m_x;
// many more...
public:
Domain() { // C’tor
m_x.resize(numNodes);
//...
}
Real_t& x(Index_t idx) {
return m_x[idx];
}
};
class Domain // global data
{
private:
dash::NArray<Real_t, 3> m_x;
// many more...
public:
Domain() { // C’tor
dash::Pattern<3> nodePat(
nx()*px(), ny()*py(), nz()*pz(),
BLOCKED, BLOCKED, BLOCKED);
m_x.allocate(nodePat);
}
Real_t& x(Index_t idx) {
return m_x.lbegin()[idx];
}
};
MPI Version DASH Version
With DASH, you specify data distribution
once and explicitly instead of implicitly
and repeatedly in your algorithm!
| 26DASH
Porting LULESH (goals)
 Easy to remove limitations of MPI domain decomposition
– MPI version: cubic number of MPI processes only (P x P x P),
cubic per-processor grid required
– DASH port: allows any P x Q x R configuration
 Avoid replication, manual index calculation, bookkeeping
manual index calculation
manual bookkeeping
implicit assumptions
… replicated about 80x in MPI code
| 27DASH
Performance of the DASH Version (using put)
 Performance and scalability (weak scaling) of LULESH,
implemented in MPI and DASH
Karl Fürlinger, Tobias Fuchs, and Roger Kowalewski. DASH: A C++ PGAS Library for Distributed Data
Structures and Parallel Algorithms. In Proceedings of the 18th IEEE International Conference on High
Performance Computing and Communications (HPCC 2016). Sydney, Australia, December 2016.
| 28DASH
DASH: On-going and Future Work
DASH Runtime (DART)
DASH C++ Template Library
DASH Application
ToolsandInterfaces
Hardware: Network, Processor,
Memory, Storage
One-sided Communication
Substrate
MPI GASnet GASPIARMCI
DART API
Task-based
execution
LULESH port
DASH algorithms
DASH Halo Matrix
DASH for graph
problems
Topology
discovery,
multilevel
locality
Memory spaces
(NVRAM and HBW)
Dynamic
data structures
| 29DASH
Summary
 DASH is
– a complete data-oriented PGAS
programming system: entire applications
can be written in DASH
– a library that provides distributed data
structures: DASH can be integrated into
existing MPI applications
 More information
– Upcoming talk on multi-dimensional array:
Session 9-B in 30min!
– http://www.dash-project.org/
– http://github.com/dash-project/
| 30DASH
Acknowledgements
DASH on GitHub:
https://github.com/dash-project/dash/
 Funding
 The DASH Team
T. Fuchs (LMU), R. Kowalewski (LMU), D. Hünich (TUD), A. Knüpfer
(TUD), J. Gracia (HLRS), C. Glass (HLRS), H. Zhou (HLRS), K. Idrees
(HLRS), J. Schuchart (HLRS), F. Mößbauer (LMU), K. Fürlinger (LMU)

More Related Content

What's hot

Opensource gis development - part 4
Opensource gis development - part 4Opensource gis development - part 4
Opensource gis development - part 4Andrea Antonello
 
Variables in matlab
Variables in matlabVariables in matlab
Variables in matlabTUOS-Sam
 
Social network-analysis-in-python
Social network-analysis-in-pythonSocial network-analysis-in-python
Social network-analysis-in-pythonJoe OntheRocks
 
Lecture 14 data structures and algorithms
Lecture 14 data structures and algorithmsLecture 14 data structures and algorithms
Lecture 14 data structures and algorithmsAakash deep Singhal
 
How Matlab Helps
How Matlab HelpsHow Matlab Helps
How Matlab Helpssiufu
 
Graph Regularised Hashing
Graph Regularised HashingGraph Regularised Hashing
Graph Regularised HashingSean Moran
 
"contour.py" module
"contour.py" module"contour.py" module
"contour.py" moduleArulalan T
 
Linear models
Linear modelsLinear models
Linear modelsFAO
 
Md2k 0219 shang
Md2k 0219 shangMd2k 0219 shang
Md2k 0219 shangBBKuhn
 
Graph Analyses with Python and NetworkX
Graph Analyses with Python and NetworkXGraph Analyses with Python and NetworkX
Graph Analyses with Python and NetworkXBenjamin Bengfort
 
Project - Deep Locality Sensitive Hashing
Project - Deep Locality Sensitive HashingProject - Deep Locality Sensitive Hashing
Project - Deep Locality Sensitive HashingGabriele Angeletti
 
Edit/correct India Map In Cdat Documentation - With Edited World Map Data
Edit/correct India Map In Cdat  Documentation -  With Edited World Map Data Edit/correct India Map In Cdat  Documentation -  With Edited World Map Data
Edit/correct India Map In Cdat Documentation - With Edited World Map Data Arulalan T
 
Introduction to Mahout
Introduction to MahoutIntroduction to Mahout
Introduction to MahoutTed Dunning
 
Algo labpresentation a_group
Algo labpresentation a_groupAlgo labpresentation a_group
Algo labpresentation a_groupUmme habiba
 

What's hot (20)

Lecture set 5
Lecture set 5Lecture set 5
Lecture set 5
 
Locality sensitive hashing
Locality sensitive hashingLocality sensitive hashing
Locality sensitive hashing
 
Opensource gis development - part 4
Opensource gis development - part 4Opensource gis development - part 4
Opensource gis development - part 4
 
Variables in matlab
Variables in matlabVariables in matlab
Variables in matlab
 
Social network-analysis-in-python
Social network-analysis-in-pythonSocial network-analysis-in-python
Social network-analysis-in-python
 
Lecture 14 data structures and algorithms
Lecture 14 data structures and algorithmsLecture 14 data structures and algorithms
Lecture 14 data structures and algorithms
 
Computer Science Assignment Help
Computer Science Assignment Help Computer Science Assignment Help
Computer Science Assignment Help
 
How Matlab Helps
How Matlab HelpsHow Matlab Helps
How Matlab Helps
 
Graph Regularised Hashing
Graph Regularised HashingGraph Regularised Hashing
Graph Regularised Hashing
 
"contour.py" module
"contour.py" module"contour.py" module
"contour.py" module
 
Linear models
Linear modelsLinear models
Linear models
 
Md2k 0219 shang
Md2k 0219 shangMd2k 0219 shang
Md2k 0219 shang
 
Graph Analyses with Python and NetworkX
Graph Analyses with Python and NetworkXGraph Analyses with Python and NetworkX
Graph Analyses with Python and NetworkX
 
Vector in R
Vector in RVector in R
Vector in R
 
Project - Deep Locality Sensitive Hashing
Project - Deep Locality Sensitive HashingProject - Deep Locality Sensitive Hashing
Project - Deep Locality Sensitive Hashing
 
Edit/correct India Map In Cdat Documentation - With Edited World Map Data
Edit/correct India Map In Cdat  Documentation -  With Edited World Map Data Edit/correct India Map In Cdat  Documentation -  With Edited World Map Data
Edit/correct India Map In Cdat Documentation - With Edited World Map Data
 
Parallel Algorithms
Parallel AlgorithmsParallel Algorithms
Parallel Algorithms
 
Introduction to Mahout
Introduction to MahoutIntroduction to Mahout
Introduction to Mahout
 
Algo labpresentation a_group
Algo labpresentation a_groupAlgo labpresentation a_group
Algo labpresentation a_group
 
Parallel algorithm in linear algebra
Parallel algorithm in linear algebraParallel algorithm in linear algebra
Parallel algorithm in linear algebra
 

Similar to DASH: A C++ PGAS Library for Distributed Data Structures and Parallel Algorithms (HPCC'16)

Apache Spark: What? Why? When?
Apache Spark: What? Why? When?Apache Spark: What? Why? When?
Apache Spark: What? Why? When?Massimo Schenone
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentationateeq ateeq
 
Mastering Hadoop Map Reduce - Custom Types and Other Optimizations
Mastering Hadoop Map Reduce - Custom Types and Other OptimizationsMastering Hadoop Map Reduce - Custom Types and Other Optimizations
Mastering Hadoop Map Reduce - Custom Types and Other Optimizationsscottcrespo
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA Japan
 
Transformations and actions a visual guide training
Transformations and actions a visual guide trainingTransformations and actions a visual guide training
Transformations and actions a visual guide trainingSpark Summit
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache SparkDatio Big Data
 
Options and trade offs for parallelism and concurrency in Modern C++
Options and trade offs for parallelism and concurrency in Modern C++Options and trade offs for parallelism and concurrency in Modern C++
Options and trade offs for parallelism and concurrency in Modern C++Satalia
 
Sandy Ryza – Software Engineer, Cloudera at MLconf ATL
Sandy Ryza – Software Engineer, Cloudera at MLconf ATLSandy Ryza – Software Engineer, Cloudera at MLconf ATL
Sandy Ryza – Software Engineer, Cloudera at MLconf ATLMLconf
 
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersHDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersXiao Qin
 
Spark 计算模型
Spark 计算模型Spark 计算模型
Spark 计算模型wang xing
 
Automatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSELAutomatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSELJoel Falcou
 
Time Series Meetup: Virtual Edition | July 2020
Time Series Meetup: Virtual Edition | July 2020Time Series Meetup: Virtual Edition | July 2020
Time Series Meetup: Virtual Edition | July 2020InfluxData
 
No more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in productionNo more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in productionChetan Khatri
 
MapReduceAlgorithms.ppt
MapReduceAlgorithms.pptMapReduceAlgorithms.ppt
MapReduceAlgorithms.pptCheeWeiTan10
 
Eclipse Con Europe 2014 How to use DAWN Science Project
Eclipse Con Europe 2014 How to use DAWN Science ProjectEclipse Con Europe 2014 How to use DAWN Science Project
Eclipse Con Europe 2014 How to use DAWN Science ProjectMatthew Gerring
 
Map reduce in Hadoop BIG DATA ANALYTICS
Map reduce in Hadoop BIG DATA ANALYTICSMap reduce in Hadoop BIG DATA ANALYTICS
Map reduce in Hadoop BIG DATA ANALYTICSArchana Gopinath
 
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalOverview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalArvind Surve
 

Similar to DASH: A C++ PGAS Library for Distributed Data Structures and Parallel Algorithms (HPCC'16) (20)

Apache Spark: What? Why? When?
Apache Spark: What? Why? When?Apache Spark: What? Why? When?
Apache Spark: What? Why? When?
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
 
R Language Introduction
R Language IntroductionR Language Introduction
R Language Introduction
 
Mastering Hadoop Map Reduce - Custom Types and Other Optimizations
Mastering Hadoop Map Reduce - Custom Types and Other OptimizationsMastering Hadoop Map Reduce - Custom Types and Other Optimizations
Mastering Hadoop Map Reduce - Custom Types and Other Optimizations
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読み
 
Transformations and actions a visual guide training
Transformations and actions a visual guide trainingTransformations and actions a visual guide training
Transformations and actions a visual guide training
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Options and trade offs for parallelism and concurrency in Modern C++
Options and trade offs for parallelism and concurrency in Modern C++Options and trade offs for parallelism and concurrency in Modern C++
Options and trade offs for parallelism and concurrency in Modern C++
 
Sandy Ryza – Software Engineer, Cloudera at MLconf ATL
Sandy Ryza – Software Engineer, Cloudera at MLconf ATLSandy Ryza – Software Engineer, Cloudera at MLconf ATL
Sandy Ryza – Software Engineer, Cloudera at MLconf ATL
 
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersHDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
 
Spark 计算模型
Spark 计算模型Spark 计算模型
Spark 计算模型
 
Automatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSELAutomatic Task-based Code Generation for High Performance DSEL
Automatic Task-based Code Generation for High Performance DSEL
 
Time Series Meetup: Virtual Edition | July 2020
Time Series Meetup: Virtual Edition | July 2020Time Series Meetup: Virtual Edition | July 2020
Time Series Meetup: Virtual Edition | July 2020
 
No more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in productionNo more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in production
 
MapReduceAlgorithms.ppt
MapReduceAlgorithms.pptMapReduceAlgorithms.ppt
MapReduceAlgorithms.ppt
 
An Intoduction to R
An Intoduction to RAn Intoduction to R
An Intoduction to R
 
Eclipse Con Europe 2014 How to use DAWN Science Project
Eclipse Con Europe 2014 How to use DAWN Science ProjectEclipse Con Europe 2014 How to use DAWN Science Project
Eclipse Con Europe 2014 How to use DAWN Science Project
 
Pune Clojure Course Outline
Pune Clojure Course OutlinePune Clojure Course Outline
Pune Clojure Course Outline
 
Map reduce in Hadoop BIG DATA ANALYTICS
Map reduce in Hadoop BIG DATA ANALYTICSMap reduce in Hadoop BIG DATA ANALYTICS
Map reduce in Hadoop BIG DATA ANALYTICS
 
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalOverview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
 

Recently uploaded

8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech studentsHimanshiGarg82
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrandmasabamasaba
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024VictoriaMetrics
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfonteinmasabamasaba
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareJim McKeeth
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...SelfMade bd
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in sowetomasabamasaba
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Hararemasabamasaba
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Bert Jan Schrijver
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is insideshinachiaurasa2
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...Jittipong Loespradit
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...chiefasafspells
 

Recently uploaded (20)

8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 

DASH: A C++ PGAS Library for Distributed Data Structures and Parallel Algorithms (HPCC'16)

  • 1. DASH: A C++ PGAS Library for Distributed Data Structures and Parallel Algorithms www.dash-project.org Tobias Fuchs fuchst@nm.ifi.lmu.de Ludwig-Maximilians-Universität München, MNM-Team
  • 2. | 2DASH DASH - Overview  DASH is a C++ template library that offers – distributed data structures and parallel algorithms – a complete PGAS (part. global address space) programming system without a custom (pre-)compiler  PGAS Terminology – ShMem Analogy Unit: The individual participants in a DASH program, usually full OS processes. Private Shared Unit 0 Unit 1 Unit N-1 int b; int c; dash::Array a(1000); int a; … dash::Shared s; 10..190..9 ..999 Shared data: managed by DASH in a virtual global address space Private data: managed by regular C/C++ mechanisms
  • 3. | 3DASH DASH - Partitioned Global Address Space  Data affinity – data has well-defined owner but can be accessed by any unit – data locality important for performance – support for the owner computes execution model  DASH: – unified access to local and remote data in global memory space
  • 4. | 4DASH DASH - Partitioned Global Address Space  Data affinity – data has well-defined owner but can be accessed by any unit – data locality important for performance – support for the owner computes execution model  DASH: – unified access to local and remote data in global memory space – and explicit views on local memory space
  • 5. | 5DASH DASH Project Structure Phase I (2013-2015) Phase II (2016-2018) LMU Munich Project lead, C++ template library Project lead, C++ template library, data dock TU Dresden Libraries and interfaces, tools Smart data structures, resilience HLRS Stuttgart DART runtime DART runtime KIT Karlsruhe Application studies IHR Stuttgart Smart deployment, Application studies
  • 6. | 6DASH DART: The DASH Runtime  The DART Interface – plain C (99) interface – SPMD execution model – global memory abstraction – one-sided RDMA operations and synchronization communication – topology discovery, locality hierarchy  Several implementations – DART-SHMEM shared-memory based implementation – DART-CUDA supports GPUs, based on DART-SHMEM – DART-GASPI Initial implementation using GASPI – DART-MPI MPI-3 RDMA “workhorse” implementation
  • 7. | 7DASH Distributed Data Structures  DASH offers distributed data structures – Flexible data distribution schemes – Example: dash::Array<T> dash::Array<int> arr(100); if (dash::myid() == 0) { for (auto i=0; i < arr.size(); i++) arr[i] = i; } arr.barrier(); if (dash::myid() == 1) { for (auto e: arr) cout << (int)e << " "; cout << endl; } DASH global array of 100 integers, distributed over all units, default distribution is BLOCKED unit 0 writes to the array using the global index i. Operator [] is overloaded for the dash::Array. $ mpirun -n 4 ./array 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 unit 1 executes a range-based for loop over the DASH array
  • 8. | 8DASH Accessing Local Data  Access to local ranges in container: local-view proxy .local dash::Array<int> arr(100); for (int i=0; i < arr.lsize(); i++) arr.local[i] = dash::myid(); arr.barrier(); if (dash::myid() == 0) { for (auto e: arr) cout << (int)e << " "; cout << endl; } $ mpirun -n 4 ./array 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 .local is a proxy object that represents the part of the data that is local to a unit.  no communication here! .lsize() is short hand for .local.size() and returns the number of local elements
  • 9. | 9DASH Efficient Local Access  IGUPs Benchmark: independent parallel updates
  • 10. | 10DASH Using STL Algorithms int main(int argc, char* argv[]) { dash::init(&argc, &argv); dash::Array<int> a(1000); if (dash::myid() == 0) { // global iterators and std. algorithms std::sort(a.begin(), a.end()); } // local access using local iterators std::fill(a.lbegin(), a.lend(), 23+dash::myid()); dash::finalize(); } Collective constructor, all units involved Standard library algorithms work with DASH global iterator ranges … … as well as DASH local iterator ranges  STL algorithms can be used with DASH containers … on both local and global ranges
  • 11. | 11DASH DASH Distributed Data Structures Overview Container Description Data distribution Array<T> 1D Array static, configurable NArray<T, N> N-dim. Array static, configurable Shared<T> Shared scalar fixed (at 0) Directory(*)<T> Variable-size, locally indexed array manual, load-balanced List<T> Variable-size linked list dynamic, load-balanced Map<T> Variable-size associative map dynamic, balanced by hash function (*) Under construction
  • 12. | 12DASH Data Distribution Patterns  Data distribution patterns are configurable  Assume 4 units  Custom data distributions: – just implement the DASH Pattern concept dash::Array<int> arr1(20); // default: BLOCKED dash::Array<int> arr2(20, dash::BLOCKED) dash::Array<int> arr3(20, dash::CYCLIC) dash::Array<int> arr4(20, dash::BLOCKCYCLIC(3)) // use your own data distribution: dash::Array<int, MyPattern> arr5(20, MyPattern(…)) 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 0 8 12 164 51 13 179 102 6 18143 7 11 15 19 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 BLOCKED CYCLIC BLOCKCYCLIC(3) arr1, arr2 arr3 arr4
  • 13. | 13DASH Multidimensional Data Distribution (1)  dash::Pattern<N> specifies N-dim data distribution – Blocked, cyclic, and block-cyclic in multiple dimensions Pattern<2>(20, 15) (BLOCKED, NONE) (NONE, BLOCKCYCLIC(2)) (BLOCKED, BLOCKCYCLIC(3)) Extent in first and second dimension Distribution in first and second dimension
  • 14. | 14DASH Multidimensional Data Distribution (2)  Example: tiled and tile-shifted data distribution (TILE(4), TILE(3)) ShiftTilePattern<2>(32, 24)TilePattern<2, COL_MAJOR>(20, 15) (TILE(5), TILE(5))
  • 15. | 15DASH The N-Dimensional Array  Distributed Multidimensional Array Abstraction dash::NArray (dash::Matrix) – Dimension is a template parameter – Element access using coordinates or linear index – Support for custom index types – Support for row-major and column-major storage dash::NArray<int, 2> mat(40, 30); // 1200 elements int a = mat(i,j); // Fortran style access int b = mat[i][j]; // chained subscripts auto loc = mat.local; int c = mat.local[i][j]; int d = *(mat.local.begin()); // local iterator
  • 16. | 16DASH Multidimensional Views  Lightweight Multidimensional Views // 8x8 2D array dash::NArray<int, 2> mat(8,8); // linear access using iterators dash::distance(mat.begin(), mat.end()) == 64 // create 2x5 region view auto reg = matrix.cols(2,5).rows(3,2); // region can be used just like 2D array cout << reg[1][2] << endl; // ‘7’ dash::distance(reg.begin(), reg.end()) == 10
  • 17. | 17DASH DASH Algorithms (1)  Growing number of DASH equivalents for STL algorithms:  Examples of STL algorithms ported to DASH (which also work for multidimensional ranges!) – dash::copy range[i] <- range2[i] – dash::fill range[i] <- val – dash::generate range[i] <- func() – dash::for_each func(range[i]) – dash::transform range[i] = func(range2[i]) – dash::accumulate sum(range[i]) (0<=i<=n-1) – dash::min_element min(range[i]) (0<=i<=n-1) dash::GlobIter<T> dash::fill(GlobIter<T> begin, GlobIter<T> end, T val);
  • 18. | 18DASH DASH Algorithms (2)  Example: Find the min. element in a distributed array dash::Array<int> arr(100, dash::BLOCKED); // ... auto min = dash::min_element(arr.begin(), arr.end()); if (dash::myid() == 0) { cout << “Global minimum: “ << (int)*min << endl; } Collective call, returns global pointer to min. element  reduce results of std::min_element of all local ranges  Features – Still works when using CYCLIC or any other distribution – Still works when using a range other than [begin, end)
  • 19. | 19DASH Performance of dash::min_element() (int)
  • 20. | 20DASH Asynchronous Communication  Realized via two mechanisms: – Asynchronous execution policy, e.g. dash::copy_async() – .async proxy object on DASH containers for (i = dash::myid(); i < NUPDATE; i += dash::size()) { ran = (ran << 1) ^ (((int64_t) ran < 0) ? POLY : 0); // using .async proxy object // .async proxy object for async. communication Table.async[ran & (TableSize-1)] ^= ran; } Table.flush(); // async. copy of global range to local memory std::vector<int> lcopy(block.size()); auto fut = dash::copy_async(block.begin(), block.end(), lcopy.begin()); ... fut.wait();
  • 21. | 21DASH Hierarchical Machines  Machines are getting increasingly hierarchical – Both within nodes and between nodes – Data locality is the most crucial factor for performance and energy efficiency Hierarchical locality not well supported by current approaches. PGAS languages usually only offer a two-level differentiation (local vs. remote).
  • 22. | 22DASH dash::Team& t0 = dash::Team::All(); dash::Array<int> arr1(100); // same as dash::Array<int> arr2(100, t0); // allocate array over t1 auto t1 = t0.split(2) dash::Array<int> arr3(100, t1); Teams  Teams are everywhere in DASH (but not always visible) – Team: ordered set of units – Default team: dash::Team::All() - all units that exist at startup – team.split(N) – split an existing team into N sub-teams DART_TEAM_ALL {0,…,7} Node 0 {0,…,3} Node 1 {4,…,7} ND 0 {0,1} ND 1 {2,3} ND 0 {4,5} ND 1 {6,7} ID=2 ID=0 ID=1 ID=2 ID=3 ID=3 ID=4
  • 23. | 25DASH Porting LULESH  LULESH: A shock-hydrodynamics mini-application class Domain // local data { private: std::vector<Real_t> m_x; // many more... public: Domain() { // C’tor m_x.resize(numNodes); //... } Real_t& x(Index_t idx) { return m_x[idx]; } }; class Domain // global data { private: dash::NArray<Real_t, 3> m_x; // many more... public: Domain() { // C’tor dash::Pattern<3> nodePat( nx()*px(), ny()*py(), nz()*pz(), BLOCKED, BLOCKED, BLOCKED); m_x.allocate(nodePat); } Real_t& x(Index_t idx) { return m_x.lbegin()[idx]; } }; MPI Version DASH Version With DASH, you specify data distribution once and explicitly instead of implicitly and repeatedly in your algorithm!
  • 24. | 26DASH Porting LULESH (goals)  Easy to remove limitations of MPI domain decomposition – MPI version: cubic number of MPI processes only (P x P x P), cubic per-processor grid required – DASH port: allows any P x Q x R configuration  Avoid replication, manual index calculation, bookkeeping manual index calculation manual bookkeeping implicit assumptions … replicated about 80x in MPI code
  • 25. | 27DASH Performance of the DASH Version (using put)  Performance and scalability (weak scaling) of LULESH, implemented in MPI and DASH Karl Fürlinger, Tobias Fuchs, and Roger Kowalewski. DASH: A C++ PGAS Library for Distributed Data Structures and Parallel Algorithms. In Proceedings of the 18th IEEE International Conference on High Performance Computing and Communications (HPCC 2016). Sydney, Australia, December 2016.
  • 26. | 28DASH DASH: On-going and Future Work DASH Runtime (DART) DASH C++ Template Library DASH Application ToolsandInterfaces Hardware: Network, Processor, Memory, Storage One-sided Communication Substrate MPI GASnet GASPIARMCI DART API Task-based execution LULESH port DASH algorithms DASH Halo Matrix DASH for graph problems Topology discovery, multilevel locality Memory spaces (NVRAM and HBW) Dynamic data structures
  • 27. | 29DASH Summary  DASH is – a complete data-oriented PGAS programming system: entire applications can be written in DASH – a library that provides distributed data structures: DASH can be integrated into existing MPI applications  More information – Upcoming talk on multi-dimensional array: Session 9-B in 30min! – http://www.dash-project.org/ – http://github.com/dash-project/
  • 28. | 30DASH Acknowledgements DASH on GitHub: https://github.com/dash-project/dash/  Funding  The DASH Team T. Fuchs (LMU), R. Kowalewski (LMU), D. Hünich (TUD), A. Knüpfer (TUD), J. Gracia (HLRS), C. Glass (HLRS), H. Zhou (HLRS), K. Idrees (HLRS), J. Schuchart (HLRS), F. Mößbauer (LMU), K. Fürlinger (LMU)