SlideShare a Scribd company logo
www.bsc.es
Leipzig, June 17th 2013
Jesus Labarta
Jesus.labarta@bsc.es
OmpSs – improving the
scalability of OpenMP
2
The parallel programming revolution
Parallel programming in the past
– Where to place data
– What to run where
– How to communicate
Parallel programming in the future
– What do I need to compute
– What data do I need to use
– hints (not necessarily very precise) on
potential concurrency, locality,…
Schedule @ programmers mind
Static
Schedule @ system
Dynamic
Complexity  ! Awareness
Variability
3
Key concept
– Sequential task based program on single address/name space
+ directionality annotations
– Happens to execute parallel: Automatic run time computation of
dependencies between tasks
Differentiation of StarSs
– Dependences: Tasks instantiated but not ready. Order IS defined
• Lookahead
– Avoid stalling the main control flow when a computation depending on previous
tasks is reached
– Possibility to “see” the future searching for further potential concurrency
– Locality aware
– Homogenizing heterogeneity
The StarSs family of programming models
4
The StarSs “Granularities”
StarSs
OmpSs COMPSs
@ SMP @ GPU @ Cluster
Average task Granularity:
100 microseconds – 10 milliseconds 1second - 1 day
Language binding:
C, C++, FORTRAN Java, Python
Address space to compute dependences:
Memory Files, Objects (SCM)
Parallel Ensemble, workflow
5
OpenMP
void Cholesky(int NT, float *A[NT][NT] ) {
#pragma omp parallel
#pragma omp single
for (int k=0; k<NT; k++) {
#pragma omp task
spotrf (A[k][k], TS);
#pragma omp taskwait
for (int i=k+1; i<NT; i++) {
#pragma omp task
strsm (A[k][k], A[k][i], TS);
}
#pragma omp taskwait
for (int i=k+1; i<NT; i++) {
for (j=k+1; j<i; j++)
#pragma omp task
sgemm( A[k][i], A[k][j], A[j][i], TS);
#pragma omp task
ssyrk (A[k][i], A[i][i], TS);
#pragma omp taskwait
}
}
}
6
Extend OpenMP with a data-flow execution model to exploit
unstructured parallelism (compared to fork-join)
– in/out pragmas to enable dependence and locality
– Inlined/outlined
void Cholesky(int NT, float *A[NT][NT] ) {
for (int k=0; k<NT; k++) {
#pragma omp task inout ([TS][TS]A[k][k])
spotrf (A[k][k], TS) ;
for (int i=k+1; i<NT; i++) {
#pragma omp task in(([TS][TS]A[k][k])) 
inout ([TS][TS]A[k][i])
strsm (A[k][k], A[k][i], TS);
}
for (int i=k+1; i<NT; i++) {
for (j=k+1; j<i; j++) {
#pragma omp task in([TS][TS]A[k][i]),
in([TS][TS]A[k][j]) inout ([TS][TS]A[j][i])
sgemm( A[k][i], A[k][j], A[j][i], TS);
}
#pragma omp task in ([TS][TS]A[k][i]) 
inout([TS][TS]A[i][i])
ssyrk (A[k][i], A[i][i], TS);
}
}
}
OmpSs
7
OmpSs
Other features and characteristics
– Contiguous / strided dependence detection and data management
– Nesting/recursion
– Multiple implementations for a given task
– Integrated with powerful performance tools (Extrae+Paraver)
Continuous development and use
– Since 2004
– On large applications and systems
Pushing ideas into the OpenMP standard
– Developer Positioning: efforts in OmpSs will not be lost.
– Dependences  v 4.0
18
Used in projects and applications …
Undertaken significant efforts to port real large scale
applications:
–
• Scalapack, PLASMA, SPECFEM3D, LBC, CPMD PSC, PEPC,
LS1 Mardyn, Asynchronous algorithms, Microbenchmarks
–
• YALES2, EUTERPE, SPECFEM3D, MP2C, BigDFT,
QuantumESPRESSO, PEPC, SMMP, ProFASI, COSMO, BQCD
– DEEP
• NEURON, iPIC3D, ECHAM/MESSy, AVBP, TurboRVB, Seismic
– G8_ECS
• CGPOP, NICAM (planed) …
– Consolider project (Spanish ministry)
• MRGENESIS
– BSC initiatives and collaborations:
• GROMACS, GADGET, WRF,…
19
… but NOT only for «scientific computing» …
Plagiarism detection
– Histograms, sorting, … (FhI FIRST)
Trace browsing
– Paraver (BSC)
Clustering algorithms
– G-means (BSC)
Image processing
– Tracking (USAF)
Embedded and consumer
– H.264 (TUBerlin), …
8
The potential of asynchrony
Performance …
– Cholesky 8K x 8K @ SandyBridge (E5-2670, 2.6 GHz)
– Intel Math Kernel Library (MKL) parallel vs. OmpSs+ MKL sequential
• Nested OmpSs (large block size, small block size)
 Reproducibility / less sensitivity to environment options
Strong scaling
Impact of problem size
for fixed core count
9
• Detected issues (ongoing work)
– scheduling policies (generic of nested OmpSs),
– contention on task queues (specific of high #cores),…
• Still fairly good performance!!!
Xeon Phi (I)
10
Hybrid MPI/SMPSs
Linpack example
Overlap communication/computation
Extend asynchronous data-flow execution to
outer level
Automatic lookahead
…
for (k=0; k<N; k++) {
if (mine) {
Factor_panel(A[k]);
send (A[k])
} else {
receive (A[k]);
if (necessary) resend (A[k]);
}
for (j=k+1; j<N; j++)
update (A[k], A[j]);
…
#pragma css task inout(A[SIZE])
void Factor_panel(float *A);
#pragma css task input(A[SIZE]) inout(B[SIZE])
void update(float *A, float *B);
#pragma css task input(A[SIZE])
void send(float *A);
#pragma css task output(A[SIZE])
void receive(float *A);
#pragma css task input(A[SIZE])
void resend(float *A);
P0 P1 P2
V. Marjanovic, et al, “Overlapping Communication and Computation by using a Hybrid MPI/SMPSs Approach” ICS 2010
11
Automatically achieved by the runtime
– Load balance within node
– Fine grain.
– Complementary to application level
load balance.
– Leverage OmpSs malleability
LeWI: Lend core When Idle
– User level Run time Library (DLB)
coordinating processes within node
– Fighting the Linux kernel: Explicit pinning of
threads and handoff scheduling
Overcome Amdahl’s law in hybrid
programming
– Enable partial node level parallelization
– hope for lazy programmers
… and us all
Dynamic Load Balancing: LeWI
Communication
Running task
Idle
4 MPI processes @ 4 cores node
“LeWI: A Runtime Balancing
Algorithm for Nested Parallelism”.
M.Garcia et al. ICPP09
12
Offloading to Intel® Xeon Phi™ (SRMPI app @ DEEP)
Host
KNC
KNC
KNC
KNC
Host
KNC
KNC
KNC
KNC
MPI
MPI
MPI
void main(int argc, char* argv[]) {
//MPI stuff ….
for(int i=0; i<my_jobs; i++) {
int idx = i % workers;
#pragma omp task out(worker_image[idx])
worker(worker_image[idx]);
#pragma omp task inout(master_image) in(worker_image[idx]);
accum(master_image, worker_image[idx])
}
#pragma omp taskwait
global_accum(global_pool, master_image); // MPI
}
Host
KNC
KNC
KNC
KNC
Host
KNC
KNC
KNC
KNC
MPI
OmpSs
OmpSs
void main(int argc, char* argv[]) {
//MPI stuff ….
for(int i=0; i<my_jobs; i++) {
int idx = i % workers;
// MPI stuff to send job to worker
//…
//….manually taking care of load balance
// …
// …. and collect results on worker_pool[idx]
accum(master_image, worker_image[idx])
}
global_accum(global_image, master_image); // MPI
}
void main(int argc, char* argv[]) {
for(int i=0; i<my_jobs; i++) {
int idx = i % workers;
#pragma omp task out(worker_image[idx])
worker(worker_image[idx]);
#pragma omp task inout(master_image) in(worker_image[idx]);
accum(global_image, worker_image[idx])
}
#pragma omp taskwait
}
13
Offloading to Intel® Xeon Phi™ (SRMPI app @ DEEP)
Productivity …
… and performance
… flexibility …
14
MPI + OmpSs @ MIC
– Hydro 8 MPI processes, 2 cores/process, 4 OmpSs threads/process
– Mapped to 64 hardware threads (16 cores)
4 OmpSs threads/Core
2 OmpSs threads/Core
3 OmpSs threads@ Core 1
2 OmpSs threads/Core
1 OmpSs threads@ Core 2
15
OmpSs: Enabler for exascale
 Can exploit very unstructured
parallelism
 Not just loop/data parallelism
 Easy to change structure
 Supports large amounts of
lookahead
 Not stalling for dependence satisfaction
 Allow for locality optimizations to
tolerate latency
 Overlap data transfers, prefetch
 Reuse
 Nicely hybridizes into MPI/StarSs
 Propagates to large scale the node level
dataflow characteristics
 Overlap communication and computation
 A chance against Amdahl’s law
 Support for heterogeneity
 Any # and combination of CPUs, GPUs
 Including autotuning
 Malleability: Decouple program from
resources
 Allowing dynamic resource allocation and
load balance
 Tolerate noise
15
Asynchrony: Data-flow
Locality
Simple / incremental
interface to programmers
Compatible with proprietary
lower level technologies
THANKS

More Related Content

What's hot

An evaluation of LLVM compiler for SVE with fairly complicated loops
An evaluation of LLVM compiler for SVE with fairly complicated loopsAn evaluation of LLVM compiler for SVE with fairly complicated loops
An evaluation of LLVM compiler for SVE with fairly complicated loops
Linaro
 
あなたのScalaを爆速にする7つの方法
あなたのScalaを爆速にする7つの方法あなたのScalaを爆速にする7つの方法
あなたのScalaを爆速にする7つの方法
x1 ichi
 
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Jen Aman
 
Using R in remote computer clusters
Using R in remote computer clustersUsing R in remote computer clusters
Using R in remote computer clusters
Burak Himmetoglu
 
Flux and InfluxDB 2.0 by Paul Dix
Flux and InfluxDB 2.0 by Paul DixFlux and InfluxDB 2.0 by Paul Dix
Flux and InfluxDB 2.0 by Paul Dix
InfluxData
 
Linuxconf 2011 parallel languages talk
Linuxconf 2011 parallel languages talkLinuxconf 2011 parallel languages talk
Linuxconf 2011 parallel languages talk
Lenz Gschwendtner
 
Ufuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one SystemUfuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one System
Flink Forward
 
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Intel® Software
 
Early Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUs
Early Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUsEarly Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUs
Early Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUs
Jeff Larkin
 
.Net 4.0 Threading and Parallel Programming
.Net 4.0 Threading and Parallel Programming.Net 4.0 Threading and Parallel Programming
.Net 4.0 Threading and Parallel Programming
Alex Moore
 
Michael Häusler – Everyday flink
Michael Häusler – Everyday flinkMichael Häusler – Everyday flink
Michael Häusler – Everyday flink
Flink Forward
 
scikit-cuda
scikit-cudascikit-cuda
scikit-cuda
Lev Givon
 
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
AMD Developer Central
 
A Graph-Based Method For Cross-Entity Threat Detection
 A Graph-Based Method For Cross-Entity Threat Detection A Graph-Based Method For Cross-Entity Threat Detection
A Graph-Based Method For Cross-Entity Threat Detection
Jen Aman
 
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and CassandraBrief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Somnath Mazumdar
 
Buzzwords Numba Presentation
Buzzwords Numba PresentationBuzzwords Numba Presentation
Buzzwords Numba Presentation
kammeyer
 
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Databricks
 
Probabilistic data structures. Part 2. Cardinality
Probabilistic data structures. Part 2. CardinalityProbabilistic data structures. Part 2. Cardinality
Probabilistic data structures. Part 2. Cardinality
Andrii Gakhov
 
Apache Flink Training: DataStream API Part 1 Basic
 Apache Flink Training: DataStream API Part 1 Basic Apache Flink Training: DataStream API Part 1 Basic
Apache Flink Training: DataStream API Part 1 Basic
Flink Forward
 
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOMEEuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
HONGJOO LEE
 

What's hot (20)

An evaluation of LLVM compiler for SVE with fairly complicated loops
An evaluation of LLVM compiler for SVE with fairly complicated loopsAn evaluation of LLVM compiler for SVE with fairly complicated loops
An evaluation of LLVM compiler for SVE with fairly complicated loops
 
あなたのScalaを爆速にする7つの方法
あなたのScalaを爆速にする7つの方法あなたのScalaを爆速にする7つの方法
あなたのScalaを爆速にする7つの方法
 
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
 
Using R in remote computer clusters
Using R in remote computer clustersUsing R in remote computer clusters
Using R in remote computer clusters
 
Flux and InfluxDB 2.0 by Paul Dix
Flux and InfluxDB 2.0 by Paul DixFlux and InfluxDB 2.0 by Paul Dix
Flux and InfluxDB 2.0 by Paul Dix
 
Linuxconf 2011 parallel languages talk
Linuxconf 2011 parallel languages talkLinuxconf 2011 parallel languages talk
Linuxconf 2011 parallel languages talk
 
Ufuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one SystemUfuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one System
 
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
 
Early Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUs
Early Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUsEarly Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUs
Early Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUs
 
.Net 4.0 Threading and Parallel Programming
.Net 4.0 Threading and Parallel Programming.Net 4.0 Threading and Parallel Programming
.Net 4.0 Threading and Parallel Programming
 
Michael Häusler – Everyday flink
Michael Häusler – Everyday flinkMichael Häusler – Everyday flink
Michael Häusler – Everyday flink
 
scikit-cuda
scikit-cudascikit-cuda
scikit-cuda
 
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
 
A Graph-Based Method For Cross-Entity Threat Detection
 A Graph-Based Method For Cross-Entity Threat Detection A Graph-Based Method For Cross-Entity Threat Detection
A Graph-Based Method For Cross-Entity Threat Detection
 
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and CassandraBrief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
 
Buzzwords Numba Presentation
Buzzwords Numba PresentationBuzzwords Numba Presentation
Buzzwords Numba Presentation
 
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
 
Probabilistic data structures. Part 2. Cardinality
Probabilistic data structures. Part 2. CardinalityProbabilistic data structures. Part 2. Cardinality
Probabilistic data structures. Part 2. Cardinality
 
Apache Flink Training: DataStream API Part 1 Basic
 Apache Flink Training: DataStream API Part 1 Basic Apache Flink Training: DataStream API Part 1 Basic
Apache Flink Training: DataStream API Part 1 Basic
 
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOMEEuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
 

Similar to OmpSs – improving the scalability of OpenMP

OpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroomOpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroom
Facultad de Informática UCM
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読み
NVIDIA Japan
 
Beyond Map/Reduce: Getting Creative With Parallel Processing
Beyond Map/Reduce: Getting Creative With Parallel ProcessingBeyond Map/Reduce: Getting Creative With Parallel Processing
Beyond Map/Reduce: Getting Creative With Parallel Processing
Ed Kohlwey
 
Big data shim
Big data shimBig data shim
Big data shim
tistrue
 
MapReduce basics
MapReduce basicsMapReduce basics
MapReduce basics
Harisankar H
 
Intermachine Parallelism
Intermachine ParallelismIntermachine Parallelism
Intermachine Parallelism
Sri Prasanna
 
Task based Programming with OmpSs and its Application
Task based Programming with OmpSs and its ApplicationTask based Programming with OmpSs and its Application
Task based Programming with OmpSs and its Application
Facultad de Informática UCM
 
Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*
Intel® Software
 
Chapel-on-X: Exploring Tasking Runtimes for PGAS Languages
Chapel-on-X: Exploring Tasking Runtimes for PGAS LanguagesChapel-on-X: Exploring Tasking Runtimes for PGAS Languages
Chapel-on-X: Exploring Tasking Runtimes for PGAS Languages
Akihiro Hayashi
 
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalOverview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Arvind Surve
 
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalOverview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Arvind Surve
 
IS-ENES COMP Superscalar tutorial
IS-ENES COMP Superscalar tutorialIS-ENES COMP Superscalar tutorial
IS-ENES COMP Superscalar tutorial
Roger Rafanell Mas
 
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali ZaidiNatural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
Databricks
 
Introduction to Scalding and Monoids
Introduction to Scalding and MonoidsIntroduction to Scalding and Monoids
Introduction to Scalding and Monoids
Hugo Gävert
 
Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...
Anubhav Jain
 
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
MLconf
 
NYAI - Scaling Machine Learning Applications by Braxton McKee
NYAI - Scaling Machine Learning Applications by Braxton McKeeNYAI - Scaling Machine Learning Applications by Braxton McKee
NYAI - Scaling Machine Learning Applications by Braxton McKee
Rizwan Habib
 
Spark with Elasticsearch - umd version 2014
Spark with Elasticsearch - umd version 2014Spark with Elasticsearch - umd version 2014
Spark with Elasticsearch - umd version 2014
Holden Karau
 
HPAT presentation at JuliaCon 2016
HPAT presentation at JuliaCon 2016HPAT presentation at JuliaCon 2016
HPAT presentation at JuliaCon 2016
Ehsan Totoni
 
Three Functional Programming Technologies for Big Data
Three Functional Programming Technologies for Big DataThree Functional Programming Technologies for Big Data
Three Functional Programming Technologies for Big Data
Dynamical Software, Inc.
 

Similar to OmpSs – improving the scalability of OpenMP (20)

OpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroomOpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroom
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読み
 
Beyond Map/Reduce: Getting Creative With Parallel Processing
Beyond Map/Reduce: Getting Creative With Parallel ProcessingBeyond Map/Reduce: Getting Creative With Parallel Processing
Beyond Map/Reduce: Getting Creative With Parallel Processing
 
Big data shim
Big data shimBig data shim
Big data shim
 
MapReduce basics
MapReduce basicsMapReduce basics
MapReduce basics
 
Intermachine Parallelism
Intermachine ParallelismIntermachine Parallelism
Intermachine Parallelism
 
Task based Programming with OmpSs and its Application
Task based Programming with OmpSs and its ApplicationTask based Programming with OmpSs and its Application
Task based Programming with OmpSs and its Application
 
Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*
 
Chapel-on-X: Exploring Tasking Runtimes for PGAS Languages
Chapel-on-X: Exploring Tasking Runtimes for PGAS LanguagesChapel-on-X: Exploring Tasking Runtimes for PGAS Languages
Chapel-on-X: Exploring Tasking Runtimes for PGAS Languages
 
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalOverview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
 
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalOverview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
 
IS-ENES COMP Superscalar tutorial
IS-ENES COMP Superscalar tutorialIS-ENES COMP Superscalar tutorial
IS-ENES COMP Superscalar tutorial
 
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali ZaidiNatural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
 
Introduction to Scalding and Monoids
Introduction to Scalding and MonoidsIntroduction to Scalding and Monoids
Introduction to Scalding and Monoids
 
Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...
 
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
 
NYAI - Scaling Machine Learning Applications by Braxton McKee
NYAI - Scaling Machine Learning Applications by Braxton McKeeNYAI - Scaling Machine Learning Applications by Braxton McKee
NYAI - Scaling Machine Learning Applications by Braxton McKee
 
Spark with Elasticsearch - umd version 2014
Spark with Elasticsearch - umd version 2014Spark with Elasticsearch - umd version 2014
Spark with Elasticsearch - umd version 2014
 
HPAT presentation at JuliaCon 2016
HPAT presentation at JuliaCon 2016HPAT presentation at JuliaCon 2016
HPAT presentation at JuliaCon 2016
 
Three Functional Programming Technologies for Big Data
Three Functional Programming Technologies for Big DataThree Functional Programming Technologies for Big Data
Three Functional Programming Technologies for Big Data
 

More from Intel IT Center

AI Crash Course- Supercomputing
AI Crash Course- SupercomputingAI Crash Course- Supercomputing
AI Crash Course- Supercomputing
Intel IT Center
 
FPGA Inference - DellEMC SURFsara
FPGA Inference - DellEMC SURFsaraFPGA Inference - DellEMC SURFsara
FPGA Inference - DellEMC SURFsara
Intel IT Center
 
High Memory Bandwidth Demo @ One Intel Station
High Memory Bandwidth Demo @ One Intel StationHigh Memory Bandwidth Demo @ One Intel Station
High Memory Bandwidth Demo @ One Intel Station
Intel IT Center
 
INFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutions
INFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutionsINFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutions
INFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutions
Intel IT Center
 
Disrupt Hackers With Robust User Authentication
Disrupt Hackers With Robust User AuthenticationDisrupt Hackers With Robust User Authentication
Disrupt Hackers With Robust User Authentication
Intel IT Center
 
Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...
Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...
Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...
Intel IT Center
 
Harness Digital Disruption to Create 2022’s Workplace Today
Harness Digital Disruption to Create 2022’s Workplace TodayHarness Digital Disruption to Create 2022’s Workplace Today
Harness Digital Disruption to Create 2022’s Workplace Today
Intel IT Center
 
Don't Rely on Software Alone. Protect Endpoints with Hardware-Enhanced Security.
Don't Rely on Software Alone.Protect Endpoints with Hardware-Enhanced Security.Don't Rely on Software Alone.Protect Endpoints with Hardware-Enhanced Security.
Don't Rely on Software Alone. Protect Endpoints with Hardware-Enhanced Security.
Intel IT Center
 
Achieve Unconstrained Collaboration in a Digital World
Achieve Unconstrained Collaboration in a Digital WorldAchieve Unconstrained Collaboration in a Digital World
Achieve Unconstrained Collaboration in a Digital World
Intel IT Center
 
Intel® Xeon® Scalable Processors Enabled Applications Marketing Guide
Intel® Xeon® Scalable Processors Enabled Applications Marketing GuideIntel® Xeon® Scalable Processors Enabled Applications Marketing Guide
Intel® Xeon® Scalable Processors Enabled Applications Marketing Guide
Intel IT Center
 
#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...
#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...
#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...
Intel IT Center
 
Identity Protection for the Digital Age
Identity Protection for the Digital AgeIdentity Protection for the Digital Age
Identity Protection for the Digital Age
Intel IT Center
 
Three Steps to Making a Digital Workplace a Reality
Three Steps to Making a Digital Workplace a RealityThree Steps to Making a Digital Workplace a Reality
Three Steps to Making a Digital Workplace a Reality
Intel IT Center
 
Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...
Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...
Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...
Intel IT Center
 
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
Intel IT Center
 
Intel® Xeon® Processor E5-2600 v4 Enterprise Database Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Enterprise Database Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Enterprise Database Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Enterprise Database Applications Showcase
Intel IT Center
 
Intel® Xeon® Processor E5-2600 v4 Core Business Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Core Business Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Core Business Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Core Business Applications Showcase
Intel IT Center
 
Intel® Xeon® Processor E5-2600 v4 Financial Security Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Financial Security Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Financial Security Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Financial Security Applications Showcase
Intel IT Center
 
Intel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications Showcase
Intel IT Center
 
Intel® Xeon® Processor E5-2600 v4 Tech Computing Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Tech Computing Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Tech Computing Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Tech Computing Applications Showcase
Intel IT Center
 

More from Intel IT Center (20)

AI Crash Course- Supercomputing
AI Crash Course- SupercomputingAI Crash Course- Supercomputing
AI Crash Course- Supercomputing
 
FPGA Inference - DellEMC SURFsara
FPGA Inference - DellEMC SURFsaraFPGA Inference - DellEMC SURFsara
FPGA Inference - DellEMC SURFsara
 
High Memory Bandwidth Demo @ One Intel Station
High Memory Bandwidth Demo @ One Intel StationHigh Memory Bandwidth Demo @ One Intel Station
High Memory Bandwidth Demo @ One Intel Station
 
INFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutions
INFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutionsINFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutions
INFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutions
 
Disrupt Hackers With Robust User Authentication
Disrupt Hackers With Robust User AuthenticationDisrupt Hackers With Robust User Authentication
Disrupt Hackers With Robust User Authentication
 
Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...
Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...
Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...
 
Harness Digital Disruption to Create 2022’s Workplace Today
Harness Digital Disruption to Create 2022’s Workplace TodayHarness Digital Disruption to Create 2022’s Workplace Today
Harness Digital Disruption to Create 2022’s Workplace Today
 
Don't Rely on Software Alone. Protect Endpoints with Hardware-Enhanced Security.
Don't Rely on Software Alone.Protect Endpoints with Hardware-Enhanced Security.Don't Rely on Software Alone.Protect Endpoints with Hardware-Enhanced Security.
Don't Rely on Software Alone. Protect Endpoints with Hardware-Enhanced Security.
 
Achieve Unconstrained Collaboration in a Digital World
Achieve Unconstrained Collaboration in a Digital WorldAchieve Unconstrained Collaboration in a Digital World
Achieve Unconstrained Collaboration in a Digital World
 
Intel® Xeon® Scalable Processors Enabled Applications Marketing Guide
Intel® Xeon® Scalable Processors Enabled Applications Marketing GuideIntel® Xeon® Scalable Processors Enabled Applications Marketing Guide
Intel® Xeon® Scalable Processors Enabled Applications Marketing Guide
 
#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...
#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...
#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...
 
Identity Protection for the Digital Age
Identity Protection for the Digital AgeIdentity Protection for the Digital Age
Identity Protection for the Digital Age
 
Three Steps to Making a Digital Workplace a Reality
Three Steps to Making a Digital Workplace a RealityThree Steps to Making a Digital Workplace a Reality
Three Steps to Making a Digital Workplace a Reality
 
Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...
Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...
Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...
 
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
 
Intel® Xeon® Processor E5-2600 v4 Enterprise Database Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Enterprise Database Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Enterprise Database Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Enterprise Database Applications Showcase
 
Intel® Xeon® Processor E5-2600 v4 Core Business Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Core Business Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Core Business Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Core Business Applications Showcase
 
Intel® Xeon® Processor E5-2600 v4 Financial Security Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Financial Security Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Financial Security Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Financial Security Applications Showcase
 
Intel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications Showcase
 
Intel® Xeon® Processor E5-2600 v4 Tech Computing Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Tech Computing Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Tech Computing Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Tech Computing Applications Showcase
 

Recently uploaded

Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
Edge AI and Vision Alliance
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
Safe Software
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
Fwdays
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
AstuteBusiness
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
Neo4j
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
Neo4j
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
DianaGray10
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
BibashShahi
 
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
saastr
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
Ivo Velitchkov
 

Recently uploaded (20)

Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
 
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
9 CEO's who hit $100m ARR Share Their Top Growth Tactics Nathan Latka, Founde...
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
 

OmpSs – improving the scalability of OpenMP

  • 1. www.bsc.es Leipzig, June 17th 2013 Jesus Labarta Jesus.labarta@bsc.es OmpSs – improving the scalability of OpenMP
  • 2. 2 The parallel programming revolution Parallel programming in the past – Where to place data – What to run where – How to communicate Parallel programming in the future – What do I need to compute – What data do I need to use – hints (not necessarily very precise) on potential concurrency, locality,… Schedule @ programmers mind Static Schedule @ system Dynamic Complexity  ! Awareness Variability
  • 3. 3 Key concept – Sequential task based program on single address/name space + directionality annotations – Happens to execute parallel: Automatic run time computation of dependencies between tasks Differentiation of StarSs – Dependences: Tasks instantiated but not ready. Order IS defined • Lookahead – Avoid stalling the main control flow when a computation depending on previous tasks is reached – Possibility to “see” the future searching for further potential concurrency – Locality aware – Homogenizing heterogeneity The StarSs family of programming models
  • 4. 4 The StarSs “Granularities” StarSs OmpSs COMPSs @ SMP @ GPU @ Cluster Average task Granularity: 100 microseconds – 10 milliseconds 1second - 1 day Language binding: C, C++, FORTRAN Java, Python Address space to compute dependences: Memory Files, Objects (SCM) Parallel Ensemble, workflow
  • 5. 5 OpenMP void Cholesky(int NT, float *A[NT][NT] ) { #pragma omp parallel #pragma omp single for (int k=0; k<NT; k++) { #pragma omp task spotrf (A[k][k], TS); #pragma omp taskwait for (int i=k+1; i<NT; i++) { #pragma omp task strsm (A[k][k], A[k][i], TS); } #pragma omp taskwait for (int i=k+1; i<NT; i++) { for (j=k+1; j<i; j++) #pragma omp task sgemm( A[k][i], A[k][j], A[j][i], TS); #pragma omp task ssyrk (A[k][i], A[i][i], TS); #pragma omp taskwait } } }
  • 6. 6 Extend OpenMP with a data-flow execution model to exploit unstructured parallelism (compared to fork-join) – in/out pragmas to enable dependence and locality – Inlined/outlined void Cholesky(int NT, float *A[NT][NT] ) { for (int k=0; k<NT; k++) { #pragma omp task inout ([TS][TS]A[k][k]) spotrf (A[k][k], TS) ; for (int i=k+1; i<NT; i++) { #pragma omp task in(([TS][TS]A[k][k])) inout ([TS][TS]A[k][i]) strsm (A[k][k], A[k][i], TS); } for (int i=k+1; i<NT; i++) { for (j=k+1; j<i; j++) { #pragma omp task in([TS][TS]A[k][i]), in([TS][TS]A[k][j]) inout ([TS][TS]A[j][i]) sgemm( A[k][i], A[k][j], A[j][i], TS); } #pragma omp task in ([TS][TS]A[k][i]) inout([TS][TS]A[i][i]) ssyrk (A[k][i], A[i][i], TS); } } } OmpSs
  • 7. 7 OmpSs Other features and characteristics – Contiguous / strided dependence detection and data management – Nesting/recursion – Multiple implementations for a given task – Integrated with powerful performance tools (Extrae+Paraver) Continuous development and use – Since 2004 – On large applications and systems Pushing ideas into the OpenMP standard – Developer Positioning: efforts in OmpSs will not be lost. – Dependences  v 4.0 18 Used in projects and applications … Undertaken significant efforts to port real large scale applications: – • Scalapack, PLASMA, SPECFEM3D, LBC, CPMD PSC, PEPC, LS1 Mardyn, Asynchronous algorithms, Microbenchmarks – • YALES2, EUTERPE, SPECFEM3D, MP2C, BigDFT, QuantumESPRESSO, PEPC, SMMP, ProFASI, COSMO, BQCD – DEEP • NEURON, iPIC3D, ECHAM/MESSy, AVBP, TurboRVB, Seismic – G8_ECS • CGPOP, NICAM (planed) … – Consolider project (Spanish ministry) • MRGENESIS – BSC initiatives and collaborations: • GROMACS, GADGET, WRF,… 19 … but NOT only for «scientific computing» … Plagiarism detection – Histograms, sorting, … (FhI FIRST) Trace browsing – Paraver (BSC) Clustering algorithms – G-means (BSC) Image processing – Tracking (USAF) Embedded and consumer – H.264 (TUBerlin), …
  • 8. 8 The potential of asynchrony Performance … – Cholesky 8K x 8K @ SandyBridge (E5-2670, 2.6 GHz) – Intel Math Kernel Library (MKL) parallel vs. OmpSs+ MKL sequential • Nested OmpSs (large block size, small block size)  Reproducibility / less sensitivity to environment options Strong scaling Impact of problem size for fixed core count
  • 9. 9 • Detected issues (ongoing work) – scheduling policies (generic of nested OmpSs), – contention on task queues (specific of high #cores),… • Still fairly good performance!!! Xeon Phi (I)
  • 10. 10 Hybrid MPI/SMPSs Linpack example Overlap communication/computation Extend asynchronous data-flow execution to outer level Automatic lookahead … for (k=0; k<N; k++) { if (mine) { Factor_panel(A[k]); send (A[k]) } else { receive (A[k]); if (necessary) resend (A[k]); } for (j=k+1; j<N; j++) update (A[k], A[j]); … #pragma css task inout(A[SIZE]) void Factor_panel(float *A); #pragma css task input(A[SIZE]) inout(B[SIZE]) void update(float *A, float *B); #pragma css task input(A[SIZE]) void send(float *A); #pragma css task output(A[SIZE]) void receive(float *A); #pragma css task input(A[SIZE]) void resend(float *A); P0 P1 P2 V. Marjanovic, et al, “Overlapping Communication and Computation by using a Hybrid MPI/SMPSs Approach” ICS 2010
  • 11. 11 Automatically achieved by the runtime – Load balance within node – Fine grain. – Complementary to application level load balance. – Leverage OmpSs malleability LeWI: Lend core When Idle – User level Run time Library (DLB) coordinating processes within node – Fighting the Linux kernel: Explicit pinning of threads and handoff scheduling Overcome Amdahl’s law in hybrid programming – Enable partial node level parallelization – hope for lazy programmers … and us all Dynamic Load Balancing: LeWI Communication Running task Idle 4 MPI processes @ 4 cores node “LeWI: A Runtime Balancing Algorithm for Nested Parallelism”. M.Garcia et al. ICPP09
  • 12. 12 Offloading to Intel® Xeon Phi™ (SRMPI app @ DEEP) Host KNC KNC KNC KNC Host KNC KNC KNC KNC MPI MPI MPI void main(int argc, char* argv[]) { //MPI stuff …. for(int i=0; i<my_jobs; i++) { int idx = i % workers; #pragma omp task out(worker_image[idx]) worker(worker_image[idx]); #pragma omp task inout(master_image) in(worker_image[idx]); accum(master_image, worker_image[idx]) } #pragma omp taskwait global_accum(global_pool, master_image); // MPI } Host KNC KNC KNC KNC Host KNC KNC KNC KNC MPI OmpSs OmpSs void main(int argc, char* argv[]) { //MPI stuff …. for(int i=0; i<my_jobs; i++) { int idx = i % workers; // MPI stuff to send job to worker //… //….manually taking care of load balance // … // …. and collect results on worker_pool[idx] accum(master_image, worker_image[idx]) } global_accum(global_image, master_image); // MPI } void main(int argc, char* argv[]) { for(int i=0; i<my_jobs; i++) { int idx = i % workers; #pragma omp task out(worker_image[idx]) worker(worker_image[idx]); #pragma omp task inout(master_image) in(worker_image[idx]); accum(global_image, worker_image[idx]) } #pragma omp taskwait }
  • 13. 13 Offloading to Intel® Xeon Phi™ (SRMPI app @ DEEP) Productivity … … and performance … flexibility …
  • 14. 14 MPI + OmpSs @ MIC – Hydro 8 MPI processes, 2 cores/process, 4 OmpSs threads/process – Mapped to 64 hardware threads (16 cores) 4 OmpSs threads/Core 2 OmpSs threads/Core 3 OmpSs threads@ Core 1 2 OmpSs threads/Core 1 OmpSs threads@ Core 2
  • 15. 15 OmpSs: Enabler for exascale  Can exploit very unstructured parallelism  Not just loop/data parallelism  Easy to change structure  Supports large amounts of lookahead  Not stalling for dependence satisfaction  Allow for locality optimizations to tolerate latency  Overlap data transfers, prefetch  Reuse  Nicely hybridizes into MPI/StarSs  Propagates to large scale the node level dataflow characteristics  Overlap communication and computation  A chance against Amdahl’s law  Support for heterogeneity  Any # and combination of CPUs, GPUs  Including autotuning  Malleability: Decouple program from resources  Allowing dynamic resource allocation and load balance  Tolerate noise 15 Asynchrony: Data-flow Locality Simple / incremental interface to programmers Compatible with proprietary lower level technologies