SlideShare a Scribd company logo
Exploiting Parallelism with Multi-core Technologies ,[object Object],[object Object],[object Object],[object Object]
Coding with TBB Contest ,[object Object],[object Object],[object Object],[object Object],[object Object]
Problem ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Three Approaches for Improvement ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Family Tree  1988 2001 2006 1995 Languages *Other names and brands may be claimed as the property of others Cilk  space efficient scheduler cache-oblivious algorithms Threaded-C continuation tasks task stealing OpenMP* fork/join tasks OpenMP taskqueue while & recursion Pragmas Chare Kernel small tasks JSR-166 (FJTask) containers Intel® Threading Building Blocks   STL generic programming STAPL recursive ranges ECMA .NET* parallel iteration classes Libraries
Enter Intel® Threading Building Blocks ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Example (just a peek) void ApplyFoo( size_t n , int x  ) { for(size_t i=range_begin; i<range_end; ++i) Foo( i,x ); } SERIAL VERSION void ParallelApplyFoo(size_t n, int x) { parallel_for( blocked_range<size_t>(0,n,10),  <>(const blocked_range<size_t>& range) { for(size_t i=range.begin(); i<range.end(); ++i)  Foo(i,x); } ); } PARALLEL VERSION (the way I wish I could write it)
Parallel Version  (as it can be written) class ApplyFoo { public: int my_x; ApplyFoo( int x ) : my_x(x) {} void operator()(const blocked_range<size_t>& range) const { for(size_t i=range.begin(); i!=range.end(); ++i)  Foo(i,my_x); } }; void ParallelApplyFoo(size_t n, int x) { parallel_for(blocked_range<size_t>(0,n,10),  ApplyFoo(x)); }
Underlying concepts
Generic Programming ,[object Object],[object Object],[object Object],[object Object],[object Object]
Key Features of Intel Threading Building Blocks ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Relaxed Sequential Semantics ,[object Object],[object Object]
Synchronization Primitives atomic, spin_mutex, spin_rw_mutex, queuing_mutex, queuing_rw_mutex, mutex Generic Parallel Algorithms parallel_for parallel_while parallel_reduce pipeline parallel_sort parallel_scan Concurrent Containers concurrent_hash_map concurrent_queue concurrent_vector Task scheduler Memory Allocation cache_aligned_allocator scalable_allocator
Serial Example static void SerialUpdateVelocity() { for( int i=1; i<UniverseHeight-1; ++i ) for( int j=1; j<UniverseWidth-1; ++j )  V[i][j] += (S[i][j] - S[i][j-1] + T[i][j] - T[i-1][j])*M[i]; }
Parallel Version blue = original code red = provided by TBB black = boilerplate for library struct UpdateVelocityBody { void operator()( const   blocked_range <int>& range ) const { int end =  range.end (); for( int i=   range.begin ();  i<end; ++i ) {   for( int j=1; j<UniverseWidth-1; ++j ) { V[i][j] += (S[i][j] - S[i][j-1] + T[i][j] - T[i-1][j])*M[i]; } } } void ParallelUpdateVelocity() { parallel_for (   blocked_range<int> (  1, UniverseHeight-1),    UpdateVelocityBody(),  auto_partitioner()  ); } Task Parallel control structure Task subdivision handler
Range is Generic ,[object Object],[object Object],[object Object],[object Object],Destructor R::~R() True if range is empty bool R::empty() const True if range can be partitioned bool R::is_divisible() const Split  r  into two subranges R::R (R& r,  split ) Copy constructor R::R (const  R&)
parallel_reduce parallel_scan parallel_while parallel_sort pipeline
Parallel  pipeline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Parallel  pipeline Parallel stage scales because it can process items in parallel or out of order.  Serial stage processes items one at a time in order. Another serial stage. Items wait for turn in serial stage Controls excessive parallelism by limiting total number of items flowing through pipeline. Uses sequence numbers recover order for serial stage. Tag incoming items with sequence numbers Throughput limited by throughput  of slowest serial stage. 1 3 2 4 5 6 7 8 9 10 11 12
Concurrent Containers, Mutual Exclusion Memory Allocator Task Scheduler
Concurrent Containers ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Concurrent interface requirements ,[object Object],[object Object],[object Object],extern std::queue q; if(!q.empty()) { item=q.front();  q.pop(); } At this instant, another thread  might pop last element.
concurrent_vector <T> ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],// Append sequence [begin,end) to x in thread-safe way. template<typename T> void Append( concurrent_vector <T> &x, const T *begin, const T *end ) { std::copy(begin, end, x.begin() + x. grow_by (end-begin) ) } Example
concurrent_queue <T> ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
concurrent_hash <Key,T,HashCompare> ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
struct MyHashCompare { static long  hash ( const char* x ) { long h = 0; for( const char* s = x; *s; s++ ) h = (h*157)^*s; return h; } static bool  equal ( const char* x, const char* y ) { return strcmp(x,y)==0; } }; typedef  concurrent_hash_map <const char*,int,MyHashCompare> StringTable; StringTable MyTable; Example: map strings to integers void MyUpdateCount( const char* x ) { StringTable::accessor  a; MyTable. insert ( a, x ); a->second += 1; } Multiple threads can insert and update entries concurrently. accessor object acts as smart pointer and writer lock.
Synchronization Primitives ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Example: spin_rw_mutex promotion ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Scalable Memory Allocator ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Task Scheduler ,[object Object],[object Object],[object Object],TBB Approach Problem Task chunking and work-stealing help balance load Load imbalance Programmer specifies tasks, not threads. Program complexity “ Greedy” scheduling often wins “ Fair” scheduling One scheduler thread per hardware thread Oversubscription
Task Scheduler ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],#include “tbb/task_scheduler_init.h” using namespace tbb; int main() { task_scheduler_init  init; … . return 0; }
Tasking Development tools ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
open source tour
Open Source – quick ‘tour’ ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Coding with TBB Contest ,[object Object],[object Object],[object Object],[object Object]
Learn more… ,[object Object],[object Object]

More Related Content

What's hot

Конверсия управляемых языков в неуправляемые
Конверсия управляемых языков в неуправляемыеКонверсия управляемых языков в неуправляемые
Конверсия управляемых языков в неуправляемые
Platonov Sergey
 
C optimization notes
C optimization notesC optimization notes
C optimization notes
Fyaz Ghaffar
 
Re-engineering Eclipse MDT/OCL for Xtext
Re-engineering Eclipse MDT/OCL for XtextRe-engineering Eclipse MDT/OCL for Xtext
Re-engineering Eclipse MDT/OCL for Xtext
Edward Willink
 
Code GPU with CUDA - Memory Subsystem
Code GPU with CUDA - Memory SubsystemCode GPU with CUDA - Memory Subsystem
Code GPU with CUDA - Memory Subsystem
Marina Kolpakova
 
Implementation of “Parma Polyhedron Library”-functions in MATLAB
Implementation of “Parma Polyhedron Library”-functions in MATLABImplementation of “Parma Polyhedron Library”-functions in MATLAB
Implementation of “Parma Polyhedron Library”-functions in MATLAB
Leo Asselborn
 
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
David Walker
 
Code GPU with CUDA - Optimizing memory and control flow
Code GPU with CUDA - Optimizing memory and control flowCode GPU with CUDA - Optimizing memory and control flow
Code GPU with CUDA - Optimizing memory and control flow
Marina Kolpakova
 
Async await in C++
Async await in C++Async await in C++
Async await in C++
cppfrug
 
A look into the sanitizer family (ASAN & UBSAN) by Akul Pillai
A look into the sanitizer family (ASAN & UBSAN) by Akul PillaiA look into the sanitizer family (ASAN & UBSAN) by Akul Pillai
A look into the sanitizer family (ASAN & UBSAN) by Akul Pillai
Cysinfo Cyber Security Community
 
Return Oriented Programming
Return Oriented ProgrammingReturn Oriented Programming
Return Oriented Programming
UTD Computer Security Group
 
An evaluation of LLVM compiler for SVE with fairly complicated loops
An evaluation of LLVM compiler for SVE with fairly complicated loopsAn evaluation of LLVM compiler for SVE with fairly complicated loops
An evaluation of LLVM compiler for SVE with fairly complicated loops
Linaro
 
ADMS'13 High-Performance Holistic XML Twig Filtering Using GPUs
ADMS'13  High-Performance Holistic XML Twig Filtering Using GPUsADMS'13  High-Performance Holistic XML Twig Filtering Using GPUs
ADMS'13 High-Performance Holistic XML Twig Filtering Using GPUs
ty1er
 
OpenMP And C++
OpenMP And C++OpenMP And C++
OpenMP And C++
Dragos Sbîrlea
 
Apache Hadoop Java API
Apache Hadoop Java APIApache Hadoop Java API
Apache Hadoop Java API
Adam Kawa
 
Numba: Flexible analytics written in Python with machine-code speeds and avo...
Numba:  Flexible analytics written in Python with machine-code speeds and avo...Numba:  Flexible analytics written in Python with machine-code speeds and avo...
Numba: Flexible analytics written in Python with machine-code speeds and avo...
PyData
 
tokyotalk
tokyotalktokyotalk
tokyotalk
Hiroshi Ono
 
Parallel Computing with R
Parallel Computing with RParallel Computing with R
Parallel Computing with R
Abhirup Mallik
 
Compilation of COSMO for GPU using LLVM
Compilation of COSMO for GPU using LLVMCompilation of COSMO for GPU using LLVM
Compilation of COSMO for GPU using LLVM
Linaro
 
A closure ekon16
A closure ekon16A closure ekon16
A closure ekon16
Max Kleiner
 
Numba
NumbaNumba

What's hot (20)

Конверсия управляемых языков в неуправляемые
Конверсия управляемых языков в неуправляемыеКонверсия управляемых языков в неуправляемые
Конверсия управляемых языков в неуправляемые
 
C optimization notes
C optimization notesC optimization notes
C optimization notes
 
Re-engineering Eclipse MDT/OCL for Xtext
Re-engineering Eclipse MDT/OCL for XtextRe-engineering Eclipse MDT/OCL for Xtext
Re-engineering Eclipse MDT/OCL for Xtext
 
Code GPU with CUDA - Memory Subsystem
Code GPU with CUDA - Memory SubsystemCode GPU with CUDA - Memory Subsystem
Code GPU with CUDA - Memory Subsystem
 
Implementation of “Parma Polyhedron Library”-functions in MATLAB
Implementation of “Parma Polyhedron Library”-functions in MATLABImplementation of “Parma Polyhedron Library”-functions in MATLAB
Implementation of “Parma Polyhedron Library”-functions in MATLAB
 
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
The Effect of Hierarchical Memory on the Design of Parallel Algorithms and th...
 
Code GPU with CUDA - Optimizing memory and control flow
Code GPU with CUDA - Optimizing memory and control flowCode GPU with CUDA - Optimizing memory and control flow
Code GPU with CUDA - Optimizing memory and control flow
 
Async await in C++
Async await in C++Async await in C++
Async await in C++
 
A look into the sanitizer family (ASAN & UBSAN) by Akul Pillai
A look into the sanitizer family (ASAN & UBSAN) by Akul PillaiA look into the sanitizer family (ASAN & UBSAN) by Akul Pillai
A look into the sanitizer family (ASAN & UBSAN) by Akul Pillai
 
Return Oriented Programming
Return Oriented ProgrammingReturn Oriented Programming
Return Oriented Programming
 
An evaluation of LLVM compiler for SVE with fairly complicated loops
An evaluation of LLVM compiler for SVE with fairly complicated loopsAn evaluation of LLVM compiler for SVE with fairly complicated loops
An evaluation of LLVM compiler for SVE with fairly complicated loops
 
ADMS'13 High-Performance Holistic XML Twig Filtering Using GPUs
ADMS'13  High-Performance Holistic XML Twig Filtering Using GPUsADMS'13  High-Performance Holistic XML Twig Filtering Using GPUs
ADMS'13 High-Performance Holistic XML Twig Filtering Using GPUs
 
OpenMP And C++
OpenMP And C++OpenMP And C++
OpenMP And C++
 
Apache Hadoop Java API
Apache Hadoop Java APIApache Hadoop Java API
Apache Hadoop Java API
 
Numba: Flexible analytics written in Python with machine-code speeds and avo...
Numba:  Flexible analytics written in Python with machine-code speeds and avo...Numba:  Flexible analytics written in Python with machine-code speeds and avo...
Numba: Flexible analytics written in Python with machine-code speeds and avo...
 
tokyotalk
tokyotalktokyotalk
tokyotalk
 
Parallel Computing with R
Parallel Computing with RParallel Computing with R
Parallel Computing with R
 
Compilation of COSMO for GPU using LLVM
Compilation of COSMO for GPU using LLVMCompilation of COSMO for GPU using LLVM
Compilation of COSMO for GPU using LLVM
 
A closure ekon16
A closure ekon16A closure ekon16
A closure ekon16
 
Numba
NumbaNumba
Numba
 

Viewers also liked

Scale Up Performance with Intel® Development
Scale Up Performance with Intel® DevelopmentScale Up Performance with Intel® Development
Scale Up Performance with Intel® Development
Intel IT Center
 
Starting cilk development on windows
Starting cilk development on windowsStarting cilk development on windows
Starting cilk development on windows
Mazen Abdulaziz
 
Develop, Deploy, and Innovate with Intel® Cluster Ready
Develop, Deploy, and Innovate with Intel® Cluster ReadyDevelop, Deploy, and Innovate with Intel® Cluster Ready
Develop, Deploy, and Innovate with Intel® Cluster Ready
Intel IT Center
 
Scalability comparison: Traditional fork-join-based parallelism vs. Goroutine...
Scalability comparison: Traditional fork-join-based parallelism vs. Goroutine...Scalability comparison: Traditional fork-join-based parallelism vs. Goroutine...
Scalability comparison: Traditional fork-join-based parallelism vs. Goroutine...
Artjom Simon
 
Across the Silicon Spectrum: Xeon Phi to Quark – Unleash the Performance in Y...
Across the Silicon Spectrum: Xeon Phi to Quark – Unleash the Performance in Y...Across the Silicon Spectrum: Xeon Phi to Quark – Unleash the Performance in Y...
Across the Silicon Spectrum: Xeon Phi to Quark – Unleash the Performance in Y...
Intel Software Brasil
 
Performance boosting of discrete cosine transform using parallel programming ...
Performance boosting of discrete cosine transform using parallel programming ...Performance boosting of discrete cosine transform using parallel programming ...
Performance boosting of discrete cosine transform using parallel programming ...
IAEME Publication
 

Viewers also liked (6)

Scale Up Performance with Intel® Development
Scale Up Performance with Intel® DevelopmentScale Up Performance with Intel® Development
Scale Up Performance with Intel® Development
 
Starting cilk development on windows
Starting cilk development on windowsStarting cilk development on windows
Starting cilk development on windows
 
Develop, Deploy, and Innovate with Intel® Cluster Ready
Develop, Deploy, and Innovate with Intel® Cluster ReadyDevelop, Deploy, and Innovate with Intel® Cluster Ready
Develop, Deploy, and Innovate with Intel® Cluster Ready
 
Scalability comparison: Traditional fork-join-based parallelism vs. Goroutine...
Scalability comparison: Traditional fork-join-based parallelism vs. Goroutine...Scalability comparison: Traditional fork-join-based parallelism vs. Goroutine...
Scalability comparison: Traditional fork-join-based parallelism vs. Goroutine...
 
Across the Silicon Spectrum: Xeon Phi to Quark – Unleash the Performance in Y...
Across the Silicon Spectrum: Xeon Phi to Quark – Unleash the Performance in Y...Across the Silicon Spectrum: Xeon Phi to Quark – Unleash the Performance in Y...
Across the Silicon Spectrum: Xeon Phi to Quark – Unleash the Performance in Y...
 
Performance boosting of discrete cosine transform using parallel programming ...
Performance boosting of discrete cosine transform using parallel programming ...Performance boosting of discrete cosine transform using parallel programming ...
Performance boosting of discrete cosine transform using parallel programming ...
 

Similar to Os Reindersfinal

ParaSail
ParaSail  ParaSail
ParaSail
AdaCore
 
Porting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to RustPorting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to Rust
Evan Chan
 
Modern C++
Modern C++Modern C++
Modern C++
Richard Thomson
 
Scala Talk at FOSDEM 2009
Scala Talk at FOSDEM 2009Scala Talk at FOSDEM 2009
Scala Talk at FOSDEM 2009
Martin Odersky
 
A Survey of Concurrency Constructs
A Survey of Concurrency ConstructsA Survey of Concurrency Constructs
A Survey of Concurrency Constructs
Ted Leung
 
220 runtime environments
220 runtime environments220 runtime environments
220 runtime environments
J'tong Atong
 
Clojure concurrency
Clojure concurrencyClojure concurrency
Clojure concurrency
Alex Navis
 
Lockless
LocklessLockless
Lockless
Sandeep Joshi
 
Writing a TSDB from scratch_ performance optimizations.pdf
Writing a TSDB from scratch_ performance optimizations.pdfWriting a TSDB from scratch_ performance optimizations.pdf
Writing a TSDB from scratch_ performance optimizations.pdf
RomanKhavronenko
 
Kotlin coroutines and spring framework
Kotlin coroutines and spring frameworkKotlin coroutines and spring framework
Kotlin coroutines and spring framework
Sunghyouk Bae
 
Concurrency Constructs Overview
Concurrency Constructs OverviewConcurrency Constructs Overview
Concurrency Constructs Overview
stasimus
 
The D Programming Language - Why I love it!
The D Programming Language - Why I love it!The D Programming Language - Why I love it!
The D Programming Language - Why I love it!
ryutenchi
 
Unmanaged Parallelization via P/Invoke
Unmanaged Parallelization via P/InvokeUnmanaged Parallelization via P/Invoke
Unmanaged Parallelization via P/Invoke
Dmitri Nesteruk
 
.NET Multithreading/Multitasking
.NET Multithreading/Multitasking.NET Multithreading/Multitasking
.NET Multithreading/Multitasking
Sasha Kravchuk
 
Java concurrency
Java concurrencyJava concurrency
Java concurrency
ducquoc_vn
 
C language
C languageC language
C language
spatidar0
 
Rust All Hands Winter 2011
Rust All Hands Winter 2011Rust All Hands Winter 2011
Rust All Hands Winter 2011
Patrick Walton
 
C language
C languageC language
C language
VAIRA MUTHU
 
Migration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsMigration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming Models
Zvi Avraham
 
NOSQL and Cassandra
NOSQL and CassandraNOSQL and Cassandra
NOSQL and Cassandra
rantav
 

Similar to Os Reindersfinal (20)

ParaSail
ParaSail  ParaSail
ParaSail
 
Porting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to RustPorting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to Rust
 
Modern C++
Modern C++Modern C++
Modern C++
 
Scala Talk at FOSDEM 2009
Scala Talk at FOSDEM 2009Scala Talk at FOSDEM 2009
Scala Talk at FOSDEM 2009
 
A Survey of Concurrency Constructs
A Survey of Concurrency ConstructsA Survey of Concurrency Constructs
A Survey of Concurrency Constructs
 
220 runtime environments
220 runtime environments220 runtime environments
220 runtime environments
 
Clojure concurrency
Clojure concurrencyClojure concurrency
Clojure concurrency
 
Lockless
LocklessLockless
Lockless
 
Writing a TSDB from scratch_ performance optimizations.pdf
Writing a TSDB from scratch_ performance optimizations.pdfWriting a TSDB from scratch_ performance optimizations.pdf
Writing a TSDB from scratch_ performance optimizations.pdf
 
Kotlin coroutines and spring framework
Kotlin coroutines and spring frameworkKotlin coroutines and spring framework
Kotlin coroutines and spring framework
 
Concurrency Constructs Overview
Concurrency Constructs OverviewConcurrency Constructs Overview
Concurrency Constructs Overview
 
The D Programming Language - Why I love it!
The D Programming Language - Why I love it!The D Programming Language - Why I love it!
The D Programming Language - Why I love it!
 
Unmanaged Parallelization via P/Invoke
Unmanaged Parallelization via P/InvokeUnmanaged Parallelization via P/Invoke
Unmanaged Parallelization via P/Invoke
 
.NET Multithreading/Multitasking
.NET Multithreading/Multitasking.NET Multithreading/Multitasking
.NET Multithreading/Multitasking
 
Java concurrency
Java concurrencyJava concurrency
Java concurrency
 
C language
C languageC language
C language
 
Rust All Hands Winter 2011
Rust All Hands Winter 2011Rust All Hands Winter 2011
Rust All Hands Winter 2011
 
C language
C languageC language
C language
 
Migration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsMigration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming Models
 
NOSQL and Cassandra
NOSQL and CassandraNOSQL and Cassandra
NOSQL and Cassandra
 

More from oscon2007

J Ruby Whirlwind Tour
J Ruby Whirlwind TourJ Ruby Whirlwind Tour
J Ruby Whirlwind Tour
oscon2007
 
Solr Presentation5
Solr Presentation5Solr Presentation5
Solr Presentation5
oscon2007
 
Os Borger
Os BorgerOs Borger
Os Borger
oscon2007
 
Os Harkins
Os HarkinsOs Harkins
Os Harkins
oscon2007
 
Os Fitzpatrick Sussman Wiifm
Os Fitzpatrick Sussman WiifmOs Fitzpatrick Sussman Wiifm
Os Fitzpatrick Sussman Wiifm
oscon2007
 
Os Bunce
Os BunceOs Bunce
Os Bunce
oscon2007
 
Yuicss R7
Yuicss R7Yuicss R7
Yuicss R7
oscon2007
 
Performance Whack A Mole
Performance Whack A MolePerformance Whack A Mole
Performance Whack A Mole
oscon2007
 
Os Fogel
Os FogelOs Fogel
Os Fogel
oscon2007
 
Os Lanphier Brashears
Os Lanphier BrashearsOs Lanphier Brashears
Os Lanphier Brashears
oscon2007
 
Os Tucker
Os TuckerOs Tucker
Os Tucker
oscon2007
 
Os Fitzpatrick Sussman Swp
Os Fitzpatrick Sussman SwpOs Fitzpatrick Sussman Swp
Os Fitzpatrick Sussman Swp
oscon2007
 
Os Furlong
Os FurlongOs Furlong
Os Furlong
oscon2007
 
Os Berlin Dispelling Myths
Os Berlin Dispelling MythsOs Berlin Dispelling Myths
Os Berlin Dispelling Myths
oscon2007
 
Os Kimsal
Os KimsalOs Kimsal
Os Kimsal
oscon2007
 
Os Pruett
Os PruettOs Pruett
Os Pruett
oscon2007
 
Os Alrubaie
Os AlrubaieOs Alrubaie
Os Alrubaie
oscon2007
 
Os Keysholistic
Os KeysholisticOs Keysholistic
Os Keysholistic
oscon2007
 
Os Jonphillips
Os JonphillipsOs Jonphillips
Os Jonphillips
oscon2007
 
Os Urnerupdated
Os UrnerupdatedOs Urnerupdated
Os Urnerupdated
oscon2007
 

More from oscon2007 (20)

J Ruby Whirlwind Tour
J Ruby Whirlwind TourJ Ruby Whirlwind Tour
J Ruby Whirlwind Tour
 
Solr Presentation5
Solr Presentation5Solr Presentation5
Solr Presentation5
 
Os Borger
Os BorgerOs Borger
Os Borger
 
Os Harkins
Os HarkinsOs Harkins
Os Harkins
 
Os Fitzpatrick Sussman Wiifm
Os Fitzpatrick Sussman WiifmOs Fitzpatrick Sussman Wiifm
Os Fitzpatrick Sussman Wiifm
 
Os Bunce
Os BunceOs Bunce
Os Bunce
 
Yuicss R7
Yuicss R7Yuicss R7
Yuicss R7
 
Performance Whack A Mole
Performance Whack A MolePerformance Whack A Mole
Performance Whack A Mole
 
Os Fogel
Os FogelOs Fogel
Os Fogel
 
Os Lanphier Brashears
Os Lanphier BrashearsOs Lanphier Brashears
Os Lanphier Brashears
 
Os Tucker
Os TuckerOs Tucker
Os Tucker
 
Os Fitzpatrick Sussman Swp
Os Fitzpatrick Sussman SwpOs Fitzpatrick Sussman Swp
Os Fitzpatrick Sussman Swp
 
Os Furlong
Os FurlongOs Furlong
Os Furlong
 
Os Berlin Dispelling Myths
Os Berlin Dispelling MythsOs Berlin Dispelling Myths
Os Berlin Dispelling Myths
 
Os Kimsal
Os KimsalOs Kimsal
Os Kimsal
 
Os Pruett
Os PruettOs Pruett
Os Pruett
 
Os Alrubaie
Os AlrubaieOs Alrubaie
Os Alrubaie
 
Os Keysholistic
Os KeysholisticOs Keysholistic
Os Keysholistic
 
Os Jonphillips
Os JonphillipsOs Jonphillips
Os Jonphillips
 
Os Urnerupdated
Os UrnerupdatedOs Urnerupdated
Os Urnerupdated
 

Recently uploaded

Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
HarisZaheer8
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
Javier Junquera
 
Public CyberSecurity Awareness Presentation 2024.pptx
Public CyberSecurity Awareness Presentation 2024.pptxPublic CyberSecurity Awareness Presentation 2024.pptx
Public CyberSecurity Awareness Presentation 2024.pptx
marufrahmanstratejm
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
Miro Wengner
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
Edge AI and Vision Alliance
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
alexjohnson7307
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
Hiike
 
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
Data Hops
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
Dinusha Kumarasiri
 

Recently uploaded (20)

Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 
Public CyberSecurity Awareness Presentation 2024.pptx
Public CyberSecurity Awareness Presentation 2024.pptxPublic CyberSecurity Awareness Presentation 2024.pptx
Public CyberSecurity Awareness Presentation 2024.pptx
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
 
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
 

Os Reindersfinal

  • 1.
  • 2.
  • 3.
  • 4.
  • 5. Family Tree 1988 2001 2006 1995 Languages *Other names and brands may be claimed as the property of others Cilk space efficient scheduler cache-oblivious algorithms Threaded-C continuation tasks task stealing OpenMP* fork/join tasks OpenMP taskqueue while & recursion Pragmas Chare Kernel small tasks JSR-166 (FJTask) containers Intel® Threading Building Blocks STL generic programming STAPL recursive ranges ECMA .NET* parallel iteration classes Libraries
  • 6.
  • 7. Example (just a peek) void ApplyFoo( size_t n , int x ) { for(size_t i=range_begin; i<range_end; ++i) Foo( i,x ); } SERIAL VERSION void ParallelApplyFoo(size_t n, int x) { parallel_for( blocked_range<size_t>(0,n,10), <>(const blocked_range<size_t>& range) { for(size_t i=range.begin(); i<range.end(); ++i) Foo(i,x); } ); } PARALLEL VERSION (the way I wish I could write it)
  • 8. Parallel Version (as it can be written) class ApplyFoo { public: int my_x; ApplyFoo( int x ) : my_x(x) {} void operator()(const blocked_range<size_t>& range) const { for(size_t i=range.begin(); i!=range.end(); ++i) Foo(i,my_x); } }; void ParallelApplyFoo(size_t n, int x) { parallel_for(blocked_range<size_t>(0,n,10), ApplyFoo(x)); }
  • 10.
  • 11.
  • 12.
  • 13. Synchronization Primitives atomic, spin_mutex, spin_rw_mutex, queuing_mutex, queuing_rw_mutex, mutex Generic Parallel Algorithms parallel_for parallel_while parallel_reduce pipeline parallel_sort parallel_scan Concurrent Containers concurrent_hash_map concurrent_queue concurrent_vector Task scheduler Memory Allocation cache_aligned_allocator scalable_allocator
  • 14. Serial Example static void SerialUpdateVelocity() { for( int i=1; i<UniverseHeight-1; ++i ) for( int j=1; j<UniverseWidth-1; ++j ) V[i][j] += (S[i][j] - S[i][j-1] + T[i][j] - T[i-1][j])*M[i]; }
  • 15. Parallel Version blue = original code red = provided by TBB black = boilerplate for library struct UpdateVelocityBody { void operator()( const blocked_range <int>& range ) const { int end = range.end (); for( int i= range.begin (); i<end; ++i ) { for( int j=1; j<UniverseWidth-1; ++j ) { V[i][j] += (S[i][j] - S[i][j-1] + T[i][j] - T[i-1][j])*M[i]; } } } void ParallelUpdateVelocity() { parallel_for ( blocked_range<int> ( 1, UniverseHeight-1), UpdateVelocityBody(), auto_partitioner() ); } Task Parallel control structure Task subdivision handler
  • 16.
  • 18.
  • 19. Parallel pipeline Parallel stage scales because it can process items in parallel or out of order. Serial stage processes items one at a time in order. Another serial stage. Items wait for turn in serial stage Controls excessive parallelism by limiting total number of items flowing through pipeline. Uses sequence numbers recover order for serial stage. Tag incoming items with sequence numbers Throughput limited by throughput of slowest serial stage. 1 3 2 4 5 6 7 8 9 10 11 12
  • 20. Concurrent Containers, Mutual Exclusion Memory Allocator Task Scheduler
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26. struct MyHashCompare { static long hash ( const char* x ) { long h = 0; for( const char* s = x; *s; s++ ) h = (h*157)^*s; return h; } static bool equal ( const char* x, const char* y ) { return strcmp(x,y)==0; } }; typedef concurrent_hash_map <const char*,int,MyHashCompare> StringTable; StringTable MyTable; Example: map strings to integers void MyUpdateCount( const char* x ) { StringTable::accessor a; MyTable. insert ( a, x ); a->second += 1; } Multiple threads can insert and update entries concurrently. accessor object acts as smart pointer and writer lock.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 34.
  • 35.
  • 36.