Medical image processing strategies for multi-core CPUs
Daniel Blezek, Mayo Clinic
blezek.daniel@mayo.edu
Poll
Does your primary computer have more than one core...?
Have you ever written parallel code?
It’s a parallel world...
SMP formerly was the domain of researchers
Thanks to Intel, now it’s everywhere!
... but most of us think in serial ...
Hardware has far outstripped software
Developers are not trained
Development of parallel software is difficult
Outside the box: new parallel languages such as Erlang and Scala
... or shoehorn parallelism into the languages we already use
Parallel Computing – according to Google
“parallel computing” 1.4M hits on Google
“multithreading” 10M hits
“multicore” 2.4M hits
“parallel programming” 1.1M hits
Why is it so hard?
  the world is parallel
  we all think in parallel
  yet we are taught to program in serial
Degrees of parallelism (my take)
Serial – SISD, single thread of execution
Data parallel – SIMD (fine-grained parallelism)
Embarrassingly parallel – larger scale SIMD
  CT or MR reconstruction: each operation is independent, e.g. iFFT of slices
  Worker thread – e.g. virus scanning software
Coarse-grained parallelism – SMP or MIMD
  Focus of this presentation, more in GPU talk
  Concurrency, OpenMP, TBB, pthreads/Winthreads
Large scale – MPI on cluster, tight coupling
Large scale – Grid computing, loose coupling
Pragmatic approach
C/C++ and Fortran are the kings of performance
(I’ve never written a single line of Fortran, so don’t ask)
“Bolted on” parallel concepts
Zero language support
Huge existing codebase
Pragmatic approach
Briefly touch on SIMD
Introduce SMP concepts: threads, concurrency
Development models: pthreads/WinThreads, OpenMP, TBB, ITK
Medical Image Processing: example problems, common errors
Next steps
SIMD
SIMD – basic principles (http://en.wikipedia.org/wiki/SIMD)
Data structures for SIMD
Array of Structures:

struct Vec {
  float x, y, z;
};
Vec* points = new Vec[sz];

[Figure: packing and unpacking x, y, z components into SIMD registers]
Data structures for SIMD
Structure of Arrays:

struct Vec {
  float *x, *y, *z;
  Vec ( int sz ) {
    x = new float[sz];
    y = new float[sz];
    z = new float[sz];
  }
};

Structure of Arrays, packed:

struct Vec {
  Vector4f* v;
  Vec ( int sz ) {
    // must be word aligned
    v = new Vector4f[sz];
  }
};
SIMD pitfalls
Structure alignment: usually needs to be aligned on a word boundary
Structure considerations: may need to refactor existing code/structures
Generally not cross-platform: MMX, 3DNow!, SSE, SSE2, SSE4, AltiVec, AVX, etc...
Performance gains are modest: 2x – 4x common
Limited instructions: add, multiply, divide, round; not suitable for branching logic
Autovectorizing compilers for simple loops: -ftree-vectorize (GCC), -fast, -O2 or -O3 (Intel Compiler)
Threads
Threads – they’re everywhere
SMP concepts
Useful to think in terms of “cores”: 2 dual-core CPUs = 4 “cores”
Cores share main memory, may share cache
Threads in the same process share memory
Generally, one executing thread per core; other threads sleeping
Cores – they’re everywhere
How many cores does your laptop have? Mine has 50(!)
  2 Intel CPU cores (Core 2 Duo)
  32 nVidia cores (9600M GT)
  16 nVidia cores (9400M)
Parallel concepts for SMP
Process
  Started by the OS
  Single thread executes “main”
  No direct access to memory of other processes
Threads
  Stream of execution under a process
  Access to memory in containing process
  Private memory
  Lifetime may be less than main thread
Concurrency
  Coordination between threads
  High level (mutex, locks, barriers)
  Low level (atomic operations)
Processes & Threads [Figure: process and thread diagram]
Thread construction – pthread example

#include <pthread.h>

// Thread work function, must return pointer to void
void *doWork(void *work) {
  // Do work
  return work; // equivalent to pthread_exit ( work );
}
...
pthread_t child;
...
rc = pthread_create(&child, &attr, doWork, (void *)work);
...
rc = pthread_join ( child, &threadwork );
...
Thread construction – Win32 example

#include <windows.h>

DWORD WINAPI doWork( LPVOID work ) {...};
...
PMYDATA work;
DWORD   childID;
HANDLE  child;
child = CreateThread(
        NULL,          // default security attributes
        0,             // use default stack size
        doWork,        // thread function name
        work,          // argument to thread function
        0,             // use default creation flags
        &childID);     // returns the thread identifier
WaitForSingleObject(child, INFINITE);
// (WaitForMultipleObjects takes an array of handles to join several threads)
Thread construction – Java example

import java.lang.Thread;

class Worker implements Runnable {
  public Worker ( Work work ) {};
  public void run() {}; // Do work here
}
...
Worker worker = new Worker ( someWork );
new Thread ( worker ).start();
Race Conditions [Figure: serial vs. parallel execution of the same code, showing the problem]
Mutex
Mutex – Mutual exclusion lock
  Protects a section of code
  Only one thread has a lock on the object
  Threads may wait for the mutex, or return a status if the mutex is locked
Semaphore
  N threads
Critical Section
  One thread executes code
Protects global resources, maintains consistent state
Race Conditions

N = 0;
// Start some threads
void* doWork() {
  N++; // get, incr, store – a race
}

Solution with Mutex:

Mutex mutex;
mutex.lock();
N++;
mutex.release();
Atomic operations
Locks are not perfect: cause blocking, relatively heavy-weight
Atomic operations
  Simple operations
  Hardware support
  Can implement w/Mutex
Conditions
  Invisibility – no other thread knows about the change until it completes
  Atomicity – if the operation fails, return to the original state
Deadlock [Figure: two threads, each holding one of Mutex A and Mutex B while waiting for the other]
Thread synchronization – barrier
Initialized with the number of threads expected
Threads signal when they are ready
Wait until all expected threads are there
A stalled or dead thread can stall all the threads
Thread synchronization – condition variables
Workers atomically release the mutex and wait
Master atomically releases the mutex and signals
Workers wake up and acquire the mutex
[Figure: workers waiting on a condition protected by Mutex A]
Thread pool & Futures
Maintains a “pool” of Worker threads
Work queued until a thread is available
Optionally notify through a “Future”
  Future can query status, holds return value
Thread returns to pool, no startup overhead
Core concept for OpenMP and TBB
OpenMP
Introduction to OpenMP
Scatter / gather paradigm
Maintains a thread pool
Requires compiler support: Visual C++, gcc 4.0, Intel Compiler
Easy to adapt existing serial code, easy to debug
Simple paradigm
OpenMP – simple parallel sections

#pragma omp parallel sections num_threads ( 5 )
{
  // 5 threads scatter here
  #pragma omp section
  { /* Do task 1 */ }
  #pragma omp section
  { /* Do task 2 */ }
  ...
  #pragma omp section
  { /* Do task N */ }
  // Implicit barrier
}
OpenMP – parallel for

#pragma omp parallel for
for ( int i = 0; i < NumberOfIterations; i++ ) {
  // Threads scatter here
  // each thread has a private copy of i
  doSomeWork( i );
}
// Implicit barrier

The schedule clause controls how the iterations are divided among threads.
OpenMP – reduction

int TotalAmountOfWork = 0;
#pragma omp parallel for reduction ( + : TotalAmountOfWork )
for ( int i = 0; i < NumberOfIterations; i++ ) {
  // Threads scatter here
  // each thread has a private copy of i & TotalAmountOfWork
  TotalAmountOfWork += doSomeWork( i );
}
// Implicit barrier
// TotalAmountOfWork was properly accumulated
// Each thread has a local copy, the barrier does the reduction
// No need to use critical sections
OpenMP – “atomic” reduction

int TotalAmountOfWork = 0;
#pragma omp parallel for
for ( int i = 0; i < NumberOfIterations; i++ ) {
  // Threads scatter here
  int myWork = doSomeWork( i );
  #pragma omp atomic
  TotalAmountOfWork += myWork;
}
// Implicit barrier
// TotalAmountOfWork was properly accumulated
// However, the atomic section can cause thread stalls
OpenMP – critical

int TotalAmountOfWork = 0;
#pragma omp parallel for reduction ( + : TotalAmountOfWork )
for ( int i = 0; i < NumberOfIterations; i++ ) {
  // Threads scatter here
  // each thread has a private copy of i
  TotalAmountOfWork += doSomeWork( i );
  #pragma omp critical
  {
    // Executed by one thread at a time, e.g., “Mutex lock”
    criticalOperation();
  }
}
// Implicit barrier
OpenMP – single

int TotalAmountOfWork = 0;
#pragma omp parallel for reduction ( + : TotalAmountOfWork )
for ( int i = 0; i < NumberOfIterations; i++ ) {
  // Threads scatter here
  // each thread has a private copy of i
  TotalAmountOfWork += doSomeWork( i );
  #pragma omp single nowait
  {
    // Executed by one thread; use “master” for the main thread
    reportProgress ( TotalAmountOfWork );
  }
  // !! No implicit barrier because of the “nowait” clause !!
}
// Implicit barrier
Threading Building Blocks (TBB)
Introduction to TBB
Commercial and Open Source licenses (GPL with runtime exception)
Cross-platform C++ library, similar to STL
Usual concurrency classes
Several different constructs for threading: for, do, reduction, pipeline
Finer control over scheduling
Maintains a thread pool to execute tasks
http://www.threadingbuildingblocks.org/
TBB – parallel for

#include "tbb/blocked_range.h"
#include "tbb/parallel_for.h"

class Worker {
 public:
  Worker ( /* ... */ ) {...};
  void operator() ( const tbb::blocked_range<int>& r ) const {
    for ( int i = r.begin(); i != r.end(); ++i ) {
      doWork ( i );
    }
  }
};
...
tbb::parallel_for ( tbb::blocked_range<int> ( 0, N ),
                    Worker ( /* ... */ ), tbb::auto_partitioner() );
TBB – parallel reduction

#include "tbb/blocked_range.h"
#include "tbb/parallel_reduce.h"

class ReducingWorker {
  int mLocalWork;
 public:
  ReducingWorker ( /* ... */ ) {...};
  ReducingWorker ( ReducingWorker& o, tbb::split ) : mLocalWork(0) {};
  void join ( const ReducingWorker& o ) { mLocalWork += o.mLocalWork; };
  void operator() ( const tbb::blocked_range<int>& r ) { ... }
};
...
ReducingWorker w ( /* ... */ );
tbb::parallel_reduce ( tbb::blocked_range<int> ( 0, N ),
                       w, tbb::auto_partitioner() );
w.getLocalWork();
TBB – parallel reduction [Figure]
TBB – synchronization

tbb::spin_mutex MyMutex;
void doWork ( /* ... */ ) {
  // Enter critical section; exits when the lock goes out of scope
  tbb::spin_mutex::scoped_lock lock ( MyMutex );
  // NB: This is an error – an unnamed temporary releases the lock immediately:
  // tbb::spin_mutex::scoped_lock ( MyMutex );
}
...
#include <tbb/atomic.h>
tbb::atomic<int> MyCounter;
...
MyCounter = 0;       // Atomic
int i = MyCounter;   // Atomic
MyCounter++; MyCounter--; ++MyCounter; --MyCounter;  // Atomic
...
MyCounter = 0; MyCounter += 2;  // Watch out for other threads between statements!
ITK Model
ITK Implementation
Threads operate across slices
  Only implemented behavior in ITK
itk::MultiThreader is somewhat flexible
  Requires that you break the ITK model
Uses thread join, higher overhead
No thread pool
Comparison

Threads (C/C++)
  + Fine-grain control
  − Not cross-platform
  − Few constructs
Language specific (Java)
  + Fine-grain control
  + Cross-platform easy(?)
  + Many constructs
  +/− Language-specific
OpenMP
  + Simple
  + Adapt existing code
  +/− Industry standard
  +/− Compiler support
  − Coarse-grain control
TBB
  +/− More complex
  + Fine-grain control
  + Intel (−?)
  + Open Source
  + Some constructs
  − Must re-write code
ITK
  + Integrated
  + Simple
  − Limited control
  +/− ITK only
Medical Imaging
Image class

class Image {
 public:
  short* mData;
  int mWidth, mHeight, mDepth;
  int mVoxelsPerSlice;
  int mVoxelsPerVolume;
  short** mSlicePointers; // Pointers to the start of each slice
  short getVoxel ( int x, int y, int z ) {...}
  void setVoxel ( int x, int y, int z, short v ) {...}
};
Trivial problem – threshold
Threshold an image: if intensity > 100, output 1; otherwise output 0
Present from simple to complex: OpenMP, TBB, ITK, pthread (see extra slides)
Threshold – OpenMP #1

void doThreshold ( Image* in, Image* out ) {
#pragma omp parallel for
  for ( int z = 0; z < in->mDepth; z++ ) {
    for ( int y = 0; y < in->mHeight; y++ ) {
      for ( int x = 0; x < in->mWidth; x++ ) {
        if ( in->getVoxel(x,y,z) > 100 ) {
          out->setVoxel(x,y,z,1);
        } else {
          out->setVoxel(x,y,z,0);
        }
      }
    }
  }
}
// NB: can loop over slices, rows or columns by moving the
// pragma, but must choose at compile time
Threshold – OpenMP #2

void doThreshold ( Image* in, Image* out ) {
#pragma omp parallel for
  for ( int s = 0; s < in->mVoxelsPerVolume; s++ ) {
    if ( in->mData[s] > 100 ) {
      out->mData[s] = 1;
    } else {
      out->mData[s] = 0;
    }
  }
}
// Likely a lot faster than the previous code
Threshold – TBB #1

class Threshold {
  Image *in, *out;
 public:
  Threshold ( Image* i, Image* o ) : in ( i ), out ( o ) {...}
  void operator() ( const tbb::blocked_range<int>& r ) const {
    for ( int x = r.begin(); x != r.end(); ++x ) {
      if ( in->mData[x] > 100 ) {
        out->mData[x] = 1;
      } else {
        out->mData[x] = 0;
      }
    }
  }
};
...
parallel_for ( tbb::blocked_range<int>( 0, in->mVoxelsPerVolume ),
               Threshold ( in, out ), auto_partitioner() );
// NB: default “grain size” for blocked_range is 1 pixel
// tbb::blocked_range<int>(..., in->mVoxelsPerVolume / NumberOfCPUs )
Threshold – TBB #2

class Threshold {
  Image *in, *out;
 public:
  Threshold ( Image* i, Image* o ) : in ( i ), out ( o ) {...}
  void operator() ( const tbb::blocked_range<int>& r ) const {...}
  void operator() ( const tbb::blocked_range2d<int,int>& r ) const {
    for ( int z = 0; z < in->mDepth; z++ ) {
      for ( int y = r.rows().begin(); y != r.rows().end(); y++ ) {
        for ( int x = r.cols().begin(); x != r.cols().end(); x++ ) {
          if ( in->getVoxel(x,y,z) > 100 ) {
            out->setVoxel(x,y,z,1);
          } else {
            out->setVoxel(x,y,z,0);
          }
        }
      }
    }
  }
};
...
parallel_for ( tbb::blocked_range2d<int,int>( 0, in->mHeight, 32,
                                              0, in->mWidth,  32 ),
               Threshold ( in, out ), auto_partitioner() );
Threshold – TBB #3

class Threshold {
  Image *in, *out;
 public:
  Threshold ( Image* i, Image* o ) : in ( i ), out ( o ) {...}
  void operator() ( const tbb::blocked_range<int>& r ) const {...}
  void operator() ( const tbb::blocked_range2d<int,int>& r ) const {...}
  void operator() ( const tbb::blocked_range3d<int,int,int>& r ) const {
    for ( int z = r.pages().begin(); z != r.pages().end(); z++ ) {
      for ( int y = r.rows().begin(); y != r.rows().end(); y++ ) {
        for ( int x = r.cols().begin(); x != r.cols().end(); x++ ) {
          if ( in->getVoxel(x,y,z) > 100 ) {
            out->setVoxel(x,y,z,1);
          } else {
            out->setVoxel(x,y,z,0);
          }
        }
      }
    }
  }
};
...
parallel_for ( tbb::blocked_range3d<int,int,int>( 0, in->mDepth,  1,
                                                  0, in->mHeight, 32,
                                                  0, in->mWidth,  32 ),
               Threshold ( in, out ), auto_partitioner() );
Threshold – ITK solution

ThreadedGenerateData( const OutputImageRegionType out, int threadId )
{
  ...
  // Define the iterators
  ImageRegionConstIterator<TIn> inputIt(inputPtr, out);
  ImageRegionIterator<TOut> outputIt(outputPtr, out);
  inputIt.GoToBegin();
  outputIt.GoToBegin();
  while( !inputIt.IsAtEnd() )
    {
    if ( inputIt.Get() > 100 ) {
      outputIt.Set ( 1 );
    } else {
      outputIt.Set ( 0 );
    }
    ++inputIt;
    ++outputIt;
    }
}
Interesting problem – anisotropic diffusion
Edge preserving smoothing method
  Perona and Malik. Scale-space and edge detection using anisotropic diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence (1990) vol. 12 (7) pp. 629–639
Iterative process
Demonstrate OpenMP and TBB
  (ITK has an implementation)
  (pthreads are tedious at the very least)
Pop quiz – are the following correct?
Anisotropic diffusion – OpenMP

void doAD ( Image* in, Image* out ) {
#pragma omp parallel for
  for ( int t = 0; t < TotalTime; t++ ) {
    for ( int z = 0; z < in->mDepth; z++ ) {
      ...
    }
  }
}
// Quiz: incorrect – the time steps depend on each other, so the
// outer t loop must not be parallelized
Anisotropic diffusion – OpenMP

void doAD ( Image* in, Image* out ) {
  short *previousSlice, *slice, *nextSlice;
  for ( int t = 0; t < TotalTime; t++ ) {
#pragma omp parallel for
    for ( int z = 1; z < in->mDepth-1; z++ ) {
      previousSlice = in->mSlicePointers[z-1];
      slice = in->mSlicePointers[z];
      nextSlice = in->mSlicePointers[z+1];
      for ( int y = 1; y < in->mHeight-1; y++ ) {
        short* previousRow = slice + (y-1) * in->mWidth;
        short* row = slice + y * in->mWidth;
        short* nextRow = slice + (y+1) * in->mWidth;
        short* aboveRow = previousSlice + y * in->mWidth;
        short* belowRow = nextSlice + y * in->mWidth;
        for ( int x = 1; x < in->mWidth-1; x++ ) {
          dx = 2 * row[x] - row[x-1] - row[x+1];
          dy = 2 * row[x] - previousRow[x] - nextRow[x];
          dz = 2 * row[x] - aboveRow[x] - belowRow[x];
          ...
// Quiz: incorrect – previousSlice, slice and nextSlice are declared
// outside the parallel for, so they are shared and race between threads
Anisotropic diffusion – OpenMP

void doAD ( Image* in, Image* out ) {
  for ( int t = 0; t < TotalTime; t++ ) {
#pragma omp parallel for
    for ( int z = 1; z < in->mDepth-1; z++ ) {
      short* previousSlice = in->mSlicePointers[z-1];
      short* slice = in->mSlicePointers[z];
      short* nextSlice = in->mSlicePointers[z+1];
      for ( int y = 1; y < in->mHeight-1; y++ ) {
        short* previousRow = slice + (y-1) * in->mWidth;
        short* row = slice + y * in->mWidth;
        short* nextRow = slice + (y+1) * in->mWidth;
        short* aboveRow = previousSlice + y * in->mWidth;
        short* belowRow = nextSlice + y * in->mWidth;
        for ( int x = 1; x < in->mWidth-1; x++ ) {
          dx = 2 * row[x] - row[x-1] - row[x+1];
          dy = 2 * row[x] - previousRow[x] - nextRow[x];
          dz = 2 * row[x] - aboveRow[x] - belowRow[x];
          ...
// Quiz: correct – the slice and row pointers are declared inside the
// parallel loop, so each thread has its own private copies
Anisotropic diffusion – TBB #1

class doAD {
 public:
  static ADConstants* sConstants;
  doAD ( Image* in, Image* out ) { ... }
  void operator() ( const tbb::blocked_range3d<int,int,int>& r ) {
    if ( sConstants == NULL ) { initConstants(); }
    // process
    ...
  }
};
// Quiz: incorrect – lazy initialization of the shared static sConstants
// is unsynchronized, so two threads can race in initConstants
Anisotropic diffusion – TBB #2

class doAD {
 public:
  doAD ( ... ) {...}
  void operator() ( const tbb::blocked_range3d<int,int,int>& r ) {
    for ( int z = r.pages().begin(); z != r.pages().end(); z++ ) {
      for ( int y = r.rows().begin(); y != r.rows().end(); y++ ) {
        for ( int x = r.cols().begin(); x != r.cols().end(); x++ ) {
          ...
  }
};
...
parallel_for ( tbb::blocked_range3d<int,int,int>( 0, in->mDepth,
                                                  0, in->mHeight,
                                                  0, in->mWidth ),
               doAD ( in, out ), auto_partitioner() );
Anisotropic diffusion – TBB #3

class doAD {
 public:
  static tbb::atomic<int> sProgress;
  tbb::spin_mutex mMutex;
  doAD ( ... ) {...}
  void reportProgress ( int p ) { ... }
  void operator() ( const tbb::blocked_range3d<int,int,int>& r ) {
    for ( int z = r.pages().begin(); z != r.pages().end(); z++ ) {
      tbb::spin_mutex::scoped_lock lock ( mMutex );
      sProgress++;
      reportProgress ( sProgress );
      for ( int y = r.rows().begin(); y != r.rows().end(); y++ ) {
        for ( int x = r.cols().begin(); x != r.cols().end(); x++ ) {
          ...
  }
};
...
doAD::sProgress = 0;
parallel_for (...);
// Quiz: incorrect – mMutex is a non-static member and parallel_for
// copies the body object, so each copy locks its own mutex
Anisotropic diffusion – TBB #4

class doAD {
 public:
  static tbb::atomic<int> sProgress;
  static tbb::spin_mutex mMutex;
  doAD ( ... ) {...}
  void reportProgress ( int p ) { ... }
  void operator() ( const tbb::blocked_range3d<int,int,int>& r ) {
    for ( int z = r.pages().begin(); z != r.pages().end(); z++ ) {
      tbb::spin_mutex::scoped_lock lock ( mMutex );
      sProgress++;
      reportProgress ( sProgress );
      for ( int y = r.rows().begin(); y != r.rows().end(); y++ ) {
        for ( int x = r.cols().begin(); x != r.cols().end(); x++ ) {
          ...
  }
};
...
doAD::sProgress = 0;
parallel_for (...);
// Quiz: correct – sProgress and mMutex are static, so all copies of
// the body share the same counter and mutex
Anisotropic diffusion – OpenMP (progress)

void doAD ( Image* in, Image* out ) {
  int progress = 0;
  for ( int t = 0; t < TotalTime; t++ ) {
#pragma omp parallel for
    for ( int s = 0; s < in->mDepth; s++ ) {
      #pragma omp atomic
      progress++;
      #pragma omp single nowait
      reportProgress ( progress );
      ...
    }
  }
}
// Quiz: incorrect – a “single” work-sharing construct may not be
// nested inside a parallel for loop
Real-life problem
Compute Frangi’s vesselness measure
  Frangi et al. Model-based quantitation of 3-D magnetic resonance angiographic images. IEEE Transactions on Medical Imaging (1999) vol. 18 (10) pp. 946–956
Memory constrained solution
  ITK implementation requires 1.2G for a 100M volume
  Antiga. Generalizing vesselness with respect to dimensionality and shape. Insight Journal (2007)
Possible solutions using OpenMP, TBB
Vesselness
ITK Implementation – computing the Hessian
6 volumes computed in serial
Individual filters are threaded
Good CPU usage
High memory requirements
Design considerations
Break the problem into blocks
Compute Hessian, eigenvalues, and vesselness per block
Reduces memory requirements
Incurs overhead, boundary conditions
Design considerations [Figure: keep CPUs full]
Design considerations – boundary condition
Trade-offs
Algorithm sketch – Serial

int BlockSize = 32;
for ( int z = 0; z < image->mDepth; z += BlockSize ) {
  for ( int y = 0; y < image->mHeight; y += BlockSize ) {
    for ( int x = 0; x < image->mWidth; x += BlockSize ) {
      processBlock ( in, out, x, y, z, BlockSize );
    }
  }
}
Algorithm sketch – OpenMP

int BlockSize = 32;
#pragma omp parallel for
for ( int z = 0; z < image->mDepth; z += BlockSize ) {
  for ( int y = 0; y < image->mHeight; y += BlockSize ) {
    for ( int x = 0; x < image->mWidth; x += BlockSize ) {
      processBlock ( in, out, x, y, z, BlockSize );
    }
  }
}
// Each thread is on a different slice
// May cause cache contention
// Similar problems for the “y” direction
Algorithm sketch – OpenMP

int BlockSize = 32;
for ( int z = 0; z < image->mDepth; z += BlockSize ) {
  for ( int y = 0; y < image->mHeight; y += BlockSize ) {
#pragma omp parallel for
    for ( int x = 0; x < image->mWidth; x += BlockSize ) {
      processBlock ( in, out, x, y, z, BlockSize );
    }
  }
}
// All threads on the same rows
// May not utilize all CPUs if the ratio of Width to BlockSize < # CPUs
// Better cache utilization
Algorithm sketch – TBB

class Vesselness {
 public:
  void operator() ( const tbb::blocked_range3d<int,int,int>& r ) const {
    // Process the block, could use ITK here
    processBlock ( r.cols().begin(), r.rows().begin(), r.pages().begin(),
                   r.cols().size(),  r.rows().size(),  r.pages().size() );
  }
};
...
parallel_for ( tbb::blocked_range3d<int,int,int>(
                 0, in->mDepth,  32,
                 0, in->mHeight, 32,
                 0, in->mWidth,  32 ),
               Vesselness ( in, out ), auto_partitioner() );
// Individual blocks keep the CPUs full
// May not have the best cache performance
Next steps
Go try parallel development
  Try threads to gain understanding and insight
  Next OpenMP, adapting existing code
  TBB: more constructs, different approaches
Experiment with new languages: Erlang, Scala, Reia, Chapel, X10, Fortress...
Check out some of the resources provided
Have fun!  It’s a brave new world out there...
Resources
TBB (http://www.threadingbuildingblocks.org/)
OpenMP (http://openmp.org/wp/)
Books/Articles
  Java Concurrency in Practice (http://www.javaconcurrencyinpractice.com/)
  Parallel Programming (http://www-users.cs.umn.edu/~karypis/parbook/)
  ITK Software Guide (http://www.itk.org/ItkSoftwareGuide.pdf)
  The Problem with Threads (http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf)
Tutorials
  Parallel Programming (https://computing.llnl.gov/tutorials/parallel_comp/)
  pthreads (https://computing.llnl.gov/tutorials/pthreads/)
  OpenMP (https://computing.llnl.gov/tutorials/openMP/)
Other
  LLNL (https://computing.llnl.gov/)
  Erlang (http://en.wikipedia.org/wiki/Erlang_programming_language)
  GCC-OpenMP (http://gcc.gnu.org/projects/gomp/)
  Intel Compiler (http://software.intel.com/en-us/intel-compilers/)
Resources
Languages
  Erlang (http://www.erlang.org/)
  Scala (http://www.scala-lang.org/)
  Chapel (http://chapel.cs.washington.edu/)
  X10 (http://x10-lang.org/)
  Unified Parallel C (http://upc.gwu.edu/)
  Titanium (http://titanium.cs.berkeley.edu/)
  Co-Array Fortran (http://www.co-array.org/)
  ZPL (http://www.cs.washington.edu/research/zpl/home/index.html)
  High Performance Fortran (http://hpff.rice.edu/)
  Fortress (http://projectfortress.sun.com/Projects/Community/)
  Others (http://www.google.com/search?q=parallel+programming+language)
Thread construction – pthread example

#include <pthread.h>

void *(*start_routine)(void *);

int pthread_create(pthread_t *restrict thread,
                   const pthread_attr_t *restrict attr,
                   void *(*start_routine)(void *),
                   void *restrict arg);

void pthread_exit(void *value_ptr);

int pthread_join(pthread_t thread, void **value_ptr);
Mutex – pthread example

#include <pthread.h>

pthread_mutex_t myMutex;
...
pthread_mutex_init ( &myMutex, NULL );
...
pthread_mutex_lock ( &myMutex );
// Critical section, only one thread at a time
...
pthread_mutex_unlock ( &myMutex );
...
if ( pthread_mutex_trylock ( &myMutex ) == 0 ) {
  // We did get the lock, so we are in the critical section
  ...
  pthread_mutex_unlock ( &myMutex );
}
Mutex – Java example

import java.lang.*;

class Foo {
  public synchronized int doWork () {
    // only one thread can execute doWork
  }

  Object resource;
  public int otherWork () {
    synchronized ( resource ) {
      // critical section, resource is the mutex
      ...
    }
  }
}

Medical Image Processing Strategies for multi-core CPUs

  • 1.
    Medical image processingstrategies for multi-core CPUsDaniel Blezek, Mayo Clinicblezek.daniel@mayo.edu
  • 2.
    PollDoes your primarycomputer have more than one core...?2Have you ever written parallel code?
  • 3.
    It’s a parallelworld...SMP formerly was the domain of researchersThanks to Intel, now it’s everywhere!3... but most of us think in serial ...Hardware has far outstripped software
  • 4.
  • 5.
    Development of parallelsoftware is difficult
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
    Parallel Computing –according to Google“parallel computing” 1.4M hits on Google“multithreading” 10M hits“multicore” 2.4M hits“parallel programming” 1.1M hitsWhy is it so hard?the world is parallelwe all think in parallelyet we are taught to program in serial4driving
  • 11.
    Degrees of parallelism(my take)Serial – SISD single thread of executionData parallel – SIMD (fine grained parallelism)Embarrassingly parallel – larger scale SIMDCT or MR reconstructionEach operation is independent, e.g. iFFT of slicesWorker thread – e.g. virus scanning softwareCoarse grained parallelism – SMP or MIMDFocus of this presentation, more in GPU talkConcurrency, OpenMP, TBB, pthreads/WinthreadsLarge scale – MPI on cluster, tight couplingLarge scale – Grid computing, loose coupling5
  • 12.
    Pragmatic approachC/C++ andFortran are the kings of performance(I’ve never written a single line of Fortran, so don’t ask)“Bolted on” parallel conceptsZero language supportHuge existing codebase6
  • 13.
    Pragmatic approachBriefly touchon SIMDIntroduce SMP conceptsThreads, concurrencyDevelopment modelspthreads/WinThreadsOpenMPTBBITKMedical Image ProcessingExample problemsCommon errorsNext steps7packed
  • 14.
  • 15.
    SIMD – basicprinciples9http://en.wikipedia.org/wiki/SIMD
  • 16.
    Data structures forSIMDArray of Structuresstruct Vec { float x, y, z;};Vec[] points = new Vec[sz];10XYZ--PackXYZ--XYZ--*UnpackXYZ--
  • 17.
    Data structures forSIMD 11Structure of Arraysstruct Vec { float[] x; float[] y; float[] z; Vec ( int sz ) { x = new float[sz]; y = new float[sz]; z = new float[sz]; };};Structure of Arraysstruct Vec{ Vector4f[] v; Vec ( int sz ) { // must be word // aligned v = new Vector4f[sz]; };};
  • 18.
    SIMD pitfallsStructure alignmentUsuallyneeds to be aligned on word boundaryStructure considerationsMay need to refactor existing code/structuresGenerally not cross-platformMMX, 3D Now!, SSE, SSE2, SSE4, AltVec, AVX, etc...Performance gains are modest2x – 4x commonLimited instructionsAdd, multiply, divide, roundNot suitable for branching logicAutovectorizing compilers for simple loops-ftree-vectorize (GCC), -fast, -O2 or -O3 (Intel Compiler)12
  • 19.
  • 20.
  • 21.
    SMP concepts15Useful tothink in terms of “cores”2 dual-core CPU = 4 “cores”Cores share main memory, may share cacheThreads in same process share memoryGenerally, one executing thread per coreOther threads sleeping
  • 22.
    Cores – they’reeverywhere16How many cores does your laptop have?Mine has 50(!) 2 Intel CPU (Core 2 Duo) 32 nVidia cores (9600M GT) 16 nVidia cores (9400M)
  • 23.
    Parallel concepts forSMPProcessStarted by the OSSingle thread executes “main”No direct access to memory of other processesThreadsStream of execution under a processAccess to memory in containing processPrivate memoryLifetime may be less than main threadConcurrencyCoordination between threadsHigh level (mutex, locks, barriers)Low level (atomic operations)17
  • 24.
  • 25.
    #include <pthread.h>// Threadwork function, must return pointer to voidvoid *doWork(void *work) { // Do work return work; // equivalent to pthread_exit ( myWork );}...pthread_t child;...rc=pthread_create(&child, &attr, doWork, (void *)work);...rc = pthread_join ( child, &threadwork );...Thread construction – pthread example19
  • 26.
    Thread construction –Win32 example20#include <windows.h>DWORD WINAPI doWork( LPVOID work) {};...PMYDATA work;DWORD childID;HANDLE child;child = CreateThread( NULL, // default security attributes 0, // use default stack size doWork, // thread function name work, // argument to thread function 0, // use default creation flags &childID); // returns the thread identifierWaitForMultipleObjects(NThreads, child, TRUE, INFINITE);
  • 27.
    Thread construction –Java example21import java.lang.Thread;class Worker implements Runnable { public Worker ( Work work ) {}; public void run() {}; // Do work here}...Worker worker = new Worker ( someWork );New Thread ( worker ).start();
  • 28.
  • 29.
    MutexMutex – Mutualexclusion lockProtects a section of codeOnly one thread has a lock on the objectThreads maywait for the mutexreturn a status if the mutex is lockedSemaphoreN threadsCritical SectionOne thread executes codeProtects global resourcesMaintain consistent state23
  • 30.
    Race Conditions24...N =0;...// Start some threads...void* doWork() { N++; // get, incr, store}Mutexmutex;mutex.lock();mutex.release();Solution w/MutexNoNo
  • 31.
    Atomic operationsLocks arenot perfectCause blockingRelatively heavy-weightAtomic operationsSimple operationsHardware supportCan implement w/MutexConditionsInvisibility – no other thread knows about the changeAtomicity – if operation fails, return to original state25
  • 32.
  • 33.
    Thread synchronization –barrierInitialized with the number of threads expectedThreads signal when they are readyWait until all expected threads are thereA stalled or dead thread can stall all the threads27
  • 34.
    Thread synchronization –Condition variablesWorkers atomically release mutex and waitMaster atomically releases mutex and signalsWorkers wake up and acquire mutex28Mutex AWorkingConditionMutex AConditionMutex AWaitMutex AConditionMutex AMutexThread
  • 35.
    Thread pool &Futures29Maintains a “pool” of Worker threadsWork queued until thread availableOptionally notify through a “Future”Future can query status, holds return valueThread returns to pool, no startup overheadCore concept for OpenMP and TBB
  • 36.
  • 37.
    Introduction to OpenMPScatter/ gather paradigmMaintains a thread poolRequires compiler supportVisual C++, gcc 4.0, Intel CompilerEasy to adapt existing serial code, easy to debugSimple paradigm31
  • 38.
    OpenMP – simpleparallel sections32#pragmaomp parallel sections num_threads ( 5 ){ // 5 Threads scatter here #pragmaomp section { // Do task 1 } #pragmaomp section { // Do task 2 } ... #pragmaomp section { // Do task N } // Implicit barrier}Barrier...NoNo
  • 39.
    OpenMP – parallelfor33#pragmaomp parallel forfor ( int i = 0; i < NumberOfIterations; i++ ) { // Threads scatter here // each thread has a private copy of idoSomeWork( i );}// Implicit barrierScheduling the iterations
  • 40.
    OpenMP – reduction34intTotalAmountOfWork = 0;#pragmaomp parallel for reduction ( + : TotalAmountOfWork )for ( int i = 0; i < NumberOfIterations; i++ ) { // Threads scatter here // each thread has a private copy of i & TotalAmountOfWorkTotalAmountOfWork += doSomeWork( i );}// Implicit barrier// TotalAmountOfWork was properly accumulated// Each thread has local copy, barrier does reduction// No need to use critical sections
  • 41.
OpenMP – “atomic” reduction

int TotalAmountOfWork = 0;
#pragma omp parallel for
for ( int i = 0; i < NumberOfIterations; i++ ) {
  // Threads scatter here
  int myWork = doSomeWork ( i );
  #pragma omp atomic
  TotalAmountOfWork += myWork;
}
// Implicit barrier
// TotalAmountOfWork was properly accumulated
// However, the atomic section can cause thread stalls
OpenMP – critical

int TotalAmountOfWork = 0;
#pragma omp parallel for reduction ( + : TotalAmountOfWork )
for ( int i = 0; i < NumberOfIterations; i++ ) {
  // Threads scatter here
  // each thread has a private copy of i
  TotalAmountOfWork += doSomeWork ( i );
  #pragma omp critical
  {
    // Executed by one thread at a time, e.g., a “mutex lock”
    criticalOperation();
  }
}
// Implicit barrier
OpenMP – single

int TotalAmountOfWork = 0;
#pragma omp parallel for reduction ( + : TotalAmountOfWork )
for ( int i = 0; i < NumberOfIterations; i++ ) {
  // Threads scatter here
  // each thread has a private copy of i
  TotalAmountOfWork += doSomeWork ( i );
  #pragma omp single nowait
  {
    // Executed by one thread; use “master” for the main thread
    reportProgress ( TotalAmountOfWork );
  }
  // !! No implicit barrier because of the “nowait” clause !!
}
// Implicit barrier
Introduction to TBB
Commercial and open source licenses: GPL with runtime exception
Cross-platform C++ library, similar to STL
Usual concurrency classes
Several different constructs for threading: for, do, reduction, pipeline
Finer control over scheduling
Maintains a thread pool to execute tasks
http://www.threadingbuildingblocks.org/
TBB – parallel for

#include "tbb/blocked_range.h"
#include "tbb/parallel_for.h"

class Worker {
public:
  Worker ( /* ... */ ) {...};
  void operator() ( const tbb::blocked_range<int>& r ) const {
    for ( int i = r.begin(); i != r.end(); ++i ) {
      doWork ( i );
    }
  }
};
...
tbb::parallel_for ( tbb::blocked_range<int> ( 0, N ),
                    Worker ( /* ... */ ),
                    tbb::auto_partitioner() );
TBB – parallel reduction

#include "tbb/blocked_range.h"
#include "tbb/parallel_reduce.h"

class ReducingWorker {
  int mLocalWork;
public:
  ReducingWorker ( /* ... */ ) {...};
  // Splitting constructor: gives each new thread a fresh local accumulator
  ReducingWorker ( ReducingWorker& o, tbb::split ) : mLocalWork ( 0 ) {};
  void join ( const ReducingWorker& o ) { mLocalWork += o.mLocalWork; };
  void operator() ( const tbb::blocked_range<int>& r ) { ... }
};
...
ReducingWorker w ( /* ... */ );
tbb::parallel_reduce ( tbb::blocked_range<int> ( 0, N ),
                       w,
                       tbb::auto_partitioner() );
w.getLocalWork();
TBB – parallel reduction (diagram)
TBB – synchronization

tbb::spin_mutex MyMutex;
void doWork ( /* ... */ ) {
  // Enter critical section; exit when the lock goes out of scope
  tbb::spin_mutex::scoped_lock lock ( MyMutex );
  // NB: This is an error!!! The unnamed temporary locks and then
  // immediately unlocks the mutex:
  // tbb::spin_mutex::scoped_lock ( MyMutex );
}
...
#include <tbb/atomic.h>
tbb::atomic<int> MyCounter;
...
MyCounter = 0;                  // Atomic
int i = MyCounter;              // Atomic
MyCounter++; MyCounter--; ++MyCounter; --MyCounter; // Atomic
...
MyCounter = 0; MyCounter += 2;  // Watch out for other threads!
ITK implementation
Threads operate across slices
Only implemented behavior in ITK
itk::MultiThreader is somewhat flexible
Requires that you break the ITK model
Uses thread join, which has higher overhead
No thread pool
Comparison
Language specific (Java): + fine-grain control, + cross-platform easy(?), + many constructs, +/- language-specific
Threads (C/C++): + fine-grain control, - not cross-platform, - few constructs
ITK: + integrated, + simple, - limited control, +/- ITK only
TBB: +/- more complex, + fine-grain control, + Intel (-?), + open source, + some constructs, - must re-write code
OpenMP: + simple, + adapt existing code, +/- industry standard, +/- compiler support, - coarse-grain control
Image class

class Image {
public:
  short* mData;
  int mWidth, mHeight, mDepth;
  int mVoxelsPerSlice;
  int mVoxelsPerVolume;
  short** mSlicePointers; // Pointers to the start of each slice
  short getVoxel ( int x, int y, int z ) {...}
  void setVoxel ( int x, int y, int z, short v ) {...}
};
Trivial problem – threshold
Threshold an image: if intensity > 100, output 1; otherwise output 0
Presented from simple to complex: OpenMP, TBB, ITK, pthreads (see extra slides)
Threshold – OpenMP #1

void doThreshold ( Image* in, Image* out ) {
  #pragma omp parallel for
  for ( int z = 0; z < in->mDepth; z++ ) {
    for ( int y = 0; y < in->mHeight; y++ ) {
      for ( int x = 0; x < in->mWidth; x++ ) {
        if ( in->getVoxel ( x, y, z ) > 100 ) {
          out->setVoxel ( x, y, z, 1 );
        } else {
          out->setVoxel ( x, y, z, 0 );
        }
      }
    }
  }
}
// NB: can loop over slices, rows, or columns by moving the
// pragma, but must choose at compile time
Threshold – OpenMP #2

void doThreshold ( Image* in, Image* out ) {
  #pragma omp parallel for
  for ( int s = 0; s < in->mVoxelsPerVolume; s++ ) {
    if ( in->mData[s] > 100 ) {
      out->mData[s] = 1;
    } else {
      out->mData[s] = 0;
    }
  }
}
// Likely a lot faster than the previous code
Threshold – TBB #1

class Threshold {
public:
  Threshold ( Image* i, Image* o ) : in ( i ), out ( o ) {...}
  void operator() ( const tbb::blocked_range<int>& r ) const {
    for ( int x = r.begin(); x != r.end(); ++x ) {
      if ( in->mData[x] > 100 ) {
        out->mData[x] = 1;
      } else {
        out->mData[x] = 0;
      }
    }
  }
};
...
parallel_for ( tbb::blocked_range<int> ( 0, in->mVoxelsPerVolume ),
               Threshold ( in, out ), auto_partitioner() );
// NB: default “grain size” for blocked_range is 1 pixel
// tbb::blocked_range<int> ( ..., in->mVoxelsPerVolume / NumberOfCPUs )
Threshold – TBB #2

class Threshold {
public:
  Threshold ( Image* i, Image* o ) : in ( i ), out ( o ) {...}
  void operator() ( const tbb::blocked_range<int>& r ) const {...}
  void operator() ( const tbb::blocked_range2d<int,int>& r ) const {
    for ( int z = 0; z < in->mDepth; z++ ) {
      for ( int y = r.rows().begin(); y != r.rows().end(); y++ ) {
        for ( int x = r.cols().begin(); x != r.cols().end(); x++ ) {
          if ( in->getVoxel ( x, y, z ) > 100 ) {
            out->setVoxel ( x, y, z, 1 );
          } else {
            out->setVoxel ( x, y, z, 0 );
          }
        }
      }
    }
  }
};
...
parallel_for ( tbb::blocked_range2d<int,int> ( 0, in->mHeight, 32,
                                               0, in->mWidth, 32 ),
               Threshold ( in, out ), auto_partitioner() );
Threshold – TBB #3

class Threshold {
public:
  Threshold ( Image* i, Image* o ) : in ( i ), out ( o ) {...}
  void operator() ( const tbb::blocked_range<int>& r ) const {...}
  void operator() ( const tbb::blocked_range2d<int,int>& r ) const {...}
  void operator() ( const tbb::blocked_range3d<int,int,int>& r ) const {
    for ( int z = r.pages().begin(); z != r.pages().end(); z++ ) {
      for ( int y = r.rows().begin(); y != r.rows().end(); y++ ) {
        for ( int x = r.cols().begin(); x != r.cols().end(); x++ ) {
          if ( in->getVoxel ( x, y, z ) > 100 ) {
            out->setVoxel ( x, y, z, 1 );
          } else {
            out->setVoxel ( x, y, z, 0 );
          }
        }
      }
    }
  }
};
...
parallel_for ( tbb::blocked_range3d<int,int,int> ( 0, in->mDepth, 1,
                                                   0, in->mHeight, 32,
                                                   0, in->mWidth, 32 ),
               Threshold ( in, out ), auto_partitioner() );
Threshold – ITK solution

ThreadedGenerateData ( const OutputImageRegionType out, int threadId )
{
  ...
  // Define the iterators
  ImageRegionConstIterator<TIn> inputIt ( inputPtr, out );
  ImageRegionIterator<TOut> outputIt ( outputPtr, out );
  inputIt.GoToBegin();
  outputIt.GoToBegin();
  while ( !inputIt.IsAtEnd() ) {
    if ( inputIt.Get() > 100 ) {
      outputIt.Set ( 1 );
    } else {
      outputIt.Set ( 0 );
    }
    ++inputIt;
    ++outputIt;
  }
}
Interesting problem – anisotropic diffusion
Edge preserving smoothing method
Perona and Malik. Scale-space and edge detection using anisotropic diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence (1990) vol. 12 (7) pp. 629-639
Iterative process
Demonstrate OpenMP and TBB
(ITK has an implementation)
(pthreads are tedious at the very least)
Pop quiz – are the following correct?
Anisotropic diffusion – OpenMP

void doAD ( Image* in, Image* out ) {
  #pragma omp parallel for
  for ( int t = 0; t < TotalTime; t++ ) {
    for ( int z = 0; z < in->mDepth; z++ ) {
      ...
    }
  }
}
Anisotropic diffusion – OpenMP

void doAD ( Image* in, Image* out ) {
  short *previousSlice, *slice, *nextSlice;
  for ( int t = 0; t < TotalTime; t++ ) {
    #pragma omp parallel for
    for ( int z = 1; z < in->mDepth-1; z++ ) {
      previousSlice = in->mSlicePointers[z-1];
      slice = in->mSlicePointers[z];
      nextSlice = in->mSlicePointers[z+1];
      for ( int y = 1; y < in->mHeight-1; y++ ) {
        short* previousRow = slice + (y-1) * in->mWidth;
        short* row = slice + y * in->mWidth;
        short* nextRow = slice + (y+1) * in->mWidth;
        short* aboveRow = previousSlice + y * in->mWidth;
        short* belowRow = nextSlice + y * in->mWidth;
        for ( int x = 1; x < in->mWidth-1; x++ ) {
          dx = 2 * row[x] - row[x-1] - row[x+1];
          dy = 2 * row[x] - previousRow[x] - nextRow[x];
          dz = 2 * row[x] - aboveRow[x] - belowRow[x];
          ...
Anisotropic diffusion – OpenMP

void doAD ( Image* in, Image* out ) {
  for ( int t = 0; t < TotalTime; t++ ) {
    #pragma omp parallel for
    for ( int z = 1; z < in->mDepth-1; z++ ) {
      short* previousSlice = in->mSlicePointers[z-1];
      short* slice = in->mSlicePointers[z];
      short* nextSlice = in->mSlicePointers[z+1];
      for ( int y = 1; y < in->mHeight-1; y++ ) {
        short* previousRow = slice + (y-1) * in->mWidth;
        short* row = slice + y * in->mWidth;
        short* nextRow = slice + (y+1) * in->mWidth;
        short* aboveRow = previousSlice + y * in->mWidth;
        short* belowRow = nextSlice + y * in->mWidth;
        for ( int x = 1; x < in->mWidth-1; x++ ) {
          dx = 2 * row[x] - row[x-1] - row[x+1];
          dy = 2 * row[x] - previousRow[x] - nextRow[x];
          dz = 2 * row[x] - aboveRow[x] - belowRow[x];
          ...
Anisotropic diffusion – TBB #1

class doAD {
public:
  static ADConstants* sConstants;
  doAD ( Image* in, Image* out ) { ... }
  void operator() ( const tbb::blocked_range3d<int,int,int>& r ) const {
    if ( sConstants == NULL ) { initConstants(); }
    // process
    ...
  }
};
Anisotropic diffusion – TBB #2

class doAD {
public:
  doAD ( ... ) {...}
  void operator() ( const tbb::blocked_range3d<int,int,int>& r ) const {
    for ( int z = r.pages().begin(); z != r.pages().end(); z++ ) {
      for ( int y = r.rows().begin(); y != r.rows().end(); y++ ) {
        for ( int x = r.cols().begin(); x != r.cols().end(); x++ ) {
          ...
        }
      }
    }
  }
};
...
parallel_for ( tbb::blocked_range3d<int,int,int> ( 0, in->mDepth,
                                                   0, in->mHeight,
                                                   0, in->mWidth ),
               doAD ( in, out ), auto_partitioner() );
Anisotropic diffusion – TBB #3

class doAD {
public:
  static tbb::atomic<int> sProgress;
  tbb::spin_mutex mMutex;
  doAD ( ... ) {...}
  void reportProgress ( int p ) { ... }
  void operator() ( const tbb::blocked_range3d<int,int,int>& r ) {
    for ( int z = r.pages().begin(); z != r.pages().end(); z++ ) {
      tbb::spin_mutex::scoped_lock lock ( mMutex );
      sProgress++;
      reportProgress ( sProgress );
      for ( int y = r.rows().begin(); y != r.rows().end(); y++ ) {
        for ( int x = r.cols().begin(); x != r.cols().end(); x++ ) {
          ...
        }
      }
    }
  }
};
...
doAD::sProgress = 0;
parallel_for ( ... );
Anisotropic diffusion – TBB #4

class doAD {
public:
  static tbb::atomic<int> sProgress;
  static tbb::spin_mutex mMutex;
  doAD ( ... ) {...}
  void reportProgress ( int p ) { ... }
  void operator() ( const tbb::blocked_range3d<int,int,int>& r ) {
    for ( int z = r.pages().begin(); z != r.pages().end(); z++ ) {
      tbb::spin_mutex::scoped_lock lock ( mMutex );
      sProgress++;
      reportProgress ( sProgress );
      for ( int y = r.rows().begin(); y != r.rows().end(); y++ ) {
        for ( int x = r.cols().begin(); x != r.cols().end(); x++ ) {
          ...
        }
      }
    }
  }
};
...
doAD::sProgress = 0;
parallel_for ( ... );
Anisotropic diffusion – OpenMP (progress)

void doAD ( Image* in, Image* out ) {
  int progress = 0;
  for ( int t = 0; t < TotalTime; t++ ) {
    #pragma omp parallel for
    for ( int s = 0; s < in->mDepth; s++ ) {
      #pragma omp atomic
      progress++;
      #pragma omp single nowait
      reportProgress ( progress );
      ...
    }
  }
}
Real-life problem
Compute Frangi’s vesselness measure
Frangi et al. Model-based quantitation of 3-D magnetic resonance angiographic images. IEEE Transactions on Medical Imaging (1999) vol. 18 (10) pp. 946-956
Memory constrained solution
ITK implementation requires 1.2G for a 100M volume
Antiga. Generalizing vesselness with respect to dimensionality and shape. Insight Journal (2007)
Possible solutions using OpenMP, TBB
ITK implementation – computing the Hessian
6 volumes computed in serial
Individual filters are threaded
Good CPU usage
High memory requirements
Design considerations
Break problem into blocks
Compute Hessian, eigenvalues, and vesselness per block
Reduces memory requirements
Incurs overhead, boundary conditions
Design considerations – boundary condition (diagram)
Algorithm sketch – serial

int BlockSize = 32;
for ( int z = 0; z < image->mDepth; z += BlockSize ) {
  for ( int y = 0; y < image->mHeight; y += BlockSize ) {
    for ( int x = 0; x < image->mWidth; x += BlockSize ) {
      processBlock ( in, out, x, y, z, BlockSize );
    }
  }
}
Algorithm sketch – OpenMP

int BlockSize = 32;
#pragma omp parallel for
for ( int z = 0; z < image->mDepth; z += BlockSize ) {
  for ( int y = 0; y < image->mHeight; y += BlockSize ) {
    for ( int x = 0; x < image->mWidth; x += BlockSize ) {
      processBlock ( in, out, x, y, z, BlockSize );
    }
  }
}

Each thread is on a different slice
May cause cache contention
Similar problems for the “y” direction
Algorithm sketch – OpenMP

int BlockSize = 32;
for ( int z = 0; z < image->mDepth; z += BlockSize ) {
  for ( int y = 0; y < image->mHeight; y += BlockSize ) {
    #pragma omp parallel for
    for ( int x = 0; x < image->mWidth; x += BlockSize ) {
      processBlock ( in, out, x, y, z, BlockSize );
    }
  }
}

All threads on the same rows
May not utilize all CPUs if the ratio of Width to BlockSize < # CPUs
Better cache utilization
Algorithm sketch – TBB

class Vesselness {
public:
  void operator() ( const tbb::blocked_range3d<int,int,int>& r ) const {
    // Process the block; could use ITK here
    processBlock ( r.cols().begin(), r.rows().begin(), r.pages().begin(),
                   r.cols().size(), r.rows().size(), r.pages().size() );
  }
};
...
parallel_for ( tbb::blocked_range3d<int,int,int> ( 0, in->mDepth, 32,
                                                   0, in->mHeight, 32,
                                                   0, in->mWidth, 32 ),
               Vesselness ( in, out ), auto_partitioner() );

Individual blocks
Uses full CPUs
May not have the best cache performance
Next steps
Go try parallel development
Try threads to gain understanding and insight
Next, OpenMP: adapt existing code
TBB: more constructs, a different approach
Experiment with new languages: Erlang, Scala, Reia, Chapel, X10, Fortress...
Check out some of the resources provided
Have fun! It’s a brave new world out there...
Resources
TBB (http://www.threadingbuildingblocks.org/)
OpenMP (http://openmp.org/wp/)
Books/Articles
Java Concurrency in Practice (http://www.javaconcurrencyinpractice.com/)
Parallel Programming (http://www-users.cs.umn.edu/~karypis/parbook/)
ITK Software Guide (http://www.itk.org/ItkSoftwareGuide.pdf)
The Problem with Threads (http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf)
Tutorials
Parallel Programming (https://computing.llnl.gov/tutorials/parallel_comp/)
pthreads (https://computing.llnl.gov/tutorials/pthreads/)
OpenMP (https://computing.llnl.gov/tutorials/openMP/)
Other
LLNL (https://computing.llnl.gov/)
Erlang (http://en.wikipedia.org/wiki/Erlang_programming_language)
GCC-OpenMP (http://gcc.gnu.org/projects/gomp/)
Intel Compiler (http://software.intel.com/en-us/intel-compilers/)
Resources – languages
Erlang (http://www.erlang.org/)
Scala (http://www.scala-lang.org/)
Chapel (http://chapel.cs.washington.edu/)
X10 (http://x10-lang.org/)
Unified Parallel C (http://upc.gwu.edu/)
Titanium (http://titanium.cs.berkeley.edu/)
Co-Array Fortran (http://www.co-array.org/)
ZPL (http://www.cs.washington.edu/research/zpl/home/index.html)
High Performance Fortran (http://hpff.rice.edu/)
Fortress (http://projectfortress.sun.com/Projects/Community/)
Others (http://www.google.com/search?q=parallel+programming+language)
Medical image processing strategies for multi-core CPUs
Daniel Blezek, Mayo Clinic
blezek.daniel@mayo.edu
Thread construction – pthread example

#include <pthread.h>

void *(*start_routine)(void *);

int pthread_create ( pthread_t *restrict thread,
                     const pthread_attr_t *restrict attr,
                     void *(*start_routine)(void *),
                     void *restrict arg );
void pthread_exit ( void *value_ptr );
int pthread_join ( pthread_t thread, void **value_ptr );
Mutex – pthread example

#include <pthread.h>
pthread_mutex_t myMutex;
...
pthread_mutex_init ( &myMutex, NULL );
...
pthread_mutex_lock ( &myMutex );
// Critical section, only one thread at a time
...
pthread_mutex_unlock ( &myMutex );
...
// trylock returns 0 on success, EBUSY if the mutex is already locked
if ( pthread_mutex_trylock ( &myMutex ) == 0 ) {
  // We did get the lock, so we are in the critical section
  ...
  pthread_mutex_unlock ( &myMutex );
}
Mutex – Java example

class Foo {
  public synchronized int doWork () {
    // only one thread can execute doWork
  }

  Object resource;
  public int otherWork () {
    synchronized ( resource ) {
      // critical section; resource is the mutex
      ...
    }
  }
}

Editor's Notes

  • #3 If I had asked this question 5 years ago, almost no one would have raised their hand.
  • #5 Driving is inherently a parallel task, we coordinate at stop signs, stop lights, we obey the rules of the road, but we can get deadlocked (grid lock).