Multicore – Birgit Plötzeneder, 11/24/10
Intro (Why?) – Architecture – Languages: OMP, MPI – Tools
Darling, I shrunk the computer. *
* copyright by Prof. Erik Hagersten / Uppsala, who does awesome work
Signal propagation delay » transistor delay. Not enough ILP for more transistors. Power consumption.
O RLY? You want FASTER code. NOW.
- prefetching
- high computational load
- image/video
- fun
 
Intel Core 2 Quad
AMD Shanghai (K10)
Intel Dunnington (Xeon 74xx)
Intel i7
AMD Magny-Cours
The Secret...
Moving from 1 core to 4 cores can give you a factor of …
Moving from memory to L1 can give you a factor of …
Disabling the L2 cache will reduce system performance more than disabling a second CPU core of a dual-core processor.

* see Iris Christadler, LRZ
OMP and MPI
OpenMP concept: at program start, only the master thread runs. In a parallel region, a team of worker threads is generated ("fork"). Threads synchronize when leaving the parallel region ("join").
A First Program
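The slide's program itself is not preserved in this transcript. A minimal sketch of a first OpenMP program in C (built with, e.g., gcc -fopenmp) could look like this:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        /* fork: a team of threads executes the parallel region */
        #pragma omp parallel
        {
            printf("Hello from thread %d of %d\n",
                   omp_get_thread_num(), omp_get_num_threads());
        }
        /* join: the team has synchronized; only the master continues */
        return 0;
    }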
Work-sharing constructs: omp for (omp do in Fortran), sections, single, master
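A sketch (not the original slide code) showing for, single and master inside one parallel region; sections, not shown, would similarly split independent blocks among threads:

    #include <stdio.h>
    #define N 1000

    int main(void) {
        double a[N];
        #pragma omp parallel
        {
            #pragma omp single      /* executed by exactly one thread */
            printf("starting the loop\n");

            #pragma omp for         /* iterations are divided among the team */
            for (int i = 0; i < N; i++)
                a[i] = 2.0 * i;

            #pragma omp master      /* executed only by the master thread */
            printf("a[1] = %f\n", a[1]);
        }
        return 0;
    }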
Data-sharing attribute clauses:
shared: visible and accessible by all threads simultaneously; the default (but not the loop index i!). Watch out for dependencies like a[i] = a[i-1]...
private: each thread has its own local copy; the value is not maintained for use outside
firstprivate: like private, except initialized to the original value
lastprivate: like private, except the original value is updated after the construct
reduction: per-thread copies are combined with a reduction operator (-> reduction ops)
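A sketch exercising the clauses in one loop; the printed values follow from this code, not from the slides:

    #include <stdio.h>

    int main(void) {
        int i, tmp = 0, offset = 100, last = 0, sum = 0;

        /* tmp:    uninitialized private copy per thread
           offset: private copy initialized to 100 (firstprivate)
           last:   value of the sequentially last iteration is copied back
           sum:    per-thread partial sums combined with + (reduction) */
        #pragma omp parallel for private(tmp) firstprivate(offset) \
                lastprivate(last) reduction(+:sum)
        for (i = 0; i < 8; i++) {
            tmp = i + offset;
            last = tmp;
            sum += i;
        }
        printf("last = %d, sum = %d\n", last, sum);  /* 107 and 28 */
        return 0;
    }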
Scheduling clauses: schedule(type, chunk), with type = static, dynamic, or guided
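Roughly how the clause is used; work() is a hypothetical stand-in for a loop body whose cost varies per iteration (build with -fopenmp -lm):

    #include <math.h>
    #define N 10000

    double result[N];

    /* hypothetical stand-in: cost depends on i */
    static double work(int i) {
        double s = 0.0;
        for (int k = 0; k < i % 100; k++)
            s += sin((double)k);
        return s;
    }

    int main(void) {
        /* dynamic: chunks of 8 iterations are handed out as threads
           become free (good for uneven work); static would assign
           chunks up front; guided starts large and shrinks the chunks */
        #pragma omp parallel for schedule(dynamic, 8)
        for (int i = 0; i < N; i++)
            result[i] = work(i);
        return 0;
    }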
Other clauses:
critical: executed by only one thread at a time
atomic: similar to a critical section, but may perform better
ordered: executed in the order in which iterations would be executed in a sequential loop
barrier, nowait
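A sketch contrasting critical and atomic; the per-iteration computation is just a placeholder:

    #include <stdio.h>
    #define N 100

    int main(void) {
        int best = 0, total = 0;
        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            int v = (i * 37) % 101;   /* placeholder for real work */

            #pragma omp critical      /* whole statement, one thread at a time */
            if (v > best) best = v;

            #pragma omp atomic        /* single memory update, often cheaper */
            total += v;
        }
        printf("best = %d, total = %d\n", best, total);
        return 0;
    }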
Using clauses
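The slide's example is not preserved here; a sketch of how nowait interacts with the implicit barriers at the end of work-sharing constructs:

    #include <stdio.h>
    #define N 1000

    int main(void) {
        double a[N], b[N], c[N];
        #pragma omp parallel
        {
            #pragma omp for nowait   /* drop the implicit barrier: the next
                                        loop does not read a[], so threads
                                        may move on immediately */
            for (int i = 0; i < N; i++)
                a[i] = i * 0.5;

            #pragma omp for          /* implicit barrier at the end */
            for (int i = 0; i < N; i++)
                b[i] = i * 2.0;

            /* safe: each thread finished its a[] chunk before its b[]
               chunk, and the barrier above waited for all threads */
            #pragma omp for
            for (int i = 0; i < N; i++)
                c[i] = a[i] + b[i];
        }
        printf("c[10] = %f\n", c[10]);
        return 0;
    }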
 
MPI concept: mpicc <options> prog.c; mpirun -arch <architecture> -np <np> prog
MPI
MPI program: 6 basic calls: MPI_INIT, MPI_COMM_RANK, MPI_COMM_SIZE, MPI_SEND, MPI_RECV, MPI_FINALIZE.
MPI messages: data (startbuf, count, datatype) plus an envelope (destination/source, tag, communicator).
Communicators define which processes may exchange messages.
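A minimal sketch using exactly these six calls; run with at least two processes (mpirun -np 2 ./prog):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, size, msg = 42;
        MPI_Status status;

        MPI_Init(&argc, &argv);                  /* 1 */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* 2: who am I?       */
        MPI_Comm_size(MPI_COMM_WORLD, &size);    /* 3: how many of us? */

        if (rank == 0)
            /* data: &msg, 1, MPI_INT; envelope: dest 1, tag 0, communicator */
            MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);           /* 4 */
        else if (rank == 1) {
            MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);  /* 5 */
            printf("rank 1 of %d received %d\n", size, msg);
        }

        MPI_Finalize();                          /* 6 */
        return 0;
    }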
Communication modes: collective vs. point-to-point; One2All, All2All, All2One; blocking / non-blocking; synchronous / asynchronous
Communication modes:
synchronous mode ("safest"): is the receiver ready?
ready mode (lowest system overhead): only if a receiver is already waiting (streaming)
buffered mode (decouples sender from receiver): mind buffer size and buffer attachment!
standard mode
Communication mode    Blocking routine    Non-blocking routine
synchronous           MPI_SSEND           MPI_ISSEND
ready                 MPI_RSEND           MPI_IRSEND
buffered              MPI_BSEND           MPI_IBSEND
standard              MPI_SEND            MPI_ISEND
(receive)             MPI_RECV            MPI_IRECV
(combined)            MPI_SENDRECV, MPI_SENDRECV_REPLACE
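A sketch of the non-blocking standard-mode pair; the buffer must not be touched between MPI_ISEND/MPI_IRECV and the matching MPI_WAIT:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, out = 1, in = 0;
        MPI_Request req;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            MPI_Isend(&out, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
            /* ... useful work can overlap the communication here ... */
            MPI_Wait(&req, &status);   /* out may be reused only after this */
        } else if (rank == 1) {
            MPI_Irecv(&in, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
            MPI_Wait(&req, &status);
            printf("received %d\n", in);
        }

        MPI_Finalize();
        return 0;
    }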
Collective communication: Barrier, Broadcast, Gather, Scatter, Reduction
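A sketch combining two of these, a broadcast followed by a reduction:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, size, n = 0, sum = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) n = 10;
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);   /* root 0 -> all */

        int mine = rank * n;   /* each rank's local contribution */
        MPI_Reduce(&mine, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum over %d ranks = %d\n", size, sum);
        MPI_Finalize();
        return 0;
    }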
gprof, valgrind, PAPI
PAPI is a library that monitors hardware events while a program runs. Papiex is a tool that makes it easy to access performance counters using PAPI.*
* http://icl.cs.utk.edu/papi/
papiex -e <EVENT> ./my_prog (for some tests, turn off optimizations with the flag -O0)
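A sketch using PAPI's classic high-level counter interface (the API of the PAPI releases this talk dates from; later versions replaced it). Which events exist depends on the CPU:

    #include <stdio.h>
    #include <papi.h>

    int main(void) {
        int events[2] = { PAPI_TOT_CYC, PAPI_L2_DCM };  /* cycles, L2 data misses */
        long long counts[2];

        if (PAPI_start_counters(events, 2) != PAPI_OK)
            return 1;

        /* ... the code under measurement ... */
        volatile double x = 0.0;
        for (int i = 0; i < 1000000; i++)
            x += i * 0.5;

        if (PAPI_stop_counters(counts, 2) != PAPI_OK)
            return 1;
        printf("cycles: %lld, L2 data misses: %lld\n", counts[0], counts[1]);
        return 0;
    }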
Profilers: two types, statistical profilers and event-based profilers.
Statistical profiling: interrupts at random intervals and records which program instruction the CPU is executing.
Event-based profiling: interrupts triggered by hardware counter events are recorded.
Measuring profiles affects performance, and a lot of data still gets saved.
Tracing: wrappers for function calls (for example MPI_Recv) record when a function was called and with what parameters, and which nodes exchanged messages and the message sizes.
Can affect performance.
Intel tracing tools
Marmot: MPI correctness and portability checker
MpiP: http://mpip.sourceforge.net/
Extrae + Paraver: module add paraver; mpi2prv -f TRACE.mpits -o MPImatrix.prv
Scalasca
Screenshots and examples of profilers/tracing tools are available – but not on the internet.
This talk was given to the TumFUG Linux/Unix user group at the TU München. Contact me via [email_address]. You may use the pictures of the processors (not the screenshots, not the overview picture, which I only adapted), but please notify and credit me accordingly. Some of the code was copy-pasted from Wikipedia; I've removed copyright-problematic parts.
