Intel® MPI Library e OpenMP* - Intel Software Conference 2013
Talk given by Werner Krotz-Vogel at the Intel Software Conference on 6 August (NCC/UNESP/SP) and 12 August (COPPE/UFRJ/RJ).

Presentation Transcript

  • © 2013, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others.
  • MPI and OpenMP: Reducing Effort for Parallel Software Development. August 2013. Werner Krotz-Vogel
  • Objectives
    – Design parallel applications from serial codes
    – Determine appropriate decomposition strategies for applications
    – Choose the applicable parallel model for the implementation: MPI and OpenMP
    (© 2009 Mathew J. Sottile, Timothy G. Mattson, and Craig E)
  • Why MPI and OpenMP?
    – Performance scales roughly with die area: 4x the silicon die area gives 2x the performance in one core, but 4x the performance when dedicated to 4 cores
    – Power scales roughly with voltage squared, and voltage is roughly proportional to clock frequency
    – Conclusion (with respect to Pollack's rule above): multiple cores are a powerful handle to adjust performance per watt
    – Parallel hardware therefore demands parallel software
  • Parallel Programming: Algorithms. Distributed versus Shared Memory
    – Message passing (MPI*): multiple processes, each with its own memory, connected by a network, sharing data with messages
    – Threads (explicit threads, OpenMP*): a single process with concurrent execution, sharing memory and resources over a common bus
  • Parallel Programming: Algorithms. Designing Parallel Programs
    – Partition: divide the problem into tasks
    – Communicate: determine the amount and pattern of communication
    – Agglomerate: combine tasks
    – Map: assign agglomerated tasks to physical processors
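The four design steps above can be sketched with a deliberately serial C toy (all names are hypothetical, not from the slides) that sums an array: each element is a primitive task, tasks are agglomerated into chunks, each chunk is mapped to one worker, and the partial results are then combined, which is the communication step.

```c
/* Hypothetical serial sketch of the four design steps for an array sum:
 * partition (one primitive task per element), agglomerate (group tasks
 * into one chunk per worker), map (assign each chunk to a worker, here a
 * loop index), communicate (combine the partial sums). */
#define N 12
#define WORKERS 4

int demo_total(void) {
    int data[N], partial[WORKERS] = {0};
    for (int i = 0; i < N; i++)
        data[i] = i + 1;                 /* values 1..12 */

    int chunk = N / WORKERS;             /* assumes N divisible by WORKERS */
    for (int w = 0; w < WORKERS; w++)    /* "map": worker w gets one chunk */
        for (int i = w * chunk; i < (w + 1) * chunk; i++)
            partial[w] += data[i];       /* local computation, no sharing */

    int total = 0;
    for (int w = 0; w < WORKERS; w++)    /* "communicate": combine results */
        total += partial[w];
    return total;                        /* 1 + 2 + ... + 12 = 78 */
}
```

In a real MPI program each worker would be a rank and the final combining loop a collective; in OpenMP the chunking is what a worksharing loop does implicitly.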
  • Parallel Programming: Algorithms. 1. Partitioning
    – Discover as much parallelism as possible: independent computations and/or data; maximize the number of primitive tasks
    – Functional decomposition: divide the computation, then associate the data
    – Domain decomposition: divide the data into pieces, then associate the computation
  • Parallel Programming: Algorithms. Decomposition Methods
    – Functional decomposition: focusing on the computations can reveal structure in a problem (e.g. coupled atmosphere, ocean, land-surface, and hydrology models)
    – Domain decomposition: focus on the largest or most frequently accessed data structure; data parallelism, with the same operation(s) applied to all data
    Grid reprinted with permission of Dr. Phu V. Luong, Coastal and Hydraulics Laboratory, Engineer Research and Development Center (ERDC).
  • Parallel Programming: Algorithms. 2. Communication
    – Determine the communication pattern between primitive tasks: what data need to be shared?
    – Point-to-point: one thread to another
    – Collective: groups of threads sharing data
    – Execution-order dependencies are communication
  • Parallel Programming: Algorithms. 3. Agglomeration
    – Group primitive tasks in order to:
      – Improve performance/granularity and localize communication (put tasks that communicate in the same group)
      – Maintain scalability of the design (gracefully handle changes in data-set size or number of processors)
      – Simplify programming and maintenance
  • Parallel Programming: Algorithms. 4. Mapping
    – Assign tasks to processors in order to maximize processor utilization and minimize inter-processor communication
    – One task or multiple tasks per processor?
    – Static or dynamic assignment?
    – Most applicable to message passing, where the programmer can map tasks to threads
  • Parallel Programming: Algorithms. What Is Not Parallel
    – Subprograms with “state” or with side effects: pseudo-random number generators, file I/O routines, output to the screen
    – Loops with data dependencies: variables written in one iteration and read in another (quick test: reverse the loop iterations)
      – Loop-carried: a value carried from one iteration to the next
      – Induction variables: incremented on each trip through the loop
      – Reductions: summation; collapsing an array to a single value
      – Recurrences: feeding information forward
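The distinction between a loop-carried recurrence and a reduction can be made concrete with a small C sketch (function names are illustrative, not from the slides): the prefix-sum loop reads a value written in the previous iteration and fails the reverse-iteration test, while the reduction collapses the array to one value and its partial sums can be formed in any order.

```c
/* A reduction: collapses the array to one value; safe to parallelize
 * because the partial sums can be formed in any order. */
int reduce_sum(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* A loop-carried recurrence: out[i] depends on out[i-1], the value
 * written in the previous iteration, so iterations cannot be reordered. */
void prefix_sums(const int *a, int *out, int n) {
    out[0] = a[0];
    for (int i = 1; i < n; i++)
        out[i] = out[i - 1] + a[i];
}

/* Small demos over {1, 2, 3, 4}. */
int reduce_demo(void) {
    int a[4] = {1, 2, 3, 4};
    return reduce_sum(a, 4);            /* 10 */
}
int prefix_demo(void) {
    int a[4] = {1, 2, 3, 4}, out[4];
    prefix_sums(a, out, 4);
    return out[3];                      /* 1+2+3+4 = 10 */
}
```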
  • Introduction to MPI. What is MPI?
    (Figure: a cluster of nodes, Node 0 through Node n, each with its own CPU and private memory, connected by a network.)
  • Introduction to MPI. The Distributed-Memory Model
    – Characteristics of distributed-memory machines: no common address space; a high-latency interconnection network; explicit message exchange
  • Introduction to MPI. Message Passing Interface (MPI)
    – Depending on the interconnection network, clusters exhibit different interfaces to the network, e.g. Ethernet: UNIX sockets; InfiniBand: OFED, verbs
    – MPI provides an abstraction over these interfaces: a generic communication interface, logical ranks (no physical addresses), and supportive functions (e.g. parallel file I/O)
  • Introduction to MPI. “Hello World” in Fortran
        program hello
        include 'mpif.h'
        integer mpierr, rank, procs
        call MPI_Init(mpierr)
        call MPI_Comm_size(MPI_COMM_WORLD, procs, mpierr)
        call MPI_Comm_rank(MPI_COMM_WORLD, rank, mpierr)
        write (*,*) 'Hello world from ', rank, ' of ', procs
        call MPI_Finalize(mpierr)
        end program hello
  • Introduction to MPI. Compilation and Execution
    – MPI implementations ship with compiler wrappers:
        mpiicc -o helloc hello.c
        mpiifort -o hellof hello.f
    – The wrapper calls the native C/Fortran compiler and passes along the MPI specifics (e.g. the library)
    – Wrappers usually accept the same compiler options as the underlying native compiler, e.g.
        mpiicc -O2 -fast -o module.o -c module.c
  • Introduction to MPI. Compilation and Execution
    – To run the “Hello World”, use: mpirun -np 8 helloc
    – mpirun provides portable, transparent application start-up: it connects to the cluster nodes, launches the processes on them, and passes along the information each process needs to reach the others
    – When mpirun returns, execution has completed
    – Note: mpirun is implementation-specific
  • Introduction to MPI. Output of “Hello World”
        Hello world from 0 of 8
        Hello world from 1 of 8
        Hello world from 4 of 8
        Hello world from 6 of 8
        Hello world from 5 of 8
        Hello world from 7 of 8
        Hello world from 2 of 8
        Hello world from 3 of 8
    There is no particular ordering of process execution! If ordering is needed, the programmer must ensure it by explicit communication.
  • Introduction to MPI. Sending Messages (Blocking)
        subroutine master(array, length)
        include 'mpif.h'
        double precision array(1)
        integer length
        double precision sum, globalsum
        integer rank, procs, mpierr, size
        call MPI_Comm_size(MPI_COMM_WORLD, procs, mpierr)
        size = length / procs
        do rank = 1,procs-1
          call MPI_Send(size, 1, MPI_INTEGER, rank, 0,
       &    MPI_COMM_WORLD, mpierr)
          call MPI_Send(array(rank*size+1:rank*size+size), size,
       &    MPI_DOUBLE_PRECISION, rank, 1, MPI_COMM_WORLD, mpierr)
        enddo
    The example is only correct if length is a multiple of procs.
  • Introduction to MPI. MPI_Send
        int MPI_Send(void* buf, int count, MPI_Datatype dtype, int dest, int tag, MPI_Comm comm)
        MPI_SEND(BUF, COUNT, DTYPE, DEST, TAG, COMM, IERR)
          <type> BUF(*)
          INTEGER COUNT, DTYPE, DEST, TAG, COMM, IERR
    – Blocking message delivery: blocks until the receiver has completely received the message, effectively synchronizing sender and receiver
  • Introduction to MPI. MPI_Send parameters
    – buf: pointer to the message data (e.g. pointer to the first element of an array)
    – count: length of the message in elements
    – dtype: data type of the message content (size of data type × count = message size)
    – dest: rank of the destination process
    – tag: “type” of the message
    – comm: handle to the communication group
    – ierr: Fortran: OUT argument for the error code; C/C++: the error code is the return value
  • Introduction to MPI. MPI Data Types for C
    MPI provides predefined data types that must be specified when passing messages.
        MPI data type        | C data type
        MPI_CHAR             | signed char
        MPI_SHORT            | short
        MPI_INT              | int
        MPI_LONG             | long
        MPI_UNSIGNED_CHAR    | unsigned char
        MPI_UNSIGNED_SHORT   | unsigned short
        MPI_UNSIGNED         | unsigned int
        MPI_UNSIGNED_LONG    | unsigned long
        MPI_FLOAT            | float
        MPI_DOUBLE           | double
        MPI_LONG_DOUBLE      | long double
        MPI_BYTE             | (no C equivalent)
        MPI_PACKED           | (no C equivalent)
  • Introduction to MPI. Communication Wildcards
    – MPI defines a set of wildcards to be specified with communication primitives:
      – MPI_ANY_SOURCE: matches any logical rank when receiving a message with MPI_Recv (the message status contains the actual sender)
      – MPI_ANY_TAG: matches any message tag when receiving a message (the message status contains the actual tag)
      – MPI_PROC_NULL: special value indicating a non-existent process rank (messages are neither delivered to nor received from this special rank)
  • Introduction to MPI. Blocking Communication
    – MPI_Send and MPI_Recv are blocking operations (figure: process A blocks in MPI_Send until process B, which alternates computation and communication, reaches the matching MPI_Recv)
  • Introduction to MPI. Non-blocking Communication
    – MPI_Isend and MPI_Irecv are non-blocking operations: they start the transfer and return immediately, so computation can overlap communication; each process later completes the operation with MPI_Wait
  • Introduction to MPI. ‘Collectives’, e.g. MPI_Reduce
        int MPI_Reduce(void* sendbuf, void* recvbuf, int count, MPI_Datatype dtype, MPI_Op op, int root, MPI_Comm comm)
        MPI_REDUCE(SENDBUF, RECVBUF, COUNT, DTYPE, OP, ROOT, COMM, IERR)
          <type> SENDBUF(*), RECVBUF(*)
          INTEGER COUNT, DTYPE, OP, ROOT, COMM, IERR
    – A global operation that accumulates the data at the processes into a global result at the root process
    – All processes have to reach the same MPI_Reduce invocation; otherwise deadlocks and undefined behavior may occur
  • Introduction to MPI. MPI_Reduce Operators
    – MPI_MAX / MPI_MIN: maximum / minimum
    – MPI_SUM / MPI_PROD: sum / product
    – MPI_LAND / MPI_BAND: logical and / bit-wise and
    – MPI_LOR / MPI_BOR: logical or / bit-wise or
    – MPI_LXOR / MPI_BXOR: logical exclusive or / bit-wise exclusive or
    – MPI_MAXLOC / MPI_MINLOC: max / min value and location
  • Introduction to MPI. MPI_Barrier
        int MPI_Barrier(MPI_Comm comm)
        MPI_BARRIER(COMM, IERROR)
          INTEGER COMM, IERROR
    – A global operation that synchronizes all participating processes
    – All processes have to reach the MPI_Barrier invocation; otherwise deadlocks and undefined behavior may occur
  • Introduction to MPI. Stencil Computation Example
    – Some algorithms (e.g. Jacobi, Gauss-Seidel) process data with a stencil:
        grid(i,j) = 0.25 * (grid(i+1,j) + grid(i-1,j) + grid(i,j+1) + grid(i,j-1))
    – Data access pattern: each point (i,j) reads its four neighbours (i-1,j), (i+1,j), (i,j-1), and (i,j+1)
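The stencil above can be written as one Jacobi sweep in C (a minimal sketch; the grid size, boundary handling, and function names are assumptions): the new value at (i,j) is the average of the four neighbours, read from the old grid so the sweep has no loop-carried dependence and could be parallelized.

```c
#define NX 4
#define NY 4

/* One Jacobi sweep: interior points become the average of their four
 * neighbours in the old grid; boundary values are copied unchanged. */
void jacobi_sweep(const double in[NX][NY], double out[NX][NY]) {
    for (int i = 0; i < NX; i++)
        for (int j = 0; j < NY; j++)
            out[i][j] = in[i][j];                       /* keep boundaries */
    for (int i = 1; i < NX - 1; i++)
        for (int j = 1; j < NY - 1; j++)
            out[i][j] = 0.25 * (in[i + 1][j] + in[i - 1][j] +
                                in[i][j + 1] + in[i][j - 1]);
}

/* Demo: zero interior, boundary fixed at 1.0; after one sweep each of
 * the four interior points sees two boundary neighbours: 0.25 * 2 = 0.5. */
double jacobi_demo(void) {
    double in[NX][NY], out[NX][NY];
    for (int i = 0; i < NX; i++)
        for (int j = 0; j < NY; j++)
            in[i][j] = (i == 0 || j == 0 || i == NX - 1 || j == NY - 1)
                           ? 1.0 : 0.0;
    jacobi_sweep(in, out);
    return out[1][1];
}
```

In an MPI decomposition, each rank would own a sub-grid and exchange boundary ("halo") rows with its neighbours before every sweep.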
  • Introduction to MPI. MPI features not covered
    – One-sided communication: MPI_Put, MPI_Get; uses Remote Memory Access (RMA) and separates communication from synchronization
    – User-defined datatypes, strided messages
    – Dynamic process spawning: MPI_Comm_spawn; collective communication can be used across disjoint intra-communicators
    – Parallel I/O
    – MPI 3.0 (released September 21, 2012)
  • Introduction to OpenMP. What Is OpenMP?
    – A portable, shared-memory threading API for Fortran, C, and C++, with multi-vendor support on both Linux and Windows
    – Standardizes task- and loop-level parallelism; supports coarse-grained parallelism
    – Combines serial and parallel code in a single source
    – Standardizes roughly 20 years of compiler-directed threading experience
    – http://www.openmp.org – the current specification is OpenMP 4.0 of July 31, 2013 (combined C/C++ and Fortran)
  • Introduction to OpenMP. OpenMP Programming Model
    – Fork-join parallelism: the master thread spawns a team of threads as needed for each parallel region
    – Parallelism is added incrementally: the sequential program evolves into a parallel program
  • Introduction to OpenMP. A Few Syntax Details to Get Started
    – Most of the constructs in OpenMP are compiler directives or pragmas
    – For C and C++, the pragmas take the form:
        #pragma omp construct [clause [clause]…]
    – For Fortran, the directives take one of the forms:
        C$OMP construct [clause [clause]…]
        !$OMP construct [clause [clause]…]
        *$OMP construct [clause [clause]…]
    – Header file or Fortran 90 module:
        #include "omp.h"
        use omp_lib
  • Introduction to OpenMP. Worksharing
    – Worksharing is the general term used in OpenMP for the distribution of work across threads; it automatically divides work among them
    – Three examples of worksharing in OpenMP are the omp for, omp sections, and omp task constructs
  • Introduction to OpenMP. The ‘omp for’ Construct
    – Threads are assigned an independent set of iterations
    – Threads must wait at the end of the worksharing construct (implicit barrier)
        // assume N = 12; the 12 iterations are divided among the threads
        #pragma omp parallel
        #pragma omp for
        for (i = 1; i < N+1; i++)
            c[i] = a[i] + b[i];
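Written out as a complete, compilable function (a sketch; the names are illustrative), the slide's loop looks as follows, using the combined parallel for form and idiomatic 0-based indexing. If the code is built without OpenMP support, the pragma is simply ignored and the loop runs serially with the same result.

```c
#define N 12

/* Element-wise vector addition; the worksharing construct splits the
 * independent iterations among the threads of the parallel region. */
void vec_add(const double *a, const double *b, double *c, int n) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

double vec_add_demo(void) {
    double a[N], b[N], c[N];
    for (int i = 0; i < N; i++) {
        a[i] = i;
        b[i] = 2.0 * i;
    }
    vec_add(a, b, c, N);
    return c[N - 1];                 /* 11 + 22 = 33 */
}
```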
  • Introduction to OpenMP. New Additions to OpenMP
    – Tasks, the main change in OpenMP 3.0: allow parallelization of irregular problems such as unbounded loops, recursive algorithms, and producer/consumer patterns
    – Device constructs, the main change in OpenMP 4.0: allow describing regions of code where data and/or computation should be moved to another computing device
  • Introduction to OpenMP. What are tasks?
    – Tasks are independent units of work, composed of: the code to execute, a data environment, and internal control variables (ICVs)
    – Threads are assigned to perform the work of each task
    – Tasks may be deferred or executed immediately; the runtime system decides which
  • Introduction to OpenMP. Simple Task Example
        #pragma omp parallel // assume 8 threads: a pool of 8 threads is created here
        {
          #pragma omp single private(p)
          { // one thread gets to execute the while loop
            …
            while (p) {
              #pragma omp task
              { // the single “while loop” thread creates a task for each instance of processwork()
                processwork(p);
              }
              p = p->next;
            }
          }
        }
  • Introduction to OpenMP. Task Construct: Explicit Task View
    – A team of threads is created at the omp parallel construct
    – A single thread, call it “L”, is chosen to execute the while loop
    – Thread L operates the while loop, creates tasks, and fetches the next pointers
    – Each time L crosses the omp task construct it generates a new task, which is assigned to a thread
    – All tasks complete at the barrier at the end of the parallel region's single construct
        #pragma omp parallel
        {
          #pragma omp single
          { // block 1
            node * p = head;
            while (p) { // block 2
              #pragma omp task
              process(p);
              p = p->next; // block 3
            }
          }
        }
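A self-contained version of this pattern (a hedged sketch: the node type, process_list, and the atomic accumulator are illustrative, not from the slides) traverses a linked list, creating one task per node. Built without OpenMP, the pragmas are ignored and the nodes are simply processed inline, with the same total.

```c
typedef struct node {
    int value;
    struct node *next;
} node;

static int total;                    /* shared accumulator for the demo */

/* One thread executes the traversal inside single; each node becomes a
 * task, and firstprivate(p) gives the task its own copy of the pointer.
 * The atomic update keeps concurrent tasks from racing on total. */
int process_list(node *head) {
    total = 0;
    #pragma omp parallel
    #pragma omp single
    {
        for (node *p = head; p != 0; p = p->next) {
            #pragma omp task firstprivate(p)
            {
                #pragma omp atomic
                total += p->value;
            }
        }
    }   /* implicit barrier: all tasks have completed here */
    return total;
}

int task_demo(void) {
    node c = {3, 0}, b = {2, &c}, a = {1, &b};
    return process_list(&a);         /* 1 + 2 + 3 = 6 */
}
```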
  • Introduction to OpenMP. The OpenMP* Reduction Clause
    – reduction (op : list)
    – The variables in “list” must be shared in the enclosing parallel region
    – Inside a parallel or worksharing construct: a PRIVATE copy of each list variable is created and initialized depending on the “op”; these copies are updated locally by the threads
    – At the end of the construct, the local copies are combined through “op” into a single value, which is combined with the value in the original SHARED variable
  • Introduction to OpenMP. Reduction Example
        #pragma omp parallel for reduction(+:sum)
        for (i = 0; i < N; i++) {
          sum += a[i] * b[i];
        }
    – A local copy of sum is made for each thread; all local copies are added together and stored in the “global” variable
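As a complete function (a sketch with illustrative names), the example becomes a dot product: each thread accumulates into its private copy of sum, initialized to 0 (the identity of +), and the copies are combined at the end of the loop. Built without OpenMP, the pragma is ignored and the serial loop gives the same answer.

```c
/* Dot product with an OpenMP reduction: sum is privatized per thread
 * and the private copies are combined with '+' into the shared sum at
 * the end of the construct. */
double dot(const double *a, const double *b, int n) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

double dot_demo(void) {
    double a[3] = {1.0, 2.0, 3.0};
    double b[3] = {4.0, 5.0, 6.0};
    return dot(a, b, 3);             /* 4 + 10 + 18 = 32 */
}
```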
  • Why Hybrid Programming? OpenMP/MPI
    (Figure: runtime in seconds versus number of nodes, 1 to 128, for various combinations of processes per node (PPN) and threads per process (TPP).)
    – 53% improvement over MPI alone for a simulation of free-surface flows, a finite-element CFD solver written in Fortran and C
    – Figure kindly provided by the HPC group of the Center of Computing and Communication, RWTH Aachen, Germany
  • The Good, the Bad, and the Ugly
    – The Good: OpenMP and MPI blend well with each other if certain rules are respected by programmers
    – The Bad: programmers need to be aware of the issues of hybrid programming, e.g. using thread-safe libraries and MPI
    – The Ugly: what is the best setting for PPN and TPP for a given machine?
    – MPI and OpenMP hybrid programs can greatly improve the performance of parallel codes!
  • Legal Disclaimer & Optimization Notice
    INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY RELATING TO THIS INFORMATION, INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
    Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
    Copyright © , Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Core, Xeon Phi, VTune, and Cilk are trademarks of Intel Corporation in the U.S. and other countries.
    Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804
    Copyright © 2012, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. 8/21/201 Intel Confidential - Use under NDA only