Advancing Application Process Affinity Experimentation:
Open MPI's LAMA-Based Affinity Interface
Jeff Squyres
September 18, 2013
Joshua Hursey
Locality Matters
• Multiple talks here at EuroMPI’13 about network locality
• Goals:
  ▪ Minimize data transfer distance
  ▪ Reduce network congestion and contention
• …this also matters inside the server, too!
[Figure: hwloc topology diagram of the example server. Machine (128GB) with two NUMA nodes (64GB each), one socket per NUMA node, one 20MB L3 per socket, and eight cores per socket; each core has its own L2 (256KB), L1d (32KB), and L1i (32KB) and two PUs. PCI devices on NUMANode P#0: eth0-eth3 (8086:1521), eth4-eth5 (1137:0043), and 102b:0522. PCI devices on NUMANode P#1: sda/sdb (1000:005b) and eth6-eth7 (1137:0043).]
Intel Xeon E5-2690 (“Sandy Bridge”)
2 sockets, 8 cores, 64GB per socket
Figure callouts: 1G NICs; 10G NICs; L1 and L2; Shared L3; Hyperthreading enabled
The intent of this work is to provide a mechanism that
allows users to explore the process-placement space
within the scope of their own applications.
A User’s Playground
LAMA
• Locality-Aware Mapping Algorithm (LAMA)
  ▪ Supports a wide range of regular mapping patterns.
• Adapts at runtime to available hardware
  ▪ Supports homogeneous and heterogeneous systems.
• Extensible to any depth of server topology
  ▪ Naturally supports potentially deeper topologies of future server architectures.
LAMA Inspiration
• Drawn from much prior work
• Most notably, heavily inspired by BlueGene/P and /Q mapping systems
  ▪ LAMA’s mapping specification is similar
Launching MPI Applications
• Three steps in MPI process placement
1. Mapping
2. Ordering
3. Binding
• Let's discuss how these work in Open MPI
1. Mapping
• Create a layout of processes-to-resources
[Figure: a 4 x 4 grid of servers, each populated with MPI processes.]
Mapping
• MPI's runtime must create a map, pairing processes-to-processors (and memory).
• Basic technique:
  ▪ Gather hwloc topologies from allocated nodes.
  ▪ Mapping agent then makes a plan for which resources are assigned to processes
Mapping Agent
• Act of planning mappings:
  ▪ Specify which process will be launched on each server
  ▪ Identify if any hardware resource will be oversubscribed
• Processes are mapped to the resolution of a single processing unit (PU)
  ▪ Smallest unit of allocation: hardware thread
  ▪ In HPC, usually the same as a processor core
Oversubscription
• Common / usual definition:
  ▪ When a single PU is assigned more than one process
• Complicating the definition:
  ▪ Some applications may need more than one PU per process (multithreaded applications)
• How can the user express what their application means by “oversubscription”?
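One way to picture the question is to let the user say how many PUs each process needs and then count how many processes land on each PU. The sketch below is illustrative only: `plan_mapping`, `pus_per_proc`, and the flat list of PU ids are hypothetical names chosen for this example, not part of Open MPI.

```python
from collections import Counter


def plan_mapping(num_procs, pu_ids, pus_per_proc=1):
    """Assign each process a block of `pus_per_proc` PUs, wrapping around
    the available PUs when they run out.

    Returns (plan, oversubscribed), where plan[i] is the tuple of PU ids
    given to process i and oversubscribed is the sorted list of PUs that
    were handed out more than once.
    """
    if pus_per_proc < 1 or not pu_ids:
        raise ValueError("need at least one PU per process and one PU")
    plan = []
    load = Counter()
    cursor = 0
    for _ in range(num_procs):
        block = tuple(pu_ids[(cursor + k) % len(pu_ids)]
                      for k in range(pus_per_proc))
        cursor += pus_per_proc
        plan.append(block)
        load.update(block)
    oversubscribed = sorted(pu for pu, n in load.items() if n > 1)
    return plan, oversubscribed


# Four single-threaded processes fit on four PUs.
plan_a, over_a = plan_mapping(4, [0, 1, 2, 3])
# Three processes that each need two PUs do not.
plan_b, over_b = plan_mapping(3, [0, 1, 2, 3], pus_per_proc=2)
print(plan_a, over_a)
print(plan_b, over_b)
```

With one PU per process the usual definition applies unchanged; raising `pus_per_proc` shows how the same hardware becomes oversubscribed with fewer processes.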
2. Ordering: By “Slot”
Assigning MPI_COMM_WORLD (MCW) ranks to mapped processes
[Figure: 4 x 4 grid of servers with 16 processes each. Ranks are consecutive within a server: 0-15 on the first server, 16-31 on the second, 32-47 on the third, and so on.]
2. Ordering: By Node
Assigning MCW ranks to mapped processes
[Figure: 4 x 4 grid of servers with 16 processes each. Ranks are dealt round-robin across servers: the first server holds 0, 16, 32, ..., 240, the second holds 1, 17, 33, ..., 241, and so on.]
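The two numbering schemes in the "By Slot" and "By Node" figures can be written as simple index arithmetic. This is a sketch, assuming 16 servers with 16 processes each as drawn in the figures; the function names are made up for this example.

```python
def rank_by_slot(node, slot, slots_per_node):
    """Fill every slot on a node before moving to the next node."""
    return node * slots_per_node + slot


def rank_by_node(node, slot, num_nodes):
    """Deal ranks round-robin across nodes, one slot at a time."""
    return slot * num_nodes + node


NUM_NODES = 16        # the 4 x 4 grid of servers in the figures
SLOTS_PER_NODE = 16   # processes mapped to each server

first_node_by_slot = [rank_by_slot(0, s, SLOTS_PER_NODE) for s in range(4)]
second_node_by_slot = [rank_by_slot(1, s, SLOTS_PER_NODE) for s in range(4)]
first_node_by_node = [rank_by_node(0, s, NUM_NODES) for s in range(4)]
second_node_by_node = [rank_by_node(1, s, NUM_NODES) for s in range(4)]

print("by slot:", first_node_by_slot, second_node_by_slot)
print("by node:", first_node_by_node, second_node_by_node)
```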
Ordering
• Each process must be assigned a unique rank in MPI_COMM_WORLD
• Two common types of ordering:
  ▪ natural
    • The order in which processes are mapped determines their rank in MCW
  ▪ sequential
    • The processes are sequentially numbered starting at the first processing unit, and continuing until the last processing unit
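A small sketch of the difference between the two orderings, under two simplifying assumptions that are not from the slides: every process sits on its own PU, and PU ids increase from the first processing unit to the last. The function names and the sample mapping order are hypothetical.

```python
def natural_order(mapped_pus):
    """Rank i goes to the i-th process that was mapped."""
    return {pu: rank for rank, pu in enumerate(mapped_pus)}


def sequential_order(mapped_pus):
    """Ranks follow PU position, first PU to last, ignoring mapping order."""
    return {pu: rank for rank, pu in enumerate(sorted(mapped_pus))}


# Hypothetical mapping order: the mapper alternated between two sockets
# whose PUs are numbered 0-3 and 4-7.
mapping_order = [0, 4, 1, 5]
natural = natural_order(mapping_order)
sequential = sequential_order(mapping_order)

print("natural:   ", natural)      # PU id -> MCW rank
print("sequential:", sequential)
```

Both orderings use the same set of PUs; only the rank attached to each PU changes.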
3. Binding
• Launch processes and enforce the layout
[Figure: the hwloc topology diagram of the example server (physical indexes), repeated once per server, with MCW ranks placed on the cores: ranks 0-15 on the first server, 16-31 on the second, and 32 onward on the third. The remaining servers are clipped.]
Binding
• Process-launching agent working with the OS to limit where each process can run:
  1. No restrictions
  2. Limited set of restrictions
  3. Specific resource restrictions
• “Binding width”
  ▪ The number of PUs to which a process is bound
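To make "working with the OS" concrete, the sketch below uses Python's `os.sched_getaffinity` and `os.sched_setaffinity`, which exist only on Linux. It illustrates the idea of restricting a process to a set of PUs and measuring its binding width; it is not how Open MPI's launcher is implemented, and the helper names are hypothetical.

```python
import os


def binding_width(affinity):
    """Binding width: the number of PUs a process is bound to."""
    return len(affinity)


def bind_current_process(pus):
    """Ask the OS to restrict the calling process to `pus` and return the
    affinity set the OS reports afterwards."""
    os.sched_setaffinity(0, pus)
    return os.sched_getaffinity(0)


bound = None
if hasattr(os, "sched_setaffinity"):               # Linux-only interface
    allowed = os.sched_getaffinity(0)              # no extra restriction yet
    bound = bind_current_process({min(allowed)})   # one specific PU
    print("allowed width:", binding_width(allowed))
    print("bound to:", sorted(bound), "width:", binding_width(bound))
    os.sched_setaffinity(0, allowed)               # restore the original set
else:
    print("sched_setaffinity is not available on this platform")
```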
Command Line Interface (CLI)
• 4 levels of abstraction for the user
  ▪ Level 1: None
  ▪ Level 2: Simple, common patterns
  ▪ Level 3: LAMA process layout regular patterns
  ▪ Level 4: Irregular patterns
CLI: Level 1 (none)
• No mapping or binding options specified
  ▪ May or may not specify the number of processes to launch (-np)
  ▪ If not specified, default to the number of cores available in the allocation
  ▪ One process is mapped to each core in the system in a "by-core" style
  ▪ Processes are not bound
• …for backwards compatibility reasons ☹
CLI: Level 2 (common)
• Simple, common patterns for mapping and binding
  ▪ Specify mapping pattern with:
    • --map-by X (e.g., --map-by socket)
  ▪ Specify binding option with:
    • --bind-to Y (e.g., --bind-to core)
  ▪ All of these options are translated to Level 3 options for processing by LAMA
(full list of X / Y values shown later)
CLI: Level 3 (regular patterns)
• LAMA process layout regular patterns
  ▪ Power users wanting something unique for their application
  ▪ Four MCA run-time parameters
    • rmaps_lama_map: Mapping process layout
    • rmaps_lama_bind: Binding width
    • rmaps_lama_order: Ordering of MCW ranks
    • rmaps_lama_mppr: Maximum allowable number of processes per resource (oversubscription)
rmaps_lama_map (map)
• Takes as an argument the "process layout"
  ▪ A series of nine tokens
    • allowing 9! (362,880) mapping permutation options.
  ▪ Preferred iteration order for LAMA
    • innermost iteration specified first
    • outermost iteration specified last
Example system
2 servers (nodes), 4 sockets, 2 cores, 2 PUs
rmaps_lama_map (map)
• map=scbnh (a.k.a., by socket, then by core)
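The "innermost first, outermost last" rule can be sketched as a set of nested loops generated from the layout string. The code below is an illustration, not LAMA's implementation. It assumes the example system means 2 nodes, 1 board per node, 4 sockets per board, 2 cores per socket, and 2 PUs per core, and it assumes the letters mean s=socket, c=core, b=board, n=node, h=hardware thread (only s and c are spelled out on the slide).

```python
import math
from itertools import islice, product

# Assumed meaning of the five letters used in the examples; the full set of
# nine tokens is not listed on the slides shown here.
LEVEL_NAMES = {"s": "socket", "c": "core", "b": "board",
               "n": "node", "h": "hwthread"}

# Assumed shape of the example system (see the note above).
SIZES = {"n": 2, "b": 1, "s": 4, "c": 2, "h": 2}


def iterate_layout(layout, sizes):
    """Yield one {level: index} dict per PU, in mapping order.

    The first letter of `layout` is the innermost (fastest-changing) loop
    and the last letter is the outermost.
    """
    outer_to_inner = layout[::-1]
    ranges = [range(sizes[level]) for level in outer_to_inner]
    for indexes in product(*ranges):   # product varies the last range fastest
        yield dict(zip(outer_to_inner, indexes))


def describe(slot):
    return " ".join(f"{LEVEL_NAMES[k]}={slot[k]}" for k in "nbsch")


print("orderings of nine tokens:", math.factorial(9))
first_six = list(islice(iterate_layout("scbnh", SIZES), 6))
for position, slot in enumerate(first_six):
    print(position, describe(slot))
```

With `scbnh` the socket index changes fastest and the core index only advances after all four sockets have been visited, which is the "by socket, then by core" pattern named on the slide.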
rmaps_lama_bind (bind)
• “Binding width” and layer
• Example: bind=3c (3 cores)
[Figure: hwloc topology diagram of the example server, annotated "bind = 3c".]
rmaps_lama_bind (bind)
• “Binding width” and layer
• Example: bind=2s (2 sockets)
[Figure: hwloc topology diagram of the example server, annotated "bind = 2s".]
rmaps_lama_bind (bind)
• “Binding width” and layer
• Example: bind=12 (all PUs in an L2)
bind = 12
rmaps_lama_bind (bind)
• “Binding width” and layer
• Example: bind=1N (all PUs in NUMA locality)
bind = 1N
rmaps_lama_order (order)
• Select which ranks are assigned to processes in MCW
• There are other possible orderings, but no one has asked for them yet…
[Figure captions: Natural order for map-by-node (default); Sequential order for any mapping]
rmaps_lama_mppr (mppr)
• mppr (mip-per) sets the Maximum number of allowable Processes Per Resource
  ▪ User-specified definition of oversubscription
• Comma-delimited list of <#:resource>
  ▪ 1:c → At most one process per core
  ▪ 1:c,2:s → At most one process per core, and at most two processes per socket
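A minimal sketch of how such a list could be parsed and applied while walking candidate locations. `parse_mppr` and `place` are hypothetical helpers written for this example, and only the `c` (core) and `s` (socket) resources from the slide are handled; this is not the LAMA code.

```python
from collections import Counter


def parse_mppr(spec):
    """Turn '1:c,2:s' into {'c': 1, 's': 2}."""
    limits = {}
    for item in spec.split(","):
        count, resource = item.strip().split(":")
        limits[resource] = int(count)
    return limits


def place(num_procs, slots, limits):
    """Walk candidate slots in order and skip any slot whose core or socket
    is already at its limit. Each slot is a (socket, core) pair."""
    used = Counter()
    placed = []
    for socket, core in slots:
        if len(placed) == num_procs:
            break
        keys = {"s": ("s", socket), "c": ("c", socket, core)}
        if any(used[keys[r]] >= limit for r, limit in limits.items()):
            continue
        for r in limits:
            used[keys[r]] += 1
        placed.append((socket, core))
    return placed


limits = parse_mppr("1:c,2:s")
# Two sockets with four cores each, visited socket by socket.
slots = [(s, c) for s in range(2) for c in range(4)]
placement = place(6, slots, limits)
print(limits, placement)
```

In this run only four of the six requested processes are placed, because `2:s` caps each of the two sockets at two processes; a real launcher would have to report that as an error or as oversubscription.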
MPPR
  ▪ 1:c → At most one process per core
[Figure: hwloc topology diagram of socket P#0 illustrating mppr 1:c.]
MPPR
  ▪ 1:c,2:s → At most one process per core and two processes per socket
[Figure: hwloc topology diagram of socket P#0 illustrating mppr 1:c,2:s.]
CLI: Level 4 (rankfile)
• Complete specification of the process-to-resource mapping
  ▪ Bypasses LAMA
• Not described in the paper
Level 2 to Level 3 Chart
Remember the prior example?
• -np 24 -mppr 2:c -map scbnh
Same example, different mapping
• -np 24 -mppr 2:c -map nbsch
Report Bindings
• Displays prettyprint representation of the binding actually used for each process.
  ▪ Visual feedback = quite helpful when exploring
mpirun -np 4 --mca rmaps lama --mca rmaps_lama_bind 1c --mca rmaps_lama_map nbsch --mca rmaps_lama_mppr 1:c --report-bindings hello_world
MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../../../..][../../../../../../../..]
MCW rank 1 bound to socket 1[core 8[hwt 0-1]]: [../../../../../../../..][BB/../../../../../../..]
MCW rank 2 bound to socket 0[core 1[hwt 0-1]]: [../BB/../../../../../..][../../../../../../../..]
MCW rank 3 bound to socket 1[core 9[hwt 0-1]]: [../../../../../../../..][../BB/../../../../../..]
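When exploring many layouts it can help to turn the bracketed mask into data. The sketch below parses the mask format shown in the output above: one `[...]` group per socket, cores separated by `/`, and one character per hardware thread with `B` meaning bound. `parse_binding_mask` is a hypothetical helper, and the indexes it returns are positions within the mask, so a core is numbered within its socket rather than machine-wide.

```python
def parse_binding_mask(mask):
    """Parse a mask such as '[BB/../..][../../..]' into a list of
    (socket, core, hwthread) tuples for every position marked 'B'.
    Indexes are positions within the string, counted from zero."""
    bound = []
    sockets = mask.strip().strip("[]").split("][")
    for s, socket in enumerate(sockets):
        for c, core in enumerate(socket.split("/")):
            for h, flag in enumerate(core):
                if flag == "B":
                    bound.append((s, c, h))
    return bound


line = ("MCW rank 1 bound to socket 1[core 8[hwt 0-1]]: "
        "[../../../../../../../..][BB/../../../../../../..]")
mask = line.rsplit(": ", 1)[1]
print(parse_binding_mask(mask))
```

For rank 1 this reports both hardware threads of the first core on the second socket, matching the `socket 1[core 8[hwt 0-1]]` text on the same line.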
Future Work
• Available in Open MPI v1.7.2 (and later)
• Open questions to users:
  ▪ Are more flexible ordering options useful?
  ▪ What common mapping patterns are useful?
  ▪ What additional features would you like to see?
Thank You

More Related Content

What's hot

Using GTP on Linux with libgtpnl
Using GTP on Linux with libgtpnlUsing GTP on Linux with libgtpnl
Using GTP on Linux with libgtpnlKentaro Ebisawa
 
Networking and Go: An Epic Journey
Networking and Go: An Epic JourneyNetworking and Go: An Epic Journey
Networking and Go: An Epic JourneySneha Inguva
 
Taking Security Groups to Ludicrous Speed with OVS (OpenStack Summit 2015)
Taking Security Groups to Ludicrous Speed with OVS (OpenStack Summit 2015)Taking Security Groups to Ludicrous Speed with OVS (OpenStack Summit 2015)
Taking Security Groups to Ludicrous Speed with OVS (OpenStack Summit 2015)Thomas Graf
 
LinuxCon 2015 Stateful NAT with OVS
LinuxCon 2015 Stateful NAT with OVSLinuxCon 2015 Stateful NAT with OVS
LinuxCon 2015 Stateful NAT with OVSThomas Graf
 
LF_DPDK17_Lagopus Router
LF_DPDK17_Lagopus RouterLF_DPDK17_Lagopus Router
LF_DPDK17_Lagopus RouterLF_DPDK
 
Open vSwitch - Stateful Connection Tracking & Stateful NAT
Open vSwitch - Stateful Connection Tracking & Stateful NATOpen vSwitch - Stateful Connection Tracking & Stateful NAT
Open vSwitch - Stateful Connection Tracking & Stateful NATThomas Graf
 
Compiling P4 to XDP, IOVISOR Summit 2017
Compiling P4 to XDP, IOVISOR Summit 2017Compiling P4 to XDP, IOVISOR Summit 2017
Compiling P4 to XDP, IOVISOR Summit 2017Cheng-Chun William Tu
 
LinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking WalkthroughLinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking WalkthroughThomas Graf
 
Protecting the Privacy of the Network – Using P4 to Prototype and Extend Netw...
Protecting the Privacy of the Network – Using P4 to Prototype and Extend Netw...Protecting the Privacy of the Network – Using P4 to Prototype and Extend Netw...
Protecting the Privacy of the Network – Using P4 to Prototype and Extend Netw...Open-NFP
 
DevConf 2014 Kernel Networking Walkthrough
DevConf 2014   Kernel Networking WalkthroughDevConf 2014   Kernel Networking Walkthrough
DevConf 2014 Kernel Networking WalkthroughThomas Graf
 
[Webinar Slides] Programming the Network Dataplane in P4
[Webinar Slides] Programming the Network Dataplane in P4[Webinar Slides] Programming the Network Dataplane in P4
[Webinar Slides] Programming the Network Dataplane in P4Open Networking Summits
 
P4-based VNF and Micro-VNF Chaining for Servers With Intelligent Server Adapters
P4-based VNF and Micro-VNF Chaining for Servers With Intelligent Server AdaptersP4-based VNF and Micro-VNF Chaining for Servers With Intelligent Server Adapters
P4-based VNF and Micro-VNF Chaining for Servers With Intelligent Server AdaptersOpen-NFP
 
Programming Protocol-Independent Packet Processors
Programming Protocol-Independent Packet ProcessorsProgramming Protocol-Independent Packet Processors
Programming Protocol-Independent Packet ProcessorsOpen Networking Summits
 
BPF - All your packets belong to me
BPF - All your packets belong to meBPF - All your packets belong to me
BPF - All your packets belong to me_xhr_
 
Network Measurement with P4 and C on Netronome Agilio
Network Measurement with P4 and C on Netronome AgilioNetwork Measurement with P4 and C on Netronome Agilio
Network Measurement with P4 and C on Netronome AgilioOpen-NFP
 
Linux Native, HTTP Aware Network Security
Linux Native, HTTP Aware Network SecurityLinux Native, HTTP Aware Network Security
Linux Native, HTTP Aware Network SecurityThomas Graf
 
P4 for Custom Identification, Flow Tagging, Monitoring and Control
P4 for Custom Identification, Flow Tagging, Monitoring and ControlP4 for Custom Identification, Flow Tagging, Monitoring and Control
P4 for Custom Identification, Flow Tagging, Monitoring and ControlOpen-NFP
 
BPF & Cilium - Turning Linux into a Microservices-aware Operating System
BPF  & Cilium - Turning Linux into a Microservices-aware Operating SystemBPF  & Cilium - Turning Linux into a Microservices-aware Operating System
BPF & Cilium - Turning Linux into a Microservices-aware Operating SystemThomas Graf
 
Consensus as a Network Service
Consensus as a Network ServiceConsensus as a Network Service
Consensus as a Network ServiceOpen-NFP
 

What's hot (20)

Using GTP on Linux with libgtpnl
Using GTP on Linux with libgtpnlUsing GTP on Linux with libgtpnl
Using GTP on Linux with libgtpnl
 
Networking and Go: An Epic Journey
Networking and Go: An Epic JourneyNetworking and Go: An Epic Journey
Networking and Go: An Epic Journey
 
Ebpf ovsconf-2016
Ebpf ovsconf-2016Ebpf ovsconf-2016
Ebpf ovsconf-2016
 
Taking Security Groups to Ludicrous Speed with OVS (OpenStack Summit 2015)
Taking Security Groups to Ludicrous Speed with OVS (OpenStack Summit 2015)Taking Security Groups to Ludicrous Speed with OVS (OpenStack Summit 2015)
Taking Security Groups to Ludicrous Speed with OVS (OpenStack Summit 2015)
 
LinuxCon 2015 Stateful NAT with OVS
LinuxCon 2015 Stateful NAT with OVSLinuxCon 2015 Stateful NAT with OVS
LinuxCon 2015 Stateful NAT with OVS
 
LF_DPDK17_Lagopus Router
LF_DPDK17_Lagopus RouterLF_DPDK17_Lagopus Router
LF_DPDK17_Lagopus Router
 
Open vSwitch - Stateful Connection Tracking & Stateful NAT
Open vSwitch - Stateful Connection Tracking & Stateful NATOpen vSwitch - Stateful Connection Tracking & Stateful NAT
Open vSwitch - Stateful Connection Tracking & Stateful NAT
 
Compiling P4 to XDP, IOVISOR Summit 2017
Compiling P4 to XDP, IOVISOR Summit 2017Compiling P4 to XDP, IOVISOR Summit 2017
Compiling P4 to XDP, IOVISOR Summit 2017
 
LinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking WalkthroughLinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking Walkthrough
 
Protecting the Privacy of the Network – Using P4 to Prototype and Extend Netw...
Protecting the Privacy of the Network – Using P4 to Prototype and Extend Netw...Protecting the Privacy of the Network – Using P4 to Prototype and Extend Netw...
Protecting the Privacy of the Network – Using P4 to Prototype and Extend Netw...
 
DevConf 2014 Kernel Networking Walkthrough
DevConf 2014   Kernel Networking WalkthroughDevConf 2014   Kernel Networking Walkthrough
DevConf 2014 Kernel Networking Walkthrough
 
[Webinar Slides] Programming the Network Dataplane in P4
[Webinar Slides] Programming the Network Dataplane in P4[Webinar Slides] Programming the Network Dataplane in P4
[Webinar Slides] Programming the Network Dataplane in P4
 
P4-based VNF and Micro-VNF Chaining for Servers With Intelligent Server Adapters
P4-based VNF and Micro-VNF Chaining for Servers With Intelligent Server AdaptersP4-based VNF and Micro-VNF Chaining for Servers With Intelligent Server Adapters
P4-based VNF and Micro-VNF Chaining for Servers With Intelligent Server Adapters
 
Programming Protocol-Independent Packet Processors
Programming Protocol-Independent Packet ProcessorsProgramming Protocol-Independent Packet Processors
Programming Protocol-Independent Packet Processors
 
BPF - All your packets belong to me
BPF - All your packets belong to meBPF - All your packets belong to me
BPF - All your packets belong to me
 
Network Measurement with P4 and C on Netronome Agilio
Network Measurement with P4 and C on Netronome AgilioNetwork Measurement with P4 and C on Netronome Agilio
Network Measurement with P4 and C on Netronome Agilio
 
Linux Native, HTTP Aware Network Security
Linux Native, HTTP Aware Network SecurityLinux Native, HTTP Aware Network Security
Linux Native, HTTP Aware Network Security
 
P4 for Custom Identification, Flow Tagging, Monitoring and Control
P4 for Custom Identification, Flow Tagging, Monitoring and ControlP4 for Custom Identification, Flow Tagging, Monitoring and Control
P4 for Custom Identification, Flow Tagging, Monitoring and Control
 
BPF & Cilium - Turning Linux into a Microservices-aware Operating System
BPF  & Cilium - Turning Linux into a Microservices-aware Operating SystemBPF  & Cilium - Turning Linux into a Microservices-aware Operating System
BPF & Cilium - Turning Linux into a Microservices-aware Operating System
 
Consensus as a Network Service
Consensus as a Network ServiceConsensus as a Network Service
Consensus as a Network Service
 

Similar to Open MPI Explorations in Process Affinity (EuroMPI'13 presentation)

Arrow multisolution nxp lpc4300 dual core
Arrow multisolution   nxp lpc4300 dual coreArrow multisolution   nxp lpc4300 dual core
Arrow multisolution nxp lpc4300 dual coreAmir Sherman
 
Adding IEEE 802.15.4 and 6LoWPAN to an Embedded Linux Device
Adding IEEE 802.15.4 and 6LoWPAN to an Embedded Linux DeviceAdding IEEE 802.15.4 and 6LoWPAN to an Embedded Linux Device
Adding IEEE 802.15.4 and 6LoWPAN to an Embedded Linux DeviceSamsung Open Source Group
 
Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!Ray Jenkins
 
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Things
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of ThingsJerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Things
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of ThingsSamsung Open Source Group
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDKKernel TLV
 
Linux-wpan: IEEE 802.15.4 and 6LoWPAN in the Linux Kernel - BUD17-120
Linux-wpan: IEEE 802.15.4 and 6LoWPAN in the Linux Kernel - BUD17-120Linux-wpan: IEEE 802.15.4 and 6LoWPAN in the Linux Kernel - BUD17-120
Linux-wpan: IEEE 802.15.4 and 6LoWPAN in the Linux Kernel - BUD17-120Linaro
 
Amd epyc update_gdep_xilinx_ai_web_seminar_20201028
Amd epyc update_gdep_xilinx_ai_web_seminar_20201028Amd epyc update_gdep_xilinx_ai_web_seminar_20201028
Amd epyc update_gdep_xilinx_ai_web_seminar_20201028ssuser5b12d1
 
FIWARE Global Summit - Real-time Media Stream Processing Using Kurento
FIWARE Global Summit - Real-time Media Stream Processing Using KurentoFIWARE Global Summit - Real-time Media Stream Processing Using Kurento
FIWARE Global Summit - Real-time Media Stream Processing Using KurentoFIWARE
 
Hortonworks on IBM POWER Analytics / AI
Hortonworks on IBM POWER Analytics / AIHortonworks on IBM POWER Analytics / AI
Hortonworks on IBM POWER Analytics / AIDataWorks Summit
 
lecture16-recap-questions-and-answers.pdf
lecture16-recap-questions-and-answers.pdflecture16-recap-questions-and-answers.pdf
lecture16-recap-questions-and-answers.pdfAyushKumar93531
 
BGP Prime
BGP Prime BGP Prime
BGP Prime KHNOG
 
BKK16-302: Android Optimizing Compiler: New Member Assimilation Guide
BKK16-302: Android Optimizing Compiler: New Member Assimilation GuideBKK16-302: Android Optimizing Compiler: New Member Assimilation Guide
BKK16-302: Android Optimizing Compiler: New Member Assimilation GuideLinaro
 
Apache Spark Best Practices Meetup Talk
Apache Spark Best Practices Meetup TalkApache Spark Best Practices Meetup Talk
Apache Spark Best Practices Meetup TalkEren Avşaroğulları
 
POLYTEDA PowerDRC/LVS overview
POLYTEDA PowerDRC/LVS overviewPOLYTEDA PowerDRC/LVS overview
POLYTEDA PowerDRC/LVS overviewAlexander Grudanov
 
44CON 2014 - Stupid PCIe Tricks, Joe Fitzpatrick
44CON 2014 - Stupid PCIe Tricks, Joe Fitzpatrick44CON 2014 - Stupid PCIe Tricks, Joe Fitzpatrick
44CON 2014 - Stupid PCIe Tricks, Joe Fitzpatrick44CON
 
Experiences with Power 9 at A*STAR CRC
Experiences with Power 9 at A*STAR CRCExperiences with Power 9 at A*STAR CRC
Experiences with Power 9 at A*STAR CRCGanesan Narayanasamy
 

Similar to Open MPI Explorations in Process Affinity (EuroMPI'13 presentation) (20)

Arrow multisolution nxp lpc4300 dual core
Arrow multisolution   nxp lpc4300 dual coreArrow multisolution   nxp lpc4300 dual core
Arrow multisolution nxp lpc4300 dual core
 
Pipeline parallelism
Pipeline parallelismPipeline parallelism
Pipeline parallelism
 
Adding IEEE 802.15.4 and 6LoWPAN to an Embedded Linux Device
Adding IEEE 802.15.4 and 6LoWPAN to an Embedded Linux DeviceAdding IEEE 802.15.4 and 6LoWPAN to an Embedded Linux Device
Adding IEEE 802.15.4 and 6LoWPAN to an Embedded Linux Device
 
Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!
 
100 M pps on PC.
100 M pps on PC.100 M pps on PC.
100 M pps on PC.
 
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Things
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of ThingsJerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Things
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Things
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDK
 
Linux-wpan: IEEE 802.15.4 and 6LoWPAN in the Linux Kernel - BUD17-120
Linux-wpan: IEEE 802.15.4 and 6LoWPAN in the Linux Kernel - BUD17-120Linux-wpan: IEEE 802.15.4 and 6LoWPAN in the Linux Kernel - BUD17-120
Linux-wpan: IEEE 802.15.4 and 6LoWPAN in the Linux Kernel - BUD17-120
 
Amd epyc update_gdep_xilinx_ai_web_seminar_20201028
Amd epyc update_gdep_xilinx_ai_web_seminar_20201028Amd epyc update_gdep_xilinx_ai_web_seminar_20201028
Amd epyc update_gdep_xilinx_ai_web_seminar_20201028
 
FIWARE Global Summit - Real-time Media Stream Processing Using Kurento
FIWARE Global Summit - Real-time Media Stream Processing Using KurentoFIWARE Global Summit - Real-time Media Stream Processing Using Kurento
FIWARE Global Summit - Real-time Media Stream Processing Using Kurento
 
Hortonworks on IBM POWER Analytics / AI
Hortonworks on IBM POWER Analytics / AIHortonworks on IBM POWER Analytics / AI
Hortonworks on IBM POWER Analytics / AI
 
lecture16-recap-questions-and-answers.pdf
lecture16-recap-questions-and-answers.pdflecture16-recap-questions-and-answers.pdf
lecture16-recap-questions-and-answers.pdf
 
BGP Prime
BGP Prime BGP Prime
BGP Prime
 
BKK16-302: Android Optimizing Compiler: New Member Assimilation Guide
BKK16-302: Android Optimizing Compiler: New Member Assimilation GuideBKK16-302: Android Optimizing Compiler: New Member Assimilation Guide
BKK16-302: Android Optimizing Compiler: New Member Assimilation Guide
 
Apache Spark Best Practices Meetup Talk
Apache Spark Best Practices Meetup TalkApache Spark Best Practices Meetup Talk
Apache Spark Best Practices Meetup Talk
 
6LoWPAN: An open IoT Networking Protocol
6LoWPAN: An open IoT Networking Protocol6LoWPAN: An open IoT Networking Protocol
6LoWPAN: An open IoT Networking Protocol
 
POLYTEDA PowerDRC/LVS overview
POLYTEDA PowerDRC/LVS overviewPOLYTEDA PowerDRC/LVS overview
POLYTEDA PowerDRC/LVS overview
 
Lrz kurs: big data analysis
Lrz kurs: big data analysisLrz kurs: big data analysis
Lrz kurs: big data analysis
 
44CON 2014 - Stupid PCIe Tricks, Joe Fitzpatrick
44CON 2014 - Stupid PCIe Tricks, Joe Fitzpatrick44CON 2014 - Stupid PCIe Tricks, Joe Fitzpatrick
44CON 2014 - Stupid PCIe Tricks, Joe Fitzpatrick
 
Experiences with Power 9 at A*STAR CRC
Experiences with Power 9 at A*STAR CRCExperiences with Power 9 at A*STAR CRC
Experiences with Power 9 at A*STAR CRC
 





Open MPI Explorations in Process Affinity (EuroMPI'13 presentation)

  • 1. Advancing Application Process Affinity Experimentation: Open MPI's LAMA-Based Affinity Interface Jeff Squyres September 18, 2013 Joshua Hursey
  • 2. Locality Matters • Multiple talks here at EuroMPI’13 about network locality • Goals:  Minimize data transfer distance  Reduce network congestion and contention • …this also matters inside the server, too!
  • 3. [Figure: hwloc topology of the example server. Machine (128GB) with two NUMA nodes (64GB each), one socket per NUMA node. Each socket has a 20MB L3 and eight cores (Core P#0-7); each core has its own L2 (256KB), L1d (32KB), and L1i (32KB) and two PUs. Socket P#0 holds PU P#0-7 and P#16-23; Socket P#1 holds PU P#8-15 and P#24-31. PCI devices under NUMANode P#0: 8086:1521 (eth0-eth3), 1137:0043 (eth4, eth5), 102b:0522. PCI devices under NUMANode P#1: 1000:005b (sda, sdb), 1137:0043 (eth6, eth7).] Intel Xeon E5-2690 (“Sandy Bridge”) • 2 sockets, 8 cores, 64GB per socket • Callouts: 1G NICs; 10G NICs; L1 and L2; Shared L3 • Hyperthreading enabled
  • 4. The intent of this work is to provide a mechanism that allows users to explore the process-placement space within the scope of their own applications. A User’s Playground
  • 5. LAMA • Locality-Aware Mapping Algorithm (LAMA)  Supports a wide range of regular mapping patterns. • Adapts at runtime to available hardware  Supports homogeneous and heterogeneous systems. • Extensible to any depth of server topology  Naturally supports potentially deeper topologies of future server architectures.
  • 6. LAMA Inspiration • Drawn from much prior work • Most notably, heavily inspired by BlueGene/P and /Q mapping systems  LAMA’s mapping specification is similar
  • 7. Launching MPI Applications • Three steps in MPI process placement 1. Mapping 2. Ordering 3. Binding • Let's discuss how these work in Open MPI
  • 8. 1. Mapping • Create a layout of processes-to-resources [Figure: 16 servers, each populated with MPI processes]
  • 9. Mapping • MPI's runtime must create a map, pairing processes-to-processors (and memory). • Basic technique:  Gather hwloc topologies from allocated nodes.  Mapping agent then makes a plan for which resources are assigned to processes
  • 10. Mapping Agent • Act of planning mappings:  Specify which process will be launched on each server  Identify if any hardware resource will be oversubscribed • Processes are mapped to the resolution of a single processing unit (PU)  Smallest unit of allocation: hardware thread  In HPC, usually the same as a processor core
  • 11. Oversubscription • Common / usual definition: ◦ When a single PU is assigned more than one process • Complicating the definition: ◦ Some applications may need more than one PU per process (multithreaded applications) • How can the user express what their application means by “oversubscription”?
  • 12. 2. Ordering: By “Slot” • Assigning MCW ranks to mapped processes [Figure: ranks fill each server consecutively before moving to the next: 0-15 on the first server, 16-31 on the second, and so on]
  • 13. 2. Ordering: By Node • Assigning MCW ranks to mapped processes [Figure: ranks are dealt round-robin across the servers: the first server holds ranks 0, 16, 32, ..., 240; the second holds 1, 17, 33, ..., 241; and so on]
  • 14. Ordering • Each process must be assigned a unique rank in MPI_COMM_WORLD • Two common types of ordering:  natural • The order in which processes are mapped determines their rank in MCW  sequential • The processes are sequentially numbered starting at the first processing unit, and continuing until the last processing unit
  • 15. 3. Binding • Launch processes and enforce the layout [Figure: hwloc topology diagram (physical indexes) of each server, overlaid with the MCW ranks bound on it: 0-15 on the first server, 16-31 on the second, and so on]
  • 16. Binding • Process-launching agent working with the OS to limit where each process can run: 1. No restrictions 2. Limited set of restrictions 3. Specific resource restrictions • “Binding width”  The number of PUs to which a process is bound
  • 17. Command Line Interface (CLI) • 4 levels of abstraction for the user  Level 1: None  Level 2: Simple, common patterns  Level 3: LAMA process layout regular patterns  Level 4: Irregular patterns
  • 18. CLI: Level 1 (none) • No mapping or binding options specified  May or may not specify the number of processes to launch (-np)  If not specified, default to the number of cores available in the allocation  One process is mapped to each core in the system in a "by-core" style  Processes are not bound • …for backwards compatibility reasons 
  • 19. CLI: Level 2 (common) • Simple, common patterns for mapping and binding  Specify mapping pattern with • --map-by X (e.g., --map-by socket)  Specify binding option with: • --bind-to Y (e.g., --bind-to core)  All of these options are translated to Level 3 options for processing by LAMA (full list of X / Y values shown later)
  • 20. CLI: Level 3 (regular patterns) • LAMA process layout regular patterns  Power users wanting something unique for their application  Four MCA run-time parameters • rmaps_lama_map: Mapping process layout • rmaps_lama_bind: Binding width • rmaps_lama_order: Ordering of MCW ranks • rmaps_lama_mppr: Maximum allowable number of processes per resource (oversubscription)
  • 21. rmaps_lama_map (map) • Takes as an argument the "process layout"  A series of nine tokens • allowing 9! (362,880) mapping permutation options.  Preferred iteration order for LAMA • innermost iteration specified first • outermost iteration specified last
  • 22. Example system 2 servers (nodes), 4 sockets, 2 cores, 2 PUs
  • 23. rmaps_lama_map (map) • map=scbnh (a.k.a., by socket, then by core)
  • 24. rmaps_lama_map (map) • map=scbnh (a.k.a., by socket, then by core)
  • 25. rmaps_lama_map (map) • map=scbnh (a.k.a., by socket, then by core)
  • 26. rmaps_lama_map (map) • map=scbnh (a.k.a., by socket, then by core)
  • 27. rmaps_lama_map (map) • map=scbnh (a.k.a., by socket, then by core)
  • 28. rmaps_lama_bind (bind) • “Binding width” and layer • Example: bind=3c (3 cores) [Figure: hwloc topology of one socket, annotated bind = 3c]
  • 29. rmaps_lama_bind (bind) • “Binding width” and layer • Example: bind=2s (2 sockets) [Figure: hwloc topology diagrams, annotated bind = 2s]
  • 30. rmaps_lama_bind (bind) • “Binding width” and layer • Example: bind=1L2 (all PUs in an L2)
  • 31. rmaps_lama_bind (bind) • “Binding width” and layer • Example: bind=1N (all PUs in NUMA locality)
  • 32. rmaps_lama_order (order) • Select which ranks are assigned to processes in MCW • There are other possible orderings, but no one has asked for them yet… [Figures: natural order for map-by-node (default); sequential order for any mapping]
  • 33. rmaps_lama_mppr (mppr) • mppr (mip-per) sets the Maximum number of allowable Processes Per Resource  User-specified definition of oversubscription • Comma-delimited list of <#:resource>  1:c  At most one process per core  1:c,2:s  At most one process per core, and at most two processes per socket
  • 34. MPPR • 1:c: At most one process per core [Figure: hwloc topology of one socket]
  • 35. MPPR • 1:c,2:s: At most one process per core and two processes per socket [Figure: hwloc topology of one socket]
  • 36. CLI: Level 4 (rankfile) • Complete specification of the process-to-resource mapping ◦ Bypasses LAMA • Not described in the paper
  • 37. Level 2 to Level 3 Chart
  • 38. Remember the prior example? • -np 24 -mppr 2:c -map scbnh
  • 39. Same example, different mapping • -np 24 -mppr 2:c -map nbsch
  • 40. Report Bindings • Displays a pretty-printed representation of the binding actually used for each process. ◦ Visual feedback = quite helpful when exploring
      mpirun -np 4 --mca rmaps lama --mca rmaps_lama_bind 1c --mca rmaps_lama_map nbsch --mca rmaps_lama_mppr 1:c --report-bindings hello_world
      MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../../../..][../../../../../../../..]
      MCW rank 1 bound to socket 1[core 8[hwt 0-1]]: [../../../../../../../..][BB/../../../../../../..]
      MCW rank 2 bound to socket 0[core 1[hwt 0-1]]: [../BB/../../../../../..][../../../../../../../..]
      MCW rank 3 bound to socket 1[core 9[hwt 0-1]]: [../../../../../../../..][../BB/../../../../../..]
  • 41. Future Work • Available in Open MPI v1.7.2 (and later) • Open questions to users:  Are more flexible ordering options useful?  What common mapping patterns are useful?  What additional features would you like to see?
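
The iteration-order rule for rmaps_lama_map (innermost level first, outermost last) can be sketched in a few lines of Python. This is an illustration only, not Open MPI's implementation. The names `SIZES`, `lama_style_order`, and `place` are hypothetical. The sizes assume the example system means 4 sockets per node, 2 cores per socket, and 2 PUs per core, with one board per node. The cache and NUMA tokens, MPPR limits, and binding are left out.

```python
from itertools import islice, product

# Assumed example system: 2 nodes, 1 board per node, 4 sockets per board,
# 2 cores per socket, 2 hardware threads (PUs) per core.
SIZES = {"n": 2, "b": 1, "s": 4, "c": 2, "h": 2}


def lama_style_order(layout, sizes=SIZES):
    """Yield one {level: index} dict per PU, in mapping order.

    The first letter of `layout` is the innermost (fastest-varying) loop
    and the last letter is the outermost loop.
    """
    if sorted(layout) != sorted(sizes):
        raise ValueError("layout must name each level exactly once")
    outer_to_inner = layout[::-1]
    # itertools.product varies its last argument fastest, so the levels
    # are passed from outermost to innermost.
    ranges = [range(sizes[level]) for level in outer_to_inner]
    for combo in product(*ranges):
        yield dict(zip(outer_to_inner, combo))


def place(nprocs, layout, sizes=SIZES):
    """Return the first `nprocs` PU locations visited by the layout."""
    return list(islice(lama_style_order(layout, sizes), nprocs))


if __name__ == "__main__":
    for layout in ("scbnh", "nbsch"):
        print(layout)
        for rank, where in enumerate(place(6, layout)):
            print(f"  rank {rank}: node {where['n']} socket {where['s']} "
                  f"core {where['c']} hwt {where['h']}")
```

Under these assumptions, `scbnh` walks across the four sockets of the first node before moving to the next core, while `nbsch` alternates between the two nodes before moving to the next socket.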

Editor's Notes

  1. Can be done either before or after binding – MPI implementation choice.
  2. --map-by socket would have put rank 4 on Node 1 (not what we wanted).
  3. Oversubscription is an error by default; the user can specify --oversubscribe.
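
To make the user-defined notion of oversubscription concrete, here is a minimal Python sketch of how an mppr string such as 1:c,2:s could be checked against a set of placements. It is not how Open MPI evaluates the parameter. The function names, the placement dictionaries, and the restriction to the node, socket, core, and hardware-thread levels are all assumptions made for the example.

```python
from collections import Counter


def parse_mppr(spec):
    """Parse a string such as "1:c,2:s" into {"c": 1, "s": 2}."""
    limits = {}
    for item in spec.split(","):
        count, resource = item.strip().split(":")
        limits[resource] = int(count)
    return limits


def oversubscribed(placements, limits, hierarchy=("n", "s", "c", "h")):
    """Return the set of resource levels whose limit is exceeded.

    `placements` holds one {level: index} dict per process. A resource is
    identified by its own index plus the indexes of every enclosing level,
    so core 0 on socket 0 is distinct from core 0 on socket 1.
    """
    exceeded = set()
    for level, limit in limits.items():
        depth = hierarchy.index(level) + 1
        usage = Counter(
            tuple(proc[lvl] for lvl in hierarchy[:depth]) for proc in placements
        )
        if any(count > limit for count in usage.values()):
            exceeded.add(level)
    return exceeded


if __name__ == "__main__":
    # Three processes on three different cores of the same socket.
    procs = [{"n": 0, "s": 0, "c": c, "h": 0} for c in range(3)]
    print(oversubscribed(procs, parse_mppr("1:c")))
    print(oversubscribed(procs, parse_mppr("1:c,2:s")))
```

With three processes on three different cores of one socket, the 1:c limit is satisfied, but adding 2:s flags the socket level, which mirrors the two MPPR examples in the slides.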