Gasnet library evaluation on Barrelfish and
Intel SCC
June 30, 2012
Zeus Gómez Marmolejo
Barcelona Supercomputing Center
Contents
1 Introduction
Motivation
Project goals
Software architecture
2 Tests
Hardware
Configurations
MP: 1 to 1
MP: N to N
3 Conclusions
4 Contributions
5 Future work
Introduction
Motivation
Future trends:
Multi-core CPUs and multi-core GPUs in a single chip.
Shared memory and cache coherence are complex and may not scale in the future.
Shared-memory OSs like Linux or Windows run into problems on many-core systems.
Message-passing OSs like Barrelfish.
Experiments on non-coherent multi-core shared architectures: the Intel SCC and its Message Passing Buffers (MPBs).
Linux approach
Multi-core operating systems using shared memory
[Figure: cores 0 to N all accessing one shared struct page { ... spinlock_t ptl; }; in memory]
Data sharing:
Access locks
False sharing
Contention
Hardware cache coherence
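The false-sharing item above can be illustrated with a small layout sketch. It assumes a 64-byte cache line, and the struct names are hypothetical (they do not come from Linux or Barrelfish). In the packed layout, counters owned by different cores land on the same cache line, so every write by one core invalidates the line for the others; in the padded layout each counter gets a line of its own.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64 /* assumed cache-line size in bytes */

/* Packed layout: neighbouring per-core counters share a cache line. */
struct packed_counters {
    long hits[4];
};

/* Padded layout: alignas forces each counter onto its own cache line. */
struct padded_counter {
    alignas(CACHE_LINE) long hits;
};

struct padded_counters {
    struct padded_counter per_core[4];
};

/* The compiler pads the struct up to its alignment. */
static_assert(sizeof(struct padded_counter) == CACHE_LINE,
              "each padded counter should fill exactly one cache line");

/* Index of the cache line that contains a given byte offset. */
static size_t cache_line_of(size_t offset)
{
    return offset / CACHE_LINE;
}
```

Padding trades memory for fewer coherence misses; it does not remove the lock or the contention on data that is genuinely shared.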
Barrelfish approach
No sharing, but message passing
System Knowledge Base:
No driver software!
Message passing:
No sharing at all
System processes
Asynchronous calls
Interconnect drivers
Project goals
Looking for the appropriate library that meets the desired features
Port a well-known message passing library to
Barrelfish...
Desired features:
Portable across different
architectures, systems and
OSs.
Highly efficient.
Used in many applications and
parallel languages.
Be able to run standard
OpenMP programs via the
nanos runtime.
The Gasnet library from the University of California, Berkeley fulfills these expectations.
Gasnet library
Low level communication library
[Figure: layer diagram. The Gasnet core API sits on top of the UDP, SMP, MPI and BF conduits, which sit on top of the network hardware.]
Used to implement UPC, Titanium and OmpSs.
Active Message categories: AMShort, AMMedium, AMLong
Message types: requests, replies
Inter-Process Shared Memory (PSHM) mode for a conduit
Barrelfish Message Passing
Generated stubs for efficient message passing
[Figure: message path between two processes. On core 0, msg() in user.c enters the generated stub in user_flounder_bindings.c, where ump_hdlr() performs the cache write. On core 1, event_dispatch() in waitset.c runs ump_rx() and closure.handler(), and the stub in user_flounder_bindings.c calls msg() in user.c.]
Non-blocking asynchronous calls.
Continuation closure, also called asynchronously.
Messages sent as RPC.
C stubs are generated by the flounder tool (written in Haskell), depending on the interconnect driver.
Fast event handling code on the receiving side (polling).
When all arguments are assembled, the call is made to the user program.
Differences with Barrelfish MP model
Similarities, differences and solutions proposed
Similarities:
Gasnet Nodes → Barrelfish Cores
Messages as RPC
Necessity to send large buffers
Be able to send back replies
Differences:
Gasnet calls must block
Non-blocking message handlers
Not thread-safe
Solutions:
2 threads: application & Gasnet
Leader-followers thread serving model
4 independent channels per peer
Gasnet BF conduit implementation
Details of the BF implementation using previous solutions
Details:
Uses the BF flounder generated stub to pass messages.
To simulate the synchronous behavior of Gasnet, we use 2 threads. One of them comes from the thread pool.
However, the binding cannot be handled by two threads concurrently without proper locking.
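[Figure: sequence diagram between Gasnet/Barrelfish on core 1 and Barrelfish/Gasnet on core 2. Thread 1 calls gasnet_AMShort(), ambf_ump_send_handler() passes the message on, acks are exchanged between the threads, and the Gasnet handler call runs on the receiving side.]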
Test systems
Intel x86_64 SMP system
Sun Fire X2270 M2, with 2 Intel Xeon E5620 CPUs @ 2.40 GHz:
[Figure: SMP system. Chip 0 and chip 1 (8 CPUs each) joined by a QPI link, each chip attached to three DDR3 memories (DDR3 0 to DDR3 2).]
Features:
Intel x86_64 architecture
2 chips x 4 SMP x 2 SMT = 16 CPUs
32 GB RAM NUMA
QPI link: 25 GB/s
Test systems
Intel Single-chip Cloud Computer system
Features:
48 Intel 32-bit P54C CPUs on a single chip
Non-coherent caches
Routers and MPBs for message passing
4 DDR3 memory controllers
Shipped as:
A cluster of 48 Linux systems accessed by SSH
No OS prior to Barrelfish sees it as a single system
Test configurations
Possible combinations of architecture, OS and Gasnet conduit/interconnect driver
x86_64-linux-pshm. SMP with Linux and PSHM in Gasnet.
x86_64-linux-mpi. SMP with Linux and the MPI conduit for Gasnet.
x86_64-barrelfish-pshm. SMP running Barrelfish and PSHM in Gasnet.
x86_64-barrelfish-ump. SMP running Barrelfish and User-Level Message Passing (UMP).
scc-linux-mpi. Intel SCC running Linux on all cores with the Gasnet MPI conduit, compiled with the MPIRCK CH2 driver.
scc-barrelfish-ump_ipi. Intel SCC running Barrelfish with the Gasnet BF conduit and the UMP & Inter-Processor Interrupts (IPI) backend.
Not tested:
32-bit SMP system
Barrelfish bulk transfer mode
Intel SCC MPBs flounder backend: fast, but the MPBs are very short and unprotected. The SCC is seen as an accelerator
Message Passing: 1 to 1
Only two nodes are sending messages
Testam benchmark from Gasnet. 1000 messages of:
Ping-pong roundtrip Request - Reply (prqp)
Ping-pong roundtrip Request - Request (prqq)
Flood one-way Request (foq)
Flood roundtrip Request - Reply (frqp)
Flood roundtrip Request - Request (frqq)
[Figure: AMShort testam on x86_64-linux-pshm. Delay (us) per test type (prqp, prqq, foq, frqp); series: linux-pshm.]
[Figure: AMLong testam on x86_64-linux-pshm. Throughput (Mbytes/s) vs buffer size (bytes, 1 to 1M); series: prqp, prqq, foq, frqp, frqq.]
Message Passing: 1 to 1
Simplest comparison
x86_64-linux-pshm vs x86_64-barrelfish-pshm
[Figure: AMShort testam on x86_64-*-pshm. Delay (us) per test type (prqp, prqq, foq, frqp); series: linux, barrelfish.]
[Figure: AMLong testam on x86_64-*-pshm. Throughput (Mbytes/s) vs buffer size (bytes, 1 to 1M); series: linux, barrelfish.]
In the AMShort category Barrelfish is much faster, as its asynchronous MP handlers are very efficient
In the AMMedium and AMLong categories, Barrelfish performs worse when the buffer is >= 2048 Kb
Message Passing: 1 to 1
Performance analysis breakdown
For long messages, most of the time is spent in the memcpy()
libc function, copying bytes from one region to another
We tried different implementations to see the result
[Figure: different memcpy() implementations. Throughput (Mbytes/s) vs buffer size (bytes, 1 to 1M); series: linux-glibc, bf-glibc, bf-newlib, bf-oldc.]
Test runs:
1 Linux with GNU GLIBC, using Supplemental SSE3 and the REP prefix
2 Barrelfish with the GNU GLIBC memcpy()
3 Barrelfish with Red Hat Newlib, using REP
4 Barrelfish with the old libc (C language)
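To make the contrast between the runs concrete, here is a minimal sketch of what a memcpy() written in plain C can look like. This is an illustrative byte-at-a-time loop with a hypothetical name, not the actual code of the old Barrelfish libc; optimized implementations such as the GLIBC one instead move many bytes per instruction with SIMD or REP-prefixed string instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical portable copy: one byte per loop iteration. */
static void *bytewise_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(d != NULL && s != NULL);
    while (n--) {
        *d++ = *s++;
    }
    return dst;
}

/* Copies src with both routines and returns 1 if the results are identical. */
static int copies_match(const void *src, size_t n, void *buf_a, void *buf_b)
{
    bytewise_memcpy(buf_a, src, n);
    memcpy(buf_b, src, n);
    return memcmp(buf_a, buf_b, n) == 0;
}
```

Both routines produce the same bytes; only the time per copied byte differs, which is what the throughput curves compare.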
Message Passing: 1 to 1
MPI vs UMP on the SMP
x86_64-linux-mpi vs x86_64-barrelfish-ump
[Figure: AMMedium testam on x86_64-{linux-mpi,bf-ump}. Throughput (Mbytes/s) vs buffer size (bytes, 1 to 64k); series: linux, barrelfish.]
(on linux-mpi, the maximum AMMedium size equals that of AMLong)
Barrelfish always performs worse: the UMP interconnect driver is not designed for sending large buffers.
Newlib memcpy() is competing against GLIBC memcpy()!
UMP decomposes buffers into fragments
Each fragment needs an ACK. Piggybacking is implemented but not used, to avoid handler deadlocks
Bulk transfer is designed for this purpose: large buffers
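The cost of fragmenting with one ACK per fragment can be estimated with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical and the fragment payload size is a parameter, because the real UMP payload size is not given here.

```c
#include <assert.h>
#include <stddef.h>

/* Fragments needed for a buffer: ceiling of buf_len / frag_payload. */
static size_t fragment_count(size_t buf_len, size_t frag_payload)
{
    assert(frag_payload > 0);
    return (buf_len + frag_payload - 1) / frag_payload;
}

/* Messages on the channel when every fragment is acknowledged
 * separately (no piggybacking): one fragment plus one ACK each. */
static size_t wire_messages(size_t buf_len, size_t frag_payload)
{
    return 2 * fragment_count(buf_len, frag_payload);
}
```

For example, with an assumed payload of 56 bytes per fragment, a 64 KB buffer needs 1171 fragments and therefore 2342 messages, which suggests why a transfer mode built for large buffers is attractive.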
Message Passing: 1 to 1
MPI vs UMP on the SCC
scc-linux-mpi vs scc-barrelfish-ump_ipi
[Figure: AMMedium testam on scc-{linux-mpi,bf-ump}. Throughput (Mbytes/s) vs buffer size (bytes, 1 to 64k); series: linux, barrelfish.]
[Figure: AMLong testam on scc-{linux-mpi,bf-ump}. Throughput (Mbytes/s) vs buffer size (bytes, 1 to 1M); series: linux, barrelfish.]
Again, Newlib memcpy() competes against GLIBC memcpy()
Same problems as before
On Linux, the SCC MPBs are fully used (1 application)
On Linux we can see a strong performance degradation when the SCC MPBs overflow (after 8 KB)
Message Passing: N to N
Description of the test application
Now we want to model a real system:
A node can send messages to any other node of the system with the same probability.
Messages are sent as a Poisson process: idle times follow an exponential distribution with rate parameter λ. From a uniform sample U in (0, 1], we get the idle time T by:
$$T = -\frac{\ln U}{\lambda}$$
We can choose the probability of sending AMShort, AMMedium and AMLong.
Buffers now have a fixed size.
We also model the probability for a request to have a reply.
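The idle-time formula follows from inverse-transform sampling. A short derivation, with a numeric example whose values (λ = 1000 msg/s, U = 0.5) are chosen only for illustration:

```latex
% Exponential idle time with rate \lambda: survival function
P(T > t) = e^{-\lambda t}, \qquad t \ge 0 .

% Take U uniform on (0, 1] and set U = e^{-\lambda T}, i.e.
T = -\frac{\ln U}{\lambda} .

% Check: T defined this way has the required distribution
P(T > t) = P\!\left(-\frac{\ln U}{\lambda} > t\right)
         = P\!\left(U < e^{-\lambda t}\right)
         = e^{-\lambda t} .

% Mean idle time and an illustrative sample
E[T] = \frac{1}{\lambda}, \qquad
\lambda = 1000\ \text{msg/s},\ U = 0.5
\;\Rightarrow\; T = \frac{\ln 2}{1000}\ \text{s} \approx 0.693\ \text{ms} .
```

Since the mean idle time is 1/λ, each node attempts to send λ messages per second on average.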
Message Passing: N to N
Values for the test runs
Test runs:
λ = 1 to 10^6 msgs/s in powers of 2
3 runs:
1 Majority of shorts: (ps = 0.7, pm = 0.2, pl = 0.1)
2 Majority of longs: (ps = 0.1, pm = 0.2, pl = 0.7)
3 All categories balanced: (ps = 0.33, pm = 0.33, pl = 0.33)
Medium block size = 8 KB, long size = 64 KB
Reply probability = 0.33
We run every test for 5 minutes, as longer runs do not change the numbers
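One way to turn the probabilities (ps, pm, pl) into a message category is to split the unit interval and see where a uniform sample falls. The following is a sketch under that assumption; the enum and function names are hypothetical and not taken from the test application.

```c
#include <assert.h>

enum am_category { AM_SHORT, AM_MEDIUM, AM_LONG };

/* Maps a uniform sample u in [0, 1) to a category:
 * [0, ps) is short, [ps, ps + pm) is medium, the remainder is long.
 * pl is implied as 1 - ps - pm. */
static enum am_category pick_category(double u, double ps, double pm)
{
    assert(u >= 0.0 && u < 1.0);
    assert(ps >= 0.0 && pm >= 0.0 && ps + pm <= 1.0);

    if (u < ps) {
        return AM_SHORT;
    }
    if (u < ps + pm) {
        return AM_MEDIUM;
    }
    return AM_LONG;
}
```

With the "majority of shorts" settings (ps = 0.7, pm = 0.2), a sample of 0.75 selects AMMedium and a sample of 0.95 selects AMLong. In the balanced run the three values sum to 0.99, so with this scheme the long category absorbs the remaining 0.01.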
Message Passing: N to N
x86_64 architecture results
x86_64-linux-pshm vs x86_64-barrelfish-pshm vs x86_64-linux-mpi
[Figure: x86_64 with 70% short msgs (16 cores). Real rate (msg/s) vs perfect rate (msg/s); series: lin-pshm, bf-pshm, lin-mpi.]
[Figure: x86_64 with 70% long msgs (16 cores). Real rate (msg/s) vs perfect rate (msg/s); series: lin-pshm, bf-pshm, lin-mpi.]
16 cores simultaneously
Saturation rate (64 kmsg/s short, 8 kmsg/s long)
PSHM runs are better, even on Barrelfish
Again, Newlib memcpy() competes against GLIBC memcpy(); even so, barrelfish-pshm outperforms linux-mpi
The gap with MPI is greater for longs: MPI works better for shorts.
Message Passing: N to N
MPI running on the SCC
scc-linux-mpi
[Figure: Intel SCC with MPIRCK (48 cores simultaneously). Real rate (msg/s) vs perfect rate (msg/s); series: short, balanced, long.]
scc-barrelfish-ump_ipi could not be evaluated due to severe deadlocks and race conditions.
Evaluation
Compiled with MPIRCK CH2 driver, using SCC MPBs
Slower convergence ratio than running with 16 cores
Convergence area: 3:1 ratio for short - balanced, and 2:1
for balanced - long
Conclusions
Summary of results
When there is no buffer involved, Barrelfish performs much faster due to the asynchronous design of its MP.
When a buffer is involved, memcpy() becomes critical:
The libc shipped with Barrelfish has the worst performance
GNU GLIBC in Linux is highly optimized but non-portable
Newlib is a compromise between the two
The Barrelfish UMP driver is not suitable for large buffers
On the x86_64 architecture
All PSHM setups outperform MPI (even on Barrelfish)
On the Intel SCC
SCC MPBs are not very suitable for an operating system due to the lack of hardware protection
The MPBs are very small for multitasking; messages longer than the per-core MPB size need flow control.
Conclusions
Final project conclusions
After evaluating the project, we found:
Barrelfish is not yet mature enough for normal work; it still needs a lot of engineering work
Debugging race conditions is time-consuming, given the lack of a proper debugger and simulator
It has a lot of potential, especially because of its asynchronous nature, which can undoubtedly be exploited.
The Gasnet and Barrelfish MP models are different; many workarounds were needed to make it work
The Intel SCC platform has been designed more as an accelerator than as a standalone system.
Contributions
Accepted contributions
Barrelfish
17 commits accepted into the official Barrelfish tree:
Porting of the Newlib C library. Now all programs in the
tree link with it by default
IOAPIC index register access in 32-bit words
Cross-compiler C++ language support
System V shared memory extension
Thread mutex additional operations
Compiler/libc type decoupling
Hake tool extension for creating libraries from libraries
Bochs emulator
Accepted patch on Bochs emulator to continue deterministic
execution in debugging mode.
Contributions
Pending contributions and cross-compiler features
Pending contributions:
GNU cross-compiler tools for building programs on
Barrelfish
Gasnet BF conduit and internal modifications for running
it on Barrelfish
Cross-compiler features
Thanks to this project, it is now possible to run standard C++ programs on Barrelfish
Standard GNU programs compile with the cross-compiler with minor changes, such as:
./configure --host=x86_64-pc-barrelfish
Example: GNU bash
Contributions
GNU Bash screenshot
Future work
Proposals for continuing the project
Future work:
Redesign Barrelfish Bulk transfer for flexible bucket size
and full duplex operation.
Rewrite flounder to be thread-safe.
Better UMP driver with longer buffer windows.
Faster memcpy() implementations.
Running OpenMP/OmpSs programs with nanos runtime
using the current C++ cross-compiler
End
Questions?