The document discusses the implications of non-volatile main memories (NVMMs) for networking stacks. It presents a case study on write-ahead logging showing that, once storage is as fast as NVMM, networking becomes the bottleneck. It then proposes PASTE (Packet Store), which places static packet buffers on NVMM to enable fast, zero-copy logging that avoids data copies and the cache misses they cause. Preliminary results show a 10-88% increase in throughput and a 9-46% decrease in latency.
3. Case Study: Write-Ahead Logging
• Persist client’s request prior to acknowledgment
• Mask the overhead of updating the primary database (e.g., B-tree) from the client
• 1KB commit: 2030 us
• Networking takes 40 us
[Figure: data path of a logged request through client, NIC, network stack (DRAM), app, storage stack, and SSD/disk. Steps are labeled (1) to (5); step (3) is read(), step (4) is write()/memcpy() -> fsync()/msync().]
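The first bullet (persist before acknowledging) can be sketched as follows. This is a minimal illustration, not the code used in the case study: the function name `log_then_ack`, the 4-byte length prefix, and the use of a temporary file in place of a real log device are all assumptions made for the example.

```python
import os
import tempfile


def log_then_ack(log_fd: int, request: bytes) -> str:
    """Persist the request to the log, then return the acknowledgment."""
    # Hypothetical record format: 4-byte big-endian length, then the payload.
    record = memoryview(len(request).to_bytes(4, "big") + request)
    while len(record) > 0:
        written = os.write(log_fd, record)  # write() into the log file
        record = record[written:]           # handle short writes
    os.fsync(log_fd)                        # force the record to stable storage
    return "ACK"                            # only now answer the client


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "wal.log")
        fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_APPEND)
        try:
            print(log_then_ack(fd, b"x" * 1024))  # a 1KB commit
        finally:
            os.close(fd)
```

The update of the primary database can then happen after the acknowledgment, which is how its cost is hidden from the client.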
4. Case Study: Write-Ahead Logging
• Persist client’s request prior to acknowledgment
• Mask the overhead of updating the primary database (e.g., B-tree) from the client
• 1KB commit: 2000 us -> 42 us
• Networking takes 40 us
• The remaining 2 us is not small
[Figure: the same data path with NVMM in place of SSD/disk; the NVMM is emulated using a reserved region of DRAM. Steps are labeled (1) to (5); step (3) is read(), step (4) is write()/memcpy() -> fsync()/msync().]
5. Case Study: Write-Ahead Logging
[Figure: two plots, throughput [K trans/s] and latency [µs], against the number of concurrent connections (1 to 25). Series: Net. only; Net. + read()/msync() on NVMM (emul.); Net. + memcpy()/msync() on NVMM (emul.)]
• Parallel requests are serialized on each core
• 33 % throughput decrease, 50 % latency increase
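The two figures on this slide are consistent with each other if throughput is taken to be inversely proportional to per-request latency, which is a simplifying assumption for requests serialized on a core rather than something the slide states. Under that assumption, a 50 % latency increase gives a throughput of 1/1.5 of the baseline, i.e. a drop of about 33 %. The helper name `throughput_drop` is made up for this illustration.

```python
def throughput_drop(latency_increase: float) -> float:
    """Fractional throughput loss for a given fractional latency increase,
    assuming throughput is proportional to 1 / latency."""
    return 1.0 - 1.0 / (1.0 + latency_increase)


if __name__ == "__main__":
    # 50 % more latency per request -> roughly one third less throughput.
    print(f"{throughput_drop(0.50):.1%}")
```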
6. Data Copies Matter
• Cache Misses
• Copy to tmp buffer (e.g., read()) is cheap
• Logging always happens to a different destination
[Figure: read() copies from the kernel buffer to the app buffer; memcpy() copies from the app buffer to the mmap()-ed log file]
Configuration                                      Overall cache misses   Largest contributor
Networking only                                    0.0004 %               net_rx_action() (84 %)
Networking + NVMM (read() + memcpy() + msync())    4.4121 %               memcpy() (98 %)
Networking + NVMM (read() + msync())               8.3451 %               sys_read() (99 %)
We must avoid data copy for logging!
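The two logging paths compared above can be sketched as follows. This is an illustration under stated assumptions, not the benchmark code: a Unix socket pair stands in for the network connection, a temporary file stands in for the log on NVMM, and the function names `log_via_tmp_buffer` and `log_direct` are invented here. Python's `mmap.flush()` plays the role of msync().

```python
import mmap
import socket
import tempfile

LOG_SIZE = 4096  # arbitrary size for the demo log file


def log_via_tmp_buffer(sock, log, offset, nbytes):
    """read() into a temporary buffer, then copy it into the mmap()-ed log."""
    tmp = sock.recv(nbytes)                  # copy 1: kernel buffer -> app buffer
    log[offset:offset + len(tmp)] = tmp      # copy 2: app buffer -> log file
    log.flush()                              # msync()
    return len(tmp)


def log_direct(sock, log, offset, nbytes):
    """read() straight into the mmap()-ed log, with no temporary buffer."""
    with memoryview(log) as whole, whole[offset:offset + nbytes] as dest:
        n = sock.recv_into(dest)             # one copy: kernel buffer -> log file
    log.flush()                              # msync()
    return n


if __name__ == "__main__":
    sender, receiver = socket.socketpair()
    with tempfile.TemporaryFile() as f:
        f.truncate(LOG_SIZE)
        log = mmap.mmap(f.fileno(), LOG_SIZE)
        sender.sendall(b"hello")
        log_via_tmp_buffer(receiver, log, 0, 5)
        sender.sendall(b"world")
        log_direct(receiver, log, 5, 5)
        print(log[:10])
        log.close()
    sender.close()
    receiver.close()
```

Both variants still copy the payload out of the kernel buffer at least once, which is the cost the table attributes to memcpy() and sys_read().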
7. Packet Store (PASTE) Overview
• Static packet buffers on a named NVMM region
• DMA to NVMM
• Zero-copy APIs
• Fast logging
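[Figure: client, NIC, network stack, app, storage stack, and NVMM, with steps labeled (1) to (4). Packet buffers are in /mnt/nvmm/pktbufs and application metadata in /mnt/pmem/appmd; step (4) writes metadata only (e.g., buffer index).]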
8. Fast Logging with PASTE
[Figure: the application (user space) works with the NIC ring and the static packet buffers in /mnt/nvmm/pktbufs and with the metadata file; the interfaces shown are the netmap API and mmap(). The kernel performs TCP/IP input and output. Packet buffers are marked Unread, Read, or Flushed.]
/mnt/nvmm/myapp_metadata
  metadata header: /mnt/nvmm/pktbufs, buf_ofs: 123
  metadata entries:
    buf_idx  off  len
    0        100  1135
    1        100  932
    3        100  1024
(1) Read data (zero copy)
(2) Write metadata entry
(3) Flush (buffer and metadata)
• Because of DDIO (DMA to L3 cache), NVMM latency only arises on flushing data (i.e., step (3)), not on every packet
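Steps (2) and (3) can be sketched as follows: the application records only where a request lives (buffer index, offset, length) and flushes, without copying the payload. This is a hedged illustration, not the PASTE implementation. The 12-byte entry layout of three little-endian 32-bit fields, the slot numbering, and the helper names `append_entry` and `read_entry` are assumptions; a temporary file stands in for the metadata file on NVMM, and no netmap call is made.

```python
import mmap
import struct
import tempfile

# Assumed entry layout: buf_idx, off, len as little-endian unsigned 32-bit.
ENTRY = struct.Struct("<III")


def append_entry(meta, slot, buf_idx, off, length):
    """Step (2): write one metadata entry; step (3): flush it."""
    ENTRY.pack_into(meta, slot * ENTRY.size, buf_idx, off, length)
    meta.flush()  # msync(); the packet buffer itself would be flushed as well


def read_entry(meta, slot):
    """Return (buf_idx, off, len) for the given slot, e.g. during recovery."""
    return ENTRY.unpack_from(meta, slot * ENTRY.size)


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        f.truncate(4096)
        meta = mmap.mmap(f.fileno(), 4096)
        # The three entries shown on the slide.
        for slot, entry in enumerate([(0, 100, 1135), (1, 100, 932), (3, 100, 1024)]):
            append_entry(meta, slot, *entry)
        print([read_entry(meta, slot) for slot in range(3)])
        meta.close()
```

Each entry is a few bytes regardless of request size, so the per-request logging cost no longer grows with the payload.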
17. Fast Logging with PASTE
[Figure: same as slide 8, with one additional label: Idempotent request]