New Ethernet standards, such as 40 GbE or 100 GbE, are already being deployed commercially along with their corresponding Network Interface Cards (NICs) for servers. However, network measurement solutions are lagging behind: while several tools are available for monitoring 10 or 20 Gbps networks, higher speeds pose a harder challenge that requires new ideas, different from those applied previously, and so fewer applications are available. In this paper, we show a system capable of capturing, timestamping and storing 40 Gbps network traffic using a tailored network driver together with Non-Volatile Memory express (NVMe) technology and the Storage Performance Development Kit (SPDK) framework. We also present core ideas that can be extended to capture at higher rates: a multicore architecture that synchronizes with minimal overhead and reduces the reordering of received frames, methods to filter the traffic and discard unwanted frames without high computational cost, and the use of an intermediate buffer that allows simultaneous access from several applications to the same data as well as efficient disk writes. Finally, we show a testbed for reliable benchmarking of our solution using custom DPDK traffic generators and replayers, which have been made freely available to the network measurement community.
1. On the feasibility of 40 Gbps network data capture and retention with general purpose hardware
SAC 2018 | Pau, France
Guillermo Julián-Moreno
Rafael Leira
Jorge E. López de Vergara
Francisco Gómez-Arribas
Iván González
April 10, 2018
Naudit HPCN & Escuela Politécnica Superior, Universidad Autónoma de Madrid
4. Motivation
Why would we want to capture and store traffic?
• Online analysis and monitoring (e.g., flow records, traffic volume dashboards, IDS).
• Data retention for specialized analysis or policy requirements (e.g., GDPR).
The 10 GbE standard is widespread, and we are now seeing higher speeds (40 GbE, but also 100 GbE):
• 10 GbE is the “last” standard that we can process with a single core.
• 40 GbE and higher speeds require parallelism: how?
5. Purpose of the system
• Receive the traffic at 40 Gbps as efficiently as possible.
• Timestamp the incoming traffic.
• Store the network frames on disk at 40 Gbps.
• Use commercial off-the-shelf hardware to reduce costs.
7. Previous architecture
[Diagram: the NIC DMAs frames through a descriptor ring (head/tail pointers) into the data buffer.]
• Single thread copying frames to the intermediate buffer (sketched below).
• Write files by blocks and use padding at the end of each file.
• Return the descriptor’s ownership to the card after the copy (no allocations
needed).
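A minimal sketch of this previous single-threaded design, assuming a simplified descriptor layout (the real driver structures and the per-file padding logic are omitted):

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Simplified view of an RX descriptor; the actual NIC descriptor differs. */
    struct rx_desc {
        volatile int done;       /* set by the NIC when the frame has been DMA'd */
        const uint8_t *frame;    /* frame data in the DMA buffer */
        uint16_t len;
    };

    /* One thread walks the ring, copies each frame once into the intermediate
     * buffer and immediately returns the descriptor to the card, so no
     * allocations happen on the datapath. */
    void single_thread_rx(struct rx_desc *ring, size_t ring_size,
                          uint8_t *buf, size_t buf_size)
    {
        size_t head = 0, offset = 0;
        for (;;) {
            struct rx_desc *d = &ring[head];
            while (!d->done)
                ;                                   /* wait for the NIC */
            if (offset + d->len > buf_size)
                offset = 0;                         /* wrap; padding handling omitted */
            memcpy(buf + offset, d->frame, d->len);
            offset += d->len;
            d->done = 0;                            /* return ownership to the card */
            head = (head + 1) % ring_size;          /* advance the software head */
        }
    }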
8. Reading from NIC and copying to buffer
[Diagram: frames 1–4 from the single RX ring are copied into the intermediate buffer.]
• Usual approach for parallelism is RSS queues. Problem: ensuring uniform
distribution between queues.
• Given our limited scope, switch to single queue and fixed descriptor assignments:
uniform distribution and no synchronization required for reading.
• Use a single atomic counter for the buffer write offset (see the sketch below): as fast as possible and no
deadlocks possible.
• Add padding to the beginning and end of the files to avoid having frames
overrunning file boundaries.
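A minimal sketch of the per-thread receive loop under these assumptions (illustrative structures and names, ring size a multiple of the thread count, wrap and padding handling omitted):

    #include <stdatomic.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    struct rx_desc {
        volatile int done;
        const uint8_t *frame;
        uint16_t len;
    };

    extern _Atomic uint64_t write_offset;   /* single shared counter for the buffer */
    extern uint8_t *intermediate_buffer;
    extern size_t buffer_size;

    /* Thread `tid` owns every nthreads-th descriptor of the single RX ring, so
     * the frame-to-thread mapping is fixed and reading needs no synchronization.
     * The only shared state is the write offset, reserved with one atomic add. */
    void rx_thread(struct rx_desc *ring, size_t ring_size,
                   unsigned tid, unsigned nthreads)
    {
        for (size_t i = tid; ; i = (i + nthreads) % ring_size) {
            struct rx_desc *d = &ring[i];
            while (!d->done)
                ;                                            /* wait for the NIC */

            uint64_t off = atomic_fetch_add(&write_offset, d->len);
            memcpy(intermediate_buffer + off % buffer_size, d->frame, d->len);

            d->done = 0;                                     /* return the descriptor */
        }
    }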
9. Client reading
[Diagram: kernel RX threads 0–3 allocate buffer space and copy frames; a userspace client is notified of new data and reads it.]
• Userspace clients get the last written byte via syscalls and set their last read byte.
• RX thread 0 updates s, the space available in the buffer. No thread writes more
than ⌊s/n⌋ bytes in a batch (see the sketch below).
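A hedged sketch of that space accounting (names are illustrative, not the driver's actual API): RX thread 0 periodically recomputes the free space s from the slowest client's read offset, and every RX thread limits its batches to ⌊s/n⌋ bytes so writers can never overrun a reader.

    #include <stdatomic.h>
    #include <stddef.h>
    #include <stdint.h>

    extern _Atomic uint64_t write_offset;          /* last written byte (monotonic) */
    extern _Atomic uint64_t client_read_offset[];  /* last byte read by each client */
    extern size_t buffer_size;
    extern unsigned n_clients, n_rx_threads;

    _Atomic uint64_t batch_budget;                 /* max bytes per RX thread batch */

    /* Run periodically by RX thread 0 only. */
    void update_batch_budget(void)
    {
        uint64_t slowest = atomic_load(&write_offset);
        for (unsigned c = 0; c < n_clients; c++) {
            uint64_t r = atomic_load(&client_read_offset[c]);
            if (r < slowest)
                slowest = r;
        }
        /* Free space s: buffer size minus data not yet consumed by the slowest client. */
        uint64_t s = buffer_size - (atomic_load(&write_offset) - slowest);
        atomic_store(&batch_budget, s / n_rx_threads);     /* floor(s / n) */
    }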
10. Writing to disk
Two options for the write process:
• Regular files written in 4 MB blocks. Needs a fast filesystem.
• Distributed writes between several NVMe disks with SPDK.
Features to reduce hardware requirements:
• Simple filtering system that searches for “strings” of bytes at fixed positions (see the sketch below).
• Selective storage: only store the first N bytes of each frame.
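A minimal sketch of both features, with hypothetical names (this is not the actual driver API or on-disk format): a rule that accepts a frame only when a byte string matches at a fixed offset, plus a snaplen-style cap on how many bytes are stored.

    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    /* Keep a frame only if `len` bytes at `offset` equal `pattern`. */
    struct byte_filter {
        size_t offset;
        size_t len;
        const unsigned char *pattern;
    };

    static bool frame_matches(const unsigned char *frame, size_t frame_len,
                              const struct byte_filter *f)
    {
        if (f->offset + f->len > frame_len)
            return false;                    /* pattern would run past the frame */
        return memcmp(frame + f->offset, f->pattern, f->len) == 0;
    }

    /* Selective storage: write at most `snap` bytes of each accepted frame. */
    static size_t stored_length(size_t frame_len, size_t snap)
    {
        return frame_len < snap ? frame_len : snap;
    }

For example, a rule with offset 12 and the two-byte pattern 0x08 0x00 matches the EtherType field of untagged Ethernet frames and keeps only IPv4 traffic, at the cost of a single memcmp per frame.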
12. Hardware used
            Traffic generator       RX Server 1             RX Server 2
CPU         Intel Xeon E5-1620 v2   Intel Xeon E5-1620 v2   2 × Intel Xeon E5-2630 v4
Clock       3.70 GHz                3.70 GHz                2.20 GHz
Cores       4                       4                       2 × 10
Memory      32 GB                   32 GB                   2 × 64 GB
NIC         Intel XL710             Intel XL710             Intel XL710
Storage     SATA RAID               SATA RAID               6 × NVMe
Est. cost   €7,000                  €7,000                  €10,000
Table 1: Specifications of the servers used for testing. HyperThreading was disabled.
13. Storage speed
[Plot: disk write rate (Gbps) vs. number of disks; series: software RAID, SPDK, and theoretical maximum speed.]
Figure 1: Performance of the NVMe disk array.
14. Traffic generation
[Plot: generation rate (Gbps) vs. frame size (bytes); series: send rate and theoretical maximum rate.]
Figure 2: Synthetic traffic rates achieved with our custom, DPDK-based traffic generator. We also
made a version capable of sending large PCAP files at line rate.
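As an illustration of the kind of transmit loop such a generator runs (a hedged sketch, not our generator's actual code; EAL and port initialization, header construction and rate control are omitted):

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define BURST 32

    static void tx_loop(uint16_t port, struct rte_mempool *pool, uint16_t frame_len)
    {
        struct rte_mbuf *burst[BURST];

        for (;;) {
            /* Allocate a burst of mbufs from the pool. */
            if (rte_pktmbuf_alloc_bulk(pool, burst, BURST) != 0)
                continue;
            for (int i = 0; i < BURST; i++) {
                burst[i]->data_len = frame_len;   /* synthetic frame of the desired size */
                burst[i]->pkt_len  = frame_len;
                /* Ethernet/IP headers would be filled in here. */
            }
            /* Hand the burst to the NIC, retrying until everything is enqueued. */
            uint16_t sent = 0;
            while (sent < BURST)
                sent += rte_eth_tx_burst(port, 0, burst + sent, BURST - sent);
        }
    }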
15. Timestamping accuracy
Frames                   Mean      Std
All                      1738 ns   3296 ns
One out of every eight   55 ns     287 ns
Table 2: Timestamping accuracy. The Intel NIC posts descriptors in batches of eight, so we have to
take that into account for the accuracy.
16. Traffic capture
[Plot: capture rate (Gbps) and loss (%) vs. frame size (bytes); series: lost %, port drop %, send rate, capture rate, and theoretical maximum rate.]
Figure 3: Results of the first test: retrieval of the
frames from the NIC. The bottleneck is the card
for small frame sizes.
[Plot: capture rate (Gbps) and loss (%) vs. frame size (bytes); series: lost %, port drop %, send rate, capture rate, and theoretical maximum rate.]
Figure 4: Results of the second test: writing of
the frames to a null device.
17. Traffic storage
[Plot: capture rate (Gbps) and loss (%) vs. frame size (bytes); series: lost %, send rate, capture rate, and theoretical maximum rate.]
Figure 5: Results of the third test: traffic storage using SPDK.
18. Traffic storage
Name         Size     Avg. frame size   Send rate    Loss %
CAIDA        222 GB   787.91 B          39.78 Gbps   <0.01
University   4.3 GB   910.08 B          39.82 Gbps   0
Table 3: Performance when receiving replayed traffic capture files.
20. Results
• We have created and open-sourced a system capable of capturing, timestamping
and storing network traffic at 40 Gbps.
• Not using RSS parallelism is feasible and useful in our limited-scope system.
• The one-copy mechanism and synchronization algorithms allow our system to store
line-rate traffic at frame sizes of 300 bytes and above (enough for the majority of
environments).
• We have created a testbed capable of saturating 40 GbE links for frames of size 96
bytes or greater.
21. Future work
• Improve the selective-storage approach: more effective filters (ASCII/BPF) or limits
based on RX rate.
• Detailed comparison of the disorder and timestamp inaccuracies between our
approach and RSS queues.
• Port this system to virtual machines with SR-IOV virtual functions (VFs).