In-Network Acceleration with FPGA (MEMO)
 

05 Feb, 2013
SAKURA Internet Research Center
Senior Researcher / Naoto MATSUMOTO

    In-Network Acceleration with FPGA (MEMO): Presentation Transcript

    • 18 Feb, 2013 / SAKURA Internet Research Center / Senior Researcher / Naoto MATSUMOTO
    • Hardware Acceleration Overview. Source: (c) 2012 Enyx
    • In-Network Acceleration Model Source: (c) 2012 Enyx
    • Hardware Acceleration Limitation: supports up to 16 TCP sessions only. Source: © 2012 PLDA. All Rights Reserved.
    • Arista 7124FX / Altera 5SGXEA7N2F45C2N
      - 622,000 Logic Elements
      - 234,750 Adaptive Logic Modules (ALMs)
      - 642,000 Registers
      - 57 Mb internal memory
      - 512 18×18 hardware multipliers
      - 256 27×27 Digital Signal Processing Blocks
      - 28 PLLs for digital clock synthesis
      - 2 x 4 GB DDR3-1333 ECC memory @ 600 MHz
      - 3 x 72 Mbit QDR II+ SRAM memory @ 333 MHz
      - 16+8 SFP+ ports for 1 Gb/s and 10 Gb/s Ethernet fiber or copper applications
      - PCI Express Generation 2.0 x4 interface
      Source: (c) 2012 Enyx. Copyright © 2013 Arista Networks, Inc. All rights reserved.
    • Comparison of FPGA SPECs
      Arista 7124FX / Enyx FPB1 (NIC) / Accelize XpressGX4LP (NIC)  |  SolarFlare AOE (NIC)
      Altera Stratix V 5SGXEA7N2F45C2N           |  Altera EP4SGX530KF40C2N
      - 622,000 Logic Elements                   |  - 531,200 Logic Elements
      - 234,750 Adaptive Logic Modules (ALMs)    |  - 212,480 Adaptive Logic Modules (ALMs)
      - 642,000 Registers                        |  - 424,960 Registers
      - 57 Mb internal memory                    |  - 2.6 Mb internal memory
      - 512 18×18 hardware multipliers           |  - 1024 18×18 hardware multipliers
      - 256 27×27 Digital Signal Processing Blocks
      - 28 PLLs for digital clock synthesis
      Source: (c) 2012 Enyx
    • Enabling To-Do
      1) Use the pre-installed kernel modules for the Mellanox 40GbE NIC (mlx4_core, mlx4_en).
      2) Load the 40GbE NIC kernel module via /etc/modules.
      $ show version
        Version:      VC6.5R1
        Description:  Vyatta Core 6.5 R1
      $ sudo vi /etc/modules
        mlx4_en
      $ sync; sync; sync; reboot
      © 2013 Mellanox Technologies. All Rights Reserved.
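      As a quick sanity check (not from the slide), the following commands confirm on the
      Debian-based system underneath Vyatta that the mlx4 driver actually came up after the
      reboot; the module names are the ones used above:

      $ lsmod | grep mlx4          # mlx4_en and mlx4_core should be listed
      $ dmesg | grep -i mlx4       # driver probe messages for the ConnectX adapter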
    • 40GbE-NIC Status Check
      $ show interfaces ethernet eth1 physical
        Settings for eth1:
          Supported ports: [ TP ]
          :
          Speed: 40000Mb/s
          Duplex: Full
          Port: Twisted Pair
          :
          Link detected: yes
        driver: mlx4_en
        version: 2.0 (Dec 2011)
        firmware-version: 2.10.800
        bus-info: 0000:01:00.0
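      Outside the Vyatta CLI, the same link and driver details can be read with ethtool
      (a minimal sketch, assuming the interface is still named eth1):

      $ ethtool eth1               # speed, duplex and link state
      $ ethtool -i eth1            # driver, version, firmware-version, bus-info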
    • HighGig DATA Transfer Benchmark
      0.87 Gbit/sec*   5.58 Gbit/sec*   8.00 Gbit/sec*   13.68 Gbit/sec*   18.23 Gbit/sec**
      [System: Intel® Core™ i7-3930K CPU @ 3.20GHz / 32GB DDR3-DIMM / Linux 3.7-rc7 / Mellanox ConnectX3 40GbE-NIC]
      [Benchmark tools: wget+thttpd+tmpfs*, rcopy+tmpfs**]
      SOURCE: SAKURA Internet Research Center. 12/2012 rev2 Project THORN.
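      The wget+thttpd+tmpfs measurement can be reproduced roughly as follows; this is a
      sketch only, and the mount point, file size and port are assumptions rather than
      values from the deck:

      # server: publish a RAM-backed file over HTTP so disk I/O stays out of the path
      $ sudo mount -t tmpfs -o size=8g tmpfs /mnt/tmpfs
      $ dd if=/dev/zero of=/mnt/tmpfs/test.bin bs=1M count=4096
      $ thttpd -d /mnt/tmpfs -p 8080

      # client: fetch and discard the file, then read the throughput wget reports
      $ wget -O /dev/null http://<server-ip>:8080/test.bin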
    • Application Bottleneck in OS (BAD Know-How)
      5.28 Gbit/sec*   5.17 Gbit/sec*   5.37 Gbit/sec* (application bottleneck)   18.24 Gbit/sec**
      [System: Intel® Core™ i7-3930K CPU @ 3.20GHz / 32GB DDR3-DIMM / Linux 3.7-rc7 / Mellanox ConnectX3 40GbE-NIC]
      [Benchmark tools: nc+dd+tmpfs*, rcopy+tmpfs**]
      SOURCE: SAKURA Internet Research Center. 12/2012 rev2 Project THORN.
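      The nc+dd single-stream path that runs into this application bottleneck can be
      approximated as below (a sketch; the port and transfer size are assumptions):

      # receiver: sink everything arriving on TCP port 5001
      $ nc -l 5001 > /dev/null     # traditional netcat needs "nc -l -p 5001"

      # sender: push 4 GB of zeros from memory through one TCP stream
      $ dd if=/dev/zero bs=1M count=4096 | nc <receiver-ip> 5001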
    • DPDK TESTING Overview
      1) The Intel® DPDK source code for Linux was released at the end of 2012.
         http://www.intel.com/p/en_US/embedded/hwsw/technology/packet-processing
      Running Intel® DPDK Applications in a Linux Environment:
      To run an Intel® DPDK application, some customization must be done on the target machine.
      Running an Intel® DPDK application requires some kernel configuration customization (done
      at build time) and some dynamic kernel tweaks (modules, procfs).
      Required: glibc >= 2.7 (for features related to cpuset), etc.
      Intel® 10Gbps dual-port network adapter. A Linux DPDK Layer-3 router is an evolutionary
      network technology.
      Source: SAKURA Internet Research Center. 11/2012: Project THORN
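      As an illustration of the "dynamic kernel tweaks" mentioned above, a typical
      preparation sequence for the DPDK 1.x releases of that period looked roughly like
      this; the hugepage count and the build directory path are assumptions, not taken
      from the slide:

      # reserve 2 MB hugepages and mount the hugetlbfs that the DPDK EAL expects
      $ echo 1024 | sudo tee /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
      $ sudo mkdir -p /mnt/huge
      $ sudo mount -t hugetlbfs nodev /mnt/huge

      # load the userspace I/O modules used to hand the 82599 ports over to DPDK
      $ sudo modprobe uio
      $ sudo insmod <dpdk-build-dir>/kmod/igb_uio.ko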
    • DPDK Layer 3 Fwd Benchmark
      [Layer 3 Forwarder with Intel® DPDK]
      Intel® Core™ i7-3960X CPU @ 3.30GHz / Intel 82599EB 10GbE-NIC / PCI Express 3.0
      Linux 2.6.32-220.23.1.el6.x86_64 (VXLAN Network)
      # ./build/l3fwd -c 0x3 -n 2 -- -p 0x3 --config="(0,0,0),(1,0,1)"
      :
      done: Port 0 Link Up - speed 10000 Mbps - full-duplex
      done: Port 1 Link Up - speed 10000 Mbps - full-duplex
      L3FWD: entering main loop on lcore 1
      L3FWD: -- lcoreid=1 portid=1 rxqueueid=0
      :
      [Traffic Generator] (MTU 64-byte short packets)
      Intel® Core™ i7-3930K CPU @ 3.20GHz / Intel 82599EB 10GbE-NIC / PCI Express 2.0
      10.0.0.11 / 00:0C:BD:00:E8:1B
      # pkt-gen -i ix1 -f tx -l 64 -d 10.0.0.22
      main [1042] map size is 207712 Kb
      main [1064] mmapping 207712 Kbytes
      main [1119] Ready...
      sender_body [607] start
      sender_body [644] drop copy
      main [1231] 14115785 pps
      main [1231] 14118009 pps
      : [14.1Mpps]
      [Packet Receiver]
      AMD E-350 1.76GHz / DDR3 8GB / Intel 82599EB 10GbE-NIC / PCI Express 2.0
      10.0.0.22 / 90:E2:BA:23:02:9D
      # pkt-gen -i ix1 -f rx
      main [1071] map size is 207712 Kb
      main [1093] mmapping 207712 Kbytes
      main [1146] Wait 2 secs for phy reset
      main [1148] Ready...
      main [1257] 1206448 pps
      main [1257] 13602560 pps
      main [1257] 13573141 pps
      : [13.5Mpps]
      Source: SAKURA Internet Research Center. 11/2012: Project THORN
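      For readability, the l3fwd command line above breaks down as follows (standard
      DPDK l3fwd option semantics, not spelled out on the slide):

      # ./build/l3fwd -c 0x3 -n 2 -- -p 0x3 --config="(0,0,0),(1,0,1)"
      #   -c 0x3      EAL coremask: run on lcores 0 and 1
      #   -n 2        number of memory channels announced to the EAL
      #   -p 0x3      application port mask: enable ports 0 and 1
      #   --config="(0,0,0),(1,0,1)"
      #               (port,queue,lcore) tuples: port 0 queue 0 on lcore 0,
      #               port 1 queue 0 on lcore 1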
    • DPDK Layer 3 Fwd perf stat
      [Layer 3 Forwarder with Intel® DPDK]
      Intel(R) Core(TM) i7-3960X CPU @ 3.30GHz / Intel 82599EB 10GbE-NIC / PCI Express 3.0
      Linux 2.6.32-220.23.1.el6.x86_64 (VXLAN Network)
      [Traffic Generator] MTU 64-byte short packets -> [Packet Receiver]
      # perf stat ./build/l3fwd -c 0x3 -n 2 -- -p 0x3 --config="(0,0,0),(1,0,1)"
      :
      Performance counter stats for './build/l3fwd -c 0x3 -n 2 -- -p 0x3 --config=(0,0,0),(1,0,1)':
          92805.936402  task-clock                #    1.853 CPUs utilized
                   133  context-switches          #    0.000 M/sec
                    13  CPU-migrations            #    0.000 M/sec
                 1,958  page-faults               #    0.000 M/sec
       370,566,087,852  cycles                    #    3.993 GHz                     [83.33%]
       102,860,504,930  stalled-cycles-frontend   #   27.76% frontend cycles idle    [83.33%]
        32,572,874,185  stalled-cycles-backend    #    8.79% backend cycles idle     [66.67%]
       663,418,320,041  instructions              #    1.79 insns per cycle
                                                  #    0.16 stalled cycles per insn  [83.33%]
       106,088,555,938  branches                  # 1143.123 M/sec                   [83.33%]
            63,608,468  branch-misses             #    0.06% of all branches         [83.33%]
          50.077399637  seconds time elapsed
      Source: SAKURA Internet Research Center. 11/2012: Project THORN
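      The 1.79 instructions-per-cycle figure perf reports is just the ratio of the two
      raw counters above, which can be checked directly:

      $ awk 'BEGIN { print 663418320041 / 370566087852 }'   # prints roughly 1.79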
    • Thanks for your interest. SAKURA Internet Research Center.