High-resolution Timer-based 
Packet Pacing Mechanism 
on the Linux Operating System 
Ryousei Takano, Tomohiro Kudoh, 
Yuetsu Kodama, Fumihiro Okazaki 
 
Information Technology Research Institute, 
National Institute of Advanced Industrial 
Science and Technology (AIST) 
Internet Conference 2010, October 26, 2010
Outline
• Packet pacing
• Packet pacing mechanism using a high-resolution timer
• Evaluation experiments
• Summary
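Background (1/2)
• Network capacity keeps growing, but maximizing bandwidth utilization efficiency remains difficult
• The cause of the problem is contention between communications
  e.g. the TCP incast problem
• MPI All-to-all communication
• MapReduce shuffle communication
[Figure: compute nodes 1 to N send through one switch, whose buffer overflows]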
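Background (2/2)
• How can bandwidth utilization efficiency be maximized?
– (sending rate) = (available bandwidth) / (number of nodes communicating at the same time)
• However, because communication is bursty, the available bandwidth is exceeded momentarily
Packet pacing (smoothing of burstiness) is needed
[Figure: real communication (bursty): senders 1 to 3 average BW / 3 each but burst at BW into a switch whose output is BW, so the buffer overflows. Ideal communication (no bursts): senders 1 to 3 each send steadily at BW / 3.]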
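Achieving ideal pacing
• Accurate control of the packet transmission interval is needed
• When (packet size) = (inter-packet gap), the sending rate is 1/2 of the physical bandwidth
– e.g. required precision of the packet transmission interval:
    1 Gbps, MTU 1500 B: 24 microseconds
    10 Gbps, MTU 9000 B: 14.4 microseconds
[Figure: packet transmission interval and inter-packet gap]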
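The two interval figures above follow directly from packet size and target rate. A minimal sketch of that arithmetic is given here; the function names are illustrative and do not come from PSPacer.

```python
def packet_interval_us(packet_bytes: int, target_gbps: float) -> float:
    """Time between the starts of consecutive packets, in microseconds."""
    if target_gbps <= 0:
        raise ValueError("target rate must be positive")
    # bits / (Gbit/s) gives nanoseconds; divide by 1000 for microseconds.
    return packet_bytes * 8 / (target_gbps * 1000.0)


def gap_bytes(packet_bytes: int, link_gbps: float, target_gbps: float) -> float:
    """Idle time after each packet, expressed in bytes at link speed."""
    if not 0 < target_gbps <= link_gbps:
        raise ValueError("target rate must be in (0, link rate]")
    return packet_bytes * (link_gbps / target_gbps - 1.0)


if __name__ == "__main__":
    for link, mtu in ((1.0, 1500), (10.0, 9000)):
        target = link / 2  # packet size equals gap size
        print(link, mtu, packet_interval_us(mtu, target), gap_bytes(mtu, link, target))
```

Framing overhead (preamble, inter-frame gap, headers) is ignored here, as it is in the slide's round numbers.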
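PSPacer
• Software for using communication bandwidth efficiently
• Achieves precise pacing on Gigabit Ethernet in software alone
– Smooths burst transmission to match the available bandwidth, giving stable, high communication performance
[Figure: at a switch/router, highly bursty traffic causes packet loss (buffer overflow) and degrades communication performance. PSPacer adjusts the packet intervals and generates smoothed, stable traffic.]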
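Packet pacing implementation approaches
• Hardware implementation
– Implementations using FPGAs or NPs
– Chelsio T210
• Software implementation
– Timer-driven approach
    Low-resolution timer
    High-resolution timer: PSPacer/HT
– Gap packet approach: PSPacer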
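PSPacer: byte clock
• Timer-driven approach
– On gigabit-class networks, control with microsecond precision is difficult
• OS timer interval: 1 to 10 milliseconds
– Frequent timer interrupt handling increases overhead
• Byte clock
– Uses the number of bytes sent as the clock instead of timer interrupts
• The time needed to send 1 byte is constant (10 Gbps: 0.8 nanoseconds)
– If packets can be sent back to back at wire rate, the packet transmission interval can be controlled accurately
[Figure: byte clock axis marked 0, 9K, 18K, 27K bytes; each 9000 B packet takes 7.2 us. Transmission timing is determined by the number of bytes sent since communication started.]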
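The byte clock idea can be illustrated by laying out real packets and gap packets as byte offsets on the wire. This is a simplified sketch: the function names are hypothetical, each gap is modelled as a single frame of arbitrary size, and preamble, inter-frame gap and frame size limits are ignored.

```python
def byte_time_ns(link_gbps: float) -> float:
    """Time to put one byte on the wire, in nanoseconds."""
    return 8.0 / link_gbps


def offset_to_us(byte_offset: int, link_gbps: float) -> float:
    """Convert a byte-clock value into elapsed microseconds."""
    return byte_offset * byte_time_ns(link_gbps) / 1000.0


def frame_sequence(n_packets: int, packet_bytes: int,
                   link_gbps: float, target_gbps: float):
    """Return (kind, start_offset, size) tuples for a paced stream.

    The sender always transmits at wire rate; the gap frames are what
    slow the real packets down to the target rate.
    """
    if not 0 < target_gbps <= link_gbps:
        raise ValueError("target rate must be in (0, link rate]")
    gap = round(packet_bytes * (link_gbps / target_gbps - 1.0))
    frames, offset = [], 0
    for _ in range(n_packets):
        frames.append(("real", offset, packet_bytes))
        offset += packet_bytes
        if gap > 0:
            frames.append(("gap", offset, gap))
            offset += gap
    return frames


if __name__ == "__main__":
    for kind, start, size in frame_sequence(3, 9000, 10.0, 5.0):
        print(kind, start, size, offset_to_us(start, 10.0))
```

Because every offset is a byte count, no timer is consulted at any point; timing accuracy depends only on the sender keeping the link busy.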
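PSPacer: gap packet approach
• Uses PAUSE frames (IEEE 802.3x flow control)
– No side effects
• Always discarded at the input port of the switch or router
– Only the real packets travel on, keeping their original transmission interval
– No special hardware is needed
[Figure: sending PC, switch, real packets, gap packets]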
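Limitations of the gap packet approach
1. The host must be able to send packets at wire rate
– Accurate pacing is impossible when CPU performance is insufficient or the PCI bus becomes the bottleneck
• e.g. 10 GbE, or GbE on a 32bit/33MHz PCI bus (theoretical rate 133 MB/s)
2. It cannot be used on anything other than Ethernet
– There is no means (PAUSE frame) of realizing gap packets
3. Virtual devices are not supported
– e.g. bonding, tap devices
– Support is possible in principle, but the virtual device drivers would need to be modified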
Outline
• Packet pacing
• Packet scheduling mechanism using a high-resolution timer
• Evaluation experiments
• Summary
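Linux timer system
[Figure: timeline of ticks (jiffies) 1000 to 1004, marking low-resolution timer ticks with events, ticks without events, and high-resolution timer events]
• Low-resolution timer
– Handlers run with a period of 1/HZ seconds
– Various processing other than timer events also runs at the same time
• High-resolution timer
– A handler can be registered for an arbitrary time
• Periodic or one-shot
– Lightweight event processing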
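Revisiting the timer-driven approach
Problem: CPU load from high-frequency interrupt handling
  → the high-resolution timer approach
– Its resolution is sub-microsecond and it is lightweight
• Resolution since Linux kernel 2.6.31: 1/16 microsecond
– OS support is in place
Other ways of reducing CPU load
– Multi-core support in the network stack
– Offload features provided by NICs
• e.g. TCP Segmentation Offload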
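PSPacer/HT implementation
• Implemented as a kernel module (Qdisc)
– No kernel recompilation needed
– Independent of the application, protocol stack, and driver
• Usable from standard Linux tools
– Iproute2 (tc(8))
[Figure: packets flow from the socket layer (socket buffer) through the protocol stack, are enqueued through a classifier into the interface queues, and are dequeued by the PSPacer/HT byte clock scheduler to the device driver; a Netlink socket I/F is also shown]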
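Byte clock scheduler
Class clock: holds the scheduled transmission time of the packet at the head of the queue
Global clock: the number of bytes sent so far
If the smallest class clock is less than the global clock, the packet at the head of that queue is sent and the clocks are updated
In practice, "transmission idle time" is needed
A high-resolution timer is set, and the scheduler waits until the next packet transmission time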
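The class clock and global clock rules above can be sketched as a small user-space simulation. This is an illustrative model only, not the PSPacer/HT kernel code: the `PacedClass` type, the tie-breaking order and the clock update rule are assumptions, and the idle wait that PSPacer/HT performs with a high-resolution timer is represented by simply advancing the global clock.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class PacedClass:
    rate_gbps: float                             # target rate of this class
    queue: deque = field(default_factory=deque)  # packet sizes in bytes
    clock: float = 0.0                           # class clock, in bytes


def run(classes, link_gbps: float):
    """Return (start_time_us, class_index, size) for every packet sent."""
    global_clock = 0.0  # bytes' worth of link time elapsed so far
    sent = []
    while any(c.queue for c in classes):
        # Pick the backlogged class with the smallest class clock.
        idx, cls = min(((i, c) for i, c in enumerate(classes) if c.queue),
                       key=lambda ic: ic[1].clock)
        if cls.clock > global_clock:
            # Nothing is due yet: idle until the class clock is reached.
            # (PSPacer/HT would arm a high-resolution timer here.)
            global_clock = cls.clock
        size = cls.queue.popleft()
        sent.append((global_clock * 8 / (link_gbps * 1000.0), idx, size))
        # Next packet of this class is due one pacing interval later.
        cls.clock = global_clock + size * link_gbps / cls.rate_gbps
        global_clock += size  # the link is busy for `size` bytes
    return sent


if __name__ == "__main__":
    one = PacedClass(rate_gbps=5.0, queue=deque([9000] * 3))
    for event in run([one], link_gbps=10.0):
        print(event)
```

With two classes paced at 5 Gbps each on a 10 Gbps link, the same loop interleaves their packets and never idles, since one class is always due when the other finishes.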
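Comparison of approaches
Criteria: generality, accuracy, CPU load
Approaches compared:
• Low-resolution timer
• High-resolution timer (PSPacer/HT)
• Gap packet (PSPacer)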
Outline
• Packet pacing
• Packet pacing mechanism using a high-resolution timer
• Evaluation experiments
• Summary
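Evaluation
• Experiment items
– Average bandwidth
• Vary the sending rate in 100 Mbps steps and measure the error between the target value and the measured value
– Burstiness
• An index of buffer usage at the bottleneck router or switch
• Computed by simulation from packet capture results
– CPU load
• Relationship with the number of streams
• Systems evaluated
– PSPacer, PSPacer/HT, HTB (Hierarchical Token Bucket)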
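HTB: Hierarchical Token Bucket
• A standard Linux Qdisc module
• Supports hierarchical bandwidth control like CBQ (Class based queuing)
• Uses a high-resolution timer for packet scheduling
– Resolution since Linux kernel 2.6.31: 1/16 microsecond
• Differences in how the wait time is computed
– PSPacer/HT: computed from the target rate for every packet
– HTB: looked up in the l2t (length to time) table
• The table has only 256 entries, which limits precision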
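To see why a 256-entry length-to-time table limits precision, consider the simplified model below. The table layout (power-of-two cell size chosen so the MTU fits in 256 slots, times stored as whole 1/16 microsecond ticks) is an assumption made for illustration; it is not taken from the slides and has not been checked against the HTB source.

```python
TABLE_SIZE = 256
TICK_NS = 62.5  # 1/16 microsecond


def build_l2t(rate_bps: float, mtu: int):
    """Build a hypothetical length-to-time table for one rate."""
    cell_log = 0
    while (mtu >> cell_log) >= TABLE_SIZE:
        cell_log += 1  # widen the cells until the MTU fits in the table
    table = []
    for slot in range(TABLE_SIZE):
        size = (slot + 1) << cell_log          # largest length in this cell
        ns = size * 8 * 1e9 / rate_bps
        table.append(int(ns / TICK_NS))        # truncate to whole ticks
    return cell_log, table


def lookup_ticks(table, cell_log: int, length: int) -> int:
    """Table lookup: every length in the same cell gets the same time."""
    return table[min(length >> cell_log, TABLE_SIZE - 1)]


def exact_ticks(rate_bps: float, length: int) -> float:
    """Per-packet computation from the target rate, with no table."""
    return length * 8 * 1e9 / rate_bps / TICK_NS


if __name__ == "__main__":
    cell_log, table = build_l2t(5e9, 9000)
    print(1 << cell_log, lookup_ticks(table, cell_log, 9000), exact_ticks(5e9, 9000))
```

In this model a 9000-byte MTU forces 64-byte cells, so packet lengths that differ by up to 63 bytes are charged the same transmission time, whereas a per-packet computation has no such quantization.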
Experimental environment
[Figure: sender and receiver, each with a Myri-10G NIC, connected through GtrcNET-10]
• Machine specifications (PC A)
– CPU: Quad-core Xeon (E5430) x 2
– NIC: Myricom Myri-10G (PCIe x 8)
• MTU: 9000 byte
– Memory: 8GB DDR2-667
• OS: Ubuntu 9.10 server
– Linux kernel 2.6.31-10 + myri10ge driver 1.5.1
– sysctl parameters:
• net.core.netdev_max_backlog 25000
• net.core.rmem_max 16777216
• net.core.wmem_max 16777216
• net.ipv4.tcp_rmem 4096 65536 16777216
• net.ipv4.tcp_wmem 4096 87380 16777216
• net.ipv4.tcp_no_metrics_save 1
GtrcNET
• A hardware network testbed built on a large-scale FPGA
• Various functions can be programmed to run at wire rate
• GtrcNET-1: GbE (GBIC) x 4ports + 16MBytes Memory/port
• GtrcNET-10: 10GbE (XENPAK) x 3ports + 1GBytes Memory/port
• Implemented functions
• Bandwidth measurement (per port, per stream, per VLAN)
• Delay emulation
• Packet capture
• Test packet generation
• Sending rate control (pacing, shaping, policing)
http://projects.itri.aist.go.jp/gnet/
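Accuracy of average bandwidth control
Sending bandwidth while running Iperf for 5 seconds, measured with GtrcNET-10
[Figure: Observed Bandwidth - Target Bandwidth (Gbps), from -0.4 to 0.4, plotted against Target Bandwidth (Gbps), from 0 to 10, for PSPacer/HT, PSPacer, and HTB]
Maximum error (error rate):
  +473 Mbps (+9.5%)
  +36 Kbps (0.0%)
  -287 Mbps (-5.7%)
HTB: due to the sending rate estimation
PSPacer: wire rate is not reached
PSPacer/HT: achieves accurate pacing even on a system that cannot reach wire rate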
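Burstiness
• An index of buffer usage at the bottleneck router or switch
– The larger it is, the higher the risk of buffer overflow
• Captured 70 packets while sending at 5 Gbps and computed the maximum burstiness by simulation

Method                         max. burstiness
PSPacer                        7
PSPacer/HT                     9
HTB                            8
Low-resolution timer (1 ms)    39

The high-resolution timer is highly effective at suppressing burstiness
Note: TSO enabled; ideally [value illegible in source]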
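The slides do not spell out how burstiness is computed from the capture. One plausible reading, sketched below under that assumption, replays the captured arrival times into a FIFO buffer drained at the target rate and reports the peak number of packets held, counting the one in transmission. The function name and the integer-nanosecond units are illustrative choices.

```python
from collections import deque


def max_burstiness(arrival_ns, packet_bytes: int, drain_gbps: float) -> int:
    """Peak occupancy (in packets) of a FIFO drained at drain_gbps.

    arrival_ns must be sorted integer timestamps in nanoseconds.
    """
    # bits / (Gbit/s) gives nanoseconds.
    service_ns = round(packet_bytes * 8 / drain_gbps)
    departures = deque()  # departure times of packets still in the buffer
    peak = 0
    for t in arrival_ns:
        # Drop packets that have fully left the buffer by time t.
        while departures and departures[0] <= t:
            departures.popleft()
        start = max(t, departures[-1]) if departures else t
        departures.append(start + service_ns)
        peak = max(peak, len(departures))
    return peak


if __name__ == "__main__":
    paced = [i * 14400 for i in range(7)]   # 9000 B every 14.4 us
    burst = [i * 7200 for i in range(7)]    # 9000 B back to back at 10 Gbps
    print(max_burstiness(paced, 9000, 5.0), max_burstiness(burst, 9000, 5.0))
```

Under this definition a perfectly paced stream never holds more than one packet, while back-to-back bursts grow the buffer with the burst length.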
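Delay in high-resolution timer handler processing
Changed the kernel timer interrupt setting and measured the delay of timer events
(1) Kernel timer interrupt: 10 milliseconds, burstiness = 53
(2) Kernel timer interrupt: 1 millisecond, burstiness = 9
Events that the high-resolution timer handler could not process are delayed until the low-resolution timer handler next runs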
CPU load (1 stream)

Total target bandwidth    PSPacer    PSPacer/HT    HTB
1 Gbps                    0.66       0.71          0.84
2 Gbps                    1.80       1.60          1.83
4 Gbps                    3.74       3.66          3.92
8 Gbps                    7.67       8.35          8.88

As the communication bandwidth grows, the load of high-resolution timer processing increases
CPU load (multiple streams)
Number of streams: 1, or (total bandwidth / 50 Mbps)

Total target      PSPacer            PSPacer/HT         HTB
bandwidth         1       BW/50M     1       BW/50M     1       BW/50M
1 Gbps            0.66    1.04       0.71    0.91       0.84    0.82
2 Gbps            1.80    2.16       1.60    2.44       1.83    1.88
4 Gbps            3.74    4.78       3.66    8.19       3.92    4.49
8 Gbps            7.67    11.19      8.35    17.04      8.88    25.55

As the number of streams grows, the load of high-resolution timer processing increases
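Summary of experimental results
Criteria: generality, accuracy, CPU load
Approaches compared:
• Low-resolution timer
• High-resolution timer
• Gap packet
There is room for improvement in CPU load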
Outline
• Packet pacing
• Packet pacing mechanism using a high-resolution timer
• Evaluation experiments
• Summary
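Summary
• Proposed and evaluated a packet pacing mechanism using a high-resolution timer
– Resolves the limitations of the gap packet approach
– Precise pacing is possible even in a 10GbE environment
– CPU load is somewhat high with multiple streams and with high target bandwidth
• Future work
– Reducing CPU load when communicating over multiple streams
– Improving the sending rate estimation and related parts of HTB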
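Thank you for your attention
PSPacer/HT is released under the GNU GPL license
http://www.gridmpi.org/pspacer.jsp
Part of this work was supported by a MEXT Grant-in-Aid for Scientific Research (20800083), and makes use of results from the "Green Network/System Technology Research and Development Project (Green IT Project)" commissioned by the New Energy and Industrial Technology Development Organization (NEDO).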
