HPC
         TCP/IP




  May 29, 2009, SACSIS2009
(2)



            HPC      TCP/IP



• Example
    • Link bandwidth: 1 Gbps
        RTT: 100 ms

        • Untuned: 240 Mbps ☹

        • Tuned: 940 Mbps ☺
(2)



                                                HPC                                                        TCP/IP


                          • Ethernet                                    PC


                               •              TCP/IP
                                          2                       10MB
[Figure: two plots of Bandwidth (Mbps, 0 to 1000) versus Time (msec, 0 to 600); annotations: "160" and "x3.5"]
(3)




•                      TCP/IP


    •
    • MPI

• TCP/IP    TCP

 • TCP      Reno [RFC2581]

 •          TCP   CUBIC   Compound TCP



(4)




• TCP/IP
• TCP/IP
• TCP
• TCP/IP MPI
•
• TCP/IP
•

(5)




TCP/IP
(6)



TCP (Transmission Control Protocol)

 •


 •



(7)



        TCP

•             •
    •             •
    •             •
    • SACK            •
                      •
                      • ACK


(8)



        TCP
        •
            •
            •
                ACK        ACK

        •                                  ACK

[Figure: the sender transmits segments #1 to #5 and the receiver returns ACKs #2 to #6; horizontal axis: RTT (Round Trip Time)]
(9)




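• Retransmission timeout (RTO)

    • Retransmit when no ACK arrives within RTT + α

• Fast retransmit
    • Retransmit after 3 duplicate ACKs

    • Recovery takes about 1 RTT, whereas a timeout costs RTO + α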

[Figure: segments #1 to #5 in flight, with ACKs #2, #3, #3, #3 returning (duplicate ACKs); horizontal axis: RTT (Round Trip Time)]
(10)



TCP
•
• ACK           1 RTT


                                        RTT


[Figure: a window of segments (#1 to #8) in flight, with ACKs #2 to #5 returning; horizontal axis: RTT (Round Trip Time)]
(11)



TCP
•
    •
•
    •




(13)




•
➡               rwin

 •        ACK

 •                            rwin

[Figure: segments (#1 to #8) in flight and ACKs (#2 to #5) returning over one RTT (Round Trip Time), with the receiver's rwin marked]
(-)




     •

     •
     •
     ➔

[Figure: a packet train crossing a link of bandwidth B in time T, and crossing a link of bandwidth B/2 in time 2T]
(14)




•
➡               cwnd

  •
  •             cwnd


[Figure: segments (#1 to #8) in flight and ACKs (#2 to #5) returning over one RTT (Round Trip Time), with the sender's cwnd and the receiver's rwin marked]
(15)




      •
      •        cwnd

          •2
[Figure: cwnd versus time, with the levels w and w/2 and ssthresh marked]
(16)




      •              cwnd


      •       cwnd

          •                 →
[Figure: cwnd versus time, with the levels w and w/2 and ssthresh marked]
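The cwnd growth pictured on the two slides above can be sketched in a few lines of Python. This is a simplified model, not kernel code: cwnd is counted in segments, it doubles each RTT while below ssthresh and grows by one segment per RTT afterwards, and no loss occurs. The function name and the initial value of 1 segment are assumptions made for illustration.

```python
def cwnd_trace(ssthresh, rtts, initial_cwnd=1):
    """Return cwnd (in segments) at the start of each RTT, assuming no loss."""
    cwnd = initial_cwnd
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd < ssthresh:
            # Slow start: double per RTT, but do not overshoot ssthresh.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: one more segment per RTT.
            cwnd += 1
    return trace


trace = cwnd_trace(ssthresh=16, rtts=8)
print(trace)
```

The exponential phase ends after only a few RTTs; everything beyond ssthresh grows linearly, which is why large windows take long to reach.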
(12)




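• Flow control: rwin, advertised by the receiver in the TCP header of each ACK

• Congestion control: cwnd, maintained by the sender

‣ Window size

     =min(cwnd, rwin)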
[Figure: segments (#1 to #8) in flight and ACKs (#2 to #5) returning over one RTT (Round Trip Time), with cwnd and rwin marked]
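A small sketch of how the window bounds throughput, assuming the sender can have at most one window of data outstanding per RTT. The function names and the sample values (a 12.5 MB cwnd and a 3 MB rwin) are hypothetical and chosen only to make the arithmetic visible.

```python
def effective_window(cwnd, rwin):
    """The sender may have at most min(cwnd, rwin) bytes outstanding."""
    return min(cwnd, rwin)


def throughput_ceiling_bps(window_bytes, rtt_ms):
    """Upper bound on throughput: one window of data per RTT."""
    return window_bytes * 8 * 1000 // rtt_ms


# Hypothetical values: a large cwnd, but a receiver window of only 3 MB.
window = effective_window(cwnd=12_500_000, rwin=3_000_000)
ceiling = throughput_ceiling_bps(window, rtt_ms=100)
print(window, ceiling)
```

With these numbers the smaller of the two windows (3 MB) caps the flow at 240 Mbit/s on a 100 ms path, whatever the link speed is.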
(17)



         ACK
 •
          RTT

     •


ACK




ACK




(17)



        ACK
•
           RTT

    •
• ACK
 •
(1) data



                 (2) ACK

(34)




TCP/IP
(35)



                                          TCP
• With an RTT of 100 ms, throughput reaches only 240 Mbps

  Bandwidth: 1 Gbps
  RTT: 100 ms

  [Figure: sender S and receiver R, each attached by GbE]
(36)




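• Bandwidth-delay product = bandwidth × delay

    • At 1 Gbps with a one-way delay of 50 ms
            (RTT 100 ms), 6.25MB is in flight


        Bandwidth: 1 Gbps, one-way delay: 50 ms
        1 Gbps x 50 ms = 50 Mbit
                       = 6.25 MB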
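The arithmetic above can be checked with a short Python sketch. The function name is an assumption; integer units (bits per second, milliseconds) are used so that the results are exact.

```python
def bdp_bytes(bandwidth_bps, delay_ms):
    """Bandwidth-delay product in bytes."""
    return bandwidth_bps * delay_ms // 1000 // 8


one_way = bdp_bytes(1_000_000_000, 50)      # one-way delay of 50 ms
round_trip = bdp_bytes(1_000_000_000, 100)  # full RTT of 100 ms
print(one_way, round_trip)
```

The one-way figure is the 6.25 MB shown on the slide; using the full RTT doubles it to 12.5 MB.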



               2
(37)




• Required socket buffer size = bandwidth x RTT

    • 1 Gbps × 50 ms x 2 = 12.5 MB

    •
•                                                 MB

[Figure: 6.25MB + 6.25MB = 12.5MB over one RTT (Round Trip Time); annotation: out-of-order]
(-)



        TCP

•
    ‣ sysctl
•
    ‣          API




(39)



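sysctl parameters under net.*
    sysctl                              description                                              default
    net.core.rmem_default               default receive socket buffer size                       109568
    net.core.rmem_max                   maximum receive socket buffer size                       131071
    net.core.wmem_default               default send socket buffer size                          109568
    net.core.wmem_max                   maximum send socket buffer size                          131071
    net.core.netdev_max_backlog         maximum length of the receive backlog queue (packets)    1000
    net.ipv4.tcp_rmem                   TCP receive buffer size [min, default, max]              4096, 87380, 4194304
    net.ipv4.tcp_wmem                   TCP send buffer size [min, default, max]                 4096, 16384, 4194304
    net.ipv4.tcp_mem                    TCP memory usage thresholds [low, pressure, high]        98304, 131072, 196608
    net.ipv4.tcp_no_metrics_save        do not cache per-connection metrics                      0
    net.ipv4.tcp_sack                   enable SACK                                              1
    net.ipv4.tcp_congestion_control     TCP congestion control algorithm                         cubic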

※                                                                               1GB
(38)



              sysctl
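• Read and set a parameter with sysctl
       $ sysctl net.core.wmem_max
       net.core.wmem_max = 131071

       # sysctl -w net.core.wmem_max=1048576
       net.core.wmem_max=1048576


• On Linux, the same parameters are available through procfs
       $ cat /proc/sys/net/core/wmem_max
       131071

       # echo 1048576 > /proc/sys/net/core/wmem_max


• Other useful invocations
       # sysctl -p              # load settings from /etc/sysctl.conf
       $ sysctl -a | grep tcp   # list TCP-related parameters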
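As a sketch of how the sizing rule (buffer = bandwidth x RTT) could be turned into /etc/sysctl.conf entries, the Python helper below prints candidate lines. The helper name and the policy of raising only the maximum values are assumptions; the min and default fields are simply copied from the defaults table, and suitable values depend on the host's memory.

```python
def sysctl_lines(bandwidth_bps, rtt_ms):
    """Return sysctl.conf lines sized to bandwidth x RTT (in bytes)."""
    buf = bandwidth_bps * rtt_ms // 1000 // 8
    return [
        f"net.core.rmem_max = {buf}",
        f"net.core.wmem_max = {buf}",
        # min and default are kept at the values from the defaults table.
        f"net.ipv4.tcp_rmem = 4096 87380 {buf}",
        f"net.ipv4.tcp_wmem = 4096 16384 {buf}",
    ]


lines = sysctl_lines(1_000_000_000, 100)  # 1 Gbps, RTT 100 ms
print("\n".join(lines))
```

The printed lines are meant to be reviewed before they are placed in /etc/sysctl.conf and loaded with sysctl -p.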
(41)




   •       Linux


   •                            tcp_wmem
       tcp_rmem

       •               cwnd       4MB

TCP send buffer size (net.ipv4.tcp_wmem)

       min               default                        max
       tcp_wmem[0]       tcp_wmem[1]                    tcp_wmem[2]
       (4KB)             (16KB)                         (4MB)

       (the size follows cwnd between default and max)
(40)



                                           API
•
• setsockopt(2)                  getsockopt(2)
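  #include <sys/socket.h>

  int ssiz = 1048576; /* requested send buffer size in bytes (example value) */
  int cc = setsockopt(so, SOL_SOCKET, SO_SNDBUF, &ssiz, sizeof(ssiz));

  int csiz; /* send buffer size actually set by the kernel */
  socklen_t optlen = sizeof(csiz);
  cc = getsockopt(so, SOL_SOCKET, SO_SNDBUF, &csiz, &optlen);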



     SOL_SOCKET          SO_SNDBUF
     SOL_SOCKET          SO_RCVBUF

• SO_SNDBUF
• SO_SNDBUF is capped at net.core.wmem_max
(42)



    setsockopt caveats

• Call it before listen() or connect()
• Linux sets twice the requested value

 • See man tcp(7) for the TCP details




                  Iperf
  $ iperf -w 1m -c 192.168.0.2
  ------------------------------------------------------------
  Client connecting to 192.168.0.2, TCP port 5001
  TCP window size: 2.00 MByte (WARNING: requested 1.00 MByte)
  ------------------------------------------------------------
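The same request-and-read-back pattern can be tried from Python's socket module. This is a minimal sketch: the 64 KiB request is an arbitrary example value, and the size reported back depends on the operating system and on net.core.wmem_max, so no particular result is assumed.

```python
import socket


def request_send_buffer(sock, size_bytes):
    """Ask for a send buffer size and return what the kernel reports back."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size_bytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


# Set the option before connect() or listen(), as noted above.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
requested = 65536  # arbitrary example value
granted = request_send_buffer(sock, requested)
print(f"requested {requested} bytes, kernel reports {granted} bytes")
sock.close()
```

On Linux the reported value is typically double the request, which matches the Iperf warning shown above.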

(43)




      •
      •
[Figure: Bandwidth (Mbps, 0 to 1000) versus Time (Second, 0 to 60) for sndbuf=12.5MB. Conditions: bandwidth 1 Gbps, RTT 100 ms, TCP: CUBIC]
(44)




•                                                      QoS
    NIC

•
[Figure: Bandwidth (Mbps, 0 to 1000) versus Time (Second, 0 to 60), comparing "sndbuf=12.5MB txqueuelen=10000" with "sndbuf=12.5MB"]
NIC: Network Interface Card
(45)




•                                                      tc
    $ tc -s qdisc show dev eth0
    qdisc pfifo_fast 0: root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
     Sent 41825726768 bytes 1383455 pkt (dropped 72, overlimits 0 requeues 25164)
     rate 0bit 0pps backlog 0b 0p requeues 25164


• ifconfig
    $ ifconfig eth0
    eth0    Link encap:Ethernet HWaddr 00:22:64:10:cc:9d
          inet addr:192.168.0.1 Bcast:192.168.0.255 Mask:255.255.255.0
          inet6 addr: fe80::222:64ff:fe10:cc9d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:795673 errors:0 dropped:0 overruns:0 frame:0
          TX packets:163559 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:72212519 (72.2 MB) TX bytes:29126498 (29.1 MB)
          Interrupt:17
    $ ifconfig eth0 txqueuelen 10000
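To watch the drop counter over time, the statistics line printed by tc can be parsed. The sketch below is illustrative only: the regular expression is written against the exact line shown above, and other tc versions may format it differently.

```python
import re

# Pattern for the "Sent ..." statistics line, as it appears on this slide.
TC_STATS = re.compile(
    r"Sent (\d+) bytes (\d+) pkt "
    r"\(dropped (\d+), overlimits (\d+) requeues (\d+)\)"
)


def parse_tc_stats(text):
    """Return the qdisc counters as a dict, or None if the line is absent."""
    match = TC_STATS.search(text)
    if match is None:
        return None
    keys = ("bytes", "packets", "dropped", "overlimits", "requeues")
    return dict(zip(keys, map(int, match.groups())))


sample = " Sent 41825726768 bytes 1383455 pkt (dropped 72, overlimits 0 requeues 25164)"
stats = parse_tc_stats(sample)
print(stats)
```

A dropped count that keeps growing during a transfer suggests that the interface queue is overflowing.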



(71)



                                              netstat -s
            •
            •                         Linux                 net-tools

$ netstat -s
[snip]
Tcp:
   49 active connections openings
   4925 passive connection openings
   56 failed connection attempts
   12 connection resets received
   1 connections established
   226842 segments received
   161944 segments send out
   210 segments retransmited
   1 bad segments received.
   575 resets sent
[snip]
TcpExt:
   55 resets received for embryonic SYN_RECV sockets
   108 TCP sockets finished time wait in fast timer
   10969 delayed acks sent
   Quick ack mode was activated 67 times
   7961 packets directly queued to recvmsg prequeue.
   43 bytes directly received in process context from prequeue
   69443 packet headers predicted
   11162 acknowledgments not containing data payload received
   110524 predicted acknowledgments
   4 times recovered from packet loss by selective acknowledgements
   37 congestion windows recovered without slow start after partial ack
   4 TCP data loss events
   6 timeouts after SACK recovery
   1 timeouts in loss state
   5 fast retransmits
   5 retransmits in slow start
   167 other TCP timeouts
[snip]
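One quick way to read these counters is the share of sent segments that were retransmitted. The sketch below pulls the two relevant numbers out of netstat -s text; the helper name is an assumption, and the labels (including netstat's own spelling "retransmited") are copied from the output above.

```python
import re


def tcp_counter(text, label):
    """Return the integer that precedes `label` in netstat -s output."""
    match = re.search(r"(\d+) " + re.escape(label), text)
    return int(match.group(1)) if match else None


sample = """
   226842 segments received
   161944 segments send out
   210 segments retransmited
"""
sent = tcp_counter(sample, "segments send out")
retransmitted = tcp_counter(sample, "segments retransmited")
ratio = retransmitted / sent
print(f"retransmission ratio: {ratio:.2%}")
```

For the counters on this slide the ratio works out to about 0.13%.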


(46)




TCP
(47)




•       TCP                               cwnd


    •              cwnd

    •
                 1Gbps, RTT 200ms, MTU 1500B
[Figure: cwnd versus Time; cwnd falls from 16000 to 8000 and needs 8000 RTTs (≒ 27 minutes) to climb back]
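The numbers on this slide can be reproduced with a short calculation, assuming Reno-style behaviour in which cwnd is halved on a loss and then grows by one packet per RTT. The function names are illustrative; the slide rounds the window of about 16666 packets down to 16000.

```python
def bdp_packets(bandwidth_bps, rtt_ms, mtu_bytes):
    """Number of MTU-sized packets needed to fill the pipe."""
    return bandwidth_bps * rtt_ms // 1000 // (mtu_bytes * 8)


def recovery_rtts(cwnd_packets):
    """RTTs needed to regain the window after it is halved (+1 per RTT)."""
    return cwnd_packets - cwnd_packets // 2


full_window = bdp_packets(1_000_000_000, 200, 1500)
rtts = recovery_rtts(16000)          # the slide's rounded window
recovery_seconds = rtts * 200 / 1000
print(full_window, rtts, recovery_seconds / 60)
```

At 200 ms per RTT, 8000 RTTs come to 1600 seconds, a little under 27 minutes.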
(48)




•
    • PSockets   GridFTP (BitTorrent Swarming)

•
    •
        •                 TCP           TCP Friendly

        •
    •
        •
(50)




•
•

•                        Inter-protocol fairness

    • TCP Friendliness      TCP

•                        Intra-protocol fairness

    • RTT         RTT

     • Reno   RTT

(49)



TCP variant         Notes
NewReno             RFC 2582
HighSpeed TCP       RFC 3649
Scalable TCP
BIC
CUBIC               Linux
H-TCP
Vegas
Westwood+
FAST
Illinois
YeAH
Compound TCP        Windows Vista

                                       ≒
                ≒
                        RTT2 > RTT1



(51)



                       CUBIC
•
            cwnd     Wmax


• RTT                    cubic
                                     cwnd

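[Figure: cwnd versus Time between Wmin and Wmax, with a "Steady state behavior" region and a "Probing" region]

$$W_{\mathrm{cubic}} = C\left(T - \sqrt[3]{\frac{W_{\max}\,\beta}{C}}\right)^{3} + W_{\max}$$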
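The window function on this slide can be evaluated directly. The sketch below is not the Linux implementation; the constants C = 0.4 and β = 0.2 are assumptions taken from the CUBIC paper's defaults, with T measured in seconds since the last loss and windows in packets.

```python
def cubic_window(t, w_max, c=0.4, beta=0.2):
    """W_cubic = C * (T - cbrt(W_max * beta / C))**3 + W_max."""
    k = (w_max * beta / c) ** (1.0 / 3.0)  # time at which W_max is regained
    return c * (t - k) ** 3 + w_max


w_max = 100.0
k = (w_max * 0.2 / 0.4) ** (1.0 / 3.0)
for t in (0.0, k / 2, k, 2 * k):
    print(f"T={t:5.2f}  W={cubic_window(t, w_max):7.2f}")
```

At T = 0 the window starts at W_max(1 - β); it flattens out around W_max (the steady state region) and then grows again beyond it (the probing region).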
(52)



             Compound TCP

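• Adopted in Windows Vista
• Window size = cwnd + dwnd
    • loss-based component (cwnd): Reno
    • delay-based component (dwnd): driven by RTT
    • With dwnd at 0, it behaves like Reno

[Figure: window size over a loss free period, built from the loss-based (cwnd) and delay-based (dwnd) components]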
(53)



                        TCP
•                 TCP

    •
    •
        •
    •
        •
•
•           HPC           TCP

http://code.google.com/p/pspacer/wiki/DonkanTcp
(54)




    • RTT
    •
    •
        →
[Figure: sender S and receiver R, each attached by GbE, on a path with RTT: 200 and BW: 500 Mbps; link labels: 500Mbps, 1Gbps]
(55)




• Limited slow start [RFC 3742]
 •   While max_ssthresh < cwnd <= ssthresh, cwnd grows by at most
     max_ssthresh/2 per RTT

• Swift Start
 •   Packet-pair probing
     ACK

• HyStart (Hybrid slow start)
 •                                 RTT


 •     Adopted by CUBIC in Linux 2.6.29
(18)




TCP/IP MPI
(19)



MPI                      TCP

 • MPI                  TCP
          e.g.

  •
  •
      •



                 Time          Time
(20)



[Figure: two PC clusters of 8 PCs each (Node0 to Node7 and Node8 to Node15), each behind a Catalyst 3750 switch, connected over a WAN through GtrcNET-1]
                                   • CPU: Pentium4/2.4GHz
                                   • Memory: DDR400 512MB
                                   • NIC: Intel PRO/1000 (82547EI)
                                   • OS: Linux-2.6.9-1.6 (Fedora Core 2)
                                   • Socket buffer: 20MB
(-)




[Figure: two PC clusters of 8 PCs each (Node0 to Node7 and Node8 to Node15), each behind a Catalyst 3750 switch, connected over a WAN through GtrcNET-1; annotations: "2", "10MB", "1"]
RTT: 20 ms
Bandwidth: 500 Mbps

                                             • CPU: Pentium4/2.4GHz
                                             • Memory: DDR400 512MB
                                             • NIC: Intel PRO/1000 (82547EI)
                                             • OS: Linux-2.6.9-1.6 (Fedora Core 2)
                                             • Socket buffer: 20MB
(21)




[Figure: Bandwidth (Mbps, 0 to 1000) versus Time (sec, 0 to 10); annotations: "2", "10MB"]
[Figure: Bandwidth (Mbps, 0 to 1000) versus Time (msec, 0 to 600); annotations: "160", "x3.5"]
• 10MB at 500Mbps: 160 msec
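The 160 msec figure follows from the message size and the bottleneck bandwidth. A small sketch of that arithmetic, assuming 10MB means 10^7 bytes and taking the "x3.5" annotation as the measured slowdown:

```python
def transfer_time_ms(size_bytes, bandwidth_bps):
    """Ideal time to push size_bytes through a link, ignoring latency."""
    return size_bytes * 8 * 1000 / bandwidth_bps


ideal_ms = transfer_time_ms(10_000_000, 500_000_000)
observed_ms = ideal_ms * 3.5  # the "x3.5" annotation in the figure
print(ideal_ms, observed_ms)
```

The ideal time is 160 msec, so a 3.5-fold slowdown corresponds to roughly 560 msec per exchange.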
(22)




       •
           •   Linux reduces cwnd once a connection has been idle for an RTO [RFC 2861]

       • MPI
        • HTTP/1.1
       • Since kernel 2.6.18, this can be turned off with
                       net.ipv4.tcp_slow_start_after_idle=0


[Figure: cwnd versus time, with ssthresh marked]
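The effect of an idle period on cwnd can be illustrated with a toy model. This is an assumption-laden sketch and not the kernel's code: it halves cwnd once for every RTO of idle time beyond the first, in the spirit of RFC 2861, and never goes below a restart window. All names and sample values are hypothetical.

```python
def cwnd_after_idle(cwnd, idle_ms, rto_ms, restart_cwnd,
                    slow_start_after_idle=True):
    """Model cwnd (in segments) after the connection was idle for idle_ms."""
    if not slow_start_after_idle or idle_ms <= rto_ms:
        return cwnd  # feature disabled, or not idle long enough
    remaining = idle_ms - rto_ms
    while remaining > 0 and cwnd > restart_cwnd:
        cwnd //= 2           # one halving per RTO of idle time
        remaining -= rto_ms
    return max(cwnd, restart_cwnd)


print(cwnd_after_idle(64, idle_ms=600, rto_ms=200, restart_cwnd=3))
print(cwnd_after_idle(64, idle_ms=600, rto_ms=200, restart_cwnd=3,
                      slow_start_after_idle=False))
```

A bulk-synchronous MPI program that computes for a while between exchanges would hit the first case at every communication phase, which is why the sysctl above matters for it.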
(23)




•
    •
    • cwnd                                    ACK
                                        →

[Figure: two bursts of cwnd segments within one RTT (Round Trip Time), followed by the returning ACKs]
(24)




• ACK

 •   ACK                        RTT

 •                RTT cwnd

[Figure: cwnd segments spread over one RTT (Round Trip Time) by the returning ACKs]
(25)




[Figure: two plots of Bandwidth (Mbps, 0 to 1000) versus Time (msec, 0 to 600); annotations: "2", "10MB", "cwnd"]
(-)




[Figure: two PC clusters of 8 PCs each (Node0 to Node7 and Node8 to Node15), each behind a Catalyst 3750 switch, connected over a WAN through GtrcNET-1; annotation: "16"]
RTT: 0 ms
Bandwidth: 1 Gbps

                                   • CPU: Pentium4/2.4GHz
                                   • Memory: DDR400 512MB
                                   • NIC: Intel PRO/1000 (82547EI)
                                   • OS: Linux-2.6.9-1.6 (Fedora Core 2)
                                   • Socket buffer: 20MB
(29)



                                         All-to-all
      •
      • MPI
      •
          process   data (before)     data (after)
          A         A0 A1 A2 A3       A0 B0 C0 D0
          B         B0 B1 B2 B3       A1 B1 C1 D1
          C         C0 C1 C2 C3       A2 B2 C2 D2
          D         D0 D1 D2 D3       A3 B3 C3 D3

[Figure: processes A, B, C, D exchanging data in steps 1, 2, 3]
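The data movement in the table is a transpose of the block matrix: block j of process i ends up as block i of process j. A minimal Python sketch of that view, with no actual communication and an illustrative function name:

```python
def all_to_all(send):
    """send[i][j] is the block process i sends to process j.

    Returns recv, where recv[j][i] == send[i][j].
    """
    return [list(column) for column in zip(*send)]


send = [
    ["A0", "A1", "A2", "A3"],  # process A
    ["B0", "B1", "B2", "B3"],  # process B
    ["C0", "C1", "C2", "C3"],  # process C
    ["D0", "D1", "D2", "D3"],  # process D
]
recv = all_to_all(send)
for name, row in zip("ABCD", recv):
    print(name, row)
```

Every process both sends to and receives from every other process, so with N processes there are N(N-1) transfers competing for the network.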
(28)




[Figure: Bandwidth (Mbps, 0 to 250) versus Message size (KB, 0 to 1000); annotations: "230Mbps", "32 KB"]
[Figure: Bandwidth (Mbps, 0 to 1000) versus Time (msec, 0 to 2000); annotations: "32KB", "200"]
(30)



                        RTO
    •           2                             RTO




    • TCP
[Figure: all-to-all among processes A, B, C, D in steps 1, 2, 3, drawn for two cases (1 and 2); case 1 is annotated with RTO and "A C"]
(31)



           RTO
• RTO = RTT + α
 •α →
 •α →
• Even when RTT is close to 0, RTO is at least 200 msec
      RTO = SRTT + max(G, 4 x RTTVAR)
      SRTT = (1 - β) x SRTT + β x RTT
      RTTVAR = (1 - γ) x RTTVAR + γ x |SRTT - RTT|
      β = 1/8, γ = 1/4, G = 200 msec
      (SRTT: Smoothed Round Trip Time)

 RFC2988     G 1     RTT                       500
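The estimator above can be written out in a few lines of Python to see why the floor G dominates on a low-latency cluster. This is a sketch using the slide's symbols (β, γ, G); the class name is an assumption, and the initialisation of SRTT and RTTVAR from the first sample follows RFC 2988.

```python
class RtoEstimator:
    """RTO = SRTT + max(G, 4 x RTTVAR), times in seconds."""

    def __init__(self, g=0.2, beta=1 / 8, gamma=1 / 4):
        self.g, self.beta, self.gamma = g, beta, gamma
        self.srtt = None
        self.rttvar = None

    def update(self, rtt):
        if self.srtt is None:
            # First sample: initial values as in RFC 2988.
            self.srtt, self.rttvar = rtt, rtt / 2
        else:
            # RTTVAR is updated with the old SRTT, then SRTT itself.
            self.rttvar = ((1 - self.gamma) * self.rttvar
                           + self.gamma * abs(self.srtt - rtt))
            self.srtt = (1 - self.beta) * self.srtt + self.beta * rtt
        return self.srtt + max(self.g, 4 * self.rttvar)


estimator = RtoEstimator()
for _ in range(100):
    rto = estimator.update(0.0001)  # a 0.1 msec RTT inside a cluster
print(f"RTO = {rto * 1000:.1f} msec")
```

Even with a 0.1 msec RTT the RTO stays at about 200 msec, so a single timeout stalls the connection for roughly two thousand RTTs.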
(32)



                                                    RTO
RTO = SRTT + max(G, 4 x RTTVAR)
• G = 200
• G = 0
    • (kernel 2.6.23 ...) ip route

[Figure: Bandwidth (Mbps, 0 to 250) versus Message size (KB, 0 to 1000), comparing "w/o RTO fix" with "w/ RTO fix"; annotation: "32 KB"]
[Figure: two plots of Bandwidth (Mbps, 0 to 1000) versus Time (msec, 0 to 2000), each titled with RTO]
(-)




[Figure: Bandwidth (Mbps, 0 to 250) versus Message size (KB, 0 to 1000), comparing "w/o RTO fix" with "w/ RTO fix"; annotations: "32 KB", "RTO"]
(33)



      MPI   TCP/IP



            ACK
RTO



            cwnd

            MPI      TCP           ,
                           ACS11 , 2005
                                         61
(65)
(57)




  63
(58)


    PSPacer


•

• Linux                          OS


•                GPL


    http://www.gridmpi.org/pspacer.jsp


      ,                                  ACS14   , 2006
                                                            64
(63)




•             500Mbps


•




    PSPacer             PSPacer   500Mbps
                                              65
(64)



                      TCP
      Sender                                     Receiver
 A                                                          C
                        1 Gbps, RTT 200 ms
 B                                                          D

                 Flow 1      Flow 2      Total
 w/o PSPacer     394 Mbps    141 Mbps    535 Mbps
 w/ PSPacer      490 Mbps    490 Mbps    980 Mbps

※Scalable TCP
                                                                  66
(62)



                  PSPacer               10GbE
                          100Mbps       Iperf 5
         GtrcNET-10
[Figure: (Gbps, 0 to 10) vs. (Gbps, 0 to 10); curves "PSPacer" and "PSPacer+HTB"]

PSPacer        10GbE

HTB: Hierarchical Token Bucket (Linux)

CPU: Quad-core Xeon (E5430) x 2
NIC: Myricom Myri-10G (PCIe x 8)
MTU: 9000 byte
Memory: 4GB DDR2-667
                                                                                              67
(66)




•

    •
• ip
        $ ip route show cached to 192.168.0.2
        192.168.0.2 from 192.168.0.1 dev eth1
           cache mtu 9000 rtt 187ms rttvar 175ms ssthresh 1024 cwnd 637
        advmss 8960 hoplimit 64


•
        # sysctl -w net.ipv4.tcp_no_metrics_save=1
        net.ipv4.tcp_no_metrics_save=1
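
A minimal sketch of pulling individual cached values (ssthresh, cwnd, rtt) out of the "ip route show cached" output above. The helper name cached_metric is hypothetical, and the sample text is copied from the slide rather than queried from a live host.

```shell
# cached_metric NAME: read "ip route show cached" output on stdin and print
# the value that follows the keyword NAME (e.g. cwnd, ssthresh, rtt).
cached_metric() {
    awk -v name="$1" '{
        for (i = 1; i < NF; i++)
            if ($i == name) { print $(i + 1); exit }
    }'
}

# Sample output taken from the slide. On a live host, pipe the real command:
#   ip route show cached to 192.168.0.2 | cached_metric cwnd
sample='192.168.0.2 from 192.168.0.1 dev eth1
   cache mtu 9000 rtt 187ms rttvar 175ms ssthresh 1024 cwnd 637
advmss 8960 hoplimit 64'

printf '%s\n' "$sample" | cached_metric ssthresh
printf '%s\n' "$sample" | cached_metric cwnd
```

If a stale ssthresh or cwnd shows up here, a new connection to the same destination may start from those values, which is the case tcp_no_metrics_save=1 is meant to avoid.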


                                                                            68
(67)




•
    •
        GbE MTU 1500B → 949.3 Mbps

        GbE MTU 9000B → 991.4 Mbps
• “ping -M do” sets the IP “Don’t fragment” bit on
           $ ping -M do -s 9000 192.168.0.2                                  NG
           PING 192.168.0.2 (192.168.0.2) 9000(9028) bytes of data.
           From 192.168.0.1 icmp_seq=1 Frag needed and DF set (mtu = 9000)
           From 192.168.0.1 icmp_seq=1 Frag needed and DF set (mtu = 9000)
           From 192.168.0.1 icmp_seq=1 Frag needed and DF set (mtu = 9000)

           $ ping -M do -s 8972 192.168.0.2                                  OK
           PING 192.168.0.2 (192.168.0.2) 8972(9000) bytes of data.
           8980 bytes from 192.168.0.2: icmp_seq=1 ttl=64 time=3.76 ms
           8980 bytes from 192.168.0.2: icmp_seq=2 ttl=64 time=0.127 ms
           8980 bytes from 192.168.0.2: icmp_seq=3 ttl=64 time=0.133 ms
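
The two GbE figures and the ping sizes above can be reproduced with per-packet overhead arithmetic. This is a sketch under stated assumptions, not the author's derivation: IPv4 and TCP headers without options (40 bytes), 38 bytes of Ethernet framing per packet on the wire (preamble 8, header 14, FCS 4, inter-frame gap 12), and an IPv4 plus ICMP header of 28 bytes. The function names are hypothetical.

```shell
# Per-packet overheads in bytes (assumptions of this sketch).
IP_TCP_HDR=40     # 20-byte IPv4 header + 20-byte TCP header, no options
ETH_OVERHEAD=38   # preamble/SFD 8 + Ethernet header 14 + FCS 4 + inter-frame gap 12
IP_ICMP_HDR=28    # 20-byte IPv4 header + 8-byte ICMP header

# tcp_goodput_mbps LINK_MBPS MTU: theoretical TCP payload rate on a saturated link.
tcp_goodput_mbps() {
    awk -v link="$1" -v mtu="$2" -v hdr="$IP_TCP_HDR" -v eth="$ETH_OVERHEAD" \
        'BEGIN { printf "%.1f\n", link * (mtu - hdr) / (mtu + eth) }'
}

# ping_payload MTU: largest "ping -s" size that fits in one unfragmented packet.
ping_payload() {
    echo $(( $1 - IP_ICMP_HDR ))
}

tcp_goodput_mbps 1000 1500   # prints 949.3
tcp_goodput_mbps 1000 9000   # prints 991.4
ping_payload 9000            # prints 8972
```

With the TCP timestamps option enabled, each segment carries 12 more header bytes, so the achievable rate would be slightly lower than these figures.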


                                                                                        69
(68)




•   Infiniband      Myrinet

        •

•                                        IEEE 802.3x

    •                                                  PAUSE


    •
            $ sudo ethtool -A eth0 tx on rx on

            $ sudo ethtool -a eth0
            Pause parameters for eth0:
            Autonegotiate:  off
            RX:             off
            TX:             off
                                                   Lossless Ethernet
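
A small sketch for checking the PAUSE state from a script by parsing "ethtool -a" output. The helper name pause_state is hypothetical, and the sample text mirrors the output on the slide instead of querying a real interface.

```shell
# pause_state DIRECTION: read "ethtool -a" output on stdin and print the
# setting ("on" or "off") reported for DIRECTION, which is RX or TX.
pause_state() {
    awk -v key="$1:" '$1 == key { print $2; exit }'
}

# Sample output as shown on the slide. On a live host, pipe the real command:
#   sudo ethtool -a eth0 | pause_state RX
sample='Pause parameters for eth0:
Autonegotiate:  off
RX:             off
TX:             off'

printf '%s\n' "$sample" | pause_state RX
printf '%s\n' "$sample" | pause_state TX
```

Both directions need to report "on", at the host and at the switch port, before PAUSE frames can throttle the sender.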
                                                                         70
(69)




TCP/IP
(70)



Rough consensus, running code
 •                                RFC

     •        Reno [RFC 2581]

 •
     • net/ipv4
             180

     •             regression


         •    BIC init_ssthresh    ABC

 •
     •
                                           72
(72)



                    Web100
•        Pittsburgh Supercomputing Center (PSC)


•                                       procfs


    • netstat -s                          [RFC 4898]

•
•

                                  http://www.web100.org/
                                                           73
(72)



                 TCP Probe
• KProbes            TCP



    •
            procfs

•
•



                               74
(73)



TCP Probe
                              cwnd ssthresh




   http://www.linuxfoundation.org/en/Net:TCP_testing
                                                       75
(76)




1.                                 ×RTT
                         4MB

2. setsockopt   listen   connect

3.


4.    TCP


5.
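
Item 1 sizes the socket buffer as bandwidth × RTT. A minimal sketch of that arithmetic follows, using the 1 Gbps link with 100 ms and 200 ms RTT that appear elsewhere in the slides. The function names are hypothetical, and reading "4MB" as 4 × 2^20 bytes is an assumption.

```shell
# bdp_bytes BANDWIDTH_MBPS RTT_MS: bandwidth-delay product in bytes,
# i.e. the socket buffer needed to keep the pipe full.
bdp_bytes() {
    echo $(( $1 * 1000000 / 8 * $2 / 1000 ))
}

# max_rate_mbps BUFFER_BYTES RTT_MS: upper bound on throughput (Mbps, rounded
# down) when at most BUFFER_BYTES can be in flight per round trip.
max_rate_mbps() {
    echo $(( $1 * 8 / $2 / 1000 ))
}

bdp_bytes 1000 100            # 1 Gbps, RTT 100 ms: prints 12500000
bdp_bytes 1000 200            # 1 Gbps, RTT 200 ms: prints 25000000
max_rate_mbps 4194304 100     # 4 MB buffer, RTT 100 ms: prints 335
```

The buffer has to be requested with setsockopt before listen or connect (item 2), because the TCP window scale factor is negotiated during the handshake.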




                                            76
(77)




1.
           tcp_slow_start_after_idle

2.


3.               RTO
     RTO

4.


5.


                                         77
TCP
              ARPANET                                  NCP
              TCP IP         (1970           )

                                                       TCP
                             RFC793
                            (standard)    TCP(v4)                IPv4
                   (1986)
    →                                Tahoe
                RFC2581
               (standard)      Reno
        RFC2582
     (experimental)     NewReno                          Vegas

              HSTCP
                                                         FAST
ScalableTCP                       HTCP
                      BIC
          CUBIC                          CompoundTCP         TCP
                                                                        79
RFC
                   Spec.                                  sysctl             default
                TCP (RFC 793)                                -                  -
          Window scaling (RFC 1323)                tcp_window_scaling          1
        Timestamps option (RFC 1323)                 tcp_timestamps            1
TIME-WAIT Assassination Hazards (RFC 1337)             tcp_rfc1337             0
            SACK (RFC 2018, 3517)                        tcp_sack              1
       Control block sharing (RFC 2140)            tcp_no_metrics_save         0
       MD5 signature option (RFC 2385)                       -                  -
         Initial window size (RFC 2414)                      -                  -
        Congestion control (RFC 2581)                        -                  -
             NewReno (RFC 3782)                              -                  -
          Cwnd validation (RFC 2861)             tcp_slow_start_after_idle     1
             D-SACK (RFC 2883)                          tcp_dsack              1
         Limited transmit (RFC 3042)                         -                  -
   Explicit congestion notification (RFC 3168)            tcp_ecn               0
     Appropriate Byte Counting (RFC 3465)                tcp_abc               0
         Limited slow start (RFC 3742)              tcp_max_ssthresh           0
               FRTO (RFC 4138)                           tcp_frto              0
                 Forward ACK                             tcp_fack              1
              Path MTU discovery                     tcp_mtu_probing           0
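
Each sysctl name in the table corresponds to a file under /proc/sys/net/ipv4/, so the values on a given host can be compared against the defaults listed here. The sketch below uses a hypothetical helper, read_sysctl, and prints "n/a" when an entry is missing, since some of these settings may not exist on kernels other than the one the slides describe.

```shell
# read_sysctl NAME [ROOT]: print a sysctl value by reading its procfs file.
# NAME uses dots (net.ipv4.tcp_sack); ROOT defaults to /proc/sys and is only
# overridden for testing. Prints "n/a" if the entry does not exist.
read_sysctl() {
    f="${2:-/proc/sys}/$(printf '%s' "$1" | tr . /)"
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo "n/a"
    fi
}

# Show a few of the settings from the table on the current host.
for name in tcp_window_scaling tcp_timestamps tcp_sack tcp_slow_start_after_idle; do
    printf '%s = %s\n' "$name" "$(read_sysctl "net.ipv4.$name")"
done
```

To change a value, use the same form as the earlier example, for instance "sysctl -w net.ipv4.tcp_slow_start_after_idle=0" as root.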
                                                                                       80
URL
    Project                               URL
  BIC/CUBIC        http://netsrv.csc.ncsu.edu/twiki/bin/view/Main/BIC
     FAST                   http://netlab.caltech.edu/FAST/
TCP Westwood+          http://c3lab.poliba.it/index.php/Westwood
  Scalable TCP         http://www.deneholme.net/tom/scalable/
 HighSpeed TCP            http://www.icir.org/floyd/hstcp.html
    H-TCP             http://www.hamilton.ie/net/htcp/index.htm
  TCP-Illinois       http://www.ews.uiuc.edu/~shaoliu/tcpillinois/
Compound TCP       http://research.microsoft.com/en-us/projects/ctcp/
   TCP Vegas         http://www.cs.arizona.edu/projects/protocols/
TCP Westwood             http://www.cs.ucla.edu/NRL/hpi/tcpw/
  Circuit TCP            http://www.ece.virginia.edu/cheetah/
   TCP Hybla                   https://hybla.deis.unibo.it/
TCP Low Priority      http://www.ece.rice.edu/networks/TCP-LP/
     UDT                     http://www.cs.uic.edu/~ygu1/
     XCP                 http://www.ana.lcs.mit.edu/dina/XCP/
    Web100                     http://www.web100.org/
  TCP Probe        http://www.linuxfoundation.org/en/Net:TcpProbe
    iproute2        http://www.linuxfoundation.org/en/Net:Iproute2
                                                                        81

Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

SACSIS2009_TCP.pdf

  • 1. HPC TCP/IP 2009-05-29 SACSIS2009
  • 2. (2) HPC TCP/IP • 1 Gbps, RTT 100 ms • 240 Mbps ☹ • 940 Mbps ☺
  • 3. (2) HPC TCP/IP • Ethernet PC • TCP/IP 2 10MB [Figure: two plots of Bandwidth (Mbps) vs. Time (msec), 0 to 600 msec; annotations "160" and "x3.5"]
  • 4. (3) • TCP/IP • MPI • TCP/IP TCP • TCP Reno [RFC 2581] • TCP CUBIC, Compound TCP
  • 5. (4) • TCP/IP • TCP/IP • TCP • TCP/IP MPI • TCP/IP
  • 7. (6) TCP (Transmission Control Protocol)
  • 8. (7) TCP • SACK • ACK
  • 9. (8) TCP • ACK [Figure: segments #1 to #5 in flight and the returning ACKs; axis: RTT (Round Trip Time)]
  • 10. (9) • RTO • RTT + α, ACK • ACK 3 • 1 RTT, RTO [Figure: segments #1 to #5 with ACKs #2, #3, #3, #3; axis: RTT (Round Trip Time)]
  • 11. (10) TCP • ACK 1 RTT [Figure: segments #1 to #9 in flight and the returning ACKs; axis: RTT (Round Trip Time)]
  • 12. (11) TCP
  • 13. (13) • ➡ rwin • ACK • rwin [Figure: segments in flight with rwin marked, and returning ACKs; axis: RTT (Round Trip Time)]
  • 14. (-) • ➔ B, B/2 [Figure: segments #1 to #5 over time T and segments #1 to #7 ... over time 2T]
  • 15. (14) • ➡ cwnd • cwnd [Figure: segments in flight with rwin and cwnd marked, and returning ACKs; axis: RTT (Round Trip Time)]
  • 16. (15) • cwnd • 2 [Figure: cwnd vs. time, with levels w, w/2, and ssthresh]
  • 17. (16) • cwnd • cwnd • → [Figure: cwnd vs. time, with levels w, w/2, and ssthresh]
  • 18. (12) • rwin ACK TCP • cwnd ‣ = min(cwnd, rwin) [Figure: segments in flight with rwin and cwnd marked, and returning ACKs; axis: RTT (Round Trip Time)]
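Since the sender can have at most min(cwnd, rwin) bytes unacknowledged per round trip, the effective window also caps throughput at roughly window / RTT. The Python sketch below illustrates that bound; the function name and the example numbers are assumptions chosen for illustration, not values taken from the slides.

```python
def max_throughput_bps(cwnd_bytes: int, rwin_bytes: int, rtt_ms: int) -> int:
    """Upper bound on throughput (bits/s) of a window-limited TCP flow."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    window = min(cwnd_bytes, rwin_bytes)  # effective send window
    return window * 8 * 1000 // rtt_ms    # at most one window of data per RTT


# Hypothetical example: rwin is the tighter limit (3 MB) on a 100 ms RTT path.
limit = max_throughput_bps(cwnd_bytes=10_000_000, rwin_bytes=3_000_000, rtt_ms=100)
print(f"{limit / 1e6:.0f} Mbps")
```

With these example inputs the bound is far below 1 Gbps, which is why a long-RTT path needs a large window to fill the link.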
  • 19. (17) ACK • RTT • ACK ACK
  • 20. (17) ACK • RTT • ACK • (1) data (2) ACK
  • 22. (35) TCP • RTT 100 ms: 240 Mbps [Figure: sender S and receiver R connected over GbE; 1 Gbps, RTT: 100 ms]
  • 23. (36) • × • 1 Gbps, 50 ms (RTT 100 ms): 6.25 MB [Figure: labels "50 ms" and "1 Gbps"; 1 Gbps x 50 ms = 50 Mbit = 6.25 MB]
  • 24. (37) • x RTT • 1 Gbps x 50 ms x 2 = 12.5 MB • MB [Figure: 6.25 MB + 6.25 MB = 12.5 MB over one RTT (Round Trip Time); label: out-of-order]
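The arithmetic on slides 23 and 24 (bandwidth x delay) can be checked with a few lines of Python. This is a minimal sketch; the helper name `bdp_bytes` is an assumption introduced here, and integer arithmetic is used to avoid rounding noise.

```python
def bdp_bytes(bandwidth_bps: int, delay_ms: int) -> int:
    """Bandwidth-delay product in bytes for a delay given in milliseconds."""
    return bandwidth_bps * delay_ms // 8000  # bits/s * ms -> bytes


one_way = bdp_bytes(1_000_000_000, 50)   # data in flight in one direction
per_rtt = bdp_bytes(1_000_000_000, 100)  # socket buffer needed to cover a full RTT
print(one_way, per_rtt)
```

The two values correspond to the 6.25 MB and 12.5 MB figures on the slides.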
  • 25. (-) TCP • ‣ sysctl • ‣ API
  • 26. (39) sysctl net.*
      sysctl                            description                default
      net.core.rmem_default                                        109568
      net.core.rmem_max                                            131071
      net.core.wmem_default                                        109568
      net.core.wmem_max                                            131071
      net.core.netdev_max_backlog                                  1000
      net.ipv4.tcp_rmem                 TCP [min, default, max]    4096, 87380, 4194304
      net.ipv4.tcp_wmem                 TCP [min, default, max]    4096, 16384, 4194304
      net.ipv4.tcp_mem                  [low, pressure, high]      98304, 131072, 196608
      net.ipv4.tcp_no_metrics_save                                 0
      net.ipv4.tcp_sack                 SACK                       1
      net.ipv4.tcp_congestion_control   TCP                        cubic
      ※ 1GB
  • 27. (38) sysctl
      $ sysctl net.core.wmem_max
      net.core.wmem_max = 131071
      # sysctl -w net.core.wmem_max=1048576
      net.core.wmem_max=1048576
    • Linux procfs
      $ cat /proc/sys/net/core/wmem_max
      131071
      # echo 1048576 > /proc/sys/net/core/wmem_max
    •
      $ sysctl -p              # /etc/sysctl.conf
      $ sysctl -a | grep tcp
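The sysctl and procfs views shown on slide 27 address the same kernel variables: dots in the sysctl name become path separators under /proc/sys. The sketch below shows that mapping in Python. The helper names are assumptions introduced here, the reader only handles integer-valued entries, and it returns None on systems without procfs.

```python
from pathlib import Path
from typing import List, Optional


def sysctl_path(name: str) -> Path:
    """Map a sysctl name such as net.core.wmem_max to its procfs path."""
    return Path("/proc/sys") / name.replace(".", "/")


def read_sysctl(name: str) -> Optional[List[int]]:
    """Read an integer-valued sysctl; None if it is missing or not numeric."""
    try:
        return [int(v) for v in sysctl_path(name).read_text().split()]
    except (OSError, ValueError):
        return None


print(sysctl_path("net.core.wmem_max").as_posix())
print(read_sysctl("net.ipv4.tcp_wmem"))  # [min, default, max] on Linux, else None
```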
  • 28. (41) • Linux • tcp_wmem, tcp_rmem • cwnd, 4MB [Figure: TCP buffer range with cwnd marked; min = net.ipv4.tcp_wmem[0] (4KB), default = tcp_wmem[1] (16KB), max = tcp_wmem[2] (4MB)] ※
  • 29. (40) API • setsockopt(2), getsockopt(2)
      int ssiz; /* send buffer size */
      int csiz; /* send buffer size reported back by the kernel */
      cc = setsockopt(so, SOL_SOCKET, SO_SNDBUF, &ssiz, sizeof(ssiz));
      socklen_t optlen = sizeof(csiz);
      cc = getsockopt(so, SOL_SOCKET, SO_SNDBUF, &csiz, &optlen);
    SOL_SOCKET SO_SNDBUF; SOL_SOCKET SO_RCVBUF • SO_SNDBUF • SO_SNDBUF (limit: net.core.wmem_max)
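The same setsockopt/getsockopt round trip can be tried from Python's standard `socket` module. This is a minimal sketch: the helper name and the requested size are assumptions introduced here. The value read back depends on the operating system and its limits (on Linux it is capped by net.core.wmem_max and reported doubled), so the example prints it rather than assuming a particular number.

```python
import socket


def set_send_buffer(sock: socket.socket, size_bytes: int) -> int:
    """Request a send buffer size and return what the kernel reports back."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size_bytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


# Set the buffer before connect()/listen() so it applies to the connection.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    requested = 1_048_576
    granted = set_send_buffer(s, requested)
    print(f"requested {requested} bytes, kernel reports {granted} bytes")
```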
  • 30. (42) setsockopt • listen, connect • Linux: 2 • man tcp(7) • TCP Iperf
      $ iperf -w 1m -c 192.168.0.2
      ------------------------------------------------------------
      Client connecting to 192.168.0.2, TCP port 5001
      TCP window size: 2.00 MByte (WARNING: requested 1.00 MByte)
      ------------------------------------------------------------
  • 31. (43) [Figure: Bandwidth (Mbps) vs. Time (Second), 0 to 60 s; conditions: 1 Gbps, RTT: 100 ms, TCP: CUBIC; series: sndbuf=12.5MB]
  • 32. (44) • QoS, NIC [Figure: Bandwidth (Mbps) vs. Time (Second), 0 to 60 s; series "sndbuf=12.5MB txqueuelen=10000" and "sndbuf=12.5MB"; annotation "NIC"] NIC: Network Interface Card
  • 33. (45) • tc
      $ tc -s qdisc show dev eth0
      qdisc pfifo_fast 0: root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
       Sent 41825726768 bytes 1383455 pkt (dropped 72, overlimits 0 requeues 25164)
       rate 0bit 0pps backlog 0b 0p requeues 25164
    • ifconfig
      $ ifconfig eth0
      eth0  Link encap:Ethernet HWaddr 00:22:64:10:cc:9d
            inet addr:192.168.0.1 Bcast:192.168.0.255 Mask:255.255.255.0
            inet6 addr: fe80::222:64ff:fe10:cc9d/64 Scope:Link
            UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
            RX packets:795673 errors:0 dropped:0 overruns:0 frame:0
            TX packets:163559 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:1000
            RX bytes:72212519 (72.2 MB) TX bytes:29126498 (29.1 MB)
            Interrupt:17
      $ ifconfig eth0 txqueuelen 10000
  • 34. (71) netstat -s • Linux net-tools
      $ netstat -s
      Tcp:
          49 active connections openings
          4925 passive connection openings
          56 failed connection attempts
          12 connection resets received
          1 connections established
          226842 segments received
          161944 segments send out
          210 segments retransmited
          1 bad segments received.
          575 resets sent
      TcpExt:
          [snip]
          55 resets received for embryonic SYN_RECV sockets
          108 TCP sockets finished time wait in fast timer
          10969 delayed acks sent
          Quick ack mode was activated 67 times
          7961 packets directly queued to recvmsg prequeue.
          43 bytes directly received in process context from prequeue
          69443 packet headers predicted
          11162 acknowledgments not containing data payload received
          110524 predicted acknowledgments
          4 times recovered from packet loss by selective acknowledgements
          37 congestion windows recovered without slow start after partial ack
          4 TCP data loss events
          [snip]
          6 timeouts after SACK recovery
          1 timeouts in loss state
          5 fast retransmits
          5 retransmits in slow start
          167 other TCP timeouts
          [snip]
  • 36. (47) • TCP cwnd • cwnd • 1 Gbps, RTT 200 ms, MTU 1500 B: cwnd 16000 → 8000; 8000 RTT ≒ 27 min [Figure: cwnd vs. Time]
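The recovery time quoted on slide 36 follows from Reno-style congestion avoidance, which halves cwnd on loss and then adds roughly one packet per RTT. A minimal Python sketch of that arithmetic, assuming exactly one packet of growth per RTT and no further losses (the function name is introduced here for illustration):

```python
def reno_recovery_seconds(cwnd_before_loss: int, rtt_ms: int) -> float:
    """Time for additive increase to regain the pre-loss cwnd (in packets)."""
    halved = cwnd_before_loss // 2           # multiplicative decrease on loss
    rtts_needed = cwnd_before_loss - halved  # +1 packet per RTT afterwards
    return rtts_needed * rtt_ms / 1000


seconds = reno_recovery_seconds(cwnd_before_loss=16000, rtt_ms=200)
print(f"{seconds:.0f} s, about {seconds / 60:.0f} min")
```

The 16000-packet window is the slide's rounded figure for a 1 Gbps path with a 200 ms RTT and 1500-byte packets.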
  • 37. (48) • PSockets, GridFTP (BitTorrent Swarming) • TCP, TCP Friendly
  • 38. (50) • Inter-protocol fairness • TCP Friendliness TCP • Intra-protocol fairness • RTT RTT • Reno RTT
  • 39. (49) TCP: NewReno (RFC 2582), HighSpeed TCP (RFC 3649), Scalable TCP, BIC, CUBIC (Linux), H-TCP, Vegas, Westwood+, FAST, Illinois, YeAH, Compound TCP (Windows Vista) • ≒ ≒ RTT2 > RTT1
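  • 40. (51) CUBIC • cwnd, Wmax • RTT, cubic [Figure: cwnd vs. Time, showing "Steady state behavior" and "Probing" regions around Wmax, with Wmin marked]
      $W_{\mathrm{cubic}} = C\left(T - \sqrt[3]{\frac{W_{\max}\,\beta}{C}}\right)^{3} + W_{\max}$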
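The CUBIC window curve on slide 40 can be evaluated directly. The Python sketch below is illustrative only: the defaults C = 0.4 and β = 0.2 are assumed from the original CUBIC paper and are not stated on the slide, and the function names are introduced here.

```python
def cubic_k(w_max: float, c: float = 0.4, beta: float = 0.2) -> float:
    """Time (s) after a loss at which the curve returns to w_max."""
    return (w_max * beta / c) ** (1.0 / 3.0)


def cubic_window(t: float, w_max: float, c: float = 0.4, beta: float = 0.2) -> float:
    """W_cubic(t) = C * (t - K)^3 + W_max, with t measured from the last loss."""
    k = cubic_k(w_max, c, beta)
    return c * (t - k) ** 3 + w_max


w_max = 1000.0
k = cubic_k(w_max)
for t in (0.0, k / 2, k, 1.5 * k):
    print(f"t={t:5.2f} s  cwnd={cubic_window(t, w_max):7.1f}")
```

Right after a loss (t = 0) the window starts at (1 - β) x Wmax, flattens as it approaches Wmax at t = K, then grows again to probe for more bandwidth. Growth depends on elapsed time, not on the number of RTTs.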
  • 41. (52) Compound TCP • Windows Vista • loss-based (cwnd): Reno + delay-based (dwnd) • Reno RTT • = cwnd + dwnd • dwnd 0: Reno [Figure label: loss free period]
  • 42. (53) TCP • TCP • HPC TCP http://code.google.com/p/pspacer/wiki/DonkanTcp
  • 43. (54) • RTT • → [Figure: sender S and receiver R, each attached by GbE; RTT: 200 ms, BW: 500 Mbps; labels "500Mbps" and "1Gbps"]
  • 44. (55) • Limited slow start [RFC 3742] • max_ssthresh < cwnd <= ssthresh: cwnd grows by at most max_ssthresh/2 per RTT • Swift Start • Packet-pair probing, ACK • HyStart (Hybrid slow start) • RTT • Linux 2.6.29 CUBIC
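The limited slow start rule on slide 44 can be written as a per-ACK increment. The following Python sketch follows the description in RFC 3742, with cwnd counted in segments instead of bytes; the function name is an assumption introduced here, and it is only meaningful while cwnd <= ssthresh.

```python
def slow_start_increment(cwnd: float, max_ssthresh: float) -> float:
    """Per-ACK cwnd increase (in segments) under limited slow start."""
    if max_ssthresh <= 0:
        raise ValueError("max_ssthresh must be positive")
    if cwnd <= max_ssthresh:
        return 1.0                        # ordinary slow start: +1 per ACK
    k = int(cwnd / (0.5 * max_ssthresh))  # scaling factor from RFC 3742
    return 1.0 / k                        # at most max_ssthresh/2 per RTT


cwnd, max_ssthresh = 200.0, 100.0
per_rtt_growth = cwnd * slow_start_increment(cwnd, max_ssthresh)  # ~cwnd ACKs per RTT
print(per_rtt_growth)
```

Above max_ssthresh the window no longer doubles every RTT, which limits the burst of losses that plain slow start can cause on paths with a large bandwidth-delay product.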
  • 46. (19) MPI TCP • MPI TCP e.g. [Figure: two timelines, each with a Time axis]
  • 47. (20) PC [Figure: two clusters of 8 PCs (Node0 to Node7 and Node8 to Node15), each behind a Catalyst 3750 switch, connected through GtrcNET-1 (WAN)] • CPU: Pentium4/2.4GHz • Memory: DDR400 512MB • NIC: Intel PRO/1000 (82547EI) • OS: Linux-2.6.9-1.6 (Fedora Core 2) • Socket buffer: 20MB
  • 48. (-) 2 10MB [Figure: two clusters of 8 PCs (Node0 to Node7 and Node8 to Node15), each behind a Catalyst 3750 switch, connected through GtrcNET-1 (WAN); RTT: 20 ms, 500 Mbps; label "1"] • CPU: Pentium4/2.4GHz • Memory: DDR400 512MB • NIC: Intel PRO/1000 (82547EI) • OS: Linux-2.6.9-1.6 (Fedora Core 2) • Socket buffer: 20MB
  • 49. (21) 2 10MB [Figure: Bandwidth (Mbps) vs. Time (sec), 0 to 10 sec] • 160 msec: 10MB, 500Mbps • x3.5 [Figure: Bandwidth (Mbps) vs. Time (msec), 0 to 600 msec; annotation "160"]
  • 50. (22) • Linux: RTO, cwnd [RFC 2861] • MPI • HTTP/1.1 • kernel 2.6.18: net.ipv4.tcp_slow_start_after_idle=0 [Figure: cwnd vs. time, with ssthresh marked]
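Slide 50 refers to congestion window validation (RFC 2861), which Linux applies unless net.ipv4.tcp_slow_start_after_idle is set to 0. The Python sketch below is a simplified reading of the RFC, not the kernel's exact code: it halves cwnd once for every full RTO spent idle, down to a restart window. The function name and the restart window of 2 segments are assumptions made for illustration.

```python
def cwnd_after_idle(cwnd: int, idle_ms: int, rto_ms: int, restart_window: int = 2) -> int:
    """cwnd (segments) after an idle period, halved once per full RTO."""
    if rto_ms <= 0:
        raise ValueError("rto_ms must be positive")
    for _ in range(idle_ms // rto_ms):
        if cwnd <= restart_window:
            break
        cwnd = max(cwnd // 2, restart_window)
    return cwnd


# A bursty sender that pauses 600 ms between messages, with RTO = 200 ms.
print(cwnd_after_idle(cwnd=1000, idle_ms=600, rto_ms=200))
```

For an application that alternates computation and communication, each pause longer than the RTO shrinks the window, so the next message has to ramp up again.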
  • 51. (23) • cwnd, ACK → cwnd [Figure: labels cwnd, ACK; axis: RTT (Round Trip Time)]
  • 52. (24) • ACK • ACK, RTT • RTT, cwnd [Figure: labels cwnd, RTT, ACK; axis: RTT (Round Trip Time)]
  • 53. (25) 2 10MB [Figure: two plots of Bandwidth (Mbps) vs. Time (msec), 0 to 600 msec] • cwnd
  • 54. (-) [Figure: two clusters of 8 PCs (Node0 to Node7 and Node8 to Node15), each behind a Catalyst 3750 switch, connected through GtrcNET-1 (WAN); RTT: 0 ms, 1 Gbps; label "16"] • CPU: Pentium4/2.4GHz • Memory: DDR400 512MB • NIC: Intel PRO/1000 (82547EI) • OS: Linux-2.6.9-1.6 (Fedora Core 2) • Socket buffer: 20MB
  • 55. (29) All-to-all • MPI
      process   data (before)   data (after)
      A         A0 A1 A2 A3     A0 B0 C0 D0
      B         B0 B1 B2 B3     A1 B1 C1 D1
      C         C0 C1 C2 C3     A2 B2 C2 D2
      D         D0 D1 D2 D3     A3 B3 C3 D3
    [Figure: exchange among processes A, B, C, D in steps 1, 2, 3]
  • 56. (28) 1 [Figure: Bandwidth (Mbps) vs. Message size (KB), 0 to 1000 KB; annotations "230Mbps" and "32 KB"] • 32KB • 200 • 200 [Figure: Bandwidth (Mbps) vs. Time (msec), 0 to 2000 msec]
  • 57. (30) RTO • 2, RTO • TCP [Figure: all-to-all steps 1, 2, 3 among processes A, B, C, D, with an RTO marked]
  • 58. (31) RTO • RTO = RTT + α • α → • α → • RTT 0: RTO 200
      RTO = SRTT + max(G, 4 x RTTVAR)
      SRTT = (1 - β) x SRTT + β x RTT
      RTTVAR = (1 - γ) x RTTVAR + γ x |SRTT - RTT|
      β = 1/8, γ = 1/4, G = 200 msec
      SRTT: Smoothed Round Trip Time
    RFC2988: G 1, RTT 500
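The RTO formulas on slide 58 can be run as a small estimator to see why a near-zero RTT still produces an RTO of about 200 ms. This Python sketch uses the slide's symbols (β, γ, G). The class name, the initialisation from the first sample (SRTT = R, RTTVAR = R/2) and the order of the two updates follow RFC 2988 and are assumptions beyond what the slide states.

```python
class RtoEstimator:
    """RTO = SRTT + max(G, 4 * RTTVAR), all times in seconds."""

    def __init__(self, first_rtt: float, g: float = 0.2,
                 beta: float = 1 / 8, gamma: float = 1 / 4) -> None:
        self.g, self.beta, self.gamma = g, beta, gamma
        self.srtt = first_rtt        # RFC 2988 initial values
        self.rttvar = first_rtt / 2

    @property
    def rto(self) -> float:
        return self.srtt + max(self.g, 4 * self.rttvar)

    def update(self, rtt: float) -> float:
        # RTTVAR is updated first, using the SRTT from before this sample.
        self.rttvar = (1 - self.gamma) * self.rttvar + self.gamma * abs(self.srtt - rtt)
        self.srtt = (1 - self.beta) * self.srtt + self.beta * rtt
        return self.rto


lan_rtt = 0.0001  # 0.1 ms, a hypothetical cluster-internal RTT
default = RtoEstimator(lan_rtt, g=0.2)
no_floor = RtoEstimator(lan_rtt, g=0.0)
for _ in range(50):
    rto_default = default.update(lan_rtt)
    rto_no_floor = no_floor.update(lan_rtt)
print(f"G=200 ms: RTO={rto_default * 1000:.1f} ms, G=0: RTO={rto_no_floor * 1000:.3f} ms")
```

With G = 200 msec the floor dominates, so one lost packet stalls a LAN connection for roughly 200 ms regardless of the measured RTT.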
  • 59. (32) RTO 1 • RTO = SRTT + max(G, 4 x RTTVAR) • G = 200 • G = 0 • (kernel 2.6.23: ip route) [Figure: Bandwidth (Mbps) vs. Message size (KB), 0 to 1000 KB; series "w/o RTO fix" and "w/ RTO fix"; annotation "32 KB"] [Figure: two plots of Bandwidth (Mbps) vs. Time (msec), 0 to 2000 msec, each labeled "RTO"]
  • 60. (-) 1 [Figure: Bandwidth (Mbps) vs. Message size (KB), 0 to 1000 KB; series "w/o RTO fix" and "w/ RTO fix"; annotation "32 KB"] • RTO ‣ RTO • RTO
  • 61. (33) MPI TCP/IP ACK RTO cwnd MPI TCP, ACS11, 2005
  • 62. (65)
  • 64. (58) PSPacer • Linux OS • GPL http://www.gridmpi.org/pspacer.jsp (ACS14, 2006)
  • 65. (63) • 500Mbps • PSPacer [Figure: PSPacer, 500Mbps]
  • 66. (64) TCP [Figure: senders A and B, receivers C and D; 1 Gbps, RTT 200ms]
                 w/o PSPacer   w/ PSPacer
      total      535 Mbps      980 Mbps
      flow       394 Mbps      490 Mbps
      flow       141 Mbps      490 Mbps
    TCP ※Scalable TCP
  • 67. (62) PSPacer 10GbE • 100Mbps, Iperf 5, GtrcNET-10 [Figure: plot with both axes in Gbps, 0 to 10; labels "PSPacer 10GbE" and "PSPacer+ HTB"] HTB: Hierarchical Token Bucket (Linux) • CPU: Quad-core Xeon (E5430) x 2 • NIC: Myricom Myri-10G (PCIe x 8) • MTU: 9000 byte • Memory: 4GB DDR2-667
  • 68. (66) • ip
      $ ip route show cached to 192.168.0.2
      192.168.0.2 from 192.168.0.1 dev eth1
          cache mtu 9000 rtt 187ms rttvar 175ms ssthresh 1024 cwnd 637 advmss 8960 hoplimit 64
    •
      # sysctl -w net.ipv4.tcp_no_metrics_save=1
      net.ipv4.tcp_no_metrics_save=1
  • 69. (67) • GbE: MTU 1500B → 949.3 Mbps, MTU 9000B → 991.4 Mbps • "ping -M do": IP "Don't fragment" bit on
      $ ping -M do -s 9000 192.168.0.2        (NG)
      PING 192.168.0.2 (192.168.0.2) 9000(9028) bytes of data.
      From 192.168.0.1 icmp_seq=1 Frag needed and DF set (mtu = 9000)
      From 192.168.0.1 icmp_seq=1 Frag needed and DF set (mtu = 9000)
      From 192.168.0.1 icmp_seq=1 Frag needed and DF set (mtu = 9000)
      $ ping -M do -s 8972 192.168.0.2        (OK)
      PING 192.168.0.2 (192.168.0.2) 8972(9000) bytes of data.
      8980 bytes from 192.168.0.2: icmp_seq=1 ttl=64 time=3.76 ms
      8980 bytes from 192.168.0.2: icmp_seq=2 ttl=64 time=0.127 ms
      8980 bytes from 192.168.0.2: icmp_seq=3 ttl=64 time=0.133 ms
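The two throughput figures on slide 69 are consistent with a simple per-frame overhead calculation. The Python sketch below assumes 20-byte IP and 20-byte TCP headers with no TCP options, plus 38 bytes of Ethernet overhead per frame (preamble, header, FCS, inter-frame gap). The slide does not state its derivation, so this is one plausible reconstruction; the constant and function names are introduced here.

```python
ETH_OVERHEAD = 8 + 14 + 4 + 12  # preamble+SFD, Ethernet header, FCS, inter-frame gap
IP_TCP_HEADERS = 20 + 20        # assumes no IP or TCP options


def tcp_goodput_mbps(mtu: int, line_rate_mbps: float = 1000.0) -> float:
    """Best-case TCP payload rate on Ethernet for a given MTU."""
    payload = mtu - IP_TCP_HEADERS  # TCP payload bytes per frame
    on_wire = mtu + ETH_OVERHEAD    # bytes of link time consumed per frame
    return line_rate_mbps * payload / on_wire


for mtu in (1500, 9000):
    print(f"MTU {mtu}: {tcp_goodput_mbps(mtu):.1f} Mbps")
```

The same header arithmetic explains the ping sizes: a 9000-byte MTU leaves 8972 bytes of ICMP payload after the 20-byte IP header and the 8-byte ICMP header.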
  • 70. (68) • InfiniBand, Myrinet • IEEE 802.3x • PAUSE
      $ sudo ethtool -A eth0 tx on rx on
      $ sudo ethtool -a eth0
      Pause parameters for eth1:
      Autonegotiate: off
      RX: off
      TX: off
    Lossless Ethernet
  • 72. (70) Rough consensus, running code • RFC • Reno [RFC 2581] • net/ipv4 180 • regression • BIC, init_ssthresh, ABC
  • 73. (72) Web100 • Pittsburgh Supercomputing Center (PSC) • procfs • netstat -s [RFC 4898] • http://www.web100.org/
  • 74. (72) TCP Probe • KProbes TCP • procfs
  • 75. (73) TCP Probe [Figure: cwnd and ssthresh] http://www.linuxfoundation.org/en/Net:TCP_testing
  • 76. (76) 1. × RTT, 4MB 2. setsockopt: listen, connect 3. 4. TCP 5.
  • 77. (77) 1. tcp_slow_start_after_idle 2. 3. RTO, RTO 4. 5.
  • 78.
  • 79. TCP • ARPANET NCP • TCP, IP (1970s) • TCP: RFC 793 (standard), TCP(v4), IPv4 • (1986) → Tahoe • RFC 2581 (standard): Reno • RFC 2582 (experimental): NewReno • Vegas, HSTCP, FAST, ScalableTCP, HTCP, BIC, CUBIC, CompoundTCP • TCP
  • 80. RFC
      Spec.                                          sysctl                      default
      TCP (RFC 793)                                  -                           -
      Window scaling (RFC 1323)                      tcp_window_scaling          1
      Timestamps option (RFC 1323)                   tcp_timestamps              1
      TIME-WAIT Assassination Hazards (RFC 1337)     tcp_rfc1337                 0
      SACK (RFC 2018, 3517)                          tcp_sack                    1
      Control block sharing (RFC 2140)               tcp_no_metrics_save         0
      MD5 signature option (RFC 2385)                -                           -
      Initial window size (RFC 2414)                 -                           -
      Congestion control (RFC 2581)                  -                           -
      NewReno (RFC 3782)                             -                           -
      Cwnd validation (RFC 2861)                     tcp_slow_start_after_idle   1
      D-SACK (RFC 2883)                              tcp_dsack                   1
      Limited transmit (RFC 3042)                    -                           -
      Explicit congestion notification (RFC 3168)    tcp_ecn                     0
      Appropriate Byte Counting (RFC 3465)           tcp_abc                     0
      Limited slow start (RFC 3742)                  tcp_max_ssthresh            0
      FRTO (RFC 4138)                                tcp_frto                    0
      Forward ACK                                    tcp_fack                    1
      Path MTU discovery                             tcp_mtu_probing             0
  • 81. URL
      Project            URL
      BIC/CUBIC          http://netsrv.csc.ncsu.edu/twiki/bin/view/Main/BIC
      FAST               http://netlab.caltech.edu/FAST/
      TCP Westwood+      http://c3lab.poliba.it/index.php/Westwood
      Scalable TCP       http://www.deneholme.net/tom/scalable/
      HighSpeed TCP      http://www.icir.org/floyd/hstcp.html
      H-TCP              http://www.hamilton.ie/net/htcp/index.htm
      TCP-Illinois       http://www.ews.uiuc.edu/~shaoliu/tcpillinois/
      Compound TCP       http://research.microsoft.com/en-us/projects/ctcp/
      TCP Vegas          http://www.cs.arizona.edu/projects/protocols/
      TCP Westwood       http://www.cs.ucla.edu/NRL/hpi/tcpw/
      Circuit TCP        http://www.ece.virginia.edu/cheetah/
      TCP Hybla          https://hybla.deis.unibo.it/
      TCP Low Priority   http://www.ece.rice.edu/networks/TCP-LP/
      UDT                http://www.cs.uic.edu/~ygu1/
      XCP                http://www.ana.lcs.mit.edu/dina/XCP/
      Web100             http://www.web100.org/
      TCP Probe          http://www.linuxfoundation.org/en/Net:TcpProbe
      iproute2           http://www.linuxfoundation.org/en/Net:Iproute2