Joint Techs Workshop, TIP 2004
Jan 28, 2004
Honolulu, Hawaii




            Trans-Pacific Grid Datafarm

                                 Osamu Tatebe
                Grid Technology Research Center, AIST

                 On behalf of the Grid Datafarm Project



              National Institute of Advanced Industrial Science and Technology
Key points of this talk
 Trans-Pacific Grid file system and testbed
    70 TBytes disk capacity, 13 GB/sec disk I/O performance
 Trans-Pacific file replication [SC2003 Bandwidth Challenge]
    1.5 TB of data transferred in an hour
    Multiple high-speed Trans-Pacific networks:
    APAN/TransPAC (2.4 Gbps OC-48 POS, 500 Mbps OC-12
    ATM), SuperSINET (2.4 Gbps x 2, 1 Gbps available)
    6,000 miles
    Stable 3.79 Gbps out of a theoretical peak of 3.9 Gbps (97%),
    using 11 node pairs (MTU 6000B)
    We won the "Distributed Infrastructure" award!
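
A back-of-envelope check of these headline figures (decimal units assumed; illustrative arithmetic, not from the slides beyond the numbers quoted above):

    /* Sanity-check the headline numbers above. */
    #include <stdio.h>

    int main(void) {
        double avg_gbps = 1.5e12 * 8 / 3600 / 1e9;          /* 1.5 TB per hour */
        printf("average rate : %.2f Gbps\n", avg_gbps);     /* ~3.33 Gbps */
        printf("peak fraction: %.0f%%\n", 3.79 / 3.9 * 100); /* ~97% of 3.9 */
        return 0;
    }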


[Background] Petascale Data-Intensive Computing
High Energy Physics
     CERN LHC, KEK Belle
        ~MB/collision, 100 collisions/sec
        ~PB/year
        2000 physicists, 35 countries
     [Photos: detectors for the LHCb and ALICE experiments]

Astronomical Data Analysis
     Analysis of the whole archived data set
     TB~PB/year/telescope
     SUBARU telescope: 10 GB/night, 3 TB/year

[Background 2] Large-scale File Sharing
 P2P – exclusive and special-purpose approach
    Napster, Gnutella, Freenet, . . .
 Grid technology – file transfer, metadata management
    GridFTP, Replica Location Service
    Storage Resource Broker (SRB)
 Large-scale file system – general approach
    Legion, Avaki [Grid, no replica management]
    Grid Datafarm [Grid]
    Farsite, OceanStore [P2P]
    AFS, DFS, . . .



Goal and features of Grid Datafarm
 Goal
    Dependable data sharing among multiple organizations
    High-speed data access, high-speed data processing

 Grid Datafarm
     Grid File System – a global, dependable virtual file system
        Integrates CPU + storage
     Parallel & distributed data processing

 Features
    Secure, based on the Grid Security Infrastructure
    Scalable with data size and usage scenarios
    Location-transparent data access
    Automatic and transparent replica access for fault tolerance
    High-performance data access and processing by accessing multiple
    dispersed storage nodes in parallel (file-affinity scheduling)


Grid Datafarm (1): Gfarm file system -
       World-wide virtual file system [CCGrid 2002]
            Transparent access to dispersed file data in a Grid
               POSIX I/O APIs, plus native Gfarm APIs for extended
               file view semantics and replication
               Maps the virtual directory tree onto physical files
               Automatic and transparent replica access for fault
               tolerance and access-concentration avoidance

 [Figure: a virtual directory tree rooted at /grid (with entries such
 as ggf/aist, ggf/gtrc, and jp/file1) is mapped by the file system
 metadata onto physical files stored, and replicated, across the nodes
 of the Gfarm file system]
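To make the access model concrete, here is a minimal sketch of client code against the Gfarm v1 C API; the gfarm_initialize / gfs_pio_* names follow that library, but the exact header name and signatures shown here are assumptions, not taken from the slides:

    /* Sketch: reading a file through the Gfarm virtual file system.
     * API names follow the Gfarm v1 C library; the header name and
     * exact signatures are assumptions for illustration. */
    #include <stdio.h>
    #include <gfarm/gfarm.h>   /* assumed header name */

    int main(void) {
        char buf[8192];
        int n;
        GFS_File f;
        char *e = gfarm_initialize(NULL, NULL);
        if (e != NULL) { fprintf(stderr, "init: %s\n", e); return 1; }

        /* The path names a virtual file; the library resolves it via the
         * metadata server and transparently picks one of its replicas. */
        e = gfs_pio_open("gfarm:/grid/jp/file1", GFARM_FILE_RDONLY, &f);
        if (e == NULL) {
            while (gfs_pio_read(f, buf, sizeof buf, &n) == NULL && n > 0)
                fwrite(buf, 1, n, stdout);
            gfs_pio_close(f);
        }
        gfarm_terminate();
        return 0;
    }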
Grid Datafarm (2): High-performance data access
and processing support [CCGrid 2002]
 World-wide parallel and distributed processing
   Aggregate of files = superfile
   Data processing of a superfile = parallel and
   distributed data processing of its member files
      Local file view (SPMD parallel file access)
      File-affinity scheduling ("owner computes")

 [Figure: world-wide parallel and distributed processing across a
 virtual CPU layer on top of the Grid File System; example: a year of
 astronomic archival data, one superfile, analyzed with 365-way
 parallelism]
Transfer technology in long fat networks

 Bandwidth and latency between the US and Japan
   1-10 Gbps, 150-300 msec RTT
 TCP acceleration
   Adjustment of the congestion window
   Multiple TCP connections
   HighSpeed TCP, Scalable TCP, FAST TCP
   XCP (not TCP)
 UDP-based acceleration
   Tsunami, UDT, RBUDP, atou, . . .
         Bandwidth prediction without packet loss
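
As a concrete baseline for the TCP techniques above: the socket buffer must cover the bandwidth-delay product (1 Gbps x 200 ms is about 25 MB), and several parallel connections are commonly opened. A minimal sketch using standard sockets; the address, port, and stream count are placeholders, and error handling is abbreviated:

    /* Sketch: N parallel TCP connections with large socket buffers,
     * the two classic tricks for long fat networks. */
    #include <string.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    #define NSTREAMS 4
    #define SNDBUF   (32 * 1024 * 1024)  /* >= bandwidth-delay product */

    int main(void) {
        int socks[NSTREAMS];
        struct sockaddr_in peer;
        memset(&peer, 0, sizeof peer);
        peer.sin_family = AF_INET;
        peer.sin_port = htons(5001);                     /* placeholder port */
        inet_pton(AF_INET, "192.0.2.1", &peer.sin_addr); /* placeholder addr */

        for (int i = 0; i < NSTREAMS; i++) {
            socks[i] = socket(AF_INET, SOCK_STREAM, 0);
            int sz = SNDBUF;  /* the kernel caps this at net.core.wmem_max */
            setsockopt(socks[i], SOL_SOCKET, SO_SNDBUF, &sz, sizeof sz);
            connect(socks[i], (struct sockaddr *)&peer, sizeof peer);
            /* ... send one stripe of the file on each connection ... */
        }
        for (int i = 0; i < NSTREAMS; i++) close(socks[i]);
        return 0;
    }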
Multiple TCP streams sometimes
   considered harmful . . .
      Multiple TCP streams achieve good bandwidth, but they
      congest the network excessively; in effect, one
      "shoots oneself in the foot".
 [Figure: bandwidth over time (10 msec averages) on APAN/TransPAC
 LA-Tokyo (2.4 Gbps) for three TCP streams (TxBW0-TxBW2) and their
 total (TxTotal). The total oscillates heavily and is not stable:
 the streams compensate for each other, and too much network flow
 causes too much congestion.]

      Need to limit bandwidth appropriately
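
One way to "limit bandwidth appropriately" in software is to pace the sender, sleeping between bursts so the average rate stays under the bottleneck. A toy sketch of the idea; the rate and burst size are illustrative, and GNET-1, introduced next, does this far more precisely in hardware:

    /* Toy user-space pacer: send fixed-size bursts, then sleep so the
     * average rate stays near TARGET_BPS. Coarse next to hardware
     * pacing, but it shows the idea. */
    #include <unistd.h>
    #include <sys/socket.h>

    #define TARGET_BPS 500e6        /* 500 Mbps target, illustrative */
    #define BURST      (64 * 1024)  /* bytes per send() call */

    void paced_send(int sock, const char *buf, long total) {
        /* time each burst should occupy at the target rate (~1 ms) */
        useconds_t gap = (useconds_t)(BURST * 8 / TARGET_BPS * 1e6);
        for (long sent = 0; sent < total; ) {
            ssize_t n = send(sock, buf, BURST, 0);
            if (n <= 0) break;
            sent += n;
            usleep(gap);  /* crude; timer resolution limits accuracy */
        }
    }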
A programmable network testbed device
GNET-1



 [Photo: the GNET-1 board, with large high-speed memory blocks]

 Programmable hardware network testbed
    WAN emulation
    - latency, bandwidth, packet loss, jitter, . . .
    Precise measurement
    - bandwidth at 100 usec resolution
    - latency and jitter between two GNET-1s
 General purpose, very flexible!
IFG-based pace control by GNET-1
 [Figure: three panels titled "Shaping by GNET-1 (700 Mbps x 3 @ APAN
 LA-Tokyo (2.4 Gbps))", plotting bandwidth (Mbps) over time for the
 receive side (RxBW0, RxBW1) and the send side (TxBW0-TxBW2); every
 stream holds flat at 700 Mbps]

 [Diagram: 1 Gbps senders feed GNET-1 (flow control enabled), which
 paces each stream to 700 Mbps through a 700 Mbps bottleneck, with
 NO PACKET LOSS!]
 GNET-1 provides
    Precise traffic pacing at any data rate, by
    changing the IFG (Inter-Frame Gap)
    A packet-loss-free network, using a large (16 MB)
    input buffer
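
The pacing arithmetic is simple: on Ethernet each frame occupies its own bytes plus an 8-byte preamble and at least a 12-byte IFG of wire time, so stretching the IFG dials in any effective rate. A sketch of the calculation; the 700 Mbps target matches the experiment above, while the simplified overhead accounting is my assumption, not from the slides:

    /* Sketch: how many extra IFG bytes per frame IFG-based pacing must
     * insert to hold a 1 Gbps link at a target rate. The 8 B preamble
     * + 12 B minimum IFG are standard Ethernet; header/FCS accounting
     * is simplified here. */
    #include <stdio.h>

    int main(void) {
        double line   = 1e9;    /* GigE line rate, bits/sec */
        double target = 700e6;  /* desired pace, bits/sec   */
        double frame  = 1538;   /* 1500 B payload + 38 B Ethernet overhead */

        /* Each frame takes frame + extra byte-times on the wire; choose
         * extra so that frame / (frame + extra) == target / line. */
        double extra = frame * (line / target - 1.0);
        printf("stretch IFG by ~%.0f bytes per frame\n", extra); /* ~659 */
        return 0;
    }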

Summary of technologies for performance
improvement
 [Disk I/O performance] Grid Datafarm – a Grid file system with
 high-performance data-intensive computing support
     A world-wide virtual file system that federates the local file systems
     of multiple clusters
     Provides scalable disk I/O performance for file replication over
     high-speed network links and for large-scale data-intensive applications
     Trans-Pacific Grid Datafarm testbed
        5 clusters in Japan, 3 clusters in the US, and 1 cluster in Thailand,
        providing 70 TBytes disk capacity and 13 GB/sec disk I/O performance
     Supports file replication for fault tolerance and access-concentration
     avoidance
 [World-wide high-speed network efficient utilization] GNET-1 – a gigabit
 network testbed device
    Provides IFG-based, precisely rate-controlled flows at any rate
    Enables stable and efficient Trans-Pacific network use with HighSpeed
    TCP


Trans-Pacific Grid Datafarm testbed:
   Network and cluster configuration
   Trans-Pacific theoretical peak: 3.9 Gbps
   Gfarm disk capacity: 70 TBytes; disk read/write: 13 GB/sec

   [Diagram: testbed clusters and network links]
   Clusters (nodes / disk capacity / disk I/O):
      Titech                      147 nodes   16 TBytes     4 GB/sec
      Univ Tsukuba                 10 nodes    1 TBytes     300 MB/sec
      KEK                           7 nodes    3.7 TBytes   200 MB/sec
      AIST (two clusters on        16 nodes   11.7 TBytes   1 GB/sec
      Tsukuba WAN)                 (each)     (each)        (each)
      SC2003 Phoenix               32 nodes   23.3 TBytes   2 GB/sec
      plus clusters at Indiana Univ, SDSC, and Kasetsart Univ (Thailand)
   Trans-Pacific routes:
      SuperSINET Tokyo-New York         2.4 Gbps (1 Gbps) [950 Mbps]
      APAN/TransPAC Tokyo-Los Angeles   2.4 Gbps [2.34 Gbps]
      APAN/TransPAC Tokyo-Chicago       OC-12 ATM 622 Mbps [500 Mbps]
   Domestic access via NII (SuperSINET), Maffin, and APAN Tokyo XP at
   1-10 Gbps; Abilene links New York, Chicago, and Los Angeles to the
   SC2003 floor in Phoenix
Scientific Data for Bandwidth Challenge
 Trans-Pacific file replication of scientific data
    For transparent, high-performance, and fault-tolerant access
 Astronomical Object Survey on Grid Datafarm [HPC Challenge participant]
    World-wide data analysis on the whole archive
    652 GBytes of data observed by the SUBARU telescope
    N. Yamamoto (AIST)
 Large configuration data from lattice QCD
    Three sets of hundreds of gluon field configurations on a 24^3 x 48
    4-D space-time lattice (3 sets x 364.5 MB x 800 = 854.3 GB)
    Generated by the CP-PACS parallel computer at the Center for
    Computational Physics, Univ. of Tsukuba (300 Gflops x years of CPU
    time) [Univ Tsukuba booth]




Network bandwidth in the APAN/TransPAC LA route

 [Diagram: PC clusters behind switches (3 Gbps aggregate on each side)
 attach to a Force10 E600 in LA and a Juniper M20 in Tokyo over a
 10G / 2.4G path; RTT 141 ms; GNET-1 sits on the LA side]

 [Graph: bandwidth in Gbps; with no pacing the achieved rate is lower
 and unstable, while pacing at 2.3 Gbps (900 + 900 + 500 Mbps) yields
 a stable transfer rate of 2.3 Gbps]
APAN/TransPAC LA route (1)

 [Bandwidth graph]
APAN/TransPAC LA route (2)

 [Bandwidth graph]
APAN/TransPAC LA route (3)

 [Bandwidth graph]
File replication between Japan and US
  (network configuration)
 [Diagram: the SC2003 Phoenix cluster (PCs behind a Force10 E600, with
 GNET-1 for pacing) reaches Tokyo/Tsukuba over Abilene and three
 Trans-Pacific routes: Los Angeles (2.4 Gbps, RTT 141 ms), Chicago
 (500 Mbps, RTT 250 ms), and New York City (2.4 Gbps with 1 Gbps
 usable, RTT 285 ms); a Juniper M20 terminates the Tokyo end]
File replication performance between Japan
and US (total)

 [Bandwidth graph]
APAN/TransPAC Chicago




               Pacing at 500 Mbps, quite stable




APAN/TransPAC LA (1)

              After re-pacing from 800 to 780 Mbps, quite stable




APAN/TransPAC LA (2)

              After re-pacing of LA (1), quite stable




APAN/TransPAC LA (3)

              After re-pacing of LA (1), quite stable




SuperSINET NYC
           Re-pacing from 930 to 950 Mbps




Summary
 Efficient use near the peak rate of long fat networks
      IFG-based precise pacing within the packet-loss-free bandwidth, using GNET-1
         -> packet-loss-free network
      Stable network flow even with HighSpeed TCP
 Disk I/O performance improvement
     Parallel disk access using Gfarm
     Trans-Pacific file replication performance: 3.79 Gbps out of a theoretical
     peak of 3.9 Gbps (97%), using 11 node pairs (MTU 6000B)
     1.5 TB of data transferred in an hour
 Linux 2.4 kernel problem during file replication (transfer)
     Network transfer stopped within a few minutes when the buffer cache was
     flushed to disk
     A Linux kernel bug?
     Defensive solution: set a very short interval for buffer-cache flushing
     (a sketch follows this list)
         This limits the file transfer rate to 400 Mbps per node pair
 Successful Trans-Pacific-scale data analysis
 . . . but a scalability problem with the LDAP server used as the metadata server
       Further improvement needed
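
A minimal sketch of that defensive workaround, assuming the Linux 2.4 /proc/sys/vm/bdflush interface, where one field is the flush-daemon interval in jiffies. The field positions and the surrounding values below are assumptions that vary across 2.4 releases; check Documentation/sysctl/vm.txt for the running kernel:

    /* Sketch: shorten the Linux 2.4 buffer-cache flush interval via
     * /proc/sys/vm/bdflush. ASSUMPTION: the 5th field is the kupdated
     * interval in jiffies and the other fields mirror common defaults;
     * field order differs across 2.4 releases, so verify against
     * Documentation/sysctl/vm.txt before use. */
    #include <stdio.h>

    int main(void) {
        FILE *f = fopen("/proc/sys/vm/bdflush", "w");
        if (f == NULL) { perror("bdflush"); return 1; }
        /* 100 jiffies = 1 s at HZ=100, instead of the usual 5 s */
        fprintf(f, "30 500 0 0 100 3000 60 20 0\n");
        return fclose(f) != 0;
    }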




Future work
 Standardization effort with the GGF Grid File System WG
    Foster (world-wide) storage sharing and integration:
    dependable data sharing and high-performance data access
    among several organizations
 Application areas
    High-energy physics experiments
    Astronomical data analysis
    Bioinformatics, . . .
    Dependable data processing in eGovernment and
    eCommerce
    Other applications that need dependable file sharing
    among several organizations


Special thanks to
 Hirotaka Ogawa, Yuetsu Kodama, Tomohiro Kudoh, Satoshi
 Sekiguchi (AIST), Satoshi Matsuoka, Kento Aida (Titech),
 Taisuke Boku, Mitsuhisa Sato (Univ Tsukuba),
 Youhei Morita (KEK), Yoshinori Kitatsuji (APAN Tokyo XP),
 Jim Williams, John Hicks (TransPAC/Indiana Univ),
 Hisashi Eguchi (Maffin), Kazunori Konishi, Jin Tanaka,
 Yoshitaka Hattori (APAN), Jun Matsukata (NII), Chris Robb
 (Abilene),
 the Tsukuba WAN, APAN, and NII SuperSINET NOC teams,
 Force10 Networks,
 and PRAGMA, ApGrid, SDSC, Indiana University, and Kasetsart
 University


