Experiences with High-bandwidth Networks


Published on

ESGF face-to-face meeting 2012 November

Published in: Technology

  1. 1. Streaming Exa-scale Data over 100Gbps Networks • Mehmet Balman, Scientific Data Management Group, Computational Research Division, Lawrence Berkeley National Laboratory
  2. 2. ESG (Earth Systems Grid) • Over 2,700 sites • 25,000 users • IPCC Fifth Assessment Report (AR5): 2PB • IPCC Fourth Assessment Report (AR4): 35TB
  3. 3. Applications’ Perspective • Increasing the bandwidth is not sufficient by itself; we need careful evaluation of high-bandwidth networks from the applications’ perspective. • Data distribution for climate science • How can scientific data movement and analysis between geographically disparate supercomputing facilities benefit from high-bandwidth networks?
  4. 4. Climate Data over 100Gbps • Data volume in climate applications is increasing exponentially. • An important challenge in managing ever-increasing data sizes in climate science is the large variance in file sizes. • Climate simulation data consists of a mix of relatively small and large files, with an irregular file size distribution in each dataset. • Many small files
  5. 5. Keep the data channel full • [Diagram labels: FTP, RPC; request a file, send file; request data, send data] • Concurrent transfers • Parallel streams
  6. 6. Lots-of-small-files problem! File-centric tools? • Not necessarily high-speed (same distance) - Latency is still a problem • [Diagram labels: 100Gbps pipe, 10Gbps pipe; request a dataset, send data]
  7. 7. Framework for the Memory-mapped Network Channel • Memory caches are logically mapped between client and server
  8. 8. Moving climate files efficiently
  9. 9. Advantages of MemzNet • Decoupling I/O and network operations: front-end (I/O processing), back-end (networking layer) • Not limited by the characteristics of the file sizes: on-the-fly tar approach, bundling and sending many files together • Dynamic data channel management: can increase/decrease the parallelism level both in the network communication and in I/O read/write operations, without closing and reopening the data channel connection (as is done in regular FTP variants).
  10. 10. ANI 100Gbps testbed • SC11 100Gbps demo • [Diagram: ANI Middleware Testbed, updated December 11, 2011. Sites: NERSC and ANL, joined by the ANI 100G Network through ANI 100G Routers, each site with a site router (nersc-mr2 at NERSC) and a link to ESnet. NERSC hosts: nersc-diskpt-1, nersc-diskpt-2, nersc-diskpt-3, nersc-app. ANL hosts: anl-mempt-1, anl-mempt-2, anl-mempt-3, anl-app. Switches: nersc-C2940, nersc-asw1, anl-C2940, anl-asw1. Link labels: 100G, 10G, 4x10GE (MM), 10GE (MM), 1 GE; interface labels: eth0, eth2-5.] • NICs: anl-mempt-1: 2 x (2x10G Myricom); anl-mempt-2: 2 x (2x10G Myricom); anl-mempt-3: 1 x (2x10G Myricom), 1 x (2x10G Mellanox); nersc-diskpt-1: 2 x (2x10G Myricom), 1 x (4x10G HotLava); nersc-diskpt-2: 1 x (2x10G Myricom), 1 x (2x10G Chelsio), 1 x (6x10G HotLava); nersc-diskpt-3: 1 x (2x10G Myricom), 1 x (2x10G Mellanox), 1 x (6x10G HotLava) • Note: ANI 100G routers and 100G wave available until summer 2012; testbed resources after that are subject to funding availability.
  11. 11. Disadvantage of many TCP Streams • Figure: (a) total throughput vs. the number of concurrent memory-to-memory transfers; (b) interface traffic, packets per second (blue) and bytes per second, over a single NIC with different numbers of concurrent transfers. • Three hosts, each with 4 available NICs, and a total of ten 10Gbps NIC pairs were used to saturate the 100Gbps pipe in the ANI Testbed. • Ten data movement jobs, each corresponding to a NIC pair, were started simultaneously at source and destination. • Each peak represents a different test: 1, 2, 4, 8, 16, 32, and 64 concurrent streams per job were initiated for 5-minute intervals (e.g., when the concurrency level is 4, there are 40 streams in total).
  12. 12. Effects of many streams • Figure: ANI testbed 100Gbps (10 x 10G NICs, three hosts): interrupts/CPU vs. the number of concurrent transfers [1, 2, 4, 8, 16, 32, 64 concurrent jobs, 5-minute intervals]; TCP buffer size is 50M
  13. 13. MemzNet’s Performance • TCP buffer size is set to 50MB • [Figure labels: MemzNet, GridFTP; SC11 demo, ANI Testbed]
  14. 14. MemzNet’s Architecture for data streaming
  15. 15. Acknowledgements • Eric Pouyoul, Yushu Yao, E. Wes Bethel, Burlen Loring, Prabhat, John Shalf, Alex Sim, Brian L. Tierney, Peter Nugent, Zarija Lukic, Patrick Dorn, Evangelos Chaniotakis, John Christman, Chin Guok, Chris Tracy, Lauren Rotman, Jason Lee, Shane Canon, Tina Declerck, Cary Whitney, Ed Holohan, Adam Scovel, Linda Winkler, Jason Hill, Doug Fuller, Susan Hicks, Hank Childs, Mark Howison, Aaron Thomas, John Dugan, Gopal Vaswani
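
Illustrative sketch (not from the slides): slide 9 describes an on-the-fly tar approach in which many files are bundled and sent together, so that the network layer is not limited by file sizes. The Python below is a minimal sketch of that general idea only, not MemzNet's actual implementation or wire format. The record layout (name length, payload length, name, payload), the function names pack_blocks and unpack_stream, and the block size are all assumptions introduced here for illustration. It also assumes blocks arrive in order; a real multi-stream transfer would need block sequence numbers.

```python
import struct
from typing import Iterable, Iterator, Tuple

# Hypothetical record header: 2-byte name length, 8-byte payload length.
HEADER = struct.Struct("!HQ")
BLOCK_SIZE = 4 * 1024 * 1024  # assumed block size, not taken from the slides


def pack_blocks(files: Iterable[Tuple[str, bytes]],
                block_size: int = BLOCK_SIZE) -> Iterator[bytes]:
    """Front-end (I/O side): bundle files of any size into uniform blocks.

    Small files share a block and large files span several, so the
    networking layer only ever sees equally sized units.
    """
    buf = bytearray()
    for name, payload in files:
        name_bytes = name.encode("utf-8")
        buf += HEADER.pack(len(name_bytes), len(payload))
        buf += name_bytes
        buf += payload
        # Emit every full block as soon as it is available.
        while len(buf) >= block_size:
            yield bytes(buf[:block_size])
            del buf[:block_size]
    if buf:
        yield bytes(buf)  # final, possibly partial, block


def unpack_stream(blocks: Iterable[bytes]) -> Iterator[Tuple[str, bytes]]:
    """Receiver side: rebuild (name, payload) records from in-order blocks."""
    buf = bytearray()
    for block in blocks:
        buf += block
        while len(buf) >= HEADER.size:
            name_len, size = HEADER.unpack_from(buf)
            total = HEADER.size + name_len + size
            if len(buf) < total:
                break  # record continues in a later block
            start = HEADER.size
            name = bytes(buf[start:start + name_len]).decode("utf-8")
            payload = bytes(buf[start + name_len:total])
            del buf[:total]
            yield name, payload


if __name__ == "__main__":
    # Made-up file names and contents, mixing small and large files.
    dataset = [("small.nc", b"a" * 10), ("large.nc", b"b" * 100_000)]
    blocks = list(pack_blocks(dataset, block_size=4096))
    print(len(blocks), "blocks for", len(dataset), "files")
    assert list(unpack_stream(blocks)) == dataset
```

Because the sender requests and ships blocks rather than individual files, there is no per-file request round trip, which is the cost the lots-of-small-files slide attributes to file-centric tools.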