Blue Waters and Resource Management - Now and in the Future


In this presentation from Moabcon 2013, Bill Kramer from NCSA presents: Blue Waters and Resource Management - Now and in the Future.


  1. Blue Waters and Resource Management – Now and in the Future
     Dr. William Kramer, National Center for Supercomputing Applications, University of Illinois
  2. Science & Engineering on Blue Waters
     Blue Waters will enable advances in a broad range of science and engineering disciplines. Examples include: Molecular Science, Weather & Climate Forecasting, Earth Science, Astro*, Health.
     Moabcon 2013 - April 10, 2013 - Salt Lake City
  3. What Have I Been Doing?
  4. Science areas, teams, and codes. (In the original table each code is also characterized by the methods it uses: structured grids, unstructured grids, dense matrix, sparse matrix, N-body, Monte Carlo, FFT, PIC, and significant I/O.)
     • Climate and Weather (3 teams): CESM, GCRM, CM1/WRF, HOMME
     • Plasmas/Magnetosphere (2): H3D(M), VPIC, OSIRIS, Magtail/UPIC
     • Stellar Atmospheres and Supernovae (5): PPM, MAESTRO, CASTRO, SEDONA, ChaNGa, MS-FLUKSS
     • Cosmology (2): Enzo, pGADGET
     • Combustion/Turbulence (2): PSDNS, DISTUF
     • General Relativity (2): Cactus, Harm3D, LazEV
     • Molecular Dynamics (4): AMBER, Gromacs, NAMD, LAMMPS
     • Quantum Chemistry (2): SIAL, GAMESS, NWChem
     • Material Science (3): NEMOS, OMEN, GW, QMCPACK
     • Earthquakes/Seismology (2): AWP-ODC, HERCULES, PLSQR, SPECFEM3D
     • Quantum Chromodynamics (1): Chroma, MILC, USQCD
     • Social Networks (1): EPISIMDEMICS
     • Evolution (1): Eve
     • Engineering/System of Systems (1): GRIPS, Revisit
     • Computer Science (1)
  5. Blue Waters Computing System (system diagram; all figures are usable capacities and sustained rates)
     • Sonexion: 26 usable PB, >1 TB/sec
     • Spectra Logic: 300 usable PB, 120+ Gb/sec
     • External servers; aggregate memory: 1.5 PB
     • 10/40/100 Gb Ethernet switch; IB switch; 100 GB/sec; 100-300 Gbps WAN
  6. Storage and network architecture (diagram)
     • Links: 40 GbE, FDR IB, 10 GbE, QDR IB, Cray HSN; core FDR/QDR IB Extreme switches (LAN/WAN); 40 GbE switch; 440 Gb/s Ethernet from the site network
     • Online disk: >25 PB (/home, /project, /scratch) at 1,200 GB/s; 1.2 PB usable disk
     • Tape: 380 PB raw, served by 50 Dell 720 near-line servers at 55 GB/s
     • Servers: 28 Dell 720 IE servers, 4 Dell esLogin; LNET(s), rSIP GW, network GW (FC8)
     • Protocols: LNET, TCP/IP (10 GbE), SCSI (FCP), GridFTP (TCP/IP)
     • All storage sizes are given as the amount usable; rates are always usable/measured sustained rates
  7. Cray XE6/XK7 - 276 Cabinets
     • XE6 compute nodes: 5,688 blades, 22,640 nodes, 362,240 FP (Bulldozer) cores, 724,480 integer cores; 4 GB per FP core
     • XK7 GPU nodes: 768 blades, 3,072 (4,224) nodes, 24,576 (33,792) FP cores, 4,224 GPUs; 4 GB per FP core
     • Gemini fabric (HSN); InfiniBand fabric; 10/40/100 Gb Ethernet switch
     • Sonexion: 25+ usable PB online storage in 36 racks; near-line storage: 300+ usable PB; boot RAID, boot cabinet, SMW
     • Service nodes: DSL 48, resource manager (MOM) 64, H2O login 4, boot 2, SDB 2, network GW 8, RSIP 12, LNET routers 582, reserved 74; import/export nodes, management node, HPSS data mover nodes, esServers cabinets
     • Supporting systems: LDAP, RSA, Portal, JIRA, Globus CA, Bro, test systems, Accounts/Allocations, CVS, Wiki; cyber protection IDPS; NPCF; SCUBA
  8. BW Focus on Sustained Performance
     • Blue Waters and NSF are focusing on sustained performance in a way few have before.
     • Sustained performance is the computer's useful, consistent performance on the broad range of applications that scientists and engineers use every day.
       • Time to solution for a given amount of work is the important metric - not hardware ops/s.
       • Sustained performance (and therefore the tests) includes the time to read data and write the results.
     • NSF's Track-1 call emphasized sustained performance, demonstrated on a collection of application benchmarks (application + problem set).
       • Not just simplistic metrics (e.g. High Performance Linpack).
       • Applications include both petascale applications (which effectively use the full machine, solving scalability problems for both compute and I/O) and applications that use a large fraction of the system.
     • The Blue Waters project focus is on delivering sustained petascale performance to computational and data-focused applications.
       • Develop tools, techniques, and samples that exploit all parts of the system.
       • Explore new tools, programming models, and libraries to help applications get the most from the system.
     • By the Sustained Petascale Performance metrics, Blue Waters sustained >1.3 across 12 different time-to-solution application tests.
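The sustained-performance idea above can be made concrete with a small sketch. It assumes, purely for illustration, that a composite metric is formed as the geometric mean of per-benchmark sustained rates (useful work divided by measured time to solution, including I/O). The function names and all numbers below are made up; they are not the actual SPP benchmark inputs or results.

```python
import math

def sustained_rate(ref_work, wall_seconds):
    """Sustained rate for one benchmark: useful work done / time to solution."""
    return ref_work / wall_seconds

def composite_sustained(benchmarks):
    """Composite metric: geometric mean of per-benchmark sustained rates."""
    rates = [sustained_rate(w, t) for w, t in benchmarks]
    return math.exp(sum(math.log(r) for r in rates) / len(rates))

# Illustrative suite: (reference PFLOP of useful work, seconds incl. I/O)
suite = [(9000.0, 6000.0), (12000.0, 8000.0), (7000.0, 5600.0)]
print(f"composite sustained rate: {composite_sustained(suite):.2f} PF/s")
```

The geometric mean rewards consistent performance across the whole suite: a single slow benchmark drags the composite down, which matches the slide's emphasis on broad, everyday workloads rather than one peak number.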
  9. View from the Blue Waters Portal
     As of April 2, 2013, Blue Waters has delivered over 1.3 billion core-hours to S&E teams.
  10. Usage Breakdown - Jan 1 to Mar 26, 2013
     • Torque log accounting (NCSA, Mike Showerman)
     • Chart: accumulated XE node-hours (in millions) by XE job size in nodes (1 node = 32 cores), for power-of-2 sizes from 1 to 32,768 nodes; markers highlight jobs of >65,536 cores and >262,144 cores.
  11. OBSERVATIONS AND THOUGHTS: FLEXIBILITY IS THE WORD OF THE NEXT DECADE
     What is Blue Waters already telling us about future @Scale systems?
  12. Observation 1: Topology Matters
     • Much of the work for performance improvement of early applications was understanding and tuning for layout/topology, even on dedicated systems.
       • Factors of almost 10 were seen for some applications.
     • Nvidia's Linpack results are mostly due to topology-aware work layout.
       • Done with hand tuning, special node selection, etc.
       • This needs to become commonplace to really benefit users.
  13. Topology Matters
     • Even very small changes can have dramatic and unexpected consequences.
     • Example: having just 1 down Gemini out of 6,114 can slow an application by >20%.
       • 0.0156% of components unavailable can extend an application run time by >20% if the components just happen to be in the wrong place.
       • P3DNS - 6,114 nodes
  14. Topology
     • 1 poorly placed node out of 4,116 (0.02%) can slow an application by >30% - on a dedicated system!
     • It is hard to get optimal topology assignments, especially in non-dedicated use, but it should be easy to avoid really detrimental topology assignments.
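One cheap way to flag a detrimental assignment is to score a candidate allocation by its worst-case hop distance. A minimal sketch, assuming a 3D wraparound torus with illustrative dimensions (not the actual Gemini mesh) and hypothetical node coordinates:

```python
from itertools import combinations

def torus_hops(a, b, dims):
    """Minimum hop count between two node coordinates on a wraparound torus."""
    return sum(min((x - y) % d, (y - x) % d) for x, y, d in zip(a, b, dims))

def allocation_diameter(nodes, dims):
    """Worst-case pairwise hop distance within a candidate node set."""
    return max(torus_hops(a, b, dims) for a, b in combinations(nodes, 2))

dims = (8, 8, 8)  # illustrative torus dimensions
compact  = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)]
straggly = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (4, 4, 4)]
print(allocation_diameter(compact, dims))   # neighbors: small diameter
print(allocation_diameter(straggly, dims))  # one far node inflates worst-case hops
```

Even this crude diameter check captures the slide's point: a single badly placed node dominates the worst-case communication distance, so a scheduler that rejects high-diameter allocations avoids the pathological cases without needing a fully optimal placement.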
  15. Topology Awareness Is Needed for All Types of Interconnects
     • Tori, trees, hypercubes, direct connect, and dragonflies.
  16. Performance and Scalability through Flexibility
     • It is getting harder for applications to scale in the face of limited bandwidth.
     • BW works with science teams and technology providers to:
       • Understand and develop better process-to-node mapping analysis to determine behavior and usage patterns.
       • Build better instrumentation of what the network is really doing.
       • Provide topology-aware resource and systems management that enables and rewards topology-aware applications.
     • Malleability - for applications and systems:
       • Understanding the topology given, and maximizing effectiveness.
       • Being able to express a desired topology based on the algorithms.
       • Middleware support.
     • Even if applications scale, consistency becomes an increasing issue for systems and applications.
       • This will only get worse in future systems.
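As a toy illustration of the process-to-node mapping point above, the sketch below compares the hop cost of a nearest-neighbor ring communication pattern under a naive rank order versus a trivially reordered one. The torus dimensions, node list, and sort-based heuristic are all illustrative assumptions; real topology-aware runtimes use far richer models and cost functions.

```python
def hops(a, b, dims):
    # Minimum hop count between node coordinates on a wraparound torus.
    return sum(min((x - y) % d, (y - x) % d) for x, y, d in zip(a, b, dims))

def ring_cost(order, dims):
    # Total hops for a nearest-neighbor ring: rank i talks to rank i+1.
    return sum(hops(order[i], order[i + 1], dims) for i in range(len(order) - 1))

dims = (8, 8, 8)                                   # illustrative torus
allocated = [(0, 0, 0), (3, 0, 0), (1, 0, 0), (2, 0, 0)]  # scheduler-given nodes

naive  = ring_cost(allocated, dims)                # ranks placed in arrival order
mapped = ring_cost(sorted(allocated), dims)        # trivial topology-aware remap
print(naive, mapped)
```

The same allocation costs half as many hops once ranks are ordered along the network, which is why the slide argues the mapping must be expressible by applications and honored by the resource manager rather than left to chance.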
  17. Flexible Resiliency Modes
     • Run again.
     • Defensive I/O (traditional checkpoint/restart):
       • Expensive - extra overhead for the application and the system.
       • Intrusive.
       • The I/O infrastructure is shared across all jobs.
       • New C/R approaches: node memory copy, SSD, journaling, ...
     • Spare nodes in job requests, to rebalance work after a single point of failure:
       • Wastes resources.
       • Runtimes do not support this well yet (but can do it).
     • Redistribute work within the remaining nodes:
       • Charm++, some MPI implementations.
       • Takes longer.
     • Add spare nodes from a system pool to the job:
       • The job scheduler, resource manager, and runtime all have to be made more flexible.
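Defensive I/O in the sense above is usually implemented inside the application itself. A minimal, hypothetical sketch of application-level checkpoint/restart follows; the file name, JSON format, and checkpoint interval are illustrative choices, not anything Blue Waters or its schedulers prescribe.

```python
import json
import os
import tempfile

CKPT = "state.ckpt"  # hypothetical checkpoint path

def save_checkpoint(state, path=CKPT):
    """Write state atomically so a crash mid-write never corrupts the file."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic rename: old checkpoint stays valid until now

def load_checkpoint(path=CKPT):
    """Resume from the last good checkpoint, or start fresh."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "acc": 0.0}

state = load_checkpoint()
while state["step"] < 10:
    state["acc"] += state["step"]   # stand-in for one unit of real work
    state["step"] += 1
    if state["step"] % 5 == 0:      # interval trades checkpoint I/O vs. rework
        save_checkpoint(state)
print(state["acc"])
```

Writing to a temporary file and atomically renaming it means a crash during the write leaves the previous checkpoint intact - the kind of per-application discipline the slide calls expensive and intrusive, which is exactly why richer system-supported resiliency modes are attractive.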
  18. Observation 2: Resiliency Flexibility Is Critical
     • Migrate from checkpointing to application resiliency.
       • Traditional system-based checkpoint/restart is no longer viable.
       • Defensive I/O per application is inefficient, but it is the current state of the art.
     • Better application resiliency requires improvements in both systems and applications.
       • Several teams are moving to new frameworks (e.g. Charm++) to improve resiliency.
       • MPI is trying to add better features for resiliency.
  19. Resiliency Flexibility
     • Application-based resiliency:
       • Multiple layers of software and hardware have to coordinate information and reaction.
       • Analysis and understanding are needed before action.
       • Correct and actionable messages need to flow up and down the stack to the applications so they can take the proper action with correct information.
     • Application situational awareness - applications need to understand their circumstances and take action.
     • Flexible resource provisioning is needed in real time:
       • Replacing a failed node dynamically from a system pool of nodes.
       • Interaction with other constraints, so that sub-optimization does not adversely impact overall system optimization.
  20. The Chicken or the Egg
     • Applications cannot take advantage of features the system does not provide.
       • So they do the best they can with guesses.
     • Technology providers do not provide features because they say applications do not use them.
     My message is: we cannot brute-force our way to future @Scale systems and applications any longer.
  21. Many Other Observations - Other Presentations
     • Storage and I/O: significant challenges
     • System software quality and resiliency
     • Testing for function, feature, and performance at scale
     • Information gathering for the system
     • Application methods
     • Measuring real time-to-solution performance
     • System SW performance at scale
     • Heterogeneous components
     • Application consistency
     • Efficiency
       • Energy, TCO, utilization
     • S&E team productivity
     • ...
  22. Summary
     • Blue Waters is delivering on its commitment to sustained performance to the Nation for computational and data-focused @Scale problems.
     • We appreciate the tremendous efforts and support of all our technology providers and science team partners.
     • I am pleased to see Adaptive and Cray seriously addressing topology awareness issues - to meet BW-specific needs and hopefully beyond.
     • I am pleased Cray made initial improvements to enable application resiliency, but Adaptive, Cray, MPI, and other technology providers need to do much more to solve it.
     • I am very encouraged that application teams are willing (and desire) to implement flexibility in their codes if they have options.
     • We need more commonality across technology providers and implementations.
     • BW is an excellent platform for studying these issues, as well as an unprecedented S&E resource.
     • Stay tuned for amazing results from BW.
  23. Acknowledgements
     This work is part of the Blue Waters sustained-petascale computing project, which is supported by the National Science Foundation (award number OCI 07-25070) and the state of Illinois. Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign, its National Center for Supercomputing Applications, Cray, and the Great Lakes Consortium for Petascale Computation.
     The work described is achievable through the efforts of many others on different teams.