Blue Waters and Resource Management – Now and in the Future
Dr. William Kramer
National Center for Supercomputing Applications, University of Illinois
Moabcon 2013 – April 10, 2013 – Salt Lake City
Science & Engineering on Blue Waters
Blue Waters will enable advances in a broad range of science and engineering disciplines. Examples include: Molecular Science, Weather & Climate Forecasting, Earth Science, Astro*, Health.
What Have I been Doing?
Science areas, team counts, and codes (the original table also marked, for each area, which methods apply: structured grids, unstructured grids, dense matrix, sparse matrix, N-body, Monte Carlo, FFT, PIC, and significant I/O):
•  Climate and Weather (3 teams): CESM, GCRM, CM1/WRF, HOMME
•  Plasmas/Magnetosphere (2): H3D(M), VPIC, OSIRIS, Magtail/UPIC
•  Stellar Atmospheres and Supernovae (5): PPM, MAESTRO, CASTRO, SEDONA, ChaNGa, MS-FLUKSS
•  Cosmology (2): Enzo, pGADGET
•  Combustion/Turbulence (2): PSDNS, DISTUF
•  General Relativity (2): Cactus, Harm3D, LazEV
•  Molecular Dynamics (4): AMBER, Gromacs, NAMD, LAMMPS
•  Quantum Chemistry (2): SIAL, GAMESS, NWChem
•  Material Science (3): NEMOS, OMEN, GW, QMCPACK
•  Earthquakes/Seismology (2): AWP-ODC, HERCULES, PLSQR, SPECFEM3D
•  Quantum Chromodynamics (1): Chroma, MILC, USQCD
•  Social Networks (1): EPISIMDEMICS
•  Evolution (1): Eve
•  Engineering/System of Systems (1): GRIPS, Revisit
•  Computer Science (1)
Blue Waters Computing System
(System diagram)
•  Sonexion online storage: 26 usable PB, >1 TB/sec
•  Spectra Logic near-line storage: 300 usable PB, 120+ Gb/sec
•  External servers (100 GB/sec); 10/40/100 Gb Ethernet switch; IB switch; 100–300 Gbps WAN
•  Aggregate memory: 1.5 PB
(External server and I/O architecture diagram)
•  Interconnect types shown: 40 GbE, FDR IB, 10 GbE, QDR IB, Cray HSN
•  1.2 PB usable disk at 1,200 GB/s; additional 100 GB/s and 300 GB/s links in the diagram
•  28 Dell 720 IE servers and 4 Dell esLogin servers; online disk >25 PB for /home, /project, /scratch
•  LNET(s), rSIP GW, network GW, FC8
•  Protocols: LNET, TCP/IP (10 GbE), SCSI (FCP), GridFTP (TCP/IP)
•  380 PB raw tape, 50 Dell 720 near-line servers, 55 GB/s
•  40 GbE switch; 440 Gb/s Ethernet from the site network; core FDR/QDR IB Extreme switches; LAN/WAN
•  All storage sizes are given as the amount usable; rates are always usable/measured sustained rates
Cray XE6/XK7 - 276 Cabinets
•  XE6 compute nodes: 5,688 blades – 22,640 nodes – 362,240 FP (Bulldozer) cores – 724,480 integer cores; 4 GB per FP core
•  XK7 GPU nodes: 768 blades – 3,072 (4,224) nodes – 24,576 (33,792) FP cores – 4,224 GPUs; 4 GB per FP core
•  Sonexion: 25+ usable PB online storage in 36 racks; near-line storage: 300+ usable PB
•  Service nodes: DSL 48, Resource Manager (MOM) 64, H2O login 4, boot 2, SDB 2, network GW 8, RSIP 12, reserved 74, LNET routers 582
•  Import/export nodes, HPSS data mover nodes, management node, esServers cabinets
•  Gemini fabric (HSN), InfiniBand fabric, 10/40/100 Gb Ethernet switch, boot RAID, boot cabinet, SMW, NCSAnet
•  Supporting systems: LDAP, RSA, Portal, JIRA, Globus CA, Bro, test systems, Accounts/Allocations, CVS, Wiki
•  Cyber protection IDPS, NPCF, SCUBA
BW Focus on Sustained Performance
•  Blue Waters and NSF are focusing on sustained performance in a way few have before.
•  Sustained performance is the computer's useful, consistent performance on the broad range of applications that scientists and engineers use every day.
•  Time to solution for a given amount of work is the important metric – not hardware ops/s
•  Sustained performance (and therefore the tests) includes the time to read data and write the results
•  NSF's Track-1 call emphasized sustained performance, demonstrated on a collection of application benchmarks (application + problem set)
•  Not just simplistic metrics (e.g. HP Linpack)
•  Applications include both Petascale applications (effectively using the full machine, solving scalability problems for both compute and I/O) and applications that use a large fraction of the system
•  The Blue Waters project focus is on delivering sustained Petascale performance to computational and data-focused applications
•  Develop tools, techniques, and samples that exploit all parts of the system
•  Explore new tools, programming models, and libraries to help applications get the most from the system
•  By the Sustained Petascale Performance metric, Blue Waters sustained >1.3 PF across 12 different time-to-solution application tests (see the sketch below).
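A sustained-performance number of this kind is typically a geometric mean of per-application sustained rates (reference work divided by measured time to solution, I/O included). A minimal sketch, with made-up application names and numbers rather than the actual SPP benchmark suite or data:

```python
from math import prod

def sustained_performance(benchmarks):
    """Geometric mean of per-application sustained rates (PF/s).

    benchmarks: list of dicts with hypothetical fields:
      'name', 'ref_pflop' (reference work in petaFLOP),
      'time_s' (measured time to solution, including I/O).
    """
    rates = [b["ref_pflop"] / b["time_s"] for b in benchmarks]
    return prod(rates) ** (1.0 / len(rates))

# Illustrative (made-up) numbers only.
apps = [
    {"name": "NAMD", "ref_pflop": 1800.0, "time_s": 1400.0},
    {"name": "MILC", "ref_pflop": 1500.0, "time_s": 1250.0},
    {"name": "WRF",  "ref_pflop": 1200.0, "time_s": 1000.0},
]
print(f"Sustained performance ~ {sustained_performance(apps):.2f} PF/s")
```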
View from the Blue Waters Portal
As of April 2, 2013, Blue Waters has delivered over 1.3 billion core-hours to S&E teams.
Usage Breakdown – Jan 1 to Mar 26, 2013
•  Torque log accounting (NCSA, Mike Showerman); a log-parsing sketch follows below
[Chart: Accumulated XE node-hours, January 1 to March 26, 2013 – usage (M node-hours, 0–35) versus XE job size in nodes (1 node = 32 cores), power-of-2 bins from 1 to 32,768 nodes; annotations mark job sizes above 65,536 cores and above 262,144 cores.]
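As an illustration of the processing behind such a breakdown, here is a minimal sketch that bins node-hours from Torque/PBS accounting "E" (job end) records by power-of-two job size. The exact record fields and the log path are assumptions for illustration, not the NCSA tooling itself:

```python
import re
from collections import defaultdict
from math import log2

def walltime_hours(hms):
    """Convert 'HH:MM:SS' to hours."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h + m / 60 + s / 3600

def node_hours_by_size(lines):
    """Accumulate node-hours into power-of-2 job-size bins.

    Assumes Torque accounting 'E' records whose trailing text contains
    'Resource_List.nodes=<N>...' and 'resources_used.walltime=HH:MM:SS'
    key=value pairs.
    """
    bins = defaultdict(float)
    for line in lines:
        if ";E;" not in line:          # only completed-job records
            continue
        nodes = re.search(r"Resource_List\.nodes=(\d+)", line)
        wall = re.search(r"resources_used\.walltime=(\d+:\d+:\d+)", line)
        if not (nodes and wall):
            continue
        n = int(nodes.group(1))
        size_bin = 2 ** round(log2(n)) if n > 0 else 1
        bins[size_bin] += n * walltime_hours(wall.group(1))
    return dict(sorted(bins.items()))

# Usage sketch (hypothetical log path):
# with open("/var/spool/torque/server_priv/accounting/20130326") as f:
#     print(node_hours_by_size(f))
```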
OBSERVATIONS AND THOUGHTS: FLEXIBILITY IS THE WORD OF THE NEXT DECADE
What is Blue Waters already telling us about future @Scale systems?
Observation 1: Topology Matters
•  Much of the performance-improvement work for early applications was understanding and tuning for layout/topology, even on dedicated systems
•  Factors of almost 10 were seen for some applications
•  Nvidia's Linpack results are mostly due to topology-aware work layout
•  Done with hand tuning, special node selection, etc.
•  Needs to become commonplace to really benefit users
Topology Matters
•  Even very small changes can have dramatic and unexpected consequences.
•  Example: having just 1 down Gemini out of 6,114 can slow an application by >20%
•  0.0156% of components being unavailable can extend an application's run time by >20% if those components just happen to be in the wrong place
•  P3DNS – 6,114 nodes
Topology
•  1 poorly placed node out of 4,116 (0.02%) can slow an application by >30%
•  On a dedicated system!
•  It is hard to get optimal topology assignments, especially in non-dedicated use, but it should be easy to avoid really detrimental topology assignments (see the sketch below).
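One way to avoid detrimental assignments is to score a candidate allocation before a job starts: compute pairwise hop distances on the 3D torus and flag nodes that sit far from the rest of the allocation. A minimal sketch, with assumed (x, y, z) Gemini coordinates and torus extents; this is not the Blue Waters or Moab placement algorithm:

```python
from itertools import combinations

TORUS_DIMS = (24, 24, 24)  # assumed torus extents, for illustration only

def torus_hops(a, b, dims=TORUS_DIMS):
    """Minimum hop count between two (x, y, z) coordinates on a 3D torus."""
    return sum(min(abs(p - q), d - abs(p - q)) for p, q, d in zip(a, b, dims))

def placement_report(nodes, dims=TORUS_DIMS):
    """Return (max pairwise hops, worst node) for a candidate allocation.

    A node whose average distance to the rest of the allocation is far
    above the mean is a candidate 'detrimental' straggler to swap out.
    """
    max_hops = max(torus_hops(a, b, dims) for a, b in combinations(nodes, 2))
    avg = {n: sum(torus_hops(n, m, dims) for m in nodes) / (len(nodes) - 1)
           for n in nodes}
    worst = max(avg, key=avg.get)
    return max_hops, worst

# Toy allocation: a compact block plus one far-away node.
alloc = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0), (12, 12, 12)]
print(placement_report(alloc))   # the (12, 12, 12) node stands out
```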
Topology Awareness Needed for
All Types of Interconnects
•  Tori, trees, hypercubes, direct connect, and dragonflies
Performance and Scalability through
Flexibility
•  It is harder for applications to scale in the face of limited bandwidths.
•  BW works with science teams and technology providers to:
•  Understand and develop better process-to-node mapping analysis to determine behavior and usage patterns (see the mapping sketch below)
•  Better instrumentation of what the network is really doing
•  Topology-aware resource and systems management that enables and rewards topology-aware applications
•  Malleability – for applications and systems
•  Understanding the topology given and maximizing effectiveness
•  Being able to express a desired topology based on the algorithms
•  Middleware support
•  Even if applications scale, consistency becomes an increasing issue for systems and applications
•  This will only get worse in future systems
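One concrete form of process-to-node mapping analysis is to estimate how many torus hops a job's dominant communication pattern would traverse under a candidate rank-to-node mapping, and compare mappings before launch. A minimal sketch for a 2D nearest-neighbor halo exchange, using the same assumed torus coordinates as the placement sketch above; it is illustrative only, not an actual Cray or Adaptive tool:

```python
TORUS = (24, 24, 24)  # assumed torus extents, as in the earlier sketch

def hops(a, b, dims=TORUS):
    """Minimum hop count between (x, y, z) coordinates on a 3D torus."""
    return sum(min(abs(p - q), d - abs(p - q)) for p, q, d in zip(a, b, dims))

def stencil_cost(rank_to_node, px, py):
    """Total hops for one halo exchange of a px-by-py 2D process grid,
    with ranks laid out row-major (rank = iy * px + ix)."""
    cost = 0
    for iy in range(py):
        for ix in range(px):
            me = rank_to_node[iy * px + ix]
            right = rank_to_node[iy * px + (ix + 1) % px]
            down = rank_to_node[((iy + 1) % py) * px + ix]
            cost += hops(me, right) + hops(me, down)
    return cost

# Compare two candidate mappings of a 4x4 process grid onto 16 nodes.
nodes = [(x, y, 0) for y in range(4) for x in range(4)]
contiguous = nodes                       # ranks follow node coordinates
interleaved = nodes[::2] + nodes[1::2]   # a likely worse mapping
print(stencil_cost(contiguous, 4, 4), stencil_cost(interleaved, 4, 4))
```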
Flexible Resiliency Modes
•  Run again
•  Defensive I/O (traditional checkpoint/restart)
•  Expensive
•  Extra overhead for the application and the system
•  Intrusive
•  I/O infrastructure is shared across all jobs
•  New C/R (node memory copy, SSD, journaling, …)
•  Spare nodes in job requests to rebalance work in case of a single point of failure
•  Wastes resources
•  Runtimes do not support this well yet (but can)
•  Redistribute work within the remaining nodes (see the sketch below)
•  Charm++, some MPI implementations
•  Takes longer
•  Add spare nodes from a system pool to the job
•  Job scheduler, resource manager, and runtime all have to be made more flexible
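As a small illustration of redistributing work within the remaining nodes, the sketch below recomputes a 1D block decomposition after a node failure. It shows only the bookkeeping; in practice the runtime (e.g. Charm++ or a fault-tolerant MPI) has to migrate data and re-establish communication, and the node names here are hypothetical:

```python
def block_ranges(n_items, workers):
    """Assign contiguous blocks of n_items as evenly as possible to workers."""
    base, extra = divmod(n_items, len(workers))
    ranges, start = {}, 0
    for i, w in enumerate(workers):
        size = base + (1 if i < extra else 0)
        ranges[w] = (start, start + size)
        start += size
    return ranges

def rebalance_after_failure(n_items, workers, failed):
    """Drop the failed worker and recompute the decomposition.

    The surviving workers each take a larger block, so the job runs
    slower but completes without a full restart.
    """
    survivors = [w for w in workers if w != failed]
    if not survivors:
        raise RuntimeError("no workers left")
    return block_ranges(n_items, survivors)

nodes = ["nid00010", "nid00011", "nid00012", "nid00013"]
print(block_ranges(1_000_000, nodes))
print(rebalance_after_failure(1_000_000, nodes, failed="nid00012"))
```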
Observation 2: Resiliency Flexibility Critical
•  Migrate from checkpointing to application resiliency
•  Traditional system-based checkpoint/restart is no longer viable
•  Defensive I/O per application is inefficient, but it is the current state of the art
•  Better application resiliency requires improvements in both systems and applications
•  Several teams are moving to new frameworks (e.g. Charm++) to improve resiliency
•  MPI is trying to add better features for resiliency
Resiliency Flexibility
•  Application-based resiliency
•  Multiple layers of software and hardware have to coordinate information and reactions
•  Analysis and understanding are needed before action
•  Correct and actionable messages need to flow up and down the stack to the applications so they can take the proper action with correct information
•  Application situational awareness – need to understand circumstances and take action
•  Flexible resource provisioning is needed in real time
•  Replacing failed nodes dynamically from a system pool of nodes
•  Interaction with other constraints so sub-optimization does not adversely impact overall system optimization
The Chicken or the Egg
•  Applications cannot take advantage of features the system does not provide
•  So they do the best they can with guesses
•  Technology providers do not provide features because they say applications do not use them
My message is:
We cannot brute-force our way to future @scale systems and applications any longer.
Many Other Observations – Other
Presentations
•  Storage and I/O significant challenges
•  System software quality and resiliency
•  Testing for function, feature, and performance at scale
•  Information gathering for the system
•  Application methods
•  Measuring real time-to-solution performance
•  System SW performance at scale
•  Heterogeneous components
•  Application consistency
•  Efficiency
•  Energy, TCO, utilization
•  S&E team productivity
•  …
Summary
•  Blue Waters is delivering on its commitment of sustained performance to the Nation for computational and data-focused @Scale problems.
•  We appreciate the tremendous efforts and support of all our technology providers and science-team partners
•  I am pleased to see Adaptive and Cray seriously addressing topology-awareness issues – to meet BW-specific needs and hopefully beyond
•  I am pleased Cray made initial improvements to enable application resiliency, but Adaptive, Cray, MPI, and other technology providers need to do much more
•  I am very encouraged that application teams are willing (and eager) to implement flexibility in their codes if they have options
•  We need more commonality across technology providers and implementations
•  BW is an excellent platform for studying these issues as well as providing an unprecedented S&E resource
•  Stay tuned for amazing results from BW
Acknowledgements
This work is part of the Blue Waters sustained-petascale computing project,
which is supported by the National Science Foundation (award number OCI
07-25070) and the state of Illinois. Blue Waters is a joint effort of the
University of Illinois at Urbana-Champaign, its National Center for
Supercomputing Applications, Cray, and the Great Lakes Consortium for
Petascale Computation.
The work described is achievable only through the efforts of many others on different teams.
