An update on CSCS
Thomas C. Schulthess
ETH Domain
[Map: ETH Domain sites in Lausanne, Basel, Thoune, Villigen, Birmensdorf, Zürich, Dübendorf, St. Gallen, Kastanienbaum, Bellinzona, Lugano-Cornaredo, Davos, Neuchâtel and Sion]
ETH Zurich
EPFL, Lausanne
PSI
WSL
Empa
Eawag
Centro Svizzero di Calcolo Scientifico (CSCS)
The Swiss National Supercomputing Center
• Established in 1991 as a unit of ETH Zurich
• Located in Lugano, Ticino
• 115 highly qualified staff (25f) from 26 nations
• Flexible infrastructure: power/cooling capacity of 12 MW, upgradable to 25 MW
• Develops & operates the key supercomputing capabilities required to solve important problems in science and/or society
• A research infrastructure with ~2000 users and 200 projects
• Leads the national strategy for High-Performance Computing and Networking (HPCN) that was passed by the Swiss Parliament in 2009
• Has operated a dedicated User Laboratory for supercomputing since 2011 (i.e. a research infrastructure funded by the ETH Domain on a programmatic basis)
• From 2017, a tier-0 system of the “Partnership for Advanced Computing in Europe” (PRACE)
User Lab allocations 2018 – by research fields
[Charts: node hours; storage]
User Lab allocations 2018 – by institutions
“Piz Daint” 2017 fact sheet
http://www.cscs.ch/publications/fact_sheets/index.html
~5’000 NVIDIA P100 GPU-accelerated nodes
~1’400 dual-socket multi-core nodes
Institutions using Piz Daint (in 2019)
• User Lab (including PRACE Tier-0 allocations)
• University of Zurich, USI, PSI, EMPA
• NCCR MARVEL and HBP (EPFL)
• CHIPP (Swiss LHC Grid since Aug. 2017)
• Others (exploratory)
CSCS vision for next generation systems
• Performance goal: develop a general-purpose system (for all domains) with enough performance to run “exascale weather and climate simulations” by 2022; specifically,
  • run a global model with 1 km horizontal resolution at a throughput of one simulated year per day, on a system with a footprint similar to that of Piz Daint;
• Functional goal: converged Cloud and HPC services in one infrastructure
  • Support most native Cloud services on the supercomputer replacing Piz Daint in 2022
  • In particular, focus on software-defined infrastructure (networking, storage and compute) and service orientation
Pursue clear and ambitious goals for the successor of Piz Daint
[Article: “Today’s Outlook: GPU-accelerated Weather Forecasting”, John Russell, September 15, 2015]
“Piz Kesch”
Since April 2016, the Swiss version* of the COSMO model has been running operationally on GPUs
(*) The Swiss version of the COSMO model runs at 1 km horizontal resolution over the Alpine region and was (in 2016) ~10x more efficient than the state of the art
MeteoSwiss’ performance ambitions in 2013
[Chart: required performance improvement (scale 1x to 40x) at constant budget for investments and operations; requirements from MeteoSwiss: grid 2.2 km → 1.1 km, ensemble with multiple forecasts, data assimilation; labelled factors 24x, 10x and 6x]
We need a 40x improvement between 2012 and 2015 at constant cost
COSMO: old and new (refactored) code
[Diagram: software stack of the old and the new code]
Old (current) code: main (Fortran); physics (Fortran); dynamics (Fortran); MPI; system
New code: main (Fortran); physics (Fortran, with OpenMP / OpenACC); dynamics (C++) on a stencil library with X86*, CUDA, ROCm and Phi* backends; boundary conditions & halo exchange via a Generic Comm. Library; MPI or whatever; system; Shared Infrastructure
* two different OpenMP backends
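The layered design above separates the numerical description of the dynamics from its hardware-specific execution. The Python sketch below illustrates that idea under our own assumptions. A stencil is written once as a point-wise function. Interchangeable backends then execute it: here a serial loop and a thread pool stand in for targets such as x86 or CUDA. The names `laplacian`, `run` and `BACKENDS` are hypothetical and are not the API of the COSMO stencil library.

```python
"""Hypothetical sketch: one stencil definition, several interchangeable backends.

None of these names come from COSMO or its stencil library; they only show the
separation between *what* a stencil computes and *how* a backend executes it.
"""
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Field = List[List[float]]
# A point-wise stencil receives the whole field and one interior index (i, j).
Stencil = Callable[[Field, int, int], float]


def laplacian(f: Field, i: int, j: int) -> float:
    """5-point Laplacian, written once and independent of any backend."""
    return f[i - 1][j] + f[i + 1][j] + f[i][j - 1] + f[i][j + 1] - 4.0 * f[i][j]


def _apply_row(stencil: Stencil, f: Field, i: int) -> List[float]:
    # Boundary columns stay 0.0; a real code would fill them via halo exchange.
    ncols = len(f[0])
    row = [0.0] * ncols
    for j in range(1, ncols - 1):
        row[j] = stencil(f, i, j)
    return row


def serial_backend(stencil: Stencil, f: Field) -> Field:
    """Plain loops over the interior rows."""
    out = [[0.0] * len(f[0]) for _ in range(len(f))]
    for i in range(1, len(f) - 1):
        out[i] = _apply_row(stencil, f, i)
    return out


def threaded_backend(stencil: Stencil, f: Field) -> Field:
    """Same computation, with interior rows distributed over a thread pool."""
    out = [[0.0] * len(f[0]) for _ in range(len(f))]
    interior = range(1, len(f) - 1)
    with ThreadPoolExecutor() as pool:
        rows = list(pool.map(lambda i: _apply_row(stencil, f, i), interior))
    for i, row in zip(interior, rows):
        out[i] = row
    return out


BACKENDS: Dict[str, Callable[[Stencil, Field], Field]] = {
    "serial": serial_backend,
    "threaded": threaded_backend,
}


def run(stencil: Stencil, f: Field, backend: str = "serial") -> Field:
    """Dispatch the same stencil definition to the selected backend."""
    return BACKENDS[backend](stencil, f)


if __name__ == "__main__":
    # f(i, j) = i*i + j*j has a discrete Laplacian of exactly 4 at interior points.
    field = [[float(i * i + j * j) for j in range(6)] for i in range(5)]
    serial = run(laplacian, field, "serial")
    threaded = run(laplacian, field, "threaded")
    assert serial == threaded
    print(serial[2][2])  # 4.0
```

Adding a backend only means registering another function in `BACKENDS`. The stencil itself does not change, which is the property that makes a CPU-to-GPU port tractable.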
Where the factor 40 improvement came from
[Chart: improvement factor (scale 1x to 40x) at constant budget for investments and operations; requirements from MeteoSwiss: grid 2.2 km → 1.1 km, ensemble with multiple forecasts, data assimilation; labelled factors 24x, 10x and 6x]
• 1.7x from software refactoring (old vs. new implementation on x86)
• 2.8x from mathematical improvements (resource utilisation, precision)
• 2.8x from Moore’s Law & architectural improvements on x86
• 2.3x from the change in architecture (CPU → GPU)
• 1.3x from additional processors
Investment in software allowed the mathematical improvements and the change in architecture
There is no silver bullet!
Bonus: reduction in power!
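The five contributions listed above compound multiplicatively, and that is how they reach the required factor of about 40. Adding them would give only about 11x. Below is a quick check in Python; the dictionary labels are our own shorthand for the slide's items.

```python
import math

# Contributions to the overall speed-up, as listed on the slide.
contributions = {
    "software refactoring (x86)": 1.7,
    "mathematical improvements": 2.8,
    "Moore's Law & arch. improvements (x86)": 2.8,
    "architecture change (CPU -> GPU)": 2.3,
    "additional processors": 1.3,
}

# Independent speed-ups multiply; math.prod needs Python 3.8 or later.
total = math.prod(contributions.values())
print(f"combined factor: {total:.1f}x")  # combined factor: 39.9x
```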
Leadership in weather and climate
“European model may be the best – but far away from sufficient accuracy and reliability!”
Peter Bauer, ECMWF
Resolving convective clouds (convergence?)
Structural convergence
Statistics of cloud ensemble: e.g., spacing and size of convective clouds
Bulk convergence
Area-averaged bulk effects upon ambient flow: e.g., heating and moistening of cloud layer
Source: Christoph Schär, ETH Zurich
Structural and bulk convergence
[Left panel: statistics of cloud area; no structural convergence]
[Right panel: statistics of up- & downdrafts; bulk statistics of updrafts converge]
Factor 4
(Panosetti et al. 2018)
Source: Christoph Schär, ETH Zurich
Can the delivery of a 1 km-scale capability be pulled in by a decade?
Source: Christoph Schär, ETH Zurich, & Nils Wedi, ECMWF
Our “exascale” goal for 2022
Horizontal resolution: 1 km (globally quasi-uniform)
Vertical resolution: 180 levels (surface to ~100 km)
Time resolution: less than 1 minute
Coupled: land-surface/ocean/ocean-waves/sea-ice
Atmosphere: non-hydrostatic
Precision: single (32-bit) or mixed precision
Compute rate: 1 SYPD (simulated year per wall-clock day)
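To make the goal above concrete, the Python sketch below estimates the problem size and the wall-clock budget per time step that 1 SYPD implies. It rests on several assumptions of our own. Earth's mean radius is taken as 6371 km. A 365-day year is used. The time step is set to 60 s, which is only an upper bound, since the goal says "less than 1 minute". Shorter steps shrink the per-step budget proportionally.

```python
import math

EARTH_RADIUS_KM = 6371.0   # assumed mean Earth radius
DX_KM = 1.0                # horizontal resolution (from the goal)
LEVELS = 180               # vertical levels (from the goal)
DT_S = 60.0                # assumed time step: upper bound of "less than 1 minute"
TARGET_SYPD = 1.0          # simulated years per wall-clock day
SECONDS_PER_DAY = 86_400.0
DAYS_PER_YEAR = 365.0      # assumed calendar

# Quasi-uniform grid: number of columns ~ Earth's surface area / cell area.
columns = 4.0 * math.pi * EARTH_RADIUS_KM**2 / DX_KM**2
grid_points = columns * LEVELS

# Time steps per simulated year and the wall-clock time allowed for each one.
steps_per_sim_year = DAYS_PER_YEAR * SECONDS_PER_DAY / DT_S
wallclock_per_step_s = SECONDS_PER_DAY / (TARGET_SYPD * steps_per_sim_year)

print(f"grid columns : {columns:.2e}")
print(f"grid points  : {grid_points:.2e}")
print(f"steps / year : {steps_per_sim_year:.0f}")
print(f"budget / step: {wallclock_per_step_s * 1000:.0f} ms")
```

Under these assumptions, the model has roughly 5×10^8 columns and about 9×10^10 grid points. Each full time step must then complete in at most about 0.16 s of wall-clock time.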
Running COSMO 5.0 & IFS (“the European Model”) at global scale on Piz Daint
Scaling to full system size: ~5300 GPU-accelerated nodes available
Running a near-global (±80°, covering 97% of Earth’s surface) COSMO 5.0 simulation & IFS
> Either on the host processors: Intel Xeon E5-2690 v3 (Haswell, 12 cores)
> Or on the GPU accelerator: PCIe version of the NVIDIA GP100 (Pascal) GPU
The baseline for COSMO-global and IFS
Memory use efficiency
Memory use efficiency = (necessary data transfers / actual data transfers) × (achieved BW / max achievable BW (STREAM)) = 0.88 × 0.76 = 0.67
With regard to peak BW instead of STREAM BW, the efficiency is 0.55 (2x lower than peak BW)
Fuhrer et al., Geosci. Model Dev. Discuss., https://doi.org/10.5194/gmd-2017-230, published 2018
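The efficiency metric above is the product of two ratios. The first is how much of the data moved was actually necessary. The second is how much of the achievable (STREAM) bandwidth was reached. Below is a minimal Python sketch of that arithmetic. The function name and the normalised inputs are our own. The implied STREAM-to-peak ratio at the end is our inference from the slide's rounded values, not a reported measurement.

```python
def memory_use_efficiency(necessary_transfers: float, actual_transfers: float,
                          achieved_bw: float, max_achievable_bw: float) -> float:
    """Data-transfer efficiency multiplied by bandwidth efficiency."""
    transfer_efficiency = necessary_transfers / actual_transfers
    bandwidth_efficiency = achieved_bw / max_achievable_bw
    return transfer_efficiency * bandwidth_efficiency


# The slide reports the two ratios directly, so pass them as normalised values.
mue = memory_use_efficiency(0.88, 1.0, 0.76, 1.0)
print(f"memory use efficiency: {mue:.2f}")  # memory use efficiency: 0.67

# Replacing STREAM BW with peak BW in the denominator gives 0.55 on the slide;
# with the rounded values this implies STREAM reaches roughly 82% of peak BW.
implied_stream_to_peak = 0.55 / mue
print(f"implied STREAM / peak BW: {implied_stream_to_peak:.2f}")
```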
Can the 100x shortfall of a grid-based implementation like COSMO-global be overcome?
1. Icosahedral/octahedral grid (ICON/IFS) vs. lat-long/Cartesian grid (COSMO): 4x
  • 2x fewer grid columns
  • Time step of 10 s instead of 5 s
2. Improving BW efficiency: 2x
  • Improve BW efficiency and peak BW (results on Volta show this is realistic)
3. Strong scaling: 3x
  • 4x possible in COSMO, but we reduced available parallelism by a factor of 1.33
4. Remaining reduction in shortfall: 4x
  • Numerical algorithms (larger time steps)
  • Further improved processors / memory
But we don’t want to increase the footprint of the 2022 system succeeding “Piz Daint”
Combined: 100x
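The four items above multiply to roughly the required factor of 100. Here is a Python check of that arithmetic, using the factors as given on the slide. We read the strong-scaling item as 4x divided by the 1.33x loss of parallelism.

```python
# Reduction factors for the ~100x shortfall, as listed on the slide.
grid = 2.0 * 2.0              # 2x fewer grid columns times a 2x longer time step
bandwidth = 2.0               # better BW efficiency and higher peak BW
strong_scaling = 4.0 / 1.33   # 4x possible, reduced by 1.33x less parallelism
remaining = 4.0               # numerical algorithms, better processors / memory

total = grid * bandwidth * strong_scaling * remaining
print(f"strong scaling ~{strong_scaling:.1f}x, total ~{total:.0f}x")
# strong scaling ~3.0x, total ~96x
```

The product, about 96x, is consistent with the slide's rounded target of 100x.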
What about ensembles and throughput for climate?
(Remaining goals beyond 2022)
1. Improve the throughput to 5 SYPD
2. Reduce the footprint of a single simulation by up to a factor of 10-50
[Figure: memory use efficiency as the product of (necessary / actual data transfers) and (achieved / max achievable BW)]
Change the architecture from control-flow-centric to data-flow-centric (reduce necessary data transfers)
We may have to change the footprint of machines to hyperscale!
LUMI Consortium
• A large consortium with strong national HPC centres and competence provides a unique opportunity for
  • knowledge transfer;
  • synergies in operations; and
  • regionally adaptable user support for extreme-scale systems
• National & EU investments (2020-2026):
Finland 50 M€
Belgium 15.5 M€
Czech Republic 5 M€
Denmark 6 M€
Estonia 2 M€
Norway 4 M€
Poland 5 M€
Sweden 7 M€
Switzerland 10 M€
EU 104 M€
Plus additional investments in applications development
Kajaani Data Center (LUMI)
100% hydroelectric energy, up to 200 MW
2200 m² floor space, expandable up to 4600 m²
Waste heat reuse: effective energy price of 35 €/MWh; negative CO2 footprint: 13500 tons reduced every year
One power grid outage in 36 years
100% free cooling @ PUE 1.03
Extreme connectivity: the Kajaani DC is a direct part of the Nordic backbone; 4x100 Gbit/s in place; can easily be scaled up to multi-terabit level
Zero network downtime since the establishment of the DC in 2012
Collaborators on Exascale (climate)
Tim Palmer (U. of Oxford)
Christoph Schär (ETH Zurich)
Oliver Fuhrer (MeteoSwiss)
Peter Bauer (ECMWF)
Bjorn Stevens (MPI-M)
Torsten Hoefler (ETH Zurich)
Nils Wedi (ECMWF)
Thank you!

Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 

Recently uploaded (20)

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 

An Update on CSCS

  • 1. An update on CSCS, Thomas C. Schulthess (T. Schulthess)
  • 2. ETH Domain
      • Institutions: ETH Zurich, EPFL (Lausanne), PSI, WSL, Empa, Eawag
      • Locations shown on the map: Lausanne, Basel, Thoune, Villigen, Birmensdorf, Zürich, Dübendorf, St. Gallen, Kastanienbaum, Bellinzona, Lugano-Cornaredo, Davos, Neuchâtel, Sion
  • 3. Centro Svizzero di Calcolo Scientifico (CSCS): The Swiss National Supercomputing Center
      • Established in 1991 as a unit of ETH Zurich
      • Located in Lugano, Ticino
      • 115 highly qualified staff (25 women) from 26 nations
      • Flexible infrastructure: 12 MW of power/cooling, upgradable to 25 MW
      • Develops and operates the key supercomputing capabilities required to solve important problems for science and/or society
      • A research infrastructure with ~2000 users and 200 projects
      • Leads the national strategy for High-Performance Computing and Networking (HPCN), which was passed by the Swiss Parliament in 2009
      • Has had a dedicated User Laboratory for supercomputing since 2011 (i.e. a research infrastructure funded by the ETH Domain on a programmatic basis)
      • From 2017, provides a tier-0 system of the “Partnership for Advanced Computing in Europe” (PRACE)
  • 4. User Lab allocations 2018, by research fields (charts: node hours, storage)
  • 5. User Lab allocations 2018, by institutions (chart)
  • 6. “Piz Daint” 2017 fact sheet (http://www.cscs.ch/publications/fact_sheets/index.html)
      • ~5’000 NVIDIA P100 GPU-accelerated nodes
      • ~1’400 dual-socket multi-core nodes
      • Institutions using Piz Daint (in 2019):
          • User Lab (including PRACE Tier-0 allocations)
          • University of Zurich, USI, PSI, Empa
          • NCCR MARVEL and HBP (EPFL)
          • CHIPP (Swiss LHC Grid since Aug. 2017)
          • Others (exploratory)
  • 7. CSCS vision for next-generation systems: pursue clear and ambitious goals for the successor of Piz Daint
      • Performance goal: develop a general-purpose system (for all domains) with enough performance to run “exascale weather and climate simulations” by 2022; specifically:
          • run a global model with 1 km horizontal resolution at a throughput of one simulated year per day, on a system with a footprint similar to that of Piz Daint
      • Functional goal: converged Cloud and HPC services in one infrastructure
          • support most native Cloud services on the supercomputer replacing Piz Daint in 2022
          • in particular, focus on software-defined infrastructure (networking, storage and compute) and service orientation
  • 8. “Piz Kesch”. News clipping: “Today’s Outlook: GPU-accelerated Weather Forecasting”, John Russell, September 15, 2015. Since April 2016, the Swiss version* of the COSMO model has been running operationally on GPUs.
      (*) The Swiss version of the COSMO model runs at 1 km horizontal resolution over the Alpine region and was (in 2016) ~10x more efficient than the state of the art.
  • 9. MeteoSwiss’ performance ambitions in 2013 (bar chart on a scale of 1x to 40x, at constant budget for investments and operations)
      • Requirements from MeteoSwiss: ensemble with multiple forecasts (24x), grid 2.2 km → 1.1 km (10x), data assimilation (6x)
      • We need a 40x improvement between 2012 and 2015 at constant cost?
  • 10. COSMO: old and new (refactored) code
      • Current code: main (Fortran), physics (Fortran) and dynamics (Fortran), on top of MPI and the system
      • New code: main (Fortran), physics (Fortran with OpenMP / OpenACC) and dynamics (C++); the dynamics sit on a shared infrastructure made of a stencil library (backends: x86*, CUDA, ROCm, Phi*) and a Generic Communication Library (boundary conditions & halo exchange), on top of MPI or whatever and the system
      • (*) two different OpenMP backends
  • 11. Where the factor 40 improvement came from (same chart as slide 9: requirements from MeteoSwiss at constant budget for investments and operations)
      • 1.7x from software refactoring (old vs. new implementation on x86)
      • 2.8x from mathematical improvements (resource utilisation, precision)
      • 2.8x from Moore’s Law & architectural improvements on x86
      • 2.3x from the change in architecture (CPU → GPU)
      • 1.3x from additional processors
      Investment in software allowed the mathematical improvements and the change in architecture. There is no silver bullet! Bonus: reduction in power!
  • 12. Leadership in weather and climate: the European model may be the best, but it is still far from sufficient accuracy and reliability! (Peter Bauer, ECMWF)
  • 13. Resolving convective clouds (convergence?) (Source: Christoph Schär, ETH Zurich)
      • Structural convergence: statistics of the cloud ensemble, e.g. spacing and size of convective clouds
      • Bulk convergence: area-averaged bulk effects upon the ambient flow, e.g. heating and moistening of the cloud layer
  • 14. Structural and bulk convergence (Source: Christoph Schär, ETH Zurich)
      • Panels: statistics of cloud area; statistics of up- & downdrafts
      • Findings: no structural convergence; bulk statistics of updrafts converge
      • Factor 4 (Panosetti et al. 2018)
  • 15. Can the delivery of a 1 km-scale capability be pulled in by a decade? (Source: Christoph Schär, ETH Zurich, & Nils Wedi, ECMWF)
  • 16. Our “exascale” goal for 2022
      • Horizontal resolution: 1 km (globally quasi-uniform)
      • Vertical resolution: 180 levels (surface to ~100 km)
      • Time resolution: less than 1 minute
      • Coupled: land-surface/ocean/ocean-waves/sea-ice
      • Atmosphere: non-hydrostatic
      • Precision: single (32-bit) or mixed precision
      • Compute rate: 1 SYPD (simulated years per wall-clock day)
  • 17. Running COSMO 5.0 & IFS (“the European Model”) at global scale on Piz Daint
      • Scaling to full system size: ~5300 GPU-accelerated nodes available
      • Running a near-global (±80°, covering 97% of Earth’s surface) COSMO 5.0 simulation & IFS
          • either on the host processors: Intel Xeon E5-2690v3 (Haswell, 12 cores)
          • or on the GPU accelerator: PCIe version of the NVIDIA GP100 (Pascal) GPU
  • 18. The baseline for COSMO-global and IFS
  • 19. Memory use efficiency (Fuhrer et al., Geosci. Model Dev. Discuss., https://doi.org/10.5194/gmd-2017-230, published 2018)
      • Memory use efficiency = (necessary data transfers / actual data transfers) × (achieved BW / max achievable BW (STREAM)) = 0.88 × 0.76 = 0.67
      • Measured against peak BW instead of STREAM: 0.55 (2x lower than peak BW)
  • 20. Can the 100x shortfall of a grid-based implementation like COSMO-global be overcome?
      1. Icosahedral/octahedral grid (ICON/IFS) vs. lat-long/Cartesian grid (COSMO): 2x fewer grid columns and a time step of 10 ms instead of 5 ms: 4x
      2. Improving BW efficiency: improve BW efficiency and peak BW (results on Volta show this is realistic): 2x
      3. Strong scaling: 4x possible in COSMO, but we reduced the available parallelism by a factor of 1.33: 3x
      4. Remaining reduction in shortfall: 4x, via numerical algorithms (larger time steps) and further improved processors / memory
      Total: 100x. But we don’t want to increase the footprint of the 2022 system succeeding “Piz Daint”.
  • 21. What about ensembles and throughput for climate? (Remaining goals beyond 2022)
      1. Improve the throughput to 5 SYPD
      2. Reduce the footprint of a single simulation by up to a factor of 10 to 50
      • Change the architecture from control-flow-centric to data-flow-centric, i.e. reduce the necessary data transfers (diagram: necessary vs. actual data transfers, achieved vs. max achievable BW)
      • We may have to change the footprint of machines to hyperscale!
  • 22. LUMI Consortium
      • A large consortium with strong national HPC centres and competence provides a unique opportunity for:
          • knowledge transfer;
          • synergies in operations; and
          • regionally adaptable user support for extreme-scale systems
      • National & EU investments (2020-2026): Finland 50 M€, Belgium 15.5 M€, Czech Republic 5 M€, Denmark 6 M€, Estonia 2 M€, Norway 4 M€, Poland 5 M€, Sweden 7 M€, Switzerland 10 M€, EU 104 M€
      • Plus additional investments in applications development
  • 23. Kajaani Data Center (LUMI)
      • 100% hydroelectric energy, up to 200 MW
      • 2200 m² floor space, expandable up to 4600 m²
      • Waste heat reuse: effective energy price 35 €/MWh; negative CO2 footprint: 13500 tons reduced every year
      • One power grid outage in 36 years
      • 100% free cooling @ PUE 1.03
      • Extreme connectivity: the Kajaani DC is a direct part of the Nordic backbone; 4x100 Gbit/s in place; can easily be scaled up to the multi-terabit level
      • Zero network downtime since the establishment of the DC in 2012
  • 24. Collaborators on Exascale (climate): Tim Palmer (U. of Oxford), Christoph Schär (ETH Zurich), Oliver Fuhrer (MeteoSwiss), Peter Bauer (ECMWF), Björn Stevens (MPI-M), Torsten Hoefler (ETH Zurich), Nils Wedi (ECMWF)
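Slide 10 splits the refactored COSMO dynamics into a stencil library with interchangeable hardware backends and a separate communication library for halo exchange. The sketch below illustrates that separation in miniature. It is not the COSMO or stencil-library API: the names `exchange_halo`, `laplacian`, `row_major_backend`, `column_major_backend` and `apply` are hypothetical, the field is a small 2D periodic grid, and the "backends" only differ in loop order, standing in for real x86 or GPU schedules.

```python
from typing import Callable, List

Field = List[List[float]]
Stencil = Callable[[Field, int, int], float]
Backend = Callable[[Stencil, Field], Field]


def exchange_halo(field: Field) -> Field:
    """Stand-in for a communication library: surround the field with a
    one-cell periodic halo. A distributed code would fill it via MPI instead."""
    rows = [row[-1:] + row + row[:1] for row in field]
    return [rows[-1]] + rows + [rows[0]]


def laplacian(p: Field, i: int, j: int) -> float:
    """Five-point Laplacian stencil at (i, j) of a halo-padded field."""
    return p[i - 1][j] + p[i + 1][j] + p[i][j - 1] + p[i][j + 1] - 4.0 * p[i][j]


def row_major_backend(stencil: Stencil, padded: Field) -> Field:
    """Hypothetical backend A: visit interior points row by row."""
    ny, nx = len(padded) - 2, len(padded[0]) - 2
    return [[stencil(padded, i, j) for j in range(1, nx + 1)] for i in range(1, ny + 1)]


def column_major_backend(stencil: Stencil, padded: Field) -> Field:
    """Hypothetical backend B: same stencil, different traversal order,
    standing in for a hardware-specific schedule such as a GPU kernel."""
    ny, nx = len(padded) - 2, len(padded[0]) - 2
    out = [[0.0] * nx for _ in range(ny)]
    for j in range(1, nx + 1):
        for i in range(1, ny + 1):
            out[i - 1][j - 1] = stencil(padded, i, j)
    return out


def apply(stencil: Stencil, field: Field, backend: Backend) -> Field:
    """User-facing entry point: halo exchange first, then the chosen backend."""
    return backend(stencil, exchange_halo(field))


field = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
print(apply(laplacian, field, row_major_backend))
# [[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]]
```

The stencil is written once and both backends produce identical results; in this design, porting to new hardware means adding a backend rather than rewriting the model's dynamics.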
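The contributions listed on slide 11 are multiplicative speed-ups, so their product should recover the factor 40 that MeteoSwiss needed. The short check below multiplies them. The grouping into "software-enabled" factors at the end is only one possible reading of the slide's remark that the software investment allowed the mathematical improvements and the change in architecture; it is an assumption, not a breakdown stated on the slide.

```python
from math import prod

# Speed-up contributions as listed on slide 11
contributions = {
    "software refactoring": 1.7,
    "mathematical improvements": 2.8,
    "Moore's law & x86 architecture": 2.8,
    "CPU -> GPU": 2.3,
    "additional processors": 1.3,
}

total = prod(contributions.values())
print(f"combined improvement: {total:.1f}x")  # combined improvement: 39.9x

# Assumed grouping: factors that depended on the software investment
software_enabled = prod(
    contributions[k]
    for k in ("software refactoring", "mathematical improvements", "CPU -> GPU")
)
hardware_only = total / software_enabled
print(f"software-enabled: {software_enabled:.1f}x, hardware only: {hardware_only:.1f}x")
```

Under this reading, roughly 11x of the ~40x depends on the software work, while hardware progress alone contributes under 4x.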
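To give a sense of scale for the 2022 goal on slide 16, the estimate below derives the grid size and the required time-step rate. It assumes an Earth surface area of about 5.1 × 10⁸ km², exactly one column per 1 km × 1 km cell, a 365-day year, and a 60 s time step taken as the upper bound of "less than 1 minute". Because the real step must be shorter, the step counts are lower bounds.

```python
# Order-of-magnitude problem size for the 2022 goal on slide 16
EARTH_SURFACE_KM2 = 5.1e8   # Earth's surface area, ~510 million km²
DX_KM = 1.0                 # horizontal grid spacing (slide 16)
LEVELS = 180                # vertical levels (slide 16)
DT_S = 60.0                 # assumption: upper bound of "less than 1 minute"
SYPD = 1.0                  # target compute rate (slide 16)
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365         # assumption: non-leap year

columns = EARTH_SURFACE_KM2 / DX_KM**2            # one column per DX_KM x DX_KM cell
cells = columns * LEVELS                           # 3D grid cells
steps_per_sim_year = DAYS_PER_YEAR * SECONDS_PER_DAY / DT_S
# At SYPD simulated years per wall-clock day, time steps needed per wall-clock second:
steps_per_wall_second = SYPD * steps_per_sim_year / SECONDS_PER_DAY
cell_updates_per_second = cells * steps_per_wall_second

print(f"grid columns:             {columns:.2e}")
print(f"grid cells:               {cells:.2e}")
print(f"steps per simulated year: {steps_per_sim_year:.0f}")
print(f"steps per wall-clock s:   {steps_per_wall_second:.2f}")
print(f"cell updates per wall s:  {cell_updates_per_second:.1e}")
```

Even under these generous assumptions, the model must advance about 9 × 10¹⁰ grid cells roughly six times per wall-clock second, which is why memory bandwidth efficiency (slide 19) becomes the central constraint.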
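The memory use efficiency on slide 19 is the product of two ratios: how much of the moved data was actually necessary, and how close the achieved bandwidth comes to the maximum achievable (STREAM) bandwidth. The helper below encodes that product; the function name and argument names are illustrative. Since the slide reports only the two ratios (0.88 and 0.76), the example passes them with denominators normalised to 1.

```python
def memory_use_efficiency(necessary_bytes: float, actual_bytes: float,
                          achieved_bw: float, max_achievable_bw: float) -> float:
    """Product of data-transfer efficiency and bandwidth efficiency."""
    transfer_efficiency = necessary_bytes / actual_bytes     # fraction of transfers that were needed
    bandwidth_efficiency = achieved_bw / max_achievable_bw   # fraction of STREAM bandwidth reached
    return transfer_efficiency * bandwidth_efficiency


# Slide 19 reports the ratios directly, so the denominators are normalised to 1.
mue = memory_use_efficiency(0.88, 1.0, 0.76, 1.0)
print(f"memory use efficiency: {mue:.2f}")  # memory use efficiency: 0.67
```

Writing it as a product makes the two levers on slides 20 and 21 explicit: better bandwidth utilisation raises the second factor, while a data-flow-centric design reduces the data that must move at all.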
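Slide 20 breaks the 100x shortfall into four multiplicative factors. The check below multiplies them; the only derived quantity is the strong-scaling gain, taken as the 4x possible in COSMO divided by the 1.33x loss of parallelism, which the slide rounds to 3x.

```python
from math import prod

grid = 2 * 2               # 2x fewer grid columns and a 2x longer time step
bandwidth = 2              # better BW efficiency and higher peak BW
strong_scaling = 4 / 1.33  # 4x possible, reduced by the 1.33x loss of parallelism
remaining = 4              # numerical algorithms, improved processors / memory

factors = [grid, bandwidth, strong_scaling, remaining]
print(f"strong scaling: {strong_scaling:.2f}x")  # strong scaling: 3.01x
print(f"combined: {prod(factors):.0f}x")         # combined: 96x
```

The product comes to about 96x, consistent with the roughly 100x shortfall the slide sets out to close.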
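The LUMI investment figures on slide 22 can be totalled directly. The sketch sums the national contributions, adds the EU share, and computes Switzerland's fraction of the total. It excludes the additional investments in applications development, which the slide does not quantify.

```python
# National contributions to LUMI (2020-2026), in M€, as listed on slide 22
national = {
    "Finland": 50.0, "Belgium": 15.5, "Czech Republic": 5.0, "Denmark": 6.0,
    "Estonia": 2.0, "Norway": 4.0, "Poland": 5.0, "Sweden": 7.0, "Switzerland": 10.0,
}
eu = 104.0  # EU contribution, M€

national_total = sum(national.values())
grand_total = national_total + eu
swiss_share = national["Switzerland"] / grand_total

print(f"national total: {national_total} M€")  # national total: 104.5 M€
print(f"grand total:    {grand_total} M€")     # grand total:    208.5 M€
print(f"Swiss share:    {swiss_share:.1%}")    # Swiss share:    4.8%
```

The national contributions together roughly match the EU contribution, so the listed budget is split about evenly between the two.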