National Computational Science Alliance
“Supercomputers: Directions in Technology,
Architecture, and Applications”
Keynote Presentation
Supercomputer ’98
Mannheim, Germany
June 18, 1998
Dr. Larry Smarr
Director, National Computational Science Alliance and
the National Center for Supercomputing Applications
Professor in the Departments of Physics and Astronomy
University of Illinois at Urbana-Champaign
NCSA is the Leading Edge Site for the
National Computational Science Alliance
www.ncsa.uiuc.edu
Scientific Applications Continue to Require
Exponential Growth in Capacity
[Figure: log-log plot of machine requirement in FLOPS (10^8 to 10^20) against memory in bytes (10^8 to 10^14). Reference points: 1995 NSF Capability, 2000 NSF Leading Edge, NSF in 2004 (projected), and ASCI in 2004, where a 100-year climate model runs in hours. Application regions: molecular dynamics for biological molecules, computational cosmology, turbulent convection in stars, atomic/diatomic interaction, and QCD. Legend: long-range projections from a recent applications workshop; next-step projections by NSF Grand Challenge research teams; recent computations by NSF Grand Challenge research teams.]
From Bob Voigt, NSF
The Promise of the Teraflop -
From Thunderstorm to National-Scale Simulation
Simulation by Wilhelmson et al.; figure from Supercomputing and the Transformation of Science, Kaufmann and Smarr, Freeman, 1993
Accelerated Strategic Computing Initiative is
Coupling DOE Defense Labs to Universities
• Access to ASCI Leading Edge Supercomputers
• Academic Strategic Alliances Program
• Data and Visualization Corridors
http://www.llnl.gov/asci-alliances/centers.html
Comparison of the DOE ASCI and the
NSF PACI Origin Array Scale Through FY99
www.lanl.gov/projects/asci/bluemtn/Hardware/schedule.html
Los Alamos Origin System FY99: 5,000-6,000 processors
NCSA Proposed System FY99: 6x128 + 4x64 = 1,024 processors
NCSA Combines Shared Memory Programming with Massive Parallelism
Future Upgrade Under Negotiation with NSF
[Images: CM-2 and CM-5]
The Exponential Growth of NCSA’s
SGI Shared Memory Supercomputers
[Figure: NCSA's installed SGI processor count on a log scale (1 to 10,000), January 1994 through January 2001, across the Challenge, Power Challenge, Origin, and SN1 product lines. Processors doubling every nine months!]
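(Worked growth rate, ours rather than the slide's: doubling every nine months compounds to a factor of 2^(12/9) ≈ 2.5 per year, or roughly 10x every 30 months.)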
TOP500 Systems by Vendor
TOP500 Reports: http://www.netlib.org/benchmark/top500.html
[Figure: stacked area chart of the number of TOP500 systems (0 to 500), June 1993 through June 1998, by vendor: CRI, SGI, IBM, Convex, HP, Sun, TMC, Intel, DEC, Japanese vendors, and others.]
Why NCSA Switched From Vector to RISC Processors
NCSA 1992 Supercomputing Community
[Figure: histogram of number of users (0 to 150) versus average user MFLOPS (20 to 300) on the Cray Y-MP4/64, March 1992 through February 1993, for users consuming more than 0.5 CPU hour. Average speed: 70 MFLOPS. Reference marks: Y-MP1 peak speed and MIPS R8000 peak speed.]
Replacement of Shared Memory Vector
Supercomputers by Microprocessor SMPs
TOP500 Reports: http://www.netlib.org/benchmark/top500.html
[Figure: number of installed supercomputers in the TOP500 (0 to 500), June 1993 through June 1998, by class: MPP, SMP/DSM, and PVP.]
Top500 Shared Memory Systems
TOP500 Reports: http://www.netlib.org/benchmark/top500.html
[Figure: two panels, June 1993 through June 1998, number of systems 0 to 300. Left (vector processors): PVP systems, broken out by Europe, Japan, and the USA. Right (microprocessors): SMP + DSM systems, led by the USA.]
Simulation of the Evolution of the Universe
on a Massively Parallel Supercomputer
[Images: simulation volumes 12 billion light years and 4 billion light years across.]
Virgo Project - Evolving a Billion Pieces of Cold Dark Matter in a Hubble Volume -
688-processor CRAY T3E at the Garching Computing Centre of the Max Planck Society
http://www.mpg.de/universe.htm
Limitations of Uniform Grids for Complex
Scientific and Engineering Problems
Source: Greg Bryan, Mike Norman, NCSA
512x512x512 Run on 512-node CM-5
Gravitation causes a continuous increase in density until there is a large mass in a single grid zone.
Use of Shared Memory Adaptive Grids To
Achieve Dynamic Load Balancing
Source: Greg Bryan, Mike Norman, John Shalf, NCSA
64x64x64 Run with Seven Levels of Adaptation on SGI Power Challenge,
Locally Equivalent to 8192x8192x8192 Resolution (each level doubles the resolution: 64 x 2^7 = 8192)
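To illustrate the mechanism behind this slide's fix for the uniform-grid limitation, here is a minimal refinement-flagging sketch in C. It is illustrative only, not the Bryan-Norman code: the function name, threshold, and fixed 64^3 grid are assumptions. Zones whose enclosed mass exceeds a threshold are flagged so the next level subdivides them by a factor of two per dimension, keeping the mass per zone bounded as gravitational collapse proceeds.

```c
/* Illustrative AMR refinement test: flag any grid zone whose
   enclosed mass exceeds a threshold so the next level refines it
   2x in each dimension (seven levels: 64 * 2^7 = 8192 effective). */
#include <stdbool.h>

#define NX 64  /* base grid is NX^3, as on the slide */

void flag_for_refinement(const double density[NX][NX][NX],
                         bool refine[NX][NX][NX],
                         double zone_volume, double mass_threshold)
{
    for (int i = 0; i < NX; i++)
        for (int j = 0; j < NX; j++)
            for (int k = 0; k < NX; k++)
                refine[i][j][k] =
                    (density[i][j][k] * zone_volume > mass_threshold);
}
```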
Dominant Usage of NCSA Origin, January through April 1998
[Figure: CPU-hours burned per principal investigator, ranked 1 to roughly 181, on a log scale from 1 to 1,000,000 CPU-hours, bucketed as 100k to 1M, 10k to 100k, 1k to 10k, 100 to 1k, 10 to 100, and 1 to 10. The "Extreme" and "Large" PIs at the top of the ranking dominate usage.]
Disciplines Using the NCSA Origin 2000
CPU-Hours in March 1998
Particle Physics
Chemistry
Materials Sciences
Engineering CFD
Astronomy
Physics
Industry
Molecular Biology
Other
Solving 2D Navier-Stokes Kernel - Performance of Scalable Systems
Preconditioned Conjugate Gradient Method with Multi-level Additive Schwarz Richardson Preconditioner (2D 1024x1024)
Source: Danesh Tafti, NCSA
[Figure: Gigaflops (0 to 7) versus processors (0 to 60) for Origin-DSM, Origin-MPI, NT-MPI, SP2-MPI, T3E-MPI, and SPP2000-DSM.]
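For readers unfamiliar with the kernel, the sketch below shows the control flow of a generic preconditioned conjugate gradient (PCG) solver in C. It is a minimal illustration under stated assumptions, not the Alliance code: apply_A and apply_M are hypothetical callbacks, and a user-supplied preconditioner solve stands in for the multi-level additive Schwarz Richardson preconditioner named above. In a distributed run, the dot products would become global reductions (e.g., MPI_Allreduce) and apply_A would exchange halo data between processors.

```c
/* Minimal preconditioned conjugate gradient (PCG) sketch.
   apply_A: y = A*x for the discretized 2D operator (user-supplied).
   apply_M: z = M^{-1}*r, the preconditioner solve (user-supplied;
   here a stand-in for the additive Schwarz Richardson method). */
#include <math.h>
#include <stdlib.h>

typedef void (*op_fn)(const double *x, double *y, int n);

static double dot(const double *a, const double *b, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++) s += a[i] * b[i];
    return s;
}

int pcg(op_fn apply_A, op_fn apply_M, const double *b, double *x,
        int n, int max_iter, double tol) {
    double *r = malloc(n * sizeof *r), *z = malloc(n * sizeof *z);
    double *p = malloc(n * sizeof *p), *q = malloc(n * sizeof *q);
    apply_A(x, q, n);                       /* r = b - A*x0      */
    for (int i = 0; i < n; i++) r[i] = b[i] - q[i];
    apply_M(r, z, n);                       /* z = M^{-1} r      */
    for (int i = 0; i < n; i++) p[i] = z[i];
    double rz = dot(r, z, n), bnorm = sqrt(dot(b, b, n));
    int k;
    for (k = 0; k < max_iter && sqrt(dot(r, r, n)) > tol * bnorm; k++) {
        apply_A(p, q, n);
        double alpha = rz / dot(p, q, n);   /* step length       */
        for (int i = 0; i < n; i++) { x[i] += alpha * p[i];
                                      r[i] -= alpha * q[i]; }
        apply_M(r, z, n);
        double rz_new = dot(r, z, n);
        double beta = rz_new / rz;          /* direction update  */
        for (int i = 0; i < n; i++) p[i] = z[i] + beta * p[i];
        rz = rz_new;
    }
    free(r); free(z); free(p); free(q);
    return k;                               /* iterations used   */
}
```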
A Variety of Discipline Codes -
Single Processor Performance Origin vs. T3E
[Figure: single-processor MFLOPS (0 to 160) on the Origin versus the T3E for seven discipline codes: QMC, RIEMANN, Laplace, QCD, PPM, PIMC, and ZEUS.]
Alliance PACS Origin2000 Repository
http://scv.bu.edu/SCV/Origin2000/
Kadin Tseng (BU), Gary Jensen (NCSA), Chuck Swanson (SGI)
John Connolly (U. Kentucky) is developing a repository for the HP Exemplar
High-End Architecture 2000: Scalable Clusters of Shared Memory Modules
Each is 4 Teraflops Peak
• NEC SX-5
– 32 SMP nodes x 16 vector processors each
– 512 processors total
– 8 Gigaflops peak per processor
• IBM SP
– 256 SMP nodes x 16 RISC processors each
– 4,096 processors total
– 1 Gigaflop peak per processor
• SGI Origin Follow-on
– 32 DSM modules x 128 RISC processors each
– 4,096 processors total
– 1 Gigaflop peak per processor
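(Arithmetic check, ours: SX-5, 32 x 16 x 8 Gigaflops = 4 Teraflops; SP, 256 x 16 x 1 Gigaflop = 4 Teraflops; Origin follow-on, 32 x 128 x 1 Gigaflop = 4 Teraflops.)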
Emerging Portable Computing Standards
• HPF
• MPI
• OpenMP
• Hybrids of MPI and OpenMP
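To make the last item concrete, here is a minimal hybrid sketch in C: MPI splits the work across the cluster's nodes while OpenMP threads the per-node loop, matching the clustered-shared-memory machines on the previous slide. Everything in it (the toy workload, names, and sizes) is illustrative, not from the talk; the calls are standard MPI and OpenMP.

```c
/* Hybrid MPI + OpenMP sketch: each MPI rank (one per SMP node)
   computes a partial sum with an OpenMP-threaded loop, then the
   ranks combine results with an MPI reduction. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank owns a contiguous slice of the index space. */
    long lo = (long)rank * N / size, hi = (long)(rank + 1) * N / size;
    double local = 0.0;

    /* OpenMP threads the on-node work across the SMP's processors. */
    #pragma omp parallel for reduction(+:local)
    for (long i = lo; i < hi; i++)
        local += 1.0 / (double)(i + 1);    /* toy workload */

    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("harmonic sum H_%d = %.6f\n", N, global);
    MPI_Finalize();
    return 0;
}
```

Built with an MPI compiler wrapper plus the compiler's OpenMP flag (for example, mpicc -fopenmp) and launched with one rank per SMP node, this pattern uses message passing only between nodes and shared memory within them.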
Basket of Applications Average Performance
as Percentage of Linpack Performance
[Figure: Linpack MFLOPS versus applications-average MFLOPS (0 to 1,800) for the T90, C90, SPP-2000, SP2-160, Origin 195, and PCA. The applications average as a percentage of Linpack, in slide order: 22%, 25%, 14%, 19%, 33%, and 26%.]
Applications codes: CFD, biomolecular, chemistry, materials, QCD
Harnessing Distributed UNIX Workstations -
University of Wisconsin Condor Pool
[Figure: "Condor Cycles" - CondorView chart of cycles harvested from the Wisconsin pool.]
CondorView graphic courtesy of Miron Livny and Todd Tannenbaum (UWisc)
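Condor harvests idle workstation cycles from jobs described in a submit description file. The sketch below is a minimal, illustrative example of the classic syntax (the executable and file names are hypothetical; consult the Condor manual for the exact keywords in a given release); it queues 100 instances to run on idle pool machines.

```
# Illustrative Condor submit description file (names hypothetical):
# queue 100 instances of my_sim on idle pool machines.
universe   = vanilla
executable = my_sim
arguments  = $(Process)
output     = out.$(Process)
error      = err.$(Process)
log        = sim.log
queue 100
```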
NT Workstation Shipments
Rapidly Surpassing UNIX
[Figure: workstations shipped, in millions (0 to 1.4), in 1995, 1996, and 1997, UNIX versus NT; NT overtakes UNIX.]
Source: IDC, Wall Street Journal, 3/6/98
First Scaling Tests of ZEUS-MP on
CRAY T3E and Origin vs. NT Supercluster
“Supercomputer performance at mail-order prices”-- Jim Gray, Microsoft
access.ncsa.uiuc.edu/CoverStories/SuperCluster/super.html
ZEUS-MP Hydro Code
Running Under MPI
• Alliance Cosmology Team
• Andrew Chien, UIUC
• Rob Pennington, NCSA
[Figures: left, single-processor speed on ZEUS-MP (0 to 140 MFLOPS) for the T3E, Origin, and NT; right, aggregate GFLOPS (0 to 8) versus processors (0 to 200) for the T3E, Origin, and NT/Intel.]
NCSA NT Supercluster
Solving Navier-Stokes Kernel
Preconditioned Conjugate Gradient Method With
Multi-level Additive Schwarz Richardson Pre-conditioner
(2D 1024x1024)
Single Processor Performance:
MIPS R10k: 117 MFLOPS
Intel Pentium II: 80 MFLOPS
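(A consistency check, ours rather than the slide's: at near-linear speedup, 64 Pentium II processors deliver about 64 x 80 MFLOPS ≈ 5.1 Gigaflops, in line with the Gigaflops panel below.)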
Danesh Tafti, Rob Pennington, Andrew Chien (NCSA)
[Figures: left, speedup versus processors (0 to 60) for NT MPI, Origin MPI, and Origin SM against perfect scaling; right, Gigaflops (0 to 7) versus processors (0 to 70) for the same three configurations.]
Near Perfect Scaling of Cactus -
3D Dynamic Solver for the Einstein GR Equations
[Figure: scaling versus processors (0 to 120) for the Origin and the NT Supercluster, both near linear. Ratio of GFLOPS: Origin = 2.5x NT SC.]
Danesh Tafti, Rob Pennington, Andrew Chien (NCSA)
Cactus was developed by Paul Walker, MPI-Potsdam, UIUC, NCSA
NCSA Symbio - A Distributed Object Framework
Bringing Scalable Computing to NT Desktops
http://access.ncsa.uiuc.edu/Features/Symbio/Symbio.html
• Parallel Computing on NT Clusters
– Briand Sanderson, NCSA
– Microsoft Co-Funds Development
• Features
– Based on Microsoft DCOM
– Batch or Interactive Modes
– Application Development Wizards
• Current Status & Future Plans
– Symbio Developer Preview 2 Released
– Princeton University Testbed
The Road to Merced
http://developer.intel.com/solutions/archive/issue5/focus.htm#FOUR
