Computing Outside The Box June 2009 - Presentation Transcript
Ian Foster
Computation Institute
Argonne National Lab & University of Chicago
Abstract
The past decade has seen increasingly ambitious and successful
methods for outsourcing computing. Approaches such as utility
computing, on-demand computing, grid computing, software as a
service, and cloud computing all seek to free computer
applications from the limiting confines of a single computer.
Software that thus runs “outside the box” can be more powerful
(think Google, TeraGrid), dynamic (think Animoto, caBIG), and
collaborative (think Facebook, myExperiment). It can also be
cheaper, due to economies of scale in hardware and software.
The combination of new functionality and new economics inspires
new applications, reduces barriers to entry for application
providers, and in general disrupts the computing ecosystem. I
discuss the new applications that outside-the-box computing
enables, in both business and science, and the hardware and
software architectures that make these new applications
possible.
“I’ve been doing cloud computing since before it was called grid.”
1890
1953
“Computation may someday be organized as a public utility … The computing utility could become the basis for a new and important industry.”
John McCarthy (1961)
“When the network is as fast as the computer's
internal links, the machine disintegrates across
the net into a set of special purpose appliances”
(George Gilder, 2001)
[Figure: connectivity (on log scale) vs. time; Science: Grid]
Application
Infrastructure
Layered grid architecture (“The Anatomy of the Grid,” 2001)
Application
User          “Specialized services”: user- or application-specific distributed services
Collective    “Managing multiple resources”: ubiquitous infrastructure services
Resource      “Sharing single resources”: negotiating access, controlling use
Connectivity  “Talking to things”: communication (Internet protocols) & security
Fabric        “Controlling things locally”: access to, & control of, resources
Internet Protocol Architecture counterparts: Application (User, Collective, Resource); Transport and Internet (Connectivity); Link (Fabric)
Application
Service oriented infrastructure
Infrastructure
www.opensciencegrid.org
Application
Service oriented infrastructure
Infrastructure
Application
Service oriented applications
Service oriented infrastructure
Infrastructure
As of Oct 19, 2008: 122 participants; 105 services (70 data, 35 analytical)
Microarray clustering using Taverna
• Query and retrieve microarray data from a caArray data service:
  cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/CaArrayScrub
• Normalize microarray data using GenePattern analytical service:
  node255.broad.mit.edu:6060/wsrf/services/cagrid/PreprocessDatasetMAGEService
• Hierarchical clustering using geWorkbench analytical service:
  cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/HierarchicalClusteringMage
Legend: workflow in/output; caGrid services; “shim” services; others
(Wei Tan)
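The three-step Taverna workflow above chains one data service into two analytical services, each stage consuming the previous stage's output. A minimal local sketch of that dataflow follows, under loud assumptions: `query_microarray_stub`, `normalize`, `single_linkage`, and `run_workflow` are hypothetical stand-ins, not the caGrid or Taverna APIs, and z-score normalization plus single-linkage clustering are illustrative choices, not necessarily what GenePattern or geWorkbench run. Because every stage here shares one in-memory format, no “shim” service is needed.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory format: gene id -> expression values across samples.
Matrix = Dict[str, List[float]]
Merge = Tuple[Tuple[str, ...], Tuple[str, ...], float]


def query_microarray_stub() -> Matrix:
    """Hypothetical stand-in for the caArray query step (no network call)."""
    return {"g1": [1.0, 2.0, 3.0], "g2": [2.0, 4.0, 6.0], "g3": [3.0, 1.0, 2.0]}


def normalize(matrix: Matrix) -> Matrix:
    """Stand-in for the normalization step: z-score each gene's values."""
    out: Matrix = {}
    for gene, values in matrix.items():
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        out[gene] = [(v - mean) / std if std else 0.0 for v in values]
    return out


def single_linkage(matrix: Matrix) -> List[Merge]:
    """Stand-in for the clustering step: agglomerative single-linkage
    clustering, returning the merge history (cluster_a, cluster_b, distance)."""
    clusters = [(gene,) for gene in sorted(matrix)]
    merges: List[Merge] = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(matrix[a], matrix[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = tuple(sorted(clusters[i] + clusters[j]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


def run_workflow(source: Callable[[], Matrix], stages: List[Callable]) -> object:
    """Feed each stage's output into the next, as a workflow engine would."""
    data = source()
    for stage in stages:
        data = stage(data)
    return data


merges = run_workflow(query_microarray_stub, [normalize, single_linkage])
```

In the real workflow each stage is a remote service call, and shims adapt one service's output format to the next service's input.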
Applications
Infrastructure
[Figure (three build slides): energy vs. progress of adoption, with “$$” annotations]
“When the network is as fast as the computer's
internal links, the machine disintegrates across
the net into a set of special purpose appliances”
(George Gilder, 2001)
[Figure: connectivity (on log scale) vs. time; Science: Grid; Enterprise: Cloud]
Dynamo: Amazon’s highly available key-value store (DeCandia et al., SOSP’07)
• Simple query model
• Weak consistency, no isolation
• Stringent SLAs (e.g., 300 ms for 99.9% of requests; peak 500 requests/sec)
• Incremental scalability
• Symmetry
• Decentralization
• Heterogeneity
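Incremental scalability, symmetry, and decentralization in Dynamo rest on consistent hashing: each node owns positions on a hash ring, and a key is stored on the first N distinct nodes found walking clockwise from the key's hash (its preference list). The following is a simplified single-process sketch under stated assumptions; `HashRing`, `ring_position`, the choice of 8 virtual nodes per physical node, and MD5 as the position function are choices made here for illustration, not Dynamo's code.

```python
import bisect
import hashlib
from typing import List, Tuple


def ring_position(label: str) -> int:
    """Map a string to a 128-bit position on the ring (MD5 chosen for illustration)."""
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hashing ring with virtual nodes (a sketch, not Dynamo's code)."""

    def __init__(self, nodes: List[str], vnodes: int = 8) -> None:
        self.vnodes = vnodes  # positions per physical node (assumed value)
        self._ring: List[Tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (ring_position(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def preference_list(self, key: str, n: int = 3) -> List[str]:
        """First n distinct physical nodes found walking clockwise from the key."""
        n = min(n, len({node for _, node in self._ring}))
        idx = bisect.bisect_right(self._ring, (ring_position(key), ""))
        owners: List[str] = []
        while len(owners) < n:
            _, node = self._ring[idx % len(self._ring)]
            if node not in owners:
                owners.append(node)
            idx += 1
        return owners


ring = HashRing(["A", "B", "C", "D"])
print(ring.preference_list("cart:12345"))  # three distinct node names
```

Adding a node only reassigns keys whose nearest clockwise position now belongs to the new node; every other key keeps its owner, which is what makes growth incremental.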
Technologies used in Dynamo (problem: technique; advantage)
• Partitioning: consistent hashing; incremental scalability
• High availability for writes: vector clocks with reconciliation during reads; version size is decoupled from update rates
• Handling temporary failures: sloppy quorum and hinted handoff; provides high availability and durability guarantee when some of the replicas are not available
• Recovering from permanent failures: anti-entropy using Merkle trees; synchronizes divergent replicas in the background
• Membership and failure detection: gossip-based membership protocol and failure detection; preserves symmetry and avoids having a centralized registry for storing membership and node liveness information
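“Vector clocks with reconciliation during reads” means each stored version carries a map from node to update counter; on read, versions that another version supersedes are discarded, and any concurrent survivors go back to the client to merge. The sketch below is modeled loosely on the Sx/Sy/Sz example in the Dynamo paper; the function names are hypothetical and the code is an illustration, not Dynamo's implementation.

```python
from typing import Dict, List, Tuple

VectorClock = Dict[str, int]  # node name -> update counter


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of the clock with `node`'s counter advanced (a new write)."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if version a has seen every update that version b has."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def reconcile(versions: List[Tuple[VectorClock, str]]) -> List[Tuple[VectorClock, str]]:
    """Syntactic reconciliation: drop every version another version supersedes
    (duplicates keep their first copy); survivors are concurrent conflicts."""
    survivors = []
    for i, (clock, value) in enumerate(versions):
        superseded = any(
            descends(other, clock) and (other != clock or j < i)
            for j, (other, _) in enumerate(versions)
            if j != i
        )
        if not superseded:
            survivors.append((clock, value))
    return survivors


def merge(clocks: List[VectorClock], node: str) -> VectorClock:
    """Clock for a client-reconciled write: component-wise max, then increment."""
    merged: VectorClock = {}
    for clock in clocks:
        for n, count in clock.items():
            merged[n] = max(merged.get(n, 0), count)
    return increment(merged, node)


d1 = increment({}, "Sx")
d2 = increment(d1, "Sx")
d3 = increment(d2, "Sy")  # two replicas accept writes concurrently
d4 = increment(d2, "Sz")
conflicts = reconcile([(d1, "v1"), (d2, "v2"), (d3, "v3"), (d4, "v4")])
d5 = merge([d3, d4], "Sx")  # {"Sx": 3, "Sy": 1, "Sz": 1}
```

Here `conflicts` holds only the two concurrent versions; once a client writes the merged value with clock `d5`, both are superseded.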
Application
Service oriented applications
Service oriented infrastructure
Infrastructure
The Globus-based LIGO data grid
LIGO Gravitational Wave Observatory
[Figure: map of LIGO data grid sites, including Birmingham, Cardiff, and AEI/Golm]
Replicating >1 terabyte/day to 8 sites
>100 million replicas so far
MTBF = 1 month
Data replication service
Pull “missing” files to a storage system
[Figure: given a list of required files, the Data Replication Service uses Data Location components (a Local Replica Catalog and a Replica Location Index at each site) to find replicas, and Data Movement components (the Reliable File Transfer Service over GridFTP) to copy them]
“Design and Implementation of a Data Replication Service Based on the Lightweight Data Replicator System,” Chervenak et al., 2005
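The “pull missing files” step reduces to a set difference plus a catalog lookup: compare the required list against what the local catalog already holds, then ask the index where the remaining files live. A minimal sketch, assuming plain Python dicts and sets in place of the real catalog and index services; `plan_replication`, the file names, and the `site-a.example.org` URL are hypothetical, and picking the first listed replica is a naive placeholder policy.

```python
from typing import Dict, List, Set, Tuple


def plan_replication(
    required: List[str],
    local_catalog: Set[str],
    replica_index: Dict[str, List[str]],
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Decide which required files must be pulled to this site, and from where.

    required      -- logical file names the site should hold
    local_catalog -- names already registered locally (Local Replica Catalog role)
    replica_index -- name -> replica URLs at other sites (Replica Location Index role)
    Returns (transfers, unavailable): (name, source URL) pairs to hand to a
    transfer service, and names for which no replica is known anywhere.
    """
    transfers: List[Tuple[str, str]] = []
    unavailable: List[str] = []
    for name in required:
        if name in local_catalog:
            continue  # already present; nothing to pull
        sources = replica_index.get(name, [])
        if sources:
            transfers.append((name, sources[0]))  # naive policy: first listed replica
        else:
            unavailable.append(name)
    return transfers, unavailable


transfers, unavailable = plan_replication(
    required=["frame-001", "frame-002", "frame-003"],
    local_catalog={"frame-001"},
    replica_index={"frame-002": ["gsiftp://site-a.example.org/data/frame-002"]},
)
```

After each transfer completes, a real service would also register the new local copy so later runs skip it.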
Specializing further …
User to Service Provider: “Provide access to data D at S1, S2, S3 with performance P”
Service Provider to Resource Provider: “Provide storage with performance P1, network with P2, …”
[Figure: data D replicated at sites S1, S2, S3; the service provider builds on a replica catalog, user-level multicast, …]
Using IaaS in biomedical informatics
[Figure: BIRN and handle.net components deployed on an IaaS provider and on “my servers,” with labels for Chicago sites]
Clouds and supercomputers: Conventional wisdom?
                   Loosely coupled applications   Tightly coupled applications
Clouds/clusters    ✔                              Too slow
Supercomputers     Too expensive                  ✔
Ed Walker, “Benchmarking Amazon EC2 for high-performance scientific computing,” ;login:, October 2008.
D. Nurmi, J. Brevik, R. Wolski: QBETS: queue bounds estimation from time series. SIGMETRICS 2007: 379-380
Clouds and supercomputers: Conventional wisdom?
                   Loosely coupled applications   Tightly coupled applications
Clouds/clusters    ✔                              Good for rapid response
Supercomputers     Too expensive                  ✔
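The revised “good for rapid response” cell follows from what a user actually experiences: turnaround is queue wait plus run time, so a slower machine with no queue can finish first. The sketch below uses entirely hypothetical numbers (not taken from Walker's EC2 benchmarks or the QBETS data) and assumes the only cloud penalty is a uniform slowdown factor.

```python
def turnaround_hours(queue_wait_h: float, runtime_h: float) -> float:
    """Time to solution = time waiting for resources + time actually running."""
    return queue_wait_h + runtime_h


def break_even_slowdown(queue_wait_h: float, runtime_h: float,
                        cloud_wait_h: float = 0.0) -> float:
    """Largest cloud slowdown factor s for which
    cloud_wait_h + s * runtime_h <= queue_wait_h + runtime_h."""
    return (queue_wait_h + runtime_h - cloud_wait_h) / runtime_h


# Hypothetical numbers, chosen only to illustrate the tradeoff:
supercomputer = turnaround_hours(queue_wait_h=6.0, runtime_h=1.0)
cloud = turnaround_hours(queue_wait_h=0.1, runtime_h=4.0)  # assumes a 4x slowdown
limit = break_even_slowdown(queue_wait_h=6.0, runtime_h=1.0, cloud_wait_h=0.1)
```

With these inputs the cloud run finishes first even at a 4x slowdown; it would stop winning only above a slowdown of about 6.9x.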
Loosely coupled problems
• Ensemble runs to quantify climate model uncertainty
• Identify potential drug targets by screening a database of ligand structures against target proteins
• Study economic model sensitivity to parameters
• Analyze turbulence dataset from many perspectives
• Perform numerical optimization to determine optimal resource assignment in energy problems
• Mine collection of data from advanced light sources
• Construct databases of computed properties of chemical compounds
• Analyze data from the Large Hadron Collider
• Analyze log data from 100,000-node parallel
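What the problems listed above share is that they decompose into many independent tasks with no communication between them, so a simple task farm (hand out work, collect results) is enough. A minimal sketch, assuming a thread pool as a self-contained stand-in for a cluster scheduler; `model_run` and `run_many_tasks` are hypothetical names, and squaring the parameter is a placeholder for real work such as one ensemble member.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, Iterable


def model_run(param: float) -> float:
    """Hypothetical stand-in for one independent task, e.g. one ensemble member
    or one parameter setting of an economic model."""
    return param ** 2  # placeholder computation


def run_many_tasks(params: Iterable[float], max_workers: int = 8) -> Dict[float, float]:
    """Dispatch independent tasks to a worker pool and collect results as they finish.

    Tasks never communicate with each other, so the only coordination needed is
    handing out work and gathering results.
    """
    results: Dict[float, float] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(model_run, p): p for p in params}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


sweep = run_many_tasks([0.5 * i for i in range(10)])
```

For CPU-bound work at scale the same pattern would dispatch tasks to processes or compute nodes instead of threads; the structure of the farm is unchanged.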
Many many tasks: Identifying potential drug targets
Protein target(s) × 2M+ ligands
(Mike Kubal, Benoit Roux, and others)
[Figure: docking workflow for one protein target]
Inputs:
• ZINC 3-D structures: 2M structures (6 GB)
• PDB protein descriptions (1 MB)
• Manually prepped DOCK6 and FRED receptor files (1 per protein: defines pocket to bind to)
• NAB script parameters (defines flexible residues, #MDsteps) and NAB script template, combined by BuildNABScript
Stages:
• FRED and DOCK6 docking: ~4M × 60 s × 1 cpu, ~60K cpu-hrs; select best ~5K from each
• Amber prep (per receptor): 2. AmberizeReceptor; 4. perl: gen nabscript
• Amber score (per ligand): 1. AmberizeLigand; 3. AmberizeComplex; 5. RunNABScript; ~10K × 20 min × 1 cpu, ~3K cpu-hrs; select best ~500
• GCMC: ~500 × 10 hr × 100 cpu, ~500K cpu-hrs
• End: report ligands and complexes
For 1 target: 4 million tasks, 500,000 cpu-hrs (50 cpu-years)
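The per-stage estimates on this slide are simple products of task count, time per task, and CPUs per task. The sketch below redoes that arithmetic from the slide's own figures (the helper name `cpu_hours` is hypothetical). The three stages come to roughly 67K, 3K, and 500K CPU-hours, about 570K in total, the same order as the slide's rounded ~500,000, and the GCMC stage accounts for most of the cost.

```python
def cpu_hours(tasks: float, seconds_per_task: float, cpus_per_task: int = 1) -> float:
    """Total CPU-hours for a stage of identical independent tasks."""
    return tasks * seconds_per_task * cpus_per_task / 3600


stages = {
    "FRED + DOCK6 docking": cpu_hours(4e6, 60),            # ~4M x 60 s x 1 cpu
    "Amber scoring": cpu_hours(1e4, 20 * 60),               # ~10K x 20 min x 1 cpu
    "GCMC": cpu_hours(500, 10 * 3600, cpus_per_task=100),   # ~500 x 10 hr x 100 cpu
}
total = sum(stages.values())
for name, hours in stages.items():
    print(f"{name}: {hours:,.0f} CPU-hours ({hours / total:.0%} of total)")
```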
DOCK on BG/P: ~1M tasks on 118,000 CPUs
• CPU cores: 118,784
• Tasks: 934,803
• Elapsed time: 7,257 s
• Compute time: 21.43 CPU years
• Average task time: 667 s
• Relative efficiency: 99.7% (from 16 to 32 racks)
• Utilization:
  – Sustained: 99.6%
  – Overall: 78.3%
[Figure: plot against time (secs)]
Per-task I/O:
• GPFS: 1 script (~5 KB), 2 file reads (~10 KB), 1 file write (~10 KB)
• RAM (cached from GPFS on first task per node): 1 binary (~7 MB), static input data (~45 MB)
(Ioan Raicu, Zhao Zhang, Mike Wilde)
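The overall utilization figure can be cross-checked against the other numbers on the slide, assuming (our reading, not stated on the slide) that it means busy CPU time divided by the whole allocation, cores × elapsed time. With 21.43 CPU-years, 118,784 cores, and 7,257 s, the sketch below gives about 78%, consistent with the reported 78.3% given rounding in the inputs; `utilization` is a hypothetical helper name.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def utilization(busy_core_seconds: float, cores: int, elapsed_seconds: float) -> float:
    """Fraction of the allocation (cores x wall-clock time) spent running tasks."""
    return busy_core_seconds / (cores * elapsed_seconds)


overall = utilization(21.43 * SECONDS_PER_YEAR, cores=118_784, elapsed_seconds=7_257)
print(f"overall utilization ~ {overall:.1%}")
```

The gap between sustained (99.6%) and overall utilization is then time the allocated cores spent idle, for example while the run ramped up or drained.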
Scaling Posix to petascale
[Figure: a large dataset on the global file system is staged via Chirp (multicast) over the torus and tree interconnects into a CN-striped intermediate file system (MosaStore, striping), and from there into each compute node's local file system (LFS) holding local datasets]
Efficiency for 4 second tasks and varying data size (1KB
to 1MB) for CIO and GPFS up to 32K processors
Same scenario, but with dynamic resource provisioning
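Data diffusion sine-wave workload: Summary
Configuration                                          Completion time   Throughput   CPU time
GPFS                                                   5.70 hrs          ~8 Gb/s      1138 CPU hrs
DD+SRP (data diffusion, static resource provisioning)  1.80 hrs          ~25 Gb/s     361 CPU hrs
DD+DRP (data diffusion, dynamic resource provisioning) 1.86 hrs          ~24 Gb/s     253 CPU hrs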
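Normalizing the summary against the GPFS baseline makes the tradeoff explicit: both data diffusion configurations finish about 3x sooner, while dynamic provisioning cuts CPU-hours by about 4.5x versus about 3.2x for static provisioning, at nearly the same completion time. The sketch below only redoes that division on the slide's numbers; `relative_to` is a hypothetical helper name.

```python
# (completion time in hours, CPU-hours consumed), from the summary above
runs = {
    "GPFS": (5.70, 1138),
    "DD+SRP": (1.80, 361),
    "DD+DRP": (1.86, 253),
}


def relative_to(baseline: str, runs: dict) -> dict:
    """Speedup and CPU-hour reduction of each run relative to a baseline run."""
    base_time, base_cpu = runs[baseline]
    return {
        name: (base_time / hours, base_cpu / cpu)
        for name, (hours, cpu) in runs.items()
    }


for name, (speedup, saving) in relative_to("GPFS", runs).items():
    print(f"{name}: {speedup:.2f}x faster, {saving:.2f}x fewer CPU-hours")
```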
Clouds and supercomputers: Conventional wisdom?
                   Loosely coupled applications   Tightly coupled applications
Clouds/clusters    ✔                              Good for rapid response
Supercomputers     Excellent                      ✔
“The computer revolution hasn’t happened yet.”
Alan Kay, 1997
“When the network is as fast as the computer's
internal links, the machine disintegrates across
the net into a set of special purpose appliances”
(George Gilder, 2001)
[Figure: connectivity (on log scale) vs. time; Science: Grid; Enterprise: Cloud; Consumer: ????]
The Shape of Grids to Come?
Energy Internet
Thank you!
Computation Institute
www.ci.uchicago.edu
Keynote talk at the International Conference on Supercomputing 2009, at IBM Yorktown in New York. This is a major update of a talk first given in New Zealand last January.