Grid Computing Overview

Federating Compute and Storage Resources to Accelerate Science and Aid Collaboration


    1. Grid Computing Overview: Federating Compute and Storage Resources to Accelerate Science and Aid Collaboration. Ian Stokes-Rees, PhD, Harvard Medical School, Boston, USA. http://portal.sbgrid.org ijstokes@hkl.hms.harvard.edu
    2.–3. Slides and Contact: ijstokes@hkl.hms.harvard.edu http://linkedin.com/in/ijstokes http://slidesha.re/ijstokes-grid2011 http://www.sbgrid.org http://portal.sbgrid.org http://www.opensciencegrid.org
    4. Scientific Research Today
       • International collaborations
       • IT becomes embedded in the research process: data, results, analysis, visualization
       • Crossing institutional and national boundaries
       • Computational techniques increasingly important
         • ... and computationally intensive techniques as well
         • requires use of high performance computing systems
       • Data volumes are growing fast
         • hard to share
         • hard to manage
       • Scientific software often difficult to use, or to use properly
       • Web based tools increasingly important
         • but often disconnected from persisted and shared results
    5. SBGrid Consortium: member labs (slide shows PI names per institution) at Cornell U., Washington U. School of Medicine, Rosalind Franklin, NIH, UMass Medical, U. Washington, U. Maryland, Brandeis U., UC Davis, Tufts U., UCSF, Columbia U., Rockefeller U., Stanford, Yale U., CalTech, Harvard and affiliates, Rice University, Vanderbilt Center for Structural Biology, WesternU, UCSD, Thomas Jefferson, NE-CAT; not pictured: University of Toronto (L. Howell, E. Pai, F. Sicheri), NHRI Taiwan (G. Liou), Trinity College Dublin (Amir Khan)
    6. Boston Life Sciences Hub
       • Biomedical researchers
       • Government agencies
       • Life sciences
       • Universities (e.g. Tufts University School of Medicine)
       • Hospitals
    7.–12. Study of Protein Structure and Function (progressive build of one figure, adding scale markers: 400 m, 1 mm, 10 nm)
    13. Study of Protein Structure and Function (400 m / 1 mm / 10 nm)
        • Shared scientific data collection facility
        • Data intensive (10-100 GB/day)
    14.–16. Cryo Electron Microscopy (build slides)
        • Previously, 1-10,000 images, managed by hand
        • Now, robotic systems collect millions of images
        • estimated 250,000 CPU-hours to reconstruct a model
    17.–18. Molecular Dynamics Simulations (arithmetic sketched below)
        • 1 fs time step; snapshot every 1 ns (1e6 steps per frame)
        • 1 µs simulation → 1000 frames; at 10 MB/frame, ~10 GB per simulation
        • 20 CPU-years ≈ 3 months wall-clock
    19. Required: a collaborative environment for compute and data intensive science
    20. High Energy Physics
    21. High Energy Physics
        • 40 MHz bunch crossing rate
        • 10 million data channels
        • 1 kHz level-1 event recording rate
        • 1-10 MB per event
        • 14 hours per day, 7+ months / year
        • 4 detectors
        • 6 PB of data / year
        • globally distribute data for analysis (x2) (see the rate sketch below)
    22. Open Science Grid (http://opensciencegrid.org)
        • US national cyberinfrastructure
        • Primarily used for high energy physics computing
        • 80 sites, ~100,000 job slots
        • 5,073,293 hours (~570 years); ~1,500,000 hours per day
        • PB-scale aggregate storage; ~1 PB transferred each day
    23. (figure slide)
    24. [Screenshot of the Thai National Grid Center website, http://www.thaigrid.net/. Navigation: Home, About Us, Information, TNGP News, Calendar, Document Download, Jobs, Forums, Photogallery, Publications, Blog, Related Links, Guestbook, Contact Us, Travel. News items, translated from Thai: the National Grid Center and the Software Industry Promotion Agency (Public Organization), together with King Mongkut's University of Technology Thonburi, host a grid and cloud technology conference (6 May 2011); results announced for the Grid Technology Innovation competition held 12-13 February 2011 (25 March 2011); the R software package for statistical analysis and research (28 February 2011); a note on government policy to build national capacity toward a Knowledge Based Society.]
    25. Simplified Grid Architecture (diagram)
    26. Grid Opportunities
        • New compute intensive workflows
          • think big: tens or hundreds of thousands of hours finished in 1-2 days
          • sharing resources for efficient and large scale utilization
        • Data intensive problems
          • we mirror 20 GB of data to 30 computing centers
        • Data movement, management, and archive
        • Federated identity and user management
          • labs, collaborations or ad-hoc groups
          • role-based access control (RBAC) and IdM
        • Collaborative environment
        • Web-based access to applications
    27.–28. Web Portals for Collaborative, Multi-disciplinary Research... which leverage the capabilities of federated grid computing environments
    29. The Browser as the Universal Interface
        • If it isn't already obvious to you: any interactive application developed today should be web-based with a RESTful interface (if at all possible)
        • A rich set of tools and techniques
          • AJAX, HTML4/5, CSS, and JavaScript
          • Dynamic content negotiation
          • HTTP headers, caching, security, sessions/cookies
        • Scalable, replicable, centralized, multi-threaded, multi-user
        • Alternatives
          • Command Line (CLI): great for scriptable jobs
          • GUI toolkits: necessary for applications with high graphics or I/O demands
    30. What is a web portal?
        • A web-based gateway to resources and data
          • simplified access
          • centralized access
          • unified access (CGI, Perl, Python, PHP, static HTML, static files, etc.)
        • An attempt to provide uniform access to a range of services and resources
        • Data access via HTTP
        • Leverages the brilliance of Apache HTTPD and associated modules
    31. SBGrid Science Portal Objectives
        A. Extensible infrastructure to facilitate development and deployment of novel computational workflows
        B. Web-accessible environment for collaborative, compute and data intensive science
    32.–33. (figure slides)
    34. Protein Structure Determination
    35. (figure slide)
    36. Results Visualization and Analysis
    37. Data Access
    38. User access to results data
    39. (figure slide)
    40. Experimental Data Access
        • Collaboration
        • Access Control
        • Identity Management
        • Data Management
        • High Performance Data Movement
        • Multi-modal Access
    41. About 2 PB, with 100 front end servers for high bandwidth parallel file transfer
    42. Globus Online: High Performance, Reliable 3rd Party File Transfer (sketched below)
        • GUMS: DN to user mapping; VOMS: VO membership
        • (diagram: portal, cluster, data collection facility, lab file server, desktop, laptop)
    43. (figure slide)
    44. Identity Management and Security
    45. Access Control
    46.–48. Access Control (build slides)
        • Need a strong Identity Management environment
          • individuals: identity tokens and identifiers
          • groups: membership lists
          • Active Directory/CIFS (Windows), Open Directory (Apple), FreeIPA (Unix): all LDAP-based
        • Need to manage and communicate Access Control policies
          • institutionally driven
          • user driven
        • Need an Authorization System
          • Policy Enforcement Point (shell login, data access, web access, start application)
          • Policy Decision Point (stores policies and understands the relationship of identity token and policy)
    49. Access Control
        • What is a user?
          • .htaccess and .htpasswd
          • local system user (NIS or /etc/passwd)
          • portal framework user (proprietary DB schema)
          • grid user (X.509 DN)
        • What are we securing access to? Web pages? URLs? Data? Specific operations? Metadata?
        • What kind of policies do we enable?
          • Simplify to READ, WRITE, EXECUTE, LIST, ADMIN (see the sketch below)
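        A minimal sketch of the Policy Decision Point idea under those simplified verbs (the policy store, group table, DNs, and paths are all hypothetical, not the portal's actual schema):

            # Hypothetical policy store: resource -> principal -> set of verbs
            POLICIES = {
                "/data/run42": {
                    "group:sliz-lab": {"READ", "LIST"},
                    "/DC=org/DC=doegrids/OU=People/CN=Alice":
                        {"READ", "WRITE", "EXECUTE", "LIST", "ADMIN"},
                },
            }
            GROUPS = {"group:sliz-lab": {"/DC=org/DC=doegrids/OU=People/CN=Bob"}}

            def decide(dn, resource, verb):
                """Policy Decision Point: is this X.509 DN allowed this verb?"""
                acl = POLICIES.get(resource, {})
                if verb in acl.get(dn, set()):            # direct user grant
                    return True
                return any(verb in verbs                  # grant via group membership
                           for principal, verbs in acl.items()
                           if dn in GROUPS.get(principal, set()))

            # A Policy Enforcement Point (web server, login shell) would then call:
            assert decide("/DC=org/DC=doegrids/OU=People/CN=Bob", "/data/run42", "READ")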
    50. Unified Account Management (lookup sketched below)
        • Hierarchical LDAP database: user basics, passwords, standard schemas
        • Relational DB: user custom profiles, institutions, lab groups, custom schemas
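        A sketch of that split: authenticate and fetch the standard attributes from LDAP, then pull the site-specific profile from the relational side (server name, directory layout, and table are hypothetical; sqlite3 stands in for whatever relational DB is used; requires the python-ldap package):

            import ldap      # python-ldap
            import sqlite3

            def load_account(username, password):
                # 1) LDAP: authentication + standard attributes (e.g. inetOrgPerson)
                conn = ldap.initialize("ldap://ldap.portal.example.org")
                dn = "uid=%s,ou=people,dc=portal,dc=example,dc=org" % username
                conn.simple_bind_s(dn, password)          # raises on bad credentials
                _, attrs = conn.search_s(dn, ldap.SCOPE_BASE,
                                         attrlist=["cn", "mail"])[0]

                # 2) Relational DB: custom profile (institution, lab group)
                db = sqlite3.connect("profiles.db")
                row = db.execute("SELECT institution, lab_group FROM profiles "
                                 "WHERE uid = ?", (username,)).fetchone()
                return attrs, row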
    51. (figure slide)
    52. Harvard Catalyst brings together the intellectual force, technologies, and clinical expertise of Harvard University and its affiliates and partners to reduce the burden of human illness.
    53. Architecture Diagrams
    54. Service Architecture (diagram of the SBGrid Science Portal @ Harvard Medical School; services and hosting sites, approximately grouped)
        • Interfaces: Apache, GridSite, Django, WebDAV, Sage Math, R-Studio, scp, shell CLI
        • Computation: Condor, Cycle Server, Globus, glideinWMS factory @UC San Diego, Open Science Grid
        • Data: GridFTP, SRM, Hadoop, file server, SQL DB, GlobusOnline @Argonne, MyProxy @NCSA, UIUC
        • ID management: FreeIPA, LDAP, VOMS, GACL, DOEGrids CA @Lawrence Berkeley Labs, GUMS @FermiLab
        • Monitoring and accounting: Ganglia, Nagios, RSV, VDT, pacct, Gratia Accounting @Indiana
    55. (figure slide)
    56. Acknowledgements & Questions
        • Piotr Sliz: Principal Investigator, head of SBGrid
        • SBGrid Science Portal: Daniel O'Donovan, Meghan Porter-Mahoney
        • SBGrid System Administrators: Ian Levesque, Peter Doherty, Steve Jahl
        • Globus Online Team: Steve Tuecke, Ian Foster, Rachana Ananthakrishnan, Raj Kettimuthu
        • Ruth Pordes: Director of OSG, for championing SBGrid
    57. Acknowledgements & Questions (as slide 56, with contact details)
        • Please contact me with any questions: Ian Stokes-Rees, ijstokes@hkl.hms.harvard.edu, ijstokes@spmetric.com
        • Look at our work: portal.sbgrid.org, www.sbgrid.org, www.opensciencegrid.org
    58. Extra Slides
    59. Grid Architectural Details
        • Resources: uniform compute clusters; managed via batch queues; local scratch disk; sometimes a high performance network (e.g. InfiniBand); behind NAT and firewall; no shell access
        • Data: tape-backed mass storage; disk arrays (100s TB to PB); high bandwidth (multi-stream) transfer protocols; file catalogs; meta-data; replica management
        • Information: LDAP based most common (not optimized for writes); domain specific layer; an open problem!
        • Fabric: in most cases, assume a functioning Internet; some sites are part of experimental private networks
        • Security: typically underpinned by X.509 Public Key Infrastructure; same standards as SSL/TLS and "server certs" for "https"
    60. National Federated Cyberinfrastructure (diagram): the SBGrid user community connects to TeraGrid, NERSC, Open Science Grid, Odyssey, Orchestra, and EC2; the portal facilitates the interface between community and cyberinfrastructure
    61. Existing Security Infrastructure (proxy workflow sketched below)
        • X.509 certificates
          • Department of Energy CA
          • Regional/Institutional RAs (SBGrid is an RA)
        • X.509 proxy certificate system
          • Users self-sign a short-lived passwordless proxy certificate used as a "portable" and "automated" grid processing identity token
          • Similarities to Kerberos tokens
        • Virtual Organizations (VO) for definitions of roles, groups, attrs
        • Attribute Certificates
          • Users can (attempt to) fetch ACs from the VO to be attached to proxy certs
        • POSIX-like file access control (Grid ACL)
    62. (figure slide)
    63. Data Model: data tiers
        Tier          Scope                    Managed by      Lifetime     Size
        VO-wide       all sites                admin           very stable
        User project  all sites                user            1-10 weeks   1-3 GB
        User static   all sites                user            indefinite   10 MB
        Job set       all sites                infrastructure  1-10 days    0.1-1 GB
        Job           direct to worker node    infrastructure  1 day        <10 MB
        Job indirect  to worker node via UCSD  infrastructure  1 day        <10 GB
    64. Data Management and Movement
        • Management: quota, du scan, tmpwatch, conventions, workflow integration
        • Movement: scp (users), rsync (VO-wide), grid-ftp (UCSD), curl (WNs), cp (NFS), htcp (secure web)
    65.–73. Data flow (build slides; red = push files, green = pull files)
        1. user file upload
        2. replicate gold standard
        3. auto-replicate
        4. pull files from UCSD to WNs
        5. pull files from local NFS to WNs
        6. pull files from SBGrid to WNs
        7. job results copied back to SBGrid
        8a. large job results copied to UCSD; 8b. later pulled to SBGrid
    74.–80. (figure slides)
    81. MHC-TCR: 2VLJ (plot of Log Likelihood Gain vs. Translation Z score): a "weak" solution (2nx5q2) vs. a "strong" solution (1im3a2)
    82. Software components
        • NEBioGrid Django Portal: interactive dynamic web portal for workflow definition, submission, monitoring, and access control
        • NEBioGrid Web Portal: GridSite based web portal for file-system level access (raw job output), meta-data tagging, X.509 access control/sharing, CGI
        • PyGACL: Python representation of the GACL model and API to work with GACL files
        • osg_wrap: Swiss army knife OSG wrapper script to handle file staging, parameter sweep, DAG, results aggregation, monitoring
        • sbanalysis: data analysis and graphing tools for structural biology data sets
        • PyCCP4: Python wrappers around CCP4 structural biology applications
        • osg.monitoring: tools to enhance monitoring of job sets and remote OSG site status
        • PyCondor: Python wrappers around common Condor operations; enhanced Condor log analysis
        • shex: write bash scripts in Python: replicate commands, syntax, behavior
        • PyOSG: Python wrappers around common OSG operations
        • xconfig: universal configuration
    83. Example Job Set: 10k grid jobs, approx 30k CPU hours, 99.7% success rate, 24 wall clock hours
        (map of OSG sites with per-site job counts, e.g. MIT 5292, UWisc 1173, FNAL 1409; timeline of 10,000 jobs over 24 hours across local queue / remote queue / running; completed = green, evicted = red, held = orange)
    84. Job Lifelines
    85. Typical Layered Environment (stack: Fortran bin → Python API → multi-exec wrapper (map) → result aggregator (reduce) → grid management → web interface)
        • Command line application (e.g. Fortran)
        • Friendly application API wrapper
        • Batch execution wrapper for N iterations
        • Results extraction and aggregation
        • Grid job management wrapper
        • Web interface: forms, views, static HTML results
        • GOAL: eliminate shell scripts, often found as the "glue" language between layers (see the sketch below)
    86. REST
        • Don't try to read too much into the name
          • REpresentational State Transfer: coined by Roy Fielding, co-author of the HTTP protocol and contributor to the original Apache httpd server
        • Idea
          • The web is the world's largest asynchronous, distributed, parallel computational system
          • Resources are "hidden" but representations are accessible via URLs
          • Representations can be manipulated via the HTTP operations GET, PUT, POST, HEAD, DELETE and associated state
          • State transitions are initiated by software or by humans
        • Implication
          • Clean URLs (e.g. Flickr); see the sketch below
    87. (figure slide)
    88. Cloud Computing: industry's solution to the Grid
        • Virtualization has taken off in the past 5 years
          • VMWare, Xen, VirtualPC, VirtualBox, QEMU, etc.
          • Builds on ideas from VMS (i.e. old)
        • (Good) system administrators are hard to come by, and operating a large data center is costly
        • The Internet boom means there are companies that have figured out how to do this really well
          • Google, Amazon, Yahoo, Microsoft, etc.
        • Outsource IT infrastructure! Outsource software hosting!
          • Amazon EC2, Microsoft Azure, RightScale, Force.com, Google Apps
        • Over-simplified: you can't install a cloud; you can't buy a grid
    89. Is "Cloud" the new "Grid"?
        • Grid is about mechanisms for federated, distributed, heterogeneous shared compute and storage resources
          • standards and software
        • Cloud is about on-demand provisioning of compute and storage resources
          • services
        • No one buys a grid. No one installs a cloud.
    90. "The interesting thing about Cloud Computing is that we've redefined Cloud Computing to include everything that we already do. . . . I don't understand what we would do differently in the light of Cloud Computing other than change the wording of some of our ads."
        Larry Ellison, Oracle CEO, quoted in the Wall Street Journal, September 26, 2008
        http://blogs.wsj.com/biztech/2008/09/25/larry-ellisons-brilliant-anti-cloud-computing-rant/
    91. When is cloud computing interesting?
        • My definition of "cloud computing": dynamic compute and storage infrastructure provisioning in a scalable manner, providing uniform interfaces to virtualized resources
        • The underlying resources could be
          • "in-house", using licensed/purchased software/hardware
          • "external", hosted by a service/infrastructure provider
        • Consider using cloud computing if
          • you have operational problems/constraints in your current data center
          • you need to dynamically scale (up or down) access to services and data
          • you want fast provisioning, lots of bandwidth, and low latency
          • organizationally, you can live with outsourcing responsibility for (some of) your data and applications
        • Consider providing cloud computing services if
          • you have an ace team efficiently running your existing data center
          • you have lots of experience with virtualization
          • you have a specific application/domain that could benefit from being tied to a large compute farm or disk array with great Internet connectivity
