SlideShare a Scribd company logo
Adapting	
  federated	
  
   cyberinfrastructure	
  for	
  
    shared	
  data	
  collection	
  
facilities	
  in	
  structural	
  biology



                    GlobusWorld	
  2012

Ian	
  Stokes-­‐Rees	
  ijstokes@seas.harvard.edu	
  @ijstokes
                   Harvard	
  Medical	
  School
J.	
  Synchrotron	
  Rad.	
  (2012)	
  19
              doi:10.1107/S0909049512009776




Online	
  6	
  April	
  2012
Structural	
  Biology:
       Study	
  of	
  Protein	
  Structure	
  and	
  Function




                                                                    400m
                            1mm




                                                                           10nm
                                  • Shared	
  scientiRic	
  data	
  collection	
  facility
                                  • Data	
  intensive	
  (10-­‐100	
  GB/day)
Grid Overview - Ian Stokes-Rees                                     ijstokes@hkl.hms.harvard.edu
Cryo	
  Electron	
  Microscopy




      •   Previously,	
  1-­10,000	
  images,	
  managed	
  by	
  hand
      •   Now,	
  robotic	
  systems	
  collect	
  millions	
  of	
  hi-­res	
  images
      •   estimate	
  250,000	
  CPU-­hours	
  to	
  reconstruct	
  model
      •   20-­50	
  TB	
  of	
  data	
  in	
  most	
  extreme	
  case
Grid Overview - Ian Stokes-Rees                               ijstokes@hkl.hms.harvard.edu
Molecular	
  Dynamics	
  Simulations
                                        1	
  fs	
  time	
  step
                                        1ns	
  snapshot
                                        1	
  us	
  simulation
                                        1e6	
  steps
                                        1000	
  frames
                                        10	
  MB	
  /	
  frame
                                        10	
  GB	
  /	
  sim
                                        20	
  CPU-­years
                                        3	
  months	
  (wall-­
                                        clock)

Grid Overview - Ian Stokes-Rees   ijstokes@hkl.hms.harvard.edu
SBGrid	
  Consortium                                                Cornell U.
           Washington U. School of Med.
                                                                                                       R. Cerione              NE-CAT
           T. Ellenberger
                                                                                                       B. Crane                R. Oswald
           D. Fremont
                                                                                                       S. Ealick               C. Parrish
                                               Rosalind Franklin NIH                                   M. Jin                  H. Sondermann
                                               D. Harrison                    M. Mayer
                                                                                                       A. Ke                    UMass Medical
           U. Washington
           T. Gonen
                                                                              U. Maryland                                       W. Royer
                                                                              E. Toth
                                                                                                                                 Brandeis U.
           UC Davis                                                                                                              N. Grigorieff
           H. Stahlberg                                                                                                          Tufts U.
                                                                                                                                 K. Heldwein
           UCSF                                                                                                                  Columbia U.
           JJ Miranda                                                                                                            Q. Fan
           Y. Cheng
                                                                                                                                 Rockefeller U.
           Stanford                                                                                                              R. MacKinnon
           A. Brunger                                                                                               Yale U.
           K. Garcia                                                                                                T. Boggon            K. Reinisch
           T. Jardetzky                                                                                             D. Braddock          J. Schlessinger
                                                                                                                    Y. Ha                F. Sigworth
           CalTech                                                                                                  E. Lolis             F. Zhou
           P. Bjorkman                                                                                              Harvard and Affiliates
           W. Clemons                                                                                                   N. Beglova         A. Leschziner
           G. Jensen                       Rice University                                                              S. Blacklow        K. Miller
           D. Rees                           E. Nikonowicz                                                              B. Chen            A. Rao
                                             Y. Shamoo                Vanderbilt                                        J. Chou            T. Rapoport
                                             Y.J. Tao                 Center for Structural Biology                     J. Clardy          M. Samso
           WesternU
                                                                      W. Chazin          C. Sanders                     M. Eck             P. Sliz
           M. Swairjo
                                                                      B. Eichman         B. Spiller                     B. Furie           T. Springer
                                                                      M. Egli            M. Stone                       R. Gaudet          G. Verdine
           UCSD                                                       B. Lacy            M. Waterman                    M. Grant           G. Wagner
           T. Nakagawa                                                M. Ohi                                            S.C. Harrison      L. Walensky
           H. Viadiu                                                 Thomas Jefferson                                   J. Hogle           S.Walker
                                                                     J. Williams                                        D. Jeruzalmi       T.Walz
                                                                                                                        D. Kahne           J. Wang
       Not Pictured:
       University of Toronto: L. Howell, E. Pai, F. Sicheri; NHRI (Taiwan): G. Liou; Trinity College, Dublin: Amir Khan T. Kirchhausen     S. Wong
Grid Overview - Ian Stokes-Rees                                                                                   ijstokes@hkl.hms.harvard.edu
Boston	
  Life	
  Sciences	
  Hub


   •   Biomedical	
  researchers
   •   Government	
  agencies
   •   Life	
  sciences                                 Tufts



   •   Universities
                                                        Universit
                                                        y
                                                        School
                                                        of
                                                        Medicin
                                                        e



   •   Hospitals




Grid Overview - Ian Stokes-Rees      ijstokes@hkl.hms.harvard.edu
Data	
  Access
Globus	
  Online:	
  High	
  Performance	
  
           Reliable	
  3rd	
  Party	
  File	
  Transfer
GUMS
 DN	
  to	
  user	
  mapping                                                CertiRicate	
  Authority
VOMS                                                                           root	
  of	
  trust
 VO	
  membership




                        portal

         cluster

                                           Globus	
  Online
                                              Rile	
  transfer	
  service



                                                                                   data collection
                     lab file
                                                                                       facility
                     server



                                 desktop    laptop
User	
  can	
  directly	
  access	
  lab	
  
                                                       or	
  facility	
  data	
  from	
  laptop
                                                                  "Sco?"+from
                               Harvard                                                           general+public
Local	
  accounts	
  within                                       the+Sliz+lab                                    Public	
  access	
  available	
  to	
  
lab	
  infrastructure               ~sue
                                                                                                                  archived	
  data	
  through	
  web	
  
                                    ~andy                                                                         interface
                                    ~sco?
                                    Sliz+lab



                               NEBCAT+beamline+at+APS
                                                                                                      WWW
                                                            data
                                                            collec(on

Shared	
  (lab	
  level)	
  
accounts	
  at	
  facility
                                                                     /data/sliz                   /public/2009/
                                                                                                                           Embargo	
  policy	
  to	
  
                                         /stage/sliz                 /data/murphy                 /embarg/2010/            hold	
  deposited	
  data	
  
                                         /stage/murphy
                                                                                                                           for	
  agreed	
  time
Tiered	
  storage                                                    /data/deacon                 /embarg/2011/


                                       6+month                   10+TB+per+group+                  10+PB
                                    staging+storage             permanent+archive               public+archive
                                           Tier'1                     Tier'2                          Tier'3




                                                                                                                              VO	
  management
                                                                                            VOMS
                                   Globus'Online                      SBGrid'Science'Portal
Architecture
✦   SBGrid
    •       manages	
  all	
  user	
  account	
  creation	
  and	
  credential	
  mgmt
    •       hosts	
  MyProxy,	
  VOMS,	
  GridFTP,	
  and	
  user	
  interfaces

✦   Facility
    •       knows	
  about	
  lab	
  groups
        •       e.g.	
  “Harrison”,	
  “Sliz”
    •       delegates	
  knowledge	
  of	
  group	
  membership	
  to	
  SBGrid	
  VOMS
        •       facility	
  can	
  poll	
  VOMS	
  for	
  list	
  of	
  current	
  members
    •       uses	
  X.509	
  for	
  user	
  identiRication
    •       deploys	
  GridFTP	
  server

✦   Lab	
  group
    •       designates	
  group	
  manager	
  that	
  adds/removes	
  individuals
    •       deploys	
  GridFTP	
  server	
  or	
  Globus	
  Connect	
  client

✦   Individual
    •       username/password	
  to	
  access	
  facility	
  and	
  lab	
  storage
    •       Globus	
  Connect	
  for	
  personal	
  GridFTP	
  server	
  to	
  laptop
    •       Globus	
  Online	
  web	
  interface	
  to	
  “drive”	
  transfers
Objective

✦   Easy	
  to	
  use	
  high	
  performance	
  data	
  mgmt	
  
    environment
✦   Fast	
  and	
  reliable	
  Rile	
  transfer
    •   facility-­‐to-­‐lab,	
  facility-­‐to-­‐individual,	
  lab-­‐to-­‐individual
✦   Reduced	
  administrative	
  overhead
✦   Better	
  data	
  curation
✦   Release	
  data	
  to	
  public	
  after	
  embargo	
  period
Challenges
✦   Access	
  control
    •   visibility
    •   policies
✦   Provenance
    •   data	
  origin
    •   history
✦   Meta-­‐data
    •   attributes
    •   searching
Grid	
  Computing	
  Today
Capabilities

✦   Server-­‐to-­‐server	
  interaction
✦   Federated	
  aggregation	
  of	
  CPU	
  power
✦   Predictable	
  data	
  patterns
✦   Command-­‐line	
  access	
  from	
  specialized	
  servers
✦   Web-­‐based	
  access	
  through	
  “Science	
  Gateways”
    •   pre-­‐deRined	
  computing	
  workRlows
    •   e.g.	
  SBGrid	
  Science	
  Portal
Weaknesses

✦   Federated	
  user	
  identity	
  system
    •   X.509	
  digital	
  certiRicates	
  and	
  associated	
  apparatus
✦   Ease	
  of	
  use	
  for	
  end	
  users
✦   Data	
  management	
  tools
✦   Support	
  for	
  collaborations
Opportunities

✦   “Last	
  mile”	
  challenge
    •   to	
  the	
  desktop
    •   to	
  the	
  laptop
✦   UniRied	
  identity	
  management
    •   centralized	
  set	
  of	
  credentials	
  for	
  each	
  person
✦   Empower	
  collaborations	
  to	
  self-­‐manage
✦   Shift	
  of	
  focus	
  from	
  “compute”	
  to	
  “data”
    •   for	
  users
    •   for	
  facilities	
  where	
  data	
  is	
  the	
  main	
  challenge
User	
  Credentials
X.509	
  Digital	
  CertiRicates
✦   Analogy	
  to	
  a	
  passport:
    •   Application	
  form
    •   Sponsor’s	
  attestation
    •   Consular	
  services
        •   veriRication	
  of	
  application,	
  sponsor,	
  and	
  accompanying	
  
            identiRication	
  and	
  eligibility	
  documents
    •   Passport	
  issuing	
  ofRice
✦   Portable,	
  digital	
  passport
    •   Rixed	
  and	
  secure	
  user	
  identiRiers
        •   name,	
  email,	
  home	
  institution
    •   signed	
  by	
  widely	
  trusted	
  issuer
    •   time	
  limited
    •   ISO	
  standard
U1          U1        U1



Addressing	
  CertiRicate	
  Problems
                /.'           -.'.)"*&'                           012*%2!' 3%"!'
                                                                             )"*"!4&"'
                                                                             ,"!&'5":'14(!'
                                        !"#$"%&'%()*"+',"!&'
                                                                                 U1
                                      !"&$!*'&!4,5(*)'*$67"!''

                    *289:'4)"*&%'
                    !";("<'!"#$"%&'
                                       R1
time




                                       ;"!(9:'$%"!'"=()(7(=(&:'

                                         ,2*>!6'"=()(7(=(&:'
                                                                         S1
                    411!2;"',"!&'
                                       R2
       %()*',"!&'
                                         *289:'4;4(=47(=(&:'

                                            !"&!(";"',"!&'
                                                                                U2a
                                                                              "?12!&'%()*"+'
                                                                              ,"!&'5":'14(!'
VO	
  (Group)	
  Membership	
  
               Registration
             !")*#        !"#$%&'(#                                 *+,(-,.#    /-0.#
                                 +.0-0(:#50.:#:,#.0?>0-:#&0&90.-<'+#93#@A#

                                 .0?>0-:#!"#8.,>+-#4(%#.,70-#                           U2b
                  (,123#4%&'(#
                                  V1
                                         ;0.'23#>-0.#07'8'9'7':3#

                                          5,(6.&#07'8'9'7':3#
                                                                           S2
time




                  4++.,;0#&0&90.-<'+=#
                  8.,>+-=#4(%#.,70-#
        4%%#@A#                     V2
       :,#!")*#
                                                 (,123#


                                          .0?>0-:#!")*#$B#
                                            .0:>.(#!")*#$B#                        4%%#$B#:,#
                                                                                   +.,C3#50.:#
()'               AB)!'       !"#$%&'                   *+",-"#'               .-/#'

                                    ;<'                           #/>:/-$'+"#$%&'%66":,$'
       =3!#"I3'     ;<=*'          )@81,'        0/#176%?",'/8%1&'-/,$'
                                                                              /8%1&'0/#17/@'           U1
                                                4/,/#%$/'                   !"#$%&'(%)*+%
                      #/>:/-$'-14,/@'6/#$'      6/#$'9/3'+%1#'              ',,#""%+#0%
                                                                            1*$/'2%
                  #/$:#,'$#%691,4',:85/#'
                  ,"?23'%4/,$-'
                                         0/#123'/&14151&1$3'
                               A1a
                           6#/%$/'                    6",7#8'/&14151&1$3'
                           &"6%&'%66$'
                                                                              S1*
                       %++#"0/'6/#$'
time




                  -14,'6/#$'
                  ,"?23'%0%1&%51&1$3'     A1b
                                         -/$'#/$#1/0%&'-/#1%&',:85/#'
                      -/$';<'#14C$-'
                                         %66":,$'#/%@3',"?76%?",'
                                                                                 +"#$%&'&"41,'
                                                                                                 U2*
                  #/>:/-$'-14,/@'6/#?76%$/'
                  #/$:#,'-14,/@'6/#?76%$/'       D'E'+%1#'-14,/@'6/#$'
                                                 1,$"'!F(*GDH'7&/'
              #/41-$/#'+#"I3'6/#$'
                                                 H'E'6#/%$/'&"6%&'
              J1$C'=3!#"I3'                                                 !"#$%&'(%)*+%
                                                 +#"I3'6/#$'
                                                                            ',,#""%-#.#$'/#.%
                                                                            $#"*!$,#"%
Process	
  and	
  Design	
  Improvements
  ✦   Single	
  web-­‐form	
  application
      •   includes	
  e-­‐mail	
  veriRicationn
  ✦   Centralized	
  and	
  connected	
  credential	
  management
      •   FreeIPA	
  LDAP	
  -­‐	
  user	
  directory	
  and	
  credential	
  store
      •   VOMS	
  -­‐	
  lab,	
  institution,	
  and	
  collaboration	
  afRiliations
      •   MyProxy	
  -­‐	
  X.509	
  credential	
  store
  ✦   Overlap	
  administrative	
  roles
      •   system	
  admin
      •   registration	
  agent	
  for	
  certiRicate	
  authority	
  (approve	
  X.509	
  
          request)
      •   VO	
  administrator	
  to	
  register	
  group	
  afRiliations
  ✦   Automation
“Last	
  Mile”	
  and
  Ease	
  of	
  Use
Grid	
  computing	
  at	
  your	
  desk
✦   Platforms
    •   Windows
    •   OS	
  X
✦   Web	
  interfaces
    •   SBGrid	
  Science	
  Portal
✦   One-­‐stop-­‐shop	
  for	
  account	
  management
    •   Centralized	
  services
    •   Automated	
  processes
    •   Administrator	
  intervention
    •   Support
Ryan,	
  a	
  postdoc	
  in	
  the	
  
                                                                                                                                Frank	
  Lab	
  at	
  Columbia

                                                                                                                                Access	
  NRAMM	
  facilities	
  
                                                                                                                                securely	
  and	
  transfer	
  data	
  
                                                                                                                                back	
  to	
  home	
  institute
                   automated	
  X.509
                           check	
  SBGrid	
  for	
  
                   application group	
  
                           Ryan’s	
  
                           membership                                                /data/columbia/frank
                                                                    facility file
                                                                      server                    transfer	
  data	
  to	
  lab
                       veriRication	
  in	
  Frank	
  Lab,	
  so	
  
                                       of	
  
                                       grant	
  access	
  to	
  Riles
                       lab	
  membership


        SBGrid                                 Ryan	
  initiate	
  tfransfer	
  at	
  
                                                        applies	
   or	
  an	
           request	
  access
        Science                                account	
  at	
  the	
  SBGrid	
  
                                                    NRAMM                                to	
  NRAMM
         Portal                                Science	
  Portal                         facility
                                   using	
  credential	
  
                                                     notify	
  user	
  of	
  
                                   held	
  by	
  SBGrid                                                                                   lab file
                                                                                                                  desktop
                                                     completion                                                                           server
automated	
                                                                                                         /nfs/data/rsmith
Globus	
  Online	
  
application
                                        use	
  Globus	
  Online	
  
                                        to	
  manage
                                        transfer	
  from	
                                   /Users/Ryan
                                                                                    laptop
                                        NRAMM	
  back	
  to	
  lab
Architecture	
  Diagrams
Service	
  Architecture
            GlobusOnline                            UC San Diego
             @Argonne              GUMS
User                              GUMS
                               GridFTP +            glideinWMS
          data                  Hadoop                 factory         Open Science Grid


 computations
                                                                                    MyProxy
                                                                                  @NCSA, UIUC
 monitoring      interfaces            data          computation    ID mgmt
  Ganglia                         scp                Condor         FreeIPA
                 Apache                                                            DOEGrids CA
  Nagios                          GridFTP            Cycle Server                   @Lawrence
                 GridSite                                           LDAP
  RSV                             SRM                VDT                           Berkley Labs
                 Django                                             VOMS
                                                     Globus
  pacct                           WebDAV
                 Sage Math                                          GUMS
                                                     glideinWMS                   Gratia Acct'ing
                 R-Studio                                           GACL           @FermiLab
                                file          SQL
                 shell CLI    server          DB       cluster
                                                                                    Monitoring
 SBGrid Science Portal @ Harvard Medical School                                     @Indiana
Grid Overview - Ian Stokes-Rees   ijstokes@hkl.hms.harvard.edu
Acknowledgements
✦   Piotr	
  Sliz
✦   SBGrid	
  Science	
  Portal
    •   Daniel	
  O’Donovan,	
  Meghan	
  Porter-­‐Mahoney
✦   SBGrid	
  System	
  Administrators
    •   Ian	
  Levesque,	
  Peter	
  Doherty,	
  Steve	
  Jahl
✦   Facility	
  Collaborators
    •   Frank	
  Murphy	
  (NE-­‐CAT/APS)
    •   Ashley	
  Deacon	
  (JCSG/SLAC)
✦   Globus	
  Online	
  Team
    •   Steve	
  Tueke,	
  Ian	
  Foster,	
  Rachana	
  
        Ananthakrishnan,	
  Raj	
  Kettimuthu	
  
✦   Ruth	
  Pordes
    •   Director	
  of	
  OSG,	
  for	
  championing	
  SBGrid

More Related Content

More from Boston Consulting Group

Cloud-native Enterprise Data Science Teams
Cloud-native Enterprise Data Science TeamsCloud-native Enterprise Data Science Teams
Cloud-native Enterprise Data Science Teams
Boston Consulting Group
 
Cloud-native Enterprise Data Science Teams
Cloud-native Enterprise Data Science TeamsCloud-native Enterprise Data Science Teams
Cloud-native Enterprise Data Science Teams
Boston Consulting Group
 
Beyond the Science Gateway
Beyond the Science GatewayBeyond the Science Gateway
Beyond the Science Gateway
Boston Consulting Group
 
Anaconda Data Science Collaboration
Anaconda Data Science CollaborationAnaconda Data Science Collaboration
Anaconda Data Science Collaboration
Boston Consulting Group
 
2011 11 pre_cs50_accelerating_sciencegrid_ianstokesrees
2011 11 pre_cs50_accelerating_sciencegrid_ianstokesrees2011 11 pre_cs50_accelerating_sciencegrid_ianstokesrees
2011 11 pre_cs50_accelerating_sciencegrid_ianstokesrees
Boston Consulting Group
 
Big Data: tools and techniques for working with large data sets
Big Data: tools and techniques for working with large data setsBig Data: tools and techniques for working with large data sets
Big Data: tools and techniques for working with large data sets
Boston Consulting Group
 
Wide Search Molecular Replacement and the NEBioGrid portal interface
Wide Search Molecular Replacement and the NEBioGrid portal interfaceWide Search Molecular Replacement and the NEBioGrid portal interface
Wide Search Molecular Replacement and the NEBioGrid portal interface
Boston Consulting Group
 
To Infiniband and Beyond
To Infiniband and BeyondTo Infiniband and Beyond
To Infiniband and Beyond
Boston Consulting Group
 

More from Boston Consulting Group (8)

Cloud-native Enterprise Data Science Teams
Cloud-native Enterprise Data Science TeamsCloud-native Enterprise Data Science Teams
Cloud-native Enterprise Data Science Teams
 
Cloud-native Enterprise Data Science Teams
Cloud-native Enterprise Data Science TeamsCloud-native Enterprise Data Science Teams
Cloud-native Enterprise Data Science Teams
 
Beyond the Science Gateway
Beyond the Science GatewayBeyond the Science Gateway
Beyond the Science Gateway
 
Anaconda Data Science Collaboration
Anaconda Data Science CollaborationAnaconda Data Science Collaboration
Anaconda Data Science Collaboration
 
2011 11 pre_cs50_accelerating_sciencegrid_ianstokesrees
2011 11 pre_cs50_accelerating_sciencegrid_ianstokesrees2011 11 pre_cs50_accelerating_sciencegrid_ianstokesrees
2011 11 pre_cs50_accelerating_sciencegrid_ianstokesrees
 
Big Data: tools and techniques for working with large data sets
Big Data: tools and techniques for working with large data setsBig Data: tools and techniques for working with large data sets
Big Data: tools and techniques for working with large data sets
 
Wide Search Molecular Replacement and the NEBioGrid portal interface
Wide Search Molecular Replacement and the NEBioGrid portal interfaceWide Search Molecular Replacement and the NEBioGrid portal interface
Wide Search Molecular Replacement and the NEBioGrid portal interface
 
To Infiniband and Beyond
To Infiniband and BeyondTo Infiniband and Beyond
To Infiniband and Beyond
 

Recently uploaded

Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
BibashShahi
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
Javier Junquera
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
Pablo Gómez Abajo
 
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
Edge AI and Vision Alliance
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
Edge AI and Vision Alliance
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
AstuteBusiness
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
ScyllaDB
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Pitangent Analytics & Technology Solutions Pvt. Ltd
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 

Recently uploaded (20)

Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
 
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Artificial Intelligence and Electronic Warfare
Artificial Intelligence and Electronic WarfareArtificial Intelligence and Electronic Warfare
Artificial Intelligence and Electronic Warfare
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 

Adapting federated cyberinfrastructure for shared data collection facilities in structural biology

  • 1. Adapting  federated   cyberinfrastructure  for   shared  data  collection   facilities  in  structural  biology GlobusWorld  2012 Ian  Stokes-­‐Rees  ijstokes@seas.harvard.edu  @ijstokes Harvard  Medical  School
  • 2. J.  Synchrotron  Rad.  (2012)  19 doi:10.1107/S0909049512009776 Online  6  April  2012
  • 3. Structural  Biology: Study  of  Protein  Structure  and  Function 400m 1mm 10nm • Shared  scientiRic  data  collection  facility • Data  intensive  (10-­‐100  GB/day) Grid Overview - Ian Stokes-Rees ijstokes@hkl.hms.harvard.edu
  • 4. Cryo  Electron  Microscopy • Previously,  1-­10,000  images,  managed  by  hand • Now,  robotic  systems  collect  millions  of  hi-­res  images • estimate  250,000  CPU-­hours  to  reconstruct  model • 20-­50  TB  of  data  in  most  extreme  case Grid Overview - Ian Stokes-Rees ijstokes@hkl.hms.harvard.edu
  • 5. Molecular  Dynamics  Simulations 1  fs  time  step 1ns  snapshot 1  us  simulation 1e6  steps 1000  frames 10  MB  /  frame 10  GB  /  sim 20  CPU-­years 3  months  (wall-­ clock) Grid Overview - Ian Stokes-Rees ijstokes@hkl.hms.harvard.edu
  • 6. SBGrid  Consortium Cornell U. Washington U. School of Med. R. Cerione NE-CAT T. Ellenberger B. Crane R. Oswald D. Fremont S. Ealick C. Parrish Rosalind Franklin NIH M. Jin H. Sondermann D. Harrison M. Mayer A. Ke UMass Medical U. Washington T. Gonen U. Maryland W. Royer E. Toth Brandeis U. UC Davis N. Grigorieff H. Stahlberg Tufts U. K. Heldwein UCSF Columbia U. JJ Miranda Q. Fan Y. Cheng Rockefeller U. Stanford R. MacKinnon A. Brunger Yale U. K. Garcia T. Boggon K. Reinisch T. Jardetzky D. Braddock J. Schlessinger Y. Ha F. Sigworth CalTech E. Lolis F. Zhou P. Bjorkman Harvard and Affiliates W. Clemons N. Beglova A. Leschziner G. Jensen Rice University S. Blacklow K. Miller D. Rees E. Nikonowicz B. Chen A. Rao Y. Shamoo Vanderbilt J. Chou T. Rapoport Y.J. Tao Center for Structural Biology J. Clardy M. Samso WesternU W. Chazin C. Sanders M. Eck P. Sliz M. Swairjo B. Eichman B. Spiller B. Furie T. Springer M. Egli M. Stone R. Gaudet G. Verdine UCSD B. Lacy M. Waterman M. Grant G. Wagner T. Nakagawa M. Ohi S.C. Harrison L. Walensky H. Viadiu Thomas Jefferson J. Hogle S.Walker J. Williams D. Jeruzalmi T.Walz D. Kahne J. Wang Not Pictured: University of Toronto: L. Howell, E. Pai, F. Sicheri; NHRI (Taiwan): G. Liou; Trinity College, Dublin: Amir Khan T. Kirchhausen S. Wong Grid Overview - Ian Stokes-Rees ijstokes@hkl.hms.harvard.edu
  • 7. Boston  Life  Sciences  Hub • Biomedical  researchers • Government  agencies • Life  sciences Tufts • Universities Universit y School of Medicin e • Hospitals Grid Overview - Ian Stokes-Rees ijstokes@hkl.hms.harvard.edu
  • 9. Globus  Online:  High  Performance   Reliable  3rd  Party  File  Transfer GUMS DN  to  user  mapping CertiRicate  Authority VOMS root  of  trust VO  membership portal cluster Globus  Online Rile  transfer  service data collection lab file facility server desktop laptop
  • 10. User  can  directly  access  lab   or  facility  data  from  laptop "Sco?"+from Harvard general+public Local  accounts  within the+Sliz+lab Public  access  available  to   lab  infrastructure ~sue archived  data  through  web   ~andy interface ~sco? Sliz+lab NEBCAT+beamline+at+APS WWW data collec(on Shared  (lab  level)   accounts  at  facility /data/sliz /public/2009/ Embargo  policy  to   /stage/sliz /data/murphy /embarg/2010/ hold  deposited  data   /stage/murphy for  agreed  time Tiered  storage /data/deacon /embarg/2011/ 6+month 10+TB+per+group+ 10+PB staging+storage permanent+archive public+archive Tier'1 Tier'2 Tier'3 VO  management VOMS Globus'Online SBGrid'Science'Portal
  • 11. Architecture ✦ SBGrid • manages  all  user  account  creation  and  credential  mgmt • hosts  MyProxy,  VOMS,  GridFTP,  and  user  interfaces ✦ Facility • knows  about  lab  groups • e.g.  “Harrison”,  “Sliz” • delegates  knowledge  of  group  membership  to  SBGrid  VOMS • facility  can  poll  VOMS  for  list  of  current  members • uses  X.509  for  user  identiRication • deploys  GridFTP  server ✦ Lab  group • designates  group  manager  that  adds/removes  individuals • deploys  GridFTP  server  or  Globus  Connect  client ✦ Individual • username/password  to  access  facility  and  lab  storage • Globus  Connect  for  personal  GridFTP  server  to  laptop • Globus  Online  web  interface  to  “drive”  transfers
  • 12.
  • 13. Objective ✦ Easy  to  use  high  performance  data  mgmt   environment ✦ Fast  and  reliable  Rile  transfer • facility-­‐to-­‐lab,  facility-­‐to-­‐individual,  lab-­‐to-­‐individual ✦ Reduced  administrative  overhead ✦ Better  data  curation ✦ Release  data  to  public  after  embargo  period
  • 14. Challenges ✦ Access  control • visibility • policies ✦ Provenance • data  origin • history ✦ Meta-­‐data • attributes • searching
  • 16. Capabilities ✦ Server-­‐to-­‐server  interaction ✦ Federated  aggregation  of  CPU  power ✦ Predictable  data  patterns ✦ Command-­‐line  access  from  specialized  servers ✦ Web-­‐based  access  through  “Science  Gateways” • pre-­‐deRined  computing  workRlows • e.g.  SBGrid  Science  Portal
  • 17. Weaknesses ✦ Federated  user  identity  system • X.509  digital  certiRicates  and  associated  apparatus ✦ Ease  of  use  for  end  users ✦ Data  management  tools ✦ Support  for  collaborations
  • 18. Opportunities ✦ “Last  mile”  challenge • to  the  desktop • to  the  laptop ✦ UniRied  identity  management • centralized  set  of  credentials  for  each  person ✦ Empower  collaborations  to  self-­‐manage ✦ Shift  of  focus  from  “compute”  to  “data” • for  users • for  facilities  where  data  is  the  main  challenge
  • 20. X.509  Digital  CertiRicates ✦ Analogy  to  a  passport: • Application  form • Sponsor’s  attestation • Consular  services • veriRication  of  application,  sponsor,  and  accompanying   identiRication  and  eligibility  documents • Passport  issuing  ofRice ✦ Portable,  digital  passport • Rixed  and  secure  user  identiRiers • name,  email,  home  institution • signed  by  widely  trusted  issuer • time  limited • ISO  standard
  • 21. U1 U1 U1 Addressing  CertiRicate  Problems /.' -.'.)"*&' 012*%2!' 3%"!' )"*"!4&"' ,"!&'5":'14(!' !"#$"%&'%()*"+',"!&' U1 !"&$!*'&!4,5(*)'*$67"!'' *289:'4)"*&%' !";("<'!"#$"%&' R1 time ;"!(9:'$%"!'"=()(7(=(&:' ,2*>!6'"=()(7(=(&:' S1 411!2;"',"!&' R2 %()*',"!&' *289:'4;4(=47(=(&:' !"&!(";"',"!&' U2a "?12!&'%()*"+' ,"!&'5":'14(!'
  • 22. VO  (Group)  Membership   Registration !")*# !"#$%&'(# *+,(-,.# /-0.# +.0-0(:#50.:#:,#.0?>0-:#&0&90.-<'+#93#@A# .0?>0-:#!"#8.,>+-#4(%#.,70-# U2b (,123#4%&'(# V1 ;0.'23#>-0.#07'8'9'7':3# 5,(6.&#07'8'9'7':3# S2 time 4++.,;0#&0&90.-<'+=# 8.,>+-=#4(%#.,70-# 4%%#@A# V2 :,#!")*# (,123# .0?>0-:#!")*#$B# .0:>.(#!")*#$B# 4%%#$B#:,# +.,C3#50.:#
  • 23. ()' AB)!' !"#$%&' *+",-"#' .-/#' ;<' #/>:/-$'+"#$%&'%66":,$' =3!#"I3' ;<=*' )@81,' 0/#176%?",'/8%1&'-/,$' /8%1&'0/#17/@' U1 4/,/#%$/' !"#$%&'(%)*+% #/>:/-$'-14,/@'6/#$' 6/#$'9/3'+%1#' ',,#""%+#0% 1*$/'2% #/$:#,'$#%691,4',:85/#' ,"?23'%4/,$-' 0/#123'/&14151&1$3' A1a 6#/%$/' 6",7#8'/&14151&1$3' &"6%&'%66$' S1* %++#"0/'6/#$' time -14,'6/#$' ,"?23'%0%1&%51&1$3' A1b -/$'#/$#1/0%&'-/#1%&',:85/#' -/$';<'#14C$-' %66":,$'#/%@3',"?76%?",' +"#$%&'&"41,' U2* #/>:/-$'-14,/@'6/#?76%$/' #/$:#,'-14,/@'6/#?76%$/' D'E'+%1#'-14,/@'6/#$' 1,$"'!F(*GDH'7&/' #/41-$/#'+#"I3'6/#$' H'E'6#/%$/'&"6%&' J1$C'=3!#"I3' !"#$%&'(%)*+% +#"I3'6/#$' ',,#""%-#.#$'/#.% $#"*!$,#"%
  • 24. Process  and  Design  Improvements ✦ Single  web-­‐form  application • includes  e-­‐mail  veriRicationn ✦ Centralized  and  connected  credential  management • FreeIPA  LDAP  -­‐  user  directory  and  credential  store • VOMS  -­‐  lab,  institution,  and  collaboration  afRiliations • MyProxy  -­‐  X.509  credential  store ✦ Overlap  administrative  roles • system  admin • registration  agent  for  certiRicate  authority  (approve  X.509   request) • VO  administrator  to  register  group  afRiliations ✦ Automation
  • 25. “Last  Mile”  and Ease  of  Use
  • 26. Grid  computing  at  your  desk ✦ Platforms • Windows • OS  X ✦ Web  interfaces • SBGrid  Science  Portal ✦ One-­‐stop-­‐shop  for  account  management • Centralized  services • Automated  processes • Administrator  intervention • Support
  • 27. Ryan,  a  postdoc  in  the   Frank  Lab  at  Columbia Access  NRAMM  facilities   securely  and  transfer  data   back  to  home  institute automated  X.509 check  SBGrid  for   application group   Ryan’s   membership /data/columbia/frank facility file server transfer  data  to  lab veriRication  in  Frank  Lab,  so   of   grant  access  to  Riles lab  membership SBGrid Ryan  initiate  tfransfer  at   applies   or  an   request  access Science account  at  the  SBGrid   NRAMM to  NRAMM Portal Science  Portal facility using  credential   notify  user  of   held  by  SBGrid lab file desktop completion server automated   /nfs/data/rsmith Globus  Online   application use  Globus  Online   to  manage transfer  from   /Users/Ryan laptop NRAMM  back  to  lab
  • 29. Service  Architecture GlobusOnline UC San Diego @Argonne GUMS User GUMS GridFTP + glideinWMS data Hadoop factory Open Science Grid computations MyProxy @NCSA, UIUC monitoring interfaces data computation ID mgmt Ganglia scp Condor FreeIPA Apache DOEGrids CA Nagios GridFTP Cycle Server @Lawrence GridSite LDAP RSV SRM VDT Berkley Labs Django VOMS Globus pacct WebDAV Sage Math GUMS glideinWMS Gratia Acct'ing R-Studio GACL @FermiLab file SQL shell CLI server DB cluster Monitoring SBGrid Science Portal @ Harvard Medical School @Indiana
  • 30. Grid Overview - Ian Stokes-Rees ijstokes@hkl.hms.harvard.edu
  • 31. Acknowledgements ✦ Piotr  Sliz ✦ SBGrid  Science  Portal • Daniel  O’Donovan,  Meghan  Porter-­‐Mahoney ✦ SBGrid  System  Administrators • Ian  Levesque,  Peter  Doherty,  Steve  Jahl ✦ Facility  Collaborators • Frank  Murphy  (NE-­‐CAT/APS) • Ashley  Deacon  (JCSG/SLAC) ✦ Globus  Online  Team • Steve  Tueke,  Ian  Foster,  Rachana   Ananthakrishnan,  Raj  Kettimuthu   ✦ Ruth  Pordes • Director  of  OSG,  for  championing  SBGrid