SBGrid Science Portal - eScience 2012

The SBGrid Science Portal provides multi-modal access to computational infrastructure, data storage, and data analysis tools for the structural biology community. It incorporates features not previously seen in cyberinfrastructure science gateways. It enables researchers to securely share a computational study area, including large volumes of data and active computational workflows. A rich identity management system has been developed that simplifies federated access to US national cyberinfrastructure, distributed data storage, and high performance file transfer tools. It integrates components from the Virtual Data Toolkit, Condor, glideinWMS, the Globus Toolkit and Globus Online, the FreeIPA identity management system, Apache web server, and the Django web framework.

SBGrid Science Portal - eScience 2012

  1. 1. The SBGrid Science Portal: An integrated environment for protein structure studies
     Ian Stokes-Rees, Harvard Medical School
     eScience 2012, Chicago, October 2012
  7. 7. What’s interesting about another Science Portal?
     ✦ Interface modalities
       • Web forms, RESTful interfaces, command line
     ✦ Access model
       • Browser SSO, X.509, LDAP, .htaccess, GACL
     ✦ Identity management
       • Streamlined grid account creation
     ✦ Computational capability
       • local, cluster, and grid computing
     ✦ Data management
       • Web (HTTP), scp, GridFTP, GlobusOnline
       • Tiered staging of data
     j.mp/esci12-sbgrid ijstokes@seas.harvard.edu
  10. 10. I’m still skeptical. What about Taverna, GridSphere, Galaxy, or HubZero?
      ✦ All great if
        • the portal or application plugin already exists; and
        • the application workflows closely match your requirements
      ✦ Not-so-great if
        • you have to implement a new portal on top of one of those frameworks
        • you want to adapt the workflow
        • your data model changes
        • you want to add a new application
        • you want to explore the data in an unanticipated way
        • command-line access is also important to you
        • you are working with others
  11. 11. Links
      www.sbgrid.org
      portal.sbgrid.org
      j.mp/esci12-sbgrid
      ijstokes@seas.harvard.edu
      @ijstokes
  12. 12. Outline
      ✦ Community
        • Who the SBGrid Science Portal is meant to serve
      ✦ Objectives
        • What was the vision for the Science Portal
      ✦ Implementation
        • Software and service architectures
      ✦ Security, Collaboration, and IdM
        • ... or “How I learned to stop worrying and love X.509”
      ✦ Data
        • Tiered data distribution model
  13. 13. Community
      (Map of SBGrid member labs, with principal investigators listed per institution)
      Institutions shown: Washington U. School of Med., Cornell U., NE-CAT, Rosalind Franklin, NIH, UMass Medical, U. Washington, U. Maryland, Brandeis U., UC Davis, Tufts U., UCSF, Columbia U., Rockefeller U., Stanford, Yale U., CalTech, Harvard and Affiliates, Rice University, Vanderbilt Center for Structural Biology, WesternU, UCSD, Thomas Jefferson
      Not pictured: University of Toronto: L. Howell, E. Pai, F. Sicheri; NHRI (Taiwan): G. Liou; Trinity College, Dublin: Amir Khan
  15. 15. Structural Biology: Study of Protein Structure and Function
      (Images with scale labels: 400 m, 1 mm, 10 nm)
      • Shared scientific data collection facility
      • Data intensive (10-100 GB/day)
  16. 16. Consortium By The Numbers
      ✦ ~200 member labs
        • representing about 1500 users
      ✦ ~200 software packages
        • multi-platform (Linux, OS X)
        • multi-version
      ✦ 4 FTE staff
      ✦ Automated software distribution
        • 80 GB for full package
        • rsync+ssh for updates
      ✦ Everything “Just Works”
        • So labs are happy to renew membership and refer friends
  17. 17. Boston Life Sciences Hub
      • Biomedical researchers
      • Government agencies
      • Life sciences
      • Universities
      • Hospitals
      (Logo: Tufts University School of Medicine)
  29. 29. Hug a Life Scientist!
      ✦ Let them know you care ...
        • ... because the software we give them doesn’t
        • ... and neither do the systems we subject them to
        • ... but to be fair, a lot of the pain is self-inflicted
      ✦ SBGrid came into existence to fill the tech void/pain experienced by structural biologists
      ✦ Started with providing reliable compiled software
      ✦ Expanded into
        • training events and workshops
        • best practice guides
        • shared computational infrastructure (clusters! OSG! GlobusOnline!)
        • web-based collaborative computational and data services
  30. 30. Objectives
      A. Extensible infrastructure to facilitate development and deployment of novel computational workflows
      B. Web-accessible environment for collaborative, compute and data intensive science
  39. 39. Objectives (explained)
      ✦ Pareto Principle
        • 80% of the time users are happy with basic web form interface to standard application workflow and canned result analysis
        • 20% of the effort to address these routine cases
        • Science Portals are a big win over cumbersome and complex Fortran code
      ✦ Corollary to Pareto Principle
        • 20% of the time users want or need customized application workflow and/or result analysis
        • 80% of the effort to make possible
        • But rare that anyone knows in advance whether 80 or 20 side
  50. 50. My Experience and Perspective (Audience Participation)
      ✦ The really interesting stuff happens
        • in the unpredictable 20%
      ✦ Innovative analytical strategies require
        • an ability to rapidly adjust workflow and data analysis
      ✦ You’re stuffed
        • if workflow and data are tightly coupled to portal framework
      ✦ Collaboration is critical:
        • you need to be able to share your work (securely)
        • the web is the obvious (only!) way anyone wants to do this
  51. 51. Implementation and Architecture
  52. 52. Front End Interface
      ✦ Django (Python) web framework
      ✦ Apache web server
      ✦ Per-user protected jobs and data
      ✦ WebDAV to data
      ✦ ssh access possible
      ✦ Richer access control in development
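One way to picture "per-user protected jobs and data" is a path check that runs before the web layer serves a file. The following is a minimal sketch, assuming a hypothetical layout in which each user's files live under `<data_root>/<username>/`. The function name, the layout, and the example paths are illustrative assumptions and are not taken from the portal's code.

```python
from pathlib import PurePosixPath


def resolve_user_path(data_root: str, username: str, requested: str) -> PurePosixPath:
    """Map a requested relative path into the user's own directory.

    Hypothetical layout: <data_root>/<username>/...
    Raises PermissionError if the request would escape that directory.
    """
    if not username or "/" in username or username in (".", ".."):
        raise ValueError("invalid username")
    req = PurePosixPath(requested)
    if req.is_absolute():
        raise PermissionError("absolute paths are not allowed")

    # Resolve ".." lexically so that a request can never climb above the user's root.
    parts = []
    for part in req.parts:
        if part == "..":
            if not parts:
                raise PermissionError("path escapes the user's directory")
            parts.pop()
        else:
            parts.append(part)
    return (PurePosixPath(data_root) / username).joinpath(*parts)
```

This check is purely lexical. A real deployment would also need to consider symbolic links and would normally rely on file system ownership as a second line of defence.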
  53. 53. Results Visualization and Analysis
  54. 54. NoSQL hierarchical document store
      ✦ The SBGrid Portal’s leading workflow:
        • 100,000 jobs
        • 300,000 output files
        • 20-100k CPU-hours
      ✦ Need a good way to store data
        • flexible data format
        • flexible analysis output
        • fine grained, user-driven access control
        • parallel access
        • remote access
      ✦ high capacity non-relational hierarchical storage
        • ????
  57. 57. Operating Systems are Pretty Good
      ✦ File systems work well
        • organize data carefully (hierarchically)
        • include meta-data (mod_cern_meta, file system)
        • serve intelligently via multiple protocols (http, gridftp)
        • leverage POSIX ownerships (user, group, other, r/w)
        • leverage user, group, and volume quotas
        • storage management and backups are easier
      ✦ Process management works well
        • execute as the actual user, where possible
        • setuid, su, ssh, suexec, and gsexec can all help with this
        • process accounting is your friend! (pacct)
        • leverage ulimit for process resource limits
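The "leverage POSIX ownerships" point can be made concrete with the standard owner/group/other permission check. The sketch below is a simplified model of that rule: it ignores the superuser and ACLs, and the function name and numeric IDs are illustrative assumptions.

```python
import stat


def posix_allows(mode: int, file_uid: int, file_gid: int,
                 uid: int, gids: set, want: str) -> bool:
    """Return True if a process (uid, gids) may perform `want` on the file.

    `want` is a string drawn from "r", "w", "x". Exactly one permission class
    applies: owner if the uid matches, else group if the file's gid is among
    the process's groups, else other.
    """
    if uid == file_uid:
        bits = (stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR)
    elif file_gid in gids:
        bits = (stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP)
    else:
        bits = (stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH)
    table = dict(zip("rwx", bits))
    return all(bool(mode & table[flag]) for flag in want)
```

For a file with mode `0o640`, the owner can read and write, group members can only read, and everyone else is denied. Sharing a study area with collaborators then reduces to putting them in the file's group.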
  58. 58. Data Access
  59. 59. Same data served by web and available from command line
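The idea of using the file system itself as the hierarchical store, so that the web server and the command line see the same data, can be sketched as follows. The `<user>/<study>/<job>` layout, the function names, and the JSON sidecar files are assumptions made for illustration. In particular, the sidecar is not the mod_cern_meta format mentioned in the slides.

```python
import json
import tempfile
from pathlib import Path


def store_result(root, user, study, job_id, name, data: bytes, meta: dict) -> Path:
    """Write one output file plus a JSON metadata sidecar into the hierarchy."""
    job_dir = Path(root) / user / study / job_id
    job_dir.mkdir(parents=True, exist_ok=True)
    out = job_dir / name
    out.write_bytes(data)
    (job_dir / (name + ".meta.json")).write_text(json.dumps(meta, sort_keys=True))
    return out


def list_results(root, user, study) -> dict:
    """Return {path relative to the study: metadata} for every stored result."""
    study_dir = Path(root) / user / study
    suffix = ".meta.json"
    results = {}
    for meta_file in sorted(study_dir.glob("*/*" + suffix)):
        data_file = meta_file.with_name(meta_file.name[: -len(suffix)])
        key = data_file.relative_to(study_dir).as_posix()
        results[key] = json.loads(meta_file.read_text())
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        store_result(root, "alice", "study1", "job0001", "score.txt",
                     b"42\n", {"status": "done"})
        print(list_results(root, "alice", "study1"))
```

Because the results are ordinary files, the same tree can be exported over HTTP, WebDAV, or GridFTP, or browsed over ssh, without a separate database export step.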
  60. 60. Open Science Grid
      http://opensciencegrid.org
      ✦ US National Cyberinfrastructure
      ✦ Primarily used for high energy physics computing
      ✦ 80 sites
      ✦ O(1e5) job slots
      ✦ O(1e6) core-hours per day
      ✦ PB scale aggregate storage
      (Chart label: 5,073,293 hours, ~570 years)
  61. 61. Service Architecture
      (Architecture diagram; component labels grouped as they appear on the slide)
      ✦ SBGrid Science Portal @ Harvard Medical School
        • monitoring: Ganglia, Nagios, RSV, pacct
        • interfaces: Apache, GridSite, Django, WebDAV, Sage Math, R-Studio, shell CLI
        • data: scp, GridFTP, SRM, file server, SQL DB
        • computation: Condor, Cycle Server, VDT, Globus, glideinWMS, cluster
        • ID mgmt: FreeIPA, LDAP, VOMS, GUMS, GACL
      ✦ External services
        • GlobusOnline @Argonne
        • glideinWMS factory @UC San Diego
        • MyProxy @NCSA, UIUC
        • DOEGrids CA @Lawrence Berkeley Labs
        • Gratia Accting @FermiLab
        • Monitoring @Indiana
      ✦ Open Science Grid
        • computations and data at sites running GUMS and GridFTP + Hadoop
  67. 67. SBGrid Portal: Current Status
      ✦ 262 users (lifetime), 72 active in past quarter
      ✦ 2.4 million hours on OSG last 12 months
      ✦ Seamless data sharing from web to ssh?
        • requires NFSv4 to allow >12 POSIX groups/user
        • suexec or gsexec possibility
      ✦ Account integration
        • PAM (ssh/command line) + web through FreeIPA LDAP
        • prototype of X.509 + VOMS + MyProxy (next section!)
      ✦ Collaboration
        • shared secret (password)
        • manual .htaccess or .gacl
  68. 68. Identity Management*
      * or “How I learned to stop worrying and love X.509”
  72. 72. Big Picture
      ✦ Federated environment requires
        • federated identity management
        • trusted identity providers (“roots of trust”)
      ✦ Collaboration requires
        • user-driven capacity to form cross-organization user groups (aka “Virtual Organizations”)
        • roles (or at least privilege levels) within VO
      ✦ State of Play
        • InCommon will get us part way there (waiting on adoption!)
        • OpenID nice for users, but no trust or delegated perms
        • X.509 process and details still tough for end user
        • SSH keys lack standard root of trust and roles
  73. 73. X.509 Digital Certificates
      ✦ Analogy to a passport:
        • Application form
        • Sponsor’s attestation
        • Consular services
          • verification of application, sponsor, and accompanying identification and eligibility documents
        • Passport issuing office
      ✦ Portable, digital passport
        • fixed and secure user identifiers
          • name, email, home institution
        • signed by widely trusted issuer
        • time limited
        • ISO standard
  74. 74. X.509 Challenges
      ✦ Lots of “humans in the loop” to get usable cert
        • Registration Agent, Sponsor, VO Manager, User
      ✦ Awkward working with X.509 certs
        • multiple formats
        • proxy certs and VOMS ACs
        • proxy servers (MyProxy)
        • expiry (of proxy, of base cert, of VO membership)
        • browser integration and import process
        • CA cert chain
        • digital token needs to be available on all devices
          • particularly challenging for phones and tablets
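The expiry bullet hides a small but easily overlooked rule: a credential stops working as soon as the earliest of its three lifetimes ends. The sketch below shows that rule using plain timestamps. The dictionary keys and the example lifetimes are illustrative assumptions, and no real certificate parsing is performed.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Tuple


def effective_expiry(expiries: Dict[str, datetime]) -> Tuple[str, datetime]:
    """Return (name, time) of whichever credential component expires first."""
    name = min(expiries, key=lambda key: expiries[key])
    return name, expiries[name]


def is_usable(expiries: Dict[str, datetime], now: datetime) -> bool:
    """A credential is usable only while every component is still valid."""
    return now < effective_expiry(expiries)[1]


if __name__ == "__main__":
    now = datetime(2012, 10, 8, 12, 0, tzinfo=timezone.utc)
    expiries = {
        "base_cert": now + timedelta(days=300),      # assumed lifetime
        "vo_membership": now + timedelta(days=20),   # assumed lifetime
        "proxy": now + timedelta(hours=12),          # assumed lifetime
    }
    print(effective_expiry(expiries))
```

Reporting which component expires first matters for users, because the remedy differs: a new proxy can be generated automatically, while a lapsed base certificate or VO membership needs a person to act.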
  80. 80. X.509 Nirvana (ours at least)
      ✦ User never sees X.509 anything
        • unless they want to
      ✦ X.509 request + VO membership + account creation completed in one step by one person
        • single step for user
        • single step for one administrator
      ✦ Goodbye passphrases (and forgotten passphrases)
        • hold private key in LDAP and use LDAP authentication to access
      ✦ Automate everything
        • login (web or command line) triggers X.509 proxy request with (default) VOMS AC, and loading to MyProxy server
      ✦ VO Management System run by users
        • Users need to be able to self-manage their (sub-) VOs
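The "automate everything" bullet describes a fixed sequence triggered by login: fetch the key held in LDAP, create a proxy carrying the default VOMS AC, and load it to MyProxy. The sketch below captures only that ordering. Every callable is a hypothetical stand-in supplied by the caller; none of them is a real LDAP, VOMS, or MyProxy API, and the default VO name is a placeholder.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LoginAutomation:
    """Runs the credential steps that a successful login should trigger."""

    # Hypothetical hooks: each would wrap the site's actual tooling.
    fetch_key_from_ldap: Callable[[str, str], bytes]   # (username, password) -> private key
    create_proxy: Callable[[bytes, str], bytes]        # (key, vo) -> proxy with VOMS AC
    upload_to_myproxy: Callable[[str, bytes], None]    # (username, proxy) -> None
    default_vo: str = "example-vo"                     # placeholder VO name

    def on_login(self, username: str, password: str) -> None:
        # The LDAP password replaces a separate certificate passphrase.
        key = self.fetch_key_from_ldap(username, password)
        proxy = self.create_proxy(key, self.default_vo)
        self.upload_to_myproxy(username, proxy)
```

Injecting the three steps as callables keeps the sequence testable without any grid services, and lets the web login and the command-line (PAM) login share one code path.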
  86. 86. Addressing Certificate Problems
      (Sequence diagram; participants: CA, RA Agent, Sponsor, User; time runs down the page)
      ✦ Actions, in diagram order
        • generate cert key pair
        • request signed cert
        • return tracking number
        • notify agents
        • review request
        • verify user eligibility
        • confirm eligibility
        • approve cert
        • sign cert
        • notify availability
        • retrieve cert
        • export signed cert key pair
      ✦ Step labels: U1, R1, S1, R2, U2a
      ✦ Timeline
        • T0 = late Saturday night lab session
        • T+40h = mid-Monday response
        • T+60h = early-Tuesday response
        • T+66h = late-Tuesday response
        • T+70h = late-Tuesday STAGE 1
  92. 92. VO  (Group)  Membership   Registration T+82h  =  mid-­‐Wednesday !")*# !"#$%&(# *+,(-,.# /-0.# ask  “What  next?” +.0-0(:#50.:#:,#.0?>0-:#&0&90.-<+#93#@A# .0?>0-:#!"#8.,>+-#4(%#.,70-# U2b (,123#4%&(# V1 T+95h  =  early-­‐Thursday ;0.23#>-0.#07897:3# response  (time  zone!) 5,(6.&#07897:3# S2time 4++.,;0#&0&90.-<+=# T+100h  =  early-­‐Thursday 4%%#@A# 8.,>+-=#4(%#.,70-# V2 response :,#!")*# (,123# T+105h  =  mid-­‐Thursday response .0?>0-:#!")*#$B# .0:>.(#!")*#$B# 4%%#$B#:,# +.,C3#50.:# T+105h  =  4.5  days  waiting
  93. 93. [Untitled sequence diagram along a time axis with steps U1, A1a, S1*, A1b, U2*; participant and step labels not recoverable.]
  94. 94. [Same diagram.] Timeline: T0 = late Saturday night lab session
  95. 95. [Same diagram.] Timeline: T0 = late Saturday night lab session; T+40h = mid-Monday response
  96. 96. [Same diagram.] Timeline: T0 = late Saturday night lab session; T+40h = mid-Monday response. T+40h = 1.7 day wait
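The waiting times quoted on the slides can be checked with a simple hours-to-days conversion. The sketch below is only an illustration; the function name is hypothetical. Straight division gives about 1.7 days for T+40h, matching the slide, and about 4.4 days for T+105h, which the slide appears to round to 4.5.

```python
HOURS_PER_DAY = 24


def hours_to_days(hours: float) -> float:
    """Convert elapsed hours to days, rounded to one decimal place."""
    return round(hours / HOURS_PER_DAY, 1)


# Elapsed times taken from the slide timelines.
for label, hours in [("T+40h", 40), ("T+105h", 105)]:
    print(f"{label}: {hours_to_days(hours)} days")
```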
  97. 97. Data Management
  98. 98. Data Tiers - Scoping
    • VO-wide: all sites, admin managed, very stable
    • User archive: single site, user managed, very stable, 10+ GB
    • User project: all sites, user managed, 1-10 weeks, 1-3 GB
    • User static: all sites, user managed, indefinite, 10 MB
    • Job set: all sites, infrastructure managed, 1-10 days, 0.1-1 GB
    • Job: direct to worker node, infrastructure managed, 1 day, <10 MB
    • Job indirect: to worker node via UCSD, infrastructure managed, 1 day, <10 GB
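The tiers listed above can be written down as a small lookup table. This is a minimal sketch, not the portal's actual data model: the class, field, and function names are assumptions made for illustration, and the values are copied from the slide as plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataTier:
    name: str
    scope: str
    managed_by: str
    lifetime: str
    typical_size: str  # "unspecified" where the slide gives no size


# Values transcribed from the "Data Tiers - Scoping" slide.
DATA_TIERS = [
    DataTier("VO-wide", "all sites", "admin", "very stable", "unspecified"),
    DataTier("User archive", "single site", "user", "very stable", "10+ GB"),
    DataTier("User project", "all sites", "user", "1-10 weeks", "1-3 GB"),
    DataTier("User static", "all sites", "user", "indefinite", "10 MB"),
    DataTier("Job set", "all sites", "infrastructure", "1-10 days", "0.1-1 GB"),
    DataTier("Job", "direct to worker node", "infrastructure", "1 day", "<10 MB"),
    DataTier("Job indirect", "to worker node via UCSD", "infrastructure", "1 day", "<10 GB"),
]


def tiers_managed_by(manager: str) -> list[str]:
    """Return the names of the tiers managed by the given party."""
    return [tier.name for tier in DATA_TIERS if tier.managed_by == manager]


print(tiers_managed_by("user"))
```

Grouping by manager makes the division of responsibility visible: three tiers are left to users, three to the infrastructure, and one to administrators.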
  99. 99. About 2 PB with 40 front-end servers for high-bandwidth parallel file transfer. Data Movement: scp (users), rsync (VO-wide), grid-ftp (UCSD), curl (WNs), cp (NFS), htcp (secure web), http(s) (web)
  100. 100. Globus Online: High Performance Reliable 3rd Party File Transfer [Diagram components: GUMS (DN to user mapping); Certificate Authority (root of trust); VOMS (VO membership); portal; cluster; Globus Online (file transfer service); data collection facility; lab file server; desktop; laptop.]
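The "DN to user mapping" role shown for GUMS can be illustrated with a lookup from an X.509 distinguished name to a local account. The sketch below does not use GUMS itself; it assumes a grid-mapfile-style text (a quoted DN followed by a local username), and the DNs, the second username, and the function names are hypothetical.

```python
import shlex

# Hypothetical entries in grid-mapfile style: "<certificate DN>" <local user>
GRID_MAPFILE = '''
# comment lines and blank lines are ignored
"/DC=org/DC=example/OU=People/CN=Ryan Smith" rsmith
"/DC=org/DC=example/OU=People/CN=Jane Doe" jdoe
'''


def parse_grid_mapfile(text: str) -> dict[str, str]:
    """Build a DN -> local username mapping from grid-mapfile-style text."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        dn, user = shlex.split(line)  # shlex handles the quoted DN with spaces
        mapping[dn] = user
    return mapping


def local_user_for(dn: str, mapping: dict[str, str]):
    """Return the local account for a DN, or None if the DN is unknown."""
    return mapping.get(dn)


mapping = parse_grid_mapfile(GRID_MAPFILE)
print(local_user_for("/DC=org/DC=example/OU=People/CN=Ryan Smith", mapping))
```

An unknown DN returns None rather than raising, so a caller can treat "not mapped" as "not authorized".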
  101. 101. [Diagram components: facility file server; SBGrid Science Portal; lab file server; desktop; laptop.]
  102. 102. Ryan, a postdoc in the Frank Lab at Columbia. Access NRAMM facilities securely and transfer data back to home institute. [Same diagram.]
  103. 103. Ryan, a postdoc in the Frank Lab at Columbia. Access NRAMM facilities securely and transfer data back to home institute. [Same diagram, with paths: facility file server /data/columbia/frank; lab file server /nfs/data/rsmith; laptop /Users/Ryan.]
  104. 104. Ryan, a postdoc in the Frank Lab at Columbia. Access NRAMM facilities securely and transfer data back to home institute. [Same diagram, with added annotations: Ryan applies for an account at the SBGrid Science Portal; automated X.509 application; automated Globus Online application/X.509 linking (wish list!).]
  105. 105. Ryan, a postdoc in the Frank Lab at Columbia. Access NRAMM facilities securely and transfer data back to home institute. [Same diagram, with added annotations: Ryan applies for an account at the SBGrid Science Portal; automated X.509 application; verification of lab membership; automated Globus Online application/X.509 linking (wish list!).]
