Advancing Science through Coordinated Cyberinfrastructure

How local, regional, and national cyberinfrastructure can be coordinated and linked to advance science and engineering, based on experiences and lessons from the Center for Computation & Technology at LSU (ideas, funding, implementation), plus some thoughts on what might be done differently if we were starting today. Presented at the First Workshop of the Center for Computational Engineering & Sciences, Unicamp, Campinas, Brazil, 10 April 2014.

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Presentation Transcript

  • Advancing Science through Coordinated Cyberinfrastructure
    Daniel S. Katz (d.katz@ieee.org)
    Senior Fellow, Computation Institute, University of Chicago & Argonne National Laboratory
    Affiliate Faculty, Center for Computation & Technology, Louisiana State University
    Adjunct Associate Professor, Electrical and Computer Engineering, LSU
    www.ci.anl.gov   www.ci.uchicago.edu
  • Topics
    •  What we did in Louisiana from 2006-2010
    •  What I would do differently now
    •  A short video to highlight some additional issues that I hope the Center for Computational Engineering & Sciences will keep in mind
  • Louisiana
    •  Area: 134 382 km² (33/51)
    •  Population: 4 533 000 (2010, 25/51)
    •  GDP: $208 billion (2009, 24/51)
    •  GDP/person: $45 700 (2009, 21/51)
    •  In poverty: 17% (2009, 44/51)
    •  High school degree: 82% (2009, 46/51)
    •  BS degree: 21% (2009, 47/51)
    •  Advanced degree: 7% (2009, 48/51)
    State goals: talented workforce, great competitiveness, strong educational system, increased economic development
  • PITAC (President's Information Technology Advisory Committee) Report Summary
    •  "Computational science -- the use of advanced computing capabilities to understand and solve complex problems -- is critical to scientific leadership, economic competitiveness, and national security. It is one of the most important technical fields of the 21st century because it is essential to advances throughout society."
    •  "Universities must significantly change organizational structures: multidisciplinary & collaborative research are needed [for US] to remain competitive in global science"
    Complex problems: innovations will occur at boundaries
  • Big Science and Infrastructure
    •  Higgs* boson discovery announced at CERN July 4, 2012
    •  Instrument: Large Hadron Collider (LHC)
    •  Infrastructure
      –  Computing hardware: Worldwide LHC Computing Grid (WLCG): 235,000 cores across 36 countries, including Open Science Grid (OSG, US), European Grid Infrastructure (EGI, Europe), ...
      –  Data: ~20 PB of data created in 2011-2012
      –  Software: grid middleware, physics analysis applications, ...
      –  Networks
      –  Education & training
    •  Data generated centrally, moved (~3 PB/week) across multi-tiered infrastructure to be computed upon
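To put ~3 PB/week in perspective, the following back-of-the-envelope sketch converts that volume into an average network rate. It assumes decimal units (1 PB = 10^15 bytes) and traffic spread evenly over the whole week; real transfers are burstier, so peak rates would be higher.

```python
# Back-of-the-envelope: what sustained rate does moving ~3 PB per week imply?
# Assumption: decimal units (1 PB = 10**15 bytes), traffic spread evenly.
SECONDS_PER_WEEK = 7 * 24 * 3600  # 604,800 s


def sustained_gbps(petabytes_per_week: float) -> float:
    """Average rate in gigabits per second for a given weekly volume."""
    bits = petabytes_per_week * 10**15 * 8
    return bits / SECONDS_PER_WEEK / 10**9


rate = sustained_gbps(3.0)
print(f"~3 PB/week is about {rate:.1f} Gbit/s sustained")
# ~3 PB/week is about 39.7 Gbit/s sustained
```

In other words, the LHC's routine data movement alone corresponds to roughly 40 Gbit/s of continuous, round-the-clock traffic.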
  • Big Science and Infrastructure
    •  Hurricanes affect humans
    •  Multi-physics: atmosphere, ocean, coast, vegetation, soil
      –  Sensors and data as inputs
    •  Humans: what have they built, where are they, what will they do
      –  Data and models as inputs
    •  Infrastructure:
      –  Urgent/scheduled processing, workflow systems
      –  Software applications, workflows
      –  Networks
      –  Decision-support systems, visualization
      –  Data storage, interoperability
  • Long-tail Science and Infrastructure
    •  Exploding data volumes & powerful simulation methods mean that more researchers need advanced infrastructure
    •  Such "long-tail" researchers cannot afford expensive expertise and unique infrastructure
    •  Challenge: Outsource and/or automate time-consuming common processes
      –  Tools, e.g., Globus Online and data management
        o  Note: much LHC data is moved by Globus GridFTP, e.g., May/June 2012, >20 PB, >20M files
      –  Gateways, e.g., nanoHUB, CIPRES, access to scientific simulation software
    [Figure: NSF grant size, 2007 ("Dark data in the long tail of science", B. Heidorn)]
  • Long-tail Science and Infrastructure
    •  CIPRES Science Gateway for Phylogenetics
      –  Study of diversification of life and relationships among living things through time
    •  Highly used, as of mid 2013:
      –  Cited in at least 400 publications, e.g., Nature, PNAS, Cell
      –  More than 5000 unique users in 3 years
      –  Used routinely in at least 68 undergraduate classes
      –  45% US (including most states), 55% from 70 other countries
    •  Infrastructure
      –  Flexible web application
        o  A science gateway, uses software and lessons from XSEDE gateways team, e.g., identity management, HPC job control
      –  Science software: tree inference and sequence alignment
        o  Parallel versions of MrBayes, RAxML, GARLI, BEAST, MAFFT
        o  PAUP*, Poy, ClustalW, Contralign, FSA, MUSCLE, ...
      –  Data
        o  Personal user space for storing results
        o  Tools to transfer and view data
    Credit: Mark Miller, SDSC
  • Infrastructure Challenges
    •  Science
      –  Larger teams, more disciplines, more countries
    •  Data
      –  Size, complexity, rates all increasing rapidly
      –  Need for interoperability (systems and policies)
    •  Systems
      –  More cores, more architectures (GPUs), more memory hierarchy
      –  Changing balances (latency vs bandwidth)
      –  Changing limits (power, funds)
      –  System architecture and business models changing (clouds)
      –  Network capacity growing; more networking -> greater security needs
    •  Software
      –  Multiphysics algorithms, frameworks
      –  Programming models and abstractions for science, data, and hardware
      –  V&V, reproducibility, fault tolerance
    •  People
      –  Education and training
      –  Career paths
      –  Credit and attribution
  • Cyberinfrastructure
    "Cyberinfrastructure consists of computing systems, data storage systems, advanced instruments and data repositories, visualization environments, and people, all linked together by software and high performance networks to improve research productivity and enable breakthroughs not otherwise possible."
    -- Craig Stewart
  • Computational & Data-enabled Science & Engineering (CDS&E)
    •  LIGO: Laser Interferometer Gravitational-Wave Observatory
    •  Ties together theory, computation, and experiment
      –  Each drives the other two!
  • How We Started
    •  State commitment: $25M/year for Vision 20/20
      –  $9M: LSU -> CCT (similarly, ULL -> LITE)
    •  University commitment to build new programs for the 21st century
    •  State and university willingness to make extraordinary investments
    •  Opportunity to build a new world-class program in interdisciplinary research and education, involving all of LSU
    •  Ed Seidel-led vision to instigate state-wide collaboration
  • Advancing Research
    •  Potentially requires advances in three areas, depending on existing strengths
  • CCT Organization
    [Org chart: Director's Office (Edward Seidel); HPC Partnership (McMahon); Cyberinfrastructure Development (Katz); Focus Areas (Allen); Corporate Relations. Other units shown: LONI, LSU HPC, Blue Waters, etc., NSF TeraGrid, Systems and Software, Performance Team, Labs (ACAL, DSL, Viz, LCAT, …), Coast to Cosmos, Core Comp. Sci., Material World, Cultural Computing, Visualization]
  • Cyberinfrastructure Development
    •  Vision: combine research and infrastructure
      –  Research
        o  Computer science
        o  Applications
        o  Tools
      –  Infrastructure
        o  Hardware
        o  Operations
        o  Policies
    •  Both together have squared growth of either alone
    •  CyD staff – PhDs in CS and apps who understand the whole picture and want to grow the ecosystem
  • Computing in Louisiana
    •  LONI: 40 Gbps network (sites: UNO, Tulane, UL-L, SUBR, LSU, LA Tech; connected to National LambdaRail)
    •  LONI: ~100 TF IBM, Dell supercomputers
    •  Cybertools: tools and services
    •  LONI Institute: people and collaborations
    •  TeraGrid, OSG
  • LONI - Networking & Computing
    [Map: LONI nodes at LSU, La Tech, LSU HSC, ULL, Tulane, SU, and UNO, linked by multiple 10GE; legend: ~500-core Dell cluster & 112-proc. IBM P5 cluster; ~4500-core Dell cluster; additional sites ULM, McNeese, NSU, SLU, Alex]
    Network: partners and customers
  • LONI Computing Resources (2010)
    •  One central Dell cluster (Queen Bee)
      –  5500 IB-connected cores at ISB in Baton Rouge
      –  Archival storage contracted through NCSA
      –  50% of allocations dedicated to TeraGrid from 2008
    •  Six distributed 512-core Dell clusters
    •  Five distributed 14-node (112 procs) IBM P5-575 clusters
    •  Distributed PetaShare storage
      –  32 TB disk @ each small Dell cluster
      –  8 TB disk on LSU & LaTech small Dell clusters – for LBRN
      –  8 TB at SC-S & HSC-NO – for LBRN
      –  250 TB tape
    •  All run by HPC@LSU, including user support/training
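Adding up the list gives a sense of LONI's aggregate compute capacity in 2010. This minimal sketch uses only the counts on the slide; the cluster labels are illustrative placeholders, and because the Dell figures are cores while the IBM figures are processors, the two are kept separate rather than summed.

```python
# Tally of the LONI compute resources listed on the slide (2010 figures).
# Dell figures are cores; IBM P5-575 figures are processors, so they are
# reported separately instead of being added into one total.
dell_clusters = {"Queen Bee (central)": 5500}
dell_clusters.update({f"small Dell cluster {i}": 512 for i in range(1, 7)})
ibm_p5_clusters = {f"IBM P5-575 cluster {i}": 112 for i in range(1, 6)}

dell_cores = sum(dell_clusters.values())
ibm_procs = sum(ibm_p5_clusters.values())
print(f"Dell cores: {dell_cores}, IBM P5 processors: {ibm_procs}")
# Dell cores: 8572, IBM P5 processors: 560
```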
  • $12M NSF CyberTools Project: Enabler and Driver
  • Cactus
    •  Component-based HPC framework
      –  Freely available environment for collaborative application development
    •  Cutting-edge CS
      –  Grid computing, petascale, accelerators, steering, remote viz
    •  Active user & developer communities
      –  10-year pedigree, >$10M support
      –  Numerical Relativity, CFD, Coastal, Reservoir Engineering, …
    •  Domain-specific toolkits, e.g. CFD toolkit
      –  FD/FV/FE numerical methods
      –  Structured, multi-block, unstructured
      –  Uses PETSc, Trilinos, MUMPS, HYPRE
      –  Used to build Black Oil Toolkit
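The component-based idea can be sketched in a few lines: a small core owns the schedule, and independently written components register functions into named schedule bins, sharing only a common state. This is a hypothetical Python illustration; the bin names echo Cactus's INITIAL/EVOL/ANALYSIS bins, but the Framework class and its methods are invented here and are not the Cactus API.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# Hypothetical component framework: the core runs the schedule; components
# only register functions into bins and never call each other directly.
Step = Callable[[dict], None]


class Framework:
    BINS = ("INITIAL", "EVOL", "ANALYSIS")

    def __init__(self) -> None:
        self.schedule: DefaultDict[str, List[Step]] = defaultdict(list)

    def register(self, bin_name: str, fn: Step) -> None:
        if bin_name not in self.BINS:
            raise ValueError(f"unknown schedule bin: {bin_name}")
        self.schedule[bin_name].append(fn)

    def run(self, steps: int) -> dict:
        state: dict = {"t": 0, "log": []}
        for fn in self.schedule["INITIAL"]:
            fn(state)
        for _ in range(steps):
            for bin_name in ("EVOL", "ANALYSIS"):
                for fn in self.schedule[bin_name]:
                    fn(state)
        return state


# Two illustrative "components" written independently of each other.
def init_field(state: dict) -> None:
    state["u"] = 1.0


def evolve(state: dict) -> None:
    state["u"] *= 0.5  # toy update rule standing in for a real solver
    state["t"] += 1


def record(state: dict) -> None:
    state["log"].append((state["t"], state["u"]))


fw = Framework()
fw.register("INITIAL", init_field)
fw.register("EVOL", evolve)
fw.register("ANALYSIS", record)
final = fw.run(steps=3)
print(final["log"])  # [(1, 0.5), (2, 0.25), (3, 0.125)]
```

The design point is that the analysis component can be swapped or added without touching the solver, which is what lets separate groups contribute to one application.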
  • PetaShare
    •  Main concept: data is managed (migrated, moved, replicated, cached, etc.) automatically
    •  Data-aware storage systems, data-aware schedulers, cross-domain metadata scheme
    •  Provides: 250 TB disk, 400 TB tape storage (and access to national storage facilities)
    •  Applications: coastal & environmental modeling, geospatial analysis, bioinformatics, medical imaging, fluid dynamics, petroleum engineering, numerical relativity, high energy physics
    Credit: Tevfik Kosar
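One way to picture a data-aware scheduler is as a placement rule that prefers the site already holding most of a job's input bytes, so that the least data has to move. The sketch below is a hypothetical illustration of that single rule, not PetaShare's scheduler; real data-aware schedulers weigh further factors (such as network load and free storage) that are not modelled here, and the file names and sizes are made up.

```python
from typing import Dict, Set


def bytes_to_move(site: str, inputs: Dict[str, int],
                  replicas: Dict[str, Set[str]]) -> int:
    """Bytes of input that are NOT already stored at `site`."""
    return sum(size for name, size in inputs.items() if name not in replicas[site])


def choose_site(inputs: Dict[str, int], replicas: Dict[str, Set[str]]) -> str:
    """inputs: file name -> size in bytes; replicas: site -> files held there."""
    # min() keeps the first site on ties, so dictionary order breaks ties.
    return min(replicas, key=lambda site: bytes_to_move(site, inputs, replicas))


# Hypothetical example: made-up files and sizes at three LONI member sites.
inputs = {"mesh.h5": 40_000, "forcing.nc": 5_000, "params.txt": 10}
replicas = {"LSU": {"forcing.nc", "params.txt"}, "ULL": {"mesh.h5"}, "Tulane": set()}

site = choose_site(inputs, replicas)
print(site, bytes_to_move(site, inputs, replicas))  # ULL 5010
```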
  • LONI Institute: "CCT for Louisiana"
    •  $15M 5-year project
      –  $7M from the Board of Regents (BoR), $8M from LaTech, LSU, SUBR, Tulane, UNO, ULL
    •  Catalyzes new inter-institutional collaborations, ambitious projects and top-level hires:
      –  LONI network and computing
      –  NSF projects: PetaShare, VizTangibles, TeraGrid, Blue Waters
      –  EPSCoR: NSF CyberTools, DOE UCoMS, DoD
      –  NIH: $17M LBRN
      –  Promote collaborative research at interfaces for innovation
  • LONI Institute Vision
    •  LONI investments create world-leading infrastructure
    •  Create bold new inter-university superstructure
      –  New faculty, staff, students; train others. Focus on CS, Bio, Materials, but all disciplines impacted
      –  Promote research at interfaces for innovation
    •  Draw on, enhance strengths of all universities
      –  Strong groups recently created; collectively world-class
      –  Solve complex problems through collaboration & computation
      –  Much stronger recruiting opportunities for all institutions
      –  Statewide interdisciplinary education & research program
    •  Create University-Industry Research Centers (UIRCs)
      –  Research Triangle, NCSA/UIUC, Bay Area, others
    •  Transform Louisiana
      –  Such committed cooperation between sites is extraordinary
  • LONI Institute Hiring and Projects
    •  Two new faculty at each institution (12 total)
      –  Six in CS, six in Comp. Bio/Materials
    •  Six Computational Scientists
      –  Following Bavarian KONWIHR project
      –  Support 70-90 projects over five years; lead to external funding
    •  Graduate students
      –  36 new students funded, trained; two years each
    •  One coordinator/economic development
    •  All hiring coordinated across state
    •  Leading faculty across state create multi-institutional seed projects
    •  Building on seeds, dozens of new projects selected, started
    •  Exploit common themes, computing environments, tools found in all areas
  • TeraGrid (XSEDE)
    •  TeraGrid: world's largest open scientific discovery infrastructure
    •  Leadership-class resources at eleven partner sites combined to create an integrated, persistent computational resource
      –  High-performance networks
      –  High-performance computers (>1 Pflops (~100,000 cores) -> 1.75 Pflops)
        o  And a Condor pool (w/ ~13,000 CPUs)
      –  Visualization systems
      –  Data collections (>30 PB, >100 discipline-specific databases)
      –  Science gateways
      –  User portal
      –  User services - help desk, training, advanced app support
    •  Allocated to US researchers and their collaborators through national peer-review process
      –  Generally, review of computing, not science
    •  Mid 2011: TeraGrid -> XSEDE
  • Campus Champions
    •  A "Champion" is a staff or faculty member on a campus who provides information on XSEDE to his/her colleagues
    •  Currently ~160 institutions represented by champions
    •  Champions get:
      –  Monthly training and updates
      –  Start-up accounts
      –  Forum for sharing and interactions
      –  Access to information on usage by local users
      –  Registration for annual XSEDE conference waived
    •  Champions do:
      –  Raise awareness locally
      –  Provide training
      –  Get users started with access quickly
      –  Represent needs of local community
      –  Provide feedback to improve services
      –  Attend annual XSEDE conference
      –  Share their training and education materials
      –  Build community across campus, and among all Champions
    [Map: Campus Champion Institutions, March 26, 2014 (revised March 22, 2014): Standard – 87; EPSCoR States – 51; Minority Serving Institutions – 12; EPSCoR States and Minority Serving Institutions – 8; Total Campus Champion Institutions – 158]
    Credit: Kay Hunt
  • LONI and National Cyberinfrastructure
    •  TeraGrid
      –  One of the 11 TeraGrid Resource Providers
      –  Playing a role in TG-wide governance (TeraGrid Forum, Executive Steering Committee, various working groups, GIG Director of Science)
      –  Contributed administrative software AmieGold (glue between TG account info and local info) and CS software (HARC, PetaShare, SAGA)
    •  OSG
      –  Currently providing resources
    •  XSEDE
      –  LONI not a partner in XSEDE, but a service provider
    •  Nationally
      –  Bringing in new users from the southeast US
      –  LONI Institute Computational Scientists -> Campus Champions
  • NSF Vision: Infrastructure Role & Lifecycle
    •  Create and maintain a CI ecosystem providing new capabilities that advance and accelerate scientific inquiry at unprecedented complexity and scale
    •  Support the foundational research needed to continue to efficiently advance CI
    •  Enable transformative, interdisciplinary, collaborative science and engineering research and education through the use of advanced CI
    •  Transform practice through new policies for CI addressing challenges of academic culture, open dissemination and use, reproducibility and trust, curation, sustainability, governance, citation, stewardship, and attribution of authorship
    •  Develop a next-generation diverse workforce of scientists and engineers equipped with essential skills to use and develop CI, with CI used in both the research and education process
  • Relevant NSF Programs
    •  EPSCoR – targeted support for states that are less successful in NSF funding
    •  MRI – Major Research Instrumentation
    •  CIF21 (NSF's CI umbrella)
      –  eXtreme Digital (XD)
      –  Track 1 (Blue Waters)
      –  Software Infrastructure for Sustained Innovation (SI2)
      –  Campus Cyberinfrastructure - Network Infrastructure and Engineering (CC-NIE)
    •  Integrative Graduate Education and Research Traineeship Program (IGERT)
    •  General research programs
  • Recap (to 2010)
    •  Louisiana decides that science and technology can lead to a better future
    •  Builds a regional cyberinfrastructure (network, computing, software, ~data, people) that connects to national-scale infrastructure
      –  Using a mix of national, state, and local funding
    •  Starts to change culture – infuse computation in academic departments, interdisciplinary hiring, large collaborative projects
    •  But...
    •  Didn't really think about data as much as we would have were we starting again today
  • Swift
    •  Swift is designed to compose large parallel workflows, from serial or parallel application programs, to run fast and efficiently on a variety of platforms
      –  A parallel scripting system for Grids and clusters for loosely-coupled applications - programs (executable, shell, python, R, Octave, Matlab, etc.) linked by exchanging files
      –  Easy to write: simple high-level C-like functional language, allows small Swift scripts to do large-scale work
      –  Easy to run: contains all services for running, in one Java application
        o  Works on multicore workstations, HPC, Grids (interfaces to schedulers, Globus, ssh)
      –  A powerful, efficient, scalable and flexible execution engine
        o  Scaling O(10M) tasks – 0.5M in live science work, and growing
        o  Collective data management being developed to optimize I/O
    •  Used in earth science, neuroscience, proteomics, molecular dynamics, biochemistry, economics, statistics, knowledge modeling, and more
    •  http://www.ci.uchicago.edu/swift
    M. Wilde, M. Hategan, J. M. Wozniak, B. Clifford, D. S. Katz, I. Foster, "Swift: A language for distributed parallel scripting," Parallel Computing, v. 37(9), pp. 633-652, 2011.
  • Swift Programming Model: all execution driven by parallel data flow
    •  analyze1() and analyze2() are computed in parallel
    •  analyze() returns r when they are done
    •  This parallelism is automatic
    •  Works recursively throughout the program's call graph
      –  E.g., can embed within foreach loop, itself done in parallel
      –  Foreach loops can be nested

      (float r) analyze(int i)
      {
        float j = analyze1(i);  // runs concurrently with analyze2
        float k = analyze2(i);  // runs concurrently with analyze1
        r = 0.5 * (j + k);      // evaluated once both j and k are set
      }
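For readers more familiar with conventional languages, here is a rough Python analogue of the dataflow behaviour described above (not Swift itself). It assumes placeholder bodies for analyze1 and analyze2, which in a real Swift script would be application programs. Unlike Swift, where this parallelism is implicit, Python needs explicit futures: submit() starts work immediately and result() blocks until the value exists, which mimics Swift's wait-until-set variables. A separate outer pool plays the role of a parallel foreach loop.

```python
from concurrent.futures import ThreadPoolExecutor


# Placeholder stand-ins (hypothetical) for real application programs.
def analyze1(i: int) -> float:
    return float(i * 2)


def analyze2(i: int) -> float:
    return float(i + 10)


def analyze(i: int, pool: ThreadPoolExecutor) -> float:
    # Both calls depend only on i, so both are started right away.
    j = pool.submit(analyze1, i)
    k = pool.submit(analyze2, i)
    # result() waits for each value, like reading an unset Swift variable.
    return 0.5 * (j.result() + k.result())


# Two pools so that loop iterations blocking on inner work cannot starve
# the inner tasks of workers (which could deadlock a single shared pool).
with ThreadPoolExecutor(max_workers=4) as inner, \
        ThreadPoolExecutor(max_workers=4) as outer:
    # Analogue of a parallel foreach over i = 0..3; map() keeps input order.
    results = list(outer.map(lambda i: analyze(i, inner), range(4)))

print(results)  # [5.0, 6.5, 8.0, 9.5]
```

The contrast is the point of the slide: in Swift the runtime derives this concurrency from data dependencies, while here the programmer has to manage futures and pools by hand.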
  • Swift Environment
    •  Swift runtime system has drivers and algorithms to efficiently support and aggregate vastly diverse runtime environments
    [Diagram: a Swift script on a submit host (login node, laptop, Linux server) dispatches application programs to diverse resources, with a data server; clouds: Amazon EC2, XSEDE Wispy, FutureGrid, …]
  • Globus
    •  Big data transfer and sharing…
    •  …with Dropbox-like simplicity…
    •  …directly from your own storage systems
    Run as a non-profit service to the non-profit research community
  • Globus Users
    •  "I need a good place to store or back up my (big) research data, at a reasonable price."
    •  "I need to easily, quickly, and reliably move or mirror portions of my data to other places, including my campus HPC system, lab server, desktop, laptop, XSEDE, cloud, etc."
    •  "I need a way to easily and securely share my data with my colleagues at other institutions."
    •  "I want to publish my data so that it's available and discoverable long-term."
    •  "I want to archive my data in case it's needed sometime in the future."
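The "reliably move or mirror" need can be illustrated with a small local sketch: copy only files that are missing or differ at the destination, verify each copy by checksum, and retry on a mismatch. This is a hypothetical illustration using Python's standard library on local directories; it is not the Globus API, and a managed service would additionally handle authentication, wide-area performance, and recovery from network faults.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk: int = 1 << 20) -> str:
    """Checksum a file in fixed-size chunks so large files fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def mirror(src: Path, dst: Path, retries: int = 3) -> List[str]:
    """Copy files under src that are missing or differ at dst; verify each copy."""
    copied = []
    for s in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = s.relative_to(src)
        d = dst / rel
        want = sha256_of(s)
        if d.exists() and sha256_of(d) == want:
            continue  # already mirrored with identical contents
        d.parent.mkdir(parents=True, exist_ok=True)
        for _ in range(retries):
            shutil.copy2(s, d)
            if sha256_of(d) == want:  # integrity check after each copy
                copied.append(rel.as_posix())
                break
        else:
            raise OSError(f"checksum mismatch after {retries} attempts: {s}")
    return copied


# Usage on throwaway directories: the second run finds nothing to copy.
with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    src, dst = Path(a), Path(b)
    (src / "run1").mkdir()
    (src / "run1" / "out.dat").write_bytes(b"results")
    first_run = mirror(src, dst)
    second_run = mirror(src, dst)

print(first_run, second_run)  # ['run1/out.dat'] []
```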
  • Globus is SaaS
    •  Web, command line, and REST interfaces
    •  Reduced IT operational costs
    •  New features automatically available
    •  Consolidated support & troubleshooting
    •  Easy to add your laptop, server, cluster, supercomputer, etc. with Globus Connect
  • Globus Connected Resources on Campus
    •  Research computing center
    •  Department / lab storage
    •  Campus-wide home/project file system
    •  Mass storage systems
    •  Science instruments
    •  Desktops and laptops
    •  Custom web applications
    •  Amazon Web Services S3
  • Lessons
    •  The three triangle facets (infrastructure, computational, interdisciplinary) have to be taken seriously at the highest levels and seen as an important component of academic research
    •  Infrastructure needs to be integrated at all levels (laboratory, campus, regional, national, international) – users need to be able to easily move work and data to appropriate systems, and collaborate across locations
    •  Education and training of students and faculty is crucial – vast improvements are needed over the small numbers currently reached through HPC center tutorials; computation and computational thinking need to be part of new curricula across all disciplines
    •  Emphasis should be placed on broadening participation in computation, not just on high-end systems that decreasing numbers of researchers can join in on: make tools much more usable and intuitive, free all researchers from the limitations of their personal workstations, and provide access to simple tools for large-scale parameter studies, data archiving, visualization, and collaboration
    •  Vision needs to be consistent – cannot be just one person
    •  Funding needs to be stable (activities need to be sustainable)
  • Video
    •  Data Sharing - https://www.youtube.com/watch?v=N2zK3sAtr-4
  • Sources
    •  D. S. Katz et al., "Louisiana: A Model for Advancing Regional e-Science through Cyberinfrastructure," Philosophical Transactions of the Royal Society A, 367(1897), 2009.
      –  Authors from Louisiana State University, Tulane University, University of Louisiana at Lafayette, Louisiana Tech University, Louisiana Community and Technical College System, Southern University, University of New Orleans
    •  G. Allen and D. S. Katz, "Computational science, infrastructure and interdisciplinary research on university campuses: experiences and lessons from the Center for Computation and Technology," NSF Workshop on Sustainable Funding and Business Models for Academic Cyberinfrastructure Facilities, Cornell University, 2010.
    •  D. S. Katz and D. Proctor, "A Framework for Discussing e-Research Infrastructure Sustainability," http://dx.doi.org/10.6084/m9.figshare.790767, submitted to Workshop on Sustainable Software for Science: Practice and Experiences (http://wssspe.researchcomputing.org.uk) at SC13.
    •  Swift: Swift Team, led by Mike Wilde, http://www.ci.uchicago.edu/swift
    •  Globus: Globus Team, led by Ian Foster and Steve Tuecke, http://www.globus.org