So long, computer overlords
How Cloud (and Grid) can liberate research IT – and transform discovery
Ian Foster
The data deluge
Astronomy: MACHO et al.: 1 TB; Palomar: 3 TB; 2MASS: 10 TB; GALEX: 30 TB; Sloan: 40 TB; Pan-STARRS: 40,000 TB; 100,000 TB projected (LSST)
Genomics: sequencing output doubles every 9 months; >300 public centers; 1,330 molecular biology databases in Nucleic Acids Research (96 in Jan 2001)
Climate: Climate Model Intercomparison Project (CMIP) of the IPCC: 2004: 36 TB; 2012: 2,300 TB
Big science has achieved big successes
OSG: 1.4M CPU-hours/day, >90 sites, >3,000 users, >260 publications in 2010
LIGO: 1 PB of data in the last science run, distributed worldwide
ESG: 1.2 PB of climate data delivered to 23,000 users; 600+ publications
These are robust production solutions, built with substantial teams and expense and sustained, multi-year effort: application-specific solutions built on common technology.
All build on the NSF OCI (and DOE)-supported Globus Toolkit software.
But small science is struggling
More data, and more complex data
Ad-hoc solutions
Inadequate software and hardware
Data management plan mandates
Medium-scale science struggles too!
Blanco 4m telescope on Cerro Tololo (image credit: Roger Smith/NOAO/AURA/NSF)
The Dark Energy Survey receives 100,000 files each night in Illinois, transmits them to Texas for analysis, then moves the results back to Illinois.
The process must be reliable, routine, and efficient, and the cyberinfrastructure team is not large.
The challenge of staying competitive
"Well, in our country," said Alice … "you'd generally get to somewhere else — if you run very fast for a long time, as we've been doing."
"A slow sort of country!" said the Queen. "Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!"
Current approaches are unsustainable
Small laboratories (PI, postdoc, technician, grad students): an estimated 5,000 across the US university community, with an average ill-spent/unmet need of 0.5 FTE per lab?
Medium-scale projects (multiple PIs, a few software engineers): an estimated 500 across the US university community, with an average ill-spent/unmet need of 3 FTE per project?
Total: 4,000 FTE at ~$100K/FTE => $400M/yr, plus computers, storage, opportunity costs, …
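A quick check of that arithmetic, as a minimal Python sketch; all figures are the estimates from the slide above:

```python
# Back-of-the-envelope check of the cost estimate above.
labs, fte_per_lab = 5_000, 0.5            # small labs, ill-spent/unmet FTE each
projects, fte_per_project = 500, 3        # medium-scale projects, FTE each
cost_per_fte = 100_000                    # ~$100K per FTE per year

total_fte = labs * fte_per_lab + projects * fte_per_project
print(total_fte)                          # 2,500 + 1,500 = 4,000 FTE
print(f"${total_fte * cost_per_fte / 1e6:.0f}M/yr")  # => $400M/yr
```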
And don’t forget administrative costs
42% of the time spent by an average PI on a federally funded research project was reported to be expended on administrative tasks related to that project rather than on research.
— Federal Demonstration Partnership faculty burden survey, 2007
You can run a company from a coffee shop
Because businesses outsource their IT
Web presence; email (hosted Exchange); calendar; telephony (hosted VoIP); human resources and payroll; accounting; customer relationship management
Software as a Service (SaaS)
And often their large-scale computing too
Web presence; email (hosted Exchange); calendar; telephony (hosted VoIP); human resources and payroll; accounting; customer relationship management; data analytics; content distribution
Software as a Service (SaaS) and Infrastructure as a Service (IaaS)
Let’s rethink how we provide research IT
Accelerate discovery and innovation worldwide by providing research IT as a service.
Leverage software as a service to:
provide millions of researchers with unprecedented access to powerful tools;
enable a massive shortening of cycle times in time-consuming research processes; and
reduce research IT costs dramatically via economies of scale.
So long, computer overlords.
Time-consuming tasks in science
Run experiments
Collect data
Manage data
Move data
Acquire computers
Analyze data
Run simulations
Compare experiment with simulation
Search the literature
Communicate with colleagues
Publish papers
Find, configure, install relevant software
Find, access, analyze relevant data
Order supplies
Write proposals
Write reports
…
Data movement can be surprisingly difficult
[Diagram: moving data from site A to site B]
Why? Discover endpoints, determine available protocols, negotiate firewalls, configure software, manage space, determine required credentials, configure protocols, detect and respond to failures, determine expected performance, determine actual performance, identify, diagnose, and correct network misconfigurations, integrate with file systems, …
It took 2 weeks and much help from many people to move 10 TB between California and Tennessee. (2007 BES report)
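To make the burden concrete, here is a sketch of the kind of ad-hoc transfer script a lab might hand-roll today. Everything in it (hosts, paths, file list, retry policy) is hypothetical, and note how much of the list above it still ignores: credentials, firewalls, space management, performance, diagnosis of network misconfiguration.

```python
# Hypothetical hand-rolled transfer loop: retry each file over scp and
# hope for the best. Hosts, paths, and policy are illustrative only.
import subprocess
import time

FILES = ["run01.dat", "run02.dat"]        # often 100,000+ files in practice
SRC = "user@site-a:/data/"
DST = "user@site-b:/data/"

for name in FILES:
    for attempt in range(5):              # crude fault "recovery"
        if subprocess.run(["scp", SRC + name, DST]).returncode == 0:
            break                         # this file made it across
        time.sleep(60 * attempt)          # back off and try again
    else:
        print(f"gave up on {name}")       # a human must now intervene
```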
Globus Online’s SaaS/Web 2.0 architecture
Web interface
Command line interface:
  ls alcf#dtn:/
  scp alcf#dtn:/myfile nersc#dtn:/myfile
HTTP REST interface:
  POST https://transfer.api.globusonline.org/v0.10/transfer <transfer-doc>
(Operate) Fire-and-forget data movement: automatic fault recovery, high performance, no client software install, across multiple security domains.
(Hosted on) GridFTP servers, FTP servers, other protocols (HTTP, WebDAV, SRM, …), and Globus Connect on local computers.
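For illustration, a minimal sketch of driving the REST interface from Python. The URL and endpoint names are the ones shown above, but the shape of the transfer document and the authentication details are assumptions here, not the documented API; consult the Globus Online documentation for the real schema.

```python
# Sketch of submitting a transfer via the REST interface shown above.
# The transfer-document fields are illustrative assumptions, not the
# documented schema; authentication is omitted entirely.
import requests

transfer_doc = {
    "source_endpoint": "alcf#dtn",        # endpoint names from the CLI example
    "destination_endpoint": "nersc#dtn",
    "items": [{"source_path": "/myfile", "destination_path": "/myfile"}],
}

resp = requests.post(
    "https://transfer.api.globusonline.org/v0.10/transfer",
    json=transfer_doc,                    # credential handling not shown
)
print(resp.status_code)
```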
Example application: UC sequencing facility
Components: sequencing instrument; sequencing-specific compute cluster; iBi general-purpose compute cluster; iBi file server (mounted as a drive); delivery of data to the customer’s Mac using Globus Connect.
Statistics and user feedback
Launched November 2010: >1,400 users registered; >350 TB of user data moved; >28 million user files moved; >140 endpoints registered.
Widely used on TeraGrid/XSEDE, at other centers and facilities, and internationally.
>20x faster than scp; faster than hand-tuned transfers.
“Last time I needed to fetch 100,000 files from NERSC, a graduate student babysat the process for a month.”
“I expected to spend four weeks writing code to manage my data transfers; with Globus Online, I was up and running in five minutes.”
“Globus Online’s speed has us planning experiments that we would never have considered previously.”
Moving 586 Terabytes in two weeks
Monitoring provides deep visibility


20 Terabytes in less than one day (versus 20 Gigabytes in more than two days)
[Chart: transfer volumes, kilobytes to terabytes]
Common research data management steps
Dark Energy Survey
Galaxy genomics
LIGO observatory
SBGrid structural biology consortium
NCAR climate data applications
Land use change; economics
We have choices of where to compute
Campus systems: first target for many researchers
XSEDE supercomputers: 220,000 cores, peer-reviewed awards, optimized for scientific computing
Open Science Grid: 60,000 cores; high throughput
Commercial cloud providers: instant access for small tasks, but expensive for big projects
Users insist that they need everything connected.
Towards “research IT as a service”
Research data management as a service
GO-User: credentials and other profile information
GO-Transfer: data movement
GO-Team: group membership
GO-Collaborate: connect to collaborative tools (Jira, Confluence, …)
GO-Store: access to campus, cloud, and XSEDE storage
GO-Catalog: on-demand metadata catalogs
GO-Compute: access to computers
GO-Galaxy: share, create, and run workflows
(Status ranges from available today, through prototype, to planned for the fall.)
SaaS services in action: The XSEDE vision (XUAS)
Data analysis as a service: early steps
Securely and reliably: (1) assemble code, (2) find computers, (3) deploy code, (4) run program, (5) access data, (6) store data, (7) record workflow, (8) reuse workflow.
Building blocks: VM image, app code, workflow; Galaxy, Condor; data store.
We have built such systems for biological, environmental, and economics researchers.
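As a purely hypothetical outline of how those eight steps compose: every function below is an invented stub for illustration, not a Globus Online, Galaxy, or Condor API.

```python
# Hypothetical outline of the eight steps above; every name is a stub.

def assemble_code(app):                   # 1. assemble code into an image
    return f"vm-image[{app}]"

def find_computers():                     # 2. find computers
    return "compute-pool"

def deploy_code(image, pool):             # 3. deploy code
    print(f"deployed {image} on {pool}")

def run_program(image, data):             # 4. run program on 5. accessed data
    return f"results[{data}]"

def store_data(results):                  # 6. store data
    print(f"stored {results}")

def run_workflow(app, data_ref):          # 7./8. record the workflow for reuse
    image = assemble_code(app)
    deploy_code(image, find_computers())
    store_data(run_program(image, data_ref))
    return {"app": app, "data": data_ref}  # the recorded, reusable recipe

record = run_workflow("my-analysis", "input-dataset")
```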
SaaS economics: a quick tutorial
Lower per-user cost (x10?) via aggregation onto common infrastructure: $400M/yr → $40M/yr?
Expect an initial “cost trough” due to fixed costs; per-user revenue then permits positive returns to scale, further reducing per-user cost over time.
x10 reduction in per-user cost: $50K → $5K/yr per lab; $300K → $30K/yr per project.
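Plugging the x10 reduction into the earlier figures, as a minimal sketch; the per-lab and per-project baselines follow from 0.5 and 3 FTE at ~$100K/FTE:

```python
# The x10 SaaS reduction applied to the earlier cost estimate.
labs, projects = 5_000, 500
lab_cost, project_cost = 50_000, 300_000  # $/yr: 0.5 and 3 FTE at ~$100K

before = labs * lab_cost + projects * project_cost
after = before // 10                      # aggregation onto common infrastructure
print(f"${before / 1e6:.0f}M/yr -> ${after / 1e6:.0f}M/yr")  # $400M -> $40M
```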
A national cyberinfrastructure strategy?
To provide more capability for more people at less cost …
Create infrastructure that is robust and universal, with economies of scale and positive returns to scale,
via the creative use of aggregation (“cloud”) and federation (“grid”).
[Diagram: small and medium laboratories (L) and projects (P) drawing on an “…aaS” layer: research data management; collaboration, computation; research administration]
Acknowledgments
Colleagues at UChicago and Argonne: Steve Tuecke, Ravi Madduri, Kyle Chard, Tanu Malik, Michael Russell, Paul Dave, Stuart Martin, Dan Katz, and many others.
Colleagues at other institutions: Carl Kesselman, Miron Livny, John Towns, and others.
Thanks to NSF OCI, MPS, and SBE; DOE ASCR; and NIH for support.
For more information
Foster, I. Globus Online: Accelerating and democratizing science through cloud-based services. IEEE Internet Computing (May/June): 70-73, 2011.
Allen, B., Bresnahan, J., Childers, L., Foster, I., Kandaswamy, G., Kettimuthu, R., Kordas, J., Link, M., Martin, S., Pickett, K., and Tuecke, S. Globus Online: Radical Simplification of Data Movement via SaaS. Communications of the ACM, 2011.

Editor's Notes

  • #2 I wanted a catchy title, so I chose one that referred to the recent victory of Watson over Brad Rutter and Ken Jennings in Jeopardy.
  • #3 But my point (perhaps confusingly) is not that new computer capabilities are a bad thing. On the contrary, these capabilities represent a tremendous opportunity for science. The challenge I want to speak to is how we leverage these capabilities without computers and computation overwhelming the research community in terms of both human and financial resources. The solution, I will suggest, is to get computation out of the lab—to outsource it to third-party providers. I will explain how this can be achieved.
  • #4 The need to deal with, and benefit from, large quantities of data is not a new concept: it has been noted in a series of policy reports, particularly in the US and UK, over the past several years, about new models of science and the investments to be made. A sampling of key reports, in chronological order: the Atkins report (2003), which laid out the vision of cyberinfrastructure and was also used as a roadmap by the UK for their eScience program; the NSB long-lived data report (2005), which defined data and data scientists and laid out capture and curation issues; 2020 Science (2006), outlining the data and computational nature of science; the NSF vision document (2007), which consolidated the Atkins report, the long-lived data report, and others to lay out a programmatic plan (DataNet and Cyber-enabled Discovery and Innovation came from this plan); the recent ACI report on data and visualization; Harnessing the Power (NITRD, 2009), for federal agencies; the RCUK eScience review; and the blue-ribbon panel on the economics of curation.
  • #5 But the data deluge is now upon us. I use a few examples to highlight developments: genome sequencing machines are doubling in output every nine months, which leaves the rather stately 18-month Moore’s Law doubling of computer performance in the shade; astronomy, which only entered the digital era around 2000, projects 100,000 TB of data from LSST by the end of the decade (2MASS was completed in 2001); simulation; and not just volume, but also complexity. Trends: scale, complexity, distributed generation, …
    Source for genomic data: http://www.sciencemag.org/content/331/6018/728.short (“Output from next-generation sequencing (NGS) has grown from 10 Mb per day to 40 Gb per day on a single sequencer, and there are now 10 to 20 major sequencing labs worldwide that have each deployed more than 10 sequencers.”)
    Source for molecular biology databases: http://nar.oxfordjournals.org/content/39/suppl_1/D1.full.pdf+html
    Source for climate change image: http://serc.carleton.edu/details/images/17685.html
  • #8 Not just small labs—medium science too. E.g., the Dark Energy Survey.
  • #9 For many researchers, projects, and institutions, large data volumes are not an opportunity but a fundamental challenge to their competitiveness. How can they keep up?
  • #10 200 universities * 250 faculty per university = 5,000. Summary: big projects can build sophisticated solutions to IT problems; small labs and collaborations have problems with both. They need solutions, not toolkits—ideally outsourced solutions.
  • #11 Need date
  • #14 Of course, people also make effective use of IaaS, but only for more specialized tasks
  • #15 More specifically, the opportunity is to apply a very modern technology—software as a service, or SaaS—to address a very modern problem: the enormous challenges inherent in translating revolutionary 21st-century technologies into scientific advances. Midway’s SaaS approach will address these challenges, both making powerful tools far more widely available and reducing the cycle time associated with research and discovery. Achieve economies of scale. Reduce cost per researcher dramatically. Achieve positive returns to scale (PRTS): most academic solutions do NOT have PRTS; most industrial solutions DO.
  • #16 So let’s look at that list again. My colleagues and I started an effort a little while ago aimed at applying SaaS to one of these tasks …
  • #17 Example: a small lab generates data at the Texas Advanced Computing Center or the Advanced Photon Source and needs to move it back to the lab. Or: it needs to move data from an experimental facility (e.g., a sequencing center or the Dark Energy Survey) to a computing facility for analysis.
  • #18 Data movement is conceptually simple, but can be surprisingly difficult
  • #19 Why? Discover endpoints, determine available protocols, negotiate firewalls, configure software, manage space, determine required credentials, configure protocols, detect and respond to failures, identify, diagnose, and correct network misconfigurations, …
  • #20 Reliable file transfer: easy “fire and forget” file transfers; automatic fault recovery; high performance; across multiple security domains. No IT required: no client software installation; new features automatically available; consolidated support and troubleshooting; works with existing GridFTP servers; Globus Connect solves the “last mile” problem.
  • #21 I’ll talk about integration with the Galaxy workflow system later …
  • #22 Reduce costs. Improve performance. Enable new science.
  • #26 What else do we need?
  • #27 Add university logos?
  • #31 Slide 33: Is the task of creating reusable workflows part of these 6 steps? Is publication and discovery of workflows/derived data products part of this as well? Is reproducible research part of it as well?
  • #32 Researchers vote with their dollars
  • #33 Before: lots of little labs; big science; XSEDE. After: lots of empowered SMLs, entrepreneurship in science, reproducible/reusable research, etc.