Science cloud foster june 2013


Published on

Published in: Technology
  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Science cloud foster june 2013

  1. 1. computationinstitute.orgScience as a serviceHow on-demand computingcan accelerate discoveryIan
  2. 2. computationinstitute.orgA time of disruptive changeAs evidenced by cost per human genome
  3. 3. computationinstitute.orgA time of disruptive changeAs evidenced by cost per human genome
  4. 4. computationinstitute.orgBut most labs have extremelylimited resourcesHeidorn: NSFgrants in 2007< $350,00080% of awards50% of grant $$$1,000,000$100,000$10,000$1,0002000 4000 6000 8000
  5. 5. computationinstitute.orgAutomation is requiredto apply moresophisticated methods tofar more data
  6. 6. computationinstitute.orgAutomation is requiredto apply moresophisticated methods tofar more dataOutsourcing is neededto achieve economies ofscale in the use ofautomated methods
  7. 7. computationinstitute.orgBuilding a discovery cloud• Identify time-consuming activities amenableto automation and outsourcing• Implement as high-quality, low-touch SaaS• Leverage IaaS for reliability,economies of scale• Extract common elements asresearch automation platformBonus question: SustainabilitySoftware as a servicePlatform as a serviceInfrastructure as a service
  8. 8. computationinstitute.orgWhere does time go in research?42%!!
  9. 9. computationinstitute.orgWe aspire (initially) to create agreat user experience forresearch data managementWhat would a “dropbox forscience” look like?
  10. 10.• Collect• Move• Sync• Share• Analyze• Annotate• Publish• Search• Backup• ArchiveBIG DATA
  11. 11. computationinstitute.orgRegistryStagingStoreIngestStoreAnalysisStoreCommunityStoreArchive MirrorIngestStoreAnalysisStoreCommunityStoreArchive MirrorRegistryQuotaexceeded!Expiredcredentials!Networkfailed. Retry.!Permissiondenied!It should be trivial to Collect, Move, Sync, Share, Analyze,Annotate, Publish, Search, Backup, & Archive BIG DATA… but in reality it’s often very challenging
  12. 12.• Collect• Move• Sync• Share• Analyze• Annotate• Publish• Search• Backup• ArchiveBIG DATA
  13. 13.• Collect• Move• Sync• Share• Analyze• Annotate• Publish• Search• Backup• Archive• Collect• Move• Sync• ShareCapabilities delivered usingSoftware-as-Service (SaaS) model
  14. 14. computationinstitute.orgDataSourceDataDestinationUserinitiatestransferrequest1GlobusOnlinemoves/syncs files2Globus Onlinenotifies user3
  15. 15. computationinstitute.orgDataSourceUser A selectsfile(s) to share;selectsuser/group, setsshare permissions1Globus Online tracksshared files; no needto move files tocloud storage!2User B logs in toGlobus Onlineand accessesshared file3
  16. 16. computationinstitute.orgExtreme ease of use• InCommon, Oauth, OpenID, X.509, …• Credential management• Group definition and management• Transfer management and optimization• Reliability via transfer retries• Web interface, REST API, command line• One-click “Globus Connect” install• 5-minute Globus Connect Multi User install
  17. 17. computationinstitute.orgEarly adoption is encouraging
  18. 18. computationinstitute.orgEarly adoption is encouraging10,000 registered users; >100 daily~18 PB moved; ~1B files10x (or better) performance vs. scp99.9% availabilityEntirely hosted on Amazon
  19. 19. 1e-011e+011e+031e+051e+07duration2011 20121 second1 minute1 hour1 day1 weekDuration of runs, in seconds, over time.Red: >10 TB transfer; green: >1 TB transfer.
  20. 20. computationinstitute.orgK. Heitmann (Argonne)moves 22 TB of cosmologydata LANL  ANL at 5 Gb/s
  21. 21. computationinstitute.orgB. Winjum (UCLA) moves900K-file plasma physicsdatasets UCLA NERSC
  22. 22. computationinstitute.orgDan Kozak (Caltech)replicates 1 PB LIGOastronomy data for resilience
  23. 23. computationinstitute.org2Credit: Kerstin Kleese-van DamErin Miller (PNNL)collects data atAdvanced PhotonSource, renders atPNNL, and views atANL
  24. 24.• Collect• Move• Sync• Share• Analyze• Annotate• Publish• Search• Backup• ArchiveBIG DATA
  25. 25.• Collect• Move• Sync• Share• Analyze• Annotate• Publish• Search• Backup• ArchiveBIG DATA
  26. 26. computationinstitute.orgGlobus Online already does a lotGlobus ToolkitSharing ServiceTransfer ServiceGlobus Nexus(Identity, Group, Profile)GlobusOnlineAPIsGlobusConnect
  27. 27. computationinstitute.orgA platform for integration
  28. 28. computationinstitute.orgA platform for integration
  29. 29. computationinstitute.orgA platform for integration
  30. 30. computationinstitute.orgData management SaaS (Globus) +Next-gen sequence analysis pipelines (Galaxy) +Cloud IaaS (Amazon) =Flexible, scalable, easy-to-use genomicsanalysis for all biologistsglobusgenomics
  31. 31. computationinstitute.org31Ravi Madduri, Bo Liu, Paul Davé, et al.
  32. 32. computationinstitute.orgRNA-Seq pipeline3Ravi Madduri, Bo Liu, Paul Davé, et al.
  33. 33. computationinstitute.orgAmazon pricing for DiffusionTensor Imaging pipeline00. m1.xlarge m3.xlarge m3.2xlarge m2.xlarge m2.2xlarge m2.4xlargeCostperSubject($)On-Demand Spot (Low) Spot (High)Credit: Kyle Chard
  34. 34. computationinstitute.orgWe are adding capabilitiesGlobus ToolkitSharing ServiceTransfer ServiceGlobus Nexus(Identity, Group, Profile)GlobusOnlineAPIsGlobusConnect
  35. 35. computationinstitute.orgGlobus ToolkitSharing ServiceTransfer ServiceDataset ServicesGlobus Nexus(Identity, Group, Profile)GlobusOnlineAPIsGlobusConnectWe are adding capabilities
  36. 36.• Ingest and publication– Imagine a DropBox that not only replicates, butalso extracts metadata, catalogs, converts• Cataloging– Virtual views of data based on user-definedand/or automatically extracted metadata• Computation– Associate computational procedures,orchestrate application, catalog results, recordprovenanceWe are adding capabilities
  37. 37. computationinstitute.orgLooking deeply at howresearchers use data• A single research question often requires theintegration of many data elements, that are:– In different locations– In different formats (Excel, text, CDF, HDF, …)– Described in different ways• Best grouping can vary during investigation– Longitudinal, vertical, cross-cutting• But always needs to be operated on as a unit– Share, annotate, process, copy, archive, …
  38. 38. computationinstitute.orgHow do we manage data today?• Often, a curious mix of ad hoc methods– Organize in directories using file and directorynaming conventions– Capture status in README files, spreadsheets,notebooks• Time-consuming, complex, error proneWhy can’t we manage our data likewe manage our pictures and music?
  39. 39. computationinstitute.orgIntroducing the dataset• Group data based on use, not location– Logical grouping to organize, reorganize, search, anddescribe usage• Tag with characteristics that reflect content …– Capture as much existing information as we can• …or to reflect current status in investigation– Stage of processing, provenance, validation, ..• Share data sets for collaboration– Control access to data and metadata• Operate on datasets as units– Copy, export, analyze, tag, archive, …
  40. 40. computationinstitute.orgBuilds on catalog as a serviceApproach• Hosted user-definedcatalogs• Based on tag model<subject, name, value>• Optional schemaconstraints• Integrated with otherGlobus servicesThree REST APIs/query/• Retrieve subjects/tags/• Create, delete, retrievetags/tagdef/• Create, delete, retrievetag definitionsBuilds on USC Tagfiler project (C. Kesselman et al.)
  41. 41. computationinstitute.orgProvide more capability formore people at lower cost bybuilding a “Discovery Cloud”Delivering “Science as a service”Our vision for a 21st centurydiscovery infrastructure
  42. 42. computationinstitute.orgIt’s a time of great opportunity … todevelop and apply Science aaSGlobus Nexus(Identity, Group, Profile)…Sharing ServiceTransfer ServiceDataset ServicesGlobus ToolkitGlobusOnlineAPIsGlobusConnect
  43. 43. computationinstitute.orgThanks to great colleaguesand collaborators• Steve Tuecke, Rachana Ananthakrishnan, KyleChard, Raj Kettimuthu, Ravi Madduri, TanuMalik, and many others at Argonne & Uchicago• Carl Kesselman, Karl Czajkowski, Rob Schuler,and others at USC/ISI• Francesco de Carlo, Chris Jacobsen, and othersat Argonne
  44. 44. computationinstitute.orgThank you to our sponsors!