Science cloud foster june 2013
Science cloud foster june 2013: Presentation Transcript

  • computationinstitute.org
    Science as a service
    How on-demand computing can accelerate discovery
    Ian Foster, foster@anl.gov
  • A time of disruptive change
    As evidenced by cost per human genome
  • But most labs have extremely limited resources
    Heidorn: NSF grants in 2007. Awards < $350,000: 80% of awards, 50% of grant $$
    [Chart; surviving axis ticks: $1,000 to $1,000,000 and 2000 to 8000]
  • Automation is required to apply more sophisticated methods to far more data
    Outsourcing is needed to achieve economies of scale in the use of automated methods
  • Building a discovery cloud
    • Identify time-consuming activities amenable to automation and outsourcing
    • Implement as high-quality, low-touch SaaS
    • Leverage IaaS for reliability, economies of scale
    • Extract common elements as research automation platform
    Bonus question: Sustainability
    [Stack diagram: Software as a service; Platform as a service; Infrastructure as a service]
  • Where does time go in research? 42%!!
  • We aspire (initially) to create a great user experience for research data management
    What would a “dropbox for science” look like?
  • BIG DATA
    • Collect
    • Move
    • Sync
    • Share
    • Analyze
    • Annotate
    • Publish
    • Search
    • Backup
    • Archive
  • It should be trivial to Collect, Move, Sync, Share, Analyze, Annotate, Publish, Search, Backup, & Archive BIG DATA … but in reality it’s often very challenging
    [Diagram: Registry, Staging Store, Ingest Store, Analysis Store, Community Store, Archive, Mirror; failure messages: “Quota exceeded!”, “Expired credentials!”, “Network failed. Retry.”, “Permission denied!”]
  • Of the ten activities (Collect, Move, Sync, Share, Analyze, Annotate, Publish, Search, Backup, Archive), four are highlighted:
    • Collect
    • Move
    • Sync
    • Share
    Capabilities delivered using the Software-as-a-Service (SaaS) model
  • Transfer: Data Source to Data Destination
    1. User initiates transfer request
    2. Globus Online moves/syncs files
    3. Globus Online notifies user
  • Sharing: Data Source
    1. User A selects file(s) to share; selects user/group, sets share permissions
    2. Globus Online tracks shared files; no need to move files to cloud storage!
    3. User B logs in to Globus Online and accesses shared file
  • Extreme ease of use
    • InCommon, OAuth, OpenID, X.509, …
    • Credential management
    • Group definition and management
    • Transfer management and optimization
    • Reliability via transfer retries
    • Web interface, REST API, command line
    • One-click “Globus Connect” install
    • 5-minute Globus Connect Multi User install
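
The “reliability via transfer retries” item can be illustrated with a minimal sketch of retrying a failed transfer with exponential backoff. This is not Globus Online's implementation or API: `transfer_with_retries`, `TransientTransferError`, `make_flaky_transfer`, and the backoff parameters are hypothetical names and values chosen only for illustration.

```python
import time


class TransientTransferError(Exception):
    """Hypothetical error for a failure worth retrying (e.g. a network drop)."""


def transfer_with_retries(transfer, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `transfer()` until it succeeds or `max_attempts` is reached.

    Waits base_delay * 2**(attempt - 1) seconds between attempts.
    Returns a (result, attempts_used) tuple.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return transfer(), attempt
        except TransientTransferError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            sleep(base_delay * 2 ** (attempt - 1))


def make_flaky_transfer(failures):
    """Build a stand-in transfer that fails `failures` times, then succeeds."""
    state = {"remaining": failures}

    def transfer():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise TransientTransferError("Network failed. Retry.")
        return "transfer complete"

    return transfer


if __name__ == "__main__":
    delays = []  # record requested waits instead of actually sleeping
    result, attempts = transfer_with_retries(make_flaky_transfer(2), sleep=delays.append)
    print(result, attempts, delays)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a hosted service would also need to persist transfer state across restarts, which this sketch does not attempt.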
  • Early adoption is encouraging
    • 10,000 registered users; >100 daily
    • ~18 PB moved; ~1B files
    • 10x (or better) performance vs. scp
    • 99.9% availability
    • Entirely hosted on Amazon
  • [Chart: duration of runs, in seconds (log scale, 1e-01 to 1e+07, with reference marks at 1 second, 1 minute, 1 hour, 1 day, 1 week), over time (2011 to 2012). Red: >10 TB transfer; green: >1 TB transfer.]
  • K. Heitmann (Argonne) moves 22 TB of cosmology data from LANL to ANL at 5 Gb/s
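
To put the 22 TB at 5 Gb/s figure in perspective, the short calculation below estimates the ideal transfer time. It assumes decimal units (1 TB = 10^12 bytes, 1 Gb/s = 10^9 bits per second) and a rate sustained for the whole transfer; the slide states neither, and `transfer_time_seconds` is a name introduced here for illustration.

```python
def transfer_time_seconds(size_bytes, rate_bits_per_second):
    """Ideal transfer time: total bits divided by a sustained line rate."""
    return size_bytes * 8 / rate_bits_per_second


TB = 10**12   # decimal terabyte (assumption; the slide does not say TB vs. TiB)
GBPS = 10**9  # one gigabit per second, in bits per second

seconds = transfer_time_seconds(22 * TB, 5 * GBPS)
hours = seconds / 3600

if __name__ == "__main__":
    print(f"{seconds:.0f} s, about {hours:.1f} hours")
```

Under those assumptions the transfer takes 35,200 seconds, a little under ten hours.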
  • B. Winjum (UCLA) moves 900K-file plasma physics datasets from UCLA to NERSC
  • Dan Kozak (Caltech) replicates 1 PB LIGO astronomy data for resilience
  • Erin Miller (PNNL) collects data at Advanced Photon Source, renders at PNNL, and views at ANL
    Credit: Kerstin Kleese-van Dam
  • Globus Online already does a lot
    [Architecture diagram: Globus Toolkit; Sharing Service; Transfer Service; Globus Nexus (Identity, Group, Profile); Globus Online APIs; Globus Connect]
  • A platform for integration
  • globus genomics
    Data management SaaS (Globus) + Next-gen sequence analysis pipelines (Galaxy) + Cloud IaaS (Amazon) = Flexible, scalable, easy-to-use genomics analysis for all biologists
  • [Figure] Ravi Madduri, Bo Liu, Paul Davé, et al.
  • RNA-Seq pipeline
    Ravi Madduri, Bo Liu, Paul Davé, et al.
  • Amazon pricing for Diffusion Tensor Imaging pipeline
    [Chart: cost per subject ($), 0 to 0.5, by instance type (m1.large, m1.xlarge, m3.xlarge, m3.2xlarge, m2.xlarge, m2.2xlarge, m2.4xlarge); series: On-Demand, Spot (Low), Spot (High)]
    Credit: Kyle Chard
  • We are adding capabilities
    [Architecture diagram: Globus Toolkit; Sharing Service; Transfer Service; Dataset Services; Globus Nexus (Identity, Group, Profile); Globus Online APIs; Globus Connect]
  • We are adding capabilities
    • Ingest and publication
      – Imagine a DropBox that not only replicates, but also extracts metadata, catalogs, converts
    • Cataloging
      – Virtual views of data based on user-defined and/or automatically extracted metadata
    • Computation
      – Associate computational procedures, orchestrate application, catalog results, record provenance
  • Looking deeply at how researchers use data
    • A single research question often requires the integration of many data elements that are:
      – In different locations
      – In different formats (Excel, text, CDF, HDF, …)
      – Described in different ways
    • Best grouping can vary during investigation
      – Longitudinal, vertical, cross-cutting
    • But always needs to be operated on as a unit
      – Share, annotate, process, copy, archive, …
  • How do we manage data today?
    • Often, a curious mix of ad hoc methods
      – Organize in directories using file and directory naming conventions
      – Capture status in README files, spreadsheets, notebooks
    • Time-consuming, complex, error prone
    Why can’t we manage our data like we manage our pictures and music?
  • Introducing the dataset
    • Group data based on use, not location
      – Logical grouping to organize, reorganize, search, and describe usage
    • Tag with characteristics that reflect content …
      – Capture as much existing information as we can
    • … or to reflect current status in investigation
      – Stage of processing, provenance, validation, …
    • Share data sets for collaboration
      – Control access to data and metadata
    • Operate on datasets as units
      – Copy, export, analyze, tag, archive, …
  • Builds on catalog as a service
    Approach
    • Hosted user-defined catalogs
    • Based on tag model <subject, name, value>
    • Optional schema constraints
    • Integrated with other Globus services
    Three REST APIs
    • /query/: Retrieve subjects
    • /tags/: Create, delete, retrieve tags
    • /tagdef/: Create, delete, retrieve tag definitions
    Builds on USC Tagfiler project (C. Kesselman et al.)
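
The <subject, name, value> tag model can be made concrete with a small in-memory sketch. It is not the Tagfiler or Globus catalog API: the `TagCatalog` class, its method names, the type-based schema check, and the example file names and tags are all hypothetical, and the methods only loosely mirror the three resource groups (/tagdef/, /tags/, /query/) named on the slide.

```python
from collections import defaultdict


class TagCatalog:
    """In-memory catalog of <subject, name, value> tags (illustrative only)."""

    def __init__(self):
        self._tagdefs = {}              # tag name -> expected Python type
        self._tags = defaultdict(dict)  # subject -> {tag name: value}

    def define_tag(self, name, value_type):
        """Register an optional schema constraint, loosely like /tagdef/."""
        self._tagdefs[name] = value_type

    def set_tag(self, subject, name, value):
        """Attach a tag to a subject, loosely like /tags/."""
        expected = self._tagdefs.get(name)
        if expected is not None and not isinstance(value, expected):
            raise TypeError(f"tag {name!r} expects {expected.__name__}")
        self._tags[subject][name] = value

    def delete_tag(self, subject, name):
        """Remove a tag from a subject if it is present."""
        self._tags.get(subject, {}).pop(name, None)

    def query(self, **criteria):
        """Return subjects whose tags match every criterion, loosely like /query/."""
        return sorted(
            subject
            for subject, tags in self._tags.items()
            if all(tags.get(name) == value for name, value in criteria.items())
        )


# Hypothetical subjects and tags, grouped by use rather than by location.
catalog = TagCatalog()
catalog.define_tag("stage", str)
catalog.set_tag("run-001.hdf", "stage", "raw")
catalog.set_tag("run-002.hdf", "stage", "validated")
catalog.set_tag("run-002.hdf", "instrument", "beamline-2")

if __name__ == "__main__":
    print(catalog.query(stage="validated"))
```

Because subjects are grouped by their tags rather than by directory, the same file can appear in several logical groupings as an investigation proceeds, which is the point the dataset slides make.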
  • Our vision for a 21st century discovery infrastructure
    Provide more capability for more people at lower cost by building a “Discovery Cloud”
    Delivering “Science as a service”
  • It’s a time of great opportunity … to develop and apply Science aaS
    [Architecture diagram: Globus Nexus (Identity, Group, Profile); …; Sharing Service; Transfer Service; Dataset Services; Globus Toolkit; Globus Online APIs; Globus Connect]
  • Thanks to great colleagues and collaborators
    • Steve Tuecke, Rachana Ananthakrishnan, Kyle Chard, Raj Kettimuthu, Ravi Madduri, Tanu Malik, and many others at Argonne & UChicago
    • Carl Kesselman, Karl Czajkowski, Rob Schuler, and others at USC/ISI
    • Francesco de Carlo, Chris Jacobsen, and others at Argonne
  • Thank you to our sponsors!