Science for the Future: Strategies for Moving and Sharing Data
 


A talk at the National User Facility Organization (NUFO) 2013 meeting at LBNL, where the theme this year is "the future of scientific data."



Usage Rights

© All Rights Reserved

  • This image shows a 3D rendering of a Shewanella biofilm grown on a flat plastic substrate in a Constant Depth bioFilm Fermenter (CDFF).  The image was generated using x-ray microtomography at the Advanced Photon Source, Argonne National Laboratory.   
  • Mention plans start from 20K and that we are talking to NET+ for campus plans with separate pricing?

Presentation Transcript

  • Globus Online. Science for the Future: Strategies for distributing and sharing data. www.globusonline.org. Ian Foster, foster@anl.gov
  • Big science data should be easy [Diagram labels: Registry, Staging Store, Ingest Store, Analysis Store, Community Store, Archive, Mirror]
  • … but it’s hard and frustrating! [Same diagram, annotated with: "Quota exceeded!", "Expired credentials!", "Network failed. Retry.", "Permission denied!"]
  • Excerpts from ESnet reports
    • “Transfers often take longer than expected based on available network capacities”
    • “Lack of an easy to use interface to some of the high-performance tools”
    • “Tools [are] too difficult to install and use”
    • “Time and interruption to other work required to supervise large data transfers”
    • “Need data transfer tools that are easy to use, well-supported, and permitted by site and facility cybersecurity organizations”
  • We envisage a world where data … flows rapidly, reliably, and securely among experimental facilities, online and archival storage, computing facilities, and remote institutions
  • We envisage a world where data … is easily integrated into dynamic datasets that also include metadata and programs necessary to understand and regenerate it
  • We envisage a world where data … is readily discoverable and accessible to collaborators, regardless of their and the data’s location
  • We believe a new approach is needed to deliver data management infrastructure: Frictionless, Affordable, Sustainable. Like … but for science!
  • Focusing on “frictionless”, we’ve started to do this with the Globus Online service … Transfer and sharing of large data sets … with Dropbox-like characteristics … directly from your own storage systems
  • We started with reliable, secure, high-performance file transfer … [Diagram: Data Source, Data Destination]
    1. User initiates transfer request
    2. Globus Online moves and syncs files
    3. Globus Online notifies user
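The three transfer steps can be sketched on a single machine. This is only an illustration, assuming a directory-to-directory copy in place of remote endpoints; `sync_transfer`, `file_digest`, and the `notify` callback are hypothetical names and are not part of the Globus Online API.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def sync_transfer(source: Path, destination: Path, notify=print) -> list:
    """Copy files that are missing or different at the destination, then notify.

    Hypothetical stand-in for steps 1-3 on the slide; a real service would
    move data between remote endpoints rather than local directories.
    """
    copied = []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        dst = destination / rel
        # Sync: skip files whose contents already match at the destination.
        if dst.exists() and file_digest(dst) == file_digest(src):
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(rel.as_posix())
    # Step 3: tell the user the request has finished.
    notify(f"Transfer complete: {len(copied)} file(s) copied")
    return copied
```

Running the function a second time on unchanged directories copies nothing, which is the "sync" behaviour the slide refers to. Checksums are used here for simplicity; comparing sizes and modification times would be a cheaper alternative.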
  • … and then made it simple to share big data off existing storage systems [Diagram: Data Source]
    1. User A selects file(s) to share, selects user or group, and sets permissions
    2. Globus Online tracks shared files; no need to move files to cloud storage!
    3. User B logs in to Globus Online and accesses shared file
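The sharing steps amount to keeping an access list next to data that stays where it is. The sketch below is a minimal model of that idea, assuming path-prefix shares and string permissions; the `SharedEndpoint` class and its methods are hypothetical and do not reflect how Globus Online implements sharing.

```python
from dataclasses import dataclass, field


@dataclass
class SharedEndpoint:
    """Hypothetical record of in-place shares; files are never copied."""

    owner: str
    # Maps a shared path prefix to {principal: set of permissions}.
    grants: dict = field(default_factory=dict)

    def share(self, path: str, principal: str, permissions: set) -> None:
        """Step 1: the owner grants a user or group access to a path."""
        self.grants.setdefault(path.rstrip("/"), {})[principal] = set(permissions)

    def can_access(self, path: str, user: str, groups=(), permission="read") -> bool:
        """Step 3: check whether a user may access a path under any share."""
        if user == self.owner:
            return True
        principals = {user, *groups}
        for prefix, acl in self.grants.items():
            # Match the shared path itself or anything beneath it.
            if path == prefix or path.startswith(prefix + "/"):
                if any(permission in acl.get(p, ()) for p in principals):
                    return True
        return False
```

For example, after `ep.share("/data/run42", "userB", {"read"})`, the call `ep.can_access("/data/run42/scan.h5", "userB")` returns `True`, while a `"write"` check for the same user returns `False`.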
  • Early adoption is encouraging
    • ~18 PB and 1B files moved
    • 10x (or better) performance vs. scp
    • 99.9% availability
  • B. Winjum (UCLA) moves 900K-file plasma physics datasets between UCLA and NERSC
  • Dan Kozak (Caltech) replicates 1 PB LIGO astronomy data for resilience
  • Exemplar: APS Beamline 2-BM
    • X-ray imaging, tomography, ~few µm to 30 nm resolution
    • Currently can generate >100 TB per day
    • <1 GB/s data rate; ~3-5 GB/s in 5-10 years
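To relate per-second data rates to daily volumes, a short conversion helps. The sketch below assumes continuous acquisition at a constant rate and decimal units (1 TB = 1000 GB); real beamlines may not sustain either, and `daily_volume_tb` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def daily_volume_tb(rate_gb_per_s: float) -> float:
    """Daily volume in TB for a sustained rate in GB/s (1 TB = 1000 GB)."""
    return rate_gb_per_s * SECONDS_PER_DAY / 1000


for rate in (1, 3, 5):
    print(f"{rate} GB/s sustained -> {daily_volume_tb(rate):.1f} TB/day")
# 1 GB/s sustained -> 86.4 TB/day
# 3 GB/s sustained -> 259.2 TB/day
# 5 GB/s sustained -> 432.0 TB/day
```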
  • Transforming data acquisition
    Current
    • Experimental parameters optimized manually
    • Collected data combined with visual inspection to confirm optimal condition
    • Data reconstructed and sent to users via external drive
    • User team starts data reduction at home institution
    Envisaged
    • Experimental parameters optimized automatically
    • Collected data available to optimization programs
    • Data are automatically reconstructed, reduced, and shared with local and remote participants
    • User team leaves the APS with reduced data
  • Globus Online as enabler [Diagram labels: Facility data acquisition; Globus Online transfer service; Reduced data; Analysis/Sharing; Globus Online sharing service; Globus Online dataset service*. * In development]
  • Erin Miller (PNNL) collects data at Advanced Photon Source, renders at PNNL, and views at ANL. Credit: Kerstin Kleese-van Dam
  • We believe a new approach is needed to deliver data management infrastructure: Frictionless, Affordable, Sustainable
  • We’ve got a handle on “frictionless”
    • Web interface, REST API, command line
    • InCommon, OAuth, OpenID, X.509, …
    • Credential management
    • Group definition and management
    • Transfer management and optimization
    • Reliability via transfer retries
    • Integration with ESnet “Science DMZs”
    • One-click “Globus Connect” install
    • 5-minute Globus Connect Multi User install
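"Reliability via transfer retries" can be illustrated with a small retry loop. The slide does not say how retries are scheduled, so the exponential backoff here is an assumption, and `TransientTransferError`, `transfer_with_retries`, and `attempt_transfer` are hypothetical names rather than Globus Online interfaces.

```python
import time


class TransientTransferError(Exception):
    """Hypothetical error for recoverable faults such as a dropped connection."""


def transfer_with_retries(attempt_transfer, max_attempts=5, base_delay=1.0,
                          sleep=time.sleep):
    """Call attempt_transfer until it succeeds or max_attempts is reached.

    Only TransientTransferError triggers a retry; any other exception, and the
    final transient failure, propagate to the caller.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return attempt_transfer()
        except TransientTransferError:
            if attempt == max_attempts:
                raise
            # Assumed policy: wait 1x, 2x, 4x, ... the base delay between tries.
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting. Permanent faults such as "Permission denied!" should not be retried, which is why only the transient error type is caught.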
  • “Affordable” and “sustainable”? Common expectation is either:
    • High-priced commercial software (with generally higher levels of quality)
    Or:
    • Free, open source software (with generally lower levels of quality)
    We aim to offer the best of all worlds!
  • We are a non-profit service provider to the non-profit research community. Our challenge: Sustainability
  • Globus Online Provider Plans: starting at $20k per year
    • Managed endpoints with sharing
    • Multiple GridFTP servers per endpoint
    • Branded web sites
    • Alternate identity provider
    • Usage reporting
    • Mass storage system (MSS) optimizations
    • Operations monitoring and management
    • Input into and access to product roadmap
  • Provider Plan not required to get started. Use Globus Connect Multiuser to easily connect your resources with Globus. Go to: globusonline.org/gcmu
  • We hope you will join us
  • Providers are also using Globus Online as a platform [Diagram labels: Globus Nexus (Identity, Group, Profile); …; Sharing Service; Transfer Service; Dataset Services; Globus Toolkit; Globus Online APIs; Globus Connect]
  • Early platform adopters
  • Our research is supported by: U.S. Department of Energy
  • Questions
    • Contact: support@globusonline.org
    • Providers: globusonline.org/provider-plans
    • Researchers: globusonline.org/plus
    • www.globusonline.org