This image shows a 3D rendering of a Shewanella biofilm grown on a flat plastic substrate in a Constant Depth bioFilm Fermenter (CDFF). The image was generated using x-ray microtomography at the Advanced Photon Source, Argonne National Laboratory.
Mention that plans start at $20K and that we are talking to NET+ about campus plans with separate pricing?
Science for the Future: Strategies for Moving and Sharing Data
Globus Online
Science for the Future: Strategies for distributing and sharing data
www.globusonline.org
Ian Foster, foster@anl.gov
Big science data should be easy
[Diagram: data flows among a registry, staging store, ingest store, analysis store, community store, and archive/mirror]
… but it’s hard and frustrating!
[Diagram: the same data flow among registry, staging, ingest, analysis, community, and archive/mirror stores, annotated with errors: "Quota exceeded!", "Expired credentials!", "Network failed. Retry.", "Permission denied!"]
Excerpts from ESnet reports
• “Transfers often take longer than expected based on available network capacities”
• “Lack of an easy to use interface to some of the high-performance tools”
• “Tools [are] too difficult to install and use”
• “Time and interruption to other work required to supervise large data transfers”
• “Need data transfer tools that are easy to use, well-supported, and permitted by site and facility cybersecurity organizations”
We envisage a world where data …
… flows rapidly, reliably, and securely among experimental facilities, online and archival storage, computing facilities, and remote institutions

We envisage a world where data …
… is easily integrated into dynamic datasets that also include the metadata and programs necessary to understand and regenerate it

We envisage a world where data …
… is readily discoverable and accessible to collaborators, regardless of where they or the data are located
We believe a new approach is needed to deliver data management infrastructure that is:
• Frictionless
• Affordable
• Sustainable
Like … but for science!
Focusing on “frictionless”, we’ve started to do this with the Globus Online service: transfer and sharing of large data sets, with Dropbox-like characteristics, directly from your own storage systems
We started with reliable, secure, high-performance file transfer …
[Diagram: data source → data destination]
1. User initiates transfer request
2. Globus Online moves and syncs files
3. Globus Online notifies user
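The transfer steps (initiate, move and sync, notify) can be illustrated with a minimal one-way sync. This sketch assumes two local directories and a simple size-and-modification-time comparison. The names `sync_tree`, `needs_copy`, and the `notify` callback are hypothetical. The sketch is not how Globus Online is implemented, since the service moves data between remote GridFTP endpoints.

```python
import shutil
from pathlib import Path
from typing import Callable, List


def needs_copy(src: Path, dst: Path) -> bool:
    """Return True if dst is missing or differs from src in size or mtime."""
    if not dst.exists():
        return True
    s, d = src.stat(), dst.stat()
    # Compare size and whole-second modification time. A real service
    # might compare checksums instead (hypothetical policy here).
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def sync_tree(src_root: Path, dst_root: Path,
              notify: Callable[[str], None] = print) -> List[Path]:
    """Copy new or changed files from src_root to dst_root, then notify."""
    copied: List[Path] = []
    for src in sorted(p for p in src_root.rglob("*") if p.is_file()):
        rel = src.relative_to(src_root)
        dst = dst_root / rel
        if needs_copy(src, dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 preserves mtime for the next comparison
            copied.append(rel)
    # Step 3: tell the user the transfer finished.
    notify(f"Transfer complete: {len(copied)} file(s) copied")
    return copied
```

Running `sync_tree` a second time on an unchanged source copies nothing. That "sync" behavior is what makes retrying an interrupted transfer cheap.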
… and then made it simple to share big data off existing storage systems
1. User A selects file(s) to share, selects a user or group, and sets permissions
2. Globus Online tracks shared files; no need to move files to cloud storage!
3. User B logs in to Globus Online and accesses the shared file
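The sharing steps can be modeled as a permission registry that records grants while the files stay on the original storage system. This is a minimal in-memory sketch. `ShareRegistry`, its methods, and the "r"/"w" permission letters are hypothetical and do not reflect the Globus Online API. A grant on a directory is assumed to cover everything beneath it.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class ShareRegistry:
    """Hypothetical model of shared-file permissions; files are never moved."""
    groups: Dict[str, Set[str]] = field(default_factory=dict)
    # (path prefix, user or group) -> permissions such as {"r"} or {"r", "w"}
    acl: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)

    def share(self, path: str, principal: str, perms: str) -> None:
        """Step 1: user A grants perms (e.g. "r" or "rw") on path to a user or group."""
        self.acl.setdefault((path.rstrip("/"), principal), set()).update(perms)

    def principals_for(self, user: str) -> Set[str]:
        """A user acts as themselves plus every group they belong to."""
        return {user} | {g for g, members in self.groups.items() if user in members}

    def can_access(self, user: str, path: str, perm: str) -> bool:
        """Step 3: true if a grant on path, or on a parent directory, covers perm."""
        who = self.principals_for(user)
        for (prefix, principal), perms in self.acl.items():
            covers_path = path == prefix or path.startswith(prefix + "/")
            if principal in who and covers_path and perm in perms:
                return True
        return False
```

Checking the prefix plus a trailing "/" keeps a grant on `/data/run42` from leaking to a sibling such as `/data/run420`.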
Early adoption is encouraging
• ~18 PB and 1B files moved
• 10x (or better) performance vs. scp
• 99.9% availability
B. Winjum (UCLA) moves 900K-file plasma physics datasets between UCLA and NERSC

Dan Kozak (Caltech) replicates 1 PB of LIGO astronomy data for resilience
Exemplar: APS Beamline 2-BM
• X-ray imaging and tomography, from a few µm to 30 nm resolution
• Currently can generate >100 TB per day
• <1 GB/s data rate today; ~3-5 GB/s in 5-10 years
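To relate the quoted data rates to daily volumes, here is a worked conversion. It assumes decimal units (1 TB = 1000 GB) and a rate sustained around the clock, which real beamline duty cycles may not reach.

```latex
1\,\mathrm{GB/s} \times 86{,}400\,\mathrm{s/day} = 86{,}400\,\mathrm{GB/day} \approx 86.4\,\mathrm{TB/day}

3\,\mathrm{GB/s} \;\Rightarrow\; 259.2\,\mathrm{TB/day}, \qquad 5\,\mathrm{GB/s} \;\Rightarrow\; 432\,\mathrm{TB/day}
```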
Transforming data acquisition
Current
• Experimental parameters optimized manually
• Collected data combined with visual inspection to confirm optimal conditions
• Data reconstructed and sent to users via external drive
• User team starts data reduction at home institution

Envisaged
• Experimental parameters optimized automatically
• Collected data available to optimization programs
• Data are automatically reconstructed, reduced, and shared with local and remote participants
• User team leaves the APS with reduced data
Globus Online as enabler
[Diagram components: facility data acquisition; Globus Online transfer service; reduced data; analysis/sharing; Globus Online sharing service; Globus Online dataset service**]
** In development
Erin Miller (PNNL) collects data at the Advanced Photon Source, renders at PNNL, and views at ANL
Credit: Kerstin Kleese-van Dam
We believe a new approach is needed to deliver data management infrastructure that is:
• Frictionless
• Affordable
• Sustainable
We’ve got a handle on “frictionless”
• Web interface, REST API, command line
• InCommon, OAuth, OpenID, X.509, …
• Credential management
• Group definition and management
• Transfer management and optimization
• Reliability via transfer retries
• Integration with ESnet “Science DMZs”
• One-click “Globus Connect” install
• 5-minute Globus Connect Multi User install
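"Reliability via transfer retries" names a general technique. A common form is capped exponential backoff with jitter, sketched below. It is not Globus Online's actual retry policy. `with_retries`, `TransientError`, and the default delays are hypothetical. Errors that retrying cannot fix, such as "Permission denied", are assumed to raise other exception types and so propagate immediately.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying (e.g. a dropped network)."""


def with_retries(op: Callable[[], T], max_attempts: int = 5,
                 base_delay: float = 1.0, max_delay: float = 60.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run op, retrying TransientError with capped exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and surface the last transient error
            # Wait a random time up to base_delay * 2^(attempt - 1), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
    raise RuntimeError("unreachable")
```

The random jitter keeps many clients that failed together from all retrying at the same moment. Injecting `sleep` lets the policy be tested without real waiting.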
“Affordable” and “sustainable”?
Common expectation is either:
– High-priced commercial software (with generally higher levels of quality)
Or:
– Free, open source software (with generally lower levels of quality)
We aim to offer the best of both worlds!
We are a non-profit service provider to the non-profit research community
Our challenge: sustainability
Globus Online Provider Plans
Starting at $20k per year
• Managed endpoints with sharing
• Multiple GridFTP servers per endpoint
• Branded web sites
• Alternate identity provider
• Usage reporting
• Mass storage system (MSS) optimizations
• Operations monitoring and management
• Input into and access to product roadmap
Provider Plan not required to get started
Use Globus Connect Multiuser to easily connect your resources with Globus
Go to: globusonline.org/gcmu