Science for the Future: Strategies for Moving and Sharing Data

A talk at the National User Facility Organization (NUFO) 2013 meeting at LBNL, where the theme this year is "the future of scientific data."

Published in: Technology

  • This image shows a 3D rendering of a Shewanella biofilm grown on a flat plastic substrate in a Constant Depth bioFilm Fermenter (CDFF). The image was generated using x-ray microtomography at the Advanced Photon Source, Argonne National Laboratory.
  • Mention plans start from 20K and that we are talking to NET+ for campus plans with separate pricing?

    1. globus online. Science for the Future: Strategies for distributing and sharing data. www.globusonline.org. Ian Foster, foster@anl.gov
    2. Big science data should be easy. [Diagram labels: Registry, Staging Store, Ingest Store, Analysis Store, Community Store, Archive, Mirror]
    3. … but it’s hard and frustrating! [Same diagram, overlaid with errors: “Quota exceeded!”, “Expired credentials!”, “Network failed. Retry.!”, “Permission denied!”]
    4. Excerpts from ESnet reports
        • “Transfers often take longer than expected based on available network capacities”
        • “Lack of an easy to use interface to some of the high-performance tools”
        • “Tools [are] too difficult to install and use”
        • “Time and interruption to other work required to supervise large data transfers”
        • “Need data transfer tools that are easy to use, well-supported, and permitted by site and facility cybersecurity organizations”
    5. We envisage a world where data … flows rapidly, reliably, and securely among: experimental facilities, online and archival storage, computing facilities, and remote institutions
    6. We envisage a world where data … is easily integrated into dynamic datasets that also include metadata and programs necessary to understand and regenerate it
    7. We envisage a world where data … is readily discoverable and accessible to collaborators, regardless of their and the data’s location
    8. We believe a new approach is needed to deliver data management infrastructure: Frictionless, Affordable, Sustainable. Like … but for science!
    9. Focusing on “frictionless”, we’ve started to do this with the Globus Online service … Transfer and sharing of large data sets … with Dropbox-like characteristics … directly from your own storage systems
    10. We started with reliable, secure, high-performance file transfer … [Diagram: Data Source, Data Destination]
        • (1) User initiates transfer request
        • (2) Globus Online moves and syncs files
        • (3) Globus Online notifies user
    11. … and then made it simple to share big data off existing storage systems [Diagram: Data Source]
        • (1) User A selects file(s) to share, selects user or group, and sets permissions
        • (2) Globus Online tracks shared files; no need to move files to cloud storage!
        • (3) User B logs in to Globus Online and accesses shared file
    12. Early adoption is encouraging
    13. Early adoption is encouraging: ~18 PB and 1B files moved; 10x (or better) performance vs. scp; 99.9% availability
    14. B. Winjum (UCLA) moves 900K-file plasma physics datasets between UCLA and NERSC
    15. Dan Kozak (Caltech) replicates 1 PB LIGO astronomy data for resilience
    16. Exemplar: APS Beamline 2-BM
        • X-ray imaging, tomography, ~few µm to 30 nm resolution
        • Currently can generate >100 TB per day
        • <1 GB/s data rate; ~3-5 GB/s in 5-10 years
    17. Transforming data acquisition: Current
        • Experimental parameters optimized manually
        • Collected data combined with visual inspection to confirm optimal condition
        • Data reconstructed and sent to users via external drive
        • User team starts data reduction at home institution
    18. Transforming data acquisition: Envisaged
        • Experimental parameters optimized automatically
        • Collected data available to optimization programs
        • Data are automatically reconstructed, reduced, and shared with local and remote participants
        • User team leaves the APS with reduced data
        Current
        • Experimental parameters optimized manually
        • Collected data combined with visual inspection to confirm optimal condition
        • Data reconstructed and sent to users via external drive
        • User team starts data reduction at home institution
    19. Globus Online as enabler. [Diagram: Facility data acquisition, Globus Online transfer service, Reduced data, Analysis/Sharing, Globus Online sharing service, Globus Online dataset service*] (* In development)
    20. Erin Miller (PNNL) collects data at Advanced Photon Source, renders at PNNL, and views at ANL. Credit: Kerstin Kleese-van Dam
    21. We believe a new approach is needed to deliver data management infrastructure: Frictionless, Affordable, Sustainable
    22. We’ve got a handle on “frictionless”
        • Web interface, REST API, command line
        • InCommon, OAuth, OpenID, X.509, …
        • Credential management
        • Group definition and management
        • Transfer management and optimization
        • Reliability via transfer retries
        • Integration with ESnet “Science DMZs”
        • One-click “Globus Connect” install
        • 5-minute Globus Connect Multi User install
    23. “Affordable” and “sustainable”? Common expectation is either:
        – High-priced commercial software (with generally higher levels of quality)
        Or:
        – Free, open source software (with generally lower levels of quality)
        We aim to offer the best of all worlds!
    24. We are a non-profit service provider to the non-profit research community
    25. Our challenge: Sustainability. We are a non-profit service provider to the non-profit research community
    26. Globus Online Provider Plans: starting at $20k per year
        • Managed endpoints with sharing
        • Multiple GridFTP servers per endpoint
        • Branded web sites
        • Alternate identity provider
        • Usage reporting
        • Mass storage system (MSS) optimizations
        • Operations monitoring and management
        • Input into and access to product roadmap
    27. Provider Plan not required to get started. Use Globus Connect Multiuser to easily connect your resources with Globus. Go to: globusonline.org/gcmu [Diagram labels: Registry, Staging Store, Ingest Store, Analysis Store, Community Store, Archive, Mirror]
    28. We hope you will join us
    29. Providers are also using Globus Online as a platform. [Diagram: Globus Nexus (Identity, Group, Profile), …, Sharing Service, Transfer Service, Dataset Services, Globus Toolkit, Globus Online APIs, Globus Connect]
    30. Early platform adopters
    31. Our research is supported by: U.S. Department of Energy
    32. Questions. Contact: support@globusonline.org. Providers: globusonline.org/provider-plans. Researchers: globusonline.org/plus. www.globusonline.org
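
Slide 22 lists "reliability via transfer retries", and slide 3 shows the kinds of failures a transfer can hit (network failure, expired credentials, quota exceeded, permission denied). The slides do not say how Globus Online implements retries, so the following is only a minimal Python sketch of the general idea, assuming exponential backoff and a split between failures worth retrying and failures that need user action. Every name here (`TransientError`, `PermanentError`, `transfer_with_retries`, `flaky_transfer`) is hypothetical and is not part of any Globus API.

```python
import time


class TransientError(Exception):
    """A failure worth retrying, e.g. a dropped network connection (assumed category)."""


class PermanentError(Exception):
    """A failure retrying cannot fix, e.g. permission denied (assumed category)."""


def transfer_with_retries(attempt_transfer, max_attempts=5, base_delay=1.0,
                          sleep=time.sleep):
    """Call attempt_transfer() until it succeeds or attempts run out.

    Transient errors are retried with exponential backoff. Permanent
    errors are not caught, so they propagate on the first occurrence.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return attempt_transfer()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))


# Usage: a hypothetical transfer that fails twice, then succeeds.
outcomes = iter([TransientError("network failed"),
                 TransientError("network failed"),
                 "done"])


def flaky_transfer():
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


delays = []  # record the backoff delays instead of actually sleeping
result = transfer_with_retries(flaky_transfer, sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a real service would presumably also cap the delay and report each retry to the user.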
