Research Data Management as a Service


This presentation is by Ian Foster, director of the Computation Institute at the University of Chicago. It was given at the Great Plains Network Annual Meeting on May 29, 2013.

For more information on Globus Online, visit

"What would a Dropbox for science look like?" asks Foster. "It should be trivial to collect, move, sync, share, analyze, annotate, publish, search, backup, and archive Big Data. But in reality it's often very challenging."

Globus Online, a software-as-a-service (SaaS) platform for research data management, is designed to address these problems. This slideshow explains how Globus Online does that for universities and laboratories around the world.

Published in: Technology

  • Here are some of the areas where we have active projects. Focus on areas of particular interest to I2/ESnet, namely HEP, climate change, and genomics (up and coming).
  • Many in this room are probably users of Dropbox or similar services for keeping their files synced across multiple machines. Well, the scientific research equivalent is a little different.
  • So how would such a Dropbox for science be used? Let’s look at a very typical scientific data workflow. Data is generated by some instrument (a sequencer at JGI or a light source like APS/ALS). Since these instruments are in high demand, users have to get their data off the instrument to make way for the next user. So the data is typically moved from a staging area to some type of ingest store. And so on for analysis, sharing of results with collaborators, annotation with metadata for future search, backup/sync/archival, …
  • We figured it needs to allow a group of collaborating researchers to do many or all of these things with their data: not just the 2 GB of PowerPoints or the 100 GB of family photos and videos, but the petabytes and exabytes of data that will soon be the norm for many.
  • Started with the seemingly simple/mundane task of transferring files … etc.
  • Research Data Management as a Service

    1. Research data management as a service. Ian
    2. High energy physics • Molecular biology • Cosmology • Genetics • Metagenomics • Linguistics • Economics • Climate change • Visual arts
    3. What would a “dropbox for science” look like?
    4. [Diagram: data moving among Registry, Staging Store, Ingest Store, Analysis Store, Community Store, Archive, and Mirror, interrupted by errors: “Quota exceeded!”, “Expired credentials!”, “Network failed. Retry.”, “Permission denied!”] It should be trivial to Collect, Move, Sync, Share, Analyze, Annotate, Publish, Search, Backup, & Archive BIG DATA … but in reality it’s often very challenging
    5. Collect • Move • Sync • Share • Analyze • Annotate • Publish • Search • Backup • Archive BIG DATA …for
    6. Collect • Move • Sync • Share • Analyze • Annotate • Publish • Search • Backup • Archive. Collect • Move • Sync • Share: capabilities delivered using Software-as-a-Service (SaaS) model
    7.
    8. Data Source, Data Destination. (1) User initiates transfer request. (2) Globus Online moves/syncs files. (3) Globus Online notifies user.
    9. Data Source. (1) User A selects file(s) to share; selects user/group, sets share permissions. (2) Globus Online tracks shared files; no need to move files to cloud storage! (3) User B logs in to Globus Online and accesses shared file.
    10. Early adoption is encouraging
    11. Early adoption is encouraging • 8,000 registered users; >100 daily • ~16 PB moved; ~1B files • 10x (or better) performance vs. scp • 99.9% availability • Entirely hosted on Amazon
    12. Globus Online already does a lot: Globus Toolkit • Sharing Service • Transfer Service • Globus Nexus (Identity, Group, Profile) • Globus Online APIs • Globus Connect
    13. We are also adding capabilities: Globus Toolkit • Sharing Service • Transfer Service • Globus Nexus (Identity, Group, Profile) • Globus Online APIs • Globus Connect
    14. We are also adding capabilities: Globus Toolkit • Sharing Service • Transfer Service • Dataset Services • Globus Nexus (Identity, Group, Profile) • Globus Online APIs • Globus Connect
    15. Expanding Globus Online services • Ingest and publication: imagine a DropBox that not only replicates, but also extracts metadata, catalogs, converts • Cataloging: virtual views of data based on user-defined and/or automatically extracted metadata • Computation: associate computational procedures, orchestrate application, catalog results, record provenance
    16. Builds on catalog as a service. Approach: • Hosted user-defined catalogs • Based on tag model <subject, name, value> • Optional schema constraints • Integrated with other Globus services. Three REST APIs: /query/ (retrieve subjects) • /tags/ (create, delete, retrieve tags) • /tagdef/ (create, delete, retrieve tag definitions). Builds on USC Tagfiler project (C. Kesselman et al.)
    17. mydata42 • owner: Francesco • type: 3dtomo • format: HDF5 • beamline: 2BM. Tomography! Define dataset • Infer type • Extract metadata • Populate catalog(s) • Locate datasets • Access files • Analyze • Catalog derived products • Transfer/schedule. Orchestration • Organization • Record provenance • Annotate, share, browse, search
    18. Our challenge: Sustainability. We are a non-profit service provider to the non-profit research community
    19. Globus Online Provider Plans • Support ongoing operations • Offer value-added capabilities • Engage more closely with users
    20. Provider Plans offer… • Provider endpoints with sharing • Multiple GridFTP servers per endpoint • Branded web sites • Alternate identity provider • Usage reporting • MSS optimizations • Operations monitoring and management • Input into and access to product roadmap. Starting at $20k per year
    21. Thanks to great colleagues and collaborators • Steve Tuecke, Rachana Ananthakrishnan, Kyle Chard, Raj Kettimuthu, Ravi Madduri, Tanu Malik, and many others at Argonne & UChicago • Carl Kesselman, Karl Czajkowski, Rob Schuler, and others at USC/ISI • Birali Runesha and others at UChicago Research Computing Center
    22. Thank you to our sponsors!
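
The three-step transfer flow on slide 8 (request, move/sync, notify), combined with the "Network failed. Retry." failure shown on slide 4, can be sketched as a task that retries transient errors and then reports back. This is only an in-memory illustration under stated assumptions, not the Globus Online API: `TransferTask`, `run_transfer`, and `make_flaky_copy` are hypothetical names, and the "endpoints" are plain dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class TransferTask:
    """Hypothetical record of one transfer request (step 1)."""
    source: dict
    destination: dict
    paths: list
    status: str = "PENDING"
    attempts: int = 0
    log: list = field(default_factory=list)


def run_transfer(task, copy_file, max_attempts=3, notify=print):
    """Copy each requested file (step 2), retrying transient failures,
    then notify the user of the outcome (step 3)."""
    for path in task.paths:
        for attempt in range(1, max_attempts + 1):
            task.attempts += 1
            try:
                copy_file(task.source, task.destination, path)
                break  # this file is done; move on to the next one
            except ConnectionError as exc:
                task.log.append(f"{path}: attempt {attempt} failed ({exc})")
        else:
            # The inner loop never hit `break`: every attempt failed.
            task.status = "FAILED"
            notify(f"Transfer failed on {path}")
            return task
    task.status = "SUCCEEDED"
    notify(f"Transfer complete: {len(task.paths)} file(s)")
    return task


def make_flaky_copy(failures):
    """Return a copy function that raises `failures` times before it
    starts succeeding (a stand-in for an unreliable network)."""
    remaining = {"n": failures}

    def copy_file(source, destination, path):
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise ConnectionError("network failed")
        destination[path] = source[path]

    return copy_file


if __name__ == "__main__":
    src = {"scan1.h5": b"abc", "scan2.h5": b"def"}
    dst = {}
    run_transfer(TransferTask(src, dst, list(src)), make_flaky_copy(failures=2))
```

The point of the sketch is the division of labor: the user only creates the request, while retries and the final notification are the service's job.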
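
The sharing flow on slide 9 rests on one idea: the service records who may access a file, and the file itself stays where it is. A minimal sketch of that idea follows. The `Endpoint` class, its methods, and the permission string "read" are assumptions made for illustration and do not reflect how Globus Online stores or enforces permissions.

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    """Hypothetical storage endpoint; files stay here and are never copied."""
    files: dict
    # Maps path -> {user: set of permissions}, e.g. {"userB": {"read"}}
    shares: dict = field(default_factory=dict)

    def share(self, path, user, permissions):
        """Step 1: the owner grants `user` the given permissions on `path`."""
        if path not in self.files:
            raise FileNotFoundError(path)
        self.shares.setdefault(path, {})[user] = set(permissions)

    def read(self, path, user):
        """Step 3: the recipient reads the file in place, if permitted."""
        granted = self.shares.get(path, {}).get(user, set())
        if "read" not in granted:
            raise PermissionError(f"{user} may not read {path}")
        return self.files[path]


if __name__ == "__main__":
    endpoint = Endpoint(files={"/data/run1.h5": b"raw bytes"})
    endpoint.share("/data/run1.h5", "userB", ["read"])
    print(endpoint.read("/data/run1.h5", "userB"))
```

Step 2 of the slide (tracking shared files) corresponds to the `shares` mapping: only the permission record is created, and no data moves to intermediate cloud storage.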
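
The tag model on slide 16 (`<subject, name, value>` triples with optional schema constraints) can be illustrated with a small in-memory catalog, populated with the `mydata42` example from slide 17. This is a sketch under assumptions, not the Tagfiler or Globus catalog implementation: the `TagCatalog` class and its method names are hypothetical and only loosely mirror the roles of the `/tagdef/`, `/tags/`, and `/query/` APIs, and using a Python type as the schema constraint is a simplification.

```python
class TagCatalog:
    """In-memory sketch of a <subject, name, value> tag catalog."""

    def __init__(self):
        self.tagdefs = {}  # tag name -> required Python type (optional schema)
        self.tags = set()  # set of (subject, name, value) triples

    def define_tag(self, name, value_type):
        """Register an optional constraint on a tag's value type."""
        self.tagdefs[name] = value_type

    def add_tag(self, subject, name, value):
        """Attach a tag; the schema is enforced only if one was defined."""
        expected = self.tagdefs.get(name)
        if expected is not None and not isinstance(value, expected):
            raise TypeError(f"tag {name!r} expects {expected.__name__}")
        self.tags.add((subject, name, value))

    def delete_tag(self, subject, name):
        """Remove every value of tag `name` from `subject`."""
        self.tags = {t for t in self.tags if t[:2] != (subject, name)}

    def get_tags(self, subject):
        """Return the subject's tags as a dict (one value kept per name)."""
        return {name: value for s, name, value in self.tags if s == subject}

    def query(self, **criteria):
        """Return the sorted subjects matching every name=value criterion."""
        subjects = {s for s, _, _ in self.tags}
        return sorted(
            s for s in subjects
            if all((s, n, v) in self.tags for n, v in criteria.items())
        )


if __name__ == "__main__":
    catalog = TagCatalog()
    catalog.define_tag("beamline", str)
    for name, value in [("owner", "Francesco"), ("type", "3dtomo"),
                        ("format", "HDF5"), ("beamline", "2BM")]:
        catalog.add_tag("mydata42", name, value)
    print(catalog.query(type="3dtomo", format="HDF5"))
```

Because tags are free-form triples, new metadata (for example, automatically extracted fields) can be attached without changing a fixed table layout, while tag definitions add constraints only where a community wants them.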