Sharing Between Data Repositories


Dryad is a generic subject repository that shares author-submitted data with other scientific repositories. In this part "how we done it," part "things to consider" talk, I'll discuss 1) why we chose BagIt and OAI-ORE as mechanisms for sharing our data, 2) how we've integrated with TreeBASE (a subject repository of phylogenetic information), and 3) the possibility of this method of data sharing being adopted by other repositories within the larger DataONE community. There will be cake.

Published in: Technology

  1. Sharing Between Data Repositories
      Kevin S. Clarke, ksclarke@nescent.org
      Thanks to the Dryad Data Repository contributors and funders:
      Ryan Scherle, Todd J. Vision, Hilmar Lapp (NESCent)
      Ben Bosman, Mark Diggory, Kevin Van de Velde (@mire, Inc.)
      NESCent
  2. The Bio-Reposphere: (Generic Subject Repository), (General Scholarly Repository), (Subject Specific Repository)
  3. Generic vs. Specific Repos
      Generic:
        ✔ Easy submission
        ✔ Simple metadata
        ✔ Data is a “black box”
        ✔ No “orphaned” data
      Specific:
        ✔ Complex submission
        ✔ More useful metadata
        ✔ Well structured data
        ✔ Specific type of data
  4. A Dryad Data Package
  5. One Possible Workflow
  6. “Save the Time of the User” #1
  7. “Save the Time of the User” #2
  8. Three Simple Steps
  9. Case 1: TreeBASE Data Import
  10. Harvesting and Web Services: OAI-PMH, PhyloWS
  11. Case 2: Data Uploaded to Dryad
  12. Partner Repository Upload
  13. BagIt Disseminator (implements DSpace PackageDisseminator)
      Diagram labels: Dryad Application Profile XSLT Dryad Crosswalk Publication DSpace Dryad Data Dryad Metadata Data File Package Dryad Data File Dryad Data File Bag Data from DSpace
  14. A BagIt Bag: bag-info.txt, data, bagit.txt, manifest-md5.txt, tagmanifest-md5.txt
  15. Dryad Data in the Bag: datafile-2, dryadpkg.xml, dryadfile-2.xml, datafile-1, dryadpub.xml, dryadfile-1.xml
  16. HTTP PUT Handshake: TreeBASE URL, Email, BagIt Upload
  17. Lessons Learned
      ✔ Just enough to get the job done and no more
      ✔ Fewer local conventions and more “standards”
      ✔ There will always be custom solutions
      ✔ Options are developing quickly in this space
  18. Future Directions
      Less reliance on local conventions
        ✔ Plan to use OAI-ORE and Pairtree(s) within BagIt
      OAI-ORE: Because it's Linked Data
      Pairtree Filesystem
        ✔ So we can dereference URIs in ORE Resource Maps
        URI prefix: Path: 83/43 83/43/Arctostaphylos.nex
  19. Other Interesting Developments
      DataONE
        ✔ Distributing data files and metadata
        ✔ May support packages in the future
      “Dropbox of Bags” or Bag replication network (BagNet?)
      METS in Bags (in contrast to ORE)
  20. The End. The cake was a lie.
  21. References
      Dryad Code: http://dryad.googlecode.com
      Dryad Data Repository: http://datadryad.org
      BagIt Primer in BagIt Data Packages (Planning ORE in BagIt) Packagers
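
The bag layout listed on slides 14 and 15 can be illustrated with a short sketch. This is not Dryad's BagIt Disseminator (that is a DSpace PackageDisseminator implementation); it is a minimal, standard-library Python example, assuming a declared BagIt version of 0.97 and assuming that the data and metadata files all sit in the payload `data` directory. The function names, the file contents, and the `bag-info.txt` field value are hypothetical.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file's contents."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def write_manifest(bag_dir: Path, manifest_name: str, files: list[Path]) -> None:
    """Write one '<checksum>  <path relative to bag root>' line per file."""
    lines = [
        f"{md5_of(f)}  {f.relative_to(bag_dir).as_posix()}" for f in sorted(files)
    ]
    (bag_dir / manifest_name).write_text("\n".join(lines) + "\n", encoding="utf-8")


def make_bag(bag_dir: Path, payload: dict[str, bytes], info: dict[str, str]) -> Path:
    """Create a bag with the five top-level entries shown on slide 14."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    for name, content in payload.items():
        (data_dir / name).write_bytes(content)

    # Bag declaration; the version number here is an assumption.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    # Free-form "Label: value" metadata about the bag.
    (bag_dir / "bag-info.txt").write_text(
        "".join(f"{key}: {value}\n" for key, value in info.items()),
        encoding="utf-8",
    )

    # Payload manifest covers everything under data/.
    write_manifest(
        bag_dir, "manifest-md5.txt", [p for p in data_dir.rglob("*") if p.is_file()]
    )
    # Tag manifest covers the tag files, including the payload manifest.
    write_manifest(
        bag_dir,
        "tagmanifest-md5.txt",
        [bag_dir / n for n in ("bagit.txt", "bag-info.txt", "manifest-md5.txt")],
    )
    return bag_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = make_bag(
            Path(tmp) / "dryad-bag",
            {
                "datafile-1": b"placeholder data",          # hypothetical content
                "dryadfile-1.xml": b"<placeholder/>",       # hypothetical content
                "dryadpkg.xml": b"<placeholder/>",          # hypothetical content
                "dryadpub.xml": b"<placeholder/>",          # hypothetical content
            },
            {"Source-Organization": "Example Repository"},  # hypothetical value
        )
        for path in sorted(bag.rglob("*")):
            print(path.relative_to(bag).as_posix())
```

A receiving repository can recompute the checksums in `manifest-md5.txt` after transfer to confirm that the payload arrived intact, which is one reason a bag suits a repository-to-repository handoff.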