
Sharing Between Data Repositories


Dryad is a generic subject repository that shares author-submitted data with other scientific repositories. In a talk that is part "how we done it" and part "things to consider," I'll discuss 1) why we chose BagIt and OAI-ORE as mechanisms for sharing our data, 2) how we've integrated with TreeBASE (a subject repository of phylogenetic information), and 3) the possibility of other repositories within the larger DataONE community adopting this method of data sharing. There will be cake.


Transcript of "Sharing Between Data Repositories"

  1. Sharing Between Data Repositories. Kevin S. Clarke (ksclarke@nescent.org). Thanks to the Dryad Data Repository contributors and funders: Ryan Scherle, Todd J. Vision, Hilmar Lapp (NESCent); Ben Bosman, Mark Diggory, Kevin Van de Velde (@mire, Inc.)
  2. The Bio-Reposphere: Generic Subject Repository, General Scholarly Repository, Subject-Specific Repository
  3. Generic vs. Specific Repos
     Generic: ✔ Easy submission ✔ Simple metadata ✔ Data is a “black box” ✔ No “orphaned” data
     Specific: ✔ Complex submission ✔ More useful metadata ✔ Well-structured data ✔ Specific type of data
  4. A Dryad Data Package
  5. One Possible Workflow
  6. “Save the Time of the User” #1
  7. “Save the Time of the User” #2
  8. Three Simple Steps
  9. Case 1: TreeBASE Data Import
  10. Harvesting and Web Services: OAI-PMH, PhyloWS
  11. Case 2: Data Uploaded to Dryad
  12. Partner Repository Upload
  13. BagIt Disseminator (implements DSpace PackageDisseminator)
      [Diagram labels: Data from DSpace; Dryad Data Package (Publication); Dryad Data Files; Dryad Metadata; Dryad Application Profile; XSLT Dryad Crosswalk; Bag]
  14. A BagIt Bag: bagit.txt, bag-info.txt, manifest-md5.txt, tagmanifest-md5.txt, and the data directory
  15. Dryad Data in the Bag: dryadpkg.xml, dryadpub.xml, datafile-1 (dryadfile-1.xml, ApineCYTB.nexus), datafile-2 (dryadfile-2.xml, ApineDNA.nexus)
  16. HTTP PUT Handshake [diagram labels: TreeBASE; URL; Email; BagIt Upload]
  17. Lessons Learned
      ✔ Just enough to get the job done and no more
      ✔ Fewer local conventions and more “standards”
      ✔ There will always be custom solutions
      ✔ Options are developing quickly in this space
  18. Future Directions: Less reliance on local conventions
      ✔ Plan to use OAI-ORE and Pairtree(s) within BagIt
      OAI-ORE: Because it's Linked Data
      Pairtree Filesystem
      ✔ So we can dereference URIs in ORE Resource Maps
      Example: http://dx.doi.org/doi:10.5061/dryad.8343 (URI prefix: http://dx.doi.org/doi:10.5061/dryad.; Pairtree path: 83/43; file: 83/43/Arctostaphylos.nex)
  19. Other Interesting Developments
      DataONE
      ✔ Distributing data files and metadata
      ✔ May support packages in the future
      “Dropbox of Bags” or Bag replication network (BagNet?)
      METS in Bags (in contrast to ORE)
  20. The End: The cake was a lie
  21. References
      Dryad Code: http://dryad.googlecode.com
      Dryad Data Repository: http://datadryad.org
      BagIt: http://en.wikipedia.org/wiki/BagIt
      OAI-ORE Primer: http://www.openarchives.org/ore/1.0/primer
      OAI-ORE in BagIt: http://groups.google.com/group/oai-ore/browse_thread/thread/3ebfa7fcb4588048
      ADMIRAL Data Packages (Planning ORE in BagIt): http://imageweb.zoo.ox.ac.uk/wiki/index.php/ADMIRAL_data_packages
      DSpace Packagers: https://wiki.duraspace.org/display/DSPACE/PackagerPlugins
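The "Harvesting and Web Services" slide names OAI-PMH for the TreeBASE import case but gives no details. As an illustration of that protocol only, here is a minimal Python sketch of one harvesting step: building a ListRecords request and reading record identifiers plus the resumptionToken used to page through results. The base URL and function names are hypothetical, and the talk does not say which repository harvests which.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords URL.

    resumptionToken is an exclusive argument, so when it is present the
    request carries only the verb and the token.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return (record identifiers, next resumptionToken or None) for one page."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An empty resumptionToken element marks the last page of the list.
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token
```

A harvester would loop: fetch `list_records_url(base)`, parse, then request again with the returned token until it comes back as `None`.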
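The "A BagIt Bag" and "Dryad Data in the Bag" slides show the bag layout but not how it is produced. In Dryad that is the BagIt Disseminator's job inside DSpace. The following is only a minimal Python sketch that writes a bag with the same tag files, assuming BagIt 1.0 declarations (RFC 8493) and MD5 checksums to match the manifest names on the slide. The function names and file contents are hypothetical; this is not Dryad's disseminator.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def _write(path: Path, text: str) -> None:
    # Write UTF-8 with LF line endings on every platform.
    path.write_bytes(text.encode("utf-8"))


def md5_of(path: Path) -> str:
    """Hex MD5 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(bag_dir: Path, files: dict[str, bytes], info: dict[str, str]) -> None:
    """Write a minimal bag: payload under data/, MD5 manifests, declaration, bag-info."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    for rel_path, content in files.items():
        target = data_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)

    # Bag declaration (BagIt 1.0 assumed) and free-form bag metadata.
    _write(bag_dir / "bagit.txt", "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n")
    _write(bag_dir / "bag-info.txt", "".join(f"{k}: {v}\n" for k, v in info.items()))

    # Payload manifest: "<md5>  data/<path>" for every file under data/.
    payload = sorted(p for p in data_dir.rglob("*") if p.is_file())
    _write(bag_dir / "manifest-md5.txt",
           "".join(f"{md5_of(p)}  {p.relative_to(bag_dir).as_posix()}\n" for p in payload))

    # Tag manifest: checksums of the tag files written above.
    tag_files = ["bagit.txt", "bag-info.txt", "manifest-md5.txt"]
    _write(bag_dir / "tagmanifest-md5.txt",
           "".join(f"{md5_of(bag_dir / name)}  {name}\n" for name in tag_files))


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        bag = Path(tmp) / "dryad-bag"
        # Hypothetical contents; file names follow the "Dryad Data in the Bag" slide.
        make_bag(bag, {"dryadpkg.xml": b"<package/>",
                       "datafile-1/ApineCYTB.nexus": b"#NEXUS\n"},
                 {"Source-Organization": "Dryad"})
        print((bag / "manifest-md5.txt").read_text(encoding="utf-8"))
```

The tag manifest is written last because it checksums the other tag files, including the payload manifest.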
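The "HTTP PUT Handshake" slide lists only TreeBASE, a URL, an email, and a BagIt upload; the exchange itself is not described. This minimal Python sketch covers only the upload leg. It serializes a bag directory to zip and prepares, without sending, an HTTP PUT. The endpoint URL, the zip serialization, and the `application/zip` Content-Type are all assumptions, as are the function names.

```python
import io
import urllib.request
import zipfile
from pathlib import Path


def zip_bag(bag_dir: Path) -> bytes:
    """Serialize a bag directory to an in-memory zip, with the bag folder as the top-level entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in sorted(bag_dir.rglob("*")):
            if p.is_file():
                arcname = (Path(bag_dir.name) / p.relative_to(bag_dir)).as_posix()
                zf.write(p, arcname=arcname)
    return buf.getvalue()


def build_put_request(endpoint: str, payload: bytes) -> urllib.request.Request:
    """Prepare (but do not send) an HTTP PUT carrying the zipped bag.

    Sending it would be urllib.request.urlopen(request); the partner
    endpoint here is hypothetical.
    """
    return urllib.request.Request(
        endpoint,
        data=payload,
        method="PUT",
        headers={"Content-Type": "application/zip"},
    )
```

PUT fits this case because the sender names the target resource, so repeating an identical upload to the same URL is idempotent.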
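The "Future Directions" slide maps http://dx.doi.org/doi:10.5061/dryad.8343 to the Pairtree path 83/43 by stripping the URI prefix and splitting the rest into two-character segments. A minimal Python sketch of that mapping follows. It skips Pairtree's character-cleaning step, so it rejects anything but plain ASCII alphanumeric identifiers; the function name is hypothetical.

```python
from typing import Optional

# URI prefix as given on the "Future Directions" slide.
DRYAD_PREFIX = "http://dx.doi.org/doi:10.5061/dryad."


def pairtree_path(uri: str, prefix: str = DRYAD_PREFIX,
                  filename: Optional[str] = None) -> str:
    """Map a Dryad DOI URI to a Pairtree path of 2-character segments."""
    if not uri.startswith(prefix):
        raise ValueError(f"URI does not start with {prefix!r}")
    ident = uri[len(prefix):]
    # Pairtree's character cleaning is not implemented in this sketch,
    # so only plain ASCII alphanumeric identifiers are accepted.
    if not (ident.isascii() and ident.isalnum()):
        raise ValueError("this sketch only handles ASCII alphanumeric identifiers")
    # Split into pairs; an odd-length identifier ends with a 1-character segment.
    segments = [ident[i:i + 2] for i in range(0, len(ident), 2)]
    if filename:
        segments.append(filename)
    return "/".join(segments)
```

With the slide's example, `pairtree_path("http://dx.doi.org/doi:10.5061/dryad.8343", filename="Arctostaphylos.nex")` gives `83/43/Arctostaphylos.nex`.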
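The plan to put OAI-ORE Resource Maps inside BagIt ("because it's Linked Data") implies one resource map per Dryad package that describes an aggregation of its files. Below is a minimal Python sketch emitting such a map as Turtle with the ORE terms `ore:describes`, `ore:isDescribedBy`, and `ore:aggregates`. Turtle is an arbitrary choice here; ORE also defines Atom and RDF/XML serializations. The resource-map and file URIs in the usage note are hypothetical, and only the package DOI comes from the slides.

```python
from typing import List

ORE = "http://www.openarchives.org/ore/terms/"


def resource_map_turtle(rem_uri: str, aggregation_uri: str, resources: List[str]) -> str:
    """Serialize an ORE Resource Map (as Turtle) for one aggregation and its parts."""
    # The aggregation points back to its map, then lists each aggregated resource.
    agg_props = [f"ore:isDescribedBy <{rem_uri}>"]
    agg_props += [f"ore:aggregates <{r}>" for r in resources]
    lines = [
        f"@prefix ore: <{ORE}> .",
        "",
        f"<{rem_uri}> a ore:ResourceMap ;",
        f"    ore:describes <{aggregation_uri}> .",
        "",
        f"<{aggregation_uri}> a ore:Aggregation ;",
        "    " + " ;\n    ".join(agg_props) + " .",
    ]
    return "\n".join(lines) + "\n"
```

For example, `resource_map_turtle("http://example.org/rem/8343", "http://dx.doi.org/doi:10.5061/dryad.8343", ["http://example.org/files/Arctostaphylos.nex"])` yields a map whose aggregated file URIs a Pairtree layout inside the bag could make dereferenceable.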