Integrating DSpace with DuraCloud 11-30-11

Interested in learning more about how to back up and preserve your DSpace content with DuraCloud? Curious about how the two systems integrate? This presentation details the DSpace integration with DuraCloud. Presented by Tim Donohue, DSpace Technical Lead, DuraSpace.

This webinar will discuss the DSpace Curation System and how you can seamlessly back up your content to DuraCloud through the DSpace user interface (using a DSpace add-on called the "Replication Task Suite"). The presentation will also explore the curation tasks available, as well as how archival information packages (AIPs) are created and used in the system.

Published in: Technology

  • Basic diagram of full backup of DSpace via the existing 1.7.x AIP Backup & Restore
  • Basic diagram of full restore of DSpace via the existing 1.7.x AIP Backup & Restore
  • Basic diagram of single collection restore via the existing 1.7.x AIP Backup & Restore
  • Contents of an AIP
  • METS is the central file and links everything in the AIP together. Also has metadata which links this AIP to related objects (e.g. Collection AIP links to handles of all Items that exist in that Collection, etc.)
  • Some limitations currently exist in what an AIP can and cannot restore. It doesn’t back up *everything*, but rather concentrates on backing up all “in-archive” content.
  • The 1.7.x AIP Backup & Restore can also be used to migrate entire Communities or Collections (or individual Items) from one DSpace 1.7.x to another DSpace 1.7.x. All you do is back up to AIPs from one, and ingest those same AIPs into the other.
  • This is the workflow for backing up content from DSpace 1.7.x to DuraCloud. It’s a two-step process: first you export to your local filesystem, then you sync that up to DuraCloud (using DuraCloud’s Sync Tool).
  • This is the workflow for restoring content to DSpace 1.7.x from DuraCloud. Again, it’s currently a two-step process: first you download AIPs from DuraCloud using the DuraCloud Retrieval Tool, then you restore those to DSpace using its ‘packager’ tool.
  • In DSpace 1.8, the backup process becomes easier via the new Replication Task Suite (a new set of Curation Tools). Now, backup is a one-step process, and can be kicked off via the Admin UI or the command line (your choice). The Replication Suite comes with a DuraCloud plugin which syncs automatically to DuraCloud. (Side note: there’s also a plain filesystem plugin, so you can back up to a filesystem folder instead. DuraCloud is not required.)
  • In DSpace 1.8, the restore process is likewise just one step, and again it can be done via the Admin UI or the command line. There are also some new AIP auditing tools: Verify AIP (check whether the AIP is already backed up to a remote location, either DuraCloud or a filesystem); Audit AIP (do a checksum comparison of the remote AIP and a newly generated AIP to see if any changes have occurred); Fetch AIP (download the remote AIP to your local DSpace install directory, the ‘[dspace]/replicate’ folder).
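
The notes describe METS as the central file that links everything in an AIP together. A minimal Python sketch of that idea follows: it reads a METS document and lists the files it points to. The sample XML is a hand-written, heavily simplified fragment, not a real DSpace AIP manifest; the file names, IDs, and the `list_file_references` helper are assumptions for illustration. Only the METS and XLink namespace URIs are standard.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical, simplified METS fragment for illustration only.
# A real DSpace AIP manifest is far larger; IDs and file names are made up.
SAMPLE_METS = """<mets xmlns="http://www.loc.gov/METS/"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <fileSec>
    <fileGrp USE="ORIGINAL">
      <file ID="file_1" MIMETYPE="application/pdf">
        <FLocat LOCTYPE="URL" xlink:href="article.pdf"/>
      </file>
    </fileGrp>
    <fileGrp USE="LICENSE">
      <file ID="file_2" MIMETYPE="text/plain">
        <FLocat LOCTYPE="URL" xlink:href="license.txt"/>
      </file>
    </fileGrp>
  </fileSec>
</mets>
"""


def list_file_references(mets_xml):
    """Return (fileGrp USE, href) pairs for every file the METS points at."""
    root = ET.fromstring(mets_xml)
    refs = []
    for group in root.iter(f"{{{METS_NS}}}fileGrp"):
        use = group.get("USE", "")
        for flocat in group.iter(f"{{{METS_NS}}}FLocat"):
            # The xlink:href attribute names the packaged file.
            refs.append((use, flocat.get(f"{{{XLINK_NS}}}href")))
    return refs


if __name__ == "__main__":
    for use, href in list_file_references(SAMPLE_METS):
        print(use, href)
```

A real AIP manifest also carries the DIM, MODS, PREMIS, and METSRights sections named on the slides, which this sketch ignores.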

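For the two-step DSpace 1.7.x route described in the notes, the commands can be assembled roughly as in the Python sketch below. The slides show only `./dspace packager -d`, `./dspace packager -r`, `java -jar synctool.jar`, and `java -jar retrievaltool.jar`. The remaining packager options (`-a`, `-t AIP`, `-e`, `-i`) are assumptions based on the DSpace AIP Backup and Restore documentation and should be checked against your version. The e-mail address, handle, and file names are placeholders, and the Sync Tool and Retrieval Tool options are deliberately left out because the slides do not give them.

```python
import shlex

# Shown on the slides without options; real runs need connection settings
# that are not given in the presentation, so none are invented here.
SYNC_TOOL = ["java", "-jar", "synctool.jar"]
RETRIEVAL_TOOL = ["java", "-jar", "retrievaltool.jar"]


def packager_export(dspace, eperson, handle, aip_file):
    """Export an object as an AIP (options beyond -d are assumptions)."""
    return [dspace, "packager", "-d", "-a", "-t", "AIP",
            "-e", eperson, "-i", handle, aip_file]


def packager_restore(dspace, eperson, aip_file):
    """Restore from an AIP (options beyond -r are assumptions)."""
    return [dspace, "packager", "-r", "-a", "-t", "AIP",
            "-e", eperson, aip_file]


def backup_steps(dspace, eperson, handle, aip_file):
    """Step 1: export AIPs locally. Step 2: sync them up to DuraCloud."""
    return [packager_export(dspace, eperson, handle, aip_file), list(SYNC_TOOL)]


def restore_steps(dspace, eperson, aip_file):
    """Step 1: download AIPs from DuraCloud. Step 2: restore into DSpace."""
    return [list(RETRIEVAL_TOOL), packager_restore(dspace, eperson, aip_file)]


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    for step in backup_steps("./dspace", "admin@example.edu",
                             "123456789/1", "aips/collection.zip"):
        print(shlex.join(step))
```

Building the commands as argument lists keeps the two steps explicit and makes it easy to pass them to `subprocess.run` once the real options for your installation are filled in.
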
    1. Integrating DSpace with DuraCloud. DuraSpace Webinar, 30 Nov 2011. Tim Donohue, DuraSpace
    2. Basis for DSpace Integration
       • DSpace AIP Backup & Restore (1.7+)
           • (Initial DuraCloud use case: Backup & Restore)
       • DSpace Curation Task System (1.7+)
       • DSpace Replication Task Suite (1.8 add-on)
    3. Intro to Archival Info Pkgs (1.7+)
       • Primary Use Cases
           • Backup & Restore of DSpace Content
               • All content or just partial (Community/Collection/Item)
           • Migration/Export of DSpace Content
               • All content or just partial (Community/Collection/Item)
           • DuraCloud Integration
               • Offsite backup & restore of content
    4. How to Backup DSpace (pre-1.7) [Diagram: Full Database Backup; Folder Backup; Database; Assetstore Folder]
    5. How to Restore All (pre-1.7) [Diagram: Full Database Backup; Folder Backup; Database; Assetstore Folder]
    6. How to Restore a Collection (pre-1.7) [Diagram: Full Database Backup; Folder Backup; Database; Assetstore Folder; Temporary Database; Temporary Folder?]
    7. How to Restore a Collection (pre-1.7) [Diagram: Full Database Backup; Folder Backup; Database; Assetstore Folder; Temporary Database; Temporary Folder?]
    8. Backup via Archival Info Pkgs [Diagram: Package for each Community, Collection & Item; AIP backup]
    9. Restore All via Archival Info Pkgs [Diagram: AIP backup; Package for each Community, Collection & Item]
    10. Restore a Collection via AIPs [Diagram: AIP backup; Collection AIP; Items in Collection; steps 1 and 2]
    11. What’s in an AIP? [Diagram: Archival Information Package (AIP) containing METS (DIM / MODS / PREMIS / METSRights); License; Content Files or Logos; Other Files in Bundles (optional). *Also a BagIt version (Replication Suite add-on)]
    12. What’s in an AIP? [Diagram: METS (DIM / MODS / PREMIS / METSRights), annotated with Descriptive Metadata: DIM & MODS; Tech/Preservation Metadata: PREMIS; Rights Metadata: METSRights; Related Object AIPs. Also License; Content Files or Logos; Other Files in Bundles (optional)]
    13. The “Site” AIP [Diagram: Special AIP for site-wide info/metadata (e.g. Group Memberships, EPeople); METS (DIM / MODS / PREMIS / METSRights); Top-Level Community AIPs]
    14. What can AIPs restore? (Per the slide notes, some limitations exist: AIPs do not back up everything and concentrate on “in-archive” content. Any per-item markers on the slide are not preserved in this transcript.)
       • Restore All In-Archive Content (Files + Metadata)
       • Restore All People & Groups
       • Restore All Permissions / Access Rights
       • Restore Community / Collection Logos, Metadata, Rights & Item Templates
       • Restore Community / Collection / Item Hierarchy
       • Restore In-Process / Incomplete Items
       • Restore Collection OAI-PMH/ORE Harvest Settings
       • Restore all configuration files (dspace.cfg, etc.)
    15. AIP Use Case: Migrate Content [Diagram: One DSpace Install; Collection AIP; Items in Collection; Another DSpace Install; steps 1 to 4]
    16. “Manual” AIP backup to DuraCloud [Diagram: step 1, ./dspace packager -d writes a package for each Community, Collection & Item to a local “watch” folder; step 2, the DuraCloud Sync Tool (java -jar synctool.jar) uploads it.] This two-step route is required for DSpace 1.7.x
    17. “Manual” AIP restore from DuraCloud [Diagram: step 1, the DuraCloud Retrieval Tool (java -jar retrievaltool.jar) downloads a package for each Community, Collection & Item to a local folder; step 2, ./dspace packager -r restores it.] This two-step route is required for DSpace 1.7.x
    18. New: DSpace Replication Suite add-on
    19. Intro to DSpace Curation System
       • Enables a basic ‘microservices’ approach to curating DSpace objects
       • Anyone can build a task & share it
       • Currently tasks must be written in Java
           • Experimental support for non-Java tasks in 1.8
       • “Frees” admin tasks from Command Line
           • Can now run from Admin UI or CLI
           • Can also “queue” tasks for later processing
    20. DSpace 1.8.0 WARNING
       • In DSpace 1.8.0, ANY curation task run across your entire DSpace site will report a “NullPointerException” error
       • THIS DOES NOT MEAN YOUR TASK FAILED!
       • Check the DSpace logs to see if the task did succeed
       • This bug will be fixed in 1.8.1
           • https://jira.duraspace.org/browse/DS-1077
    21. DSpace Replication Suite Add-on
       • A set of curation tasks geared towards ‘replicating’ (backup/restore/audit) content
       • Compatible with DSpace 1.8.1 or above
           • Not recommended on 1.8.0 (see previous slide)
       • “Wraps” DSpace AIP Backup & Restore tool
       • Provides configurable AIP storage plugins for filesystem or DuraCloud
       • Provides optional BagIt packaged AIPs
    22. The Suite of Tasks
       • Transmit AIP(s)
       • Verify AIP(s) exist
       • Fetch AIP(s)
       • Audit against AIP(s)
       • Remove AIP(s)
       • Restore Missing Object(s) from AIP(s)
       • Replace Existing Object(s) from AIP(s)
       • Read Odometer (I/O upload/download stats)
       • Estimate storage size of AIP(s) (rough estimate)
       All tasks can be configured to store AIPs in an existing DuraCloud account. Local/Mounted filesystem storage is also supported.
    23. Backup AIP to DuraCloud [Diagram: Package for each Community, Collection & Item; Local Temp Folder (Temporary Cache); OR Command line; Curation Tools; step 1]
       • Replication Task Suite:
       • One-step process: Generate and Upload AIP
       • Via UI or CLI
    24. Restore AIP from DuraCloud [Diagram: Package for each Community, Collection & Item; Local Temp Folder (Temporary Cache); OR Command line; Curation Tools; step 1]
       • Replication Task Suite:
       • One-step process: Retrieve and Restore AIP
       • Via UI or CLI
       • Also ‘auditing’ tools
    25. DSpace Replication Suite Demo
    26. Known Limitations
       • Cannot yet take advantage of DuraCloud streaming capabilities (AIPs are zip files)
       • Cannot yet take advantage of DuraCloud transformation services (AIPs are zip files)
    27. “Early Access” Release
       • Early December (likely by Mon, Dec 5)
       • This release is for early adopters to try out the add-on & provide feedback
       • Download from Replication Task Suite page: https://wiki.duraspace.org/display/DSPACE/ReplicationTaskSuite
       • Install & Configuration instructions also on above page
    28. In Large Thanks to…
       • MIT: Richard Rodgers & Wendy Bossons
           • Developed Curation Task Framework
           • Developed initial Replication Suite tasks (BagIt versions)
       • @mire: Mark Diggory
           • Gave feedback throughout early development
    29. For More Information
       • Replication Task Suite:
           • https://wiki.duraspace.org/display/DSPACE/ReplicationTaskSuite
       • AIP Backup & Restore:
           • https://wiki.duraspace.org/display/DSDOC18/AIP+Backup+and+Restore
       • Curation Task System:
           • https://wiki.duraspace.org/display/DSDOC18/Curation+System
    30. Photo/Icon Acknowledgments
       • Package: http://www.flickr.com/photos/halfbisqued/2353845688/
       • Harddrive & Terminal icons: http://tango.freedesktop.org/Tango_Desktop_Project
       • Folder icon: http://www.openclipart.org/detail/13740
       • Database icon: http://www.openclipart.org/detail/68413
       • Zip Pkg icon: http://veryicon.com/icons/system/capital-icon-suite-mac/zip-10.html
       • File icons: http://veryicon.com/icons/system/rhor-v2-part-3/
       • Checkmark & Delete icons: http://veryicon.com/icons/system/on-stage/
       • Tools Icon: http://veryicon.com/icons/system/azullustre/
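
The “Audit against AIP(s)” task is described in the notes as a checksum comparison between the remote AIP and a newly generated one. The Python sketch below illustrates that general idea only. How the Replication Task Suite actually computes and compares checksums is not stated in the slides, so the SHA-256 default, the whole-file comparison, and the function names `file_checksum` and `aips_match` are all assumptions.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large packages need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def aips_match(remote_copy, fresh_copy, algorithm="sha256"):
    """True when both package files have the same checksum (assumed audit rule)."""
    return file_checksum(remote_copy, algorithm) == file_checksum(fresh_copy, algorithm)


if __name__ == "__main__":
    # Stand-in files; a real audit would compare a fetched AIP with a new export.
    with tempfile.TemporaryDirectory() as tmp:
        remote = Path(tmp) / "remote.zip"
        fresh = Path(tmp) / "fresh.zip"
        remote.write_bytes(b"example package bytes")
        fresh.write_bytes(b"example package bytes")
        print("unchanged" if aips_match(remote, fresh) else "changed")
```

A mismatch here only says the two files differ byte for byte; deciding what changed would still require inspecting the package contents.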
