Migrating from OCLC's Digital Archive to DuraCloud

Presented by Lisa Gregory at Best Practices Exchange, December 2012. This presentation compared OCLC's Digital Archive to DuraCloud, and the process of migrating storage from one to the other.

Transcript of "Migrating from OCLC's Digital Archive to DuraCloud"

  1. State Library of North Carolina
     • Part of the North Carolina Department of Cultural Resources
     • Work closely/pool resources with the State Archives
     • Digital Information Management Program
  2. Overview
     • CONTENT: State Publications, Genealogy Research, North Caroliniana
     • STAFF: ~4.75 FTE
     • SYSTEMS: CONTENTdm, Connexion Digital Import
     • STORAGE: Local server (state-supported), Offsite storage (vendor)
  3. Staff allocation
     • Born-Digital (3.25): CONTENTdm, Connexion
     • Digitized (.75): CONTENTdm, Project Client
     • Local Storage and Remote Storage (.75)
  4. CONTENT
     • We preserve access and master copies
     • 1.27 TB, 162,000+ files
     • Mostly .tif, .pdf, .jpg, .txt
  5. CONTENT
     • File structure by “project”: admindocs, fulltext, images_access, images_master, images_processed, metadata
     • Naming convention:
       pubs_serial_annualreportclean2005.pdf
       gen_statefair_lifecharacterthomasruffin1871_0001.tif
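The two example file names on this slide suggest a pattern that can be checked mechanically. The following is a minimal sketch, assuming the convention is `<area>_<collection>_<title>[_<4-digit sequence>].<ext>`; that pattern, the field names, and the `parse_name` helper are inferred from the two examples only and are not stated in the slides.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern, inferred from the two example names on the slide:
# <area>_<collection>_<title>[_<4-digit sequence>].<ext>
FILENAME_RE = re.compile(
    r"^(?P<area>[a-z]+)_(?P<collection>[a-z0-9]+)_(?P<title>[a-z0-9]+)"
    r"(?:_(?P<sequence>\d{4}))?\.(?P<ext>tif|pdf|jpg|txt)$"
)


class ParsedName(NamedTuple):
    area: str
    collection: str
    title: str
    sequence: Optional[int]  # page or image number, when present
    ext: str


def parse_name(filename: str) -> Optional[ParsedName]:
    """Split a file name into its parts, or return None if it does not fit."""
    match = FILENAME_RE.match(filename)
    if match is None:
        return None
    sequence = match.group("sequence")
    return ParsedName(
        match.group("area"),
        match.group("collection"),
        match.group("title"),
        int(sequence) if sequence is not None else None,
        match.group("ext"),
    )


for name in (
    "pubs_serial_annualreportclean2005.pdf",
    "gen_statefair_lifecharacterthomasruffin1871_0001.tif",
):
    print(name, "->", parse_name(name))
```

A check like this could flag names that drift from the convention before they reach preservation storage.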
  6. STORAGE: Local storage
     • managed by department-wide IT
     • includes working & preservation content
     • server is shared, but our directory is restricted
     • daily incremental backups
  7. OCLC’s Digital Archive
     • Began using in 2008
     • Web interface for access
     • FTP or automatic uploads
     • Integrated with CONTENTdm
     • Detailed reporting, broken out by CONTENTdm collection
     • Fixity checks, virus checks
  8. OCLC’s Digital Archive: + and −
     +
     • Integration with CONTENTdm
     • Fixity checks and virus scans
     • Responsive support
     • Extensive reports
     −
     • Integration with CONTENTdm
     • Finding and retrieving items
     • Manifest/batch upload requirement
     • Vendor-side error reporting
     • Verifying storage contents
  9. DuraSpace’s DuraCloud
     • Began using in 2012
     • Web interface for access
     • Web interface or client-side tools for upload
     • Content Management System-agnostic
     • Fixity checks
  10. DuraCloud: + and −
     +
     • Presentation is like a traditional GUI file manager
     • Can designate spaces, permissions
     • Can make a space public
     • Powerful upload tools
     • Fixity scans
     • Robust reporting
     • Easy to get content out
     • Choice of storage services
     • VERY collaborative support
     • Non-profit
     −
     • Searching
     • Sorting
     • Verifying storage contents
     • Overwriting isn’t hard to do
     • Batch delete
     • MD5
  11. PREPARATION
     [Diagram labels: CONTENTdm, Local server, OCLC DA, “?”]
  12. CONTENTdm and local server
     1. Exported metadata from CONTENTdm
     2. Exported file names from local server
     3. Bashed preservation file names, checksums
     4. Identified and recovered missing files
  13. Preparation steps, rated
     1. Exported metadata from CONTENTdm: Onerous to impossible
     2. Exported file names from local server: Easy
     3. Bashed preservation file names, checksums: Easy but time consuming
     4. Identified and recovered missing files: Easy-ish
  14. Step 1. Exported metadata from CONTENTdm: Onerous to impossible
     • OCLC had to provide export for largest & most critical collection
     • 363 MB tar file -> 18 x 100+ MB csv files
     • Added frustration: metadata for compound objects v. multi-page PDFs
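With eighteen CSV files of 100+ MB each, one way to avoid loading a full metadata dump into memory is to stream it row by row and keep only the column of interest. This is a rough sketch under assumptions: the column name `filename` used in the example is hypothetical, since the slides do not describe the export's layout, and `iter_column` is an illustrative helper.

```python
import csv
from typing import Iterable, Iterator


def iter_column(sources: Iterable[Iterable[str]], column: str) -> Iterator[str]:
    """Yield the non-empty values of one column across several CSV sources.

    Each source is any iterable of text lines, such as a file opened with
    open(path, newline="", encoding="utf-8"). Rows are read one at a time,
    so only a single row is held in memory at once.
    """
    for source in sources:
        for row in csv.DictReader(source):
            value = (row.get(column) or "").strip()
            if value:
                yield value


# Small in-memory example; "filename" is a hypothetical column name.
part_one = ["filename,title", "gen_statefair_a_0001.tif,Page 1"]
part_two = ["filename,title", "gen_statefair_a_0002.tif,Page 2", ",Untitled"]
print(list(iter_column([part_one, part_two], "filename")))
```

For real files, each source would be an open file handle, closed after its rows are consumed.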
  15. Steps 2 and 3
     2. Exported file names from local server: Easy
     3. Bashed preservation file names, checksums: Easy but time consuming
     • Spreadsheet gymnastics
     • Manual review for filename/checksum inconsistencies
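The file name export and the checksum comparison described on this slide could be scripted rather than done in spreadsheets. Here is a minimal sketch, assuming MD5 checksums (MD5 is named elsewhere in the slides) and assuming file names are unique across the directory tree; `build_manifest` and `checksum_mismatches` are illustrative helpers, not tools the presenter describes.

```python
import hashlib
from pathlib import Path
from typing import Dict, Tuple


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file name under root to its MD5.

    Assumption: names are unique across subdirectories; a duplicate name
    would silently replace an earlier entry.
    """
    return {p.name: md5_of(p) for p in sorted(root.rglob("*")) if p.is_file()}


def checksum_mismatches(
    local: Dict[str, str], other: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Return files present in both manifests whose checksums differ."""
    return {
        name: (local[name], other[name])
        for name in sorted(local.keys() & other.keys())
        if local[name].lower() != other[name].lower()
    }
```

The second manifest would come from whatever the other system can export, for example a name-to-checksum listing; its format is not specified in the slides.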
  16. Step 4. Identified and recovered missing files: Easy-ish
     • Missing from CONTENTdm? Added by librarians
     • Missing from local server? Request to OCLC or re-download from CONTENTdm
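Once both systems have a list of file names, finding what is missing on each side is a set difference. The sketch below is illustrative only: `triage_missing` is a hypothetical helper, and the recovery actions are just the two named on the slide.

```python
from typing import Dict, Iterable, List

# Recovery actions as described on the slide.
ACTIONS = {
    "missing_from_contentdm": "Added by librarians",
    "missing_from_local_server": "Request to OCLC or re-download from CONTENTdm",
}


def triage_missing(
    contentdm_names: Iterable[str], local_names: Iterable[str]
) -> Dict[str, List[str]]:
    """Group file names by the system they are missing from."""
    contentdm = set(contentdm_names)
    local = set(local_names)
    return {
        "missing_from_contentdm": sorted(local - contentdm),
        "missing_from_local_server": sorted(contentdm - local),
    }


report = triage_missing(
    contentdm_names=["a_0001.tif", "a_0002.tif"],
    local_names=["a_0002.tif", "a_0003.tif"],
)
for group, names in report.items():
    print(f"{group} ({ACTIONS[group]}): {names}")
```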
  17. THE MOVE (Local server to DuraCloud)
     1. Tested sync and upload tools
     2. Discussed spaces
     3. Ran sync tool on local preservation storage
     4. Ongoing maintenance: upload tool
  18. Step 1. Tested sync and upload tools: Easy
     • Helped determine flags to manage computer resources during sync
     • Verified logging output, permissions
     • Helped flesh out local workflow
  19. Step 2. Discussed spaces: Easy, and interesting
     • Many spaces or few, to accommodate different workflows?
     • Assignment of permissions
  20. Step 3. Ran sync tool on local preservation storage: Easy
     • Ran continuously for 5 2/3 days
     • 94,177 items
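The two figures on this slide give a rough throughput for the initial sync, which may help when estimating a similar migration. The calculation below uses only the slide's numbers; it averages over the whole run and says nothing about file sizes or network conditions, which the slides do not report.

```python
from fractions import Fraction

ITEMS = 94_177
DAYS = Fraction(17, 3)  # 5 2/3 days, as stated on the slide

hours = DAYS * 24          # total hours the sync ran
seconds = hours * 3600     # total seconds the sync ran
items_per_hour = ITEMS / hours
seconds_per_item = seconds / ITEMS

print(f"{float(hours):.0f} hours total")
print(f"{float(items_per_hour):.1f} items per hour")
print(f"{float(seconds_per_item):.2f} seconds per item")
```

That works out to about 690 items per hour, or a little over five seconds per item on average.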
  21. Step 4. Ongoing maintenance: upload tool: Easy-ish
     • Uploads done weekly and monthly
     • Upload tool used to avoid accidental overwriting
     • Have to create “mock” file structure
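One reading of the “mock” file structure is a staging directory that mirrors the preservation layout, so that uploaded items end up with the intended path-style names. The sketch below follows that reading and adds a pre-upload collision check. It does not call any DuraCloud tool; the helper names, the project name `gen_statefair`, and the idea of holding a list of already-stored IDs are all assumptions made for illustration.

```python
import shutil
from pathlib import Path, PurePosixPath
from typing import Iterable, List, Tuple


def content_id(project: str, subfolder: str, filename: str) -> str:
    """Build a path-style ID such as project/images_master/file.tif."""
    return str(PurePosixPath(project) / subfolder / filename)


def split_collisions(
    planned_ids: Iterable[str], existing_ids: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Separate planned IDs into (new, would_overwrite)."""
    existing = set(existing_ids)
    new: List[str] = []
    would_overwrite: List[str] = []
    for cid in planned_ids:
        (would_overwrite if cid in existing else new).append(cid)
    return new, would_overwrite


def stage_file(source: Path, staging_root: Path, cid: str) -> Path:
    """Copy source into the mock tree under staging_root, mirroring cid."""
    target = staging_root.joinpath(*PurePosixPath(cid).parts)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)
    return target


planned = [content_id("gen_statefair", "images_master", "x_0001.tif")]
new, would_overwrite = split_collisions(planned, existing_ids=[])
print("safe to stage:", new, "| needs review:", would_overwrite)
```

Only the IDs in the first list would be staged; anything in the second list would be reviewed by hand before upload.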
  22. Workflow diagram
     [Diagram labels: Working directory (three times), Staging – Limited Access, Local server, Staging, DuraCloud]
  23. Insights
     • Room for preservation metadata improvement
     • Working with full metadata dumps is problematic
     • Need for more automated monitoring for local storage
     • Integration with CMS not helpful unless FULL integration
     In other words:
     • Streamlined ingest = streamlined preservation
  24. Still more thoughts
     • No, really: manual management and auditing is getting less feasible
     • What is acceptable content loss?
     • What is acceptable preservation metadata error rate?
     • Responsiveness to enhancement requests should be figured into vendor choice
     • At 5 years out, PREMIS lite is just fine