Taming the Monster: Digital Preservation Planning and Implementation Tools

Given at Council of UW Libraries "One System, One Library" conference, June 2011.

Transcript

  • 1. Taming the Monster: Digital Preservation Planning and Implementation Tools. Dorothea Salo. One System, One Library, 2 June 2011. Photo: “Happy Easter, to my Peeps” http://www.flickr.com/photos/76074333@N00/449028423/ WorldIslandInfo.com / CC-BY 2.0
  • 2. Why is this so scary? Photo: “Happy Easter, to my Peeps” http://www.flickr.com/photos/76074333@N00/449028423/ WorldIslandInfo.com / CC-BY 2.0
  • 3. Isn’t this just as scary? Photo: “News Paper Origami Dragon Monster” http://www.flickr.com/photos/epsos/3777343342/ epSos.de / CC-BY 2.0
  • 4. Yet we persevere. Photo: “News Paper Origami Dragon Monster” http://www.flickr.com/photos/epsos/3777343342/ epSos.de / CC-BY 2.0
  • 5. DIGITAL IS NO DIFFERENT. Photo: “559 - The Matrix - Seamless Texture” http://www.flickr.com/photos/zooboing/4335531915/ Patrick Hoesly / CC-BY 2.0
  • 6. Many of the same ideas apply... • Planning and policy • Risk assessment • Risk management • (knowing that we can’t save everything) • Materials quality matters! • Problem discovery and remediation • Crisis management • Chief problems: staff, $$$, organizational commitment Photo: “Where I Teach” http://www.flickr.com/photos/eklektikos/2541408630/ Todd Ehlers / CC-BY 2.0
  • 7. Planning and assessment tools Photo: “Happy Easter, to my Peeps” http://www.flickr.com/photos/76074333@N00/449028423/ WorldIslandInfo.com / CC-BY 2.0
  • 8. Scene-setting • Rosenthal, David. “Requirements for Digital Preservation: a Bottom-Up Approach.” • http://www.dlib.org/dlib/november05/rosenthal/11rosenthal.html • If you’re new to this, or trying to find your feet, this is the best short introduction I know. • The list of threats is outstanding. Photo: “Bottoms Up! - Duck; San Anton Gardens, Malta” http://www.flickr.com/photos/foxypar4/3123113762/ John Haslam / CC-BY 2.0
  • 9. TRAC • “Trustworthy Repositories Audit & Certification: Criteria and Checklist” • Despite the name, covers a LOT more than the technology! • Budget • Staffing • “designated communities” • CRL will audit you, if you like • (don’t, unless you’re really serious!) • http://catalog.crl.edu/record=b2212602~S1
  • 10. DRAMBORA • Digital Repository Audit Method Based on Risk Assessment • A “self-test,” if you will. • DRAMBORA is equally good as a pre- or post-test. • Personally, I prefer DRAMBORA to TRAC, especially for those just starting out. • http://www.repositoryaudit.eu/ • (registration required for toolkit access)
  • 11. Coping with file formats Photo: “Happy Easter, to my Peeps” http://www.flickr.com/photos/76074333@N00/449028423/ WorldIslandInfo.com / CC-BY 2.0
  • 12. The one acronym you need to know: FITS • “File Information Tool Set” • (you need to know this; otherwise it’s hard to Google) • Wrapper for several file-format detector software packages • Intended to be baked into other software • It’s early days yet! • (This means you can’t always trust what the tools tell you, especially when they’re telling you about errors.)
  • 13. What’s this file? • wotsit.org “The Programmer’s File and Data Resource” • Directory of file extensions • When in doubt: open in a browser or text editor and see what you get. • N.b.: Microsoft Word is NOT a text editor!
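
The "open it and see what you get" advice can be partly automated by peeking at a file's first bytes. The following is a minimal sketch, not a substitute for FITS or a real format registry: the signature table covers only a handful of well-known formats, and the names `sniff_bytes` and `sniff_format` are made up for this illustration.

```python
import codecs

# A few well-known file signatures ("magic numbers").
# Illustrative only; real format registries hold far more entries.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"PK\x03\x04", "ZIP container (also .docx, .xlsx, .epub)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 container (older .doc, .xls, .ppt)"),
]


def sniff_bytes(head):
    """Guess a format from the leading bytes of a file (hypothetical helper)."""
    if not head:
        return "empty file"
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    # NUL bytes almost never appear in UTF-8 text files.
    if b"\x00" in head:
        return "unknown binary"
    try:
        # final=False tolerates a multi-byte character cut off by the peek limit.
        codecs.getincrementaldecoder("utf-8")().decode(head, final=False)
    except UnicodeDecodeError:
        return "unknown binary"
    return "plain text (UTF-8)"


def sniff_format(path, peek=512):
    """Read only the first `peek` bytes of `path` and guess its format."""
    with open(path, "rb") as handle:
        return sniff_bytes(handle.read(peek))
```

A guess like this says what a file claims to be, not whether it is valid; a file that starts with `%PDF-` can still be truncated or corrupt.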
  • 14. Solving the geographic distribution problem Photo: “Happy Easter, to my Peeps” http://www.flickr.com/photos/76074333@N00/449028423/ WorldIslandInfo.com / CC-BY 2.0
  • 15. What problem, now? • The “all your eggs in one basket” problem. • If all your bits are on one server, and the server room is flooded, or your town is nuked... oops. • Not the same as backups! • Don’t get me wrong, backups are important! • Backups are SHORT-TERM, and usually LOCAL. Geographic distribution (plus associated auditing) is intended for the long term. • Don’t forget auditing! Photo: “Nido” http://www.flickr.com/photos/italintheheart/3679974298/ Jorge Elías / CC-BY 2.0
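
"Auditing" here usually means fixity checking: record a checksum for every file once, then recompute and compare on a schedule. A minimal sketch of that idea follows, assuming SHA-256 and a manifest kept as a plain dictionary; the function names are invented for this example and do not come from any particular preservation tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root, manifest):
    """Compare the files under root against a previously recorded manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "changed": sorted(
            name for name in manifest
            if name in current and current[name] != manifest[name]
        ),
        "unexpected": sorted(set(current) - set(manifest)),
    }
```

The manifest only helps if it is stored somewhere other than alongside the files it describes; otherwise the same flood takes out both.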
  • 16. LOCKSS • Lots of Copies Keeps Stuff Safe! • (There is also Portico, but Portico only works with e-journal content.) • Open-source software that handles replication and (some) auditing. • “Private LOCKSS network” • A group of institutions agrees to build a LOCKSS network just for the stuff they’re interested in. • ASERL does this for ETDs. Many institutions (including UW-Madison) participate in a PLN for govdocs.
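
The reason lots of copies keep stuff safe is that copies can be compared against each other: if most sites agree on a file's checksum and one does not, the odd one out is probably damaged and can be repaired from a good copy. The sketch below illustrates only that general idea with a simple majority vote. It is not the actual LOCKSS polling protocol, which is considerably more involved, and the site names in the usage note are hypothetical.

```python
from collections import Counter


def compare_replicas(digests_by_site):
    """Return (consensus_digest, suspect_sites) for one file.

    `digests_by_site` maps a site name to the checksum that site reports.
    If no digest is held by a strict majority, the consensus is None and
    every site is treated as suspect.
    """
    if not digests_by_site:
        raise ValueError("need at least one replica")
    counts = Counter(digests_by_site.values())
    digest, votes = counts.most_common(1)[0]
    if votes * 2 <= len(digests_by_site):
        return None, sorted(digests_by_site)
    suspects = sorted(
        site for site, reported in digests_by_site.items() if reported != digest
    )
    return digest, suspects
```

For example, `compare_replicas({"site_a": "aa", "site_b": "aa", "site_c": "bb"})` reports `"aa"` as the consensus and flags `site_c` for repair.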
  • 17. “The cloud” • Typical cloud-based storage services make NO promises they won’t lose your stuff. • And for large quantities of data, bandwidth can become an issue. • And can they look at your stuff? Should they be able to? • Some early movers in this market are fading • Iron Mountain had to kill their service. • DuraCloud • trying to finesse this issue by negotiating tougher SLAs with cloud-storage providers Photo: “Sky View From Humboldt Park” http://www.flickr.com/photos/purpleslog/2589612577/ Purple Slog / CC-BY 2.0
  • 18. Repository and digital-library platforms Photo: “Happy Easter, to my Peeps” http://www.flickr.com/photos/76074333@N00/449028423/ WorldIslandInfo.com / CC-BY 2.0
  • 19. Friendly word of advice: PICK SOFTWARE LAST. Photo: “Briana Calderon; future educator of america.” http://www.flickr.com/photos/46132085@N03/4703617843/ Arielle Calderon / CC-BY 2.0
  • 20. Another friendly word of advice: DON’T CHASE THE SHINY. Photo: “Sparkle Texture” http://www.flickr.com/photos/abbylanes/3214921616/ Abby Lane / CC-BY 2.0
  • 21. Digital-library software • Is almost always VERY BAD at digital preservation! • (most packages don’t even try!) • So if a file gets corrupted on the server, or whatever... no warnings, no restore, nothing. Also, provenance? Who needs provenance? Event tracking? What’s that? • I’m not saying don’t use it. I’m saying that it doesn’t solve this problem. • In fact, if you’re using this software, you need to solve this problem FOR IT. Photo: “National DIGITAL Library” http://www.flickr.com/photos/schex/193912573/ Jesse Schexnayder / CC-BY 2.0
  • 22. Examples • ContentDM: http://contentdm.com/ • Omeka: http://omeka.org/ • Greenstone: http://greenstone.org/
  • 23. Institutional-repository software • Is SHOCKINGLY bad at digital preservation! • (Though sometimes better than most DL software.) • Examples • Hosted/commercial: Digital Commons (BePress), ContentDM, DigiTool • If you go hosted, you’d better ask about their digital-preservation practices! • Open-source: EPrints, DSpace, Fedora Photo: “IMG_0668” http://www.flickr.com/photos/12967790@N00/66531124 Robert / CC-BY 2.0
  • 24. A new approach: curation microservices Photo: “Happy Easter, to my Peeps” http://www.flickr.com/photos/76074333@N00/449028423/ WorldIslandInfo.com / CC-BY 2.0
  • 25. Do we really need THE BLOB? Photo: “giant crystal blob” http://www.flickr.com/photos/a_of_doom/527905701/ A of DooM / CC-BY 2.0
  • 26. How about a jigsaw puzzle instead? • Break the digital-preservation problem down into parts. • Code up each part, making sure that it plays nicely with other parts. • lots of nice APIs! • which means other software can adopt/adapt microservices as well! • Put parts together as you need them. Photo: “Lapsana Apogonoides Puzzle” http://www.flickr.com/photos/gdesigneralex/2313092112/ gdesigneralex / CC-BY 2.0
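
The jigsaw idea can be sketched in a few lines: each part does one job, takes a record in, hands a record back, and knows nothing about the other parts. In the toy version below, plain in-process functions stand in for what would typically be standalone services with their own APIs; every name here (`identify`, `fixity`, `log_event`, `pipeline`) is hypothetical and chosen only for illustration.

```python
import hashlib
from functools import reduce


def identify(record):
    """Toy format identification: recognize PDF by signature, else unknown."""
    fmt = "PDF" if record["content"].startswith(b"%PDF-") else "unknown"
    return {**record, "format": fmt}


def fixity(record):
    """Attach a SHA-256 checksum of the content."""
    return {**record, "sha256": hashlib.sha256(record["content"]).hexdigest()}


def log_event(record):
    """Append a simple provenance event without mutating the input record."""
    return {**record, "events": record.get("events", []) + ["ingested"]}


def pipeline(*services):
    """Put parts together as needed: run each service on the previous result."""
    def run(record):
        return reduce(lambda rec, service: service(rec), services, record)
    return run


# One institution might want all three steps; another might only want fixity.
full_ingest = pipeline(identify, fixity, log_event)
checksum_only = pipeline(fixity)
```

Because every part shares the same record-in, record-out shape, swapping in a better format identifier later does not require touching the checksum or logging steps.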
  • 27. California Digital Library • Pioneering this approach • Has open-sourced code for microservices • Has added microservices together to build its “Merritt” storage/repository service
  • 28. Escaping the silos: Fedora Commons Photo: “Happy Easter, to my Peeps” http://www.flickr.com/photos/76074333@N00/449028423/ WorldIslandInfo.com / CC-BY 2.0
  • 29. What is Fedora Commons? • Blueprints and foundation, not the whole house (analogy credit to Peter Gorman) • You build the house you want! • Or you build condominiums on the same foundation. • Need different user interfaces for different materials? • Need different structures and behaviors? • No problem! Fedora can handle that. • (have I run this analogy into the ground yet?)
  • 30. We had this... Diagram courtesy of Peter Gorman.
  • 31. We are building this. Diagram courtesy of Peter Gorman.
  • 32. E-records management Photo: “Happy Easter, to my Peeps” http://www.flickr.com/photos/76074333@N00/449028423/ WorldIslandInfo.com / CC-BY 2.0
  • 33. Axioms • Records management is about policy and procedures. • If your policy doesn’t fit with their procedures, guess what wins? Choose battles wisely. • There is never enough storage space. • Nobody cares until there’s a crisis. • Software will not save you... but it might help! Photo: “The Never Ending Math Problem” http://www.flickr.com/photos/acidwashphotography/2967752733/ d3 Dan / CC-BY 2.0
  • 34. Duke Data Accessioner • Accessioning tool for digital data • use case: J. Important Scholar dumps her hard drive on your desk, expects you to cope • File migrator, metadata manager, GUI, plugins (e.g. for file-format detection) • Bit rough, but in production use. • http://library.duke.edu/uarchives/about/tools/data-accessioner.html
  • 35. Archivematica • Soup-to-nuts records management and digital preservation tool. • Evaluation and accessioning all the way through preservation actions. (Oddly, they seem to be missing disposal... but they’re in alpha, so...) • Open source • Runs on a Linux server; RMs and archivists log in to GUI application remotely. • Normally I hate and fear silos, but this one is smartly built on microservices.
  • 36. Practical E-Records • Weblog by Chris Prom and protégés • Tool evaluations, conference-session writeups, essays on praxis • Best reading out there for the do-it-yourselfer • If you’re not reading it, why not? • http://e-records.chrisprom.com/
  • 37. Last thoughts Photo: “Happy Easter, to my Peeps” http://www.flickr.com/photos/76074333@N00/449028423/ WorldIslandInfo.com / CC-BY 2.0
  • 38. If you can’t do everything... that’s okay. Who can? Image: “Confused” http://www.flickr.com/photos/kristiand/3223044657/ Kristian D. / CC-BY 2.0
  • 39. DO SOMETHING. Photo: “Came hame háááá!” http://www.flickr.com/photos/kristiand/3223044657/ Guirí R. Reyes / CC-BY 2.0
  • 40. The worst threat? INACTION. Photo: “Fatty’s role model” http://www.flickr.com/photos/cloudzilla/4910616774/ cloudzilla / CC-BY 2.0
  • 41. Thank you! This presentation is available under a Creative Commons 3.0 United States license. Photo: “Happy Easter, to my Peeps” http://www.flickr.com/photos/76074333@N00/449028423/ WorldIslandInfo.com / CC-BY 2.0