Taming the Monster: Digital Preservation Planning and Implementation Tools
1. Taming the Monster
Digital Preservation Planning and Implementation Tools
Dorothea Salo
Photo: “Happy Easter, to my Peeps”
http://www.flickr.com/photos/76074333@N00/449028423/
WorldIslandInfo.com / CC-BY 2.0
One System, One Library
2 June 2011
2. Why is this so scary?
Photo: “Happy Easter, to my Peeps”
http://www.flickr.com/photos/76074333@N00/449028423/
WorldIslandInfo.com / CC-BY 2.0
3. Isn’t this just as scary?
Photo: “News Paper Origami Dragon Monster”
http://www.flickr.com/photos/epsos/3777343342/
epSos.de / CC-BY 2.0
4. Yet we persevere.
Photo: “News Paper Origami Dragon Monster”
http://www.flickr.com/photos/epsos/3777343342/
epSos.de / CC-BY 2.0
5. DIGITAL IS NO DIFFERENT.
Photo: “559 - The Matrix - Seamless Texture”
http://www.flickr.com/photos/zooboing/4335531915/
Patrick Hoesly / CC-BY 2.0
6. Many of the same ideas apply...
• Planning and policy
• Risk assessment
• Risk management
• (knowing that we can’t save everything)
• Materials quality matters!
• Problem discovery and remediation
• Crisis management
• Chief problems: staff, $$$, organizational commitment
Photo: “Where I Teach”
http://www.flickr.com/photos/eklektikos/2541408630/
Todd Ehlers / CC-BY 2.0
7. Planning and assessment tools
Photo: “Happy Easter, to my Peeps”
http://www.flickr.com/photos/76074333@N00/449028423/
WorldIslandInfo.com / CC-BY 2.0
8. Scene-setting
• Rosenthal, David. “Requirements for Digital Preservation Systems: A Bottom-Up Approach.”
• http://www.dlib.org/dlib/november05/rosenthal/11rosenthal.html
• If you’re new to this, or trying to find your feet, this is the best short introduction I know.
• The list of threats is outstanding.
Photo: “Bottoms Up! - Duck; San Anton Gardens, Malta”
http://www.flickr.com/photos/foxypar4/3123113762/
John Haslam / CC-BY 2.0
9. TRAC
• “Trustworthy Repositories Audit & Certification: Criteria and Checklist”
• Despite the name, covers a LOT more than the technology!
• Budget
• Staffing
• “designated communities”
• CRL will audit you, if you like
• (don’t, unless you’re really serious!)
• http://catalog.crl.edu/record=b2212602~S1
10. DRAMBORA
• Digital Repository Audit Method Based on Risk Assessment
• A “self-test,” if you will.
• DRAMBORA is equally good as a pre- or post-test.
• Personally, I prefer DRAMBORA to TRAC, especially for those just starting out.
• http://www.repositoryaudit.eu/
• (registration required for toolkit access)
11. Coping with file formats
Photo: “Happy Easter, to my Peeps”
http://www.flickr.com/photos/76074333@N00/449028423/
WorldIslandInfo.com / CC-BY 2.0
12. The one acronym you need to know: FITS
• “File Information Tool Set”
• (you need to know this; otherwise it’s hard to Google)
• Wrapper for several file-format detector software packages
• Intended to be baked into other software
• It’s early days yet!
• (This means you can’t always trust what the tools tell you, especially when they’re telling you about errors.)
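To make the "wrapper for several detectors" idea concrete, here is a minimal sketch. It is not FITS and does not use the FITS API; the two toy detectors (`by_extension`, `by_signature`) and the `identify` function are hypothetical names invented for illustration. The point is only that a wrapper can ask several tools the same question and flag disagreement instead of silently trusting any one of them.

```python
from collections import Counter
from typing import Callable, Optional

# A detector takes a file name and its leading bytes and returns a label or None.
Detector = Callable[[str, bytes], Optional[str]]

def by_extension(name: str, data: bytes) -> Optional[str]:
    """Hypothetical detector: trust the file extension."""
    table = {".pdf": "PDF", ".png": "PNG", ".txt": "text"}
    for extension, label in table.items():
        if name.lower().endswith(extension):
            return label
    return None

def by_signature(name: str, data: bytes) -> Optional[str]:
    """Hypothetical detector: look at the first few bytes."""
    if data.startswith(b"%PDF-"):
        return "PDF"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    return None

def identify(name: str, data: bytes, detectors: list[Detector]) -> dict:
    """Run every detector and report whether their answers agree."""
    answers = [a for a in (d(name, data) for d in detectors) if a is not None]
    counts = Counter(answers)
    agreed = next(iter(counts)) if len(counts) == 1 else None
    return {"format": agreed, "conflict": len(counts) > 1, "answers": answers}
```

A file named report.pdf that actually starts with PNG bytes comes back with `conflict` set to True and no agreed format, which is the cue for a human to take a look.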
13. What’s this file?
• wotsit.org “The Programmer’s File and Data Resource”
• Directory of file extensions
• When in doubt: open in a browser or text editor and see what you get.
• N.b.: Microsoft Word is NOT a text editor!
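If a text editor is not handy, the "open it and see what you get" step can be roughed out in a few lines. This is a sketch under simple assumptions: `peek` and `looks_like_text` are hypothetical helper names, and the 0.9 threshold is an arbitrary choice, not a standard.

```python
def peek(data: bytes, limit: int = 64) -> str:
    """Show the leading bytes as text, with '.' standing in for non-printable bytes."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in data[:limit])

def looks_like_text(data: bytes, threshold: float = 0.9) -> bool:
    """Crude test: is most of the sample printable ASCII, tabs, or line endings?"""
    if not data:
        return True
    printable = sum(1 for b in data if 32 <= b < 127 or b in (9, 10, 13))
    return printable / len(data) >= threshold

def peek_file(path: str, limit: int = 64) -> str:
    """Read only the first few bytes of a file and preview them."""
    with open(path, "rb") as handle:
        return peek(handle.read(limit), limit)
```

Many formats announce themselves in the first few bytes, so a preview that begins with `%PDF-` or `.PNG` is often enough of a clue to know which tool to try next.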
14. Solving the geographic distribution problem
Photo: “Happy Easter, to my Peeps”
http://www.flickr.com/photos/76074333@N00/449028423/
WorldIslandInfo.com / CC-BY 2.0
15. What problem, now?
• The “all your eggs in one basket” problem.
• If all your bits are on one server, and the server room is flooded, or your town is nuked... oops.
• Not the same as backups!
• Don’t get me wrong, backups are important!
• Backups are SHORT-TERM, and usually LOCAL. Geographic distribution (plus associated auditing) is intended for the long term.
• Don’t forget auditing!
Photo: “Nido”
http://www.flickr.com/photos/italintheheart/3679974298/
Jorge Elías / CC-BY 2.0
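What "auditing" distributed copies can mean in practice: each site reports a checksum for its copy of a file, and any site that disagrees with the majority gets flagged for repair. The sketch below is a deliberately simplified majority comparison, not any particular system's protocol; the site names in the usage note and the function names are made up.

```python
import hashlib
from collections import Counter

def sha256_of(data: bytes) -> str:
    """Checksum of one copy of a file's contents."""
    return hashlib.sha256(data).hexdigest()

def audit_replicas(digests: dict[str, str]) -> dict[str, list[str]]:
    """Compare one file's checksum across sites and flag the odd ones out.

    `digests` maps a site name to the checksum that site reports.
    If no checksum has a strict majority, every site is treated as suspect.
    """
    if not digests:
        raise ValueError("need at least one replica to audit")
    top_digest, top_votes = Counter(digests.values()).most_common(1)[0]
    if top_votes <= len(digests) / 2:
        return {"agree": [], "suspect": sorted(digests)}
    agree = sorted(site for site, d in digests.items() if d == top_digest)
    suspect = sorted(site for site, d in digests.items() if d != top_digest)
    return {"agree": agree, "suspect": suspect}
```

With three sites where two report the same checksum, the third is listed under `suspect`. With only two sites that disagree, nobody can tell which copy is the good one, which is one argument for keeping more than two copies.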
16. LOCKSS
• Lots of Copies Keep Stuff Safe!
• (There is also Portico, but Portico only works with e-journal content.)
• Open-source software that handles replication and (some) auditing.
• “Private LOCKSS network”
• A group of institutions agrees to build a LOCKSS network just for the stuff they’re interested in.
• ASERL does this for ETDs. Many institutions (including UW-Madison) participate in a PLN for govdocs.
17. “The cloud”
• Typical cloud-based storage services make NO promises they won’t lose your stuff.
• And for large quantities of data, bandwidth can become an issue.
• And can they look at your stuff? Should they be able to?
• Some early movers in this market are fading
• Iron Mountain had to kill their service.
• DuraCloud
• trying to finesse this issue by negotiating tougher SLAs with cloud-storage providers
Photo: “Sky View From Humboldt Park”
http://www.flickr.com/photos/purpleslog/2589612577/
Purple Slog / CC-BY 2.0
18. Repository and digital-library platforms
Photo: “Happy Easter, to my Peeps”
http://www.flickr.com/photos/76074333@N00/449028423/
WorldIslandInfo.com / CC-BY 2.0
19. Friendly word of advice: PICK SOFTWARE LAST.
Photo: “Briana Calderon; future educator of america.”
http://www.flickr.com/photos/46132085@N03/4703617843/
Arielle Calderon / CC-BY 2.0
20. Another friendly word of advice: DON’T CHASE THE SHINY.
Photo: “Sparkle Texture”
http://www.flickr.com/photos/abbylanes/3214921616/
Abby Lane / CC-BY 2.0
21. Digital-library software
• Is almost always VERY BAD at digital preservation!
• (most packages don’t even try!)
• So if a file gets corrupted on the server, or whatever... no warnings, no restore, nothing. Also, provenance? Who needs provenance? Event tracking? What’s that?
• I’m not saying don’t use it. I’m saying that it doesn’t solve this problem.
• In fact, if you’re using this software, you need to solve this problem FOR IT.
Photo: “National DIGITAL Library”
http://www.flickr.com/photos/schex/193912573/
Jesse Schexnayder / CC-BY 2.0
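One way to "solve this problem FOR IT" is to keep your own checksum manifest next to the software's files and recheck it on a schedule, writing down what you found each time. The sketch below assumes a hypothetical manifest (a plain mapping from relative file name to a SHA-256 checksum recorded at ingest); the event record fields are invented for illustration and do not follow any particular metadata standard.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path

def sha256_file(path: Path) -> str:
    """Checksum a file in chunks so large files don't have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def check_fixity(root: Path, manifest: dict[str, str]) -> list[dict]:
    """Compare every file in the manifest to its recorded checksum.

    Returns one event record per file: a minimal form of event tracking.
    """
    events = []
    for relative_name, expected in sorted(manifest.items()):
        target = root / relative_name
        if not target.is_file():
            outcome = "missing"
        elif sha256_file(target) == expected:
            outcome = "ok"
        else:
            outcome = "corrupted"
        events.append({
            "file": relative_name,
            "event": "fixity check",
            "outcome": outcome,
            "when": datetime.now(timezone.utc).isoformat(),
        })
    return events
```

Anything that comes back "missing" or "corrupted" is the warning the software never gave you; the restore still has to come from another copy.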
23. Institutional-repository software
• Is SHOCKINGLY bad at digital preservation!
• (Though sometimes better than most DL software.)
• Examples
• Hosted/commercial: Digital Commons (BePress), ContentDM, DigiTool
• If you go hosted, you’d better ask about their digital-preservation practices!
• Open-source: EPrints, DSpace, Fedora
Photo: “IMG_0668”
http://www.flickr.com/photos/12967790@N00/66531124
Robert / CC-BY 2.0
24. A new approach: curation microservices
Photo: “Happy Easter, to my Peeps”
http://www.flickr.com/photos/76074333@N00/449028423/
WorldIslandInfo.com / CC-BY 2.0
25. Do we really need THE BLOB?
Photo: “giant crystal blob”
http://www.flickr.com/photos/a_of_doom/527905701/
A of DooM / CC-BY 2.0
26. How about a jigsaw puzzle instead?
• Break the digital-preservation problem down into parts.
• Code up each part, making sure that it plays nicely with other parts.
• lots of nice APIs!
• which means other software can adopt/adapt microservices as well!
• Put parts together as you need them.
Photo: “Lapsana Apogonoides Puzzle”
http://www.flickr.com/photos/gdesigneralex/2313092112/
gdesigneralex / CC-BY 2.0
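The jigsaw idea can be shown in miniature with plain functions. In this sketch each "microservice" is a small function that takes a record and returns an enriched copy, and `compose` snaps whichever pieces you need into a pipeline. All names here are hypothetical, and real curation microservices are separate programs with their own interfaces, not three functions in one file.

```python
import hashlib
from functools import reduce
from typing import Callable

Record = dict
Service = Callable[[Record], Record]

def add_size(record: Record) -> Record:
    """Piece 1: record how many bytes the content has."""
    return {**record, "size_bytes": len(record["content"])}

def add_checksum(record: Record) -> Record:
    """Piece 2: record a SHA-256 checksum of the content."""
    return {**record, "sha256": hashlib.sha256(record["content"]).hexdigest()}

def add_identifier(record: Record) -> Record:
    """Piece 3: mint a made-up identifier derived from the content."""
    short = hashlib.sha256(record["content"]).hexdigest()[:12]
    return {**record, "id": "obj-" + short}

def compose(*services: Service) -> Service:
    """Chain the chosen pieces, in order, into a single service."""
    def pipeline(record: Record) -> Record:
        return reduce(lambda current, service: service(current), services, record)
    return pipeline
```

For example, `compose(add_size, add_checksum)` gives a two-step ingest, and adding `add_identifier` later does not require touching the other two. Because every piece has the same shape (record in, record out), other software can pick up just the piece it needs.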
27. California Digital Library
• Pioneering this approach
• Has open-sourced code for microservices
• Has combined microservices to build its “Merritt” storage/repository service
28. Escaping the silos: Fedora Commons
Photo: “Happy Easter, to my Peeps”
http://www.flickr.com/photos/76074333@N00/449028423/
WorldIslandInfo.com / CC-BY 2.0
29. What is Fedora Commons?
• Blueprints and foundation, not the whole house (analogy credit to Peter Gorman)
• You build the house you want!
• Or you build condominiums on the same foundation.
• Need different user interfaces for different materials?
• Need different structures and behaviors?
• No problem! Fedora can handle that.
• (have I run this analogy into the ground yet?)
32. E-records management
Photo: “Happy Easter, to my Peeps”
http://www.flickr.com/photos/76074333@N00/449028423/
WorldIslandInfo.com / CC-BY 2.0
33. Axioms
• Records management is about policy and procedures.
• If your policy doesn’t fit with their procedures, guess what wins? Choose battles wisely.
• There is never enough storage space.
• Nobody cares until there’s a crisis.
• Software will not save you... but it might help!
Photo: “The Never Ending Math Problem”
http://www.flickr.com/photos/acidwashphotography/2967752733/
d3 Dan / CC-BY 2.0
34. Duke Data Accessioner
• Accessioning tool for digital data
• use case: J. Important Scholar dumps her hard drive on your desk, expects you to cope
• File migrator, metadata manager, GUI, plugins (e.g. for file-format detection)
• Bit rough, but in production use.
• http://library.duke.edu/uarchives/about/tools/data-accessioner.html
35. Archivematica
• Soup-to-nuts records management and digital preservation tool.
• Evaluation and accessioning all the way through preservation actions. (Oddly, they seem to be missing disposal... but they’re in alpha, so...)
• Open source
• Runs on a Linux server; RMs and archivists log in to GUI application remotely.
• Normally I hate and fear silos, but this one is smartly built on microservices.
37. Last thoughts
Photo: “Happy Easter, to my Peeps”
http://www.flickr.com/photos/76074333@N00/449028423/
WorldIslandInfo.com / CC-BY 2.0
38. If you can’t do everything... that’s okay. Who can?
Image: “Confused”
http://www.flickr.com/photos/kristiand/3223044657/
Kristian D. / CC-BY 2.0
39. DO SOMETHING.
Photo: “Came hame háááá!”
http://www.flickr.com/photos/kristiand/3223044657/
Guirí R. Reyes / CC-BY 2.0
40. The worst threat? INACTION.
Photo: “Fatty’s role model”
http://www.flickr.com/photos/cloudzilla/4910616774/
cloudzilla / CC-BY 2.0
41. Thank you!
This presentation is available under a Creative Commons 3.0 United States license.
Photo: “Happy Easter, to my Peeps”
http://www.flickr.com/photos/76074333@N00/449028423/
WorldIslandInfo.com / CC-BY 2.0