Today, I will talk to you about implementing Preservation Intent Statements at the NLA over the last year. I talked a little about this at last year’s CurateGear.
Like most things to do with managing digital collections, effective ways of making preservation decisions are evolving. Let’s be quite clear about what we mean by that statement: we (the digital preservation community) have no settled, agreed procedures for the full range of digital preservation challenges, nor even tentative plans we confidently expect to ensure adequate preservation for the next 100, 200 or 500 years. We don’t know with certainty what we have to do now, let alone what will have to be done 20, 50 or 100 years in the future. So we are necessarily speculating and making up proposals for action. That’s not quite the same as saying ‘we don’t know what we are doing’. We are looking for practical approaches that appear most likely to work, while recognising we are unlikely to foresee and forestall every problem. We take the view that methods and solutions do not exist in isolation: consideration of them is meaningless without the context of what has to be achieved. Therefore, part of our approach to preservation planning at the NLA is the development of what we have labelled ‘preservation intent statements’, which we have been talking about since 2009.
I see the Preservation Intent statements as a kind of Rosetta Stone between digital preservation, the various collections and management. As such, the Library’s preservation intent methodology is simply to engage collection curators in making explicit statements about which collection materials, and which copies of collection materials, need to remain accessible for an extended period, and which ones can be discarded when no longer in use or when access to them becomes troublesome. Curators are also asked to make broad statements clarifying what ‘accessible’ means by stating the priority elements that need to be re-presented in any future access for each type of digital object in their collections. Importantly, the approach aims to accommodate the specific needs and characteristics of each collection, while looking for common ways of describing things so that patterns can be efficiently and effectively recognised and planned for.
One difficulty of the approach is that those responsible for collecting often see things in terms of genres, workflows and intellectual entities, whereas collection management and preservation decisions typically deal with types of file formats and individual files. Consequently, the same file types may be perceived as having different roles and importance in different collections. At the NLA we have many collections. In some the material is very varied and in others very homogeneous. Therefore, finding common ways of characterising and describing collection material is not as easy as you might think.
Although it has taken some time to engage across the organisation, the NLA now has statements of preservation intent at some level of detail for all of its digital collections. We are also developing new statements as well as modifying older ones as required. Once we have an initial statement with a common vocabulary, it is much easier to modify. The various collection curators and a digital preservation specialist discussed and came to an agreement on how their collection could be described and characterised. This dialogue resulted in the drafting of ‘plain language’ statements in consultation with the NLA curators and collection managers. These were recorded on a wiki.
These statements are designed around questions such as:
• How can we describe and classify a collection in a way that makes sense to the collection managers? (This depends on whether you are a splitter or a lumper.)
• Who is responsible for authorising what happens to any specific collection materials?
• Do we need to keep these materials accessible? If so, for how long?
• In general terms, what would an adequate level of accessibility look like (such as integrity or fidelity of bits, viewing functionality, editing functionality, navigating, and the ability to manipulate content)?
• Who is responsible for preservation actions (e.g. digital preservation specialists, curators, ICT staff or another party)?
• Can we identify any issues, including operational or non-collecting issues, which may hinder preservation efforts in the future?
Ultimately, we will want to turn these statements into machine-actionable vocabularies.
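As a rough illustration of what a machine-actionable form might look like, the questions above can be mapped onto a small record type. This is only a sketch under stated assumptions: the class name, field names, vocabulary terms and example values are all hypothetical and are not the NLA's actual vocabulary or wiki content.

```python
import json
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class AccessAspect(Enum):
    """Hypothetical controlled vocabulary for 'adequate accessibility'."""
    BIT_INTEGRITY = "bit_integrity"
    VIEWING = "viewing"
    EDITING = "editing"
    NAVIGATING = "navigating"
    MANIPULATING = "manipulating"


@dataclass
class PreservationIntent:
    """One statement per collection; each field mirrors one of the questions."""
    collection: str            # how the collection is described and classified
    authorised_by: str         # who authorises what happens to the materials
    keep_accessible: bool      # do the materials need to stay accessible?
    retention: str             # for how long (free text in this sketch)
    required_aspects: List[AccessAspect] = field(default_factory=list)
    action_owner: str = "digital preservation specialists"
    known_issues: List[str] = field(default_factory=list)

    def to_dict(self) -> dict:
        # Convert enum members to their plain string values for serialisation.
        return {
            "collection": self.collection,
            "authorised_by": self.authorised_by,
            "keep_accessible": self.keep_accessible,
            "retention": self.retention,
            "required_aspects": [a.value for a in self.required_aspects],
            "action_owner": self.action_owner,
            "known_issues": list(self.known_issues),
        }

    def to_json(self) -> str:
        return json.dumps(self.to_dict(), indent=2)


# Illustrative values only; not taken from any real NLA statement.
example = PreservationIntent(
    collection="Example web collection",
    authorised_by="collection curator",
    keep_accessible=True,
    retention="indefinite",
    required_aspects=[AccessAspect.VIEWING, AccessAspect.NAVIGATING],
    known_issues=["example issue: embedded media may not replay"],
)
print(example.to_json())
```

Keeping the accessibility aspects as an enumerated vocabulary rather than free text is one way of letting patterns across collections be recognised automatically, which is the stated aim of a common way of describing things.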
This concept is closely related to the idea of ‘significant properties’. At least in part, it is a response to the difficulties we and others have encountered in trying to apply the significant properties concept as a starting point for preservation planning. Arguably, our approach is a halfway point between those who advocate significant properties and those who don’t believe in them. We believe reference to significant properties in preservation planning requires some prior consideration of both the purposes for which digital content has been collected and the purposes of providing preservation attention. In short, you need to know what the purpose is before you start trying to engage with the how. In my opinion, this has been one of the problems in digital preservation over the 11 years that I have been involved in it. We like to work on solutions without fully understanding the requirements!
The example I will show you comes from the Pandora Web Archives. It is one of three web collections at the NLA. Some context about our web collections: we hold around five billion files, or 184 terabytes of data, and currently collect around one billion files, or more than 40 terabytes of data, each year.
So, if we look at our Preservation Intent Statement for the NLA’s Selective Web Collection, it has the following sections: General Description; Who is Responsible; General Intent; Important Aspects; Preservation of Different Versions; and Some of the Collecting Issues.
I will be happy to explain in more detail in the demo session. Also, we expect our article to be published in the forthcoming D-Lib Vol. Thank you.
Those Mad Men from the Antipodes: Preservation Intent at the National Library of Australia
David Pearson
What do we need to do? A range of preservation tools and methodologies!
[Images: “Digi” by Imogene Pearson (7 years), March 2012; Google Images (2012)]
What do they want? Rosetta Stone
[Images: Rosetta Stone, British Museum; “Digi” by Imogene Pearson (7 years), March 2012; Google Images (2012)]
NLA Collections
• Preservation Intent - Asian Collections and Overseas Collections Management (Version 1.0)
• Preservation Intent - Australian Books and Serials (Version 1.0)
• Preservation Intent - Dance (Version 1.0)
• Preservation Intent - Manuscripts (Version 1.0)
• Preservation Intent - Maps (Version 1.0)
• Preservation Intent - Music (Version 1.0)
• Preservation Intent - Newspaper Digitisation (Version 1.0)
• Preservation Intent - Oral History (Version 1.0)
• Preservation Intent - Pictures (Version 1.0)
• Preservation Intent - Selective Web Harvesting (Version 1.0)
• Preservation Intent - Web Domain Harvests (Version 1.0)
• Preservation Intent - Australian Government Web Archives (In progress, Dec 2012)
These statements are designed around questions such as:
• How can we describe and classify a collection in a way that makes sense to the collection managers? (splitter or lumper)
• Who is responsible for authorising what happens to any specific collection materials?
• Do we need to keep these materials accessible? If so, for how long?
• In general terms, what would an adequate level of accessibility look like (such as integrity or fidelity of bits, viewing functionality, editing functionality, navigating, and the ability to manipulate content)?
• Who is responsible for preservation actions (e.g. digital preservation specialists, curators, ICT staff or another party)?
• Can we identify any issues, including operational or non-collecting issues, which may hinder preservation efforts in the future?
What about Significant Properties?
[Image labels: Significant Properties; PLATO. Source: Google Images (2012)]
An Example: Pandora Web Archive
The NLA’s web collections fall into three broad categories.
• Selectively gathered ‘titles’ which make up PANDORA, Australia’s Web Archive.
• ‘Whole Domain Harvests’ which aim to capture a broad periodic snapshot of the Australian Web Domain.
• Coherent bulk collections, such as ‘.gov.au’ seedlist derived collections.
• General Description
• Who is Responsible?
• General Intent
• Important Aspects
• Preservation of Different Versions
• Some of the Collecting Issues
For more information see:
Webb, C., Pearson, D. and Koerbin, P. (in press) ‘“Oh, you wanted us to preserve that?!” Statements of Preservation Intent for the National Library of Australia’s Digital Collections’, in D-Lib Magazine.
NLA Digital Preservation Crest: Totus, Omnibus, in Perpetua (Everything, for Everyone, Forever)