The Adventures of Digi: Ideas, Requirements and Reality
David Pearson, National Library of Australia
Future Perfect 2012
Digi By Imogene Pearson (7 years) (March 2012)
1.) Some Context
From a preservation point of view, the Library’s digital collections present:
• A mix of materials needing to be kept in perpetuity, along with materials that can be discarded after specified periods or events;
• Mixed levels of complexity in terms of object structure, relationships and dependencies;
• Mixed levels of intellectual control;
• A wide range of file formats (and carrier formats);
• Different levels of complexity in preservation planning and processing;
• Different timetables for preservation action;
• A need for different preservation approaches, often at different scales; and
• A need for recurring (and possibly changing) preservation action cycles over time, using a changing suite of tools.
Ecology, or layers of consciousness of the need for digital preservation intervention (given some need to access content over time)
Unaware:
• I am unaware if I have any digital content; or
• I am unaware if I may have a problem accessing any of my digital content.
Aware – no response:
• I don’t think that I have a problem accessing any of my digital content;
• I recognise that I have a problem accessing some of my digital content;
• I recognise that I have a problem accessing some of my digital content. However, the problem is not my problem; or
• I recognise that I have a problem, but have no response in place, not even a limited one.
Aware – taking some action:
• I accept that I may have a problem accessing some of my digital content. I am taking limited actions to manage this problem; or
• I accept that I may have a problem accessing some of my digital content. The preservation mandate is a part of my enterprise or system ecology.
Another way of looking at it might be: David Pearson 2012
3.) What we have come to understand over time
http://www.motifake.com/79532 via Google Images
Preservation responsibilities:
Preservation of the Library’s digital collections involves three main goals:
• Maintaining access to reliable data at bit-stream level;
• Maintaining access to content encoded in the bit streams; and
• Maintaining access to the intended and available meaning of the content.
While specific preservation activities may focus on one or more of these goals, the Library’s preservation responsibility is only fulfilled when all three goals have been adequately addressed.
This responsibility applies across all digital collections, subject to curatorial and policy decisions for specific groups of digital objects.
Mission: The primary objective of preservation activities within the NLA is to maintain the ability to meaningfully access digital collection content over time.
[Diagram: ‘Logical on Physical Stuff’ (A and B), surrounded by contextual dependency information about time; information about content, formats, etc.; systems to ingest, manage, report and take actions; and systems to access (master or derivative). Caption: ‘Stuffed?’ David Pearson 2012; Google Images]
Required preservation processes
The Library must be able to:
• Understand what it holds in its collections;
• Understand what its preservation intentions are for every digital object and what it is entitled to do to realise its intentions;
• Understand what is required to provide access, existing inhibitors to access, and the current level of support the Library is able to provide;
• Evaluate and monitor the degree of risk arising from collection composition, preservation intentions and available level of support within the Library for digital collection content, and monitor for risk conditions arising during general Digital Library operations;
• Anticipate the effects of changes in support;
• Recognise planning triggers, and plan and take appropriate action on a scale appropriate to the size of the target; and
• Audit the effectiveness of its preservation arrangements and modify the arrangements if necessary.
Risk or ‘Risk-on’ (are you a splitter or a lumper?)
• ‘parameter-based’ risks: a match against a criterion defined by Library staff to indicate a preservation risk – for example, video encoded with a codec considered to be problematic;
• ‘exception’ risks: the value of a monitored parameter is outside a set of acceptable values;
• ‘change’ risks: there has been a change in status for a monitored parameter for content – for example, the confidence in format identification for a particular file has changed;
• ‘conflict’ risks: conflicting values for the parameter are reported by one or more tools – for example, file format identification returns conflicting values;
• ‘unknown value’ risks: undetermined values for defined parameters – for example, undetermined values for file format and version;
• ‘access support’ risks: changes in level of support which affect the Library’s ability to access content in accordance with preservation intent and significance – for example, reduction below an acceptable threshold in the availability of supporting software for a particular file format; and
• ‘content-based’ risks: characteristics of content that may not be identifiable from metadata – for example, presence of deprecated HTML tags.
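As a rough illustration of how several of these risk types could be told apart mechanically, the Python sketch below checks one monitored parameter (for example, file format) as reported by several identification tools. The class, function and value names are hypothetical and not part of any NLA system. ‘Access support’ and ‘content-based’ risks are deliberately left out, since they need inputs (the support knowledgebase, the content itself) that a single parameter record does not carry.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional

@dataclass
class ParameterObservation:
    """Hypothetical record: one monitored parameter for one file, as seen by several tools."""
    name: str
    values_by_tool: dict[str, Optional[str]]  # tool name -> reported value; None = undetermined
    previous_value: Optional[str] = None      # value agreed at the previous monitoring run

def classify_risks(obs: ParameterObservation,
                   flagged: set[str],
                   acceptable: Optional[set[str]] = None) -> set[str]:
    """Return the risk types raised by a single observation (sketch only)."""
    reported = {v for v in obs.values_by_tool.values() if v is not None}
    if not reported:
        return {"unknown value"}          # no tool could determine a value
    risks: set[str] = set()
    if len(reported) > 1:
        risks.add("conflict")             # tools disagree with each other
    if reported & flagged:
        risks.add("parameter-based")      # matches a staff-defined risk criterion
    if acceptable is not None and reported - acceptable:
        risks.add("exception")            # outside the set of acceptable values
    if len(reported) == 1 and obs.previous_value is not None:
        (current,) = reported
        if current != obs.previous_value:
            risks.add("change")           # differs from the last monitoring run
    return risks

if __name__ == "__main__":
    obs = ParameterObservation("format", {"tool-1": "format-A", "tool-2": "format-B"},
                               previous_value="format-A")
    print(sorted(classify_risks(obs, flagged={"format-B"})))  # ['conflict', 'parameter-based']
```

Returning a set rather than a single label reflects the ‘lumper’ view: one observation can raise several risk types at once.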
Likely preservation treatment actions
Broad preservation action approaches that are likely to be required will include:
• Format migration at the point of collecting;
• Format migration on recognition of risks;
• Format migration at the point of delivery;
• Emulation of various levels of software and hardware environments;
• Maintenance or supply of appropriate software or hardware;
• Documenting known problems for which no other action can be taken; and
• Deaccessioning or deletion.
Prioritising Preservation Treatment:
The Library expects to take into account indicators of ‘preservation intent’, ‘significance’, and ‘level of support’ within monitoring and reporting activities, and in evaluations of risk and prioritisation for preservation planning and action.
http://callmemilo.deviantart.com/art/Thunderbirds-are-GO-20717927
Preservation intent – indicates the expectations for preservation for content:
• whether content is to be preserved;
• who is responsible for preservation of the content;
• the period over which content must be preserved;
• the required level of support for access to the content over time; for example, that the Library intends to actively maintain the ability to both present and modify content, or only to present content, or does not intend to actively maintain access to content beyond its expected useful life.
• Preservation intent may also extend to include more specific characteristics to be supported, based on curatorial input or constraints imposed by rights policies or agreements with rights holders.
Significance – indicates the relative priority required for taking preservation action to maintain access to content, as determined by collection curators; for example, content rated as highly significant would be prioritised for preservation planning and action before content of lower significance.
Level of support – indicates how well a digital collection object is supported within the Library, based on a combination of how much is known about the object and its components (including their file formats), and the degree to which supporting software or hardware environments are available.
NLA Image
4.) This got us thinking
Colin Webb 2009
Which turned into this
NLA 2011
Preservation assessment and reporting
The Library must be able to review the composition and characteristics of its digital collections to assess trends that may affect preservation management, to aid setup of preservation monitoring, planning and action, and to report on specific aspects of content when necessary.
A solution must enable staff to define and request, on both an ad hoc and scheduled basis:
• summary reports of content, metadata characteristics and risks across collections or defined sets of managed content;
• detailed metadata reports for individual items or sets of items; and
• audit trail history reports for individual items or sets of items.
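At its simplest, a summary report of this kind counts metadata characteristics and recorded risks over a selected set of items. The sketch below assumes a hypothetical item record holding a collection name, a file format and a tuple of risk labels; a real solution would query the repository’s metadata store rather than an in-memory list.

```python
from __future__ import annotations
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass
class Item:
    """Hypothetical, minimal metadata record for one managed item."""
    collection: str
    file_format: str
    risks: tuple[str, ...] = ()   # risk labels recorded by monitoring

def summary_report(items: list[Item], collection: Optional[str] = None) -> dict:
    """Count items, file formats and risk labels, optionally for one collection."""
    selected = [i for i in items if collection is None or i.collection == collection]
    return {
        "items": len(selected),
        "formats": dict(Counter(i.file_format for i in selected)),
        "risks": dict(Counter(r for i in selected for r in i.risks)),
    }

if __name__ == "__main__":
    items = [Item("web", "html", ("content-based",)), Item("web", "html"),
             Item("av", "mpeg", ("parameter-based",))]
    print(summary_report(items, "web"))
    # {'items': 2, 'formats': {'html': 2}, 'risks': {'content-based': 1}}
```

Scheduling (ad hoc versus recurring runs) and the detailed and audit-trail reports would sit around a function like this and are not shown.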
Reference knowledgebases (General)
Enable staff to create, update and maintain reference information knowledgebases on:
• File formats and versions;
• Software and hardware components that support access to file formats and versions, for maintaining access to managed content;
• The level of support available for particular file formats and versions:
  – i. sets of software or hardware components available to support access to formats;
  – ii. functions supported, both for providing access to content and for use in preservation action – for example, presentation, modification, batch processing;
  – iii. fidelity of support – how well functions are supported; and
  – iv. known risks, including potential inhibitors to preservation, associated with formats or supporting software or hardware.
• Preservation intent descriptions and parameters for sets of content.
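One way to picture the ‘level of support’ part of such a knowledgebase is as a table of (component, format, function, fidelity) entries, from which the best available fidelity for each function can be derived for any format. This is a minimal sketch under that assumption: the entry fields, the three-step fidelity scale and the example names are hypothetical, and known risks (item iv) are not modelled.

```python
from __future__ import annotations
from dataclasses import dataclass

# Hypothetical ordinal scale for 'fidelity of support'.
FIDELITY = {"none": 0, "partial": 1, "full": 2}

@dataclass
class SupportEntry:
    component: str    # software or hardware set
    file_format: str  # format (and version) identifier
    function: str     # e.g. "present", "modify", "batch"
    fidelity: str     # "partial" or "full"

def level_of_support(kb: list[SupportEntry], file_format: str) -> dict[str, str]:
    """Best fidelity available for each supported function of one format."""
    best: dict[str, str] = {}
    for entry in kb:
        if entry.file_format != file_format:
            continue
        # Keep the highest fidelity seen so far for this function.
        if FIDELITY[entry.fidelity] > FIDELITY[best.get(entry.function, "none")]:
            best[entry.function] = entry.fidelity
    return best

if __name__ == "__main__":
    kb = [SupportEntry("viewer-1", "format-A", "present", "partial"),
          SupportEntry("viewer-2", "format-A", "present", "full"),
          SupportEntry("editor-1", "format-A", "modify", "partial")]
    print(level_of_support(kb, "format-A"))  # {'present': 'full', 'modify': 'partial'}
```

An empty result for a format would signal that no listed component supports it at all, which is one input to an ‘access support’ risk.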
Other systems are also required to interrelate in this ecosystem, such as:
• Preservation monitoring, reporting and prioritisation
• Preservation options and preservation action planning
• Preservation action evaluation
This is how we tended to think about it (a job for a new system).
6.) Info on Formats, software and level of support (some prototyping)
NLA 2011
7.) Level of support and Prioritisation
NLA 2011
Level of support (an early concept model) DP 2011
Prioritising preservation treatment based on level of support
In evaluations of risk and prioritisation for preservation planning and action, we must take into account the Level of Support/Access Risks and:
• Any constraints imposed by rights policies or agreements; and
• The amount of resources available.
Based on these factors, the Library (Management, Collections and Digi Pres) should be able to prioritise material to be preserved.
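To make the interplay of these factors concrete, the sketch below drops items whose treatment a rights constraint rules out, ranks the rest by curator-assigned significance and then by level of support (least supported first), and fills a resource budget greedily. The scales, field names and the greedy budget rule are assumptions for illustration, not the Library’s method; a greedy pass is simple but does not guarantee the best possible use of the budget.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Candidate:
    item_id: str
    significance: int       # curator rating; higher means more significant (hypothetical 1-5 scale)
    support_level: int      # 0 = unsupported ... 2 = fully supported (hypothetical scale)
    action_permitted: bool  # False if a rights policy or agreement rules the action out
    cost: int               # estimated effort, in arbitrary resource units

def prioritise(candidates: list[Candidate], budget: int) -> list[str]:
    """Return item ids selected for treatment, highest priority first."""
    eligible = [c for c in candidates if c.action_permitted]
    # Most significant first; among equals, the least supported (most at risk) first.
    ranked = sorted(eligible, key=lambda c: (-c.significance, c.support_level, c.item_id))
    chosen: list[str] = []
    spent = 0
    for c in ranked:
        # Greedy: take each item that still fits in the remaining budget.
        if spent + c.cost <= budget:
            chosen.append(c.item_id)
            spent += c.cost
    return chosen

if __name__ == "__main__":
    cands = [Candidate("a", 5, 2, True, 3), Candidate("b", 5, 0, True, 4),
             Candidate("c", 3, 0, True, 2), Candidate("d", 5, 0, False, 1)]
    print(prioritise(cands, budget=6))  # ['b', 'c']
```

In the example, item ‘d’ is excluded by rights, and ‘a’ is skipped because it no longer fits once the more at-risk ‘b’ has been taken.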
8.) Preservation actions and options generation
NLA 2011
Options for preservation actions
We would like to be able to enable staff to:
• define types of preservation actions for use within preservation planning and evaluation;
• update and delete reference information on options for preservation action, both in general and for particular formats or format types;
• link to information available from the software KB, which provides information on what actions specific software might be useful for and the proximity of the software to the format; and
• link to other linked data sources.
Pres action options generation
The Library must be able to test and evaluate preservation action plans to determine if they satisfactorily achieve the preservation intent for managed content. For example, a solution should:
• enable staff to develop and test executable preservation action plans for sets of managed content, including:
  – Single and multiple step actions (combining manual and automated workflows)
  – Replacing file/s and linkages in complex objects
  – Linking to a specific emulation environment (if available)
  – Replacing access software
  – Specifying that no action is required
• support simulations or testing of preservation actions against a content testbed; for example, enable staff to perform what-if simulations to determine the impact of changes to availability of support for access, including:
  – a. Removal of software or hardware sets supporting access, to assess risks or impacts on access; and
  – b. Addition or revision of software or hardware sets supporting access, to assess proposed remedial preservation action plans.
• enable staff to define quality assurance criteria for preservation action plan outcomes.
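The what-if simulations listed above amount to comparing the functions supported for each format before and after a hypothetical change to the available software or hardware sets. A minimal sketch, assuming the support knowledgebase can be flattened into (component, format, function) triples; all names here are hypothetical.

```python
from __future__ import annotations
from collections.abc import Iterable

def supported(kb: Iterable[tuple[str, str, str]],
              removed: frozenset[str] = frozenset()) -> dict[str, set[str]]:
    """Map each format to the functions still supported once `removed` components are gone."""
    result: dict[str, set[str]] = {}
    for component, fmt, function in kb:
        if component not in removed:
            result.setdefault(fmt, set()).add(function)
    return result

def what_if(kb: list[tuple[str, str, str]],
            removed: Iterable[str] = (),
            added: Iterable[tuple[str, str, str]] = ()
            ) -> tuple[dict[str, set[str]], dict[str, set[str]]]:
    """Return (lost, gained) functions per format after removing and/or adding component sets."""
    before = supported(kb)
    after = supported(list(kb) + list(added), frozenset(removed))
    lost: dict[str, set[str]] = {}
    gained: dict[str, set[str]] = {}
    for fmt in before.keys() | after.keys():
        b, a = before.get(fmt, set()), after.get(fmt, set())
        if b - a:
            lost[fmt] = b - a      # case (a): impact of removal on access
        if a - b:
            gained[fmt] = a - b    # case (b): benefit of a proposed addition
    return lost, gained

if __name__ == "__main__":
    kb = [("viewer-1", "format-A", "present"), ("viewer-2", "format-A", "present"),
          ("editor-1", "format-A", "modify"), ("viewer-1", "format-B", "present")]
    print(what_if(kb, removed={"viewer-1"}))  # ({'format-B': {'present'}}, {})
```

Removing ‘viewer-1’ leaves ‘format-A’ untouched (‘viewer-2’ still presents it) but strands ‘format-B’, which is exactly the kind of hidden single point of failure such a simulation is meant to expose.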
Preservation options evaluation
• support import and integration of preservation-treated content and metadata, from either internal or external processes, including:
  – a. Verifying that preservation-treated digital content conforms to acceptance criteria for preservation outcomes for designated sets of digital content;
  – b. Enabling staff to quality assure and approve preservation-treated digital content for incorporation into the collection; and
  – c. After approval, sending it to the preservation action scheduler for treatment of file/s, metadata and associated relationships.
• support ‘rollback’ of updated versions of content, metadata and associated relationships to restore previous versions, if necessary.
• enable staff to define and approve acceptance criteria for preservation action outcomes for sets of managed content.
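The acceptance and rollback requirements can be sketched as a versioned object that only takes on a treated version when every acceptance criterion passes and staff have approved it, and that can later restore the previous version. The class, the criteria-as-functions design and the metadata keys are assumptions for illustration. This sketch discards the rolled-back version, whereas a production system would more likely keep it in the audit trail.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ManagedObject:
    """Hypothetical managed object holding a version history of metadata records."""
    object_id: str
    versions: list[dict] = field(default_factory=list)  # oldest first

    @property
    def current(self) -> dict:
        return self.versions[-1]

    def incorporate(self, treated: dict,
                    criteria: dict[str, Callable[[dict], bool]],
                    approved: bool) -> list[str]:
        """Add a treated version if all acceptance criteria pass and staff approved it.
        Returns the reasons for refusal (an empty list means it was accepted)."""
        failures = [name for name, check in criteria.items() if not check(treated)]
        if not approved:
            failures.append("not approved by staff")
        if not failures:
            self.versions.append(treated)
        return failures

    def rollback(self) -> dict:
        """Restore the previous version, returning the one that was withdrawn."""
        if len(self.versions) < 2:
            raise ValueError("no earlier version to restore")
        return self.versions.pop()

if __name__ == "__main__":
    obj = ManagedObject("obj-1", [{"format": "format-A", "size": 100}])
    criteria = {"target format": lambda m: m["format"] == "format-B",
                "non-empty": lambda m: m["size"] > 0}
    print(obj.incorporate({"format": "format-B", "size": 90}, criteria, approved=True))  # []
    obj.rollback()
    print(obj.current["format"])  # format-A
```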
10.) So what!
Currently, these ideas and requirements have become ‘partially real’. They still need to be implemented. They formed the basis for the preservation requirements in two subsequent processes:
• an RFP (Request for Proposal) process; and
• an RFT (Request for Tender) process.
http://www.wildsound-filmmaking-feedback-events.com/images/austin_powers_dr_evil.jpg
RFP
So all of these ideas were consolidated as requirements for a Request for Proposal, which went to the market in July 2011. A number of responses were received for:
• Core systems
• Preservation
• Digitisation
• Other Workflows
These were evaluated, and some of the vendors were invited to participate in the next stage.
http://www.melbournesumos.com.au/pics/twister/Twister078.jpg
RFT
Based on the RFP, the NLA clarified the requirements for the next process. A select group from the RFP process was invited to participate in a Request for Tender, which closed in late December 2011.
http://simpro.co/wp-content/uploads/2010/10/paperwork2.jpg
What version of reality have we decided upon?
Everything, for Everyone, Forever
http://www.flickr.com/photos/ricksmit/15671245/