KeepIt Course 4: Putting storage, format management and preservation planning in the repository
 


This is the opening presentation for module 4 of the 5-module course on digital preservation tools for repository managers, presented by the JISC KeepIt project. This module puts storage, format management and preservation planning in the repository, by making such functions available from within the familiar repository interface. This introduction briefly reviews the previous module, which acted as a primer on preservation workflow, formats and characterisation, as preparation for the preservation planning tools to be encountered in this module. For more on this and other presentations in this course, look for the tag 'KeepIt course' in the project blog: http://blogs.ecs.soton.ac.uk/keepit/


Usage Rights

© All Rights Reserved


    Presentation Transcript

    • Digital Preservation Tools for Repository Managers
      • A practical course in five parts presented by the KeepIt project in association with [partners not captured in transcript]
      • Module 4, Putting storage, format management and preservation planning in the repository
      • University of Southampton, 18-19 March 2010
      • Twitter hashtag #dprc (digital preservation repository course)
    • Course structure
      • Module 1. Organisational issues: scoping, selection, assessment, institutional parameters (19 January)
      • Module 2. Costs: lifecycle costs for managing digital objects, based on the LIFE approach, and institutional costs (5 February)
      • Module 3. Description: describing content for preservation (provenance, significant properties and preservation metadata) (2 March)
      • Module 4. Preservation workflow tools available in EPrints for format management, risk assessment and storage, and linked to the Plato planning tool from Planets (TODAY)
      • Module 5. Trust (by others) of the repository's approach to preservation; trust (by the repository) of the tools and services it chooses (30 March)
    • Tools this module
      • EPrints preservation apps, including the storage controller: Dave Tarrant and Adam Field, University of Southampton
      • Plato, preservation planning tool from the Planets project: Andreas Rauber and Hannes Kulovits, TU Wien
    • Steve Jobs launches Apple iPad “75 million people already own iPod Touches and iPhones. That's all people who already know how to use the iPad.” Picture by curiouslee http://www.flickr.com/photos/curiouslee/4320074421/
    • Some revision from KeepIt Module 3 • Preservation workflow
    • Preservation workflow
      • Check
        – Format identification, versioning
        – File validation
        – Virus check
        – Bit checking and checksum calculation
        – Tools, e.g. DROID, JHOVE, FITS
      • Analyse
        – Preservation planning
        – Characterisation: significant properties and technical characteristics, provenance, format, risk factors
        – Risk analysis
        – Tools: Plato (Planets), PRONOM (TNA), P2 risk registry (KeepIt), INFORM (U Illinois)
      • Action
        – Migration
        – Emulation
        – Storage selection
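The "bit checking and checksum calculation" step in the Check stage can be illustrated with a minimal sketch. It assumes SHA-256 as the digest algorithm and uses hypothetical helper names (`sha256_digest`, `fixity_ok`); the slides do not say which algorithm or tool the course used.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def fixity_ok(data: bytes, recorded_digest: str) -> bool:
    """Return True if the current bytes still match the digest recorded earlier."""
    return sha256_digest(data) == recorded_digest


# Hypothetical sample object; in a repository this would be the file's bytes.
original = b"KeepIt module 4 sample object"
recorded = sha256_digest(original)  # stored alongside the object at deposit

print(fixity_ok(original, recorded))         # bytes unchanged since deposit
print(fixity_ok(original + b"!", recorded))  # one extra byte changes the digest
```

Recording the digest at deposit and recomputing it later is what lets a repository notice silent changes to the stored bits.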
    • Format risks
      • 1000 Ubiquity: degree of adoption of the format
      • 1001 Support: number of tools available which can access the format
      • 1002 Disclosure: extent to which the format documentation is publicly disclosed
      • 1003 Document Quality: completeness of the available documentation
      • 1004 Stability: speed and backwards-compatibility of version change
      • 1005 Ease of identification: ease with which the format can be identified
      • 1006 Ease of validation: ease with which the format can be validated
      • 1007 Lossiness: does the format use lossy compression
      • 1008 Intellectual property rights: whether or not the format is encumbered by IPR
      • 1009 Complexity: degree of content or behavioural complexity supported
      • From PRONOM documentation (The National Archives), July 2008
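    • Format risks: three comparisons (Word vs PDF, TIFF vs JPEG, XML vs PDF), 1 point per risk category won (the column each point belongs to was lost in extraction)
      • 1000 Ubiquity: 1 1 1
      • 1001 Support: 1 1
      • 1002 Disclosure
      • 1003 Document Quality
      • 1004 Stability: 1 1
      • 1005 Ease of identification
      • 1006 Ease of validation: 1 1
      • 1007 Lossiness: 1 1
      • 1008 Intellectual property rights: 1
      • 1009 Complexity: 1 1 1
      • The WINNER is: PDF (Word vs PDF), TIFF (TIFF vs JPEG), XML (XML vs PDF)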
    • A group task on format risks
      1. Choose two formats to compare (e.g. Word vs PDF, Word vs ODF, PDF vs XML, TIFF vs JPEG)
      2. By working through the (surviving) list of format risks, select a winner (or a draw) between your chosen formats for each risk category (1 point for a win)
      3. Total the scores to find an overall winning format
      4. Suggest one reason why the winning format using this method may not be the one you would choose for your repository
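The scoring in steps 2 and 3 of the group task can be sketched as below. This is only an illustration of the tallying rule (1 point per category won, draws score nothing); the function name `score_comparison`, the placeholder formats and the sample judgements are all hypothetical and are not the scores shown on the slide.

```python
# Risk categories as listed on the PRONOM-based "Format risks" slide.
RISK_CATEGORIES = [
    "Ubiquity", "Support", "Disclosure", "Document Quality", "Stability",
    "Ease of identification", "Ease of validation", "Lossiness",
    "Intellectual property rights", "Complexity",
]


def score_comparison(format_a, format_b, winners):
    """Tally 1 point per category won; None or a missing category is a draw."""
    totals = {format_a: 0, format_b: 0}
    for category in RISK_CATEGORIES:
        winner = winners.get(category)
        if winner is None:
            continue  # draw, or category not assessed
        if winner not in totals:
            raise ValueError(f"{winner!r} is not one of the compared formats")
        totals[winner] += 1
    if totals[format_a] == totals[format_b]:
        overall = None  # overall draw
    else:
        overall = max(totals, key=totals.get)
    return totals, overall


# Hypothetical judgements for two placeholder formats.
judgements = {
    "Ubiquity": "Format A",
    "Support": "Format A",
    "Disclosure": "Format B",
    "Stability": None,
}
totals, overall = score_comparison("Format A", "Format B", judgements)
print(totals, overall)
```

As step 4 of the task hints, every category carries equal weight here, which is one reason the computed winner may not be the format a particular repository would actually choose.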
    • Some revision from KeepIt Module 3
      • Preservation workflow
        – We recognised that we have digital objects with formats and other characteristics we need to identify and record. These can change over time, or may need to be changed pre-emptively, depending on a risk assessment, using a preservation action. Risk is subjective.
    • Some revision from KeepIt Module 3
      • Significant properties
    • InSPECT SP Assessment Framework
      • Builds on Gero's Function-Behaviour-Structure (FBS) framework
      • FBS developed to assist engineers/designers to create & redesign artefacts
      • Three categories:
        – Function: the design intention or purpose that is performed
        – Behaviour: the epistemological outcome derived from the function & structure obtained by the stakeholder
        – Structure: the structural elements of the object that enable the stakeholder to perform the behaviour
      • Artefact construction is the product of the designated function
      • Behaviour is the result of interaction between Function & Structure
    • Exercise overview
      • Analyse the content of an email
      • Analyse structure of email message
      • Determine purpose that each technical property performs
      • Consider how email will be used by stakeholders
      • Identify set of expected behaviours
      • Classify set of behaviours into functions for recording
    • Determine expected behaviours
      • Workflow steps: Select object type for analysis; Analyse structure; Identify purpose of technical properties; Determine expected behaviours; Classify behaviours into functions; Associate structure with each function; Review & finalise
      • What activities would a user (any type of stakeholder) perform when using an email?
      • Draw upon the list of property descriptions produced in the previous step, formal standards and specifications, or other information sources.
      • Task 2: Identify the type of actions that a user would be able to perform using the email (groups, 15 mins)
        – E.g. establish the name of the person who sent the email
        – E.g. may want to confirm that the email originated from the stated source
      • Table columns: Behaviour (to be completed) and Structure
      • Structure: subject; message text; line break; paragraph; underline; strikethrough; body background; body text colour; in-reply-to; references; message-id; trace-route; sender display-name; sender local-part; sender domain-part; recipient display-name; recipient local-part; recipient domain-part
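Several of the structural elements listed on the slide above (subject, message-id, in-reply-to, references, and the display-name, local-part and domain-part of sender and recipient) can be read from a message with Python's standard `email` package. The sketch below is an illustration only; the sample message, its addresses and the helper names `split_address` and `structural_properties` are hypothetical.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical sample message used only for illustration.
RAW = """\
From: Alice Example <alice@example.org>
To: Bob Example <bob@example.com>
Subject: Draft results
Message-ID: <msg-002@example.org>
In-Reply-To: <msg-001@example.com>
References: <msg-001@example.com>

Results attached.
"""


def split_address(header_value):
    """Split an address header into display-name, local-part and domain-part."""
    display_name, address = parseaddr(header_value or "")
    local_part, _, domain_part = address.partition("@")
    return {
        "display-name": display_name,
        "local-part": local_part,
        "domain-part": domain_part,
    }


def structural_properties(raw):
    """Collect a few of the structural elements named on the slide."""
    msg = message_from_string(raw)
    return {
        "subject": msg["Subject"],
        "message-id": msg["Message-ID"],
        "in-reply-to": msg["In-Reply-To"],
        "references": msg["References"],
        "sender": split_address(msg["From"]),
        "recipient": split_address(msg["To"]),
        "message text": msg.get_payload(),
    }


props = structural_properties(RAW)
print(props["sender"])
```

Separating the sender's display-name from the address parts supports the two example behaviours in the task: establishing who sent the email, and checking that it originated from the stated source.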
    • 1.3 cont. Categories of properties
      • Five high-level categories:
        – Content, e.g. character count
        – Context, e.g. date of creation
        – Rendering, e.g. bit depth
        – Structure, e.g. e-mail attachments
        – Behaviour, e.g. hyperlinks
    • Identify stakeholders
      • Workflow steps: Select object type(s) for analysis; Identify stakeholder; Determine actual behaviours; Classify behaviours into set of functions; Cross-match functions; Assign acceptable value boundaries; Review & finalise
      • Creator: view, annotate
        – Researcher corresponds during research with colleagues, peers, administrators etc.
      • Recipient: reuses content
        – Student wants to understand research lifecycles by studying real-world practice
      • Custodian: evidential chain
        – Maintains permanent email record for externally-funded projects, alongside data and eprint outputs
    • Some revision from KeepIt Module 3
      • Significant properties
        – We considered which characteristics might be significant using the function-behaviour-structure (FBS) framework, and classifying the functions of formatted emails
        – We recognised that assessment of behaviour, and so of significance, can vary according to the viewpoint of the stakeholder, e.g. creator, user, archivist
    • Some revision from KeepIt Module 3
      • Documentation
        – We looked at two means to document these characteristics, and the changes over time:
          1. Broad and established (PREMIS)
          2. Focussed, and work-in-progress (Open Provenance Model)
    • Some revision from KeepIt Module 3
      • Provenance in action: transmission and recording
    • Provenance: a numbers game
      • Transmission: recording vs word-of-mouth
      • Identifying what is significant about the information to be transmitted
      • Can be self-correcting!
    • Some revision from KeepIt Module 3
      • Provenance in action: transmission and recording
        – Through a simple game we learned that if we don't recognise the necessary properties at the outset, and maintain a record through all stages of transmission, the information at the end of the chain will likely not be the same as it was at the start
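The lesson of the provenance game, keeping a record at every stage of transmission, can be sketched as a simple event log. This is an illustration under stated assumptions rather than the PREMIS or Open Provenance Model data model: the class `TransmissionEvent`, the functions `record_event` and `first_divergence`, and the sample messages are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class TransmissionEvent:
    stage: str    # who or what handled the information
    content: str  # the information as received at this stage
    digest: str   # SHA-256 of the content, recorded at the time


def record_event(stage, content):
    """Create a log entry with a digest of the content as seen at this stage."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return TransmissionEvent(stage, content, digest)


def first_divergence(events):
    """Return the first stage whose digest differs from the origin, or None."""
    if not events:
        return None
    reference = events[0].digest
    for event in events[1:]:
        if event.digest != reference:
            return event.stage
    return None


# Hypothetical chain: the message is altered at the second relay.
chain = [
    record_event("origin", "Meet at 10:30 in room 4"),
    record_event("relay 1", "Meet at 10:30 in room 4"),
    record_event("relay 2", "Meet at 10:13 in room 4"),
]
print(first_divergence(chain))  # relay 2
```

With a record at each stage, a change can be traced to where it happened; without one, only the difference between the two ends of the chain is visible.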