Collaboration on appraisal and collection development for the long-term preservation of digital content


Slides from a presentation given at: Appraisal in the Digital World, Accademia Nazionale dei Lincei, Rome, Italy, 15-16 November 2007

Published in: Education, Technology

  1. 1. Collaboration on appraisal and collection development for the long-term preservation of digital content Michael Day DCC Research Team UKOLN, University of Bath Bath BA2 7AY, United Kingdom [email_address]
  2. 2. Presentation outline <ul><li>Different approaches to selection and appraisal </li></ul><ul><li>Collection development </li></ul><ul><li>The importance of collaboration for: </li></ul><ul><ul><li>Digital preservation </li></ul></ul><ul><ul><li>Institutional repositories </li></ul></ul><ul><li>General principles for selection and appraisal </li></ul>
  3. 3. Approaches to selection (1) <ul><li>Fully comprehensive </li></ul><ul><ul><li>“Storage is cheap. Why select?” (topic of ASIST student chapter panel discussion, UNC, 2007) </li></ul></ul><ul><ul><li>May seem to provide a way of avoiding the cultural bias evident in most selection regimes </li></ul></ul><ul><ul><li>However, ad hoc decisions on retention may still be made, perhaps on pragmatic grounds (e.g., available technology, security, privacy), with little accountability </li></ul></ul><ul><ul><li>It also does not resolve the practical question of who should be responsible for preservation </li></ul></ul>
  4. 4. Approaches to selection (2) <ul><li>Different professional approaches to selection </li></ul><ul><ul><li>Archivists focus on “appraisal” </li></ul></ul><ul><ul><ul><li>Based on well-established theoretical principles </li></ul></ul></ul><ul><ul><ul><li>An important part of archival practice </li></ul></ul></ul><ul><ul><li>Other cultural heritage organisations focus on the development and management of collections </li></ul></ul><ul><ul><ul><li>Based on a different set of assumptions </li></ul></ul></ul>
  5. 5. Example: Web archives (1) <ul><li>Highlights differences between the archival and collection development approaches </li></ul><ul><ul><li>Archivists and records managers approach Web operations as a potential source or generator of records </li></ul></ul><ul><ul><ul><li>Identify best practice for managing Web records, e.g. TNA </li></ul></ul></ul><ul><ul><ul><li>Mitigating organisational risk </li></ul></ul></ul><ul><ul><ul><li>Enhancing accountability </li></ul></ul></ul>
  6. 6. Example: Web archives (2) <ul><ul><li>International Internet Preservation Consortium </li></ul></ul><ul><ul><ul><li>Internet Archive and national libraries </li></ul></ul></ul><ul><ul><ul><li>View Web as a source of “published” content that can be harvested to enhance existing collections </li></ul></ul></ul><ul><ul><ul><li>Whether highly selective (e.g. UK Web Archiving Consortium, National Library of Australia’s PANDORA archive) or broader in scope (domain capture), national library-led initiatives tend to focus on traditional collection development criteria </li></ul></ul></ul>
  7. 7. Collection development (1) <ul><li>Typically focuses both on institutional objectives (e.g. “supporting the research and teaching needs of the university”) and subject needs </li></ul><ul><li>Traditionally includes a range of activities: </li></ul><ul><ul><li>Selection, acquisition, deselection (weeding), disposal, preservation </li></ul></ul><ul><ul><li>Part of collection management (also includes policies, budget allocation, collection evaluation) </li></ul></ul><ul><li>Most collections will change over time, e.g. responding to changes to institutional objectives and the resources available (money and space) </li></ul>
  8. 8. Collection development (2) <ul><ul><li>Specific selection factors might include: </li></ul></ul><ul><ul><ul><li>The overall purpose of the collection (e.g. supporting education and research) </li></ul></ul></ul><ul><ul><ul><li>Existing subject strengths </li></ul></ul></ul><ul><ul><ul><li>The information needs of users </li></ul></ul></ul><ul><ul><ul><li>Quality, accuracy, authoritativeness, currency, … </li></ul></ul></ul><ul><ul><ul><li>Value for money </li></ul></ul></ul><ul><ul><ul><li>Statutory requirements (e.g. for national libraries) </li></ul></ul></ul>
  9. 9. Collection development (3) <ul><ul><li>Collection development policies </li></ul></ul><ul><ul><ul><li>These help guide ongoing collecting activities and form the basis for evaluation </li></ul></ul></ul><ul><ul><ul><li>In the library sector, these can be “highly charged political documents and … the province of the most senior library management” (Derek Law) </li></ul></ul></ul><ul><ul><ul><li>They help to define organisational goals </li></ul></ul></ul><ul><ul><ul><li>“Deaccessioning” can lead to controversy (e.g. Nicholson Baker’s Double Fold) </li></ul></ul></ul>
  10. 10. Collection development (4) <ul><li>Digital resources raise new kinds of selection issues: </li></ul><ul><ul><li>Defining content, e.g. understanding the “significant properties” of resources (vitally important for making preservation decisions) </li></ul></ul><ul><ul><li>The need for various types of metadata </li></ul></ul><ul><ul><li>Access </li></ul></ul><ul><ul><ul><li>The longer-term implications of licenses </li></ul></ul></ul><ul><ul><ul><li>User support and training needs </li></ul></ul></ul>
  11. 11. Collection development (5) <ul><ul><li>The principle that it is important to select resources early in their lifecycle </li></ul></ul><ul><ul><ul><li>Obsolescence leads to loss </li></ul></ul></ul><ul><ul><ul><li>Implicit knowledge gets lost </li></ul></ul></ul><ul><ul><ul><li>Metadata and documentation are hard to (re)create retrospectively </li></ul></ul></ul>
  12. 12. Collaboration on preservation (1) <ul><li>Collaborative infrastructures have long been identified as necessary for digital preservation and curation, e.g.: </li></ul><ul><ul><ul><li>Preservation is “an ongoing, long-term commitment, often shared, and cooperatively met, by many stakeholders” (Lavoie & Dempsey, 2004) </li></ul></ul></ul>
  13. 13. Collaboration on preservation (2) <ul><li>Examples: </li></ul><ul><ul><li>Shared services (e.g. registries of representation information, third-party services for bit-level preservation) </li></ul></ul><ul><ul><li>Networks of “trust” (audit and certification) </li></ul></ul><ul><ul><li>Collaboration at the policy level, e.g. on collection development and access </li></ul></ul>
  14. 14. Institutional repositories (1) <ul><li>Institutional repositories require collaborative infrastructures: </li></ul><ul><ul><li>Distributed services linked (for access) by metadata harvesting </li></ul></ul><ul><ul><ul><li>Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) </li></ul></ul></ul><ul><ul><ul><li>Data Providers (repositories) and Service Providers (aggregators) </li></ul></ul></ul><ul><ul><li>Potential for the development of shared services to support repositories (Swan & Awre, Linking UK Repositories, JISC, 2006) </li></ul></ul>
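The harvesting relationship between Data Providers and Service Providers can be illustrated with a minimal sketch, offered as an illustration rather than as anything shown in the presentation. It builds an OAI-PMH `ListRecords` request and pulls Dublin Core titles out of a response. The base URL `https://repository.example.org/oai`, the sample record, and the function names are all assumptions made for the example; a real harvester would also need to handle resumption tokens and error responses.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by unqualified Dublin Core elements in oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for a Data Provider (repository)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def extract_titles(xml_text):
    """Return every dc:title found in an OAI-PMH response document."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("{%s}title" % DC_NS)]


# Hypothetical response from a repository, trimmed to a single record.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example e-thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()

if __name__ == "__main__":
    print(list_records_url("https://repository.example.org/oai"))
    print(extract_titles(SAMPLE_RESPONSE))
```

A Service Provider (aggregator) would issue such requests against many repositories and index the returned metadata for access.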
  15. 15. Institutional repositories (2) <ul><li>Potential shared services identified by Swan & Awre (2006): </li></ul><ul><ul><li>Resource discovery </li></ul></ul><ul><ul><li>Building or hosting repositories </li></ul></ul><ul><ul><li>Advisory services (e.g. on IPR, preservation) </li></ul></ul><ul><ul><li>Content creation, digitisation </li></ul></ul><ul><ul><li>Metadata capture and enhancement </li></ul></ul><ul><ul><li>Name authorities </li></ul></ul><ul><ul><li>Citation analysis and research assessment </li></ul></ul><ul><ul><li>Preservation services </li></ul></ul>
  16. 16. IRs and preservation (1) <ul><li>Shared services for preservation: </li></ul><ul><ul><li>Assumption that not all institutions with repositories will be able to manage long-term preservation challenges, e.g.: </li></ul></ul><ul><ul><ul><li>Lack of local expertise and resources </li></ul></ul></ul><ul><ul><ul><li>Existing availability of third party services, e.g. provided by subject-based data centres, national libraries </li></ul></ul></ul><ul><ul><ul><li>Preservation is a logical area for collaboration </li></ul></ul></ul>
  17. 17. IRs and preservation (2) <ul><li>Examples: </li></ul><ul><ul><li>DARE (Digital Academic Repositories) initiative (Netherlands) </li></ul></ul><ul><ul><ul><li>National Library of the Netherlands (KB) has responsibility for content deposited in participating repositories </li></ul></ul></ul><ul><ul><li>Repository Bridge project (UK) </li></ul></ul><ul><ul><ul><li>Demonstration of harvesting e-theses (using OAI-PMH and METS) by the National Library of Wales </li></ul></ul></ul>
  18. 18. IRs and preservation (3) <ul><li>Examples (continued): </li></ul><ul><ul><li>SHERPA DP project (UK) - JISC funded </li></ul></ul><ul><ul><ul><li>Developed disaggregated framework for outsourcing preservation, based on the OAIS model </li></ul></ul></ul><ul><ul><ul><li>Explored the packaging and transfer of content (using METS) </li></ul></ul></ul>
  19. 19. IRs and preservation (4) <ul><li>Examples (continued): </li></ul><ul><ul><li>Preserv project (UK) - JISC funded </li></ul></ul><ul><ul><ul><li>Simple model of modular services, e.g. for: </li></ul></ul></ul><ul><ul><ul><ul><li>Bit-level preservation </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Object characterisation and validation (e.g. using registries like PRONOM-DROID) </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Preservation Planning (risk assessments, technology watch, etc.) </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Preservation strategies (e.g. migration) </li></ul></ul></ul></ul>
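As a rough illustration of what a bit-level preservation service checks, the sketch below recomputes SHA-256 checksums and compares them with a stored manifest. This is a generic fixity check under assumed names (`sha256_of`, `check_fixity`, and the manifest layout are hypothetical); it is not a description of how the Preserv project implemented its services.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(manifest):
    """Return the paths whose current checksum differs from the manifest.

    `manifest` maps a file path to the hex digest recorded at ingest.
    A missing file is also reported as a failure.
    """
    failures = []
    for path, expected in manifest.items():
        if not Path(path).is_file() or sha256_of(path) != expected:
            failures.append(path)
    return failures
```

Running `check_fixity` on a schedule, and against each replicated copy, is one simple way to detect silent corruption before every copy is affected.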
  20. 20. IRs and preservation (5) Preserv service provider model (Hitchcock et al., 2007)
  21. 21. IRs and collection development (1) <ul><li>Collection development issues for : </li></ul><ul><ul><li>Content types </li></ul></ul><ul><ul><ul><li>Peer-reviewed research outputs, scientific datasets, administrative records, ... </li></ul></ul></ul><ul><ul><ul><li>Will have different preservation priorities </li></ul></ul></ul><ul><ul><li>Object types (file formats) </li></ul></ul><ul><ul><ul><li>Policies will have direct influence on risks (and costs) of long-term preservation, e.g.: </li></ul></ul></ul><ul><ul><ul><ul><li>Accepting anything vs. defining the specific standards to be used </li></ul></ul></ul></ul>
  22. 22. IRs and collection development (2) <ul><ul><li>Ongoing review (and weeding) of collections </li></ul></ul><ul><ul><ul><li>Withdrawal of content (contentious issue) </li></ul></ul></ul><ul><ul><ul><li>Superseded or duplicate material </li></ul></ul></ul><ul><ul><li>Defining preservation service levels </li></ul></ul><ul><ul><ul><li>Different policies needed for different types of material </li></ul></ul></ul>
  23. 23. IRs and collection development (3) <ul><li>Potential areas for collaboration: </li></ul><ul><ul><li>Ingest workflows </li></ul></ul><ul><ul><ul><li>Checking conformance with submission rules </li></ul></ul></ul><ul><ul><ul><li>Automated tools for format characterisation and validation, maybe conversion (normalisation) </li></ul></ul></ul><ul><ul><ul><li>Metadata enhancement, e.g. consistent forms of name </li></ul></ul></ul>
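The ingest checks listed above could be sketched roughly as follows. Format identification here uses a tiny table of file signatures as a stand-in for a registry-backed tool; the accepted-format policy, the required metadata fields, and the function names are all hypothetical choices for the example.

```python
# Leading byte signatures for a few common formats (a very small stand-in
# for a real format registry).
SIGNATURES = {
    b"%PDF-": "application/pdf",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"PK\x03\x04": "application/zip",
}

# Hypothetical submission rules: formats the repository accepts and the
# metadata fields a deposit must carry.
ACCEPTED_FORMATS = {"application/pdf", "image/png"}
REQUIRED_FIELDS = ("title", "creator")


def identify(data):
    """Return a MIME type based on the leading bytes, or None if unknown."""
    for magic, mime in SIGNATURES.items():
        if data.startswith(magic):
            return mime
    return None


def check_submission(data, metadata):
    """Return a list of conformance problems; an empty list means accepted."""
    problems = []
    fmt = identify(data)
    if fmt is None:
        problems.append("unrecognised format")
    elif fmt not in ACCEPTED_FORMATS:
        problems.append("format not accepted: " + fmt)
    for name in REQUIRED_FIELDS:
        if not metadata.get(name, "").strip():
            problems.append("missing metadata: " + name)
    return problems
```

Because the rules are data rather than code, repositories collaborating on ingest could share the checking tool while each keeps its own policy.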
  24. 24. Shared collection development (1) <ul><li>Collection development has been a traditional focus of library co-operation, e.g.: </li></ul><ul><ul><li>Farmington Plan (1940s) </li></ul></ul><ul><ul><li>University of London Depository Library </li></ul></ul><ul><li>The concept of “virtual collections” </li></ul><ul><ul><li>IFLA Universal Availability of Publications (UAP) core programme </li></ul></ul><ul><li>Also applies to digital collections </li></ul><ul><ul><li>OhioLINK </li></ul></ul><ul><ul><li>California Digital Library </li></ul></ul>
  25. 25. Shared collection development (2) <ul><li>Collaborative collection development and digital preservation </li></ul><ul><ul><li>Potentially reducing unnecessary duplication of effort </li></ul></ul><ul><ul><li>Enabling co-ordinated decisions to be made about the redundancy and geographical distribution of content </li></ul></ul><ul><ul><li>Also supporting the application of different preservation strategies to the same class of content </li></ul></ul>
  26. 26. Shared collection development (3) <ul><ul><li>Identifying collections at risk and supporting their rescue </li></ul></ul><ul><li>In order to do these things, it may be useful to have some common understanding of what collection development and appraisal should mean in the digital era </li></ul><ul><ul><li>The main appraisal activities identified by the InterPARES Appraisal Task Force may be useful here </li></ul></ul>
  27. 27. InterPARES appraisal framework (1) <ul><li>1. Compiling information </li></ul><ul><ul><li>Identifying the form and contexts of records </li></ul></ul><ul><ul><li>Identifying the particular components that need preservation </li></ul></ul><ul><ul><li>Based on solid research (not just collecting it together in a haphazard fashion) </li></ul></ul><ul><ul><li>This information could become part of the record’s metadata </li></ul></ul>
  28. 28. InterPARES appraisal framework (2) <ul><li>2. Assessing value </li></ul><ul><ul><li>Judgement based on creator’s needs and societal needs </li></ul></ul><ul><ul><li>May be context dependent (institution specific) </li></ul></ul><ul><ul><ul><li>Assessing continuing value </li></ul></ul></ul><ul><ul><ul><li>Authenticity </li></ul></ul></ul><ul><ul><ul><li>Determining value </li></ul></ul></ul>
  29. 29. InterPARES appraisal framework (3) <ul><li>3. Determining the feasibility of preservation </li></ul><ul><ul><li>Determining value is not enough in itself </li></ul></ul><ul><ul><li>Need also to consider whether the records are able to be preserved as authentic records </li></ul></ul><ul><ul><li>Takes into account the organisational ability to undertake preservation </li></ul></ul><ul><ul><li>Gathers technical information </li></ul></ul><ul><li>4. Making the appraisal decision </li></ul><ul><ul><li>Based on value and feasibility </li></ul></ul><ul><ul><li>All decisions made must be documented </li></ul></ul>
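One way to picture the last two activities is a small record that combines the value assessment with the feasibility assessment and documents the outcome. The sketch below is only an illustration: the class name, field names, and decision wording are hypothetical and are not taken from the InterPARES documentation.

```python
from dataclasses import dataclass, field


@dataclass
class AppraisalRecord:
    """Hypothetical record of one appraisal, kept so the decision is documented."""
    identifier: str
    continuing_value: bool
    authentic_preservation_feasible: bool
    notes: list = field(default_factory=list)

    def decide(self):
        """Combine value and feasibility into a documented decision."""
        reasons = []
        if not self.continuing_value:
            reasons.append("no continuing value identified")
        if not self.authentic_preservation_feasible:
            reasons.append("preservation as an authentic record is not feasible")
        return {
            "identifier": self.identifier,
            "decision": "preserve" if not reasons else "do not preserve",
            "reasons": reasons,
            "notes": list(self.notes),
        }
```

For example, `AppraisalRecord("dataset-42", True, False).decide()` yields a "do not preserve" decision whose recorded reason is the feasibility problem, reflecting the point that determining value is not enough in itself.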
  30. 30. InterPARES appraisal framework (4) <ul><li>A generic framework: as developed has a focus on records, but the general principles, broadly interpreted, could be applied to other forms of content, e.g. scientific datasets, Web content </li></ul><ul><li>Does not presuppose a particular preservation approach </li></ul><ul><li>Encourages a focus on organisational objectives, object contexts, object value, the technical feasibility of preservation, and the determination of “significant properties” </li></ul><ul><li>Helps to document the selection process </li></ul>
  31. 31. Conclusions <ul><li>The use of a consistent set of principles might help to encourage: </li></ul><ul><ul><li>More consistency in documenting selection and appraisal decisions across domains, with benefits for collaboration </li></ul></ul><ul><ul><li>Insight into assessing value and preservation feasibility in specific contexts (like Web archives) </li></ul></ul>