An introduction to digital curation and preservation  Michael Day Digital Curation Centre UKOLN, University of Bath [email_address] Information and Library Management, University of the West of England, Bristol, 24 March 2009  Slides available on SlideShare: http://www.slideshare.net/michaelday
Presentation outline: The DCC digital curation lifecycle Some definitions OAIS concepts Roles and responsibilities Reasons for preserving research data Digital preservation challenges and strategies A taxonomy of research data collections Infrastructures for preservation and curation Some comments on curation and “Open Science”
Learning outcomes A greater awareness of the factors that need to be taken into account when considering how to preserve research data (and other materials) over time A deeper understanding of the preservation options currently available Part of the “digital curation lifecycle” (Digital Curation Centre)
 
Preservation in the curation lifecycle Lifecycle includes: Creation   Appraisal and selection   Ingest   Preservation   Storage   Access, use and reuse   Transformation   Generic tasks: Preservation planning Community watch Metadata (Descriptive Information, Representation Information)
Preservation in the curation lifecycle There are major dependencies on the rest of the curation process The creation stage is normally the best time to ensure that data are fit-for-purpose and “preservable” Need to document both explicit and implicit knowledge, contexts (part of the metadata issue) Preservation Planning informs ingest strategies as well as preservation actions and transformations
Definitions (1) Preservation: A management function “Its objective is to ensure that information survives in usable form for as long as it is wanted” - John Feather (1991) Not  primarily  about: Conservation or restoration Storage media or backup regimes Concepts of “permanence”
Definitions (2) Digital preservation: Digital information is different Technical problems with ensuring continued access (more of this later) But also (primarily) a managerial problem “ ... the planning, resource allocation, and application of preservation methods and technologies to ensure that digital information of continuing value remains accessible and usable” - Margaret Hedstrom (1998)
Definitions (3) Digital curation: General concept (data curation) originates in the scientific data world (e.g. bioinformatics, astronomy) Is used to mean something more than just the preservation of objects "The activity of managing and promoting the use of data from its point of creation, to ensure it is fit for contemporary purpose, and available for discovery and reuse" - Philip Lord, et al. (2004) "Maintaining and adding value to a trusted body of information for current and future use" -- DCC presentation at CNI (2005)
The OAIS reference model Reference Model for an Open Archival Information System (OAIS) Fundamental standard, defines key concepts Development managed by the Consultative Committee for Space Data Systems (CCSDS) CCSDS Blue Book 650.0-B-1 (2002) ISO 14721:2003 Recently reviewed - no major changes proposed Has established a common framework of terms and concepts Information model has been influential on the design of some preservation metadata schemas It is still uncertain what 'conformance' might mean
OAIS mandatory responsibilities Negotiating and accepting information Obtaining sufficient control of the information to ensure long-term preservation Determining the "designated community"  Ensuring that information is  independently understandable , i.e. can be (re)used without the assistance of those who produced it Following documented policies and procedures  Making the preserved information available
OAIS Functional Model (1) Six entities Ingest Archival Storage Data Management Administration Preservation Planning Access Described using UML diagrams
OAIS Functional Model (2) [Figure: OAIS Functional Entities (Figure 4-1). Entities shown: Ingest, Archival Storage, Data Management, Access, Administration, Preservation Planning. External actors: PRODUCER, MANAGEMENT, CONSUMER. Flows labelled: SIP, AIP, DIP, Descriptive info., queries, result sets, orders.]
OAIS Information Model Defines the “Information Packages” required Ingest (Submission Information Package) Storage (Archival Information Package) Access (Dissemination Information Package) General principle of Information Packages: All objects are wrapped in layers of metadata (Representation Information, Descriptive Information, Packaging, etc.)
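The wrapping principle above can be illustrated with a minimal sketch. The class name, field names and the `ingest` and `disseminate` helpers are hypothetical and chosen only for illustration; OAIS is a conceptual model and does not prescribe any data structure or API.

```python
from dataclasses import dataclass, field


@dataclass
class InformationPackage:
    """Hypothetical wrapper: content plus the metadata layers named on the slide."""
    content: bytes
    representation_info: dict = field(default_factory=dict)  # how to interpret the bits
    descriptive_info: dict = field(default_factory=dict)     # how to find the object
    packaging_info: dict = field(default_factory=dict)       # how the parts are bound together
    kind: str = "SIP"


def ingest(sip: InformationPackage, archive_id: str) -> InformationPackage:
    """Turn a SIP into an AIP by adding archive-assigned packaging metadata."""
    return InformationPackage(
        content=sip.content,
        representation_info=dict(sip.representation_info),
        descriptive_info=dict(sip.descriptive_info),
        packaging_info={**sip.packaging_info, "archive_id": archive_id},
        kind="AIP",
    )


def disseminate(aip: InformationPackage) -> InformationPackage:
    """Derive a DIP for a consumer; the stored AIP is left unchanged."""
    return InformationPackage(
        content=aip.content,
        representation_info=dict(aip.representation_info),
        descriptive_info=dict(aip.descriptive_info),
        packaging_info={"derived_from": aip.packaging_info.get("archive_id")},
        kind="DIP",
    )


# Illustrative data only
sip = InformationPackage(
    content=b"temp,depth\n4.1,10\n",
    representation_info={"format": "CSV", "encoding": "ASCII"},
    descriptive_info={"title": "Example sensor readings"},
)
aip = ingest(sip, archive_id="example-0001")
dip = disseminate(aip)
```

In this sketch the content bytes travel unchanged through all three package types, while each stage adds or replaces metadata around them.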
Implementing OAIS Fundamentals: OAIS is a reference model (conceptual framework), NOT a blueprint for system design It informs the design of system architectures, the development of systems and components It provides common definitions of terms … a common language, means of making comparison But it does NOT ensure consistency or interoperability between implementations Conformance only relates to mandatory responsibilities and following information model
Repository audit and certification Building on OAIS concepts ... but focusing on requirements for helping to ensure that repositories meet identified criteria: Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC) Center for Research Libraries, OCLC, NARA, et al. http://www.crl.edu/ DRAMBORA (Digital Repository Audit Method Based on Risk Assessment) Self-assessment tool developed by: Digital Curation Centre, Digital Preservation Europe http://www.repositoryaudit.eu/
Who undertakes preservation? Researchers Indirectly - they have most direct contact with creation stage, and understand how data can be used Directly - sometimes responsible for maintaining community data collections Information professionals Sometimes, but it depends on the context  IT professionals Primarily informaticians working with scientists
Roles and responsibilities (1) Dealing with data (JISC) Scientist Institution Data centre User Funder Publisher Long-lived data collections (NSB) Data authors Data managers Data scientists Data users Funding agencies
Roles and responsibilities (2) Scientists: Initial creation and use of data Expectation of first use and of gaining appropriate credit and recognition Responsible for: Managing data for the life of the project Using standards (where possible) Complying with data policies Making the data available in a form that can (easily?) be used by others
Roles and responsibilities (3) Institutions: Role less clear Institutional policies may require short-term management of data Advocacy and training Some institutions are developing repository services These are currently rarely used for research data Federated approaches maintain disciplinary involvement
Roles and responsibilities (4) Data centres: Undertake curation and provide access Responsible for: Selection and ingest Participating in the development of standards Protecting the rights of data creators Supporting ingest and metadata capture Supporting re-use (tools and services) Training
Roles and responsibilities (5) Users: Users of third-party data Responsible for: Adhering to any licences and restrictions on use Acknowledging data creators and curators Managing any derived data Providing feedback to scientists and data centres
Roles and responsibilities (6) Funding bodies: Acting at policy level Responsible for: Considering wider policy perspectives Developing policies in co-operation with other stakeholders Monitoring and enforcing data policies Support for long-term data management Support for data curation
What is research data? An extremely broad category of material: “... any information that can be stored in digital form, including text, numbers, images, video or movies, audio, software, algorithms, equations, animations, models, simulations, etc.” (National Science Board, Long-lived digital data collections, 2005) In practice, it can mean almost anything
Why curate research data? (1) Part of the normal research process: The need for others to validate and replicate research In some disciplines, supporting data is routinely made available to reviewers and linked from journal papers Principles of sharing and openness are firmly embedded in some disciplines
Why curate research data? (2) Extrinsic and intrinsic value: High investment in research Data can be very expensive to capture and analyse Some data is impossible to recreate once lost Observational data (by definition) is irreplaceable Current generations of instruments can gather more data than can be analysed
Why curate research data? (3) The potential for creating 'new' knowledge from existing data: Re-use, re-analysis, data mining Annotation, e.g. in molecular biology, astronomy Combining datasets in innovative ways, e.g. mapping biodiversity data onto ecological GIS “Science 2.0”
Why curate research data? (4) It is increasingly a requirement of some research funding bodies Some have quite mature data retention policies (not necessarily for permanent retention) Increasing expectation of access to data from publicly-funded research OECD Principles and guidelines for access to research data from public funding (2007)
Why curate research data? (5) Institutional asset management: Universities and other research organisations invest very large sums of money into research activities Research data is a key output of this activity It is, therefore, an institutional asset that needs stewardship
Why curate research data? (6) Promoting the institution, research group or individual: Re-use helps promote visibility and 'impact' Institutions become acknowledged 'centres of competence'
Preservation challenges (1) Media (1) Currently magnetic or optical tape and disks, some devices (e.g., memory sticks) Examples include: CD, DVD (optical), DAT, DLT, laptop hard drives (magnetic) Unknown lifetimes Subject to differences in quality or storage conditions But relatively short lifetimes compared to paper or good quality microform Lifetimes measured in years rather than decades
Preservation challenges (2) Media (2) Technical solutions Longer lasting media: e.g. Norsam's High Density Rosetta system - analogue storage on nickel plates COM (output to good-quality microform) Keeping paper copies! Periodic copying of data bits on to new media (refreshing) - data management solution Principle of active management
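Refreshing, as described above, amounts to copying the bits to a new medium and confirming that nothing changed on the way. The following is a minimal sketch using only the Python standard library; the `refresh` function and the directory names standing in for old and new media are hypothetical.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path


def refresh(source: Path, target: Path) -> bool:
    """Copy source to target and confirm the two files are byte-identical."""
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)
    # shallow=False forces a comparison of file contents, not just file metadata
    return filecmp.cmp(source, target, shallow=False)


# Two temporary directories stand in for the old and new storage media
with tempfile.TemporaryDirectory() as tmp:
    old_medium = Path(tmp) / "old_medium" / "dataset.bin"
    new_medium = Path(tmp) / "new_medium" / "dataset.bin"
    old_medium.parent.mkdir()
    old_medium.write_bytes(b"\x00\x01observations\xff")
    refreshed_ok = refresh(old_medium, new_medium)
```

A real refreshing programme would also need scheduling, logging and handling of failed copies, which is why the slide treats it as a data management task rather than a one-off technical fix.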
Preservation challenges (3) Hardware and software dependence Most digital objects are dependent on particular configurations of hardware and software Relatively short obsolescence cycles for: Hardware Scientific instrumentation, peripherals (e.g. floppy disk drives) Software e.g., word-processing files, CAD
Conceptual problems (1) What is a digital object? Some are analogues of traditional objects, e.g. meeting minutes, research papers Others are not, e.g. Web pages, GIS, 3D models of chemical structures Complexity Dynamic nature
Conceptual problems (2) Three layers: Physical: the bits stored on a particular medium Logical: defines how the bits are used by a software application, based on data types (e.g. ASCII); in order to understand (or preserve) the bits, we need to know how to process this Conceptual: things that we deal with in the real world From: Ken Thibodeau, “Overview of technological approaches to digital preservation and challenges in coming years.” In: The state of digital preservation: an international perspective. CLIR, 2002. http://www.clir.org/
Conceptual problems (3) On which of these layers should preservation activities focus? We need to preserve the ability to reproduce the objects, not just the bits In fact, we can change the bits and logical representation and still reproduce an authentic conceptual object (e.g. converting into PDF)  Authenticity and integrity How can we trust that an object is what it claims to be? Digital information can easily be changed by accident or design
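One common way of detecting accidental or deliberate change at the bit level is to record a cryptographic digest when an object is received and to recompute it later. The sketch below uses SHA-256 from the Python standard library; the function names are hypothetical. Note that a matching digest only shows that the bits are unchanged since the digest was recorded. It says nothing about whether the object was authentic in the first place, and it cannot be used across a format conversion, where the bits change by design.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def verify_fixity(data: bytes, recorded_digest: str) -> bool:
    """Check the data against a digest recorded earlier (e.g. at ingest)."""
    return sha256_digest(data) == recorded_digest


original = b"Meeting minutes, 24 March 2009"
recorded = sha256_digest(original)  # would be stored alongside the object as metadata

unchanged = verify_fixity(original, recorded)
tampered = verify_fixity(original.replace(b"24", b"25"), recorded)
```

Here `unchanged` is true and `tampered` is false, because altering even one character of the content produces a different digest.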
Some general principles (1)  Most of the technical problems associated with long-term digital preservation can be solved if a life-cycle management approach is adopted  i.e. a continual programme of active management Ideally, combines both managerial and technical processes, e.g., as in the OAIS Model Many current systems are attempting to support this approach Preservation strategies need to be seen in this wider context Preservation needs to be considered at a very early stage in an object's life-cycle
Some general principles (2) There is a need to identify 'significant properties' Recognises that preservation is context dependent Helps with choosing an acceptable preservation strategy Consider encapsulation Surrounding the digital object - at least conceptually - with all of the information needed to decode and understand it (including software) Produces autonomous 'self-describing' objects, reduces external dependencies (linked to the Information Package concept in the OAIS Reference Model) Keep the original byte-stream
Some general principles (3) Metadata and documentation are vitally important Relates to OAIS concepts like Representation Information and Preservation Description Information Functions Records scientific meaning Records the research context Enables the development of finding aids Standards are being developed that support digital preservation activities (e.g., the PREMIS Data Dictionary)
Digital preservation strategies Three main families: Technology preservation Technology emulation Information migration Also: Digital archaeology (rescue)
Technology preservation The preservation of an information object together with all of the hardware and software needed to interpret it Successfully preserves the look, feel and behaviour of the whole system (at least while the hardware and software still functions) May have a role for historically important hardware Severe problems with storage and ongoing maintenance, missing documentation Would inevitably lead to 'museums' of “ageing and incompatible computer hardware” -- Mary Feeney May have a shorter-term role for supporting the rescue of digital objects (digital archaeology)
Technology emulation (1) Preserving the original bit-streams and application software; running this on emulator programs that mimic the behaviour of obsolete hardware Emulators change over time Chaining, rehosting Emulation Virtual Machines Running emulators on simplified 'virtual machines' that can be run on a range of different platforms Virtual machines are migrated so the original bit-streams do not have to be
Technology emulation (2) Benefits: Technique already widely used, e.g. for emulating different hardware, computer games Preserves (and uses) the original bits Reduces the need for regular object transformations (but emulators and virtual machines may themselves need to be migrated) Retains ‘look-and-feel’ May be the only approach possible where objects are complex or dependent on executable code Less 'understanding' of formats is needed; little incremental cost in keeping additional formats
Technology emulation (3) Challenges: Do organisations have the technical skills necessary to implement the strategy? Preserving 'look and feel' may not be needed for all objects It will be difficult to know definitively whether user experience has been accurately preserved Conclusions: Promising family of approaches Needs further practical application and research, e.g. Dioscuri software (National Library of the Netherlands (KB), Nationaal Archief and Planets project)
Information migration (1) Managed transformations: A set of organised tasks designed to achieve the periodic transfer of digital information from one hardware and software configuration to another, or from one generation of computer technology to a subsequent one - CPA/RLG report (1996) Abandons attempts to keep old technology (or substitutes for it) working A 'known' solution used by data archives and software vendors (e.g., a linear migration strategy is used by software vendors for some data types, e.g. Microsoft Office files) Focuses on the content (or properties) of objects
Information migration (2) Main types (from OAIS Model): Refreshment Replication Repackaging Transformation Challenges: Labour intensive There can be problems with ensuring the 'integrity and authenticity' of objects Transformations need to be documented (part of the preservation metadata)
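The point that transformations need to be documented can be sketched as follows: a migration function returns both the new byte-stream and a record of what was done. The example converts text from ISO-8859-1 to UTF-8. The function name and the keys of the event record are hypothetical and are not taken from PREMIS or any other schema.

```python
import hashlib
from datetime import datetime, timezone


def migrate_latin1_to_utf8(original: bytes) -> tuple[bytes, dict]:
    """Re-encode ISO-8859-1 text as UTF-8 and describe the transformation."""
    text = original.decode("latin-1")
    migrated = text.encode("utf-8")
    event = {
        "event_type": "transformation",
        "date": datetime.now(timezone.utc).isoformat(),
        "source_encoding": "ISO-8859-1",
        "target_encoding": "UTF-8",
        # Digests tie the record to the exact byte-streams before and after
        "source_sha256": hashlib.sha256(original).hexdigest(),
        "target_sha256": hashlib.sha256(migrated).hexdigest(),
    }
    return migrated, event


original = "Données de terrain".encode("latin-1")
migrated, event = migrate_latin1_to_utf8(original)
```

The bits differ after migration, so the two digests differ, but the decoded text is the same. Keeping the event record (and ideally the original byte-stream) lets a later user judge whether the conceptual object has been preserved.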
Information migration (3) Uses: Seems to be most suitable for dealing with large collections of similar objects Migration can often be combined with some form of  standardisation process, e.g., on ingest ASCII Bit-mapped-page images Well-defined XML formats Some variations: migration on Request (CAMiLEON project) Keep original bits, migrate the rendering tools
Digital archaeology Not so much a preservation strategy, but the default situation if there isn't one Using various techniques to recover digital content from obsolete or damaged physical objects (media, hardware, etc.) A time consuming process, needs specialised equipment and (in most cases) adequate documentation Considered to be expensive (and risky) Remains an option for content deemed to be of value
Choosing a strategy Preservation strategies are not in competition (different strategies will work together) It has been suggested that we should keep the original bits (with some documentation) in any case But the strategy chosen has implications for: The technical infrastructure required (and metadata) Collection management priorities Rights management, e.g., owning the rights to re-engineer software Costs Planets project - PLATO preservation planning tool Decision support tool
File formats and preservation Formats can be identified and validated at ingest JHOVE, PRONOM-DROID Standardisation on ingest Perceived wisdom suggests the adoption of open or non-proprietary standards, e.g. databases structured in XML, uncompressed images However, we need more empirical data on how robust some of these standards are to random bit-rot
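Format identification at ingest often starts from the leading bytes of a file (its "magic number"). The sketch below is a deliberately simplified illustration of that idea and is far simpler than what tools such as JHOVE or DROID do; the `SIGNATURES` table and the `identify` function are hypothetical, although the byte signatures listed are the standard ones for these formats.

```python
# (leading bytes, format name) pairs; a tiny illustrative subset
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
]


def identify(data: bytes) -> str:
    """Guess a format from the leading bytes; 'unknown' if nothing matches."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


pdf_guess = identify(b"%PDF-1.4\n...")
png_guess = identify(b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR")
other_guess = identify(b"plain text with no signature")
```

Identification of this kind only says what a file claims to be. Validation, i.e. checking that the whole file conforms to the format specification, is a separate and harder step.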
Rescue of BBC Domesday (1) Case Study: BBC Domesday project (1986) Commemorated the 900th Anniversary of the original Domesday survey Two interactive videodiscs (12") Mixture of textual material (some produced by schools), maps, statistical data, images and video Technical basis: Hardware: BBC Master Series microcomputer and Philips Laservision (LV-ROM) player Some software in ROM chip, others on the discs System obsolete by end of 1990s; working hardware becoming more difficult to find
Rescue of BBC Domesday (2) CAMiLEON project Proof of concept for the emulation approach Converted data into media-neutral form Adapted an existing emulator for the BBC microcomputer to render Domesday content The National Archives (and partners) Reengineered the whole system for use on Windows PCs Digital versions of images and video converted from original master tapes (still held by BBC) Developed an improved interface Web version: http://domesday1986.com/
 
 
 
Other preservation challenges Scale (1): The “digital deluge” e-Science New generations of instruments Computer  simulations Many terabytes generated per day, petabyte scale computing (and growing) Cory Doctorow, “Welcome to the petacentre.” Nature, 455, pp 17-21, 4 Sep 2008
Other preservation challenges Scale (2): Problems of scale are particularly acute in traditional 'big-science' disciplines: Particle physics (e.g., the Large Hadron Collider) Astronomy (sky surveys, etc) But “smaller experiments will grow the fastest” (Szalay & Gray, Nature, 440, 413-4, 23 Mar 2006) Bioinformatics, crystallography, engineering design, and many others In some cases it may be cheaper just to generate the data again, e.g. for computer simulations
Other preservation challenges Complexity (1) Research data is extremely diverse - not really a single category of material tabular data, images, GIS, etc. raw machine output vs. derived data varying levels of structure (XML, legacy formats, etc.) many different standards Research data is not homogeneous No one-size-fits-all approach possible
Other preservation challenges Complexity (2): Even wider range of social contexts in which data is used (and shared) DCC SCARP project has been exploring disciplinary factors in curation practice Practice even within single disciplines is very fragmented Case studies ongoing Big-science archives, medical and social sciences, architecture and engineering, biological images
Other preservation challenges Diverse research cultures Data practices vary widely, even within a single discipline Gene sequence data is typically deposited in public databases In proteomics sharing is not so widespread; partly driven by lack of standards, but there is also concern about who has exploitation rights Role of commercial interests Pharmaceuticals, architecture and engineering, geological prospecting
Other preservation challenges Costs Recent JISC study (2008) - focusing on the institution level Some findings: The complex service requirements for curating research data mean that institutions are setting up federated approaches to repository development Currently ingest costs are much higher than long-term storage and preservation costs Start-up (and R&D) costs are high, but there can be economies of scale
Research data collections (1) A typology (1): From National Science Board report Long-lived digital data collections (2005) Research data collections – the products of one or more focused research projects Resource or community data collections – collections that emerge to serve particular subject sub-disciplines Reference data collections – serve a broader and more diverse set of user communities
Research data collections (2) Data in “research data collections” is most at risk A modern version of the “file-drawer problem” Data stored on personal hard-drives or on media; largely undocumented Particular challenge when the data creator has retired or moved to another institution Data creators not always aware of its potential value The reward structure of science is not always helpful
Curation infrastructures (1) Focus on the generic: Need for a balance between: The 'bottom-up' discipline-based drivers that promote the generation of research data The policy level, looking to make cost effective investment in curation When building Infrastructures, focus on the generic Storage systems and middleware Preservation services Identifying the needs of the wider community
Curation infrastructures (2) The need for collaboration: Need for 'deep-infrastructure' recognised as far back as 1996 by the Task Force on Archiving of Digital Information Digital preservation involves the "grander problem of organizing ourselves over time and as a society ... [to manoeuvre] effectively in a digital landscape" (p. 7)
Summing-up Long-term preservation of digital research data (and other types of object) is a big ongoing challenge Solutions are normally based on the active management of data Decisions needed on whether to adopt standard formats, the identification of “significant properties,” preservation planning Research disciplines and sub-disciplines are at different stages of maturity
The Future ... “It is always a mistake for a historian to try and predict the future. Life, unlike science, is simply too full of surprises” - Richard J. Evans, In defence of history (1997, p. 62)
Readings (1) Neil Beagrie and Maggie Jones, Preservation Management of Digital Materials: a Handbook (2001). Updated version available at: http://www.dpconline.org/ Council on Library and Information Resources, Building a National Strategy for Preservation: Issues in Digital Media Archiving (April 2002) http://www.clir.org/pubs/abstract/pub106abst.html Council on Library and Information Resources, The state of digital preservation: an international perspective (July 2002) http://www.clir.org/pubs/abstract/pub107abst.html Margaret Hedstrom, It's about time: research challenges in digital archiving and long-term preservation (2003) http://www.digitalpreservation.gov/ Margaret Hedstrom and Seamus Ross, Invest to save: report and recommendations of the NSF-DELOS Working Group on Digital Archiving and Preservation (2003) http://eprints.erpanet.org/archive/00000095/
Readings (2) Philip Lord and Alison Macdonald, Data curation for e-Science in the UK: an audit to establish requirements for future curation and provision (2003) http://www.jisc.ac.uk/ Helen R. Tibbo, "On the nature and importance of archiving in the digital age." Advances in Computers 57 (2003): 1-67. Brian Lavoie and Lorcan Dempsey, "Thirteen Ways of Looking at ... Digital Preservation." D-Lib Magazine 10, no. 7/8 (July/August 2004) http://www.dlib.org/dlib/july04/lavoie/07lavoie.html National Science Board, Long-lived digital data collections: enabling research and education in the 21st century (2005) http://www.nsf.gov/pubs/2005/nsb0540/ DCC Digital Curation Manual (2005- ) http://www.dcc.ac.uk/resource/curation-manual/chapters/ Christine L. Borgman, Scholarship in the digital age: Information, infrastructure, and the Internet (Cambridge, MA: MIT Press, 2007) Murtha Baca (Ed.), Introduction to metadata, v 3.0 (Los Angeles, CA: Getty Publications, 2008) http://www.getty.edu/research/conducting_research/standards/intrometadata/
Curation and “open science”
The UK research context (1) Dual-support funding system Splits funding of research from infrastructure Research Councils (around EUR 4 billion pa) Higher education funding bodies Direct institutional support Joint Information Systems Committee (JISC) Data curation on the agenda of several of these Research Councils UK Higher Education Funding Council for England National research data service study JISC
The UK research context (2) JISC has been very active in funding work on long-term digital preservation and curation: Research projects over ten years (a major recent focus has been on institutional repositories) Supporting studies Dealing with Data (2007) Keeping Research Data Safe (2008) Studies of 'significant properties' of certain classes of content (ongoing) The Digital Curation Centre (DCC)
The Digital Curation Centre (DCC) Launched in 2004 Initial grant funding from: Joint Information Systems Committee (JISC) UK e-Science Core Programme (Engineering and Physical Sciences Research Council) Main activities: Development, services and outreach in digital curation Research programme (2004-2008) Consortium of four institutions Now in second phase
Curation, not just preservation Active management of data over life-cycle of scholarly and scientific interest Reproducibility and reuse Appreciation of differences between disciplines Explored in separate DCC SCARP project Big-science / small-science distinctions are becoming blurred Importance of lifecycles Conception, creation, use, re-use Curation potentially involves a lifetime of endeavour
DCC Curation Lifecycle Model
DCC vision Centre of excellence in digital curation and preservation in the UK Authoritative source of advocacy and expert advice and guidance to the community Key facilitator of an informed research community with established collaborative networks of digital curators Service provider of a wide range of resources, software, tools and support services
Selected DCC activities and outputs User services Curation Lifecycle Model Curation manual and briefing papers Tools for repository self-assessment (DRAMBORA) Community Development Website, journal (IJDC) Events (regular workshops/training, annual international conference) Liaison with JISC's repositories activities Tools and infrastructure Representation Information registries
Problem 1: who 'owns' curation? Many potential stakeholders Dealing with Data  report (2007) identified: scientists, institutions, data centres, the users of data, funding bodies and publishers Also ... data scientists, curation specialists Different repository types (project-specific, community-driven, reference collections) The potential for duplication of effort and confusion is high All of these probably have some kind of role ... so how do we co-ordinate?
Problem 2: institutions v disciplines A major focus in UK is on the institutional role in curation: Building on the Institutional Repository paradigm It is not clear, however, that the curation of  data  is best performed at this level Keeping Research Data Safe  (2008) report notes that data is more often dealt with by discipline-based consortia Bottom-up approaches to curation work well in some domains – but not in all Need to understand domain differences Initial SCARP studies reveal much complexity
Problem 3: how much will it cost? Keeping Research Data Safe (2008): Report (with case studies) focused on identifying costs at the institutional level Some findings: The complex service requirements for curating research data mean that institutions are setting up federated approaches to repository development Currently ingest costs are much higher than long-term storage and preservation costs Start-up (and R&D) costs are high for first adopters
What is needed for open science? Some challenges: 1. Being open is not enough Data need to be made available in ways that facilitate high-throughput reuse e.g., Peter Murray-Rust's comments on the amount of chemistry data captured in formats like PDF 2. How do we capture the context(s) of research? Not just papers and data, but Web-sites, annotation services, blogs, wikis, etc. Importance of recording provenance
What is needed for open science? 3. Current scientific reward structures do not support  either  data curation  or  open science Funding bodies can 'mandate' (and in some cases fund) Principal Investigators  to maintain data and make it available Without a sustainable infrastructure, however, this will be only a short term solution We need to decide what infrastructure we need and how we pay for it
What is needed for open science? 4. What will be the role of institutions? They have traditionally had an important role (e.g., research libraries) Currently are major supporters (and hosts) of Institutional Repositories Potential skills gap WRT data: We need to think about the status and skills of data curators (capacity building) DCC Curation 101, DigCCurr project What does the 'institution' mean in Open Science anyway? Open Notebook Science, open grant proposals, loyalty to collaborators or to institution
Summing up There are still many more questions than answers There is a (widely acknowledged) need for better co-ordination: The curation landscape is currently very fragmented, with no real clarity with regard to identifying (and owning) roles and responsibilities Much is specific to particular domains There is a need for infrastructure But what should this include? Are we really able to identify  generic  needs?
Further reading National Science Board, Long-lived digital data collections: enabling research and education in the 21st century (NSF, 2005) http://www.nsf.gov/pubs/2005/nsb0540/ Liz Lyon, Dealing with data: roles, rights, responsibilities and relationships (JISC, 2007) http://www.jisc.ac.uk/whatwedo/programmes/digitalrepositories2005/dealingwithdata.aspx Neil Beagrie, Julia Chruszcz, and Brian Lavoie, Keeping research data safe: a cost model and guidance for UK universities (JISC, 2008) http://www.jisc.ac.uk/publications/publications/keepingresearchdatasafe.aspx
Thank you for your attention! “Pigabyte” King Bladud’s Pigs in Bath (public art project), Summer 2008 http://www.kingbladudspigs.org/
Acknowledgments UKOLN is funded by the Museums, Libraries and Archives Council (MLA), the Joint Information Systems Committee (JISC) of the UK higher and further education funding councils, as well as by project funding from the JISC, the European Union, and other sources. UKOLN also receives support from the University of Bath, where it is based. More information: http://www.ukoln.ac.uk/

  • 9.
    Definitions (3) Digitalcuration: General concept (data curation) originates in the scientific data world (e.g. bioinformatics, astronomy) Is used to mean something more than just the preservation of objects "The activity of managing and promoting the use of data from its point of creation, to ensure it is fit for contemporary purpose, and available for discovery and reuse" - Philip Lord, et al. (2004) "Maintaining and adding value to a trusted body of information for current and future use" -- DCC presentation at CNI (2005)
  • 10.
    The OAIS referencemodel Reference Model for an Open Archival Information System (OAIS) Fundamental standard, defines key concepts Development managed by the Consultative Committee on Space Data Systems (CCSDS) CCSDS Blue Book 650.0-B-1 (2002) ISO 14721:2003 Recently reviewed - no major changes proposed Has established a common framework of terms and concepts Information model has been influential on the design of some preservation metadata schemas It is still uncertain what 'conformance' might mean
  • 11.
    OAIS mandatory responsibilitiesNegotiating and accepting information Obtaining sufficient control of the information to ensure long-term preservation Determining the "designated community" Ensuring that information is independently understandable , i.e. can be (re)used without the assistance of those who produced it Following documented policies and procedures Making the preserved information available
  • 12.
    OAIS Functional Model(1) Six entities Ingest Archival Storage Data Management Administration Preservation Planning Access Described using UML diagrams
  • 13.
    OAIS Functional Model(2) Administration Ingest Archival Storage Access Data Management Descriptive info. PRODUCER CONSUMER MANAGEMENT queries result sets Descriptive info. Preservation Planning orders OAIS Functional Entities (Figure 4-1) SIP SIP SIP DIP DIP AIP AIP
  • 14.
    OAIS Information ModelDefines the “Information Packages” required Ingest (Submission Information Package) Storage (Archival Information Package) Access (Dissemination Information Package) General principle of Information Packages: All objects are wrapped in layers of metadata (Representation Information, Descriptive Information, Packaging, etc.)
  • 15.
    Implementing OAIS Fundamentals:OAIS is a reference model (conceptual framework), NOT a blueprint for system design It informs the design of system architectures, the development of systems and components It provides common definitions of terms … a common language, means of making comparison But it does NOT ensure consistency or interoperability between implementations Conformance only relates to mandatory responsibilities and following information model
  • 16.
    Repository audit andcertification Building on OAIS concepts ... but focusing on requirements for helping to ensure that repositories meet identified criteria: Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC) Center for Research Libraries, OCLC, NARA, et al . http://www.crl.edu/ DRAMBORA (Digital Repository Audit Method Based on Risk Assessment) Self-assessment tool developed by: Digital Curation Centre, Digital Preservation Europe http://www.repositoryaudit.eu/
  • 17.
    Who undertakes preservation?Researchers Indirectly - they have most direct contact with creation stage, and understand how data can be used Directly - sometimes responsible for maintaining community data collections Information professionals Sometimes, but it depends on the context IT professionals Primarily informaticians working with scientists
  • 18.
    Roles and responsibilities(1) Dealing with data (JISC) Scientist Institution Data centre User Funder Publisher Long-lived data collections (NSB) Data authors Data managers Data scientists Data users Funding agencies
  • 19.
    Roles and responsibilities(2) Scientists Initial creation and use of data Expectation of first use and in gaining appropriate credit and recognition Responsible for: Managing data for life of project For using standards (where possible) For complying with data policies For making the data available in a form that can (easily?) be used by others
  • 20.
    Roles and responsibilities(3) Institutions: Role less clear Institutional policies may require short-term management of data Advocacy and training Some institutions are developing repository services Are rarely currently used for research data Federated approaches maintain disciplinary involvement
  • 21.
    Roles and responsibilities(3) Data centres Undertakes curation and provides access Responsible for: Selection and ingest Participating in the development of standards Protecting the rights of data creators Supporting ingest and metadata capture Supporting re-use (tools and services) Training
  • 22.
    Roles and responsibilities(4) Users: Users of third-party data Responsible for: Adhering to any licenses and restrictions on use Acknowledging data creators and curators Managing any derived data Provide feedback to scientists and data centres
  • 23.
    Roles and responsibilities(5) Funding bodies: Acting at policy level Responsible for: Considering wider policy perspectives Developing policies in co-operation with other stakeholders Monitoring and enforcing data policies Support for long-term data management Support for data curation
  • 24.
    What is researchdata? An extremely broad category of material: “... any information that can be stored in digital form, including text, numbers, images, video or movies, audio, software, algorithms, equations, animations, models, simulations, etc.” (National Science Board, Long-lived digital data collections, 2005) In practice, it can mean almost anything
  • 25.
    Why curate researchdata? (1) Part of the normal research process: The need for others to validate and replicate research In some disciplines, supporting data is routinely made available to reviewers and linked from journal papers Principles of sharing and openness are firmly embedded in some disciplines
  • 26.
    Why curate researchdata? (2) Extrinsic and intrinsic value; High investment in research Data can be very expensive to capture and analyse Data is impossible to recreate once lost Observational data (by definition) is irreplaceable Current generations of instruments can gather more data than can be analysed
  • 27.
    Why curate researchdata? (3) The potential for creating 'new' knowledge from existing data: Re-use, re-analysis, data mining Annotation, e.g. in molecular biology astronomy Combining datasets in innovative ways, e.g. mapping biodiversity data onto ecological GIS “Science 2.0”
  • 28.
    Why curate researchdata? (4) It is increasingly a requirement of some research funding bodies Some have quite mature data retention policies (not necessarily for permanent retention) Increasing expectation of access to data from publicly-funded research OECD Principles and guidelines for access to research data from public funding (2007)
  • 29.
    Why curate researchdata? (5) Institutional asset management: Universities and other research organisations invest very large sums of money into research activities Research data is a key output of this activity It is, therefore, an institutional asset that needs stewardship
  • 30.
    Why curate researchdata? (6) Promoting the institution, research group or individual: Re-use helps promote visibility and 'impact' Institutions become acknowledged 'centres of competence'
  • 31.
    Preservation challenges (1)Media (1) Currently magnetic or optical tape and disks, some devices (e.g., memory sticks) Examples include: CD, DVD (optical), DAT, DLT, laptop hard drives (magnetic) Unknown lifetimes Subject to differences in quality or storage conditions But relatively short lifetimes compared to paper or good quality microform Lifetimes measured in years rather than decades
  • 32.
    Preservation challenges (2)Media (2) Technical solutions Longer lasting media: e.g. Norsam's High Density Rosetta system - analogue storage on nickel plates COM (output to good-quality microform) Keeping paper copies! Periodic copying of data bits on to new media (refreshing) - data management solution Principle of active management
  • 33.
    Preservation challenges (3)Hardware and software dependence Most digital objects are dependent on particular configurations of hardware and software Relatively short obsolescence cycles for: Hardware Scientific instrumentation, peripherals (e.g. floppy disk drives) Software e.g., word-processing files, CAD
  • 34.
    Conceptual problems (1)What is an digital object? Some are analogues of traditional objects, e.g. meeting minutes, research papers Others are not, e.g. Web pages, GIS, 3D models of chemical structures Complexity Dynamic nature
  • 35.
    Conceptual problems (2)Three layers: Physical: the bits stored on a particular medium Logical: defines how the bits are used by a software application, based on data types (e.g. ASCII); in order to understand (or preserve) the bits, we need to know how to process this Conceptual: things that we deal with in the real world From: Ken Thibodeau, “Overview of technological approaches to digital preservation and challenges in coming years.” In: The state of digital preservation: an international perspective. CLIR, 2002. http://www.clir.org/
  • 36.
    Conceptual problems (3)On which of these layers should preservation activities focus? We need to preserve the ability to reproduce the objects, not just the bits In fact, we can change the bits and logical representation and still reproduce an authentic conceptual object (e.g. converting into PDF) Authenticity and integrity How can we trust that an object is what it claims to be? Digital information can easily be changed by accident or design
  • 37.
    Some general principles(1) Most of the technical problems associated with long-term digital preservation can be solved if a life-cycle management approach is adopted i.e. a continual programme of active management Ideally, combines both managerial and technical processes, e.g., as in the OAIS Model Many current systems are attempting to support this approach Preservation strategies need to be seen in this wider context Preservation needs to be considered at a very early stage in an object's life-cycle
  • 38.
    Some general principles(2) There is a need to identify 'significant properties' Recognises that preservation is context dependent Helps with choosing an acceptable preservation strategy Consider encapsulation Surrounding the digital object - at least conceptually - with all of the information needed to decode and understand it (including software) Produces autonomous 'self-describing' objects, reduces external dependencies (linked to the Information Package concept in the OAIS Reference Model) Keep the original byte-stream
  • 39.
    Some general principles(3) Metadata and documentation is vitally important Relates to the OAIS concepts like Representation Information and Preservation Description Information Functions Records scientific meaning Records the research context Enables the development of finding aids Standards are being developed that support digital preservation activities (e.g., the PREMIS Data Dictionary)
  • 40.
    Digital preservation strategiesThree main families: Technology preservation Technology emulation Information migration Also: Digital archaeology (rescue)
  • 41.
    Technology preservation Thepreservation of an information object together with all of the hardware and software needed to interpret it Successfully preserves the look, feel and behaviour of the whole system (at least while the hardware and software still functions) May have a role for historically important hardware Severe problems with storage and ongoing maintenance, missing documentation Would inevitably lead to 'museums' of “ageing and incompatible computer hardware” -- Mary Feeney May have a shorter-term role for supporting the rescue of digital objects (digital archaeology)
  • 42.
    Technology emulation (1)Preserving the original bit-streams and application software; running this on emulator programs that mimic the behaviour of obsolete hardware Emulators change over time Chaining, rehosting Emulation Virtual Machines Running emulators on simplified 'virtual machines' that can be run on a range of different platforms Virtual machines are migrated so the original bit-streams do not have to be
  • 43.
    Technology emulation (2)Benefits: Technique already widely used, e.g. for emulating different hardware, computer games Preserves (and uses) the original bits Reduces the need for regular object transformations (but emulators and virtual machines may themselves need to be migrated) Retains ‘look-and-feel’ May be the only approach possible where objects are complex or dependent on executable code Less 'understanding' of formats is needed; little incremental cost in keeping additional formats
  • 44.
    Technology emulation (3)Challenges: Do organisations have the technical skills necessary to implement the strategy? Preserving 'look and feel' may not be needed for all objects It will be difficult to know definitively whether user experience has been accurately preserved Conclusions: Promising family of approaches Needs further practical application and research, e.g. Dioscuri software (National Library of the Netherlands (KB), Nationaal Archief and Planets project)
  • 45.
    Information migration (1)Managed transformations: A set of organised tasks designed to achieve the periodic transfer of digital information from one hardware and software configuration to another, or from one generation of computer technology to a subsequent one - CPA/RLG report (1996) Abandons attempts to keep old technology (or substitutes for it) working A 'known' solution used by data archives and software vendors (e.g., a linear migration strategy is used by software vendors for some data types, e.g. Microsoft Office files) Focuses on the content (or properties) of objects
  • 46.
    Information migration (2)Main types (from OAIS Model): Refreshment Replication Repackaging Transformation Challenges: Labour intensive There can be problems with ensuring the 'integrity and authenticity' of objects Transformations need to be documented (part of the preservation metadata)
  • 47.
    Information migration (3)Uses: Seems to be most suitable for dealing with large collections of similar objects Migration can often be combined with some form of standardisation process, e.g., on ingest ASCII Bit-mapped-page images Well-defined XML formats Some variations: migration on Request (CAMiLEON project) Keep original bits, migrate the rendering tools
  • 48.
    Digital archaeology Notso much a preservation strategy, but the default situation if there isn't one Using various techniques to recover digital content from obsolete or damaged physical objects (media, hardware, etc.) A time consuming process, needs specialised equipment and (in most cases) adequate documentation Considered to be expensive (and risky) Remains an option for content deemed to be of value
  • 49.
    Choosing a strategyPreservation strategies are not in competition (different strategies will work together) It has been suggested that we should keep the original bits (with some documentation) in any case But the strategy chosen has implications for: The technical infrastructure required (and metadata) Collection management priorities Rights management e.g, Owning the rights to re-engineer software Costs Planets project - PLATO preservation planning tool Decision support tool
  • 50.
    File formats andpreservation Formats can be identified and validated at ingest JHOVE, PRONOM-DROID Standardisation on ingest Perceived wisdom suggests the adoption of open or non-proprietary standards, e.g. databases structured in XML, uncompressed images However, we need more empirical data on how robust some of these standards are to random bit-rot
  • 51.
    Rescue of BBCDomesday (1) Case Study: BBC Domesday project (1986) Commemorated the 900th Anniversary of the original Domesday survey Two interactive videodiscs (12") Mixture of textual material (some produced by schools), maps, statistical data, images and video Technical basis: Hardware: BBC Master Series microcomputer and Philips Laservision (LV-ROM) player Some software in ROM chip, others on the discs System obsolete by end of 1990s; working hardware becoming more difficult to find
  • 52.
    Rescue of BBCDomesday (2) CAMiLEON project Proof of concept for the emulation approach Converted data into media-neutral form Adapted an existing emulator for the BBC microcomputer to render Domesday content The National Archives (and partners) Reengineered the whole system for use on Windows PCs Digital versions of images and video converted from original master tapes (still held by BBC) Developed an improved interface Web version: http://domesday1986.com/
  • 53.
  • 54.
  • 55.
  • 56.
    Other preservation challengesScale (1): The “digital deluge” e-Science New generations of instruments Computer simulations Many terabytes generated per day, petabyte scale computing (and growing) Cory Doctorow, “Welcome to the petacentre.” Nature, 455, pp 17-21, 4 Sep 2008
  • 57.
    Other preservation challengesScale (2): Problems of scale are particularly acute in traditional 'big-science' disciplines: Particle physics (e.g., the Large Hadron Collider) Astronomy (sky surveys, etc) But “smaller experiments will grow the fastest” (Szalay & Gray, Nature, 440, 413-4, 23 Mar 2006) Bioinformatics, crystallography, engineering design, and many others In some cases it may be cheaper just to generate the data again, e.g. for computer simulations
  • 58.
    Other preservation challengesComplexity (1) Research data is extremely diverse - not really a single category of material tabular data, images, GIS, etc. raw machine output vs, derived data varying levels of structure (XML, legacy formats, etc.) many different standards Research data is not homogeneous No one-size-fits-all approach possible
  • 59.
    Other preservation challengesComplexity (2): Even wider range of social contexts in which data is used (and shared) DCC SCARP project has been exploring disciplinary factors in curation practice Practice even within single disciplines is very fragmented Case studies ongoing Big-science archives, medical and social sciences, architecutre and engineering, biological images
  • 60.
    Other preservation challengesDiverse research cultures Data practices vary widely, even within a single discipline Gene sequence data is typically deposited in public databases In proteomics sharing is not so widespread; partly driven by lack of standards, but there is also concern about who have exploitation rights Role of commercial interests Pharmaceuticals, architecture and engineering, geological prospecting
  • 61.
    Other preservation challengesCosts Recent JISC study (2008) - focusing on the institution level Some findings: The complex service requirements for curating research data means that institutions are setting-up federated approaches to repository development Currently ingest costs are much higher than long-term storage and preservation costs Start-up (and R&D) costs are high, but there can be economies of scale
  • 62.
    Research data collections(1) A typology (1): From National Science Board report Long-lived digital data collections (2005) Research data collections – the products of one or more focused research projects Resource or community data collections – collections that emerge to serve particular subject sub-disciplines Reference data collections – serve a broader and more diverse set of user communities
  • 63.
    Research data collections(2) Data in “research data collections” is most at risk A modern version of the “file-drawer problem” Data stored on personal hard-drives or on media; largely undocumented Particular challenge when the data creator has retired or moved to another institution Data creators not aways aware of its potential value The reward structure of science is not always helpful
  • 64.
    Curation infrastructures (1)Focus on the generic: Need for a balance between: The 'bottom-up' discipline-based drivers that promote the generation of research data The policy level, looking to make cost effective investment in curation When building Infrastructures, focus on the generic Storage systems and middleware Preservation services Identifying the needs of the wider community
  • 65.
    Curation infrastructures (2)The need for collaboration: Need for 'deep-infrastructure' recognised as far back as 1996 by the Task Force on Archiving of Digital Information Digital preservation involves the "grander problem of organizing ourselves over time and as a society ... [to manoeuvre] effectively in a digital landscape" (p. 7)
  • 66.
    Summing-up Long-term preservationof digital research data (and other types of object) is a big ongoing challenge Solutions are normally based on the active management of data Decisions needed on whether to adopt standard formats, the identification of “significant properties,” preservation planning Research disciplines and sub-disciplines are at different stages of maturity
  • 67.
    The Future ...“It is always a mistake for a historian to try and predict the future. Life, unlike science, is simply too full of surprises” - Richard J. Evans, In defence of history (1997, p. 62)
  • 68.
    Readings (1) Neil Beagrie and Maggie Jones , Preservation Management of Digital Materials: a Handbook (2001). Updated version available at: http://www.dpconline.org/ Council on Library and Information Resources, Building a National Strategy for Preservation: Issues in Digital Media Archiving (April 2002) http://www.clir.org/pubs/abstract/pub106abst.html Council on Library and Information Resources , The state of digital preservation: an international perspective (July 2002) http://www.clir.org/pubs/abstract/pub107abst.html Margaret Hedstrom , It's about time: research challenges in digital archiving and long-term preservation (2003) http://www.digitalpreservation.gov / Margaret Hedstrom and Seamus Ross, Invest to save: report and recommendations of the NSF-DELOS Working Group on Digital Archiving and Preservation (2003) http://eprints.erpanet.org/archive/00000095/
  • 69.
    Readings (2) Philip Lord and Alison Macdonald , Data curation for e-Science in the UK: an audit to establish requirements for future curation and provision (2003) http://www.jisc.ac.uk/ Helen R. Tibbo, "On the nature and importance of archiving in the digital age." Advances in Computers 57 (2003): 1-67. Brian Lavoie and Lorcan Dempsey, "Thirteen Ways of Looking at ... Digital Preservation." D-Lib Magazine 10, no. 7/8 (July/August 2004) http://www.dlib.org/dlib/july04/lavoie/07lavoie.html National Science Board, Long-lived digital data collections: enabling research and education in the 21st century (2005) http://www.nsf.gov/pubs/2005/nsb0540/ DCC Digital Curation Manual (2005- ) http://www.dcc.ac.uk/resource/curation-manual/chapters/ Christine L. Borgman, Scholarship in the digital age: Information, infrastructure, and the Internet (Cambridge, MA: MIT Press, 2007) Murtha Baca (Ed.), Introduction to metadata , v 3.0 (Los Angeles, CA: Getty Publications, 2008) http://www.getty.edu/research/conducting_research/standards/intrometadata/
  • 70.
  • 71.
    The UK researchcontext (1) Dual-support funding system Splits funding of research from infrastructure Research Councils (around EUR 4 billion pa) Higher education funding bodies Direct institutional support Joint Information Systems Committee (JISC) Data curation on the agenda of several of these Research Councils UK Higher Education Funding Council for England National research data service study JISC
  • 72.
    The UK researchcontext (2) JISC has been very active in funding work on long-term digital preservation and curation: Research projects Over ten years A major recent focus has been on institutional repositories) Supporting studies Dealing with Data (2007) Keeping Research Data Safe (2008) Studies of 'significant properties' of certain classes of content (ongoing) The Digital Curation Centre (DCC)
  • 73.
    The Digital CurationCentre (DCC) Launched in 2004 Initial grant funding from: Joint Information Systems Committee (JISC) UK e-Science Core Programme (Engineering and Physical Sciences Research Council) Main activities: Development, services and outreach in digital curation Research programme (2004-2008) Consortium of four institutions Now in second phase
  • 74.
    Curation, not justpreservation Active management of data over life-cycle of scholarly and scientific interest Reproducibility and reuse Appreciation of differences between disciplines Explored in separate DCC SCARP project Big-science / small-science distinctions are becoming blurred Importance of lifecycles Conception, creation, use, re-use Curation potentially involves a lifetime of endeavour
  • 75.
  • 76.
    DCC vision Centreof excellence in digital curation and preservation in the UK Authoritative source of advocacy and expert advice and guidance to the community Key facilitator of an informed research community with established collaborative networks of digital curators Service provider of a wide range of resources, software, tools and support services
  • 77.
    Selected DCC activitiesand outputs User services Curation Lifecycle Model Curation manual and briefing papers Tools for repository self-assessment (DRAMBORA) Community Development Website, journal (IJDC) Events (regular workshops/training, annual international conference) Liaison with JISC's repositories activities Tools and infrastructure Representation Information registries
  • 78.
    Problem 1: who'owns' curation? Many potential stakeholders Dealing with Data report (2007) identified: scientists, institutions, data centres, the users of data, funding bodies and publishers Also ... data scientists, curation specialists Different repository types (project-specific, community-driven, reference collections) The potential for duplication of effort and confusion is high All of these probably have some kind of role ... so how do we co-ordinate?
  • 79.
    Problem 2: institutionsv disciplines A major focus in UK is on the institutional role in curation: Building on the Institutional Repository paradigm It is not clear, however, that the curation of data is best performed at this level Keeping Research Data Safe (2008) report notes that data is more often dealt with by discipline-based consortia Bottom-up approaches to curation work well in some domains – but not in all Need to understand domain differences Initial SCARP studies reveal much complexity
  • 80.
    Problem 3: howmuch will it cost? Keeping Research Data Safe (2008): Report (with case studies) focused on identifying costs at the institutional level Some findings: The complex service requirements for curating research data means that institutions are setting-up federated approaches to repository development Currently ingest costs are much higher than long-term storage and preservation costs Start-up (and R&D) costs are high for first adopters
  • 81.
    What is neededfor open science? Some challenges: 1. Being open is not enough Data need to be made available in ways that facilitate high-throughput reuse e.g., Peter Murray-Rust's comments on the amount of chemistry data captured in formats like PDF 2. How do we capture the context(s) of research? Not just papers and data, but Web-sites, annotation services, blogs, wikis, etc. Importance of recording provenance
  • 82.
    What is neededfor open science? 3. Current scientific reward structures do not support either data curation or open science Funding bodies can 'mandate' (and in some cases fund) Principal Investigators to maintain data and make it available Without a sustainable infrastructure, however, this will be only a short term solution We need to decide what infrastructure we need and how we pay for it
  • 83.
    What is neededfor open science? 4. What will be the role of institutions? They have traditionally had an important role (e.g., research libraries) Currently are major supporters (and hosts) of Institutional Repositories Potential skills gap WRT data: We need to think about the status and skills of data curators (capacity building) DCC Curation 101, DigCCurr project What does the 'institution' mean in Open Science anyway? Open Notebook Science, open grant proposals, loyalty to collaborators or to institution
  • 84.
    Summing up Thereare still many more questions than answers There is a (widely acknowledged) need for better co-ordination: The curation landscape is currently very fragmented, with no real clarity with regard to identifying (and owning) roles and responsibilities Much is specific to particular domains There is a need for infrastructure But what should this include? Are we really able to identify generic needs?
  • 85.
    Further reading NationalScience Board, Long-lived digital data collections: enabling research and education in the 21st century (NSF, 2005) http//www.nsf.gov/pubs/2005/nsb0540/ Liz Lyon, Dealing with data; roles, rights, responsibilities and relationships (JISC, 2007) http://www.jisc.ac.uk/whatwedo/programmes/digitalrepositories2005/dealingwithdata.aspx Neil Beagrie, Jullia Chruszcz, and Brian Lavoie, Keeping research data safe: a cost model and guidance for UK universities (JISC, 2008) http://www.jisc.ac.uk/publications/publications/keepingresearchdatasafe.aspx
  • 86.
    Thank you foryour attention! “ Pigabyte” King Bladud’s Pigs in Bath (public art project), Summer 2008 http://www.kingbladudspigs.org/
  • 87.
    Acknowledgments UKOLN isfunded by the Museums, Libraries and Archives Council (MLA), the Joint Information Systems Committee (JISC) of the UK higher and further education funding councils, as well as by project funding from the JISC, the European Union, and other sources. UKOLN also receives support from the University of Bath, where it is based. More information: http://www.ukoln.ac.uk/