OCFL v1.0 – The Oxford
Common File Layout
Andrew Hankinson (University of Oxford)
Neil Jefferies (University of Oxford)
Rosalyn Metz (Emory University)
Julian Morley (Stanford University)
Simeon Warner (Cornell University) – presenter
Andrew Woods (LYRASIS)
#WeMissiPRES, online, 2020-09-22/23/24
OCFL is …
an open community effort to
define an application-independent
way of storing
versioned digital objects for
long-term digital preservation
Origins and Process
• 2017 discussion among digital repository architects about the ideal
layout and characteristics for repository storage (in Oxford)
• Initial community meeting (47 attendees from 32 institutions)
• Collected use cases, best practices, recommendations
• Editorial group, regular community meetings and open process on
GitHub
• Significant design influences from Library of Congress BagIt format
and Stanford University Moab experience (see
https://doi.org/10.3390/publications7020039)
1. Completeness –
so that a repository
can be rebuilt from
the files it stores
• The complete intellectual object is
stored together with its metadata
• Falls in line with standards such as
Trusted Digital Repositories (TDR, ISO
16363), NDSA Levels of Preservation,
and Open Archival Information
System (OAIS)
• Allows ease of mapping from one
system to another
These standards typically
talk about what you should
do, but not how
OCFL provides the how
2. Parsability – both
by humans and
machines, to ensure
content can be
understood in the
absence of original
software
• In disaster recovery situations,
humans should be able to
understand the content
• Machine readability allows for
simple applications to be
placed on top of an existing
OCFL storage root
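As an illustration of how little machinery such an application needs, here is a minimal Python sketch that lists the objects under a storage root. It assumes a conventional filesystem and relies on the `0=ocfl_object_1.0` conformance declaration file that the v1.0 specification places in each object root; the function name `find_ocfl_objects` is hypothetical and not part of any OCFL tool.

```python
import os
from pathlib import Path

# Conformance declaration file found in the root of every OCFL v1.0 object.
OBJECT_MARKER = "0=ocfl_object_1.0"


def find_ocfl_objects(storage_root):
    """Return the root directory of every OCFL object under storage_root."""
    found = []
    for dirpath, dirnames, filenames in os.walk(storage_root):
        if OBJECT_MARKER in filenames:
            found.append(Path(dirpath))
            dirnames.clear()  # do not descend into the object's own versions
    return sorted(found)
```

Because discovery depends only on files already on disk, the sketch needs no database or index, which is the point of the parsability requirement.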
3. Robustness –
against errors,
corruption, and
migration between
storage
technologies
• Strong fixity is baked into OCFL
• Objects can easily be validated
using the inventory.json
file
• Objects can be completely
self-contained
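A fixity check driven by inventory.json might look like the sketch below. It assumes the inventory's `digestAlgorithm` value is a name that Python's `hashlib` accepts (such as `sha512`) and that `manifest` maps each digest to a list of content paths relative to the object root. The function name `verify_object` is hypothetical, and this is far from a full validator: it does not check the inventory's own sidecar digest or the version structure.

```python
import hashlib
import json
from pathlib import Path


def verify_object(object_root):
    """Return the content paths that are missing or whose digest mismatches."""
    root = Path(object_root)
    inventory = json.loads((root / "inventory.json").read_text(encoding="utf-8"))
    algorithm = inventory["digestAlgorithm"]  # for example "sha512"
    failures = []
    for digest, content_paths in inventory["manifest"].items():
        for content_path in content_paths:
            file_path = root / content_path
            if not file_path.is_file():
                failures.append(content_path)
                continue
            # Reads the whole file into memory; fine for a sketch, not for large files.
            actual = hashlib.new(algorithm, file_path.read_bytes()).hexdigest()
            if actual != digest.lower():
                failures.append(content_path)
    return failures
```

An empty result means every file listed in the manifest is present and matches its recorded digest.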
4. Versioning – so
that a repository
can make changes
to objects while
persisting their
history
• Changes to objects are tracked
over time; the entire history can be
reconstructed using the
inventory.json file
• Content for a version is
immutable once written
• Forward delta versioning
reduces the amount of content
stored (supports delayed
description workflows)
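To show how a version can be reconstructed from the inventory alone, the sketch below resolves each logical path in a version to the content path that stores its bytes. It assumes the inventory layout of `manifest` (digest to content paths) and `versions[...]["state"]` (digest to logical paths); the digests `aaa` and `bbb`, the file names, and the function name `resolve_version` are placeholders invented for illustration.

```python
def resolve_version(inventory, version):
    """Map each logical path in `version` to the content path holding its bytes."""
    manifest = inventory["manifest"]
    state = inventory["versions"][version]["state"]
    resolved = {}
    for digest, logical_paths in state.items():
        content_path = manifest[digest][0]  # every listed copy has identical bytes
        for logical_path in logical_paths:
            resolved[logical_path] = content_path
    return resolved


# Placeholder inventory: real digests are full hex strings (e.g. sha512).
inventory = {
    "manifest": {
        "aaa": ["v1/content/report.txt"],
        "bbb": ["v2/content/notes.txt"],
    },
    "versions": {
        "v1": {"state": {"aaa": ["report.txt"]}},
        "v2": {"state": {"aaa": ["report.txt"], "bbb": ["notes.txt"]}},
    },
}

# report.txt is unchanged in v2, so it still resolves to the copy stored under v1.
print(resolve_version(inventory, "v2"))
```

This is the forward delta idea in miniature: v2 stores only notes.txt, while the unchanged report.txt is referenced from v1.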
5. Storage diversity
– to ensure content
can be stored on
diverse storage
infrastructures
including cloud
object stores
• Supports conventional
filesystem metaphor
• Designed to work with various
storage infrastructures
including object stores
prevalent in cloud offerings
(e.g. Amazon S3 API).
• Allows deduplication of
content, lowering overall
storage costs
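Deduplication follows from addressing content by digest. The sketch below is an assumption-laden illustration rather than a conforming client: it records a new version in an in-memory inventory and returns only the files whose bytes are not already in the manifest. The function name `add_version` is hypothetical, the digest algorithm is fixed to sha512, and required version fields such as the creation timestamp are omitted.

```python
import hashlib


def add_version(inventory, version, files):
    """Record `files` (logical path -> bytes) as `version`.

    Returns a dict of content path -> bytes for the files that actually
    need to be written to storage; already-known content is only referenced.
    """
    manifest = inventory["manifest"]
    state = {}
    to_store = {}
    for logical_path, data in sorted(files.items()):
        digest = hashlib.sha512(data).hexdigest()
        state.setdefault(digest, []).append(logical_path)
        if digest not in manifest:
            content_path = f"{version}/content/{logical_path}"
            manifest[digest] = [content_path]
            to_store[content_path] = data
    inventory["versions"][version] = {"state": state}
    return to_store
```

For example, if v2 repeats a file from v1 and adds a second copy of it under a new name, only genuinely new bytes appear in the returned dict.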
OCFL v1.0
First alpha October 2018
Beta draft June 2019
Version 1.0 July 2020 –
specification and
implementation notes
Validation and manipulation
tools implemented in multiple
programming languages
• Standalone tools
through
• Fedora 6 using OCFL for
persistence (in development)
Specification
Describes:
• OCFL Objects – including the object
structure, versioning, and the
inventory.json which provides a
comprehensive registry of the object
• OCFL Storage Root – how objects are
arranged
with examples to illustrate use
Implementation Notes
Recommendations and practices for
implementing the specification,
including:
• Digital preservation issues – including
rebuildability and fixity
• Storage choices – including different
content cases and infrastructures
• Client behaviors – to support
common operations and objects in
motion
Thank you!
Details and links to
get involved:
https://ocfl.io/

Editor's Notes

  • #2 I'd like to thank the We Miss iPres festival and the Digital Preservation Coalition Awards for the opportunity to speak today. I'm going to briefly present about OCFL – the Oxford Common File Layout. I’m speaking on behalf of the OCFL Editorial Group which comprises
  • #5 I’d like to focus on five key requirements that OCFL meets.