The Oxford Common File Layout (OCFL) is an emerging data standard that describes an application-independent approach to the storage of digital information in a structured, transparent and predictable manner. With the most recent release, v1.1, OCFL implementations are becoming increasingly popular within institutions looking for long-term preservation solutions that are robust against corruption, offer storage diversity and are easily transportable between storage vendors, thus protecting their content into the foreseeable future.
In this presentation we will discuss the specific design goals and methodologies involved in developing and maintaining the OCFL specification, and explore different ways institutions are implementing OCFL as part of their digital preservation programs. Implementers will speak about their institutions' use-case requirements and their reasons for selecting OCFL, and will discuss their desired outcomes.
The Oxford Common File Layout
1. SLIDESMANIA.
The Oxford Common File Layout
Understanding the specification, institutional use cases & implementations
Arran Griffith - Fedora Program Manager
Stefano Cossu - Harvard University Libraries
Thomas Wrobel - Oxford University, Bodleian Libraries
The Oxford Common File Layout (OCFL)
“A simple, non-proprietary, specified, open-standards approach to the layout of preservation persistence.”
● Purpose: provide a preservation-centric, common approach to filesystem layout for digital repositories
● Developed and maintained by the OCFL Editorial Board
● Several implementations of the specification are in active use
Benefits of the OCFL
Parsability
● Readable by both humans and machines
Robustness
● Checksums to protect against corruption and errors between storage technologies
Versioning
● All changes are versioned, allowing a repository’s history to persist
Storage Diversity
● Ensures content can be stored on any type of infrastructure, including conventional or cloud systems
Completeness
● The repository can be rebuilt from the files it stores
OCFL for Digital Preservation in Fedora
● OCFL was incorporated to enhance long-term digital preservation for Fedora repositories
● Fedora 6.x writes all data in the OCFL format using the OCFL-java library
○ The purpose was to take advantage of the transparency offered by the OCFL file structure
Benefits:
● Application-independent persistence
● Human and machine readable data
● Ability to rebuild repository from contents on disk
● Fewer migrations in the future
Standards = What to do
Fedora + OCFL = How to do it
DRS at a glance
Scale
➜ >10M objects, >100M files, 2 PB replicated
➜ Exponential growth foreseen in the future
Long-standing legacy
➜ In operation and continuously maintained for 22 years
➜ In need of a complete re-engineering
Migration time & costs
➜ More than 1 year to migrate from POSIX to OCFL
➜ Don’t want to do that again
DRS Futures project
➜ 3-year capital-funded project to replace the current DRS
➜ Design a new repository without migrating data
Value of OCFL for DRS
Approach
➜ Maintain a “storage fabric” separated from the application layer
➜ Replace current DRS without migrating or rearranging the data layer
What OCFL provides
➜ A file layout standard specifically designed for long-term preservation
➜ A software-agnostic data layer
➜ A community dedicated to resolving digital preservation problems
Assumptions
➜ A standard adopted by a sufficiently large & diverse community that guarantees the promised stability
➜ A healthy community of implementers and service providers to implement & maintain the required tools
Challenges
➜ How do we guarantee performance?
➜ What about backward compatibility?
➜ Do we have enough OCFL-compatible software choices?
University of Oxford - Bodleian Libraries
Oxford University Research Archive (ORA)
● 300,000+ works
● 100,000 of these have public binary files of which ORA holds the only digital copy
● Works include:
○ Articles, conference papers, theses, research data, working papers, posters, and more…
Digital Preservation Service (DPS)
Purpose: preserve a versioned copy of a digital object, which allows the DPMS to monitor, analyse and support the system
Digital Preservation Microservices (DPMS)
Purpose: monitor and support the preservation of binary content and metadata
University of Oxford - Bodleian Libraries
Advantages of OCFL
● Platform & application agnostic
● DPS OCFL layer decreases migrations
● Simplified back-up & monitoring
● Parsability
● Single parent directory = no need for an index or management application to analyse a given object
Advantages of Fedora
● Well-documented RESTful API
● Transaction management
● Authentication & Authorization
● Community support and continued engagement
What’s Next…
University of Oxford - Bodleian Libraries
● Performance and scale testing of Fedora 6.x + OCFL
● Export the ORA repository into DPS and integrate it with day-to-day operations
● Expand to other services within the Bodleian Libraries
Reach Us:
Thomas Wrobel - thomas.wrobel@bodleian.ox.ac.uk
ORA Team - ora-dev@bodleian.ox.ac.uk
Resources
The Oxford Common File Layout
www.ocfl.io
Fedora Program Info
Wiki:
https://wiki.lyrasis.org/display/FF/Fedora+Repository+Home
Documentation:
https://wiki.lyrasis.org/display/FEDORA6x
Get Connected:
https://wiki.lyrasis.org/display/FF/Mailing+Lists+etc
Harvard DRS Futures
https://sites.harvard.edu/drs-futures/
Thank You
Arran Griffith - arran.griffith@lyrasis.org
Stefano Cossu - stefano_cossu@harvard.edu
Thomas Wrobel - thomas.wrobel@bodleian.ox.ac.uk
Editor's Notes
Hello
My name is Arran Griffith and I am the program manager for the Fedora Program. I am joined today in person by my colleague Stefano Cossu from Harvard University Libraries. Our other co-presenter, Thomas Wrobel, sends his regrets that he wasn’t able to make it, but he’s given us some information on the work they are doing at the Bodleian to share with you. Stefano and I will talk generally about the Oxford Common File Layout specification and share how each of us is incorporating the OCFL as a component of our systems to take advantage of what the specification offers.
The Oxford Common File Layout (OCFL) is a specification that describes an application-independent approach to the storage of digital information in a structured, transparent, and predictable manner. It was developed to provide a standardized approach to filesystem layout within a digital repository that would also promote preservation and support long-term object management best practices within the repositories.
It is defined and developed by an editorial board that is responsible for the upkeep, continued development and maintenance of the spec. There are currently several implementations of the OCFL in active use around the globe, all of which can be found on the OCFL website, ocfl.io. Today, though, we are here to share our individual use cases involving the OCFL and talk about why we’ve opted to use it and how we’re doing that. If you have any questions about the OCFL specifically, Stefano and I are more than happy to try to answer them, but we are by no means OCFL experts, so we would defer to the editors and encourage you to join the #ocfl channel on the Code4Lib Slack or reach out through the website, which I’ve linked to at the end.
As I mentioned, the purpose of OCFL is to provide an application-independent approach to storing digital content. The specification dictates how files are structured and written, and this offers many benefits for digital preservation. These are the five main benefits offered by the specification:
The first is parsability. Once written to OCFL, content is stored as ordinary files in plain directories, described by a plain-text JSON inventory, so both machines and humans can read it and understand it in the absence of the original software.
Next is robustness. OCFL records checksums for both content and metadata to protect against errors and data corruption between storage technologies.
OCFL also offers native versioning; this is part of its core DNA. It uses a forward-delta approach, which eliminates unnecessary duplication between versions: a new version stores only content that is not already held in an earlier version. Immutable versioning is built into the specification, so once a version is written it is never changed, and every version remains available so the object’s history persists.
As I mentioned before, OCFL by nature allows for storage diversity. You can use any type of storage system you like, because its simple file-system metaphor of basic files and directories lets you operate on local disk or in the cloud.
And lastly, OCFL offers completeness. This means that everything is preserved within the structure defined by the spec, including all the data and its associated provenance, which in theory allows you to rebuild your repository from the files you have. Should the unspeakable ever happen and your hardware fail, you can simply take your OCFL repository and stand it up again elsewhere, because it is complete and preserved as such.
As you can see, there is a lot to gain from implementing the OCFL specification. Now we are going to share a little about why each of our programs and institutions has chosen to incorporate this standard into our systems and software.
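To illustrate what “readable without the original software” means in practice, here is a minimal Python sketch that parses an OCFL inventory using nothing but the standard `json` module and resolves the logical files of a version to their paths on disk. The object id, dates, file names and the shortened digests (real SHA-512 digests are 128 hex characters) are hypothetical placeholders, and the inventory omits optional fields; `resolve_version` is an illustrative helper, not part of any OCFL library.

```python
import json
from typing import Dict, Optional

# Hypothetical, simplified inventory.json for one object. Digests are shortened
# placeholders; a real inventory would carry full SHA-512 hex digests.
INVENTORY_JSON = """
{
  "id": "info:example/object-1",
  "type": "https://ocfl.io/1.1/spec/#inventory",
  "digestAlgorithm": "sha512",
  "head": "v2",
  "manifest": {
    "aaa111": ["v1/content/thesis.pdf"],
    "bbb222": ["v2/content/metadata.xml"]
  },
  "versions": {
    "v1": {"created": "2023-01-01T00:00:00Z",
           "state": {"aaa111": ["thesis.pdf"]}},
    "v2": {"created": "2023-02-01T00:00:00Z",
           "state": {"aaa111": ["thesis.pdf"], "bbb222": ["metadata.xml"]}}
  }
}
"""


def resolve_version(inventory: dict, version: Optional[str] = None) -> Dict[str, str]:
    """Map each logical path in a version (default: head) to a content path."""
    version = version or inventory["head"]
    state = inventory["versions"][version]["state"]
    resolved = {}
    for digest, logical_paths in state.items():
        # Every content path listed for a digest holds identical bytes,
        # so the first one is enough to locate the file.
        content_path = inventory["manifest"][digest][0]
        for logical_path in logical_paths:
            resolved[logical_path] = content_path
    return resolved


inventory = json.loads(INVENTORY_JSON)
print(resolve_version(inventory))
print(resolve_version(inventory, "v1"))
```

Note that in `v2` the unchanged `thesis.pdf` is still read from `v1/content/`, since its bytes were stored only once.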
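A fixity check of the kind these checksums enable can be sketched in a few lines of Python. This is an illustrative sketch, not part of any OCFL library: `sha512_of` and `check_fixity` are hypothetical helpers, and the manifest is the same digest-to-content-paths mapping found in an inventory. A real validator would also verify the inventory’s own sidecar digest file and the other rules in the specification.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def sha512_of(path: Path) -> str:
    """Stream a file through SHA-512 and return its hex digest."""
    h = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def check_fixity(object_root: Path, manifest: Dict[str, List[str]]) -> List[str]:
    """Return content paths that are missing or no longer match their digest."""
    failures = []
    for expected, content_paths in manifest.items():
        for rel in content_paths:
            path = object_root / rel
            if not path.is_file() or sha512_of(path) != expected:
                failures.append(rel)
    return failures


def demo() -> Tuple[List[str], List[str]]:
    """Record a digest, simulate silent corruption, and check before and after."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        content = root / "v1" / "content" / "thesis.pdf"
        content.parent.mkdir(parents=True)
        content.write_bytes(b"original bytes")
        manifest = {sha512_of(content): ["v1/content/thesis.pdf"]}
        before = check_fixity(root, manifest)
        content.write_bytes(b"corrupted bytes")  # bytes change; the recorded digest does not
        after = check_fixity(root, manifest)
    return before, after


before, after = demo()
print(before)  # []
print(after)   # ['v1/content/thesis.pdf']
```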
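The forward-delta idea can be sketched in Python: each new version records its complete logical state, but only bytes whose digest is not already in the manifest are written to the new version’s content directory. `add_version` is a hypothetical helper working on a simplified in-memory inventory (no `created`, `message` or `user` fields, no zero-padded version names); it is not how OCFL-java or any other implementation is structured.

```python
import hashlib
from typing import Dict, List


def digest(data: bytes) -> str:
    """SHA-512 hex digest, the digest algorithm OCFL recommends."""
    return hashlib.sha512(data).hexdigest()


def add_version(inventory: dict, files: Dict[str, bytes]) -> List[str]:
    """Record a new version whose complete logical state is `files`.

    Returns the content paths that must actually be written: only bytes whose
    digest is not already in the manifest (the forward delta).
    """
    version = f"v{len(inventory['versions']) + 1}"
    state: Dict[str, List[str]] = {}
    new_content_paths = []
    for logical_path, data in sorted(files.items()):
        d = digest(data)
        state.setdefault(d, []).append(logical_path)
        if d not in inventory["manifest"]:
            content_path = f"{version}/content/{logical_path}"
            inventory["manifest"][d] = [content_path]
            new_content_paths.append(content_path)
    # Earlier versions are never modified; the new version is only appended.
    inventory["versions"][version] = {"state": state}
    inventory["head"] = version
    return new_content_paths


inv = {"head": None, "manifest": {}, "versions": {}}
print(add_version(inv, {"thesis.pdf": b"PDF bytes"}))  # ['v1/content/thesis.pdf']
print(add_version(inv, {"thesis.pdf": b"PDF bytes", "metadata.xml": b"<mods/>"}))  # ['v2/content/metadata.xml']
print(add_version(inv, {"metadata.xml": b"<mods/>"}))  # []
```

The third call removes `thesis.pdf` from the current state without deleting anything: `v1` still references it, so the object’s history stays intact.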
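Completeness also means a repository’s index can be regenerated from storage alone. The sketch below walks a storage root, recognises each object by its `0=ocfl_object_1.1` declaration file, and reads the object id from `inventory.json`. The directory names and ids are hypothetical, the inventories are cut down to their `id` field, `rebuild_index` is an illustrative helper rather than part of any implementation, and the walk relies on the specification’s rule that objects are never nested inside other objects.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict

OBJECT_DECLARATION = "0=ocfl_object_1.1"


def rebuild_index(storage_root: Path) -> Dict[str, str]:
    """Walk a storage root and map each object id to its object root path."""
    index = {}
    for dirpath, dirnames, filenames in os.walk(storage_root):
        if OBJECT_DECLARATION in filenames:
            with open(Path(dirpath) / "inventory.json", encoding="utf-8") as f:
                inventory = json.load(f)
            index[inventory["id"]] = Path(dirpath).relative_to(storage_root).as_posix()
            dirnames.clear()  # objects are never nested, so stop descending here
    return index


def make_object(root: Path, rel: str, object_id: str) -> None:
    """Create a bare-bones object: declaration file plus a minimal inventory."""
    obj = root / rel
    obj.mkdir(parents=True)
    (obj / OBJECT_DECLARATION).write_text("ocfl_object_1.1\n")
    (obj / "inventory.json").write_text(json.dumps({"id": object_id}))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "0=ocfl_1.1").write_text("ocfl_1.1\n")  # storage root declaration
    make_object(root, "ab/cd/object-1", "info:example/1")
    make_object(root, "ef/gh/object-2", "info:example/2")
    index = rebuild_index(root)

print(sorted(index.items()))
```

No external database is consulted: everything needed to rebuild the index is in the files on disk.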
Fedora is here to represent how we, as a software project, are taking advantage of OCFL and incorporating it into our core. We use the OCFL-java implementation of the spec, and this was the major feature improvement in Fedora 6. The community decided to use the OCFL standard within Fedora’s persistence layer to give our users back the transparency they were looking for and had been used to from Fedora 3. OCFL replaced the ModeShape back end of Fedora 4, which was something of a black box. Making this decision required a major rewrite of the core software, but it now gives us a very transparent and greatly enhanced long-term digital preservation tool in Fedora 6 + OCFL.
Fedora benefits from using OCFL for preservation for several reasons:
Fedora itself provides a means of reading, writing and delivering digital files to your users, and OCFL provides the standard by which those files are preserved.
In terms of long-term preservation, if Fedora were ever to go away, you have all the information within OCFL to stand up your repository again simply from the files on disk.
The metadata is still intact, as is all of the provenance required to meet preservation standards.
And because of the standardized way that OCFL dictates the file system layout, migrations should be simpler going forward. There should be no need to reformat data in any way to move to newer versions of the software, as was the case with previous Fedora migrations.
OCFL and Fedora provide preservation that is also independent of the storage medium. This gives Fedora users more options for storing their objects: since OCFL stores plain files, you can use whatever storage medium you choose, whether local storage or cloud storage.
Fedora also supports cloud storage via its Java OCFL client.
So to sum it all up: standards define WHAT to do, and the combination of Fedora and OCFL provides the HOW. It’s the combination of the two that provides the best possible software solution for long-term digital preservation.
General info about LTS and DRS
Mention the downsides of the approach: a restricted choice of solutions that conform to OCFL