SlideShare a Scribd company logo
Oxford Common File Layout
Rosalyn Metz (Emory),
Simeon Warner (Cornell)
Samvera Connect 2018
http://bit.ly/ocfl-samcon2018
Not just us...
OCFL Editorial Group
● Andrew Hankinson (Oxford)
● Neil Jefferies (Oxford)
● Julian Morley (Stanford)
● Andrew Woods (DuraSpace)
● and us (Rosalyn and Simeon)
Community input from pasig-discuss and
ocfl-community groups, and from others
Closest parents: BagIt & Moab
BagIt
Well established and implemented specification for handling sets of files
● Being formally standardized as RFC:
https://tools.ietf.org/html/draft-kunze-bagit-17
● Used for transfer and (somewhat less) for files at rest
● Good fixity support
● No explicit versioning support
○ Could use local conventions for version inside a bag
○ Could use bag-per-version
Moab: A Brief History
Slides adapted from Julian Morley's in the OR2018 OCFL presentation
● Moab is the closest ancestor of OCFL
● Developed at Stanford Libraries by Richard Anderson
○ Article: http://journal.code4lib.org/articles/8482
● Named after Moab, UT
Moab: A Brief History
● Moab is a versioned, forward delta file
structure that supports fixity and file
de-duplication.
● You can preserve anything with it (even
cat pictures found on the internet)
● The tools to manage and create Moabs
are open source Ruby gem
○ https://github.com/sul-dlss/moab-versioning
Moab is part of the
Stanford Digital Repository
Here be Moabs!
Moab in Practice @ Stanford
We have many Moabs in the SDR
● 1.6 million Moab objects
● 5 million version directories
● 50+ million files
● 500+ TB of data (25TB added last month)
● Spread across 15 NFS volumes on NetApp filers
● Backed up by IBM Spectrum Protect (formerly TSM)
○ 1 tape copy kept in local tape frame;1 sent to Iron Mountain
ab123cd4567
v0001
data
content
title.jpg
intro.jpg
page1.jpg
page2.jpg
page3.jpg
metadata
versionMetadata.xml
descMetadata.xml
identityMetadata.xml
manifests
versionInventory.xml
signatureCatalog.xml
versionAdditions.xml
fileInventoryDifference.xml
manifestInventory.xml
v0002
data
content
page2.jpg
metadata
versionMetadata.xml
technicalMetadata.xml
manifests
versionInventory.xml
signatureCatalog.xml
versionAdditions.xml
fileInventoryDifference.xml
manifestInventory.xml
two version directories; /v0001 & /v0002
A sample Moab object on disk
/data content comes from upstream and could
be anything, but our systems create data in
/content and /metadata directories.
/manifests directories are for Moab metadata.
This is where we store all the checksums and
change information for deduplication and
forward deltas.
Lessons from Cornell
CULAR @ 2017
It worked, what
now?
● Fedora 3 no longer being
developed, Fedora 4 not an
appropriate option
● Decision not to buy
"preservation services",
primarily on cost grounds
● Decision that we want one local
copy for legal access reasons
Short term ⇒ use local disk and
AWS S3. Build tools over
filesystem and object stores
Those files sure
are piling up!
Nearly 100TB now, planning
100TB/year digitization
● Plan to purchase a scalable
local (object) storage system
for 1 copy
● Two more copies in cloud
(perhaps tape)
● Content will outlast any
application or software system
● Content will outlast any storage
system
● Expect change and hence
migration ⇒ KISS
OCFL object
OCFL storage root(s)?
Shared Cornell and OCFL Goals
● Provide an application and vendor neutral storage arrangement that can be
used with filesystems and object stores
○ Allow easy replication between multiple storage environments
○ Allow easy migration between storage systems (modulo the inherent burdens)
○ Allow use with multiple and changing applications
● Support package versioning at low cost (complexity and storage use)
● Support internal package validation for completeness and fixity
● Support audit and self-description of entire store
● Have an easy migration path from current archival storage arrangements
● Develop a shared model that is useful at multiple institutions so that all benefit
from community developed tools and expertise.
Lessons from Emory
Lessons from Emory: Deliverables
Actively engaged in a multi-year effort to gather requirements, design, and
develop a digital repository based on the Samvera framework.
Selected deliverables included...
Develop object definitions/types (e.g.
collections, objects, other entities) and their
relationships to one another; determine
preservation objects inside and outside of
Fedora.
Identify needs for AIP structure.
Identify storage requirements (e.g. number of
copies, file access scenarios)
Lessons from Emory: Identified requirements
The means to distribute digital objects to third-party preservation services.
A well understood and well documented model for storing digital objects.
Ability to place multiple copies of digital objects into diverse storage services
(AWS, local storage, etc.).
Easily allow for fixity checking of digital objects.
Digital
Object
Content Files
(Primary or Supplemental)
Content file 1
Content file 2
Content file 3...
… + additional
… + additional
The content itself:
relationships provided in
structural metadata
Metadata (Actionable/Indexed)
Desc. metadata
Technical metadata (File-level)
Preservation Events/Audits
Administrative metadata
Structural metadata (PCDM)
Metadata converted to RDF
for Hyrax/Fedora - editable
and/or searchable
Supplemental Preservation Files
(Metadata/Administrative Files)
Source Metadata (binary file)
Desc. Metadata record (binary file)
METS (binary file)
License/agreement (binary file)
Supplemental PREMIS (binary file)
Variable supplemental info
stored as files (not directly
system-readable):
staff can view or download
file to read it
Collection
Ancient Egyptian
Collection
Administrative
Collection
Carlos Museum
Administrative
Collections reflect the
process the libraries
followed when deciding to
collect materials.
Digital Objects must be a
part of an Administrative
Collection and optionally in
one or more Collections
Digital Objects may
contain one or more files
Digital Objects,
Collections receive
Emory-defined metadata
and relationships
Major Emory
Entities PCDM
Context -
Simple Example
Individual Agreements
contain information about
the Administrative
Collection.
Individual Agreements
may contain one or more
files
Individual Agreements
are assigned to objects
through their parent
Collection
Is a member of
Is a member of
Individual Agreement
Carlos Museum
Agreement
Digital Object
Statuette of a Cat.
Collection
Divine Felines Exhibition
Is a member of
Is a member of
Goals of OCFL
OCFL Requirements
1) Completeness, so that a repository can be
rebuilt from the files it stores,
2) Parsability, both by humans and machines,
most importantly in the absence of original
software,
3) Robustness, against errors, corruption, and
migration between storage technologies, and
4) Storage, on a variety of infrastructures
including cloud object stores.
Many existing digital preservation
standards like:
● TDR (ISO 16363)
● OAIS (ISO 14721)
● NDSA Levels of Preservation
● BagIt
discuss the need for these
requirements, but none provided a
standardized way for how to do it.
OCFL the specification
https://ocfl.io/draft/spec/
OCFL Object
A group of one or more content files and
administrative information identified by a
URI.
The object may contain a sequence of versions
of the files organized into version directories.
The base directory of the object may contain a
logs directory.
A NAMASTE file indicating conformance.
An object contains an inventory digest file
which provides a digest for the
inventory.json file.
[object root]
├── 0=ocfl_object_1.0
├── inventory.json
├── inventory.json.sha512
├── v1
│ ├── empty.txt
│ ├── foo
│ │ └── bar.xml
│ ├── image.tiff
│ ├── inventory.json
│ └── inventory.json.sha512
├── v2
│ ├── foo
│ │ └── bar.xml
│ ├── inventory.json
│ └── inventory.json.sha512
└── v3
├── inventory.json
└── inventory.json.sha512
OCFL Object
An object contains an inventory.json file
which inventories the contents of an object.
The manifest block lists all the digests and
existing file paths for all of the object’s content.
The versions block identifies the logical file path
and the digest for each version of the object’s
content.
Separating the logical file path from the
existing file path and using digests to refer to
files allows for deduplication of content.
{
"head": "v3",
"id": "ark:/12345/bcd987",
"manifest": {
"4d27c8...b53": [ "v2/foo/bar.xml" ],
"7dcc35...c31": [ "v1/foo/bar.xml" ],
"cf83e1...a3e": [ "v1/empty.txt" ],
"ffccf6...62e": [ "v1/image.tiff" ]
},
"type": "Object",
"versions": [
{
"created": "2018-01-01T01:01:01Z",
"message": "Initial import",
"state": {
"7dcc35...c31": [ "foo/bar.xml" ],
"cf83e1...a3e": [ "empty.txt" ],
"ffccf6...62e": [ "image.tiff" ]
},
"type": "Version",
"user": {
"address": "alice@example.com",
"name": "Alice"
},
"version": "v1"
},
{
"created": "2018-02-02T02:02:02Z",
"message": "Fix bar.xml, remove image.tiff,
OCFL Storage Root
The base directory of an OCFL storage layout.
Should also contain the OCFL specification in
human-readable plain-text format.
Should contain the conformance declaration
OCFL Objects may conform to the same or
earlier version of the specification.
The storage hierarchy must terminate with an
OCFL Object Root.
[storage root]
├── 0=ocfl_1.0
├── ocfl_1.0.txt (optional)
├── ab12cd34
│ ├── 0=ocfl_object_1.0
│ ├── inventory.json
│ ├── inventory.json.sha512
│ └── v1
│ ├── file.txt
│ ├── inventory.json
│ └── inventory.json.sha512
└── ef56gh78
. ├── 0=ocfl_object_1.0
├── inventory.json
├── inventory.json.sha512
├── v1
│ ├── empty.txt
│ ├── foo
│ │ └── bar.xml
│ ├── image.tiff
│ ├── inventory.json
│ └── inventory.json.sha512
└── v2
├── foo
│ └── bar.xml
├── inventory.json
└── inventory.json.sha512
OCFL Storage Root
Storage hierarchies must not include files
within intermediate directories
Storage hierarchies must be terminated by
OCFL Object Roots
Storage hierarchies within the same OCFL
Storage Root should use just one layout
pattern
Storage hierarchies within the same OCFL
Storage Root should consistently use either a
directory hierarchy of OCFL Objects or
top-level OCFL Objects
[storage root]
├── 0=ocfl_1.0
├── ocfl_1.0.txt (optional)
└── ab
└── 12
└── cd
└── 34
└── ab12cd34
├── 0=ocfl_object_1.0
├── inventory.json
├── inventory.json.sha512
├── v1
│ ├── empty.txt
│ ├── foo
│ │ └── bar.xml
│ ├── image.tiff
│ ├── inventory.json
│ └── inventory.json.sha512
└── v2
├── foo
│ └── bar.xml
├── inventory.json
└── inventory.json.sha512
OCFL implementation patterns
https://ocfl.io/draft/implementation-notes/
Rebuildability
● Key OCFL goal -- be able to rebuild repo
from an OCFL storage root
● Therefore, in OAIS terms: must include
all the descriptive, administrative,
structural, representation, and
preservation metadata relevant to the
object.
● Optionally include copy of spec in top level
of OCFL storage root
● More complete option would be a specific
OCFL object that contains this
documentation and to have a pointer to its
location in the storage root.
e.g. permissions, access, and
creation times
● not portable between filesystems
● not preservable through file
transfer operations
● ill-defined fixity
⇒ out-of-scope
If important, use filesystem image
format or extract as metadata
Filesystem metadata
Empty Directories
● OCFL preserves files and their
content
● Directories serve as an
organizational convention
● Empty directories not directly
supported
⇒ Use zero-length `.keep` file as
necessary (ala. `git`, BagIt)
Only special files are the inventory,
its digest file, and conformance
declaration files
Otherwise OCFL makes no
distinction between different types of
files.
⇒ Use local conventions as
needed
Data and Metadata
Storage
● Filesystem or Object Store -- you choose
● Original filename or Normalized filename -- you choose
● Deduplication & Forward delta differencing (at file level) --
optional but likely desirable/normal
"logical file path" - path of file in content as part of state for a particular version
"existing file path" - path of file in OCFL object
content addressing ties these two together
Storage Root Hierarchy - flat, pairtree, ex-wye-zee
[storage_root]
├── 0=ocfl_1.0
├── ocfl_1.0.txt (optional)
├── d45be626e024
| ├── 0=ocfl_object_1.0
| ├── inventory.json
| ├── inventory.json.sha512
| └── v1...
├── d45be626e036
| ├── 0=ocfl_object_1.0
| ├── inventory.json
| ├── inventory.json.sha512
| └── v1...
├── 3104edf0363a
| ├── 0=ocfl_object_1.0
| ├── inventory.json
| ├── inventory.json.sha512
| └── v1...
[storage_root]
├── 0=ocfl_1.0
├── ocfl_1.0.txt (optional)
├── d4
| └── 5b
| └── e6
| └── 26
| └── e0
| ├── 24
| | └──d45be626e024
| | ├──
0=ocfl_object_1.0
| | └── ...
| └── 36
| └──d45be626e036
| ├──
0=ocfl_object_1.0
| └── ...
File operations
(mungification?)
● Inheritance
● Addition
● Updating
● Renaming
● Deletion
● Reinstatement
● Purging ⇒ choices:
a. rebuild new object
b. break immutability and
rewrite (not recommended)
Yes - OCFL supports that...
Version Immutability
OCFL supports systems where
versions (everything in a given
version directory) is immutable once
written.
● It is recommended to follow this
practice
● BUT you can rewrite objects if
you really want to, but
OCFL supports (in fact, enforces for
internal references) deduplication
through digests
● Only within an object
● File level
● sha512 digest recommended
Deduplication
Forward Delta
Each version need only include new
and changed files
● Files from previous version
included by reference
● Reference by content (digest)
supports renaming without
duplicating
(You can avoid this and include files again if you
really want. But why?)
1. Digests used for reference
already provide basis for strong
fixity checks (pref. sha512)
2. Additional digests may be
include to support legacy fixity
information (e.g. md5)
(Fixity of inventory files themselves handled by
sidecar file, e.g. inventory.json.sha512)
Fixity
Log Information
log directory in OCFL object
available for information not in
objects content and not versioned
● form not specified
● will be ignored in object
validation
Objects with many small file may
cause problems with some storage
infrastructures and may make
validation/fixity time consuming
● package in single file (ZIP
recommend)
(Options for a later version of the OCFL spec
are ZIPped objects and/or ZIP by version)
Small Files
Roadmap
Alpha (yesterday)
● Released(ish) on October 10 community call
(OCFL Editors and PASIG Discuss)
● Feedback for November community call
Beta (date based on feedback)
● Experimental validation tool
● Determine what other groups communities to
seek input from
Release 1.0 (2019)
● One production-ready validator
● Test suite and fixture objects
● Two institutions committed to backing the
initiative (should define that)
41
Thank You
https://ocfl.io
https://github.com/OCFL
ocfl-community@googlegroups.com

More Related Content

What's hot

Lecture: Ontologies and the Semantic Web
Lecture: Ontologies and the Semantic WebLecture: Ontologies and the Semantic Web
Lecture: Ontologies and the Semantic Web
Marina Santini
 
MongoDB
MongoDBMongoDB
MongoDB
nikhil2807
 
DSpace 7 - The Power of Configurable Entities
DSpace 7 - The Power of Configurable EntitiesDSpace 7 - The Power of Configurable Entities
DSpace 7 - The Power of Configurable Entities
Atmire
 
Elasticsearch for beginners
Elasticsearch for beginnersElasticsearch for beginners
Elasticsearch for beginners
Neil Baker
 
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...
Timothy Spann
 
Moving from SQL Server to MongoDB
Moving from SQL Server to MongoDBMoving from SQL Server to MongoDB
Moving from SQL Server to MongoDB
Nick Court
 
Getting started with DSpace 7 REST API
Getting started with DSpace 7 REST APIGetting started with DSpace 7 REST API
Getting started with DSpace 7 REST API
4Science
 
PostgreSQL + Kafka: The Delight of Change Data Capture
PostgreSQL + Kafka: The Delight of Change Data CapturePostgreSQL + Kafka: The Delight of Change Data Capture
PostgreSQL + Kafka: The Delight of Change Data Capture
Jeff Klukas
 
Web Intelligence - Tutorial1
Web Intelligence - Tutorial1Web Intelligence - Tutorial1
Web Intelligence - Tutorial1
Obily W
 
Introduction to Spring Framework
Introduction to Spring FrameworkIntroduction to Spring Framework
Introduction to Spring Framework
Dineesha Suraweera
 
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 20190-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
confluent
 
Apache Jackrabbit Oak - Scale your content repository to the cloud
Apache Jackrabbit Oak - Scale your content repository to the cloudApache Jackrabbit Oak - Scale your content repository to the cloud
Apache Jackrabbit Oak - Scale your content repository to the cloud
Robert Munteanu
 
A Brief History of Database Management (SQL, NoSQL, NewSQL)
A Brief History of Database Management (SQL, NoSQL, NewSQL)A Brief History of Database Management (SQL, NoSQL, NewSQL)
A Brief History of Database Management (SQL, NoSQL, NewSQL)
Abdelkader OUARED
 
Introduction of Java GC Tuning and Java Java Mission Control
Introduction of Java GC Tuning and Java Java Mission ControlIntroduction of Java GC Tuning and Java Java Mission Control
Introduction of Java GC Tuning and Java Java Mission Control
Leon Chen
 
Intro to elasticsearch
Intro to elasticsearchIntro to elasticsearch
Intro to elasticsearch
Joey Wen
 
Towards an Open Research Knowledge Graph
Towards an Open Research Knowledge GraphTowards an Open Research Knowledge Graph
Towards an Open Research Knowledge Graph
Sören Auer
 
Reactive Microservices with Spring 5: WebFlux
Reactive Microservices with Spring 5: WebFlux Reactive Microservices with Spring 5: WebFlux
Reactive Microservices with Spring 5: WebFlux
Trayan Iliev
 
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...
Amazon Web Services
 
FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principlesFOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
dgarijo
 
Mongo db intro.pptx
Mongo db intro.pptxMongo db intro.pptx
Mongo db intro.pptx
JWORKS powered by Ordina
 

What's hot (20)

Lecture: Ontologies and the Semantic Web
Lecture: Ontologies and the Semantic WebLecture: Ontologies and the Semantic Web
Lecture: Ontologies and the Semantic Web
 
MongoDB
MongoDBMongoDB
MongoDB
 
DSpace 7 - The Power of Configurable Entities
DSpace 7 - The Power of Configurable EntitiesDSpace 7 - The Power of Configurable Entities
DSpace 7 - The Power of Configurable Entities
 
Elasticsearch for beginners
Elasticsearch for beginnersElasticsearch for beginners
Elasticsearch for beginners
 
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Sp...
 
Moving from SQL Server to MongoDB
Moving from SQL Server to MongoDBMoving from SQL Server to MongoDB
Moving from SQL Server to MongoDB
 
Getting started with DSpace 7 REST API
Getting started with DSpace 7 REST APIGetting started with DSpace 7 REST API
Getting started with DSpace 7 REST API
 
PostgreSQL + Kafka: The Delight of Change Data Capture
PostgreSQL + Kafka: The Delight of Change Data CapturePostgreSQL + Kafka: The Delight of Change Data Capture
PostgreSQL + Kafka: The Delight of Change Data Capture
 
Web Intelligence - Tutorial1
Web Intelligence - Tutorial1Web Intelligence - Tutorial1
Web Intelligence - Tutorial1
 
Introduction to Spring Framework
Introduction to Spring FrameworkIntroduction to Spring Framework
Introduction to Spring Framework
 
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 20190-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
 
Apache Jackrabbit Oak - Scale your content repository to the cloud
Apache Jackrabbit Oak - Scale your content repository to the cloudApache Jackrabbit Oak - Scale your content repository to the cloud
Apache Jackrabbit Oak - Scale your content repository to the cloud
 
A Brief History of Database Management (SQL, NoSQL, NewSQL)
A Brief History of Database Management (SQL, NoSQL, NewSQL)A Brief History of Database Management (SQL, NoSQL, NewSQL)
A Brief History of Database Management (SQL, NoSQL, NewSQL)
 
Introduction of Java GC Tuning and Java Java Mission Control
Introduction of Java GC Tuning and Java Java Mission ControlIntroduction of Java GC Tuning and Java Java Mission Control
Introduction of Java GC Tuning and Java Java Mission Control
 
Intro to elasticsearch
Intro to elasticsearchIntro to elasticsearch
Intro to elasticsearch
 
Towards an Open Research Knowledge Graph
Towards an Open Research Knowledge GraphTowards an Open Research Knowledge Graph
Towards an Open Research Knowledge Graph
 
Reactive Microservices with Spring 5: WebFlux
Reactive Microservices with Spring 5: WebFlux Reactive Microservices with Spring 5: WebFlux
Reactive Microservices with Spring 5: WebFlux
 
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...
 
FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principlesFOOPS!: An Ontology Pitfall Scanner for the FAIR principles
FOOPS!: An Ontology Pitfall Scanner for the FAIR principles
 
Mongo db intro.pptx
Mongo db intro.pptxMongo db intro.pptx
Mongo db intro.pptx
 

Similar to Oxford Common File Layout (OCFL)

The Oxford Common File Layout: A common approach to digital preservation
The Oxford Common File Layout: A common approach to digital preservationThe Oxford Common File Layout: A common approach to digital preservation
The Oxford Common File Layout: A common approach to digital preservation
Simeon Warner
 
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata Matters
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata MattersAlphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata Matters
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata Matters
New York University
 
Hypatia for dlf 2011
Hypatia for dlf 2011Hypatia for dlf 2011
Hypatia for dlf 2011
DLFCLIR
 
Fedora Overview
Fedora OverviewFedora Overview
Fedora Overview
eposthumus
 
Memory Analysis of the Dalvik (Android) Virtual Machine
Memory Analysis of the Dalvik (Android) Virtual MachineMemory Analysis of the Dalvik (Android) Virtual Machine
Memory Analysis of the Dalvik (Android) Virtual Machine
Andrew Case
 
Fedora Commons in the CLARIN Infrastructure
Fedora Commons in the CLARIN InfrastructureFedora Commons in the CLARIN Infrastructure
Fedora Commons in the CLARIN Infrastructure
Menzo Windhouwer
 
Using Fedora Commons To Create A Persistent Archive
Using Fedora Commons To Create A Persistent ArchiveUsing Fedora Commons To Create A Persistent Archive
Using Fedora Commons To Create A Persistent Archive
Phil Cryer
 
Archival Technologies
Archival TechnologiesArchival Technologies
Archival Technologies
Cliff Landis
 
Sword Cetis 2007 06 29
Sword Cetis 2007 06 29Sword Cetis 2007 06 29
Sword Cetis 2007 06 29
Julie Allinson
 
Sword Cetis 2007 06 29
Sword Cetis 2007 06 29Sword Cetis 2007 06 29
Sword Cetis 2007 06 29
Sheila MacNeill
 
2.28.18 Getting Started with Fedora presentation slides
2.28.18 Getting Started with Fedora presentation slides2.28.18 Getting Started with Fedora presentation slides
2.28.18 Getting Started with Fedora presentation slides
DuraSpace
 
Audio MD Metadata Scheme
Audio MD Metadata SchemeAudio MD Metadata Scheme
Audio MD Metadata Scheme
Ariel Hess
 
FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout
Carole Goble
 
OCFL v1.0
OCFL v1.0OCFL v1.0
OCFL v1.0
Simeon Warner
 
OpenAIRE webinar: Principles of Research Data Management, with S. Venkatarama...
OpenAIRE webinar: Principles of Research Data Management, with S. Venkatarama...OpenAIRE webinar: Principles of Research Data Management, with S. Venkatarama...
OpenAIRE webinar: Principles of Research Data Management, with S. Venkatarama...
OpenAIRE
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Anita de Waard
 
Eclipse Memory Analyzer
Eclipse Memory AnalyzerEclipse Memory Analyzer
Eclipse Memory Analyzer
nayashkova
 
Digital Libraries
Digital LibrariesDigital Libraries
Digital Libraries
Jack Eapen
 
Digital Libraries
Digital LibrariesDigital Libraries
Digital Libraries
Jack Eapen
 
Sword 2007 06 22
Sword 2007 06 22Sword 2007 06 22
Sword 2007 06 22
Julie Allinson
 

Similar to Oxford Common File Layout (OCFL) (20)

The Oxford Common File Layout: A common approach to digital preservation
The Oxford Common File Layout: A common approach to digital preservationThe Oxford Common File Layout: A common approach to digital preservation
The Oxford Common File Layout: A common approach to digital preservation
 
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata Matters
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata MattersAlphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata Matters
Alphabet soup: CDM, VRA, CCO, METS, MODS, RDF - Why Metadata Matters
 
Hypatia for dlf 2011
Hypatia for dlf 2011Hypatia for dlf 2011
Hypatia for dlf 2011
 
Fedora Overview
Fedora OverviewFedora Overview
Fedora Overview
 
Memory Analysis of the Dalvik (Android) Virtual Machine
Memory Analysis of the Dalvik (Android) Virtual MachineMemory Analysis of the Dalvik (Android) Virtual Machine
Memory Analysis of the Dalvik (Android) Virtual Machine
 
Fedora Commons in the CLARIN Infrastructure
Fedora Commons in the CLARIN InfrastructureFedora Commons in the CLARIN Infrastructure
Fedora Commons in the CLARIN Infrastructure
 
Using Fedora Commons To Create A Persistent Archive
Using Fedora Commons To Create A Persistent ArchiveUsing Fedora Commons To Create A Persistent Archive
Using Fedora Commons To Create A Persistent Archive
 
Archival Technologies
Archival TechnologiesArchival Technologies
Archival Technologies
 
Sword Cetis 2007 06 29
Sword Cetis 2007 06 29Sword Cetis 2007 06 29
Sword Cetis 2007 06 29
 
Sword Cetis 2007 06 29
Sword Cetis 2007 06 29Sword Cetis 2007 06 29
Sword Cetis 2007 06 29
 
2.28.18 Getting Started with Fedora presentation slides
2.28.18 Getting Started with Fedora presentation slides2.28.18 Getting Started with Fedora presentation slides
2.28.18 Getting Started with Fedora presentation slides
 
Audio MD Metadata Scheme
Audio MD Metadata SchemeAudio MD Metadata Scheme
Audio MD Metadata Scheme
 
FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout
 
OCFL v1.0
OCFL v1.0OCFL v1.0
OCFL v1.0
 
OpenAIRE webinar: Principles of Research Data Management, with S. Venkatarama...
OpenAIRE webinar: Principles of Research Data Management, with S. Venkatarama...OpenAIRE webinar: Principles of Research Data Management, with S. Venkatarama...
OpenAIRE webinar: Principles of Research Data Management, with S. Venkatarama...
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
 
Eclipse Memory Analyzer
Eclipse Memory AnalyzerEclipse Memory Analyzer
Eclipse Memory Analyzer
 
Digital Libraries
Digital LibrariesDigital Libraries
Digital Libraries
 
Digital Libraries
Digital LibrariesDigital Libraries
Digital Libraries
 
Sword 2007 06 22
Sword 2007 06 22Sword 2007 06 22
Sword 2007 06 22
 

More from Simeon Warner

Questioning Authority Lookup Service: Linking the Data
Questioning Authority Lookup Service: Linking the DataQuestioning Authority Lookup Service: Linking the Data
Questioning Authority Lookup Service: Linking the Data
Simeon Warner
 
OCFL: A Shared Approach to Preservation Persistence
OCFL: A Shared Approach to Preservation PersistenceOCFL: A Shared Approach to Preservation Persistence
OCFL: A Shared Approach to Preservation Persistence
Simeon Warner
 
Welcome to the FOLIO Community
Welcome to the FOLIO CommunityWelcome to the FOLIO Community
Welcome to the FOLIO Community
Simeon Warner
 
Sinopia & FOLIO: Bridging the gap to linked data cataloging
Sinopia & FOLIO: Bridging the gap to linked data cataloging Sinopia & FOLIO: Bridging the gap to linked data cataloging
Sinopia & FOLIO: Bridging the gap to linked data cataloging
Simeon Warner
 
FOLIO and Linked Data
FOLIO and Linked DataFOLIO and Linked Data
FOLIO and Linked Data
Simeon Warner
 
IIIF Technical Specification Status Update
IIIF Technical Specification Status UpdateIIIF Technical Specification Status Update
IIIF Technical Specification Status Update
Simeon Warner
 
LKG Editor Dev
LKG Editor DevLKG Editor Dev
LKG Editor Dev
Simeon Warner
 
Don't bold the field name!
Don't bold the field name!Don't bold the field name!
Don't bold the field name!
Simeon Warner
 
Samvera and IIIF 2018
Samvera and IIIF 2018Samvera and IIIF 2018
Samvera and IIIF 2018
Simeon Warner
 
ORCID @ Cornell
ORCID @ CornellORCID @ Cornell
ORCID @ Cornell
Simeon Warner
 
From Open Annotations to W3C Web Annotations (and the impact on IIIF Present...
From Open Annotations to W3C Web Annotations (and the impact on IIIF Present...From Open Annotations to W3C Web Annotations (and the impact on IIIF Present...
From Open Annotations to W3C Web Annotations (and the impact on IIIF Present...
Simeon Warner
 
Introduction to the IIIF Presentation API (@SWIB17)
Introduction to the IIIF Presentation API (@SWIB17)Introduction to the IIIF Presentation API (@SWIB17)
Introduction to the IIIF Presentation API (@SWIB17)
Simeon Warner
 
Introduction to the International Image Interoperability Framework (IIIF)
Introduction to the International Image Interoperability Framework (IIIF)Introduction to the International Image Interoperability Framework (IIIF)
Introduction to the International Image Interoperability Framework (IIIF)
Simeon Warner
 
From Open Access to Open Standards, (Linked) Data and Collaborations
From Open Access to Open Standards, (Linked) Data and CollaborationsFrom Open Access to Open Standards, (Linked) Data and Collaborations
From Open Access to Open Standards, (Linked) Data and Collaborations
Simeon Warner
 
Mind the gap! Reflections on the state of repository data harvesting
Mind the gap! Reflections on the state of repository data harvestingMind the gap! Reflections on the state of repository data harvesting
Mind the gap! Reflections on the state of repository data harvesting
Simeon Warner
 
ORCID & other Person iDs
ORCID & other Person iDsORCID & other Person iDs
ORCID & other Person iDs
Simeon Warner
 
Who's the Author? Identifier soup - ORCID, ISNI, LC NACO and VIAF
Who's the Author? Identifier soup - ORCID, ISNI, LC NACO and VIAFWho's the Author? Identifier soup - ORCID, ISNI, LC NACO and VIAF
Who's the Author? Identifier soup - ORCID, ISNI, LC NACO and VIAF
Simeon Warner
 
IIIF without an image server? No problem!
IIIF without an image server? No problem!IIIF without an image server? No problem!
IIIF without an image server? No problem!
Simeon Warner
 
IIIF Technical Specification Status Update
IIIF Technical Specification Status UpdateIIIF Technical Specification Status Update
IIIF Technical Specification Status Update
Simeon Warner
 
Discovery of IIIF Resources
Discovery of IIIF ResourcesDiscovery of IIIF Resources
Discovery of IIIF Resources
Simeon Warner
 

More from Simeon Warner (20)

Questioning Authority Lookup Service: Linking the Data
Questioning Authority Lookup Service: Linking the DataQuestioning Authority Lookup Service: Linking the Data
Questioning Authority Lookup Service: Linking the Data
 
OCFL: A Shared Approach to Preservation Persistence
OCFL: A Shared Approach to Preservation PersistenceOCFL: A Shared Approach to Preservation Persistence
OCFL: A Shared Approach to Preservation Persistence
 
Welcome to the FOLIO Community
Welcome to the FOLIO CommunityWelcome to the FOLIO Community
Welcome to the FOLIO Community
 
Sinopia & FOLIO: Bridging the gap to linked data cataloging
Sinopia & FOLIO: Bridging the gap to linked data cataloging Sinopia & FOLIO: Bridging the gap to linked data cataloging
Sinopia & FOLIO: Bridging the gap to linked data cataloging
 
FOLIO and Linked Data
FOLIO and Linked DataFOLIO and Linked Data
FOLIO and Linked Data
 
IIIF Technical Specification Status Update
IIIF Technical Specification Status UpdateIIIF Technical Specification Status Update
IIIF Technical Specification Status Update
 
LKG Editor Dev
LKG Editor DevLKG Editor Dev
LKG Editor Dev
 
Don't bold the field name!
Don't bold the field name!Don't bold the field name!
Don't bold the field name!
 
Samvera and IIIF 2018
Samvera and IIIF 2018Samvera and IIIF 2018
Samvera and IIIF 2018
 
ORCID @ Cornell
ORCID @ CornellORCID @ Cornell
ORCID @ Cornell
 
From Open Annotations to W3C Web Annotations (and the impact on IIIF Present...
From Open Annotations to W3C Web Annotations (and the impact on IIIF Present...From Open Annotations to W3C Web Annotations (and the impact on IIIF Present...
From Open Annotations to W3C Web Annotations (and the impact on IIIF Present...
 
Introduction to the IIIF Presentation API (@SWIB17)
Introduction to the IIIF Presentation API (@SWIB17)Introduction to the IIIF Presentation API (@SWIB17)
Introduction to the IIIF Presentation API (@SWIB17)
 
Introduction to the International Image Interoperability Framework (IIIF)
Introduction to the International Image Interoperability Framework (IIIF)Introduction to the International Image Interoperability Framework (IIIF)
Introduction to the International Image Interoperability Framework (IIIF)
 
From Open Access to Open Standards, (Linked) Data and Collaborations
From Open Access to Open Standards, (Linked) Data and CollaborationsFrom Open Access to Open Standards, (Linked) Data and Collaborations
From Open Access to Open Standards, (Linked) Data and Collaborations
 
Mind the gap! Reflections on the state of repository data harvesting
Mind the gap! Reflections on the state of repository data harvestingMind the gap! Reflections on the state of repository data harvesting
Mind the gap! Reflections on the state of repository data harvesting
 
ORCID & other Person iDs
ORCID & other Person iDsORCID & other Person iDs
ORCID & other Person iDs
 
Who's the Author? Identifier soup - ORCID, ISNI, LC NACO and VIAF
Who's the Author? Identifier soup - ORCID, ISNI, LC NACO and VIAFWho's the Author? Identifier soup - ORCID, ISNI, LC NACO and VIAF
Who's the Author? Identifier soup - ORCID, ISNI, LC NACO and VIAF
 
IIIF without an image server? No problem!
IIIF without an image server? No problem!IIIF without an image server? No problem!
IIIF without an image server? No problem!
 
IIIF Technical Specification Status Update
IIIF Technical Specification Status UpdateIIIF Technical Specification Status Update
IIIF Technical Specification Status Update
 
Discovery of IIIF Resources
Discovery of IIIF ResourcesDiscovery of IIIF Resources
Discovery of IIIF Resources
 

Recently uploaded

WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
AstuteBusiness
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Tatiana Kojar
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
alexjohnson7307
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
Miro Wengner
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
Dinusha Kumarasiri
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
Shinana2
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
ScyllaDB
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
LucaBarbaro3
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 

Recently uploaded (20)

WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 

Oxford Common File Layout (OCFL)

  • 1. Oxford Common File Layout Rosalyn Metz (Emory), Simeon Warner (Cornell) Samvera Connect 2018 http://bit.ly/ocfl-samcon2018
  • 2. Not just us... OCFL Editorial Group ● Andrew Hankinson (Oxford) ● Neil Jefferies (Oxford) ● Julian Morley (Stanford) ● Andrew Woods (DuraSpace) ● and us (Rosalyn and Simeon) Community input from pasig-discuss and ocfl-community groups, and from others
  • 4. BagIt Well established and implemented specification for handling sets of files ● Being formally standardized as RFC: https://tools.ietf.org/html/draft-kunze-bagit-17 ● Used for transfer and (somewhat less) for files at rest ● Good fixity support ● No explicit versioning support ○ Could use local conventions for version inside a bag ○ Could use bag-per-version
  • 5. Moab: A Brief History Slides adapted from Julian Morley's in the OR2018 OCFL presentation ● Moab is the closest ancestor of OCFL ● Developed at Stanford Libraries by Richard Anderson ○ Article: http://journal.code4lib.org/articles/8482 ● Named after Moab, UT
  • 6. Moab: A Brief History ● Moab is a versioned, forward delta file structure that supports fixity and file de-duplication. ● You can preserve anything with it (even cat pictures found on the internet) ● The tools to manage and create Moabs are open source Ruby gem ○ https://github.com/sul-dlss/moab-versioning
  • 7. Moab is part of the Stanford Digital Repository Here be Moabs!
  • 8. Moab in Practice @ Stanford We have many Moabs in the SDR ● 1.6 million Moab objects ● 5 million version directories ● 50+ million files ● 500+ TB of data (25TB added last month) ● Spread across 15 NFS volumes on NetApp filers ● Backed up by IBM Spectrum Protect (formerly TSM) ○ 1 tape copy kept in local tape frame;1 sent to Iron Mountain
  • 9. ab123cd4567 v0001 data content title.jpg intro.jpg page1.jpg page2.jpg page3.jpg metadata versionMetadata.xml descMetadata.xml identityMetadata.xml manifests versionInventory.xml signatureCatalog.xml versionAdditions.xml fileInventoryDifference.xml manifestInventory.xml v0002 data content page2.jpg metadata versionMetadata.xml technicalMetadata.xml manifests versionInventory.xml signatureCatalog.xml versionAdditions.xml fileInventoryDifference.xml manifestInventory.xml two version directories; /v0001 & /v0002 A sample Moab object on disk /data content comes from upstream and could be anything, but our systems create data in /content and /metadata directories. /manifests directories are for Moab metadata. This is where we store all the checksums and change information for deduplication and forward deltas.
  • 11.
  • 12. CULAR @ 2017 It worked, what now? ● Fedora 3 no longer being developed, Fedora 4 not an appropriate option ● Decision not to buy "preservation services", primarily on cost grounds ● Decision that we want one local copy for legal access reasons Short term ⇒ use local disk and AWS S3. Build tools over filesystem and object stores
  • 13.
  • 14. Those files sure are piling up! Nearly 100TB now, planning 100TB/year digitization ● Plan to purchase a scalable local (object) storage system for 1 copy ● Two more copies in cloud (perhaps tape) ● Content will outlast any application or software system ● Content will outlast any storage system ● Expect change and hence migration ⇒ KISS
  • 15.
  • 16.
  • 18. Shared Cornell and OCFL Goals ● Provide an application and vendor neutral storage arrangement that can be used with filesystems and object stores ○ Allow easy replication between multiple storage environments ○ Allow easy migration between storage systems (modulo the inherent burdens) ○ Allow use with multiple and changing applications ● Support package versioning at low cost (complexity and storage use) ● Support internal package validation for completeness and fixity ● Support audit and self-description of entire store ● Have an easy migration path from current archival storage arrangements ● Develop a shared model that is useful at multiple institutions so that all benefit from community developed tools and expertise.
  • 20. Lessons from Emory: Deliverables Actively engaged in a multi-year effort to gather requirements, design, and develop a digital repository based on the Samvera framework. Selected deliverables included... Develop object definitions/types (e.g. collections, objects, other entities) and their relationships to one another; determine preservation objects inside and outside of Fedora. Identify needs for AIP structure. Identify storage requirements (e.g. number of copies, file access scenarios)
  • 21. Lessons from Emory: Identified requirements The means to distribute digital objects to third-party preservation services. A well understood and well documented model for storing digital objects. Ability to place multiple copies of digital objects into diverse storage services (AWS, local storage, etc.). Easily allow for fixity checking of digital objects.
  • 22. Digital Object Content Files (Primary or Supplemental) Content file 1 Content file 2 Content file 3... … + additional … + additional The content itself: relationships provided in structural metadata Metadata (Actionable/Indexed) Desc. metadata Technical metadata (File-level) Preservation Events/Audits Administrative metadata Structural metadata (PCDM) Metadata converted to RDF for Hyrax/Fedora - editable and/or searchable Supplemental Preservation Files (Metadata/Administrative Files) Source Metadata (binary file) Desc. Metadata record (binary file) METS (binary file) License/agreement (binary file) Supplemental PREMIS (binary file) Variable supplemental info stored as files (not directly system-readable): staff can view or download file to read it
  • 23. Collection Ancient Egyptian Collection Administrative Collection Carlos Museum Administrative Collections reflect the process the libraries followed when deciding to collect materials. Digital Objects must be a part of an Administrative Collection and optionally in one or more Collections Digital Objects may contain one or more files Digital Objects, Collections receive Emory-defined metadata and relationships Major Emory Entities PCDM Context - Simple Example Individual Agreements contain information about the Administrative Collection. Individual Agreements may contain one or more files Individual Agreements are assigned to objects through their parent Collection Is a member of Is a member of Individual Agreement Carlos Museum Agreement Digital Object Statuette of a Cat. Collection Divine Felines Exhibition Is a member of Is a member of
  • 25. OCFL Requirements 1) Completeness, so that a repository can be rebuilt from the files it stores, 2) Parsability, both by humans and machines, most importantly in the absence of original software, 3) Robustness, against errors, corruption, and migration between storage technologies, and 4) Storage, on a variety of infrastructures including cloud object stores. Many existing digital preservation standards like: ● TDR (ISO 16363) ● OAIS (ISO 14721) ● NDSA Levels of Preservation ● BagIt discuss the need for these requirements, but none provided a standardized way for how to do it.
  • 27. OCFL Object A group of one or more content files and administrative information identified by a URI. The object may contain a sequence of versions of the files organized into version directories. The base directory of the object may contain a logs directory. A NAMASTE file indicating conformance. An object contains an inventory digest file which provides a digest for the inventory.json file. [object root] ├── 0=ocfl_object_1.0 ├── inventory.json ├── inventory.json.sha512 ├── v1 │ ├── empty.txt │ ├── foo │ │ └── bar.xml │ ├── image.tiff │ ├── inventory.json │ └── inventory.json.sha512 ├── v2 │ ├── foo │ │ └── bar.xml │ ├── inventory.json │ └── inventory.json.sha512 └── v3 ├── inventory.json └── inventory.json.sha512
  • 28. OCFL Object An object contains an inventory.json file which inventories the contents of an object. The manifest block lists all the digests and existing file paths for all of the object’s content. The versions block identifies the logical file path and the digest for each version of the object’s content. Separating the logical file path from the existing file path and using digests to refer to files allows for deduplication of content. { "head": "v3", "id": "ark:/12345/bcd987", "manifest": { "4d27c8...b53": [ "v2/foo/bar.xml" ], "7dcc35...c31": [ "v1/foo/bar.xml" ], "cf83e1...a3e": [ "v1/empty.txt" ], "ffccf6...62e": [ "v1/image.tiff" ] }, "type": "Object", "versions": [ { "created": "2018-01-01T01:01:01Z", "message": "Initial import", "state": { "7dcc35...c31": [ "foo/bar.xml" ], "cf83e1...a3e": [ "empty.txt" ], "ffccf6...62e": [ "image.tiff" ] }, "type": "Version", "user": { "address": "alice@example.com", "name": "Alice" }, "version": "v1" }, { "created": "2018-02-02T02:02:02Z", "message": "Fix bar.xml, remove image.tiff,
  • 29. OCFL Storage Root The base directory of an OCFL storage layout. Should also contain the OCFL specification in human-readable plain-text format. Should contain the conformance declaration OCFL Objects may conform to the same or earlier version of the specification. The storage hierarchy must terminate with an OCFL Object Root. [storage root] ├── 0=ocfl_1.0 ├── ocfl_1.0.txt (optional) ├── ab12cd34 │ ├── 0=ocfl_object_1.0 │ ├── inventory.json │ ├── inventory.json.sha512 │ └── v1 │ ├── file.txt │ ├── inventory.json │ └── inventory.json.sha512 └── ef56gh78 . ├── 0=ocfl_object_1.0 ├── inventory.json ├── inventory.json.sha512 ├── v1 │ ├── empty.txt │ ├── foo │ │ └── bar.xml │ ├── image.tiff │ ├── inventory.json │ └── inventory.json.sha512 └── v2 ├── foo │ └── bar.xml ├── inventory.json └── inventory.json.sha512
  • 30. OCFL Storage Root Storage hierarchies must not include files within intermediate directories Storage hierarchies must be terminated by OCFL Object Roots Storage hierarchies within the same OCFL Storage Root should use just one layout pattern Storage hierarchies within the same OCFL Storage Root should consistently use either a directory hierarchy of OCFL Objects or top-level OCFL Objects [storage root] ├── 0=ocfl_1.0 ├── ocfl_1.0.txt (optional) └── ab └── 12 └── cd └── 34 └── ab12cd34 ├── 0=ocfl_object_1.0 ├── inventory.json ├── inventory.json.sha512 ├── v1 │ ├── empty.txt │ ├── foo │ │ └── bar.xml │ ├── image.tiff │ ├── inventory.json │ └── inventory.json.sha512 └── v2 ├── foo │ └── bar.xml ├── inventory.json └── inventory.json.sha512
  • 32. Rebuildability ● Key OCFL goal -- be able to rebuild repo from an OCFL storage root ● Therefore, in OAIS terms: must include all the descriptive, administrative, structural, representation, and preservation metadata relevant to the object. ● Optionally include copy of spec in top level of OCFL storage root ● More complete option would be a specific OCFL object that contains this documentation and to have a pointer to its location in the storage root. e.g. permissions, access, and creation times ● not portable between filesystems ● not preservable through file transfer operations ● ill-defined fixity ⇒ out-of-scope If important, use filesystem image format or extract as metadata Filesystem metadata
  • 33. Empty Directories ● OCFL preserves files and their content ● Directories serve as an organizational convention ● Empty directories not directly supported ⇒ Use zero-length `.keep` file as necessary (ala. `git`, BagIt) Only special files are the inventory, its digest file, and conformance declaration files Otherwise OCFL makes no distinction between different types of files. ⇒ Use local conventions as needed Data and Metadata
  • 34. Storage ● Filesystem or Object Store -- you choose ● Original filename or Normalized filename -- you choose ● Deduplication & Forward delta differencing (at file level) -- optional but likely desirable/normal "logical file path" - path of file in content as part of state for a particular version "existing file path" - path of file in OCFL object content addressing ties these two together
  • 35. Storage Root Hierarchy - flat, pairtree, ex-wye-zee [storage_root] ├── 0=ocfl_1.0 ├── ocfl_1.0.txt (optional) ├── d45be626e024 | ├── 0=ocfl_object_1.0 | ├── inventory.json | ├── inventory.json.sha512 | └── v1... ├── d45be626e036 | ├── 0=ocfl_object_1.0 | ├── inventory.json | ├── inventory.json.sha512 | └── v1... ├── 3104edf0363a | ├── 0=ocfl_object_1.0 | ├── inventory.json | ├── inventory.json.sha512 | └── v1... [storage_root] ├── 0=ocfl_1.0 ├── ocfl_1.0.txt (optional) ├── d4 | └── 5b | └── e6 | └── 26 | └── e0 | ├── 24 | | └──d45be626e024 | | ├── 0=ocfl_object_1.0 | | └── ... | └── 36 | └──d45be626e036 | ├── 0=ocfl_object_1.0 | └── ...
  • 36. File operations (mungification?) ● Inheritance ● Addition ● Updating ● Renaming ● Deletion ● Reinstatement ● Purging ⇒ choices: a. rebuild new object b. break immutability and rewrite (not recommended) Yes - OCFL supports that...
  • 37. Version Immutability OCFL supports systems where versions (everything in a given version directory) is immutable once written. ● It is recommended to follow this practice ● BUT you can rewrite objects if you really want to, but OCFL supports (in fact, enforces for internal references) deduplication through digests ● Only within an object ● File level ● sha512 digest recommended Deduplication
  • 38. Forward Delta Each version need only include new and changed files ● Files from previous version included by reference ● Reference by content (digest) supports renaming without duplicating (You can avoid this and include files again if you really want. But why?) 1. Digests used for reference already provide basis for strong fixity checks (pref. sha512) 2. Additional digests may be include to support legacy fixity information (e.g. md5) (Fixity of inventory files themselves handled by sidecar file, e.g. inventory.json.sha512) Fixity
  • 39. Log Information log directory in OCFL object available for information not in objects content and not versioned ● form not specified ● will be ignored in object validation Objects with many small file may cause problems with some storage infrastructures and may make validation/fixity time consuming ● package in single file (ZIP recommend) (Options for a later version of the OCFL spec are ZIPped objects and/or ZIP by version) Small Files
  • 40. Roadmap Alpha (yesterday) ● Released(ish) on October 10 community call (OCFL Editors and PASIG Discuss) ● Feedback for November community call Beta (date based on feedback) ● Experimental validation tool ● Determine what other groups communities to seek input from Release 1.0 (2019) ● One production-ready validator ● Test suite and fixture objects ● Two institutions committed to backing the initiative (should define that)