6. "active management of digital content over time to ensure ongoing
access" (LOC)
"series of managed activities necessary to ensure continued access to
digital materials for as long as necessary" (DPC glossary)
Definitions vary, but digital preservation (DP) is certainly more than
mechanics: more than software, hardware, networks, 0s and 1s. DP also
includes governance, fiscal commitment, and a designated community.
What is digital preservation?
7. Digital preservation is:
▪ Software and hardware
▪ Networks of systems that manage content
▪ The actual bits and bytes of data that we've decided need preserving
and the actions that we take to preserve them
But significantly, it is also:
▪ Governance
▪ Fiscal commitment
▪ A community
What is digital preservation?
8. Three-legged stool
Archivematica easily fits in as the
technology leg of the three-legged
digital preservation stool, but it also
helps fulfil organizational elements
like preservation planning.
9. Organizational Infrastructure
What are the requirements and parameters for the organization's digital
preservation program?
Technological Infrastructure
How will the organization meet defined digital preservation requirements?
Resources Framework
What resources will it take to develop and maintain the organization's
digital preservation program?
What is digital preservation?
10. - Ingests digital content and performs various tasks to make
sure the same content is secure, identifiable, accessible and
capable of supporting the presumption of authenticity over
time
- A framework of policies and processes
- Helps to couple actions that enable long-term management
and accessibility of digital objects
What is a digital preservation
system?
11. What is a digital preservation
system?
A system built from tools that
perform a variety of
functions to ensure the
integrity and authenticity of
digital content
▪ Identifier assignment
▪ File format identification
▪ File format validation
▪ Metadata extraction
▪ Fixity checking
▪ Normalization
▪ Metadata generation
▪ AIP packaging
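To illustrate one function from this list, fixity checking amounts to recomputing a file's checksum and comparing it with a value recorded earlier. The sketch below uses only Python's standard library. The function names and the choice of SHA-256 are assumptions made for illustration, and this is not Archivematica's own code.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path, expected_hex):
    """True if the file's current checksum matches the stored checksum."""
    return sha256_of(path) == expected_hex.lower()


def demo():
    """Record a checksum, verify it, change the file, and verify again."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "letter.doc")
        with open(path, "wb") as handle:
            handle.write(b"original bits")
        stored = sha256_of(path)          # checksum recorded at ingest
        before = fixity_ok(path, stored)  # file unchanged
        with open(path, "ab") as handle:
            handle.write(b"!")            # simulate corruption
        after = fixity_ok(path, stored)   # file no longer matches
    return before, after
```

Reading in chunks keeps memory use flat for large audio, video or disk image files.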
13. - "Audit and certification of trustworthy digital
repositories" (ISO 16363): sets out comprehensive
metrics for what an archive must do
- Based on the OAIS functional model
- Archivematica fulfills many of the digital object
management criteria in ISO 16363, but other
aspects must be fulfilled using
complementary systems
Archivematica and Trusted
Digital Repositories (TDRs)
14. What does Archivematica
promise?
- Standards-based
- Open source
- Make system-agnostic, self-describing AIPs
- Connect to producer / consumer interfaces
- Connect to policies (preservation planning)
15. Development timeline
2008 to 2010
- Qubit-OAIS conceived of as back end for ICA-AtoM
- Qubit-OAIS decoupled from ICA-AtoM and renamed Archivematica
- City of Vancouver Archives and UNESCO fund the first alpha and
beta releases
2014
- Archivematica 1.0 is released
2018
- Archivematica 1.7 was released on May 1
19. BagIt
Standard for packaging multilevel, hierarchical content,
developed by the Library of Congress (USA)
METS
XML schema for encoding descriptive, administrative, and
technical metadata, also developed by the Library of Congress
Standards, standards, standards
20. PREMIS
Standard for defining preservation metadata, including
technical information about objects and information about
the actions taken on the objects in the preservation
repository
Dublin Core (ISO 15836:2009)
Standard for capturing descriptive metadata, developed by
the Dublin Core Metadata Initiative
Standards, standards, standards
21. PRONOM
Technical registry providing impartial and definitive
information about file formats, software products and other
technical components required to support long-term access
to electronic records, developed and maintained by the
National Archives of the UK.
Standards, standards, standards
25. - PREMIS (Preservation Metadata: Implementation Strategies) is the recognized standard for
metadata about objects in a digital preservation system.
- It captures technical information about an object in order to support the implementation of
preservation strategies such as normalization, migration or emulation (PREMIS Object)
- It describes relationships between digital objects (PREMIS Object)
- It provides an audit trail of actions taken by the digital preservation repository to preserve the object
(PREMIS Event)
- It names the individuals, organizations and software tools responsible for taking actions to preserve
digital objects (PREMIS Agent)
- It specifies the actions a repository is allowed to take to preserve digital objects (PREMIS Rights)
What is PREMIS for?
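The roles of Object, Event and Agent can be sketched as a small data model. This is a deliberately simplified illustration in plain Python dataclasses, not the PREMIS XML schema. All class names, field names and example values are hypothetical, and Rights is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    """Who or what acted: a person, an organization or a software tool."""
    name: str
    agent_type: str


@dataclass
class Event:
    """One action taken on an object, linked to the responsible agent."""
    event_type: str
    date_time: str
    agent: Agent


@dataclass
class PreservedObject:
    """A digital object with its relationships and its audit trail."""
    identifier: str
    original_name: str
    derived_from: str = ""  # identifier of the source object, if any
    events: List[Event] = field(default_factory=list)

    def audit_trail(self):
        """List the recorded events as human-readable lines."""
        return [
            f"{e.date_time} {e.event_type} by {e.agent.name}"
            for e in self.events
        ]


def example_object():
    """Build a normalized copy that points back to its source object."""
    tool = Agent(name="example-tool", agent_type="software")
    copy = PreservedObject(
        identifier="object-2",
        original_name="letter.pdf",
        derived_from="object-1",
    )
    copy.events.append(
        Event("normalization", "2018-05-01T12:00:00", tool)
    )
    return copy
```

Calling `example_object().audit_trail()` yields one line per recorded event, which is the kind of provenance record that PREMIS Events and Agents are meant to support.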
26. What is METS for?
- METS, or Metadata Encoding and Transmission Standard, was designed to support inter-repository
data exchange.
- It provides a wrapper for other metadata, such as PREMIS and Dublin Core.
- It defines relationships between digital objects and other digital objects, and between digital objects
and their metadata.
- It can be used to provide technical metadata about digital objects, although Archivematica doesn't
implement it that way (we wrap PREMIS in it instead)
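The "wrapper" idea can be illustrated by building a METS dmdSec that embeds Dublin Core elements through mdWrap and xmlData. This is a minimal sketch using Python's standard xml.etree.ElementTree. The function name is hypothetical, and the structure is simplified compared with a real Archivematica METS file: it places the dc elements directly inside xmlData and omits any intermediate container element.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Use readable prefixes when the tree is serialized.
ET.register_namespace("mets", METS_NS)
ET.register_namespace("dc", DC_NS)


def build_dmdsec(dmd_id, title, creator):
    """Return a mets:dmdSec element wrapping two Dublin Core elements."""
    dmdsec = ET.Element(f"{{{METS_NS}}}dmdSec", {"ID": dmd_id})
    # mdWrap: the metadata is embedded in the METS file, not referenced.
    wrap = ET.SubElement(dmdsec, f"{{{METS_NS}}}mdWrap", {"MDTYPE": "DC"})
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    ET.SubElement(xml_data, f"{{{DC_NS}}}title").text = title
    ET.SubElement(xml_data, f"{{{DC_NS}}}creator").text = creator
    return dmdsec


if __name__ == "__main__":
    section = build_dmdsec(
        "dmdSec_1", "Pictures at an Exhibition", "Mussorgsky, Modest"
    )
    print(ET.tostring(section, encoding="unicode"))
```

PREMIS metadata would be wrapped in the same way inside an amdSec, with a different MDTYPE value.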
27. - Archivematica creates system-agnostic AIPs, meaning
that you do not require a particular system to store and
read AIPs in the future
- AIPs can be stored in any file system that permits
packaged formats (.tar files, .zip files)
- You can migrate AIPs between systems just like any
other type of file or package
System-agnostic packages
35. Deployment
Because Archivematica is not a single application but instead
consists of dozens of different components and tools, there are
many possible deployment configurations
36. Micro-services
• "granular system tasks which operate
on a conceptual entity that is
equivalent to an OAIS information
package" (archivematica.org)
• Micro-services are provided by a
combination of scripts and one or more
FOSS (Free and Open Source) tools
bundled in the Archivematica system
• Each micro-service results in either a
success or error state and the package
is then processed accordingly by the
next micro-service
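The success-or-error chaining described above can be sketched as a list of small functions applied to a package in order. This is an illustrative model only: the service names are hypothetical, the package is a plain dictionary, and halting at the first error is an assumption of this sketch, not a description of how Archivematica routes failed packages.

```python
import uuid


def assign_identifier(package):
    """Hypothetical micro-service: give the package a UUID."""
    package["uuid"] = str(uuid.uuid4())


def check_has_files(package):
    """Hypothetical micro-service: fail if the package is empty."""
    if not package.get("files"):
        raise ValueError("package contains no files")


SERVICES = [
    ("assign identifier", assign_identifier),
    ("check for files", check_has_files),
]


def run_pipeline(package, services):
    """Run each service in order and record a success or error state.

    Assumption of this sketch: processing halts at the first error.
    """
    log = []
    for name, service in services:
        try:
            service(package)
        except Exception:
            log.append((name, "error"))
            package["status"] = "failed"
            return log
        log.append((name, "success"))
    package["status"] = "completed"
    return log


if __name__ == "__main__":
    print(run_pipeline({"files": ["letter.doc"]}, SERVICES))
    print(run_pipeline({"files": []}, SERVICES))
```

Keeping each task small and independent is what allows individual tools to be swapped or upgraded without rewriting the whole workflow.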
Transfer
  • standard
  • bag
  • disk image
SIP
(DIP)
AIP
47. <mets:dmdSec ID="dmdSec_1"> The METS file can have more than one dmdSec
<mets:mdWrap MDTYPE="DC"> mdWrap means the metadata are included in the METS file, not referenced by it
<mets:xmlData>
<dcterms:dublincore xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dc="http://purl.org/dc/elements/1.1/"
xsi:schemaLocation="http://purl.org/dc/terms/ http://dublincore.org/schemas/xmls/qdc/2008/02/11/dcterms.xsd">
<dc:title>Pictures at an Exhibition</dc:title>
<dc:creator>Mussorgsky, Modest</dc:creator>
… etc.
METS DMDSEC
51. <premis:objectCharacteristicsExtension>[raw tool output]</premis:objectCharacteristicsExtension> This is
where technical metadata from ingested files go, having been extracted by tools like FIDO, Siegfried, Exiftool, MediaInfo, etc.
</premis:objectCharacteristics>
<premis:originalName>%transferDirectory%objects/letter.doc</premis:originalName>
<premis:relationship> This information shows a relationship between an ingested file and its normalized version, along with
a relationship to the normalization Event
<premis:relationshipType>derivation</premis:relationshipType>
<premis:relationshipSubType>is source of</premis:relationshipSubType>
<premis:relatedObjectIdentification> This is the relationship to the related normalized file
<premis:relatedObjectIdentifierType>UUID</premis:relatedObjectIdentifierType>
<premis:relatedObjectIdentifierValue>b041d811-879f-4640-8ea…</premis:relatedObjectIdentifierValue>
</premis:relatedObjectIdentification>
<premis:relatedEventIdentification> And this is the relationship to the normalization Event
<premis:relatedEventIdentifierType>UUID</premis:relatedEventIdentifierType>
<premis:relatedEventIdentifierValue>25ccf003-a007-4f12-be…</premis:relatedEventIdentifierValue>
</premis:relatedEventIdentification>
METS AMDSEC: TECHMD CONT'D
58. METS FILESEC CONT'D
<mets:fileSec>
<mets:fileGrp USE="original"> Identifies what the role of the file is in the context of this AIP
<mets:file GROUPID="Group-b041d811-879f-4640-8ea5-821920a81cf9" ID="file-b041d811-879f-4640-8ea...."
ADMID="amdSec_2"> Note the link to the related amdSec, which has all of the PREMIS data in it
<mets:FLocat xlink:href="letter.doc" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/> This shows where the file
is located in relation to other files within the AIP
</mets:file>
<mets:file GROUPID="Group-002db941-78e1-4cbf-9bd9-afe7ef9c7466"
ID="file-002db941-78e1-4cbf-9bd9-afe7ef9c7466" ADMID="amdSec_4">
<mets:FLocat xlink:href="report.doc" LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
</mets:file>
62. <mets:structMap ID="structMap_1" LABEL="Archivematica default" TYPE="physical"> The default Archivematica METS structMap
provides a simple physical listing of the AIP's contents
<mets:div LABEL="Images-298af460-fdf4-4c78-ac8b-2f9266495f77" TYPE="Directory">
<mets:div LABEL="objects" TYPE="Directory" DMDID="dmdSec_1"> This is a link to the AIP's descriptive metadata
<mets:div LABEL="letter.doc" TYPE="Item">
<mets:fptr FILEID="file-b041d811-879f-4640-8ea5-821920a81cf9"/> fptr = file pointer, a link to the relevant entry in the fileSec
</mets:div>
<mets:div LABEL="letter-f3d84155-3df1-427e-9ff8-5b480895372a.pdf" TYPE="Item">
<mets:fptr FILEID="file-eb4a2422-93e2-4c70-ab4c-a56b4eeadab0"/>
</mets:div>
… etc.
</mets:div>
<mets:div LABEL="submissionDocumentation" TYPE="Directory">
<mets:div LABEL="transfer1-540aec2f-9b01-463e-bf16-f12d6b58680c" TYPE="Directory">
<mets:div LABEL="DeedOfGift.pdf" TYPE="Item">
<mets:fptr FILEID="file-612db941-78e1-4cbf-9bd9-afe7ef9c7466"/>
… etc.
METS STRUCTMAP
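The links between structMap, fileSec and amdSec can be followed programmatically: each fptr FILEID points at a file ID in the fileSec, which in turn carries the path (FLocat) and the ADMID of the related amdSec. The sketch below parses a small hand-written METS fragment with Python's standard xml.etree.ElementTree. The sample document, its shortened identifiers and the function name are made up for illustration and are much smaller than a real Archivematica METS file.

```python
import xml.etree.ElementTree as ET

NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}
HREF = "{http://www.w3.org/1999/xlink}href"

# Hand-written sample with shortened, made-up identifiers.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="original">
      <mets:file ID="file-0001" ADMID="amdSec_1">
        <mets:FLocat xlink:href="objects/letter.doc"
            LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap TYPE="physical">
    <mets:div LABEL="objects" TYPE="Directory">
      <mets:div LABEL="letter.doc" TYPE="Item">
        <mets:fptr FILEID="file-0001"/>
      </mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def files_by_label(mets_xml):
    """Map each structMap Item label to (path, ADMID) from the fileSec."""
    root = ET.fromstring(mets_xml)
    # Index the fileSec by file ID.
    locations = {}
    for entry in root.iterfind(".//mets:fileSec//mets:file", NS):
        flocat = entry.find("mets:FLocat", NS)
        locations[entry.get("ID")] = (flocat.get(HREF), entry.get("ADMID"))
    # Follow each file pointer from the structMap into that index.
    result = {}
    items = root.iterfind(".//mets:structMap//mets:div[@TYPE='Item']", NS)
    for div in items:
        fptr = div.find("mets:fptr", NS)
        if fptr is not None:
            result[div.get("LABEL")] = locations.get(fptr.get("FILEID"))
    return result


if __name__ == "__main__":
    print(files_by_label(SAMPLE))
```

The ADMID value returned for each file is the key for locating its PREMIS metadata in the corresponding amdSec.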
64. What does the pointer file describe?
- The format of the AIP (e.g. 7zip format) (PREMIS: OBJECT)
- Information about the compression event (PREMIS:
EVENT)
- The institution and preservation system that performed the
compression (PREMIS: AGENT)
- The location of the AIP (METS: fileGrp)
71. Standard Transfer
• The most common kind of transfer for the kinds of documents we
often have in archives - images, video, audio, documents
• Standard transfers are a directory of digital content
86. Metadata.csv format
- You can mix metadata standards within your CSV, but only
Dublin Core will be natively recognized by Archivematica
(anything else will be categorized as "other")
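The split between recognized Dublin Core columns and "other" columns can be sketched with Python's csv module. This is an illustration under stated assumptions: it assumes the first column holds the file or directory path and that Dublin Core columns are named with a `dc.` or `dcterms.` prefix. The sample rows, the `custom_field` column and the function name are hypothetical, and this is not Archivematica's own parsing logic.

```python
import csv
import io

# Hypothetical sample: one path column, two DC columns, one custom column.
SAMPLE_CSV = (
    "filename,dc.title,dc.creator,custom_field\n"
    "objects/letter.doc,A letter,Someone,abc\n"
)


def classify_columns(csv_text):
    """Split header columns into Dublin Core and 'other' column names."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    dublin_core, other = [], []
    # Assumption: the first column is the file path, not metadata.
    for column in header[1:]:
        if column.startswith(("dc.", "dcterms.")):
            dublin_core.append(column)
        else:
            other.append(column)
    return dublin_core, other


if __name__ == "__main__":
    print(classify_columns(SAMPLE_CSV))
```

A check like this, run before starting a transfer, shows which columns would end up in the "other" category.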
92. Archivematica Camp Day 2
9:30-11 Stream 1 - Building on core: Archivematica's specialized workflows
Stream 2 - Supporting Archivematica workflows
11:00-12 Community profile: University of Houston
12:00-1 Lunch
1:00-3:00 Stream 1 - Archivematica's non-core functionality
Stream 2 - Understanding Archivematica's logs and performance evaluation
3:00-4:00 Community profile: Computer History Museum and
New York Public Library
4:00-5:00 Agenda adjustment for Day 3 - what do you want to talk about tomorrow?
6:00-??? Small group dinners
94. Building on Core
▪ Transfer types
▪ Zipped files
▪ Manual normalization
▪ A peek at automation tools
95. Why use specialized workflows?
- to accommodate our pre-Archivematica workflows
- content is arriving at the Archivematica pipeline in a
packaged format
- formats with special requirements
- we want to do unique and customized things with
our data
96. Zipped/unzipped bag transfers
Zipped transfers:
- A .zip file of content that is organized using the BagIt
specification
- Includes data subdirectory, bagit.txt, bag-info.txt, etc. and has
been packaged in ZIP format
Archivematica can:
- Unzip the file to perform tasks and reuse some of the
metadata
97. Zipped/unzipped bag transfers
Common use cases:
- Organization uses BagIt locally to package digital content
which they would then like to preserve in Archivematica
- Organization uses Exactly to transfer content from creators
to the archive
99. Zipped/unzipped bag transfers
Unzipped transfers are:
- A directory of content that is structured according to BagIt
specifications
- Includes a data subdirectory, bagit.txt, bag-info.txt,
manifest-sha512.txt, etc.
Archivematica can:
- Reuse some of the metadata
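The bag layout described here can be checked with a short script: confirm that bagit.txt and the data directory exist, then recompute each payload checksum listed in the manifest. The sketch below uses only the standard library and assumes the BagIt naming convention `manifest-<algorithm>.txt` with one `checksum path` pair per line. The function names are hypothetical, and a production workflow would more likely rely on an established BagIt library.

```python
import hashlib
import os
import tempfile


def check_bag(bag_dir, algorithm="sha512"):
    """Return a list of problems found in the bag (empty if none)."""
    problems = []
    if not os.path.isfile(os.path.join(bag_dir, "bagit.txt")):
        problems.append("missing bagit.txt")
    if not os.path.isdir(os.path.join(bag_dir, "data")):
        problems.append("missing data directory")
    manifest = os.path.join(bag_dir, f"manifest-{algorithm}.txt")
    if not os.path.isfile(manifest):
        problems.append(f"missing manifest-{algorithm}.txt")
        return problems
    with open(manifest, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            expected, rel_path = line.split(None, 1)
            full_path = os.path.join(bag_dir, rel_path)
            if not os.path.isfile(full_path):
                problems.append(f"missing payload file {rel_path}")
                continue
            with open(full_path, "rb") as payload:
                actual = hashlib.new(algorithm, payload.read()).hexdigest()
            if actual != expected.lower():
                problems.append(f"checksum mismatch for {rel_path}")
    return problems


def build_example_bag(bag_dir):
    """Write a tiny, valid bag into an existing empty directory."""
    os.makedirs(os.path.join(bag_dir, "data"))
    payload = b"hello archive"
    with open(os.path.join(bag_dir, "data", "letter.txt"), "wb") as handle:
        handle.write(payload)
    with open(os.path.join(bag_dir, "bagit.txt"), "w",
              encoding="utf-8") as handle:
        handle.write("BagIt-Version: 0.97\n")
        handle.write("Tag-File-Character-Encoding: UTF-8\n")
    digest = hashlib.sha512(payload).hexdigest()
    with open(os.path.join(bag_dir, "manifest-sha512.txt"), "w",
              encoding="utf-8") as handle:
        handle.write(f"{digest}  data/letter.txt\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_example_bag(tmp)
        print(check_bag(tmp))
```

The checksums already present in the manifest are part of the metadata that a bag carries with it into a transfer.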
101. Disk image transfers
Common use case:
- An organization has disk images for preservation
Archivematica can:
- Use forensic disk image tools to analyse the image (The
Sleuth Kit's fiwalk)
- Append imaging metadata to the transfer
105. Manual Normalization
Common use case:
- Organization manually normalizes files for access and
preservation and would like Archivematica to recognize
that work
Archivematica can:
- Work with manually normalized files on the Transfer tab
or on the Ingest tab
106. Other Specialized Workflows
- DSpace
- Including checksums
  - Organization creates checksums locally and would like
  Archivematica to recognize them
112. Automation Tools
▪ Set of Python scripts to automate
  ▪ Transfers
  ▪ Create DIPs
  ▪ Upload DIPs to AtoM
▪ https://github.com/artefactual/automation-tools
129. Archivematica Camp Day 3
9:00-9:30 Arrival
9:30-10:30 Module 6: Stream 1 - Special Topics
Module 6: Stream 2 - AIPs for devs/sysadmins
10:30-11:30 Module 7: Archivematica in the community
11:30-12:30 Brown Bag Lunch and Community topics
12:30-2:00 Module 8: Implementation roundtable
133. Forensic disk images
- Disk image transfer type
- Funded by Yale in 2013
- Adds imaging metadata to the transfer
- Some limitations within microservices
- We use The Sleuth Kit's fiwalk command
- Worked with UCLA/NYPL on HFS disk image analysis
- Extract or don't extract? Depends
136. Follow That Format: MKV
▪ Validation tool: allows users to use MediaConch to
check the conformance of .mkv files (originals and
derivatives) against the Matroska spec
▪ Checks validity of media files against user-provided
policies
▪ Sponsored by PREFORMA Project
http://www.preforma-project.eu/