The document discusses topics related to long-term digital preservation at the Swedish National Archives. It provides an introduction to the Swedish archival framework and laws governing records. It then discusses key topics in digital preservation including definitions, trends in the field, challenges around open data and the role of national archives, and efforts around building a digital archive including transferring electronic records and developing standards.
1. “Hot Topics” in Long-term
Preservation of Digital Objects
Borje Justrell
National Archives of Sweden
2. Aim of the Session
This session will focus on some topics in
long-term digital preservation that are
“hot” today at the Swedish National
Archives.
The perspective is Swedish, but the intention
is that the chosen topics will serve as
examples for the discussion in the
European archival community.
3. Programme
10.30 Introduction
- The Swedish archival framework
- Digital preservation – definitions and
trends
11.00 Chosen topics
- Open data and the role of National
Archives
- Transfer of electronic records
- Building a digital archive
12.00 End
4. The Swedish Archival Framework
Sweden, officially the Kingdom of Sweden, is a
Scandinavian country in Northern Europe. At
450,295 square kilometres (173,860 sq mi),
Sweden is the third-largest country in the
European Union by area. With a total population
of over 9.9 million, Sweden consequently has a
low population density of 21 inhabitants per
square kilometre (54/sq mi), with the highest
concentration in the southern half of the country.
Approximately 85% of the population lives in urban
areas.
5. The Swedish Archival Framework
Some basic facts:
- Freedom of the Press Act (1766), which is part of the Swedish
constitution. The Archives Act is based on it.
- Public records
- Principle of openness and public access
- Record: could be textual or image based – or a data file or
something else that can be read and understood only by using
technical means
7. The Swedish Archival Framework
Organisation:
- One archival institution for state archives (the
National Archives), operating at 18 locations
around the country and running in total 13
physical reading rooms and 1 digital “reading
room” on the Internet. About 500 employees.
- All municipalities (290 primary ones and 20
secondary ones) are to some extent independent
“performers”: they act in accordance with Swedish laws
and state regulations and are also responsible for their
own archiving (under the Freedom of the Press
Act).
8. Digital Preservation - Definitions
A major difficulty in digital preservation is the lack of a
precise and definitive taxonomy of terms. Different
communities use the same terms in different ways.
Therefore, definitions used in this session may not
necessarily achieve widespread consensus
across the wide range of cultural heritage
institutions.
In European calls for R&D projects it is often said that
preservation is achieved when digital objects remain
accessible and usable to future users.
Preservation is NOT concerned only with sustaining
single digital objects. Digital objects should be
preserved in context which makes them
understandable and (consequently) usable.
9. Digital Preservation - Definitions
Digital objects
Range from relatively simple, text-based files (e.g.
word processing files), to highly sophisticated
web-based resources which fully exploit the
benefits of technology by combining sound with
images, the ability to link to other resources, and
the ability to interrogate.
Include born digital objects, which are not intended to
have an analogue equivalent, either as the
originating source or as a result of conversion to
analogue form (print out).
10. Digital Preservation - Definitions
Digital archiving
This term is used very differently within sectors. The library and
archiving communities often use it interchangeably with digital
preservation.
Computing professionals sometimes use digital archiving to
mean the process of backup and on-going maintenance
(including storage), as opposed to strategies for long-term digital
preservation.
11. Digital Preservation - Definitions
Digital curation
Digital curation is often used in parallel with digital
preservation; it has wider coverage and involves
“maintaining, preserving and adding value to digital data
throughout its life-cycle”.
http://www.dcc.ac.uk/digital-curation/what-digital-curation
12. Digital Preservation - Definitions
Digitisation
The process of creating digital files by scanning or
otherwise converting analogue materials.
The resulting digital copy, or digital surrogate, can
then be classed as a digital object to sustain and is
consequently subject to the same broad
challenges of preserving accessibility and
usability as "born digital" materials.
13. Digital Preservation - Definitions
Authenticity
Confidence in the authenticity of digital materials over
time is particularly crucial owing to the ease with
which alterations can be made.
In the case of electronic records, authenticity refers to
the trustworthiness of the electronic record as a
record.
In the case of "born digital" and digitised materials, it
refers to the fact that whatever is being cited is the
same as it was when it was first created unless the
accompanying metadata indicates any changes.
17. Open Data and the Role
of National Archives
Open data in its broader meaning is data freely
available to everyone to use and republish as
they wish, without restrictions from any
mechanisms of control including copyright and
patents.
However, an internationally accepted (formal)
definition is still lacking. Discussions have started
about the need for standardisation, but it is still unclear
what should be standardised.
18. Open Data and the Role
of National Archives
In computing, linked data is a method of publishing
structured data so that it can be interlinked and
become more useful through semantic queries. To
create linked open data means that data are not only
open but also published in a machine-readable format
and linked to other sources of data.
The diagram on the next slide shows which linked open
datasets are connected, as of August 2014. It was
produced by the Linked Open Data Cloud project,
which started in 2007. Some sets may include
copyrighted data which is freely available.
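To make the idea of interlinked data and semantic queries concrete, here is a minimal sketch in plain Python. It is an illustration only: the URIs, the vocabulary terms (createdBy, locatedIn, name) and the helper functions are hypothetical, and a real linked open data set would normally be published as RDF and queried with a language such as SPARQL.

```python
# Minimal linked-data sketch: facts are (subject, predicate, object) triples.
# All URIs and vocabulary terms below are hypothetical placeholders.
ARCHIVE = "http://example.org/archive/"
OTHER = "http://example.org/other-dataset/"

triples = {
    (ARCHIVE + "record/1", ARCHIVE + "vocab/createdBy", ARCHIVE + "agency/42"),
    (ARCHIVE + "agency/42", ARCHIVE + "vocab/locatedIn", OTHER + "place/stockholm"),
    (OTHER + "place/stockholm", OTHER + "vocab/name", "Stockholm"),
}


def match(pattern, data):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [t for t in data
            if all(p is None or p == v for p, v in zip(pattern, t))]


def follow(subject, predicates, data):
    """Follow a chain of predicates from a subject, across dataset boundaries."""
    current = {subject}
    for predicate in predicates:
        current = {obj
                   for subj in current
                   for (_, _, obj) in match((subj, predicate, None), data)}
    return current


# "In which place is the agency that created record 1 located?"
names = follow(
    ARCHIVE + "record/1",
    [ARCHIVE + "vocab/createdBy", ARCHIVE + "vocab/locatedIn", OTHER + "vocab/name"],
    triples,
)
print(names)
```

The query only succeeds because the second triple links the archive's own data to an identifier in another dataset, which is the "linked" part of linked open data.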
19. [Diagram: Linked Open Data Cloud, as of August 2014]
20. Open Data and the Role
of National Archives
Open data is often recognised as a method to achieve a
higher degree of transparency in governmental
management and decision-making. In the EU, open
government data initiatives build on the Union's directive
on Public Sector Information (PSI), which is
implemented in the legislation of the Member States.
But not all PSI data are open, and not all open data are
necessarily open public data.
21. Open Data and the Role
of National Archives
[Venn diagram: "Open data" and "Public data (PSI data)" overlap; the intersection is "Open public data".]
22. Open Data and the Role
of National Archives
In Sweden, the National Archives has this year received a special
assignment from the Government to foster and coordinate state
agencies' efforts to make their data available for wider use.
23. Open Data and the Role
of National Archives
The National Archives shall, according to the
Government's decision, mainly
- collect and publish digital information that state
agencies have to make public in accordance with
the Swedish law on reuse of public records
- stimulate state agencies to publish open data
- administer and maintain the web portal for
open data (already existing)
- support citizens in finding public data and help
them contact the agencies that manage
these data
24. Open Data and the Role
of National Archives
This is a three-year assignment. After this period the
outcome will be evaluated.
The reasons behind the Government's decision are clearly
stated:
It should be easy for citizens and companies to find the state
agencies' information. However, the agencies need
support to make their information accessible in a uniform
and cost-effective way.
25. Open Data and the Role
of National Archives
But what about other types of open data than PSI
data?
Still under discussion. Most obviously: use the
assignment as a stepping stone for a strategy on
open data and linked open data.
A special secretariat at the National Archives has for
some years looked into the challenges and
opportunities in linked open data (including metadata
standards and tools for mapping metadata between
formats and standards).
27. Conditions in the Beginning of
the 21st Century
• No fixed transfer time; data files received from state
agencies can be new or old ones.
• Transfers are negotiated between the agencies and the
National Archives. Funding is remitted from the agencies to
the National Archives to cover the preservation costs.
• When agencies are closed down, their archives are (by law)
transferred to the National Archives.
• No common e-archiving or records management
standard is in use; agencies implement their own
(incompatible) solutions, developed by commercial software
vendors.
28. Regulations for Digital
Preservation
The National Archives issues regulations for digital preservation
in the Swedish agencies (under the Archives Act)
Accepted file formats (media dependent rules)
–Text files (ISO 8859-1, Unicode)
–HTML
–XML (also GML and SGML)
–PDF (PDF/A-1)
–JPEG, TIFF and PNG
–MPEG
29. Digitisation activities
In-house scanning of documents, primarily church records,
at the National Archives' large-scale digitising facility,
MKC.
In-house scanning of documents at the National Archives'
various locations, further processed at MKC or SVAR
(the digital reading room).
In-house microfilm scanning at SVAR.
Microfilm scanning by FamilySearch in Salt Lake City, to be
delivered to SVAR; primarily church records and judicial
records.
30. Long-term Digital Storage at the
National Archives (2016-11-01)
• Born-digital files from agencies: about 5 TB
• Audio-video files and multimedia: about 100 TB
• Digitised volumes (one AIP per volume): 466 225
• Digitised images (TIFF-format): 2473 TB
–Images in total: 179 million
–Images published on Internet: 98 million
• DJVU-files (presentation format): about 30 TB
• Total storage: About 5 PB on tape. (All files are stored on two
tapes)
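The figures above can be cross-checked with a little arithmetic. The sketch below assumes that the total of about 5 PB is the sum of the listed categories multiplied by the two tape copies, and that decimal units are meant (1 PB = 1000 TB); the slide does not state either, so this is a consistency check rather than the archive's own calculation.

```python
# Figures taken from the slide (2016-11-01), in terabytes.
holdings_tb = {
    "born-digital files": 5,
    "audio-video and multimedia": 100,
    "digitised images (TIFF)": 2473,
    "DjVu presentation files": 30,
}
COPIES = 2          # "All files are stored on two tapes"
TB_PER_PB = 1000    # assumption: decimal units

single_copy_tb = sum(holdings_tb.values())
total_pb = single_copy_tb * COPIES / TB_PER_PB

# Average TIFF size, using 179 million images and 1 TB = 1,000,000 MB.
avg_tiff_mb = holdings_tb["digitised images (TIFF)"] * 1_000_000 / 179_000_000

print(f"One copy:   {single_copy_tb} TB")
print(f"Two copies: {total_pb:.2f} PB")
print(f"Average TIFF image: {avg_tiff_mb:.1f} MB")
```

Under these assumptions one copy is 2608 TB and two copies are about 5.2 PB, which agrees with the stated total; the average TIFF image works out to roughly 14 MB.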
31. Attributes of a Trusted Digital
Repository (OCLC 2002)
• Compliance with the Reference Model for
an Open Archival Information System
(OAIS)
• Administrative responsibility
• Organisational viability
• Financial sustainability
• Technological and procedural suitability
• System security
• Procedural accountability
32. The OAIS model
An OAIS compliant archive is built on six functional
parts
• Ingest
• Archival Storage
• Data Management
• Administration
• Access
• Preservation Planning
34. The National Archives Platform for Digital
Preservation (RADAR)
[Diagram of the platform; Swedish labels translated. Components shown:]
• RALF: application for control/preparation at the agencies
• KRAM: application for ingest and control
• Ingest from scanning
• ESSArch: Archival Storage System
• ARKIS: Archival Information System
• KRAM: access and dissemination of databases
[Users shown:]
• Agency official
• National Archives officials
• General public: search via NAD and the SVAR website
35. The Archival Storage System
(ESSArch)
• ESSArch is a back-end system for archival storage
• Storage and retrieval of AIPs. Stores AIPs in several bitwise
identical copies
• AIPs (containing data files and metadata in METS/PREMIS format) are
stored in TAR format. No vendor-specific backup format
• Reads and writes checksums for packages and files
• Event log and access control
• Local MySQL database using the PREMIS 2.0 data model
• Automatic updates to the Archival Information System ARKIS
• ESSArch is an open source system based on Linux, Apache, MySQL
and Python. ESSArch (version 2.1.0) is available at SourceForge (
http://sourceforge.net/projects/essarch/ )
• Used by the National Archives in Sweden and Norway
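The storage principles listed above (plain TAR packages, checksums for packages and files, several bitwise identical copies) can be sketched with the Python standard library. This is not ESSArch code: the function names, the directory layout, the package name and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
import shutil
import tarfile
import tempfile
from pathlib import Path


def sha256_of(path):
    """Checksum a file in chunks so large packages need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_aip(content_dir, copy_dirs, name):
    """Pack content_dir into an uncompressed TAR and store identical copies."""
    # Per-file checksums, keyed by path relative to the package content.
    file_checksums = {
        str(p.relative_to(content_dir)): sha256_of(p)
        for p in sorted(content_dir.rglob("*")) if p.is_file()
    }
    first = copy_dirs[0] / f"{name}.tar"
    with tarfile.open(first, "w") as tar:  # plain TAR, no vendor format
        tar.add(content_dir, arcname=name)
    package_checksum = sha256_of(first)

    copies = [first]
    for target in copy_dirs[1:]:
        copy = target / first.name
        shutil.copyfile(first, copy)
        # Verify that the copy is bitwise identical before accepting it.
        if sha256_of(copy) != package_checksum:
            raise IOError(f"copy at {copy} differs from the original")
        copies.append(copy)
    return {"package_checksum": package_checksum,
            "file_checksums": file_checksums,
            "copies": copies}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    content = root / "content"
    content.mkdir()
    (content / "record.txt").write_text("example record\n", encoding="utf-8")
    (content / "mets.xml").write_text("<mets/>\n", encoding="utf-8")
    copy_a, copy_b = root / "copy_a", root / "copy_b"
    copy_a.mkdir()
    copy_b.mkdir()

    result = store_aip(content, [copy_a, copy_b], "aip-0001")
    copies_identical = (sha256_of(result["copies"][0])
                        == sha256_of(result["copies"][1]))
    print("copies identical:", copies_identical)
```

Recomputing the package checksum at a later date and comparing it with the stored value is one simple way to detect silent corruption on either copy.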
36. General Archival Standards
• ISAD(G) and ISAAR(CPF)
–The Archival Information System ARKIS is modelled after these
standards
• EAD (Encoded Archival Description), XML format for archival
descriptions, and EAC-CPF (Encoded Archival Context), XML
format for the description of archive creators
–These formats are used as exchange formats for archival
description information
–Supported by several commercial archival information systems
–Import and export functions in ARKIS
–Currently a new Swedish EAD and EAC-CPF specification is
being developed
37. Metadata standards for digital
preservation
METS (Metadata Encoding & Transmission Standard) - Structure
for encoding descriptive, administrative, and structural metadata
(DLF/LOC) (2004)
PREMIS (Preservation Metadata) - A data dictionary and supporting
XML schemas for core preservation metadata needed to support
the long-term preservation of digital materials (OCLC/LOC)
(2005)
MIX (NISO Metadata for Images in XML) - XML schema for
encoding technical data elements required to manage digital
image collections (ANSI/NISO) (2006)
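As an illustration of what METS encodes, the sketch below builds a very small METS document with a file section and a structural map, using only the Python standard library. It is a simplified example, not the Swedish package specification: the OBJID value, the file names and the helper function are assumptions, the output is not validated against the METS schema, and a real package would also carry a header plus descriptive and PREMIS metadata.

```python
import hashlib
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(namespace, tag):
    """Build a namespace-qualified tag or attribute name for ElementTree."""
    return f"{{{namespace}}}{tag}"


def build_mets(files):
    """files maps a relative path to the file's bytes (hypothetical input)."""
    root = ET.Element(q(METS_NS, "mets"), {"OBJID": "example-package"})
    file_sec = ET.SubElement(root, q(METS_NS, "fileSec"))
    file_grp = ET.SubElement(file_sec, q(METS_NS, "fileGrp"), {"USE": "content"})
    struct_map = ET.SubElement(root, q(METS_NS, "structMap"), {"TYPE": "physical"})
    top_div = ET.SubElement(struct_map, q(METS_NS, "div"), {"LABEL": "Package"})

    for index, (path, data) in enumerate(sorted(files.items()), start=1):
        file_id = f"FILE-{index:04d}"
        # Administrative/technical information: size and fixity of each file.
        file_el = ET.SubElement(file_grp, q(METS_NS, "file"), {
            "ID": file_id,
            "SIZE": str(len(data)),
            "CHECKSUM": hashlib.sha256(data).hexdigest(),
            "CHECKSUMTYPE": "SHA-256",
        })
        ET.SubElement(file_el, q(METS_NS, "FLocat"), {
            "LOCTYPE": "URL",
            q(XLINK_NS, "href"): path,
        })
        # Structural metadata: the structMap points back to the file entry.
        ET.SubElement(top_div, q(METS_NS, "fptr"), {"FILEID": file_id})
    return root


mets = build_mets({
    "record.txt": b"example record\n",
    "image.tif": b"\x49\x49\x2a\x00",
})
xml_text = ET.tostring(mets, encoding="unicode")
print(xml_text)
```

The file section carries the fixity information, while the structural map records how the files belong together.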
38. Other formats
ADDML (Archival Data Description Markup Language)
XML format for describing flat files exported from
databases, used by the National Archives of Norway
and Sweden (2001, 2009).
An alternative to the Swiss SIARD format for
databases.
40. E-archive project
To strengthen the development of eGovernment and
create good opportunities for inter-agency coordination, a
delegation for eGovernment was established by the
Government. This delegation initiates strategic
e-government projects, one of which concerns e-archives.
This project was headed by the National Archives but was in
fact a joint effort involving several other governmental
agencies as well as county councils and municipalities.
The goal: to build a foundation for the implementation of
cost effective systems based on common specifications
as opposed to isolated and incompatible systems for
each agency (government, county council or
municipality).
41. E-archive project
The first step: to create common specifications (CM)
for exchange formats and thus create
interoperability for the development of compatible
E-Archive and Record Management systems. In
these specifications national adaptations of
several international standards will be used such
as EAD, EAC-CPF, PREMIS, METS, MoReq and
others.
The project finished in 2014.
A maintenance organisation for the common
specifications has now been established.
42. System for long-term information
retrieval
[Diagram. Components shown:]
• Record management system and other agency systems (business systems)
• E-Archive run by an agency (in house, or as an e-service
provided by another agency or a commercial company)
• Long-term E-Archive at an archival institution such as the
National Archives
• Search facilities
[Users shown:]
• Agency employees
• General public
[Flows shown:]
• Transfer of electronic records from the business systems to
the E-Archive
• Transfer of custody of the electronic records from the agency
to an archival institution
43. Sub-project: Metadata for E-
Archiving
• Developing a Swedish SIP based on standards such as
METS and PREMIS
• For use in agencies as well as archival institutions
– Not only for delivery to the National Archives
– Ensure compatibility between different solutions and E-
Archive implementations
– Generic structure: the SIP should be possible to adapt
to different information types with basic metadata
common to all information types
44. Sub-project: Metadata for E-
Archiving
Current status
• Developing a Swedish SIP
–An official specification for a common SIP
package structure was published in August
2015
• Content type specification
–A common content type specification (CM) for
ERMS-systems is currently being developed
45. Generic Package Structure for
E-Archives
[Diagram: a common SIP package structure, with content type
specifications built on it: one for ERMS systems and others for
further information types. A "Modified specification" box is also shown.]
46. Information Model of Packages
http://www.loc.gov/standards/mets/
From: Karin Bredenberg
When FGSs (common specifications) are developed, any existing regulations are taken into account, and relevant requirements from those regulations can be incorporated into the specifications.
Example: the metadata requirements for archival description under RA-FS 2008:4 are included in the forthcoming FGS for archival description.
The specifications are voluntary, but they can be given governing or binding status when regulations refer to them.
Built into our database
Connected to the archival object
In a package, all the different FGSs can be seen working together.
The package holds it all together.
The archive and the archive creator are described with the FGS for archival description, which will use two standards.
The structure is described by the FGSs for the information types; in some cases the information itself is also included in the structure.
At the lowest level are the digital objects themselves that are transferred.