
C4 borje justrell_hottopicslongtermpreservation



“Hot Topics” in Long-term Preservation of Digital Objects
Börje Justrell, Riksarkivet, Project Coordinator, and Bert Lemmens
PREFORMA Project / Long term preservation
Open data and the role of National Archives | Transfer of electronic records | To build up a digital archive
2016 EVA/Minerva Jerusalem International Conference on Digitisation of Cultural Heritage



  1. “Hot Topics” in Long-term Preservation of Digital Objects, Börje Justrell, National Archives of Sweden
  2. Aim of the Session: This session will focus on some topics in long-term digital preservation that are “hot” today at the Swedish National Archives. The perspective is Swedish, but the intention is that the chosen topics will serve as examples for the discussion in the European archival community.
  3. Programme: 10.30 Introduction: the Swedish archival framework; digital preservation, definitions and trends. 11.00 Chosen topics: open data and the role of National Archives; transfer of electronic records; building a digital archive. 12.00 End.
  4. The Swedish Archival Framework: Sweden, officially the Kingdom of Sweden, is a Scandinavian country in Northern Europe. At 450,295 square kilometres (173,860 sq mi), Sweden is the third-largest country in the European Union by area. With a total population of over 9.9 million, Sweden consequently has a low population density of 21 inhabitants per square kilometre (54/sq mi), with the highest concentration in the southern half of the country. Approximately 85% of the population lives in urban areas.
  5. The Swedish Archival Framework: Some basic facts: - Freedom of the Press Act (1766), which is part of the Swedish constitution; the Archives Act is based on it - Public records - Principle of openness and public access - Record: can be textual or image-based, or a data file or something else that can be read and understood only by technical means
  6. Laws and regulations affecting the work
  7. The Swedish Archival Framework: Organisation: - One archival institution for state archives (the National Archives), present at 18 locations around the country and administering in total 13 physical reading rooms and 1 digital “reading room” on the Internet. About 500 employees. - All municipalities (290 primary ones and 20 secondary ones) are to some extent independent “performers” in accordance with Swedish laws and state regulations, and are also responsible for their own archiving (under the Freedom of the Press Act).
  8. Digital Preservation - Definitions: A major difficulty in digital preservation is the lack of a precise and definitive taxonomy of terms. Different communities use the same terms in different ways. Therefore, the definitions used in this session may not necessarily achieve widespread consensus among the wide range of cultural heritage institutions. In European calls for R&D projects it is often said that preservation is achieved when digital objects are accessible and usable to future users. Preservation is NOT concerned only with sustaining single digital objects. Digital objects should be preserved in a context that makes them understandable and (consequently) usable.
  9. Digital Preservation - Definitions: Digital objects range from relatively simple, text-based files (e.g. word processing files) to highly sophisticated web-based resources that fully exploit the benefits of technology by combining sound with images, the ability to link to other resources, and the ability to interrogate. They include born-digital objects, which are not intended to have an analogue equivalent, either as the originating source or as a result of conversion to analogue form (printout).
  10. Digital Preservation - Definitions: Digital archiving. This term is used very differently across sectors. The library and archival communities often use it interchangeably with digital preservation. Computing professionals sometimes use digital archiving to mean the process of backup and ongoing maintenance (including storage), as opposed to strategies for long-term digital preservation.
  11. Digital Preservation - Definitions: Digital curation. This term is often used in parallel with digital preservation; it has wider coverage and involves “maintaining, preserving and adding value to digital data throughout its life-cycle”.
  12. Digital Preservation - Definitions: Digitisation. The process of creating digital files by scanning or otherwise converting analogue materials. The resulting digital copy, or digital surrogate, can then be classed as a digital object to be sustained, and is consequently subject to the same broad challenges of preserving its accessibility and usability as "born digital" materials.
  13. Digital Preservation - Definitions: Authenticity. Confidence in the authenticity of digital materials over time is particularly crucial owing to the ease with which alterations can be made. In the case of electronic records, authenticity refers to the trustworthiness of the electronic record as a record. In the case of "born digital" and digitised materials, it refers to the fact that whatever is being cited is the same as it was when it was first created, unless the accompanying metadata indicates any changes.
  14. Digital Preservation - Trends
  15. Digital Preservation - Trends
  16. Open Data and the Role of National Archives: Open data in its broader meaning is data freely available to everyone to use and republish as they wish, without restrictions from any mechanisms of control, including copyright and patents. However, an internationally accepted (formal) definition is still lacking. Discussions have started about the need for standardisation, although it is still unclear what should be standardised.
  17. Open Data and the Role of National Archives: In computing, linked data is a method of publishing structured data so that it can be interlinked and become more useful through semantic queries. Creating linked open data means that data are not only open but also published in a machine-readable format and linked to other sources of data. The diagram on the next slide shows how the linked open datasets are connected, as of August 2014. It was produced by the Linked Open Data Cloud project, which started in 2007. Some datasets may include copyrighted data that is freely available.
  18. Open Data and the Role of National Archives: Open data is often recognised as a way to achieve greater transparency in government administration and decision-making. In the EU, open government data initiatives are built on the Union's directive on Public Sector Information (PSI), which is implemented in the legislation of the Member States. But not all PSI data are open, and open data are not necessarily open public data.
  19. Open Data and the Role of National Archives: (Diagram: open data; public data (PSI data); open public data)
  20. Open Data and the Role of National Archives: In Sweden, the National Archives has this year been given a special assignment by the government to foster and coordinate state agencies' efforts to make their data available for wider use.
  21. Open Data and the Role of National Archives: According to the government's decision, the National Archives shall mainly - collect and publish digital information that state agencies have to make public in accordance with the Swedish law on the reuse of public records - encourage state agencies to publish open data - administer and maintain the (already existing) web portal for open data - support citizens in finding public data and help them contact the agencies that manage these data
  22. Open Data and the Role of National Archives: This is a three-year assignment. After this period the outcome will be evaluated. The reasons behind the Government's decision are clearly stated: it should be easy for citizens and companies to find the state agencies' information, but the agencies need support to make their information accessible in a uniform and cost-effective way.
  23. Open Data and the Role of National Archives: But what about types of open data other than PSI data? This is still under discussion. The most obvious step is to use the assignment as a stepping stone towards a strategy on open data and linked open data. A special secretariat at the National Archives has for some years looked into the challenges and opportunities of linked open data (including metadata standards and tools for mapping metadata between formats and standards).
  24. Building up a digital archive
  25. Conditions in the Beginning of the 21st Century: • No fixed transfer time; data files received from state agencies can be new or old. • Transfers are negotiated between the agencies and the National Archives. Funding is remitted from the agencies to the National Archives to cover the preservation costs. • When agencies are closed down, their archives are (by law) transferred to the National Archives. • No common e-archiving or records management standard in use; agencies implement their own (incompatible) solutions, developed by commercial software vendors.
  26. Regulations for Digital Preservation: The National Archives issues regulations for digital preservation in Swedish agencies (under the Archives Act). Accepted file formats (media-dependent rules): – Text files (ISO 8859-1, Unicode) – HTML – XML (also GML and SGML) – PDF (PDF/A-1) – JPEG, TIFF and PNG – MPEG
  27. Digitisation activities: • In-house scanning of documents, primarily church records, at the National Archives' large-scale digitising facility MKC • In-house scanning of documents at the National Archives' different locations, further processed at MKC or SVAR (the digital reading room) • In-house microfilm scanning at SVAR • Microfilm scanning by FamilySearch in Salt Lake City, to be delivered to SVAR; primarily church records and judicial records
  28. Long-term Digital Storage at the National Archives (2016-11-01): • Born-digital files from agencies: about 5 TB • Audio-video files and multimedia: about 100 TB • Digitised volumes (one AIP per volume): 466 225 • Digitised images (TIFF format): 2473 TB – Images in total: 179 million – Images published on the Internet: 98 million • DjVu files (presentation format): about 30 TB • Total storage: about 5 PB on tape (every file is stored on two tapes)
  29. Attributes of a Trusted Digital Repository (OCLC 2002): • Compliance with the Reference Model for an Open Archival Information System (OAIS) • Administrative responsibility • Organisational viability • Financial sustainability • Technological and procedural suitability • System security • Procedural accountability
  30. The OAIS model: An OAIS-compliant archive is built on six functional parts: • Ingest • Archival Storage • Data Management • Administration • Access • Preservation Planning
  31. OAIS model
  32. The National Archives Platform for Digital Preservation (RADAR): (Diagram components: ESSArch, the Archival Storage System; ingest from scanning; RALF, application for control/preparation at the agencies; KRAM, application for ingest and control; ARKIS, the Archival Information System; KRAM, access and dissemination of databases; the general public, searching via NAD and the SVAR website; officials at the agencies and at Riksarkivet (the National Archives))
  33. The Archival Storage System (ESSArch): • ESSArch is a back-end system for archival storage • Storage and retrieval of AIPs; AIPs are stored in several bitwise identical copies • AIPs (containing data files and metadata in METS/PREMIS format) are stored in TAR format, not in a vendor-specific backup format • Reads and writes checksums for packages and files • Event log and access control • Local MySQL database using the PREMIS 2.0 data model • Automatic updates to the Archival Information System ARKIS • ESSArch is an open source system based on Linux, Apache, MySQL and Python. ESSArch (version 2.1.0) is available at SourceForge • Used by the National Archives in Sweden and Norway
  34. General Archival Standards: • ISAD(G) and ISAAR(CPF) – The Archival Information System ARKIS is modelled after these standards • EAD (Encoded Archival Description), an XML format for archival descriptions, and EAC-CPF (Encoded Archival Context), an XML format for the description of archive creators – These formats are used as exchange formats for archival description information – Supported by several commercial archival information systems – Import and export functions in ARKIS – A new Swedish EAD and EAC-CPF specification is currently being developed
  35. Metadata standards for digital preservation: • METS (Metadata Encoding & Transmission Standard): structure for encoding descriptive, administrative, and structural metadata (DLF/LOC) (2004) • PREMIS (Preservation Metadata): a data dictionary and supporting XML schemas for core preservation metadata needed to support the long-term preservation of digital materials (OCLC/LOC) (2005) • MIX (NISO Metadata for Images in XML): XML schema for encoding technical data elements required to manage digital image collections (ANSI/NISO) (2006)
  36. Other formats: ADDML (Archival Data Description Markup Language), an XML format used by the National Archives of Norway and Sweden for describing flat files exported from databases (2001, 2009). An alternative to the Swiss SIARD format for databases.
  37. Transfer of Data from State Agencies
  38. E-archive project: To strengthen the development of eGovernment and create good opportunities for inter-agency coordination, the Government established a delegation for eGovernment. This delegation initiates strategic e-government projects, one of them on e-archiving. This project was headed by the National Archives but was in fact a joint effort involving several other government agencies as well as county councils and municipalities. The goal: to build a foundation for the implementation of cost-effective systems based on common specifications, as opposed to isolated and incompatible systems for each agency (government, county council or municipality).
  39. E-archive project: The first step was to create common specifications (CM) for exchange formats, and thus interoperability for the development of compatible E-Archive and Records Management systems. These specifications use national adaptations of several international standards, such as EAD, EAC-CPF, PREMIS, METS and MoReq. The project finished in 2014, and a maintenance organisation for the common specifications has now been built up.
  40. System for long-term information retrieval: (Diagram components: agency business systems (record management system, other agency systems); an E-Archive run by an agency (in house, or as an e-service provided by another agency or a commercial company); a long-term E-Archive at an archival institution such as the National Archives; search facilities; agency employees; the general public. Flows: transfer of electronic records from the business systems to the E-Archive; transfer of custody of the electronic records from the agency to an archival institution)
  41. Sub-project: Metadata for E-Archiving: • Developing a Swedish SIP based on standards such as METS and PREMIS • For use in agencies as well as archival institutions – Not only for delivery to the National Archives – Ensure compatibility between different solutions and E-Archive implementations – Generic structure: the SIP should be adaptable to different information types, with basic metadata common to all information types
  42. Sub-project: Metadata for E-Archiving: Current status • Developing a Swedish SIP – An official specification for a common SIP package structure was published in August 2015 • Content type specification – A common content type specification (CM) for ERMS systems is currently being developed
  43. Generic Package Structure for E-Archives: (Diagram: SIP package structure; content type specification for ERMS systems; content type specifications for other types; modified specification)
  44. Information Model of Packages. From: Karin Bredenberg
  45. Thank You!
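Slide 26 lists the file formats accepted for preservation. As a minimal sketch of how an ingest step might screen incoming files against part of that list, the Python below identifies a few formats by their leading signature bytes. The function names and the signature table are illustrative assumptions, not part of the National Archives' regulations or tools. A signature match only identifies the format family (it cannot tell whether a PDF conforms to PDF/A-1), and text, HTML, XML and MPEG files would need separate checks.

```python
from typing import Optional

# Illustrative signature table (assumption): leading bytes -> format family.
# Only a subset of the accepted formats can be recognised this way.
SIGNATURES = [
    (b"%PDF-", "PDF"),                 # any PDF; PDF/A-1 conformance needs further checks
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF"),              # little-endian TIFF
    (b"MM\x00*", "TIFF"),              # big-endian TIFF
]


def identify_format(header: bytes) -> Optional[str]:
    """Return the format family whose signature starts `header`, or None."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


def identify_file(path: str) -> Optional[str]:
    """Read the first bytes of a file and identify it (hypothetical helper)."""
    with open(path, "rb") as f:
        return identify_format(f.read(16))
```

For example, `identify_format(b"%PDF-1.4\n")` returns `"PDF"`, while a header matching no signature returns `None` and could be flagged for further checks.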
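Slide 33 describes AIPs stored as TAR files in several bitwise identical copies, with checksums recorded for packages and files. The sketch below illustrates that idea with Python's standard `tarfile` and `hashlib` modules. It is not ESSArch code: the choice of SHA-256 and the helper names `build_aip_tar`, `verify_copies` and `verify_files` are assumptions made for the example.

```python
import hashlib
import io
import tarfile
from typing import Dict, List, Tuple


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest (the algorithm is an assumption for this sketch)."""
    return hashlib.sha256(data).hexdigest()


def build_aip_tar(files: Dict[str, bytes]) -> Tuple[bytes, Dict[str, str], str]:
    """Pack files into an uncompressed in-memory TAR.

    Returns the TAR bytes, a checksum per file and a checksum for the package.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(files):          # stable member order
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = 0                  # fixed timestamp for reproducible output
            tar.addfile(info, io.BytesIO(data))
    tar_bytes = buf.getvalue()
    file_sums = {name: sha256_hex(data) for name, data in files.items()}
    return tar_bytes, file_sums, sha256_hex(tar_bytes)


def verify_copies(copies: List[bytes], package_sum: str) -> bool:
    """True only if every stored copy matches the recorded package checksum."""
    return all(sha256_hex(copy) == package_sum for copy in copies)


def verify_files(tar_bytes: bytes, file_sums: Dict[str, str]) -> bool:
    """Re-read each member from the TAR and compare it with its recorded checksum."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        for name, expected in file_sums.items():
            try:
                member = tar.extractfile(name)
            except KeyError:                # member missing from the package
                return False
            if member is None or sha256_hex(member.read()) != expected:
                return False
    return True
```

Fixing the member order and timestamps keeps the output deterministic, so packing the same input twice yields the same TAR bytes and the same package checksum. Comparing each stored copy with one recorded checksum is one simple way to confirm that the copies are still bitwise identical.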
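Slides 33 and 35 mention PREMIS metadata stored with each AIP. As a rough illustration of what file-level preservation metadata can record, the sketch below builds a PREMIS object element carrying an identifier, a fixity value, a size and a format name using Python's `xml.etree.ElementTree`. The element names follow PREMIS version 2 as we understand it, but the output has not been validated against the PREMIS schema, and the helper `premis_file_object` and the identifier type "local" are hypothetical.

```python
import xml.etree.ElementTree as ET

PREMIS_NS = "info:lc/xmlns/premis-v2"   # PREMIS 2.x namespace
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("premis", PREMIS_NS)
ET.register_namespace("xsi", XSI_NS)


def p(tag: str) -> str:
    """Qualify a tag name with the PREMIS namespace."""
    return f"{{{PREMIS_NS}}}{tag}"


def premis_file_object(identifier: str, digest: str, size: int,
                       format_name: str) -> ET.Element:
    """Build a minimal PREMIS file object (hypothetical helper, not schema-validated)."""
    obj = ET.Element(p("object"), {f"{{{XSI_NS}}}type": "premis:file"})
    oid = ET.SubElement(obj, p("objectIdentifier"))
    ET.SubElement(oid, p("objectIdentifierType")).text = "local"
    ET.SubElement(oid, p("objectIdentifierValue")).text = identifier
    chars = ET.SubElement(obj, p("objectCharacteristics"))
    ET.SubElement(chars, p("compositionLevel")).text = "0"
    fixity = ET.SubElement(chars, p("fixity"))
    ET.SubElement(fixity, p("messageDigestAlgorithm")).text = "SHA-256"
    ET.SubElement(fixity, p("messageDigest")).text = digest
    ET.SubElement(chars, p("size")).text = str(size)
    fmt = ET.SubElement(chars, p("format"))
    designation = ET.SubElement(fmt, p("formatDesignation"))
    ET.SubElement(designation, p("formatName")).text = format_name
    return obj


if __name__ == "__main__":
    # Placeholder digest of 64 zeros; a real workflow would compute it from the file.
    element = premis_file_object("content/page_0001.tif", "0" * 64, 1024, "TIFF")
    print(ET.tostring(element, encoding="unicode"))
```

In a METS-based AIP such an element would typically sit inside the administrative metadata section; exactly where the Swedish specifications place it is not covered by the slides.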
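Slides 41 to 43 describe a generic SIP: basic metadata common to all information types, extended by content type specifications such as the one for ERMS systems. The data model below is a hypothetical sketch of that layering in Python; the field names, the content type keys and the required metadata are invented for illustration and are not taken from the published Swedish specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical content type specifications: extra metadata keys that each
# information type must supply on top of the common (generic) SIP metadata.
CONTENT_TYPE_REQUIREMENTS: Dict[str, List[str]] = {
    "generic": [],
    "ERMS": ["case_number", "registration_date"],   # invented keys
}


@dataclass
class GenericSip:
    """Metadata common to every SIP, whatever its information type (illustrative fields)."""
    submitting_agency: str
    archive_creator: str
    content_type: str
    files: List[str]
    content_metadata: Dict[str, str] = field(default_factory=dict)


def validate(sip: GenericSip) -> List[str]:
    """Check the common part first, then the content type specific part."""
    errors: List[str] = []
    if not sip.files:
        errors.append("SIP contains no files")
    required = CONTENT_TYPE_REQUIREMENTS.get(sip.content_type)
    if required is None:
        errors.append(f"unknown content type: {sip.content_type}")
        return errors
    for key in required:
        if key not in sip.content_metadata:
            errors.append(f"missing {key} for content type {sip.content_type}")
    return errors
```

For example, an ERMS package supplying only `case_number` yields `["missing registration_date for content type ERMS"]`. Keeping the content type requirements in a table separate from the common structure is one way to mirror the idea of a single generic package structure with interchangeable content type specifications.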