Learn what digital preservation is, why it's important and how to take your first steps on the digital preservation ladder.
Key message: DON'T DO NOTHING!
But before I get going, one thing I’m often asked is ‘what I wish I knew before I started’ when it comes to digital preservation.
One thing that I’ve certainly found out is that lightning really does strike twice.
Lightning strikes are also more relevant to digital preservation than you might think. If you’re looking after digital content for decades, then a large number of things, both in and out of your immediate control, can happen and/or go wrong, and it’s your job to stop them.
Just like lightning, the likelihood that many of them will happen to you is very low. But then again, maybe you didn’t know that 24,000 people are killed by lightning each year? Whilst the odds of any one of you being hit are a million to one, there will always be the ‘unlucky ones’.
What I’ll do today is talk about how digital preservation processes and technology can help you improve those odds.
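To put some rough numbers on why small risks matter over decades, here is a minimal sketch of how individually unlikely events add up. The annual probabilities below are purely hypothetical placeholders, and the calculation assumes the risks are independent and constant from year to year, which real risks rarely are.

```python
# Hypothetical annual probabilities for a few independent risks to a collection.
# These numbers are illustrative assumptions, not measured rates.
ANNUAL_RISKS = {
    "media failure": 0.02,
    "accidental deletion": 0.01,
    "format obsolescence": 0.01,
}


def prob_at_least_one(annual_probs, years):
    """Chance that at least one event occurs over `years`, assuming
    independent events with constant annual probabilities."""
    p_none_per_year = 1.0
    for p in annual_probs:
        p_none_per_year *= 1.0 - p
    return 1.0 - p_none_per_year ** years


for years in (1, 10, 25):
    print(years, round(prob_at_least_one(ANNUAL_RISKS.values(), years), 3))
```

Even with each risk at only one or two per cent a year, the chance that something goes wrong climbs steeply once you look across a 25-year horizon, which is the point of managing the odds actively.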
Of course, you might ask yourself: we have physical artefacts that are 3,000 years old, so why should we bother to preserve today’s artefacts? Won’t they also be here in 3,000 years’ time? It’s true that we have some artefacts from 3,000 years ago, but what percentage do we have? How many have been lost, and how many have been destroyed? I would estimate that we have only a tiny fraction of those ancient artefacts.
The other answer to ‘what I wish I knew before I started’ is that technology is not the place to start.
Digital preservation is primarily about people. It’s about people having the right skills. It’s about people having the right plans. It’s about people working as a team and achieving more than they could on their own.
Technology helps people do digital preservation but people are the place to start.
You can see this in the DCC (that’s the Digital Curation Centre) curation lifecycle model.
People decide what needs preserving
People make the business case
People make preservation plans
People decide on the preservation actions to take
People monitor and manage the process
People hand over to successors
Technology helps support all these activities.
Delaying a decision to get started and doing nothing is the worst thing you can do.
Delays cause digital data to become derelict. Neglect has serious consequences in the digital world – it’s not benign. A decision to do nothing or to delay action can be the equivalent of a digital death sentence. Or, if nothing else, it just increases the cost.
To give you an example, when the BBC was digitising some of its holdings by migrating U-Matic video tapes to a digital format, it found that costs went up five times if a tape didn’t play back the first time. The longer they waited, the higher the costs, because more tapes degraded in the meantime. Worse still, treating the casualties first, i.e. the degraded tapes, sends the ‘good stuff’ to the back of the queue, and the extra time spent waiting in line creates further casualties.
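The cost of waiting can be sketched with a simple expected-cost model. Only the roughly 5x multiplier comes from the BBC example; the annual degradation rate is a hypothetical input, and the model assumes a constant rate with every degraded tape costing the full multiple.

```python
BASE_COST = 1.0         # relative cost to digitise a tape that plays back first time
FAILURE_MULTIPLIER = 5  # BBC U-Matic example: about 5x if a tape doesn't play back first time


def expected_cost_per_tape(annual_degradation_rate, years_delayed):
    """Expected relative cost per tape after waiting `years_delayed` years,
    assuming a constant (hypothetical) annual degradation rate."""
    degraded = 1 - (1 - annual_degradation_rate) ** years_delayed
    good = 1 - degraded
    return BASE_COST * good + BASE_COST * FAILURE_MULTIPLIER * degraded


# Compare acting now with waiting 5 or 10 years at an assumed 5% degradation a year.
for years in (0, 5, 10):
    print(years, round(expected_cost_per_tape(0.05, years), 2))
```

Every year of delay shifts more of the collection into the expensive category, so the average cost per item only goes one way.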
The same is true of assets that are already in the digital world: failing storage media, lost knowledge of content, more and more time spent trying to fix things just to stand still…
In the end, it’s people that are the biggest risk to digital content surviving into the future. People thinking that preservation is too hard, too expensive or tomorrow’s problem and not today’s.
This is my main message today. Get on and do something.
So what do you do when you have limited resources and need to get started?
You become frugal. You make the most of what limited time and money you have and you make it go a long way by targeting it wisely. Technology can help as I’ll show.
This idea of being frugal is exemplified in two papers by Tim Gollins from The National Archives. He espouses the idea of ‘parsimonious preservation’. Parsimonious here means excessively sparing or frugal: doing the least amount possible while still getting the job done.
If you read only two papers on preservation, then I’d suggest these are the ones.
http://www.nationalarchives.gov.uk/documents/information-management/parsimonious-preservation.pdf
http://www.nationalarchives.gov.uk/documents/information-management/parsimonious-preservation-in-practice.pdf
So what does Tim suggest?
Firstly, know what you have. If you don’t know what you need to keep, how can you decide how to preserve it?
Second, worry about the bits, which means getting the precious stuff into safe storage. If you haven’t got the stuff stored properly, how can you have any confidence that you’ll be able to use it again, in any form, in the future?
And that’s it. Everything else comes later, if ever. The strategy is frugal and minimal.
Know what you have and store it securely.
So let’s suppose you’ve got some money – a budget. Probably not as much as you’d like.
Back to Tim’s parsimonious preservation.
How are you going to be frugal when you spend it?
First thing Tim says is ‘know what you have’.
This means an inventory: a digital stock-take.
At Arkivum, we integrate our services with Archivematica (more on this later) and Archivematica is a great way to conduct a whole series of inventory and preservation activities on your files.
This screenshot shows some of them.
They can all be automated and run in one go. This includes fixity checksumming, file format identification and normalisation, virus scanning, metadata creation, packaging for long-term archiving… and many other preservation tasks.
Transfer a bunch of files into Archivematica and it’ll create a complete Archival Information Package for you – again, more on why this is significant and useful later.
Archivematica is often thought of as just a tool for file format normalisation, that is, for dealing with files in obsolete formats by converting them for you, but it has a more parsimonious application too. It provides an automated way to generate a lot of information about your files, e.g. file types and checksums, without doing anything to the files at all if you don’t want it to.
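To make the ‘know what you have’ step concrete, here is a minimal, read-only inventory sketch in Python. It is not how Archivematica works internally: real identification tools match files against a full format registry such as PRONOM, whereas this sketch checks only a handful of well-known file signatures, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path

# A few well-known file signatures ("magic bytes"). Illustrative only;
# proper identification tools match against a full format registry.
SIGNATURES = {
    b"%PDF": "PDF",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"\xff\xd8\xff": "JPEG",
    b"II*\x00": "TIFF (little-endian)",
    b"MM\x00*": "TIFF (big-endian)",
    b"PK\x03\x04": "ZIP container (e.g. DOCX, XLSX)",
}


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def guess_format(path):
    """Return a format name from the first bytes of the file, or 'unknown'."""
    with open(path, "rb") as fh:
        head = fh.read(16)
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return "unknown"


def inventory(root):
    """Yield one record per file under `root`. Files are only read, never changed."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            yield {
                "path": str(path.relative_to(root)),
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
                "format": guess_format(path),
            }
```

For example, `csv.DictWriter(sys.stdout, fieldnames=["path", "bytes", "sha256", "format"])` can write the records out as a spreadsheet-friendly stock-take, and the checksums become the baseline for later integrity checks.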
So moving along in our parsimonious preservation journey. You now know what you have, but how are you going to store it? This is the second of Tim’s cornerstones: Keep the bits safe.
This diagram shows just some of the things that will happen over 25 years of trying to retain data and keep it safe.
In the diagram, a change from blue to yellow is when something happens that has to be managed. In a growing archive, adding or replacing media, e.g. tapes or discs, can be a daily process, so is effectively continual. The archive system needs regular monitoring and maintenance, which might mean monthly checks and updates. Data integrity needs to be actively verified, for example annual retrievals and integrity tests. Then comes obsolescence of hardware and software, meaning refreshes or upgrades that will typically be every 3 – 5 years.
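The ‘annual retrievals and integrity tests’ mentioned above boil down to recomputing checksums and comparing them with the values recorded when the data was first archived. Here is a minimal sketch, assuming a hypothetical manifest held as a Python dict of relative path to SHA-256 digest.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 without loading it all into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(root, manifest):
    """Compare current checksums with those recorded at ingest.

    `manifest` maps relative path -> expected SHA-256 hex digest.
    Returns (missing, changed); both empty means the check passed.
    """
    missing, changed = [], []
    for rel_path, expected in manifest.items():
        path = Path(root) / rel_path
        if not path.is_file():
            missing.append(rel_path)
        elif sha256_of(path) != expected:
            changed.append(rel_path)
    return missing, changed
```

A failed check is only useful if there is another good copy to repair from, which is why fixity checking and multiple copies go hand in hand.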
In addition to technical change, there is the need to manage the transition of the staff who run the system, for example support staff and administrators. And suppliers of products and services will come and go too.
Basically, the lifetime of the data is longer than the lifetime of almost everything that’s used to keep that data safe and accessible. The key point is that long-term archiving is an active process and there’s always some form of change going on. And when change happens there’s always a risk that something goes wrong, and there’s always the need to validate that the change has been effected properly. This all requires time, expertise and money. Data archiving is a case of continual interventions to keep content alive and accessible.
So, how do you actually keep your data alive over decade-long timescales?
First you create multiple copies of your data and store them in different locations.
Maybe use different technologies and get different people to look after them. Diversity is your friend here.
At Arkivum we guarantee data integrity and we do this by following a 3-2-1 rule. At least three copies in three separate locations with two online and one offline. The offline copy is important. It’s the firebreak if all else goes wrong.
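The rule as stated here can be written down as a simple check. This sketch encodes this talk’s version of 3-2-1 (at least three copies, in three separate locations, at least two online and at least one offline), which differs from other common phrasings of the rule; the `Copy` type and location names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    location: str  # a site or data centre name (hypothetical)
    online: bool   # False for offline copies, e.g. tape held in escrow


def meets_3_2_1(copies):
    """At least three copies in three separate locations,
    with at least two online and at least one offline."""
    locations = {c.location for c in copies}
    online = sum(1 for c in copies if c.online)
    offline = sum(1 for c in copies if not c.online)
    return len(copies) >= 3 and len(locations) >= 3 and online >= 2 and offline >= 1
```

For example, two online copies at different sites plus an offline escrow copy passes, while three online copies fail because there is no offline firebreak.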
The second part is to migrate those copies regularly. Keep the data moving on to storage pastures new. If data has been on a given storage medium, such as a specific tape or disk, for more than 5 years, then you should worry. After 5 years, there’s a good chance that the media will start to fail, or the data will start to degrade, or, more likely, the storage media and the system they’re part of will become obsolete and unsupported, making it harder to get the data off that media when you need it.
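The 5-year rule of thumb is easy to turn into a routine report. This is a minimal sketch, assuming a hypothetical register that records when data was first written to each piece of media; it flags media once five full years have passed, so it errs slightly on the early side.

```python
from datetime import date

MAX_MEDIA_AGE_YEARS = 5  # rule of thumb from the talk


def years_between(start, end):
    """Whole years elapsed between two dates."""
    years = end.year - start.year
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years


def due_for_migration(media_written, today=None):
    """Return the IDs of media whose data has sat on them for five years or more.

    `media_written` maps a (hypothetical) media ID to the date data was
    first written to it.
    """
    today = today or date.today()
    return sorted(
        media_id
        for media_id, written in media_written.items()
        if years_between(written, today) >= MAX_MEDIA_AGE_YEARS
    )
```

For example, `due_for_migration({"TAPE-001": date(2015, 1, 1), "TAPE-002": date(2022, 6, 1)}, today=date(2024, 1, 1))` flags only `TAPE-001`.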
At Arkivum we provide a bit-preservation service that guarantees data integrity. We provide the highest level of secure data storage as defined by the National Archives maturity model (we’re at level 4) and we have all the necessary people, processes and facilities in place to deliver this. We’re also regularly audited and insured for keeping data safe.
Without wanting to give you a sales pitch, the important thing is that we deliver this to our customers as a service. They get level 4 storage and fixity without needing their own skilled staff or dedicated infrastructure.
Ok, to recap… keep calm BUT you do need to get on with some parsimonious preservation.
Let’s now look at how you could get going with a minimum amount of fuss.
What do you need to get going? To be able to say that you’re doing something (and not nothing).
We’ve launched a new service designed to do just one thing – to get you on the digital preservation ladder. It is a minimal solution and delivers just what is required to get on that ladder.
The new service combines file format normalisation, along with all of the other tasks that Archivematica delivers, with long-term secure data archiving. It fulfils all the requirements of a frugal and parsimonious approach.
What then, is this new service?
It’s a fully hosted, cloud-based managed service; it has zero setup costs; and has no requirement for local IT infrastructure - you can start using the service immediately.
It follows the OAIS (Open Archival Information System) model and is designed for the long term. It includes a 100% data integrity guarantee backed by indemnity insurance, and escrow copies of all the customer’s data.
The service provides an automated and managed digital preservation service using open-source, industry-standards-compliant products and services.
And as I said, it requires no local IT infrastructure, resources or IT expertise - this is a fully-hosted and managed service.
Arkivum/Perpetua includes data escrow to provide you with a built-in exit strategy - data is stored with a third-party escrow provider so you have access to your data if and when you ever choose to leave the service.
To sum up, it’s hosted Archivematica bundled with hosted Arkivum/100.
What do you actually get?
The starting point, and you can scale from here as high as you like, is as follows…
You get 1TB of Arkivum archive storage. This is our flagship service that stores three copies of your data and that provides a 100% data integrity guarantee.
This is combined with 120GB of Archivematica cache storage. This is the space you’ll use when ingesting bundles of files into the service.
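Because the cache is the working space for ingest, it can help to plan transfers that fit within it. This is a minimal sketch that greedily groups files into batches under a size limit. It assumes a batch only needs its raw file size in cache, which is a simplification: normalised copies also take space, so in practice you would leave headroom, for example by passing a smaller `limit`.

```python
CACHE_BYTES = 120 * 10**9  # 120 GB of Archivematica cache (decimal GB assumed)


def plan_batches(files, limit=CACHE_BYTES):
    """Group (name, size_in_bytes) pairs into batches whose total size stays
    within `limit`, keeping the original order. A file larger than the
    limit can't be ingested this way and is reported separately."""
    batches, too_big = [], []
    current, current_size = [], 0
    for name, size in files:
        if size > limit:
            too_big.append(name)
            continue
        if current and current_size + size > limit:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches, too_big
```

For example, with a limit of 100, files of sizes 60, 50, 30 and 200 become two batches, `[["a"], ["b", "c"]]`, with the 200-byte file flagged as too big.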
To get you going, you also get Arkivum and Archivematica training and a day of professional services that you can use for individual training, consultancy or technical services.
And of course support.
Let’s look at this a bit more schematically
Arkivum/Perpetua consists of the Archivematica service integrated with the Arkivum data archiving service.
Files, normally in batches, are transferred into the service. This process can be automated with any number of tools. A simple approach is to set up a drop folder that kicks off the process of getting files into Archivematica. For Arkivum/Perpetua, we’ll be recommending ownCloud as a straightforward way of automating the ingest process. ownCloud is free and open source and is roughly equivalent in functionality to Dropbox.
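The drop-folder idea can be illustrated with a small polling loop. This is a generic sketch, not how ownCloud or Archivematica are wired together: the folder names are hypothetical, and ‘start a transfer’ is represented simply by moving a file into a staging folder. A file is treated as ready once its size is unchanged between two polls, a common heuristic for ‘finished copying’.

```python
import shutil
import time
from pathlib import Path


def stable_files(folder, previous_sizes):
    """Return files whose size hasn't changed since the last poll, i.e. that
    have probably finished copying in. Updates `previous_sizes` in place."""
    current = {p: p.stat().st_size for p in Path(folder).iterdir() if p.is_file()}
    ready = [p for p, size in current.items() if previous_sizes.get(p) == size]
    previous_sizes.clear()
    previous_sizes.update(current)
    return sorted(ready)


def poll_drop_folder(drop, staging, interval_s=30, cycles=None):
    """Move settled files from a drop folder into a staging folder from which
    a transfer would be started. Runs forever if `cycles` is None."""
    seen = {}
    n = 0
    Path(staging).mkdir(parents=True, exist_ok=True)
    while cycles is None or n < cycles:
        for path in stable_files(drop, seen):
            shutil.move(str(path), str(Path(staging) / path.name))
            seen.pop(path, None)
        n += 1
        time.sleep(interval_s)
```

Polling is the simplest portable option; an event-driven file watcher would react faster but needs a third-party library.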
The transfer process includes automated steps for virus scanning; file format identification (knowing what formats you have is very important: it helps you in the future and also helps you have a conversation with the digitiser about the file formats they are supplying); checksum generation; bundling into a bag for easy storage; and creating a submission information package (SIP) that drives the subsequent normalisation process.
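The ‘bag’ here refers to the BagIt packaging format (RFC 8493): payload files go under a `data/` directory, alongside a `bagit.txt` declaration and a manifest of checksums. Below is a minimal sketch that writes only those required parts using the standard library; the Library of Congress’s bagit-python package provides a fuller `make_bag`, and Archivematica does its own bagging internally.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Stream a file through SHA-256."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_minimal_bag(source_dir, bag_dir):
    """Copy `source_dir` into `bag_dir/data` and write the two required tag
    files of a BagIt 1.0 bag: bagit.txt and a SHA-256 payload manifest."""
    bag = Path(bag_dir)
    data = bag / "data"
    shutil.copytree(source_dir, data)  # also creates bag_dir if needed
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    lines = []
    for path in sorted(p for p in data.rglob("*") if p.is_file()):
        rel = path.relative_to(bag).as_posix()  # e.g. "data/images/scan001.tif"
        lines.append(f"{sha256_of(path)}  {rel}\n")
    (bag / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag
```

Because the manifest travels with the files, anyone receiving the bag can re-verify every checksum without any other system.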
Archivematica then normalises the files (this is the file-format preservation step whereby the original file is converted to a range of alternative formats to maximise the likelihood of being able to access the file in the future - one of the core tenets of digital preservation).
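Normalisation is driven by rules that map an identified format to the formats you want to create. This toy lookup only illustrates the idea: the mappings are assumptions for the sketch, not Archivematica’s actual Format Policy Registry rules. It keeps the original alongside any derivatives, as Archivematica AIPs do, and flags formats that have no rule.

```python
# Illustrative policy: identified format -> (preservation format, access format).
# These mappings are assumptions for the sketch, not Archivematica's actual rules.
POLICY = {
    "JPEG": ("TIFF", "JPEG"),
    "TIFF": ("TIFF", "JPEG"),
    "WAV": ("WAV", "MP3"),
    "MP3": ("WAV", "MP3"),
    "DOCX": ("PDF/A", "PDF"),
}


def plan_normalisation(identified):
    """Map each file's identified format to the derivatives to create.
    Originals are always kept; formats with no rule are flagged."""
    plan, unmatched = {}, []
    for name, fmt in identified.items():
        if fmt in POLICY:
            preservation, access = POLICY[fmt]
        else:
            preservation, access = None, None
            unmatched.append(name)
        plan[name] = {"original": fmt, "preservation": preservation, "access": access}
    return plan, unmatched
```

The unmatched list is a useful output in its own right: it tells you which formats in your collection need a decision.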
The final part of the ingest process is to create the AIP and again this is a fully automated process. In Arkivum/Perpetua, the AIP is automatically moved into the Arkivum archive storage part of the service.
This then triggers the standard Arkivum archive process, whereby the AIP is encrypted and replicated to create at least three copies, one of which is stored offline in escrow.
The archived AIPs are online or near-line and can be retrieved under an SLA in which retrieval begins within 5 minutes.
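The replication step can be sketched as ‘copy, then prove the copy’. This minimal example copies a packaged AIP file to several target directories and checks each copy against the source checksum. Encryption and the offline escrow copy are deliberately left out, and the target directories are hypothetical stand-ins for separate storage locations.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Stream a file through SHA-256."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(aip_file, targets, min_copies=3):
    """Copy one packaged AIP file to each target directory and confirm every
    copy matches the source checksum. Encryption is out of scope here."""
    if len(targets) < min_copies:
        raise ValueError(f"need at least {min_copies} targets, got {len(targets)}")
    expected = sha256_of(aip_file)
    copies = []
    for target in targets:
        Path(target).mkdir(parents=True, exist_ok=True)
        dest = Path(target) / Path(aip_file).name
        shutil.copy2(aip_file, dest)
        if sha256_of(dest) != expected:
            raise OSError(f"checksum mismatch for copy at {dest}")
        copies.append(dest)
    return copies
```

Verifying each copy at write time means a bad copy is caught immediately rather than at the next scheduled integrity check.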
That’s it. And it’s all fully automated.
So, to recap…
Arkivum/Perpetua is a fully automated, end-to-end service for digitally preserving files of any type, size or format.
The end result is a normalised AIP that has been archived to a repository that provides bit-level data preservation over decade-long timescales...
… very much the frugal basics, and the foundation for a long-term digital preservation system
My closing message is very simple. Get on with it. Don’t do nothing. Do the basics.
If you want a cost-effective solution that takes the line of least resistance and pain, a hosted, managed service is the way to go. Especially when you factor in the savings associated with local IT infrastructure, resources and expertise.
Our Arkivum/Perpetua service provides the absolute minimum you need to be able to say that you are actively preserving your digital and digitised assets.
It’s available right now. And you can be up and running with it in minutes.
Don’t prevaricate. Don’t procrastinate. Start today.
We’ve written an eBook on all of this that doubles as a ‘how-to’ and a ‘beginner’s guide’ for digital preservation. It’s just being finalised and will be out next week. We’ll email you when it’s ready.