I'm assuming that the audience is likely to be archivists and other non-programmers. Given that, I've geared this presentation towards the philosophy that underpins the futureArch/BEAM development work rather than the technology itself. This is not the place to talk about architectures and such. The main gist of the philosophy is to keep things simple, modular and practical. We're a pragmatic bunch, I think!
Just to be clear on the distinction, futureArch is the project and the project is about establishing the BEAM service at the Bodleian Library. BEAM as a service to the library is quite interesting – it quite neatly postpones the need for organisational change by building a service in parallel with existing practice, augmenting the traditional work flow rather than trying to replace it. Ultimately archiving will adapt to digital, like it absorbed photography in the past, but in the meantime we'll do the digital work...
This is a very simplistic view of how things get into archives. I appreciate there is far more going on than I know or can cover here. Firstly the stuff arrives at the Library from the donor. At this point it is checked over and any risks – active mould, etc. – are identified. Then the stuff moves to the safe storage location. In an ideal world it is then prepared, appraised and catalogued, but staffing, cost and political will all vary the time between a collection arriving & it being worked on. (Archives & waiting seem to go hand in hand!) Finally, the things are made available to researchers, etc.
So that is the traditional work flow – how then are we going about “hybridizing” it – introducing the digital? Rather than change the work flow, we inject new steps into it, hopefully making handling digital material easier for the archivists. First then, we get in on that risk assessment and suggest that digital material needs special treatment on arrival. (It is already at risk – it is effectively “born-mouldy”!) When found, the material is transferred to BEAM for processing. The earlier the better, because 1) it may take time to process, so the sooner we get it the sooner we can start work on it, and 2) we can “stabilise” it sooner.
Once the digital materials have been identified, they need to get to BEAM safely and securely. This separation of the materials must also be traceable, with the original order preserved. We get very technical here and manage the separation with two sheets of paper. In addition to the physical transfer, BEAM are also building a digital transfer service – Web-based deposit – to make the collection of born-digital material as easy as possible. This also helps encourage donors to deposit little and often rather than dump everything on us (in obsolete formats) at the end of their working lives!
When an archivist encounters a 3-inch disk in a box it is unlikely they will know what to do with it, let alone how to catalogue the contents of that disk. BEAM will provide cataloguers with the tools they need to catalogue all the weird and wonderful digital stuff they are seeing more and more of. This includes transforming formats for presentation & cataloguing, providing hints & metadata generated automatically or by the digital archivist, etc.
The Bodleian generally insists readers come to the reading rooms to view manuscripts. This policy is not changing for the digital materials (yet – it is possible reader pressure will change that in the future). So we need a way to present the born-digital along with the paper and to do this we provide customised laptops to the reading rooms. These laptops can be used alongside the paper and we do everything we can to prevent data loss by digital transfer.
So that is what BEAM is offering to the Library. The next question is what are BEAM doing themselves – which is to say what is going on in that mysterious blue box?
The answer is nothing very different to the (traditional) work flow already outlined. We are effectively mimicking the archive process to handle digital materials. Capture is like transfer, only instead of a delivery van we have digital forensics. Digital material needs digital storage and that storage needs the same considerations as paper stores. BS5454 is actually quite relevant to digital – both literally and figuratively – for example it's important to check your network neighbours as well as your real ones. Cataloguing remains a separate process, but I can see that merging in the future. We use the digital metadata to feed any preservation actions that may be required to maintain the storage of digital items. As mentioned, we see a hybrid approach to reading room usage of the materials, hence the merged box.
The bit I'll consider here is this bit – getting digital material into a safe place. Why? Because this is probably what is concerning us the most right now – stuff is arriving and we need to get it on the shelves now and not leave it out in the rain!
We're contributing to the development of the Bodleian's digital asset management system. This will provide BEAM with both computing power and disk-based storage space. In order to maintain the data we're keeping two copies on separate sites. The data is written to both stores and cross-checks are made, but the stores are not synchronised, because if one site fails we do not want to sync the error to the other site! We're opting for simple file storage for the first phase of the archive work flow. Hopefully simplicity will stand the test of time! Also, we're not dealing with single “documents” but with compound disk images. Repository software seems to add complexity that doesn't provide any benefit at this stage in the process.
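For the more technically minded, the "cross-check but don't sync" idea can be sketched in a few lines of Python. This is only an illustration under assumptions of my own, not the actual BEAM tooling: it assumes each site is visible as a plain directory, and the names `sha256_of`, `manifest` and `cross_check` (and the choice of SHA-256) are hypothetical. The point is that the check only reports differences; it never copies anything between sites, so a fault at one site cannot be propagated to the other.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def cross_check(site_a: Path, site_b: Path) -> dict[str, list[str]]:
    """Compare two stores and report differences. Nothing is copied or repaired."""
    a, b = manifest(site_a), manifest(site_b)
    return {
        "missing_at_b": sorted(a.keys() - b.keys()),
        "missing_at_a": sorted(b.keys() - a.keys()),
        "mismatched": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

Anything the report flags would then be looked at by a person, who decides which copy (if either) is the good one before any repair is made.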
Open source helps in many ways. For example, if I use software to do a transformation, maybe I want to keep that software. If a project fails, maybe someone will adopt it or at least maintain the final version. Freedom of usage seems to provide the edge in terms of sustainability (in the non-financial sense). Open standards are great. But don't make work for yourself trying to implement an open standard internally if it doesn't meet your needs. Keep open standards for the boundaries of the system and avoid shoehorns! Keeping the original may end up with lots of data, but it seems to me it is the only way to ensure integrity. As disks get bigger manual cataloguing of the born-digital will become impossible. How then do you know what to keep? (nb. Storage is finite!)
Fundamental to futureArch development is the building of a system where any or all of the parts can (and will) be replaced. This is the only way to ensure the long-term survival of the system and is founded on the assumption that whatever we do now will be done better by someone else later. Imagine a car with all the parts glued together. It'd be a shame to replace the whole car just because a bulb blew! Modularity & virtualisation allow for a scalable system too – adding resource when needed and scaling back when not.
BEAM is pretty ambitious! We're not intending to build all of these parts from scratch however. We'll be begging, borrowing and stealing all we can! :-) Which is to say, if a tool exists, it is well worth seeing if it meets your needs and if it does, use it. Keep stuff simple and keep stuff easy.
Bodleian Electronic Archives & Manuscripts: Developing and Implementing Tools to Manage Hybrid Archives
futureArch & BEAM: futureArch is the project to establish the BEAM service, running for 3 years (since Sept 2008). The goal: “to enable the Bodleian Library to continue to develop and deliver archival and manuscript collections as the format of such material changes*” (*which is to say, becomes digital). Five staff: <ul><ul><li>Project Manager/Digital Archivist