Developing and Implementing Tools to Manage Hybrid Archives


Published in: Technology, Education

  • I'm assuming that the audience is likely to be archivists and other non-programmers. Given that, I gear this presentation towards the philosophy that underpins the futureArch/BEAM development work rather than the technology itself. This is not the place to talk about architectures and such. The main gist of the philosophy is to keep things simple, modular and practical. We're a pragmatic bunch, I think!
  • Just to be clear on the distinction, futureArch is the project and the project is about establishing the BEAM service at the Bodleian Library. BEAM as a service to the library is quite interesting – it quite neatly postpones the need for organisational change by building a service in parallel with existing practice, augmenting the traditional work flow rather than trying to replace it. Ultimately archiving will adapt to digital, like it absorbed photography in the past, but in the meantime we'll do the digital work...
  • This is a very simplistic view of how things get into archives. I appreciate there is far more going on than I know or can cover here. Firstly the stuff arrives at the Library from the donor. At this point it is checked over and any risks – active mould, etc. – are identified. Then the stuff moves to the safe storage location. In an ideal world it is then prepared, appraised and catalogued, but staffing, cost and political will all vary the time between a collection arriving and it being worked on. (Archives & waiting seem to go hand in hand!) Finally, the things are made available to researchers, etc.
  • So that is the traditional work flow – how then are we going about “hybridizing” it, that is, introducing the digital? Rather than change the work flow, we inject new steps into it, hopefully making handling digital material easier for the archivists. First then, we get in on that risk assessment and suggest that digital material needs special treatment on arrival. (It is already at risk – it is effectively “born-mouldy”!) When found, the material is transferred to BEAM for processing. The earlier the better, because 1) it may take time to process, so the sooner we get it the sooner we can start work on it, and 2) we can “stabilise” it sooner.
  • Having identified the digital materials, they need to get to BEAM safely and securely. This separation of the materials must also be traceable and must preserve the original order. We get very technical here and manage the separation with two sheets of paper. In addition to the physical transfer, BEAM are also building a digital transfer service – Web-based deposit – to make the collection of born-digital material as easy as possible. This also helps encourage donors to deposit little and often rather than dump everything on us (in obsolete formats) at the end of their working lives!
  • When an archivist encounters a 3-inch disk in a box, it is unlikely they will know what to do with it, let alone how to catalogue its contents. BEAM will provide cataloguers with the tools they need to catalogue all the weird and wonderful digital stuff they are seeing more and more of. This includes transforming formats for presentation & cataloguing, providing hints & metadata generated automatically or by the digital archivist, etc.
  • The Bodleian generally insists readers come to the reading rooms to view manuscripts. This policy is not changing for the digital materials (yet – it is possible reader pressure will change that in the future). So we need a way to present the born-digital along with the paper and to do this we provide customised laptops to the reading rooms. These laptops can be used alongside the paper and we do everything we can to prevent data loss by digital transfer.
  • So that is what BEAM is offering to the Library. The next question is what are BEAM doing themselves – which is to say what is going on in that mysterious blue box?
  • The answer is nothing very different to the (traditional) work flow already outlined. We are effectively mimicking the archive process to handle digital materials. Capture is like transfer, only instead of a delivery van we have digital forensics. Digital material needs digital storage and that storage needs the same considerations as paper stores. BS5454 is actually quite relevant to digital – both literally and figuratively – for example, it's important to check your network neighbours as well as your real ones. Cataloguing remains a separate process, but I can see that merging in the future. We use the digital metadata to feed any preservation actions that may be required to maintain the storage of digital items. As mentioned, we see a hybrid approach to reading room usage of the materials, hence the merged box.
  • The bit I'll consider here is this bit – getting digital material into a safe place. Why? Because this is probably what is concerning us the most right now – stuff is arriving and we need to get it on the shelves now and not leave it out in the rain!
  • We're contributing to the development of the Bodleian's digital asset management system. This will provide BEAM with both computing power and disk-based storage space. In order to maintain the data we're keeping two copies on separate sites. The data is written to both stores and cross-checks are made, but not synchronisation because if one site fails we do not want to sync the error to the other site! We're opting for simple file storage for the first phase of the archive work flow. Hopefully simplicity will stand the test of time! Also we're not dealing with single “documents” but compound disk images. Repository software seems to add complexity that doesn't provide any benefit at this stage in the process.
  • Open source helps in many ways. For example, if I use software to do a transformation, maybe I want to keep that software. If a project fails, maybe someone will adopt it or at least maintain the final version. Freedom of usage seems to provide the edge in terms of sustainability (in the non-financial sense). Open standards are great. But don't make work for yourself trying to implement an open standard internally if it doesn't meet your needs. Keep open standards for the boundaries of the system and avoid shoehorns! Keeping the original may end up with lots of data, but it seems to me it is the only way to ensure integrity. As disks get bigger manual cataloguing of the born-digital will become impossible. How then do you know what to keep? (nb. Storage is finite!)
  • Fundamental to futureArch development is the building of a system where any or all of the parts can (and will) be replaced. This is the only way to ensure the long-term survival of the system and is founded on the assumption that whatever we do now will be done better by someone else later. Imagine a car with all the parts glued together. It'd be a shame to replace the whole car just because a bulb blew! Modularity & virtualisation allow for a scalable system too – adding resource when needed and scaling back when not.
  • BEAM is pretty ambitious! We're not intending to build all of these parts from scratch however. We'll be begging, borrowing and stealing all we can! :-) Which is to say, if a tool exists, it is well worth seeing if it meets your needs and if it does, use it. Keep stuff simple and keep stuff easy.
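
The two-site storage approach described in the notes (write to both stores, cross-check them, but never synchronise) can be illustrated with a small sketch. This is not the BEAM implementation; the function names, the choice of SHA-256 and the directory layout are all assumptions made for illustration. The point is that the check only reports differences and never copies anything, so an error at one site cannot be propagated to the other.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(store: Path) -> dict[str, str]:
    """Map each file's path (relative to the store root) to its checksum."""
    return {
        p.relative_to(store).as_posix(): sha256_of(p)
        for p in sorted(store.rglob("*"))
        if p.is_file()
    }


def cross_check(site_a: Path, site_b: Path) -> dict[str, list[str]]:
    """Compare two stores and report differences without changing either.

    Hypothetical helper: it deliberately does not repair or synchronise,
    so a person decides which copy (if any) is the good one.
    """
    a, b = manifest(site_a), manifest(site_b)
    return {
        "only_at_a": sorted(a.keys() - b.keys()),
        "only_at_b": sorted(b.keys() - a.keys()),
        "mismatched": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

Calling `cross_check(Path("/site-a/store"), Path("/site-b/store"))` (both paths are placeholders) returns three lists that can be logged or reviewed; an empty report means the two copies agree.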

    1. Bodleian Electronic Archives & Manuscripts: Developing and Implementing Tools to Manage Hybrid Archives
    2. futureArch & BEAM
        • futureArch is the project to establish the BEAM service, for 3 years (since Sept 2008)
        • The goal: “to enable the Bodleian Library to continue to develop and deliver archival and manuscript collections as the format of such material changes*”
        • *which is to say becomes digital
        • Five staff:
            • Project Manager/Digital Archivist
            • Project Archivist
            • Two Software Engineers (of which I am one)
            • One graduate trainee a year for two years
        • funded by
    3. Overview
        • Bodleian archival work flow (by a software engineer :-))
        • How BEAM is 'hybridizing' it
        • Demonstrating the Ingest tool
        • BEAM tools & development principles
    4. A simple view of archive work flow
        [Diagram labels: Transfer, Catalogue, Use, Store, quarantine, time]
    5. Augmenting the work flow
        • Identify
        • Transfer to BEAM
        • Staff access & metadata
        • Researcher access
        [Diagram labels: BEAM, Transfer, Catalogue, Use, Store, Store, quarantine, time]
    6. Augmenting the work flow (same list as slide 5)
        [Diagram labels: BEAM, Transfer, Catalogue, Use, Store, Store, Digital Transfer, quarantine, time]
    7. Augmenting the work flow (same list and diagram labels as slide 5)
    8. Augmenting the work flow (same list and diagram labels as slide 5)
    9. Issues so far
        • Early identification
        • Separation
        • Safe storage
        • Access
            • Cataloguers
            • Researchers
            • Public Web
    10. This mysterious blue box?
        [Diagram labels: ?, BEAM, Transfer, Catalogue, Use, Store, Store, quarantine, time]
    11. The mimetic BEAM
        [Diagram labels: Ingest, Capture, Meta, Catalogue, Use, Store, Transfer, EAD, Catalogue, Store, Digital Transfer, Shelves, Digital Store, Preservation Actions]
    12. The mimetic BEAM (same diagram labels as slide 11)
    13. A bit about BEAM storage
        • Bodleian's Digital Asset Management System
        • Two sites / two servers
        • Different storage technology
            • One a self-healing file system, one tried and tested
        • Must be local
        • Big & shiny but it's still just a file system
            • So anyone can do it!
    14. A BEAM Tool: Ingest
    15. Tool building principles
        • Open source where possible
        • Open standards where appropriate
        • Shortest path to “stabilisation”
        • Protect the original, transform late
    16. Tool building principles
        • Modules not monoliths!
    17. More BEAM Tools
        • Media recognition manual – see blog
        • Separation sheets
        • Collection Management Database
        • Web-based deposit
        • DAMS
        • Monitoring
        • Automatic Metadata
        • Cataloguer Interface
        • Reading Room Laptops
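
The “modules not monoliths” principle can be shown in miniature. The sketch below is purely illustrative: the `Item` class, the stage names and the metadata keys are invented for this example and are not part of any BEAM tool. Each stage is an independent function with the same shape, so one stage can be swapped for a better one later without touching the others.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Item:
    """A hypothetical digital item moving through the work flow."""
    name: str
    metadata: dict[str, str] = field(default_factory=dict)


# A stage is any function that takes an Item and returns an Item.
Stage = Callable[[Item], Item]


def capture(item: Item) -> Item:
    """Placeholder capture step: records that a disk image was taken."""
    item.metadata["captured"] = "disk image"
    return item


def describe(item: Item) -> Item:
    """Placeholder metadata step, standing in for today's tool."""
    item.metadata["format"] = "unknown"
    return item


def describe_better(item: Item) -> Item:
    """A later replacement for `describe`, with the same interface."""
    item.metadata["format"] = "identified"
    return item


def run(item: Item, stages: list[Stage]) -> Item:
    """Pass the item through each stage in order."""
    for stage in stages:
        item = stage(item)
    return item
```

Replacing the blown bulb rather than the whole car then amounts to changing one entry in the list: `run(Item("disk-001"), [capture, describe])` becomes `run(Item("disk-001"), [capture, describe_better])`.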