This document discusses Stanford's efforts to process and manage born-digital materials from several collections received in the late 1990s and 2000s. It outlines challenges around reading legacy media formats, describing technical metadata, and providing long-term access. The document also describes Stanford's collaboration with other institutions on the AIMS project and their use of FTK forensic software to extract metadata and organize large email collections.
AIMS Workshop Case Study 3: Arrangement and Description Case Study - Stephen ... (AIMS_Archives)
This document summarizes the arrangement and description of the Stephen Jay Gould papers, which include both analog and born-digital materials. The analog materials consist of 550 linear feet of papers in various containers, while the digital materials include over 2,500 files in formats such as WordPerfect, Microsoft Word, Excel, and others stored on floppy disks, computer tapes, and punch cards. Tools like AccessData FTK are being used to process the digital materials, including rendering files, identifying duplicates, full-text searching, and flagging restricted files. Labels are being applied to files to indicate access restrictions, file types, and subjects to carry that metadata when files are exported to the access repository. Finding aids like EAD are
This document provides guidance on using library databases such as ProQuest and JSTOR to research topics and find scholarly journal articles and other sources. It outlines how to develop keywords from a topic, search databases effectively using Boolean logic and filters, collect and save search results, and get help from a librarian if needed. Databases contain peer-reviewed sources not available elsewhere and can save time compared to general web searches. Tips are provided on refining searches, choosing file formats, and excluding book reviews from JSTOR results.
Born digital archives refer to personal and corporate archives that are created and stored in digital formats, rather than physical formats. They typically include draft works, diaries, correspondence, photographs, and other digital files and objects. These archives pose challenges for preservation due to the variety of file formats, operating systems, and storage media used over time as technologies become obsolete. Institutions must address issues related to representing relationships within archives, scaling workflows, data protection, and educating users on access to these archives.
Presentation on the use of the Eureka Research Workbench to store data and scientific workflow information. Presented online as part of the Dial-a-molecule 'Liberating Laboratory Data' event (http://www.dial-a-molecule.org/wp/events-listing/liberating-laboratory-data/)
DOIs and Other Persistent Identifiers in Research Data, Eugene Barsky (ORCID, Inc.)
- Persistent identifiers like DOIs, Handles, ARKs and PURLs provide long-lasting references to digital resources. They ensure the provenance and persistence of cited resources over time.
- DOIs have additional benefits like discoverability, making resources findable across scholarly databases. UBC Library has developed a GUI to mint DOIs as a service for researchers to identify their work.
- UBC has signed an agreement with DataCite Canada to issue DOIs, which can be done for individual resources, via CSV files, or programmatically (see the sketch below). The library is open to collaborating on issuing DOIs both within UBC and beyond.
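As a rough illustration of the programmatic route, minting a DOI through the DataCite REST API can look like the following sketch. The endpoint, repository ID, password, and prefix are placeholder assumptions, not UBC's actual service:

```python
import requests

# Illustrative values: a real integration would use credentials issued
# under the institution's DataCite (e.g. DataCite Canada) agreement.
DATACITE_API = "https://api.test.datacite.org/dois"  # test endpoint
REPO_ID, REPO_PASSWORD = "UBC.EXAMPLE", "secret"     # hypothetical

payload = {
    "data": {
        "type": "dois",
        "attributes": {
            "prefix": "10.80000",  # hypothetical test prefix
            "titles": [{"title": "Example research dataset"}],
            "creators": [{"name": "Doe, Jane"}],
            "publisher": "University of British Columbia",
            "publicationYear": 2024,
            "types": {"resourceTypeGeneral": "Dataset"},
            "url": "https://example.org/dataset/123",
            "event": "publish",  # register the DOI and make it findable
        },
    }
}

resp = requests.post(DATACITE_API, json=payload, auth=(REPO_ID, REPO_PASSWORD))
resp.raise_for_status()
print("Minted DOI:", resp.json()["data"]["id"])
```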
The document discusses digital preservation and file format selection for long-term preservation of digital assets. It notes that file formats can become obsolete over time and presents five criteria for selecting preservation-suitable formats: 1) widespread adoption, 2) lack of technological dependencies, 3) disclosure of specifications, 4) transparency/identifiability, and 5) ability to embed metadata. It also discusses using a "performance model" where the significant properties and essence of a digital object are maintained regardless of file format changes over time. The key recommendation is to select file formats that align with a preservation strategy articulating the repository's purpose and community needs.
Donat Agosti & Norman F. Johnson - Copyright: the new taxonomic impediment (ICZN)
1. Taxonomic publications are considered "legal documents" as they establish nomenclatural decisions under taxonomic codes. As such, everyone should have access to these legally binding documents.
2. Taxonomic descriptions are factual knowledge based on direct observations, so the descriptive parts of publications cannot be copyrighted and should be open access.
3. Publications can be broken down into the basic data elements of individual taxon descriptions, which contain details like descriptions, specimens examined, and characters. These descriptions are the building blocks of taxonomic knowledge.
The Names Project presentation discusses using Names to disambiguate researcher identities and integrate researcher data across different sources. Names extracts data from repositories like EPrints and Zetoc and makes it available through APIs and other standardized formats. Over 30 million researcher records have been made permanent in Names so far. Future work includes processing more data sources, adding more identifiers like ISNI and ORCID, and developing plugins to help repositories integrate with Names.
The document proposes a data model for an MP3/CD music collection database. It lists attributes like file name, genre, artist, album, size and time for music files. An entity-relationship model is designed with a MUSIC FILE entity connected to attributes through a Track_ID primary key. The primary key will uniquely identify each record and allow relationships between database entities, and the database can be expanded through new music files and references to other databases.
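A minimal sketch of that model in SQLite; the table and column names are adapted from the attributes listed above, and the exact schema in the document may differ:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for a persistent DB
conn.execute("""
    CREATE TABLE music_file (
        track_id   INTEGER PRIMARY KEY,  -- uniquely identifies each record
        file_name  TEXT NOT NULL,
        genre      TEXT,
        artist     TEXT,
        album      TEXT,
        size_bytes INTEGER,              -- file size
        time_secs  INTEGER               -- track length
    )
""")
conn.execute(
    "INSERT INTO music_file (file_name, genre, artist, album, size_bytes, time_secs) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("track01.mp3", "Jazz", "Example Artist", "Example Album", 4_200_000, 215),
)
for row in conn.execute("SELECT track_id, file_name, artist FROM music_file"):
    print(row)
```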
How to do things with metadata: From rights statements to speech acts (Richard Urban)
This document discusses metadata rights statements from the perspective of speech act theory. It analyzes a sample of 488 unique rights statements from the Digital Public Library of America and codes them according to Searle's taxonomy of speech acts. The majority were coded as assertives (199) or directives (272) regarding copyright and usage permissions. Other speech acts identified include one commissive and one expressive statement. Non-speech acts (130) were also present. The analysis suggests rights statements communicate different types of speech acts and exploring how to automatically classify them could help improve metadata quality.
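The study itself codes statements manually; as a toy sketch of what automatic classification might start from, a rule-based coder over Searle's categories could look like this. The cue phrases are illustrative guesses, not taken from the study:

```python
# Toy rule-based coder for rights statements, loosely following the
# Searle categories named above. Cue phrases are illustrative only.
DIRECTIVE_CUES = ("may not", "must", "please contact", "permission required")
COMMISSIVE_CUES = ("we will",)
EXPRESSIVE_CUES = ("thanks", "courtesy of")
ASSERTIVE_CUES = ("is in the public domain", "copyright", "all rights reserved")

def code_statement(text: str) -> str:
    t = text.lower()
    if any(cue in t for cue in DIRECTIVE_CUES):
        return "directive"
    if any(cue in t for cue in COMMISSIVE_CUES):
        return "commissive"
    if any(cue in t for cue in EXPRESSIVE_CUES):
        return "expressive"
    if any(cue in t for cue in ASSERTIVE_CUES):
        return "assertive"
    return "non-speech act"

print(code_statement("This item is in the public domain."))               # assertive
print(code_statement("Images may not be reproduced without permission."))  # directive
```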
Online Library and Information Systems: the DLSU Experience (Fe Angela Verzosa)
Presented at a seminar sponsored by the UP Library Science Alumni Association, held at the UP College of Engineering Theater, Diliman, Quezon City, Philippines, on 23 October 1997.
1. The document provides an overview of teaching systems and fundamentals of technology to elementary and middle school students.
2. It outlines topics such as basic computer parts, file types, organizing and storing files, troubleshooting problems, and engagement strategies for teaching these concepts.
3. Key areas covered include basic computer hardware, software, file extensions, setting up file organization systems, defining problems, and common troubleshooting solutions.
The document discusses fundamental file processing operations including opening, closing, reading, writing and seeking files. It defines physical files as those that exist on storage while logical files are how programs view files. Opening a file can create a new file or open an existing one. Closing a file frees it up to be used by another program. Reading and writing are essential I/O operations for file processing. Seeking allows moving to a specific position in a file defined by an offset from the start. Special characters can cause issues when creating file structures.
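These operations map directly onto any standard file API; a minimal Python illustration of open/create, write, seek-by-offset, read, and close:

```python
# Demonstrates the core operations: open/create, write, seek, read, close.
with open("demo.dat", "wb") as f:      # opening can create a new file
    f.write(b"HEADER--RECORD1--RECORD2")

with open("demo.dat", "rb") as f:      # open an existing file for reading
    f.seek(8)                          # move to byte offset 8 from the start
    record = f.read(9)                 # read 9 bytes from that position
    print(record)                      # b'RECORD1--'
# Leaving each 'with' block closes the file, flushing output and
# freeing it for use by other programs.
```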
This document discusses key aspects of building databases to catalog global biodiversity in the 2000s, including standards, technology, data sharing challenges, and classification methods. It covers how database infrastructure requires stable standards and technology to ensure data accessibility over time. Issues around data ownership, privacy, and ensuring data can be shared and reused across disciplines are also addressed. Classification systems are evolving from paper-based to digital formats using tools like cladistics and computer programs to help organize the vast amounts of data being collected through worldwide biodiversity projects.
S. Alvarado revision wk 7 copyright crash course (salvara85)
This document discusses copyright and fair use guidelines for using copyrighted materials. It outlines the differences between implied licenses and express licenses, notes that orphan works lack ownership information, and addresses penalties for copyright infringement. The document also describes the four fair use factors to determine if permission is needed and provides resources for obtaining permission or determining fair use.
This document provides an introduction to files and file systems. It defines what files are, including that they are containers for storing information and come in different types like text, data, binary and graphic files. It outlines key file attributes like name, size, permissions. It also describes different file access methods like sequential, direct/random, and indexed sequential access. File operations like create, write, read, delete and truncate are also covered. The document concludes with definitions of flat file databases and their advantages and disadvantages compared to relational databases.
1) Physical files exist on storage while logical files are how programs view files without knowing the actual physical file.
2) Opening files creates a new file or accesses an existing one, while closing files frees up the file descriptor for another file and ensures all output is written.
3) Core file processing operations include reading, writing, and seeking within a file (see the sketch below).
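A sketch of the physical/logical distinction using OS-level file descriptors: the descriptor is the program's logical view, while the bytes on storage are the physical file.

```python
import os

# A "physical" file exists on storage; the file descriptor below is the
# program's "logical" handle onto it - the program never touches the
# physical layout directly.
fd = os.open("notes.txt", os.O_CREAT | os.O_RDWR)  # open (or create)

os.write(fd, b"first line\nsecond line\n")  # write through the descriptor
os.lseek(fd, 0, os.SEEK_SET)                # seek back to the start
print(os.read(fd, 10))                      # read 10 bytes: b'first line'

os.close(fd)  # closing frees the descriptor and ensures output is written
```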
This document summarizes a presentation on the Hypatia platform, which was developed to help archivists manage, preserve, and provide access to digital archival materials. Key points include:
- Hypatia is an open source software based on Hydra and Fedora that aims to be a repository solution for digital archives.
- It grew out of the Archives Information Management System (AIMS) project and leverages the Hydra framework.
- The presentation covered Hypatia's functional requirements gathering, data models, demonstration of capabilities, and plans for future development and community involvement.
247th ACS Meeting: The Eureka Research Workbench (Stuart Chalk)
Academic scientists need a tool to capture the science they do so that it can be shared as open science, integrated with linked data, and searched. Eureka is an evolving platform to do this.
The document discusses challenges and strategies for digital preservation. It outlines a life cycle approach to digital archiving including metadata, storage, access, and preservation strategies like migration. Examples of digital preservation projects at Rutgers University are provided, such as databases of historical information and digital collections. Ensuring long-term access to digital content requires standards, documentation, addressing technology obsolescence, and establishing trusted digital repositories.
eScience: A Transformed Scientific Method (Duncan Hull)
The document discusses the concept of eScience, which involves synthesizing information technology and science. It explains how science is becoming more data-driven and computational, requiring new tools to manage large amounts of data. It recommends that organizations foster the development of tools to help with data capture, analysis, publication, and access across various scientific disciplines.
This poster presents guidelines for researchers to improve reproducibility in scientific research by better documenting the key entities of research: data, software, workflow, and research output. It recommends documenting data sources and processing steps, writing descriptive code with examples, and using tools like Docker, Jupyter notebooks, LaTeX, and data repositories to capture the experimental environment and research process. Following these guidelines helps researchers communicate and verify their work, allowing others to build on their research findings.
This document summarizes Peter Chan's presentation on accessioning born-digital materials. The presentation covered literature reviews on best practices, putting accessioning in context within Stanford's workflow, and a demonstration of their forensic workflow. The workflow involves surveying collections, creating accession records, photographing media, virus checking, creating disk images, generating summaries, and transferring data to secure storage. Questions from attendees were also taken and a tour of the forensic lab was included.
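The presentation relies on dedicated forensic hardware and tools; purely as an illustration of the imaging-plus-fixity step, here is a minimal Python sketch. The device path and image name are placeholders:

```python
import hashlib

DEVICE = "/dev/sdb"          # placeholder: the write-blocked source media
IMAGE = "accession_0001.dd"  # raw disk image to create
CHUNK = 1024 * 1024

md5 = hashlib.md5()
with open(DEVICE, "rb") as src, open(IMAGE, "wb") as dst:
    while chunk := src.read(CHUNK):   # stream so large media fit in memory
        dst.write(chunk)
        md5.update(chunk)

# Record the checksum alongside the image for later fixity checks.
with open(IMAGE + ".md5", "w") as f:
    f.write(f"{md5.hexdigest()}  {IMAGE}\n")
print("imaged", DEVICE, "->", IMAGE, "md5:", md5.hexdigest())
```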
This document summarizes a presentation about EZID, a service that provides persistent identifiers and supports data citation. It introduces DataCite as an international consortium that develops specifications for data citation. The presentation outlines new features being developed for EZID, including service replicas, URN support, suffix pass-through identifiers, and identifier status indicators. It also discusses the ARK community and governance.
The document discusses the evolution of data storage and retrieval from oral traditions to modern databases integrated with the World Wide Web. It describes how early databases used file-based systems that had limitations in efficiency and usability. The development of relational databases and the ability to dynamically query databases from web servers enabled more powerful data-driven websites and applications. The integration of databases and client-side technologies like Flash further enhanced the interactivity and capabilities of websites and web applications.
SAA Session 502: Born Digital Archives in Collecting Repositories (AIMS_Archives)
Digital archivists from the Universities of Hull (UK), Stanford, and Yale are currently collaborating on an Andrew W. Mellon Foundation-funded project. Born-Digital Collections: An Inter-Institutional Model for Stewardship (AIMS) will produce a common framework for managing born-digital archives. Each digital archivist presents a short case study covering areas of the workflow for electronic records: collection development, accessioning, arrangement and description, and discovery and access.
BP301: Q: What’s Your Second Most Valuable Asset and Nearly Doubles Every Year? (panagenda)
A: Data! But do you know where this data is duplicated, by whom and exactly how it’s scattered across laptops, desktops, file servers and IBM Domino databases?
Let us show you how to analyze local drives, network drives and server based apps to get a grasp of what data is out there and what it means to your business. Learn how to collect, aggregate and analyze file sizes and types, as well as identify knowledge sharing patterns. This session will empower you to work towards reducing your data storage costs and increasing collaboration efficiency!
Linux is a freely distributed open source operating system similar to Unix. It was developed by Linus Torvalds and has become widely used by companies, academics, and individuals due to its free source code and ability to scale across systems. Helix is a Linux distribution tailored for computer forensics that contains tools like Adepto for acquiring forensic images and Autopsy for analyzing the images to extract evidence from investigations.
BioHDF is a project to develop open binary file formats and software tools for managing large-scale genomic data from next-generation DNA sequencing. The project aims to address challenges related to the proliferation of file formats, redundancy of data, and computational overhead by building on the HDF5 data model and libraries. BioHDF will develop models and applications to support primary and secondary data analysis from sequencing, with collaborations planned with software developers and research groups.
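BioHDF's specific schemas are its own; as a generic sketch of the underlying HDF5 data model with h5py (not BioHDF's actual layout):

```python
import h5py
import numpy as np

# Generic HDF5 sketch, not BioHDF's actual schema: one dataset of read
# sequences, one of per-read quality scores, plus file-level metadata.
reads = ["ACGTACGT", "TTGACGTA", "CCATGGAA"]
quals = np.array([[30] * 8, [28] * 8, [35] * 8], dtype=np.uint8)

with h5py.File("reads.h5", "w") as f:
    f.attrs["instrument"] = "example-sequencer"   # embedded metadata
    f.create_dataset("sequences", data=reads, dtype=h5py.string_dtype())
    f.create_dataset("qualities", data=quals, compression="gzip")

with h5py.File("reads.h5", "r") as f:
    print(f["sequences"][0], f["qualities"][0].mean())
```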
This document provides guidance on managing research data. It discusses planning ahead to consider data needs, formats, and volume. It emphasizes organizing data through file naming, metadata, references, email, and remote access. It stresses preserving data by determining what to keep/delete, using long-term storage such as repositories or archives. Finally, it examines reasons to share data such as scientific integrity, funding mandates, and increasing impact and collaboration.
The document provides guidance on early planning for data management, including becoming familiar with funder requirements, planning for the types and formats of data that will be created, designing a system for taking notes, organizing files through consistent naming schemes and use of folders, adding metadata to files to aid in documentation and discovery, and using RSS feeds to organize web-based information. It also touches on issues like plagiarism, data protection, intellectual property rights, and remote access to and backup of data.
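As one illustrative naming scheme of the kind recommended (date, project, description, version); the exact pattern is an assumption, not prescribed by the document:

```python
import datetime
import re

def data_filename(project: str, description: str, version: int, ext: str) -> str:
    """Build a sortable, self-documenting name: YYYY-MM-DD_project_desc_v##.ext"""
    today = datetime.date.today().isoformat()
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"{today}_{project}_{slug}_v{version:02d}.{ext}"

print(data_filename("riverstudy", "Sensor calibration run", 3, "csv"))
# e.g. 2024-05-30_riverstudy_sensor-calibration-run_v03.csv
```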
Watching the Detectives: Using digital forensics techniques to investigate th... (GarethKnight)
This document discusses digital forensics techniques used by law enforcement and researchers. It describes how digital forensics emerged in response to criminal use of electronic devices and emphasizes scientifically valid methods. Key techniques discussed include imaging media to obtain evidence, using hashing to filter known files, and data carving to recover deleted information. Challenges include analyzing increasing digital data and addressing ethical issues when recovering deleted files.
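Hash-based filtering of known files amounts to comparing each file's digest against a reference set; a minimal sketch, where the reference hashes would normally come from a published library such as the NSRL:

```python
import hashlib
from pathlib import Path

# Hypothetical reference set of known-file hashes (e.g. stock OS files),
# normally loaded from a published hash library such as the NSRL.
KNOWN_HASHES = {
    "d41d8cd98f00b204e9800998ecf8427e",  # MD5 of an empty file
}

def md5_of(path: Path) -> str:
    h = hashlib.md5()
    with path.open("rb") as f:
        while chunk := f.read(65536):
            h.update(chunk)
    return h.hexdigest()

# Keep only files NOT in the known set - these merit examination.
for path in Path("evidence_mount").rglob("*"):
    if path.is_file() and md5_of(path) not in KNOWN_HASHES:
        print("unknown file:", path)
```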
AntiForensics - Leveraging OS and File System Artifacts.pdf (ekobelasting)
The document discusses anti-forensics techniques that can be used to hide evidence on a hard drive and frustrate forensic investigations. It covers how tools like file wipers, log injectors, and timestamp manipulators can destroy artifacts and obscure timelines. It also details the operating system and file system artifacts that examiners can analyze, such as Prefetch files, Jump Lists, Volume Shadow Copies, and the MFT, to potentially detect the use of anti-forensics and recover deleted files and events. The document aims to help examiners understand criminal perspectives and common artifacts in order to catch anti-forensics activities.
The document discusses Apache Tika, an open source content analysis and detection toolkit. It provides an overview of Tika's history and capabilities, including MIME type detection, language identification, and metadata extraction. It also describes how NASA uses Tika within its Earth science data systems to process large volumes of scientific data files in formats like HDF and netCDF.
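A small sketch of MIME detection and metadata extraction from Python, assuming the tika bindings (pip install tika, plus a Java runtime) rather than NASA's actual pipeline:

```python
# Sketch using the Python 'tika' bindings, which start and call a local
# Apache Tika server under the hood.
from tika import detector, parser

path = "sample.nc"  # e.g. a netCDF data file

mime_type = detector.from_file(path)  # MIME type detection
parsed = parser.from_file(path)       # metadata + text extraction

print("MIME type:", mime_type)
print("Metadata keys:", sorted(parsed["metadata"].keys())[:10])
print("First 200 chars of content:", (parsed["content"] or "")[:200])
```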
The document discusses the impact of Covid-19 on learning and education, including long-term effects on academic setups due to lack of physical access and digital divides. It also discusses the need for and benefits of institutional repositories to manage and provide access to scholarly works. Key benefits include increased visibility, centralized storage, and supporting learning and teaching. Challenges include difficulties generating content and issues around policies, incentives, and costs. The document then focuses on the open-source DSpace software as a tool for creating institutional repositories, covering its features, requirements, structures, workflows, and examples of existing DSpace-based repositories.
Threats to mobile devices are increasingly prevalent and growing in scope and complexity. Users want to take full advantage of the features available on their devices, but many of those features trade security for convenience and capability. This best practices guide outlines steps users can take to better protect personal devices and information.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor... (SOFTTECHHUB)
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
HCL Notes and Domino License Cost Reduction in the World of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and licensing under the CCB and CCX models have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new kind of licensing works and what benefits it brings you. Above all, you surely want to stay within your budget and save costs wherever possible. We understand that, and we want to help!
We explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove redundant or unused accounts to save money. There are also some approaches that can lead to unnecessary expenses, for example when a person document is used instead of a mail-in for shared mailboxes. We show you such cases and their solutions. And of course we explain the new license model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It gives you the tools and know-how to keep an overview. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
Topics covered:
- Reducing license costs by finding and fixing misconfigurations and redundant accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes and functional/test users
- Practical examples and best practices you can apply immediately
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Infrastructure Challenges in Scaling RAG with Custom AI Models (Zilliz)
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
GraphRAG for Life Science to increase LLM accuracy (Tomaz Bratanic)
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Full-RAG: A modern architecture for hyper-personalization (Zilliz)
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
UiPath Test Automation using UiPath Test Suite series, part 5 (DianaGray10)
Welcome to part 5 of the UiPath Test Automation using UiPath Test Suite series. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of a CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
2. Collections in the late 1990s
- Apple Computer Inc. records
- Douglas Engelbart papers
- Stephen Cabrinety collection
By 2000, over 7,000 items of legacy computer media had been received as part of hybrid collections; now over 26,000 items have been recorded during the accessioning process.
4. First Digital Lives Research Conference: Personal Digital Archives for the 21st Century
5. Hardware: FRED (Forensic Recovery of Evidence Device, from Digital Intelligence). Software: FTK suite (AccessData); EnCase.
6. AIMS (Born-Digital Collections: An Inter-Institutional Model for Stewardship)
- University of Virginia
- Yale University
- Hull University
- Stanford University
Funded by the Andrew W. Mellon Foundation.
7. The four collections:
- Robert Creeley papers
- Stephen Jay Gould papers
- Keith Henson papers (re: Project Xanadu)
- Peter Rutledge Koch papers
9. Stephen Jay Gould. Influential American paleontologist, evolutionary biologist and historian of science, Gould began his career at Harvard University in 1967 and worked until his death in 2002.
- 98 3½-inch floppy diskettes
- 61 5¼-inch floppy diskettes
- 4 sets of punch cards
- 3 computer tapes
10. "Dear Peter, Unfortunately we do not manufacture any motherboards nowadays which can support the 5.25 floppy. The interfaces are different than 3.5, and they are becoming obsolete and are no longer available on the newer motherboards."
38. Email Mining on Peter Koch’s Emails http://suif.stanford.edu/~hangal/muse/
39. What are our Roles?
- Donors & users expect us to acquire, organize, preserve and provide access to b-d collections
- Special Collections staff capture, appraise, arrange and describe b-d materials AND contribute to requirements for both access and delivery as well as arrangement and description tools
- Our digital group will preserve in our preservation repository (SDR) and provide public access and invite participation – Hypatia (under development)
40. Challenges
- Read contents from storage media (punch cards, tapes, 8/5.25/3.5-inch floppy diskettes, Zip disks, etc.)
- View contents in different formats (WordPerfect, Lotus 1-2-3, Quark files, etc.)
- Organize "large" collections (420,000 files, or multiple computers in one collection)
- Long-term preservation (hardware failure, obsolete file formats, unknown future, etc.)
- Wide scope of knowledge needed: computer hardware, operating systems, application, repository and virtualization software, archival processing, security (authentication, encryption), Web 3.0, digital preservation, natural language processing, etc.
- Descriptive standards are in flux
- Accessioning procedures under development
- Delivery options for different formats
[INTRO] A little over two years ago, a few elements converged and a core group of us at Stanford began to get more serious about developing a viable method for processing the born-digital "papers" in our collections. Most of my talk is centered around our first trials, but I'd first like to describe the context and pressures at SUL that put us on this path …
The major pressure was the growing quantity of legacy media in our "backlog". With Stanford situated in Silicon Valley, it's no big surprise that we have a lot of computer collections that contain old legacy media. Hence our acquisition in the late 1990s of the records of Apple Computer Inc., the papers of Douglas Engelbart, and a really large collection of computer games and software. [images: mouse/Engelbart, box of Atari games/Cabrinety]
Because of those very acquisitions, in 1997 the Manuscripts Division began tracking the incoming quantities – just an overall count – of legacy computer media contained in new accessions. By the end of the decade we had recorded over 7,000 "items". Increasingly, our b-d material comes from faculty, artists, writers, and organizations. Today, we have over 26,000 items of legacy media recorded in our backlog. [Univ. Archives has ~700 listed]
The other element was an event in February 2009. A staff member on our digital team (Michael Olson), who had previously worked in Manuscripts, attended the Digital Lives Project's first conference at the British Library. Two things occurred: he heard about a study* done at the B.L. on data loss in legacy computer media (3% per year), and he saw that the B.L. was exploring the use of forensic tools for capturing data from media. Based on this, and coupled with the weight of our growing backlog of media, we decided on two courses of action. *McLeod, Rory. "Risk Assessment: Using a Risk-Based Approach to Prioritise Handheld Digital Information" (2008).
First, we purchased forensic hardware and software to enable us to capture and view legacy media and files: hardware from Digital Intelligence (FRED), and software – we purchased and tested both the FTK and EnCase forensic packages. This formed the nucleus of our digital lab. And yet most forensic equipment is geared toward current/modern media, so we searched eBay for old floppy disk drives to use with FRED.
Next, we partnered with 3 other institutions (U.Va., Yale and Hull) as part of the AIMS Project, funded by the Andrew W. Mellon Foundation (AIMS: Born-Digital Collections: An Inter-Institutional Model for Stewardship). The goals of the project were to process b-d material from 13 (mostly legacy) collections and to deliver the b-d material in some fashion by the end of the grant.
Each repository hired a digital archivist. Peter Chan was hired at SUL in January 2010 and began actual work on imaging the disks for our 4 collections and trying out various methods for "processing" the data. We chose collections that contained different types of media and content:
- Robert Creeley papers (poet – mostly email, some writing)
- Stephen Jay Gould papers (paleontologist, author – writing and some data sets)
- Peter Rutledge Koch papers (fine press printer – a mix of files: email, text, image, and design files like Adobe's InDesign; this was the only collection with files transferred directly from a donor's current computer)
- Xanadu Project records (early hypertext project – software program on 6 hard drives)
It is now 1.5 years later, and we have created a viable (although under constant development) workflow for accessioning and processing b-d materials using forensic tools. This is the more detailed workflow for collections that would be "fully" processed. We are also working on a minimal processing workflow, and on one that would fully "accession" the data – i.e., remove it from the physical storage media – and store it for later processing.
One of the collections that has informed our development of b-d practice is that of Stephen Jay Gould, which contains both paper (analog) – over 500 linear feet – and 3 cartons of digital material:
- 98 3.5-inch floppy disks
- 61 5.25-inch floppy disks
- 4 sets of punch cards
- 3 computer tapes
In total, over 550 linear feet have been received in 8 accessions. His papers and the audio/video are being processed concurrently – by archivist Jenny Johnson – and will be done this August. This month, the processing team discovered another 5 cartons of punch cards in the 2008 accession (21 sets). [This recent find won't be resolved by the end of the grant.]
Using the two different capture stations – FRED and the floppy/zip station – we created disk images of all the disks. 8 sets of punch cards were successfully read by our neighbors at the Computer History Museum; 1 set was unreadable, as it had no sorting key. We also began tracking our own loss statistics – the "success or failure" of captures – in a spreadsheet, which we link to our accession records in Archivists' Toolkit. The loss rate for floppies in Gould is 5%; loss in other collections was higher.
- Creeley: 6% loss (1 out of 12 CDs unreadable; 3 out of 53 floppies unreadable) [1987-2004?]
- Xanadu: 4 of 6 hard drives inoperable, or 67% damaged. [Peter Chan's report: there were mechanical or electrical problems with two of the drives (one didn't spin after it was powered up, and one gave a "dong" sound after it was powered up). We are not sure what the problems with the remaining two drives are – they do spin after power up, but we cannot access the data.] Cost to recover: ~$10,000 ($2.5K/drive).
To process the materials during our initial trial, we used Windows Explorer. Folders were created that mirrored "series" and "titles" in EAD, and files were moved from the original media folder into the appropriate place. This, however, changed data associated with the files – such as the original path. At this point, Peter Chan attended a week-long session on the use of forensic software at Digital Intelligence, focusing on FTK. While it is much more robust than we needed for archival work, he decided that many of the tools in FTK could easily be adapted for archival processing. We discovered that this practice mirrored work beginning at both the BL and Oxford.
Technical metadata for the disk images are displayed here, arranged by floppy disk and showing file format (where identifiable), file size, checksum, creation dates, etc. One can change the view to add additional columns, such as duplicate or primary file.
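For context, roughly the same technical metadata (path, size, checksum, timestamps) can be approximated outside FTK with the Python standard library. A sketch, with the mount point of a disk image as a placeholder:

```python
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path

MOUNT = Path("/mnt/floppy_image")  # placeholder: a mounted disk image

with open("tech_metadata.csv", "w", newline="") as out:
    w = csv.writer(out)
    w.writerow(["path", "size_bytes", "md5", "modified_utc"])
    for p in MOUNT.rglob("*"):
        if p.is_file():
            st = p.stat()
            # read_bytes() is fine at floppy scale; stream larger files
            md5 = hashlib.md5(p.read_bytes()).hexdigest()
            mtime = datetime.fromtimestamp(st.st_mtime, tz=timezone.utc)
            w.writerow([str(p), st.st_size, md5, mtime.isoformat()])
```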
The embedded viewer in FTK – from the same company that does Quick View Plus – allows you to quickly see the contents of many of the files
Here are two quick screen shots showing archival HIERARCHY using FTK's "bookmark" feature. Series or subseries can be added as metadata to individual or groups of files by highlighting or checking the boxes of the files in the lower panel.
Description for the three different formats in Gould – paper, audio/video, and born-digital files – will be merged at the end of the summer or early fall, but the level of description will be different:
- Gould's papers are processed to the folder level for most of the collection
- The audio and video are listed at the item level to facilitate any future digitization
- The born-digital material will have series-level description, with notes about original media, capture and processing methods, loss/damaged media, and delivery methods
Here is a partial view of our working draft for processing notes for Gould b-d “series”
We encountered different issues in our other AIMS collections; the main one I will mention is the Robert Creeley collection. His papers originally contained 53 floppies, 5 zip disks, and 3 CDs. Initially the computer media was segregated into a separate collection, but it will need to be merged into the main collection record and finding aid in the fall. After processing with FTK, the disk images garnered:
- 50K emails identified
- 8 files related to health records identified
- 69 files with SS# identified
A recent addendum complicates the processing of Creeley's born-digital material: material received in May 2011 containing b-d media will need to be processed, and may allow us to have a more complete set of emails, drafts, etc.:
- 7 computers
- 3 zip drives
- 121 optical discs
- 422 3.5-inch floppy diskettes
- 1 Zip 250 USB drive
- 1 Olympus C-4000 Camedia digital camera & flash cards
- 1 20-gigabyte iPod
We have yet to analyze the data in the new accession and compare it to the original data, but two issues cropped up:
- How to process and deliver multiple computers over a creator's life cycle
- Data was captured from various CDs and computers to create an overview of the b-d material before transfer to SUL – what got changed in the process?!
[image from Wikipedia taken by Elsa Dorfman]
In processing the initial computer media, Peter Chan used folder titles on the disks as keywords for files.
Using Creeley's initial text data, we have worked with two individuals – one working in the Digital Humanities (Elijah Meeks) – who took the header info from the 50K emails and created a network graph: header information from Robert Creeley's 50,000+ emails, emphasizing the connection between the poet and Gerard Malanga.
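A minimal sketch of the kind of graph construction involved, not Elijah Meeks's actual workflow: weight an edge between correspondents for each From/To pair in the headers.

```python
# Illustrative correspondence graph built from parsed email headers.
import networkx as nx

headers = [  # (from, to) pairs parsed from message headers
    ("Robert Creeley", "Gerard Malanga"),
    ("Gerard Malanga", "Robert Creeley"),
    ("Robert Creeley", "Another Poet"),
]

G = nx.Graph()
for sender, recipient in headers:
    if G.has_edge(sender, recipient):
        G[sender][recipient]["weight"] += 1   # repeated exchanges add weight
    else:
        G.add_edge(sender, recipient, weight=1)

# Strongest connections to the poet:
print(sorted(G["Robert Creeley"].items(),
             key=lambda kv: kv[1]["weight"], reverse=True))
```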
To wrap up:
- Donors & users expect us to acquire, organize, preserve and provide access to b-d collections
- Special Collections staff capture, appraise, arrange and describe b-d materials AND contribute to requirements for both access and delivery as well as arrangement and description tools
- Our digital group will preserve in our preservation repository (SDR) and provide public access and invite participation – Hypatia (under development)