Everyone's A Mechanic


Presentation delivered at the 2013 Annual Meeting of the Midwest Archives Conference.

  • You've heard from Cate about getting the records into your repository, so I'm going to take some time to talk about what you do with them once you've got them. Before we get to that, however, it's worth reconsidering why keeping records in electronic format is valuable, and what some of the major challenges might be.
  • Now, of course, we'd all like to have the Ferrari version of an e-records system, with all the bells and whistles that implies: Trusted Digital Repository certification, automatic monitoring for bit rot, an integrated access system, and everything else we think about when we think about the ideal digital preservation system. For most of us, however, our e-records process looks a bit more like this:
  • Now, nobody goes out and buys this car. (Well, OK, maybe if you're a teenager and need your first set of wheels.) But this car sort of sneaks up on you-- you ignore the check engine light here, you don't fix the rust spots there, and pretty soon your bumper is falling off. It's the same thing with digital preservation-- it's easy to put the stuff into the backlog until the day you're ready to do the stuff 'right'-- but all the while, you're moving farther and farther away from the original hardware and software environment, your files are degrading, and soon enough you get to the point where your e-records are irreparably damaged-- or unreadable altogether.
  • Again: you cannot keep putting this off unless you want the digital-file equivalent of the car from slide 4. Eventually, you need something that gets you from point A to point B, even if you think it's ungainly or awkward. This isn't to say that you should give up on the Ferrari digipres environment altogether. In the meantime, while you're waiting for the funding or staffing to materialize, you can treat the best qualities of that system as goals to aspire to.
  • Ultimately, you are not doing anything different with e-records than you would do with analog records. There are really only two main differences: one, you have the potential to be able to provide a level of descriptive granularity that previous generations of users would have killed for; and two, it is much, much easier to lose the records to preservation problems if you aren't paying attention. Hopefully, the tools I'm going to go over will help you leverage the first and avoid the second.
  • So, this was our first “crossroads” point: in the absence of a dedicated repository, we had to decide what exactly we were trying to accomplish by putting these records through their paces. Ultimately, we realized that collecting this metadata served a couple of purposes. Short-term, we wanted to provide the basic technical data to allow future-us to determine what these files were and how to access them. Long-term, we hope to use the metadata we're generating to facilitate ingest when we DO get a repository. With that in mind, we started testing. And testing. And testing.
  • Because ultimately, if you're looking for a Swiss Army knife that handles every conceivable file format, you're probably looking in vain. In our case, we have mostly been dealing with documents, presentation files, images, and a few audio files. You may have different needs, some of which you don't even know about yet, which is why it is so important to know your own collection and be familiar with the tool options before you get started.
  • We found Duke Data Accessioner useful both because it generated checksums before and after ingest and because it maintained a record of the file structure of the original documents. The XML it generated gave us a nice seed document for PREMIS preservation metadata, and the tool's migration feature allowed us to easily create working copies, which I'll talk about in a minute.
  • This is an example of DROID in action. I've highlighted a file that has been flagged because its extension doesn't match its contents: in this case, a file with a Word .doc extension that has the characteristics of a WordPerfect document. The author probably tried to convert it by changing the file extension, which doesn't really work at all (more on this in a bit). What this allows us to do is go to that file, do a quick appraisal, and either fix or weed it, depending on its importance.
  • So, to go back to my car metaphor, everything you've done so far has been to the original, which here represents the car body, engine, transmission, axles, etc. Now we're going to create a working copy, which allows you to preserve the authenticity of the originals while improving accessibility on the user end. Think of this as “tricking out” your car: adding a spoiler, decals, a giant bass speaker in the trunk, etc. Had I thought of this metaphor earlier than last night, I would have put a photo of the “Pimp My Ride” guy here, but nobody's perfect.
  • So, as noted, because you're keeping the SIP as your authenticity copy, you have a little more leeway to rearrange files to reflect logical order. This is especially important where no apparent order exists, such as accessions where someone just threw stuff onto disks as quickly as possible. One aspect that can get overlooked during this kind of MPLP processing is sensitive information. Luckily, most of what you'll want to restrict follows specific patterns, such as Social Security and credit card numbers, which is where programs like Firefly come in.
  • Here's an example of FreeCommander in use during the arrangement process. In this case, I'm using it to show a before-and-after folder-renaming process, but as you can see, it greatly increases your ability to move files around without clicking through eight separate windows. There's also a compare feature to make sure that all relevant files have been moved from source to destination, and you can get more in-depth information about the files from the details list, which is another quick way to appraise and/or describe the accession.
  • Now, file-level description is useful for searching across numerous documents on known quantities such as author, date, and subject, but you also don't want to spend too much time on it. We've used a couple of tools here to increase the discoverability of files within our collection. EXIFTool is very good at pulling out embedded metadata from documents and common application files, especially photos. It is, however, only as good as the metadata your creators put into it, so this is an opportunity for records management to shine. We also found ReNamer very useful for mass-renaming folders and files to include more descriptive info. Authenticity!
  • In most cases, I prefer tools that export to XML to facilitate later ingest into a repository, but since EXIFTool spits out CSV files, we were able to use that to provide some rudimentary access to collections. This is a simple flat spreadsheet, but you can see that it allows you to quickly full-text search by name, filter by date, and sort by author, title, keywords, etc. If you have any facility with database design, you can do a lot more with this, such as grouping by folder. You also have the opportunity to massage some of the metadata en masse at this step; for example, I might change ces5 to Carlos Santiago for clarity in the access table.
  • In a way, everything you've been doing up to this point is leading up to preservation, because all that metadata you've collected gives you a record of what your files are and what hardware and software you need to open them. If digital archeology isn't your thing, however, you might want to look into normalizing files to preservation file formats. We used the National Archives of Australia's Digital Preservation Software Platform (DPSP) because it provides a sort of all-in-one service: it normalizes files, keeps the original bitstreams in XML wrappers, moves the files to the location of your choice, and, importantly, automatically documents every action it takes.
  • This is an example of an Archival Information Package generated by XENA, which is the normalization component of the DPSP. In this case, XENA is able to render the original file via one of its embedded viewers, but it also provides the XML technical metadata wrapper so that the file will self-document in case the XENA viewer is unavailable. The thing I like about XENA in particular is that you can export both the XML and the normalized and original copies from the XENA interface, which improves the longevity of the AIP even further.
  • Ultimately, whether you use all or none of the tools just described, there are going to be a few things you will always want to do with your electronic records during processing to make sure they have been dealt with properly. It may seem like you're keeping a lot of copies of the same material, and you're right-- but that's really by design. Especially if you follow the advice on the preservation slide and keep the copies in separate locations, you are going to lessen your chance for loss or failure of the files in question. Think of it as insurance for that car you just spent all that money on tricking out.
  • In the end, the “car” you're going to end up with is going to be a lot like my trusty 2000 Mazda Protege. This car had no pickup, the radio had been ripped out, there was a giant dent in the side, the AC didn't work... but crucially, I kept the systems that kept it actually running up to date, and so regardless of its lack of creature comforts, I was usually able to get from Point A to Point B. So it is with processing electronic records-- as long as you are doing your basic maintenance, you will be able to keep your e-recs going much longer than you would have otherwise, even if that Ferrari is still somewhat down the road.
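The notes credit Duke Data Accessioner with two things: checksums taken before and after transfer, and an XML record of the original folder structure that can seed PREMIS metadata. The sketch below is not the Accessioner's code or output format; it is a minimal Python illustration of the same idea, assuming SHA-256 as the checksum algorithm. The element and attribute names (`accession`, `folder`, `file`, `sha256`) are hypothetical and do not follow the PREMIS schema.

```python
import hashlib
import os
import xml.etree.ElementTree as ET


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root_dir):
    """Return an ElementTree that mirrors root_dir's folders and files.

    Each <file> element records the file's size in bytes and its SHA-256
    digest. Element and attribute names are illustrative, not PREMIS.
    """
    name = os.path.basename(os.path.abspath(root_dir))
    accession = ET.Element("accession", source=name)
    # Map each directory path to the XML element that represents it.
    elements = {root_dir: accession}
    for current, dirnames, filenames in os.walk(root_dir):
        parent = elements[current]
        for dirname in sorted(dirnames):
            path = os.path.join(current, dirname)
            elements[path] = ET.SubElement(parent, "folder", name=dirname)
        for filename in sorted(filenames):
            path = os.path.join(current, filename)
            ET.SubElement(
                parent,
                "file",
                name=filename,
                size=str(os.path.getsize(path)),
                sha256=sha256_of(path),
            )
    return ET.ElementTree(accession)
```

Assuming an accession folder named `chancellor_2013_001` (a hypothetical name), `build_manifest("chancellor_2013_001").write("manifest.xml", encoding="utf-8", xml_declaration=True)` would save the record. Running it again after transfer and comparing the digests gives the "before and after" check.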
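DROID identifies formats by matching a file's internal byte signatures against the PRONOM registry, which is how it can catch a WordPerfect file carrying a .doc extension. As a rough illustration of that idea only, the sketch below checks the first bytes of each file against a tiny, hand-picked signature table and reports files whose extension disagrees. The table is an assumption covering four well-known signatures; it is no substitute for running DROID or JHOVE.

```python
import os

# A tiny, hand-picked subset of well-known file signatures ("magic numbers").
# DROID uses the much larger PRONOM registry; this table is illustrative only.
SIGNATURES = [
    (b"\xffWPC", "WordPerfect document", {".wpd", ".wp", ".wp5", ".wp6"}),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. Word 97-2003)", {".doc", ".xls", ".ppt"}),
    (b"%PDF-", "PDF document", {".pdf"}),
    (b"PK\x03\x04", "ZIP container (e.g. .docx/.xlsx/.pptx)", {".zip", ".docx", ".xlsx", ".pptx"}),
]


def identify(path):
    """Return (format_name, extension_matches), or (None, None) if no signature matches."""
    with open(path, "rb") as handle:
        header = handle.read(16)
    extension = os.path.splitext(path)[1].lower()
    for magic, name, extensions in SIGNATURES:
        if header.startswith(magic):
            return name, extension in extensions
    return None, None


def find_mismatches(root_dir):
    """List (path, detected_format) for files whose extension disagrees with their signature."""
    mismatches = []
    for current, _dirs, filenames in os.walk(root_dir):
        for name in filenames:
            path = os.path.join(current, name)
            fmt, matches = identify(path)
            if fmt is not None and not matches:
                mismatches.append((path, fmt))
    return sorted(mismatches)
```

Files the table doesn't recognize are simply skipped here, which is one reason a real format registry matters.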
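For the kind of quick collection overview the DROID report provides, and as a starting point for the `<phystech>` note recommended on the collection-level description slide, even a count of files by extension can help. This sketch assumes extensions are trustworthy, which the DROID example shows they are not, so treat its counts as a rough first pass. The wording of the drafted note is a hypothetical template; the hardware, OS, and software details still have to be added by hand.

```python
import os
from collections import Counter


def format_profile(root_dir):
    """Count files by lower-cased extension; files without one count as "(none)"."""
    counts = Counter()
    for _current, _dirs, filenames in os.walk(root_dir):
        for name in filenames:
            extension = os.path.splitext(name)[1].lower() or "(none)"
            counts[extension] += 1
    return counts


def phystech_note(counts):
    """Draft a plain-language sentence listing extensions, most common first."""
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    parts = [f"{extension} ({number})" for extension, number in ordered]
    return "Born-digital files, by extension: " + ", ".join(parts) + "."
```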
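The working-copy step can also be scripted. The following is a minimal sketch, assuming the original accession sits in an ordinary folder: it copies everything to a new working location and then clears the write-permission bits on the original files. That is only a light safeguard (an administrator can still modify the files, and folders are left writable); for stronger fixity, the disk images and digital forensics tools mentioned on the working copy slide are the better option. The function name `make_working_copy` is hypothetical.

```python
import os
import shutil
import stat


def make_working_copy(original_dir, working_dir):
    """Copy an accession to working_dir, then clear write bits on the original files.

    working_dir must not exist yet. Only files are made read-only; folders
    stay writable. This is a light safeguard, not a write blocker.
    """
    # copytree uses shutil.copy2 by default, which also copies timestamps.
    shutil.copytree(original_dir, working_dir)
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    for current, _dirs, filenames in os.walk(original_dir):
        for name in filenames:
            path = os.path.join(current, name)
            mode = os.stat(path).st_mode
            os.chmod(path, mode & ~write_bits)
    return working_dir
```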
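The notes describe Firefly as useful because sensitive information tends to follow predictable patterns. Firefly's actual rules aren't described here, so the sketch below only illustrates the general technique: a regular expression for US Social Security numbers written as ###-##-####, and a looser pattern for 13- to 16-digit card numbers that are then filtered with the Luhn checksum to cut false positives. Both patterns are assumptions, and, as the slide says, a scan like this supplements manual inspection rather than replacing it.

```python
import re

# Hypothetical patterns; real tools use richer rules and more formats.
# SSN: ###-##-####, excluding area numbers 000, 666, and 9xx.
SSN_PATTERN = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")
# Card: 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_valid(number):
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def scan_text(text):
    """Return (kind, matched_text) tuples for suspected sensitive numbers."""
    hits = [("ssn", m.group()) for m in SSN_PATTERN.finditer(text)]
    for m in CARD_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if 13 <= len(digits) <= 16 and luhn_valid(digits):
            hits.append(("card", m.group()))
    return hits
```

For example, in the string "card 4111 1111 1111 1111, order 1234 5678 9012 3456" only the first number is reported, because the second fails the Luhn check.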
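ReNamer applies user-defined renaming rules through a graphical interface; the function below is a hypothetical Python equivalent of one such rule set. It assumes you want ASCII-only names, underscores instead of spaces, no punctuation beyond `.`, `_`, and `-`, lower-case extensions, and an optional prefix such as a date taken from embedded metadata. Run anything like this on the working copy only, since renaming the originals would undercut their authenticity.

```python
import re
import unicodedata


def standardize_filename(filename, prefix=None):
    """Return a normalized filename (hypothetical rules, not ReNamer's defaults)."""
    stem, dot, extension = filename.rpartition(".")
    if not dot:  # no extension present
        stem, extension = filename, ""
    # Decompose accented characters, then drop anything that isn't ASCII.
    stem = unicodedata.normalize("NFKD", stem).encode("ascii", "ignore").decode("ascii")
    stem = re.sub(r"\s+", "_", stem.strip())       # whitespace runs become "_"
    stem = re.sub(r"[^A-Za-z0-9_.-]", "", stem)    # strip special characters
    stem = re.sub(r"_+", "_", stem)                # collapse repeated underscores
    if prefix:
        stem = f"{prefix}_{stem}"
    return f"{stem}.{extension.lower()}" if extension else stem
```

With these rules, `standardize_filename("Budget Memo (FINAL) #2.DOC")` returns `"Budget_Memo_FINAL_2.doc"`. Applying it to a whole folder means walking the working copy with `os.walk` and calling `os.rename`, after checking that no two files map to the same new name.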
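The "massage some of the metadata en masse" step lends itself to a small script as well. ExifTool can write a CSV file with one row per file (for example, `exiftool -csv -r <folder> > metadata.csv`); the sketch below then replaces login IDs with full names in one column. The column name `Author` and the `CREATOR_NAMES` lookup table are assumptions: which tag holds the creator depends on the file formats in your accession, and the only mapping taken from the talk is ces5 to Carlos Santiago.

```python
import csv
import io

# Hypothetical lookup table of login IDs to display names.
CREATOR_NAMES = {"ces5": "Carlos Santiago"}


def clean_metadata(csv_text, column="Author", names=CREATOR_NAMES):
    """Return CSV text with known login IDs in `column` replaced by full names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    output = io.StringIO()
    writer = csv.DictWriter(output, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        value = row.get(column)
        if value in names:
            row[column] = names[value]
        writer.writerow(row)  # other rows and columns pass through unchanged
    return output.getvalue()
```

Keeping the mapping in one table, rather than editing cells by hand, also documents exactly what was changed in the access copy.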
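The preservation slide recommends two onsite copies and one offsite copy, and the closing slide calls for a monitoring plan. One way to monitor those copies is a periodic fixity audit: recompute checksums for each copy and compare them against a reference. In the sketch below, SHA-256 and the shape of the report are assumptions, and `audit_copies` is a hypothetical name; in practice you might compare against the checksums recorded at accession rather than against another live copy.

```python
import hashlib
import os


def sha256_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_tree(root_dir):
    """Map each file's path relative to root_dir (using "/") to its digest."""
    sums = {}
    for current, _dirs, filenames in os.walk(root_dir):
        for name in filenames:
            path = os.path.join(current, name)
            relative = os.path.relpath(path, root_dir).replace(os.sep, "/")
            sums[relative] = sha256_file(path)
    return sums


def audit_copies(reference_dir, copy_dirs):
    """Report missing, extra, and changed files for each copy versus the reference."""
    expected = checksum_tree(reference_dir)
    report = {}
    for copy_dir in copy_dirs:
        actual = checksum_tree(copy_dir)
        shared = expected.keys() & actual.keys()
        report[copy_dir] = {
            "missing": sorted(expected.keys() - actual.keys()),
            "extra": sorted(actual.keys() - expected.keys()),
            "changed": sorted(p for p in shared if expected[p] != actual[p]),
        }
    return report
```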

    1. Everyone's A Mechanic: Building a Simple E-records Workflow
       Brad Houston, University of Wisconsin-Milwaukee, April 20, 2013
    2. Are e-records worth it? Well, *I* think so...
       • Improved access to content
       • More information about context
       • Increased manipulability for research analysis
       BUT...
       • Sheer volume increases opacity
       • Digital preservation and "dark ages"
       • Time and money for new systems (?)
    3. Where would we like to be with e-records? (Image source: JohnVW on Flickr)
    4. Where are we right now? (Image source: an0nym0n0us on Flickr)
    5. The Mechanic Metaphor
       Borrowed from a Helen Tibbo talk at Purdue University, September 2012
       • “In the early days of the automobile, everyone was a mechanic.”
       • The scary implication: you have to know enough about your process to fix things
       • The exciting implication: you can make it as simple or as complex as you need
    6. What's standing in our way?
       • Uncertainty re: appropriate procedures
       • Unfamiliarity with e-records tools and systems
       • Unfamiliarity with e-records as a medium
       • Perceived complexity of metadata and/or preservation requirements/systems
       The common thread: letting the perfect be the enemy of the good!
    7. Dirty Little Secret about e-records: You already know how to do this!
       • Accessioning: gain intellectual/physical control, identify potential problems
       • Arrangement: put files in series/other logical order
       • Description: provide access at various levels of the collection
       • Preservation: ensure the ongoing integrity/usability of the materials
    8. The Local Catalyst/Example
       • Office of the Chancellor
         - Records from personal and office computers
         - Various file types and formats
         - Some pre-appraisal by office staff and archivists
         - Large volume of files: automation a must
       • The desired end product: an AIP!
         - Short-term: provide basic description and preservation metadata
         - Long-term: prepare for ingest into future repository
    9. The following worked for us, but…
       • The tools that work for us may not be the best for *your* needs!
         - Take stock of your own e-records holdings
         - Browse tool catalogs/reviews and experiment
         - Document what you did to your files and why
       • The “Chewing Gum/Baling Wire” approach
         - Different tools gather complementary data
         - …but check outputs for redundancies
         - One tool’s failure won’t bring down the whole thing (probably)
    10. Accessioning and Pre-appraisal
        Goals: establish authenticity, perform collection overview and QC
        • Duke Data Accessioner
          - Quick and easy checksum generation
          - Basic technical metadata for PREMIS (XML format)
        • DROID and/or JHOVE
          - File format identification and validation
          - DROID reporting gives overview of collection
          - Initial triaging for preservation?
    11. The Working Copy Watershed
        • From this step on, minimize changes to your originals:
          - Create a working copy for weeding/arrangement
          - Write-protect originals!
        • Creating a disk image (*.iso, *.uif, etc.) may be helpful for preserving fixity here
          - See also digital forensics tools for increased authenticity
    12. Appraisal and Arrangement
        Goals: move files to reflect logical order, identify/restrict confidential info
        • FreeCommander (and family)
          - Two-pane browsing: easy arrangement
          - Integrated viewer for quick appraisal
        • Firefly SSN Finder
          - Identifies Social Security numbers, credit card numbers, other sensitive info
          - Supplements, not replaces, manual inspection
    13. Description: File Level
        Goals: extract technical/descriptive metadata automatically; improve discovery
        • EXIFTool
          - Pulls embedded metadata from files (esp. photos)
          - Exports data into CSV for tabular description
        • ReNamer
          - Standardize file names, strip special characters
          - Option to add embedded metadata to filenames
        n.b. Automation is especially key for this step. (Think MPLP!)
    14. File-Level Description Table
    15. Description: Collection Level
        Goals: discovery of collection as a whole
        • No special tools necessary: describe as you would paper records!
        • That said, a few EAD considerations...
          - <phystech> should include hardware, OS, and software needed to render all formats
          - Describe at series/folder, not file, level
          - Consider IP and/or confidentiality issues if including digital object links
    16. Preservation
        Goal: avoid obsolescence and/or technology failure
        • Digital Preservation Software Platform
          - Normalizes files to preservation formats
          - Logs every preservation action taken
          - n.b. Use this *in addition* to other metadata gathered.
        • File Storage Location
          - Use stable media or network storage (backed up), i.e. *not* CDs, floppies, etc.
          - Best practice: 2 onsite copies, 1 offsite copy
    17. XENA AIP Example
    18. Putting it all together
        • Determine needed metadata for preservation/access
          - Delete extraneous output
        • Collocate if you can (XSLT, etc.); document what's where if you can't
        • Provide access to as much or as little metadata as you need
        • Keep the originals for authenticity
          - Access copies for everyday usage
    19. And don’t forget…
        Digital preservation, like car maintenance, is an ongoing process.
        • Know your collections
        • Have a monitoring plan
        • Keep up with best practices, discussions
        • Don't give up! Anything you can do is helpful.
        (Photo caption: "I hated this car… but it got me to my destination. (Most of the time.)")
    20. Resources
        • AIMS white paper on Born-Digital Records: http://www.digitalcurationservices.org/aims/white-paper/
        • Digital Curation Centre (UK): http://www.dcc.ac.uk/
        • The Signal (LOC Digital Preservation blog): http://blogs.loc.gov/digitalpreservation/
        • Digital Curation Exchange: http://digitalcurationexchange.org/
        • Practical E-records
    21. Thank You
        Brad Houston, University Records Archivist
        University of Wisconsin-Milwaukee Libraries, Archives Dept.
        houstobn@uwm.edu
        This presentation is available for download at: https://www.box.com/s/vx22f1jus8821d20zy5t