Hi folks! I know it’s really early, so I’m going to see if I can help wake you up. We’re all struggling with budget cuts, far too much work, and inadequate staffing. Yet our mandate is to get as much of our content online as possible. How do we do that? Most digitization depends on costly item-level description for search and access. Sometimes, particularly with large manuscript collections, that just is not feasible.
We’ve all heard about using EADs as an interface to digitized content. I expect you have several questions about this approach:
* How difficult is it?
* What does it look like?
* How long does it take?
* How much does it cost?
* How effective is it?
* What’s missing?
I’m going to take a shot at answering these for you. At the University of Alabama, we developed methods and software to make it cheap and easy. Our model was developed for the NHPRC-funded grant “Digitizing the Septimus D. Cabaniss Papers,” and it can be used regardless of your EAD delivery method.
We had already streamlined our digitization methods: we have standardized identifiers, a standard directory organization and naming conventions, automated processes, and a great delivery system (Acumen) that automatically indexes whatever we put in a web directory. For mass digitization, we added the box and folder numbers to the file names so we could automate linking into the proper place in the EAD. Boxes were retrieved from the reading room, and materials were digitized in the order encountered in the folders. Each file name includes the box and folder number as well as the sequence for delivery. When a batch of content was ready for upload, we ran a script to generate derivatives for delivery and to upload TIFFs to the archive and JPEGs to the server.
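Because the box, folder, and sequence are all carried in the file name, grouping scans into folders can be fully automated. The talk does not spell out the project’s actual naming convention, so the pattern below (`<collection>_b<box>_f<folder>_<sequence>.tif`) and the helper names are hypothetical; this is a minimal sketch of the idea, not the project’s script.

```python
import re
from collections import defaultdict

# Hypothetical naming convention (an assumption for illustration only):
#   <collection>_b<box>_f<folder>_<sequence>.tif
# e.g. "cabaniss_b012_f003_0004.tif" = box 12, folder 3, 4th in delivery order.
FILENAME_RE = re.compile(
    r"^(?P<collection>[a-z0-9]+)_b(?P<box>\d+)_f(?P<folder>\d+)_(?P<seq>\d+)\.tiff?$",
    re.IGNORECASE,
)


def parse_scan_name(filename):
    """Return (box, folder, sequence) parsed from a scan's file name."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return int(match["box"]), int(match["folder"]), int(match["seq"])


def group_by_folder(filenames):
    """Map (box, folder) -> file names, ordered by delivery sequence."""
    folders = defaultdict(list)
    for name in filenames:
        box, folder, seq = parse_scan_name(name)
        folders[(box, folder)].append((seq, name))
    # Sort folders numerically, and items within each folder by sequence.
    return {key: [name for _, name in sorted(items)]
            for key, items in sorted(folders.items())}


if __name__ == "__main__":
    scans = [
        "cabaniss_b012_f003_0002.tif",
        "cabaniss_b012_f003_0001.tif",
        "cabaniss_b001_f001_0001.tif",
    ]
    for (box, folder), names in group_by_folder(scans).items():
        print(f"Box {box}, Folder {folder}: {names}")
```

Parsing the numbers as integers (rather than sorting the raw strings) keeps box 12 after box 2 even if a file name ever lacks zero padding.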
We already knew the collection name, the rights, the repository information, and the link to the collection. Using this and what we could capture from the file names, we generated simple MODS records for each item and automated the linking into the EAD. So now we have a stub record for each item that can be improved in the future, either by crowdsourcing or, if funds permit, by metadata remediation of selected content.
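A stub record of this kind needs only collection-level facts plus the box and folder recovered from the file name. The sketch below builds one with Python’s standard `xml.etree.ElementTree` using real MODS elements (`titleInfo`, `identifier`, `location`, `accessCondition`, `relatedItem type="host"`); the function name, the choice of elements, and all sample values are assumptions for illustration, not the project’s actual record template.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag):
    """Qualify a tag name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def stub_mods(item_id, box, folder, collection_title, collection_url,
              repository, rights):
    """Build a minimal (hypothetical) MODS stub from collection-level facts
    plus the box/folder recovered from the file name."""
    mods = ET.Element(q("mods"), {"version": "3.4"})

    title_info = ET.SubElement(mods, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = (
        f"{collection_title}, Box {box}, Folder {folder}")

    ET.SubElement(mods, q("identifier"), {"type": "local"}).text = item_id

    location = ET.SubElement(mods, q("location"))
    ET.SubElement(location, q("physicalLocation")).text = repository

    ET.SubElement(mods, q("accessCondition"),
                  {"type": "use and reproduction"}).text = rights

    # Point the item back at its parent collection.
    host = ET.SubElement(mods, q("relatedItem"), {"type": "host"})
    host_title = ET.SubElement(host, q("titleInfo"))
    ET.SubElement(host_title, q("title")).text = collection_title
    host_location = ET.SubElement(host, q("location"))
    ET.SubElement(host_location, q("url")).text = collection_url

    return ET.tostring(mods, encoding="unicode")


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(stub_mods(
        item_id="cabaniss_b012_f003_0001",
        box=12, folder=3,
        collection_title="Septimus D. Cabaniss Papers",
        collection_url="https://example.org/cabaniss",
        repository="Example Repository",
        rights="Example rights statement",
    ))
```

Every field here is either shared by the whole collection or derived mechanically, which is why the record costs nothing in metadata librarian time and can still be enriched later.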
Another script distributes the MODS records, JPEGs, and modified EAD into the web directory, where our delivery software (Acumen) automatically indexes them. Here you can see a list of links to specific items within a folder, delivered in the order a researcher would encounter the items in the reading room.
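One way to picture the “modified EAD” is a script that finds each folder-level component by its box and folder containers and appends a digital object link per item, in delivery order. This sketch assumes EAD 2002 DTD-style markup without a namespace, box and folder recorded as `<container type="box">` and `<container type="folder">`, and a hypothetical `base_url` for item pages; it is not the project’s own linking code.

```python
import re
import xml.etree.ElementTree as ET

# Matches EAD 2002 component tags: <c> and numbered <c01> ... <c12>.
COMPONENT_TAG = re.compile(r"^c(0[1-9]|1[0-2])?$")


def container_key(component):
    """Return (box, folder) from a component's <did>/<container> elements,
    or None if either is missing or not numeric."""
    did = component.find("did")
    if did is None:
        return None
    values = {c.get("type", "").lower(): (c.text or "").strip()
              for c in did.findall("container")}
    if "box" not in values or "folder" not in values:
        return None
    try:
        return int(values["box"]), int(values["folder"])
    except ValueError:
        return None


def link_items(ead_root, items_by_folder, base_url):
    """Append one <dao> per digitized item to the matching folder's <did>.

    items_by_folder maps (box, folder) -> item identifiers in delivery order.
    Returns the number of links added.
    """
    # Collect components first so the tree is not modified mid-iteration.
    components = [el for el in ead_root.iter()
                  if isinstance(el.tag, str) and COMPONENT_TAG.match(el.tag)]
    linked = 0
    for component in components:
        key = container_key(component)
        if key not in items_by_folder:
            continue
        did = component.find("did")
        for position, item_id in enumerate(items_by_folder[key], start=1):
            dao = ET.SubElement(did, "dao", {"href": f"{base_url}/{item_id}"})
            desc = ET.SubElement(dao, "daodesc")
            ET.SubElement(desc, "p").text = f"Item {position}"
            linked += 1
    return linked
```

The `items_by_folder` mapping is exactly what a file-name parser grouped by box and folder would produce, so the two steps chain together without any hand-created metadata.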
We also developed software that creates HTML pages for access to items (and, if desired, folders), so this method of delivery can be used by those who already have other ways of getting the EAD online.
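For readers who want a sense of what such a generated page involves, here is a minimal sketch of a folder-level HTML page that lists items in delivery order. The function name, page layout, and `(label, url)` input shape are assumptions; this is not the project’s released software.

```python
from html import escape


def folder_page(collection_title, box, folder, items):
    """Render a minimal HTML page listing a folder's items in delivery order.

    items is a list of (label, url) pairs, already sorted by sequence.
    """
    heading = f"{collection_title}: Box {box}, Folder {folder}"
    links = "\n".join(
        f'    <li><a href="{escape(url, quote=True)}">{escape(label)}</a></li>'
        for label, url in items
    )
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        "<head>\n"
        '  <meta charset="utf-8">\n'
        f"  <title>{escape(heading)}</title>\n"
        "</head>\n"
        "<body>\n"
        f"  <h1>{escape(heading)}</h1>\n"
        "  <ol>\n"  # ordered list preserves the reading-room order
        f"{links}\n"
        "  </ol>\n"
        "</body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    print(folder_page("Septimus D. Cabaniss Papers", 12, 3,
                      [("Item 1", "items/a.jpg"), ("Item 2", "items/b.jpg")]))
```

Because the page is static HTML, it can sit alongside whatever EAD display an institution already uses.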
Digitizing without hand-created item-level metadata and automating our work make this method of delivering content very quick. For our analysis, we captured the scanning and optimization times of 12 students over a period of 10 months, using a Phase One CaptureBack overhead scanner and an EPSON 10000 XL flatbed scanner, with Photoshop for optimization. Using these counts and assuming that 20% of the content went to the overhead scanner, we arrived at a scanning estimate of 356 minutes per 100 scans for both workflows. We also assumed an average of 3 pages per document; the material was primarily handwritten manuscripts. We found that it takes 47% *less* time to get content online using the new mass digitization method: the processing takes less than half the time, and the cost of metadata librarian work drops to zero.
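The blended scanning estimate is a weighted average of the two devices’ rates, and the headline figure is a simple percentage reduction. The talk reports only the blended result, not the per-device rates, so in this sketch those rates are inputs rather than values from the study; the function names are hypothetical.

```python
def minutes_per_100_scans(overhead_rate, flatbed_rate, overhead_share=0.20):
    """Blend per-device rates (minutes per 100 scans) by the share of content
    sent to the overhead scanner; the remainder goes to the flatbed."""
    if not 0.0 <= overhead_share <= 1.0:
        raise ValueError("overhead_share must be between 0 and 1")
    return overhead_share * overhead_rate + (1.0 - overhead_share) * flatbed_rate


def percent_time_saved(old_minutes, new_minutes):
    """Percentage reduction in total time from the old to the new workflow."""
    return 100.0 * (old_minutes - new_minutes) / old_minutes
```

Since the same scanning estimate applies to both workflows, the 47% reduction comes entirely from the post-scanning steps: processing and the eliminated item-level description.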
Next, we multiplied the time savings by the pay rates of the staff who normally perform these steps at our institution. Your mileage may vary. This comparison does not include costs for hardware, software, overhead, or supervision. However, since we’re providing the software as open source, and the time saved lowers those other costs too, the actual benefit should be much larger, especially if your current workflow is not as streamlined as ours. Our costs were already low: about $2.47 per scan. With this new method, however, our costs were closer to 80 cents per scan, less than a third of the cost of our usual workflow. Our Cabaniss collection consisted of 31.8 linear feet, a total of 46,663 scans. By using this method of digitization, we *saved* over $78,000.
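The savings figure can be checked directly from the numbers in the talk: $2.47 per scan for the item-level workflow, a new workflow costing 32% of that, and 46,663 scans.

```python
OLD_COST_PER_SCAN = 2.47  # item-level workflow, dollars per scan (from the talk)
COST_RATIO = 0.32         # new workflow costs 32% of the old one (from the talk)
SCANS = 46_663            # Cabaniss Papers, 31.8 linear feet (from the talk)

new_cost_per_scan = OLD_COST_PER_SCAN * COST_RATIO
savings = (OLD_COST_PER_SCAN - new_cost_per_scan) * SCANS

print(f"New cost per scan: ${new_cost_per_scan:.2f}")  # New cost per scan: $0.79
print(f"Savings on {SCANS:,} scans: ${savings:,.0f}")  # Savings on 46,663 scans: $78,375
```

Using the rounded 80-cent figure instead gives about $77,900, so the “over $78,000” total reflects the unrounded per-scan cost.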
Did you know that research indicates that experienced researchers prefer to access digitized content via the finding aid? But what about novice users? Both Scheir and Chapman found that novice users experienced a learning curve with finding aids, gaining confidence and ease with more exposure. In our own study, we compared access for primarily novice users to a collection delivered via the finding aid and to a similar collection with item-level metadata. We discovered three things that I want you to take away from this presentation:
* Those without previous digital collection experience found the finding aid interface significantly easier (42% less time, 27% fewer clicks, and 12% more success) than did those who claimed familiarity with the more traditional digital library interface. This bodes well for future acceptance of this method of web delivery.
* Participants for whom English is a second language had difficulty with the archival terminology.
* Learnability needs to be determined by repeated testing over multiple sessions! A single session with a few queries is insufficient.
We need more tests on the finding aid interface to determine what actually helps users. Suggestions from the research include:
* Replacing archival terminology
* Providing a search-in-page feature
* Providing navigation links to sections of the finding aid on the left
Then we need learnability tests for novice users that span multiple sessions.
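Two of these suggestions, plainer terminology and left-hand section navigation, could be prototyped together by generating the navigation from the EAD itself. This is a sketch under several assumptions: EAD 2002 markup without a namespace, a display that renders each section with `id="sec-<tagname>"`, and plain-language labels that are purely illustrative and have not been tested with users.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical plain-language labels for common EAD 2002 <archdesc> sections.
# The wordings are illustrative assumptions, not labels tested with users.
SECTION_LABELS = {
    "did": "Collection summary",
    "bioghist": "About the people and organizations",
    "scopecontent": "Contents of the collection",
    "arrangement": "How the collection is organized",
    "accessrestrict": "Who can use the collection",
    "dsc": "Box and folder list",
}


def nav_links(ead_root):
    """Return an HTML <nav> list linking to each labeled section present in
    <archdesc>. Assumes the page renders each section with id="sec-<tag>"."""
    archdesc = ead_root.find("archdesc")
    if archdesc is None:
        return ""
    items = []
    for section in archdesc:
        label = SECTION_LABELS.get(section.tag)
        if label is None:
            continue  # skip sections without a plain-language label
        anchor = f"sec-{section.tag}"
        items.append(f'  <li><a href="#{escape(anchor)}">{escape(label)}</a></li>')
    return "<nav>\n<ul>\n" + "\n".join(items) + "\n</ul>\n</nav>"


if __name__ == "__main__":
    ead = ET.fromstring(
        "<ead><eadheader/><archdesc level='collection'>"
        "<did/><bioghist/><scopecontent/><dsc/></archdesc></ead>"
    )
    print(nav_links(ead))
```

Keeping the labels in one mapping makes it cheap to swap wordings between rounds of the multi-session testing described above.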
There is no doubt that delivery of digital content via the finding aid is extremely cost-effective. This new mass digitization workflow costs only 32% of our already optimized item-level workflow. Clearly this provides a solution for large manuscript collections that might otherwise never see the light of day online. However, we need to make the EAD web interface more usable. Modifications to the display and terminology will increase access for novice users and for researchers whose first language is not English. Leveraging the EAD for access to online content will enable us to shift more of our precious resources to processing, while providing online access to large manuscript collections that now languish in our archives.
I’ve submitted an article about this to The American Archivist, so hopefully you can read more about it there. In the meantime, the Journal of Library Innovation (JOLI) article referenced here details our workflows. The Cabaniss project website, wiki site, and display links are provided, as well as the SourceForge site for the Acumen software.
Here are references to the research I referred to earlier.
Cheap, Quick, and Pretty: Mass Digitization of Large Manuscript Collections
Jody L. DeRidder
University of Alabama Libraries
[email_address]
Jody L. DeRidder, “Leveraging EAD for Low-Cost Access to Digitized Collections at the University of Alabama Libraries,” Journal of Library Innovation 2, no. 1 (2011), http://www.libraryinnovation.org/article/view/69
University of Alabama Libraries, “Septimus D. Cabaniss Papers Digitization Project.”
Experienced researchers prefer the finding aid interface:
Tim West, Kirill Fesenko, and Laura Clark Brown, “Extending the Reach of Southern Sources: Proceeding to Large-Scale Digitization of Manuscript Collections,” Final Grant Report for the Andrew W. Mellon Foundation, Southern Historical Collection, University Library, University of North Carolina at Chapel Hill, June 2009, http://www.lib.unc.edu/mss/archivalmassdigitization/download/extending_the_reach.pdf
Cory Nimer and J. Gordon Daines III, “What Do You Mean It Doesn’t Make Sense? Redesigning Finding Aids from the User’s Perspective,” Journal of Archival Organization 6, no. 4 (2008), http://dx.doi.org/10.1080/15332740802533214
Novice users experience a learning curve:
Wendy Scheir, “First Entry: Report on a Qualitative Exploratory Study of Novice User Experience with Online Finding Aids,” Journal of Archival Organization 3, no. 4 (2006), http://dx.doi.org/10.1300/J201v03n04_04
Joyce Celeste Chapman, “Observing Users: An Empirical Analysis of User Interaction with Online Finding Aids,” Journal of Archival Organization 8, no. 1 (2010), http://dx.doi.org/10.1080/15332748.2010.484361