VRA 2012, Cataloging Case Studies, ROBOCATALOGING

Presented by Joshua Polansky at the Annual Conference of the Visual Resources Association, April 18th - April 21st, 2012, in Albuquerque, New Mexico.

The Cataloging Case Studies session will explore metadata migration, workflows, cloud computing, and tagging, and how they can be applied to digital collections. Mary Alexander of the University of Alabama will present on the second of two migrations that have taken place at the University of Alabama Libraries and on the importance of metadata schema and workflows in that process. Joshua Polansky of the University of Washington will describe his automated workflow, which uses optical character recognition (OCR), Apple Automator, and Microsoft Excel to speed the process of collecting metadata for 75,000 digital assets. Elizabeth Berenz of ARTstor will look at the advantages of cloud-based software for image management, using Shared Shelf as a working example. Finally, Ian McDermott will demonstrate the advantages of expert tagging and annotation in improving metadata. His presentation will focus on two ARTstor collections that could benefit from the knowledge of the larger ARTstor community: the Gernsheim Photographic Corpus of Drawings and the Larry Qualls Archive of contemporary art exhibitions.

MODERATOR:
Jeannine Keefer, University of Richmond, VA

PRESENTERS:
Mary Alexander, University of Alabama
Elizabeth Berenz, ARTstor
Ian McDermott, ARTstor
Joshua Polansky, University of Washington

Presentation Transcript

  • ROBOCATALOGING: Accelerated workflows using OCR and automation
    Joshua Polansky, Visual Resources Collection, University of Washington College of Built Environments
    Cataloging Case Studies, April 21, 2012
  • University of Washington College of Built Environments, Visual Resources Collection
    Serves the departments of Architecture, Construction Management, Landscape Architecture, and Urban Design & Planning
    Analog collection:
    • 130,000 35mm slides accessioned and cataloged since the 1950s
    • Typewritten records; no digital database or online component until 2002
  • Visual Resources Collection
    Digital components:
    • MS Access database catalog
    • MDID2 for faculty / students
  • The big question:
    Automated processes exist for batch digitizing analog photos. Is it possible to batch digitize old cataloging data, too?
    • Good cataloging information here, researched and typed years ago.
    • More good data, including source and a unique accession number.
  • Paper records to the rescue
    • Binders and binders of accession records
    • Pristine label photocopies
  • A closer look at the slide label
    • Architect
    • Building name
    • Location / Year
    • View
    • Source
    • Accession number
    • Collection ID that appears on every label in this form
    • Photocopied label edge that will interfere with OCR later
  • The big challenge:
    • Digitize these typewritten pages
    • Sort slide label text into distinct columns in Excel
    • Identify each record with its accession number
    • Do it all with common or affordable tools
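
The "sort slide label text into distinct columns" step can be illustrated with a short Python sketch. This is not the workflow from the talk, which used Automator, Acrobat, and Excel. It assumes each label's OCR text arrives with one field per line, in the order shown on the slide label; the field names, the `label_to_record` and `labels_to_csv` helpers, and the sample label are all hypothetical.

```python
import csv
import io

# Hypothetical column names, following the label anatomy on the slide.
FIELDS = ["architect", "building", "location_year", "view", "source",
          "accession_number"]


def label_to_record(label_text):
    """Map one label's OCR text (assumed one field per line) to a dict."""
    lines = [ln.strip() for ln in label_text.splitlines() if ln.strip()]
    # Pad short labels so missing fields become empty columns.
    lines += [""] * (len(FIELDS) - len(lines))
    return dict(zip(FIELDS, lines[:len(FIELDS)]))


def labels_to_csv(labels):
    """Return CSV text with a header row and one row per label."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for label in labels:
        writer.writerow(label_to_record(label))
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up label text for illustration only.
    sample = "Example Architect\nExample House\nSeattle, 1950\nExterior\nExample Source\n12345"
    print(labels_to_csv([sample]))
```

Real OCR output is rarely this regular, so a fixed line order is only a starting point; labels with missing or merged lines would still need manual review.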
  • Photo: Alvaro Farfán via Flickr. 3392225359
  • Hardware: Apple iMac
    • 2010 model
    • OS 10.6
    Any recent Mac will do (OS 10.4 or higher)
  • Hardware: Epson Perfection V500 scanner
    • With optional Automatic Document Feeder for stacks of 30 sheets at a time
    • Standard transparency unit makes it useful for other scanning projects
    • Retails for less than $300 with ADF
  • Photo: Zak Moreira via Flickr. 3425393424
  • Software
  • Adobe Photoshop CS4
    • Resize and realign scanned page into a single-column tif with Actions
    Adobe Acrobat Pro
    • Create a pdf of each tif
    • Analyze pdf with optical character recognition (OCR) and make pdf text selectable
  • Microsoft Excel 2008
    • Receive text from Acrobat in columns
    • After text manipulation and sorting, output in a cross-platform format like csv
    Apple Automator / Automator Virtual Input
    • Execute workflows to control multiple applications. Launch, copy, paste, manipulate, save, repeat.
    • Create Folder Actions for Finder automation
    • Virtual Input: Extend the functionality of Automator for even more control over apps, mouse, keyboard
  • Automator
    • Comes standard with Mac OS X 10.4+
    • Allows scripting and workflow creation via GUI
    • Can perform operations within an application or across multiple applications
  • Document scanning: Automator, Folder Actions, Photoshop [video here in original presentation]
  • Text processing: Automator + Automator Virtual Input, Folder Actions, Acrobat, Excel [video here in original presentation]
  • Processed output in Excel
  • Sometimes it looks good... Sometimes it doesn’t.
  • Final result after text sorting and cleanup
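
Part of the cleanup could, in principle, be scripted. The sketch below is an illustration and not the method used in the talk. It assumes accession numbers are purely numeric and 4 to 6 digits long, which the slides do not state; the pattern, the substitution table, and the `clean_accession` and `triage` helpers are all hypothetical.

```python
import re

# Letters that OCR often produces in place of digits in typewritten text.
DIGIT_FIXES = str.maketrans(
    {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}
)

# Hypothetical accession number format: 4 to 6 digits.
ACCESSION_RE = re.compile(r"^\d{4,6}$")


def clean_accession(raw):
    """Remove spaces and swap commonly confused letters for digits."""
    return raw.strip().replace(" ", "").translate(DIGIT_FIXES)


def triage(rows):
    """Split rows into (clean, needs_review) by accession number validity."""
    clean, review = [], []
    for row in rows:
        fixed = clean_accession(row.get("accession_number", ""))
        if ACCESSION_RE.match(fixed):
            clean.append(dict(row, accession_number=fixed))
        else:
            # Keep the original row untouched for a human to inspect.
            review.append(row)
    return clean, review
```

Rows that land in the review list keep their raw OCR text, so nothing is silently altered when the automatic fix does not yield a plausible number.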
  • Goal: Produce nearly perfect metadata, clean enough to import into existing database
    Actual outcome:
    • Produced pretty good metadata
    • Spent lots of time on data cleanup to get there
  • Goal: Use tools on hand; any new tools should be cheap or useful for other projects
    Actual outcome:
    • Used standard software, plus one new application ($25)
    • iMac is a student workstation
    • Epson scanner is in use for print and film scanning plus pdf creation
  • Goal: Have 75,000 new records ready to pair with images and publish to MDID
    Actual outcome:
    • Got 75,000 records!
    • Created a searchable shelf list and archival finding aid
    • With further data cleanup, the original goal of MDID use can be achieved
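
Pairing records with images might look something like the following Python sketch. It assumes, purely for illustration, that each image file is named after its accession number (for example `12345.tif`); the slides do not describe the file naming scheme, and the `pair_records_with_images` helper is hypothetical.

```python
from pathlib import Path


def pair_records_with_images(records, image_names):
    """Match records to image files whose stem equals the accession number.

    Assumes files are named by accession number, which is a guess and
    not something stated in the presentation.
    """
    by_stem = {Path(name).stem: name for name in image_names}
    paired, unmatched = [], []
    for rec in records:
        image = by_stem.get(rec["accession_number"])
        if image is None:
            unmatched.append(rec)
        else:
            paired.append((rec, image))
    return paired, unmatched
```

The unmatched list gives a quick measure of how much cleanup remains before the records could be published.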
  • Photo: JF Sebastian via Flickr. 412874324
  • Every Mac comes with Automator and it is easy to learn
  • You probably have OCR tools on your computer right now
  • Experimenting can produce great results
  • Thank you: Rainer Metzger, University of Washington
    Photo credits:
    • Software icons and screenshots by Adobe, Apple, Microsoft and Singed Labcoat
    • Kraftwerk images by Flickr users Zak Moreira, Alvaro Farfán and JF Sebastian
    • Other photo and video by UW CBE VRC