More than just a pretty picture: improving the discoverability of illustrations in the Biodiversity Heritage Library


Trish Rose-Sandler and Kyle Jaebker gave this demo at the Museums and the Web conference on April 20, 2013. It covers how BHL is improving access to its natural history illustrations via Flickr and via the Art of Life project. Authors of the poster and handouts: Gilbert Borrego, Grace Costantino, Bianca Crowley, Kyle Jaebker, and Trish Rose-Sandler.

Published in: Technology, Education

  1. More than just a pretty picture: improving the discoverability of illustrations in the Biodiversity Heritage Library, by Gilbert Borrego, Grace Costantino, Bianca Crowley, Kyle Jaebker, Trish Rose-Sandler
     Hidden within BHL literature are millions of rich illustrations.
     The Biodiversity Heritage Library (BHL) is:
     • An open access digital library for historic biodiversity literature
     • An open data repository of taxonomic names and bibliographic information
     Image access via Flickr: BHL staff manually identify and push BHL images to a Flickr stream, but the process does not scale to the millions of images available. Users can add tags to images in Flickr so that they are searchable. They are also encouraged to add species names via machine tags so BHL can automatically share these images with the Encyclopedia of Life.
     Art of Life project: The Art of Life project, enabled by a grant from NEH, aims to automate the process of identifying and tagging images via algorithms. The project defined a metadata schema for natural history illustrations that will help crowdsource more detailed descriptions via image portals such as Wikimedia Commons.
  2. Uploading Images to Flickr
     The Biodiversity Heritage Library (BHL) provides access to thousands of scientific illustrations through the social media site Flickr. To expedite the process of uploading these images to Flickr, a workflow was developed within BHL's backend database. When paginating, or enhancing a book's page metadata, staff can click a single button to upload all illustrations within that book to Flickr. Bibliographic information and a link to the image in BHL are also embedded during the process.
     This workflow was internally documented in the form of a tutorial to ensure that all BHL partners can contribute to this effort and be part of the program's expanding outreach efforts.
     The use of Flickr as an outreach platform exposes our rich image collection to search engines and new users. Additionally, it allows us to provide images of species to include on the Encyclopedia of Life's taxon pages. While the original intention of BHL's Flickr account was to provide easy access to scientific figures, plates and illustrations, the site has taken on a life of its own and is being repurposed by users all around the world in the most imaginative ways.
     Figures: From BHL's backend dashboard, staff select the pages to upload to Flickr. Final view in Flickr. Once images are uploaded, staff can create sets, add additional bibliographic information, and assign sets to collections.
     Visit the BHL Flickr today! You can help add species names to BHL images.
  3. The Flickr Tagging Process: Crowdsourcing Species Identification and Image Tagging
     The Biodiversity Heritage Library (BHL), an open access digital library consortium for biodiversity literature, uses Flickr to provide access to thousands of images extracted from its digital collections. To improve the discoverability and usability of these images, BHL crowdsources the task of adding species name machine tags to images in Flickr.
     Tags are searchable keywords that users can apply to images in Flickr. Machine tags are specially formatted to be read by computers: taxonomy:binomial=“Genus species”
     BHL encourages its users to identify the species depicted in an image using the book's image descriptions and to add that species name to the image as a machine tag. Once these tags are added to BHL images, users can search within Flickr for images of specific species, and BHL can automatically share these images with the Encyclopedia of Life (EOL).
     EOL is an open access project dedicated to providing a webpage for every species. EOL harvests machine-tagged images from the BHL Flickr, uploads them to a BHL Image Collection in EOL, and automatically associates the images with the matching species page. To date, thousands of machine-tagged images have been added to EOL.
     Workflow: starting from an image in Flickr, a user adds a species name machine tag; the image is then automatically ingested into the BHL Image Collection in EOL and automatically associated with the corresponding species page in EOL.
  4. Users clamor for the Art of Life
     The Art of Life project evolved out of a need to improve access to the rich corpus of natural history illustrations hidden within the digitized pages of books and journals in the Biodiversity Heritage Library (BHL). Currently, these illustrations have no descriptive metadata such as title, creator or subject matter that can be searched. The only way to uncover these gems is by opening up a BHL book or volume and scrolling through page by page.
     One solution has been for BHL staff to manually identify pages that contain illustrations and to push those pages into a BHL Flickr stream, which allows for discovery through themed collections and in some cases species names. While this approach has improved access to some of BHL's illustrations, it requires significant staff time and does not scale well to the millions of images present within the BHL pages.
     Thus, the Art of Life project was designed as a solution for automating the identification of images and crowdsourcing their descriptions. The project is a partnership between the Missouri Botanical Garden and the Indianapolis Museum of Art and is supported by the National Endowment for the Humanities. It runs from May 2012 to April 2014.
     The Art of Life has five primary objectives: 1) define a metadata schema appropriate for natural history illustrations, 2) build algorithms to automatically identify BHL pages with illustrations, 3) sort and classify the illustrations, 4) crowdsource descriptions through tagging applications, and 5) integrate descriptive metadata back into BHL and share images and descriptions with audiences outside of BHL.
     These illustrations will be of interest to a diversity of audiences, including artists, biologists, humanities scholars, librarians, educators, and citizen scientists.
     Figures: Example of an illustration described using the Art of Life schema. Illustration schema elements; the elements chosen were a mix of VRA Core 4.0 and Darwin Core. Workflow diagram that outlines how each illustration will move through the Art of Life processes.
     Learn more about the Art of Life project.
  5. Automating the Heavy Lifting: Using Algorithms to Identify Images in BHL
     In the Art of Life project, the Indianapolis Museum of Art (IMA) and the Biodiversity Heritage Library (BHL) have been working to develop algorithms to identify images from the pages of books and journals digitized by the BHL. Multiple algorithms are being developed, including ABBYY OCR, contrast, color, and compression. These algorithms are being tested to determine the most efficient and accurate means of identifying images.
     The IMA developed a set of software tools for running the algorithms and analyzing their results. This software allows for the import of publications and journals determined to be good test samples for the algorithms. These samples, termed the “Gold Standard”, are being used to evaluate how useful each algorithm will be in determining whether a scan contains a sketch or drawing. A custom-built interface for reviewing the results shows accurate processing results as well as false positives. In addition to the visual review of results, analysis across the entire “Gold Standard” is ongoing to determine the best combination of algorithms.
     Once completed, the algorithms will be deployed on a cluster to process the entire BHL collection. After the processing has been completed, the resulting metadata will be used to add additional descriptive and finding aids. This will allow users to discover and work with illustrations from the books and journals that were previously very hard to find.
     Figures: Results Viewer. Compression Ratio Algorithm Analysis. Close-up Algorithm Result.
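
The machine tag format described in slide 3 (taxonomy:binomial=“Genus species”) can be illustrated with a short Python sketch that builds such a tag and reads it back. The function names (make_binomial_tag, parse_machine_tag, binomial_names) and the regular expression are assumptions made for this example; they are not part of BHL's, Flickr's, or EOL's software, and the sketch does not call the Flickr API. It assumes the general namespace:predicate=value shape with simple alphanumeric namespace and predicate names.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class MachineTag(NamedTuple):
    namespace: str
    predicate: str
    value: str


# Assumed shape: namespace:predicate=value, where the value may be quoted.
_TAG_RE = re.compile(r'^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.+)$')


def make_binomial_tag(genus: str, species: str) -> str:
    """Format a species name as a taxonomy:binomial machine tag."""
    genus = genus.strip().capitalize()   # genus is capitalized
    species = species.strip().lower()    # specific epithet is lower case
    return f'taxonomy:binomial="{genus} {species}"'


def parse_machine_tag(tag: str) -> Optional[MachineTag]:
    """Split a tag into its three parts, or return None for ordinary tags."""
    match = _TAG_RE.match(tag.strip())
    if match is None:
        return None
    namespace, predicate, value = match.groups()
    # Accept straight or curly double quotes around the value.
    value = value.strip().strip('"\u201c\u201d')
    return MachineTag(namespace.lower(), predicate.lower(), value)


def binomial_names(tags: Iterable[str]) -> List[str]:
    """Collect the species names carried by taxonomy:binomial tags."""
    names = []
    for tag in tags:
        parsed = parse_machine_tag(tag)
        if parsed and parsed.namespace == "taxonomy" and parsed.predicate == "binomial":
            names.append(parsed.value)
    return names
```

For example, binomial_names applied to a mixed list of ordinary tags and machine tags returns only the species names, which is the kind of filtering a harvester would need before matching images to species pages.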
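
Slide 5 names compression as one of the signals being tested for identifying illustrated pages, but gives no details of the algorithm. The Python sketch below shows only the general idea and is not the IMA's implementation. Everything in it is an assumption: page pixels are taken as raw grayscale bytes, zlib stands in for the compressor, the 0.2 threshold is arbitrary, and the two synthetic pages are stand-ins for real scans. The premise is that a mostly blank page compresses far better than a page of dense artwork; whether that holds for real BHL scans would have to be checked against a labelled sample such as the “Gold Standard”.

```python
import random
import zlib
from typing import Sequence, Tuple


def compression_ratio(pixels: bytes) -> float:
    """Compressed size divided by raw size; lower means more compressible."""
    if not pixels:
        raise ValueError("page has no pixel data")
    return len(zlib.compress(pixels, 6)) / len(pixels)


def looks_illustrated(pixels: bytes, threshold: float = 0.2) -> bool:
    """Flag a page as illustrated if it compresses poorly (hypothetical threshold)."""
    return compression_ratio(pixels) > threshold


def precision_recall(predicted: Sequence[bool], gold: Sequence[bool]) -> Tuple[float, float]:
    """Compare predictions with hand-labelled pages."""
    true_pos = sum(1 for p, g in zip(predicted, gold) if p and g)
    false_pos = sum(1 for p, g in zip(predicted, gold) if p and not g)
    false_neg = sum(1 for p, g in zip(predicted, gold) if g and not p)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


def synthetic_text_page(size: int = 10_000) -> bytes:
    """Stand-in for a text page: white background with sparse, regular dark pixels."""
    return bytes(0 if i % 50 == 0 else 255 for i in range(size))


def synthetic_illustration_page(size: int = 10_000, seed: int = 0) -> bytes:
    """Stand-in for an illustration: pixel values that vary across the whole range."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(size))
```

Reporting precision and recall against labelled pages mirrors the slide's concern with false positives: a threshold that flags every page would find all illustrations but bury them among text pages.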