This document summarizes the Art of Life project, which aims to improve access to natural history illustrations in the Biodiversity Heritage Library (BHL) by automatically identifying and describing them. The BHL is a digital library containing over 44 million pages contributed by its partner institutions. The Art of Life project is developing algorithms that identify illustrations on pages and build descriptive metadata. The techniques tested were analysis of picture blocks, contrast, color, and compression. A Macaw interface lets users classify the pages the algorithms identify, and crowdsourcing on Flickr and Wikimedia Commons can add further descriptions. So far, 1.5 million pages have been processed, and 300,000 of them were found to contain images. The goal is to serve scientists, artists, historians, and teachers by enriching BHL content.
1. Merging the Worlds of Art and Science
IFLA Section on Science & Technology
August 19, 2014
2. Trish Rose-Sandler
Center for Biodiversity Informatics,
Missouri Botanical Garden, St. Louis, MO, USA
Nancy E. Gwinn
Smithsonian Libraries,
Washington, DC, USA
Constance Rinaldo
Ernst Mayr Library, Museum of Comparative Zoology, Harvard University,
Cambridge, MA, USA
4. Digital Repository of Books, Field Notebooks, and Journals
• Extensive
• Open
• Global
• Linked
5. What is the Biodiversity Heritage Library?
Digital Library
• 78,947 titles
• 141,770 volumes
• 44,275,666 pages
6. • American Museum of Natural History Library
• California Academy of Sciences Library
• Cornell University Library
• Harvard University Botany Library
• Ernst Mayr Library of the Museum of Comparative Zoology
• Library of Congress
• Marine Biological Laboratory Library
• Missouri Botanical Garden Library
• Natural History Museum, London Library
• The New York Botanical Garden Library
• Royal Botanic Garden, Kew, Library
• Smithsonian Libraries
• Washington University at St. Louis Library
• University of Illinois Library
• United States Geological Survey Libraries
BHL Central: 15 members (listed above) and 4 affiliates:
• Academy of Natural Sciences Library
• Chicago Botanic Garden Library
• The Field Museum Library
• Los Angeles County Natural History Museum Library
8. BHL = Libraries + Technology + Science
of animals, plants, nature in general (taxonomic literature)
• Type in a taxon name and find in the text corpus.
• Systematic biology is for scientists; the taxonomic impediment is resolved.
11. Challenges
• Millions of natural history illustrations
• Very few have metadata that describe their content or where they are in the book
12. BHL and Flickr: 90,000 Images
• Only pretty pictures
• Still too manual
• Many illustrations not included
13. Art of Life?
o Full title:
o Grant given to in St. Louis, Missouri
o Funded by U.S.
o Runs
14. Goals/Objectives
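o Define an appropriate metadata schema for natural history illustrations
o Build algorithms to automate identification of illustrations
o Enhance existing tools to classify illustrations
o Crowdsource adding descriptive metadata
o Integrate metadata into BHL and share images and metadata more widely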
15. How to Identify Illustrations?
Algorithms tested to automate identification of images:
o Picture blocks: 87-88% effective
o Contrast: 87-88% effective
o Color: ineffective
o Compression: ineffective
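To make the two effective approaches concrete, here is a minimal Python sketch of how a picture-block check and a contrast check might flag a page. It is not the Indianapolis Museum of Art implementation. It assumes that OCR output is ABBYY-style XML whose `block` elements carry a `blockType` attribute (with `"Picture"` marking images), that page images are available as 2-D lists of 8-bit grayscale values, and that the share of mid-tone pixels can stand in for a contrast measure. The function names and thresholds are hypothetical.

```python
"""Hypothetical sketch of two page-level illustration heuristics.

Assumptions (not taken from the Art of Life code): OCR output is ABBYY-style
XML whose <block> elements carry a blockType attribute, and page images are
2-D lists of 8-bit grayscale values (0 = black, 255 = white).
"""
import xml.etree.ElementTree as ET


def count_picture_blocks(abbyy_xml: str) -> int:
    """Count OCR blocks flagged as pictures (assumed blockType="Picture")."""
    root = ET.fromstring(abbyy_xml)
    count = 0
    for elem in root.iter():
        # Strip any XML namespace so "{ns}block" and "block" both match.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "block" and elem.get("blockType") == "Picture":
            count += 1
    return count


def midtone_fraction(pixels, low=64, high=192) -> float:
    """Share of pixels that are neither near-black nor near-white.

    Printed text pages are mostly dark ink on light paper, so few pixels
    fall in the mid-tone band; engravings and plates usually have more.
    The thresholds are illustrative guesses, not tuned values.
    """
    total = 0
    mid = 0
    for row in pixels:
        for value in row:
            total += 1
            if low <= value < high:
                mid += 1
    return mid / total if total else 0.0


def page_has_illustration(abbyy_xml: str, pixels, midtone_cutoff=0.25) -> bool:
    """Flag a page if OCR found a picture block or the image has many mid-tones."""
    return (count_picture_blocks(abbyy_xml) > 0
            or midtone_fraction(pixels) >= midtone_cutoff)


if __name__ == "__main__":
    plate_page = '<page><block blockType="Text"/><block blockType="Picture"/></page>'
    print(count_picture_blocks(plate_page))          # 1
    print(midtone_fraction([[0, 255], [128, 100]]))  # 0.5
```

Combining the checks with a logical OR favors recall: a page is flagged if either signal fires, which suits a workflow where people later confirm or reject candidates in a classification tool.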
21. Art of Life Status
o Running algorithms
o 1.5 million pages processed
o 300,000 pages with images
o Estimated 15% of corpus will be pages with images
results: 78,000 pages so far
o Testing to Wikimedia Commons
o Testing from tool
o BHL architecture modified and ready to newly created metadata
22. Images serve
o Scientists
o Artists
o Historians
o Teachers
Benefits
o Rich content
o If specimens are gone, may be to document a species
Algorithms and analysis tools on GitHub: github.com/IMAmuseum/artoflife
Classification tool (Macaw) on GitHub: github.com/cajunjoel/macaw-neh
24. Questions?
For more information:
Trish Rose-Sandler, Principal Investigator
trish.rose-sandler@mobot.org
biodivlib.wikispaces.com/Art+of+Life
Editor's Notes
BHL is:
An extensive repository of biodiversity literature (44+ million pages).
Open access: all content is either in the public domain or licensed under Creative Commons.
Global in content coverage and partners.
Linked to other regional and global initiatives.
All content is served from the Internet Archive and from a specialized BHL portal.
In the portal, users can search by keyword, scientific name, or article, and browse by title, author, date, or collection.
BHL is a virtual organization.
BHL Central is formed by 19 organizations in the US and UK.
Global BHL includes BHL Central and the other countries shown on the slide.
Membership subscriptions support operational, administrative, and technical work.
BHL primarily supports taxonomists in biodiversity research, but it is used by many disciplines beyond science, particularly for its illustrations.
Briefly explain what taxonomy is in the context of natural history, and specifically explain the taxonomic impediment:
Years of work can be compressed.
BHL book viewer showing a scientific name search result.
Through web services, we can match species name thesauri with content in BHL.
We can then link out to related sites such as the Encyclopedia of Life.
Five primary objectives:
Define an appropriate metadata schema for natural history illustrations
Build algorithms to automate identification of illustrations
Enhance existing tools to classify illustrations (e.g., drawing vs. map)
Crowdsource further descriptions (e.g., species, illustrator)
Integrate metadata into BHL portal and share images and metadata more widely
The Indianapolis Museum of Art developed the algorithms.
Tested four but used only two, based on their effectiveness.
Django web application for analyzing and visualizing the algorithm results.
Developed a tool for classifying the illustrations.
Developed at Smithsonian Libraries.
Modified a tool called Macaw, used for adding page-level metadata to books.
On the left is Flickr, where BHL can share illustrations and users can tag them.
On the right is Wikimedia Commons, where BHL can share illustrations and users can describe them using the Art of Life schema.
Example of an illustration described using the Art of Life schema.