The Biodiversity Heritage Library Mass Digitizing Project: A Grandeur in this View of Digital Libraries
A Grandeur in this View of Digital Libraries
Martin R. Kalfatovic
Suzanne C. Pilsk
Smithsonian Institution Libraries
LITA Forum
6 October 2007
Denver, Colorado
There is grandeur in this view of life, with its several powers, having been originally breathed into a few forms or into one; and that, whilst this planet has gone cycling on according to the fixed law of gravity, from so simple a beginning endless forms most beautiful and most wonderful have been, and are being, evolved. Charles Darwin, The Origin of Species, 1859
Boutique Scanning
Scanning back cameras
All special handling
Slow production
Expensive
Rare Books
Obvious choice
“Pretty pictures”
Rare Books
Rarely had textual access (OCR, etc.)
Difficult to link to other materials
Mass Scanning Projects
Google Books
MSN Live
Internet Archive/Open Content Alliance
Difficult (impossible?) to repurpose much of the material
Quality of images often questionable
“Frankenbooks”
Sketchy / inaccurate bibliographic data
Content is more re-purposable than Google
Content not fully open
Nice search interface
Still, no context!
Open Access
Hodge-podge of collections
Interface hard to use!
Still, no, or very little context
United States Exploring Expedition
Biologia Centrali-Americana
Large scale content and repurposing text
INOTAXA
Our Man Sherborn – the Squire
Cataloger at heart
Slowly went through every relevant text looking for names
Created an index that was useful as soon as he started
Index AND Bibliography of relevant texts from 1758 through 1850
Re-keying of data
Parsing data
Using tools to harvest out data
Manually matching data
Wealth of information locked on the page is being liberated!
Current Uses
Look up individual species
Batch whole data sets for comparisons
Future Possibilities
Updated species information
Accurate images of species
Geographic distributions
Other databases can link (such as uBio)
Future prioritizing of scans
Demonstration I: Index Animalium and Nomenclator Zoologicus
Systema naturae (10th ed., 1758) by Carl von Linné
Binomial Nomenclature
Genus name and Species epithet or descriptor
Latin or Latin-ized
Bill Gates' Flower Fly
Eristalis gatesi Thompson
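As a minimal sketch of the binomial structure described above, the following splits a name string such as the flower fly example into genus, species epithet, and naming authority. The `Binomial` type and `split_binomial` function are hypothetical names introduced for illustration; real name strings (subgenera, infraspecific ranks, parenthesised authorities) need far more careful handling.

```python
from typing import NamedTuple, Optional


class Binomial(NamedTuple):
    genus: str                 # capitalised, Latin or Latinized
    epithet: str               # lower-case species epithet
    authority: Optional[str]   # person who named the species, if given


def split_binomial(name: str) -> Binomial:
    """Split 'Genus epithet [Authority]' on whitespace (illustrative only)."""
    parts = name.split()
    if len(parts) < 2:
        raise ValueError(f"not a binomial: {name!r}")
    genus, epithet = parts[0], parts[1]
    # Anything after the first two words is treated as the authority.
    authority = " ".join(parts[2:]) or None
    return Binomial(genus, epithet, authority)


if __name__ == "__main__":
    print(split_binomial("Eristalis gatesi Thompson"))
```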
International Code of Botanical Nomenclature (St. Louis Code)
International Code of Zoological Nomenclature
International Code of Phylogenetic Nomenclature
Index Animalium’s Citation:
albimanus Delphinus, T. R. Peale in Wilkes, Expl. Exped. VIII. 1848, 33
On page 33 of volume 8 of Charles Wilkes's 1848 publication, Narrative of the United States Exploring Expedition, Delphinus albimanus was first named by T. R. Peale.
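The way this one citation breaks down into fields can be sketched in code. The pattern below is an assumption fitted to the single Index Animalium entry quoted above (epithet, genus, author, "in" source, Roman-numeral volume, year, page); it is not the project's actual parser, and other entries in the index will not all follow this shape.

```python
import re

# Assumed layout: "<epithet> <Genus>, <author> in <source> <VOL>. <year>, <page>"
CITATION = re.compile(
    r"^(?P<epithet>\S+)\s+(?P<genus>[A-Z]\S*),\s+"
    r"(?P<author>.+?)\s+in\s+(?P<source>.+?)\s+"
    r"(?P<volume>[IVXLCDM]+)\.\s+(?P<year>\d{4}),\s+(?P<page>\d+)$"
)

ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral such as 'VIII' to an integer."""
    total = 0
    for ch, nxt in zip(numeral, numeral[1:] + " "):
        value = ROMAN[ch]
        # A smaller value before a larger one is subtracted (e.g. IV = 4).
        total += -value if nxt in ROMAN and ROMAN[nxt] > value else value
    return total


def parse_citation(text: str) -> dict:
    match = CITATION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised citation: {text!r}")
    fields = match.groupdict()
    return {
        "name": f"{fields['genus']} {fields['epithet']}",
        "author": fields["author"],
        "source": fields["source"],
        "volume": roman_to_int(fields["volume"]),
        "year": int(fields["year"]),
        "page": int(fields["page"]),
    }


if __name__ == "__main__":
    entry = "albimanus Delphinus, T. R. Peale in Wilkes, Expl. Exped. VIII. 1848, 33"
    print(parse_citation(entry))
```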
Agatea violaris Type specimen from the U.S. National Herbarium (Smithsonian Institution) collected by the United States Exploring Expedition, 1838-1842
Specimen
Plate or other visual image
Taxonomic description
The cited half-life of publications in taxonomy is longer than in any other scientific discipline * * * The decay rate is longer than in any scientific discipline ~ Macro-economic case for open access, Tom Moritz
Taxonomic descriptions must be published for the name to be valid
Publications must be available to the public through trusted sources
Libraries have been the traditional place
Specimen collections
Databases
Publications
Observations
‘Gray’ literature
Index cards
Field notebooks
Biologia Centrali-Americana. Edited by Frederick Ducane Godman and Osbert Salvin. London : Pub. for the editors by R. H. Porter, 1879-1915
Vishwas Chavan travels a lot. An informatician based at the National Chemical Laboratory in Pune, India, he collects data on what types of animal live where in India to enter into a biodiversity database … Much of the information Chavan seeks is in old, out-of-print tomes … To find them, Chavan has spent years trailing around libraries. He dreams of the day when books such as these are scanned and made available as digital files on the Internet. “Science in the Web Age: The Real Death of Print” by Andreas von Bubnoff, Nature 438, 550-552, 1 December 2005
What is Biodiversity?
Ecosystems and landscapes
Diversity of species
Genetic variability within species
Wholesome food
Drinkable water
Breathable air
Stable climate for
Forestry
Agriculture
Fisheries
Waste decomposition
Bioremediation
Invasive species
Pest control
Ecotourism
Pharmaceuticals
Genomics
Proteomics
Bioengineering
Biotechnology
Molecular design
Imitating nature
Designer organisms
Renewable feedstocks
Envirofriendly
Manufacturing processes
2003, Telluride. Encyclopedia of Life meeting
February 2005. London. Library and Laboratory: the Marriage of Research, Data and Taxonomic Literature
May 2005. Washington. Ground work for the Biodiversity Heritage Library
June 2006. Washington. Organizational and Technical meeting
August 2006. New York Botanical Garden. BHL Director’s Meeting.
October 2006. St. Louis/San Francisco. Technical meetings
February 2007. Museum of Comparative Zoology. Organizational meeting
May 2007. Encyclopedia of Life Launch. Washington DC.
September 2007. Missouri Botanical Garden. Technical and Organizational Meeting. St. Louis, Missouri.
American Museum of Natural History (New York)
Field Museum (Chicago)
Natural History Museum (London)
Smithsonian Institution (Washington)
Missouri Botanical Garden (St. Louis)
New York Botanical Garden (New York)
Royal Botanic Garden, Kew
Botany Libraries, Harvard University
Ernst Mayr Library of the Museum of Comparative Zoology, Harvard University
OCR Text Stellaria ongipes ? Goldie. Cornwallis Island. Silene acaWlis, L. Woman Islands. Sawfraga oppositifolia, L. Kakkidlarn, Greenland; Cornwallis Island. Sa4mfraga cerua, De Cand. Ukaari; Cornwallis Island. Sairaga csepitosa, L. Wolstenholme, Greenland; Cornwallis Island. Saxfraga rimdaris, De Cand. Whale Fish Island. Sad~ raga nivalis, L. Cornwallis Island. Ptentilla emarginata? Pursh. Wolstenholme. Dipena Lapponica, L. Whale Fish Island. Pyrola rotndifolia, L. Whale Fish Island. Casiope tetragona, Don. Ichauti. Vaccinium uliginosum, L. Loiseleuria procumbens, L. Whale Fish Island.
<genus>Polygonum</genus>
<species>viviparm</species>,
<author>L.</author>
<locality>Bushnan Island.</locality>
Created via automated or semi-automated means.
Machine-readable literature
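To show why the tagged record above counts as machine-readable, here is a short sketch that reads it with Python's standard `xml.etree.ElementTree`. The wrapping `<record>` element and the `parse_record` helper are assumptions added for illustration, since the slide shows only a fragment without a root element. The OCR misspelling in the species value is kept exactly as it appears on the slide.

```python
import xml.etree.ElementTree as ET

# Fragment as shown on the slide, OCR error included.
FRAGMENT = """<genus>Polygonum</genus>
<species>viviparm</species>,
<author>L.</author>
<locality>Bushnan Island.</locality>"""


def parse_record(fragment: str) -> dict:
    """Wrap the fragment in a hypothetical <record> root and map tag -> text."""
    root = ET.fromstring(f"<record>{fragment}</record>")
    # Punctuation between elements (the comma) is tail text and is ignored.
    return {child.tag: (child.text or "").strip() for child in root}


if __name__ == "__main__":
    record = parse_record(FRAGMENT)
    print(record["genus"], record["species"], "|", record["locality"])
```

Once the fields are separated like this, a name, its author, and its locality can each be indexed or linked independently of the page image.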
10.3 million name strings in NameBank
Uses a sophisticated algorithm (TaxonGrab) to locate likely name strings in OCR text
Iterative processing of BHL texts will both increase the number of name strings in NameBank and increase the accuracy of name string recognition
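The general idea of locating name strings in OCR text and then checking them against a name list can be sketched as below. This is not TaxonGrab or the uBio implementation: the regular expression is a deliberately naive stand-in (a capitalised word followed by a lower-case word), and `KNOWN_NAMES` is a tiny hypothetical list standing in for NameBank. The fuzzy matching uses `difflib.get_close_matches` from the standard library.

```python
import difflib
import re

# Naive candidate pattern: "Genus epithet" (capitalised word + lower-case word).
CANDIDATE = re.compile(r"\b[A-Z][a-z]{2,}\s+[a-z]{3,}\b")

# Hypothetical stand-in for a large name list such as NameBank.
KNOWN_NAMES = [
    "Pyrola rotundifolia",
    "Cassiope tetragona",
    "Vaccinium uliginosum",
]


def find_candidates(ocr_text: str) -> list:
    """Return strings in the OCR text that look like binomial names."""
    return CANDIDATE.findall(ocr_text)


def resolve(candidate: str, known=KNOWN_NAMES, cutoff: float = 0.8):
    """Return the closest known name, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(candidate, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    ocr = ("Pyrola rotndifolia, L. Whale Fish Island. Casiope tetragona, Don. "
           "Ichauti. Vaccinium uliginosum, L.")
    for cand in find_candidates(ocr):
        print(cand, "->", resolve(cand))
```

A pattern this simple also matches ordinary English such as "The quick", which is one reason a real name finder needs a large name list and repeated passes over the text.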
Demonstration III: Taxonomic Intelligence
http://www.ubio.org
Search
Browse
Demonstration IV: BHL Portal
http://www.biodiversitylibrary.org
Funding from the MacArthur and Sloan Foundations
Part of the larger Encyclopedia of Life project
Structure of the Encyclopedia of Life (Serine Molecule)
Synthesis Center: Field Museum
Biodiversity Heritage Library
Secretariat: Smithsonian
Education & Outreach: Smithsonian/Harvard
Informatics: Marine Biological Laboratory & MOBOT
Let us … rejoice in the fact, that we have realised what no other kingdom can boast of, and that such vast and harmoniously related accumulation of knowledge is gathered together around a library Charles Darwin, et al, 1858
Thanks to:
Chris Freeland, Missouri Botanical Garden
Neil Thomson, Natural History Museum, London
David Remsen, Global Biodiversity Information Facility
Neil Sarkar, Marine Biological Laboratory/Woods Hole Oceanographic Institution
Anna Weitzman, National Museum of Natural History
Chris Lyal, Natural History Museum, London
The staff at the Internet Archive
Images from
The Galaxy of Images, Smithsonian Libraries (www.sil.si.edu/imagegalaxy)