The Biodiversity Heritage Library Mass Digitizing Project: A Grandeur in this View of Digital Libraries



      • A Grandeur in this View of Digital Libraries
      • Martin R. Kalfatovic
      • Suzanne C. Pilsk
      • Smithsonian Institution Libraries
      • LITA Forum
      • 6 October 2007
      • Denver, Colorado
  1. There is grandeur in this view of life, with its several powers, having been originally breathed into a few forms or into one; and that, whilst this planet has gone cycling on according to the fixed law of gravity, from so simple a beginning endless forms most beautiful and most wonderful have been, and are being, evolved. ~ Charles Darwin, The Origin of Species, 1859
    • Boutique Scanning
    • Scanning back cameras
    • All special handling
    • Slow production
    • Expensive
    • Rare Books
    • Obvious choice
    • “Pretty pictures”
    • Rare Books
    • Rarely had textual access (OCR, etc.)
    • Difficult to link to other materials
    • Mass Scanning Projects
    • Google Books
    • MSN Live
    • Internet Archive/Open Content Alliance
  2.  
    • Difficult (impossible?) to repurpose much of the material
    • Quality of images often questionable
    • “Frankenbooks”
    • Sketchy / inaccurate bibliographic data
  3.  
    • Content is more re-purposable than Google
    • Content not fully open
    • Nice search interface
    • Still, no context!
  4.  
    • Open Access
    • Hodge-podge of collections
    • Interface hard to use!
    • Still no, or very little, context
  5.  
  6.  
    • United States Exploring Expedition
    • Biologia Centrali-Americana
  7.  
    • Biologia Centrali-Americana
      • Large-scale content and repurposing text
      • INOTAXA
  8.  
  9.  
  10.  
    • Our Man Sherborn – the Squire
    • Cataloger at heart
    • Slowly went through every relevant text looking for names
    • Created an index that was useful as soon as he started
    • Index AND Bibliography of relevant texts from 1758 through 1850
  11.  
  12.  
    • Re-keying of data
    • Parsing data
    • Using tools to harvest data
    • Manually matching data
    • Wealth of information locked on the page is being liberated!
  13.  
  14.  
    • Current Uses
      • Look up individual species
      • Batch whole data sets for comparisons
    • Future Possibilities
      • Updated species information
      • Accurate images of species
      • Geographic distributions
      • Other databases can link (such as uBio)
      • Future prioritizing of scans
    • Demonstration I: Index Animalium and Nomenclator Zoologicus
    • http://www.sil.si.edu/digitalcollections/indexanimalium/
    • http://uio.mbl.edu/NomenclatorZoologicus/
    • Nomina si nescis, perit et cognitio rerum
    • Who knoweth not the name, knoweth not the subject
    • ~ Linnaeus, 1737, Critica Botanica n. 210
    • Over 250 years of systematic description of life
    • Systema naturae (10th ed., 1758) by Carl von Linné
    • Binomial Nomenclature
    • Genus name plus species epithet (descriptor)
    • Latin or Latin-ized
    • Bill Gates' Flower Fly
    • Eristalis gatesi Thompson
    • International Code of Botanical Nomenclature (St. Louis Code)
    • International Code of Zoological Nomenclature
    • International Code of Phylogenetic Nomenclature
    • Index Animalium’s Citation:
      • albimanus Delphinus, T. R. Peale in Wilkes, Expl. Exped. VIII. 1848, 33
    • On page 33 of volume 8 of Charles Wilkes’s 1848 United States Exploring Expedition publication, Delphinus albimanus was first named by T. R. Peale.
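The Index Animalium citation above and its plain-English reading can be connected with a small parser, the kind of step the "Parsing data" bullet refers to. This is a minimal sketch: the regular expression and function names are hypothetical and fitted to this one entry, and real Index Animalium entries vary far more than a single pattern can cover.

```python
import re

# Hypothetical pattern fitted to the single Index Animalium entry above:
# "<epithet> <Genus>, <author> in <work>, <abbrev. title> <ROMAN>. <year>, <page>"
INDEX_ANIMALIUM_ENTRY = re.compile(
    r"^(?P<epithet>[a-z]+)\s+(?P<genus>[A-Z][a-z]+),\s+"
    r"(?P<author>.+?)\s+in\s+(?P<work>.+?),\s+"
    r"(?P<title>.+?)\s+(?P<volume>[IVXLC]+)\.\s+"
    r"(?P<year>\d{4}),\s+(?P<page>\d+)$"
)

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100}


def roman_to_int(numeral):
    """Convert a Roman numeral such as 'VIII' to an integer."""
    total = 0
    for i, ch in enumerate(numeral):
        value = ROMAN_VALUES[ch]
        # Subtractive notation: a smaller numeral before a larger one (IV, IX, XL).
        if i + 1 < len(numeral) and ROMAN_VALUES[numeral[i + 1]] > value:
            total -= value
        else:
            total += value
    return total


def parse_entry(text):
    """Split an entry into named fields, or return None if it does not fit the pattern."""
    m = INDEX_ANIMALIUM_ENTRY.match(text)
    if m is None:
        return None
    return {
        "name": f"{m['genus']} {m['epithet']}",  # the index lists the epithet first
        "author": m["author"],
        "work": m["work"],
        "title": m["title"],
        "volume": roman_to_int(m["volume"]),
        "year": int(m["year"]),
        "page": int(m["page"]),
    }


print(parse_entry("albimanus Delphinus, T. R. Peale in Wilkes, Expl. Exped. VIII. 1848, 33"))
```

Run on the Delphinus albimanus entry, it yields volume 8, year 1848 and page 33, matching the reading above.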
  15. Agatea violaris: type specimen from the U.S. National Herbarium (Smithsonian Institution), collected by the United States Exploring Expedition, 1838-1842
  16.  
    • Specimen
    • Plate or other visual image
    • Taxonomic description
  17.  
  18. The cited half-life of publications in taxonomy is longer than in any other scientific discipline. * * * The decay rate is longer than in any scientific discipline. ~ Macro-economic case for open access, Tom Moritz
    • Taxonomic descriptions must be published for the name to be valid
    • Publications must be available to the public through trusted sources
    • Libraries have been the traditional place
    • Specimen collections
    • Databases
    • Publications
    • Observations
    • ‘Gray’ literature
    • Index cards
    • Field notebooks
  19. Biologia Centrali-Americana. Edited by Frederick Ducane Godman and Osbert Salvin. London : Pub. for the editors by R. H. Porter, 1879-1915
  20.  
  21. Vishwas Chavan travels a lot. An informatician based at the National Chemical Laboratory in Pune, India, he collects data on what types of animal live where in India to enter into a biodiversity database … Much of the information Chavan seeks is in old, out-of-print tomes … To find them, Chavan has spent years trailing around libraries. He dreams of the day when books such as these are scanned and made available as digital files on the Internet. ~ “Science in the Web Age: The Real Death of Print” by Andreas von Bubnoff, Nature 438, 550-552, 1 December 2005
  22.  
  23.  
  24.  
    • What is Biodiversity?
    • Ecosystems and landscapes
    • Diversity of species
    • Genetic variability within species
  25.  
  26.  
  27.  
  28.  
    • Wholesome food
    • Drinkable water
    • Breathable air
    • Stable climate for
      • Forestry
      • Agriculture
      • Fisheries
    • Waste decomposition
    • Bioremediation
    • Invasive species
    • Pest control
    • Ecotourism
    • Pharmaceuticals
    • Genomics
    • Proteomics
    • Bioengineering
    • Biotechnology
    • Molecular design
    • Imitating nature
    • Designer organisms
    • Renewable feedstocks
    • Envirofriendly
    • Manufacturing processes
  29.  
    • 2003. Telluride. Encyclopedia of Life meeting
    • February 2005. London. Library and Laboratory: the Marriage of Research, Data and Taxonomic Literature
    • May 2005. Washington. Ground work for the Biodiversity Heritage Library
    • June 2006. Washington. Organizational and Technical meeting
    • August 2006. New York Botanical Garden. BHL Directors’ Meeting.
    • October 2006. St. Louis/San Francisco. Technical meetings
    • February 2007. Museum of Comparative Zoology. Organizational meeting
    • May 2007. Washington, DC. Encyclopedia of Life launch.
    • September 2007. Missouri Botanical Garden. Technical and Organizational Meeting. St. Louis, Missouri.
    • American Museum of Natural History (New York)
    • Field Museum (Chicago)
    • Natural History Museum (London)
    • Smithsonian Institution (Washington)
    • Missouri Botanical Garden (St. Louis)
    • New York Botanical Garden (New York)
    • Royal Botanic Gardens, Kew
    • Botany Libraries, Harvard University
    • Ernst Mayr Library of the Museum of Comparative Zoology, Harvard University
    • Marine Biological Laboratory / Woods Hole Oceanographic Institution
    • Core literature pre-1923: 400,000 (80 million pages)
    • All pre-1923: 600,000-750,000 (120-150 million pages)
    • All literature: 1.4-1.6 million (280-320 million pages)
    • Most literature is in
      • the developed world
      • the Northern Hemisphere
    • Most biodiversity is in
      • the developing world
      • the Southern Hemisphere
    • Most literature is
      • in large libraries formed in the 19th century
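The three size estimates above share one assumption: each works out to an average of 200 pages per item (the slide does not say whether an item is a volume or a title). A quick arithmetic check, reading "600-750,000" as 600,000 to 750,000; the variable and function names are illustrative only.

```python
# Slide estimates as (label, items_low, items_high, pages_low, pages_high).
ESTIMATES = [
    ("Core literature pre-1923", 400_000, 400_000, 80_000_000, 80_000_000),
    ("All pre-1923", 600_000, 750_000, 120_000_000, 150_000_000),
    ("All literature", 1_400_000, 1_600_000, 280_000_000, 320_000_000),
]


def pages_per_item(items_low, items_high, pages_low, pages_high):
    """Average pages per item implied by the low and high ends of a range."""
    return pages_low / items_low, pages_high / items_high


for label, items_lo, items_hi, pages_lo, pages_hi in ESTIMATES:
    low, high = pages_per_item(items_lo, items_hi, pages_lo, pages_hi)
    print(f"{label}: {low:.0f} to {high:.0f} pages per item")
```

Every line reports 200 to 200, so the page totals are the item counts scaled by a flat 200 pages, which suggests a planning average rather than a measured page count.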
  30.  
  31. Who has what? What should we scan and when? Monographs vs. serials; series treated as separates. Can it be found and used once scanned?
      • Initial Metadata Analysis:
      • We have 1.3 million catalogue records
      • 73% are monographs (remainder are serials at title-level)
      • 63% is English-language material; the next most common language is German (9%).
      • About 30% of material was published before 1923.
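Figures like these are tallies over catalogue records. A minimal sketch, assuming the type, language and date of each record have already been extracted into a hypothetical `Record` type; in MARC data they would typically come from the Leader/07 bibliographic-level code and the 008 language and date positions, none of which is parsed here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical, already-extracted catalogue fields (not a real MARC parser)."""
    kind: str             # "monograph" or "serial"
    language: str         # e.g. "eng", "ger"
    year: Optional[int]   # publication year, None when unknown


def summarize(records):
    """Tally the shares reported in an initial metadata analysis."""
    total = len(records)
    kinds = Counter(r.kind for r in records)
    languages = Counter(r.language for r in records)
    # Records with no usable date count toward the total but never as pre-1923.
    pre_1923 = sum(1 for r in records if r.year is not None and r.year < 1923)
    return {
        "monograph_share": kinds["monograph"] / total,
        "top_languages": [(lang, n / total) for lang, n in languages.most_common(2)],
        "pre_1923_share": pre_1923 / total,
    }
```

Keeping undated records in the denominator means the pre-1923 share is a lower bound whenever dates are missing.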
  32.  
  33.  
    • Scalable Mass Scanning
    • Contracts
    • Firewalls
    • Bathrooms
    • Security
    • Loading docks
    • Trucks
  34.  
  35.  
  36.  
    • Mass Scanning Workflow
    • Pick lists
    • Packing lists
    • Serials management
    • Monographic management
    • Stickers for books
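Building a pick list is where "who has what?" meets the shelves. The sketch below is hypothetical and is not the BHL workflow tools: it assumes each holding can be matched across libraries by a shared key (such as an OCLC number) plus a volume designation, drops anything already scanned or duplicated, and returns the rest in shelf order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical holding record used only for this sketch."""
    title_key: str    # a shared identifier for matching copies across libraries
    volume: str       # "" for a monograph, e.g. "v. 2" for a serial volume
    call_number: str  # local shelf location


def pick_list(holdings, already_scanned):
    """Items not yet scanned anywhere, one copy per title/volume, in shelf order.

    Call numbers are sorted as plain strings, a simplification: real
    Library of Congress call numbers need a dedicated sort key.
    """
    seen = set(already_scanned)
    picks = []
    for item in sorted(holdings, key=lambda i: i.call_number):
        key = (item.title_key, item.volume)
        if key in seen:
            continue  # already scanned elsewhere, or a duplicate copy
        seen.add(key)
        picks.append(item)
    return picks
```

Treating the (title, volume) pair as the unit lets serial volumes be picked individually, while monographs simply use an empty volume string.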
  37.  
  38. Demonstration II: Workflow Tools
  39.  
  40.  
  41.  
  42.  
  43.  
    • Stable URL
    • Handle
    • DOI
    • BICI/SICI
    • ISSN
    • ISBN
    http://www.biodiversitylibrary.org
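Of the identifiers listed, ISSN and ISBN carry a check digit, which makes a quick validity test possible when harvesting metadata. A sketch of the standard rules: for ISSN, the first seven digits are weighted 8 down to 2 and the check character is (11 - sum mod 11) mod 11, with 10 written as X; for ISBN-13, digits are weighted alternately 1 and 3 and the check digit is (10 - sum mod 10) mod 10. The function names are illustrative.

```python
def issn_is_valid(issn):
    """Check an ISSN such as '0378-5955' against its mod-11 check character."""
    chars = issn.replace("-", "").upper()
    if len(chars) != 8 or not chars[:7].isdecimal():
        return False
    # The first seven digits are weighted 8, 7, 6, 5, 4, 3, 2.
    total = sum(int(d) * w for d, w in zip(chars[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return chars[7] == ("X" if check == 10 else str(check))


def isbn13_is_valid(isbn):
    """Check an ISBN-13 such as '978-0-306-40615-7' against its mod-10 check digit."""
    digits = isbn.replace("-", "")
    if len(digits) != 13 or not digits.isdecimal():
        return False
    # Weights alternate 1, 3, 1, 3, ... over the first twelve digits.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return int(digits[12]) == (10 - total % 10) % 10
```

For example, `issn_is_valid("0378-5955")` and `isbn13_is_valid("978-0-306-40615-7")` both return True.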
  44.  
    • Biologia Centrali-Americana : zoology, botany and archaeology
      • Mammalia
      • Aves v. 1
      • Aves v. 4
      • Reptilia and Batrachia
    • Biologia Centrali-Americana : Aves
      • v. 1 Introduction -- Subclass Aves Carinate. Order Passeres.
      • v. 2 Subclass Aves Carinate. Order[s]: Passeres (contd.), Macrochires, Pici, Coccyges, Psitta
    Is it Or
    • Zea mays L.
      • Sp. Pl. 2: 971-972. 1753.
    • Title: Species Plantarum
        • TL2: 4.769
        • Tropicos Pub_id: 1071
        • IPNI Pub_id: 1071-2
    • Volume: 2
    • Start Page: 971
    • End Page: 972
    • Year Published: 1753
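The Zea mays record shows an abbreviated citation ("Sp. Pl. 2: 971-972. 1753.") broken into volume, start page, end page and year. A minimal sketch of that step, assuming citations follow this exact shape; the pattern and function are hypothetical, and expanding "Sp. Pl." to Species Plantarum or attaching the TL-2, Tropicos and IPNI identifiers would need a separate lookup table not shown here.

```python
import re

# Hypothetical pattern for short citations shaped like "Sp. Pl. 2: 971-972. 1753."
BOTANICAL_CITATION = re.compile(
    r"^(?P<title>.+?)\s+(?P<volume>\d+):\s*"
    r"(?P<start>\d+)(?:-(?P<end>\d+))?\.\s*(?P<year>\d{4})\.?$"
)


def parse_botanical_citation(text):
    """Map an abbreviated citation onto volume, page range and year fields."""
    m = BOTANICAL_CITATION.match(text.strip())
    if m is None:
        return None
    start = int(m["start"])
    return {
        "abbreviated_title": m["title"],  # resolving "Sp. Pl." needs a lookup table
        "volume": int(m["volume"]),
        "start_page": start,
        "end_page": int(m["end"]) if m["end"] else start,  # single-page citation
        "year": int(m["year"]),
    }
```

A single-page citation such as "Sp. Pl. 2: 971. 1753." gets the same start and end page.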
    http://hdl.handle.net/1234567/9876 http://www.biodiversitylibrary.org/page/358992
    • Page turning, multiple views, translations
    • PDF “Grab-n-Go”
    OCR Text Stellaria ongipes ? Goldie. Cornwallis Island. Silene acaWlis, L. Woman Islands. Sawfraga oppositifolia, L. Kakkidlarn, Greenland; Cornwallis Island. Sa4mfraga cerua, De Cand. Ukaari; Cornwallis Island. Sairaga csepitosa, L. Wolstenholme, Greenland; Cornwallis Island. Saxfraga rimdaris, De Cand. Whale Fish Island. Sad~ raga nivalis, L. Cornwallis Island. Ptentilla emarginata? Pursh. Wolstenholme. Dipena Lapponica, L. Whale Fish Island. Pyrola rotndifolia, L. Whale Fish Island. Casiope tetragona, Don. Ichauti. Vaccinium uliginosum, L. Loiseleuria procumbens, L. Whale Fish Island.
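The OCR sample shows why name data still needs matching: "Sawfraga", "Sa4mfraga" and "Sad~raga" are all damaged forms of Saxifraga. One way to speed up manual matching is to propose candidates for a person to confirm. The sketch below swaps in a general-purpose string-similarity measure from Python's standard `difflib` module (not a method described in this presentation); the genus list and function name are hypothetical stand-ins for a real name index.

```python
import difflib

# Hypothetical reference list; a real workflow would draw on a name index.
KNOWN_GENERA = ["Saxifraga", "Silene", "Stellaria", "Potentilla", "Pyrola", "Vaccinium"]


def suggest_genus(ocr_word, known=KNOWN_GENERA, cutoff=0.7):
    """Close matches for an OCR-damaged genus name, best first, for a person to confirm."""
    return difflib.get_close_matches(ocr_word, known, n=3, cutoff=cutoff)


for garbled in ["Sawfraga", "Sa4mfraga", "Ptentilla"]:
    print(garbled, "->", suggest_genus(garbled))
```

Raising `cutoff` yields fewer but safer suggestions; an empty list means the word should go to a person unassisted.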
    • <genus>Polygonum</genus>
    • <species>viviparm</species>,
    • <author>L.</author>
    • <locality>Bushnan Island.</locality>
    • Created via automated or
    • semi-automated means.
    Machine-readable literature
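The tagged example above can be produced from already-parsed fields with Python's standard ElementTree module, which also escapes characters such as `&`. The `genus`, `species`, `author` and `locality` tags are taken from the slide; the `occurrence` wrapper element and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET


def mark_up_occurrence(genus, species, author, locality):
    """Wrap parsed parts of a name-and-locality line in simple XML tags."""
    record = ET.Element("occurrence")  # hypothetical wrapper element
    for tag, text in (("genus", genus), ("species", species),
                      ("author", author), ("locality", locality)):
        ET.SubElement(record, tag).text = text
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(record, encoding="unicode")


print(mark_up_occurrence("Polygonum", "viviparum", "L.", "Bushnan Island."))
```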
  45.  
  46.  
  47.  
    • 10.3 million name strings in NameBank
    • Uses the TaxonGrab algorithm to locate likely name strings in OCR text
    • Iterative processing of BHL texts will both increase the number of name strings in NameBank and increase the accuracy of name string recognition
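How TaxonGrab works internally is not described here. As a rough illustration of the general task of finding name-like strings in OCR text, here is a deliberately simplified heuristic: a capitalized word followed by a lowercase word, discarding pairs that contain ordinary vocabulary. The tiny word list and function name are hypothetical, and real name finders also handle abbreviated genera, authorities and OCR noise that this ignores.

```python
import re

# Hypothetical, tiny stand-in for a real dictionary of ordinary words.
ORDINARY_WORDS = {"the", "island", "islands", "fish", "whale", "of", "and"}

# A capitalized word (possible genus) followed by a lowercase word (possible epithet).
BINOMIAL = re.compile(r"\b([A-Z][a-z]+)\s+([a-z]{3,})\b")


def candidate_names(text, ordinary=ORDINARY_WORDS):
    """Return 'Genus epithet' strings in which neither word is ordinary vocabulary."""
    found = []
    for genus, epithet in BINOMIAL.findall(text):
        if genus.lower() in ordinary or epithet in ordinary:
            continue  # e.g. "The island" is prose, not a name
        found.append(f"{genus} {epithet}")
    return found
```

Feeding newly confirmed names back into the reference list is one way to picture the iterative improvement the last bullet describes.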
  48.  
    • Demonstration III: Taxonomic Intelligence
    • http://www.ubio.org
  49.  
  50.  
    • Search
    • Browse
    • Demonstration IV: BHL Portal
    • http://www.biodiversitylibrary.org
  51.  
    • Funding from the MacArthur and Sloan Foundations
    • Part of the larger Encyclopedia of Life project
  52.  
  53. Structure of the Encyclopedia of Life Serine Molecule
  54. Serine Molecule
      • Synthesis Center: Field Museum
      • Biodiversity Heritage Library
      • Secretariat: Smithsonian
      • Education & Outreach: Smithsonian/Harvard
      • Informatics: Marine Biological Laboratory & MOBOT
  55.  
  56.  
  57.  
  58. Let us … rejoice in the fact, that we have realised what no other kingdom can boast of, and that such vast and harmoniously related accumulation of knowledge is gathered together around a library. ~ Charles Darwin et al., 1858
  59.  
    • Thanks to:
      • Chris Freeland, Missouri Botanical Garden
      • Neil Thomson, Natural History Museum, London
      • David Remsen, Global Biodiversity Information Facility
      • Neil Sarkar, Marine Biological Laboratory/Woods Hole Oceanographic Institution
      • Anna Weitzman, National Museum of Natural History
      • Chris Lyal, Natural History Museum, London
      • The staff at the Internet Archive
    • Images from
      • The Galaxy of Images, Smithsonian Libraries (www.sil.si.edu/imagegalaxy)
      • NASA, Visible Earth Project
      • Martin R. Kalfatovic
      • Diane Rielinger
    • Biodiversity Heritage Library http://www.biodiversitylibrary.org/
    • Encyclopedia of Life http://www.eol.org/
    • Smithsonian Institution Libraries http://www.sil.si.edu/
    • Universal Biological Indexer and Organizer http://www.ubio.org/
    • Sherborn’s Index Animalium http://www.sil.si.edu/digitalcollections/indexanimalium/
    • Neave’s Nomenclator Zoologicus http://www.ubio.org/NomenclatorZoologicus/
    • United States Exploring Expedition http://www.sil.si.edu/DigitalCollections/usexex/index.htm
    • Biologia Centrali-Americana http://www.sil.si.edu/digitalcollections/bca/
    • Botanicus http://www.botanicus.org/


CC Attribution-ShareAlike License
