The Biodiversity Heritage Library Mass Digitizing Project: A Grandeur in this View of Digital Libraries
The Biodiversity Heritage Library Mass Digitizing Project: A Grandeur in this View of Digital Libraries by Martin R. Kalfatovic and Suzanne C. Pilsk, Smithsonian Institution Libraries. LITA National Forum, October 2007. Denver, Colorado.


Presentation Transcript

      • A Grandeur in this View of Digital Libraries
      • Martin R. Kalfatovic
      • Suzanne C. Pilsk
      • Smithsonian Institution Libraries
      • LITA Forum
      • 6 October 2007
      • Denver, Colorado
  • There is grandeur in this view of life, with its several powers, having been originally breathed into a few forms or into one; and that, whilst this planet has gone cycling on according to the fixed law of gravity, from so simple a beginning endless forms most beautiful and most wonderful have been, and are being, evolved. Charles Darwin, The Origin of Species, 1859
    • Boutique Scanning
    • Scanning back cameras
    • All special handling
    • Slow production
    • Expensive
    • Rare Books
    • Obvious choice
    • “Pretty pictures”
    • Rare Books
    • Rarely had textual access (OCR, etc.)
    • Difficult to link to other materials
    • Mass Scanning Projects
    • Google Books
    • MSN Live
    • Internet Archive/Open Content Alliance
  •  
    • Difficult (impossible?) to repurpose much of the material
    • Quality of images often questionable
    • “Frankenbooks”
    • Sketchy / inaccurate bibliographic data
  •  
    • Content is more re-purposable than Google
    • Content not fully open
    • Nice search interface
    • Still, no context!
  •  
    • Open Access
    • Hodge-podge of collections
    • Interface hard to use!
    • Still, no, or very little context
  •  
  •  
    • United States Exploring Expedition
    • Biologia Centrali-Americana
  •  
    • Biologia Centrali-Americana
      • Large scale content and repurposing text
      • INOTAXA
  •  
  •  
  •  
    • Our Man Sherborn – the Squire
    • Cataloger at heart
    • Slowly went through every relevant text looking for names
    • Created an index that was useful as soon as he started
    • Index AND Bibliography of relevant texts from 1758 through 1850
  •  
  •  
    • Re-keying of data
    • Parsing data
    • Using tools to harvest out data
    • Manually matching data
    • Wealth of information locked on the page is being liberated!
  •  
  •  
    • Current Uses
      • Look up individual species
      • Batch whole data sets for comparisons
    • Future Possibilities
      • Updated species information
      • Accurate images of species
      • Geographic distributions
      • Other databases can link (such as uBio)
      • Future prioritizing of scans
    • Demonstration I: Index Animalium and Nomenclator Zoologicus
    • http://www.sil.si.edu/digitalcollections/indexanimalium/
    • http://uio.mbl.edu/NomenclatorZoologicus/
    • Nomina si nescis, perit et cognitio rerum
    • Who knoweth not the name, knoweth not the subject
    • ~ Linnaeus, 1737, Critica Botanica n. 210
    • Over 250 years of systematic description of life
    • Systema naturae (10th ed., 1758) by Carl von Linné
    • Binomial Nomenclature
    • Genus name and Species epithet or descriptor
    • Latin or Latin-ized
    • Bill Gates' Flower Fly
    • Eristalis gatesi Thompson
    • International Code of Botanical Nomenclature (St. Louis Code)
    • International Code of Zoological Nomenclature
    • International Code of Phylogenetic Nomenclature
    • Index Animalium’s Citation:
      • albimanus Delphinus, T. R. Peale in Wilkes, Expl. Exped. VIII. 1848, 33
    • On page 33 of volume 8 of Charles Wilkes’s 1848 publication, Narrative of the United States Exploring Expedition, Delphinus albimanus was first named by T. R. Peale.
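The Index Animalium entry above follows a regular order: species epithet, genus, author, abbreviated source, volume, year, page. A minimal sketch of splitting such an entry into fields is shown below. The regular expression and field names are assumptions fitted to this single example entry; they are not the project's actual parsing rules, and real entries vary far more.

```python
import re

# Hypothetical pattern fitted to the one example entry on the slide.
ENTRY = re.compile(
    r"^(?P<epithet>[a-z]+)\s+(?P<genus>[A-Z][a-z]+),\s+"
    r"(?P<author>.+?)\s+in\s+(?P<source>.+?)\s+"
    r"(?P<volume>[IVXLC]+)\.\s+(?P<year>\d{4}),\s+(?P<page>\d+)$"
)


def parse_entry(text):
    """Return the entry's fields as a dict, or None if the pattern does not fit."""
    match = ENTRY.match(text.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Index Animalium lists the epithet first; the binomial puts the genus first.
    fields["binomial"] = f"{fields['genus']} {fields['epithet']}"
    return fields


entry = parse_entry(
    "albimanus Delphinus, T. R. Peale in Wilkes, Expl. Exped. VIII. 1848, 33"
)
print(entry["binomial"], entry["author"], entry["year"], entry["page"])
```

Entries that do not fit the pattern return None, which leaves them for the manual matching step the slides mention.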
  • Agatea violaris Type specimen from the U.S. National Herbarium (Smithsonian Institution) collected by the United States Exploring Expedition, 1838-1842
  •  
    • Specimen
    • Plate or other visual image
    • Taxonomic description
  •  
  • The cited half-life of publications in taxonomy is longer than in any other scientific discipline * * * The decay rate is longer than in any scientific discipline ~ Macro-economic case for open access, Tom Moritz
    • Taxonomic descriptions must be published for the name to be valid
    • Publications must be available to the public through trusted sources
    • Libraries have been the traditional place
    • Specimen collections
    • Databases
    • Publications
    • Observations
    • ‘Gray’ literature
    • Index cards
    • Field notebooks
  • Biologia Centrali-Americana. Edited by Frederick Ducane Godman and Osbert Salvin. London : Pub. for the editors by R. H. Porter, 1879-1915
  •  
  • Vishwas Chavan travels a lot. An informatician based at the National Chemical Laboratory in Pune, India, he collects data on what types of animal live where in India to enter into a biodiversity database … Much of the information Chavan seeks is in old, out-of-print tomes … To find them, Chavan has spent years trailing around libraries. He dreams of the day when books such as these are scanned and made available as digital files on the Internet. “Science in the Web Age: The Real Death of Print” by Andreas von Bubnoff, Nature 438, 550-552, 1 December 2005
  •  
  •  
  •  
    • What is Biodiversity?
    • Ecosystems and landscapes
    • Diversity of species
    • Genetic variability within species
  •  
  •  
  •  
  •  
    • Wholesome food
    • Drinkable water
    • Breathable air
    • Stable climate for
      • Forestry
      • Agriculture
      • Fisheries
    • Waste decomposition
    • Bioremediation
    • Invasive species
    • Pest control
    • Ecotourism
    • Pharmaceuticals
    • Genomics
    • Proteomics
    • Bioengineering
    • Biotechnology
    • Molecular design
    • Imitating nature
    • Designer organisms
    • Renewable feedstocks
    • Envirofriendly manufacturing processes
  •  
    • 2003, Telluride. Encyclopedia of Life meeting
    • February 2005. London. Library and Laboratory: the Marriage of Research, Data and Taxonomic Literature
    • May 2005. Washington. Ground work for the Biodiversity Heritage Library
    • June 2006. Washington. Organizational and Technical meeting
    • August 2006. New York Botanical Garden. BHL Director’s Meeting.
    • October 2006. St. Louis/San Francisco. Technical meetings
    • February 2007. Museum of Comparative Zoology. Organizational meeting
    • May 2007. Encyclopedia of Life Launch. Washington DC.
    • September 2007. Missouri Botanical Garden. Technical and Organizational Meeting. St. Louis, Missouri.
    • American Museum of Natural History (New York)
    • Field Museum (Chicago)
    • Natural History Museum (London)
    • Smithsonian Institution (Washington)
    • Missouri Botanical Garden (St. Louis)
    • New York Botanical Garden (New York)
    • Royal Botanic Garden, Kew
    • Botany Libraries, Harvard University
    • Ernst Mayr Library of the Museum of Comparative Zoology, Harvard University
    • Marine Biological Laboratory / Woods Hole Oceanographic Institution
    • Core literature pre-1923: 400,000 (80 million pages)
    • All pre-1923: 600-750,000 (120-150 million pages)
    • All literature: 1.4-1.6 million (280-320 million pages)
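The page totals above appear to be derived from a fixed pages-per-volume figure. A quick check, assuming the first number in each line counts volumes (the slide does not say so explicitly):

```python
# Figures copied from the slide as (low, high) ranges.
# Assumption: the first number in each line is a count of volumes.
estimates = {
    "core pre-1923": ((400_000, 400_000), (80_000_000, 80_000_000)),
    "all pre-1923": ((600_000, 750_000), (120_000_000, 150_000_000)),
    "all literature": ((1_400_000, 1_600_000), (280_000_000, 320_000_000)),
}


def pages_per_volume(volumes, pages):
    """Average number of pages implied for one volume."""
    return pages / volumes


ratios = {
    label: tuple(pages_per_volume(v, p) for v, p in zip(vols, pages))
    for label, (vols, pages) in estimates.items()
}

for label, (low, high) in ratios.items():
    print(f"{label}: {low:.0f} to {high:.0f} pages per volume")
```

Every range works out to 200 pages per volume, so the page counts read as the volume estimates multiplied by that average rather than as independent measurements.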
    • Most literature is in
    • the developed world
    • the Northern Hemisphere
    • Most Biodiversity is in
    • developing world
    • the Southern Hemisphere
    • Most literature is
    • in large libraries formed in the 19th century
  •  
  • Who has what? What should we scan and when? Monographs vs. serials. Series treated as separates. Can it be found and used once scanned?
      • Initial Metadata Analysis:
      • We have 1.3 million catalogue records
      • 73% are monographs (remainder are serials at title-level)
      • 63% is English-language material. The next most common language is German (9%).
      • About 30% of material was published before 1923.
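As a rough illustration, the percentages above can be turned into approximate record counts. The 27% serials share is inferred from "remainder", and all counts are only as precise as the rounded percentages on the slide.

```python
TOTAL_RECORDS = 1_300_000  # catalogue records reported on the slide

# Percentages from the slide; the serials share is the stated remainder.
shares_percent = {
    "monographs": 73,
    "serials (title-level)": 27,
    "English-language": 63,
    "German-language": 9,
    "published before 1923": 30,
}

# Integer arithmetic avoids floating-point noise in the rounded estimates.
approx_counts = {
    label: TOTAL_RECORDS * pct // 100 for label, pct in shares_percent.items()
}

for label, count in approx_counts.items():
    print(f"{label}: about {count:,} records")
```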
  •  
  •  
    • Scalable Mass Scanning
    • Contracts
    • Firewalls
    • Bathrooms
    • Security
    • Loading docks
    • Trucks
  •  
  •  
  •  
    • Mass Scanning Workflow
    • Pick lists
    • Packing lists
    • Serials management
    • Monographic management
    • Stickers for books
  •  
  • Demonstration II: Workflow Tools
  •  
  •  
  •  
  •  
  •  
    • Stable URL
    • Handle
    • DOI
    • BICI/SICI
    • ISSN
    • ISBN
    http://www.biodiversitylibrary.org
  •  
    • Biologia Centrali-Americana :zoology, botany and archaeology
      • Mammalia
      • Aves v. 1
      • Aves v. 4
      • Reptilia and Batrachia
    • Biologia Centrali-Americana : Aves
      • v. 1 Introduction -- Subclass Aves Carinate. Order Passeres.
      • v. 2 Subclass Aves Carinate. Order[s]: Passeres (contd.), Macrochires, Pici, Coccyges, Psitta
    Is it Or
    • Zea mays L.
      • Sp. Pl. 2: 971-972. 1753.
    • Title: Species Plantarum
        • TL2: 4.769
        • Tropicos Pub_id: 1071
        • IPNI Pub_id:1071-2
    • Volume: 2
    • Start Page: 971
    • End Page: 972
    • Year Published: 1753
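The breakdown above maps the short citation "Sp. Pl. 2: 971-972. 1753." onto title, volume, page range and year. A minimal sketch of that mapping follows. The class name, the regular expression and the one-entry abbreviation table are hypothetical, written for this single example; they are not the BHL or Tropicos implementation.

```python
import re
from dataclasses import dataclass

# Hypothetical lookup table; a real system would consult an authority
# file of title abbreviations (the slide cites TL2, Tropicos and IPNI ids).
TITLE_ABBREVIATIONS = {"Sp. Pl.": "Species Plantarum"}

CITATION = re.compile(
    r"^(?P<title>.+?)\s+(?P<volume>\d+):\s*"
    r"(?P<start>\d+)(?:-(?P<end>\d+))?\.\s*(?P<year>\d{4})\.?$"
)


@dataclass
class MicroCitation:
    title: str
    volume: int
    start_page: int
    end_page: int
    year: int


def parse_micro_citation(text):
    """Split a 'Title vol: start-end. year.' citation into typed fields."""
    match = CITATION.match(text.strip())
    if match is None:
        return None
    abbrev = match["title"]
    start = int(match["start"])
    # A single-page citation has no end page; reuse the start page.
    end = int(match["end"]) if match["end"] else start
    return MicroCitation(
        title=TITLE_ABBREVIATIONS.get(abbrev, abbrev),
        volume=int(match["volume"]),
        start_page=start,
        end_page=end,
        year=int(match["year"]),
    )


citation = parse_micro_citation("Sp. Pl. 2: 971-972. 1753.")
print(citation)
```

Once the citation is in fields like these, it can be matched against scanned volumes and resolved to a page-level link.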
    http://hdl.handle.net/1234567/9876 http://www.biodiversitylibrary.org/page/358992
    • Page turning, multiple views, translations
    • PDF “Grab-n-Go”
    OCR Text Stellaria ongipes ? Goldie. Cornwallis Island. Silene acaWlis, L. Woman Islands. Sawfraga oppositifolia, L. Kakkidlarn, Greenland; Cornwallis Island. Sa4mfraga cerua, De Cand. Ukaari; Cornwallis Island. Sairaga csepitosa, L. Wolstenholme, Greenland; Cornwallis Island. Saxfraga rimdaris, De Cand. Whale Fish Island. Sad~ raga nivalis, L. Cornwallis Island. Ptentilla emarginata? Pursh. Wolstenholme. Dipena Lapponica, L. Whale Fish Island. Pyrola rotndifolia, L. Whale Fish Island. Casiope tetragona, Don. Ichauti. Vaccinium uliginosum, L. Loiseleuria procumbens, L. Whale Fish Island.
    • <genus>Polygonum</genus>
    • <species>viviparm</species>,
    • <author>L.</author>
    • <locality>Bushnan Island.</locality>
    • Created via automated or semi-automated means.
    Machine-readable literature
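Once a line of OCR text carries tags like those above, software can read it as structured data. The sketch below parses the tagged fragment from the slide with Python's standard XML library. The wrapping `<record>` element is an assumption added so the fragment has a single root; the slide does not show the enclosing markup.

```python
import xml.etree.ElementTree as ET

# Tagged fragment as shown on the slide, OCR misspelling included.
fragment = (
    "<genus>Polygonum</genus> <species>viviparm</species>, "
    "<author>L.</author> <locality>Bushnan Island.</locality>"
)


def read_tagged_record(tagged_text):
    """Return a dict mapping each tag name to its text content."""
    # Hypothetical <record> wrapper: XML needs exactly one root element.
    root = ET.fromstring(f"<record>{tagged_text}</record>")
    return {child.tag: child.text for child in root}


record = read_tagged_record(fragment)
print(record)
```

The markup preserves whatever the OCR produced, so spelling errors in the source text still need a separate name-matching step.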
  •  
  •  
  •  
    • 10.3 million name strings in NameBank
    • Uses sophisticated algorithm (TaxonGrab) to locate likely name strings in OCR text
    • Iterative processing of BHL texts will both increase the number of name strings in NameBank and increase the accuracy of name string recognition
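To give a feel for locating name strings in OCR text, here is a deliberately simple heuristic. It is not TaxonGrab, whose algorithm is not described in these slides. It flags a capitalized word followed by a lowercase word and keeps the pair only when the first word appears in a list of known genera. The `known_genera` set is a hypothetical stand-in for a lookup against a name index such as NameBank.

```python
import re

# Capitalized word (possible genus) followed by a lowercase word (possible epithet).
CANDIDATE = re.compile(r"\b([A-Z][a-z]{2,})\s+([a-z]{3,})\b")


def candidate_names(ocr_text, known_genera):
    """Return 'Genus epithet' strings whose genus is in known_genera."""
    found = []
    for genus, epithet in CANDIDATE.findall(ocr_text):
        if genus in known_genera:
            found.append(f"{genus} {epithet}")
    return found


# Excerpts from the OCR sample on the earlier slide, errors included.
sample = (
    "Silene acaWlis, L. Woman Islands. Vaccinium uliginosum, L. "
    "Loiseleuria procumbens, L. Whale Fish Island."
)
known_genera = {"Silene", "Vaccinium", "Loiseleuria"}  # hypothetical lookup list

names = candidate_names(sample, known_genera)
print(names)
```

The heuristic finds the two cleanly scanned names but misses "Silene acaWlis", where OCR put a capital letter inside the epithet. Misses of this kind are why growing the name list and reprocessing the texts can improve recognition.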
  •  
    • Demonstration III: Taxonomic Intelligence
    • http://www.ubio.org
  •  
  •  
    • Search
    • Browse
    • Demonstration IV: BHL Portal
    • http://www.biodiversitylibrary.org
  •  
    • Funding from the MacArthur and Sloan Foundations
    • Part of the larger Encyclopedia of Life project
  •  
  • Structure of the Encyclopedia of Life Serine Molecule
  • Serine Molecule Synthesis Center Field Museum Biodiversity Heritage Library Secretariat Smithsonian Education & Outreach Smithsonian/Harvard Informatics Marine Biological Laboratory & MOBOT
  •  
  •  
  •  
  • Let us … rejoice in the fact, that we have realised what no other kingdom can boast of, and that such vast and harmoniously related accumulation of knowledge is gathered together around a library Charles Darwin, et al, 1858
  •  
    • Thanks to:
      • Chris Freeland, Missouri Botanical Garden
      • Neil Thomson, Natural History Museum, London
      • David Remsen, Global Biodiversity Information Facility
      • Neil Sarkar, Marine Biological Laboratory/Woods Hole Oceanographic Institution
      • Anna Weitzman, National Museum of Natural History
      • Chris Lyal, Natural History Museum, London
      • The staff at the Internet Archive
    • Images from
      • The Galaxy of Images, Smithsonian Libraries (www.sil.si.edu/imagegalaxy)
      • NASA, Visible Earth Project
      • Martin R. Kalfatovic
      • Diane Rielinger
    • Biodiversity Heritage Library http://www.biodiversitylibrary.org/
    • Encyclopedia of Life http://www.eol.org/
    • Smithsonian Institution Libraries http://www.sil.si.edu/
    • Universal Biological Indexer and Organizer http://www.ubio.org/
    • Sherborn’s Index Animalium http://www.sil.si.edu/digitalcollections/indexanimalium/
    • Neave’s Nomenclator Zoologicus http://www.ubio.org/NomenclatorZoologicus/
    • United States Exploring Expedition http://www.sil.si.edu/DigitalCollections/usexex/index.htm
    • Biologia Centrali-Americana http://www.sil.si.edu/digitalcollections/bca/
    • Botanicus http://www.botanicus.org/