Biodiversity Heritage Library Project “Underway” Taxonomic Intelligence for LARGE Scale Digitization Projects
 


Biodiversity Heritage Library Project “Underway” Taxonomic Intelligence for LARGE Scale Digitization Projects by Cathy Norton, Marine Biological Laboratory / Woods Hole Oceanographic Institution Library. 33rd IAMSLIC: Changes on the Horizon.
October 7-11, 2007. Sarasota, FL

Usage Rights

CC Attribution-ShareAlike License


Presentation Transcript

  • Biodiversity Heritage Library Project “Underway”: Taxonomic Intelligence for LARGE Scale Digitization Projects. 33rd IAMSLIC Conference, Sarasota, FL. Cathy Norton, MBLWHOI Library Director. Oct. 7-11, 2007
  • This library serves the MBL, WHOI, USGS, NMFS, SEA, WHRC, and other scientific groups in the area. Facing a new dynamic phase. Woods Hole Scientific Community: NMFS (1871), MBL (1888), WHOI (1930), USGS (1960), SEA (1971), WHRC (1985)
    • Biodiversity Heritage Libraries
    • Open Content Alliance, Principles
    • Internet Archive Partner
    • Northeast Regional Digitizing Center @Boston Public Library
    • Taxonomic Intelligence: modernizing the literature
    TOPICS
  • Vision Build a Digital Open Access Library for Biodiversity Literature
  • Meetings: Colorado, 2005; London, 2005 (laboratories and libraries); Washington, BHL, 2006; simultaneous meetings in Woods Hole for BHL & EOL, 2006
  • Members
    • American Museum of Natural History
    • Botany Library, Harvard
    • Natural History Museum, London, UK
    • Field Museum
    • MBLWHOI Library
    • Missouri Botanical Garden
    • Museum of Comparative Zoology, Harvard
    • New York Botanical Garden
    • Royal Botanic Gardens, Kew, UK
    • Smithsonian Museum of Natural History
      • University of Illinois, contributing member
    • Legacy Taxonomic Literature available in museums has limited access
    • Much of it is rare
    • Systematic literature depends on the historic literature
    • The cited half-life of natural history literature is longer than that of any other scientific domain
    • 90% of Biodiversity Information is in these libraries
    • 90% of biodiversity is in developing countries, for example in Africa and South America
    Why BHL now?
  • The Open Content Alliance (OCA) represents the collaborative efforts of a group of cultural, technology, nonprofit, and governmental organizations from around the world that will help build a permanent archive of multilingual digitized text and multimedia content.
  • Principles of OCA
    • The OCA will encourage the greatest possible degree of access to and reuse of collections in the archive, while respecting the rights of content owners and contributors.
    • INTERNET ARCHIVE
    • Contributors will determine the terms and conditions under which their collections are distributed and how attribution should be made.
    • IA need not be obligated to accept all content that is offered to it and may give preference to that which can be made widely accessible.
    • IA will offer collection and item-level metadata of its hosted collections in a variety of formats.
    • IA welcomes efforts to create and offer tools (including finding aids, catalogs, and indexes) that will enhance the usability of the materials in the archive.
    • Copies of IA collections will reside in multiple archives internationally to ensure their long-term preservation and accessibility to all.
  • Name: BioDiversity Heritage Library. Wiki for all involved. Web presence! Where to begin?
  • In the end… simplicity…
    • http://bhl.si.edu/
    • BHL invited to be a part of the EOL project.
    • EOL - build one web page for each known species… 1.8 million!
    • Alfred P. Sloan and MacArthur Foundations
  • Northeast Digitization Center
    • Boston Public Library
      • Space infrastructure
    • 10 Scanning Stations
    • $0.10 (10¢) per page
    • 50 Books per day
    • Journals: metadata, foldouts
    • Transportation
      • ILL delivery
      • moving company
      • 15 rolling carts per trip
    Photo by Lesveilleux, 9/20/07: Cathy Norton, Bernie Margolis, Brewster Kahle
  • Economies of Scale
    • North East Regional Digitization Center
    • Agreements made with the Boston Public Library to include the Boston Library Consortium and NE BHL members.
    • Smithsonian and Library of Congress
    • Field Museum, Illinois
    • BNH UK and Kew UK*
    Bernie Margolis (Boston Public Library), Judy Warnement (Harvard Botany Library), Brewster Kahle (Internet Archive), Cathy Norton (MBLWHOI Library), Doron Weber (Alfred P. Sloan Foundation)
  • 10 scribes BPL
  • Biology Digitization Projects: Problems, Dilemmas, Puzzles, Difficulties
    • Copyright: pre-1923, 1923-1964, orphan works, out-of-print
      • Stanford University Copyright Renewal Database
    • Permissions
    • Collaboration with publisher, societies, institutions, etc.
    • Duplicates, journals 85,000 - 14,000 BID LIST
    • Monographs, collection analysis-- Ref Works
  • Name Changes over Time Taxonomic Intelligence
  • “All accumulated information of a species is tied to a scientific name, a name that serves as a link between what has been learned in the past and what we today add to the body of knowledge.” ~ Grimaldi & Engel, 2005, Evolution of the Insects
  • The challenge for contemporary DIGITAL libraries. Goal: use one name to find the content for all names
  • Names are even misspelled, such as: Loligo pealei, Loligo pealeii, Loligo pealii
  • One name can refer to more than one organism: Peranema, the fern; Peranema, the euglenid. Yet, despite this, taxonomists have used names and hierarchies to manage information about organisms very effectively for 250 years
  • Who is affected by these problems? Libraries, publishers, museums, federal agencies
  • Serious challenges in federated environments: one organism, 4 scientific names, 4 maps. We want one map
  • Reconciliation: linking alternative names for the same organism. A query initiated with any name can be expanded to all names and will unify the data associated with each
    • All names & all Classifications ClassificationBank
    • Alternative names reconciled
    • Similar names disambiguated
    • Exploit hierarchies to browse and search, build a comprehensive classification
    • Improve performance with federated systems
    • Read documents, web sites, and databases, and taxonomically index the content
    • Create a unified portal to information about organisms on the internet
    Taxonomic intelligence is the inclusion of taxonomic practices, skills and knowledge within informatics services to manage information about organisms
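
The reconciliation idea above (start a query with any one name, expand it to every alternative name, then pull together the data filed under each) can be illustrated with a small Python sketch. This is not uBio's implementation: the lookup table, the group id `squid-1`, the record layout, and the function names are all hypothetical and exist only to show the query-expansion step.

```python
from collections import defaultdict

# Hypothetical reconciliation table: each name variant maps to one group id.
# The table and the group id are illustrative, not real NameBank data.
NAME_TO_GROUP = {
    "Loligo pealei": "squid-1",
    "Loligo pealeii": "squid-1",
    "Loligo pealii": "squid-1",
}

def build_groups(name_to_group):
    """Invert the table so each group id lists all of its names."""
    groups = defaultdict(set)
    for name, group in name_to_group.items():
        groups[group].add(name)
    return groups

def expand_query(name, name_to_group, groups):
    """Return every name reconciled with `name` (just `name` if unknown)."""
    group = name_to_group.get(name)
    if group is None:
        return {name}
    return set(groups[group])

def search(name, records, name_to_group):
    """Return the records filed under any reconciled variant of `name`."""
    groups = build_groups(name_to_group)
    wanted = expand_query(name, name_to_group, groups)
    return [r for r in records if r["name"] in wanted]

# Hypothetical records held by different providers under different spellings.
RECORDS = [
    {"name": "Loligo pealei", "source": "library A"},
    {"name": "Loligo pealeii", "source": "museum B"},
    {"name": "Peranema", "source": "herbarium C"},
]

hits = search("Loligo pealii", RECORDS, NAME_TO_GROUP)
print([r["source"] for r in hits])
```

A query for one spelling returns the records stored under the other spellings as well. A name that is absent from the table, such as Peranema here, simply falls back to an exact match; disambiguating homonyms would need extra context (for example a classification) that this sketch does not model.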
  • Taxonomically intelligent aggregation technology builds portals to distribute information about organisms
    • There are many resources out there, but no single comprehensive resource for species information
    • Rather than building another big database, we can create a new way to link existing information using an aggregation portal
    • This places little or no burden on data providers
    • Protecting ownership and diversity of initiatives
  • Alternative names; expert view; indexing power from NameBank; vernacular names; more or less specific; suggestions & corrections
  • Results from an array of resources
    • data from various sources may be merged
    • red dots on the map link back to the website that provided the geographical co-ordinates
    Specimen distribution data from remote sources
  • FindIT - uBio’s Scientific Name Recognition Algorithm
  • Training and Improving the Algorithm
  • uBioRSS Taxonomically Intelligent RSS Feed Aggregator
  • MBL WHOI Library – Woods Hole authors’ publications
  • MBL WHOI Library – Woods Hole species publications
  • Taxonomically intelligent scientific text parsing
  • Taxonomic intelligence works miracles
    • It will benefit any initiative that uses distributed and heterogeneous information about biology
    • Distributed content on the same species can be drawn together because different names will be standardized through reconciliation
    • We can read documents, find names, catalog and taxonomically index documents
    • Produce a framework around which we can organize and assemble remote and local content
  • “Taxonomic intelligence” enhances search
    • Documents go to Internet Archive for OCR and storage
    • The documents are added to the BHL collection
    • uBio checks the BHL collection for new documents
    • The documents are scanned for names
    • TaxonFinder adds new strings to NameBank
    • Document markup with anchors
    • TaxonFinder adds all NameBank IDs to the Taxonomic Index
    • This index is called upon by various applications...
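
The middle steps of the workflow above (scan documents for names, add new strings to a name bank, record name IDs in an index) can be sketched in Python as follows. This is a toy stand-in and not TaxonFinder or FindIT: the regular expression, the `known_genera` filter, the integer IDs, and the sample documents are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Toy pattern: a capitalized genus followed by a lowercase epithet.
# Real name recognition is far more involved; this is only a stand-in.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+ [a-z]{3,})\b")

def find_names(text, known_genera):
    """Return candidate binomials whose genus is in `known_genera`."""
    return [m for m in BINOMIAL.findall(text) if m.split()[0] in known_genera]

def index_documents(documents, name_bank, known_genera):
    """Add new name strings to `name_bank`; return name id -> document ids."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for name in find_names(text, known_genera):
            # A name string seen for the first time gets the next integer id.
            name_id = name_bank.setdefault(name, len(name_bank) + 1)
            index[name_id].add(doc_id)
    return index

# Hypothetical OCR text for three scanned documents.
documents = {
    "doc1": "The squid Loligo pealei is common near Woods Hole.",
    "doc2": "Escherichia coli was formerly called Bacterium coli.",
    "doc3": "Giant axons of Loligo pealei are widely studied.",
}
known_genera = {"Loligo", "Escherichia", "Bacterium"}  # assumed genus list

name_bank = {}
index = index_documents(documents, name_bank, known_genera)
print(name_bank)
print({name_id: sorted(docs) for name_id, docs in index.items()})
```

An application can then look up a name's ID and retrieve every document that mentions it. The genus filter is there because the bare pattern also matches ordinary phrases such as "The squid"; a production recognizer would rely on much richer evidence.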
  • Biological Data Revolution Biomedical Knowledge Biodiversity Knowledge
  • Scientific Names: Escherichia coli. No complete list of scientific names published. Variants: 112,133; 741,872; 49,382. Objective synonyms: Bacterium coli, Bacillus coli. Misspellings: Escheria coli. *Scientific names ≠ species
  • Taxonomic Knowledge
  • BHL Vision: Information coming from everywhere
  • What role will libraries play once the scanning is done?
    • Will you be negotiators like you are now with serials?
    • Public domain publications restricted FOREVER by contract…. or open?
  • Road map
    • US libraries: 12 billion per year (OCLC)
    • Acquisitions: 3-4 billion per year
    • 1%: could scan 1 million books/volumes per year
    • Librarians will create informatics tools that will enhance indexing and organizing for not only their users but world wide
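
The road-map figures can be sanity-checked with a little arithmetic. The slide does not say which budget line the 1% applies to, so the Python sketch below tries both readings. It assumes the amounts are in dollars and uses the 10 cents per page quoted for the Northeast Digitization Center; the function name and labels are hypothetical.

```python
# Figures come from the slides; the dollar unit and the pairing of "1%"
# with a particular budget line are assumptions.
COST_PER_PAGE = 0.10          # dollars per page (digitization center slide)
VOLUMES_PER_YEAR = 1_000_000  # scanning target from the road map

def pages_per_volume(annual_budget, share=0.01):
    """Pages per volume affordable if `share` of the budget funds scanning."""
    scanning_budget = annual_budget * share
    pages = scanning_budget / COST_PER_PAGE
    return pages / VOLUMES_PER_YEAR

scenarios = [
    ("1% of total spending (12 billion)", 12e9),
    ("1% of acquisitions (3 billion)", 3e9),
    ("1% of acquisitions (4 billion)", 4e9),
]
for label, budget in scenarios:
    print(f"{label}: about {pages_per_volume(budget):.0f} pages per volume")
```

Under these assumptions, 1% of the acquisitions figure pays for roughly 300 to 400 pages per volume across a million volumes, while 1% of total spending would cover about 1,200 pages per volume.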
  • Acknowledgments: A.W. Mellon Foundation, Alfred P. Sloan Foundation, MacArthur Foundation, Martin Kalfatovic, Tom Garnett, Graham Higley, Connie Rinaldo, Neil Sarkar, David Remsen, David Patterson, Diane Rielinger, Lesveilleux.com
  • URLs: www.biodiversitylibrary.org; www.eol.org; www.collections.stanford.edu/copyrightrenewal; www.ubio.org. The Public Domain: How to Find & Use Copyright-Free Writings, Music, Art & More, by attorney Stephen Fishman
  • BHL Bid List for Serials: http://obsidian.nhm.ac.uk/test/library/bhlseriallist/ Taxa Toy