Scientific Disciplines: From Discovery to Delivery
Cathy Norton, Deputy Director, BHL
BioOne, April 18, 2008
“The launch of the Encyclopedia of Life will have a profound and creative effect in science… this effort will lay out new directions for research in every branch of biology.”
E.O. Wilson
Word cloud (February 2008): the partners and ideas behind the Encyclopedia of Life, including the Biodiversity Heritage Library, Tree of Life, E. O. Wilson, TED, the MacArthur and Sloan Foundations, AvenueA | Razorfish, Google, National Geographic, the Marine Biological Laboratory (MBL), Smithsonian, Harvard, Field Museum, MOBOT, NHM, AMNH, New York Botanical Garden, GBIF, OBIS, FishBase, EDIT ScratchPads, ANTS index, NameBank, ClassificationBank, SpeciesBase, WorkBench, TDWG/BIS, Atlas of Living Australia, a Synthesis Center, Education and Outreach, a names-based infrastructure, taxonomic intelligence, modular software, communal ownership, web 2.0 aggregation/mashups and widgets, free images, sounds, videos and visualization, phylogeny, all information for all species under any classification, in a collaborative, distributed, semantic, user-defined and ever-evolving system.
EOL Hierarchy
The EOL Steering Committee is composed of senior authorities from Harvard University, the Smithsonian Institution, the Field Museum of Chicago, the Marine Biological Laboratory at Woods Hole, the Biodiversity Heritage Library consortium, the Missouri Botanical Garden, and the MacArthur and Sloan Foundations.
The EOL Institutional Council contains more than 25 institutions from around the world and provides EOL with global perspectives and outreach capabilities. The Distinguished Advisory Board consists of 13 global leaders from the scientific and policy communities.
Cont’d
The Species Sites Group works with contributors and data providers and handles intellectual property (IP) issues.
The Biodiversity Informatics Group is responsible for developing software tools and for the open-access delivery of species information through a single portal.
The Education and Outreach Group works to ensure widespread awareness of the EOL.
The Biodiversity Synthesis Group will facilitate cross-disciplinary involvement and will explore integrative topics, including taxonomy, evolution, biogeography, phylogenetics and biodiversity informatics.
The Scanning and Digitization Group is led by the Biodiversity Heritage Library, a consortium of 10 natural history, botanical and research libraries that will scan out-of-copyright and permissioned works for the public commons.
Cont’d
FishBase ( www.fishbase.org ), a global information system with data on practically every fish species known to science. FishBase is serving information on more than 30,000 fish species through the EOL.
The Catalogue of Life Partnership (CoLp) ( www.catalogueoflife.org ), an informal partnership dedicated to creating an index of the world’s organisms. The catalogue draws on substantial taxonomic expertise from more than fifty organizations around the world, integrated into a single work through the ongoing efforts of the CoLp partners. The EOL currently uses CoLp as its taxonomic backbone.
Tree of Life web project (ToL) ( www.tolweb.org ), a collaborative effort of biologists from around the world. On more than 9,000 Web pages, the project provides information about the diversity of organisms on Earth, their evolutionary history (phylogeny), and characteristics. ToL project illustrates the genetic connections between all living things.
The Global Biodiversity Information Facility (GBIF) ( www.gbif.org ), the world’s premier source of biological specimen and observational data, providing online access to more than 135 million data records from around the world. GBIF is providing range maps for the EOL species pages.
AmphibiaWeb ( http://amphibiaweb.org ), an online system enabling anyone with a Web browser to search and retrieve information relating to amphibian biology and conservation.
The Solanaceae Source Web site ( www.nhm.ac.uk/research-curation/projects/solanaceaesource ), a project that aims to produce a worldwide taxonomic monograph of the species in the plant genus Solanum (which includes the potato and the tomato), with principal investigators from four research institutions in England and the United States.
Data Partners
“ It is exciting to anticipate the scientific chords we might hear once 1.8 million notes are brought together through this instrument. Potential EOL users are professional and citizen scientists, teachers, students, media, environmental managers, families and artists. The site will link the public and scientific community in a collaborative way that’s without precedent in scale.”
Jim Edwards, Executive Director, EOL
Encyclopedia of Life
Major project to create a single Web page for every known species (1.8 million!)
Total funding will reach at least $50M
EOL needs the literature underpinning provided by the BHL project
Diagram: An Enterprise Semantic Information Fabric. Data sources feed a cache and the EOL tree (http://www.eol.org), and client applications auto-update; a data point is a collection of data sources. Ontologies can be used to describe and relate the contents, using unique identifiers and editable views through semantic lenses.
Diagram (serine molecule) of EOL groups and their institutions: Biodiversity Heritage Library; Synthesis Center: Field Museum; Informatics: Marine Biological Laboratory & MOBOT; Education & Outreach: Smithsonian/Harvard; Secretariat: Smithsonian
Woods Hole Scientific Community
This library serves the MBL, WHOI, USGS, NMFS, SEA, WHRC, and other scientific groups in the area.
Facing a new dynamic phase
NMFS - 1871
MBL - 1888
WHOI - 1930
USGS - 1960
SEA - 1971
WHRC - 1985
Biodiversity Heritage Library
Museums
Field Museum
Natural History Museum (London)
Smithsonian Institution
American Museum of Natural History
Botanical Gardens
Missouri Botanical Garden
New York Botanical Garden
Royal Botanic Gardens, Kew
University Libraries
Botany Libraries, Harvard University
Ernst Mayr Library of the Museum of Comparative Zoology, Harvard University
Mission: Provide Open Access to Biodiversity Literature
Goals:
Digitize the core published literature on biodiversity and put on the Web
Agree on approaches with the global taxonomic community, rights holders and others
How big is the Biodiversity domain?
Over 5.4 million books dating back to 1469
800,000 monographs
40,000 journal titles (12,500 current)
50% pre-1923
Why now?
Cost low – 10-19 cents a page
Other projects funded recently – British Library/Microsoft, Google/Big Ten
Tractable, well-defined scientific domain
Taxonomic information has exceptional longevity
Supports GBIF and other international initiatives – including CBD, ABS, Darwin Declaration
Taxonomists and other scientists will have access to biodiversity literature - globally
Will provide the developing world with access to the historical literature
Scientists working in many biological domains – and other areas like meteorology, geology, ecology, genomics, etc – will get access
Advance objectives of the Convention on Biological Diversity
Benefits
Less space needed for library collections: in the Lillie building, space is freed for other uses
% of material can be stored off-site in ‘dark storage’. FTP
Our scientists will get access at their desk or in the field
Library focus will shift to informatics
Virtual web library will increase public access
Library staff will change –
Benefits to the MBLWHOI Library
Key partner of Encyclopedia of Life
Working Groups have agreed technical plan , metadata standards and image standards
Internet Archive to be technical partner – scanning and hosting
‘Scribe’ scanners now installed at the NHM (London), in New York and in Boston
2.5 million pages already available
Where are we now?
Legal issues - BHL organisational structure, content licensing, contracts being developed by EFF
BHL will take responsibility for long-term sustainability of the scanned material
Blackwell Publishing/Wiley back-files possibly available through the BHL
Zoological Record will provide their index as route to BHL articles
OCR and name recognition tools identified and linked to project - Taxonomic Intelligence
BHL is US/UK focused.
Plans to engage European partners – through projects such as EDIT and SYNTHESYS – in a similar attempt to capture the non-English language publications
G8+5 Environment Ministers identified the need for a ‘Global Species Information System’; the first EU meeting convened to address a response endorsed the BHL as the way forward
Positive discussions have already taken place with the Chinese Academy of Sciences
Australian Government likely to fund scanning as part of the Atlas of Living Australia
Where are we now? Europe, Rest of the World
Classes of texts
Public Domain – pre-1923
Non-profit society journals
Post-1923 monographs
some with copyright renewals
some without copyright renewals
Commercial journals
BHL Seeks Permissions
BHL will digitize learned society backfiles and mount them through the BHL Portal at no cost.
Will provide a set of files to the learned society for reuse as they see fit.
Will index the issues using Taxonomic Intelligence increasing their usability.
Benefits
Use of the articles will increase, as evidenced by upsurges in citations.
Long-term management of the digital assets is provided by the BHL at no cost to its contributors.
Content will be integrated into EOL project through TI nomenclatural linking.
The Long NOW Strategy
Georges Louis Leclerc, comte de Buffon, Histoire naturelle : générale et particulière (Oiseaux), 1799-1808
Convention on Biological Diversity: Article 17
Institutions that are creating the BHL exist to persist through time.
The future is uncertain, the technology landscape changes, people pass on. So create consortial structures that are low-overhead, flexible, and can respond quickly.
Interoperability is the key. Repository islands will sink.
Biologia Centrali-Americana: physical distribution… Now you can parse and harvest data: the wealth of information locked in the pages is liberated!
Henry Walter Bates, The Naturalist on the River Amazons, 1863
Most literature is in the developed world, the Northern Hemisphere.
Most biodiversity is in the developing world, the Southern Hemisphere.
Progne subis (Purple Martin), Illustrations of the nest and eggs of birds of Ohio, 1879-1886
Library and Laboratory: the Marriage of Research, Data and Taxonomic Literature (London, February 2005)
Eighty participants from 22 countries gathered to discuss the status and future of access to the taxonomic literature and to propose an agenda for actions that would improve the research environment for taxonomy. The participants were taxonomists; librarians; publishers; representatives of learned and professional societies, private foundations and government agencies; and specialists in information and communications technology.
Scalable Mass Scanning: contracts, firewalls, security, loading docks, trucks (a 180-mile round trip!)
Ernest Ingersoll, Hand-book to the National Museum … Smithsonian Institution, 1886
Mass Scanning Workflow: bid lists, pick lists, packing lists, serials management, monographic management, stickers for media and carts, rare books (vaults)
It began and begat
Albert C.L.G. Günther, Reptilia and Batrachia (1885-1902)
Open Access: all content can be reused, repurposed, reformatted, sliced, diced, scraped, harvested, integrated.
2003 Telluride: Encyclopedia of Life Meeting
2005 London: Library and Laboratory: the Marriage of Research, Data, and Taxonomic Literature
June 2006 Washington: Organization and Technical Meeting
October 2006 St Louis/San Francisco: Technical Meeting
February 2007 MCZ Harvard: Organizational Meeting
May 2007 Washington DC: Encyclopedia of Life Launch
Sept 2007 Missouri Botanical Garden: Technical Meeting
March 2008 MCZ Harvard: Organizational Technical Meeting
Collaborators
Sanborn Tenney, Natural History of Animals . . . 1868.
Internet Archive: set up scanning centers in London, New York, Washington, Boston, etc.; high-quality, non-destructive scanning; image files and text derived from OCR
International Commission on Zoological Nomenclature
Open Content Alliance
European Distributed Institute of Taxonomy
Global Biodiversity Information Facility (GBIF)
Many more under negotiation
Jacob Christian Schäffer, Elementa entomologica . . . 1766.
BHL Portal: http://www.biodiversitylibrary.org
Serve image and text files; create volume, part and piece metadata; ingest page-level metadata at the scanning stage; apply Globally Unique Identifiers (GUIDs) for linking to other taxonomic services.
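The ingest step can be illustrated with a small Python sketch of the item/page hierarchy and GUID assignment. The class names, the key format and the use of name-based UUIDs (Python's uuid.uuid5) are assumptions made for illustration; the slide does not say which identifier scheme the BHL portal actually uses.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical namespace for minting page identifiers; not an actual BHL value.
SKETCH_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "http://www.biodiversitylibrary.org")


@dataclass
class Page:
    sequence: int   # position of the page within the scanned item
    ocr_text: str   # text derived from OCR
    guid: str       # globally unique identifier for external linking


@dataclass
class Item:
    """A scanned volume, part or piece with its bibliographic metadata."""
    title: str
    volume: str
    pages: list = field(default_factory=list)

    def ingest_page(self, sequence: int, ocr_text: str) -> Page:
        # Name-based (version 5) UUID: the same title/volume/sequence always
        # yields the same identifier, so links from other services stay stable.
        key = f"{self.title}|{self.volume}|{sequence}"
        page = Page(sequence, ocr_text, str(uuid.uuid5(SKETCH_NAMESPACE, key)))
        self.pages.append(page)
        return page
```

Deterministic identifiers are one possible design choice: re-ingesting a page after rescanning would keep its GUID, so external taxonomic services that link to it would not break.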
Internet Archive Scribe: Boston
Collaborators: Internet Archive
Biodiversity Informatics
Period of explosive growth
NCL Centre for Biodiversity Informatics (India)--2000
Species-based sites: FishBase, AntWeb, AmphibiaWeb, North American Mammals, Swedish ArtDatabanken, Atlas of Living Australia, Netherlands species compendium …
Specimen-based networks: HerpNet, MANIS, ORNIS
Regional networks: IABIN, OBIS, …
Biogeomancer--2005
IdentifyLife--2005
JRS Biodiversity Foundation--2005
European Distributed Institute of Taxonomy (EDIT)--2006
BDI curricula
University of Illinois Master of Science in Biological Informatics--2006
Encyclopedia of Life (EOL)--2007
An example: The Encyclopedia of Life (EOL)
An online encyclopedia composed of 1.8 million web sites
One for each known species
EOL is developing two aspects of the original GBIF work programme
SpeciesBank--assemblage of all kinds of information about species
Digital library of biodiversity literature
Web 2.0 components of the Encyclopedia of Life (EOL)
Each site consists of several components
Species page for the general public
Draft pages assembled via mashup technology
Drafts authenticated by experts (“curators”) using controlled wikis
Information protected from being changed by anyone except the curators
But anyone can comment on the information and/or suggest additions
Curators will examine these suggestions and may move some of the information to the protected part
Community-assembled spaces
E.g. taxonomists, molecular biologists, horticulturists, birdwatchers, pollinator biologists, etc., etc.
Each group/network controls the information on its space
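The curator-protected model for species pages (protected text that only curators can change, open comments, and curators promoting suggestions into the protected part) can be sketched as a small Python class. SpeciesPage and its methods are hypothetical and do not describe EOL's actual software.

```python
from dataclasses import dataclass, field


@dataclass
class SpeciesPage:
    """Hypothetical model of a species page with a curator-protected section."""
    name: str
    curators: set
    protected: list = field(default_factory=list)
    comments: list = field(default_factory=list)

    def edit(self, user: str, text: str) -> None:
        # Only curators may change the protected part of the page.
        if user not in self.curators:
            raise PermissionError(f"{user} is not a curator of {self.name}")
        self.protected.append(text)

    def comment(self, user: str, text: str) -> int:
        # Anyone can comment or suggest additions; returns the comment's index.
        self.comments.append((user, text))
        return len(self.comments) - 1

    def promote(self, curator: str, index: int) -> None:
        # A curator reviews a suggestion and moves it into the protected part.
        _, text = self.comments[index]
        self.edit(curator, text)
```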
Example of a science-based community-assembled space on the EOL
Scientists working on ageing wanted access to longevity information on the EOL
Proposed to organize their community to find this information and put it on the EOL species pages
Will set up their own portal into this information and manage the changing of the information
Received USD 2 million from a private foundation to fund this activity
Example of an education-based community-assembled space on the EOL
A school wishes to catalogue the biodiversity of a site near its schoolyard
EOL and GBIF supply a bioblitz tool for them to use
Use GPS-enabled phones to take pictures of organisms found on the site
Assembly software combines these into a community inventory
Students identify the organisms using EOL species pages
Prepare inventory of the site
Serve that information back to the EOL web pages (and potentially even to GBIF)
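The bioblitz steps above can be sketched in Python as a filter-and-tally over geotagged observations. The Observation record, the bounding-box representation of the site and build_inventory are hypothetical; the slide does not describe the data format used by the EOL/GBIF tool.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One geotagged photo record from a GPS-enabled phone (hypothetical format)."""
    observer: str
    latitude: float
    longitude: float
    species: str  # identification made by students using EOL species pages


def build_inventory(observations, bbox):
    """Combine observations inside the site into a community species tally.

    bbox is (min_lat, min_lon, max_lat, max_lon), an assumption for this sketch.
    """
    min_lat, min_lon, max_lat, max_lon = bbox
    inside = [
        o for o in observations
        if min_lat <= o.latitude <= max_lat and min_lon <= o.longitude <= max_lon
    ]
    return Counter(o.species for o in inside)
```

The resulting tally is the kind of site inventory that could then be served back to the species pages.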
Web 2.0 components of the Encyclopedia of Life (EOL)
Digitized biodiversity literature
Biodiversity Heritage Library--consortium of 10 of the largest natural history libraries
Scanning and marking up of 320,000,000 pages of literature
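As a rough, illustrative check, assuming the 10-19 cents per page cited under “Why now?” applied uniformly to all 320,000,000 pages (and ignoring mark-up, storage and hosting costs), scanning alone would cost on the order of:

```latex
\begin{aligned}
320{,}000{,}000 \times \$0.10 &= \$32{,}000{,}000 \\
320{,}000{,}000 \times \$0.19 &= \$60{,}800{,}000
\end{aligned}
```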
“ All accumulated information of a species is tied to a scientific name, a name that serves as a link between what has been learned in the past and what we today add to the body of knowledge.” ~ Grimaldi & Engel, 2005, Evolution of the Insects
“Who knoweth not the name, knoweth not the subject.” ~ Linnaeus, 1737, Critica Botanica, n. 210
Information about named groups (taxa) of organisms (taxon-related information)
Extends back at least 1000 years
Books, journals, surveys
Museum specimens, herbaria
In many languages and widely distributed
From T.E. Glover, The Fishes of Southwestern Japan, c.1870
The challenge for contemporary DIGITAL libraries
Goal: Use one name to find the content for all names
Names – the only universal metadata for biology
Names offer a logical way to search for and index content
Names annotate data objects
All names annotate all data objects
A compilation of all names ever used is the foundation of a universal index for biology or for a semantic web for biology
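The idea of names as a universal index can be sketched as an inverted index from name strings to the data objects they annotate. This is a minimal Python illustration under assumed inputs ((object_id, names) pairs); build_name_index and normalize are hypothetical helpers, not part of any uBio or BHL service.

```python
from collections import defaultdict


def normalize(name):
    # Collapse runs of whitespace so "Progne  subis" and "Progne subis" match.
    return " ".join(name.split())


def build_name_index(objects):
    """Map each name string to the IDs of the data objects it annotates."""
    index = defaultdict(set)
    for object_id, names in objects:
        for name in names:
            index[normalize(name)].add(object_id)
    return index
```

For example, indexing an image tagged "Progne subis" and a scanned page mentioning both "Progne subis" and "Hirundo rustica" lets a single name lookup retrieve every object that mentions it.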
Who is affected by these problems?
Libraries
Publishers
Museums
Federal Agencies
Serious challenges in federated environments
One organism, 4 scientific names, 4 maps
We want one map
Reconciliation – linking alternative names for the same organism
A query initiated with any name can be expanded to all names and will unify the data associated with each
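Reconciliation-driven query expansion can be sketched in Python as follows. The reconciliation groups and the records dictionary are assumed, simplified structures, and the example pairs Puma concolor with its older combination Felis concolor purely for illustration; this is not uBio's implementation.

```python
def expand_query(name, reconciliation_groups):
    """Return every name reconciled with `name`, including `name` itself.

    reconciliation_groups: list of sets, each holding the alternative
    names for one organism (an assumed structure for this sketch).
    """
    for group in reconciliation_groups:
        if name in group:
            return set(group)
    return {name}


def unified_records(name, reconciliation_groups, records_by_name):
    """Merge the data associated with every alternative name into one result."""
    merged = []
    for alt in sorted(expand_query(name, reconciliation_groups)):
        merged.extend(records_by_name.get(alt, []))
    return merged


groups = [{"Puma concolor", "Felis concolor"}]
records = {"Puma concolor": ["map A"], "Felis concolor": ["map B"]}
```

Querying with either name returns both maps, which is the “one map” outcome the slide asks for; a name with no reconciliation group simply falls back to itself.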
Reuse, don’t rebuild
All names & all classifications: ClassificationBank
Alternative names reconciled
Similar names disambiguated
Exploit hierarchies to browse and search, build a comprehensive classification
Improve performance with federated systems
Read documents, web sites and databases, and index their content taxonomically
Create a unified portal to information about organisms on the internet
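Exploiting a hierarchy to browse and search can be sketched as collecting every taxon below a chosen node, so that a search on a family also finds its genera and species. The parent-to-children dictionary and the descendants helper are assumptions for illustration, not the ClassificationBank data model.

```python
from collections import deque


def descendants(classification, taxon):
    """Breadth-first walk of a parent -> children mapping.

    Returns `taxon` followed by every name below it in the hierarchy.
    """
    found = [taxon]
    queue = deque([taxon])
    while queue:
        current = queue.popleft()
        for child in classification.get(current, []):
            found.append(child)
            queue.append(child)
    return found


classification = {
    "Hirundinidae": ["Progne", "Hirundo"],
    "Progne": ["Progne subis"],
    "Hirundo": ["Hirundo rustica"],
}
```

Feeding the returned names into a name index turns a single search on “Hirundinidae” into a search across all of its member taxa.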
Taxonomic intelligence is the inclusion of taxonomic practices, skills and knowledge within informatics services to manage information about organisms
Specimen distribution data from remote sources
Data from various sources may be merged; red dots on the map link back to the website that provided the geographical co-ordinates
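Merging specimen records while preserving the link back to each provider can be sketched as below. The Occurrence record, the rounding of coordinates to four decimal places to detect duplicate points, and the example URLs in the usage are all assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    latitude: float
    longitude: float
    source_url: str  # website that supplied the geographical co-ordinates


def merge_occurrences(*sources, places=4):
    """Merge point records from several providers.

    Points that agree to `places` decimal places become one map dot, but
    every provider URL is kept so the dot can link back to its sources.
    """
    merged = {}
    for records in sources:
        for rec in records:
            key = (round(rec.latitude, places), round(rec.longitude, places))
            merged.setdefault(key, set()).add(rec.source_url)
    return merged
```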
BHL Taxonomic Intelligence Tool
Georges Louis Leclerc, comte de Buffon, Histoire naturelle : générale et particulière (Oiseaux), 1799-1808
uBio
10.7 Million+ Name Strings
Reconciliation Groups
http://www.ubio.org
FindIT - uBio’s Scientific Name Recognition Algorithm
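The general idea of scientific name recognition (finding candidate names in OCR text and confirming them against a dictionary of known name strings) can be sketched with a regular expression. This is a deliberately naive heuristic for illustration only; it is not uBio's FindIT algorithm, which the slide does not describe, and it will miss abbreviated genera, trinomials and OCR errors.

```python
import re

# Candidate pattern: a capitalised genus followed by a lower-case epithet.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+)\s+([a-z]{3,})\b")


def find_names(text, known_names):
    """Return (name, offset) pairs for binomials found in a known-name list."""
    hits = []
    for match in BINOMIAL.finditer(text):
        candidate = f"{match.group(1)} {match.group(2)}"
        # Confirm the candidate against the name dictionary to drop false hits.
        if candidate in known_names:
            hits.append((candidate, match.start()))
    return hits


text = "The Purple Martin, Progne subis, nests near the barn swallow Hirundo rustica."
known = {"Progne subis", "Hirundo rustica"}
```

The character offsets returned with each name are what would let an indexer link a name back to its position on a scanned page.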
Acknowledgments
Patrick Leary, David Remsen, Diane Rielinger, David Patterson, Neil Sarkar
A.W. Mellon Foundation, Alfred P. Sloan Foundation, John D. & Catherine T. MacArthur Foundation, Internet Archive
Jim Edwards, Christopher Freeland, Tom Garnett, Martin Kalfatovic, Graham Higley
BHL & EOL Teams