Collection assessment in a collaborative environment: Biodiversity Heritage Library
Uploaded as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

  • GOALS:
  • A free & open access digital library for biodiversity literature and primary source materials (field books). A consortium of 15 libraries working together to run a virtual library branch. A collection of content from the 15-member BHL consortium and other Internet Archive contributors. Anyone is free to access & download BHL materials.
  • SEARCH: Subject searching in BHL via the "subjects" tab of the advanced search (http://biodiversitylibrary.org/advsearch) searches the table of subject keywords we have in BHL, derived from the LCSH. It does NOT search titles or scientific names. If you do a basic keyword search via the homepage for a subject term, say "Birds", you will pull hits across all titles, articles, authors, subjects and scientific names, broken out by tabs. Notice that the subjects tab shows all search results where "birds" is part of the subject keyword string, such as "Birds of prey" or "Cage birds".
  • COLLABORATION!
  • Add images… Also add DOIs?
  • User feedback is key; we rely on the many eyes of the crowd to help us direct our curation activities to the content people are actually using. Users can let us know if they find a problem with something in our collection through our general feedback form, and can request that something be scanned through our scanning request form.
  • The trees of North America, entomology, or bears: metadata, right? BUT LCSH doesn't adequately describe the biodiversity literature. Scientists organize around scientific names, articles, and parts of articles (species descriptions). From Rod Page: he constructed a table listing all the journals in BioNames that have an ISSN, ordered by the number of articles in BioNames (i.e., mostly articles that publish new names). "The full table is here, I've reproduced part of it below (limited to those journals with at least 500 articles in BioNames)."
  • The Biodiversity Heritage Library uses taxonomic intelligence tools, including Global Names Recognition and Discovery (GNRD) developed by Global Names Architecture, to locate, verify, and record scientific names located within the text of each digitized page. Note: the text used for this identification is uncorrected OCR, so it may not include all results expected or visible in the page. This names-based index is an incredibly valuable tool for organismal research, and is easily incorporated into external web sites through two different methods of access.
  • Bold = focus for this session: what we have provided on Library Box. Names are extracted from OCR text. Each dataset has its own complexity:
    • Taxonomic names
        – have a hierarchy (the second to last is an infraspecific taxonomic level: forma)
        – change over time (the 4th one in the list, Pinus divaricata, is a synonym)
        – have all sorts of exceptions to the rules (the last one, Pinus × murraybanksiana, is a hybrid)
    • Common names
        – are subjective, biased towards organisms of well known groups only
        – are dependent on language, region and time
    • Subjects
        – are language dependent
        – are hierarchical
        – are at title level
  • These have all been provided on Library Box, in addition to some more specific sets. We also have MODS, EndNote and BibTeX files for titles, items/volumes and parts.
  • A visualization of BHL data (for Pinus banksiana). A picture of BHL data (for Pinus banksiana as it appears on page 140 of v.78 of The Canadian field naturalist). How do we reconcile all of this to find out what content covers our question? How can we map the more specific terms to LCSH/call numbers when we have limited resources? We need to automate as much as possible. We want consistent language. The BHL uses LC for the volumes but also pulls out scientific names. How do we get them incorporated into the consistent language of LC in an automated way that can scale? We want to know what we have so we can compare to an (as yet) unidentified universe (bibliographies, Index Animalium, TL-2).
  • To show that name data come from multiple sources
  • BOLD means in Library Box. Google Refine: what they are and implications for collection analysis. These are links to
  • Index Animalium, TL-2. Literature breaks down by discipline and even by specific taxon. Scientific names and bibliographic structure are different and we are trying to merge the two: we are looking at scientific data next to library data but have to make sense of the merger in the library world (see coll dev chart). Scientists work at an article/name/article part level; we work on the level of the volume.
    • Taxonomic Literature: A selective guide to botanical publications and collections with dates, commentaries and types (Stafleu et al.). TL-2, the premier publication of the International Association for Plant Taxonomy (IAPT), is a 15 volume guide to the literature of systematic botany published between 1753 and 1940. It is organized by author and includes numbered entries for the author's publications.
    • Index Animalium is Sherborn's life's work: a 9000 page bibliography identifying the first book in which over 400,000 organisms appeared; it covers 1758-1850.
    • Zoological Record is the world's oldest continuing database of animal biology. It is considered the world's leading taxonomic reference and, with coverage back to 1864, has long acted as the world's unofficial register of animal names.
    • How can we map back to LCSH/call numbers when we have limited resources? We need to automate as much as possible. We want consistent language. The BHL uses LC for the volumes but also pulls out scientific names. How do we get them incorporated into the consistent language of LC in an automated way that can scale? We want to know what we have so we can compare to an (as yet) unidentified universe (bibliographies, Index Animalium, TL-2).
    • LENGTHY process is all of this! Needs more automation.
    • Early on we compared the universe of what is in the big libraries to what was in BHL and that allowed us to fill gaps: https://bhl.wikispaces.com/BHL+Priority+Titles
  • These are keywords that we use to describe how we collect for BHL. They are adapted from LC but are not necessarily actual subject headings. We modified some terms to make the language clear and bring in some of the scientific naming conventions (Ornithology instead of birds). This was meant to merge appropriate parts of the library and scientific worlds. This is the consistent language against which we want to compare BHL content.
  • Many irrelevant features; breaks up phrases (United States). At least it shows that we have lots of BOTANY (but we would want to merge that with plants).
  • This shows the distribution of keywords for items scanned by the Ernst Mayr Library of the Museum of Comparative Zoology (good thing zoology shows up as a big piece). This was made using Tableau software. All of the tiny items can be identified but, like Wordle, there is lots of irrelevant stuff. How can we automate the improvement and appropriate merging of metadata? http://public.tableausoftware.com/views/BHLViz/DigitizedSubjects
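
The notes on the Wordle and Tableau views raise two fixable problems: phrases such as "United States" get broken up, and related keywords (botany, plants) are counted separately. The following is a minimal sketch of one way to count keywords instead, not the method used for the slides. It assumes subject keywords arrive as whole strings, one per subject entry, and the `PREFERRED` mapping is a hypothetical, hand-made table of variant terms.

```python
from collections import Counter

# Hypothetical mapping from variant keywords to the preferred
# collection-policy term (e.g. Ornithology instead of birds).
PREFERRED = {
    "plants": "botany",
    "birds": "ornithology",
}

def normalize(keyword):
    """Lowercase, collapse whitespace, drop trailing periods, map to preferred term."""
    key = " ".join(keyword.lower().split()).rstrip(".")
    return PREFERRED.get(key, key)

def keyword_counts(subject_strings):
    """Count whole subject keywords, so multi-word phrases stay in one piece."""
    counts = Counter()
    for subject in subject_strings:
        counts[normalize(subject)] += 1
    return counts

# Made-up sample data, not taken from a BHL export.
sample = ["Botany", "Plants", "United States", "Birds", "Ornithology", "botany."]
counts = keyword_counts(sample)
for term, n in counts.most_common():
    print(term, n)
```

Counting whole strings rather than single words keeps "united states" together, and the mapping step is where vetted collection development keywords could be plugged in.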

Presentation Transcript

  • Collection Assessment in a Collaborative Environment: BHL
    Connie Rinaldo, Bianca Crowley, Trish Rose Sandler & William Ulate
  • The BHL is…
    • A consortium of 15 natural history, botanical libraries and research institutions
    • An open access, full-text digital library for legacy biodiversity literature
    • An open data repository of taxonomic names and bibliographic information
    • An expanding global effort
    • Mission: The Biodiversity Heritage Library improves & makes more efficient the methodology of research in biodiversity studies by collaboratively making biodiversity literature openly available to the world as part of a global biodiversity community.
  • BHL Goals
    • Goal 1: Relevant Content: Build & maintain the BHL as the largest reliable, reputable, & responsive repository of biodiversity literature & archival materials.
    • Goal 2: Tools & Services: Develop services & tools which facilitate discovery & improve research efficiency of BHL content.
    • Goal 3: User Engagement: Increase global awareness about the BHL through outreach, learning & education, & branding through engagement & collaboration with existing & new user communities.
    • Goal 4: Membership & Partnerships: Grow BHL consortia membership & partnerships while fostering cross-institutional collaboration that continues to serve as a model for digital library development.
    • Goal 5: Financial Sustainability: Ensure sustainability & relevance by being flexible, adaptable, & financially sound while the content & services remain openly & freely available.
  • Core BHL Member Institutions
  • Global Partners
  • http://biodiversitylibrary.org
    Now online: 64,188 titles; 120,461 volumes; 42 million+ pages
  • BHL Overview
    • New user interface launched in March
    • Search by title, author, article, subjects and scientific names
    • Various download options, including high resolution
    • Taxonomic name finding algorithm
    • Machine-to-machine services
    • Full-text search being tested
  • Core Principles
    • Open access
    • Open data
    • Deconstruct the silo and deliver content where users are already working
        – Via other biodiversity websites and taxonomic resources
        – Via social media platforms: blog, Flickr, Facebook, Twitter, Pinterest, etc.
    • Involve users in collection and technical development activities
  • Scanning Locally, Coordinating Globally
    [Diagram labels: Issue Tracking Software; Vols. 1-5; Vols. 6, 8, 10; Vols. 7, 9, 11-21]
  • Beyond the Silo: Open Data
    • Stable URLs
    • Open Data Policy
    • APIs (Application Programming Interfaces)
    • Data Exports
    • OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting)
  • User Feedback is Critical
    • General feedback form http://biodiversitylibrary.org/contact
    • Scan request form
  • Impact
    • “BHL came to the rescue when a planned trip to work in the Mertz Library at The New York Botanical Garden had to be cancelled due to Hurricane Sandy. Thanks to the online resources available through BHL I was able to source most of the key works I needed, with their supporting bibliographic information. Further use of BHL occurred when building work at the Linnean Society of London limited access to some of the book I had been able to use from that collection.”
    • “I would like thank you all very much for invaluable work and support you do. I just got a pdf-file from more than century old (1893) journal paper (regional naturalist society paper, published in Finland), to get copy I should take 500 mile drive to our university library. Now I am got it fastly in high-quality pdf-copy. Cordial thanks and all success in continuing your highly valuable mission.” [conservation biologist from Estonia]
    • “You are a wonderful resource. I maintain a Website that describes the plant genus Opuntia (prickly pear cacti). There is no way I could maintain such a site without access to literature from 100-200 years ago. Most of the cactus species were discovered long ago; I find it invaluable to put up PDF files to document each species in the literature as I document them photographically. I am a botanist, but I work in the pharmaceutical field (not so many botanical jobs out there). Your library makes it possible for me to continue working with plants in a meaningful and scientific manner.”
  • [Diagram labels: Biodiversity Literature; BHL; EOL; Scientific Names; Researchers; Publications; Datasets; Collecting Events; Specimens; Localities; Field Notes; Phylogenies; Nomenclators; Name Indexes; Species Checklists; Content Aggregators]
  • Questions about BHL Content
    • How many books in BHL are there about....?
    • How can we identify areas of weakness in BHL in order to prioritize what materials to scan next?
    • Rod Page has one suggestion: http://iphylo.blogspot.com/2013/10/whichtaxonomic-journals-should-be.html
  • Questions about BHL Content
    • What are scalable solutions to content analysis?
    • Can we provide creative & meaningful visualizations?
  • Why do we care about taxonomic names?
    • Scientists use taxonomic names to organize their research
    • Biodiversity literature breaks down by discipline & by specific taxon
  • Extracted Scientific Names
  • What is “Taxonomic Intelligence”?
    • Global Names Recognition & Discovery tool
        – Locate, verify, record scientific names from each page
        – Text is uncorrected OCR
  • Overview of available BHL (meta)data http://biodivlib.wikispaces.com/Data+Exports
    • Title metadata: contributed from MARC records of hundreds of library catalogs (BHL consortium libraries & non-BHL IA contributors)
    • Volume/item metadata: provides information about the actual objects & pieces digitized
    • Subject
    • Creator/author data
    • Segment/part/"article" metadata (separate table for segment/part creators?)
    • Page metadata, which includes our algorithmically identified scientific name data
    • OCR text available at the item/volume level but not overall for corpus of BHL
  • Data Exports
  • Visualization of BHL Data for Pinus banksiana
  • Source Data Sample
  • Sample BHL & Nomenclatural Data
    • Google Refine reconciled list of BHL subject keywords
    • List of vetted BHL subject targets from collection development policy
    • Taxonomic name data set for trees of North America (link out)
        – http://www.fs.fed.us/database/feis/plants/tree/index.html
        – http://www.treesofnorthamerica.net/
    • Subject terms associated with BHL titles where Pinus banksiana occurs
  • Other Tools & Process
    • Bibliographies (discipline & more)
    • Index Animalium: identifies first appearance of 400,000 animals from 1758-1850
    • Researcher supplied specific taxon bibliographies
    • Zoological Record: taxonomic references back to 1864
    • Taxonomic Literature II: a selective guide to botanical publications with dates, commentaries and scientific types
    • Compare universe of biodiversity literature to BHL
    • Unknown dataset for full universe
    • Compared BHL member collections to BHL content for gap-filling before content expanded (lists automated but gap identification manual)
    • REST especies: a way to collate species metadata? http://dopaservices.jrc.ec.europa.eu/services/especies/
    • DOPA Explorer http://ehabitat-wps.jrc.ec.europa.eu/dopasimple/
  • SAMPLE VISUALIZATIONS
  • Core & Supporting Keywords for BHL Collections
  • Wordle for BHL Content
  • http://public.tableausoftware.com/views/BHLViz/DigitizedSubjects
  • Visualization Opportunities
    • JournalMap (geo tagging scientific literature) http://www.journalmap.org/
    • Visualizing article performance http://bit.ly/1c4TJfn
    • Better Life Index http://www.oecd.org/statistics/datalab/bli.htm
    • Altmetric: http://www.altmetric.com/
    • Tableau http://www.tableausoftware.com/public/
    • Worth it: http://www.wired.com/wiredscience/2013/11/wireddata-life-martin-krzywinski/?viewall=true
  • Taxon Data Manipulation Opportunities
    • Euler Project: Reasoning with Taxonomies: http://euler.cs.ucdavis.edu/
    • REST & Taxonomy: https://drupal.org/project/taxonomy_api
  • SUMMARY
    • Metadata reconciliation
    • Gap analysis
    • Visualizations
    • All automated!
  • Thank you for your Help! http://biodiversitylibrary.org Connie Rinaldo crinaldo@oeb.harvard.edu
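
The slides describe gap analysis as comparing a reference list (a bibliography, or a member library's holdings) against BHL content, with the gap identification itself done manually. A rough sketch of how that step might be automated follows. It assumes both lists are available as plain title strings. The matching rule in the hypothetical `title_key` helper is a deliberately crude illustration, and the sample titles are made up for the example rather than being a statement about actual BHL holdings.

```python
import re

def title_key(title):
    """Reduce a title to a crude match key.

    Lowercases, replaces anything other than a-z, 0-9 and space with a
    space, and drops a leading English article. Accented letters are
    stripped too, so real data would need better normalization.
    """
    words = re.sub(r"[^a-z0-9 ]", " ", title.lower()).split()
    if words and words[0] in {"the", "a", "an"}:
        words = words[1:]
    return " ".join(words)

def find_gaps(reference_titles, held_titles):
    """Return reference titles whose match key is absent from the held titles."""
    held_keys = {title_key(t) for t in held_titles}
    return [t for t in reference_titles if title_key(t) not in held_keys]

# Illustrative data only.
reference = ["The Canadian Field-Naturalist", "Index Animalium", "Zoological Record"]
held = ["Canadian field naturalist.", "Index animalium"]
gaps = find_gaps(reference, held)
print(gaps)
```

Exact matching on a normalized key will miss variant and abbreviated titles, so the output is better treated as a candidate list for review than as a final answer.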