BHL Tech Report

Technical Report to the Biodiversity Heritage Library Institutional Council on 22 Mar 2010 at American Museum of Natural History

Published in: Technology

Transcript

  • 1. Technology Review: BHL Institutional Council Mtg 22 Mar 2010
  • 2. Stats
  • 3. Now online
    • 40,000 titles
    • 76,000 volumes
    • 28.7 million pages
    • 70 million name strings
    • 58 million confirmed names
    • 1.4 million unique names
  • 4.  
  • 5. Size of BHL content *today*
  • 6. Bigger than a breadbox, smaller than a sperm whale http://biodiversitylibrary.org/page/5225013
  • 7. Usage
  • 8. 1.1 million visits from 231 countries since launch
  • 9. Referrers: 2008 - 2009
  • 10. Referrers: Jan 1 – Mar 15, 2010
  • 11. Stats unique to our tools
  • 12. PDF Articlizing stats
  • 13. # Items by Library
    • Items   Institution Name
    • 11,476   University of California Libraries (archive.org)
    • 11,244   MBLWHOI Library
    • 9,537   Smithsonian Institution Libraries
    • 6,461   New York Botanical Garden
    • 5,129   Harvard University, MCZ, Ernst Mayr Library
    • 4,932   Gerstein - University of Toronto (archive.org)
    • 3,882   Natural History Museum, London
    • 3,350   Missouri Botanical Garden
    • 2,821   Library of Congress (archive.org)
    • 2,509   University of Illinois Urbana Champaign
    • 2,029   American Museum of Natural History Library
    • 1,996   NCSU Libraries (archive.org)
    • 1,692   UMass Amherst Libraries (archive.org)
    • 1,296   Webster Family Library of Veterinary Medicine (archive.org)
    • 1,216   Robarts - University of Toronto (archive.org)
    • 1,100   Canadiana.org (archive.org)
    • 621   Boston Public Library (archive.org)
    • 579   University of New Hampshire Library (archive.org)
    • 516   Montana State Library (archive.org)
    • 282   Prelinger Library (archive.org)
  • 14. # Names by Library
    • Names   Institution Name
    • 14,109,080   MBLWHOI Library
    • 12,241,186   Smithsonian Institution Libraries
    • 9,105,969   New York Botanical Garden
    • 7,860,553   Missouri Botanical Garden
    • 5,323,730   University of California Libraries (archive.org)
    • 4,818,365   Harvard University, MCZ, Ernst Mayr Library
    • 4,776,527   Gerstein - University of Toronto (archive.org)
    • 3,050,242   Natural History Museum, London
    • 2,387,731   American Museum of Natural History Library
    • 2,292,570   NCSU Libraries (archive.org)
    • 2,106,182   UMass Amherst Libraries (archive.org)
    • 1,836,281   University of Illinois Urbana Champaign
    • 532,635   Earth Sciences - University of Toronto (archive.org)
    • 518,695   Robarts - University of Toronto (archive.org)
    • 225,357   Canadiana.org (archive.org)
    • 177,283   Boston Public Library (archive.org)
    • 97,663   Library of Congress (archive.org)
    • 83,089   Prelinger Library (archive.org)
    • 75,113   University of Connecticut Libraries (archive.org)
    • 71,512   The Field Museum
  • 15. “Taxonomic Density” by Library
    • Simple: avg. # names / item
    • Tax. Density   Names   Items   Institution Name
    • 2,346.4   7,860,553   3,350   Missouri Botanical Garden
    • 1,409.4   9,105,969   6,461   New York Botanical Garden
    • 1,283.5   12,241,186   9,537   Smithsonian Institution Libraries
    • 1,254.8   14,109,080   11,244   MBLWHOI Library
    • 1,244.8   2,106,182   1,692   UMass Amherst Libraries (archive.org)
    • 1,176.8   2,387,731   2,029   American Museum of Natural History Library
    • 1,148.6   2,292,570   1,996   NCSU Libraries (archive.org)
    • 968.5   4,776,527   4,932   Gerstein - University of Toronto (archive.org)
    • 939.4   4,818,365   5,129   Harvard University, MCZ, Ernst Mayr Library
    • 785.7   3,050,242   3,882   Natural History Museum, London
    • 731.9   1,836,281   2,509   University of Illinois Urbana Champaign
    • 463.9   5,323,730   11,476   University of California Libraries (archive.org)
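
The density figure on slide 15 is names divided by items. A minimal Python sketch of that calculation, using three rows copied from the slide; the function name `taxonomic_density` is illustrative only and is not part of any BHL tool:

```python
def taxonomic_density(names: int, items: int) -> float:
    """Average number of name strings per item, rounded to one decimal."""
    if items <= 0:
        raise ValueError("items must be positive")
    return round(names / items, 1)


# (names, items) pairs copied from the slide 15 table
rows = {
    "Missouri Botanical Garden": (7_860_553, 3_350),
    "New York Botanical Garden": (9_105_969, 6_461),
    "University of California Libraries (archive.org)": (5_323_730, 11_476),
}

densities = {lib: taxonomic_density(n, i) for lib, (n, i) in rows.items()}

# Print highest density first, as on the slide
for lib, density in sorted(densities.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{density:>8,.1f}  {lib}")
```

The rounded results should match the first column of the slide, which is a quick way to check the table after reconstruction.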
  • 16. Q. How many species have been reported only once? [Taxacom]
    • As of March 1, 2010, BHL had identified more than 70 million potential name strings across its 28 million digitized pages using uBio's TaxonFinder. 58 million of those name strings were confirmed as a name with a NameBankID. Of that set, 1,491,000 name strings were unique. 329,000 of those unique names were found on a single page in BHL.
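
The counts on slide 16 can be turned into rough proportions. The sketch below uses only the figures quoted there; because "more than 70 million" is a lower bound, the confirmed share is an approximation, and a name found on a single BHL page is only a proxy for a species "reported only once":

```python
# Figures quoted on slide 16 (as of March 1, 2010)
potential_name_strings = 70_000_000  # "more than 70 million", so a lower bound
confirmed_names = 58_000_000         # confirmed with a NameBankID
unique_names = 1_491_000
single_page_names = 329_000          # unique names found on exactly one page

# Share of potential name strings that were confirmed (approximate)
confirmed_share = confirmed_names / potential_name_strings

# Share of unique names that appear on only one page in BHL
single_page_share = single_page_names / unique_names

print(f"confirmed: {confirmed_share:.1%}")
print(f"single-page unique names: {single_page_share:.1%}")
```

On these numbers, a little over a fifth of the unique names appear on just one page.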
  • 17. Application / Portal
  • 18. New since November
    • New color scheme
    • IA / CDL content
      • + names indexing
    • APIs
    • OAI interface
    • Work on Darwin’s Library annotations
    • Primary / Secondary titles enhancements
    • Started testing solutions for “orange bag problem”
    • Working with EOL on nomenclatural acts service
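
For the OAI interface listed on slide 18, a harvest request might be built as in the sketch below. It assumes the interface follows the standard OAI-PMH 2.0 protocol (the `ListRecords` verb and the `oai_dc` metadata prefix come from that specification). The base URL is a hypothetical placeholder, since the slides do not give the endpoint, and the helper name `list_records_url` is illustrative:

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the slides do not state the OAI base URL.
BASE_URL = "https://example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (no network access)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    if set_spec:
        params["set"] = set_spec    # selective harvesting by set
    return f"{base_url}?{urlencode(params)}"


url = list_records_url(BASE_URL, from_date="2010-03-01")
print(url)
```

A harvester would fetch this URL and then follow any `resumptionToken` in the response to page through the remaining records.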
  • 19. Consumers
    • EarthCape
    • BioGuid
    • BioSTOR
    • JSTOR – in discussion
    • Research projects
    • BREC - NSF
    • Conjecturator - NSF
    • Darwin’s Library – NEH/JISC
    • Hong Cui @ University of AZ - NSF
  • 20. OCR correction using WikiSource http://biostor.org/wiki/Page:Spixiana1999zool.djvu/293
  • 21. Partnership Statement
    • What, if anything, do we need as an agreement between parties for use of BHL materials?
      • Always open access – more a service agreement
    • Consider: What is the true value of the $50 we paid to scan BookX once it is inserted into other research?
  • 22. Terms of Use / Privacy Policy
    • Need resolution to move forward on publishing APIs
  • 23. Hardware / Infrastructure
  • 24. WH cluster
    • Transferred 28,000 volumes from IA
      • 22TB
    • 44,000 more in the queue
      • Started Friday
    • Complete BHL + IA/CDL by May
    • Need to discuss implications with BHL-Europe
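
The transfer figures on slide 24 allow a back-of-the-envelope storage estimate. The sketch below assumes that the 44,000 queued volumes have the same average size as the 28,000 already transferred and that 1 TB = 1,000 GB; neither assumption comes from the slides:

```python
# Figures from slide 24
transferred_volumes = 28_000
transferred_tb = 22.0
queued_volumes = 44_000

# Average size per volume so far (assumes 1 TB = 1,000 GB)
gb_per_volume = transferred_tb * 1000 / transferred_volumes

# Projection: assumes queued volumes match the average size seen so far
queued_tb = queued_volumes * gb_per_volume / 1000
total_tb = transferred_tb + queued_tb

print(f"average volume size: {gb_per_volume:.2f} GB")
print(f"queued transfer:     {queued_tb:.1f} TB")
print(f"projected total:     {total_tb:.1f} TB")
```

Under those assumptions the queue adds roughly 35 TB, for a projected total in the mid-50s of terabytes.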
  • 25. Cluster ~$17,000 USD
  • 26. DuraCloud
    • Pilot has added partners
      • BHL
      • NYPL
      • WGBH
      • More to come
    • 10TB of content uploaded
      • Good test set, not complete, not intended to be
    • Test download speeds with BHL-E & BHL-Au
    • June 30 deadline for uploading without $$
  • 27. Global BHL
  • 28.  
  • 29.  
  • 30. BHL-Europe
  • 31. BHL-Europe
    • http://biodiversitylibrary.eu
    • Hiring WP2 leader
    • Moving bidlist to Vienna
    • Building infrastructure
    • Getting content
    • Submitting metadata to Europeana
  • 32. BHL-China
  • 33. BHL-China
    • http://bhl-china.org
    • Still working out issues for scanning
    • Plan to scan 48,000 books / year
      • 2 shifts
      • 10 Scribes
    • Excited about Global Tech meeting
      • Will come prepared with ideas for change
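
The BHL-China scanning plan on slide 33 implies a per-station workload. The sketch below divides the annual target across Scribes and shifts; the figure of 250 working days per year is an assumption made here for illustration and does not appear in the slides:

```python
# Figures from slide 33
books_per_year = 48_000
scribes = 10
shifts = 2

# Assumption for illustration only: not stated in the slides
working_days_per_year = 250

# Books each Scribe must handle per shift over a year
books_per_scribe_shift_year = books_per_year / (scribes * shifts)

# Daily rate per Scribe per shift under the working-days assumption
books_per_scribe_shift_day = books_per_scribe_shift_year / working_days_per_year

print(f"{books_per_scribe_shift_year:,.0f} books per Scribe-shift per year")
print(f"{books_per_scribe_shift_day:.1f} books per Scribe-shift per day")
```

Changing `working_days_per_year` shows how sensitive the daily target is to the operating schedule.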
  • 34.  
  • 35. BHL-Australia
  • 36. BHL-Australia
    • http://ec2-75-101-224-221.compute-1.amazonaws.com/
      • Took code & easily ran in EC2
    • Offered usability assistance
    • Planning workshop in Au in May
    • September 2010 relaunch of ALAu
    • Ready to go
  • 37. BHL-Brasil
  • 38. BHL-Brasil
    • SciELO content ready for import
      • Can automate ingest into CiteBank
  • 39. CiteBank
  • 40. Ingesting content from Publishers
    • Big publishers - auto ingest
      • Machine to machine
      • Set up, configure & go
    • Small publishers - need help
      • Niche content
      • Likely to provide some assistance, but will require it
    • Individual users – need help
      • Need a lot of individual attention
      • Big community & opportunity, but takes tending
    • Publishing platform also important
  • 41. Similar missions / Staffing issues
    • PubMed Central
    • PLoS
    • JSTOR
      • All with multiple staff to handle ingest, inquiries
  • 42. CiteBank Possibilities
    • Need 2 years of developer work to make it bigger
    • Or…
    • Need 2 years of content assistance to make it better
      • fill data into existing structure
    • Biblio is a good start, but needs some tuning for biodiversity literature
  • 43. TL3: GRIB
    • Taxonomic Literature 3: The Global Reference Index to Biodiversity
    • Critical, yet absent: comprehensive list of biodiversity literature, complete with all variants in spelling, known identifiers over time, bibliographic descriptions, and recommendations on how to cite each work.
    • *Big* job, but doable
      • Modeled on & worked in association with Taxonomic Literature 2
  • 44. Fearing Magic Unicorn Syndrome
  • 45. Reallocation
  • 46.
    • Radical question: If #BHL could offer you more content or more services, which would you choose? "Both" not an option in this experiment.
    • Posted to Twitter http://twitter.com/chrisfreeland/status/10575364681
  • 47. “CONTENT!”
    • @chrisfreeland Given that I make my own services, content is what I want #bhl #allyourdataarebelongtome
    • @chrisfreeland at this point of time more people will benefit from more content than more services. unless we treat indexing as service
  • 48.  
