
Biodiversity Heritage Library: progress and process

An update on the Biodiversity Heritage Library's efforts and successes in 2010, with a focus on the future as part of the EU project ViBRANT.



  1. 1. Biodiversity Heritage Library: Process & Progress. Phil Cryer, Biodiversity Heritage Library. Scripting Life: the science behind ViBRANT, January 20-21, 2011, Paris, France
  2. 2. Biodiversity Heritage Library (BHL)
       • a consortium of global partners
       • aims to share historic biodiversity literature texts
       • provides open access to all content
       • free for all
  3. 3. bhl data stats
  4. 4. Content
       • 45,000 journals & monographs (8,821 in 2010)
       • 87,000 volumes (15,552 in 2010)
       • 32 million pages (5.6 million in 2010)
  5. 5. Usage (2010)
       • 837,000 visits
       • 422,000 unique visitors
       • 4.2 million page views
       • 221 countries/territories
  6. 6. new features
  7. 7. scanning request form: click on ‘Feedback’ to access
  8. 8. new user interface for names index: sortable columns; exportable via CSV, BibTeX and EndNote
  9. 9. downloadable article PDFs: create articles from BHL books
  10. 10. downloadable article PDFs: 1- enter metadata about the article
  11. 11. downloadable article PDFs: 2- select the pages of the article
  12. 12. downloadable article PDFs: 3- PDF request received
  13. 13. downloadable article PDFs: 4- PDF article arrives via email
  14. 14. CiteBank ( http://citebank.org ): open access repository for biodiversity publications
  15. 15. CiteBank ( http://citebank.org ): Solr search with faceting
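A faceted Solr search is an ordinary select request with a few extra parameters. The sketch below only builds such a request URL; the host, the core name `citebank`, and the field names `author` and `year` are assumptions for illustration and are not taken from the CiteBank deployment.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, text, facet_fields, rows=10):
    """Build a Solr select URL that asks for facet counts alongside results."""
    params = [
        ("q", text),            # the user's search terms
        ("rows", rows),         # number of documents to return
        ("wt", "json"),         # response format
        ("facet", "true"),      # switch faceting on
        ("facet.mincount", 1),  # hide facet values with zero hits
    ]
    # One facet.field parameter per field to be counted.
    params += [("facet.field", field) for field in facet_fields]
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host, core and field names.
url = build_facet_query(
    "http://localhost:8983/solr/citebank", "Coleoptera", ["author", "year"]
)
print(url)
```

Solr returns the counts for each requested field next to the matching documents, which is what drives the clickable filters on a faceted results page.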
  16. 16. CiteBank ( http://citebank.org ): individual bibliography page
  17. 17. CiteBank features
       • access the ‘crowd-sourced’ articles generated from the BHL scans (harvested from BHL)
       • platform for journals/publishers/societies in need of tools to store and share content
       • harvests metadata from ZooKeys, SciELO and Smithsonian collections nightly via OAI-PMH
       • new search index to BHL content using Solr
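As a rough illustration of what a nightly OAI-PMH harvest involves, the sketch below builds a `ListRecords` request and parses identifiers, titles and the resumption token from a response. The endpoint `http://example.org/oai` and the sample XML are made up; this is not CiteBank's harvester, only the generic protocol pattern.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None,
                     resumption_token=None):
    """Build a ListRecords URL; a resumption token replaces all other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # only records changed since this date
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return ([(identifier, title), ...], resumption_token_or_None)."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        title = rec.findtext(".//dc:title", namespaces=NS)
        records.append((ident, title))
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)


# Made-up response used in place of a live repository.
SAMPLE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample article</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""

records, token = parse_list_records(SAMPLE)
print(list_records_url("http://example.org/oai", from_date="2011-01-19"))
print(records, token)
```

A harvester would keep requesting with the returned token until the repository sends an empty one, then store the date of the run to use as `from` the following night.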
  18. 18. CiteBank + BHL expands our core features
       • content and tools for scholarly crowd-sourcing
           • users can get content they need, do minor work, share enhancements with community
       • look to add more content integration with other existing platforms
           • EOL, Atlas of Living Australia, JSTOR Plant Science, BioStor and others
           • Mendeley, Zotero, RefWorks, etc.
  19. 19. More BHL features coming soon...
       • enhancements to the portal home page
           • more focus on search
       • special collections
           • Charles Darwin’s scientific library
       • scholarly annotations
           • annotations in Darwin’s hand and academic interpretation, crosslinking
  20. 20. bhl global
  21. 24. Benefits of Global BHL partnerships
       • redundancy and resilience
           • data and app mirroring
       • exposing unique content
       • new tools, services, people
       • opportunities for new collaborations
           • IMPACT, ViBRANT, OpenUp! in EU
  22. 25. storage clusters
  23. 26. Storage issues solved using clusters
       • all BHL data stored at the Internet Archive in San Francisco
           • no redundancy
           • limited in how we could serve our data and images
           • difficult to analyze data
       • first global BHL cluster gives us
           • redundancy and failover
           • many new serving options
           • new ways to run analytics, data mining
  24. 28. Open source software / commodity hardware
       • open source software
           • Linux operating system
           • Gluster distributed storage system
       • commodity hardware
           • Supermicro servers
           • ‘off the shelf’ hard drives and other system components
  25. 29. BHL Cluster 01
       • six 4U sized cabinets
       • twenty-four 1.5TB hard drives in each cabinet
       • 97TB of replicated and distributed storage (over 200TB of raw disk)
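The capacity figures on this slide can be checked with a few lines of arithmetic. The cabinet, drive count and drive size come from the slide; the replication factor of 2 is an assumption, since the slide says only that the storage is replicated.

```python
CABINETS = 6
DRIVES_PER_CABINET = 24
DRIVE_TB = 1.5   # decimal terabytes per drive, as marketed
REPLICAS = 2     # assumption: every file is stored twice

raw_tb = CABINETS * DRIVES_PER_CABINET * DRIVE_TB  # total raw disk
usable_tb = raw_tb / REPLICAS                      # after replication
usable_tib = usable_tb * 10**12 / 2**40            # same amount in binary units

print(f"raw: {raw_tb:.0f} TB")
print(f"usable at {REPLICAS}x replication: {usable_tb:.0f} TB ({usable_tib:.1f} TiB)")
```

Under that assumption the raw total agrees with "over 200TB of raw disk", and the usable figure lands a little above the quoted 97TB. Filesystem overhead and binary versus decimal units could plausibly account for the difference, but the slides do not say.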
  26. 31. Statistical computing
       • find relationships
           • R (GNU statistical language)
           • Hadoop, Disco
       • make existing data more useful
           • image and OCR reprocessing, taxonfinder
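Hadoop and Disco both follow the map-reduce pattern: process each page independently, then merge the partial results. The sketch below imitates that pattern in plain Python by counting candidate two-word names in OCR text. The regular expression is a deliberately naive stand-in and is not how taxonfinder works; the sample pages are invented.

```python
import re
from collections import Counter
from functools import reduce

# Naive pattern: a capitalised word followed by a lowercase word.
# It will also match ordinary phrases such as "The quick".
CANDIDATE = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")


def map_page(text):
    """Map step: count candidate names found on one page."""
    return Counter(" ".join(pair) for pair in CANDIDATE.findall(text))


def merge_counts(left, right):
    """Reduce step: combine the counts from two pages."""
    return left + right


# Invented OCR snippets standing in for scanned pages.
pages = [
    "Homo sapiens was described. Homo sapiens again.",
    "Quercus alba and Homo sapiens.",
]

totals = reduce(merge_counts, map(map_page, pages), Counter())
print(totals.most_common())
```

Because each page is mapped on its own, the map step can be spread across the nodes of a cluster and only the small count tables need to travel between machines.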
  27. 32. data sharing
  28. 33. Data sharing and replication
       • replicating BHL data globally
           • Marine Biological Laboratory (Woods Hole, US)
           • Natural History Museum (London, UK)
           • Bibliotheca Alexandrina (Alexandria, EG)
           • Atlas of Living Australia (Canberra, AU)
           • China... Brazil...
       • advantages to replication
           • redundancy, failover
           • load balancing
           • geographical distribution
  29. 34. Software for data sync
       • grabby
           • handles initial download from Internet Archive (IA)
       • bhl-sync
           • open source Dropbox model
           • handles syncing remote nodes automatically
           • uses inotify, lsyncd, OpenSSH, rsync, unison
           • remote server only requires a secure login
       • open source code available at http://bit.ly/bhl-bits
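The point that a remote server only requires a secure login follows from running rsync over SSH: nothing beyond an account needs to be installed or configured on the mirror. The sketch below assembles such a command without executing it. It is not the bhl-sync code; the host name and paths are placeholders.

```python
import shlex


def rsync_command(local_dir, remote_host, remote_dir, dry_run=False):
    """Build an rsync-over-SSH command that mirrors local_dir to a remote node."""
    cmd = [
        "rsync",
        "--archive",   # preserve permissions, times and directory structure
        "--compress",  # compress data in transit
        "--delete",    # remove remote files that no longer exist locally
        "-e", "ssh",   # use SSH as the transport
    ]
    if dry_run:
        cmd.append("--dry-run")  # report what would change without copying
    # A trailing slash on the source copies the directory's contents.
    cmd += [local_dir.rstrip("/") + "/", f"{remote_host}:{remote_dir}"]
    return cmd


# Placeholder host and paths.
cmd = rsync_command("/data/bhl", "mirror.example.org", "/data/bhl", dry_run=True)
print(shlex.join(cmd))
```

In a setup like the one described, a file-change watcher such as lsyncd would trigger a command of this kind whenever inotify reports new or modified files.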
  30. 35. Fedora-commons integration
       • digital repository platform
       • enables storage and management of digital content
       • maintains a persistent digital archive
       • stores data in a neutral manner
       • provides backup, redundancy, disaster recovery
       • shares data to remote nodes via OAI-PMH
  31. 36. future plans
  32. 37. Assigning DOIs (Digital Object Identifiers)
       • BHL is a member of CrossRef through the Smithsonian
       • will start assigning DOIs to BHL monographs
           • easy, non-controversial; provides open access to all content
       • then move on to articles and other publication types
           • CrossRef rules make full assignment challenging for crowd-sourced articles
  33. 38. Wish list for 2011 and beyond
       • OCR correction
           • a big problem, no easy solution
       • add more content
           • partnerships, CiteBank
       • sustainability planning and funding
           • committed to no fees for users
       • more outreach
           • conferences, marketing
           • Facebook, Twitter and other social media avenues...
  34. 39. BHL is social!
       • http://biodiversitylibrary.blogspot.com
       • http://twitter.com/BioDivLibrary #bhlib
       • http://facebook.com/pages/Biodiversity-Heritage-Library/63547246565
       • http://flickr.com/groups/bhl
       • http://youtube.com/user/BioHeritageLibrary
       • http://biodiversitylibrary.org/RecentRss.aspx
       • http://slidesha.re/bhl-slides
  35. 40. Thanks. slides: slidesha.re/bhl-slides; contact: [email_address]. Phil Cryer, Biodiversity Heritage Library. Scripting Life: the science behind ViBRANT, January 20-21, 2011, Paris, France
