Biodiversity Heritage Library: progress and process

An update on the Biodiversity Heritage Library's efforts and successes in 2010, with a look ahead to its future as part of the EU project ViBRANT.

Slide notes:
  • BHL has gone global
  • Sharing, distribution and delivery of content
  • Global data sharing requires a social infrastructure

    1. Biodiversity Heritage Library: Process & Progress
        Phil Cryer, Biodiversity Heritage Library
        Scripting Life: the science behind ViBRANT, January 20-21, 2011, Paris, France
    2. Biodiversity Heritage Library (BHL)
        • a consortium of global partners
        • aims to share historic biodiversity literature texts
        • provides open access to all content
        • free for all
    3. BHL data stats
    4. Content
        • 45,000 journals & monographs (8,821 added in 2010)
        • 87,000 volumes (15,552 added in 2010)
        • 32 million pages (5.6 million added in 2010)
    5. Usage (2010)
        • 837,000 visits
        • 422,000 unique visitors
        • 4.2 million page views
        • 221 countries/territories
    6. new features
    7. Scanning request form: click on ‘Feedback’ to access
    8. New user interface for the names index: sortable columns, exportable via CSV, BibTeX and EndNote
    9. Downloadable article PDFs: create articles from BHL books
    10. Downloadable article PDFs, step 1: enter metadata about the article
    11. Downloadable article PDFs, step 2: select the pages of the article
    12. Downloadable article PDFs, step 3: PDF request received
    13. Downloadable article PDFs, step 4: PDF article arrives via email
    14. CiteBank (http://citebank.org): open access repository for biodiversity publications
    15. CiteBank (http://citebank.org): Solr search with faceting
    16. CiteBank (http://citebank.org): individual bibliography page
    17. CiteBank features
        • access the ‘crowd-sourced’ articles generated from BHL scans (harvested from BHL)
        • a platform for journals, publishers and societies that need tools to store and share content
        • harvests metadata nightly from ZooKeys, SciELO and Smithsonian collections via OAI-PMH
        • a new search index for BHL content using Solr
    18. CiteBank + BHL expands our core features
        • content and tools for scholarly crowd-sourcing
          - users can get the content they need, do minor work, and share enhancements with the community
        • integrate more content with other existing platforms
          - EOL, Atlas of Living Australia, JSTOR Plant Science, BioStor and others
          - Mendeley, Zotero, RefWorks, etc.
    19. More BHL features coming soon...
        • enhancements to the portal home page
          - more focus on search
        • special collections
          - Charles Darwin’s scientific library
        • scholarly annotations
          - annotations in Darwin’s hand with academic interpretation and cross-linking
    20. BHL global
    24. Benefits of global BHL partnerships
        • redundancy and resilience
          - data and application mirroring
        • exposing unique content
        • new tools, services, people
        • opportunities for new collaborations
          - IMPACT, ViBRANT, OpenUp! in the EU
    25. storage clusters
    26. Storage issues solved using clusters
        • all BHL data stored at the Internet Archive in San Francisco
          - no redundancy
          - limited in how we could serve our data and images
          - difficult to analyze data
        • the first global BHL cluster gives us
          - redundancy and failover
          - many new serving options
          - new ways to run analytics and data mining
    28. Open source software / commodity hardware
        • open source software
          - Linux operating system
          - Gluster distributed storage system
        • commodity hardware
          - Supermicro servers
          - ‘off the shelf’ hard drives and other system components
    29. BHL Cluster 01
        • six 4U cabinets
        • twenty-four 1.5TB hard drives in each cabinet
        • 97TB of replicated and distributed storage (over 200TB of raw disk)
    31. Statistical computing
        • find relationships
          - R, the GNU statistical language
          - Hadoop, Disco
        • make existing data more useful
          - image and OCR reprocessing, TaxonFinder
    32. data sharing
    33. Data sharing and replication
        • replicating BHL data globally
          - Marine Biological Laboratory (Woods Hole, US)
          - Natural History Museum (London, UK)
          - Bibliotheca Alexandrina (Alexandria, EG)
          - Atlas of Living Australia (Canberra, AU)
          - China... Brazil...
        • advantages of replication
          - redundancy, failover
          - load balancing
          - geographical distribution
    34. Software for data sync
        • grabby
          - handles the initial download from the Internet Archive (IA)
        • bhl-sync
          - open source Dropbox model
          - handles syncing remote nodes automatically
          - uses inotify, lsyncd, OpenSSH, rsync, unison
          - remote server only requires a secure login
        • open source code available at http://bit.ly/bhl-bits
    35. Fedora Commons integration
        • digital repository platform
        • enables storage and management of digital content
        • maintains a persistent digital archive
        • stores data in a neutral manner
        • provides backup, redundancy, disaster recovery
        • shares data with remote nodes via OAI-PMH
    36. future plans
    37. Assigning DOIs (Digital Object Identifiers)
        • BHL is a member of CrossRef through the Smithsonian
        • will start assigning DOIs to BHL monographs
          - easy, non-controversial
        • then move on to articles and other publication types
          - CrossRef rules make full assignment challenging for crowd-sourced articles
    38. Wish list for 2011 and beyond
        • OCR correction
          - a big problem, no easy solution
        • add more content
          - partnerships, CiteBank
        • sustainability planning and funding
          - committed to no fees for users
        • more outreach
          - conferences, marketing
          - Facebook, Twitter and other social media avenues...
    39. BHL is social!
        • http://biodiversitylibrary.blogspot.com
        • http://twitter.com/BioDivLibrary #bhlib
        • http://facebook.com/pages/Biodiversity-Heritage-Library/63547246565
        • http://flickr.com/groups/bhl
        • http://youtube.com/user/BioHeritageLibrary
        • http://biodiversitylibrary.org/RecentRss.aspx
        • http://slidesha.re/bhl-slides
    40. Thanks.
        • slides: slidesha.re/bhl-slides
        • contact: [email_address]
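Slide 8 says the names index can be exported as CSV and BibTeX. The Python sketch below shows what such an export could look like. It is only an illustration: the record fields (`name`, `title`, `year`, `page_id`), the sample records and the `bhl<page_id>` citation-key scheme are hypothetical and are not BHL's actual export schema.

```python
import csv
import io

# Hypothetical sample records: a scientific name and the item where it occurs.
RECORDS = [
    {"name": "Quercus alba", "title": "Example Flora", "year": "1843", "page_id": "101"},
    {"name": "Canis lupus", "title": "Example Fauna", "year": "1902", "page_id": "202"},
]

FIELDS = ["name", "title", "year", "page_id"]


def to_csv(records):
    """Serialize records as CSV with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_bibtex(records):
    """Emit one @misc entry per record; the citation key scheme is an assumption."""
    entries = []
    for r in records:
        key = f"bhl{r['page_id']}"
        entries.append(
            f"@misc{{{key},\n"
            f"  title = {{{r['title']}}},\n"
            f"  year = {{{r['year']}}},\n"
            f"  note = {{Mentions {r['name']}}}\n"
            f"}}"
        )
    return "\n\n".join(entries)


if __name__ == "__main__":
    print(to_csv(RECORDS))
    print(to_bibtex(RECORDS))
```

Both formats come from the same list of dictionaries, so adding another export target (for example EndNote's tagged format) would mean writing one more serializer.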
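Slides 15 and 17 describe CiteBank's search as Solr with faceting. The sketch below builds a faceted request against Solr's standard `/select` handler. The parameters `q`, `rows`, `wt`, `facet`, `facet.field`, `facet.mincount` and `fq` are standard Solr query parameters. The host, the core name `citebank` and the field names `year` and `type` are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; host and core name are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/citebank/select"


def faceted_query_url(text, facet_fields, filters=None, rows=10):
    """Build a Solr /select URL that returns matches plus facet counts."""
    params = [
        ("q", text),
        ("rows", str(rows)),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.mincount", "1"),  # hide facet values with zero hits
    ]
    # One facet.field parameter per field whose values should be counted.
    params += [("facet.field", f) for f in facet_fields]
    # Each fq narrows the result set, e.g. after a user clicks a facet value.
    for field, value in (filters or {}).items():
        params.append(("fq", f'{field}:"{value}"'))
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    print(faceted_query_url("Darwin", ["year", "type"], {"type": "article"}))
```

Filters go in `fq` rather than being folded into `q`. Solr can cache filter queries separately and they do not affect relevance scoring, which suits facet drill-down.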
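Slides 17 and 35 mention harvesting and sharing metadata via OAI-PMH. The sketch below shows the harvesting side: issue a `ListRecords` request, then keep following the `resumptionToken` until the repository returns an empty one. The verb, the `metadataPrefix=oai_dc`, `from` and `resumptionToken` arguments, and the XML namespaces are standard OAI-PMH 2.0. The base URL is a placeholder. Network access is left to a caller-supplied `fetch` function, which keeps the sketch testable offline.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, token=None, prefix="oai_dc", since=None):
    """The first request names a metadataPrefix; follow-ups send only the token."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": prefix}
        if since:
            params["from"] = since  # incremental harvest, e.g. a nightly run
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (dc:title strings on this page, resumptionToken or None)."""
    root = ET.fromstring(xml_text)
    titles = [t.text for t in root.iterfind(".//dc:title", NS)]
    tok = root.find(".//oai:resumptionToken", NS)
    # An empty resumptionToken element marks the last page.
    token = tok.text if tok is not None and tok.text else None
    return titles, token


def harvest(base_url, fetch, since=None):
    """Collect titles from every page; fetch(url) must return the XML body."""
    titles, token = [], None
    while True:
        page, token = parse_page(fetch(list_records_url(base_url, token, since=since)))
        titles.extend(page)
        if not token:
            return titles
```

In practice `fetch` could wrap `urllib.request.urlopen(url).read()`. A production harvester would also handle OAI-PMH error responses such as `noRecordsMatch`, which this sketch ignores.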
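The figures on slide 29 can be checked with simple arithmetic. Six cabinets × 24 drives × 1.5 TB gives 216 TB of raw disk, which matches "over 200TB". Suppose the Gluster volume keeps two copies of every file (an assumption, since the slide does not state the replica count). Usable space is then at most half the raw total: 108 TB in the decimal units drive vendors use, or about 98.2 TiB in binary units. Filesystem overhead could plausibly account for the remaining gap to the quoted 97TB.

```python
def cluster_capacity(cabinets, drives_per_cabinet, drive_tb, replicas):
    """Return (raw_tb, usable_tb, usable_tib) for a replicated volume.

    drive_tb is in decimal terabytes (10**12 bytes). The replica count is an
    assumption supplied by the caller. Filesystem overhead is ignored, so the
    usable figures are upper bounds.
    """
    raw_tb = cabinets * drives_per_cabinet * drive_tb
    usable_tb = raw_tb / replicas
    usable_tib = usable_tb * 10**12 / 2**40  # decimal TB -> binary TiB
    return raw_tb, usable_tb, usable_tib


if __name__ == "__main__":
    raw, usable, usable_tib = cluster_capacity(6, 24, 1.5, replicas=2)
    print(f"raw {raw:.0f} TB, usable <= {usable:.0f} TB ({usable_tib:.1f} TiB)")
```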
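Slide 31 lists Hadoop and Disco for mining BHL's OCR text. Both frameworks run jobs in the MapReduce style. The pure-Python sketch below shows that pattern at toy scale by counting candidate Latin binomials across pages: a map step per page, then an associative reduce that merges partial counts. The regular expression is a deliberately naive, hypothetical stand-in and is not TaxonFinder's algorithm. On real OCR it will both miss names and produce false positives.

```python
import re
from collections import Counter
from functools import reduce

# Naive binomial pattern: capitalized genus followed by a lowercase epithet.
# A hypothetical stand-in for a real name finder such as TaxonFinder.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+ [a-z]{3,})\b")


def map_page(text):
    """Map step: count the candidate names found on one OCR page."""
    return Counter(BINOMIAL.findall(text))


def reduce_counts(a, b):
    """Reduce step: merge two partial counts. Addition is associative,
    so a framework may combine partial results in any grouping."""
    return a + b


def count_names(pages):
    """Run the map step over every page, then fold the results together."""
    return reduce(reduce_counts, map(map_page, pages), Counter())


if __name__ == "__main__":
    pages = [
        "Quercus alba grows beside Quercus rubra.",
        "Notes on Quercus alba and Acer saccharum.",
    ]
    print(count_names(pages).most_common(2))
```

On a cluster, a framework like Hadoop or Disco would run `map_page` on many machines in parallel and merge the partial counters the way `reduce_counts` does.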
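Slide 34 describes bhl-sync as an open source "Dropbox model" built from inotify, lsyncd, OpenSSH, rsync and unison, where the remote server needs only a secure login. The actual code is at the URL on that slide. The sketch below is not that code. It only illustrates the core idea under stated assumptions: detect what changed since the last pass, then push the tree with rsync over SSH. Here change detection compares file size and modification time, a simple polling stand-in for inotify events. The rsync flags `-a`, `-z`, `--delete` and `-e ssh` are standard. Any paths and hostnames are hypothetical.

```python
import os
import subprocess


def snapshot(root):
    """Map each file path (relative to root) to its (size, mtime_ns)."""
    state = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            state[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return state


def changed(old, new):
    """Paths added, modified, or deleted between two snapshots."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))


def rsync_command(src, dest):
    # -a preserves permissions and times, -z compresses in transit,
    # --delete mirrors removals, and -e ssh means the remote side only
    # needs an SSH login. The trailing slash copies src's contents.
    return ["rsync", "-az", "--delete", "-e", "ssh", src.rstrip("/") + "/", dest]


def sync_once(src, dest, previous):
    """Run rsync only if something changed; return the new snapshot."""
    current = snapshot(src)
    if changed(previous, current):
        subprocess.run(rsync_command(src, dest), check=True)
    return current
```

Calling `sync_once("/data/bhl", "mirror@node.example.org:/data/bhl", {})` in a loop every few minutes roughly approximates what lsyncd does when it batches inotify events into rsync runs, at the cost of rescanning the whole tree on each pass. rsync itself still transfers only the differences.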
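Slide 37 covers assigning DOIs through CrossRef. A DOI consists of a prefix beginning with `10.` that identifies the registrant, a slash, and a suffix the registrant chooses. It resolves through https://doi.org/. The sketch below performs a simplified shape check and builds a resolver link. The prefix `10.99999` and the suffix scheme `bhl.title.<id>` are hypothetical placeholders, not BHL's actual DOI pattern.

```python
import re
from urllib.parse import quote

# Simplified shape check: "10." + numeric registrant code, "/", non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

HYPOTHETICAL_PREFIX = "10.99999"  # placeholder, not BHL's real prefix


def monograph_doi(title_id):
    """Build a DOI string for a monograph under a hypothetical suffix scheme."""
    return f"{HYPOTHETICAL_PREFIX}/bhl.title.{title_id}"


def resolver_url(doi):
    """Return the doi.org link for a DOI, rejecting strings of the wrong shape."""
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    # Keep "/" literal; percent-encode anything else unsafe in a URL path.
    return "https://doi.org/" + quote(doi, safe="/")
```

Building the suffix from a stable internal identifier, as sketched here, keeps each DOI unique without manual bookkeeping.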
