Biodiversity Heritage Library: Process & Progress
Phil Cryer, Biodiversity Heritage Library
Scripting Life: the science behind ViBRANT
January 20-21, 2011 - Paris, France
Biodiversity Heritage Library (BHL)
  • a consortium of global partners
  • aims to share historic biodiversity literature texts
  • provides open access to all content
  • free for all
bhl data stats
Content
  • 45,000 journals & monographs (8,821 in 2010)
  • 87,000 volumes (15,552 in 2010)
  • 32 million pages (5.6 million in 2010)
Usage (2010)
  • 837,000 visits
  • 422,000 unique visitors
  • 4.2 million page views
  • 221 countries/territories
new features
scanning request form: click on ‘Feedback’ to access
new user interface for names index: sortable columns, exportable via CSV, BibTeX and EndNote
downloadable article PDFs: create articles from BHL books
  1- enter metadata about the article
  2- select the pages of the article
  3- PDF request received
  4- PDF article arrives via email
CiteBank (http://citebank.org): open access repository for biodiversity publications
CiteBank (http://citebank.org): Solr search with faceting
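
The faceted search on this slide can be pictured as a Solr query with faceting switched on. This is a minimal sketch in Python, assuming a hypothetical Solr endpoint and hypothetical field names (`subject`, `publisher`). `q`, `rows`, `wt`, `facet` and `facet.field` are standard Solr request parameters, but nothing here is taken from CiteBank's actual configuration.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the slides do not give CiteBank's real Solr URL.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_facet_query(terms, facet_fields, rows=10):
    """Return a Solr select URL that requests facet counts for the given fields."""
    params = [("q", terms), ("rows", rows), ("wt", "json"), ("facet", "true")]
    # Solr accepts one facet.field parameter per field to be counted.
    params += [("facet.field", field) for field in facet_fields]
    return SOLR_SELECT_URL + "?" + urlencode(params)


# Field names below are illustrative assumptions, not CiteBank's schema.
url = build_facet_query("Darwin", ["subject", "publisher"])
print(url)
```

Each `facet.field` asks Solr to return value counts for that field alongside the matching documents, which is what drives the sidebar filters in a faceted interface.
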
CiteBank (http://citebank.org): individual bibliography page
CiteBank features
  • access the ‘crowd-sourced’ articles generated from the BHL scans (harvested from BHL)
  • platform for journals/publishers/societies in need of tools to store and share content
  • harvests metadata from ZooKeys, SciELO, Smithsonian collections nightly via OAI-PMH
  • new search index to BHL content using Solr
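
A nightly OAI-PMH harvest like the one listed above boils down to a `ListRecords` request limited to records changed since the last run. This is a minimal sketch in Python, assuming a hypothetical repository base URL and the `oai_dc` metadata format. `verb`, `metadataPrefix` and `from` are defined by the OAI-PMH protocol, but the slides do not show how CiteBank's harvester is written.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Hypothetical base URL: stands in for any OAI-PMH data provider.
BASE_URL = "http://example.org/oai"


def build_list_records_url(base_url, since, metadata_prefix="oai_dc"):
    """Return an OAI-PMH ListRecords URL for records changed since `since`."""
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "from": since.isoformat(),  # OAI-PMH datestamps use YYYY-MM-DD
    }
    return base_url + "?" + urlencode(params)


# A nightly job would ask for everything changed since the previous day.
yesterday = date(2011, 1, 20) - timedelta(days=1)
harvest_url = build_list_records_url(BASE_URL, yesterday)
print(harvest_url)
```

A real harvester would also follow the `resumptionToken` that the protocol returns when a result list is split across several responses.
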
CiteBank + BHL
  • expands our core features: content and tools for scholarly crowd-sourcing
  • users can get content they need, do minor work, share enhancements with community
  • look to add more content
  • integration with other existing platforms
      • EOL, Atlas of Living Australia, JSTOR Plant Science, BioStor and others
      • Mendeley, Zotero, RefWorks, etc.
More BHL features coming soon...
  • enhancements to the portal home page: more focus on search
  • special collections: Charles Darwin’s scientific library
  • scholarly annotations: annotations in Darwin’s hand and academic interpretation, crosslinking
bhl global
 
 
 
Benefits of Global BHL partnerships
  • redundancy and resilience: data and app mirroring
  • exposing unique content
  • new tools, services, people
  • opportunities for new collaborations: IMPACT, ViBRANT, OpenUp! in EU
storage clusters
Storage issues solved using clusters
  • all BHL data stored at the Internet Archive in San Francisco
      • no redundancy
      • limited in how we could serve our data and images
      • difficult to analyze data
  • first global BHL cluster gives us
      • redundancy and failover
      • many new serving options
      • new ways to run analytics, data mining
 
Open source software / commodity hardware
  • open source software
      • Linux operating system
      • Gluster distributed storage system
  • commodity hardware
      • Supermicro servers
      • ‘off the shelf’ hard drives and other system components
BHL Cluster 01
  • six 4U sized cabinets
  • twenty-four 1.5TB hard drives in each cabinet
  • 97TB of replicated and distributed storage (over 200TB of raw disk)
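
The capacity figures on this slide can be checked with a little arithmetic. This is a sketch in Python using the cabinet and drive counts from the slide. The replica count of 2 is an assumption (the slide says only "replicated"), and the slides do not explain the gap between the computed figure and the quoted 97TB. Filesystem overhead and decimal-versus-binary units are plausible contributors.

```python
CABINETS = 6              # from the slide
DRIVES_PER_CABINET = 24   # from the slide
DRIVE_TB = 1.5            # from the slide

# Assumption: every file is stored twice; the slide gives no replica count.
REPLICA_COUNT = 2

raw_tb = CABINETS * DRIVES_PER_CABINET * DRIVE_TB
replicated_tb = raw_tb / REPLICA_COUNT

print(f"raw disk: {raw_tb} TB")
print(f"capacity with {REPLICA_COUNT} copies: {replicated_tb} TB")
```

The raw total comes to 216TB, consistent with "over 200TB of raw disk", and halving it for two copies gives 108TB before any overhead.
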
 
Statistical computing
  • find relationships
      • R (GNU statistical language)
      • Hadoop, Disco
  • make existing data more useful
      • image and OCR reprocessing, taxonfinder
data sharing
Data sharing and replication
  • replicating BHL data globally
      • Marine Biological Laboratory (Woods Hole, US)
      • Natural History Museum (London, UK)
      • Bibliotheca Alexandrina (Alexandria, EG)
      • Atlas of Living Australia (Canberra, AU)
      • China... Brazil...
  • advantages to replication
      • redundancy, failover
      • load balancing
      • geographical distribution
Software for data sync
  • grabby: handles initial download from Internet Archive (IA)
  • bhl-sync: open source Dropbox model
      • handles syncing remote nodes automatically
      • uses inotify, lsyncd, OpenSSH, rsync, unison
      • remote server only requires a secure login
  • open source code available at http://bit.ly/bhl-bits
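
The "remote server only requires a secure login" point follows from running rsync over SSH. This is a minimal sketch in Python, assuming hypothetical paths and a hypothetical host name. It is not the bhl-sync code, which is published at the link above. It only builds the command line that a sync step of this kind might run, using standard rsync options.

```python
import shlex


def build_rsync_command(source_dir, remote_host, remote_dir, dry_run=True):
    """Return an rsync-over-SSH argument list that mirrors source_dir to a remote node."""
    cmd = ["rsync", "--archive", "--compress", "--delete", "-e", "ssh"]
    if dry_run:
        # Report what would change without copying anything.
        cmd.append("--dry-run")
    # A trailing slash tells rsync to copy the directory contents, not the directory itself.
    cmd += [source_dir.rstrip("/") + "/", f"{remote_host}:{remote_dir}"]
    return cmd


# Hypothetical host and paths, for illustration only.
command = build_rsync_command("/data/bhl", "mirror.example.org", "/data/bhl")
print(shlex.join(command))
```

A tool such as lsyncd watches the local tree through inotify and triggers a transfer like this when files change, so the remote node needs nothing beyond an SSH account and rsync.
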
Fedora-commons integration
  • digital repository platform
  • enables storage and management of digital content
  • maintains a persistent digital archive
  • stores data in a neutral manner
  • provides backup, redundancy, disaster recovery
  • shares data to remote nodes via OAI-PMH
future plans
Assigning DOIs (Digital Object Identifiers)
  • BHL is a member of CrossRef through the Smithsonian
  • will start assigning DOIs to BHL monographs: easy, non-controversial
  • provides open access of all content
  • then move on to articles and other publication types
  • CrossRef rules make full assignment challenging for crowd-sourced articles
Wish list for 2011 and beyond
  • OCR correction: a big problem, no easy solution
  • add more content: partnerships, CiteBank
  • sustainability planning and funding: committed to no fees for users
  • more outreach
      • conferences, marketing
      • Facebook, Twitter and other social media avenues...
BHL is social!
  • http://biodiversitylibrary.blogspot.com
  • http://twitter.com/BioDivLibrary #bhlib
  • http://facebook.com/pages/Biodiversity-Heritage-Library/63547246565
  • http://flickr.com/groups/bhl
  • http://youtube.com/user/BioHeritageLibrary
  • http://biodiversitylibrary.org/RecentRss.aspx
  • http://slidesha.re/bhl-slides
Thanks.
  • slides: slidesha.re/bhl-slides
  • contact: [email_address]
Phil Cryer, Biodiversity Heritage Library
Scripting Life: the science behind ViBRANT
January 20-21, 2011 - Paris, France

Biodiversity Heritage Library: progress and process


Editor's Notes

  • #22 BHL has gone Global
  • #23 Sharing, distribution and delivery of content
  • #24 Global data sharing requires a social infrastructure