Building a large digital library and interacting with the World: BHL


This talk was given at the 9th Digital History and Philosophy of Science Workshop, held at Cambridge University, England, on 5-8 September 2012. The talk was part of the portion of the Workshop sponsored by CRASSH, the Centre for Research in the Arts, Social Sciences and Humanities of Cambridge University: "If you build it, will they come? Mobilising online communities for research", held September 6, 2012.

Published in: Education, Technology
  • I would like to thank Alison Pearn for asking Diane Rielinger and myself to speak here today about the Biodiversity Heritage Library and how we have used social media and crowd sourcing to advance our library and aid researchers in their work. In the last month my colleague Diane became co-Director of the Library of the Marine Biological Laboratory and the Woods Hole Oceanographic Institution, where we work, and her new schedule has kept her from being here. We worked on this presentation together, and Diane sends her greetings and best wishes to everyone. I would also like to thank Jane Maienschein and Manfred Laubichler of the Center for Biology and Society at Arizona State University and our ASU-MBL History and Philosophy of Science Program for asking me to join them here. I’m Matthew Person, and while I hold down a number of varied roles related to serials management and social media at what people in Woods Hole call the MBLWHOI Library, right now I am speaking to you from my experience as a librarian staff member and serials specialist in the Biodiversity Heritage Library. I am excited to be here today and I look forward to speaking with you about BHL.
  • I attended an early meeting of the BHL in 2007 at the Missouri Botanical Garden in St. Louis. As we began scanning literature for the BHL at that time, we did not know all the elements of how to create a digital library, but the first BHL partners met and discussed what we imagined to be the key elements of a mass digitization project. We then ventured into our respective library stacks, selected books based upon the strengths of our collections, processed the metadata, and sent the volumes on a truck 79 miles to our scanning partner, the Internet Archive scanning center at the Boston Public Library. A number of days later the books appeared on the BHL website, and there you have the beginnings of the Biodiversity Heritage Library: a very hands-on process. If you multiply this by almost 40 million pages, you will be astounded. We were working librarians meeting great goals, and there were also inevitable errors and gaps, metadata oddities, and unforeseen issues which needed to be addressed to make this library as clean and accurate as the libraries from which the scanned objects originated.
  • I am going to begin by introducing our website to you. Here’s a screen shot of the BHL homepage, which was designed in a functional manner by our technical staff. If we move clockwise from the upper left we see current library stats, which I’ll show you again shortly. There’s a basic search window; in the center, a donation button and an “add me to the mailing list” button; social media connections in the upper right corner; and links to blog posts, Twitter, and our featured collection. Featured collections are a way to draw attention to content within content.
  • Here is the homepage for our sister BHL node, BHL Australia. There has been extensive cooperation between our BHL US/UK project and the BHL Australia project. Last year we asked members of the BHL user community to participate in a hands-on comparison of the two sites. (I should add that Erick Pierson, here from ASU, and I performed a live usage study via a video hookup and synchronized computer screens while he was in Arizona and I was in Woods Hole.) Much of the information collected during this study was used to understand nuances in website functionality and visual design, and it will be used in the merging of the two websites later this year. We also periodically run online usage surveys, and the answers coming from our user community help inform priorities and other aspects of our development.
  • Returning to the nuts and bolts, in this quick tour, this is a slightly blown up view of the title page of a volume in BHL,
  • And I have clicked into a text page here. Note the blue arrow on the left, and note the scientific names being pointed to. Software called uBio, developed at the MBLWHOI Library, extracts scientific names from the scanned text and displays the liberated, linked names. In this case I am pointing to the name of the plant Astragalus, or milk vetch. If I click on the name of this species,
  • the Biodiversity Heritage Library instantly creates a linked bibliography of instances of that plant in BHL-served literature. Now, if you look at the blue oval circling the small logo of the Encyclopedia of Life at top, if I click on the EOL logo,
  • we go to the Encyclopedia of Life web page for this plant. Looking at the blue oval on the upper central right once more, we can click on the Literature tab, which will take us to an EOL page for this plant, similar to the bibliography we previously created on our way from BHL to EOL. Here’s a little background on EOL: the Encyclopedia of Life was the brainchild of Harvard University entomologist Edward O. Wilson, who, when he won the TED Prize in 2007, expressed his wish that “we will work together to help create the key tool that we need to inspire preservation of Earth’s biodiversity: the Encyclopedia of Life ... a webpage functioning as a mini website for each of the 2 million known species on the earth”. Our key partner EOL now numbers one million online species pages, and BHL is the literature component of EOL.
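
One way to picture the name-to-bibliography link shown in this part of the tour is as an inverted index from scientific names to the pages that mention them. The sketch below is only an illustration under stated assumptions: it is not how uBio or the BHL site is implemented, and the name list, function name, titles, and page texts are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical name list; a real service would draw on a large taxonomic name index.
KNOWN_NAMES = ("Astragalus", "Progne subis")


def build_name_index(pages, known_names=KNOWN_NAMES):
    """Map each known scientific name to the (title, page number) pairs where it appears."""
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b") for name in known_names
    }
    index = defaultdict(list)
    for title, page_number, text in pages:
        for name, pattern in patterns.items():
            if pattern.search(text):
                index[name].append((title, page_number))
    return dict(index)


# Made-up OCR snippets standing in for scanned pages.
pages = [
    ("Flora of the Plains", 12, "The genus Astragalus, or milk vetch, is common here."),
    ("Flora of the Plains", 13, "No vetches were seen on the northern slope."),
    ("Birds of Ohio", 4, "Progne subis nests in colonies; Astragalus grows nearby."),
]

index = build_name_index(pages)
for name, hits in index.items():
    print(name, hits)
```

Clicking a name in the viewer would then amount to looking up that name in the index and listing the matching titles and pages as a bibliography.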
  • The BHL and EOL are closely associated on many different technological and administrative levels. We also collaborate on crowdsourcing the improvement of BHL Flickr image metadata so that the images can be included on EOL species pages, and on coordinating public outreach, as here, for example, at the American Library Association meeting in Dallas last January.
  • Continuing this brief tour of the BHL, the impetus for the Biodiversity Heritage Library began at a conference called “Library and Laboratory”, which was held at the Natural History Museum in London in 2005. Attendees of this meeting foresaw that an historic opportunity was brewing, through which lower cost technology could be applied to the physical printed page to mass digitize entire collections, and tools could be applied to the digitized content to make it more technically powerful than it was when it was originally published.
  • At a global conference the BHL held at the Field Museum in Chicago last year, our now retired BHL director, Tom Garnett of the Smithsonian Institution, reiterated our definition of our Library. He said: “The Biodiversity Heritage Library (BHL) is a consortium of natural history and botanical libraries that cooperate to digitize and make accessible the legacy literature of biodiversity held in their collections and to make that literature available for open access and responsible use as a part of a global ‘biodiversity commons’.” There are other projects associated with BHL as well, such as the Charles Darwin’s Library project, which is accessible through the BHL website and is sponsored by the Natural History Museum in London, the American Museum of Natural History, the Cambridge University Library, JISC, and the National Endowment for the Humanities.
  • The global involvement in this project has expanded with BHL nodes in China, Egypt, Europe, Brazil, and Australia, and meetings were held in Cape Town this past summer to begin the formation of an African BHL node. One could say that the formation of the BHL itself was a response by the natural history library community to a collaboratively expressed international need to digitize the world’s legacy biodiversity literature for science communities. Perhaps that’s a form of institutional crowd sourcing. Shortly I will speak of how we have used crowd sourcing to further the development of BHL.
  • But here is where we are now: close to 40 million scanned pages and 99 million linked scientific names online.
  • Early on in this project we realized that the sheer volume of what we were doing made it probable that errors requiring correction would occur somewhere in the scanning workflow: everything from a scanning technician’s fingers blocking an image scan, to pages which had not been scanned, to other gaps and situations of all sorts. Even developing a comprehensive quality control project did not mean that all of these errors would be caught. Gemini is an open source online form tool for issue resolution and scanning suggestions, which transmits information from our users directly to the BHL staff.
  • … additionally, of this Gemini tool BHL librarian Grace Costantino has said: “It harnesses the energy of our users and allows them to do some scanning prioritization work for us… it has been a key element to our success. By us responding individually to each user via email regarding their feedback, we also make them feel involved and invested in our project… Scanning based on user requests ensures that our limited funds are spent on titles most needed by our user community.”
  • In this case it is a library user who notices an issue with a volume and lets us know about it by clicking the feedback button at the top, or the exclamation point button on the right,
  • they can fill out a form which is forwarded to our staff at the Smithsonian, where the issue is sent to the library which originally scanned the volume; if it is a scanning suggestion, it is forwarded to a library or libraries which hold the volumes. When the issue is assigned to a library, the Gemini system sends an email alert to every library related to the issue.
  • Staff members can view and work on issue resolution collectively. So our user crowd informs and directs us in an essential way here: they help us make the library more accurate and complete, so that users anywhere with a networked computer can perform their research.
  • The end result is that the volume in which a user noticed an error is now corrected and whole.
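
The feedback routing just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Gemini's actual data model or API: the `Feedback` class, its field names, the `route` and `alerts` helpers, and the library names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Feedback:
    """A hypothetical user submission from the feedback form."""
    kind: str                          # "issue" or "scan_request"
    title: str
    scanned_by: Optional[str] = None   # library that originally scanned the volume
    held_by: List[str] = field(default_factory=list)  # libraries holding the volume


def route(feedback):
    """Return the libraries that should receive this feedback."""
    if feedback.kind == "issue":
        # Problems with an existing scan go back to the library that scanned it.
        return [feedback.scanned_by] if feedback.scanned_by else []
    if feedback.kind == "scan_request":
        # Scanning suggestions go to every library that holds the volume.
        return list(feedback.held_by)
    raise ValueError(f"unknown feedback kind: {feedback.kind!r}")


def alerts(feedback):
    """Build one alert message per library related to the feedback."""
    return [
        f"To {library}: {feedback.kind} reported for '{feedback.title}'"
        for library in route(feedback)
    ]


issue = Feedback("issue", "Example Volume", scanned_by="Library A")
request = Feedback("scan_request", "Example Journal", held_by=["Library B", "Library C"])
print(alerts(issue))
print(alerts(request))
```

Keeping the routing rule separate from the alert text makes it easy to see who is notified for each kind of submission.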
  • There are other ways our “full service” digital library harnesses the power of library users
  • Here is our Flickr page. We began extracting illustrations from BHL content as a way to participate in Flickr, the online photo posting and sharing website, and as a way to highlight a unique aspect of our collection: illustrations connected to text. The taxonomists and funders who guided much of the early BHL work were less interested in illustrations, and in the beginning we were just keeping up with basic scanning, but interest from the general public, citizen scientists, and those who have other related interests has been substantial. As I mentioned earlier in relation to BHL-EOL cooperation, images uploaded to Flickr can be used by our partner project, the Encyclopedia of Life, if the images have proper identifying metadata attached to them; in this case we are talking about Flickr tags.
  • To this end we have held, in cooperation with the Encyclopedia of Life, “Flickr tagging parties” at which we invite users to add proper metadata to images so that they can be included on Encyclopedia of Life species pages.
  • Tagging BHL images which have been uploaded to our Flickr page involves some research on the part of the volunteer tagger, who adds tags in a specific format. The photos which have been tagged are then added to the EOL Flickr pool of images, which is in turn uploaded to EOL and associated with the proper species pages. The Smithsonian Libraries BHL staff has conducted two Flickr tagging events with the Encyclopedia of Life recently, and more events are planned, such as a side event to our presence at the Ecological Society of America meeting in the US in the next year.
  • … and as an individual, if you are on the BHL Flickr site, you can follow directions to do some tagging yourself…
  • In a discussion about holding Flickr tagging events, librarian Gilbert Borrego of the Smithsonian, who has been involved in developing the tagging events, has pointed out pros and cons. He says: “It is also difficult to determine what results you would get after the event and what even qualifies as a good result versus a poor one. How many image edits makes it all worth the time it takes to plan and hold an event such as these? Who’s to say? The quality of the data is also something to think about. Do citizen scientists provide the same quality of data as trained scientists? Again, hard to quantify.” Though…
  • Looking at these statistics, something positive must be happening in community building in relation to our Flickr presence, because in the first 10 months of our Flickr work there were 390,000 views of 30,000 images in our Flickr account. Twenty-five years ago, notable scientific literature images found their way into a broader realm only if users went to library rare book rooms to view them, or if they were part of an exhibition or exhibition catalog. All too often thieves razored them out of a volume and sold them in a print shop, which is an issue we no longer face in the digital age in the library I work at, because anyone can download the images for free and do whatever they wish with them.
  • Here’s a look at our Facebook page.
  • Staff create two regular types of posts on the BHL Facebook page. The first is daily quizzes, which pose questions about biodiversity for users to answer. The second is biodiversity news updates, which are posts highlighting news stories relevant to the biodiversity realm. Whenever possible, staff link all posts back to BHL content.
  • Here’s an example of a Facebook quiz post. This one asked which of these is commonly known as tooth fungus; the answer is Hericium erinaceus, in the upper right box. Over 3,000 people initially see these posts,
  • … and looking at the Facebook statistic which shows the eventual reach of posts, in the first quarter of 2012 our posts eventually reached 77,000 Facebook users. While they may not all be doing scientific research, they are the taxpayers and citizen scientists who help drive broader forces of interest towards science, so their participation is welcomed by the BHL.
  • On Twitter we have over 1,800 followers, and we follow a program similar to what we do on Facebook, regularly posting different categories of posts. On occasion we target our social media around a particular event,
  • such as international Shark Week…
  • You can see from these figures showing spikes in social media traffic how impactful coordinated social media can be. BHL librarian Grace Costantino speaks of using such a coordinated approach in developing a form of brand loyalty to our project, which we might be able to draw upon in projects which require lots of fingers on keyboards to correct unclear metadata, for example.
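
The Shark Week figures are reported as a percentage increase over the previous two months' average traffic. The sketch below shows that arithmetic, assuming the campaign count and the two baseline counts cover comparable periods. The function name and the visit counts are hypothetical and are not BHL data.

```python
def percent_increase(campaign_visits, baseline_visits):
    """Percentage change of campaign traffic relative to the mean of the baseline periods."""
    baseline = sum(baseline_visits) / len(baseline_visits)
    return (campaign_visits - baseline) / baseline * 100


# Hypothetical referral counts: two baseline months, then the campaign period.
facebook_change = percent_increase(584, [380, 420])
twitter_change = percent_increase(450, [190, 210])
print(round(facebook_change), round(twitter_change))
```

Averaging the two prior months smooths out a single unusually quiet or busy month before the comparison is made.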
  • We are also working to increase our presence as a scientific and cultural resource on Wikipedia through a National Endowment for the Humanities grant called the Art of Life Project, which is developing schema for sharing BHL image metadata. We are also studying Wikisource, which is a tool that would allow uploading and crowd sourcing of OCR text correction. We have also built themed bibliographies of titles which have become collections on iTunes University. None of this reaching out to broad cultural and social media resources is flippant; it all revolves around the collections we have scanned into the BHL, estimated to be about 33% of the total pre-1923 texts and 7% of the total core biodiversity taxonomic literature. And while all of this is going on, there are scientists in our crowd community, such as Rod Page at the University of Glasgow, responding to BHL; he has independently developed a tool called BioStor.
  • And while all of this is going on in our citizen scientist and general public community, there are scientists in our crowd community such as Woods Hole independent scientist Ryan Schenk, who has developed a tool called Synynyms for visualizing scientific name usage through time.
  • As mentioned previously, last November we held our Life and Literature conference at the Field Museum in Chicago. This event, held five years into the project, was open to anyone interested, whether librarian, scientist, schoolteacher, or citizen scientist. The conference was an affirmation of the BHL, and it was also an invitation to the attendees to challenge us to move in new directions. Through breakout sessions, the ideas and opinions of the attendees were discussed, recorded, and synthesized by staff members to influence the BHL plan for our future.
  • The four concurrent 60-minute breakout sessions covered four areas: content and connections, new collaborations, technical advancements, and educational outreach. Each discussion period was moderated to cover four directed areas of discussion: Expand, Innovate, Maintain, and Avoid. I attended the Educational Outreach group and learned about a number of school and museum projects in Australia which were using BHL content as a teaching and creative project tool. In the Technical Advancements group, “gaming with a purpose” was recognized as something to pursue.
  • App-style gaming has been an area of discussion in BHL for some time. We have not developed any online games as yet, due to a lack of funding and limited staff time, but Chris Freeland, our former Technical Director, has spoken to this at meetings and in blog posts. By gaming, we mean some kind of “app” which allows a user to play a simple game on a computer or mobile device, and which in reality would be interacting with some BHL improvement project. Time-consuming issues in need of resolution which could lead to the development of games include, for example, pagination, which is the creation of page-specific metadata. Other issues which could become themes for game development (some of these have been previously mentioned) are: rekeying of tables of contents or scientific names; image identification and extraction; adding common names to scientific names; geolocation tasks; editing volume information and re-sequencing volumes; and correcting OCR text.
  • Chris has pointed out that the type of gaming he was speaking about is similar to the National Library of Finland's Digitalkoot games website, which has simple games developed to assist in OCR correction of scanned newspapers.
  • To quickly wind down this review: the Biodiversity Heritage Library began as a broad project by multiple natural history institutions working collectively and cooperatively, responding to the science community's need to have the vast corpus of legacy biodiversity literature digitized. By necessity we have developed relationships with our library users to help guide our development priorities, and also to assist us in the day-to-day operation of the library, relying on the crowd’s well-tuned instincts as a quality control mechanism. We have also reached into social media to engage vital communities beyond scientists, to create engaging resources which serve our users, make our library more accurate, and go beyond just scanned text, and so to create an online library as rich and as significant as the sum total of all the great libraries which constitute our collections. Thank you.

    1. Building a large digital library and interacting with the World: the Biodiversity Heritage Library. Founded in 2006. Matthew Person, MBLWHOI Library, Biodiversity Heritage Library, ASU/MBL HPS Program
    2. pagination box
    3. Behind every stack of books is a flood of knowledge. Artist: Jacek Yerka
    4. Biodiversity Heritage Library. Library and Laboratory: the Marriage of Research, Data and Taxonomic Literature. London, February 2005. Eighty participants from 22 countries gathered to discuss the status and future of access to the taxonomic literature and to propose an agenda for actions that would improve the research environment for taxonomy. The participants were biologists; librarians; conservation workers; publishers; representatives of learned and professional societies, private foundations and government agencies; and specialists in information technology. [Image: Progne subis, Purple Martin. Illustrations of the nest and eggs of birds of Ohio, 1879-1886]
    5. GEMINI: crowd sourced issue resolution. Early on in this project we realized that the sheer volume of what we were doing was opening up the probability that somewhere during the scanning workflow errors would occur which required correcting. Everything from fingers in a scan image to pages which had not been scanned, and gaps of all sorts. Even a comprehensive quality control project did not mean that all of these errors would be caught. Gemini is an open source issue resolution tool.
    6. … additionally, through this Gemini tool BHL librarian Grace Costantino has said: “Harnessing the energy of our users and allowing them to do the prioritization work for us … (through clicking on a feedback button) … has been a key element to our success. By responding individually to each user regarding their feedback, we also make them feel involved and invested in our project… Scanning based on user requests ensures that our limited funds are spent on titles most useful to our user community.”
    7. There are other ways our “full service” digital library harnesses the power of the library consumers to: grow; fill in gaps; correct errors; assist in research; harness social media for the above
    8. “It is also difficult to determine what results you would get after the event and what even qualifies as a good result versus a poor one. How many images and article edits makes it all worth the time it takes to plan and hold an event such as these? Who’s to say? The quality of the data is also something to think about. Do citizen scientists provide the same quality of data as trained scientists? Again, hard to quantify.” (BHL librarian Gilbert Borrego)
    9. Each week, staff create two regular types of posts on the BHL Facebook page. The first is daily quizzes, which pose questions about biodiversity for users to answer. The second is biodiversity news updates, which are posts highlighting news stories relevant to the biodiversity realm. Whenever possible, staff link all posts back to BHL content.
    10. Facebook Statistics, Q1 2012. People Interacting with Page: 1,789; Interactions: 2,438; New Page Likes: 449; Total Page Likes: 2,111; Page Unlikes: 30; Engaged Users: 4,948; Reach of Posts: 77,114
    11. Shark Week!
    12. Summary of Shark Week Results. Facebook: We saw a 46% increase in traffic to our website originating from Facebook during the Shark Week campaign as compared with the previous two months’ average traffic from Facebook. Twitter: We saw a 125% increase in traffic to our website originating from Twitter during the Shark Week campaign as compared with the previous two months’ average traffic from Twitter. Flickr: We saw a 50% increase in traffic to our website originating from Flickr during the Shark Week campaign as compared with the previous two months’ average traffic from Flickr.
    13. We are also working to increase our presence as a scientific and cultural resource on Wikipedia through a National Endowment for the Humanities grant called the Art of Life Project, which is developing schema for sharing BHL image metadata. We are also studying Wikisource, which is a tool that would allow uploading and crowd sourcing of OCR text correction. We have also built bibliographies of titles which have become collections on iTunes University. None of this is flippant; it all revolves around the collections we have scanned into the BHL, estimated to be about 33% of the total pre-1923 texts and 7% of the total core biodiversity taxonomic literature.
    14. BHL and Purposeful Gaming. Pagination; Image tagging. Source: Holland, W.J. The butterfly book; a popular guide to a knowledge of the butterflies of North America. Garden City, NY: Doubleday, Page & Company, 1922. Source: Baker, Samuel White, Sir. Wild beasts and their ways. London: Macmillan and Co., 1890.
    15. Thanks to Diane Rielinger, MBLWHOI Library co-Director; Biodiversity Heritage Library Staff; MBLWHOI Library Staff; and the ASU-MBL History and Philosophy of Science Program