British Library Labs - Presentation at the University of Nottingham - Digital Humanities series




Presentation given by Mahendra Mahey about British Library Labs at the University of Nottingham, Digital Humanities seminar series.





Usage Rights

© All Rights Reserved

  • 150 seconds: Now on to the structure of my talk. I will first give a very brief overview of the Library and then tell you a number of ‘stories’, mostly from a Humanities perspective, about how researchers did things in the past and how that is changing because of rapid developments in digital technology. With more and more digital content, data, tools and services being made available, researchers are able to ask questions they had never dreamed of before, share their findings openly and collaborate; some of them are becoming ‘digital scholars’. I will then bring the story back to the British Library and how the digital scholar is changing the way we do things. Moving on to digitisation across the British Library, I will give a whistle-stop tour of some of the incredible digital collections we now have and highlight some of the challenges that we face given our historical origins, licensing and technical restrictions. Importantly, I will also try to address how we are tackling some of these challenges. I will outline the work of the Digital Scholarship department, created to support the changing research landscape, focusing particularly on the work of the Digital Research Team and that of British Library Labs, both of which sit in the same department. I will point out some of the surprising findings we have made, some of the lessons we have learned so far and what we are planning for the future. Finally, I will finish with some important ‘take away’ messages and an opportunity for you to ask questions.
  • 85 seconds: The British Library is the national library of the UK and one of the largest research libraries in the world. The picture you can see is inside the main building in London; it’s the King’s Library, King George the Third’s personal library! It is sometimes known as the ‘stack’. I walk past it every day, and I sometimes look at it in awe and am reminded that the collections the British Library holds are truly staggering! We currently estimate them to exceed 150 million items, representing every age of written civilisation and every known language. Our archives contain the earliest dated printed book in the world, the Diamond Sutra, written in Chinese and dating from 868 AD. So, some big numbers. Over: 14 million books; 60 million patents; 8 million stamps; 4 million maps; 3 million sound recordings; 1.6 million music scores; 0.3 million manuscripts; 0.8 million serial titles (which are of course made up of many volumes/editions). Just in case you’re wondering why the numbers don’t add up to 150 million, this is where a lot of our content is.
  • 50 seconds: So, the very nature of digital allows us to break what were previously bound items down into fundamental bits of information and data. These bits of data can be recombined, duplicated and linked in infinite ways. This is fundamentally changing our view of research. It’s a bit like the ‘Tower of Babble’ sculpture to the right by Brian Dettmer, created by recombining bits from books, with words and sentences cut out and put back together in different ways to create something new, surprising and beautiful. This is what scholars are doing with digital content. Let us now move on to what is understood by the term ‘digital scholar’.
  • 25 seconds: Digitisation is transforming access for researchers. It is spreading the value of collections, content and expertise. It is about connecting, collaborating and sharing as much as it is about collecting, e.g. through social media and by encouraging others to integrate our materials into their services, and vice versa.
  • 85 seconds: The British Library faces many challenges of access to our digital collections! Sometimes digital content is only available on-site due to licence restrictions, or even only on a specific computer in a reading room! Technically there are very few reasons why digital content can’t be online, though it might be too big or might not yet have been transferred from other digital storage media. Sometimes access is through a paywall. Finally, some content is in the happy sunny place: online, open and freely available. The real reasons why there are challenges to accessing digital content are of course human. They require different approaches from the Library and may often involve an honest, open dialogue and negotiation with the publishers. The Labs project has tried to address this problem by creating a ‘residency model’ for researchers to work intensively with a digital collection on-site, so as not to infringe access conditions. I will say more about this later.
  • 45 seconds: The role of the Labs is to open up as much digital content as it can; there are other parts of the Library that want to do the same. The "Picturing Canada" collection is a series of photographs from the Canadian Copyright Collection held at the British Library. They were digitised with Wikimedia UK and the Eccles Centre for American Studies, and 5,374 images have been uploaded to Wikimedia Commons in high resolution. This demonstrates that the Library is using open models for releasing the digital content it curates. There are currently 3 collections of copyright-free images on Wikimedia Commons.
  • 50 seconds: In his book, The Digital Scholar: How technology is transforming scholarly practice, Martin Weller suggests that ‘Digital Scholar’ should be used as a loosely defined shorthand term. First of all, the person does NOT necessarily need to be a recognised academic or someone who posts online. It is someone who employs digital, networked and open approaches to demonstrate their specialism. Let us now look at the area of Humanities, where our scholar Pieter Francois does his work, to investigate the idea of a Digital Scholar a little further.
  • 45 seconds: Formed in 2010 and led by Dr Adam Farquhar, the Digital Scholarship department’s mission is to “…become a leading centre of digital scholarship … internationally recognised for innovation and collaboration in support of research and learning”. Within the Digital Scholarship department, the Digital Research Team, with its Digital Curators, works very closely with Labs. I will now talk about some of the activities of both of these teams to give you an idea of the work that we do.
  • 150 seconds: My colleague Stella Wisdom, Digital Curator, was one of the organisers of the Off the Map competition for 2013, where videogame design students had to turn historic maps and engravings from the British Library’s collections into a 3D environment using Crytek's CRYENGINE software. First prize went to Pudding Lane Productions, six second-year students from De Montfort University, Leicester. Their entry used maps of London and recreated a world that was destroyed by the Great Fire of London in the 17th Century, which started in Pudding Lane. Let’s take a brief look at their winning entry. (Cue up from 13 seconds to 133 seconds.) A new competition is launching soon, Off the Map Gothic 2014, which will use digitised Gothic items from the Library to inspire Gothic-themed 3D environments. The results will be showcased at our Gothic exhibition, Terror & Wonder: The Gothic Imagination, at the end of this year, in November 2014.
  • 45 seconds: The Digital Curators support the development of staff skills in the Library through a bespoke Digital Scholarship Training Programme. In the words of one of the attendees: “It is about helping librarians and curators at the British Library acclimatise to the idea that the Library is becoming a place full of data as much as it is a place full of physical stuff, and that there is a growing community of users who see it that way”. They offer 15 courses several times a year (animate slide).
  • 40 seconds: The aim of the Lab is to encourage scholars to experiment at scale with our digital collections and data. The team holds competitions and events, and creates the space in which to engage with scholars. Through Labs we’re learning how to better support scholars and build new services. Our website is available at [link missing in source]. The project is kindly funded by the Andrew Mellon Foundation. We would like to announce that the project is in the process of receiving two more years’ funding, until 2017, and we would welcome opportunities to work together.
  • 62 seconds: The primary purpose of Labs is to open up as much digital content as possible for researchers and software developers (sometimes they are the same person) and to encourage them to use the Library’s content in their research, primarily in UK academia but, where appropriate, anywhere else in the world. Labs sits within the Digital Scholarship Department at the British Library and works almost daily with the Digital Research Team. It also works with the Access and Reuse Group, a cross-departmental group that meets once every six weeks to deal with requests to openly license digital content. Labs co-operates internally with curators, researchers and technical staff in order to understand the ‘story’ behind a collection and the technical issues involved in providing access to the digital content.
  • 65 seconds: This is how Labs works. We adopt a data-driven approach to encourage scholars to do research and development with and across British Library digital collections and data. A researcher / developer (again, sometimes the same person and sometimes not) comes up with an idea and engages with Labs through various mechanisms such as competitions, events and projects. Through this process the Library learns how better to support digital scholars and how to build on existing processes or create new ones, as well as how to make tools (e.g. APIs) and services. The case studies are some of the outputs we hope to create that will help other research libraries around the world wanting to build Labs for their digital content; others include open software and publications.
  • 115 seconds: Finding openly licensed collections is sometimes like detective work; there are at least 700 digital collections. So, how do we choose content? Lessons learned through a time-limited project like Labs have taught us to use the following 4 filters for digital content: (1) Is the copyright cleared for research and non-commercial use? (2) Is it curated (is there someone who knows the ‘story’ about the collection)? (3) Is there collection / item level metadata available (and, importantly, what state is it in; does it need cleansing)? (4) Finally, where is it? These have been effective filters for doing the work of Labs in an agile way. Labs has therefore identified several collections at the website above; some are shown in the slide. Due to our licensing conditions, we are in the process of text mining the abstracts for a large number of journal titles in electronic form. The visualisation indicates the subject spread of our collections. We have been harvesting the UK Web since 1993 and this is available as a resource under specific conditions for research. We are also investigating the use of our item request data (around 17 million records) and anonymised reader data, data protection allowing. The British National Bibliography has over 3 million catalogue records, licensed under CC0, from the British and Irish National Library catalogues. More information is available on the Labs website.
  • 180 seconds: So what kinds of digital research methods are these digital scholars using, especially in the area of Digital Humanities? Searching for items based on time and location can reveal very interesting patterns, e.g. when and where works were published. For example, one researcher is looking at the evidence of copy and paste in newspapers in the 19th Century, which was a common practice back then. Knowing where and when items include text from a source can reveal patterns of how the text travelled over time. Geotagging objects, putting them in space, can add new dimensions to the kinds of research questions we might want to ask. Corpus analysis is the analysis of language in collections of text, and text mining is about finding patterns in text through computational analysis, for example number crunching (a lot of it based on counting words). Tasks that require humans to use technology to complete something that computers would find hard to do fall under the area of Crowdsourcing and Human Computation. For example, reCAPTCHA gets users to improve the text of scanned books by typing in words they see; these are words that computers couldn’t recognise through Optical Character Recognition. reCAPTCHA gets humans to do this in micro-tasks when they log in to websites that require additional authentication. Amazon’s Mechanical Turk is another form of human computation, where tasks that computers would find very hard to do are outsourced to humans. Annotation involves augmenting an item with additional information, usually text, but not necessarily, e.g. highlighting an area, a drawing etc. Natural Language Processing is used in the analysis of speech, for example. Similarly, transcribing can be the conversion of speech into text, through human or computing power, to then be used for further analysis. Providing Application Programming Interfaces, or APIs, to data can be a very powerful way to access datasets, and APIs can even be used by software developers to build software applications on top of them. Many researchers want to see the patterns that are emerging in large amounts of data and are now using a number of very powerful tools to visualise them. This website from our launch event has 6-minute videos of presentations from researchers using digital research methods. What is clear is that digital research methods are much more than searching for an individual item in a catalogue, and libraries, publishers, service and content providers have to change to support that.
  • 80 seconds: A major part of Labs activity is to run an annual competition. As mentioned, we adopt a ‘data driven’ approach, encouraging researchers to look at our data, talk to us and, more importantly, talk to each other, and to submit ideas and project plans for what they could do in a 4-6 month residency at the British Library. This ‘residency model’ enables researchers to get access to pretty much all the digital content they require without any licence restrictions, and we get to engage with them deeply to learn about what they want to do and, importantly, what we need to learn as a library to support digital scholars better. We worked in an agile way with two researchers, Pieter Francois (remember him from earlier?) and Dan Norton, over a 4 month period on their research questions and ideas. Let’s now look briefly at their ideas and what was achieved.
  • 2 minute video as part of the Made in British Library Series.
  • 70 seconds: Pieter’s project was “The Sample Generator”, a tool to help a researcher by providing representative samples of digitised (as well as physical) materials they are interested in researching. This is as opposed to facing the daunting task of sifting through thousands of records to find a representative sample to start working on. Pieter’s area of interest was European travel, but the idea of the sample generator could work for any subject. We gained a deeper understanding of the distribution of digitised material to date. Pieter’s analysis showed that, while extensive, digitised material is not representative of published output. As a consequence, researchers must take additional care when trying to sample representative content using statistical methods, a problem which The Sample Generator starts to address. From this screen shot you can see the distribution of all the books in the 19th Century. The blue represents the physical collection. The red line is the digital collection (around 2.7 %). This screen shot shows the distribution of books about travel routes. The blue indicates all the physical items, the red line the digital and the orange line the sample. What’s key is that the orange line mimics the frequency of items in the total collection. Ben will talk about this project in more detail later.
  • 65 seconds: Dr Dan Norton was a researcher at the University of Dundee and artist in residence at Hangar, Centre for Art and Research, Barcelona. His idea was “Mixing the Library: The Disc Jockey and the Digital Collection”, which brought a DJ’s approach to interacting with multi-format digital collections. Dan’s interactive approach helps build aesthetic, experimental or logical links between resources. This ambitious project focused on ideas around creating a prototype and on the basic building blocks needed to create a simple demonstrator. Dan is now building on the work he did at Labs and is the resident researcher and artist at the Living Labs: Library of the Future in Barcelona, where he will be working with software developers to produce a fully functioning mixing tool.
  • 50 seconds: The curatorial platform was created to re-use British Library metadata, using the Drupal content management system. It was created by Sara Wingate Gray and Kate Lomax, whose Labs 2013 entry was specially commended. Even though they didn’t win, the judges loved their idea, and subsequently, with the help of Labs, it attracted funding through the Arts and Humanities Research Council in the UK. The project was completed in January 2014 and showcases the digital narratives created by art students using British Library oil paintings from colonial India. Here is what a basic metadata record looks like on the British Library site. Here is the curatorial interface, which has simply ingested the metadata from a comma-separated values file. As we can see, it has created a very engaging set of user interfaces, using geo-location, slideshows and timelines.
  • 130 seconds: That’s just one story; there are so many more stories to tell. Here is just a sample of some of the other stories emanating from our Digital Lab at the British Library, stories we are only too happy to tell other organisations and conferences. Invitations and all-expenses-paid trips to speak in Hawaii are most welcome! There is the story of how we are using subtitle files to create summaries of news programmes, to enhance the poor metadata that currently exists for them. The story of how we are working on analysing music performances with computer clusters and how the resulting data will be made available for researchers. The story of opening up over 100,000 Playbills (posters about plays) from the 17th Century onwards. The story of how we might be printing 3D objects to represent Digital Humanities data, and how people might be able to interact with these objects using their mobile phones or plug in and extract data from embedded USB memory devices. The story of [name missing in source], which will be a place we are going to create for all the Library’s open data and freely licensed digital collections. The story of how we are setting up cloud infrastructure, where digital content lives right next door to enormous computing power, so that researchers can begin to interrogate our data at a massive scale and make incredible new discoveries, very similar to the Internet Archive’s virtual reading room. And the story of how we are approaching the Andrew Mellon Foundation to keep us funded for another 2 years!
  • 25 seconds: A quick reminder again for all of you: our current competition is open, so please tell everyone you can about it. The deadline is 22 April 2014 and the residency for the two chosen ideas runs from late May to the end of October 2014. More details are available on our website.
  • 20 seconds: Please let us know about any ideas you might have for engaging with Labs.
  • 75 seconds: The work of Labs is really about a number of stories, stories about digital collections and about researchers wanting to ask fascinating research questions about them. Let me now tell you a story about one collection and the intended and unintended consequences of working with it. A few years ago the Library digitised 68,000 17th to 19th century books from our collections (around 2.7 % of the physical total in that period). You can view them from our catalogue or read them on your iPad via the Historical Books app developed by BiblioLabs. We also captured 22 million individual page images, along with full text scans of these images, all of which contain an untold quantity of useful data such as names of people, places, historical events and dates. So the question then became, what next? What can 68,000 books tell us?
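
The Sample Generator described in the notes above produces a sample whose distribution over time mimics that of the whole collection. The talk does not say which statistical method the tool uses, so the following is only a minimal sketch of one plausible approach, proportional stratified sampling by decade. The record fields, the function name `representative_sample` and the toy catalogue are all hypothetical.

```python
import random
from collections import defaultdict


def decade_of(record):
    """Stratum key: the decade a (hypothetical) record was published in."""
    return record["year"] // 10 * 10


def representative_sample(records, sample_size, key=decade_of, seed=0):
    """Draw a sample whose per-stratum shares approximate those of `records`.

    Assumption: proportional allocation. Because each quota is rounded,
    the result may differ from `sample_size` by a few items.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)

    total = len(records)
    sample = []
    for stratum in sorted(strata):
        group = strata[stratum]
        # Allocate slots in proportion to this stratum's share of the collection.
        quota = round(sample_size * len(group) / total)
        sample.extend(rng.sample(group, min(quota, len(group))))
    return sample


# Invented catalogue: 600 titles from the 1850s and 400 from the 1860s.
catalogue = [{"id": i, "year": 1850 + i % 10} for i in range(600)]
catalogue += [{"id": 600 + i, "year": 1860 + i % 10} for i in range(400)]

sample = representative_sample(catalogue, sample_size=50)
by_decade = defaultdict(int)
for rec in sample:
    by_decade[decade_of(rec)] += 1
print(dict(by_decade))
```

With this toy catalogue the sample keeps the 60/40 split between the two decades, which is the property the orange line in the screen shot illustrates. A simple random draw from the digitised subset alone would not guarantee this, since the digitised material is not representative of published output.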

Presentation Transcript

  • 1 Supporting the Digital Scholar: Experiences from the British Library Labs. Mahendra Mahey, Manager of British Library Labs. University of Nottingham’s Digital Humanities Seminar Series, Centre for Advanced Studies, Friday 4th April 2014, 1300 – 1320
  • 2 #bl_labs Overview: The British Library and Digital Content; The Nature of Digital, Digitisation, Challenges of access; The British Library supporting Digital Scholarship; Experiences of the Digital Research Team and British Library Labs project in supporting digital scholarship; Conclusions and questions
  • 3 British Library Collections (> 150 million items): > 14 m books; > 60 m patents; > 8 m stamps; > 4 m maps; > 3 m sound recordings; > 1.6 m musical scores; > 0.8 m serial titles; > 0.3 m manuscripts. Image: King’s Library
  • 4 The Nature of Digital: data broken down, recombined and duplicated. Image: Tower of Babble, book sculpture by Brian Dettmer
  • 5 Digitisation, transforming access: spreading the value of collections, content and expertise; connecting as much as collecting, e.g. social media; encouraging others to integrate our materials into their services, and vice versa
  • 6 Challenges of Digital access (British Library): only in Reading Rooms due to ©; only on site due to ©; not online (various storage devices); online behind paywall; online and open
  • 7 Opening up Digital content: Picturing Canada: Mapping a Collection
  • 8 The Digital Scholar: Digital, Networked, Open. Not necessarily a recognised academic or someone who posts online, just a specialist. It is someone who employs digital, networked and open approaches to demonstrate their specialism. From The Digital Scholar: How technology is transforming scholarly practice, Martin Weller, Bloomsbury Academic, 2011, page 4
  • 9 Digital Scholarship Department: “…become a leading centre of digital scholarship … internationally recognised for innovation and collaboration in support of research and learning…”. The Digital Research Team (Digital Curators); Labs
  • 10 Computer Games: Off the Map Competition 2013. Pudding Lane Productions, 6 second-year students, De Montfort University, Leicester, won first prize. Off the Map Gothic 2014!
  • 11 Training Library Staff, Digital Scholarship Training Programme: Foundations in working with Digital Objects: From Images to A/V; Data Visualisation for Analysis in Scholarly Research; Information Integration: Mash-ups, APIs and The Semantic Web; Behind the Screen: Basics of the Web; What is Digital Scholarship?; Digital Collections at British Library; Digitisation at British Library; Text Encoding Initiative & Annotation; Geo-referencing and Digital Mapping; Crowdsourcing in Libraries, Museums and Cultural Heritage Institutions
  • 12 Funded by the Andrew Mellon Foundation
  • 13 Stakeholders involved in Labs (diagram labels): BL Labs; Digital Scholarship; Digital Research; Access & Reuse Group (©); Developers / Technical Staff; Curators / Researchers; Digital Content; British Library; Researchers; Developers; Universities & wider, e.g. companies, start-ups, independent scholars etc.; United Kingdom; The World
  • 14 What is Labs… (diagram labels): Data Driven; Researchers / Developers; Audience; Research question / idea; Competition; Contact; Events; Meetings and visits; Experimenting with our digital collections; BL Digital Collection / Data; Other Digital Collection / Data; BL Labs; Outputs from engagement: Tools & services to support Digital Scholarship, Case Studies, Open Software, Publications
  • 15 Sample Labs Digital Collections: British National Bibliography; UK Web Archive Data; Text-mining of electronic journals; Book ordering and anonymised reader data. Filters: Copyright cleared for research use; Curated (Is there someone who knows the ‘story’ about the collection?); Collection / Item Level Metadata available? (What state is it in and does it need cleaning?); Where is it?
  • 16 Example Digital research methods (presentations from researchers using digital research methods): Corpus analysis tools / Text Mining; Visualisations; Location based searching; Geotagging; Annotation; Natural Language Processing; Using Application Programming Interfaces for datasets, e.g. Metadata, Images; Transcribing; Crowdsourcing / Human Computation
  • 17 The winners of the Labs 2013 competition: Pieter Francois (left) and Dan Norton (right) each received a cheque for £2000 in November 2013 as winners of the first British Library Labs Competition 2013. Two entries chosen in June 2013. They both worked in residence from July to October 2013 with Labs to complete their projects.
  • 18 Pieter Francois, made in the British Library
  • 19 Sample Generator: representative samples. Pieter Francois; Focus on European travel in the 19th Century; Uses statistical methods to support text analysis; Tool produces representative samples of texts based on search criteria
  • 20 Mixing the Library: The Disc Jockey & the Digital Collection. Prototype design (diagram labels): Annotation; Preview ‘item’; Selected ‘right’ channel ‘item’; Selected ‘left’ channel ‘item’; Collection ‘stalks’ made of ‘items’. Each ‘item’ is a URL. The order of the ‘items’ can be ‘shuffled’ and sent to the ‘left’ or ‘right’ channels. ‘Play back’ of ‘items’ (Blue) and annotations (Yellow). Living Lab: Library of the Future, see: Basic functioning prototype:
  • 21 Curatorial for Library metadata: India Office Select materials; Geo location; Timeline; Slide show
  • 22 Other Labs stories…: Augmenting news metadata; Digital Music Lab, analysing music performances; Opening up over 100,000 Playbills; 3D printed objects representing statistical data; [name missing in source], place for all our open data and digital collections; Content next to parallel compute power, analysis at scale; Seeking future funding!!
  • 23 Competition 2014: Open!!; Deadline: 22 April 2014 (tell your friends!); Residency between late May and end of October 2014
  • 24 Email Labs: Let us know your ideas for engaging with Labs! Questions? After coffee break.
  • 25 Story of one digital collection: What can 68,000 books tell us? Image: Artwork by Alicia Martin
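
Two of the methods listed on slide 16, text mining by counting words and tracing copied text between 19th Century newspapers, can be illustrated with a small sketch. It assumes that overlap between word trigrams ("shingles"), measured with Jaccard similarity, is an acceptable stand-in for text reuse detection. The talk does not name the technique the researchers actually used, and the article strings and function names below are invented for illustration.

```python
import re
from collections import Counter


def tokens(text):
    """Lower-case a text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def shingles(text, n=3):
    """Return the set of overlapping n-word sequences in a text."""
    words = tokens(text)
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Share of shingles two texts have in common (0.0 = none, 1.0 = identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Invented snippets standing in for OCR text from three newspaper articles.
article_a = "The great fire began in Pudding Lane and spread across the city"
article_b = "Reports say the great fire began in Pudding Lane late at night"
article_c = "A new railway line opened between the two towns this week"

# Text mining at its simplest: count how often each word occurs.
word_counts = Counter(tokens(article_a))
print(word_counts.most_common(3))

# Text reuse: a and b share a copied passage, a and c do not.
reuse_ab = jaccard(shingles(article_a), shingles(article_b))
reuse_ac = jaccard(shingles(article_a), shingles(article_c))
print(round(reuse_ab, 2), reuse_ac)
```

Combining a score like this with each article's publication date and place would give the kind of time and location pattern the notes describe, showing where a passage appeared first and how it travelled.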