The British Library:
Exploring new ideas and methods to better understand the
cultural and historic heritage held in digit...
http://labs.bl.uk 2
#bl_labs labs@bl.uk
Overview
• Structure of talk
• The British Library and a typical scholar
• The Nat...
The British Library
St Pancras, London, UK
Many books are stored 5 storeys below t...
British Library Collections
> 150 million items
> 0.8 m serial titles
> 8 m stamps
...
Our Scholar in Humanities from Oxford…
• Travel routes in the 19th Century
Pieter ...
The Nature of Digital
Data broken down
recombined and
duplicated Image: Tower of B...
The Digital Scholar
not necessarily be a recognised academic or someone who posts ...
“Reading individual
works is as irrelevant
as describing the
architecture of a
bui...
Example Digital research methods
http://labs.bl.uk/Launch+Event (presentations fro...
Digitisation - Transforming access
Spreading the value of collections, content an...
only in
Reading
Rooms due
to ©
only on
site due to
©
not
online –
various
storage...
British National
Bibliography
UK Web Archive Data
Text-mining of
electronic journ...
Digital Scholarship Department
…become a leading centre of digital scholarship
… ...
What is a Digital Curator?
• Explore how digital technologies are
re/shaping rese...
Training Library Staff
• Foundations in working with Digital Objects:
From Images...
Opening up Digital content
• Picturing Canada: Mapping a Collection:
http://bit.l...
Crowdsourcing Digitised Maps
http://www.bl.uk/maps/georeferencingmap.html
Creative with Wildlife Sounds
http://goo.gl/s7siv0
Sound Edit Wildlife Films
Comp...
Computer Games
Off the Map Competition 2013
Pudding Lane Productions, 6 second-ye...
Funded by the Andrew Mellon Foundation
Digital
Scholarship
Digital
Research
Access &
Reuse Group
©
Developers/
Technical...
What is Labs…
BL Labs
Open
Software
Publications
Tools &
services to
support Digi...
Engaging with Labs
Brainstorm ideas & group
Reflect, consider, and choose
Work la...
The winners of the Labs 2013 competition
Pieter Francois (left) and Dan Norton (r...
Pieter Francois
– made in the British Library
http://youtu.be/xK80Jy0ijkA
Sample Generator: representative samples
• Pieter Francois
• Focus on European tr...
Mixing the Library:
The Disc Jockey & the Digital Collection
http://www.tompro.co...
Curatorial for Library metadata
Geo location
http://datatales.artefacto.org.uk/
T...
Story of one digital collection
What can 68,000
books tell us?
Image: Artwork by ...
The Mechanical Curator
http://mechanicalcurator.tumblr.com
• #similar_to_77576796...
1,020,418 images!
http://www.flickr.com/photos/britishlibrary/
Each image has a U...
Risks of releasing the images
Funny Books for Boys and Girls. Struwelpeter. Good-...
Flickr coverage in the media!
Creative Uses
http://goo.gl/qPPgxX
http://goo.gl/OH6FSn
Jura’s Sound Skateboard
Tagging a million images
- Metadata games and other projects
http://www.metadatag...
Other Labs stories….
• Augmenting news metadata
• Digital Music Lab, analysing mu...
Competition 2014
• Open!!
• Deadline - 22 April 2014 – tell your friends!
• Resid...
Conclusions
• Huge appetite for openly available digital content
• There needs to...
Acknowledgements
Ben O’Steen
- Labs Technical Lead
Digital Curator Team Digital S...
Email Labs
• Let us know your ideas for engaging with Labs!
• Questions? After co...

British Library Labs - Bodleian - University of Oxford

Presentation given at the Bodleian - University of Oxford, 24th of March, 2014.

  • 150 seconds: <click1>Now on to the structure of my talk. <click2>I will first give a very brief overview of the Library and then tell you a number of ‘stories’, mostly from a Humanities perspective, on how researchers did things in the past <click3>and how that is changing because of rapid developments in digital technology. With more and more digital content, data, tools and services being made available, researchers are able to ask questions they had never dreamed of before, share their findings openly and collaborate; some of them are becoming ‘digital’ scholars. <click4>I will bring the story back to the British Library, and how the digital scholar is changing the way we do things. I will move on to the efforts of digitisation across the British Library, giving a whistle-stop tour of some of the incredible digital collections we now have and highlighting some of the challenges that we face given our historical origins and our licensing and technical restrictions. Importantly, I will also try to address how we are trying to tackle some of these challenges. <click5>I will outline the work of the Digital Scholarship department, created to support the changing research landscape, focusing particularly on the work of the Digital Research Team and that of British Library Labs, both of which sit in the same department. I will point out some of the surprising findings we have made, some of the lessons we have learned so far and what we are planning for the future. <click6>Finally, I will finish with some important ‘take away’ messages. Ben will then take a closer look at two Labs projects from a technical perspective. There will be a coffee break, then an opportunity to ask questions and a discussion around working together with the Bodleian.
  • 140 seconds: The British Library is the national library of the UK and one of the largest research libraries in the world. <click1>The Library moved to a new purpose-built building in 1997, the largest of its kind built in the UK in the 20th century. Many frequently used items are stored 5 storeys below the main building at St Pancras in London, and many might not know that part of the building is meant to look like a ship on a journey of discovery! <click2>(yellow line appears) <pause 5 seconds> <click3>(yellow line disappears) <click4>The building can seat 1,200 researchers at any one time across 5 reading rooms. <click5>Medium- and long-term requested items are held at Boston Spa in Yorkshire in a low-oxygen warehouse, using robots to retrieve items. In total, the Library has 625 km of shelving, growing by 12 km every year. Some staff in IT infrastructure, cataloguing and document supply work in Yorkshire too. Whilst we acquire items through purchase or gifts, much of the collection has been built up through legal deposit. That is, by law, a copy of every UK and Ireland print publication must be given to the British Library by its publishers. Around 3 million items are added per year. In 2013, legal deposit was extended to cover non-print material, so by law we take in digitally published items as well, which involves regular mass crawls of the entire UK web domain as well as ebooks, ejournals etc.
  • 85 seconds: The picture you can see is inside the main building in London; it’s the King’s Library, King George the Third’s personal library! Sometimes known as the ‘stack’, I walk past this every day and I sometimes look at it in awe and am reminded that the collections the British Library holds are truly staggering! We currently estimate them to exceed <click1>150 million items, representing every age of written civilisation and every known language. Our archives now contain the earliest surviving printed book in the world, the Diamond Sutra, written in Chinese and dating from 868 AD. So some big numbers… Over… <click2>14 million books <click3>60 million patents <click4>8 million stamps <click5>4 million maps <click6>3 million sound recordings <click7>1.6 million music scores <click8>over 0.3 million manuscripts <click9>0.8 million serial titles (which are of course made up of many volumes/editions). Just in case you’re wondering why the numbers don’t add up to 150 million, this is where a lot of our content is.
  • 80 seconds: I want to give you an example of a typical scholar who had recently done work at the Library in the Humanities domain. <click1>Pieter Francois is a postdoctoral researcher at the University of Oxford. When Pieter was doing his PhD he would visit the Library often, look through the library catalogue, find and request items he was interested in and then study them in a reading room, disappearing for many years! Pieter Francois was interested <click2>in books that were about travel routes in Europe in the 19th century. Imagine if a sample of these items had been available digitally; imagine the time that would have saved him. Imagine how that would transform the kinds of questions he might ask using the power of computation, across hundreds of items. We will come back to Pieter later in my talk and track his story of becoming a digital scholar.
  • 50 seconds: <click1>So, the very nature of digital allows us to <click2>break what were previously bound items down into fundamental bits of information and data. These bits of data can be recombined, duplicated and linked to in infinite ways. This is fundamentally changing our view of research. It’s a bit like the <click3>‘Tower of Babble’ sculpture to the right by Brian Dettmer, created by recombining bits from books, words and sentences cut out and put back together in different ways to create something new, surprising and beautiful. This is what scholars are doing with digital content. Let us now move on to what is understood by the term ‘digital scholar’.
  • 50 seconds: In his book, The Digital Scholar: How technology is transforming scholarly practice, Martin Weller suggests that a shorthand term should be used to loosely define a Digital Scholar. First of all, <click1>the person does not necessarily need to be a recognised academic or someone who posts online. It is someone who employs <click2>digital, <click3>networked <click4>and open approaches to demonstrate their specialism. Let us now look at the area of Humanities, where our scholar Pieter Francois does his work, to investigate the idea of a Digital Scholar a little further.
  • 40 seconds: Franco Moretti, a Humanities scholar from Stanford University, said, <click1>“Reading individual works is as irrelevant as describing the architecture of a building from a single brick, or the layout of a city from a single church.” Imagine if scholars could view digital archives as an infinite pool of multiple layers of loosely held data from which new research questions could be answered, moving beyond the bounds of single items, to enable research at scale.
  • 180 seconds: So what kinds of digital research methods are these digital scholars using, especially in the area of Digital Humanities? <click1>For example, searching for items based on time and location can reveal very interesting patterns, e.g. when and where works were published. One researcher is looking at the evidence of copy and paste in newspapers in the 19th century, which was a common practice back then. Knowing where and when items include text from a source can reveal patterns of how the text travelled over time. <click2>Geotagging objects, putting them in space, can add new dimensions to the kinds of research questions we might want to ask. <click3>Corpus analysis is the analysis of language in text, and text mining is about finding patterns in text through computational analysis, for example number crunching (a lot of it based on counting words). <click4>Tasks that require humans to use technology to complete something that computers would find hard to do fall under the area of crowdsourcing and human computation. For example, reCAPTCHA gets users to improve the text of scanned books by typing in the words they see; these are words that computers couldn’t recognise through optical character recognition. reCAPTCHA has humans do this as microtasks when they log in to websites that require additional authentication. Amazon’s Mechanical Turk is another form of human computation, where tasks that computers would find very hard to do are outsourced to humans. <click5>Annotation involves augmenting an item with additional information, usually text, but not necessarily, e.g. highlighting an area, a drawing etc. <click6>Natural language processing is used in the analysis of speech, for example. <click7>Similarly, transcribing can be the conversion of speech into text through human or computing power, to then be used for further analysis. <click8>Providing Application Programming Interfaces or APIs to data can be a very powerful way to access datasets, and software developers can even use them to build applications on top of the data. <click9>Many researchers want to see the patterns that are emerging in large amounts of data and are now using a number of very powerful tools to visualise them. <click10>This website from our launch event has 6 minute videos of presentations from researchers using digital research methods. What is clear is that digital research methods are much more than searching for an individual item in a catalogue, and libraries, publishers, service and content providers have to change to support that.
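To make the "counting words" remark concrete, here is a minimal sketch of the simplest form of text mining: tallying how often each word occurs in a text. The sample sentence, the `word_counts` helper and the tokenising rule are illustrative assumptions, not part of any Labs tool.

```python
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Count lower-cased word tokens (letters and apostrophes) in a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Hypothetical sample text, standing in for a digitised page.
sample_text = "The fire began in Pudding Lane. The fire spread across the city."
counts = word_counts(sample_text)

# Show the two most frequent words and their counts.
print(counts.most_common(2))
```

Real corpus work would add steps such as removing very common words and correcting OCR errors, but the counting step at its core looks much like this.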
  • 25 seconds: Digitisation is transforming access for researchers. <click>It is spreading the value of collections, content and expertise. <click>It is about connecting, collaborating and sharing as much as it is about collecting, e.g. through social media and <click>encouraging others to integrate our materials into their services – and vice versa.
  • 85 seconds: <click>The British Library faces many challenges of access to our digital collections! <click>Sometimes digital content is only available onsite due to licence restrictions, <click>or even only on a specific computer in a reading room! Technically there are very few reasons why digital content can’t be online, <click>though it might be too big or might not have been transferred from other digital storage media. <click>Sometimes access is through a paywall. Finally, <click>some content is in the happy sunny place: online, open and freely available. The real reasons why there are challenges to accessing digital content are of course human. They require different approaches from the Library and may often involve an honest, open dialogue and negotiation with the publishers. The Labs project has tried to address this problem by creating a ‘residency model’ for researchers to work intensively with a digital collection on-site, so as not to infringe access conditions. I will say more about this later.
  • 115 seconds: Finding openly licensed collections is sometimes like detective work; there are at least 700 digital collections. So… how do we choose content? Well, lessons learned through a time-limited project like Labs have taught us to use the following 4 filters for digital content: <click1>Is the Copyright cleared for research and non-commercial use? <click2>Is it Curated (is there someone who knows the ‘story’ about the collection)? <click3>Is there Collection / Item Level Metadata available (and importantly, what state is it in, does it need cleansing)? <click4>Finally, where is it? <click5>These have been effective filters in doing the work of Labs in an agile way. <click6>Labs has therefore identified several collections at the website above; some are shown in the slide: <click7>Due to our licensing conditions, we are in the process of text mining the abstracts for a large number of journal titles in electronic form. The visualisation indicates the subject spread of our collections. <click8>We have been harvesting the UK Web since 1993 and this is available as a resource under specific conditions for research. <click9>We are also investigating the use of our item request data (around 17 million records) and anonymised reader data, data protection allowing. <click10>The British National Bibliography has over 3 million catalogue records, licensed under CC0, from the British and Irish National Library catalogues. More information is available on the Labs website.
  • 45 seconds: Formed in 2010 and led by Dr Adam Farquhar, <click1>the Digital Scholarship department’s mission is to …become a leading centre of digital scholarship … internationally recognised for innovation and collaboration in support of research and learning. Both sit in the Digital Scholarship department: <click2>the Digital Research Team with its Digital Curators works very closely with <click3>Labs. I will now talk about some of the activities of both of these teams to give you an idea of the work that we do.
  • 55 seconds: Our Digital Curators are <click1>Stella Wisdom, Aquiles Alencar-Brayner, James Baker and Nora McGregor. So what exactly is a Digital Curator? <click2>They explore how digital technologies are re/shaping research and how this informs how the Library does its business. <click3>They support staff across the Library to identify the opportunities that digital tools and collections afford in modern scholarship and to gain the skills to engage confidently in this area. <click4>They partner with libraries and institutions to enable innovation in digital scholarship. <click5>They don’t curate a specific collection but rather have expertise in digital scholarship, broadly defined.
  • 45 seconds: The Digital Curators support the development of staff skills in the Library through a bespoke <click1>Digital Scholarship Training Programme. A quote from one of the attendees states… “It is about helping librarians and curators at the British Library acclimatise to the idea that the Library is becoming a place full of data as much as it is a place full of physical stuff, and that there is a growing community of users who see it that way”. <click2>They offer 15 courses several times a year (animate slide).
  • 45 seconds: The role of Labs is to open up as much digital content as it can; however, there are other parts of the Library that want to do the same. <click1>The "Picturing Canada" collection is a series of photographs from the Canadian Copyright Collection held at the British Library. They were digitised with Wikimedia UK and the Eccles Centre for American Studies, and 5374 images have been uploaded to Wikimedia Commons in high resolution. This demonstrates that the Library is using open models of releasing the digital content it curates. <click2>There are currently 3 collections of copyright-free images on Wikimedia Commons.
  • 22 seconds: The Digital Maps curator has leveraged crowdsourcing to help geo-reference <click1>725 digitised maps of the UK using an accessible and convenient tool called Geoparser<click2>. The maps were assigned spatial metadata in only 5 days <click3>with only a small proportion of errors.
  • 105 seconds: Cheryl Tipp, Curator of Environment and Nature Sounds <click1>in Digital Scholarship, worked with the creative industries department at the British Library and a company called Ideas Tap to launch the <click>‘Sound Edit Wildlife Films Competition’, which challenged animators, filmmakers and photographers to create a short film inspired by the Library's collection of 10 wildlife sound recordings. <click2>The winning entry was 'Dave's Wild Life' from Samuel de Ceccatty, a fantastic short which follows Dave, an amateur naturalist whose sole aim is to have his own TV show. The clip I will show uses the ‘Haddock drumming calls’ to give a voice to the cranes or, as Dave liked to call them, the Diplodocus longuscranum. Cue up video and play from 2 min 41 to 3 min 15.
  • 150 seconds: My colleague Stella Wisdom, Digital Curator, was one of the organisers of the Off the Map competition for 2013, where videogame design students had to turn historic maps and engravings from the British Library’s collections into a 3D environment using Crytek's CRYENGINE software. First prize went to Pudding Lane Productions, 6 second-year students from De Montfort University, Leicester. Their entry used maps of London and recreated a world that was destroyed by the Great Fire of London in the 17th century, starting in Pudding Lane. Let’s take a brief look at their winning entry. Cue up from 13 seconds to 133 seconds. <click3>A new competition is launching soon, Off the Map Gothic 2014, which will use digitised Gothic items from the Library to inspire Gothic-themed 3D environments. The results will be showcased at our Gothic exhibition at the end of this year, in November 2014, Terror & Wonder: The Gothic Imagination.
  • 40 seconds: The aim of Labs is to encourage scholars to experiment at scale with our digital collections and data. The team holds competitions and events, and creates the space in which to engage with scholars. Through Labs we’re learning how to better support scholars and build new services. Our website is available at labs.bl.uk. <click1>The project is kindly funded by the Andrew Mellon Foundation. We would like to announce that the project is in the process of receiving 2 more years’ funding until 2017, and we would welcome opportunities to work together.
  • 62 seconds: <click1>The primary purpose of Labs is to open as much <click>digital content as possible for <click2>researchers and software developers (sometimes they are the same person) and encourage them to use the Library’s content in their research, <click3>primarily in UK academia but where appropriate anywhere else in the <click>world. Labs sits within the Digital Scholarship Department at the British Library <click4>and works almost on a daily basis with the Digital Research Team<click>. It also works with the <click>Access and Reuse Group, a cross-departmental group that meets once every six weeks to deal with requests to openly license digital content. Labs co-operates internally with <click5>Curators, Researchers and Technical staff in order to understand the ‘story’ behind a collection and the technical issues involved in providing access to the digital content.
  • 65 seconds: This is how Labs works. <click1>We adopt a data-driven approach to encourage scholars to do research and development with and across British Library digital collections and data. <click2>A researcher / developer (again sometimes the same person and sometimes not) comes up with an idea and engages with Labs through various mechanisms <click3>such as competitions, events and projects. Through this process the Library learns how better to support digital scholars and to build on existing processes or create new ones, as well as make <click4>tools (e.g. APIs etc.) and services. The <click5>case studies are some of the outputs we hope to create that will help other research libraries around the world wanting to build Labs for their digital content; <click6>others include open software and publications.
  • 70 seconds: We engage researchers through various activities, including informal events such as: <click1>Hack and Data days – where researchers, developers, curators and anyone interested in digital collections work together. <click2>First we brainstorm ideas and try to group them, <click3>then reflect, consider and choose them, focusing on being realistic about what can be achieved in the time available, <click4>and develop prototypes <click>in an atmosphere that is relaxed and non-judgemental; it’s OK to try things and make mistakes. <click5>We also run Ideas Labs where we get researchers together over lunch, engaging with the Library’s digital collections through <click>playing cards, like Top Trumps; boys amongst you will know what I mean. We encourage them through activities to come up with ideas and research questions, focussing on what outputs might be generated, and to continue to work with us. <click6>We also get involved in projects from within the Library and collaborate with external institutions.
  • 80 seconds: A major part of Labs activity is to run an annual competition. As mentioned, we adopt a ‘data-driven’ approach, encouraging researchers to look at our data, talk to us, and more importantly to talk to each other and submit ideas and project plans of what they could do in a 4-6 month residency at the British Library. This ‘residency model’ enables researchers to get access to pretty much all the digital content they require without any licence restrictions, and we get to engage with them deeply to learn about what they want to do and, importantly, what we need to learn as a library to support digital scholars better. We worked in an agile way with <click1>two researchers, <click2>Pieter Francois (remember him from earlier?) and Dan Norton, over a <click3>4 month period to work on their research questions and ideas. Let’s now look briefly at their ideas and what was achieved.
  • 2 minute video as part of the Made in British Library Series.
  • 70 seconds: <click1>Pieter’s project was “The Sample Generator”, a tool to help a researcher by providing representative digitised (as well as physical) samples of the materials they are interested in researching. This is opposed to being faced with the daunting task of sifting through thousands of records to find a representative sample to start working on. Pieter’s area of interest was <click2>European travel, but the idea of the sample generator could work for any subject. We gained a deeper understanding of the distribution of digitised material to date. Pieter’s analysis showed that, while extensive, digitised material is not representative of published output. As a consequence, researchers must take additional care when trying to sample representative content using <click3>statistical methods, <click4>a problem which The Sample Generator starts to address. <click5>From this screenshot you can see the distribution of all the books from the 19th century. The blue represents the physical collection. The red line is the digital collection (around 2.7 %). <click6>This screenshot shows the distribution of books about travel routes. The blue indicates all the physical items, the red line the digital and the orange line the sample. What’s key is that the orange line mimics the frequency of items in the total collection. Ben will talk about this project in more detail later.
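The idea that the sample should mimic the frequency of items in the total collection can be sketched as proportional sampling by decade. This is not the actual Sample Generator code; the `representative_sample` function, the decade buckets and the toy publication years are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def decade(year: int) -> int:
    """Map a publication year to the start of its decade, e.g. 1855 -> 1850."""
    return year // 10 * 10


def representative_sample(full_years, digitised_years, sample_size, seed=0):
    """Pick digitised items so that decade shares mirror the full collection."""
    rng = random.Random(seed)
    full_by_decade = Counter(decade(y) for y in full_years)
    digitised_by_decade = defaultdict(list)
    for y in digitised_years:
        digitised_by_decade[decade(y)].append(y)

    total = len(full_years)
    picked = []
    for dec, n in sorted(full_by_decade.items()):
        # Quota for this decade is proportional to its share of the full collection.
        quota = round(sample_size * n / total)
        pool = digitised_by_decade.get(dec, [])
        # Never ask for more items than have actually been digitised.
        picked.extend(rng.sample(pool, min(quota, len(pool))))
    return picked


# Toy data (hypothetical): the full collection peaks mid-century,
# while the digitised subset is spread evenly and so is unrepresentative.
full_years = [1805] * 20 + [1855] * 60 + [1895] * 20
digitised_years = [1805] * 10 + [1855] * 10 + [1895] * 10

picked = representative_sample(full_years, digitised_years, sample_size=10)
print(Counter(decade(y) for y in picked))
```

With this toy data the sample takes 2, 6 and 2 items from the three decades, following the 20/60/20 split of the full collection rather than the even split of the digitised subset.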
  • 65 seconds: Dr Dan Norton was a researcher at the University of Dundee and artist in residence at Hangar, Centre for Art and Research, Barcelona. His idea was “Mixing the Library: The Disc Jockey and the Digital Collection”, which brought a DJ’s approach to interacting with multi-format digital collections<click1>. Dan’s interactive approach helps build aesthetic, experimental, or logical links between resources. This ambitious project focused on ideas around creating a prototype <click2>and what basic building blocks would be needed to create a simple demonstrator<click3>. Dan is now building on the work he did at Labs and is the resident researcher and artist at the <click4>Living Labs: Library of the Future in Barcelona, where he will be working with software developers to produce a fully functioning mixing tool.
  • 50 seconds: The curatorial platform was created to re-use British Library metadata, using the Drupal content management system. It was created by Sara Wingate Gray and Kate Lomax, whose Labs 2013 entry was specially commended. Even though they didn’t win, the judges loved their idea and subsequently, with the help of Labs, it attracted funding through the Arts and Humanities Research Council in the UK. The project was completed in January 2014 and showcases the digital narratives created by Art students using British Library oil paintings from colonial India. Here is what a basic metadata record looks like on the British Library site. <click1>Here is the curatorial interface, <click2>which has simply ingested the metadata in a comma separated values file. As we can see, it has created a very engaging set of user interfaces<click3>, using <click4>geolocation, <click5>slideshows <click6>and <click7>timelines.
  • 75 seconds: The work of Labs is really about a number of stories, stories about digital collections and about researchers wanting to ask fascinating research questions about them. Let me now tell you a story about one collection and the intended and unintended consequences of working with it. The Library digitised 68,000 17th to 19th century books from our collections a few years ago (around 2.7 % of the physical total in that period). You can view them from our catalogue or read them on your <click>iPad via the Historical Books app developed by BiblioLabs. We also captured 22 million individual page images, along with full text scans of these images, all of which contain an untold quantity of useful data such as names of people, places, historical events and dates. So the question then became, what next? What can 68,000 books tell us?
  • 130 seconds: Ben then decided that, as he had started to extract all images from scanned pages, he would start to post an image every hour on a Tumblr blog. <click1>This was the first image that was published. In discussions with the Digital Research team, the digital curators and me, the service was christened with the somewhat controversial name <click>Mechanical Curator (we like to be a little controversial) and we said that it was a ‘she’. Our newest staff member churned away day and night, posting an image every hour. It posted previously unseen illustrations taken almost at random. The Mechanical Curator chooses other similar images using a number of algorithms it has at hand, based on, for example, <click2>published date, <click3>slantyness, <click>bubblyness on the x or <click>y axis, or a <click>‘new train of thought’ if it gets bored. However, little was known about the actual image, apart from the analytic work of the Mechanical Curator. Meanwhile the algorithm that Ben had written to snip the images from the OCR scans was still churning away; how many were there going to be? The Mechanical Curator could publish one every hour, but was there somewhere we could put them all for people to browse when they wanted? Importantly, if we did put them somewhere, could we get people to help us add descriptions to the individual images, making them infinitely more discoverable?
  • 65 seconds: How many images do you think the Mechanical Curator found? <click1>Over 1 million images were then put onto <click2>Flickr Commons. Why? Because each image would have a <click3>URL. Each image had some <click4>metadata, i.e. the book and page number it came from. However, the image itself didn’t have any descriptive metadata, e.g. whether it was a picture of a dog. By releasing the images onto Flickr, we could begin to see if people might start adding tags to the images. <click5>Flickr has a well known and widely used API which developers and researchers could use to build applications on top of the images or to examine large numbers of them at the same time. People have already started putting images into sets; as you can see from the picture, portraits and ships are very popular.
  • 35 seconds. There are risks in this of course: surely lurking among the 1 million images are some that are sordid or offensive, especially given some of the views that were around at that time? <click1> In the end we decided not to interfere, and to deal with any issues as they arise on a case-by-case basis. To date we’ve had very few.
  • 18 seconds. There has been considerable news coverage of the million images released on Flickr Commons: <click1> The Independent, <click2> Wired magazine, <click3> The Guardian, <click4> Popular Science and the <click5> Mail Online, to name a very few.
  • 15 seconds. There have been several creative uses of the images, which can be found at the website above, <click1> even the creation of a skateboard which you can buy for $64.
  • 50 seconds. We are working with <click> Metadata Games to develop British Library-branded games to rapidly increase the amount of metadata that is being added; the games will probably be launched set by set <click>, e.g. we have lots of ships, maps and portraits, to name a few. There is also an Arts and Humanities Research Council Big Data project called Lost Visions, <click> which will be answering detailed research questions about the images and doing some tagging too.
  • 130 seconds. That’s just one story; there are so many more to tell. Here is just a sample of the other stories emanating from our Digital Lab at the British Library, stories we are only too happy to tell other organisations and conferences. Invitations and all-expenses-paid trips to speak in Hawaii are most welcome! <click1> There is the story of how we are using subtitle files to create summaries of news programmes, to enhance the poor metadata that currently exists for them. <click2> The story of how we are working on analysing music performances with computer clusters, and how the resulting data will be made available to researchers. <click3> The story of opening up over 100,000 playbills (posters about plays) from the 17th Century onwards. <click4> The story of how we might be printing 3D objects to represent Digital Humanities data, and how people might be able to interact with these objects using their mobile phones, or plug in and extract data from embedded USB memory devices. <click5> The story of data.bl.uk, a place we are going to create for all the Library’s open data and freely licensed digital collections. <click6> The story of how we are setting up cloud infrastructure, where digital content lives right next door to enormous computing power, so that researchers can begin to interrogate our data at a massive scale and make incredible new discoveries, very similar to the Internet Archive’s virtual reading room. <click7> And the story of how we are approaching the Andrew Mellon Foundation to keep us funded for another 2 years!
  • 25 seconds. A quick reminder again for all of you: our current competition is open, so please tell everyone you can about it. The deadline is 22 April 2014 and the residency for the two chosen ideas runs from late May to the end of October 2014; more details are available on our website.
  • 50 seconds. Finally, these are my take-away messages: <click1> There is a huge appetite for openly available content, as we have shown with the Flickr Commons images. <click2> There needs to be a dynamic, continuous interaction between the data and the researcher to formulate and reformulate research questions. <click3> Working with Digital Scholars creates new opportunities, not just new research questions. <click4> Content and service providers, researchers and technical people need to engage with each other to create the new tools, services and content that are needed to facilitate new discoveries.
  • 35 seconds. I would like to acknowledge the following colleagues in Digital Scholarship and the Digital Research Team, and of course Ben O’Steen, and to thank Tanya Gray for inviting me.
  • 20 seconds. Please let us know about any ideas you might have for engaging with Labs.
  • British Library Labs - Bodleian - University of Oxford
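
The notes on the Flickr release say that developers and researchers can use the Flickr API to examine many images at once. As a rough illustration only, the sketch below builds a request URL for Flickr's `flickr.photos.search` REST method, filtered by one account and one tag. The function name `build_tag_search_url` is hypothetical, and the API key and account identifier are placeholders you would need to supply yourself; nothing here is taken from the Labs codebase.

```python
from urllib.parse import urlencode

# Flickr's REST endpoint; every API method is selected via the "method" parameter.
FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def build_tag_search_url(api_key, user_id, tag, per_page=100, page=1):
    """Build a flickr.photos.search URL for one account and one tag.

    api_key and user_id are placeholders (assumptions): use your own
    Flickr API key and the NSID of the account you want to query.
    """
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "user_id": user_id,
        "tags": tag,
        "extras": "tags,url_m",   # ask for each photo's tags and a medium-size image URL
        "per_page": per_page,
        "page": page,
        "format": "json",
        "nojsoncallback": 1,      # plain JSON rather than a JSONP wrapper
    }
    return FLICKR_REST_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # Prints the URL only; fetching it requires a real key and network access.
    print(build_tag_search_url("YOUR_API_KEY", "ACCOUNT_NSID", "ship"))
```

Fetching the resulting URL with any HTTP client should return one page of matching photos as JSON; looping over `page` would be one way to walk through a large set such as the ships or portraits mentioned above.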

    1. The British Library: Exploring new ideas and methods to better understand the cultural and historic heritage held in digital form by the Library. Experiences from British Library Labs. Mahendra Mahey, Manager of British Library Labs. BL Labs presentation at the Bodleian, University of Oxford, Monday 24th March 2014, 1400 – 1435
    2. http://labs.bl.uk #bl_labs labs@bl.uk Overview • Structure of talk • The British Library and a typical scholar • The Nature of Digital and the Digital Scholar • The British Library supporting Digital Scholarship • Experiences of the Digital Research Team and British Library Labs project in supporting digital scholarship • A closer technical look at two projects • Coffee break and then questions and discussion about working together
    3. The British Library St Pancras, London, UK Many books are stored 5 storeys below the building Inside the British Library Space for 1200 readers, around 400,000 visitors per year Uses low oxygen and robots Storage at Boston Spa
    4. British Library Collections > 150 million items > 0.8 m serial titles > 8 m stamps > 14 m books > 3 m sound recordings > 4 m maps > 1.6 m musical scores > 0.3 m manuscripts > 60 m patents King’s Library
    5. Our Scholar in Humanities from Oxford… • Travel routes in the 19th Century Pieter Francois Postdoctoral researcher at the University of Oxford
    6. The Nature of Digital Data broken down, recombined and duplicated Image: Tower of Babble, Book Sculpture by Brian Dettmer
    7. The Digital Scholar It is someone who employs digital, networked and open approaches to demonstrate their specialism. Not necessarily a recognised academic or someone who posts online, just a specialist. Digital, Networked, Open. From The Digital Scholar: How Technology Is Transforming Scholarly Practice, Martin Weller, Bloomsbury Academic, 2011, page 4
    8. “Reading individual works is as irrelevant as describing the architecture of a building from a single brick, or the layout of a city from a single church.”
    9. Example Digital research methods http://labs.bl.uk/Launch+Event (presentations from researchers using digital research methods) Corpus analysis tools / Text Mining Visualisations Location based searching Geotagging Annotation Natural Language Processing Using Application Programming Interfaces for datasets e.g. Metadata, Images Transcribing Crowdsourcing / Human Computation
    10. Digitisation - Transforming access Spreading the value of collections, content and expertise Connecting as much as collecting, e.g. social media Encouraging others to integrate our materials into their services – and vice versa
    11. Challenges of Digital access British Library: online and open; online behind paywall; only in Reading Rooms due to ©; only on site due to ©; not online – various storage devices
    12. British National Bibliography UK Web Archive Data Text-mining of electronic journals Book ordering and anonymised reader data Sample Labs Digital Collections http://labs.bl.uk/Digital+Collections • Copyright cleared for research use • Curated (Is there someone who knows the ‘story’ about the collection?) • Collection / Item Level Metadata available? (What state is it in and does it need cleaning?) • Where is it?
    13. Digital Scholarship Department …become a leading centre of digital scholarship … internationally recognised for innovation and collaboration in support of research and learning… • The Digital Research Team – Digital Curators • Labs
    14. What is a Digital Curator? • Explore how digital technologies are re/shaping research and how this informs how the library does its business. • Support staff across the library to identify the opportunities that digital tools and collections afford in modern scholarship and to gain the skills to engage confidently in this area. • Partner with libraries and institutions to enable innovation in digital scholarship. • No specific collection but rather expertise in digital scholarship, broadly defined. James Baker Nora McGregor Stella Wisdom Aquiles Alencar-Brayner
    15. Training Library Staff • Foundations in working with Digital Objects: From Images to A/V • Data Visualisation for Analysis in Scholarly Research • Information Integration: Mash-ups, APIs and The Semantic Web Digital Scholarship Training Programme • Behind the Screen: Basics of the Web • What is Digital Scholarship? • Digital Collections at British Library • Digitisation at British Library • Text Encoding Initiative & Annotation • Geo-referencing and Digital Mapping • Crowdsourcing in Libraries, Museums and Cultural Heritage Institutions
    16. Opening up Digital content • Picturing Canada: Mapping a Collection: http://bit.ly/13GhLIe http://commons.wikimedia.org/wiki/Commons:British_Library/Picturing_Canada
    17. Crowdsourcing Digitised Maps http://www.bl.uk/maps/georeferencingmap.html
    18. Creative with Wildlife Sounds http://goo.gl/s7siv0 Sound Edit Wildlife Films Competition 2013 http://vimeo.com/60401313 'Dave's Wild Life' by Samuel de Ceccatty, won first prize! http://sounds.bl.uk/Environment
    19. Computer Games Off the Map Competition 2013 Pudding Lane Productions, 6 second-year students, De Montfort University, Leicester, won first prize. Off the Map Gothic 2014 ! http://goo.gl/gVuVn w http://youtu.be/SPY-hr-8-M0
    20. Funded by the Andrew Mellon Foundation
    21. Digital Scholarship Digital Research Access & Reuse Group © Developers / Technical Staff British Library Universities & wider e.g. companies, start-ups, independent scholars etc. Stakeholders involved in Labs United Kingdom The World Researchers Developers BL Labs Curators / Researchers Digital Content
    22. What is Labs… BL Labs Open Software Publications Tools & services to support Digital Scholarship Case Studies Audience Research question / idea idea idea Competition Contact Events Meetings and visits Experimenting with our digital collections Outputs from engagement Data Other Digital Collection / Data BL Digital Collection / Data Researchers Developers Data Driven
    23. Engaging with Labs Brainstorm ideas & group Reflect, consider, and choose Work late and show what has been done 1 2 3 Labs Data Cards Ideas Labs Hack and Data days Projects
    24. The winners of the Labs 2013 competition Pieter Francois (left) and Dan Norton (right) each received a cheque for £2000 in November 2013 as winners of the first British Library Labs Competition 2013 Two entries chosen in June 2013 They both worked in residence from July to October 2013 with Labs to complete their projects
    25. Pieter Francois – made in the British Library http://youtu.be/xK80Jy0ijkA
    26. Sample Generator: representative samples • Pieter Francois • Focus on European travel in the 19th Century • Uses statistical methods to support text analysis • Tool produces representative samples of texts based on search criteria http://goo.gl/YFnZmu
    27. Mixing the Library: The Disc Jockey & the Digital Collection http://www.tompro.co.uk http://www.ablab.org/shetland http://www.ablab.org/pd/di/ Prototype design Annotation Preview ‘item’ Selected ‘right’ channel ‘item’ Selected ‘left’ channel ‘item’ Collection ‘stalks’ made of ‘items’. Each ‘item’ is a URL. The order of the ‘items’ can be ‘shuffled’ and sent to the ‘left’ or ‘right’ channels ‘Play back’ of ‘items’ (Blue) and annotations (Yellow) http://212.71.253.54:8000/a Living Lab: Library of the Future, see: http://alturl.com/284zw Basic functioning prototype:
    28. Curatorial for Library metadata Geo location http://datatales.artefacto.org.uk/ Timeline Slide show India Office Select materials
    29. Story of one digital collection What can 68,000 books tell us? Image: Artwork by Alicia Martin
    30. The Mechanical Curator http://mechanicalcurator.tumblr.com • #similar_to_77576796197_published_date • #similar_to_77576796197_slantyness • #similar_to_77576796197_bubblyness_x • #similar_to_77576796197_bubblyness_y • #new_train_of_thought Image from ‘A Lost Estate’ by Mary E. Mann, Volume: 02, Page: 91, 1889, London, Bentley & Son
    31. 1,020,418 images! http://www.flickr.com/photos/britishlibrary/ Each image has a URL Some metadata, but you can add tags! Flickr has an API so researchers and developers can build apps And query the data Flickr Commons – 1,020,418 images!
    32. Risks of releasing the images Funny Books for Boys and Girls. Struwelpeter. Good-for-nothing Boys and Girls. Troublesome Children. King Nutcracker and Poor Reinhold.
    33. Flickr coverage in the media!
    34. Creative Uses http://goo.gl/qPPgxX http://goo.gl/OH6FSn Jura’s Sound Skateboard
    35. Tagging a million images - Metadata games and other projects http://www.metadatagames.org/ Games will probably be developed using Flickr sets http://goo.gl/j6fxac Cardiff University’s - Lost Visions Project
    36. Other Labs stories…. • Augmenting news metadata • Digital Music Lab, analysing music performances • Opening up over 100,000 Playbills • 3D printed objects representing statistical data • data.bl.uk, place for all our open data and digital collections • Content next to parallel compute power, analysis at scale • Seeking future funding!!
    37. Competition 2014 • Open!! • Deadline - 22 April 2014 – tell your friends! • Residency between late May and end of October 2014
    38. Conclusions • Huge appetite for openly available digital content • There needs to be a continuous dynamic interaction with data and the researchers to formulate and reformulate research questions • Working with Digital Scholars creates new opportunities • Content and service providers, researchers and technical people need to talk to each other to create the new tools, services and data needed to facilitate new discoveries
    39. Acknowledgements Ben O’Steen - Labs Technical Lead Digital Curator Team Digital Scholarship Heads Stella Wisdom - Digital Curator Nora McGregor - Digital Curator James Baker - Digital Curator Adam Farquhar - Head of Digital Scholarship (Wrote Labs proposal) Aly Conteh - Head of Digital Research and Curator Team
    40. Email Labs • Let us know your ideas for engaging with Labs! • Questions? After coffee break. labs@bl.uk
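
The Mechanical Curator slide lists the tags the blog attaches to each post, such as `#similar_to_77576796197_published_date` and `#new_train_of_thought`. A minimal sketch of how a researcher might split such tags into their parts is shown below. It assumes the number in a `similar_to` tag identifies the earlier post the new image was matched against and that the rest names the feature used for the match; that reading, and the helper name `parse_curator_tag`, are illustrative assumptions rather than a description of how the Mechanical Curator itself is written.

```python
import re

# Assumed tag shape: "#similar_to_<digits>_<feature>", e.g. "..._bubblyness_x".
SIMILAR_TAG = re.compile(r"^#?similar_to_(\d+)_(\w+)$")


def parse_curator_tag(tag):
    """Split one Mechanical Curator tag into kind, source id and feature.

    Returns None for tags that fit neither of the two shapes seen on the slide.
    """
    tag = tag.strip()
    if tag.lstrip("#") == "new_train_of_thought":
        # No earlier image is referenced when a new train of thought starts.
        return {"kind": "new_train_of_thought", "source_id": None, "feature": None}
    match = SIMILAR_TAG.match(tag)
    if match is None:
        return None
    return {
        "kind": "similar_to",
        "source_id": int(match.group(1)),  # assumed to identify the earlier post
        "feature": match.group(2),         # e.g. "published_date", "slantyness"
    }


if __name__ == "__main__":
    for example in ("#similar_to_77576796197_slantyness", "#new_train_of_thought"):
        print(parse_curator_tag(example))
```

Grouping parsed tags by `feature` would be one way to see how often each kind of similarity led from one image to the next.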
