
Mahendra Mahey's slides from the Bloomsbury DH Meeting 30/09/2013



Published in: Technology, Education


  1. British Library Labs: British Library Labs and Digital Humanities. Monday 30th September 2013, 1400 - 1600. Bloomsbury Digital Humanities Group Meeting, Meeting Room H, British Library, London NW1 2DB. Mr Mahendra Mahey, British Library Labs Project Manager, Scholarship and Collections, Digital Scholarship.
  2. #bl_labs Overview • Background, People, Details, Plan • Content used with Labs • Research methods • The competition and engaging with Labs
  3. The project in a nutshell… Encouraging scholars and developers to do research and development with and across British Library collections and data (+ other).
  4. People involved in Labs • Project Board • Advisory Board • Digital Scholarship Team • Digital Curators • Access and Reuse Group (©) • Over 200 Curators • Labs • Researchers and developers (British Library; universities and wider)
  5. Our Brand…
  6. Background • The Andrew Mellon Foundation • 2-year initial project
  7. Labs details…what • No digitisation involved, just digitised and born-digital Library content • Some content online • Other content in digital form but not online yet – e.g. too big, needs work, technical challenges, licence restrictions (e.g. onsite access only) • Examine and analyse the content, especially entire collections (i.e. cross-collection research) • Do research, publish • Make things, e.g. tools, services, apps etc… • Transforming processes, services and tools for scholars / developers using Library digital collections
  8. Labs details…how • Competitions, events and various activities • Creating an environment where scholars / developers can work intensively with the Library’s digital collections (winners will be resident), but not only… • Encourage researchers / developers generally to do interesting things with BL digital content (+ other) with and across collections (a data-driven approach) • Labs is more than the competition! • Ideas can be pursued by talking to Library staff and to scholars / developers interested in conducting research / making things, e.g. through meetings, events etc., business opportunities
  9. Labs Competition • At least 2 competitions, winners will work ‘in residence’ where possible • Review and feedback to examine the approach, at the moment ‘Data Driven Research’, i.e. here is our data, come and do stuff with it! • Focus particularly on cross-collection research, research at scale • Other research and development encouraged too! • Help develop tools and services to support digital scholarship • Any suggestions for next competition? When to visit?
  10. How Labs works… [Diagram. Labels: BL Labs; Competition, Events, Contact; Software, Publications, Tools and services to support Digital Scholarship; BL Digital Collection / Data; Other Digital Collection; idea]
  11. The plan in time… • Launch Event – 25th March 2013 – draft details of competition and feedback, launched end of April • Virtual event 17 May (video of Hangout available), more virtual events? • Hack Event 28/29 May, London • AHRC research network - 'the infinite archive', Open University, University of Nottingham, University of Warwick • Winners announced on 6 July 2013, York (Digital Heritage Conference) • Best two ideas work in residence and showcase their work on 11th November 2013, when the next competition will be launched (deadline end of March 2014, work on entry May to end of October 2014, Nov/Dec showcase and final event) • Other ideas: look at supporting in other ways, e.g. through Labs, other Library departments, business opportunities etc. • Case studies produced around Nov/Dec for first iteration 2013 and second iteration 2014
  12. BL Labs Services • Developed for scholars / developers wanting to use digital Library collections for research and development • Application Programming Interfaces (APIs) for data / collections • Powerful interface for researchers and developers for conducting innovative and transformative projects • We are currently doing an audit of what web services the BL has • Led by Technical Lead
  13. Labs Hack Days… • Bringing researchers, developers, curators and anyone interested in collections together at events, want to do more! • Brainstorming ideas – ideas lab • Scoping research, ideas, solving problems and developing prototypes • Steps: brainstorm ideas and group; consider and choose; work into the night and show what has been done
  14. Case studies… • Research generated from the competitions and general activity of Labs • Inform the Library / other libraries around the world about the issues, challenges, solutions and benefits generated when using a Labs approach
  15. Labs Content • Work with curators to identify those digital collections that are suitable for Labs • Focus on those that are copyright cleared at the moment • Others considered in light of challenges, i.e. in scope for Labs work • Engage researchers / developers with these materials through meetings, road-shows, hack days, promotions (including competitions and events) • Have list of 100s of digital collections • Need a filter
  16. Where do you start?
  17. British Library Digital Collections • Most content unique! • Copyright cleared for research and non-commercial use? • Curated? • Collection Level Metadata available? [Diagram. Labels: Available only in Reading Rooms; Available on site; Digital but not online – various storage devices; Available only onsite at the moment; Hack Events, In residence; Digital and online]
  18. We are hoping to launch at some point
  19. Types of content • Datasets • Books / Text • Images / Music • Maps • Sounds • Multimedia
  20. British National Bibliographic Data • (part of • 2.8 million individual records • Available as Linked Open Data, Basic RDF/XML and MARC21
  21. UK Web Archive Data • pendata • 32TB subset of the Internet Archive’s web collection relating to the UK • Collecting freely since e-legal deposit • Comparing events across media types?
  22. 19th Century Digitised Books • 68,000 digitised volumes and their accompanying JP2, PDF, metadata and OCR text files • Many rare or inaccessible books published between 1789 and 1870, covering a wide range of subject areas including philosophy, history, poetry and literature, and travel • Representative materials here: • Text mining? Text is 29GB
  23. International Dunhuang Project • IDP international collaboration • Aims to make images of all manuscripts, paintings, textiles and artefacts from Dunhuang and archaeological sites of the Eastern Silk Road freely available on the Internet and to encourage their use through educational and research programmes • Time-lining the Silk Road?
  24. Book ordering data… • Every day thousands of items are ordered up from the library stacks and delivered to researchers in our reading rooms. We can provide daily anonymised reports of these titles, including shelfmark information and reading room location • Visualising what readers are reading? Anonymised reader data… • Anonymised information about our readers • Big buckets • Social trends?
  25. Bringing Text Mining to the Library • We have negotiated text mining rights for many electronic journals (50% of journals) • A project to get the tools to readers?
  26. Environment and Nature Sounds • Thousands of recordings from the Sound Archive's unrivalled natural sounds collection are available for free download as MP3s to staff and students of UK higher and further education institutions • Adding sounds to poetry?
  27. Resonance FM • London Community Arts Radio Show • 10-year sound archive! • Speech to text?
  28. Example Research Methods • Corpus Analysis tools • Visualisations • Topic Models • Location based searching • Geotagging • Annotation • APIs for datasets e.g. Metadata, Images • Crowdsourcing / Human Computation • Natural Language Processing • Transcribing
  29. Examples from Launch event
  30. Ideas from first competition • Text mining in the reading rooms • Curatorial – funded through other stream • Visualising large collections of sound at a glance (thumbnails) • Using sheet music – combined with AHRC proposal being submitted now • Working with a radio archive – possibly funded through another stream – semantic media • Serious news
  31. What’s happening at the moment? • Working with competition entrants • Working on first year project deliverables • Planning for next competition and dissemination
  32. Dan Norton • Mixing the Library: The Disc Jockey and the Digital Collection • Dan Norton is a PhD Researcher, University of Dundee, and is Artist in Residence at Hangar, Centre for Art and Research, Barcelona. • Building an interface for interacting with digital collections, developed from the DJ's interaction with information. His project uses selecting and mixing as creative behaviours for exploring, learning, and authoring with digital collections. • The prototype will demonstrate the interface requirements necessary for collecting, enriching (organizing, annotating), and mixing information from digital libraries; for building aesthetic, experimental, or logical links between resources; and for developing ad hoc visualizations, or publishing annotated data. • Working on a functioning prototype to collect URLs for different media types (e.g. text, video, sound and images), shuffle their order, compare two digital objects, and annotate in real time
  33. Pieter Francois • The Sample Generator for Digitised Texts • Pieter Francois is a Postdoctoral Researcher at the University of Oxford. • The ‘Sample Generator for Digitized Texts’ is a relatively simple piece of software which connects one or more major catalogues or bibliographies with one or more collections of digitized texts through the metadata. • The main aim is to tell the story of over a million nineteenth-century books through a structured sampling of 68,000 books; the focus is on travel routes / accounts • Creating a demonstrator which searches across 1.8 million records and, where possible, finds highly significant digital samples for further research from the books we have digitised so far
  34. Images from the 19th Century books • From late 1600s • From 1700s • 600,000 small images so far (estimated)! • Work ongoing with BL Labs Technical Lead
  35. The Mechanical Curator! • Just launched, lots of interest!! • Randomly selected small illustrations and ornamentations, posted on the hour. • Rediscovered artwork from the pages of 17th, 18th and 19th Century books.
  36. Distribution of the use of DDC in BnB • Work by Ben O’Steen, BL Labs Technical Lead • Looking at the metadata for the books: they have limited metadata for subject classification
  37. Workshops – Ideas Labs • Datasets • Ideas • Outputs
  38. Engaging with Labs • ENGAGE • Submit your name and contact details, and let's speak! • AHRC Big Data Call: a number of requests to use BL Collections / Data if successful • Next competition? Launch 11 November! • Just talk to us, work with our collections! • Ideas labs, workshops and hack events
  39. What next? • Speak to me: 0207 412 7324 • Email me: or • Labs Website: • Twitter: @BL_Labs • Hash Tag: #bl_labs • Jiscmail: • Blog: