1. Digital Research Support
@ British Library
Stella Wisdom, Digital Curator
20th & 21st Century Collections
Doctoral Open Day 2019
2. www.bl.uk
What is Digital Scholarship?
"Allows research areas to be
investigated in new ways,
using new tools, leading to
new discoveries and analysis
to generate new
understanding."
Dr Adam Farquhar
Head of Digital Scholarship
British Library
• There’s been a technological and computational
shift in scholarship
• Digital tools have transformed the research
process, specifically two fundamental aspects
of research: search and analysis
• Digital tools help overcome the traditionally
most difficult aspects of being a researcher:
finding information, and interpreting it
3. Opportunities for researchers
You can:
• Explore a bigger body of material computationally - 'reading'
thousands, or hundreds of thousands, of volumes of text, images or
media files
• See trends, patterns and relationships not apparent from close
reading individual items, or gain a broad overview of a topic
• Test an idea or hypothesis on a large dataset; generate classification
data about people, places, concepts
Adapted from Mia Ridge’s blog post, Some challenges and opportunities for digital scholarship in 2018 (25 April 2018)
Scale
Perspective
Speed
4. Some digital tools and methods
OCR/HTR
Data Visualisation
Text and Data Mining
Digital Mapping
Crowdsourcing
Emerging formats
5. Meet the Digital Research Team
The Digital Research Team is a
cross-disciplinary mix of curators,
researchers, librarians and
programmers supporting the
creation and innovative use of
British Library's digital collections.
Neil Fitzgerald, Head of Digital Research
Stella Wisdom, Contemporary British
Nora McGregor, Europe & Americas
Dr Mia Ridge, Western Heritage
Dr Adi Keinan-Schoonbaert, Asia & Africa
Dr Rossitza Atanassova, Digitisation
Tom Derrick, 2 Centuries of Indian Print
6. How we can help you
We support new ways of exploring and accessing our collections through:
• Working behind the scenes to support and improve processes for getting content in
digital form and online
• Enhancing digital skills
for BL staff
• Collaborative projects
• Offering digital research
support and guidance
• Events, competitions,
and awards (BL Labs)
• Outreach through our
blogs and social media
https://twitter.com/AshaMarie18/status/1061925248940101632
8. Political Meetings Mapper
“I was able to do in minutes with a python code what I’d spent the last ten
years trying to do by hand!”
Dr. Katrina Navickas, BL Labs Winner 2015
5,519 meetings discovered in 462 towns
and villages across the UK
http://politicalmeetingsmapper.co.uk/maps
10. State of Dementia Research in the UK
Research Question:
The Alzheimer’s Society appointed RAND Europe to produce a report on the state of dementia
research in the UK. RAND wished to investigate the dementia workforce pipeline - how many
researchers are working on dementia and how this is changing over time.
http://www.rand.org/randeurope/research/projects/mapping-uk-dementia-research-landscape.html
Source Collection:
British Library’s electronic thesis service EThOS http://ethos.bl.uk
Digital/Computational Techniques:
EThOS Metadata Manager and RAND analysed a list of theses awarded from 1970 onwards.
Outcome: Discovered that dementia-related PhD research has been steadily increasing over the
last 30 years in the UK. However, cancer-related PhDs have skyrocketed over the same time
frame. Now five times more PhD researchers choose to work on cancer than dementia:
http://britishlibrary.typepad.co.uk/science/2015/09/a-novel-use-of-phd-data.html
A Review of the Dementia Research Landscape and Workforce Capacity in the United
Kingdom, http://www.rand.org/pubs/research_reports/RR1186.html
As an extensive source of information on PhDs undertaken in the UK, EThOS data can also be
used to look at trends in PhD research over time.
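As a rough illustration of this kind of trend analysis, the sketch below counts theses per award year whose titles mention a topic keyword. It is not the method RAND or the EThOS team used: the sample records, the (year, title) layout and the keyword lists are all made up for illustration, and real EThOS metadata has its own schema.

```python
from collections import Counter

# Hypothetical sample records as (award_year, title) pairs.
# These are invented for illustration and are not real EThOS data.
SAMPLE_THESES = [
    (1985, "Memory loss in Alzheimer's disease"),
    (1992, "Cell signalling in breast cancer"),
    (1992, "Care pathways for dementia patients"),
    (2004, "Tumour suppression and cancer genetics"),
    (2004, "Imaging biomarkers of dementia"),
    (2004, "Medieval land tenure in Yorkshire"),
]


def theses_per_year(records, keywords, start_year=1970):
    """Count theses per award year whose title mentions any keyword."""
    lowered = [keyword.lower() for keyword in keywords]
    counts = Counter()
    for year, title in records:
        title_lower = title.lower()
        if year >= start_year and any(k in title_lower for k in lowered):
            counts[year] += 1
    # Return a plain dict ordered by year so trends are easy to read.
    return dict(sorted(counts.items()))


dementia_counts = theses_per_year(SAMPLE_THESES, ["dementia", "alzheimer"])
cancer_counts = theses_per_year(SAMPLE_THESES, ["cancer", "tumour"])
print(dementia_counts)
print(cancer_counts)
```

Simple keyword matching on titles will miss theses that describe a topic in other words, so counts like these are best read as a starting point for comparison over time rather than exact figures.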
11. EThOS & Multimedia PhD Theses
Coral Manton worked on a British Library research placement investigating multimedia
and non-text PhD research outputs and how EThOS might develop to meet the
challenge of evolving digital theses.
She interviewed doctoral students from various disciplines as case studies
http://blogs.bl.uk/digital-scholarship/2016/09/multimedia-phd-research-and-non-text-theses.html
https://www.bl.uk/case-studies/sam-martin
https://www.bl.uk/case-studies/rob-sherman
12. Emerging Formats Project
This project builds our ability to collect publications designed for mobile devices
that respond to reader interaction or are structured as databases.
Focus on three format types: eBook mobile apps, web-based interactive narratives
and structured data.
https://www.bl.uk/projects/emerging-formats
16. It Must Have Been Dark by Then
by Duncan Speakman
https://ambientlit.com/index.php/it-must-have-been-dark-by-then/
17. For 2018 International Games Week the British Library hosted
The Narrative Games Convention: AdventureX
10-11 November 2018
http://adventurexpo.org/
https://youtu.be/PyJl5stFteI
19. Creating a Chronotopic Ground for the Mapping of Literary Texts: Innovative
Data Visualisation and Spatial Interpretation in the Digital Medium
http://gtr.rcuk.ac.uk/projects?ref=AH%2FP00895X%2F1
http://www.lancaster.ac.uk/news/articles/2017/mapping-project-will-open-up-new-routes-to-uncharted-territory/
http://www.huffingtonpost.co.uk/professor-sally-bushell/literary-mapping-in-digit_b_17319788.html
20. The Litcraft component of the project
is a semi-standalone series of
developments aimed at encouraging
elements of literary environmental
criticism for younger audiences.
Primary and secondary English
lessons do not typically focus on the
descriptions of textual setting; one of
our aims is to introduce this analytic
field, through designing a series of
standalone gaming-based resources
that engage with landscape and world
design.
https://chronotopiccartography.wordpress.com/litcraft/
22. Datasets
data.bl.uk
As part of its work to open its data to wider use, the British Library is
making copies of some of its datasets available for research and
creative purposes.
We aim to describe collections in terms of their data format (images,
full text, metadata, etc.), licences, temporal and geographic scope,
originating purpose (e.g. specific digitisation projects or exhibitions)
and collection, and related subjects or themes.
This site is a 'beta', and is in development.
If you have questions or feedback about this site or our open data
work, please email digitalresearch@bl.uk.
We'd also love to hear what you've done or made with the data.
24. Many creative projects have used Public
Domain images from the Microsoft
Partnership Digitisation Project 2006-8
• 68,000 volumes (47,000+ titles) published in the
19th century, mostly in English
• Excluded authors active 1850-1901 and who died
after 1936
• Output: 25 million pages
25. The illustrations were extracted algorithmically from the
digitised books:
<?xml version="1.0" encoding="UTF-8" ?>
<mets:mets xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:mets="http://www.loc.gov/METS/"
xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/version18/mets.xsd info:lc/xmlns/premis-v2 …
Image snipped out algorithmically from ALTO XML
Image taken from page 207 of 'London and its Environs. A
picturesque survey of the metropolis and the suburbs ...
Translated by Henry Frith. With ... illustrations'
ALTO XML
31. David Normal created light boxes around the
Burning Man, using the British Library’s Flickr images
The Crossroads of Curiosity Installation
at Burning Man Festival
32. The Crossroads of Curiosity Installation at the British Library
June to November 2015
The installation featured an “augmented reality” self-guided tour enabling viewers to
explore the meaning and origins of the painting’s symbols using Blippar.
https://www.crossroadsofcuriosity.org/
33. Odyssey Jam 2017
https://itch.io/jam/odysseyjam
Writing challenge tied in with Read Watch Play, a partnership of libraries
worldwide encouraging themed discussions of books, films, music and games.
Each month has a theme; for March 2017 it was #waterread.
34. Odyssey Jam 2017 entries
https://itch.io/jam/odysseyjam/entries
We encouraged entrants to make use of the digitised images on Flickr that the
British Library had released under a Creative Commons licence.
Some games used these images, e.g. No One and 108 suitors.
35. 200th anniversary of the
publication of Frankenstein. A
perfect opportunity to run a gothic
novel themed challenge.
Gothic Novel Jam with Read Watch
Play; participants to make
something creative inspired by the
gothic novel genre and share it on
the itch.io Gothic Novel Jam site.
Entries invited to include stories,
poetry, art, games, music, films,
pictures, soundscapes, or any
other type of digital media
response.
We wanted participants to use
images from the British Library
Flickr account as inspiration
36. Gothic Novel Jam 2018
We received 46 entries submitted by people from all around the world, including the UK,
Australia, America and France.
https://itch.io/jam/gothic-novel-jam/entries
37. We encouraged entrants to use the digitised images on Flickr that The British Library had
released as Public Domain. As a glow brings out a haze by Eldridge Misnomer
is a lovely example of how these illustrations are used as a key part of the storytelling.
38. Have a digital research enquiry?
Get in touch!
Web: http://www.bl.uk/subjects/digital-scholarship
Blog: http://britishlibrary.typepad.co.uk/digital-scholarship/
Email: digitalresearch@bl.uk
Twitter: @BL_DigiSchol #bldigital
Editor's Notes
Some of these processes help get data ready for analysis (e.g. turning images of items into transcribed and annotated texts), while others support the analysis of large collections at scale, improve discoverability or enable public engagement.
OCR/HTR: creating machine-readable text for search/analysis
Data viz: for analysis or publication
Text and data mining: applying classifications to or analysing texts, images or media.
Digital mapping: displaying/analysing spatial data
Crowdsourcing: public participation and learning
Engaging with emerging formats: 3D, VR, AR
Set up in 2010, the Digital Scholarship (DS) team was formed to focus on the changing research landscape.
Team members are now embedded in collection areas, or have joined the Library as part of major digitisation projects.
The Universal Viewer: an interface for viewing images from the Library's digitised collections. It enables easier access and download of digitised collection items.
Projects: collaborative PhDs, placements
Labs: roadshows, symposium
Research Question:
Chartism was the biggest popular movement for democracy in 19th-century British history. The Chartists campaigned for the vote for all men and advertised their meetings in the Northern Star newspaper from 1838 to 1850.
The question is, how many of the meetings took place and where? We started with 1841-1845.
Source Collections:
19th Century Digitised Newspapers, specifically Northern Star newspaper
Digitised and Georeferenced Map of Oxford Street
Digital/Computational Techniques:
The images of the relevant pages of the Northern Star were run through an Optical Character Recognition program (ABBYY FineReader 12) and the resulting text was checked manually.
We developed a set of Python scripts to extract and geo-code the place of each meeting, using a gazetteer of places, and to parse the date of the meeting.
Outcome: 5,519 meetings discovered in 462 towns and villages across the UK! http://politicalmeetingsmapper.co.uk/maps/
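To give a flavour of what extracting and geo-coding a meeting notice can involve, here is a minimal sketch. It is not the project's actual code: the three-entry gazetteer, its approximate coordinates, the sample notice and the function name extract_meeting are all invented for illustration, and it assumes dates are written in the form "14 March 1842".

```python
import re
from datetime import date

# Hypothetical mini-gazetteer: place name -> approximate (latitude, longitude).
GAZETTEER = {
    "Manchester": (53.48, -2.24),
    "Leeds": (53.80, -1.55),
    "Halifax": (53.72, -1.86),
}

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Matches dates written like "14 March 1842" (an assumed format).
DATE_PATTERN = re.compile(r"\b(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})\b")


def extract_meeting(notice):
    """Return the first gazetteer place and first date found in a notice."""
    place = None
    for name in GAZETTEER:
        # Whole-word match so "Leeds" does not match inside a longer word.
        if re.search(r"\b" + re.escape(name) + r"\b", notice):
            place = name
            break

    meeting_date = None
    match = DATE_PATTERN.search(notice)
    if match:
        day, month_name, year = match.groups()
        meeting_date = date(int(year), MONTHS.index(month_name) + 1, int(day))

    return {"place": place, "coords": GAZETTEER.get(place), "date": meeting_date}


# Made-up notice text in the style of a newspaper announcement.
meeting = extract_meeting(
    "A public meeting will be held at Leeds on 14 March 1842, in the Market Place."
)
print(meeting)
```

Real OCR text is much messier than this sample, with misread characters, varied date formats and ambiguous place names, which is one reason manually checking the text first matters.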
The work of Labs is really about a number of stories, stories about digital collections and about researchers wanting to ask fascinating research questions about them. Let’s now tell you a story about one collection and the intended and unintended consequences of working with it.
60 seconds
The Library digitised 68,000 predominantly 19th century books from our collections a few years ago (around 2.7% of the physical total in that period). You can view them from our catalogue or read them on your <click> iPad via the Historical Books app developed by BiblioLabs.
There are 22 million individual page images, along with full text scans of these images, all of which contain an untold quantity of useful data such as names of people, places, historical events and dates.
with no restrictions on use by Microsoft
So the question became then, what next? What can 68,000 books tell us?
60 seconds
As the books were scanned for text, this had a fortunate ‘side effect’: the software tries to detect not only the text on the page but also where the images might be. There had already been some interest in the images from the community of researchers. It seemed easy to extract them.
As part of the Labs competition, Matt Prior attended one of our hack events and, when examining our book data, was very interested in the images from the books.
Meanwhile the algorithm that Ben had written to snip the images from the OCR scans was still churning away. How many were there going to be? The Mechanical Curator could publish them every hour, but was there somewhere we could put them all for people to browse when they wanted? Importantly, if we did put them somewhere, could we get people to help us add descriptions to the individual images, making them infinitely more discoverable?
With an algorithm by Ben O’Steen we snipped out images from digitised books and put them on to Flickr on 13 December 2013. There were over a million. The problem was that we knew which books they came from (author/dates), but we didn’t have any information about the images themselves. By releasing them onto Flickr, we have got people to start tagging them and using them in very creative ways.
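The general idea of locating images from OCR output can be sketched as follows. This is not Ben O’Steen’s algorithm, only an illustration under assumptions: the sample page below is made up, and it assumes the OCR output is ALTO XML in which picture regions are recorded as Illustration elements with HPOS, VPOS, WIDTH and HEIGHT attributes.

```python
import xml.etree.ElementTree as ET

# Made-up, minimal ALTO-style page for illustration only.
# The namespace version and all coordinate values are assumptions.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P207" WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <TextBlock ID="TB1" HPOS="100" VPOS="100" WIDTH="1800" HEIGHT="400"/>
        <Illustration ID="IL1" HPOS="250" VPOS="600" WIDTH="1500" HEIGHT="1100"/>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def illustration_boxes(alto_xml):
    """Return (left, top, right, bottom) boxes for each Illustration element."""
    root = ET.fromstring(alto_xml)
    boxes = []
    for element in root.iter():
        # Compare local names so the sketch works across ALTO namespace versions.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name != "Illustration":
            continue
        left = int(float(element.get("HPOS")))
        top = int(float(element.get("VPOS")))
        right = left + int(float(element.get("WIDTH")))
        bottom = top + int(float(element.get("HEIGHT")))
        boxes.append((left, top, right, bottom))
    return boxes


boxes = illustration_boxes(SAMPLE_ALTO)
print(boxes)
```

Each box could then be passed to an imaging library to crop the matching region from the page scan. ALTO files can record coordinates in units other than pixels, so a real pipeline would need to check the file's measurement unit and convert before cropping.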
Hosting them internally was not an option and there was not sufficient metadata to put them on Wikipedia. Flickr seemed the obvious option as it is a platform that can support high usage, does not require metadata, allows tagging and is free for public domain images.
He speaks about his project, how he came across the images and what he did with them.
How he learnt about the images: it was pure serendipity.
Taking images out of the context of books creates potential to reinvent them in a new context.
http://youtu.be/3AOa98RsA2Q
http://www.youtube.com/watch?feature=player_detailpage&v=3AOa98RsA2Q#t=48
Make sure subtitles are on.
This is a surprising use of the images we put onto Flickr. Once a year in the summer, tens of thousands of participants gather in Nevada's Black Rock Desert to create Black Rock City, dedicated to community, art, self-expression, and self-reliance. They depart one week later, having left no trace whatsoever. [This year it took place from August 25 to September 1 in Nevada, USA. The show ends with the burning of a wooden effigy of a man! <click>]
American artist David Normal used images from our Flickr Commons collection to make a set of collages called "Crossroads of Curiosity". The finished paintings based on these collages were presented in full colour as lightboxes at this year's Burning Man Festival, the theme for which was "Caravansary". They were presented around the base of the effigy of the Burning Man in the heart of the festival.