Join Dr Mia Ridge, Digital Curator for Western Heritage Collections at the British Library, to discover how research and technology can create a richer picture of our past. Living with Machines is a collaborative project between the Alan Turing Institute, universities and the British Library – home to the world’s most comprehensive research collection. Together, they are using data science and digital history methods to analyse millions of historical documents and understand the impact of mechanisation in the 19th century. Their initial work has focused on specific regions, such as Yorkshire, that will help tell the story of industrialisation in Britain.
Rethink Research
1. Rethink research, illuminate history with the British Library
Dr. Mia Ridge, Digital Curator, British Library; Co-I Living with Machines
@mia_out digitalresearch@bl.uk
@BL_DigiSchol @LivingWMachines https://www.livingwithmachines.ac.uk
Leeds Digital Festival, April 2020 #LeedsDigi20
2. Overview
• About the British Library
• How is the Library responding to changing research practices?
• Why collaborate with academics to explore the future of research?
• How are we combining data science and historical research methods?
• Where can you find out more?
3. About the British Library
Our mission: 'For research, inspiration and enjoyment'
• The British Library is the national library of the UK.
• By law, we receive a copy of every publication produced in the UK and Ireland.
• If you saw 5 items a day, it would take you over 80,000 years to see the whole collection
4. About the British Library
The British Library has up to 200 million items, including: 16 million books; 8 million stamps; 350,000 manuscript volumes; 60 million patents; 4 million maps; 1.6 million music scores; 60 million newspapers; pamphlets, magazines; television and radio recordings; sounds; billions of webpages; terabytes of e-books, e-journals.
5. About the British Library
• Over 3 million new physical and born-digital items are added every year
• Discoverability across collections is a challenge
• Cataloguing is sometimes minimal or incomplete: it can be hard to find specific historical items, let alone content within items
6. About the British Library
• Unlike local libraries, we can't lend out books
• We have 11 reading rooms in London and 1 reading room at Boston Spa
• About 4% of the collections are digitised or born-digital
• Some digital content is legally available only on-site
• Digital access helps us respond to the changing research landscape: https://www.bl.uk/digital
7. The British Library's Digital Scholarship team
Our mission is to enable the use of the British Library’s digital collections for research, inspiration, creativity, and enjoyment.
Digital Research Team; Endangered Archives; Living with Machines; BL Labs
Connect and share; Support digital scholars; Agents for change; Invest in our staff; Innovate and collaborate
8. The British Library's Digital Research team
The Digital Research Team is a cross-disciplinary mix of curators, researchers, librarians and programmers supporting the creation and innovative use of the British Library's digital collections by:
• Supporting digitisation for use in research
• Enhancing digital skills for BL staff (so they can help Readers)
• Running collaborative projects and offering digital research support and guidance
• Running events, competitions, and awards (BL Labs)
• Reaching out through our blogs and social media
Neil Fitzgerald, Head of Digital Research
Stella Wisdom, Contemporary British
Nora McGregor, Europe & Americas
Dr Mia Ridge, Western Heritage
Dr Adi Keinan-Schoonbaert, Asia & Africa
Dr Rossitza Atanassova, Digitisation
9. Enabling the shift from reading pages to datasets
https://www.flickr.com/photos/nasacommons/9467783474
https://www.flickr.com/photos/statelibraryqueensland/8808717962
10. Collaborative projects help the ship turn more quickly
Collaborating with academics helps us:
• Understand their needs
• Learn more about our collections
• Bring new techniques into our practice so we can then support other researchers
• Explore new and emerging technologies at scale
11. Living with Machines: responding to the growth of digital scholarship and data science
An opportunity to:
• Collaborate with The Alan Turing Institute, the national institute for data science and artificial intelligence, based in our St Pancras building
• Collaborate with subject and methodological experts, building on the Library's expertise in research services and public engagement
• Understand the potential and challenges of AI / machine learning for cultural organisations
• Build on digitisation work
12. Our Partners / Our Funders
Living with Machines
Rethinking the impact of technology on the lives of ordinary people during the 19th century
@LivingWMachines
https://livingwithmachines.ac.uk
13. Living with Machines aims to
• Generate new historical perspectives on the effects of the mechanisation of labour on the lives of ordinary people in Britain during the 'long nineteenth century' (c.1780-1918).
• Develop new computational techniques for working with historical research questions.
• Create new tools and code that can be reused and built upon.
• Support the wider academic and cultural heritage sector in using digital methods to answer historical questions.
• Enrich the British Library’s data holdings for the benefit of all.
• Advance public awareness of data science methods and how digital research in the humanities can enhance understanding of history.
14. Living with Machines (also) aims to
• Dr Ruth Ahnert, Principal Investigator: "...a bold proposal for a new research paradigm. That paradigm is defined by radical collaboration that seeks to close the gap between computational sciences and the arts and humanities. We want to create both a data-driven approach to our cultural past, and a human-focused approach to data science."
• Roly Keating, Chief Executive of the British Library: "By opening up our unrivalled collections to this unique collaboration between historians and data scientists, we hope to not only aid researchers and communities in their understanding of our shared past, but to pave the way towards revolutionising the future of historical research".
• Adrian Smith, Director of The Alan Turing Institute, commented: "Data science and artificial intelligence have the potential to supercharge the science and humanities. We can analyse vast amounts of data at a huge scale and uncover new insights and questions... and deliver tools and techniques which will benefit scholars for generations to come".
15. Benefits for the cultural heritage sector
• Provide models for research collaboration and partnership
• Enhance cultural organisations' reputations for leading digital innovation
• Improve working with large-scale digitisation, digital content, and data: digitisation workflows, data processing for analysis, ingesting enhanced metadata
• Better incorporate learnings and outcomes of research projects
• Grow digital collections
• Increase understanding of, and ability to apply, advanced methods
• Increase awareness of data science and digital history
• Develop a coherent, user-friendly model for mixed-rights access to in-copyright and public domain items and datasets
• Incorporate digital content and data in the exhibition programme
17. Organising our work to create a richer picture of the past
We began with some major research themes and types of work:
• Sources
• Language
• Space & Time
• Communities
• Integration, infrastructure & interfaces
• Data acquisition and wrangling
18. Historical sources include…
• Full-text: newspapers, trade and postal directories, autobiographies, journals and diaries, novels, Parliamentary papers
• Tabular: census records; birth, marriage and death records
• Visual: Ordnance Survey maps, Goad fire insurance maps, images in publications
• Mostly British Library collections, but we're negotiating access to other collections and derived data
• Digitised maps provided by the National Library of Scotland; digitised newspapers provided by the British Newspaper Archive for the British Library
19. How are we combining data science and historical research methods?
20. Understanding bias in sources
• How representative are the digitised newspapers of the 19th-century press as a whole?
• Create a database from the newspaper press directories and contrast it with holdings and digitised articles
• Use machine learning to improve structured data
• Visualise the contours of the corpus (the gaps, the absent voices)
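The comparison between the press directories and the digitised holdings can be sketched in a few lines of Python. This is a minimal illustration, not the project's method: it assumes hypothetical (title, county) records extracted from the directories and a hypothetical set of digitised titles, and compares each county's share of digitised titles with its share of all listed titles.

```python
from collections import Counter

# Hypothetical toy data: (title, county) pairs as they might be extracted
# from the newspaper press directories, plus the set of titles digitised.
directory_titles = [
    ("Title A", "Yorkshire"), ("Title B", "Yorkshire"),
    ("Title C", "Yorkshire"), ("Title D", "Yorkshire"),
    ("Title E", "Dorset"), ("Title F", "Dorset"),
    ("Title G", "Cumberland"), ("Title H", "Cumberland"),
]
digitised_titles = {"Title A", "Title B", "Title C", "Title E"}


def representation_ratios(directory, digitised):
    """Return {county: digitised share / listed share} for every listed county.

    1.0 means a county is represented in proportion to the directories;
    below 1.0 suggests under-representation, above 1.0 over-representation.
    """
    listed = Counter(county for _, county in directory)
    digitised_counts = Counter(
        county for title, county in directory if title in digitised
    )
    total_listed = sum(listed.values())
    total_digitised = sum(digitised_counts.values())
    ratios = {}
    for county, n_listed in listed.items():
        listed_share = n_listed / total_listed
        digitised_share = (
            digitised_counts[county] / total_digitised if total_digitised else 0.0
        )
        ratios[county] = digitised_share / listed_share
    return ratios


for county, ratio in representation_ratios(directory_titles, digitised_titles).items():
    print(f"{county}: {ratio:.2f}")
```

With this toy data, Yorkshire comes out at 1.50, Dorset at 1.00 and Cumberland at 0.00, which would flag Cumberland as one of the gaps worth visualising.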
21. Exploring biases in digitisation
Newspaper title metadata: place of publication or coverage
Newspaper press directories: circulation of newspapers
23. Analysing language
• Can we measure the social and cultural impact of mechanisation as reflected in source texts?
• What prompted individuals to write about machinery?
• Issues with OCR (optical character recognition) quality
• Annotations and entity extraction
24. Analysing language: machines as agents of change?
• 1. Toponym resolution: can we geolocate mentions of places in texts?
• 2. Lexicon expansion: how can we automatically generate lists of terms related to the Industrial Revolution from a small list of seed terms suggested by experts?
• 3. Agency: can we bring these questions together to study the agency of machines? Were machines and machine-related terms used as agents in our data, and can we cross-reference this with time and place?
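One common way to approach lexicon expansion is to rank words by their similarity to the seed terms in a word-embedding space. The sketch below is not necessarily how Living with Machines does it: it assumes hypothetical three-dimensional toy vectors (real ones might come from a model such as word2vec trained on the digitised newspapers) and ranks non-seed words by cosine similarity to the average seed vector. Libraries such as gensim offer `KeyedVectors.most_similar` for the same idea.

```python
import math

# Hypothetical toy embeddings; real ones would have hundreds of dimensions.
toy_vectors = {
    "machine": [0.9, 0.1, 0.0],
    "engine": [0.8, 0.2, 0.1],
    "loom": [0.7, 0.3, 0.0],
    "mill": [0.6, 0.4, 0.1],
    "harvest": [0.1, 0.9, 0.2],
    "sermon": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_lexicon(seeds, vectors, top_n=10):
    """Rank non-seed words by cosine similarity to the mean seed vector."""
    known = [s for s in seeds if s in vectors]
    if not known:
        return []
    dims = len(vectors[known[0]])
    # Average the seed vectors component by component.
    centroid = [sum(vectors[s][i] for s in known) / len(known) for i in range(dims)]
    scored = [(w, cosine(centroid, v)) for w, v in vectors.items() if w not in seeds]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [w for w, _ in scored[:top_n]]


print(expand_lexicon(["machine", "engine"], toy_vectors, top_n=2))
```

Here the seeds "machine" and "engine" pull in "loom" and "mill", while "harvest" and "sermon" rank lower. On real data the candidate list is long and noisy, so the `top_n` cut-off is a judgment call.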
25. Finding place names in texts (toponym resolution)
Poole & Dorset Herald (November 23, 1882), British Newspaper Archive
Eastern Morning News (September 7, 1889), British Newspaper Archive
26. Creating Infrastructure, Interfaces, Integration (3I)
• Establish a secure storage and compute environment
• Help the team apply data science and machine learning approaches
• Facilitate use of the project data and infrastructure
• Develop a data model to describe source objects and annotations
• Support reproducible data-driven humanities research
28. Understanding change over Space and Time
• Exploring the use of maps as a source for understanding industrial change
• Testing image analysis and computer vision methods on digital images of nineteenth-century British Ordnance Survey (OS) maps.
• How well do machine learning methods developed for contemporary images work on historic collections?
• How can we annotate efficiently?
• Plus census linking (soon?)
29. Working with our Communities
• Classifying and annotating articles to build datasets for analysis
• Finding engaging tasks to introduce the project
• How do we integrate machine learning into crowdsourcing?
• Make more interesting tasks? Provide better feedback? Improve datasets 'on the fly'?
30. Early work with Communities
Participants are actively engaging with our research
32. Further work with Communities
• Use classified articles as the basis of machine learning or lexicon expansion methods to find more relevant articles
• Test how accident details could be used with nominal linkage (matching names between datasets) and other methods to trace the longer-term impact of accidents on individuals, families and communities
• Explore how language around accidents changed over time and place, across types of newspaper, and with changes in how people understood technology
• Organise public talks, editathons, etc., online or in local libraries
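Nominal linkage ultimately rests on deciding whether two written names refer to the same person. As a minimal, hypothetical sketch (real linkage would also compare ages, places and occupations, and handle period spelling and abbreviations), the function below links each name from an accident report to its most similar census name using `difflib.SequenceMatcher` from Python's standard library, keeping only matches at or above an assumed threshold of 0.85. The function names and sample names are illustrative only.

```python
from difflib import SequenceMatcher


def normalise(name):
    """Lower-case, turn full stops into spaces and collapse whitespace."""
    return " ".join(name.lower().replace(".", " ").split())


def name_similarity(a, b):
    """Similarity ratio in [0, 1] between two normalised names."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def link_names(source_names, target_names, threshold=0.85):
    """Map each source name to its most similar target name, if similar enough."""
    links = {}
    for name in source_names:
        best = max(target_names, key=lambda t: name_similarity(name, t), default=None)
        if best is not None and name_similarity(name, best) >= threshold:
            links[name] = best
    return links


# Hypothetical names: one list from accident reports, one from a census extract.
accident_names = ["John Smyth", "Mary Ann Walker", "Thos. Brown"]
census_names = ["John Smith", "Mary Ann Walker", "William Green"]
print(link_names(accident_names, census_names))
```

"John Smyth" links to "John Smith" (a ratio of 0.9) and "Thos. Brown" stays unlinked, which is the safer outcome when there is no good candidate. Abbreviations such as "Thos." for "Thomas" are one reason real linkage needs more than string similarity.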
33. Find out more! Sample recent blog posts
• Newspaper copyright and Living with Machines
• The impact of OCR on downstream Natural Language Processing tasks
• D3 JavaScript visualisation in a Python Jupyter notebook
• Code and Coffee 💻☕️
• Deep learning reading group
• Sources: Understanding the Victorian Newspaper Landscape
• Collecting annotations from British Library staff
• Finding words in maps, part 2: seeing the results
• A quick tour of two counties
Find out more: http://livingwithmachines.ac.uk/latest/
34. Find out more!
• Research papers and datasets: https://bl.iro.bl.uk/
• GitHub code https://github.com/Living-with-machines
• Blog posts http://livingwithmachines.ac.uk/latest/
• Mailing list: https://mailchi.mp/1ac5f6281510/livingwithmachines
• Website: https://livingwithmachines.ac.uk
• Email: digitalresearch@bl.uk
• Twitter: @LivingWMachines
35. Thank you! Questions?
Dr Mia Ridge, Digital Curator, British Library @mia_out bl.uk/digital
• Research papers and datasets: bl.iro.bl.uk/
• GitHub code github.com/Living-with-machines
• Blog posts livingwithmachines.ac.uk/latest/
• Mailing list: mailchi.mp/1ac5f6281510/livingwithmachines
• Website: livingwithmachines.ac.uk
• Email: digitalresearch@bl.uk
• Twitter: @LivingWMachines
Editor's Notes
I'll aim to talk for about 40 minutes then go into questions.
The Library is big
We're only one of several collaborative and digital-focused teams in the Library. Set up in 2010, the Digital Scholarship team was formed to dedicate focus to the changing research landscape.
Digital curators are embedded in collection areas or join the Library as part of major digitisation projects.
The Digital Scholarship Department enables innovative research based on the British Library's digital collections through:
Getting content in digital form and online
Collaborative projects
Offering digital research support and guidance
Events, competitions, and awards
I often describe my job as...
The British Library's St Pancras building is meant to resemble a ship – and we all know how much work it can take to turn large ships.
An opportunity to learn from each other.
Big thanks to our funders and academic partners. The project runs for about 5 years.
From the press release...
How representative is the digitised press of the 19th-century press as a whole? Contextual information can create confidence in the claims we make.
The Newspaper Picker tool was designed to help the historians pick newspapers to digitise; we needed to get from 2,500 newspaper titles to about 50. The visualisation gives more weight to longer runs of newspapers (more sustained readership) and to under-represented areas and audiences.
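The Newspaper Picker's weighting could be approximated by a simple score. This sketch is hypothetical: the `Title` fields, the weights and the scoring formula are assumptions for illustration, not the tool's actual implementation. It rewards longer runs (normalised so the longest run scores 1.0), adds bonuses for titles from under-represented areas and audiences, and keeps the top `n` titles.

```python
from dataclasses import dataclass


@dataclass
class Title:
    """Hypothetical record for one newspaper title."""
    name: str
    first_year: int
    last_year: int
    county: str
    audience: str


def pick_titles(titles, under_represented_counties, under_represented_audiences,
                n=50, run_weight=1.0, area_bonus=0.5, audience_bonus=0.5):
    """Return the n highest-scoring titles under an assumed weighting scheme."""
    longest = max(t.last_year - t.first_year + 1 for t in titles)

    def score(t):
        # Longer runs score higher; the longest run in the list counts 1.0.
        s = run_weight * (t.last_year - t.first_year + 1) / longest
        if t.county in under_represented_counties:
            s += area_bonus
        if t.audience in under_represented_audiences:
            s += audience_bonus
        return s

    return sorted(titles, key=score, reverse=True)[:n]


titles = [
    Title("A", 1800, 1899, "Yorkshire", "general"),
    Title("B", 1850, 1859, "Cumberland", "general"),
    Title("C", 1870, 1889, "Yorkshire", "labour"),
    Title("D", 1820, 1869, "Dorset", "general"),
    Title("E", 1860, 1869, "Cumberland", "labour"),
]
picked = pick_titles(titles, {"Cumberland"}, {"labour"}, n=3)
print([t.name for t in picked])
```

With these assumed weights, the short-lived title E outranks the century-long run A because it covers both an under-represented area and audience; tuning the weights is where the historians' judgment would come in.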
We developed tools for disambiguating and locating place names in texts.
We're using Jupyter Notebooks as they let us share work in progress while providing documentation, commentary and executable code.