1. 101 This is Digital Scholarship:
Exploring the Computational Turn
in the Humanities
Nora McGregor
Curator, Digital Research
@ndalyrose
2. www.bl.uk 2
Timetable
10:00 Introductions
10:15 The digital turn and defining digital scholarship
10:30 Behind the Buzzwords
11:00 Comfort Break
11:15 Behind the Buzzwords
11:45 Group Activity & Creative Discussion
12:15 What's next?
12:30 Close
3.
Today’s Main Agenda
• Define Digital Scholarship
• Understand some of the buzzwords:
  • Text & Data Mining and Data Visualisation
  • Georeferencing & Digital Mapping
  • Crowdsourcing & Collaboration
  • Humanities Data & Datasets
• Explore how digital technologies are re/shaping research and the
challenges/opportunities this affords.
5.
Meet the Digital Scholarship Department
Founded in 2010, we support the innovative use of the British Library's digital collections and data through:
• Working behind the scenes to get content in
digital form and online
• Offering digital research support and guidance
• Supporting collaborative projects
• Running events, competitions, and awards
Teams:
• BL Labs
• Digital Curators
• Endangered Archives Programme
6.
The Digital Curators
Nora McGregor
Asian & African Collections
Stella Wisdom
Contemporary British
Aquiles Alencar-Brayner
European & Americas
Mia Ridge
Western Heritage
Rossitza Atanassova
Digitisation
9.
What is Digital Scholarship generally?
More than simply resource discovery: using computational methods either to answer existing research questions or to challenge existing theoretical paradigms.
Geocoding
Data Visualisation
Data Mining
Georeferencing
Crowdsourcing
Text mining
Collaboration
"Allow research areas to be investigated in
new ways, using new tools, leading to new
discoveries and analysis to generate new
understanding."
Dr. Adam Farquhar
Head of Digital Scholarship
10.
The emergence of the new digital humanities isn’t an isolated academic phenomenon. The institutional and disciplinary changes are part of a larger cultural shift, inside and outside the academy, a rapid cycle of emergence and convergence in technology and culture.
Steven E. Jones, The Emergence of the Digital Humanities (2014)
http://lisacharlotterost.github.io/2015/06/20/Searching-through-the-years/
14.
Text & Data Mining
• Deriving information from and finding patterns in texts and large
datasets.
Data Visualisation
• '…showing quantitative and qualitative information so that a viewer can
see patterns, trends, or anomalies, constancy or variation, in ways that
other forms – text and tables – do not allow.' (Michael Friendly)
• '…interactive, visual representations of abstract data to amplify
cognition' (Card et al.)
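To make the text-mining definition concrete, here is a minimal Python sketch that counts how often a term occurs in dated documents. A per-year pattern like this is the kind of output a data visualisation might then chart. The corpus, the `tokenize` and `term_counts_by_year` helpers and the simple alphabetic tokenisation are illustrative assumptions, not a Library tool. Real digitised text would also need OCR-error handling.

```python
from __future__ import annotations

import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_counts_by_year(docs: list[tuple[int, str]], term: str) -> dict[int, int]:
    """Count how often `term` occurs in the documents published in each year."""
    counts: Counter = Counter()
    target = term.lower()
    for year, text in docs:
        counts[year] += tokenize(text).count(target)
    return dict(sorted(counts.items()))


# Hypothetical sample corpus of (year, text) pairs standing in for digitised pages.
corpus = [
    (1839, "A great meeting was held at the market cross."),
    (1839, "The meeting resolved to petition Parliament."),
    (1842, "No meeting took place owing to the weather."),
]

print(term_counts_by_year(corpus, "meeting"))  # {1839: 2, 1842: 1}
```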
15.
What does this mean for the Library?
• We have millions of texts, pages and bibliographic records. What can
they tell us? Today's scholars can realistically answer that question, if
we help them!
• We can make these collections more accessible by providing them in
machine-readable formats that scholars can use, and by negotiating open
licensing and innovative mechanisms for accessing content legally.
• Keeping our data clean, and having the skills to align it with other
datasets, is important to enabling its widespread research use.
• GLAM (galleries, libraries, archives and museums) records often contain
uncertainty and fuzziness (e.g. date ranges, multiple values, uncertain or
unavailable information). We often have unique expertise in deciphering
this and need to bring that knowledge to bear.
Courses: 107 Data Visualisation, 118 Cleaning Up Data and 109 Data on the Web
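As a sketch of how GLAM fuzziness might be captured in machine-readable form, the function below turns a few catalogue-style date strings into (earliest, latest) year ranges rather than forcing a single year. The `parse_fuzzy_date` name and the handful of patterns it recognises are hypothetical. Real catalogue data contains many more conventions, and that is exactly where curatorial expertise comes in.

```python
from __future__ import annotations

import re


def parse_fuzzy_date(value: str) -> tuple[int, int] | None:
    """Turn a catalogue-style date string into an (earliest, latest) year range.

    Handles only a few hypothetical patterns:
      "1825", "[1825?]", "c. 1825"  -> (1825, 1825), uncertainty markers dropped
      "1713-1914", "1713 to 1914"   -> (1713, 1914)
      "18--"                        -> (1800, 1899), decade and year unknown
    Returns None when no year can be recovered.
    """
    text = value.strip()
    # Explicit range such as "1713-1914" or "1713 to 1914".
    m = re.search(r"(\d{4})\s*(?:-|to)\s*(\d{4})", text)
    if m:
        return int(m.group(1)), int(m.group(2))
    # Century known but decade unknown, e.g. "18--".
    m = re.search(r"\b(\d{2})--", text)
    if m:
        century = int(m.group(1)) * 100
        return century, century + 99
    # Single year, possibly bracketed or prefixed by "c.".
    m = re.search(r"(\d{4})", text)
    if m:
        year = int(m.group(1))
        return year, year
    return None


for raw in ["1825", "[1825?]", "c. 1825", "1713-1914", "1713 to 1914", "18--", "undated"]:
    print(f"{raw!r:>16} -> {parse_fuzzy_date(raw)}")
```

Keeping a range, rather than a guessed single year, preserves the uncertainty for any later analysis.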
16.
Case Study: Political Meetings Mapper
Dr. Katrina Navickas, a self-professed Luddite, wanted to know how many Chartist movement meetings took place in the 19th century, and where, and whether there was a more efficient way to extract this information programmatically from our digitised newspapers rather than by hand.
In total, 5,519 meetings held between 1838 and 1850 were discovered in 462 towns and villages across the UK!
These will be added to her existing findings:
http://protesthistory.org.uk/the-story-1789-1848/database-of-meetings
“I was able to do in minutes with a python code what
I’d spent the last ten years trying to do by hand!”
-Dr. Katrina Navickas, BL Labs Winner 2015
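A minimal sketch, not Dr. Navickas's actual code, of how meeting locations might be pulled out of digitised newspaper text with a regular expression. The pattern, the `extract_meeting_places` helper and the sample sentences are all hypothetical. Noisy OCR, plural forms ("meetings") and lower-case venue names ("at the Red Lion") would need fuzzier matching and a gazetteer in practice.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical pattern: the word "meeting" (any case), then, within the same
# sentence, "at" followed by a capitalised place name such as "Bradford".
MEETING_RE = re.compile(
    r"(?i:\bmeeting\b)[^.]*?\bat\s+([A-Z][a-z]+(?: [A-Z][a-z]+)*)"
)


def extract_meeting_places(text: str) -> list[str]:
    """Return the place name captured for each meeting mention in `text`."""
    return MEETING_RE.findall(text)


sample = (
    "A public meeting of the Chartists was held at Bradford on Monday. "
    "Meeting at Halifax postponed. "
    "The delegates resolved to meet again at Leeds."  # "meet" is not matched
)

places = extract_meeting_places(sample)
print(places)  # ['Bradford', 'Halifax']
print(Counter(places))
```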
17.
Case Study: Big Data History of Music
How can vast amounts of bibliographic data held by research libraries be unlocked for music researchers to analyse?
Can this data be interrogated in ways that challenge the traditional narratives of music history?
Analyses and visualisations exposed previously uncharted patterns in the history of music, for instance the rise and fall of music printing in 16th- and 17th-century Europe (huge dips in output in Venice were down to plague and war).
https://www.royalholloway.ac.uk/music/research/abigdatahistoryofmusic/home.aspx
18.
Case Study: Digital Music Lab: Infrastructure and
tools for analysing Big Music Data
This collaborative AHRC project used and extracted
information from over 29,000 recordings from the BL's
World and Traditional Music Collections, along with over
18,000 recordings from the Classical Music collections,
over 400 recordings of Oral History, and over 700 radio
broadcasts.
The DML project enabled for the first time:
▪ Facility to perform computational analysis of music
audio collections held at the British Library
▪ Large scale analysis for recognising patterns and
trends within and across multiple collections
▪ Remote access to analysis systems and data stored
onsite at BL without copyright infringement
http://dml.city.ac.uk/
See interface & video tutorial:
http://dml.city.ac.uk/vis/#help
21.
Some representative modes of enquiry…
• Correspondence, Networks & Relationships (Republic of Letters)
• Mapping Literature (Willa Cather)
• Historical Social Movements (Political Meetings Mapper)
• Historical reconstructions (Orbis)
• Cities & Memory (Bomb Sight)
• Spread of Technology & Ideas (Atlas of Early Printing)
• Human-Environment Interaction (London Sound Survey)
22.
Orbis: "Google Maps for Ancient Rome"
Video: https://www.youtube.com/watch?v=eWz7vXzmreg
View Interactive Map: http://atlas.lib.uiowa.edu/
Project Site: http://atlas.lib.uiowa.edu/about.php
The Stanford Geospatial Network Model of the Roman World reconstructs the time cost and financial expense associated with a wide range of types of travel in antiquity.
ORBIS was created using data from both primary sources and computational geography simulations of travel, wind and sea patterns, seasonal access, costs and other considerations to plot realistic transport networks.
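Network models like ORBIS treat places as nodes and routes as weighted edges, so finding the quickest or cheapest journey becomes a shortest-path problem. The sketch below applies Dijkstra's algorithm to a hypothetical four-node network with a single cost (travel days) per edge. ORBIS itself models several cost dimensions, seasons and modes of transport, and this is not its implementation.

```python
from __future__ import annotations

import heapq


def cheapest_route(
    graph: dict[str, list[tuple[str, float]]], start: str, goal: str
) -> tuple[float, list[str]]:
    """Dijkstra's algorithm: return (total cost, route) from start to goal.

    Returns (inf, []) when the goal cannot be reached.
    """
    queue: list[tuple[float, str, list[str]]] = [(0.0, start, [start])]
    settled: set[str] = set()
    while queue:
        cost, node, route = heapq.heappop(queue)
        if node == goal:
            return cost, route
        if node in settled:
            continue  # a cheaper path to this node was already processed
        settled.add(node)
        for neighbour, weight in graph.get(node, []):
            if neighbour not in settled:
                heapq.heappush(queue, (cost + weight, neighbour, route + [neighbour]))
    return float("inf"), []


# Hypothetical network: node -> [(neighbour, travel time in days)].
network = {
    "A": [("B", 2.0), ("C", 5.0)],
    "B": [("C", 1.5), ("D", 6.0)],
    "C": [("D", 2.0)],
}

print(cheapest_route(network, "A", "D"))  # (5.5, ['A', 'B', 'C', 'D'])
```

Adding a second weight (e.g. cost in money) would mean either running the search once per cost type or combining costs into one weighted score; the choice shapes what "cheapest" means.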
23.
Georeferencing & Digital Mapping
• Georeferencing relates information (documents, datasets, maps,
images) to geographic locations through place names and place codes
or geospatial referencing (longitude and latitude coordinates).
• Digital Mapping is the creation of a front-end interface to display data
associated with geographic locations, data which is typically stored in a
geographic information system (GIS).
“Digital maps [can] contain, in addition to quantitative data, photographs,
video, audio, archival records, even relevant bibliographies….[can]
synthesize material from various disciplines, reveal patterns and
relationships in both time and space, and provide a dynamic new way to
conduct and share research”.
(Humanities Gone Spatial)
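A minimal georeferencing sketch: records are matched to coordinates through a place-name lookup and emitted as GeoJSON, a common input format for web-mapping front ends. The tiny gazetteer (with approximate coordinates), the `georeference` helper and the `place`/`title` field names are hypothetical. GeoJSON stores points as [longitude, latitude], the reverse of the usual spoken order.

```python
from __future__ import annotations

import json

# Hypothetical mini-gazetteer: place name -> (latitude, longitude), approximate.
GAZETTEER = {
    "london": (51.5074, -0.1278),
    "manchester": (53.4808, -2.2426),
    "leeds": (53.8008, -1.5491),
}


def georeference(records: list[dict]) -> dict:
    """Attach coordinates to records via their 'place' field; return GeoJSON."""
    features = []
    for record in records:
        coords = GAZETTEER.get(record["place"].strip().lower())
        if coords is None:
            continue  # unmatched names would need manual review
        lat, lon = coords
        features.append({
            "type": "Feature",
            # GeoJSON orders point coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": record,
        })
    return {"type": "FeatureCollection", "features": features}


records = [
    {"title": "Hypothetical map sheet", "place": "Manchester"},
    {"title": "Hypothetical letter", "place": "Leeds "},
    {"title": "Hypothetical pamphlet", "place": "Atlantis"},
]

print(json.dumps(georeference(records), indent=2))
```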
24.
What does this all mean for the Library?
• If we provide this data in ways that others can easily use, we’ll
greatly increase access to, and awareness of, our collections.
• We can provide tools for georeferencing our material and for ensuring
our objects have proper place names & geospatial coordinates.
• We might investigate building our own map-based search interface for
content that would be better understood if discoverable at scale in
this way.
• Geocoding our own materials can help us to understand the geographic
scope and breadth of our collections.
Course: 108 Georeferencing and Digital Mapping
28.
The East India Company archives
include 900 log-books of ships
containing daily instrumental
measurements of temperature and
pressure, and subjective estimates of
wind speed and direction, from voyages
across the Atlantic and Indian Oceans
between 1789 and 1834.
The Met Office digitised and transcribed
these books, providing 273,000 new
weather records offering an
unprecedentedly detailed view of the
weather and climate of the late
eighteenth and early nineteenth
centuries in certain locations, which can
be used to test the accuracy of their
forecasting models.
http://www.clim-past.net/8/1551/2012/cp-8-1551-2012.html
http://blogs.bl.uk/untoldlives/2013/04/history-and-science-meet-1.html
Video: https://vimeo.com/43884291
35.
Crowdsourcing & Collaboration
• 'The act of a company or institution taking a function once performed by
employees and outsourcing it to an undefined (and generally large)
network of people in the form of an open call' (Wired, 2006)
• Or, using cognitive surplus: 'the spare processing of millions of human
brains'.
36.
What does this mean for the Library?
• It can be a mechanism for us to create new collections relevant to our
researchers and to turn our existing collections into useful datasets.
• The Library operates at huge scale as business as usual; crowdsourcing
helps us tackle some of this work so that we can apply our limited
resources more strategically.
• It can enhance the discoverability of our digital collections while creating
engaging experiences and meaningful forms of participation for the public.
• It can build relationships with external specialists and be an avenue for
sharing our own expertise (e.g. Wikipedia)!
Course: 105 Crowdsourcing in Libraries, Museums & Archives
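Crowdsourcing projects often ask several volunteers to transcribe the same item and then reconcile their answers. Below is a minimal sketch of one simple reconciliation strategy: majority vote after whitespace and case normalisation. The `consensus` helper and its 0.6 agreement threshold are hypothetical choices. Items below the threshold come back as `None`, flagging them for expert review.

```python
from __future__ import annotations

from collections import Counter


def consensus(transcriptions: list[str], min_agreement: float = 0.6) -> str | None:
    """Return the most common normalised transcription if enough volunteers agree.

    Normalisation collapses runs of whitespace and lower-cases the text.
    Returns None when there are no transcriptions or agreement is too low.
    """
    if not transcriptions:
        return None
    normalised = [" ".join(t.split()).lower() for t in transcriptions]
    text, votes = Counter(normalised).most_common(1)[0]
    return text if votes / len(normalised) >= min_agreement else None


# Hypothetical volunteer transcriptions of the same line of text.
print(consensus(["Meeting at Bury", "meeting at  Bury", "Meeting at Burry"]))
print(consensus(["Leeds", "Lees", "Leeds Road"]))  # no majority -> None
```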
43.
Humanities Data & Datasets
• A dataset is a distinct collection of data, ideally packaged, preserved
and made accessible for enquiry.
• Humanities data might be sets of bibliographic information, images,
image-processing details, texts, or texts with mark-up and annotations, etc.
• Humanities data is typically messy!
44.
What does this mean for the Library?
• No one knows our collections data, and its condition, better than we do. This
expertise is essential to informing the research process, particularly
when such data is the basis of a project.
• Easy access to well-described, reliable data and datasets that
researchers can trust enables new research and allows us to respond
more quickly to frequent, similar requests.
• Catalogue records in particular are not just for locating individual items.
Taken as a whole, and if normalised, structured and made accessible via an
API, they form a dataset that helps us make informed decisions about
collecting, preserving and digitising materials.
Courses: 109 Data on the Web: Mash-ups, APIs and The Semantic Web, 107 Data Visualisation, 118
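To illustrate catalogue records being used "as a whole", the sketch below profiles a collection by counting records per decade and language. It assumes each record has already been normalised into a dictionary with an integer `year` and a `language` code. The field names, the `profile_collection` helper and the sample records are hypothetical. Records without a usable year are kept under `None` so that gaps stay visible instead of silently disappearing.

```python
from __future__ import annotations

from collections import Counter


def profile_collection(records: list[dict]) -> Counter:
    """Count catalogue records per (decade, language) pair.

    Records without an integer year are tallied under decade None;
    records without a language fall back to "und" (undetermined).
    """
    tally: Counter = Counter()
    for record in records:
        year = record.get("year")
        decade = (year // 10) * 10 if isinstance(year, int) else None
        tally[(decade, record.get("language", "und"))] += 1
    return tally


# Hypothetical normalised catalogue records.
records = [
    {"year": 1825, "language": "ben"},
    {"year": 1827, "language": "ben"},
    {"year": 1831, "language": "eng"},
    {"year": None, "language": "ben"},
]

for (decade, language), count in profile_collection(records).most_common():
    print(decade, language, count)
```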
50.
Two Centuries of Indian Print
The pilot will see over 4,000 items dating from 1713 to 1914, mostly in Bengali, digitised and catalogued.
http://www.bl.uk/press-releases/2015/november/unlocking-indias-printed-heritage
A dedicated Digital Curator supports computationally driven research, such as text mining, by creating and curating datasets for inclusion on data.bl.uk and by providing digital skills training.
Right: Pleasing tales designed to improve the understanding, and
direct the conduct of young persons, 1825
51.
The OCR Challenge
• Benefit: enables not only search but also research at
scale across many items
• Example: Quarterly Lists. What can the
history of book publishing reveal about a
particular time period or place?
• Big challenges: tables, etc. The solutions tried
so far yielded poor results!
• Exploring possibilities with Google, ABBYY,
Tesseract, Indian partners and
research centres
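One common way to quantify how poor OCR results are is the character error rate (CER): the edit distance between the OCR output and a hand-keyed ground truth, divided by the length of the ground truth. The sketch below computes it with a standard Levenshtein implementation. It does not call any OCR engine, and the sample strings are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def character_error_rate(ocr_text: str, ground_truth: str) -> float:
    """Edit distance divided by the length of the ground-truth text."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ocr_text, ground_truth) / len(ground_truth)


# Hypothetical OCR output with two misread characters ("l" -> "1", "t" -> "l").
print(round(character_error_rate("Quarter1y Lisl", "Quarterly List"), 3))  # 0.143
```

A CER measured on a small hand-keyed sample gives a like-for-like way to compare the different OCR options being explored on the same pages.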
53.
Activity: Get Creative!
The AHRC has just announced a funding call to support “Innovative
Digital Research Projects” from cultural heritage institutions.
In small groups, drawing on some of the tools, methods and ideas
presented today, develop an idea for enabling innovative digital research
at the British Library with our collections.
This could be anything from combining digital collections in interesting
new ways, crowdsourcing a new digital collection or dataset, or developing
a new service for researchers. Get creative!
Consider the challenges and opportunities.
Draw or outline your idea and deliver a 5-minute pitch to the room.
You have 20 minutes.
54.
Next steps
The Digital Scholarship Training
Programme is an internal staff
training initiative by the Digital Curator
team that launched in November
2012.
It helps us to situate our collections and
expertise in the realm of digital
research, and to explore its opportunities and
challenges.
• 101 What is Digital Scholarship?
• 103 Digitisation at British Library
• 105 Crowdsourcing in Libraries, Museums and Cultural Heritage Institutions
• 107 Data Visualisation for Analysis in Scholarly Research
• 108 Geocoding Historical Information and Digital Mapping
• 109 Data on the Web: Mash-ups, APIs and The Semantic Web
• 118 Cleaning up Data
55.
Informal monthly meetings to run through self-paced tutorials
• From Paper Maps to the Web: A DIY Digital Maps Primer
• Literary & Historical Network Analysis using Gephi
• …join us and recommend more!
More info on the wiki!
56.
Contacts
Team Email: digitalresearch@bl.uk
Team Blog: http://britishlibrary.typepad.co.uk/digital-scholarship/
Nora McGregor, Asian & African Collections
Aquiles Alencar-Brayner, European & Americas
Stella Wisdom, Contemporary British
Mia Ridge, Western Heritage
Rossitza Atanassova, Digitisation
Mahendra Mahey, BL Labs