1@BL_Labs @Britishlibrary @BL_DigiSchol
11:00 - 13:00, Thursday, 26th April 2018,
British Library Labs and Digital Scholarship at the British Library,
Harley Room, British Library, St Pancras, London.
Presentation to the National Science Library of the Chinese Academy of Sciences
The Work of British Library Labs and Digital Scholarship
Insights from British Library Labs and an emerging role for Libraries
mahendra.mahey@bl.uk
Mahendra Mahey, Manager of British Library Labs (BL Labs)
2@BL_Labs @Britishlibrary @BL_DigiSchol
Mahendra Mahey WeChat
WeChat ID: MahendraMahey
3@BL_Labs @Britishlibrary @BL_DigiSchol
Meet the Digital Scholarship Department
Founded in 2010, we support colleagues and
researchers to make innovative use of British
Library digital collections and data.
Cross-disciplinary experts in the areas of
digitisation, librarianship, digital history &
humanities, computer and data science,
looking at how technology is
transforming research, and in turn, our
services.
Activities:
•Helping get content into digital form and online
•Offering digital research support and guidance
•Supporting collaborative projects
•Running events, competitions, and awards
•British Library Labs
•Digital Curators
•Endangered Archives Programme (EAP)
•THOR & Freya
Adam Farquhar
Head of Digital Scholarship
Neil Fitzgerald
Head of Digital
Research Team
4@BL_Labs @Britishlibrary @BL_DigiSchol
•An area of scholarly activity, born from humanities computing, at the
intersection of computing/digital technologies and the
humanities.
•The field both employs technology in the pursuit of humanities
research, and subjects technology to humanistic questioning and
interrogation.
•DH is collaborative, cross-disciplinary, and computationally
engaged research, teaching, and publishing.
https://en.wikipedia.org/wiki/Digital_humanities
Defining digital humanities (DH)
5@BL_Labs @Britishlibrary @BL_DigiSchol
Getting (& staying) in the game
The Digital Scholarship Training Programme is
an internal staff training initiative by the Digital
Curator team that launched in November 2012.
Informed by the Digital Humanities, we looked at
what researchers in the field were
learning/doing.
6@BL_Labs @Britishlibrary @BL_DigiSchol
Upcoming Courses
• 118 Cleaning up Data October 19, 2017 10:00-15:00
• 116 Metadata at the British Library November 01, 2017 10:00-15:00
• 105 Crowdsourcing in GLAM November 16, 2017 10:00-15:00
• 101 This is Digital Scholarship December 05, 2017 10:00-12:30
• 103 Digitisation Coming early 2018
• 107 Data Visualisation Coming February 2018
• 108 Geo-referencing & Digital Mapping Coming early 2018
Example Hack & Yacks
• Handwritten Text Recognition with Transkribus
• From Paper Maps to the Web: A DIY Digital Maps Primer
• Literary & Historical Network Analysis using Gephi
Example Readings
• Recovering Women’s History with Network Analysis: A Case Study of the Fabian News
• Do Artifacts Have Politics?
• Putting Big Data to Good Use: Historical Case Studies
7@BL_Labs @Britishlibrary @BL_DigiSchol
Theme: Improving access to texts
(Driven by Asian & African & Western Heritage
priorities)
• DCT257 Two Centuries of Indian Print
The pilot will see over 4,000 items dated between 1713 and 1914, mostly Bengali, digitised
and catalogued. A dedicated Digital Curator supports computationally driven
research, such as text mining, with outputs including curated datasets
for inclusion on data.bl.uk, and provides digital skills training.
• DCT330
Arabic Script & derived Scripts Analysis and Recognition Conference (ASAR 2017)
Partnership with researchers at the Alan Turing Institute; successful proposal to
host the 2nd International Workshop on Arabic Script Analysis and Recognition
(ASAR 2017). Working with curators to produce Arabic datasets for an OCR/HCR
competition.
• DCT339:
International conference on the cyber infrastructure for historical China Studies
Bringing together research centers, libraries and public/private text database
creators together with scholars and programmers who are creating online utilities
and APIs to explore a cyberinfrastructure for China studies.
8@BL_Labs @Britishlibrary @BL_DigiSchol
Theme: Emerging formats, interactive fiction, games
(Driven by Contemporary British priorities)
• DCT304: Two Collaborative Doctoral Partnerships researching Digital
Publishing and The Reader: Interactions between Readers and Writers of
Creative Texts in Digital Environments.
The purpose of these collaborative research projects is to investigate the changing
nature of publishing in digital environments.
• DCT316: Ambient Literature
Ambient Literature is an AHRC-funded collaboration between the University of
the West of England, Bath Spa University and the University of Birmingham to investigate
the locational and technological future of the book.
• DCT317 The Infinite Library: Interactive Fiction Summer School
Five day digital writing summer school led by multi-award-winner Dr Abigail Parry
and a host of specialists in fiction, interactive fiction and games writing, teaching
skills and techniques to work within a dynamic form – one that allows the reader to
choose the direction of the narrative.
9@BL_Labs @Britishlibrary @BL_DigiSchol
Theme: Digital Research Environments and
Pathways, proof of concepts, prototype services
• DCT337 Alan Turing Institute / AHRC project
ATI and BL are establishing a joint research programme in Digital Humanities;
initially with a proof of concept project applying data science methodologies to the
British Newspaper Archive
• DCT336 UCL Computer Science student projects 2017/2018
Outputs for each project might include: prototype interfaces, proof of concept
applications of computational techniques, datasets for publication on data.bl.uk,
reusable code on github, staff talks, blog posts and website case studies.
• DCT333: Creating a Chronotopic Ground for the Mapping of Literary Texts:
Innovative Data Visualisation and Spatial Interpretation in the Digital Medium
Three year AHRC funded project started October 2017 and ending September 2020
The project is about visualising literary place and space, using the digital medium to
advance understanding and interpretation of literary texts in entirely new ways.
10@BL_Labs @Britishlibrary @BL_DigiSchol
http://www.bl.uk/projects/british-library-labs
Funded by the Andrew W. Mellon Foundation & British Library
Running since March 2013
Core Team
•Adam Farquhar (Principal Investigator) (0.1)
•Mahendra Mahey (Manager) (Full Time)
•Ben O’Steen (Technical Lead) (Full Time)
•Eleanor Cooper (Project Officer) (0.5)
12@BL_Labs @Britishlibrary @BL_DigiSchol
Wider engagement…not just
Digital Humanities / Scholarship Researchers
Researchers
https://goo.gl/WutNyi
Artists
http://goo.gl/nNKhQ2
Librarians
Curators
https://goo.gl/9NWZUW
Software Developers
https://goo.gl/7QQ5Tf
Archivists
https://goo.gl/x7b4tg
Educators
https://goo.gl/qh01Mi
Working and Communicating
Inspirational
examples
Experiences
Challenges
Lessons Learned
Entrepreneurs
https://goo.gl/Fx8RG7
13@BL_Labs @Britishlibrary @BL_DigiSchol
Living Knowledge Vision (2015 – 2023)
Custodianship Research Business
Culture Learning International
To make our intellectual heritage accessible to everyone,
for research, inspiration and enjoyment and be the most open, creative
and innovative institution of its kind by 2023 (50 year anniversary).
Document: http://goo.gl/h41wW7 Speech: https://goo.gl/Py9uHK
Roly Keating (Chief Executive Officer of the British Library)
14@BL_Labs @Britishlibrary @BL_DigiSchol
Physical collections – not just books!
> 180 million* items
> 0.8 m* serial titles
> 8 m* stamps
> 14 m* books
> 6 m* sound recordings
> 4 m* maps
> 1.6 m* musical scores
> 0.3 m* manuscripts
> 60 m* patents
King’s Library
*Estimates
15@BL_Labs @Britishlibrary @BL_DigiSchol
What about Digital?
Born Digital Digitised
16@BL_Labs @Britishlibrary @BL_DigiSchol
Knowledge Quarter London
80 knowledge organisations (as of 14/04/18) within 1 mile radius of
Kings Cross, http://www.knowledgequarter.london
Born digital
17@BL_Labs @Britishlibrary @BL_DigiSchol
#bldigital
1-2 %* digitised
* estimate
Digitisation
Partnerships
Commercial & Other Organisations
Amount
increasing rapidly
e.g. Heritage Made Digital
Bias in digitisation
http://goo.gl/bR9UJL
Sample Generator
18@BL_Labs @Britishlibrary @BL_DigiSchol
The Story of the Digital Collection…
Digital
Collection
Curator
Who paid for the digitisation?
Who did the digitisation?
Technology used
Born digital?
Published
Unpublished
Where is it?
Can it still be accessed?
Generates income
Reputational risk in using?
Legalities /
Ethics / Morality
Politics when digitised
Personalities involved
Surprises (e.g. gaps)
Descriptive information
Old format not supported
What media was the
digitisation done from?
Is there any background documentation?
No Descriptive information
Inconsistent descriptive information
Still there?
Good to know the background ‘story’ of a Digital Collection
if you want to use it for research and make conclusions…
19@BL_Labs @Britishlibrary @BL_DigiSchol
•Digitisation costs money, time, resources
•704 Digitisation projects / collections
(as of 26/04/2018)
From the UK Web (born digital)
to small amounts of digitised manuscripts (digitised)
So little digitised…why?
© £
20@BL_Labs @Britishlibrary @BL_DigiSchol
Openly Licensed Digital Content?
15% Openly
Licensed
Around 80%*
available online
Working to make more content open through the
Access and Re-use committee, which meets once a month…
though some collections will always be available onsite only,
for legal, ethical and other reasons.
Breakdown by collection*
Manuscripts 59%
Books 9%
Maps and Views 7%
Newspapers 3%
Archives and Records 3%
Paintings, Prints and Drawings 2%
*Based on number of digitisation projects (704 as of 26/04/18)
Largest proportion of funding
Public / Private Partnership
15 %* Openly Licensed – most online
85 %* Available onsite only at the moment
*Estimates
21@BL_Labs @Britishlibrary @BL_DigiSchol
Open Content vs Onsite Only Access
• Access easier for openly licensed content
• More challenging for on-site, in-copyright, non-print legal
deposit, data protected, old content media & contemporary
material (post 1877)
https://goo.gl/Y5zCXg
©
22@BL_Labs @Britishlibrary @BL_DigiSchol
How do we give access to
onsite-only
Digital Collections
(85% of our Digital Collections)?
23@BL_Labs @Britishlibrary @BL_DigiSchol
only in Reading Rooms due to ©
only on site due to © or ethical etc
not online / available – various storage devices, personal data
online and open
British Library
online behind paywall
Challenges of access to Digital Collections
Labs Residency Model
24@BL_Labs @Britishlibrary @BL_DigiSchol
Why are we doing this?
25@BL_Labs @Britishlibrary @BL_DigiSchol
Why are we doing this? (1)
We support research: it’s our job!
We want to work closely with, and
listen to, those who want to use
our digital collections and data
for their work!
https://goo.gl/esqpRb
26@BL_Labs @Britishlibrary @BL_DigiSchol
We can learn how we are and should be supporting you and
this therefore shapes the problems we work on, such as:
https://goo.gl/esqpRb
Why are we doing this? (2)
• Access to digital collections / data?
• Advice, guidance, technical support,
training
• Services, Tools and Processes?
• Many more reasons…
27@BL_Labs @Britishlibrary @BL_DigiSchol
Where are the gaps between what you want & what we can
give?
How do we build the bridges to overcome the gaps?
Why are we doing this? (3)
https://goo.gl/6CwCeE
28@BL_Labs @Britishlibrary @BL_DigiSchol
How do we help you ‘navigate’ your way through the
(sometimes) ‘maze’ of the
Library to what you want to do?
Sometimes requires understanding the culture of the organisation
https://goo.gl/62JnQT
Why are we doing this? (4)
29@BL_Labs @Britishlibrary @BL_DigiSchol
Have you got X?
https://upload.wikimedia.org/wikipedia/commons/5/50/Real_wuerzburg.jpg
Looking for Physical Content in the British Library
30@BL_Labs @Britishlibrary @BL_DigiSchol
Have you got X digitised / in digital form?
http://www.yorkmix.com/wp-content/uploads/2014/04/mr-simms-sweet-shoppe-york.jpg
Looking for Digitised / Digital Content in the BL
31@BL_Labs @Britishlibrary @BL_DigiSchol
• The Library has to go out to meet researchers, regularly and
cyclically to tell them what we have and learn what they
want to do
• Debunk ‘myths’ about the Library
• Show / tell researchers about the reality of our data
• Researchers’ ideas always change once they explore the
data!
https://goo.gl/esqpRb
Lots of two-way communication!
BL Labs runs annual ‘Roadshows’ around the UK and the World
32@BL_Labs @Britishlibrary @BL_DigiSchol
https://goo.gl/qpCLlk
https://goo.gl/wMTS3Z
• Dialogue typically:
– you are ‘lucky’ & we have the digital content
/ data relevant to your research
– we don’t have exactly what you’re looking for,
but is there anything of interest? Let’s talk…
– engagement can be hard work and it’s
constantly required to maintain interest in our
digital collections!
• We also tend to attract researchers with ‘fuzzier’
research boundaries and possibly open to more
interdisciplinary / collaborative research
• Artists find this dialogue easier…
What engagement does the BL have with
researchers wanting to use our digital content?
33@BL_Labs @Britishlibrary @BL_DigiSchol
Our Audience and You
Audience research & digital interests
Digital collections you have
This is where Labs works
It starts with a conversation!
Only a small amount of content is digitised!
Might not be the treasure expected at the end of a digital journey!
34@BL_Labs @Britishlibrary @BL_DigiSchol
Interactions with BL Labs “researcher”
wanting to work with our data
35@BL_Labs @Britishlibrary @BL_DigiSchol
Phase 1: Exploration
Allows a researcher to:
– Understand the data in open-ended fashion.
– Discover potential tools to work with the data.
– Gain awareness of their capabilities and limitations.
– Develop a firmer research query.
– Gauge the costs, resources, risks and time needed.
•Outputs of the exploration are not intended to be shareable,
beyond personal experience and key features (data size, formats, tool
successes, etc.).
36@BL_Labs @Britishlibrary @BL_DigiSchol
Phase 2: Query-Focussed
• A firmer and more informed query by the researcher where:
– Suitable datasets already lined up
– There is a good idea of the initial toolset and capabilities (human
and computer) required
– The project output is outlined, and relevant reuse applications are
begun.
– Clear agreements on what happens at the end of the project – data
deletion, virtual machine deletion/archiving/etc.
– Project may iterate on initial ideas, depending on researcher’s
cost/risk appetite
Submit idea
for support
37@BL_Labs @Britishlibrary @BL_DigiSchol
Phase 3: Wrap-up
• Wrap-up
– Work (code, notes) exported and given to researcher
– All derivative data is licensed or retained based on reuse
agreements (Access & Reuse board, etc.)
– Provisions made for the project are wound down, as agreed
(derivative data deleted after a grace period, etc.)
38@BL_Labs @Britishlibrary @BL_DigiSchol
Playbills, Books, Newspapers
(includes Optical Character Recognition (OCR))
Digital collections and Datasets
British National
Bibliography
http://bnb.data.bl.uk
http://sounds.bl.uk
http://dml.city.ac.uk/
Music (Recordings & Sheet) & Sounds
http://goo.gl/frSMJt
Broadcast News (TV and Radio)
http://goo.gl/cwThHw
http://goo.gl/pBkisZ
http://goo.gl/E8aRyQ
Usage data
EThOS
Web Archive
Images, Manuscripts & Maps
http://www.qdl.qa/
Qatar Digital Library
http://idp.bl.uk/
International
Dunhuang
Project
Maps
http://www.bl.uk/maps/
Hebrew Manuscripts
http://goo.gl/4sbCp9
Flickr &
Wikimedia Commons
https://goo.gl/LZRmaZ
39@BL_Labs @Britishlibrary @BL_DigiSchol
Interoperable Viewer
o IIIF compliant (http://iiif.io/)
o Downloads
o Citations
o Search within text
o Sound,
multispectral, 3D
o License and
usage terms
o Calls JPEG2000s
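IIIF compliance means images can be requested through the URL pattern defined by the IIIF Image API. The following is a minimal Python sketch of that pattern only. The server address (https://example.org/iiif), the identifier and the function name iiif_image_url are all hypothetical and do not describe any actual British Library endpoint.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build a IIIF Image API request URL.

    Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # Identifiers are URL-encoded, including any slashes they contain.
    encoded = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{encoded}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Hypothetical server and identifier, for illustration only.
print(iiif_image_url("https://example.org/iiif", "map 42"))
# Request only the top-left quarter of the image, by percentage.
print(iiif_image_url("https://example.org/iiif", "map 42", region="pct:0,0,50,50"))
```

The size value "max" is an assumption about the server's API version; servers that implement only Image API 2.0 expect "full" for a full-size image instead.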
40@BL_Labs @Britishlibrary @BL_DigiSchol
Finding Open Cultural Heritage Datasets
http://abs.bl.uk/Digital+Collections
Collection Guides (199 as of 17/04/2018)
https://www.bl.uk/collection-guides/
Datasets about our collections
Bibliographic datasets relating to our published and archival
holdings
Datasets for content mining
Content suitable for use in text and data mining research
Datasets for image analysis
Image collections suitable for large-scale image-analysis-
based research
Datasets from UK Web Archive
Data and API services available for accessing UK Web
Archive
Digital mapping
Geospatial data, cartographic applications, digital aerial
photography and scanned historic map materials
https://data.bl.uk
Download collections as zips, no API
Each dataset has a Digital Object Identifier (DOI)
can be referenced for research
Not all discoverable via
search engines!
41@BL_Labs @Britishlibrary @BL_DigiSchol
Messiness in historical data
• 'Begun in Kiryu, Japan, finished in France'
• 'Bali? Java? Mexico?'
• Variations on USA:
– U.S.
– U.S.A
– U.S.A.
– USA
– United States of America
– USA ?
– United States (case)
• Inconsistency in uncertainty
– U.S.A. or England
– U.S.A./England ?
– England & U.S.A.
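The variations above can be partly reconciled with a few lines of code. This is a minimal Python sketch, not a recommended workflow: the lookup set USA_VARIANTS and the function name normalise_place are assumptions made for illustration, and anything the code does not recognise is deliberately left unchanged for a person to review.

```python
import re

# Hypothetical lookup of spellings to merge; extend it for your own data.
USA_VARIANTS = {"us", "usa", "united states", "united states of america"}


def normalise_place(raw):
    """Return (place, uncertain) for one free-text 'place' value."""
    # A question mark anywhere is kept as an explicit uncertainty flag.
    uncertain = "?" in raw
    # Drop question marks and full stops, collapse whitespace, lower-case.
    key = re.sub(r"[?.]", "", raw)
    key = re.sub(r"\s+", " ", key).strip().lower()
    if key in USA_VARIANTS:
        return "United States of America", uncertain
    # Anything else is left untouched so a person can review it.
    return raw.strip(), uncertain


for value in ["U.S.", "U.S.A.", "USA ?", "united states", "U.S.A. or England"]:
    print(value, "->", normalise_place(value))
```

Keeping the uncertainty as a separate flag, rather than deleting the question mark, records the normalisation decision instead of hiding it.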
42@BL_Labs @Britishlibrary @BL_DigiSchol
Open Refine
http://openrefine.org/
http://freeyourmetadata.org/cleanup/
offers useful advice on cleaning up data
43@BL_Labs @Britishlibrary @BL_DigiSchol
Characterising your data
http://blogs.bl.uk/digital-scholarship/2013/09/data-exploration-through-visualisation.html
44@BL_Labs @Britishlibrary @BL_DigiSchol
Big Data History of Music
How can vast amounts of bibliographic data held by research libraries
be unlocked for music researchers to analyse?
Can this data be interrogated in ways that challenge the traditional
narratives of music history?
Analyses and
visualisations exposed
previously uncharted
patterns in the history of
music, for instance the
rise and fall of music
printing in 16th- and 17th-
century Europe (huge
dips in output in Venice
were down to plague and
war).
45@BL_Labs @Britishlibrary @BL_DigiSchol
• Cultural heritage records contain uncertainty and fuzziness (e.g. date ranges, multiple
values, uncertain or unavailable information). Curators and staff at institutions often
have unique expertise in deciphering these anomalies: ask them! ([1960] vs. 1960 can
have a big impact depending on what you’re doing.)
• Optical Character Recognition in particular is an imperfect art: you need to consider how
bad it is, how this might affect your findings, and what needs doing to mitigate it.
• Keeping data clean, organised, open and well described will not only make your life
easier, but also enable its widespread re-use beyond your project and increase its future impact. (Datasets
you’ve created in the course of your research projects could even be used to enhance
national collections!)
• Decisions always need to be made while normalising information for visualisation.
Documenting them is important for your research but also for future re-use!
• Is your aim enquiry or presentation? This will have an impact on the tools and
data cleaning choices you make.
Things to consider: Data + Tools
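The [1960] vs. 1960 point can be made concrete with a short Python sketch. It assumes, as in many cataloguing conventions, that square brackets mark a date supplied by the cataloguer rather than printed on the item, and that a question mark marks doubt. Check with curators what the notation means in a given collection; the function name parse_catalogue_date is hypothetical.

```python
import re


def parse_catalogue_date(raw):
    """Split a catalogue date string into (year, inferred, uncertain)."""
    text = raw.strip()
    # Assumption: [brackets] mean the date was supplied by a cataloguer.
    inferred = text.startswith("[") and text.endswith("]")
    # Assumption: a question mark means the cataloguer was unsure.
    uncertain = "?" in text
    match = re.search(r"\d{4}", text)
    year = int(match.group()) if match else None
    return year, inferred, uncertain


for value in ["1960", "[1960]", "[1960?]", "n.d."]:
    print(value, "->", parse_catalogue_date(value))
```

Carrying the two flags alongside the year lets a later analysis include or exclude inferred dates explicitly, and documents the normalisation decision for re-use.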
46@BL_Labs @Britishlibrary @BL_DigiSchol
47@BL_Labs @Britishlibrary @BL_DigiSchol
Data / Digital Curation / Data Librarian
Digitisation
Collecting
Born Digital
Data Management
Data Curation
Data Characterisation
48@BL_Labs @Britishlibrary @BL_DigiSchol
British Library Data Projects
http://dx.doi.org/10.15123/PUB.4307
https://goo.gl/xCM9A7
https://www.datacite.org/
https://odin-project.eu/
https://project-thor.eu/
49@BL_Labs @Britishlibrary @BL_DigiSchol
Data Strategy (2017)
• Data Management
• Data Creation
• Data Archiving and Preservation
• Data Access, Discovery & Reuse
http://blogs.bl.uk/files/britishlibrarydatastrategyoutline.pdf
datasets@bl.uk
https://data.bl.uk
http://bl.uk/datasets
https://goo.gl/X129Yp
50@BL_Labs @Britishlibrary @BL_DigiSchol
How?
51@BL_Labs @Britishlibrary @BL_DigiSchol
Digital research methods
Digital Scholarship
Visualisations
Application Programming Interfaces (APIs)
for datasets e.g. Metadata, Images, etc Annotation
Location based searching & Geo-tagging Crowdsourcing
Human Computation
In 20 years’ time?
52@BL_Labs @Britishlibrary @BL_DigiSchol
Competition
Awards
Projects
Tell us your ideas (2013-16/17)
Show us what you have already done
in Research, Artistic, Commercial,
Educational & BL Staff categories
Talk to us about working on
collaborative projects
Tell us your ideas (2018
onwards) <=5 days support
• Roadshows
• Events
• Online
• F2F & Virtual
Conversations
New!
Digital Research Support
11 Oct 2018
Engaging with our Digital Collections / Data
More details at:
http://labs.bl.uk
53@BL_Labs @Britishlibrary @BL_DigiSchol
What did people
actually do?
Examples from Text and Images
Over 200 examples (including sound, video) from
Competition and Awards:
http://labs.bl.uk/Ideas+for+Labs
http://labs.bl.uk/Other+Uses+of+Collections
54@BL_Labs @Britishlibrary @BL_DigiSchol
Example Pattern of Research
1, 2, 3
1. Find / identify new things in messy stuff
2. Unlock hidden history / data
3. Celebrate new discoveries
55@BL_Labs @Britishlibrary @BL_DigiSchol
Experiments with Text
56@BL_Labs @Britishlibrary @BL_DigiSchol
https://goo.gl/oUNj5N
https://goo.gl/ImAUv4
Finding things in ‘messy’
Optical Character Recognised (OCR) text
Mrs Folly
• Clean up some manually
• Get human ‘ground truth’
• Write computer code (sometimes
it’s machine learning) to find
things reliably in it ‘automatically’
• Try code on messy content
• Tweak if necessary
• Digital ‘lasso’ around content
• Human sift through
Mrs Folly
An example pattern of research
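The "ground truth, write code, try, tweak" loop above can be sketched in Python as follows. The OCR lines are invented for illustration (they are not from a real collection), a regular expression stands in for whatever finding method a project actually uses, and the function names are hypothetical.

```python
import re


def find_candidates(lines, pattern):
    """Return the indices of OCR lines that match the pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    return {i for i, line in enumerate(lines) if regex.search(line)}


def precision_recall(found, truth):
    """Compare machine-found line indices against human 'ground truth'."""
    true_positives = len(found & truth)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Invented sample of messy OCR lines.
ocr_lines = [
    "Mrs Folly was seen at the fair",
    "Mrs. Fo11y spoke to the crowd",   # OCR read 'll' as '11'
    "The weather was fine",
    "A folly indeed, said the judge",
]
truth = {0, 1}  # lines a human marked as mentioning Mrs Folly

# First attempt, then a tweak that tolerates the l/1 confusion.
strict = find_candidates(ocr_lines, r"Mrs\.? Folly")
loose = find_candidates(ocr_lines, r"Mrs\.? Fo[l1][l1]y")
print("strict:", precision_recall(strict, truth))
print("loose: ", precision_recall(loose, truth))
```

Measuring each tweak against the same human-checked sample shows whether a change actually finds more of the right things or just more noise.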
57@BL_Labs @Britishlibrary @BL_DigiSchol
Looking through a rubbish bin?
https://goo.gl/UeEvqs
Good stuff! Some Rubbish
58@BL_Labs @Britishlibrary @BL_DigiSchol
Legalities of Machine Learning /
Text and Data mining
https://goo.gl/toq4Bo
Legalities of Machine Learning / Text and Data
mining still up for discussion…Often misunderstood
Is it the same as humans reading and looking for
patterns…just a bit quicker?
59@BL_Labs @Britishlibrary @BL_DigiSchol
Smell of soup & Machine Learning
Thanks to Memo Akten (@memotv on twitter) for the inspiration!
https://goo.gl/toq4Bo
Nasreddin, 13th
Century Turkish Sufi
http://web2.uvcs.uvic.ca/elc/studyzone/330/reading/smell1.htm
60@BL_Labs @Britishlibrary @BL_DigiSchol
Victorian Meme Machine (2014)
https://goo.gl/HMqDt3
Bob Nicholson
http://victorianhumour.tumblr.com/
Bob Nicholson interviewed on
BBC Radio 4 Making History Programme:
http://goo.gl/fmV9ep
And telling jokes to the public:
http://goo.gl/xIDRhz
Bob obtained further funding from his university
Looking for more collaborations
https://www.youtube.com/watch?v=-GRgj7Q5OM0
Rob Walker, Victorian Mother-in-law Jokes
Victorian Comedy Night, 7 Nov 2016
Learnt about access paths
to digital collections
61@BL_Labs @Britishlibrary @BL_DigiSchol
Katrina Navickas (2015)
Political Meetings Mapper
http://politicalmeetingsmapper.co.uk
https://goo.gl/Qq78Oa
Labs Symposium 2015
https://goo.gl/BSA3be
Interview 2015
The Chartist Newspaper
http://goo.gl/vOLSnH
Chartist Monster Meeting
Chartists Walking Tour and
Re-enactment London
Learnt that domain knowledge
reduces noise
62@BL_Labs @Britishlibrary @BL_DigiSchol
Black Abolitionist Performances & their
Presence in Britain (2016) – Hannah-Rose Murray
Frederick
Douglass
Ellen
Craft
Josiah
Henson
Ida B
Wells
A Performance by
Joe Williams &
Martelle Edinborough
http://frederickdouglassinbritain.com/
Started to implement
Machine Learning Techniques
63@BL_Labs @Britishlibrary @BL_DigiSchol
Data-mining verse in 18th
Century newspapers
BL Labs Project 16-17, Jennifer Batt
https://goo.gl/5Akthd
Slides courtesy Jennifer Batt
64@BL_Labs @Britishlibrary @BL_DigiSchol
Verse: 81% lines begin with
initial capital
Prose: 52% lines begin with
initial capital
Westminster Journal 3 March 1745
Slides courtesy Jennifer Batt
Started to refine
Machine Learning Techniques
Jennifer Batt @ the BL on World Poetry Day
‘40,000’ things found…
Possibly using Gale Primary
Sources interface to see if we
can sift this data
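The initial-capital statistic quoted on this slide (81% of verse lines vs. 52% of prose lines) suggests a simple feature that code can compute. The Python sketch below is an illustration of that one feature only, not the project's actual method; the 0.7 threshold is an arbitrary assumption placed between the two figures, and the sample lines are invented.

```python
def initial_capital_share(lines):
    """Fraction of non-empty lines whose first character is an upper-case letter."""
    stripped = [line.strip() for line in lines if line.strip()]
    if not stripped:
        return 0.0
    capitalised = sum(1 for line in stripped if line[0].isupper())
    return capitalised / len(stripped)


def looks_like_verse(lines, threshold=0.7):
    """Crude guess: flag a block as possible verse for a human to check."""
    # The threshold is a hypothetical value, not one taken from the project.
    return initial_capital_share(lines) >= threshold


# Invented examples of a verse-like and a prose-like block.
verse_block = ["The sun is high", "And moon is low", "are bright", "Today"]
prose_block = ["It was reported", "that the ship", "had sailed from", "London"]
print(initial_capital_share(verse_block), looks_like_verse(verse_block))
print(initial_capital_share(prose_block), looks_like_verse(prose_block))
```

A single feature like this will produce many false hits, which is why the found items still need sifting by a person.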
65@BL_Labs @Britishlibrary @BL_DigiSchol
OCR Challenges and Opportunities
Enables search + research at
scale across many items
Multiple table styles
Efficient OCR
solution for
Bengali
Bengal Library Catalogue of Books,1918-1919, SV 4
66@BL_Labs @Britishlibrary @BL_DigiSchol
OCR Competition
ICDAR (Kyoto, Japan, Nov 2017)
PRIMA Research Lab, University of Salford
23 institutions 7 countries (50% India)
commercial tech companies + university computer science &
engineering depts
 76% character accuracy
67@BL_Labs @Britishlibrary @BL_DigiSchol
Transcribing historical Arabic Scientific
Manuscripts for OCR research
https://fromthepage.com/bldigital/arabic-scientific-manuscripts
http://blogs.bl.uk/digital-scholarship/2018/03/arabic-handwrittten-ocr.html
68@BL_Labs @Britishlibrary @BL_DigiSchol
Use of Overproof
OCR Correction?
Re-OCR with
ABBYY FineReader?
https://www.abbyy.com/en-gb/
http://overproof.projectcomputing.com/
RE-OCR
Cleaning up OCR Text – significant improvement
(depending on original image quality)
69@BL_Labs @Britishlibrary @BL_DigiSchol
Virtual Infrastructure for OCR text
OCR text ‘scraped’ from
digitised newspapers
and put in internal cloud
Jupyter notebook
Write Python code and see results
in a web browser
http://jupyter.org
Access available for researchers ‘in residence’
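As an illustration of the kind of code a researcher might run in such a notebook, here is a minimal keyword-in-context search in Python. The sample text is invented and the function name keyword_in_context is hypothetical; how the OCR text is actually loaded inside the Library's environment is not shown here.

```python
import re


def keyword_in_context(text, keyword, width=30):
    """Return each occurrence of keyword with `width` characters either side."""
    hits = []
    for match in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        hits.append(text[start:end])
    return hits


# Invented stand-in for a page of OCR text.
sample = "The Chartist meeting was held. A chartist spoke."
for hit in keyword_in_context(sample, "chartist", width=4):
    print(repr(hit))
```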
70@BL_Labs @Britishlibrary @BL_DigiSchol
Experiments with Images
71@BL_Labs @Britishlibrary @BL_DigiSchol
Worked better for female faces than men’s
Press
http://mechanicalcurator.tumblr.com
Posts image every 30 minutes
http://www.flickr.com/photos/britishlibrary/
1,020,418 images
need tagging!
Creative uses of images
Face recognition
Algorithms based on photos
Mechanical Curator
with an algorithmic brain
(Circles, Squares and Slanty etc)
http://goo.gl/qPPgxX
Snipping out images
from 65,000 Digitised Books*
>800,000,000* views
>17,000,000* tags
https://goo.gl/FgZ4HM
Work @ BL by Ben O’Steen, Labs
and Digital Research Team
*Matt Prior - http://goo.gl/j29Tnx
Since Dec 2013
Tumblr
*Estimates
>More demand to see
physical items
72@BL_Labs @Britishlibrary @BL_DigiSchol
Tagging, Tagging, Tagging…
73@BL_Labs @Britishlibrary @BL_DigiSchol
Tagging a million images
Iterative Crowdsourcing
http://goo.gl/j6fxac
Cardiff University’s
Lost Visions Project
http://www.metadatagames.org/
Metadata Games
James Heald
Mario Klingemann
Chico 45
Use computational methods
Human Tagger
Top British Library Flickr Commons Taggers
18 hard core taggers
How to reward this ‘small group’ and keep it motivated?
Average for ‘crowd’ is 1 tag per person
What kind of ‘task’ can this ‘crowd’ do?
Mobile games for ‘Ships’, ‘Covers’ and ‘Portraits’ Interface for tagging
74@BL_Labs @Britishlibrary @BL_DigiSchol
Adam Crymble (2015)
Crowdsource Arcade
What if crowd sourcing
looked like this?
http://goo.gl/LBfJ4W
http://goo.gl/OH9pOZ
https://goo.gl/7z0j8p
30 mins talk
Labs Symposium (2015)
https://goo.gl/SSRsdd
5 min interview (2015)
http://goo.gl/0APpE8
Game Jam
Using Arcade Games
to help Tag images
‘Art Treachery’ and ‘Tag Attack’
75@BL_Labs @Britishlibrary @BL_DigiSchol
https://www.libcrowds.com/
76@BL_Labs @Britishlibrary @BL_DigiSchol
Special Jury’s Prize (2015)
James Heald – Wikimedia and Map work
https://goo.gl/WYZCB2
http://goo.gl/HNQq5e
https://goo.gl/VPgffL
https://commons.wikimedia.org/
https://goo.gl/djtm1b
Labs Symposium (2015)
Geotagging maps
50,000 Maps
Found in Flickr 1 million
Human & Computational Tagging
& Community engagement
Geo-referencing work
https://www.bl.uk/georeferencer
77@BL_Labs @Britishlibrary @BL_DigiSchol
SherlockNet: Competition Winner 2016
Karen Wang, Luda Zhao and Brian Do
Using Convolutional Neural Networks to Automatically Tag and Caption
the British Library Flickr Commons 1 million Image Collection
12 categories
>15.5 million tags added
>100,000 captions
bit.ly/sherlocknet
Pooled surrounding
OCR text on page
from similar images
Used Microsoft COCO (photographs) &
British Museum Prints and Drawings
collections as training sets.
Tags Captions
78@BL_Labs @Britishlibrary @BL_DigiSchol
Applicability for Digital Humanities
• Training image recognition tool on
Indian illustrations
• Mapping Bengali publishing history
through bibliographic xml records
• Possibility for NLP on extensive
range of genres
• Incentivising creative re-use of xml,
TIFFs, metadata through
competition
79@BL_Labs @Britishlibrary @BL_DigiSchol
http://goo.gl/dM8ieA
Mario Klingemann (2015)
Code Artist / Curator
http://goo.gl/bNxGZZ
Kris Hoffman (2016)
Animation for Fashion Week 2016
https://goo.gl/QilqqT
Jiayi Chong 2016 - Animation tool
https://www.facebook.com/RealmlandStory/
Paul Rand Pierce 2016
Graphic Novel on Facebook
Tragic Looking Women
44 Men who Look 44
(Notice the direction faces)
A Hat on the Ground
Spells trouble
Artistic / Creative Works
https://www.youtube.com/watch?v=Q3SBxO34Zlc
David Normal 2014 and 2015
Collages/Paintings & Lightboxes
80@BL_Labs @Britishlibrary @BL_DigiSchol
Imaginary Cities – BL Labs Project /
Exhibition 16-18 (Michael Takeo Magruder)
An artistic exploration seeking to create provocative fictional cityscapes for the Information Age
from the British Library’s digital collection of historic urban maps
81@BL_Labs @Britishlibrary @BL_DigiSchol
Alanna Hilton
British Fashion Colleges Council and
Teatum Jones
82@BL_Labs @Britishlibrary @BL_DigiSchol
Be careful about drawing conclusions based on
‘black box’ software & techniques (e.g.
sentiment analysis); learn the assumptions
behind them first!
Lessons Learned & Challenges…
Beware of ‘Black Box’ software…
83@BL_Labs @Britishlibrary @BL_DigiSchol
Breaking Black Boxes – Melodee Beals
84@BL_Labs @Britishlibrary @BL_DigiSchol
Huge appetite to use digital content & data
for anyone’s ideas!
(e.g. Flickr Commons stats).
Lessons Learned & Challenges…
Huge demand for open digital content…
https://goo.gl/yQ5s4U
85@BL_Labs @Britishlibrary @BL_DigiSchol
Many researchers have the domain knowledge but lack
technical / digital skills to use Digital Research
methods.
Should they be teamed up with those that want to solve
problems or get trained?
Digital skills training needed for Humanities
researchers/ Librarians…
https://goo.gl/i5GVfI
https://goo.gl/kwcK8J
https://software-carpentry.org/
https://librarycarpentry.github.io/
http://www.datacarpentry.org/
86@BL_Labs @Britishlibrary @BL_DigiSchol
Labs mindset…
1. Start a conversation, generate positive energy,
be nice, have fun and try to support ideas.
2. Start with small experiments, but think big!
3. Fail faster (don’t be afraid) and persevere.
4. Reject perfectionism! Good enough is
sometimes…good enough!
5. Celebrate the uses of digital collections, tell
the world!
https://goo.gl/noASfl
87@BL_Labs @Britishlibrary @BL_DigiSchol
Library Labs around the world
Meeting in London to share experiences 13-14 September 2018
BL Labs
Royal Danish Library
Austrian National Library
Library of Congress
BnF
KB
DXLab
Swedish National Library
Norwegian National Library
Berlin State Library
88@BL_Labs @Britishlibrary @BL_DigiSchol
Hey there Young Sailor!
Ling Low 2016 – Hey there Young Sailor
https://www.youtube.com/watch?v=bcOP1E5bRE0
VIMEO.COM/SWEETANDLOWFILMS
@SWEETNLOWFILMS ON INSTAGRAM
@SWEETNLOWLING ON TWITTER
The Impatient Sisters
Play to fade!
89@BL_Labs @Britishlibrary @BL_DigiSchol
Questions?
Prompt questions:
• I didn’t understand…
• Can you tell me more about…
• Why did you…
• I am not sure about…
• What if…
• Why didn’t you…
• What’s the best thing about…
• What was the worst thing…
• If you could have your time again, …
• How did you…
• I am not sure I agree about…
• What was the biggest challenge…
• What was the most successful thing about…
• Who did…

Presentation to the National Science Library of the Chinese Academy of Sciences

  • 1. 1@BL_Labs @Britishlibrary @BL_DigiSchol 1100 - 1300, Thursday, 26th April 2018, British Library Labs and Digital Scholarship at the British Library, Harley Room, British Library, St Pancras, London. Presentation to the National Science Library of the Chinese Academy of Sciences The Work of British Library Labs and Digital Scholarship Insights from British Library Labs and an emerging role for Libraries mahendra.mahey@bl.uk Mahendra Mahey, Manager of British Library Labs (BL Labs)
  • 2. 2@BL_Labs @Britishlibrary @BL_DigiSchol Mahendra Mahey Wechat Wechat ID: MahendraMahey
  • 3. 3@BL_Labs @Britishlibrary @BL_DigiSchol Meet the Digital Scholarship Department Founded in 2010, we support colleagues and researchers to make innovative use of British Library digital collections and data. Cross disciplinary experts in the areas of digitisation, librarianship, digital history & humanities, computer and data science, looking at how technology is transforming research, and in turn, our services. Activities: •Help get content in digital form and online •Offering digital research support and guidance •Supporting collaborative projects •Running events, competitions, and awards •British Library Labs •Digital Curators •Endangered Archives Programme (EAP) •THOR & Freya Adam Farquhar Head of Digital Scholarship Neil Fitzgerald Head of Digital Research Team
  • 4. 4@BL_Labs @Britishlibrary @BL_DigiSchol •An area of scholarly activity, born from humanities computing, at the intersection of computing/digital technologies and the humanities. •The field both employs technology in the pursuit of humanities research, and subjects technology to humanistic questioning and interrogation. •DH is collaborative, crossdisciplinary, and computationally engaged research, teaching, and publishing. https://en.wikipedia.org/wiki/Digital_humanities Defining digital humanities (DH)
  • 5. 5@BL_Labs @Britishlibrary @BL_DigiSchol Getting (& staying) in the game The Digital Scholarship Training Programme is an internal staff training initiative by the Digital Curator team that launched in November 2012. Informed by the Digital Humanities, we look at what researchers in the field were learning/doing.
  • 6. 6@BL_Labs @Britishlibrary @BL_DigiSchol Upcoming Courses • 118 Cleaning up Data October 19, 2017 10:00-15:00 • 116 Metadata at the British Library November 01, 2017 10:00-15:00 • 105 Crowdsourcing in GLAM November 16, 2017 10:00-15:00 • 101 This is Digital Scholarship December 05, 2017 10:00-12:30 • 103 Digitisation Coming early 2018 • 107 Data Visualisation Coming February 2018 • 108 Geo-referencing & Digital Mapping Coming early 2018 Example Hack & Yacks • Handwritten Text Recognition with Transkribus • From Paper Maps to the Web: A DIY Digital Maps Primer • Literary & Historical Network Analysis using Gephi Example Readings • Recovering Women’s History with Network Analysis: A Case Study of the Fabian News • Do Artifacts Have Politics? • Putting Big Data to Good Use: Historical Case Studies
  • 7. 7@BL_Labs @Britishlibrary @BL_DigiSchol Theme: Improving access to texts (Driven by Asian & African & Western Heritage priorities) • DCT257 Two Centuries of Indian Print Pilot will see over 4,000 items dating from 1713 to 1914, mostly Bengali, digitised and catalogued. A dedicated Digital Curator supports computationally driven research, such as text mining, by creating and curating datasets for inclusion on data.bl.uk and providing digital skills training. • DCT330 Arabic Script & derived Scripts Analysis and Recognition Conference (ASAR 2017) Partnership with researchers at the Alan Turing Institute; successful proposal to host the 2nd International Workshop on Arabic Script Analysis and Recognition (ASAR 2017). Working with curators to produce Arabic datasets for an OCR/HCR competition. • DCT339: International conference on the cyber infrastructure for historical China Studies Bringing together research centers, libraries and public/private text database creators with scholars and programmers who are creating online utilities and APIs to explore a cyberinfrastructure for China studies.
  • 8. 8@BL_Labs @Britishlibrary @BL_DigiSchol Theme: Emerging formats, interactive fiction, games (Driven by Contemporary British priorities) • DCT304: Two Collaborative Doctoral Partnerships researching Digital Publishing and The Reader: Interactions between Readers and Writers of Creative Texts in Digital Environments. The purpose of these collaborative research projects is to investigate the changing nature of publishing in digital environments. • DCT316: Ambient Literature Ambient Literature is an AHRC funded collaboration between the University of West England, Bath Spa University and the University of Birmingham to investigate the locational and technological future of the book. • DCT317 The Infinite Library: Interactive Fiction Summer School Five day digital writing summer school led by multi-award-winner Dr Abigail Parry and a host of specialists in fiction, interactive fiction and games writing, teaching skills and techniques to work within a dynamic form – one that allows the reader to choose the direction of the narrative.
  • 9. 9@BL_Labs @Britishlibrary @BL_DigiSchol Theme: Digital Research Environments and Pathways, proof of concepts, prototype services • DCT337 Alan Turing Institute / AHRC project ATI and BL are establishing a joint research programme in Digital Humanities; initially with a proof of concept project applying data science methodologies to the British Newspaper Archive • DCT336 UCL Computer Science student projects 2017/2018 Outputs for each project might include: prototype interfaces, proof of concept applications of computational techniques, datasets for publication on data.bl.uk, reusable code on github, staff talks, blog posts and website case studies. • DCT333: Creating a Chronotopic Ground for the Mapping of Literary Texts: Innovative Data Visualisation and Spatial Interpretation in the Digital Medium Three year AHRC funded project started October 2017 and ending September 2020 The project is about visualising literary place and space, using the digital medium to advance understanding and interpretation of literary texts in entirely new ways.
  • 10. 10@BL_Labs @Britishlibrary @BL_DigiSchol http://www.bl.uk/projects/british-library-labs Funded by the Andrew W. Mellon Foundation & British Library Running since March 2013 Core Team •Adam Farquhar (Principal Investigator) (0.1) •Mahendra Mahey (Manager) (Full Time) •Ben O’Steen (Technical Lead) (Full Time) •Eleanor Cooper (Project Officer) (0.5)
  • 12. 12@BL_Labs @Britishlibrary @BL_DigiSchol Wider engagement…not just Digital Humanities / Scholarship Researchers Researchers https://goo.gl/WutNyi Artists http://goo.gl/nNKhQ2 Librarians Curators https://goo.gl/9NWZUW Software Developers https://goo.gl/7QQ5Tf Archivists https://goo.gl/x7b4tg Educators https://goo.gl/qh01Mi Working and Communicating Inspirational examples Experiences Challenges Lessons Learned Entrepreneurs https://goo.gl/Fx8RG7
  • 13. 13@BL_Labs @Britishlibrary @BL_DigiSchol Living Knowledge Vision (2015 – 2023) Custodianship Research Business Culture Learning International To make our intellectual heritage accessible to everyone, for research, inspiration and enjoyment and be the most open, creative and innovative institution of its kind by 2023 (50 year anniversary). Document:http://goo.gl/h41wW7 Speech:https://goo.gl/Py9uHK Roly Keating (Chief Executive Officer of the British Library)
  • 14. 14@BL_Labs @Britishlibrary @BL_DigiSchol Physical collections – not just books! > 180*million items > 0.8* m serial titles > 8* m stamps > 14* m books > 6* m sound recordings > 4* m maps > 1.6* m musical scores > 0.3* m manuscripts > 60* m patents King’s Library *Estimates
  • 15. 15@BL_Labs @Britishlibrary @BL_DigiSchol What about Digital? Born Digital Digitised
  • 16. 16@BL_Labs @Britishlibrary @BL_DigiSchol / Knowledge Quarter London 80 knowledge organisations (as of 14/04/18) within 1 mile radius of Kings Cross, http://www.knowledgequarter.london Born digital
  • 17. 17@BL_Labs @Britishlibrary @BL_DigiSchol #bldigital 1-2 %* digitised * estimate Digitisation Partnerships Commercial & Other Organisations Amount increasing rapidly e.g. Heritage Made Digital Bias in digitisation http://goo.gl/bR9UJL Sample Generator
  • 18. 18@BL_Labs @Britishlibrary @BL_DigiSchol The Story of the Digital Collection… Digital Collection Curator Who paid for the digitisation? Who did the digitisation? Technology used Born digital? Published Unpublished Where is it? Can it still be accessed? Generates income Reputational risk in using? Legalities / Ethics / Morality Politics when digitised Personalities involved Surprises (e.g. gaps) Descriptive information Old format not supported What media was the digitisation done from? Is there any background documentation? No Descriptive information Inconsistent descriptive information Still there? Good to know the background ‘story’ of a Digital Collection if you want to use it for research and make conclusions…
  • 19. 19@BL_Labs @Britishlibrary @BL_DigiSchol •Digitisation costs money, time, resources •704 Digitisation projects / collections (as of 26/04/2018) From the UK Web (born digital) to small amounts of digitised manuscripts (digitised) So little digitised…why? © £
  • 20. 20@BL_Labs @Britishlibrary @BL_DigiSchol Openly Licensed Digital Content? 15% Openly Licensed Around 80%* available online Working through to make more open through Access and Re-use committee which meets once a month… Though some collections will always only be available onsite due to various reasons including legal, ethical etc. Breakdown by collection* Manuscripts 59% Books 9% Maps and Views 7% Newspapers 3% Archives and Records 3% Paintings, Prints and Drawings 2% *Based on number of digitisation projects (704 as of 26/04/18) Largest proportion of funding Public / Private Partnership 15 %* Openly Licensed – most online 85 %* Available onsite only at the moment *Estimates
  • 21. 21@BL_Labs @Britishlibrary @BL_DigiSchol Open Content vs Onsite Only Access • Access easier for openly licensed content • More challenging for on-site, in-copyright, non-print legal deposit, data protected, old content media & contemporary material (post 1877) https://goo.gl/Y5zCXg ©
  • 22. 22@BL_Labs @Britishlibrary @BL_DigiSchol How do we give access to onsite-only Digital Collections (85% of our Digital Collections)?
  • 23. 23@BL_Labs @Britishlibrary @BL_DigiSchol only in Reading Rooms due to © only on site due to © or ethical etc not online / available – various storage devices, personal data online and open British Library online behind paywall Challenges of access to Digital Collections Labs Residency Model
  • 25. 25@BL_Labs @Britishlibrary @BL_DigiSchol Why are we doing this? (1) We support research; it’s our job! We want to work closely with, and listen to, those who want to use our digital collections and data for their work! https://goo.gl/esqpRb
  • 26. 26@BL_Labs @Britishlibrary @BL_DigiSchol We can learn how we are and should be supporting you and this therefore shapes the problems we work on, such as: https://goo.gl/esqpRb Why are we doing this? (2) • Access to digital collections / data? • Advice, guidance, technical support, training • Services, Tools and Processes? • Many more reasons…
  • 27. 27@BL_Labs @Britishlibrary @BL_DigiSchol Where are the gaps between what you want & what we can give? How do we build the bridges to overcome the gaps? Why are we doing this? (3) https://goo.gl/6CwCeE
  • 28. 28@BL_Labs @Britishlibrary @BL_DigiSchol How do we help you ‘navigate’ your way through the ‘maze’ (sometimes) of the Library to what you want to do? Sometimes this requires understanding the culture of the organisation https://goo.gl/62JnQT Why are we doing this? (4)
  • 29. 29@BL_Labs @Britishlibrary @BL_DigiSchol Have you got X? https://upload.wikimedia.org/wikipedia/commons/5/50/Real_wuerzburg.jpg Looking for Physical Content in the British Library
  • 30. 30@BL_Labs @Britishlibrary @BL_DigiSchol Have you got X digitised / in digital form? http://www.yorkmix.com/wp-content/uploads/2014/04/mr-simms-sweet-shoppe-york.jpg Looking for Digitised / Digital Content in the BL
  • 31. 31@BL_Labs @Britishlibrary @BL_DigiSchol • The Library has to go out to meet researchers, regularly and cyclically, to tell them what we have and learn what they want to do • Debunk ‘myths’ about the Library • Show / tell researchers about the reality of our data • Researchers’ ideas always change once they explore the data! https://goo.gl/esqpRb Lots of two-way communication! BL Labs runs annual ‘Roadshows’ around the UK and the World
  • 32. 32@BL_Labs @Britishlibrary @BL_DigiSchol https://goo.gl/qpCLlk https://goo.gl/wMTS3Z • Dialogue typically: – you are ‘lucky’ & we have the digital content / data relevant to your research – we don’t have exactly what you’re looking for, but is there anything of interest? Let’s talk… – engagement can be hard work and it’s constantly required to maintain interest in our digital collections! • We also tend to attract researchers with ‘fuzzier’ research boundaries who are possibly open to more interdisciplinary / collaborative research • Artists find this dialogue easier… What engagement does the BL have with researchers wanting to use our digital content?
  • 33. 33@BL_Labs @Britishlibrary @BL_DigiSchol Our Audience and You Audience research & Digital interests Digital collections you have This is where Labs works It starts with a conversation! Only a small amount of content is digitised! Might not be the treasure expected at the end of a digital journey!
  • 34. 34@BL_Labs @Britishlibrary @BL_DigiSchol Interactions with BL Labs “researcher” wanting to work with our data
  • 35. 35@BL_Labs @Britishlibrary @BL_DigiSchol Phase 1: Exploration Allows a researcher to: – Understand the data in open-ended fashion. – Discover potential tools to work with the data. – Gain awareness of their capabilities and limitations. – Develop a firmer research query. – Gauge the costs, resources, risks and time needed. •Outputs of the exploration are not intended to be shareable, beyond personal experience and key features (data size, formats, tool successes, etc.).
  • 36. 36@BL_Labs @Britishlibrary @BL_DigiSchol Phase 2: Query-Focussed • A firmer and more informed query by the researcher where: – Suitable datasets are already lined up – There is a good idea of the initial toolset and capabilities (human and computer) required – The project output is outlined, and relevant reuse applications are begun. – Clear agreements on what happens at the end of the project – data deletion, virtual machine deletion/archiving/etc. – Project may iterate on initial ideas, depending on researcher’s cost/risk appetite Submit idea for support
  • 37. 37@BL_Labs @Britishlibrary @BL_DigiSchol Phase 3: Wrap-up • Wrap-up – Work (code, notes) exported and given to researcher – All derivative data is licenced or retained based on reuse agreements (Access & Reuse board, etc.) – Provisions made for the project are wound-down, as agreed (derivative data deleted after a grace period, etc.)
  • 38. 38@BL_Labs @Britishlibrary @BL_DigiSchol Playbills, Books, Newspapers (includes Optical Character Recognition (OCR)) Digital collections and Datasets British National Bibliography http://bnb.data.bl.uk http://sounds.bl.uk http://dml.city.ac.uk/ Music (Recordings & Sheet) & Sounds http://goo.gl/frSMJt Broadcast News (TV and Radio) http://goo.gl/cwThHw http://goo.gl/pBkisZ http://goo.gl/E8aRyQ Usage data EThOS Web Archive Images, Manuscripts & Maps http://www.qdl.qa/ Qatar Digital Library http://idp.bl.uk/ International Dunhuang Project Maps http://www.bl.uk/maps/ Hebrew Manuscripts http://goo.gl/4sbCp9 Flickr & Wikimedia Commons https://goo.gl/LZRmaZ
  • 39. 39@BL_Labs @Britishlibrary @BL_DigiSchol Interoperable Viewer o IIIF compliant ( http://iiif.io/) o Downloads o Citations o Search within text o Sound, multispectral, 3D o License and usage terms o Calls JPEG2000s
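A IIIF-compliant viewer requests images through the IIIF Image API, whose URLs follow the pattern base/identifier/region/size/rotation/quality.format. The sketch below only builds such a URL; the server address and identifier are hypothetical placeholders, not real British Library endpoints.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build a IIIF Image API request URL.

    `base` and `identifier` are assumptions supplied by the caller; the
    identifier is percent-encoded so that slashes or colons inside it are
    not mistaken for URL path separators.
    """
    encoded = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{encoded}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Hypothetical server and identifier, for illustration only.
url = iiif_image_url("https://example.org/iiif", "ark:/12345/abc")
```

Changing `region` or `size` (for example `size="400,"` for a 400-pixel-wide derivative) is how a viewer asks for crops and thumbnails without downloading the full image.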
  • 40. 40@BL_Labs @Britishlibrary @BL_DigiSchol Finding Open Cultural Heritage Datasets http://abs.bl.uk/Digital+Collections Collection Guides (199 as of 17/04/2018) https://www.bl.uk/collection-guides/ Datasets about our collections Bibliographic datasets relating to our published and archival holdings Datasets for content mining Content suitable for use in text and data mining research Datasets for image analysis Image collections suitable for large-scale image-analysis- based research Datasets from UK Web Archive Data and API services available for accessing UK Web Archive Digital mapping Geospatial data, cartographic applications, digital aerial photography and scanned historic map materials https://data.bl.uk Download collections as zips, no API Each dataset has a Digital Object Identifier (DOI) can be referenced for research Not all discoverable via search engines!
  • 41. 41@BL_Labs @Britishlibrary @BL_DigiSchol Messiness in historical data • 'Begun in Kiryu, Japan, finished in France' • 'Bali? Java? Mexico?' • Variations on USA: – U.S. – U.S.A – U.S.A. – USA – United States of America – USA ? – United States (case) • Inconsistency in uncertainty – U.S.A. or England – U.S.A./England ? – England & U.S.A.
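A minimal sketch of how the spelling variants above might be collapsed before analysis. The lookup table and function name are hypothetical, and the approach (strip punctuation, then look up) is only one of several reasonable choices. Uncertainty is kept as a separate flag rather than thrown away, and multi-place values such as 'U.S.A. or England' are deliberately left untouched for a human to decide.

```python
import re

# Hypothetical lookup table: collapsed spelling -> canonical name.
CANONICAL = {
    "us": "United States",
    "usa": "United States",
    "united states": "United States",
    "united states of america": "United States",
}


def normalise_place(raw):
    """Return (place, uncertain) for one free-text place value."""
    uncertain = "?" in raw
    # Drop question marks and full stops, collapse whitespace, lower-case.
    key = re.sub(r"[?.]", "", raw)
    key = re.sub(r"\s+", " ", key).strip().lower()
    # Values not in the table are returned unchanged rather than guessed at.
    return CANONICAL.get(key, raw.strip()), uncertain
```

For example, `normalise_place("USA ?")` gives `("United States", True)`, so the doubt recorded by the cataloguer survives the clean-up.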
  • 42. 42@BL_Labs @Britishlibrary @BL_DigiSchol Open Refine http://openrefine.org/ http://freeyourmetadata.org/cleanup/ offers useful advice to cleaning up data
  • 43. 43@BL_Labs @Britishlibrary @BL_DigiSchol Characterising your data http://blogs.bl.uk/digital-scholarship/2013/09/data-exploration-through-visualisation.html
  • 44. 44@BL_Labs @Britishlibrary @BL_DigiSchol Big Data History of Music How can vast amounts of bibliographic data held by research libraries be unlocked for music researchers to analyse? Can this data be interrogated in ways that challenge the traditional narratives of music history? Analyses and visualisations exposed previously uncharted patterns in the history of music, for instance the rise and fall of music printing in 16th- and 17th- century Europe (huge dips in output in Venice were down to plague and war).
  • 45. 45@BL_Labs @Britishlibrary @BL_DigiSchol • Cultural heritage records contain uncertainty and fuzziness (e.g. date ranges, multiple values, uncertain or unavailable information). Curators and staff at institutions often have unique expertise in deciphering these anomalies; ask them! ([1960] vs. 1960 can have a big impact depending on what you’re doing) • Optical Character Recognition in particular is an imperfect art: you need to consider how bad it is, how this might affect your findings, and what needs doing to mitigate it. • Keeping data clean, organised, open and described well will not only make your life easier, but enable its widespread re-use beyond your own project and increase its future impact. (Datasets you’ve created in the course of your research projects could even be used to enhance national collections!) • Decisions always need to be made while normalising information for visualisation. Documenting them is important for your research but also future re-use! • Is your aim enquiry or presentation? All of this will have an impact on the tools and data cleaning choices you make. Things to consider: Data + Tools
  • 47. 47@BL_Labs @Britishlibrary @BL_DigiSchol Data / Digital Curation / Data Librarian Digitisation Collecting Born Digital Data Management Data Curation Data Characterisation
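The [1960] vs. 1960 point can be made concrete with a small sketch. It assumes the common cataloguing convention that square brackets (or a question mark) mark a date supplied or inferred by the cataloguer rather than printed on the item; check with the curator whether that holds for a given dataset. The function name is hypothetical.

```python
import re

# Matches 1960, [1960], 1960? and [1960?] but not ranges such as 1960-65.
YEAR = re.compile(r"^\[?(\d{4})\??\]?$")


def parse_catalogue_year(raw):
    """Return (year, supplied); year is None when no single year is found."""
    text = raw.strip()
    match = YEAR.match(text)
    if match is None:
        # Ranges and free text are left for a human to interpret.
        return None, False
    supplied = text.startswith("[") or "?" in text
    return int(match.group(1)), supplied
```

Keeping the `supplied` flag lets you run an analysis twice, once with and once without inferred dates, and see whether the conclusion changes.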
  • 48. 48@BL_Labs @Britishlibrary @BL_DigiSchol British Library Data Projects http://dx.doi.org/10.15123/PUB.4307 https://goo.gl/xCM9A7 https://www.datacite.org/ https://odin-project.eu/ https://project-thor.eu/
  • 49. 49@BL_Labs @Britishlibrary @BL_DigiSchol Data Strategy (2017) • Data Management • Data Creation • Data Archiving and Preservation • Data Access, Discovery & Reuse http://blogs.bl.uk/files/britishlibrarydatastrategyoutline.pdf datasets@bl.uk https://data.bl.uk http://bl.uk/datasets https://goo.gl/X129Yp
  • 51. 51@BL_Labs @Britishlibrary @BL_DigiSchol Digital research methods Digital Scholarship Visualisations Application Programming Interfaces (APIs) for datasets e.g. Metadata, Images, etc Annotation Location based searching & Geo-tagging Crowdsourcing Human Computation In 20 years time?
  • 52. 52@BL_Labs @Britishlibrary @BL_DigiSchol Competition Awards Projects Tell us your ideas (2013-16/17) Show us what you have already done in Research, Artistic, Commercial, Educational & BL Staff categories Talk to us about working on collaborative projects Tell us your ideas (2018 onwards) <=5 days support • Roadshows • Events • Online • F2F & Virtual Conversations New! Digital Research Support 11 Oct 2018 Engaging with our Digital Collections / Data More details at: http://labs.bl.uk
  • 53. 53@BL_Labs @Britishlibrary @BL_DigiSchol What did people actually do? Examples from Text and Images Over 200 examples (including sound, video) from Competition and Awards: http://labs.bl.uk/Ideas+for+Labs http://labs.bl.uk/Other+Uses+of+Collections
  • 54. 54@BL_Labs @Britishlibrary @BL_DigiSchol Example Pattern of Research 1, 2, 3 1. Find / identify new things in messy stuff 2. Unlock hidden history / data 3. Celebrate new discoveries
  • 56. 56@BL_Labs @Britishlibrary @BL_DigiSchol https://goo.gl/oUNj5N https://goo.gl/ImAUv4 Finding things in ‘messy’ Optical Character Recognised (OCR) text Mrs Folly • Clean up some manually • Get human ‘ground truth’ • Write computer code (sometimes it’s machine learning) to find things reliably in it ‘automatically’ • Try code on messy content • Tweak if necessary • Digital ‘lasso’ around content • Human sift through Mrs Folly An example pattern of research
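The "ground truth, write code, try, tweak" loop above can be sketched as follows. This is an illustration under stated assumptions, not the method any Labs project used: the sample lines, patterns and function names are hypothetical, and a regular expression stands in for whatever code or machine learning model a project actually writes.

```python
import re


def find_candidates(pattern, lines):
    """Return the indices of the lines that the pattern matches."""
    regex = re.compile(pattern, re.IGNORECASE)
    return {i for i, line in enumerate(lines) if regex.search(line)}


def precision_recall(found, truth):
    """Compare machine hits against the human 'ground truth' indices."""
    true_positives = len(found & truth)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical OCR lines; a human marked lines 0 and 1 as mentions.
lines = ["Mrs Folly sang", "Mrs Fol1y danced", "Mr Jolly spoke"]
truth = {0, 1}

strict = find_candidates(r"mrs folly", lines)
# Tweak: tolerate OCR confusing the letters l and i with the digit 1.
tolerant = find_candidates(r"mrs\s+fo[l1i]{2}y", lines)
```

Here the strict pattern misses the misread line, while the tweaked one finds both; the remaining hits still go to a human to sift.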
  • 57. 57@BL_Labs @Britishlibrary @BL_DigiSchol Looking through a rubbish bin? https://goo.gl/UeEvqs Good stuff! Some Rubbish
  • 58. 58@BL_Labs @Britishlibrary @BL_DigiSchol Legalities of Machine Learning / Text and Data mining https://goo.gl/toq4Bo Legalities of Machine Learning / Text and Data mining still up for discussion…Often misunderstood Is it the same as humans reading and looking for patterns…just a bit quicker?
  • 59. 59@BL_Labs @Britishlibrary @BL_DigiSchol Smell of soup & Machine Learning Thanks to Memo Akten (@memotv on twitter) for the inspiration! https://goo.gl/toq4Bo Nasreddin, 13th Century Turkish Sufi http://web2.uvcs.uvic.ca/elc/studyzone/330/reading/smell1.htm
  • 60. 60@BL_Labs @Britishlibrary @BL_DigiSchol http://victorianhumour.tumblr.com Victorian Meme Machine (2014) https://goo.gl/HMqDt3 Bob Nicholson http://victorianhumour.tumblr.com/ Bob Nicholson interviewed on BBC Radio 4 Making History Programme: http://goo.gl/fmV9ep And telling jokes to the public: http://goo.gl/xIDRhz Bob obtained further funding from his university Looking for more collaborations https://www.youtube.com/watch?v=-GRgj7Q5OM0 Rob Walker, Victorian Mother-in-law Jokes Victorian Comedy Night, 7 Nov 2016 Learnt about access paths to digital collections
  • 61. 61@BL_Labs @Britishlibrary @BL_DigiSchol Katrina Navickas (2015) Political Meetings Mapper http://politicalmeetingsmapper.co.uk https://goo.gl/Qq78Oa Labs Symposium 2015 https://goo.gl/BSA3be Interview 2015 The Chartist Newspaper http://goo.gl/vOLSnH Chartist Monster Meeting Chartists Walking Tour and Re-enactment London Learnt that domain knowledge reduces noise
  • 62. 62@BL_Labs @Britishlibrary @BL_DigiSchol Black Abolitionist Performances & their Presence in Britain (2016) – Hannah-Rose Murray Frederick Douglass Ellen Craft Josiah Henson Ida B Wells A Performance by Joe Williams & Martelle Edinborough http://frederickdouglassinbritain.com/ Started to implement Machine Learning Techniques
  • 63. 63@BL_Labs @Britishlibrary @BL_DigiSchol Data-mining verse in 18th Century newspapers BL Labs Project 16-17, Jennifer Batt https://goo.gl/5Akthd Slides courtesy Jennifer Batt
  • 64. 64@BL_Labs @Britishlibrary @BL_DigiSchol Verse: 81% lines begin with initial capital Prose: 52% lines begin with initial capital Westminster Journal 3 March 1745 Slides courtesy Jennifer Batt Started to refine Machine Learning Techniques Jennifer Batt @ the BL on World Poetry Day ‘40,000’ things found… Possibly using Gale Primary Sources interface to see if we can sift this data
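The capitalisation figures above suggest a simple feature for separating verse from prose. The sketch below is only an illustration of that one feature: the 0.7 threshold is a hypothetical value chosen to sit between the 52% and 81% reported on the slide, and the project itself combined this kind of signal with machine learning.

```python
def capitalised_line_ratio(block):
    """Share of non-empty lines whose first character is a capital letter."""
    lines = [line.strip() for line in block.splitlines() if line.strip()]
    if not lines:
        return 0.0
    capitalised = sum(1 for line in lines if line[0].isupper())
    return capitalised / len(lines)


def looks_like_verse(block, threshold=0.7):
    """Flag a text block as candidate verse; threshold is an assumption."""
    return capitalised_line_ratio(block) >= threshold
```

Blocks flagged this way are candidates only; OCR noise and short prose paragraphs will produce false positives that need a human sift.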
  • 65. 65@BL_Labs @Britishlibrary @BL_DigiSchol OCR Challenges and Opportunities Enables search + research at scale across many items Multiple table styles Efficient OCR solution for Bengali Bengal Library Catalogue of Books,1918-1919, SV 4
  • 66. 66@BL_Labs @Britishlibrary @BL_DigiSchol OCR Competition ICDAR (Kyoto, Japan, Nov 2017) PRIMA Research Lab, University of Salford 23 institutions 7 countries (50% India) commercial tech companies + university computer science & engineering depts • 76% character accuracy
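The slide does not say how the competition computed character accuracy. One common definition, assumed here, is one minus the edit distance between the OCR output and a human transcription, divided by the length of the transcription. A minimal sketch:

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def character_accuracy(ground_truth, ocr_output):
    """Accuracy in [0, 1] relative to the ground-truth length (assumed metric)."""
    if not ground_truth:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(ground_truth, ocr_output)
    return max(0.0, 1 - errors / len(ground_truth))
```

Because the strings are compared character by character, the same function works for Bengali or Arabic text as long as both inputs use the same Unicode normalisation.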
  • 67. 67@BL_Labs @Britishlibrary @BL_DigiSchol Transcribing historical Arabic Scientific Manuscripts for OCR research https://fromthepage.com/bldigital/arabic-scientific-manuscripts http://blogs.bl.uk/digital-scholarship/2018/03/arabic-handwrittten-ocr.html
  • 68. 68@BL_Labs @Britishlibrary @BL_DigiSchol Use of Overproof OCR Correction? Re-OCR with ABBYY FineReader? https://www.abbyy.com/en-gb/ http://overproof.projectcomputing.com/ RE-OCR Cleaning up OCR Text – significant improvement (depending on original image quality)
  • 69. 69@BL_Labs @Britishlibrary @BL_DigiSchol Virtual Infrastructure for OCR text OCR text ‘scraped’ from digitised newspapers and put in internal cloud Jupyter notebook Write python code and results in web browser http://jupyter.org Access available for researchers ‘in residence’
  • 71. 71@BL_Labs @Britishlibrary @BL_DigiSchol Worked better for female faces than men’s Press http://mechanicalcurator.tumblr.com Posts image every 30 minutes http://www.flickr.com/photos/britishlibrary/ 1,020,418 images need tagging! Creative uses of images Face recognition Algorithms based on photos Mechanical Curator with an algorithmic brain (Circles, Squares and Slanty etc) http://goo.gl/qPPgxX Snipping out images from 65,000 Digitised Books* >800,000,000* views >17,000,000* tags https://goo.gl/FgZ4HM Work @ BL by Ben O’Steen, Labs and Digital Research Team *Matt Prior - http://goo.gl/j29Tnx Since Dec 2013 Tumblr *Estimates >More demand to see physical items
  • 73. 73@BL_Labs @Britishlibrary @BL_DigiSchol Tagging a million images Iterative Crowdsourcing http://goo.gl/j6fxac Cardiff University’s Lost Visions Project http://www.metadatagames.org/ Metadata Games James Heald Mario Klingemann Chico 45 Use computational methods Human Tagger Top British Library Flickr Commons Taggers 18 hard core taggers How to reward and keep motivated this ‘small group? Average for ‘crowd’ is 1 tag per person What kind of ‘task’ can this ‘crowd’ do? Mobile games for ‘Ships’, ‘Covers’ and ‘Portraits’ Interface for tagging
  • 74. 74@BL_Labs @Britishlibrary @BL_DigiSchol Adam Crymble (2015) Crowdsource Arcade What if crowd sourcing looked like this? http://goo.gl/LBfJ4W http://goo.gl/OH9pOZ https://goo.gl/7z0j8p 30 mins talk Labs Symposium (2015) https://goo.gl/SSRsdd 5 min interview (2015) http://goo.gl/0APpE8 Game Jam Using Arcade Games to help Tag images ‘Art Treachery’ and ‘Tag Attack’
  • 76. 76@BL_Labs @Britishlibrary @BL_DigiSchol Special Jury’s Prize (2015) James Heald – Wikimedia and Map work https://goo.gl/WYZCB2 http://goo.gl/HNQq5e https://goo.gl/VPgffL https://commons.wikimedia.org/ https://goo.gl/djtm1b Labs Symposium (2015) Geotagging maps 50,000 Maps Found in Flickr 1 million Human & Computational Tagging & Community engagement Geo-referencing work https://www.bl.uk/georeferencer
  • 77. 77@BL_Labs @Britishlibrary @BL_DigiSchol SherlockNet: Competition Winner 2016 Karen Wang, Luda Zhao and Brian Do Using Convolutional Neural Networks to Automatically Tag and Caption the British Library Flickr Commons 1 million Image Collection 12 categories >15.5 million tags added >100,000 captions bit.ly/sherlocknet Pooled surrounding OCR text on page from similar images Used Microsoft COCO (photographs) & British Museum Prints and Drawings collections as training sets. Tags Captions
  • 78. 78@BL_Labs @Britishlibrary @BL_DigiSchol Applicability for Digital Humanities • Training image recognition tool on Indian illustrations • Mapping Bengali publishing history through bibliographic xml records • Possibility for NLP on extensive range of genres • Incentivising creative re-use of xml, TIFFs, metadata through competition
  • 79. 79@BL_Labs @Britishlibrary @BL_DigiSchol http://goo.gl/dM8ieA Mario Klingemann (2015) Code Artist / Curator http://goo.gl/bNxGZZ Kris Hoffman (2016) Animation for Fashion Week 2016 https://goo.gl/QilqqT Jiayi Chong 2016 - Animation tool https://www.facebook.com/RealmlandStory/ Paul Rand Pierce 2016 Graphic Novel on Facebook Tragic Looking Women 44 Men who Look 44 (Notice the direction faces) A Hat on the Ground Spells trouble Artistic / Creative Works https://www.youtube.com/watch?v=Q3SBxO34Zlc David Normal 2014 and 2015 Collages/Paintings & Lightboxes
  • 80. 80@BL_Labs @Britishlibrary @BL_DigiSchol Imaginary Cities – BL Labs Project / Exhibition 16-18 (Michael Takeo Magruder) An artistic exploration seeking to create provocative fictional cityscapes for the Information Age from the British Library’s digital collection of historic urban maps
  • 81. 81@BL_Labs @Britishlibrary @BL_DigiSchol Alanna Hilton British Fashion Colleges Council and Teatum Jones
  • 82. 82@BL_Labs @Britishlibrary @BL_DigiSchol Careful of making conclusions based on ‘black box’ software & techniques (e.g. sentiment analysis), learn the assumptions behind them first! Lessons Learned & Challenges… Beware of ‘Black Box’ software…
  • 83. 83@BL_Labs @Britishlibrary @BL_DigiSchol Breaking Black Boxes – Melodee Beals
  • 84. 84@BL_Labs @Britishlibrary @BL_DigiSchol Huge appetite to use digital content & data for anyone’s ideas! (e.g. Flickr Commons stats). Lessons Learned & Challenges… Huge demand for open digital content… https://goo.gl/yQ5s4U
  • 85. 85@BL_Labs @Britishlibrary @BL_DigiSchol Many researchers have the domain knowledge but lack technical / digital skills to use Digital Research methods. Should they be teamed up with those that want to solve problems or get trained? Digital skills training needed for Humanities researchers/ Librarians… https://goo.gl/i5GVfI https://goo.gl/kwcK8J https://software-carpentry.org/ https://librarycarpentry.github.io/ http://www.datacarpentry.org/
  • 86. 86@BL_Labs @Britishlibrary @BL_DigiSchol Labs mindset… 1. Start a conversation, generate positive energy, be nice, have fun and try to support ideas . 2. Start with small experiments, but think big! 3. Fail faster (don’t be afraid) and persevere. 4. Reject perfectionism! Good enough is sometimes…good enough! 5. Celebrate the uses of digital collections, tell the world! https://goo.gl/noASfl
  • 87. 87@BL_Labs @Britishlibrary @BL_DigiSchol Library Labs around the world Meeting in London to share experiences 13-14 September 2018 BL Labs Royal Danish Library Austrian National Library Library of Congress BnF KB DXLab Swedish National Library Norwegian National Library Berlin State Library
  • 88. 88@BL_Labs @Britishlibrary @BL_DigiSchol Hey there Young Sailor! Ling Low 2016 – Hey there Young Sailor https://www.youtube.com/watch?v=bcOP1E5bRE0 VIMEO.COM/SWEETANDLOWFILMS @SWEETNLOWFILMS ON INSTAGRAM @SWEETNLOWLING ON TWITTER The Impatient Sisters Play to fade!
  • 89. 89@BL_Labs @Britishlibrary @BL_DigiSchol Questions? Prompt Question I didn’t understand…. Can you tell me more about… Why did you… I am not sure about… What if… Why didn’t you… What’s the best thing about… What was the worst thing… If you could have your time again, … How did you… I am not sure I agree about… What was the biggest challenge… What was the most successful thing about… Who did…

Editor's Notes

  1. 90 seconds (270 words) Ni hao. My name’s Mahendra Mahey and it’s my great honour to be the keynote speaker for this conference. I really hope you have a wonderful and productive time here and in London. So let’s begin… <CLICK> I manage a project at the British Library called British Library Labs, or ‘BL Labs’ for short. It’s made up of a team of 4 people and we also work occasionally with our Digital Research and Digital Scholarship colleagues. The project has been running for over 4 years and is kindly supported by the Andrew W. Mellon Foundation and the BL. <CLICK> I am going to take you on a journey so that you learn about our experiences of working with the BL’s digital collections. I will identify issues, challenges, problems and solutions we have encountered and look at the impact our work is having on Research Data Management, Digital Humanities, research infrastructure and data literacy amongst researchers and librarians in particular. I will show you how and why we have engaged with a range of people using our data, highlighting their work and findings, present some of the lessons we have learned, and examine the wider impact of the project on the Library and other organisations. <CLICK> A link to download my presentation appears at the bottom of each slide, and for those of you using social media I have also included some relevant tags if you would like to tweet. Hopefully, there will be 5 minutes at the end for questions, and my email address is below in case you are shy or think of something afterwards.
  2. Set up in 2010, the team was formed to focus on the changing research landscape in the digital realm. It is now embedded in collection areas and, as you’ll see later, staff are joining the Library explicitly as part of major digitisation projects. Main activities: working behind the scenes to get content in digital form and online; offering digital research support and guidance; supporting collaborative projects; running events, competitions, and awards.
  3. https://librarycarpentry.github.io/
  4. 3 seconds (10 words) BL Labs encourages researchers, artists, entrepreneurs, educators and anyone else <CLICK>
  5. 18 seconds (55 words) to ‘experiment’ with our digital collections. We’re particularly interested in people whose questions focus on finding and creating NEW things from our large collections of data using digital research methods, especially when manual methods aren’t possible, such as looking for patterns across thousands of digitised books or millions of newspaper pages.
  6. 23 seconds (71 words) Though the project focusses on working and communicating with Digital Humanities and Digital Scholarship researchers, we have also engaged with amazing Artists, Librarians, Curators, Educators, Entrepreneurs, Archivists, Software Developers and other innovators. Hopefully, I will show you <CLICK> some inspirational examples of work they have done using our digital collections. <CLICK> I will also reflect on our experiences, challenges and the lessons we have learned working with some amazing and pioneering people.
  7. 42 seconds (128 words) The Library focuses most of its work and collaborations through its 8-year Living Knowledge vision. Initiated in 2015 and running to 2023, the 50th anniversary of the creation of the Library, our vision is to make our intellectual heritage accessible to everyone, for research, inspiration and enjoyment, and to be the most open, creative and innovative institution of its kind by 2023. The Library’s two core purposes are to build, curate and preserve the UK national collection of published, written and digital content and to support and stimulate research of all kinds. <CLICK> We also support businesses, helping them to innovate and grow, engage everyone with memorable cultural experiences, inspire young people and learners of all ages, and work with international partners around the world to advance knowledge and mutual understanding.
  8. 41 seconds (123 words) We have huge collections of physical items. Here you can see inside the main building in London; it’s the King’s Library, King George the Third and Fourth’s personal library! We currently estimate that our total number of physical items exceeds <CLICK> 180 million, representing every age of written civilisation and every known language. Our archives now contain the earliest surviving printed book in the world, the Diamond Sutra, dating from 868 C.E. We have around 14 million books, only 7% of our collections. We also have around… <CLICK> 60 million patents, 8 million stamps, 4 million maps, 6 million sound recordings, 1.6 million music scores, over 300,000 manuscripts and 800,000 serial titles (which are of course made up of many volumes/editions/issues and series). We employ around 1600 people across all sites.
  9. 6 seconds (20 words) BL Labs focuses on getting people to experiment with its digital collections, things that are already <CLICK> born digital <CLICK> or digitised.
  10. 36 seconds (110 words) <CLICK> In 2013, legal deposit was extended to cover non-print material; consequently we have been collecting UK websites through the UK Web Archive, e-books, e-journals, CDs, DVDs etc. As a result, terabytes of data and billions of items are being archived at the BL every year. <CLICK> We are the headquarters of the Alan Turing Institute for Data Science and BL Labs is an active research partner. <CLICK> We are also part of the Knowledge Quarter London Hub, comprising 80 world-class knowledge-based organisations situated within a 1km radius of the BL, sharing ideas, best practice, meetings and events, e.g. <CLICK> companies such as Google and our sister organisation the British Museum, to name a few.
  11. 24 seconds (72 words) The BL are world-renowned experts in digitising materials from our physical holdings. One common misconception is that much, if not all, of our collections are digitised, so the actual proportion that is digitised surprises many. <CLICK> The figure is around 1-2% of our physical collections. <CLICK> Much of our digitisation activity happens through partnerships with commercial, philanthropic, charitable and foundation partners such as the Qatar Foundation. <CLICK> What is certain is that the amount we are digitising is increasing rapidly. Our new programme, Heritage Made Digital, for example, prioritises collections for digitisation where there is clear researcher demand. <CLICK> One important thing we have learned is that researchers need to take care when doing research based on our digital collections, as they are rarely complete: they have gaps and are not necessarily representative of our physical collections.
  12. 41 seconds (125 words) Our work in Labs has taught us that it always pays for researchers to know the back ‘story’ of a digital collection, especially if they want to use it for research and analysis. <CLICK> There are too many things to consider right now, but a few highlights are ‘are there gaps in the collection?’ and ‘can it still be accessed?’. Perhaps most important of all is whether the curator, or another human being who knows about the collection, is still around to be asked about it. Our experience has told us that so much will probably be in their head and not written down, information that could be vital to know before carrying out research or re-use.
  13. 26 seconds (66 words) So why is so ‘little’ digitised? Simply put, it costs money, time and resources to digitise physical materials to a professional standard. However, even though our digitised collections are a small fraction of our physical ones, combined with our born digital collections they still represent an impressive and sometimes unimaginable amount of data. <CLICK> Currently we have 704 collections, ranging from the UK Web Archive, which includes billions of websites, to a collection of 130 digitised Chinese scroll maps. Some items on this list are confidential and require due diligence/risk management before we can tell the world about them.
  14. 35 seconds (106 words) Further analysis of our digital collections reveals that only 15% (that’s 105 collections) are openly licensed, of which four fifths are available online. <CLICK> 85% of our digital collections are only available onsite. Each month, more collections are being made available under an open access licence through our ‘Access and Re-use’ committee, but this takes time, especially for collections that were digitised before 2012, when we didn’t have such a group. <CLICK> Here’s a breakdown of our digital collections by type. <CLICK> Our digital collections include born digital, e-acquisitions, and of course the results of many digitisation projects funded by public/private partnerships, some of which are still in progress.
  15. 11 seconds (24 words) Giving access to our openly licensed digitised materials is obviously much easier than <CLICK> giving access to digital collections that are only available onsite, for example because they are still within copyright, to name one of many reasons. <CLICK>
  16. 9 seconds (28 words) So, how do we give access to onsite-only Digital Collections at the British Library? (That’s the 85% of our data.) Well, there are further challenges in doing this.
  17. 55 seconds (167 words) <CLICK> Sometimes digital content is only available onsite due to licence restrictions, or even only on a specific computer in a reading room! Technically of course, there are actually very few reasons why digital content can’t be online, though it might be too big or it hasn’t been transferred from the original media it was stored on, such as CD, minidisc or vinyl. <CLICK> Sometimes, access is provided through a paywall. Finally, <CLICK> some content is in the happy sunny place: online, open and freely available to all of humanity. The real reasons why there are challenges to accessing digital content are of course human. They require different approaches from the Library and may often involve an honest, open dialogue and negotiation with the publishers who gave us the content in the first place. The Labs project has tried to address this problem by creating a ‘residency model’ where researchers are security cleared and use hot desks in staff areas or trialling areas in the reading rooms <CLICK> to work intensively with a digital collection on-site, so as not to infringe access conditions.
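The proportions quoted in this note can be checked with a few lines of arithmetic. This is only an illustrative sketch: it assumes the total of 704 collections mentioned in the previous note, and the variable names are made up for the example.

```python
# Assumption: 704 is the total number of digital collections (from the previous note).
total_collections = 704
open_share = 0.15  # "only 15%" are openly licensed

openly_licensed = total_collections * open_share     # about 105.6, quoted as 105
open_and_online = openly_licensed * 4 / 5            # "four fifths are available online"
onsite_only = total_collections * (1 - open_share)   # the remaining 85%

print(int(openly_licensed), round(open_and_online), round(onsite_only))
```

On these assumptions, roughly 84 collections are both openly licensed and online, and close to 600 are available onsite only.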
  18. 2 seconds (5 words) Why are we doing this?
  19. 9 seconds (28 words) We support research, it’s our job! We want to work closely with, and listen to, those who want to use our digital collections and data for their work!
  20. 13 seconds (39 words) We can learn how we are and should be supporting you, and this shapes the problems we work on, such as: Access to digital collections / data? Advice, guidance, technical support, training? Services, Tools and Processes? Many more reasons…
  21. 7 seconds (22 words) Where are the gaps between what you want & what we can give? How do we build the bridges to overcome the gaps?
  22. 10 seconds (30 words) How do we help you ‘navigate’ your way through the (sometimes) ‘maze’ of the Library to what you want to do? It sometimes requires understanding the culture of the organisation.
  23. 28 seconds (85 words) This is what I imagine it feels like for a researcher looking for our physical collections. <CLICK> Everything is on an industrial scale and it can feel overwhelming. It isn’t always straightforward to find our items, as many are not in our digital library catalogue (e.g. they are still on card catalogues), and some items are in the secret and very secure parts of the Library, where you would need very special permission because the items are, for example, extremely valuable and fragile.
  24. 36 seconds (109 words) Our digital offering is perhaps like this. <CLICK> Imagine entering a boutique sweet shop. We have some lovely things to tempt you, but it’s much smaller than the hypermarket you just visited. The shopkeeper tells you there are some things behind the back door in a giant warehouse. However, you will need special access to enter that space. She also states that there are rooms in that warehouse where even she isn’t allowed to look. She isn’t even allowed to share the full list of stock, because there are items on it she may never be able to see because they were meant to be secret.
  25. 33 seconds (99 words) Given these challenges, the Library has to do lots of external engagement to tell people what we have. Every year we have a roadshow around the UK and sometimes we get to go to other places in the world, such as Qatar, thank you Milena. <CLICK> We do this partly to ‘de-bunk’ the myths about the Library. <CLICK> And to show / tell researchers about the reality of our data. <CLICK> What we have learned is that researchers’ project ideas of what they want to do with our digital collections always change once they explore and see the reality of our data.
  26. 49 seconds (148 words) So what kind of conversations do we have with researchers who may want to use our digital collections and data? <CLICK> The dialogue typically can be: ‘Ah, you are ‘lucky’ & we have the exact digital content / data relevant to your research’; informally we call these our ‘lucky dip researchers’. <CLICK> Or the conversation might go like this… ‘Ah, we don’t exactly have what you are looking for, but here is what we do have, is there anything of interest that you like? Let’s talk…’ <CLICK> We have learned that engagement can be hard work. But it’s constantly required to maintain interest in our digital collections, because they aren’t all instantly discoverable on search engines. <CLICK> We also tend to attract researchers with ‘fuzzier’ and ‘flexible’ research boundaries and those who are possibly open to more interdisciplinary / collaborative research. <CLICK> Finally, we have found that artists find this dialogue easier.
  27. 12 seconds (37 words) In another way, we are trying to match our audiences’ research needs and digital interests <CLICK> with the digital collections we have. <CLICK> It is at this intersection where Labs works best, and it usually starts with a conversation.
  28. 24 seconds (72 words) Let’s look a little further at the types of interactions we have with our researchers. We have summarised these phases as ‘Exploration’ where people often ‘rethink’ their ideas of what they want to do with the data, ‘Query-Focused’ where they often have to iterate to come up with a realistic proposal of what they want to do and a ‘Wrap-up’ phase to end their project with us, if it is relevant.
  29. 26 seconds (78 words) The ‘exploration’ phase allows the researcher to understand the data in an open-ended fashion, discover potential tools to work with the data, gain awareness of their own capabilities and limitations and develop a firmer research query, gauging costs, resources, risks and the realistic time needed to complete the project. <CLICK> The outputs of this exploration are not necessarily intended to be shareable, beyond personal experience and identifying key features of their enquiry (data size, formats, tool successes, etc.).
  30. 43 seconds (129 words) The ‘query-focussed’ phase allows the researcher to develop a firmer and more informed query where: suitable datasets are already lined up; there is a good idea of the initial toolset and capabilities required, that is, human and technical requirements; the project outputs are outlined, and relevant reuse applications are begun; and there are clear agreements on what happens at the end of the project (data deletion, virtual machine deletion/archiving, etc.). The project may iterate on initial ideas, depending on the researcher’s costs and their and the BL’s appetite for risk. <CLICK> This phase may typically be supported by the Library through our new Digital Research Support service, where researchers can get up to 5 days of support to further develop their project ideas. More about this later.
  31. 33 seconds (99 words) Finally, when working on projects it’s important that there is a wrap-up phase. Here, the Library may give back the researcher’s work (such as code and notes) through an export from BL-hosted tools (especially for those that are onsite). Also, all derivative data is licensed or retained based on reuse agreements (such as those of our Access & Reuse board, etc.). Provisions are made for the project to be wound down, as agreed (for example, derivative data is deleted after a grace period, or hosted by the Library if requested by the researcher and appropriate for further re-use by others).
  32. 76 seconds (228 words) So let’s have a very brief overview of our digital collections, datasets and derived data. <CLICK> We have thousands of playbills from theatres, cuttings from magazines, books and millions of newspaper pages digitised, including their Optical Character Recognition (OCR) text. <CLICK> We have been using external platforms such as Flickr and Wikimedia Commons to host our digital collections, because this is often a more effective way to make them more visible on the internet. We have of course been helping develop the Qatar Digital Library, making digitised manuscripts from the Middle East available to all. The International Dunhuang Project makes digitised manuscripts from China available. The Polonsky Foundation is helping us make Hebrew manuscripts accessible, and we have thousands of geo-referenced historic maps as well as an online crowdsourcing geo-referencer tool. <CLICK> We are making millions of library records available from UK and Irish national library catalogues through our British National Bibliography service. <CLICK> We can provide usage data from our readers. EThOS holds all UK PhDs, either born digital or digitised, and, as previously mentioned, there is the UK Web Archive. <CLICK> We have been recording English language TV news broadcasts since 2010 and archiving historic and current UK radio programmes. <CLICK> We have derived data from the Digital Music Lab project, which analysed world and traditional music to look for similarities across countries, as well as digitised sheet music and digitised environmental sounds, music and oral history.
  33. IIIF is a standard for serving and displaying high-quality images. IIIF community of practice: the BL is a member of the IIIF consortium, developing compatible software that is easy to install and provides a great user experience. Image API: image data. Presentation API: title, structure (TOC), sequence (IIIF JSON-LD manifest). Developments: BL project for a Universal Player, http://accesssit.ad.bl.uk/item/viewer/ark:/81055/vdc_0000000000C0 User actions will depend on copyright status, location and permission (BL/licence). Single interface for manuscripts, printed books, born digital + sound. Calls JPEG2000s stored in the Digital Library System and metadata stored elsewhere.
  34. 56 seconds (169 words) Despite our digital collections being a small fraction of our physical holdings and over 85% only being available onsite, here are some ways you can find out about our openly licensed cultural heritage collections. <CLICK> First, on the Labs website we have created a guide pointing to over 100 digital collections. Then, as of today, curators have created nearly 200 collection guides by subject, each one having a section on what is available digitally onsite and online, if relevant. <CLICK> As part of the Labs project and the overall data strategy for the Library we have created a data service, ‘data.bl.uk’, where users can download over 100 datasets. Importantly, it provides the ability to download entire collections instead of single items. Each collection is treated as a dataset with its own citable Digital Object Identifier (DOI) for replicable research purposes. The site also includes derived data from experiments that have been carried out on our digital collections. <CLICK> Please note that not all of these datasets are discoverable on all search engines.
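To make the Image API mentioned in this note more concrete, here is a minimal sketch of how an IIIF Image API request URL is assembled from its path segments (identifier, region, size, rotation, quality, format). The server address and the identifier are hypothetical placeholders, not BL endpoints, and the default `size="max"` assumes a server that implements version 2.1 or later of the Image API.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API URL:
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # Identifiers such as ARKs contain ':' and '/', which must be percent-encoded.
    encoded_id = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{encoded_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Hypothetical server and identifier, for illustration only.
BASE = "https://iiif.example.org/images"
ITEM = "ark:/12345/example_0001"

full_image = iiif_image_url(BASE, ITEM)
# A 500-pixel-wide rendering of the region at x=0, y=0, 1000 wide by 800 high.
detail = iiif_image_url(BASE, ITEM, region="0,0,1000,800", size="500,")
print(full_image)
print(detail)
```

Because every crop and size is addressable by URL, a viewer can request only the part of a large image it needs, which is one reason the standard suits very high-resolution scans.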
  35. 40 seconds (124 words) Because of limited time and resources in Labs, we didn’t spend much of either cleaning the data before putting it on data.bl.uk. We embraced ‘dirty data’ somewhat. Our data therefore comes with a health warning, especially for those of you who would like to carry out computational research on cultural heritage data, as it tends to be pretty messy for computers to make sense of. The problem is that computers think U.S., U. S. , U.S.A., U. S. A. , United States, United States of America are six different places. Fields also contain things like internal notes about potential duplicates and unexpected extra information (notes on what type of location, etc.). There are lots of inconsistencies, with uncertainty and date ranges expressed in different ways.
  36. 35 seconds (107 words) OpenRefine is an amazing tool which we have been using to clean up data. It will suggest ways to make the data more consistent, for example. You can then export the data and keep working on it with other tools, or put it back into OpenRefine. Because it runs locally it can be used for sensitive data you mightn't put online. One issue is that libraries tend to use question marks to record uncertainty in attribution, but Refine's clustering can strip out punctuation, so you have to be careful about preserving things like that (if that's what you want). It also takes in various data formats.
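The six spellings of the United States mentioned in this note can be collapsed with a small normalisation step. The sketch below is one possible approach rather than anything the Labs team describes using; the function name and the variant table are assumptions made for the example.

```python
import re

# Assumption: a hand-made table of known variants, keyed after punctuation removal.
US_VARIANTS = {"us", "usa", "united states", "united states of america"}


def normalise_place(raw):
    """Map known spellings of the United States to one canonical label."""
    key = re.sub(r"[^\w\s]", "", raw)               # drop full stops and other punctuation
    key = re.sub(r"\s+", " ", key).strip().lower()  # collapse whitespace, lowercase
    compact = key.replace(" ", "")                  # "u s a" -> "usa"
    if key in US_VARIANTS or compact in US_VARIANTS:
        return "United States"
    return raw.strip()                              # leave anything unrecognised alone


raw_values = ["U.S.", "U. S. ", "U.S.A.", "U. S. A. ", "United States",
              "United States of America"]
cleaned = {normalise_place(v) for v in raw_values}
print(cleaned)
```

Values that are not in the table are returned unchanged (apart from trimming), so a cautious first pass like this should not silently rewrite places it does not recognise.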
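The punctuation issue in this note is easiest to see with a simplified imitation of key-collision clustering. The function below only approximates the idea behind a "fingerprint" key (trim, lowercase, strip punctuation, fold to ASCII, sort unique tokens); it is not OpenRefine's own code, and the sample names are invented for the example.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Simplified clustering key: case, punctuation, accents and word order are ignored."""
    text = value.strip().lower()
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values that share the same fingerprint key."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return dict(groups)


# Invented sample data: the trailing '?' records an uncertain attribution.
names = ["Dickens, Charles", "Charles Dickens", "Dickens, Charles?"]
clusters = cluster(names)
print(clusters)
```

All three values land in one cluster because the key ignores the question mark, so merging the cluster into a single value would lose the record of uncertainty unless the `?` variant is kept deliberately.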
  37. 39 Seconds (117 words) We have been learning that characterising our data is a really valuable way for researchers to begin to understand what we have. Though this is pretty resource intensive, we have carried out some simple experiments. &amp;lt;CLICK&amp;gt; Here, you can see that an analysis of our catalogue data reveals the use of different versions of the Dewey Decimal System across the years.&amp;lt;CLICK&amp;gt; Secondly, in the left column you can see what looks like random data/noise. However, when grouped, we can see the dark blue visualisation indicates there is some similarity in the data, in this case it was subtitles from digitised TV broadcasts.&amp;lt;CLICK&amp;gt; We know this is something we should do more of, if we had more resources.
  38. Research Question: Brought together for the first time the world&amp;apos;s biggest datasets about published sheet music, music manuscripts and classical concerts (in excess of 5 million records) for statistical analysis, manipulation and visualisation. Aim was to unlock musical-bibliographical data held by libraries in order to create new research opportunities. The project cleaned and enhanced aspects of the British Library catalogues of printed and manuscript music, which are now available as open data from www.bl.uk/bibliographic/download.html and piloted big data research techniques on these and five other datasets. Source Collections: Data from seven existing databases and catalogues were used as the basis of this project: the British Library&amp;apos;s catalogues of printed and manuscript music; the bibliographies created by Répertoire International des Sources Musicales (RISM) that list European music printed 1500-1800 and music manuscripts in European libraries; and the RISM UK Music Manuscripts Database and the Concert Programmes Project database. Digital/Computational Techniques: Data wrangling using Open Refine and MARCedit. Data visualisation using: Google Fusion Tables and PalladioProject slides: http://www.slideshare.net/historyspot/ihr-big-data-history-of-music-9-june15 Outcome: Analyses and visualisations of these datasets exposed previously uncharted patterns in the history of music, for instance involving the rise and fall of music printing in 16th- and 17th-century Europe (huge dips in output in Venice were down to plague and war!), or the rise of nationalist colourings in music of the late 18th and early 19th centuries. The detection of these long-term trends permits new ways of linking music history to wider histories of culture, economics, society and politics
  39. 25 seconds (77 words) We are also learning that our digital collection will need significant curation to make them more accessible and re-usuable to researchers. At the moment, a definition of a collection really comes from the efforts of digitising it as a digitisation project. This can be meaningless to a researcher. We believe a new role is emerging for researchers, perhaps libraries to develop roles which enable the ability to characterise, manage and curate data for meaningful research by scholars.
  40. 43 seconds (129 words) The BL has been active in providing research services for data for many years. &amp;lt;CLICK&amp;gt; Opportunities for Data Exchange (ODE) looked at the ways in which data centres, publishers, libraries and researchers encourage better data citation.&amp;lt;CLICK&amp;gt; The DataCite service enables researchers to obtain credit and recognition for sharing their research data, built on digital object identifiers (DOIs)&amp;lt;CLICK&amp;gt; ODIN is built on the Open Researcher &amp; Contributor ID Initiative (ORCID) and DataCite to uniquely identify scientists and data sets.&amp;lt;CLICK&amp;gt; The Unlocking Thesis Data project promoted the use of persistent identifiers for theses, their underlying data and their authors.&amp;lt;CLICK&amp;gt; THOR (Technical and Human Infrastructure for Open Research) established integration between articles, data, and researchers across the research lifecycle. &amp;lt;CLICK&amp;gt; A new project called FREYA will simplify the links between people, research outputs and funding.
  41. 40 seconds (121 words) Our updated data strategy sees research data as integral to our collections, research and services as text is today. The strategy is structured around 4 central themes.&amp;lt;CLICK&amp;gt; Data Management involves the creation of a data management plans and processes to meet our obligations under funding council requirements.&amp;lt;CLICK&amp;gt; Data creation of datasets derived from our collections, and supporting those who want to so the same.&amp;lt;CLICK&amp;gt; Datasets collected and created by the Library will be archived and preserved in line with its other collection policies.&amp;lt;CLICK&amp;gt; The Library ensures that there is appropriate discovery, access and reuse of the datasets it holds, as well as those available from third parties.&amp;lt;CLICK&amp;gt; A useful email address and websites are displayed should you want to make further investigations.
  42. 6 Seconds (19 Words) ‘how’ do we try and engage those who might be interested in the BL’s digital collections and data?
  43. 75 seconds (225 words) Here are the kinds digital research methods our digital scholars are using.&amp;lt;CLICK&amp;gt; For example, searching for items based on and time and location can reveal very interesting patterns, e.g. when and where works were published. Geotagging digitised objects, putting them in space can add new dimensions to the kinds of research questions we might want to ask. &amp;lt;CLICK&amp;gt; Corpus analysis of text in language and Text mining are methods which can find patterns in text through computational analysis.&amp;lt;CLICK&amp;gt; Tasks that require humans to use technology to complete a task that computers would hard fall under the area of Crowdsourcing and Human Computation&amp;lt;CLICK&amp;gt; Annotation involves augmenting an item with additional information, usually text.&amp;lt;CLICK&amp;gt; Similarly transcribing can be the conversion of speech into text through human or computing power to then be used for further analysis. &amp;lt;CLICK&amp;gt; Providing Application Programming Interfaces or APIs to data can be very powerful ways for computational access to datasets, used by software developers to build software applications for example. &amp;lt;CLICK&amp;gt; Many researchers want to see the patterns that are emerging in large amounts of data and are now using a number of very powerful tools to visualise them to see patterns. &amp;lt;CLICK&amp;gt; What is clear is that digital methods are much more that searching for an individual item in a catalogue and Libraries, publishers, service and content providers have to change to support that.
  44. 63 seconds (191 words) So how do we engage people to use our digital collections and data?&amp;lt;CLICK&amp;gt; Between 2013-2016 we ran an international competition, where we asked people to come up with project ideas. We chose two and then worked with them for 4-6 months and showed the results to the world at our annual symposium in November.&amp;lt;CLICK&amp;gt; This has now been replaced by a new service where we will provide up to 5 days Digital Research Support to develop researcher’s project ideas of what they want to do with the Library’s digital collections.&amp;lt;CLICK&amp;gt; Our international annual Awards, recognise work already done with our digital collections in Research, Artistic, Commercial and Educational categories. We also try to recognise and celebrate the superb work our own staff do with our digital collections through our annual BL Labs Staff Awards.&amp;lt;CLICK&amp;gt; Finally, talk to us about working on collaborative projects, which tend to be aligned to our overall Library vision wherever possible.&amp;lt;CLICK&amp;gt; We also do a lot of external engagement, running annual roadshows, events, providing online materials, video interviews, case studies, arranging face to face meetings in London or virtually.&amp;lt;CLICK&amp;gt; More information is available via our website.
  45. 21 Seconds (65 Words) Katrina Navickas was particularly interested in the &amp;lt;Click&amp;gt;Chartist Movement who were a group who were campaigning for the vote for working people. &amp;lt;Click&amp;gt;They were the biggest popular movement for democracy in 19th century British history, just as this is early picture shows a huge monster meeting at Kennington Common&amp;lt;Click&amp;gt;She wanted to use a combination of manual and computational methods to explore our Digitised Newspapers to find out when and where they met and plot them on map. &amp;lt;Click&amp;gt;and hopefully unearthing new history.
  46. The search for an OCR solution to this material is an important one to eventually increasing the possibilities for digital research. Simply put, OCR would enable full-text searching, both when exploring the material through the online exhibition, and mining the text of the file formats that we will also make available. The main challenge we have faced is finding OCR software that handles our material well. You can see an image of a typical page of one of the Quarterly Lists. In itself it presents notable OCR challenges with the data being arranged in tables. So to create an accurate representation of this page and all others we need the OCR to as accurately as possible recreate the layout, so that the contents are properly described and titles and associated metadata is broken out and can be recognised and parsed by a computational programme such as Python which is the bedrock of digital research. But this is a challenge considering the layout of the original, with text often running across the columns and in fact there are quite a few styles of tables. The third challenge are the books themselves. OCR for Bengali and indeed lots of other non-Latin scripts is gaining attention and people are doing great work to overcome this challenge, but it is still not as accomplished as OCR for western languages. These are the tools we have experimented with. Abby FineReader 12 – industry standard, market leader. Although it caters to more and more South Asian languages, in our experience it doesn’t handle Bengali. However we have worked with the IMPACT consortia to test a sample of our Quarterly Lists. IMPACT is the Improving Access to Text EU funded project made up of 26 European libraries with the focus of developing OCR technology. They used the Abby SDK on our Quarterly Lists. We are also aware that the Abby Recognition server can produce ALTO XML. 
Google – They have OCR’d Bengali materials before, to what they claim to be a high standard, with positive feedback from Bengali readers. We have been in talks about running our scans through their tool, which is an upgrade to the cloud vision server. We are awaiting results of how their system deals with a sample of our Bengali texts which reflect the diversity of text layout/variation in the books. Transkribus – Set up for handwritten texts, but we have been told it works with Bengali. However, it may not deal well with the layout of the Quarterly Lists until layout improvements are introduced in early 2017. Tesseract – This exciting piece of code is the command I used for running one of our Bengali TIFF scans through Tesseract. After evaluating the results we found the OCR output to be quite accurate, although it didn’t pick up on some conjuncts which would be prevalent throughout the rest of the texts. We’ve since run through the rest of our test batch. If the errors can be categorised, we may be able to correct them through post-processing, perhaps through OverProof.
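The command shown on the slide is not reproduced in these notes. As a rough sketch of what such a run might look like, the Python below wraps the Tesseract command line. It assumes the `tesseract` binary and its Bengali language data (`ben`) are installed; the file names and both helper functions are hypothetical, not the ones used in the project.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, lang="ben"):
    """Build the argument list for a basic Tesseract CLI run.

    The CLI takes the input image, an output base name (Tesseract adds
    the file extension itself) and -l to select the language model.
    """
    return ["tesseract", image_path, output_base, "-l", lang]


def run_ocr(image_path, output_base, lang="ben"):
    """Run Tesseract if it is installed; return the text file path or None."""
    if shutil.which("tesseract") is None:
        return None  # Tesseract is not on the PATH, so nothing is run
    command = build_tesseract_command(image_path, output_base, lang)
    subprocess.run(command, check=True)
    return output_base + ".txt"


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_tesseract_command("quarterly_list_page.tif",
                                           "quarterly_list_page")))
```

Keeping the command construction separate from the call that runs it makes it easy to loop over a whole test batch of scans and to log exactly which command produced each output file.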
  47. International Conference on Document Analysis and Recognition. Pattern Recognition and Image Analysis. Evaluating the state-of-the-art in OCR of Bengali print. Since the competition we have been working with the Indian Institute of Technology Hyderabad, who claim 98% character accuracy when OCR’ing Bengali texts. (show video of ground truth creation, or use background image of XML code)
  48. X Seconds (X Words)
  49. 970 files from a selection of 19th century newspaper titles from the BL corpus for us to correct using the OverProof post-OCR correction software. The best way to measure the improvement made by the correction process is to compare the OCR'ed text and the automatically corrected text with a perfect correction made by a human (known as the "ground truth"). Hannah-Rose's 5 small human-corrected samples are shown as green dots. These are not only smaller than the other files, but their raw error rate is much lower, at 13.3%. OverProof was measured as reducing this to 5.4%, a removal of almost 60% of errors. The red dotted line indicates the correction "break-even" point: the further under the line, the better the quality of the document after correction. In the graph below, the grey line shows the distribution of files across error rates before correction and the green line after correction.
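The "almost 60%" figure follows from the two rates: (13.3 − 5.4) / 13.3 ≈ 0.594. The sketch below illustrates both steps of such a measurement. It assumes an edit-distance-based error rate against the ground truth; the notes do not say exactly how OverProof's figures were computed, and the function names are illustrative.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of words)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def error_rate(ocr_output, ground_truth):
    """Edits needed to reach the ground truth, per ground-truth item."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ocr_output, ground_truth) / len(ground_truth)


def relative_error_reduction(rate_before, rate_after):
    """Share of the original errors removed by correction."""
    return (rate_before - rate_after) / rate_before


if __name__ == "__main__":
    # Rates quoted in the notes: 13.3% before correction, 5.4% after.
    print(round(relative_error_reduction(13.3, 5.4), 3))
```

Passing strings gives a character-level rate, while passing lists of words (for example `text.split()`) gives a word-level rate, so the same functions cover both common ways of scoring OCR.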
  50. Posts small illustrations taken almost at random from the digitised book corpus to a Tumblr blog. This experiment with undirected engagement was a by-product of work to uncover the hidden wealth of illustrations within the digitised pages.
  51. 27 Seconds (82 Words) Adam Crymble <Click>wanted to harness the power of playing fun games on arcade machines to help with crowdsourcing the tagging of un-described images. He particularly wanted to engage a younger audience in crowdsourcing. <Click>On the right you can see a replica 1980s arcade machine we built <Click>and on the bottom left some tagging games that were developed through a ‘Games Jam’ for the machine. <Click>Let’s take a closer look at two of the games…<Click>
  52. 18 Seconds (56 Words) Indexing the BL 1 million & Mapping the Maps was led by James Heald in collaboration with others. <Click>They produced an index of 1 million 'Mechanical Curator collection' images on <Click>Wikimedia Commons from a collection of largely un-described images. <Click>This gave rise to finding 50,000 maps within the collection, partially through a map-tag-a-thon. <Click>These are now being geo-referenced. <Click>
  53. (show Giles’ image of the image recognition tool results) Training image recognition tool on Indian illustrations: we are working with Oxford University to train their image recognition software on the pages we have digitised that contain illustrations. This helps the development of their algorithms but also helps classify our images with tagged information. We could form an online gallery and potentially harness the interest of users to tag the images with descriptive metadata. As you can see from the attached, it met with some success, both in finding the exact same block and an obvious copy. I’d now like to try it out using our VIC tool, which tries to find the same content within illustrations. It would also be good to do some benchmarking - this was just a casual test. Mapping Bengali publishing history through bibliographic XML records: we’ve focussed on a subset of the OCR for the bibliographic catalogues. We want to plot the locations of printers and publishers around Kolkata to show the growth of publishing output over the course of the 19th century in Bengal. We have data on the number of books printed and the price they sold for, so some very interesting comparative data. Thousands of pages of multi-topical content to mine: sentiment analysis, topic modelling. Incentivising creative re-use of XML, TIFFs and metadata through competition: this could be an artistic output based on the collection of images, data visualisation using the metadata, or text analysis. Since publicly announcing that the lists will be made available we've received lots of requests from interested researchers, and we're sure there are many more who are involved with the study of book history and publishing in India who may want to create visualisations or mine the content to identify new trends, with data that just hasn't been this accessible before.
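As a rough illustration of how growth in publishing output could be derived from bibliographic XML records before plotting, the sketch below counts titles per place and decade. The record schema (`record`, `place`, `year`) and the sample values are invented for this example; they are not the structure or contents of the Quarterly Lists data.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample records in a hypothetical schema, for illustration only.
SAMPLE_XML = """
<records>
  <record><place>Place A</place><year>1867</year></record>
  <record><place>Place A</place><year>1869</year></record>
  <record><place>Place B</place><year>1871</year></record>
  <record><place>Place A</place><year>n.d.</year></record>
</records>
"""


def titles_per_place_and_decade(xml_text):
    """Count records per (place, decade), skipping undated or unplaced ones."""
    counts = Counter()
    root = ET.fromstring(xml_text.strip())
    for record in root.iter("record"):
        place = (record.findtext("place") or "").strip()
        year_text = (record.findtext("year") or "").strip()
        if not place or not year_text.isdigit():
            continue  # no usable place or year, so the record cannot be plotted
        decade = int(year_text) // 10 * 10
        counts[(place, decade)] += 1
    return counts


if __name__ == "__main__":
    for (place, decade), total in sorted(titles_per_place_and_decade(SAMPLE_XML).items()):
        print(place, decade, total)
```

Counts keyed by place and decade can then be joined to coordinates for each printer or publisher location and drawn on a map, one layer per decade.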
  54. 22 seconds (67 words) We are learning that only a small group of the researchers that Labs is working with possess the digital skills to use digital research methods. Many lack them, including library staff. <CLICK> Should they be teamed up with those who have those skills, such as computer scientists, or should there be a focus on training, such as Software, Library and Data Carpentry courses for librarians and budding Digital Humanists?
  55. 15 seconds (47 Words) Start a conversation, generate positive energy, be nice, have fun and try to support ideas. <CLICK> Start with small experiments, but think big! <CLICK> Fail faster (don’t be afraid) and persevere. <CLICK> Reject perfectionism! Good enough is sometimes…good enough! <CLICK> Celebrate the uses of digital collections, tell the world!
  56. 23 Seconds (70 Words) Many national libraries around the world are now developing Labs; we hope some of their work has been inspired by us. We are holding a meeting in London in September to share experiences and learn from each other. It would be wonderful if the QNL could come along; of course you are invited, as are others who might be interested. <CLICK> Here is a snapshot of some of the libraries attending.