67
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Flickr Commons (100 + GLAMs as of 25/09/18)
68
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Getting an account on Flickr
•Get a Flickr / Yahoo account
(https://login.ya...
69
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
British Library Flickr Commons
Why Flickr Commons?
• Free!
• Each image has ...
70
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Using British Library Flickr Commons
•How do we find things in this collecti...
71
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
How is Flickr Commons Organised?
• Photostream
• Albums
• Faves
• Galleries
...
72
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Flickr Photostream
https://www.flickr.com/photos/britishlibrary/
Kind of the...
73
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Flickr Albums
Curated by the British Library – specifically Nora McGregor
Sh...
74
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Flickr Faves
Most favorited image first in descending order
To favourite an ...
75
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Flickr Galleries
More useful if you have an account
You can create a Gallery...
76
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Flickr Groups
Community based – for sharing and discussing images
We might c...
77
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Adding Tags in Flickr
Be the next ‘Chico45’!
78
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Get Tags!
79
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Searching within the collection!
80
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
The Anatomy of a BL Flickr Record
Download
high res
300dpi image
81
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
82
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
When you log in to Flickr Commons
83
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
84
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Opportunities
– increasing traffic to Library services
You can purchase
a ‘H...
85
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Refers to the
Physical Copy of
the Item
86
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
87
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
88
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Physical and Digital Copy
Number relates to Physical Copy
89
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
90
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
91
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
92
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
93
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
94
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
95
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
96
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
You can’t beat the Physical Copy!
97
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Now for the Digital Copy!
98
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
99
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
100
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
101
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Warning – can be large file!
It’s aPDF
You can do Ctrl F in it to find text...
102
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
103
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Page numbers don’t always correspond!
Page numbers
Don’t always correspond
...
104
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
105
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Plain Text from Books?
Not working
But can be obtained from https://data.bl...
106
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
All illustrations in book / books in same year!
All the illustrations in th...
107
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Views and Favourites
108
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Galleries
•Personal Galleries which you can share.
109
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Exchangeable Image File Information!
For Geeks only!
110
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Tags!
111
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Tagging a million images
Iterative Crowdsourcing
http://goo.gl/j6fxac
Cardi...
112
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Adding Tags!
•You have to have an account to add tags!
•Could you be the ne...
113
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Generated from book
Description
Generated from user
114
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Generated by Flickr
115
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Flickr Commons API
https://www.flickr.com/services/api/
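A minimal sketch of how a request URL for the Flickr API might be assembled in Python. It assumes the REST endpoint and the `flickr.photos.search` method described in the Flickr API documentation linked on this slide. `YOUR_API_KEY` and `BL_USER_NSID` are placeholders rather than real values, the helper name `build_search_url` is hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode, urlparse, parse_qs

FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def build_search_url(api_key, user_id, tags, per_page=10):
    """Build (but do not send) a flickr.photos.search request URL."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,          # placeholder: obtain your own key
        "user_id": user_id,          # placeholder: the account to search within
        "tags": ",".join(tags),      # comma-separated list of tags
        "per_page": per_page,
        "format": "json",
        "nojsoncallback": 1,         # ask for plain JSON rather than JSONP
    }
    return FLICKR_REST_ENDPOINT + "?" + urlencode(params)


url = build_search_url("YOUR_API_KEY", "BL_USER_NSID", ["map", "london"])
print(url)
```

The URL could then be fetched with any HTTP client once a real API key and account identifier are supplied.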
116
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Generated by SherlockNet!
bit.ly/sherlocknet
117
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Sherlocknet has a search interface!
118
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
SherlockNet Search for ‘people’
119
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Advanced Search in SherlockNet!
Tags Available for Download
120
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
19th Century Books Metadata
• 1,9 Million records of 19th Century Books
• U...
121
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Using the Wikimedia Synoptic Index
• Created to help find all the maps in t...
122
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Google Fusion Table
• https://fusiontables.google.com/DataSource?docid=1BMm...
123
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Geodata
flickr_geodata.csv
124
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Alston Index
Internal Document
55-602 - Topical Index
603 - 925 - Pressmark...
125
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Alston Index
• Internal document (not to be externally shared)
• Published ...
126
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Topical Index
OCR problems – Re-do? Manually correct?
127
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Augment Library Catalogue?
128
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Libcrowds – In the Spotlight
https://www.libcrowds.com/collection/playbills...
129
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Libcrowds – Spotlight - Data
https://www.libcrowds.com/collection/playbills...
130
@BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk
Data Journey
• Choose one or two datasets maximum
• Explore the collection ...
A hands-on data exploration & challenge to become a derived data-set author on the British Library’s open data-set platform (https://data.bl.uk)

Mahendra Mahey, manager of British Library Labs (BL Labs), will examine some of the BL’s digital collections and data and discuss the challenges he has had in making the BL's cultural heritage data available openly or onsite at the British Library.

Mahendra will invite delegates to explore data-sets at their leisure and will set a challenge for those who are interested and skilled in exploring data, finding patterns and grouping data. They could become authors/creators of derived data-sets, based on pre-existing digital collections/data provided on the day or already available on https://data.bl.uk.

The workshop will conclude with reflections from the delegates and may highlight a number of derived data-sets generated by participants on the day that could potentially be published on https://data.bl.uk. If selected, these new derived data-sets will be attributed with the creators' / authors' details and each will have its own citable Digital Object Identifier (DOI). These new data-sets would then be available for reuse by any researcher in the world.

GUIDANCE FOR THIS WORKSHOP

We strongly recommend you come to this workshop with an appropriate device, such as a laptop, pre-installed with appropriate tools to analyse different kinds of data-sets, e.g. Microsoft Excel may work with smaller data-sets such as metadata (see other data exploration tools below). If you don't have one and would still like to attend, please ask to 'pair up' with someone who has already signed up and is willing to share.

Other data exploration tools include: Notepad++ (e.g. for viewing text and XML); OpenRefine (e.g. for cleaning data); Tableau Public (e.g. for visualising data); Google Fusion Tables (e.g. for visualising geo-spatial data); spaCy (e.g. for text and data mining); RStudio (an open source statistical package); MATLAB (a data analysis tool) and NLTK (natural language processing).

Please note that this workshop is NOT about training you to use any of these tools; they are simply tools you may already be familiar with and can use to explore and find patterns in our data.

Datatypes you may be examining in this workshop could include: .ZIP, .PDF, .TXT, .CSV, .TSV, .XLS, .XLSX, RDF, .nt, XML (TEI, ALTO and bespoke), .JSON, .JPG, .JPEG, .TIFF and .WARC.

Please ensure you are able to read these files on your device before the workshop if you are interested in exploring them during our session.

Slides for session: http://goo.gl/

URL for specific data: http://

Mahendra Mahey tweets at @BL_Labs & @mahendra_mahey

Published in: Education

  1. @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk http://www.bl.uk/projects/british-library-labs Funded by the Andrew W. Mellon Foundation. Running since March 2013. A hands-on data exploration & challenge to become a derived data-set author on the British Library’s open data-set platform (https://data.bl.uk). Mahendra Mahey, Manager of BL Labs, British Library, London, UK. 1400 – 1530, Tuesday 25 September 2018. Workshop part of ‘Making Connections’, Digital Humanities Australasia, 2018 (#DHA2018), University of South Australia, City West campus, Adelaide, SA, Australia
  2. Who do we work with? Researchers https://goo.gl/WutNyi Artists http://goo.gl/nNKhQ2 Librarians Curators https://goo.gl/9NWZUW Software Developers https://goo.gl/7QQ5Tf Archivists https://goo.gl/x7b4tg Educators https://goo.gl/qh01Mi Working and Communicating Entrepreneurs https://goo.gl/Fx8RG7
  3. Competition Awards Projects Tell us your ideas of what to do with our digital content (2013-16) Show us what you have already done with our digital content in research, artistic, commercial, learning and teaching, staff categories Talk to us about working on collaborative projects Tell us your ideas of what to do with our digital content Engagement • Roadshows • Events • Meetings • Conversations New! Digital Research Support How?
  4. Collections – not just books! > 180* million items; > 0.8* m serial titles; > 8* m stamps; > 14* m books; > 6* m sound recordings; > 4* m maps; > 1.6* m musical scores; > 0.3* m manuscripts; > 60* m patents. King’s Library. *Estimates
  5. Have you got X? https://upload.wikimedia.org/wikipedia/commons/5/50/Real_wuerzburg.jpg Looking for Physical Content in the British Library
  6. #bldigital 3%* digitised * estimate Digital Partnerships Commercial & Other Organisations Bias in digitisation http://goo.gl/bR9UJL Sample Generator 15%* Openly Licensed – most online 85%* Available onsite only at the moment Digitisation / Curating Born Digital costs money, time, resources http://www.turing.ac.uk Digital increasing rapidly Born Digital http://www.webarchive.org.uk/ukwa/
  7. Have you got X digitised / in digital form? http://www.yorkmix.com/wp-content/uploads/2014/04/mr-simms-sweet-shoppe-york.jpg Looking for Digitised / Digital Content in the BL
  8. Our Audience and Collections Audience research & Digital interests Digital collections we have This is where Labs works It starts with making connections! The theme of DHA2018
  9. Finding Open Cultural Heritage Datasets
      Collection Guides (219 as of 25/09/2018) https://www.bl.uk/collection-guides/
      • Datasets about our collections: Bibliographic datasets relating to our published and archival holdings
      • Datasets for content mining: Content suitable for use in text and data mining research
      • Datasets for image analysis: Image collections suitable for large-scale image-analysis-based research
      • Datasets from UK Web Archive: Data and API services available for accessing UK Web Archive
      • Digital mapping: Geospatial data, cartographic applications, digital aerial photography and scanned historic map materials
      https://data.bl.uk
      • Download collections as zips, no API
      • Each dataset has a Digital Object Identifier (DOI) that can be referenced for research
      • Not all discoverable via search engines!
  10. Explore Our Data at http://data.bl.uk!
      • CSV of Metadata https://data.bl.uk/digbks/dig19cbooks-mdata-csv.csv
      • 19th Century Books - Book Metadata - 01/09/2013. https://data.bl.uk/digbks/db21.html
      • Digitised Books - Flickr Tag History - Dec 2013 to March 2016. TSV https://data.bl.uk/digbks/db15.html
      • Digitised Hebrew Manuscripts - Metadata https://data.bl.uk/hebrewmanuscripts/heb1.html
      • Digitised Hebrew Manuscripts: Or 2210 - Or 2364 https://data.bl.uk/hebrewmanuscripts/heb8.html
      • Theatrical playbills from Britain and Ireland (OCR text only) https://data.bl.uk/playbills/pb2.html
      • Portraits of actors, views of theatres and playbills (covering 1750 - 1821 in a single volume) https://data.bl.uk/singlesheet/por1.html
      • Volumes of Lysons Collectanea (Amusements), comprising broadsides, cuttings, advertisements on amusements. 1660-1840. https://data.bl.uk/singlesheet/ad1.html
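Once a metadata CSV such as the one listed on this slide has been downloaded, a first pass over it could be sketched in Python roughly as follows. The column names (`Title`, `Date`) and the sample rows are invented for illustration; the headers of the real file should be checked before adapting this.

```python
import csv
import io
from collections import Counter

# Invented stand-in for a downloaded metadata CSV; real column names may differ.
SAMPLE_CSV = """Title,Date
A Tour in Wales,1804
Poems on Several Occasions,1804
History of Adelaide,1876
"""


def count_by_column(csv_file, column):
    """Count how many rows share each value in the given column."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


counts = count_by_column(io.StringIO(SAMPLE_CSV), "Date")
for year, n in sorted(counts.items()):
    print(year, n)
```

For a real download, the `io.StringIO` object would be replaced by a file opened with `open(path, newline="")`, using whatever encoding the file turns out to have.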
  11. The Story of the Digital Collection… Digital Collection Curator Who paid for the digitisation? Who did the digitisation? Technology used Born digital? Published Unpublished Where is it? Access / API? Can it still be accessed? Generates income Reputational risk in using? Legalities / Ethics / Morality Politics when digitised Personalities involved Surprises (e.g. gaps) Descriptive information Old format not supported What media was the digitisation done from? Is there any background documentation? No Descriptive information Inconsistent descriptive information Still there? Good to know the background ‘story’ of a Digital Collection if you want to use it for projects …
  12. What engagement does the BL have with researchers wanting to use our digital content? https://goo.gl/qpCLlk https://goo.gl/wMTS3Z
      • Dialogue typically:
        – you are ‘lucky’ & we have the digital content / data relevant to your research
        – we don’t have exactly what you’re looking for, but is there anything of interest? Let’s talk…
        – engagement is hard work and it’s constantly required to maintain interest in our digital collections!
      • Artists find this dialogue easier…
      • We also tend to attract researchers with ‘fuzzier’ research boundaries who are possibly open to more interdisciplinary / collaborative research
  13. Open Content vs Onsite Only Access https://goo.gl/Y5zCXg
      • Access easier for openly licensed content
      • More challenging for on-site, in-copyright, non-print legal deposit, data protected, old content media & contemporary material (post 1877)
  14. How do we give access to onsite-only Digital Collections (85% of our Digital Collections)?
  15. READING ROOM ON SITE NOT ONLINE OPEN British Library £ Labs Residency Model Challenges of access to Digital Collections at the BL
  16. Accessing digital collections onsite OPEN £
      • Have to be ‘onsite’ (interpretations vary)
      • Need to be ‘security cleared’ ‘trusted’ for some collections – Hence ‘Researcher in Residence Model’
      • Permission required (depending on ‘story’ of collection)
      • Content could be on various media formats (not always online)
      • 5 - 20% re-use of material for non-commercial research for some collections, depending on agreements in place
      • We are learning ‘pathways’ so that providing onsite access to some digital collections becomes ‘everyday’ in the future
  17. Phases of interaction at BL Labs: Submit idea for support. Ideas always change once people experience the data and culture of the organisation
  18. eResearch SA Open Data Directory http://www.data.sa.edu.au/
  19. URLs to download sample files not on data.bl.uk
      • https://www.data.sa.edu.au/dataset/newspapers-from-british-library/
      • https://www.data.sa.edu.au/dataset/
      • https://www.data.sa.edu.au/dataset/
  20. Working with British Library Digitised Newspapers
      • Digitised through public / private means
      • Commercial products can be used to look manually for content; they have search interfaces but no APIs. They are a useful starting point though, as manual methods can translate into computational ones
      • OCR quality is not great and metadata is OK, but there is plenty of hidden material; approaches need to take this into account, e.g. ‘Good, Bad and Ugly’ OCR
      • You can purchase drives from GALE Cengage with content (dependent on subscription)
  21. Good, Bad, Ugly Image Quality / OCR
      • Original image capture of newspaper images can affect the quality of the OCR
      • A poor image is very difficult to re-OCR
      • Good image quality gives a much better chance for re-OCR
      • Bi-tonal, Grey Scale, Colour can affect the quality of the OCR
      • Methodology of working with collection at scale needs to acknowledge OCR and image quality
  22. Breaking Black Boxes – Melodee Beals http://doi.org/cm3m
  23. Burney Collection
      • Gathered by the Reverend Charles Burney (1757-1817)
      • 700 volumes, newspapers and news pamphlets, published in London, English provincial, Irish and Scottish papers, and a few examples from the American colonies.
      • 1271 titles
      • Around 1 million digitised page images – from around 2006 from Microfilm
      • OCR quality mixed, used custom XML format
      • Bi-tonal
  24. Web Interface – Burney Collection
  25. OCR quality can be very poor!
  26. 1268 Folders
  27. burney_summary.xls
  28. Breakdown of titles (Title: No. of Pages)
      • PUBLIC ADVERTISER: 60680
      • LONDON GAZETTE: 44463
      • LONDON EVENING POST: 38920
      • LONDON CHRONICLE: 32030
      • GAZETTEER AND NEW DAILY ADVERTISER: 31250
      • LLOYD'S EVENING POST: 28941
      • ST. JAMES'S CHRONICLE OR THE BRITISH EVENING POST: 28130
      • MORNING CHRONICLE AND LONDON ADVERTISER: 27658
      • DAILY COURANT: 25334
      • GENERAL EVENING POST: 23500
      • 12 TITLES WITH 10,000+ PAGES: 188266
      • 87 TITLES WITH 1,000+ PAGES: 289745
      • 216 TITLES WITH 100+ PAGES: 79374
      • 945 TITLES WITH 1 TO 100 PAGES: 16816
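As a small worked example, the four size bands on this slide can be totalled in Python. The figures are copied from the slide as given; the band labels and variable names are this sketch's own, and the totals need not agree with collection-level figures quoted on other slides.

```python
# Figures copied from the "Breakdown of titles" slide: (band, titles, pages).
BANDS = [
    ("10,000+ pages", 12, 188266),
    ("1,000+ pages", 87, 289745),
    ("100+ pages", 216, 79374),
    ("1 to 100 pages", 945, 16816),
]

total_titles = sum(titles for _, titles, _ in BANDS)
total_pages = sum(pages for _, _, pages in BANDS)

for label, titles, pages in BANDS:
    share = 100 * pages / total_pages  # this band's share of all listed pages
    print(f"{label:>15}: {titles:4d} titles, {pages:7d} pages ({share:.1f}% of pages)")

print("total titles:", total_titles)
print("total pages:", total_pages)
```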
  29. Example Folders
      • B0001ORIWEEJO - APPLEBEE'S ORIGINAL WEEKLY JOURNAL - 1715 – 1720
      • B0018CONTPROC - PROCEEDINGS OF THE ARMY UNDER THE COMMAND OF SIR THOMAS FAIRFAX – 1645
      • B0054REPINFCH - REPORT OF THE STATE OF THE GENERAL INFIRMARY AT CHESTOR - 1754?-1779
      • B0101PROCPARL - EXACT RELATION OF THE PROCEEDINGS AND TRANSACTIONS OF THE LATE PARLIAMENT – 1654
      • B0277INSTRUCT - INSTRUCTOR – 1724
      • B1381SCOU1717 - SCOURGE (1717, REPRINT) - 1717?
  30. Example files: ‘service’ folder contains page level images and corresponding OCR XML BurneyB0001ORIWEEJO17151119service
  31. APPLEBEE'S ORIGINAL WEEKLY JOURNAL FROM SATURDAY NOVEMBER 19 TO SATURDAY NOVEMBER 26 1715 WO2_B0001ORIWEEJO_1715_11_19-0001.tiff
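The example file name on this slide appears to encode a title code, an issue date and a page number. A minimal Python sketch for pulling those apart is given below; the pattern is inferred from this single example only, so other files in the collection may not follow it, and the function name `parse_page_filename` is hypothetical.

```python
import re
from datetime import date

# Pattern inferred from one example file name; treat it as an assumption.
PAGE_FILE = re.compile(
    r"^(?P<prefix>[A-Z0-9]+)_(?P<title_code>B\d{4}[A-Z0-9]+)_"
    r"(?P<year>\d{4})_(?P<month>\d{2})_(?P<day>\d{2})-(?P<page>\d{4})\.tiff?$"
)


def parse_page_filename(name):
    """Return title code, issue date and page number, or None if no match."""
    match = PAGE_FILE.match(name)
    if match is None:
        return None
    return {
        "title_code": match["title_code"],
        "issue_date": date(int(match["year"]), int(match["month"]), int(match["day"])),
        "page": int(match["page"]),
    }


info = parse_page_filename("WO2_B0001ORIWEEJO_1715_11_19-0001.tiff")
print(info)
```

Returning `None` for non-matching names makes it easy to list the files that break the assumed pattern instead of failing on them.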
  32. JISC 1 and JISC 2 Newspapers
  33. Accessing digitised newspapers through Gale Interface (subscription)
  34. Private BL NAS: Accessible onsite or remotely if security cleared via CITRIX
  35. Accessing digitised newspapers onsite at the BL (JISC 1): 12 Volumes, 80TB of data
  36. Accessing digitised newspapers onsite at the BL: Accessing ‘service’ Copy (post processed) and results of OCR available as XML
  37. Accessing digitised newspapers onsite at the BL: Accessing ‘service’ Copy (post processed)
  38. Accessing digitised newspapers onsite at the BL: Accessing OCR as XML
  39. jisc_1.xls: 79 Titles, 2 million pages
  40. Metadata from BL (JISC 1 and 2)
      • Title Metadata
        – Title, as written
        – Normalised title across all variants
        – Standardised title abbreviation
        – Variant titles, with associated dates
        – Place of publication
        – Dates of publication
        – Genre, such as newspaper
        – Sub-collection, such as Regional Daily
      • Issue Metadata
        – Volume Number
        – Issue Number
        – Date as printed
        – Normalised date (YYYY.MM.DD)
        – Number of pages
        – The microfilm reel number
        – The OCR quality
      • Page image data
        – The number of the image within that issue
        – The filename
        – The spatial coordinates for the page within the image
        – The degree of page skew
  41. Metadata from Gale (JISC 1 and 2)
      • Standardised identifier
      • Newspaper title
      • Standardised title abbreviation
      • Project codes
      • Digitized collection name
      • Issue number
      • Date as printed
      • Standardised date (Month, DD, YYYY)
      • Standardised date (YYYYMMDD)
      • Day of the week
      • Number of Pages
      • Copyright holder
      Language; Unique ID for publication; Holding Library; Citation of the physical item; Title metadata; Title as recorded in the MARC Library Catalogue; Dates of publication; Genre, such as newspaper; Conversion credit, usually a vendor; Article Unique ID; OCR quality; SC, or standardized category of article; Unique ID(s) of page(s); Unique ID(s) of individual column(s); Column number; Headline; Article type
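The BL and Gale metadata describe issue dates in three different patterns: `YYYY.MM.DD`, `Month, DD, YYYY` and `YYYYMMDD`. Bringing them to one form could be sketched in Python as below. This assumes the strings follow those patterns literally and use English month names; the function name `parse_issue_date` and the sample values are made up for illustration.

```python
from datetime import datetime, date

# Patterns as described on the two metadata slides, tried in order.
DATE_FORMATS = ("%Y.%m.%d", "%Y%m%d", "%B, %d, %Y")


def parse_issue_date(text):
    """Return a date for the first pattern that fits, or None if none does."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # try the next pattern
    return None


for sample in ("1715.11.19", "17151119", "November, 19, 1715", "unknown"):
    print(repr(sample), "->", parse_issue_date(sample))
```

Values that fit none of the patterns come back as `None`, so they can be collected and inspected by hand.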
  42. Samples for JISC 1: ‘master’ contains high res tiff; ‘service’ contains post processed tiff and OCR XML
      • BNWL - The Belfast News-Letter - 1871 - November 14
      • BNWL - The Belfast News-Letter - 1885 - September 12
      • DNLN - Daily News - 21 Jan 1846 - 31 Dec 1900
  43. JISC 2 Collection
      • 22 Titles
      • Regional titles
      • 1020550 pages
  44. jisc_2.xls
  45. JISC 2
      • 40 TB
      • Stored differently locally
      • 192,353 folders
  46. Samples for JISC 2
      • Organised differently
  47. Samples for JISC 2: Lancaster Gazetter, And General Advertiser For Lancashire West Southampton Herald Berrows Worcester Journal
      • A - Contains post processed files
      • M - Contains JP2
      • O - Contains ALTO XML
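ALTO XML stores each recognised word as a `String` element with a `CONTENT` attribute and, optionally, a word-confidence attribute `WC`, grouped into `TextLine` elements. A minimal Python sketch for reading lines and an average confidence per line follows. The sample document is invented, and real files may use a different ALTO version (and namespace) or omit `WC`, which is why the namespace is ignored here.

```python
import xml.etree.ElementTree as ET

# Invented sample; real ALTO files are far larger and may differ in version.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine>
      <String CONTENT="LANCASTER" WC="0.95"/><SP/><String CONTENT="GAZETTE" WC="0.55"/>
    </TextLine>
    <TextLine>
      <String CONTENT="Saturday" WC="0.90"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text):
    """Return (line_text, mean_word_confidence) for each TextLine element."""
    root = ET.fromstring(xml_text)
    results = []
    for line in root.iter():
        if local_name(line.tag) != "TextLine":
            continue
        strings = [el for el in line if local_name(el.tag) == "String"]
        words = [el.get("CONTENT", "") for el in strings]
        scores = [float(el.get("WC")) for el in strings if el.get("WC") is not None]
        mean_wc = sum(scores) / len(scores) if scores else None
        results.append((" ".join(words), mean_wc))
    return results


lines = alto_lines(SAMPLE_ALTO)
for text, confidence in lines:
    print(text, confidence)
```

Averaging `WC` per line is only one rough way to flag 'good, bad and ugly' OCR; any threshold would need checking against the actual page images.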
  48. 48. 48 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Previous ideas of using collection • Bob Nicholson – Finding jokes • Katrina Navickas – Political meetings • Hannah Murray – Black abolitionist performances • Jennifer Batt – Finding poetry • Surendra Singh – Finding suicide articles • Melodee Beals – Evidence of copy and paste • Ryan Cordel – Viral Texts • Paul Fyfe - Snipping out images
  49. 49. 49 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Useful resources • http://oceanicexchanges.org/ • http://scissorsandpaste.net/ • http://viraltexts.org/ • https://repository.lib.ncsu.edu/bitstream/handle/1840.20/33457/fyfe.newspaper.ar chaeology.VPR.pdf?sequence=1
  50. 50. 50 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Use of Overproof OCR Correction? Re-OCR with ABBY FineReader? https://www.abbyy.com/en-gb/ http://overproof.projectcomputing.com/ RE-OCR
  51. 51. 51 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Virtual Infrastructure for OCR text OCR text ‘scraped’ from digitised newspapers and put in cloud Jupyter notebook Write python code and results in web browser http://jupyter.org Access available for researchers ‘in residence’ https://www.docker.com/
  52. 52. 52 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk 65,000 digitised 19th Century books Image: Artwork by Alicia Martin 2007 / 2008 Paid for by: For a full list: https://goo.gl/HqPQMS Subjects include: Philosophy Poetry History Literature 1789 - 1876
  53. 53. 53 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Working with the MS Books Collection • Metadata • Page level images • OCR Text • Flickr Commons - images snipped out and user generated tags for images • 19th Century Books Collection data
  54. 54. 54 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk 30 August 2012
  55. 55. 55 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Metadata MicrosoftBooks.xls - Over 65,000 titles
  56. 56. 56 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk MS Books – Finnish Titles
  57. 57. 57 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Fiction / Non Fiction
  58. 58. 58 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Latin American Studies
  59. 59. 59 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk ALTO XML – Sample Files – 1800 - 1809 1502 Zip Files
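As a rough sketch of how the text could be pulled out of an ALTO XML file: in the ALTO standard, recognised words are stored in the `CONTENT` attribute of `String` elements inside `TextLine` elements. The sample XML below is hand-written for illustration and is not taken from the Library's files; the namespace used in the real files may differ, so the code ignores namespaces altogether.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written ALTO-like sample (illustrative only, not from the BL data).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine>
      <String CONTENT="The"/><SP/>
      <String CONTENT="British"/>
    </TextLine>
    <TextLine>
      <String CONTENT="Library"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""


def local_name(tag):
    """Strip any '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text):
    """Return one string per TextLine, joining the CONTENT of its String elements."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) == "TextLine":
            words = [
                child.get("CONTENT", "")
                for child in element
                if local_name(child.tag) == "String"
            ]
            lines.append(" ".join(words))
    return lines


print(alto_lines(SAMPLE_ALTO))
```

For the zipped sample files, the same function could be applied to each XML member read with Python's `zipfile` module; the layout inside the zip files is not described here, so that step is left out.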
  60. 60. 60 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk OCR Text – JSON File
  61. 61. 61 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk 002819694
  64. 64. 64 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Optically Character Recognised (OCR) generated Text Scanned Page Image on Flickr Commons https://goo.gl/AC43vs
  65. 65. 65 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Worked better for female faces than men’s Press http://mechanicalcurator.tumblr.com Posts image every 30 minutes http://www.flickr.com/photos/britishlibrary/ 1,020,418 images need tagging! Creative uses of images Face recognition Algorithms based on photos Mechanical Curator with an algorithmic brain (Circles, Squares and Slanty etc) http://goo.gl/qPPgxX Wikimedia Flickr Commons Individual URL & API Snipping out images from 65,000 Digitised Books* >1,000,000,000* views >17,000,000* tags https://goo.gl/FgZ4HM Work @ BL by Ben O’Steen, Labs and Digital Research Team *Matt Prior - http://goo.gl/j29Tnx Since Dec 2013 Tumblr *Estimates >More demand to see physical items
  66. 66. 66 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk British Library Flickr Commons https://www.flickr.com/photos/britishlibrary/ Flickr Commons has items from Galleries, Libraries, Archives and Museums (GLAM) (Mostly Public Domain)
  67. 67. 67 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Flickr Commons (100 + GLAMs as of 25/09/18)
  68. 68. 68 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Getting an account on Flickr •Get a Flickr / Yahoo account (https://login.yahoo.com/account/create) •You can then tag, organise favourites, make your own albums and galleries from Flickr images online or uploaded •You get 1TB for free!
  69. 69. 69 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk British Library Flickr Commons Why Flickr Commons? • Free! • Each image has its own unique web address, easy to share • Can tag images • Has an Application Programming Interface (API) Late August 2013
  70. 70. 70 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Using British Library Flickr Commons •How do we find things in this collection? •Remember snipped out images from books with no description? •Not straightforward…
  71. 71. 71 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk How is Flickr Commons Organised? • Photostream • Albums • Faves • Galleries • Tags
  72. 72. 72 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Flickr Photostream https://www.flickr.com/photos/britishlibrary/ Kind of the home page for the collection! Usually displays images with most recent activity!
  73. 73. 73 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Flickr Albums Curated by the British Library – specifically Nora McGregor She works with the public to add images or create new ones! Over 450 Albums as of 25/09/18 – Mostly Maps!
  74. 74. 74 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Flickr Faves Most favourited image first, in descending order Favouriting an image requires an account
  75. 75. 75 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Flickr Galleries More useful if you have an account You can create a Gallery of Flickr images to share with everyone Gallery is tied to your account
  76. 76. 76 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Flickr Groups Community based – for sharing and discussing images We might create a group for the competition – watch this space!
  77. 77. 77 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Adding Tags in Flickr Be the next ‘Chico45’!
  78. 78. 78 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Get Tags!
  79. 79. 79 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Searching within the collection!
  80. 80. 80 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk The Anatomy of a BL Flickr Record Download high res 300dpi image
  82. 82. 82 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk When you log in to Flickr Commons
  84. 84. 84 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Opportunities – increasing traffic to Library services You can purchase a ‘High Res’ Copy View in the Library Item Viewer Download .pdf All illustrations in book Other illustrations in books Published in same year View the item in the Library Catalogue Tags auto generated User generated Tag Grouping for image
  85. 85. 85 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Refers to the Physical Copy of the Item
  88. 88. 88 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Physical and Digital Copy Number relates to Physical Copy
  96. 96. 96 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk You can’t beat the Physical Copy!
  97. 97. 97 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Now for the Digital Copy!
  101. 101. 101 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Warning – can be a large file! It’s a PDF You can use Ctrl+F in it to find text But health warning about OCR!
  103. 103. 103 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Page numbers don’t always correspond! Page 132 on Flickr? That is the page number in the PDF of the book, not necessarily the page number printed in the book
  105. 105. 105 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Plain Text from Books? The link is not working, but the text can be obtained from https://data.bl.uk/digbks/db14.html
  106. 106. 106 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk All illustrations in book / books in same year! All the illustrations in this book Other illustrations in books published in the same year
  107. 107. 107 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Views and Favourites
  108. 108. 108 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Galleries •Personal Galleries which you can share.
  109. 109. 109 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Exchangeable Image File Information! For Geeks only!
  110. 110. 110 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Tags!
  111. 111. 111 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Tagging a million images Iterative Crowdsourcing http://goo.gl/j6fxac Cardiff University’s Lost Visions Project http://www.metadatagames.org/ Metadata Games James Heald Mario Klingemann Chico 45 Use computational methods Human Tagger Top British Library Flickr Commons Taggers 18 hard core taggers How to reward this ‘small group’ and keep it motivated? Average for ‘crowd’ is 1 tag per person What kind of ‘task’ can this ‘crowd’ do? Mobile games for ‘Ships’, ‘Covers’ and ‘Portraits’ Interface for tagging
  112. 112. 112 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Adding Tags! •You have to have an account to add tags! •Could you be the next Chico 45?
  113. 113. 113 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Generated from book Description Generated from user
  114. 114. 114 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Generated by Flickr
  115. 115. 115 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Flickr Commons API https://www.flickr.com/services/api/
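A minimal sketch of what a call to the Flickr API could look like, using its documented REST endpoint and the `flickr.photos.search` method. The function name `build_search_url` is invented here, and the API key and account identifier are placeholders: you need your own key from Flickr, and the identifier (NSID) of the account you want to search has to be looked up. The sketch only builds the request URL; it does not contact Flickr.

```python
from urllib.parse import urlencode

FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def build_search_url(api_key, user_id, tags, per_page=50):
    """Build a flickr.photos.search URL for photos of one account matching all tags."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,        # placeholder: your own Flickr API key
        "user_id": user_id,        # placeholder: NSID of the account to search
        "tags": ",".join(tags),    # Flickr expects a comma-delimited tag list
        "tag_mode": "all",         # require every tag rather than any tag
        "per_page": per_page,
        "format": "json",
        "nojsoncallback": 1,       # plain JSON instead of a JSONP wrapper
    }
    return FLICKR_REST_ENDPOINT + "?" + urlencode(params)


url = build_search_url("YOUR_API_KEY", "ACCOUNT_NSID", ["map", "london"])
print(url)
```

The resulting URL can be fetched with any HTTP client; the JSON response is paginated, so a full harvest would loop over the `page` parameter as well.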
  116. 116. 116 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Generated by SherlockNet! bit.ly/sherlocknet
  117. 117. 117 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk SherlockNet has a search interface!
  118. 118. 118 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk SherlockNet Search for ‘people’
  119. 119. 119 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Advanced Search in SherlockNet! Tags Available for Download
  120. 120. 120 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk 19th Century Books Metadata • 1.9 million records of 19th Century Books • Used for Sample generator project
  121. 121. 121 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Using the Wikimedia Synoptic Index • Created to help find all the maps in the books • Great resource if you want to find things by place! https://goo.gl/zuxRnG
  122. 122. 122 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Google Fusion Table • https://fusiontables.google.com/DataSource?docid=1BMm0FeSsEBa40zgs3C3vySKC0gnPk-pSvrDqqnA7&pli=1#rows:id=1
  123. 123. 123 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Geodata flickr_geodata.csv
  124. 124. 124 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Alston Index Internal Document • 925 page document of BL / British Museum Pressmarks • 55-602 - Topical Index • 603-925 - Pressmark Sequence
  125. 125. 125 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Alston Index • Internal document (not to be externally shared) • Published in 1987 – dot matrix printed • Refers to British Museum and British Library Pressmarks / Shelfmarks • Shelfmarks are used internally to identify
  126. 126. 126 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Topical Index OCR problems – Re-do? Manually correct?
  127. 127. 127 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Augment Library Catalogue?
  128. 128. 128 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Libcrowds – In the Spotlight https://www.libcrowds.com/collection/playbills/projects
  129. 129. 129 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Libcrowds – Spotlight - Data https://www.libcrowds.com/collection/playbills/data
  130. 130. 130 @BL_Labs #DHA2018 @BL_DigiSchol labs@bl.uk Data Journey • Choose one or two datasets maximum • Explore the collection and make notes about any challenges and issues • See if you can curate a smaller collection from the larger collection • Tell us what you have done • We will consider publishing it on http://data.bl.uk
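Curating a smaller collection from a larger one can be sketched as a simple filter over a metadata CSV. Everything specific here is an assumption for illustration: the column names (`identifier`, `title`, `year`), the sample rows and the function name `curate_subset` are invented and do not reflect the actual layout of the files on data.bl.uk.

```python
import csv
import io

# Invented sample rows; the real metadata files will have different columns.
SAMPLE_METADATA = """identifier,title,year
000000001,A History of Lancashire,1805
000000002,Poems on Several Occasions,1823
000000003,Travels in Peru,1808
"""


def curate_subset(csv_text, first_year, last_year):
    """Keep only the rows whose 'year' falls within the given range (inclusive)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row
        for row in reader
        # Skip rows with missing or non-numeric years rather than failing on them.
        if row["year"].isdigit() and first_year <= int(row["year"]) <= last_year
    ]


subset = curate_subset(SAMPLE_METADATA, 1800, 1809)
for row in subset:
    print(row["identifier"], row["title"], row["year"])
```

Rows that are skipped because of missing or malformed dates are exactly the kind of challenge worth noting down while exploring a collection.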
