A short paper for a panel on 'Data Science & Digital Humanities: new collaborations, new opportunities and new complexities' at Digital Humanities 2019, Utrecht.
In search of the sweet spot: infrastructure at the intersection of cultural heritage and data science?
1. In search of the sweet spot: infrastructure at the
intersection of cultural heritage and data science?
Dr Mia Ridge, Digital Curator, British Library
@mia_out @BL_DigiSchol @LivingWMachines
Data Science & Digital Humanities: new collaborations, new
opportunities and new complexities panel, DH2019
2. www.bl.uk
Why should GLAMs and data scientists collaborate?
GLAMs (galleries, libraries, archives and museums) have vast collections
• The British Library has up to 200 million items, including: 14 million books; 8
million stamps; 310,000 manuscript volumes; 4 million maps; Pamphlets,
magazines, newspapers, sheet music; Television and radio recordings;
Websites, e-books, e-journals. Over 3 million new items are added every year
3. Why should GLAMs and data scientists
collaborate?
But that scale means cataloguing is often minimal or focused on
particular uses, so items are not easily findable
6. Cook's Handbook for London, 1897
Data scientists need (challenging) sources
(Are our collections too challenging?)
7. What stands in the way of
collaboration?
‘to get the most out of machine learning at your
organization, you need the right team and the right
mindset. The latter requires a cultural shift that
prioritizes and rewards experimentation,
measurement, and testing throughout your
organization’
- Google, ‘Everything a marketer needs to know about
machine learning’
Image from page 440 of "Bell telephone magazine" (1922)
8. What stands in the way of
collaboration?
‘you need the right team and the right mindset. The
latter requires a cultural shift that prioritizes and
rewards experimentation, measurement, and testing
throughout your organization’
- Google, ‘Everything a marketer needs to know about
machine learning’
‘And a lot of spare capacity across teams. Whose job
changes when you bring in data science?’
- me
9. What else stands in the
way of collaboration?
• GLAM data can add up to terabytes of data – transfer,
storage and processing become expensive
• Copyright / licensing and data protection issues
• Are GLAM and academic data science outcomes
aligned? Novelty vs application, long-term, at scale?
• How do we integrate AI-generated metadata at scale
without flooding the catalogue with ‘mentions’?
10. Opportunities to shift GLAM infrastructure from ‘catalogue’
to ‘lake’ and provide platforms for collaborative work?
https://www.flickr.com/photos/missouristatearchives/11653956994
11. Thank you!
Questions?
Editor's Notes
I’ve been working in the field of open cultural data for a long time, which has led me to ask: how can GLAMs and data scientists collaborate to produce outcomes that are useful for both groups?
I’m proposing that data mining with cultural heritage collections is the sweet spot.
I’ve highlighted some strings that could be linked to spatial identifiers in Northern Uganda, but there are lots of other entities that could be noted.
Catalogues are only one way into collections – examining the actual text, images, etc, provides lots of other ways in. We need data science methods to link to identifiers and to create structure from unstructured data
From Report on Northern Uganda (WOMAT-AFR-BEA-227-2-2)
Title: '29.9.13. Extract from Report on Northern Uganda, compiled by Lieutenant G.P. Cosens, 1st Royal Dragoons.'
Description: Provides a detailed description of the country, its climate, fauna and flora.
Author: Cosens, Gordon Philip Lewes, d 1928, army officer, Author
British Library Shelfmark: WOMAT/AFR/BEA/227/2/2
Locations Depicted: Nakwai Hills, Uganda; Ngabotok, East Africa Protectorate; Turkwel River, East Africa Protectorate
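As a rough illustration of linking strings in the text to spatial identifiers, here is a minimal sketch of gazetteer matching. It is not the method used in any British Library project. The place names come from the catalogue record above, but the identifiers (`example:place/...`), the `link_places` function and the sample sentence are all invented for illustration; a real pipeline would use an authority file and would need to handle spelling variants and ambiguous names.

```python
import re

# Hypothetical gazetteer: place names from the catalogue record, mapped to
# made-up identifiers. A real one would point at an authority file.
GAZETTEER = {
    "Nakwai Hills": "example:place/nakwai-hills",
    "Ngabotok": "example:place/ngabotok",
    "Turkwel River": "example:place/turkwel-river",
}


def link_places(text, gazetteer=GAZETTEER):
    """Return (matched string, identifier, start offset) for each name found."""
    links = []
    for name, identifier in gazetteer.items():
        # Whole-word, case-insensitive match on the exact gazetteer name.
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            links.append((match.group(0), identifier, match.start()))
    # Report the links in the order they appear in the text.
    return sorted(links, key=lambda link: link[2])


# Invented sample sentence, not a quotation from the report.
sample = "The route follows the Turkwel River towards Ngabotok."
for matched, identifier, offset in link_places(sample):
    print(offset, matched, identifier)
```

Exact string matching like this only finds names already in the gazetteer, which is part of why messy historical sources are a challenging case for data science methods.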
Since the Turing moved into the BL’s building we’ve been working to convince them that our sources are interestingly messy and complicated.
(Is this really true, or are our sources too challengingly messy and vast?)
We’d like lots of lightbulb moments but are we set up for them?
Is work with GLAM collections ‘significant’ or ‘novel’ enough for academic research? Creating metadata at scale isn’t sexy but it is necessary
‘Crowdsourcing / machine learning as additional info’ stuff. New data storage paradigms, new UI/UX ideas
Had been thinking about data lakes as a way of dealing with the complexities of data structures for different uses, but the need for shared infrastructure is actually more profound.
Integrating results of DS into discovery systems means those systems need to change: more ‘data lake’ than MARC? ‘Crowdsourcing / machine learning as additional info’ stuff. New data storage paradigms, new UI/UX ideas
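One way to picture ‘crowdsourcing / machine learning as additional info’ is a record that keeps the curated catalogue entry untouched and stores generated metadata as a separate layer, with its source and a confidence score. The sketch below is only an assumption about what such a structure might look like, not a description of any existing BL system. The class names, the `min_confidence` threshold and the confidence values are all invented, and the title is shortened from the catalogue record above.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    value: str         # e.g. a place name or subject term
    source: str        # e.g. "crowdsourcing" or "machine_learning"
    confidence: float  # 0.0 to 1.0, as reported by the source (invented here)


@dataclass
class Item:
    shelfmark: str
    catalogue_title: str  # curated metadata, never overwritten
    annotations: list = field(default_factory=list)  # additional info layer

    def discoverable_terms(self, min_confidence=0.8):
        """Annotation values confident enough to surface in discovery."""
        return sorted(
            {a.value for a in self.annotations if a.confidence >= min_confidence}
        )


item = Item("WOMAT/AFR/BEA/227/2/2", "Extract from Report on Northern Uganda")
item.annotations.append(Annotation("Turkwel River", "machine_learning", 0.93))
item.annotations.append(Annotation("Nakwai Hills", "crowdsourcing", 1.0))
item.annotations.append(Annotation("London", "machine_learning", 0.41))
print(item.discoverable_terms())
```

Keeping every annotation but filtering at query time is one possible answer to the worry about flooding the catalogue with ‘mentions’: low-confidence output stays available for research without appearing in discovery by default.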