Melissa Terras, James Baker, James Hetherington, David Beavan, Martin Zaltz Austwick, Anne Welsh, Helen O'Neill, Will Finley, Oliver Duke-Williams, and Adam Farquhar
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Exceptions: quotations, embeds from external sources, logos, and marked images.
Enabling Complex Analysis of Large-Scale Digital Collections
Humanities Research, High Performance Computing, and transforming access to British Library Digital Collections
Data, code, viz: github.com/UCL-dataspring
Overview
Barriers to computational approaches:
● fragmentation of communities, resources, and tools;
● lack of interoperability;
● lack of technical skills
Method
60k books from the British Library:
● 17th-19th century
● 224GB compressed ALTO XML
● UCL High Performance Computing
● 4 humanities researchers
● Research questions translated into computational queries
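The corpus described above was processed as compressed ALTO XML, the Library of Congress standard for OCR output in which each recognised word is a `String` element carrying its text in a `CONTENT` attribute. A minimal standard-library sketch of pulling those words out of ALTO files inside a zip archive follows. The sample page, the function names (`alto_words`, `zip_words`), and the assumption that a book arrives as a zip of per-page XML files are all hypothetical illustrations, not the British Library's packaging or the project's own code.

```python
from __future__ import annotations

import io
import xml.etree.ElementTree as ET
import zipfile

# Hypothetical, heavily simplified ALTO page; real files carry many more
# attributes (coordinates, OCR confidence) and may use another ALTO version.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="The"/><SP/><String CONTENT="cholera"/>
          </TextLine>
          <TextLine>
            <String CONTENT="returned"/><SP/><String CONTENT="in"/><SP/><String CONTENT="1832."/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix so the same code handles any ALTO version."""
    return tag.rsplit("}", 1)[-1]


def alto_words(xml_data: str | bytes) -> list[str]:
    """Return the CONTENT attribute of every <String> element, in document order."""
    root = ET.fromstring(xml_data)
    return [el.get("CONTENT", "") for el in root.iter() if local_name(el.tag) == "String"]


def zip_words(archive) -> dict[str, list[str]]:
    """Map each .xml member of a zip archive (path or file object) to its words."""
    with zipfile.ZipFile(archive) as zf:
        return {
            name: alto_words(zf.read(name))
            for name in zf.namelist()
            if name.endswith(".xml")
        }


if __name__ == "__main__":
    # Build a small in-memory archive so the example runs without real data.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("book/page0001.xml", SAMPLE_ALTO)
    buffer.seek(0)
    print(zip_words(buffer))
```

Matching on the local tag name rather than a fixed namespace is a deliberate choice: collections digitised at different times can mix ALTO versions, and this keeps one extractor working across them. On an HPC cluster, a function like `zip_words` would typically be the per-file "map" step run in parallel across the collection.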
UCL’s Legion Cluster supercomputing facility. Photo: Tony Slade, © UCL Creative Media Services (all rights reserved)
Results
It worked!
● Case Study 1: History of Medicine
● Case Study 2: History of Images
● Technical barriers
● Search ‘recipes’
Case Study 1: History of Medicine
Oliver Duke-Williams, UCL
Case Study 2: History of Images
Will Finley, Sheffield
Technical
Major sticking point:
● Using humanities data on HPCs
Best practice recommendations:
● Derived datasets
● Normalisations
● Documenting decisions
● Fixed/defined dataset
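The recommendations above (derived datasets, normalisations, documenting decisions) can be made concrete with a small example. This is a minimal sketch, assuming a derived dataset stored as JSON Lines where every record carries the normalisation decisions applied to it. The specific choices (Unicode NFC, replacing long s, lowercasing, trimming edge punctuation) and the names `normalise_token` and `derive_record` are hypothetical illustrations, not the project's actual pipeline.

```python
from __future__ import annotations

import json
import unicodedata

# Hypothetical normalisation decisions. Storing them with every derived record
# documents exactly how the text was altered from the OCR output.
DECISIONS = {
    "unicode": "NFC",
    "long_s": "replaced with s",
    "case": "lowercased",
    "punctuation": "stripped from token edges",
}

EDGE_PUNCTUATION = ".,;:!?\"'()[]"


def normalise_token(token: str) -> str:
    """Apply each step listed in DECISIONS to a single OCR token."""
    token = unicodedata.normalize("NFC", token)
    token = token.replace("\u017f", "s")  # long s (ſ), common in older print
    token = token.lower()
    return token.strip(EDGE_PUNCTUATION)


def derive_record(book_id: str, words: list[str]) -> str:
    """Return one JSON Lines record: normalised tokens plus the decisions used."""
    tokens = [t for t in (normalise_token(w) for w in words) if t]
    return json.dumps(
        {"book_id": book_id, "tokens": tokens, "decisions": DECISIONS},
        ensure_ascii=False,
    )


if __name__ == "__main__":
    print(derive_record("example-0001", ["The", "Diſeaſe,", "..."]))
```

Keeping the decisions inside each record, rather than only in a separate README, means a subset copied out for one research question still says how it was produced; running the same code over a fixed, defined input set makes the derived data reproducible.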
Generic searches:
● for all variants of a word
● that return keywords in context, traced over time
● for a word or phrase that ignore another word or phrase
● for a word in close proximity to a second word
● based on image metadata
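The first four search "recipes" in the list can each be expressed in a few lines. These are minimal sketches, assuming the text has already been extracted and normalised into strings or lowercase token lists; the function names are hypothetical and are not taken from the UCL-dataspring repository. The "ignore another word or phrase" recipe is read here as "mentions one phrase but never the other", which is only one possible interpretation, and the image-metadata search is not sketched.

```python
import re
from collections import Counter


def variants_pattern(variants):
    """All variants of a word: one case-insensitive, whole-word regex."""
    alternation = "|".join(re.escape(v) for v in variants)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


def hits_by_year(books, pattern):
    """Trace a search over time; `books` is an iterable of (year, text) pairs."""
    counts = Counter()
    for year, text in books:
        counts[year] += len(pattern.findall(text))
    return counts


def kwic(tokens, target, width=3):
    """Keywords in context: yield (left, keyword, right) windows for each hit."""
    for i, tok in enumerate(tokens):
        if tok == target:
            yield tokens[max(0, i - width):i], tok, tokens[i + 1:i + 1 + width]


def matches_excluding(text, wanted, unwanted):
    """One reading of 'ignore another phrase': text has `wanted` but not `unwanted`."""
    def has(phrase):
        return re.search(rf"\b{re.escape(phrase)}\b", text, re.IGNORECASE) is not None
    return has(wanted) and not has(unwanted)


def within_proximity(tokens, first, second, window=5):
    """True if `first` occurs within `window` tokens of `second`."""
    first_at = [i for i, t in enumerate(tokens) if t == first]
    second_at = [i for i, t in enumerate(tokens) if t == second]
    return any(abs(i - j) <= window for i in first_at for j in second_at)


if __name__ == "__main__":
    tokens = "the cholera came to london in the summer".split()
    print(list(kwic(tokens, "london", width=2)))
    print(within_proximity(tokens, "cholera", "london", window=3))
```

Each function works on one book at a time, so on a cluster the per-book results (for example the `Counter` from `hits_by_year`) can be computed in parallel and then summed, since `Counter` objects add together with `+`.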
Conclusions
Recommendations for enabling complex analysis of large-scale digital collections in the humanities:
● 1 Invest in research software engineer capacity to deploy and maintain openly licensed large-scale digital collections from across the GLAM sector, in order to facilitate research in the arts, humanities, and social and historical sciences.
● 2 Invest in training library staff to run these initial queries in collaboration with humanities faculty, to support work with the subsets of data that are produced, and to document and manage the resulting code and derived data.
Special thanks to UCL Research Computing and British Library Digital Research for their hard work and support!
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdf
Vishram Singh - Textbook of Anatomy  Upper Limb and Thorax.. Volume 1 (1).pdfVishram Singh - Textbook of Anatomy  Upper Limb and Thorax.. Volume 1 (1).pdf
Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdfssuserdda66b
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...Poonam Aher Patil
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701bronxfugly43
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxJisc
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsKarakKing
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structuredhanjurrannsibayan2
 

Recently uploaded (20)

Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptx
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdf
Vishram Singh - Textbook of Anatomy  Upper Limb and Thorax.. Volume 1 (1).pdfVishram Singh - Textbook of Anatomy  Upper Limb and Thorax.. Volume 1 (1).pdf
Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdf
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 

Enabling Complex Analysis of Large-Scale Digital Collections: Humanities Research, High Performance Computing, and transforming access to British Library Digital Collections

  • 1. Melissa Terras, James Baker, James Hetherington, David Beavan, Martin Zaltz Austwick, Anne Welsh, Helen O'Neill, Will Finley, Oliver Duke-Williams, and Adam Farquhar This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Exceptions: quotations, embeds from external sources, logos, and marked images. Enabling Complex Analysis of Large-Scale Digital Collections Humanities Research, High Performance Computing, and transforming access to British Library Digital Collections Data, code, viz: github.com/UCL-dataspring
  • 2. Overview Barriers to computational approaches: ● fragmentation of communities, resources, and tools; ● lack of interoperability; ● lack of technical skills Data, code, viz: github.com/UCL-dataspring
  • 3. Method 60k books from the British Library: ● 17th - 19th century ● 224GB compressed ALTO XML ● UCL High Performance Computing ● 4 humanities researchers ● Research questions to computational queries Data, code, viz: github.com/UCL-dataspring
  • 4. Data, code, viz: github.com/UCL-dataspring UCL’s Legion Cluster supercomputing facility. Photo: Tony Slade, © UCL Creative Media Services (all rights reserved)
  • 5. Method 60k books from the British Library: ● 17th - 19th century ● 224GB compressed ALTO XML ● UCL High Performance Computing ● 4 humanities researchers ● Research questions to computational queries Data, code, viz: github.com/UCL-dataspring
  • 6. Results It worked! ● Case Study 1: History of Medicine ● Case Study 2: History of Images ● Technical barriers ● Search ‘recipes’ Data, code, viz: github.com/UCL-dataspring
  • 7. Case Study 1 History of Medicine Oliver Duke-Williams, UCL Data, code, viz: github.com/UCL-dataspring
  • 8. Case Study 2 History of Images Will Finley, Sheffield Data, code, viz: github.com/UCL-dataspring
  • 9. Case Study 2 History of Images Will Finley, Sheffield Data, code, viz: github.com/UCL-dataspring
  • 10. Technical Major sticking point: ● Using humanities data on HPCs Best practice recommendations: ● Derived datasets ● Normalisations ● Documenting decisions ● Fixed/defined dataset Data, code, viz: github.com/UCL-dataspring
  • 11. Generic searches: ● for all variants of a word ● that return keywords in context traced over time ● for a word or phrase that ignore another word or phrase ● for a word in close proximity to a second word ● based on image metadata Data, code, viz: github.com/UCL-dataspring
  • 12. Conclusions Recommendations for enabling complex analysis of large-scale digital collections in the humanities: ● 1 Invest in research software engineer capacity to deploy and maintain openly licensed large-scale digital collections from across the GLAM sector in order to facilitate research in the arts, humanities and social and historical sciences. ● 2 Invest in training library staff to run these initial queries in collaboration with humanities faculty, to support work with subsets of data that are produced, and to document and manage resulting code and derived data. Data, code, viz: github.com/UCL-dataspring
  • 13. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Exceptions: quotations, embeds from external sources, logos, and marked images. Special thanks to UCL Research Computing and British Library Digital Research for their hard work and support! Data, code, viz: github.com/UCL-dataspring Melissa Terras, James Baker, James Hetherington, David Beavan, Martin Zaltz Austwick, Anne Welsh, Helen O'Neill, Will Finley, Oliver Duke-Williams, and Adam Farquhar
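Slide 3 describes the corpus as 224GB of compressed ALTO XML. As a rough illustration of the first step any query over such a corpus has to take, the sketch below pulls the recognised words out of a single ALTO page. It assumes only that each word is stored in the CONTENT attribute of a String element, as the ALTO schema specifies, and it ignores the XML namespace so that different ALTO versions parse the same way. Decompression and per-book file handling are left out. The function name alto_words and the sample page are hypothetical; this is not the project's own code, which is published at github.com/UCL-dataspring.

```python
import xml.etree.ElementTree as ET


def alto_words(xml_text):
    """Return the words on one ALTO page, in document order.

    Hypothetical helper, not taken from the UCL-dataspring code.
    ALTO stores each recognised word as <String CONTENT="..."/>.
    We match on the local tag name only, so any ALTO namespace works.
    """
    root = ET.fromstring(xml_text)
    words = []
    for element in root.iter():
        # element.tag is either '{namespace}String' or plain 'String'
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "String":
            content = element.get("CONTENT")
            if content:  # skip String elements with no text
                words.append(content)
    return words


# A made-up, minimal ALTO page for illustration only.
sample = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="Cholera"/><SP/><String CONTENT="morbus"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""

print(alto_words(sample))  # ['Cholera', 'morbus']
```

Matching on the local tag name trades strict schema validation for robustness across a collection digitised over many years; a production pipeline would more likely bind the exact namespace and validate.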
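The generic searches listed on slide 11 can be approximated in a few lines once the text is tokenised. The sketch below is a minimal, in-memory illustration, not the recipes the project actually ran on UCL's Legion cluster. It uses single words rather than phrases, a crude lower-case normalisation (the kind of decision slide 10 recommends documenting), a hand-supplied set of spelling variants, and a fixed token window for proximity. All function names, the window size and the sample texts are hypothetical. Searches on image metadata are not covered.

```python
import re
from collections import Counter

# Crude normalisation (an assumption): lower-case, keep runs of letters a-z only.
TOKEN = re.compile(r"[a-z]+")


def tokens(text):
    """Normalise a text into a list of lower-case alphabetic tokens."""
    return TOKEN.findall(text.lower())


def kwic(words, variants, width=3):
    """Keyword in context: yield (left, hit, right) for each variant found."""
    for i, w in enumerate(words):
        if w in variants:
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            yield left, w, right


def near(words, a, b, window=5):
    """True if word a occurs within `window` tokens of word b."""
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)


def without(words, wanted, unwanted):
    """True if `wanted` appears and `unwanted` never does."""
    return wanted in words and unwanted not in words


def hits_by_year(books, variants):
    """Count hits for any variant per publication year, to trace a term over time."""
    counts = Counter()
    for year, text in books:
        counts[year] += sum(1 for w in tokens(text) if w in variants)
    return counts


# Hypothetical (year, text) pairs standing in for digitised books.
books = [
    (1832, "The cholera morbus spread through the town."),
    (1866, "A second outbreak of cholera; the colour of the water was noted."),
]
words_1866 = tokens(books[1][1])

print(dict(hits_by_year(books, {"cholera"})))  # {1832: 1, 1866: 1}
print(list(kwic(tokens(books[0][1]), {"cholera"}, width=2)))
# [('the', 'cholera', 'morbus spread')]
print(near(words_1866, "cholera", "water", window=5))  # True
print(without(tokens(books[0][1]), "cholera", "outbreak"))  # True
print(dict(hits_by_year(books, {"colour", "color"})))  # {1832: 0, 1866: 1}
```

Passing a set of variants, rather than a single word, is what lets one query cover historical spellings; on 60k books the same logic would be distributed across cluster nodes and the per-year counts merged afterwards.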