Research data spring
Enabling Complex Analysis of Large Scale Digital Collections
14/7/2015
Lots of money has been spent digitising heritage collections. Digitised heritage
collections are data. But non-computationally trained scholars don't know what
to ask of large quantities of data. Often they do not have access to high
performance computing facilities and they don’t know how to use them.
We have addressed this fundamental problem by extending research data
management processes in order to enable novel research in the arts, humanities,
and social and historical sciences and a deeper understanding of emerging
research needs. In our first phase, we have successfully implemented large
scale, complex search of a digitised collection: now we scale up…
More & more digitised content is in the public domain
UK eScience infrastructure not used in A+H or SHS
Phase 1: take 64,000 British Library digitised books
See how we can analyse them using UCL’s HPC
Moving beyond restrictive basic searches
team
James Hetherington
Research Software Engineer
Work with researchers 1: detect trends
Anne Welsh
Lecturer in Library and
Information Studies, UCL
Interested in growth of professions
in the Victorian era.
Needs to be able to do AND, OR,
NOT, AND NOT Boolean queries:
beyond capabilities of current
large-scale digitisation search
functions.
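A minimal sketch of the kind of Boolean query described here, assuming each book is available as a plain-text string. The function names, the sample texts and the whole-word matching rule are illustrative assumptions, not the project's cluster code.

```python
import re

def words(text):
    """Lower-case a text and return the set of distinct words in it."""
    return set(re.findall(r"[a-z]+", text.lower()))

def matches(text, all_of=(), any_of=(), none_of=()):
    """Boolean query: every term in all_of (AND), at least one term in
    any_of (OR, skipped when empty), and no term in none_of (NOT)."""
    found = words(text)
    if not all(term in found for term in all_of):
        return False
    if any_of and not any(term in found for term in any_of):
        return False
    return not any(term in found for term in none_of)

# Hypothetical sample corpus: identifier -> text.
books = {
    "book-a": "The surgeon and the apothecary met the physician.",
    "book-b": "A physician wrote of engineers and their trade.",
    "book-c": "The engineer built a bridge.",
}

# physician AND (surgeon OR engineers) AND NOT apothecary
hits = [book_id for book_id, text in books.items()
        if matches(text, all_of=["physician"],
                   any_of=["surgeon", "engineers"],
                   none_of=["apothecary"])]
print(hits)
```

Matching is on exact whole words, so "engineer" and "engineers" count as different terms; a real query would probably need stemming or wildcards.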
Work with researchers 2: compare data sources
Oliver Duke-Williams
Lecturer in Digital
Information Studies, UCL
Interested in history of
demographics and health data.
Can we track the prevalence of
diseases in the corpus, and do
they relate to known
epidemics, using existing data?
Deaths in England    1838      1839
Measles              6,514     10,937
Whooping cough       9,107     8,165
Consumption          59,025    59,559

Cholera
1831-2: first outbreak in UK, c. 55,000 deaths
1848-49: 53,293 deaths (England)
1853-54: c. 11,000 UK deaths ('John Snow / Broad Street pump' epidemic)
1863: East London, c. 6,000 deaths
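One way to approach the prevalence question is to count how often a disease term appears per publication year, normalised by the number of words published that year. The sketch below assumes the corpus can be reduced to (year, text) pairs; the function name, the sample records and the per-1,000-words normalisation are illustrative assumptions rather than the project's actual method.

```python
import re
from collections import Counter

def term_rate_by_year(records, term):
    """Return {year: occurrences of term per 1,000 words} for (year, text) records."""
    hits = Counter()
    totals = Counter()
    for year, text in records:
        tokens = re.findall(r"[a-z]+", text.lower())
        totals[year] += len(tokens)
        hits[year] += tokens.count(term.lower())
    # Skip years with no words to avoid dividing by zero.
    return {year: 1000 * hits[year] / totals[year]
            for year in sorted(totals) if totals[year]}

# Hypothetical records: (publication year, text).
records = [
    (1848, "cholera spread through the town and cholera was feared"),
    (1848, "a quiet year in the parish"),
    (1850, "the harvest was good"),
]
rates = term_rate_by_year(records, "cholera")
print(rates)
```

The resulting yearly rates could then be set beside known epidemic dates and recorded death counts to see whether mentions rise around outbreaks.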
Work with researchers 3: visualise content
Will Finley
PhD Student, History,
University of Sheffield
Interested in History of Printed
Book Illustration 1750-1850.
How can we analyse and
visualise how the size and
placing of illustrations in the
corpus changes over time?
All outputs documented on github
»https://github.com/UCL-dataspring
Including all code, recipes, & visualisations
Explained in a series of blog posts
» http://britishlibrary.typepad.co.uk/digital-scholarship/2015/07/turning-research-questions-into-computational-queries.html
http://bit.ly/dataspring
Overview
» Not a Research Project
» Not an API
» Not replicating existing search facilities
» How can we provide access to data and compute?
» What are the technical issues in using eScience infrastructure
for cultural and heritage datasets?
» How can we train people in the A+H, and Libraries, to use
this?
» How can we scale this up across the arts and humanities, and
social and historical sciences?
Scaling Up 1: more, different data
• 25,000 texts from the first phase
of EEBO-TCP
• 1473 to 1700, 2m pages, 1b
words, public domain
• Little overlap with BL data
• We have global search of the BL
data working. Adding EEBO-TCP
will allow us to compare different
ingest issues
• Inform data service providers
about issues in using different
textual data sets
Scaling Up 2: More researchers, understanding needs
Scaling Up 3: making researchers into independent users
• Moving away from the “tame programmer in
the room”
• Building a set of reusable recipes
• Training A+H, SHS researchers and Librarians
to be able to run queries themselves
• Core set of fundamental queries that can be
tweaked by individual researchers to search for
unique terms
• By end of Phase 2: Have
researchers searching
successfully without the help of
programmers or data scientists
• In prep for Phase 3: where we
train others from the UK in the
set up and query of textual data
using existing HPC facilities.
Plan & Outputs
» Month 1: identify researchers. Ingest EEBO-TCP. Stress test existing queries, develop search templates
» Month 2: Training with core set of researchers to adopt and implement queries. Documentation and
development of training.
» Month 3: Independent Search workshops – software carpentry for A+H research computing
» Month 4: Reflection, write up, preparation of public facing materials that tell others how to do this.
» Fully documented on Github Repo
› https://github.com/UCL-dataspring
› Cluster code
› Raw results
› Visualisations
› User guides
» Publicly presented (will also set up dedicated blog, social media channels, etc in Phase 2)
» Submission of academic paper re project to leading conference
Funding
»Pitching for the whole £40,000
»We need adequate funding to pay for research
programmer to:
› set up the infrastructure for training
› Prepare training materials
› Ingest new data set
»Also, other staff time, data preservation costs, travel
between sites
»Full support from UCL in FEC
Phase 2: Make digitised books truly searchable
Not for the pitch, but please fill in
»Contact person: still Melissa Terras
»Social media presence - @melissaterras and @j_w_baker
14/07/15 Enabling Complex Analysis of Large Scale Digital Collections 22

More Related Content

What's hot

Introduction to an ICT based cross curricular resource for PGDE Geography
Introduction to an ICT based cross curricular resource for PGDE GeographyIntroduction to an ICT based cross curricular resource for PGDE Geography
Introduction to an ICT based cross curricular resource for PGDE Geography
EDINA
 
DMAOnline - data management administration online
DMAOnline - data management administration onlineDMAOnline - data management administration online
DMAOnline - data management administration online
Jisc
 
RDN Lightning talk - Open Research Leeds (@OpenResLeeds): networks, metrics a...
RDN Lightning talk - Open Research Leeds (@OpenResLeeds): networks, metrics a...RDN Lightning talk - Open Research Leeds (@OpenResLeeds): networks, metrics a...
RDN Lightning talk - Open Research Leeds (@OpenResLeeds): networks, metrics a...
Nick Sheppard
 
Using Open Research Data for Public Policy Making: Opportunities of Virtual R...
Using Open Research Data for Public Policy Making: Opportunities of Virtual R...Using Open Research Data for Public Policy Making: Opportunities of Virtual R...
Using Open Research Data for Public Policy Making: Opportunities of Virtual R...
Danube University Krems, Centre for E-Governance
 
Towards a Linked Data Publishing Methodology
Towards a Linked Data Publishing MethodologyTowards a Linked Data Publishing Methodology
Towards a Linked Data Publishing Methodology
Danube University Krems, Centre for E-Governance
 
Northumbria University case study
Northumbria University case studyNorthumbria University case study
Northumbria University case study
Jisc RDM
 
Software reuse, repurposing and reproducibility
Software reuse, repurposing and reproducibilitySoftware reuse, repurposing and reproducibility
Software reuse, repurposing and reproducibility
Jisc
 
DMPOnline by Sarah Jones
DMPOnline by Sarah JonesDMPOnline by Sarah Jones
DMPOnline by Sarah Jones
Jisc RDM
 
Open source database as a service (with data publishing)
Open source database as a service (with data publishing)Open source database as a service (with data publishing)
Open source database as a service (with data publishing)
Jisc
 
EOSC pilot STFC
EOSC pilot STFCEOSC pilot STFC
EOSC pilot STFC
Jisc RDM
 
FP7 Funded RI Project experiences: some overly honest tips from a project coo...
FP7 Funded RI Project experiences: some overly honest tips from a project coo...FP7 Funded RI Project experiences: some overly honest tips from a project coo...
FP7 Funded RI Project experiences: some overly honest tips from a project coo...
Vince Smith
 
Recognising data sharing
Recognising data sharingRecognising data sharing
Recognising data sharing
Jisc RDM
 
Jisc unleashing data 5 minutes
Jisc unleashing data 5 minutesJisc unleashing data 5 minutes
Jisc unleashing data 5 minutes
Daniela G. Duca
 
COBWEB Project: Citizens Observatories Side Event
COBWEB Project: Citizens Observatories Side EventCOBWEB Project: Citizens Observatories Side Event
COBWEB Project: Citizens Observatories Side Event
EDINA, University of Edinburgh
 
Engaging researchers in RDM & Open Data at Edinburgh University
Engaging researchers in RDM & Open Data at Edinburgh UniversityEngaging researchers in RDM & Open Data at Edinburgh University
Engaging researchers in RDM & Open Data at Edinburgh University
Robin Rice
 
Giving researchers credit for data
Giving researchers credit for dataGiving researchers credit for data
Giving researchers credit for data
Jisc
 
From Box to Hydra via Archivematica
From Box to Hydra via ArchivematicaFrom Box to Hydra via Archivematica
From Box to Hydra via Archivematica
Jisc RDM
 
Scholze goportis 4-11-14
Scholze goportis 4-11-14Scholze goportis 4-11-14
Scholze goportis 4-11-14
Karlsruhe Institute of Technology (KIT)
 
Scholze imcw 2014-11-25
Scholze imcw 2014-11-25Scholze imcw 2014-11-25
NHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeNHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-Life
Vince Smith
 

What's hot (20)

Introduction to an ICT based cross curricular resource for PGDE Geography
Introduction to an ICT based cross curricular resource for PGDE GeographyIntroduction to an ICT based cross curricular resource for PGDE Geography
Introduction to an ICT based cross curricular resource for PGDE Geography
 
DMAOnline - data management administration online
DMAOnline - data management administration onlineDMAOnline - data management administration online
DMAOnline - data management administration online
 
RDN Lightning talk - Open Research Leeds (@OpenResLeeds): networks, metrics a...
RDN Lightning talk - Open Research Leeds (@OpenResLeeds): networks, metrics a...RDN Lightning talk - Open Research Leeds (@OpenResLeeds): networks, metrics a...
RDN Lightning talk - Open Research Leeds (@OpenResLeeds): networks, metrics a...
 
Using Open Research Data for Public Policy Making: Opportunities of Virtual R...
Using Open Research Data for Public Policy Making: Opportunities of Virtual R...Using Open Research Data for Public Policy Making: Opportunities of Virtual R...
Using Open Research Data for Public Policy Making: Opportunities of Virtual R...
 
Towards a Linked Data Publishing Methodology
Towards a Linked Data Publishing MethodologyTowards a Linked Data Publishing Methodology
Towards a Linked Data Publishing Methodology
 
Northumbria University case study
Northumbria University case studyNorthumbria University case study
Northumbria University case study
 
Software reuse, repurposing and reproducibility
Software reuse, repurposing and reproducibilitySoftware reuse, repurposing and reproducibility
Software reuse, repurposing and reproducibility
 
DMPOnline by Sarah Jones
DMPOnline by Sarah JonesDMPOnline by Sarah Jones
DMPOnline by Sarah Jones
 
Open source database as a service (with data publishing)
Open source database as a service (with data publishing)Open source database as a service (with data publishing)
Open source database as a service (with data publishing)
 
EOSC pilot STFC
EOSC pilot STFCEOSC pilot STFC
EOSC pilot STFC
 
FP7 Funded RI Project experiences: some overly honest tips from a project coo...
FP7 Funded RI Project experiences: some overly honest tips from a project coo...FP7 Funded RI Project experiences: some overly honest tips from a project coo...
FP7 Funded RI Project experiences: some overly honest tips from a project coo...
 
Recognising data sharing
Recognising data sharingRecognising data sharing
Recognising data sharing
 
Jisc unleashing data 5 minutes
Jisc unleashing data 5 minutesJisc unleashing data 5 minutes
Jisc unleashing data 5 minutes
 
COBWEB Project: Citizens Observatories Side Event
COBWEB Project: Citizens Observatories Side EventCOBWEB Project: Citizens Observatories Side Event
COBWEB Project: Citizens Observatories Side Event
 
Engaging researchers in RDM & Open Data at Edinburgh University
Engaging researchers in RDM & Open Data at Edinburgh UniversityEngaging researchers in RDM & Open Data at Edinburgh University
Engaging researchers in RDM & Open Data at Edinburgh University
 
Giving researchers credit for data
Giving researchers credit for dataGiving researchers credit for data
Giving researchers credit for data
 
From Box to Hydra via Archivematica
From Box to Hydra via ArchivematicaFrom Box to Hydra via Archivematica
From Box to Hydra via Archivematica
 
Scholze goportis 4-11-14
Scholze goportis 4-11-14Scholze goportis 4-11-14
Scholze goportis 4-11-14
 
Scholze imcw 2014-11-25
Scholze imcw 2014-11-25Scholze imcw 2014-11-25
Scholze imcw 2014-11-25
 
NHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeNHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-Life
 

Similar to Enabling complex analysis of large scale digital collections

Digital research: Collections, data, tools and methods
Digital research: Collections, data, tools and methods Digital research: Collections, data, tools and methods
Digital research: Collections, data, tools and methods
Stella Wisdom
 
PARTHENOS Webinar: Boost Your eHumanities and eHeritage Research with Researc...
PARTHENOS Webinar: Boost Your eHumanities and eHeritage Research with Researc...PARTHENOS Webinar: Boost Your eHumanities and eHeritage Research with Researc...
PARTHENOS Webinar: Boost Your eHumanities and eHeritage Research with Researc...
Parthenos
 
Drowning in information – the need of macroscopes for research funding
Drowning in information – the need of macroscopes for research fundingDrowning in information – the need of macroscopes for research funding
Drowning in information – the need of macroscopes for research funding
Andrea Scharnhorst
 
2015 NISO Forum: The Future of Library Resource Discovery
2015 NISO Forum: The Future of Library Resource Discovery2015 NISO Forum: The Future of Library Resource Discovery
2015 NISO Forum: The Future of Library Resource Discovery
National Information Standards Organization (NISO)
 
Requirements for Open Sharing of Archaeological Research Data
Requirements for Open Sharing of Archaeological Research DataRequirements for Open Sharing of Archaeological Research Data
Requirements for Open Sharing of Archaeological Research Data
ariadnenetwork
 
Open sciencerefresher2019
Open sciencerefresher2019Open sciencerefresher2019
Open sciencerefresher2019
heila1
 
Introduction to Edinburgh University Data Library and national data services
Introduction to Edinburgh University Data Library and national data servicesIntroduction to Edinburgh University Data Library and national data services
Introduction to Edinburgh University Data Library and national data services
EDINA, University of Edinburgh
 
NORFest 2023 Lightning Talks Session Three
NORFest 2023 Lightning Talks Session Three NORFest 2023 Lightning Talks Session Three
NORFest 2023 Lightning Talks Session Three
dri_ireland
 
Enabling complex analysis of large scale digital collections
Enabling complex analysis of large scale digital collectionsEnabling complex analysis of large scale digital collections
Enabling complex analysis of large scale digital collections
Jisc
 
Introduction to the University Data Library and national data services
Introduction to the University Data Library and national data servicesIntroduction to the University Data Library and national data services
Introduction to the University Data Library and national data services
EDINA, University of Edinburgh
 
Open science / open research
Open science / open researchOpen science / open research
Open science / open research
heila1
 
The Developing Needs for e-infrastructures
The Developing Needs for e-infrastructuresThe Developing Needs for e-infrastructures
The Developing Needs for e-infrastructures
guest0dc425
 
Open Culture Data: Metrics and Community Building - Maarten Brinkerink
Open Culture Data: Metrics and Community Building - Maarten BrinkerinkOpen Culture Data: Metrics and Community Building - Maarten Brinkerink
Open Culture Data: Metrics and Community Building - Maarten Brinkerink
Digitised Manuscripts to Europeana
 
Winning Horizon 2020 with Open Science
Winning Horizon 2020 with Open ScienceWinning Horizon 2020 with Open Science
Winning Horizon 2020 with Open Science
Martin Donnelly
 
Ensuring Continuity of Access To Our Published Heritage
Ensuring Continuity of Access To Our Published HeritageEnsuring Continuity of Access To Our Published Heritage
Ensuring Continuity of Access To Our Published Heritage
EDINA, University of Edinburgh
 
Rebecca Grant DPASSH presentation 2015
Rebecca Grant DPASSH presentation 2015Rebecca Grant DPASSH presentation 2015
Rebecca Grant DPASSH presentation 2015
dri_ireland
 
Leading the library of the future: w(h)ither technical services?
Leading the library of the future: w(h)ither technical services?Leading the library of the future: w(h)ither technical services?
Leading the library of the future: w(h)ither technical services?
Keith Webster
 
Leading the library of the future: w(h)ither technical services?
Leading the library of the future: w(h)ither technical services?Leading the library of the future: w(h)ither technical services?
Leading the library of the future: w(h)ither technical services?
Keith Webster
 
Are Libraries Sustainable in a World of Free, Networked, Digital Information?
Are Libraries Sustainable in a World of Free, Networked, Digital Information?Are Libraries Sustainable in a World of Free, Networked, Digital Information?
Are Libraries Sustainable in a World of Free, Networked, Digital Information?
CSUC - Consorci de Serveis Universitaris de Catalunya
 
The Needs of stakeholders in the RDM process - the role of LEARN
The Needs of stakeholders in the RDM process - the role of LEARNThe Needs of stakeholders in the RDM process - the role of LEARN
The Needs of stakeholders in the RDM process - the role of LEARN
LEARN Project
 

Similar to Enabling complex analysis of large scale digital collections (20)

Digital research: Collections, data, tools and methods
Digital research: Collections, data, tools and methods Digital research: Collections, data, tools and methods
Digital research: Collections, data, tools and methods
 
PARTHENOS Webinar: Boost Your eHumanities and eHeritage Research with Researc...
PARTHENOS Webinar: Boost Your eHumanities and eHeritage Research with Researc...PARTHENOS Webinar: Boost Your eHumanities and eHeritage Research with Researc...
PARTHENOS Webinar: Boost Your eHumanities and eHeritage Research with Researc...
 
Drowning in information – the need of macroscopes for research funding
Drowning in information – the need of macroscopes for research fundingDrowning in information – the need of macroscopes for research funding
Drowning in information – the need of macroscopes for research funding
 
2015 NISO Forum: The Future of Library Resource Discovery
2015 NISO Forum: The Future of Library Resource Discovery2015 NISO Forum: The Future of Library Resource Discovery
2015 NISO Forum: The Future of Library Resource Discovery
 
Requirements for Open Sharing of Archaeological Research Data
Requirements for Open Sharing of Archaeological Research DataRequirements for Open Sharing of Archaeological Research Data
Requirements for Open Sharing of Archaeological Research Data
 
Open sciencerefresher2019
Open sciencerefresher2019Open sciencerefresher2019
Open sciencerefresher2019
 
Introduction to Edinburgh University Data Library and national data services
Introduction to Edinburgh University Data Library and national data servicesIntroduction to Edinburgh University Data Library and national data services
Introduction to Edinburgh University Data Library and national data services
 
NORFest 2023 Lightning Talks Session Three
NORFest 2023 Lightning Talks Session Three NORFest 2023 Lightning Talks Session Three
NORFest 2023 Lightning Talks Session Three
 
Enabling complex analysis of large scale digital collections
Enabling complex analysis of large scale digital collectionsEnabling complex analysis of large scale digital collections
Enabling complex analysis of large scale digital collections
 
Introduction to the University Data Library and national data services
Introduction to the University Data Library and national data servicesIntroduction to the University Data Library and national data services
Introduction to the University Data Library and national data services
 
Open science / open research
Open science / open researchOpen science / open research
Open science / open research
 
The Developing Needs for e-infrastructures
The Developing Needs for e-infrastructuresThe Developing Needs for e-infrastructures
The Developing Needs for e-infrastructures
 
Open Culture Data: Metrics and Community Building - Maarten Brinkerink
Open Culture Data: Metrics and Community Building - Maarten BrinkerinkOpen Culture Data: Metrics and Community Building - Maarten Brinkerink
Open Culture Data: Metrics and Community Building - Maarten Brinkerink
 
Winning Horizon 2020 with Open Science
Winning Horizon 2020 with Open ScienceWinning Horizon 2020 with Open Science
Winning Horizon 2020 with Open Science
 
Ensuring Continuity of Access To Our Published Heritage
Ensuring Continuity of Access To Our Published HeritageEnsuring Continuity of Access To Our Published Heritage
Ensuring Continuity of Access To Our Published Heritage
 
Rebecca Grant DPASSH presentation 2015
Rebecca Grant DPASSH presentation 2015Rebecca Grant DPASSH presentation 2015
Rebecca Grant DPASSH presentation 2015
 
Leading the library of the future: w(h)ither technical services?
Leading the library of the future: w(h)ither technical services?Leading the library of the future: w(h)ither technical services?
Leading the library of the future: w(h)ither technical services?
 
Leading the library of the future: w(h)ither technical services?
Leading the library of the future: w(h)ither technical services?Leading the library of the future: w(h)ither technical services?
Leading the library of the future: w(h)ither technical services?
 
Are Libraries Sustainable in a World of Free, Networked, Digital Information?
Are Libraries Sustainable in a World of Free, Networked, Digital Information?Are Libraries Sustainable in a World of Free, Networked, Digital Information?
Are Libraries Sustainable in a World of Free, Networked, Digital Information?
 
The Needs of stakeholders in the RDM process - the role of LEARN
The Needs of stakeholders in the RDM process - the role of LEARNThe Needs of stakeholders in the RDM process - the role of LEARN
The Needs of stakeholders in the RDM process - the role of LEARN
 

More from Jisc

Adobe Express Engagement Webinar (Delegate).pptx
Adobe Express Engagement Webinar (Delegate).pptxAdobe Express Engagement Webinar (Delegate).pptx
Adobe Express Engagement Webinar (Delegate).pptx
Jisc
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
Jisc
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
Jisc
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
Jisc
 
Jisc's value to HE: the University of Sheffield
Jisc's value to HE: the University of SheffieldJisc's value to HE: the University of Sheffield
Jisc's value to HE: the University of Sheffield
Jisc
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
Jisc
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
Jisc
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
Jisc
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
Jisc
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
Jisc
 
International students’ digital experience: understanding and mitigating the ...
International students’ digital experience: understanding and mitigating the ...International students’ digital experience: understanding and mitigating the ...
International students’ digital experience: understanding and mitigating the ...
Jisc
 
Digital Storytelling Community Launch!.pptx
Digital Storytelling Community Launch!.pptxDigital Storytelling Community Launch!.pptx
Digital Storytelling Community Launch!.pptx
Jisc
 
Open Access book publishing understanding your options (1).pptx
Open Access book publishing understanding your options (1).pptxOpen Access book publishing understanding your options (1).pptx
Open Access book publishing understanding your options (1).pptx
Jisc
 
Scottish Universities Press supporting authors with requirements for open acc...
Scottish Universities Press supporting authors with requirements for open acc...Scottish Universities Press supporting authors with requirements for open acc...
Scottish Universities Press supporting authors with requirements for open acc...
Jisc
 
How Bloomsbury is supporting authors with UKRI long-form open access requirem...
How Bloomsbury is supporting authors with UKRI long-form open access requirem...How Bloomsbury is supporting authors with UKRI long-form open access requirem...
How Bloomsbury is supporting authors with UKRI long-form open access requirem...
Jisc
 
Jisc Northern Ireland Strategy Forum 2023
Jisc Northern Ireland Strategy Forum 2023Jisc Northern Ireland Strategy Forum 2023
Jisc Northern Ireland Strategy Forum 2023
Jisc
 
Jisc Scotland Strategy Forum 2023
Jisc Scotland Strategy Forum 2023Jisc Scotland Strategy Forum 2023
Jisc Scotland Strategy Forum 2023
Jisc
 
Jisc stakeholder strategic update 2023
Jisc stakeholder strategic update 2023Jisc stakeholder strategic update 2023
Jisc stakeholder strategic update 2023
Jisc
 
JISC Presentation.pptx
JISC Presentation.pptxJISC Presentation.pptx
JISC Presentation.pptx
Jisc
 
Community-led Open Access Publishing webinar.pptx
Community-led Open Access Publishing webinar.pptxCommunity-led Open Access Publishing webinar.pptx
Community-led Open Access Publishing webinar.pptx
Jisc
 

More from Jisc (20)

Adobe Express Engagement Webinar (Delegate).pptx
Adobe Express Engagement Webinar (Delegate).pptxAdobe Express Engagement Webinar (Delegate).pptx
Adobe Express Engagement Webinar (Delegate).pptx
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
 
Jisc's value to HE: the University of Sheffield
Jisc's value to HE: the University of SheffieldJisc's value to HE: the University of Sheffield
Jisc's value to HE: the University of Sheffield
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
Jamworks pilot and AI at Jisc (20/03/2024)
Enabling complex analysis of large scale digital collections

  • 1. Research data spring: Enabling Complex Analysis of Large Scale Digital Collections, 14/7/2015. Lots of money has been spent digitising heritage collections. Digitised heritage collections are data. But non-computationally trained scholars don't know what to ask of large quantities of data. Often they do not have access to high performance computing facilities and they don’t know how to use them. We have addressed this fundamental problem by extending research data management processes in order to enable novel research in the arts, humanities, and social and historical sciences and a deeper understanding of emerging research needs. In our first phase, we have successfully implemented large scale, complex search of a digitised collection: now we scale up…
  • 2. More & more digitised content is in the public domain
  • 3. UK eScience infrastructure not used in the arts and humanities (A+H) or the social and historical sciences (SHS)
  • 4. Phase 1: take 64,000 British Library digitised books
  • 5. See how we can analyse them using UCL’s HPC
  • 6. Moving beyond restrictive basic searches
  • 7. Team: James Hetherington, Research Software Engineer
  • 8. Work with researchers 1: detect trends. Anne Welsh, Lecturer in Library and Information Studies, UCL. Interested in the growth of professions in the Victorian era. Needs to be able to run AND, OR, NOT and AND NOT Boolean queries, which are beyond the capabilities of current large-scale digitisation search functions.
  • 9. Work with researchers 2: compare data sources. Oliver Duke-Williams, Lecturer in Digital Information Studies, UCL. Interested in the history of demographics and health data. Can we track the prevalence of diseases in the corpus, and do they relate to known epidemics, using existing data?
  • 10. Deaths in England, 1838 and 1839: measles 6,514 and 10,937; whooping cough 9,107 and 8,165; consumption 59,025 and 59,559. Cholera: first outbreak in UK 1831-2, c. 55,000 deaths; 1848-49, 53,293 deaths (England); 1853-54, c. 11,000 UK deaths ('John Snow / Broad Street pump' epidemic); 1863, East London, c. 6,000 deaths.
  • 11. Work with researchers 3: visualise content. Will Finley, PhD Student, History, University of Sheffield. Interested in the history of printed book illustration, 1750-1850. How can we analyse and visualise how the size and placing of illustrations in the corpus change over time?
  • 12. All outputs documented on GitHub » https://github.com/UCL-dataspring
  • 13. Including all code, recipes, & visualisations
  • 14. Explained in a series of blog posts » http://britishlibrary.typepad.co.uk/digital-scholarship/2015/07/turning-research-questions-into-computational-queries.html http://bit.ly/dataspring
  • 15. Overview » Not a Research Project » Not an API » Not replicating existing search facilities » How can we provide access to data and compute? » What are the technical issues in using eScience infrastructure for cultural and heritage datasets? » How can we train people in the A+H, and Libraries, to use this? » How can we scale this up across the arts and humanities, and social and historical sciences?
  • 16. Scaling Up 1: more, different data • 25,000 texts from the first phase of EEBO-TCP • 1473 to 1700, 2m pages, 1b words, public domain • Little overlap with BL data • We have global search of the BL data working. Adding EEBO-TCP will allow us to compare different ingest issues • Inform data service providers about issues in using different textual data sets
  • 17. Scaling Up 2: More researchers, understanding needs
  • 18. Scaling Up 3: making researchers into independent users • Moving away from the “tame programmer in the room” • Building a set of reusable recipes • Training A+H, SHS researchers and Librarians to be able to run queries themselves • Core set of fundamental queries that can be tweaked by individual researchers to search for unique terms • By end of Phase 2: have researchers searching successfully without the help of programmers or data scientists • In prep for Phase 3: where we train others from the UK in the set-up and querying of textual data using existing HPC facilities.
  • 19. Plan & Outputs » Month 1: Identify researchers. Ingest EEBO-TCP. Stress test existing queries, develop search templates » Month 2: Training with core set of researchers to adopt and implement queries. Documentation and development of training. » Month 3: Independent Search workshops: software carpentry for A+H research computing » Month 4: Reflection, write-up, preparation of public-facing materials that tell others how to do this. » Fully documented on GitHub repo › https://github.com/UCL-dataspring › Cluster code › Raw results › Visualisations › User guides » Publicly presented (will also set up dedicated blog, social media channels, etc. in Phase 2) » Submission of academic paper re project to leading conference
  • 20. Funding » Pitching for the whole £40,000 » We need adequate funding to pay for a research programmer to: › set up the infrastructure for training › prepare training materials › ingest new data set » Also, other staff time, data preservation costs, travel between sites » Full support from UCL in FEC
  • 21. Phase 2: Make digitised books truly searchable
  • 22. Not for the pitch, but please fill in » Contact person: still Melissa Terras » Social media presence: @melissaterras and @j_w_baker
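
The Boolean queries asked for on slide 8 (AND, OR, NOT, AND NOT) can be sketched as set operations over an inverted index. This is only a minimal illustration under stated assumptions, not the project's cluster code: the three-book toy corpus is invented, and the helper names `tokenize`, `build_index` and `term` are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(books):
    """Map each word to the set of book ids that contain it."""
    index = defaultdict(set)
    for book_id, text in books.items():
        for word in tokenize(text):
            index[word].add(book_id)
    return index


# Invented toy corpus (assumption): book id -> full text.
books = {
    "b1": "The surgeon and the physician met the engineer.",
    "b2": "An engineer surveyed the railway.",
    "b3": "The physician wrote about cholera.",
}
index = build_index(books)
all_ids = set(books)


def term(word):
    """Return the set of books mentioning the word (empty if none)."""
    return index.get(word.lower(), set())


# Each Boolean operator corresponds to one set operation.
engineer_and_physician = term("engineer") & term("physician")      # AND
engineer_or_physician = term("engineer") | term("physician")       # OR
not_engineer = all_ids - term("engineer")                          # NOT
physician_and_not_engineer = term("physician") - term("engineer")  # AND NOT
```

Because every result is an ordinary set of book ids, the operators compose freely, so a researcher could combine several terms in one expression without extra machinery.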

Editor's Notes

  2. Data for all diseases, normalised by number of words. Notes: 'consumption' is an interesting test case for later. By looking at frequencies of proximate words, can we make a good guess as to whether any given reference is to consumption as a disease (the word was used as a common name for a form of tuberculosis), or just the word 'consumption'? To do so we need a reference set of word frequency data, but that can be confidently built by looking at proximate frequencies for the word 'tuberculosis'. Cholera is the most interesting set of results here. The image shows major UK outbreaks; other outbreaks were occurring in the rest of the world at other times, with the first outbreak c. 1817 in Bengal. There are pronounced spikes in the 1870s and 1880s; these are not associated with UK epidemics, but there were outbreaks in the US and elsewhere. It would be interesting to look more closely at these later clusters. Other diseases: there is some apparent relationship, with more mentions of 'consumption' than of measles or whooping cough, and more deaths, BUT this is an unfiltered use of the word 'consumption'. Data sources: deaths in England from Chadwick, E. (1842) The Sanitary Conditions of the Labouring Population; cholera deaths from various sources, mostly Wall, A. J. (1893) Asiatic cholera: its history, pathology, and modern treatment, plus some not-fully-cited narratives via Google.
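
The two ideas in the note above, normalising mentions by number of words and counting proximate words around a term, can be sketched roughly as follows. This is a hedged illustration only: the sample sentences are invented rather than taken from the corpus, and the function names `normalised_frequency` and `context_counts` and the `window` parameter are hypothetical.

```python
import re
from collections import Counter


def words_of(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def normalised_frequency(records, term):
    """Return {year: mentions of term / total words published that year}."""
    mentions = Counter()
    total_words = Counter()
    for year, text in records:
        words = words_of(text)
        total_words[year] += len(words)
        mentions[year] += words.count(term)
    return {
        year: mentions[year] / total_words[year]
        for year in sorted(total_words)
        if total_words[year] > 0
    }


def context_counts(text, target, window=2):
    """Count the words appearing within `window` words of each target hit."""
    words = words_of(text)
    counts = Counter()
    for i, word in enumerate(words):
        if word == target:
            start = max(0, i - window)
            neighbours = words[start:i] + words[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts


# Invented sample records (assumption): (publication year, text).
records = [
    (1848, "cholera spread through the town"),
    (1848, "the harvest was good"),
    (1849, "cholera cholera everywhere"),
]
freq = normalised_frequency(records, "cholera")

# Proximate words around 'consumption' in an invented sentence.
ctx = context_counts("he died of consumption and fever", "consumption")
```

Comparing the `context_counts` profile of 'consumption' against the profile built for 'tuberculosis' is one possible way to approach the disambiguation the note proposes; the comparison measure itself is left open here.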