The Past, Present and Future of
Digital Scholarship with
Newspaper Collections
DH2019, Utrecht, July 2019
The Past, Present and Future of Digital
Scholarship with Newspaper Collections
• Short Project Presentations:
• Living with Machines
• impresso - Media Monitoring of the Past
• Building the digitisation of periodical collections together with users
(NewsEye)
• Overview Papers
• Digital Editions of Serials and media historians: an overview
• Towards a Critical Framework for Digital Newspaper Scholarship
• Q&A
Our Partners Our Funders
Living with Machines
Dr Mia Ridge, British Library, Co-Investigator
Paper authors/project team: Mia Ridge, Giovanni Colavizza, with Ruth Ahnert, Claire
Austin, David Beavan, Kaspar Beelen, Mariona Coll Ardanuy, Adam Farquhar, Emma
Griffin, James Hetherington, Jon Lawrence, Katie McDonough, Barbara McGillivray,
André Piza, Daniel van Strien, Giorgia Tolfo, Alan Wilson, Daniel Wilson.
Project vision
• We aim to facilitate new historical findings about the impact of
technology on the lives of ordinary people during the Industrial
Revolution / long nineteenth century (c. 1780 – 1918)
Or
• Applying new methods to questions about the past to explore the
future of collaboration between data science, history and digital
humanities
Or
• Challenging library professionals, data scientists and historians to
‘radically collaborate’ and learn from and with each other
Why newspapers?
• Large digitised corpus available if requested
• Opportunity to tackle the challenges of working at scale:
operational, methodological, organisational
• Suitable for developing innovative computational models, tools,
code, data and infrastructure reusable by other scholars and
research projects
The British Newspaper Archive
• Nearly 33 million newspaper pages
• Site by Findmypast Limited in commercial partnership with the
British Library
• BL Labs previously facilitated access for researchers to
JISC-funded digitised newspapers
British Library newspapers and periodicals
• The British Library holds 60 million issues (450 million pages, 34,000 titles)
from the 17th century to today
• Majority UK/Irish (Legal Deposit from 1869), but also overseas,
especially USA, India, Africa
• New digitisation through ‘Heritage Made Digital’ and Living with
Machines projects
• 6.8% digitised (July 2019)
But what’s actually
available digitally?
Courtesy Yann Ryan @lievesofgrass and @BL_MadeDigital
Copyright ‘safe date’ discussions are ongoing
and... complicated
Our early work with newspapers
Research questions tackled across various Labs include:
• How bad is the OCR, really? And what effect does that have on
computational linguistic and nominal linkage methods?
• Can digitising newspaper directories help us understand the
difference in political and religious affiliations (etc.) between the
overall potential corpus and what’s currently been digitised?
• Can we use crowdsourcing tasks to reliably gather information
about industrial accidents? Can we then use the results to train
machine learning tools to find accidents at scale?
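One way to make the first question concrete is to compare OCR output against a hand-corrected transcription. The sketch below is illustrative only and is not the project's method: it computes a character error rate (edit distance divided by reference length) and a simple keyword recall, so that the two can be set side by side. The function names and the sample sentences are invented for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def character_error_rate(reference: str, ocr: str) -> float:
    """Edit distance divided by the length of the reference transcription."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, ocr) / len(reference)


def keyword_recall(reference: str, ocr: str, keyword: str) -> float:
    """Share of keyword occurrences in the reference that survive in the OCR text."""
    expected = reference.lower().count(keyword.lower())
    if expected == 0:
        raise ValueError("keyword does not occur in the reference")
    found = ocr.lower().count(keyword.lower())
    return min(found, expected) / expected


# Invented sample text, not taken from any real newspaper page.
reference = "The boiler explosion injured two workers. A second explosion followed."
ocr_text = "The boiler exp1osion injured two workers. A second explosion followed."

cer = character_error_rate(reference, ocr_text)
recall = keyword_recall(reference, ocr_text, "explosion")
print(f"CER: {cer:.3f}, keyword recall: {recall:.2f}")
```

In this toy case a single substituted character gives a character error rate under 2%, yet a search for "explosion" finds only one of the two mentions. That gap between a headline accuracy figure and what a keyword search actually retrieves is the kind of effect the question points at.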
Ongoing questions
• To what extent does ‘convenience’ in digitisation and the quest for
geographical coverage affect scholarship?
• Copyright dates, short vs long runs, microfilm vs hard copy
• How do we show the impact of OCR quality on both keyword
searches and data processing at scale?
• What kinds of derived datasets would be useful to researchers?
• Planning for legacy: how do we integrate the results of entity recognition
and similar processing into discovery systems? How do we ensure interoperability?
• We can share public domain but not potentially copyrighted pages
– what effect does that have on user experience?
• How do we reconcile different ideas about ‘outputs’?
Thank you!
Living with Machines @LivingWMachines
Sneak preview and newsletter signup:
http://livingwithmachines.ac.uk/
Dividing the work into ‘Labs’
• Sources - showing the biases in the collection and processing of sources
• Language - applying approaches from computational linguistics to corpora
including newspapers and novels
• Space and time - combining census data and event-based records to
understand urban change with spatial and temporal analyses
• Communities - a meta lab, amplifying results and engaging the public in
meaningful crowdsourcing that contributes to the project's research
• 3I (Integration, infrastructure and interfaces) - connects the IT infrastructure
with work done in the other labs and vice-versa, thinking about computational
processes and integration of data science.
• Data acquisition and wrangling - managing practical aspects of data ingest
including rights and data management

Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 

Recently uploaded (20)

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 

Digital Scholarship with Newspaper Collections

  • 1. The Past, Present and Future of Digital Scholarship with Newspaper Collections DH2019, Utrecht, July 2019
  • 2. The Past, Present and Future of Digital Scholarship with Newspaper Collections
      • Short Project Presentations:
          • Living with Machines
          • impresso - Media Monitoring of the Past
          • Construire avec les usagers la numérisation des collections de périodiques (NewsEye)
      • Overview Papers
          • Digital Editions of Serials and media historians: an overview
          • Towards a Critical Framework for Digital Newspaper Scholarship
      • Q&A
  • 3. Our Partners, Our Funders
      Living with Machines
      Dr Mia Ridge, British Library, Co-Investigator
      Paper authors/project team: Mia Ridge, Giovanni Colavizza, with Ruth Ahnert, Claire Austin, David Beavan, Kaspar Beelens, Mariona Coll Ardanuy, Adam Farquhar, Emma Griffin, James Hetherington, Jon Lawrence, Katie McDonough, Barbara McGillivray, André Piza, Daniel van Strien, Giorgia Tolfo, Alan Wilson, Daniel Wilson.
  • 4. Project vision
      • We aim to facilitate new historical findings about the impact of technology on the lives of ordinary people during the Industrial Revolution / long nineteenth century (c. 1780 – 1918)
      Or
      • Applying new methods to questions about the past to explore the future of collaboration between data science, history and digital humanities
      Or
      • Challenging library professionals, data scientists and historians to ‘radically collaborate’ and learn from and with each other
  • 5. Why newspapers?
      • Large digitised corpus available if requested
      • Opportunity to tackle the challenges of working at scale: operational, methodological, organisational
      • Suitable for developing innovative computational models, tools, code, data and infrastructure reusable by other scholars and research projects
  • 6. The British Newspaper Archive
      • Nearly 33 million newspaper pages
      • Site by Findmypast Limited in commercial partnership with the British Library
      • BL Labs previously facilitated access for researchers to JISC-funded digitised newspapers
  • 7. British Library newspapers and periodicals
      • British Library has 60m issues (450 million pages, 34,000 titles) from 17thC to today
      • Majority UK/Irish (Legal Deposit from 1869), but also overseas esp. USA, India, Africa
      • New digitisation through ‘Heritage Made Digital’ and Living with Machines projects
      • 6.8% digitised (July 2019)
  • 9. Courtesy Yann Ryan @lievesofgrass and @BL_MadeDigital
  • 10. Copyright ‘safe date’ discussions are on-going and... complicated
  • 11. Our early work with newspapers
      Research questions tackled across various Labs include:
      • How bad is the OCR, really? And what effect does that have on computational linguistic and nominal linkage methods?
      • Can digitising newspaper directories help us understand the difference in political and religious affiliations (etc.) between the overall potential corpus and what’s currently been digitised?
      • Can we use crowdsourcing tasks to reliably gather information about industrial accidents? Can we then use the results to train machine learning tools to find accidents at scale?
  • 12. Ongoing questions
      • To what extent does ‘convenience’ in digitisation and the quest for geographical coverage affect scholarship?
          • Copyright dates, short vs long runs, microfilm vs hard copy
      • How do we show the impact of OCR quality on both keyword searches and data processing at scale?
      • What kinds of derived datasets would be useful to researchers?
      • Planning for legacy: how do we integrate entity recognition etc. results into discovery systems? How do we ensure interoperability?
      • We can share public domain but not potentially copyrighted pages – what effect does that have on user experience?
      • How do we reconcile different ideas about ‘outputs’?
  • 13. Thank you!
      Living with Machines
      @LivingWMachines
      Sneak preview and newsletter signup: http://livingwithmachines.ac.uk/
  • 14. The Past, Present and Future of Digital Scholarship with Newspaper Collections
      • Short Project Presentations:
          • Living with Machines
          • impresso - Media Monitoring of the Past
          • Construire avec les usagers la numérisation des collections de périodiques (NewsEye)
      • Overview Papers
          • Digital Editions of Serials and media historians: an overview
          • Towards a Critical Framework for Digital Newspaper Scholarship
      • Q&A
  • 15. Dividing the work into ‘Labs’
      • Sources - showing the biases in the collection and processing of sources
      • Language - combining approaches from computational linguistics to corpora including newspapers and novels
      • Space and time - combining census data and event-based records to understand urban change with spatial and temporal analyses
      • Communities - a meta lab, amplifying results and engaging the public in meaningful crowdsourcing that contributes to the project's research
      • 3I (Integration, infrastructure and interfaces) - connects the IT infrastructure with work done in the other labs and vice-versa, thinking about computational processes and integration of data science
      • Data acquisition and wrangling - managing practical aspects of data ingest including rights and data management

Editor's Notes

  1. 3 half hour sections
  2. There are a few different ways to think about the goals of the project.
  3. Conveniently already had lots digitised; allowed us to tackle questions of scale and truly break new ground (‘new’ allowing for all the other projects!)
  4. Many names of researchers will be familiar to DH audiences
  5. Our dates differ from those of FMP (Findmypast), which has different relationships with newspaper publishers and can work to a later date
  6. Will we be able to link people, places etc. to identifiers at scale?
  7. 3 half hour sections