Representation and Absence in Digital Resources: The Case of Europeana Newspapers

Alastair Dunning, The European Library, @alastairdunning
Clemens Neudecker, National Library of the Netherlands, @cneudecker
DH2014, Lausanne
Source: Europeana Strategic Plan, 2015-2020, currently unpublished. See also Enumerate Project, enumerate.eu
The estimated total cost of digitising the collections of Europe’s museums, archives and libraries, including the audiovisual material they hold, is approximately €100bn, or €10bn per annum for the next 10 years, factoring in a cumulative efficiency gain of 0.5% per annum.
The Research & Development budget for the Joint Strike Fighter programme is estimated at €40.34bn.
It would cost between 10% and 40% of the Joint Strike Fighter R&D budget to digitise every eligible title in Europe’s libraries.
Source: Nick Poole, Collections Trust, http://nickpoole.org.uk/wp-content/uploads/2011/12/digiti_report.pdf
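To make the comparison concrete, the short Python sketch below converts the quoted 10% to 40% range into euros and compares the €100bn estimate with the JSF R&D figure. It uses only the figures on the slide; the constant names are illustrative, and it does not attempt to reproduce the efficiency-gain model behind the €100bn estimate.

```python
# Figures quoted on the slide, in billions of euros.
JSF_RND_BUDGET_BN = 40.34
TOTAL_DIGITISATION_BN = 100.0


def share_of_budget(fraction: float, budget_bn: float = JSF_RND_BUDGET_BN) -> float:
    """Return `fraction` of the budget, in billions of euros."""
    return fraction * budget_bn


low_bn = share_of_budget(0.10)   # lower bound of the quoted range (libraries only)
high_bn = share_of_budget(0.40)  # upper bound of the quoted range (libraries only)
ratio = TOTAL_DIGITISATION_BN / JSF_RND_BUDGET_BN  # all sectors vs JSF R&D

print(f"Libraries only: EUR {low_bn:.2f}bn to EUR {high_bn:.2f}bn")
print(f"All sectors vs JSF R&D budget: {ratio:.2f}x")
```

On these figures, digitising every eligible library title would cost roughly €4.03bn to €16.14bn, while the all-sector estimate is about 2.5 times the JSF R&D budget.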
Currently: 2 million pages of full text
By 2015: 10 million pages of full text
Search by keyword, and organise results by language, date, source library and title
Link: http://www.theeuropeanlibrary.org/tel4/newspapers
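Keyword search with results organised by language, date, library and title is a faceted search. The sketch below shows the idea on invented toy records; the `Page` fields and the `search`/`facet_counts` helpers are hypothetical and are not the portal's actual schema or implementation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Page:
    """One full-text page; these fields are assumptions, not the portal's schema."""
    text: str
    language: str
    year: int
    library: str
    title: str


def search(pages, keyword):
    """Return pages whose OCR text contains `keyword` (case-insensitive)."""
    kw = keyword.lower()
    return [p for p in pages if kw in p.text.lower()]


def facet_counts(hits, field):
    """Count hits per value of one facet, e.g. 'language' or 'library'."""
    return Counter(getattr(p, field) for p in hits)


# Invented toy records, for illustration only.
pages = [
    Page("Conférence de Versailles", "fr", 1919, "Library A", "Title A"),
    Page("Het verdrag van Versailles", "nl", 1919, "Library B", "Title B"),
    Page("Versailles und die Folgen", "de", 1920, "Library C", "Title C"),
    Page("Markt und Wetter", "de", 1920, "Library C", "Title C"),
]
hits = search(pages, "versailles")
print(facet_counts(hits, "language"))
print(facet_counts(hits, "year"))
```

Substring matching is the simplest possible choice here; a production archive would rely on a search engine with tokenisation and language-aware analysis.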
Currently: metadata records relating to 1.12m issues
By 2015: metadata records relating to up to 4m issues
Browse by date or map
Link: http://www.theeuropeanlibrary.org/tel4/newspapers
Full text from the following libraries:
• Bibliothèque nationale de France / National Library of France
• Koninklijke Bibliotheek / National Library of the Netherlands
• Landesbibliothek Dr. Friedrich Teßmann / Teßmann Library
• Eesti Rahvusraamatukogu / Estonian National Library
• Kansalliskirjasto / National Library of Finland
• Latvijas Nacionālā bibliotēka / National Library of Latvia
• Biblioteka Narodowa / National Library of Poland
• Milli Kütüphane Başkanlığı / National Library of Turkey
• Österreichische Nationalbibliothek / Austrian National Library
• Staatsbibliothek zu Berlin / Berlin State Library
• Staats- und Universitätsbibliothek Hamburg / Hamburg State and University Library
• Univerzitet u Beogradu / University Library of Belgrade
Searching by title
Issue-level records from the following libraries:
• National Library of Wales
• St. Cyril and Methodius National Library / National Library of Bulgaria
• National Library of the Czech Republic
• National and University Library in Zagreb
• Koninklijke Bibliotheek van België / Bibliothèque royale de Belgique
• Narodna in univerzitetna knjižnica / National and University Library of Slovenia
• National Library of Portugal
• National Library of Romania
• Landsbókasafn Íslands - Háskólabókasafn / National and University Library of Iceland
• National Library of Spain
• Bibliothèque nationale de Luxembourg / National Library of Luxembourg
Finding matching results in single or multiple issues
Highlighting search terms
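Finding matches across one or several issues and highlighting the search terms can be sketched as below. This is a minimal illustration, assuming pages arrive as `(issue_id, page_text)` pairs; the issue identifiers and the `**` marker are hypothetical, and this is not how The European Library's interface is implemented.

```python
import re
from collections import defaultdict


def highlight(text, term, marker="**"):
    """Wrap each case-insensitive occurrence of `term`, keeping its original case."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return pattern.sub(lambda m: f"{marker}{m.group(0)}{marker}", text)


def hits_by_issue(pages, term):
    """Group highlighted matching pages under their (hypothetical) issue identifier."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    grouped = defaultdict(list)
    for issue_id, text in pages:
        if pattern.search(text):
            grouped[issue_id].append(highlight(text, term))
    return dict(grouped)


issue_pages = [
    ("issue-1", "Storm hits the harbour"),
    ("issue-1", "Harbour repairs begin"),
    ("issue-2", "Election results"),
]
print(hits_by_issue(issue_pages, "harbour"))
```

`re.escape` keeps user input such as `C++` or `St.` from being interpreted as regular-expression syntax.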
So far, okay: the functionality is similar to that of other national and regional digital newspaper libraries.
See other archives via: https://www.google.com/maps/ms?msid=217164746645697066594.0004c3d764fcb71ed2314&msa=0
But what was the user response to an aggregation of European newspaper libraries?
Results of usability testing: http://www.europeana-newspapers.eu/wp-content/uploads/2014/05/The-European-Library-Newspaper-Archive-Usability-testing-Report-April-2014.pdf
Source: http://www.nytimes.com/2007/03/10/business/yourmoney/11archive.html
“Many saying they would be keen to return to the site as the content expands.”
“Ability to search over geographic map was highly valued”
Plenty of quibbles about design:
- position of advanced options
- re-ordering the list of results
- manipulating facets
Much greater expectations of functionality once logged in, for example:
- saved searches
- new content notification
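Saved searches and new-content notification go together: the system remembers a query and reports only results the user has not yet seen. The sketch below assumes a hypothetical `SavedSearch` model and a plain dictionary standing in for the archive's index; a real service would query its search engine and send alerts by e-mail or feed.

```python
from dataclasses import dataclass, field


@dataclass
class SavedSearch:
    """A stored query plus the IDs of results already shown (hypothetical model)."""
    query: str
    seen_ids: set = field(default_factory=set)

    def new_results(self, index):
        """Return IDs of matching documents not yet seen, then mark them as seen.

        `index` maps document ID -> text; it stands in for a real search backend.
        """
        q = self.query.lower()
        matches = {doc_id for doc_id, text in index.items() if q in text.lower()}
        fresh = sorted(matches - self.seen_ids)
        self.seen_ids |= matches
        return fresh


index = {"a1": "Zeppelin over Paris", "a2": "Harvest report"}
saved = SavedSearch("zeppelin")
first = saved.new_results(index)   # initial run of the saved search
index["a3"] = "Zeppelin lands in Berlin"  # new content is added to the archive
second = saved.new_results(index)  # only the new match is reported
third = saved.new_results(index)   # nothing new since the last check
```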
“Much of the value of the site to participants was provided by the images of the documents. Participants expected to be able to save a 'local' copy once they had located content of relevance. As no download facility is provided, this led to some frustration and undermined the overall potential value of the site for some participants.”
Timetable for the rest of the project:
Now - prototype version of the interface shared with the project
Throughout 2014 - ongoing creation of OCR and related technical work (OLR, named entities)
Throughout 2014 - live version of the website improved, usability testing, content added
Autumn 2014 - final project conference
Late 2014 - newspaper browser completed with content and tools from the project
More information at http://www.europeana-newspapers.eu/
Interface at http://www.theeuropeanlibrary.org/tel4/newspapers/
Things the users didn’t say (but we thought they would)
Why can’t I edit the text?
(Our sample was researchers; perhaps other communities are more interested in crowdsourcing?)
Note: if time permits, The European Library will develop a crowdsourcing feature.
Number of digitised pages in the interface: c.2m
Number of digitised pages in European libraries: c.130m
Number of physical pages in European libraries: 1.5bn+
Source: European Newspaper Survey Report, http://www.europeana-newspapers.eu/wp-content/uploads/2012/04/D4.1-Europeana-newspapers-survey-report.pdf
Quantities of newspapers: a) in the project, b) digitised in total, c) in physical libraries (source: European Newspaper Survey Report)
The project digital library is only a fraction of the newspaper archive of the continent, indeed of the world.
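The size of that fraction follows directly from the survey figures above. The sketch below computes it; because the physical total is given as "1.5bn+", the two shares measured against it are upper bounds, and the "c." figures make every result approximate.

```python
# Page counts as quoted from the European Newspaper Survey Report.
PAGES_IN_INTERFACE = 2_000_000
PAGES_DIGITISED = 130_000_000
PAGES_PHYSICAL = 1_500_000_000  # "1.5bn+", so a lower bound


def percent(part, whole):
    """Express `part` as a percentage of `whole`."""
    return 100 * part / whole


interface_of_digitised = percent(PAGES_IN_INTERFACE, PAGES_DIGITISED)
digitised_of_physical = percent(PAGES_DIGITISED, PAGES_PHYSICAL)
interface_of_physical = percent(PAGES_IN_INTERFACE, PAGES_PHYSICAL)

print(f"about {interface_of_digitised:.1f}% of digitised pages are in the interface")
print(f"at most {digitised_of_physical:.1f}% of physical pages are digitised")
print(f"at most {interface_of_physical:.2f}% of physical pages are in the interface")
```

That is roughly 1.5% of digitised pages, and at most about 0.13% of physical pages, reachable through the interface.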
As libraries, how should we represent that absence to users?
Should such absence be represented in the interface itself?
Vast white spaces in the list of results?
… It is difficult to represent ‘archival gaps’ given how little has been digitised: the result would be a needle in a haystack …
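One way to surface a single title's archival gaps is to compare its digitised issue dates with its expected publication run. The sketch below assumes a regular schedule (one issue per `step`, daily by default), which many real titles do not follow, and it needs the publication run to be known, which is often exactly the missing information. The function name and the dates are hypothetical.

```python
from datetime import date, timedelta


def missing_ranges(first_issue, last_issue, digitised, step=timedelta(days=1)):
    """Return (start, end) ranges of expected issue dates with no digitised copy."""
    gaps = []
    gap_start = None
    prev = None
    day = first_issue
    while day <= last_issue:
        if day not in digitised and gap_start is None:
            gap_start = day                 # a gap opens
        elif day in digitised and gap_start is not None:
            gaps.append((gap_start, prev))  # the gap closed on the previous expected date
            gap_start = None
        prev = day
        day += step
    if gap_start is not None:               # the run ends inside a gap
        gaps.append((gap_start, prev))
    return gaps


run_start, run_end = date(1914, 8, 1), date(1914, 8, 10)
held = {date(1914, 8, d) for d in (1, 2, 3, 6, 7, 10)}
print(missing_ranges(run_start, run_end, held))
```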
Standardised information for every digital resource, describing its collections, extent of content, licensing and re-use conditions
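What such standardised information might contain can be sketched as a small record type. The field names below are hypothetical and do not come from any published standard; the point is only that extent, licensing and re-use could be stated in a machine-readable form that an interface (or a third party) could then display alongside search results.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CollectionDescription:
    """Hypothetical minimal record; field names are illustrative, not a standard."""
    collection: str
    holding_institution: str
    physical_pages: int
    digitised_pages: int
    licence: str           # e.g. a rights-statement or licence URI
    reuse_conditions: str

    def digitised_share(self) -> float:
        """Fraction of the physical collection that has been digitised."""
        return self.digitised_pages / self.physical_pages if self.physical_pages else 0.0


record = CollectionDescription(
    collection="Example newspaper collection",
    holding_institution="Example Library",
    physical_pages=1_000_000,
    digitised_pages=250_000,
    licence="https://creativecommons.org/publicdomain/mark/1.0/",
    reuse_conditions="Free re-use",
)
print(json.dumps(asdict(record), indent=2))
print(record.digitised_share())
```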
Standardised information? For every digital resource produced in the world? Are you kidding?
Charts and graphs external to the interface?
Graphs are the most obvious way of adding context, but they are still very reliant on the library producing such charts.
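A minimal sketch of such a chart, assuming only that a publication year is available for each digitised page: it counts pages per decade and prints a plain-text bar chart. Empty decades are printed explicitly, so gaps in coverage show up instead of silently disappearing. The helper names and the sample years are hypothetical.

```python
from collections import Counter


def pages_per_decade(page_years):
    """Count digitised pages per decade from a list of publication years."""
    return Counter((year // 10) * 10 for year in page_years)


def text_chart(counts, width=40):
    """Render counts as text bars, including empty decades between first and last."""
    peak = max(counts.values())
    lines = []
    for decade in range(min(counts), max(counts) + 10, 10):
        n = counts.get(decade, 0)
        bar = "#" * round(width * n / peak)
        lines.append(f"{decade}s |{bar} {n}")
    return "\n".join(lines)


years = [1871, 1875, 1882, 1883, 1884, 1889, 1914, 1915]
chart = text_chart(pages_per_decade(years), width=8)
print(chart)
```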
How to derive a representative (random) sample from a digital collection?
Source: http://dilbert.com/strips/comic/2001-10-25/
Pieter Francois, winner of the BL Labs competition 2013: “How representative are the historical texts humanities scholars study of the overall body of ‘surviving’ texts that are held in the various library collections?”
labs.bl.uk/Sample+Generator
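One common way to draw such a sample is stratified random sampling: split the collection into strata (here, decades) and sample at random within each. This is a swapped-in illustration of the general technique, not the method of the BL Labs Sample Generator, and the issue records are invented. Note that it can only be representative of the digitised collection, which is itself not representative of the physical holdings.

```python
import random
from collections import defaultdict


def stratified_sample(items, key, per_stratum, seed=None):
    """Draw up to `per_stratum` items at random from each stratum defined by `key`."""
    rng = random.Random(seed)  # seeded generator, so a sample can be reproduced
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)
    sample = []
    for stratum in sorted(strata):
        members = strata[stratum]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


def by_decade(issue):
    """Stratum key: the decade of the issue's (invented) publication year."""
    return issue[1] // 10 * 10


issues = [(f"issue-{i}", 1850 + i) for i in range(60)]
sample = stratified_sample(issues, by_decade, per_stratum=3, seed=42)
print(sample)
```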
There are other issues in the project content too.
Major issues:
• OCR quality varies
• Licensing statements differ from country to country
• Copyright date boundaries differ in each country
There are other issues in the interface too.
Minor issues:
• Some pages (2m by 2015) have article segmentation
• Some library content has named entity extraction, affecting search results
Source: http://homepages.inf.ed.ac.uk/balex/publications/slides-DATeCH.pdf
10M pages, 7 billion words: how much are you actually ignoring when using only the “good” OCR?
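The slide's figures average about 700 words per page (7 billion words over 10 million pages). How much text a quality filter discards depends on how confidence is distributed across pages; the sketch below measures it, assuming each page carries a word count and a single OCR confidence score between 0 and 1. The data and the `excluded_share` helper are hypothetical.

```python
def excluded_share(pages, threshold):
    """Share of words lost when keeping only pages with OCR confidence >= threshold.

    `pages` is a list of (word_count, ocr_confidence) pairs.
    """
    total = sum(words for words, _ in pages)
    kept = sum(words for words, conf in pages if conf >= threshold)
    return (total - kept) / total if total else 0.0


# Invented pages: (word count, OCR confidence).
sample_pages = [(800, 0.95), (700, 0.82), (650, 0.61), (900, 0.45)]
print(f"{excluded_share(sample_pages, 0.8):.1%} of words excluded at threshold 0.8")
```

In this toy set, keeping only pages at or above 0.8 discards just over half of the words, which is the kind of figure users would need in order to judge what a "good OCR only" search leaves out.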
How can we give users better ways to understand the digital library?
What role can the API play in this?
Can opening up the data in the digital library allow it to be explored in different ways?
Traditional model: Interface (created by library) built on Data (published by library)
With an API: Interface (created by third party) built on Data (published by library)
API: Application Programming Interface
Pioneering work of the Trove API (or rather of Tim Sherratt)
Trove Newspapers statistics, developed by a third party based on data provided by the library: http://wraggelabs.com/shed/trove/graphs/
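In the API model the library publishes the data and the third party does the counting. A minimal sketch of that division of labour: parse a search response and tally results per year. The JSON shape (`results`, `title`, `date`) is invented for illustration and is not the schema of the Trove API or of The European Library.

```python
import json
from collections import Counter

# A hypothetical API response; real APIs define their own schemas.
RESPONSE = """
{"results": [
  {"title": "Title A", "date": "1901-05-02"},
  {"title": "Title B", "date": "1901-07-19"},
  {"title": "Title A", "date": "1902-01-11"}
]}
"""


def articles_per_year(response_text):
    """Count returned articles per year from a JSON search response."""
    records = json.loads(response_text)["results"]
    return Counter(int(r["date"][:4]) for r in records)


print(articles_per_year(RESPONSE))
```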
Headline Roulette, developed by a third party based on data provided by the library: http://wraggelabs.com/shed/headline-roulette/
Word Count of Articles, developed by a third party based on data provided by the library: http://dhistory.org/frontpages/53/words/
Sounds great! But …?
How many people in this audience would know how to build an interface on top of an API?
How many users do you know who could build on top of an API?
Credits
Desert: https://www.flickr.com/photos/aigle_dore/5952236932/sizes/l
Borges Sign: https://www.flickr.com/photos/monceau/7705020640/
Map: http://gallica.bnf.fr/ark:/12148/btv1b530299707
Strike Fighter: http://en.wikipedia.org/wiki/Strike_fighter
1 of 55

Recommended

Europeana Newspapers - by
Europeana Newspapers - Europeana Newspapers -
Europeana Newspapers - TU Delft, Netherlands
4.8K views48 slides
The Great Twentieth-Century Hole Or, what the Digital Humanities Miss by
The Great Twentieth-Century Hole Or, what the Digital Humanities MissThe Great Twentieth-Century Hole Or, what the Digital Humanities Miss
The Great Twentieth-Century Hole Or, what the Digital Humanities MissTU Delft, Netherlands
4.6K views26 slides
Europeana in a Research Context by
Europeana in a Research ContextEuropeana in a Research Context
Europeana in a Research ContextTU Delft, Netherlands
2.5K views24 slides
Digitised historic newspapers in Europe by
Digitised historic newspapers in EuropeDigitised historic newspapers in Europe
Digitised historic newspapers in EuropeTU Delft, Netherlands
2.7K views30 slides
eluxemburgensia: the portal for Luxembourg's historic newspapers by
eluxemburgensia: the portal for Luxembourg's historic newspaperseluxemburgensia: the portal for Luxembourg's historic newspapers
eluxemburgensia: the portal for Luxembourg's historic newspapersEuropeana Newspapers
1.7K views12 slides
British Library Labs Presentation at Ed Tech Hackathon 2013 - hackathoncentra... by
British Library Labs Presentation at Ed Tech Hackathon 2013 - hackathoncentra...British Library Labs Presentation at Ed Tech Hackathon 2013 - hackathoncentra...
British Library Labs Presentation at Ed Tech Hackathon 2013 - hackathoncentra...labsbl
571 views16 slides

More Related Content

What's hot

Challenges and solutions in creating a european historic newspapers browser by
Challenges and solutions in creating a european historic newspapers browser Challenges and solutions in creating a european historic newspapers browser
Challenges and solutions in creating a european historic newspapers browser Europeana Newspapers
1.3K views22 slides
British Library Labs - Open University Presentation - 3 April 2014, 1100-1200 by
British Library Labs - Open University Presentation - 3 April 2014, 1100-1200British Library Labs - Open University Presentation - 3 April 2014, 1100-1200
British Library Labs - Open University Presentation - 3 April 2014, 1100-1200labsbl
871 views40 slides
Enriching Cultural Heritage Data with DBpedia by
Enriching Cultural Heritage Data with DBpediaEnriching Cultural Heritage Data with DBpedia
Enriching Cultural Heritage Data with DBpediaAntoine Isaac
2.1K views19 slides
EuropeanaTech update - Europeana AGM 2015 by
EuropeanaTech update - Europeana AGM 2015EuropeanaTech update - Europeana AGM 2015
EuropeanaTech update - Europeana AGM 2015Antoine Isaac
1.3K views20 slides
BHL-Europe_MINERVA_20111116_hrainer by
BHL-Europe_MINERVA_20111116_hrainerBHL-Europe_MINERVA_20111116_hrainer
BHL-Europe_MINERVA_20111116_hrainerHeimo Rainer
1.1K views35 slides
Wikidata, a target for Europeana's semantic strategy - GLAM-WIKI 2015 by
Wikidata, a target for Europeana's semantic strategy - GLAM-WIKI 2015Wikidata, a target for Europeana's semantic strategy - GLAM-WIKI 2015
Wikidata, a target for Europeana's semantic strategy - GLAM-WIKI 2015Antoine Isaac
2K views29 slides

What's hot(20)

Challenges and solutions in creating a european historic newspapers browser by Europeana Newspapers
Challenges and solutions in creating a european historic newspapers browser Challenges and solutions in creating a european historic newspapers browser
Challenges and solutions in creating a european historic newspapers browser
British Library Labs - Open University Presentation - 3 April 2014, 1100-1200 by labsbl
British Library Labs - Open University Presentation - 3 April 2014, 1100-1200British Library Labs - Open University Presentation - 3 April 2014, 1100-1200
British Library Labs - Open University Presentation - 3 April 2014, 1100-1200
labsbl871 views
Enriching Cultural Heritage Data with DBpedia by Antoine Isaac
Enriching Cultural Heritage Data with DBpediaEnriching Cultural Heritage Data with DBpedia
Enriching Cultural Heritage Data with DBpedia
Antoine Isaac2.1K views
EuropeanaTech update - Europeana AGM 2015 by Antoine Isaac
EuropeanaTech update - Europeana AGM 2015EuropeanaTech update - Europeana AGM 2015
EuropeanaTech update - Europeana AGM 2015
Antoine Isaac1.3K views
BHL-Europe_MINERVA_20111116_hrainer by Heimo Rainer
BHL-Europe_MINERVA_20111116_hrainerBHL-Europe_MINERVA_20111116_hrainer
BHL-Europe_MINERVA_20111116_hrainer
Heimo Rainer1.1K views
Wikidata, a target for Europeana's semantic strategy - GLAM-WIKI 2015 by Antoine Isaac
Wikidata, a target for Europeana's semantic strategy - GLAM-WIKI 2015Wikidata, a target for Europeana's semantic strategy - GLAM-WIKI 2015
Wikidata, a target for Europeana's semantic strategy - GLAM-WIKI 2015
Antoine Isaac2K views
Multilingual challenges in Europeana by Antoine Isaac
Multilingual challenges in EuropeanaMultilingual challenges in Europeana
Multilingual challenges in Europeana
Antoine Isaac1.9K views
Use Cases From Digital Humanities for Library Linked Data by Nuno Freire
Use Cases From Digital Humanities for Library Linked DataUse Cases From Digital Humanities for Library Linked Data
Use Cases From Digital Humanities for Library Linked Data
Nuno Freire965 views
Stiller & Király, Multilinguality of Metadata by Péter Király
Stiller & Király, Multilinguality of MetadataStiller & Király, Multilinguality of Metadata
Stiller & Király, Multilinguality of Metadata
Péter Király722 views
How to read a million books? by cneudecker
How to read a million books?How to read a million books?
How to read a million books?
cneudecker593 views
Europeana and Schema.org - DC2013 by Antoine Isaac
Europeana and Schema.org - DC2013Europeana and Schema.org - DC2013
Europeana and Schema.org - DC2013
Antoine Isaac2.4K views
British Library Labs - Presentation at the University of Nottingham - Digital... by labsbl
British Library Labs - Presentation at the University of Nottingham - Digital...British Library Labs - Presentation at the University of Nottingham - Digital...
British Library Labs - Presentation at the University of Nottingham - Digital...
labsbl1.3K views
Charper.lawdi.20130531 by charper
Charper.lawdi.20130531Charper.lawdi.20130531
Charper.lawdi.20130531
charper971 views
Future Directions of the European Library by Alastair Dunning
Future Directions of the European LibraryFuture Directions of the European Library
Future Directions of the European Library
Alastair Dunning713 views
A portrait of Europeana as a Linked Open Data case by Antoine Isaac
A portrait of Europeana as a Linked Open Data caseA portrait of Europeana as a Linked Open Data case
A portrait of Europeana as a Linked Open Data case
Antoine Isaac2.9K views
Europeana, more than data aggregation? by Antoine Isaac
Europeana, more than data aggregation?Europeana, more than data aggregation?
Europeana, more than data aggregation?
Antoine Isaac2.2K views
Open ONI and IIIF: NDNP data in an IIIF Viewer by Karen Estlund
Open ONI and IIIF: NDNP data in an IIIF ViewerOpen ONI and IIIF: NDNP data in an IIIF Viewer
Open ONI and IIIF: NDNP data in an IIIF Viewer
Karen Estlund526 views

Similar to Representation and Absence in Digital Resources: The Case of Europeana Newspapers

LIBER, Europeana and the Europeana Newspapers Project by
LIBER, Europeana and the Europeana Newspapers ProjectLIBER, Europeana and the Europeana Newspapers Project
LIBER, Europeana and the Europeana Newspapers ProjectEuropeana Newspapers
598 views25 slides
LIBER, Europeana and the Europeana Newspapers Project by
LIBER, Europeana and the Europeana Newspapers ProjectLIBER, Europeana and the Europeana Newspapers Project
LIBER, Europeana and the Europeana Newspapers ProjectLIBER Europe
528 views25 slides
The European(a) Newspapers Project by
The European(a) Newspapers ProjectThe European(a) Newspapers Project
The European(a) Newspapers ProjectEuropeana Newspapers
878 views17 slides
You've Digitised. What Next ? by
You've Digitised. What Next ?You've Digitised. What Next ?
You've Digitised. What Next ?TU Delft, Netherlands
525 views26 slides
You’ve Digitised Your Collection. What Next ? by
You’ve Digitised Your Collection. What Next ?You’ve Digitised Your Collection. What Next ?
You’ve Digitised Your Collection. What Next ?The European Library
531 views26 slides
ENP Belgrade Workshop Project Overview by
ENP Belgrade Workshop Project OverviewENP Belgrade Workshop Project Overview
ENP Belgrade Workshop Project OverviewEuropeana Newspapers
1.4K views14 slides

Similar to Representation and Absence in Digital Resources: The Case of Europeana Newspapers (20)

LIBER, Europeana and the Europeana Newspapers Project by Europeana Newspapers
LIBER, Europeana and the Europeana Newspapers ProjectLIBER, Europeana and the Europeana Newspapers Project
LIBER, Europeana and the Europeana Newspapers Project
LIBER, Europeana and the Europeana Newspapers Project by LIBER Europe
LIBER, Europeana and the Europeana Newspapers ProjectLIBER, Europeana and the Europeana Newspapers Project
LIBER, Europeana and the Europeana Newspapers Project
LIBER Europe528 views
What's up, Europeana Newspapers? by cneudecker
What's up, Europeana Newspapers?What's up, Europeana Newspapers?
What's up, Europeana Newspapers?
cneudecker424 views
GI2012 pekarek-liber by IGN Vorstand
GI2012 pekarek-liberGI2012 pekarek-liber
GI2012 pekarek-liber
IGN Vorstand672 views
Alastair Dunning, The successes of the Europeana Libraries project, The Europ... by The European Library
Alastair Dunning, The successes of the Europeana Libraries project, The Europ...Alastair Dunning, The successes of the Europeana Libraries project, The Europ...
Alastair Dunning, The successes of the Europeana Libraries project, The Europ...
Europeana. A Digital Library for the Humanities? by AubreyMcFato
Europeana. A Digital Library for the Humanities?Europeana. A Digital Library for the Humanities?
Europeana. A Digital Library for the Humanities?
AubreyMcFato388 views
Europeana Cloud: The Essential Facts by LIBER Europe
Europeana Cloud: The Essential FactsEuropeana Cloud: The Essential Facts
Europeana Cloud: The Essential Facts
LIBER Europe2.5K views
“Virtual Communities in Europe: the cultural mix and how the European Library... by bridgingworlds2008
“Virtual Communities in Europe: the cultural mix and how the European Library...“Virtual Communities in Europe: the cultural mix and how the European Library...
“Virtual Communities in Europe: the cultural mix and how the European Library...
bridgingworlds20082.3K views
The Europeana Newspapers Presentation - Cyberspace 2012 by Europeana Newspapers
The Europeana Newspapers Presentation - Cyberspace 2012The Europeana Newspapers Presentation - Cyberspace 2012
The Europeana Newspapers Presentation - Cyberspace 2012
Europeana Cloud - Alastair Dunning - November 2013 by Europeana
Europeana Cloud - Alastair Dunning - November 2013Europeana Cloud - Alastair Dunning - November 2013
Europeana Cloud - Alastair Dunning - November 2013
Europeana3K views
Designing a multilingual knowledge graph - DCMI2018 by Antoine Isaac
Designing a multilingual knowledge graph - DCMI2018Designing a multilingual knowledge graph - DCMI2018
Designing a multilingual knowledge graph - DCMI2018
Antoine Isaac609 views
Alastair Dunning, Europeana Cloud: The Project and the Challenges of Assessin... by The European Library
Alastair Dunning, Europeana Cloud: The Project and the Challenges of Assessin...Alastair Dunning, Europeana Cloud: The Project and the Challenges of Assessin...
Alastair Dunning, Europeana Cloud: The Project and the Challenges of Assessin...

More from TU Delft, Netherlands

The Landscape of Research Data Management by
The Landscape of Research Data Management The Landscape of Research Data Management
The Landscape of Research Data Management TU Delft, Netherlands
454 views22 slides
Winning the Tour de France, Research Data and Data Stewardship by
Winning the Tour de France, Research Data and Data StewardshipWinning the Tour de France, Research Data and Data Stewardship
Winning the Tour de France, Research Data and Data StewardshipTU Delft, Netherlands
1.6K views26 slides
Europeana and Researchers by
Europeana and ResearchersEuropeana and Researchers
Europeana and ResearchersTU Delft, Netherlands
535 views22 slides
Introduction to eCloud by
Introduction to eCloudIntroduction to eCloud
Introduction to eCloudTU Delft, Netherlands
914 views41 slides
Short Presentation on Europeana Cloud at Europeana AGM 2013 by
Short Presentation on Europeana Cloud at Europeana AGM 2013Short Presentation on Europeana Cloud at Europeana AGM 2013
Short Presentation on Europeana Cloud at Europeana AGM 2013TU Delft, Netherlands
475 views39 slides
Presentation on Europeana Cloud at Internet Librarian Conference 2013 by
Presentation on Europeana Cloud at Internet Librarian Conference 2013Presentation on Europeana Cloud at Internet Librarian Conference 2013
Presentation on Europeana Cloud at Internet Librarian Conference 2013TU Delft, Netherlands
493 views18 slides

More from TU Delft, Netherlands(16)

Winning the Tour de France, Research Data and Data Stewardship by TU Delft, Netherlands
Winning the Tour de France, Research Data and Data StewardshipWinning the Tour de France, Research Data and Data Stewardship
Winning the Tour de France, Research Data and Data Stewardship
Short Presentation on Europeana Cloud at Europeana AGM 2013 by TU Delft, Netherlands
Short Presentation on Europeana Cloud at Europeana AGM 2013Short Presentation on Europeana Cloud at Europeana AGM 2013
Short Presentation on Europeana Cloud at Europeana AGM 2013
Presentation on Europeana Cloud at Internet Librarian Conference 2013 by TU Delft, Netherlands
Presentation on Europeana Cloud at Internet Librarian Conference 2013Presentation on Europeana Cloud at Internet Librarian Conference 2013
Presentation on Europeana Cloud at Internet Librarian Conference 2013
Challenges and Solutions in Creating a European Historic newspapers Browser by TU Delft, Netherlands
Challenges and Solutions in Creating a European Historic newspapers Browser Challenges and Solutions in Creating a European Historic newspapers Browser
Challenges and Solutions in Creating a European Historic newspapers Browser
Europeana Cloud Work Package 1: Assessing Researchers' Needs in the Cloud by TU Delft, Netherlands
Europeana Cloud Work Package 1: Assessing Researchers' Needs in the CloudEuropeana Cloud Work Package 1: Assessing Researchers' Needs in the Cloud
Europeana Cloud Work Package 1: Assessing Researchers' Needs in the Cloud
A general introduction to the Europeana Cloud project by TU Delft, Netherlands
A general introduction to the Europeana Cloud project A general introduction to the Europeana Cloud project
A general introduction to the Europeana Cloud project

Recently uploaded

Drama KS5 Breakdown by
Drama KS5 BreakdownDrama KS5 Breakdown
Drama KS5 BreakdownWestHatch
87 views2 slides
Psychology KS5 by
Psychology KS5Psychology KS5
Psychology KS5WestHatch
103 views5 slides
Collective Bargaining and Understanding a Teacher Contract(16793704.1).pptx by
Collective Bargaining and Understanding a Teacher Contract(16793704.1).pptxCollective Bargaining and Understanding a Teacher Contract(16793704.1).pptx
Collective Bargaining and Understanding a Teacher Contract(16793704.1).pptxCenter for Integrated Training & Education
94 views57 slides
Ch. 8 Political Party and Party System.pptx by
Ch. 8 Political Party and Party System.pptxCh. 8 Political Party and Party System.pptx
Ch. 8 Political Party and Party System.pptxRommel Regala
53 views11 slides
Narration lesson plan by
Narration lesson planNarration lesson plan
Narration lesson planTARIQ KHAN
59 views11 slides
Monthly Information Session for MV Asterix (November) by
Monthly Information Session for MV Asterix (November)Monthly Information Session for MV Asterix (November)
Monthly Information Session for MV Asterix (November)Esquimalt MFRC
58 views26 slides

Recently uploaded(20)

Drama KS5 Breakdown by WestHatch
Drama KS5 BreakdownDrama KS5 Breakdown
Drama KS5 Breakdown
WestHatch87 views
Psychology KS5 by WestHatch
Psychology KS5Psychology KS5
Psychology KS5
WestHatch103 views
Ch. 8 Political Party and Party System.pptx by Rommel Regala
Ch. 8 Political Party and Party System.pptxCh. 8 Political Party and Party System.pptx
Ch. 8 Political Party and Party System.pptx
Rommel Regala53 views
Narration lesson plan by TARIQ KHAN
Narration lesson planNarration lesson plan
Narration lesson plan
TARIQ KHAN59 views
Monthly Information Session for MV Asterix (November) by Esquimalt MFRC
Monthly Information Session for MV Asterix (November)Monthly Information Session for MV Asterix (November)
Monthly Information Session for MV Asterix (November)
Esquimalt MFRC58 views
EIT-Digital_Spohrer_AI_Intro 20231128 v1.pptx by ISSIP
EIT-Digital_Spohrer_AI_Intro 20231128 v1.pptxEIT-Digital_Spohrer_AI_Intro 20231128 v1.pptx
EIT-Digital_Spohrer_AI_Intro 20231128 v1.pptx
ISSIP379 views
The basics - information, data, technology and systems.pdf by JonathanCovena1
The basics - information, data, technology and systems.pdfThe basics - information, data, technology and systems.pdf
The basics - information, data, technology and systems.pdf
JonathanCovena1126 views
Use of Probiotics in Aquaculture.pptx by AKSHAY MANDAL
Use of Probiotics in Aquaculture.pptxUse of Probiotics in Aquaculture.pptx
Use of Probiotics in Aquaculture.pptx
AKSHAY MANDAL104 views
CUNY IT Picciano.pptx by apicciano
CUNY IT Picciano.pptxCUNY IT Picciano.pptx
CUNY IT Picciano.pptx
apicciano54 views
Education and Diversity.pptx by DrHafizKosar
Education and Diversity.pptxEducation and Diversity.pptx
Education and Diversity.pptx
DrHafizKosar177 views
Solar System and Galaxies.pptx by DrHafizKosar
Solar System and Galaxies.pptxSolar System and Galaxies.pptx
Solar System and Galaxies.pptx
DrHafizKosar94 views
Classification of crude drugs.pptx by GayatriPatra14
Classification of crude drugs.pptxClassification of crude drugs.pptx
Classification of crude drugs.pptx
GayatriPatra1492 views
When Sex Gets Complicated: Porn, Affairs, & Cybersex by Marlene Maheu
When Sex Gets Complicated: Porn, Affairs, & CybersexWhen Sex Gets Complicated: Porn, Affairs, & Cybersex
When Sex Gets Complicated: Porn, Affairs, & Cybersex
Marlene Maheu73 views
7 NOVEL DRUG DELIVERY SYSTEM.pptx by Sachin Nitave
7 NOVEL DRUG DELIVERY SYSTEM.pptx7 NOVEL DRUG DELIVERY SYSTEM.pptx
7 NOVEL DRUG DELIVERY SYSTEM.pptx
Sachin Nitave61 views

Representation and Absence in Digital Resources: The Case of Europeana Newspapers

  • 1. Representation and Absence in Digital Resources: The Case of Europeana Newspapers Alastair Dunning, The European Library, @alastairdunning Clemens Neudecker, National Library of Netherlands, @cneudecker DH2014, Lausanne
  • 4. Source: Europeana Strategic Plan, 2015-2020, currently unpublished. See also Enumerate Project, enumerate.eu
  • 5. The estimated total cost of digitising the collections of Europe’s museums, archives and libraries, including the audiovisual material they hold is approximately €100bn, or €10bn per annum for the next 10 years, factoring in a cumulative efficiency gain of 0.5% per annum. The Research & Development Budget for the Joint Strike Fighter programme is estimated at €40.34bn. It would cost between 10% and 40% of the Joint Strike Fighter R&D budget to digitise every eligible title in Europe’s librariesSource: Nick Poole, Collections Trust, http://nickpoole.org.uk/wp- content/uploads/2011/12/digiti_repor t.pdf
  • 7. Currently: 2 million pages of full text By 2015: 10 million pages of full text Searching by keyword, and organise by language, date, source library, title Link: http://www.theeuropeanlibrary.org/tel4/newspapers
  • 8. Currently: Metadata records relating to 1.12m issues By 2015: Metadata records relating to up to 4m issues - Browse by date or map Link: http://www.theeuropeanlibrary.org/tel4/newspapers
  • 9. Full Text from following libraries •Bibliotheque nationale de France / National Library France •Koninklijke Bibliotheek / National Library of the Netherlands •Landesbibliothek Dr. Friedrich Teßmann / Teßmann Library •Eesti Rahvusraamatukogu / Estonian National Library • Kansalliskirjasto / National Library of Finland • Latvijas Nacionala Biblioteka / National Library of Latvia •Biblioteka Narodowa / National Library of Poland •Milli Kutuphane Baskanligi / National Library of Turkey • Österreichische Nationalbibliothek / Austrian National Library •Staatsbibliothek zu Berlin / Berlin State Library •Staats- und Universitätsbibliothek Hamburg / State and University Library • Univerzitet u Beogradu / University Library of Belgrade Searching by title
  • 10. Issue Level Records from following libraries •National Library of Wales •St. Cyril and Methodius National Library / The National Library of Bulgaria •National Library of Czech Republic •National and University Library in Zagreb •Koninklijke Bibliotheek van België / Bibliothèque royale de Belgique •Narodna in univerzitetna knjinica / National and University Library of Slovenia •National Library of Portugal •National Library of Romania •Landsbókasafn Íslands - Háskólabókasafn / National and Univeristy Library of Iceland National Library of Spain •Bibliothèque nationale de Luxembourg / National Library of Luxembourg Finding matching results in single or multiple issues
  • 12. So far, okay. Similar functionality to other national and regional digital libraries of newspapers See other archives via: https://www.google.com/maps/ms?msid=217164746645697066594.0004c3d764fcb71ed2 314&msa=0
  • 13. But what was the user response to an aggregation of European newspaper libraries ? Results of Usability Testing: http://www.europeana-newspapers.eu/wp-content/uploads/2014/05/The-European- Library-Newspaper-Archive-Usability-testing-Report-April-2014.pdf
  • 15. “Many saying they would be keen to return to the site as the content expands.”
  • 16. “Ability to search over geographic map was highly valued”
  • 17. Plenty of quibbles about design: position of advanced options, re-ordering the list of results, manipulating facets
  • 18. Much greater expectations of functionality once logged in, for example saved searches and new-content notifications
  • 19. “Much of the value of the site to participants was provided by the images of the documents. Participants expected to be able to save a 'local' copy once they had located content of relevance. As no download facility is provided, this led to some frustration and undermined the overall potential value of the site for some participants.”
  • 20. Timetable for the rest of the project. Now: prototype version of interface shared with project. Throughout 2014: ongoing creation of OCR and other related technical work (OLR, named entities). Throughout 2014: live version of website improved, usability testing, added content. Autumn 2014: final project conference. Late 2014: newspaper browser completed with content and tools from project. More information at http://www.europeana-newspapers.eu/; interface at http://www.theeuropeanlibrary.org/tel4/newspapers/
  • 21. Things the users didn’t say (but we thought they would)
  • 22. Why can’t I edit the text? (Our sample was researchers; perhaps other communities are more interested in crowdsourcing?) Note: if time permits, The European Library will develop a crowdsourcing feature.
  • 23. Source: Europeana Strategic Plan, 2015-2020, currently unpublished. See also Enumerate Project, enumerate.eu
  • 24. Number of digitised pages in interface: c.2m. Number of digitised pages in European libraries: c.130m. Number of physical pages in European libraries: 1.5bn+. Source: European Newspaper Survey Report, http://www.europeana-newspapers.eu/wp-content/uploads/2012/04/D4.1-Europeana-newspapers-survey-report.pdf
  • 25. Quantities of newspapers: a) in the project, b) digitised in total, c) in physical libraries. Source: European Newspaper Survey Report, http://www.europeana-newspapers.eu/wp-content/uploads/2012/04/D4.1-Europeana-newspapers-survey-report.pdf
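The size of the gap can be made concrete with a little arithmetic on the three approximate figures quoted on the slide. The sketch treats the rounded figures ("c.2m", "c.130m", "1.5bn+") as exact, which is an assumption; because "1.5bn+" is a lower bound, any share computed against physical pages is an upper bound.

```python
# Approximate page counts quoted on the slide (European Newspaper Survey Report).
# Treating these rounded figures as exact is an assumption; "1.5bn+" is a lower bound.
PAGES_IN_INTERFACE = 2_000_000
PAGES_DIGITISED = 130_000_000
PAGES_PHYSICAL = 1_500_000_000


def share(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


if __name__ == "__main__":
    print(f"Interface / digitised: {share(PAGES_IN_INTERFACE, PAGES_DIGITISED):.2f}%")
    print(f"Interface / physical:  {share(PAGES_IN_INTERFACE, PAGES_PHYSICAL):.2f}%")
    print(f"Digitised / physical:  {share(PAGES_DIGITISED, PAGES_PHYSICAL):.2f}%")
```

On these figures the interface holds roughly 1.54% of the pages already digitised in Europe and at most about 0.13% of the physical pages; everything digitised so far amounts to under 9% of the physical archive.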
  • 26. The project digital library is only a fraction of the newspaper archive of the continent, indeed the world
  • 27. As libraries, how should we represent that absence to users?
  • 28. Should such absence be represented in the interface itself?
  • 30. … It is difficult to represent ‘archival gaps’ when seen in the context of how little has been digitised: it creates a needle-in-a-haystack problem …
  • 31. The estimated total cost of digitising the collections of Europe’s museums, archives and libraries, including the audiovisual material they hold, is approximately €100bn, or €10bn per annum for the next 10 years, factoring in a cumulative efficiency gain of 0.5% per annum. The Research & Development budget for the Joint Strike Fighter programme is estimated at €40.34bn. It would cost between 10% and 40% of the Joint Strike Fighter R&D budget to digitise every eligible title in Europe’s libraries. Source: Nick Poole, Collections Trust, http://nickpoole.org.uk/wp-content/uploads/2011/12/digiti_report.pdf
  • 32. Standardised information for every digital resource, describing its collections, extent of content, licensing and re-use conditions
  • 33. Standardised information? For every digital resource produced in the world? Are you kidding?
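Two pieces of arithmetic sit behind this slide. For the €10bn-a-year figure, one plausible reading of "a cumulative efficiency gain of 0.5% per annum", assumed here rather than taken from the source, is that each year costs 0.5% less than the year before; the 10% to 40% range can then be converted into euros using the €40.34bn figure. The function names are illustrative.

```python
ANNUAL_COST_BN = 10.0     # € bn in year one (from the slide)
YEARS = 10
EFFICIENCY_GAIN = 0.005   # 0.5% per annum; compounding year on year is an assumption
JSF_RND_BN = 40.34        # Joint Strike Fighter R&D budget, € bn (from the slide)


def total_cost(annual: float, years: int, gain: float) -> float:
    """Sum yearly costs, each year `gain` cheaper than the previous one (assumed model)."""
    return sum(annual * (1 - gain) ** year for year in range(years))


def budget_range(low: float, high: float, budget: float) -> tuple[float, float]:
    """Convert a fractional range of a budget (e.g. 0.10 to 0.40) into € bn."""
    return low * budget, high * budget


if __name__ == "__main__":
    print(f"Ten-year total: €{total_cost(ANNUAL_COST_BN, YEARS, EFFICIENCY_GAIN):.1f}bn")
    low, high = budget_range(0.10, 0.40, JSF_RND_BN)
    print(f"10% to 40% of JSF R&D: €{low:.1f}bn to €{high:.1f}bn")
```

Under that assumption the ten-year total is about €97.8bn, consistent with "approximately €100bn", and the 10% to 40% range corresponds to roughly €4.0bn to €16.1bn for Europe's libraries.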
  • 34. Charts and graphs external to the interface ?
  • 35. Graphs are the most obvious way of adding context, but they still rely heavily on the library producing such charts
  • 36. How to derive a representative (random) sample from a digital collection? Source: http://dilbert.com/strips/comic/2001-10-25/
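One common answer is stratified random sampling: split the collection into strata and draw from each in proportion to its size, so the sample keeps the collection's overall shape. The sketch assumes records are dicts with a hypothetical `year` field and stratifies by decade; title, language or source library would work the same way.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, n, seed=None):
    """Draw roughly `n` records, allocating draws to strata in proportion to their size.

    `records` is a list of dicts; `key` maps a record to its stratum (e.g. decade).
    Because per-stratum sizes are rounded, the result can differ from `n` by a few records.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    sample = []
    for _, members in sorted(strata.items(), key=lambda item: item[0]):
        k = min(len(members), round(n * len(members) / len(records)))
        sample.extend(rng.sample(members, k))  # simple random sample within the stratum
    return sample


if __name__ == "__main__":
    # Hypothetical collection: 900 issues from the 1890s and 100 from the 1900s.
    collection = [{"id": i, "year": 1890 + i % 10} for i in range(900)]
    collection += [{"id": 900 + i, "year": 1900 + i % 10} for i in range(100)]
    picked = stratified_sample(collection, key=lambda r: r["year"] // 10 * 10, n=50, seed=1)
    print(len(picked))  # 50: 45 from the 1890s, 5 from the 1900s
```

Proportional allocation only mirrors the digital collection; it says nothing about whether that collection reflects what was originally printed.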
  • 37. Pieter Francois, winner of the BL Labs competition 2013: “How representative are the historical texts humanities scholars study of the overall body of ‘surviving’ texts that are held in the various library collections?” labs.bl.uk/Sample+Generator
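One simple way to put this question into numbers, assuming per-period page counts are available for both the digitised collection and the physical holdings, is to compare the two distributions period by period and summarise the difference with the total variation distance (0 means the same shape, 1 means no overlap). The counts in the example are invented for illustration only.

```python
def proportions(counts):
    """Normalise a {stratum: count} mapping so the values sum to 1."""
    total = sum(counts.values())
    return {stratum: value / total for stratum, value in counts.items()}


def total_variation_distance(a_counts, b_counts):
    """Half the sum of absolute differences between two normalised distributions (0 to 1)."""
    a, b = proportions(a_counts), proportions(b_counts)
    strata = set(a) | set(b)
    return 0.5 * sum(abs(a.get(s, 0.0) - b.get(s, 0.0)) for s in strata)


def coverage_by_stratum(digitised, physical):
    """Fraction of each stratum's physical pages that has been digitised."""
    return {s: digitised.get(s, 0) / physical[s] for s in physical}


if __name__ == "__main__":
    # Invented counts (thousands of pages per period), for illustration only.
    physical = {1850: 400, 1900: 400, 1950: 200}
    digitised = {1850: 10, 1900: 30, 1950: 0}
    print(total_variation_distance(digitised, physical))  # approximately 0.35
    print(coverage_by_stratum(digitised, physical))       # {1850: 0.025, 1900: 0.075, 1950: 0.0}
```

Coverage per period shows where the absences are; the single distance figure summarises how skewed the digitised sample is overall.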
  • 38. There are other issues in the project content too. Major issues: OCR quality varies; licensing statements differ from country to country; copyright boundary dates differ in each country
  • 39. There are other issues in the interface too. Minor issues: some pages (2m by 2015) have article segmentation; some library content has named entity extraction, affecting search results
  • 41. 10M pages, 7 billion words: how much are you actually ignoring when using only the “good” OCR? Source: http://homepages.inf.ed.ac.uk/balex/publications/slides-DATeCH.pdf
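The question can be answered for any collection whose pages carry an OCR confidence score: count how many words sit on pages that a quality filter would discard. The `(word_count, confidence)` pairs and the 0.8 threshold below are assumptions for illustration, not values taken from the cited slides.

```python
def ignored_share(pages, threshold=0.8):
    """Fraction of all words that sit on pages whose OCR confidence is below `threshold`.

    `pages` is an iterable of (word_count, confidence) pairs, confidence in [0, 1].
    The 0.8 default is an arbitrary illustrative cut-off, not a recommended value.
    """
    total = ignored = 0
    for word_count, confidence in pages:
        total += word_count
        if confidence < threshold:
            ignored += word_count
    return ignored / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical pages: (words on page, mean OCR confidence).
    pages = [(1000, 0.95), (800, 0.70), (1200, 0.85), (500, 0.40)]
    print(f"{ignored_share(pages):.1%} of words dropped by a 'good OCR only' filter")
```

Run over a whole collection, the same calculation tells a researcher how much of the corpus a "good OCR only" filter silently removes.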
  • 42. How can we give users better ways to understand the digital library?
  • 43. What role can the API play in this? Can opening up the data in the digital library, and allowing it to be explored in different ways, help?
  • 44. API: Application Programming Interface. Traditional model: Interface (created by library), Data (published by library). With an API: Interface (created by third party), Data (published by library).
  • 45. The pioneering work of the Trove API (or rather, of Tim Sherratt)
  • 46. Currently: 2 million pages of full text. By 2015: 10 million pages of full text. Search by keyword and organise by language, date, source library and title. Link: http://www.theeuropeanlibrary.org/tel4/newspapers
  • 47. Trove Newspapers statistics, developed by a third party based on data provided by the library: http://wraggelabs.com/shed/trove/graphs/ Interface (created by third party), Data (published by library)
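Graphs of this kind can be built by aggregating records fetched from a library's API. The sketch assumes the records have already been downloaded as a JSON list whose objects carry a `date` field in `YYYY-MM-DD` form; that field name and shape are hypothetical and not the Trove API's actual schema. It produces the articles-per-year series such a graph would plot.

```python
import json
from collections import Counter


def articles_per_year(payload: str) -> dict[int, int]:
    """Count records per year in a JSON list of objects with a "date" field ("YYYY-MM-DD").

    Records without a date are skipped. The field name and format are assumptions.
    """
    records = json.loads(payload)
    years = Counter(int(record["date"][:4]) for record in records if record.get("date"))
    return dict(sorted(years.items()))


if __name__ == "__main__":
    # Hypothetical, already-downloaded API response.
    sample = '[{"date": "1901-03-02"}, {"date": "1901-07-15"}, {"date": "1902-01-01"}, {"title": "undated"}]'
    print(articles_per_year(sample))  # {1901: 2, 1902: 1}
```

The point of the API model is that this step needs nothing from the library beyond the published data.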
  • 48. Headline Roulette, developed by a third party based on data provided by the library: http://wraggelabs.com/shed/headline-roulette/ Interface (created by third party), Data (published by library)
  • 49. Word count of articles, developed by a third party based on data provided by the library: http://dhistory.org/frontpages/53/words/ Interface (created by third party), Data (published by library)
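A word-count view like this needs little more than tokenising each article's OCR text. A minimal sketch, assuming articles arrive as hypothetical `(id, text)` pairs and that counting runs of word characters is good enough (it will split hyphenated words and count OCR noise as words):

```python
import re

# A "word" here is any run of letters, digits or underscores; this is a simplifying
# assumption that splits hyphenated words and counts OCR noise as words.
WORD = re.compile(r"\w+")


def word_counts(articles):
    """Map each article id to the number of word tokens in its OCR text."""
    return {article_id: len(WORD.findall(text)) for article_id, text in articles}


if __name__ == "__main__":
    # Hypothetical (id, text) pairs standing in for front-page articles.
    articles = [("a1", "Fire at the docks."), ("a2", "Shipping news: arrivals and departures")]
    print(word_counts(articles))  # {'a1': 4, 'a2': 5}
```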
  • 51. How many people in this audience would know how to build an interface on top of an API?
  • 52. How many users do you know who could build on top of an API?
  • 54. Currently: metadata records relating to 1.12m issues. By 2015: metadata records relating to up to 4m issues. Browse by date or map. Link: http://www.theeuropeanlibrary.org/tel4/newspapers
  • 55. Credits. Desert: https://www.flickr.com/photos/aigle_dore/5952236932/sizes/l; Borges sign: https://www.flickr.com/photos/monceau/7705020640/; Map: http://gallica.bnf.fr/ark:/12148/btv1b530299707; Strike Fighter: http://en.wikipedia.org/wiki/Strike_fighter