Europeana Newspapers
9 June 2014 – London – Morning Edition
Published by Alastair Dunning, The European Library
@alastairdunning, www.slideshare.net/alastairdunning
On 15th April 1912, the passenger ship
Titanic, carrying over 2,000 passengers
and crew, sank after striking an iceberg on
its maiden voyage from Southampton
to New York
Responses to the Titanic Disaster
http://anno.onb.ac.at/cgi-content/anno?aid=nzg&datum=19120417&seite=1&zoom=33
Responses to the Titanic Disaster
http://kranten.delpher.nl/nl/view/index?query=de+telegraaf+titanic&coll=ddd&image=ddd%3A110546692%3Ampeg21%3Aa0026&page=2&maxperpage=10&sortfield=date
Responses to the Titanic Disaster
http://gallica.bnf.fr/ark:/12148/bpt6k289555z
Responses to the Titanic Disaster
http://hemerotecadigital.bne.es/details.vm?q=id:0000817544&s=0
Responses to the Titanic Disaster
News travels at
different speeds,
with importance
that diminishes at
different rates.
This is true now
as it was in 1912.
(though the web changes things …)
The Europeana Newspapers
project is making this kind of
investigation easier
A cross-searchable newspapers
interface at The European Library
(with issue-level metadata
forwarded to Europeana)
http://www.theeuropeanlibrary.org/tel4/newspapers
Currently:
Search through
the full text of
around 2 million
pages
By 2015:
10m
pages of
full text, up to
2m issues
Search by keyword,
and organise by language,
date, source library, title
Currently:
Search through
metadata
records relating
to 1.12m
issues – with
links to source
libraries
By 2015: Search
through
metadata
records relating
to up to 4m
issues – with
links to source
libraries
Browse by date or map
Full text from the following libraries
• Bibliothèque nationale de France / National Library of France
• Koninklijke Bibliotheek / National Library of the Netherlands
• Landesbibliothek Dr. Friedrich Teßmann / Teßmann Library
• Eesti Rahvusraamatukogu / Estonian National Library
• Kansalliskirjasto / National Library of Finland
• Latvijas Nacionala Biblioteka / National Library of Latvia
• Biblioteka Narodowa / National Library of Poland
• Milli Kutuphane Baskanligi / National Library of Turkey
• Österreichische Nationalbibliothek / Austrian National Library
• Staatsbibliothek zu Berlin / Berlin State Library
• Staats- und Universitätsbibliothek Hamburg / Hamburg State and University Library
• Univerzitet u Beogradu / University Library of Belgrade
Searching by title
Issue-level records from the following libraries
• National Library of Wales
• St. Cyril and Methodius National Library / The National Library of Bulgaria
• National Library of the Czech Republic
• National and University Library in Zagreb
• Koninklijke Bibliotheek van België / Bibliothèque royale de Belgique
• Narodna in univerzitetna knjižnica / National and University Library of Slovenia
• National Library of Portugal
• National Library of Romania
• Landsbókasafn Íslands - Háskólabókasafn / National and University Library of Iceland
• National Library of Spain
• Bibliothèque nationale de Luxembourg / National Library of Luxembourg
Finding matching results
in single or multiple issues
Highlighting search terms
So far, okay. Similar functionality
to other national and regional
digital libraries of newspapers
See other archives via:
https://www.google.com/maps/ms?msid=217164746645697066594.0004c3d764fcb71ed2314&msa=0
But what was the user response to
an aggregation of European
newspaper libraries?
Results of Usability Testing:
http://www.europeana-newspapers.eu/wp-content/uploads/2014/05/The-European-Library-Newspaper-Archive-Usability-testing-Report-April-2014.pdf
“Aggregated view of content
from many sources highly
valued.
There was a strong positive
reaction to the availability of
the archive.”
“Many saying they would be
keen to return to the site as
the content expands.”
“Ability to search over geographic
map was highly valued”
Plenty of quibbles about design
- positions of advanced options
- re-order list of results
- manipulating facets
Much greater expectations of
functionality once logged in
For example,
Saved searches
New content notification
“Much of the value of the site to participants
was provided by the images of the documents.
Participants expected to be able to save a
'local' copy once they had located content of
relevance.
As no download facility is provided, this led to
some frustration and undermined the overall
potential value of the site for some
participants.”
Timetable for rest of project
Now – Prototype version of interface shared with project
Throughout 2014 – Ongoing creation of OCR, and other
related technical work (OLR, Named Entities)
Throughout 2014 – Live version of website improved /
usability testing / added content
Autumn 2014 – Final project conference
Late 2014 – Newspaper browser completed with content and
tools from project
More information at
http://www.europeana-newspapers.eu/
Interface at
http://www.theeuropeanlibrary.org/tel4/newspapers/
Things the users didn’t say
(but I thought they would)
Why can’t I edit the text?
(Our sample was researchers; maybe it is other
communities that are interested in crowdsourcing?)
Note: If time permits, The European Library will
develop a crowdsourcing feature
Can I download text for data
mining?
Remember: Digital Humanists are still a small
percentage of humanists and users
Note: Many of the texts are marked public domain, so
this is feasible in legal terms
Number of digitised pages in
interface: c.2m
Number of digitised pages in
European libraries: c.130m
Number of physical pages in
European libraries: 1.5bn+
Source: European Newspaper Survey Report
http://www.europeana-newspapers.eu/wp-content/uploads/2012/04/D4.1-Europeana-newspapers-survey-report.pdf
The project's digital library is only a
fraction of the newspaper archive
of the continent, indeed of the world
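To put that fraction in numbers, here is a rough sketch in Python using only the approximate page counts quoted from the survey report. The constant and function names are illustrative, not taken from the project. "1.5bn+" is treated as a lower bound, so any share measured against physical pages is, if anything, an overestimate.

```python
# Approximate page counts as quoted on the slide (c.2m, c.130m, 1.5bn+).
# Names are illustrative; "1.5bn+" is used as a lower bound.
PAGES_IN_INTERFACE = 2_000_000
DIGITISED_PAGES_EUROPE = 130_000_000
PHYSICAL_PAGES_EUROPE = 1_500_000_000


def share(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


interface_vs_digitised = share(PAGES_IN_INTERFACE, DIGITISED_PAGES_EUROPE)
interface_vs_physical = share(PAGES_IN_INTERFACE, PHYSICAL_PAGES_EUROPE)
digitised_vs_physical = share(DIGITISED_PAGES_EUROPE, PHYSICAL_PAGES_EUROPE)

print(f"Interface / digitised pages: {interface_vs_digitised:.1f}%")
print(f"Interface / physical pages:  {interface_vs_physical:.2f}%")
print(f"Digitised / physical pages:  {digitised_vs_physical:.1f}%")
```

On these figures the interface holds roughly 1.5% of the pages European libraries have digitised, and well under 1% of the pages they hold physically.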
As libraries, how should we
represent that absence to users ?
Should such absence be
represented in the interface itself ?
Vast
white
spaces in
the list of
results ?
Provide standardised descriptions
of digitised resources?
Standardised information for every
digital resource, presenting
collections, content, licensing, re-use
Charts and graphs external to the
interface ?
There are other issues too
• OCR quality varies
• Some pages (2m by 2015) have article
segmentation
• Some library content has named entity
extraction affecting search results
• Different licensing statements from
different countries
• Dates of copyright boundaries differ in
each country
How should we allow users better
ways to understand the digital
library ?
What role can the API play in this?
Can opening up the data in the
digital library allow it to be
explored in different ways?
Traditional Model: Interface (Created by Library), Data (Published by Library)
With an API: Interface (Created by Third Party), Data (Published by Library)
API – Application Programming Interface
Pioneering work of Trove API
Interface
(Created by Library)
Data
(Published by
Library)
Trove Newspapers site as
published by National Library of
Australia, and based on data
provided by Library
http://trove.nla.gov.au/newspaper
Trove Newspapers statistics
developed by third party, based
on data provided by library
http://wraggelabs.com/shed/trove/graphs/
Interface
(Created by Third
Party)
Data
(Published by Library)
Headline Roulette, developed by
third party, based on data
provided by library
http://wraggelabs.com/shed/headline-roulette/
Interface
(Created by Third
Party)
Data
(Published by Library)
Word Count of Articles,
developed by third party, based
on data provided by library
http://dhistory.org/frontpages/53/words/
Interface
(Created by Third
Party)
Data
(Published by Library)
Sounds great !
But … ?
How many people in this audience
would know how to build an
interface on top of an API?
How many users do you know who
could build on top of an API ?
That is the problem I leave you to
discuss
Thank you.
http://www.theeuropeanlibrary.org/tel4/newspapers
