1. Europeana Newspapers
in a Nutshell
Clemens Neudecker (@cneudecker)
Staatsbibliothek zu Berlin –
Preußischer Kulturbesitz
2. Introduction/Background
• Europeana Newspapers EU Project
(2012 – 2015)
• http://www.europeana-newspapers.eu/
• Main objectives:
– Collect metadata for digitised newspapers in EU
– Perform OCR (text recognition) and OLR (article
separation) on the digitised newspapers
– Develop a common portal for search & discovery
– Establish standards and best practices for
(historical) newspaper digitisation
3. Collection Stats
• Covers newspapers from 1618 to 2016
• 12 EU national and/or research libraries
• >1,000 newspaper titles, ca. 3.3m issues
• 40 languages, 4 alphabets
• Metadata for approx. 20m pages
• 12m pages fully searchable by keyword
(OCR errors – your mileage may vary…)
• Data (scans & OCR) public domain,
metadata CC0 licensed
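
Because OCR errors affect keyword search, an exact query can miss pages where the word was misrecognised. The following is a minimal sketch of tolerant matching using Python's standard `difflib`. It is an illustration only, not the method used by the Europeana Newspapers portal; the function name `fuzzy_hits`, the threshold of 0.8 and the sample tokens are assumptions made for this example.

```python
from difflib import SequenceMatcher


def fuzzy_hits(query, ocr_tokens, threshold=0.8):
    """Return the OCR tokens whose similarity to the query reaches the threshold.

    Hypothetical helper for illustration; the threshold is an assumed value.
    """
    q = query.lower()
    hits = []
    for token in ocr_tokens:
        # ratio() is 2 * matches / total length, so 1.0 means identical strings
        score = SequenceMatcher(None, q, token.lower()).ratio()
        if score >= threshold:
            hits.append(token)
    return hits


# "Zeitnng" stands in for an OCR misreading of "Zeitung" (assumed sample data)
print(fuzzy_hits("Zeitung", ["Zeitnng", "Berlin", "Zeitung"]))
```

A lower threshold catches more misrecognised words but also returns more false positives, so the value needs tuning per language and print quality.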
8. Collaboration with Researchers
• Oceanic Exchanges (Digging Into Data
Transatlantic Platform, 2017 – 2019)
• impresso (Swiss National Science Foundation,
2017 – 2020)
• NewsEye (EU H2020, 2018 – 2020)
• CLARIN (EU DSI, ongoing)
• Interviews with researchers
• Numerous research groups throughout EU
(though mainly DACH)
9. Outlook
• Relaunch of Europeana Newspapers with redeveloped
search and browse interface integrated directly with
the Europeana Portal as a thematic collection
(July/August 2018)
• Support of IIIF API for the online presentation
and aggregation of newspapers in Europeana
– In the longer term, this will open up the possibility to
bookmark, annotate, transcribe, correct and connect
disparate newspaper sources directly in your web browser
• Named Entity Recognition for newspapers
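
To give a sense of what IIIF support means in practice, here is a minimal sketch of how an IIIF Image API request URL is assembled from its path segments (identifier, region, size, rotation, quality, format). The base URL and the page identifier are hypothetical placeholders, not actual Europeana endpoints, and the `max` size keyword assumes Image API version 2.1 or later.

```python
from urllib.parse import quote


def iiif_image_url(base_url, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API URL:
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # The identifier is percent-encoded, including any slashes it contains
    encoded_id = quote(identifier, safe="")
    return (f"{base_url.rstrip('/')}/{encoded_id}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


# Hypothetical server and identifier, for illustration only
base = "https://iiif.example.org/image"

# Whole page at the largest size the server offers
print(iiif_image_url(base, "newspaper/issue-1/page-1"))

# A 1024 x 1024 pixel crop from the top-left corner, scaled to 512 pixels wide
print(iiif_image_url(base, "newspaper/issue-1/page-1",
                     region="0,0,1024,1024", size="512,"))
```

Since every region of every page is addressable by a URL of this form, a viewer or annotation tool can reference an individual article or line without downloading the full scan.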
11. Thank you for your
attention!