What's up, Europeana Newspapers?

Clemens Neudecker (@cneudecker)
Staatsbibliothek zu Berlin –
Preußischer Kulturbesitz
A little bit of history
2012 – 2015: Europeana Newspapers ICT-PSP Project
31 Dec 2016: The European Library (TEL) closed
2017: DSI-2/3: Migration; Newspapers Collection Plan
July 2018: Planned re-launch of Europeana Newspapers as a thematic collection
Main outcomes
– TEL Historic Newspapers Portal: http://www.theeuropeanlibrary.org/tel4/newspapers
– Deliverables: http://www.europeana-newspapers.eu/public-materials/deliverables/
– Tools: http://www.europeana-newspapers.eu/public-materials/tools/
– Final Report: http://europeananewspapers.github.io/
Data
• 1618 – 2016
• 12 countries
• 40 languages
• 120 TB
• Ca. 1,000 titles
• 3.3M issues
Data
• Metadata for more than 20 million pages
• 12 million pages processed with OCR
• 2 million pages processed with OLR
• Most content licensed as Public Domain
• All metadata licensed under CC0
• Copyright cut-off date ("copyright cliff of death")
Data
• JPEG 2000 images for use with IIPImage server
• METS container with embedded MODS for structural and bibliographic metadata
• ALTO for OCRed text
• EDM for Europeana
→ Europeana Newspapers METS/ALTO Profile (ENMAP)
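As an illustration of how the ALTO full text can be read, here is a minimal sketch that collects the words of each text line. It relies only on the general ALTO convention that a `TextLine` holds `String` elements with a `CONTENT` attribute; the sample document and the function names are made up for this example, and ENMAP-specific details (for instance hyphenation handling) are not covered.

```python
import xml.etree.ElementTree as ET
from typing import List


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(alto_xml: str) -> List[str]:
    """Return one string per ALTO TextLine, joining the CONTENT of its String elements."""
    root = ET.fromstring(alto_xml)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        words = [
            child.attrib.get("CONTENT", "")
            for child in element
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(words))
    return lines


# Hypothetical, heavily shortened ALTO document (not taken from the collection).
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine><String CONTENT="Berliner"/><SP/><String CONTENT="Tageblatt"/></TextLine>
    <TextLine><String CONTENT="Morgen-Ausgabe"/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    for line in alto_lines(SAMPLE):
        print(line)
```

Matching on the local tag name keeps the sketch independent of the ALTO schema version, since the namespace URI differs between versions.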
OCR/OLR
• OCR: ABBYY FineReader Engine 11
– Gothic license per page (A4!)
– 4 servers with 8 cores = 32 processing cores
– Average processing time of 5s per newspaper page
• OLR: CCS docWorks
– Article separation & page classification
– Possibility for post-correction/validation of results
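A rough back-of-the-envelope estimate of the OCR run time, using the figures from the slides (12 million OCR pages, 5 s per page, 4 servers with 8 cores). The slide does not say whether the 5 s average is per core or for the whole cluster, so the sketch prints both readings; it ignores I/O, failures and re-runs and is an estimate only, not a reported project figure.

```python
PAGES_WITH_OCR = 12_000_000  # "12 million pages processed with OCR"
SECONDS_PER_PAGE = 5         # "Average processing time of 5s per newspaper page"
SERVERS = 4
CORES_PER_SERVER = 8
SECONDS_PER_DAY = 86_400


def estimate_days(pages: int, seconds_per_page: float, parallel_workers: int) -> float:
    """Days needed if the work is split evenly over `parallel_workers`."""
    total_seconds = pages * seconds_per_page
    return total_seconds / parallel_workers / SECONDS_PER_DAY


cores = SERVERS * CORES_PER_SERVER

# Reading 1 (assumption): 5 s is the cluster-wide average, one page finished every 5 s.
days_if_aggregate = estimate_days(PAGES_WITH_OCR, SECONDS_PER_PAGE, 1)

# Reading 2 (assumption): 5 s is the time per page on a single core, all cores busy.
days_if_per_core = estimate_days(PAGES_WITH_OCR, SECONDS_PER_PAGE, cores)

if __name__ == "__main__":
    print(f"cores: {cores}")
    print(f"5 s cluster-wide: about {days_if_aggregate:.0f} days")
    print(f"5 s per core:     about {days_if_per_core:.1f} days")
```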
Evaluation
• Scenario-based performance evaluation of OCR/OLR using PAGE ground truth
• Ground truth dataset: http://primaresearch.org/datasets/ENP
• Performance Evaluation Report: http://www.europeana-newspapers.eu/wp-content/uploads/2015/05/D3.5_Performance_Evaluation_Report_1.0.pdf
Evaluation
[Chart: Bag of Words OCR Evaluation, per language. x-axis: Language Setting (category labels not preserved); y-axis: Success Rate. Values in chart order: 82.4%, 85.3%, 80.9%, 75.9%, 67.5%, 83.4%, 84.1%, 68.1%, 93.1%, 57.6%, 87.0%, 68.3%, 76.1%, 82.6%, 54.1%, 32.7%]
[Chart: Bag of Words OCR Evaluation, per font. x-axis: Font; y-axis: Success Rate. Gothic 67.3%, Normal 81.4%, Mixed 64.0%]
[Chart: Layout Analysis Performance, per evaluation profile. x-axis: Evaluation Profile; y-axis: Success Rate (harmonic, area based). Keyword search 79.1%, Phrase search 62.2%, Access via content structure 55.9%, Print/ebook on demand 58.8%, Content based image retrieval 94.7%]
[Chart: Bag of Words OCR Evaluation, binarised image vs. original image. x-axis: Image Source; y-axis: Success Rate. NCSR Binarisation 74.35%, Original Image 75.31%]
[Chart: Bag of Words OCR Evaluation, FineReader vs. Tesseract. x-axis: OCR Engine; y-axis: Success Rate (count based). FineReader 75.3%, Tesseract 53.78%]
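To make the idea of a bag-of-words success rate concrete, here is a minimal sketch: word order is ignored, and the score is the share of ground-truth word occurrences that also appear in the OCR output. This is an assumed, simplified definition for illustration; the tokenisation and the exact metric used in the Performance Evaluation Report may differ, and the function names are hypothetical.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def bag_of_words_success_rate(ground_truth: str, ocr_text: str) -> float:
    """Fraction of ground-truth word occurrences also present in the OCR output."""
    truth_counts = Counter(tokenize(ground_truth))
    ocr_counts = Counter(tokenize(ocr_text))
    total = sum(truth_counts.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps, per word, the smaller of the two counts.
    matched = sum((truth_counts & ocr_counts).values())
    return matched / total


if __name__ == "__main__":
    truth = "the quick brown fox jumps over the lazy dog"
    ocr = "the quick brovvn fox jumps over tho lazy dog"
    print(f"{bag_of_words_success_rate(truth, ocr):.1%}")
```

Because positions are ignored, a measure of this kind tolerates reading-order errors, which is why it is often paired with search-oriented use cases such as keyword search.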
Use in Research
• Oceanic Exchanges (Digging Into Data, 2017-2019)
• impresso (Swiss National Fund, 2017 – 2020)
• NewsEye (EU H2020, 2018 – 2020)
• CLARIN (EU ERIC)
• Europeana Research, Interviews with Researchers
• At Scientific Conferences
– DAS, ICDAR: Europeana Newspapers Ground Truth
– LREC, ACL: Europeana Newspapers NER Corpora
Oceanic Exchanges
(Digging Into Data, 2017-2019)
impresso
(Swiss National Fund, 2017 – 2020)
Use in Research
• Digital Humanities
– DHd AG Newspapers initiated at DHd 2018
– #HacktheNews workshop at DHNord 2018
– Roundtable on newspapers at DHBenelux 2018
• At the Berlin State Library:
– University Regensburg
– Technical University Dortmund
– Berlin-Brandenburg Academy of Sciences
Other Activities
• Rise of Literacy Generic Services Project
• IIIF Newspaper Interest Group
– http://iiif.io/community/groups/newspapers/
– https://github.com/IIIF/awesome-iiif#newspapers
• TEI SIG Newspapers & Periodicals
– https://wiki.tei-c.org/index.php/SIG:Newspapers%26Periodicals
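For readers new to IIIF, a small sketch of how a newspaper page image could be requested through the IIIF Image API, whose URLs follow the pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The base URL and the image identifier below are hypothetical placeholders, not real Europeana Newspapers endpoints.

```python
from urllib.parse import quote


def iiif_image_url(
    base_url: str,
    identifier: str,
    region: str = "full",
    size: str = "max",
    rotation: str = "0",
    quality: str = "default",
    fmt: str = "jpg",
) -> str:
    """Build a IIIF Image API request URL; the identifier is percent-encoded."""
    encoded_id = quote(identifier, safe="")
    return f"{base_url.rstrip('/')}/{encoded_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base_url: str, identifier: str) -> str:
    """URL of the image information document (info.json) for the same image."""
    return f"{base_url.rstrip('/')}/{quote(identifier, safe='')}/info.json"


# Hypothetical server and identifier, for illustration only.
BASE = "https://iiif.example.org/image"
PAGE = "newspaper/1918-06-01/page-0001.jp2"

if __name__ == "__main__":
    print(iiif_image_url(BASE, PAGE))                         # whole page
    print(iiif_image_url(BASE, PAGE, region="pct:0,0,50,50"))  # top-left quarter
    print(iiif_info_url(BASE, PAGE))
```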
Creative Reuse
Berliner Schlagzeilen
• Created as part of Coding da Vinci Berlin 2017
• Twitter bot that tweets daily about the news from 100 years ago
• Source code available: https://github.com/shoutrlabs/berliner-schlagzeilen
Altpapier App
• Created as part of Coding da Vinci Berlin 2017
• Android (and soon also iOS) app that shows the user newspaper articles with the possibility to correct errors
• Available as source code: https://github.com/mariabecker/OldNews
• Available on the Play Store: https://play.google.com/store/apps/details?id=oldnews.de.oldnews
Visualizing European Newspapers
• Visualization prototype with a large touch interface composed of multiple screens, made by Sven Charleer of KU Leuven
Future Plans
Europeana Newspapers
Thematic Collection
The Situation in Germany
2012 – 2015: DFG Pilot Project "Digitisation of historical newspapers": Master Plan, Guidelines, etc.
2017: Relaunch of ZDB union catalog of serials, http://zdb-katalog.de/
2017: DFG Proposal (SBB, DDB involved): "A national portal for digitised historical newspapers at the German Digital Library"
2018: DFG Call for proposals: "Digitisation of historical newspapers"