Digitisation and Digital Humanities:
what is the role of Libraries?
Clemens Neudecker (@cneudecker)
Berlin State Library
8 April 2021
Staatsbibliothek zu Berlin – Preußischer Kulturbesitz (SBB)
• Established 1661 as library of the
King of Prussia
• Largest research library in Germany
• Approximately 12m volumes,
23m media objects in total
• Part of the legal entity
Stiftung Preußischer Kulturbesitz
• https://staatsbibliothek-berlin.de/
Berlin State Library – East & West
Digitization @ SBB
• Since 2007: in-house Digitization Center
• Approx. 1.7M images annual production
• Up to 80 concurrent digitization projects
• 20 different book scanners, scan robots, etc.
• Operation in two shifts with 24 operators
• Digitisation-on-demand service
• KITODO open source digitisation
workflow management system
Digital Collections
• Main portal for digitised collections
• Currently around 180,000 digitised
documents available online
• Documents published before 1920 are
licensed as public domain
• IIIF API compatible
• Full image resolution is provided
• Full text (via OCR) and keyword search for
about 20% of the digitised content
• Downloads for images, OCR, metadata
• https://digital.staatsbibliothek-berlin.de/
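IIIF compatibility means that images can be requested through a URL with a fixed segment order defined by the IIIF Image API: identifier, region, size, rotation, quality and format. The sketch below only builds such a URL. The server base and the identifier are hypothetical placeholders, not SBB's actual endpoint, and `size="max"` assumes Image API 2.1 or later (older servers expect `full`).

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build a IIIF Image API request URL from its path segments."""
    # The identifier is percent-encoded, including any slashes it contains.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Hypothetical server and identifier, for illustration only.
print(iiif_image_url("https://example.org/iiif", "PPN123/00000001"))
# https://example.org/iiif/PPN123%2F00000001/full/max/0/default.jpg
```

Changing `region` (for example `"0,0,1000,1000"`) or `size` (for example `"500,"`) requests a crop or a scaled derivative of the same image.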
ZEFYS – digitized newspapers
• Digitized historical newspapers have their own portal ZEFYS
• About 200 newspaper titles and roughly 10m pages digitized
• GDR Press Portal gives access to main newspapers from the GDR
(after authentication which is necessary due to copyright)
• ZEFYS was hacked in February 2021 and is now being rebuilt
with a new technology stack
• No full text search (yet) but approx. 5m pages already have OCR
• Currently two major newspaper digitization projects from microfilm
• https://zefys.staatsbibliothek-berlin.de/
DDB Newspaper Portal
• Uniform access and UI for digitised
newspapers in Germany
• Key features
  • Title list
  • Calendar
  • Keyword search
• Advanced features
  • Citation & Persistence
  • Named Entities
  • Corpus Building
• https://pro.deutsche-digitale-bibliothek.de/deutsches-zeitungsportal
Qurator.ai
• Leverage state-of-the-art AI/ML for
digitized cultural heritage curation
• Development of AI/ML pipeline:
• Binarization
• Layout analysis
• OCR
• Postcorrection
• Named Entity Recognition and
Named Entity Linking
• Image Similarity and Search
• https://qurator.ai
• https://github.com/qurator-spk
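To illustrate what the first pipeline stage does, here is a minimal sketch of binarization using Otsu thresholding. This is a classical textbook method chosen for brevity; the slides do not say which method the Qurator pipeline uses, so it should not be read as the project's implementation. The input is assumed to be a flat list of 8-bit grey values.

```python
def otsu_threshold(pixels):
    """Return the grey level t that maximises between-class variance.

    Pixels <= t form the dark class, pixels > t the light class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # sum of grey levels in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(pixels, threshold):
    """Map each pixel to 0 (dark, e.g. ink) or 1 (light, e.g. paper)."""
    return [0 if p <= threshold else 1 for p in pixels]


# Toy data: three dark and three light pixels.
sample = [10, 12, 11, 200, 205, 198]
print(binarize(sample, otsu_threshold(sample)))  # [0, 0, 0, 1, 1, 1]
```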
OCR-D
• Provide the technical and organisational
framework for the OCR processing of the
German VD digitization initiatives
(documents printed in Germany from 1600
– 1900)
• Open & collaborative development:
  • Specifications & Guidelines https://ocr-d.de/en/dev
  • Open source tools https://github.com/OCR-D
  • Community https://gitter.im/OCR-D/Lobby
• https://ocr-d.de
SoNAR (IDH)
• Examine and evaluate approaches for an
advanced research environment for
Historical Network Analysis
• Extract person names and relations from
databases & digitized newspapers
• Transform entities with relations into a
historical social network graph
• Create intuitive visualizations and
interfaces for querying and analyzing the
social network graph
• https://sonar.fh-potsdam.de
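The step from extracted names to a network graph can be sketched as follows. The example assumes the simplest possible relation, co-occurrence of two names in the same document, and uses invented placeholder names. The slides do not specify how SoNAR models relations, so this is an illustration of the idea rather than the project's method.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(documents):
    """Map each unordered pair of names to the number of documents naming both."""
    edges = defaultdict(int)
    for names in documents:
        # De-duplicate and sort so each pair has one canonical (a, b) form.
        for a, b in combinations(sorted(set(names)), 2):
            edges[(a, b)] += 1
    return dict(edges)


def neighbours(edges, person):
    """Return the people linked to `person`, with edge weights."""
    result = {}
    for (a, b), weight in edges.items():
        if a == person:
            result[b] = weight
        elif b == person:
            result[a] = weight
    return result


# Hypothetical extraction results: one list of person names per document.
docs = [
    ["Person A", "Person B"],
    ["Person B", "Person A", "Person C"],
    ["Person C"],
]
graph = build_cooccurrence_graph(docs)
print(neighbours(graph, "Person A"))  # {'Person B': 2, 'Person C': 1}
```

The edge weights give a first measure of tie strength that a visualization or query interface could build on.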
SBB LAB
• Experimental playground
• Provision of (open) datasets
• Documentation of public APIs
• Presentation of innovative prototypes
using SBB collections
• Events (Hackathons, Transcribathons)
• Digital Researcher Residency
(planned)
• https://lab.sbb.berlin/
Thank you for your attention!
Questions?
Clemens Neudecker (@cneudecker)
Berlin State Library
8 April 2021
