Digitisation and Digital Humanities - what is the role of Libraries?
Clemens Neudecker (@cneudecker)
Berlin State Library
8 April 2021
Staatsbibliothek zu Berlin – Preußischer Kulturbesitz (SBB)
• Established 1661 as library of the
King of Prussia
• Largest research library in Germany
• Approximately 12m volumes,
23m media objects in total
• Part of the legal entity
Stiftung Preußischer Kulturbesitz
• https://staatsbibliothek-berlin.de/
Digitization @ SBB
• Since 2007: in-house Digitization Center
• Approx. 1.7M images annual production
• Up to 80 concurrent digitization projects
• 20 diverse book scanners, scan robots, etc.
• Operation in two shifts with 24 operators
• Digitisation-on-demand service
• KITODO open source digitisation
workflow management system
Digital Collections
• Main portal for digitised collections
• Currently around 180,000 digitised
documents available online
• Documents published before 1920 are
in the public domain
• IIIF API compatible
• Full image resolution is provided
• Full text (via OCR) and keyword search for
about 20% of the digitised content
• Downloads for images, OCR, metadata
• https://digital.staatsbibliothek-berlin.de/
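Since the portal is described as IIIF API compatible, a short sketch may help show what that means in practice: a IIIF Image API request is a URL of the form {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}. The server address and identifier below are hypothetical placeholders, not SBB's actual endpoints, and size "max" assumes Image API 2.1 or 3.0 (older servers use "full").

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build a IIIF Image API request URL.

    Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    The identifier is percent-encoded so that characters such as '/' survive.
    """
    encoded_id = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{encoded_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Hypothetical server and identifier, for illustration only.
BASE = "https://iiif.example.org/images"

# Whole image, scaled to a width of 600 pixels (height follows the aspect ratio).
thumbnail_url = iiif_image_url(BASE, "PPN000000000-00000001", size="600,")

# Full resolution, as offered by the portal.
full_url = iiif_image_url(BASE, "PPN000000000-00000001")

print(thumbnail_url)
print(full_url)
```

Because the request is just a URL, the same pattern works from a browser, a script, or any IIIF viewer.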
ZEFYS – digitized newspapers
• Digitized historical newspapers have their own portal ZEFYS
• About 200 newspaper titles and roughly 10m pages digitized
• GDR Press Portal gives access to main newspapers from the GDR
(after authentication which is necessary due to copyright)
• ZEFYS got hacked in February 2021 - but is now being reconstructed
with a new technology stack
• No full text search (yet) but approx. 5m pages already have OCR
• Currently two major newspaper digitization projects from microfilm
• https://zefys.staatsbibliothek-berlin.de/
DDB Newspaper Portal
• Uniform access and UI for digitised
newspapers in Germany
• Key features
• Title list
• Calendar
• Keyword search
• Advanced features
• Citation & Persistence
• Named Entities
• Corpus Building
• https://pro.deutsche-digitale-bibliothek.de/deutsches-zeitungsportal
Qurator.ai
• Leverage state-of-the-art AI/ML for
digitized cultural heritage curation
• Development of AI/ML pipeline:
• Binarization
• Layout analysis
• OCR
• Postcorrection
• Named Entity Recognition and
Named Entity Linking
• Image Similarity and Search
• https://qurator.ai
• https://github.com/qurator-spk
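To make the first pipeline stage concrete, here is a minimal sketch of binarization using Otsu's method, a classical thresholding technique. It is a stand-in chosen for illustration, not the Qurator implementation, which the slide describes as AI/ML based. The function names and the tiny sample "image" are assumptions.

```python
def otsu_threshold(pixels):
    """Return the grey level t (0-255) that maximises the between-class
    variance when pixels <= t are treated as dark and the rest as light."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue          # no dark pixels yet
        w_light = total - w_dark
        if w_light == 0:
            break             # every pixel is already in the dark class
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        between_var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels, threshold):
    """Mark ink (dark) pixels as 1 and background (light) pixels as 0."""
    return [1 if p <= threshold else 0 for p in pixels]


# Toy greyscale "scan": three dark ink pixels, three light paper pixels.
sample = [10, 12, 11, 200, 210, 205]
t = otsu_threshold(sample)
mask = binarize(sample, t)
print(t, mask)
```

The later stages (layout analysis, OCR, postcorrection, entity recognition) each consume the previous stage's output, which is why a clean binarization step matters for everything downstream.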
OCR-D
• Provide the technical and organisational
framework for the OCR processing of the
German VD digitization initiatives
(documents printed in Germany from 1600
– 1900)
• Open & collaborative development:
• Specifications & Guidelines https://ocr-d.de/en/dev
• Open source tools https://github.com/OCR-D
• Community https://gitter.im/OCR-D/Lobby
• https://ocr-d.de
SoNAR (IDH)
• Examine and evaluate approaches for an
advanced research environment for
Historical Network Analysis
• Extract person names and relations from
databases & digitized newspapers
• Transform entities with relations into a
historical social network graph
• Create intuitive visualizations and
interfaces for querying and analyzing the
social network graph
• https://sonar.fh-potsdam.de
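The step from extracted names to a social network graph can be sketched as follows. This is an illustrative simplification, not SoNAR's method: it assumes that two persons are related whenever their names occur in the same document, and the person names are placeholders.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(documents):
    """Map each unordered pair of names to the number of documents
    in which both names occur (a weighted, undirected edge list)."""
    edges = defaultdict(int)
    for names in documents:
        # sorted(set(...)) removes duplicates and gives each pair a stable order
        for a, b in combinations(sorted(set(names)), 2):
            edges[(a, b)] += 1
    return dict(edges)


def degree(edges):
    """Count how many distinct persons each person is connected to."""
    deg = defaultdict(int)
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)


# Placeholder input: the names found in each of three documents.
documents = [
    ["Person A", "Person B"],
    ["Person A", "Person C", "Person B"],
    ["Person C", "Person D"],
]

graph = build_cooccurrence_graph(documents)
degrees = degree(graph)
print(graph)
print(degrees)
```

Edge weights (shared documents) and degrees (number of contacts) are the kind of basic measures that visualizations and query interfaces for such a graph typically build on.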
SBB LAB
• Experimental playground
• Provision of (open) datasets
• Documentation of public APIs
• Presentation of innovative prototypes
using SBB collections
• Events (Hackathons, Transcribathons)
• Digital Researcher Residency
(planned)
• https://lab.sbb.berlin/
Thank you for your attention!
Questions?