
Digitisation and Digital Humanities - what is the role of Libraries?



  1. Digitisation and Digital Humanities: what is the role of Libraries?
     Clemens Neudecker (@cneudecker), Berlin State Library, 8 April 2021
  2. Staatsbibliothek zu Berlin – Preußischer Kulturbesitz (SBB)
     • Established 1661 as library of the King of Prussia
     • Largest research library in Germany
     • Approximately 12m volumes, 23m media objects in total
     • Part of the legal entity Stiftung Preußischer Kulturbesitz
     • https://staatsbibliothek-berlin.de/
  3. Berlin State Library – East & West
  4. Digitization @ SBB
     • Since 2007: in-house Digitization Center
     • Annual production of approx. 1.7M images
     • Up to 80 concurrent digitization projects
     • 20 assorted book scanners, scan robots, etc.
     • Operation in two shifts with 24 operators
     • Digitisation-on-demand service
     • KITODO open source digitisation workflow management system
  5. Digital Collections
     • Main portal for digitised collections
     • Currently around 180,000 digitised documents available online
     • Documents published before 1920 are made available as public domain
     • IIIF API compatible
     • Full image resolution is provided
     • Full text (via OCR) and keyword search for about 20% of the digitised content
     • Downloads for images, OCR, metadata
     • https://digital.staatsbibliothek-berlin.de/
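The "IIIF API compatible" point above can be illustrated with a minimal sketch of how an IIIF Image API request URL is assembled from its path segments (`{identifier}/{region}/{size}/{rotation}/{quality}.{format}`). The base URL and identifier below are hypothetical placeholders, not the actual SBB endpoint, and the `max` size keyword assumes a server that supports Image API 2.1 or later.

```python
from urllib.parse import quote


def iiif_image_url(base_url, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Assemble an IIIF Image API URL of the form
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}."""
    # The identifier must be percent-encoded so that slashes inside it
    # are not mistaken for path separators.
    encoded_id = quote(identifier, safe="")
    base = base_url.rstrip("/")
    return f"{base}/{encoded_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Hypothetical endpoint and identifier, for illustration only.
full = iiif_image_url("https://example.org/iiif", "PPN123456789-00000001")
# Same image, scaled to a width of 800 pixels.
thumb = iiif_image_url("https://example.org/iiif", "PPN123456789-00000001",
                       size="800,")
print(full)
print(thumb)
```

Because the whole request is encoded in the URL, the same pattern serves both full-resolution downloads and scaled derivatives without any server-specific client code.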
  6. ZEFYS – digitized newspapers
     • Digitized historical newspapers have their own portal, ZEFYS
     • About 200 newspaper titles and roughly 10m pages digitized
     • GDR Press Portal gives access to the main newspapers from the GDR (authentication is required for copyright reasons)
     • ZEFYS was hacked in February 2021 and is now being rebuilt on a new technology stack
     • No full-text search (yet), but approx. 5m pages already have OCR
     • Currently two major projects digitizing newspapers from microfilm
     • https://zefys.staatsbibliothek-berlin.de/
  7. DDB Newspaper Portal
     • Uniform access and UI for digitised newspapers in Germany
     • Key features: title list, calendar, keyword search
     • Advanced features: citation & persistence, named entities, corpus building
     • https://pro.deutsche-digitale-bibliothek.de/deutsches-zeitungsportal
  8. Qurator.ai
     • Leverage state-of-the-art AI/ML for digitized cultural heritage curation
     • Development of AI/ML pipeline: binarization, layout analysis, OCR, postcorrection, Named Entity Recognition and Named Entity Linking, image similarity and search
     • https://qurator.ai
     • https://github.com/qurator-spk
  9. OCR-D
     • Provide the technical and organisational framework for the OCR processing of the German VD digitization initiatives (documents printed in Germany from 1600 – 1900)
     • Open & collaborative development:
       Specifications & Guidelines: https://ocr-d.de/en/dev
       Open source tools: https://github.com/OCR-D
       Community: https://gitter.im/OCR-D/Lobby
     • https://ocr-d.de
  10. SoNAR (IDH)
     • Examine and evaluate approaches for an advanced research environment for Historical Network Analysis
     • Extract person names and relations from databases & digitized newspapers
     • Transform entities with relations into a historical social network graph
     • Create intuitive visualizations and interfaces for querying and analyzing the social network graph
     • https://sonar.fh-potsdam.de
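The step of transforming entities with relations into a social network graph can be sketched as follows. This is a generic illustration under stated assumptions, not the SoNAR implementation: the function names, the relation labels, and the sample triples are all invented for the example.

```python
from collections import defaultdict


def build_graph(relations):
    """Turn (person, relation, person) triples into an undirected
    adjacency map: person -> {neighbour: relation label}."""
    graph = defaultdict(dict)
    for source, relation, target in relations:
        graph[source][target] = relation
        graph[target][source] = relation
    return graph


def degree_ranking(graph):
    """Order persons by number of distinct neighbours (highest first),
    breaking ties alphabetically."""
    return sorted(graph, key=lambda person: (-len(graph[person]), person))


# Invented sample data, not taken from any SBB or SoNAR dataset.
relations = [
    ("Person A", "correspondent_of", "Person B"),
    ("Person A", "colleague_of", "Person C"),
    ("Person B", "mentioned_with", "Person C"),
    ("Person A", "mentioned_with", "Person D"),
]
graph = build_graph(relations)
ranking = degree_ranking(graph)
print(ranking)  # ['Person A', 'Person B', 'Person C', 'Person D']
```

A plain adjacency map is enough to answer simple questions such as "who is best connected"; a real research environment would more likely rely on a graph database or a dedicated network-analysis library.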
  11. SBB LAB
     • Experimental playground
     • Provision of (open) datasets
     • Documentation of public APIs
     • Presentation of innovative prototypes using SBB collections
     • Events (Hackathons, Transcribathons)
     • Digital Researcher Residency (planned)
     • https://lab.sbb.berlin/
  12. Thank you for your attention! Questions?
     Clemens Neudecker (@cneudecker), Berlin State Library, 8 April 2021
