Central Registry for Digitized Objects Links Production and Bibliographic Control
1. Central Registry for Digitized Objects: Linking Production and Bibliographic Control
Ralf Stockmann
Göttinger Digitization Center
2. As things are now
• Huge ventures in
– Digitization
• Google
• Microsoft
• National programs
• Local centers
– Accessibility
• World Digital Library
• European Digital Library
• National portals
• Google Book Search
3. As things are now
• We are just at the dawn of mass digitization
– Leaving the manufacturing stage behind
– Entering industrialization
– Scanning robots
– Accessible full text (OCR)
4. Lack of …
• Coordination of digitization activities
– Who scans what, where, when, in which quality, and how will it be accessible?
• How is “quality” defined?
• Do we agree on “what”?
5. Facing the Consequences
[Chart: costs/value plotted against the number of digitized items per volume; technical improvements reduce costs, the additional benefit of each further copy diminishes, and duplicated digitization is a waste of resources]
6. The Solution
• Central registry for digitized objects
• Focused on the production context (no user
frontend)
• API driven
– Application Programming Interface
– Query / Ingest
– Simple implementation into existing workflow-tools
• Batch mode (lists)
• Open Source / free service
• Matching on volume level
– Score / probability
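A workflow tool talking to such a registry might bundle its volumes into one batch query, as the slide suggests. The payload shape, field names, and `action` flag below are illustrative assumptions, not a published API:

```python
# Sketch of a batch query a workflow tool could send to the registry API.
# All field names and the payload layout are assumptions for illustration.
import json

def build_batch_query(volumes):
    """Wrap a list of volume descriptions into a single batch-mode payload."""
    return json.dumps({
        "action": "query",  # the API distinguishes query and ingest calls
        "items": [
            {
                "title": v["title"],
                "author": v.get("author"),
                "date": v.get("date"),
                "volume": v.get("volume"),  # matching happens on volume level
            }
            for v in volumes
        ],
    })

payload = build_batch_query([
    {"title": "Allgemeine Naturgeschichte", "author": "Kant", "date": "1755"},
])
```

The same payload shape could carry an `"action": "ingest"` call, keeping the integration into existing workflow tools simple.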
7. Implementation
[Architecture diagram: clients (present collections, running projects, notices of intent) send queries and ingests through the API; an aggregator/normalizer/mapping layer feeds the registry and metadata store, which also connects to backend services such as EROMM, EDL, and OCLC]
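The aggregator/normalizer/mapping layer could be sketched as a field mapping from a source-specific record onto the registry's common schema; the Dublin-Core-style source field names and the mapping table are assumptions for illustration:

```python
# Minimal sketch of the normalizer step: rename known source fields onto
# the registry schema and drop anything unmapped. The source field names
# (Dublin-Core style) and the mapping itself are invented examples.
FIELD_MAP = {
    "dc:title": "title",
    "dc:creator": "author",
    "dc:date": "date",
}

def normalize(source_record):
    """Map a heterogeneous source record onto the common registry schema."""
    return {target: source_record[src]
            for src, target in FIELD_MAP.items()
            if src in source_record}

rec = normalize({"dc:title": "Faust", "dc:creator": "Goethe", "dc:format": "pdf"})
```

In practice each backend source (EROMM, EDL, OCLC, …) would bring its own mapping table into this layer.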
8. Metadata Store
• Bibliographic → matching / score (“what”)
– Title
– Author
– Date
– Place of publication
– Number of pages (?)
– Language
– Print / format
– Edition
• Technical
– Resolution
– Color depth
– File type / compression
• Accessibility → additional judging (“who, where, which quality, how accessible”)
– Institution
– Persistent identifier
– Rights
– URL
• Status → decisive factor (“when”)
– Digitized
– In progress
– Intended (timeline?)
– Requested?
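The field groups above could translate into a record type along these lines; the class name, types, and status vocabulary are assumptions, not the project's actual model:

```python
# Hypothetical record for the metadata store, following the four field
# groups on this slide. Names, types, and defaults are assumptions.
from dataclasses import dataclass
from typing import Optional

@dataclass
class RegistryRecord:
    # Bibliographic: drives matching / scoring ("what")
    title: str
    author: Optional[str] = None
    date: Optional[str] = None
    place: Optional[str] = None
    pages: Optional[int] = None
    language: Optional[str] = None
    edition: Optional[str] = None
    # Technical: image quality
    resolution_dpi: Optional[int] = None
    color_depth: Optional[int] = None
    file_type: Optional[str] = None
    # Accessibility: "who, where, which quality, how accessible"
    institution: Optional[str] = None
    persistent_id: Optional[str] = None
    rights: Optional[str] = None
    url: Optional[str] = None
    # Status: the decisive factor ("when")
    status: str = "intended"  # digitized | in_progress | intended | requested
```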
9. Obstacles
• (open source) Tools for automated matching /
scoring?
• Interface for manual comparison / decision making
• Multivolume works: low rate of uniformity (near
50% of physical SUB stock before 1900)
• Unicode
• Transliteration tables
• Random bound books
• Reliable identifier
– ISBN for old books?
• Anticipated rate of accuracy: 50 – 70 %
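As a baseline for the automated matching/scoring tools this slide asks about, even the Python standard library's `SequenceMatcher` yields a usable similarity score; the field weights and threshold implied here are illustrative assumptions:

```python
# A naive matching/scoring baseline over title, author, and date.
# Weights (0.6 / 0.3 / 0.1) are illustrative assumptions, not a spec.
from difflib import SequenceMatcher

def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_score(rec_a, rec_b):
    """Weighted score over title/author/date; values near 1.0 suggest a duplicate."""
    score = 0.6 * similarity(rec_a["title"], rec_b["title"])
    score += 0.3 * similarity(rec_a.get("author", ""), rec_b.get("author", ""))
    date_a, date_b = rec_a.get("date"), rec_b.get("date")
    score += 0.1 * (1.0 if date_a and date_a == date_b else 0.0)
    return score

s = match_score(
    {"title": "Kritik der reinen Vernunft", "author": "Kant, Immanuel", "date": "1781"},
    {"title": "Critik der reinen Vernunft", "author": "Immanuel Kant", "date": "1781"},
)
```

Such a crude score would of course run into exactly the obstacles listed above (transliteration, multivolume works, random bound books), which is why a manual comparison interface is still needed.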
10. Appreciation of Values
• The goal is NOT to build a reliable database in terms of library standards
• But to prevent further waste of resources
• If we manage to achieve just 50% precision,
• we still save at least 50% of the funding!
11. Work Packages
• Define metadata model
• Set up database
• Implement mapping tools
• Define API calls
• Implement API
• Build some connectors to popular mass digitization workflow
tools (e.g. “Goobi”)
• Establish ISBN workflow
• Harvest existing sources
• Start with a community of actual projects
• Get some (!) funding
• Estimated schedule: 6 months