On the Two Sides of the Pond
By Hans-Jörg Lieder,

Head of the Department of Bibliographic Services – Union Catalogue of Serials
Staatsbibliothek zu Berlin - Preußischer Kulturbesitz;

Dr. Katalin Radics,

Distinguished Librarian; Librarian of the West European Collections and Classics
Young Research Library, University of California, Los Angeles
UNIQUE EUROPEAN MATERIALS – HELD IN A US LIBRARY
Partnership between the UCLA Library and the Staatsbibliothek zu Berlin
Newspapers in the process of discoloration and disintegration
Storage facility of the University of California Libraries on
the UCLA campus
- Leaflets, 13″ × 18.5″ (33 cm × 47 cm)
- Imprint indicating the title, date and issue number; warning
- Published four or five times a day
UCLA stamps, including receiving dates
Packed in wrapping paper, probably after 1940, in packages of 700–800 sheets
No documentation (ordering or receiving records) in the library archives; no correspondence
Normal serial subscription scheme (?)
Very minimal cataloging record; very low use
Towards a Weeding Decision
Brittle condition
Check for other holdings in California, US and world libraries
OCLC: no other holdings at the time of checking
Nine 1938 issues at the BnF (Bibliothèque nationale de France)
No holdings at the German National Library (Deutsche Nationalbibliothek)
Contact with the head of the Zeitungsabteilung (Newspaper Department), Staatsbibliothek: no holdings in Germany
UNIQUE!!!
Decision: keep and preserve the UCLA holdings.
Keep and Preserve
9,600 pages
1936–1940, with gaps
Acid-free boxes
The most fragile pages in Mylar
Digitization Project
Funding for digitization
Highest-quality resolution: 600 dpi
RGB
Add minimal metadata
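To give a sense of scale, here is a back-of-the-envelope estimate (an assumption, not a figure reported by the project): if each 13″ × 18.5″ leaflet were captured edge to edge at 600 dpi in 24-bit RGB (8 bits per channel) with no compression, the raw image size per page and for all 9,600 pages can be computed as follows.

```python
# Rough storage estimate for uncompressed 600 dpi RGB scans.
# Assumptions (not project figures): the full 13" x 18.5" sheet is captured,
# 3 bytes per pixel (8 bits per RGB channel), no compression, no cropping.

DPI = 600
WIDTH_IN, HEIGHT_IN = 13.0, 18.5
BYTES_PER_PIXEL = 3  # 24-bit RGB
PAGES = 9600


def pixels(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Return (width_px, height_px) for a sheet of the given size in inches."""
    return round(width_in * dpi), round(height_in * dpi)


def raw_bytes_per_page(width_in: float, height_in: float, dpi: int,
                       bytes_per_pixel: int) -> int:
    """Uncompressed size of one scanned page in bytes."""
    w, h = pixels(width_in, height_in, dpi)
    return w * h * bytes_per_pixel


if __name__ == "__main__":
    w, h = pixels(WIDTH_IN, HEIGHT_IN, DPI)
    per_page = raw_bytes_per_page(WIDTH_IN, HEIGHT_IN, DPI, BYTES_PER_PIXEL)
    total = per_page * PAGES
    print(f"{w} x {h} px, {per_page / 1e6:.1f} MB per page, "
          f"{total / 1e12:.2f} TB for {PAGES} pages")
```

Under these assumptions each page is 7800 × 11100 pixels, about 260 MB uncompressed, or roughly 2.5 TB for the whole run; lossless compression or cropping to the printed area would reduce this.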
Title: Deutsches Nachrichtenbüro. 5 Jahrg., Nr. 1581, 1938 October 1, Erste Morgen-Ausgabe
Alt ID: 3813183_1938-10-01_1581 [Local]
AltTitle: Erste Morgen-Ausgabe [Descriptive]; Deutsches Nachrichtenbüro [Descriptive]
Date: October 1, 1938 [Publication]; 1938-10-01 [Normalized]
Format: 1 p. [Extent]
Language: ger
Name: University of California, Los Angeles. Library. Dept. of Special Collections [Repository]
Type: newspapers [Genre]; text [Type Of Resource]
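A record like the one above could be assembled programmatically from a handful of inputs. The sketch below assumes, from the single example shown, that the Alt ID is composed as `<prefix>_<normalized date>_<issue number>` and that publication dates are written as "Month D, YYYY" in English; `build_record` and `id_prefix` are hypothetical names, not part of the UCLA Digital Library software.

```python
from datetime import datetime


def normalize_date(publication_date: str) -> str:
    """Convert e.g. 'October 1, 1938' to the ISO form '1938-10-01'.

    Assumes English month names (the default C locale).
    """
    return datetime.strptime(publication_date, "%B %d, %Y").date().isoformat()


def build_record(id_prefix: str, volume: int, issue: int,
                 publication_date: str, edition: str) -> dict[str, object]:
    """Build a minimal metadata record mirroring the fields shown above.

    The Alt ID pattern <prefix>_<ISO date>_<issue> is inferred from one example.
    """
    iso = normalize_date(publication_date)
    year, month_name, day = iso[:4], publication_date.split()[0], int(iso[8:])
    title = (f"Deutsches Nachrichtenbüro. {volume} Jahrg., Nr. {issue}, "
             f"{year} {month_name} {day}, {edition}")
    return {
        "Title": title,
        "Alt ID": f"{id_prefix}_{iso}_{issue} [Local]",
        "AltTitle": [f"{edition} [Descriptive]",
                     "Deutsches Nachrichtenbüro [Descriptive]"],
        "Date": [f"{publication_date} [Publication]", f"{iso} [Normalized]"],
        "Format": "1 p. [Extent]",
        "Language": "ger",
        "Name": ("University of California, Los Angeles. Library. "
                 "Dept. of Special Collections [Repository]"),
        "Type": ["newspapers [Genre]", "text [Type Of Resource]"],
    }
```

For example, `build_record("3813183", 5, 1581, "October 1, 1938", "Erste Morgen-Ausgabe")` reproduces the title and Alt ID of the record above.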
Digitized copies are part of the UCLA Digital Library at
http://digital2.library.ucla.edu/ and are freely accessible
Searchable only by date
More sophisticated search capability is needed: the collection is a day-by-day chronicle of the
Third Reich over a short period of time, and users need to search by
- events
- names
- institutions, etc.
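One way to go beyond date-only search is an inverted index from names, events or institutions to the issues that mention them. A minimal sketch, assuming the full text of each issue is already available and using a hand-made list of names in place of real named-entity recognition; `build_entity_index` is a hypothetical helper and the matching is a plain case-insensitive substring test.

```python
from collections import defaultdict


def build_entity_index(issues: dict[str, str],
                       entities: list[str]) -> dict[str, list[str]]:
    """Map each entity to the sorted list of issue IDs whose text mentions it.

    `issues` maps an issue ID (for example an Alt ID) to that issue's text.
    Matching is a case-insensitive substring test, not real NER.
    """
    index: dict[str, set[str]] = defaultdict(set)
    for issue_id, text in issues.items():
        lowered = text.lower()
        for entity in entities:
            if entity.lower() in lowered:
                index[entity].add(issue_id)
    # Entities that never occur are simply absent from the result.
    return {entity: sorted(ids) for entity, ids in index.items()}
```

A real index would take its entities from NER output and store article-level positions, but the lookup structure (entity to list of issues) stays the same.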
Deutsches Nachrichtenbüro, December 5, 1933: network of 36 local services (Landesdienste)
Indexing needed
Fraktur: a major problem
Transliteration into Latin characters
OCR (Optical Character Recognition) has to be done in Germany
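Text recognized from Fraktur often keeps historical letter forms such as the long s (ſ) and typographic ligatures. A minimal sketch of normalizing such OCR output to modern Latin letters, assuming the OCR engine emits these Unicode characters; the mapping table is illustrative and far from complete (it does not touch abbreviations or historical spellings).

```python
# Map historical letter forms and ligatures to modern Latin letters.
# Illustrative subset only; extend as the OCR output requires.
FRAKTUR_TO_LATIN = str.maketrans({
    "\u017f": "s",   # ſ  LATIN SMALL LETTER LONG S
    "\ufb00": "ff",  # ﬀ  LATIN SMALL LIGATURE FF
    "\ufb01": "fi",  # ﬁ  LATIN SMALL LIGATURE FI
    "\ufb02": "fl",  # ﬂ  LATIN SMALL LIGATURE FL
    "\ufb05": "st",  # ﬅ  LATIN SMALL LIGATURE LONG S T
    "\ufb06": "st",  # ﬆ  LATIN SMALL LIGATURE ST
    "\ua75b": "r",   # ꝛ  LATIN SMALL LETTER R ROTUNDA
})


def to_latin(text: str) -> str:
    """Replace Fraktur-specific characters; all other characters pass through."""
    return text.translate(FRAKTUR_TO_LATIN)
```

An explicit table is used rather than Unicode NFKC normalization: NFKC would also fold ſ to s, but it rewrites many other characters as well, which is undesirable when the goal is faithful full text.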

Looking for a German Partner
Not a problem … here we are!
… but who are “we”?
• Project: Europeana Newspapers: http://www.europeana-newspapers.eu/
• 18 partners from 12 countries
• Tasks:
• Provide OCR for 18 million pages
• Provide OLR for 2 million pages
• Provide experimental NER in selected languages
• Provide best-practice recommendations for newspaper metadata
• Provide quality prediction tools
• Aggregate content and make it available to TEL and Europeana
OCR = Optical Character Recognition
OLR = Optical Layout Recognition
NER = Named Entity Recognition
A Dance of Acronyms: UCLA, SBB and CCS
UCLA sent the data on a hard drive
SBB
• Checked the data for correctness and moved the images into a directory structure
• Sent data to CCS in Hamburg for OCR and OLR
CCS (Content Conversion Specialists)
• Created full texts per article
• Loaded the data into the NZ web service for preliminary presentation
SBB
• Will perform QA of OCR and OLR results
• Will provide all data to UCLA for further use
• Will present the data in ZEFYS, its own newspaper portal, and deliver it to the
Deutsche Digitale Bibliothek, to TEL (The European Library) and to
Europeana
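SBB's correctness check and re-filing of the images can be pictured with a small sketch. It assumes, hypothetically, that the image files are named like the Alt ID (for example `3813183_1938-10-01_1581.tif`) and that the target layout is `<prefix>/<year>/<month>/<day>/<issue>.tif`; neither the naming rule nor the directory layout is documented in these slides.

```python
import re
from datetime import date
from pathlib import PurePosixPath

# Hypothetical file-name pattern: <prefix>_<YYYY-MM-DD>_<issue number>.tif
FILENAME = re.compile(r"^(\d+)_(\d{4})-(\d{2})-(\d{2})_(\d+)\.tif$")


def target_path(filename: str) -> PurePosixPath:
    """Validate a file name and return its place in the target directory tree.

    Raises ValueError if the name does not match the pattern
    or encodes an impossible date.
    """
    match = FILENAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    prefix, year, month, day, issue = match.groups()
    date(int(year), int(month), int(day))  # raises ValueError for e.g. Feb 30
    return PurePosixPath(prefix, year, month, day, f"{issue}.tif")


def check_batch(filenames: list[str]) -> tuple[list[PurePosixPath], list[str]]:
    """Split a delivery into target paths for valid names and rejected names."""
    accepted: list[PurePosixPath] = []
    rejected: list[str] = []
    for name in filenames:
        try:
            accepted.append(target_path(name))
        except ValueError:
            rejected.append(name)
    return accepted, rejected
```

Rejected names would go back to the content provider for clarification instead of being moved, which keeps the directory tree consistent.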
Layout and structure analysis
• Recognition of words, text lines, text blocks and columns, and classification of text blocks,
illustrations, advertisements, tables and the following page types:
- title page (the title page of an issue)
- content page (a page that consists of content/text only)
- illustration page (a page that has at least one illustration)
- advertisement page (a page that contains adverts only)

• Structure analysis through classification of headlines and grouping of zones into articles
(incl. article continuation)
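Because the four page types are defined by what a page contains, a rule-based classifier over already-recognized blocks follows directly from the definitions. A minimal sketch, assuming each page arrives as a list of block labels plus a flag saying whether it is the first page of the issue; this illustrates the definitions and is not how docWorks actually classifies pages. Pages that fit none of the definitions exactly (for example text mixed with adverts) fall back to "content page", which is also an assumption.

```python
from enum import Enum


class PageType(Enum):
    TITLE = "title page"
    CONTENT = "content page"
    ILLUSTRATION = "illustration page"
    ADVERTISEMENT = "advertisement page"


def classify_page(block_labels: list[str], first_page_of_issue: bool) -> PageType:
    """Assign a page type from block labels such as 'text', 'illustration',
    'advertisement' or 'table'. Rules are checked in this order."""
    if first_page_of_issue:
        return PageType.TITLE  # the title page of an issue
    if block_labels and all(label == "advertisement" for label in block_labels):
        return PageType.ADVERTISEMENT  # adverts only
    if "illustration" in block_labels:
        return PageType.ILLUSTRATION  # at least one illustration
    return PageType.CONTENT  # fallback: text, tables, mixed pages
```

The order of the checks matters: a title page with a picture is still a title page, and an all-advert page is not counted as an illustration page.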
ENP OLR workflow | Conversion without scanning
[Workflow diagram. Material location: digital image, metadata, delivery. Conversion facility: inspection, automatic QA, conversion, MD recording, doc delivery, quality assurance, with a reject path; the digital object is returned.]

@ CCS | Automated markup and basic manual correction:
- headlines, illustrations, tables, captions, advertisements, etc.
- article segmentation and grouping of zones into articles (incl. continuation)
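Grouping zones into articles, including continuations onto later pages, amounts to collecting every zone that segmentation assigned to the same article, in reading order. A minimal sketch, assuming a hypothetical data model in which each zone already carries an article ID, a page number and a reading-order position; docWorks' internal representation is not described in these slides.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    page: int         # page number within the issue
    order: int        # reading order on the page
    kind: str         # e.g. "headline", "text", "caption", "illustration"
    article_id: str   # assigned during article segmentation
    text: str = ""


def group_articles(zones: list[Zone]) -> dict[str, list[Zone]]:
    """Group zones by article in page/reading order, so that an article
    continued on a later page ends up as a single unit."""
    articles: dict[str, list[Zone]] = defaultdict(list)
    for zone in sorted(zones, key=lambda z: (z.page, z.order)):
        articles[zone.article_id].append(zone)
    return dict(articles)


def article_text(zones: list[Zone]) -> str:
    """Concatenate the text of an article's headline, text and caption zones."""
    return "\n".join(z.text for z in zones
                     if z.kind in ("headline", "text", "caption") and z.text)
```

Articles come out in the order of their first zone, which for a newspaper issue is usually a reasonable reading order.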



@ Content Provider (Library)
Recommended:
- Zoning: correct classification of blocks as “text” or “illustration”
- Article segmentation: correct identification of headlines/text blocks/captions
- Grouping: correct grouping of blocks (text, illustration) into articles
- Metadata: correct title, issue date and issue number
Optional:
- Page types: correct page types
- Page numbers: correct page sequence
- OCR: perform text correction of specific zones (e.g. headlines, captions)
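Checking the page sequence, one of the optional corrections above, is easy to automate before anyone inspects pages by hand. A minimal sketch, assuming the page numbers of an issue are available as integers in the order the images appear and that numbering should run 1..n; `check_page_sequence` is a hypothetical helper, not part of any ENP tool.

```python
from collections import Counter


def check_page_sequence(page_numbers: list[int]) -> dict[str, list[int]]:
    """Report gaps, duplicates and ordering errors in an issue's page numbers."""
    counts = Counter(page_numbers)
    expected = set(range(1, max(page_numbers, default=0) + 1))
    return {
        # numbers between 1 and the highest page number that never occur
        "missing": sorted(expected - set(counts)),
        # numbers that occur more than once
        "duplicate": sorted(n for n, c in counts.items() if c > 1),
        # numbers smaller than the one immediately before them
        "out_of_order": [n for prev, n in zip(page_numbers, page_numbers[1:])
                         if n < prev],
    }
```

An issue is in order when all three lists are empty; anything else can be routed to manual correction.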
Output | METS/ALTO package


METS/ALTO metadata schemas to describe the structured digital output object



A newspaper issue processed in docWorks is converted into one METS XML
file. It reflects the whole physical and logical structure of the issue and manages all links to the
image files and the related ALTO XML files. ALTO is based on a standardized
page description schema and contains all information about a page (print space,
margins, coordinates, OCR results).
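To make the ALTO side concrete: each recognized word is a `String` element with its text in the `CONTENT` attribute and its position in `HPOS`/`VPOS`/`WIDTH`/`HEIGHT`, nested inside `TextLine`, `TextBlock`, `PrintSpace` and `Page`. A minimal sketch that reads line text and block coordinates with Python's standard library, assuming the ALTO v2 namespace; the sample is a hand-written fragment, not actual docWorks output, and real files (other ALTO versions, hyphenation via `HYP`, non-pixel measurement units) need more handling.

```python
import xml.etree.ElementTree as ET

# ALTO v2 namespace; other ALTO versions use a different URI.
NS = {"alto": "http://www.loc.gov/standards/alto/ns-v2#"}

# Hand-written illustrative fragment, not real output.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1" WIDTH="7800" HEIGHT="11100" PHYSICAL_IMG_NR="1">
      <PrintSpace HPOS="300" VPOS="300" WIDTH="7200" HEIGHT="10500">
        <TextBlock ID="TB1" HPOS="300" VPOS="300" WIDTH="3000" HEIGHT="120">
          <TextLine ID="TL1" HPOS="300" VPOS="300" WIDTH="3000" HEIGHT="120">
            <String ID="S1" CONTENT="Deutsches" HPOS="300" VPOS="300" WIDTH="1100" HEIGHT="120"/>
            <SP/>
            <String ID="S2" CONTENT="Nachrichtenbüro" HPOS="1500" VPOS="300" WIDTH="1800" HEIGHT="120"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def line_texts(alto_xml: str) -> list[str]:
    """Return the text of every TextLine, words joined by single spaces."""
    root = ET.fromstring(alto_xml)
    return [
        " ".join(s.get("CONTENT", "") for s in line.findall("alto:String", NS))
        for line in root.findall(".//alto:TextLine", NS)
    ]


def block_boxes(alto_xml: str) -> dict[str, tuple[int, ...]]:
    """Map each TextBlock ID to (HPOS, VPOS, WIDTH, HEIGHT); pixels assumed."""
    root = ET.fromstring(alto_xml)
    return {
        block.get("ID"): tuple(int(block.get(k))
                               for k in ("HPOS", "VPOS", "WIDTH", "HEIGHT"))
        for block in root.findall(".//alto:TextBlock", NS)
    }
```

The coordinates are what allow a viewer to highlight a search hit on the page image, while the METS file ties each ALTO page to its image and to the article structure.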



Benefits of structural markup:
- better browsing and more precise text search
- better access and display on tablet and mobile devices
- automated article classification and clustering through data/text mining and
linguistic technologies
- user engagement for manual online text correction, article classification,
annotation, building personal collections, etc.
- sharing articles via social media platforms like Facebook, Twitter, etc.
_______________
METS = Metadata Encoding and Transmission Standard
ALTO = Analyzed Layout and Text Object

 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 

On the two sides of the pond

  • 1. On the Two Sides of the Pond By Hans-Jörg Lieder, Head of the Department of Bibliographic Services – Union Catalogue of Serials Staatsbibliothek zu Berlin - Preußischer Kulturbesitz; Dr. Katalin Radics, Distinguished Librarian; Librarian of the West European Collections and Classics Young Research Library, University of California, Los Angeles
  • 2. UNIQUE EUROPEAN MATERIALS – HELD IN A US LIBRARY
  • 3. Partnership between the UCLA Library and Staatsbibliothek zu Berlin
  • 4. Newspapers on the way to discoloring and disintegration Storage facility of the University of California Libraries on the UCLA campus
  • 5. Leaflets, 13 in × 18.5 in (33 cm × 47 cm); imprint indicating the title, date and issue number; warning; published four or five times a day
  • 6. UCLA stamps, including receiving dates; packed in wrapping paper, probably after 1940, in packages of 700-800 sheets; no documentation (ordering or receiving records) in the library archives; no correspondence; normal serial subscription scheme (?); very minimal cataloging record; very low use
  • 7. Towards a Weeding Decision: brittle condition; check for other holdings in California, US and world libraries. OCLC: no other holdings at the time of checking; nine 1938 issues at the BnF (Bibliothèque nationale de France); no holdings at the German National Library (Deutsche Nationalbibliothek); contact with the head of the Zeitungsabteilung (Newspaper Department), Staatsbibliothek: no holdings in Germany. UNIQUE!!! Decision: keep and preserve the UCLA holdings.
  • 8. Keep and Preserve: 9600 pages, 1936-1940 with gaps; acid-free boxes; the most fragile pages in Mylar
  • 9. Digitization Project: funding for digitization; highest-quality resolution, 600 dpi; RGB; add minimal metadata
  • 10. Title: Deutsches Nachrichtenbüro. 5 Jahrg., Nr. 1581, 1938 October 1, Erste Morgen-Ausgabe | Alt ID: 3813183_1938-10-01_1581 [Local] | AltTitle: Erste Morgen-Ausgabe [Descriptive]; Deutsches Nachrichtenbüro [Descriptive] | Date: October 1, 1938 [Publication]; 1938-10-01 [Normalized] | Format: 1 p. [Extent] | Language: ger | Name: University of California, Los Angeles. Library. Dept. of Special Collections [Repository] | Type: newspapers [Genre]; text [Type Of Resource]
  • 11. Digitized copies: part of the UCLA Digital Library at http://digital2.library.ucla.edu/, freely accessible but searchable only by date. More sophisticated searching is needed to use the collection as a day-by-day chronicle of the Third Reich over a short period of time: events, names, institutions, etc. Deutsches Nachrichtenbüro: December 5, 1933; network of 36 local services (Landesdienste)
  • 12. Indexing needed. Fraktur is a major problem: transliteration into Latin characters; OCR (Optical Character Recognition) has to be done in Germany. Looking for a German partner.
  • 13. Not a problem … here we are!
  • 14. … but who are “we”? • Project: Europeana Newspapers: http://www.europeana-newspapers.eu/ • 18 partners from 12 countries • Tasks: • Provide OCR for 18 million pages • Provide OLR for 2 million pages • Provide NER experimentally in assorted languages • Provide best practice recommendations for newspaper metadata • Provide quality prediction tools • Aggregate content and make it available to TEL and Europeana OCR = Optical Character Recognition OLR = Optical Layout Recognition NER = Named Entities Recognition
  • 15. A Dance of Acronyms: UCLA, SBB and CCS. UCLA sent the data on a hard drive. SBB • Checked the data for correctness and moved the images into a directory structure • Sent the data to CCS in Hamburg for OCR and OLR. CCS (Content Conversion Specialists) • Created full texts per article • Loaded the data into the NZ web service for preliminary presentation purposes. SBB • Will perform QA of the OCR and OLR results • Will provide all data to UCLA for further use • Will present the data in ZEFYS, its own newspaper portal, and in the Deutsche Digitale Bibliothek, TEL (The European Library) and Europeana
  • 16. Layout and structure analysis: recognition of words, text lines, text blocks and columns; classification of text blocks, illustrations, advertisements and tables, and of the following page types: - title page (the title page of an issue) - content page (a page that consists of content/text only) - illustration page (a page that has at least one illustration) - advertisement page (a page that contains adverts only). Structure analysis: classification of headlines and grouping of zones into articles (incl. article continuation)
  • 17. ENP OLR workflow | Conversion without scanning [workflow diagram; labels: Material location, Conversion facility, Digital image, Metadata, Inspection, Automatic QA, Conversion, MD recording, Reject, Delivery, Doc delivery, Digital object, Return]
  • 18. Quality assurance. @ CCS: automated markup and basic manual correction: - headlines, illustrations, tables, captions, advertisements, etc. - article segmentation and grouping of zones into articles (incl. continuation). @ Content Provider (Library): Recommended: - Zoning: correct classification of blocks as "text" or "illustration" - Article segmentation: correct identification of headlines/text blocks/captions - Grouping: correct grouping of blocks (text, illustration) into articles - Metadata: correct title, issue date and issue number. Optional: - Page types: correct page types - Page numbers: correct page sequence - OCR: perform text correction of specific zones (e.g. headlines, captions)
  • 19. Output | METS/ALTO package. METS/ALTO metadata schemas describe the structured digital output object. A newspaper issue processed in docWorks is converted into one METS XML file, which reflects the whole physical and logical structure and manages all links to the image files and the related ALTO XML files. ALTO is based on a standardized page description schema and contains all information about a page (print space, margins, coordinates, OCR results). Benefits of structural markup: - better browsing and more precise text search - better access and display on tablet and mobile devices - automated article classification and clustering through data/text mining and linguistic technologies - user engagement for manual online text correction, article classification, annotation, building personal collections, etc. - sharing articles via social media platforms like Facebook, Twitter, etc. (METS = Metadata Encoding and Transmission Standard; ALTO = Analyzed Layout and Text Object)
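Slides 5, 8 and 9 give enough numbers for a back-of-the-envelope storage estimate. The sketch below assumes (none of this is stated in the slides) that each scan covers the full 13 × 18.5 in leaf, uses 8 bits per RGB channel and is stored uncompressed; real master files may be cropped or compressed, so treat the result as an order of magnitude only.

```python
# Rough, uncompressed storage estimate for the scans described in slides 5, 8 and 9.
# Assumptions (not from the slides): full-leaf capture, 8 bits per RGB channel,
# no compression and no file-format overhead.

DPI = 600
WIDTH_IN, HEIGHT_IN = 13, 18.5
BYTES_PER_PIXEL = 3  # 8-bit R, G and B
PAGES = 9600


def page_pixels(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Return the pixel dimensions of a page scanned at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_px: int, height_px: int, bytes_per_pixel: int) -> int:
    """Raw size of one RGB image before any compression."""
    return width_px * height_px * bytes_per_pixel


w, h = page_pixels(WIDTH_IN, HEIGHT_IN, DPI)
per_page = uncompressed_bytes(w, h, BYTES_PER_PIXEL)
total = per_page * PAGES

# Prints: 7800 x 11100 px, 259.7 MB per page, 2.49 TB in total
print(f"{w} x {h} px, {per_page / 1e6:.1f} MB per page, {total / 1e12:.2f} TB in total")
```

At roughly a quarter of a gigabyte per page, the collection would run to a few terabytes under these assumptions, which is consistent with shipping the data on a hard drive (slide 15).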
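The Alt ID on slide 10 (3813183_1938-10-01_1581) appears to combine a local number, the normalized publication date and the issue number. Assuming that pattern holds for every record (it is inferred from this single example, not from UCLA documentation), a small parser lets scripts recover the date and issue number from record or file names, for instance to support the date-based access described on slide 11. The names `AltId`, `parse_alt_id` and `format_alt_id` are hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import date

# Pattern inferred from one example on slide 10: <local id>_<ISO date>_<issue number>.
# This is an assumption, not a documented UCLA naming rule.
ALT_ID_RE = re.compile(r"^(?P<local>\d+)_(?P<date>\d{4}-\d{2}-\d{2})_(?P<issue>\d+)$")


@dataclass(frozen=True)
class AltId:
    local_id: str  # meaning not explained in the slides; kept as an opaque string
    published: date
    issue: int


def parse_alt_id(alt_id: str) -> AltId:
    """Split an Alt ID into its parts, raising ValueError if it does not match."""
    m = ALT_ID_RE.match(alt_id)
    if m is None:
        raise ValueError(f"unexpected Alt ID format: {alt_id!r}")
    return AltId(
        local_id=m.group("local"),
        # fromisoformat also rejects impossible dates such as 1938-13-01
        published=date.fromisoformat(m.group("date")),
        issue=int(m.group("issue")),
    )


def format_alt_id(a: AltId) -> str:
    """Inverse of parse_alt_id."""
    return f"{a.local_id}_{a.published.isoformat()}_{a.issue}"
```

Because four or five issues appeared per day (slide 5), the issue number rather than the date is what distinguishes records published on the same day.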
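Slides 11 and 12 call for searching by events, names and institutions rather than by date alone. Once OCR text exists, the core structure behind such search is an inverted index from words to the issues that contain them. The sketch below is a toy version: the function names are hypothetical, the issue IDs and texts in the usage check are invented placeholders, and a real system would likely add stemming, phrase search and, for names and institutions, the experimental NER mentioned on slide 14.

```python
import re
from collections import defaultdict

# \w is Unicode-aware for str patterns, so umlauts, ß and the Fraktur long s (ſ) are kept.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    """Split text into words; casefold() also folds the long s (ſ) to s,
    which helps if the OCR output retains it."""
    return [t.casefold() for t in TOKEN_RE.findall(text)]


def build_index(issues: dict[str, str]) -> dict[str, set[str]]:
    """Map each normalized word to the set of issue IDs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for issue_id, text in issues.items():
        for token in tokenize(text):
            index[token].add(issue_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the IDs of issues containing every query word (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```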
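The page types on slide 16 are defined by what a page contains, so once zones have been classified they can be turned into simple rules. The slide does not say how overlapping cases are resolved, so the precedence below (first page of an issue, then adverts only, then any illustration, then text only) and the zone labels such as `"text"` and `"advertisement"` are assumptions; pages that fit none of the four definitions fall into a hypothetical `OTHER` bucket. This is an illustration, not the docWorks implementation.

```python
from enum import Enum


class PageType(Enum):
    TITLE = "title page"
    CONTENT = "content page"
    ILLUSTRATION = "illustration page"
    ADVERTISEMENT = "advertisement page"
    OTHER = "other"  # hypothetical fallback: slide 16 does not cover mixed pages


# Hypothetical zone labels treated as "content/text only".
TEXT_ZONES = {"text", "headline"}


def classify_page(page_index: int, zone_types: list[str]) -> PageType:
    """Classify a page from the types of its zones.

    Assumed precedence (not specified on slide 16): the first page of an issue
    is the title page; then adverts only; then any illustration; then text only.
    """
    if page_index == 0:
        return PageType.TITLE
    kinds = set(zone_types)
    if kinds == {"advertisement"}:
        return PageType.ADVERTISEMENT
    if "illustration" in kinds:
        return PageType.ILLUSTRATION
    if kinds and kinds <= TEXT_ZONES:
        return PageType.CONTENT
    return PageType.OTHER
```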
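Two of the library-side checks on slide 18, correct issue metadata and (optionally) a correct page sequence, are mechanical enough to script. The sketch assumes the expected values come from the library's own catalog record and the delivered values from the converter's output; `IssueMetadata`, `check_metadata` and `check_page_sequence` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class IssueMetadata:
    title: str
    issue_date: date
    issue_number: int


def check_metadata(expected: IssueMetadata, delivered: IssueMetadata) -> list[str]:
    """Describe each field where the delivered metadata disagrees with the expected record."""
    problems = []
    for field in ("title", "issue_date", "issue_number"):
        want, got = getattr(expected, field), getattr(delivered, field)
        if want != got:
            problems.append(f"{field}: expected {want!r}, got {got!r}")
    return problems


def check_page_sequence(page_numbers: list[int]) -> list[str]:
    """Report positions where the page numbers break the sequence 1, 2, 3, ..."""
    return [
        f"position {pos}: expected page {pos}, got {num}"
        for pos, num in enumerate(page_numbers, start=1)
        if num != pos
    ]
```

An empty list from both functions means the issue passes these two checks; anything else can be routed back for manual correction.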
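Slide 19 notes that each ALTO file holds the OCR results and coordinates for a page. The sketch below uses Python's standard `xml.etree.ElementTree` to pull line text and word boxes out of a trimmed, hand-written ALTO-style sample (not an actual file from this project). It matches elements by local name so that it does not depend on a particular ALTO namespace version, and it ignores hyphenation markup and other elements that real files may contain; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written ALTO-style sample; real files carry many more elements and attributes.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Deutsches" HPOS="100" VPOS="200" WIDTH="400" HEIGHT="60"/>
            <SP/>
            <String CONTENT="Nachrichtenbüro" HPOS="520" VPOS="200" WIDTH="700" HEIGHT="60"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local(tag: str) -> str:
    """Strip the '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def text_lines(alto_xml: str) -> list[str]:
    """Return the text of each TextLine, joining its String CONTENT values with spaces."""
    root = ET.fromstring(alto_xml)
    lines = []
    for el in root.iter():
        if local(el.tag) == "TextLine":
            words = [s.get("CONTENT", "") for s in el if local(s.tag) == "String"]
            lines.append(" ".join(words))
    return lines


def word_boxes(alto_xml: str) -> list[tuple[str, int, int, int, int]]:
    """Return (word, HPOS, VPOS, WIDTH, HEIGHT) for every String element."""
    root = ET.fromstring(alto_xml)
    boxes = []
    for el in root.iter():
        if local(el.tag) == "String":
            # float() first, to accept decimal coordinate values defensively
            x, y, w, h = (int(float(el.get(a, "0"))) for a in ("HPOS", "VPOS", "WIDTH", "HEIGHT"))
            boxes.append((el.get("CONTENT", ""), x, y, w, h))
    return boxes
```

Word boxes like these are what make it possible to highlight search hits on the page image or to correct OCR text zone by zone, as slide 18 suggests.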