Text and Structure Recognition for Historical Newspapers
Günter Mühlberger 
University of Innsbruck, Digitisation and Electronic Archiving
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp 
Who we are 
•Digitisation and Digital Preservation group @ University of Innsbruck 
•Involved in digitisation and Optical Character Recognition (OCR) since the mid-1990s
•Research projects: LAURIN, METADATA ENGINE, books2u!, reUSE, Digitisation on Demand, eBooks on Demand, IMPACT, PrestoPRIME, ARROW+, Europeana Newspapers, tranScriptorium,… 
•Our mission: “Digitisation of humanities” = Digital Humanities 
•Selected digitisation projects
•Austrian Literature Online (since 2002) 
•Digitisation of the Innsbrucker Newspaper Archive (2004-2006) 
•Digitisation of the Tiroler Tageszeitung, 1945-2003 (2012-2014) 
•Text recognition of 8 million newspaper pages within Europeana Newspapers 
•Commercial services via the University's technology transfer platform 
Digitisation 
IMAGE CAPTURING 
TEXT & STRUCTURE RECOGNITION 
NATURAL LANGUAGE PROCESSING 
CONTENT REPRESENTATION
Example – Index card: Capturing 
OCR Interface 
Raw OCR Text 
“Â.”- ikonogr. 
religiös 
V oragine , Jacob a ; LEGENDA AUREA Dresdae ÄLipsiae 1846
Structure Recognition 
“Â.”- ikonogr. 
religiös 
V oragine , Jacob a ; LEGENDA AUREA Dresdae ÄLipsiae 1846
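Structure recognition on this card can be sketched as a few pattern rules that assign the raw OCR line to bibliographic fields. This is a minimal illustration, not the system used in the project: it assumes a hypothetical card layout `Author ; TITLE Place Year`, and the function name `parse_card` is illustrative. OCR errors such as the split "V oragine" deliberately survive this step; correcting them is left to the language-processing stage.

```python
import re


def parse_card(raw: str) -> dict:
    """Assign the raw OCR line of an index card to rough bibliographic fields.

    Hypothetical rules for one assumed card layout: 'Author ; TITLE Place Year'.
    OCR errors are not corrected here.
    """
    fields = {}
    # Year: the last four-digit number between 1500 and 1999 on the line
    years = re.findall(r"\b1[5-9]\d\d\b", raw)
    fields["year"] = years[-1] if years else None

    # Author: everything before the first semicolon
    author_part, _, rest = raw.partition(";")
    author = re.sub(r"\s+,", ",", author_part)  # drop spaces before commas
    fields["author"] = re.sub(r"\s+", " ", author).strip()

    # Title: the leading run of all-capital words after the semicolon
    m = re.match(r"\s*((?:[A-ZÄÖÜ]{2,}\s*)+)", rest)
    fields["title"] = m.group(1).strip() if m else None
    return fields


print(parse_card("V oragine , Jacob a ; LEGENDA AUREA Dresdae ÄLipsiae 1846"))
```

On the card above this yields year `1846`, title `LEGENDA AUREA` and the still-garbled author `V oragine, Jacob a`.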
Natural Language Processing 
Voragine, Jacob 
LEGENDA AUREA 
1846 
•Matching with a reference database, e.g. WorldCat
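Matching the parsed fields against a reference database can tolerate residual OCR errors if it compares normalised strings with a similarity score instead of requiring exact equality. A minimal sketch using the standard library's `difflib.SequenceMatcher`; the reference records are a hypothetical local list (no WorldCat API is called), and the 0.8 threshold and the year penalty are assumptions that would need tuning.

```python
import re
from difflib import SequenceMatcher


def normalise(s: str) -> str:
    # Lower-case and keep only letters and digits, so that OCR-split
    # words such as "V oragine" become "voragine"
    return re.sub(r"[^0-9a-zäöüß]", "", s.lower())


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def best_match(record: dict, references: list, threshold: float = 0.8):
    """Return (reference, score) for the best candidate, or (None, score)."""
    best, best_score = None, 0.0
    for ref in references:
        score = (similarity(record["author"], ref["author"])
                 + similarity(record["title"], ref["title"])) / 2
        if record.get("year") and record["year"] != ref.get("year"):
            score *= 0.9  # hypothetical penalty for a differing year
        if score > best_score:
            best, best_score = ref, score
    return (best, best_score) if best_score >= threshold else (None, best_score)


references = [  # hypothetical reference records, not WorldCat data
    {"author": "Voragine, Jacobus de", "title": "Legenda aurea", "year": "1846"},
    {"author": "Goethe, Johann Wolfgang von", "title": "Faust", "year": "1832"},
]
card = {"author": "V oragine, Jacob a", "title": "LEGENDA AUREA", "year": "1846"}
match, score = best_match(card, references)
print(match["title"], round(score, 2))
```

Averaging author and title similarity lets a perfect title match compensate for a damaged author name; a production matcher would typically add more fields and index the reference data rather than scan it.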
Matching with Reference (knowledge) data 
The actual book 
Content Representation 
Instead of a scanned index card, we are able to access/link/work with a full-featured catalogue entry and the actual digitised work 
Instead of digitised newspapers, we want to access/link/work with the content/information/knowledge contained in these newspapers! 
OCR is one important step towards this overall objective! 
OCR – Some Facts 
•Optical Character Recognition 
•“Old” technology: “pattern recognition” 
•Largest progress in the late 1990s 
•Market situation 
•Two large companies: ABBYY, Nuance 
•Cheap technology 
•Open Source tools: Tesseract, Ocropus, Gamera,… 
•Google: worked with ABBYY, has used Tesseract since 2012 
•ABBYY 
•Took part in two EU projects 
•Gothic letters and long “s” supported out of the box; “Old Italian” available as a language 
•Direct export of Analysed Layout and Text Object (ALTO) 
Output 
•Processing 
•University of Innsbruck: 32 ABBYY licences on 4 servers 
•Per day: 10,000 large newspaper pages, 40,000 medium-sized pages or 150,000 book-sized pages 
•PDF 
•Text above the image vs. text behind the image 
•PDF/A Standard 
•Tagged PDF 
•XML - ALTO 
•Keeps all the information: Blocks, type of blocks, languages, lines, words, characters, confidence of words, etc. 
•ALTO: de facto standard, hosted by the Library of Congress 
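Because ALTO keeps blocks, lines, words and word confidences, downstream tools can work directly from the XML. A minimal reading sketch with Python's standard library, assuming the ALTO v2 namespace and a heavily trimmed sample (real files carry many more attributes, such as coordinates). The element matching ignores namespaces, so it should also read other ALTO versions that use the same element names; `read_alto` and the returned field names are illustrative.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    # Strip the XML namespace, e.g. "{http://...alto/ns-v2#}String" -> "String"
    return tag.rsplit("}", 1)[-1]


def read_alto(xml_text: str) -> list:
    """Return one dict per TextBlock: its ID, text and mean word confidence (WC)."""
    root = ET.fromstring(xml_text)
    blocks = []
    for block in root.iter():
        if local(block.tag) != "TextBlock":
            continue
        lines, confs = [], []
        for line in block.iter():
            if local(line.tag) != "TextLine":
                continue
            words = []
            for el in line:
                if local(el.tag) == "String":
                    words.append(el.get("CONTENT", ""))
                    if el.get("WC") is not None:
                        confs.append(float(el.get("WC")))
            lines.append(" ".join(words))
        blocks.append({
            "id": block.get("ID"),
            "text": "\n".join(lines),
            "mean_wc": sum(confs) / len(confs) if confs else None,
        })
    return blocks


SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page ID="P1"><PrintSpace>
    <TextBlock ID="TB1">
      <TextLine>
        <String CONTENT="Innsbrucker" WC="0.92"/><SP/>
        <String CONTENT="Nachrichten" WC="0.88"/>
      </TextLine>
    </TextBlock>
  </PrintSpace></Page></Layout>
</alto>"""

for b in read_alto(SAMPLE):
    print(b["id"], repr(b["text"]), b["mean_wc"])
```

The mean word confidence per block is one cheap signal for deciding which blocks might be worth post-correction.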
Accuracy rates 
•What do we expect? 
•Researchers: critical edition of Shakespeare's works, no error accepted 
•eBooks: fewer than 1 error per 1,000 characters (≈ half a page) 
•Users who are offered full-text search as an additional feature? 
•Academic staff working with a text (copy & paste)? 
•Natural language processing? 
•Knowledge extraction? 
•Word Error Rate (WER) vs. Character Error Rate (CER) 
•WER more meaningful to users 
•WER easier to measure 
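Both rates are conventionally computed as an edit (Levenshtein) distance against a ground-truth transcription, divided by the length of the ground truth; the eBook target above corresponds to a CER of at most 0.001. The sketch below is a plain dynamic-programming version, not the evaluation tool used in IMPACT, and the sample strings are invented. One wrong character costs a single character edit but spoils a whole word, which is why WER comes out much higher than CER on the same text.

```python
from typing import Sequence


def levenshtein(ref: Sequence, hyp: Sequence) -> int:
    """Minimum number of substitutions, insertions and deletions turning ref into hyp.

    Works on strings (character level) and on lists of words (word level).
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: character edits / characters in the ground truth."""
    return levenshtein(ref, hyp) / len(ref)


def wer(ref: str, hyp: str) -> float:
    """Word Error Rate: word edits / words in the ground truth."""
    ref_words = ref.split()
    return levenshtein(ref_words, hyp.split()) / len(ref_words)


ground_truth = "die Zeitung"
ocr_output = "dle Zeitung"  # one misrecognised character
print(round(cer(ground_truth, ocr_output), 3), wer(ground_truth, ocr_output))
```

Here a single error gives a CER of about 9% but a WER of 50%.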
IMPACT EVA/MINERVA 12th Nov. 2008 
Outlook OCR 
•ABBYY 
•For small and medium volumes, up to some tens of millions of pages 
•Tesseract 
•Growing community 
•Can be parallelised on high-performance computing (HPC) clusters (e.g. several hundred or thousand nodes) 
•More experiments are possible with very large volumes, e.g. hundreds of millions of pages 
•Handwritten Text Recognition 
•Next generation of engines for handwritten material 
•Speech and face recognition as technological background 
•Transcription and Recognition Platform 
•Virtual Research Environment 
•Will be released by the University of Innsbruck in 2015 
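Page-level OCR parallelises well because every page can be recognised independently, so a static partition of the page list across nodes is often enough. A minimal sketch of that idea for the Tesseract bullet above: it uses Tesseract's basic `tesseract <image> <outputbase> -l <lang>` form, which writes `<outputbase>.txt`. The German language pack `deu`, the file names and the node count are assumptions, and the script only builds the command lines rather than running them.

```python
from typing import List, Sequence


def shard(pages: Sequence[str], n_nodes: int) -> List[List[str]]:
    """Assign page images round-robin to n_nodes independent workers."""
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    shards: List[List[str]] = [[] for _ in range(n_nodes)]
    for i, page in enumerate(pages):
        shards[i % n_nodes].append(page)
    return shards


def tesseract_commands(pages: Sequence[str], lang: str = "deu") -> List[List[str]]:
    """Build one Tesseract command line per page (not executed here).

    The language pack is an assumption and must be installed on each node.
    """
    return [["tesseract", p, p.rsplit(".", 1)[0], "-l", lang] for p in pages]


pages = [f"page_{n:05d}.tif" for n in range(1, 11)]
for node, part in enumerate(shard(pages, 3)):
    # On a cluster, each node (e.g. one job-array task) would run its own list
    print(node, len(part), tesseract_commands(part)[0])
```

Round-robin keeps shard sizes within one page of each other; with very uneven page sizes, a shared work queue would balance load better than a static split.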
Structural Metadata 
•Layout analysis 
•Noise reduction (redundant text) 
•A newspaper contains much more than edited articles 
•Content units 
•One possible separation: edited articles vs. advertisements vs. entertainment 
•Document Understanding 
•Newspaper consists of repeated sections (“templates”) 
•Unique vs. common content 
E.g. local news, local advertisements, etc. vs. “world news” 
•Common content may be found elsewhere in more detail 
E.g. book announcement 
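One way to tell common content from unique content, a technique not named on this slide, is near-duplicate detection: cut each text block into overlapping word n-grams ("shingles") and compare blocks from different newspapers with the Jaccard coefficient. Blocks that closely overlap with another paper are candidates for common content; blocks with no counterpart are candidates for unique local content. The shingle size `k=3`, the 0.5 threshold and the sample sentences are assumptions.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Set of overlapping k-word sequences (word shingles) in text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_common(block: str, other_papers: list, threshold: float = 0.5, k: int = 3) -> bool:
    """True if block nearly duplicates a block from another newspaper."""
    s = shingles(block, k)
    return any(jaccard(s, shingles(o, k)) >= threshold for o in other_papers)


local_paper = "Der Kaiser ist heute in Wien eingetroffen"
other_paper = "Der Kaiser ist heute in Wien eingetroffen und wurde feierlich begrüßt"
print(round(jaccard(shingles(local_paper), shingles(other_paper)), 3))
print(is_common(local_paper, [other_paper]))
```

Comparing every block with every other block is quadratic; at newspaper scale, techniques such as MinHash are commonly used to find candidate pairs first.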
Austrian Newspapers Online (ANNO), 1916 
…more than edited articles 
Edited articles vs. advertisements vs. entertainment 
Innsbrucker Nachrichten, 4 June 1870
Innsbrucker Nachrichten 1870 
Content units 
•Types 
•Lists of recently deceased persons 
•Announcements of local associations 
•Apartments to rent 
•Obituaries 
•Serialised novels 
•… 
Technical approaches 
•Layout analysis 
•Specific tools 
•XML output of the OCR engine (cheap, easy to handle) 
•Approaches 
•Rule-based approaches (experts needed) 
•Machine learning approaches (large amounts of training samples needed) 
•Functional Extension Parser (IMPACT project) 
•Rule-based approach for historical books (pre-1900) 
•Accuracy above 80% is hard to reach for non-trivial features 
•E.g. separating edited text, advertisements and entertainment; running titles; section headings 
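A rule-based approach encodes expert knowledge as patterns over the OCR output. The sketch below only shows the shape of such rules for separating edited text, advertisements and entertainment; it is not the Functional Extension Parser. The patterns are hypothetical examples: the currency abbreviations "fl." and "kr." (gulden and kreuzer, used in Austrian papers around 1870), "zu vermieten/verkaufen" ("to let/for sale") and the serial-novel marker "Fortsetzung folgt" ("to be continued"). Rules this simple are brittle against OCR errors and unusual layouts, which illustrates why accuracy beyond 80% is hard to reach.

```python
import re

# Hypothetical expert rules; the patterns and their order are illustrative only.
RULES = [
    ("entertainment", [r"fortsetzung folgt", r"\bfeuilleton\b"]),
    ("advertisement", [r"\bzu (?:vermieten|verkaufen)\b", r"\b\d+\s*(?:fl|kr)\b"]),
]


def classify_block(text: str) -> str:
    """Label a text block; the first rule set with a matching pattern wins."""
    for label, patterns in RULES:
        if any(re.search(p, text, re.IGNORECASE) for p in patterns):
            return label
    return "edited article"  # default when no rule fires


for block in ["Ein Zimmer ist zu vermieten. Preis 5 fl. monatlich",
              "(Fortsetzung folgt.)",
              "Der Landtag hat gestern die Vorlage angenommen."]:
    print(classify_block(block), "|", block)
```

Rule order matters: entertainment is checked first so that a serialised novel mentioning a price is not labelled as an advertisement.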
Summary 
•In many countries and regions, the digitisation of newspapers is still at the beginning 
•OCR, though error-prone, is a must and is cheap (compared to scanning) 
•Post-processing of OCR is promising 
•Structural metadata are a must as well; new approaches are needed (beyond article separation) 
•Natural Language Processing and more advanced operations will benefit 
•Final goal of “document understanding” by machines 
Thank you for your attention! 
Günter Mühlberger <guenter.muehlberger@uibk.ac.at>

Europeana Newspapers Infoday Marchetti
Europeana Newspapers
 
Europeana Newspapers LFT Infoday Kempf
Europeana Newspapers LFT Infoday KempfEuropeana Newspapers LFT Infoday Kempf
Europeana Newspapers LFT Infoday Kempf
Europeana Newspapers
 
Europeana Newspapers LFT Infoday Bolioli
Europeana Newspapers LFT Infoday BolioliEuropeana Newspapers LFT Infoday Bolioli
Europeana Newspapers LFT Infoday Bolioli
Europeana Newspapers
 
ENP_Dutch_Infoday_MWillems
ENP_Dutch_Infoday_MWillemsENP_Dutch_Infoday_MWillems
ENP_Dutch_Infoday_MWillems
Europeana Newspapers
 
ENP_Dutch_Infoday_PHuijnen
ENP_Dutch_Infoday_PHuijnen ENP_Dutch_Infoday_PHuijnen
ENP_Dutch_Infoday_PHuijnen
Europeana Newspapers
 
ENP_Dutch_Infoday_SKruizinga
ENP_Dutch_Infoday_SKruizingaENP_Dutch_Infoday_SKruizinga
ENP_Dutch_Infoday_SKruizinga
Europeana Newspapers
 
ENP_Dutch_infoday_HCrijns
ENP_Dutch_infoday_HCrijnsENP_Dutch_infoday_HCrijns
ENP_Dutch_infoday_HCrijns
Europeana Newspapers
 
ENP_Dutch_infoday_EVanEijck
ENP_Dutch_infoday_EVanEijckENP_Dutch_infoday_EVanEijck
ENP_Dutch_infoday_EVanEijck
Europeana Newspapers
 
ENP_ONB_infoday_Schaller
ENP_ONB_infoday_SchallerENP_ONB_infoday_Schaller
ENP_ONB_infoday_Schaller
Europeana Newspapers
 

More from Europeana Newspapers (20)

Presentation of Philippe Mezzasalma at the BnF Information Day in Paris
Presentation of Philippe Mezzasalma at the BnF Information Day in ParisPresentation of Philippe Mezzasalma at the BnF Information Day in Paris
Presentation of Philippe Mezzasalma at the BnF Information Day in Paris
 
Presentation of Ioannis Anagnostopoulos at BnF Information Day
Presentation of Ioannis Anagnostopoulos at BnF Information DayPresentation of Ioannis Anagnostopoulos at BnF Information Day
Presentation of Ioannis Anagnostopoulos at BnF Information Day
 
Presentation of Claus Gravenhorst, BnF Information Day
Presentation of Claus Gravenhorst, BnF Information DayPresentation of Claus Gravenhorst, BnF Information Day
Presentation of Claus Gravenhorst, BnF Information Day
 
Presentation of Alaa Abi Haidar at the BnF Information Day
Presentation of Alaa Abi Haidar at the BnF Information DayPresentation of Alaa Abi Haidar at the BnF Information Day
Presentation of Alaa Abi Haidar at the BnF Information Day
 
Europeana Newspapers Estonian Infoday Ragne Kouts
Europeana Newspapers Estonian Infoday Ragne KoutsEuropeana Newspapers Estonian Infoday Ragne Kouts
Europeana Newspapers Estonian Infoday Ragne Kouts
 
Europeana Newspapers Estonian Infoday Kristel Veimann
Europeana Newspapers Estonian Infoday Kristel VeimannEuropeana Newspapers Estonian Infoday Kristel Veimann
Europeana Newspapers Estonian Infoday Kristel Veimann
 
Europeana Newspapers Estonian Infoday Krista Aru
Europeana Newspapers Estonian Infoday Krista AruEuropeana Newspapers Estonian Infoday Krista Aru
Europeana Newspapers Estonian Infoday Krista Aru
 
Europeana Newspapers LFT Infoday Thompson
Europeana Newspapers LFT Infoday ThompsonEuropeana Newspapers LFT Infoday Thompson
Europeana Newspapers LFT Infoday Thompson
 
Europeana Newspapers LFT Infoday Rossi
Europeana Newspapers LFT Infoday RossiEuropeana Newspapers LFT Infoday Rossi
Europeana Newspapers LFT Infoday Rossi
 
Enp lft infoday_neudecker
Enp lft infoday_neudeckerEnp lft infoday_neudecker
Enp lft infoday_neudecker
 
Europeana Newspapers LFT Infoday Messina
Europeana Newspapers LFT Infoday MessinaEuropeana Newspapers LFT Infoday Messina
Europeana Newspapers LFT Infoday Messina
 
Europeana Newspapers Infoday Marchetti
Europeana Newspapers Infoday MarchettiEuropeana Newspapers Infoday Marchetti
Europeana Newspapers Infoday Marchetti
 
Europeana Newspapers LFT Infoday Kempf
Europeana Newspapers LFT Infoday KempfEuropeana Newspapers LFT Infoday Kempf
Europeana Newspapers LFT Infoday Kempf
 
Europeana Newspapers LFT Infoday Bolioli
Europeana Newspapers LFT Infoday BolioliEuropeana Newspapers LFT Infoday Bolioli
Europeana Newspapers LFT Infoday Bolioli
 
ENP_Dutch_Infoday_MWillems
ENP_Dutch_Infoday_MWillemsENP_Dutch_Infoday_MWillems
ENP_Dutch_Infoday_MWillems
 
ENP_Dutch_Infoday_PHuijnen
ENP_Dutch_Infoday_PHuijnen ENP_Dutch_Infoday_PHuijnen
ENP_Dutch_Infoday_PHuijnen
 
ENP_Dutch_Infoday_SKruizinga
ENP_Dutch_Infoday_SKruizingaENP_Dutch_Infoday_SKruizinga
ENP_Dutch_Infoday_SKruizinga
 
ENP_Dutch_infoday_HCrijns
ENP_Dutch_infoday_HCrijnsENP_Dutch_infoday_HCrijns
ENP_Dutch_infoday_HCrijns
 
ENP_Dutch_infoday_EVanEijck
ENP_Dutch_infoday_EVanEijckENP_Dutch_infoday_EVanEijck
ENP_Dutch_infoday_EVanEijck
 
ENP_ONB_infoday_Schaller
ENP_ONB_infoday_SchallerENP_ONB_infoday_Schaller
ENP_ONB_infoday_Schaller
 

Recently uploaded

How to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP ModuleHow to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP Module
Celine George
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
Dr. Shivangi Singh Parihar
 
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
National Information Standards Organization (NISO)
 
How to deliver Powerpoint Presentations.pptx
How to deliver Powerpoint  Presentations.pptxHow to deliver Powerpoint  Presentations.pptx
How to deliver Powerpoint Presentations.pptx
HajraNaeem15
 
คำศัพท์ คำพื้นฐานการอ่าน ภาษาอังกฤษ ระดับชั้น ม.1
คำศัพท์ คำพื้นฐานการอ่าน ภาษาอังกฤษ ระดับชั้น ม.1คำศัพท์ คำพื้นฐานการอ่าน ภาษาอังกฤษ ระดับชั้น ม.1
คำศัพท์ คำพื้นฐานการอ่าน ภาษาอังกฤษ ระดับชั้น ม.1
สมใจ จันสุกสี
 
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
Nguyen Thanh Tu Collection
 
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama UniversityNatural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Akanksha trivedi rama nursing college kanpur.
 
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptxPrésentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
siemaillard
 
writing about opinions about Australia the movie
writing about opinions about Australia the moviewriting about opinions about Australia the movie
writing about opinions about Australia the movie
Nicholas Montgomery
 
How to Setup Warehouse & Location in Odoo 17 Inventory
How to Setup Warehouse & Location in Odoo 17 InventoryHow to Setup Warehouse & Location in Odoo 17 Inventory
How to Setup Warehouse & Location in Odoo 17 Inventory
Celine George
 
Digital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental DesignDigital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental Design
amberjdewit93
 
Advanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docxAdvanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docx
adhitya5119
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
Academy of Science of South Africa
 
Pengantar Penggunaan Flutter - Dart programming language1.pptx
Pengantar Penggunaan Flutter - Dart programming language1.pptxPengantar Penggunaan Flutter - Dart programming language1.pptx
Pengantar Penggunaan Flutter - Dart programming language1.pptx
Fajar Baskoro
 
BBR 2024 Summer Sessions Interview Training
BBR  2024 Summer Sessions Interview TrainingBBR  2024 Summer Sessions Interview Training
BBR 2024 Summer Sessions Interview Training
Katrina Pritchard
 
Walmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdfWalmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdf
TechSoup
 
How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17
Celine George
 
Reimagining Your Library Space: How to Increase the Vibes in Your Library No ...
Reimagining Your Library Space: How to Increase the Vibes in Your Library No ...Reimagining Your Library Space: How to Increase the Vibes in Your Library No ...
Reimagining Your Library Space: How to Increase the Vibes in Your Library No ...
Diana Rendina
 
Main Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docxMain Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docx
adhitya5119
 

Recently uploaded (20)

How to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP ModuleHow to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP Module
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
 
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
 
How to deliver Powerpoint Presentations.pptx
How to deliver Powerpoint  Presentations.pptxHow to deliver Powerpoint  Presentations.pptx
How to deliver Powerpoint Presentations.pptx
 
คำศัพท์ คำพื้นฐานการอ่าน ภาษาอังกฤษ ระดับชั้น ม.1
คำศัพท์ คำพื้นฐานการอ่าน ภาษาอังกฤษ ระดับชั้น ม.1คำศัพท์ คำพื้นฐานการอ่าน ภาษาอังกฤษ ระดับชั้น ม.1
คำศัพท์ คำพื้นฐานการอ่าน ภาษาอังกฤษ ระดับชั้น ม.1
 
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
 
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama UniversityNatural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
 
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptxPrésentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
 
writing about opinions about Australia the movie
writing about opinions about Australia the moviewriting about opinions about Australia the movie
writing about opinions about Australia the movie
 
How to Setup Warehouse & Location in Odoo 17 Inventory
How to Setup Warehouse & Location in Odoo 17 InventoryHow to Setup Warehouse & Location in Odoo 17 Inventory
How to Setup Warehouse & Location in Odoo 17 Inventory
 
Digital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental DesignDigital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental Design
 
Advanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docxAdvanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docx
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
 
Pengantar Penggunaan Flutter - Dart programming language1.pptx
Pengantar Penggunaan Flutter - Dart programming language1.pptxPengantar Penggunaan Flutter - Dart programming language1.pptx
Pengantar Penggunaan Flutter - Dart programming language1.pptx
 
BBR 2024 Summer Sessions Interview Training
BBR  2024 Summer Sessions Interview TrainingBBR  2024 Summer Sessions Interview Training
BBR 2024 Summer Sessions Interview Training
 
Walmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdfWalmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdf
 
How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17
 
Reimagining Your Library Space: How to Increase the Vibes in Your Library No ...
Reimagining Your Library Space: How to Increase the Vibes in Your Library No ...Reimagining Your Library Space: How to Increase the Vibes in Your Library No ...
Reimagining Your Library Space: How to Increase the Vibes in Your Library No ...
 
Main Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docxMain Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docx
 

Europeana Newspapers LFT Infoday Muehlberger

  • 1. Text and Structure Recognition for Historical Newspapers
    Günter Mühlberger
    University of Innsbruck, Digitisation and Electronic Archiving
  • 2. Who we are
    This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp
    • Digitisation and Digital Preservation group @ University of Innsbruck
    • Involved in digitisation and Optical Character Recognition (OCR) since the mid-1990s
    • Research projects: LAURIN, METADATA ENGINE, books2u!, reUSE, Digitisation on Demand, eBooks on Demand, IMPACT, PrestoPRIME, ARROW+, Europeana Newspapers, tranScriptorium, …
    • Our mission: “Digitisation of the humanities” = Digital Humanities
    • Selected digitisation projects:
      • Austrian Literature Online (since 2002)
      • Digitisation of the Innsbrucker Newspaper Archive (2004-2006)
      • Digitisation of the Tiroler Tageszeitung, 1945-2003 (2012-2014)
      • Text recognition of 8 million newspaper pages within Europeana Newspapers
    • Commercial services via the University's Technology Transfer Platform
  • 3. Digitisation
    IMAGE CAPTURING → TEXT & STRUCTURE RECOGNITION → NATURAL LANGUAGE PROCESSING → CONTENT REPRESENTATION
  • 4. Example – Index card: Capturing
  • 5. OCR Interface
  • 6. Raw OCR Text
    “Â.”- ikonogr. religiös V oragine , Jacob a ; LEGENDA AUREA Dresdae ÄLipsiae 1846
  • 7. Structure Recognition
    “Â.”- ikonogr. religiös V oragine , Jacob a ; LEGENDA AUREA Dresdae ÄLipsiae 1846
  • 8. Natural Language Processing
    Voragine, Jacob; LEGENDA AUREA; 1846
    → Matching with a reference database, e.g. WorldCat
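The matching step on slides 8 and 9 (looking up the cleaned author/title/year fields in reference data such as WorldCat) can be illustrated with a small fuzzy-matching sketch. It is not the project's method: it assumes a hypothetical in-memory list of reference records in an "author; title; year" layout, uses the standard-library `difflib.SequenceMatcher` as the similarity measure, and picks an arbitrary threshold of 0.8. A real system would query a catalogue service instead.

```python
import difflib
import re
import unicodedata
from typing import Optional


def normalise(text: str) -> str:
    """Lower-case, drop diacritics and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(re.sub(r"[^\w\s]", " ", no_marks.lower()).split())


def best_match(query: str, references: list[str],
               threshold: float = 0.8) -> Optional[tuple[str, float]]:
    """Most similar reference record and its similarity, or None below threshold."""
    q = normalise(query)
    scored = [(ref, difflib.SequenceMatcher(None, q, normalise(ref)).ratio())
              for ref in references]
    if not scored:
        return None
    ref, score = max(scored, key=lambda pair: pair[1])
    return (ref, score) if score >= threshold else None


# Hypothetical reference records ("author; title; year") standing in for a catalogue.
REFERENCES = [
    "Voragine, Jacobus de; Legenda aurea; 1846",
    "Grimm, Jacob; Deutsche Mythologie; 1835",
    "Goethe, Johann Wolfgang von; Faust; 1808",
]

if __name__ == "__main__":
    print(best_match("Voragine, Jacob; LEGENDA AUREA; 1846", REFERENCES))
```

With these sample records the query resolves to the Voragine entry with a similarity of roughly 0.93, even though the OCR-derived author name is abbreviated; a query with no close counterpart returns `None` rather than a weak guess.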
  • 9. Matching with Reference (knowledge) data
  • 10. The actual book
  • 11. Content Representation
    Instead of a scanned index card, we can access, link and work with a full-featured catalogue entry and the actual digitised work.
    Instead of digitised newspapers, we want to access, link and work with the content, information and knowledge contained in these newspapers!
    OCR is one important step towards this overall objective!
  • 12. OCR – Some Facts
    • Optical Character Recognition
      • “Old” technology: “pattern recognition”
      • Largest progress in the late 1990s
    • Market situation
      • Two large companies: ABBYY, Nuance
      • Cheap technology
      • Open-source tools: Tesseract, OCRopus, Gamera, …
      • Google: worked with ABBYY, has used Tesseract since 2012
    • ABBYY
      • Took part in two EU projects
      • Gothic (Fraktur) type and long “s” supported out of the box; “Old Italian” available as a recognition language
      • Direct export of Analyzed Layout and Text Object (ALTO)
  • 13. Output
    • Processing
      • University of Innsbruck: 32 ABBYY licences on 4 servers
      • Per day: 10,000 large newspaper pages, 40,000 medium-sized pages or 150,000 book-sized pages
    • PDF
      • Text above the image vs. text behind the image
      • PDF/A standard
      • Tagged PDF
    • XML – ALTO
      • Keeps all the information: blocks, block types, languages, lines, words, characters, word confidence, etc.
      • ALTO: de facto standard, hosted by the Library of Congress
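Because ALTO keeps lines, words and word confidences, downstream tools can work directly with the XML rather than with plain text. The sketch below reads `TextBlock`, `TextLine` and `String` elements with the standard-library ElementTree parser and flags low-confidence words. It assumes word confidence is stored in the `WC` attribute (0 to 1) and word text in `CONTENT`; the sample fragment is hand-written, not real ABBYY output, and the 0.5 cut-off is an arbitrary choice for illustration.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Drop the "{namespace}" prefix so the code works for several ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def read_alto(xml_text: str, low_wc: float = 0.5) -> list[dict]:
    """Return one record per TextLine: block ID, text, mean word confidence and
    the words whose confidence (WC attribute) is below `low_wc`."""
    root = ET.fromstring(xml_text)
    lines = []
    for block in root.iter():
        if local(block.tag) != "TextBlock":
            continue
        for line in block.iter():
            if local(line.tag) != "TextLine":
                continue
            strings = [el for el in line if local(el.tag) == "String"]
            words = [s.get("CONTENT", "") for s in strings]
            # A missing WC is treated as 0, i.e. unknown confidence counts as suspect.
            confs = [float(s.get("WC", "0")) for s in strings]
            lines.append({
                "block": block.get("ID"),
                "text": " ".join(words),
                "mean_wc": sum(confs) / len(confs) if confs else 0.0,
                "suspect": [w for w, c in zip(words, confs) if c < low_wc],
            })
    return lines


# A tiny hand-written ALTO v2 fragment (not real engine output).
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page ID="P1"><PrintSpace>
    <TextBlock ID="TB1">
      <TextLine>
        <String CONTENT="LEGENDA" WC="0.97"/><SP/><String CONTENT="AUREA" WC="0.95"/>
      </TextLine>
      <TextLine>
        <String CONTENT="ÄLipsiae" WC="0.41"/><SP/><String CONTENT="1846" WC="0.99"/>
      </TextLine>
    </TextBlock>
  </PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    for rec in read_alto(SAMPLE):
        print(rec["block"], rec["text"], round(rec["mean_wc"], 2), rec["suspect"])
```

Matching on local tag names rather than a fixed namespace is a deliberate choice: ALTO files in the wild use different schema versions, each with its own namespace URI.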
  • 14. Accuracy rates
    • What do we expect?
      • Researchers preparing a critical edition of Shakespeare's works: no errors accepted
      • eBooks: fewer than 1 error per 1,000 characters (roughly half a page)
      • Users offered full-text search as an additional feature?
      • Academic staff working with a text (copy & paste)?
      • Natural language processing?
      • Knowledge extraction?
    • Word Error Rate (WER) vs. Character Error Rate (CER)
      • WER is more meaningful to users
      • WER is easier to measure
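Both metrics on slide 14 can be derived from the Levenshtein edit distance, computed once over characters and once over whitespace-separated words, and divided by the length of the reference. A minimal sketch, assuming a ground-truth transcription is available and that words are delimited by whitespace (real evaluation tools usually also normalise case and punctuation). The ground truth "Dresdae et Lipsiae 1846" for the raw OCR output of slide 6 is an assumption made for this example.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution (or match)
        prev = cur
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character edits per reference character."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word edits per reference word (whitespace tokenisation)."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


if __name__ == "__main__":
    truth = "Dresdae et Lipsiae 1846"
    ocr = "Dresdae ÄLipsiae 1846"
    print(f"CER = {cer(truth, ocr):.2f}, WER = {wer(truth, ocr):.2f}")  # CER = 0.13, WER = 0.50
```

The example shows why the two rates diverge: three character edits out of 23 characters give a CER of about 13%, but they touch two of the four words, so half of the words a user might search for are wrong.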
  • 15. IMPACT EVA/MINERVA 12th Nov. 2008
  • 16. IMPACT EVA/MINERVA 12th Nov. 2008
  • 17.
  • 18. Outlook OCR
    • ABBYY
      • For small and medium volumes, up to some tens of millions of pages
    • Tesseract
      • Growing community
      • Can be parallelised on High Performance Computing (HPC) infrastructure (e.g. several hundred or thousand nodes)
      • Room for further experiments with very large volumes, e.g. hundreds of millions of pages
    • Handwritten Text Recognition
      • Next generation of engines for handwritten material
      • Builds on speech and face recognition technology
    • Transcription and Recognition Platform
      • Virtual Research Environment
      • Will be released by the University of Innsbruck in 2015
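Slide 18 notes that Tesseract can be parallelised across many nodes. On a single machine the same idea reduces to running one Tesseract process per page image in parallel. The sketch below assumes a Tesseract release whose command line accepts the `alto` output config (4.1 onwards) and the `deu` language model; the function names, folder layout and worker count are hypothetical. On an HPC cluster, a job scheduler would instead distribute page batches across nodes, with something like this running on each node.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable


def tesseract_cmd(image: Path, out_dir: Path, lang: str = "deu") -> list[str]:
    """Build a Tesseract call; the trailing "alto" config selects ALTO XML output."""
    outbase = out_dir / image.stem  # Tesseract appends the file extension itself
    return ["tesseract", str(image), str(outbase), "-l", lang, "alto"]


def ocr_batch(
    images: Iterable[Path],
    out_dir: Path,
    workers: int = 8,
    lang: str = "deu",
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> dict[Path, int]:
    """OCR many page images in parallel on one machine; return each image's exit code."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # One OpenMP thread per Tesseract process avoids oversubscribing the CPU
    # when many processes run side by side.
    env = {**os.environ, "OMP_THREAD_LIMIT": "1"}

    def run_one(image: Path) -> tuple[Path, int]:
        result = runner(tesseract_cmd(image, out_dir, lang),
                        capture_output=True, text=True, env=env)
        return image, result.returncode

    # Threads are enough here: the heavy work happens in external processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(run_one, images))
```

A call such as `ocr_batch(sorted(Path("scans").glob("*.tif")), Path("alto"))` would process a folder of page scans; non-zero exit codes identify pages to retry. The `runner` parameter exists only so the dispatch logic can be tested without Tesseract installed.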
  • 19. Structural Metadata
    • Layout analysis
      • Noise reduction (redundant text)
      • A newspaper contains much more than edited articles: content units
      • One possible separation: edited articles – advertisements – entertainment
    • Document understanding
      • A newspaper consists of repeated sections (“templates”)
      • Unique vs. common content, e.g. local news and local advertisements vs. “world news”
      • Common content may be found elsewhere in more detail, e.g. a book announcement
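One way to separate repeated template material from unique content, as discussed on slide 19, is to compare text blocks across issues and flag those that recur. The sketch below is one possible technique, not the approach used in the project: it represents each block as a set of word trigrams ("shingles") and uses Jaccard similarity, which tolerates small OCR errors. The sample blocks and the 0.5 threshold are hypothetical.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Word k-grams of a text block, lower-cased and stripped of punctuation."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Share of shingles two blocks have in common (0.0 for two empty blocks)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recurring_blocks(issue: list[str], other_issues: list[list[str]],
                     threshold: float = 0.5) -> list[int]:
    """Indices of blocks in `issue` that closely match a block of another issue."""
    seen = [shingles(block) for other in other_issues for block in other]
    return [i for i, block in enumerate(issue)
            if any(jaccard(shingles(block), s) >= threshold for s in seen)]


# Hypothetical OCR text blocks from two consecutive issues ("holidavs" is an OCR error).
YESTERDAY = [
    "Published daily, except Sundays and holidays. "
    "Subscriptions accepted at the office of this paper.",
    "Town council debates a new bridge over the Inn.",
]
TODAY = [
    "Published daily, except Sundays and holidavs. "
    "Subscriptions accepted at the office of this paper.",
    "Fire destroys a barn in the old town.",
]

if __name__ == "__main__":
    print(recurring_blocks(TODAY, [YESTERDAY]))  # → [0]
```

The masthead notice is recognised as recurring despite the OCR error (similarity 0.6), while the local news item is kept as unique content.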
  • 20. Austrian Newspapers Online – ANNO - 1916
  • 21. …more than edited articles
  • 22. Edited articles vs. advertisements vs. entertainment
    Innsbrucker Nachrichten, 4 June 1870
  • 23. Innsbrucker Nachrichten 1870
  • 24. Content units
    • Types
      • Lists of recently deceased persons
      • Announcements of local associations
      • Apartments to rent
      • Obituaries
      • Serialised novels
      • …
  • 25. Technical approaches
    • Layout analysis
      • Specific tools
      • XML output of the OCR engine (cheap, easy to handle)
    • Approaches
      • Rule-based approaches (experts needed)
      • Machine learning approaches (large amounts of training samples needed)
    • Functional Extension Parser (IMPACT project)
      • Rule-based approach for historical books (pre-1900)
    • Accuracy above 80% is hard to reach for non-trivial features, e.g. separating edited text, advertisements and entertainment, or detecting running titles and section headings
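To make the "rule-based approach" on slide 25 concrete, here is a toy classifier that assigns layout blocks to the three classes named earlier (edited articles, advertisements, entertainment). Everything in it is hypothetical: the `Block` fields, the German cue phrases and the font-size rule are illustrative assumptions, not rules from the Functional Extension Parser or any other tool. It only shows the shape of such a rule set.

```python
import re
from dataclasses import dataclass


@dataclass
class Block:
    text: str          # OCR text of one layout block
    font_size: float   # mean font size of the block, e.g. taken from ALTO text styles


# Hypothetical German cue phrases, as they might appear in a 19th-century paper:
# "zu vermieten" = to let, "zu verkaufen" = for sale, "Preis" = price,
# "Fortsetzung folgt" = to be continued (typical of serialised novels).
AD_CUES = re.compile(r"\b(zu vermieten|zu verkaufen|preis)\b", re.IGNORECASE)
SERIAL_CUES = re.compile(r"\b(fortsetzung folgt|roman|erzählung)\b", re.IGNORECASE)


def classify(block: Block, body_font_size: float) -> str:
    """Assign a block to one of three content classes using hand-written rules."""
    # Rule 1: advertisement vocabulary, or short text set in large display type.
    if AD_CUES.search(block.text):
        return "advertisement"
    if block.font_size > 1.5 * body_font_size and len(block.text) < 200:
        return "advertisement"
    # Rule 2: markers of serialised fiction point to the entertainment section.
    if SERIAL_CUES.search(block.text):
        return "entertainment"
    # Default: treat everything else as an edited article.
    return "edited_article"


if __name__ == "__main__":
    body = 9.0
    for b in [Block("Eine schöne Wohnung ist zu vermieten.", 9.0),
              Block("Der Landtag beriet gestern über die neue Brücke.", 9.0),
              Block("... und sie schwieg. (Fortsetzung folgt.)", 9.0)]:
        print(classify(b, body), "|", b.text)
```

Such rules are cheap to write but brittle: a headline set in large type would also trigger the display-type rule, which illustrates why non-trivial distinctions like this are hard to get right with rules alone.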
  • 26. Summary
    • In many countries and regions, newspaper digitisation is still at an early stage
    • OCR, though error-prone, is a must and is cheap compared to scanning
    • Post-processing of OCR output is promising
    • Structural metadata are a must as well; new approaches are needed (beyond article separation)
    • Natural Language Processing and more advanced operations will benefit
    • Final goal: “document understanding” by machines
  • 27. Thank you for your attention!
    Günter Mühlberger <guenter.muehlberger@uibk.ac.at>