Experimental Workflow Development in Digitisation

IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. 
Experimental workflow development in digitisation
The concept of collaborative workflow development in the IMPACT project 
Mustafa Dogan (Göttingen State and University Library) 
Clemens Neudecker (Koninklijke Bibliotheek) 
Gerd Zechmeister (Austrian National Library) 
Sven Schlarb (Austrian National Library)
27.5.2010 QQML 
Agenda 
• Background of IMPACT
• Digitisation workflows
• Collaborative workflow development
• Architectural principles
• Workflow development platform
• Key success factors
• Outlook and future scenarios
Background of IMPACT 
• Project partners
– 26 libraries, research institutes and industry partners
• Main objective
– Improve access to historical books and newspapers printed before 1900
• Software tools and prototypes
– Image Enhancement & Segmentation Toolkit
– Improved ABBYY FineReader OCR Engine, IBM Adaptive OCR
– Post-processing and post-correction modules
– Lexical resources for several European languages
• Support to the MLA community
– Best Practices & Strategic/Operational Guidelines
– Online Helpdesk
– Tool Showcases & Demonstrators
– Centre of Competence
Digitisation workflows 
• Digitisation: a sequence of steps, from selection of analogue source material to presentation of digital objects for end-users
• Workflow: software-based execution of a sequence without human interaction
• Challenges and barriers
– Workflows are tailored to specific needs
– Lack of interoperability between the software applied and its input/output data
– Lack of collaboratively used and developed resources and expertise
Collaborative workflow development 
• Workflow development as a community-driven activity using an experimental platform
• Scientific workflows: using web services representing individual software modules (Shiyong Lu et al. 2009)
• Providing highly innovative and efficient tools to a wider community to design workflows
• Technical staff providing the platform, conceptual/library staff designing workflows
• Using Web 2.0 features to share and expand knowledge and resources
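The idea of a workflow as a chain of individual software modules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three step functions (enhance_image, segment_page, recognise_text) and the dictionary-based document are hypothetical stand-ins for the web services mentioned above, not the IMPACT tools or their interfaces.

```python
from typing import Callable, Dict, List

# A document is modelled as a plain dict; a step maps one document to the next.
Document = Dict[str, str]
Step = Callable[[Document], Document]


def enhance_image(doc: Document) -> Document:
    # Hypothetical stand-in for an image enhancement service.
    return {**doc, "image": doc["image"] + "+enhanced"}


def segment_page(doc: Document) -> Document:
    # Hypothetical stand-in for a segmentation service.
    return {**doc, "regions": "text-block-1"}


def recognise_text(doc: Document) -> Document:
    # Hypothetical stand-in for an OCR service.
    return {**doc, "text": "recognised text for " + doc["regions"]}


def run_workflow(steps: List[Step], doc: Document) -> Document:
    """Run every step in order, with no human interaction in between."""
    for step in steps:
        doc = step(doc)
    return doc


# Library staff could design a workflow simply by choosing and ordering steps.
workflow = [enhance_image, segment_page, recognise_text]
result = run_workflow(workflow, {"image": "page-001.tif"})
print(result["text"])
```

Because every step shares one signature, a module can be swapped or reordered without touching the others, which is the property that makes experimenting with alternative workflows cheap.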
Architectural platform principles 
• Modularity
• Transparency
• Flexibility
• Extensibility
• Open standards based
• Accessibility
• Scalability
• Collaboration
Workflow development platform
Workflow development phases
Evaluation criteria 
• OCR: correctly recognised characters/words
• Segmentation: correctly identified text and graphical regions
• Workflows: comparing workflows and identifying the most suitable one
• Statistical and provenance data: e.g. processing time
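The OCR and workflow criteria above can be made concrete with a small sketch. It assumes, purely for illustration, that accuracy is computed as one minus the edit distance to a ground-truth transcription divided by the ground-truth length; the slides do not say which measure IMPACT uses. The function names, and the idea of passing each workflow's output as a string, are hypothetical.

```python
import time
from typing import Callable, Dict, Sequence, Tuple


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def accuracy(output: Sequence, truth: Sequence) -> float:
    """Share of the ground truth reproduced correctly, clamped to [0, 1]."""
    if len(truth) == 0:
        return 1.0 if len(output) == 0 else 0.0
    return max(0.0, 1.0 - edit_distance(output, truth) / len(truth))


def character_accuracy(ocr_text: str, ground_truth: str) -> float:
    return accuracy(ocr_text, ground_truth)


def word_accuracy(ocr_text: str, ground_truth: str) -> float:
    return accuracy(ocr_text.split(), ground_truth.split())


def most_suitable(outputs: Dict[str, str], ground_truth: str) -> str:
    """Name of the workflow whose output is closest to the ground truth."""
    return max(outputs, key=lambda name: character_accuracy(outputs[name], ground_truth))


def timed(workflow: Callable[[str], str], page: str) -> Tuple[str, float]:
    """Run a workflow and record its processing time in seconds."""
    start = time.perf_counter()
    output = workflow(page)
    return output, time.perf_counter() - start
```

For example, `most_suitable({"a": "histary", "b": "history"}, "history")` picks workflow "b", and `timed` yields the kind of processing-time figure that could be kept as provenance data alongside the accuracy scores.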
Outlook 
• Keys to success
– Joint effort by library and software development staff
– Usability of tools and platform
– Incentives for collaborative work
– Testing and adaptation of workflows
– Continuously tailoring and optimising workflows
• Future work
– Demonstration of current (web) services
– Experimental platform as a sustainable resource for a Centre of Competence for the MLA community
Thank you very much! 
Contact: 
Project Website: http://www.impact-project.eu 
Project Office: impact@kb.nl
1 of 11

Recommended

An Experimental Workflow Development Platform for Historical Document Digitis... by
An Experimental Workflow Development Platform for Historical Document Digitis...An Experimental Workflow Development Platform for Historical Document Digitis...
An Experimental Workflow Development Platform for Historical Document Digitis...cneudecker
367 views13 slides
IMPACT at OCR Summit by
IMPACT at OCR SummitIMPACT at OCR Summit
IMPACT at OCR Summitcneudecker
302 views17 slides
IMPACT HPC Cloud Day by
IMPACT HPC Cloud DayIMPACT HPC Cloud Day
IMPACT HPC Cloud Daycneudecker
348 views13 slides
Presentation of Hans-Jörg Lieder, BnF Information Day by
Presentation of Hans-Jörg Lieder, BnF Information DayPresentation of Hans-Jörg Lieder, BnF Information Day
Presentation of Hans-Jörg Lieder, BnF Information DayEuropeana Newspapers
1.2K views15 slides
Metadata by
MetadataMetadata
MetadataEuropeana Newspapers
1.3K views32 slides
Europeana Newspapers Project by
Europeana Newspapers ProjectEuropeana Newspapers Project
Europeana Newspapers ProjectEuropeana Newspapers
1.2K views10 slides

More Related Content

What's hot

Succeed Introduction - Rafael Carrasco by
Succeed Introduction  - Rafael CarrascoSucceed Introduction  - Rafael Carrasco
Succeed Introduction - Rafael CarrascoIMPACT Centre of Competence
604 views4 slides
ENP Belgrade Workshop Project Overview by
ENP Belgrade Workshop Project OverviewENP Belgrade Workshop Project Overview
ENP Belgrade Workshop Project OverviewEuropeana Newspapers
1.4K views14 slides
Europeana Newspapers LFT Infoday Muehlberger by
Europeana Newspapers LFT Infoday MuehlbergerEuropeana Newspapers LFT Infoday Muehlberger
Europeana Newspapers LFT Infoday MuehlbergerEuropeana Newspapers
729 views27 slides
Europeana_Newspapers_ONB_infoday_HJLieder by
Europeana_Newspapers_ONB_infoday_HJLiederEuropeana_Newspapers_ONB_infoday_HJLieder
Europeana_Newspapers_ONB_infoday_HJLiederEuropeana Newspapers
632 views15 slides
Europeana Newspapers LFT Infoday Genereux by
Europeana Newspapers LFT Infoday GenereuxEuropeana Newspapers LFT Infoday Genereux
Europeana Newspapers LFT Infoday GenereuxEuropeana Newspapers
580 views11 slides
Intelligent tools-mitja-jermol-2013-bali-7 may2013 by
Intelligent tools-mitja-jermol-2013-bali-7 may2013Intelligent tools-mitja-jermol-2013-bali-7 may2013
Intelligent tools-mitja-jermol-2013-bali-7 may2013MediaMixerCommunity
627 views38 slides

What's hot(20)

Intelligent tools-mitja-jermol-2013-bali-7 may2013 by MediaMixerCommunity
Intelligent tools-mitja-jermol-2013-bali-7 may2013Intelligent tools-mitja-jermol-2013-bali-7 may2013
Intelligent tools-mitja-jermol-2013-bali-7 may2013
Sound Archives and Musical Instrument Collections by Synapta
Sound Archives and Musical Instrument CollectionsSound Archives and Musical Instrument Collections
Sound Archives and Musical Instrument Collections
Synapta195 views
Europeana Network Association Members Council Meeting, Copenhagen by Max Kaiser by Europeana
Europeana Network Association Members Council Meeting, Copenhagen by Max KaiserEuropeana Network Association Members Council Meeting, Copenhagen by Max Kaiser
Europeana Network Association Members Council Meeting, Copenhagen by Max Kaiser
Europeana206 views
agINFRA 5BOAC Presentation by Benjamin Cave
agINFRA 5BOAC PresentationagINFRA 5BOAC Presentation
agINFRA 5BOAC Presentation
Benjamin Cave394 views
Up2U Worskshop at the TNC18 conference by Up2Universe
Up2U Worskshop at the TNC18 conferenceUp2U Worskshop at the TNC18 conference
Up2U Worskshop at the TNC18 conference
Up2Universe21 views
Il DTC-Lazio e i dati del patrimonio culturale (Maria Prezioso, Università To... by Data Driven Innovation
Il DTC-Lazio e i dati del patrimonio culturale (Maria Prezioso, Università To...Il DTC-Lazio e i dati del patrimonio culturale (Maria Prezioso, Università To...
Il DTC-Lazio e i dati del patrimonio culturale (Maria Prezioso, Università To...
Up2U Workshop at TNC 2018-introduction by Up2Universe
Up2U Workshop at TNC 2018-introductionUp2U Workshop at TNC 2018-introduction
Up2U Workshop at TNC 2018-introduction
Up2Universe30 views
Opal case study 48 gitta switzerland by OPAL2010
Opal case study 48 gitta switzerlandOpal case study 48 gitta switzerland
Opal case study 48 gitta switzerland
OPAL2010340 views
The integration and management of archaeological datasets: the Europeana proj... by CARARE
The integration and management of archaeological datasets: the Europeana proj...The integration and management of archaeological datasets: the Europeana proj...
The integration and management of archaeological datasets: the Europeana proj...
CARARE71 views

Viewers also liked

Bessere Suchergebnisse durch Named Entity Recognition by
Bessere Suchergebnisse durch Named Entity RecognitionBessere Suchergebnisse durch Named Entity Recognition
Bessere Suchergebnisse durch Named Entity Recognitioncneudecker
799 views15 slides
Climbing the Tower of Babel: Challenges and Opportunities in Multilingual Dat... by
Climbing the Tower of Babel: Challenges and Opportunities in Multilingual Dat...Climbing the Tower of Babel: Challenges and Opportunities in Multilingual Dat...
Climbing the Tower of Babel: Challenges and Opportunities in Multilingual Dat...cneudecker
825 views36 slides
Europeana Newspapers - the Gateway to European Newspapers Online by
Europeana Newspapers - the Gateway to European Newspapers OnlineEuropeana Newspapers - the Gateway to European Newspapers Online
Europeana Newspapers - the Gateway to European Newspapers Onlinecneudecker
434 views17 slides
OCR challenges in historic documents and the contribution of IMPACT by
OCR challenges in historic documents and the contribution of IMPACTOCR challenges in historic documents and the contribution of IMPACT
OCR challenges in historic documents and the contribution of IMPACTcneudecker
542 views26 slides
Europeana Newspapers in a nutshell by
Europeana Newspapers in a nutshellEuropeana Newspapers in a nutshell
Europeana Newspapers in a nutshellcneudecker
373 views15 slides
Workflow Development for OCR (and beyond) by
Workflow Development for OCR (and beyond)Workflow Development for OCR (and beyond)
Workflow Development for OCR (and beyond)cneudecker
713 views20 slides

Viewers also liked(16)

Bessere Suchergebnisse durch Named Entity Recognition by cneudecker
Bessere Suchergebnisse durch Named Entity RecognitionBessere Suchergebnisse durch Named Entity Recognition
Bessere Suchergebnisse durch Named Entity Recognition
cneudecker799 views
Climbing the Tower of Babel: Challenges and Opportunities in Multilingual Dat... by cneudecker
Climbing the Tower of Babel: Challenges and Opportunities in Multilingual Dat...Climbing the Tower of Babel: Challenges and Opportunities in Multilingual Dat...
Climbing the Tower of Babel: Challenges and Opportunities in Multilingual Dat...
cneudecker825 views
Europeana Newspapers - the Gateway to European Newspapers Online by cneudecker
Europeana Newspapers - the Gateway to European Newspapers OnlineEuropeana Newspapers - the Gateway to European Newspapers Online
Europeana Newspapers - the Gateway to European Newspapers Online
cneudecker434 views
OCR challenges in historic documents and the contribution of IMPACT by cneudecker
OCR challenges in historic documents and the contribution of IMPACTOCR challenges in historic documents and the contribution of IMPACT
OCR challenges in historic documents and the contribution of IMPACT
cneudecker542 views
Europeana Newspapers in a nutshell by cneudecker
Europeana Newspapers in a nutshellEuropeana Newspapers in a nutshell
Europeana Newspapers in a nutshell
cneudecker373 views
Workflow Development for OCR (and beyond) by cneudecker
Workflow Development for OCR (and beyond)Workflow Development for OCR (and beyond)
Workflow Development for OCR (and beyond)
cneudecker713 views
Succeed 2nd hackathon by cneudecker
Succeed 2nd hackathonSucceed 2nd hackathon
Succeed 2nd hackathon
cneudecker627 views
Refinement of Digitised Newspapers by cneudecker
Refinement of Digitised NewspapersRefinement of Digitised Newspapers
Refinement of Digitised Newspapers
cneudecker350 views
Collaborative Workflow Development and Experimentation in the Digital Humanities by cneudecker
Collaborative Workflow Development and Experimentation in the Digital HumanitiesCollaborative Workflow Development and Experimentation in the Digital Humanities
Collaborative Workflow Development and Experimentation in the Digital Humanities
cneudecker735 views
The IMPACT Interoperability Framework - Workflows for OCR and beyond by cneudecker
The IMPACT Interoperability Framework - Workflows for OCR and beyondThe IMPACT Interoperability Framework - Workflows for OCR and beyond
The IMPACT Interoperability Framework - Workflows for OCR and beyond
cneudecker457 views
Digitale Kuratierungstechnologien in Bibliotheken by cneudecker
Digitale Kuratierungstechnologien in BibliothekenDigitale Kuratierungstechnologien in Bibliotheken
Digitale Kuratierungstechnologien in Bibliotheken
cneudecker393 views
The Elephant in the Library - Integrating Hadoop by cneudecker
The Elephant in the Library - Integrating HadoopThe Elephant in the Library - Integrating Hadoop
The Elephant in the Library - Integrating Hadoop
cneudecker388 views
Berliner DH Rundgang by cneudecker
Berliner DH RundgangBerliner DH Rundgang
Berliner DH Rundgang
cneudecker389 views
Neudecker who-cares-about-yesterday’s-news-–-use-cases-and-requirements-for-n... by cneudecker
Neudecker who-cares-about-yesterday’s-news-–-use-cases-and-requirements-for-n...Neudecker who-cares-about-yesterday’s-news-–-use-cases-and-requirements-for-n...
Neudecker who-cares-about-yesterday’s-news-–-use-cases-and-requirements-for-n...
cneudecker731 views
What is Hadoop? by cneudecker
What is Hadoop?What is Hadoop?
What is Hadoop?
cneudecker630 views
Preservation Workflows with Taverna by cneudecker
Preservation Workflows with TavernaPreservation Workflows with Taverna
Preservation Workflows with Taverna
cneudecker350 views

Similar to Experimental Workflow Development in Digitisation

IMPACT Demo Dag at KB by
IMPACT Demo Dag at KBIMPACT Demo Dag at KB
IMPACT Demo Dag at KBcneudecker
314 views15 slides
IMPACT: Building a Centre of Competence for Digitisation by
IMPACT: Building a Centre of Competence for DigitisationIMPACT: Building a Centre of Competence for Digitisation
IMPACT: Building a Centre of Competence for DigitisationIMPACT Centre of Competence
719 views23 slides
The Improving Access to Text (IMPACT) project and other European initiatives by
The Improving Access to Text (IMPACT) project and other European initiativesThe Improving Access to Text (IMPACT) project and other European initiatives
The Improving Access to Text (IMPACT) project and other European initiativesMichael Day
1.2K views21 slides
Bne impact co_c by
Bne impact co_cBne impact co_c
Bne impact co_cIMPACT Centre of Competence
507 views21 slides
Europeana Cloud Work Package 1: Assessing Researchers' Needs in the Cloud by
Europeana Cloud Work Package 1: Assessing Researchers' Needs in the CloudEuropeana Cloud Work Package 1: Assessing Researchers' Needs in the Cloud
Europeana Cloud Work Package 1: Assessing Researchers' Needs in the CloudTU Delft, Netherlands
1.2K views31 slides
Ecloud copenhagen-130625074823-phpapp01 by
Ecloud copenhagen-130625074823-phpapp01Ecloud copenhagen-130625074823-phpapp01
Ecloud copenhagen-130625074823-phpapp01The European Library
932 views31 slides

Similar to Experimental Workflow Development in Digitisation(20)

IMPACT Demo Dag at KB by cneudecker
IMPACT Demo Dag at KBIMPACT Demo Dag at KB
IMPACT Demo Dag at KB
cneudecker314 views
The Improving Access to Text (IMPACT) project and other European initiatives by Michael Day
The Improving Access to Text (IMPACT) project and other European initiativesThe Improving Access to Text (IMPACT) project and other European initiatives
The Improving Access to Text (IMPACT) project and other European initiatives
Michael Day1.2K views
Europeana Cloud Work Package 1: Assessing Researchers' Needs in the Cloud by TU Delft, Netherlands
Europeana Cloud Work Package 1: Assessing Researchers' Needs in the CloudEuropeana Cloud Work Package 1: Assessing Researchers' Needs in the Cloud
Europeana Cloud Work Package 1: Assessing Researchers' Needs in the Cloud
Alastair Dunning, Europeana Cloud: The Project and the Challenges of Assessin... by The European Library
Alastair Dunning, Europeana Cloud: The Project and the Challenges of Assessin...Alastair Dunning, Europeana Cloud: The Project and the Challenges of Assessin...
Alastair Dunning, Europeana Cloud: The Project and the Challenges of Assessin...
Europeana Newspapers Amsterdam workshop introduction by Europeana Newspapers
Europeana Newspapers Amsterdam workshop introductionEuropeana Newspapers Amsterdam workshop introduction
Europeana Newspapers Amsterdam workshop introduction
ECLAP White paper, social network for Cultural Heritage on Peforming arts by Paolo Nesi
ECLAP White paper, social network for Cultural Heritage on Peforming artsECLAP White paper, social network for Cultural Heritage on Peforming arts
ECLAP White paper, social network for Cultural Heritage on Peforming arts
Paolo Nesi364 views
Euterpe Project Proposal for Digital Libraries by Nick Glezakos
Euterpe Project Proposal for Digital LibrariesEuterpe Project Proposal for Digital Libraries
Euterpe Project Proposal for Digital Libraries
Nick Glezakos899 views
16,40 16,55 h. open aire eblida-naple conference by FESABID
16,40 16,55 h. open aire eblida-naple conference16,40 16,55 h. open aire eblida-naple conference
16,40 16,55 h. open aire eblida-naple conference
FESABID583 views
Europeana Creative by Max Kaiser
Europeana CreativeEuropeana Creative
Europeana Creative
Max Kaiser1.1K views

More from cneudecker

EuropeanaTech x AI: Qurator.ai @ Berlin State Library by
EuropeanaTech x AI: Qurator.ai @ Berlin State LibraryEuropeanaTech x AI: Qurator.ai @ Berlin State Library
EuropeanaTech x AI: Qurator.ai @ Berlin State Librarycneudecker
142 views13 slides
ALTO, PAGE & Co. Formate für Volltexte by
ALTO, PAGE & Co. Formate für VolltexteALTO, PAGE & Co. Formate für Volltexte
ALTO, PAGE & Co. Formate für Volltextecneudecker
82 views22 slides
OCR und Strukturerkennung für Zeitungen by
OCR und Strukturerkennung für ZeitungenOCR und Strukturerkennung für Zeitungen
OCR und Strukturerkennung für Zeitungencneudecker
99 views21 slides
Digitisation and Digital Humanities - what is the role of Libraries? by
Digitisation and Digital Humanities - what is the role of Libraries?Digitisation and Digital Humanities - what is the role of Libraries?
Digitisation and Digital Humanities - what is the role of Libraries?cneudecker
214 views26 slides
Multimodal Perspectives for Digitised Historical Newspapers by
Multimodal Perspectives for Digitised Historical NewspapersMultimodal Perspectives for Digitised Historical Newspapers
Multimodal Perspectives for Digitised Historical Newspaperscneudecker
344 views15 slides
OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi... by
OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...
OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...cneudecker
95 views18 slides

More from cneudecker(20)

EuropeanaTech x AI: Qurator.ai @ Berlin State Library by cneudecker
EuropeanaTech x AI: Qurator.ai @ Berlin State LibraryEuropeanaTech x AI: Qurator.ai @ Berlin State Library
EuropeanaTech x AI: Qurator.ai @ Berlin State Library
cneudecker142 views
ALTO, PAGE & Co. Formate für Volltexte by cneudecker
ALTO, PAGE & Co. Formate für VolltexteALTO, PAGE & Co. Formate für Volltexte
ALTO, PAGE & Co. Formate für Volltexte
cneudecker82 views
OCR und Strukturerkennung für Zeitungen by cneudecker
OCR und Strukturerkennung für ZeitungenOCR und Strukturerkennung für Zeitungen
OCR und Strukturerkennung für Zeitungen
cneudecker99 views
Digitisation and Digital Humanities - what is the role of Libraries? by cneudecker
Digitisation and Digital Humanities - what is the role of Libraries?Digitisation and Digital Humanities - what is the role of Libraries?
Digitisation and Digital Humanities - what is the role of Libraries?
cneudecker214 views
Multimodal Perspectives for Digitised Historical Newspapers by cneudecker
Multimodal Perspectives for Digitised Historical NewspapersMultimodal Perspectives for Digitised Historical Newspapers
Multimodal Perspectives for Digitised Historical Newspapers
cneudecker344 views
OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi... by cneudecker
OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...
OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...
cneudecker95 views
AI for digitized cultural heritage by cneudecker
AI for digitized cultural heritageAI for digitized cultural heritage
AI for digitized cultural heritage
cneudecker196 views
Kuratieren mit künstlicher Intelligenz by cneudecker
Kuratieren mit künstlicher IntelligenzKuratieren mit künstlicher Intelligenz
Kuratieren mit künstlicher Intelligenz
cneudecker1.2K views
Überblick zum DFG-Projekt OCR-D by cneudecker
Überblick zum DFG-Projekt OCR-DÜberblick zum DFG-Projekt OCR-D
Überblick zum DFG-Projekt OCR-D
cneudecker370 views
The many uses of digitized newspapers by cneudecker
The many uses of digitized newspapersThe many uses of digitized newspapers
The many uses of digitized newspapers
cneudecker302 views
Digitalisate kuratieren mit KI - von unstrukturierten Daten zu strukturierten... by cneudecker
Digitalisate kuratieren mit KI - von unstrukturierten Daten zu strukturierten...Digitalisate kuratieren mit KI - von unstrukturierten Daten zu strukturierten...
Digitalisate kuratieren mit KI - von unstrukturierten Daten zu strukturierten...
cneudecker539 views
Von der Zeitungsdigitalisierung zu historischen Netzwerken - Methoden und Her... by cneudecker
Von der Zeitungsdigitalisierung zu historischen Netzwerken - Methoden und Her...Von der Zeitungsdigitalisierung zu historischen Netzwerken - Methoden und Her...
Von der Zeitungsdigitalisierung zu historischen Netzwerken - Methoden und Her...
cneudecker286 views
OCR-D: An end-to-end open source OCR framework for historical printed documents by cneudecker
OCR-D: An end-to-end open source OCR framework for historical printed documentsOCR-D: An end-to-end open source OCR framework for historical printed documents
OCR-D: An end-to-end open source OCR framework for historical printed documents
cneudecker2K views
Text and Data Mining by cneudecker
Text and Data MiningText and Data Mining
Text and Data Mining
cneudecker698 views
Formate für Volltexte by cneudecker
Formate für VolltexteFormate für Volltexte
Formate für Volltexte
cneudecker172 views
Extrablatt: The Latest News on Newspaper Digitisation in Europe by cneudecker
Extrablatt: The Latest News on Newspaper Digitisation in EuropeExtrablatt: The Latest News on Newspaper Digitisation in Europe
Extrablatt: The Latest News on Newspaper Digitisation in Europe
cneudecker375 views
Reise durch Europeana Collections in 11 Minuten by cneudecker
Reise durch Europeana Collections in 11 MinutenReise durch Europeana Collections in 11 Minuten
Reise durch Europeana Collections in 11 Minuten
cneudecker306 views
Europeana Newspapers in a Nutshell by cneudecker
Europeana Newspapers in a NutshellEuropeana Newspapers in a Nutshell
Europeana Newspapers in a Nutshell
cneudecker507 views
lab.sbb.berlin by cneudecker
lab.sbb.berlinlab.sbb.berlin
lab.sbb.berlin
cneudecker349 views
Named Entity Recognition for Europeana Newspapers by cneudecker
Named Entity Recognition for Europeana NewspapersNamed Entity Recognition for Europeana Newspapers
Named Entity Recognition for Europeana Newspapers
cneudecker644 views

Recently uploaded

Choosing the Right Flutter App Development Company by
Choosing the Right Flutter App Development CompanyChoosing the Right Flutter App Development Company
Choosing the Right Flutter App Development CompanyFicode Technologies
13 views9 slides
Innovation & Entrepreneurship strategies in Dairy Industry by
Innovation & Entrepreneurship strategies in Dairy IndustryInnovation & Entrepreneurship strategies in Dairy Industry
Innovation & Entrepreneurship strategies in Dairy IndustryPervaizDar1
35 views26 slides
"Node.js vs workers — A comparison of two JavaScript runtimes", James M Snell by
"Node.js vs workers — A comparison of two JavaScript runtimes", James M Snell"Node.js vs workers — A comparison of two JavaScript runtimes", James M Snell
"Node.js vs workers — A comparison of two JavaScript runtimes", James M SnellFwdays
14 views30 slides
NTGapps NTG LowCode Platform by
NTGapps NTG LowCode Platform NTGapps NTG LowCode Platform
NTGapps NTG LowCode Platform Mustafa Kuğu
437 views30 slides
Don’t Make A Human Do A Robot’s Job! : 6 Reasons Why AI Will Save Us & Not De... by
Don’t Make A Human Do A Robot’s Job! : 6 Reasons Why AI Will Save Us & Not De...Don’t Make A Human Do A Robot’s Job! : 6 Reasons Why AI Will Save Us & Not De...
Don’t Make A Human Do A Robot’s Job! : 6 Reasons Why AI Will Save Us & Not De...Moses Kemibaro
35 views38 slides
CryptoBotsAI by
CryptoBotsAICryptoBotsAI
CryptoBotsAIchandureddyvadala199
42 views5 slides

Recently uploaded(20)

Innovation & Entrepreneurship strategies in Dairy Industry by PervaizDar1
Innovation & Entrepreneurship strategies in Dairy IndustryInnovation & Entrepreneurship strategies in Dairy Industry
Innovation & Entrepreneurship strategies in Dairy Industry
PervaizDar135 views
"Node.js vs workers — A comparison of two JavaScript runtimes", James M Snell by Fwdays
"Node.js vs workers — A comparison of two JavaScript runtimes", James M Snell"Node.js vs workers — A comparison of two JavaScript runtimes", James M Snell
"Node.js vs workers — A comparison of two JavaScript runtimes", James M Snell
Fwdays14 views
NTGapps NTG LowCode Platform by Mustafa Kuğu
NTGapps NTG LowCode Platform NTGapps NTG LowCode Platform
NTGapps NTG LowCode Platform
Mustafa Kuğu437 views
Don’t Make A Human Do A Robot’s Job! : 6 Reasons Why AI Will Save Us & Not De... by Moses Kemibaro
Don’t Make A Human Do A Robot’s Job! : 6 Reasons Why AI Will Save Us & Not De...Don’t Make A Human Do A Robot’s Job! : 6 Reasons Why AI Will Save Us & Not De...
Don’t Make A Human Do A Robot’s Job! : 6 Reasons Why AI Will Save Us & Not De...
Moses Kemibaro35 views
The Power of Generative AI in Accelerating No Code Adoption.pdf by Saeed Al Dhaheri
The Power of Generative AI in Accelerating No Code Adoption.pdfThe Power of Generative AI in Accelerating No Code Adoption.pdf
The Power of Generative AI in Accelerating No Code Adoption.pdf
Saeed Al Dhaheri39 views
Redefining the book supply chain: A glimpse into the future - Tech Forum 2023 by BookNet Canada
Redefining the book supply chain: A glimpse into the future - Tech Forum 2023Redefining the book supply chain: A glimpse into the future - Tech Forum 2023
Redefining the book supply chain: A glimpse into the future - Tech Forum 2023
BookNet Canada44 views
Webinar : Desperately Seeking Transformation - Part 2: Insights from leading... by The Digital Insurer
Webinar : Desperately Seeking Transformation - Part 2:  Insights from leading...Webinar : Desperately Seeking Transformation - Part 2:  Insights from leading...
Webinar : Desperately Seeking Transformation - Part 2: Insights from leading...
The Role of Patterns in the Era of Large Language Models by Yunyao Li
The Role of Patterns in the Era of Large Language ModelsThe Role of Patterns in the Era of Large Language Models
The Role of Patterns in the Era of Large Language Models
Yunyao Li91 views
Mobile Core Solutions & Successful Cases.pdf by IPLOOK Networks
Mobile Core Solutions & Successful Cases.pdfMobile Core Solutions & Successful Cases.pdf
Mobile Core Solutions & Successful Cases.pdf
IPLOOK Networks14 views
Initiating and Advancing Your Strategic GIS Governance Strategy by Safe Software
Initiating and Advancing Your Strategic GIS Governance StrategyInitiating and Advancing Your Strategic GIS Governance Strategy
Initiating and Advancing Your Strategic GIS Governance Strategy
Safe Software184 views
Cocktail of Environments. How to Mix Test and Development Environments and St... by Aleksandr Tarasov
Cocktail of Environments. How to Mix Test and Development Environments and St...Cocktail of Environments. How to Mix Test and Development Environments and St...
Cocktail of Environments. How to Mix Test and Development Environments and St...
Transcript: Redefining the book supply chain: A glimpse into the future - Tec... by BookNet Canada
Transcript: Redefining the book supply chain: A glimpse into the future - Tec...Transcript: Redefining the book supply chain: A glimpse into the future - Tec...
Transcript: Redefining the book supply chain: A glimpse into the future - Tec...
BookNet Canada41 views
Optimizing Communication to Optimize Human Behavior - LCBM by Yaman Kumar
Optimizing Communication to Optimize Human Behavior - LCBMOptimizing Communication to Optimize Human Behavior - LCBM
Optimizing Communication to Optimize Human Behavior - LCBM
Yaman Kumar38 views
Business Analyst Series 2023 - Week 4 Session 8 by DianaGray10
Business Analyst Series 2023 -  Week 4 Session 8Business Analyst Series 2023 -  Week 4 Session 8
Business Analyst Series 2023 - Week 4 Session 8
DianaGray10145 views

Experimental Workflow Development in Digitisation

  • 1. IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. Experimental workflow development in digitisation The concept of collaborative workflow development in the IMPACT project Mustafa Dogan (Göttingen State and University Library) Clemens Neudecker (Koninklijke Bibliotheek) Gerd Zechmeister (Austrian National Library) Sven Schlarb (Austrian National Library)
  • 2. IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. 27.5.2010 QQML 2 Agenda  Background of IMPACT  Digitisation workflows  Collaborative workflow development  Architectural principles  Workflow development platform  Key success factors  Outlook and future scenarios
  • 3. IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. 27.5.2010 QQML 3 Background of IMPACT  Project partners – 26 Libraries, Research Institutes and Industry Partners  Main objective – Improve access to historical books and newspapers printed before 1900  Software tools and prototypes – Image Enhancement & Segmentation Toolkit – Improved ABBYY FineReader OCR Engine, IBM Adaptive OCR – Post-processing and -correction modules – Lexical resources for several European languages  Support to the MLA community – Best Practises & Strategic/Operational Guidelines – Online Helpdesk – Tool Showcases & Demonstrators – Centre of Competence
  • 4. Digitisation workflows. Digitisation: a sequence of steps, from selection of analogue source material to presentation of digital objects for end-users. Workflow: software-based execution of a sequence without human interaction. Challenges and barriers: workflows are tailored to specific needs; lack of interoperability for applied software and input/output data; lack of collaboratively used and developed resources and expertise.
  • 5. Collaborative workflow development. Workflow development as a community-driven activity using an experimental platform. Scientific workflows: using web services representing individual software modules (Shiyong Lu et al. 2009). Providing highly innovative and efficient tools to a wider community to design workflows. Technical staff providing the platform, conceptual/library staff designing workflows. Using Web 2.0 features to share and expand knowledge and resources.
  • 6. Architectural platform principles: Modularity; Transparency; Flexibility; Extensibility; Open standards based; Accessibility; Scalability; Collaboration.
  • 7. Workflow development platform
  • 8. Workflow development phases
  • 9. Evaluation criteria. OCR: correctly recognised characters/words. Segmentation: correctly identified text and graphical regions. Workflows: comparing workflows and identifying the most suitable. Statistical and provenance data: e.g. processing time.
  • 10. Outlook. Keys to success: joint effort by library and software development staff; usability of tools and platform; incentive to collaborative work; testing and adaptation of workflows; permanently tailoring and optimizing workflows. Future work: demonstration of current (web) services; experimental platform as a sustainable resource for a Centre of Competence for the MLA community.
  • 11. Thank you very much! Contact: project website http://www.impact-project.eu, project office impact@kb.nl
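
The OCR criterion on slide 9 (correctly recognised characters/words) can be made concrete with a small sketch. It assumes accuracy is computed as one minus the edit distance divided by the length of the ground truth, which is a common convention but is not stated in the slides; the function names are hypothetical and not part of any IMPACT tool.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion from ref
                            curr[j - 1] + 1,       # insertion into ref
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def accuracy(ref, hyp):
    """Assumed metric: 1 - distance / len(ref), floored at 0."""
    if len(ref) == 0:
        return 1.0 if len(hyp) == 0 else 0.0
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


def character_accuracy(ground_truth, ocr_text):
    """Accuracy over individual characters."""
    return accuracy(ground_truth, ocr_text)


def word_accuracy(ground_truth, ocr_text):
    """Accuracy over whitespace-separated words."""
    return accuracy(ground_truth.split(), ocr_text.split())
```

Reporting both figures is useful because a single misread character lowers character accuracy only slightly but invalidates a whole word for search purposes.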

Editor's Notes

  1. 26 Libraries, Research Institutes and Industry Partners: providing content/material, knowledge/expertise, tools/software modules/prototypes
  2. What is digitisation in IMPACT? How do we define a workflow in IMPACT? What are the challenges in current workflow development and application? Workflows are tailored to library- or project-specific needs: there is no out-of-the-box system, which causes labour- and cost-intensive evaluation and adaptation for repurposing. Lack of interoperability for applied software and input/output data. Lack of collaboratively used and developed resources and expertise. Human intervention is often required to guarantee ongoing processing.
  3. Concept of scientific workflows: see http://www.cs.wayne.edu/~shiyong/papers/tsc09.pdf (Shiyong Lu et al. 2009). Technical staff provide the platform and conceptual staff design workflows, so no in-depth technical and procedural knowledge is required of conceptual staff.
  4. Modularity: modules can be combined in a number of combinations to identify the most suitable processing chain; service-oriented architecture (SOA) is the guiding architectural design principle; loose coupling of reusable processing units, minimising interdependencies.
     Transparency: each processing step is tested and evaluated separately.
     Flexibility: platform-independent; capable of integrating different types of software; performance of tools can be compared easily.
     Extensibility: third-party components can be added with small extra effort; not restricted to software tools developed in IMPACT.
     Open standards based: widely supported open source software (Apache Software Foundation); interoperability through use of XML standards such as METS/ALTO for encoding of structural information and the OCR-recognised text; SOAP as the message exchange protocol; WSDL for web service description.
     Accessibility: 3 different types of interfaces: a user-friendly, graphical workflow design and execution interface; a web client generator for seamless integration into web sites; a machine interface (API).
     Scalability: components will be deployed in the IT infrastructure of different partner organisations in a distributed network with cloned services; services are available in a redundant way, balancing the workload and adding computing capacity when needed.
     Collaboration: community-wide applicability; optimisation of workflows; accessible by various channels (including Web 2.0 features); comprehensively described and documented.
  5. Joint effort by library and software development staff: the library side contributes concepts and content; software development (SD) contributes the technical framework, integration of services, etc. Expanding portfolio of web services: also scanning services, quality assurance/evaluation modules, etc., to cover the entire range of digitisation workflow steps.
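
The modularity principle in note 4 (combining modules in different combinations to identify the most suitable processing chain) can be sketched as follows. This is only an illustration under stated assumptions: plain Python functions stand in for the loosely coupled web services, every module name is hypothetical, and the scoring function is supplied by the caller rather than taken from IMPACT.

```python
from itertools import product

# Hypothetical stand-ins for workflow modules; each maps text to text.
def keep_as_is(text):
    return text

def strip_noise(text):
    # Toy clean-up step: drop a character used here to simulate noise.
    return text.replace("~", "")

def replace_long_s(text):
    # Toy post-correction step: normalise the historical long s.
    return text.replace("ſ", "s")


def run_chain(chain, text):
    """Apply each module of a processing chain in order."""
    for step in chain:
        text = step(text)
    return text


def best_chain(stage_options, text, score):
    """Try one module per stage in every combination; return (score, chain)."""
    results = []
    for chain in product(*stage_options):
        results.append((score(run_chain(chain, text)), chain))
    return max(results, key=lambda result: result[0])
```

For example, calling `best_chain([[keep_as_is, strip_noise], [keep_as_is, replace_long_s]], "Hiſtor~ical text", score)` with a `score` that rewards agreement with a ground-truth string evaluates all four chains. Because each module only sees the output of the previous one, a third-party module can be added to a stage without changing the others.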