IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. 
An Experimental Workflow Development Platform for Historical Document Digitisation
Clemens Neudecker, KB National Library of the Netherlands
Background
• IMPACT – Improving Access to Text (2008 – 2011)
From a technical perspective:
> 20 software components for solving specific issues
Prototyping new algorithms, improving commercial solutions
Different frameworks (C, C++, Java, etc.), platforms (Win/Linux)
+ 3rd party applications
“One ring to rule them all…”
• IMPACT Interoperability Framework (IIF)
Main requirements
Behavioural:
• Minimize integration effort
• Minimize deployment effort
• Maximize usability
• Maximize scalability
Functional:
• Modular
• Transparent
• Expandable
• Open source
• Platform independent
Framework integration
• Simple-to-use generic wrapper that exposes command line tools as web services
Architecture
• IMPACT Interoperability Framework: Technologies
- Java
- Apache Maven
- Apache Tomcat
- Apache Axis2 + Synapse
- Taverna Workflow Engine
• IMPACT Interoperability Framework: Dataset
- more than 600,000 images from digital libraries
- more than 50,000 ground truth transcriptions
Generic Web Service Wrapper
Only requirement: Command Line Application → HTML form
Source code available on GitHub:
https://github.com/impactcentre/toolwrapper
• Easy integration: developers can focus on their application and have to worry less about integration = higher quality software components
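The idea of wrapping an arbitrary command line application behind a uniform interface can be sketched in a few lines. This is only an illustration of the principle, not the toolwrapper's actual code or configuration format: the function names `build_command` and `run_tool`, the `${name}` placeholder syntax and the returned dictionary layout are all assumptions made for this example.

```python
import subprocess
import sys
from string import Template
from typing import Dict, List


def build_command(tokens: List[str], params: Dict[str, str]) -> List[str]:
    """Fill ${name} placeholders in each token of a command description.

    Substituting per token (hypothetical convention) keeps a parameter
    value that contains spaces as a single argument.
    """
    return [Template(token).substitute(params) for token in tokens]


def run_tool(tokens: List[str], params: Dict[str, str], timeout: float = 60.0) -> Dict[str, object]:
    """Run the described command and return a uniform result record."""
    argv = build_command(tokens, params)
    # No shell is involved, so parameter values are never interpreted as shell syntax.
    completed = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return {
        "returncode": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }


if __name__ == "__main__":
    # Stand-in "tool": the Python interpreter printing a message.
    description = ["${python}", "-c", "${code}"]
    result = run_tool(description, {"python": sys.executable, "code": "print('wrapped')"})
    print(result["returncode"], result["stdout"].strip())
```

A web service layer would then only need to map request parameters to `params` and return the result record, which is why the tool itself needs nothing beyond a command line interface.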
Workflows
• OCR workflow = data pipeline
• Building blocks = processing modules
• Integration = interaction between nodes (mashups)
• Collaboration with
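The view of a workflow as a data pipeline built from processing modules can be illustrated as plain function composition. This is a minimal sketch, not how Taverna represents workflows: `make_pipeline` and the three text-cleaning steps are hypothetical stand-ins for real processing modules.

```python
from functools import reduce
from typing import Callable, List

# A processing module takes the current data and returns the transformed data.
Step = Callable[[str], str]


def make_pipeline(steps: List[Step]) -> Step:
    """Chain processing modules so the output of one feeds the next."""
    def run(data: str) -> str:
        return reduce(lambda acc, step: step(acc), steps, data)
    return run


# Hypothetical building blocks standing in for real OCR post-processing modules.
def normalise_whitespace(text: str) -> str:
    return " ".join(text.split())


def replace_long_s(text: str) -> str:
    return text.replace("ſ", "s")


def lowercase(text: str) -> str:
    return text.lower()


pipeline = make_pipeline([normalise_whitespace, replace_long_s, lowercase])
print(pipeline("  Hiſtorical   Text "))  # prints "historical text"
```

Because every module has the same shape, modules can be reordered, swapped or shared without changing the code that runs the pipeline.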
Workflow management
• Web 2.0 style registry: myExperiment
• Local client: Taverna Workbench
• Web client: Project website
Local client: Taverna Workbench
• Background: BioSciences
• Developed and maintained by myGrid, UK
• Available for Windows/Linux/OSX and as open source (Java)
Web client: Taverna Server/Workflow Parser
• SOAP/REST API
• Remote execution of workflows
Community
• Web 2.0 style workflow registry
• Community of experts
• Sharing of resources
• Knowledge exchange
• A central meeting point for users and researchers
Compute cluster
• Enterprise Service Bus receives requests from users and distributes the load to the available worker nodes
• Main effect: process parallelization, load distribution, failover
• Test deployment on Dutch Supercomputing Cloud HPC
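Load distribution with failover, as attributed to the Enterprise Service Bus above, can be illustrated with a small in-process dispatcher. This is a sketch of the general technique only, assuming a simple round-robin policy; it does not reflect how Apache Synapse is configured in the project, it runs jobs sequentially rather than in parallel, and the `Dispatcher` class and worker functions are hypothetical.

```python
from typing import Callable, List, Tuple

Worker = Tuple[str, Callable[[str], str]]


class Dispatcher:
    """Round-robin dispatcher that skips to the next worker when one fails."""

    def __init__(self, workers: List[Worker]) -> None:
        if not workers:
            raise ValueError("at least one worker is required")
        self.workers = list(workers)
        self._next = 0  # index of the worker that gets the next request first

    def submit(self, job: str) -> Tuple[str, str]:
        errors = []
        count = len(self.workers)
        for offset in range(count):
            index = (self._next + offset) % count
            name, handler = self.workers[index]
            try:
                result = handler(job)
            except Exception as exc:  # failover: record the error, try the next node
                errors.append(f"{name}: {exc}")
                continue
            self._next = (index + 1) % count
            return name, result
        raise RuntimeError("all workers failed: " + "; ".join(errors))


def healthy(job: str) -> str:
    return f"processed {job}"


def broken(job: str) -> str:
    raise ConnectionError("node unreachable")


dispatcher = Dispatcher([("node-a", healthy), ("node-b", broken), ("node-c", healthy)])
for page in ["page-1", "page-2", "page-3"]:
    print(dispatcher.submit(page))
```

In this run the second request is first offered to `node-b`, fails, and is transparently served by `node-c`; the caller only sees the successful result.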
Dataset
• Representative and annotated dataset of significant size, with metadata, ground truth and search facilities
Evaluation features
• Text based comparison of result with ground truth, using the Levenshtein distance method
• Layout based comparison of result with ground truth, using the Page Analysis and Ground Truth Elements (PAGE) framework
• Example:
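The text based comparison can be illustrated with a standard dynamic programming implementation of the Levenshtein distance. This is a generic sketch, not the project's evaluation tool; the `character_error_rate` helper, which normalises the distance by the length of the ground truth, is an added assumption and is not mentioned in the slides.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if char_a == char_b else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(ocr_text: str, ground_truth: str) -> float:
    """Edit distance divided by ground truth length (assumed metric for illustration)."""
    if not ground_truth:
        return 0.0 if not ocr_text else 1.0
    return levenshtein(ocr_text, ground_truth) / len(ground_truth)


print(levenshtein("kitten", "sitting"))  # 3
print(character_error_rate("Hiftory", "History"))
```

A lower value means the OCR result is closer to the ground truth transcription; identical texts give a distance of 0.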
Outlook
• Online service for testing/evaluation/processing
- Results Repository (WebDAV, POI)
• Extending the scope:
- Workflows for linguistic analysis: CLARIN
- Workflows for preservation: SCAPE
• Even better scalability: MapReduce/Hadoop
• Supported by a community of developers & practitioners
Summary 
- Availability of resources (images, ground truth and tools) 
to the international research community 
- A common baseline for transparent evaluation and comparison 
- Ready-to-use components, reproducible experiments 
- Sharing of results and know-how 
- Enable scalability for prototypes/data intensive workflows 
- Simple and uniform user interface for all embedded tools 
- Consolidation of support and maintenance 
Thank you! 
Questions?

More Related Content

What's hot

Amersfoort 2016 koch_wg_v02
Amersfoort 2016 koch_wg_v02Amersfoort 2016 koch_wg_v02
Amersfoort 2016 koch_wg_v02walter koch
 
G02 walter koch_micro_services
G02 walter koch_micro_servicesG02 walter koch_micro_services
G02 walter koch_micro_servicesevaminerva
 
Governments ENabled through IPv6 - GEN6 - project overview
Governments ENabled through IPv6 - GEN6 - project overviewGovernments ENabled through IPv6 - GEN6 - project overview
Governments ENabled through IPv6 - GEN6 - project overviewGovernments ENabled with IPv6
 
Up2U Workshop at TNC 2018-introduction
Up2U Workshop at TNC 2018-introductionUp2U Workshop at TNC 2018-introduction
Up2U Workshop at TNC 2018-introductionUp2Universe
 
Heterogeneous HPC Computing in the DeepHealth Project
Heterogeneous HPC Computing in the DeepHealth ProjectHeterogeneous HPC Computing in the DeepHealth Project
Heterogeneous HPC Computing in the DeepHealth ProjectBig Data Value Association
 
ICN in the IRTF and IETF
ICN in the IRTF and IETFICN in the IRTF and IETF
ICN in the IRTF and IETFDirk Kutscher
 
SEMANCO poster at ESWC 2014
SEMANCO poster at ESWC 2014SEMANCO poster at ESWC 2014
SEMANCO poster at ESWC 2014Álvaro Sicilia
 
Polyglot Notebooks with Squeak/Smalltalk on the GraalVM
Polyglot Notebooks with Squeak/Smalltalk on the GraalVMPolyglot Notebooks with Squeak/Smalltalk on the GraalVM
Polyglot Notebooks with Squeak/Smalltalk on the GraalVMESUG
 
CIB W78 Accelerating BIM Workshop 2015 - IFC2RDF tools
CIB W78 Accelerating BIM Workshop 2015 - IFC2RDF toolsCIB W78 Accelerating BIM Workshop 2015 - IFC2RDF tools
CIB W78 Accelerating BIM Workshop 2015 - IFC2RDF toolsPieter Pauwels
 
Lime recommendation
Lime recommendationLime recommendation
Lime recommendationJohn Pereira
 
CV Senior Integratie en Cloud Architect an
CV Senior Integratie en Cloud Architect anCV Senior Integratie en Cloud Architect an
CV Senior Integratie en Cloud Architect anJurriaan Brandsma
 
VIVOCamp slides: agenda and slides on the extension of the ontology
VIVOCamp slides: agenda and slides on the extension of the ontologyVIVOCamp slides: agenda and slides on the extension of the ontology
VIVOCamp slides: agenda and slides on the extension of the ontologyValeria Pesce
 

What's hot (20)

Amersfoort 2016 koch_wg_v02
Amersfoort 2016 koch_wg_v02Amersfoort 2016 koch_wg_v02
Amersfoort 2016 koch_wg_v02
 
G02 walter koch_micro_services
G02 walter koch_micro_servicesG02 walter koch_micro_services
G02 walter koch_micro_services
 
FIRE slideshow @ECFI-2
FIRE slideshow @ECFI-2FIRE slideshow @ECFI-2
FIRE slideshow @ECFI-2
 
Governments ENabled through IPv6 - GEN6 - project overview
Governments ENabled through IPv6 - GEN6 - project overviewGovernments ENabled through IPv6 - GEN6 - project overview
Governments ENabled through IPv6 - GEN6 - project overview
 
Up2U Workshop at TNC 2018-introduction
Up2U Workshop at TNC 2018-introductionUp2U Workshop at TNC 2018-introduction
Up2U Workshop at TNC 2018-introduction
 
Fire at Net Futures2015
Fire at Net Futures2015Fire at Net Futures2015
Fire at Net Futures2015
 
Heterogeneous HPC Computing in the DeepHealth Project
Heterogeneous HPC Computing in the DeepHealth ProjectHeterogeneous HPC Computing in the DeepHealth Project
Heterogeneous HPC Computing in the DeepHealth Project
 
Deep Hybrid DataCloud
Deep Hybrid DataCloudDeep Hybrid DataCloud
Deep Hybrid DataCloud
 
AntoineLambertResume
AntoineLambertResumeAntoineLambertResume
AntoineLambertResume
 
FIRE at the ICT2015
FIRE at the ICT2015FIRE at the ICT2015
FIRE at the ICT2015
 
FIRE Brochure 2014 multimedia eBook -version
FIRE Brochure 2014 multimedia eBook -versionFIRE Brochure 2014 multimedia eBook -version
FIRE Brochure 2014 multimedia eBook -version
 
ICN in the IRTF and IETF
ICN in the IRTF and IETFICN in the IRTF and IETF
ICN in the IRTF and IETF
 
SEMANCO poster at ESWC 2014
SEMANCO poster at ESWC 2014SEMANCO poster at ESWC 2014
SEMANCO poster at ESWC 2014
 
Polyglot Notebooks with Squeak/Smalltalk on the GraalVM
Polyglot Notebooks with Squeak/Smalltalk on the GraalVMPolyglot Notebooks with Squeak/Smalltalk on the GraalVM
Polyglot Notebooks with Squeak/Smalltalk on the GraalVM
 
TANGO Project Poster v2
TANGO Project Poster v2TANGO Project Poster v2
TANGO Project Poster v2
 
CIB W78 Accelerating BIM Workshop 2015 - IFC2RDF tools
CIB W78 Accelerating BIM Workshop 2015 - IFC2RDF toolsCIB W78 Accelerating BIM Workshop 2015 - IFC2RDF tools
CIB W78 Accelerating BIM Workshop 2015 - IFC2RDF tools
 
Lime recommendation
Lime recommendationLime recommendation
Lime recommendation
 
CV Senior Integratie en Cloud Architect an
CV Senior Integratie en Cloud Architect anCV Senior Integratie en Cloud Architect an
CV Senior Integratie en Cloud Architect an
 
The European Portal for documents and Archives: the APEnet Project
The European Portal for documents and Archives: the APEnet ProjectThe European Portal for documents and Archives: the APEnet Project
The European Portal for documents and Archives: the APEnet Project
 
VIVOCamp slides: agenda and slides on the extension of the ontology
VIVOCamp slides: agenda and slides on the extension of the ontologyVIVOCamp slides: agenda and slides on the extension of the ontology
VIVOCamp slides: agenda and slides on the extension of the ontology
 

Viewers also liked

OCR challenges in historic documents and the contribution of IMPACT
OCR challenges in historic documents and the contribution of IMPACTOCR challenges in historic documents and the contribution of IMPACT
OCR challenges in historic documents and the contribution of IMPACTcneudecker
 
Europeana Newspapers in a nutshell
Europeana Newspapers in a nutshellEuropeana Newspapers in a nutshell
Europeana Newspapers in a nutshellcneudecker
 
The IMPACT Interoperability Framework - Workflows for OCR and beyond
The IMPACT Interoperability Framework - Workflows for OCR and beyondThe IMPACT Interoperability Framework - Workflows for OCR and beyond
The IMPACT Interoperability Framework - Workflows for OCR and beyondcneudecker
 
Experimental Workflow Development in Digitisation
Experimental Workflow Development in DigitisationExperimental Workflow Development in Digitisation
Experimental Workflow Development in Digitisationcneudecker
 
Succeed 2nd hackathon
Succeed 2nd hackathonSucceed 2nd hackathon
Succeed 2nd hackathoncneudecker
 
Collaborative Workflow Development and Experimentation in the Digital Humanities
Collaborative Workflow Development and Experimentation in the Digital HumanitiesCollaborative Workflow Development and Experimentation in the Digital Humanities
Collaborative Workflow Development and Experimentation in the Digital Humanitiescneudecker
 
Refinement of Digitised Newspapers
Refinement of Digitised NewspapersRefinement of Digitised Newspapers
Refinement of Digitised Newspaperscneudecker
 
Europeana Newspapers - the Gateway to European Newspapers Online
Europeana Newspapers - the Gateway to European Newspapers OnlineEuropeana Newspapers - the Gateway to European Newspapers Online
Europeana Newspapers - the Gateway to European Newspapers Onlinecneudecker
 
Bessere Suchergebnisse durch Named Entity Recognition
Bessere Suchergebnisse durch Named Entity RecognitionBessere Suchergebnisse durch Named Entity Recognition
Bessere Suchergebnisse durch Named Entity Recognitioncneudecker
 
Climbing the Tower of Babel: Challenges and Opportunities in Multilingual Dat...
Climbing the Tower of Babel: Challenges and Opportunities in Multilingual Dat...Climbing the Tower of Babel: Challenges and Opportunities in Multilingual Dat...
Climbing the Tower of Babel: Challenges and Opportunities in Multilingual Dat...cneudecker
 
IMPACT HPC Cloud Day
IMPACT HPC Cloud DayIMPACT HPC Cloud Day
IMPACT HPC Cloud Daycneudecker
 
Digitale Kuratierungstechnologien in Bibliotheken
Digitale Kuratierungstechnologien in BibliothekenDigitale Kuratierungstechnologien in Bibliotheken
Digitale Kuratierungstechnologien in Bibliothekencneudecker
 
The Elephant in the Library - Integrating Hadoop
The Elephant in the Library - Integrating HadoopThe Elephant in the Library - Integrating Hadoop
The Elephant in the Library - Integrating Hadoopcneudecker
 
Berliner DH Rundgang
Berliner DH RundgangBerliner DH Rundgang
Berliner DH Rundgangcneudecker
 
Neudecker who-cares-about-yesterday’s-news-–-use-cases-and-requirements-for-n...
Neudecker who-cares-about-yesterday’s-news-–-use-cases-and-requirements-for-n...Neudecker who-cares-about-yesterday’s-news-–-use-cases-and-requirements-for-n...
Neudecker who-cares-about-yesterday’s-news-–-use-cases-and-requirements-for-n...cneudecker
 
Preservation Workflows with Taverna
Preservation Workflows with TavernaPreservation Workflows with Taverna
Preservation Workflows with Tavernacneudecker
 
What is Hadoop?
What is Hadoop?What is Hadoop?
What is Hadoop?cneudecker
 

Viewers also liked (17)

OCR challenges in historic documents and the contribution of IMPACT
OCR challenges in historic documents and the contribution of IMPACTOCR challenges in historic documents and the contribution of IMPACT
OCR challenges in historic documents and the contribution of IMPACT
 
Europeana Newspapers in a nutshell
Europeana Newspapers in a nutshellEuropeana Newspapers in a nutshell
Europeana Newspapers in a nutshell
 
The IMPACT Interoperability Framework - Workflows for OCR and beyond
The IMPACT Interoperability Framework - Workflows for OCR and beyondThe IMPACT Interoperability Framework - Workflows for OCR and beyond
The IMPACT Interoperability Framework - Workflows for OCR and beyond
 
Experimental Workflow Development in Digitisation
Experimental Workflow Development in DigitisationExperimental Workflow Development in Digitisation
Experimental Workflow Development in Digitisation
 
Succeed 2nd hackathon
Succeed 2nd hackathonSucceed 2nd hackathon
Succeed 2nd hackathon
 
Collaborative Workflow Development and Experimentation in the Digital Humanities
Collaborative Workflow Development and Experimentation in the Digital HumanitiesCollaborative Workflow Development and Experimentation in the Digital Humanities
Collaborative Workflow Development and Experimentation in the Digital Humanities
 
Refinement of Digitised Newspapers
Refinement of Digitised NewspapersRefinement of Digitised Newspapers
Refinement of Digitised Newspapers
 
Europeana Newspapers - the Gateway to European Newspapers Online
Europeana Newspapers - the Gateway to European Newspapers OnlineEuropeana Newspapers - the Gateway to European Newspapers Online
Europeana Newspapers - the Gateway to European Newspapers Online
 
Bessere Suchergebnisse durch Named Entity Recognition
Bessere Suchergebnisse durch Named Entity RecognitionBessere Suchergebnisse durch Named Entity Recognition
Bessere Suchergebnisse durch Named Entity Recognition
 
Climbing the Tower of Babel: Challenges and Opportunities in Multilingual Dat...
Climbing the Tower of Babel: Challenges and Opportunities in Multilingual Dat...Climbing the Tower of Babel: Challenges and Opportunities in Multilingual Dat...
Climbing the Tower of Babel: Challenges and Opportunities in Multilingual Dat...
 
IMPACT HPC Cloud Day
IMPACT HPC Cloud DayIMPACT HPC Cloud Day
IMPACT HPC Cloud Day
 
Digitale Kuratierungstechnologien in Bibliotheken
Digitale Kuratierungstechnologien in BibliothekenDigitale Kuratierungstechnologien in Bibliotheken
Digitale Kuratierungstechnologien in Bibliotheken
 
The Elephant in the Library - Integrating Hadoop
The Elephant in the Library - Integrating HadoopThe Elephant in the Library - Integrating Hadoop
The Elephant in the Library - Integrating Hadoop
 
Berliner DH Rundgang
Berliner DH RundgangBerliner DH Rundgang
Berliner DH Rundgang
 
Neudecker who-cares-about-yesterday’s-news-–-use-cases-and-requirements-for-n...
Neudecker who-cares-about-yesterday’s-news-–-use-cases-and-requirements-for-n...Neudecker who-cares-about-yesterday’s-news-–-use-cases-and-requirements-for-n...
Neudecker who-cares-about-yesterday’s-news-–-use-cases-and-requirements-for-n...
 
Preservation Workflows with Taverna
Preservation Workflows with TavernaPreservation Workflows with Taverna
Preservation Workflows with Taverna
 
What is Hadoop?
What is Hadoop?What is Hadoop?
What is Hadoop?
 

Similar to IMPACT at OCR Summit

IMPACT Demo Dag at KB
IMPACT Demo Dag at KBIMPACT Demo Dag at KB
IMPACT Demo Dag at KBcneudecker
 
Centre of Competence in digitisation. Clemens Neudecker
Centre of Competence in digitisation. Clemens NeudeckerCentre of Competence in digitisation. Clemens Neudecker
Centre of Competence in digitisation. Clemens NeudeckerBiblioteca Nacional de España
 
IMPACT Interoperability and Evaluation Framework. Clemens Neudecker
IMPACT Interoperability and Evaluation Framework. Clemens NeudeckerIMPACT Interoperability and Evaluation Framework. Clemens Neudecker
IMPACT Interoperability and Evaluation Framework. Clemens NeudeckerBiblioteca Nacional de España
 
ECLAP White paper, social network for Cultural Heritage on Peforming arts
ECLAP White paper, social network for Cultural Heritage on Peforming artsECLAP White paper, social network for Cultural Heritage on Peforming arts
ECLAP White paper, social network for Cultural Heritage on Peforming artsPaolo Nesi
 
SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20...
SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20...SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20...
SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20...SCAPE Project
 
Application scenarios of the SCAPE project at the Austrian National Library
Application scenarios of the SCAPE project at the Austrian National LibraryApplication scenarios of the SCAPE project at the Austrian National Library
Application scenarios of the SCAPE project at the Austrian National LibrarySven Schlarb
 
LIBER Satellite Event, SCAPE by Sven Schlarb
LIBER Satellite Event, SCAPE by Sven SchlarbLIBER Satellite Event, SCAPE by Sven Schlarb
LIBER Satellite Event, SCAPE by Sven SchlarbSCAPE Project
 
Scape project presentation - Scalable Preservation Environments
Scape project presentation - Scalable Preservation EnvironmentsScape project presentation - Scalable Preservation Environments
Scape project presentation - Scalable Preservation EnvironmentsSCAPE Project
 
Wonderland @ Cattid - Sun's Virtual Workplace
Wonderland @ Cattid - Sun's Virtual WorkplaceWonderland @ Cattid - Sun's Virtual Workplace
Wonderland @ Cattid - Sun's Virtual Workplacevincenzo de simone
 
Cbsdl 2015 koch_wg_v01
Cbsdl 2015 koch_wg_v01Cbsdl 2015 koch_wg_v01
Cbsdl 2015 koch_wg_v01walter koch
 
Per Blixt - Fire results from call 5 and plans for call 7
Per Blixt - Fire results from call 5 and plans for call 7Per Blixt - Fire results from call 5 and plans for call 7
Per Blixt - Fire results from call 5 and plans for call 7Fire Conference 2010
 
Ecloud copenhagen-130625074823-phpapp01
Ecloud copenhagen-130625074823-phpapp01Ecloud copenhagen-130625074823-phpapp01
Ecloud copenhagen-130625074823-phpapp01The European Library
 
Europeana Cloud Work Package 1: Assessing Researchers' Needs in the Cloud
Europeana Cloud Work Package 1: Assessing Researchers' Needs in the CloudEuropeana Cloud Work Package 1: Assessing Researchers' Needs in the Cloud
Europeana Cloud Work Package 1: Assessing Researchers' Needs in the CloudTU Delft, Netherlands
 
SCAPE Webinar: Tools for uncovering preservation risks in large repositories
SCAPE Webinar: Tools for uncovering preservation risks in large repositoriesSCAPE Webinar: Tools for uncovering preservation risks in large repositories
SCAPE Webinar: Tools for uncovering preservation risks in large repositoriesSCAPE Project
 
Franco Niccolucci: Example of an EOSCpilot Science Demonstrator - TextCrowd
Franco Niccolucci: Example of an EOSCpilot Science Demonstrator - TextCrowdFranco Niccolucci: Example of an EOSCpilot Science Demonstrator - TextCrowd
Franco Niccolucci: Example of an EOSCpilot Science Demonstrator - TextCrowdEOSC-hub project
 
RNP Cloud Infrastructure model, services and challenges
RNP Cloud Infrastructure model, services and challengesRNP Cloud Infrastructure model, services and challenges
RNP Cloud Infrastructure model, services and challengesEUBrasilCloudFORUM .
 
The Improving Access to Text (IMPACT) project and other European initiatives
The Improving Access to Text (IMPACT) project and other European initiativesThe Improving Access to Text (IMPACT) project and other European initiatives
The Improving Access to Text (IMPACT) project and other European initiativesMichael Day
 

Similar to IMPACT at OCR Summit (20)

IMPACT Demo Dag at KB
IMPACT Demo Dag at KBIMPACT Demo Dag at KB
IMPACT Demo Dag at KB
 
Centre of Competence in digitisation. Clemens Neudecker
Centre of Competence in digitisation. Clemens NeudeckerCentre of Competence in digitisation. Clemens Neudecker
Centre of Competence in digitisation. Clemens Neudecker
 
IMPACT Interoperability and Evaluation Framework. Clemens Neudecker
IMPACT Interoperability and Evaluation Framework. Clemens NeudeckerIMPACT Interoperability and Evaluation Framework. Clemens Neudecker
IMPACT Interoperability and Evaluation Framework. Clemens Neudecker
 
ECLAP White paper, social network for Cultural Heritage on Peforming arts
ECLAP White paper, social network for Cultural Heritage on Peforming artsECLAP White paper, social network for Cultural Heritage on Peforming arts
ECLAP White paper, social network for Cultural Heritage on Peforming arts
 
SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20...
SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20...SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20...
SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20...
 
Application scenarios of the SCAPE project at the Austrian National Library
Application scenarios of the SCAPE project at the Austrian National LibraryApplication scenarios of the SCAPE project at the Austrian National Library
Application scenarios of the SCAPE project at the Austrian National Library
 
LIBER Satellite Event, SCAPE by Sven Schlarb
LIBER Satellite Event, SCAPE by Sven SchlarbLIBER Satellite Event, SCAPE by Sven Schlarb
LIBER Satellite Event, SCAPE by Sven Schlarb
 
Bne impact co_c
Bne impact co_cBne impact co_c
Bne impact co_c
 
Scape project presentation - Scalable Preservation Environments
Scape project presentation - Scalable Preservation EnvironmentsScape project presentation - Scalable Preservation Environments
Scape project presentation - Scalable Preservation Environments
 
Wonderland @ Cattid - Sun's Virtual Workplace
Wonderland @ Cattid - Sun's Virtual WorkplaceWonderland @ Cattid - Sun's Virtual Workplace
Wonderland @ Cattid - Sun's Virtual Workplace
 
Cbsdl 2015 koch_wg_v01
Cbsdl 2015 koch_wg_v01Cbsdl 2015 koch_wg_v01
Cbsdl 2015 koch_wg_v01
 
Per Blixt - Fire results from call 5 and plans for call 7
Per Blixt - Fire results from call 5 and plans for call 7Per Blixt - Fire results from call 5 and plans for call 7
Per Blixt - Fire results from call 5 and plans for call 7
 
Ecloud copenhagen-130625074823-phpapp01
Ecloud copenhagen-130625074823-phpapp01Ecloud copenhagen-130625074823-phpapp01
Ecloud copenhagen-130625074823-phpapp01
 
Europeana Cloud Work Package 1: Assessing Researchers' Needs in the Cloud
Europeana Cloud Work Package 1: Assessing Researchers' Needs in the CloudEuropeana Cloud Work Package 1: Assessing Researchers' Needs in the Cloud
Europeana Cloud Work Package 1: Assessing Researchers' Needs in the Cloud
 
FIRE slideshow @ECFI-2
FIRE slideshow @ECFI-2FIRE slideshow @ECFI-2
FIRE slideshow @ECFI-2
 
SCAPE Webinar: Tools for uncovering preservation risks in large repositories
SCAPE Webinar: Tools for uncovering preservation risks in large repositoriesSCAPE Webinar: Tools for uncovering preservation risks in large repositories
SCAPE Webinar: Tools for uncovering preservation risks in large repositories
 
Franco Niccolucci: Example of an EOSCpilot Science Demonstrator - TextCrowd
Franco Niccolucci: Example of an EOSCpilot Science Demonstrator - TextCrowdFranco Niccolucci: Example of an EOSCpilot Science Demonstrator - TextCrowd
Franco Niccolucci: Example of an EOSCpilot Science Demonstrator - TextCrowd
 
IMPACT Final Conference - Muehlberger - FEP
IMPACT Final Conference - Muehlberger - FEPIMPACT Final Conference - Muehlberger - FEP
IMPACT Final Conference - Muehlberger - FEP
 
RNP Cloud Infrastructure model, services and challenges
RNP Cloud Infrastructure model, services and challengesRNP Cloud Infrastructure model, services and challenges
RNP Cloud Infrastructure model, services and challenges
 
The Improving Access to Text (IMPACT) project and other European initiatives
The Improving Access to Text (IMPACT) project and other European initiativesThe Improving Access to Text (IMPACT) project and other European initiatives
The Improving Access to Text (IMPACT) project and other European initiatives
 

More from cneudecker

EuropeanaTech x AI: Qurator.ai @ Berlin State Library
EuropeanaTech x AI: Qurator.ai @ Berlin State LibraryEuropeanaTech x AI: Qurator.ai @ Berlin State Library
EuropeanaTech x AI: Qurator.ai @ Berlin State Librarycneudecker
 
ALTO, PAGE & Co. Formate für Volltexte
ALTO, PAGE & Co. Formate für VolltexteALTO, PAGE & Co. Formate für Volltexte
ALTO, PAGE & Co. Formate für Volltextecneudecker
 
OCR und Strukturerkennung für Zeitungen
OCR und Strukturerkennung für ZeitungenOCR und Strukturerkennung für Zeitungen
OCR und Strukturerkennung für Zeitungencneudecker
 
Digitisation and Digital Humanities - what is the role of Libraries?
Digitisation and Digital Humanities - what is the role of Libraries?Digitisation and Digital Humanities - what is the role of Libraries?
Digitisation and Digital Humanities - what is the role of Libraries?cneudecker
 
Multimodal Perspectives for Digitised Historical Newspapers
Multimodal Perspectives for Digitised Historical NewspapersMultimodal Perspectives for Digitised Historical Newspapers
Multimodal Perspectives for Digitised Historical Newspaperscneudecker
 
OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...
OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...
OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...cneudecker
 
AI for digitized cultural heritage
AI for digitized cultural heritageAI for digitized cultural heritage
AI for digitized cultural heritagecneudecker
 
Kuratieren mit künstlicher Intelligenz
Kuratieren mit künstlicher IntelligenzKuratieren mit künstlicher Intelligenz
Kuratieren mit künstlicher Intelligenzcneudecker
 
Überblick zum DFG-Projekt OCR-D
Überblick zum DFG-Projekt OCR-DÜberblick zum DFG-Projekt OCR-D
Überblick zum DFG-Projekt OCR-Dcneudecker
 
The many uses of digitized newspapers
The many uses of digitized newspapersThe many uses of digitized newspapers
The many uses of digitized newspaperscneudecker
 
Digitalisate kuratieren mit KI - von unstrukturierten Daten zu strukturierten...
Digitalisate kuratieren mit KI - von unstrukturierten Daten zu strukturierten...Digitalisate kuratieren mit KI - von unstrukturierten Daten zu strukturierten...
Digitalisate kuratieren mit KI - von unstrukturierten Daten zu strukturierten...cneudecker
 
Von der Zeitungsdigitalisierung zu historischen Netzwerken - Methoden und Her...
Von der Zeitungsdigitalisierung zu historischen Netzwerken - Methoden und Her...Von der Zeitungsdigitalisierung zu historischen Netzwerken - Methoden und Her...
Von der Zeitungsdigitalisierung zu historischen Netzwerken - Methoden und Her...cneudecker
 
OCR-D: An end-to-end open source OCR framework for historical printed documents
OCR-D: An end-to-end open source OCR framework for historical printed documentsOCR-D: An end-to-end open source OCR framework for historical printed documents
OCR-D: An end-to-end open source OCR framework for historical printed documentscneudecker
 
Text and Data Mining
Text and Data MiningText and Data Mining
Text and Data Miningcneudecker
 
Formate für Volltexte
Formate für VolltexteFormate für Volltexte
Formate für Volltextecneudecker
 
Extrablatt: The Latest News on Newspaper Digitisation in Europe
Extrablatt: The Latest News on Newspaper Digitisation in EuropeExtrablatt: The Latest News on Newspaper Digitisation in Europe
Extrablatt: The Latest News on Newspaper Digitisation in Europecneudecker
 
Reise durch Europeana Collections in 11 Minuten
Reise durch Europeana Collections in 11 MinutenReise durch Europeana Collections in 11 Minuten
Reise durch Europeana Collections in 11 Minutencneudecker
 
Europeana Newspapers in a Nutshell
Europeana Newspapers in a NutshellEuropeana Newspapers in a Nutshell
Europeana Newspapers in a Nutshellcneudecker
 
lab.sbb.berlin
lab.sbb.berlinlab.sbb.berlin
lab.sbb.berlincneudecker
 
Named Entity Recognition for Europeana Newspapers
Named Entity Recognition for Europeana NewspapersNamed Entity Recognition for Europeana Newspapers
Named Entity Recognition for Europeana Newspaperscneudecker
 




IMPACT at OCR Summit

  • 1. An Experimental Workflow Development Platform for Historical Document Digitisation
       Clemens Neudecker, KB National Library of the Netherlands
       IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands.
  • 2. Background
       - IMPACT – Improving Access to Text (2008–2011)
       - From a technical perspective:
           - More than 20 software components for solving specific issues
           - Prototyping new algorithms, improving commercial solutions
           - Different frameworks (C, C++, Java, etc.), platforms (Win/Linux)
           - Plus 3rd party applications
       - “One ring to rule them all…”
       - IMPACT Interoperability Framework (IIF)
  • 3. Main requirements
       - Behavioural:
           - Minimize integration effort
           - Minimize deployment effort
           - Maximize usability
           - Maximize scalability
       - Functional:
           - Modular
           - Transparent
           - Expandable
           - Open source
           - Platform independent
  • 4. Framework integration
       - Simple-to-use generic command line wrapper for web services
  • 5. Architecture
       - IMPACT Interoperability Framework: Technologies
           - Java
           - Apache Maven
           - Apache Tomcat
           - Apache Axis2 + Synapse
           - Taverna Workflow Engine
       - IMPACT Interoperability Framework: Dataset
           - More than 600,000 images from digital libraries
           - More than 50,000 ground truth transcriptions
  • 6. Generic Web Service Wrapper
       - Only requirement: command line application
       - HTML form
       - Source code available on GitHub: https://github.com/impactcentre/toolwrapper
       - Easy integration: developers can focus on their application and have to worry less about integration, which means higher-quality software components
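The idea of wrapping an arbitrary command line application behind a uniform call interface can be sketched as follows. This is a minimal illustration in Python, not the toolwrapper code itself: the names `run_wrapped_tool`, `ToolResult`, and `UPPERCASE_TOOL` are hypothetical, and the "tool" is a one-line stand-in script rather than a real IMPACT component.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class ToolResult:
    """Outcome of one wrapped command line call (hypothetical structure)."""
    returncode: int
    stdout: str
    stderr: str


def run_wrapped_tool(command_template, params, timeout=60):
    """Fill a command line template with request parameters and run it.

    command_template is a list of argv tokens; a token may contain
    placeholders such as "{text}" that are filled from params.
    """
    argv = [token.format(**params) for token in command_template]
    completed = subprocess.run(
        argv, capture_output=True, text=True, timeout=timeout
    )
    return ToolResult(completed.returncode, completed.stdout, completed.stderr)


# Stand-in for a real command line tool (assumption for illustration only):
# a tiny Python script that upper-cases its single argument.
UPPERCASE_TOOL = [
    sys.executable,
    "-c",
    "import sys; print(sys.argv[1].upper())",
    "{text}",
]

if __name__ == "__main__":
    result = run_wrapped_tool(UPPERCASE_TOOL, {"text": "fraktur"})
    print(result.returncode, result.stdout.strip())
```

A web service layer would then only need to map incoming request fields onto `params` and return the `ToolResult`, which is why the tool author has little to do beyond providing a command line interface.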
  • 7. Workflows
       - OCR workflow = data pipeline
       - Building blocks = processing modules
       - Integration = interaction between nodes (mashups)
       - Collaboration with
  • 8. [No text on this slide]
  • 9. Workflow management
       - Web 2.0 style registry: myExperiment
       - Local client: Taverna Workbench
       - Web client: Project website
  • 10. Local client: Taverna Workbench
       - Background: BioSciences
       - Developed and maintained by myGrid, UK
       - Available for Windows/Linux/OSX and as open source (Java)
  • 11. Web client: Taverna Server / Workflow Parser
       - SOAP/REST API
       - Remote execution of workflows
  • 12. Community
       - Web 2.0 style workflow registry
       - Community of experts
       - Sharing of resources
       - Knowledge exchange
       - A central meeting point for users and researchers
  • 13. Compute cluster
       - Enterprise Service Bus receives requests from users and distributes the load to the available worker nodes
       - Main effect: process parallelization, load distribution, failover
       - Test deployment on Dutch Supercomputing Cloud HPC
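The load distribution and failover behaviour described on the compute cluster slide can be illustrated with a small round-robin dispatcher. This is only a sketch under stated assumptions: the slides do not say which distribution policy the Enterprise Service Bus uses, so round-robin is chosen here for simplicity, and `Dispatcher`, `make_worker`, and `broken_worker` are hypothetical names standing in for real worker nodes.

```python
class Dispatcher:
    """Round-robin job dispatcher with failover (illustrative only)."""

    def __init__(self, workers):
        self.workers = list(workers)
        self._next = 0  # index of the worker to try first for the next job

    def submit(self, job):
        """Send the job to the next worker; skip workers that are down."""
        count = len(self.workers)
        for attempt in range(count):
            index = (self._next + attempt) % count
            try:
                result = self.workers[index](job)
            except ConnectionError:
                continue  # failover: try the following worker
            self._next = (index + 1) % count
            return result
        raise RuntimeError("no worker node available")


def make_worker(name):
    """Create a hypothetical healthy worker that tags the job with its name."""
    def worker(job):
        return f"{name}:{job}"
    return worker


def broken_worker(job):
    """Hypothetical worker node that is unreachable."""
    raise ConnectionError("node down")


if __name__ == "__main__":
    bus = Dispatcher([make_worker("a"), broken_worker, make_worker("b")])
    print([bus.submit(f"page{n}") for n in range(1, 5)])
```

Jobs alternate between the healthy nodes, and a job that would have gone to the unreachable node is transparently passed on to the next one.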
  • 14. Dataset
       - Representative and annotated dataset of significant size, with metadata, ground truth and search facilities
  • 15. Evaluation features
       - Text-based comparison of result with ground truth, using the Levenshtein distance method
       - Layout-based comparison of result with ground truth, using the Page Analysis and Ground Truth Elements (PAGE) framework
       - Example: [figure on slide, not transcribed]
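The text-based comparison mentioned on the evaluation slide can be sketched in a few lines of Python. The `levenshtein` function below is the standard dynamic-programming edit distance; normalising it by the ground truth length to obtain a character error rate is an assumption added here for illustration, since the slides name only the Levenshtein distance and do not say how scores are reported.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character insertions,
    deletions and substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(ocr_text, ground_truth):
    """Edit distance divided by ground truth length (assumed metric)."""
    if not ground_truth:
        return 0.0 if not ocr_text else 1.0
    return levenshtein(ocr_text, ground_truth) / len(ground_truth)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(character_error_rate("Tbe cat", "The cat"))
```

Comparing an OCR result against its ground truth transcription this way gives one number per page, which makes results from different tools or workflows directly comparable.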
  • 16. Outlook
       - Online service for testing/evaluation/processing
           - Results Repository (WebDAV, POI)
       - Extending the scope:
           - Workflows for linguistic analysis: CLARIN
           - Workflows for preservation: SCAPE
       - Even better scalability: MapReduce/Hadoop
       - Supported by a community of developers & practitioners
  • 17. Summary
       - Availability of resources (images, ground truth and tools) to the international research community
       - A common baseline for transparent evaluation and comparison
       - Ready-to-use components, reproducible experiments
       - Sharing of results and know-how
       - Enable scalability for prototypes/data-intensive workflows
       - Simple and uniform user interface for all embedded tools
       - Consolidation of support and maintenance
       Thank you! Questions?