IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. 
A sustainable infrastructure for large-scale document image analysis
HPC Cloud day – 4 October
Background 
• IMPACT – Improving Access to Text (2008–2011)
Large-scale integrating research project, funded by the EC 
– Consortium of 26 partners 
– Coordinated by the National Library of the Netherlands (KB) 
– EU funding: € 12 100 000 (FP7 ICT Work Programme) 
– From 2012: sustainable Centre of Competence with alternative resources
• Main objectives:
- Innovate OCR technology 
- Capacity building in mass-digitisation
OCR: A multitude of challenges… 
VVt Venetien den 1.Junij, Anno 1618. 
DJgn i f paffato te S' aö'Jifeert mo?üen/bah .)etgi'uotbciraetail)i.r/JtmelchontDecht te / 
sbnbe bele btr felbrr geiufttceert baer bnber eeniglje jprant o^fen/bie ftcb .met 
beSpaenfcbeu enbeeemgljen bifet Cbeiiupcen berbonbru befe 
• I. OCR challenges (gothic fonts, bleed-through, warping, etc.)
• II. Language challenges (spelling variants, inflection, and many more!)
Example: historical variants of the Dutch word ‘wereld’ (world): 
werelt weerelt wereld weerelds wereldt werelden weereld werrelts waerelds weerlyt 
wereldts vveerelts waereld weerelden waerelden weerlt werlt werelds sweerels 
zwerlys swarels swerelts werelts swerrels weirelts tsweerelds werret vverelt werlts 
werrelt worreld werlden wareld weirelt weireld waerelt werreld werld vvereld weerelts 
werlde tswerels werreldts weereldt wereldje waereldje weurlt wald weëled
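One way to picture how a historical lexicon helps with such variants: a lookup table maps each attested spelling to its modern form, so a search for "wereld" also finds the older spellings. The Python sketch below is an illustration only. The table holds just a handful of the forms listed above, and the names `VARIANT_TO_LEMMA`, `normalise` and `matches_query` are hypothetical, not part of any IMPACT tool.

```python
# Hypothetical mini-lexicon: historical spelling -> modern lemma.
# Only a few forms from the slide are included; a real lexicon would be far larger.
VARIANT_TO_LEMMA = {
    "werelt": "wereld",
    "weerelt": "wereld",
    "waereld": "wereld",
    "vvereld": "wereld",
    "weireld": "wereld",
    "werreld": "wereld",
}


def normalise(token: str) -> str:
    """Return the modern lemma for a known variant, else the lower-cased token."""
    key = token.lower()
    return VARIANT_TO_LEMMA.get(key, key)


def matches_query(text: str, query: str) -> bool:
    """True if any whitespace-separated token in text normalises to the query."""
    wanted = query.lower()
    return any(normalise(tok) == wanted for tok in text.split())
```

With this table, `matches_query("de gantsche waereld", "wereld")` is true even though the modern spelling never occurs in the text.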
IMPACT Solutions 
• From a technical perspective:
More than 20 software toolkits for solving different problems
• Such as:
OCR (C++, C#),
Image Processing & Lexica (DLL),
Command Line Tools (Win/Linux),
Java, Ruby, PHP, Perl, etc.
• IMPACT Interoperability Framework (IIF)
Architecture 
• IMPACT Interoperability Framework: Technologies
- Java 6 
- Generic Web Service Wrapper 
- Apache Maven 
- Apache Tomcat 
- Apache Axis2 
- Apache Synapse 
- Taverna Workflow Engine 
• IMPACT Interoperability Framework: Dataset
- PHP/MySQL database, frontend for search
- approx. 5 TB raw data (images, text files, metadata) and growing
How does it work? 
1. Digitisation/OCR challenges are registered and tagged in the database
• Example tag: warped text
2. The database contains the 99.95% correct result: the “ground truth”
3. Researcher develops new method to tackle a problem 
4. Research prototype is wrapped as a SOAP web service
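The wrapping step can be pictured roughly as follows: a short description of the command-line prototype is turned into an argument list that a service layer can then run on request. This is a Python sketch under stated assumptions, not the project's Java-based Generic Web Service Wrapper. The SOAP layer is left out entirely, and `ToolDescription`, the `{input}`/`{output}` placeholder format and the `deskew` tool name are hypothetical.

```python
import shlex
import subprocess
from dataclasses import dataclass


@dataclass
class ToolDescription:
    """Hypothetical description of a command-line prototype to expose as a service."""
    name: str
    command_template: str  # placeholders such as {input} and {output}


def build_command(tool: ToolDescription, **params: str) -> list[str]:
    """Fill the placeholders and return the command as an argument list."""
    # Split first, then substitute, so values containing spaces stay one argument.
    return [part.format(**params) for part in shlex.split(tool.command_template)]


def invoke(tool: ToolDescription, **params: str) -> str:
    """Run the tool and return its standard output; a service layer would call this."""
    result = subprocess.run(build_command(tool, **params),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Passing an argument list rather than a shell string keeps file names with spaces intact and avoids shell injection through request parameters.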
5. Web service is integrated as a workflow module 
6. Workflow module can be evaluated, based on the ground truth
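Evaluation against ground truth can be sketched as a comparison between a module's text output and the reference transcription. The slides do not say which metric is used; character accuracy based on edit distance is a common choice in OCR evaluation and is assumed here. The function names are illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def character_accuracy(ocr_text: str, ground_truth: str) -> float:
    """1 minus the error rate relative to the ground-truth length, clamped at 0."""
    if not ground_truth:
        return 1.0 if not ocr_text else 0.0
    errors = edit_distance(ocr_text, ground_truth)
    return max(0.0, 1.0 - errors / len(ground_truth))
```

Running the same pages through two competing modules and comparing their `character_accuracy` against the same reference gives a like-for-like measure of which one handles a tagged problem better.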
Current setup 
• The Enterprise Service Bus receives requests from users and distributes the load to the available worker nodes (worker node = server with all services installed)
• Main effects:
Process parallelization,
Load distribution,
Fail-over
• Drawback:
Data is sent to worker nodes all around Europe, so a huge amount of data needs to be sent over the network!
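The load distribution and fail-over behaviour described above can be illustrated with a small dispatcher. This is a sketch only and does not reflect the actual bus configuration. The slides do not name the distribution policy, so round-robin is assumed, and `Dispatcher`, `NoWorkerAvailable` and the use of `ConnectionError` to signal an unreachable node are hypothetical.

```python
from typing import Callable, Iterable


class NoWorkerAvailable(RuntimeError):
    """Raised when every worker node failed to handle a request."""


class Dispatcher:
    """Hypothetical stand-in for the bus: round-robin over workers with fail-over."""

    def __init__(self, workers: Iterable[Callable[[str], str]]):
        self._workers = list(workers)
        self._next = 0  # index of the worker to try first for the next request

    def dispatch(self, request: str) -> str:
        # Try each worker at most once, starting with the next one in line.
        for attempt in range(len(self._workers)):
            index = (self._next + attempt) % len(self._workers)
            try:
                result = self._workers[index](request)
            except ConnectionError:
                continue  # fail-over: move on to the next node
            self._next = (index + 1) % len(self._workers)
            return result
        raise NoWorkerAvailable(request)
```

Each worker here is just a callable; in the setup described, it would be a remote server, which is why every dispatched request also means shipping the page images to that node.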
Proposed setup 
Set up worker nodes on the HPC cloud (same location) 
Advantages:
- Improve speed and availability for concurrent users 
- Remove constraints for large-scale processing 
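A back-of-the-envelope calculation shows why co-locating the worker nodes matters. The sketch below uses the approx. 5 TB dataset size given earlier; both bandwidth figures are assumptions chosen for illustration, not measurements from the project, and the formula ignores protocol overhead and contention.

```python
def transfer_time_hours(size_bytes: float, bandwidth_bits_per_s: float) -> float:
    """Idealised transfer time: no protocol overhead, no contention."""
    return size_bytes * 8 / bandwidth_bits_per_s / 3600


DATASET_BYTES = 5e12     # approx. 5 TB (decimal terabytes assumed)
WAN_BITS_PER_S = 100e6   # assumed wide-area link between sites, 100 Mbit/s
LOCAL_BITS_PER_S = 10e9  # assumed link inside one data centre, 10 Gbit/s

wan_hours = transfer_time_hours(DATASET_BYTES, WAN_BITS_PER_S)
local_hours = transfer_time_hours(DATASET_BYTES, LOCAL_BITS_PER_S)
print(f"WAN: {wan_hours:.0f} h, local: {local_hours:.1f} h")
```

Under these assumed figures, moving the full dataset once takes about 111 hours over the wide-area link and about 1.1 hours inside a single location.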
Benefits 
• Scalable platform
• Availability of resources to a large number of users
• Enable research into scalable computing
• Consolidation of support and maintenance
• Various interfaces (web/local)
13

More Related Content

What's hot

The once and future library - reimagining the national library as infrastruct...
The once and future library - reimagining the national library as infrastruct...The once and future library - reimagining the national library as infrastruct...
The once and future library - reimagining the national library as infrastruct...
ukcorr
 
EuropeanaTech PGM12
EuropeanaTech PGM12EuropeanaTech PGM12
EuropeanaTech PGM12
Antoine Isaac
 
Project "The Digital City Revives, A Case Study of Web Archaeology"
Project "The Digital City Revives, A Case Study of Web Archaeology"Project "The Digital City Revives, A Case Study of Web Archaeology"
Project "The Digital City Revives, A Case Study of Web Archaeology"
Tjarda de Haan
 
Digital Cultural Heritage and the new EU Framework Programme
Digital Cultural Heritage and the new EU Framework ProgrammeDigital Cultural Heritage and the new EU Framework Programme
Digital Cultural Heritage and the new EU Framework Programme
locloud
 
The Europeana meeting under the Romanian Presidency, Exposing Online the Euro...
The Europeana meeting under the Romanian Presidency, Exposing Online the Euro...The Europeana meeting under the Romanian Presidency, Exposing Online the Euro...
The Europeana meeting under the Romanian Presidency, Exposing Online the Euro...
Europeana
 
16,40 16,55 h. open aire eblida-naple conference
16,40 16,55 h. open aire eblida-naple conference16,40 16,55 h. open aire eblida-naple conference
16,40 16,55 h. open aire eblida-naple conference
FESABID
 
National Stakeholders Coordination Group Meeting 1- Invitation and agenda v4.0
National Stakeholders Coordination Group Meeting 1- Invitation and agenda v4.0National Stakeholders Coordination Group Meeting 1- Invitation and agenda v4.0
National Stakeholders Coordination Group Meeting 1- Invitation and agenda v4.0
Michael Hübner
 
LoCloud: Local Content in a Europeana Cloud
LoCloud: Local Content in a Europeana CloudLoCloud: Local Content in a Europeana Cloud
LoCloud: Local Content in a Europeana Cloud
locloud
 
The LoCloud lightweight digital library and alternative content sources, Adam...
The LoCloud lightweight digital library and alternative content sources, Adam...The LoCloud lightweight digital library and alternative content sources, Adam...
The LoCloud lightweight digital library and alternative content sources, Adam...
locloud
 
06 presto4 u
06 presto4 u06 presto4 u
06 presto4 u
Europeana
 
EDF2013: Language Technology Panel, Hans-Ulrich von Freyberg: Language Infras...
EDF2013: Language Technology Panel, Hans-Ulrich von Freyberg: Language Infras...EDF2013: Language Technology Panel, Hans-Ulrich von Freyberg: Language Infras...
EDF2013: Language Technology Panel, Hans-Ulrich von Freyberg: Language Infras...
European Data Forum
 
Europeana Network Association Members Council Meeting, Copenhagen by Max Kaiser
Europeana Network Association Members Council Meeting, Copenhagen by Max KaiserEuropeana Network Association Members Council Meeting, Copenhagen by Max Kaiser
Europeana Network Association Members Council Meeting, Copenhagen by Max Kaiser
Europeana
 
Succeed Introduction - Rafael Carrasco
Succeed Introduction  - Rafael CarrascoSucceed Introduction  - Rafael Carrasco
Succeed Introduction - Rafael Carrasco
IMPACT Centre of Competence
 
Clare Lanigan - Presentation to IES Students
Clare Lanigan - Presentation to IES StudentsClare Lanigan - Presentation to IES Students
Clare Lanigan - Presentation to IES Students
dri_ireland
 
Local content in a Europeana cloud for small & medium content providers
Local content in a Europeana cloud for small & medium content providersLocal content in a Europeana cloud for small & medium content providers
Local content in a Europeana cloud for small & medium content providers
locloud
 
The value of EOSC from a user perspective: Key themes and actions from Day 1
The value of EOSC from a user perspective: Key themes and actions from Day 1The value of EOSC from a user perspective: Key themes and actions from Day 1
The value of EOSC from a user perspective: Key themes and actions from Day 1
EOSCpilot .eu
 
Tenegen P3 Capdm
Tenegen P3 CapdmTenegen P3 Capdm
Tenegen P3 Capdm
ITStudy Ltd.
 
Reducing Infrastructure and Service Fragmentation
Reducing Infrastructure and Service Fragmentation Reducing Infrastructure and Service Fragmentation
Reducing Infrastructure and Service Fragmentation
EOSCpilot .eu
 
OpenAIRE Advance: A trusted eInfrastructure for the EOSC
OpenAIRE Advance: A trusted eInfrastructure for the EOSCOpenAIRE Advance: A trusted eInfrastructure for the EOSC
OpenAIRE Advance: A trusted eInfrastructure for the EOSC
EOSCpilot .eu
 
Web@rchive Austria (Archiving Online Media)
Web@rchive Austria (Archiving Online Media)Web@rchive Austria (Archiving Online Media)
Web@rchive Austria (Archiving Online Media)
Web@rchive Austria
 

What's hot (20)

The once and future library - reimagining the national library as infrastruct...
The once and future library - reimagining the national library as infrastruct...The once and future library - reimagining the national library as infrastruct...
The once and future library - reimagining the national library as infrastruct...
 
EuropeanaTech PGM12
EuropeanaTech PGM12EuropeanaTech PGM12
EuropeanaTech PGM12
 
Project "The Digital City Revives, A Case Study of Web Archaeology"
Project "The Digital City Revives, A Case Study of Web Archaeology"Project "The Digital City Revives, A Case Study of Web Archaeology"
Project "The Digital City Revives, A Case Study of Web Archaeology"
 
Digital Cultural Heritage and the new EU Framework Programme
Digital Cultural Heritage and the new EU Framework ProgrammeDigital Cultural Heritage and the new EU Framework Programme
Digital Cultural Heritage and the new EU Framework Programme
 
The Europeana meeting under the Romanian Presidency, Exposing Online the Euro...
The Europeana meeting under the Romanian Presidency, Exposing Online the Euro...The Europeana meeting under the Romanian Presidency, Exposing Online the Euro...
The Europeana meeting under the Romanian Presidency, Exposing Online the Euro...
 
16,40 16,55 h. open aire eblida-naple conference
16,40 16,55 h. open aire eblida-naple conference16,40 16,55 h. open aire eblida-naple conference
16,40 16,55 h. open aire eblida-naple conference
 
National Stakeholders Coordination Group Meeting 1- Invitation and agenda v4.0
National Stakeholders Coordination Group Meeting 1- Invitation and agenda v4.0National Stakeholders Coordination Group Meeting 1- Invitation and agenda v4.0
National Stakeholders Coordination Group Meeting 1- Invitation and agenda v4.0
 
LoCloud: Local Content in a Europeana Cloud
LoCloud: Local Content in a Europeana CloudLoCloud: Local Content in a Europeana Cloud
LoCloud: Local Content in a Europeana Cloud
 
The LoCloud lightweight digital library and alternative content sources, Adam...
The LoCloud lightweight digital library and alternative content sources, Adam...The LoCloud lightweight digital library and alternative content sources, Adam...
The LoCloud lightweight digital library and alternative content sources, Adam...
 
06 presto4 u
06 presto4 u06 presto4 u
06 presto4 u
 
EDF2013: Language Technology Panel, Hans-Ulrich von Freyberg: Language Infras...
EDF2013: Language Technology Panel, Hans-Ulrich von Freyberg: Language Infras...EDF2013: Language Technology Panel, Hans-Ulrich von Freyberg: Language Infras...
EDF2013: Language Technology Panel, Hans-Ulrich von Freyberg: Language Infras...
 
Europeana Network Association Members Council Meeting, Copenhagen by Max Kaiser
Europeana Network Association Members Council Meeting, Copenhagen by Max KaiserEuropeana Network Association Members Council Meeting, Copenhagen by Max Kaiser
Europeana Network Association Members Council Meeting, Copenhagen by Max Kaiser
 
Succeed Introduction - Rafael Carrasco
Succeed Introduction  - Rafael CarrascoSucceed Introduction  - Rafael Carrasco
Succeed Introduction - Rafael Carrasco
 
Clare Lanigan - Presentation to IES Students
Clare Lanigan - Presentation to IES StudentsClare Lanigan - Presentation to IES Students
Clare Lanigan - Presentation to IES Students
 
Local content in a Europeana cloud for small & medium content providers
Local content in a Europeana cloud for small & medium content providersLocal content in a Europeana cloud for small & medium content providers
Local content in a Europeana cloud for small & medium content providers
 
The value of EOSC from a user perspective: Key themes and actions from Day 1
The value of EOSC from a user perspective: Key themes and actions from Day 1The value of EOSC from a user perspective: Key themes and actions from Day 1
The value of EOSC from a user perspective: Key themes and actions from Day 1
 
Tenegen P3 Capdm
Tenegen P3 CapdmTenegen P3 Capdm
Tenegen P3 Capdm
 
Reducing Infrastructure and Service Fragmentation
Reducing Infrastructure and Service Fragmentation Reducing Infrastructure and Service Fragmentation
Reducing Infrastructure and Service Fragmentation
 
OpenAIRE Advance: A trusted eInfrastructure for the EOSC
OpenAIRE Advance: A trusted eInfrastructure for the EOSCOpenAIRE Advance: A trusted eInfrastructure for the EOSC
OpenAIRE Advance: A trusted eInfrastructure for the EOSC
 
Web@rchive Austria (Archiving Online Media)
Web@rchive Austria (Archiving Online Media)Web@rchive Austria (Archiving Online Media)
Web@rchive Austria (Archiving Online Media)
 

Viewers also liked

Europeana Newspapers in a nutshell
Europeana Newspapers in a nutshellEuropeana Newspapers in a nutshell
Europeana Newspapers in a nutshell
cneudecker
 
Bessere Suchergebnisse durch Named Entity Recognition
Bessere Suchergebnisse durch Named Entity RecognitionBessere Suchergebnisse durch Named Entity Recognition
Bessere Suchergebnisse durch Named Entity Recognitioncneudecker
 
Europeana Newspapers - the Gateway to European Newspapers Online
Europeana Newspapers - the Gateway to European Newspapers OnlineEuropeana Newspapers - the Gateway to European Newspapers Online
Europeana Newspapers - the Gateway to European Newspapers Online
cneudecker
 
Refinement of Digitised Newspapers
Refinement of Digitised NewspapersRefinement of Digitised Newspapers
Refinement of Digitised Newspapers
cneudecker
 
Succeed 2nd hackathon
Succeed 2nd hackathonSucceed 2nd hackathon
Succeed 2nd hackathon
cneudecker
 
Climbing the Tower of Babel: Challenges and Opportunities in Multilingual Dat...
Climbing the Tower of Babel: Challenges and Opportunities in Multilingual Dat...Climbing the Tower of Babel: Challenges and Opportunities in Multilingual Dat...
Climbing the Tower of Babel: Challenges and Opportunities in Multilingual Dat...
cneudecker
 
OCR challenges in historic documents and the contribution of IMPACT
OCR challenges in historic documents and the contribution of IMPACTOCR challenges in historic documents and the contribution of IMPACT
OCR challenges in historic documents and the contribution of IMPACT
cneudecker
 
The IMPACT Interoperability Framework - Workflows for OCR and beyond
The IMPACT Interoperability Framework - Workflows for OCR and beyondThe IMPACT Interoperability Framework - Workflows for OCR and beyond
The IMPACT Interoperability Framework - Workflows for OCR and beyond
cneudecker
 
Collaborative Workflow Development and Experimentation in the Digital Humanities
Collaborative Workflow Development and Experimentation in the Digital HumanitiesCollaborative Workflow Development and Experimentation in the Digital Humanities
Collaborative Workflow Development and Experimentation in the Digital Humanities
cneudecker
 
IMPACT at OCR Summit
IMPACT at OCR SummitIMPACT at OCR Summit
IMPACT at OCR Summit
cneudecker
 
The Elephant in the Library - Integrating Hadoop
The Elephant in the Library - Integrating HadoopThe Elephant in the Library - Integrating Hadoop
The Elephant in the Library - Integrating Hadoop
cneudecker
 
Digitale Kuratierungstechnologien in Bibliotheken
Digitale Kuratierungstechnologien in BibliothekenDigitale Kuratierungstechnologien in Bibliotheken
Digitale Kuratierungstechnologien in Bibliotheken
cneudecker
 
Berliner DH Rundgang
Berliner DH RundgangBerliner DH Rundgang
Berliner DH Rundgang
cneudecker
 
Neudecker who-cares-about-yesterday’s-news-–-use-cases-and-requirements-for-n...
Neudecker who-cares-about-yesterday’s-news-–-use-cases-and-requirements-for-n...Neudecker who-cares-about-yesterday’s-news-–-use-cases-and-requirements-for-n...
Neudecker who-cares-about-yesterday’s-news-–-use-cases-and-requirements-for-n...
cneudecker
 
What is Hadoop?
What is Hadoop?What is Hadoop?
What is Hadoop?
cneudecker
 
Preservation Workflows with Taverna
Preservation Workflows with TavernaPreservation Workflows with Taverna
Preservation Workflows with Taverna
cneudecker
 

Viewers also liked (16)

Europeana Newspapers in a nutshell
Europeana Newspapers in a nutshellEuropeana Newspapers in a nutshell
Europeana Newspapers in a nutshell
 
Bessere Suchergebnisse durch Named Entity Recognition
Bessere Suchergebnisse durch Named Entity RecognitionBessere Suchergebnisse durch Named Entity Recognition
Bessere Suchergebnisse durch Named Entity Recognition
 
Europeana Newspapers - the Gateway to European Newspapers Online
Europeana Newspapers - the Gateway to European Newspapers OnlineEuropeana Newspapers - the Gateway to European Newspapers Online
Europeana Newspapers - the Gateway to European Newspapers Online
 
Refinement of Digitised Newspapers
Refinement of Digitised NewspapersRefinement of Digitised Newspapers
Refinement of Digitised Newspapers
 
Succeed 2nd hackathon
Succeed 2nd hackathonSucceed 2nd hackathon
Succeed 2nd hackathon
 
Climbing the Tower of Babel: Challenges and Opportunities in Multilingual Dat...
Climbing the Tower of Babel: Challenges and Opportunities in Multilingual Dat...Climbing the Tower of Babel: Challenges and Opportunities in Multilingual Dat...
Climbing the Tower of Babel: Challenges and Opportunities in Multilingual Dat...
 
OCR challenges in historic documents and the contribution of IMPACT
OCR challenges in historic documents and the contribution of IMPACTOCR challenges in historic documents and the contribution of IMPACT
OCR challenges in historic documents and the contribution of IMPACT
 
The IMPACT Interoperability Framework - Workflows for OCR and beyond
The IMPACT Interoperability Framework - Workflows for OCR and beyondThe IMPACT Interoperability Framework - Workflows for OCR and beyond
The IMPACT Interoperability Framework - Workflows for OCR and beyond
 
Collaborative Workflow Development and Experimentation in the Digital Humanities
Collaborative Workflow Development and Experimentation in the Digital HumanitiesCollaborative Workflow Development and Experimentation in the Digital Humanities
Collaborative Workflow Development and Experimentation in the Digital Humanities
 
IMPACT at OCR Summit
IMPACT at OCR SummitIMPACT at OCR Summit
IMPACT at OCR Summit
 
The Elephant in the Library - Integrating Hadoop
The Elephant in the Library - Integrating HadoopThe Elephant in the Library - Integrating Hadoop
The Elephant in the Library - Integrating Hadoop
 
Digitale Kuratierungstechnologien in Bibliotheken
Digitale Kuratierungstechnologien in BibliothekenDigitale Kuratierungstechnologien in Bibliotheken
Digitale Kuratierungstechnologien in Bibliotheken
 
Berliner DH Rundgang
Berliner DH RundgangBerliner DH Rundgang
Berliner DH Rundgang
 
Neudecker who-cares-about-yesterday’s-news-–-use-cases-and-requirements-for-n...
Neudecker who-cares-about-yesterday’s-news-–-use-cases-and-requirements-for-n...Neudecker who-cares-about-yesterday’s-news-–-use-cases-and-requirements-for-n...
Neudecker who-cares-about-yesterday’s-news-–-use-cases-and-requirements-for-n...
 
What is Hadoop?
What is Hadoop?What is Hadoop?
What is Hadoop?
 
Preservation Workflows with Taverna
Preservation Workflows with TavernaPreservation Workflows with Taverna
Preservation Workflows with Taverna
 

Similar to IMPACT HPC Cloud Day

An Experimental Workflow Development Platform for Historical Document Digitis...
An Experimental Workflow Development Platform for Historical Document Digitis...An Experimental Workflow Development Platform for Historical Document Digitis...
An Experimental Workflow Development Platform for Historical Document Digitis...
cneudecker
 
IMPACT Demo Dag at KB
IMPACT Demo Dag at KBIMPACT Demo Dag at KB
IMPACT Demo Dag at KB
cneudecker
 
Computer Lexica in OCR and Retrieval
Computer Lexica in OCR and RetrievalComputer Lexica in OCR and Retrieval
Computer Lexica in OCR and Retrieval
Biblioteca Nacional de España
 
IMPACT Interoperability and Evaluation Framework. Clemens Neudecker
IMPACT Interoperability and Evaluation Framework. Clemens NeudeckerIMPACT Interoperability and Evaluation Framework. Clemens Neudecker
IMPACT Interoperability and Evaluation Framework. Clemens Neudecker
Biblioteca Nacional de España
 
The Improving Access to Text (IMPACT) project and other European initiatives
The Improving Access to Text (IMPACT) project and other European initiativesThe Improving Access to Text (IMPACT) project and other European initiatives
The Improving Access to Text (IMPACT) project and other European initiatives
Michael Day
 
Ecloud copenhagen-130625074823-phpapp01
Ecloud copenhagen-130625074823-phpapp01Ecloud copenhagen-130625074823-phpapp01
Ecloud copenhagen-130625074823-phpapp01
The European Library
 
Europeana Cloud Work Package 1: Assessing Researchers' Needs in the Cloud
Europeana Cloud Work Package 1: Assessing Researchers' Needs in the CloudEuropeana Cloud Work Package 1: Assessing Researchers' Needs in the Cloud
Europeana Cloud Work Package 1: Assessing Researchers' Needs in the Cloud
TU Delft, Netherlands
 
Europeana Newspapers LFT Infoday Muehlberger
Europeana Newspapers LFT Infoday MuehlbergerEuropeana Newspapers LFT Infoday Muehlberger
Europeana Newspapers LFT Infoday Muehlberger
Europeana Newspapers
 
IMPACT Final Conference - Muehlberger - FEP
IMPACT Final Conference - Muehlberger - FEPIMPACT Final Conference - Muehlberger - FEP
IMPACT Final Conference - Muehlberger - FEP
IMPACT Centre of Competence
 
Targeted Language Resources for the Digitisation of Historical Collections
Targeted Language Resources for the Digitisation of Historical CollectionsTargeted Language Resources for the Digitisation of Historical Collections
Targeted Language Resources for the Digitisation of Historical Collections
Emma Huber
 
IMPACT OCR in a nutshell. Clemens Neudecker
IMPACT OCR in a nutshell. Clemens NeudeckerIMPACT OCR in a nutshell. Clemens Neudecker
IMPACT OCR in a nutshell. Clemens Neudecker
Biblioteca Nacional de España
 
Bratislava WS - Schlarb - ONB - technical tools_pdf
Bratislava WS - Schlarb - ONB - technical tools_pdfBratislava WS - Schlarb - ONB - technical tools_pdf
Bratislava WS - Schlarb - ONB - technical tools_pdf
IMPACT Centre of Competence
 
Alastair Dunning, Europeana Cloud: The Project and the Challenges of Assessin...
Alastair Dunning, Europeana Cloud: The Project and the Challenges of Assessin...Alastair Dunning, Europeana Cloud: The Project and the Challenges of Assessin...
Alastair Dunning, Europeana Cloud: The Project and the Challenges of Assessin...
The European Library
 
Science Demonstrator Session: Social and Earth Sciences
Science Demonstrator Session: Social and Earth SciencesScience Demonstrator Session: Social and Earth Sciences
Science Demonstrator Session: Social and Earth Sciences
EOSCpilot .eu
 
ECLAP White paper, social network for Cultural Heritage on Peforming arts
ECLAP White paper, social network for Cultural Heritage on Peforming artsECLAP White paper, social network for Cultural Heritage on Peforming arts
ECLAP White paper, social network for Cultural Heritage on Peforming arts
Paolo Nesi
 
Europeana Cloud - Alastair Dunning - November 2013
Europeana Cloud - Alastair Dunning - November 2013Europeana Cloud - Alastair Dunning - November 2013
Europeana Cloud - Alastair Dunning - November 2013
Europeana
 
BL Demo Day - July2011 - (1) Introduction to IMPACT
BL Demo Day - July2011 - (1) Introduction to IMPACTBL Demo Day - July2011 - (1) Introduction to IMPACT
BL Demo Day - July2011 - (1) Introduction to IMPACT
IMPACT Centre of Competence
 
Benefits of collaborative EU digitization projects
Benefits of collaborative EU digitization projectsBenefits of collaborative EU digitization projects
Benefits of collaborative EU digitization projects
Trilce Navarrete
 
Science Demonstrator Session: Physics and Astrophysics
Science Demonstrator Session: Physics and AstrophysicsScience Demonstrator Session: Physics and Astrophysics
Science Demonstrator Session: Physics and Astrophysics
EOSCpilot .eu
 
Presentation of Clemens Neudecker, BnF Information Day
Presentation of Clemens Neudecker, BnF Information DayPresentation of Clemens Neudecker, BnF Information Day
Presentation of Clemens Neudecker, BnF Information Day
Europeana Newspapers
 

Similar to IMPACT HPC Cloud Day (20)

An Experimental Workflow Development Platform for Historical Document Digitis...
An Experimental Workflow Development Platform for Historical Document Digitis...An Experimental Workflow Development Platform for Historical Document Digitis...
An Experimental Workflow Development Platform for Historical Document Digitis...
 
IMPACT Demo Dag at KB
IMPACT Demo Dag at KBIMPACT Demo Dag at KB
IMPACT Demo Dag at KB
 
Computer Lexica in OCR and Retrieval
Computer Lexica in OCR and RetrievalComputer Lexica in OCR and Retrieval
Computer Lexica in OCR and Retrieval
 
IMPACT Interoperability and Evaluation Framework. Clemens Neudecker
IMPACT Interoperability and Evaluation Framework. Clemens NeudeckerIMPACT Interoperability and Evaluation Framework. Clemens Neudecker
IMPACT Interoperability and Evaluation Framework. Clemens Neudecker
 
The Improving Access to Text (IMPACT) project and other European initiatives
The Improving Access to Text (IMPACT) project and other European initiativesThe Improving Access to Text (IMPACT) project and other European initiatives
The Improving Access to Text (IMPACT) project and other European initiatives
 
Ecloud copenhagen-130625074823-phpapp01
Ecloud copenhagen-130625074823-phpapp01Ecloud copenhagen-130625074823-phpapp01
Ecloud copenhagen-130625074823-phpapp01
 
Europeana Cloud Work Package 1: Assessing Researchers' Needs in the Cloud
Europeana Cloud Work Package 1: Assessing Researchers' Needs in the CloudEuropeana Cloud Work Package 1: Assessing Researchers' Needs in the Cloud
Europeana Cloud Work Package 1: Assessing Researchers' Needs in the Cloud
 
Europeana Newspapers LFT Infoday Muehlberger
Europeana Newspapers LFT Infoday MuehlbergerEuropeana Newspapers LFT Infoday Muehlberger
Europeana Newspapers LFT Infoday Muehlberger
 
IMPACT Final Conference - Muehlberger - FEP
IMPACT Final Conference - Muehlberger - FEPIMPACT Final Conference - Muehlberger - FEP
IMPACT Final Conference - Muehlberger - FEP
 
Targeted Language Resources for the Digitisation of Historical Collections
Targeted Language Resources for the Digitisation of Historical CollectionsTargeted Language Resources for the Digitisation of Historical Collections
Targeted Language Resources for the Digitisation of Historical Collections
 
IMPACT OCR in a nutshell. Clemens Neudecker
IMPACT OCR in a nutshell. Clemens NeudeckerIMPACT OCR in a nutshell. Clemens Neudecker
IMPACT OCR in a nutshell. Clemens Neudecker
 
Bratislava WS - Schlarb - ONB - technical tools_pdf
Bratislava WS - Schlarb - ONB - technical tools_pdfBratislava WS - Schlarb - ONB - technical tools_pdf
Bratislava WS - Schlarb - ONB - technical tools_pdf
 
Alastair Dunning, Europeana Cloud: The Project and the Challenges of Assessin...
Alastair Dunning, Europeana Cloud: The Project and the Challenges of Assessin...Alastair Dunning, Europeana Cloud: The Project and the Challenges of Assessin...
Alastair Dunning, Europeana Cloud: The Project and the Challenges of Assessin...
 
Science Demonstrator Session: Social and Earth Sciences
Science Demonstrator Session: Social and Earth SciencesScience Demonstrator Session: Social and Earth Sciences
Science Demonstrator Session: Social and Earth Sciences
 
ECLAP White paper, social network for Cultural Heritage on Peforming arts
ECLAP White paper, social network for Cultural Heritage on Peforming artsECLAP White paper, social network for Cultural Heritage on Peforming arts
ECLAP White paper, social network for Cultural Heritage on Peforming arts
 
Europeana Cloud - Alastair Dunning - November 2013
Europeana Cloud - Alastair Dunning - November 2013Europeana Cloud - Alastair Dunning - November 2013
Europeana Cloud - Alastair Dunning - November 2013
 
BL Demo Day - July2011 - (1) Introduction to IMPACT
BL Demo Day - July2011 - (1) Introduction to IMPACTBL Demo Day - July2011 - (1) Introduction to IMPACT
BL Demo Day - July2011 - (1) Introduction to IMPACT
 
Benefits of collaborative EU digitization projects
Benefits of collaborative EU digitization projectsBenefits of collaborative EU digitization projects
Benefits of collaborative EU digitization projects
 
Science Demonstrator Session: Physics and Astrophysics
Science Demonstrator Session: Physics and AstrophysicsScience Demonstrator Session: Physics and Astrophysics
Science Demonstrator Session: Physics and Astrophysics
 
Presentation of Clemens Neudecker, BnF Information Day
Presentation of Clemens Neudecker, BnF Information DayPresentation of Clemens Neudecker, BnF Information Day
Presentation of Clemens Neudecker, BnF Information Day
 





IMPACT HPC Cloud Day

  • 1. A sustainable infrastructure for large scale document image analysis
       HPC Cloud day – 4 October
       IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands.
  • 2. Background
       IMPACT – Improving Access to Text (2008 – 2011)
       Large-scale integrating research project, funded by the EC
       - Consortium of 26 partners
       - Coordinated by the National Library of the Netherlands (KB)
       - EU funding: € 12 100 000 (FP7 ICT Work Programme)
       - From 2012: sustainable Centre of Competence with alternative resources
       Main objectives:
       - Innovate OCR technology
       - Capacity building in mass-digitisation
  • 3. OCR: A multitude of challenges…
       VVt Venetien den 1.Junij, Anno 1618.
       DJgn i f paffato te S' aö'Jifeert mo?üen/bah .)etgi'uotbciraetail)i.r/JtmelchontDecht te /
       sbnbe bele btr felbrr geiufttceert baer bnber eeniglje jprant o^fen/bie ftcb .met
       beSpaenfcbeu enbeeemgljen bifet Cbeiiupcen berbonbru befe
  • 4. OCR: A multitude of challenges…
       I. OCR challenges (gothic fonts, bleed-through, warping, etc.)
  • 5. OCR: A multitude of challenges…
       II. Language challenges (spelling variants, inflection, and many more!)
       Example: historical variants of the Dutch word ‘wereld’ (world):
       werelt weerelt wereld weerelds wereldt werelden weereld werrelts waerelds weerlyt wereldts vveerelts waereld weerelden waerelden weerlt werlt werelds sweerels zwerlys swarels swerelts werelts swerrels weirelts tsweerelds werret vverelt werlts werrelt worreld werlden wareld weirelt weireld waerelt werreld werld vvereld weerelts werlde tswerels werreldts weereldt wereldje waereldje weurlt wald weëled
  • 6. IMPACT Solutions
       - From a technical perspective: more than 20 software toolkits for solving different problems
       - Such as: OCR (C++, C#), Image Processing & Lexica (DLL), Command Line Tools (Win/Linux), Java, Ruby, PHP, Perl, etc.
       - IMPACT Interoperability Framework (IIF)
  • 7. Architecture
       IMPACT Interoperability Framework: Technologies
       - Java 6
       - Generic Web Service Wrapper
       - Apache Maven
       - Apache Tomcat
       - Apache Axis2
       - Apache Synapse
       - Taverna Workflow Engine
       IMPACT Interoperability Framework: Dataset
       - PHP/MySQL database, frontend for search
       - approx. 5 TB raw data (images, text files, metadata) and growing
  • 8. How does it work?
       1. Digitisation/OCR challenges are registered and tagged in the database
          - Warped text
       2. The database contains a 99.95% correct result: the “ground truth”
  • 9. How does it work?
       3. Researcher develops new method to tackle a problem
       4. Research prototype is wrapped to a SOAP web service
  • 10. How does it work?
       5. Web service is integrated as a workflow module
       6. Workflow module can be evaluated, based on the ground truth
  • 11. Current setup
       - Enterprise Service Bus receives requests from users and distributes the load to the available worker nodes (worker node = server with all services installed)
       - Main effect: process parallelization, load distribution, failover
       - Drawback: data is sent to worker nodes all around Europe, so a huge amount of data needs to be sent over the net!
  • 12. Proposed setup
       Set up worker nodes on the HPC cloud (same location)
       Advantage:
       - Improve speed and availability for concurrent users
       - Remove constraints for large-scale processing
  • 13. Benefits
       - Scalable platform
       - Availability of resources to a large number of users
       - Enable research into scalable computing
       - Consolidation of support and maintenance
       - Various interfaces (web/local)
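
The "How does it work?" slides end with a workflow module being evaluated against the ground truth. The slides do not say which metric IMPACT used, so the following is only a minimal sketch under one common assumption: character accuracy derived from the Levenshtein edit distance between the OCR output and the ground-truth text. The function names `levenshtein` and `character_accuracy` are hypothetical and not part of the IMPACT Interoperability Framework.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def character_accuracy(ocr_text: str, ground_truth: str) -> float:
    """Share of ground-truth characters recovered, clamped to [0.0, 1.0].

    Assumed metric for illustration only; the slides do not name one.
    """
    if not ground_truth:
        return 1.0 if not ocr_text else 0.0
    errors = levenshtein(ocr_text, ground_truth)
    return max(0.0, 1.0 - errors / len(ground_truth))


if __name__ == "__main__":
    # One substitution in a six-character word.
    print(f"{character_accuracy('werelt', 'wereld'):.2%}")  # prints 83.33%
```

Under this assumed metric, a module that reaches the ground truth's own 99.95% level would make at most about one character error per 2,000 characters.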

Editor's Notes

  1. DDD: Newspaper title: Courante uyt Italien, Duytslandt, &c. Date, edition: 14-06-1618, day edition. Publisher: Caspar van Hilten. Place of publication: Amsterdam. Actual selection from online database.