The IMPACT Interoperability Framework - Workflows for OCR and beyond
Better, faster, cheaper. Solutions of the IMPACT Centre of Competence and future challenges, The British Library, 24-25 October 2011, London, United Kingdom.

Published in: Technology

  1. The IMPACT Interoperability Framework: Workflows for OCR and beyond
     Clemens Neudecker, KB National Library of the Netherlands
     2nd IMPACT Conference, British Library, London, 24/25 October 2011
  2. Background
     • More than 20 individual software components for specific challenges
     • Prototyping new algorithms, improving commercial solutions
     • Different frameworks (C, C++, Java, etc.) and platforms (Windows/Linux)
     • Extensible with third-party applications
     • IMPACT Interoperability Framework (IIF)
  3. Architecture
     • Java
     • Web Services
     • Apache
     • Taverna
     Open source, available at https://github.com/impactcentre
     Free hackathon, 14/15 November, University of Manchester: http://impact-mygrid-taverna-hackathon.wikispaces.com/
  4. Integration
     • Only requirement: a command line executable
     • A generic command line wrapper produces a web service
     • The web service is exposed as a workflow module with documentation
     • Quick and easy integration: developers can focus on their application and worry less about integration, which leads to higher-quality software
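The wrapping idea on this slide can be illustrated with a minimal Python sketch. It is not the IIF wrapper itself: the function name `wrap_command` is hypothetical, the tool is assumed to read from stdin and write to stdout, and the web service layer (SOAP/REST) that would sit on top of the callable is left out.

```python
import subprocess
import sys
from typing import Callable, Sequence


def wrap_command(argv: Sequence[str], timeout: float = 60.0) -> Callable[[str], str]:
    """Turn a command line tool into a callable service operation.

    Assumption for this sketch: the tool reads its input from stdin
    and writes its result to stdout.
    """
    def operation(payload: str) -> str:
        completed = subprocess.run(
            list(argv),
            input=payload,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
        return completed.stdout

    return operation


# Stand-in for a real processing tool: a tiny script that upper-cases its input.
fake_tool = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read().upper())"]
to_upper = wrap_command(fake_tool)
result = to_upper("impact")
```

Because the wrapper only needs an argument list, the same function could in principle front any executable, which is the property the slide highlights.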
  5. Workflows
     • OCR workflow = data pipeline
     • Building blocks = processing modules (nodes)
     • Integration = interaction between nodes (mashups)
     • Collaboration with
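The "workflow = data pipeline" view can be sketched as plain function composition. This is only an illustration, not how Taverna represents workflows; the names `build_workflow`, `dehyphenate`, and `normalize_whitespace` are hypothetical stand-ins for real processing modules.

```python
from functools import reduce
from typing import Callable, Iterable

# A node is any processing module that takes text and returns text.
Node = Callable[[str], str]


def build_workflow(nodes: Iterable[Node]) -> Node:
    """Chain nodes so that each node's output feeds the next node's input."""
    steps = list(nodes)

    def run(data: str) -> str:
        return reduce(lambda value, node: node(value), steps, data)

    return run


def dehyphenate(text: str) -> str:
    """Join words that were split by a hyphen at a line end."""
    return text.replace("-\n", "")


def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces."""
    return " ".join(text.split())


workflow = build_workflow([dehyphenate, normalize_whitespace])
cleaned = workflow("inter-\noperability   framework")
```

Swapping, adding, or removing a node changes the workflow without touching the other nodes, which is what makes the building-block view attractive.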
  6. Evaluation features
     • Text comparison of the result with ground truth, using the Levenshtein distance method
     • Word evaluation (with reading order)
     • Layout-based comparison of the result with ground truth, using the Page Analysis and Ground Truth Elements framework
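As a rough illustration of the text comparison step, the sketch below computes the Levenshtein distance between an OCR result and its ground truth and derives a character error rate from it. This is the textbook dynamic-programming formulation, not the IMPACT evaluation tool; the function names and the sample strings are assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(ocr_text: str, ground_truth: str) -> float:
    """Edit distance normalized by the length of the ground truth."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ocr_text, ground_truth) / len(ground_truth)


# Hypothetical OCR output with two misrecognized characters.
distance = levenshtein("Tbe quick hrown fox", "The quick brown fox")
rate = character_error_rate("Tbe quick hrown fox", "The quick brown fox")
```

Here two characters differ across 19 ground-truth characters, so the error rate is 2/19, or roughly 10.5%.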
  7. Community
     • Web 2.0 style workflow registry
     • Ready-to-use and documented resources
     • Community of experts
     • Sharing of experiments and know-how
  8. Local client: Taverna Workbench
     • Background: biosciences
     • Developed and maintained by myGrid, UK
     • Open source
     • GUI for design and execution of web services and workflows
  9. Remote client: Portal
     • SOAP/REST API
     • Remote execution of web services and workflows
  10. Results Repository
     Custom service for IMPACT:
     • Automatic storage of workflow outputs and provenance via WebDAV
     • Fully interoperable, since HTTP-based
     • Configurable storage of result sets
     • Create reports using POI
  11. Scalability
     • A central ESB proxy manages multiple service copies
     • Process parallelization, load distribution, failover, security
     • Served more than 2M requests
     • Throughput improvements of 94% with every additional instance
     • Tested on the Dutch Supercomputing Cloud ("Enlighten Your Research")
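Load distribution with failover across service copies can be sketched as a round-robin dispatcher. This is a simplified illustration, not the ESB proxy used in IMPACT: the class name `RoundRobinProxy` is hypothetical, and service copies are modeled as plain Python callables rather than remote endpoints.

```python
from typing import Callable, List, Optional, Sequence

Service = Callable[[str], str]


class RoundRobinProxy:
    """Distribute requests over service copies and skip copies that fail."""

    def __init__(self, instances: Sequence[Service]) -> None:
        if not instances:
            raise ValueError("at least one service instance is required")
        self._instances: List[Service] = list(instances)
        self._next = 0  # index of the instance to try first

    def call(self, request: str) -> str:
        last_error: Optional[Exception] = None
        count = len(self._instances)
        for attempt in range(count):
            index = (self._next + attempt) % count
            try:
                result = self._instances[index](request)
            except Exception as error:  # fail over to the next copy
                last_error = error
                continue
            self._next = (index + 1) % count
            return result
        raise RuntimeError("all service instances failed") from last_error


def copy_a(request: str) -> str:
    return "a:" + request


def broken_copy(request: str) -> str:
    raise ConnectionError("instance unavailable")


def copy_b(request: str) -> str:
    return "b:" + request


proxy = RoundRobinProxy([copy_a, broken_copy, copy_b])
responses = [proxy.call(page) for page in ("p1", "p2", "p3")]
```

In this run the second request is first routed to the broken copy and then falls through to the next healthy one, so callers never see the failure unless every copy is down.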
  12. Outlook
     • Online service for testing/evaluation
     • Specification and guidelines
     • Extending the scope:
       Workflows for linguistic analysis: CLARIN
       Workflows for preservation: SCAPE
     • Even better scalability: Map/Reduce
     • Supported by a community of developers and practitioners
  13. xkcd.com/688
     "Anyway, the thing about progress is that it always seems greater than it really is."
     Ludwig Wittgenstein, Philosophical Investigations (quoting Johann Nestroy)
