IMPACT Final Conference - Clemens Neudecker

The IMPACT Interoperability Framework: Workflows for OCR and beyond. Clemens Neudecker, KB National Library of the Netherlands. 2nd IMPACT Conference, British Library, London, 24/25 October 2011
Background <ul><li>More than 20 individual software components for specific challenges </li></ul><ul><li>Prototyping new algorithms, improving commercial solutions </li></ul><ul><li>Different programming languages (C, C++, Java, etc.) and platforms (Windows/Linux) </li></ul><ul><li>Extensible with third-party applications </li></ul><ul><li>IMPACT Interoperability Framework (IIF) </li></ul>
Architecture <ul><li>Java </li></ul><ul><li>Web Services </li></ul><ul><li>Apache </li></ul><ul><li>Taverna </li></ul><ul><li>Open source, available at https://github.com/impactcentre </li></ul><ul><li>Free hackathon, 14/15 November, University of Manchester </li></ul><ul><li>http://impact-mygrid-taverna-hackathon.wikispaces.com/ </li></ul>
Integration <ul><li>Only requirement: a command-line executable </li></ul><ul><li>A generic command-line wrapper turns it into a web service </li></ul><ul><li>The web service is exposed as a documented workflow module </li></ul><ul><li>Quick & easy integration: developers can focus on their application and worry less about integration, resulting in higher-quality software </li></ul>
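The slide does not show how the generic wrapper works internally. The sketch below is minimal and rests on one assumption: that each tool is described by an argument template with named placeholders. Under that assumption, the core of such a wrapper fills in the caller's parameters and runs the executable. `CommandLineTool`, `template`, `build_argv` and `invoke` are hypothetical names, not the IIF's API. The web-service layer that would expose `invoke` as an operation is left out.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CommandLineTool:
    """Hypothetical description of a command-line executable.

    `template` is an argv list; placeholders such as "{text}" are filled
    from the parameters of an incoming request.
    """
    name: str
    template: list[str]

    def build_argv(self, params: dict[str, str]) -> list[str]:
        # Substitute every placeholder; a missing parameter raises KeyError.
        return [part.format(**params) for part in self.template]

    def invoke(self, params: dict[str, str]) -> dict:
        # Run the executable and package the outcome the way a service
        # operation might return it to a caller.
        completed = subprocess.run(
            self.build_argv(params), capture_output=True, text=True, check=False
        )
        return {
            "tool": self.name,
            "exit_code": completed.returncode,
            "stdout": completed.stdout,
            "stderr": completed.stderr,
        }


# Example: wrap a tiny Python one-liner as if it were an OCR tool.
upper = CommandLineTool(
    name="uppercase",
    template=[sys.executable, "-c", "import sys; print(sys.argv[1].upper())", "{text}"],
)
result = upper.invoke({"text": "impact"})
print(result["stdout"].strip())
```

In this sketch a tool is described only by data. Integrating another command-line program therefore means writing a new template rather than new service code, which matches the ease of integration the slide claims.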
Workflows <ul><li>OCR workflow = data pipeline </li></ul><ul><li>Building blocks = processing modules (nodes) </li></ul><ul><li>Integration = interaction between nodes (mashups) </li></ul><ul><li>Collaboration with </li></ul>
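The pipeline idea can be sketched in a few lines, assuming each node is a function that takes the upstream data and returns data for the next node. The three OCR steps below are hypothetical placeholders that only record their names, and the file name is invented. In the IIF each node would be a wrapped web service. Taverna workflows are also graphs rather than the strictly linear chain shown here.

```python
from functools import reduce
from typing import Callable

# A node takes the data produced upstream and returns data for the next node.
Node = Callable[[dict], dict]


def binarize(doc: dict) -> dict:
    # Hypothetical stand-in: a real node would call an image-processing service.
    return {**doc, "steps": doc["steps"] + ["binarize"]}


def segment(doc: dict) -> dict:
    # Hypothetical stand-in for a layout-segmentation service.
    return {**doc, "steps": doc["steps"] + ["segment"]}


def recognize(doc: dict) -> dict:
    # Hypothetical stand-in for an OCR engine; the text is a placeholder.
    return {**doc, "steps": doc["steps"] + ["recognize"], "text": "recognized text"}


def run_pipeline(nodes: list[Node], doc: dict) -> dict:
    # Feed each node's output into the next one, left to right.
    return reduce(lambda data, node: node(data), nodes, doc)


ocr_workflow = [binarize, segment, recognize]
result = run_pipeline(ocr_workflow, {"image": "page_001.tif", "steps": []})
print(result["steps"])
```

Swapping one processing module for another only changes the list of nodes. None of the other nodes are touched.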
Evaluation features <ul><li>Text comparison of the result with ground truth, using the Levenshtein distance </li></ul><ul><li>Word evaluation (with reading order) </li></ul><ul><li>Layout-based comparison of the result with ground truth, using the Page Analysis and Ground-truth Elements (PAGE) framework </li></ul>
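The text comparison can be illustrated with a standard dynamic-programming Levenshtein distance, normalized by the ground-truth length to give an error rate. This is a generic sketch, not the IMPACT evaluation tool. The names `levenshtein` and `error_rate` and the sample strings are illustrative assumptions. Applying the same function to word lists instead of strings gives an order-sensitive word-level rate, which only approximates the slide's word evaluation. The PAGE-based layout comparison is not covered.

```python
def levenshtein(a, b) -> int:
    """Edit distance between two sequences (strings or lists of words)."""
    # previous[j] holds the distance between the prefix of `a` seen so far
    # (minus the current item) and b[:j].
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (item_a != item_b),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


def error_rate(result, ground_truth) -> float:
    # Normalize the edit distance by the length of the ground truth.
    if len(ground_truth) == 0:
        raise ValueError("ground truth must not be empty")
    return levenshtein(result, ground_truth) / len(ground_truth)


ground_truth = "the quick brown fox"
ocr_output = "tbe quick brown f0x"  # two character substitutions

cer = error_rate(ocr_output, ground_truth)                  # character level: 2 / 19
wer = error_rate(ocr_output.split(), ground_truth.split())  # word level: 2 / 4
print(f"CER={cer:.3f} WER={wer:.2f}")
```

Because the rate is normalized by the ground-truth length, it can exceed 1 when the OCR output contains many insertions.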
Community <ul><li>Web 2.0-style workflow registry </li></ul><ul><li>Ready-to-use, documented resources </li></ul><ul><li>Community of experts </li></ul><ul><li>Sharing of experiments and know-how </li></ul>
Local client: Taverna Workbench <ul><li>Background: biosciences </li></ul><ul><li>Developed and maintained by myGrid, UK </li></ul><ul><li>Open source </li></ul><ul><li>GUI for design and execution of web services & workflows </li></ul>
Remote client: Portal <ul><li>SOAP/REST API </li></ul><ul><li>Remote execution of web services & workflows </li></ul>
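The function below sketches what a REST call to such a portal might look like. It prepares, but does not send, a request asking for a workflow run. The base URL, the `/workflows/<id>/runs` path, the JSON body and the name `build_run_request` are all hypothetical. The slide does not document the portal's actual endpoints.

```python
import json
import urllib.request


def build_run_request(base_url: str, workflow_id: str, inputs: dict) -> urllib.request.Request:
    """Prepare a POST asking a (hypothetical) portal to execute a workflow.

    The URL layout and body shape are assumptions for illustration only.
    """
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/workflows/{workflow_id}/runs",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


req = build_run_request("https://portal.example.org/api", "ocr-basic", {"image": "page_001.tif"})
print(req.get_method(), req.full_url)
# Sending it against a real server would be: urllib.request.urlopen(req)
```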
Results Repository <ul><li>Custom service for IMPACT: automatic storage of workflow outputs and provenance via WebDAV </li></ul><ul><li>Fully interoperable, since HTTP-based </li></ul><ul><li>Configurable storage of result sets </li></ul><ul><li>Report creation using Apache POI </li></ul>
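WebDAV is an extension of HTTP, so storing a result needs nothing beyond ordinary HTTP requests. `MKCOL` creates a collection and `PUT` uploads a file into it. The sketch below only builds the two requests. The server URL, file name and the helper name `webdav_requests` are hypothetical, and a real repository would also decide how provenance files are laid out.

```python
import urllib.request


def webdav_requests(collection_url: str, filename: str, content: bytes) -> list[urllib.request.Request]:
    """Build the two WebDAV calls needed to store one workflow output.

    MKCOL creates the collection (a folder); PUT uploads the file into it.
    """
    base = collection_url.rstrip("/") + "/"
    mkcol = urllib.request.Request(base, method="MKCOL")
    put = urllib.request.Request(
        base + filename,
        data=content,
        headers={"Content-Type": "application/octet-stream"},
        method="PUT",
    )
    return [mkcol, put]


reqs = webdav_requests("https://results.example.org/dav/run-42", "output.txt", b"recognized text")
for r in reqs:
    print(r.get_method(), r.full_url)
# Each request would be sent with urllib.request.urlopen(r).
```

Per the WebDAV specification, `MKCOL` on a URL that is already mapped fails with 405 (Method Not Allowed). A client storing many files per run would therefore create each collection once.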
Scalability <ul><li>Central ESB proxy manages multiple service copies </li></ul><ul><li>Process parallelization, load distribution, failover, security </li></ul><ul><li>Served more than 2 million requests </li></ul><ul><li>Throughput improvements of 94% with every additional instance </li></ul><ul><li>Tested on the Dutch supercomputing cloud (“Enlighten Your Research”) </li></ul>
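The load distribution and failover attributed to the ESB proxy can be sketched as a round-robin dispatcher that skips failing instances. This is a hypothetical illustration: in-process callables stand in for service copies, and `ServiceProxy` is an invented name. It does not reflect the actual ESB product or its configuration.

```python
import itertools
from typing import Callable


class ServiceProxy:
    """Round-robin dispatch over several copies of one service, with failover."""

    def __init__(self, instances: list[Callable[[str], str]]):
        if not instances:
            raise ValueError("at least one service instance is required")
        self.instances = instances
        self._next = itertools.cycle(range(len(instances)))

    def call(self, request: str) -> str:
        errors = []
        # Try each instance at most once, continuing the rotation.
        for _ in range(len(self.instances)):
            index = next(self._next)
            try:
                return self.instances[index](request)
            except Exception as exc:  # fail over to the next copy
                errors.append(exc)
        raise RuntimeError(f"all {len(self.instances)} instances failed: {errors}")


def healthy(name: str) -> Callable[[str], str]:
    # A working service copy that reports which instance served the request.
    return lambda req: f"{name} handled {req}"


def broken(req: str) -> str:
    # A service copy that is down.
    raise ConnectionError("instance down")


proxy = ServiceProxy([healthy("A"), broken, healthy("C")])
outputs = [proxy.call(f"r{i}") for i in range(4)]
print(outputs)
```

Requests alternate between the working copies, and whenever the rotation reaches the broken copy the request falls through to the next one.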
Outlook <ul><li>Online service for testing/evaluation </li></ul><ul><li>Specification & guidelines </li></ul><ul><li>Extending the scope: workflows for linguistic analysis (CLARIN) and workflows for preservation (SCAPE) </li></ul><ul><li>Even better scalability: Map/Reduce </li></ul><ul><li>Supported by a community of developers & practitioners </li></ul>
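Map/Reduce is only named here as a future direction. The sketch below shows the pattern on OCR output, assuming the job is counting words across pages. The map step processes each page independently, and the reduce step merges the partial results. The built-in `map` runs serially, whereas a framework such as Hadoop would distribute the map calls across machines. The page texts and function names are invented examples.

```python
from collections import Counter
from functools import reduce


def map_page(page_text: str) -> Counter:
    # Map step: each OCR page is processed independently (parallelizable).
    return Counter(page_text.lower().split())


def reduce_counts(left: Counter, right: Counter) -> Counter:
    # Reduce step: partial results are merged; Counter addition sums counts.
    return left + right


pages = ["The IMPACT project", "the OCR workflow", "The end"]
totals = reduce(reduce_counts, map(map_page, pages), Counter())
print(totals.most_common(1))
```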
<ul><li>“Anyway, the thing about progress is that it always seems greater than it really is.” </li></ul><ul><ul><ul><li>Ludwig Wittgenstein, Philosophical Investigations (quoting Johann Nestroy) </li></ul></ul></ul> xkcd.com/688