IMPACT Final Conference - Clemens Neudecker


  1. The IMPACT Interoperability Framework: Workflows for OCR and beyond
     Clemens Neudecker, KB National Library of the Netherlands
     2nd IMPACT Conference, British Library, London, 24/25 October 2011
  2. Background
     • More than 20 individual software components for specific challenges
     • Prototyping new algorithms, improving commercial solutions
     • Different frameworks (C, C++, Java, etc.), platforms (Win/Linux)
     • Extensible with 3rd-party applications
     • IMPACT Interoperability Framework (IIF)
  3. Architecture
     • Java
     • Web Services
     • Apache
     • Taverna
     • Open source, available on https://github.com/impactcentre
     • Free Hackathon 14/15 November, University of Manchester
     • http://impact-mygrid-taverna-hackathon.wikispaces.com/
  4. Integration
     • Only requirement: command line executable
     • Generic command line wrapper produces web service
     • Web service exposed as workflow module with documentation
     • Quick & easy integration: developers can focus on their application and have to worry less about integration = higher quality software
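The wrapping idea on the Integration slide can be illustrated with a minimal Python sketch. It only covers the first step (turning a command line executable into a callable with text in, text out); it does not generate a web service, and it is not the IIF's actual wrapper. The names `wrap_command` and `UPPERCASE_TOOL`, and the `{input}`/`{output}` placeholder convention, are assumptions made for this example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def wrap_command(argv_template):
    """Wrap a command line tool as a plain callable (hypothetical helper).

    argv_template is a list of arguments; the placeholders {input} and
    {output} are replaced by temporary file paths on every call.
    """
    def service(input_text: str) -> str:
        with tempfile.TemporaryDirectory() as workdir:
            in_path = Path(workdir) / "input.txt"
            out_path = Path(workdir) / "output.txt"
            in_path.write_text(input_text, encoding="utf-8")
            argv = [part.format(input=in_path, output=out_path)
                    for part in argv_template]
            # check=True raises CalledProcessError if the tool fails.
            subprocess.run(argv, check=True)
            return out_path.read_text(encoding="utf-8")
    return service


# Stand-in for a real OCR tool: a tiny script that upper-cases its input file.
UPPERCASE_TOOL = [
    sys.executable, "-c",
    "import sys; open(sys.argv[2], 'w').write(open(sys.argv[1]).read().upper())",
    "{input}", "{output}",
]

if __name__ == "__main__":
    to_upper = wrap_command(UPPERCASE_TOOL)
    print(to_upper("ocr text"))
```

Because the wrapper only relies on a file-in, file-out contract, the tool behind it can be written in any language, which is the point the slide makes about the single integration requirement.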
  5. Workflows
     • OCR workflow = data pipeline
     • Building blocks = processing modules (nodes)
     • Integration = interaction between nodes (mashups)
     • Collaboration with
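The "workflow = data pipeline" view on the Workflows slide can be sketched in a few lines of Python. This is a simplified, linear illustration only: real Taverna workflows are graphs of services, and the node functions `dehyphenate` and `normalize_whitespace` are hypothetical stand-ins for actual processing modules.

```python
from typing import Callable, Iterable

# A node is any function that takes text and returns text (an assumption
# for this sketch; real modules also exchange images and XML).
Node = Callable[[str], str]


def run_workflow(nodes: Iterable[Node], data: str) -> str:
    """Pass the data through each node in order, like a pipeline."""
    for node in nodes:
        data = node(data)
    return data


def dehyphenate(text: str) -> str:
    """Join words that were split by a hyphen at a line break."""
    return text.replace("-\n", "")


def normalize_whitespace(text: str) -> str:
    """Collapse runs of spaces and line breaks into single spaces."""
    return " ".join(text.split())


workflow = [dehyphenate, normalize_whitespace]

if __name__ == "__main__":
    print(run_workflow(workflow, "digi-\ntised   news-\npaper"))
```

Swapping, removing, or reordering entries in `workflow` changes the pipeline without touching the nodes themselves, which is the sense in which nodes act as building blocks.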
  7. Evaluation features
     • Text comparison of result with ground truth, using Levenshtein distance method
     • Word evaluation (with reading order)
     • Layout-based comparison of result with ground truth, using the Page Analysis And Ground Truth Elements Framework
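The text comparison on the Evaluation features slide relies on Levenshtein distance, the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The Python sketch below shows the standard dynamic-programming computation. The slides do not say how the distance is turned into a score, so `character_accuracy` is an assumed, commonly used normalization rather than the IMPACT tool's definition.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, using two rows of the DP table."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_accuracy(ocr_text: str, ground_truth: str) -> float:
    """Assumed score: 1 - distance / ground-truth length, floored at 0."""
    if not ground_truth:
        return 1.0 if not ocr_text else 0.0
    distance = levenshtein(ocr_text, ground_truth)
    return max(0.0, 1.0 - distance / len(ground_truth))


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(character_accuracy("Tbe quick brown fox", "The quick brown fox"))
```

For example, an OCR result that confuses a single character in a 19-character ground-truth line has distance 1, so the assumed accuracy is 18/19.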
  8. Community
     • Web 2.0 style workflow registry
     • Ready-to-use and documented resources
     • Community of experts
     • Sharing of experiments and know-how
  9. Local client: Taverna Workbench
     • Background: BioSciences
     • Developed and maintained by myGrid, UK
     • Open source
     • GUI for design and execution of web services & workflows
  10. Remote client: Portal
     • SOAP/REST API
     • Remote execution of web services & workflows
  11. Results Repository
     • Custom service for IMPACT: automatic storage of workflow outputs and provenance via WebDAV
     • Fully interoperable, since HTTP-based
     • Configurable storage of result sets
     • Create reports using POI
  12. Scalability
     • Central ESB proxy manages multiple service copies
     • Process parallelization, load distribution, failover, security
     • Served >2M requests
     • Throughput improvements of 94% with every additional instance
     • Tested on Dutch Supercomputing Cloud (“Enlighten Your Research”)
  13. Outlook
     • Online service for testing/evaluation
     • Specification & Guidelines
     • Extending the scope:
        • Workflows for linguistic analysis: CLARIN
        • Workflows for preservation: SCAPE
     • Even better scalability: Map/Reduce
     • Supported by a community of developers & practitioners
  15. “Anyway, the thing about progress is that it always seems greater than it really is.”
     Ludwig Wittgenstein, Philosophical Investigations (quoting Johann Nestroy)
     xkcd.com/688
