IMPACT Final Conference - Clemens Neudecker
Transcript

  • 1. The IMPACT Interoperability Framework: Workflows for OCR and beyond. Clemens Neudecker, KB National Library of the Netherlands. 2nd IMPACT Conference, British Library, London, 24/25 October 2011
  • 2. Background
    • > 20 individual software components for specific challenges
    • Prototyping new algorithms, improving commercial solutions
    • Different frameworks (C, C++, Java, etc.), platforms (Win/Linux)
    • Extensible with 3rd-party applications
    • Result: IMPACT Interoperability Framework (IIF)
  • 3. Architecture
    • Java
    • Web Services
    • Apache
    • Taverna
    • Open source, available at https://github.com/impactcentre
    • Free Hackathon 14/15 November, University of Manchester
    • http://impact-mygrid-taverna-hackathon.wikispaces.com/
  • 4. Integration
    • Only requirement: command line executable
    • Generic command line wrapper produces web service
    • Web service exposed as workflow module with documentation
    • Quick and easy integration: developers can focus on their application and worry less about integration, which leads to higher-quality software
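
The wrapping idea on this slide, a command-line executable exposed as a web service, can be sketched roughly as follows. This is a minimal illustration in standard-library Python, not the IIF's own wrapper (which, per the Architecture slide, is Java-based). The names `run_tool`, `make_handler` and `serve`, and the choice to pass the request body to the tool's stdin, are assumptions made for the example.

```python
import subprocess
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_tool(command, payload):
    """Run a command-line tool: payload goes to stdin, stdout comes back.

    Returns (http_status, body). A non-zero exit code maps to status 500.
    """
    result = subprocess.run(command, input=payload, capture_output=True)
    if result.returncode == 0:
        return 200, result.stdout
    return 500, result.stderr


def make_handler(command):
    """Build a request handler class that pipes each POST body through `command`."""

    class ToolHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            status, body = run_tool(command, self.rfile.read(length))
            self.send_response(status)
            self.send_header("Content-Type", "application/octet-stream")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return ToolHandler


def serve(command, port=8080):
    """Expose `command` as a blocking HTTP service on localhost (assumed port)."""
    HTTPServer(("127.0.0.1", port), make_handler(command)).serve_forever()


# Stand-in "tool" for the example: a Python one-liner that upper-cases stdin.
UPPERCASE_TOOL = [sys.executable, "-c",
                  "import sys; sys.stdout.write(sys.stdin.read().upper())"]
```

Calling `serve(UPPERCASE_TOOL)` would start the service. The tool itself needs no knowledge of HTTP, which is the point the slide makes about developers focusing on their application.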
  • 5. Workflows
    • OCR workflow = data pipeline
    • Building blocks = processing modules (nodes)
    • Integration = interaction between nodes (mashups)
    •  Collaboration with
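
The "workflow = data pipeline" view above can be illustrated with a small sketch in which each processing module is a function and a workflow is their composition. The node names (`binarize`, `recognize`, `post_correct`) and their toy behaviour are hypothetical placeholders, not actual IMPACT components.

```python
from functools import reduce


def build_workflow(*nodes):
    """Chain processing nodes so that each node's output feeds the next one."""
    def workflow(data):
        return reduce(lambda value, node: node(value), nodes, data)
    return workflow


# Hypothetical placeholder nodes; real modules would call OCR tools or web services.
def binarize(page):
    return {**page, "binarized": True}


def recognize(page):
    return {**page, "text": "Tbe quick brown fox"}


def post_correct(page):
    return {**page, "text": page["text"].replace("Tbe", "The")}


ocr_workflow = build_workflow(binarize, recognize, post_correct)
result = ocr_workflow({"image": "page-0001.tif"})
```

Because every node has the same shape (data in, data out), nodes can be swapped or reordered without touching the others.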
  • 6.  
  • 7. Evaluation features
    • Text comparison of result with ground truth, using the Levenshtein distance
    • Word evaluation (with reading order)
    • Layout-based comparison of result with ground truth, using the Page Analysis and Ground Truth Elements (PAGE) framework
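
As an illustration of the text comparison mentioned above, here is a minimal sketch of the Levenshtein distance, applied either to characters or to words in reading order. Normalising by the ground-truth length is a common convention assumed here; it is not necessarily the exact metric used by the IMPACT evaluation tools, and `error_rate` is a name chosen for this example.

```python
def levenshtein(a, b):
    """Edit distance between sequences a and b (insert, delete, substitute: cost 1)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def error_rate(result, ground_truth, unit="char"):
    """Edit distance divided by ground-truth length, on characters or words."""
    if unit == "word":
        # Splitting keeps the words in their original (reading) order.
        result, ground_truth = result.split(), ground_truth.split()
    if len(ground_truth) == 0:
        return 0.0 if len(result) == 0 else 1.0
    return levenshtein(result, ground_truth) / len(ground_truth)
```

For example, comparing the OCR result "Tbe quick fox" with the ground truth "The quick fox" gives one wrong character out of 13, but one wrong word out of 3, which is why the two evaluation levels are reported separately.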
  • 8. Community
    • Web 2.0-style workflow registry
    • Ready-to-use and documented resources
    • Community of experts
    • Sharing of experiments and know-how
  • 9. Local client: Taverna Workbench
    • Background: biosciences
    • Developed and maintained by myGrid, UK
    • Open source
    • GUI for design and execution of web services & workflows
  • 10. Remote client: Portal
    • SOAP/REST API
    • Remote execution of web services & workflows
  • 11. Results Repository
    • Custom service for IMPACT: automatic storage of workflow outputs and provenance via WebDAV
    • Fully interoperable, since HTTP-based
    • Configurable storage of result sets
    • Create reports using POI
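
Since WebDAV is HTTP-based, storing a workflow output amounts to an HTTP PUT. The sketch below only builds such a request and does not send it. The host `repository.example.org`, the `<base>/<workflow id>/<file name>` layout and the function name `build_store_request` are hypothetical and do not describe the actual IMPACT repository.

```python
import urllib.request
from urllib.parse import quote


def build_store_request(base_url, workflow_id, filename, content):
    """Build (but do not send) an HTTP PUT that stores one workflow output."""
    # Hypothetical layout: <base>/<workflow id>/<file name>
    url = "%s/%s/%s" % (base_url.rstrip("/"), quote(workflow_id), quote(filename))
    return urllib.request.Request(
        url,
        data=content,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )


request = build_store_request(
    "http://repository.example.org/webdav/",  # placeholder host
    "run 42", "page-0001.txt", b"recognised text")
```

Passing the request to `urllib.request.urlopen` would perform the upload against a real server. On a WebDAV server the parent collection generally has to exist first (created with `MKCOL`).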
  • 12. Scalability
    • Central ESB proxy manages multiple service copies
    • Process parallelization, Load distribution, Fail over, Security
    • Served >2M requests
    • Throughput improvements of 94% with every additional instance
    • Tested on Dutch Supercomputing Cloud (“Enlighten Your Research”)
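
The load distribution and failover behaviour of a proxy in front of several service copies can be sketched as a round-robin dispatcher. This is an illustration only, not the ESB used in IMPACT. The class name and the stand-in instances are hypothetical, and `estimated_speedup` assumes the 94% figure means a linear gain per added instance, which is one possible reading of the slide.

```python
import itertools


class RoundRobinProxy:
    """Distribute calls over service copies, skipping copies that fail."""

    def __init__(self, instances):
        self.instances = list(instances)
        self._next = itertools.cycle(range(len(self.instances)))

    def call(self, request):
        errors = []
        # Try each copy at most once, starting from the next one in rotation.
        for _ in range(len(self.instances)):
            instance = self.instances[next(self._next)]
            try:
                return instance(request)
            except ConnectionError as error:
                errors.append(error)  # fail over to the following copy
        raise ConnectionError("all %d instances failed" % len(errors))


def estimated_speedup(instance_count, gain_per_instance=0.94):
    """Throughput relative to one instance, assuming a linear gain per extra copy."""
    return 1 + gain_per_instance * (instance_count - 1)


# Stand-in service copies: two healthy ones and one that is always down.
def healthy(name):
    return lambda request: "%s handled %s" % (name, request)


def broken(request):
    raise ConnectionError("instance down")


proxy = RoundRobinProxy([healthy("a"), broken, healthy("b")])
replies = [proxy.call(n) for n in range(4)]
```

In this run, requests that would have landed on the broken copy are served by the next one in rotation, so callers never see the failure unless every copy is down.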
  • 13. Outlook
    • Online service for testing/evaluation
    • Specification & Guidelines
    • Extending the scope: workflows for linguistic analysis (CLARIN) and workflows for preservation (SCAPE)
    • Even better scalability: Map/Reduce
    • Supported by a community of developers & practitioners
  • 14.  
  • 15.
    • “Anyway, the thing about progress is that it always seems greater than it really is.”
        • Ludwig Wittgenstein, Philosophical Investigations (quoting Johann Nestroy)
    xkcd.com/688
