Experimental Workflow Development in Digitisation

Description

Experimental Workflow Development in Digitisation
2nd Qualitative and Quantitative Methods in Libraries International Conference (QQML2010), 25-28 May 2010, Chania, Greece.

Transcript

  1. Experimental workflow development in digitisation: The concept of collaborative workflow development in the IMPACT project. Mustafa Dogan (Göttingen State and University Library), Clemens Neudecker (Koninklijke Bibliotheek), Gerd Zechmeister (Austrian National Library), Sven Schlarb (Austrian National Library). IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands.
  2. Agenda (slide date: 27.5.2010, QQML)
     • Background of IMPACT
     • Digitisation workflows
     • Collaborative workflow development
     • Architectural principles
     • Workflow development platform
     • Key success factors
     • Outlook and future scenarios
  3. Background of IMPACT
     • Project partners
       – 26 libraries, research institutes and industry partners
     • Main objective
       – Improve access to historical books and newspapers printed before 1900
     • Software tools and prototypes
       – Image Enhancement & Segmentation Toolkit
       – Improved ABBYY FineReader OCR Engine, IBM Adaptive OCR
       – Post-processing and post-correction modules
       – Lexical resources for several European languages
     • Support to the MLA community
       – Best Practices & Strategic/Operational Guidelines
       – Online Helpdesk
       – Tool Showcases & Demonstrators
       – Centre of Competence
  4. Digitisation workflows
     • Digitisation: a sequence of steps, from selection of analogue source material to presentation of digital objects for end-users
     • Workflow: software-based execution of a sequence without human interaction
     • Challenges and barriers
       – Workflows are tailored to specific needs
       – Lack of interoperability for applied software and input/output data
       – Lack of collaboratively used and developed resources and expertise
  5. Collaborative workflow development
     • Workflow development as a community-driven activity using an experimental platform
     • Scientific workflows: using web services representing individual software modules (Shiyong Lu et al. 2009)
     • Providing highly innovative and efficient tools to a wider community to design workflows
     • Technical staff providing the platform, conceptual/library staff designing workflows
     • Using Web 2.0 features to share and expand knowledge and resources
  6. Architectural platform principles
     • Modularity
     • Transparency
     • Flexibility
     • Extensibility
     • Open standards based
     • Accessibility
     • Scalability
     • Collaboration
  7. Workflow development platform
  8. Workflow development phases
  9. Evaluation criteria
     • OCR: correctly recognised characters/words
     • Segmentation: correctly identified text and graphical regions
     • Workflows: comparing workflows and identifying the most suitable
     • Statistical and provenance data: e.g. processing time
  10. Outlook
     • Keys to success
       – Joint effort by library and software development staff
       – Usability of tools and platform
       – Incentive to collaborative work
       – Testing and adaptation of workflows
       – Continuously tailoring and optimising workflows
     • Future work
       – Demonstration of current (web) services
       – Experimental platform as sustainable resource for a Centre of Competence for the MLA community
  11. Thank you very much!
     • Contact
       – Project Website: http://www.impact-project.eu
       – Project Office: impact@kb.nl
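
The OCR criterion on slide 9 (correctly recognised characters/words) can be made concrete with a small sketch. The slides do not state which formula IMPACT uses, so the definition below is an assumption: accuracy is taken as one minus the edit distance between ground truth and OCR output, divided by the ground-truth length. The function names are hypothetical and chosen for illustration only.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def accuracy(ground_truth, ocr_output):
    """Assumed metric: 1 - errors / length of ground truth, clamped at 0."""
    if len(ground_truth) == 0:
        return 1.0 if len(ocr_output) == 0 else 0.0
    errors = edit_distance(ground_truth, ocr_output)
    return max(0.0, 1.0 - errors / len(ground_truth))


def char_accuracy(ground_truth, ocr_output):
    """Character-level accuracy on the raw strings."""
    return accuracy(ground_truth, ocr_output)


def word_accuracy(ground_truth, ocr_output):
    """Word-level accuracy on whitespace-separated tokens."""
    return accuracy(ground_truth.split(), ocr_output.split())
```

Computing both figures for the same page pair is useful because a single wrong character counts once at character level but invalidates a whole word at word level.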

Editor's Notes

  • 26 Libraries, Research Institutes and Industry Partners: providing content/material, knowledge/expertise, tools/software modules/prototypes
  • What is digitisation in IMPACT
    How do we define a workflow in IMPACT
    What are the challenges in current workflow development and application
    Workflows are tailored to library-/project-specific needs
    no out-of-the-box system
    causes labour- and cost-intensive evaluation and adaptation when repurposing
    Lack of interoperability for applied software and input/output data
    Lack of collaboratively used and developed resources and expertise
    Human intervention often required to guarantee ongoing processing
  • Concept of scientific workflows: http://www.cs.wayne.edu/~shiyong/papers/tsc09.pdf
    Technical staff providing the platform, conceptual staff designing workflows, so no in-depth technical and procedural knowledge is required of conceptual staff
  • Modularity:
      – modules can be combined in a number of combinations
      – identify the most suitable processing chain
      – service-oriented architecture (SOA) is the guiding architectural design principle
      – principle of loose coupling of reusable processing units
      – minimising interdependencies
    Transparency:
      – each processing step is tested and evaluated separately
    Flexibility:
      – platform-independent
      – capable of integrating different types of software
      – performance of tools can be compared easily
    Extensibility:
      – third-party components can be integrated with small extra effort
      – not restricted to software tools developed in IMPACT
    Open standards based:
      – widely supported open source software (Apache Software Foundation)
      – interoperability through use of XML standards such as
          METS/ALTO for encoding of structural information and the OCR-recognised text
          SOAP as the message exchange protocol
          WSDL for web service description
    Accessibility:
      – 3 different types of interfaces
          user-friendly, graphical workflow design and execution interface
          a web client generator for seamless integration into web sites
          machine interface (API)
    Scalability:
      – components will be deployed in the IT infrastructure of different partner organisations in a distributed network with cloned services
      – services available in a redundant way
      – balancing the workload and adding additional computing capacity when needed
    Collaboration:
      – community-wide applicability
      – optimisation of workflows
      – accessible by various channels (including Web 2.0 features)
      – comprehensively described and documented
  • Joint effort by library and software development staff: library staff contribute concepts and content; software development staff contribute the technical framework, integration of services etc.
    Expanding the portfolio of web services: also with scanning services, quality assurance/evaluation modules etc., to cover the entire range of digitisation workflow steps
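
The notes on modularity and transparency describe loosely coupled, reusable processing units that are chained into a workflow and evaluated step by step, with processing time recorded as provenance data. The sketch below illustrates that idea only; it is not the IMPACT platform. In the project each unit would be a web service described by WSDL and called over SOAP, whereas here plain local functions stand in for the services. All names (`run_workflow`, `normalise_whitespace`, `to_lowercase`) are hypothetical.

```python
import time


def normalise_whitespace(text):
    """Stand-in for a processing module: collapse runs of whitespace."""
    return " ".join(text.split())


def to_lowercase(text):
    """Stand-in for a second, independent processing module."""
    return text.lower()


def run_workflow(steps, data):
    """Run (name, function) steps in order and record per-step timing.

    Each step only sees the output of the previous one, so steps can be
    swapped or reordered without touching the others (loose coupling).
    """
    provenance = []
    for name, func in steps:
        start = time.perf_counter()
        data = func(data)
        elapsed = time.perf_counter() - start
        provenance.append({"step": name, "seconds": elapsed})
    return data, provenance


if __name__ == "__main__":
    workflow = [
        ("normalise_whitespace", normalise_whitespace),
        ("to_lowercase", to_lowercase),
    ]
    result, provenance = run_workflow(workflow, "  Hello   World ")
    print(result)
    for record in provenance:
        print(record["step"], record["seconds"])
```

Because every step shares the same input and output type, two candidate workflows can be built from different step lists and compared on both their output and their recorded timings.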