Bratislava WS - Conteh - BL - IMPACT overview_pdf
 

    Presentation Transcript

    • Overview of the IMPACT Project
        • IMPACT Workshop, 7th May 2010, Bratislava
        • Aly Conteh, British Library
        • IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands.
    • Background
        • Text that is not digital is virtually invisible
        • Digitised material is becoming available too slowly, in too small quantities and from too few sources
        • OCR (optical character recognition) technology does not produce satisfactory results for historical documents
        • There is a lack of institutional knowledge and expertise, which causes inefficiency and ‘re-inventing the wheel’
    • Objectives: significantly improve mass digitisation of historical printed text by
        • Innovating OCR software and language technology
        • Sharing expertise and building capacity across Europe
        • Ensuring that tools and services will be sustained after the end of the project
    • The IMPACT Consortium - Original
        • Libraries
            • National Library of the Netherlands (KB)
            • The British Library (BL)
            • Bibliothèque nationale de France (BNF)
            • German National Library (DNB)
            • Bavarian State Library (BSB)
            • Göttingen State and University Library (UGOE)
            • Austrian National Library (ONB)
            • University of Innsbruck Library (UIBK)
        • Universities & Research centres
            • Dutch Institute for Lexicology (INL)
            • National Centre for Scientific Research "Demokritos" (NCSR)
            • University of Salford (USAL)
            • University of Munich (CIS group)
            • University of Innsbruck (InfMath group)
            • University of Bath (UKOLN)
        • Industry partners
            • IBM (Haifa Research Lab)
            • ABBYY (Moscow)
    • IMPACT Extension: objectives
        • To demonstrate the IMPACT tools for efficient lexicon building for language families outside the current IMPACT focus
            • Currently in IMPACT: three Germanic languages (English, German, Dutch)
            • To be added: Romance and Slavic languages
        • To demonstrate and disseminate project results in Southern and Eastern Europe, and support building capacity in digitisation in these countries
        • To reinforce cooperation and better exploitation of ICT R&D synergies across the enlarged European Union
        • To build strategic partnerships with the aim of gaining access to knowledge and developing standards and interoperable solutions
    • Extension in two iterations
        1. Second phase, foreseen in the original IMPACT contract
            • 3 languages: French, Spanish, Polish
            • 5 partners (entry 1 February 2010)
        2. Proposal under Objective ICT-2009.9.5, call 5 of FP7: Enlarged European Union
            • 3 languages: Slovene, Bulgarian and Czech
            • 6 partners (entry 1 April 2010)
        • All will be equal partners in the consortium
        • Full integration expected in June 2010
    • New partners identified: second phase
        • 22: Analyse et Traitement Informatique de la Langue Française (ATILF), FR
        • 23: Biblioteca Nacional de España (BNE), ES
        • 24: Fundación Biblioteca Virtual Miguel de Cervantes (BVC), ES
        • 25: Poznań Supercomputing and Networking Center (PSNC), PL
        • 26: University of Warsaw, Department of Formal Linguistics (UW DFL), PL
    • New partners identified: IMPACT enlarged EU
        • 16: Institute for Parallel Processing, Bulgarian Academy of Sciences (BAS), BG
        • 17: “St. Cyril and Methodius” National Library (NLB), BG
        • 18: Jožef Stefan Institute (JSI), SI
        • 19: Narodna in univerzitetna knjižnica, National and University Library (NUK), SI
        • 20: Institute of the Czech National Corpus, Charles University Prague (CUP), CZ
        • 21: Národní knihovna České republiky, National Library of the Czech Republic (NKC), CZ
    • Facts and figures
        • Project supported by the European Community under the FP7 ICT Work Programme
        • Coordinated by the National Library of the Netherlands (KB)
        • Project type: Large-scale Integrating Project
        • EU funding: € 11 500 000
        • Start date: 1 January 2008
        • Duration: 48 months
        • From 2012: sustainable Centre of Competence
        • Contact: impact@kb.nl
        • Web site: www.impact-project.eu
    • Project Structure
        • Operational Context
            • Requirements, Benchmarking and Metrics
            • Best Practices and Guidelines
            • Technical Framework and Technical Integration
        • Text Recognition
            • Pre-processing and segmentation
            • Adaptive and experimental OCR
            • Models and dictionaries
        • Enhancement & Enrichment
            • Collaborative correction
            • Lexicons and gazetteers
            • Structural metadata
        • Capacity Building
            • Published resources
            • Training and support
            • Demonstration
    • Tools for Text Recognition (OCR): technologies for the extraction of text in a digital form from the page
        • Adaptive OCR engine: the core of IMPACT, a cutting-edge software system tailored specifically to the needs of libraries. It adapts itself to the material during the OCR process and integrates several other tools:
            • Image enhancement toolkit
            • Segmentation toolkit
            • Post-correction modules
            • Other OCR engines
        • Experimental prototypes and tools
            • Typewritten OCR prototype
            • Wordspotting engine
            • Inventory extraction prototype
    • Tools for Enrichment (language technology): make the OCR results more accurate and more accessible
        • Collaborative correction
            • A fully web-based collaborative correction system: a platform, suitable for massive volunteer participation, for validating and correcting OCR results. It is the first tool of its kind to be directly linked to an OCR engine.
        • Lexicons and gazetteers
            • General and Named Entities lexica for Dutch, German and English, as well as support for lexicon development in other European languages
            • Toolboxes providing the means to overcome the historical language barrier
            • Collaborative web-based workspace for named entity management
        • Structural metadata
            • Functional Extension Parser: a set of web services that can be used to automatically detect and tag structural metadata of scanned material
    • Strategic tools and services
        • Web site provides access to all project outputs and forms the nucleus of a virtual network of all European digitisation centres of competence and associated research activities
        • A set of Decision Support Tools that can be used to initiate, organise, manage and cost mass digitisation projects
        • A learning resource toolbox will contain operational guidelines, providing guidance on real-world implementation of all tools produced within the project
        • Training and support
            • Help Desk system that brokers end-user requests to project partners and to other digitisation centres of competence
            • Training programme dealing with large-scale digitisation issues and technologies, with a range of supporting documentation made available through the project website
        • Demonstration
    • Building a sustainable Centre of Competence
        • First Phase 2008: IMPACT core consortium of 15 partners
            • Good mix of public and private partners
            • Experience in mass digitisation and research in OCR, Language and Image processing
        • Second Phase 2010: extension with 11 additional partners
            • Public collection holders and language institutes
            • Adding wider set of European languages and experience in mass digitisation
        • Third Phase 2011: Open to all partners
            • Other Centres of Competence
            • Digitisation Suppliers
            • Research Institutes
            • Libraries, Archives and Museums
        • By 2012 IMPACT exists as a sustainable Centre of Competence
    • http://www.impact-project.eu
    • Twitter: impactocr; Blog: impactocr.wordpress.com
    • Thank you