IMPACT: Building a Centre of Competence for Digitisation
Presentation given by Hildelies Balk during the 2nd LIBER-EBLIDA Workshop on Digitisation of Library Material in Europe (19-21 October 2009, The Hague, the Netherlands)

Presentation Transcript

  • IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands.
    IMPACT: Building a Centre of Competence for Digitisation
    LIBER digitisation workshop, 20 October 2009
    Hildelies Balk, Head of European Projects at the National Library of the Netherlands and Coordinator of the IMPACT project
  • Outline of the presentation
    – i2010 vision and the role of Centres of Competence
    – IMPACT as a Centre of Competence
    – Challenges
    – Innovation
    – Building a sustainable Centre of Competence
  • i2010 vision
    – Create a European Digital Library: all cultural heritage online
    – Ambition to speed up the process of mass digitisation in order to produce a massive corpus of digitised material online and create a ‘critical mass’
    – National memory institutions are to take the lead; smaller institutions will follow
  • Centres of Competence
    Centres of Competence play an important role in realising the i2010 vision:
    – groups of excellent partners from the public and private sectors
    – high level of expertise
    – able to support different organisational stakeholders
    – provide, over time, access to a new generation of digitisation tools, services and skills
    EU funding is available through the Framework Programmes
  • IMPACT as Centre of Competence
    – Consortium of 15 partners: a good mix of public and private, libraries and research
    – All excelling in their field
    – Each established in a large international network to facilitate outreach
    – In May 2008 submitted a successful proposal in answer to the first call of the FP7 ICT Work Programme 2007.4.1 Digital Libraries and technology-enhanced learning
    – Project duration: 4 years, working on transformation into a sustainable Centre from 2011 onwards
    – Currently, over 100 people across Europe, Israel and Russia are involved in the project
    – On the verge of extending with 6 to 12 new partners in 2010
  • The IMPACT Consortium
    8 Libraries
    – National Library of the Netherlands (KB)
    – The British Library (BL)
    – Bibliothèque nationale de France (BNF)
    – German National Library (DNB)
    – Bavarian State Library (BSB)
    – Göttingen State and University Library (UGOE)
    – Austrian National Library (ONB)
    – University of Innsbruck Library (UIBK)*
    6 Universities & Research centres
    – Dutch Institute for Lexicology (INL)
    – National Centre for Scientific Research – Demokritos (NCSR)
    – University of Salford (USAL)
    – University of Munich (CIS group)
    – University of Innsbruck (InfMath group)
    – University of Bath (UKOLN)
    2 Industry partners
    – IBM (Haifa Research Lab)
    – ABBYY (Moscow)
    Coordination: National Library of the Netherlands
  • Koninklijke Bibliotheek – National Library of the Netherlands
    – Insight into issues of Permanent Access to the Records of Science in Europe
    – Centre of Competence in Mass Digitisation
    – Preservation and Long-term access through Networked Services
    – Keeping Emulation Environments Portable
  • Main challenges in mass digitisation
    – Technical challenges in the process from image capture to online access
    – Strategic challenge: a lack of institutional knowledge and expertise, which causes inefficiency and ‘re-inventing the wheel’
  • IMPACT focus on historical printed text
    – Improve the digital accessibility of all printed text produced before 1900. This is the bulk of the copyright-free material that still lies largely untouched in the storage of most libraries across Europe.
    – This material is currently difficult to access in digital form: state-of-the-art Optical Character Recognition does not produce satisfactory results for old books, magazines and newspapers
    – Commercial OCR technologies focus mainly on modern documents and are not fit for historic material with archaic fonts, complex layouts, and warped or degraded pages
    – Currently available modern lexica are not sufficient for the recognition of obsolete words and inflections in historical texts, which show a large range of language variants (for instance spelling variants)
    – Manual post-correction is slow and expensive
  • Gothic print types
    – General description: historic fonts and obsolete characters such as the long s
    – Effects on OCRing: effects are high, since such fonts and characters are often not recognised correctly
  • Difficult layout
    – General description: due to difficult layouts, pages can be segmented incorrectly
    – Effects on OCRing: effects are high, since text is not ordered in the right way
  • Historical language
    Historical variants of the Dutch word ‘wereld’ (world):
    werelt weerelt wereld weerelds wereldt werelden weereld werrelts waerelds weerlyt wereldts vveerelts waereld weerelden waerelden weerlt werlt werelds sweerels zwerlys swarels swerelts werelts swerrels weirelts tsweerelds werret vverelt werlts werrelt worreld werlden wareld weirelt weireld waerelt werreld werld vvereld weerelts werlde tswerels werreldts weereldt wereldje waereldje weurlt wald weëled
  • Two main objectives
    IMPACT aims to significantly improve mass digitisation of historical printed text by:
    – Pushing innovation of OCR software and language technology as far as possible during the project
    – Sharing expertise and building capacity across Europe (Centre of Competence)
  • Innovation in IMPACT
    – Exploring new approaches in OCR technology
    – Incorporating tools for the whole workflow of the object after it leaves the scanner, from image to full text: image processing, OCR processing (including use of dictionaries), OCR correction and document formatting
    – Providing computational lexica for a number of languages that will enhance the accessibility of the material
    – Support for lexicon development in other European languages
  • Sharing expertise and building capacity
    Tackling the organisational barriers to mass digitisation:
    – development of strategic tools to facilitate outreach, 2008-2009
    – demonstration of project results from 2010 onwards
    – extension with new partners in two iterations in 2010
    – building a sustainable centre of competence, 2009-2011
  • Strategic tools and services
    – Website: provides access to all project outputs and forms the nucleus of a virtual network of all European digitisation centres of competence and associated research activities
    – A set of Decision Support Tools that can be used to initiate, organise, manage and cost mass digitisation projects
    – A Learning Resource Toolbox that will contain operational guidelines, providing guidance on real-world implementation of all tools produced within the project
    – Training and support: Help Desk system that brokers end-user requests to project partners and to other digitisation centres of competence
    – Training programme dealing with large-scale digitisation issues and technologies, with a range of supporting documentation made available through the project website
    – Interoperability Framework with demonstrator platform
  • Extension with new partners 1
    Objectives:
    – Adding European language partners to implement IMPACT language tools for their language
    – Adding content holders from these language areas to share experience in mass digitisation and demonstrate IMPACT strategic tools
    – Testing and demonstrating IMPACT language-independent tools
    – Supporting training and dissemination in these new languages
    Entry of new partners from January 2010
    Current languages in IMPACT: English, German and Dutch, all three Germanic languages
    To be added: partners from Southern and Eastern Europe
  • Extension with new partners 2
    – First iteration, according to the original contract with the EC (1 January 2008): up to six partners from France, Spain and Poland
    – Second iteration: proposal under the special objective of the fifth call of FP7 (26 October 2009), Enlarged European Union: up to six partners from countries that recently entered the EU
    – Currently working on a proposal with partners from Slovenia, Bulgaria and the Czech Republic, with an extra option for working with Croatia
    – In 2010 the consortium will be extended with up to 12 partners
  • Towards a sustainable Centre of Competence 1
    Achieved in 2009:
    – Strategic tools in place
    – Demonstrator platform ready for implementation
    – Training and dissemination programme under way
    – Range of excellent new partners in the ‘waiting room’
    – Engagement with other Centres of Competence in digitisation
    – Fruitful contacts with the research community on one side and content holders on the other
  • Towards a sustainable Centre of Competence 2
    To be realised in 2010-2011:
    – Kick-off of a series of local events for dissemination and training
    – Building out of virtual channels, e.g. a registry/repository of ground truth
    – Extension of the IMPACT community on the web and in the world
    – Business model of the sustainable Centre defined
    – Tangible commitment of all partners secured
    – Resources for continuation
  • IMPACT vision for 2012
    In 2012 IMPACT is a sustainable Centre of Competence for mass digitisation of historical printed text in Europe,
    – providing (links to) tools and guidance;
    – sharing expertise;
    – giving access to professional training for digitisation workflow management;
    – working with other Centres of Competence in digitisation to avoid the fragmentation and duplication of effort across Europe;
    – providing a channel between user requirements on the one hand and the research community on the other.
    Around this centre a bigger community has formed, with added expertise from digitisation suppliers, research institutes, libraries and archives across Europe. This will contribute to the ultimate aim: all of Europe’s historical text digitised in a form that is accessible, on a par with born-digital documents.
  • More information about the IMPACT project: www.impact-project.eu
  • Thank you! Questions? Join the project mailing list?
    Project Office: impact@kb.nl
    Telephone: +31 70 314 0958
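
The historical-language slide lists dozens of spellings of ‘wereld’, and the innovation slide mentions computational lexica that make such material easier to find. As a rough illustration of how a lexicon of this kind might be used, the Python sketch below maps historical spellings to a modern lemma and expands a modern search term into its known variants. The spellings are taken from the slide; the function names, the `difflib` fallback and the 0.8 similarity cutoff are assumptions made for this example and do not describe IMPACT's actual tools.

```python
from difflib import get_close_matches

# Illustrative lexicon: historical spelling -> modern lemma.
# The spellings come from the slide; the structure is an assumption.
HISTORICAL_LEXICON = {
    "werelt": "wereld",
    "weerelt": "wereld",
    "waereld": "wereld",
    "weereld": "wereld",
    "werlt": "wereld",
    "wareld": "wereld",
    "vvereld": "wereld",
}


def normalise(token, lexicon=HISTORICAL_LEXICON, cutoff=0.8):
    """Return the modern lemma for a historical token, or the token unchanged."""
    key = token.lower()
    if key in lexicon:
        return lexicon[key]
    # Hypothetical fallback: use the most similar known variant, if any
    # reaches the similarity cutoff.
    close = get_close_matches(key, list(lexicon), n=1, cutoff=cutoff)
    return lexicon[close[0]] if close else token


def expand_query(lemma, lexicon=HISTORICAL_LEXICON):
    """Return the modern lemma plus every known historical variant, sorted."""
    variants = {variant for variant, target in lexicon.items() if target == lemma}
    return sorted({lemma} | variants)


if __name__ == "__main__":
    print(normalise("Waereld"))
    print(expand_query("wereld"))
```

A search for the modern word can then be run against every spelling returned by `expand_query`, so that a reader does not need to know the historical variants in advance.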