The Europeana Newspapers Project at IMPACT Final Event

Published in: Technology, Education

  1. The Europeana Newspapers Project
     IMPACT Final Event, Den Haag, 26-06-2012
     Lotte Wilms
  2. Europeana Newspapers
     Why newspapers?
       • Important source of information for researchers
       • Relevant for the general public
     Europeana Newspapers:
       • Aims at the aggregation and refinement of newspapers for The European Library (TEL) and Europeana
       • Will use refinement methods for OCR, OLR (article segmentation), named entity recognition (NER), and class recognition
       • The libraries participating in the project will provide around 18 million digitised newspaper pages to Europeana
       • The project will encourage more libraries to contribute newspapers to Europeana and TEL
       • Builds on work from IMPACT
  3. Project Profile: Consortium & stakeholders
       • 17 partners from 12 countries within the consortium
           • National libraries
           • University libraries
           • SME
       • External partners and stakeholders:
           • Involvement of libraries outside the project consortium
       • Framework:
           • Funded as a Best Practice Network in the ICT-PSP programme of the European Commission
           • Project duration: February 2012 – January 2015
  4. Europeana Newspapers Consortium
     [Map of consortium partners, labelled: NLE, NLF, LIBER, TEL, SUB HH, NLL, CCS, USAL, NLP, BL, SBB, KB, ONB, NLT, UIBK, BnF, UB, LFT]
  5. Project Profile: Objectives
     1) Selection, Refinement & Aggregation of content
       • Provision of more than 18 million newspaper pages to Europeana, many of them with full text
       • Support the move from images to texts in Europeana
     2) Analysis of existing newspaper collections
       • Survey of newspaper holdings in Europe
     3) Quality Assurance & Best practice recommendations
       • Contribute to optimised workflows
       • Provide best practice recommendations for digitisation, refinement, workflows, metadata, etc.
     4) Presentation and full-text search
       • Improve access to newspaper collections within Europeana
  6. 1) Selection, Refinement & Aggregation of content
       • Aggregation of 18 million pages of digitised newspapers to Europeana (www.europeana.eu/) and to The European Library (www.theeuropeanlibrary.org/)
           • 8 million pages “as is” (content providers)
           • 8 million refined pages: OCR (UIBK, Austria)
           • 2 million refined pages: OCR/OLR (article segmentation) (CCS, Germany)
       • Analysis of available digital newspaper collections and selection of subsets suitable for refinement
  7. 1) Refinement – OCR and OLR – UIBK
       • 8 million refined pages: OCR using ABBYY FRE10 (UIBK, Austria)
           • UIBK enriches the OCR with structural information from the Document Understanding Platform (FEP) developed within IMPACT
           • Dedicated profiles, specifically tuned to the characteristics of newspapers, will be produced to yield optimal results
  8. 1) Refinement – OCR and OLR – CCS
       • 2 million refined pages: OCR/OLR (article segmentation) (CCS, Germany)
           • CCS produces OCR and verification of column recognition, zoning, article segmentation, and page class recognition
           • CCS provides libraries with a client technology for manual correction of recognition and segmentation results
           • OCR is done with ABBYY FRE10, which includes improvements developed within IMPACT
     [Image label: CCS: column recognition, article segmentation]
  9. 1) Refinement – Named Entity Recognition
       • KB provides named entity recognition (NER) for material in up to three languages (Dutch, English, and German)
           • Pilot planned for the second half of 2012
     [Image by Frank Landsbergen (INL)]
  10. 2) Analysis of existing digitised newspaper collections
       • Project partners and others are asked to provide input by 31 July 2012 so that the extent of digitised newspaper collections at their institutions can be analysed
           • Results will be embedded in the “Zeitschriftendatenbank” of the Staatsbibliothek zu Berlin (Union Catalogue of Serials)
           • The survey will suggest potential new partners for the extension of the network
       • Also useful to ascertain the technical status of digitised data
     If you have a digital newspaper collection and would like to participate in the survey, please go to: http://www.surveymonkey.com/s/BQ28579
  11. 3) Quality Assurance & Best practice recommendations
       • The digitisation workflow for newspapers, including refinement, will be evaluated through an evaluation and quality assessment framework containing tools developed in IMPACT
           • Document Management System
           • Ground truth production tool Aletheia
           • Evaluation tools
       • Provide recommendations on best practices for digitisation and refinement of newspapers
  12. 3) Quality Assurance & Best practice recommendations
       • Analysis of metadata formats in use by libraries in digitisation projects
       • Align metadata models with the METS/ALTO standard
       • Release best practice recommendations on how to apply these formats in newspaper digitisation and refinement
       • Supports the content browser
  13. 4) Presentation & Access to full-text
       • Within the lifetime of the project, a content browser will be built within the TEL portal so that users can:
           • Search full text, e.g.
               • by search term
               • by named entities
               • by collections of newspapers
               • by date
           • See newspaper images
           • Be linked to relevant library sources
       • This browser will be built in TEL during the project and exported to Europeana after the project
  14. 5) Dissemination
       • Objectives:
           • Establishment of publicity
           • Increasing usage of Europeana
           • Awareness raising among target groups
       • Tasks:
           1. Media communication
           2. Workshops and conferences
               • Three main dissemination workshops
               • National information days
               • Network extension
           3. Exploitation
  15. Thank you for your attention!
     http://www.europeana-newspapers.eu/
     Lotte Wilms, Lotte.wilms@kb.nl
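
Slides 12 and 13 mention aligning metadata with METS/ALTO and searching full text. As a rough illustration of what "full text from ALTO" can mean in practice, the sketch below reads the words of each text line from a small ALTO fragment. It is not the project's tooling: the sample XML is hand-written and hypothetical, the function names `local_name` and `alto_text_lines` are invented here, and hyphenation (`HYP`) elements are ignored for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written ALTO fragment; not taken from the project.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Europeana"/>
            <SP/>
            <String CONTENT="Newspapers"/>
          </TextLine>
          <TextLine>
            <String CONTENT="Den"/>
            <SP/>
            <String CONTENT="Haag"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Strip the XML namespace so the sketch does not depend on one ALTO version."""
    return tag.rsplit("}", 1)[-1]


def alto_text_lines(xml_text):
    """Return one string per ALTO TextLine, joining its String CONTENT values."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        # Each recognised word is a String element carrying a CONTENT attribute.
        words = [
            child.attrib.get("CONTENT", "")
            for child in element
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(words))
    return lines


if __name__ == "__main__":
    for line in alto_text_lines(SAMPLE_ALTO):
        print(line)
```

Matching on local element names rather than a fixed namespace is a deliberate simplification, since ALTO files in different collections may declare different schema versions.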
