The Europeana Newspapers Project at IMPACT Final Event
Published in: Technology, Education

Transcript

  • 1. The Europeana Newspapers Project
      IMPACT Final Event
      Den Haag, 26-06-2012
      Lotte Wilms
  • 2. Europeana Newspapers
      Why newspapers?
        • Important source of information for researchers
        • Relevant for the general public
      Europeana Newspapers:
        • Aims at the aggregation and refinement of newspapers for The European Library (TEL) and Europeana
        • Will use refinement methods for OCR, OLR (article segmentation), named entity recognition (NER), and page class recognition
        • The libraries participating in the project will provide around 18 million digitised newspaper pages to Europeana
        • The project will encourage more libraries to contribute newspapers to Europeana and TEL
        • Builds on work from IMPACT
  • 3. Project Profile: Consortium & stakeholders
      • 17 partners from 12 countries within the consortium
        • National libraries
        • University libraries
        • SME
      • External partners and stakeholders:
        • Involvement of libraries outside the project consortium
      • Framework:
        • Funded as a Best Practice Network in the ICT-PSP program of the European Commission
        • Project Duration: February 2012 – January 2015
  • 4. Europeana Newspapers Consortium
      [Map of consortium partners; labels: NL E NLF LIBER TEL SUB HH NLL CCS USAL NLP BL SBB KB ONB NLT UIBK BnF UB LFT]
  • 5. Project Profile: Objectives
      1) Selection, Refinement & Aggregation of content
        • Provision of more than 18 million newspaper pages to Europeana, many of those with full-text
        • Support move from images to texts in Europeana
      2) Analysis of existing newspaper collections
        • Survey of newspaper holdings in Europe
      3) Quality Assurance & Best practice recommendations
        • Contribute to optimised workflows
        • Provide best practice recommendations for digitisation, refinement, workflows, metadata etc.
      4) Presentation and full-text search
        • Improve access to newspaper collections within Europeana
  • 6. 1) Selection, Refinement & Aggregation of content
      • Aggregation of 18 million pages of digitised newspapers to Europeana and to The European Library
        • 8 million pages “as is” (content providers)
        • 8 million refined pages: OCR (UIBK, Austria)
        • 2 million refined pages: OCR/OLR (article segmentation) (CCS, Germany)
      • Analysis of available digital newspaper collections and selection of subsets suitable for refinement
      www.europeana.eu/
      www.theeuropeanlibrary.org/
  • 7. 1) Refinement – OCR and OLR - UIBK
      • 8 million refined pages: OCR using ABBYY FRE10 (UIBK, Austria)
        • UIBK enriches the OCR with structural information from the Document Understanding Platform (FEP) developed within IMPACT
        • Dedicated profiles will be produced which are specifically tuned to the characteristics of newspapers to yield optimal results
  • 8. 1) Refinement – OCR and OLR - CCS
      • 2 million refined pages: OCR/OLR (article segmentation) (CCS, Germany)
        • CCS produces OCR and verification of column recognition, zoning, article segmentation, and page class recognition
        • CCS provides libraries with a client technology for manual correction of recognition and segmentation results
        • OCR is done with ABBYY FRE10, which includes improvements developed within IMPACT
      [Image caption: CCS: Column recognition, article segmentation]
  • 9. 1) Refinement - Named Entity Recognition
      • KB provides named entity recognition (NER) for material in up to three languages (Dutch, English, and German)
        • Pilot planned for second half of 2012
      Image by Frank Landsbergen (INL)
  • 10. 2) Analysis of existing digitised newspaper collections
      • Project partners and others are asked to provide input by 31 July 2012 so that the extent of digitised newspaper collections at their institutions can be analysed
        • Results will be embedded in the “Zeitschriftendatenbank” of the Staatsbibliothek zu Berlin (Union Catalogue of Serials)
        • The survey will suggest potential new partners for the extension of the network
      • Also useful to ascertain the technical status of digitised data
      If you have a digital newspaper collection and would like to participate in the survey, please go to: http://www.surveymonkey.com/s/BQ28579
  • 11. 3) Quality Assurance & Best practice recommendations
      • The digitisation workflow for newspapers, including refinement, will be evaluated through an evaluation and quality assessment framework containing tools developed in IMPACT
        • Document Management System
        • Ground truth production tool Aletheia
        • Evaluation tools
      • Provide recommendations on best practices for digitisation and refinement of newspapers
  • 12. 3) Quality Assurance & Best practice recommendations
      • Analysis of metadata formats in use by libraries in digitisation projects
      • Align metadata models with the METS/ALTO standard
      • Release best practice recommendation on how to apply these formats in newspaper digitisation and refinement
      • Supports content browser
  • 13. 4) Presentation & Access to full-text
      • Within the lifetime of the project, a content browser will be built within the TEL portal so that users can:
        • Search full text, e.g.
          • by search term
          • by named entities
          • by collections of newspapers
          • by date
        • See newspaper images
        • Be linked to relevant library sources
      • This browser will be built in TEL during the project and exported to Europeana after the project
  • 14. 5) Dissemination
      • Objectives:
        • Establishment of publicity
        • Increasing usage of Europeana
        • Awareness raising among target groups
      • Tasks:
        1. Media Communication
        2. Workshops and conferences
          • Three main dissemination workshops
          • National information days
          • Network extension
        3. Exploitation
  • 15. Thank you for your attention!
      http://www.europeana-newspapers.eu/
      Lotte Wilms
      Lotte.wilms@kb.nl
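
Slide 12 proposes aligning metadata models with the METS/ALTO standard. As a rough illustration of what ALTO full text looks like to a consumer such as a search index, the following minimal Python sketch reads the words of each text line from an ALTO document. The sample XML is hand-written for illustration and is not project data, and the helper names `local_name` and `alto_text_lines` are hypothetical. Only the element and attribute names (`TextLine`, `String`, `CONTENT`) come from the ALTO schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written ALTO fragment (illustration only, not project data).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Europeana"/>
            <SP/>
            <String CONTENT="Newspapers"/>
          </TextLine>
          <TextLine>
            <String CONTENT="Den"/>
            <SP/>
            <String CONTENT="Haag"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Drop the '{namespace}' prefix so the code does not depend on the ALTO version."""
    return tag.rsplit("}", 1)[-1]


def alto_text_lines(alto_xml):
    """Return one string per TextLine, joining the CONTENT of its String children."""
    root = ET.fromstring(alto_xml)
    lines = []
    for element in root.iter():
        if local_name(element.tag) == "TextLine":
            words = [
                child.get("CONTENT", "")
                for child in element
                if local_name(child.tag) == "String"
            ]
            lines.append(" ".join(words))
    return lines


print(alto_text_lines(SAMPLE_ALTO))
```

Matching on local tag names rather than a fixed namespace is a deliberate simplification here, since collections produced at different times may use different ALTO schema versions.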