The Europeana Newspapers Project
IMPACT Final Event, Den Haag, 26-06-2012
Lotte Wilms

IMPACT Final Event 26-06-2012 - Use of IMPACT tools in the Europeana Newspapers project by Lotte Wilms (KB)


  1. The Europeana Newspapers Project
     IMPACT Final Event, Den Haag, 26-06-2012
     Lotte Wilms
  2. Europeana Newspapers
     Why newspapers?
     • Important source of information for researchers
     • Relevant for the general public
     Europeana Newspapers:
     • Aims at the aggregation and refinement of newspapers for The European Library and Europeana
     • Will use refinement methods for OCR, OLR (article segmentation), and named entity (NER) and class recognition
     • The libraries participating in the project will provide around 18 million digitised newspaper pages to Europeana
     • The project will encourage more libraries to contribute newspapers to Europeana and TEL
     • Builds on work from IMPACT
  3. Project Profile: Consortium & stakeholders
     • 17 partners from 12 countries within the consortium
       • National libraries
       • University libraries
       • SME
     • External partners and stakeholders:
       • Involvement of libraries outside the project consortium
     • Framework:
       • Funded as a Best Practice Network in the ICT-PSP programme of the European Commission
       • Project duration: February 2012 – January 2015
  4. Europeana Newspapers Consortium
     [Map of consortium members: NLE, NLF, LIBER, TEL, SUB HH, NLL, CCS, USAL, NLP, BL, SBB, KB, ONB, NLT, UIBK, BnF, UB, LFT]
  5. Project Profile: Objectives
     1) Selection, Refinement & Aggregation of content
       • Provision of more than 18 million newspaper pages to Europeana, many of those with full-text
       • Support move from images to texts in Europeana
     2) Analysis of existing newspaper collections
       • Survey of newspaper holdings in Europe
     3) Quality Assurance & Best practice recommendations
       • Contribute to optimised workflows
       • Provide best practice recommendations for digitisation, refinement, workflows, metadata etc.
     4) Presentation and full-text search
       • Improve access to newspaper collections within Europeana
  6. 1) Selection, Refinement & Aggregation of content
     • Aggregation of 18 million pages of digitised newspapers to Europeana and to The European Library
       • 8 million pages “as is” (content providers)
       • 8 million refined pages: OCR (UIBK, Austria)
       • 2 million refined pages: OCR/OLR (article segmentation) (CCS, Germany)
     • Analysis of available digital newspaper collections and selection of subsets suitable for refinement
     www.europeana.eu/
     www.theeuropeanlibrary.org/
  7. 1) Refinement – OCR and OLR – UIBK
     • 8 million refined pages: OCR using ABBYY FRE10 (UIBK, Austria)
       • UIBK enriches the OCR with structural information from the Document Understanding Platform (FEP) developed within IMPACT
       • Dedicated profiles will be produced which are specifically tuned to the characteristics of newspapers to yield optimal results
  8. 1) Refinement – OCR and OLR – CCS
     • 2 million refined pages: OCR/OLR (article segmentation) (CCS, Germany)
       • CCS produces OCR and verification of column recognition, zoning, article segmentation, and page class recognition
       • CCS provides libraries with a client technology for manual correction of recognition and segmentation results
       • OCR is done with ABBYY FRE10, which includes improvements developed within IMPACT
     [Image: CCS: Column recognition, article segmentation]
  9. 1) Refinement – Named Entity Recognition
     • KB provides named entity recognition (NER) for material in up to three languages (Dutch, English, and German)
       • Pilot planned for second half of 2012
     [Image by Frank Landsbergen (INL)]
  10. 2) Analysis of existing digitised newspaper collections
     • Project partners and others are contacted to provide input until 31 July 2012 to analyse the extent of digitised newspaper collections at their institutions
       • Results will be embedded in the “Zeitschriftendatenbank” of Staatsbibliothek zu Berlin (Union Catalogue of Serials)
       • Potential new partners for the extension of the network will be suggested by the survey
     • Also useful to ascertain the technical status of digitised data
     If you have a digital newspaper collection and would like to participate in the survey, please go to: http://www.surveymonkey.com/s/BQ28579
  11. 3) Quality Assurance & Best practice recommendations
     • The digitisation workflow for newspapers, including refinement, will be evaluated through an evaluation and quality assessment framework, containing tools developed in IMPACT
       • Document Management System
       • Ground truth production tool Aletheia
       • Evaluation tools
     • Provide recommendations on best practices for digitisation and refinement of newspapers
  12. 3) Quality Assurance & Best practice recommendations
     • Analysis of metadata formats in use by libraries in digitisation projects
     • Align metadata models with the METS/ALTO standard
     • Release best practice recommendation on how to apply these formats in newspaper digitisation and refinement
     • Supports content browser
  13. 4) Presentation & Access to full-text
     • Within the lifetime of the project, a content browser will be built within the TEL portal so that users can …
       • Search full text, e.g.
         • by search term
         • by named entities
         • by collections of newspapers
         • by date …
       • See newspaper images
       • Be linked to relevant library sources
     • This browser will be built in TEL during the project and exported to Europeana after the project
  14. 5) Dissemination
     • Objectives:
       • Establishment of publicity
       • Increasing usage of Europeana
       • Awareness raising among target groups
     • Tasks:
       • Media Communication
       • Workshops and conferences
       • Three main dissemination workshops
       • National information days
       • Network extension
     3. Exploitation
  15. Thank you for your attention!
     http://www.europeana-newspapers.eu/
     Lotte Wilms
     Lotte.wilms@kb.nl
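
As an illustration of the METS/ALTO alignment mentioned under objective 3: ALTO stores OCR output as `String` elements (one per recognised word, with the text in a `CONTENT` attribute) grouped into `TextLine` elements. The Python sketch below reads the text lines back out of such a file. It is only a minimal sketch, not the project's tooling: the sample XML is a hand-written, hypothetical fragment, and the helper names `local_name` and `alto_lines` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written ALTO fragment for illustration only
# (not data from the Europeana Newspapers project).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Europeana" WC="0.98"/>
            <SP/>
            <String CONTENT="Newspapers" WC="0.91"/>
          </TextLine>
          <TextLine>
            <String CONTENT="Den" WC="0.88"/>
            <SP/>
            <String CONTENT="Haag" WC="0.95"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Strip the '{namespace}' prefix so any ALTO schema version matches."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text):
    """Return the recognised text of each TextLine as a plain string."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        # Each String child carries one recognised word in CONTENT.
        words = [
            child.get("CONTENT", "")
            for child in elem
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(words))
    return lines


if __name__ == "__main__":
    for line in alto_lines(SAMPLE_ALTO):
        print(line)
```

Matching on the local element name rather than a fixed namespace is a deliberate simplification here, since libraries may deliver files in different ALTO schema versions.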
