Europeana newspapers IFLA2013 satellite meeting

  1. Europeana Newspapers: The Gateway to European Newspapers Online
      IFLA 2013 Satellite Meeting on Newspaper & GENLOC Sections, Singapore, 14 August 2013
      Clemens Neudecker, @cneudecker
  2. Overview
      • Objectives
      • Overview of Dataset
      • Workflows & Technologies
      • Questions & Answers
      Image: Nationaal Archief, The Netherlands
      This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp
  3. Objectives
      • Refinement of 10 million pages with OCR, OLR, NER
      • Ingestion of metadata for 18 million pages in Europeana
      • Create a full-text content browser for newspapers
      • Create a unified METS/ALTO profile (ENMAP)
      • Produce tools to ease the creation of ENMAP objects
      • Share best practices and provide recommendations
  4. Who
      • 12 content providers
      • 2 networking partners
      • 4 technology providers
      • 1 aggregator
  5. Recently associated
  6. The data
  7. Europeana Newspapers Dataset (1)
  8. Europeana Newspapers Dataset (2)
  9. Europeana Newspapers Dataset (3)
  10. Europeana Newspapers Dataset (4)
  11. The workflow
  12. OCR @ UIBK
      • OCR = Optical Character Recognition
      • Technologies: ABBYY FineReader SDK
      • State-of-the-art OCR software, fully supports Fraktur/Latin/Cyrillic fonts
      • METS/ALTO package containing images, metadata & full text
  13. OLR @ CCS
      • OLR = Optical Layout Recognition
      • Technologies: docWorks
      • Separation of columns, articles, headlines, page classes
      • METS/ALTO package containing images, metadata & full text
  14. NER @ KB
      • NER = Named Entity Recognition
      • Technologies: Stanford CRF-NER
      • Open source: https://github.com/KBNLresearch/europeananp-ner
      • Detection of named entities: Person, Location, Organization
  15. QA @ PRImA
      • Layout and OCR evaluation
      • Technologies: Ground truth + Evaluation Tools (IMPACT)
      • In-depth, scenario-driven evaluation using profiles with more than 600 metrics
  16. Full-text search @ TEL
      • Blog: www.europeana-newspapers.eu
      • Workshop: 16 Sept. 2013 (Amsterdam)
  17. Thank you for your attention!
      clemens.neudecker@kb.nl
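
The OCR and OLR slides both mention a METS/ALTO package that carries the full text. As a rough illustration of what "full text in ALTO" can mean in practice, the sketch below reads the words of each text line from an ALTO file. It is not the project's tooling and does not reflect the ENMAP profile, whose details are not given in the slides. The function names and the sample document are made up for illustration; the only assumption about ALTO is that words are stored in the `CONTENT` attribute of `String` elements inside `TextLine` elements.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def alto_text_lines(alto_xml):
    """Return one string per ALTO TextLine, joining its String/@CONTENT values.

    Hypothetical helper: a minimal sketch, not part of any project tool.
    """
    root = ET.fromstring(alto_xml)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) == "TextLine":
            words = [
                child.get("CONTENT", "")
                for child in elem
                if local_name(child.tag) == "String"
            ]
            lines.append(" ".join(words))
    return lines


# Invented sample for illustration only; real files are far larger and
# also carry coordinates, styles, and confidence values.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine><String CONTENT="Europeana"/><SP/><String CONTENT="Newspapers"/></TextLine>
    <TextLine><String CONTENT="Singapore"/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    for line in alto_text_lines(SAMPLE):
        print(line)
```

Matching on local element names rather than a fixed namespace is a deliberate simplification here, since different ALTO versions use different namespace URIs.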
