Improving the discovery of European Historic Newspapers 
Rossitza Atanassova, British Library 
@RossiAtanassova 
IFLA Newspapers, Lyon, 20 August 2014
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp 
Europeana Newspapers is making historic newspaper pages searchable 
http://vimeo.com/100313926
Project outcomes 
•Content in 22 languages, ranging from the 17th to the 20th century 
•10 million pages of full text 
•Article-level records and named entities for 2 million pages 
•Aggregation of up to 18 million pages 
•Aggregation of metadata for up to an additional 19 million pages 
•Cross-searchable newspapers interface at The European Library: http://www.theeuropeanlibrary.org/tel4/newspapers 
•Issue-level metadata via Europeana http://www.europeana.eu/ 
Statistics 
Currently one can search through 
•full-text for over 2 million pages 
•metadata records relating to over 1 million issues (links to source libraries) 
Search and browse options 
Display options 
•Metadata, full-text and full zoomable images 
•Metadata, full-text and static images (full size or snippets) 
•Metadata and full-text 
•Metadata 
Usability testing 
•Remote 60-minute test sessions in April 2014 
•Conducted by User Vision, Edinburgh 
•12 participants from 5 countries with professional or strong personal research interest in the content 
•6 task scenarios 
•Pre- and post-test questionnaires 
•User Vision Report at http://www.europeana-newspapers.eu/usability-testing-results-for-our-historic-newspapers-browser/ 
Task success and ease of use ratings 
Images in Alan Blackwood, The European Library Newspaper Archive – Usability Testing, 16/04/2014
User response to the interface 
•“Strong positive reaction to the availability of the archive” 
•“Aggregated view of content from many sources highly valued” 
•“Basic search functionalities worked well” 
•Presentation of images and image navigation controls are appreciated, as is the display of OCRed text 
•Browsing content via the geographical map is popular 
•Identified issues with design and functionality: facets, results, navigation 
•Further expectations: print, download, saved searches 
Before and after 
Changes to landing page 
•Prominent browse and advanced options 
•‘Discover’ tab for browse options page 
•‘This day in history’ allows users to scroll through all relevant issues 
Changes to browsing options 
•Search by issue date modified to include a text input box for the year with auto-suggestions 
•Select title from an alphabetical index 
•Geographical map of Europe is bigger and uses a better colour palette to indicate the number of issues 
Changes to results pages 
•Sort by relevance, descending date and ascending date 
•Configure number of items per page (10-100) 
•Further recommendations: controls to navigate between results, a ‘back to search results’ button and a search input box to allow modification of search terms 
Faceted search and newspaper source page
Integration of the viewer into the Europeana portal 
Next steps with the browser 
•Second usability test in September 
•Final version by end of 2014 
•Add OCR correction functionality 
•Allow access via API 
•Further integration of the newspapers viewer within Europeana
Research practices and expectations 
•Participants in the usability test have well-established research practices and higher expectations of the site’s functionality 
•Preference for search over browsing 
•Greater control over search results 
•Multiple layers of search through facets 
•Would like to search by subject area and historical period 
•User account to save search histories 
•Download and print options 
•New content notifications and feedback submission option 
Researchers’ interest in the Europeana Newspapers archive 
•Interdisciplinary source of information 
•Mass digitised content 
•Pan-European cross-searchable archive 
•Transnational comparative studies 
•Text mining for multilingual content 
•Computational analysis and visualisation of the data
What researchers value 
“I see enormous value in an archive that breaks down national boundaries automatically, where I can search for content from a range of countries..” – Bob Nicholson 
“The difference lies not just in access but in the conversion of a massive amount of print into a searchable resource … This holds the potential to make connections across newspapers in ways previously unimaginable.” – Matt Rubery 
“Now software allows us to work with millions of pages. By combining words and expressions, machines uncover patterns that we never even suspected were there …” – Professor Toine Pieters
Digital Humanities approaches to digitised newspaper archives 
•‘Asymmetrical Encounters: E-Humanity Approaches to Reference Cultures in Europe, 1815-1992’ 
•The project will apply multilingual text mining techniques to long runs of digitised newspapers and other textual materials
The Victorian Meme Machine project 
•Partnership between Bob Nicholson, Edge Hill University and British Library Labs 
•Extract Victorian jokes from 19th century British newspapers 
•Crowdsource transcriptions 
•Algorithms to pair text with images 
•Share and re-use memes 
https://www.youtube.com/watch?v=FN1ZSAz2vMg
Europeana Newspapers Information Days 
Final workshop “Newspapers in Europe & the Digital Agenda for Europe” 
•British Library, 29-30 September 2014 
•The value of digitised historic newspapers 
•How to overcome the barriers to improving access to digitised historic newspapers 
•Policy makers, researchers, librarians, cultural heritage professionals and newspaper publishers
Thank you! For more information visit www.europeana-newspapers.eu
