Europeana Newspapers -


Building a website to search over historic European newspapers.

Published in: Education

  1. 1. Europeana Newspapers. 9 June 2014 – London – Morning Edition. Published by Alastair Dunning, The European Library, @alastairdunning
  2. 2. On 15th April 1912, the passenger ship Titanic, carrying over 2,000 passengers and crew, crashed into an iceberg on its maiden voyage from Southampton to New York
  3. 3. Responses to the Titanic Disaster
  4. 4. Responses to the Titanic Disaster query=de+telegraaf+titanic&coll=ddd&image=ddd %3A110546692%3Ampeg21%3Aa0026&page=2&maxperpage=10&sortfield=date
  5. 5. Responses to the Titanic Disaster
  6. 6. Responses to the Titanic Disaster
  7. 7. Responses to the Titanic Disaster
  8. 8. News travels at different speeds, with importance that diminishes at different rates. This is as true now as it was in 1912 (though the web changes things …)
  9. 9. The Europeana Newspapers project is making this kind of investigation easier
  10. 10. A cross-searchable newspapers interface at The European Library (with issue-level metadata forwarded to Europeana)
  11. 11. Currently: Search through the full text of around 2 million pages. By 2015: 10m pages of full text, up to 2m issues. Search by keyword, and organise by language, date, source library, title
  12. 12. Currently: Search through metadata records relating to 1.12m issues, with links to source libraries. By 2015: Search through metadata records relating to up to 4m issues, with links to source libraries. Browse by date or map
  13. 13. Full Text from following libraries: • Bibliothèque nationale de France / National Library of France • Koninklijke Bibliotheek / National Library of the Netherlands • Landesbibliothek Dr. Friedrich Teßmann / Teßmann Library • Eesti Rahvusraamatukogu / Estonian National Library • Kansalliskirjasto / National Library of Finland • Latvijas Nacionala Biblioteka / National Library of Latvia • Biblioteka Narodowa / National Library of Poland • Milli Kutuphane Baskanligi / National Library of Turkey • Österreichische Nationalbibliothek / Austrian National Library • Staatsbibliothek zu Berlin / Berlin State Library • Staats- und Universitätsbibliothek Hamburg / State and University Library • Univerzitet u Beogradu / University Library of Belgrade. Searching by title
  14. 14. Issue Level Records from following libraries: • National Library of Wales • St. Cyril and Methodius National Library / The National Library of Bulgaria • National Library of Czech Republic • National and University Library in Zagreb • Koninklijke Bibliotheek van België / Bibliothèque royale de Belgique • Narodna in univerzitetna knjižnica / National and University Library of Slovenia • National Library of Portugal • National Library of Romania • Landsbókasafn Íslands - Háskólabókasafn / National and University Library of Iceland • National Library of Spain • Bibliothèque nationale de Luxembourg / National Library of Luxembourg. Finding matching results in single or multiple issues
  15. 15. Highlighting search terms
  16. 16. So far, okay. Similar functionality to other national and regional digital libraries of newspapers. See other archives via: msid=217164746645697066594.0004c3d764fcb71ed2314&msa=0
  17. 17. But what was the user response to an aggregation of European newspaper libraries? Results of Usability Testing: content/uploads/2014/05/The-European-Library-Newspaper-Archive-Usability-testing-Report-April-2014.pdf
  18. 18. “Aggregated view of content from many sources highly valued. There was a strong positive reaction to the availability of the archive.”
  19. 19. “Many saying they would be keen to return to the site as the content expands.”
  20. 20. “Ability to search over geographic map was highly valued”
  21. 21. Plenty of quibbles about design: positions of advanced options; re-ordering the list of results; manipulating facets
  22. 22. Much greater expectations of functionality once logged in. For example: saved searches, new content notification
  23. 23. “Much of the value of the site to participants was provided by the images of the documents. Participants expected to be able to save a 'local' copy once they had located content of relevance. As no download facility is provided, this led to some frustration and undermined the overall potential value of the site for some participants.”
  24. 24. Timetable for rest of project. Now – Prototype version of interface shared with project. Throughout 2014 – Ongoing creation of OCR, and other related technical work (OLR, Named Entities). Throughout 2014 – Live version of website improved / usability testing / added content. Autumn 2014 – Final project conference. Late 2014 – Newspaper browser completed with content and tools from project. More information at Interface at
  25. 25. Things the users didn’t say (but I thought they would)
  26. 26. Why can’t I edit the text? (Our sample was researchers; maybe it is other communities that are interested in crowdsourcing?) Note: If time permits, The European Library will develop a crowdsourcing feature
  27. 27. Can I download text for data mining? Remember: Digital Humanists are still a small percentage of humanists and users. Note: Many of the texts are marked public domain, so this is feasible in legal terms
  28. 28. Number of digitised pages in interface: c.2m. Number of digitised pages in European libraries: c.130m. Number of physical pages in European libraries: 1.5bn+. Source: European Newspaper Survey Report, newspapers-survey-report.pdf
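  29. 29. The project digital library is only a fraction of the newspaper archive of the continent, indeed the world (on the previous slide's figures, roughly 1.5% of the pages digitised in European libraries, and about 0.1% of their physical pages)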
  30. 30. As libraries, how should we represent that absence to users?
  31. 31. Should such absence be represented in the interface itself?
  32. 32. Vast white spaces in the list of results?
  33. 33. Provide standardised descriptions of digitised resources? Standardised information for every digital resource, presenting collections, content, licensing, re-use
  34. 34. Charts and graphs external to the interface?
  35. 35. There are other issues too: • OCR quality varies • Some pages (2m by 2015) have article segmentation • Some library content has named entity extraction affecting search results • Different licensing statements from different countries • Date of copyright boundaries different in each country
  36. 36. How should we give users better ways to understand the digital library?
  37. 37. What role can the API play in this? Can opening up the data in the digital library, and allowing it to be explored in different ways, help?
  38. 38. API – Application Programming Interface. Traditional Model: Interface (Created by Library) on top of Data (Published by Library). With an API: Interface (Created by Third Party) on top of Data (Published by Library)
  39. 39. Pioneering work of Trove API
  40. 40. Trove Newspapers site as published by National Library of Australia, and based on data provided by Library. Interface (Created by Library); Data (Published by Library)
  41. 41. Trove Newspapers statistics developed by third party, based on data provided by library. Interface (Created by Third Party); Data (Published by Library)
  42. 42. Headline Roulette, developed by third party, based on data provided by library roulette/ Interface (Created by Third Party) Data (Published by Library)
  43. 43. Word Count of Articles, developed by third party, based on data provided by library Interface (Created by Third Party) Data (Published by Library)
  44. 44. Sounds great! But …?
  45. 45. How many people in this audience would know how to build an interface on top of an API?
  46. 46. How many users do you know who could build on top of an API?
  47. 47. That is the problem I leave you to discuss. Thank you.
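
To make the closing question a little more concrete, here is a minimal sketch of the kind of small third-party tool the "With an API" model (slides 38 to 43) makes possible. It returns to the opening Titanic example and asks when each newspaper first mentions a term. It does not call any real API: the record shape (`title`, `date`, `text`), the function names and the sample articles are all assumptions invented for illustration, standing in for whatever a newspaper API would actually return.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical records, shaped like what a newspaper API *might* return.
# Field names and texts are made up for illustration; this is not real data.
SAMPLE_ARTICLES = [
    {"title": "Paper A", "date": date(1912, 4, 16),
     "text": "Titanic sinks. Titanic survivors sought."},
    {"title": "Paper B", "date": date(1912, 4, 16),
     "text": "Local market prices rise."},
    {"title": "Paper A", "date": date(1912, 4, 17),
     "text": "More news of the Titanic."},
    {"title": "Paper B", "date": date(1912, 4, 18),
     "text": "The Titanic disaster: first reports."},
]


def count_mentions(articles, term):
    """Count case-insensitive whole-word mentions of `term` per (title, date)."""
    term = term.lower()
    counts = Counter()
    for article in articles:
        words = re.findall(r"\w+", article["text"].lower())
        n = words.count(term)
        if n:
            counts[(article["title"], article["date"])] += n
    return counts


def first_mention(articles, term):
    """Return {title: earliest date on which `term` appears in that title}."""
    first = {}
    for title, day in count_mentions(articles, term):
        if title not in first or day < first[title]:
            first[title] = day
    return first


if __name__ == "__main__":
    for title, day in sorted(first_mention(SAMPLE_ARTICLES, "Titanic").items()):
        print(f"{title}: first mention on {day.isoformat()}")
```

Even a toy like this assumes the user can fetch structured records and write a few lines of code, which is exactly the gap slides 45 and 46 point at.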