Representation and Absence in Digital Resources: The Case of Europeana Newspapers

Presentation at Digital Humanities 2014, Lausanne. It looks at some of the issues involved in digitising historic newspapers in Europe, particularly how to build a website that can search across all of them.

Transcript

  • 1. Representation and Absence in Digital Resources: The Case of Europeana Newspapers. Alastair Dunning, The European Library, @alastairdunning; Clemens Neudecker, National Library of the Netherlands, @cneudecker. DH2014, Lausanne
  • 2. Source: Europeana Strategic Plan, 2015-2020, currently unpublished. See also Enumerate Project, enumerate.eu
  • 3. The estimated total cost of digitising the collections of Europe’s museums, archives and libraries, including the audiovisual material they hold, is approximately €100bn, or €10bn per annum for the next 10 years, factoring in a cumulative efficiency gain of 0.5% per annum. The Research & Development budget for the Joint Strike Fighter programme is estimated at €40.34bn. It would cost between 10% and 40% of the Joint Strike Fighter R&D budget to digitise every eligible title in Europe’s libraries. Source: Nick Poole, Collections Trust, http://nickpoole.org.uk/wp-content/uploads/2011/12/digiti_report.pdf
  • 4. Currently: 2 million pages of full text. By 2015: 10 million pages of full text. Search by keyword, and organise results by language, date, source library and title. Link: http://www.theeuropeanlibrary.org/tel4/newspapers
  • 5. Currently: metadata records relating to 1.12m issues. By 2015: metadata records relating to up to 4m issues. Browse by date or map. Link: http://www.theeuropeanlibrary.org/tel4/newspapers
  • 6. Full text from the following libraries: • Bibliothèque nationale de France / National Library of France • Koninklijke Bibliotheek / National Library of the Netherlands • Landesbibliothek Dr. Friedrich Teßmann / Teßmann Library • Eesti Rahvusraamatukogu / Estonian National Library • Kansalliskirjasto / National Library of Finland • Latvijas Nacionala Biblioteka / National Library of Latvia • Biblioteka Narodowa / National Library of Poland • Milli Kutuphane Baskanligi / National Library of Turkey • Österreichische Nationalbibliothek / Austrian National Library • Staatsbibliothek zu Berlin / Berlin State Library • Staats- und Universitätsbibliothek Hamburg / State and University Library Hamburg • Univerzitet u Beogradu / University Library of Belgrade. Searching by title
  • 7. Issue-level records from the following libraries: • National Library of Wales • St. Cyril and Methodius National Library / The National Library of Bulgaria • National Library of the Czech Republic • National and University Library in Zagreb • Koninklijke Bibliotheek van België / Bibliothèque royale de Belgique • Narodna in univerzitetna knjižnica / National and University Library of Slovenia • National Library of Portugal • National Library of Romania • Landsbókasafn Íslands - Háskólabókasafn / National and University Library of Iceland • National Library of Spain • Bibliothèque nationale de Luxembourg / National Library of Luxembourg. Finding matching results in single or multiple issues
  • 8. Highlighting search terms
  • 9. So far, okay: similar functionality to other national and regional digital libraries of newspapers. See other archives via: https://www.google.com/maps/ms?msid=217164746645697066594.0004c3d764fcb71ed2314&msa=0
  • 10. But what was the user response to an aggregation of European newspaper libraries? Results of usability testing: http://www.europeana-newspapers.eu/wp-content/uploads/2014/05/The-European-Library-Newspaper-Archive-Usability-testing-Report-April-2014.pdf
  • 11. Source: http://www.nytimes.com/2007/03/10/business/yourmoney/11archive.html
  • 12. “Many saying they would be keen to return to the site as the content expands.”
  • 13. “Ability to search over geographic map was highly valued”
  • 14. Plenty of quibbles about design: position of advanced options, re-ordering the list of results, manipulating facets
  • 15. Much greater expectations of functionality once logged in, for example saved searches and new-content notification
  • 16. “Much of the value of the site to participants was provided by the images of the documents. Participants expected to be able to save a 'local' copy once they had located content of relevance. As no download facility is provided, this led to some frustration and undermined the overall potential value of the site for some participants.”
  • 17. Timetable for the rest of the project: Now – prototype version of interface shared with project; Throughout 2014 – ongoing creation of OCR and other related technical work (OLR, named entities); Throughout 2014 – live version of website improved, usability testing, content added; Autumn 2014 – final project conference; Late 2014 – newspaper browser completed with content and tools from project. More information at http://www.europeana-newspapers.eu/ Interface at http://www.theeuropeanlibrary.org/tel4/newspapers/
  • 18. Things the users didn’t say (but we thought they would)
  • 19. Why can’t I edit the text? (Our sample was researchers; perhaps other communities are more interested in crowdsourcing?) Note: if time permits, The European Library will develop a crowdsourcing feature
  • 20. Source: Europeana Strategic Plan, 2015-2020, currently unpublished. See also Enumerate Project, enumerate.eu
  • 21. Number of digitised pages in interface: c.2m. Number of digitised pages in European libraries: c.130m. Number of physical pages in European libraries: 1.5bn+. Source: European Newspaper Survey Report, http://www.europeana-newspapers.eu/wp-content/uploads/2012/04/D4.1-Europeana-newspapers-survey-report.pdf
  • 22. Quantities of newspapers: a) in project, b) digitised in total, c) in physical libraries. Source: European Newspaper Survey Report, http://www.europeana-newspapers.eu/wp-content/uploads/2012/04/D4.1-Europeana-newspapers-survey-report.pdf
  • 23. The project digital library is only a fraction of the newspaper archive of the continent, indeed the world
  • 24. As libraries, how should we represent that absence to users?
  • 25. Should such absence be represented in the interface itself?
  • 26. Vast white spaces in the list of results?
  • 27. … Difficult to represent ‘archival gaps’ when seen in the context of how little has been digitised; it creates a needle in a haystack …
  • 28. The estimated total cost of digitising the collections of Europe’s museums, archives and libraries, including the audiovisual material they hold, is approximately €100bn, or €10bn per annum for the next 10 years, factoring in a cumulative efficiency gain of 0.5% per annum. The Research & Development budget for the Joint Strike Fighter programme is estimated at €40.34bn. It would cost between 10% and 40% of the Joint Strike Fighter R&D budget to digitise every eligible title in Europe’s libraries. Source: Nick Poole, Collections Trust, http://nickpoole.org.uk/wp-content/uploads/2011/12/digiti_report.pdf
  • 29. Standardised information for every digital resource, representing collections, extent of content, licensing and re-use conditions
  • 30. Standardised information? For every digital resource produced in the world? Are you kidding?
  • 31. Charts and graphs external to the interface?
  • 32. Graphs are the most obvious way of adding context, but they still rely heavily on the library producing such charts
  • 33. How to derive a representative (random) sample from a digital collection? Source: http://dilbert.com/strips/comic/2001-10-25/
  • 34. Pieter Francois, winner of the BL Labs competition 2013: “How representative are the historical texts humanities scholars study of the overall body of ‘surviving’ texts that are held in the various library collections?” labs.bl.uk/Sample+Generator
  • 35. There are other issues in the project content too. Major issues: OCR quality varies; licensing statements differ from country to country; copyright date boundaries differ in each country
  • 36. There are other issues in the interface too. Minor issues: some pages (2m by 2015) have article segmentation; some library content has named entity extraction, affecting search results
  • 37. 10M pages, 7 billion words: how much are you actually ignoring when using only the “good” OCR? Source: http://homepages.inf.ed.ac.uk/balex/publications/slides-DATeCH.pdf
  • 38. How should we allow users better ways to understand the digital library ?
  • 39. What role can the API play in this? Can opening up the data in the digital library allow it to be explored in different ways?
  • 40. Traditional model: Interface (Created by Library) / Data (Published by Library). With an API: Interface (Created by Third Party) / Data (Published by Library). API – Application Programming Interface
  • 41. Pioneering work of Trove API (or rather of Tim Sherratt)
  • 42. Currently: 2 million pages of full text. By 2015: 10 million pages of full text. Search by keyword, and organise results by language, date, source library and title. Link: http://www.theeuropeanlibrary.org/tel4/newspapers
  • 43. Trove Newspapers statistics, developed by third party, based on data provided by library http://wraggelabs.com/shed/trove/graphs/ Interface (Created by Third Party) Data (Published by Library)
  • 44. Headline Roulette, developed by third party, based on data provided by library http://wraggelabs.com/shed/headline-roulette/ Interface (Created by Third Party) Data (Published by Library)
  • 45. Word Count of Articles, developed by third party, based on data provided by library http://dhistory.org/frontpages/53/words/ Interface (Created by Third Party) Data (Published by Library)
  • 46. Sounds great! But…?
  • 47. How many people in this audience would know how to build an interface on top of an API?
  • 48. How many users do you know who could build on top of an API?
  • 49. Currently: metadata records relating to 1.12m issues. By 2015: metadata records relating to up to 4m issues. Browse by date or map. Link: http://www.theeuropeanlibrary.org/tel4/newspapers
  • 50. Credits. Desert: https://www.flickr.com/photos/aigle_dore/5952236932/sizes/l Borges sign: https://www.flickr.com/photos/monceau/7705020640/ Map: http://gallica.bnf.fr/ark:/12148/btv1b530299707 Strike Fighter: http://en.wikipedia.org/wiki/Strike_fighter
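The cost comparison on slide 3 (repeated on slide 28) rests on simple arithmetic. A minimal Python sketch, using only the figures quoted on that slide, makes the implied range explicit; the variable and function names are illustrative, and the 0.5% efficiency gain, which the source says is already factored in, is not modelled separately.

```python
# Figures as quoted on slide 3 (source: Nick Poole, Collections Trust).
TOTAL_COST_EUR_BN = 100.0     # all of Europe's museums, archives and libraries
ANNUAL_COST_EUR_BN = 10.0     # per annum, over the next 10 years
YEARS = 10
JSF_RD_BUDGET_EUR_BN = 40.34  # Joint Strike Fighter R&D budget (estimate)


def share_of(budget: float, fraction: float) -> float:
    """Return the given fraction of a budget, in the budget's own unit."""
    return budget * fraction


# The quoted total is the annual figure times the number of years; the slide
# says the 0.5% efficiency gain is already factored in, so it is not modelled.
assert ANNUAL_COST_EUR_BN * YEARS == TOTAL_COST_EUR_BN

# "Between 10% and 40% of the JSF R&D budget" to digitise every eligible
# title in Europe's libraries (libraries only, not the whole sector).
low = share_of(JSF_RD_BUDGET_EUR_BN, 0.10)
high = share_of(JSF_RD_BUDGET_EUR_BN, 0.40)
ratio = TOTAL_COST_EUR_BN / JSF_RD_BUDGET_EUR_BN

print(f"Library titles: roughly EUR {low:.2f}bn to EUR {high:.2f}bn")
print(f"Whole sector vs JSF R&D budget: {ratio:.2f}x")
```

Run as-is, this gives roughly €4.03bn to €16.14bn for library titles, while the whole-sector estimate is about 2.48 times the JSF R&D budget.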
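The page counts on slide 21 are easier to grasp as proportions. The Python sketch below uses only those three figures; because the physical count is given as 1.5bn+ (a lower bound), any share computed against it is an upper bound.

```python
# Page counts as given on slide 21 (European Newspaper Survey Report).
PAGES_IN_INTERFACE = 2_000_000        # c.2m
PAGES_DIGITISED = 130_000_000         # c.130m
PAGES_PHYSICAL = 1_500_000_000        # 1.5bn+, i.e. a lower bound


def pct(part: int, whole: int) -> float:
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


interface_of_digitised = pct(PAGES_IN_INTERFACE, PAGES_DIGITISED)
digitised_of_physical = pct(PAGES_DIGITISED, PAGES_PHYSICAL)    # upper bound
interface_of_physical = pct(PAGES_IN_INTERFACE, PAGES_PHYSICAL)  # upper bound

print(f"Interface / digitised: {interface_of_digitised:.1f}%")
print(f"Digitised / physical:  at most {digitised_of_physical:.1f}%")
print(f"Interface / physical:  at most {interface_of_physical:.2f}%")
```

With these figures the interface holds about 1.5% of the digitised pages, digitised pages are at most about 8.7% of the physical pages, and the interface covers at most about 0.13% of the physical archive.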
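Slide 29 leaves open what "standardised information" for a digital resource might contain. One way to picture it is a small collection-level record covering extent, licensing and re-use. The class, every field name and the example values below are hypothetical and invented for illustration; they do not represent any existing standard.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional, Tuple


@dataclass
class CollectionDescription:
    """Hypothetical collection-level record; not an existing standard."""
    title: str
    holding_institution: str
    digitised_pages: int
    estimated_physical_pages: Optional[int]  # None if unknown to the library
    date_range: Tuple[int, int]              # (first year, last year)
    languages: List[str] = field(default_factory=list)
    licence: str = "unknown"                 # e.g. a licence URI
    reuse_conditions: str = "unknown"

    def coverage(self) -> Optional[float]:
        """Share of the physical run that has been digitised, if known."""
        if not self.estimated_physical_pages:
            return None
        return self.digitised_pages / self.estimated_physical_pages

    def missing_fields(self) -> List[str]:
        """Fields a user would need but the record leaves undocumented."""
        gaps = [name for name in ("licence", "reuse_conditions")
                if getattr(self, name) == "unknown"]
        if self.estimated_physical_pages is None:
            gaps.append("estimated_physical_pages")
        if not self.languages:
            gaps.append("languages")
        return gaps


# Invented example values, for illustration only.
record = CollectionDescription(
    title="Example Gazette",
    holding_institution="Example National Library",
    digitised_pages=50_000,
    estimated_physical_pages=400_000,
    date_range=(1850, 1900),
    languages=["fr"],
)
print(json.dumps(asdict(record), indent=2))
print("coverage:", record.coverage())
print("missing:", record.missing_fields())
```

A record like this would let an interface state both how much of a title is online (here 12.5%) and what is still undocumented (here the licence and re-use conditions), which is the kind of absence the talk asks libraries to represent.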
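Slide 33 asks how to derive a representative (random) sample from a digital collection. Two standard approaches are sketched below in Python: a simple random sample, and a stratified sample that draws a fixed number of issues per decade so that thinly digitised periods are not swamped. The issue identifiers and the skewed toy collection are invented for illustration. Either sample can only be representative of what has been digitised, not of the physical archive.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Issue = Tuple[str, int]  # (hypothetical issue identifier, publication year)


def simple_random_sample(issues: List[Issue], k: int, seed: int = 0) -> List[Issue]:
    """Every issue has the same chance of being chosen."""
    return random.Random(seed).sample(issues, k)


def stratified_sample(issues: List[Issue], per_decade: int, seed: int = 0) -> List[Issue]:
    """Draw up to `per_decade` issues from each decade (equal allocation)."""
    rng = random.Random(seed)
    by_decade: Dict[int, List[Issue]] = defaultdict(list)
    for issue in issues:
        by_decade[issue[1] // 10 * 10].append(issue)
    sample: List[Issue] = []
    for decade in sorted(by_decade):
        bucket = by_decade[decade]
        sample.extend(rng.sample(bucket, min(per_decade, len(bucket))))
    return sample


# Toy collection, deliberately skewed: 20 issues from the 1850s, 500 from the 1900s.
issues: List[Issue] = (
    [(f"issue-{y}-{n}", y) for y in range(1850, 1860) for n in range(2)]
    + [(f"issue-{y}-{n}", y) for y in range(1900, 1910) for n in range(50)]
)

srs = simple_random_sample(issues, 30)
strat = stratified_sample(issues, per_decade=15)
print("1850s in simple random sample:", sum(1 for _, y in srs if y < 1900))
print("1850s in stratified sample:   ", sum(1 for _, y in strat if y < 1900))
```

The stratified sample always contains 15 issues from the 1850s, whereas the simple random sample will usually contain only one or two. The trade-off is that equal allocation over-represents sparse decades relative to the collection, so collection-wide estimates drawn from it need to weight each issue by its stratum's size.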
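Slide 37 asks how much text is ignored when only the "good" OCR is used. Given a per-page OCR confidence score (assumed here to be available, which is not true of every collection), the share of words discarded by a threshold is straightforward to measure. The `Page` record and its values are hypothetical, and filtering whole pages is only one possible policy; filtering individual words would be another.

```python
from typing import List, NamedTuple


class Page(NamedTuple):
    page_id: str       # hypothetical identifier
    words: int         # number of OCR'd words on the page
    confidence: float  # assumed mean OCR word confidence, between 0 and 1


def ignored_share(pages: List[Page], threshold: float) -> float:
    """Fraction of all words lost if only pages with
    confidence >= threshold are kept."""
    total = sum(p.words for p in pages)
    if total == 0:
        return 0.0
    kept = sum(p.words for p in pages if p.confidence >= threshold)
    return (total - kept) / total


# Invented toy pages, for illustration only.
pages = [
    Page("p1", 1000, 0.95),
    Page("p2", 800, 0.72),
    Page("p3", 1200, 0.60),
    Page("p4", 1000, 0.88),
]
for threshold in (0.7, 0.8, 0.9):
    print(f"threshold {threshold}: {ignored_share(pages, threshold):.0%} of words ignored")
```

For the four toy pages, thresholds of 0.7, 0.8 and 0.9 discard 30%, 50% and 75% of the words respectively. At the scale quoted on the slide, discarding even a tenth of 7 billion words would leave out roughly 700 million words.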
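The API model on slides 40 to 45 separates data published by the library from interfaces built by third parties, such as the Trove graphs on slide 43. A minimal third-party step might look like the Python below: it takes a JSON response and counts articles per year, ready for charting. The response shape, including the `records` and `date` fields, is an assumption for illustration and does not reproduce Trove's or any other library's actual API; fetching it over HTTP (for example with `urllib.request`) would follow whatever endpoint and key the library documents.

```python
import json
from collections import Counter
from typing import Dict

# Hypothetical API response; field names are assumptions, not a real schema.
SAMPLE_RESPONSE = """
{"records": [
  {"id": "a1", "title": "Example headline", "date": "1901-03-02"},
  {"id": "a2", "title": "Another headline", "date": "1901-07-15"},
  {"id": "a3", "title": "A third headline", "date": "1915-01-09"}
]}
"""


def articles_per_year(response_text: str) -> Dict[int, int]:
    """Count records per publication year (first four characters of the
    ISO-style date), as a third-party interface might before charting."""
    records = json.loads(response_text)["records"]
    counts = Counter(int(r["date"][:4]) for r in records)
    return dict(sorted(counts.items()))


# A crude text histogram: one '#' per article.
for year, n in articles_per_year(SAMPLE_RESPONSE).items():
    print(year, "#" * n)
```

The library's only job here is to publish the data; how it is counted, charted or turned into a game is left to whoever builds on the API, which is both the appeal of the model and, as slides 47 and 48 point out, its limit.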
