Europeana Newspapers: Surveying Newspaper Digitisation in European Libraries, then Aggregating Them

  1. Europeana Newspapers. Alastair Dunning, Programme Manager, The European Library, @alastairdunning, alastair.dunning AT kb.nl. LIBER Conference, June 2013, Munich. Surveying Newspaper Digitisation in European Libraries, Then Aggregating Them. This presentation is at http://www.slideshare.net/alastairdunning
  2. On November 3, 1948, the early edition of the Chicago Tribune proclaimed Thomas Dewey the winner of the US presidential election. http://www.chicagotribune.com/news/politics/chi-histdewey_defeats_an20080104104816,0,547284.photo
  3. In actual fact, the election was won by the incumbent, Harry Truman, the 33rd President of the United States. http://en.wikipedia.org/wiki/File:Deweytruman12.jpg
  4. Later editions of the Chicago Tribune corrected this mistake with the headline "DEMOCRATS MAKE SWEEP OF STATE OFFICES". However, I cannot find these online! http://en.wikipedia.org/wiki/File:Deweytruman12.jpg
  5. As we shall see, presenting comprehensive digital archives, where everything is digitised, is difficult... yet this is what users often demand!
  6. "This lack of collocation and collection presents efficiency challenges and deepens scholars’ concerns about comprehensiveness. The anxiety over “missing something” was quite common across interviews." Ithaka S+R, Supporting the Changing Research Practices of Historians, http://www.sr.ithaka.org/research-publications/supporting-changingresearch-practices-historians
  7. "When lined up against the non-digital object upon which it is based, the digital object can only ever appear impoverished." Jim Mussell, Historian at University of Birmingham, http://jimmussell.com/2013/05/23/the-proximal-pastdigital-archives-and-the-here-and-now/
  8. Genealogists (those studying family history): "Genealogists represent the majority of users in many archives. And yet, the traditional archival information system does not meet their needs." Wendy M. Duff, Catherine A. Johnson, Where Is the List with All the Names? Information-Seeking Behavior of Genealogists, American Archivist, Volume 66(1), 2003, http://archivists.metapress.com/content/L375UJ047224737N
  9. Despite this, European libraries have made great strides in digitising their newspapers. (These results are taken from the first Europeana Newspapers survey, 2012; 47 libraries responded.) http://www.europeana-newspapers.eu/wp-content/uploads/2012/04/D4.1-Europeananewspapers-survey-report.pdf
  10. 129,041,663 pages from 23,987 titles
  11. 11 libraries have digitised more than 3m pages: National Library of the Czech Republic; Koninklijke Bibliotheek van België; National Library of Spain; National Library of Norway; National and University Library of Iceland; BCU Lausanne; Hamburg State and University Library; Bibliothèque nationale de France; British Library; Koninklijke Bibliotheek; Austrian National Library.
  12. But only 12 (26%) of the libraries had digitised more than 10% of their collection (either in terms of titles or page numbers).
  13. National Library of Luxembourg: 620,000 pages digitised; 4,000,000 pages in collection
  14. National Library of Finland: 620,000 pages digitised; 2,010,246 pages in collection
  15. Hamburg State and University Library: c. 2,000,000 pages digitised; c. 12,000,000 pages in collection
  16. What else did the survey discover?
  17. Access to digitised newspapers is nearly always free of charge. At least 40 libraries (85%) offered free access to their digitised newspapers. One library had pay-per-view, whilst another three offered subscription services for users (i.e. paid access per day or per month). Only four libraries licensed their newspaper content to other groups (e.g. schools, universities).
  18. Access to twentieth-century content remains problematic. 27 out of 47 libraries (57%) have a cut-off date beyond which they will not publish digitised newspapers on the web. Most frequently, this is based on a 70-year sliding scale. 23% (11 out of 47) had an agreement with a rights organisation so that in-copyright digitised newspapers could be published, but this was often restricted to individual titles.
  19. There is still much to be done to exploit the richness of digitised newspaper content. 64% (37 from 47) of libraries made use of OCR, but only 17 of these libraries (36% of all respondents) exposed the resulting full text to the viewer. 36% had undertaken zoning and segmentation, and only six libraries (13%) had included features such as facetted browsing or the extraction of entities such as places or names.
  20. --> Motivation for Europeana Newspapers. Other work packages (WPs) will explain the process of improving digitised archives, but I want to return to one earlier quote.
  21. "... the lack of comprehensive search tools for primary sources ..." Locating primary sources presents a crucial challenge for researchers. --> The European Library (TEL) aggregator as part of the Europeana Newspapers project
  22. Timetable: an early version with limited content added to The European Library website in September 20; more content being added in 2013 and 2014.
  23. http://theeuropeanlibrary.org will deliver a search interface to help locate 18m pages digitised at European libraries. Users will also be able to search over newspaper titles. Title metadata will also be forwarded to Europeana.
  24. Some Issues: copyright means that some images cannot be shared at all, only metadata (e.g. names and dates of newspapers).
  25. Some Issues: OCR and zoning quality will affect search results significantly, e.g. pages with higher-quality OCR will be returned more often in search results.
  26. Some Issues: some pages have no OCR whatsoever, which makes them more difficult to find.
  27. Some Issues: different libraries are willing to share different amounts of content. Some libraries are happy for full content to be shared; for others it is just snippets of images.
  28. Last Thoughts and What Next? The European Library will sustain access beyond project funding, but adding more content will require membership of TEL. How can we allow for transcription? What do non-academic users want? How do we create full-text APIs?
  29. Oh, the results here were all based on the first edition of the project survey. If your library wants to contribute to later editions, see these links by July 2013: http://www.europeana-newspapers.eu/tell-us-about-your-newspaperdigitisation-project/ http://www.surveymonkey.com/s/BQ28579
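The point about OCR quality in the "Some Issues" slides can be illustrated with a toy example. This is a minimal sketch under stated assumptions: a plain exact-word search over two hypothetical OCR transcriptions of the 1948 "DEWEY DEFEATS TRUMAN" headline. The page ids, the noisy transcription and the `exact_term_search` function are all invented for illustration; the presentation does not describe how The European Library's search actually works.

```python
# Two hypothetical OCR transcriptions of the same headline: one clean, one
# with typical recognition errors (characters misread or inserted).
# Neither comes from any library's actual data.
pages = {
    "page_clean": "DEWEY DEFEATS TRUMAN",
    "page_noisy": "DEVVEY DEFEATS TRUIMAN",
}


def exact_term_search(query: str, docs: dict[str, str]) -> list[str]:
    """Return ids of documents whose OCR text contains the query as a whole word.

    Matching is case-insensitive but otherwise exact (no fuzzy matching),
    so a single misrecognised character is enough to hide a page.
    """
    term = query.upper()
    return [doc_id for doc_id, text in docs.items() if term in text.upper().split()]


print(exact_term_search("Truman", pages))   # ['page_clean']
print(exact_term_search("defeats", pages))  # ['page_clean', 'page_noisy']
```

Real search systems often soften this with fuzzy matching or OCR post-correction, but the basic effect remains: the noisier a page's OCR, the fewer queries will find it, and a page with no OCR at all can only be found through its metadata.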
