Digitised historic newspapers in Europe


Results of a survey on newspaper digitisation with European public libraries. Also, plans of The European Library to build a cross-search tool incorporating library collections

Published in: Education

  1. Surveying Newspaper Digitisation in European Libraries, Then Aggregating Them! Europeana Newspapers. Alastair Dunning, Programme Manager, The European Library. @alastairdunning, alastair.dunning AT kb.nl. LIBER Conference, June 2013, Munich. This presentation is at http://www.slideshare.net/alastairdunning
  2. On November 3, 1948, the early edition of the Chicago Tribune proclaimed Thomas Dewey as winner of the US presidential campaign. http://www.chicagotribune.com/news/politics/chi-histdewey_defeats_an20080104104816,0,547284.photo
  3. In actual fact, the campaign was won by Harry Truman, the incumbent and 33rd President of the United States. http://en.wikipedia.org/wiki/File:Deweytruman12.jpg
  4. Later editions of the Chicago Tribune corrected this mistake with headline "DEMOCRATS MAKE SWEEP OF STATE OFFICES". However, I cannot find these online! http://en.wikipedia.org/wiki/File:Deweytruman12.jpg
  5. As we shall see, presenting comprehensive digital archives, where everything is digitised, is difficult... yet this is what users often demand!
  6. "This lack of collocation and collection presents efficiency challenges and deepens scholars’ concerns about comprehensiveness. The anxiety over “missing something” was quite common across interviews." Ithaka S+R, Supporting the Changing Research Practices of Historians, http://www.sr.ithaka.org/research-publications/supporting-changing-research-practices-historians
  7. "When lined up against the non-digital object upon which it is based, the digital object can only ever appear impoverished." Jim Mussell, Historian at University of Birmingham. http://jimmussell.com/2013/05/23/the-proximal-past-digital-archives-and-the-here-and-now/
  8. Genealogists - those studying family history. "Genealogists represent the majority of users in many archives. And yet, the traditional archival information system does not meet their needs." Wendy M. Duff, Catherine A. Johnson, Where Is the List with All the Names? Information-Seeking Behavior of Genealogists, American Archivist, Volume 66(1), 2003, http://archivists.metapress.com/content/L375UJ047224737N
  9. Despite this, European libraries have made great strides in digitising their newspapers. (These results are taken from the first Europeana Newspapers survey, 2012. 47 libraries responded.) http://www.europeana-newspapers.eu/wp-content/uploads/2012/04/D4.1-Europeana-newspapers-survey-report.pdf
  10. 129,041,663 pages from 23,987 titles
  11. 11 libraries have digitised more than 3m pages:
      1. National Library of Czech Republic
      2. Koninklijke Bibliotheek van België
      3. National Library of Spain
      4. National Library of Norway
      5. National and University Library of Iceland
      6. BCU Lausanne
      7. Hamburg State and University Library
      8. Bibliothèque nationale de France
      9. British Library
      10. Koninklijke Bibliotheek
      11. Austrian National Library
  12. But only 12 (26%) of the libraries had digitised more than 10% of their collection (either in terms of titles or page numbers)
  13. National Library of Luxembourg: 620,000 pages digitised; 4,000,000 pages in collection
  14. National Library of Finland: 620,000 pages digitised; 2,010,246 pages in collection
  15. Hamburg State and University Library: c. 2,000,000 pages digitised; c. 12,000,000 pages in collection
  16. What else did the survey discover?
  17. Access to digitised newspapers is nearly always free of charge. At least 40 (85%) offered free access to their digitised newspapers. One library had pay per view, whilst another three offered subscription services for users (i.e. paid access per day or per month). Only four libraries licensed their newspaper contents to other groups (e.g. schools, universities).
  18. Access to twentieth-century content remains problematic. 27 out of 47 libraries (57%) have a cut-off date beyond which they will not publish digitised newspapers on the web. Most frequently, this is based on a 70-year sliding scale. 23% (11 out of 47) had an agreement with a rights organisation so that in-copyright digitised newspapers could be published, but often restricted to individual titles.
  19. There is still much to be done to exploit the richness of digitised newspaper content. 64% (37 from 47) of libraries made use of OCR. But only 17 of these libraries (36%) exposed the resulting full text to the viewer. 36% had undertaken zoning and segmentation, and only six libraries (13%) had included features such as faceted browsing or extracting entities such as place or name.
  20. --> Motivation for Europeana Newspapers. Other WPs will explain the process of improving digitised archives, but I want to return to one earlier quote
  21. "... the lack of comprehensive search tools for primary sources ..." Locating primary sources presents a crucial challenge for researchers. --> TEL aggregator as part of Europeana Newspapers project
  22. Timetable: Early version with limited content added to The European Library website in September 20. More content being added in 2013 and 2014
  23. http://theeuropeanlibrary.org will deliver a search interface to help locate 18m pages digitised at European libraries. Users will also be able to search over titles of newspapers. Title metadata will also be forwarded to Europeana
  24. Some Issues: Copyright means that some images cannot be shared at all, only metadata (e.g. names and dates of newspapers)
  25. Some Issues: OCR and zoning quality will affect search results significantly. E.g. higher quality OCR will be returned more often in search results
  26. Some Issues: Some pages have no OCR whatsoever - more difficult to find
  27. Some Issues: Different libraries are willing to share different amounts of content. Some libraries are happy for full content to be shared; for others it is just snippets of images
  28. Last Thoughts and What Next?: The European Library will sustain access beyond project funding; but adding more content will require membership of TEL. How can we allow for transcription? What do non-academic users want? How do we create full-text APIs?
  29. Oh, the results here were all based on the first edition of the project survey. If your library wants to contribute to later editions, see links by July 2013: http://www.europeana-newspapers.eu/tell-us-about-your-newspaper-digitisation-project/ http://www.surveymonkey.com/s/BQ28579
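
As a rough illustration of the "more than 10% of their collection" measure on slide 12, the page counts quoted on slides 13 to 15 can be turned into digitised shares. This is only a sketch: the function and variable names are made up for the example, the Hamburg figures are approximate ("c." in the slides), and the survey itself may have computed its percentages differently (for instance by titles rather than pages).

```python
# Page counts as quoted on slides 13-15: (pages digitised, pages in collection).
# The Hamburg values are approximate ("c." in the slides).
collections = {
    "National Library of Luxembourg": (620_000, 4_000_000),
    "National Library of Finland": (620_000, 2_010_246),
    "Hamburg State and University Library": (2_000_000, 12_000_000),
}


def digitised_share(digitised: int, total: int) -> float:
    """Return the percentage of a collection's pages that has been digitised."""
    return 100.0 * digitised / total


# Hypothetical helper mapping: library name -> share of pages digitised (percent).
shares = {name: digitised_share(d, t) for name, (d, t) in collections.items()}

for name, share in shares.items():
    flag = "above" if share > 10 else "at or below"
    print(f"{name}: {share:.1f}% digitised ({flag} the 10% threshold)")
```

On these figures all three libraries sit above the 10% mark, with roughly 15.5%, 30.8% and 16.7% of their pages digitised respectively.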