How to Build a Digital Library



Large-Scale Digital Archives: Publisher and Library Case Studies

Speakers: Thijs Willems, Project Manager, Online Archives, Springer; Jasper Faase, Project Manager, Newspaper Digitization Project, National Library of the Netherlands.

This session will present two large-scale digitization projects: the Springer Book Archives and the newspaper digitization project of the National Library of the Netherlands (the Dutch KB). The audience will learn the ‘nuts and bolts’ of these projects: key decisions, timelines, consequences for internal and external stakeholders, production matters, and clearing hurdles such as rights and permissions. Topics for discussion with the audience include the impact these initiatives may have on long-term preservation, the physical library, metadata and discoverability, author relations, and the long tail of usage.

  1. How to build a digital library? A case of newspaper digitization at the National Library of the Netherlands
     3 November 2011
     Jasper Faase, Project Manager Digitization
     Email:
  2. Mission
     • The Koninklijke Bibliotheek is the national library of the Netherlands: we bring people and information together.
     • Our core values are accessibility, sustainability, innovation and cooperation.
     Koninklijke Bibliotheek – Nationale bibliotheek van Nederland
  3. Vision
     We:
     • Offer everyone, everywhere, access to everything published in the Netherlands
     • Play a central role in the (scientific) information infrastructure of the Netherlands
     • Promote permanent access to digital information nationally and internationally
  4. How do we translate this vision into practice?
     • Mass digitization: where possible, in public-private partnerships
     • Speeding up digitization: by the end of 2013, 10% of all Dutch books, newspapers and magazines will be digitized (60 million pages)
  5. Digitization at the Dutch National Library (KB)
     Project                                        Timeline        Pages
     Dutch Parliamentary Newspapers                 2004-2010   2,500,000
     Dutch Daily Newspapers                         2007-2012   9,000,000
     Early Dutch Books Online                       2008-2010   2,000,000
     Magazines                                      2009-2011   1,500,000
     Google                                         2011-2014  30,000,000
     ProQuest                                       2011-2016   6,000,000
     Metamorfoze (Books, Newspapers and Magazines)  2012-2016   9,000,000
     Total                                                     60,000,000
  6. Project workflow
     1. Selection: KB + partners
     2. Material preparation: KB
     3. Scanning, OCR + metadata: outsourced
     4. Quality assessment: KB
     5. Presentation & storage: KB + partners
  7. The scope is ‘everything’: but where to start?
  8. Selection of newspapers (1618-1995)
  9. Copyright
     • Freelancers hold the rights until 70 years after the death of the ‘author’
     • Publishers hold the rights until 70 years after publication
     • Online publication is only possible with the permission of the copyright holders
     • The KB negotiated an agreement with the representative bodies of freelancers, journalists and photographers (Lira/Pictoright)
     • The KB negotiated successfully with 15 publishers to clear copyrights
     • Result: 102 Dutch newspapers, up to 1995, will be published online and can be accessed free of charge
  10. Material preparation
      • Every page is checked
      • Small repairs are carried out
      • Metadata is added
      • Bindings are prepared for transportation
      • Cut and digitize (taking volumes out of their bindings before scanning) is not an option
  11. A workflow of 50,000 pages per week
  12. Digitization
      • Digitization is outsourced
      • The final goal (e.g. web service and digital preservation) drives the technical choices
      • Metadata enrichment to improve the quality of the automated segmentation and Optical Character Recognition (OCR) process
      • Output:
        • JPEG2000 (master images and access images)
        • PDF
        • Descriptive, structural and technical metadata (DCX, MPEG21/METS, ALTO, MIX)
  13. Quality assessment
      • QA takes a lot of time, so we automate it as far as possible
      • Automatic checks on the validity of XML files, file names, correlation between different files, and completeness
      • Sampling for all aspects that cannot be checked automatically (e.g. results of OCR correction in headers, segmentation)
      • Focus on fixing structural problems rather than incidental errors
      • Balance between quantity and quality
  14. Presentation & storage
      • Generic architecture to support all our content:
        • central metadata store and search engine
        • open architecture through the use of standards (Dublin Core) and protocols (OAI, SRU)
      • Article-level access
      • Advanced search options: date of publication, city of publication, specific newspaper(s), article type
      • Storage: long-term preservation of the results in the KB’s E-Depot
  15. (slide without text)
  16. Mass digitization: lessons learned
      • Do your homework: perform desktop research, develop a clear functional design and run a pilot phase
      • Define detailed specifications and workflows for different source types, and stick to them
      • Start early with negotiations to clear titles for online publication
      • Planning is vital to stay in control: transport material to suppliers regularly and agree on a detailed delivery schedule
      • Don’t underestimate the costs of developing the technical and organisational infrastructure for mass digitization
  17. Costs of newspaper digitization
      Newspapers: price per page (all-in)
      • Labour costs: €0.59
      • Digitization: €0.68
      • Infrastructure and software: €0.16
      • Conservation: €0.01
      • Miscellaneous costs: €0.05
  18. Challenges
      • Cutting down the price of digitization
      • Bringing the physical and digital library together
      • Linking initiatives of potential partners to our ambitions
      • Bringing digital collections together by providing a digital platform for all Dutch books, newspapers and magazines
      • Improving the quality of OCR for historical text (IMPACT)
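
The page counts in the table on slide 5 add up to the stated total. A minimal Python check, which simply restates the table as a dictionary (the variable names are illustrative), makes the arithmetic explicit and shows that the Google partnership accounts for half of all pages:

```python
# Page counts per project, restated from the table on slide 5.
PAGES_PER_PROJECT = {
    "Dutch Parliamentary Newspapers": 2_500_000,
    "Dutch Daily Newspapers": 9_000_000,
    "Early Dutch Books Online": 2_000_000,
    "Magazines": 1_500_000,
    "Google": 30_000_000,
    "ProQuest": 6_000_000,
    "Metamorfoze (Books, Newspapers and Magazines)": 9_000_000,
}

total_pages = sum(PAGES_PER_PROJECT.values())
google_share = PAGES_PER_PROJECT["Google"] / total_pages

print(f"Total: {total_pages:,} pages")      # Total: 60,000,000 pages
print(f"Google share: {google_share:.0%}")  # Google share: 50%
```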
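
The two copyright terms on slide 9 can be sketched as a small Python helper. This is a simplified model of the rules as the slide states them, not of Dutch copyright law in full: it assumes protection runs through the 70th calendar year after publication or death, and that an item with a freelance contribution stays protected while either term runs. The function names `protected_until` and `needs_clearance` are hypothetical.

```python
from typing import Optional

TERM_YEARS = 70  # both terms on slide 9 run for 70 years


def protected_until(publication_year: int,
                    freelancer_death_year: Optional[int] = None) -> int:
    """Return the last calendar year in which an item is protected (simplified).

    Publisher rights: 70 years after publication.
    Freelancer rights: 70 years after the freelancer's death.
    Assumption: if a freelancer contributed, the later of the two terms applies.
    """
    publisher_end = publication_year + TERM_YEARS
    if freelancer_death_year is None:
        return publisher_end
    return max(publisher_end, freelancer_death_year + TERM_YEARS)


def needs_clearance(publication_year: int,
                    freelancer_death_year: Optional[int] = None,
                    current_year: int = 2011) -> bool:
    """True if the item is still protected in `current_year` (default: 2011, the year of the talk)."""
    return current_year <= protected_until(publication_year, freelancer_death_year)


print(protected_until(1950))        # 2020
print(protected_until(1930, 1960))  # 2030
print(needs_clearance(1900, 1920))  # False
```

Under these simplified assumptions, anything published in 1941 or later was still protected by publisher rights in 2011, which illustrates why clearance was needed before newspapers up to 1995 could go online.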
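
Slide 11 gives a throughput of 50,000 pages per week. The slide does not say which project this rate refers to; assuming it applies to the 9 million pages of the Dutch Daily Newspapers project from slide 5, and assuming a constant rate with no interruptions, a rough Python estimate of the production time looks like this (`weeks_needed` is a hypothetical helper):

```python
PAGES_PER_WEEK = 50_000  # throughput stated on slide 11


def weeks_needed(pages: int, pages_per_week: int = PAGES_PER_WEEK) -> int:
    """Whole weeks needed to process `pages` at a constant weekly rate (ceiling division)."""
    return -(-pages // pages_per_week)


weeks = weeks_needed(9_000_000)  # Dutch Daily Newspapers, slide 5 (assumed)
print(weeks, round(weeks / 52, 1))  # 180 3.5
```

About three and a half years of continuous production fits inside the project's 2007-2012 timeline.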
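
Slide 12 lists ALTO among the output formats; ALTO files hold the OCR text together with its layout on the page. The following Python sketch extracts the text line by line using only the standard library. It is a simplification under stated assumptions: it reads only `TextLine` and `String` elements, ignores namespaces so that it is not tied to one ALTO version, and does not rejoin hyphenated words. The sample fragment is hand-made, not taken from a KB delivery, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop the namespace part of a tag, e.g. '{http://...}String' -> 'String'."""
    return tag.rsplit("}", 1)[-1]


def alto_text_lines(xml_text: str) -> list[str]:
    """Return one string per ALTO TextLine, joining its String CONTENT values with spaces.

    Namespaces are ignored; hyphenation (HYP) and confidence attributes are not handled.
    """
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) == "TextLine":
            words = [child.get("CONTENT", "")
                     for child in element
                     if local_name(child.tag) == "String"]
            lines.append(" ".join(words))
    return lines


# A tiny hand-made ALTO fragment (hypothetical content).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine><String CONTENT="Amsterdam,"/><SP/><String CONTENT="3"/><SP/><String CONTENT="November"/></TextLine>
    <TextLine><String CONTENT="Binnenland"/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

print(alto_text_lines(SAMPLE_ALTO))  # ['Amsterdam, 3 November', 'Binnenland']
```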
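
The automatic checks on slide 13 (XML validity, file names, correlation between files, completeness) and the sampling of what cannot be checked automatically can be sketched in Python as below. Everything concrete here is an assumption for illustration: the file-naming pattern, the rule that every page needs one JPEG2000 image and one XML file, and the function names. The XML check only tests well-formedness; validating against the ALTO or METS schemas would need a schema-aware library.

```python
import random
import re
import xml.etree.ElementTree as ET

# Hypothetical naming convention: <batch id>_<4-digit page number>.<jp2|xml>
NAME_PATTERN = re.compile(r"^(?P<batch>[a-z]+\d+)_(?P<page>\d{4})\.(?P<ext>jp2|xml)$")
REQUIRED_EXTENSIONS = {"jp2", "xml"}  # image + OCR file per page (assumption)


def check_batch(files: dict[str, bytes], expected_pages: int) -> list[str]:
    """Run automatic checks on one delivered batch and return the problems found.

    `files` maps each file name to its content.
    """
    problems = []
    found: dict[int, set[str]] = {}  # page number -> extensions delivered

    for name, content in files.items():
        match = NAME_PATTERN.match(name)
        if match is None:  # file-name check
            problems.append(f"bad file name: {name}")
            continue
        found.setdefault(int(match["page"]), set()).add(match["ext"])
        if match["ext"] == "xml":
            try:
                ET.fromstring(content)  # well-formedness only, no schema validation
            except ET.ParseError as exc:
                problems.append(f"invalid XML in {name}: {exc}")

    # Correlation check: every page needs both its image and its OCR file.
    for page, extensions in sorted(found.items()):
        missing = REQUIRED_EXTENSIONS - extensions
        if missing:
            problems.append(f"page {page}: missing {', '.join(sorted(missing))}")

    # Completeness check: pages 1..expected_pages must all be present.
    for page in range(1, expected_pages + 1):
        if page not in found:
            problems.append(f"page {page}: not delivered")
    return problems


def manual_review_sample(pages: list[int], size: int, seed: int = 0) -> list[int]:
    """Draw a reproducible random sample of pages for checks that cannot be automated."""
    rng = random.Random(seed)
    return sorted(rng.sample(pages, min(size, len(pages))))


batch = {
    "ddd010_0001.jp2": b"...",
    "ddd010_0001.xml": b"<alto/>",
    "ddd010_0002.jp2": b"...",
    "ddd010_0002.xml": b"<alto>",  # truncated XML
    "scan 3.tif": b"...",
}
problems = check_batch(batch, expected_pages=3)
for problem in problems:
    print(problem)
print(manual_review_sample(list(range(1, 101)), size=5))
```

Returning a list of problems rather than stopping at the first error fits the slide's advice to focus on structural problems: many pages failing the same check point to a workflow issue rather than an incidental error.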
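
Slide 14 mentions OAI and SRU as the open protocols of the presentation architecture. The Python sketch below only builds request URLs for an SRU 1.2 searchRetrieve operation and an OAI-PMH ListRecords harvest; it sends no requests. The base URLs are placeholders, the CQL index `dc.title` is an assumption (each SRU server defines its own indexes), and the function names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical base URLs; substitute the endpoints of the service you actually query.
SRU_BASE_URL = "https://sru.example.org/search"
OAI_BASE_URL = "https://oai.example.org/oai"


def sru_search_url(cql_query: str, start_record: int = 1, maximum_records: int = 10) -> str:
    """Build an SRU 1.2 searchRetrieve request URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    return f"{SRU_BASE_URL}?{urlencode(params)}"


def oai_list_records_url(metadata_prefix: str = "oai_dc", set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL, optionally limited to one set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return f"{OAI_BASE_URL}?{urlencode(params)}"


# The CQL index name is an assumption; check the server's explain record for real indexes.
print(sru_search_url('dc.title any "Handelsblad"'))
print(oai_list_records_url())
```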
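
The per-page cost components on slide 17 can be totalled with Python's `decimal` module, which avoids binary floating-point rounding for money amounts. The dictionary restates the slide; the all-in total is simply the sum of the listed items, and the variable names are illustrative.

```python
from decimal import Decimal

# All-in cost per newspaper page, in euros, restated from slide 17.
COST_PER_PAGE = {
    "Labour costs": Decimal("0.59"),
    "Digitization": Decimal("0.68"),
    "Infrastructure and software": Decimal("0.16"),
    "Conservation": Decimal("0.01"),
    "Miscellaneous costs": Decimal("0.05"),
}

# Decimal keeps the cent amounts exact when adding them up.
total_per_page = sum(COST_PER_PAGE.values(), Decimal("0"))
print(f"Total per page: €{total_per_page}")  # Total per page: €1.49

for item, cost in COST_PER_PAGE.items():
    share = cost / total_per_page
    print(f"{item:<28} €{cost}  {share:.0%}")
```

The digitization item itself (€0.68) is thus less than half of the all-in price of €1.49, and labour costs (€0.59) are almost as large.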