The Big Dutch 20 Year 730 Million Page Digitisation Challenge

The National Library of the Netherlands (KB) is mass-digitizing all Dutch publications issued since 1470. This presentation outlines KB's strategy for making this output publicly available.

Over the next 20 years, the Dutch national library (KB) will mass-digitize all Dutch printed books, newspapers and magazines published since 1470, a total of 730 million pages. Until recently, this work was funded by public money alone. To speed things up amid ongoing budget cuts, KB has entered into public-private partnerships with Google and ProQuest to digitize 42 million pages by 2013. Apart from the availability of funding, digitization priorities are set by a mix of client and institutional needs, such as copyright status, uniqueness, institutional capability and user demand.
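The scale is easier to grasp as a daily rate. The Python sketch below redoes the arithmetic behind the presentation (730 million pages over roughly 7300 days). It assumes 365-day years and a constant pace, with no allowance for weekends or holidays. It also shows what fraction of the total the 42 million partnership pages represent.

```python
# Back-of-the-envelope throughput for KB's 20-year digitisation programme.
# Assumption: 365-day years, matching the "7300 days (approx.)" on the slides.
TOTAL_PAGES = 730_000_000   # all Dutch books, newspapers and magazines since 1470
YEARS = 20
DAYS = YEARS * 365          # 7300 days

PARTNERSHIP_PAGES_BY_2013 = 42_000_000  # Google + ProQuest target quoted above

pages_per_year = TOTAL_PAGES // YEARS
pages_per_day = TOTAL_PAGES // DAYS
partnership_share = PARTNERSHIP_PAGES_BY_2013 / TOTAL_PAGES

print(f"{pages_per_year:,} pages/year")   # 36,500,000 pages/year
print(f"{pages_per_day:,} pages/day")     # 100,000 pages/day
print(f"{partnership_share:.1%} of the total via partnerships by 2013")  # 5.8% ...
```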

At the same time, KB is responding to user demand for centralized access and content distribution by streamlining its scattered portfolio of online services. To this end, KB is developing two strategic lines of action:
* The first concerns metadata (searching FOR publications): in 2013, KB will unify metadata searching across all its paper and digital collections via OCLC's WorldCat Local.
* The second concerns full text (searching IN publications): for searching full-text historic publications (i.e. the output of mass digitization), KB is currently developing its Platform for Digital Publications. Besides being a search engine, the platform is also a:
  * Presentation environment. It associates each full-text object with a standardized webpage and persistent URL, giving all of KB's full texts a uniform look and feel and a unique reference. This landing page enables third-party services (e.g. WorldCat Local, Europeana, Google) to refer to objects in a persistent way.
  * Delivery platform, enabling KB to deliver content into users' workflows via APIs and to expose it to research communities.
  * Aggregator, enabling KB to set up a network of partners that brings together all Dutch digital books, newspapers and magazines, while also supporting Europeana's content aggregation strategy.
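The delivery-platform and aggregator roles rest on standard protocols, and the presentation names OAI-PMH and SRU as KB's data services. The Python sketch below is a rough illustration only. It builds the two kinds of request a client might send:

* an SRU 1.2 `searchRetrieve` query written in CQL;
* an OAI-PMH `ListRecords` harvesting request.

The base URLs, the `dc.title` CQL index, the `books` set name and the helper function names are assumptions. They are not KB's actual endpoints or configuration.

```python
from typing import Optional
from urllib.parse import urlencode

# Placeholder endpoints (assumption): not KB's actual service addresses.
SRU_BASE = "https://example.org/sru"
OAI_BASE = "https://example.org/oai"


def sru_search_url(cql_query: str, start: int = 1, max_records: int = 10,
                   record_schema: str = "dc") -> str:
    """Build an SRU 1.2 searchRetrieve request; the query is written in CQL."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,
        "startRecord": start,
        "maximumRecords": max_records,
        "recordSchema": record_schema,  # assumed schema name supported by the server
    }
    return f"{SRU_BASE}?{urlencode(params)}"


def oai_list_records_url(metadata_prefix: str = "oai_dc",
                         set_spec: Optional[str] = None,
                         from_date: Optional[str] = None,
                         resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request for (incremental) harvesting."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is exclusive: only the verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec  # hypothetical set name, e.g. "books"
        if from_date:
            params["from"] = from_date  # only records changed since this date
    return f"{OAI_BASE}?{urlencode(params)}"


# Example usage: a title search and a first (selective) harvesting request.
print(sru_search_url('dc.title = "windmill"'))
print(oai_list_records_url(set_spec="books", from_date="2012-06-30"))
```

A harvester such as an aggregator would normally repeat the OAI-PMH request with each returned `resumptionToken` until none comes back. The `from` argument lets it fetch only the records changed since its last visit.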

  1. The Big Dutch 20 Year 730 Million Page Digitisation Challenge. 10th International Conference on the Book, 30th June 2012, Barcelona, Spain. Olaf Janssen, National Library of the Netherlands – olaf.janssen@kb.nl / @ookgezellig / slideshare.net/OlafJanssenNL (image: w.alumni.ubc.ca/wp/wp-content/uploads/Gohagan_RLDutchWW_2012_01_HollandWindmillTulips.jpg)
  2. Hello, my name is Olaf Janssen (photo: KB)
  3. Hello, my name is Olaf Janssen. I’m a project manager for KB, the National Library of the Netherlands… (image: http://pinterest.com/pin/238901955203421002/)
  4. These are my colleagues … (source: KB intranet)
  5. Every day we give our best … (source: KB intranet)
  6. ... because we’ve a task to accomplish (source: NRC)
  7. We are scanning all Dutch (source: NRC)
  8. books
  9. books, newspapers
  10. books, newspapers & magazines (images: http://www.corbisimages.com/stock-photo/rights-managed/42-20042070/1960s-1970s-boy-leaning-against-tree-reading?popup=1; http://www.corbisimages.com/stock-photo/rights-managed/42-26195211/humor-portrait-man-wearing-hat-sitting-on?popup=1; http://eu.art.com/products/p6901596700-sa-i5098387/posters.htm?ui=F9E1398DA4CC4D3DB105E128FBAB2C4D)
  11. since 1470 (image: http://marksayers.files.wordpress.com/2011/05/charles-darwin-2.png)
  12. A whopping 730 pages (images: http://www.portlandmonthlymag.com/assets/0004/4122/surprised-woman.jpg; http://morristrust.com/wp-content/uploads/2012/04/Surprised-Face.jpg)
  13. A whopping 730.000 pages (images: http://www.portlandmonthlymag.com/assets/0004/4122/surprised-woman.jpg; http://morristrust.com/wp-content/uploads/2012/04/Surprised-Face.jpg)
  14. A whopping 730.000.000 pages (images: http://www.portlandmonthlymag.com/assets/0004/4122/surprised-woman.jpg; http://morristrust.com/wp-content/uploads/2012/04/Surprised-Face.jpg)
  15. For the next 7300 days (approx.) (images: http://www.picturesfromourpast.com/gallery/RightsManaged/20110817/CKSA011_YC019.jpg; http://imgc.allpostersimages.com/images/P-473-488-90/56/5641/2RYMG00Z/posters/george-marks-surprised-woman-posing-portrait.jpg)
  16. That’s 100.000 pages every single day!! (image: http://www.allposters.co.uk/-sp/Man-Wiping-Forehead-Posters_i8018953_.htm)
  17. And of course, after digitisation, we want to make many people happy with our content. (image: http://annakrentz.blogspot.nl/2011/05/dutch-liberation.html)
  18. I work on this because I believe… (image: http://annakrentz.blogspot.nl/2011/05/dutch-liberation.html)
  19. … ultimately people want to know who they are. For that, they explore their histories & origins. I want to help them explore these worlds. (image: http://annakrentz.blogspot.nl/2011/05/dutch-liberation.html)
  22. The key idea I’d like to share with you today: how KB goes about tackling these grand challenges… (image: http://mestadelsbilder.files.wordpress.com/2011/06/dali.jpg)
  23. First, we are creating digital content ... (image: http://www.bjp-online.com/IMG/001/110001/getty-archive-collection.jpg?1282665196)
  24. 1995-1999 (image: http://www.corbisimages.com/stock-photo/rights-managed/42-20042210/1940s-man-in-suit-holding-up-index?popup=1)
  25. 1995-1999: small-scale digitisation
  26. 1995-1999: small-scale digitisation. Treasures & highlights of the KB collection (1000s of pages)
  27. 1995-1999: small-scale digitisation. Memory of the Netherlands (730K images)
  28. 2000-2010 (image: http://www.corbisimages.com/stock-photo/rights-managed/42-26194724/smiling-blond-nurse-with-surprised-expression-talking?popup=1)
  29. 2000-2010: large scale, public funding
  30. 2000-2010: large scale, public funding. Early Dutch Books (2.2M pages, full-text, 1781-1800)
  31. 2000-2010: large scale, public funding. Historic Newspapers (9.5M pages, full-text, 1618-1995)
  32. 2010-today (image: http://www.corbisimages.com/stock-photo/rights-managed/42-26194844/smiling-woman-counting-on-fingers-wearing-pearls?popup=1)
  33. 2010-today: mass scale, private & public funding
  34. 2010-today: mass scale, private & public funding. ProQuest partnership (12M pages, 1450-1700), see http://www.kb.nl/nieuws/2011/proquest-en.html
  35. 2010-today: mass scale, private & public funding. Google partnership (35M pages, full-text, 1701-1871), see http://www.kb.nl/nieuws/2010/google-en.html
  36. OK, so we’re very busy creating loads of digital content … (image: http://www.bjp-online.com/IMG/001/110001/getty-archive-collection.jpg?1282665196)
  37. Houston, we’ve a problem!! (image: http://2.bp.blogspot.com/_BWzuYwiS6-I/TMgeRsFd3mI/AAAAAAAAElw/3cvgbZSPWcs/s1600/doctor+macro+judy+scared.jpg)
  38. Although we create & store our digital content in a strictly standardized process … (JP2, JPG, XML-OCR, MPEG21, ALTO, PDF …*; *see http://kb.nl/hrd/digitalisering/index-en.html) (image: http://www.electrohype.org/press/pionjar/IBM_System360_Mod_50.jpg)
  39. ... this back-end standardisation is not reflected in the front-end (image: http://www.electrohype.org/press/pionjar/IBM_System360_Mod_50.jpg)
  40. Memory of the Netherlands, KB Treasures
  41. Memory of the Netherlands, KB Treasures, KB full-text historic books, KB full-text historic newspapers
  42. Memory of the Netherlands, KB Treasures, KB full-text historic books, KB full-text historic newspapers, ProQuest historic books, Google full-text historic books
  43. To many people, KB’s website portfolio feels something like this (image: http://berichtenuithetverleden.files.wordpress.com/2011/03/escher.jpg)
  44. Current KB websites are inconsistent in URL logic, search logic, design, branding, object presentation, display of result set and user experience (images: http://www.corbisimages.com/Search#pg=h+armstrong+roberts)
  45. Current KB websites are inconsistent in URL logic, search logic, design, branding, object presentation, display of result set and user experience. Scattered & unrelated collections (images: http://www.corbisimages.com/Search#pg=h+armstrong+roberts)
  46. Current KB websites are inconsistent in URL logic, search logic, design, branding, object presentation, display of result set and user experience. Scattered & unrelated collections. Non-interoperability (images: http://www.corbisimages.com/Search#pg=h+armstrong+roberts)
  47. In short: current KB websites don’t meet the expectations of modern & future generations (images: http://www.corbisimages.com/stock-photo/rights-managed/NT3707756/depressed-cheerleader?popup=1; http://www.corbisimages.com/stock-photo/rights-managed/42-20036948/1960s-1970s-seated-baby-in-diaper-with?popup=1)
  48. No panic! (image: http://www.leninimports.com/cary_grant_new_7a.jpg)
  49. We are working on a solution.. (image: http://www.leninimports.com/cary_grant_new_7a.jpg)
  50. KB is implementing 3 lines of action (images: http://simplehomeschool.net/wp-content/uploads/2011/06/woman_walking_between_bookshelf-e1308357752773.jpg; http://www.corbisimages.com/stock-photo/rights-managed/NT3765115/an-old-hat-trick?popup=1)
  51. KB is implementing 3 lines of action: 1) Unified searching for publications (images: http://simplehomeschool.net/wp-content/uploads/2011/06/woman_walking_between_bookshelf-e1308357752773.jpg; http://www.corbisimages.com/stock-photo/rights-managed/NT3765115/an-old-hat-trick?popup=1)
  52. KB is implementing 3 lines of action: 1) Unified searching for publications, 2) Unified searching in publications (images: http://simplehomeschool.net/wp-content/uploads/2011/06/woman_walking_between_bookshelf-e1308357752773.jpg; http://www.corbisimages.com/stock-photo/rights-managed/NT3765115/an-old-hat-trick?popup=1)
  53. KB is implementing 3 lines of action: 1) Unified searching for publications, 2) Unified searching in publications, 3) Unified object presentation (images: http://simplehomeschool.net/wp-content/uploads/2011/06/woman_walking_between_bookshelf-e1308357752773.jpg; http://www.corbisimages.com/stock-photo/rights-managed/NT3765115/an-old-hat-trick?popup=1)
  54. 1) Unified searching for publications: metadata
  55. 1) Unified searching for publications: metadata. KB General Catalogue, searching for (e-)books, (e-)magazines and (e-)newspapers
  56. 1) Unified searching for publications: metadata. KB General Catalogue, searching for (e-)books, (e-)magazines and (e-)newspapers. MetaLib, searching for scholarly e-journals and licensed 3rd-party databases
  57. 1) Unified searching for publications: metadata. KB General Catalogue, searching for (e-)books, (e-)magazines and (e-)newspapers. MetaLib, searching for scholarly e-journals and licensed 3rd-party databases. Both replaced by WorldCat Local: KB’s single starting point for searching for publications
  58. 1) Unified searching for publications: downsides of WorldCat Local
  59. 1) Unified searching for publications: downsides of WorldCat Local: no full-text searching
  60. 1) Unified searching for publications: downsides of WorldCat Local: no full-text searching; no object presentation (image: http://www.corbisimages.com/Search#pg=h+armstrong+roberts&p=1&ColorFormat=2&q=sad)
  61. No panic! (image: http://www.leninimports.com/cary_grant_new_7a.jpg)
  62. We are tackling this… (image: http://www.leninimports.com/cary_grant_new_7a.jpg)
  63. 2) Unified searching in publications: full-text
  64. 2) Unified searching in publications: full-text. KB Platform for Digital Publications
  65. 2) Unified searching in publications: full-text. KB Platform for Digital Publications, covering KB full-text historic books
  66. 2) Unified searching in publications: full-text. KB Platform for Digital Publications, covering KB full-text historic books and Google full-text historic books
  67. 2) Unified searching in publications: full-text. KB Platform for Digital Publications, covering KB full-text historic books, Google full-text historic books and KB full-text historic newspapers
  68. 2) Unified searching in publications: full-text. KB Platform for Digital Publications, covering KB full-text historic books, Google full-text historic books, KB full-text historic newspapers and all future KB digitisation output
  69. 2) Unified searching in publications: full-text. KB Platform for Digital Publications, covering KB full-text historic books, Google full-text historic books, KB full-text historic newspapers, KB full-text historic magazines (Sept 2012) and all future KB digitisation output
  70. 3) Unified object presentation: uniform look & feel, independent of object
  71. 3) Unified object presentation: uniform look & feel, independent of object. Book (early wireframing stage)
  72. 3) Unified object presentation: uniform look & feel, independent of object. Newspaper (early wireframing stage)
  73. 3) Unified object presentation: uniform look & feel, independent of object. Magazine (early wireframing stage)
  74. 3) Unified object presentation: landing page + persistent ID
  75. 3) Unified object presentation: landing page + persistent ID. Landing page (within Platform for Digital Publications)
  76. 3) Unified object presentation: landing page + persistent ID. Landing page (within Platform for Digital Publications), persistent ID
  77. 3) Unified object presentation: landing page + persistent ID. Landing page (within Platform for Digital Publications), persistent ID, KB metadata search (via WorldCat Local)
  78. 3) Unified object presentation: landing page + persistent ID. Landing page (within Platform for Digital Publications), persistent ID, KB metadata search (via WorldCat Local), KB full-text search (via Platform for Digital Publications)
  79. 3) Unified object presentation: landing page + persistent ID. Landing page (within Platform for Digital Publications), persistent ID, KB metadata search (via WorldCat Local), KB full-text search (via Platform for Digital Publications), scientist, student etc.
  83. So… we’re very busy creating loads of digital content … (image: http://www.bjp-online.com/IMG/001/110001/getty-archive-collection.jpg?1282665196)
  84. So… we’re very busy creating loads of digital content … and we’re also creating unified discovery & presentation … (image: http://www.bjp-online.com/IMG/001/110001/getty-archive-collection.jpg?1282665196)
  85. … making quite a few people happy … (image: http://www.corbisimages.com/Search#pg=h+armstrong+roberts&p=1&ColorFormat=2&q=happy)
  86. But we want more …
  87. KB wants to make MORE people HAPPIER with its content!
  88. Some strategies… (image: http://mestadelsbilder.files.wordpress.com/2011/06/dali.jpg)
  89. 1) APIs / data services: OAI-PMH & SRU (image: http://www.ducatimeccanica.com/single_engine.jpg)
  90. 2) Content via (social) networks
  91. 3) Clear licensing information for use & reuse: CC-zero, unless (legal) restrictions apply
  92. 4) Strategic partnerships: Europeana & Wikipedia
  93. 5) Crowdsourcing: collaborative OCR correction (image: http://4.bp.blogspot.com/-QqBeVbbjrpY/T4csQk4dtcI/AAAAAAAAAYw/6btxopsuRsM/s1600/crowd2.jpg)
  94. Thanks for your attention! olaf.janssen@kb.nl - @ookgezellig - slideshare.net/OlafJanssenNL
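Slide 89 lists OAI-PMH as one of KB's data services, and the aggregator role described in the summary depends on harvesting records from partner collections. The Python sketch below illustrates the generic OAI-PMH `ListRecords` paging pattern: keep requesting with the returned `resumptionToken` until it comes back empty. It assumes Dublin Core (`oai_dc`) records. It is not KB's harvester: the function names, the `fetch` callback and the sample records are hypothetical, and no real endpoint is contacted.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional

# Standard OAI-PMH 2.0 and Dublin Core namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_list_records(xml_text: str) -> tuple[list[dict], Optional[str]]:
    """Return the records on one ListRecords page plus its resumptionToken (or None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
        })
    token_el = root.find(".//oai:resumptionToken", NS)
    token = (token_el.text or "").strip() if token_el is not None else ""
    # An absent or empty resumptionToken marks the last page of the result list.
    return records, token or None


def harvest(fetch: Callable[[Optional[str]], str]) -> Iterator[dict]:
    """Follow resumptionTokens until the list is complete.

    `fetch(token)` (hypothetical) must return the XML of the next page; token is
    None for the first request. In practice it would perform an HTTP GET.
    """
    token: Optional[str] = None
    while True:
        records, token = parse_list_records(fetch(token))
        yield from records
        if token is None:
            return


def sample_page(identifier: str, title: str, token: str) -> str:
    """Build a tiny, made-up ListRecords response for offline testing."""
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
        f"<record><header><identifier>{identifier}</identifier></header><metadata>"
        '<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" '
        'xmlns:dc="http://purl.org/dc/elements/1.1/">'
        f"<dc:title>{title}</dc:title></oai_dc:dc></metadata></record>"
        f"<resumptionToken>{token}</resumptionToken></ListRecords></OAI-PMH>"
    )


if __name__ == "__main__":
    # Two fake pages: the first points to the second, the second ends the list.
    pages = {
        None: sample_page("oai:example:1", "Book A", "page-2"),
        "page-2": sample_page("oai:example:2", "Book B", ""),
    }
    for record in harvest(pages.__getitem__):
        print(record)
```

Passing `fetch` in as a callback keeps the paging logic independent of any particular HTTP client or endpoint. That also makes the loop testable offline with canned responses, as in the example above.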
