The Big Dutch 20 Year 730 Million Page Digitisation Challenge
The National Library of the Netherlands (KB) is mass-digitizing all Dutch publications since 1470. This article outlines KB's strategy for making this output publicly available.

In the next 20 years, the Dutch national library (KB) will mass-digitize all Dutch printed books, newspapers and magazines published since 1470, a total of 730 million pages. Until recently, this was done with public funding alone. To speed things up in a climate of ongoing budget cuts, KB entered into public-private partnerships with both Google and ProQuest to digitize 42 million pages by 2013. Besides the availability of funding, digitization priority is determined by a mix of client and institutional needs such as copyright status, uniqueness, institutional capability and user demand.
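
To put the scale in perspective, the figures above can be turned into a daily throughput. The sketch below is only a back-of-the-envelope check: it assumes every calendar day counts and ignores leap days, which is an assumption of this example rather than a statement about KB's actual production schedule.

```python
# Back-of-the-envelope throughput check for the figures quoted in the abstract.
TOTAL_PAGES = 730_000_000       # total pages to digitize (from the abstract)
YEARS = 20                      # planned duration (from the abstract)
DAYS_PER_YEAR = 365             # assumption: every calendar day counts, no leap days
PARTNERSHIP_PAGES = 42_000_000  # pages planned with Google and ProQuest by 2013

total_days = YEARS * DAYS_PER_YEAR
pages_per_day = TOTAL_PAGES / total_days
partnership_share = PARTNERSHIP_PAGES / TOTAL_PAGES

print(f"{total_days} days, {pages_per_day:,.0f} pages per day")
print(f"Partnerships cover {partnership_share:.1%} of the total")
```

Under these assumptions the result is 7300 days and 100,000 pages per day, which matches the numbers on slides 15 and 16 of the transcript.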

At the same time, KB is answering user demand for centralized access and content distribution by streamlining its scattered portfolio of online services. To this end, KB is developing two strategic lines of action.
* The first is on metadata (searching FOR publications): in 2013, KB will unify metadata searching across all its paper and digital collections via OCLC's WorldCat Local.
* The second is on full-text (searching IN publications): for searching in full-text historic publications (i.e. mass digitization output) KB is currently developing its Platform for Digital Publications. Besides a search engine, it is also a:
  * Presentation environment, associating each full-text object with a standardized webpage and persistent URL, offering a uniform look and feel, and unique reference for all KB's full-texts. This landing page enables third-party services (e.g. WorldCat Local, Europeana, Google) to refer to objects in a persistent way.
  * Delivery platform, enabling KB to deliver content in the workflows of users via APIs and expose it to research communities.
  * Aggregator, enabling KB to set up a network of partners to bring together all Dutch digital books, newspapers and magazines, at the same time supporting Europeana's content aggregation strategy.
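
The idea of a landing page with a persistent URL can be illustrated with a small sketch. Everything named here is hypothetical: the base URL, the identifier scheme and the `DigitalObject` type are invented for illustration and do not describe KB's actual platform. The point is only that the URL is derived from the persistent ID alone, so the same link format works for books, newspapers and magazines.

```python
from dataclasses import dataclass
from urllib.parse import quote

# Hypothetical base URL; the real platform address is not given in the source.
LANDING_BASE = "https://resolver.example.org/"


@dataclass(frozen=True)
class DigitalObject:
    persistent_id: str  # stable identifier (hypothetical scheme)
    object_type: str    # "book", "newspaper" or "magazine"
    title: str


def landing_page_url(obj: DigitalObject) -> str:
    """Build the landing-page URL from the persistent ID alone.

    Object type, title and storage location are deliberately left out of
    the URL, so they can change without breaking existing links.
    """
    return LANDING_BASE + quote(obj.persistent_id, safe="")


book = DigitalObject("urn:example:book:0001", "book", "Example title")
print(landing_page_url(book))
```

A third-party service that stores this URL needs to know nothing else about the object, which is what lets it refer to the object in a persistent way.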

Published in: Business, Technology

Transcript

  • 1. The Big Dutch 20 Year 730 Million Page Digitisation Challenge. 10th International Conference on the Book, 30th June 2012, Barcelona, Spain. Olaf Janssen, National Library of the Netherlands – olaf.janssen@kb.nl / @ookgezellig / slideshare.net/OlafJanssenNL. Image: w.alumni.ubc.ca/wp/wp-content/uploads/Gohagan_RLDutchWW_2012_01_HollandWindmillTulips.jpg
  • 2. Hello, my name is Olaf Janssen. Photo: KB
  • 3. Hello, my name is Olaf Janssen. I’m a project manager for KB, the National Library of the Netherlands… http://pinterest.com/pin/238901955203421002/
  • 4. These are my colleagues … Source: KB intranet
  • 5. Every day we give our best … Source: KB intranet
  • 6. ... because we’ve a task to accomplish. Source: NRC
  • 7. We are scanning all Dutch … Source: NRC
  • 8. books
  • 9. books, newspapers
  • 10. books, newspapers & magazines http://www.corbisimages.com/stock-photo/rights-managed/42-20042070/1960s-1970s-boy-leaning-against-tree-reading?popup=1 http://www.corbisimages.com/stock-photo/rights-managed/42-26195211/humor-portrait-man-wearing-hat-sitting-on?popup=1 http://eu.art.com/products/p6901596700-sa-i5098387/posters.htm?ui=F9E1398DA4CC4D3DB105E128FBAB2C4D
  • 11. since 1470 http://marksayers.files.wordpress.com/2011/05/charles-darwin-2.png
  • 12. A whopping 730 pages http://www.portlandmonthlymag.com/assets/0004/4122/surprised-woman.jpg http://morristrust.com/wp-content/uploads/2012/04/Surprised-Face.jpg
  • 13. A whopping 730.000 pages http://www.portlandmonthlymag.com/assets/0004/4122/surprised-woman.jpg http://morristrust.com/wp-content/uploads/2012/04/Surprised-Face.jpg
  • 14. A whopping 730.000.000 pages http://www.portlandmonthlymag.com/assets/0004/4122/surprised-woman.jpg http://morristrust.com/wp-content/uploads/2012/04/Surprised-Face.jpg
  • 15. For the next 7300 days (approx.) http://www.picturesfromourpast.com/gallery/RightsManaged/20110817/CKSA011_YC019.jpg http://imgc.allpostersimages.com/images/P-473-488-90/56/5641/2RYMG00Z/posters/george-marks-surprised-woman-posing-portrait.jpg
  • 16. That’s 100.000 pages every single day !! http://www.allposters.co.uk/-sp/Man-Wiping-Forehead-Posters_i8018953_.htm
  • 17. And of course, after digitisation, we want to make many people happy with our content. http://annakrentz.blogspot.nl/2011/05/dutch-liberation.html
  • 18. I work on this because I believe… http://annakrentz.blogspot.nl/2011/05/dutch-liberation.html
  • 19. ultimately people want to know who they are. For that they explore their histories & origins. I want to help them in exploring these worlds. http://annakrentz.blogspot.nl/2011/05/dutch-liberation.html
  • 20. ultimately people want to know who they are. For that they explore their histories & origins. I want to help them in exploring these worlds. http://annakrentz.blogspot.nl/2011/05/dutch-liberation.html
  • 21. ultimately people want to know who they are. For that they explore their histories & origins. I want to help them in exploring these worlds. http://annakrentz.blogspot.nl/2011/05/dutch-liberation.html
  • 22. The key idea I’d like to share with you today: How KB goes about tackling these grand challenges… http://mestadelsbilder.files.wordpress.com/2011/06/dali.jpg
  • 23. First, we are creating digital content ... http://www.bjp-online.com/IMG/001/110001/getty-archive-collection.jpg?1282665196
  • 24. 1995-1999 http://www.corbisimages.com/stock-photo/rights-managed/42-20042210/1940s-man-in-suit-holding-up-index?popup=1
  • 25. 1995-1999: Small scale digitisation
  • 26. 1995-1999: Small scale digitisation. Treasures & highlights of KB collection (1000s pages)
  • 27. 1995-1999: Small scale digitisation. Memory of the Netherlands (730K images)
  • 28. 2000-2010 http://www.corbisimages.com/stock-photo/rights-managed/42-26194724/smiling-blond-nurse-with-surprised-expression-talking?popup=1
  • 29. 2000-2010: Large scale, public funding
  • 30. 2000-2010: Large scale, public funding. Early Dutch Books (2.2M pages, full-text, 1781-1800)
  • 31. 2000-2010: Large scale, public funding. Historic Newspapers (9.5M pages, full-text, 1618-1995)
  • 32. 2010-today http://www.corbisimages.com/stock-photo/rights-managed/42-26194844/smiling-woman-counting-on-fingers-wearing-pearls?popup=1
  • 33. 2010-today: Mass scale, private & public funding
  • 34. 2010-today: Mass scale, private & public funding. Proquest partnership (12M pages, 1450-1700) http://www.kb.nl/nieuws/2011/proquest-en.html
  • 35. 2010-today: Mass scale, private & public funding. Google partnership (35M pages, full-text, 1701-1871) http://www.kb.nl/nieuws/2010/google-en.html
  • 36. OK, so we’re very busy creating loads of digital content … http://www.bjp-online.com/IMG/001/110001/getty-archive-collection.jpg?1282665196
  • 37. Houston, we’ve a problem!! http://2.bp.blogspot.com/_BWzuYwiS6-I/TMgeRsFd3mI/AAAAAAAAElw/3cvgbZSPWcs/s1600/doctor+macro+judy+scared.jpg
  • 38. Although we create & store our digital content in a strictly standardized process … (JP2, JPG, XML-OCR, MPEG21, ALTO, PDF … *) * http://kb.nl/hrd/digitalisering/index-en.html http://www.electrohype.org/press/pionjar/IBM_System360_Mod_50.jpg
  • 39. .. this back-end standardisation is not reflected in the front-end http://www.electrohype.org/press/pionjar/IBM_System360_Mod_50.jpg
  • 40. KB Treasures; Memory of the Netherlands
  • 41. KB Treasures; Memory of the Netherlands; KB full-text Historic books; KB full-text Historic newspapers
  • 42. KB Treasures; Memory of the Netherlands; KB full-text Historic books; KB full-text Historic newspapers; Proquest Historic books; Google full-text Historic books
  • 43. To many people KB’s website portfolio feels something like this http://berichtenuithetverleden.files.wordpress.com/2011/03/escher.jpg
  • 44. Current KB websites are inconsistent in: URL-logic; Search logic; Design; Object presentation; Display of result set; Branding; User experience. Images: http://www.corbisimages.com/Search#pg=h+armstrong+roberts
  • 45. Current KB websites are inconsistent in: URL-logic; Search logic; Design; Object presentation; Display of result set; Branding; User experience; Scattered & unrelated collections. Images: http://www.corbisimages.com/Search#pg=h+armstrong+roberts
  • 46. Current KB websites are inconsistent in: URL-logic; Search logic; Design; Object presentation; Display of result set; Branding; User experience; Scattered & unrelated collections; Non-interoperability. Images: http://www.corbisimages.com/Search#pg=h+armstrong+roberts
  • 47. In short: Current KB websites don’t meet expectations of modern & future generations http://www.corbisimages.com/stock-photo/rights-managed/NT3707756/depressed-cheerleader?popup=1 http://www.corbisimages.com/stock-photo/rights-managed/42-20036948/1960s-1970s-seated-baby-in-diaper-with?popup=1
  • 48. No panic! http://www.leninimports.com/cary_grant_new_7a.jpg
  • 49. We are working on a solution.. http://www.leninimports.com/cary_grant_new_7a.jpg
  • 50. KB is implementing 3 lines of action http://simplehomeschool.net/wp-content/uploads/2011/06/woman_walking_between_bookshelf-e1308357752773.jpg http://www.corbisimages.com/stock-photo/rights-managed/NT3765115/an-old-hat-trick?popup=1
  • 51. KB is implementing 3 lines of action: 1. Unified searching for publications http://simplehomeschool.net/wp-content/uploads/2011/06/woman_walking_between_bookshelf-e1308357752773.jpg http://www.corbisimages.com/stock-photo/rights-managed/NT3765115/an-old-hat-trick?popup=1
  • 52. KB is implementing 3 lines of action: 1. Unified searching for publications 2. Unified searching in publications http://simplehomeschool.net/wp-content/uploads/2011/06/woman_walking_between_bookshelf-e1308357752773.jpg http://www.corbisimages.com/stock-photo/rights-managed/NT3765115/an-old-hat-trick?popup=1
  • 53. KB is implementing 3 lines of action: 1. Unified searching for publications 2. Unified searching in publications 3. Unified object presentation http://simplehomeschool.net/wp-content/uploads/2011/06/woman_walking_between_bookshelf-e1308357752773.jpg http://www.corbisimages.com/stock-photo/rights-managed/NT3765115/an-old-hat-trick?popup=1
  • 54. 1. Unified searching for publications: Metadata
  • 55. 1. Unified searching for publications: Metadata. KB General Catalogue: searching for (e-)books, (e-)magazines, (e-)newspapers
  • 56. 1. Unified searching for publications: Metadata. KB General Catalogue: searching for (e-)books, (e-)magazines, (e-)newspapers. MetaLib: searching for scholarly e-journals, licensed 3rd party databases
  • 57. 1. Unified searching for publications: Metadata. KB General Catalogue: searching for (e-)books, (e-)magazines, (e-)newspapers. MetaLib: searching for scholarly e-journals, licensed 3rd party databases. WorldCat Local: KB’s single starting point for searching for publications
  • 58. 1. Unified searching for publications: Downsides of WorldCat Local
  • 59. 1. Unified searching for publications: Downsides of WorldCat Local. 1. No full-text searching
  • 60. 1. Unified searching for publications: Downsides of WorldCat Local. 1. No full-text searching 2. No object presentation http://www.corbisimages.com/Search#pg=h+armstrong+roberts&p=1&ColorFormat=2&q=sad
  • 61. No panic! http://www.leninimports.com/cary_grant_new_7a.jpg
  • 62. We are tackling this… http://www.leninimports.com/cary_grant_new_7a.jpg
  • 63. 2. Unified searching in publications: Full-text
  • 64. 2. Unified searching in publications: Full-text. KB Platform for Digital Publications
  • 65. 2. Unified searching in publications: Full-text. KB Platform for Digital Publications: KB full-text historic books
  • 66. 2. Unified searching in publications: Full-text. KB Platform for Digital Publications: KB full-text historic books; Google full-text historic books
  • 67. 2. Unified searching in publications: Full-text. KB Platform for Digital Publications: KB full-text historic books; Google full-text historic books; KB full-text historic newspapers
  • 68. 2. Unified searching in publications: Full-text. KB Platform for Digital Publications: KB full-text historic books; Google full-text historic books; KB full-text historic newspapers; All future KB full-text digitisation output
  • 69. 2. Unified searching in publications: Full-text. KB Platform for Digital Publications: KB full-text historic books; Google full-text historic books; KB full-text historic newspapers; KB full-text historic magazines (sept 2012); All future KB full-text digitisation output
  • 70. 3. Unified object presentation: Uniform look & feel, independent of object
  • 71. 3. Unified object presentation: Uniform look & feel, independent of object. Book (early wireframing stage)
  • 72. 3. Unified object presentation: Uniform look & feel, independent of object. Newspaper (early wireframing stage)
  • 73. 3. Unified object presentation: Uniform look & feel, independent of object. Magazine (early wireframing stage)
  • 74. 3. Unified object presentation: Landing page + persistent ID
  • 75. 3. Unified object presentation: Landing page + persistent ID. Landing page (within Platform for Digital Publications)
  • 76. 3. Unified object presentation: Landing page + persistent ID. Persistent ID; Landing page (within Platform for Digital Publications)
  • 77. 3. Unified object presentation: Landing page + persistent ID. KB metadata search (via WCLocal); Landing page (within Platform for Digital Publications)
  • 78. 3. Unified object presentation: Landing page + persistent ID. KB metadata search (via WCLocal); KB full-text search (via Platform for Digital Publications); Landing page (within Platform for Digital Publications)
  • 79. 3. Unified object presentation: Landing page + persistent ID. Scientist, student etc.; KB metadata search (via WCLocal); KB full-text search (via Platform for Digital Publications); Landing page (within Platform for Digital Publications)
  • 80. 3. Unified object presentation: Landing page + persistent ID. Scientist, student etc.; KB metadata search (via WCLocal); KB full-text search (via Platform for Digital Publications); Landing page (within Platform for Digital Publications)
  • 81. 3. Unified object presentation: Landing page + persistent ID. Scientist, student etc.; KB metadata search (via WCLocal); KB full-text search (via Platform for Digital Publications); Landing page (within Platform for Digital Publications)
  • 82. 3. Unified object presentation: Landing page + persistent ID. Scientist, student etc.; KB metadata search (via WCLocal); KB full-text search (via Platform for Digital Publications); Landing page (within Platform for Digital Publications)
  • 83. So… we’re very busy creating loads of digital content … http://www.bjp-online.com/IMG/001/110001/getty-archive-collection.jpg?1282665196
  • 84. So… we’re very busy creating loads of digital content … and we’re also creating unified discovery & presentation … http://www.bjp-online.com/IMG/001/110001/getty-archive-collection.jpg?1282665196
  • 85. … making quite a few people happy … http://www.corbisimages.com/Search#pg=h+armstrong+roberts&p=1&ColorFormat=2&q=happy
  • 86. But we want more …
  • 87. KB wants to make MORE people HAPPIER with its content!
  • 88. Some strategies… http://mestadelsbilder.files.wordpress.com/2011/06/dali.jpg
  • 89. 1) APIs / dataservices: OAI-PMH & SRU http://www.ducatimeccanica.com/single_engine.jpg
  • 90. 2) Content via (social) networks
  • 91. 3) Clear licensing information for use & reuse: CC-zero, unless (legal) restrictions apply
  • 92. 4) Strategic partnerships: Europeana & Wikipedia
  • 93. 5) Crowdsourcing: Collaborative OCR correction http://4.bp.blogspot.com/-QqBeVbbjrpY/T4csQk4dtcI/AAAAAAAAAYw/6btxopsuRsM/s1600/crowd2.jpg
  • 94. Thanks for your attention! olaf.janssen@kb.nl - @ookgezellig - slideshare.net/OlafJanssenNL
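
Slide 89 names OAI-PMH and SRU as the data services behind the API strategy. As a rough illustration of what requests to such services look like, the sketch below builds one request URL for each protocol. The endpoint addresses and the set name are hypothetical, since the talk does not give the real ones; the query parameters (`verb`, `metadataPrefix`, `set` for OAI-PMH, and `operation`, `version`, `query`, `maximumRecords` for SRU) are the standard ones defined by the two protocols. No network call is made.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical endpoints; the talk does not give the real service addresses.
OAI_BASE = "https://data.example.org/oai"
SRU_BASE = "https://data.example.org/sru"


def oai_list_records_url(metadata_prefix: str = "oai_dc",
                         set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request for bulk metadata harvesting."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec  # optional: restrict the harvest to one set
    return f"{OAI_BASE}?{urlencode(params)}"


def sru_search_url(cql_query: str, maximum_records: int = 10) -> str:
    """Build an SRU searchRetrieve request for a CQL search query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,
        "maximumRecords": maximum_records,
    }
    return f"{SRU_BASE}?{urlencode(params)}"


# "newspapers" is a made-up set name used only for this example.
print(oai_list_records_url(set_spec="newspapers"))
print(sru_search_url('dc.title = "windmill"', maximum_records=5))
```

The two protocols serve different needs: OAI-PMH suits partners who harvest whole collections in bulk, while SRU suits applications that send individual search queries.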