The Big Dutch 20 Year 730 Million Page Digitisation Challenge

The National Library of the Netherlands (KB) is mass-digitizing all Dutch publications since 1470. This article outlines KB's strategy for making this output publicly available.

Over the next 20 years, the Dutch national library (KB) will mass-digitize all Dutch printed books, newspapers and magazines published since 1470, a total of 730 million pages. Until recently, this was funded by public money alone. To speed things up in a climate of ongoing budget cuts, KB entered into public-private partnerships with both Google and ProQuest to digitize 42 million pages by 2013. Besides the availability of funding, digitization priority is determined by a mix of client and institutional needs such as copyright status, uniqueness, institutional capability and user demand.
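
The scale is easier to grasp as a daily rate. The following is a minimal sketch of the arithmetic, using only the two figures given above (730 million pages, 20 years). The 365-day year and the 250-working-day variant are assumptions added for illustration, not KB planning figures.

```python
# Figures from the text: 730 million pages to digitise over 20 years.
TOTAL_PAGES = 730_000_000
YEARS = 20


def daily_rate(total_pages: int, years: int, days_per_year: int = 365) -> float:
    """Average pages per day needed to finish within the given number of years."""
    return total_pages / (years * days_per_year)


# Assumption: scanning runs every calendar day and leap days are ignored.
calendar_days = YEARS * 365
calendar_rate = daily_rate(TOTAL_PAGES, YEARS)

# Assumption (not from the source): a hypothetical 250 working days per year.
working_rate = daily_rate(TOTAL_PAGES, YEARS, days_per_year=250)

print(f"{calendar_days} days at {calendar_rate:,.0f} pages per day")
print(f"{working_rate:,.0f} pages per day on a 250-day working year")
```

On calendar days this gives roughly 7300 days at 100,000 pages per day, which matches the figures quoted in the presentation itself.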

At the same time, KB is answering user demand for centralized access and content distribution by streamlining its scattered online services portfolio. To this end, KB is developing two strategic lines of action.
* The first is on metadata (searching FOR publications): in 2013, KB will unify metadata searching across all its paper and digital collections via OCLC's WorldCat Local.
* The second is on full-text (searching IN publications): for searching in full-text historic publications (i.e. mass digitization output), KB is currently developing its Platform for Digital Publications. Besides a search engine, it is also a:
  * Presentation environment, associating each full-text object with a standardized webpage and persistent URL, offering a uniform look and feel and a unique reference for all of KB's full-texts. This landing page enables third-party services (e.g. WorldCat Local, Europeana, Google) to refer to objects in a persistent way.
  * Delivery platform, enabling KB to deliver content into users' workflows via APIs and to expose it to research communities.
  * Aggregator, enabling KB to set up a network of partners to bring together all Dutch digital books, newspapers and magazines, while supporting Europeana's content aggregation strategy.
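
Delivery via APIs can be made concrete with OAI-PMH, one of the two protocols (alongside SRU) that the presentation names for KB's data services. The sketch below only builds request URLs for the standard `ListRecords` verb and does not contact any server. The base URL and the set name are hypothetical placeholders, not KB's actual endpoint or sets.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only; not a real KB address.
BASE_URL = "https://example.org/oai"


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",  # Dublin Core, the format every OAI-PMH repository must offer
    set_spec: Optional[str] = None,
    from_date: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL for selective harvesting."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


def resume_url(base_url: str, token: str) -> str:
    """Build the follow-up request for the next page of a long result list.

    In OAI-PMH, resumptionToken is an exclusive argument: it is sent with
    the verb only, without metadataPrefix, set or from.
    """
    return f"{base_url}?{urlencode({'verb': 'ListRecords', 'resumptionToken': token})}"


# "newspapers" is a made-up set name used as an example.
print(list_records_url(BASE_URL, set_spec="newspapers", from_date="2012-01-01"))
print(resume_url(BASE_URL, "abc"))
```

A harvester would fetch the first URL, read the `resumptionToken` from the XML response if one is present, and repeat with `resume_url` until no token is returned.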

Usage Rights

CC Attribution-NonCommercial-ShareAlike License

    The Big Dutch 20 Year 730 Million Page Digitisation Challenge: Presentation Transcript

    • The Big Dutch 20 Year 730 Million Page Digitisation Challenge. 10th International Conference on the Book, 30th June 2012, Barcelona, Spain. Olaf Janssen, National Library of the Netherlands – olaf.janssen@kb.nl / @ookgezellig / slideshare.net/OlafJanssenNL w.alumni.ubc.ca/wp/wp-content/uploads/Gohagan_RLDutchWW_2012_01_HollandWindmillTulips.jpg
    • Hello, my name is Olaf Janssen. I’m a project manager for KB, the National Library of the Netherlands… Photo: KB http://pinterest.com/pin/238901955203421002/
    • These are my colleagues … Source: KB intranet
    • Every day we give our best … Source: KB intranet
    • ... because we’ve a task to accomplish Source: NRC
    • We are scanning all Dutch … Source: NRC
    • books, newspapers & magazines http://www.corbisimages.com/stock-photo/rights-managed/42-20042070/1960s-1970s-boy-leaning-against-tree-reading?popup=1 http://www.corbisimages.com/stock-photo/rights-managed/42-26195211/humor-portrait-man-wearing-hat-sitting-on?popup=1 http://eu.art.com/products/p6901596700-sa-i5098387/posters.htm?ui=F9E1398DA4CC4D3DB105E128FBAB2C4D
    • since 1470 http://marksayers.files.wordpress.com/2011/05/charles-darwin-2.png
    • A whopping 730 … 730,000 … 730,000,000 pages http://www.portlandmonthlymag.com/assets/0004/4122/surprised-woman.jpg http://morristrust.com/wp-content/uploads/2012/04/Surprised-Face.jpg
    • For the next 7300 days (approx.) http://www.picturesfromourpast.com/gallery/RightsManaged/20110817/CKSA011_YC019.jpg http://imgc.allpostersimages.com/images/P-473-488-90/56/5641/2RYMG00Z/posters/george-marks-surprised-woman-posing-portrait.jpg
    • That’s 100,000 pages every single day!! http://www.allposters.co.uk/-sp/Man-Wiping-Forehead-Posters_i8018953_.htm
    • And of course, after digitisation, we want to make many people happy with our content. http://annakrentz.blogspot.nl/2011/05/dutch-liberation.html
    • I work on this because I believe… http://annakrentz.blogspot.nl/2011/05/dutch-liberation.html
    • ultimately people want to know who they are. For that they explore their histories & origins. I want to help them in exploring these worlds. http://annakrentz.blogspot.nl/2011/05/dutch-liberation.html
    • The key idea I’d like to share with you today: How KB goes about tackling these grand challenges… http://mestadelsbilder.files.wordpress.com/2011/06/dali.jpg
    • First, we are creating digital content ... http://www.bjp-online.com/IMG/001/110001/getty-archive-collection.jpg?1282665196
    • 1995-1999 Small scale digitisation: Treasures & highlights of KB collection (1000s of pages) http://www.corbisimages.com/stock-photo/rights-managed/42-20042210/1940s-man-in-suit-holding-up-index?popup=1
    • 1995-1999 Small scale digitisation: Memory of the Netherlands (730K images)
    • 2000-2010 Large scale, public funding: Early Dutch Books (2.2M pages, full-text, 1781-1800) http://www.corbisimages.com/stock-photo/rights-managed/42-26194724/smiling-blond-nurse-with-surprised-expression-talking?popup=1
    • 2000-2010 Large scale, public funding: Historic Newspapers (9.5M pages, full-text, 1618-1995)
    • 2010-today Mass scale, private & public funding: ProQuest partnership (12M pages, 1450-1700) http://www.kb.nl/nieuws/2011/proquest-en.html http://www.corbisimages.com/stock-photo/rights-managed/42-26194844/smiling-woman-counting-on-fingers-wearing-pearls?popup=1
    • 2010-today Mass scale, private & public funding: Google partnership (35M pages, full-text, 1701-1871) http://www.kb.nl/nieuws/2010/google-en.html
    • OK, so we’re very busy creating loads of digital content … http://www.bjp-online.com/IMG/001/110001/getty-archive-collection.jpg?1282665196
    • Houston, we’ve a problem!! http://2.bp.blogspot.com/_BWzuYwiS6-I/TMgeRsFd3mI/AAAAAAAAElw/3cvgbZSPWcs/s1600/doctor+macro+judy+scared.jpg
    • Although we create & store our digital content in a strictly standardized process … (JP2, JPG, XML-OCR, MPEG21, ALTO, PDF … *) * http://kb.nl/hrd/digitalisering/index-en.html http://www.electrohype.org/press/pionjar/IBM_System360_Mod_50.jpg
    • … this back-end standardisation is not reflected in the front-end http://www.electrohype.org/press/pionjar/IBM_System360_Mod_50.jpg
    • KB Treasures, Memory of the Netherlands, KB full-text Historic books, KB full-text Historic newspapers, ProQuest Historic books, Google full-text Historic books
    • To many people KB’s website portfolio feels something like this http://berichtenuithetverleden.files.wordpress.com/2011/03/escher.jpg
    • Current KB websites are inconsistent in: URL-logic, search logic, design, object presentation, display of result set, branding, user experience. Also: scattered & unrelated collections; non-interoperability. Images: http://www.corbisimages.com/Search#pg=h+armstrong+roberts
    • In short: Current KB websites don’t meet the expectations of modern & future generations http://www.corbisimages.com/stock-photo/rights-managed/NT3707756/depressed-cheerleader?popup=1 http://www.corbisimages.com/stock-photo/rights-managed/42-20036948/1960s-1970s-seated-baby-in-diaper-with?popup=1
    • http://www.leninimports.com/cary_grant_new_7a.jpg No panic !
    • http://www.leninimports.com/cary_grant_new_7a.jpg We are working on a solution..
    • KB is implementing 3 lines of action: 1. Unified searching for publications 2. Unified searching in publications 3. Unified object presentation http://simplehomeschool.net/wp-content/uploads/2011/06/woman_walking_between_bookshelf-e1308357752773.jpg http://www.corbisimages.com/stock-photo/rights-managed/NT3765115/an-old-hat-trick?popup=1
    • 1. Unified searching for publications (Metadata). KB General Catalogue: searching for (e-)books, (e-)magazines, (e-)newspapers. MetaLib: searching for scholarly e-journals, licensed 3rd party databases. WorldCat Local: KB’s single starting point for searching for publications
    • 1. Unified searching for publications. Downsides of WorldCat Local: 1. No full-text searching 2. No object presentation http://www.corbisimages.com/Search#pg=h+armstrong+roberts&p=1&ColorFormat=2&q=sad
    • http://www.leninimports.com/cary_grant_new_7a.jpg No panic !
    • http://www.leninimports.com/cary_grant_new_7a.jpg We are tackling this…
    • 2. Unified searching in publications (Full-text). KB Platform for Digital Publications: KB full-text historic books, Google full-text historic books, KB full-text historic newspapers, KB full-text historic magazines (Sept 2012), all future KB full-text digitisation output
    • 3. Unified object presentation: Uniform look & feel, independent of object. Book (early wireframing stage)
    • 3. Unified object presentation: Uniform look & feel, independent of object. Newspaper (early wireframing stage)
    • 3. Unified object presentation: Uniform look & feel, independent of object. Magazine (early wireframing stage)
    • 3. Unified object presentation: Landing page + persistent ID. Diagram labels: Scientist, student etc.; KB metadata search (via WCLocal); KB full-text search (via Platform for Digital Publications); Landing page (within Platform for Digital Publications); persistent ID
    • So… we’re very busy creating loads of digital content … and we’re also creating unified discovery & presentation … http://www.bjp-online.com/IMG/001/110001/getty-archive-collection.jpg?1282665196
    • making quite a few people happy… http://www.corbisimages.com/Search#pg=h+armstrong+roberts&p=1&ColorFormat=2&q=happy
    • But we want more …
    • KB wants to make MORE people HAPPIER with its content!
    • Some strategies… http://mestadelsbilder.files.wordpress.com/2011/06/dali.jpg
    • 1) APIs / data services: OAI-PMH & SRU http://www.ducatimeccanica.com/single_engine.jpg
    • 2) Content via (social) networks
    • 3) Clear licensing information for use & reuse: CC-zero, unless (legal) restrictions apply
    • 4) Strategic partnerships: Europeana & Wikipedia
    • 5) Crowdsourcing: Collaborative OCR correction http://4.bp.blogspot.com/-QqBeVbbjrpY/T4csQk4dtcI/AAAAAAAAAYw/6btxopsuRsM/s1600/crowd2.jpg
    • Thanks for your attention! olaf.janssen@kb.nl - @ookgezellig - slideshare.net/OlafJanssenNL