Muehlberger umea google


Published in: Education, Technology

  1. What libraries can learn from Google – and what they can do better
     Günter Mühlberger, University Innsbruck Library
  2. Agenda
     • Introduction
     • A story about digitisation
     • The continuation of the story
     • Some conclusions
  3. Introduction
     • Department for Digitisation and Digital Preservation
       – Founded in 2002, 14 FTE, R&D and digitisation services
       – Coordinator of several EU R&D projects in the digital library domain since 1998
       – Currently involved in several projects, e.g. IMPACT (mass digitisation of textual material, text recognition and language technologies) and PrestoPRIME (long-term preservation of audio-visual material); both projects will set up a Centre of Competence (CoC)
       – Coordinator of the library network eBooks on Demand (EOD), a digitisation-on-demand service with 30 member libraries in 13 countries
     • Several medium and large-scale digitisation projects, plus the respective applications for searching, browsing and archiving
       – Catalogue cards
       – Newspapers and newspaper clippings
       – Books and journals
     • Our mission
       – To make a valuable contribution to an up-to-date digital library
  4. A short story
     • January 2007
       – A collection of 30,000 books from a monastery library (“Servitenbibliothek”) is given to the library as a present
       – There are no spare shelves for such a collection, since a collection of German dissertations occupies the best stacks
       – Suggestion to get rid of the dissertations
       – Decision to digitise them first and then throw them away
     • During 2007
       – Several experiments with document scanners, cutting of the documents, workflows, etc.
  5. Digitisation of dissertations
     • 2008 – mid 2010
       – Real production process with two parallel document scanners and up to 70,000 pages per day, 50,000 pages on average
       – Average of 2 minutes per dissertation (110 pages), including ALL steps in the workflow
       – Convincing scan quality: tests show that OCR will be nearly perfect
       – All extra pages (supplements, tables, etc.) are handled separately
       – Cutting documents one by one is too time-consuming
       – Change of paper quality
     • Summer 2010
       – We have processed 216,000 dissertations with 24 million pages, 1,800 shelf metres
       – 400 GB of image data (TIFF IV bitonal)
       – Overall time invested: 8,000 hours, or 5 person-years
       – High-quality industrial equipment for less than 50,000 EUR
       – Tests of OCR processing on the 24 million pages are encouraging
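The production figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted on the slide. The variable names are illustrative, the inputs are rounded, and the per-page size assumes decimal gigabytes (1 GB = 1,000,000 KB), which the slide does not specify.

```python
# Figures as quoted on the slide (rounded values from the source).
dissertations = 216_000
pages = 24_000_000
hours_invested = 8_000
person_years = 5
image_data_gb = 400
avg_pages_per_day = 50_000

# Derived values.
pages_per_dissertation = pages / dissertations                   # about 111
minutes_per_dissertation = hours_invested * 60 / dissertations   # about 2.2
kb_per_page = image_data_gb * 1_000_000 / pages                  # about 16.7 (assumes decimal GB)
scanning_days = pages / avg_pages_per_day                        # 480 days at the average rate
hours_per_person_year = hours_invested / person_years            # 1600

print(f"pages per dissertation:   {pages_per_dissertation:.1f}")
print(f"minutes per dissertation: {minutes_per_dissertation:.1f}")
print(f"KB per page image:        {kb_per_page:.1f}")
print(f"scanning days at average: {scanning_days:.0f}")
print(f"hours per person-year:    {hours_per_person_year:.0f}")
```

The derived values agree with the slide: roughly 110 pages and 2 minutes per dissertation. The implied 480 scanning days at the average rate is a derived estimate, not a figure stated in the source.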
  6. Continuation of the story
     • How can we give access to this large collection?
       – Copyright comes in
     • Investigations into Austrian copyright law
       – We are allowed to scan for preservation purposes. OK!
       – We are allowed to store for preservation. OK!
       – We are allowed to print out a copy and use it in place of the original we had before we digitised everything. Hm!
       – We are allowed to use this copy for interlibrary loan, but need to get it back. Oops!
       – We are not allowed to make them available to the public. OK!
       – We are not allowed to make them available to our researchers and students at the university. Oops!
       – We are not allowed to make them available to other libraries owning the same dissertations. Pff!
       – We are allowed to provide access on a handful of dedicated computers at the library. Mmh!
  7. Some more considerations
     • “Making available” is a new kind of use
       – Copying, distribution, translation, exhibiting, etc. are traditional forms of use, and publisher contracts cover them
       – In 2003 (following the EU Directive on Copyright from 2001) a new kind of use was introduced: “making available”
       – Since this is a new right, “old” contracts (usually) do not cover it
       – The author is therefore the rights holder, not the publisher
       – In some countries it is more complicated (e.g. Germany), but as a rule of thumb most authors in Europe still have the right to decide by whom, when and how their digitised work will be made available to the public
     • Dissertations
       – Even simpler, since no publishers or reproduction rights organisations (RROs) are involved
       – Dissertations were printed on behalf of the authors and never distributed via the book market
  8. Our approach to copyright
     • Let the social Internet work for us
       – Dissertations will be made available online, but only the title page, table of contents and abstract/introduction will be shown to everyone
       – Under discussion: maybe also some more pages and search snippets
       – Readers will get the chance to write a short “request”: I would need this book for my scientific work, etc.
       – Readers will be encouraged to contact potential rights holders (“Do the diligent search for us”)
     • Registration mechanism
       – A prominent notice will appear: If you are the author, or if you know the author/rights holder, please help us!
       – Authors will need to register (contact details), set some options and confirm their statement
  9. Authorisation
     • Copyright options
       – Authors may want to make a general statement: Open Access, Creative Commons, All rights reserved
       – Cooperation with an authors' organisation (RRO) will make sense
       – Or they may want to make a specific statement: this library is allowed to do this and that. Then it is a simple bilateral, non-exclusive contract.
     • How to identify the rights holder?
       – Digital signatures or eCards would make life much easier
     • Current plan
       – The author provides an address
       – The author receives a letter with a list of TAN codes, which will be needed for any action within the system
       – If the author chooses to “reserve all rights”, the data are transferred to the RRO(s)
       – A minimal risk remains but is negligible
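The TAN step in the current plan can be illustrated with a minimal sketch. The slide does not say how the codes are generated or checked, so everything here is an assumption: the names `issue_tan_list` and `AuthorAccount` are hypothetical, codes are random 8-character strings, and each code is valid for exactly one action.

```python
import secrets
import string

# Assumption: codes use upper-case letters and digits; the source gives no format.
ALPHABET = string.ascii_uppercase + string.digits


def issue_tan_list(count: int = 10, length: int = 8) -> list[str]:
    """Generate `count` distinct random codes to be printed in the letter."""
    tans: set[str] = set()
    while len(tans) < count:
        tans.add("".join(secrets.choice(ALPHABET) for _ in range(length)))
    return sorted(tans)


class AuthorAccount:
    """Hypothetical record of a registered author and the unused TAN codes."""

    def __init__(self, name: str, tans: list[str]) -> None:
        self.name = name
        self._unused = set(tans)
        self.actions: list[str] = []

    def perform(self, action: str, tan: str) -> bool:
        """Carry out `action` only if `tan` is valid and unused, then consume it."""
        if tan not in self._unused:
            return False
        self._unused.remove(tan)
        self.actions.append(action)
        return True


if __name__ == "__main__":
    codes = issue_tan_list()
    account = AuthorAccount("A. Author", codes)
    print(account.perform("set licence: Open Access", codes[0]))          # True
    print(account.perform("set licence: All rights reserved", codes[0]))  # False, code already used
```

Consuming each code after a single use means an intercepted code cannot be replayed, which is one way to keep the remaining risk small when the only proof of identity is a postal address.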
  10. Our dream
      • We hope
        – That it becomes self-sustaining (a “self-runner”), where those who need the information convince those who hold the rights to provide free access, or at least to grant some access rights to libraries
        – That authors will understand why it is so important that libraries digitise current material and provide access to everyone
        – That users will understand that authors have rights (copyright and personal rights) which need to be respected
        – That RROs and publishers will understand that not everyone is interested in “making money with books written 30 years ago”, and that many are also willing to support the idea of open access
        – That thousands or tens of thousands of authors and readers will take part
  11. What we can learn from Google
      • Google's mission is to organise the information universe through technological innovation
        – Books are therefore highly important (they contain much better information than websites)
        – Digitisation of books was just one step towards the overall objective
      • If you have a mission, take the first step first and sort out the problems afterwards
        – Organise the cheapest way to scan, build your own machines, workflow, etc.
        – Make a reasonable compromise between quantity and quality
        – Be innovative (take what is there but put it together in a new way)
      • Convert problems into opportunities
        – Google almost certainly underestimated the impact of copyright
        – The settlement was probably not foreseen from the very beginning, but now it is a great business opportunity for them
        – If it comes, it will allow them to make a lot of money
      • The battle over books is won or lost on 20th-century material, not in the public domain
        – Who reads books from the 19th, 18th or 17th century?
  12. What libraries can do better
      • Libraries also need to follow their mission: to preserve the intellectual heritage of mankind and to provide free access to everyone
        – Google is not a library
        – It does many things as if it were a library (and better), but it will never become a library
        – Preservation comprises analogue AND digital preservation (they go hand in hand)
      • to digitise (collect) everything
        – Libraries are the collection holders, not Google or anyone else
        – Digitisation (and everything connected with it) has to be part of the daily business and not only of projects
        – Digitisation should be twofold: on demand AND via mass digitisation (including cutting of documents and 20th-century material)
        – A natural consequence is to also collect modern material in digital format (right from the beginning, pre-press files)
  13. What libraries can do better
      • to cooperate with each other (nationally and internationally)
        – Most libraries have the same books, even duplicates within an institution
        – Swedish books in Austria, German books in Sweden, etc.
        – Open access material will no longer belong to one library, but to everyone!
        – Therefore it definitely makes sense to cut one book and store the pages both digitally and in analogue form (acid-free box)
      • to involve readers (and rights holders)
        – Libraries have a “natural authority” which needs to be exploited as a market advantage
        – Libraries are much closer to authors and readers than anyone else, but they need to give them the chance to express themselves
        – They may be slow, old-fashioned and not at the technological forefront, but they are trusted organisations and are able to mobilise thousands or even hundreds of thousands of users
  14. Let’s go to work!