Digitisation on demand arcadia seminar

For his Michaelmas 2010 Arcadia fellowship, Ed Chamberlain investigated ways to speed up the digitisation process in academic libraries. He identified three problem areas and explored potential solutions to each, including automated book scanners and the Espresso print-on-demand machine. The seminar will recount his findings and provide an opportunity to discuss how libraries can successfully interface with innovative technologies.

About the Speaker:

Ed Chamberlain works as Systems Development Librarian at Cambridge University Library.

His library career so far has spanned three sectors, including Oxford, the London Library and the Natural History Museum. At the Natural History Museum, Ed was involved in the early creation and development of online services based around digitised materials, including the Biodiversity Heritage Library mass-digitisation project. He has a BA in Politics from the University of East Anglia and an MA in Library and Information Management from Loughborough University.

Ed took up his current position in 2007 and has taken a lead in the redevelopment of online services and systems supporting both electronic and print library resources. Ed's professional interests include all aspects of online library and information services, especially web design trends and underlying software architecture. He is also interested in new standards of metadata, including emerging semantic web based services and open publishing models for both data and content.

Please email Michelle Heydon (mh569@cam.ac.uk) if you intend to attend.

This talk is part of the Arcadia Project Seminars series.

Date: Tuesday 3rd May 2011
Time: 18:00-19:15 - refreshments from 17:45
Venue: Old Combination Room (OCR), Wolfson College

  • This is primarily a recap of my Michaelmas 2010 fellowship. It covered a lot of ground, but I’ve tried to string tonight's talk into something coherent. If the fellowship had been based around a single question, it would most likely have been this.
  • This was the answer … 61 pages' worth of report … It's worth stating, I think, that both the report and tonight's presentation are really about what we could do in Cambridge. It's not about a choice of policy; that’s not really what the Arcadia project is about. I’ve used the UL as an example and a hook for my investigation, but a lot of it is non-specific.
  • Check room for Librarians … Take a step back and remember why we are doing this stuff. The ‘everything online’ expectation is an interesting one. By their own account, Google believe they have digitised 15 million works since 2004, which they estimate to be about 12% of everything published. There is, of course, a difference between everything and what is needed or useful, but we do need to challenge this expectation while also trying to meet it … Then we have digitisation as preservation (the rationale being to minimise access to physical items, rather than to replace physical items with digital ones), which is not currently covered under fair use in the States. My project focused on 1 and 2 as key drivers, especially in the context of Cambridge libraries and where we are right now. Three is important, but a whole career in itself.
  • To spend a term on something, I really should care ‘enough’ about it. I’ve worked on large-scale digitisation projects before, and it's great: you get a chance to completely rethink all aspects of a library service. But that seems a bit different to my experiences in Cambridge, which is still fairly traditional in its approach. And I feel that libraries in general are not fulfilling their tremendous potential here. Also, let's identify an elephant: Cambridge never signed up with Google, which is probably as much in our favour as it is not. Google grab headlines with their digitisation efforts, but it's their library partners who have to be trusted with the long-term preservation of their work. Also, they’ve recently deleted their video archive. Is there an alternative? I wondered how we could digitise usefully and sustainably without a Google-style partner. We are in this for the long term; Google may not be. Is a one-site UL, with its great 30-minute stack request, sustainable for much longer?
  • Digitisation is focused on special collections Relatively slow, manual process done to exemplar standards Not scalable to the whole collection (at an effective cost) Restrictions on material
  • So here we have a library reader: they can request material and copies of individual images, and meanwhile our digitisation proceeds alongside, and they can access the results, but the operations are not centred around them. What is stopping us growing our digitisation programme?
  • Copyright legislation Cost / time Difficulty in reading on a screen
  • So, for three barriers, three solutions … Copyright legislation => Speed up / rationalise copyright analysis; Cost / time => Explore automated book scanning; People prefer a book to a screen => Explore print on demand
  • So, in the spirit of the Arcadia fellowship, how can technology help us overcome this? What is out there right now that can help? I found three technologies and imagined a digitisation workflow based around them. All three areas of focus were based upon existing developments from somewhat outside the library sphere. I chose each specific development based only upon personal awareness.
  • Full or partial digitisation of a work, initiated from a catalogue instead of a stack request. Straight to the desktop in less than a day. Order a bound print copy as an option. If it’s a public domain work, then it is made available for all under a Creative Commons licence… In my mind, this is the untapped digital potential of any library service.
  • Expectation of modern society: retail, from Tesco to ebook purchasing. I assumed it could be self-sustaining, with no large external donor needed. Practically, is a one-site UL sustainable forever? Can our exemplar sub-30-minute stack request really continue to work for us? Ranganathan’s laws … ‘Every book its reader’: let readers choose what is digitised, rather than Google or Microsoft. ‘Save the time of the reader’: we can possibly expand this concept with digital material to present fewer barriers. If they want digital text, they should be able to get it!
  • Explore each area in turn … Visit case studies Assemble facts and figures where possible Draw out advantages and disadvantages of each piece of technology
  • 1) Copyright and copyright calculation
  • Fiendish stuff. 70 years after death is just the tip of the iceberg. Complexity slows down decisions. I would argue that complexity and risk upset risk-averse librarians: we are not hedge fund managers, and we don’t like making controversial or bad decisions, especially not in high-profile institutions like Cambridge. But we do know we can only fully digitise what is in the public domain.
  • Copyright legislation as a set of rules into which data about a work is fed Out comes a result (yes/ no/ probably) Sounds like a job for a machine, rather than a person …
  • Open Knowledge Foundation: Public Domain Works project / Europeana. Now exists as a machine-accessible API. Feed in bib data, get a result.
  • Out of the 100 samples, 76 returned an expected result given the data available; 24 were judged as incorrect. Of these, a further eight could have been correct if a safe cut-off of publication date + 150 years was assumed. Great technology to potentially assist in decision-making. As useful in asserting what is not in the public domain as in asserting what is. The data we can provide is incomplete for the whole task – for example, the author's death date may be incorrect.
  • Ion Book Saver, Kirtas, Internet Archive Scribe machine, DIY book scanner
  • Two in Cambridge at the press Used in the Cambridge libraries Collections project CUP let me take a look!
  • But with a human watching just in case … Cost saving? Still quicker than ‘by hand’
  • But images are also sent to India for a two week tidy-up Quick enough for on-demand?
  • Focus on improving access rather than preservation Would a preservation quality image be too expensive to produce for an on-demand approach? For the iPad and Kindle - text is as important as a scanned image
  • 91% of Cambridge academics surveyed would be interested in a full-text digital copy of an out-of-copyright work. 62% would be interested in a partial digital copy of an in-copyright work if available.
  • Around 19% of CUL’s collections fall within the public domain. Niche interest in this area: 2% of circulation transactions involved material from 1850-1920. In fact, a lot of this has already been digitised: some 20% of our collection is held digitally by HathiTrust (OCLC). Getting hold of this is not as straightforward as it should be; I looked at licensing and terms and conditions in the report.
  • Cheaper than current services … Total: £236.25 for 350 pages. But still not that cheap… About £25 for a 350-page work (cost modelling based around the Kirtas).
  • Survey information reveals that academic users would prefer to pay under £15 for a digitised copy. Achieving this at cost or with a small surplus would be a challenge. Attempting to recoup capital investment directly would push costs beyond this ‘sweet-spot’ price point. Should they have to pay at all? “I should add: I don't think the university should be charging academics for access to research materials. Seriously, the two questions about funds struck me as ridiculous.”
  • Final stage of the proposed workflow ….
  • Lots of interest in the machine: it gets people through the door, which is vital for any shop. Great for the long tail, not really suitable for The Da Vinci Code. Needs full-time staff to run, and occasionally needs fixing. Most staff in the store can find and print off a book for customers. Takes 5-15 minutes to print, depending on the nature of the digitisation. (PASS AROUND EXAMPLES: quality is variable.) Strong interest in self-publishing. Increasing amounts of material available from a variety of sources (Project Gutenberg, Google Books, publishers), both in and out of copyright.
  • 65% would also be interested in a print facsimile. 42% of academic respondents would be willing to pay £10-£15, 33% £15-£25.
  • £10 per 350 page volume Blackwells have a pricing model that does not recoup capital
  • High upfront cost – any model that attempts to recoup capital through charges prices itself out of the market. High upfront cost – high risk of failure. ‘Innovator’s dilemma’ – we are in effect in competition with our bread-and-butter services.
  • Aiming to hit a moving target of user expectation – iPad and Kindle have radically reshaped reader expectations of online reading Danger of early adoption – not understanding or being aware of longer term issues (ejournals)
  • Demand is high. Breakthrough technology, getting cheaper. If I’m going to reach a final conclusion, it's that we need to do something. Here are some options, but to me they don't quite stack up yet.
  • Google continues to digitise, despite legal setbacks, and gains the headlines. Users continue to digitise themselves… privately in research groups, and ‘socially’ (http://library.nu/ and other academic torrent sites). Many in academia now choose to ignore or challenge the inflexibilities of copyright to get the material they need.
  • Remove barriers: make it easier for people to get the material they need, free (or cheaply), in print or digital form. Lower costs: new approaches, new models of working. The three new librarians (Andy Burkhardt, Champlain College, VT; Catherine Johnson, University of Baltimore; and Carissa Tomlinson, Towson University, MD) who presented on the “virtues” of next-gen librarians seemed to have an abundance of audacity, however, describing solutions to common problems, like how to engage first-year students in the library, with a little ingenuity. The next-gen virtues they identified include collegiality, playfulness, collaboration, flexibility, creativity, courage, and service-orientation: characteristics that must span the profession if we are to move our libraries ahead.

Presentation Transcript

  • Digitisation and Print-on-demand Ed Chamberlain - Systems Development Librarian
  • Question:
    • How could we (better) automate the digitisation workflow?
  • Why is digitisation important to libraries?
    • Better expose existing collections to a wider audience
    • Better meet reader expectation of ‘everything online’
    • Preserve material
  • Why is it important to me?
    • Previous work on the Biodiversity Heritage Library project
    • I feel that libraries are still not fulfilling their tremendous potential here
    • Cambridge has no ‘Google books project’
        • Is there an alternative model?
    • Is a one-site UL sustainable forever?
  • What's happening now in Cambridge?
    • Digitisation is focused on special collections
      • Limited funds
      • Cambridge's USP!
    • Relatively slow, manual process done to exemplar standards
    • Not scalable (at an effective cost)
  • As it stands …
  • Barriers to digitisation
    • Barriers are not technology-centric
      • Copyright legislation
      • Cost / time
      • Difficulty in reading on a screen
  • Areas of investigation …
    • Examine technological responses to barriers:
        • Copyright legislation => Speed up / rationalise copyright analysis
        • Cost /time => Explore automated book scanning
        • People prefer a book to a screen => Explore print on demand
  • What exactly?
        • Copyright legislation => Speed up the copyright analysis => COPYRIGHT CALCULATOR
        • Cost / time => Explore automated book scanning => KIRTAS AUTOMATIC BOOK SCANNER
        • People prefer a book to a screen => Explore print on demand => ESPRESSO BOOK PRINTING MACHINE
  • Focus of fellowship …
        • COPYRIGHT CALCULATOR
        • KIRTAS AUTOMATIC BOOK SCANNER
        • ESPRESSO BOOK PRINTING MACHINE
            • … investigate them as a basis for a potential ‘on-demand’ digitisation service
  • Imagine …
    • Full or partial digitization of a work instead of a stack request initiated from a catalogue
    • Straight to desktop in less than a day
    • Order a bound print copy as an option
    • If it’s a public domain work then made available for all, under Creative Commons License…
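A minimal sketch of how such a request might be routed, assuming a hypothetical `Request` record whose `public_domain` field comes from a copyright calculator ("yes", "no" or "probably"). The class, function and step names are illustrative, not an existing system; the one rule taken from the talk is that only public-domain works are digitised in full and then made openly available.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    """Hypothetical reader request raised from a catalogue record."""
    title: str
    public_domain: str  # "yes", "no" or "probably", e.g. from a copyright calculator
    full_copy: bool     # True = whole work, False = a partial copy (e.g. a chapter)
    wants_print: bool   # reader also wants a bound print copy


def plan_fulfilment(req: Request) -> List[str]:
    """Return the ordered steps an on-demand service might take (illustrative only)."""
    if req.full_copy and req.public_domain != "yes":
        # Only public-domain works can be digitised in full; anything less
        # certain goes to a person for a rights check instead.
        return ["refer for manual rights check"]
    steps = [
        "scan on automated book scanner",
        "deliver PDF to reader's desktop (target: under a day)",
    ]
    if req.wants_print:
        # Print rights for partial in-copyright copies are not modelled here.
        steps.append("print bound copy on demand")
    if req.public_domain == "yes":
        # Public-domain scans are shared with everyone, not just the requester.
        steps.append("publish openly under a Creative Commons licence")
    return steps


print(plan_fulfilment(Request("An 1870s monograph", "yes", True, True)))
```

Sending uncertain full-copy requests to a person, rather than guessing, reflects the risk-averse stance described in the copyright section.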
  • Full digitisation at readers’ request …
  • Why ‘On-Demand’?
    • Expectation of modern society
    • Self-sustaining – if the reader pays the cost of digitisation, no large external donor is needed
    • ‘Every book its reader’
    • ‘Save the time of the reader’
  • Fellowship methodology …
    • Explore each area in turn …
    • Visit case studies
    • Assemble facts and figures where possible
    • Draw out advantages and disadvantages of each piece of technology
    • 1) Copyright and copyright calculation
  • Basic problems with copyright:
    • Fiendish stuff
    • Complexity slows down decisions
    • Upsets risk-averse Librarians
    • We can only fully digitise what is in the Public Domain
  • Scope for automation
    • Copyright legislation as a set of rules into which data about a work is fed
    • Out comes a result (yes/ no/ probably)
    • Sounds like a job for a machine, rather than a person …
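To illustrate the ‘rules in, answer out’ idea, here is a deliberately simplified sketch, assuming a single-author work, a life + 70 years term and, where no death date is known, the publication date + 150 years safe cut-off mentioned in the results. The function `public_domain_status` is hypothetical: it is not the Open Knowledge Foundation calculator or its API, and real copyright rules have many more cases.

```python
from typing import Optional


def public_domain_status(death_year: Optional[int], pub_year: Optional[int],
                         current_year: int, term: int = 70,
                         safe_cutoff: int = 150) -> str:
    """Return "yes", "no", "probably" or "unknown" for a single-author work.

    Simplified, hypothetical rules:
    * known death date: protected until the end of the year `term` years after death;
    * no death date: "probably" public domain once `safe_cutoff` years have
      passed since publication;
    * otherwise: "unknown", i.e. further research is needed.
    """
    if death_year is not None:
        return "yes" if current_year > death_year + term else "no"
    if pub_year is not None and current_year > pub_year + safe_cutoff:
        return "probably"
    return "unknown"


# Under a life + 70 rule, an author who died in 1940 is public domain from 1 January 2011.
print(public_domain_status(1940, 1920, 2011))
```

Returning "unknown" rather than "no" when data is missing keeps the calculator honest about the incomplete bibliographic data it is fed.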
  • … Exactly what others have thought
    • Open Knowledge Foundation - Public domain works project / Europeana
    • Now exists as a machine accessible API
    • Feed in bib data - get a result
  • Conclusions on copyright calculation
    • Out of the 100 samples, 76 returned an expected result given the data available (further 8 could have been useful if a safe cut off point was added)
    • Great technology to potentially assist in decision making
    • As useful in asserting what is not in the public domain, as opposed to what is
    • Data we can provide is incomplete for the task – sometimes further research will be needed
    • Great feature for a library catalogue - kick off an ordering process
    • 2) Digitisation-on-demand
  • Not that difficult to copy a book quickly…
  • Why Kirtas?
    • Two in Cambridge at the press
    • Used in the Cambridge libraries Collections project
    • CUP let me take a look!
  • Kirtas video …
    • http://www.youtube.com/watch?v=V03s5oJDwwc
  • Automated page turning …
    • But with a human watching just in case …
    • Cost saving?
    • Still quicker than ‘by hand’
  • Automated post processing …
    • But images are also sent to India for a two week tidy-up
    • Quick enough for on-demand?
  • What level of quality is sufficient for a library surrogate?
      • Focus on improving access rather than preservation
      • Would a preservation quality image be too expensive to produce for an on-demand approach?
      • For the iPad and Kindle - text is as important as a scanned image
  • Demand for this kind of thing?
    • 91% (56/61) of Cambridge academics surveyed would be interested in a full-text digital copy of an out-of-copyright work
    • 62% (36/58) would be interested in a partial digital copy of an in-copyright work if available
  • What can we copy? Estimations of University of Cambridge holdings within the public domain (R. Pollock, 2009)
      Pub. Date      Items  % PD   No. PD
      1400-1850    304,587   100  304,587
      1850-1860     40,970   100   40,970
      1860-1870     43,734   100   43,734
      1870-1880     50,564    95   48,035
      1880-1890     66,857    90   60,171
      1890-1900     66,883    85   56,850
      1900-1910     70,360    65   45,734
      1910-1920     60,489    40   24,195
      1920-1930     78,670    25   19,667
      1930-1940     90,576    10    9,057
      1940-1950     72,692     6    4,361
      1950-1960    118,251     0        0
      1960-1970    262,974     0        0
      1970-2009  2,130,509     0        0
      Total      3,458,116    19  657,361
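The headline 19% can be re-derived from these figures as a weighted sum: items in each period multiplied by that period's estimated public-domain share. A minimal sketch, assuming each period's count is rounded down to whole items (which reproduces the published numbers); `HOLDINGS` and `public_domain_estimate` are illustrative names.

```python
from typing import List, Tuple

# (publication period, items held, estimated % in the public domain),
# taken from R. Pollock's 2009 estimate for Cambridge University Library.
HOLDINGS: List[Tuple[str, int, int]] = [
    ("1400-1850", 304_587, 100),
    ("1850-1860", 40_970, 100),
    ("1860-1870", 43_734, 100),
    ("1870-1880", 50_564, 95),
    ("1880-1890", 66_857, 90),
    ("1890-1900", 66_883, 85),
    ("1900-1910", 70_360, 65),
    ("1910-1920", 60_489, 40),
    ("1920-1930", 78_670, 25),
    ("1930-1940", 90_576, 10),
    ("1940-1950", 72_692, 6),
    ("1950-1960", 118_251, 0),
    ("1960-1970", 262_974, 0),
    ("1970-2009", 2_130_509, 0),
]


def public_domain_estimate(rows: List[Tuple[str, int, int]]) -> Tuple[int, int]:
    """Return (total items, estimated public-domain items) across all periods."""
    total = sum(items for _, items, _ in rows)
    # Integer arithmetic, rounding each period down to whole items.
    in_pd = sum(items * pct // 100 for _, items, pct in rows)
    return total, in_pd


total, in_pd = public_domain_estimate(HOLDINGS)
print(f"{in_pd:,} of {total:,} items ({in_pd / total:.0%}) estimated in the public domain")
# 657,361 of 3,458,116 items (19%) estimated in the public domain
```

Note how the 1970-2009 band alone holds well over half the collection at 0%, which is why the public-domain share stays small.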
  • What can we copy?
    • Around 19% of CUL’s collections fall within the public domain
    • Niche interest in this area - 2% of circulation transactions involved material from 1850-1920
  • How much does it cost?
    • Cheaper than current services …
      • Imaging option: Photocopy/Scan
      • Image Type: A4 300 dpi (pdf)
      • Image production (350 images at £0.50): £175.00
      • Service charge (15%): £26.25
      • VAT (20%): £35.00
      • Total: £236.25 for 350 pages
    • But still not that cheap…
      • About £30 for a 350 page work
      • (cost modelling based around the Kirtas manned by imaging services staff)
      • No capital recoup in that model
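The imaging-services total above can be checked with a few lines of arithmetic. This sketch uses `Decimal` to avoid floating-point rounding and, as the slide's figures imply, applies both the 15% service charge and the 20% VAT to the image-production figure alone; the function name `imaging_quote` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict


def imaging_quote(pages: int,
                  price_per_image: Decimal = Decimal("0.50"),
                  service_rate: Decimal = Decimal("0.15"),
                  vat_rate: Decimal = Decimal("0.20")) -> Dict[str, Decimal]:
    """Break down an imaging-services quote: image production, service charge, VAT."""
    production = price_per_image * pages
    # As the slide's figures imply, both charges are taken on image production alone.
    service = production * service_rate
    vat = production * vat_rate
    total = production + service + vat
    pence = Decimal("0.01")
    parts = {"production": production, "service": service, "vat": vat, "total": total}
    return {name: value.quantize(pence, rounding=ROUND_HALF_UP) for name, value in parts.items()}


quote = imaging_quote(350)
print(quote["total"])  # 236.25
```

That works out at roughly 67.5p per page, against the slide's estimate of about £30 for the same 350-page work on a Kirtas staffed by imaging services.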
  • How much would readers pay?
        • Survey information reveals that 66% (36/54) of academic users would prefer to pay under £15 for a digitised copy
        • Achieving this at cost or with a small surplus would be a challenge
        • Attempting to recoup capital investment directly would push costs beyond a ‘sweet-spot’ price point
        • Should they have to pay at all?
  • Conclusions for digitisation-on-demand
    • Great technology, nice idea, some demand
    • Somewhat limited as an effective service by size of public domain
    • Large upfront costs if Kirtas purchased
    • Other cost models available (lease hire, outsource)
    • 3) Print-on-demand
  • Print on demand
    • Nothing new for publishing
    • Espresso Book Machine is the most exciting thing out there
  • EBM video
    • http://www.youtube.com/watch?v=Q946sfGLxm4
  • Blackwells Experience
    • Lots of interest
    • Needs full time staff to run
    • Strong interest in self publishing (theses)
    • Increasing amounts of material available from a variety of sources
      • (Project Gutenberg, Google Books, publishers)
  • Utah Experience
        • “It undermines the need for traditional subject selection, disrupting a major sub-discipline of librarianship. By doing so, it also undermines the rationale for a large research collection—if the purpose of the collection is to meet patrons’ information needs, and if they can now be met without buying and housing a large just-in-case collection, then how do we defend the unbelievably expensive and arguably quite wasteful practice of traditional collection building?”
    Rick Anderson, Marriott Library, University of Utah
  • Utah Experience
    • “Undermines the need for publishers to print speculative runs of new books, thus potentially changing in a drastic way the logistics of the publishing world. In a rational marketplace, every bookstore would have an EBM or something that works on the same principle, and books would only be printed at the point of demand and purchase”
    • “Obviously, its full potential has yet to be realized—but the fundamental model is now in place. What are left to fix (bad metadata, incomplete catalog, rights issues, etc.) are the details. In most cases, fixing them will require only money and effort, and as roadblocks go those are relatively simple ones”
    Rick Anderson, Marriott Library, University of Utah
  • Demand?
    • 65% would also be interested in a print facsimile
    • 42% of academic respondents would be willing to pay £10-£15, 33% £15-£25
  • Costs?
    • £10 per 350 page volume
    • Blackwells have a pricing model that does not recoup capital
    • Final thoughts …
  • Conclusions for both print and digitisation
    • High upfront cost – any model that attempts to recoup capital through charges prices itself out of the market
    • High upfront cost – high risk of failure
    • ‘Innovator’s dilemma’ - we are in effect in competition with our bread-and-butter services
  • Conclusions for both print and digitisation
    • Aiming to hit a moving target of user expectation
    • Danger of early adoption – not understanding or being aware of longer term issues (ejournals)
  • Conclusions for both print and digitisation
    • Demand is high
    • Breakthrough technology – getting cheaper
  • Are libraries losing digital customers by playing fair?
    • Google continues to digitise, despite legal setbacks, and gains the headlines
    • Users continue to digitise themselves…
      • Privately in research groups
      • ‘Socially’ (http://library.nu/ and other academic torrent sites)
    • Many in academia now choose to ignore or challenge the inflexibilities of copyright to get the material they need
    • Remove barriers - make it easier for people to get the material they need, free (or cheaply)
    • Lower costs – new approaches, new models of working
    How could we respond?
  • Ed Chamberlain
    • [email_address]
    • @edchamberlain
    This work is licensed under a Creative Commons Attribution 3.0 Unported License.