
Digitisation on demand: Arcadia seminar



For his Michaelmas 2010 Arcadia fellowship, Ed Chamberlain investigated ways to speed up the digitisation process in academic libraries. He identified three problem areas and explored the issues surrounding a potential solution to each, including automated book scanners and the Espresso print-on-demand machine. The seminar will recount his findings and provide an opportunity to discuss how libraries can successfully engage with innovative technologies.

About the Speaker:

Ed Chamberlain works as Systems Development Librarian at Cambridge University Library.

His library career so far has spanned three sectors, including posts at Oxford, the London Library and the Natural History Museum. At the Natural History Museum, Ed was involved in the early creation and development of online services based around digitised materials, including the Biodiversity Heritage Library mass-digitisation project. He has a BA in Politics from the University of East Anglia and an MA in Library and Information Management from Loughborough University.

Ed took up his current position in 2007 and has taken a lead in redeveloping the online services and systems that support both electronic and print library resources. Ed's professional interests include all aspects of online library and information services, especially web design trends and underlying software architecture. He is also interested in new metadata standards, including emerging Semantic Web-based services, and in open publishing models for both data and content.

Please email your intent to attend to Michelle Heydon, mh569@cam.ac.uk

This talk is part of the Arcadia Project Seminars series.

Date: Tuesday 3rd May 2011
Time: 18:00-19:15 - refreshments from 17:45
Venue: Old Combination Room (OCR), Wolfson College


  1. 1. Digitisation and Print-on-demand: Ed Chamberlain, Systems Development Librarian
  2. 2. Question: <ul><li>How could we (better) automate the digitisation workflow? </li></ul>
  3. 4. Why is digitisation important to libraries? <ul><li>Better expose existing collections to a wider audience </li></ul><ul><li>Better meet reader expectation of ‘everything online’ </li></ul><ul><li>Preserve material </li></ul>
  4. 5. Why is it important to me? <ul><li>Previous work on the Biodiversity Heritage Library project </li></ul><ul><li>I feel that libraries are still not fulfilling their tremendous potential here </li></ul><ul><li>Cambridge has no ‘Google Books project’ </li></ul><ul><ul><ul><li>Is there an alternative model? </li></ul></ul></ul><ul><li>Is a one-site UL sustainable forever? </li></ul>
  5. 6. What's happening now in Cambridge? <ul><li>Digitisation is focused on special collections </li></ul><ul><ul><li>Limited funds </li></ul></ul><ul><ul><li>Cambridge's USP! </li></ul></ul><ul><li>Relatively slow, manual process done to exemplar standards </li></ul><ul><li>Not scalable (at an effective cost) </li></ul>
  6. 7. As it stands …
  7. 8. Barriers to digitisation <ul><li>Barriers are not technology-centric </li></ul><ul><ul><li>Copyright legislation </li></ul></ul><ul><ul><li>Cost / time </li></ul></ul><ul><ul><li>Difficulty in reading on a screen </li></ul></ul>
  8. 9. Areas of investigation … <ul><li>Examine technological responses to barriers: </li></ul><ul><ul><ul><li>Copyright legislation => Speed up / rationalise copyright analysis </li></ul></ul></ul><ul><ul><ul><li>Cost / time => Explore automated book scanning </li></ul></ul></ul><ul><ul><ul><li>People prefer a book to a screen => Explore print on demand </li></ul></ul></ul>
  9. 10. What exactly? <ul><ul><ul><li>Copyright legislation => Speed up the copyright analysis => COPYRIGHT CALCULATOR </li></ul></ul></ul><ul><ul><ul><li>Cost / time => Explore automated book scanning => KIRTAS AUTOMATIC BOOK SCANNER </li></ul></ul></ul><ul><ul><ul><li>People prefer a book to a screen => Explore print on demand => ESPRESSO BOOK PRINTING MACHINE </li></ul></ul></ul>
  10. 11. Focus of fellowship … <ul><ul><ul><li>COPYRIGHT CALCULATOR </li></ul></ul></ul><ul><ul><ul><li>KIRTAS AUTOMATIC BOOK SCANNER </li></ul></ul></ul><ul><ul><ul><li>ESPRESSO BOOK PRINTING MACHINE </li></ul></ul></ul><ul><ul><ul><ul><ul><li>… investigate them as a basis for a potential ‘on-demand’ digitisation service </li></ul></ul></ul></ul></ul>
  11. 12. Imagine … <ul><li>Full or partial digitisation of a work, initiated from the catalogue instead of a stack request </li></ul><ul><li>Straight to the desktop in less than a day </li></ul><ul><li>Order a bound print copy as an option </li></ul><ul><li>If it is a public domain work, it is then made available to all under a Creative Commons licence… </li></ul>
  12. 13. Full digitisation at readers’ request …
  13. 14. Why ‘On-Demand’? <ul><li>Expectation of modern society </li></ul><ul><li>Self-sustaining: if the reader pays the cost of digitisation, no large external donor is needed </li></ul><ul><li>‘Every book its reader’ </li></ul><ul><li>‘Save the time of the reader’ </li></ul>
  14. 15. Fellowship methodology … <ul><li>Explore each area in turn … </li></ul><ul><li>Visit case studies </li></ul><ul><li>Assemble facts and figures where possible </li></ul><ul><li>Draw out advantages and disadvantages of each piece of technology </li></ul>
  15. 16. <ul><li>1) Copyright and copyright calculation </li></ul>
  16. 17. Basic problems with copyright: <ul><li>Fiendish stuff </li></ul><ul><li>Complexity slows down decisions </li></ul><ul><li>Upsets risk-averse librarians </li></ul><ul><li>We can only fully digitise what is in the public domain </li></ul>
  17. 18. Scope for automation <ul><li>Copyright legislation as a set of rules into which data about a work is fed </li></ul><ul><li>Out comes a result (yes / no / probably) </li></ul><ul><li>Sounds like a job for a machine, rather than a person … </li></ul>
  18. 19. … Exactly what others have thought <ul><li>Open Knowledge Foundation - Public domain works project / Europeana </li></ul><ul><li>Now exists as a machine-accessible API </li></ul><ul><li>Feed in bib data, get a result </li></ul>
  19. 22. Conclusions on copyright calculation <ul><li>Out of the 100 samples, 76 returned an expected result given the data available (a further 8 could have been useful if a safe cut-off point had been added) </li></ul><ul><li>Great technology to potentially assist in decision making </li></ul><ul><li>As useful for establishing what is not in the public domain as for what is </li></ul><ul><li>Data we can provide is incomplete for the task: sometimes further research will be needed </li></ul><ul><li>Great feature for a library catalogue: it could kick off an ordering process </li></ul>
  20. 23. <ul><li>2) Digitisation-on-demand </li></ul>
  21. 24. Not that difficult to copy a book quickly…
  22. 25. Why Kirtas? <ul><li>Two in Cambridge at the Press </li></ul><ul><li>Used in the Cambridge libraries Collections project </li></ul><ul><li>CUP let me take a look! </li></ul>
  23. 26. Kirtas video … <ul><li>http://www.youtube.com/watch?v=V03s5oJDwwc </li></ul>
  24. 27. Automated page turning … <ul><li>But with a human watching just in case … </li></ul><ul><li>Cost saving? </li></ul><ul><li>Still quicker than ‘by hand’ </li></ul>
  25. 28. Automated post-processing … <ul><li>But images are also sent to India for a two-week tidy-up </li></ul><ul><li>Quick enough for on-demand? </li></ul>
  26. 29. What level of quality is sufficient for a library surrogate? <ul><ul><li>Focus on improving access rather than preservation </li></ul></ul><ul><ul><li>Would a preservation-quality image be too expensive to produce for an on-demand approach? </li></ul></ul><ul><ul><li>For the iPad and Kindle, text is as important as a scanned image </li></ul></ul>
  27. 30. Demand for this kind of thing? <ul><li>91% (56/61) of Cambridge academics surveyed would be interested in a full-text digital copy of an out-of-copyright work </li></ul><ul><li>62% (36/58) would be interested in a partial digital copy of an in-copyright work if available </li></ul>
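  28. 31. What can we copy? Estimations of University of Cambridge holdings within the public domain (R. Pollock, 2009)
      <table>
      <tr><th>Pub. date</th><th>Items</th><th>% PD</th><th>No. PD</th></tr>
      <tr><td>1400-1850</td><td>304,587</td><td>100</td><td>304,587</td></tr>
      <tr><td>1850-1860</td><td>40,970</td><td>100</td><td>40,970</td></tr>
      <tr><td>1860-1870</td><td>43,734</td><td>100</td><td>43,734</td></tr>
      <tr><td>1870-1880</td><td>50,564</td><td>95</td><td>48,035</td></tr>
      <tr><td>1880-1890</td><td>66,857</td><td>90</td><td>60,171</td></tr>
      <tr><td>1890-1900</td><td>66,883</td><td>85</td><td>56,850</td></tr>
      <tr><td>1900-1910</td><td>70,360</td><td>65</td><td>45,734</td></tr>
      <tr><td>1910-1920</td><td>60,489</td><td>40</td><td>24,195</td></tr>
      <tr><td>1920-1930</td><td>78,670</td><td>25</td><td>19,667</td></tr>
      <tr><td>1930-1940</td><td>90,576</td><td>10</td><td>9,057</td></tr>
      <tr><td>1940-1950</td><td>72,692</td><td>6</td><td>4,361</td></tr>
      <tr><td>1950-1960</td><td>118,251</td><td>0</td><td>0</td></tr>
      <tr><td>1960-1970</td><td>262,974</td><td>0</td><td>0</td></tr>
      <tr><td>1970-2009</td><td>2,130,509</td><td>0</td><td>0</td></tr>
      <tr><td>Total</td><td>3,458,116</td><td>19</td><td>657,361</td></tr>
      </table>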
  29. 32. What can we copy? <ul><li>Around 19% of CUL’s collections fall within the public domain </li></ul><ul><li>Niche interest in this area: 2% of circulation transactions involved material from 1850-1920 </li></ul>
  30. 33. How much does it cost? <ul><li>Cheaper than current services … </li></ul><ul><ul><li>Imaging option: Photocopy/Scan </li></ul></ul><ul><ul><li>Image type: A4 300 dpi (PDF) </li></ul></ul><ul><ul><li>Image production (350 images at £0.50): £175.00 </li></ul></ul><ul><ul><li>Service charge (15%): £26.25 </li></ul></ul><ul><ul><li>VAT (20%): £35.00 </li></ul></ul><ul><ul><li>Total: £236.25 for 350 pages </li></ul></ul><ul><li>But still not that cheap… </li></ul><ul><ul><li>About £30 for a 350-page work </li></ul></ul><ul><ul><li>(cost modelling based on a Kirtas operated by Imaging Services staff) </li></ul></ul><ul><ul><li>No capital recouped in that model </li></ul></ul>
  31. 34. How much would readers pay? <ul><ul><ul><li>Survey information reveals that 66% (36/54) of academic users would prefer to pay under £15 for a digitised copy </li></ul></ul></ul><ul><ul><ul><li>Achieving this at cost or with a small surplus would be a challenge </li></ul></ul></ul><ul><ul><ul><li>Attempting to recoup capital investment directly would push costs beyond a ‘sweet-spot’ price point </li></ul></ul></ul><ul><ul><ul><li>Should they have to pay at all? </li></ul></ul></ul>
  32. 35. Conclusions for digitisation-on-demand <ul><li>Great technology, nice idea, some demand </li></ul><ul><li>Somewhat limited as an effective service by the size of the public domain </li></ul><ul><li>Large upfront costs if a Kirtas is purchased </li></ul><ul><li>Other cost models available (lease hire, outsource) </li></ul>
  33. 36. <ul><li>3) Print-on-demand </li></ul>
  34. 37. Print on demand <ul><li>Nothing new for publishing </li></ul><ul><li>Espresso Book Machine is the most exciting thing out there </li></ul>
  35. 38. EBM video <ul><li>http://www.youtube.com/watch?v=Q946sfGLxm4 </li></ul>
  36. 39. Blackwell's Experience <ul><li>Lots of interest </li></ul><ul><li>Needs full-time staff to run </li></ul><ul><li>Strong interest in self-publishing (theses) </li></ul><ul><li>Increasing amounts of material available from a variety of sources </li></ul><ul><ul><li>(Project Gutenberg, Google Books, publishers) </li></ul></ul>
  37. 40. Utah Experience <ul><ul><ul><li>“It undermines the need for traditional subject selection, disrupting a major sub-discipline of librarianship. By doing so, it also undermines the rationale for a large research collection—if the purpose of the collection is to meet patrons’ information needs, and if they can now be met without buying and housing a large just-in-case collection, then how do we defend the unbelievably expensive and arguably quite wasteful practice of traditional collection building?” </li></ul></ul></ul>Rick Anderson, Marriott Library, University of Utah
  38. 41. Utah Experience <ul><li>“Undermines the need for publishers to print speculative runs of new books, thus potentially changing in a drastic way the logistics of the publishing world. In a rational marketplace, every bookstore would have an EBM or something that works on the same principle, and books would only be printed at the point of demand and purchase” </li></ul><ul><li>“Obviously, its full potential has yet to be realized—but the fundamental model is now in place. What are left to fix (bad metadata, incomplete catalog, rights issues, etc.) are the details. In most cases, fixing them will require only money and effort, and as roadblocks go those are relatively simple ones” </li></ul>Rick Anderson, Marriott Library, University of Utah
  39. 42. Demand? <ul><li>65% would also be interested in a print facsimile </li></ul><ul><li>42% of academic respondents would be willing to pay £10-£15, 33% £15-£25 </li></ul>
  40. 43. Costs? <ul><li>£10 per 350-page volume </li></ul><ul><li>Blackwell's has a pricing model that does not recoup capital </li></ul>
  41. 44. <ul><li>Final thoughts … </li></ul>
  42. 45. Conclusions for both print and digitisation <ul><li>High upfront cost – any model that attempts to recoup capital through charges prices itself out of the market </li></ul><ul><li>High upfront cost – high risk of failure </li></ul><ul><li>‘Innovator’s dilemma’: we are in effect in competition with our bread-and-butter services </li></ul>
  43. 46. Conclusions for both print and digitisation <ul><li>Aiming to hit a moving target of user expectation </li></ul><ul><li>Danger of early adoption – not understanding or being aware of longer-term issues (e-journals) </li></ul>
  44. 47. Conclusions for both print and digitisation <ul><li>Demand is high </li></ul><ul><li>Breakthrough technology – getting cheaper </li></ul>
  45. 48. Are libraries losing digital customers by playing fair? <ul><li>Google continues to digitise despite legal setbacks, and gains the headlines </li></ul><ul><li>Users continue to digitise material themselves… </li></ul><ul><ul><li>Privately in research groups </li></ul></ul><ul><ul><li>‘Socially’ (http://library.nu/ and other academic torrent sites) </li></ul></ul><ul><li>Many in academia now choose to ignore or challenge the inflexibilities of copyright to get the material they need </li></ul>
  46. 49. How could we respond? <ul><li>Remove barriers: make it easier for people to get the material they need for free (or cheaply) </li></ul><ul><li>Lower costs: new approaches, new models of working </li></ul>
  47. 50. Ed Chamberlain <ul><li>[email_address] </li></ul><ul><li>@edchamberlain </li></ul>This work is licensed under a Creative Commons Attribution 3.0 Unported License.
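The ‘Imagine …’ slide describes a full or partial copy requested from the catalogue. The copy goes straight to the desktop, a bound print copy is optional, and public domain works are released openly.

Below is a minimal Python sketch of how such a request might be routed. Several parts are assumptions:

* The function name and step labels are hypothetical.
* The status values ('yes', 'no', 'probably', 'unknown') echo the calculator's ‘yes / no / probably’ with an added 'unknown'.
* An extract of any work is assumed to be suppliable.
* A full copy is assumed to need a definite public domain result.

```python
from typing import List


def plan_request(copyright_status: str, full_copy: bool, wants_print: bool = False) -> List[str]:
    """Return hypothetical workflow steps for a copy request raised from the catalogue.

    copyright_status is assumed to come from a copyright calculator and to be
    one of 'yes' (public domain), 'no', 'probably' or 'unknown'.
    """
    if full_copy and copyright_status != "yes":
        # Only public domain works can be digitised in full, so anything
        # less certain is passed to a person for review.
        return ["refer to staff for copyright review"]
    # Assumption: an extract can be scanned and delivered whatever its status.
    steps = ["scan", "deliver to reader's desktop"]
    if full_copy:
        # Reaching here means the whole work is in the public domain.
        if wants_print:
            steps.append("print bound copy")
        steps.append("publish openly under a Creative Commons licence")
    return steps
```

For example, `plan_request("probably", full_copy=True)` stops at staff review. This reflects the risk-averse stance the talk describes: the machine narrows the decision, but a person makes the final call.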
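The ‘Scope for automation’ slide treats copyright legislation as a set of rules into which data about a work is fed, producing a yes / no / probably result. The sketch below illustrates the idea with one deliberately simplified rule.

* It assumes a life + 70 years term, counted to the end of the calendar year.
* Real legislation has many more cases (anonymous, unpublished and corporate works among them), so this is not a working calculator.
* The function name is hypothetical.

```python
from typing import Optional

TERM_YEARS = 70  # assumption: a simplified "life + 70 years" term with no exceptions


def copyright_status(death_year: Optional[int], current_year: int,
                     death_year_approximate: bool = False) -> str:
    """Return 'yes' (public domain), 'no', 'probably' or 'unknown'."""
    if death_year is None:
        return "unknown"  # incomplete data: further research needed
    # The term runs to the end of the 70th calendar year after death, so the
    # work enters the public domain on 1 January of death_year + 71.
    if current_year <= death_year + TERM_YEARS:
        return "no"  # also the conservative answer when the date is approximate
    return "probably" if death_year_approximate else "yes"
```

Under this rule, an author who died in 1940 is out of copyright in 2011, while one who died in 1941 is not until 2012.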
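‘Feed in bib data, get a result’ depends on the record holding the data the rules need. The copyright conclusions note that the data libraries can provide is incomplete, and author death dates may simply be absent.

As an illustration, the sketch below pulls life dates from a personal-name heading written in the common ‘Surname, Forename, 1812-1870.’ style. In MARC 21 records such dates usually sit in field 100, subfield $d. The function name and patterns are hypothetical and cover only a few common forms.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns for life dates at the end of a personal-name heading,
# e.g. "Dickens, Charles, 1812-1870." or "Smith, John, 1950-" (no death date).
_RANGE = re.compile(r"(\d{4})\??-(\d{4})?")
_DIED = re.compile(r"\bd\.\s*(\d{4})")


def life_dates(heading: str) -> Tuple[Optional[int], Optional[int], bool]:
    """Return (birth_year, death_year, approximate) parsed from a name heading."""
    # "ca." or "?" in the heading marks at least one date as uncertain.
    approximate = "ca." in heading or "?" in heading
    m = _RANGE.search(heading)
    if m:
        birth = int(m.group(1))
        death = int(m.group(2)) if m.group(2) else None
        return birth, death, approximate
    m = _DIED.search(heading)  # forms such as "d. 1901"
    if m:
        return None, int(m.group(1)), approximate
    return None, None, approximate  # no usable dates: research needed
```

A heading with no dates returns `(None, None, False)`. A rule-based calculator would then have to report the status as unknown rather than guess.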
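The copyright calculation conclusions note that a further 8 of the 100 samples could have been useful if a safe cut-off point had been added. One possible reading is a fallback on publication date when no death date is known. This is an assumption, not the method used in the fellowship.

The 150-year default is hypothetical. An author who published at 20 and lived to 100 died 80 years after publication, and the 70-year term on top gives 150. The function name is also hypothetical.

```python
from typing import Optional

# Hypothetical cut-off: publication at age 20, death at 100 (80 years later),
# plus a 70-year term gives 150 years from publication.
SAFE_CUTOFF_YEARS = 150


def apply_safe_cutoff(status: str, pub_year: Optional[int], current_year: int,
                      cutoff_years: int = SAFE_CUTOFF_YEARS) -> str:
    """Upgrade an 'unknown' result to 'probably' for sufficiently old publications."""
    if status != "unknown" or pub_year is None:
        return status  # only fill gaps; never override a rule-based answer
    if current_year - pub_year > cutoff_years:
        return "probably"
    return status
```

The fallback only ever yields ‘probably’, never ‘yes’. A cut-off of this kind makes a decision possible, but it does not make the decision certain.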
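The per-band figures in R. Pollock's 2009 estimate of Cambridge holdings (the ‘What can we copy?’ table) combine into the headline 19%. The sketch below reproduces only the table's arithmetic: items multiplied by the estimated percentage, truncated to whole items. Pollock's percentages are taken as given, not re-derived, and the function name is hypothetical.

```python
# (publication band, items held, estimated % in the public domain), from the table
BANDS = [
    ("1400-1850", 304_587, 100),
    ("1850-1860", 40_970, 100),
    ("1860-1870", 43_734, 100),
    ("1870-1880", 50_564, 95),
    ("1880-1890", 66_857, 90),
    ("1890-1900", 66_883, 85),
    ("1900-1910", 70_360, 65),
    ("1910-1920", 60_489, 40),
    ("1920-1930", 78_670, 25),
    ("1930-1940", 90_576, 10),
    ("1940-1950", 72_692, 6),
    ("1950-1960", 118_251, 0),
    ("1960-1970", 262_974, 0),
    ("1970-2009", 2_130_509, 0),
]


def public_domain_estimate(bands):
    """Return (total items, estimated public domain items, share as a percentage)."""
    total = sum(items for _, items, _ in bands)
    # Integer division truncates, which matches the per-band figures in the table.
    pd_items = sum(items * pct // 100 for _, items, pct in bands)
    return total, pd_items, 100 * pd_items / total


total, pd_items, share = public_domain_estimate(BANDS)
print(f"{pd_items:,} of {total:,} items ({share:.0f}%)")  # 657,361 of 3,458,116 items (19%)
```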
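The imaging services quote on the ‘How much does it cost?’ slide can be checked with exact decimal arithmetic. As itemised there, VAT appears to be charged on the £175.00 production subtotal only, and the sketch below reproduces that. The function name is hypothetical, and the default rates are the ones shown on the slide.

```python
from decimal import Decimal, ROUND_HALF_UP


def imaging_quote(pages: int,
                  per_image: Decimal = Decimal("0.50"),
                  service_rate: Decimal = Decimal("0.15"),
                  vat_rate: Decimal = Decimal("0.20")) -> Decimal:
    """Total charge as itemised on the slide (VAT on image production only)."""
    production = per_image * pages      # 350 images at £0.50 = £175.00
    service = production * service_rate  # 15% service charge = £26.25
    vat = production * vat_rate          # 20% of production = £35.00
    return (production + service + vat).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


quote = imaging_quote(350)
print(quote)  # 236.25
# Compared with the ~£30 Kirtas estimate for the same 350 pages:
print((quote / Decimal("30")).quantize(Decimal("0.1")))  # 7.9
```

Using `Decimal` rather than floats keeps the pence exact, which matters when totals are compared against a price point.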
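Several slides argue that recouping capital through per-copy charges would push prices past what readers will pay. A minimal break-even sketch makes the point using the talk's own unit costs and price points.

The function name is hypothetical. The £50,000 capital figure is a placeholder, since the talk gives no purchase price.

```python
import math
from decimal import Decimal
from typing import Optional


def copies_to_recoup(capital: Decimal, unit_cost: Decimal, price: Decimal) -> Optional[int]:
    """Copies that must be sold at `price` to recover `capital`, or None if never."""
    margin = price - unit_cost
    if margin <= 0:
        return None  # every sale is at or below cost, so capital is never recovered
    return math.ceil(capital / margin)  # round up: only whole copies can be sold


# Digitisation: about £30 per 350-page work vs. a reader price under £15
print(copies_to_recoup(Decimal("50000"), Decimal("30"), Decimal("15")))  # None
# Print on demand: £10 per volume sold at £15, the top of the £10-£15 band;
# the £50,000 capital figure is a hypothetical placeholder
print(copies_to_recoup(Decimal("50000"), Decimal("10"), Decimal("15")))  # 10000
```

The first call returns `None`: at about £30 a copy against a sub-£15 price, every sale loses money whatever the capital cost. The second shows that even a £5 margin on print needs 10,000 copies to recover the placeholder sum.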
