Putting it into practice: a digitisation case study

Grant presents a case study of the 19th Century Pamphlets digitisation project, covering the decisions made in planning the project, the challenges encountered, and key lessons learned.

Published in: Education, Technology

  1. 1. JISC Digital Media Seminar, 15 September 2009. Putting it into practice: a digitisation case study. Grant Young, Digitisation & Digital Preservation Specialist, Cambridge University Library
  2. 2. A “Large-Scale Digitisation Initiative” (LSDI):
     • 12 partners
     • 25 months (February 2007-February 2009)
     • 26,041 unique pamphlets, 1,000,732 pages
     • Archival dataset comprising 3 million+ files
     • £1M project (including partner contributions)
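To give a feel for the scale behind these figures, here is a rough back-of-envelope sketch. It uses only the numbers on the slide; the function name `project_scale` is hypothetical, and dividing by the full 25-month project length is an assumption, since scanning will not have run for every month of the project.

```python
# Figures taken from the slide above.
PAMPHLETS = 26_041
PAGES = 1_000_732
MONTHS = 25              # whole project duration, not scanning time alone (assumption)
FILES_MIN = 3_000_000    # "3 million+" files, so this is a lower bound


def project_scale(pamphlets=PAMPHLETS, pages=PAGES, months=MONTHS, files=FILES_MIN):
    """Return simple averages derived from the headline project figures."""
    return {
        "pages_per_pamphlet": pages / pamphlets,
        "pages_per_month": pages / months,
        "files_per_page_min": files / pages,
    }


if __name__ == "__main__":
    for name, value in project_scale().items():
        print(f"{name}: {value:,.2f}")
```

On these assumptions the averages come out at about 38 pages per pamphlet, roughly 40,000 pages per calendar month, and about three archival files per page.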
  3. 3. Project roles: Sponsor; Funder; Project Lead & Digitisation; Publishing partner; Metadata partner; Content Contributors; Project Manager; Education Officer; Website Developer
  4. 4. Content: What did we create? JSTOR collection: http://www.jstor.org/
  5. 10. Libraries | Collections | Pamphlets (approx.)
     Durham | Earls Grey Family collection | 1,000
     Liverpool | Earls of Derby (Knowsley) Family collection | 1,500
     Newcastle | Joseph Cowen (1829-1900) Personal collection | 1,500
     UCL | Joseph Hume (1777-1855) Personal collection | 5,000
     Manchester | Foreign Office & Colonial Office collections; Government collections; Local and anti-slavery collections | 5,000
     Bristol | Selections from 19th Century collection | 5,000
     LSE | Selections from 19th Century collection | 7,000
     Total | | 26,000
  6. 11. Project: How did we do it?
     • Scoping study: http://www.jisc.ac.uk/publications/documents/pub_digi_scopingstudy.aspx
     • Project plan: http://www.jisc.ac.uk/media/documents/programmes/digitisation/pampp.pdf
     • Final report: http://www.britishpamphlets.org.uk/docs/about/PamphletsFinalReport.pdf
  7. 12. Key goals in preparing project bid & plan
     • Create significant content (size and scope)
     • Build on previous relationships and experience of partners
     • Ensure good discoverability
     • Ensure good sustainability
  8. 13. Build on previous relationships and experience of partners
     • Relationships – RLUK membership
     • Metadata – RSLP/CURL 19th Century Pamphlets Cataloguing Project (1999-2002, £800K)
     • Digitisation infrastructure – Southampton’s BOPCRIS digitisation unit (18th Century Parliamentary Papers)
     • Delivery & preservation infrastructure – JSTOR
     • Resource discovery infrastructure – JSTOR and Mimas
  9. 14. Ensure discoverability (diagram labels): Pamphlet Collection; Google Scholar Search; Copac Academic & National Library Catalogue; Catalogues of libraries holding pamphlets; JSTOR’s search interface; 19th Century Pamphlets Web Guide; Pamphlet level (bibliographic); Full text search; JSTOR; Mimas; Links from other JSTOR content; Many other services, resources & collections; CrossRef, OAI…; Regular Google Search
  10. 15. Ensure sustainability
      • Partners license all their content to RLUK
      • RLUK-JSTOR agreement for 25 years
        • JSTOR provides free archiving & delivery for UK (HE, FE, schools, public libraries) in exchange for commercialisation in rest of world
      • Only exclusive for 5 years. After this…
        • Libraries can deliver digital copies of their own pamphlets via open access
        • RLUK can enter into further agreements over use of the content
  11. 16. Scoping study (methods: survey by library staff; testing, samples, discussions)
      • Condition of pamphlets
      • Extent of duplication
      • Copyright status
      • Technical standards
      • Workflows, tools and timetables
  12. 17. Scoping study
      • Condition of pamphlets – tough
      • Extent of duplication – significant
      • Copyright status – couldn’t be ignored
      • Technical standards – JSTOR’s capture standards, emerging metadata standards
      • Workflows, tools and timetables – developed or scoped out
  13. 18. Technical standards
      • Images: 600 bitonal for text; 300 grey for images
      • OCR: 97-98% character accuracy
      • Metadata:
        • METS – structural
        • MODS – bibliographic
        • MIX – technical*
        • PREMIS – preservation*
      *selective use
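The two capture settings differ more than the numbers might suggest. The sketch below compares uncompressed raster sizes, assuming that "600" and "300" are resolutions in dpi, that "grey" means 8 bits per pixel, and that a page measures 5 × 8 inches. The page size and the function name `uncompressed_bytes` are hypothetical and chosen only for illustration.

```python
import math


def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate uncompressed raster size in bytes (row padding ignored)."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return math.ceil(pixels * bits_per_pixel / 8)


# Hypothetical 5 x 8 inch page; the bit depths are assumptions.
bitonal_600 = uncompressed_bytes(5, 8, dpi=600, bits_per_pixel=1)
grey_300 = uncompressed_bytes(5, 8, dpi=300, bits_per_pixel=8)

if __name__ == "__main__":
    print(f"600 dpi bitonal: {bitonal_600:,} bytes")
    print(f"300 dpi grey:    {grey_300:,} bytes")
    print(f"ratio grey/bitonal: {grey_300 / bitonal_600:.1f}")
```

Under these assumptions a grey page carries twice the uncompressed data of a bitonal page despite the lower resolution, and the gap is usually wider after compression because bitonal text compresses very well. This is one reason a change of capture parameters can upset timings that were measured during a pilot.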
  14. 19. Workflows
  15. 20. Tools
  16. 21. Timetables
  17. 22. Smooth sailing… well, not quite!
      • Getting started and finished
      • Coping with changing standards
      • Ensuring sufficient production rate
      • Dealing with IPR issues
      • Maintaining relationships with multiple partners over a (relatively) long project
  18. 23. Lessons: What did we learn? Final report: http://www.britishpamphlets.org.uk/docs/about/PamphletsFinalReport.pdf
  19. 24. The headlines (no surprises here):
      • Projects don’t go to plan – things will go wrong and opportunities will arise
      • Projects depend on people as well as technology – good communication and trust are vital
  20. 25. Starting & finishing a project is hard work!
      • Challenge to get people in place at beginning and keep them there till end
      • Building/altering systems will take time and can cause delays
      • Preparing and transferring large amounts of content (electronic or physical!) can take longer than anticipated
  21. 26. Not all pamphlets and users are equal!
      • Scholars can view pamphlets quite differently (intellectual content vs archival objects; individual items vs collections)
  22. 27. Not all pamphlets and users are equal!
      • Librarians often treat pamphlets very differently (definition, location, binding, handling)
  23. 28. Must pay close attention to the workflow!
      • Sampling and piloting are helpful, but not foolproof (especially if you change parameters – e.g. from 600 bitonal to 300 grey)
      • Time and motion is very important – every second can count when undertaking large-scale digitisation
  24. 29. Must pay close attention to the workflow!
      • Insufficient scanning rate detected → New scanners
      • Need for more pamphlets detected → Additional pamphlets & month extension
  25. 30. Must pay close attention to the workflow!
      • 20 seconds to write two-page grey image to file = significant operator delay
      • Scanner is ready by time next page is set up
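The cost of a 20-second write can be estimated with a short calculation. This is only a sketch: the 20 seconds and the two pages per image come from the slide, while the number of grey pages (100,000 here) and the function name `write_delay_hours` are hypothetical. The `overlap_seconds` parameter models the second bullet, where the write finishes while the operator sets up the next opening.

```python
import math


def write_delay_hours(grey_pages, seconds_per_write=20.0,
                      pages_per_image=2, overlap_seconds=0.0):
    """Operator hours spent waiting for grey images to be written to file.

    overlap_seconds is the part of each write that is hidden behind
    page set-up; when it covers the whole write, the waiting time is zero.
    """
    images = math.ceil(grey_pages / pages_per_image)
    wait_per_image = max(seconds_per_write - overlap_seconds, 0.0)
    return images * wait_per_image / 3600


if __name__ == "__main__":
    # Hypothetical volume of grey pages, for illustration only.
    print(f"{write_delay_hours(100_000):.0f} operator hours waiting")
    print(f"{write_delay_hours(100_000, overlap_seconds=20):.0f} hours once writes overlap set-up")
```

With these illustrative inputs the unhidden write time adds up to several working weeks of operator waiting, which is why a delay of a few seconds per image matters at this scale.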
  26. 31. IPR: worth taking a risk with copyright!…
      • Some 19th century content will still be in copyright, but without considerable research it is not always possible to know
      • If we’d taken zero risk, we would have excluded 25% of the pamphlets
      • Accepting a tiny, calculated risk, we excluded less than 1%
  27. 32. IPR: …but not with licensing!
      • Very, very complicated when there is a large consortium (12 partners), a long agreement (25 years), two jurisdictions (UK and US), and commercialisation (subscription model)
      • We needed 9 separate agreements and took two years to conclude them!
      • This can test relationships
  28. 33. Standards are not always clear and can change on you!
      • The METS standard can be configured in several different ways
      • Both MIX and PREMIS were updated in the course of the project, requiring us to adjust our system and regenerate data
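To make the point about METS configuration concrete, here is a minimal sketch of one possible layout for a pamphlet: a MODS record wrapped in a `dmdSec`, one file entry per page image, and a physical `structMap`. It is not the project's actual profile. The function name `build_mets`, the IDs, the `USE` and `TYPE` values and the file names are all assumptions, and MIX and PREMIS sections are left out.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
MODS = "http://www.loc.gov/mods/v3"
XLINK = "http://www.w3.org/1999/xlink"

for prefix, uri in (("mets", METS), ("mods", MODS), ("xlink", XLINK)):
    ET.register_namespace(prefix, uri)


def q(namespace, tag):
    """Build a namespace-qualified tag name for ElementTree."""
    return f"{{{namespace}}}{tag}"


def build_mets(title, page_files):
    """Return a METS element with one MODS title and one page div per file."""
    root = ET.Element(q(METS, "mets"))

    # Descriptive metadata: a MODS record wrapped inside the METS document.
    dmd = ET.SubElement(root, q(METS, "dmdSec"), ID="DMD1")
    wrap = ET.SubElement(dmd, q(METS, "mdWrap"), MDTYPE="MODS")
    xml_data = ET.SubElement(wrap, q(METS, "xmlData"))
    mods = ET.SubElement(xml_data, q(MODS, "mods"))
    title_info = ET.SubElement(mods, q(MODS, "titleInfo"))
    ET.SubElement(title_info, q(MODS, "title")).text = title

    # File inventory and physical structure (one div per page image).
    file_sec = ET.SubElement(root, q(METS, "fileSec"))
    group = ET.SubElement(file_sec, q(METS, "fileGrp"), USE="master")
    struct = ET.SubElement(root, q(METS, "structMap"), TYPE="physical")
    top = ET.SubElement(struct, q(METS, "div"), TYPE="pamphlet", DMDID="DMD1")

    for order, href in enumerate(page_files, start=1):
        file_id = f"FILE{order:04d}"
        entry = ET.SubElement(group, q(METS, "file"), ID=file_id, MIMETYPE="image/tiff")
        ET.SubElement(entry, q(METS, "FLocat"), {"LOCTYPE": "URL", q(XLINK, "href"): href})
        page = ET.SubElement(top, q(METS, "div"), TYPE="page", ORDER=str(order))
        ET.SubElement(page, q(METS, "fptr"), FILEID=file_id)
    return root


if __name__ == "__main__":
    doc = build_mets("Example pamphlet", ["0001.tif", "0002.tif"])
    print(ET.tostring(doc, encoding="unicode"))
```

Even in this small example there are choices a profile has to settle, such as wrapping MODS inline or referencing it externally, using a physical or a logical structure map, and how files are grouped. Partners who settle them differently produce valid but mutually inconsistent METS.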
  29. 34. Working collaboratively + and -
      • Can provide challenge when there are different priorities, cultures and timezones…
      • Can provide opportunity and flexibility, with a wider pool of skills and experience to draw on
  30. 35. Despite challenges, we think we met project aim and objectives
      • Large number and wide selection of 19th Century pamphlets available online
      • Efficient centralised scanning
      • Sustainable preservation and delivery
      • Sophisticated, distributed discovery and access
      • Useful models for other projects
      • But…
  31. 36. … although we sustained the content, we failed to sustain the infrastructure!
      • Southampton has no large follow-on project
      • It has lost highly skilled scanning staff
      • It has underutilised capacity
      • It doesn’t have to maintain data (as in previous projects), but it does have high ongoing space/maintenance costs
  32. 37. SCA/Ithaka focus on sustainability
      • Ithaka report: http://www.ithaka.org/ithaka-s-r/strategy/sca_ithaka_sustainability_report-final.pdf
      • Case studies: http://www.ithaka.org/ithaka-s-r/strategy/ithaka-case-studies-in-sustainability/case-studies/
  33. 38. Any questions or comments? With thanks to JISC, JSTOR and the university libraries of Cambridge and Southampton, particularly Christine Fowler of Southampton
