Putting it into practice: a digitisation case study - Presentation Transcript
JISC Digital Media Seminar, 15 September 2009. Putting it into practice: a digitisation case study. Grant Young, Digitisation & Digital Preservation Specialist, Cambridge University Library
Content What did we create? JSTOR collection http://www.jstor.org/
Libraries | Collections | Pamphlets (approx.)
Durham | Earls Grey Family collection | 1,000
Liverpool | Earls of Derby (Knowsley) Family collection | 1,500
Newcastle | Joseph Cowen (1829-1900) Personal collection | 1,500
UCL | Joseph Hume (1777-1855) Personal collection | 5,000
Manchester | Foreign Office & Colonial Office collections; Government collections; Local and anti-slavery collections | 5,000
Bristol | Selections from 19th Century collection | 5,000
LSE | Selections from 19th Century collection | 7,000
Total | | 26,000
Project How did we do it? Scoping study http://www.jisc.ac.uk/publications/documents/pub_digi_scopingstudy.aspx Project plan http://www.jisc.ac.uk/media/documents/programmes/digitisation/pampp.pdf Final report http://www.britishpamphlets.org.uk/docs/about/PamphletsFinalReport.pdf
Create significant content (size and scope)
Build on previous relationships and experience of partners
Digitisation infrastructure – Southampton’s BOPCRIS digitisation unit (18th Century Parliamentary Papers)
Delivery & preservation infrastructure – JSTOR
Resource discovery infrastructure – JSTOR and Mimas
Ensure discoverability [diagram labels] Pamphlet Collection; Google Scholar Search; Copac Academic & National Library Catalogue; Catalogues of libraries holding pamphlets; JSTOR’s search interface; 19th Century Pamphlets Web Guide; Pamphlet level (bibliographic); Full text search; JSTOR; Mimas; Links from other JSTOR content; Many other services, resources & collections; CrossRef, OAI…; Regular Google Search
Partners license all their content to RLUK
RLUK-JSTOR agreement for 25 years
JSTOR provides free archiving & delivery for UK (HE, FE, schools, public libraries) in exchange for commercialisation in rest of world
Only exclusive for 5 years. After this…
Libraries can deliver digital copies of their own pamphlets via open access
RLUK can enter into further agreements over use of the content
Ensure sustainability
Condition of pamphlets
Extent of duplication
Copyright status
Technical standards
Workflows, tools and timetables
Scoping study: survey by library staff; testing, samples, discussions
Workflows, tools and timetables – developed or scoped out
Technical standards
Images: 600 dpi bitonal for text; 300 dpi greyscale for images
OCR: 97-98% character accuracy
Metadata: METS – structural; MODS – bibliographic; MIX – technical*; PREMIS – preservation*
*selective use
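The slide gives an OCR target of 97-98% character accuracy but does not say how it was measured. As a minimal sketch, assuming accuracy is taken as one minus the character edit distance divided by the length of a hand-corrected reference text (a common definition, not necessarily the project's), a sample page could be checked like this. The function names and the 0.97 threshold check are illustrative only.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Assumed definition: 1 - (edit distance / reference length), floored at 0."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = levenshtein(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))


def meets_target(reference: str, ocr_output: str, target: float = 0.97) -> bool:
    """True when a sample reaches the lower bound of the 97-98% target."""
    return character_accuracy(reference, ocr_output) >= target
```

In practice the reference text would come from manually keyed samples, so a check like this only makes sense on a small sample of pages rather than the whole collection.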
Workflows
Tools
Timetables
Getting started and finished
Coping with changing standards
Ensuring sufficient production rate
Dealing with IPR issues
Maintaining relationships with multiple partners over a (relatively) long project
Smooth sailing…well, not quite!
Lessons What did we learn? Final report http://www.britishpamphlets.org.uk/docs/about/PamphletsFinalReport.pdf
Projects don’t go to plan – things will go wrong and opportunities will arise
Projects depend on people as well as technology – good communication and trust are vital
The headlines (no surprises here):
Challenge to get people in place at beginning and keep them there till end
Building/altering systems will take time and can cause delays
Preparing and transferring large amounts of content (electronic or physical!) can take longer than anticipated
Starting & finishing a project is hard work!
Scholars can view pamphlets quite differently (intellectual content vs archival objects; individual items vs collections)
Not all pamphlets and users are equal!
Librarians often treat pamphlets very differently (definition, location, binding, handling)
Sampling and piloting are helpful, but not foolproof (especially if you change parameters – e.g. from 600 bitonal to 300 grey)
Time and motion is very important – every second can count when undertaking large-scale digitisation
Must pay close attention to the workflow!
Insufficient scanning rate detected: new scanners. Need for more pamphlets detected: additional pamphlets & month extension
20 seconds to write two-page grey image to file = significant operator delay. Scanner is ready by time next page is set up
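The "every second can count" point is simple arithmetic. The sketch below uses the 20-second write delay quoted on the slide; the batch size of 100,000 images is a made-up figure for illustration, not a number from the project, and the function names are hypothetical.

```python
def images_per_hour(seconds_per_image: float) -> float:
    """Maximum images one operator can capture per hour at a fixed time per image."""
    if seconds_per_image <= 0:
        raise ValueError("seconds_per_image must be positive")
    return 3600 / seconds_per_image


def operator_hours(total_images: int, seconds_per_image: float) -> float:
    """Operator time, in hours, spent on a batch at a fixed time per image."""
    if total_images < 0 or seconds_per_image < 0:
        raise ValueError("inputs must not be negative")
    return total_images * seconds_per_image / 3600


WRITE_DELAY_SECONDS = 20       # from the slide: time to write a two-page grey image
HYPOTHETICAL_IMAGES = 100_000  # assumption for illustration only

# Hours spent purely waiting on file writes across the hypothetical batch.
waiting_hours = operator_hours(HYPOTHETICAL_IMAGES, WRITE_DELAY_SECONDS)

# Hours gained across the same batch for each second shaved off per image.
hours_per_second_saved = operator_hours(HYPOTHETICAL_IMAGES, 1)

print(f"{images_per_hour(WRITE_DELAY_SECONDS):.0f} images/hour at most")
print(f"{waiting_hours:.1f} hours waiting; {hours_per_second_saved:.1f} hours per second saved")
```

Running the same calculation on pilot timings for each candidate setting (for example bitonal versus grey capture) is one way to spot an insufficient scanning rate before full production starts.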
Some 19th century content will still be in copyright, but without considerable research it is not always possible to know
If we’d taken zero risk, we would have excluded 25% of the pamphlets
Accepting a tiny, calculated risk, we excluded less than 1%
IPR: worth taking a risk with copyright!…
Very, very complicated when there is a large consortium (12 partners), a long agreement (25 years), two jurisdictions (UK and US), and commercialisation (subscription model)
We needed 9 separate agreements and took two years to conclude them!
This can test relationships
IPR: …but not with licensing!
The METS standard can be configured in several different ways
Both MIX and PREMIS were updated in the course of the project, requiring us to adjust our system and regenerate data
Standards are not always clear and can change on you!
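To make "configured in several different ways" concrete, here is one possible arrangement, not the project's actual profile: a METS document that wraps a MODS record for the bibliographic description, lists the page images, and records page order in a physical structural map. The function name, the ID scheme, the `USE="master"` label and the file names are all assumptions for illustration, and the output has not been validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
XLINK_NS = "http://www.w3.org/1999/xlink"

ET.register_namespace("mets", METS_NS)
ET.register_namespace("mods", MODS_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_mets(title: str, page_files: list[str]) -> ET.Element:
    """Build a minimal METS tree for one pamphlet (illustrative layout only)."""
    mets = ET.Element(f"{{{METS_NS}}}mets")

    # Bibliographic metadata: a MODS record wrapped in a METS dmdSec.
    dmd = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", ID="DMD1")
    wrap = ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap", MDTYPE="MODS")
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    mods = ET.SubElement(xml_data, f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title

    # File inventory, followed by the structural map that orders the pages.
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp", USE="master")
    struct_map = ET.SubElement(mets, f"{{{METS_NS}}}structMap", TYPE="physical")
    pamphlet = ET.SubElement(
        struct_map, f"{{{METS_NS}}}div", TYPE="pamphlet", DMDID="DMD1"
    )

    for order, href in enumerate(page_files, start=1):
        file_id = f"FILE{order:04d}"  # hypothetical ID scheme
        file_el = ET.SubElement(file_grp, f"{{{METS_NS}}}file", ID=file_id)
        ET.SubElement(
            file_el,
            f"{{{METS_NS}}}FLocat",
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href},
        )
        page = ET.SubElement(
            pamphlet, f"{{{METS_NS}}}div", TYPE="page", ORDER=str(order)
        )
        ET.SubElement(page, f"{{{METS_NS}}}fptr", FILEID=file_id)

    return mets


# Example with made-up file names.
example = build_mets("An example pamphlet", ["0001.tif", "0002.tif"])
print(ET.tostring(example, encoding="unicode"))
```

Technical and preservation metadata such as MIX and PREMIS would normally sit in an `amdSec`, which this sketch omits. Decisions like whether metadata is embedded or referenced, and how files are grouped, are the kind of configuration choices on which METS implementations can differ.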
Can provide challenge when there are different priorities, cultures and timezones…
Can provide opportunity and flexibility, with a wider pool of skills and experience to draw on
Working collaboratively + and -
Large number and wide selection of 19th Century pamphlets available online
Efficient centralised scanning
Sustainable preservation and delivery
Sophisticated, distributed discovery and access
Useful models for other projects
But…
Despite challenges, we think we met project aim and objectives
Southampton has no large follow-on project
It has lost highly skilled scanning staff
It has underutilised capacity
It doesn’t have to maintain data (as in previous projects), but it does have high ongoing space/maintenance costs
… although we sustained the content, we failed to sustain the infrastructure!
Case studies: http://www.ithaka.org/ithaka-s-r/strategy/ithaka-case-studies-in-sustainability/case-studies/
SCA/Ithaka focus on sustainability
Any questions or comments? With thanks to JISC, JSTOR and the university libraries of Cambridge and Southampton, particularly Christine Fowler of Southampton
Grant presents a case study of the 19th Century Pamphlets digitisation project, covering the decisions made in planning the project, the challenges encountered, and key lessons learned.