1. Intro to Digitisation
Matthew Brack
Digitisation Project Manager
Wellcome Library
Wednesday, 24 September 2014
2. Some digitisation projects
http://eeb.chadwyck.com/
http://bit.ly/1x0bRAj
ProQuest Early European Books
UK Medical
Heritage Library
http://wellcomelibrary.org/moh/ http://bit.ly/1vSt8Ic
3. Some more digitisation projects
Reading Room / Pathways
Western Manuscripts 1000-1650
Forensics and
Sex temporary
exhibitions
4. Nature of digitisation
Got general questions about digitisation?
The answer will always be: “It depends”
Simon Tanner, Measuring the Impact of Digital Resources: http://bit.ly/1nXeI7u
8. Why are we doing this?
What Big Media Can Learn From the New York Public
Library: http://theatln.tc/1wF91NU
JISC/RLUK survey: http://bit.ly/1wF9CiC
http://www.gutenberg.org/
9.
10. You won’t digitise everything…
“We’re going to digitise everything.”
Probably not.
The first pancake is always lumpy (Russian proverb)
11. Project problems
post-mortem:
Machinery issues
Retrieval across 30
collections, 4 floors,
2 buildings, 2 states
of access
Copyright clearance
in parallel
12% of selection not
found
Display issues
13. [Flowchart: physical workflow for books in the stacks. Areas: 215B Stacks, 1.22 Storage, Conservation, Cataloguing, 1.21 Diadeis. Numbered steps (1a-1d, 2-11) run from Start: books in stacks, in scope?, online cat?, print cat?, generate shelf list, duplicate check, single shelf lists, sort by size, check out, con assess, condition? (OK / fair / poor), repair, box, to catalogue?, catalogue, digitise, update shelf list, return to shelf or 1.22 store.]
14. It’s a lot of small tasks, repeated
1. Generate unique ID
2. Create ‘scan list’
3. Create ‘review file’
4. Make unavailable to users
5. Create barcodes
6. Retrieve items
7. Insert barcodes
8. Deliver items for imaging
9. Update tracking list
[Re-work]
a. Return
b. Remove barcodes
c. Update tracking list
d. Make available to users
e. Pray for no more re-work
f. Repeat for next batch
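The repeated steps above lend themselves to a simple tracking list. The following is a minimal Python sketch of one way to model it, not a description of the Wellcome Library's actual system: the status names, the `Item` and `Batch` classes, the ID format and the behaviour of `rework` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical status names, loosely following the steps on the slide.
FORWARD = ["listed", "unavailable", "barcoded", "retrieved", "at_imaging"]
RETURN = ["returned", "barcode_removed", "available"]


@dataclass
class Item:
    shelf_mark: str          # physical identifier
    uid: str                 # unique ID generated for the project
    status: str = "listed"


@dataclass
class Batch:
    name: str
    items: dict = field(default_factory=dict)  # the tracking list, keyed by uid

    def add(self, shelf_mark: str) -> Item:
        """Generate a unique ID and put the item on the tracking list."""
        uid = f"{self.name}-{len(self.items) + 1:04d}"
        self.items[uid] = Item(shelf_mark, uid)
        return self.items[uid]

    def advance(self, uid: str) -> str:
        """Move an item to the next step and return its new status."""
        item = self.items[uid]
        steps = FORWARD + RETURN
        position = steps.index(item.status)
        if position + 1 < len(steps):
            item.status = steps[position + 1]
        return item.status

    def rework(self, uid: str) -> str:
        """Assumed behaviour: re-work restarts the item at 'unavailable'."""
        self.items[uid].status = "unavailable"
        return self.items[uid].status
```

Each call to `advance` corresponds to one small task and one update of the tracking list; `rework` pushes an item back through the same steps.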
15. Metadata is really important
• Digital objects ‘don’t exist’ without metadata – no search, no
discovery
• Metadata first, then digitisation – otherwise you don’t know what
you have, where it is, or any way of controlling it…
• On average 50% of project time is spent on metadata and
cataloguing
• Must be shaped by user need and what an organisation is
capable of delivering
• Tension between low-volume digitisation with more metadata
for a richer user experience or larger-scale digitisation with
lighter metadata attached
• Standards-based framework helpful for consistency, accuracy
and efficiency in metadata input (e.g. Dublin Core, MARC21)
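To make the "metadata first" point concrete, here is a small Python sketch that builds a Dublin Core style record and refuses to do so when core fields are missing. The namespace is the standard Dublin Core element set; the `record` wrapper element, the `REQUIRED` minimum and the sample values are assumptions for illustration, not an institutional schema.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)

# Hypothetical local minimum: no identifier or title, no digitisation.
REQUIRED = ("identifier", "title")


def dublin_core_record(fields: dict) -> str:
    """Return a simple XML record with one dc: element per field."""
    missing = [name for name in REQUIRED if not fields.get(name)]
    if missing:
        raise ValueError(f"missing required metadata: {', '.join(missing)}")
    record = ET.Element("record")  # wrapper element name is an assumption
    for name, value in fields.items():
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(record, encoding="unicode")


# Made-up example values.
xml = dublin_core_record({"identifier": "EXAMPLE.1", "title": "Example title"})
```

Checking the required fields before any imaging is scheduled is one way to enforce the order "metadata first, then digitisation".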
16.
17. Copyright and IPR
http://creativecommons.org/
Extended Collective Licensing:
http://www.legislation.gov.uk/ukdsi/2014/9780111116890
Bøkhylla project by the National Library of
Norway: www.nb.no
19. Does digi damage your stuff?
• Most damage to collections comes from handling
• Digitisation handles collections intensively in
new ways
• Survey to develop image capture approach and identify
out of scope material
• Survey detail depends on collection
• Training for photographers and digital preparators
• Actual preparation of materials (staples, openings)
• Digitisation is not preservation
22. Useful resources
THORNTON, E. (2013) Digitisation Doctor Workshop. 15th April 2013.
Available from: http://blog.wellcomelibrary.org/2013/05/resources-from-digitisation-doctor-workshop-now-available
HENSHAW, C. and KILEY, R. (2013) The Wellcome Library, Digital.
Ariadne. July 2013. Available from: http://www.ariadne.ac.uk/issue71/henshaw-kiley
JISC, Project Management for Digitisation, JISC Digital Media.
Available from: http://www.jiscdigitalmedia.ac.uk/guide/project-management-for-a-digitisation-project
BRACK, M. (2012) Bridging the Gap: Library digital collections, innovation
and the user. Thesis submitted in partial fulfilment of the requirements of
King’s College London for the Degree of Masters in Digital Asset Management.
Available from: http://nsla.org.au/publication/bridging-gap-library-digital-collections-innovation-and-user
ProQuest EEB: Commercial collaboration to digitise all our books published in Europe before 1700 (14,000 items).
Codebreakers: 1,800 modern, in-copyright books on Genetics; funded digitisation of genetics archives (we collect a lot of digital content this way).
London’s Pulse (MOH): Digitisation in the Netherlands; post-processing (incl. xml for tables) in India.
UK MHL: 19thC books from nine partner libraries, digitised at Wellcome Library by Internet Archive.
Bespoke, mixed-format digitisation done in-house by our own photographers.
Supporting exhibitions at Wellcome Collection with digitised Library content.
Also, all Western MSS to 1650.
We have also run the Digitisation Doctor workshop and six Open Days.
DigiDoc in a nutshell:
The problem with digitisation is that no one can tell you exactly how to do it.
There are too many variables: organisational culture, content type, resources
I could tell you exactly what we do and how to do it, but you wouldn’t want to do it, because it wouldn’t fit.
So for that reason I’m offering a high-level intro to some of the key elements to consider in digitisation.
Firstly, executive support and institutional buy-in are really key, so that when you hit problems, you have something to fall back on.
Strategy will dictate some key elements of your digitisation: selection, type of access (including licensing), resource allocation.
Spend time looking through your stuff and figure out who you are digitising it for.
e.g. Are you serving a niche research audience?
Are you conducting a PR exercise where you want the most bang for your buck?
Perhaps what you want is best achieved through a collaboration, even a commercial one.
e.g. Florence Nightingale project – small institutions get stuff online via BU
If you appreciate that digitisation is a project-based activity, and deal with it as such, you will tackle most of your problems up-front.
Project management (PMGMT) is best defined as “organised common sense” – sure, you may have common sense, but is it organised, and is it the same common sense that your colleagues have?
Taking the decision to acquaint yourself with PMGMT will help you to deliver your digitisation efficiently and effectively.
You can have a look at JISC’s guide to PMGMT for digitisation – it’s a great starting point.
So I could stop the presentation now, and simply say that PMGMT will take care of almost every problem you might encounter in digitisation.
Why? Because you will plan your work properly, you will identify all of the interested parties, you will know who is doing what when and how.
Digitisation cuts across a cultural institution like nothing else. In this image: a digitisation project is dictated by selection and delivery, usually all different people.
Digitisation isn’t a technical problem; it’s a cultural one within organisations.
Assuming that digitisation is part of your strategy, and assuming you have a strategy, you can expect a couple years of work for processes to bed in.
So let’s take a second to remember why we’re doing this:
1. Googlification of information – 2012 UK survey of academics – 40% start with a search engine for research; 15% start with online library catalogue; 2% visit physical library.
2. By the way, digitisation in itself isn’t innovative – digitisation started with Project Gutenberg in 1971 – it’s about what you do with it.
3. Good news = online world is open to all – where there is a will there is a way.
Best operator in this space anywhere is New York Public Library – NYPL Labs (BL have copied them) – check them out for inspiration.
Look what one man managed to achieve.
Something will stop you: condition of items, bad metadata, other computer error.
It’s important to realise this because:
Know your limits before you begin; otherwise you will waste time and resources fixing problems in order to make available things that have comparatively little value within the overall project…
And projects will go on and on and on.
Closing a digitisation project is one of the hardest things to do – identify your deliverables before you begin.
Don’t go for perfection, go for pancakes – the first pancake is always lumpy (just start with something).
When people think of digitisation, they usually think of taking pictures.
This is actually the most straightforward part of the whole process.
Yes, you need images, but there’s a lot more to it, and that’s what we’d like to draw your attention to today.
Digitisation is mostly preparation and planning.
Here is the imaging step in the workflow.
Systems workflow also comes after.
Taking a good picture is a well-established protocol – everything else won’t be when you’re starting out.
What it takes to get an item to the camera, and back to the shelf.
In digitisation you are bridging the gap between digital and physical.
A person working on a digitisation project needs a sensibility for both.
This is particularly the case with metadata:
We always have a physical identifier (i.e. shelf mark) that rides along with our folder structures and file naming until the digital object is safely ingested into our DAM.
Otherwise, how do you know what object’s images you are looking at? How do you know how to get it back for re-work?
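As an illustration of a physical identifier riding along with folder structures and file names, here is a minimal Python sketch. The naming pattern, the `safe_id` and `image_path` helpers, the `.tif` extension and the example shelf mark are all hypothetical; any real scheme would follow local conventions.

```python
import re
from pathlib import PurePosixPath


def safe_id(shelf_mark: str) -> str:
    """Turn a shelf mark into a string that is safe in file and folder names."""
    return re.sub(r"[^A-Za-z0-9]+", "_", shelf_mark).strip("_")


def image_path(root: str, shelf_mark: str, page: int) -> PurePosixPath:
    """Build root/<id>/<id>_<page>.tif so every image carries the identifier."""
    sid = safe_id(shelf_mark)
    return PurePosixPath(root) / sid / f"{sid}_{page:04d}.tif"


# Made-up shelf mark for illustration.
path = image_path("scans", "MS. 49/2", 7)
print(path)  # scans/MS_49_2/MS_49_2_0007.tif
```

Because the clean-up in `safe_id` is lossy, a tracking list would still need to record the original shelf mark alongside the derived name so that the item can be found again for re-work.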
Copyright: safe = 100 years old (publication date)
Otherwise – individual rights clearance – lots of time and resources – not a scalable exercise.
No ‘fair use’ in this country, though there is the possibility of ECL.
IP management: licences are an important consideration – this should be a strategic decision made by your organisation.
Creative Commons is leader in open licensing of creative works.
Also available: the CC0 waiver, for holders of copyright or database rights who wish to waive all their interests in a work worldwide.
Also: don’t forget sensitivity for your archives – copyright check, license application and sensitivity should all be part of your digitisation workflow.
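The 100-year rule of thumb mentioned above could sit in a workflow as a first triage step. This Python sketch only encodes that rule of thumb as quoted in these notes; the `triage` function, its labels and the treatment of unknown dates are assumptions, and anything not flagged as likely safe would still go to individual rights clearance.

```python
from datetime import date

SAFE_AGE_YEARS = 100  # rule of thumb from the notes above


def triage(publication_year, today=None):
    """Return a first-pass copyright label based on publication date alone."""
    current_year = (today or date.today()).year
    if publication_year is None:
        # Assumption: an unknown date is treated as needing clearance.
        return "needs clearance"
    if current_year - publication_year >= SAFE_AGE_YEARS:
        return "likely safe"
    return "needs clearance"
```

For example, `triage(1900, date(2014, 9, 24))` returns `"likely safe"`, while `triage(1950, date(2014, 9, 24))` returns `"needs clearance"`.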
Not as much as you might think…
If your stuff is like these images – then sure…
But usually this should not be a guiding principle of your digitisation project.
But generally your original physical material is going to last much longer than your digital manifestation – no competition.
You’ve just created a second collection of material that you need to ‘preserve’ and manage.
Preservation doesn’t mean much in a digital context – it actually contradicts the traditional usage (restricting access) – we are interested in access.
You might say, fine, we’ll restrict access to preserve our originals. Potential implications:
don’t create a self-imposed obsolescence for your physical building (there might be someone upstairs who wonders why they’re keeping London real estate for stuff that’s only available online - some people still think that ‘going digital’ equates to reducing costs)
What would your users think?
Time and again it seems that the physical originals are consulted more frequently after digitisation.
This is actually what will make or break your project – whether people know about it or not.
Social media etc. – Twitter is pretty powerful, if you only did one thing you could start with that – it’s an amazing professional resource.
Networking and collaborative opportunities.