The document summarizes a project that digitized specimens from the Natural History Museum in London to connect historical collection data with current field observations of orchids in the UK. The museum's collection holds some 80 million objects; for this project, historical orchid specimens were digitized with the help of over 15,000 transcriptions by 1,000 volunteers. This generated datasets spanning over 200 years that were combined with 1,700 new field observations and an existing dataset of 1,600 field observations to analyze changes in flowering times and patterns due to climate change. The project attracted participants from existing naturalist communities and new volunteers. While data quality was high, the analysis found most participants focused on either online activities or fieldwork, with limited crossover between the two.
A crowd of specimens: Digitising Collections at the Natural History Museum, London
1. A crowd of specimens
Digitising Collections at the Natural History Museum, London
Helen Hardy, Digital Collections Programme Manager
Dr John Tweddle, Head of the Angela Marmont Centre for UK Biodiversity
2. 80 million objects
• 30m insects
• 7m fossils
• 5,000 meteorites
• 250 yrs of collecting
• Almost 5b years represented
3. ‘It shall be the duty of the Trustees to secure, so far as appears to them to be
practicable, that the objects comprised in the collections of the Museum… are, when
required for inspection by members of the public, made available …’
British Museum Act 1963 section 3(3)
4. Mass digitisation?
Image (specimen, labels, register?)
Transcription (labels, register?)
Geo-referencing
Diagnostic images
Number of specimens
Computable data
12. • NHM, U. Oxford (Zooniverse), UC Davis, Botanical Society of Britain and Ireland, funded by AHRC
• Explores impact of climate change
• By building an extended record of flowering phenology for UK orchids using two citizen science datasets: museum specimens + field obsns.
• Experimental approach: combines the two main natural history citizen science approaches: field + online
• An experiment!
Fly Orchid (Roger Powley)
13. Outdoors: find and photograph 29 species of UK orchid (Spring-Autumn 2015) and then upload the photos to the project’s webpages
Early-purple Orchid
(Mike Waller)
15. Transcribe & extract phenology data from historical specimens
Analytical approach: combine the new field observations and historical specimen
data with other observations from the field naturalist community, co-publish
16. Early findings on Orchid Observers: Did we meet our science and engagement goals?
Pyramidal Orchid
(Fred Rumsey)
Based on three data sources:
• Scientific data (n=>20k)
• Patterns of participation (n=>55k)
• Social science questionnaire (n=126)
17. Dataset | Core date range
3,700 historical specimens | 1780-1980
Field obs. (BSBI) | 1970-2014
1,700 new field obs. | 2015
• Different data biases, but that’s OK
• Time-series spans >200 years
• Passion, expertise and legacy of volunteer naturalists and other ‘citizen scientists’ across >3 centuries!
The datasets generated through the project are complementary
Green-winged Orchid
(field contributor)
18. Data quality is very high
Accuracy of online consensus (n=1,462) | No. species
100% | 13
90-99% | 12
79-85% | 4
Confidence of field identifier (n=1,604) | Accuracy
Certain | 99.2%
Likely | 89.7%
Uncertain | 66.7%
• Self-assigned confidence = reliable guide
• Online consensus of 5 people sufficient
Musk Orchid (Fred Rumsey)
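The "online consensus of 5 people" point can be illustrated with a simple majority vote. This is a minimal sketch only, not the Zooniverse or project pipeline: the function name `consensus`, the `min_votes` parameter and the example votes are all assumptions made up for illustration.

```python
from collections import Counter

def consensus(labels, min_votes=5):
    """Return (most common label, its share of votes), or None if too few votes."""
    if len(labels) < min_votes:
        return None
    # most_common(1) gives the label with the highest count;
    # on a tie it returns the label that was seen first.
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)

# Hypothetical classifications of one photo by five online volunteers.
votes = ["Bee Orchid", "Bee Orchid", "Fly Orchid", "Bee Orchid", "Bee Orchid"]
result = consensus(votes)
```

Here four of the five hypothetical votes agree, so the consensus label is "Bee Orchid" with a share of 0.8. A real workflow would also need a rule for ties and low-agreement images, which the slides do not describe.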
19. • We can look at patterns of flowering over time, the effects of different
climate components & predict response per unit change in e.g. Spring temp.
• More projects of this type as collections data become available open access
Green-winged Orchid
Because of these two factors, the science looks promising!
Advances four days per degree Celsius increase in Feb-April temperature
Peak flowering date from 1800-2015
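The per-degree response quoted on this slide is the kind of figure a straight-line fit of peak flowering date against spring temperature would give. A minimal sketch of that calculation follows, using ordinary least squares. The numbers are synthetic, chosen only so that the slope comes out at -4 days per °C; they are not project data, and the project's actual statistical model is not described in the slides. The names `fit_line`, `feb_apr_temp_c` and `peak_day_of_year` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight-line fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic illustration only (not project data):
# mean Feb-April temperature in °C and peak flowering day of year.
feb_apr_temp_c = [6.0, 7.0, 8.0, 9.0]
peak_day_of_year = [150, 146, 142, 138]

slope, intercept = fit_line(feb_apr_temp_c, peak_day_of_year)
# Predicted peak flowering day for a 10 °C spring under this toy fit.
predicted_at_10c = intercept + slope * 10.0
```

A negative slope means earlier flowering in warmer springs; with these toy values the fit gives a slope of -4 days per °C.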
20. Bee Orchid
(collected in 1918)
Who took part and why?
• Field naturalists, Zooniverse, and first-time citizen scientists (17%)
• Necessitated 3 promotion routes
• 292 people (field), 1,745 (online)
Top 3 reasons for taking part
• Interest in botany (66%)
• To contribute to science (63%)
• Enjoyment of Zooniverse (59%)
• Outdoor natural history (32%)
21. % of each participant group that contributed to online activities
Participant group | Field photos | Specimens | Both
Online (n=1,745) | 35% | 52% | 14%
Field (n=292) | 13% | 4% | 7%
How did people take part? Two key observations:
1. Participants generally focused on one activity
• Most kept doing what they already enjoyed doing (existing interests or
areas of confidence?), with limited - but important - cross-over.
22. How did people take part? Two key observations:
2. For each task we see a range of participation
• This includes a small number of ‘super-contributors’ and numerous
one-off participants
Contribution to data made by:
Task | Top 5% | 1-off
Online – field images (n=1,143) | 74% (57) | 0.9% (183)
Field activities (n=292) | 53% (15) | 8.0% (137)
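Shares like those in the table above can be computed from a list of contribution counts, one per participant. The sketch below is an assumption about how such figures might be derived, not the project's documented method. It rounds the top 5% to the nearest whole participant, which is consistent with the bracketed counts on the slide (5% of 1,143 is 57.15; 5% of 292 is 14.6). The function name `contribution_shares` and the example counts are made up.

```python
def contribution_shares(counts, top_pct=5):
    """Return (share from the top contributors, share from one-off participants)."""
    total = sum(counts)
    ranked = sorted(counts, reverse=True)
    # Size of the top group: top_pct percent of participants,
    # rounded to the nearest whole person, with a minimum of one.
    n_top = max(1, (len(ranked) * top_pct + 50) // 100)
    top_share = sum(ranked[:n_top]) / total
    # One-off participants are those with exactly one contribution.
    one_off_share = sum(c for c in counts if c == 1) / total
    return top_share, one_off_share

# Hypothetical counts for 20 participants (not project data).
example_counts = [60, 10, 5, 4, 3, 2, 2, 2] + [1] * 12
top_share, one_off_share = contribution_shares(example_counts)
```

With these made-up counts, one participant out of 20 supplies 60% of the data while the 12 one-off participants together supply 12%, the same 'super-contributor' pattern on a small scale.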
24. With thanks to everyone that contributed their time, expertise
and photos: participants, BSBI, NHM and Zooniverse teams
Bee Orchid
(Kath Castillo)
25. Five conclusions
1. Experimental, but worked: appealed to existing citizen scientists, attracted new contributors, produced sound data
2. Project design is more complex: trade-off between complexity, accessibility and online environment
3. Many benefits to working with both communities, but limited cross-over
4. Two areas to improve: extent to which participants felt they were valued and how connected they felt with team
5. …transcription still remains niche!
Bee Orchid
(collected in 1918)
Editor's Notes
Abstract:
We’ve embarked on an epic journey to set data free from the Natural History Museum’s collections of some 80 million objects. In this talk, we’ll reflect on how we’re approaching this challenge, and how crowdsourcing can contribute. ‘Digitising’ a specimen is not a single process. Usually it starts with an image, but how we capture that depends on what the specimen is, and how it’s been preserved. Even when we can establish a fast and reliable imaging workflow, data often remains locked in labels or registers, perhaps crumbling, handwritten, in vanished languages or all three. In tackling data transcription, we have to balance crowdsourcing against other approaches including automated solutions. Cost and data quality are important, but we also need to find ways to value the engagement crowdsourcing gives us with a wider citizen science audience, publicising our collections and involving the public in their interpretation and use. And we need to understand our audiences for digitised data, not only in the research community but across the wider crowd as well.
The second half of our talk will look in-depth at the Orchid Observers project as a case study of one of the approaches that we are testing. Orchid Observers is an innovative collaboration between amateur naturalists and museum scientists that has combined two of the principal forms of citizen science: field-based ecological observation and online image-based classification. The project investigates how climate change is affecting the UK’s much loved and ecologically important orchid populations, by building a long-term record of flowering times from two distinct datasets: contemporary field observations and historical museum specimens. Combining outdoor and online citizen science in this way was a highly experimental approach and brought together two distinct communities of citizen scientists. We will overview the approaches that we have taken, challenges that we have encountered, successes, failures and unanticipated findings.
The eagle eyed among you may remember these figures from Ali Thomas’ earlier slide.
All figures approx.
Depends on what you count as a specimen.
Also –
>5m library & archives
400k minerals
6-7m botany
27m zoology
Both the Museum’s original mission – referring to ‘the learned and the curious’ - and our more recent legislation, highlight that we should have not only a desire but a duty to make the collections available.
Although this legislation refers to availability at the museum, digitisation (and the scale of collections built up over time ‘behind the scenes’) arguably changes the nature of this duty, by opening up new kinds of availability.
Full wording also references safety of collections: “It shall be the duty of the Trustees to secure, so far as appears to them to be practicable, that the objects comprised in the collections of the Museum (including objects stored under the preceding subsection) are, when required for inspection by members of the public, made available in one or other of the authorised repositories under such conditions as the Trustees think fit to impose for preserving the safety of the collections and ensuring the proper administration of the Museum.”
80m objects is an epic digitisation challenge by any standards
AND digitisation is not a single process.
No ‘science’ to these bars – just illustrating in very broad terms some of the stages of digitisation. We are going for ‘broad and thin’ but also want data to be scientifically useful. Sometimes we will do one stage and plan to do the other at a later point.
I’m going to look first at imaging, then transcription
Imaging itself is not one process – different for every specimen type and involves a lot of hardware/software innovation
Here:
A set up designed by colleagues to gently hold open our historic Sloane Herbarium volumes for imaging
The makings of an imaging set-up for capturing labels on pins
Microscope slides – we’ve tried several approaches including trays of slides and individual images. Latter cheaper and so far better. Images are of labels not specimens
A specimen imaged in our e-mesozoic project, which had to adopt different camera set-ups depending on specimen size and manoeuvrability
Some ethical and practical constraints around using volunteers at the imaging stage. But imaging generates all kinds of stories – of specimens; innovation etc – that we want to share with the crowd.
Imaging only gets us so far though – we need to transcribe label or other data to make it computable.
We are trialing crowdsourcing for label transcription using Notes from Nature on the Zooniverse platform – microscope slides of miniature insects and fossils
Batch 1 Aug-Jan
2,097 slides
150 days
600+ participants
Batch 2 Jan-Mar
2,097 slides
61 days
Batch 3 Mar-Jun
68 days
90%
900+ participants
Image not immediately appealing, workflow complex, lots of things that can be confusing – although we do have some supertranscribers who relish the challenge and building up their expertise.
Quality is an issue – calling value for money into question…
It’s possible crowdsourcing could become part of a triage approach? BUT
Does exception processing make a satisfying workflow for people?
Where are we valuing the wider engagement we get through crowdsourcing?
We need to think back about why we are doing digitisation and the importance of access
Images and data are made available – open data – but not yet accessible or engaging
Hopefully you heard Ali Thomas speak earlier about our Visiteering – which combines transcription effort with an enjoyable experience to help people really understand more about collections and connect more strongly with what the Museum is doing.
Putting the focus on connection is likely the way forward for our crowdsourcing efforts – on that note I’ll hand over to John Tweddle to give you a more in depth view of the Orchid Observers crowdsourcing initiative, which we hope gives that richer experience.
Really successful – for science and initial social study – for the participants
Relatively niche but can deliver engagement with collections and high quality science through this kind of approach
Transcription remains comparatively niche – not going to get tens of thousands, but that’s absolutely OK
One piece of learning – comms point