The Natural History Open Data Challenge @ OTA16

Presentation given by Laurence Livermore & Ben Scott at Over the Air 2016.

Published in: Technology

  1. Diverse collections spanning space and time
     Challenge of scale: >80 million specimens!
     Challenge of speed (digitising within a lifetime)
     Ambitious digitisation programme (DCP)
     Institutional policy “open by default”
  2. Higher Classification
       Scientific name: Thymelicus lineola (Ochsenheimer, 1808)
       Family: Hesperiidae
     Location
       Locality: Tilbury Docks
       State/province: England
       Country: United Kingdom
       Continent: Europe
       Decimal latitude: 51.4605
       Decimal longitude: 0.3449
     Collection Event
       Recorded by: T G. Howarth; Howarth
       Collection date: 31 / 07 / 1938
     Most iCollections specimens will have ~30 fields containing data (over 100 different fields across all collections)
     There are some issues… (where is H. M. Edelsten!?)
  3. http://data.nhm.ac.uk
  4. Complete NHM Specimen Dataset (3.3M records): http://bit.ly/2goEpBB
     GitHub Gist – NHM API: http://bit.ly/2gtukRv
     iCollections Datasets: http://bit.ly/2gGZub5
     Even more data… http://www.gbif.org/occurrence
  5. Potential Challenges
     How did collecting effort change over time?
     Who was the collector who collected from the most distinct localities? – can we make a ranking table and mash up data with Wikipedia or other sources?
     What can we learn about the collectors – who travelled the furthest or most regularly?
     Were most specimens collected in rural areas?
     Is there collection bias in particular counties?
     How can we make the data more attractive to different audiences?
     How could we display the data in more engaging or informative ways?
  6. Complete NHM Specimen Dataset (3.3M records): http://bit.ly/2goEpBB
     GitHub Gist – NHM API: http://bit.ly/2gtukRv
     iCollections Datasets: http://bit.ly/2gGZub5
     Even more data… http://www.gbif.org/occurrence
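
Two of the challenge questions on slide 5 (how collecting effort changed over time, and which collector visited the most distinct localities) can be sketched on records shaped like the sample on slide 2. This is only a minimal illustration, not the presenters' method. The key names `recordedBy`, `locality` and `collectionDate` are assumptions and may not match the column names in the NHM downloads, and every row except the first is made-up placeholder data.

```python
from collections import Counter, defaultdict

# Illustrative records only. The first row mirrors the sample on slide 2;
# the remaining rows and all key names are hypothetical.
records = [
    {"recordedBy": "T G. Howarth", "locality": "Tilbury Docks", "collectionDate": "31 / 07 / 1938"},
    {"recordedBy": "Collector A", "locality": "Locality X", "collectionDate": "01 / 06 / 1938"},
    {"recordedBy": "Collector A", "locality": "Locality Y", "collectionDate": "15 / 08 / 1951"},
    {"recordedBy": "Collector A", "locality": "Locality X", "collectionDate": "02 / 05 / 1952"},
    {"recordedBy": "Collector B", "locality": "", "collectionDate": ""},  # incomplete record
]


def collection_year(record):
    """Return the year from a 'DD / MM / YYYY' string, or None if it cannot be read."""
    raw = record.get("collectionDate") or ""
    parts = [part.strip() for part in raw.split("/")]
    if len(parts) == 3 and parts[2].isdigit():
        return int(parts[2])
    return None


def specimens_per_decade(rows):
    """Count specimens per decade as a rough proxy for collecting effort."""
    counts = Counter()
    for row in rows:
        year = collection_year(row)
        if year is not None:
            counts[year // 10 * 10] += 1
    return dict(sorted(counts.items()))


def rank_collectors_by_localities(rows):
    """Rank collectors by number of distinct localities, highest first."""
    localities = defaultdict(set)
    for row in rows:
        collector = (row.get("recordedBy") or "").strip()
        locality = (row.get("locality") or "").strip()
        if collector and locality:  # skip records missing either field
            localities[collector].add(locality)
    ranking = [(name, len(places)) for name, places in localities.items()]
    return sorted(ranking, key=lambda item: (-item[1], item[0]))


print(specimens_per_decade(records))
print(rank_collectors_by_localities(records))
```

Records with missing dates or localities are skipped rather than guessed. The sample record also shows why real data would need more cleaning first: a value such as "T G. Howarth; Howarth" would have to be normalised before collectors could be ranked reliably.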