Building the Biodiversity Knowledge Graph
@rdmpage
http://iphylo.blogspot.com
Slides from the 4th Global Online Biodiversity Informatics Seminar: https://plus.google.com/events/clvk6nd14d9fhh7e4a6oe5mt9s0

Published in: Science, Technology, Education

  1. Building the Biodiversity Knowledge Graph. @rdmpage, http://iphylo.blogspot.com
  2. • There are known knowns, things we know that we know • There are known unknowns, things we now know we don’t know • But there are also unknown unknowns, things we do not know we don’t know
  3. known, unknown
  4. Things we don’t know that we know
  5. Melissotarsus insularis
  6. Melissotarsus insularis: no hit. CASENT0107663-D01, DQ176312, Melissotarsus sp. BLF m1. DQ176312, CASENT0107663-D01, Melissotarsus insularis. Melissotarsus insularis = Melissotarsus sp. BLF m1
  7. We have a vast amount of “old stuff”
  8. Numbers of new animal names (chart labels: 1923, WWI, WWII)
  9. We are learning new stuff
  10. “New” and “old” are disconnected
  11. Dark taxa: http://iphylo.blogspot.co.uk/2011/04/dark-taxa-genbank-in-post-taxonomic.html
  12. Mammals in GenBank: proper Linnaean names vs. “Aus sp.”
  13. Mammals: proper Linnaean names vs. “Aus sp.”
  14. “Invertebrates”, BOLD
  15. Challenge: linking things together (sticky data)
  16. Data is good
  17. More data is better…
  18. …but this data is not sticky
  19. Location
  20. name, name, Tags
  21. Namenname
  22. Identifiers
  23. Shared identifiers are sticky
  24. Identifiers • Globally unique • Resolvable (for humans and machines) • Use other people’s identifiers to link things together
  25. Human and machine readable: machine, human
  26. { "author": [ { "family": "Page", "given": "Roderic D.M." } ], "container-title": "PeerJ", "reference-count": 60, "page": "e190", "deposited": { "date-parts": [ [ 2013, 11, 18 ] ], "timestamp": 1384732800000 }, "title": "BioNames: linking taxonomy, texts, and trees", "type": "journal-article", "DOI": "10.7717/peerj.190", "ISSN": [ "2167-8359" ], "URL": "http://dx.doi.org/10.7717/peerj.190" }
  27. Using other people’s identifiers is hard work and scary • Hard work - you have to find their identifiers • Scary - what happens if the other person breaks their identifiers? • Solution: make it easy to find them, and make them robust (e.g., CrossRef and DOIs)
  28. http://dx.doi.org/10.7717/peerj.190 DOI (Digital Object Identifier)
  29. Biodiversity Knowledge Graph (linking things together)
  30. Our questions are “paths” in this network
  31. Phylogeography
  32. Taxonomy
  33. GenBank records from Spain
  34. MESH term
  35. PMID:948206
  36. http://biostor.org/reference/102054
  37. http://data.gbif.org/occurrences/215921922/
  38. BHL and GBIF as biomedical databases: http://iphylo.blogspot.co.uk/2012/03/bhl-and-gbif-as-biomedical-databases.html
  39. Metrics (counting links in the knowledge graph)
  40. “In an attempt to live up to that increasing demand for documentation, the leadership of the Natural History Museum of Denmark has issued an order to its curatorial staff - The staff members are requested to document which publications from 2011, written entirely by external scientists, that in one way or another are based on material in the collections of the Museum.” (TAXACOM, http://markmail.org/message/opv2we7fkmro2nen)
  41. https://twitter.com/#!/search/10.1371%252Fjournal.pone.0036881
  42. https://twitter.com/edwbaker/status/205595933159858176
  43. http://www.museum-analytics.org/
  44. Cited, linkable specimens. NMNH Vertebrate Zoology Herpetology Collections: 11194; CAS Herpetology Collection Catalog: 9619; MCZ Herpetology Collection: 6720; Herpetology Collection (University of Kansas Biodiversity Research Center): 5818. http://iphylo.blogspot.co.uk/2012/02/gbif-specimens-in-biostor-who-are-top.html
  45. Annotation (everyone can make the knowledge graph)
  46. http://bionames.org/labs/bookmarklet/
  47. How many people view annotation: Data, “Fix me!”
  48. Annotation as fixing errors
  49. Annotation as building the knowledge graph. Nodes: paper, specimen, sequence, taxonomic name. Links: cites, publishes, has voucher
  50. OK, but if the biodiversity knowledge graph is so cool, why haven’t we made it already?
  51. Open question: Who will build the biodiversity knowledge graph?
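
Slides 6 and 23 suggest that two records can fail to match on name yet still be linked through a shared identifier. The following is a minimal Python sketch of that idea, not the author's implementation. The values come from slide 6, but the field names, the `join_on` helper, and the assignment of each name to a specimen or sequence record are assumptions made for illustration.

```python
# Hypothetical record layout: field names are illustrative assumptions.
# Values are taken from slide 6; which record carries which name is our reading.
specimen_records = [
    {"specimen_code": "CASENT0107663-D01", "name": "Melissotarsus insularis"},
]
sequence_records = [
    {
        "accession": "DQ176312",
        "name": "Melissotarsus sp. BLF m1",
        "voucher": "CASENT0107663-D01",
    },
]


def join_on(left, right, left_key, right_key):
    """Return (left, right) record pairs whose key values match exactly."""
    index = {}
    for rec in right:
        index.setdefault(rec[right_key], []).append(rec)
    return [(l, r) for l in left for r in index.get(l[left_key], [])]


# Joining on the name string finds nothing ("no hit")...
by_name = join_on(specimen_records, sequence_records, "name", "name")

# ...but joining on the shared specimen identifier links the two records.
by_identifier = join_on(
    specimen_records, sequence_records, "specimen_code", "voucher"
)

for specimen, sequence in by_identifier:
    print(specimen["name"], "=", sequence["name"], "via", sequence["voucher"])
```

The name join returns nothing while the identifier join returns one pair, which is the sense in which shared identifiers are "sticky" and free-text names are not.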
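
Slides 25 to 28 present a DOI as one identifier that serves both humans (a web page) and machines (a JSON record). As a hedged sketch in Python, the code below parses a subset of the record shown on slide 26 and builds, without sending, an HTTP request that asks the resolver for machine-readable metadata. The helper names `doi_request` and `short_citation` are hypothetical, and the `Accept` media type is an assumption about what the resolver supports for this DOI, not something stated in the slides.

```python
import json
import urllib.request

# Assumed media type for CSL JSON via DOI content negotiation.
CSL_JSON_TYPE = "application/vnd.citationstyles.csl+json"

# Subset of the record shown on slide 26.
RECORD_TEXT = """
{
  "author": [{"family": "Page", "given": "Roderic D.M."}],
  "container-title": "PeerJ",
  "page": "e190",
  "title": "BioNames: linking taxonomy, texts, and trees",
  "type": "journal-article",
  "DOI": "10.7717/peerj.190",
  "URL": "http://dx.doi.org/10.7717/peerj.190"
}
"""


def doi_request(url, accept=CSL_JSON_TYPE):
    """Build (but do not send) a request for machine-readable metadata."""
    return urllib.request.Request(url, headers={"Accept": accept})


def short_citation(record):
    """Format a one-line, human-readable citation from a CSL-style record."""
    authors = ", ".join(f'{a["given"]} {a["family"]}' for a in record["author"])
    return (
        f'{authors}. {record["title"]}. '
        f'{record["container-title"]} {record["page"]}. doi:{record["DOI"]}'
    )


record = json.loads(RECORD_TEXT)
request = doi_request(record["URL"])
print(short_citation(record))
```

Passing `request` to `urllib.request.urlopen` would perform the actual lookup; that call is left out so the sketch runs without network access.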
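
Slides 30, 39 and 49 describe questions as paths in the network, metrics as counts of links, and annotation as adding links such as "cites", "publishes" and "has voucher". One possible way to sketch this in Python is a list of (subject, link, object) triples with a breadth-first path search and a link counter. All node identifiers below are hypothetical placeholders, and the choice of which node types each link connects is an assumption, since the slide lists only the labels.

```python
from collections import Counter, deque

# Hypothetical triples: node names are placeholders, and the pairing of
# link types with node types is an assumption, not taken from the slides.
EDGES = [
    ("paper:A", "cites", "specimen:X"),
    ("paper:B", "publishes", "sequence:S"),
    ("sequence:S", "has voucher", "specimen:X"),
    ("paper:B", "publishes", "name:N"),
]


def neighbours(edges):
    """Build an undirected adjacency map from the triples."""
    adjacency = {}
    for subject, _, obj in edges:
        adjacency.setdefault(subject, []).append(obj)
        adjacency.setdefault(obj, []).append(subject)
    return adjacency


def find_path(edges, start, goal):
    """Breadth-first search; returns a shortest node path, or None."""
    adjacency = neighbours(edges)
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in adjacency.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


def count_links(edges, link=None):
    """Count links touching each node, optionally for one link type only."""
    counts = Counter()
    for subject, label, obj in edges:
        if link is None or label == link:
            counts[subject] += 1
            counts[obj] += 1
    return counts


path = find_path(EDGES, "paper:A", "name:N")
citations = count_links(EDGES, "cites")
```

In this toy graph the question "which name is connected to paper A?" is answered by the path through the shared specimen and its sequence, and `count_links(EDGES, "cites")` is the kind of tally a specimen-citation metric would need.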
