Rod Page @rdmpage
http://iphylo.blogspot.com
Knowledge graphs
Holly Bik @hollybik
Let’s rise up to unite taxonomy and technology
10.1371/journal.pbio.2002231
http://ispecies.org
Simple JavaScript mashup
DBpedia
GBIF
CrossRef
EOL
Open Tree of Life
TreeBASE
https://doi.org/10.7717/peerj.190
The Semantic web:
“The future of the web…
and always will be” –
Peter Norvig (Google)
Obstacles to building knowledge graphs
• Technical
• Social
Obstacles to building knowledge graphs
• Need globally unique, persistent identifiers
(how to label the nodes of the graph)
• Need to create and agree on vocabularies
(how to label the edges of the graph)
• Need to agree how to transmit the graph
• Who stores the global graph?
A new hope
• The identifier wars are (nearly) over (DOIs FTW)
• Lots of domain-specific vocabularies, but
schema.org is “good enough” for most things
• XML is becoming a bedtime story to frighten the
children; JSON is everywhere (JSON-LD FTW)
• Wikidata
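As a rough illustration of how these pieces fit together: a DOI can label a node, schema.org terms can label the edges, and JSON-LD can carry the result. The sketch below describes the article cited earlier in this deck. The choice of `ScholarlyArticle` and of the `name` and `author` properties is an assumption made for illustration, not a recommended profile.

```python
import json

# Node label: a DOI written as a resolvable URL (globally unique, persistent).
doi = "10.1371/journal.pbio.2002231"

# Edge labels come from schema.org; which terms to use is an illustrative choice.
article = {
    "@context": "https://schema.org",
    "@type": "ScholarlyArticle",
    "@id": f"https://doi.org/{doi}",
    "name": "Let’s rise up to unite taxonomy and technology",
    "author": {"@type": "Person", "name": "Holly Bik"},
}

# JSON-LD is plain JSON, so the standard library can serialise and parse it.
payload = json.dumps(article, indent=2, ensure_ascii=False)
print(payload)
```

Because the identifier is a URL, another dataset that uses the same `@id` is talking about the same node, which is what lets separate graphs be merged.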
Obstacles to building knowledge graphs
• Technical
• Social Economic
Identifiers, identifiers, identifiers, identifiers
How do we measure progress?
[Figures comparing “before” and “now”]
Linear growth (easy) Connectivity (hard)
Need network effects
One is useless Two is “meh” Many is better
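One way to make the growth-versus-connectivity distinction concrete: counting nodes captures linear growth, while the share of nodes in the largest connected component captures connectivity. This is a minimal sketch with made-up node labels. The choice of metric is an assumption, not something the slides prescribe.

```python
from collections import defaultdict, deque

def largest_component_fraction(nodes, edges):
    """Return the fraction of nodes that sit in the largest connected component."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    largest = 0
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search over the component containing `start`.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for other in neighbours[node]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        largest = max(largest, size)
    return largest / len(nodes) if nodes else 0.0

# Hypothetical graph: six records, but only three of them are linked together.
nodes = ["paper1", "paper2", "name1", "name2", "specimen1", "person1"]
edges = [("paper1", "name1"), ("name1", "specimen1")]

print(len(nodes))                                # size grows as records are added
print(largest_component_fraction(nodes, edges))  # connectivity grows only with links
```

Adding more unlinked records raises the first number and lowers the second, which is the sense in which growth is easy and connectivity is hard.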
The Semantic web:
“The future of the web…
and always will be” –
Peter Norvig (Google)
The knowledge graph is
already here (it’s just
not evenly distributed)
William Gibson @GreatDismal
Google’s Knowledge Graph
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd: <http://www.wikidata.org/entity/>
SELECT ?root_name ?parent_name ?child_name WHERE
{
  VALUES ?root_name {"Hominini"}
  ?root wdt:P225 ?root_name .      # P225 = taxon name
  ?child wdt:P171+ ?root .         # P171 = parent taxon; "+" follows one or more steps
  ?child wdt:P171 ?parent .
  ?child wdt:P225 ?child_name .
  ?parent wdt:P225 ?parent_name .
}
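The query returns one row per descendant taxon together with its immediate parent, so a client can fold the rows into a nested tree for display. This is a minimal sketch of that step. The rows are hand-written, simplified samples rather than live Wikidata results, and `build_tree` is a hypothetical helper, not part of any library.

```python
def build_tree(rows):
    """Fold (root_name, parent_name, child_name) rows into a nested dict."""
    children = {}
    root = None
    for root_name, parent_name, child_name in rows:
        root = root_name  # every row carries the same root name
        children.setdefault(parent_name, []).append(child_name)

    def node(name):
        kids = sorted(children.get(name, []))
        return {"name": name, "children": [node(k) for k in kids]}

    return node(root) if root is not None else None

# Simplified sample rows (assumed for illustration; real results include more ranks).
rows = [
    ("Hominini", "Hominini", "Homo"),
    ("Hominini", "Homo", "Homo sapiens"),
    ("Hominini", "Homo", "Homo erectus"),
]

tree = build_tree(rows)
print(tree)
```

The nested `name`/`children` shape is a common input for tree layouts, though the exact structure a given visualisation library expects may differ.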
http://biohackathon.org/d3sparql/
Toshiaki Katayama @tktym
http://iphylo.blogspot.ca/2017/01/displaying-taxonomic-classifications.html
“Citations for the sum of
human knowledge”
WikiCite @WikiCite
Goal 1: Every citation in the Wikipedias should be in Wikidata
Goal 2: Every citation should be in Wikidata (!?)
Small knowledge graphs (hexastores)
Very simple
ontology
Tom Scott @derivadow
Leigh Dodds @ldodds
Hexastore
• A triple is [s, p, o]
• Finding all statements [s, ?, ?] is a simple array lookup (all elements with key “s”)
• Finding all statements [?, ?, o] is slow (scan all triples)…
• …unless we add an array of [o, s, p] triples; then it is a simple array lookup (all elements with key “o”)
• Six variations cover all queries: [s, p, o], [s, o, p], [p, s, o], [p, o, s], [o, s, p], [o, p, s] (hence “hexastore”)
• In-memory graph database in JavaScript (think offline apps)
http://crubier.github.io/Hexastore/
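The six-index idea can be sketched in a few lines. The toy store below keeps one nested dictionary per ordering of (s, p, o) and answers a pattern by picking the ordering whose leading terms are the bound ones. It is an illustrative sketch in Python, not the API of the JavaScript library linked above, and the example identifiers are invented.

```python
from collections import defaultdict
from itertools import permutations

class Hexastore:
    """Toy in-memory triple store with one nested index per ordering of (s, p, o)."""

    ORDERS = ["".join(p) for p in permutations("spo")]  # spo, sop, pso, pos, osp, ops

    def __init__(self):
        # index[order][first][second] -> set of third values
        self.index = {
            order: defaultdict(lambda: defaultdict(set)) for order in self.ORDERS
        }

    def add(self, s, p, o):
        triple = {"s": s, "p": p, "o": o}
        for order in self.ORDERS:
            a, b, c = (triple[k] for k in order)
            self.index[order][a][b].add(c)

    def query(self, s=None, p=None, o=None):
        """Return matching (s, p, o) triples; None acts as a wildcard."""
        pattern = {"s": s, "p": p, "o": o}
        bound = [k for k in "spo" if pattern[k] is not None]
        free = [k for k in "spo" if pattern[k] is None]
        order = "".join(bound + free)  # bound terms first, so they are key lookups
        k1, k2, k3 = order
        first = self.index[order]
        results = []
        for a in ([pattern[k1]] if pattern[k1] is not None else list(first)):
            if a not in first:
                continue
            second = first[a]
            for b in ([pattern[k2]] if pattern[k2] is not None else list(second)):
                if b not in second:
                    continue
                for c in second[b]:
                    if pattern[k3] is None or pattern[k3] == c:
                        found = {k1: a, k2: b, k3: c}
                        results.append((found["s"], found["p"], found["o"]))
        return results

store = Hexastore()
# Invented identifiers, for illustration only.
store.add("paper:1", "mentions", "Horsfieldia lancifolia")
store.add("paper:1", "author", "person:1")
store.add("paper:2", "author", "person:1")

print(store.query(o="person:1"))   # [?, ?, o] answered from the "osp" index
print(store.query(s="paper:1"))    # [s, ?, ?] answered from the "spo" index
```

The cost is storing every triple six times, which is the trade the hexastore makes to avoid scanning.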
Xanadu,
the web that wasn’t
Ted Nelson Hyperlinks and
hypermedia
Two-way links and
“transclusion”
= Xanadu
Tim Berners-Lee
HTTP, URL, HTML
One-way links
= world wide web
Web page Other web
page
Web linking, one way, document-level, “target”
doesn’t know that it is linked to (“cited”),
link can break (404)
text
Work Source
text
Xanadu linking, two way, fragment-level,
“source” knows it is linked to, source content
is embedded, links don’t break
Xanadu
A New Account of the Genus
Horsfieldia (Myristicaceae), Pt 2
W J J O De Wilde
The Gardens' bulletin, Singapore 38(1): 55-144 (1985)
http://biostor.org/reference/175018
Horsfieldia lancifolia
BioStor @biostor_org
Biodiversity Heritage Library @biodivlibrary
Flora Malesiana. Series I - Seed Plants,
Volume 14. Myristicaceae
https://doi.org/10.3897/ab.e1141
Description Description
Flora Article
Embedded markup (bad)…
Crocidura absconditus, new species
<i>Crocidura absconditus</i>, new species
0 20
{ [0,20], “italics” }
…versus annotation (good)
(think NLM JATS XML markup
versus Substance JSON used
by Lens viewer,
https://lens.elifesciences.org/about/)
Crocidura absconditus, new species
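The difference between the two approaches can be shown with the slide's own example. In the sketch below the text stays untouched and the formatting lives in a separate annotation, from which embedded markup can be generated on demand. The offsets are assumed to be inclusive character positions (which is what makes [0,20] cover the species name), and the `TAGS` mapping and `render_html` helper are hypothetical.

```python
from html import escape

text = "Crocidura absconditus, new species"

# Stand-off annotation: offsets are assumed to be inclusive character positions,
# matching the [0,20] range shown on the slide.
annotations = [{"range": [0, 20], "type": "italics"}]

TAGS = {"italics": "i"}  # hypothetical mapping from annotation type to HTML tag

def render_html(text, annotations):
    """Produce embedded markup from plain text plus non-overlapping annotations."""
    out = []
    cursor = 0
    for ann in sorted(annotations, key=lambda a: a["range"][0]):
        start, end = ann["range"]
        tag = TAGS[ann["type"]]
        out.append(escape(text[cursor:start]))
        out.append(f"<{tag}>{escape(text[start:end + 1])}</{tag}>")
        cursor = end + 1
    out.append(escape(text[cursor:]))
    return "".join(out)

print(render_html(text, annotations))
```

Because the annotation only points into the text, further annotations (a taxonomic name, a specimen code) can be layered on the same string without rewriting it.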
@hypothes_is
Annotating a
scientific paper
Aggregating annotations (iPhylo)
http://iphylo.blogspot.co.uk/2016/06/aggregating-annotations-on-scientific_30.html
Taxonomic
names,
specimen
codes,
geographic
localities,
references are
all
annotations
Taxonomic databases
are not lists of names…
…they are lists of annotations
(“this name occurs on this page”)
Annotations are retrospective nanopublications
Annotating existing content
(extracting “facts”)
Today
Publishing “facts” as nanopublications
Stream of “facts”
Social design and the
knowledge graph
Obstacles to building knowledge graphs
•Technical
•Social Economic
Nico Franz @taxonbytes
ORCID
(person)
DOI
(publication)
LSID
(plant name)
Find my papers that
published new species
@SandyKnapp
ORCID
(person)
DOI
(publication)
LSID
(plant name)
#Iamataxonomist
(claim/demonstrate expertise)
specimen plant name
What Sandy really wants
collected type for
publication
person
“What specimens that I collected have been
described as new species by other people?”
Published in
author
other person
not the same person
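The question on this slide can be phrased as a walk over typed edges. The sketch below uses a plain list of triples with invented identifiers. The predicate names (`collected`, `typeFor`, `publishedIn`, `author`) loosely follow the labels in the diagram and are assumptions, and none of the identifiers are real ORCIDs, DOIs, or LSIDs.

```python
# Invented identifiers and predicate names, for illustration only.
triples = [
    ("person:A", "collected", "specimen:1"),
    ("person:A", "collected", "specimen:2"),
    ("specimen:1", "typeFor", "name:X"),
    ("specimen:2", "typeFor", "name:Y"),
    ("name:X", "publishedIn", "pub:1"),
    ("name:Y", "publishedIn", "pub:2"),
    ("pub:1", "author", "person:B"),
    ("pub:2", "author", "person:A"),
]

def objects(subject, predicate):
    """All objects reachable from `subject` along `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def specimens_described_by_others(collector):
    """Specimens the collector gathered that are types for names published by other people."""
    hits = []
    for specimen in objects(collector, "collected"):
        for name in objects(specimen, "typeFor"):
            for pub in objects(name, "publishedIn"):
                authors = objects(pub, "author")
                # "not the same person": the collector is not among the authors
                if authors and collector not in authors:
                    hits.append((specimen, name, pub))
    return hits

print(specimens_described_by_others("person:A"))
```

The query only works if person, specimen, name, and publication each have an identifier that the other records actually use, which is the point of the ORCID, DOI, and LSID labels above.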
Knowledge graphs
considered harmful
(remember Impact Factors?)
http://www.museum-analytics.org/
Cited, linkable specimens
NMNH Vertebrate Zoology Herpetology Collections: 11194
CAS Herpetology Collection Catalog: 9619
MCZ Herpetology Collection: 6720
Herpetology Collection (University of Kansas Biodiversity Research Center): 5818
http://iphylo.blogspot.co.uk/2012/02/gbif-specimens-in-biostor-who-are-top.html
We will need to ensure our knowledge graph is
free, open, and used for good

Towards a biodiversity knowledge graph

Editor's Notes

  • #4 https://doi.org/10.1371/journal.pbio.2002231
  • #5 http://ispecies.org
  • #6 https://doi.org/10.7717/peerj.190
  • #18 https://www.wikidata.org
  • #19 http://iphylo.blogspot.ca/2017/01/displaying-taxonomic-classifications.html
  • #21 http://www.bbc.co.uk/nature/life/Steller's_Sea_Eagle
  • #25 http://crubier.github.io/Hexastore/
  • #33 Ted Nelson’s Xanadu project, linking and microcredit
  • #34 http://biostor.org/reference/175018
  • #35 https://doi.org/10.3897/ab.e1141
  • #37 https://lens.elifesciences.org/about/
  • #40 http://iphylo.blogspot.co.uk/2016/06/aggregating-annotations-on-scientific_30.html
  • #45 https://doi.org/10.1101/157214
  • #54 https://ontotext.com/knowledgehub/case-studies/sn-scigraph-uses-graphdb/. Springer SciGraph https://twitter.com/OntotextGraphDB/status/898143878724935681