3. …turns out what I should have asked for was an NSA for biodiversity
4.
5. • There are known knowns, things we know that we know
• There are known unknowns, things we now know we don’t know
• But there are also unknown unknowns, things we do not know we don’t know
13. Implications
• The rate at which new taxa are described is relatively constant
• Suggests taxonomists are working at capacity
• Most taxonomic work is in the past
• Compare this to the exponential growth of sequencing
17. Dark taxa
• Disconnect between taxonomy and genomics
• How much of this dark diversity comprises taxa we already know about versus new diversity?
• Do we need taxonomic names?
19. Scanned legacy
• BHL is more than pre-1923 literature
• The real gap is post-1923 to pre-open access (2003)
• Most of the 20th-century taxonomic literature is “dark”
20. Size of Wikipedia articles on mammals
Few, large articles
Many, small articles (“long tail”)
21. Power law
• We know a lot about a few species
• For most species we know very little (even in well-known groups)
• For poorly known species we need to go to the legacy literature
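A rough sketch of what "a lot about a few, very little about most" looks like numerically. This assumes article size falls off as largest / rank (a Zipf-like rule); the function names and constants are made up for illustration and are not fitted to the Wikipedia mammal data.

```python
# Illustrative only: sizes follow an ASSUMED rule, size = largest / rank**exponent.
# The constants are invented, not measured from Wikipedia.

def zipf_sizes(n_items, largest=100_000.0, exponent=1.0):
    """Return n_items article sizes, biggest first, following a power law."""
    return [largest / rank ** exponent for rank in range(1, n_items + 1)]

def share_of_top(sizes, fraction):
    """Fraction of the total size held by the largest `fraction` of items."""
    ordered = sorted(sizes, reverse=True)
    k = max(1, int(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)

sizes = zipf_sizes(5000)
median = sorted(sizes)[len(sizes) // 2]
print(f"top 1% of articles hold {share_of_top(sizes, 0.01):.0%} of all text")
print(f"median article: {median:.0f} bytes, largest: {max(sizes):.0f} bytes")
```

Under this assumed rule a small head of articles holds a large share of the text, while the median article is tiny compared with the largest. That is the long tail.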
25. Publishers
• BioStor (BHL) is the single largest source of taxonomic literature
• Lots of tiny publishers (long tail)
• Commercial publishers are important (Magnolia Press, Springer, Informa, Wiley, Elsevier, BioOne)
• Who do we talk to about data mining?
34. Implications
• GenBank is about more than genes
• GenBank has a wealth of information on location and ecological interactions
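A minimal sketch of where such information can sit in a record: the `source` feature of a GenBank flat file carries `/key="value"` qualifiers. The fragment below is invented for illustration (not a real accession), and which qualifiers a given record actually has, here `country`, `lat_lon`, and `host`, is an assumption.

```python
import re

# INVENTED GenBank-style fragment for illustration; not a real record.
SAMPLE_FEATURES = '''\
     source          1..658
                     /organism="Example species"
                     /mol_type="genomic DNA"
                     /host="Example host plant"
                     /country="Exampleland: Example Province"
                     /lat_lon="12.34 S 56.78 E"
'''

# Matches single-line qualifiers of the form /key="value".
QUALIFIER = re.compile(r'/(\w+)="([^"]*)"')

def source_qualifiers(flatfile_text):
    """Collect /key="value" qualifiers into a dict (first value wins)."""
    found = {}
    for key, value in QUALIFIER.findall(flatfile_text):
        found.setdefault(key, value)
    return found

quals = source_qualifiers(SAMPLE_FEATURES)
# Keep only the qualifiers that are about place and interactions, not genes.
non_genetic = {k: quals[k] for k in ("country", "lat_lon", "host") if k in quals}
print(non_genetic)
```

This only handles qualifiers that fit on one line. A real workflow would more likely use an established parser than a regular expression.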
35.
36. Implications
• Phylogenetic data is not being archived (why not?)
• Makes it hard to reproduce studies
• Does data matter?
• What level of granularity should be citable?