
Unknown knowns, long tails, and long data



Talk given at Scientific Collections International: Food Security Research Initiative, USDA National Agricultural Library

Published in: Science

  1. Unknown knowns, long tails, and long data (@rdmpage). Stressors and Drivers of Food Security: Evidence from Scientific Collections
  2. Known knowns: we know this. Known unknowns: we know we don’t know this. Unknown unknowns: we don’t know that we don’t know this. Unknown knowns: we know this, but we don’t know that we know it.
  3. “Long data”
  4. [Figure: 100,000 articles from the Biodiversity Heritage Library (BHL); axis labels “1923” and “today”]
  5. Long tail (Wikipedia pages for mammals): lots of very small articles, a few very large articles
  6. Mining the biodiversity literature
  7. Associations between species
  8.
  9. “Biological Diversity in the Patent System”, Paul Oldham, Stephen Hall, Oscar Forero, PLoS ONE (@junglepaul): “…human innovative activity involving biodiversity in the patent system focuses on approximately 4% of taxonomically described…”
  10. PMID:948206
  11.
  12.
  13. BHL and GBIF as biomedical databases: PubMed (disease), BioStor (publication), GBIF (specimen)
  14. Summary
      • Open access literature is a potential goldmine of information (long data, long tail)
      • Text mining for entities (scientific names, places, specimens, attributes) (search is still the killer app)
      • Linking things together (unknown knowns)
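The summary's second and third points (text mining for scientific names, then linking what is found) can be illustrated with a minimal Python sketch. It assumes a crude regular expression for Latin binomials plus a list of known names to filter out false matches; the names `find_names`, `cooccurrences`, and `BINOMIAL`, and the example sentences, are hypothetical and are not the tools or data behind this talk. Counting how many documents mention two species together is one simple way to surface candidate associations between species (slide 7).

```python
import re
from collections import Counter
from itertools import combinations

# Crude pattern for a Latin binomial: capitalised genus + lower-case epithet.
# It misses abbreviated names ("R. rattus") and also matches ordinary word
# pairs such as "The rats", so matches are filtered against known names.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")


def find_names(text):
    """Return the set of candidate binomials found in one document."""
    return {f"{genus} {epithet}" for genus, epithet in BINOMIAL.findall(text)}


def cooccurrences(documents, known_names):
    """Count, for each pair of known species, the documents mentioning both."""
    pairs = Counter()
    for text in documents:
        names = sorted(find_names(text) & known_names)
        for pair in combinations(names, 2):
            pairs[pair] += 1
    return pairs


# Hypothetical example sentences and name list (not real data).
documents = [
    "Yersinia pestis was isolated from fleas (Xenopsylla cheopis) on Rattus rattus.",
    "Rattus rattus and Mus musculus were trapped near grain stores.",
    "Xenopsylla cheopis was common on Rattus rattus at all sites.",
]
known_names = {"Yersinia pestis", "Xenopsylla cheopis", "Rattus rattus", "Mus musculus"}

for (a, b), n in cooccurrences(documents, known_names).most_common():
    print(f"{a} + {b}: {n}")
```

On a real corpus such as BHL, a curated name dictionary would replace the regex, and pairs that recur across many independent documents would be the ones worth following up.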
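The long-tail shape on slide 5 (lots of very small articles, a few very large ones) can be quantified by asking how much of the total text the largest articles hold, and by comparing mean and median length. This is a hedged Python sketch using made-up article sizes; `top_share` is a hypothetical helper, and the numbers are not measurements of Wikipedia's mammal pages.

```python
from statistics import mean, median


def top_share(sizes, fraction):
    """Fraction of the total size held by the largest `fraction` of items."""
    ranked = sorted(sizes, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical article lengths in bytes: three large pages and
# ninety-seven short ones (illustrative only, not real Wikipedia data).
sizes = [120_000, 85_000, 40_000] + [2_000] * 47 + [800] * 50

print(f"largest 10% hold {top_share(sizes, 0.10):.0%} of the text")
print(f"mean {mean(sizes):.0f} bytes vs median {median(sizes):.0f} bytes")
```

In a long-tailed collection the mean sits well above the median, because a handful of large articles pull the average up while most articles stay short.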