The Future of Microalgal Taxonomy

This talk describes the potential of semantic web technology to make the practice of taxonomy easier. It was presented at the 2011 Phycological Society of America conference in Seattle, WA, USA.

  1. The Future of Microalgal Taxonomy
      Anne Thessen, athessen@mbl.edu
      David Patterson, dpatterson@mbl.edu
      (Data Conservancy, Life Sciences)
  2. Scientist’s Dream: Computer, what is the trajectory of the planet Ceti Alpha 5?
  3. Taxonomist’s Dream: How many algal species can be found on this planet?
  4. Taxonomist’s Dream: What species is this?
  5. Taxonomist’s Dream
  6. Taxonomist’s Dream
  7. Setting the stage for a ‘big new biology’
      • BIG = data-centric (like particle physics and astronomy)
      • Characterized by data sharing via a virtual pool
      • NEW = new skill sets, tools, and cyberinfrastructure to exploit the data pool
      • Data-driven discovery as a new means of understanding
      • GenBank as a model within the life sciences
  8. Small science: a large number of providers with small amounts of data, compared with a small number of providers with lots of data.
  9. Aa paleacea, Limulus polyphemus, Kiwa hirsuta, Osedax frankpressi, Kingia australis, Pieris japonica, Pieris rapae, Trypanosoma brucei, Homo sapiens
  10. Many names for one taxon: Didimosphenia geminata, Didymosphenia geminata, Didymosphenia geminata, Didymosphenia geminata, Rock snot, Didymo, Echinella geminata, Gomphonema geminatum, Gomphonema vulgare
  11. Reconciliation Group: Didymosphenia geminata, Didimosphenia geminata, Didymo, Rock Snot, Echinella geminata, Gomphonema geminatum, Gomphonema vulgare
  12. Reconciliation Group: Didymosphenia geminata, Didimosphenia geminata, Didymo, Rock Snot, Echinella geminata, Gomphonema geminatum, Gomphonema vulgare
  13. One name for many taxa
      • Contextual data: Diatom, Chloroplast, Frustule, Benthic, Marine
      • Disambiguate by authority, species, contextual data
      • Contextual data: Food, Moth, Wings, Exoskeleton, Caterpillar
  14. Global Names Architecture (GNA). Diagram labels: Data and Service Providers, Experts, Provider Services, GNA, Consumer Services, Data and Service Consumers.
  15. Names-based cyberinfrastructure
      • Managing names to manage biodiversity data
        - All names (scientific, vernacular, surrogate)
        - For all organisms
        - Many names for one species reconciled
        - One name for many species disambiguated
      • Global Names Architecture: a virtual layer that uses names services to link together distributed data
      • Globalnames.org
      • Micro*scope (microscope.mbl.edu) and Encyclopedia of Life (eol.org)
  16. Legacy Data
      • Narrative tradition in biology
      • Too much for a human
      • Can we get a machine to do the work?
      • NLP (natural language processing)!
  17. Legacy Data
      • Use NLP/machine learning to extract names and characters
      • Hong Cui
  18. Legacy Data
      • Spirogyra:chloroplasts:present
  19. Legacy Data
      • Spirogyra:chloroplasts:present:attribution
  20. Coffee Ontology: coffee is a drink
  21. Existing Ontology
  22. Semantic Web
  23. Data Discovery and Aggregation
  24. Future Data: Triple Store
  25. The New Workforce
      • Informatics/computing training
      • Modified workflows
      • Importance of data management and preservation
  26. In Summary
      • Big New Biology is coming, and taxonomy can benefit from being a part of it
      • Existing data can be made machine-readable using information extraction algorithms
      • Existing workflows can be modified to capture data close to the source
      • Data can be shared using the semantic web
  27. Acknowledgments
      • Dima Mozzherin
      • David Shorthouse
      • Sayeed Choudhury
      • Pete DeVries
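
Slides 10 to 12 and 15 describe reconciling the many names used for one taxon and then using names to link distributed data. The following is a minimal Python sketch of that idea. It assumes a hand-built reconciliation group (the names listed on slide 11), treats Didymosphenia geminata as the accepted name, and uses exact matching after simple normalization. The function names and the provider records are hypothetical. A production names service would typically also need fuzzy matching and authority handling, which this sketch does not attempt.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Reconciliation group from slide 11: every listed name points at one accepted name.
# Which name is treated as accepted is an assumption made for this sketch.
ACCEPTED = "Didymosphenia geminata"
GROUP_NAMES = [
    "Didymosphenia geminata",
    "Didimosphenia geminata",  # misspelled variant
    "Didymo",                  # vernacular name
    "Rock Snot",               # vernacular name
    "Echinella geminata",
    "Gomphonema geminatum",
    "Gomphonema vulgare",
]


def normalize(name: str) -> str:
    """Lower-case, drop accents and collapse whitespace so trivial variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


# Lookup table: normalized variant -> accepted name.
NAME_INDEX: Dict[str, str] = {normalize(n): ACCEPTED for n in GROUP_NAMES}


def reconcile(name: str) -> Optional[str]:
    """Return the accepted name for a known variant, or None if the name is not indexed."""
    return NAME_INDEX.get(normalize(name))


def link_records(records: Iterable[Tuple[str, str, str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Group (provider, name, data) records under one key per taxon.

    Unrecognized names are kept under their own spelling rather than dropped.
    """
    linked: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for provider, name, data in records:
        linked[reconcile(name) or name].append((provider, data))
    return dict(linked)


if __name__ == "__main__":
    # Hypothetical records from two hypothetical providers.
    records = [
        ("provider_a", "Didymo", "bloom report"),
        ("provider_b", "gomphonema  geminatum", "herbarium sheet"),
        ("provider_b", "Navicula sp.", "slide image"),
    ]
    print(link_records(records))
```

Keying every record on the reconciled name is what lets data published under "Didymo" and data published under "Gomphonema geminatum" end up together, which is the "managing names to manage biodiversity data" point of slide 15.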
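
Slide 13 proposes disambiguating a name shared by unrelated taxa using the authority, the species, and contextual data. The sketch below covers only the contextual-data part, scoring each candidate by a plain bag-of-words overlap; that scoring rule is an assumption, since the talk does not specify a method. The candidate labels are hypothetical placeholders because the slide text does not preserve which name is shared; the context terms are the ones on the slide.

```python
from typing import Dict, Iterable, Optional, Set

# Context terms from slide 13; the two candidate labels are hypothetical placeholders.
CANDIDATES: Dict[str, Set[str]] = {
    "candidate A (diatom)": {"diatom", "chloroplast", "frustule", "benthic", "marine"},
    "candidate B (moth)": {"food", "moth", "wings", "exoskeleton", "caterpillar"},
}


def disambiguate(context: Iterable[str],
                 candidates: Dict[str, Set[str]] = CANDIDATES) -> Optional[str]:
    """Return the candidate sharing the most terms with the words around a name.

    Returns None when nothing overlaps or when two candidates tie, so the caller
    can fall back to other evidence such as the authority.
    """
    # Lower-case and strip surrounding punctuation before comparing.
    words = {w.lower().strip(".,;:()") for w in context}
    scores = {label: len(words & terms) for label, terms in candidates.items()}
    best = max(scores.values())
    winners = [label for label, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]


if __name__ == "__main__":
    sentence = "Cells benthic, marine; frustule with two chloroplasts."
    print(disambiguate(sentence.split()))
```

In the example sentence, "benthic", "marine" and "frustule" match, but "chloroplasts" does not, because the comparison is exact; a fuller system would need stemming or a controlled vocabulary of terms.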
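
Slides 17 to 19 describe extracting characters from narrative descriptions into statements such as Spirogyra:chloroplasts:present, with an attribution attached. The talk credits NLP and machine learning work by Hong Cui for this step; the sketch below is not that system but a toy pattern matcher. It assumes the taxon names are already known (for example, from a separate name-finding step) and that character states are written as "present" or "absent". The function names, the example sentence, and the source label are made up for illustration.

```python
import re
from typing import Iterable, List, NamedTuple


class Statement(NamedTuple):
    """One extracted fact in the taxon:character:state:attribution form of slide 19."""
    taxon: str
    character: str
    state: str
    attribution: str

    def as_string(self) -> str:
        return ":".join(self)


# Toy pattern (an assumption): a lower-case word followed by "present" or "absent".
STATE_PATTERN = re.compile(r"\b([a-z]+)\s+(present|absent)\b")


def extract(text: str, known_taxa: Iterable[str], source: str) -> List[Statement]:
    """Return one Statement per "<character> present/absent" phrase found in a
    sentence that mentions a known taxon; `source` is recorded as the attribution."""
    taxa = list(known_taxa)
    statements: List[Statement] = []
    # Split after "." or ";" followed by whitespace.
    for sentence in re.split(r"(?<=[.;])\s+", text):
        taxon = next((t for t in taxa if t in sentence), None)
        if taxon is None:
            continue  # no known taxon in this sentence, so nothing to attach a character to
        for character, state in STATE_PATTERN.findall(sentence.lower()):
            statements.append(Statement(taxon, character, state, source))
    return statements


if __name__ == "__main__":
    text = "Spirogyra has filaments with chloroplasts present. Pyrenoids present in Spirogyra."
    for s in extract(text, ["Spirogyra"], "example-source"):
        print(s.as_string())
```

Keeping the attribution as a fourth field, as slide 19 does, means every extracted fact can be traced back to the description it came from.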
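
Slides 20 to 24 move from single statements to ontologies, the semantic web, and a triple store. The following is a minimal in-memory sketch that uses plain Python tuples rather than RDF. Each fact is a (subject, predicate, object) triple, queries are patterns in which None acts as a wildcard, and following is_a links transitively gives a small taste of ontology reasoning. The class name, the predicate name is_a, and the espresso triple are illustrative assumptions. Production semantic web systems would store RDF and query it with SPARQL instead.

```python
from typing import Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """A minimal in-memory set of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(self, subject: Optional[str] = None, predicate: Optional[str] = None,
              obj: Optional[str] = None) -> Iterator[Triple]:
        """Yield stored triples that fit the pattern, in sorted order; None is a wildcard."""
        for s, p, o in sorted(self._triples):
            if subject is not None and s != subject:
                continue
            if predicate is not None and p != predicate:
                continue
            if obj is not None and o != obj:
                continue
            yield (s, p, o)

    def ancestors(self, term: str, predicate: str = "is_a") -> List[str]:
        """Follow `predicate` links transitively from `term` and return every term reached."""
        found: List[str] = []
        frontier = [term]
        while frontier:
            current = frontier.pop()
            for _, _, parent in self.match(subject=current, predicate=predicate):
                if parent not in found:  # also guards against cycles
                    found.append(parent)
                    frontier.append(parent)
        return found


# Example data: the coffee statement of slide 20, the Spirogyra statement of slide 18,
# and one illustrative extra is_a link.
store = TripleStore()
store.add("coffee", "is_a", "drink")
store.add("espresso", "is_a", "coffee")
store.add("Spirogyra", "chloroplasts", "present")

if __name__ == "__main__":
    print(list(store.match(predicate="is_a")))
    print(store.ancestors("espresso"))
```

The Spirogyra:chloroplasts:present form maps directly onto subject, predicate, and object. As a result, facts extracted from legacy descriptions and ontology statements can sit in the same store and be queried together.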
