Karen Cranston
National Evolutionary Synthesis Center
@kcranstn
http://www.slideshare.net/kcranstn
opentreeoflife.org
What does it mean to “have” the tree of life?
complete & dynamic
browse, download, query
use for research questions
implies digital access
Figure: Phylogeny papers, 1978–2008. Number of papers published per year (y-axis 0 to 12000). Source: ISI Web of Science.
Rapid increase in applications of phylogeny, beginning in early 1990s
graph from David Hillis
Goals
1. Synthesize a complete draft tree of life from existing phylogenies
2. Release in year 1 with:
a. engaging public interface
b. ability to upload new data, explore conflict, see provenance
c. open data: tree, subtrees and source data
Graph databases of
taxonomy + source trees
•filter / weight input trees
•combine into synthetic trees
•feedback
•input new data sets
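The filter / weight / combine steps above can be illustrated with a toy sketch. It is not the project's synthesis algorithm: the source names, the weights, and the rule "keep the parent edge with the highest total source weight" are all assumptions chosen to show the idea of combining a taxonomy and a source tree in one graph while keeping track of which source supports each edge.

```python
from collections import defaultdict

# Hypothetical inputs: each source is a list of (child, parent) edges.
SOURCES = {
    "taxonomy": [("cat", "Carnivora"), ("dog", "Carnivora"),
                 ("seal", "Carnivora"), ("Carnivora", "Mammalia")],
    "study_A": [("dog", "Caniformia"), ("seal", "Caniformia"),
                ("Caniformia", "Carnivora")],
}
# Hypothetical weights: the phylogeny outranks the taxonomy.
# Filtering a source amounts to leaving it out of SOURCES.
WEIGHTS = {"taxonomy": 1, "study_A": 2}


def build_graph(sources):
    """Map each child to {parent: set of sources supporting that edge}."""
    graph = defaultdict(lambda: defaultdict(set))
    for name, edges in sources.items():
        for child, parent in edges:
            graph[child][parent].add(name)
    return graph


def synthesize(graph, weights):
    """For each child, keep the parent edge with the highest total weight."""
    tree = {}
    for child, parents in graph.items():
        # sorted() makes tie-breaking deterministic (alphabetical).
        tree[child] = max(
            sorted(parents),
            key=lambda p: sum(weights.get(s, 0) for s in parents[p]),
        )
    return tree


if __name__ == "__main__":
    for child, parent in sorted(synthesize(build_graph(SOURCES), WEIGHTS).items()):
        print(child, "->", parent)
```

In this toy example the taxonomy alone would place dog and seal directly under Carnivora; the higher-weighted study inserts Caniformia between them, while cat, which no study covers, keeps its taxonomic placement.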
~ 4% of all published
phylogenetic trees
Stoltzfus et al 2012
Inputs: Phylogenetic data
Archiving sequence data is a community norm
assembly
alignment
inference
expertise
time
$$$
Furthermore, a paraphyletic relationship of phorids and syrphids
would support the hypothesis that their shared special mode of
extraembryonic development (dorsal amnion closure) (26)
evolved in the stem lineage of Cyclorrhapha and preceded the
origin of the schizophoran amnioserosa.
To test this hypothesis, we used a relatively recent phylogenomic
marker: small, noncoding, regulatory micro-RNAs (miRNAs).
miRNAs exhibit a striking phylogenetic pattern of conservation
across the metazoan tree of life, suggesting the accumulation and
maintenance of miRNA families throughout organismal evolution
Fig. 1. Combined molecular phylogenetic tree for Diptera. Partitioned ML analysis of combined taxon sets of tier 1 and tier 2 FLYTREE data samples (−lnL = 344155.6169) calculated in RAxML. Circles indicate bootstrap support >80% (black/bp = 95–100%, gray/bp = 88–94%, white/bp = 80–88%). Nodes with improved bootstrap values resulting from postanalysis pruning of unstable taxa are marked by stars (black/bp = 95–100%, gray/bp = 88–94%, white/bp = 80–88%). Colored squares on terminal branches indicate the presence, in at least one species of a family, of ecological traits as shown to lower left. The number of origins of each trait was estimated with reference to the phylogeny, the distribution of each trait among genera within a family, and the known biology of the organisms.
Wiegmann et al. PNAS Early Edition | 3 of 6
Why do we need to database phylogenetic trees?
Heroic data collection efforts
Surveyed >7000 phylogenetic studies in plants, fungi,
animals and unicellular organisms
Result: repository of data for >2300 studies, >4800 trees
Remaining data not available digitally
Manuscript accepted to PLoS Biology
Inputs: Taxonomy
Large fraction of species not represented in phylogenies
taxonomy provides backbone & coverage at tips
Need name resolution services for data cleaning
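As a rough illustration of what name resolution for data cleaning involves, the sketch below normalizes tip labels from a tree file and matches them against a taxonomy, falling back to fuzzy matching for misspellings. The three-name taxonomy, the `normalize` and `resolve` helpers, and the 0.85 cutoff are all hypothetical; a real service would query a full taxonomy and handle synonyms and homonyms.

```python
import difflib

# Hypothetical mini-taxonomy standing in for a full reference taxonomy.
TAXONOMY = ["Drosophila melanogaster", "Musca domestica", "Anopheles gambiae"]


def normalize(label):
    """Replace underscores, collapse whitespace, capitalize the genus only."""
    words = label.replace("_", " ").split()
    if not words:
        return ""
    return " ".join([words[0].capitalize()] + [w.lower() for w in words[1:]])


def resolve(label, taxonomy=TAXONOMY, cutoff=0.85):
    """Return the matching taxonomy name, or None if nothing is close enough."""
    name = normalize(label)
    if name in taxonomy:
        return name
    # Fuzzy fallback for typos; cutoff is an assumed similarity threshold.
    close = difflib.get_close_matches(name, taxonomy, n=1, cutoff=cutoff)
    return close[0] if close else None


if __name__ == "__main__":
    for raw in ["drosophila_melanogaster", "Musca domestca", "Homo sapiens"]:
        print(repr(raw), "->", resolve(raw))
```

Labels that resolve to None are the ones that would need manual curation before a tree can be linked to the taxonomy.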
Process
Source trees (Phylografter)
Taxonomies (taxomachine)
Data storage & synthesis (treemachine)
OpenTree: visualization, search, download
Source tree management
phylografter.opentreeoflife.org
Source tree & taxonomy synthesis
Novel graph database for phylogenies (treemachine) and
taxonomy (taxomachine)
Allows for efficient storage and retrieval
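One kind of retrieval such a store makes possible is pulling out the subtree below a named node. The sketch below is only an illustration with a plain Python dictionary, not the treemachine implementation: the `PARENT` edges and the `subtree` helper are hypothetical, and the output is a Newick string with internal node labels.

```python
from collections import defaultdict

# Hypothetical child -> parent edges of a small synthetic tree.
PARENT = {
    "cat": "Carnivora",
    "dog": "Caniformia",
    "seal": "Caniformia",
    "Caniformia": "Carnivora",
    "Carnivora": "Mammalia",
}


def children_index(parent):
    """Invert child -> parent into parent -> sorted list of children."""
    index = defaultdict(list)
    for child, par in parent.items():
        index[par].append(child)
    for kids in index.values():
        kids.sort()
    return index


def newick(node, index):
    """Recursively write the clade rooted at node in Newick notation."""
    kids = index.get(node, [])
    if not kids:
        return node
    return "(" + ",".join(newick(k, index) for k in kids) + ")" + node


def subtree(node, parent=PARENT):
    """Return the Newick string for the subtree rooted at node."""
    return newick(node, children_index(parent)) + ";"


if __name__ == "__main__":
    print(subtree("Carnivora"))
```

A graph database can answer the same question without loading the whole tree, which matters at the scale of a complete tree of life.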
OpenTree
dev.opentreeoflife/opentree
Public tree of life
publictreeoflife.com/tree
open data: requiring CC0 license on source trees
open source software: https://github.com/OpenTreeOfLife
wiki: http://opentree.wikispaces.com/ (52 members)
public mailing list (67 members)
“Open” Tree of Life
Community engagement
~50 visitors per day to blog.opentreeoflife.org
@opentreeoflife on Twitter (~900 followers)
Tree of Life symposium: Evolution 2013
Hackathon in year 2 (joint with Arbor)
Collaborations
providing images and text for public tree
developing methods for subtree extraction
summer student providing links to ToLWeb
pages
treeviz project from U Indiana MOOC,
upcoming summer intern
year 2-3 plans for data archiving / harvest
Assessment: PI survey
general satisfaction with progress on data collection,
synthesis and software development
more focus on incentives for users
more integration across labs
Assessment: Advisory board	
Members:
David Hillis (UT Austin)
Jan Reichelt (Mendeley)
Andy Sinauer (Sinauer Associates)
Planning meeting for start of year 2
On track for year 1 release
1. Synthesize a complete draft tree of life from existing phylogenies
2. Release in year 1 with:
a. engaging public interface
b. ability to upload new data, explore conflict, see provenance
c. open data: tree, subtrees and source data
Goals for year 2
Refine draft tree based on user feedback
Empirical use cases drive development
Incentives for users / data contributors
Collaboration with external projects (AVAToL, ToLWeb,
Phylotastic, Dryad)
opentreeoflife.org

Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 

Open Tree of Life @NSF

  • 1. Karen Cranston National Evolutionary Synthesis Center @kcranstn http://www.slideshare.net/kcranstn opentreeoflife.org
  • 2. What does it mean to “have” the tree of life? complete & dynamic browse, download, query use for research questions implies digital access
  • 4. Goals 1. Synthesize a complete draft tree of life from existing phylogenies 2. Release in year 1 with: a. engaging public interface b. ability to upload new data, explore conflict, see provenance c. open data: tree, subtrees and source data
  • 5. Graph databases of taxonomy + source trees •filter / weight input trees •combine into synthetic trees •feedback •input new data sets
  • 6. Inputs: Phylogenetic data Archiving sequence data is a community norm ~4% of all published phylogenetic trees (Stoltzfus et al. 2012)
  • 7. Why do we need to database phylogenetic trees? assembly, alignment, inference; expertise, time, $$$ [Figure: page from Wiegmann et al., PNAS Early Edition, showing Fig. 1, a combined molecular phylogenetic tree for Diptera (partitioned ML analysis of FLYTREE tier 1 and tier 2 data, calculated in RAxML)]
  • 8. Heroic data collection efforts Surveyed >7000 phylogenetic studies in plants, fungi and animals, unicellular organisms Result: repository of data for >2300 studies, >4800 trees Remaining data not available digitally Manuscript accepted to PLoS Biology
  • 9. Inputs: Taxonomy Large fraction of species not represented in phylogenies taxonomy provides backbone & coverage at tips Need name resolution services for data cleaning
  • 10. Process Source trees (Phylografter) Taxonomies (taxomachine) Data storage & synthesis (treemachine) OpenTree: visualization, search, download
  • 12. Source tree & taxonomy synthesis Novel graph database for phylogenies (treemachine) and taxonomy (taxomachine) Allows for efficient storage and retrieval
  • 14. Public tree of life publictreeoflife.com/tree
  • 15. “Open” Tree of Life open data: requiring CC0 license on source trees; open source software: https://github.com/OpenTreeOfLife; wiki: http://opentree.wikispaces.com/ (52 members); public mailing list (67 members)
  • 16. Community engagement ~50 visitors per day to blog.opentreeoflife.org @opentreeoflife on Twitter (~900 followers) Tree of Life symposium: Evolution 2013 Hackathon in year 2 (joint with Arbor)
  • 17. Collaborations providing images and text for public tree developing methods for subtree extraction summer student providing links to ToLWeb pages treeviz project from U Indiana MOOC, upcoming summer intern year 2-3 plans for data archiving / harvest
  • 18. Assessment: PI survey general satisfaction with progress on data collection, synthesis and software development more focus on incentives for users more integration across labs
  • 19. Assessment: Advisory board Members: David Hillis (UT Austin) Jan Reichelt (Mendeley) Andy Sinauer (Sinauer Associates) Planning meeting for start of year 2
  • 20. On track for year 1 release 1. Synthesize a complete draft tree of life from existing phylogenies 2. Release in year 1 with: a. engaging public interface b. ability to upload new data, explore conflict, see provenance c. open data: tree, subtrees and source data
  • 21. Goals for year 2 Refine draft tree based on user feedback Empirical use cases drive development Incentives for users / data contributors Collaboration with external projects (AVAToL, ToLWeb, Phylotastic, Dryad)
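
The synthesis idea on slides 5 and 12 (load taxonomy and source trees into one graph, weight the inputs, then extract a synthetic tree) can be illustrated with a toy sketch. This is not treemachine's actual algorithm or data model: the taxon names, the `build_graph` and `synthesize` helpers, and the weights are all hypothetical, and real source trees would be mapped by taxon sets rather than by matching internal node names.

```python
from collections import defaultdict

# Hypothetical toy inputs: each source maps child -> parent.
taxonomy = {
    "Homo": "Primates",
    "Pan": "Primates",
    "Primates": "Mammalia",
    "Mus": "Mammalia",
}
study_a = {
    "Homo": "Hominini",
    "Pan": "Hominini",
    "Hominini": "Primates",
}


def build_graph(sources):
    """Merge child->parent edges from every source, keeping provenance."""
    graph = defaultdict(lambda: defaultdict(list))
    for source_name, edges in sources.items():
        for child, parent in edges.items():
            graph[child][parent].append(source_name)
    return graph


def synthesize(graph, weights):
    """For each child, keep the parent edge with the largest summed weight."""
    tree = {}
    for child, parents in graph.items():
        tree[child] = max(
            parents,
            key=lambda p: sum(weights[s] for s in parents[p]),
        )
    return tree


sources = {"taxonomy": taxonomy, "study_a": study_a}
# Assumed weighting: the phylogenetic study outranks the taxonomy.
weights = {"taxonomy": 1.0, "study_a": 2.0}

graph = build_graph(sources)
synthetic = synthesize(graph, weights)

# Conflict and provenance: a child with more than one candidate parent.
conflicts = {c: dict(p) for c, p in graph.items() if len(p) > 1}
```

Because every edge keeps the list of sources that support it, the same structure can answer the "explore conflict, see provenance" goal: `conflicts` lists the taxa whose placement differs between the taxonomy and the study, while taxa absent from any study (here `Mus`) fall back to the taxonomy.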