The iPlant Tree of Life Project and Toolkit: Building a Cyberinfrastructure for Plant Science Research. Naim Matasci, 520 303 8623, The iPlant Collaborative. National Museum of Natural History, Jul 14, 2011
What is iPlant?
Discovery Environment NEW RELEASE COMING SOON! http://www.iplantcollaborative.org/discovery-environment-preview-access
Physical Infrastructure
Computation
20K cores cluster
1 TB RAM
512 GPUs
Storage
20 PB archive
High speed parallel data transfer
Cloud Storage AVAILABLE NOW!
Multiple points of entry: web interface, mounted FS, API
Free and secure http://www.iplantcollaborative.org/about/policies/data-set-hosting
Cloud Computing AVAILABLE NOW! Virtual Machines: up to 4 cores, 32 GB RAM, 100 GB dedicated disk; run any x86-compatible OS (even Windows); persistent or on-demand; log in via SSH or secure VNC. Use Cases: Internet-enabled servers; database management appliances; virtual desktops; …The sky is the limit! http://www.iplantcollaborative.org/atmosphere-preview
Consumer Applications: iPlant's CI
iPlant Tree of Life Grand Challenge. Large phylogenetic inference: building a tree of life for up to 500,000 green plants. Tree Visualization: scalable visualization for small to large trees. Data Assembly and Integration: acquisition, organization and processing of the data. Taxonomic Intelligence: sorting out different names for the same species. Tree Reconciliation: resolving discordant gene and species trees. Trait Evolution: using trees to understand how traits evolved.
Big Trees To optimize existing methods to construct phylogenetic trees in the order of 500K taxa.
Big Trees. NINJA/WINDJAMMER (Travis Wheeler): Neighbor-Joining implementation that can analyze > 200K species; six-day run time reduced 32-fold to 4.5 hours for the 220K species data set; two- to three-day run time reduced 1,800-fold to 2 minutes for distance matrix calculation on the 220K set. RAxML-Light (Alexandros Stamatakis): large-scale Maximum Likelihood implementation; 55K tree published (Stephen A. Smith et al., “Understanding angiosperm diversification using small and large phylogenetic trees,” American Journal of Botany 98, no. 3 (2011): 404-414). AVAILABLE NOW!
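For readers unfamiliar with Neighbor-Joining, the following is a minimal sketch of the plain textbook algorithm in Python. It is not NINJA's optimized implementation, and it would not scale to 200K species. The function name, the frozenset-keyed distance dictionary, and the five-taxon example matrix are all assumptions made for illustration.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Textbook neighbor-joining (illustrative sketch, cubic time).

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) -> distance between a and b.
    Returns a list of (node, neighbor, branch_length) edges of an unrooted tree.
    """
    d = dict(dist)
    nodes = list(names)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums of the current distance matrix.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Pick the pair minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dij = d[frozenset((i, j))]
        # Branch lengths from i and j to the new internal node u.
        li = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        u = f"node{next_id}"  # hypothetical label for the new internal node
        next_id += 1
        edges += [(i, u, li), (j, u, lj)]
        # Distances from u to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij
                ) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


# Made-up additive distances for five taxa, for illustration only.
EXAMPLE_NAMES = ["a", "b", "c", "d", "e"]
_MATRIX = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
EXAMPLE_DIST = {
    frozenset((EXAMPLE_NAMES[x], EXAMPLE_NAMES[y])): _MATRIX[x][y]
    for x, y in combinations(range(5), 2)
}
EXAMPLE_EDGES = neighbor_joining(EXAMPLE_NAMES, EXAMPLE_DIST)
```

Each iteration scans every pair of remaining nodes, which is the cost that large-scale implementations work to avoid.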
Tree Visualization To develop an application for viewing, analyzing and exploring large phylogenetic trees.
Tree Visualization: > 500K taxa; fast; web based, platform independent; semantic zooming; metadata-driven display of information
iPlant Tree Viewer Prototype AVAILABLE NOW! http://portnoy.iplantcollaborative.org/
1KP Collaboration (1KP) – To support the data analysis of the Thousand Plant Transcriptomes Project
1KP (figure; axes N(genes) and N(species)): completed genomes, dozens of species; PCR, dozens of genes in 104 species; unexplored territory
Phylogenomics of 1000 species across plant taxa. Broad phylogenetic coverage: algae, non-flowering, flowering (angiosperm); on role of polyploidy in Darwin’s “abominable mystery”
Tree Reconciliation To reconcile the evolutionary history of genes and species.
Tree Reconciliation (gene family data courtesy John Bowers)
Taxonomic Name Resolution Collaboration (BIEN) - To unify and resolve synonymous, erroneous, or other conflicting taxonomic names.
Taxonomic uncertainty
Non-existent names
Contamination
Annotations
Morphospecies
Digitization issues (frame shifts, character encoding)
Lexical variants (digitization conventions)
Synonymy
Taxonomic synonyms / concepts
Misidentifications, incomplete identifications
a) Centaurium curvistamineum (Wittr.) Abrams (1951)
b) Centaurium minimum (Howell) Piper (1915)
c) Centaurium muhlenbergii (Griseb.) Wight ex Piper (1906)
d) Centaurium muhlenbergii (Griseb.) Wight ex Piper forma albiflorum (Suksd.) St. John (1937)
e) Centaurium muhlenbergii (Griseb.) Wight ex Piper var. albiflorum Suksd. (1927)
f) Centaurodes muhlenbergii (Griseb.) Kuntze (1891)
g) Erythraea curvistaminea Wittr. (1886)
h) Erythraea minima Howell (1901)
i) Erythraea muhlenbergii Griseb. (1839)
Image: Gordon Leppig & Andrea J. Pickart
How to figure that out? …or ask around at My-Plant.org
Image: Makemake at de.wikipedia
Non-existent names: Herbarium specimens *New World plant specimens, 34 herbaria, simple match against IPNI and TROPICOS, excluding authors
Image: Hans Hillewaert
Taxonomic Name Resolution Service: computer-assisted standardization of plant names. Corrects spelling errors and alternative spellings to a standard list of names. Converts out-of-date names to currently accepted names.
Availability. Source code (3-clause BSD): http://github.com/iPlantCollaborativeOpenSource/TNRS Web + API instructions: http://tnrs.iplantcollaborative.org
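The two steps of name resolution (correct the spelling against a standard list, then convert an out-of-date name to an accepted one) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard-library difflib matcher as a stand-in and does not reproduce the TNRS code or its API. The function name, the cutoff, and above all the synonym table are hypothetical. The names are taken from the Centaurium example, but which of them is "accepted" here is invented for the demonstration and is not a taxonomic statement.

```python
from difflib import get_close_matches

# Standard list of names (author strings omitted), taken from the example above.
STANDARD_NAMES = [
    "Centaurium curvistamineum",
    "Centaurium minimum",
    "Centaurium muhlenbergii",
    "Erythraea minima",
    "Erythraea muhlenbergii",
]

# HYPOTHETICAL synonym table: chosen for illustration only,
# not an actual taxonomic opinion.
SYNONYM_OF = {
    "Erythraea muhlenbergii": "Centaurium muhlenbergii",
    "Erythraea minima": "Centaurium minimum",
}


def resolve(name, cutoff=0.8):
    """Return the (hypothetically) accepted name for a raw input, or None."""
    # Normalize whitespace and capitalization: "Genus epithet".
    cleaned = " ".join(name.split())
    cleaned = cleaned[:1].upper() + cleaned[1:].lower()
    # Step 1: fuzzy-match against the standard list to fix spelling variants.
    matches = get_close_matches(cleaned, STANDARD_NAMES, n=1, cutoff=cutoff)
    if not matches:
        return None
    matched = matches[0]
    # Step 2: convert an out-of-date name to its accepted name, if listed.
    return SYNONYM_OF.get(matched, matched)
```

For instance, `resolve("Erythrea muhlenbergi")` first matches the misspelling to "Erythraea muhlenbergii" and then maps it through the illustrative synonym table, while a name with no close match in the list returns `None`.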

Trait Evolution To develop an infrastructure for downstream analysis of large trees.
Trait Evolution: toolkit to study the evolution of traits of interest on very large phylogenies: diversification, biogeographic patterns, adaptation, co-evolution, …
Current analyses (Proof of concept): Phylogenetically Independent Contrasts (Felsenstein 1985); Continuous Ancestral Character Estimation (Schluter et al. 1997, Paradis 2004); Discrete Ancestral Character Estimation (Pagel 1994, Paradis 2004)
Community Integrated (2 ½ Days Workshop) EUtils Lopper RAxML Ninja Phyml Muscle PHYLIP VCF to GFF script LRmaqqtl FASTX quality stats FASTX quality boxplot FASTX nucleotide distribution Cuffcompare ERMINEJ progressiveMauve iPlantBorda (mlpy) iPlantCanberra (mlpy) vbay MECPM OUCH Picante Ontologize BOWTIE BWA TopHat SHRiMP Cuffdiff GNU Core Text utilities GeneMania SRA import PARS PL DTT BBC biclustering
My-Plant.org To easily share information and research, collaborate, and stay on top of the latest news in the field.
Collaborative Tool AVAILABLE NOW! NEW AND IMPROVED! http://my-plant.org/

Editor's Notes

  1. Bringing a culture of computing to the Plant Sciences.
  2. World class resources:
     Rocinante: 128 cores; 16 nodes; 64 GB node; 300 TB storage.
     Corral: 1.7 PB storage + 20 PB archive.
     Lonestar4: 22,656 Intel Westmere cores; 40 GB QDR-IB; 1 PB storage; 44.3 TB RAM. Plus 1 TB RAM, GPU, and Cloud upgrades.
     Longhorn: 2048 Intel Nehalem cores; 512 NVIDIA Quadro FX 5800 GPUs; 14.5 TB RAM; 1 PB storage.
     Ranger: 62,976 AMD Opteron cores; 123 TB RAM; 32 GPUs; 1.7 PB storage.
  3. Large: >2 Gigs, where browsers fail
  4. Highest level of abstraction
  5. Distance matrix calculation compared to FastTree
  6. BIEN: Botanical Information and Ecology Network
  7. Parsing: GNI Parser by Dmitry Mozzherin. Matching: Taxamatch by Tony Rees.
  8. Provide the scientific community with a toolkit that will allow them to study the evolution of traits of interest, for example adaptation in response to past climate change, or co-evolution of pollinators and flowers or hosts and parasites.
  9. Contrast: test for correlation of continuous traits, taking phylogeny into account. DACE: estimating the state of a discrete trait (e.g. presence/absence of fruit, color) in the ancestors of a group of taxa. CACE: estimating the value of a continuous trait (e.g. yield, height) in the ancestors of a group of taxa.