A distance-based method for phylogenetic tree reconstruction using algebraic geometry
Emily Castner*¹, Brent Davis², and Dr. Joseph Rusinko³
¹Mount Holyoke College, ²Colorado State University, ³Hobart and William Smith Colleges
[Figure: method flowchart. Step labels: "Distance to variety", "Keep if cutoff > value", "Phylogenetic Tree". Annotations: variety for a probabilistic model of evolution; aligned DNA sequence data as data points for each topology; quartet topology gives taxa pairings; data-dependent hypothesis testing and other scores.]
Phylogenetic trees show evolutionary history
• We reconstruct four-species quartet trees, which can be amalgamated into larger phylogenetic trees.
• In Markov models, species evolve through random, independent nucleotide substitutions in their genomes.
• Topology: the tree structure that pairs up species, showing which are more closely related.
• Quartet trees have three possible topologies; one is correct.
• Words: four-letter combinations of the DNA bases {A, C, G, T}.
• The three possible species orderings give three sets of words.
Varieties show expected word distributions
• Variety: the solution set of a system of polynomial equations.
• Parameterized by a map from nucleotide substitution probabilities to genome word distributions.
Select the topology closest to the variety
• The data fit a topology if the distance to its variety is small.
Test a tree space of varying branch lengths
Kimura 3-parameter model is most accurate
Algebraic geometry has phylogenetic promise
References
M. Casanellas, L. D. Garcia, and S. Sullivant, “Catalog of small trees,” in Algebraic Statistics for Computational Biology. New York: Cambridge Univ. Press, 2005, pp. 291-304. [Online]. Available: http://dx.doi.org/10.1017/CBO9780511610684.019
N. Eriksson, “Using invariants for phylogenetic tree construction,” in Emerging Applications of Algebraic Geometry, IMA Volumes in Mathematics and its Applications, vol. 149. New York: Springer, 2009, pp. 89-108.
J. Fernández-Sánchez and M. Casanellas, “Invariant versus classical quartet inference when evolution is heterogeneous across sites and lineages,” Systematic Biology, 2015.
T. H. Jukes and C. R. Cantor, “Evolution of protein molecules,” Mammalian Protein Metabolism, vol. 3, pp. 21-132, 1969.
A. M. Kedzierska and M. Casanellas, “GenNon-h: Generating multiple sequence alignments on nonhomogeneous phylogenetic trees,” BMC Bioinformatics, vol. 13, no. 1, p. 216, 2012.
M. Kimura, “A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences,” Journal of Molecular Evolution, vol. 16, no. 2, pp. 111-120, 1980.
M. Kimura, “Estimation of evolutionary distances between homologous nucleotide sequences,” Proceedings of the National Academy of Sciences, vol. 78, no. 1, pp. 454-458, 1981.
M. S. Swenson, R. Suri, C. R. Linder, and T. Warnow, “An experimental study of Quartets MaxCut and other supertree methods,” Algorithms for Molecular Biology, vol. 6, no. 1, p. 7, 2011.
Acknowledgments
This material is based upon work supported by the National Science Foundation under Grant No. DMS-1358534 and hosted by the 2015 Winthrop University REU: Bridging Applied & Theoretical Mathematics. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
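Summarize observed distributions as points
[Figure: the same four aligned sequences in each of the three species orderings. Reading each alignment column from top to bottom gives one word.]
Ordering 1: sequences ATCG, ACGT, ATTG, AGCA; words AAAA, TCTG, CGTC, GTGA
Ordering 2: sequences ATCG, ATTG, AGCA, ACGT; words AAAA, TTGC, CTCG, GGAT
Ordering 3: sequences ATCG, AGCA, ATTG, ACGT; words AAAA, TGTC, CCTG, GAGT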
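The word extraction in the example above can be sketched in a few lines. This is a minimal illustration and not the poster's code. The names `words`, `observed_point`, `SEQS`, and `ORDERINGS` are hypothetical, and turning word counts into relative frequencies is an assumption about how the observed distribution becomes a point.

```python
from collections import Counter

# The four aligned sequences from the example, in their first ordering.
SEQS = ["ATCG", "ACGT", "ATTG", "AGCA"]

# The three species orderings shown in the example, as indices into SEQS.
ORDERINGS = [(0, 1, 2, 3), (0, 2, 3, 1), (0, 3, 2, 1)]

def words(sequences, order):
    """Read each alignment column top to bottom, with rows taken in the given order."""
    rows = [sequences[i] for i in order]
    return ["".join(column) for column in zip(*rows)]

def observed_point(word_list):
    """Relative frequency of each observed word (unobserved words are implicitly 0)."""
    counts = Counter(word_list)
    total = len(word_list)
    return {word: count / total for word, count in counts.items()}

for order in ORDERINGS:
    print(order, words(SEQS, order))
```

With a real alignment, each ordering yields one observed point that can be compared against the variety.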
[Figure labels: Python, Matlab, MaxCut]
Jukes-Cantor model is fastest
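[Figure: left quartet tree with internal branch length c and leaf branch lengths a, a, b, b]
• Left tree: a, b ∈ [0.01, 1.5]; c = a
• Left tree: a = 0.05; b = 0.75; c ∈ [0.01, 0.4]
• Right tree: a = 0.05; b = 0.75; c ∈ [0.01, 0.4]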
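The three parameter sweeps listed above can be enumerated as follows. This is a sketch that reads the ranges as closed intervals, which is an assumption because the brackets did not survive extraction. The step count, the evenly spaced grid, and the names `linspace` and `tree_space` are hypothetical choices. The poster does not state how the tree space was sampled.

```python
def linspace(lo, hi, steps):
    """Evenly spaced values from lo to hi inclusive (steps must be at least 2)."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]

def tree_space(steps=5):
    """List the branch-length settings for the three sweeps.

    Each setting is a dict with the tree shape ("left" or "right")
    and the branch lengths a, b (leaf edges) and c (internal edge).
    """
    grid_ab = linspace(0.01, 1.5, steps)
    grid_c = linspace(0.01, 0.4, steps)
    settings = []
    # Sweep 1: left tree, a and b vary, internal branch tied to a.
    for a in grid_ab:
        for b in grid_ab:
            settings.append({"tree": "left", "a": a, "b": b, "c": a})
    # Sweep 2: left tree, a and b fixed, internal branch varies.
    for c in grid_c:
        settings.append({"tree": "left", "a": 0.05, "b": 0.75, "c": c})
    # Sweep 3: right tree, a and b fixed, internal branch varies.
    for c in grid_c:
        settings.append({"tree": "right", "a": 0.05, "b": 0.75, "c": c})
    return settings

settings = tree_space(steps=5)
```

Each setting would then be used to simulate alignments and score the reconstruction.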
Jukes-Cantor reconstructs other models’ data
[Figure: right quartet tree with internal branch length c and leaf branch lengths a, b, a, b]
• Using the Jukes-Cantor implementation on all data gives a faster runtime with almost the same accuracy.
• At least 95% accurate for 64% of the sample space
• At least 85% accurate for 77% of the sample space
• Most effective when high values of b and extreme values of a are excluded
• Reconstruction methods are useful for biologists.
Jukes-Cantor
    A   C   G   T
A   x0  x1  x1  x1
C   x1  x0  x1  x1
G   x1  x1  x0  x1
T   x1  x1  x1  x0
Kimura 2-Parameter
    A   C   G   T
A   x0  x1  x2  x1
C   x1  x0  x1  x2
G   x2  x1  x0  x1
T   x1  x2  x1  x0
Kimura 3-Parameter
    A   C   G   T
A   x0  x1  x2  x3
C   x1  x0  x3  x2
G   x2  x3  x0  x1
T   x3  x2  x1  x0
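The three substitution matrices share one pattern, which the sketch below makes explicit. It assumes the entries are probabilities, so that x0 is fixed by the requirement that each row sums to 1. The function names are hypothetical. With the bases indexed A=0, C=1, G=2, T=3, the Kimura 3-parameter entry in row i and column j is the parameter numbered i XOR j, and the two simpler models arise by setting parameters equal.

```python
BASES = "ACGT"

def k3p_matrix(x1, x2, x3):
    """Kimura 3-parameter matrix; x0 is chosen so each row sums to 1."""
    x0 = 1.0 - (x1 + x2 + x3)
    if min(x0, x1, x2, x3) < 0.0:
        raise ValueError("parameters must be non-negative and sum to at most 1")
    params = [x0, x1, x2, x3]
    # Entry (i, j) uses parameter number i XOR j, matching the table layout.
    return [[params[i ^ j] for j in range(4)] for i in range(4)]

def k2p_matrix(x1, x2):
    """Kimura 2-parameter matrix: the 3-parameter pattern with x3 = x1."""
    return k3p_matrix(x1, x2, x1)

def jc_matrix(x1):
    """Jukes-Cantor matrix: all off-diagonal entries equal."""
    return k3p_matrix(x1, x1, x1)

for row_base, row in zip(BASES, k3p_matrix(0.1, 0.2, 0.3)):
    print(row_base, row)
```

Because Jukes-Cantor is a special case of both Kimura models, its variety has the fewest free parameters.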
