A biologist’s guide to
Bayesian phylogenetic
analysis
Nascimento et al. 2017
By Ryan Long
Systematics
“Nascimento’s BEST
publication since 2011”
Introduction
• Bayesian phylogenetic methods were introduced in the 1990s and have revolutionized the way we analyze sequence data
• Their popularity is due to (1) the development of powerful models of data analysis and (2) the availability of user-friendly computer programs to apply the models
• But models are becoming increasingly complicated, and the programs themselves are not always clear to users
Why does this paper exist?
• Purpose: “this paper aims to explain the basic concepts of Bayesian statistics and discuss the major features of MCMC algorithms, such as the prior and the likelihood, MCMC proposals, diagnosis of MCMC convergence and mixing, and the summary of the posterior sample.”
• Audience: the empirical biologist who needs to use Bayesian phylogenetic programs to analyze their data
What is the Bayesian method?
• It is a statistical inference methodology and the basis for a number of programs used to estimate phylogenies
• It uses Bayes' theorem, which describes the probability of an event based on prior knowledge of conditions that might be related to the event
• Probability distribution: the mathematical function that gives the probabilities of all the possible values a variable can take in a given range
• Prior (probability distribution): the probability distribution that expresses one's beliefs about an unknown quantity (such as a model parameter) before the data are taken into account
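Bayes' theorem can be written as P(H | D) = P(D | H) P(H) / P(D), where P(H) is the prior, P(D | H) the likelihood, and P(D) the sum of P(D | H) P(H) over all hypotheses. A minimal Python sketch applying it to a finite set of hypotheses follows. The three "topologies" T1 to T3, their prior probabilities, and their likelihood values are all made up for illustration; real programs cannot enumerate trees like this once there are more than a handful of taxa.

```python
from fractions import Fraction


def posterior(prior: dict[str, Fraction], likelihood: dict[str, Fraction]) -> dict[str, Fraction]:
    """Apply Bayes' theorem over a finite set of hypotheses.

    prior[h]      = P(h), must sum to 1
    likelihood[h] = P(D | h), probability of the observed data under h
    Returns P(h | D) = P(D | h) P(h) / P(D), with P(D) = sum over h of P(D | h) P(h).
    """
    joint = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(joint.values())  # P(D), the normalizing constant
    return {h: joint[h] / evidence for h in joint}


# Hypothetical numbers for three candidate trees.
prior = {"T1": Fraction(1, 2), "T2": Fraction(1, 4), "T3": Fraction(1, 4)}
likelihood = {"T1": Fraction(6, 10), "T2": Fraction(3, 10), "T3": Fraction(1, 10)}

post = posterior(prior, likelihood)
for h, p in post.items():
    print(h, p)  # T1 3/4, T2 3/16, T3 1/16
```

Exact fractions are used so the arithmetic is easy to check by hand: the data shift belief from 1/2 to 3/4 for T1.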
Bayesian method cont.
• Posterior distribution: the probability distribution of the unknown quantities (e.g., the tree and model parameters) after the data have been taken into account; basically, what you think post-analysis
Summarizes all our information about what's being examined, combining the prior and the data
As we collect more data, the posterior distribution should become more sharply peaked around a single value
Ex: Previous data suggest 20% of systematics students explode by the end of the semester (prior). You think it's higher, so you collect data, and your updated estimate is the posterior (50% of students will explode)
[Figure: prior belief + data → posterior belief]
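The exploding-students example can be made concrete with a conjugate Beta-binomial model, where the posterior has a closed form. This is a sketch under assumptions that are not on the slide: the Beta(2, 8) prior (mean 0.2) and the survey result of 13 exploded students out of 20 are hypothetical numbers, chosen so the posterior mean lands on the slide's 50%. Most phylogenetic posteriors have no closed form like this, which is why MCMC is needed.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Beta:
    """Beta(a, b) distribution over a proportion (here: the fraction of students who explode)."""
    a: float
    b: float

    @property
    def mean(self) -> float:
        return self.a / (self.a + self.b)

    @property
    def sd(self) -> float:
        s = self.a + self.b
        return math.sqrt(self.a * self.b / (s * s * (s + 1)))


def update(prior: Beta, exploded: int, n: int) -> Beta:
    """Conjugate update: Beta(a, b) prior + k of n observed -> Beta(a + k, b + n - k) posterior."""
    if not 0 <= exploded <= n:
        raise ValueError("exploded must be between 0 and n")
    return Beta(prior.a + exploded, prior.b + n - exploded)


prior = Beta(2, 8)                       # hypothetical prior belief: mean 0.2
post = update(prior, exploded=13, n=20)  # hypothetical survey: 13 of 20 students exploded
print(f"prior mean {prior.mean:.2f} (sd {prior.sd:.3f}) -> "
      f"posterior mean {post.mean:.2f} (sd {post.sd:.3f})")
```

The posterior is Beta(15, 15): its mean moves from 0.2 to 0.5, and its standard deviation shrinks, which is the "more sharply peaked" behaviour described above.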
What type of data can be used?
• The most common types of data in phylogenetic analyses are:
  • DNA and amino acid sequence alignments
  • Morphological characters
Sequences must be properly aligned before input
[Figure: multiple sequence alignment]
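A quick sanity check before input is that every sequence in the file has the same length. The sketch below assumes a plain FASTA file; `parse_fasta`, `is_aligned`, and the taxon names are hypothetical. Equal lengths are necessary but not sufficient: this check cannot tell whether homologous sites actually line up, which is the job of an alignment program.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Minimal FASTA reader: '>name' header lines followed by sequence lines."""
    records: dict[str, str] = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[name] += line.upper()
    return records


def is_aligned(records: dict[str, str]) -> bool:
    """True if there is at least one sequence and all sequences have the same length."""
    return len(records) > 0 and len({len(seq) for seq in records.values()}) == 1


# Hypothetical toy data; '-' marks an alignment gap.
aligned = """>taxonA
ACGT-ACGT
>taxonB
ACGTTACGT
>taxonC
AC-TTACGT
"""
unaligned = ">taxonA\nACGTACGT\n>taxonB\nACGTTACGT\n"

print(is_aligned(parse_fasta(aligned)))    # True
print(is_aligned(parse_fasta(unaligned)))  # False
```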
How to choose a prior for Bayesian analysis
• Prior selection involves a certain level of subjectivity, and a poorly chosen prior can affect results
• The prior distribution can strongly affect the posterior distribution, especially when the data are weakly informative
• Default priors in software packages may not be appropriate for one's data
• It is also necessary to specify a prior on the evolutionary rates for the different loci or partitions
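How much the prior matters depends on how informative the data are. The sketch below uses a toy one-parameter Beta-binomial model (not a phylogenetic rate prior) with hypothetical counts: two different priors give very different posterior means with 5 observations, but similar ones with 1,000. `posterior_mean` and both prior choices are illustrative assumptions.

```python
def posterior_mean(a: float, b: float, k: int, n: int) -> float:
    """Posterior mean of a proportion under a Beta(a, b) prior after k successes in n trials."""
    return (a + k) / (a + b + n)


# Hypothetical priors: a flat one, and a strong one centred on 0.2.
priors = {"flat Beta(1, 1)": (1, 1), "strong Beta(20, 80)": (20, 80)}
# Hypothetical data sets with the same observed frequency (60%) but different sizes.
datasets = {"n=5": (3, 5), "n=1000": (600, 1000)}

for label, (a, b) in priors.items():
    summary = ", ".join(
        f"{name}: {posterior_mean(a, b, k, n):.3f}" for name, (k, n) in datasets.items()
    )
    print(f"{label} -> {summary}")
```

With 5 observations the two priors disagree by about 0.35; with 1,000 they disagree by less than 0.04. A locus or partition that carries little information behaves like the small data set, which is one reason default priors deserve scrutiny.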
What is MCMC?
Markov chain Monte Carlo (MCMC): a class of algorithms for sampling from a probability distribution
MCMC makes it possible to fit large hierarchical models whose posterior would otherwise require integrating over thousands of unknown parameters
Needed: data, a prior, and a model; MCMC is then used to draw a sample from the posterior distribution
The algorithm generates that sample, which can then be used to estimate the posterior mean and standard deviation, or even the whole posterior distribution
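The basic MCMC recipe (propose a move, accept it with probability min(1, posterior ratio), record the current state) can be sketched with a random-walk Metropolis sampler. This is a minimal sketch for a single parameter p with a hypothetical Beta(2, 8) prior and hypothetical data of 13 successes in 20 trials, so the exact posterior is Beta(15, 15) (mean 0.5, sd about 0.090) and the MCMC estimate can be checked against it. Phylogenetic programs sample trees, branch lengths, and model parameters with many proposal types; only the accept/reject logic carries over.

```python
import math
import random


def log_posterior(p: float, k: int = 13, n: int = 20, a: float = 2.0, b: float = 8.0) -> float:
    """Unnormalized log posterior of p: Beta(a, b) prior x binomial likelihood (k of n)."""
    if not 0.0 < p < 1.0:
        return -math.inf  # the prior gives zero density outside (0, 1)
    return (a - 1 + k) * math.log(p) + (b - 1 + n - k) * math.log(1.0 - p)


def metropolis(n_iter: int, start: float, step: float, seed: int = 1) -> list[float]:
    """Random-walk Metropolis with symmetric normal proposals."""
    if not 0.0 < start < 1.0:
        raise ValueError("start must lie inside (0, 1)")
    rng = random.Random(seed)
    p, logp = start, log_posterior(start)
    chain = []
    for _ in range(n_iter):
        proposal = p + rng.gauss(0.0, step)
        log_q = log_posterior(proposal)
        # P(D) cancels in the posterior ratio, so MCMC never needs the normalizing constant.
        if rng.random() < math.exp(min(0.0, log_q - logp)):
            p, logp = proposal, log_q
        chain.append(p)  # a rejected move records the current state again
    return chain


chain = metropolis(n_iter=20_000, start=0.01, step=0.1)
kept = chain[2_000:]  # discard the first 10% as burn-in
mean = sum(kept) / len(kept)
sd = (sum((x - mean) ** 2 for x in kept) / (len(kept) - 1)) ** 0.5
print(f"MCMC estimate: mean ~ {mean:.3f}, sd ~ {sd:.3f}")
```

The deliberately poor starting value (0.01) gives the chain something to "burn in" before it reaches the high-probability region around 0.5.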
What are convergence, burn-in, and mixing of the MCMC?
• Convergence rate: the MCMC should be visiting regions of high probability within the posterior distribution. The convergence rate is the rate at which the MCMC moves to these high-probability regions from any initial starting point
• Burn-in: the initial part of the MCMC chain, while it is still moving from its starting point toward the target (posterior) distribution and has not yet reached the high-probability regions. These samples are usually discarded
• Slow convergence rate = long burn-in
• Mixing efficiency: how efficiently the MCMC samples the parameters. If an MCMC chain is mixing well, autocorrelation in the chain is low, the effective sample size (ESS) is high, and the estimates obtained are precise. A chain that is mixing well will have parameter traces that look like straight, hairy caterpillars, with the chain fluctuating so rapidly around the equilibrium that there are no obvious trends
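Burn-in removal and ESS can be illustrated on synthetic traces. In this sketch an AR(1) series stands in for MCMC output (hypothetical, not a real chain): a lag-1 correlation `phi` near 0 mimics a well-mixing chain, and one near 1 a poorly mixing chain. Each series starts far from equilibrium (at 30) so there is something to burn in; the 10% burn-in fraction is arbitrary; and the ESS formula N / (1 + 2 Σ ρ_k), truncated at the first non-positive autocorrelation, is one simple estimator. Dedicated trace-analysis tools use more careful estimators, so their numbers will differ.

```python
import random


def ar1_chain(n: int, phi: float, start: float = 30.0, seed: int = 0) -> list[float]:
    """Hypothetical stand-in for an MCMC trace: x_t = phi * x_{t-1} + Gaussian noise."""
    rng = random.Random(seed)
    x, trace = start, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        trace.append(x)
    return trace


def discard_burn_in(trace: list[float], fraction: float = 0.1) -> list[float]:
    """Drop the first `fraction` of samples, taken while the chain approaches equilibrium."""
    return trace[int(len(trace) * fraction):]


def effective_sample_size(xs: list[float], max_lag: int = 1000) -> float:
    """ESS = N / (1 + 2 * sum of autocorrelations), stopping at the first non-positive one."""
    n = len(xs)
    mean = sum(xs) / n
    d = [x - mean for x in xs]  # centered trace
    var = sum(v * v for v in d) / n
    total = 0.0
    for lag in range(1, min(max_lag, n - 1)):
        rho = sum(d[i] * d[i + lag] for i in range(n - lag)) / n / var
        if rho <= 0.0:
            break  # truncate: later estimates are mostly noise
        total += rho
    return n / (1.0 + 2.0 * total)


good = discard_burn_in(ar1_chain(6000, phi=0.1))   # "hairy caterpillar" trace
poor = discard_burn_in(ar1_chain(6000, phi=0.95))  # slowly wandering trace
ess_good = effective_sample_size(good)
ess_poor = effective_sample_size(poor)
print(f"{len(good)} samples kept; ESS well-mixing ~ {ess_good:.0f}, poorly mixing ~ {ess_poor:.0f}")
```

Both traces keep the same number of samples, but the strongly autocorrelated one carries far less independent information, which is what a low ESS signals.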
Conclusion
• Bayesian phylogenetics has undergone explosive growth in the past decade due to powerful models and easy-to-use programs
• Bayesian MCMC methods are most commonly used “in the areas of divergence time estimation integrating molecular, morphological and fossil information, species tree estimation using multi-loci genomic sequence data, and species delimitation incorporating genetic and morphological/ecological information”
Questions


Editor's Notes

  1. Grade A joke opportunity: “You could say it’s a McDouble.” Play Badum tss sound effect