An Introduction to RevBayes and Graphical Models: an introduction to the Bayesian phylogenetics software RevBayes, presented by Tracy A. Heath and Michael J. Landis. Tutorials for RevBayes can be found at http://revbayes.github.io/
Edited July 20, 2016
1. Introduction to Phylogenetics in RevBayes
Tracy A. Heath
Iowa State University
@trayc7
Michael J. Landis
Yale University
@landismj
2016 Workshop on Molecular Evolution
Woods Hole, MA
2. Overview
Overview – Heath
Introduction to RevBayes
• Motivation
• Probabilistic graphical models
• The Rev language and demo
short break
Demo & Tutorial – Landis
Phylogenetic reconstruction in RevBayes
• Demo: tree reconstruction using MCMC under JC
• Tutorial (on your own): specify the HKY model, sample using MCMC, summarize the tree
beer(s)
5. Modular Bayesian Phylogenetics Software
Several software packages in phylogenetics are moving toward a more modular framework:
• reuse code
• easier to extend existing models and implement new models through a rich, language-based interface
• provides a unified framework for analyses under complex models
RevBayes
Bali-Phy
BEAST2
6. RevBayes
Höhna et al. 2016. RevBayes: Bayesian phylogenetic inference using graphical models and an interactive model-specification language. Systematic Biology. (doi: 10.1093/sysbio/syw021)
http://revbayes.com
Development team: Höhna, Lartillot, Huelsenbeck, Ronquist, Landis, Heath, Boussau, & others...
(Höhna et al. 2016. Systematic Biology, 65:726-736.)
7. Graphical Models in RevBayes
Graphical models provide tools for visually & computationally representing complex, parameter-rich probabilistic models.
We can depict the conditional dependence structure of various parameters and other random variables.
Höhna, Heath, Boussau, Landis, Ronquist, Huelsenbeck. 2014. Probabilistic Graphical Model Representation in Phylogenetics. Systematic Biology. (doi: 10.1093/sysbio/syu039)
8. Modeling a Continuous Variable
What is the distribution of heights in a population of penguins?
(sampled heights: 4.098, 2.867, 3.756, 1.693, 3.251, 2.516, 3.998, 2.606, 2.744, 3.463, 4.20, 3.058, 4.559, 3.55, 2.852)
(silhouette from http://phylopic.org/)
9. Modeling a Continuous Variable
To estimate the distribution of a variable (like the heights of all individuals within a population of penguins), we need a prior model of that parameter.
For a parameter like height, we want a distribution with properties that match our biological knowledge.
The Lognormal distribution is an asymmetric probability distribution on positive values.
(silhouette from http://phylopic.org/)
10. The Lognormal Distribution
A variable that is lognormally distributed matches a normal distribution when viewed on a log scale:
x ~ LN(µ, σ)
log(x) ~ Norm(µ, σ)
x must be a positive real number, and it arises as the product of a large number of independent, identically distributed variables.
(silhouette from http://phylopic.org/)
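This defining property, that the log of a lognormal variable is normally distributed, can be checked by simulation. A minimal Python sketch (Python is used here purely for illustration, since the deck's own language is Rev):

```python
import math
import random
import statistics

random.seed(1)
mu, sigma = 0.0, 0.5

# If log(x) ~ Normal(mu, sigma), then x = exp(z) for z ~ Normal(mu, sigma).
xs = [math.exp(random.gauss(mu, sigma)) for _ in range(100_000)]

# Every draw is positive, and the log-transformed draws recover mu and sigma.
assert all(x > 0 for x in xs)
logs = [math.log(x) for x in xs]
assert abs(statistics.mean(logs) - mu) < 0.01
assert abs(statistics.stdev(logs) - sigma) < 0.01
```

The choice of µ = 0 and σ = 0.5 is arbitrary; any values give the same correspondence.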
11. The Lognormal Distribution
(figure: lognormal density curves for µ = 0 and σ² = 0.2, 0.5, 1.0; mean = exp(µ + σ²/2), giving means of 1.11, 1.29, and 1.65)
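The mean of a lognormal distribution is exp(µ + σ²/2), so increasing the variance shifts the mean upward even with µ held at 0. A quick numerical check in Python (illustrative only; the deck's language is Rev):

```python
import math
import random
import statistics

random.seed(42)

def lognormal_mean(mu, sigma_sq):
    # Analytic mean of a lognormal distribution with log-scale variance sigma_sq.
    return math.exp(mu + sigma_sq / 2.0)

# Monte Carlo estimate vs. the analytic value, for mu = 0 and several variances.
for sigma_sq in (0.2, 0.5, 1.0):
    sd = math.sqrt(sigma_sq)
    draws = [math.exp(random.gauss(0.0, sd)) for _ in range(200_000)]
    assert abs(statistics.mean(draws) - lognormal_mean(0.0, sigma_sq)) < 0.05
```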
21. Graphical Models in RevBayes
Defining the model in the Rev language
(graphical model nodes: x_i, µ)
observations = [<your data go here>]

22. Graphical Models in RevBayes
Defining the model in the Rev language
(graphical model nodes: x_i, µ, M, α, β)
observations = [<your data go here>]
alpha <- 3.0
beta <- 1.0

23. Graphical Models in RevBayes
Defining the model in the Rev language
(graphical model nodes: x_i, µ, M, α, β)
observations = [<your data go here>]
alpha <- 3.0
beta <- 1.0
M ~ dnGamma(alpha, beta)

24. Graphical Models in RevBayes
Defining the model in the Rev language
(graphical model nodes: x_i, µ, σ, M, λ, α, β)
observations = [<your data go here>]
alpha <- 3.0
beta <- 1.0
M ~ dnGamma(alpha, beta)
lambda <- 1.0

25. Graphical Models in RevBayes
Defining the model in the Rev language
(graphical model nodes: x_i, µ, σ, M, λ, α, β)
observations = [<your data go here>]
alpha <- 3.0
beta <- 1.0
M ~ dnGamma(alpha, beta)
lambda <- 1.0
sigma ~ dnExponential(lambda)

26. Graphical Models in RevBayes
Defining the model in the Rev language
(graphical model nodes: x_i, µ, σ, M, λ, α, β)
observations = [<your data go here>]
alpha <- 3.0
beta <- 1.0
M ~ dnGamma(alpha, beta)
lambda <- 1.0
sigma ~ dnExponential(lambda)
mu := ln(M) - (power(sigma, 2.0) / 2.0)

27. Graphical Models in RevBayes
Defining the model in the Rev language
(graphical model nodes: x_i, µ, σ, M, λ, α, β; plate over i ∈ N)
observations = [<your data go here>]
alpha <- 3.0
beta <- 1.0
M ~ dnGamma(alpha, beta)
lambda <- 1.0
sigma ~ dnExponential(lambda)
mu := ln(M) - (power(sigma, 2.0) / 2.0)
N <- observations.size()
for( i in 1:N ){
    x[i] ~ dnLnorm(mu, sigma)
}

28. Graphical Models in RevBayes
Defining the model in the Rev language
(graphical model nodes: x_i, µ, σ, M, λ, α, β; plate over i ∈ N)
observations = [<your data go here>]
alpha <- 3.0
beta <- 1.0
M ~ dnGamma(alpha, beta)
lambda <- 1.0
sigma ~ dnExponential(lambda)
mu := ln(M) - (power(sigma, 2.0) / 2.0)
N <- observations.size()
for( i in 1:N ){
    x[i] ~ dnLnorm(mu, sigma)
    x[i].clamp(observations[i])
}
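The point of the deterministic node mu := ln(M) - (power(sigma, 2.0) / 2.0) is that it makes M the expected value of each lognormal observation: E[x] = exp(µ + σ²/2) = exp(ln M - σ²/2 + σ²/2) = M. A Python sketch of the same logic, using fixed values standing in for draws of M and σ (illustrative; the model itself is specified in Rev):

```python
import math
import random
import statistics

random.seed(7)

# Fixed values standing in for draws from dnGamma(3, 1) and dnExponential(1).
M = 2.5
sigma = 0.8

# The deterministic node: mu := ln(M) - sigma^2 / 2.
mu = math.log(M) - sigma ** 2 / 2.0

# Simulate x[i] ~ dnLnorm(mu, sigma) and check that E[x] = M.
xs = [math.exp(random.gauss(mu, sigma)) for _ in range(200_000)]
assert abs(statistics.mean(xs) - M) < 0.05
```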
29. RevBayes Demo: A Simple Model
Use MCMC to approximate the posterior distributions of stochastic and deterministic variables.
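As a bare-bones picture of what RevBayes automates here, the following Python sketch runs a random-walk Metropolis sampler for µ, conditioning on a few of the height values from the earlier slides. The fixed log-scale σ = 0.3, the vague Normal(0, 10) prior, and the proposal width are assumptions of this sketch, not part of the deck; the symmetric proposal plays the role of a Rev move such as mvSlide:

```python
import math
import random

random.seed(3)

# A few observed heights from the slides; we model log(height) ~ Normal(mu, sigma).
observations = [3.756, 2.867, 3.251, 2.744]
sigma = 0.3                      # assumed fixed, for illustration
y = [math.log(v) for v in observations]

def log_posterior(mu):
    # Normal log-likelihood (up to a constant) plus a vague Normal(0, 10) prior on mu.
    loglike = sum(-0.5 * ((yi - mu) / sigma) ** 2 for yi in y)
    logprior = -0.5 * (mu / 10.0) ** 2
    return loglike + logprior

mu_cur = 1.0
samples = []
for _ in range(20_000):
    mu_prop = mu_cur + random.gauss(0.0, 0.2)   # symmetric random-walk proposal
    if math.log(random.random()) < log_posterior(mu_prop) - log_posterior(mu_cur):
        mu_cur = mu_prop                         # Metropolis accept
    samples.append(mu_cur)

burnin = 5_000
post_mean = sum(samples[burnin:]) / len(samples[burnin:])
# With a vague prior, the posterior mean of mu is close to the mean of the log-heights.
assert abs(post_mean - sum(y) / len(y)) < 0.05
```

RevBayes wraps exactly this accept/reject loop behind its mcmc machinery, with many proposal types and joint updates over all model nodes.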