This document provides a summary of Bayesian phylogenetic inference and Markov chain Monte Carlo (MCMC) methods. It begins with an introduction to probability distributions and stochastic processes relevant to phylogenetic modeling. It then discusses how Bayesian inference is applied to phylogenetics by combining prior distributions on tree topologies and other model parameters with the likelihood of the data to obtain posterior distributions. MCMC methods like the Metropolis-Hastings algorithm are introduced as a way to sample from these posterior distributions. Issues around convergence, mixing, and tuning MCMC proposals are also covered.
Ab Initio Protein Structure Prediction is a method to determine the tertiary structure of protein in the absence of experimentally solved structure of a similar/homologous protein. This method builds protein structure guided by energy function.
I had prepared this presentation for an internal project during my masters degree course.
Protein phosphorylation, a reversible process, is characterized by adding phosphate donated from ATP and removing phosphate from a phosphorylated protein substrate. For more information, please visit: https://www.creative-proteomics.com/services/phosphorylation.htm
Ab Initio Protein Structure Prediction is a method to determine the tertiary structure of protein in the absence of experimentally solved structure of a similar/homologous protein. This method builds protein structure guided by energy function.
I had prepared this presentation for an internal project during my masters degree course.
Protein phosphorylation, a reversible process, is characterized by adding phosphate donated from ATP and removing phosphate from a phosphorylated protein substrate. For more information, please visit: https://www.creative-proteomics.com/services/phosphorylation.htm
Genome-wide association study (GWAS) technology has been a primary method for identifying the genes responsible for diseases and other traits for the past ten years. GWAS continues to be highly relevant as a scientific method. Over 2,000 human GWAS reports now appear in scientific journals. Our free eBook aims to explain the basic steps and concepts to complete a GWAS experiment.
Genome-wide association study (GWAS) technology has been a primary method for identifying the genes responsible for diseases and other traits for the past ten years. GWAS continues to be highly relevant as a scientific method. Over 2,000 human GWAS reports now appear in scientific journals. Our free eBook aims to explain the basic steps and concepts to complete a GWAS experiment.
Multi Model Ensemble (MME) predictions are a popular ad-hoc technique for improving predictions of high-dimensional, multi-scale dynamical systems. The heuristic idea behind MME framework is simple: given a collection of models, one considers predictions obtained through the convex superposition of the individual probabilistic forecasts in the hope of mitigating model error. However, it is not obvious if this is a viable strategy and which models should be included in the MME forecast in order to achieve the best predictive performance. I will present an information-theoretic approach to this problem which allows for deriving a sufficient condition for improving dynamical predictions within the MME framework; moreover, this formulation gives rise to systematic and practical guidelines for optimising data assimilation techniques which are based on multi-model ensembles. Time permitting, the role and validity of “fluctuation-dissipation” arguments for improving imperfect predictions of externally perturbed non-autonomous systems - with possible applications to climate change considerations - will also be addressed.
Inria Tech Talk - La classification de données complexes avec MASSICCCStéphanie Roger
MASSICCC - Une plateforme SaaS pour le traitement de la classification de données complexes hétérogènes et incomplètes.
Dans ce Tech Talk venez découvrir, tester et apprendre à maîtriser MASSICCC (Massive clustering in cloud computing) une plateforme SaaS orientée utilisateurs, ainsi que ses trois familles d’algorithmes de #classification, fruits des dernières avancées des équipes de recherche Modal & Celeste de Inria, pour analyser et faire de l’apprentissage sur vos "Big Data" (ex : en immobilier, maintenance prédictive, santé, open data, etc. ).
MASSICCC c’est aussi :
- Un accès gratuit pour le test et la recherche sur https://massiccc.lille.inria.fr
- Un "one for all" de la classification
- Une forte interprétabilité des résultats (avec ses graphiques)
- Un mode SaaS qui vous permet un suivi des expériences (en cours ou terminées)
- Et des algorithmes open source qui sont réutilisables indépendamment.
Research internship on optimal stochastic theory with financial application u...Asma Ben Slimene
This is a presntation of my second year intership on optimal stochastic theory and how we can apply it on some financial application then how we can solve such problems using finite differences methods!
Enjoy it !
Presentation on stochastic control problem with financial applications (Merto...Asma Ben Slimene
This is an introductory to optimal stochastic control theory with two applications in finance: Merton portfolio problem and Investement/consumption problem with numerical results using finite differences approach
Typically quantifying uncertainty requires many evaluations of a computational model or simulator. If a simulator is computationally expensive and/or high-dimensional, working directly with a simulator often proves intractable. Surrogates of expensive simulators are popular and powerful tools for overcoming these challenges. I will give an overview of surrogate approaches from an applied math perspective and from a statistics perspective with the goal of setting the stage for the "other" community.
최근 이수가 되고 있는 Bayesian Deep Learning 관련 이론과 최근 어플리케이션들을 소개합니다. Bayesian Inference 의 이론에 관해서 간단히 설명하고 Yarin Gal 의 Monte Carlo Dropout 의 이론과 어플리케이션들을 소개합니다.
This pdf is about the Schizophrenia.
For more details visit on YouTube; @SELF-EXPLANATORY;
https://www.youtube.com/channel/UCAiarMZDNhe1A3Rnpr_WkzA/videos
Thanks...!
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making.
Monitor common gases, weather parameters, particulates.
Multi-source connectivity as the driver of solar wind variability in the heli...Sérgio Sacani
The ambient solar wind that flls the heliosphere originates from multiple
sources in the solar corona and is highly structured. It is often described
as high-speed, relatively homogeneous, plasma streams from coronal
holes and slow-speed, highly variable, streams whose source regions are
under debate. A key goal of ESA/NASA’s Solar Orbiter mission is to identify
solar wind sources and understand what drives the complexity seen in the
heliosphere. By combining magnetic feld modelling and spectroscopic
techniques with high-resolution observations and measurements, we show
that the solar wind variability detected in situ by Solar Orbiter in March
2022 is driven by spatio-temporal changes in the magnetic connectivity to
multiple sources in the solar atmosphere. The magnetic feld footpoints
connected to the spacecraft moved from the boundaries of a coronal hole
to one active region (12961) and then across to another region (12957). This
is refected in the in situ measurements, which show the transition from fast
to highly Alfvénic then to slow solar wind that is disrupted by the arrival of
a coronal mass ejection. Our results describe solar wind variability at 0.5 au
but are applicable to near-Earth observatories.
What is greenhouse gasses and how many gasses are there to affect the Earth.moosaasad1975
What are greenhouse gasses how they affect the earth and its environment what is the future of the environment and earth how the weather and the climate effects.
Richard's entangled aventures in wonderlandRichard Gill
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Slide 1: Title Slide
Extrachromosomal Inheritance
Slide 2: Introduction to Extrachromosomal Inheritance
Definition: Extrachromosomal inheritance refers to the transmission of genetic material that is not found within the nucleus.
Key Components: Involves genes located in mitochondria, chloroplasts, and plasmids.
Slide 3: Mitochondrial Inheritance
Mitochondria: Organelles responsible for energy production.
Mitochondrial DNA (mtDNA): Circular DNA molecule found in mitochondria.
Inheritance Pattern: Maternally inherited, meaning it is passed from mothers to all their offspring.
Diseases: Examples include Leber’s hereditary optic neuropathy (LHON) and mitochondrial myopathy.
Slide 4: Chloroplast Inheritance
Chloroplasts: Organelles responsible for photosynthesis in plants.
Chloroplast DNA (cpDNA): Circular DNA molecule found in chloroplasts.
Inheritance Pattern: Often maternally inherited in most plants, but can vary in some species.
Examples: Variegation in plants, where leaf color patterns are determined by chloroplast DNA.
Slide 5: Plasmid Inheritance
Plasmids: Small, circular DNA molecules found in bacteria and some eukaryotes.
Features: Can carry antibiotic resistance genes and can be transferred between cells through processes like conjugation.
Significance: Important in biotechnology for gene cloning and genetic engineering.
Slide 6: Mechanisms of Extrachromosomal Inheritance
Non-Mendelian Patterns: Do not follow Mendel’s laws of inheritance.
Cytoplasmic Segregation: During cell division, organelles like mitochondria and chloroplasts are randomly distributed to daughter cells.
Heteroplasmy: Presence of more than one type of organellar genome within a cell, leading to variation in expression.
Slide 7: Examples of Extrachromosomal Inheritance
Four O’clock Plant (Mirabilis jalapa): Shows variegated leaves due to different cpDNA in leaf cells.
Petite Mutants in Yeast: Result from mutations in mitochondrial DNA affecting respiration.
Slide 8: Importance of Extrachromosomal Inheritance
Evolution: Provides insight into the evolution of eukaryotic cells.
Medicine: Understanding mitochondrial inheritance helps in diagnosing and treating mitochondrial diseases.
Agriculture: Chloroplast inheritance can be used in plant breeding and genetic modification.
Slide 9: Recent Research and Advances
Gene Editing: Techniques like CRISPR-Cas9 are being used to edit mitochondrial and chloroplast DNA.
Therapies: Development of mitochondrial replacement therapy (MRT) for preventing mitochondrial diseases.
Slide 10: Conclusion
Summary: Extrachromosomal inheritance involves the transmission of genetic material outside the nucleus and plays a crucial role in genetics, medicine, and biotechnology.
Future Directions: Continued research and technological advancements hold promise for new treatments and applications.
Slide 11: Questions and Discussion
Invite Audience: Open the floor for any questions or further discussion on the topic.
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...Scintica Instrumentation
Intravital microscopy (IVM) is a powerful tool utilized to study cellular behavior over time and space in vivo. Much of our understanding of cell biology has been accomplished using various in vitro and ex vivo methods; however, these studies do not necessarily reflect the natural dynamics of biological processes. Unlike traditional cell culture or fixed tissue imaging, IVM allows for the ultra-fast high-resolution imaging of cellular processes over time and space and were studied in its natural environment. Real-time visualization of biological processes in the context of an intact organism helps maintain physiological relevance and provide insights into the progression of disease, response to treatments or developmental processes.
In this webinar we give an overview of advanced applications of the IVM system in preclinical research. IVIM technology is a provider of all-in-one intravital microscopy systems and solutions optimized for in vivo imaging of live animal models at sub-micron resolution. The system’s unique features and user-friendly software enables researchers to probe fast dynamic biological processes such as immune cell tracking, cell-cell interaction as well as vascularization and tumor metastasis with exceptional detail. This webinar will also give an overview of IVM being utilized in drug development, offering a view into the intricate interaction between drugs/nanoparticles and tissues in vivo and allows for the evaluation of therapeutic intervention in a variety of tissues and organs. This interdisciplinary collaboration continues to drive the advancements of novel therapeutic strategies.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.Sérgio Sacani
The return of a sample of near-surface atmosphere from Mars would facilitate answers to several first-order science questions surrounding the formation and evolution of the planet. One of the important aspects of terrestrial planet formation in general is the role that primary atmospheres played in influencing the chemistry and structure of the planets and their antecedents. Studies of the martian atmosphere can be used to investigate the role of a primary atmosphere in its history. Atmosphere samples would also inform our understanding of the near-surface chemistry of the planet, and ultimately the prospects for life. High-precision isotopic analyses of constituent gases are needed to address these questions, requiring that the analyses are made on returned samples rather than in situ.
Richard's aventures in two entangled wonderlandsRichard Gill
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
4. Do not worry about your
difficulties in Mathematics. I
can assure you that mine
are still greater.
Albert Einstein
5.
6. α Uniform distribution
β
γ
δ
ε
Beta distribution
Gamma distribution
Dirichlet distribution
Exponential distribution
Continuous probability distributions
7. DiscreteProbability
Sample spaceω1 ω2 ω3 ω4 ω5 ω6
E = ω1,ω3,ω5{ } Event (subset of outcomes;
e.g., face with odd number)
Distribution function
(E.g., uniform)
m(ω)
Pr(E) = m(ω)
ω ∈Ε
∑ Probability
Random variable X
8. ContinuousProbability
Sample space
(an interval)
Disc with
circumference 1
f (x) Probability density
function (pdf)
(e.g. Uniform(0,1))
Pr(E) = f (x)
x∈E
∫ dx Probability
E = a,b[ ) Event (a subspace of the
sample space)
a b
Random variable X
9. ProbabilityDistributions
• Uniform distribution
• Beta distribution
• Gamma distribution
• Dirichlet distribution
• Exponential distribution
• Normal distribution
• Lognormal distribution
• Multivariate normal distribution
Continuous Distributions
13. ExponentialDistribution
f (x) = λe−λx
Mean: 1/λ
λ = rate (of decay)
Exp(λ)X ~
Parameters:
Probability density function:
Exponential distribution
14. GammaDistribution
f (x) µ xα−1
e−βx
Mean: α /β
α = shape
Gamma(α,β)X ~
Parameters:
Probability density function:
Gamma distribution
β = inverse scale
Scaled gamma: α = β
Scaled Gamma
15. BetaDistribution
f (x) µ xα1 −1
(1− x)α2 −1
Mode:
α1 −1
αi −1( )
i
∑
α1,α2 = shape parameters
Beta(α1,α2)X ~
Parameters:
Probability density function:
Beta distribution
Defined on two proportions of a whole
(a simplex)
16. DirichletDistribution
f (x) µ xi
αi −1
i
∏
α = vector of k shape parameters
Dir(α) :α = α1,α2,...,αk{ }X ~
Parameters:
Probability density function:
Dirichlet distribution
Defined on k proportions of a whole
Dir(1,1,1,1)
Dir(300,300,300,300)
20. MaximumLikelihood
Pr(D |θ)
Maximum Likelihood Inference
Data D; Model M with parameters θ
We can calculate or f (D |θ)
Maximum likelihood: find the value of θ that
maximizes L(θ)
Define the likelihood function L(θ)µ f (D |θ)
Confidence: asymptotic behavior, more
samples, bootstrapping
21. BayesianInference
Pr(D |θ)
Bayesian Inference
Data D; Model M with parameters θ
We can calculate or f (D |θ)
We are actually interested in Pr(θ | D) or f (θ | D)
Bayes’ rule:
f (θ | D) =
f (θ) f (D |θ)
f (D)
Posterior density
Prior density
”Likelihood”
Normalizing constant
Marginal likelihood of the data
Model likelihood
f (D) = f (θ)∫ f (D |θ) dθ
27. Learn more:
• Wikipedia (good texts on most statistical distributions,
sometimes a little difficult)
• Grinstead & Snell: Introduction to Probability.
American Mathematical Society. Free pdf available
from:
http://www.dartmouth.edu/~chance/teaching_aids/books_art
• Team up with a statistician or a computational /
theoretical evolutionary biologist!
31. A B C
Prior distribution
probability
1.0
Posterior distribution
probability
1.0
Data (observations)
Model
32. D The data
Taxon Characters
A ACG TTA TTA AAT TGT CCT CTT TTC AGA
B ACG TGT TTC GAT CGT CCT CTT TTC AGA
C ACG TGT TTA GAC CGA CCT CGG TTA AGG
D ACA GGA TTA GAT CGT CCG CTT TTC AGA
33. Model: topology AND branch lengths
θ Parameters
topology )(τ
branch lengths )( iv
A
B
3v
C
D
2v
1v
4v
5v
(expected amount of change)
),( vτθ =
35. Model: molecular evolution
Probabilities are calculated using the transition
probability matrix P
4 /3
4 /3
1 1
4 4( )
1 3
4 4
v
Qv
v
e
P v e
e
−
−
−
= =
+
(change)
(no change)
36. Priors on parameters
Topology
all unique topologies have equal
probability
Branch lengths
exponential prior (puts more weight on
small branch lengths)
37. 4 /3v
p k ke−
= −v
Exp(1)v :
Exp(1)v :
The effect on data likelihood is most important
Jeffrey’s uninformative priors formalize this
Branch length Prob. of substitution
Scale matters in priors
38. Bayes’ theorem
( ) ( | )
( | )
( ) ( | ) d
f f D
f D
f f D
θ θ
θ
θ θ θ
=
∫
Posterior
distribution
Prior distribution ”Likelihood”
Normalizing constant
D = Data
θ = Model parameters
39. tree 1 tree 2 tree 3
θ
)|( Xf θ
Posterior probability distribution
Parameter space
Posteriorprobability
40. tree 1 tree 2 tree 3
20% 48% 32%
We can focus on any parameter of interest
(there are no nuisance parameters) by
marginalizing the posterior over the other
parameters (integrating out the
uncertainty in the other parameters)
(Percentages denote marginal probability distribution on trees)
42. tree 1 tree 2 tree 3
always accept
accept sometimes
Start at an arbitrary point
Make a small random move
Calculate height ratio (r) of new state to old state:
r > 1 -> new state accepted
r < 1 -> new state accepted with probability r. If new
state not accepted, stay in the old state
Go to step 2
Markov chain Monte Carlo
The proportion of time the
MCMC procedure samples
from a particular parameter
region is an estimate of that
region’s posterior
probability density
1
2b
2a
20 % 48 % 32 %
43. Metropolis algorithm
Assume that the current state has
parameter values θ
Consider a move to a state with parameter
values θ *
The height ratio r is
(prior ratio x likelihood ratio)
r =
f (θ*
| D)
f (θ | D)
=
f (θ*
) f (D |θ*
)/ f (D)
f (θ) f (D |θ)/ f (D)
=
f (θ*
)
f (θ)
×
f (D |θ*
)
f (D |θ)
44. MCMC Sampling Strategies
Great freedom of strategies:
Typically one or a few related parameters
changed at a time
You can cycle through parameters
systematically or choose randomly
One ”generation” or ”iteration” or ”cycle” can
include a single randomly chosen proposal (or
move, operator, kernel), one proposal for each
parameter, a block of randomly chosen
proposals
50. Summarizing Trees
Maximum posterior probability tree (MAP tree)
can be difficult to estimate precisely
can have low probability
Majority rule consensus tree
easier to estimate clade probabilities exactly
branch length distributions can be summarized across all
trees with the branch
can hide complex topological dependence
branch length distributions can be multimodal
Credible sets of trees
Include trees in order of decreasing probability to
obtain, e.g., 95 % credible set
“Median” or “central” tree
51. Adding Model Complexity
A
B
topology General Time Reversible
substitution model
)(τ
÷
÷
÷
÷
÷
−
−
−
−
=
GTGCTCATA
GTTCGCAGA
CTTCGGACA
ATTAGGACC
rrr
rrr
rrr
rrr
Q
πππ
πππ
πππ
πππ
3v
C
D
2v
1v
4v
5v
branch lengths )( iv
55. Summarizing Variables
Mean, median, variance common
summaries
95 % credible interval: discard the
lowest 2.5 % and highest 2.5 % of
sampled values
95 % region of highest posterior
density (HPD): find smallest region
containing 95 % of probability
57. Other Sampling Methods
Gibbs sampling: sample from the conditional posterior (a variant of
the Metropolis algorithm)
Metropolized Gibbs sampling: more efficient variant of Gibbs
sampling of discrete characters
Slice sampling: less prone to get stuck in local optima than the
Metropolis algorithm
Hamiltonian sampling. A technique for decreasing the problem with
sampling correlated parameters.
Simulated annealing: increase ”greediness” during the burn-in
phase of MCMC sampling
Data augmentation techniques: add parameters to facilitate
probability calculations
Sequential Monte Carlo techniques: generate a sample of complete
state by building sets of particles from incomplete states
60. Convergence and Mixing
Convergence is the degree to which
the chain has converged onto the
target distribution
Mixing is the speed with which the
chain covers the region of interest in
the target distribution
61. Assessing Convergence
Plateau in the trace plot
Look at sampling behavior within the
run (autocorrelation time, effective
sample size)
Compare independent runs with
different, randomly chosen starting
points
62. Convergence within Run
t =1+ 2 ρk (θ)
k=1
∞
∑Autocorrelation time
where ρk(θ) is the autocorrelation in the MCMC
samples for a lag of k generations
Effective sample size (ESS) e =
n
t
where n is the total sample size (number of
generations)
Good mixing when t is small and e large
63. Convergence among Runs
Tree topology:
Compare clade probabilities (split frequencies)
Average standard deviation of split frequencies
above some cut-off (min. 10 % in at least one
run). Should go to 0 as runs converge.
Continuous variables
Potential scale reduction factor (PSRF).
Compares variance within and between runs.
Should approach 1 as runs converge.
Assumes overdispersed starting points
66. Sampledvalue Target distribution
Too modest proposals
Acceptance rate too high
Poor mixing
Too bold proposals
Acceptance rate too low
Poor mixing
Moderately bold proposals
Acceptance rate intermediate
Good mixing
67. Tuning Proposals
Manually by changing tuning parameters
Increase the boldness of a proposal if
acceptance rate is too high
Decrease the boldness of a proposal if
acceptance rate is too low
Auto-tuning
Tuning parameters are adjusted automatically
by the MCMC procedure to reach a target
proposal rate
83. Models, models, models
Alignment-free models
Heterogeneity in substitution rates and stationary
frequencies across sites and lineages
Relaxed clock models
Models for morphology and biogeography
Sampling across model space, e.g. GTR space and
partition space
Models of dependence across sites according to
3D structure of proteins
Positive selection models
Aminoacid models
Models for population genetics and phylogeography
84. Bayes’ Rule
f (θ | D) =
f (θ) f (D |θ)
f (θ) f (D |θ) dθ∫
=
f (θ) f (D |θ)
f (D)
Marginal likelihood (of the data)
f (θ | D,M) =
f (θ | M) f (D |θ,M)
f (D | M)
We have implicitly conditioned on a model:
85. Bayesian Model Choice
f M1( ) f D | M1( )
f M0( ) f D | M0( )
Posterior model odds:
B10 =
f D | M1( )
f D | M0( )
Bayes factor:
86. Bayesian Model Choice
The normalizing constant in Bayes’ theorem, the
marginal likelihood of the data, f(D) or f(D|M), can
be used for model choice
f(D|M) can be estimated by taking the harmonic
mean of the likelihood values from the MCMC run.
Thermodynamic integration and stepping-stone
sampling are computationally more complex but
more accurate methods
Any models can be compared: nested, non-nested,
data-derived; it is just a probability comparison
No correction for number of parameters
Can prefer a simpler model over a more complex
model
Critical values in Kass and Raftery (1997)
88. Bayes Factor Comparisons
Interpretation of the Bayes factor
2ln(B10) B10 Evidence against M0
0 to 2 1 to 3
Not worth more than a
bare mention
2 to 6 3 to 20 Positive
6 to 10 20 to 150 Strong
> 10 > 150 Very strong
90. Listening to lectures,
after a certain age,
diverts the mind too much
from its creative pursuits.
Any scientist who attends
too many lectures and
uses her own brain too
little falls into lazy habits
of thinking.
after Albert Einstein