This document provides an overview of ABC methodology and applications. It begins with examples from population genetics and econometrics that are well-suited for ABC. It then describes the basic ABC algorithm for Bayesian inference using simulation: specifying prior distributions, simulating data under different parameter values, and accepting simulations that best match the observed data. Indirect inference is also discussed as a method for choosing informative summary statistics for ABC. The document traces the origins of ABC to population genetics models from the late 1990s and highlights ongoing contributions from that field to ABC methodology.
ABC short course: introduction chapters
1. ABC methodology and applications
Christian P. Robert
Université Paris-Dauphine, University of Warwick, & IUF
École d'Hiver, Les Diablerets, CH, Feb. 4-8 2016
2. Outline
1 simulation-based methods in Econometrics
2 Genetics of ABC
3 Approximate Bayesian computation
4 ABC for model choice
5 ABC model choice via random forests
6 ABC estimation via random forests
7 [some] asymptotics of ABC
3. A motivating if pedestrian example
paired and orphan socks
A drawer contains an unknown number of socks, some of which
can be paired and some of which are orphans (single). One takes
at random 11 socks without replacement from this drawer: no pair
can be found among those. What can we infer about the total
number of socks in the drawer?
• sounds like an impossible task
• one observation x = 11 and two unknowns, nsocks and npairs
• writing the likelihood is a challenge [exercise]
5. Feller’s shoes
A closet contains n pairs of shoes. If 2r shoes are chosen
at random (with 2r < n), what is the probability that
there will be (a) no complete pair, (b) exactly one
complete pair, (c) exactly two complete pairs among
them?
[Feller, 1970, Chapter II, Exercise 26]
Resolution as
$$p_j = \binom{n}{j}\, 2^{2r-2j} \binom{n-j}{2r-2j} \Big/ \binom{2n}{2r}$$
being the probability of obtaining j pairs among those 2r shoes, or, for an odd number t of shoes,
$$p_j = 2^{t-2j} \binom{n}{j} \binom{n-j}{t-2j} \Big/ \binom{2n}{t}$$
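The resolution above can be checked numerically. The following R sketch evaluates p_j for a given number t of drawn shoes and verifies that the probabilities sum to one; the function name `feller_pairs` and the example values (n = 10, t = 4) are illustrative and not part of the original slides.

```r
# Probability of exactly j complete pairs among t shoes drawn
# without replacement from a closet of n pairs (2n shoes).
feller_pairs <- function(j, n, t) {
  2^(t - 2 * j) * choose(n, j) * choose(n - j, t - 2 * j) / choose(2 * n, t)
}

# Example: n = 10 pairs, t = 4 shoes drawn, so j can be 0, 1 or 2
probs <- feller_pairs(0:2, n = 10, t = 4)
print(probs)
print(sum(probs))  # the three probabilities should add up to one
```

Relying on `choose()` returning 0 when the lower index is out of range keeps the function valid for values of j that are impossible for the given t.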
If one draws 11 socks out of m socks made of f orphans and g pairs, with f + 2g = m, the number k of socks from the orphan group is hypergeometric H(11, m, f) and the probability of observing 11 orphan (unpaired) socks in total is
$$\sum_{k=0}^{11} \frac{\binom{f}{k}\binom{2g}{11-k}}{\binom{m}{11}} \times \frac{2^{11-k}\binom{g}{11-k}}{\binom{2g}{11-k}}$$
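As a sanity check on this likelihood, here is a minimal R sketch that evaluates it for a drawer with f orphans and g pairs. The function name `prob_all_distinct` and the example values are assumptions made for illustration, and the code assumes the drawer holds at least as many socks as are drawn (f + 2g >= n_draw).

```r
# P(no pair among n_draw socks) for a drawer with f orphans and g pairs.
# The factor choose(2g, n_draw - k) appears in the numerator of the first
# fraction and the denominator of the second, so it is cancelled here;
# this avoids 0/0 when too few paired socks are available.
prob_all_distinct <- function(f, g, n_draw = 11) {
  m <- f + 2 * g
  k <- 0:n_draw
  sum(choose(f, k) * 2^(n_draw - k) * choose(g, n_draw - k)) / choose(m, n_draw)
}

# Example: 3 orphans and 21 pairs, i.e. 45 socks in total
print(prob_all_distinct(f = 3, g = 21))
```

With only orphans in the drawer the function returns 1, and with two pairs and two socks drawn it returns 2/3, which matches a direct count.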
8. A prioris on socks
Given parameters nsocks and npairs, the set of socks
$$S = \{s_1, s_1, \ldots, s_{n_{\text{pairs}}}, s_{n_{\text{pairs}}}, s_{n_{\text{pairs}}+1}, \ldots, s_{n_{\text{socks}}}\}$$
and 11 socks picked at random from S give X unique socks.
Rasmus' reasoning
If you are a family of 3-4 persons then a guesstimate would be that you have something like 15 pairs of socks in store. It is also possible that you have much more than 30 socks. So as a prior for nsocks I'm going to use a negative binomial with mean 30 and standard deviation 15.
On 2npairs/nsocks, the proportion of socks that are paired, I'm going to put a Beta prior distribution that puts most of the probability over the range 0.75 to 1.0.
[Rasmus Bååth's Research Blog, Oct 20th, 2014]
10. Simulating the experiment
Given a prior distribution on nsocks and npairs,
$$n_{\text{socks}} \sim \mathrm{Neg}(30, 15), \qquad n_{\text{pairs}} \mid n_{\text{socks}} \sim \frac{n_{\text{socks}}}{2}\,\mathrm{Be}(15, 2)$$
it is possible to
1 generate new values of nsocks and npairs,
2 generate a new observation of X, the number of unique socks out of 11,
3 accept the pair (nsocks, npairs) if the realisation of X is equal to 11
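The three steps above can be sketched in R as a plain rejection sampler. This is an illustrative reconstruction under stated assumptions, not the original author's code: the rounding rule for npairs, the helper name `simulate_once`, the seed and the number of simulations are all choices made here.

```r
set.seed(42)
n_picked <- 11

# Negative binomial prior with mean 30 and standard deviation 15, written in
# R's (mu, size) parameterisation, for which variance = mu + mu^2 / size
prior_mu <- 30
prior_sd <- 15
prior_size <- prior_mu^2 / (prior_sd^2 - prior_mu)

simulate_once <- function() {
  # Step 1: draw the parameters from the prior
  n_socks <- rnbinom(1, mu = prior_mu, size = prior_size)
  prop_pairs <- rbeta(1, shape1 = 15, shape2 = 2)
  n_pairs <- round(floor(n_socks / 2) * prop_pairs)  # assumed rounding rule
  n_odd <- n_socks - 2 * n_pairs
  # Step 2: build the drawer (paired socks share a label) and pick socks
  socks <- c(rep(seq_len(n_pairs), each = 2), n_pairs + seq_len(n_odd))
  picked <- socks[sample.int(length(socks), min(n_picked, length(socks)))]
  # X = picked socks whose partner was not picked (each label occurs at most twice)
  x <- length(picked) - 2 * sum(duplicated(picked))
  c(n_socks = n_socks, n_pairs = n_pairs, x = x)
}

sims <- as.data.frame(t(replicate(10000, simulate_once())))

# Step 3: keep only the simulations that reproduce the observation X = 11
accepted <- sims[sims$x == n_picked, ]
print(nrow(accepted) / nrow(sims))   # acceptance rate
print(median(accepted$n_socks))      # a point summary of the accepted nsocks
```

Because the observation is discrete, acceptance uses exact equality, so the accepted draws need no tolerance or summary statistic.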
12. Meaning
[Figure: density plot; x-axis: ns (0 to 60), y-axis: Density]
The outcome of this simulation method is a distribution on the pair (nsocks, npairs) that is the conditional distribution of the pair given the observation X = 11.
Proof: Generations from π(nsocks, npairs) are accepted with probability
$$P\{X = 11 \mid (n_{\text{socks}}, n_{\text{pairs}})\}$$
Hence the accepted values are distributed from
$$\pi(n_{\text{socks}}, n_{\text{pairs}}) \times P\{X = 11 \mid (n_{\text{socks}}, n_{\text{pairs}})\} \propto \pi(n_{\text{socks}}, n_{\text{pairs}} \mid X = 11)$$
14. Econ’ections
15. Usages of simulation in Econometrics
Similar exploration of simulation-based techniques in Econometrics
• Simulated method of moments
• Method of simulated moments
• Simulated pseudo-maximum-likelihood
• Indirect inference
[Gouriéroux & Monfort, 1996]
16. Simulated method of moments
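Given observations $y^o_{1:n}$ from a model
$$y_t = r(y_{1:(t-1)}, \epsilon_t, \theta), \qquad \epsilon_t \sim g(\cdot)$$
simulate $\epsilon^\star_{1:n}$, derive
$$y^\star_t(\theta) = r(y_{1:(t-1)}, \epsilon^\star_t, \theta)$$
and estimate θ by
$$\arg\min_\theta \sum_{t=1}^n \left(y^o_t - y^\star_t(\theta)\right)^2$$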
17. Simulated method of moments
In the same setting, θ can instead be estimated by matching the sums of the observed and simulated series:
$$\arg\min_\theta \left\{\sum_{t=1}^n y^o_t - \sum_{t=1}^n y^\star_t(\theta)\right\}^2$$
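To make the two criteria concrete, here is a minimal R sketch on a toy model. The AR(1) recursion used for r, the value θ = 0.6, the sample size and the search interval are assumptions chosen purely for illustration; none of them come from the slides.

```r
set.seed(1)
n <- 200
theta_true <- 0.6

# Hypothetical model: r(y_{1:(t-1)}, eps_t, theta) = theta * y_{t-1} + eps_t
eps <- rnorm(n)
y_obs <- numeric(n)
y_obs[1] <- eps[1]
for (t in 2:n) y_obs[t] <- theta_true * y_obs[t - 1] + eps[t]

# Simulated innovations: drawn once, then reused for every value of theta
eps_star <- rnorm(n)

# Simulated values given the observed past, for t = 2, ..., n
y_star <- function(theta) theta * y_obs[1:(n - 1)] + eps_star[2:n]

# First criterion: sum of squared differences, term by term
crit_path <- function(theta) sum((y_obs[2:n] - y_star(theta))^2)
# Second criterion: squared difference between the two sums
crit_sum <- function(theta) (sum(y_obs[2:n]) - sum(y_star(theta)))^2

theta_path <- optimize(crit_path, interval = c(-0.99, 0.99))$minimum
theta_sum <- optimize(crit_sum, interval = c(-0.99, 0.99))$minimum
print(c(path = theta_path, sum = theta_sum))
```

Holding `eps_star` fixed across values of θ keeps each criterion a deterministic function of θ, so a standard optimiser can be applied to it.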
18. Method of simulated moments
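Given a statistic vector K(y) with
$$\mathbb{E}_\theta[K(Y_t) \mid y_{1:(t-1)}] = k(y_{1:(t-1)}; \theta)$$
find an unbiased estimator of $k(y_{1:(t-1)}; \theta)$,
$$\tilde k(\epsilon_t, y_{1:(t-1)}; \theta)$$
Estimate θ by
$$\arg\min_\theta \left\| \sum_{t=1}^n \left[ K(y_t) - \sum_{s=1}^S \tilde k(\epsilon^s_t, y_{1:(t-1)}; \theta)/S \right] \right\|$$
[Pakes & Pollard, 1989]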
19. Indirect inference
Minimise (in θ) the distance between estimators $\hat\beta$ based on pseudo-models for genuine observations and for observations simulated under the true model and the parameter θ.
[Gouriéroux, Monfort, & Renault, 1993; Smith, 1993; Gallant & Tauchen, 1996]
20. Indirect inference (PML vs. PSE)
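Example of the pseudo-maximum-likelihood (PML)
$$\hat\beta(y) = \arg\max_\beta \sum_t \log f(y_t \mid \beta, y_{1:(t-1)})$$
leading to
$$\arg\min_\theta \left\|\hat\beta(y^o) - \hat\beta(y_1(\theta), \ldots, y_S(\theta))\right\|^2$$
when
$$y_s(\theta) \sim f(y \mid \theta), \qquad s = 1, \ldots, S$$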
21. Indirect inference (PML vs. PSE)
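Example of the pseudo-score-estimator (PSE)
$$\hat\beta(y) = \arg\min_\beta \left\{\sum_t \frac{\partial \log f}{\partial \beta}(y_t \mid \beta, y_{1:(t-1)})\right\}^2$$
leading to
$$\arg\min_\theta \left\|\hat\beta(y^o) - \hat\beta(y_1(\theta), \ldots, y_S(\theta))\right\|^2$$
when
$$y_s(\theta) \sim f(y \mid \theta), \qquad s = 1, \ldots, S$$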
22. Consistent indirect inference
...in order to get a unique solution the dimension of
the auxiliary parameter β must be larger than or equal to
the dimension of the initial parameter θ. If the problem is
just identified the different methods become easier...
Consistency depends on the criterion and on the asymptotic identifiability of θ
[Gouriéroux & Monfort, 1996, p. 66]
24. AR(2) vs. MA(1) example
true (MA) model
$$y_t = \epsilon_t - \theta\,\epsilon_{t-1}$$
and [wrong!] auxiliary (AR) model
$$y_t = \beta_1 y_{t-1} + \beta_2 y_{t-2} + u_t$$
R code
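# Observed series: MA(1) with theta = 0.5
eps <- rnorm(250)
x <- c(eps[1], eps[2:250] - 0.5 * eps[1:249])
# Common noise, reused for every candidate value of theta
simeps <- rnorm(250)
propeta <- seq(-0.99, 0.99, length.out = 199)
dist <- rep(0, 199)
# Auxiliary AR(2) estimate on the observed series
bethat <- as.vector(arima(x, order = c(2, 0, 0), include.mean = FALSE)$coef)
for (t in 1:199) {
  # Simulate an MA(1) series under propeta[t] and refit the AR(2) model
  simx <- c(simeps[1], simeps[2:250] - propeta[t] * simeps[1:249])
  simbeta <- as.vector(arima(simx, order = c(2, 0, 0), include.mean = FALSE)$coef)
  dist[t] <- sum((simbeta - bethat)^2)
}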
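The code above computes the distance over a grid for one sample. A possible way to turn it into an estimator and repeat it over many samples is sketched below. To keep the sketch fast and free of fitting failures, the AR(2) coefficients are obtained by least squares rather than by `arima()`; this is a swapped-in simplification, and the helper names `fit_ar2`, `ma1` and `indirect_estimate` are hypothetical.

```r
# Auxiliary AR(2) estimate by least squares (stand-in for the arima() fit)
fit_ar2 <- function(y) {
  n <- length(y)
  X <- cbind(y[2:(n - 1)], y[1:(n - 2)])
  as.vector(qr.solve(X, y[3:n]))
}

# MA(1) series y_t = eps_t - theta * eps_{t-1} built from a given noise vector
ma1 <- function(eps, theta) {
  n <- length(eps)
  c(eps[1], eps[2:n] - theta * eps[1:(n - 1)])
}

# Indirect inference estimate of theta: grid value minimising the distance
# between auxiliary estimates on observed and simulated series
indirect_estimate <- function(x, simeps, grid) {
  bethat <- fit_ar2(x)
  dist <- sapply(grid, function(th) sum((fit_ar2(ma1(simeps, th)) - bethat)^2))
  grid[which.min(dist)]
}

set.seed(123)
grid <- seq(-0.99, 0.99, length.out = 199)
# 50 independent samples of size 250, each generated with theta = 0.5
estimates <- replicate(50, indirect_estimate(ma1(rnorm(250), 0.5), rnorm(250), grid))
print(summary(estimates))
```

The spread of `estimates` across replications gives a rough picture of the sampling variability of the indirect inference estimate under this wrong auxiliary model.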
25. AR(2) vs. MA(1) example
One sample:
[Figure: distance as a function of θ, for θ between −1 and 1]
26. AR(2) vs. MA(1) example
Many samples:
[Figure: results over many samples; horizontal axis from 0.2 to 1.0]
27. Choice of pseudo-model
Pick model such that
1 $\hat\beta(\theta)$ not flat (i.e. sensitive to changes in θ)
2 $\hat\beta(\theta)$ not dispersed (i.e. robust against changes in $y_s(\theta)$)
[Frigessi & Heggland, 2004]
28. ABC using indirect inference (1)
We present a novel approach for developing summary statistics
for use in approximate Bayesian computation (ABC) algorithms by
using indirect inference(...) In the indirect inference approach to
ABC the parameters of an auxiliary model fitted to the data become
the summary statistics. Although applicable to any ABC technique,
we embed this approach within a sequential Monte Carlo algorithm
that is completely adaptive and requires very little tuning(...)
[Drovandi, Pettitt & Faddy, 2011]
© Indirect inference provides summary statistics for ABC...
29. ABC using indirect inference (2)
...the above result shows that, in the limit as h → 0, ABC will
be more accurate than an indirect inference method whose auxiliary
statistics are the same as the summary statistic that is used for
ABC(...) Initial analysis showed that which method is more
accurate depends on the true value of θ.
[Fearnhead and Prangle, 2012]
© Indirect inference provides estimates rather than global inference...
30. Genetics of ABC
31. Genetic background of ABC
ABC is a recent computational technique that only requires a
generative model, i.e., being able to sample from the density f (·|θ)
This technique stemmed from population genetics models, about
15 years ago, and population geneticists still contribute
significantly to methodological developments of ABC.
[Griffith & al., 1997; Tavaré & al., 1999]
32. Population genetics
[Part derived from the teaching material of Raphael Leblois, ENS Lyon, November 2010]
• Describe the genotypes, estimate the allele frequencies, determine their distribution among individuals, populations and between populations;
• Predict and understand the evolution of gene frequencies in populations as a result of various factors.
© Analyses the effect of various evolutionary forces (mutation, drift, migration, selection) on the evolution of gene frequencies in time and space.
33. Wright-Fisher model
• A population of constant size, in which individuals reproduce at the same time.
• Each gene in a generation is a copy of a gene of the previous generation.
• In the absence of mutation and selection, allele frequencies drift (increase and decrease) inevitably until the fixation of an allele.
• Drift therefore leads to the loss of genetic variation within populations.
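The drift towards fixation described above can be illustrated with a short R simulation of a two-allele Wright-Fisher population. This is a minimal sketch: the function name `wright_fisher`, the population size of 100 genes, the starting frequency and the generation cap are illustrative assumptions.

```r
set.seed(2016)

# Track the frequency of one allele in a population of n_genes gene copies,
# with no mutation and no selection, until it is lost or fixed.
wright_fisher <- function(n_genes, p0, max_gen = 100000) {
  count <- round(p0 * n_genes)
  freq <- count / n_genes
  gen <- 0
  while (count > 0 && count < n_genes && gen < max_gen) {
    # Each gene copies a gene drawn at random from the previous generation
    count <- rbinom(1, size = n_genes, prob = count / n_genes)
    freq <- c(freq, count / n_genes)
    gen <- gen + 1
  }
  freq
}

path <- wright_fisher(n_genes = 100, p0 = 0.5)
print(length(path) - 1)  # number of generations simulated
print(tail(path, 1))     # final frequency: 0 (allele lost) or 1 (allele fixed)
```

Plotting `path` against the generation index shows the frequency wandering up and down before being absorbed at 0 or 1.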
34. Coalescent theory
[Kingman, 1982; Tajima, Tavaré, etc.]
New approach to population genetics:
• Classical approach: focus on the POPULATION
• "Coalescent" approach: focus on the SAMPLE
Coalescent theory is interested in the genealogy of a sample of genes back in time to the common ancestor of the sample.
35. Common ancestor
[Figure: genealogy of a sample of genes; vertical axis: time of coalescence (T)]
Modelling of the genetic drift process by "going back in time" to the common ancestor of a sample of genes.
The different lineages merge (coalesce) as we go back into the past.
36. Neutral mutations
• Under the assumption of neutrality of the genetic markers studied, the mutations are independent of the genealogy, i.e. the genealogy depends only on the demographic processes.
• We construct the genealogy according to the demographic parameters (e.g. N), then we add a posteriori the mutations on the different branches, from the MRCA to the leaves of the tree.
• This yields polymorphism data under the demographic and mutational models considered.
37. Neutral model at a given microsatellite locus, in a closed panmictic population at equilibrium
Kingman's genealogy
When the time axis is normalized,
$$T(k) \sim \mathrm{Exp}(k(k-1)/2)$$
Mutations according to the Simple stepwise Mutation Model (SMM)
• dates of the mutations ∼ Poisson process with intensity θ/2 over the branches
• MRCA = 100
• independent mutations: ±1 with probability 1/2
Observations: leaves of the tree
$$\hat\theta = ?$$
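The generative model on this slide can be simulated directly, which is all that ABC requires. The R sketch below draws the inter-coalescence times, drops mutations on each branch and returns the allele sizes at the leaves. The function name `simulate_smm_sample` and the bookkeeping of lineages as sets of leaves are implementation choices made here, not taken from the slides.

```r
# Simulate allele sizes for n genes at one microsatellite locus under
# Kingman's coalescent with stepwise mutations (requires n >= 2).
simulate_smm_sample <- function(n, theta, mrca = 100) {
  stopifnot(n >= 2)
  alleles <- rep(mrca, n)
  # Each current lineage is stored as the set of leaves descending from it
  lineages <- as.list(seq_len(n))
  for (k in n:2) {
    # Time during which exactly k lineages coexist
    t_k <- rexp(1, rate = k * (k - 1) / 2)
    for (i in seq_len(k)) {
      # Mutations on this branch segment: Poisson count, each step +1 or -1
      n_mut <- rpois(1, theta / 2 * t_k)
      shift <- sum(sample(c(-1, 1), n_mut, replace = TRUE))
      # Steps are additive, so they are passed on to all descendant leaves
      alleles[lineages[[i]]] <- alleles[lineages[[i]]] + shift
    }
    # Two lineages chosen at random coalesce
    pair <- sample(k, 2)
    lineages[[pair[1]]] <- c(lineages[[pair[1]]], lineages[[pair[2]]])
    lineages[[pair[2]]] <- NULL
  }
  alleles
}

set.seed(10)
print(table(simulate_smm_sample(n = 50, theta = 5)))
```

Since the stepwise mutations simply add up along a path from the root, the order in which branch segments are processed does not affect the distribution of the leaf values.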
40. Much more interesting models. . .
• several independent loci: independent gene genealogies and mutations
• different populations: linked by an evolutionary scenario made of divergences, admixtures, migrations between populations, selection pressure, etc.
• larger sample size: usually between 50 and 100 genes
41. Available population scenarios
Between populations: three types of events, backward in time
• the divergence is the fusion between two populations,
• the admixture is the split of a population into two parts,
• the migration allows the move of some lineages of a
population to another.
[Figure 2.2: Example of the genealogy of five individuals from a single closed population at equilibrium. The sampled individuals are the leaves of the dendrogram; the inter-coalescence times T2, ..., T5 are independent, and Tk is exponentially distributed with parameter k(k − 1)/2.]
[Figure 2.3: Graphical representations of the three types of inter-population events in a demographic scenario: (a) divergence, (b) admixture, (c) migration. There are two families of inter-population events. The first, simpler family corresponds to instantaneous events, namely divergence and admixture. (a) Two populations that evolve and then merge, in the case of a divergence. (b) Three populations that evolve in parallel, for an admixture.]
42. A complex scenario
The goal is to discriminate between different population scenarios
from a dataset of polymorphism (DNA sample) y observed at the
present time.
[Figure 2.1: Example of a complex evolutionary scenario composed of inter-population events (divergence, admixture, migration)]
43. Demo-genetic inference
Each model is characterized by a set of parameters θ that cover historical (divergence times, admixture times, ...), demographic (population sizes, admixture rates, migration rates, ...) and genetic (mutation rate, ...) factors.
The goal is to estimate these parameters from a dataset of polymorphism (DNA sample) y observed at the present time.
Problem: most of the time, we cannot calculate the likelihood of the polymorphism data f(y|θ).
44. Intractable likelihood
Missing (too missing!) data structure:
$$f(y \mid \theta) = \int_{\mathcal{G}} f(y \mid G, \theta)\, f(G \mid \theta)\, dG$$
The genealogies are considered as nuisance parameters.
This problem thus differs from the phylogenetic approach, where the tree is the parameter of interest.
45. A genuine example of application
Approximate Bayesian Computation (ABC): an example of application to the Pygmies
Pygmy populations: do they have a common origin? Are there many exchanges between pygmy and non-pygmy populations?
47. Simulation results
Several possible scenarios; choice of scenario by ABC
Scenario 1a is strongly supported relative to the others, which argues for a common origin of the pygmy populations of West Africa
[Verdu e...]
© Scenario 1A is chosen.
49. Instance of ecological questions [message in a beetle]
• How did the Asian ladybird beetle arrive in Europe?
• Why do they swarm right now?
• What are the routes of invasion?
• How to get rid of them?
• Why did the chicken cross the road?
[Lombaert & al., 2010, PLoS ONE]
beetles in forests
50. Worldwide invasion routes of Harmonia axyridis
For each outbreak, the arrow indicates the most likely invasion
pathway and the associated posterior probability, with 95% credible
intervals in brackets
[Estoup et al., 2012, Molecular Ecology Res.]
52. A population genetic illustration of ABC model choice
Two populations (1 and 2) having diverged at a fixed known time in the past and a third population (3) which diverged from one of those two populations (models 1 and 2, respectively).
Observation of 50 diploid individuals/population genotyped at 5, 50 or 100 independent microsatellite loci.
Stepwise mutation model: the number of repeats of the mutated
gene increases or decreases by one. Mutation rate µ common to all
loci set to 0.005 (single parameter) with uniform prior distribution
µ ∼ U[0.0001, 0.01]
54. A population genetic illustration of ABC model choice
Summary statistics associated with the $(\delta\mu)^2$ distance
$x_{l,i,j}$: number of repeats of the allele at locus l = 1, ..., L for individual i = 1, ..., 100 within population j = 1, 2, 3. Then
$$(\delta\mu)^2_{j_1,j_2} = \frac{1}{L} \sum_{l=1}^{L} \left( \frac{1}{100} \sum_{i_1=1}^{100} x_{l,i_1,j_1} - \frac{1}{100} \sum_{i_2=1}^{100} x_{l,i_2,j_2} \right)^2 .$$
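The distance above is straightforward to compute from simulated or observed allele sizes. The following R sketch assumes the data for each population are stored as a matrix with one row per locus and one column per gene copy; this layout, the function name `delta_mu2` and the toy data are illustrative assumptions.

```r
# (delta mu)^2 distance between two populations.
# x1, x2: matrices of allele sizes (numbers of repeats), one row per locus
# and one column per gene copy (100 columns for 50 diploid individuals).
delta_mu2 <- function(x1, x2) {
  stopifnot(nrow(x1) == nrow(x2))
  # Squared difference of the per-locus mean allele sizes, averaged over loci
  mean((rowMeans(x1) - rowMeans(x2))^2)
}

# Toy data: 5 loci, 100 gene copies per population
set.seed(7)
n_loci <- 5
pop1 <- matrix(100 + sample(-2:2, n_loci * 100, replace = TRUE), nrow = n_loci)
pop2 <- matrix(103 + sample(-2:2, n_loci * 100, replace = TRUE), nrow = n_loci)
print(delta_mu2(pop1, pop2))
```

In an ABC run this function would be applied to each pair of populations, for the observed data and for every simulated data set.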
55. A population genetic illustration of ABC model choice
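For two copies of locus l with allele sizes $x_{l,i,j_1}$ and $x_{l,i',j_2}$, most recent common ancestor at coalescence time $\tau_{j_1,j_2}$, gene genealogy distance of $2\tau_{j_1,j_2}$, hence number of mutations Poisson with parameter $2\mu\tau_{j_1,j_2}$. Therefore,
$$\mathbb{E}\left\{ \left(x_{l,i,j_1} - x_{l,i',j_2}\right)^2 \mid \tau_{j_1,j_2} \right\} = 2\mu\tau_{j_1,j_2}$$
and, with t the divergence time of population 3 and t′ the divergence time of populations 1 and 2,
$$\begin{array}{lcc}
 & \text{Model 1} & \text{Model 2} \\
\mathbb{E}\{(\delta\mu)^2_{1,2}\} & 2\mu_1 t' & 2\mu_2 t' \\
\mathbb{E}\{(\delta\mu)^2_{1,3}\} & 2\mu_1 t & 2\mu_2 t' \\
\mathbb{E}\{(\delta\mu)^2_{2,3}\} & 2\mu_1 t' & 2\mu_2 t
\end{array}$$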
56. A population genetic illustration of ABC model choice
Thus,
• Bayes factor based only on distance $(\delta\mu)^2_{1,2}$ not convergent: if $\mu_1 = \mu_2$, same expectation
• Bayes factor based only on distance $(\delta\mu)^2_{1,3}$ or $(\delta\mu)^2_{2,3}$ not convergent: if $\mu_1 = 2\mu_2$ or $2\mu_1 = \mu_2$, same expectation
• if two of the three distances are used, Bayes factor converges: there is no $(\mu_1, \mu_2)$ for which all expectations are equal
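The argument above can be checked with a few lines of R. The sketch assumes that population 3 diverged at time t and populations 1 and 2 at an earlier time t' = 2t, which is what the conditions µ1 = 2µ2 and 2µ1 = µ2 suggest; the names `expected_dist`, `t_div` and `t_prime` and the numerical values are hypothetical.

```r
# Expected (delta mu)^2 distances under each model, assuming divergence
# times t_div (population 3) and t_prime (populations 1 and 2).
expected_dist <- function(mu, t_div, t_prime, model) {
  if (model == 1) {
    # Model 1: population 3 diverged from population 1
    c(d12 = 2 * mu * t_prime, d13 = 2 * mu * t_div, d23 = 2 * mu * t_prime)
  } else {
    # Model 2: population 3 diverged from population 2
    c(d12 = 2 * mu * t_prime, d13 = 2 * mu * t_prime, d23 = 2 * mu * t_div)
  }
}

t_div <- 1
t_prime <- 2
# Case mu1 = 2 * mu2: the two models agree on d13 but not on d12 or d23
m1 <- expected_dist(mu = 0.004, t_div, t_prime, model = 1)
m2 <- expected_dist(mu = 0.002, t_div, t_prime, model = 2)
print(rbind(model1 = m1, model2 = m2))
print(abs(m1 - m2) < 1e-12)
```

A single distance can therefore be matched by both models with suitable mutation rates, while any pair of distances separates them.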
57. A population genetic illustration of ABC model choice
[Figure: three panels, each showing results for 5, 50 and 100 loci on a vertical axis from 0 to 1; panel titles: DM2(12), DM2(13), DM2(13) & DM2(23)]
Posterior probabilities that the data is from model 1 for 5, 50 and 100 loci