Mr. Kum Cletus Kwa defended a Doctorate/PhD thesis in mathematics on 14 June 2016 at the University of Dschang. At the end of the discussion, the jury awarded him the distinction "très honorable" (very honourable).
Probability Models for Estimating Haplotype Frequencies and Bayesian Survival Analysis of Two Treatments
Cletus Kwa Kum
Mathematics/Computer Science Department
University of Dschang
Supervisors:
Professor Daniel Thorburn: Statistics Department–Stockholm University–Sweden
Professor Bitjong Ndombol: Maths/Computer Department – University of Dschang
June 14, 2016
Cletus Kwa Kum Estimating Haplotype Frequencies & Bayesian Survival Analysis1/77
Introduction
Malaria is an old disease with a history spanning well over 100 years. It is mostly found in tropical countries.
In 2008, the WHO declared that malaria was endemic in 109 countries.
There were about 1 million deaths out of 243 million cases.
No vaccine exists, and the parasite continuously develops resistance to many available drugs.
Mathematical modelling has a central role to play in the fight against the disease.
Overview of models and modelling in malaria
Malaria models
The application of mathematics to biology started in 1628, when Harvey calculated the amount of blood and proved its circular movement.
The first malaria model was proposed in 1910 by Sir Ronald Ross.
Since then, malaria has been studied from different perspectives, and an immense literature describes a large number of models and modelling approaches.
Ongoing clinical Trials
Drug resistance and efficacy
Drug resistance is a major setback in all efforts to eliminate malaria.
Clinical trials of antimalarial drug efficacy are becoming more common.
One such trial was conducted in Tanzania in 2004 to compare the efficacy of two malaria treatments.
The treatments compared were artesunate plus sulfadoxine-pyrimethamine (ASP) and sulfadoxine-pyrimethamine alone (SP).
The follow-up lasted 84 days.
Data from the trial
The data collected carried information on single nucleotide polymorphisms (SNPs) at three DNA positions.
These positions are believed to be related to drug resistance.
Mixed infections were present. In samples where alleles are mixed, they obscure the underlying frequencies of the alleles at each locus and the associations between loci.
Combinations of alleles at different markers along the same chromosome are known as haplotypes.
A major problem has been to determine accurately the number of infections in a multiple infection.
Motivation and Objectives of this thesis
Motivation
Malaria parasites have developed resistance to most single-drug treatments, e.g. chloroquine.
The WHO recommends a switch to artemisinin-based combination therapies (ACTs) to reduce drug resistance.
Knowledge gained through mathematical models will therefore contribute to the fight against drug resistance and malaria.
Main objective
To develop statistical methods that can be used for a better understanding of the epidemiological processes.
To apply these methods to real malaria data and thereby improve and add to scientific knowledge in this field.
Specific objectives
Estimate haplotype frequencies from multiple single nucleotide polymorphisms (SNPs).
Unveil the hidden combinations of haplotypes that gave rise to the mixed-phenotype infections observed in the genotyped data.
Study the effects of treatment on the proportion of resistant parasites.
Estimate the average number of different haplotypes a child carries, both at baseline and at the first recurrence of malaria.
Compare the efficacy of ASP and SP using Bayesian methods.
Estimate differences in malaria-free periods for ASP and SP.
Investigate the effect of covariates on the probability of first recurrence of malaria.
Outline of the thesis
Chapter 2 presents a background to some statistical aspects.
Chapter 3 describes the clinical study and data, which are the foundation of this research.
Chapter 4 focuses on the effects of treatment on parasite drug resistance, estimating haplotype frequencies and the expected number of times a child is infected with different gene types.
Chapter 5 compares the efficacy of the two treatments and estimates malaria-free times.
In Chapter 6, we study the effects of some covariates on the
probability of first recurrence of malaria.
Chapter 7 contains conclusions and future work.
Chapter 2
Key ideas of Chapter 2
Models and parameter estimation using MLE methods
Model selection and deviance
Differences between classical statistics and Bayesian statistics
Numerical methods: the Nelder–Mead simplex method, Markov chain Monte Carlo methods and Gibbs sampling
Binary models and related statistical measures: cure rates, odds ratio and logistic regression
Event history analysis: survival analysis, discrete-time hazard, restricted mean survival time and estimators of the survival function
Maximum likelihood estimator
The maximum likelihood estimator is defined as
$$\hat{\theta}_{\mathrm{mle}} \stackrel{\mathrm{def}}{=} \arg\max_{\theta \in \Theta} L(\theta). \quad (1)$$
In maximum likelihood estimation, the best parameter estimates are those that maximize the likelihood.
When the maximisation of L(θ) has no closed-form solution, we use computational methods to obtain the parameter estimates.
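The sketch below illustrates the computational route on a case where the answer is known. It maximises a binomial log-likelihood numerically with a golden-section search, a simple one-dimensional derivative-free method chosen only for illustration (it is not a method used in the thesis). The binomial model and the counts are illustrative assumptions. Because the closed-form MLE x/n is available here, it serves as a check.

```python
import math

def binomial_loglik(p, x, n):
    """Binomial log-likelihood in p, up to an additive constant."""
    return x * math.log(p) + (n - x) * math.log(1.0 - p)

def golden_section_max(f, lo, hi, tol=1e-10):
    """Maximise a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            # The maximum lies in [a, d].
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:
            # The maximum lies in [c, b].
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2.0

x, n = 63, 94  # illustrative counts: 63 successes in 94 trials
p_hat = golden_section_max(lambda p: binomial_loglik(p, x, n), 1e-9, 1.0 - 1e-9)
print(f"numerical MLE = {p_hat:.6f}, closed form x/n = {x / n:.6f}")
```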
Model selection
Model selection means choosing a statistical model from a set of competing models for a given data set.
Two criteria used to make the choice are parsimony and goodness of fit.
The deviance assesses the goodness of fit of a model by comparing the log-likelihoods of the saturated model and the reduced model: $D = 2(\ell_{\mathrm{sat}} - \ell_{\mathrm{red}})$.
Classical statistics
The classical (frequentist) approach is the most widely used statistical paradigm for inference.
It is founded on the repeatability of events: e.g. the probability of an event is the proportion of times it occurs in a large number of independent repeated experiments.
Bayesian Statistics
The Bayesian paradigm is based on specifying a probability model for the observed data D, given a vector of unknown parameters θ, leading to the likelihood function L(θ|D).
We assume that θ is random and has a prior distribution π(θ).
Inference concerning θ is then based on the posterior distribution
$$\pi(\theta \mid D) = \frac{L(\theta \mid D)\,\pi(\theta)}{\int_{\Theta} L(\theta \mid D)\,\pi(\theta)\,d\theta}, \quad (2)$$
where Θ denotes the parameter space of θ and θ has an absolutely continuous distribution.
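Equation (2) can be checked numerically. The sketch below assumes, purely for illustration, a binomial likelihood and a uniform prior. It normalises L(θ|D)π(θ) on a fine grid, approximating the integral over Θ by a midpoint sum. It then compares the posterior mean with the exact conjugate answer, which is the mean of Beta(x + 1, n − x + 1).

```python
import math

def posterior_on_grid(loglik, prior, grid, h):
    """Normalise L(theta | D) * prior(theta) on an evenly spaced grid with spacing h.
    The denominator of equation (2) is approximated by a midpoint Riemann sum."""
    unnorm = [math.exp(loglik(t)) * prior(t) for t in grid]
    z = sum(unnorm) * h
    return [u / z for u in unnorm]

x, n = 7, 20  # illustrative data: 7 successes in 20 Bernoulli trials

def loglik(t):
    return x * math.log(t) + (n - x) * math.log(1.0 - t)

def uniform_prior(t):
    return 1.0

m = 20000
h = 1.0 / m
grid = [(k + 0.5) * h for k in range(m)]  # midpoints, so log(0) is never evaluated
density = posterior_on_grid(loglik, uniform_prior, grid, h)
post_mean = sum(t * d for t, d in zip(grid, density)) * h
exact_mean = (x + 1) / (n + 2)  # mean of the conjugate posterior Beta(x + 1, n - x + 1)
print(f"grid posterior mean = {post_mean:.6f}, exact = {exact_mean:.6f}")
```

With realistic multi-parameter models this integral is rarely tractable on a grid. That is why the sampling methods described next are used instead.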
Simple example in medicine
Before making a diagnosis, a physician wants to know the patient's symptoms and family history of cancer. This initial information forms the prior, which is updated to a posterior (the diagnosis) when test results arrive.
If uncertainty remains, further tests can be made to update the old posterior.
The father of Bayesian statistics is the Reverend Thomas Bayes (1702–1761).
Bayesian methods are popular today because fast computational methods (e.g. MCMC) are available.
Numerical methods
Optimization method used in the thesis
In this thesis we have formulated a likelihood function for the estimation of haplotype frequencies, and this function is non-smooth.
We therefore rely on a direct search method for the estimation, specifically the Nelder–Mead simplex method (1965).
The method is popular: as of May 2016, about 19,721 published papers and books refer to the Nelder–Mead method.
The Nelder–Mead simplex algorithm is available in R through optim(par, fn, method = "Nelder-Mead"), where it is the default method. Since it uses only function values, it works reasonably well for non-differentiable functions.
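The following is a minimal, standard-library-only sketch of the Nelder–Mead simplex method. It uses the usual reflection, expansion, contraction and shrink coefficients (1, 2, 0.5 and 0.5). It illustrates the derivative-free steps of the algorithm and is not the R optim implementation used in the thesis. The quadratic test function is an arbitrary assumption.

```python
def nelder_mead(f, x0, step=0.1, tol=1e-10, max_iter=5000):
    """Minimise f (a function of a list of floats) with the Nelder-Mead simplex method.
    Coefficients: reflection 1, expansion 2, contraction 0.5, shrink 0.5."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex displaced by `step` along each axis.
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    values = [f(v) for v in simplex]

    for _ in range(max_iter):
        # Sort vertices from best (lowest f) to worst.
        order = sorted(range(n + 1), key=lambda k: values[k])
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        if values[-1] - values[0] < tol:
            break
        # Centroid of all vertices except the worst one.
        centroid = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]
        worst = simplex[-1]

        def along(coef):
            # Point centroid + coef * (centroid - worst) on the line through the worst vertex.
            return [c + coef * (c - w) for c, w in zip(centroid, worst)]

        x_r = along(1.0)  # reflection
        f_r = f(x_r)
        if f_r < values[0]:
            x_e = along(2.0)  # expansion
            f_e = f(x_e)
            simplex[-1], values[-1] = (x_e, f_e) if f_e < f_r else (x_r, f_r)
        elif f_r < values[-2]:
            simplex[-1], values[-1] = x_r, f_r  # accept the reflected point
        else:
            # Outside contraction if the reflected point beats the worst, else inside.
            x_c = along(0.5) if f_r < values[-1] else along(-0.5)
            f_c = f(x_c)
            if f_c < min(f_r, values[-1]):
                simplex[-1], values[-1] = x_c, f_c
            else:
                # Shrink all vertices towards the best one.
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, v)]
                                    for v in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]

    best_index = min(range(n + 1), key=lambda k: values[k])
    return simplex[best_index], values[best_index]


def quadratic(v):
    """Illustrative test function with its minimum at (3, -1)."""
    return (v[0] - 3.0) ** 2 + 2.0 * (v[1] + 1.0) ** 2


best, f_best = nelder_mead(quadratic, [0.0, 0.0])
print([round(c, 4) for c in best], f_best)
```

Only function values are ever evaluated. This is what makes the method usable for the non-smooth likelihood described above.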
Markov Chain Monte Carlo (MCMC) Techniques and other
computational aspects.
Markov chain Monte Carlo (MCMC) methods make it possible to draw samples from the joint posterior distribution.
One of the most widely used MCMC techniques is Gibbs sampling.
The idea in Gibbs sampling is to generate posterior samples by sweeping through the variables, sampling each one from its conditional distribution with the remaining variables fixed at their current values.
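The sketch below shows the sweep described above. For illustration only, it assumes a standard bivariate normal target with correlation ρ, whose full conditionals are X | Y = y ~ N(ρy, 1 − ρ²) and Y | X = x ~ N(ρx, 1 − ρ²). The posteriors in the thesis are different; this example only shows the mechanics.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_samples, burn_in=1000, seed=1):
    """Gibbs sampler for a standard bivariate normal (X, Y) with correlation rho.
    Each sweep samples X from its full conditional given the current Y,
    then Y from its full conditional given the new X."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho ** 2)  # conditional standard deviation
    x, y = 0.0, 0.0
    samples = []
    for sweep in range(burn_in + n_samples):
        x = rng.gauss(rho * y, sd)  # X | Y = y ~ N(rho * y, 1 - rho^2)
        y = rng.gauss(rho * x, sd)  # Y | X = x ~ N(rho * x, 1 - rho^2)
        if sweep >= burn_in:
            samples.append((x, y))
    return samples

def sample_correlation(pairs):
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    s_xx = sum((a - mean_x) ** 2 for a, _ in pairs)
    s_yy = sum((b - mean_y) ** 2 for _, b in pairs)
    return s_xy / math.sqrt(s_xx * s_yy)

draws = gibbs_bivariate_normal(rho=0.8, n_samples=50000)
print(round(sample_correlation(draws), 3))  # should be close to the target correlation 0.8
```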
Event history analysis
Survival analysis
Survival analysis is a set of methods for analyzing data where the outcome variable is the time until the occurrence of an event of interest. The event can be death, occurrence or recurrence of a disease, marriage or divorce.
The time to event, or survival time, can be measured in days, weeks or years.
In the thesis, the event of interest is the first recurrence of malaria since the start of treatment.
The survival time is the time in days until a child tests positive for malaria.
Observations are called censored when the information about their survival time is incomplete.
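One standard estimator of the survival function under censoring is the Kaplan–Meier (product-limit) estimator. The sketch below is a plain implementation of it. The times and censoring indicators are hypothetical, and the slides do not state that the thesis uses exactly this estimator.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function S(t).
    times  : observed times (event time or censoring time)
    events : 1 if the event was observed, 0 if the observation is censored
    Returns a list of (t, S(t)) at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        while i < len(data) and data[i][0] == t:  # group tied times together
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # both events and censorings leave the risk set
    return curve

# Hypothetical days to first recurrence; an event flag of 0 marks a censored child.
times = [7, 14, 14, 21, 28, 28, 42, 42, 42, 42]
events = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 4))
```

Censored children contribute to the risk set up to their censoring time and are then removed without counting as events. This is how incomplete survival information is still used.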
Event history analysis
Restricted mean survival time
The restricted mean survival time (RMST) is the expected survival time within a fixed follow-up interval.
It was first proposed by Irwin [1], because the mean survival time is not estimable in the presence of censoring.
The mean survival time can be calculated as the total area between the survival function and the x-axis. The RMST up to a time τ is the same area restricted to [0, τ].
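As a worked illustration of this area, assume (hypothetically; the thesis does not use a parametric form) an exponential survival function $S(t) = e^{-\lambda t}$. Then
$$\mathrm{RMST}(\tau) = \int_0^{\tau} S(t)\,dt = \int_0^{\tau} e^{-\lambda t}\,dt = \frac{1 - e^{-\lambda \tau}}{\lambda},$$
which tends to the mean survival time $1/\lambda$ as $\tau \to \infty$. For example, with $\lambda = 0.01$ per day and $\tau = 84$ days, $\mathrm{RMST} = 100\,(1 - e^{-0.84}) \approx 56.8$ days.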
Chapter 3: Medical Background and Description of Data
Conduct of the Clinical Trial
The study was undertaken in Tanzania, at Uzini and Konde, with 206 and 178 uncomplicated malaria patients, respectively.
The ethical requirements stipulated in the Declaration of Helsinki were respected.
The conduct of the trial was supervised by the Karolinska Institute, Sweden.
On Day 0, the patients were tested for parasites; they were retested on days 7, 21, 28, 42, 56 and 84.
Description of Data
Genotype malaria
PCR techniques were used to differentiate recrudescent parasitaemia from new infections (at baseline and at recurrence of malaria).
The parasites were analysed, and the single nucleotide polymorphisms (SNPs) at three positions in the pdhfr gene were determined.
Each of the three positions (pdhfr 51, 59 and 108) could be defined as either resistant (R) or sensitive (S).
If parasites with both R and S SNPs were present at a position, this was denoted by the letter M.
Description of Data continued
Genotype malaria
Each blood sample was classified by the pdhfr genotype of its parasites, using three letters from SSS to MMM that denote the status of each individual SNP.
For example, RSM means that the child had only parasites with resistant SNPs at the first position and only parasites with sensitive SNPs at the second position. At the third position, there were both parasites with resistant SNPs and parasites with sensitive SNPs.
Differentiating between malaria due to reinfection and recrudescence is ambiguous. To avoid this ambiguity, this thesis refers to any first recurrence of malaria within the study period simply as a first recurrence of malaria.
Table 1 is an extract of such data.
Data for Chapter 5
Cured Data
Table 2: Number of patients cured of malaria, with the number at start in parentheses (periods in days)
Location  Drug  (0–42]    (0–84]
KONDE     ASP   86 (90)   34 (90)
KONDE     SP    43 (86)   29 (86)
UZINI     ASP   63 (94)   56 (94)
UZINI     SP    63 (110)  57 (110)
Data for Chapter 5 contd
Recurrence infection
Table 3: Number with first recurrence of malaria, with the number at risk in parentheses (intervals in days)
Location  Drug  (0–7]     (7–21]   (21–28]  (28–42]  (42–56]  (56–84]
KONDE     ASP   3 (90)    8 (87)   21 (79)  16 (58)  7 (42)   3 (35)
KONDE     SP    7 (86)    17 (79)  10 (62)  9 (52)   10 (43)  4 (33)
UZINI     ASP   4 (94)    13 (90)  9 (77)   8 (68)   9 (60)   1 (51)
UZINI     SP    15 (110)  14 (95)  13 (81)  6 (68)   2 (62)   3 (60)
Chapter 4: Estimating Haplotype Frequencies and the Effects of Treatment
Motivation
Estimating the proportions of different haplotypes from blood samples with multiple genotypes has usually been approached with simple methods, such as discarding all mixed infections with multiple genotypes or counting the mixed genes as resistant.
Wigger et al. (2013) use MCMC methods in which one step simulates the true state. Hastings and Smith (2008) present a computer programme for the calculations.
Unveiling haplotypes in mixed infections
Example: Hidden combination
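The classification MRM arises, for example, from RRR+SRS or from RRS+SRR, where M indicates the presence of both R and S at a position.
Other combinations, such as RRR+RRS+SRR and RRR+RRS+SRS, give the same classification. In total, seven combinations of the four compatible haplotypes (RRR, RRS, SRR, SRS) are observed as MRM.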
Hypothesis and methodology in this Chapter
There are relatively more resistant SNPs at the first recurrence of malaria once we correct for the fact that the children at baseline were infected by multiple types of malaria parasites simultaneously.
We derive models for baseline, for the first recurrence, and for both periods jointly (see page 102).
One time–point probability models
Saturated model
There were 27 different possible genotypes at baseline giving malaria (see page 44) and 28 at the first reappearance of malaria.
We represent the 27 genotypes by IJK, where I, J and K each take the value M, R or S.
Let $n_{IJK}$ be the corresponding number of patients with this infection.
Then the probability of an infection IJK can be estimated by the corresponding relative frequency, that is,
$$\pi^{*}_{IJK} = \frac{n_{IJK}}{N}, \quad (3)$$
where N is the total number of observed patients with infection.
Haplotypes and observations with multiple genes
Model derivation
The parasites can be classified into eight haplotypes:
XYZ ∈ {RRR, RRS, RSR, SRR, RSS, SRS, SSR, SSS}.
We use the letters IJK to classify observations into R, S or M, but XYZ when classifying parasites with only R or S.
Let $p_{XYZ} = 1 - q_{XYZ}$ be the probability of a susceptible child being infected by type XYZ.
Assuming that these eight haplotypes infect children independently, the probability that a child stays healthy is
$$\pi(H) = \prod_{XYZ}^{8} q_{XYZ}. \quad (4)$$
Model derivation cont’d
Probability of each infection
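The probability of a single-haplotype infection, e.g. an RRR infection, is
$$\pi_{RRR} = \frac{p_{RRR}}{q_{RRR}} \prod_{XYZ} q_{XYZ}. \quad (5)$$
Similarly, the probability of an MRR infection is
$$\pi_{MRR} = \frac{p_{RRR}}{q_{RRR}} \times \frac{p_{SRR}}{q_{SRR}} \prod_{XYZ} q_{XYZ}. \quad (6)$$
MRR corresponds to an infection with both RRR and SRR and no other haplotype.
There are 12 classifications with one M, each arising from exactly one pair of haplotypes. There are 6 classifications with two M, each arising from 7 combinations of haplotypes. The single classification with three M (MMM) arises from 193 combinations.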
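These counts can be verified by brute force. The illustrative sketch below (not code from the thesis) enumerates every non-empty set of co-infecting haplotypes. It maps each set to its observed classification IJK: a position is R or S if all haplotypes agree there, and M otherwise. It then tallies how many sets give each classification.

```python
from itertools import combinations

HAPLOTYPES = ["RRR", "RRS", "RSR", "SRR", "RSS", "SRS", "SSR", "SSS"]

def classify(infecting):
    """Observed classification IJK of a set of co-infecting haplotypes:
    a position shows R or S if all haplotypes agree there, and M otherwise."""
    code = ""
    for pos in range(3):
        letters = {h[pos] for h in infecting}
        code += letters.pop() if len(letters) == 1 else "M"
    return code

# Count how many non-empty haplotype combinations give each classification.
counts = {}
for size in range(1, len(HAPLOTYPES) + 1):
    for combo in combinations(HAPLOTYPES, size):
        code = classify(combo)
        counts[code] = counts.get(code, 0) + 1

# Group classifications by their number of M letters.
by_number_of_m = {}
for code, c in counts.items():
    by_number_of_m.setdefault(code.count("M"), []).append(c)
for m in sorted(by_number_of_m):
    print(f"{m} M: {len(by_number_of_m[m])} classifications, "
          f"combinations per classification = {sorted(set(by_number_of_m[m]))}")
```

All 2^8 − 1 = 255 non-empty sets are accounted for: 8 + 12 + 6 × 7 + 193 = 255.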
Model derivation cont’d
Probability of MMM infection
The probability of MMM can be obtained by summing 193 terms. It is simpler, however, to subtract from 1 the sum of all the other probabilities, including the probability π(H) of staying healthy:
$$\pi_{MMM} = 1 - \pi(H) - \sum_{IJK \neq MMM} \pi_{IJK}. \quad (7)$$
Our interest is in the observed proportions conditional on having malaria. Thus the probability of an observation classified as IJK is
$$\pi'_{IJK} = \frac{\pi_{IJK}}{1 - \prod_{XYZ} q_{XYZ}}, \quad (8)$$
where I, J and K each take the value M, R or S.
One time–point probability models
Our model
Using equation (8) and the fact that the numbers of observed types follow a multinomial distribution, we obtain the likelihood function
$$L(p) = \prod_{IJK}^{27} \left( \frac{\pi_{IJK}}{1 - \prod_{XYZ}^{8} q_{XYZ}} \right)^{n_{IJK}}. \quad (9)$$
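Below is a sketch of the logarithm of equation (9). It assumes the eight p_XYZ values and the observed counts n_IJK are held in dictionaries; the names and the numbers are hypothetical. Each π_IJK is obtained by summing, over every set of haplotypes that would be observed as IJK, the probability that exactly that set infects the child. This reproduces equations (5) and (6) as special cases and gives π_MMM directly. The negative of this function could then be passed to a derivative-free optimiser such as Nelder–Mead. Keeping each p_XYZ inside (0, 1), for example through a logit transform, would then be needed; that choice is our assumption.

```python
import math
from itertools import combinations

HAPLOTYPES = ["RRR", "RRS", "RSR", "SRR", "RSS", "SRS", "SSR", "SSS"]

def classify(infecting):
    """Observed classification IJK: R or S where all haplotypes agree, M otherwise."""
    code = ""
    for pos in range(3):
        letters = {h[pos] for h in infecting}
        code += letters.pop() if len(letters) == 1 else "M"
    return code

def classification_probs(p):
    """pi_IJK: sum over all haplotype sets observed as IJK of
    prod_{in set} p_XYZ * prod_{not in set} q_XYZ (independent infections)."""
    pi = {}
    for size in range(1, len(HAPLOTYPES) + 1):
        for combo in combinations(HAPLOTYPES, size):
            prob = math.prod(p[h] if h in combo else 1.0 - p[h] for h in HAPLOTYPES)
            code = classify(combo)
            pi[code] = pi.get(code, 0.0) + prob
    return pi

def log_likelihood(p, n_obs):
    """Logarithm of equation (9): sum of n_IJK * log(pi_IJK / (1 - prod q_XYZ))."""
    pi = classification_probs(p)
    prob_malaria = 1.0 - math.prod(1.0 - p[h] for h in HAPLOTYPES)
    return sum(n * math.log(pi[code] / prob_malaria) for code, n in n_obs.items() if n > 0)

# Hypothetical infection probabilities and observed counts, for illustration only.
p_example = {"RRR": 0.30, "RRS": 0.05, "RSR": 0.02, "SRR": 0.10,
             "RSS": 0.01, "SRS": 0.04, "SSR": 0.03, "SSS": 0.15}
observed = {"RRR": 20, "MRR": 5, "SSS": 8, "MMM": 2}
print(round(log_likelihood(p_example, observed), 4))
```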
One time–point probability models
Estimation of the model
We maximise this function to obtain maximum likelihood estimates of the probabilities of XYZ infections. A program was written in R, using the Nelder–Mead method [4] for the optimisation.
This technique does not require derivatives, which makes it suitable for optimising non-smooth functions. It often shows rapid improvements within a relatively small number of iterations.
The optimisation produced maximum likelihood (ML) estimates of $p_{XYZ}$, which are presented in Tables 4 and 5 for Konde and Uzini, respectively.
Average number of infections per child
Average number assuming Poisson model
If we assume that the number of times a child is infected follows a Poisson distribution, then Pr(0) = exp(−λ), where λ is the Poisson mean.
$p_{XYZ}$ is the probability of getting infected at least once with haplotype XYZ, so $q_{XYZ} = \exp(-\lambda_{XYZ})$.
Since we assume a Poisson distribution, $\lambda_{XYZ} = -\ln(q_{XYZ})$ equals the expected number of times a person is infected with XYZ.
$\lambda_{XYZ}/p_{XYZ}$ = E(number of times a person is infected with XYZ, given that he is infected with it at least once), because the probability of at least one infection is $1 - e^{-\lambda_{XYZ}} = p_{XYZ}$.
Given infection, the expected number of infection times can be estimated by
$$\frac{-\sum_{XYZ} \ln(q_{XYZ})}{1 - \prod_{XYZ}^{8} q_{XYZ}}. \quad (10)$$
These figures can be found in Tables 6 and 7.
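Equation (10) is straightforward to compute from the estimated p_XYZ. The sketch below also computes Σ p_XYZ / (1 − ∏ q_XYZ), the expected number of distinct haplotypes given infection. That this is how the "average no. of haplotypes" in Tables 6 and 7 was obtained is our assumption; the slides do not state it. The probabilities are hypothetical.

```python
import math

def average_haplotypes(p):
    """Assumed definition: expected number of distinct haplotypes carried, given
    at least one infection = sum p_XYZ / (1 - prod q_XYZ)."""
    prob_malaria = 1.0 - math.prod(1.0 - v for v in p.values())
    return sum(p.values()) / prob_malaria

def expected_infection_times(p):
    """Equation (10): -sum ln(q_XYZ) / (1 - prod q_XYZ), with lambda_XYZ = -ln(q_XYZ)."""
    q = [1.0 - v for v in p.values()]
    return -sum(math.log(v) for v in q) / (1.0 - math.prod(q))

# Hypothetical infection probabilities, for illustration only.
p_example = {"RRR": 0.30, "RRS": 0.05, "RSR": 0.02, "SRR": 0.10,
             "RSS": 0.01, "SRS": 0.04, "SSR": 0.03, "SSS": 0.15}
print(round(average_haplotypes(p_example), 3), round(expected_infection_times(p_example), 3))
```

Because λ_XYZ = −ln(1 − p_XYZ) ≥ p_XYZ, the expected number of infection times can never be smaller than the average number of haplotypes. This is consistent with Tables 6 and 7.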
Average number of infections per child
Average estimates
Table 6: Average number of haplotypes and expected number of times patients were infected (no distinction between ASP and SP)
Location  Time point        Average no. of haplotypes  Expected no. of infection times
KONDE     Baseline          1.74                       2.29
KONDE     First recurrence  1.19                       1.46
UZINI     Baseline          1.31                       1.46
UZINI     First recurrence  1.21                       1.35
Average number of infections per child
Average estimates
Table 7: Average number of haplotypes and expected number of times patients were infected (distinction between ASP and SP)
Location  Drug  Time point        Average no. of haplotypes  Expected no. of infection times
KONDE     ASP   Baseline          1.79                       2.31
KONDE     ASP   First recurrence  1.34                       1.61
KONDE     SP    Baseline          1.69                       2.27
KONDE     SP    First recurrence  1.15                       1.63
UZINI     ASP   Baseline          1.33                       1.49
UZINI     ASP   First recurrence  1.15                       1.22
UZINI     SP    Baseline          1.30                       1.47
UZINI     SP    First recurrence  1.25                       1.50
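A model assuming independence between gene positions: M1
Is the occurrence of R or S independent across the three positions? If so, the relative frequencies factorise as
$$f_{XYZ} = f_{X..} \times f_{.Y.} \times f_{..Z} = \lambda_{X..}\,\lambda_{.Y.}\,\lambda_{..Z} \Big/ \sum_{UVW}^{8} \lambda_{UVW}. \quad (11)$$
Since $p_{XYZ}$ is fairly small, $\lambda_{XYZ} = -\ln(1 - p_{XYZ}) \approx p_{XYZ}$. Thus factorising $f_{XYZ}$ is approximately the same as factorising $p_{XYZ}$, but we may include a normalising constant, α. We therefore tested the model
$$p_{XYZ} = \alpha \times p_{X..} \times p_{.Y.} \times p_{..Z}. \quad (12)$$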
Two time-point models: baseline and first recurrence time points
Is there a relation between the baseline and the first recurrence time?
Our starting point is model M2, with 16 parameters. We denote the parameters at baseline with an extra index b and those at the first recurrence of malaria with an extra index r:
$$p^{b}_{XYZ} = 1 - \exp(-\lambda^{b}_{XYZ}), \qquad p^{r}_{XYZ} = 1 - \exp(-\lambda^{r}_{XYZ}) \qquad : \mathrm{M2} \quad (13)$$
Two time-point models: baseline and first recurrence time points
A model with varying amount of infection: M3
At baseline there are more infections than at recurrence, owing to longer exposure to different types.
We consider a simple model in which the only difference between baseline and the first recurrence is that the total amount of infection, t, is smaller at this first reappearance of malaria:
$$p^{r}_{XYZ} = 1 - \exp(-t \times \lambda^{b}_{XYZ}). \quad (14)$$
Testing against model M2, the increase in 2 × log-likelihood ratio is 16.5 for Konde and 20.2 for Uzini.
With 7 degrees of freedom, we must reject this model. We therefore look at a model in which the decrease differs between genes.
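The rejection can be checked by computing the χ² upper-tail probability of each statistic with 7 degrees of freedom. The standard-library sketch below evaluates that tail through the power series of the regularised lower incomplete gamma function. This is a textbook method chosen here for illustration; in practice a statistics library would be used.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi^2_df > x) = 1 - P(df/2, x/2), where P is the
    regularised lower incomplete gamma function, evaluated by its power series."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a  # n = 0 term of sum_n z^n / (a (a+1) ... (a+n))
    total = term
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(-z + a * math.log(z) - math.lgamma(a)) * total
    return 1.0 - lower

# 2 x log-likelihood-ratio statistics for M3 against M2, with 7 degrees of freedom as stated above.
for site, statistic in [("Konde", 16.5), ("Uzini", 20.2)]:
    print(site, round(chi2_sf(statistic, 7), 4))
```

Both p-values fall below 0.05: roughly 0.02 for Konde and 0.005 for Uzini. This agrees with rejecting M3.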
Two time-point models: baseline and first recurrence time points
A model with varying amounts of infection and differences between gene positions
The eight types of infection do not seem to decrease equally.
We try to model the differences with fewer than eight parameters.
We consider two models, one additive and one multiplicative.
These models describe the hypothesis that the proportion of sensitive genes decreases between baseline and the first recurrence of malaria.
Two time-point models: baseline and first recurrence time points
Multiplicative model M4 and additive model M5
Multiplicative model:
$$p^{r}_{XYZ} = 1 - \exp\!\left(-t\,\alpha_1^{I_1}\alpha_2^{I_2}\alpha_3^{I_3}\lambda^{b}_{XYZ}\right) \qquad : \mathrm{M4} \quad (15)$$
Additive model:
$$p^{r}_{XYZ} = 1 - \exp\!\left(-t\,(\alpha_1 I_1 + \alpha_2 I_2 + \alpha_3 I_3 + \lambda^{b}_{XYZ})\right) \qquad : \mathrm{M5} \quad (16)$$
where
$I_1 = 1$ if $X = S$ in the haplotype XYZ, and 0 otherwise;
$I_2 = 1$ if $Y = S$ in the haplotype XYZ, and 0 otherwise;
$I_3 = 1$ if $Z = S$ in the haplotype XYZ, and 0 otherwise.
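To make the roles of t and the α's concrete, the sketch below evaluates models M3, M4 and M5 for a single haplotype, given its baseline rate λ^b_XYZ. The baseline rate is hypothetical. The values t = 0.56 and α = (0.00, 0.83, 0.37) are the Konde estimates listed in Table 8, used here only as example inputs. With α_1 = 0, the factor α_1^{I_1} still equals 1 when I_1 = 0, since Python evaluates 0.0 ** 0 as 1.0.

```python
import math

def indicators(xyz):
    """(I1, I2, I3): 1 where haplotype XYZ has S at that position, 0 otherwise."""
    return [1 if letter == "S" else 0 for letter in xyz]

def p_rec_m3(lam_b, t):
    """Model M3, equation (14): 1 - exp(-t * lambda_b)."""
    return 1.0 - math.exp(-t * lam_b)

def p_rec_m4(xyz, lam_b, t, alphas):
    """Multiplicative model M4: 1 - exp(-t * a1**I1 * a2**I2 * a3**I3 * lambda_b)."""
    factor = math.prod(a ** i for a, i in zip(alphas, indicators(xyz)))
    return 1.0 - math.exp(-t * factor * lam_b)

def p_rec_m5(xyz, lam_b, t, alphas):
    """Additive model M5: 1 - exp(-t * (a1*I1 + a2*I2 + a3*I3 + lambda_b))."""
    shift = sum(a * i for a, i in zip(alphas, indicators(xyz)))
    return 1.0 - math.exp(-t * (shift + lam_b))

lam_b = 0.2  # hypothetical baseline rate, the same for every haplotype here
t, alphas = 0.56, (0.00, 0.83, 0.37)
for xyz in ["RRR", "RRS", "RSS", "SSS"]:
    print(xyz, round(p_rec_m3(lam_b, t), 4), round(p_rec_m4(xyz, lam_b, t, alphas), 4))
```

M4 reduces to M3 when all α's equal 1, and M5 reduces to M3 when all α's equal 0. For a haplotype with only R genes, M4 coincides with M3 whatever the α's.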
Two time-point models: baseline and first recurrence time points
Using the multiplicative model M4 to measure the relative increase in haplotypes with S-genes at different positions
Table 8 shows estimates for the four parameters.
Table 8: Measure of the relative increase in haplotypes with S-genes at different positions
Parameter  Konde  Uzini  All data  ASP    SP
tK         0.56   –      0.48      0.579  0.439
tU         –      1.07   1.12      0.512  1.915
α1         0.00   1.29   0.98      0.605  1.073
α2         0.83   0.69   0.75      1.464  0.486
α3         0.37   0.49   0.47      0.882  0.304
The proportion of R-genes at the last position was much higher at recurrence, and somewhat higher, but not significantly, …
Two time-point models: baseline and first recurrence time points
A model combining both health centres
To test whether the true relative decrease in the proportion of S-genes at the three positions can be the same at both locations, we use common α-parameters:
$$p^{r}_{K,XYZ} = 1 - \exp\!\left(-t_K\,\alpha_1^{I_1}\alpha_2^{I_2}\alpha_3^{I_3}\lambda^{b}_{K,XYZ}\right) \text{ at Konde} \qquad : \mathrm{M6} \quad (17)$$
$$p^{r}_{U,XYZ} = 1 - \exp\!\left(-t_U\,\alpha_1^{I_1}\alpha_2^{I_2}\alpha_3^{I_3}\lambda^{b}_{U,XYZ}\right) \text{ at Uzini} \qquad : \mathrm{M7} \quad (18)$$
where $I_1$, $I_2$ and $I_3$ are as defined before.
We may accept that the same model can be used at both centres.
The hypothesis that the exposure parameter t is the same at both centres was rejected.
All ASP α's are around 1.
Two time-point models
A model combining both health centres (cont'd)
The decrease is clear for those treated with SP.
Where we earlier had decreases by the factors 0.75 and 0.47 (see Table 8), they are now 0.49 and 0.30 for SP.
The decrease is most obvious in Uzini, where there were more infections.
Results and Discussion on this Chapter
Important results
We could not reject the hypothesis that the eight malaria haplotypes infected the children independently of each other.
At the first recurrence of malaria, the proportions of some parasite types were smaller than at baseline.
In particular, haplotypes with genes marked S at the second and third positions decreased in children treated with SP.
When the children were treated with ASP, the decrease was much smaller.
More results on the chapter
Important results cont’d
This is because SP did not kill all the parasites with resistant genes, which led to the reappearance of malaria.
With ASP treatment, all parasites were killed, and all observed first recurrences were due to new infections.
At baseline, a child with malaria was on average infected by 2 different parasite types.
The estimated number of times they were infected was 3 in Konde and 2 in Uzini.
At the first recurrence of malaria, the number of haplotypes had decreased.
Results have been published in [3] (International Journal of Biostatistics, 2013).
Chapter 5: Efficacy of Treatments - Cure Rates and Malaria-Free Times: A Bayesian Approach
Motivation
Many studies have concluded that ACTs are more efficacious treatments.
But ACTs are not a panacea for malaria: they too can fail. A signal of treatment failure is a recrudescence of malaria.
Some researchers have measured, at the laboratory level, the time needed for parasite clearance in hosts.
Objective
Which of the two treatments is more efficacious?
Present a new methodology that can be used to estimate how long a treatment can postpone the recurrence of the disease in case of failure.
Assumptions
Some preliminary assumptions:
We consider the first recurrence of malaria during the follow-up period as a heuristic indicator of treatment failure.
All cases of failure must happen before time $t_{max}$.
We do not assume any distribution when determining the difference in mean survival times between the two treatments.
Hence the procedure is nonparametric.
Statistical Models
Cure rate model
Let $n_i$ be the number at start receiving treatment i, and let $X_i$ be the number of patients cured before time $t_{max}$, the end of the follow-up.
Then our model is $X_i \sim \mathrm{Bin}(n_i, p_i)$, where i = ASP or SP and $p_i$ is the probability of being cured by treatment i.
Assuming a Jeffreys prior, Beta(α, β) with α = β = 1/2, the posterior distribution of $p_i$ is
$$\mathrm{Beta}(\alpha + X_i,\ \beta + n_i - X_i) \quad \text{for } i = \text{ASP or SP}. \quad (19)$$
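With independent Beta posteriors, P(p_ASP > p_SP | data) can be estimated by simple Monte Carlo sampling from model (19). The sketch below uses the Jeffreys prior and the Uzini (0–42] counts from Table 2. The number of draws and the seed are arbitrary choices. The thesis's exact settings are not given here, so the output need not reproduce Table 12 exactly.

```python
import random

def prob_first_better(x1, n1, x2, n2, a=0.5, b=0.5, draws=20000, seed=2016):
    """Monte Carlo estimate of P(p1 > p2 | data) when p1 and p2 have independent
    Beta(a + x, b + n - x) posteriors; a = b = 0.5 corresponds to the Jeffreys prior."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        p1 = rng.betavariate(a + x1, b + n1 - x1)
        p2 = rng.betavariate(a + x2, b + n2 - x2)
        wins += p1 > p2
    return wins / draws

# Uzini, period (0-42], counts from Table 2: ASP 63 cured of 94, SP 63 cured of 110.
p_asp_better = prob_first_better(63, 94, 63, 110)
print(round(p_asp_better, 3))
```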
Delay time model
Recurrence rate model at each follow-up
Suppose that rescreening is done at some fixed time points $t_0 = 0, t_1, \ldots, t_k = t_{max}$.
For treatment i (i = ASP or SP), let $R_{j,i}$ be the number of children who had been free from malaria up to time point $t_{j-1}$. Let $Y_{j,i}$ be the number of those who got malaria between time points $t_{j-1}$ and $t_j$.
Then, for each of these intervals, the number $Y_{j,i}$ of children experiencing the event of interest can be modelled as $\mathrm{Bin}(R_{j,i}, \theta_{ji})$.
The posterior distribution of $\theta_{ji}$ is
$$\mathrm{Beta}(\alpha_{ji} + Y_{j,i},\ \beta_{ji} + R_{j,i} - Y_{j,i}). \quad (20)$$
Delay time model contd
Survival rate model at each follow-up
The posterior distribution of $\theta_{ji}$ is given by model (20).
Thus, for each treatment, the posterior for the survival function at each follow-up time is that of
$$S(t_j) = \prod_{k=1}^{j} (1 - \theta_{ki}).$$
To obtain the full distribution, we assume that the survival function is piecewise linear between the follow-up times $t_0 = 0, t_1, \ldots, t_k = t_{max}$.
Equivalently, victims of recrudescence between $t_{j-1}$ and $t_j$ on average get a recurrence at the midpoint $(t_{j-1} + t_j)/2$.
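The posterior of each S(t_j) is easy to simulate, because the θ_ji are independent Beta variables a posteriori. Below is a sketch using the Konde SP counts from Table 3. It assumes Beta(1/2, 1/2) priors for every θ_ji; the prior parameters α_ji and β_ji are not given on the slides.

```python
import random

def posterior_survival_draws(at_risk, events, a=0.5, b=0.5, draws=5000, seed=1):
    """For each Monte Carlo draw, sample theta_j ~ Beta(a + Y_j, b + R_j - Y_j)
    independently for every interval and return S(t_j) = prod_{k<=j} (1 - theta_k)."""
    rng = random.Random(seed)
    curves = []
    for _ in range(draws):
        s = 1.0
        curve = []
        for r, y in zip(at_risk, events):
            s *= 1.0 - rng.betavariate(a + y, b + r - y)
            curve.append(s)
        curves.append(curve)
    return curves

# Konde, SP arm, from Table 3: children at risk (R) and first recurrences (Y) per interval.
days = [7, 21, 28, 42, 56, 84]
at_risk = [86, 79, 62, 52, 43, 33]
events = [7, 17, 10, 9, 10, 4]
curves = posterior_survival_draws(at_risk, events)
posterior_mean = [sum(c[j] for c in curves) / len(curves) for j in range(len(days))]
for day, s in zip(days, posterior_mean):
    print(f"day {day}: posterior mean S = {s:.3f}")
```

Each draw is a non-increasing survival curve. Quantiles taken across draws at each day give pointwise posterior bands.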
Delay time model contd
Mean survival time
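Assuming that T is continuous with density f(s), the probability of surviving the event of interest until time t is
$$S(t) = 1 - F(t) = \int_t^{\infty} f(s)\,ds.$$
Then
$$E(T) = \int_0^{\infty} s\,f(s)\,ds = \int_0^{\infty} S(s)\,ds. \quad (21)$$
For a given time $t_k = t_{max} < \infty$,
$$E(T) = \sum_j \Big[(t_j - t_{j-1})\,S(t_j) + \big(S(t_{j-1}) - S(t_j)\big)\frac{t_j - t_{j-1}}{2}\Big] + E\big(\max(T - T_{max}, 0)\big)$$
$$= \sum_j \frac{(t_j - t_{j-1})\big(S(t_j) + S(t_{j-1})\big)}{2} + E\big(\max(T - T_{max}, 0)\big). \quad (22)$$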
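The sum in equation (22) is a trapezoidal area under the piecewise-linear survival curve. The sketch below computes it. It leaves out the term E(max(T − T_max, 0)), which cannot be estimated from data that stop at t_max without further assumptions. As an example, it uses the observed (not posterior) survival proportions of the Konde SP arm from Table 3.

```python
def restricted_mean(times, surv):
    """Trapezoidal area under a piecewise-linear survival curve on [0, t_k]:
    sum_j (t_j - t_{j-1}) * (S(t_j) + S(t_{j-1})) / 2, with t_0 = 0 and S(t_0) = 1."""
    area = 0.0
    t_prev, s_prev = 0.0, 1.0
    for t, s in zip(times, surv):
        area += (t - t_prev) * (s + s_prev) / 2.0
        t_prev, s_prev = t, s
    return area

# Observed survival of the Konde SP arm (Table 3): S(t_j) = prod (1 - Y/R).
days = [7, 21, 28, 42, 56, 84]
at_risk = [86, 79, 62, 52, 43, 33]
events = [7, 17, 10, 9, 10, 4]
surv, s = [], 1.0
for r, y in zip(at_risk, events):
    s *= 1.0 - y / r
    surv.append(s)
konde_sp = restricted_mean(days, surv)
print(f"area under S on [0, 84] days: {konde_sp:.1f}")
```

Applied to each posterior draw of the survival curve instead of the observed proportions, the same function gives posterior draws of the restricted mean.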
Delay time model
Mean survival time
The last term in the sum in equation (22) is the healthy time in the period for those who get sick within that period. The last term outside the sum corresponds to the excess time of those who are healthy at the end of the follow-up.
For $t_j < t_{max}$, the conditional survival function of those who have a first recurrence is
$$S^{*}(t_j \mid T < t_{max}) = \frac{\prod_{k=1}^{j}(1 - \theta_{ki}) - S(t_{max})}{1 - S(t_{max})}. \quad (23)$$
Mean delay time model
Mean difference in survival times
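For the survival times $T_1$ and $T_2$ under the two treatments, the expected difference is
$$E[T_1 - T_2] = \int_0^{\infty} S_1(t)\,dt - \int_0^{\infty} S_2(t)\,dt.$$
Conditioning on $\tilde{T} = \{T < T_{max}\}$ and using the piecewise-linear survival functions, this simplifies to
$$E[T_1 - T_2 \mid \tilde{T}] = \frac{1}{2} \sum_{j=1}^{k-1} (t_{j+1} - t_{j-1}) \big[S^{*}_1(t_j \mid \tilde{T}) - S^{*}_2(t_j \mid \tilde{T})\big]. \quad (24)$$
We note that $E(\max(T - T_{max}, 0)) = 0$ given $\tilde{T}$, since under the model assumptions no event occurs at or after $t_k = t_{max}$.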
Computations
Monte Carlo implementation
We use the MCMC Gibbs sampler in the Bayesian setup.
For the efficacy posterior estimates and densities, we draw random samples from the posterior distributions defined in model (19).
For the delay-time computation:
The posterior of model (24), which we call H, is complex. It links 2 × (k − 1) Beta distributions, corresponding to the two treatments and k follow-up times.
Computations
Monte Carlo implementation contd
Sample $\theta^{(1)}_{ji}$ from $\mathrm{Beta}(Y_{j,i} + \alpha,\ R_{j,i} - Y_{j,i} + \beta)$, then compute $S^{*(1)}_{j,i}$ and $H^{(1)}$.
Continue in the same way until $\theta^{(N)}_{ji}$ has been sampled from $\mathrm{Beta}(Y_{j,i} + \alpha,\ R_{j,i} - Y_{j,i} + \beta)$ and $S^{*(N)}_{j,i}$ and $H^{(N)}$ have been computed.
The resulting sequence $\{H^{(1)}, H^{(2)}, \ldots, H^{(N)}\}$ constitutes N independent samples from H.
From these, we obtain our estimates.
A histogram of these simulations gives the posterior distribution of $H = E[T_1 - T_2 \mid T < T_{max}]$.
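Below is a sketch of this Monte Carlo scheme for the Konde (0–42] data of Table 3. It assumes Beta(1/2, 1/2) priors for every θ_ji, because α and β are not specified on the slides. H is computed as the difference in conditional mean time to first recurrence, using the midpoint rule described for the delay-time model. Prior and implementation details may differ from the thesis, so the output need not reproduce Table 13.

```python
import random

def conditional_mean(times, surv):
    """E(T | T < t_k) under a piecewise-linear survival curve: a recurrence in
    (t_{j-1}, t_j] occurs on average at the midpoint, with conditional probability
    (S(t_{j-1}) - S(t_j)) / (1 - S(t_k))."""
    s_end = surv[-1]
    mean, t_prev, s_prev = 0.0, 0.0, 1.0
    for t, s in zip(times, surv):
        mean += (s_prev - s) / (1.0 - s_end) * (t + t_prev) / 2.0
        t_prev, s_prev = t, s
    return mean

def delay_draws(times, arm1, arm2, a=0.5, b=0.5, draws=4000, seed=7):
    """Sorted Monte Carlo draws of H = E[T1 - T2 | T < t_max]. Each draw samples every
    theta_ji from Beta(a + Y, b + R - Y), builds both survival curves and takes the
    difference of their conditional means."""
    rng = random.Random(seed)
    out = []
    for _ in range(draws):
        means = []
        for at_risk, events in (arm1, arm2):
            s, surv = 1.0, []
            for r, y in zip(at_risk, events):
                s *= 1.0 - rng.betavariate(a + y, b + r - y)
                surv.append(s)
            means.append(conditional_mean(times, surv))
        out.append(means[0] - means[1])
    return sorted(out)

# Konde, follow-up to day 42 (Table 3): (children at risk, first recurrences) per interval.
days = [7, 21, 28, 42]
asp = ([90, 87, 79, 58], [3, 8, 21, 16])
sp = ([86, 79, 62, 52], [7, 17, 10, 9])
h_draws = delay_draws(days, asp, sp)
n = len(h_draws)
median = h_draws[n // 2]
low, high = h_draws[int(0.025 * n)], h_draws[int(0.975 * n)]
print(f"posterior median delay {median:.2f} days, 95% interval ({low:.2f}, {high:.2f})")
```

A positive H means that, among children who do have a first recurrence, ASP postpones it relative to SP.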
Application to data
Cured Data
Table 9: Number of patients cured of malaria, with the number at start in parentheses (periods in days)
Location  Drug  (0–42]    (0–84]
KONDE     ASP   86 (90)   34 (90)
KONDE     SP    43 (86)   29 (86)
UZINI     ASP   63 (94)   56 (94)
UZINI     SP    63 (110)  57 (110)
Data for recurrence
We applied the models to the given data.
Table 10: Number with first recurrence of malaria, with the number at risk in parentheses (intervals in days)
Location  Drug  (0–7]     (7–21]   (21–28]  (28–42]  (42–56]  (56–84]
KONDE     ASP   3 (90)    8 (87)   21 (79)  16 (58)  7 (42)   3 (35)
KONDE     SP    7 (86)    17 (79)  10 (62)  9 (52)   10 (43)  4 (33)
UZINI     ASP   4 (94)    13 (90)  9 (77)   8 (68)   9 (60)   1 (51)
UZINI     SP    15 (110)  14 (95)  13 (81)  6 (68)   2 (62)   3 (60)
Results of simulations contd
Cure rate model
Table 12: Posterior Estimates: P(ASP>SP)
Period KONDE UZINI
(0–42] 0.55 0.94
(0–84] 0.71 0.87
Results of simulations contd
Cure rate model
Figure 1: Posterior distribution of treatment efficacy after 42 days
Figure 2: Posterior distribution of treatment efficacy after 84 days
Results of simulations contd
First recurrence at each follow–up
Figure 3: KONDE: posterior densities of first recurrence at each follow-up
Figure 4: UZINI: posterior densities of first recurrence at each follow-up
Results of simulations contd
Day 42 Posterior survival plots
Figure 5: Observed posterior survival plots, 42 days
Figure 6: Truncated posterior survival plots, 42 days
Results of simulations contd
Day 84 Posterior survival plots
Figure 7: Observed posterior survival plots, 84 days
Figure 8: Truncated posterior survival plots, 84 days
Results of simulations contd
Mean delay time to first recurrence
Table 13: Posterior estimates of the mean delay by ASP
Location     Period (days)  Parameter  Estimate  SE    2.5%   50%    97.5%
KONDE        0–42           µk1        6.38      1.50  3.35   6.40   9.23
KONDE        0–84           µk2        2.98      2.73  -2.43  3.00   8.28
UZINI        0–42           µu1        6.11      1.46  3.18   6.14   8.90
UZINI        0–84           µu2        7.78      2.70  2.34   7.827  12.96
KONDE–UZINI  0–42           µku        6.24      1.05  4.20   6.24   8.29
KONDE–UZINI  0–84           µku        5.41      1.92  1.64   5.41   9.17
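The pooled KONDE–UZINI rows of Table 13 appear numerically consistent, up to rounding, with an inverse-variance weighted average of the two site estimates and a normal-approximation 95% interval. The slides do not say how µku was obtained, so the sketch below is only a cross-check under that assumption; `inverse_variance_pool` is a hypothetical helper.

```python
import math


def inverse_variance_pool(estimates, std_errors):
    """Combine independent estimates by inverse-variance weighting.

    Returns (pooled estimate, pooled SE, approximate 95% interval).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total
    se = 1.0 / math.sqrt(total)
    # Normal approximation for the interval
    return pooled, se, (pooled - 1.96 * se, pooled + 1.96 * se)


# Site estimates and SEs taken from Table 13
print(inverse_variance_pool([6.38, 6.11], [1.50, 1.46]))  # 0-42 days
print(inverse_variance_pool([2.98, 7.78], [2.73, 2.70]))  # 0-84 days
```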
Results of simulations contd
Posterior densities
Figure 9: Posterior densities of the delay time, 42 days
Figure 10: Posterior densities of the delay time, 84 days
Discussion and Conclusions
Discussion
The results still support ACTs (artemisinin-based combination therapies) as the better treatment for uncomplicated malaria.
Contribution to existing knowledge
A major contribution to existing knowledge on the efficacy of malaria treatment is the estimation of the delay time.
In case of treatment failure, recipients of ASP stay asymptomatic longer: the delay time is about 7 days for a follow-up period of 42 days and about 6 days for a follow-up period of 84 days.
Results have been published in [2] (International Journal of Statistics in Medical Research, 2013).
Chapter 6: Effects of background variables
Motivation
In Chapter 5, we compared the efficacies of SP and ASP
without considering any covariates.
In the clinical study, background variables were recorded on the patients and on the severity of the infection at baseline.
We extend the work of the previous chapter by studying the effects of such variables on the recurrence or non-recurrence of malaria.
This is done using logistic regression analysis.
We chose classical logistic regression over Bayesian logistic regression because, with independent non-informative priors, the Bayesian approach gives almost the same results as the classical method.
A few background variables noted during the study
Covariates
Time: the date of enrolment of each child in the study, in calendar days (known in Konde).
Age: the age of the children recruited for the study.
Drug Type: artesunate plus sulfadoxine-pyrimethamine (ASP) or sulfadoxine-pyrimethamine (SP).
$R_i$, $S_i$, $M_i$ for three sites $i = 1, 2, 3$: these indicate whether only resistant genes, only sensitive genes, or both types were present in the blood sample.
D0p: the number of parasites per millilitre of blood on day zero.
Effects of background variables
Logistic model
Here Y is dichotomous: recurrence or non-recurrence of disease.
The expected value (mean) of Y is the probability that Y = 1, which lies between 0 and 1 inclusive.
If we let π = P(Y = 1), the odds π/(1 − π) take values in (0, +∞), and the log-odds ln[π/(1 − π)] take values in (−∞, +∞).
In multiple logistic regression, the probability of patient j being cured given the covariates, $\pi_j = P(Y_j = 1 \mid x_{1j}, x_{2j}, \dots, x_{kj})$, is written as
$$\ln\left(\frac{\pi_j}{1 - \pi_j}\right) = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \dots + \beta_k x_{kj}. \qquad (25)$$
Effects of background variables
Logit transformation
Inverting the logit transformation in (25), we have
$$\pi_j = \frac{\exp(\beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \dots + \beta_k x_{kj})}{1 + \exp(\beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \dots + \beta_k x_{kj})}. \qquad (26)$$
Regarding the subscript j: we do not observe a logit for each individual observation, only 0s and 1s. We therefore drop the subscript and simply write π instead of $\pi_j$ on the left-hand side of equations (25) and (26).
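In the classical approach, equations (25) and (26) are fitted by maximum likelihood. A minimal NumPy sketch of one standard algorithm, Newton-Raphson (equivalently iteratively reweighted least squares); the slides do not state which software or algorithm was used, and `fit_logistic_irls` is a hypothetical name.

```python
import numpy as np


def fit_logistic_irls(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood fit of logit(pi) = X @ beta by Newton-Raphson (IRLS).

    X: (n, p) design matrix whose first column is all ones (intercept).
    y: (n,) array of 0/1 outcomes.
    Returns beta and standard errors from the inverse Fisher information.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))          # inverse logit, as in (26)
        info = X.T @ ((pi * (1.0 - pi))[:, None] * X)   # Fisher information
        step = np.linalg.solve(info, X.T @ (y - pi))    # information^-1 times score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ ((pi * (1.0 - pi))[:, None] * X)
    return beta, np.sqrt(np.diag(np.linalg.inv(info)))


# Drug as the only covariate, UZINI (0-42] counts from Table 9:
# ASP 63 cured of 94, SP 63 cured of 110; y = 1 means cured.
drug_sp = np.r_[np.zeros(94), np.ones(110)]
cured = np.r_[np.ones(63), np.zeros(31), np.ones(63), np.zeros(47)]
X = np.column_stack([np.ones_like(drug_sp), drug_sp])
beta, se = fit_logistic_irls(X, cured)
print(beta, se)
```

With a single binary covariate the model is saturated, so the fitted drug coefficient equals the observed log odds ratio. It is not expected to reproduce the DrugSP row of Table 15, which comes from a model that also includes other covariates.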
Proposed models
The following models are motivated by our objective to determine the probability of no recurrence of malaria given clinical or other important variables measured during the clinical trial.
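$$\operatorname{logit}(\pi) = \beta_1 + \beta_2\,\mathrm{DRUG} + \beta_3 M + \beta_4 R + \beta_5 M'_1 + \beta_6 M'_2 + \beta_7 M'_3 + \beta_8 R'_1 + \beta_9 R'_2 + \beta_{10} R'_3. \qquad (27)$$
$$\operatorname{logit}(\pi) = \beta_1 + \beta_2 \log(\mathrm{D0p}) + \beta_3\,\mathrm{TIME} + \beta_4\,\mathrm{PCODE} + \beta_5\,\mathrm{DRUG} + \beta_6 M' + \beta_7 R'. \qquad (28)$$
$$\operatorname{logit}(\pi) = \beta_1 + \beta_2 \log(\mathrm{D0p}) + \beta_3\,\mathrm{AGE} + \beta_4\,\mathrm{PCODE} + \beta_5\,\mathrm{DRUG} + \beta_6 M' + \beta_7 R'. \qquad (29)$$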
Choosing the final models
Important covariates
The above models cannot adequately explain the probability of no first recurrence of parasites.
However, we formulated two reduced models that keep the clinically important variables.
In our case, drug type, PCODE, M′, R′ and S′ are clinically and intuitively important. The two models of interest are:
Final retained models
Important covariates
The following two models were retained:
$$\operatorname{logit}(\pi) = \beta_1 + \beta_2\,\mathrm{DRUG} + \beta_3\,\mathrm{PCODE} + \beta_4 M' + \beta_5 R'. \qquad (30)$$
$$\operatorname{logit}(\pi) = \beta_1 + \beta_2\,\mathrm{DRUG} + \beta_3\,\mathrm{PCODE} + \beta_4 S'. \qquad (31)$$
Applying these models to data from Konde and Uzini gives the results presented in Tables 14 and 15, respectively.
These values are also presented in these tables.
Final retained models
Results from a model in Konde Day 42
Table 14: Effect of some factors on the probability of cure in Konde
Estimates on Konde data, 0–42 days, Model (30)
Term       β       exp(β)  SE(β)  z-value  Pr(>|z|)
Intercept  0.016   1.016   0.837  0.019    0.985
DrugSP     -0.152  0.859   0.347  -0.438   0.661
PCODE      0.630   1.878   0.381  1.654    0.098
M′         -0.369  0.691   0.302  -1.225   0.221
R′         -0.175  0.840   0.303  -0.576   0.564
Null deviance: 194.03 on 141 df; residual deviance: 188.67 on 137 df (187.01 on 135 df)
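The exp(β), z-value and Pr(>|z|) columns follow from β and SE(β) under the usual Wald approximation: exp(β) is the odds ratio, z = β/SE(β), and the two-sided p-value is 2(1 − Φ(|z|)). A small standard-library sketch that recomputes them from the rounded Table 14 entries (so agreement is only up to rounding); `wald_summary` is a hypothetical helper.

```python
import math


def wald_summary(beta, se):
    """Odds ratio exp(beta), Wald z = beta / se, and two-sided normal p-value."""
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return math.exp(beta), z, p


# (beta, SE) pairs taken from Table 14 (Konde, Model 30)
rows = {"Intercept": (0.016, 0.837), "DrugSP": (-0.152, 0.347), "PCODE": (0.630, 0.381)}
for name, (b, s) in rows.items():
    odds_ratio, z, p = wald_summary(b, s)
    print(f"{name}: exp(beta)={odds_ratio:.3f}  z={z:.3f}  p={p:.3f}")
```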
Final retained models
Results from a model in Uzini Day 42
Table 15: Effect of some factors on the probability of cure in Uzini
Estimates on Uzini data, 0–42 days, Model (30)
Term       β       exp(β)  SE(β)  z-value  Pr(>|z|)
Intercept  0.381   1.464   0.577  0.661    0.509
DrugSP     -0.555  0.574   0.309  -1.798   0.072
PCODE      -0.483  0.617   0.453  -1.067   0.286
M′         0.266   1.304   0.213  1.249    0.212
R′         0.365   1.440   0.173  2.111    0.035∗
Null deviance: 256.97 on 192 df; residual deviance: 248.19 on 188 df (242.95 on 186 df)
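The deviance lines of Tables 14 and 15 allow a likelihood-ratio (drop-in-deviance) test of Model (30) against the intercept-only model: the drop in deviance is compared with a chi-square distribution whose degrees of freedom equal the drop in df (here 141 − 137 = 192 − 188 = 4). This is offered as one way to gauge overall fit; the slides do not report such a test. The standard-library sketch uses the closed-form chi-square tail, valid only for even df, and the helper names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with even df.

    Closed form: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(df // 2):
        if i > 0:
            term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def deviance_test(null_dev, null_df, resid_dev, resid_df):
    """Likelihood-ratio statistic, its df, and the upper-tail p-value."""
    stat = null_dev - resid_dev
    df = null_df - resid_df
    return stat, df, chi2_sf_even_df(stat, df)


print(deviance_test(194.03, 141, 188.67, 137))  # Konde, Model (30)
print(deviance_test(256.97, 192, 248.19, 188))  # Uzini, Model (30)
```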
Some results
Interpretation of results for Day 42 in Konde
A large number of parasites in the blood increases the chance of being cured, especially in Konde.
The probability of being cured is higher with S genes.
Time is not significant.
Resistance is associated more with M genes.
General Conclusions
Methodology
We have built probability, statistical and survival models motivated by data from a clinical trial on the efficacy of two treatments.
The thesis combines modelling theory and application in a single work.
Probability models for the estimation of haplotype frequencies were proposed, and the better model was selected by model discrimination procedures.
Haplotype frequencies would have been underestimated had we not used combinatorics to uncover the possible hidden haplotypes.
Conclusions
Important results
There were eight different haplotypes that could infect victims
independently.
Haplotypes with sensitive genes at the second and third positions decreased at the first episode of the disease after the start of SP administration.
SP was not effective in killing all parasites with resistant strains, and the surviving parasites were responsible for the recurrence of malaria.
ASP, known for faster parasite clearance, cleared all parasites, so any first episode of malaria should be due to a new infection.
On average, sick children could have been bitten 1 to 3 times by mosquitoes, and 1 to 3 different haplotypes were present at baseline.
Conclusions
Important results 2
It is well established that the spread of resistance to SP may
be delayed by its combination with artesunate.
How long one may remain free from the parasites has not
been estimated to the best of our knowledge.
We obtained Bayesian estimates of this duration: up to 7 days for a follow-up period of 42 days and 6 days for a follow-up period of 84 days.
The logistic models cautiously suggest that the higher the parasite density, the smaller the risk of a recurrence of malaria.
The children had no partial immunity.
Recurrence of malaria was most common in children harbouring multiple infections, followed by children carrying a single resistant strain.
Room for improvement
For the future
We assumed that recurrence times are uniformly distributed within the intervals between follow-up dates, which is a limitation. Why not other distributions, such as the exponential?
We focused only on the event of a first recurrence of malaria. Why not a second or third recurrence?
The methods can be generalized to clinical investigations
involving more than two study sites and using more than two
treatments.
Remark
Are we perfect?
Models will be useful in the fight against malaria if they are formulated with important biological and practical realities in mind and their results are interpreted with care.
Some of the models were rejected, but rejecting them does not remove the intrinsic biological questions that motivated them.
Models in general, malaria models included, are not perfect, just as the real world is not.
However, their findings can be useful to the global malaria control community.
Thank you for your attention
References I
[1] Irwin, J. (1949). The standard error of an estimate of expectation of life, with special reference to expectation of tumourless life in experiments with mice. Journal of Hygiene, 47(2):188–189.
[2] Kum, C. K., Thorburn, D., Ghilagaber, G., Gil, P., and Björkman, A. (2013a). A nonparametric Bayesian approach to estimating malaria prophylactic effect after two treatments. International Journal of Statistics in Medical Research, 2(2):76–87.
[3] Kum, C. K., Thorburn, D., Ghilagaber, G., Gil, P., and Björkman, A. (2013b). On the effects of malaria treatment on parasite drug resistance: Probability modelling of genotyped malaria infections. The International Journal of Biostatistics, 9(1):1–14.
[4] Nelder, J. and Mead, R. (1965). A simplex method for function minimization. The Computer Journal, 7(4):308–313.