A Bayesian approach to identifying high-concern individuals in an infection-bearing population
Anqi Dong
March 9, 2012
1 Background
The delayed diagnosis and treatment of individuals who are carrying infectious diseases can place a large burden on the healthcare system [4, 8] and the general population. Such delays are especially worrisome for infections with prolonged asymptomatic periods, such as HIV, chlamydia, and gonorrhea; in environments where many persons have vulnerable immune systems, such as hospitals and nursing homes; and in environments characterized by high interpersonal contact rates, such as schools and prisons. Public health authorities use methods such as contact tracing to locate contacts of reported infected individuals and determine whether they are infected as well. The lack of a direct connection between an individual’s contact patterns and that individual’s infection state means that contact tracing procedures often need to be fairly exhaustive in order not to miss infected individuals [9, 13].
In certain contexts, a great deal of data is collected about the individual-level characteristics of an infection within a population. One key type of such data is descriptions of potentially infection-transmitting person-to-person interactions. These interactions are generally represented using contact networks: graphs with vertices representing persons and edges connecting persons who interact with each other [3]. For large, well-mixed populations, such as those of cities, it is difficult, and often infeasible, to obtain useful sets of such data [11]. However, traditional contact tracing provides data on individuals during outbreak investigations. In addition, for smaller, relatively closed settings, such as hospitals, nursing homes, and schools, it is possible to gather detailed epidemiological data by distributing on-body proximity sensors to the entire population to log persons’ contacts [6, 12].
Bayesian statistical approaches are valuable tools for the analysis of epidemiological data. Current literature on using Bayesian techniques in the context of infection spread on a heterogeneously mixing contact network generally focuses on inferring infection parameters such as the average probability of infection [1, 2, 5, 7, 10]. Current techniques assume that data on the structure of contact networks is virtually nonexistent, and therefore either ignore the effects of contact networks on the infection or sample from generated networks of a simple family of graphs (for example, Bernoulli random graphs) as part of the Bayesian approach [2, 5]. As a result, there is very little work on inferring additional individual-level characteristics from epidemiological data, especially when knowledge of the contact network is good.
Analyses of available data on contact networks and individual epidemiological records could better inform healthcare systems about various infection patterns and trends, so that they can better target their limited efforts and resources. Such efficiencies include prioritizing contact tracing and testing to first assess persons who haven’t reported but who are likely to be infected, and first vaccinating those who are uninfected but who are at high risk of immediate infection.
Coupled with an automated, continuous data-gathering system such as iEpi [6], an inference system could provide quasi-real-time predictions in institutional settings, identifying unknown sources of infection or patients at high risk of becoming infected. The spatial data provided by such a data-gathering system could even allow the inference to account for hidden environmental pathogen reservoirs, such as a contaminated surface at a particular location.
2 Project goals
To implement an inference system that identifies persons of interest or concern in a contact network using an incomplete set of epidemiological data. These persons of interest include likely high spreaders of infection, persons who are probably infected but whose infection statuses are unknown, and persons who are probably not currently infected but are likely to become infected soon.
To design a general mathematical framework for performing inference on individual-level contact data and infection histories.
3 Methods
3.1 Epidemiological model
We consider a contact network C of n persons. In terms of infection spread, the network is idealized as a closed one, meaning that infection cannot enter C except via a small set of persons: the index infectives (infectious individuals) of the population. This network can be heterogeneous: the persons in C do not necessarily have the same number of contacts or patterns of connection.
Let C be made up of persons $p_1, p_2, \ldots, p_n$. In C, each person $p_i$ has a set of contacts, $C(p_i)$. For each person $p_j$ that is an element of $C(p_i)$, we can define $c_{p_j \to p_i}(t)$, the rate of contact from $p_j$ to $p_i$ at time $t$. This rate is not symmetric, so $c_{p_j \to p_i}(t) \neq c_{p_i \to p_j}(t)$ in general. The function $c_{p_j \to p_i}(t)$ reflects the number of potentially infection-transmitting contacts from $p_j$ to $p_i$ per unit time. Depending on the pathogen, a “contact” can be an event such as sneezing, needle-sharing, or sexual contact.
We use the convention that $c_{p_j \to p_i}(t) = 0$ if $p_j$ is not contacting $p_i$ at $t$. This occurs, for example, if $p_j$ and $p_i$ do not contact each other, and also when $p_j$ is not infected. Per contact, an uninfected person has a certain probability β of becoming infected as the result of that contact. Infection status is a boolean state: a person is either infected or not, and a person cannot be reinfected when already infected. We refer to the product of β and the cumulative number of contacts $p_i$ experiences per unit time at some time as the “infection pressure” felt by $p_i$ at that time.
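The definition of infection pressure can be illustrated with a minimal Python sketch. The dictionary layout, the function name `infection_pressure`, and the numeric values are assumptions made for illustration only; the rates are taken to be the current values of the contact-rate function, already set to zero for uninfected or non-contacting sources.

```python
from typing import Dict, Tuple

# Hypothetical layout: contact_rates[(src, dst)] holds the current value of the
# contact rate from person src to person dst, with 0 for uninfected sources.
ContactRates = Dict[Tuple[str, str], float]


def infection_pressure(person: str, contact_rates: ContactRates, beta: float) -> float:
    """Return beta times the total incoming contact rate felt by `person`."""
    incoming = sum(rate for (_, dst), rate in contact_rates.items() if dst == person)
    return beta * incoming


# Illustrative values only: A and B are infected and contact C; C also contacts A.
rates = {("A", "C"): 2.0, ("B", "C"): 1.0, ("C", "A"): 4.0}
print(infection_pressure("C", rates, beta=0.1))
```

Because the rates are stored per directed pair, the asymmetry between the two directions of contact is preserved without any extra bookkeeping.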
After becoming infected, each person is represented as presenting their infection following a second-order delay. In this paper, we make the simplifying assumption that all patients are treated for their infection upon presentation, meaning that they will not continue spreading infection after presentation.
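One common reading of a second-order delay is two identical first-order (exponential) stages in series, which gives an Erlang-2 distributed delay. The sketch below adopts that reading as an assumption; the report does not state the exact form or the parameter values, and the names `presentation_delay_density`, `sample_presentation_time`, and `mean_delay` are hypothetical.

```python
import math
import random


def presentation_delay_density(delay: float, mean_delay: float) -> float:
    """Erlang-2 density of the infection-to-presentation delay (assumed form)."""
    if delay < 0:
        return 0.0
    stage_rate = 2.0 / mean_delay  # each of the two stages has mean mean_delay / 2
    return stage_rate ** 2 * delay * math.exp(-stage_rate * delay)


def sample_presentation_time(infection_time: float, mean_delay: float,
                             rng: random.Random) -> float:
    """Draw a presentation time as infection time plus two exponential stages."""
    stage_rate = 2.0 / mean_delay
    delay = rng.expovariate(stage_rate) + rng.expovariate(stage_rate)
    return infection_time + delay


rng = random.Random(0)
samples = [sample_presentation_time(1.0, 2.0, rng) for _ in range(20_000)]
print(sum(samples) / len(samples))  # sample mean; expected to lie near 1.0 + 2.0
```

Under this assumption the density is zero at a delay of zero and peaks later, so immediate presentation is unlikely, unlike with a single exponential stage.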
We assume that a patient cannot naturally recover from an infection, that is, recover without healthcare intervention. We also assume herein that each person can become infected at most once.
In order to create a general inference framework, this model is not specific to a particular pathogen, but instead can be applied reasonably well to a range of microparasitic infections. Our model parameters are thus not necessarily representative of a specific disease.
3.2 Inference methods
Let $i_p$ and $t_p$ be, respectively, the infection and presentation times of some arbitrary person p. Consider
$$P(\{t_p\}, \{i_p\} \mid C), \tag{1}$$
the probability density of some presentation and infection times, given a contact network. For brevity, we omit some terms of this equation when discussing it below.
By integrating or summing over some range of values, and comparing the cumulative probability of this subset of values to the probability of the universal set of all permissible values, we can determine how likely the subset of data is to occur. As there are many types of data embedded in (1), the above probability density, with manipulation, provides a rich set of probabilistic information about the sets of infection and presentation times and C. Here, we focus on determining probabilities related to the infection time of a certain person p.
$P(i_p)$ means “the probability density of person p becoming infected at time $i_p$”. This implies two things: that person p was not infected before time $i_p$, and that person p was infected exactly at time $i_p$. Because $P(i_p)$ is a probability density, to find the probability of person p becoming infected during the interval [a, b] we integrate, finding the value of $\int_a^b P(i_p)\,di_p$. Alternatively, if we know that person p must have been infected somewhere in the interval [c, d], we can use the definition of conditional probability to find that the probability of $i_p$ being in the interval [a, b] is
$$\frac{\int_a^b P(i_p)\,di_p}{P(U)} = \frac{\int_a^b P(i_p)\,di_p}{\int_c^d P(i_p)\,di_p},$$
where U is the universal set.
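The ratio of integrals above can be sketched numerically. The density used here is a hypothetical stand-in (a decaying exponential), not the report's closed-form expression, and the helper names are illustrative; the sketch also assumes that [a, b] lies inside [c, d].

```python
import math
from typing import Callable


def integrate(f: Callable[[float], float], lo: float, hi: float,
              steps: int = 10_000) -> float:
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, steps):
        total += f(lo + k * h)
    return total * h


def prob_in_interval(density: Callable[[float], float],
                     a: float, b: float, c: float, d: float) -> float:
    """P(a <= i_p <= b), given that i_p is known to lie in [c, d]."""
    if not (c <= a <= b <= d):
        raise ValueError("[a, b] must lie inside [c, d]")
    return integrate(density, a, b) / integrate(density, c, d)


def toy_density(i_p: float) -> float:
    """Hypothetical, unnormalized infection-time density used for illustration."""
    return math.exp(-i_p)


print(prob_in_interval(toy_density, 0.0, 1.0, 0.0, 2.0))
```

Because only a ratio is taken, the density does not need to be normalized beforehand; any constant factor cancels between numerator and denominator.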
If we know the presentation or infection time for a person, we can simply insert that value into (1). However, epidemiological data is often scarcer, and many presentation and infection times are not available. In this case, we can marginalize the probability through integration.
For example, for some person q, if $i_q$ is known to be in the range [a, b], and $t_q$ is known to be within the range [c, d], the probability that $i_q < k$ for some k, where a < k < b, can be calculated as
$$\frac{\int_a^k P(i_q)\,di_q}{\int_a^b P(i_q)\,di_q} = \frac{\int_a^k \int_c^d P(i_q, t_q)\,dt_q\,di_q}{\int_a^b \int_c^d P(i_q, t_q)\,dt_q\,di_q}. \tag{2}$$
Equation 2 has some important consequences: the probability that person p is already infected but has not presented is equivalent to the probability that $i_p < T$ and $t_p > T$ (where T is the current time). Also, the probability that p is uninfected but will be infected “soon” is equivalent to the probability that $T < i_p < T + \Delta t$, where $\Delta t$ quantifies the duration of “soon”.
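As a worked form of the first case, assume that the joint density $P(i_p, t_p)$ of one person's infection and presentation times has already been normalized over all permissible values (otherwise the right-hand side would be divided by the same integral taken over the universal set). The probability that p is infected but has not yet presented at the current time T could then be written as follows. The infinite limits are placeholders for whatever bounds are actually known for p.

```latex
P(i_p < T,\; t_p > T) \;=\; \int_{-\infty}^{T} \int_{T}^{\infty} P(i_p, t_p)\, dt_p\, di_p
```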
In our inference model, we additionally allow for the case where some individuals have been tested in the past for their “infection status” (infected/uninfected) at that time. We incorporate this testing data by enforcing additional bounds on the infection times. For example, if person p tested uninfected at time $x_1$ and was found to be infected at time $x_2$ (where $x_2 > x_1$), we know that $x_1 < i_p < x_2$. We can perform similar bounding with a presentation time: if p has not yet presented at the present time T, we know that $t_p > T$. However, the probability density function itself remains unchanged, as knowledge about infection status does not affect how the infection behaves.
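The bounding step can be sketched as a small helper that turns a person's test history into an open interval for the infection time. The function name, the tuple layout of the test records, and the optional `presented_at` argument are assumptions for illustration, not part of the report's implementation.

```python
import math
from typing import Iterable, Optional, Tuple


def infection_time_bounds(tests: Iterable[Tuple[float, bool]],
                          presented_at: Optional[float] = None) -> Tuple[float, float]:
    """Return (lower, upper) such that lower < i_p < upper.

    `tests` holds (time, infected) pairs; `presented_at` is the presentation
    time if the person has presented, since infection must precede presentation.
    """
    lower, upper = -math.inf, math.inf
    for time, infected in tests:
        if infected:
            upper = min(upper, time)  # already infected at this time
        else:
            lower = max(lower, time)  # still uninfected at this time
    if presented_at is not None:
        upper = min(upper, presented_at)
    if lower >= upper:
        raise ValueError("test history is contradictory under this model")
    return lower, upper


# Tested uninfected at x1 = 1.0 and infected at x2 = 2.5.
print(infection_time_bounds([(1.0, False), (2.5, True)]))
```

The resulting interval only restricts the limits of integration; the density being integrated is left unchanged, as described above.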
To calculate the numerical value of (1) and related equations, we factor the probability into a product of probabilities, with terms of the general form $P(t_p \mid i_p)\,P\big(i_p \mid \{i_q\}_{q \in C(p)}\big)$. Both of these probability terms are expressed in closed form using typical epidemiological representations of infection.
3.2.1 Infection time partial ordering
If, say, both $i_A$ and $i_B$ are unknown, and A and B are connected, we may not be able to determine the direction of infection pressure (A → B versus B → A). To resolve this ambiguity, we marginalize the probability in Equation 1 as follows:
$$P(\{t_p\}, \{i_p\} \mid C) = \sum_{d \in D} P(\{t_p\}, \{i_p\}, d \mid C).$$
The directed acyclic graph (DAG) d imposes a topological ordering on C. For each edge, d specifies which person of the pair was infected first, thereby also specifying the directionality of infection pressure. The set D contains all the permissible DAGs that contain all the vertices of C and provide an ordering for all edges of C. DAGs that contradict other knowledge about the ordering of infection times are excluded from D.
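For a small network, the set D can be enumerated by brute force: try both directions for every edge, keep the acyclic orientations, and drop those that conflict with known orderings. The sketch below is one way to do this under stated assumptions; the function names and the `known_before` argument are hypothetical, and the cost grows as 2 to the power of the number of edges, so it is only practical for very small graphs.

```python
from itertools import product
from typing import List, Sequence, Tuple

Edge = Tuple[str, str]


def is_acyclic(vertices: Sequence[str], directed_edges: Sequence[Edge]) -> bool:
    """Kahn's algorithm: the graph is acyclic iff every vertex can be removed."""
    indegree = {v: 0 for v in vertices}
    successors = {v: [] for v in vertices}
    for src, dst in directed_edges:
        successors[src].append(dst)
        indegree[dst] += 1
    ready = [v for v in vertices if indegree[v] == 0]
    removed = 0
    while ready:
        v = ready.pop()
        removed += 1
        for w in successors[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    return removed == len(vertices)


def permissible_dags(vertices: Sequence[str], edges: Sequence[Edge],
                     known_before: Sequence[Edge] = ()) -> List[List[Edge]]:
    """Enumerate orientations of `edges` that are acyclic and respect known orderings.

    A pair (u, v) in `known_before` means u is known to be infected before v.
    Known orderings are added as extra edges so that any conflict forms a cycle.
    """
    dags = []
    for flips in product((False, True), repeat=len(edges)):
        oriented = [(v, u) if flip else (u, v) for (u, v), flip in zip(edges, flips)]
        if is_acyclic(vertices, oriented + list(known_before)):
            dags.append(oriented)
    return dags


triangle = [("A", "B"), ("B", "C"), ("A", "C")]
print(len(permissible_dags(["A", "B", "C"], triangle)))
print(len(permissible_dags(["A", "B", "C"], triangle, known_before=[("A", "B")])))
```

For the three-person triangle, the two cyclic orientations out of eight are rejected, and knowing that A was infected before B removes every orientation containing B → A.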
When computing probabilities considering only infection and presentation times, d is a nuisance parameter, and we mathematically rewrite the expressions to eliminate the use of d and D in the final integral to be evaluated.
3.3 Assessment of inference
Sources in the literature generally assess the accuracy and performance of their inference algorithms by running their inference models on historical datasets and discussing the plausibility of the inferred results. While such demonstrations are valuable in showing the practicality of inference results, it is difficult to validate statistical measures against historical data.
To assess the performance of our inference algorithm, I instead developed and used a simulation model of infection spread. This model represents the infection mechanisms described above on a best-effort basis, simulating infection spread and testing for all individuals contained within a computer-generated contact network.
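A simulation of this kind can be sketched as a simple time-stepped loop. The version below is not the author's simulation model; it assumes constant contact rates, a per-step infection probability of 1 − exp(−pressure · dt), and an Erlang-2 presentation delay as one reading of a second-order delay. All names and parameter values are illustrative, and testing events are omitted for brevity.

```python
import math
import random
from typing import Dict, Iterable, Tuple


def erlang2(mean_delay: float, rng: random.Random) -> float:
    """Sum of two exponential stages, each with mean mean_delay / 2 (assumed form)."""
    rate = 2.0 / mean_delay
    return rng.expovariate(rate) + rng.expovariate(rate)


def simulate(contact_rates: Dict[Tuple[str, str], float], index_cases: Iterable[str],
             beta: float, mean_delay: float, t_end: float, dt: float,
             rng: random.Random):
    """Return (infection_time, presentation_time) dicts for everyone infected."""
    index_cases = sorted(index_cases)
    persons = sorted({p for edge in contact_rates for p in edge} | set(index_cases))
    infection_time, presentation_time = {}, {}
    for p in index_cases:
        infection_time[p] = 0.0
        presentation_time[p] = erlang2(mean_delay, rng)
    for step in range(int(round(t_end / dt))):
        t = step * dt
        # Infected persons spread infection only until they present and are treated.
        infectious = [q for q in infection_time
                      if infection_time[q] <= t < presentation_time[q]]
        for p in persons:
            if p in infection_time:
                continue  # each person is infected at most once
            pressure = beta * sum(contact_rates.get((q, p), 0.0) for q in infectious)
            if rng.random() < 1.0 - math.exp(-pressure * dt):
                infection_time[p] = t + dt
                presentation_time[p] = t + dt + erlang2(mean_delay, rng)
    return infection_time, presentation_time


rates = {("A", "B"): 3.0, ("B", "A"): 3.0, ("B", "C"): 1.0, ("C", "B"): 1.0}
infected, presented = simulate(rates, ["A"], beta=0.2, mean_delay=2.0,
                               t_end=6.0, dt=0.01, rng=random.Random(42))
print(sorted(infected.items()))
```

Recording the true infection times alongside the presentation times is what allows the inference output to be compared afterwards with data that the inference never saw.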
4 Results and discussion
My primary method of assessing the numerical behavior of the inference algorithm was to plot the probability of person p becoming infected before time x (given the contact network, some presentation times, and some patient history as recorded at some time t, where t may be less than x) as a function of x. An example of this can be seen in Figure 1. I plotted this figure by manually splitting the desired integral along $i_p$ into several integrals with mutually exclusive regions.
Generally, the results produced by the inference appear reasonable when compared against the additional data produced by the simulation model (data that was not used in the inference). That is, the time at which p was infected in the simulation model is usually close to or at a part of the integral (the cumulative probability curve) with a high rate of change. However, the outcomes of the simulation model are not necessarily highly probable, and the probability that $i_p < t$ is not a perfect analogue of the probability density at $i_p = t$, so such a comparison is not definitive.
Making use of data on patients’ infection history can lead to the probability densities exhibiting subtle behavior. For example, if person X tested uninfected at time 2.95, inference will usually suggest that there is a low probability that X was infected by time 3.00. However, if it was not known that X was uninfected at time 2.95, the probability that $i_X < 3.00$ may be much higher, for there would then be no restriction that $i_X \geq 2.95$. Here, knowledge that $i_X \geq 2.95$ did not change the probability density at $i_X = 3.00$, but it did change the probability that $i_X < 3.00$. This further demonstrates that the probability that $i_p < t$ is not analogous to the probability density at $i_p = t$.
Monte Carlo numerical integration (MCI) techniques were used to evaluate the probabilities required for inference. As MCI is a stochastic technique, it is difficult to properly estimate its precision without detailed mathematical knowledge of the specific integrand. While error estimators such as the one in Mathematica proved to be inaccurate assessors of the MCI’s precision, tests on smaller, symbolically integrable functions showed that the MCI implementation used was usually within an order of magnitude of the exact value of the integral. Considering that integrals over different intervals routinely differ from each other by ratios of $10^{10}$ or more, order-of-magnitude precision is probably sufficient for most statistical uses.
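The kind of check described above, comparing MCI against a symbolically integrable function, can be sketched in a few lines. This is plain uniform-sampling Monte Carlo in one dimension, not the Mathematica routine used in the report, and the integrand and sample count are chosen only for illustration.

```python
import math
import random
from typing import Callable, Tuple


def mc_integrate(f: Callable[[float], float], lo: float, hi: float,
                 n: int, rng: random.Random) -> Tuple[float, float]:
    """Estimate the integral of f over [lo, hi]; return (estimate, standard error)."""
    width = hi - lo
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        y = f(rng.uniform(lo, hi))
        total += y
        total_sq += y * y
    mean = total / n
    # Unbiased sample variance of f at the sampled points (clamped at zero).
    variance = max(total_sq / n - mean * mean, 0.0) * n / (n - 1)
    return width * mean, width * math.sqrt(variance / n)


def integrand(x: float) -> float:
    """Test function with a known antiderivative."""
    return math.exp(-x)


estimate, std_error = mc_integrate(integrand, 0.0, 2.0, 100_000, random.Random(1))
exact = 1.0 - math.exp(-2.0)
print(estimate, std_error, exact)
```

The standard error returned here is itself only an estimate from the same samples, which is one reason such error figures can be unreliable for sharply peaked, high-dimensional integrands.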
Increasing the number of individuals in C leads to higher-dimensional integrals and a smaller integrand (in terms of absolute value). For larger graphs, because of the high variance observed when performing multiple evaluations of the same integral, the simple MCI techniques in Mathematica (the techniques that are currently used) will be inadequate when scaling up the inference algorithm to large contact networks.
[Figure 1 plot. Horizontal axis: model time (0 to 6). Vertical axis: probability that person has already become infected (0% to 100%). Series: ip4, ip5, ip6, ip7.]
Figure 1: A plot of the cumulative probability of each person in a four-person contact network being already infected, as a function of time.
However, using a well-designed Markov chain process for point sampling during the integration would lead to better integrand stability and allow inference to be performed on large contact networks in a reasonable amount of time. Work is under way to implement this feature in the inference algorithm.
It may be the case that inputting higher-degree contact networks into the inference model leads to tighter inferred distributions, because larger contact networks generally embed more information and heterogeneity. The increased amount of available data means better statistics can be inferred. However, some reengineering of the integration mechanism is likely required before large-scale testing of higher-degree contact networks can be performed.
5 Conclusions
The inference algorithms I developed demonstrate that it is possible to infer distributions for the likelihood of becoming infected at a certain time from limited epidemiological data (contact network structure, some presentation times, and some infection testing history), even if this time is in the future. However, when using Bayesian probabilistic techniques, it is important to remember that they are neither omniscient nor foolproof. The inference techniques described herein, while potentially very powerful, can be uninformative or even misleading if used to analyze significantly erroneous data or insufficient sets of data.
Though a discussion of the required procedures is beyond the scope of this report, it is clear that our inference algorithm can be easily adapted mathematically to represent an even wider range of scenarios and to infer more types of data. Possible extensions of this inference model in the near future include representing natural recovery, allowing for dynamic (evolving) contact networks, modeling static sources of infection, and determining the probability of a particular directed edge spreading infection.
6 Acknowledgments
I would like to thank Dr. Michael Horsch and Dr. Nathaniel Osgood of the University of Saskatchewan for their oversight and their suggestions.
References
[1] T. Britton, T. Kypraios, and P. D. O’Neill. Inference for epidemics with three levels of mixing:
Methodology and application to a measles outbreak. Scandinavian Journal of Statistics,
38(3):578–599, 2011.
[2] T. Britton and P. D. O’Neill. Bayesian inference for stochastic epidemics in populations with
random social structure. Scandinavian Journal of Statistics, 29:375–390, 2002.
[3] K. T. D. Eames and M. J. Keeling. Contact tracing and disease control. Proceedings of the
Royal Society of London B, 270:2565–2571, 2003.
[4] J. A. Fleishman, B. R. Yehia, R. D. Moore, K. A. Gebo, and HIV Research Network. The
economic burden of late entry into medical care for patients with HIV infection. Medical Care,
48(12):1071–1079, 2010.
[5] C. Groendyke, D. Welch, and D. R. Hunter. Bayesian inference for contact networks given
epidemic data. Scandinavian Journal of Statistics, 38:600–616, 2011.
[6] M. Hashemian, K. G. Stanley, D. L. Knowles, J. Calver, and N. D. Osgood. Human network
data collection in the wild: The epidemiological utility of micro-contact and location data. In
Proceedings of the ACM SIGHIT International Health Informatics Symposium (IHI 2012),
Miami, FL, January 28–30 2012.
[7] Y. Hosseinkashi. Statistical Inference on Stochastic Graphs. PhD thesis, Department of
Statistics, University of Waterloo, 2011.
[8] H. B. Krentz, M. C. Auld, and M. J. Gill. The high cost of medical care for patients who
present late (CD4 < 200 cells/µl) with HIV infection. HIV Medicine, 5:93–98, 2004.
[9] C. Mulder, C. G. M. Erkens, P. M. Kouw, E. M. Huisman, W. Meijer-Veldman, M. W. Borgdorff,
and F. van Leth. Missed opportunities in tuberculosis control in the Netherlands due to
prioritization of contact investigations. European Journal of Public Health (advance access),
2011.
[10] J. Ray and Y. M. Marzouk. A Bayesian method for inferring transmission chains in a partially
observed epidemic. In Proceedings of the Joint Statistical Meetings, Denver, CO, 2010. Sandia
National Laboratories.
[11] J. Read, K. Eames, and W. Edmunds. Dynamic social networks and the implications for the
spread of infectious disease. Journal of the Royal Society Interface, 5:1001–1007, 2008.
[12] M. Salathé, M. Kazandjieva, J. W. Lee, P. Levis, M. W. Feldman, and J. H. Jones. A high-
resolution human contact network for infectious disease transmission. Proceedings of the
National Academy of Sciences of the USA, 107(51):22020–22025, 2010.
[13] J. Veen. Microepidemics of tuberculosis: the stone-in-the-pond principle. Tubercle and Lung
Disease, 73:73–76, 1992.
March 9, 2012 Anqi Dong Page 6 of 6

More Related Content

Viewers also liked

AgilePM 03: Incremental delivery
AgilePM 03: Incremental deliveryAgilePM 03: Incremental delivery
AgilePM 03: Incremental delivery
Frank Turley
 
Project1 Jessin Jose
Project1 Jessin JoseProject1 Jessin Jose
Project1 Jessin Jose
Jessin
 
Management Practices in Real Life
Management Practices in Real LifeManagement Practices in Real Life
Management Practices in Real Life
Ibrahim Rasel
 
Are comparative ads persuasive
Are comparative ads persuasiveAre comparative ads persuasive
Are comparative ads persuasiveGraceYLi
 
8 09 Tech Marketing Ideas
8 09 Tech Marketing Ideas8 09 Tech Marketing Ideas
8 09 Tech Marketing Ideas
Mandy de Leon
 
Non geeks-big-data-playbook-106947
Non geeks-big-data-playbook-106947Non geeks-big-data-playbook-106947
Non geeks-big-data-playbook-106947
CMR WORLD TECH
 
Presentación cinc indrets de bcn
Presentación cinc indrets de bcnPresentación cinc indrets de bcn
Presentación cinc indrets de bcn
cfapalaudemar
 
Media literacy(quizzes)
Media literacy(quizzes)Media literacy(quizzes)
Media literacy(quizzes)
Jayr Nator
 
Patterns for Asynchronous Microservices with NATS
Patterns for Asynchronous Microservices with NATSPatterns for Asynchronous Microservices with NATS
Patterns for Asynchronous Microservices with NATSRaül Pérez
 
pool campus 2
pool campus 2pool campus 2
pool campus 2
chetan9212
 
UCL ISOP January 2017 - Living in London
UCL ISOP January 2017 - Living in LondonUCL ISOP January 2017 - Living in London
UCL ISOP January 2017 - Living in London
wdurdle
 
оценка качества интернет ресурсов тан сяотан
оценка качества интернет ресурсов тан сяотаноценка качества интернет ресурсов тан сяотан
оценка качества интернет ресурсов тан сяотан
Xiaotang tang
 
Innovation Assessment Questionnaire
Innovation Assessment QuestionnaireInnovation Assessment Questionnaire
Innovation Assessment Questionnaire
Boardroom Metrics
 
Shooting schedule-overview complete
Shooting schedule-overview completeShooting schedule-overview complete
Shooting schedule-overview complete
rhsmediastudies
 

Viewers also liked (15)

AgilePM 03: Incremental delivery
AgilePM 03: Incremental deliveryAgilePM 03: Incremental delivery
AgilePM 03: Incremental delivery
 
Project1 Jessin Jose
Project1 Jessin JoseProject1 Jessin Jose
Project1 Jessin Jose
 
Management Practices in Real Life
Management Practices in Real LifeManagement Practices in Real Life
Management Practices in Real Life
 
Are comparative ads persuasive
Are comparative ads persuasiveAre comparative ads persuasive
Are comparative ads persuasive
 
8 09 Tech Marketing Ideas
8 09 Tech Marketing Ideas8 09 Tech Marketing Ideas
8 09 Tech Marketing Ideas
 
Non geeks-big-data-playbook-106947
Non geeks-big-data-playbook-106947Non geeks-big-data-playbook-106947
Non geeks-big-data-playbook-106947
 
Presentación cinc indrets de bcn
Presentación cinc indrets de bcnPresentación cinc indrets de bcn
Presentación cinc indrets de bcn
 
Media literacy(quizzes)
Media literacy(quizzes)Media literacy(quizzes)
Media literacy(quizzes)
 
Patterns for Asynchronous Microservices with NATS
Patterns for Asynchronous Microservices with NATSPatterns for Asynchronous Microservices with NATS
Patterns for Asynchronous Microservices with NATS
 
pool campus 2
pool campus 2pool campus 2
pool campus 2
 
UCL ISOP January 2017 - Living in London
UCL ISOP January 2017 - Living in LondonUCL ISOP January 2017 - Living in London
UCL ISOP January 2017 - Living in London
 
Smis3
Smis3Smis3
Smis3
 
оценка качества интернет ресурсов тан сяотан
оценка качества интернет ресурсов тан сяотаноценка качества интернет ресурсов тан сяотан
оценка качества интернет ресурсов тан сяотан
 
Innovation Assessment Questionnaire
Innovation Assessment QuestionnaireInnovation Assessment Questionnaire
Innovation Assessment Questionnaire
 
Shooting schedule-overview complete
Shooting schedule-overview completeShooting schedule-overview complete
Shooting schedule-overview complete
 

Similar to ysf_report

Descriptive epidemiology
Descriptive epidemiologyDescriptive epidemiology
Descriptive epidemiology
Navas Vadakkangara
 
Tweeting fever can twitter be used to Monitor the Incidence of Dengue-Like Il...
Tweeting fever can twitter be used to Monitor the Incidence of Dengue-Like Il...Tweeting fever can twitter be used to Monitor the Incidence of Dengue-Like Il...
Tweeting fever can twitter be used to Monitor the Incidence of Dengue-Like Il...
Muhammad Habibi
 
Defing the Epidemiologic of Covid-19
Defing the Epidemiologic of Covid-19Defing the Epidemiologic of Covid-19
Defing the Epidemiologic of Covid-19
Valentina Corona
 
MWEBAZA VICTOR - AMERICAN JOURNAL OF RESPIRATORY AND CRITICAL CARE MEDICINE ...
 MWEBAZA VICTOR - AMERICAN JOURNAL OF RESPIRATORY AND CRITICAL CARE MEDICINE ... MWEBAZA VICTOR - AMERICAN JOURNAL OF RESPIRATORY AND CRITICAL CARE MEDICINE ...
MWEBAZA VICTOR - AMERICAN JOURNAL OF RESPIRATORY AND CRITICAL CARE MEDICINE ...
Dr. MWEBAZA VICTOR
 
Per contact probability of infection by Highly Pathogenic Avian Influenza
Per contact probability of infection by Highly Pathogenic Avian InfluenzaPer contact probability of infection by Highly Pathogenic Avian Influenza
Per contact probability of infection by Highly Pathogenic Avian Influenza
Harm Kiezebrink
 
MathematicallyModelingEpidemicsThroughtheUseoftheReed-FrostEquation
MathematicallyModelingEpidemicsThroughtheUseoftheReed-FrostEquationMathematicallyModelingEpidemicsThroughtheUseoftheReed-FrostEquation
MathematicallyModelingEpidemicsThroughtheUseoftheReed-FrostEquationAlexander Kaunzinger
 
Al04606233238
Al04606233238Al04606233238
Al04606233238
IJERA Editor
 
The SIR Model and the 2014 Ebola Virus Disease Outbreak in Guinea, Liberia an...
The SIR Model and the 2014 Ebola Virus Disease Outbreak in Guinea, Liberia an...The SIR Model and the 2014 Ebola Virus Disease Outbreak in Guinea, Liberia an...
The SIR Model and the 2014 Ebola Virus Disease Outbreak in Guinea, Liberia an...
CSCJournals
 
The role of influenza in the epidemiology of pneumonia
The role of influenza in the epidemiology of pneumoniaThe role of influenza in the epidemiology of pneumonia
The role of influenza in the epidemiology of pneumoniaJoshua Berus
 
Infection and Disease 2021-22.pptx
Infection and Disease 2021-22.pptxInfection and Disease 2021-22.pptx
Infection and Disease 2021-22.pptx
jelikov
 
A COMPUTER VIRUS PROPAGATION MODEL USING DELAY DIFFERENTIAL EQUATIONS WITH PR...
A COMPUTER VIRUS PROPAGATION MODEL USING DELAY DIFFERENTIAL EQUATIONS WITH PR...A COMPUTER VIRUS PROPAGATION MODEL USING DELAY DIFFERENTIAL EQUATIONS WITH PR...
A COMPUTER VIRUS PROPAGATION MODEL USING DELAY DIFFERENTIAL EQUATIONS WITH PR...
IJCNCJournal
 
Session ii g3 overview epidemiology modeling mmc
Session ii g3 overview epidemiology modeling mmcSession ii g3 overview epidemiology modeling mmc
Session ii g3 overview epidemiology modeling mmcUSD Bioinformatics
 
Investigation of an epidemic
Investigation of an epidemicInvestigation of an epidemic
Investigation of an epidemic
Devyani Wanjari
 
Science.abb3221
Science.abb3221Science.abb3221
Science.abb3221
gisa_legal
 
Imperial college-covid19-npi-modelling-16-03-2020
Imperial college-covid19-npi-modelling-16-03-2020Imperial college-covid19-npi-modelling-16-03-2020
Imperial college-covid19-npi-modelling-16-03-2020
Wouter de Heij
 
Design of a Clinical Decision Support System Framework for the Diagnosis and ...
Design of a Clinical Decision Support System Framework for the Diagnosis and ...Design of a Clinical Decision Support System Framework for the Diagnosis and ...
Design of a Clinical Decision Support System Framework for the Diagnosis and ...
Editor IJCATR
 
Vida útil do Facebook
Vida útil do FacebookVida útil do Facebook
Vida útil do Facebook
Culturadigitaw
 
Epidemiological modeling of online social network dynamics
Epidemiological modeling of online social network dynamicsEpidemiological modeling of online social network dynamics
Epidemiological modeling of online social network dynamics
Dario Caliendo
 

Similar to ysf_report (20)

Descriptive epidemiology
Descriptive epidemiologyDescriptive epidemiology
Descriptive epidemiology
 
Tweeting fever can twitter be used to Monitor the Incidence of Dengue-Like Il...
Tweeting fever can twitter be used to Monitor the Incidence of Dengue-Like Il...Tweeting fever can twitter be used to Monitor the Incidence of Dengue-Like Il...
Tweeting fever can twitter be used to Monitor the Incidence of Dengue-Like Il...
 
Defing the Epidemiologic of Covid-19
Defing the Epidemiologic of Covid-19Defing the Epidemiologic of Covid-19
Defing the Epidemiologic of Covid-19
 
MWEBAZA VICTOR - AMERICAN JOURNAL OF RESPIRATORY AND CRITICAL CARE MEDICINE ...
 MWEBAZA VICTOR - AMERICAN JOURNAL OF RESPIRATORY AND CRITICAL CARE MEDICINE ... MWEBAZA VICTOR - AMERICAN JOURNAL OF RESPIRATORY AND CRITICAL CARE MEDICINE ...
MWEBAZA VICTOR - AMERICAN JOURNAL OF RESPIRATORY AND CRITICAL CARE MEDICINE ...
 
Per contact probability of infection by Highly Pathogenic Avian Influenza
Per contact probability of infection by Highly Pathogenic Avian InfluenzaPer contact probability of infection by Highly Pathogenic Avian Influenza
Per contact probability of infection by Highly Pathogenic Avian Influenza
 
MathematicallyModelingEpidemicsThroughtheUseoftheReed-FrostEquation
MathematicallyModelingEpidemicsThroughtheUseoftheReed-FrostEquationMathematicallyModelingEpidemicsThroughtheUseoftheReed-FrostEquation
MathematicallyModelingEpidemicsThroughtheUseoftheReed-FrostEquation
 
Al04606233238
Al04606233238Al04606233238
Al04606233238
 
A Bayesian approach to identifying high-concern individuals in an infection-bearing population
Anqi Dong
March 9, 2012

1 Background

The delayed diagnosis and treatment of individuals who are carrying infectious diseases can place a large burden on the healthcare system [4, 8] and the general population. Such delays are especially worrisome for infections with prolonged asymptomatic periods, such as HIV, chlamydia, and gonorrhea, in environments where many persons have vulnerable immune systems, such as hospitals and nursing homes, and in environments characterized by high interpersonal contact rates, such as schools and prisons. Public health authorities use methods such as contact tracing to locate contacts of reported infected individuals and determine whether they are infected as well. The lack of direct connection between an individual's contact patterns and that individual's infection state means that contact tracing procedures often need to be fairly exhaustive in order to not miss infected individuals [9, 13].

In certain contexts, there is much data collected about the individual-level characteristics of an infection within a population. One of the key types of data about populations that is collected is descriptions of potentially infection-transmitting person-to-person interactions. These interactions are generally represented using contact networks: graphs, with vertices representing persons and edges connecting persons who interact with each other [3]. For large, well-mixed populations, such as those of cities, it is difficult and infeasible to obtain useful sets of data [11]. However, traditional contact tracing provides data on individuals during outbreak investigations.
In addition, for smaller, relatively closed settings, such as hospitals, nursing homes, and schools, it is possible to gather detailed epidemiological data by distributing on-body proximity sensors to the entire population to log persons' contacts [6, 12].

Bayesian statistical approaches are valuable tools for analysis of epidemiological data. Current literature on using Bayesian techniques in the context of infection spread on a heterogeneously-mixing contact network generally focuses on inferring infection parameters such as the average probability of infection [1, 2, 5, 7, 10]. Current techniques assume that data on the structure of contact networks is virtually nonexistent, and therefore either ignore the effects of contact networks on the infection, or sample from generated networks of a simple family of graphs (for example, Bernoulli random graphs) as part of the Bayesian approach [2, 5]. This means that there is very little work on inferring additional individual-level characteristics from epidemiological data, especially when knowledge of the contact network is good.

Analyses of available data on contact networks and individual epidemiological records could better inform healthcare systems about various infection patterns and trends, so that they can better target their limited efforts and resources. Such efficiencies include prioritizing contact tracing and testing to first assess persons who haven't reported but who are likely to be infected, and first vaccinating those who are uninfected but who are at high risk of immediate infection.

Coupled with an automated, continuous data-gathering system such as iEpi [6], an inference system could provide quasi-real-time predictions in institutional settings, identifying unknown sources of infection or patients at high risk of becoming infected. The spatial data provided by such a data-gathering system could even consider hidden environmental pathogen reservoirs, such as a contaminated surface at a particular location.

2 Project goals

To implement an inference system that identifies persons of interest or concern in a contact network using an incomplete set of epidemiological data. These persons of interest include likely high spreaders of infection, persons who are probably infected but whose infection statuses are unknown, and persons who probably are not currently infected, but are likely to become infected soon.

To design a general mathematical framework for performing inference on individual-level contact data and infection histories.

3 Methods

3.1 Epidemiological model

We consider a contact network $C$ of $n$ persons. In terms of infection spread, the network is idealized as a closed one, meaning that infection cannot enter $C$ except via a small set of persons: the index infectives (infectious individuals) of the population. This network can be heterogeneous; the persons in $C$ do not necessarily have the same number of contacts or patterns of connection.

Let $C$ be made up of persons $p_1, p_2, \ldots, p_n$. In $C$, each person $p_i$ has a number of contacts, the set $C(p_i)$.
For each person $p_j$ that is an element of $C(p_i)$, we can define $c_{p_j \to p_i}(t)$, the rate of contact from $p_j$ to $p_i$ at time $t$. This rate is not symmetric, so $c_{p_j \to p_i}(t) \neq c_{p_i \to p_j}(t)$ in general. The function $c_{p_j \to p_i}(t)$ reflects the number of potentially infection-transmitting contacts from $p_j$ to $p_i$. Depending on the pathogen, a "contact" can be an event like sneezing, needle-sharing, or sexual contact.

We use the convention that $c_{p_j \to p_i}(t) = 0$ if $p_j$ is not contacting $p_i$ at $t$. This occurs, for example, if $p_j$ and $p_i$ do not contact each other, and also when $p_j$ is not infected.

Per contact, an uninfected person has a certain probability $\beta$ of becoming infected as the result of that contact. Infectiousness is a boolean state: a person is either infected or not, and a person cannot be reinfected when already infected. We refer to the product of $\beta$ and the cumulative number of contacts $p_i$ experiences per unit time at some time as the "infection pressure" felt by $p_i$ at that time.

We represent each infected person as presenting their infection after a second-order delay. In this paper, we make the simplifying assumption that all patients are treated for their infection upon presentation, meaning that they will not continue spreading infection after presentation.

We assume that a patient cannot naturally recover from an infection, that is, recover without healthcare intervention. We also assume herein that each person can become infected at most once.

In order to create a general inference framework, this model is not specific to a particular pathogen, but instead can be applied reasonably well to a range of microparasitic infections. Our model parameters are thus not necessarily representative of a specific disease.

3.2 Inference methods

Let $i_p$ and $t_p$ be respectively the infection and presentation times of some arbitrary person $p$. Consider

$$P(t_p, i_p \mid C), \tag{1}$$

the probability density of some presentation and infection times, given a contact network.
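As a concrete illustration of the quantities defined in Section 3.1, the following Python sketch computes the infection pressure on one person and samples a presentation time. It is only a sketch under stated assumptions: the function names, the contact rates, and the parameter values are hypothetical, and the "second-order delay" is assumed here to mean two exponential stages in series (an Erlang distribution with shape 2), which the report does not state explicitly.

```python
import random


def infection_pressure(beta, incoming_rates):
    """Return beta times the total rate of incoming contacts at one instant.

    incoming_rates maps each contact p_j to c_{p_j -> p_i}(t). Following the
    convention of Section 3.1, a rate is 0 when p_j is uninfected or is not
    contacting p_i at that time.
    """
    return beta * sum(incoming_rates.values())


def sample_presentation_time(infection_time, mean_delay, rng):
    """Sample t_p given i_p.

    ASSUMPTION: the second-order delay is modelled as two exponential stages
    in series, each with mean mean_delay / 2 (an Erlang-2 delay).
    """
    stage_rate = 2.0 / mean_delay
    delay = rng.expovariate(stage_rate) + rng.expovariate(stage_rate)
    return infection_time + delay


rng = random.Random(0)
# Hypothetical contact rates into p1 (contacts per unit time); p3 is uninfected.
rates_into_p1 = {"p2": 3.0, "p3": 0.0, "p4": 1.5}
pressure = infection_pressure(beta=0.05, incoming_rates=rates_into_p1)
presentation = sample_presentation_time(infection_time=1.0, mean_delay=2.0, rng=rng)
print("infection pressure on p1:", pressure)
print("sampled presentation time:", presentation)
```

Keeping the contact rates in a per-person mapping mirrors the directed, non-symmetric rates of the model: the rate from p2 into p1 is stored separately from the rate from p1 into p2.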
For brevity, we omit some terms of Equation 1 when discussing it below.

By integrating or summing over some range of values, and comparing the cumulative probability of this subset of values to the probability of the universal set of all permissible values, we can determine how likely the subset of data is to occur. As there are many types of data embedded in (1), the above probability density, with manipulation, provides a rich set of probabilistic information about the sets of infection and presentation times and $C$. Here, we focus on determining probabilities related to the infection time of a certain person $p$.

$P(i_p)$ means "the probability density of person $p$ becoming infected at time $i_p$". This implies two things: that person $p$ was not infected before time $i_p$, and that person $p$ was infected exactly at time $i_p$.

Moreover, $P(i_p)$ is a probability density. To find the probability of person $p$ becoming infected during the interval $[a, b]$, we integrate, finding the value of $\int_a^b P(i_p)\,di_p$. Alternatively, if we know that person $p$ must have been infected somewhere in the interval $[c, d]$, we can use the definition of conditional probabilities to find that the probability of $i_p$ being in the interval $[a, b]$ is

$$\frac{\int_a^b P(i_p)\,di_p}{P(U)} = \frac{\int_a^b P(i_p)\,di_p}{\int_c^d P(i_p)\,di_p},$$

where $U$ is the universal set.

If we know the presentation or infection time for a person, we can simply insert that value into (1). However, epidemiological data is often scarcer, and many presentation and infection times are not available. In this case, we can marginalize the probability through integration. For example, for some person $q$, if $i_q$ is known to be in the range $[a, b]$, and $t_q$ is known to be within the range $[c, d]$, the probability that $i_q < k$ for some $k$, where $a < k < b$, can be calculated as

$$\frac{\int_a^k P(i_q)\,di_q}{\int_a^b P(i_q)\,di_q} = \frac{\int_a^k \int_c^d P(i_q, t_q)\,dt_q\,di_q}{\int_a^b \int_c^d P(i_q, t_q)\,dt_q\,di_q}. \tag{2}$$
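The ratio of integrals in Equation 2 can be evaluated numerically once a joint density is available. The Python sketch below does this with a midpoint rule for a toy density. Everything specific in it is an assumption made for illustration: the density (an exponential infection time followed by an Erlang-2 presentation delay), its parameters, and the function names are hypothetical and are not the closed-form terms used in the report.

```python
import math


def joint_density(i, t, hazard=0.5, stage_rate=1.0):
    """Toy joint density of infection time i and presentation time t.

    ASSUMPTION: exponential infection time and an Erlang-2 delay t - i.
    Only ratios of integrals are used below, so the normalisation cancels.
    """
    if i < 0.0 or t <= i:
        return 0.0
    delay = t - i
    infection = hazard * math.exp(-hazard * i)
    presentation = stage_rate ** 2 * delay * math.exp(-stage_rate * delay)
    return infection * presentation


def integrate_2d(f, i_range, t_range, n=200):
    """Midpoint-rule double integral of f(i, t) over a rectangle."""
    (i_lo, i_hi), (t_lo, t_hi) = i_range, t_range
    di = (i_hi - i_lo) / n
    dt = (t_hi - t_lo) / n
    total = 0.0
    for row in range(n):
        i = i_lo + (row + 0.5) * di
        for col in range(n):
            t = t_lo + (col + 0.5) * dt
            total += f(i, t)
    return total * di * dt


def prob_infected_before(k, i_range, t_range):
    """P(i < k), given that i lies in i_range and t lies in t_range."""
    numerator = integrate_2d(joint_density, (i_range[0], k), t_range)
    denominator = integrate_2d(joint_density, i_range, t_range)
    return numerator / denominator


# Hypothetical bounds: infected between times 0 and 3, presenting between 3 and 5.
for k in (1.0, 2.0, 3.0):
    print(k, prob_infected_before(k, (0.0, 3.0), (3.0, 5.0)))
```

A deterministic grid is workable here only because the example has two dimensions; the integrals over many persons described later are the reason a sampling method is needed instead.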
Equation 2 has some important consequences: the probability that person $p$ is already infected but has not presented is equivalent to the probability that $i_p < T$ and $t_p > T$ (where $T$ is the current time). Also, the probability that $p$ is uninfected but will be infected "soon" is equivalent to the probability that $T < i_p < T + \Delta t$, where $\Delta t$ quantifies the duration of "soon".

In our inference model, we additionally allow for the case where some individuals have been tested in the past for their "infection status" (infected/uninfected) at that time. We incorporate this testing data by enforcing additional bounds on the infection times. For example, if person $p$ was tested to be uninfected at time $x_1$ and found to be infected at time $x_2$ (where $x_2 > x_1$), we know that $x_1 < i_p < x_2$. We can perform similar bounding with a presentation time: if $p$ has not yet presented at the present time $T$, we know that $t_p > T$.

However, the probability density function itself remains unchanged, as knowledge about infection status does not affect how the infection behaves.

To calculate the numerical value of (1) and related equations, we factor the probability into a product of probabilities, with terms of the general form $P(t_p \mid i_p)\,P\bigl(i_p \mid \{i_q\}_{q \in C(p)}\bigr)$. Both of these probability terms are expressed in closed form using typical epidemiological representations of infection.

3.2.1 Infection time partial ordering

If, say, both $i_A$ and $i_B$ are unknown, and $A$ and $B$ are connected, we may not be able to determine the direction of infection pressure ($A \to B$ versus $B \to A$). To resolve this ambiguity, we marginalize the probability in Equation 1 as follows:

$$P(t_p, i_p \mid C) = \sum_{d \in D} P(t_p, i_p, d \mid C).$$

The directed acyclic graph (dag) $d$ imposes a topological ordering on $C$. For each edge, $d$ specifies which person of the pair was infected first, thereby also specifying the directionality of infection pressure. The set $D$ contains all the permissible dags that contain all the vertices of $C$ and provide an ordering for all edges of $C$. Dags that contradict other knowledge about the ordering of infection times are excluded from $D$.

When computing probabilities considering only infection and presentation times, $d$ is a nuisance parameter, and we mathematically rewrite the expressions to eliminate the use of $d$ and $D$ in the final integral to be evaluated.

3.3 Assessment of inference

Sources in the literature generally assess the accuracy and performance of their developed inference algorithms by running their inference models on historical datasets and discussing the plausibility of the results of the inference. While such demonstrations are valuable in showing the practicality of inference results, it is difficult to validate statistical measures of historical data.

To assess the performance of our inference algorithm, I instead developed and used a simulation model of infection spread. This model represents the infection mechanisms described above on a best-effort basis, simulating infection spread and testing for all individuals contained within a computer-generated contact network.

4 Results and discussion

My primary method of assessing the numerical behavior of the inference algorithm was to plot the probability of person $p$ becoming infected before time $x$ (given the contact network, some presentation times, and some patient history as recorded at some time $t$, where $t$ may be less than $x$) as a function of $x$. An example of this can be seen in Figure 1. I plotted this figure by manually splitting the desired integral along $i_p$ into several integrals with mutually exclusive regions.

Generally, the results produced by the inference appear reasonable, considering the additional data produced by the model (data that was not used in the inference).
That is, the time at which $p$ was infected in the simulation model is usually close to or at a part of the integral with a high rate of change. However, the results of the simulation model are not necessarily highly probable, and the probability of $i_p < t$ is not a perfect analogue to the probability density that $i_p = t$, so such a comparison is not definitive.

Making use of data on patients' infection history can lead to the probability densities exhibiting subtle behavior. For example, if person $X$ was tested to be uninfected at time 2.95, inference will usually suggest that there is a low probability that $X$ was infected by time 3.00. However, if it was not known that $X$ was uninfected at time 2.95, the probability that $i_X < 3.00$ may be much higher, for there would then be no restriction that $i_X \geq 2.95$. Here, knowledge that $i_X \geq 2.95$ did not change the probability that $i_X = 3.00$ but it did change the probability that $i_X < 3.00$. This further demonstrates that the probability that $i_p < t$ is not analogous to the probability density that $i_p = t$.

Monte Carlo numerical integration (MCI) techniques were used to evaluate the probabilities required for inference. As MCI is a stochastic technique, it is difficult to properly estimate the technique's precision without detailed mathematical knowledge of the specific integrand. While error estimators such as the one in Mathematica proved to be inaccurate assessors of the MCI's precision, testing smaller, symbolically integrable functions showed that the MCI implementation used was usually within an order of magnitude of the exact integration value. Considering that different intervals of integration routinely differ from each other by ratios of $10^{10}$ or more, order-of-magnitude precision is probably sufficient for most statistical uses.

Increasing the number of individuals in $C$ leads to higher-dimensional integrals and a smaller integrand (in terms of absolute value).
For larger graphs, because of the high variance observed when performing multiple evaluations of the same integral, the simple MCI techniques in Mathematica (the techniques that are currently used) will be inadequate when scaling up the inference algorithm to large contact networks.
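The check described above, comparing a Monte Carlo estimate against a symbolically integrable function, can be sketched in a few lines of Python. This is an illustration only: Python stands in for the Mathematica routines used in the report, and the separable integrand, the integration bounds, and the function names are hypothetical choices made so that the exact value is known.

```python
import math
import random


def mc_integrate(f, bounds, n_samples, rng):
    """Plain Monte Carlo integration over a box.

    Returns (estimate, standard_error), where the standard error is the
    usual sample-based estimate: volume * sqrt(variance / n_samples).
    """
    volume = 1.0
    for lo, hi in bounds:
        volume *= hi - lo
    total = 0.0
    total_sq = 0.0
    for _ in range(n_samples):
        point = [rng.uniform(lo, hi) for lo, hi in bounds]
        value = f(point)
        total += value
        total_sq += value * value
    mean = total / n_samples
    variance = max(total_sq / n_samples - mean * mean, 0.0)
    return volume * mean, volume * math.sqrt(variance / n_samples)


def separable_integrand(point):
    """Hypothetical test integrand exp(-(x_1 + ... + x_d))."""
    return math.exp(-sum(point))


def exact_value(dimensions, upper=4.0):
    """Exact integral of separable_integrand over [0, upper]^dimensions."""
    return (1.0 - math.exp(-upper)) ** dimensions


rng = random.Random(2012)
for dimensions in (1, 3, 6):
    estimate, error = mc_integrate(
        separable_integrand, [(0.0, 4.0)] * dimensions, 20000, rng
    )
    exact = exact_value(dimensions)
    print(dimensions, estimate, exact, abs(estimate - exact) / exact, error)
```

With a fixed number of samples, the relative error of this uniform sampler tends to grow with the number of dimensions, which is the scaling problem described for larger contact networks.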
[Figure 1: A plot of the cumulative probability of each person in a four-person contact network being already infected, as a function of time. Vertical axis: probability that the person has already become infected (0% to 100%). Horizontal axis: model time (0 to 6). Series: $i_{p_4}$, $i_{p_5}$, $i_{p_6}$, $i_{p_7}$.]

However, using a well-designed Markov chain process for point sampling during the integration would lead to better integrand stability and allow for inference to be performed on large contact networks in a reasonable amount of time. Work is being done towards implementing this feature into the inference algorithm.

Inputting higher-degree contact networks into the inference model may lead to tighter inferred distributions, because larger contact networks generally embed more information and heterogeneity. The increased amount of available data means better statistics can be inferred. However, some reengineering of the integration mechanism is likely required before large-scale testing of higher-degree contact networks can be performed.

5 Conclusions

The inference algorithms I developed demonstrate that it is possible to infer distributions for the likelihood of becoming infected at a certain time from limited epidemiological data (contact network structure, some presentation times, and some infection testing history), even if this time is in the future. However, when using Bayesian probabilistic techniques, it is important to remember that they are not omniscient or failproof. The inference techniques described herein, while potentially very powerful, can be uninformative or even misleading if used to analyze significantly erroneous data or insufficient sets of data.
Though discussion of the required procedures is beyond the scope of this report, it is clear that our inference algorithm can be easily adapted mathematically to represent an even wider range of scenarios and to infer more types of data. Possible extensions of this inference model in the near future include representing natural recovery, allowing for dynamic (evolving) contact networks, modeling static sources of infection, and determining the probability of a particular directed edge spreading infection.

6 Acknowledgments

I would like to thank Dr. Michael Horsch and Dr. Nathaniel Osgood of the University of Saskatchewan for their oversight and their suggestions.
References

[1] T. Britton, T. Kypraios, and P. D. O'Neill. Inference for epidemics with three levels of mixing: Methodology and application to a measles outbreak. Scandinavian Journal of Statistics, 38(3):578–599, 2011.

[2] T. Britton and P. D. O'Neill. Bayesian inference for stochastic epidemics in populations with random social structure. Scandinavian Journal of Statistics, 29:375–390, 2002.

[3] K. T. D. Eames and M. J. Keeling. Contact tracing and disease control. Proceedings of the Royal Society of London B, 270:2565–2571, 2003.

[4] J. A. Fleishman, B. R. Yehia, R. D. Moore, K. A. Gebo, and HIV Research Network. The economic burden of late entry into medical care for patients with HIV infection. Med Care, 48(12):1071–1079, 2010.

[5] C. Groendyke, D. Welch, and D. R. Hunter. Bayesian inference for contact networks given epidemic data. Scandinavian Journal of Statistics, 38:600–616, 2011.

[6] M. Hashemian, K. G. Stanley, D. L. Knowles, J. Calver, and N. D. Osgood. Human network data collection in the wild: The epidemiological utility of micro-contact and location data. In Proceedings of the ACM SIGHIT International Health Informatics Symposium (IHI 2012), Miami, FL, January 28–30, 2012.

[7] Y. Hosseinkashi. Statistical Inference on Stochastic Graphs. PhD thesis, Department of Statistics, University of Waterloo, 2011.

[8] H. B. Krentz, M. C. Auld, and M. J. Gill. The high cost of medical care for patients who present late (CD4 < 200 cells/µl) with HIV infection. HIV Medicine, 5:93–98, 2004.

[9] C. Mulder, C. G. M. Erkens, P. M. Kouw, E. M. Huisman, W. Meijer-Veldman, M. W. Borgdorff, and F. van Leth. Missed opportunities in tuberculosis control in the Netherlands due to prioritization of contact investigations. European Journal of Public Health (advance access), 2011.

[10] J. Ray and Y. M. Marzouk. A Bayesian method for inferring transmission chains in a partially observed epidemic. In Proceedings of the Joint Statistical Meetings, Denver, CO, 2010. Sandia National Laboratories.

[11] J. Read, K. Eames, and W. Edmunds. Dynamic social networks and the implications for the spread of infectious disease. Journal of the Royal Society Interface, 5:1001–1007, 2008.

[12] M. Salathé, M. Kazandjieva, J. W. Lee, P. Levis, M. W. Feldman, and J. H. Jones. A high-resolution human contact network for infectious disease transmission. Proceedings of the National Academy of Sciences of the USA, 107(51):22020–22025, 2010.

[13] J. Veen. Microepidemics of tuberculosis: the stone-in-the-pond principle. Tubercle and Lung Disease, 73:73–76, 1992.