Bayesian modelling for COVID-19 seroprevalence studies
1. Epidemiology Concepts Rapid Tests Prevalence in RS Modelling Sampling Error Modelling test kits uncertainty Q&A
Christian S. Perone
christian.perone@gmail.com
https://perone.github.io/covid19analysis
June 21, 2020
Who Am I
Christian S. Perone
BSc Computer Science
(UPF - Brazil)
MSc Biomedical Engineering
(Polytechnique Montreal/UdeM - Canada)
Blog
http://blog.christianperone.com
COVID-19 Analyses
https://perone.github.io/covid19analysis/
Twitter @tarantulae
Agenda
Epidemiology Concepts
Incidence and prevalence
Prevalence studies
Rapid Tests
Rapid testing kits
Dynamics of the infection
Prevalence in RS
Details of the study
Results from first wave
Scientific communication
Modelling Sampling Error
Bayesian framework
Sampling error
Modelling test kits uncertainty
Test validation properties
True prevalence vs apparent prevalence
Final remarks
References
Q&A
Section I
Epidemiology Concepts
Incidence and Prevalence
Incidence and prevalence are two concepts that are often confused:
Incidence
Incidence refers to the proportion or rate of persons who develop a condition
during a particular time period.
Prevalence
Prevalence refers to the proportion of persons who have a condition at or during a
particular time period.
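A small numeric sketch can help keep the two definitions apart. The counts below are made up for illustration (they do not come from any study) and the function names are hypothetical; incidence is computed here as an incidence proportion over the population at risk, which is one common convention.

```python
def incidence_proportion(new_cases, population_at_risk):
    """New cases that developed during the period, per person at risk."""
    return new_cases / population_at_risk


def point_prevalence(existing_cases, population):
    """People who have the condition at a given moment, per person."""
    return existing_cases / population


# Hypothetical town observed for one month (illustrative numbers only).
population = 1000
cases_at_start = 20   # already sick when the month starts
new_cases = 30        # became sick during the month
recovered = 10        # stopped being cases during the month

at_risk = population - cases_at_start
incidence = incidence_proportion(new_cases, at_risk)
prevalence_end = point_prevalence(cases_at_start + new_cases - recovered,
                                  population)

print(f"incidence over the month: {incidence:.4f}")
print(f"prevalence at month end:  {prevalence_end:.4f}")
```

Incidence counts only the new cases, while prevalence counts everyone who is a case at that moment, regardless of when they became one.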
Epidemiologist Bathtub
This is incidence for Porto Alegre/RS
[Figure: confirmed cases (y-axis, 0 to 60) by symptom onset date (x-axis, 25/02 to 24/06)]
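Be careful with the right-censoring trap: the most recent dates look artificially low because cases with recent symptom onset have not been confirmed yet;
This is often used to convey misleading messages to the public.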
Example of right-censoring adjustment
Prevalence studies
Initial surveillance is focused primarily on patients with severe disease;
Porto Alegre/RS is no different: the protocol is to test all severe
cases (SRAG, Síndrome Respiratória Aguda Grave, i.e. Severe Acute Respiratory Syndrome);
The protocol also covers testing of health workers, risk groups (nursing
homes, etc.) and local outbreaks;
RT-PCR between the 2nd and 6th day after symptom onset, and antibody
tests after the 14th day of symptoms (SMS/POA);
Source: Boletim Epidemiológico / Secretaria Municipal de Saúde de Porto Alegre (SMS/POA).
Prevalence studies
Extent and fraction of mild or asymptomatic infections are not clear.
We know that many infections are being missed, so what proportion
of the total population has been infected?
WHO designed a protocol to investigate the extent of infection, as
determined by seropositivity in the general population, in any country
in which COVID-19 virus infection has been reported;
Population-based age-stratified seroepidemiological investigation
protocol for COVID-19 virus infection (March, 2020).
Critical to remember that serosurveys are population-level surveys.
They are intended to inform our broader understanding of the disease,
not to tell individuals whether they have or have not been infected.
(COVID-19 Data Dives: The Takeaways From Seroprevalence Surveys,
Natalie Dean, Medscape, 2020);
Taxonomy of different studies
There are two main groups of studies: experimental and observational.
Experimental
This group of studies is based on intervention. The current gold standard for
assessing causal effects is the RCT (Randomized Controlled Trial), where one group
receives the intervention while the control group receives nothing or an inactive
placebo.
Observational
We want to observe the effect of a risk factor, diagnostic test, etc., without trying
to change who is or isn’t exposed to it.
Seroprevalence studies are observational studies: they use
observational data collected using a diagnostic test.
Options for seroprevalence studies
The WHO protocol recommends three possible study designs:
Cross-sectional
Analyzes data from a population, or a representative subset, at a specific point
in time.
Repeated cross-sectional
Also called a trend study, it is a repeated application of the cross-sectional
collection at different points in time.
Longitudinal
The same individuals are evaluated at regular follow-ups.
The most common and easiest to implement are the cross-sectional and repeated
cross-sectional designs. The Brazilian government has funded both state-level and
national-level repeated cross-sectional studies.
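Study designs
[Figure: the three study designs. Cross-sectional: one sample at time t. Repeated cross-sectional: samples at time t and at time t + Δ. Longitudinal: the same individuals at time t and at time t + Δ.]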
Prevalence Studies in Brazil
This presentation will focus on the repeated cross-sectional studies
conducted in Rio Grande do Sul (state-level), carried out by UFPel and
its partners;
It’s easier to poke holes in a study
than to run one yourself.
– COVID-19 Data Dives: The Takeaways From Seroprevalence Surveys.
Natalie E. Dean. May/2020. Medscape.
Focus of this talk
This talk will focus on uncertainty modelling using a Bayesian
framework for the sampling errors and test kit validation properties;
I will not focus on the probability
sampling adjustments:
Most of the time, you know
beforehand (from the census) the
characteristics of
populations/sub-populations;
You can use a weighting scheme to
adjust for non-respondents, demographics
that differ from the census, etc.;
Source: Sampling and Weighting – A Better Practice
Guide for Practitioners, by Data Analysis Australia.
Section II
Rapid Tests
SARS-CoV-2 rapid tests
Many brands are available on the market nowadays; most follow this pattern
for IgM/IgG/control lines:
Wondfo Test Kit
1) Capillary blood/serum/plasma collection
2) Buffer Solution
3) Reading results
Dynamics of the infection
Many of these intervals still have a lot of uncertainty:
Section III
Prevalence in RS
Seroprevalence study in Rio Grande do Sul
The study was designed and developed by UFPel and its collaborators;
Funded by the federal government (test kits from the Ministry of Health);
Planned waves:
11-13/April, N=4500
25-27/April, N=4500
09-11/May, N=4500
23-25/May, N=4500
Later extended up to August
Used Wondfo COVID-19 test kits
Subjects were also interviewed regarding social distancing
Seroprevalence study in Rio Grande do Sul
The study was conducted in 9 sentinel cities:
Regions
1. Porto Alegre
2. Canoas
3. Pelotas
4. Caxias do Sul
5. Santa Cruz do Sul
6. Santa Maria
7. Passo Fundo
8. Ijuí
9. Uruguaiana
Population of Rio Grande do Sul: 11.3 million
Sampled cities: 31% of the RS population
Source: slides presented at the government press conference.
Seroprevalence study in Rio Grande do Sul
Respondents for the first wave in each city:
Results
City                 Respondents
Caxias do Sul        500
Ijuí                 500
Passo Fundo          500
Pelotas              500
Santa Cruz do Sul    500
Uruguaiana           500
Santa Maria          461
Porto Alegre         396
Canoas               332
TOTAL                4189
Source: slides presented at the government press conference.
Seroprevalence study in Rio Grande do Sul
This is the key slide with the results presented at the press
conference for the first wave of the survey, on 11-13/April:
Results
Valid tests: 4189
Positive tests: 2
% with antibodies: 0.05%
Infected per 2000 inhabitants: 1
People with antibodies in RS: 5650
Source: slides presented at the government press conference.
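The arithmetic behind this slide is easy to reproduce. The sketch below uses the counts from the slide and the state population of 11.3 million quoted on the sentinel-cities slide; the variable names are only illustrative. The raw proportion 2/4189 is closer to 0.048%, and the 5650 figure appears to correspond to rounding it to 0.05% before scaling, although that is an inference and not something the source states.

```python
valid_tests = 4189
positive_tests = 2
rs_population = 11_300_000  # 11.3 million, as quoted on the sentinel-cities slide

raw_proportion = positive_tests / valid_tests
rounded_proportion = round(raw_proportion, 4)  # rounds to 0.0005, i.e. 0.05%

estimate_raw = raw_proportion * rs_population
estimate_rounded = rounded_proportion * rs_population

print(f"raw proportion:            {raw_proportion:.5%}")
print(f"people (raw proportion):   {estimate_raw:.0f}")
print(f"people (rounded to 0.05%): {estimate_rounded:.0f}")
```

Either way, the result is a single point estimate built from just two positive tests, with no statement of how uncertain it is.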
News after the press conference
After these results were presented at the press conference, many news articles
started to circulate:
Revista Pesquisa Fapesp:
"Rio Grande do Sul may have 7.5 times more cases than
confirmed"
UFPel:
"Of the 4189 tests validated by the researchers, two tested
positive; about 0.05% of the population of Rio Grande do Sul must have
been infected."
Jornal O Nacional:
"Study estimates that RS has 5650 people infected with Covid-19"
Diário da Manhã:
"Study estimates that RS has more than 5600 people infected with
Covid-19"
UNISC news:
"(...) it is estimated that there is one infected person for every two thousand inhabitants of the
state, with possibly 5650 people (...)"
Rule Number I
These are some principles from the Royal Statistical Society (Data Science
section); the first one is:
Rule 1. Scientists and journalists should express the
level of uncertainty associated with a forecast
All mathematical models contain uncertainty. This should
be explicit – researchers should communicate their own
certainty that a result is true. A range of plausible results
should be provided, not just one extreme result.
– All models are wrong, but some are completely wrong.
Royal Statistical Society (Data Science section).
Section IV
Modelling Sampling Error
Prior, likelihood and posterior
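$$\underbrace{p(\theta \mid X)}_{\text{Posterior}} \propto \underbrace{p(X \mid \theta)}_{\text{Likelihood}} \, \underbrace{\pi(\theta)}_{\text{Prior}}$$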
Multiple explanations for phenomena
Source: Ian Osband et al. Using Randomized Prior Functions for Deep Reinforcement Learning. NIPS
2018. Image from: http://blog.christianperone.com
Sampling error
The first uncertainty we want to model is the sampling error. We are using a
small sample of a population to do inference about the population.
Image from: https://towardsdatascience.com/data-samples-and-error-visualization-techniques-832c4a7fbcb2
Prior for prevalence proportion
We want to choose a non-informative prior for the prevalence proportion,
so we use a Beta(α, β) distribution with α = β = 1.0, which yields
a flat prior with support x ∈ (0, 1):
[Figure: PDF of the flat Beta(1, 1) prior for prevalence]
You can also incorporate prior knowledge here: for example, if you know that the prevalence must
be higher than 0.01% (as the state was already doing tests), you can
encode that in the prior.
The likelihood
We now want to choose the likelihood for the observed data.
The number of positive COVID-19 tests is a number of successes in a sequence of n
independent experiments. This is by definition the Binomial(n, p)
distribution.
import pymc3 as pm  # assumed: `pm` is PyMC3
n, positive = 4189, 2  # valid and positive tests in the first wave
with pm.Model() as model:
    true_p = pm.Beta("true_p", alpha=1, beta=1)  # flat prior on prevalence
    obs = pm.Binomial("obs", p=true_p, n=n,
                      observed=positive)
Model in plate notation:
Prior predictive distribution
Let’s now check the prior predictive distribution by sampling from the
model (what is possible under absence of evidence):
with model:
    ppc = pm.sample_prior_predictive(samples=10000,
                                     var_names=["true_p"])
[Figure: histogram of the prior predictive distribution for 'true_p']
Prior predictive distribution
What would happen to the prior predictive distribution if we used a
different Beta prior:
[Figure: PDF of the alternative Beta prior for prevalence]
[Figure: histogram of the prior predictive distribution for 'true_p' under that prior]
Finally the MCMC sampling
Let’s now estimate the posterior distribution:
with model:
    trace = pm.sample(draws=10000, tune=4000,
                      cores=2, target_accept=0.85)
pm.traceplot(trace)
[Figure: trace plot for true_p, with the posterior density (left) and the sampled values per draw (right)]
Looking at the posterior
Let’s visualize the posterior distribution and compute intervals:
pm.plot_posterior(trace, ref_val=5650, point_estimate="mode")
[Figure: posterior distribution over the number of infected people (x-axis: Population); mode = 5518, 94% HPD from 746 to 16290; 35.5% of the posterior mass lies below the reference value 5650 and 64.5% above]
The mode is quite similar to the reported value. However, the 94% credibility
interval spans 764 to 16290, showing the huge uncertainty
due to the survey design.
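Because the Beta prior is conjugate to the Binomial likelihood, this particular posterior is also available in closed form: a Beta(1, 1) prior with 2 positives out of 4189 tests gives a Beta(3, 4188) posterior, which makes a handy sanity check for the MCMC result. The sketch below draws from it using only the Python standard library and scales the draws by the 11.3 million state population. It reports an equal-tailed interval rather than the HPD shown on the slide, so the bounds are not expected to match exactly; the helper name is hypothetical.

```python
import random

n, positive = 4189, 2
rs_population = 11_300_000  # 11.3 million, as quoted earlier in the talk

# Beta(1, 1) prior + Binomial likelihood -> Beta(1 + x, 1 + n - x) posterior.
alpha_post = 1 + positive
beta_post = 1 + n - positive

rng = random.Random(42)  # fixed seed so the sketch is reproducible
draws = sorted(rng.betavariate(alpha_post, beta_post) * rs_population
               for _ in range(50_000))


def quantile(sorted_values, q):
    """Empirical quantile by index lookup (good enough for a sanity check)."""
    return sorted_values[int(q * (len(sorted_values) - 1))]


lower, median, upper = (quantile(draws, q) for q in (0.03, 0.5, 0.97))
print(f"94% equal-tailed interval: {lower:.0f} to {upper:.0f} "
      f"(median {median:.0f})")
```

The interval is wide for the same reason as on the slide: two positive tests carry very little information.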
Perfect test ?
We went from a single point estimate of 5650 to an entire posterior
distribution with a 94% credibility interval of 764 to 16290;
The results we reported assume a perfect test that will never
give a False Positive (FP) or a False Negative (FN);
Such a test doesn’t exist, and the prevalence we are measuring is the apparent
prevalence, not the true prevalence.
Therefore we have another uncertainty to model: the validation properties
of the test kit employed.
Section V
Modelling test kits uncertainty
Sensitivity and Specificity
The two statistical quantities used to measure the performance of
diagnostic tests are: sensitivity and specificity.
Sensitivity
Probability of a positive test given that the patient has COVID-19.
SE = TP/(TP + FN)
Specificity
Probability of a negative test given that the patient does not have COVID-19.
SP = TN/(TN + FP)
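As a quick illustration of the two formulas, the sketch below computes them from raw counts. The counts are the ones quoted later in this talk from the COVID-19 Testing Project (27 of 33 RT-PCR-positive specimens detected, and 1 false positive among 106 pre-COVID specimens); the function names are hypothetical.

```python
def sensitivity(tp, fn):
    """SE = TP / (TP + FN): share of infected samples that test positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """SP = TN / (TN + FP): share of uninfected samples that test negative."""
    return tn / (tn + fp)


se = sensitivity(tp=27, fn=33 - 27)
sp = specificity(tn=106 - 1, fp=1)
print(f"sensitivity = {se:.2%}, specificity = {sp:.2%}")
```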
Wondfo rapid test validation
Rapid test manufacturers usually validate their tests under ideal conditions, so
the results are often very optimistic.
These are the results from Wondfo for "venous whole blood, serum and plasma"
samples:
Sensitivity in this case is 86.43% and specificity is 99.57%.
Wondfo rapid test validation
These are the results from Wondfo for "fingerstick whole blood" samples:
Sensitivity in this case is 100.00% and specificity is 98.84%.
Wondfo rapid test in Brazil
The curious case of the missing table (thanks to Ricardo Parolin
Schnekenberg):
The COVID-19 Testing Project
Independent testing project for rapid tests:
Site: https://covidtestingproject.org/
Paper: https://www.medrxiv.org/content/10.1101/2020.04.25.20074856v2 (pre-print)
Evaluating tests from many different suppliers;
The specimen set comprised:
Plasma or serum samples from symptomatic SARS-CoV-2
RT-PCR-positive individuals;
Pre-COVID-19 negative controls;
Recent samples from individuals who underwent respiratory viral
testing but were not diagnosed with COVID-19;
Results for Wondfo
IgM or IgG
Days since onset    Total N    Positive    %
1 − 5 days          25         10          40.00
6 − 10 days         36         24          66.67
11 − 15 days        33         27          81.82
16 − 20 days        21         17          80.95
> 20 days           11         9           81.82
Table: Percentage of positive specimens from patients w/ positive RT-PCR. Source: COVID-19 Testing Project.
Results for Wondfo
Sensitivity of the test seems low, increasing the chance of False
Negatives (FN);
Specificity on the other hand seems quite good:
From 106 blood donor plasma specimens collected before July 2018,
the test gave only 1 False Positive (FP);
However, in very low prevalence scenarios, even a good specificity can
be too low, since false positives may outnumber true positives;
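To see why specificity matters so much at low prevalence, consider the probability that a positive result is a true positive (the positive predictive value). The sketch below applies Bayes' rule with the validation figures above and a prevalence of 0.05%, which is chosen purely for illustration; the function name is hypothetical.

```python
def positive_predictive_value(prevalence, se, sp):
    """P(infected | positive test) via Bayes' rule."""
    true_positives = prevalence * se
    false_positives = (1.0 - prevalence) * (1.0 - sp)
    return true_positives / (true_positives + false_positives)


se = 27 / 33         # sensitivity from the 11 to 15 days row
sp = 105 / 106       # 1 false positive in 106 pre-COVID specimens
prevalence = 0.0005  # illustrative assumption: 0.05%

ppv = positive_predictive_value(prevalence, se, sp)
print(f"share of positive tests that are true positives: {ppv:.1%}")
```

Under these assumptions only a small fraction of positive results would be true positives, even though the specificity is above 99%.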
Now the question becomes:
How do we propagate the uncertainty of these validation tests into our
model?
Also, imagine two validation scenarios:
Validation test using 33 samples with 27 positives detected;
Validation test using 330 samples with 270 positives detected;
Both have a sensitivity of 81.82%.
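The difference between the two scenarios shows up as soon as each result is summarised as a Beta distribution rather than a single percentage. The sketch below assumes a flat Beta(1, 1) starting point, so that x detections out of n give Beta(x + 1, n - x + 1), and compares the standard deviations using the closed-form Beta moments; the helper name is hypothetical.

```python
import math


def beta_mean_sd(alpha, beta):
    """Closed-form mean and standard deviation of a Beta(alpha, beta)."""
    total = alpha + beta
    mean = alpha / total
    variance = alpha * beta / (total ** 2 * (total + 1))
    return mean, math.sqrt(variance)


for n, x in [(33, 27), (330, 270)]:
    mean, sd = beta_mean_sd(x + 1, n - x + 1)
    print(f"n={n:3d}, detected={x:3d}: mean={mean:.3f}, sd={sd:.3f}")
```

The two means are close, but the larger validation study gives a much narrower distribution, and that narrower distribution is what should feed into the model.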
Beta prior for sensitivity
We will incorporate the uncertainty of the validation tests through the
Beta prior distribution:
N=33 w/ 27 positives detected
[Figure: PDF of the Beta prior for sensitivity]
N=330 w/ 270 positives detected
[Figure: PDF of the Beta prior for sensitivity, much narrower than in the N=33 case]
True prevalence vs Apparent Prevalence
We know how sensitivity and specificity affect the measured prevalence. The
true prevalence can be recovered with the Rogan-Gladen estimator:
$$p = \frac{\hat{p} + sp - 1}{sp + se - 1}$$
where $p$ is the true prevalence, $\hat{p}$ the apparent prevalence, $se$ the
sensitivity and $sp$ the specificity.
Rearranging terms, we can get the formula for the apparent prevalence:
$$\hat{p} = p \cdot se + (1 - p)(1 - sp)$$
Since the apparent prevalence is what we observe, we will model it as our
observed distribution.
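The relation between the two prevalences can be checked with a short round trip. This is only a sketch: the function names and the numeric values (5% true prevalence, 80% sensitivity, 99% specificity) are hypothetical and chosen for illustration, not taken from the study.

```python
def apparent_prevalence(p, se, sp):
    """Expected fraction of positive tests given true prevalence p."""
    return p * se + (1.0 - p) * (1.0 - sp)

def rogan_gladen(p_hat, se, sp):
    """Point estimate of the true prevalence from the apparent prevalence."""
    return (p_hat + sp - 1.0) / (sp + se - 1.0)

# Hypothetical values, for illustration only.
p_hat = apparent_prevalence(0.05, se=0.8, sp=0.99)
print(p_hat)                            # apparent prevalence
print(rogan_gladen(p_hat, 0.8, 0.99))   # recovers the true prevalence

# When the apparent prevalence is below 1 - sp, the point estimate is negative.
print(rogan_gladen(0.005, 0.8, 0.99))
```

The last call shows a known limitation of the point estimator: in very low prevalence settings it can return a negative value, which is one reason to model the quantities with distributions instead.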
87. Epidemiology Concepts Rapid Tests Prevalence in RS Modelling Sampling Error Modelling test kits uncertainty Q&A
Model overview
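[Diagram: the test validation results define the sensitivity prior and the specificity prior. Together with the true prevalence prior (the quantity we want to estimate), they determine the apparent prevalence, which is the observed variable.]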
89. Epidemiology Concepts Rapid Tests Prevalence in RS Modelling Sampling Error Modelling test kits uncertainty Q&A
First scenario: COVID-19 testing project
We will first define our priors for sensitivity and specificity using the
validation data:
[Figure: PDFs of the Beta(α, β) priors over the range 0.0 to 1.0. Sensitivity prior: N=33 positive samples w/ 27 positives detected. Specificity prior: N=106 negative samples w/ 1 false positive detected.]
90. Epidemiology Concepts Rapid Tests Prevalence in RS Modelling Sampling Error Modelling test kits uncertainty Q&A
Model modifications
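import pymc3 as pm

# x_se / n_se: positives detected / positive validation samples
# x_sp / n_sp: negatives correctly identified / negative validation samples
# s: survey sample size; positive: number of positive tests in the survey
with pm.Model() as model:
    # Uniform prior over the true prevalence
    true_p = pm.Beta("true_p", alpha=1., beta=1.)
    # Priors for sensitivity and specificity from the validation counts
    se_p = pm.Beta("se_p", alpha=x_se + 1,
                   beta=n_se - x_se + 1)
    sp_p = pm.Beta("sp_p", alpha=x_sp + 1,
                   beta=n_sp - x_sp + 1)
    # Apparent prevalence: true positives plus false positives
    apparent_p = pm.Deterministic("apparent_p",
                                  true_p * se_p +
                                  (1.0 - true_p) *
                                  (1.0 - sp_p))
    obs = pm.Binomial("obs", p=apparent_p,
                      n=s, observed=positive)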
We’re parametrizing the p of the Binomial(n, p) distribution with the
apparent prevalence, since that is what we observe. The n is our total
sample size, 4189 for the first wave of the study.
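To build intuition for why the observed count follows the apparent prevalence, the generative story behind the model can be simulated in plain Python. This is a sketch under stated assumptions: `simulate_survey` is a hypothetical helper, and the prevalence, sensitivity and specificity values are made up for illustration. Only the sample size of 4189 comes from the slides.

```python
import random

def simulate_survey(true_p, se, sp, n, rng):
    """Simulate one survey and return the number of positive test results."""
    positives = 0
    for _ in range(n):
        infected = rng.random() < true_p
        if infected:
            # An infected person tests positive with probability se.
            if rng.random() < se:
                positives += 1
        else:
            # A non-infected person tests positive with probability 1 - sp.
            if rng.random() < (1.0 - sp):
                positives += 1
    return positives

rng = random.Random(0)
n = 4189  # first-wave sample size from the slides

# Hypothetical values, for illustration only.
true_p, se, sp = 0.01, 0.8, 0.99

count = simulate_survey(true_p, se, sp, n, rng)
expected = true_p * se + (1.0 - true_p) * (1.0 - sp)
print("simulated positive fraction:", count / n)
print("apparent prevalence formula:", expected)
```

With these made-up numbers, more than half of the expected positives are false positives, which illustrates why specificity matters so much when the true prevalence is low.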
93. Epidemiology Concepts Rapid Tests Prevalence in RS Modelling Sampling Error Modelling test kits uncertainty Q&A
True prevalence posterior estimation
For the true prevalence, the 94% credible interval is now 0 to 15201, with
the mean at 5789 but the mode very close to zero.
[Figure: posterior of the true prevalence, x-axis from 0 to 60000. 94% HPD from 0 to 15201; mode=0; relative to the reference value 5650, 60.1% of the posterior mass lies below and 39.9% above.]
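For readers unfamiliar with the HPD interval reported here, it can be approximated from posterior samples as the narrowest window that contains the requested share of the samples. The sketch below is not the plotting library's implementation; `hpd_interval` is a hypothetical helper and the skewed Beta(1, 20) samples are made up to mimic a posterior piled up near zero.

```python
import math
import random

def hpd_interval(samples, mass=0.94):
    """Narrowest interval containing at least `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # Slide a window of k consecutive samples and keep the narrowest one.
    start = min(range(n - k + 1),
                key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[start], ordered[start + k - 1]

# Hypothetical right-skewed samples, for illustration only.
rng = random.Random(0)
draws = [rng.betavariate(1, 20) for _ in range(5000)]
low, high = hpd_interval(draws, mass=0.94)
print(low, high)
```

For a distribution with its mode at the boundary, the lower end of the interval sits next to zero, as in the posterior shown on this slide.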
95. Epidemiology Concepts Rapid Tests Prevalence in RS Modelling Sampling Error Modelling test kits uncertainty Q&A
Second scenario: Wondfo validation
We will first define our priors for sensitivity and specificity using the
validation data:
[Figure: PDFs of the Beta(α, β) priors over the range 0.0 to 1.0. Sensitivity prior: N=42 positive samples w/ 42 positives detected. Specificity prior: N=172 negative samples w/ 2 false positives detected.]
97. Epidemiology Concepts Rapid Tests Prevalence in RS Modelling Sampling Error Modelling test kits uncertainty Q&A
True prevalence posterior estimation
Now, with the Wondfo validation data, the 94% credible interval is 0 to
11372, with the mean at 4218 and the mode again very close to zero.
[Figure: posterior of the true prevalence, x-axis from 0 to 40000. 94% HPD from 0 to 11372; mode=0; relative to the reference value 5650, 73.1% of the posterior mass lies below and 26.9% above.]
99. Epidemiology Concepts Rapid Tests Prevalence in RS Modelling Sampling Error Modelling test kits uncertainty Q&A
Final remarks
Uncertainty is paramount in scientific communication;
Scientists and journalists have a moral responsibility to convey the
uncertainty;
We are discovering much more about asymptomatic and mild
SARS-CoV-2 infections. Example:
"Growing evidence suggests that asymptomatic and mild
SARS-CoV-2 infections, together comprising >95% of all infections,
may be associated with lower antibody titers than severe infections. In
addition, antibody levels peak a few weeks after infection and decay
gradually. Yet, positive controls used for determining the
sensitivity of serological assays are usually limited to samples
from hospitalized patients with severe disease, leading to what is
commonly known as spectrum bias in estimating seroprevalence in
the general population."
Source: "Are SARS-CoV-2 seroprevalence estimates biased?", Saki
Takahashi et al., 2020 (pre-print).
101. Epidemiology Concepts Rapid Tests Prevalence in RS Modelling Sampling Error Modelling test kits uncertainty Q&A
Be careful with "good news"
[Figure: Confirmed COVID-19 patients on ICUs in Porto Alegre/RS/Brazil, by date from 11-03 to 23-06 (y-axis from 0 to 92). Series: ICU Porto Alegre/RS/Brasil. Annotation: maximum since the beginning of the outbreak.]
102. Epidemiology Concepts Rapid Tests Prevalence in RS Modelling Sampling Error Modelling test kits uncertainty Q&A
R(t) estimation
103. Epidemiology Concepts Rapid Tests Prevalence in RS Modelling Sampling Error Modelling test kits uncertainty Q&A
References
Takahashi, S., Greenhouse, B., & Rodríguez-Barraquer, I. (2020, May 30). Are SARS-CoV-2
seroprevalence estimates biased? https://doi.org/10.31219/osf.io/y3fxt.
Daniel B. Larremore, Bailey K. Fosdick, Sam Zhang, Yonatan H. Grad. bioRxiv 2020.05.23.112649; doi:
https://doi.org/10.1101/2020.05.23.112649.
Natalie E. Dean. Commentary: COVID-19 Data Dives: The Takeaways From Seroprevalence Surveys.
2020. https://www.medscape.com/viewarticle/929861.
Jeffrey D. Whitman, et al. Test performance evaluation of SARS-CoV-2 serological assays. medRxiv
2020.04.25.20074856; doi: https://doi.org/10.1101/2020.04.25.20074856.
Governo do Rio Grande do Sul. Pesquisa de Prevalência. https://bit.ly/3dlUzVZ.
CDC, Centers for Disease Control and Prevention. COVID-19 Serology Surveillance Strategy. 2020.
https://bit.ly/3df4wEw
M. J. Vilar, et al. Bayesian Estimation of the True Prevalence and of the Diagnostic Test Sensitivity
and Specificity of Enteropathogenic Yersinia in Finnish Pig Serum Samples. 2015.
https://bit.ly/3egUV1c
104. Epidemiology Concepts Rapid Tests Prevalence in RS Modelling Sampling Error Modelling test kits uncertainty Q&A
References
Christian S. Perone. Nota sobre o estudo da UFPel no Rio Grande do Sul. 2020.
http://blog.christianperone.com/2020/04/nota-ufpel-covid.
Christian S. Perone. COVID-19 R(t) estimates for states in Brazil and Portugal. 2020.
https://perone.github.io/covid19analysis/.
Timothy L. Lash, et al. Applying Quantitative Bias Analysis to Epidemiologic Data. Book, 2009.
https://link.springer.com/book/10.1007/978-0-387-87959-8.
Benjamin D. Brody, Sharon J. Parish, Dora Kanellopoulos, Mark J. Russ. (2020) A COVID-19
Testing and Triage Algorithm for Psychiatric Units: One Hospital’s Response to the New York
Region’s Pandemic. Psychiatry Research, 113244. https://bit.ly/314cJJc.
Eran Bendavid, et al. COVID-19 Antibody Seroprevalence in Santa Clara County, California. 2020.
medRxiv 2020.04.14.20062463; doi: https://doi.org/10.1101/2020.04.14.20062463.
W. J. Rogan and B. Gladen. Estimating prevalence from the results of a screening test. American
Journal of Epidemiology, vol. 107, no. 1, pp. 71–76, 1978.
105. Epidemiology Concepts Rapid Tests Prevalence in RS Modelling Sampling Error Modelling test kits uncertainty Q&A
Section VI
Q&A
106. Epidemiology Concepts Rapid Tests Prevalence in RS Modelling Sampling Error Modelling test kits uncertainty Q&A
Q&A