Bayesian modelling for COVID-19 seroprevalence studies

A short tutorial on how to model sampling error and test kit uncertainty for COVID-19 seroprevalence studies.

Published in: Science

Bayesian modelling for COVID-19 seroprevalence studies

  1. 1. Epidemiology Concepts Rapid Tests Prevalence in RS Modelling Sampling Error Modelling test kits uncertainty Q&A Bayesian modelling for COVID-19 seroprevalence studies Christian S. Perone christian.perone@gmail.com https://perone.github.io/covid19analysis June 21, 2020
  2. 2. Who Am I: Christian S. Perone
     • BSc Computer Science (UPF, Brazil)
     • MSc Biomedical Engineering (Polytechnique Montreal/UdeM, Canada)
     • Blog: http://blog.christianperone.com
     • COVID-19 Analyses: https://perone.github.io/covid19analysis/
     • Twitter: @tarantulae
  3. 3. Agenda
     • Epidemiology Concepts: incidence and prevalence; prevalence studies
     • Rapid Tests: rapid testing kits; dynamics of the infection
     • Prevalence in RS: details of the study; results from first wave; scientific communication
     • Modelling Sampling Error: Bayesian framework; sampling error
     • Modelling test kits uncertainty: test validation properties; true prevalence vs apparent prevalence; final remarks; references
     • Q&A
  4. 4. Section I: Epidemiology Concepts
  6. 6. Incidence and Prevalence: two concepts that are often confused.
     • Incidence: the proportion or rate of persons who develop a condition during a particular time period.
     • Prevalence: the proportion of persons who have a condition at or during a particular time period.
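To make the distinction concrete, here is a minimal sketch with made-up numbers for a hypothetical town (none of these figures come from the study): incidence counts new cases over a period relative to the people still at risk, while point prevalence counts existing cases at one moment.

```python
def incidence_proportion(new_cases, population_at_risk):
    """New cases arising during a period, divided by the people at risk at its start."""
    return new_cases / population_at_risk


def point_prevalence(existing_cases, population):
    """Existing cases at a given moment, divided by the whole population."""
    return existing_cases / population


# Hypothetical town (illustrative numbers only)
population = 10_000
existing_cases_at_start = 50
new_cases_in_month = 30

# People who already have the condition are no longer at risk of developing it
inc = incidence_proportion(new_cases_in_month, population - existing_cases_at_start)
prev = point_prevalence(existing_cases_at_start, population)

print(f"incidence over the month:  {inc:.4%}")
print(f"point prevalence at start: {prev:.4%}")
```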
  7. 7. Epidemiologist Bathtub
  8. 8. This is incidence for Porto Alegre/RS. [Figure: confirmed cases (0 to 60) by symptom onset date, 25/02 to 24/06.] Be careful with the right-censoring trap: the most recent dates look artificially low because cases with recent symptom onset have not all been confirmed yet. This is often used to convey misleading messages to the public.
  9. 9. Symptom Onset vs Confirmation, Porto Alegre/RS. [Figure: symptom onset date (25/02 to 16/06) against confirmation date (10/03 to 23/06).]
  10. 10. Example of right-censoring adjustment
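One simple way to think about such an adjustment, sketched here under strong assumptions and not necessarily the method used for the slide: if we assume we know the probability that a case is confirmed within a given number of days after symptom onset, we can divide each recent count by that probability. The counts and the delay distribution below are hypothetical.

```python
def adjust_for_right_censoring(observed_by_days_ago, delay_cdf):
    """Inflate recent counts by the assumed probability of already being confirmed.

    observed_by_days_ago[i]: confirmed cases whose symptom onset was i days ago.
    delay_cdf[i]: assumed probability that a case is confirmed within i days of onset.
    """
    adjusted = []
    for days_ago, count in enumerate(observed_by_days_ago):
        # Onsets older than the last CDF entry are treated as fully reported
        p_reported = delay_cdf[min(days_ago, len(delay_cdf) - 1)]
        adjusted.append(count / p_reported if p_reported > 0 else float("nan"))
    return adjusted


# Hypothetical data: index 0 is today, index 4 is four days ago
observed = [1, 3, 6, 9, 10]
delay_cdf = [0.1, 0.3, 0.6, 0.9, 1.0]   # hypothetical reporting-delay CDF

adjusted = adjust_for_right_censoring(observed, delay_cdf)
print(adjusted)
```

In this toy example the raw counts fall sharply towards today, while the adjusted counts are flat: the apparent decline is entirely a reporting artefact.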
  11. 11. Prevalence studies
     • Initial surveillance focuses primarily on patients with severe disease.
     • Porto Alegre/RS is no different: the protocol is to test all severe cases (SRAG, Síndrome Respiratória Aguda Grave, i.e. severe acute respiratory syndrome).
     • The protocol also covers testing of health workers, risk groups (nursing homes, etc.) and local outbreaks.
     • RT-PCR is used between the 2nd and 6th day after symptom onset, and antibody tests after the 14th day of symptoms (SMS/POA).
     Source: Boletim Epidemiológico / Secretaria Municipal de Saúde de Porto Alegre (SMS/POA).
  14. 14. Prevalence studies
     • The extent and fraction of mild or asymptomatic infections are not clear. We know that many infections are being missed, so what proportion of the total population has been infected?
     • WHO designed a protocol to investigate the extent of infection, as determined by seropositivity in the general population, in any country in which COVID-19 virus infection has been reported: "Population-based age-stratified seroepidemiological investigation protocol for COVID-19 virus infection" (March 2020).
     • "Critical to remember that serosurveys are population-level surveys. They are intended to inform our broader understanding of the disease, not to tell individuals whether they have or have not been infected." (COVID-19 Data Dives: The Takeaways From Seroprevalence Surveys, Natalie Dean, Medscape, 2020.)
  17. 17. Taxonomy of different studies: there are two main groups of studies, experimental and observational.
     • Experimental: studies based on an intervention. The current gold standard for assessing causal effects is the RCT (Randomized Controlled Trial), where one group receives the intervention while the control group receives nothing or an inactive placebo.
     • Observational: we want to observe the effect of a risk factor, diagnostic test, etc., without trying to change who is or isn't exposed to it. Seroprevalence studies are observational studies: they use observational data collected with a diagnostic test.
  21. 21. Options for seroprevalence studies: the WHO protocol recommends three possible study designs.
     • Cross-sectional: analyzes data from a population, or a representative subset, at a specific point in time.
     • Repeated cross-sectional: also called a trend study, it repeats the cross-sectional collection at different points in time.
     • Longitudinal: the same individuals are evaluated at regular follow-ups.
     The most common and easiest to implement are the cross-sectional and repeated cross-sectional designs. The Brazilian government has funded both state-level and national-level repeated cross-sectional studies.
  22. 22. Study designs. [Figure: cross-sectional (time t), repeated cross-sectional (time t and time t + Δ), longitudinal (time t and time t + Δ).]
  24. 24. Prevalence Studies in Brazil: this presentation focuses on the repeated cross-sectional studies conducted in Rio Grande do Sul (state level), carried out by UFPel and its partners. "It's easier to poke holes in a study than to run one yourself." (COVID-19 Data Dives: The Takeaways From Seroprevalence Surveys, Natalie E. Dean, Medscape, May 2020.)
  25. 25. Focus of this talk
     • This talk focuses on uncertainty modelling, using a Bayesian framework, for the sampling error and the test kit validation properties.
     • It does not cover probability sampling adjustments. Most of the time the characteristics of the populations and sub-populations are known beforehand (from the census), and a weighting scheme can adjust for non-respondents, demographics that differ from the census, etc.
     Source: Sampling and Weighting: A Better Practice Guide for Practitioners, by Data Analysis Australia.
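Although the talk does not go into weighting, a tiny sketch may help show what such an adjustment does. The example below uses post-stratification, which is only one of several possible weighting schemes and is not claimed to be the one used in the study; the strata, counts and census shares are all hypothetical.

```python
def poststratified_proportion(sample, census_share):
    """Weight each stratum's observed proportion by its known share of the population.

    sample: {stratum: (positives, respondents)}
    census_share: {stratum: share of the population, summing to 1}
    """
    estimate = 0.0
    for stratum, (positives, respondents) in sample.items():
        estimate += census_share[stratum] * positives / respondents
    return estimate


# Hypothetical survey in which both age groups were sampled equally...
sample = {"under_40": (2, 100), "40_plus": (6, 100)}
# ...although the (hypothetical) census says the groups are not equally large
census_share = {"under_40": 0.7, "40_plus": 0.3}

raw = sum(p for p, _ in sample.values()) / sum(n for _, n in sample.values())
weighted = poststratified_proportion(sample, census_share)

print(f"raw proportion:      {raw:.3f}")
print(f"weighted proportion: {weighted:.3f}")
```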
  26. 26. Section II: Rapid Tests
  27. 27. SARS-CoV-2 rapid tests: many brands are available on the market, and most follow the same pattern of IgM/IgG/control lines.
  28. 28. Wondfo Test Kit
  29. 29. 1) Capillary blood/serum/plasma collection
  30. 30. 2) Buffer Solution
  31. 31. 3) Reading results
  32. 32. Dynamics of the infection: many of these intervals still have a lot of uncertainty.
  33. 33. Section III: Prevalence in RS
  36. 36. Seroprevalence study in Rio Grande do Sul
     • The study was designed and developed by UFPel and its collaborators.
     • Funded by the federal government (test kits from the Ministry of Health).
     • Planned waves: 11-13 April, N=4500; 25-27 April, N=4500; 09-11 May, N=4500; 23-25 May, N=4500. Later extended up to August.
     • Used Wondfo COVID-19 test kits.
     • Subjects were also interviewed regarding social distancing.
  37. 37. Seroprevalence study in Rio Grande do Sul: the study was conducted in 9 sentinel cities: 1. Porto Alegre, 2. Canoas, 3. Pelotas, 4. Caxias do Sul, 5. Santa Cruz do Sul, 6. Santa Maria, 7. Passo Fundo, 8. Ijuí, 9. Uruguaiana. Population of Rio Grande do Sul: 11.3 million; the sampled cities hold 31% of the state's population. Source: slides presented at the government press conference.
  38. 38. Seroprevalence study in Rio Grande do Sul: respondents in the first wave, by city:
     City                 Respondents
     Caxias do Sul        500
     Ijuí                 500
     Passo Fundo          500
     Pelotas              500
     Santa Cruz do Sul    500
     Uruguaiana           500
     Santa Maria          461
     Porto Alegre         396
     Canoas               332
     TOTAL                4,189
     Source: slides presented at the government press conference.
  39. 39. Seroprevalence study in Rio Grande do Sul: this is the key slide with the results presented at the press conference for the first wave of the survey (11-13 April):
     • Valid tests: 4,189
     • Positive tests: 2
     • % with antibodies: 0.05%
     • 1 infected person per 2,000 inhabitants
     • People with antibodies in RS: 5,650
     Source: slides presented at the government press conference.
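The arithmetic behind the headline figure can be retraced in a few lines. This is only a sketch of how the number appears to have been obtained (an assumption on my part): multiplying the rounded 0.05% by the 11.3 million inhabitants quoted on the earlier slide reproduces 5,650, while the unrounded proportion 2/4189 gives a somewhat smaller number.

```python
valid_tests = 4189
positive_tests = 2
rs_population = 11_300_000   # 11.3 million, as quoted on the sentinel-cities slide

# Apparent prevalence: share of valid tests that were positive
apparent_prevalence = positive_tests / valid_tests
# The press-conference slide reports it rounded to 0.05%
rounded_prevalence = round(apparent_prevalence, 4)

# Scaling each version up to the state population
estimate_from_rounded = round(rounded_prevalence * rs_population)
estimate_unrounded = round(apparent_prevalence * rs_population)

print(f"apparent prevalence: {apparent_prevalence:.5%}")
print(f"estimate from rounded 0.05%: {estimate_from_rounded}")
print(f"estimate without rounding:   {estimate_unrounded}")
```

Either way, this is a single point estimate built from two positive tests, which is exactly what the rest of the talk questions.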
  44. 44. News after the press conference: after these results were presented, many news articles were published:
     • Revista Pesquisa Fapesp: "Rio Grande do Sul may have 7.5 times more cases than confirmed"
     • UFPel: "Of the 4,189 tests validated by the researchers, two tested positive; about 0.05% of the state's population must have been infected."
     • Jornal O Nacional: "Study estimates that RS has 5,650 people infected with Covid-19"
     • Diário da Manhã: "Study estimates that RS has more than 5,600 people infected with Covid-19"
     • UNISC news: "(...) it is estimated that there is one infected person for every two thousand inhabitants of the state, so there may be 5,650 people (...)"
  45. 45. Rule Number I: these are principles from the Royal Statistical Society (Data Science section). The first one is: "Rule 1. Scientists and journalists should express the level of uncertainty associated with a forecast. All mathematical models contain uncertainty. This should be explicit: researchers should communicate their own certainty that a result is true. A range of plausible results should be provided, not just one extreme result." (All models are wrong, but some are completely wrong. Royal Statistical Society, Data Science section.)
  46. 46. Section IV: Modelling Sampling Error
  49. 49. Prior, likelihood and posterior: $p(\theta \mid X) \propto p(X \mid \theta)\,\pi(\theta)$, where $p(\theta \mid X)$ is the posterior, $p(X \mid \theta)$ the likelihood and $\pi(\theta)$ the prior.
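The proportionality between posterior, likelihood and prior can be made concrete with a grid approximation. This is a minimal standard-library sketch, not code from the talk: the data (6 successes in 9 trials) and the grid size are hypothetical, and the prior is assumed flat.

```python
from math import comb


def grid_posterior(k, n, grid_size=101):
    """Posterior over a grid of theta values: normalise likelihood times prior."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0] * grid_size                       # flat prior (assumption)
    likelihood = [comb(n, k) * t**k * (1 - t) ** (n - k) for t in grid]
    unnormalised = [l * p for l, p in zip(likelihood, prior)]
    total = sum(unnormalised)
    return grid, [u / total for u in unnormalised]


# Hypothetical data: 6 successes in 9 trials
grid, posterior = grid_posterior(k=6, n=9)
mode = grid[max(range(len(grid)), key=lambda i: posterior[i])]

print(f"posterior mode on the grid: {mode:.2f}")
```

The normalising constant is just the sum over the grid, which is why the formula only needs to hold up to proportionality.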
  50. 50. [Figure: three examples of prior × likelihood ∝ posterior, each plotted over 0 to 1.] Source: Statistical Rethinking, Winter 2019, Richard McElreath.
  51. 51. Multiple explanations for phenomena. [Figure.] Source: Ian Osband et al., Using Randomized Prior Functions for Deep Reinforcement Learning, NIPS 2018. Image from: http://blog.christianperone.com
  52. 52. Sampling error: the first uncertainty we want to model is the sampling error. We are using a small sample of a population to make inferences about the whole population. Image from: https://towardsdatascience.com/data-samples-and-error-visualization-techniques-832c4a7fbcb2
  54. 54. Prior for prevalence proportion: we want a non-informative prior for the prevalence proportion, so we use a Beta(α, β) distribution with α = 1.0 and β = 1.0. Beta(1.0, 1.0) is a flat prior with support x ∈ (0, 1). [Figure: PDF of the prior Beta(α, β) for prevalence, constant at 1.0 over 0 to 1.] You can also incorporate prior knowledge here: for example, if you know that the prevalence must be higher than 0.01% (as the state was already running tests), you can encode that in the prior.
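To see why Beta(1, 1) is flat, it helps to evaluate the Beta density directly. The sketch below uses only the standard library; the Beta(2, 5) comparison is a hypothetical example of a non-flat prior and is not a prior used in the talk.

```python
from math import exp, lgamma, log


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for 0 < x < 1, computed on the log scale."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(x) + (b - 1) * log(1 - x))


xs = [0.1, 0.3, 0.5, 0.7, 0.9]

# Beta(1, 1): the density is 1 everywhere, i.e. every prevalence is equally plausible
flat = [beta_pdf(x, 1.0, 1.0) for x in xs]
# Beta(2, 5): hypothetical prior that puts more mass on low prevalences
skewed = [beta_pdf(x, 2.0, 5.0) for x in xs]

print("Beta(1, 1):", [round(v, 3) for v in flat])
print("Beta(2, 5):", [round(v, 3) for v in skewed])
```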
  57. 57. The likelihood: we now choose the likelihood for the observed data. The outcome of the COVID-19 tests is the number of successes (positive tests) in a sequence of n independent experiments, which is by definition the Binomial(n, p) distribution.
         import pymc3 as pm
         n = 4189       # valid tests in the first wave
         positive = 2   # positive tests in the first wave
         with pm.Model() as model:
             true_p = pm.Beta("true_p", alpha=1, beta=1)  # flat prior on prevalence
             obs = pm.Binomial("obs", p=true_p, n=n, observed=positive)
     Model in plate notation: [Figure.]
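For this particular model there is also a closed-form check, because the Beta prior is conjugate to the Binomial likelihood: the posterior is again a Beta distribution. The sketch below is a cross-check under that standard result, not the approach taken in the talk (which samples with MCMC); the helper name is hypothetical.

```python
def beta_binomial_update(alpha, beta, positives, n):
    """Conjugate update: Beta(alpha, beta) prior + Binomial data -> Beta posterior."""
    return alpha + positives, beta + (n - positives)


# Flat Beta(1, 1) prior, 2 positive tests out of 4189 valid tests
a_post, b_post = beta_binomial_update(1, 1, positives=2, n=4189)

posterior_mean = a_post / (a_post + b_post)
posterior_mode = (a_post - 1) / (a_post + b_post - 2)

print(f"posterior: Beta({a_post}, {b_post})")
print(f"posterior mean: {posterior_mean:.5%}")
print(f"posterior mode: {posterior_mode:.5%}")
```

With a flat prior the posterior mode coincides with the raw proportion 2/4189, which is why the point estimate alone looks unchanged; the difference is that we now have a whole distribution around it.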
  59. 59. Prior predictive distribution: let's now check the prior predictive distribution by sampling from the model (what is possible in the absence of evidence):
         with model:
             ppc = pm.sample_prior_predictive(samples=10000, var_names=["true_p"])
     [Figure: histogram of the prior predictive distribution for 'true_p', over 0 to 1.]
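The same check can be sketched without PyMC3 by drawing directly from the prior with the standard library. This is an illustrative stand-in rather than the talk's code; the bin count, seed and helper names are arbitrary choices of mine.

```python
import random


def sample_beta_prior(alpha, beta, draws, seed=0):
    """Draw prevalence values from a Beta(alpha, beta) prior."""
    rng = random.Random(seed)
    return [rng.betavariate(alpha, beta) for _ in range(draws)]


def histogram(samples, bins=20):
    """Count samples in equal-width bins over [0, 1]."""
    counts = [0] * bins
    for s in samples:
        counts[min(int(s * bins), bins - 1)] += 1
    return counts


# Flat prior: every bin should hold roughly draws / bins samples
flat_draws = sample_beta_prior(1.0, 1.0, draws=10_000)
flat_counts = histogram(flat_draws)

print(flat_counts)
```

Before seeing any data, the flat prior treats a prevalence of 90% as just as plausible as 1%, which is worth keeping in mind when deciding whether it is really the prior you want.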
  61. 61. Prior predictive distribution: what would happen to the prior predictive distribution if we used a different Beta prior? [Figure: PDF of a different prior Beta(α, β) for prevalence, alongside the resulting prior predictive distribution for 'true_p'.]
  64. 64. Finally the MCMC sampling: let's now estimate the posterior distribution:
         with model:
             trace = pm.sample(draws=10000, tune=4000, cores=2, target_accept=0.85)
         pm.traceplot(trace)
     [Figure: trace plot for true_p; the posterior density lies between 0.0000 and 0.0030.]
  66. 66. Looking at the posterior: let's visualize the posterior distribution and compute intervals:
         pm.plot_posterior(trace, ref_val=5650, point_estimate="mode")
     [Figure: posterior on the population scale (0 to 35,000 people): mode = 5518, 94% HPD from 746 to 16,290; 35.5% of the posterior mass lies below the reference value 5650 and 64.5% above.]
     The mode is quite similar to the reported value. However, the 94% credible interval runs from 746 to 16,290, showing the huge uncertainty that results from the survey design.
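A rough cross-check of this interval is possible without MCMC, assuming the conjugate posterior Beta(3, 4188) (flat prior, 2 positives in 4189 tests) and the 11.3 million population figure from the earlier slide. The sketch draws from that posterior with the standard library and computes a sample-based highest-density interval; it is not the talk's code and will not reproduce the plotted numbers exactly.

```python
import random


def hdi(samples, mass=0.94):
    """Narrowest interval containing `mass` of the samples."""
    s = sorted(samples)
    n = len(s)
    width = int(mass * n)
    # Slide a window of `width` samples and keep the narrowest one
    best = min(range(n - width), key=lambda i: s[i + width] - s[i])
    return s[best], s[best + width]


rng = random.Random(42)
population = 11_300_000          # state population quoted on the slides

# Posterior draws of prevalence, scaled to number of people in the state
draws = [rng.betavariate(3, 4188) * population for _ in range(20_000)]
low, high = hdi(draws)

print(f"94% HDI: {low:,.0f} to {high:,.0f} people")
```

The interval spans more than an order of magnitude, in line with the message of the slide: two positive tests cannot pin down the number of infected people to a single figure.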
  69. 69. Perfect test?
     • We went from a single point estimate of 5650 to an entire posterior distribution with a 94% credible interval of 746 to 16,290.
     • These results assume a perfect test that never gives a False Positive (FP) or a False Negative (FN).
     • Such a test doesn't exist, so the prevalence we are measuring is the apparent prevalence and not the true prevalence. We therefore have another uncertainty to model: the validation properties of the test kit employed.
  70. 70. Section V: Modelling test kits uncertainty
  73. 73. Sensitivity and Specificity: the two statistical quantities used to measure the performance of diagnostic tests.
     • Sensitivity: probability of a positive test given that the patient has COVID-19. $SE = TP / (TP + FN)$
     • Specificity: probability of a negative test given that the patient is well. $SP = TN / (TN + FP)$
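The two definitions translate directly into code. As an illustration, the sketch below plugs in counts quoted later in the talk from the COVID-19 Testing Project (27 of 33 PCR-positive specimens detected at 11-15 days, and 1 false positive among 106 pre-pandemic specimens); treating those counts as TP/FN and TN/FP is my reading of the slides.

```python
def sensitivity(tp, fn):
    """Share of truly infected people that the test flags as positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of truly uninfected people that the test flags as negative."""
    return tn / (tn + fp)


# 33 PCR-positive specimens, 27 detected -> 6 false negatives
se = sensitivity(tp=27, fn=6)
# 106 pre-pandemic specimens, 1 false positive -> 105 true negatives
sp = specificity(tn=105, fp=1)

print(f"sensitivity: {se:.2%}")
print(f"specificity: {sp:.2%}")
```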
  74. 74. Wondfo rapid test validation: rapid test manufacturers usually validate their tests under ideal conditions, so the results are often very optimistic. For the "venous whole blood, serum and plasma" sample, Wondfo reports a sensitivity of 86.43% and a specificity of 99.57%.
  75. 75. Wondfo rapid test validation: for the "fingerstick whole blood" sample, Wondfo reports a sensitivity of 100.00% and a specificity of 98.84%.
  76. 76. Wondfo rapid test in Brazil: the curious case of the missing table (thanks to Ricardo Parolin Schnekenberg).
  77. 77. The COVID-19 Testing Project: an independent testing project for rapid tests.
     • Site: https://covidtestingproject.org/
     • Paper: https://www.medrxiv.org/content/10.1101/2020.04.25.20074856v2 (pre-print)
     • Evaluates tests from many different suppliers.
     • The specimen set comprised: plasma or serum samples from symptomatic SARS-CoV-2 RT-PCR-positive individuals; pre-COVID-19 negative controls; and recent samples from individuals who underwent respiratory viral testing but were not diagnosed with COVID-19.
  78. 78. [Figure: "Antibody Detection in Samples from SARS-CoV-2 PCR+ Individuals": test results (negative, positive, NA) for IgM, IgG and IgG and/or IgM by days since onset (1-5d, 6-10d, 11-15d, 16-20d, >20d) for each supplier: Biomedomics, Bioperfectus, DecomBio, DeepBlue, Innovita, Premier, Sure-Bio, UCP Biosciences, VivaDiag, Wondfo.]
  79. 79. [Figure: "Specificity": test results (negative, positive, NA) for IgM, IgG and IgG and/or IgM for each supplier: Biomedomics, Bioperfectus, DecomBio, DeepBlue, Innovita, Premier, Sure-Bio, UCP Biosciences, VivaDiag, Wondfo, Epitope ELISA.]
  80. 80. Results for Wondfo, IgM or IgG:
     Days since onset   Total N   Positive   %
     1-5 days           25        10         40.00
     6-10 days          36        24         66.67
     11-15 days         33        27         81.82
     16-20 days         21        17         80.95
     >20 days           11        9          81.82
     Table: percentage of positive specimens from patients with a positive RT-PCR. Source: COVID-19 Testing Project.
  81. 81. Epidemiology Concepts Rapid Tests Prevalence in RS Modelling Sampling Error Modelling test kits uncertainty Q&A Results for Wondfo Sensitivity of the test seems low, increasing the chance of False Negatives (FN); Specificity on the other hand seems quite good: From 106 blood donor plasma specimens collected before July 2018, the test gave only 1 False Positive (FP); However, in very low prevalence scenarios, even a good specificity can be low;
  82. Results for Wondfo. The sensitivity of the test seems low, which increases the chance of false negatives (FN). Specificity, on the other hand, seems quite good: from 106 blood donor plasma specimens collected before July 2018, the test gave only 1 false positive (FP). However, in very low prevalence scenarios, even a good specificity may not be good enough, because false positives can outnumber true positives. Now the question becomes: how do we propagate the uncertainty of these validation tests into our modelling? Also, imagine two validation scenarios: (a) a validation test using 33 samples with 27 positives detected; (b) a validation test using 330 samples with 270 positives detected. Both have a sensitivity of 81.82%, but the second is based on ten times more data.
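The remark about low-prevalence scenarios can be made concrete with a small sketch of the positive predictive value (the probability that a positive result is a true positive). The sensitivity and specificity below are point estimates built from the validation counts quoted on the slide; the prevalence values are hypothetical and chosen only for illustration.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that a positive test result is a true positive (Bayes' rule)."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


# Point estimates from the validation counts quoted on the slide.
sensitivity = 27 / 33      # 27 positives detected among 33 known positives
specificity = 105 / 106    # 1 false positive among 106 known negatives

# Hypothetical prevalence values (assumptions, not survey results).
for prevalence in (0.001, 0.01, 0.10):
    ppv = positive_predictive_value(prevalence, sensitivity, specificity)
    print(f"prevalence={prevalence:.3f}  PPV={ppv:.3f}")
```

With these point estimates, most positives are false positives when the prevalence is around 0.1%, while at 10% prevalence most positives are real. This ignores the uncertainty in the validation counts themselves, which is the subject of the following slides.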
  84. Beta prior for sensitivity. We will incorporate the uncertainty of the validation tests through the Beta prior distribution. [Figure: PDF of the Beta(α, β) prior for sensitivity, for N=33 with 27 positives detected.] [Figure: PDF of the Beta(α, β) prior for sensitivity, for N=330 with 270 positives detected; the density is much more concentrated.]
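As a rough sketch of why the two validation scenarios lead to different priors, the snippet below computes the mean and standard deviation of a Beta prior in closed form. It assumes the prior is Beta(x + 1, n − x + 1), where x is the number of positives detected among n validation samples (that is, a uniform Beta(1, 1) updated with the validation counts); the function name is hypothetical.

```python
import math


def beta_prior_summary(n_samples, n_detected):
    """Mean and standard deviation of Beta(n_detected + 1, n_samples - n_detected + 1)."""
    a = n_detected + 1
    b = n_samples - n_detected + 1
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(variance)


small_mean, small_sd = beta_prior_summary(33, 27)     # N=33 with 27 detected
large_mean, large_sd = beta_prior_summary(330, 270)   # N=330 with 270 detected

print(f"N=33:  mean={small_mean:.3f}  sd={small_sd:.3f}")
print(f"N=330: mean={large_mean:.3f}  sd={large_sd:.3f}")
```

Both priors are centred close to the same sensitivity, but the one built from 330 samples is considerably narrower, which is the information a single point estimate of 81.82% throws away.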
  86. True prevalence vs apparent prevalence. We know how sensitivity and specificity affect the observed prevalence (the Rogan-Gladen estimator):
      $$p = \frac{\hat{p} + sp - 1}{sp + se - 1}$$
      where $p$ is the true prevalence, $\hat{p}$ the apparent prevalence, $se$ the sensitivity and $sp$ the specificity. Rearranging terms, we get the formula for the apparent prevalence:
      $$\hat{p} = p \cdot se + (1 - p)(1 - sp)$$
      Since the apparent prevalence is what we observe, we will model it as our observed distribution.
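The two formulas above are inverses of each other, which a short sketch can check numerically. The function names are hypothetical, the sensitivity and specificity are point estimates from the validation counts quoted earlier in the deck, and the prevalence values are made up for illustration.

```python
def apparent_prevalence(true_p, se, sp):
    """Expected fraction of positive tests given the true prevalence."""
    return true_p * se + (1.0 - true_p) * (1.0 - sp)


def rogan_gladen(apparent_p, se, sp):
    """Rogan-Gladen point estimate of the true prevalence."""
    return (apparent_p + sp - 1.0) / (sp + se - 1.0)


se = 27 / 33      # point estimate of sensitivity
sp = 105 / 106    # point estimate of specificity

# Round trip with a hypothetical true prevalence of 2%.
p_hat = apparent_prevalence(0.02, se, sp)
print(rogan_gladen(p_hat, se, sp))

# When the apparent prevalence is below the false positive rate (1 - sp),
# the point estimate becomes negative.
print(rogan_gladen(0.005, se, sp))
```

The second call illustrates a known limitation of the point estimator: it can return a negative prevalence when the observed positive rate is lower than the expected false positive rate. Modelling the apparent prevalence inside a Bayesian model with a prior on the true prevalence restricted to [0, 1] avoids this.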
  87. Model overview. [Diagram: the test validation results feed the sensitivity prior and the specificity prior; these, together with the true prevalence prior, determine the apparent prevalence, which is observed. The true prevalence is what we want to estimate.]
  89. First scenario: COVID-19 testing project. We first define our priors for sensitivity and specificity from the validation counts. [Figure: PDF of the Beta(α, β) prior for sensitivity, using N=33 positive samples with 27 positives detected.] [Figure: PDF of the Beta(α, β) prior for specificity, using N=106 negative samples with 1 false positive detected.]
  90. Model modifications
        import pymc3 as pm

        # x_se of n_se known positives were detected; x_sp of n_sp known
        # negatives tested negative; `positive` of `s` survey samples tested positive.
        with pm.Model() as model:
            true_p = pm.Beta("true_p", alpha=1., beta=1.)  # uniform prior
            se_p = pm.Beta("se_p", alpha=x_se + 1, beta=n_se - x_se + 1)
            sp_p = pm.Beta("sp_p", alpha=x_sp + 1, beta=n_sp - x_sp + 1)
            apparent_p = pm.Deterministic(
                "apparent_p",
                true_p * se_p + (1.0 - true_p) * (1.0 - sp_p))
            obs = pm.Binomial("obs", p=apparent_p, n=s, observed=positive)
      We are parametrizing the p of the Binomial(n, p) distribution. This p is the apparent prevalence, which is what we observe. The n is our total sample size, 4189 in this case, for the first wave of the study.
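Before running any sampler, it can help to look at what the priors alone imply for the apparent prevalence. The following is a minimal sketch using only Python's standard library, not the PyMC3 model itself. It assumes the first-scenario validation counts (27 of 33 positives detected, and 105 of 106 negatives correctly classified, i.e. 106 minus the 1 false positive); the function name is hypothetical.

```python
import random


def sample_prior_apparent_prevalence(n_se, x_se, n_sp, x_sp, rng):
    """Draw one apparent prevalence from the priors used in the model."""
    true_p = rng.betavariate(1.0, 1.0)                    # uniform prior
    se_p = rng.betavariate(x_se + 1, n_se - x_se + 1)     # sensitivity prior
    sp_p = rng.betavariate(x_sp + 1, n_sp - x_sp + 1)     # specificity prior
    return true_p * se_p + (1.0 - true_p) * (1.0 - sp_p)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
draws = [
    sample_prior_apparent_prevalence(33, 27, 106, 105, rng)
    for _ in range(10_000)
]
prior_mean = sum(draws) / len(draws)
print(f"prior mean of apparent prevalence: {prior_mean:.3f}")
```

Because the prior on the true prevalence is uniform, the implied apparent prevalence is spread widely before seeing the survey data; the Binomial likelihood is what pulls it towards the observed positive rate.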
  91. MCMC sampling
        with model:
            trace = pm.sample(draws=60000, tune=4000, cores=4, target_accept=0.95)
            pm.traceplot(trace);
      [Figure: trace plots (posterior density and sampled values per draw) for true_p, se_p, sp_p and apparent_p.]
  92. Sensitivity vs Specificity posterior
        pm.pairplot(trace, var_names=["se_p", "sp_p"], kind="kde")
      [Figure: joint KDE of the posterior. x-axis: Sensitivity (about 0.5 to 0.9); y-axis: Specificity (about 0.9960 to 0.9995).]
  93. True prevalence posterior estimation. For the true prevalence, the 94% credible interval (HPD) now ranges from 0 to 15201, with the mean at 5789 and the mode very close to zero. [Figure: posterior of the true prevalence; 94% HPD from 0 to 15201, mode=0; 60.1% of the posterior mass lies below 5650 and 39.9% above.]
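The two summaries shown in the plot (a 94% HPD interval and the share of the posterior below a reference value) can be computed from posterior samples along these lines. This is a sketch with hypothetical function names, and `fake_posterior` is synthetic stand-in data (draws from an exponential distribution), not the trace from the slides.

```python
import math
import random


def hpd_interval(samples, mass=0.94):
    """Shortest interval containing `mass` of the samples (unimodal case)."""
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[start], ordered[start + k - 1]


def fraction_below(samples, reference):
    """Share of samples strictly below a reference value."""
    return sum(1 for x in samples if x < reference) / len(samples)


rng = random.Random(1)
fake_posterior = [rng.expovariate(1.0) for _ in range(20_000)]  # synthetic data

low, high = hpd_interval(fake_posterior)
print(f"94% HPD: [{low:.3f}, {high:.3f}]")
print(f"share below 1.0: {fraction_below(fake_posterior, 1.0):.3f}")
```

For a right-skewed posterior with its mode near zero, like the one on the slide, the HPD interval starts at (or very near) zero rather than cutting off an equal tail on each side, which is why it is preferred here over an equal-tailed interval.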
  95. Second scenario: Wondfo validation. We again define our priors for sensitivity and specificity from the validation counts. [Figure: PDF of the Beta(α, β) prior for sensitivity, using N=42 positive samples with 42 positives detected.] [Figure: PDF of the Beta(α, β) prior for specificity, using N=172 negative samples with 2 false positives detected.]
  96. Sensitivity vs Specificity posterior
        pm.pairplot(trace, var_names=["se_p", "sp_p"], kind="kde")
      [Figure: joint KDE of the posterior. x-axis: Sensitivity (about 0.75 to 0.95); y-axis: Specificity (about 0.996 to 0.999).]
  97. True prevalence posterior estimation. Now, with the Wondfo validation data, the 94% credible interval (HPD) ranges from 0 to 11372, with the mean at 4218 and the mode again very close to zero. [Figure: posterior of the true prevalence; 94% HPD from 0 to 11372, mode=0; 73.1% of the posterior mass lies below 5650 and 26.9% above.]
  99. Final remarks. Uncertainty is paramount in scientific communication. Scientists and journalists have a moral responsibility to convey the uncertainty. We are discovering much more about asymptomatic and mild SARS-CoV-2 infections. Example: "Growing evidence suggests that asymptomatic and mild SARS-CoV-2 infections, together comprising >95% of all infections, may be associated with lower antibody titers than severe infections. In addition, antibody levels peak a few weeks after infection and decay gradually. Yet, positive controls used for determining the sensitivity of serological assays are usually limited to samples from hospitalized patients with severe disease, leading to what is commonly known as spectrum bias in estimating seroprevalence in the general population." Source: "Are SARS-CoV-2 seroprevalence estimates biased?", Saki Takahashi, et al. 2020 (pre-print).
  101. Be careful with "good news". [Figure: Confirmed COVID-19 patients in ICUs in Porto Alegre/RS/Brazil. x-axis: date, from 11-03 to 23-06; y-axis: number of patients, 0 to 92. Annotation: maximum since the beginning of the outbreak.]
  102. R(t) estimation
  103. References
      Takahashi, S., Greenhouse, B., & Rodríguez-Barraquer, I. Are SARS-CoV-2 seroprevalence estimates biased? 2020, May 30. https://doi.org/10.31219/osf.io/y3fxt.
      Daniel B. Larremore, Bailey K. Fosdick, Sam Zhang, Yonatan H. Grad. bioRxiv 2020.05.23.112649; doi: https://doi.org/10.1101/2020.05.23.112649.
      Natalie E. Dean. Commentary: COVID-19 Data Dives: The Takeaways From Seroprevalence Surveys. 2020. https://www.medscape.com/viewarticle/929861.
      Jeffrey D. Whitman, et al. Test performance evaluation of SARS-CoV-2 serological assays. medRxiv 2020.04.25.20074856; doi: https://doi.org/10.1101/2020.04.25.20074856.
      Governo do Rio Grande do Sul. Pesquisa de Prevalência. https://bit.ly/3dlUzVZ.
      CDC, Centers for Disease Control and Prevention. COVID-19 Serology Surveillance Strategy. 2020. https://bit.ly/3df4wEw.
      M. J. Vilar, et al. Bayesian Estimation of the True Prevalence and of the Diagnostic Test Sensitivity and Specificity of Enteropathogenic Yersinia in Finnish Pig Serum Samples. 2015. https://bit.ly/3egUV1c.
  104. References
      Christian S. Perone. Nota sobre o estudo da UFPel no Rio Grande do Sul. 2020. http://blog.christianperone.com/2020/04/nota-ufpel-covid.
      Christian S. Perone. COVID-19 R(t) estimates for states in Brazil and Portugal. 2020. https://perone.github.io/covid19analysis/.
      Timothy L. Lash, et al. Applying Quantitative Bias Analysis to Epidemiologic Data. Book, 2009. https://link.springer.com/book/10.1007/978-0-387-87959-8.
      Benjamin D. Brody, Sharon J. Parish, Dora Kanellopoulos, Mark J. Russ. A COVID-19 Testing and Triage Algorithm for Psychiatric Units: One Hospital's Response to the New York Region's Pandemic. Psychiatry Research, 113244, 2020. https://bit.ly/314cJJc.
      Eran Bendavid, et al. COVID-19 Antibody Seroprevalence in Santa Clara County, California. 2020. medRxiv 2020.04.14.20062463; doi: https://doi.org/10.1101/2020.04.14.20062463.
      W. J. Rogan and B. Gladen. Estimating prevalence from the results of a screening test. American Journal of Epidemiology, vol. 107, no. 1, pp. 71-76, 1978.
  105. Section VI: Q&A