COVID-19 has profoundly impacted all our lives. Not all such impacts in science are negative. For example, how we adapt to online learning, remote mentorship, and online teamwork may become new “norms” of future scientific collaborations, breaking down institutional boundaries to communication. The COVID-19 pandemic has united the scientific community more than ever, through more than 3600 clinical trials, 60,000 peer-reviewed publications, 80,000 SARS-CoV-2 genome sequences, 100,000 COVID-19 open software tools, and a global community of scientists, all working hard to find epidemiological patterns, diagnostics, therapeutics, and vaccines in a “War Against COVID-19”. In this talk, I will define and characterize data-driven medicine primarily through my personal journey in the past ten months, having witnessed the rapid “weaponizing of data science tools” in our community’s fight against COVID-19 (including ours, at http://covid19.ubrite.org/). I will review up-to-date COVID-19 literature, especially work on how biomedical informatics, data science, and artificial intelligence have been applied to accelerate COVID-19 breakthrough discoveries, from basic research to clinical practice. I will end by sharing my thoughts on how the future of medicine in cancer and other translational areas can benefit from the proactive incorporation of new “data science engines.”
Lessons from COVID-19: How Are Data Science and AI Changing Future Biomedical Research?
1. Lessons from COVID-19: How Are Data Science and AI Changing Future Biomedical Research?
Jake Y. Chen, PhD, FACMI
Professor & Chief Bioinformatics Officer, Informatics Institute
The University of Alabama at Birmingham, USA
jakechen@uab.edu | aimed-lab.org
October 30th 2020
3. Outline
• COVID-19: can we estimate the pandemic numbers?
• COVID-19 Big Data, Data Science, and AI
• Data science: team-based approach at UAB
• Modeling, simulations, and predictions
• Final thoughts: data-driven medicine
4. On 3/10, I gave the first UAB-wide COVID-19 talk
6. My COVID-19 prediction relied on inferring the world’s response from China’s accomplishment
[Figure: case counts on a log scale (10 to 100k) for China and other countries; other countries trail China by ~1 month]
Jake’s prediction (if contained):
• 3/20: 87k
• 4/1: 130-170k
• 5/1: ~300k
• June: season ends
Total affected:
• <500K worldwide, excluding China
Total deaths:
• <15k worldwide, excluding China
Warm weather appears to curb transmission
Analysis done on March 10th, 2020; data source: JHU
7. Why was I so wrong in my prediction?
“Infectious diseases do not respect national boundaries…The United States should be investing efforts and funds to strengthen the health structures in countries around the world.”
“Lessons from SARS” (2003) Science: Vol. 300, Issue 5620, pp. 701
8. COVID-19: Where are we now?
• 1/21 to 4/11: 500,000 cases
• 10/21-28: 500,000 cases
• Cases in the past week: 500,000
• Doubling time since the start: 15.2 days
• Doubling time in the past two months: 109 days
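A doubling time like the ones quoted on this slide can be derived from two case counts and the number of days between them. The sketch below assumes constant exponential growth over the interval; the function names and the example counts are illustrative and are not the data behind the slide.

```python
import math


def doubling_time(cases_start: float, cases_end: float, days: float) -> float:
    """Days needed for cases to double, assuming constant exponential growth."""
    if cases_start <= 0 or cases_end <= cases_start or days <= 0:
        raise ValueError("need 0 < cases_start < cases_end and days > 0")
    return days * math.log(2) / math.log(cases_end / cases_start)


def project(cases_now: float, days_ahead: float, doubling_days: float) -> float:
    """Extrapolate a case count forward at a fixed doubling time."""
    return cases_now * 2 ** (days_ahead / doubling_days)


# Made-up counts, not the data behind the slide.
example_days = doubling_time(cases_start=1_000, cases_end=4_000, days=30)
example_projection = project(cases_now=100, days_ahead=20, doubling_days=10)
```

A longer doubling time means slower growth, so a forward projection is very sensitive to which window the doubling time was measured over.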
9. A very conservative look into the near future
• 10.6 M cases by 12/1/2020
• 13 M cases by 1/1/2021
• 20 M cases by March 2021
11. COVID-19 Infection Fatality Ratio (IFR)
A historic comparison between the US and China
Age group      Chinese CDC (2/11/2020)    US CDC (9/10/2020)
0-19 years     0.1% (1 in 1,000)          0.003% (1 in 33,333)
20-49 years    0.26% (1 in 375)           0.02% (1 in 5,000)
50-69 years    2.45% (1 in 40)            0.5% (1 in 200)
70-79 years    8% (1 in 12)               5.5% (1 in 18)
80+ years      14.8% (1 in 7)             NA
Source: China CDC Weekly, Vol. 2, No. 8, pp 113-122; US CDC
https://www.cdc.gov/coronavirus/2019-ncov/hcp/planning-scenarios.html
12. Asymptomatic COVID-19 patients
• “40-45% of COVID-19 patients may be asymptomatic”
• D.P. Oran and E.J. Topol, “Prevalence of Asymptomatic SARS-CoV-2 Infection”, Annals of Internal Medicine, 9/1/2020
• “20% of people who become infected with SARS-CoV-2 and remain asymptomatic throughout infection”
• 94 studies
• D. Buitrago-Garcia et al., “Occurrence and transmission potential of asymptomatic and presymptomatic SARS-CoV-2 infections: A living systematic review and meta-analysis”, PLOS Medicine, 9/22/2020
14. COVID-19 PCR diagnostic test has a False Negative Rate of 25-50% (days 5-16)
Ann Intern Med. 2020 Aug 18;173(4):262-267. doi: 10.7326/M20-1495
15. Is 10% COVID infection worldwide true?
“WHO’s best estimates indicate roughly 1 in 10 people worldwide may have been infected by the coronavirus.”
--Dr. Michael Ryan, WHO COVID-19 Executive Board, 10/5/2020
10% of world population (7.8 B) = 780 M, or ~22 x the 35 M confirmed cases (10/4/2020)
16. My unpublished worksheet suggesting that 10% of the USA and India populations may have already been infected
• Population: USA 331 M; India 1.4 B
• COVID-19 tests performed: USA 142 M; India 101 M
• Cumulative cases: USA 8.4 M; India 7.8 M
• % positives given tests: USA 8.4 / 142 = 5.9%; India 7.8 / 101 = 7.7%
• Adjust for asymptomatic: USA 5.9% / 80% = 7.4%; India 7.7% / 80% = 9.6%
• Adjust further for false negative (FNR = 30%) tests: USA 7.4% * 70% + (1 - 7.4%) * 30% = 29.3%; India 9.6% * 70% + (1 - 9.6%) * 30% = 29.1%
• Adjusted actual cases: USA 142 M * 29.3% = 42 M; India 101 M * 29.1% = 29 M
• Adjust for un-tested population: USA 42 M + (331 M - 142 M) * (8.4 M / 331 M) = 42 M + 4.8 M = 47 M; India 29 M + (1.4 B - 101 M) * (7.8 M / 101 M) = 29 M + 101 M = 130 M
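The first two worksheet steps (test positivity, then the asymptomatic adjustment) can be sketched in a few lines. This is only an illustration: the function names are made up here, the 80% divisor is read off the worksheet row, and the later false-negative and untested-population rows are left out. Because the sketch carries unrounded intermediate values, the last digit may differ slightly from the rounded figures on the slide.

```python
def positivity_rate(cumulative_cases: float, tests_performed: float) -> float:
    """Share of performed tests that correspond to a confirmed case."""
    return cumulative_cases / tests_performed


def adjust_for_asymptomatic(rate: float, symptomatic_share: float = 0.80) -> float:
    """Scale a rate up, assuming it only captured the symptomatic share."""
    return rate / symptomatic_share


# (cumulative cases, tests performed), both in millions, copied from the worksheet.
countries = {"USA": (8.4, 142.0), "India": (7.8, 101.0)}

adjusted = {}
for name, (cases, tests) in countries.items():
    raw = positivity_rate(cases, tests)
    adjusted[name] = adjust_for_asymptomatic(raw)
    print(f"{name}: positivity {raw:.1%}, adjusted {adjusted[name]:.1%}")
```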
17. Data Science Lessons Learned
1. Sound data science is built on sound data collection and curation
• Epidemiology and public health data collections
2. Ensure data quality and trustworthiness before using them.
3. Valid real-world assumptions go before “theoretical beauty” of any model
18. Outline
• COVID-19: can we estimate the pandemic numbers?
• COVID-19 Big Data, Data Science, and AI
• Data science: team-based approach at UAB
• Modeling, simulations, and predictions
• Final thoughts: data-driven medicine
19. Global sharing of COVID-19 information critical for biomedical research and healthcare
LANCET. VOLUME 395, ISSUE 10224, P537, FEBRUARY 22, 2020
20. Growth of COVID-19 Papers in PubMed
N = 65,685 as of 10/28/2020, which equals:
• 160% of all UAB PubMed publications since 1981
• 50% of all cancer-related PubMed publications in 2020
• 3 years of all infectious disease publications (2017-19)
16% of all papers contain the word “data” in the title/abstract
26. AI helping fight against COVID-19
• Detecting outbreaks, tracing contacts and shaping public health responses
• Screening for people who might be infected
• Facilitating earlier diagnosis
• Predicting risk of deterioration and poor outcomes
• Augmenting remote monitoring and virtual care
• Developing potential treatments and vaccines
• Assisting hospital responses
PMID: 33111999
31. Critics of Black-box AI in the era of COVID-19
• Risk of cyber-attack
• E.g., input attacks that lead to wrong results (a criminal offense) are difficult to discern
• Risk of systematic bias affecting the patient’s healthcare
• Sampling bias, e.g., against minorities, rare disorders, or gender/age
• Humans have an “automation” bias (i.e., they think the machine is unbiased by human decisions)
• Machines are not held as responsible as human physicians
• Risk of mismatch
• Correlation instead of causation
• Unable to discern “meta-risk” (false risk) from real risk
PMID: 33110296
32. Data Science Lessons Learned
1. Global community of open data sharing is critical to fighting future major diseases
2. AI is increasingly relevant for biomedical research and selected healthcare applications
3. AI is not mature enough for broad-scale adoption in the US
33. Outline
• COVID-19: can we estimate the pandemic numbers?
• COVID-19 Big Data, Data Science, and AI
• Data science: team-based approach at UAB
• Modeling, simulations, and predictions
• Final thoughts: data-driven medicine
34. Can we speed up team data science development?
1. Ideas & Team Formation
2. Prototyping
3. Pilot Studies
4. System Building
5. Real-world Solutions
Feedback, decomposition, testing, refinement, integration, and redeployment
35. We developed a new COVID-UBRITE infrastructure
Amy Wang, MD
Matt Wyatt
Jelai Wang
http://covid19.ubrite.org/
39. 2020 COVID-19 Data Science Hackathon:
An Overview
• 116 MS Team users
• 90 registrants
• 40 registrants
• 10 teams
• 3 winners
Schedule:
• 5/11 – 6/5: Pre-hackathon Course, “Programming with Biological Data” (Rosenberg, Basu, & Wang)
• 6/9-12: Bootcamps, “UBRITE for COVID-19 Hackathon” (Chen, Wang, Jelai & Wyatt)
• 6/15-16: Two-day Hackathon, “COVID-19 Data-driven Medicine” (Wang, Jelai, Wyatt & Chen)
• 6/19: Hackathon Symposium, “COVID-19 Hackathon Showcase” (Wang)
40. Selected Team Projects Performed by Contestants
• Modeling the COVID-19 Epidemic with CNN and Conway’s Game of Life
• County-level social determinants of COVID-19 health
• Genetic epidemiology analysis of SARS-CoV-2
• A Network-based SEIDR model in simulating the outbreak of COVID-19
43. Data Science Lessons Learned
1. There are three pillars to successful team-based data science: people, technology, and science
2. Data curation and data processing take the majority of time, yet they are essential to problem-solving
3. To be prepared for national impact in a rapidly evolving environment, a person/team needs to already be an expert on a selected topic
44. Outline
• COVID-19: can we estimate the pandemic numbers?
• COVID-19 Big Data, Data Science, and AI
• Data science: team-based approach at UAB
• Modeling, simulations, and predictions
• Final thoughts: data-driven medicine
46. Policy indices of COVID-19 government responses
https://github.com/OxCGRT/covid-policy-tracker/blob/master/documentation/codebook.md
The Oxford COVID-19 Government Response Tracker (OxCGRT) provides a systematic measure across governments and across time to understand how government responses have evolved over the full period of the disease’s spread.
[Figure label: CHI]
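To make the idea of a policy index concrete, the sketch below turns ordinal policy levels into a single 0-100 score by rescaling each indicator and averaging. It is a simplified illustration, not the OxCGRT calculation: the indicator names and maximum levels are hypothetical, and the official methodology in the OxCGRT documentation includes details this sketch leaves out.

```python
# Hypothetical indicators and maximum ordinal levels, for illustration only.
MAX_LEVEL = {"school_closing": 3, "stay_at_home": 3, "gathering_limits": 4}


def sub_index(level: int, max_level: int) -> float:
    """Rescale one ordinal policy level to a 0-100 score."""
    if not 0 <= level <= max_level:
        raise ValueError(f"level {level} outside 0..{max_level}")
    return 100.0 * level / max_level


def policy_index(levels: dict) -> float:
    """Average the rescaled scores of all indicators (missing ones count as 0)."""
    scores = [sub_index(levels.get(name, 0), top) for name, top in MAX_LEVEL.items()]
    return sum(scores) / len(scores)


strict = policy_index({"school_closing": 3, "stay_at_home": 3, "gathering_limits": 4})
relaxed = policy_index({"school_closing": 1, "stay_at_home": 0, "gathering_limits": 2})
```

Encoding each policy as an ordinal level is what allows qualitative government responses to be compared across states and over time.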
47. US Geographical variations of CHI
[Figure axes: reported daily cases; CHI; days after 1st recorded case]
Source: University of Oxford
https://www.bsg.ox.ac.uk/research/publications/variation-us-states-responses-covid-19
48. Containment and Health Index (CHI)
Drawn according to state government’s party affiliations
Source: University of Oxford
https://www.bsg.ox.ac.uk/research/publications/variation-us-states-responses-covid-19
49. Mask Wearing: a central policy mandate difference among Republican- and Democrat-controlled states
51. Simulating the impact of three health policies
Scenario: Current projection
• Mask use in the population? Assumes mask use continues at currently observed rates
• When are mandates removed? Assumes that the gradual easing of social distancing mandates continues
• Threshold at which mandates are re-imposed? Assumes that mandates will be re-imposed for six weeks if daily deaths reach 8 per million
• What mandates? Educational facilities closed; non-essential businesses closed; people ordered to stay at home; large gatherings banned
Scenario: Mandates easing
• Mask use in the population? Assumes mask use continues at currently observed rates
• When are mandates removed? Assumes that the gradual easing of social distancing mandates continues
• Threshold at which mandates are re-imposed? Assumes that mandates are never re-imposed
• What mandates? Not applicable
Scenario: Universal masks
• Mask use in the population? Assumes mask use rises to 95% within 7 days
• When are mandates removed? Assumes that the gradual easing of social distancing mandates continues
• Threshold at which mandates are re-imposed? Assumes that mandates will be re-imposed for six weeks if daily deaths reach 8 per million
• What mandates? Educational facilities closed; non-essential businesses closed; people ordered to stay at home; large gatherings banned
Source: UW Institute for Health Metrics and Evaluation https://covid19.healthdata.org
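The re-imposition rule in these scenarios (mandates return for six weeks once daily deaths reach 8 per million) can be written as a small state machine. This is a sketch of the rule as worded on the slide, not IHME's code; the function name is hypothetical, and it assumes the threshold is not re-checked while a mandate period is already running.

```python
def mandate_schedule(daily_deaths_per_million, threshold=8.0, duration_days=42):
    """Return one boolean per day: True while mandates are in force."""
    active = []
    days_left = 0
    for deaths in daily_deaths_per_million:
        # Trigger a new six-week period only when no period is running.
        if days_left == 0 and deaths >= threshold:
            days_left = duration_days
        active.append(days_left > 0)
        if days_left > 0:
            days_left -= 1
    return active


# Synthetic series: one day above the threshold, then nothing.
series = [1.0, 9.0] + [0.0] * 50
schedule = mandate_schedule(series)
```

Passing a `threshold` that is never reached (for example `float("inf")`) reproduces the "mandates are never re-imposed" scenario.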
52. Public mask use over time in the US
% of population who say they would wear a mask in public
Source: https://covid19.healthdata.org/united-states-of-america?view=total-deaths&tab=trend
Singapore, China, etc
53. Change in mobility over time in the US
Mobility estimated from cell phone proximity data
Source: https://covid19.healthdata.org/united-states-of-america?view=social-distancing&tab=trend
Projection: mask use, testing, isolation, and contact tracing all together
54. Universal masks could save 240K lives, while mandates easing could cost 350K more lives (by 1/1/2021)
Source: https://covid19.healthdata.org/global?view=total-deaths&tab=trend
55. The Network SEIR Model
Compartments: susceptible (S), exposed (E), infected (I), deceased (D), and recovered (R)
[Figure: nested contact network at four levels: family, community, county, and state (state level: 4,908,621)]
• Family size avg.: 2.52±1.5
• Community size avg.: 3000±1000
• County size avg.: 35000±10000
56. Confirmed case propagation in the community network of Jefferson county, AL
[Figure: map panels for 3/14/20, 3/28/20, 4/11/20, 4/25/20, 5/9/20, 5/23/20, 6/6/20, and 6/14/20; legend bins: 0, 1-9, 10-49, 50-99, 100-249, 250-499, 500+]
• β: a susceptible-infected contact results in a new exposure
• σ: an exposed person becomes infective
• θ: the group-to-group contact rate
• Population: 658,573
• Quarantine: community
• Average size: 3000
The first infected case (1 case) in Jefferson County, Alabama was reported on 3/13/20.
57. Developing a parameterized network SEIR model to predict Jefferson county infections on 1/1/2021
Workflow: 1. Model construction; 2. Model parameter fitting; 3. Model parameterization; 4. 1/1/2021 prediction

Define Model for Jefferson County, AL
• Total population: 658,573
• Community size: 3,084±941
• Communities: 219

Parameter  Interpretation                                              Range
β          A susceptible-infected contact results in a new exposure    [0, 0.15]
σ          An exposed person becomes infective                         [0.3, 0.4]
θ          The group-to-group contact rate                             [0, 0.01]

Events: April 4th, stay-at-home order; April 30th, reopened; May 25th, protest

Model      β      σ      θ
stay_home  0.033  0.353  0.0006
reopen     0.039  0.359  0.003
protest    0.043  0.376  0.002

[Figure: 1/1/2021 prediction, with curves running from 10/30/2020 to 1/1/2021 and labeled values 48,000, 43,000, and 35,000]
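To illustrate how β, σ, and θ interact, here is a minimal two-community compartment sketch. It is not the model used in the talk: the way θ couples communities, the removal rate `gamma`, the daily forward-Euler update, and the merging of recovered and deceased into one "removed" compartment are all assumptions made for this example. Only the β, σ, and θ values are taken from the "reopen" row of the slide.

```python
from dataclasses import dataclass


@dataclass
class Community:
    s: float  # susceptible
    e: float  # exposed
    i: float  # infectious
    r: float  # removed (recovered or deceased, merged in this sketch)

    @property
    def n(self) -> float:
        return self.s + self.e + self.i + self.r


def step(communities, beta, sigma, theta, gamma):
    """Advance every community by one day (forward Euler)."""
    prevalence = [c.i / c.n for c in communities]
    total_prevalence = sum(prevalence)
    updated = []
    for c, own in zip(communities, prevalence):
        # Assumed coupling: own prevalence weighted by beta, others by theta.
        force = beta * own + theta * (total_prevalence - own)
        new_exposed = min(c.s, force * c.s)
        new_infectious = sigma * c.e
        new_removed = gamma * c.i
        updated.append(Community(
            s=c.s - new_exposed,
            e=c.e + new_exposed - new_infectious,
            i=c.i + new_infectious - new_removed,
            r=c.r + new_removed,
        ))
    return updated


def simulate(communities, days, beta, sigma, theta, gamma=0.1):
    """Run the daily update for a number of days; gamma is a hypothetical rate."""
    for _ in range(days):
        communities = step(communities, beta, sigma, theta, gamma)
    return communities


# Two communities of 3000 people; one infectious person seeds the first.
start = [Community(2999, 0, 1, 0), Community(3000, 0, 0, 0)]
coupled = simulate(start, 30, beta=0.039, sigma=0.359, theta=0.003)
isolated = simulate(start, 30, beta=0.039, sigma=0.359, theta=0.0)
```

With θ set to zero the second community never sees an exposure, which is the sense in which θ acts as a group-to-group contact rate in this sketch.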
58. Data Science Lessons Learned
1. Consider discretizing qualitative traits such as health policy so that they can be studied quantitatively
2. Mathematical/engineering models can better explain mechanisms, while giving predictions.
3. Governmental health policy can directly save or cost human lives!
59. Outline
• COVID-19: can we estimate the pandemic numbers?
• COVID-19 Big Data, Data Science, and AI
• Data science: team-based approach at UAB
• Modeling, simulations, and predictions
• Final thoughts: data-driven medicine
61. Physician-led hospital care
Human Diseases:
• Cellular abnormality
• Subcellular changes
• Genetic variation
• Altered reactions
• Phenotypic changes
• Organ damage
• …
Experimental Tools:
• Animal Models
• Microscopy
• X-ray crystallography
• Neuro imaging
• Radioactive tracers
• Surgery
• …
62. Data-driven robo-doc-assisted healthcare
Individual Abnormality:
• Genetic basis of the disease
• Mutations
• Inflammation
• Oxidative stress
• Cytokines/Hormones
• Biochemical changes
• …
Data science + AI:
• Genomic Medicine
• Precision Medicine
• CRISPR
• Network Medicine
• Deep learning
• Robo-surgery
• AI-enabled Drug discovery
• …
63. Acknowledgment
Zongliang Yue, PhD
Eric Zhang
Sunny Khurana
Nishant Batra
Son Do Hai Dang
Thanh Nguyen, PhD
The AI.Med Lab
Univ. WI Madison
U54TR002731
U01CA223976
R01AI134023
R01AR073850
UAB Informatics Institute
Jim Cimino, MD
Amy Wang, MD
Wayne Liang, MD
Jelai Wang
Matt Wyatt
Geoff Gordon
Nafisa Ajala
Heather Watts
UAB CS
Da Yan, PhD
Rubio Pique Jose
UAB BME
Jay Zhang, MD/PhD
Clark Xu
Editor's Notes
Hospital capacity need
Second paper: Improved
How the infected cases propagate
To provide you a better understanding of the N-SEIDR, we show the