This document provides an overview and caveats for COVID-19 analyses conducted by Dr. Steven Shafer of Stanford University. It outlines data sources, modeling approaches, locations included, and a request for feedback. Dr. Shafer aims to keep the analysis apolitical and provide daily updates, though clinical duties may delay some updates. Suggestions are welcome to improve understanding of the pandemic trajectory.
COVID-19 Analysis: June 23, 2020
1. Caveats and Comments
Overview:
This is my analysis, not Stanford’s. It is simply a set of regressions in R to understand the trajectory of COVID. It is not confidential and can be freely shared. The R
program code and all previous PowerPoint files are available at https://github.com/StevenLShafer/COVID19/. Please contact me at steven.shafer@Stanford.edu if
you would like to be added or removed from the recipient list. Suggestions are most welcome! You are welcome to use the R code on GitHub for any purpose. If
you create new graphs, please let me know, as I may want to add them to my analyses.
I am attempting to keep the analysis and commentary as apolitical as possible. I am now including partisan lean as a metric. This is just more data to understand
the COVID epidemic.
I try to provide a daily update in the morning. However, I am an anesthesiologist at Stanford, and when I have clinical duties and USAFacts has not updated the US data
by the time I leave for Stanford, the analysis will be delayed.
There is a lot of information on the figures. If something isn’t clear, please see the explanation on slide 2.
Data sources:
• USA Case Data: https://usafactsstatic.blob.core.windows.net/public/data/covid-19/covid_confirmed_usafacts.csv
• USA Death Data: https://usafactsstatic.blob.core.windows.net/public/data/covid-19/covid_deaths_usafacts.csv
• USA Testing Data: https://raw.githubusercontent.com/COVID19Tracking/covid-tracking-data/master/data/states_daily_4pm_et.csv
• Global Case Data: https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv
• Global Death Data: https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv
• Global Testing Data: https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv
• Mobility Data: https://www.gstatic.com/covid19/mobility/Global_Mobility_Report.csv
• Partisan Lean: MIT Election Data and Science Lab: https://doi.org/10.7910/DVN/VOQCHQ/HEIJCQ
Models:
1. Future projections are based on the Gompertz function: $\log(\text{cumulative cases}) = \text{current cases} + (\text{maximum cases} - \text{current cases})\,(1 - e^{-kt})$. This is a
naïve asymptotic model. k is the rate constant, such that log(2) / k = time to 50% rise. t is the number of days. Wikipedia has a good description:
https://en.wikipedia.org/wiki/Gompertz_function. The Gompertz function is estimated from the last 3 weeks of data for cumulative cases. These points
appear as red dots in the figures.
2. The rate of change is the slope of a simple linear regression of new cases over the last 10 days, divided by the number of new cases yesterday.
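The Gompertz model in item 1 can be sketched in a few lines. This is a minimal illustration in Python, not the R code from the GitHub repository. It assumes that "current cases" and "maximum cases" in the formula are on the log scale, pins the current value to the first point of the fitting window, and uses a simple grid search over k. The function names `gompertz_log_cases` and `fit_gompertz` and the grid bounds are hypothetical choices made for this sketch.

```python
import math


def gompertz_log_cases(t, log_current, log_maximum, k):
    """Log cumulative cases t days after the start of the fitting window."""
    return log_current + (log_maximum - log_current) * (1.0 - math.exp(-k * t))


def fit_gompertz(cumulative_cases, k_grid=None):
    """Fit the model to a window of daily cumulative counts (e.g. 21 days).

    Assumption: log_current is fixed at the first observation. For each
    candidate k, the asymptote is solved in closed form by least squares,
    and the k with the smallest squared error is kept.
    Returns (log_current, log_maximum, k).
    """
    if k_grid is None:
        k_grid = [i / 1000.0 for i in range(1, 501)]  # 0.001 to 0.5 per day
    y = [math.log(c) for c in cumulative_cases]
    y0 = y[0]
    best = None
    for k in k_grid:
        g = [1.0 - math.exp(-k * t) for t in range(len(y))]
        denom = sum(gi * gi for gi in g)
        # Least-squares estimate of (log_maximum - log_current) for this k.
        amplitude = sum(gi * (yi - y0) for gi, yi in zip(g, y)) / denom
        sse = sum((y0 + amplitude * gi - yi) ** 2 for gi, yi in zip(g, y))
        if best is None or sse < best[0]:
            best = (sse, y0 + amplitude, k)
    _, log_maximum, k = best
    return y0, log_maximum, k


if __name__ == "__main__":
    # Synthetic 3-week window generated from known parameters (not real data).
    window = [math.exp(gompertz_log_cases(t, math.log(1000.0), math.log(5000.0), 0.05))
              for t in range(21)]
    log_current, log_maximum, k = fit_gompertz(window)
    print("k per day:", k)
    print("days to cover half the remaining rise:", math.log(2) / k)
    print("projected maximum cases:", math.exp(log_maximum))
```

Under this reading, log(2) / k is the number of days after which the curve has covered half of the remaining distance between the current value and the asymptote, which matches the "time to 50% rise" described above.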
Locations
The locations for the modeling are where Pamela and I have family and friends, or locations of interest to friends and colleagues. Additionally, some locations
figure prominently in news reports (e.g., China, South Korea, Sweden, Brazil) or have significant economic impact on the United States (e.g., Japan, Canada,
Mexico). I am happy to add locations; just let me know.
Stay safe, well, resilient, and kind.
Steve Shafer
steven.shafer@Stanford.edu
2. Explanation of the Figures
[Figure: USA projection as of 2020-05-27. Actual (points) / Predicted (line) on a log axis from 1 to 100,000,000; legend: Phase (Pre-Model, Modeled), Deaths, Tests. Projected totals shown on the figure: 2,586,092 (red number) and 152,804 (black number). Cases/Day axis: 0 to 30,000; Deaths/Day axis: 0 to 6,000.]
Cases: 1,662,302 -- Deaths: 98,220 -- Deaths per 10,000: 3.1 -- Case Mortality: 5.9% -- Daily Change in Cases: -0.5%
• Brown dots: cumulative tests
• Red dots: cumulative cases used to estimate the Gompertz function, presently set to the last 3 weeks
• Red line: predicted cumulative cases based on the Gompertz function estimated from the red dots
• Red number: total cases on June 30th, based on the Gompertz function estimated from the red dots
• Black number: total deaths on June 30th, calculated as the total cases (see above) * case mortality (see below)
• Black line: predicted cumulative deaths, calculated as red line (above) * mortality (see below)
• Red line: predicted new cases based on the Gompertz function estimated from the red dots
• Axis for deaths / day, usually 1/5th of the axis for cases / day on the left side of the figure.
• Funny bump: an artifact of the Gompertz function, which mathematically requires an inflection in total cases.
• Green line: linear regression over 8 days, used to calculate percent increase / decrease (see below)
• Daily change in cases, calculated as the slope of the green line (above left) / number of new cases yesterday.
• Case mortality: cumulative deaths / cumulative cases.
• Cases / day calculated from cumulative cases used to estimate the Gompertz function
• Cases / day calculated from cumulative cases not used to estimate the Gompertz function
• Deaths / day, axis is on the left
• Cumulative deaths / population * 10,000
• Blue line: today
• Blue dots: cumulative cases not used to estimate the Gompertz function
• Yesterday’s total cases and deaths
• Axis for cases / day. Axis for deaths / day appears to the right.
• Geographic location
• Date of analysis, also shown as blue vertical line below
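The summary statistics described in the callouts above are simple ratios, and a short sketch may make them concrete. This is an illustration in Python rather than the author's R code, and the function names are hypothetical. The regression window is a parameter because the text gives both 10 days (Models) and 8 days (figure explanation).

```python
def case_mortality(cumulative_deaths, cumulative_cases):
    """Case mortality: cumulative deaths / cumulative cases (as a fraction)."""
    return cumulative_deaths / cumulative_cases


def deaths_per_10000(cumulative_deaths, population):
    """Cumulative deaths / population * 10,000."""
    return cumulative_deaths / population * 10_000


def projected_deaths(projected_cases, mortality):
    """Projected cumulative deaths: projected cases * case mortality."""
    return projected_cases * mortality


def daily_change_in_cases(new_cases, window=8):
    """Slope of a least-squares line through the last `window` days of
    new cases, divided by yesterday's new cases (as a fraction per day)."""
    recent = new_cases[-window:]
    n = len(recent)
    if n < 2 or recent[-1] == 0:
        return None  # not enough data, or no cases yesterday to divide by
    x_mean = (n - 1) / 2
    y_mean = sum(recent) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(recent))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    return (sxy / sxx) / recent[-1]


if __name__ == "__main__":
    # Figures taken from the USA summary line of the example figure.
    mortality = case_mortality(98_220, 1_662_302)
    print(f"case mortality: {100 * mortality:.1f}%")
    print(f"projected deaths: {projected_deaths(2_586_092, mortality):,.0f}")
```

With the USA figures from the example, cumulative deaths of 98,220 over 1,662,302 cases give the 5.9% case mortality in the summary line, and multiplying the projected 2,586,092 cases by that ratio reproduces the projected death total of 152,804 to within rounding.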
8. Change in New Cases per Day
[Figure: New cases by state as of 2020-06-23. Legend, new cases are: Increasing > +3%; Increasing between +1% and +3%; No Change (-1% to +1%); Decreasing between -1% and -3%; Decreasing > -3%.]
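The five legend categories above amount to binning the daily percent change with two thresholds. A minimal sketch in Python follows; the function name `classify_change` is hypothetical, and the handling of values that fall exactly on a threshold is an assumption, since the legend does not say which bin they belong to.

```python
def classify_change(pct_per_day, small=1.0, large=3.0):
    """Map a daily percent change to one of the five legend categories.

    `small` and `large` are the two thresholds in percent per day
    (1 and 3 for the state-level legend for new cases).
    Assumption: a value exactly on a threshold goes to the bin nearer zero.
    """
    if pct_per_day > large:
        return f"Increasing > +{large:g}%"
    if pct_per_day > small:
        return f"Increasing between +{small:g}% and +{large:g}%"
    if pct_per_day >= -small:
        return f"No Change (-{small:g}% to +{small:g}%)"
    if pct_per_day >= -large:
        return f"Decreasing between -{small:g}% and -{large:g}%"
    return f"Decreasing > -{large:g}%"


if __name__ == "__main__":
    for value in (4.2, 1.5, -0.5, -2.0, -6.0):
        print(value, "->", classify_change(value))
```

The thresholds are parameters so that the same function can be reused wherever a legend of this shape appears with different cut points.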
9. Cases as a Percent of Peak Cases
[Figure: Daily Cases as a Percent of Peak Cases. One panel per state and DC; y-axis: percent of peak, 0 to 100.]
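As a rough illustration of the quantity plotted here, each day's count can be expressed as a percentage of the largest daily count seen so far in the series. This Python sketch uses a hypothetical function name and assumes the peak is taken over raw daily counts; the slides do not say whether any smoothing is applied first.

```python
def percent_of_peak(daily_counts):
    """Express each daily count as a percent of the series maximum."""
    peak = max(daily_counts)
    if peak <= 0:
        # No cases recorded yet: define every day as 0% of peak.
        return [0.0 for _ in daily_counts]
    return [100.0 * count / peak for count in daily_counts]


if __name__ == "__main__":
    # Made-up daily counts, for illustration only.
    print(percent_of_peak([0, 50, 200, 100]))
```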
10. Change in New Deaths per Day
[Figure: New deaths by state as of 2020-06-23. Legend, new deaths are: Increasing > +0.5%; Increasing between +0.1% and +0.5%; No Change (-0.1% to +0.1%); Decreasing between -0.1% and -0.5%; Decreasing > -0.5%.]
11. Deaths as a Percent of Peak Deaths
[Figure: Daily Deaths as a Percent of Peak Deaths. One panel per state and DC; y-axis: percent of peak, 0 to 100.]
13. Percent Population Tested
[Figure: Percent Population Tested. One panel per state and DC; y-axis: percent of population tested, 0 to 20.]
14. Percent Positive Tests
[Figure: Percent Positive Tests Over Last 28 Days. One panel per state and DC; y-axis: percent positive (%), 0 to 15.]
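One plausible way to compute a percent-positive figure over a trailing 28-day window from cumulative series is sketched below in Python. The function name is hypothetical, and the definition (new cases in the window divided by new tests in the window) is an assumption; the slides do not spell out the exact calculation.

```python
def percent_positive(cumulative_cases, cumulative_tests, window=28):
    """Percent of tests that were positive over the last `window` days.

    Both inputs are daily cumulative series of equal length, with at
    least window + 1 entries. Returns None if no tests were reported
    in the window.
    """
    new_cases = cumulative_cases[-1] - cumulative_cases[-1 - window]
    new_tests = cumulative_tests[-1] - cumulative_tests[-1 - window]
    if new_tests <= 0:
        return None
    return 100.0 * new_cases / new_tests


if __name__ == "__main__":
    # Made-up series: 10 new cases and 100 new tests per day.
    cases = [10 * day for day in range(29)]
    tests = [100 * day for day in range(29)]
    print(percent_positive(cases, tests))
```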
15. Case Mortality vs. Testing
[Figure: Mortality vs. Testing as of 2020-06-23. Scatter plot with one labeled point per state and DC; x-axis: % tested, 0 to 20; y-axis: % mortality, 0 to 10.]
68. Change in New Cases per Day
[Figure: Trends by county as of 2020-06-23. Legend (Direction): Increasing > +2%; Increasing between +0.5% and +2%; No Change (-0.5% to +0.5%); Decreasing between -0.5% and -2%; Decreasing > -2%; NA = Inadequate data.]
69. Percent Change by Partisan Lean
[Figure: Counties by 2016 presidential election results. x-axis: percent Republican, 0 to 100; y-axis: percent change in new cases per day, -10 to 10; legend: Republican (25, 50, 75). Dark green line is a Friedman's 'super smoother'.]
70. Percent Change by Population
[Figure: Counties by Population. x-axis: population, 1,000 to 10,000,000; y-axis: percent change in new cases per day, -10 to 10; legend: Republican (25, 50, 75). Dark green line is a Friedman's 'super smoother'.]
71. Partisan Lean vs Population and Direction
[Figure: Partisan Lean vs Population and Direction. x-axis: percent Republican, 0 to 100; y-axis: population, 1,000 to 10,000,000; legend (Direction): Increasing > +2%; Increasing between +0.5% and +2%; No Change (-0.5% to +0.5%); Decreasing between -0.5% and -2%; Decreasing > -2%. Dark green line is a Friedman's 'super smoother'.]
72. Cases as a Percent of Population
[Figure: Total Cases as a Percent of County Population. x-axis: county population, 1,000 to 10,000,000; y-axis: total cases, 0.001% to 20%. Slanted lines are counties with small integer numbers of cases; green line: Friedman's 'super smoother'.]
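The slanted lines mentioned for this figure follow from the arithmetic of plotting a ratio on logarithmic axes. A short worked derivation, using symbols introduced here for illustration (n for a county's case count, N for its population, p for cases as a percent of population):

```latex
\[
p = 100\,\frac{n}{N}
\quad\Longrightarrow\quad
\log_{10} p = \log_{10}(100\,n) - \log_{10} N .
\]
```

For a fixed case count n, this is a straight line with slope -1 against log population, so all counties with exactly 1 case fall on one line, all counties with 2 cases on a parallel line, and so on. The vertical gap between the lines for n and n + 1 is log10((n + 1) / n), which shrinks as n grows, so the separate lines are visible only for small integer counts.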
73. Deaths as a Percent of Population
[Figure: Total Deaths as a Percent of County Population. x-axis: county population, 1,000 to 10,000,000; y-axis: total deaths, 0.0001% to 1%. Slanted lines are counties with small integer numbers of cases; green line: Friedman's 'super smoother'.]
74. Case Mortality vs. Population
[Figure: Case Mortality vs. County Population. x-axis: county population, 1,000 to 10,000,000; y-axis: case mortality, 0.1% to 100%.]
146. Case Mortality vs. Testing
[Figure: Case Mortality vs. Testing as of 2020-06-23. Scatter plot with one point per country, labeled by three-letter country code; x-axis: % tested, 0 to 20; y-axis: % case mortality, 0 to 15.]