coronavirus-case-tracking
May 31, 2020
1 The Data Science Pipeline: COVID-19 Case Tracking
Philip Tian, Jonathan Lin, David Ahmed
1.1 Introduction
We will introduce the data science pipeline by analyzing data from the (currently ongoing) COVID-19
pandemic in the United States. COVID-19, often called simply "the coronavirus", is the disease
caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It began spreading widely
in the continental United States in February and March 2020. A comprehensive list of known
and documented symptoms can be found here. As of May 15, 2020, there are around 4.5 million
confirmed cases worldwide, which is why many experts in the media and academia are
calling this crisis a pandemic.
The COVID-19 crisis has caused many temporary economic and social changes. For example, most
states in the United States have issued stay-at-home orders, which set guidelines on when
and why one may leave home. These orders mean that normal work activities have
stopped, and countries have been struggling with sudden drops in economic productivity
(documented in news stories like this one or this one). These economic effects have a large impact
on the well-being of people all across the world: unemployment is growing in several countries
as a direct consequence of the pandemic and of governmental policies such as stay-at-home
orders and the closure of “non-essential businesses”.
Given this balance between maintaining local or national economic health and preventing the
spread of a dangerous disease, policymakers are faced with certain questions about the current
state of affairs.
- How long should we maintain a stay-at-home order nationally/locally?
- Are other states/countries handling the situation “better”? What are they doing differently?
- What does “better” even mean? Can we quantify these things when we are making policy decisions?
- How will cases grow from today? Can our current infrastructure (hospitals) handle this growth?
Faced with a variety of policy decisions that might literally cost lives, policymakers are walking a
tightrope, and in this age of information it is especially important that the final decision
be as close to optimal as possible. But how does one know whether or not a decision is optimal?
Data science offers techniques for approaching such a question.
1.1.1 The Importance of Data Science During the COVID-19 Crisis
As the total number of COVID-19 cases increases in the USA and worldwide, it seems almost too
easy for the mainstream media to capitalize on new case counts to churn out news stories. An
increase in cases (perhaps coupled with some official or expert statement) is a news story waiting
to happen. For example, here, here, and here are all news stories which were found by
searching the internet for the phrase “increase in cases”. With this influx of information, there
are certain questions we should ask.
- Can we trust that the data being reported is accurate?
- What exactly does 1000 (or 2000, or 500) cases daily mean in the context of a certain
state/country/county?
- To what extent can we predict how the crisis (especially in terms of the number of cases)
evolves over time?
- Is it possible to make policy or economic decisions based on such analysis, and should we make
these decisions?
It is precisely the field of data science, with its techniques and methods, that allows us to
analyze the data carefully and produce better-supported answers to the questions above. By
extracting, manipulating, and analyzing sets of data, we can help policymakers be far better
informed about the status of the crisis locally and nationally than they would be by following
day-by-day information and playing it by ear, so to speak. Gathering such information is critical,
especially at a time when reliable information seems hard to find.
In order to even start considering questions like those posed above, we must consider many of
the aspects of the data science pipeline, which we have listed below.
1. Data Collection: data which is relevant to the project is found and collected.
2. Data Manipulation and Cleaning: data is manipulated into a format which is suitable for analysis.
3. Initial Data Visualization (or exploratory data analysis): numerical data can be graphed to spot
trends.
4. Modeling and Prediction: one can use various methods to make predictions about future data or
the general population.
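The four steps above can be sketched end to end on a toy example. This is only an illustration under stated assumptions: the in-memory CSV and its values are made up, and the straight-line fit with np.polyfit stands in for whatever model one would actually choose.

```python
import io

import numpy as np
import pandas as pd

# 1. Data collection: an in-memory CSV stands in for a downloaded file (made-up values).
raw_csv = io.StringIO(
    "date,positive\n"
    "20200301,10\n"
    "20200302,20\n"
    "20200303,\n"
    "20200304,40\n"
)
raw = pd.read_csv(raw_csv)

# 2. Manipulation and cleaning: drop rows with missing counts, parse the dates,
#    and express each date as a day offset from the first observation.
clean = raw.dropna(subset=["positive"]).copy()
clean["date"] = pd.to_datetime(clean["date"].astype(str), format="%Y%m%d")
clean["day"] = (clean["date"] - clean["date"].min()).dt.days

# 3. Exploratory analysis: summary statistics (a plot would normally go here).
summary = clean["positive"].describe()

# 4. Modeling and prediction: fit a straight line and extrapolate to day 4.
slope, intercept = np.polyfit(clean["day"], clean["positive"], deg=1)
prediction_day_4 = slope * 4 + intercept
print(summary)
print(slope, intercept, prediction_day_4)
```

The rest of this document follows the same order, with real files in place of the toy CSV.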
In our project, we will attempt to illustrate the data science pipeline with respect to the COVID-19
crisis. We will illustrate (with working code) the basic aspects of this pipeline as described above,
starting from the beginning. First we will discuss the data and where we got it from, as well as
the background and assessment of accuracy. Then, we will manipulate and modify the data for our
purposes. Finally, we will use the data we obtained to make future predictions.
1.2 Data Collection
For this project, we would like to collect data about total cases in the United States more closely
restricted to the local level, such as states and counties. We’ve obtained data from the following
sources, described below.
1.2.1 The COVID Tracking Project
The COVID Tracking Project is a collective volunteer effort to document data about the crisis day
by day. Their data collection claims to be comprehensive, with data from every state and most US
territories. Their data, as one of the only comprehensive sources of data on cases in the US, is being
used by many news outlets and academic experts.
We will be using their data on the day-by-day number of cases in states and in the US. In
the code below, this data is contained in the csv files Data/CTPStates-historical.csv,
Data/CTPTestingMaryland.csv and Data/CTPUS-historical.csv, the first containing the data of the
states, the second the number of tests given out in Maryland, and the last containing overall
US data. These spreadsheets give us the data we need for cases over time. The data is updated
daily and can be found here in this spreadsheet.
1.2.2 COVID-19 Data from JHU
Johns Hopkins University has a nice GUI which displays the current case data from around the
world. It can be found here. It is a very visually striking application and readers are encouraged to
go and play around with it. But where does this application pull data from? It pulls the data from a
GitHub repository which is maintained by JHU for the purpose of keeping the application up to date.
The data itself is more granular than that of the COVID Tracking Project, in that it tracks cases and
deaths by county rather than by state. For example, in Maryland, one might
notice that the majority of cases arise in Montgomery County and Prince George’s County, with
fewer overall cases in neighboring counties (though Baltimore County does not trail too far behind).
The data we’ve obtained from JHU is packaged into the files Data/CasesDeathsCounty.csv and
Data/CountyConfirmedCases.csv.
1.2.3 Date of Stay-at-Home Information
We can find the date that each state established a stay-at-home order through this CNN Article,
which gives detailed information on when all the US States established this order.
Limitations of the data gathered
As highlighted above, plenty of data is available on total COVID-19 cases over time, both
nationally and locally. Still, one must be aware of the limitations of such convenient data.
The primary concern when working with this data is accuracy. If a policymaker were to consider
using such data and its trends to make important decisions relevant to the crisis, then the first
question they ask should be - Is this data reliable? Can I trust that the data values are accurately
measured? This concern about accuracy is very important when considering stay-at-home orders,
shutdowns, and other impactful decisions.
The main limitation of large datasets that aggregate measurements from many locations, such as the
COVID Tracking Project data and the JHU dataset, is that their accuracy depends on many separate
sources. For the COVID Tracking Project, there is at least one source for each state and
territory. For JHU, there is at least one source for each county, which leaves even more room
for error.
Concerns about the COVID Tracking Project Data
The COVID Tracking Project dataset pulls data from the relevant state and territory government
health services. This means that the reliability of this data starts and ends with the accuracy
of state government reporting. It is a difficult job to figure out how each state is measuring
its data and whether or not it is accurate.
Concerns about the JHU Dataset
As of May 16, the GitHub page for the JHU dataset has over 1300 reported issues. Some of these
are complete non-issues. But some report supposedly serious problems with the data, for example
this one, which claims that the applet is not reporting the case number for the country of Nepal
correctly. Many of the issues are encoding related, having to do with state or county codes.
Hopefully these will not be too much of an issue with respect to this project.
For the purposes of our project, we need not really concern ourselves with the reliability of the data
collection. It is not our job, and besides, we do not have the means to verify the data ourselves.
However, if we were in a position of more power, with the influence to affect policy decisions, then
weighing these issues is something that we would absolutely have to do. Hopefully we have made the
point that in many cases the hardest part of the data science pipeline is verifying that the data
you are about to analyze is as accurate as possible. But for the
purposes of continuing our project we will assume that it is accurate.
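Even without access to the original sources, some inexpensive consistency checks are possible. As a minimal sketch (the counts below are hypothetical and the dip is deliberate, not taken from either dataset), a cumulative case count should never decrease, so negative day-over-day differences point at rows worth a closer look.

```python
import pandas as pd

# Hypothetical cumulative counts for one location; the dip on the fourth day is deliberate.
cumulative = pd.Series(
    [10, 11, 13, 12, 17],
    index=pd.date_range("2020-04-10", periods=5, freq="D"),
    name="cumulative_cases",
)

# A cumulative count should never decrease, so a negative daily difference
# marks a suspect row (a correction, a reporting error, or an encoding problem).
daily_change = cumulative.diff()
suspect_days = daily_change[daily_change < 0]
is_consistent = cumulative.is_monotonic_increasing

print(suspect_days)
print(is_consistent)
```

A check like this does not prove the data is accurate; it only catches one class of obvious inconsistency.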
2 Setup
Here we set up all the libraries we will be using. We list them below:
- pandas will be used for data manipulation. We will use it to format our raw data into tables.
- plotnine is a Python library similar to ggplot2 in R. We will use it to graph our data.
- numpy is a scientific computing library.
- statsmodels is a statistical library that we will use for our data analysis.
[29]: import pandas as pd
from plotnine import *
import numpy as np
import statsmodels.formula.api as sm
import warnings
warnings.filterwarnings("ignore")
2.1 Data Manipulation
This code manipulates the data into a format amenable to analysis. First we extract the
information from the .csv files into pandas dataframes using the read_csv command. Then
we filter the relevant rows into two data tables, maryland_deaths and maryland_confirmed.
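One caveat applies when filtering on a date column that was read from a CSV file as text: comparisons are then made character by character, not by calendar date. The sketch below uses three illustrative date strings in MM/DD/YYYY form (not rows from the actual files) to show how a text comparison and a parsed-date comparison can disagree.

```python
import pandas as pd

# Dates in MM/DD/YYYY text form (illustrative values only).
dates = pd.Series(["04/05/2020", "04/10/2020", "05/12/2020"])

# String comparison is lexicographic: "04/05/2020" sorts before "04/1/2020"
# because the character "0" precedes "1", so the first row is dropped.
as_text = dates[dates.between("04/1/2020", "05/12/2020")]

# Parsing first compares real calendar dates, so all three rows are kept.
parsed = pd.to_datetime(dates, format="%m/%d/%Y")
as_dates = dates[parsed.between(pd.Timestamp("2020-04-01"), pd.Timestamp("2020-05-12"))]

print(list(as_text))
print(list(as_dates))
```

Whether to parse the column or keep it as text is a design choice; parsing makes the window mean what it says, at the cost of changing how the column prints.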
[30]: # Maryland's 23 counties plus Baltimore City
counties = set(["Allegany", "Anne Arundel", "Baltimore", "Calvert",
"Caroline", "Carroll", "Cecil", "Charles", "Dorchester",
"Frederick", "Garrett", "Harford", "Howard", "Kent",
"Montgomery", "Prince George's", "Queen Anne's",
"St. Mary's", "Somerset", "Talbot", "Washington",
"Wicomico", "Worcester", "Baltimore City"])
county_deaths = pd.read_csv("Data/CasesDeathsCounty.csv")
county_confirmed = pd.read_csv("Data/CountyConfirmedCases.csv")
testing = pd.read_csv("Data/CTPTestingMaryland.csv")
testing['date'] = pd.to_datetime(testing['date'], format='%Y-%m-%d')
# Note: 'date' in county_deaths is MM/DD/YYYY text, so between() compares strings
# character by character; zero-padded dates 04/01 through 04/09 sort before
# "04/1/2020" and therefore fall outside the lower bound.
maryland_deaths = county_deaths.loc[
    (county_deaths["state"] == "Maryland")
    & county_deaths["location_name"].isin(counties)
    & county_deaths["date"].between("04/1/2020", "05/12/2020")
].reset_index(drop=True)
maryland_confirmed = county_confirmed.loc[
    (county_confirmed["state"] == "Maryland")
    & county_confirmed["county_name"].isin(counties)
].reset_index(drop=True)
maryland_deaths.head(10)
[30]: uid location_type fips_code location_name state date 
0 84024001 county 24001.0 Allegany Maryland 04/10/2020
1 84024001 county 24001.0 Allegany Maryland 04/11/2020
2 84024001 county 24001.0 Allegany Maryland 04/12/2020
3 84024001 county 24001.0 Allegany Maryland 04/13/2020
4 84024001 county 24001.0 Allegany Maryland 04/14/2020
5 84024001 county 24001.0 Allegany Maryland 04/15/2020
6 84024001 county 24001.0 Allegany Maryland 04/16/2020
7 84024001 county 24001.0 Allegany Maryland 04/17/2020
8 84024001 county 24001.0 Allegany Maryland 04/18/2020
9 84024001 county 24001.0 Allegany Maryland 04/19/2020
total_population cumulative_cases cumulative_cases_per_100_000 
0 71977.0 10 13.89
1 71977.0 11 15.28
2 71977.0 13 18.06
3 71977.0 15 20.84
4 71977.0 17 23.62
5 71977.0 17 23.62
6 71977.0 20 27.79
7 71977.0 26 36.12
8 71977.0 33 45.85
9 71977.0 33 45.85
cumulative_deaths cumulative_deaths_per_100_000 new_cases new_deaths 
0 0 0.00 2.0 0.0
1 0 0.00 1.0 0.0
2 0 0.00 2.0 0.0
3 0 0.00 2.0 0.0
4 0 0.00 2.0 0.0
5 1 1.39 0.0 1.0
6 1 1.39 3.0 0.0
7 1 1.39 6.0 0.0
8 1 1.39 7.0 0.0
9 1 1.39 0.0 0.0
new_cases_per_100_000 new_deaths_per_100_000
0 2.78 0.00
1 1.39 0.00
2 2.78 0.00
3 2.78 0.00
4 2.78 0.00
5 0.00 1.39
6 4.17 0.00
7 8.34 0.00
8 9.73 0.00
9 0.00 0.00
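The per-100,000 columns in this table normalize raw counts by population, which is what makes counties of different sizes comparable. As a small sketch, the three Allegany rows below are copied from the output above; the column name per_100k is ours, introduced only for illustration.

```python
import pandas as pd

# Three Allegany rows copied from the table above.
rows = pd.DataFrame({
    "date": ["04/10/2020", "04/11/2020", "04/12/2020"],
    "total_population": [71977.0, 71977.0, 71977.0],
    "cumulative_cases": [10, 11, 13],
})

# Cases per 100,000 residents = cases / population * 100,000.
rows["per_100k"] = (rows["cumulative_cases"] / rows["total_population"] * 100_000).round(2)
print(rows)
```

The computed values agree with the cumulative_cases_per_100_000 column shown for the same dates (13.89, 15.28, 18.06).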
[31]: maryland_confirmed.head(10)
[31]: last_update state county_name county_name_long 
0 2020-05-12 21:32:28 Maryland Allegany Allegany, Maryland, US
1 2020-05-12 21:32:28 Maryland Anne Arundel Anne Arundel, Maryland, US
2 2020-05-12 21:32:28 Maryland Baltimore Baltimore, Maryland, US
3 2020-05-12 21:32:28 Maryland Calvert Calvert, Maryland, US
4 2020-05-12 21:32:28 Maryland Caroline Caroline, Maryland, US
5 2020-05-12 21:32:28 Maryland Carroll Carroll, Maryland, US
6 2020-05-12 21:32:28 Maryland Cecil Cecil, Maryland, US
7 2020-05-12 21:32:28 Maryland Charles Charles, Maryland, US
8 2020-05-12 21:32:28 Maryland Dorchester Dorchester, Maryland, US
9 2020-05-12 21:32:28 Maryland Frederick Frederick, Maryland, US
fips_code lat lon NCHS_urbanization total_population 
0 24001.0 39.623576 -78.692805 Small metro 71977.0
1 24003.0 39.006702 -76.603293 Large fringe metro 567696.0
2 24005.0 39.457847 -76.629120 Large fringe metro 827625.0
3 24009.0 38.539616 -76.568206 Large fringe metro 91082.0
4 24011.0 38.871723 -75.829042 Non-core 32875.0
5 24013.0 39.564536 -77.023737 Large fringe metro 167522.0
6 24015.0 39.566477 -75.946274 Large fringe metro 102517.0
7 24017.0 38.510923 -76.985807 Large fringe metro 157671.0
8 24019.0 38.454135 -76.027524 Micropolitan 32261.0
9 24021.0 39.472966 -77.399994 Large fringe metro 248472.0
confirmed confirmed_per_100000 deaths deaths_per_100000
0 148 205.62 13 18.06
1 2520 443.90 127 22.37
2 4051 489.47 204 24.65
3 211 231.66 13 14.27
4 174 529.28 0 0.00
5 589 351.60 60 35.82
6 270 263.37 15 14.63
7 761 482.65 55 34.88
8 102 316.17 2 6.20
9 1282 515.95 77 30.99
[32]: maryland_overall = pd.read_csv("Data/CTPStates-historical.csv").query('state == "MD"').reset_index()
maryland_overall['date'] = pd.to_datetime(maryland_overall['date'], format='%Y%m%d')
maryland_overall = maryland_overall.sort_values('date')
# Keep only the dates present in both the case data and the testing data
maryland_overall = maryland_overall.merge(testing, on='date', how='inner')
# Share of people tested so far who have tested positive
maryland_overall['positiveRatio'] = (maryland_overall['positive']
    / maryland_overall['cumulative_total_people_tested'])
maryland_overall = maryland_overall[['date', 'positive', 'positiveIncrease',
    'positiveRatio', 'cumulative_total_people_tested']]
maryland_overall.head(10)
[32]: date positive positiveIncrease positiveRatio 
0 2020-03-05 0.0 NaN 0.000000
1 2020-03-06 3.0 3.0 0.103448
2 2020-03-07 3.0 0.0 0.068182
3 2020-03-08 3.0 0.0 0.054545
4 2020-03-09 5.0 2.0 0.064103
5 2020-03-10 6.0 1.0 0.063158
6 2020-03-11 9.0 3.0 0.087379
7 2020-03-12 12.0 3.0 0.113208
8 2020-03-13 17.0 5.0 0.153153
9 2020-03-14 26.0 9.0 0.216667
cumulative_total_people_tested
0 17
1 29
2 44
3 55
4 78
5 95
6 103
7 106
8 111
9 120
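To make the merge and the ratio concrete, here is a minimal sketch on a few values copied from the output above. One date is deliberately left out of each toy table (an assumption made purely for illustration) to show what an inner merge does with unmatched dates.

```python
import pandas as pd

# Values copied from the first rows of the output above; 03-09 is omitted from
# `cases` and 03-08 from `tests` on purpose, to show the effect of an inner merge.
cases = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-06", "2020-03-07", "2020-03-08"]),
    "positive": [3.0, 3.0, 3.0],
})
tests = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-06", "2020-03-07", "2020-03-09"]),
    "cumulative_total_people_tested": [29, 44, 78],
})

# An inner merge keeps only dates present in both tables.
merged = cases.merge(tests, on="date", how="inner")
merged["positiveRatio"] = merged["positive"] / merged["cumulative_total_people_tested"]
print(merged)
```

The two surviving rows reproduce the positiveRatio values shown for 2020-03-06 and 2020-03-07.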
2.2 Exploratory Data Analysis
The simplest way we can view the data is by looking at the number of positive tested cases over
time. This will give us a basic understanding about the spread of COVID-19.
[33]: (ggplot(maryland_overall,aes(y='positive',x='date'))
+ geom_point() + ggtitle("Number of Confirmed Cases in MD")
+ xlab("Date") + ylab("Cases"))
[33]: <ggplot: (146476666316)>
We can see that the number of cases is increasing rapidly, and the growth appears to settle
into a roughly linear trend as time goes on.
Another useful view is the increase in positive cases per day, which is essentially
the discrete derivative of the cumulative count. This gives us an idea of how the rate of
transmission is changing.
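The daily increase is simply the first difference of the cumulative series. As a quick sketch, the cumulative values below are copied from the positive column of the table above, and Series.diff() recovers the positiveIncrease column.

```python
import pandas as pd

# Cumulative positives for 2020-03-05 through 2020-03-14, copied from the table above.
positive = pd.Series(
    [0, 3, 3, 3, 5, 6, 9, 12, 17, 26],
    index=pd.date_range("2020-03-05", periods=10, freq="D"),
)

# The daily increase is the first difference; the first day has no predecessor,
# so its value is NaN (as in the positiveIncrease column).
increase = positive.diff()
print(increase)
```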
[34]: (ggplot(maryland_overall,aes(y='positiveIncrease',x='date'))
+ geom_point() + geom_smooth(method='lm')
+ ggtitle("Increase in confirmed cases in MD"))
[34]: <ggplot: (146483016147)>
We see that the daily increase in confirmed cases, our proxy for the rate of transmission, still
seems to be on the rise, despite quarantine and stay-at-home orders.
We can also view the spread of COVID-19 through the ratio of positive tests to the total number
of tests given. Note that this ratio is not truly indicative of the rate of transmission of the
virus, since we imagine that mostly people who are exhibiting symptoms will go out to be tested.
Nevertheless, this graph gives us some interesting results.
[35]: (ggplot(maryland_overall,aes(y='positiveRatio',x='date'))
+ geom_point() + geom_smooth(method='lm')
+ ggtitle("Ratio of positive cases in MD"))
[35]: <ggplot: (146482781353)>
Here, the ratio of cases has a sudden discontinuity between 3/26 and 3/27. The reason
is that, prior to 3/27, the data tracked only positive tests. This skews the ratio heavily
towards positive cases and creates an implausible curve in the graph. To see the change in the
overall ratio, we therefore limit the graph to dates after 3/27. The COVID Tracking Project
corroborates this discontinuity here, where under the MD row in “States” they record that
Maryland did not report negative cases between 3/12 and 3/28 (a one-day discrepancy between this
plot and that record). Below you can find a plot where the values before 3/28 are thrown out and
a linear regression model is fit.
[36]: temp = maryland_overall.query('date > "2020-03-27"')
(ggplot(temp,aes(y='positiveRatio',x='date'))
+ geom_point() + geom_smooth(method='lm')
+ ggtitle("Ratio of positive cases in MD"))
[36]: <ggplot: (146473328399)>
Here is a facet plot of the cumulative cases for all counties in Maryland. From the chart it is clear
that some counties have far larger case growth rates than other counties (which is to be expected,
as only a few counties are very populous).
[37]: (ggplot(maryland_deaths, aes(x="date", y="cumulative_cases"))
+ geom_point()
+ facet_wrap('~location_name', ncol=4)
+ theme(axis_text_x = element_text(angle=90), figure_size=(30,30)))
[37]: <ggplot: (146482755697)>
This is a similar plot, but restricted to the 8 most populous counties (Montgomery, Prince George’s,
Baltimore, Baltimore City, Howard, Anne Arundel, Frederick, Harford) as listed here. As
population likely affects transmission rate, it may be useful to isolate these.
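A list like this could also be derived from the data instead of being typed by hand. The sketch below is only an illustration: it uses the populations of the ten counties visible in the maryland_confirmed output above (not the full state), so its result is the top three among those ten rather than the statewide ranking.

```python
import pandas as pd

# Populations copied from the ten rows of maryland_confirmed shown earlier.
population = pd.DataFrame({
    "county_name": ["Allegany", "Anne Arundel", "Baltimore", "Calvert", "Caroline",
                    "Carroll", "Cecil", "Charles", "Dorchester", "Frederick"],
    "total_population": [71977, 567696, 827625, 91082, 32875,
                         167522, 102517, 157671, 32261, 248472],
})

# nlargest returns the rows with the largest values in the given column.
top3 = population.nlargest(3, "total_population")["county_name"].tolist()
print(top3)
```

Applied to the full county table with n=8, the same call would give a data-driven version of the list above.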
[38]: # isin() avoids the quoting problem that the apostrophe in "Prince George's"
# causes inside a single-quoted query string
top_counties = ["Montgomery", "Prince George's", "Baltimore", "Baltimore City",
                "Frederick", "Howard", "Anne Arundel", "Harford"]
temp = maryland_deaths[maryland_deaths["location_name"].isin(top_counties)]
(ggplot(temp, aes(x="date", y="cumulative_cases"))
+ geom_point()
+ facet_wrap('~location_name', ncol=4)
+ theme(axis_text_x = element_text(angle=90), figure_size=(70,35)))
[38]: <ggplot: (-9223371890378682827)>
2.3 Hypothesis Testing and Machine Learning
Now we will perform some hypothesis testing to see if quarantine has actually changed the
transmission rate of COVID-19. To do so, we will fit linear regressions for the growth in positive
cases before and during the quarantine. Our null hypothesis is that there is no difference
between the two rates of increase, and our alternative hypothesis is that there is some difference.
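To make the mechanics of such a comparison concrete, the sketch below fits two lines at once with a single least-squares problem. It is an illustration under stated assumptions: the data is synthetic and noise-free, and plain numpy least squares is swapped in for a statistics library so that the design matrix (intercept, group indicator, day, and day times indicator) is visible. The last column is the interaction term, and its coefficient is the difference between the two slopes.

```python
import numpy as np

# Synthetic, noise-free data: slope 2 per day in the first period, slope 5 in the second.
# The day counter restarts at 1 in each period.
day = np.concatenate([np.arange(1, 11), np.arange(1, 11)]).astype(float)
status = np.concatenate([np.zeros(10), np.ones(10)])
cases = np.where(status == 0, 1.0 + 2.0 * day, 4.0 + 5.0 * day)

# Design matrix columns: intercept, status, day, day * status (the interaction term).
X = np.column_stack([np.ones_like(day), status, day, day * status])
coef, *_ = np.linalg.lstsq(X, cases, rcond=None)
intercept, status_shift, slope_before, slope_change = coef

# The second period's slope is the baseline slope plus the interaction coefficient.
slope_after = slope_before + slope_change
print(coef)
print(slope_after)
```

Testing whether the interaction coefficient differs from zero is then the test of "no difference between the two rates".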
[39]: # Split at 2020-04-14. jDate counts days within each period (starting at 1) by
# subtracting a fixed Julian date: 2458912.5 is 2020-03-04, 2458952.5 is 2020-04-13.
before_df = maryland_overall.query('date < "2020-04-14"').copy()
before_df['jDate'] = before_df['date'].apply(lambda d: d.to_julian_date() - 2458912.5)
before_df['status'] = "0"
after_df = maryland_overall.query('date >= "2020-04-14"').copy()
after_df['jDate'] = after_df['date'].apply(lambda d: d.to_julian_date() - 2458952.5)
after_df['status'] = "1"
compare_df = pd.concat([before_df, after_df])
(ggplot(compare_df,aes(y='positive',x='jDate',color='status'))
+ geom_point() + geom_smooth(method='lm')
+ ggtitle("Confirmed cases in MD before and after 2020-04-14"))
[39]: <ggplot: (-9223371890372047065)>
[40]: compare_res = sm.ols('positive~jDate*status', data=compare_df).fit()
compare_res.summary()
[40]: <class 'statsmodels.iolib.summary.Summary'>
"""
OLS Regression Results
==============================================================================
Dep. Variable: positive R-squared: 0.989
Model: OLS Adj. R-squared: 0.988
Method: Least Squares F-statistic: 1880.
Date: Mon, 18 May 2020 Prob (F-statistic): 2.55e-62
Time: 16:31:47 Log-Likelihood: -573.32
No. Observations: 68 AIC: 1155.
Df Residuals: 64 BIC: 1164.
Df Model: 3
Covariance Type: nonrobust
======================================================================================
                         coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------
Intercept          -2068.0115    368.746     -5.608      0.000   -2804.667   -1331.356
status[T.1]         9435.4242    577.425     16.341      0.000    8281.886    1.06e+04
jDate                191.6871     15.674     12.230      0.000     160.376     222.999
jDate:status[T.1]    711.5184     31.022     22.936      0.000     649.546     773.491
==============================================================================
Omnibus: 6.598 Durbin-Watson: 0.111
Prob(Omnibus): 0.037 Jarque-Bera (JB): 6.084
Skew: 0.722 Prob(JB): 0.0478
Kurtosis: 3.243 Cond. No. 99.8
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly
specified.
"""
The above plot and table show the information for a linear regression. The table shows us that
the rate of growth prior to the quarantine is ~191 new confirmed cases per day. The interaction
term jDate:status[T.1] is the change in slope between the two periods, so after quarantine began
and the incubation period passed, the rate increased by ~711 to roughly 903 new cases per day.
The p-values for both slope coefficients are below our designated α of 0.05, so both estimates
are statistically significant. In particular, the p-value for the difference between the two
slopes is reported as 0.000 (that is, below 0.001), meaning that we reject our null hypothesis
that the rate stayed the same throughout quarantine.
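Reading the table as an interaction model, the fitted line for each period can be recovered with a little arithmetic. The sketch below copies the coefficients from the summary above; the variable names are ours, and the reading assumes the usual treatment coding in which status "0" is the baseline.

```python
# Coefficients copied from the OLS summary above.
coef = {
    "Intercept": -2068.0115,
    "status[T.1]": 9435.4242,
    "jDate": 191.6871,
    "jDate:status[T.1]": 711.5184,
}

# The baseline (status 0) line uses the plain terms; the status 1 line adds
# the status[T.1] shift to the intercept and the interaction term to the slope.
slope_before = coef["jDate"]
slope_after = coef["jDate"] + coef["jDate:status[T.1]"]
intercept_after = coef["Intercept"] + coef["status[T.1]"]

print(round(slope_before, 4), round(slope_after, 4), round(intercept_after, 4))
```

Under this reading the second period's slope is about 903 cases per day, with an intercept of about 7367 cases at the start of that period's day counter.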
We should note that the rate actually increased drastically after quarantine (and the incubation
period), which is not what one would expect from a stay-at-home order and should not be read as
evidence that the order caused the increase. It makes sense for viruses, especially ones as
contagious as COVID, to spread at a rate proportional to the number of positive cases, as more
cases mean more opportunities for transmission. Additionally, there are many other factors, such
as differences between states, urbanization, and availability of testing, as well as other
reasons that are likely outside the scope of our understanding.
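Growth proportional to the current number of cases is exponential growth, which a straight-line model cannot capture but a log-linear one can. As a minimal sketch on synthetic data (the starting count of 100 and the rate of 0.1 per day are made-up values, not estimates for Maryland), fitting a line to the logarithm of the counts recovers the growth rate and the doubling time.

```python
import numpy as np

# Synthetic counts with a continuous growth rate r = 0.1 per day (not real data).
days = np.arange(0, 15)
cases = 100.0 * np.exp(0.1 * days)

# If dC/dt = r * C, then log(C) = log(C0) + r * t, a straight line in t with slope r.
r, log_c0 = np.polyfit(days, np.log(cases), deg=1)

# Time for the count to double under this model.
doubling_time = np.log(2) / r
print(r, np.exp(log_c0), doubling_time)
```

With real, noisy counts the fit would be approximate, and a changing growth rate would show up as curvature in the log plot.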
2.4 Other Resources
As COVID-19 is an ongoing pandemic, you should strive to stay informed about the
virus in general. Every major news source, and most minor sources as well, will have ongoing
updates concerning the virus. As our analysis was centered solely on Maryland,
you should check your local and state government webpages for their stance and laws
concerning the situation. For more in-depth news and articles concerning COVID, you should
follow the World Health Organization website for scholarly articles and global news. To view data
about the pandemic, we would recommend the data provided by the World Health Organization,
as well as the COVID Tracking Project and the data gathered by Johns Hopkins
University, which were used in our analysis. COVID-19 is a serious threat across the
globe, so the more informed you are and the more involved you are with the data, the better your
decision making will be on how to stay safe and healthy.
[ ]:
16

More Related Content

What's hot

DARPA -Transition of Technologies
DARPA -Transition of TechnologiesDARPA -Transition of Technologies
DARPA -Transition of TechnologiesDr Dev Kambhampati
 
Federal Statistical System, Transparency Camp West
Federal Statistical System, Transparency Camp WestFederal Statistical System, Transparency Camp West
Federal Statistical System, Transparency Camp Westbradstenger
 
Dr Dev Kambhampati | USA Cybersecurity R&D Strategic Plan
Dr Dev Kambhampati | USA Cybersecurity R&D Strategic PlanDr Dev Kambhampati | USA Cybersecurity R&D Strategic Plan
Dr Dev Kambhampati | USA Cybersecurity R&D Strategic PlanDr Dev Kambhampati
 
Data privacy and security in ICT4D - Meeting Report
Data privacy and security in ICT4D - Meeting Report Data privacy and security in ICT4D - Meeting Report
Data privacy and security in ICT4D - Meeting Report UN Global Pulse
 
Data Breach Research Plan 72415 FINAL
Data Breach Research Plan 72415 FINALData Breach Research Plan 72415 FINAL
Data Breach Research Plan 72415 FINALJoseph White MPA CPM
 
LIS 60030 Final Project
LIS 60030 Final ProjectLIS 60030 Final Project
LIS 60030 Final ProjectLaura Levy
 
Big data for development
Big data for development Big data for development
Big data for development Junaid Qadir
 
Open Data Sources for Grants
Open Data Sources for GrantsOpen Data Sources for Grants
Open Data Sources for Grantsjasonparker83
 
Twitter Based Election Prediction and Analysis
Twitter Based Election Prediction and AnalysisTwitter Based Election Prediction and Analysis
Twitter Based Election Prediction and AnalysisIRJET Journal
 
How to handle government related questions.
How to handle government related questions.How to handle government related questions.
How to handle government related questions.Kyle Guzik
 
Fomcprojtabl20210616
Fomcprojtabl20210616Fomcprojtabl20210616
Fomcprojtabl20210616Josua Pardede
 
Newspaper Editors vs the Crowd: On the Appropriateness of Front Page News Sel...
Newspaper Editors vs the Crowd: On the Appropriateness of Front Page News Sel...Newspaper Editors vs the Crowd: On the Appropriateness of Front Page News Sel...
Newspaper Editors vs the Crowd: On the Appropriateness of Front Page News Sel...Gabriela Agustini
 
Arbor U.S. Economic Overview 2018 q4
Arbor U.S. Economic Overview 2018 q4Arbor U.S. Economic Overview 2018 q4
Arbor U.S. Economic Overview 2018 q4Ivan Kaufman
 
El periodismo de datos en 2017
El periodismo de datos en 2017El periodismo de datos en 2017
El periodismo de datos en 2017María Rubio
 
2018 Economic Forecast Dragas Center for Economic Analysis and Policy - (31 J...
2018 Economic Forecast Dragas Center for Economic Analysis and Policy - (31 J...2018 Economic Forecast Dragas Center for Economic Analysis and Policy - (31 J...
2018 Economic Forecast Dragas Center for Economic Analysis and Policy - (31 J...rmcnab67
 

What's hot (17)

DARPA -Transition of Technologies
DARPA -Transition of TechnologiesDARPA -Transition of Technologies
DARPA -Transition of Technologies
 
Federal Statistical System, Transparency Camp West
Federal Statistical System, Transparency Camp WestFederal Statistical System, Transparency Camp West
Federal Statistical System, Transparency Camp West
 
Dr Dev Kambhampati | USA Cybersecurity R&D Strategic Plan
Dr Dev Kambhampati | USA Cybersecurity R&D Strategic PlanDr Dev Kambhampati | USA Cybersecurity R&D Strategic Plan
Dr Dev Kambhampati | USA Cybersecurity R&D Strategic Plan
 
Data privacy and security in ICT4D - Meeting Report
Data privacy and security in ICT4D - Meeting Report Data privacy and security in ICT4D - Meeting Report
Data privacy and security in ICT4D - Meeting Report
 
Data Breach Research Plan 72415 FINAL
Data Breach Research Plan 72415 FINALData Breach Research Plan 72415 FINAL
Data Breach Research Plan 72415 FINAL
 
LIS 60030 Final Project
LIS 60030 Final ProjectLIS 60030 Final Project
LIS 60030 Final Project
 
Big data for development
Big data for development Big data for development
Big data for development
 
Open Data Sources for Grants
Open Data Sources for GrantsOpen Data Sources for Grants
Open Data Sources for Grants
 
Estudio data journalism-in-2017 - google
Estudio data journalism-in-2017 - googleEstudio data journalism-in-2017 - google
Estudio data journalism-in-2017 - google
 
Twitter Based Election Prediction and Analysis
Twitter Based Election Prediction and AnalysisTwitter Based Election Prediction and Analysis
Twitter Based Election Prediction and Analysis
 
How to handle government related questions.
How to handle government related questions.How to handle government related questions.
How to handle government related questions.
 
Fomcprojtabl20210616
Fomcprojtabl20210616Fomcprojtabl20210616
Fomcprojtabl20210616
 
Newspaper Editors vs the Crowd: On the Appropriateness of Front Page News Sel...
Newspaper Editors vs the Crowd: On the Appropriateness of Front Page News Sel...Newspaper Editors vs the Crowd: On the Appropriateness of Front Page News Sel...
Newspaper Editors vs the Crowd: On the Appropriateness of Front Page News Sel...
 
Arbor U.S. Economic Overview 2018 q4
Arbor U.S. Economic Overview 2018 q4Arbor U.S. Economic Overview 2018 q4
Arbor U.S. Economic Overview 2018 q4
 
Workforce Development in the Post Pandemic World
Workforce Development in the Post Pandemic WorldWorkforce Development in the Post Pandemic World
Workforce Development in the Post Pandemic World
 
El periodismo de datos en 2017
El periodismo de datos en 2017El periodismo de datos en 2017
El periodismo de datos en 2017
 
2018 Economic Forecast Dragas Center for Economic Analysis and Policy - (31 J...
2018 Economic Forecast Dragas Center for Economic Analysis and Policy - (31 J...2018 Economic Forecast Dragas Center for Economic Analysis and Policy - (31 J...
2018 Economic Forecast Dragas Center for Economic Analysis and Policy - (31 J...
 

Coronavirus Case Tracking

coronavirus-case-tracking
May 31, 2020

1 The Data Science Pipeline: COVID-19 Case Tracking
Philip Tian, Jonathan Lin, David Ahmed

1.1 Introduction
We will introduce the data science pipeline by analyzing data from the (currently ongoing) COVID-19 pandemic in the United States. COVID-19, often called simply the coronavirus, is the disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2); it first appeared in the continental United States in early 2020. A comprehensive list of known and documented symptoms can be found here. As of May 15, 2020, there are around 4.5 million confirmed infections worldwide, which is why many experts in the media and academia are calling this crisis a pandemic.

The COVID-19 crisis has caused many temporary economic and social changes. For example, most states in the United States have issued stay-at-home orders, which set guidelines on when and why one may leave home. These orders mean that much normal work activity has stopped, and countries have been struggling with sudden drops in economic productivity (documented in news stories like this one or this one). The economic damage has a large impact on the well-being of people all across the world: unemployment is growing in several countries as a direct consequence of the pandemic and of governmental policies such as stay-at-home orders and the closure of “non-essential businesses”.

Given this balance between maintaining local or national economic health and preventing the spread of a dangerous disease, policymakers are faced with certain questions about the current state of affairs.
- How long should we maintain a stay-at-home order nationally/locally?
- Are other states/countries handling the situation “better”? What are they doing differently?
- What does “better” even mean? Can we quantify these things when we are making policy decisions?
- How will cases grow from today? Can our current infrastructure (hospitals) handle this growth?

Faced with a variety of policy decisions that might literally cost lives, policymakers are walking on tightropes, and in this age of information it is especially important that the decision finally made is close to optimal. But how does one know whether or not a decision is optimal? Such a question can be approached using the techniques of data science.

1.1.1 The Importance of Data Science During the COVID-19 Crisis
As the total number of COVID-19 cases increases in the USA and worldwide, it seems almost too easy for the mainstream media to capitalize on all the new cases to churn out news stories. An increase in cases (perhaps coupled with some official or expert statement) is a news story waiting to happen. For example, here, here, and here are all news stories which were directly obtained by internet searching the phrase “increase in cases”. With this influx of information, there are certain questions we should ask.
- Can we trust that the data being reported is accurate and correct?
- What exactly does 1000 (or 2000, or 500) cases daily mean in the context of a certain state/country/county?
- To what extent can we predict how the crisis (especially in terms of the number of cases) evolves over time?
- Is it possible to make policy or economic decisions based on such analysis, and should we make these decisions?

It is precisely the field of data science, with its techniques and methods, which allows us to analyze the data carefully and so produce more accurate answers to the questions above. By extracting, manipulating, and analyzing sets of data, we can keep policymakers far better informed about the status of the crisis, locally and nationally, than they would be by following day-by-day reports and playing it by ear. Gathering such information is critical, especially at a time when reliable information seems hard to find.

In order to even start considering questions like those posed above, we must consider the main stages of the data science pipeline, which we have listed below.
1. Data Collection: data which is relevant to the project is found and collected.
2. Data Manipulation and Cleaning: data is manipulated into a format which is suitable for analysis.
3. Initial Data Visualization (or exploratory data analysis): numerical data can be graphed to spot trends.
4. Modeling and Prediction: one can use various methods to make predictions about future data or the general population.

In our project, we will attempt to illustrate the data science pipeline with respect to the COVID-19 crisis.
We will illustrate (with working code) the basic aspects of this pipeline as described above, starting from the beginning. First we will discuss the data and where we got it from, as well as its background and an assessment of its accuracy. Then, we will manipulate and modify the data for our purposes. Finally, we will use the data we obtained to make future predictions.

1.2 Data Collection
For this project, we would like to collect data about total cases in the United States, broken down to the local level of states and counties. We have obtained data from the sources described below.

1.2.1 The COVID Tracking Project
The COVID Tracking Project is a collective volunteer effort to document data about the crisis day by day. Their data collection claims to be comprehensive, with data from every state and most US territories. Their data, as one of the only comprehensive sources of data on cases in the US, is being used by many news outlets and academic experts. We will be using their data on the day-by-day number of cases in the states and in the US as a whole. In the code below, this data is contained in the csv files Data/CTPStates-historical.csv, Data/CTPTestingMaryland.csv and Data/CTPUS-historical: the first contains the data for the states, the second the number of tests given in Maryland, and the last the overall US data. These spreadsheets give us the data we need for cases over time. The data is updated daily and can be found here in this spreadsheet.
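As a minimal sketch of how a file of this kind might be read, assume a layout like the one used later in this notebook: a `state` code, a `date` stored as an integer in YYYYMMDD form, and a cumulative `positive` count. The in-memory CSV string is a stand-in for Data/CTPStates-historical.csv, and its three rows are Maryland's first values as they appear later in this document; the reduced column set is an assumption made for illustration.

```python
import io

import pandas as pd

# Stand-in for Data/CTPStates-historical.csv; the three-column layout is an
# assumption, and the rows are deliberately out of order.
sample_csv = io.StringIO(
    "date,state,positive\n"
    "20200307,MD,3\n"
    "20200305,MD,0\n"
    "20200306,MD,3\n"
)

states = pd.read_csv(sample_csv)

# Dates arrive as integers such as 20200305, so convert to text before parsing.
states["date"] = pd.to_datetime(states["date"].astype(str), format="%Y%m%d")

# Keep one state and put its rows in chronological order.
maryland = (
    states.query('state == "MD"')
    .sort_values("date")
    .reset_index(drop=True)
)
print(maryland)
```

Parsing the dates up front means later steps such as sorting, merging and filtering operate on real dates rather than on text.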
1.2.2 COVID-19 Data from JHU
Johns Hopkins University has a nice GUI which displays the current case data from around the world. It can be found here. It is a very visually striking application and readers are encouraged to go and play around with it. But where does this application pull data from? It pulls the data from a GitHub repository which JHU maintains for the purpose of keeping the application up to date. The data is more fine-grained than that of the COVID Tracking Project, in that it tracks cases and deaths by county rather than by state. For example, in Maryland, one might notice that the majority of cases arise in Montgomery County and Prince George’s County, with fewer cases overall in neighboring counties (though Baltimore County does not trail too far behind). The data we have obtained from JHU is packaged into the files Data/CasesDeathsCounty.csv and Data/CountyConfirmedCases.csv.

1.2.3 Date of Stay-at-Home Information
We can find the date on which each state established a stay-at-home order through this CNN Article, which gives detailed information on when each US state established its order.

Limitations of the data gathered
As highlighted above, there is plenty of data on total COVID-19 cases over time, both nationally and locally. Even so, one must be aware of the limitations of such convenient data. The primary concern when working with this data is accuracy. If a policymaker were to use such data and its trends to make important decisions relevant to the crisis, then the first questions they ask should be:
- Is this data reliable?
- Can I trust that the data values are accurately measured?

This concern about accuracy is very important when considering stay-at-home orders, shutdowns, and other impactful decisions.
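A first, partial answer to the reliability question can come from internal consistency checks. The sketch below is an illustration rather than part of the original analysis: the function name `basic_checks`, its default column names, and the toy numbers are all hypothetical. It flags missing values, repeated dates, and any day on which a cumulative count goes down.

```python
import pandas as pd


def basic_checks(df, date_col="date", count_col="cumulative_cases"):
    """Return simple red flags for one location's cumulative series.

    Hypothetical helper for illustration; column names are assumptions.
    """
    # mergesort is a stable sort, so rows sharing a date keep their order.
    ordered = df.sort_values(date_col, kind="mergesort")
    return {
        "missing_values": int(ordered[count_col].isna().sum()),
        "duplicate_dates": int(ordered[date_col].duplicated().sum()),
        # A cumulative count should never drop from one row to the next.
        "decreases": int((ordered[count_col].diff() < 0).sum()),
    }


# Toy series, made up for illustration: one repeated date and one drop.
toy = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2020-04-10", "2020-04-11", "2020-04-11", "2020-04-12"]
        ),
        "cumulative_cases": [10, 11, 13, 12],
    }
)

report = basic_checks(toy)
print(report)
```

Passing checks like these does not show that the reported numbers are correct; it only rules out some obvious recording errors.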
The main limitation of large datasets which gather measurements from many locations, such as the COVID Tracking Project data and the JHU dataset, is that the accuracy of the whole depends on a great many sources. For the COVID Tracking Project, there is at least one source for each state and territory. For JHU, there is at least one source for each county. This is seemingly even worse, as a lot more can go wrong.

Concerns about the COVID Tracking Project Data
The COVID Tracking Project dataset pulls data from the relevant state and territory government health services. This means that the reliability of this data starts and ends with the accuracy of state government reporting. It is a difficult job to figure out how each state is measuring its data and whether or not it is accurate.

Concerns about the JHU Dataset
As of May 16, the GitHub page for the JHU dataset has over 1300 reported issues. Some of these issues are complete non-issues. But some of them report supposedly serious problems with the data, for example this one which claims that the applet is not reporting the case number for the country of Nepal correctly. A lot of the issues with the data are encoding related, having to do with state or county codes. Hopefully these will not be too much of an issue with respect to this project.

For the purposes of our project, we need not concern ourselves too much with the reliability of the data collection. It is not our job, and besides, we do not have the means to verify the data ourselves. However, if we were in a position of more power, with the influence to affect policy decisions, then weighing these issues is something that we would absolutely have to do. Hopefully we have made the point that in many cases the hardest part of the data science pipeline is verifying that the data to be analyzed is as accurate as possible. But for the purposes of continuing our project we will assume that it is accurate.

2 Setup
Here we set up all the libraries we will be using. We list them below:
- pandas will be used for data manipulation. We will use it to format our raw data into tables.
- plotnine is a Python library similar to ggplot2 in R. We will use it to graph our data.
- numpy is a scientific computing library.
- statsmodels is a statistical library that we will use for our data analysis.

[29]:
    import pandas as pd
    from plotnine import *
    import numpy as np
    import statsmodels.formula.api as sm
    import warnings
    warnings.filterwarnings("ignore")

2.1 Data Manipulation
Here, the code manipulates the data into a format amenable to analysis. First we extract the information from the .csv files into pandas dataframes using the read_csv command. Then we filter the relevant rows into two data tables, maryland_deaths and maryland_confirmed.

[30]:
    # Maryland's 23 counties plus Baltimore City
    counties = set(["Allegany", "Anne Arundel", "Baltimore", "Calvert", "Caroline",
                    "Carroll", "Cecil", "Charles", "Dorchester", "Frederick",
                    "Garrett", "Harford", "Howard", "Kent", "Montgomery",
                    "Prince George's", "Queen Anne's", "St. Mary's", "Somerset",
                    "Talbot", "Washington", "Wicomico", "Worcester", "Baltimore City"])
    county_deaths = pd.read_csv("Data/CasesDeathsCounty.csv")
    county_confirmed = pd.read_csv("Data/CountyConfirmedCases.csv")
    testing = pd.read_csv("Data/CTPTestingMaryland.csv")
    testing['date'] = pd.to_datetime(testing['date'], format='%Y-%m-%d')
    maryland_deaths = (
        county_deaths.loc[county_deaths["state"] == "Maryland"]
        .loc[county_deaths["location_name"].isin(counties)]
        # NOTE: "date" holds MM/DD/YYYY strings here, so between() compares
        # text rather than calendar dates. The bounds are left exactly as in
        # the original run, since the output shown below depends on them.
        .loc[county_deaths["date"].between("04/1/2020", "05/12/2018")]
        .reset_index(drop=True)
    )
    maryland_confirmed = (
        county_confirmed.loc[county_confirmed["county_name"].isin(counties)]
        .query('state == "Maryland"')
        .reset_index(drop=True)
    )
    maryland_deaths.head(10)

[30]:
            uid location_type  fips_code location_name     state        date
    0  84024001        county    24001.0      Allegany  Maryland  04/10/2020
    1  84024001        county    24001.0      Allegany  Maryland  04/11/2020
    2  84024001        county    24001.0      Allegany  Maryland  04/12/2020
    3  84024001        county    24001.0      Allegany  Maryland  04/13/2020
    4  84024001        county    24001.0      Allegany  Maryland  04/14/2020
    5  84024001        county    24001.0      Allegany  Maryland  04/15/2020
    6  84024001        county    24001.0      Allegany  Maryland  04/16/2020
    7  84024001        county    24001.0      Allegany  Maryland  04/17/2020
    8  84024001        county    24001.0      Allegany  Maryland  04/18/2020
    9  84024001        county    24001.0      Allegany  Maryland  04/19/2020

       total_population  cumulative_cases  cumulative_cases_per_100_000
    0           71977.0                10                         13.89
    1           71977.0                11                         15.28
    2           71977.0                13                         18.06
    3           71977.0                15                         20.84
    4           71977.0                17                         23.62
    5           71977.0                17                         23.62
    6           71977.0                20                         27.79
    7           71977.0                26                         36.12
    8           71977.0                33                         45.85
    9           71977.0                33                         45.85

       cumulative_deaths  cumulative_deaths_per_100_000  new_cases  new_deaths
    0                  0                           0.00        2.0         0.0
    1                  0                           0.00        1.0         0.0
    2                  0                           0.00        2.0         0.0
    3                  0                           0.00        2.0         0.0
    4                  0                           0.00        2.0         0.0
    5                  1                           1.39        0.0         1.0
    6                  1                           1.39        3.0         0.0
    7                  1                           1.39        6.0         0.0
    8                  1                           1.39        7.0         0.0
    9                  1                           1.39        0.0         0.0

       new_cases_per_100_000  new_deaths_per_100_000
    0                   2.78                    0.00
    1                   1.39                    0.00
    2                   2.78                    0.00
    3                   2.78                    0.00
    4                   2.78                    0.00
    5                   0.00                    1.39
    6                   4.17                    0.00
    7                   8.34                    0.00
    8                   9.73                    0.00
    9                   0.00                    0.00

[31]:
    maryland_confirmed.head(10)

[31]:
               last_update     state   county_name            county_name_long
    0  2020-05-12 21:32:28  Maryland      Allegany      Allegany, Maryland, US
    1  2020-05-12 21:32:28  Maryland  Anne Arundel  Anne Arundel, Maryland, US
    2  2020-05-12 21:32:28  Maryland     Baltimore     Baltimore, Maryland, US
    3  2020-05-12 21:32:28  Maryland       Calvert       Calvert, Maryland, US
    4  2020-05-12 21:32:28  Maryland      Caroline      Caroline, Maryland, US
    5  2020-05-12 21:32:28  Maryland       Carroll       Carroll, Maryland, US
    6  2020-05-12 21:32:28  Maryland         Cecil         Cecil, Maryland, US
    7  2020-05-12 21:32:28  Maryland       Charles       Charles, Maryland, US
    8  2020-05-12 21:32:28  Maryland    Dorchester    Dorchester, Maryland, US
    9  2020-05-12 21:32:28  Maryland     Frederick     Frederick, Maryland, US

       fips_code        lat        lon   NCHS_urbanization  total_population
    0    24001.0  39.623576 -78.692805         Small metro           71977.0
    1    24003.0  39.006702 -76.603293  Large fringe metro          567696.0
    2    24005.0  39.457847 -76.629120  Large fringe metro          827625.0
    3    24009.0  38.539616 -76.568206  Large fringe metro           91082.0
    4    24011.0  38.871723 -75.829042            Non-core           32875.0
    5    24013.0  39.564536 -77.023737  Large fringe metro          167522.0
    6    24015.0  39.566477 -75.946274  Large fringe metro          102517.0
    7    24017.0  38.510923 -76.985807  Large fringe metro          157671.0
    8    24019.0  38.454135 -76.027524        Micropolitan           32261.0
    9    24021.0  39.472966 -77.399994  Large fringe metro          248472.0

       confirmed  confirmed_per_100000  deaths  deaths_per_100000
    0        148                205.62      13              18.06
    1       2520                443.90     127              22.37
    2       4051                489.47     204              24.65
    3        211                231.66      13              14.27
    4        174                529.28       0               0.00
    5        589                351.60      60              35.82
    6        270                263.37      15              14.63
    7        761                482.65      55              34.88
    8        102                316.17       2               6.20
    9       1282                515.95      77              30.99

[32]:
    maryland_overall = pd.read_csv("Data/CTPStates-historical.csv").query('state == "MD"').reset_index()
    maryland_overall['date'] = pd.to_datetime(maryland_overall['date'], format='%Y%m%d')
    maryland_overall = maryland_overall.sort_values('date')
    maryland_overall = maryland_overall.merge(testing, left_on='date', right_on='date', how='inner')
    maryland_overall['positiveRatio'] = maryland_overall['positive'] / maryland_overall['cumulative_total_people_tested']
    maryland_overall = maryland_overall[['date','positive','positiveIncrease','positiveRatio','cumulative_total_peo
    maryland_overall.head(10)

[32]:
            date  positive  positiveIncrease  positiveRatio
    0 2020-03-05       0.0               NaN       0.000000
    1 2020-03-06       3.0               3.0       0.103448
    2 2020-03-07       3.0               0.0       0.068182
    3 2020-03-08       3.0               0.0       0.054545
    4 2020-03-09       5.0               2.0       0.064103
    5 2020-03-10       6.0               1.0       0.063158
    6 2020-03-11       9.0               3.0       0.087379
    7 2020-03-12      12.0               3.0       0.113208
    8 2020-03-13      17.0               5.0       0.153153
    9 2020-03-14      26.0               9.0       0.216667

       cumulative_total_people_tested
    0                              17
    1                              29
    2                              44
    3                              55
    4                              78
    5                              95
    6                             103
    7                             106
    8                             111
    9                             120

2.2 Exploratory Data Analysis
The simplest way we can view the data is by looking at the number of positive tested cases over time. This will give us a basic understanding of the spread of COVID-19.

[33]:
    (ggplot(maryland_overall, aes(y='positive', x='date'))
     + geom_point()
     + ggtitle("Number of Confirmed Cases in MD")
     + xlab("Date")
     + ylab("Cases"))
[33]: <ggplot: (146476666316)>

We can see that the number of cases is rapidly increasing, and it seems like the rate is stabilizing into a more linear shape as time goes on.

Another quantity worth plotting is the increase in positive cases per day, which is essentially the derivative of the cumulative count at each day. This will give us an idea of how the rate of transmission is changing.

[34]: (ggplot(maryland_overall, aes(y='positiveIncrease', x='date'))
       + geom_point()
       + geom_smooth(method='lm')
       + ggtitle("Increase in confirmed cases in MD"))
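The daily increase plotted above is the day-over-day difference of the cumulative count. A minimal sketch, assuming the `positive` column is cumulative and using the first ten Maryland values from the `maryland_overall.head(10)` output; the frame name `sample` and the column name `increase` are only for illustration:

```python
import pandas as pd

# First ten cumulative positive counts for Maryland, copied from the
# maryland_overall.head(10) output shown earlier.
sample = pd.DataFrame({
    "date": pd.date_range("2020-03-05", periods=10, freq="D"),
    "positive": [0, 3, 3, 3, 5, 6, 9, 12, 17, 26],
})

# Discrete derivative: today's cumulative count minus yesterday's.
# The first day has no predecessor, so its increase is NaN.
sample["increase"] = sample["positive"].diff()
print(sample)
```

On these rows the differences agree with the `positiveIncrease` column reported in the source data.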
[34]: <ggplot: (146483016147)>

We see that the rate of transmission still seems to be on the rise, despite quarantine and stay-at-home orders.

We can also view the spread of COVID-19 through the ratio of tests that return positive for the virus to the total number of tests given. Note that this ratio is not truly indicative of the true rate of transmission of the virus, as we imagine only people who are exhibiting symptoms will go out to be tested. Nevertheless, this graph gives us some interesting results.

[35]: (ggplot(maryland_overall, aes(y='positiveRatio', x='date'))
       + geom_point()
       + geom_smooth(method='lm')
       + ggtitle("Ratio of positive cases in MD"))
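As a small worked example of how this ratio is formed, the sketch below divides cumulative positives by cumulative people tested, using the first five Maryland rows from the `maryland_overall.head(10)` output. The frame name `sample` is only for illustration:

```python
import pandas as pd

# First five rows of the two columns used for the ratio, copied from the
# maryland_overall.head(10) output shown earlier.
sample = pd.DataFrame({
    "positive": [0, 3, 3, 3, 5],
    "cumulative_total_people_tested": [17, 29, 44, 55, 78],
})

# Both inputs are cumulative, so this is a running ratio over all tests
# to date rather than a per-day positivity rate.
sample["positiveRatio"] = (
    sample["positive"] / sample["cumulative_total_people_tested"]
)
print(sample.round(6))
```

Because both columns are cumulative, a single day with unusual reporting shifts every later value of the ratio, which is worth keeping in mind when reading the plot.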
[35]: <ggplot: (146482781353)>

Here, the ratio has a sudden discontinuity between 3/26 and 3/27. The reason is that, prior to 3/27, only positive tests were tracked in the data. This skews the ratio heavily toward the number of positive cases and creates an improbable curve in the graph. To see the change in the overall ratio, we will limit the graph to dates after 3/27.

The COVID Tracking Project corroborates this discontinuity here: under the MD row in “States”, it records that Maryland did not report negative cases between 3/12 and 3/28 (a one-day discrepancy between this plot and that record). Below you can find a plot where the values before 3/28 are thrown out and a linear regression model is fit.

[36]: temp = maryland_overall.query('date > "2020-03-27"')
      (ggplot(temp, aes(y='positiveRatio', x='date'))
       + geom_point()
       + geom_smooth(method='lm')
       + ggtitle("Ratio of positive cases in MD"))
[36]: <ggplot: (146473328399)>

Here is a facet plot of the cumulative cases for all counties in Maryland. From the chart it is clear that some counties have far larger case growth rates than other counties (which is to be expected, as only a few counties are very populous).

[37]: (ggplot(maryland_deaths, aes(x="date", y="cumulative_cases"))
       + geom_point()
       + facet_wrap('~location_name', ncol=4)
       + theme(axis_text_x=element_text(angle=90), figure_size=(30, 30)))
[37]: <ggplot: (146482755697)>

This is a similar plot, but restricted to the 8 most populous counties (Montgomery, Prince George’s, Baltimore, Baltimore City, Howard, Anne Arundel, Frederick, Harford) as listed here. As population likely affects transmission rate, it may be useful to isolate these.

[38]: # A list with isin() avoids quoting problems with the apostrophe in "Prince George's".
      top_counties = ["Montgomery", "Prince George's", "Baltimore", "Baltimore City",
                      "Frederick", "Howard", "Anne Arundel", "Harford"]
      temp = maryland_deaths[maryland_deaths['location_name'].isin(top_counties)]
      (ggplot(temp, aes(x="date", y="cumulative_cases"))
       + geom_point()
       + facet_wrap('~location_name', ncol=4)
       + theme(axis_text_x=element_text(angle=90), figure_size=(70, 35)))

[38]: <ggplot: (-9223371890378682827)>

2.3 Hypothesis Testing and Machine Learning

Now we will perform some hypothesis testing to see if quarantine has actually changed the transmission rate of COVID-19. To do so, we will fit linear regressions to the number of positive cases before and during the quarantine and compare their slopes. Our null hypothesis is that there is no difference between the rates of increase, and our alternative hypothesis is that there is some difference.

[39]: # Split the data at 2020-04-14 and re-zero the day counter in each segment.
      # .copy() avoids assigning new columns to a view of maryland_overall.
      before_df = maryland_overall.query('date < "2020-04-14"').copy()
      # Julian date 2458912.5 is 2020-03-04, so jDate counts days since then.
      before_df['jDate'] = [d.to_julian_date() - 2458912.5 for d in before_df['date']]
      before_df['status'] = "0"

      after_df = maryland_overall.query('date >= "2020-04-14"').copy()
      # Julian date 2458952.5 is 2020-04-13, so jDate counts days since then.
      after_df['jDate'] = [d.to_julian_date() - 2458952.5 for d in after_df['date']]
      after_df['status'] = "1"

      compare_df = pd.concat([before_df, after_df])
      (ggplot(compare_df, aes(y='positive', x='jDate', color='status'))
       + geom_point()
       + geom_smooth(method='lm')
       + ggtitle("Number of confirmed cases in MD before and after 2020-04-14"))

[39]: <ggplot: (-9223371890372047065)>

[40]: compare_res = sm.ols('positive~jDate*status', data=compare_df).fit()
      compare_res.summary()

[40]: <class 'statsmodels.iolib.summary.Summary'>
"""
                            OLS Regression Results
==============================================================================
Dep. Variable:               positive   R-squared:                       0.989
Model:                            OLS   Adj. R-squared:                  0.988
Method:                 Least Squares   F-statistic:                     1880.
Date:                Mon, 18 May 2020   Prob (F-statistic):           2.55e-62
Time:                        16:31:47   Log-Likelihood:                -573.32
No. Observations:                  68   AIC:                             1155.
Df Residuals:                      64   BIC:                             1164.
Df Model:                           3
Covariance Type:            nonrobust
=====================================================================================
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
Intercept         -2068.0115    368.746     -5.608      0.000   -2804.667   -1331.356
status[T.1]        9435.4242    577.425     16.341      0.000    8281.886    1.06e+04
jDate               191.6871     15.674     12.230      0.000     160.376     222.999
jDate:status[T.1]   711.5184     31.022     22.936      0.000     649.546     773.491
==============================================================================
Omnibus:                        6.598   Durbin-Watson:                   0.111
Prob(Omnibus):                  0.037   Jarque-Bera (JB):                6.084
Skew:                           0.722   Prob(JB):                       0.0478
Kurtosis:                       3.243   Cond. No.                         99.8
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
"""

The above plot and table show the information for a linear regression with an interaction between jDate and status. The table shows us that the rate of increase prior to the quarantine cutoff is ~191 new confirmed cases per day. The jDate:status[T.1] coefficient is the difference between the two slopes, so after quarantine began and the incubation period passed, the rate increased by ~711 to roughly 191 + 711 ≈ 903 new cases per day. The p-values for both slope coefficients are lower than our designated α of 0.05, so both estimates are statistically significant. The p-value for the interaction term, which compares the two slopes, is reported as 0.000, meaning that we reject our null hypothesis that the rate stayed the same throughout quarantine.

Something we should note is that our estimated rate actually increased drastically after quarantine (and the incubation period), which is not at all what we would expect, nor likely what actually happened to the underlying transmission rate. It makes sense for a virus, especially one as contagious as COVID-19, to spread at a rate proportional to the number of positive cases, as more cases mean a higher rate of transmission.
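The point about growth being proportional to the number of cases can be illustrated with a small synthetic sketch. This is not the Maryland data: the starting count of 10 and the `growth_rate` of 0.1 per day are made-up values chosen only for illustration. When new cases are proportional to current cases, the counts grow exponentially, so a straight-line fit over a later window has a steeper slope than one over an earlier window even though the underlying growth rate never changes.

```python
import numpy as np

# Synthetic illustration only: constant exponential growth.
days = np.arange(30)
growth_rate = 0.1  # assumed value, not estimated from any real data
cases = 10 * np.exp(growth_rate * days)

# Straight-line fits on the raw counts over an early and a late window.
# np.polyfit returns coefficients highest degree first, so [0] is the slope.
early_slope = np.polyfit(days[:15], cases[:15], 1)[0]
late_slope = np.polyfit(days[15:], cases[15:], 1)[0]

# On the log scale the same data fall on one line whose slope is the growth rate.
log_slope = np.polyfit(days, np.log(cases), 1)[0]

print("early linear slope:", early_slope)
print("late linear slope: ", late_slope)
print("log-scale slope:   ", log_slope)
```

Under this assumption, a larger slope in the later window does not by itself show that transmission accelerated; comparing slopes on a log scale may be a fairer check.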
Additionally, there are many other factors, such as differences between states, urbanization, and availability of testing, as well as other reasons that are likely outside the scope of our understanding.

2.4 Other Resources

As COVID-19 is an ongoing pandemic, you should strive to stay informed about the virus in general. Every major news source, and most minor sources as well, will have ongoing updates concerning the virus. As our analysis was centered solely on Maryland, you should check your local and state government webpages for their stance and laws regarding the situation. For more in-depth news and articles about COVID-19, you should follow the World Health Organization website for scholarly articles and global news. To view data about the pandemic, we recommend the data provided by the World Health Organization, as well as the COVID Tracking Project and the data gathered by Johns Hopkins University, which were used in our analysis. COVID-19 is a serious threat across the globe, so the more informed you are and the more involved you are with the data, the better your decision making will be on how to stay safe and healthy.