Forecasting covid 19 by states with mobility data

Uncover COVID-19
Forecasting COVID-19 by States with Mobility Data
Group 3
Srinivasa Chaitanya Sai Mupparisetty
Sai Kumar Mukka
Yasas Wijesuriya
Project AIT-664
Dr. Hemant Purohit

Outline
[1] Introduction
[2] Data Acquisition and Preparation
[3] Approach and Models
[4] Results and Discussion
[5] Conclusion

Introduction
● COVID-19 is an ongoing pandemic.
● We analyze the spread of COVID-19 in the USA at the state level and integrate it with human mobility data.
● We study the relationship between COVID-19 spread and human mobility, where mobility captures differences in people's behaviour.
[Figure: pyramid with four numbered levels: (4) Data: signals, know-nothing; (3) Information: useful, organized, structured; (2) Knowledge: contextual, learning; (1) Knowledge: inference, understanding, actionable]

Challenges
● We focused on one specific aspect of the initial requirements in this study, because integrating multiple datasets at multiple levels of granularity is complex
○ E.g. patient level vs state level
○ We focus only on state-level data
● Modeling itself was tricky because internal and external factors keep changing how COVID-19 spreads
○ We used Prophet (by Facebook), which removes much of the manual fine-tuning the model would otherwise need.
○ Prophet is a procedure for forecasting time series data based on an additive model in which non-linear trends are fit together with weekly effects.

Research Questions
1. How did lockdown* affect the spread of COVID-19?
Mobility serves as a proxy for how lockdowns/stay-at-home affect the spread of the virus
2. How does the effect of mobility on COVID-19 spread vary from state to state?
We study the effects of lockdown in the four states with the highest number of COVID-19 cases
*Note that we are interested only in the actual state of lockdown/stay-at-home, not in the effect of state-enforced lockdown orders. To avoid confusion over the meaning of lockdown, we refer to this simply as mobility from here on.

Data and Information Acquisition
● Main Dataset - UNCOVER COVID-19 Challenge [1]
➔ A collection of over 200 publicly available datasets from sources such as the World Health Organisation, the New York Times, Johns Hopkins, the World Bank, Google Mobility Data and many more.
➔ It contains a variety of statistics: local and global infection rates, social distancing rules and regulations, and geospatial data on the movement of people.
● US States to Code Mapping [2]
➔ State → Abbreviation
➔ E.g. Virginia → VA

Dataset - New York Times [NYTD]
● The New York Times dataset contains five columns
○ date - Date of the record
○ state - US state where the cases were recorded
○ fips - Federal Information Processing Standard code of the state (numeric)
○ cases - (cumulative) total number of cases up to that date
○ deaths - (cumulative) total number of deaths up to that date
● We are interested in the columns shown in bold on the slide

Total Number of COVID-19 Cases
[Figure: Choropleth map of the percentage of COVID-19 cases in each state as of 07-25-2020]
● For each state we took the maximum case count and created a new column, cases_p, containing that state's percentage of the total cases in the USA.
● We sorted the states by number of cases in descending order, so the states with the most cases come first.
[Figure: Number of cases in the states with the most cases]
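The cases_p computation described on this slide can be sketched in Pandas as follows. This is a minimal illustration, not the authors' code: the toy values and the helper name cases_share are assumptions, and only the column names date, state, cases and cases_p come from the slides.

```python
import pandas as pd

# Toy stand-in for the NYT state-level data (values are made up).
nyt = pd.DataFrame({
    "date": ["2020-07-24", "2020-07-25", "2020-07-24", "2020-07-25"],
    "state": ["A", "A", "B", "B"],
    "cases": [90, 100, 250, 300],
})


def cases_share(df):
    """Return one row per state with its share of the national total."""
    # Cases are cumulative, so the maximum per state is the latest total.
    latest = df.groupby("state", as_index=False)["cases"].max()
    # Percentage of all cases that fall in each state.
    latest["cases_p"] = 100 * latest["cases"] / latest["cases"].sum()
    # States with the most cases come first.
    return latest.sort_values("cases", ascending=False).reset_index(drop=True)


summary = cases_share(nyt)
print(summary)
```

Taking the maximum works here only because the cases column is cumulative; with daily counts a sum would be needed instead.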

Cumulative Number of Cases for each State
● This slide shows the cumulative number of COVID-19 cases in different states
● The case curves have a different shape in each state
● This makes it harder to predict the number of cases

Dataset - Mobility Data [MD]
● Google Mobility Data [3] contains nine columns
○ date - Date of the record
○ state - State of the record
○ county - County of the record
○ retail_and_recreation, grocery_and_pharmacy, parks, transit_stations, workplaces, residential - Difference in time spent at each category of place compared to baseline days

Problem Formulation
● The objective of our model is to forecast the number of COVID-19 cases that will be identified, given the forecasted mobility information.
This is the problem formulation: it defines the inputs and the output of the model. The x values are the input features, which include the mobility features, and y is the target variable, the number of cases. The main aim is to use past data to forecast the number of cases one day ahead.
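One possible way to write the one-day-ahead forecast is sketched below. It is an assumption for illustration and not necessarily the notation the authors used: the function f, the time index t and the lag k are introduced here.

```latex
\hat{y}_{t+1} = f\left(y_1, \dots, y_t,\; \mathbf{x}_{t+1-k}\right)
```

Here y_t would be the cumulative number of cases on day t and x_t the vector of mobility features on day t. With k = 1, the forecast for day t+1 uses the mobility observed on day t.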

Modeling Approach
[Figure: pipeline flowchart. NYTD + MD → Preprocessing (fill null values) → Dataset → Modeling (curve fitting, regression, Prophet) → Results and Evaluations (cross validation, ablation study) → Conclusions (synopsis of findings). Visualizations (using Matplotlib) accompany each stage: choropleth map, time series plots, correlation heatmaps, tables, time series plots, bar charts.]
In this approach, visualizations are produced at different stages of the process. First, we took the New York Times dataset and the Google mobility dataset and combined them into one dataset, then filled the null values with zeros. In the exploratory analysis, we used a choropleth map, time series plots, and the correlation and autocorrelation of the attributes. In the modeling step, we applied a regression model, curve fitting and the Prophet model. We then used cross-validation and an ablation study to determine which features are important for predicting COVID-19 cases, and presented the results as time series plots, bar charts and tables. Finally, we conclude with a summary of the findings.

Data loading and Preprocessing
● Both the NYT data [NYTD] and the mobility data [MD] are available as CSV
● CSVs are loaded using Pandas
● Dataset integration → Algo 1
● Preprocessing
○ Rows not found in the mobility dataset are filled with zeros (i.e. we assume baseline mobility)
○ The missing values fall at the start of the time period, so they have minimal impact on the analysis
● Dataset size: 4539
● Period: 02-15-2020 → 07-25-2020
Algo 1. Algorithm used in data integration
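The integration and zero-filling steps listed on this slide might look roughly like the Pandas sketch below. It is not a reproduction of Algo 1: the toy frames, the helper name integrate and the assumption that the mobility data is already aggregated to one row per date and state are all illustrative.

```python
import pandas as pd

# Toy stand-ins for the two CSVs; real data would be loaded with pd.read_csv.
nytd = pd.DataFrame({
    "date": pd.to_datetime(["2020-02-15", "2020-02-16", "2020-02-17"]),
    "state": ["Virginia"] * 3,
    "cases": [0, 1, 3],
})
md = pd.DataFrame({
    "date": pd.to_datetime(["2020-02-16", "2020-02-17"]),
    "state": ["Virginia"] * 2,
    "parks": [5.0, -12.0],
    "workplaces": [-1.0, -20.0],
})


def integrate(cases, mobility):
    """Join case counts with mobility and treat missing mobility as baseline."""
    # A left join keeps every case record even when no mobility row exists.
    merged = cases.merge(mobility, on=["date", "state"], how="left")
    mobility_cols = [c for c in mobility.columns if c not in ("date", "state")]
    # Missing mobility is filled with 0, i.e. no change from the baseline.
    merged[mobility_cols] = merged[mobility_cols].fillna(0)
    return merged


dataset = integrate(nytd, md)
print(dataset)
```

Filling with zero is a modeling choice: it asserts baseline behaviour on days with no mobility record, which is plausible for the early dates before mobility started to change.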

Attribute Correlation
● These figures show the correlations between the attributes. The first figure covers all US states.
● The second figure shows the correlations for California.
● The third figure shows the correlations for New York.
● Some states show higher correlations than others
● Parks are more strongly correlated with the number of cases than the other place categories
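A correlation matrix of the kind shown in these heatmaps can be computed with Pandas as in the sketch below. The numbers are made up purely to illustrate the call, and Pearson correlation (the DataFrame.corr default) is an assumption, since the slides do not say which coefficient was used.

```python
import pandas as pd

# Made-up values for illustration only; they are not results from the study.
df = pd.DataFrame({
    "cases": [10, 20, 30, 40, 50],
    "parks": [1.0, 2.0, 3.0, 4.0, 5.0],
    "workplaces": [-5.0, -10.0, -15.0, -20.0, -25.0],
    "residential": [2.0, 1.0, 4.0, 3.0, 5.0],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()

# Correlation of each mobility feature with the case count, strongest first.
with_cases = corr["cases"].drop("cases").sort_values(ascending=False)
print(with_cases)
```

Running the same computation on the rows of a single state gives the per-state view described for California and New York.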

Autocorrelation
● Correlation of the series with itself, lagged by x days
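The definition above can be made concrete with a small pure-Python sketch. It uses the common sample autocorrelation estimator (lagged covariance divided by the overall variance); whether the authors used this estimator or a library routine is not stated, so treat the function below as an illustration.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a sequence with itself shifted by `lag` steps."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    # Total variation of the series around its mean.
    variance = sum((v - mean) ** 2 for v in series)
    if variance == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    # Co-variation between each value and the value `lag` steps later.
    covariance = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


trend = [1, 2, 3, 4, 5]
print(autocorrelation(trend, 1))
```

Values close to 1 at small lags indicate that yesterday's value carries a lot of information about today's, which is typical for cumulative case counts.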

Modeling - Prophet [4]
● Additive regression model [5]
● Implementation and training are relatively easy
● Ability to add a weekly seasonal component - on weekdays people go to the office, whereas on weekends they might stay inside
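A Prophet model with a weekly seasonal component and mobility features as extra regressors could be set up roughly as below. This is a sketch under assumptions, not the authors' configuration: the helper names to_prophet_frame and fit_prophet, the toy values and the choice to disable yearly and daily seasonality are illustrative. The Prophet import sits inside the function so the data preparation can run without the library installed.

```python
import pandas as pd


def to_prophet_frame(df, target, regressors):
    """Rename columns to the ds/y layout Prophet expects, keeping regressors."""
    out = df[["date", target] + list(regressors)].rename(
        columns={"date": "ds", target: "y"}
    )
    out["ds"] = pd.to_datetime(out["ds"])
    return out.sort_values("ds").reset_index(drop=True)


def fit_prophet(train, regressors):
    """Fit Prophet with weekly seasonality and one extra regressor per feature."""
    from prophet import Prophet  # requires the `prophet` package

    model = Prophet(
        weekly_seasonality=True,
        yearly_seasonality=False,
        daily_seasonality=False,
    )
    for name in regressors:
        model.add_regressor(name)
    model.fit(train)
    return model


# Made-up history for one state, for illustration only.
history = pd.DataFrame({
    "date": ["2020-03-03", "2020-03-01", "2020-03-02"],
    "cases": [30, 10, 20],
    "parks": [-12.0, 0.0, -5.0],
})
train = to_prophet_frame(history, "cases", ["parks"])
print(train)

# With a realistic amount of history, fitting and forecasting would follow:
# model = fit_prophet(train, ["parks"])
# forecast = model.predict(next_day)  # next_day needs 'ds' and 'parks' columns
```

Because the regressors must be supplied for the dates being predicted, the forecast depends on having mobility values for those dates, either observed with a lag or forecasted separately.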

Parameters
● We created multiple models using different mobility features and evaluated their predictions for the next day.
● We set k=1 so that our features are lagged by one day.
● We set Prophet to learn weekly trends.
● Cross-validation parameters:
○ initial = 60 days, period = 1 day, horizon = 7 days
[Figure: cross-validation scheme over each state's dataset, showing successive splits with a 60-day training window followed by a 7-day test window]
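The lag and the cross-validation windows described above can be illustrated with the standard library alone. The helper names lag_feature and rolling_cutoffs are hypothetical, and the cutoff placement is a simplified forward-rolling scheme: Prophet's own cross_validation function in prophet.diagnostics takes the same initial, period and horizon settings but may place cutoffs slightly differently.

```python
from datetime import date, timedelta


def lag_feature(values, k=1):
    """Shift a feature by k days so that day t sees the value from day t - k."""
    values = list(values)
    if not 0 <= k <= len(values):
        raise ValueError("k must be between 0 and len(values)")
    # The first k days have no earlier observation, so they are left empty.
    return [None] * k + values[: len(values) - k]


def rolling_cutoffs(start, end, initial_days=60, period_days=1, horizon_days=7):
    """List cutoff dates: train up to each cutoff, test on the next horizon."""
    cutoffs = []
    cutoff = start + timedelta(days=initial_days)
    # Keep a cutoff only if a full test horizon still fits before `end`.
    while cutoff + timedelta(days=horizon_days) <= end:
        cutoffs.append(cutoff)
        cutoff += timedelta(days=period_days)
    return cutoffs


lagged = lag_feature([1, 2, 3], k=1)
cutoffs = rolling_cutoffs(date(2020, 2, 15), date(2020, 7, 25))
print(lagged, len(cutoffs), cutoffs[0], cutoffs[-1])
```

With a period of one day, a new evaluation window starts every day after the first 60 days, so each state contributes many overlapping 7-day test windows.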

Error Measures
● Mean Absolute Percentage Error
● Mean Absolute Error
● Root Mean Squared Error
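The three error measures have standard definitions, sketched here in plain Python. The function names and the sample numbers are illustrative and not taken from the study.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: average size of the errors, in cases."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: like MAE but penalizes large errors more."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent; actual values must be non-zero."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100 * sum(ratios) / len(actual)


# Made-up values for illustration only.
actual = [100, 200, 400]
predicted = [110, 190, 400]
print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

MAE and RMSE are expressed in numbers of cases, while MAPE is relative to the actual value, which is what makes it comparable across states of very different size.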

Results
Visualizations help us determine whether the models are working properly.

Evaluation and Discussion
State         RMSE     MAE   MAPE (%)
California   10187    6411       3.06
Florida      16104   10165       5.99
New Jersey    1487    1258       0.81
New York      3322    2358       0.67
Table: Performance of the models on the states with the most cases.
● Different states have different external factors, which makes forecasting harder
● RMSE and MAE are not suitable for comparing states, because they scale with each state's case counts
● MAPE is more suitable, since it takes the total number of cases into account

Ablation Study
● We studied the performance of the models by removing each of the mobility features in turn
ID   Features Used (POI)
M0   transit_stations, parks, retail_and_recreation, grocery_and_pharmacy, residential
M1   - transit_stations
M2   - parks
M3   - retail_and_recreation
M4   - grocery_and_pharmacy
M5   - residential
M6   - workplace
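The leave-one-feature-out design in the table can be generated programmatically, as in the sketch below. It assumes that the baseline M0 uses all six mobility columns, including workplaces, because M6 removes that feature; the slide's M0 row lists only five, so this is an interpretation rather than a confirmed detail. The helper name ablation_sets is hypothetical.

```python
# Assumed full feature set: the six place categories from the mobility data.
FEATURES = [
    "transit_stations",
    "parks",
    "retail_and_recreation",
    "grocery_and_pharmacy",
    "residential",
    "workplaces",
]


def ablation_sets(features):
    """Map model IDs to feature lists: M0 uses all, Mi drops the i-th feature."""
    sets = {"M0": list(features)}
    for i, removed in enumerate(features, start=1):
        sets[f"M{i}"] = [f for f in features if f != removed]
    return sets


models = ablation_sets(FEATURES)
for model_id, used in models.items():
    print(model_id, used)
```

Training one model per entry and comparing its error with that of M0 shows how much each place category contributes: the larger the increase in error when a feature is removed, the more useful that feature is.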


Conclusion
● Mobility vs spread of COVID-19
● Varies from state to state
● How did mobility affect the spread of COVID-19?
○ Mobility serves as a proxy for how lockdowns/stay-at-home affect the spread of the virus
■ Shows how mobility changed from usual times to the COVID-19 period
■ Reflects the real state of lockdown
○ Mobility is useful in predicting the number of cases
○ Mobility at different POIs has a different effect on predicting the number of cases
● How does the effect of mobility on COVID-19 spread change from state to state?
○ We studied the effects of lockdown in the four states with the highest number of COVID-19 cases
○ Different states showed a different dependence on mobility
○ This can likely be explained with local knowledge of each state [requires further investigation]
○ More granular modeling (state level) could give more insights/knowledge

Conclusion
● Challenges
○ Modeling itself was tricky because internal and external factors keep changing how COVID-19 spreads
○ Performing analysis on an ongoing pandemic is also challenging
■ Less data to train models
● Visualization
○ Helps identify which features are best
○ Helps validate models
○ Presentation matters
● Future Work
○ Add recent data to train and evaluate the models
○ Try selecting the best features for each state
● Repository
○ https://github.com/ysenarath/covid-19-mobility-analysis

References
[1] UNCOVER COVID-19 Challenge. https://kaggle.com/roche-data-science-coalition/uncover. Accessed 4 Dec. 2020.
[2] Ong, Jason. Jasonong/List-of-US-States. 2012. 2020. GitHub, https://github.com/jasonong/List-of-US-States.
[3] “COVID-19 Community Mobility Report.” COVID-19 Community Mobility Report, https://www.google.com/covid19/mobility?hl=en. Accessed 4 Dec. 2020.
[4] Taylor, S. J., & Letham, B. (2018). Forecasting at scale. The American Statistician, 72(1), 37-45.
[5] “Additive Model.” Wikipedia, 24 Jan. 2020. Wikipedia, https://en.wikipedia.org/w/index.php?title=Additive_model&oldid=937324426.

Group 3
Srinivasa Chaitanya Sai Mupparisetty
Sai Kumar Mukka
Yasas Wijesuriya
