Uncover COVID-19
Forecasting COVID-19 by States with Mobility Data
Group 3
Srinivasa Chaitanya Sai
Sai Kumar Mukka
Yasas Wijesuriya
Project AIT-664
Dr. Hemant Purohit
Outline
[1] Introduction
[2] Data Acquisition and Preparation
[3] Approach and Models
[4] Results and Discussion
[5] Conclusion
Introduction
● COVID-19 is an ongoing pandemic.
● State-level analysis of the spread of COVID-19 in the USA, integrated with human mobility data.
● Relationship with human mobility data, where mobility reflects differences in behaviour.
[Figure: four-level pyramid. 1: Knowledge (inference, understanding, actionable); 2: Knowledge (contextual, learning); 3: Information (useful, organized, structured); 4: Data (signals, know-nothing)]
Challenges
● We focused on one specific aspect of the initial requirements, because integrating multiple datasets at different levels of granularity is complex
○ E.g. patient level vs state level
○ We focus only on state-level data
● Modeling itself was tricky because internal and external factors keep changing the course of COVID-19
○ We used Prophet (by Facebook), which removes much of the manual fine-tuning the model would otherwise need.
○ Prophet is a procedure for forecasting time series data based on an additive model in which non-linear trends are fit together with weekly effects.
Research Questions
1. How did lockdown* affect the spread of COVID-19?
Mobility is used as a proxy to see how lockdowns/stay-at-home behaviour affect the spread of the virus
2. How does the effect of mobility on COVID-19 spread change from state to state?
Study the effects of lockdown in the four states with the most COVID-19 cases
*Note that we are interested only in the actual state of lockdown/stay-at-home behaviour, not in the effect of state-enforced lockdown orders. To avoid confusion over the meaning of "lockdown", we refer to this simply as mobility from here on.
Data and Information Acquisition
● Main Dataset - UNCOVER COVID-19 Challenge [1]
➔ A collection of over 200 publicly available datasets selected from sources such as the World Health Organisation, the New York Times, Johns Hopkins, the World Bank, Google Mobility Data and many more.
➔ It covers a variety of statistics: local and global infection rates, social distancing rules and regulations, and geospatial data on the movement of people.
● US States to Code Mapping [2]
○ State → Abbreviation
○ E.g. Virginia → VA
Dataset - New York Times [NYTD]
● The New York Times dataset contains five columns
○ date - Date of the record
○ state - State of the USA that has the cases
○ fips - Federal Information Processing Standard code of the state (numeric)
○ cases - (cumulative) total number of cases up to that date
○ deaths - (cumulative) total number of deaths up to that date
● We are interested in the columns with bold text above
Total Number of COVID-19 Cases
[Figure: Choropleth map of the percentage of COVID-19 cases in each state up to 07-25-2020]
● Selected the maximum (cumulative) case count for each state and created a new column, cases_p, that contains each state's percentage of the total cases in the USA.
● Sorted the states by number of cases in descending order, so the states with the most cases come first.
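The two steps above can be sketched in pandas as follows. This is a minimal illustration on made-up numbers, not the project's actual code; only the column names date, state, cases and cases_p come from the slides.

```python
import pandas as pd

# Toy stand-in for the NYT state-level data (cumulative counts); values are made up.
nyt = pd.DataFrame({
    "date": ["2020-07-24", "2020-07-25", "2020-07-24", "2020-07-25"],
    "state": ["Alaska", "Alaska", "Virginia", "Virginia"],
    "cases": [10, 20, 70, 80],
})

# Counts are cumulative, so the maximum per state is that state's latest total.
latest = nyt.groupby("state", as_index=False)["cases"].max()

# Each state's share of the national total, as a percentage.
latest["cases_p"] = 100 * latest["cases"] / latest["cases"].sum()

# States with the most cases first.
latest = latest.sort_values("cases", ascending=False).reset_index(drop=True)
print(latest)
```

Taking the per-state maximum relies on the counts being cumulative; with daily counts a sum would be needed instead.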
[Figure: Number of cases in the states with the most cases]
Cumulative Number of Cases for each State
● This slide shows the cumulative number of COVID-19 cases in different states.
● The case curves have a different shape in each state.
● This makes the number of cases harder to predict.
Dataset - Mobility Data [MD]
● Google Mobility Data [3] contains nine columns
○ date - Date of the record
○ state - State of the record
○ county - County of the record
○ retail_and_recreation, grocery_and_pharmacy, parks, transit_stations, workplaces, residential - Difference in time spent in each category of place compared to baseline days
Problem Formulation
● The objective of our model is to forecast the number of COVID-19 cases that will be identified, given the forecasted mobility information.
This is the problem formulation. It describes the inputs and output of the model: the x values are the features, which include the mobility features, and y is the target variable, the number of cases. The main aim is to use past data to forecast the number of cases one day ahead.
Modeling
Approach
[Figure: pipeline flowchart. NYTD + MD → Preprocessing (fill null values) → Dataset → Modeling (curve fitting, regression, Prophet) → Results and Evaluations (cross validation, ablation study) → Conclusions (synopsis of findings). Visualizations (using Matplotlib) accompany the stages: choropleth map, time series plots, correlation heatmaps, tables, bar charts.]
Visualizations were produced at different stages of the process. First, we took the New York Times dataset and the Google mobility dataset and combined them into one dataset, then filled the null values with zeros. In the exploratory stage, we used a choropleth map, time series plots, and the correlation and autocorrelation of the attributes. In the modeling step, we tried a regression model, curve fitting, and the Prophet model. We then used cross validation and an ablation study to find which features are important for predicting COVID-19 cases, and presented the results as time series plots, bar charts, and tables. Finally, we conclude with a summary of findings.
Data loading and Preprocessing
● Both the NYT data [NYTD] and the mobility data [MD] are available as CSV
● CSVs are loaded using Pandas
● Dataset integration → see Algo 1
● Preprocessing
○ Rows not found in the mobility dataset are filled with zeros (i.e. they are assumed to have the baseline mobility)
○ The missing values occur at the start of the time period, so they have minimal impact on the analysis
● Dataset size: 4539
● Period: 02-15-2020 → 07-25-2020
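The integration and zero-filling described above might look like the sketch below. It is an assumption about how the two tables were combined (a left join on date and state), shown on tiny made-up frames; the original algorithm figure is not reproduced in this text, and the mobility data is assumed to be already reduced to one row per state and day.

```python
import pandas as pd

# Toy stand-ins for the two CSVs; in practice they would be loaded with pd.read_csv.
nytd = pd.DataFrame({
    "date": pd.to_datetime(["2020-02-15", "2020-02-16", "2020-02-17"]),
    "state": ["Virginia"] * 3,
    "cases": [0, 1, 3],
})
md = pd.DataFrame({
    "date": pd.to_datetime(["2020-02-16", "2020-02-17"]),
    "state": ["Virginia"] * 2,
    "parks": [5.0, -2.0],
    "workplaces": [-1.0, -4.0],
})

# A left join keeps every case record, even when no mobility row exists for that day.
merged = nytd.merge(md, on=["date", "state"], how="left")

# Missing mobility is treated as 0, i.e. no change from the baseline.
mobility_cols = ["parks", "workplaces"]
merged[mobility_cols] = merged[mobility_cols].fillna(0)
print(merged)
```

Filling with zero is only reasonable because the mobility columns are differences from a baseline, so zero means "same as baseline" rather than "no activity".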
Algo 1. Algorithm used in data integration
Attribute Correlation
● These figures show the correlations between the attributes. The first figure shows the correlations across all US states.
● The second figure shows the correlations between the attributes in California.
● The third figure shows the correlations between the attributes in New York.
● Some states show higher correlations than others.
● Parks are more strongly correlated with the number of cases than the other place categories.
Autocorrelation
● Correlation of the series with itself, lagged by x days
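As a small illustration of this definition, the function below computes the Pearson correlation between a series and a copy of itself shifted by a given number of days. It is a plain-Python sketch; the function name and the toy series are made up for the example.

```python
def autocorrelation(series, lag):
    """Pearson correlation between series[:-lag] and series[lag:]."""
    if lag <= 0 or lag >= len(series) - 1:
        raise ValueError("lag must be between 1 and len(series) - 2")
    a, b = series[:-lag], series[lag:]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    # Note: a constant series has zero variance and would divide by zero here.
    return cov / (var_a * var_b) ** 0.5

steady_growth = [1, 2, 3, 4, 5, 6]
alternating = [1, -1, 1, -1, 1, -1]
print(autocorrelation(steady_growth, 1), autocorrelation(alternating, 1))
```

A steadily growing series is strongly correlated with its own past, which is typical of cumulative case counts, while a series that flips sign every day is negatively correlated at lag 1.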
Modeling - Prophet [4]
● Additive regression model [5]
● Implementation and training are relatively easy
● Ability to add a weekly seasonal component, e.g. people go to the office on weekdays but may stay inside at the weekend
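For reference, the additive decomposition described in [4] can be written as below, where g(t) is the trend, s(t) the seasonal component (weekly in this project), h(t) the holiday effects and ε_t the error term. The second line is an assumption about how the mobility features would enter such a model, namely as extra linear regressors x_j with coefficients β_j; those symbols are introduced here for illustration and do not appear on the slides.

```latex
% Decomposition from Taylor & Letham [4]
y(t) = g(t) + s(t) + h(t) + \varepsilon_t

% Assumed form with mobility features x_j as extra regressors (no holiday term)
y(t) = g(t) + s(t) + \sum_{j} \beta_j \, x_j(t) + \varepsilon_t
```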
Modeling
Parameters
● We created multiple models using different mobility features and evaluated their next-day predictions.
● Set k=1 so that our features are lagged by one day.
● Set Prophet to learn weekly trends.
● Cross-validation parameters:
○ initial = 60 days, period = 1 day, horizon = 7 days
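A hedged sketch of how these settings could map onto Prophet's Python API is shown below. The helper names add_lagged_regressors and fit_and_cross_validate are hypothetical, the choice to switch off yearly and daily seasonality is an assumption, and the data is assumed to have one row per consecutive day. The package was published as fbprophet in 2020 and is now called prophet.

```python
import pandas as pd

def add_lagged_regressors(df, columns, k=1):
    """Return a copy of df where each column in `columns` is shifted down by k rows,
    so the row for day t carries the mobility observed on day t - k.
    Assumes one row per consecutive day, sorted by date."""
    out = df.copy()
    out[columns] = out[columns].shift(k)
    # The first k rows have no earlier observation, so they are dropped.
    return out.dropna(subset=columns).reset_index(drop=True)

def fit_and_cross_validate(df, regressors):
    """df needs Prophet's column names: 'ds' (date), 'y' (cases), plus the regressors."""
    # Imported here so the helper above can be used without Prophet installed.
    from prophet import Prophet
    from prophet.diagnostics import cross_validation, performance_metrics

    model = Prophet(weekly_seasonality=True, yearly_seasonality=False,
                    daily_seasonality=False)
    for name in regressors:
        model.add_regressor(name)
    model.fit(df)
    cv = cross_validation(model, initial="60 days", period="1 days", horizon="7 days")
    return performance_metrics(cv)

# Tiny example of the lagging step only (values are made up).
toy = pd.DataFrame({
    "ds": pd.date_range("2020-02-15", periods=4, freq="D"),
    "y": [1, 2, 3, 4],
    "parks": [10, 20, 30, 40],
})
lagged = add_lagged_regressors(toy, ["parks"], k=1)
print(lagged)
```

Lagging the regressors by one day means a forecast for day t only uses mobility that was already observed on day t - 1.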
[Figure: cross-validation windows over each state's dataset: a 60-day training window followed by a 7-day test window, repeated along the series]
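The windowing can be illustrated with a few lines of plain Python. This sketch uses an expanding training window that starts with the first 60 days and moves the cutoff forward one day at a time; the function name is made up, and Prophet's own cross-validation places its cutoffs by counting back from the end of the series, so its exact dates may differ.

```python
from datetime import date, timedelta

def rolling_windows(start, end, initial=60, period=1, horizon=7):
    """Yield (cutoff, test_end) pairs. Training uses every day from `start`
    through `cutoff`; testing uses the `horizon` days after the cutoff."""
    cutoff = start + timedelta(days=initial - 1)
    while cutoff + timedelta(days=horizon) <= end:
        yield cutoff, cutoff + timedelta(days=horizon)
        cutoff += timedelta(days=period)

# Study period taken from the slides: 02-15-2020 to 07-25-2020.
windows = list(rolling_windows(date(2020, 2, 15), date(2020, 7, 25)))
print(len(windows), windows[0], windows[-1])
```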
Results
Error Measures
● Root Mean Squared Error (RMSE)
● Mean Absolute Error (MAE)
● Mean Absolute Percentage Error (MAPE)
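The three error measures can be written out in plain Python as below. The function names and sample numbers are illustrative only.

```python
import math

def mae(actual, predicted):
    """Mean absolute error, in the same units as the data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; penalises large misses more than MAE does."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent; undefined when an actual value is 0."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

actual = [100, 200]
predicted = [110, 180]
print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

MAE and RMSE grow with the size of the counts, whereas MAPE is relative: multiplying both lists by ten leaves MAPE unchanged.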
Visualizations help us determine whether the models are working properly.
Evaluation and Discussion
Performance of the models on the states with the most cases:

State        RMSE    MAE     MAPE (%)
California   10187   6411    3.06
Florida      16104   10165   5.99
New Jersey   1487    1258    0.81
New York     3322    2358    0.67

● Different states have different external factors, which makes forecasting harder
● RMSE and MAE are not suitable for comparing different states
● MAPE is suitable, since it takes the total number of cases into account
Ablation Study
● We studied how model performance changes when each of the mobility features is removed in turn

ID   Features used (POI)
M0   transit_stations, parks, retail_and_recreation, grocery_and_pharmacy, residential
M1   M0 without transit_stations
M2   M0 without parks
M3   M0 without retail_and_recreation
M4   M0 without grocery_and_pharmacy
M5   M0 without residential
M6   M0 without workplace
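The leave-one-out feature sets in the table can be generated programmatically, as in this sketch. It assumes that the full model M0 also includes workplaces (the M0 row does not list it, but M6 removes it) and uses the column name workplaces from the mobility dataset; the function name is hypothetical.

```python
# Assumption: the full feature set contains all six mobility categories.
FEATURES = ["transit_stations", "parks", "retail_and_recreation",
            "grocery_and_pharmacy", "residential", "workplaces"]

def ablation_sets(features):
    """Map model IDs to feature lists: M0 uses everything, Mi drops the i-th feature."""
    sets = {"M0": list(features)}
    for i, dropped in enumerate(features, start=1):
        sets[f"M{i}"] = [f for f in features if f != dropped]
    return sets

models = ablation_sets(FEATURES)
for model_id, used in models.items():
    print(model_id, used)
```

Comparing each Mi against M0 shows how much the error changes when that one feature is missing, which is the basis for judging feature importance.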
Conclusion
● Mobility vs spread of COVID-19
● Varies from state to state
● How did mobility affect the spread of COVID-19?
○ Mobility serves as a proxy for how lockdowns/stay-at-home behaviour affect the spread of the virus
■ Shows how mobility changed from normal times to the COVID-19 period
■ Provides the real state of lockdown
○ Mobility is useful in predicting the number of cases
○ Mobility at different POIs has a different effect on predicting the number of cases
● How does the effect of mobility on COVID-19 spread change from state to state?
○ Studied the effects of lockdown in the four states with the most COVID-19 cases
○ The relationship between mobility and cases differed between states
○ Can be reasoned with local knowledge of each state [requires further investigation]
○ Granular level modeling (state level) could give more insights/knowledge
Conclusion
● Challenges
○ Modeling itself was tricky because internal and external factors keep changing the course of COVID-19
○ Performing analysis on an ongoing pandemic is also challenging
■ Less data to train models
● Visualization
○ Helps identify which features are best
○ Helps validate models
○ Presentation matters
● Future Work
○ Add recent data to train and evaluate the models
○ Try selecting best features for each state
● Repository
○ https://github.com/ysenarath/covid-19-mobility-analysis
References
[1] UNCOVER COVID-19 Challenge. https://kaggle.com/roche-data-science-coalition/uncover. Accessed 4 Dec. 2020.
[2] Ong, Jason. Jasonong/List-of-US-States. 2012. 2020. GitHub, https://github.com/jasonong/List-of-US-States.
[3] “COVID-19 Community Mobility Report.” https://www.google.com/covid19/mobility?hl=en. Accessed 4 Dec. 2020.
[4] Taylor, S. J., & Letham, B. (2018). Forecasting at scale. The American Statistician, 72(1), 37-45.
[5] “Additive Model.” Wikipedia, 24 Jan. 2020, https://en.wikipedia.org/w/index.php?title=Additive_model&oldid=937324426.
Group 3
Srinivasa Chaitanya Sai Mupparisetty
Sai Kumar Mukka
Yasas Wijesuriya

More Related Content

Similar to Forecasting covid 19 by states with mobility data

Google trends correlate
Google trends   correlateGoogle trends   correlate
Google trends correlateBitsytask
 
PREDICTION and RATE analysis: Health Insurance
PREDICTION and RATE analysis: Health Insurance PREDICTION and RATE analysis: Health Insurance
PREDICTION and RATE analysis: Health Insurance Sunitha Flowerhill
 
Covid-19 Data Analysis and Visualization
Covid-19 Data Analysis and VisualizationCovid-19 Data Analysis and Visualization
Covid-19 Data Analysis and VisualizationIRJET Journal
 
Presentation1.pptx
Presentation1.pptxPresentation1.pptx
Presentation1.pptxVishalLabde
 
Hyatt Hotel Group Project
Hyatt Hotel Group ProjectHyatt Hotel Group Project
Hyatt Hotel Group ProjectErik Bebernes
 
PREDICTIVE MODELLING OF CRIME DATASET USING DATA MINING
PREDICTIVE MODELLING OF CRIME DATASET USING DATA MININGPREDICTIVE MODELLING OF CRIME DATASET USING DATA MINING
PREDICTIVE MODELLING OF CRIME DATASET USING DATA MININGIJDKP
 
PREDICTIVE MODELLING OF CRIME DATASET USING DATA MINING
PREDICTIVE MODELLING OF CRIME DATASET USING DATA MININGPREDICTIVE MODELLING OF CRIME DATASET USING DATA MINING
PREDICTIVE MODELLING OF CRIME DATASET USING DATA MININGIJDKP
 
Predictive Modeling for Topographical Analysis of Crime Rate
Predictive Modeling for Topographical Analysis of Crime RatePredictive Modeling for Topographical Analysis of Crime Rate
Predictive Modeling for Topographical Analysis of Crime RateIRJET Journal
 
Informs2020 using machine learning to identify the factors of people's mobi...
Informs2020   using machine learning to identify the factors of people's mobi...Informs2020   using machine learning to identify the factors of people's mobi...
Informs2020 using machine learning to identify the factors of people's mobi...Alex Gilgur
 
COMPARISON OF BANKRUPTCY PREDICTION MODELS WITH PUBLIC RECORDS AND FIRMOGRAPHICS
COMPARISON OF BANKRUPTCY PREDICTION MODELS WITH PUBLIC RECORDS AND FIRMOGRAPHICSCOMPARISON OF BANKRUPTCY PREDICTION MODELS WITH PUBLIC RECORDS AND FIRMOGRAPHICS
COMPARISON OF BANKRUPTCY PREDICTION MODELS WITH PUBLIC RECORDS AND FIRMOGRAPHICScscpconf
 
Mechanisms for Integrating Real Data into Search Game Simulations: An Applica...
Mechanisms for Integrating Real Data into Search Game Simulations: An Applica...Mechanisms for Integrating Real Data into Search Game Simulations: An Applica...
Mechanisms for Integrating Real Data into Search Game Simulations: An Applica...Martin Chapman
 
ANALYZING THE EFFECTS OF DIFFERENT POLICIES AND STRICTNESS LEVELS ON MONTHLY ...
ANALYZING THE EFFECTS OF DIFFERENT POLICIES AND STRICTNESS LEVELS ON MONTHLY ...ANALYZING THE EFFECTS OF DIFFERENT POLICIES AND STRICTNESS LEVELS ON MONTHLY ...
ANALYZING THE EFFECTS OF DIFFERENT POLICIES AND STRICTNESS LEVELS ON MONTHLY ...IJDKP
 
Database and Analytics Programming - Project report
Database and Analytics Programming - Project reportDatabase and Analytics Programming - Project report
Database and Analytics Programming - Project reportsarthakkhare3
 
VPUE Presentation - P3 Stakeholder Theory
VPUE Presentation - P3 Stakeholder TheoryVPUE Presentation - P3 Stakeholder Theory
VPUE Presentation - P3 Stakeholder TheoryJohn Lundquist
 
data science pptx
data science pptxdata science pptx
data science pptxHome
 
2b A-Using Big Data for the Sustainable Development Goals 10222015.pdf
2b A-Using Big Data for the Sustainable Development Goals 10222015.pdf2b A-Using Big Data for the Sustainable Development Goals 10222015.pdf
2b A-Using Big Data for the Sustainable Development Goals 10222015.pdfMuhammadZafarHasan
 
Us census bureau
Us census bureauUs census bureau
Us census bureauaddiskeven
 
THE REACTION DATA ANALYSIS OFCOVID-19 VACCINATIONS
THE REACTION DATA ANALYSIS OFCOVID-19 VACCINATIONSTHE REACTION DATA ANALYSIS OFCOVID-19 VACCINATIONS
THE REACTION DATA ANALYSIS OFCOVID-19 VACCINATIONSManishReddy706923
 

Similar to Forecasting covid 19 by states with mobility data (20)

Google trends correlate
Google trends   correlateGoogle trends   correlate
Google trends correlate
 
PREDICTION and RATE analysis: Health Insurance
PREDICTION and RATE analysis: Health Insurance PREDICTION and RATE analysis: Health Insurance
PREDICTION and RATE analysis: Health Insurance
 
Covid-19 Data Analysis and Visualization
Covid-19 Data Analysis and VisualizationCovid-19 Data Analysis and Visualization
Covid-19 Data Analysis and Visualization
 
Presentation1.pptx
Presentation1.pptxPresentation1.pptx
Presentation1.pptx
 
Hyatt Hotel Group Project
Hyatt Hotel Group ProjectHyatt Hotel Group Project
Hyatt Hotel Group Project
 
Practical Machine Learning at Work
Practical Machine Learning at WorkPractical Machine Learning at Work
Practical Machine Learning at Work
 
Working from home after COVID-19 - Alexandre Judes
Working from home after COVID-19 - Alexandre JudesWorking from home after COVID-19 - Alexandre Judes
Working from home after COVID-19 - Alexandre Judes
 
PREDICTIVE MODELLING OF CRIME DATASET USING DATA MINING
PREDICTIVE MODELLING OF CRIME DATASET USING DATA MININGPREDICTIVE MODELLING OF CRIME DATASET USING DATA MINING
PREDICTIVE MODELLING OF CRIME DATASET USING DATA MINING
 
PREDICTIVE MODELLING OF CRIME DATASET USING DATA MINING
PREDICTIVE MODELLING OF CRIME DATASET USING DATA MININGPREDICTIVE MODELLING OF CRIME DATASET USING DATA MINING
PREDICTIVE MODELLING OF CRIME DATASET USING DATA MINING
 
Predictive Modeling for Topographical Analysis of Crime Rate
Predictive Modeling for Topographical Analysis of Crime RatePredictive Modeling for Topographical Analysis of Crime Rate
Predictive Modeling for Topographical Analysis of Crime Rate
 
Informs2020 using machine learning to identify the factors of people's mobi...
Informs2020   using machine learning to identify the factors of people's mobi...Informs2020   using machine learning to identify the factors of people's mobi...
Informs2020 using machine learning to identify the factors of people's mobi...
 
COMPARISON OF BANKRUPTCY PREDICTION MODELS WITH PUBLIC RECORDS AND FIRMOGRAPHICS
COMPARISON OF BANKRUPTCY PREDICTION MODELS WITH PUBLIC RECORDS AND FIRMOGRAPHICSCOMPARISON OF BANKRUPTCY PREDICTION MODELS WITH PUBLIC RECORDS AND FIRMOGRAPHICS
COMPARISON OF BANKRUPTCY PREDICTION MODELS WITH PUBLIC RECORDS AND FIRMOGRAPHICS
 
Mechanisms for Integrating Real Data into Search Game Simulations: An Applica...
Mechanisms for Integrating Real Data into Search Game Simulations: An Applica...Mechanisms for Integrating Real Data into Search Game Simulations: An Applica...
Mechanisms for Integrating Real Data into Search Game Simulations: An Applica...
 
ANALYZING THE EFFECTS OF DIFFERENT POLICIES AND STRICTNESS LEVELS ON MONTHLY ...
ANALYZING THE EFFECTS OF DIFFERENT POLICIES AND STRICTNESS LEVELS ON MONTHLY ...ANALYZING THE EFFECTS OF DIFFERENT POLICIES AND STRICTNESS LEVELS ON MONTHLY ...
ANALYZING THE EFFECTS OF DIFFERENT POLICIES AND STRICTNESS LEVELS ON MONTHLY ...
 
Database and Analytics Programming - Project report
Database and Analytics Programming - Project reportDatabase and Analytics Programming - Project report
Database and Analytics Programming - Project report
 
VPUE Presentation - P3 Stakeholder Theory
VPUE Presentation - P3 Stakeholder TheoryVPUE Presentation - P3 Stakeholder Theory
VPUE Presentation - P3 Stakeholder Theory
 
data science pptx
data science pptxdata science pptx
data science pptx
 
2b A-Using Big Data for the Sustainable Development Goals 10222015.pdf
2b A-Using Big Data for the Sustainable Development Goals 10222015.pdf2b A-Using Big Data for the Sustainable Development Goals 10222015.pdf
2b A-Using Big Data for the Sustainable Development Goals 10222015.pdf
 
Us census bureau
Us census bureauUs census bureau
Us census bureau
 
THE REACTION DATA ANALYSIS OFCOVID-19 VACCINATIONS
THE REACTION DATA ANALYSIS OFCOVID-19 VACCINATIONSTHE REACTION DATA ANALYSIS OFCOVID-19 VACCINATIONS
THE REACTION DATA ANALYSIS OFCOVID-19 VACCINATIONS
 

More from Yasas Senarath

Aspect Based Sentiment Analysis
Aspect Based Sentiment AnalysisAspect Based Sentiment Analysis
Aspect Based Sentiment AnalysisYasas Senarath
 
Evaluating Semantic Feature Representations to Efficiently Detect Hate Intent...
Evaluating Semantic Feature Representations to Efficiently Detect Hate Intent...Evaluating Semantic Feature Representations to Efficiently Detect Hate Intent...
Evaluating Semantic Feature Representations to Efficiently Detect Hate Intent...Yasas Senarath
 
Affect Level Opinion Mining
Affect Level Opinion MiningAffect Level Opinion Mining
Affect Level Opinion MiningYasas Senarath
 
Data science / Big Data
Data science / Big DataData science / Big Data
Data science / Big DataYasas Senarath
 
Lecture on Deep Learning
Lecture on Deep LearningLecture on Deep Learning
Lecture on Deep LearningYasas Senarath
 
Twitter sentiment analysis
Twitter sentiment analysisTwitter sentiment analysis
Twitter sentiment analysisYasas Senarath
 

More from Yasas Senarath (7)

Aspect Based Sentiment Analysis
Aspect Based Sentiment AnalysisAspect Based Sentiment Analysis
Aspect Based Sentiment Analysis
 
Evaluating Semantic Feature Representations to Efficiently Detect Hate Intent...
Evaluating Semantic Feature Representations to Efficiently Detect Hate Intent...Evaluating Semantic Feature Representations to Efficiently Detect Hate Intent...
Evaluating Semantic Feature Representations to Efficiently Detect Hate Intent...
 
Solr workshop
Solr workshopSolr workshop
Solr workshop
 
Affect Level Opinion Mining
Affect Level Opinion MiningAffect Level Opinion Mining
Affect Level Opinion Mining
 
Data science / Big Data
Data science / Big DataData science / Big Data
Data science / Big Data
 
Lecture on Deep Learning
Lecture on Deep LearningLecture on Deep Learning
Lecture on Deep Learning
 
Twitter sentiment analysis
Twitter sentiment analysisTwitter sentiment analysis
Twitter sentiment analysis
 

Recently uploaded

如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证acoha1
 
Digital Marketing Demystified: Expert Tips from Samantha Rae Coolbeth
Digital Marketing Demystified: Expert Tips from Samantha Rae CoolbethDigital Marketing Demystified: Expert Tips from Samantha Rae Coolbeth
Digital Marketing Demystified: Expert Tips from Samantha Rae CoolbethSamantha Rae Coolbeth
 
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...yulianti213969
 
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样jk0tkvfv
 
Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024patrickdtherriault
 
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证dq9vz1isj
 
Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"John Sobanski
 
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...ThinkInnovation
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Valters Lauzums
 
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Klinik Aborsi
 
The Significance of Transliteration Enhancing
The Significance of Transliteration EnhancingThe Significance of Transliteration Enhancing
The Significance of Transliteration Enhancingmohamed Elzalabany
 
Sensing the Future: Anomaly Detection and Event Prediction in Sensor Networks
Sensing the Future: Anomaly Detection and Event Prediction in Sensor NetworksSensing the Future: Anomaly Detection and Event Prediction in Sensor Networks
Sensing the Future: Anomaly Detection and Event Prediction in Sensor NetworksBoston Institute of Analytics
 
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一fztigerwe
 
社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token PredictionNABLAS株式会社
 
MATERI MANAJEMEN OF PENYAKIT TETANUS.ppt
MATERI  MANAJEMEN OF PENYAKIT TETANUS.pptMATERI  MANAJEMEN OF PENYAKIT TETANUS.ppt
MATERI MANAJEMEN OF PENYAKIT TETANUS.pptRachmaGhifari
 
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...ssuserf63bd7
 
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...ssuserf63bd7
 
原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证pwgnohujw
 
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证ppy8zfkfm
 

Recently uploaded (20)

如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
 
Digital Marketing Demystified: Expert Tips from Samantha Rae Coolbeth
Digital Marketing Demystified: Expert Tips from Samantha Rae CoolbethDigital Marketing Demystified: Expert Tips from Samantha Rae Coolbeth
Digital Marketing Demystified: Expert Tips from Samantha Rae Coolbeth
 
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
 
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
 
Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024
 
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
 
Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"
 
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
 
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
 
The Significance of Transliteration Enhancing
The Significance of Transliteration EnhancingThe Significance of Transliteration Enhancing
The Significance of Transliteration Enhancing
 
Sensing the Future: Anomaly Detection and Event Prediction in Sensor Networks
Sensing the Future: Anomaly Detection and Event Prediction in Sensor NetworksSensing the Future: Anomaly Detection and Event Prediction in Sensor Networks
Sensing the Future: Anomaly Detection and Event Prediction in Sensor Networks
 
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
 
社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction
 
MATERI MANAJEMEN OF PENYAKIT TETANUS.ppt
MATERI  MANAJEMEN OF PENYAKIT TETANUS.pptMATERI  MANAJEMEN OF PENYAKIT TETANUS.ppt
MATERI MANAJEMEN OF PENYAKIT TETANUS.ppt
 
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotecAbortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
 
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
 
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
 
原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证
 
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
 

Forecasting covid 19 by states with mobility data

  • 1. Uncover COVID-19 Forecasting COVID-19 by States with Mobility Data Group 3 Srinivasa Chaitanya Sai Sai Kumar Mukka Yasas Wijesuriya Project AIT-664 Dr. Hemant Purohit
  • 2. Outline [1] Introduction [2] Data Acquisition and Preparation [3] Approach and Models [4] Results and Discussion [5] Conclusion 2
  • 3. Introduction ● COVID-19 - An ongoing pandemic ● State level analysis of COVID-19 spread in USA and also integrating it with the human mobility data. ● Relationship with Human Mobility Data where mobility explains about the difference in the behaviours. 3 Information Useful, organized, structured. 3 Data Signals, know nothing 4 Knowledge Inference, understanding, actionable 1 Knowledge Contextual, leaning 2
  • 4. Challenges ● We focused on one specific aspect of initial requirements in this study due to complexity of integration of multiple datasets at multiple levels of granularity ○ E.g. Patient Level vs State Level ○ We only focus on State Level Data ● Modeling itself was tricky due to internal and external factors changing COVID-19 ○ We used Prophet (by Facebook) which alleviates lot of manual fine tuning needed by the model. ○ Prophet is a procedure for forecasting time series data based on an additive model where non- linear trends are fit with weekly effects. 4
  • 5. Research Questions 1. How lockdown* affected the spread of covid? Mobility as proxy to see how Lockdowns/Stay-at-Home affects spread of the virus 1. How mobility changes COVID-19 spread from state to state? Study the effects of Lockdown in top four states with maximum number of COVID-19 cases *note that we are interested only in identifying actual state of lockdown/stay-at-home not the effect of state enforced lockdown. To prevent further confusion between the meaning of lockdown we will simply refer to this as mobility in feature. 5
  • 6. Data and Information Acquisition ● Main Dataset - UNCOVER COVID-19 Challenge [1] ➔ A collection of over 200 publicly selected datasets from different sources like World Health Organisation, New York Times, John Hopkins, World Bank, Google Mobility Data and many more. ➔ It contains the data which has a different varieties of statistics ,local and global infection rates,social distancing rules and regulations and also geospatial data on the movement of people . 1. US-States to Code Mapping [2] ● State → Abbreviation ● Virginia → VA 6
  • 7. Dataset - New York Times [NYTD]
      ● The New York Times dataset contains five columns:
          ○ date - date of the record
          ○ state - US state in which the cases were recorded
          ○ fips - Federal Information Processing Standard code of the state (numeric)
          ○ cases - cumulative total number of cases up to that date
          ○ deaths - cumulative total number of deaths up to that date
      ● We are interested in the columns with bold text above.
  • 8. Total Number of COVID-19 Cases
      [Figure: choropleth map of the percentage of COVID-19 cases in each state up to 07-25-2020]
      [Figure: number of cases in the states with the most cases]
      ● We selected the maximum case count for each state and created a new column, cases_p, containing each state's percentage of the total cases in the USA.
      ● We sorted the states by number of cases in descending order, so the states with the most cases come first.
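The two steps on this slide can be sketched in Pandas. This is a minimal illustration rather than the authors' code: the function name `case_share_by_state` and the toy rows are hypothetical, and only the column names `state`, `cases` and `cases_p` come from the slides. Because `cases` is cumulative, each state's maximum is its latest total.

```python
import pandas as pd


def case_share_by_state(nyt: pd.DataFrame) -> pd.DataFrame:
    """Return one row per state with its share of all cases (hypothetical helper)."""
    # `cases` is cumulative, so the per-state maximum is the latest total.
    totals = nyt.groupby("state", as_index=False)["cases"].max()
    # Percentage of each state's cases out of the summed state totals.
    totals["cases_p"] = 100 * totals["cases"] / totals["cases"].sum()
    # States with the most cases first.
    return totals.sort_values("cases", ascending=False).reset_index(drop=True)


# Toy data, invented for illustration only.
toy = pd.DataFrame(
    {
        "state": ["A", "A", "B", "B"],
        "cases": [10, 30, 5, 10],
    }
)
shares = case_share_by_state(toy)
print(shares)
```

The resulting `cases_p` column is the kind of value a choropleth map would colour each state by.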
  • 9. Cumulative Number of Cases for each State
      ● This slide shows the cumulative number of COVID-19 cases in different states.
      ● The case curves have a different shape in each state.
      ● This variation makes it harder to identify the number of cases.
  • 10. Dataset - Mobility Data [MD]
      ● Google Mobility Data [3] contains nine columns:
          ○ date - date of the record
          ○ state - state of the record
          ○ county - county of the record
          ○ retail_and_recreation, grocery_and_pharmacy, parks, transit_stations, workplaces, residential - difference in time spent in each category of place compared with baseline days
  • 11. Problem Formulation
      ● The objective of our model is to forecast the number of COVID-19 cases that will be identified, given the forecasted mobility information.
      This is the problem formulation. It specifies the output of the model: the x values are the input features, which include the mobility features, and y is the target variable, the number of cases. The main aim is to use past data to forecast the number of cases one day ahead.
  • 12. Modeling Approach
      [Figure: pipeline flowchart with the labels NYTD + MD; Preprocessing (fill null values); Dataset; curve fitting, regression, Prophet; Results and Evaluations (cross validation, ablation study); Conclusions (synopsis of findings). Visualizations (using Matplotlib) are attached to the stages: choropleth map, time series plots and correlation heatmaps; tables, time series plots and bar charts.]
      Visualizations are produced at different stages of the process. First, we took the New York Times dataset and the Google mobility dataset and combined them into one dataset, then filled the null values with zeros. In the exploratory step, we used a choropleth map, time series plots, and the correlation and autocorrelation between the attributes. In the modeling step, we fitted a regression model, curve fitting and a Prophet model. We then used cross validation and an ablation study to determine which features are important in predicting COVID-19 cases, and presented the results as time series plots, bar charts and tables. Finally, we conclude with a summary of the findings.
  • 13. Data loading and Preprocessing
      ● Both the NYT data [NYTD] and the mobility data [MD] are available as CSV.
      ● The CSVs are loaded using Pandas.
      ● Dataset integration → Algo 1
      ● Preprocessing
          ○ Rows not found in the mobility dataset are filled with zeros (i.e. they are assumed to have baseline mobility).
          ○ The missing values fall at the start of the time period, so they have minimal impact on the analysis.
      ● Dataset size: 4539
      ● Period: 02-15-2020 → 07-25-2020
      [Figure: Algo 1. Algorithm used in data integration]
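The integration and zero-fill steps described on this slide might look roughly like the following in Pandas. Algo 1 itself is not reproduced in this transcript, so this is only a sketch under stated assumptions: the helper name `integrate` is hypothetical, the mobility data is assumed to be already aggregated to one row per state and date, and the join keys `date` and `state` are taken from the column lists on the dataset slides.

```python
import pandas as pd


def integrate(nyt: pd.DataFrame, mobility: pd.DataFrame, mobility_cols: list) -> pd.DataFrame:
    """Left-join mobility onto the case data and zero-fill gaps (hypothetical helper)."""
    # Keep every NYT row; rows without a mobility match get NaN in the mobility columns.
    merged = nyt.merge(mobility, on=["date", "state"], how="left")
    # A zero means "no change from baseline", matching the assumption on the slide.
    merged[mobility_cols] = merged[mobility_cols].fillna(0)
    return merged.sort_values(["state", "date"]).reset_index(drop=True)


# Toy data, invented for illustration only.
nyt = pd.DataFrame(
    {
        "date": pd.to_datetime(["2020-02-15", "2020-02-16"]),
        "state": ["Virginia", "Virginia"],
        "cases": [0, 1],
    }
)
mobility = pd.DataFrame(
    {
        "date": pd.to_datetime(["2020-02-16"]),
        "state": ["Virginia"],
        "parks": [12.0],
        "workplaces": [-3.0],
    }
)
merged = integrate(nyt, mobility, ["parks", "workplaces"])
print(merged)
```

A left join keeps the case series intact, so any gap in the mobility data shows up as a zero-filled row instead of a missing date.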
  • 14. Attribute Correlation (Visualizations)
      ● These figures show the correlations between the attributes. The first figure covers all US states.
      ● The second figure covers California.
      ● The third figure covers New York.
      ● Some states show higher correlations than others.
      ● Parks are more strongly correlated with the number of cases than other places.
  • 15. Autocorrelation
      ● Correlation of the series with itself, lagged by x days
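As a concrete reading of that definition, the sketch below computes the Pearson correlation between a series and a copy of itself shifted by `lag` days. This is one common definition of autocorrelation; the slides do not say which estimator or library call the authors used, and the function name and toy series are hypothetical.

```python
from math import sqrt


def autocorrelation(series, lag):
    """Pearson correlation between series[:-lag] and series[lag:]."""
    if lag <= 0 or lag >= len(series) - 1:
        raise ValueError("lag must be between 1 and len(series) - 2")
    x = series[:-lag]  # the series without its last `lag` points
    y = series[lag:]   # the same series shifted forward by `lag`
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Note: a constant series has zero variance and would divide by zero here.
    return cov / sqrt(var_x * var_y)


# Toy series with an exact 7-day cycle, invented for illustration only.
weekly = [1, 2, 3, 4, 5, 6, 7] * 3
print(autocorrelation(weekly, 7))
print(autocorrelation(weekly, 3))
```

A series with a weekly rhythm correlates strongly with itself at a lag of 7 days, which is the kind of pattern a weekly seasonal component is meant to capture.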
  • 16. Modeling - Prophet [4]
      ● Additive regression model [5]
      ● Implementation and training are relatively easy.
      ● Ability to add a weekly seasonal component: on weekdays people go to the office, whereas on weekends they might stay inside.
  • 17. Parameters (Results)
      ● We created multiple models using different mobility features and evaluated their next-day predictions.
      ● We set k=1 so that our features are lagged by one day.
      ● We set Prophet to learn weekly trends.
      ● Cross-validation parameters:
          ○ initial = 60 days, period = 1 day, horizon = 7 days
      [Figure: cross-validation folds over each state's dataset, each fold labelled with a 60-day train window and a 7-day test window]
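To make the cross-validation parameters concrete, the sketch below enumerates rolling forecast cutoffs for initial = 60 days, period = 1 day and horizon = 7 days over the study period 02-15-2020 to 07-25-2020. It is a simplified forward enumeration written for illustration; the function name `rolling_cutoffs` is hypothetical, and Prophet's own cross-validation helper chooses its cutoffs internally, so its exact folds may differ.

```python
from datetime import date, timedelta


def rolling_cutoffs(start, end, initial_days=60, period_days=1, horizon_days=7):
    """List cutoff dates: train on data up to the cutoff, test on the next horizon."""
    cutoffs = []
    # The first fold needs at least `initial_days` of history.
    cutoff = start + timedelta(days=initial_days)
    # Stop once the test window would run past the end of the data.
    while cutoff + timedelta(days=horizon_days) <= end:
        cutoffs.append(cutoff)
        cutoff += timedelta(days=period_days)
    return cutoffs


cutoffs = rolling_cutoffs(date(2020, 2, 15), date(2020, 7, 25))
print(len(cutoffs), cutoffs[0], cutoffs[-1])
```

With Prophet itself, the corresponding call would presumably be `cross_validation(model, initial="60 days", period="1 days", horizon="7 days")` from `prophet.diagnostics`, applied to a fitted model.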
  • 18. Error Measures
      ● Mean Absolute Percentage Error
      ● Mean Absolute Error
      ● Root Mean Squared Error
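The formulas on this slide did not survive in the transcript. The standard definitions of the three measures can be sketched as follows; the function names and sample numbers are illustrative, not taken from the authors' repository.

```python
from math import sqrt


def mae(actual, predicted):
    """Mean Absolute Error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the average squared error."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent; undefined if any actual value is 0."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Toy values, invented for illustration only.
actual = [100, 200]
predicted = [110, 180]
print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

MAE and RMSE are expressed in numbers of cases, so they grow with a state's case count, while MAPE is relative to the actual values.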
  • 19. Results
      Visualizations help us determine whether the models are working properly.
  • 20. Evaluation and Discussion
      State         RMSE     MAE      MAPE (%)
      California    10187    6411     3.06
      Florida       16104    10165    5.99
      New Jersey    1487     1258     0.81
      New York      3322     2358     0.67
      Table: Performance of models on the states with the most cases.
      ● Different states have different external parameters, which makes them harder to forecast.
      ● RMSE and MAE are not suitable for comparing different states.
      ● MAPE is more suitable, since it takes into account the total number of cases.
  • 21. Ablation Study
      ● We studied the performance of the models by removing each of the mobility features in turn.
      ID    Features Used (POI)
      M0    transit_stations, parks, retail_and_recreation, grocery_and_pharmacy, residential
      M1    - transit_stations
      M2    - parks
      M3    - retail_and_recreation
      M4    - grocery_and_pharmacy
      M5    - residential
      M6    - workplaces
  • 22. Ablation Study
      ● We studied the performance of the models by removing each of the mobility features in turn.
      ID    Features Used (POI)
      M0    transit_stations, parks, retail_and_recreation, grocery_and_pharmacy, residential
      M1    - transit_stations
      M2    - parks
      M3    - retail_and_recreation
      M4    - grocery_and_pharmacy
      M5    - residential
      M6    - workplaces
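The leave-one-out structure of the ablation table can be sketched as a small helper that builds the feature list for each model ID. The helper name is hypothetical. One assumption to flag: the M0 row lists five features and omits workplaces even though M6 removes it, so this sketch assumes the full set is the six mobility columns from the Mobility Data slide.

```python
# The six mobility columns from the Mobility Data slide, in the order the
# ablation table removes them (assumption: M0 uses all six).
ALL_FEATURES = [
    "transit_stations",
    "parks",
    "retail_and_recreation",
    "grocery_and_pharmacy",
    "residential",
    "workplaces",
]


def ablation_feature_sets(features):
    """Map model IDs to feature lists: M0 is the full set, Mi drops the i-th feature."""
    feature_sets = {"M0": list(features)}
    for i, dropped in enumerate(features, start=1):
        feature_sets[f"M{i}"] = [f for f in features if f != dropped]
    return feature_sets


feature_sets = ablation_feature_sets(ALL_FEATURES)
for model_id, used in feature_sets.items():
    print(model_id, used)
```

Each feature list would then be used to train and cross-validate one model, and the change in error relative to M0 indicates how much the removed feature contributed.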
  • 23. Conclusion
      ● Mobility vs. spread of COVID-19
      ● Varies from state to state
      ● How did mobility affect the spread of COVID-19?
          ○ Mobility serves as a proxy to see how lockdowns/stay-at-home affect the spread of the virus.
              ■ Shows changes in mobility from usual times to COVID-19 times
              ■ Provides the real state of lockdown
          ○ Mobility is useful in predicting the number of cases.
          ○ Mobility at different POIs has a different effect on predicting the number of cases.
      ● How does the effect of mobility on COVID-19 spread change from state to state?
          ○ We studied the effects of lockdown in the four states with the highest number of COVID-19 cases.
          ○ Different states had different affinity to mobility.
          ○ This can be reasoned with local knowledge of each state [requires further investigation].
          ○ Granular-level modeling (state level) could give more insights/knowledge.
  • 24. Conclusion
      ● Challenges
          ○ Modeling itself was tricky because the internal and external factors that change COVID-19 spread keep shifting.
          ○ Performing analysis on an ongoing pandemic is also challenging.
              ■ Less data to train models
      ● Visualization
          ○ Helps identify which features are best
          ○ Helps validate models
          ○ Presentation matters
      ● Future Work
          ○ Add recent data to train and evaluate the models
          ○ Try selecting the best features for each state
      ● Repository
          ○ https://github.com/ysenarath/covid-19-mobility-analysis
  • 25. References
      [1] UNCOVER COVID-19 Challenge. https://kaggle.com/roche-data-science-coalition/uncover. Accessed 4 Dec. 2020.
      [2] Ong, Jason. Jasonong/List-of-US-States. 2012. 2020. GitHub, https://github.com/jasonong/List-of-US-States.
      [3] “COVID-19 Community Mobility Report.” COVID-19 Community Mobility Report, https://www.google.com/covid19/mobility?hl=en. Accessed 4 Dec. 2020.
      [4] Taylor, S. J., & Letham, B. (2018). Forecasting at scale. The American Statistician, 72(1), 37-45.
      [5] “Additive Model.” Wikipedia, 24 Jan. 2020. Wikipedia, https://en.wikipedia.org/w/index.php?title=Additive_model&oldid=937324426.
  • 26. Group 3: Srinivasa Chaitanya Sai Mupparisetty, Sai Kumar Mukka, Yasas Wijesuriya