We aim to provide useful information on COVID-19 spread at the county level while contributing new cutting-edge techniques to the MLOps and AI research communities. Moreover, we hope to build an easily accessible infrastructure that others can use.
2. Core Products/Services
• COVID-19 county projections dashboard
• Time series forecasting platform
• Multivariate Data lake
• Rapid experimentation ML-Ops framework
3. Current COVID Model deficiencies
“Unfortunately, today’s state-of-the-art models have the following deficiencies:
• Resolution — they work mostly at the country or state level and few are at the county
level (let alone city, neighborhood, or block)
• Timeliness of data used — they have limited ability to leverage observed data
• Model sophistication — at this time, they have limited ability to consider age
distributions or socio-economic conditions.”
- D.J. Patil, former U.S. Chief Data Scientist
4. Our COVID Models
• COVID-19 projections at the county level (most current models only forecast at the state or country level).
• Model forecasts are updated daily with the most recent COVID-19 data, and the models are retrained weekly on new data.
• Final models will incorporate demographic, geo-spatial, social media, weather, mask usage, and mobility data.
5. COVID Projections Dashboard
• Key questions our dashboard aims to answer:
• What will my county's hospitalizations and case counts look like 16 days from now?
• What will my county's case numbers look like if we take the following actions?
• Which factors are driving the COVID-19 spread (or lack thereof) in the community?
• Historically, if my county had done x, what would our case numbers/hospitalizations have been?
• Show a county-wide warning if the projected number of occupied hospital beds exceeds 100% of capacity.
• Forecasted foot traffic/trends in mobility?
• Valuable for epidemiologists, public policy officials, business owners, and others concerned about the virus.
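The hospital bed warning in the list above could be sketched roughly as follows. This is a minimal illustration, not the dashboard's actual logic: the `CountyForecast` fields, the function names, and the assumption that "crossing 100%" means projected hospitalizations strictly exceeding available beds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CountyForecast:
    """Hypothetical container for one county's projection."""
    county: str                  # county name (illustrative)
    projected_hospitalized: int  # projected patients needing a bed
    staffed_beds: int            # assumed known bed capacity


def bed_utilization(forecast: CountyForecast) -> float:
    """Projected hospitalizations as a fraction of available beds."""
    if forecast.staffed_beds <= 0:
        raise ValueError("staffed_beds must be positive")
    return forecast.projected_hospitalized / forecast.staffed_beds


def needs_warning(forecast: CountyForecast, threshold: float = 1.0) -> bool:
    """True when projected utilization strictly exceeds the threshold (100% by default)."""
    return bed_utilization(forecast) > threshold
```

A dashboard could call `needs_warning` once per county after each daily forecast update; a lower `threshold` would raise the warning earlier.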
6. Core Products/Services
• COVID-19 county projections dashboard
• Time series forecasting platform
• Multivariate Data lake
• Rapid experimentation ML-Ops framework
7. Generating accurate temporal forecasts and predictions is crucial in many different industries!
• Medicine/Health (pandemics, patient responses to treatments, re-admissions, mortality risk, OR utilization)
• Climate (hurricanes, droughts, flash floods, tornado trajectories, solar energy, wind energy)
• Agriculture (crop yield, crop demand, milk production)
• Retail (product demand, product stocking, click-through rate)
• Manufacturing (machine failure, process scheduling)
• Finance (stocks, company valuation)
8. Many companies/non-profits struggle to develop accurate time series models that leverage all their data sources and learn the underlying trends and causality in their data.
9. Flow Forecast: A unified deep time series forecasting platform
• Originally built for stream flow forecasting, it has since been successfully applied to COVID-19 and other problems.
• Wide array of models and loss functions for time series forecasting, classification, and anomaly detection.
• Clear documentation, and easy to try out many different parameters to get the best forecast/prediction.
• Support for model interpretability and determining causality in time series.
• Easy-to-use transfer learning through config files.
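To make the idea of config-driven transfer learning concrete, here is a rough sketch. It does not use the Flow Forecast API: the config keys (`transfer`, `weight_path`, `skip_layers`), the function `apply_transfer`, and the plain-list "weights" are hypothetical stand-ins chosen so the example runs with the standard library alone.

```python
import json

# Hypothetical config; the key names are illustrative, not Flow Forecast's schema.
CONFIG_JSON = """
{
  "model_name": "example_forecaster",
  "transfer": {
    "weight_path": "pretrained_weights.json",
    "skip_layers": ["output_layer"]
  }
}
"""


def apply_transfer(config, model_weights, pretrained_weights):
    """Return a copy of model_weights with pretrained values copied in.

    A layer is copied only if it exists in the model, is not listed in
    skip_layers, and has the same number of parameters.
    """
    skip = set(config.get("transfer", {}).get("skip_layers", []))
    merged = dict(model_weights)
    for name, values in pretrained_weights.items():
        if name in skip or name not in merged:
            continue
        if len(values) != len(merged[name]):
            continue
        merged[name] = list(values)
    return merged


config = json.loads(CONFIG_JSON)
model = {"encoder": [0.0, 0.0], "output_layer": [0.0]}
pretrained = {"encoder": [0.5, -0.2], "output_layer": [9.9], "extra": [1.0]}
new_weights = apply_transfer(config, model, pretrained)
```

Here the encoder weights are reused while the output layer keeps its fresh initialization, which is the usual pattern when the target task has a different output. Changing which layers transfer means editing the config rather than the code.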
10. Flow Forecast synthesis methods:
• Modules to support synthesis of image, textual, and other forms of metadata.
• Easy to generate embeddings from many modalities.
• Incorporate multi-horizon temporal data.
11. Core Products/Services
• COVID-19 county projections dashboard
• Time series forecasting platform
• Multivariate data lake and weight storage
• Rapid experimentation ML-Ops framework
12. Leveraging public data can enhance performance
• “‘If only I’d known,’ said demand forecasters after a major weekend weather change left empty space on many shelves across the region. Pick the weather event, pick the category impacted, it happens all of the time.” - Julian Bridle
• “What many still haven’t realized, however, is that the impact on ongoing data science production setups has been dramatic, too. Many of the models used for segmentation or forecasting started to fail when traffic and shopping patterns changed, supply chains were interrupted, and borders were locked down.” - InfoWorld
13. Enhancing forecasts with public data
• Publicly available data, such as weather, census information, and COVID-19 mobility data, can enhance many business forecasts.
• However, as of now it is a pain to locate it all and join it together.
• Similarly, training on publicly available time series data or using existing model weights could improve proprietary business models.
14. Multivariate Data Lake
• Large data lake consisting of highly granular weather, soil, COVID, census, traffic, and other time series data.
• Clear data dictionaries documenting all the data fields.
• Standardized datetimes and prebuilt queries to make it easy to join together data and see if it improves model performance.
• Saved model weights from public models to use as a starting point.
• Native integration with flow-forecast repository.
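The value of standardized datetimes for joining sources can be illustrated with a small sketch. It is not the data lake's actual query layer: the record fields (`county_fips`, `timestamp`, `new_cases`, `max_temp_c`), the placeholder FIPS code, and all values are made up, and treating timestamps without an offset as UTC is an assumption.

```python
from datetime import datetime, timezone


def to_utc_date(timestamp: str) -> str:
    """Parse an ISO 8601 timestamp and return its UTC calendar date."""
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return parsed.astimezone(timezone.utc).date().isoformat()


def join_on_county_and_date(left, right):
    """Inner-join two lists of records on (county_fips, UTC date)."""
    index = {(r["county_fips"], to_utc_date(r["timestamp"])): r for r in right}
    joined = []
    for row in left:
        key = (row["county_fips"], to_utc_date(row["timestamp"]))
        match = index.get(key)
        if match is not None:
            merged = {**match, **row, "date": key[1]}
            merged.pop("timestamp")  # replaced by the normalized date
            joined.append(merged)
    return joined


# Made-up records; "00001" is a placeholder county code.
cases = [
    {"county_fips": "00001", "timestamp": "2020-07-01T00:00:00+00:00", "new_cases": 42},
    {"county_fips": "00001", "timestamp": "2020-07-02T00:00:00+00:00", "new_cases": 37},
]
weather = [
    {"county_fips": "00001", "timestamp": "2020-06-30T21:00:00-04:00", "max_temp_c": 29.5},
]
rows = join_on_county_and_date(cases, weather)
```

The weather reading is stamped in a local offset, but after normalization it lands on the same UTC date as the first case record, so the two rows join. Without that step, the mismatch in date strings would silently drop the match.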
15. Core Products/Services
• COVID-19 county projections dashboard
• Time series forecasting platform
• Multivariate Data lake
• Rapid experimentation ML-Ops framework
16. 87% of machine learning models never make it to production. Problems with reproducibility play a huge role.
- VentureBeat
17. AI Experimentation Framework
• Experiments from common frameworks (e.g. flow-forecast, AllenNLP, transformers) are spun up based on a GitHub PR.
• Links to the experiment results are logged to the GitHub PR.
• All communication about experiment results takes place in the PR.
• The PR is merged to dev for the historical record.
• If performance is good enough for production, the PR is merged to the production branch and the model is auto-deployed.
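The merge rule described above can be sketched as a small decision function. This is only an illustration of the gating logic: `ExperimentResult`, `branches_to_merge`, the threshold, and the choice of a lower-is-better metric are hypothetical, and no real GitHub or deployment calls are made.

```python
from dataclasses import dataclass


@dataclass
class ExperimentResult:
    """Hypothetical summary of one experiment run tied to a PR."""
    pr_number: int
    metric_name: str
    metric_value: float  # lower is better in this sketch (e.g. an error metric)


def branches_to_merge(result: ExperimentResult, production_threshold: float) -> list:
    """Every PR goes to dev for the record; good-enough results also go to production."""
    targets = ["dev"]
    if result.metric_value <= production_threshold:
        targets.append("production")
    return targets
```

In a real setup the returned list would drive the merge and deployment automation, while the metric and threshold would come from the experiment results logged on the PR.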