Data Engineers in Uncertain Times: A COVID-19 Case Study

Databricks Data + AI Summit - Europe
November 2020

All content copyright © 2020 QuantumBlack, a McKinsey company
Speakers
James McNiff, Principal Data & ML Engineer
Cris Cunha, Analytics Associate Partner

QuantumBlack is McKinsey’s Centre of Excellence for Analytics and AI
From 28 people in 2015 to 450 in 2020, including >110 PhDs in the field of AI
We have dedicated people with distinct capabilities for delivery of advanced analytics: data engineering, data science, user design, delivery
Across geographies: Tokyo, London, Boston, Sydney, Montreal, Chicago, Singapore, Gurgaon, Sao Paulo, …

11 March 2020: WHO declares COVID-19 a pandemic; 10 days
later we see significant growth in Australia
[Figure: Number of daily Australian cases, 11 to 21 March 2020, showing daily new cases with an exponential fit. Source: covid19data.com.au]
Australian context
Daily growth in cases is taking an exponential shape in Australia.
Immediate concern about health system capacity, the livelihoods of Australians and at-risk vulnerable populations.
How can we best provide quick insights that are relevant to thought leaders in the country, to best prepare for the challenges ahead?
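One way to check the exponential shape described above is a least-squares fit on the logarithm of the daily counts. The sketch below is a minimal illustration in plain Python, not the fit used for the slide. The function name `fit_exponential` is hypothetical, and the input series is synthetic (it is not the covid19data.com.au data).

```python
import math

def fit_exponential(daily_cases):
    """Fit cases(t) ~ a * exp(b * t) by least squares on log(cases).

    Days are indexed 0, 1, 2, ... All counts must be positive because
    the fit works on logarithms. Returns (a, b, doubling_time_days).
    """
    n = len(daily_cases)
    if n < 2 or any(c <= 0 for c in daily_cases):
        raise ValueError("need at least two positive daily counts")
    xs = range(n)
    ys = [math.log(c) for c in daily_cases]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx                      # daily growth rate
    a = math.exp(y_mean - b * x_mean)  # fitted value on day 0
    doubling = math.log(2) / b if b > 0 else float("inf")
    return a, b, doubling

# Synthetic counts growing by 25% per day (illustrative only, not real data).
synthetic = [10 * 1.25 ** day for day in range(11)]
a, b, doubling = fit_exponential(synthetic)
print(f"a={a:.2f}, growth rate b={b:.3f}/day, doubling time={doubling:.1f} days")
```

The doubling time is often the easier number to communicate: it turns the fitted growth rate into "cases double every N days".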

We stood up a team to build analytics models aimed at answering three
questions around the potential future of COVID-19 in Australia
McKinsey & Company
B. Australia’s Rt is around 1; case numbers are low but cases are still occurring, driving large uncertainty
[Figure: Number of Australian cases tracked against introduction of restrictions, with the effective reproductive number over time (Rt), 11 March to 28 May 2020.]
Interventions and Rt impact (scale: High to Low)
Robust measures (e.g., high-adherence shelter-in-place, lockdown) to minimise transmission
Moderate measures (e.g., high-adherence physical distancing, shelter-in-place) to delay the overload to healthcare systems
Conservative measures (e.g., voluntary physical distancing) to minimize economic and social impacts
Source: QuantumBlack Analysis. Apparent reproductive numbers at point in time t from SEIR model differential evolution fit based on QuantumBlack SEIR model framework with values used by the Doherty Institute in modelling for Australian government. Assumptions (updated as of 25/05/2020 in line with Doherty Institute modelling for Australian Government): incubation period of 3.2 days, infectious period of 2.9 days. Moss, R., et al., “Modelling the impact of COVID-19 in Australia to inform transmission reducing measures and health system preparedness”, Doherty Institute, 2020. “Impact of non-pharmaceutical interventions (NPIs) to reduce COVID19 mortality and healthcare demand”, Neil Ferguson et al., Imperial College Covid-19 Response Team, March 16 2020.
Legend: overseas transmission (not registered before 7 Apr); local transmission; Rt; Rt 95% CI
15 March 2020: All overseas arrivals required to self-isolate for 14 days and cruise ship arrivals banned.
16 March 2020: Non-essential gatherings of > 500 people banned
18 March 2020: Restrictions on indoor gatherings
20 March 2020: Travel ban on foreign nationals entering Australia; restriction of travel to remote communities
28 March 2020: All people entering Australia required to undertake a mandatory 14-day quarantine at designated facilities (e.g. hotels) in their port of arrival
11 May 2020: Government publishes 4 stage plan with several states relaxing restrictions on gatherings and hospitality businesses start with a limited reopening
Modelling as of May 29 DRAFT
Note: CONFIDENTIAL AND PROPRIETARY. Any use of this material without specific permission of the owner is strictly prohibited. The information included in this report will not contain, nor are they for the purpose of constituting, policy advice. We emphasize that statements of expectation, forecasts and projections relate to future events and are based on assumptions that may not remain valid for the whole of the relevant period. Consequently, they cannot be relied upon, and we express no opinion as to how closely the actual results achieved will correspond to any statements of expectation, forecasts or projections.
DOCUMENT INTENDED TO PROVIDE INSIGHT BASED ON CURRENTLY AVAILABLE INFORMATION FOR CONSIDERATION AND NOT SPECIFIC ADVICE, 25/05/2020
C. Hitting the “dancefloor”: it is important to keep Rt below 1
to not overwhelm the Australian health system
[Figure: Health system capacity for new Rt scenarios in 10 days. ICU demand plotted over time, 20/5/20 to 10/4/21.]
Source: QuantumBlack Analysis, health system capacity from “Anzics-CORE-Report”, AIHW hospitals. Apparent reproductive numbers at point in time t from SEIR model fit based on QuantumBlack SEIR model framework with values used by the Doherty Institute in modelling for Australian government. Assumptions (updated as of 06/05/2020 in line with Doherty Institute modelling for Australian Government): incubation period of 3.2 days, infectious period of 2.9 days. Moss, R., et al., “Modelling the impact of COVID-19 in Australia to inform transmission reducing measures and health system preparedness”, Doherty Institute, 2020. “Impact of non-pharmaceutical interventions (NPIs) to reduce COVID19 mortality and healthcare demand”, Neil Ferguson et al., Imperial College Covid-19 Response Team, March 16 2020.
Rt scenarios
Unmitigated, Rt 2.53: represents a scenario where there are no at-scale, deliberate actions to proactively control transmission of the disease
Light social distancing, Rt 1.9: assumes a 25% reduction in the contact network of individuals across the population
Moderate social distancing, Rt 1.68: assumes a 33% reduction in the contact network of individuals across the population
Stricter social distancing, Rt 1.2: assumes a scenario in which there is still a 60-70% reduction in the contact network
Current trajectory, Rt 0.5-0.8: latest Rt levels observed based on a rolling window of the last 10 days
ICU capacity reference lines: 50% of ICU capacity: 1114 beds; 100%: 2229 beds; 200%: 4458 beds; 400%: 8916 beds
Chart annotations: Peak ICU demand ~450,000, 85% of population infected; 0.02% of population infected
Axis: ICU demand (# active infections requiring ICU, log5 normalised)
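The scenario comparison above can be approximated with a textbook SEIR model. The sketch below is a minimal illustration, not the QuantumBlack SEIR framework or its differential evolution fit. It uses the incubation period (3.2 days), infectious period (2.9 days), Rt values and 100% ICU capacity (2229 beds) quoted on the slides. The population of 25 million, the 500 initial infectious cases, the constant Rt, and the use of the 4.7% critical share as the ICU fraction of infectious people are all assumptions made for illustration.

```python
def simulate_seir(rt, population=25_000_000, initial_infectious=500,
                  incubation_days=3.2, infectious_days=2.9,
                  days=365, dt=0.1):
    """Integrate a basic SEIR model with forward Euler steps.

    rt is held constant and treated as the reproduction number in a fully
    susceptible population. Returns the number of infectious people at the
    end of each simulated day.
    """
    sigma = 1.0 / incubation_days   # rate of moving from exposed to infectious
    gamma = 1.0 / infectious_days   # rate of moving from infectious to removed
    beta = rt * gamma               # transmission rate implied by rt
    s = float(population - initial_infectious)
    e, i, r = 0.0, float(initial_infectious), 0.0
    steps_per_day = round(1 / dt)
    infectious_by_day = []
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt
            new_infectious = sigma * e * dt
            new_removed = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_removed
            r += new_removed
        infectious_by_day.append(i)
    return infectious_by_day

ICU_BEDS = 2229        # 100% of ICU capacity, from the slide
ICU_FRACTION = 0.047   # assumption: critical share applied to infectious people

scenarios = {"Unmitigated": 2.53, "Light": 1.9, "Moderate": 1.68,
             "Stricter": 1.2, "Current trajectory (upper)": 0.8}
for name, rt in scenarios.items():
    peak_icu = max(simulate_seir(rt)) * ICU_FRACTION
    print(f"{name:<28} Rt={rt:<5} peak ICU demand ~ {peak_icu:,.0f} "
          f"({peak_icu / ICU_BEDS:.1f}x capacity)")
```

Because the inputs are simplified, the absolute numbers will not match the chart; the useful part is the ordering of scenarios and how sharply peak demand falls as Rt approaches 1.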
What do we know about COVID-19, and how do we model critical characteristics of the disease based on different countries, populations and behaviours over time?
What is the trajectory of COVID-19 in Australia, and how do we best track it at a state and transmission source (overseas, local, community) level?
What are potential scenarios for the Australian health system, and how do we best think about the levers to pull, based on critical input parameters?
Providing a perspective on how to prepare for the COVID-19 challenge in Australia
C. It is unlikely that Rt will sustain above 1 in the “dance”; interventions can take different forms
Rt trajectory archetypes from reference countries based on historical analysis
Rapid reduction: New Zealand deployed high-stringency lockdowns early in the growth phase
Contained transmissions: Australia did not go to the depth of New Zealand's stringency; however, it intervened early in the growth phase
Extensive spread: UK, US. Both countries slowly ramped up restrictions and started acting later in the growth trajectory, resulting in a longer period at Rt levels
Second wave: Singapore acted early and constantly in the trajectory; however, it saw resurgence taking place from localized outbreaks
[Figure: Number of new confirmed cases and Rt over time, 03/3/20 to 10/5/20, in four panels: New Zealand, Australia, US, Singapore.]
A. COVID behavior is still largely variable; however, there are common insights from the early phases of the pandemic
COVID-19 Data Pack

Incubation periods
Range of time after infection but before showing symptoms, when a person can potentially spread a disease
[Figure: incubation period by disease, in days after being infected (1 to 21), with legend entries Average, Typical time and Potential unknown at present. Diseases shown: Seasonal Flu, Cholera, Bacterial Meningitis, Chickenpox, SARS, Measles, MERS, Ebola, Swine Flu H1N1, COVID-19 Coronavirus, Pneumonia, Rotavirus, Norovirus; the labels “Stomach bug” and “Vomiting bug” also appear.]
Source: US Centers for Disease Control & Prevention, WHO, Lauer et al (2020)

How contagious & deadly is it?
[Figure: percent who die plotted against infectiousness (average no. of people infected by each sick person, R0). Diseases shown: Bird flu, Swine flu, Spanish flu, Ebola, Tuberculosis, SARS, Seasonal flu, Smallpox, Chickenpox, Common cold, Measles, Rotavirus, Polio, MERS, Norovirus.]
We don’t fully know yet but it’s in this range. COVID-19 infectiousness: 1.5-3.5; fatality rate: 0.7-3.4%
CFR is unreliable during a pandemic
Note: The case fatality rate (CFR) only shows the percentage of confirmed cases who have died
Source: US Centers for Disease Control & Prevention, WHO, New York Times

The majority of infections are mild
Seriousness of symptoms: Mild (like flu, stay at home) 80.9%; Severe (hospitalization) 13.8%; Critical (intensive care) 4.7%
Study of 44,672 confirmed cases in Mainland China
Source: China Center for Disease Control and Prevention, Statista

Those aged +60 are most at risk …
[Figure: Percentage of deceased by age group (<40, 40-49, 50-59, 60-69, 70-79, 80-89, 90+) for Italy, UK and Australia. Bar labels in extraction order: 0.7, 0.8, 2.5, 9.7, 24.4, 30.3, 25.1, 0.8, 1.7, 5.4, 11.2, 38.6, 15.6, 1.0, 2.0, 8.0, 31.0, 28.0, 19.0, 26.6.]
Men 63%; Women 37%
Average age of victim in this analysis was 79
Study of 3,372 death cases in UK & 21,551 deaths in Italy
Source: Italian Portal of Epidemiology for Public Health, UK Office of National Statistics

…especially those with existing conditions
% of deceased with other serious ailments: Cardiovascular disease 10.5%; Diabetes 7.3%; Chronic respiratory disease 6.3%; Abnormally high blood pressure 6.0%; Cancer 5.6%; No existing conditions 0.9%
Study of 44,672 confirmed cases in Mainland China
Source: China Center for Disease Control and Prevention, Statista

Multiple conditions increase risk
Serious conditions present in those who have died: no conditions 1%; 1 condition 25%; 2 conditions 26%; 3+ conditions 48%
Study of 355 deaths from 16,925 confirmed cases in Italy
Having these conditions does not mean you will die of the disease; they are just risk factors
Source: Italian Portal of Epidemiology for Public Health

Source: ourworldindata.org, QuantumBlack Analysis
B. Most states' Rt remains < 1, with signs of isolated outbreaks driving changes since early May and bringing Australia's Rt closer to 1
[Figure: Total new daily confirmed cases and Rt (note 1) over time, rolling window of 10 days, by state. Panels: NSW, VIC, QLD, WA, SA, TAS. Series: Rt (local cases only, including under investigation); overseas or interstate cases; under investigation cases; known local cases.]
1 Apparent reproductive number at time t from SEIR model fit based on QuantumBlack SEIR model framework. Assumptions (updated as of 27/05/2020 in line with Doherty Institute modelling for Australian Government and Imperial College): incubation period of 3.2 days (95% CrI: 2.3, 4.0), serial interval mean 4.7 and standard deviation 2.9. “Estimating the case detection rate and temporal variation in transmission of COVID-19 in Australia”, Doherty Institute, and “Impact of non-pharmaceutical interventions (NPIs) to reduce COVID19 mortality and healthcare demand”, Neil Ferguson et al., Imperial College Covid-19 Response Team, March 16 2020.
Source: Case data compiled by Johns Hopkins University: https://coronavirus.jhu.edu/map.html, and https://www.covid19data.com.au and QuantumBlack analysis
D. Reaction time becomes the underpinning indicator to effectively manage the “dance”
Number of weeks to achieve selected ICU capacity % factors, Australia
[Figure: Number of weeks to peak 50% of ICU capacity for scenarios of active cases vs. Rt, in four panels. Axes: Rt (0.5 to 2.5 in steps of 0.1) against number of active cases (10 to 10,000). Bands: 4-8 weeks, 8-12 weeks, > 12 weeks.]
Reaction time is dependent on both the number of active cases and the Rt levels we expect to see.
Rt levels up to between 1.2 and 1.4, regardless of ICU capacity, would see Australia having > 12 weeks before overwhelming the health system.
Anywhere above 1,000 active cases starts significantly increasing the zone where Australia doesn’t have more than 4-8 weeks before hitting ICU capacity.
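The dependence of reaction time on active cases and Rt can be sketched with simple exponential growth. This is a back-of-the-envelope illustration, not the SEIR-based calculation behind the chart, so it will not reproduce the chart's bands. It assumes active cases multiply by Rt once per serial interval (4.7 days, from the modelling notes) and that a fixed 4.7% of active cases need intensive care (the critical share from the data pack; applying it to active cases is an assumption). The function name `weeks_to_icu_threshold` is hypothetical.

```python
import math

def weeks_to_icu_threshold(active_cases, rt, icu_beds=2229, capacity_share=0.5,
                           icu_fraction=0.047, serial_interval_days=4.7):
    """Weeks until ICU demand reaches capacity_share * icu_beds.

    Assumes active cases multiply by rt once per serial interval and that a
    fixed icu_fraction of active cases needs intensive care. Returns 0.0 if
    the threshold is already met and math.inf if demand never reaches it.
    """
    threshold_cases = capacity_share * icu_beds / icu_fraction
    if active_cases >= threshold_cases:
        return 0.0
    if rt <= 1.0 or active_cases <= 0:
        return math.inf
    generations = math.log(threshold_cases / active_cases) / math.log(rt)
    return generations * serial_interval_days / 7.0

# Print a small grid of active cases against Rt, in weeks.
rt_values = (1.1, 1.3, 1.5, 2.0)
print("active cases  " + "  ".join(f"Rt={rt}" for rt in rt_values))
for active in (100, 1000, 5000):
    row = "  ".join(f"{weeks_to_icu_threshold(active, rt):6.1f}" for rt in rt_values)
    print(f"{active:>12}  {row}")
```

Even this crude version shows the two levers in the text: more active cases shorten the runway, and the runway shrinks quickly as Rt rises above 1.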

So, how did we get there in a fragmented and rapidly evolving data landscape?
More than 10 data sources to tap into…
Unclear data reconciliation rules…
Varying levels of granularity at a state level…
Inconsistencies between reporting times from different sources…
Virus characteristics evolving on a daily basis, with a wide range of critical assumptions being used…
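One way to make reconciliation rules explicit is to encode them as a source priority plus a disagreement check. The sketch below is a hypothetical illustration in plain Python, not the rules the team used. The source names, the record layout and the sample values are all made up.

```python
from collections import defaultdict

# Hypothetical source names, ordered from most to least trusted.
SOURCE_PRIORITY = ["state_health_dept", "national_dashboard", "community_tracker"]

def reconcile(records, tolerance=0):
    """Pick one value per (state, date) and flag disagreements.

    records: iterable of dicts with keys state, date, source, new_cases.
    Returns (reconciled, conflicts): reconciled maps (state, date) to the
    value from the highest-priority source that reported it; conflicts lists
    the keys whose sources differ by more than tolerance.
    """
    by_key = defaultdict(dict)
    for rec in records:
        if rec["source"] not in SOURCE_PRIORITY:
            raise ValueError(f"unknown source: {rec['source']}")
        by_key[(rec["state"], rec["date"])][rec["source"]] = rec["new_cases"]

    reconciled, conflicts = {}, []
    for key, values in by_key.items():
        chosen = next(s for s in SOURCE_PRIORITY if s in values)
        reconciled[key] = values[chosen]
        if max(values.values()) - min(values.values()) > tolerance:
            conflicts.append(key)
    return reconciled, conflicts

# Made-up records for illustration only.
records = [
    {"state": "NSW", "date": "2020-03-20", "source": "national_dashboard", "new_cases": 70},
    {"state": "NSW", "date": "2020-03-20", "source": "state_health_dept", "new_cases": 75},
    {"state": "VIC", "date": "2020-03-20", "source": "community_tracker", "new_cases": 28},
]
reconciled, conflicts = reconcile(records, tolerance=2)
print(reconciled)
print("needs review:", conflicts)
```

Keeping the priority list and tolerance as data rather than scattered conditionals makes the rule easy to review and to change as sources improve or degrade.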

Challenges as a data consumer
Timeliness, credibility, transparency, data access, data licensing, consent, flexibility, data quality, data management, intellectual property, adaptability, trust

Kedro + Great Expectations
Kedro: open-sourced Python library for production-ready data and ML pipelines
kedro.readthedocs.io
https://github.com/quantumblacklabs/kedro
Great Expectations: “Always know what to expect from your data”
https://github.com/great-expectations/great_expectations
Kedro Plugin for Great Expectations. Open Sourcing soon.
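The idea of checking expectations before a pipeline step runs can be sketched without either library. The code below is a plain-Python stand-in: it does not use the Kedro or Great Expectations APIs, and every function name in it is hypothetical. It only illustrates the pattern of declaring expectations once and refusing to pass bad data downstream.

```python
from functools import partial

def expect_columns(rows, required):
    """Expect every row to contain all required column names."""
    failed = [i for i, row in enumerate(rows) if not set(required).issubset(row)]
    return {"expectation": "columns_present", "success": not failed, "failed_rows": failed}

def expect_non_negative(rows, column):
    """Expect a numeric column to be present and >= 0 in every row."""
    failed = [i for i, row in enumerate(rows)
              if not isinstance(row.get(column), (int, float)) or row[column] < 0]
    return {"expectation": f"{column}_non_negative", "success": not failed, "failed_rows": failed}

def validate(rows, checks):
    """Run all checks; raise before any downstream step sees bad data."""
    failures = [result for result in (check(rows) for check in checks)
                if not result["success"]]
    if failures:
        raise ValueError(f"data expectations failed: {failures}")
    return rows

def cumulative_cases(rows):
    """Example pipeline step: running total of new cases, ordered by date."""
    total, out = 0, []
    for row in sorted(rows, key=lambda r: r["date"]):
        total += row["new_cases"]
        out.append({"date": row["date"], "cumulative": total})
    return out

CHECKS = [
    partial(expect_columns, required=("date", "state", "new_cases")),
    partial(expect_non_negative, column="new_cases"),
]

# Made-up rows for illustration only.
rows = [
    {"date": "2020-03-21", "state": "NSW", "new_cases": 5},
    {"date": "2020-03-20", "state": "NSW", "new_cases": 3},
]
print(cumulative_cases(validate(rows, CHECKS)))
```

In a real project the validation would sit between dataset loading and the first node that consumes the data, so that a source changing its format overnight fails loudly instead of silently skewing the model inputs.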

Special thanks to
Juliette O’Brien
Creator of covid19data.com.au

Find out more & get in touch
